Chomsky Hierarchy
Academic studies of languages have focused heavily on syntax. This work covers both natural languages and formal languages. A natural language is what we normally think of when we speak of languages. A formal language is a mathematical entity that follows strict rules. A natural language has syntax too, but its rules are frequently bent in actual use. A formal language, by contrast, is usually defined by its syntax in a rigid mathematical manner.
Noam Chomsky is a linguist who taught at MIT. While there he developed a classification of languages that has become known as the Chomsky Hierarchy (also called the Chomsky-Schützenberger Hierarchy). The classification rests on four classes of formal grammars, which generate increasingly complex languages from a syntactic point of view.
An automaton is a machine that performs a function according to a predetermined set of coded instructions, especially one capable of a range of programmed responses to different circumstances.
This idea of a hierarchy of grammars was fundamental in the development of the theory of formal languages. Of interest to computer science is the fact that each of these grammar types has a corresponding type of automaton that recognizes the languages those grammars generate.
grammar type | automaton |
---|---|
recursively enumerable | Turing machine/lambda calculus |
context sensitive | linear bounded Turing machine |
context free | pushdown automaton |
regular | finite state machine |
The recursively enumerable languages are those that a Turing machine can recognize, and they correspond in some sense to the computable functions. Any program you write in a modern programming language falls into this category. The idea that the effectively computable functions are the same as those that can be implemented with a Turing machine is known as the Church-Turing thesis.
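To make the Turing machine side of this concrete, here is a minimal sketch of a simulator in Python. Everything in it is an assumption made for illustration: the state names, the `RULES` table, the `run` function, and the choice of example language (strings of the form 0^n 1^n). The step budget is also an assumption. A real Turing machine has no such limit and may simply never halt, which is why `run` returns `None` rather than a verdict when the budget runs out.

```python
BLANK = "_"

# Hypothetical transition table for the example language 0^n 1^n:
# (state, symbol read) -> (next state, symbol written, head movement)
RULES = {
    ("q0", "0"): ("q1", "X", +1),    # mark the leftmost unmarked 0
    ("q0", "Y"): ("q3", "Y", +1),    # no 0s left: check that only Ys remain
    ("q0", BLANK): ("accept", BLANK, +1),  # the empty string is in the language
    ("q1", "0"): ("q1", "0", +1),    # scan right looking for a 1
    ("q1", "Y"): ("q1", "Y", +1),
    ("q1", "1"): ("q2", "Y", -1),    # mark the matching 1
    ("q2", "0"): ("q2", "0", -1),    # scan left back to the last X
    ("q2", "Y"): ("q2", "Y", -1),
    ("q2", "X"): ("q0", "X", +1),
    ("q3", "Y"): ("q3", "Y", +1),
    ("q3", BLANK): ("accept", BLANK, +1),
}


def run(text, max_steps=10_000):
    """Return True (accepted), False (rejected), or None (budget exhausted)."""
    if set(text) - {"0", "1"}:
        return False  # the input alphabet is assumed to be {0, 1}
    tape = dict(enumerate(text))  # sparse tape: position -> symbol
    state, head = "q0", 0
    for _ in range(max_steps):
        symbol = tape.get(head, BLANK)
        rule = RULES.get((state, symbol))
        if rule is None:
            return False  # no applicable rule: the machine halts and rejects
        state, written, move = rule
        tape[head] = written
        head += move
        if state == "accept":
            return True
    return None  # undecided within the step budget


print(run("0011"))  # True
print(run("001"))   # False
```

This particular machine always halts, so the `None` case only appears when `max_steps` is set very low.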
The remaining languages are characterized by the structure of their syntax.
The context sensitive languages are not going to be described in this course.
The context free languages have a syntax whose structure forms trees. These are very useful in computing. Most programming languages specify a context free grammar, and a program called a parser reads the source text and produces the corresponding tree structure.
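As a rough illustration of a parser producing a tree, the sketch below handles a small arithmetic grammar. The grammar itself, the names `tokenize`, `Parser` and `parse`, and the use of nested tuples for tree nodes are all assumptions chosen for the example; they do not come from any particular language specification.

```python
import re

# Hypothetical grammar used for this example:
#   expr   -> term ('+' term)*
#   term   -> factor ('*' factor)*
#   factor -> NUMBER | '(' expr ')'

TOKEN = re.compile(r"\s*(\d+|[+*()])")


def tokenize(text):
    """Split the input into number and punctuation tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    """Recursive descent parser: one method per grammar rule."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        token = self.peek()
        if token is None or (expected is not None and token != expected):
            raise SyntaxError(f"expected {expected!r}, got {token!r}")
        self.pos += 1
        return token

    def expr(self):
        node = self.term()
        while self.peek() == "+":
            self.take("+")
            node = ("+", node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() == "*":
            self.take("*")
            node = ("*", node, self.factor())
        return node

    def factor(self):
        if self.peek() == "(":
            self.take("(")
            node = self.expr()
            self.take(")")
            return node
        token = self.take()
        if not token.isdigit():
            raise SyntaxError(f"expected a number, got {token!r}")
        return int(token)


def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree


print(parse("1 + 2 * 3"))    # ('+', 1, ('*', 2, 3))
print(parse("(1 + 2) * 3"))  # ('*', ('+', 1, 2), 3)
```

The recursion in `factor` calling back into `expr` is what lets parentheses nest to any depth, and it is the part a finite state machine cannot imitate.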
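The regular languages are the ones that regular expressions describe, which is how most programmers encounter them. Many regular expression libraries add features, such as backreferences, that go beyond the regular languages.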
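The pairing of regular languages with finite state machines can be seen in a small example. The sketch below recognizes one hypothetical language, binary strings containing an even number of 1s, in two ways: with a hand-written two-state machine and with a regular expression. The state names, the `TRANSITIONS` table, and both function names are assumptions made for illustration.

```python
import re

# Transition table of a two-state finite state machine.
# The state records whether the number of 1s seen so far is even or odd.
TRANSITIONS = {
    ("even", "0"): "even",
    ("even", "1"): "odd",
    ("odd", "0"): "odd",
    ("odd", "1"): "even",
}


def dfa_accepts(text):
    """Run the finite state machine over the input, one symbol at a time."""
    state = "even"
    for symbol in text:
        key = (state, symbol)
        if key not in TRANSITIONS:
            return False  # symbol outside the alphabet {0, 1}
        state = TRANSITIONS[key]
    return state == "even"


# The same language as a regular expression: 1s always arrive in pairs.
EVEN_ONES = re.compile(r"(0*10*1)*0*")


def regex_accepts(text):
    return EVEN_ONES.fullmatch(text) is not None


print(dfa_accepts("0110"), regex_accepts("0110"))  # True True
print(dfa_accepts("0111"), regex_accepts("0111"))  # False False
```

The machine needs only a fixed, finite amount of memory (its current state), no matter how long the input is.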