## Chomsky Hierarchy

Academic studies of languages have focused heavily on syntax. These studies have covered both natural languages and formal languages. A natural language is what we normally think of when we speak of languages. A **formal language** is a mathematical entity that follows strict rules. While a natural language has a syntax, its rules are frequently bent in actual use. A formal language is usually defined by its syntax in a rigid, mathematical manner.

Noam Chomsky is a linguist who taught at MIT. While there, he developed a classification of languages that has become known as the **Chomsky Hierarchy** (also called the Chomsky-Schützenberger Hierarchy). It centers on the idea that there are four classes of formal grammars, which generate languages of increasing syntactic complexity. This hierarchy of grammars was fundamental to the development of the theory of formal languages.

An **automaton** is a machine that performs a function according to a predetermined set of coded instructions, especially one capable of a range of programmed responses to different circumstances. Of interest to computer science is the fact that each of these grammar types has a corresponding type of automaton that recognizes the languages it generates. The table lists them from most to least powerful.

| Grammar type | Automaton |
|--------------|-----------|
| recursively enumerable | Turing machine / lambda calculus |
| context-sensitive | linear bounded Turing machine |
| context-free | pushdown automaton |
| regular | finite state machine |

The **recursively enumerable languages** correspond, in some sense, to the computable functions. Any program you write in a modern programming language falls into this category. The idea that the effectively computable functions are the same as those that can be implemented with a Turing machine is known as the **Church-Turing thesis**.

The remaining languages are characterized by the structure of their syntax.
The **context-sensitive languages** are not covered in this course. The **context-free languages** have a syntax whose structure forms trees. These are very useful in computing: most programming languages specify a context-free grammar, and a program called a parser uses it to produce the corresponding tree structure for a given source text. The **regular languages** are the ones programmers describe with regular expressions.
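
The difference between a finite state machine and a pushdown automaton can be sketched in a few lines of Python. The example below is only an illustration, and the function and variable names (`in_regular`, `dfa_accepts`, `pda_accepts`, `DFA_TRANSITIONS`) are made up for it. It uses two small languages chosen as assumptions for the demonstration: `a*b*`, which is regular, and `aⁿbⁿ` (the same number of `a`s and `b`s), which is context-free but not regular. The second recognizer needs a stack to count, which is exactly the extra memory a pushdown automaton has over a finite state machine.

```python
import re

# Regular language: any number of a's followed by any number of b's.
A_STAR_B_STAR = re.compile(r"a*b*")


def in_regular(s: str) -> bool:
    """Membership test for a*b* using a regular expression."""
    return A_STAR_B_STAR.fullmatch(s) is not None


# The same language as an explicit finite state machine.
# State 0 = still reading a's, state 1 = reading b's.
# A missing transition means the input is rejected.
DFA_TRANSITIONS = {(0, "a"): 0, (0, "b"): 1, (1, "b"): 1}
DFA_ACCEPTING = {0, 1}


def dfa_accepts(s: str) -> bool:
    """Run the finite state machine; it remembers only its current state."""
    state = 0
    for ch in s:
        key = (state, ch)
        if key not in DFA_TRANSITIONS:
            return False
        state = DFA_TRANSITIONS[key]
    return state in DFA_ACCEPTING


def pda_accepts(s: str) -> bool:
    """Recognize a^n b^n (n >= 0) in the style of a pushdown automaton.

    Each 'a' pushes a marker and each 'b' pops one, so the stack
    provides the unbounded counting a finite state machine lacks.
    """
    stack = []
    seen_b = False
    for ch in s:
        if ch == "a":
            if seen_b:  # an 'a' after a 'b' is out of order
                return False
            stack.append("A")
        elif ch == "b":
            seen_b = True
            if not stack:  # more b's than a's
                return False
            stack.pop()
        else:
            return False  # symbol outside the alphabet
    return not stack  # accept only if every 'a' was matched
```

For instance, `dfa_accepts("aab")` is true because the string has the right shape, while `pda_accepts("aab")` is false because the counts do not match. Parsers for programming languages rely on the same stack-based idea to match nested constructs such as parentheses.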