## Context Free Grammars
Most modern programming languages specify their syntax using a **context free grammar** (CFG). A **CFG** consists of a finite set of syntactic variables, a special variable known as the start symbol, a set of terminals, and one or more production rules for each variable.

A grammar rule consists of three parts. The first part is a **variable** on the left hand side. In the middle is the production symbol, **::=**, and on the right hand side is a list of productions. A **production** is a sequence of variables and terminals. Note that a single variable may have more than one production associated with it.

Grammars are so important that most language definitions use a special shorthand for describing context free grammars named **EBNF**. EBNF stands for Extended Backus Naur Form. Backus and Naur were two of the key developers of the Algol-60 programming language and were pioneers in the use of CFGs to specify programming languages.

In EBNF, the start symbol is the variable defined by the first rule presented. All of the productions associated with a given variable are separated with the '|' character.

Using a grammar to describe a language gives us a way to describe exactly what constitutes a legal program. More importantly, the grammatical structure of a language gives us the clues we need to interpret what a program means.
## Pushdown Automata

The machine that is equivalent to a CFG is a pushdown automaton. A **pushdown automaton** (PDA) consists of a finite state machine control with an unbounded stack. In this course, we will not have the time to formally define this machine. Suffice it to say that there is a 7-tuple that formally defines it.

A pushdown automaton differs from a finite state machine in two ways: the symbol on top of the stack can affect which transition occurs on a given input letter, and the machine can manipulate the stack.

When a pushdown automaton starts, it pushes a symbol that represents the goal of matching the start production. As input is consumed, it can either push a symbol indicating that a new subgoal needs to be matched, or it may pop the symbol on the top of the stack, indicating that the pattern was matched. A pushdown automaton typically uses an empty stack as the final or accepting condition.

The action of pushing a new subgoal is known as a **shift** operation. The action of popping a subgoal is known as a **reduce** operation. A machine built from an ambiguous grammar may exhibit errors, usually called conflicts. These fall into two categories, **shift/reduce** and **reduce/reduce**. A shift/reduce error occurs when the machine could either push a new subgoal or recognize a completed subgoal. A reduce/reduce error occurs when the machine cannot distinguish which of two patterns has just been matched.

One way a parser may resolve ambiguities is by looking ahead in the input. Many parsers use one token of lookahead to help decide what action should occur next.

The **language of a PDA** is the set of all input strings that leave the machine in a final (accepting) condition.
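As a rough illustration of the behavior described in this section, here is a minimal Python sketch of a stack-driven recognizer for balanced parentheses. It assumes the grammar `S ::= '(' S ')' S | ε`, which is chosen only for this example, and the function name `accepts` is likewise hypothetical. The finite control is reduced to a single implicit state, so this is a simplification rather than the formal 7-tuple machine.

```python
def accepts(text: str) -> bool:
    """Recognize balanced parentheses for S ::= '(' S ')' S | ε."""
    stack = ["S"]  # start by pushing the goal of matching the start symbol
    pos = 0
    while stack:
        top = stack.pop()
        # One symbol of lookahead decides which production to use.
        lookahead = text[pos] if pos < len(text) else None
        if top == "S":
            if lookahead == "(":
                # Push the subgoals in reverse so that '(' ends up on top.
                stack.extend(reversed(["(", "S", ")", "S"]))
            # Otherwise use S ::= ε: the goal is met without consuming input.
        elif top == lookahead:
            pos += 1  # the terminal on the stack matches the input
        else:
            return False  # the terminal on the stack does not match
    # Accept on an empty stack, provided all input was consumed.
    return pos == len(text)


print(accepts("(()())"))  # a balanced string
print(accepts("(()"))     # an unbalanced string
```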
## Formal Definition of a CFG

A CFG is defined by a four-tuple.

**G = (V, T, P, S)**

* V is the set of syntactic variables
* T is the set of terminals
* P is the set of productions
* S is the start symbol
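One possible way to write the four-tuple down in code is sketched here. The `Grammar` class, its field names, and the `is_well_formed` helper are all assumptions made for this illustration, and the small expression grammar is only sample data.

```python
from typing import NamedTuple


class Grammar(NamedTuple):
    variables: frozenset  # V
    terminals: frozenset  # T
    productions: dict     # P: variable -> list of productions (tuples of symbols)
    start: str            # S


def is_well_formed(g: Grammar) -> bool:
    """Check the basic consistency conditions between V, T, P and S."""
    if g.start not in g.variables:
        return False
    if g.variables & g.terminals:  # V and T must not share symbols
        return False
    symbols = g.variables | g.terminals
    for lhs, alternatives in g.productions.items():
        if lhs not in g.variables:
            return False
        for production in alternatives:
            if any(symbol not in symbols for symbol in production):
                return False
    return True


expr_grammar = Grammar(
    variables=frozenset({"Expr"}),
    terminals=frozenset({"<number>", "+", "*"}),
    productions={
        "Expr": [("<number>",), ("Expr", "+", "Expr"), ("Expr", "*", "Expr")],
    },
    start="Expr",
)
print(is_well_formed(expr_grammar))
```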
### Parse Trees

Given a grammar, it is possible to show the grammatical structure of a program as a tree. The process of finding the productions that produce a given program is called **parsing**.

#### Rules for building a parse tree

* The root of the parse tree is the start symbol.
* Each interior node is a variable in V.
* Each leaf node is either a terminal or ε. If it is ε, then it must be an only child.
* If an interior node is labeled with the variable A and has children X1, X2, ... Xn, then there must be a production A ::= X1 X2 ... Xn
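The four rules can be checked mechanically. The sketch below is one way to do it in Python. The `Node` class, the `is_valid_tree` function, and the convention of storing productions as a dictionary of tuples (with the empty tuple standing for an ε production) are assumptions made for this example.

```python
from dataclasses import dataclass, field

EPSILON = "ε"


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)


def is_valid_tree(node, variables, terminals, productions, start, is_root=True):
    """Return True if the tree rooted at node obeys the four parse tree rules."""
    if is_root and node.label != start:
        return False  # rule 1: the root is the start symbol
    if not node.children:
        # rule 3: a leaf is either a terminal or ε
        return node.label in terminals or node.label == EPSILON
    if node.label not in variables:
        return False  # rule 2: interior nodes are variables
    labels = tuple(child.label for child in node.children)
    if EPSILON in labels and len(labels) != 1:
        return False  # rule 3: ε must be an only child
    rhs = () if labels == (EPSILON,) else labels
    if rhs not in productions.get(node.label, []):
        return False  # rule 4: the children must spell out a production
    return all(
        is_valid_tree(child, variables, terminals, productions, start, False)
        for child in node.children
    )


V = {"Expr"}
T = {"<number>", "+", "*"}
P = {"Expr": [("<number>",), ("Expr", "+", "Expr"), ("Expr", "*", "Expr")]}
tree = Node("Expr", [
    Node("Expr", [Node("<number>")]),
    Node("+"),
    Node("Expr", [Node("<number>")]),
])
print(is_valid_tree(tree, V, T, P, "Expr"))
```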
#### Ambiguous Parse Trees

Consider the following grammar.

```
Expr ::= <number> | Expr '+' Expr | Expr '*' Expr
```

A grammar is termed ambiguous if there exist two different parse trees for the same input string. Demonstrate that the above grammar is ambiguous using the string ```2 + 3 * 4```.

Note that ambiguity is not necessarily a problem by itself. The string ```2 + 3 + 4``` has more than one parse tree, but because + is associative, the normal interpretation of either parse tree produces the same result.

When ambiguity is an issue, one way of resolving it is to structure the grammar so that operators of a common precedence are grouped together in the same rule.

For example:

```
Expr ::= MulOp | Expr "+" MulOp
MulOp ::= Value | MulOp "*" Value
Value ::= <number> | "(" Expr ")"
```
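To make the difference between the two grammars concrete, the following Python sketch compares them on the same input. `all_values` collects the result of every parse tree allowed by the single-rule grammar, and `evaluate` follows the three-rule grammar. Both function names are made up for this example. The input is assumed to be a list of tokens already split on whitespace, and the left recursion in `Expr` and `MulOp` is replaced by loops, which is a common hand-written workaround rather than anything the grammar itself prescribes.

```python
def all_values(tokens):
    """Results of every parse tree under Expr ::= <number> | Expr '+' Expr | Expr '*' Expr.

    Assumes tokens alternate number, operator, number, ... with no parentheses.
    """
    if len(tokens) == 1:
        return {int(tokens[0])}
    values = set()
    for i in range(1, len(tokens), 2):  # operators sit at the odd positions
        op = tokens[i]
        for left in all_values(tokens[:i]):
            for right in all_values(tokens[i + 1:]):
                values.add(left + right if op == "+" else left * right)
    return values


def evaluate(tokens):
    """Evaluate tokens using the layered Expr / MulOp / Value grammar."""
    pos = 0

    def value():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        token = tokens[pos]
        pos += 1
        if token == "(":
            result = expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("expected ')'")
            pos += 1
            return result
        return int(token)

    def mul_op():
        nonlocal pos
        result = value()
        while pos < len(tokens) and tokens[pos] == "*":
            pos += 1
            result *= value()
        return result

    def expr():
        nonlocal pos
        result = mul_op()
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            result += mul_op()
        return result

    result = expr()
    if pos != len(tokens):
        raise ValueError("unexpected token " + tokens[pos])
    return result


print(all_values("2 + 3 * 4".split()))  # one value per distinct interpretation
print(evaluate("2 + 3 * 4".split()))    # the single interpretation of the layered grammar
```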
### Parsing Expression Grammars aka PEGs

A **PEG**, parsing expression grammar, is a formal grammar. It looks much like a context free grammar, but with some differences. Unlike a CFG, a PEG can never be ambiguous. If there is a valid parse tree, it will be the only one.

PEGs have recently become more popular for a number of reasons. One is that there is a very simple rule for resolving ambiguity: the alternatives of a rule are tried in order, and the first one that matches wins. The more traditional CFG based tools like yacc and bison can be difficult to work with when parsing ambiguities arise. Another is that a modern parser built from a PEG runs in time that is linear in the length of the input. When PEGs were first explored, they had a combinatorial complexity that rendered them useless for complex programs.
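The sketch below shows, under several assumptions, how ordered choice and memoization might look in a hand-written PEG-style parser in Python. The rules in the comments are a hypothetical PEG for arithmetic written for this example. They are right-recursive because PEG parsers of this kind cannot handle left recursion directly. `functools.lru_cache` plays the role of the memo table, so each rule is evaluated at most once per input position, which is the idea behind the linear running time.

```python
from functools import lru_cache


def parse(text: str):
    """Return the value of text under a small PEG, or None if it does not match."""

    @lru_cache(maxsize=None)
    def expr(pos):
        # Expr <- MulOp '+' Expr / MulOp   (alternatives are tried in order)
        first = mul_op(pos)
        if first is not None and text[first[1]:first[1] + 1] == "+":
            rest = expr(first[1] + 1)
            if rest is not None:
                return first[0] + rest[0], rest[1]
        return mul_op(pos)  # second alternative, answered from the memo table

    @lru_cache(maxsize=None)
    def mul_op(pos):
        # MulOp <- Value '*' MulOp / Value
        first = value(pos)
        if first is not None and text[first[1]:first[1] + 1] == "*":
            rest = mul_op(first[1] + 1)
            if rest is not None:
                return first[0] * rest[0], rest[1]
        return value(pos)

    @lru_cache(maxsize=None)
    def value(pos):
        # Value <- [0-9]+ / '(' Expr ')'
        end = pos
        while end < len(text) and text[end] in "0123456789":
            end += 1
        if end > pos:
            return int(text[pos:end]), end
        if text[pos:pos + 1] == "(":
            inner = expr(pos + 1)
            if inner is not None and text[inner[1]:inner[1] + 1] == ")":
                return inner[0], inner[1] + 1
        return None

    result = expr(0)
    if result is None or result[1] != len(text):
        return None  # no match, or trailing input left over
    return result[0]


print(parse("2+3*4"))
print(parse("(2+3)*4"))
```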