From 82cb2ebae9a0939bda8221c586efb78aef6e23af Mon Sep 17 00:00:00 2001 From: John Sarkela Date: Thu, 25 Sep 2025 11:59:11 -0400 Subject: [PATCH] Added grammars lecture --- lectures/grammars.md | 65 ++++++++++++++++++++++++++++++++++++++++++++ lectures/index.md | 1 + 2 files changed, 66 insertions(+) create mode 100644 lectures/grammars.md diff --git a/lectures/grammars.md b/lectures/grammars.md new file mode 100644 index 0000000..b68d137 --- /dev/null +++ b/lectures/grammars.md @@ -0,0 +1,65 @@ +## Context Free Grammars +Most modern programming languages specify their grammar using a **context free grammar** (CFG). A **CFG** consists of a finite set of syntactic variables, a special variable known as the start symbol, one or more production rules for each variable, and a set of terminals. + +A grammar rule consists of three parts. The first part is a **variable** on the left hand side. In the middle is the production symbol, **::=**, and on the right hand side is a list of productions. A **production** is a list of variables and terminals. Note that a single variable may have more than one production associated with it. + +Grammars are so important that most languages use a special shorthand for describing context free grammars named **EBNF**. EBNF stands for Extended Backus Naur Form. Backus and Naur were two of the key developers of the Algol-60 programming language and were pioneers in the use of CFGs to specify programming languages. + +In EBNF, the start symbol is the variable defined by the first rule presented. All of the productions associated with a given variable are separated with the '|' character. + +Using a grammar to describe a language gives us a way to describe exactly what constitutes a legal program. More importantly, the grammatical structure of a language gives us clues we need to interpret what a program means. + +## Pushdown Automata +The machine that is equivalent to a CFG is a pushdown automaton. 
A **pushdown automaton** consists of a finite state machine control with an unbounded stack. In this course, we will not have time to define this machine formally. Suffice it to say that a 7-tuple formally defines it. + +![Pushdown automaton](./pushdownautomaton.png "a pushdown automaton") + +A pushdown automaton differs from a finite state machine in that the top of the stack can affect the transition that occurs on input of a letter and the machine can manipulate the stack. + +When a pushdown automaton starts, it pushes a symbol that represents the goal of matching the start production. As input is consumed, it can either push a symbol indicating that a new subgoal needs to be matched, or it may pop the symbol on the top of the stack indicating that the pattern was matched. A pushdown automaton typically uses an empty stack as the final or accepting condition. + +The action of pushing a new subgoal is known as a **shift** operation. The action of popping a subgoal is known as a **reduce** operation. A machine built from an ambiguous grammar may exhibit errors, more commonly called conflicts. These errors fall into two categories, **shift/reduce** and **reduce/reduce**. A shift/reduce error is when the machine could either push a new subgoal or recognize a subgoal. A reduce/reduce error is when the machine cannot distinguish which of two patterns has just been matched. + +One way a parser may resolve ambiguities is by looking ahead in the input. Many parsers use one token of lookahead to help disambiguate what action should occur next. + +The **language of a PDA** is the set of all input strings that leave the machine in a final condition. + +## Formal Definition of a CFG +A CFG is defined by a four-tuple. + +**G = (V, T, P, S)** +* V is the set of syntactic variables +* T is the set of terminals +* P is the set of productions +* S is the start symbol +### Parse Trees +Given a grammar, it is possible to show the grammatical structure of a program as a tree. 
This process of finding the productions that produce a given program is called parsing. + +#### Rules for building a parse tree + +* The root of the parse tree is the start symbol. +* Each interior node is a variable in V. +* Each leaf node is either a terminal or ε. If it is ε, then it must be an only child. +* If an interior node is labeled with the variable A and has children X1, X2, ... Xn, then there must be a production A ::= X1, X2, ... Xn +#### Ambiguous Parse Trees +Consider the following grammar. +``` +Expr ::= | Expr '+' Expr | Expr '*' Expr +``` +A grammar is termed ambiguous if there exist two different parse trees for the same input string. Demonstrate that the above grammar is ambiguous using the string ```2 + 3 * 4```. + +Note that ambiguity is not, by itself, necessarily a problem. The string ```2 + 3 + 4``` has an ambiguous parse, but because the semantics of + are associative, the normal interpretation of either parse tree would produce the same result. + +When ambiguity is an issue, one way of resolving the ambiguity is by structuring the grammar to group operators of a common precedence together in the same rule. +For example: +``` +Expr ::= MulOp | Expr "+" MulOp +MulOp ::= Value | MulOp "*" Value +Value ::= | "(" Expr ")" +``` + +### Parsing Expression Grammars aka PEGs +A **PEG**, parsing expression grammar, is a formal grammar. It looks much like a context free grammar, but with some differences. Unlike a CFG, a PEG can never be ambiguous. If there is a valid parse tree, it will be the only one. + +PEGs have recently become more popular for a number of reasons. One is that there is a very simple rule for resolving ambiguity: when several alternatives could match, the first one listed wins. The more traditional CFG based tools like yacc and bison can be difficult to use when parsing ambiguities arise. Another is that a modern parser built from a PEG runs in time linear in the length of the input. 
When PEGs were first explored, they had a combinatorial complexity that rendered them useless for complex programs. + diff --git a/lectures/index.md b/lectures/index.md index b6eb8ac..1658650 100644 --- a/lectures/index.md +++ b/lectures/index.md @@ -4,3 +4,4 @@ * [all about names](./names.md) * [lambda calculus](./lambda.md) * [state machines and regular expression](./stateMachinesAndRegularExprs.md) +* [grammars](./grammars.md)