Language Description: Syntax Structure

Language Descriptions

  1. Syntax:  What is a legal program?

    1. Example:  A simple English sentence is a noun phrase followed by a verb phrase.

    2. Generally described by a formal grammar.

  2. Semantics:  What does the program mean?

    1. Example:  A cat is an annoying furry mammal.

    2. Generally does not have a good formal definition; in practice it is often pinned down by a suite of test programs.

Expressions

  1. Prefix:  Every operator precedes its operands.

    1. No complex rules of precedence, no need for parentheses.

    2. Example:  * 30 20 = 600

    3. Example:  * + - 6 4 2 5 = 20

    4. Alternative (Lisp-style, fully parenthesized) syntax:  (+ 4 (* 3 2)) = 10

  2. Infix:  Every (binary) operator sits between its two operands.

    1. Needs precedence rules ("My Dear Aunt Sally": multiply and divide before add and subtract).

    2. Needs associativity rules.

    3. The "normal" way of doing things.

    4. Example:  1 + 2 * 3 - 4 = 3

  3. Postfix:  Every operator follows its operands.

    1. No complex rules of precedence, no need for parentheses.

    2. Very easy to evaluate using a stack.

    3. Example:  2 20 4 5 + - * = 22

  4. Precedence:  Which operation to do first

    1. Traditionally * and / come before + and -, but where do % and casts fit?

    2. Can be overridden with parentheses.

  5. Associativity

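    1. For operators of the same precedence, do you group from the left or the right? (Is 4-2-1 equal to 1 or 3?)

    2. Most operators group left to right, but exponentiation groups right to left.

    3. Smalltalk has no precedence among binary operators and evaluates them strictly left to right, so 2 + 3 * 4 is 20 (ick!)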

  6. Arity:  How many operands does the operator take?

    1. One:  unary minus, sqrt, casting

    2. Two:  + - * /

    3. Three:  e.g. C's conditional operator, a ? b : c

  7. Abstract Syntax Trees

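    1. These are trees (a special kind of directed graph) that describe the structure of an expression.

    2. Each interior node is an operation; each leaf is an operand (a constant or a variable).

    3. They make expressions easy to manipulate:

      1. For commutative operators (such as + and *), the two children can be swapped.

      2. Constant subexpressions can be evaluated ahead of time (constant folding).

      3. Common subexpressions can be computed once and shared.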
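The notations and AST ideas above can be sketched in Python. This is a minimal illustration, assuming space-separated tokens, binary operators only, and operands that are either integers or variable names; Node, parse_prefix, evaluate, fold, and eval_postfix are hypothetical names chosen for this example. The prefix parser builds an AST by recursing once per operand, while the postfix evaluator needs nothing more than a stack.

```python
from dataclasses import dataclass
from typing import Optional, Union

# Binary operators used in the examples above.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
}


@dataclass
class Node:
    """Interior AST node: an operator applied to a left and a right subtree."""
    op: str
    left: "Expr"
    right: "Expr"


# A leaf is an integer constant or a variable name.
Expr = Union[int, str, Node]


def parse_prefix(text: str) -> Expr:
    """Build an AST from space-separated prefix tokens, e.g. '* + - 6 4 2 5'."""
    tokens = iter(text.split())

    def parse() -> Expr:
        tok = next(tokens)
        if tok in OPS:
            left = parse()   # the first operand follows the operator
            right = parse()  # then the second operand
            return Node(tok, left, right)
        return int(tok) if tok.isdigit() else tok

    return parse()


def evaluate(e: Expr, env: Optional[dict] = None):
    """Evaluate an AST; variable leaves are looked up in env."""
    if isinstance(e, Node):
        return OPS[e.op](evaluate(e.left, env), evaluate(e.right, env))
    if isinstance(e, str):
        return (env or {})[e]
    return e


def fold(e: Expr) -> Expr:
    """Constant folding: replace every all-constant subtree by its value."""
    if not isinstance(e, Node):
        return e
    left, right = fold(e.left), fold(e.right)
    # Division is left unfolded so that folded results stay integers.
    if isinstance(left, int) and isinstance(right, int) and e.op != "/":
        return OPS[e.op](left, right)
    return Node(e.op, left, right)


def eval_postfix(text: str):
    """Evaluate space-separated postfix tokens with an explicit stack."""
    stack = []
    for tok in text.split():
        if tok in OPS:
            right = stack.pop()  # the right operand was pushed last
            left = stack.pop()
            stack.append(OPS[tok](left, right))
        else:
            stack.append(int(tok))
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]
```

For example, evaluate(parse_prefix("* + - 6 4 2 5")) gives 20 and eval_postfix("2 20 4 5 + - *") gives 22, matching the examples above. fold(parse_prefix("* x + 3 2")) collapses the constant subtree + 3 2 to 5 and leaves the variable x alone.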

Lexical Analysis

  1. Lexical analysis is the art of grouping characters into meaningful words (tokens).

  2. Definition:  Keyword:  A sequence of characters with a predefined meaning in the language, such as 'if' or 'while'.

  3. Definition:  Reserved words are keywords that cannot be used as names (of variables, procedures, etc.).

  4. Example:  Recognize the characters w h i l e as the keyword 'while'.

  5. Example:  Recognize >= as the single token 'greater-than-or-equal-to'.

  6. Typically, tokens are separated by white space.  However, main(int argc, char **argv) has 10 tokens but only 3 spaces, so a lexer cannot simply split on white space.
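A minimal lexer sketch in Python, assuming a small C-like token set; the KEYWORDS set, the token categories, and tokenize are hypothetical choices for this example. It tries two-character operators before single characters so that >= becomes one token, and it treats white space as something to skip rather than as the only boundary between tokens.

```python
import re

# Hypothetical keyword set for a small C-like language.
KEYWORDS = {"if", "then", "else", "while", "int", "char"}

# Alternatives are tried in order: ">=" and friends come before the
# single-character class, so ">=" is one token, not ">" then "=".
TOKEN_RE = re.compile(r"""
    (?P<SPACE>\s+)
  | (?P<NAME>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<NUMBER>\d+(?:\.\d+)?)
  | (?P<OP>>=|<=|==|!=|[-+*/%<>=(),;{}\[\]])
""", re.VERBOSE)


def tokenize(text):
    """Split text into (kind, lexeme) pairs, skipping white space."""
    pos = 0
    tokens = []
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "SPACE":
            continue
        if kind == "NAME" and lexeme in KEYWORDS:
            kind = "KEYWORD"  # a word the language definition reserves
        tokens.append((kind, lexeme))
    return tokens
```

For instance, tokenize("main(int argc, char **argv)") returns 10 tokens, with int and char tagged as KEYWORD and each * as its own token.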

Grammars and BNF

  1. Grammars have four parts

    1. A terminal is a basic unit of the grammar that is not broken down any further, like 'if', 'then' and 'a real number'.

    2. A token (also called a nonterminal) is a unit of the grammar defined by one or more rules.

    3. A rule (or production) defines a token as a sequence of terminals and tokens.

    4. The starting token is the token that represents the whole input, the top of the parse tree, and the place to start in parsing.

  2. Grammars are generally written in Backus-Naur form (BNF)

    1. Tokens are enclosed in <>

    2. Terminals are written literally, optionally with quotes around them.

    3. Pipes "|" mean 'or'.

    4. The left- and right-hand sides of a rule are separated by '::=' (or an arrow when written by hand).

  3. Example grammar for real numbers
       <real>     ::= <int> . <fraction>
       <int>      ::= <digit> | <int><digit>
       <fraction> ::= <digit> | <digit><fraction>
       <digit>    ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
    But does this grammar parse 3.14?  Does it parse 3?  Does it parse 0.3?

  4. Example grammar for if
      <S> ::= if <expression> then <S>
      <S> ::= if <expression> then <S> else <S>
    But how do you parse if E1 then if E2 then S1 else S2?  Which if does the else correspond to?

  5. A grammar is ambiguous if more than one parse tree is possible for the same input.

    1. Generally this leads to more than one meaning for the same input.

    2. Language designers generally either

      1. Change the grammar to prevent this (every if statement ends with a 'fi').

      2. Add a disambiguating rule outside the grammar (every else corresponds to the closest unmatched if).
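The two example grammars can be turned into small recursive-descent recognizers. This Python sketch makes a few assumptions: <int> is left-recursive, so a direct recursive translation would recurse forever, and since both <int> and <fraction> generate "one or more digits" each is matched with a loop instead; <expression> is simplified to a single token such as E1. The function names are hypothetical. The parser applies the "closest if" rule simply by letting the innermost if look for an else first.

```python
DIGITS = "0123456789"


def match_digits(s: str, pos: int):
    """Consume one or more <digit>s starting at pos; return the end position or None."""
    end = pos
    while end < len(s) and s[end] in DIGITS:
        end += 1
    return end if end > pos else None


def is_real(s: str) -> bool:
    """Recognize <real> ::= <int> . <fraction> over the whole string."""
    pos = match_digits(s, 0)                      # <int>
    if pos is None or pos >= len(s) or s[pos] != ".":
        return False                              # the terminal '.'
    return match_digits(s, pos + 1) == len(s)     # <fraction>, then end of input


def parse_stmt(tokens, i=0):
    """Parse <S> starting at tokens[i]; return (tree, next index).

    An 'else' is consumed by the innermost if that is still open,
    i.e. every else goes with the closest unmatched if.
    """
    if tokens[i] == "if":
        cond = tokens[i + 1]  # <expression> simplified to one token
        if tokens[i + 2] != "then":
            raise ValueError(f"expected 'then' at token {i + 2}")
        then_branch, i = parse_stmt(tokens, i + 3)
        if i < len(tokens) and tokens[i] == "else":
            else_branch, i = parse_stmt(tokens, i + 1)
            return ("if", cond, then_branch, else_branch), i
        return ("if", cond, then_branch), i
    return tokens[i], i + 1  # any other token is a simple statement


def parse(text: str):
    """Parse a whole if-statement written as space-separated tokens."""
    tokens = text.split()
    tree, end = parse_stmt(tokens)
    if end != len(tokens):
        raise ValueError("unexpected tokens after statement")
    return tree
```

Try is_real on 3.14, 3, and 0.3 to check your answers to the real-number questions. parse("if E1 then if E2 then S1 else S2") attaches the else to the inner if.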



The Power of Grammars

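  1. Chomsky defined four grammar levels: unrestricted (type 0), context-sensitive (type 1), context-free (type 2), and regular (type 3). See http://en.wikipedia.org/wiki/Chomsky_hierarchy.

  2. The most powerful (unrestricted) grammars can describe any language a Turing machine can recognize, i.e. anything “computable”. The least powerful (regular) grammars describe useful languages that can be recognized in linear time and constant memory.

  3. The languages generated by context-free grammars (such as those written in BNF) are a strict superset of the languages that can be described by regular expressions.  For example, balanced parentheses can be described by a grammar but not by a regular expression.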
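As a sketch of what a grammar can do that a regular expression cannot: the (hypothetical) grammar <S> ::= ( <S> ) <S> | <empty> describes balanced parentheses, and it translates directly into a recursive-descent recognizer. The recursion acts as the unbounded memory that a finite automaton lacks. This Python version is illustrative only; the function name is an assumption, and very deep nesting will hit Python's recursion limit.

```python
def balanced(s: str) -> bool:
    """Recognize the grammar  <S> ::= ( <S> ) <S> | <empty>  by recursive descent."""
    pos = 0

    def parse_s() -> bool:
        nonlocal pos
        if pos < len(s) and s[pos] == "(":   # choose '( <S> ) <S>'
            pos += 1
            if not parse_s():
                return False
            if pos >= len(s) or s[pos] != ")":
                return False                  # missing ')'
            pos += 1
            return parse_s()
        return True                           # choose <empty>

    return parse_s() and pos == len(s)      # the whole input must be consumed
```

The choice between the two alternatives is made by looking at a single character: a '(' selects the first rule, anything else selects the empty one.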

Definition of a Computer Language

  1. It's a formal grammar describing the syntax, a set of operators