Parsing Prepared by Manuel E. Bermúdez, Ph.D. Associate Professor University of Florida Programming Language Concepts Lecture 6

Upload: kuper

Post on 02-Feb-2016



Page 1: Parsing

Parsing

Prepared by

Manuel E. Bermúdez, Ph.D.
Associate Professor
University of Florida

Programming Language Concepts
Lecture 6

Page 2: Parsing

Context-Free Grammars

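• Definition: A context-free grammar (CFG) is a quadruple G = (Φ, Σ, P, S), where Φ is the set of nonterminals, Σ the set of terminals, P the set of productions, and S ∈ Φ the start symbol. All productions are of the form A → α, for A ∈ Φ and α ∈ (Φ ∪ Σ)*.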

• Re-writing using grammar rules:

– βAγ => βαγ if A → α (derivation).

Page 3: Parsing

String Derivations

• Left-most derivation: At each step, the left-most nonterminal is re-written.

• Right-most derivation: At each step, the right-most nonterminal is re-written.
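The two derivation orders can be sketched in a few lines of Python. The toy grammar (S → A B, A → a, B → b), the helper names, and the convention that uppercase symbols are nonterminals are all assumptions made for illustration; none of them come from the slides.

```python
# Hypothetical toy grammar: S -> A B, A -> a, B -> b.
# Convention assumed here: uppercase symbols are nonterminals.

def rewrite(form, lhs, rhs, leftmost=True):
    """Rewrite the left-most (or right-most) nonterminal of a sentential form."""
    positions = [k for k, sym in enumerate(form) if sym.isupper()]
    k = positions[0] if leftmost else positions[-1]
    if form[k] != lhs:
        raise ValueError(f"expected to rewrite {lhs}, found {form[k]}")
    return form[:k] + rhs + form[k + 1:]

def derive(steps, leftmost):
    """Apply productions (lhs, rhs) in order; return every sentential form."""
    form = ["S"]
    history = [" ".join(form)]
    for lhs, rhs in steps:
        form = rewrite(form, lhs, rhs, leftmost)
        history.append(" ".join(form))
    return history

left_steps = [("S", ["A", "B"]), ("A", ["a"]), ("B", ["b"])]
right_steps = [("S", ["A", "B"]), ("B", ["b"]), ("A", ["a"])]

print(" => ".join(derive(left_steps, leftmost=True)))    # S => A B => a B => a b
print(" => ".join(derive(right_steps, leftmost=False)))  # S => A B => A b => a b
```

Both derivations use the same three productions and end in the same sentence; only the order of the re-writes differs.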

Page 4: Parsing
Page 5: Parsing

Derivation Trees

Derivation trees: Describe re-writes, independently of the order (left-most or right-most).

• Each tree branch matches a production rule in the grammar.

Page 6: Parsing
Page 7: Parsing

Derivation Trees

Notes:
1) Leaves are terminals.
2) Bottom contour is the sentence.
3) Left recursion causes left branching.
4) Right recursion causes right branching.

Page 8: Parsing

Goal of Parsing

• Examine input string, determine whether it's legal.

• Equivalent to building derivation tree.

• Added benefit: tree embodies syntactic structure of input.

• Therefore, tree should be unique.

Page 9: Parsing

Ambiguous Grammars

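• Definition: A CFG is ambiguous if there exist two different right-most derivations (or, equivalently, two different left-most derivations) for some sentence z. One left-most and one right-most derivation of the same sentence do not count as two different derivations.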

• (Equivalent) Definition: A CFG is ambiguous if there exist two different derivation trees for some sentence z.

Page 10: Parsing

Ambiguous Grammars

Classic ambiguities:

– Simultaneous left/right recursion:
    E → E + E
      → i

– Dangling else problem:
    S → if E then S
      → if E then S else S
      →
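The first classic ambiguity can be checked mechanically. The sketch below counts the derivation trees that E → E + E | i admits for a token sequence; the function name and the interval-based counting scheme are illustrative assumptions, not part of the lecture.

```python
from functools import lru_cache

def count_trees(tokens):
    """Count derivation trees for `tokens` under E -> E + E | i."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of trees deriving tokens[lo:hi] from E.
        n = 1 if hi - lo == 1 and tokens[lo] == "i" else 0
        # Try every '+' in the interior as the root of E -> E + E.
        for k in range(lo + 1, hi - 1):
            if tokens[k] == "+":
                n += count(lo, k) * count(k + 1, hi)
        return n

    return count(0, len(tokens))

print(count_trees("i+i"))    # 1
print(count_trees("i+i+i"))  # 2
```

Two trees for i+i+i means two distinct left-most derivations, so the grammar is ambiguous: one tree groups the sentence as (i+i)+i, the other as i+(i+i).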

Page 11: Parsing
Page 12: Parsing

Operator Precedence and Associativity

• Let’s build a CFG for expressions consisting of:

– elementary identifier i.
– + and - (binary ops) have lowest precedence, and are left associative.
– * and / (binary ops) have middle precedence, and are right associative.
– + and - (unary ops) have highest precedence, and are right associative.

Page 13: Parsing

Corresponding Grammar for Expressions

E → E + T        E consists of T's,
  → E - T        separated by -'s and +'s
  → T            (lowest precedence).
T → F * T        T consists of F's,
  → F / T        separated by *'s and /'s
  → F            (next precedence).
F → - F          F consists of a single P,
  → + F          preceded by +'s and -'s
  → P            (next precedence).
P → '(' E ')'    P consists of a parenthesized E,
  → i            or a single i (highest precedence).

Page 14: Parsing

Operator Precedence and Associativity

• Operator precedence:
– The lower in the grammar, the higher the precedence.

• Operator associativity:
– Tie breaker for precedence.
– Left recursion in the grammar means
  • left associativity of the operator,
  • left branching in the tree.
– Right recursion in the grammar means
  • right associativity of the operator,
  • right branching in the tree.

Page 15: Parsing

Building Derivation Trees

Sample Input : - + i - i * ( i + i ) / i + i

(Human) derivation tree construction:

• Bottom-up.

• On each pass, scan entire expression, process operators with highest precedence (parentheses are highest).

• Lowest precedence operators are last, at the top of tree.

Page 16: Parsing
Page 17: Parsing

Abstract Syntax Trees

• AST is a condensed version of the derivation tree.

• No noise (intermediate nodes).

• String-to-tree transduction grammar:
– rules of the form A → ω => 's'.

• Build 's' tree node, with one child per tree from each nonterminal in ω.

Page 18: Parsing

Example

E → E + T        => +
  → E - T        => -
  → T
T → F * T        => *
  → F / T        => /
  → F
F → - F          => neg
  → + F          => +
  → P
P → '(' E ')'
  → i            => i

Page 19: Parsing

Sample Input : - + i - i * ( i + i ) / i + i
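The AST for this input can be computed with a small parser. The sketch below is an assumption-laden illustration, not the lecture's method: it is a hand-written recursive-descent parser, it replaces the left recursion in E with a loop (which still produces left branching), it expects space-separated tokens, and it represents tree nodes as nested tuples of the form (name, child, ...).

```python
def parse(text):
    """Build an AST (nested tuples) for the expression grammar E, T, F, P."""
    tokens = text.split()
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"unexpected {tok!r} at token {pos}")
        pos += 1
        return tok

    def E():
        # E -> E + T | E - T | T : left recursion as a loop, left branching.
        node = T()
        while peek() in ("+", "-"):
            op = take()
            node = (op, node, T())
        return node

    def T():
        # T -> F * T | F / T | F : right recursion, right branching.
        left = F()
        if peek() in ("*", "/"):
            op = take()
            return (op, left, T())
        return left

    def F():
        # F -> - F => neg | + F => + | P
        if peek() == "-":
            take()
            return ("neg", F())
        if peek() == "+":
            take()
            return ("+", F())
        return P()

    def P():
        # P -> '(' E ')' | i : parentheses leave no node in the AST.
        if peek() == "(":
            take("(")
            node = E()
            take(")")
            return node
        take("i")
        return "i"

    tree = E()
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()!r} at token {pos}")
    return tree

ast = parse("- + i - i * ( i + i ) / i + i")
print(ast)
```

The binary + applied last sits at the root, the unary operators sit closest to the leaves, and the parenthesized i + i appears as an ordinary + node under /.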

Page 20: Parsing

String-to-Tree Transduction

• We transduce from vocabulary of input symbols, to vocabulary of tree node names.

• Could eliminate construction of unary + node, anticipating semantics.

F → - F        => neg
  → + F        // no more unary + node
  → P

Page 21: Parsing

The Game of Syntactic Dominoes

• The grammar:

E → E+T      T → P*T      P → (E)
  → T          → P          → i

• The playing pieces: An arbitrary supply of each piece (one per grammar rule).

• The game board:
  • Start domino at the top.
  • Bottom dominoes are the "input."

Page 22: Parsing
Page 23: Parsing

The Game of Syntactic Dominoes

• Game rules:
– Add game pieces to the board.
– Match the flat parts and the symbols.
– Lines are infinitely elastic.

• Object of the game:
– Connect start domino with the input dominoes.
– Leave no unmatched flat parts.

Page 24: Parsing

Parsing Strategies

• Same as for the game of syntactic dominoes.

– “Top-down” parsing: start at the start symbol, work toward the input string.

– “Bottom-up” parsing: start at the input string, work towards the goal symbol.

• In either strategy, the input can be processed left-to-right or right-to-left.

Page 25: Parsing

Top-Down Parsing

• Attempt a left-most derivation, by predicting the re-write that will match the remaining input.

• Use a string (a stack, really) from which the input can be derived.

Page 26: Parsing

Top-Down Parsing

Start with S on the stack. At every step, two alternatives:

1) α (the stack) begins with a terminal t. Match t against the first input symbol.

2) α begins with a nonterminal A. Consult an OPF (Omniscient Parsing Function) to determine which production for A would lead to a match with the first symbol of the input.

The OPF does the “predicting” in such a predictive parser.

Page 27: Parsing
Page 28: Parsing

Classical Top-Down Parsing Algorithm

Push (Stack, S);
while not Empty (Stack) do
  if Top(Stack) ∈ Σ                     { top of stack is a terminal }
    then if Top(Stack) = Head(input)
           then input := tail(input)
                Pop(Stack)
           else error (Stack, input)
    else P := OPF (Stack, input)
         Push (Pop(Stack), RHS(P))
od
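The algorithm above can be sketched in Python as follows. Several things here are assumptions made for illustration: the OPF is modeled as a lookup table keyed by (nonterminal, next input symbol), the end of input is marked with "$", and the table shown is for the small grammar S → A, A → bAd | ε.

```python
def parse(tokens, table, start, terminals, end="$"):
    """Return the productions used (a left-most derivation) or raise SyntaxError."""
    stack = [start]               # top of the stack is the end of the list
    rest = list(tokens) + [end]   # remaining input plus the assumed end marker
    used = []
    while stack:
        top = stack[-1]
        if top in terminals:
            # Terminal on top: it must match the first input symbol.
            if top != rest[0]:
                raise SyntaxError(f"expected {top!r}, found {rest[0]!r}")
            stack.pop()
            rest.pop(0)
        else:
            # Nonterminal on top: the table plays the role of the OPF.
            rhs = table.get((top, rest[0]))
            if rhs is None:
                raise SyntaxError(f"no production for {top!r} on {rest[0]!r}")
            stack.pop()
            stack.extend(reversed(rhs))   # first RHS symbol ends up on top
            used.append((top, rhs))
    if rest != [end]:
        raise SyntaxError(f"input left over: {rest[:-1]}")
    return used

# Hypothetical table for S -> A, A -> b A d | (empty).
TABLE = {
    ("S", "b"): ["A"],
    ("S", "$"): ["A"],
    ("A", "b"): ["b", "A", "d"],
    ("A", "d"): [],
    ("A", "$"): [],
}

for lhs, rhs in parse("bbdd", TABLE, "S", {"b", "d"}):
    print(lhs, "->", " ".join(rhs) or "(empty)")
```

Popping the nonterminal and pushing the right-hand side in reverse corresponds to Push (Pop(Stack), RHS(P)) in the pseudocode; the extra check after the loop rejects input that remains once the stack is empty.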

Page 29: Parsing
Page 30: Parsing

Top-Down Parsing

• Most parsing methods impose bounds on the amount of stack lookback and input lookahead. For programming languages, a common choice is (1,1).

• We must define OPF (A,t), where A is the top element of the stack, and t is the first symbol on the input.

• Storage requirements: O(n²), where n is the size of the grammar vocabulary (a few hundred).

Page 31: Parsing

LL(1) Grammars

Definition: A CFG G is LL(1) (Left-to-right, Left-most, one-symbol lookahead) iff for all nonterminals A, and for all productions A → α, A → β with α ≠ β,

Select (A → α) ∩ Select (A → β) = ∅

• Previous example: Grammar is not LL(1).

• More later on why, and what to do about it.

Page 32: Parsing

Example:

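Production       Select set
S → A            {b, ⊥}
A → bAd          {b}
  → ε            {d, ⊥}

The Select sets for the two A productions are disjoint!

Grammar is LL(1)!

Parse table (⊥ marks the end of the input):

        d          b            ⊥

S                  S → A        S → A

A       A → ε      A → bAd      A → ε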

(At most) one production per entry.
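The Select sets in this example can be recomputed with a short fixed-point computation. This is a sketch under stated assumptions: Select(A → α) is taken to be First(α), plus Follow(A) when α can derive the empty string; the end-of-input marker is written "⊥"; and the function names and grammar encoding (a dict from nonterminal to a list of right-hand-side tuples) are invented for illustration.

```python
END = "⊥"  # assumed end-of-input marker

def select_sets(grammar, start):
    """grammar maps each nonterminal to a list of right-hand sides (tuples)."""
    nullable = set()
    first = {a: set() for a in grammar}
    follow = {a: set() for a in grammar}
    follow[start].add(END)

    def first_of(seq):
        """Return (First set of seq, True if seq can derive the empty string)."""
        out = set()
        for x in seq:
            if x not in grammar:        # x is a terminal
                out.add(x)
                return out, False
            out |= first[x]
            if x not in nullable:
                return out, False
        return out, True

    changed = True
    while changed:                      # iterate until nothing grows
        changed = False
        for a, rhss in grammar.items():
            for rhs in rhss:
                f, empty = first_of(rhs)
                if not f <= first[a]:
                    first[a] |= f
                    changed = True
                if empty and a not in nullable:
                    nullable.add(a)
                    changed = True
                for k, x in enumerate(rhs):
                    if x in grammar:    # update Follow of each nonterminal in rhs
                        f, empty = first_of(rhs[k + 1:])
                        new = f | (follow[a] if empty else set())
                        if not new <= follow[x]:
                            follow[x] |= new
                            changed = True

    select = {}
    for a, rhss in grammar.items():
        for rhs in rhss:
            f, empty = first_of(rhs)
            select[(a, rhs)] = f | (follow[a] if empty else set())
    return select

def is_ll1(select):
    """True if the Select sets of each nonterminal's productions are disjoint."""
    seen = {}
    for (a, _), s in select.items():
        if seen.setdefault(a, set()) & s:
            return False
        seen[a] |= s
    return True

example = {"S": [("A",)], "A": [("b", "A", "d"), ()]}
sel = select_sets(example, "S")
print(sel, is_ll1(sel))

left_recursive = {"E": [("E", "+", "T"), ("T",)], "T": [("i",)]}
print(is_ll1(select_sets(left_recursive, "E")))  # False
```

The second grammar is a cut-down, hypothetical version of the left-recursive expression grammar: both E productions have i in their Select sets, so the sets overlap and the grammar is not LL(1).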
