compiler principles fall 2014-2015 compiler principles lecture 3: parsing part 2 roman manevich...

Post on 05-Jan-2016

242 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Fall 2014-2015 Compiler PrinciplesLecture 3: Parsing part 2

Roman ManevichBen-Gurion University

2

Tentative syllabusFrontEnd

Scanning

Top-downParsing (LL)

Bottom-upParsing (LR)

AttributeGrammars

IntermediateRepresentation

Lowering

Optimizations

Local Optimizations

DataflowAnalysis

LoopOptimizations

Code Generation

RegisterAllocation

InstructionSelection

mid-term exam

3

Previously• Role of syntax analysis

• Context-free grammars refresher

• Top-down (predictive) parsing– Recursive descent

4

Functions for nonterminalsE LIT | (E OP E) | not ELIT true | falseOP and | or | xor

E() {if (current {TRUE, FALSE}) LIT();else if (current == LPAREN) match(LPARENT); E(); OP(); E(); match(RPAREN);else if (current == NOT) match(NOT); E();else error;

}

LIT() {if (current == TRUE) match(TRUE);else if (current == FALSE) match(FALSE);else error;

}

OP() {if (current == AND) match(AND);else if (current == OR) match(OR);else if (current == XOR) match(XOR);else error;

}

5

Technical challengeswith recursive descent

• With lookahead 1, the function for indexed_elem will never be tried… – What happens for input of the form ID[expr]

term ID | indexed_elemindexed_elem ID [ expr ]

Recursive descent: problem 1

6

Recursive descent: problem 2

int S() { return A() && match(token(‘a’)) && match(token(‘b’));}int A() { return match(token(‘a’)) || 1;}

S A a bA a |

What happens for input “ab”? What happens if you flip order of alternatives and try “aab”?

7

Recursive descent: problem 3

int E() { return E() && match(token(‘-’)) && term();}

E E - term | term

What happens when we execute this procedure? Recursive descent parsers cannot handle left-recursive grammars

p. 127

8

9

Agenda• Predicting productions via

FIRST/FOLLOW/NULLABLE sets

• Handling conflicts

• LL(k) via pushdown automata

10

How do we predict?

• How can we decide which production of ‘E’ to take?

E LIT | (E OP E) | not ELIT true | falseOP and | or | xor

FIRST sets

• For a nonterminal A, FIRST(A) is the set of terminals that can start in a sentence derived from A– Formally: FIRST(A) = {t | A * t ω}

• For a sentential form α, FIRST(α) is the set of terminals that can start in a sentence derived from α– Formally: FIRST(α) = {t | α * t ω}

11

FIRST sets example

• FIRST(E) = …?• FIRST(LIT) = …?• FIRST(OP) = …?

12

E LIT | (E OP E) | not ELIT true | falseOP and | or | xor

FIRST sets example

• FIRST(E) = FIRST(LIT) FIRST(( E OP E )) FIRST(not E)• FIRST(LIT) = { true, false }• FIRST(OP) = {and, or, xor}

• A set of recursive equations• How do we solve them?

13

E LIT | (E OP E) | not ELIT true | falseOP and | or | xor

14

Computing FIRST sets

• This is known as a fixed-point algorithm• We will see such iterative methods later in the course

and learn to reason about them

Assume no null productions (A )1. Initially, for all nonterminals A, set

FIRST(A) = { t | A t ω for some ω }2. Repeat the following until no changes occur:

for each nonterminal A for each production A α1 | … | αk

FIRST(A) = FIRST(α1) … FIRST(∪ ∪ αk)

15

Exercise: compute FIRST

STMT if EXPR then STMT | while EXPR do STMT | EXPR ;EXPR TERM -> id | zero? TERM | not EXPR | ++ id | -- idTERM id | constant

TERM EXPR STMT

16

1. Initialization

STMT if EXPR then STMT | while EXPR do STMT | EXPR ;EXPR TERM -> id | zero? TERM | not EXPR | ++ id | -- idTERM id | constant

TERM EXPR STMT

idconstant

zero?Not++--

ifwhile

17

2. Iterate 1

STMT if EXPR then STMT | while EXPR do STMT | EXPR ;EXPR TERM -> id | zero? TERM | not EXPR | ++ id | -- idTERM id | constant

TERM EXPR STMT

idconstant

zero?Not++--

ifwhile

zero?Not++--

18

2. Iterate 2

STMT if EXPR then STMT | while EXPR do STMT | EXPR ;EXPR TERM -> id | zero? TERM | not EXPR | ++ id | -- idTERM id | constant

TERM EXPR STMT

idconstant

zero?Not++--

ifwhile

idconstant

zero?Not++--

19

2. Iterate 3 – fixed-point

STMT if EXPR then STMT | while EXPR do STMT | EXPR ;EXPR TERM -> id | zero? TERM | not EXPR | ++ id | -- idTERM id | constant

TERM EXPR STMT

idconstant

zero?Not++--

ifwhile

idconstant

zero?Not++--

idconstant

20

Reasoning about the algorithm

• Is the algorithm correct?• Does it terminate? (complexity)

Assume no null productions (A )

1. Initially, for all nonterminals A, setFIRST(A) = { t | A t ω for some ω }

2. Repeat the following until no changes occur:for each nonterminal A for each production A α1 | … | αk

FIRST(A) = FIRST(α1) … FIRST(∪ ∪ αk)

21

Reasoning about the algorithm

• Termination:

• Correctness:

22

LL(1) Parsing of grammars without epsilon productions

Using FIRST sets

• Assume G has no epsilon productions and for every non-terminal X and every pair of productions X and X we have thatFIRST() FIRST() = {}

• No intersection between FIRST sets => can always pick a single rule

23

Using FIRST sets

• In our Boolean expressions example– FIRST( LIT ) = { true, false }– FIRST( ( E OP E ) ) = { ‘(‘ }– FIRST( not E ) = { not }

• If the FIRST sets intersect, may need longer lookahead– LL(k) = class of grammars in which production rule

can be determined using a lookahead of k tokens– LL(1) is an important and useful class

• What if there are epsilon productions?24

25

Extending LL(1) Parsingfor epsilon productions

26

FIRST, FOLLOW, NULLABLE sets

• For each non-terminal X• FIRST(X) = set of terminals that can start in a

sentence derived from X– FIRST(X) = {t | X * t ω}

• NULLABLE(X) if X * • FOLLOW(X) = set of terminals that can follow X

in some derivation– FOLLOW(X) = {t | S * X t }

27

Computing the NULLABLE set

• Lemma: NULLABLE(1 … k) = NULLABLE(1) … NULLABLE(k)

1. Initially NULLABLE(X) = false2. For each non-terminal X if exists a production

X then NULLABLE(X) = true

3. Repeatfor each production Y 1 … k if NULLABLE(1 … k) then

NULLABLE(Y) = trueuntil NULLABLE stabilizes

28

Exercise: compute NULLABLE

S A a bA a | B A B | CC b |

NULLABLE(S) = NULLABLE(A) NULLABLE(a) NULLABLE(b)NULLABLE(A) = NULLABLE(a) NULLABLE()NULLABLE(B) = NULLABLE(A) NULLABLE(B) NULLABLE(C)NULLABLE(C) = NULLABLE(b) NULLABLE()

29

FIRST with epsilon productions

• How do we compute FIRST(1 … k) when epsilon productions are allowed?– FIRST(1 … k) = ?

30

FIRST with epsilon productions

• How do we compute FIRST(1 … k) when epsilon productions are allowed?– FIRST(1 … k) =

if not NULLABLE(1) then FIRST(1) else FIRST(1) FIRST (2 … k)

31

Exercise: compute FIRST

S A c bA a |

NULLABLE(S) = NULLABLE(A) NULLABLE(c) NULLABLE(b)NULLABLE(A) = NULLABLE(a) NULLABLE()

FIRST(S) = FIRST(A) FIRST(cb)FIRST(A) = FIRST(a) FIRST ()

FIRST(S) = FIRST(A) {c}FIRST(A) = FIRST(a)

FOLLOW sets

• if X α Y then FOLLOW(Y) ?

if NULLABLE() or = thenFOLLOW(Y) ?

p. 189

32

FOLLOW sets

• if X α Y then FOLLOW(Y) FIRST()

if NULLABLE() or = thenFOLLOW(Y) ?

p. 189

33

FOLLOW sets

• if X α Y then FOLLOW(Y) FIRST()

if NULLABLE() or = thenFOLLOW(Y) FOLLOW(X)

p. 189

34

35

FOLLOW sets

• if X α Y then FOLLOW(Y) FIRST()

if NULLABLE() or = thenFOLLOW(Y) FOLLOW(X)

• Allows predicting epsilon productions:X when the lookahead token is in FOLLOW(X)

p. 189

S A c bA a |

What should we predict for input “cb”? What should we predict for input “acb”?

36

LL(k) grammars

37

Conflicts

• FIRST-FIRST conflict– X α and X and– If FIRST(α) FIRST(β) {}

• FIRST-FOLLOW conflict– NULLABLE(X)– If FIRST(X) FOLLOW(X) {}

38

LL(1) grammars

• A grammar is in the class LL(1) when it can be derived via:– Top-down derivation– Scanning the input from left to right (L)– Producing the leftmost derivation (L)– With lookahead of one token– For every two productions Aα and Aβ we have

FIRST(α) ∩ FIRST(β) = {}and if NULLABLE(A) then FIRST(A) FOLLOW(A) = {}

• A language is said to be LL(1) when it has an LL(1) grammar

LL(k) grammars

• Generalizes LL(1) for k lookahead tokens• Need to generalize FIRST and FOLLOW for k

lookahead tokens

39

40

Agenda• Predicting productions via

FIRST/FOLLOW/NULLABLE sets

• Handling conflicts

• LL(k) via pushdown automata

41

Handling conflicts

Back to problem 1

• FIRST(term) = { ID }• FIRST(indexed_elem) = { ID }

• FIRST-FIRST conflict

term ID | indexed_elemindexed_elem ID [ expr ]

42

Solution: left factoring• Rewrite the grammar to be in LL(1)

Intuition: just like factoring in algebra: x*y + x*z into x*(y+z)

term ID | indexed_elemindexed_elem ID [ expr ]

term ID after_IDAfter_ID [ expr ] |

43

New grammar is more complex – has epsilon production

S if E then S else S | if E then S | T

Exercise: apply left factoring

44

S if E then S else S | if E then S | T

S if E then S S’ | TS’ else S |

Exercise: apply left factoring

45

Back to problem 2

• FIRST(S) = { a } FOLLOW(S) = { } • FIRST(A) = { a } FOLLOW(A) = { a }

• FIRST-FOLLOW conflict

S A a bA a |

46

Solution: substitution

S A a bA a |

S a a b | a b

Substitute A in S

47

48

Solution: substitution

S A a bA a |

S a a b | a b

Substitute A in S

S a after_A after_A a b | b

Left factoring

Back to problem 3

• Left recursion cannot be handled with a bounded lookahead

• What can we do?

E E - term | term

49

Left recursion removal

• L(G1) = β, βα, βαα, βααα, …

• L(G2) = same

N Nα | β N βN’ N’ αN’ |

G1 G2

E E - term | term E term TE | termTE - term TE |

For our 3rd example:

p. 130

Can be done algorithmically.Problem 1: grammar becomes mangled beyond recognitionProblem 2: grammar may not be LL(1)

50

51

Recap

• Given a grammar• Compute for each non-terminal– NULLABLE– FIRST using NULLABLE– FOLLOW using FIRST and NULLABLE

• Compute FIRST for each sentential form appearing on right-hand side of a production

• Check for conflicts– If exist: attempt to remove conflicts by rewriting

grammar

52

Agenda• Predicting productions via

FIRST/FOLLOW/NULLABLE sets

• Handling conflicts

• LL(k) via pushdown automata

53

LL(1) parsing:the automata approach

By MG (talk · contribs) (Own work) [GFDL (http://www.gnu.org/copyleft/fdl.html) or CC-BY-SA-3.0 (http://creativecommons.org/licenses/by-sa/3.0/)], via Wikimedia Commons

54

Marking “end-of-file”

• Sometimes it will be useful to transform a grammar G with start non-terminal S into a grammar G’ with a new start non-terminal S‘ and a new production rule S’ S $where $ is not part of the set of tokens

• To parse an input α with G’ we change it intoα $

• Simplifies top-down parsing with null productions and LR parsing

55

Another convention

• We will assume that all productions have been consecutively numbered(1) S E $(2) E T(3) E E + T(4) T id (5) T ( E )

LL(1) Parsers

• Recursive Descent– Manual construction

(parsing combinators make this easier, but…)– Uses recursion

• Wanted– A parser that can be generated automatically– Does not use recursion

56

• Pushdown automaton uses– Input stream– Prediction stack– Parsing table

• Nonterminal token production rule• Entry indexed by nonterminal N and token t contains the

alternative of N that must be predicated when current input starts with t

• Essentially, classic conversion from CFG to PDA– The only difference is that we replace non-

deterministic choice with the parsing table

LL(1) parsing via pushdown automata

57

58

Model of non-recursivepredictive parser

Predictive Parsing program

Parsing Table

X

Y

Z

$

Stack

$ b + a

Output

59

LL(1) parsing algorithm• Set stack=S$• While true

– Prediction• When top of stack is nonterminal N• pop N, lookup table[N,t]• If table[N,t] is not empty, push table[N,t] on prediction stack• Otherwise: return syntax error

– Match• When top of prediction stack is a terminal t, must be equal to next input

token t’. If (t = t’), pop t and consume t’.If (t ≠ t’): return syntax error

– End• When prediction stack is empty• If input is empty at that point: return success• Otherwise: return syntax error

( ) not true false and or xor $

E 2 3 1 1LIT 4 5OP 6 7 8

(1) E → LIT(2) E → ( E OP E ) (3) E → not E(4) LIT → true(5) LIT → false(6) OP → and(7) OP → or(8) OP → xor

Non

term

inal

s

Input tokens

Which rule should be used

Example transition table

60

( FIRST(E)

a b c

A A aAb A c

A aAb | caacbb$

Input suffix Stack content Move

aacbb$ A$ predict(A,a) = A aAbaacbb$ aAb$ match(a,a)

acbb$ Ab$ predict(A,a) = A aAbacbb$ aAbb$ match(a,a)

cbb$ Abb$ predict(A,c) = A ccbb$ cbb$ match(c,c)

bb$ bb$ match(b,b)

b$ b$ match(b,b)

$ $ match($,$) – success

Running parser example

61

a b c

A A aAb A c

A aAb | cabcbb$

Input suffix Stack content Move

abcbb$ A$ predict(A,a) = A aAbabcbb$ aAb$ match(a,a)

bcbb$ Ab$ predict(A,b) = ERROR

Illegal input example

62

63

Creating the prediction table

• Let G be an LL(1) grammar• Compute FIRST/NULLABLE/FOLLOW• Check for conflicts• For non-terminal N and token t predict: …

Top-down parsing summary

• Recursive descent• LL(k) grammars• LL(k) parsing with pushdown automata• Cannot deal with left recursion– Left-recursion removal might result with

complicated grammar

64

Next lecture:Bottom-up parsing

top related