Winter 2006-2007 Compiler Construction T3 – Syntax Analysis (Parsing, part 1 of 2) Mooly Sagiv and Roman Manevich School of Computer Science Tel-Aviv University

Today:
Review: grammars, parse trees, ambiguity
Top-down parsing
Bottom-up parsing

Figure (compiler pipeline, with today's topic marked at syntax analysis): IC Language program (.ic) → Lexical Analysis → Syntax Analysis (Parsing) → AST, Symbol Table, etc. → Intermediate Representation (IR) → Code Generation → Executable code (.exe)

Next week: conflict resolution; shift/reduce parsing via JavaCup; (error handling); AST introduction; PA2

Goals of parsing
A programming language has syntactic rules, described by context-free grammars.
Decide whether the program satisfies the syntactic structure: error detection and error recovery.
Simplification: the rules are written over tokens rather than characters.
Build an abstract syntax tree (AST).

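From text to abstract syntax
Program text: 5 + (7 * x)
Lexical analyzer → token stream: num + ( num * id )
Parser, using the grammar: E → id | num | E + E | E * E | ( E )
Output: a parse tree if the input is valid, otherwise a syntax error.
Figure: the parse tree, in bracket notation [E [E num] + [E ( [E [E num] * [E id]] )]], and the corresponding abstract syntax tree: + applied to 5 and to (* applied to 7 and x).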
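The lexical-analysis step in the figure can be sketched in Python as follows. The token specification (decimal literals become `num`, identifiers become `id`, and `+ * ( )` stand for themselves) and the function name `tokenize` are assumptions of this sketch, not the course's actual lexer:

```python
import re

# Hypothetical token specification: decimal literals -> num,
# identifiers -> id, and the characters + * ( ) stand for themselves.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(\S))")


def tokenize(text):
    """Turn program text into a list of (token kind, lexeme) pairs."""
    tokens = []
    text = text.rstrip()
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)   # always matches: one non-space char remains
        number, name, other = m.groups()
        if number is not None:
            tokens.append(("num", number))
        elif name is not None:
            tokens.append(("id", name))
        elif other in "+*()":
            tokens.append((other, other))
        else:
            raise SyntaxError(f"unexpected character {other!r}")
        pos = m.end()
    return tokens


print([kind for kind, _ in tokenize("5 + (7 * x)")])
```

The parser then works only on the token kinds, which is the "rules on tokens" simplification; the lexemes are kept for building the AST.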

Terminology
Symbols: terminals (tokens) + * ( ) id num; non-terminals E
Derivation: E ⇒ E + E ⇒ 1 + E ⇒ 1 + E * E ⇒ 1 + 2 * E ⇒ 1 + 2 * 3
Parse tree (bracket notation): [E [E 1] + [E [E 2] * [E 3]]]
Grammar rules: E → id | num | E + E | E * E | ( E )
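To make the notion of a derivation concrete, here is a hypothetical Python helper `derive` that replays a derivation of the expression grammar by rewriting the leftmost (or rightmost) E at each step. As an assumption of the sketch, the terminals 1, 2 and 3 stand for num tokens with those lexemes:

```python
NONTERMINAL = "E"


def derive(steps, leftmost=True):
    """Replay a derivation of the expression grammar starting from E.

    Each step is the right-hand side (a list of symbols) that replaces the
    leftmost, or the rightmost, occurrence of E. Returns every sentential form.
    """
    form = [NONTERMINAL]
    forms = [" ".join(form)]
    for rhs in steps:
        positions = [i for i, sym in enumerate(form) if sym == NONTERMINAL]
        if not positions:
            raise ValueError("no non-terminal left to expand")
        i = positions[0] if leftmost else positions[-1]
        form = form[:i] + rhs + form[i + 1:]
        forms.append(" ".join(form))
    return forms


# The leftmost derivation of 1 + 2 * 3 from the slide.
for line in derive([["E", "+", "E"], ["1"], ["E", "*", "E"], ["2"], ["3"]]):
    print(line)
```

Passing `leftmost=False` replays a rightmost derivation instead.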

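Ambiguity
Leftmost derivation: E ⇒ E + E ⇒ 1 + E ⇒ 1 + E * E ⇒ 1 + 2 * E ⇒ 1 + 2 * 3
Parse tree: [E [E 1] + [E [E 2] * [E 3]]]
Rightmost derivation: E ⇒ E * E ⇒ E * 3 ⇒ E + E * 3 ⇒ E + 2 * 3 ⇒ 1 + 2 * 3
Parse tree: [E [E [E 1] + [E 2]] * [E 3]]
Grammar rules: E → id | num | E + E | E * E | ( E )
The grammar is ambiguous because the same string 1 + 2 * 3 has two different parse trees. "Leftmost" and "rightmost" describe only the order in which non-terminals are expanded; each tree has both a leftmost and a rightmost derivation.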
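Ambiguity can be measured directly by counting parse trees. The Python sketch below (a hypothetical `count_parse_trees`, using interval dynamic programming over the token list) counts the trees that the ambiguous grammar assigns to a token sequence; treating `id` exactly like `num` is a simplification of the sketch:

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees of a token list under the ambiguous grammar
    E -> E + E | E * E | ( E ) | num | id.
    """
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of distinct parse trees deriving tokens[i:j] from E.
        length = j - i
        if length == 1:
            return 1 if tokens[i] in ("num", "id") else 0
        total = 0
        if length >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            total += count(i + 1, j - 1)              # E -> ( E )
        for k in range(i + 1, j - 1):                 # E -> E op E, op at k
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens)) if tokens else 0


print(count_parse_trees(["num", "+", "num", "*", "num"]))
```

For 1 + 2 * 3 the count is 2, matching the two trees above; parentheses remove the choice, and for a chain of n binary operators the count is the n-th Catalan number (5 for three operators).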

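Grammar rewriting
Ambiguous grammar: E → id | num | E + E | E * E | ( E )
Non-ambiguous grammar:
E → E + T | T
T → T * F | F
F → id | ( E )
Derivation: E ⇒ E + T ⇒* 1 + T ⇒ 1 + T * F ⇒ 1 + F * F ⇒ 1 + 2 * F ⇒ 1 + 2 * 3 (⇒* abbreviates several steps: E + T ⇒ T + T ⇒ F + T ⇒ 1 + T)
Parse tree: [E [E [T [F 1]]] + [T [T [F 2]] * [F 3]]]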
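One way to see that the layered grammar fixes precedence is to write one function per non-terminal. The Python sketch below does so, realizing the left-recursive rules E → E + T and T → T * F as loops; treating an `id` as a decimal literal so that the result can be evaluated is an assumption of the sketch, and `evaluate` is a hypothetical name:

```python
def evaluate(tokens):
    """Evaluate a token list with one function per layer of the grammar
    E -> E + T | T,  T -> T * F | F,  F -> id | ( E ).
    Sketch assumption: an id token is a decimal literal.
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else "$"

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {peek()!r}")
        pos += 1

    def parse_e():
        value = parse_t()
        while peek() == "+":            # E -> E + T, applied left to right
            expect("+")
            value = value + parse_t()
        return value

    def parse_t():
        value = parse_f()
        while peek() == "*":            # T -> T * F, applied left to right
            expect("*")
            value = value * parse_f()
        return value

    def parse_f():
        nonlocal pos
        if peek() == "(":               # F -> ( E )
            expect("(")
            value = parse_e()
            expect(")")
            return value
        tok = peek()
        if not tok.isdigit():
            raise SyntaxError(f"unexpected {tok!r}")
        pos += 1                        # F -> id
        return int(tok)

    result = parse_e()
    if peek() != "$":
        raise SyntaxError(f"trailing input {peek()!r}")
    return result


print(evaluate("1 + 2 * 3".split()))
```

Because * is handled one layer below +, the product 2 * 3 is always grouped first, so only one tree (and one value) is possible.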

Parsing methods
Top-down / predictive / recursive descent without backtracking: LL(1)
“L”: left-to-right scan of the input
“L”: leftmost derivation
“1”: predict based on one token of look-ahead
For every non-terminal and token, predict the next production.
Bottom-up: LR(0), SLR(1), LR(1), LALR(1)
“L”: left-to-right scan of the input
“R”: rightmost derivation (in reverse order)
For every potential right-hand side and token, decide when a production is found.

Top-down parsing
Builds the parse tree in preorder. LL(1) example:
Grammar:
S → if E then S else S | begin S L | print E
L → end | ; S L
E → num
Input: if 5 then print 8 else…

Token   Rule                       Sentential form
if      S → if E then S else S     if E then S else S
5       E → num                    if 5 then S else S
print   S → print E                if 5 then print E else S
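The prediction steps can be read off directly as a recursive-descent parser with one function per non-terminal. In this hypothetical Python sketch, `parse_statement` records the (token, rule) pairs in the order they are predicted; treating any digit string as a `num` token is an assumption:

```python
def parse_statement(tokens):
    """Predictive (LL(1)) recursive-descent parser for
        S -> if E then S else S | begin S L | print E
        L -> end | ; S L
        E -> num
    Returns the (look-ahead token, predicted rule) pairs in order.
    """
    pos = 0
    trace = []

    def peek():
        return tokens[pos] if pos < len(tokens) else "$"

    def eat(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {peek()!r}")
        pos += 1

    def S():
        tok = peek()                     # one token decides the production
        if tok == "if":
            trace.append((tok, "S -> if E then S else S"))
            eat("if")
            E()
            eat("then")
            S()
            eat("else")
            S()
        elif tok == "begin":
            trace.append((tok, "S -> begin S L"))
            eat("begin")
            S()
            L()
        elif tok == "print":
            trace.append((tok, "S -> print E"))
            eat("print")
            E()
        else:
            raise SyntaxError(f"no rule for S on {tok!r}")

    def L():
        tok = peek()
        if tok == "end":
            trace.append((tok, "L -> end"))
            eat("end")
        elif tok == ";":
            trace.append((tok, "L -> ; S L"))
            eat(";")
            S()
            L()
        else:
            raise SyntaxError(f"no rule for L on {tok!r}")

    def E():
        nonlocal pos
        tok = peek()
        if not tok.isdigit():
            raise SyntaxError(f"no rule for E on {tok!r}")
        trace.append((tok, "E -> num"))
        pos += 1

    S()
    if peek() != "$":
        raise SyntaxError(f"trailing input {peek()!r}")
    return trace


for token, rule in parse_statement("if 5 then print 8 else print 9".split()):
    print(token, ":", rule)
```

No backtracking is needed because, for each non-terminal, the alternatives start with distinct tokens.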

Problem: left recursion
Arithmetic expressions:
E → E + T | T
T → T * F | F
F → id | ( E )
Left recursion: in E → E + T, the symbol on the left is also the first symbol on the right.
Predictive parsing fails when two rules can start with the same token: E → E + T and E → T both start with whatever T starts with.
Rewrite the grammar to eliminate left recursion, and use the Nullable, FIRST, and FOLLOW sets to predict productions.
Grammar without left recursion:
E → T Ep
Ep → + T Ep | ε
T → F Tp
Tp → * F Tp | ε
F → id | ( E )
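The Nullable, FIRST, and FOLLOW sets for the rewritten grammar can be computed by iterating to a fixed point. The following Python sketch is one straightforward way to do it; the names are hypothetical, `[]` encodes an ε right-hand side, and `$` marks end of input:

```python
# Grammar without left recursion; [] is an epsilon right-hand side.
GRAMMAR = {
    "E": [["T", "Ep"]],
    "Ep": [["+", "T", "Ep"], []],
    "T": [["F", "Tp"]],
    "Tp": [["*", "F", "Tp"], []],
    "F": [["id"], ["(", "E", ")"]],
}
START = "E"


def nullable_first_follow(grammar, start):
    """Compute Nullable, FIRST and FOLLOW by fixed-point iteration."""
    nonterminals = set(grammar)
    nullable = set()
    first = {a: set() for a in nonterminals}
    follow = {a: set() for a in nonterminals}
    follow[start].add("$")

    def first_of(symbols):
        """FIRST of a symbol string, and whether the whole string is nullable."""
        result = set()
        for sym in symbols:
            if sym not in nonterminals:
                result.add(sym)
                return result, False
            result |= first[sym]
            if sym not in nullable:
                return result, False
        return result, True

    changed = True
    while changed:
        changed = False
        for lhs, productions in grammar.items():
            for rhs in productions:
                rhs_first, rhs_nullable = first_of(rhs)
                if rhs_nullable and lhs not in nullable:
                    nullable.add(lhs)
                    changed = True
                if not rhs_first <= first[lhs]:
                    first[lhs] |= rhs_first
                    changed = True
                # FOLLOW: what can come right after each non-terminal in rhs.
                for i, sym in enumerate(rhs):
                    if sym not in nonterminals:
                        continue
                    rest_first, rest_nullable = first_of(rhs[i + 1:])
                    new = set(rest_first)
                    if rest_nullable:
                        new |= follow[lhs]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return nullable, first, follow


nullable, first, follow = nullable_first_follow(GRAMMAR, START)
print(sorted(nullable), sorted(first["E"]), sorted(follow["F"]))
```

For this grammar it yields Nullable = {Ep, Tp}, FIRST(E) = {id, (}, FOLLOW(E) = {), $} and FOLLOW(F) = {*, +, ), $}; a predictive parser chooses Ep → ε exactly when the look-ahead is in FOLLOW(Ep).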

Left factoring
A non-terminal with two rules starting with the same prefix also defeats prediction with one token of look-ahead:
Grammar: S → if E then S else S | if E then S
Left factored grammar:
S → if E then S X
X → ε | else S

Bottom-up parsing
No problem with left recursion
Widely used in practice
LR(0), SLR(1), LR(1), LALR(1)
JavaCup implements LALR(1)

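Bottom-up parsing
Grammar: E → E + (E) | i
Input 1 + (2) + (3), where 1, 2 and 3 are i tokens, reduced step by step; each step replaces a right-hand side by E, building the parse tree from the leaves up:
1 + (2) + (3)
E + (2) + (3)
E + (E) + (3)
E + (3)
E + (E)
E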

Shift-reduce parsing
Parser stack: symbols (terminals and non-terminals) plus automaton states
Parsing actions: a sequence of shift and reduce operations
The action is determined by the top of the stack and k input tokens
Shift: move the next token to the top of the stack
Reduce: for rule X → A B C, pop C, B, A, then push X
Convention: $ stands for end of file
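The reduction sequence of the bottom-up example can be reproduced with a naive shift-reduce loop. In this Python sketch (hypothetical names), the parser reduces whenever the top of the stack matches a right-hand side and shifts otherwise; that greedy policy is an assumption that happens to suffice for E → E + (E) | i, whereas an LR parser decides using automaton states and look-ahead:

```python
# Grammar from the bottom-up example: E -> E + ( E ) | i
RULES = [
    ("E", ["E", "+", "(", "E", ")"]),
    ("E", ["i"]),
]


def shift_reduce(tokens):
    """Greedy shift-reduce parsing; returns the list of actions taken."""
    stack = []
    actions = []
    tokens = list(tokens) + ["$"]      # $ marks end of input
    pos = 0
    while True:
        for lhs, rhs in RULES:
            if stack[-len(rhs):] == rhs:          # handle on top of the stack
                del stack[-len(rhs):]             # pop the right-hand side
                stack.append(lhs)                 # push the left-hand side
                actions.append(f"reduce {lhs} -> {' '.join(rhs)}")
                break
        else:                                     # no handle: shift or stop
            if tokens[pos] == "$":
                if stack == ["E"]:
                    actions.append("accept")
                    return actions
                raise SyntaxError(f"cannot reduce stack {stack}")
            stack.append(tokens[pos])
            actions.append(f"shift {tokens[pos]}")
            pos += 1


for action in shift_reduce("i + ( i ) + ( i )".split()):
    print(action)
```

The five reductions it performs are exactly the five rewriting steps of the example, in the same order.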

Pushdown automaton
Figure: a pushdown automaton consisting of a control unit driven by the parser table, an input (u t w $), and a stack.

LR parsing table
Rows: states 0, 1, …
Columns: terminals (the shift/reduce actions part) and non-terminals (the goto part)
Entries:
sn: shift and move to state n
rk: reduce by rule k
gm: goto state m

Parsing table example
Grammar: S → E $, E → T | E + T, T → i | ( E )

STATE   i      +      (      )      $      |  E   T
0       s5     err    s7     err    err    |  1   6
1       err    s3     err    err    s2     |
2       accept
3       s5     err    s7     err    err    |      4
4       reduce E → E + T
5       reduce T → i
6       reduce E → T
7       s5     err    s7     err    err    |  8   6
8       err    s3     err    s9     err    |
9       reduce T → ( E )
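A table-driven LR driver for exactly this table fits in a few lines of Python. The encoding (dictionaries `SHIFT`, `GOTO`, `REDUCE`, and rule numbers 1 to 4) is a hypothetical choice of this sketch; since the table is LR(0), reduce states act without looking at the next token, and the stack holds only states:

```python
# Hypothetical rule numbering: rule -> (left-hand side, length of right-hand side).
RULES = {
    1: ("E", 1),  # E -> T
    2: ("E", 3),  # E -> E + T
    3: ("T", 1),  # T -> i
    4: ("T", 3),  # T -> ( E )
}
# Shift entries "sn" of the table: (state, token) -> n.
SHIFT = {
    (0, "i"): 5, (0, "("): 7,
    (1, "+"): 3, (1, "$"): 2,
    (3, "i"): 5, (3, "("): 7,
    (7, "i"): 5, (7, "("): 7,
    (8, "+"): 3, (8, ")"): 9,
}
# Goto part of the table: (state, non-terminal) -> m.
GOTO = {(0, "E"): 1, (0, "T"): 6, (3, "T"): 4, (7, "E"): 8, (7, "T"): 6}
# LR(0) reduce states: state -> rule, applied on every token.
REDUCE = {4: 2, 5: 3, 6: 1, 9: 4}
ACCEPT = 2


def lr_parse(tokens):
    """Run the LR automaton; return the rule numbers in the order reduced."""
    tokens = list(tokens) + ["$"]
    stack = [0]                                   # states only
    pos = 0
    reductions = []
    while True:
        state = stack[-1]
        if state == ACCEPT:
            return reductions
        if state in REDUCE:
            rule = REDUCE[state]
            lhs, length = RULES[rule]
            del stack[-length:]                   # pop one state per symbol
            stack.append(GOTO[(stack[-1], lhs)])  # then follow the goto entry
            reductions.append(rule)
            continue
        token = tokens[pos]
        if (state, token) not in SHIFT:           # an "err" entry
            raise SyntaxError(f"unexpected {token!r} in state {state}")
        stack.append(SHIFT[(state, token)])
        pos += 1


print(lr_parse(["i", "+", "(", "i", ")"]))
```

Here `lr_parse(["i", "+", "(", "i", ")"])` returns `[3, 1, 3, 1, 4, 2]`: the rules of a rightmost derivation, applied in reverse order.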

Items
Grammar: S → E $, E → T | E + T, T → i | ( E )
Items indicate the position inside a rule: LR(0) items are of the form A → α • β (LR(1) items are of the form [A → α • β, t]).
1: S → • E $
2: S → E • $
3: S → E $ •
4: E → • T
5: E → T •
6: E → • E + T
7: E → E • + T
8: E → E + • T
9: E → E + T •
10: T → • i
11: T → i •
12: T → • ( E )
13: T → ( • E )
14: T → ( E • )
15: T → ( E ) •
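Enumerating the items mechanically, one per dot position in each rule, reproduces the list above in the same order. A short Python sketch, with `.` standing for the dot and a hypothetical function name:

```python
# The slide's grammar: S -> E $, E -> T, E -> E + T, T -> i, T -> ( E )
GRAMMAR = [
    ("S", ["E", "$"]),
    ("E", ["T"]),
    ("E", ["E", "+", "T"]),
    ("T", ["i"]),
    ("T", ["(", "E", ")"]),
]


def lr0_items(grammar):
    """List every LR(0) item A -> alpha . beta, one per dot position."""
    items = []
    for lhs, rhs in grammar:
        for dot in range(len(rhs) + 1):
            items.append(f"{lhs} -> {' '.join(rhs[:dot] + ['.'] + rhs[dot:])}")
    return items


for number, item in enumerate(lr0_items(GRAMMAR), start=1):
    print(f"{number}: {item}")
```

A rule with a right-hand side of length n contributes n + 1 items, so the five rules give 3 + 2 + 4 + 2 + 4 = 15.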

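Automaton states (each state is a set of LR(0) items; transitions in brackets)
State 0: S → • E $, E → • T, E → • E + T, T → • i, T → • ( E )   [on E: 1, on T: 6, on i: 5, on (: 7]
State 1: S → E • $, E → E • + T   [on $: 2, on +: 3]
State 2: S → E $ •
State 3: E → E + • T, T → • i, T → • ( E )   [on T: 4, on i: 5, on (: 7]
State 4: E → E + T •
State 5: T → i •
State 6: E → T •
State 7: T → ( • E ), E → • T, E → • E + T, T → • i, T → • ( E )   [on E: 8, on T: 6, on i: 5, on (: 7]
State 8: T → ( E • ), E → E • + T   [on ): 9, on +: 3]
State 9: T → ( E ) •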

Identifying handles
Create a finite state automaton over grammar symbols whose states are sets of LR(0) items.
Use the automaton to build the parser tables:
shift: for items A → α • t β, on token t
reduce: for items A → α •, on every token
Any grammar has a transition diagram (the GOTO table).
Not every grammar has a deterministic action table.
When no conflicts occur, use a DPDA (deterministic pushdown automaton) that pushes states on the stack.
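The automaton construction can be sketched with the two classic operations, closure and goto, over items encoded as (rule index, dot position) pairs. This Python sketch is hypothetical in its names and encoding, and it numbers states in discovery order, so its numbers need not match the slide's; it also reports LR(0) conflicts, that is, states where a completed item coexists with a shift item or with another completed item:

```python
# The slide's grammar; rule 0 is the start rule S -> E $.
GRAMMAR = [
    ("S", ("E", "$")),
    ("E", ("T",)),
    ("E", ("E", "+", "T")),
    ("T", ("i",)),
    ("T", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def next_symbol(item):
    """Symbol right after the dot, or None if the item is complete."""
    rule, dot = item
    rhs = GRAMMAR[rule][1]
    return rhs[dot] if dot < len(rhs) else None


def closure(items):
    """Add B -> . gamma for every item whose dot stands before non-terminal B."""
    result = set(items)
    work = list(items)
    while work:
        sym = next_symbol(work.pop())
        if sym in NONTERMINALS:
            for r, (lhs, _) in enumerate(GRAMMAR):
                if lhs == sym and (r, 0) not in result:
                    result.add((r, 0))
                    work.append((r, 0))
    return frozenset(result)


def goto(state, symbol):
    """Move the dot over `symbol` wherever possible, then take the closure."""
    return closure({(r, d + 1) for r, d in state if next_symbol((r, d)) == symbol})


def build_automaton():
    """Return (states, transitions); states[0] is the start state."""
    states = [closure({(0, 0)})]
    transitions = {}
    i = 0
    while i < len(states):
        symbols = {next_symbol(item) for item in states[i]} - {None}
        for sym in sorted(symbols):
            target = goto(states[i], sym)
            if target not in states:
                states.append(target)
            transitions[(i, sym)] = states.index(target)
        i += 1
    return states, transitions


def conflicts(states):
    """Indices of states with an LR(0) shift-reduce or reduce-reduce conflict."""
    bad = []
    for n, state in enumerate(states):
        completed = [item for item in state if next_symbol(item) is None]
        shifts = [item for item in state
                  if next_symbol(item) not in NONTERMINALS | {None}]
        if completed and (shifts or len(completed) > 1):
            bad.append(n)
    return bad


states, transitions = build_automaton()
print(len(states), "states,", len(transitions), "transitions, conflicts:", conflicts(states))
```

For this grammar it finds 10 states and 15 transitions with no conflicts, so the grammar is LR(0) and the table above is deterministic.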

Non-LR(0) grammars
When conflicts occur, the grammar is not LR(0).
The parsing table then contains non-determinism: shift-reduce conflicts, reduce-reduce conflicts, shift-shift conflicts? (The last kind cannot occur: from a given state, the transition on a token is unique.)
Known cases: operator precedence, operator associativity, dangling if-then-else, unary minus
Solutions:
Develop an equivalent non-ambiguous grammar
Patch the parsing table to shift or reduce, using the precedence and associativity of tokens
Use a stronger parser algorithm: SLR/LR(1)/LALR(1)

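Precedence and associativity
Precedence: with E + E on the stack and * as the next token, the state contains the items E → E + E • and E → E • * E, so the parser must choose:
Reduce: + precedes *
Shift: * precedes +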

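Precedence and associativity
Parse trees for 1 + 2 * 3 under each choice:
Reduce (+ precedes *): [E [E [E 1] + [E 2]] * [E 3]], so the value is (1 + 2) * 3 = 9
Shift (* precedes +): [E [E 1] + [E [E 2] * [E 3]]], so the value is 1 + (2 * 3) = 7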

Precedence and associativity

Associativity: with E + E on the stack and + as the next token (items E → E + E • and E → E • + E):
Shift: + is right-associative
Reduce: + is left-associative
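Patching the table with precedence and associativity can be illustrated by a shift-reduce parser for the ambiguous grammar E → E + E | E * E | num that settles each conflict with the rules above: compare the operator inside the handle E op E with the look-ahead operator, and on a tie consult associativity. This Python sketch is a hypothetical illustration in the spirit of the precedence declarations of yacc-style tools such as JavaCup, not their implementation; any non-operator token is treated as a num:

```python
def parse(tokens, prec, assoc):
    """Shift-reduce parser for E -> E + E | E * E | num that resolves every
    shift/reduce conflict by precedence (higher binds tighter) and by
    associativity ("left" or "right"). Returns nested (left, op, right) tuples.
    """
    def is_op(sym):
        return isinstance(sym, str) and sym in prec

    stack = []                          # alternating operands and operators
    tokens = list(tokens) + ["$"]
    pos = 0
    while True:
        tok = tokens[pos]
        if len(stack) >= 3 and is_op(stack[-2]) and (tok == "$" or is_op(tok)):
            op = stack[-2]
            # Conflict: reduce the handle E op E, or shift tok?
            if (tok == "$" or prec[op] > prec[tok]
                    or (prec[op] == prec[tok] and assoc[op] == "left")):
                right = stack.pop()
                stack.pop()
                left = stack.pop()
                stack.append((left, op, right))   # reduce E -> E op E
                continue
        if tok == "$":
            if len(stack) == 1 and not is_op(stack[0]):
                return stack[0]
            raise SyntaxError("incomplete expression")
        expects_operand = not stack or is_op(stack[-1])
        if is_op(tok) == expects_operand:         # operands and operators alternate
            raise SyntaxError(f"unexpected {tok!r}")
        stack.append(tok)                         # shift; a num serves as a leaf E
        pos += 1


def evaluate(tree):
    """Evaluate a tree produced by parse (num tokens are decimal literals)."""
    if isinstance(tree, str):
        return int(tree)
    left, op, right = tree
    if op == "+":
        return evaluate(left) + evaluate(right)
    return evaluate(left) * evaluate(right)


LEFT = {"+": "left", "*": "left"}
print(evaluate(parse("1 + 2 * 3".split(), {"+": 1, "*": 2}, LEFT)))
```

Giving * the higher precedence makes the parser shift and compute 7; swapping the levels makes it reduce and compute 9. With a single operator, "left" groups 1 + 2 + 3 as (1 + 2) + 3 and "right" as 1 + (2 + 3).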


Dangling else/if-else ambiguity
Grammar: S → if E then S else S | if E then S | other
if a then if b then e1 else e2: which interpretation should we use?
(1) if a then { if b then e1 else e2 }   (standard interpretation)
(2) if a then { if b then e1 } else e2
LR(1) items and their look-ahead tokens:
S → if E then S •, else
S → if E then S • else S, (any)
On else the parser can either reduce or shift: a shift/reduce conflict. Resolving it by shifting gives the standard interpretation (1).
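A hand-written recursive-descent parser for the left-factored form of this grammar (S → if E then S X, X → else S | ε) resolves the conflict the standard way: an `else` is attached to the innermost `if` that can still take it, which corresponds to choosing shift. A Python sketch, assuming that conditions and `other` statements are single tokens (the function name is hypothetical):

```python
def parse_stmt(tokens):
    """Recursive descent for S -> if E then S X | other, X -> else S | (empty).

    Conditions and `other` statements are single tokens. Returns nested tuples
    ("if", cond, then_part) or ("if", cond, then_part, else_part).
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else "$"

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def stmt():
        if peek() != "if":
            tok = advance()
            if tok in ("then", "else", "$"):
                raise SyntaxError(f"unexpected {tok!r}")
            return tok                         # S -> other
        advance()                              # if
        cond = advance()                       # E (a single token)
        if advance() != "then":
            raise SyntaxError("expected 'then'")
        then_part = stmt()
        if peek() == "else":                   # X -> else S: the shift choice
            advance()
            return ("if", cond, then_part, stmt())
        return ("if", cond, then_part)         # X -> empty: the reduce choice

    tree = stmt()
    if peek() != "$":
        raise SyntaxError(f"trailing input {peek()!r}")
    return tree


print(parse_stmt("if a then if b then e1 else e2".split()))
```

The inner call sees the `else` first and consumes it, so the result nests the `else` under `if b`, which is interpretation (1).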

See you next week

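Grammar hierarchy
Figure: nested classes of grammars. LR(0) ⊂ SLR(1) ⊂ LALR(1), all within the non-ambiguous CFGs; the LL(1) grammars are also drawn as a class of non-ambiguous CFGs.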