syntax analysis - wordpress.com · 1. to check syntax (= string recognizer) – and to report...

Syntax AnalysisSyntax Analysis

1

Syntax Analysis

LexicalAnalyzerLexical

Analyzer

Parserand rest offront-end


SourceProgram

Token,tokenval

Get nexttoken

Intermediaterepresentation

Position of a Parserin the Compiler Model

2

LexicalAnalyzer



Symbol TableSymbol Table

Get nexttoken

Lexical error Syntax errorSemantic error

The Role Of Parser

• A parser implements a C-F grammar• The role of the parser is twofold:1. To check syntax (= string recognizer)

– And to report syntax errors accurately

2. To invoke semantic actions– For static semantics checking, e.g. type checking of

expressions, functions, etc.– For syntax-directed translation of the source code to an

intermediate representation






3











Syntax-Directed Translation

• One of the major roles of the parser is to produce anintermediate representation (IR) of the sourceprogram using syntax-directed translation methods

• Possible IR output:– Abstract syntax trees (ASTs)– Control-flow graphs (CFGs) with triples, three-address

code, or register transfer list notation– WHIRL (SGI Pro64 compiler) has 5 IR levels!




4







Error Handling

• A good compiler should assist in identifying andlocating errors– Lexical errors: important, compiler can easily recover and

continue– Syntax errors: most important for compiler, can almost

always recover– Static semantic errors: important, can sometimes recover– Dynamic semantic errors: hard or impossible to detect at

compile time, runtime checks are required– Logical errors: hard or impossible to detect





5









Viable-Prefix Property

• The viable-prefix property of LL/LR parsersallows early detection of syntax errors– Goal: detection of an error as soon as possible

without further consuming unnecessary input– How: detect an error as soon as the prefix of the

input does not match a prefix of any string in thelanguage




6







…for (;)…

…DO 10 I = 1;0…

Error isdetected here

Error isdetected here

Prefix Prefix

Error Recovery Strategies

• Panic mode– Discard input until a token in a set of designated

synchronizing tokens is found• Phrase-level recovery

– Perform local correction on the input to repair the error• Error productions

– Augment grammar with productions for erroneousconstructs

• Global correction– Choose a minimal sequence of changes to obtain a global

least-cost correction







7













Grammars (Recap)

• Context-free grammar is a 4-tupleG = (N, T, P, S) where– T is a finite set of tokens (terminal symbols)– N is a finite set of nonterminals– P is a finite set of productions of the form

where (NT)* N (NT)* and (NT)*– S N is a designated start symbol



8





Notational Conventions Used

• Terminalsa,b,c,… Tspecific terminals: 0, 1, id, +

• NonterminalsA,B,C,… Nspecific nonterminals: expr, term, stmt

• Grammar symbolsX,Y,Z (NT)

• Strings of terminalsu,v,w,x,y,z T*

• Strings of grammar symbols,, (NT)*






9











Derivations (Recap)

• The one-step derivation is defined by A

where A is a production in the grammar• In addition, we define

– is leftmostlm if does not contain a nonterminal– is rightmostrm if does not contain a nonterminal– Transitive closure* (zero or more steps)– Positive closure+ (one or more steps)

• The language generated by G is defined byL(G) = {w T* | S+ w}





10









Derivation (Example)

Grammar G = ({E}, {+,*,(,),-,id}, P, E) withproductions P = E E + E

E E * EE ( E )E - EE id



11





E - E - idE - E - id

E* EE* E

E+ id * id + idE+ id * id + id

Erm E + Erm E + idrm id + idErm E + Erm E + idrm id + id

Example derivations:Example derivations:

E* id + idE* id + id

Chomsky Hierarchy: LanguageClassification

• A grammar G is said to be– Regular if it is right linear where each production is of the

formA w B or A w

or left linear where each production is of the formA B w or A w

– Context free if each production is of the formA

where A N and (NT)*– Context sensitive if each production is of the form A

where A N, ,, (NT)*, || > 0– Unrestricted


formA w B or A w





12


formA w B or A w






formA w B or A w





Chomsky Hierarchy

L(regular) L(context free) L(context sensitive) L(unrestricted)L(regular) L(context free) L(context sensitive) L(unrestricted)

Where L(T) = { L(G) | G is of type T }That is: the set of all languages

generated by grammars G of type T



13





L1 = { anbn | n 1 } is context freeL1 = { anbn | n 1 } is context free

L2 = { anbncn | n 1 } is context sensitiveL2 = { anbncn | n 1 } is context sensitive

Every finite language is regular! (construct a FSA for strings in L(G))Every finite language is regular! (construct a FSA for strings in L(G))

Examples:Examples:

Parsing

• Universal (any C-F grammar)– Cocke-Younger-Kasimi– Earley

• Top-down (C-F grammar with restrictions)– Recursive descent (predictive parsing)– LL (Left-to-right, Leftmost derivation) methods

• Bottom-up (C-F grammar with restrictions)– Operator precedence parsing– LR (Left-to-right, Rightmost derivation) methods

• SLR, canonical LR, LALR





14









Top-Down Parsing

• LL methods (Left-to-right, Leftmost derivation)and recursive-descent parsing


Grammar:E T + TT ( E )T - ET id

Leftmost derivation:Elm T + Tlm id + Tlm id + id

15


Grammar:E T + TT ( E )T - ET id

Leftmost derivation:Elm T + Tlm id + Tlm id + id

E E

T

+

T

idid

E

TT

+

E

T

+

T

id

• Productions of the formA A

| |

are left recursive• When one of the productions in a grammar is

left recursive then a predictive parser loopsforever on certain inputs


| |



Left Recursion (Recap)

16


| |




| |



General Left Recursion EliminationMethod

Arrange the nonterminals in some order A1, A2, …, Anfor i = 1, …, n do

for j = 1, …, i-1 doreplace each

Ai Aj with

Ai 1 | 2 | … | k where

Aj 1 | 2 | … | kenddoeliminate the immediate left recursion in Ai

enddo



Ai Aj with

Ai 1 | 2 | … | k where


enddo

17



Ai Aj with

Ai 1 | 2 | … | k where


enddo



Ai Aj with

Ai 1 | 2 | … | k where


enddo

Immediate Left-Recursion EliminationMethod

Rewrite every left-recursive productionA A

| | | A

into a right-recursive production:A AR

| ARAR AR

| AR|


| | | A


| ARAR AR

| AR|

18


| | | A


| ARAR AR

| AR|


| | | A


| ARAR AR

| AR|

Left Factoring

• When a nonterminal has two or more productionswhose right-hand sides start with the same grammarsymbols, the grammar is not LL(1) and cannot beused for predictive parsing

• Replace productionsA 1 | 2 | … | n |

withA AR | AR1 | 2 | … | n



withA AR | AR1 | 2 | … | n

20



withA AR | AR1 | 2 | … | n



withA AR | AR1 | 2 | … | n

Predictive Parsing

• Eliminate left recursion from grammar• Left factor the grammar• Compute FIRST and FOLLOW• Two variants:

– Recursive (recursive calls)– Non-recursive (table-driven)



21





FIRST (Revisited)

• FIRST() = { the set of terminals that begin allstrings derived from }

FIRST(a) = {a} if a TFIRST() = {}FIRST(A) = A FIRST() for A PFIRST(X1X2…Xk) =

if for all j = 1, …, i-1 : FIRST(Xj) thenadd non- in FIRST(Xi) to FIRST(X1X2…Xk)

if for all j = 1, …, k : FIRST(Xj) thenadd to FIRST(X1X2…Xk)





22









FOLLOW• FOLLOW(A) = { the set of terminals that can

immediately follow nonterminal A }

FOLLOW(A) =for all (B A ) P do

add FIRST()\{} to FOLLOW(A)for all (B A ) P and FIRST() do

add FOLLOW(B) to FOLLOW(A)for all (B A) P do

add FOLLOW(B) to FOLLOW(A)if A is the start symbol S then

add $ to FOLLOW(A)

• FOLLOW(A) = { the set of terminals that canimmediately follow nonterminal A }





add $ to FOLLOW(A)

23






add $ to FOLLOW(A)






add $ to FOLLOW(A)

First Set (2)

S aSeS BB bBeB CC cCeC d

Red : A Blue :


G0


G0

First Set (2)

• First (SaSe) = First(a) ={a}• First (SB) = First(B)• First (B bBe) = First(b)={b}• First (B C) = First(C)• First (C cCe) = First(c) ={c}• First (C d) = First(d)={d}

Red : A Blue :

Step 1:• First (SaSe) = First(a) ={a}• First (SB) = First(B)• First (B bBe) = First(b)={b}• First (B C) = First(C)• First (C cCe) = First(c) ={c}• First (C d) = First(d)={d}


G0

First Set (2)

• First (SaSe) = {a}• First (SB) = First(B)• First (B bBe) = {b}• First (B C) = First(C)• First (C cCe) = {c}• First (C d) = {d}

Red : A Blue :

Step 1:• First (SaSe) = {a}• First (SB) = First(B)• First (B bBe) = {b}• First (B C) = First(C)• First (C cCe) = {c}• First (C d) = {d}

Step First Set

S B C a b c d

Step 1 {a}∪First(B) {b}∪First(C) {c, d}


G0

First Set (2)

• First (SaSe) = {a}• First (SB) = First(B) = {b}∪First(C)• First (B bBe) = {b}• First (B C) = First(C)• First (C cCe) = {c}• First (C d) = {d}

Red : A Blue :

Step 2:• First (SaSe) = {a}• First (SB) = First(B) = {b}∪First(C)• First (B bBe) = {b}• First (B C) = First(C)• First (C cCe) = {c}• First (C d) = {d}

Step First Set

S B C a b c d



G0

First Set (2)

• First (SaSe) = {a}• First (SB) = {b}∪First(C)• First (B bBe) = {b}• First (B C) = First(C)• First (C cCe) = {c}• First (C d) = {d}

Red : A Blue :

Step 2:• First (SaSe) = {a}• First (SB) = {b}∪First(C)• First (B bBe) = {b}• First (B C) = First(C)• First (C cCe) = {c}• First (C d) = {d}

Step First Set

S B C a b c d


Step 2 {a}∪ {b}∪First(C)


G0

First Set (2)

• First (SaSe) = {a}• First (SB) = {b}∪First(C)• First (B bBe) = {b}• First (B C) = First(C) = {c, d}• First (C cCe) = {c}• First (C d) = {d}

Red : A Blue :

Step 2:• First (SaSe) = {a}• First (SB) = {b}∪First(C)• First (B bBe) = {b}• First (B C) = First(C) = {c, d}• First (C cCe) = {c}• First (C d) = {d}

Step First Set

S B C a b c d


Step 2 {a}∪ {b}∪First(C)


G0

First Set (2)

• First (SaSe) = {a}• First (SB) = {b}∪First(C)• First (B bBe) = {b}• First (B C) = {c, d}• First (C cCe) = {c}• First (C d) = {d}

Red : A Blue :

Step 2:• First (SaSe) = {a}• First (SB) = {b}∪First(C)• First (B bBe) = {b}• First (B C) = {c, d}• First (C cCe) = {c}• First (C d) = {d}

Step First Set

S B C a b c d


Step 2 {a}∪ {b}∪First(C) {b}∪{c, d} = {b,c,d} {c, d}


G0

First Set (2)

• First (SaSe) = {a}• First (SB) = {b}∪First(C) = {b}∪ {c, d}• First (B bBe) = {b}• First (B C) = {c, d}• First (C cCe) = {c}• First (C d) = {d}

Red : A Blue :

Step 3:• First (SaSe) = {a}• First (SB) = {b}∪First(C) = {b}∪ {c, d}• First (B bBe) = {b}• First (B C) = {c, d}• First (C cCe) = {c}• First (C d) = {d}

Step First Set

S B C a b c d




G0

First Set (2)

• First (SaSe) = {a}• First (SB) = {b, c, d}• First (B bBe) = {b}• First (B C) = {c, d}• First (C cCe) = {c}• First (C d) = {d}

Red : A Blue :

Step 3:• First (SaSe) = {a}• First (SB) = {b, c, d}• First (B bBe) = {b}• First (B C) = {c, d}• First (C cCe) = {c}• First (C d) = {d}

Step First Set

S B C a b c d

Step1

{a}∪First(B) {b}∪First(C) {c, d}

Step2

{a}∪ {b}∪First(C) {b}∪{c, d} = {b,c,d} {c, d}

Step3

{a}∪ {b}∪{c, d} = {a,b,c,d} {b}∪{c, d} = {b,c,d} {c, d}


G0

First Set (2)


Red : A Blue :

Step 3:

If no more change…The first set of a terminalsymbol is itself


Step First Set

S B C a b c d



Step 3 {a}∪ {b}∪{c, d} = {a,b,c,d} {b}∪{c, d} = {b,c,d} {c, d} {a} {b} {c} {d}


Another Example….

First Set (2)

S ABcA aA B bB

G0

Red : A Blue :

S ABcA aA B bB

G0

First Set (2)S ABcA aA B bB

G0

Red : A Blue :

• First (SABc) = First(ABc)• First (Aa) = First(a)• First (A ) = First()∪First()• First (B b) = First(b)• First (B ) = First()∪First()

Step 1:• First (SABc) = First(ABc)• First (Aa) = First(a)• First (A ) = First()∪First()• First (B b) = First(b)• First (B ) = First()∪First()


G0

Red : A Blue :

• First (SABc) = First(ABc)• First (Aa) = {a}• First (A ) = {}• First (B b) = {b}• First (B ) = {}

Step 1:• First (SABc) = First(ABc)• First (Aa) = {a}• First (A ) = {}• First (B b) = {b}• First (B ) = {}

Step First Set

S A B a b c

Step 1 First(ABc) {a, } {b, }


G0

Red : A Blue :

• First (SABc) = First(ABc) = {a, }= {a, } - {} ∪ First(Bc)= {a} ∪ First(Bc)

• First (Aa) = {a}• First (A ) = {}• First (B b) = {b}• First (B ) = {}

Step 2:• First (SABc) = First(ABc) = {a, }

= {a, } - {} ∪ First(Bc)= {a} ∪ First(Bc)

• First (Aa) = {a}• First (A ) = {}• First (B b) = {b}• First (B ) = {} Step First Set

S A B a b c


Step 2 {a} ∪ First(Bc) {a, } {b, }


G0

Red : A Blue :

• First (SABc) = {a} ∪ First(Bc)= {a} ∪{b, }= {a} ∪{b, } - {} ∪First(c)= {a} ∪{b,c}


Step 3:• First (SABc) = {a} ∪ First(Bc)

= {a} ∪{b, }= {a} ∪{b, } - {} ∪First(c)= {a} ∪{b,c}


Step First Set

S A B a b c


Step 2 {a} ∪ First(Bc) {a, } {b, }

Step 3 {a} ∪ {b, c}= {a,b,c} {a, } {b, }


G0

Red : A Blue :

• First (SABc) = {a,b,c}• First (Aa) = {a}• First (A ) = {}• First (B b) = {b}• First (B ) = {}

Step 3:• First (SABc) = {a,b,c}• First (Aa) = {a}• First (A ) = {}• First (B b) = {b}• First (B ) = {}

Step First Set

S A B a b c

Step1

First(ABc) {a, } {b, }

Step2

{a} ∪ First(Bc) {a, } {b, }

Step3

{a} ∪ {b, c}= {a,b,c} {a, } {b, } {a} {b} {c}


LL(1) Grammar

• A grammar G is LL(1) if it is not left recursive and foreach collection of productions

A1 | 2 | … | nfor nonterminal A the following holds:

1. FIRST(i) FIRST(j) = for all i j2. if i* then

2.a. j* for all i j2.b. FIRST(j) FOLLOW(A) =

for all i j





for all i j

41





for all i j





for all i j

Recursive Descent Parsing (Recap)

• Grammar must be LL(1)• Every nonterminal has one (recursive) procedure

responsible for parsing the nonterminal’s syntacticcategory of input tokens

• When a nonterminal has multiple productions, eachproduction is implemented in a branch of a selectionstatement based on input look-ahead information




43







Using FIRST and FOLLOW to Write aRecursive Descent Parser

expr term restrest + term rest

| - term rest|

term id


| - term rest|

term id

procedure rest();begin

if lookahead in FIRST(+ term rest) thenmatch(‘+’); term(); rest()

else if lookahead in FIRST(- term rest) thenmatch(‘-’); term(); rest()

else if lookahead in FOLLOW(rest) thenreturn

else error()end;





else error()end;

44


| - term rest|

term id


| - term rest|

term id





else error()end;





else error()end;

FIRST(+ term rest) = { + }FIRST(- term rest) = { - }FOLLOW(rest) = { $ }

FIRST(+ term rest) = { + }FIRST(- term rest) = { - }FOLLOW(rest) = { $ }

Non-Recursive Predictive Parsing:Table-Driven Parsing

• Given an LL(1) grammar G = (N, T, P, S)construct a table M[A,a] for A N, a T anduse a driver program with a stack


45


Predictive parsingprogram (driver)

Predictive parsingprogram (driver)

Parsing tableM

Parsing tableM

a + b $

XYZ$

stack

input

output

Constructing an LL(1) PredictiveParsing Table

for each production A do

for each a FIRST() doadd A to M[A,a]

enddo

if FIRST() then

for each b FOLLOW(A) doadd A to M[A,b]

enddoendif

enddoMark each undefined entry in M error



enddo

if FIRST() then


enddoendif


46



enddo

if FIRST() then


enddoendif




enddo

if FIRST() then


enddoendif


Example Table

E T ERER + T ER | T F TRTR * F TR | F ( E ) | id


A FIRST() FOLLOW(A)E T ER ( id $ )

ER + T ER + $ )ER $ )

T F TR ( id + $ )TR * F TR * + $ )

TR + $ )F ( E ) ( * + $ )

47

id + * ( ) $E E T ER E T ER

ER ER + T ER ER ER

T T F TR T F TR

TR TR TR * F TR TR TR

F F id F ( E )

F ( E ) ( * + $ )F id id * + $ )

LL(1) Grammars are Unambiguous

Ambiguous grammarS i E t S SR | aSR e S | E b


A FIRST() FOLLOW(A)S i E t S SR i e $

S a a e $SR e S e e $

48


a b e i t $S S a S i E t S SR

SRSR

SR e S SR

E E b

SR e S e e $SR e $E b b t

Error: duplicate table entry

Predictive Parsing Program (Driver)push($)push(S)a := lookaheadrepeat

X := pop()if X is a terminal or X = $ then

match(X) // moves to next token and a := lookaheadelse if M[X,a] = X Y1Y2…Yk then

push(Yk, Yk-1, …, Y2, Y1) // such that Y1 is on top… invoke actions and/or produce IR output …

else error()endif

until X = $

push($)push(S)a := lookaheadrepeat




else error()endif

until X = $

49





else error()endif

until X = $





else error()endif

until X = $

Example Table-Driven ParsingStack$E$ERT$ERTRF$ERTRid$ERTR$ER$ERT+$ERT$ERTRF$ERTRid$ERTR$ERTRF*$ERTRF$ERTRid$ERTR$ER$

Stack$E$ERT$ERTRF$ERTRid$ERTR$ER$ERT+$ERT$ERTRF$ERTRid$ERTR$ERTRF*$ERTRF$ERTRid$ERTR$ER$

Inputid+id*id$id+id*id$id+id*id$id+id*id$

+id*id$+id*id$+id*id$

id*id$id*id$id*id$

*id$*id$

id$id$

$$$



id*id$id*id$id*id$

*id$*id$

id$id$

$$$

Production appliedE T ERT F TRF id

TR ER + T ER

T F TRF id

TR * F TR

F id

TR ER


TR ER + T ER

T F TRF id

TR * F TR

F id

TR ER

50





id*id$id*id$id*id$

*id$*id$

id$id$

$$$



id*id$id*id$id*id$

*id$*id$

id$id$

$$$


TR ER + T ER

T F TRF id

TR * F TR

F id

TR ER


TR ER + T ER

T F TRF id

TR * F TR

F id

TR ER

Panic Mode Recovery

FOLLOW(E) = { ) $ }FOLLOW(ER) = { ) $ }FOLLOW(T) = { + ) $ }FOLLOW(TR) = { + ) $ }FOLLOW(F) = { + * ) $ }


Add synchronizing actions toundefined entries based on FOLLOWAdd synchronizing actions toundefined entries based on FOLLOW

Pro: Can be automatedCons: Error messages are neededPro: Can be automatedCons: Error messages are needed

51

id + * ( ) $E E T ER E T ER synch synchER ER + T ER ER ER

T T F TR synch T F TR synch synchTR TR TR * F TR TR TR

F F id synch synch F ( E ) synch synch


synch: the driver pops current nonterminal A and skips input tillsynch token or skips input until one of FIRST(A) is found

Pro: Can be automatedCons: Error messages are needed

Phrase-Level Recovery

Change input stream by inserting missing tokensFor example: id id is changed into id * idChange input stream by inserting missing tokensFor example: id id is changed into id * id

Can then continue here

Pro: Can be automatedCons: Recovery not always intuitivePro: Can be automatedCons: Recovery not always intuitive

52


T T F TR synch T F TR synch synchTR insert * TR TR * F TR TR TR


insert *: driver inserts missing * and retries the production

Can then continue here

Error Productions



Add “error production”:TR F TR

to ignore missing *, e.g.: id id

Add “error production”:TR F TR

to ignore missing *, e.g.: id id

Pro: Powerful recovery methodCons: Cannot be automatedPro: Powerful recovery methodCons: Cannot be automated

53


T T F TR synch T F TR synch synchTR TR F TR TR TR * F TR TR TR


E T ERER + T ER | T F TRTR * F TR | F ( E ) | id Pro: Powerful recovery method

Cons: Cannot be automatedPro: Powerful recovery methodCons: Cannot be automated

Bottom-Up Parsing

• LR methods (Left-to-right, Rightmostderivation)– SLR, Canonical LR, LALR

• Other special cases:– Shift-reduce parsing– Operator-precedence parsing



54





Operator-Precedence Parsing

• Special case of shift-reduce parsing• We will not further discuss (you can skip

textbook section 4.6)



55



Shift-Reduce Parsing

Grammar:S a A B eA A b c | bB d


Shift-reduce correspondsto a rightmost derivation:Srm a A B erm a A d erm a A b c d erm a b b c d e


Reducing a sentence:a b b c d ea A b c d ea A d ea A B eS


56






S

a b b c d e

A

A

B

a b b c d e

A

A

B

a b b c d e

A

A

a b b c d e

A

These matchproduction’s

right-hand sides

These matchproduction’s

right-hand sides

Handles



A handle is a substring of grammar symbols in aright-sentential form that matches a right-hand side

of a production

A handle is a substring of grammar symbols in aright-sentential form that matches a right-hand side

of a productiona b b c d ea A b c d ea A d ea A B eS

a b b c d ea A b c d ea A d ea A B eS

57

Handle



NOT a handle, becausefurther reductions will fail

(result is not a sentential form)

a b b c d ea A b c d ea A A e… ?

a b b c d ea A b c d ea A A e… ?



Stack Implementation ofShift-Reduce Parsing

Stack$$id$E$E+$E+id$E+E$E+E*$E+E*id$E+E*E$E+E$E


Inputid+id*id$

+id*id$+id*id$

id*id$*id$*id$

id$$$$$

Inputid+id*id$

+id*id$+id*id$

id*id$*id$*id$

id$$$$$

Actionshiftreduce E idshiftshiftreduce E idshift (or reduce?)shiftreduce E idreduce E E * Ereduce E E + Eaccept


Grammar:E E + EE E * EE ( E )E id


How toresolve

conflicts?

58



Inputid+id*id$

+id*id$+id*id$

id*id$*id$*id$

id$$$$$

Inputid+id*id$

+id*id$+id*id$

id*id$*id$*id$

id$$$$$





Find handlesto reduce

How toresolve

conflicts?

Conflicts

• Shift-reduce and reduce-reduce conflicts arecaused by– The limitations of the LR parsing method (even

when the grammar is unambiguous)– Ambiguity of the grammar



59





Shift-Reduce Parsing:Shift-Reduce Conflicts

Stack$…$…if E then S


Input…$

else…$

Input…$

else…$

Action…shift or reduce?


Ambiguous grammar:S if E then S

| if E then S else S| other



60


Input…$

else…$






Resolve in favorof shift, so else

matches closest if

Resolve in favorof shift, so else

matches closest if

Shift-Reduce Parsing:Reduce-Reduce Conflicts

Stack$$a

Stack$$a

Inputaa$

a$

Inputaa$

a$

Actionshiftreduce A a or B a ?


Grammar:C A BA aB a

Grammar:C A BA aB a

61

Stack$$a

Inputaa$

a$


Grammar:C A BA aB a

Grammar:C A BA aB a

Resolve in favorof reduce A a,

otherwise we’re stuck!

Resolve in favorof reduce A a,

otherwise we’re stuck!

LR(k) Parsers: Use a DFA forShift/Reduce Decisions

1

2

4

5

0start

a

AC B

aState I1:S C•State I1:S C• State I4:

C A B•State I4:C A B•goto(I0,C)

62

5

3a

a

Grammar:S CC A BA aB a


State I0:S •CC •A BA •a


State I1:S C•

State I2:C A•BB •a


State I3:A a•State I3:A a•

State I4:C A B•State I4:C A B•

State I5:B a•State I5:B a•

goto(I0,C)

goto(I0,a)

goto(I0,A)

goto(I2,a)

goto(I2,B)

Can onlyreduce A a(not B a)

DFA for Shift/Reduce Decisions

Stack$ 0$ 0$ 0 a 3$ 0 A 2$ 0 A 2 a 5$ 0 A 2 B 4$ 0 C 1

Stack$ 0$ 0$ 0 a 3$ 0 A 2$ 0 A 2 a 5$ 0 A 2 B 4$ 0 C 1

Inputaa$aa$

a$a$

$$$

Inputaa$aa$

a$a$

$$$

Actionstart in state 0shift (and goto state 3)reduce A a (goto 2)shift (goto 5)reduce B a (goto 4)reduce C AB (goto 1)accept (S C)




The states of the DFA are used to determineif a handle is on top of the stack


63

Stack$ 0$ 0$ 0 a 3$ 0 A 2$ 0 A 2 a 5$ 0 A 2 B 4$ 0 C 1

Stack$ 0$ 0$ 0 a 3$ 0 A 2$ 0 A 2 a 5$ 0 A 2 B 4$ 0 C 1

Inputaa$aa$

a$a$

$$$

Inputaa$aa$

a$a$

$$$







State I3:A a•State I3:A a•

goto(I0,a)goto(I0,a)


Stack$ 0$ 0$ 0 a 3$ 0 A 2$ 0 A 2 a 5$ 0 A 2 B 4$ 0 C 1

Stack$ 0$ 0$ 0 a 3$ 0 A 2$ 0 A 2 a 5$ 0 A 2 B 4$ 0 C 1

Inputaa$aa$

a$a$

$$$

Inputaa$aa$

a$a$

$$$







64

Stack$ 0$ 0$ 0 a 3$ 0 A 2$ 0 A 2 a 5$ 0 A 2 B 4$ 0 C 1

Stack$ 0$ 0$ 0 a 3$ 0 A 2$ 0 A 2 a 5$ 0 A 2 B 4$ 0 C 1

Inputaa$aa$

a$a$

$$$

Inputaa$aa$

a$a$

$$$









goto(I0,A)goto(I0,A)


Stack$ 0$ 0$ 0 a 3$ 0 A 2$ 0 A 2 a 5$ 0 A 2 B 4$ 0 C 1

Stack$ 0$ 0$ 0 a 3$ 0 A 2$ 0 A 2 a 5$ 0 A 2 B 4$ 0 C 1

Inputaa$aa$

a$a$

$$$

Inputaa$aa$

a$a$

$$$






65

Stack$ 0$ 0$ 0 a 3$ 0 A 2$ 0 A 2 a 5$ 0 A 2 B 4$ 0 C 1

Stack$ 0$ 0$ 0 a 3$ 0 A 2$ 0 A 2 a 5$ 0 A 2 B 4$ 0 C 1

Inputaa$aa$

a$a$

$$$

Inputaa$aa$

a$a$

$$$






State I5:B a•State I5:B a•

goto(I2,a)goto(I2,a)


Stack$ 0$ 0$ 0 a 3$ 0 A 2$ 0 A 2 a 5$ 0 A 2 B 4$ 0 C 1

Stack$ 0$ 0$ 0 a 3$ 0 A 2$ 0 A 2 a 5$ 0 A 2 B 4$ 0 C 1

Inputaa$aa$

a$a$

$$$

Inputaa$aa$

a$a$

$$$







66

Stack$ 0$ 0$ 0 a 3$ 0 A 2$ 0 A 2 a 5$ 0 A 2 B 4$ 0 C 1

Stack$ 0$ 0$ 0 a 3$ 0 A 2$ 0 A 2 a 5$ 0 A 2 B 4$ 0 C 1

Inputaa$aa$

a$a$

$$$

Inputaa$aa$

a$a$

$$$







State I4:C A B•State I4:C A B•

goto(I2,B)goto(I2,B)


Stack$ 0$ 0$ 0 a 3$ 0 A 2$ 0 A 2 a 5$ 0 A 2 B 4$ 0 C 1

Stack$ 0$ 0$ 0 a 3$ 0 A 2$ 0 A 2 a 5$ 0 A 2 B 4$ 0 C 1

Inputaa$aa$

a$a$

$$$

Inputaa$aa$

a$a$

$$$







67

Stack$ 0$ 0$ 0 a 3$ 0 A 2$ 0 A 2 a 5$ 0 A 2 B 4$ 0 C 1

Stack$ 0$ 0$ 0 a 3$ 0 A 2$ 0 A 2 a 5$ 0 A 2 B 4$ 0 C 1

Inputaa$aa$

a$a$

$$$

Inputaa$aa$

a$a$

$$$







State I1:S C•State I1:S C•

goto(I0,C)goto(I0,C)


Stack$ 0$ 0$ 0 a 3$ 0 A 2$ 0 A 2 a 5$ 0 A 2 B 4$ 0 C 1

Stack$ 0$ 0$ 0 a 3$ 0 A 2$ 0 A 2 a 5$ 0 A 2 B 4$ 0 C 1

Inputaa$aa$

a$a$

$$$

Inputaa$aa$

a$a$

$$$







68

Stack$ 0$ 0$ 0 a 3$ 0 A 2$ 0 A 2 a 5$ 0 A 2 B 4$ 0 C 1

Stack$ 0$ 0$ 0 a 3$ 0 A 2$ 0 A 2 a 5$ 0 A 2 B 4$ 0 C 1

Inputaa$aa$

a$a$

$$$

Inputaa$aa$

a$a$

$$$







State I1:S C•State I1:S C•

goto(I0,C)

Model of an LR ParserModel of an LR Parser

Sm

Xm

Sm-1

Xm-1

a1 ... ai ... an $

LR Parsing Algorithm

stack

input

output

69

Xm-1

.

.S1

X1

S0

Action Tableterminals and $

st four differenta actionstes

Goto Tablenon-terminal

st each item isa a state numbertes

A Configuration of LR ParsingAlgorithm

• A configuration of a LR parsing is:

( So X1 S1 ... Xm Sm, ai ai+1 ... an $ )

Stack Rest of Input

• Sm and ai decides the parser action by consulting the parsing action table.(Initial Stack contains just So )

• A configuration of a LR parsing represents the right sentential form:

X1 ... Xm ai ai+1 ... an $

• A configuration of a LR parsing is:

( So X1 S1 ... Xm Sm, ai ai+1 ... an $ )

Stack Rest of Input

• Sm and ai decides the parser action by consulting the parsing action table.(Initial Stack contains just So )

• A configuration of a LR parsing represents the right sentential form:

X1 ... Xm ai ai+1 ... an $

Actions of A LR-Parser1. shift s -- shifts the next input symbol and the state s onto the stack

( So X1 S1 ... Xm Sm, ai ai+1 ... an $ ) ( So X1 S1 ... Xm Sm ai s, ai+1 ... an $ )

2. reduce A (or rn where n is a production number)– pop 2|| (=r) items from the stack;– then push A and s where s=goto[sm-r,A]

( So X1 S1 ... Xm Sm, ai ai+1 ... an $ ) ( So X1 S1 ... Xm-r Sm-r A s, ai ... an $ )

– Output is the reducing production reduce A

3. Accept – Parsing successfully completed

4. Error -- Parser detected an error (an empty entry in the action table)

1. shift s -- shifts the next input symbol and the state s onto the stack( So X1 S1 ... Xm Sm, ai ai+1 ... an $ ) ( So X1 S1 ... Xm Sm ai s, ai+1 ... an $ )

2. reduce A (or rn where n is a production number)– pop 2|| (=r) items from the stack;– then push A and s where s=goto[sm-r,A]

( So X1 S1 ... Xm Sm, ai ai+1 ... an $ ) ( So X1 S1 ... Xm-r Sm-r A s, ai ... an $ )

– Output is the reducing production reduce A

3. Accept – Parsing successfully completed

4. Error -- Parser detected an error (an empty entry in the action table)

Reduce ActionReduce Action

• pop 2|| (=r) items from the stack; let usassume that = Y1Y2...Yr

• then push A and s where s=goto[sm-r,A]

( So X1 S1 ... Xm-r Sm-r Y1 Sm-r+1 ...Yr Sm, ai ai+1 ... an $ ) ( So X1 S1 ... Xm-r Sm-r A s, ai ... an $ )

• In fact, Y1Y2...Yr is a handle.

X1 ... Xm-r A ai ... an $ X1 ... Xm Y1...Yr ai ai+1 ... an $

• pop 2|| (=r) items from the stack; let usassume that = Y1Y2...Yr

• then push A and s where s=goto[sm-r,A]

( So X1 S1 ... Xm-r Sm-r Y1 Sm-r+1 ...Yr Sm, ai ai+1 ... an $ ) ( So X1 S1 ... Xm-r Sm-r A s, ai ... an $ )

• In fact, Y1Y2...Yr is a handle.

X1 ... Xm-r A ai ... an $ X1 ... Xm Y1...Yr ai ai+1 ... an $

(SLR) Parsing Tables for ExpressionGrammar

(SLR) Parsing Tables for ExpressionGrammar

state id + * ( ) $ E T F

0 s5 s4 1 2 3

1 s6 acc

2 r2 s7 r2 r2

3 r4 r4 r4 r4

4 s5 s4 8 2 3

Action Table Goto Table

1) E E+T2) E T3) T T*F4) T F5) F (E)6) F id


4 s5 s4 8 2 3

5 r6 r6 r6 r6

6 s5 s4 9 3

7 s5 s4 10

8 s6 s11

9 r1 s7 r1 r1

10 r3 r3 r3 r3

11 r5 r5 r5 r5


Actions of A (S)LR-Parser -- Examplestack input action output0 id*id+id$ shift 50id5 *id+id$ reduce by Fid Fid0F3 *id+id$ reduce by TF TF0T2 *id+id$ shift 70T2*7 id+id$ shift 50T2*7id5 +id$ reduce by Fid Fid0T2*7F10 +id$ reduce by TT*FTT*F0T2 +id$ reduce by ET ET0E1 +id$ shift 60E1+6 id$ shift 50E1+6id5 $ reduce by Fid Fid0E1+6F3 $ reduce by TF TF0E1+6T9$ reduce by EE+T EE+T0E1 $ accept

stack input action output0 id*id+id$ shift 50id5 *id+id$ reduce by Fid Fid0F3 *id+id$ reduce by TF TF0T2 *id+id$ shift 70T2*7 id+id$ shift 50T2*7id5 +id$ reduce by Fid Fid0T2*7F10 +id$ reduce by TT*FTT*F0T2 +id$ reduce by ET ET0E1 +id$ shift 60E1+6 id$ shift 50E1+6id5 $ reduce by Fid Fid0E1+6F3 $ reduce by TF TF0E1+6T9$ reduce by EE+T EE+T0E1 $ accept

SLR Grammars

• SLR (Simple LR): a simple extension of LR(0)shift-reduce parsing

• SLR eliminates some conflicts by populatingthe parsing table with reductions A onsymbols in FOLLOW(A)

75

• SLR (Simple LR): a simple extension of LR(0)shift-reduce parsing

• SLR eliminates some conflicts by populatingthe parsing table with reductions A onsymbols in FOLLOW(A)

S EE id + EE id

State I0:S •EE •id + EE •id

State I2:E id•+ EE id•

goto(I0,id) goto(I3,+)

FOLLOW(E)={$}thus reduce on $

Shift on +

SLR Parsing Table

id + $ E

• Reductions do not fill entire rows• Otherwise the same as LR(0)

76

s2acc

s3 r3s2

r2

id + $01234

E1

4

1. S E2. E id + E3. E id

FOLLOW(E)={$}thus reduce on $

Shift on +

SLR Parsing

• An LR(0) state is a set of LR(0) items• An LR(0) item is a production with a • (dot) in the

right-hand side• Build the LR(0) DFA by

– Closure operation to construct LR(0) items– Goto operation to determine transitions

• Construct the SLR parsing table from the DFA• LR parser program uses the SLR parsing table to

determine shift/reduce operations

77

• An LR(0) state is a set of LR(0) items• An LR(0) item is a production with a • (dot) in the

right-hand side• Build the LR(0) DFA by

– Closure operation to construct LR(0) items– Goto operation to determine transitions

• Construct the SLR parsing table from the DFA• LR parser program uses the SLR parsing table to

determine shift/reduce operations

LR(0) Items of a Grammar

• An LR(0) item of a grammar G is a production of Gwith a • at some position of the right-hand side

• Thus, a productionA X Y Z

has four items:[A • X Y Z][A X • Y Z][A X Y • Z][A X Y Z •]

• Note that production A has one item [A •]

78

• An LR(0) item of a grammar G is a production of Gwith a • at some position of the right-hand side

• Thus, a productionA X Y Z

has four items:[A • X Y Z][A X • Y Z][A X Y • Z][A X Y Z •]

• Note that production A has one item [A •]

Constructing the set of LR(0) Items of aGrammar

1. The grammar is augmented with a new startsymbol S’ and production S’S

2. Initially, set C = closure({[S’•S]})(this is the start state of the DFA)

3. For each set of items I C and each grammarsymbol X (NT) such that goto(I,X) C andgoto(I,X) , add the set of items goto(I,X) to C

4. Repeat 3 until no more sets can be added to C

79

1. The grammar is augmented with a new startsymbol S’ and production S’S

2. Initially, set C = closure({[S’•S]})(this is the start state of the DFA)



The Closure Operation for LR(0) Items

1. Initially, every LR(0) item in I is added toclosure(I)

2. If [A•B] closure(I) then for eachproduction B in the grammar, add theitem [B•] to I if not already in I

3. Repeat 2 until no new items can be added

80

1. Initially, every LR(0) item in I is added toclosure(I)

2. If [A•B] closure(I) then for eachproduction B in the grammar, add theitem [B•] to I if not already in I


The Closure Operation (Example)

{ [E’ • E] }

closure({[E’ •E]}) =

{ [E’ • E][E • E + T][E • T] }

{ [E’ • E][E • E + T][E • T][T • T * F][T • F] }

{ [E’ • E][E • E + T][E • T][T • T * F][T • F][F • ( E )][F • id] }

81

Grammar:E E + T | TT T * F | FF ( E )F id

{ [E’ • E][E • E + T][E • T][T • T * F][T • F] }


Add [E•]Add [T•]

Add [F•]

The Goto Operation for LR(0) Items

1. For each item [A•X] I, add the set ofitems closure({[AX•]}) to goto(I,X) if notalready there

2. Repeat step 1 until no more items can beadded to goto(I,X)

3. Intuitively, goto(I,X) is the set of items thatare valid for the viable prefix X when I is theset of items that are valid for

82

1. For each item [A•X] I, add the set ofitems closure({[AX•]}) to goto(I,X) if notalready there


3. Intuitively, goto(I,X) is the set of items thatare valid for the viable prefix X when I is theset of items that are valid for

The Goto Operation (Example 1)

Suppose I = Then goto(I,E)= closure({[E’ E •, E E • + T]})=

{ [E’ E •][E E • + T] }


83

{ [E’ E •][E E • + T] }



The Goto Operation (Example 2)

Suppose I = { [E’ E •], [E E • + T] }

Then goto(I,+) = closure({[E E + • T]}) = { [E E + • T][T • T * F][T • F][F • ( E )][F • id] }

84

{ [E E + • T][T • T * F][T • F][F • ( E )][F • id] }


Constructing SLR Parsing Tables1. Augment the grammar with S’S2. Construct the set C={I0,I1,…,In} of LR(0) items3. If [A•a] Ii and goto(Ii,a)=Ij then set

action[i,a]=shift j4. If [A•] Ii then set action[i,a]=reduce A for

all a FOLLOW(A) (apply only if AS’)5. If [S’S•] is in Ii then set action[i,$]=accept6. If goto(Ii,A)=Ij then set goto[i,A]=j7. Repeat 3-6 until no more entries added8. The initial state i is the Ii holding item [S’•S]

85

1. Augment the grammar with S’S2. Construct the set C={I0,I1,…,In} of LR(0) items3. If [A•a] Ii and goto(Ii,a)=Ij then set

action[i,a]=shift j4. If [A•] Ii then set action[i,a]=reduce A for

all a FOLLOW(A) (apply only if AS’)5. If [S’S•] is in Ii then set action[i,$]=accept6. If goto(Ii,A)=Ij then set goto[i,A]=j7. Repeat 3-6 until no more entries added8. The initial state i is the Ii holding item [S’•S]

The Canonical LR(0) Collection --Example

I0: E’ .E I1: E’ E. I6: E E+.T I9: E E+T.E .E+T E E.+T T .T*F T T.*FE .T T .FT .T*F I2: E T. F .(E) I10: T T*F.T .F T T.*F F .idF .(E)F .id I3: T F. I7: T T*.F I11: F (E).

F .(E)I4: F (.E) F .id

E .E+TE .T I8: F (E.)T .T*F E E.+TT .FF .(E)F .id

I5: F id.

I0: E’ .E I1: E’ E. I6: E E+.T I9: E E+T.E .E+T E E.+T T .T*F T T.*FE .T T .FT .T*F I2: E T. F .(E) I10: T T*F.T .F T T.*F F .idF .(E)F .id I3: T F. I7: T T*.F I11: F (E).

F .(E)I4: F (.E) F .id

E .E+TE .T I8: F (E.)T .T*F E E.+TT .FF .(E)F .id

I5: F id.

Transition Diagram (DFA) of GotoFunction

I0 I1

I2

I3

I4

I5

I6

I7

I8to I2to I3to I4

I9to I3to I4to I5

I10to I4to I5

I11to I6

to I7

id

(F

*

E T

T

FF

*+I1

I2

I3

I4

I5

I6

I7

I8to I2to I3to I4

I9to I3to I4to I5

I10to I4to I5

I11to I6E

+T

)

F

F

(

idid

(

(id

Example SLR Grammar and LR(0) Items

Augmentedgrammar:1. C’ C2. C A B3. A a4. B a

State I1:C’ C• State I4:

C A B•goto(I0,C)

I0 = closure({[C’ •C]})I1 = goto(I0,C) = closure({[C’ C•]})…

final

88

Augmentedgrammar:1. C’ C2. C A B3. A a4. B a

State I0:C’ •CC •A BA •a

State I1:C’ C•


State I3:A a•

State I4:C A B•

State I5:B a•

goto(I0,C)

goto(I0,a)

goto(I0,A)

goto(I2,a)

goto(I2,B)

start

final

Example SLR Parsing Table

State I0:C’ •CC •A BA •a

State I1:C’ C•


State I3:A a•

State I4:C A B•

State I5:B a•

89

s3acc

s5r3

r2r4

a $012345

C A B1 2

4

1

2

4

5

3

0start

a

AC B

a

Grammar:1. C’ C2. C A B3. A a4. B a

SLR and Ambiguity

• Every SLR grammar is unambiguous, but not everyunambiguous grammar is SLR

• Consider for example the unambiguous grammarS L = R | RL * R | idR L

90

• Every SLR grammar is unambiguous, but not everyunambiguous grammar is SLR

• Consider for example the unambiguous grammarS L = R | RL * R | idR L

I0:S’ •SS •L=RS •RL •*RL •idR •L

I1:S’ S•

I2:S L•=RR L•

I3:S R•

I4:L *•RR •LL •*RL •id

I5:L id•

I6:S L=•RR •LL •*RL •id

I7:L *R•

I8:R L•

I9:S L=R•

action[2,=]=s6action[2,=]=r5 no

Has no SLRparsing table

LR(1) Grammars

• SLR too simple• LR(1) parsing uses lookahead to avoid

unnecessary conflicts in parsing table• LR(1) item = LR(0) item + lookahead

91

• SLR too simple• LR(1) parsing uses lookahead to avoid

unnecessary conflicts in parsing table• LR(1) item = LR(0) item + lookahead

LR(0) item:[A•]

LR(1) item:[A•, a]

I2:S L•=RR L•

split

SLR Versus LR(1)

• Split the SLR states byadding LR(1) lookahead

• Unambiguous grammar1. S L = R2. | R3. L * R4. | id5. R L

92

I2:S L•=RR L•

action[2,=]=s6

Should not reduce on =, because noright-sentential form begins with R=

R L•S L•=R

• Split the SLR states byadding LR(1) lookahead

• Unambiguous grammar1. S L = R2. | R3. L * R4. | id5. R L

lookahead=$

action[2,$]=r5

LR(1) Items

• An LR(1) item[A•, a]

contains a lookahead terminal a, meaning alreadyon top of the stack, expect to see a

• For items of the form[A•, a]

the lookahead a is used to reduce A only if thenext input is a


with the lookahead has no effect

93

• An LR(1) item[A•, a]

contains a lookahead terminal a, meaning alreadyon top of the stack, expect to see a


the lookahead a is used to reduce A only if thenext input is a


with the lookahead has no effect

The Closure Operation for LR(1) Items

1. Start with closure(I) = I2. If [A•B, a] closure(I) then for each

production B in the grammar and eachterminal b FIRST(a), add the item [B•,b] to I if not already in I


94

1. Start with closure(I) = I2. If [A•B, a] closure(I) then for each

production B in the grammar and eachterminal b FIRST(a), add the item [B•,b] to I if not already in I


The Goto Operation for LR(1) Items

1. For each item [A•X, a] I, add the setof items closure({[AX•, a]}) to goto(I,X)if not already there


95

1. For each item [A•X, a] I, add the setof items closure({[AX•, a]}) to goto(I,X)if not already there


Constructing the set of LR(1) Items of aGrammar

1. Augment the grammar with a new start symbol S’and production S’S

2. Initially, set C = closure({[S’•S, $]})(this is the start state of the DFA)



96

1. Augment the grammar with a new start symbol S’and production S’S

2. Initially, set C = closure({[S’•S, $]})(this is the start state of the DFA)



Example Grammar and LR(1) Items

• Unambiguous LR(1) grammar:S L = R

| RL * R

| idR L

• Augment with S’ S• LR(1) items (next slide)

97


| RL * R

| idR L

• Augment with S’ S• LR(1) items (next slide)

[S’ •S, $] goto(I0,S)=I1[S •L=R, $] goto(I0,L)=I2[S •R, $] goto(I0,R)=I3[L •*R, =/$] goto(I0,*)=I4[L •id, =/$] goto(I0,id)=I5[R •L, $] goto(I0,L)=I2

[S’ S•, $]

[S L•=R, $] goto(I0,=)=I6[R L•, $]

[S R•, $]

[L *•R, =/$] goto(I4,R)=I7[R •L, =/$] goto(I4,L)=I8[L •*R, =/$] goto(I4,*)=I4[L •id, =/$] goto(I4,id)=I5

[L id•, =/$]

[S L=•R, $] goto(I6,R)=I9[R •L, $] goto(I6,L)=I10[L •*R, $] goto(I6,*)=I11[L •id, $] goto(I6,id)=I12

[L *R•, =/$]

[R L•, =/$]

[S L=R•, $]

[R L•, $]

[L *•R, $] goto(I11,R)=I13[R •L, $] goto(I11,L)=I10[L •*R, $] goto(I11,*)=I11[L •id, $] goto(I11,id)=I12

[L id•, $]

[L *R•, $]

I0:

I1:

I2:

I6:

I7:

I8:

I9:

98

[S’ •S, $] goto(I0,S)=I1[S •L=R, $] goto(I0,L)=I2[S •R, $] goto(I0,R)=I3[L •*R, =/$] goto(I0,*)=I4[L •id, =/$] goto(I0,id)=I5[R •L, $] goto(I0,L)=I2

[S’ S•, $]

[S L•=R, $] goto(I0,=)=I6[R L•, $]

[S R•, $]

[L *•R, =/$] goto(I4,R)=I7[R •L, =/$] goto(I4,L)=I8[L •*R, =/$] goto(I4,*)=I4[L •id, =/$] goto(I4,id)=I5

[L id•, =/$]


[L *R•, =/$]

[R L•, =/$]

[S L=R•, $]

[R L•, $]

[L *•R, $] goto(I11,R)=I13[R •L, $] goto(I11,L)=I10[L •*R, $] goto(I11,*)=I11[L •id, $] goto(I11,id)=I12

[L id•, $]

[L *R•, $]

I2:

I3:

I4:

I5:

I9:

I10:

I12:

I11:

I13:

Constructing Canonical LR(1) ParsingTables

1. Augment the grammar with S’S2. Construct the set C={I0,I1,…,In} of LR(1) items3. If [A•a, b] Ii and goto(Ii,a)=Ij then set

action[i,a]=shift j4. If [A•, a] Ii then set action[i,a]=reduce A

(apply only if AS’)5. If [S’S•, $] is in Ii then set action[i,$]=accept6. If goto(Ii,A)=Ij then set goto[i,A]=j7. Repeat 3-6 until no more entries added8. The initial state i is the Ii holding item [S’•S,$]

99

1. Augment the grammar with S’S2. Construct the set C={I0,I1,…,In} of LR(1) items3. If [A•a, b] Ii and goto(Ii,a)=Ij then set

action[i,a]=shift j4. If [A•, a] Ii then set action[i,a]=reduce A

(apply only if AS’)5. If [S’S•, $] is in Ii then set action[i,$]=accept6. If goto(Ii,A)=Ij then set goto[i,A]=j7. Repeat 3-6 until no more entries added8. The initial state i is the Ii holding item [S’•S,$]

Example LR(1) Parsing Tables5 s4

acc

s6 r6r3

s5 s4

r4

id * = $012345

S L R1 2 3

8 7Grammar:1. S’ S2. S L = R3. S R4. L * R5. L id6. R L

100

s5 s4r5 r5

s12

s11

r4 r4r6 r6

r2r6

s12

s11

r4

5678910

1112

10 4

10 13

Grammar:1. S’ S2. S L = R3. S R4. L * R5. L id6. R L

LALR(1) Grammars

• LR(1) parsing tables have many states• LALR(1) parsing (Look-Ahead LR) combines LR(1)

states to reduce table size• Less powerful than LR(1)

– Will not introduce shift-reduce conflicts, because shifts donot use lookaheads

– May introduce reduce-reduce conflicts, but seldom do sofor grammars of programming languages

101

• LR(1) parsing tables have many states• LALR(1) parsing (Look-Ahead LR) combines LR(1)

states to reduce table size• Less powerful than LR(1)

– Will not introduce shift-reduce conflicts, because shifts donot use lookaheads

– May introduce reduce-reduce conflicts, but seldom do sofor grammars of programming languages

Constructing LALR(1) Parsing Tables

1. Construct sets of LR(1) items2. Combine LR(1) sets with sets of items that

share the same first part

102

[L *•R, =][R •L, =][L •*R, =][L •id, =]

[L *•R, $][R •L, $][L •*R, $][L •id, $]

I4:

I11:

[L *•R, =/$][R •L, =/$][L •*R, =/$][L •id, =/$]

Shorthandfor two items

in the same set

Example LALR(1) Grammar


| RL * R

| idR L

• Augment with S’ S• LALR(1) items (next slide)

103


| RL * R

| idR L

• Augment with S’ S• LALR(1) items (next slide)

[S’ •S,$] goto(I0,S)=I1[S •L=R,$] goto(I0,L)=I2[S •R,$] goto(I0,R)=I3[L •*R,=/$] goto(I0,*)=I4[L •id,=/$] goto(I0,id)=I5[R •L,$] goto(I0,L)=I2

goto(I0,=)= I6[S’ S•,$]

[S L•=R,$][R L•,$]

[S R•,$]

[L *•R,=/$] goto(I4,R)=I7[R •L,=/$] goto(I4,L)=I9[L •*R,=/$] goto(I4,*)=I4[L •id,=/$] goto(I4,id)=I5

[L id•,=/$]


[L *R•,=/$]

[S L=R•,$]

[R L•,=/$]

I0:

I1:

I2:

I6:

I7:

I8:

I9:

104

[S’ •S,$] goto(I0,S)=I1[S •L=R,$] goto(I0,L)=I2[S •R,$] goto(I0,R)=I3[L •*R,=/$] goto(I0,*)=I4[L •id,=/$] goto(I0,id)=I5[R •L,$] goto(I0,L)=I2

goto(I0,=)= I6[S’ S•,$]

[S L•=R,$][R L•,$]

[S R•,$]

[L *•R,=/$] goto(I4,R)=I7[R •L,=/$] goto(I4,L)=I9[L •*R,=/$] goto(I4,*)=I4[L •id,=/$] goto(I4,id)=I5

[L id•,=/$]


[L *R•,=/$]

[S L=R•,$]

[R L•,=/$]I2:

I3:

I4:

I5:

I9:

Shorthandfor two items

[R L•,=][R L•,$]

Example LALR(1) Parsing Table

s5 s4acc

s6 r6

id * = $0123

S L R1 2 3


105

s6 r6r3

s5 s4r5 r5

s5 s4r4 r4

r2r6 r6

3456789

9 7

9 8


LL, SLR, LR, LALR Summary• LL parse tables computed using FIRST/FOLLOW

– Nonterminals terminals productions– Computed using FIRST/FOLLOW

• LR parsing tables computed using closure/goto– LR states terminals shift/reduce actions– LR states nonterminals goto state transitions

• A grammar is– LL(1) if its LL(1) parse table has no conflicts– SLR if its SLR parse table has no conflicts– LALR(1) if its LALR(1) parse table has no conflicts– LR(1) if its LR(1) parse table has no conflicts

106

• LL parse tables computed using FIRST/FOLLOW– Nonterminals terminals productions– Computed using FIRST/FOLLOW

• LR parsing tables computed using closure/goto– LR states terminals shift/reduce actions– LR states nonterminals goto state transitions

• A grammar is– LL(1) if its LL(1) parse table has no conflicts– SLR if its SLR parse table has no conflicts– LALR(1) if its LALR(1) parse table has no conflicts– LR(1) if its LR(1) parse table has no conflicts

LL, SLR, LR, LALR Grammars

LR(1)

LALR(1)

107

LL(1)

LR(0)

SLR

LALR(1)

Dealing with Ambiguous Grammars

1. S’ E2. E E + E3. E id s2

s3 acc

id + $012

E1

$ 0 E 1 + 3 E 4

id+id+id$

+id$

$ 0

… …

stack input

108

s3 acc

r3 r3s2

s3/r2 r2

234

4

Shift/reduce conflict:action[4,+] = shift 4action[4,+] = reduce E E + E

When reducing on +:yields left associativity(id+id)+id

When shifting on +:yields right associativityid+(id+id)

Using Associativity and Precedence toResolve Conflicts

• Left-associative operators: reduce• Right-associative operators: shift• Operator of higher precedence on stack: reduce• Operator of lower precedence on stack: shift

109

• Left-associative operators: reduce• Right-associative operators: shift• Operator of higher precedence on stack: reduce• Operator of lower precedence on stack: shift

S’ EE E + EE E * EE id $ 0 E 1 * 3 E 5

id*id+id$

+id$

$ 0

… …

stack input

reduce E E * E

Error Detection in LR Parsing

• Canonical LR parser uses full LR(1) parse tablesand will never make a single reduction beforerecognizing the error when a syntax erroroccurs on the input

• SLR and LALR may still reduce when a syntaxerror occurs on the input, but will never shiftthe erroneous input symbol

110

• Canonical LR parser uses full LR(1) parse tablesand will never make a single reduction beforerecognizing the error when a syntax erroroccurs on the input

• SLR and LALR may still reduce when a syntaxerror occurs on the input, but will never shiftthe erroneous input symbol

Error Recovery in LR Parsing• Panic mode

– Pop until state with a goto on a nonterminal A is found,(where A represents a major programming construct),push A

– Discard input symbols until one is found in the FOLLOW setof A

• Phrase-level recovery– Implement error routines for every error entry in table

• Error productions– Pop until state has error production, then shift on stack– Discard input until symbol is encountered that allows

parsing to continue

111

• Panic mode– Pop until state with a goto on a nonterminal A is found,

(where A represents a major programming construct),push A

– Discard input symbols until one is found in the FOLLOW setof A

• Phrase-level recovery– Implement error routines for every error entry in table

• Error productions– Pop until state has error production, then shift on stack– Discard input until symbol is encountered that allows

parsing to continue

ANTLR, Yacc, and Bison

• ANTLR tool– Generates LL(k) parsers

• Yacc (Yet Another Compiler Compiler)– Generates LALR(1) parsers

• Bison– Improved version of Yacc

112

• ANTLR tool– Generates LL(k) parsers

• Yacc (Yet Another Compiler Compiler)– Generates LALR(1) parsers

• Bison– Improved version of Yacc

Creating an LALR(1) Parser withYacc/Bison

Yacc or Bisoncompiler

yaccspecificationyacc.y

y.tab.c

113

y.tab.c

inputstream

Ccompiler

a.out outputstream

a.out

Yacc Specification• A yacc specification consists of three parts:

yacc declarations, and C declarations within %{ %}%%translation rules%%user-defined auxiliary procedures

• The translation rules are productions with actions:production1 { semantic action1 }production2 { semantic action2 }…productionn { semantic actionn }

114

• A yacc specification consists of three parts:yacc declarations, and C declarations within %{ %}%%translation rules%%user-defined auxiliary procedures

• The translation rules are productions with actions:production1 { semantic action1 }production2 { semantic action2 }…productionn { semantic actionn }

Writing a Grammar in Yacc

• Productions in Yacc are of the formNonterminal: tokens/nonterminals { action }

| tokens/nonterminals { action }…;

• Tokens that are single characters can be used directlywithin productions, e.g. ‘+’

• Named tokens must be declared first in thedeclaration part using

%token TokenName

115

• Productions in Yacc are of the formNonterminal: tokens/nonterminals { action }

| tokens/nonterminals { action }…;

• Tokens that are single characters can be used directlywithin productions, e.g. ‘+’

• Named tokens must be declared first in thedeclaration part using

%token TokenName

Synthesized Attributes

• Semantic actions may refer to values of thesynthesized attributes of terminals and nonterminalsin a production:

X : Y1 Y2 Y3 … Yn { action }– $$ refers to the value of the attribute of X– $i refers to the value of the attribute of Yi

• For examplefactor : ‘(’ expr ‘)’ { $$=$2; }

116

• Semantic actions may refer to values of thesynthesized attributes of terminals and nonterminalsin a production:

X : Y1 Y2 Y3 … Yn { action }– $$ refers to the value of the attribute of X– $i refers to the value of the attribute of Yi

• For examplefactor : ‘(’ expr ‘)’ { $$=$2; }

factor.val=x

expr.val=x )($$=$2

Example 1%{ #include <ctype.h> %}%token DIGIT%%line : expr ‘\n’ { printf(“%d\n”, $1); }

;expr : expr ‘+’ term { $$ = $1 + $3; }

| term { $$ = $1; };

term : term ‘*’ factor { $$ = $1 * $3; }| factor { $$ = $1; };

factor : ‘(’ expr ‘)’ { $$ = $2; }| DIGIT { $$ = $1; };

%%int yylex(){ int c = getchar();if (isdigit(c)){ yylval = c-’0’;

return DIGIT;}return c;

}

Also results in definition of#define DIGIT xxx

117

%{ #include <ctype.h> %}%token DIGIT%%line : expr ‘\n’ { printf(“%d\n”, $1); }

;expr : expr ‘+’ term { $$ = $1 + $3; }

| term { $$ = $1; };

term : term ‘*’ factor { $$ = $1 * $3; }| factor { $$ = $1; };

factor : ‘(’ expr ‘)’ { $$ = $2; }| DIGIT { $$ = $1; };

%%int yylex(){ int c = getchar();if (isdigit(c)){ yylval = c-’0’;

return DIGIT;}return c;

}

Attribute of token(stored in yylval)

Attribute ofterm (parent)

Attribute of factor (child)

Example of a very crude lexicalanalyzer invoked by the parser

Dealing With Ambiguous Grammars

• By defining operator precedence levels and left/rightassociativity of the operators, we can specifyambiguous grammars in Yacc, such asE E+E | E-E | E*E | E/E | (E) | -E | num

• To define precedence levels and associativity in Yacc’sdeclaration part:

%left ‘+’ ‘-’%left ‘*’ ‘/’%right UMINUS

118

• By defining operator precedence levels and left/rightassociativity of the operators, we can specifyambiguous grammars in Yacc, such asE E+E | E-E | E*E | E/E | (E) | -E | num

• To define precedence levels and associativity in Yacc’sdeclaration part:

%left ‘+’ ‘-’%left ‘*’ ‘/’%right UMINUS

Example 2%{#include <ctype.h>#include <stdio.h>#define YYSTYPE double%}%token NUMBER%left ‘+’ ‘-’%left ‘*’ ‘/’%right UMINUS%%lines : lines expr ‘\n’ { printf(“%g\n”, $2); }

| lines ‘\n’| /* empty */;

expr : expr ‘+’ expr { $$ = $1 + $3; }| expr ‘-’ expr { $$ = $1 - $3; }| expr ‘*’ expr { $$ = $1 * $3; }| expr ‘/’ expr { $$ = $1 / $3; }| ‘(’ expr ‘)’ { $$ = $2; }| ‘-’ expr %prec UMINUS { $$ = -$2; }| NUMBER;

%%

Double type for attributesand yylval

119

%{#include <ctype.h>#include <stdio.h>#define YYSTYPE double%}%token NUMBER%left ‘+’ ‘-’%left ‘*’ ‘/’%right UMINUS%%lines : lines expr ‘\n’ { printf(“%g\n”, $2); }

| lines ‘\n’| /* empty */;

expr : expr ‘+’ expr { $$ = $1 + $3; }| expr ‘-’ expr { $$ = $1 - $3; }| expr ‘*’ expr { $$ = $1 * $3; }| expr ‘/’ expr { $$ = $1 / $3; }| ‘(’ expr ‘)’ { $$ = $2; }| ‘-’ expr %prec UMINUS { $$ = -$2; }| NUMBER;

%%

Example 2 (cont’d)%%int yylex(){ int c;while ((c = getchar()) == ‘ ‘)

;if ((c == ‘.’) || isdigit(c)){ ungetc(c, stdin);

scanf(“%lf”, &yylval);return NUMBER;

}return c;

}int main(){ if (yyparse() != 0)

fprintf(stderr, “Abnormal exit\n”);return 0;

}int yyerror(char *s){ fprintf(stderr, “Error: %s\n”, s);}

Crude lexical analyzer forfp doubles and arithmeticoperators

120

%%int yylex(){ int c;while ((c = getchar()) == ‘ ‘)

;if ((c == ‘.’) || isdigit(c)){ ungetc(c, stdin);

scanf(“%lf”, &yylval);return NUMBER;

}return c;

}int main(){ if (yyparse() != 0)

fprintf(stderr, “Abnormal exit\n”);return 0;

}int yyerror(char *s){ fprintf(stderr, “Error: %s\n”, s);}

Run the parser

Crude lexical analyzer forfp doubles and arithmeticoperators

Invoked by parserto report parse errors

Combining Lex/Flex with Yacc/Bison

Yacc or Bisoncompiler

yaccspecificationyacc.y

y.tab.cy.tab.h

Lex or Flexcompiler

Lex specificationlex.l

and token definitionsy.tab.h

121

lex.yy.cy.tab.c

inputstream

Ccompiler

a.out outputstream

a.out

Lex or Flexcompiler

Lex specificationlex.l

and token definitionsy.tab.h

lex.yy.c

Lex Specification for Example 2%option noyywrap%{#include “y.tab.h”extern double yylval;%}number [0-9]+\.?|[0-9]*\.[0-9]+%%[ ] { /* skip blanks */ }{number} { sscanf(yytext, “%lf”, &yylval);

return NUMBER;}

\n|. { return yytext[0]; }

Generated by Yacc, contains#define NUMBER xxx

Defined in y.tab.c

122

%option noyywrap%{#include “y.tab.h”extern double yylval;%}number [0-9]+\.?|[0-9]*\.[0-9]+%%[ ] { /* skip blanks */ }{number} { sscanf(yytext, “%lf”, &yylval);

return NUMBER;}

\n|. { return yytext[0]; }

yacc -d example2.ylex example2.lgcc y.tab.c lex.yy.c./a.out

bison -d -y example2.yflex example2.lgcc y.tab.c lex.yy.c./a.out

Error Recovery in Yacc

%{…%}…%%lines : lines expr ‘\n’ { printf(“%g\n”, $2; }

| lines ‘\n’| /* empty */| error ‘\n’ { yyerror(“reenter last line: ”);

yyerrok;}

;…

123

%{…%}…%%lines : lines expr ‘\n’ { printf(“%g\n”, $2; }

| lines ‘\n’| /* empty */| error ‘\n’ { yyerror(“reenter last line: ”);

yyerrok;}

;…

Reset parser to normal modeError production:set error mode and

skip input until newline

syntax analysis - wordpress.com · 1. to check syntax (= string recognizer) – and to report...

Documents