
Compiler Design and Construction

Bottom-Up Parsing

Slides modified from the Louden book, Y. Chung (NTHU), and Fischer & LeBlanc


Outline

6.0 Introduction

6.1 Shift-Reduce Parsers

6.2 LR Parsers

6.3 LR(1) Parsing

6.4 SLR(1) Parsing

6.5 LALR(1)

Fall 2012 Bottom Up Parsing

Parsing

A top-down parser “discovers” the parse tree by starting at the root (start symbol) and expanding (predicting) downward in a depth-first manner.

Top-down parsers predict the derivation before the matching is done.

A bottom-up parser starts at the leaves (terminals) and determines which production generates them.

Then it determines the rules that generate their parents, and so on, until it reaches the root (S).


Bottom-up Parsing Example


Scan the input looking for substrings that appear on the RHS of a rule.

The RHS that must be reduced next for the parse to succeed is called the handle

We can do this left-to-right or right-to-left

Let's use left-to-right

Replace that RHS with the LHS

Repeat until only the start symbol is left, or report an error

Effectively we figure out which rules (in a rightmost derivation) generate our input, but in reverse order

This process is called handle pruning

Top-down Parsing Example

Consider the following input and CFG

Input: begin SimpleStmt; SimpleStmt; end $

How would we generate this string in a rightmost fashion?

<program> → begin <stmts> end $
<stmts> → SimpleStmt ; <stmts>
<stmts> → begin <stmts> end ; <stmts>
<stmts> → λ


<program> => begin <stmts> end $

=> begin SimpleStmt; <stmts> end $

=> begin SimpleStmt; SimpleStmt; <stmts> end $

=> begin SimpleStmt; SimpleStmt; end $


Bottom-up Parsing Example

Input: begin SimpleStmt; SimpleStmt; end $

Replace λ (the empty string just before end) with <stmts>:
begin SimpleStmt; SimpleStmt; <stmts> end $

Replace SimpleStmt; <stmts> with <stmts>:
begin SimpleStmt; <stmts> end $

Replace SimpleStmt; <stmts> with <stmts>:
begin <stmts> end $

Replace begin <stmts> end $ with the start symbol:
<program>

Bottom Up Parsing


Consider this grammar:

S --> a T U e

T --> T b c | b

U --> d

and the rightmost derivation of the sentence a b b c d e:

S ==> a T U e

==> a T d e

==> a T b c d e

==> a b b c d e

Bottom Up Parsing


The bottom-up parsers studied here are LR parsers: they read the input from left to right and trace out a rightmost derivation in reverse order.

There are four steps in the rightmost derivation of a b b c d e:

S ==> a T U e
==> a T d e
==> a T b c d e
==> a b b c d e

A bottom-up parser performs these steps in reverse order, starting from the sentence a b b c d e and ending at S.

Bottom Up Parsing


The parser examines the sentence ( a b b c d e ) for substrings that match the right-sides of productions in the grammar.

There are three cases:

the first (b) in the sentence;

the second (b) in the sentence;

or the (d).

The parser chooses the first b and reduces it to the left-side of the T --> b production to produce the sentential form: a T b c d e .


Bottom Up Parsing


The parser examines the sentential form ( a T b c d e ) for substrings that match the right-sides of productions in the grammar.

There are three cases: ( T b c ), (b), and (d).

The parser chooses ( T b c ) and reduces it to the left-side of the production T --> T b c to produce the sentential form: a T d e.


Bottom Up Parsing


The parser examines the sentential form ( a T d e ) for substrings that match the right-sides of productions in the grammar and finds only one case: (d).

The parser reduces it to the left-side of the production U --> d to produce the sentential form: a T U e.


Bottom Up Parsing


The parser examines the sentential form ( a T U e ) for substrings that match the right-sides of productions in the grammar and finds that the only case is the whole string: ( a T U e ).

The parser reduces it to the left-side of the production S --> a T U e to produce a sentential form containing only the start symbol, S.

Note that each step applies a production in reverse, replacing the right-side with the left-side, so we use the word reduce instead of produce.

Handles


The substring of the sentential form that the parser chooses to reduce in each step of the parse is called the handle for that step.

In the previous example the handles are:

1. the first (b) in ( a b b c d e ).

2. the ( T b c ) substring in ( a T b c d e ).

3. the (d) in ( a T d e ).

4. the whole string, ( a T U e ).

Handles


In step 1 and in step 2 of the example the parser has three possible handles to choose from:

if the parser chooses the wrong handle it won't be able to complete the reverse-ordered rightmost derivation.

The main task of a bottom-up parser is to choose the correct handle at each step of the parse.

There could be many choices on any step;

e.g., the empty string can be inserted into a string of n symbols at any of n + 1 different locations, so a single λ-production in a grammar can give us many possible handles to choose from.
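To make the “three cases” concrete, the following C sketch counts every position where some right-hand side of S --> a T U e, T --> T b c | b, U --> d occurs in a sentential form. It assumes single-character grammar symbols and writes sentential forms without spaces (for example "abbcde"); the function name count_rhs_matches is hypothetical. It only finds candidates: picking the one true handle among them is exactly the hard part a real parser must solve.

```c
#include <assert.h>
#include <string.h>

/* Right-hand sides of S --> a T U e, T --> T b c | b, U --> d,
   one character per grammar symbol. */
static const char *const rhs[] = { "aTUe", "Tbc", "b", "d" };

/* Counts the (position, production) pairs where a RHS occurs as a
   substring of the sentential form `form`; each pair is a candidate handle. */
int count_rhs_matches(const char *form)
{
    assert(form != NULL);
    size_t n = strlen(form);
    int count = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t r = 0; r < sizeof rhs / sizeof rhs[0]; r++) {
            size_t len = strlen(rhs[r]);
            if (i + len <= n && strncmp(form + i, rhs[r], len) == 0)
                count++;
        }
    return count;
}
```

For "abbcde" and "aTbcde" it reports 3 candidates, and for "aTde" and "aTUe" it reports 1, matching the walkthrough above.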

Shift Reduce Parsing


Most bottom-up parsers are implemented as shift-reduce parsers.

Such a parser uses a stack to hold grammar symbols (it is convenient to think of a horizontal stack with its bottom on the left and its top on the right) and has four possible actions:

Shift: Move the next input symbol onto the top (right) of the stack.

Reduce: Reduce a handle on the right-most part of the stack by popping it off the stack and pushing the left-side of the appropriate production onto the right end of the stack.

Accept: Announce successful completion of parsing.

Error: Signal discovery of a syntax error.



We use $ to mark the left-end (bottom) of the stack and also the end of the input string.

Initially the stack is empty (it holds only the bottom marker $).

Parsing ends successfully when the input is empty and the stack contains only the start symbol.

As an example we use the following grammar:

E --> E + E
E --> E * E
E --> ( E )
E --> id

Example (Louden)

Grammar:

E → E + n | n

Input: 2 + 3, or n + n

Parse ($ marks the end of the input and also the bottom of the stack):

   Parsing stack   Input      Action
1  $               n + n $    shift
2  $ n             + n $      reduce E → n
3  $ E             + n $      shift
4  $ E +           n $        shift
5  $ E + n         $          reduce E → E + n
6  $ E             $          accept

Notes:

Left recursion is not a problem in bottom-up parsing. Indeed, as we shall see, lookahead is not as serious an issue either.

Keeping track of what is on the stack, however, is an issue (note the different grammar rules used for the reductions at lines 2 and 5 of the previous example). See the later discussion of stack states.

Right recursion is actually a bit of a problem, because it makes the stack grow large (see next example).


Example

Grammar:

E → n + E | n

Input: 2 + 3, or n + n

Parse:

   Parsing stack   Input      Action
1  $               n + n $    shift
2  $ n             + n $      shift
3  $ n +           n $        shift
4  $ n + n         $          reduce E → n
5  $ n + E         $          reduce E → n + E
6  $ E             $          accept
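The two traces above can be reproduced with a small hand-written shift-reduce recognizer in C. This is a sketch under stated assumptions: tokens are single characters ('n' and '+'), the end of the C string plays the role of $, and each (hypothetical) function decides between shift and reduce with hand-coded rules that inspect the top of the stack and the look-ahead, not with tables generated from the grammar. Both return the largest stack depth reached, or -1 on a syntax error, so the stack growth caused by right recursion shows up directly.

```c
#include <assert.h>
#include <string.h>

#define MAX_STACK 256

/* E -> E + n | n. Returns the maximum stack depth, or -1 on error. */
int parse_left_recursive(const char *in)
{
    char stack[MAX_STACK];
    int top = 0, max_depth = 0;               /* top = symbols on the stack */
    size_t pos = 0;
    for (;;) {
        char la = in[pos] ? in[pos] : '$';    /* look-ahead */
        if (top == 1 && stack[0] == 'n') {
            stack[0] = 'E';                   /* reduce E -> n */
        } else if (top >= 3 && memcmp(stack + top - 3, "E+n", 3) == 0) {
            top -= 3;                         /* reduce E -> E + n */
            stack[top++] = 'E';
        } else if (top == 1 && stack[0] == 'E' && la == '$') {
            return max_depth;                 /* accept */
        } else if (la != '$') {
            assert(top < MAX_STACK);
            stack[top++] = la;                /* shift */
            pos++;
            if (top > max_depth) max_depth = top;
        } else {
            return -1;                        /* error */
        }
    }
}

/* E -> n + E | n. Returns the maximum stack depth, or -1 on error. */
int parse_right_recursive(const char *in)
{
    char stack[MAX_STACK];
    int top = 0, max_depth = 0;
    size_t pos = 0;
    for (;;) {
        char la = in[pos] ? in[pos] : '$';
        if (top >= 1 && stack[top - 1] == 'n' && la == '$') {
            stack[top - 1] = 'E';             /* reduce E -> n, only before $ */
        } else if (top >= 3 && memcmp(stack + top - 3, "n+E", 3) == 0) {
            top -= 3;                         /* reduce E -> n + E */
            stack[top++] = 'E';
        } else if (top == 1 && stack[0] == 'E' && la == '$') {
            return max_depth;                 /* accept */
        } else if (la != '$') {
            assert(top < MAX_STACK);
            stack[top++] = la;                /* shift: nothing reducible yet */
            pos++;
            if (top > max_depth) max_depth = top;
        } else {
            return -1;                        /* error */
        }
    }
}
```

For n + n + n + n the left-recursive recognizer never holds more than 3 symbols (E + n), while the right-recursive one must shift all 7 symbols before its first reduction.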



The following table shows the actions of a shift-reduce parser parsing the input string id1 * ( id2 + id3 ) according to the grammar above.

STACK              INPUT                     ACTION
$                  id1 * ( id2 + id3 ) $     shift
$ id1              * ( id2 + id3 ) $         reduce E --> id
$ E                * ( id2 + id3 ) $         shift
$ E *              ( id2 + id3 ) $           shift
$ E * (            id2 + id3 ) $             shift
$ E * ( id2        + id3 ) $                 reduce E --> id
$ E * ( E          + id3 ) $                 shift
$ E * ( E +        id3 ) $                   shift
$ E * ( E + id3    ) $                       reduce E --> id
$ E * ( E + E      ) $                       reduce E --> E + E
$ E * ( E          ) $                       shift
$ E * ( E )        $                         reduce E --> ( E )
$ E * E            $                         reduce E --> E * E
$ E                $                         accept



Shift-reduce parsers can be constructed for a large class of grammars, the LR grammars, but the construction is usually so complicated that they are only built by parser-construction programs such as YACC.

However, the next section will show that there is a small but important class of grammars for which shift-reduce parsers can easily be constructed by hand.

Introduction(2)

In Chapter 6: Bottom-up parsers

A bottom-up parser, or shift-reduce parser, begins at the leaves and works up to the top of the tree.

The reduction steps trace a rightmost derivation in reverse.

The following example illustrates this.

Grammar:
S → aABe
A → Abc | b
B → d

The input string: abbcde.

Introduction(3)-(12): Bottom-Up Parser Example

(Figure: the bottom-up parsing program reads the INPUT a b b c d e $ and, using the productions S → aABe, A → Abc, A → b, B → d, outputs the parse tree built from the leaves up.)

Stack        Input            Action
$            a b b c d e $    shift a
$ a          b b c d e $      shift b
$ a b        b c d e $        reduce b to A (A → b)
$ a A        b c d e $        shift b
$ a A b      c d e $          shift c
$ a A b c    d e $            reduce Abc to A (A → Abc)
$ a A        d e $            shift d
$ a A d      e $              reduce d to B (B → d)
$ a A B      e $              shift e
$ a A B e    $                reduce aABe to S (S → aABe)
$ S          $                accept (hit the target $)

This parser is known as an LR parser because it scans the input from Left to right, and it constructs a Rightmost derivation in reverse order.


Introduction(13)

Conclusion

Scanning the productions for a match with a handle in the input string, and backtracking whenever the wrong handle is chosen, makes the method used in the previous example very inefficient.

Can we do better? We discuss this later.



6.2 LR Parsers

6.3 LR(1) Parsing

6.4 SLR(1) Parsing

6.5 LALR(1)


Parse Trees

Phrase – sequence of tokens descended from a nonterminal

Simple phrase – phrase that contains no smaller phrase at the leaves

Handle – the leftmost simple phrase

Shift-Reduce Parsers(1)

A shift-reduce (bottom-up) parser is known as an LR parser:

It scans the input from Left to right

It produces a Rightmost derivation in reverse order

Kinds of LR parsers

LR(k): the most powerful deterministic bottom-up parsing method, using k look-ahead tokens

SLR(k)

LALR(k)

Mechanism: bottom-up parsing uses a finite state machine to manipulate the “handle”

Components: parse stack, shift-reduce driver, action table, goto table

Shift-Reduce Parsers(2)

Parse stack

Initially empty; contains the symbols already parsed

Elements in the stack are terminal or non-terminal symbols

For a syntactically correct input, the parse stack catenated with the remaining input always represents a right sentential form


Shift-Reduce driver

Shift -- when the top of the stack doesn't contain a handle of the sentential form: push the input token (with contextual information) onto the stack

Reduce -- when the top of the stack contains a handle: pop the handle, then push the reduced non-terminal (with contextual information)

Success when no input is left and the goal symbol is on the stack


Two questions

– Have we reached the end of a handle, and how long is the handle?

– Which non-terminal does the handle reduce to?

We use tables to answer the questions:

ACTION table

GOTO table


LR parsers are driven by two tables:

Action table, which specifies the action to take: shift, reduce, accept (terminate with success), or error

Goto table, which specifies state transitions: it defines the successor state after a token or LHS is matched and shifted

Parse stack – contains parse states (not symbols)

Each state encodes the shifted symbol and the handles that are being matched (a possible sub-tree of the parse tree)

Shift-Reduce Parsers(6): grammar G0

1. <program> → begin <stmts> end $
2. <stmts> → SimpleStmt ; <stmts>
3. <stmts> → begin <stmts> end ; <stmts>
4. <stmts> → λ

(Figure: Action table and Goto table for G0; a blank entry means ERROR.)

Shift-reduce parser (S = top parse-stack state, T = current input token):

push(S0)                      // start state
loop forever
    case Action(S, T)
        error  => ReportSyntaxError()
        accept => CleanUpAndFinish()
        shift  => Push(GoTo(S, T))
                  Scanner(T)  // yylex()
        reduce => assume production X → Y1...Ym
                  Pop(m)      // S' is the new stack top
                  Push(GoTo(S', X))


void shift_reduce_driver(void)
{
    /* Push the start state, S0, onto an empty parse stack. */
    push(S0);
    while (TRUE) {   /* forever */
        /* Let S be the top parse-stack state;
         * let T be the current input token. */
        switch (action[S][T]) {
        case ERROR:
            announce_syntax_error();
            return;  /* stop: retrying the same entry would loop forever */
        case ACCEPT:
            /* The input has been correctly parsed. */
            clean_up_and_finish();
            return;
        case SHIFT:
            push(go_to[S][T]);
            scanner(&T);  /* Get next token. */
            break;
        case REDUCEi:
            /* Assume the i-th production is X -> Y1 ... Ym.
             * Remove the states corresponding to the RHS of the production. */
            pop(m);
            /* S' is the new stack top. */
            push(go_to[S'][X]);
            break;
        }
    }
}


Shift-Reduce Parsers(8)-(17): action table for grammar G0 (S = shift, Rn = reduce by production n, A = accept, . = error)

Symbol \ State  0    1    2    3    4    5    6    7    8    9    10   11
begin           S    S    .    .    S    .    S    .    .    S    .    .
end             .    R4   S    .    R4   .    R4   S    .    R4   R2   R3
;               .    .    .    .    .    S    .    .    S    .    .    .
SimpleStmt      .    S    .    .    S    .    S    .    .    S    .    .
$               .    .    .    A    .    .    .    .    .    .    .    .
tracing steps

Step  Parse Stack       Remaining Input                          Action
(1)   0                 begin SimpleStmt ; SimpleStmt ; end $    Shift 1
(2)   0,1               SimpleStmt ; SimpleStmt ; end $          Shift 5
(3)   0,1,5             ; SimpleStmt ; end $                     Shift 6
(4)   0,1,5,6           SimpleStmt ; end $                       Shift 5
(5)   0,1,5,6,5         ; end $                                  Shift 6
(6)   0,1,5,6,5,6       end $                                    Reduce 4   /* goto(6,<stmts>) = 10 */
(7)   0,1,5,6,5,6,10    end $                                    Reduce 2   /* goto(6,<stmts>) = 10 */
(8)   0,1,5,6,10        end $                                    Reduce 2   /* goto(1,<stmts>) = 2 */
(9)   0,1,2             end $                                    Shift 3
(10)  0,1,2,3           $                                        Accept


Parse tree for the trace (numbers in parentheses are trace steps; Rn(k) marks the reduction by production n at step k):

<program>
    begin(1)
    <stmts>  R2(8)
        SimpleStmt(2)
        ;(3)
        <stmts>  R2(7)
            SimpleStmt(4)
            ;(5)
            <stmts>  R4(6)
                λ
    end(9)
    $(10)
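A runnable version of shift_reduce_driver for G0 can be sketched in C. The action entries follow the action table above. The goto entries used in the trace (goto(0,begin)=1, goto(1,SimpleStmt)=5, goto(5,;)=6, goto(6,SimpleStmt)=5, goto(6,<stmts>)=10, goto(1,<stmts>)=2, goto(2,end)=3) come from the slides; the entries involving states 4, 7, 8, 9 and 11 are our own inference from the action table and should be treated as an assumption. Token names, lr_parse and its signature are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Terminals of G0 (T_EOF is $), then the nonterminal <stmts>. */
enum { T_BEGIN, T_END, T_SEMI, T_SIMPLE, T_EOF, NT_STMTS, NUM_SYMBOLS };
enum { ERR = 0, SHIFT, REDUCE, ACCEPT };      /* ERR = 0: blank entries are errors */
enum { NUM_STATES = 12, MAX_DEPTH = 64 };

typedef struct { unsigned char kind, prod; } Action;   /* prod used by REDUCE */

static const Action action[NUM_STATES][T_EOF + 1] = {
    [0]  = { [T_BEGIN] = {SHIFT, 0} },
    [1]  = { [T_BEGIN] = {SHIFT, 0}, [T_SIMPLE] = {SHIFT, 0}, [T_END] = {REDUCE, 4} },
    [2]  = { [T_END] = {SHIFT, 0} },
    [3]  = { [T_EOF] = {ACCEPT, 0} },
    [4]  = { [T_BEGIN] = {SHIFT, 0}, [T_SIMPLE] = {SHIFT, 0}, [T_END] = {REDUCE, 4} },
    [5]  = { [T_SEMI] = {SHIFT, 0} },
    [6]  = { [T_BEGIN] = {SHIFT, 0}, [T_SIMPLE] = {SHIFT, 0}, [T_END] = {REDUCE, 4} },
    [7]  = { [T_END] = {SHIFT, 0} },
    [8]  = { [T_SEMI] = {SHIFT, 0} },
    [9]  = { [T_BEGIN] = {SHIFT, 0}, [T_SIMPLE] = {SHIFT, 0}, [T_END] = {REDUCE, 4} },
    [10] = { [T_END] = {REDUCE, 2} },
    [11] = { [T_END] = {REDUCE, 3} },
};

/* go_to[S][X]; entries never consulted by a consistent table stay 0. */
static const unsigned char go_to[NUM_STATES][NUM_SYMBOLS] = {
    [0] = { [T_BEGIN] = 1 },
    [1] = { [T_BEGIN] = 4, [T_SIMPLE] = 5, [NT_STMTS] = 2 },
    [2] = { [T_END] = 3 },
    [4] = { [T_BEGIN] = 4, [T_SIMPLE] = 5, [NT_STMTS] = 7 },
    [5] = { [T_SEMI] = 6 },
    [6] = { [T_BEGIN] = 4, [T_SIMPLE] = 5, [NT_STMTS] = 10 },
    [7] = { [T_END] = 8 },
    [8] = { [T_SEMI] = 9 },
    [9] = { [T_BEGIN] = 4, [T_SIMPLE] = 5, [NT_STMTS] = 11 },
};

static const int rhs_len[5] = { 0, 4, 3, 5, 0 };   /* indexed by production 1..4 */

/* Parses tokens[] (terminated by T_EOF), recording each reduced production
   in red[] and their count in *nred. Returns 1 = accept, 0 = syntax error. */
int lr_parse(const int *tokens, int *red, int max_red, int *nred)
{
    int stack[MAX_DEPTH], top = 0;
    size_t pos = 0;
    *nred = 0;
    stack[top++] = 0;                                   /* push(S0) */
    for (;;) {
        int s = stack[top - 1], t = tokens[pos];        /* S and T */
        Action a = action[s][t];
        switch (a.kind) {
        case SHIFT:
            assert(top < MAX_DEPTH);
            stack[top++] = go_to[s][t];
            pos++;                                      /* scanner(&T) */
            break;
        case REDUCE:                                    /* every LHS here is <stmts> */
            top -= rhs_len[a.prod];                     /* pop(m) */
            assert(top >= 1 && top < MAX_DEPTH && *nred < max_red);
            red[(*nred)++] = a.prod;
            stack[top] = go_to[stack[top - 1]][NT_STMTS];   /* push(go_to[S'][X]) */
            top++;
            break;
        case ACCEPT:
            return 1;
        default:
            return 0;                                   /* ERROR */
        }
    }
}
```

On begin SimpleStmt ; SimpleStmt ; end $ the driver accepts after the reductions 4, 2, 2, which are steps (6) to (8) of the trace.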



6.2 LR Parsers

6.3 LR(1) Parsing

6.4 SLR(1) Parsing

6.5 LALR(1)

6.6 Calling Semantic Routines in Shift-Reduce Parsers

6.7 Using a Parser Generator (TA course)

6.8 Optimizing Parse Tables

6.9 Practical LR(1) Parsers

6.10 Properties of LR Parsing

6.11 LL(1) or LALR(1), That Is the Question

6.12 Other Shift-Reduce Techniques

LR Parsers: LR(n), n = 0 ~ k

Read the input from Left to right, produce a Rightmost derivation (in reverse), using n look-ahead tokens

LR parsers are deterministic

No backup or retry parsing actions

LR(0):

No look-ahead: reads from Left, Rightmost derivation, 0 look-ahead

LR(1):

1-token look-ahead

General LR(k) parsers

Decide the next action by examining the tokens already shifted and at most k look-ahead tokens

The most powerful deterministic bottom-up parsers

Difficult to implement

LR(0) Table Construction(1)

A production has the form

A → X1X2…Xj

By adding a dot, we get a configuration (or an item):

A → •X1X2…Xj

A → X1X2…Xi • Xi+1 … Xj

A → X1X2…Xj •

The • indicates how much of a RHS has been shifted onto the stack. An item (configuration) tells you where you are in a parse!

These are LR(0) configurations since no look-ahead information is used.

An item with the • at the end of the RHS, such as A → X1X2…Xj •, indicates that the RHS should be reduced to the LHS: the parser has recognized that production.

An item with the • at the beginning of the RHS, such as A → •X1X2…Xj, predicts that production: its RHS is about to be shifted onto the stack.


LR(0) Table Construction(2)

An LR(0) state is a set of configurations.

The parser's actual position corresponds to one of the items (configurations) in the set; the state records all of them because the parser cannot yet tell which one applies.

The closure0 operation: if there is a configuration B → α • A β in the set, where A is a nonterminal, then add all configurations of the form A → • γ to the set.

The initial configuration set: s0 = closure0({S → • α $}), where S → α $ is the augmenting production.

A configuration set is all possible configurations at a given point during a parse.

configuration_set closure0(configuration_set s)
{
    configuration_set s' = s;
    do {
        for (each configuration B → α • A β ∈ s' with A ∈ Vn) {
            /* Predict productions with A as the LHS. */
            Add all configurations of the form A → • γ to s';
        }
    } while (more new configurations can be added);
    return s';
}

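EX: for grammar G1:

1. S' → S $

2. S → ID | λ

closure0({ S' → • S $ }) = { S' → • S $, S → • ID, S → • }

Special case: for the λ-production S → λ the item is simply S → • (the dot is already at the end).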

LR(0) Table Construction(3)

• Q1: Why does the grammar use S' → S $?

• Ans: To check for the end of the parse.

EX: If S' did not exist, the grammar would need

S → ID $

S → λ $

When we work bottom-up to reduce to the original start symbol S, there are two paths to achieve it, and each must check for the end marker $. In complex grammars such as C there are many such paths; the augmenting production S' → S $ checks for $ in exactly one place.

Given a configuration set s, we can compute its successor, s', under a symbol X.

Denoted go_to0(s, X) = s'

configuration_set go_to0(configuration_set s, symbol X)
{
    sb = Ø;
    for (each configuration c ∈ s)
        if (c is of the form A → β • X γ)
            Add A → β X • γ to sb;
    /* That is, we advance the • past the symbol X, if possible.
     * Configurations not having a dot preceding an X are not included in sb. */
    /* Add new predictions to sb via closure0. */
    return closure0(sb);
}


void build_CFSM(void)
{
    S = SET_OF(S0);
    while (S is nonempty) {
        Remove a configuration set s from S;
        /* Consider both terminals and non-terminals */
        for (X in Symbols) {
            if (go_to0(s, X) does not label a CFSM state) {
                Create a new CFSM state and label it with go_to0(s, X);
                Add go_to0(s, X) to S;
            }
            Create a transition under X from the state labeled s
            to the state labeled go_to0(s, X);
        }
    }
}

Because the grammar is finite, the number of configurations and configuration sets is also finite, so build_CFSM terminates.

Characteristic finite state machine (CFSM)

Built by identifying configuration sets and successor operations with CFSM states and transitions

It is a finite automaton
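The closure0, go_to0 and build_CFSM pseudocode can be turned into a small C program. This sketch makes several simplifying assumptions of its own: grammar symbols are single characters (upper-case letters are nonterminals, everything else is a terminal), production 0 is the augmenting production, there are at most 8 productions with right-hand sides of at most 7 symbols, and the item "production p with the dot at position d" is stored as bit 8·p + d of a 64-bit set. The empty set (the error state) is not stored. ItemSet, ITEM and build_cfsm are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <string.h>

/* lhs -> rhs; upper-case letters are nonterminals, "" is a lambda-production. */
typedef struct { char lhs; const char *rhs; } Prod;
typedef uint64_t ItemSet;                      /* bit 8*p + d: item (p, dot d) */

#define ITEM(p, d) ((ItemSet)1 << ((p) * 8 + (d)))

/* Symbol right after the dot of item b ('\0' if the dot is at the end). */
static char after_dot(const Prod *g, int b) { return g[b / 8].rhs[b % 8]; }

static ItemSet closure0(const Prod *g, int n, ItemSet s)
{
    ItemSet before;
    do {
        before = s;
        for (int b = 0; b < 64; b++) {
            char x = (s >> b & 1) ? after_dot(g, b) : '\0';
            if (isupper((unsigned char)x))     /* B -> alpha . A beta: predict A */
                for (int q = 0; q < n; q++)
                    if (g[q].lhs == x) s |= ITEM(q, 0);
        }
    } while (s != before);                     /* until nothing new is added */
    return s;
}

static ItemSet go_to0(const Prod *g, int n, ItemSet s, char x)
{
    ItemSet sb = 0;
    for (int b = 0; b < 64; b++)
        if ((s >> b & 1) && after_dot(g, b) == x)   /* A -> beta . X gamma */
            sb |= (ItemSet)1 << (b + 1);            /* A -> beta X . gamma */
    return closure0(g, n, sb);
}

/* Builds the CFSM of g (n productions, production 0 augmenting), stores the
   states in states[] and returns how many non-empty states there are. */
int build_cfsm(const Prod *g, int n, ItemSet states[], int max_states)
{
    char symbols[64];
    int nsym = 0, count = 0;
    assert(n <= 8);
    for (int p = 0; p < n; p++) {              /* collect every grammar symbol */
        assert(strlen(g[p].rhs) < 8);
        for (const char *c = g[p].rhs; *c; c++)
            if (!memchr(symbols, *c, (size_t)nsym)) symbols[nsym++] = *c;
    }
    states[count++] = closure0(g, n, ITEM(0, 0));  /* s0 */
    for (int i = 0; i < count; i++)            /* states[i..count) = worklist */
        for (int k = 0; k < nsym; k++) {
            ItemSet t = go_to0(g, n, states[i], symbols[k]);
            int j = 0;
            if (t == 0) continue;              /* empty set = error state */
            while (j < count && states[j] != t) j++;
            if (j == count) {                  /* a new CFSM state */
                assert(count < max_states);
                states[count++] = t;
            }
        }
    return count;
}
```

With G1 written as Z → S$, S → i, S → "" (Z standing for S' and i for ID), build_cfsm should find the 4 non-empty states 0-3; with G2 (S → E$, E → E+T, E → T, T → i, T → (E)) it should find 10. The numbering may differ from the slides because it depends on the order in which symbols are tried.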



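For grammar G1 (1. S' → S $, 2. S → ID | λ):

state 0: S' → • S $, S → • ID, S → •

state 1 (from state 0 on ID): S → ID •

state 2 (from state 0 on S): S' → S • $

state 3 (from state 2 on $): S' → S $ •

state 4: error (the empty configuration set)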

int **build_go_to_table(finite_automaton CFSM)
{
    const int N = num_states(CFSM);
    int **tab;
    Dynamically allocate a table of dimension N × num_symbols(CFSM)
        to represent the go_to table and assign it to tab;
    Number the states of CFSM from 0 to N-1, with the Start State labeled 0;
    for (S = 0; S <= N-1; S++) {
        /* Consider both terminals and non-terminals. */
        for (X in Symbols) {
            if (State S has a transition under X to some state T)
                tab[S][X] = T;
            else
                tab[S][X] = EMPTY;
        }
    }
    return tab;
}

LR(0) Table Construction(6)

The CFSM is the goto table of LR(0) parsers. For G1 (state 4 is the error state):

goto table

State   ID   $    S
0       1    4    2
1       4    4    4
2       4    3    4
3       4    4    4
4

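Because LR(0) uses no look-ahead, we must extract the action function directly from the configuration sets of the CFSM.

Let Q = {Shift, Reduce1, Reduce2, …, Reducen}, where n is the number of productions in the CFG.

Let S0 be the set of CFSM states.

The projection P maps each CFSM state to the appropriate subset of Q:

\( P : S_0 \to 2^Q \), where \( 2^Q \) is the power set of Q

\( P(s) = \{\mathrm{Reduce}_i \mid B \to \rho\,\bullet \in s \text{ and production } i \text{ is } B \to \rho\} \cup (\text{if } A \to \alpha \bullet a \beta \in s \text{ for some } a \in V_t \text{ then } \{\mathrm{Shift}\} \text{ else } \emptyset) \)

G is LR(0) if and only if \( \forall s \in S_0,\ |P(s)| \le 1 \)

If G is LR(0), the action table is trivially extracted from P:

\( P(s) = \{\mathrm{Shift}\} \Rightarrow \mathrm{action}[s] = \mathrm{Shift} \)

\( P(s) = \{\mathrm{Reduce}_j\} \), where production j is the augmenting production \( \Rightarrow \mathrm{action}[s] = \mathrm{Accept} \)

\( P(s) = \{\mathrm{Reduce}_i\},\ i \ne j \Rightarrow \mathrm{action}[s] = \mathrm{Reduce}_i \)

\( P(s) = \emptyset \Rightarrow \mathrm{action}[s] = \mathrm{Error} \)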


Action table for grammar G1 (1. S' → S $, 2. S → ID | λ):

state    0       1     2       3
action   Shift   R2    Shift   Accept

Note: state 0 also contains the complete item S → • (from S → λ), so strictly P(0) = {Shift, Reduce2}: state 0 is inadequate and G1 is not LR(0). This is the λ-production problem described next.



Any state s ∈ S0 for which |P(s)| > 1 is said to be inadequate.

Two kinds of parser conflicts create inadequacies in configuration sets:

Shift-reduce conflicts

Reduce-reduce conflicts

We may be able to resolve an inadequacy by using look-ahead.

It is easy to introduce inadequacies in CFSM states; hence, few real grammars are LR(0). For example,

Consider λ-productions

The only possible configuration involving a λ-production is of the form A → λ•

However, if A can generate any terminal string other than λ, then a shift action must also be possible (on a terminal in First(A))

LR(0) parsers also have problems handling operator precedence properly


LR(0) Tracing Example(0)

Before tracing, we need to build the CFSM.

for grammar G2:

1. S → E $

2. E → E + T

3. E → T

4. T → id

5. T → ( E )

closure0({ T → ( • E ) }) = { T → ( • E ), E → • E + T, E → • T, T → • id, T → • ( E ) }

(Figure: after shifting "(", the possible partial parse trees below it are those predicted by the items above.)

CFSM for G2:

state 0 = closure0({ S → • E $ }) = { S → • E $, E → • E + T, E → • T, T → • id, T → • ( E ) }

state 1 = go_to0(state 0, E) = { S → E • $, E → E • + T } (its closure0 is itself)

state 2 = go_to0(state 1, $) = { S → E $ • }

state 3 = go_to0(state 1, +) = closure0({ E → E + • T }) = { E → E + • T, T → • id, T → • ( E ) }

state 4 = go_to0(state 3, T) = { E → E + T • }

state 5 = go_to0(state 0, id) = { T → id • } (also reached from states 3 and 6 on id)

state 6 = go_to0(state 0, "(") = closure0({ T → ( • E ) }) = { T → ( • E ), E → • E + T, E → • T, T → • id, T → • ( E ) } (also reached from states 3 and 6 on "(")

state 7 = go_to0(state 6, E) = { T → ( E • ), E → E • + T } (on + it goes to state 3)

state 8 = go_to0(state 7, ")") = { T → ( E ) • }

state 9 = go_to0(state 0, T) = { E → T • } (also reached from state 6 on T)

state 10: error (the empty set)

action table (LR(0): the action depends only on the state, not on the input symbol)

State   0   1   2   3   4   5   6   7   8   9   10
action  S   S   A   S   R2  R4  S   S   R5  R3  Error


LR(0) Tracing Example(12)

goto table (blank = error state 10)

State   S   E   T   +   id  (   )   $
0           1   9       5   6
1                   3               2
2
3               4       5   6
4
5
6           7   9       5   6
7                   3           8
8
9
10


Program Example: tracing the input (id)$

Step  Stack      Input     Action
1     0          (id)$     shift (
2     0 6        id)$      shift id
3     0 6 5      )$        reduce 4 (T → id)
4     0 6 9      )$        reduce 3 (E → T)
5     0 6 7      )$        shift )
6     0 6 7 8    $         reduce 5 (T → ( E ))
7     0 9        $         reduce 3 (E → T)
8     0 1        $         shift $
9     0 1 2                Accept

Resulting parse tree:

S
    E
        T
            (
            E
                T
                    id
            )
    $
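The action and goto tables for G2 can be plugged into a table-driven LR(0) driver. In this C sketch the table contents are copied from the slides; the column order, the character 'i' for id, treating the end of the C string as $, and the function names are our assumptions. Because G2 is LR(0), the action is looked up by state alone, and a transition into the error state 10 is simply pushed and then reported on the next step.

```c
#include <assert.h>
#include <stddef.h>

/* Goto-table columns in the order used on the slides. */
enum { SYM_S, SYM_E, SYM_T, SYM_PLUS, SYM_ID, SYM_LP, SYM_RP, SYM_EOF, NSYM };
enum { ACT_SHIFT = -1, ACT_ACCEPT = -2, ACT_ERROR = -3 };  /* n > 0: reduce n */

#define ER 10   /* the error state */

static const int action0[11] = {
    ACT_SHIFT, ACT_SHIFT, ACT_ACCEPT, ACT_SHIFT, 2, 4,
    ACT_SHIFT, ACT_SHIFT, 5, 3, ACT_ERROR
};

static const int go_to[11][NSYM] = {
    /*  S   E   T   +  id   (   )   $ */
    { ER,  1,  9, ER,  5,  6, ER, ER },   /* 0 */
    { ER, ER, ER,  3, ER, ER, ER,  2 },   /* 1 */
    { ER, ER, ER, ER, ER, ER, ER, ER },   /* 2 */
    { ER, ER,  4, ER,  5,  6, ER, ER },   /* 3 */
    { ER, ER, ER, ER, ER, ER, ER, ER },   /* 4 */
    { ER, ER, ER, ER, ER, ER, ER, ER },   /* 5 */
    { ER,  7,  9, ER,  5,  6, ER, ER },   /* 6 */
    { ER, ER, ER,  3, ER, ER,  8, ER },   /* 7 */
    { ER, ER, ER, ER, ER, ER, ER, ER },   /* 8 */
    { ER, ER, ER, ER, ER, ER, ER, ER },   /* 9 */
    { ER, ER, ER, ER, ER, ER, ER, ER },   /* 10 */
};

static const int lhs[6]     = { 0, SYM_S, SYM_E, SYM_E, SYM_T, SYM_T };
static const int rhs_len[6] = { 0, 2, 3, 1, 1, 3 };

static int symbol_of(char c)
{
    switch (c) {
    case '+':  return SYM_PLUS;
    case 'i':  return SYM_ID;      /* 'i' stands for id */
    case '(':  return SYM_LP;
    case ')':  return SYM_RP;
    case '\0': return SYM_EOF;     /* end of string plays the role of $ */
    default:   return -1;
    }
}

/* Returns the number of reductions on accept (stored in red[]), or -1 on error. */
int lr0_parse(const char *in, int *red, int max_red)
{
    int stack[64], top = 0, nred = 0;
    size_t pos = 0;
    stack[top++] = 0;
    for (;;) {
        int s = stack[top - 1], a = action0[s];
        if (a == ACT_ACCEPT) return nred;
        if (a == ACT_ERROR) return -1;
        if (a == ACT_SHIFT) {
            int t = symbol_of(in[pos]);
            if (t < 0) return -1;                /* character outside the grammar */
            assert(top < 64);
            stack[top++] = go_to[s][t];          /* may push the error state */
            if (t != SYM_EOF) pos++;
        } else {                                 /* reduce by production a */
            top -= rhs_len[a];
            assert(top >= 1 && nred < max_red);
            red[nred++] = a;
            stack[top] = go_to[stack[top - 1]][lhs[a]];
            top++;
        }
    }
}
```

On "(i)" this should reproduce the trace above: reductions 4, 3, 5, 3, then accept.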



6.2 LR Parsers

6.3 LR(1) Parsing

6.4 SLR(1) Parsing

6.5 LALR(1)

LR(1) Parsing (1)

An LR(1) configuration, or item, is of the form

A → X1X2…Xi • Xi+1 … Xj, l   where l ∈ Vt ∪ {λ}

The look-ahead component l represents a possible look-ahead after the entire right-hand side has been matched

λ appears as look-ahead only for the augmenting production, because there is no look-ahead after the end-marker

We use the following notation to represent the set of LR(1) configurations that share the same dotted production:

A → X1X2…Xi • Xi+1 … Xj, {l1…lm}
  = {A → X1X2…Xi • Xi+1 … Xj, l1}
  ∪ {A → X1X2…Xi • Xi+1 … Xj, l2}
  ∪ …
  ∪ {A → X1X2…Xi • Xi+1 … Xj, lm}

LR(1) Parsing (2)

There are many more distinct LR(1) configurations than LR(0) configurations.

In fact, the major difficulty with LR(1) parsers is not their power but rather finding ways to represent them in storage-efficient ways.

Parsing begins with the configuration set closure1({S → • α $, {λ}}), where S → α $ is the augmenting production.

configuration_set closure1(configuration_set s)
{
    configuration_set s' = s;
    do {
        for (each configuration B → δ • A ρ, l ∈ s' with A ∈ Vn) {
            /* Predict productions with A as the left-hand side.
             * Possible look-aheads are First(ρ l). */
            Add all configurations of the form A → • γ, u where u ∈ First(ρ l) to s';
        }
    } while (more new configurations can be added);
    return s';
}

for grammar G2:

1. S → E $
2. E → E + T
3. E → T
4. T → id
5. T → ( E )

closure1({ S → • E $, {λ} }) = { S → • E $, {λ}; E → • E + T, {$+}; E → • T, {$+}; T → • id, {$+}; T → • ( E ), {$+} }

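LR(1) Parsing (3)

Tracing Example for grammar G2:

1. S → E $
2. E → E + T
3. E → T
4. T → id
5. T → ( E )

closure1({ S → • E $, {λ} }):

S → • E $, {λ} predicts E with look-ahead First($) = {$}:
    E → • E + T, {$}    E → • T, {$}
E → • T, {$} predicts:
    T → • id, {$}    T → • ( E ), {$}
E → • E + T predicts E with look-ahead First(+ T) = {+}:
    E → • E + T, {+}    E → • T, {+}
E → • T, {+} predicts:
    T → • id, {+}    T → • ( E ), {+}

closure1({ S → • E $, {λ} }) = { S → • E $, {λ}; E → • E + T, {$+}; E → • T, {$+}; T → • id, {$+}; T → • ( E ), {$+} }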
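The closure1 computation traced above can be sketched in C. Assumptions (ours, not the slides'): grammar symbols are single characters with upper-case nonterminals and 'i' for id; the grammar has no λ-productions, so First(ρ l) is First of the first symbol of ρ, or {l} when ρ is empty; and the look-ahead set of the item "production p, dot at d" is a bit mask stored at index 8·p + d, with a non-zero mask meaning the item is present. The names closure1, first_sym and term_bit are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* lhs -> rhs; upper-case letters are nonterminals. No lambda-productions. */
typedef struct { char lhs; const char *rhs; } Prod;

/* Look-ahead sets are bit masks: bit k = TERMS[k]; LAMBDA_LA stands for λ. */
static const char TERMS[] = "$+*i()";
#define LAMBDA_LA 0x40u

static unsigned term_bit(char c)
{
    const char *p = strchr(TERMS, c);
    assert(c != '\0' && p != NULL);
    return 1u << (p - TERMS);
}

/* First(x) by fixed-point iteration; valid only without lambda-productions. */
static unsigned first_sym(const Prod *g, int n, char x)
{
    unsigned f[26] = {0};
    int changed;
    if (!isupper((unsigned char)x)) return term_bit(x);
    do {
        changed = 0;
        for (int p = 0; p < n; p++) {
            char y = g[p].rhs[0];
            unsigned add = isupper((unsigned char)y) ? f[y - 'A'] : term_bit(y);
            unsigned *dst = &f[g[p].lhs - 'A'];
            if ((*dst | add) != *dst) { *dst |= add; changed = 1; }
        }
    } while (changed);
    return f[x - 'A'];
}

/* For every B -> delta . A rho, l in the set, adds A -> . gamma, First(rho l). */
void closure1(const Prod *g, int n, unsigned la[64])
{
    int changed;
    do {
        changed = 0;
        for (int b = 0; b < 64; b++) {
            if (la[b] == 0) continue;                  /* item not in the set */
            const char *rhs = g[b / 8].rhs;
            char a = rhs[b % 8];                       /* symbol after the dot */
            if (!isupper((unsigned char)a)) continue;
            char rho = rhs[b % 8 + 1];                 /* first symbol of rho */
            unsigned u = rho ? first_sym(g, n, rho) : la[b];   /* First(rho l) */
            for (int q = 0; q < n; q++)
                if (g[q].lhs == a && (la[8 * q] | u) != la[8 * q]) {
                    la[8 * q] |= u;
                    changed = 1;
                }
        }
    } while (changed);
}
```

Seeding S → • E $ with {λ} in grammar G3 below should give {$+} for the E items and {$+*} for the T and P items, as in state 0 of its LR(1) machine; seeding P → ( • E ), {$+*} instead should give {)+} and {)+*}, the look-aheads of state 6.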

LR(1) Parsing (4)

Given an LR(1) configuration set s, we compute its successor, s', under a symbol X: go_to1(s, X)

configuration_set go_to1(configuration_set s, symbol X)
{
    sb = Ø;
    for (each configuration c ∈ s)
        if (c is of the form A → β • X γ, l)
            Add A → β X • γ, l to sb;   /* same test as in go_to0; l is carried along */
    /* That is, we advance the • past the symbol X, if possible.
     * Configurations not having a dot preceding an X are not included in sb. */
    /* Add new predictions to sb via closure1. */
    return closure1(sb);
}

LR(1) Parsing (5)

We can build a finite automaton that is the analogue of the LR(0) CFSM: the LR(1) FSM, or LR(1) machine.

The relationship between the CFSM and the LR(1) machine: by merging the LR(1) machine's configuration sets that differ only in their look-aheads, we can obtain the CFSM.

void build_LR1(void)
{
    Create the Start State of the FSM; label it with s0;
    Put s0 into an initially empty set, S;
    while (S is nonempty) {
        Remove a configuration set s from S;
        /* Consider both terminals and non-terminals */
        for (X in Symbols) {
            if (go_to1(s, X) does not label an FSM state) {
                Create a new FSM state and label it with go_to1(s, X);
                Put go_to1(s, X) into S;
            }
            Create a transition under X from the state labeled s
            to the state labeled go_to1(s, X);
        }
    }
}

Tracing Example:

for grammar G3:

1. S → E $
2. E → E + T
3. E → T
4. T → T * P
5. T → P
6. P → id
7. P → ( E )

LR(1) machine for G3 (states in the order they are built on the slides):

state 0: S → • E $, {λ}; E → • E + T, {$+}; E → • T, {$+}; T → • T * P, {$+*}; T → • P, {$+*}; P → • id, {$+*}; P → • ( E ), {$+*}

state 1 (from state 0 on E): S → E • $, {λ}; E → E • + T, {$+}

state 2 (from state 1 on $): S → E $ •, {λ}   // Accept

state 3 (from state 1 on +): E → E + • T, {$+}; T → • T * P, {$+*}; T → • P, {$+*}; P → • id, {$+*}; P → • ( E ), {$+*}

state 4 (from state 0 on P): T → P •, {$+*}

state 5 (from state 0 on id): P → id •, {$+*}

state 6 (from state 0 on "("): P → ( • E ), {$+*}; E → • E + T, {)+}; E → • T, {)+}; T → • T * P, {)+*}; T → • P, {)+*}; P → • id, {)+*}; P → • ( E ), {)+*}

Be careful of look-ahead!! Inside the parentheses the predicted items have ")" as a look-ahead instead of "$".

state 7 (from state 0 on T): E → T •, {$+}; T → T • * P, {$+*}

state 8 (from state 7 on *): T → T * • P, {$+*}; P → • id, {$+*}; P → • ( E ), {$+*}

state 9 (from state 8 on P): T → T * P •, {$+*}

state 10 (from state 6 on id): P → id •, {)+*}

state 11 (from state 3 on T): E → E + T •, {$+}; T → T • * P, {$+*} (on * it goes to state 8)

state 12 (from state 6 on E): P → ( E • ), {$+*}; E → E • + T, {)+}

state 13 (from state 12 on ")"): P → ( E ) •, {$+*}

state 14 (from state 6 on P): T → P •, {)+*}

state 18 (from state 6 on "("): P → ( • E ), {)+*}; E → • E + T, {)+}; E → • T, {)+}; T → • T * P, {)+*}; T → • P, {)+*}; P → • id, {)+*}; P → • ( E ), {)+*}

(States 5 and 10, and states 4 and 14, have the same items with different look-aheads.)

118


T

(

id P

State 14 State 10

E state 16 P (E ) ,{)+*} E E +T,{)+}

( +

LR(1) Parsing (16)


119


T

(

id P

State 14 State 10

E state 16 P (E ) ,{)+*} E E +T,{)+}

( +

state 15 P (E ) ,{)+*}

LR(1) Parsing (17)


120


T

(

id P

State 14 State 10

E state 16 P (E ) ,{)+*} E E +T,{)+}

(

+

state 15 P (E ) ,{)+*}

state 17 E E +T,{)+} T T*P ,{)+*} T P ,{)+*} P id ,{)+*} P (E) ,{)+*}

P

id

(

T

LR(1) Parsing (18)

Renew state 12

->+ to state 17


121


T

(

id P

State 14 State 10

E state 16 P (E ) ,{)+*} E E +T,{)+}

(

+

state 15 P (E ) ,{)+*}


P

id

(

T

state 19 E T ,{)+} T T *P ,{)+*}

*

Renew state 6

->T to state 19

LR(1) Parsing (19)


122


T

(

id P

State 14 State 10

E state 16 P (E ) ,{)+*} E E +T,{)+}

(

+

state 15 P (E ) ,{)+*}


P

id

(

T

state 19 E T ,{)+} T T *P ,{)+*}

state 20 E E +T,{)+} T T *P ,{)+*}

*

*

LR(1) Parsing (20)


123


T

(

id P

State 14 State 10

E state 16 P (E ) ,{)+*} E E +T,{)+}

(

+

state 15 P (E ) ,{)+*}


P

id

(

T

state 19 E T ,{)+} T T *P ,{)+*}

state 20 E E +T,{)+} T T *P ,{)+*}

*

state 21 T T * P,{)+*} P id ,{)+*} P (E) ,{)+*}

*

(

id P

LR(1) Parsing (21)


124


T

(

id P

State 14 State 10

E state 16 P (E ) ,{)+*} E E +T,{)+}

(

+

state 15 P (E ) ,{)+*}


P

id

(

T

state 19 E T ,{)+} T T *P ,{)+*}

state 20 E E +T,{)+} T T *P ,{)+*}

*

state 21 T T * P,{)+*} P id ,{)+*} P (E) ,{)+*}

*

(

id P

state 22 T T * P ,{)+*}

LR(1) Parsing (22)


125

LR(1) Parsing (23)

The go_to table used to drive an LR(1) parser is extracted directly from the LR(1) machine.

The algorithm that generates the go_to table is the same one we discussed for LR(0).

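LR(1) Parsing (24)

The action table is extracted directly from the configuration sets of the LR(1) machine by means of a projection function P:

\( P : S_1 \times V_t \to 2^{Q} \)

where \( S_1 \) is the set of LR(1) machine states and Q is the set of parser actions, and

\( P(s,a) = \{ \mathrm{Reduce}_i \mid [B \to \rho\,\bullet,\ a] \in s \text{ and production } i \text{ is } B \to \rho \} \cup (\text{if } [A \to \alpha \bullet a \beta,\ b] \in s \text{ then } \{\mathrm{Shift}\} \text{ else } \emptyset) \)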

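LR(1) Parsing (25)

G is LR(1) if and only if \( \forall s \in S_1,\ \forall a \in V_t:\ |P(s,a)| \le 1 \).

If G is LR(1), the action table is trivially extracted from P:

\( P(s,\$) = \{\mathrm{Shift}\} \Rightarrow \mathrm{action}[s][\$] = \mathrm{Accept} \)
\( P(s,a) = \{\mathrm{Shift}\},\ a \ne \$ \Rightarrow \mathrm{action}[s][a] = \mathrm{Shift} \)
\( P(s,a) = \{\mathrm{Reduce}_i\} \Rightarrow \mathrm{action}[s][a] = \mathrm{Reduce}_i \)
\( P(s,a) = \emptyset \Rightarrow \mathrm{action}[s][a] = \mathrm{Error} \)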

LR(1) Parsing (26)

Example: state 7 (E → T•, {$+}; T → T•*P, {$+*}) reduces by production 3 when the look-ahead is $ or +, and shifts when the look-ahead is *: P(7,$) = P(7,+) = {Reduce_3} and P(7,*) = {Shift}.
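To see the test |P(s,a)| ≤ 1 at work on state 7, here is a small C sketch of the projection function for a single state. The Config struct, the bit-set encoding of look-aheads and the function names are hypothetical; an LR(0) state is modelled as reducing on every terminal, and an SLR(1)-style variant uses Follow(E) = {+, ), $} as the reduce look-ahead.

```c
#include <assert.h>

/* Hypothetical sketch. Terminals of G3; each is a bit position in a look-ahead set. */
enum { PLUS, STAR, ID, LP, RP, END, NTERM };
#define MAX_PROD 16

/* A configuration, reduced to what the projection function needs:
 *   next_term >= 0  the dot is before this terminal (a Shift candidate);
 *   reduce_prod > 0 the dot is at the end of this production, which may
 *                   be reduced on any terminal in the look-ahead set la. */
typedef struct { int next_term, reduce_prod; unsigned la; } Config;

/* |P(s,a)|: how many distinct actions state s allows on terminal a. */
static int projection_size(const Config *s, int n, int a) {
    int shift = 0, reduces = 0, seen[MAX_PROD] = {0};
    for (int i = 0; i < n; i++) {
        if (s[i].next_term == a) {
            shift = 1;                               /* Shift appears at most once */
        } else if (s[i].reduce_prod > 0 && ((s[i].la >> a) & 1u)) {
            assert(s[i].reduce_prod < MAX_PROD);
            if (!seen[s[i].reduce_prod]) {           /* count each Reduce_i once */
                seen[s[i].reduce_prod] = 1;
                reduces++;
            }
        }
    }
    return shift + reduces;
}

/* Number of terminals a with |P(s,a)| > 1; 0 means the state is conflict-free. */
static int conflicts(const Config *s, int n) {
    int bad = 0;
    for (int a = 0; a < NTERM; a++)
        if (projection_size(s, n, a) > 1) bad++;
    return bad;
}
```

With the LR(1) look-aheads {$+}, state 7 has no conflict; reducing on every terminal (LR(0)) collides with the shift on *.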


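Complete table: the action table merged with the go_to table (rows are states; "-" means Error; the S column is empty and omitted):

State   +     *     id    (     )     $   |  E     T     P
0       -     -     S5    S6    -     -   |  S1    S7    S4
1       S3    -     -     -     -     A   |  -     -     -
2       -     -     -     -     -     -   |  -     -     -
3       -     -     S5    S6    -     -   |  -     S11   S4
4       R5    R5    -     -     -     R5  |  -     -     -
5       R6    R6    -     -     -     R6  |  -     -     -
6       -     -     S10   S18   -     -   |  S12   S19   S14
7       R3    S8    -     -     -     R3  |  -     -     -
8       -     -     S5    S6    -     -   |  -     -     S9
9       R4    R4    -     -     -     R4  |  -     -     -
10      R6    R6    -     -     R6    -   |  -     -     -
11      R2    S8    -     -     -     R2  |  -     -     -
12      S17   -     -     -     S13   -   |  -     -     -
13      R7    R7    -     -     -     R7  |  -     -     -
14      R5    R5    -     -     R5    -   |  -     -     -
15      R7    R7    -     -     R7    -   |  -     -     -
16      S17   -     -     -     S15   -   |  -     -     -
17      -     -     S10   S18   -     -   |  -     S20   S14
18      -     -     S10   S18   -     -   |  S16   S19   S14
19      R3    S21   -     -     R3    -   |  -     -     -
20      R2    S21   -     -     R2    -   |  -     -     -
21      -     -     S10   S18   -     -   |  -     -     S22
22      R4    R4    -     -     R4    -   |  -     -     -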
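The complete table can drive a parser directly. The C sketch below encodes it as "state:entry" strings (a hypothetical encoding chosen for brevity) and runs the usual shift-reduce loop: a shift pushes a state and consumes the token; a reduce pops |rhs| states and pushes the go_to state of the left-hand side. The names parse and load_table are illustrative, and the tokenizer accepts only the characters of G3, with id written literally.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch. G3 symbols: terminals ($ is END), then non-terminals. */
enum { PLUS, STAR, ID, LP, RP, END, S, E, T, P, NSYM };
#define NSTATES 23
#define MAXLEN 100

/* Production i of G3 (index 0 unused): left-hand side and rhs length. */
static const int lhs[8]     = {0, S, E, E, T, T, P, P};
static const int rhs_len[8] = {0, 2, 3, 1, 3, 1, 1, 3};

typedef struct { char kind; int n; } Act;   /* kind: 0 (blank = Error), 'S', 'R', 'A' */
static Act table[NSTATES][NSYM];

/* The complete LR(1) table, as "state:entry" pairs for each symbol. */
static const char *rows[NSYM] = {
    [PLUS] = "1:S3 4:R5 5:R6 7:R3 9:R4 10:R6 11:R2 12:S17 13:R7 14:R5 15:R7 16:S17 19:R3 20:R2 22:R4",
    [STAR] = "4:R5 5:R6 7:S8 9:R4 10:R6 11:S8 13:R7 14:R5 15:R7 19:S21 20:S21 22:R4",
    [ID]   = "0:S5 3:S5 6:S10 8:S5 17:S10 18:S10 21:S10",
    [LP]   = "0:S6 3:S6 6:S18 8:S6 17:S18 18:S18 21:S18",
    [RP]   = "10:R6 12:S13 14:R5 15:R7 16:S15 19:R3 20:R2 22:R4",
    [END]  = "1:A 4:R5 5:R6 7:R3 9:R4 11:R2 13:R7",
    [E]    = "0:S1 6:S12 18:S16",
    [T]    = "0:S7 3:S11 6:S19 17:S20 18:S19",
    [P]    = "0:S4 3:S4 6:S14 8:S9 17:S14 18:S14 21:S22",
};

static void load_table(void) {
    for (int sym = 0; sym < NSYM; sym++)
        for (const char *p = rows[sym]; p && *p;) {
            char *end;
            int st = (int)strtol(p, &end, 10);          /* state number */
            assert(*end == ':' && st < NSTATES);
            Act a = {end[1], 0};                         /* 'S', 'R' or 'A' */
            if (a.kind == 'A') end += 2;
            else a.n = (int)strtol(end + 2, &end, 10);
            table[st][sym] = a;
            p = end;
        }
}

/* Parses a string such as "(id+id)$"; returns 1 on Accept, 0 on error.
 * *reductions receives the number of Reduce moves made before stopping. */
static int parse(const char *s, int *reductions) {
    static const char syms[] = "+*i()$";          /* PLUS..END in enum order; 'i' = id */
    static int loaded = 0;
    int in[MAXLEN], n = 0, stack[MAXLEN], top = 0;
    for (; *s && n < MAXLEN - 1; s++) {           /* tokenize */
        const char *q = strchr(syms, *s);
        if (!q || (*s == 'i' && *++s != 'd')) return 0;
        in[n++] = (int)(q - syms);
    }
    in[n] = END;                                  /* sentinel in case $ is missing */
    if (!loaded) { load_table(); loaded = 1; }
    *reductions = 0;
    stack[0] = 0;                                 /* start state */
    for (const int *tok = in;;) {
        Act a = table[stack[top]][*tok];
        if (a.kind == 'S') {                      /* shift: push the state, next token */
            assert(top < MAXLEN - 1);
            stack[++top] = a.n;
            tok++;
        } else if (a.kind == 'R') {               /* reduce: pop |rhs|, go_to on the lhs */
            top -= rhs_len[a.n];
            Act g = table[stack[top]][lhs[a.n]];
            assert(g.kind == 'S');
            stack[++top] = g.n;
            ++*reductions;
        } else {
            return a.kind == 'A';                 /* Accept, or a blank entry = Error */
        }
    }
}
```

For (id+id)$ this performs the nine reductions of the trace that follows; for id ) it stops in state 5 without reducing.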


Compare G3 actions in LR(0) and LR(1)

LR(0) actions for G3 (an LR(0) action never depends on the look-ahead):

State    0   1   2   3   4    5    6   7      8   9    10   11     12
Action   S   S   A   S   R5   R6   S   S/R3   S   R4   R7   S/R2   S

The LR(0) entries for states 7 and 11 are ambiguous (shift-reduce conflicts), so G3 is not LR(0). The complete LR(1) table above has no such conflicts, because its configurations carry look-aheads:

LR(0): state 7   E → T•;  T → T•*P
LR(1): state 7   E → T•, {$+};  T → T•*P, {$+*}

LR(1) parse of the input (id+id)$:

Step  Stack               Input      Action
1     0                   (id+id)$   Shift (
2     0 6                 id+id)$    Shift id
3     0 6 10              +id)$      Reduce 6
4     0 6 14              +id)$      Reduce 5
5     0 6 19              +id)$      Reduce 3
6     0 6 12              +id)$      Shift +
7     0 6 12 17           id)$       Shift id
8     0 6 12 17 10        )$         Reduce 6
9     0 6 12 17 14        )$         Reduce 5
10    0 6 12 17 20        )$         Reduce 2
11    0 6 12              )$         Shift )
12    0 6 12 13           $          Reduce 7
13    0 4                 $          Reduce 5
14    0 7                 $          Reduce 3
15    0 1                 $          Accept

Each reduction adds a node to the parse tree, bottom-up; the final tree is
[E [T [P ( [E [E [T [P id]]] + [T [P id]]] ) ]]]




SLR(1) Parsing (1)

LR(1) parsers are the most powerful class of shift-reduce parsers that use a single look-ahead symbol.

LR(1) grammars exist for virtually all programming languages.

LR(1)'s problem is that the LR(1) machine contains so many states that the go_to and action tables become prohibitively large.

In reaction to the space inefficiency of LR(1) tables, computer scientists have devised parsing techniques that are almost as powerful as LR(1) but require far smaller tables:

– One approach is to start with the CFSM and then add look-ahead after the CFSM is built: SLR(1).

– The other approach to reducing LR(1)'s space inefficiency is to merge inessential LR(1) states: LALR(1).

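SLR(1) Parsing (2)

SLR(1) stands for Simple LR(1).

It uses one-symbol look-ahead.

Look-aheads are not built directly into configurations; rather, they are added after the LR(0) configuration sets are built.

An SLR(1) parser performs a reduce action for configuration B → ρ• if the look-ahead symbol is in the set Follow(B).

The SLR(1) projection function, from the set \( S_0 \) of CFSM states, is

\( P : S_0 \times V_t \to 2^{Q} \)

\( P(s,a) = \{ \mathrm{Reduce}_i \mid B \to \rho\,\bullet \in s,\ a \in \mathrm{Follow}(B) \text{ and production } i \text{ is } B \to \rho \} \cup (\text{if } A \to \alpha \bullet a \beta \in s \text{ for } a \in V_t \text{ then } \{\mathrm{Shift}\} \text{ else } \emptyset) \)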

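SLR(1) Parsing (3)

G is SLR(1) if and only if \( \forall s \in S_0,\ \forall a \in V_t:\ |P(s,a)| \le 1 \).

If G is SLR(1), the action table is trivially extracted from P:

\( P(s,\$) = \{\mathrm{Shift}\} \Rightarrow \mathrm{action}[s][\$] = \mathrm{Accept} \)
\( P(s,a) = \{\mathrm{Shift}\},\ a \ne \$ \Rightarrow \mathrm{action}[s][a] = \mathrm{Shift} \)
\( P(s,a) = \{\mathrm{Reduce}_i\} \Rightarrow \mathrm{action}[s][a] = \mathrm{Reduce}_i \)
\( P(s,a) = \emptyset \Rightarrow \mathrm{action}[s][a] = \mathrm{Error} \)

Clearly, the SLR(1) grammars are a proper superset of the LR(0) grammars.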

SLR(1) Parsing (4)

Consider G3. It is LR(1) but not LR(0). What are the Follow sets in G3?

Follow(S) = {λ}

Follow(E) = {+, ), $}

Follow(T) = {+, *, ), $}

Follow(P) = {+, *, ), $}
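These Follow sets come from the standard fixed-point iteration. The C sketch below hard-codes G3 (the enum and array names are hypothetical) and relies on G3 having no λ-productions, which lets it skip the nullable-symbol bookkeeping a general implementation needs. Follow(S) comes out as the empty set, which corresponds to the {λ} written above.

```c
#include <assert.h>

/* Hypothetical sketch. G3 symbols: terminals ($ is END), then non-terminals. */
enum { PLUS, STAR, ID, LP, RP, END, S, E, T, P, NSYM };
#define NTERM 6

/* Productions 1..7 of G3 (index 0 unused); each rhs ends with -1. */
static const int lhs[8] = {-1, S, E, E, T, T, P, P};
static const int rhs[8][4] = {
    {-1}, {E, END, -1}, {E, PLUS, T, -1}, {T, -1},
    {T, STAR, P, -1}, {P, -1}, {ID, -1}, {LP, E, RP, -1}};

static unsigned first[NSYM], follow[NSYM];   /* bit sets of terminals */

/* *a |= b; reports whether *a changed. */
static int grow(unsigned *a, unsigned b) {
    unsigned old = *a;
    *a |= b;
    return *a != old;
}

/* With no lambda productions, First(A) comes from the first rhs symbol only,
 * a symbol right after B adds its First set to Follow(B), and a B at the
 * end of a rhs inherits Follow(lhs). Both loops run to a fixed point. */
static void compute_first_follow(void) {
    for (int x = 0; x < NTERM; x++) first[x] = 1u << x;
    for (int changed = 1; changed;) {
        changed = 0;
        for (int p = 1; p <= 7; p++)
            changed |= grow(&first[lhs[p]], first[rhs[p][0]]);
    }
    for (int changed = 1; changed;) {
        changed = 0;
        for (int p = 1; p <= 7; p++)
            for (int k = 0; rhs[p][k] >= 0; k++) {
                int B = rhs[p][k], next = rhs[p][k + 1];
                if (B < NTERM) continue;        /* Follow is only needed for non-terminals */
                changed |= grow(&follow[B], next >= 0 ? first[next] : follow[lhs[p]]);
            }
    }
}
```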

CFSM for G3 (LR(0) configurations; reduce actions use the Follow sets above):

state 0: S → •E $; E → •E+T; E → •T; T → •T*P; T → •P; P → •id; P → •(E)
  transitions: E to 1, T to 7, P to 4, id to 5, ( to 6
state 1: S → E•$; E → E•+T
  transitions: $ to 2, + to 3
state 2 (Accept): S → E $•
state 3: E → E+•T; T → •T*P; T → •P; P → •id; P → •(E)
  transitions: T to 11, P to 4, id to 5, ( to 6
state 4: T → P•
state 5: P → id•
state 6: P → (•E); E → •E+T; E → •T; T → •T*P; T → •P; P → •id; P → •(E)
  transitions: E to 12, T to 7, P to 4, id to 5, ( to 6
state 7: E → T•; T → T•*P
  transitions: * to 8
state 8: T → T*•P; P → •id; P → •(E)
  transitions: P to 9, id to 5, ( to 6
state 9: T → T*P•
state 10: P → (E)•
state 11: E → E+T•; T → T•*P
  transitions: * to 8
state 12: P → (E•); E → E•+T
  transitions: ) to 10, + to 3

States 7 and 11 are where LR(0) fails: each can shift * and also reduce. SLR(1) reduces in state 7 (production 3) and in state 11 (production 2) only on Follow(E) = {+, ), $}, which does not contain *, so both conflicts disappear.

SLR(1) Parsing (5)

SLR(1) action table

SLR(1) Parsing (6)

Limitations of the SLR(1) technique

Using Follow sets to estimate the look-aheads that predict reduce actions is less precise than using the exact look-aheads incorporated into LR(1) configurations.

Example on the next page.

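Compare LR(1) & SLR(1) on grammar G3

Consider the input: id )

LR(1):
Step 1:  0      id)   Shift 5
Step 2:  0 5    )     Error

SLR(1):
Step 1:  0      id)   Shift 5
Step 2:  0 5    )     Reduce 6
Step 3:  0 4    )     Reduce 5
Step 4:  0 7    )     Reduce 3
Step 5:  0 1    )     Error

The performance of detecting errors: the LR(1) parser detects the error immediately in state 5, while the SLR(1) parser first performs three useless reductions, because ")" is in Follow(P), Follow(T) and Follow(E). Neither parser ever shifts the erroneous ")".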
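The delayed error detection can be reproduced by running the SLR(1) table for G3 (the complete CFSM-based table in the Optimizing Parse Tables section) on id ). The C sketch below is self-contained and hypothetical in its names and its "state:entry" table encoding; only the SLR(1) side is shown, since the LR(1) table simply has no entry for ")" in state 5.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch. G3 symbols: terminals ($ is END), then non-terminals. */
enum { PLUS, STAR, ID, LP, RP, END, S, E, T, P, NSYM };
#define NSTATES 13
#define MAXLEN 100

/* Production i of G3 (index 0 unused): left-hand side and rhs length. */
static const int lhs[8]     = {0, S, E, E, T, T, P, P};
static const int rhs_len[8] = {0, 2, 3, 1, 3, 1, 1, 3};

typedef struct { char kind; int n; } Act;   /* kind: 0 (blank = Error), 'S', 'R', 'A' */
static Act table[NSTATES][NSYM];

/* SLR(1) table for G3 (CFSM states 0-12) as "state:entry" pairs per symbol. */
static const char *rows[NSYM] = {
    [PLUS] = "1:S3 4:R5 5:R6 7:R3 9:R4 10:R7 11:R2 12:S3",
    [STAR] = "4:R5 5:R6 7:S8 9:R4 10:R7 11:S8",
    [ID]   = "0:S5 3:S5 6:S5 8:S5",
    [LP]   = "0:S6 3:S6 6:S6 8:S6",
    [RP]   = "4:R5 5:R6 7:R3 9:R4 10:R7 11:R2 12:S10",
    [END]  = "1:A 4:R5 5:R6 7:R3 9:R4 10:R7 11:R2",
    [E]    = "0:S1 6:S12",
    [T]    = "0:S7 3:S11 6:S7",
    [P]    = "0:S4 3:S4 6:S4 8:S9",
};

static void load_table(void) {
    for (int sym = 0; sym < NSYM; sym++)
        for (const char *p = rows[sym]; p && *p;) {
            char *end;
            int st = (int)strtol(p, &end, 10);          /* state number */
            assert(*end == ':' && st < NSTATES);
            Act a = {end[1], 0};                         /* 'S', 'R' or 'A' */
            if (a.kind == 'A') end += 2;
            else a.n = (int)strtol(end + 2, &end, 10);
            table[st][sym] = a;
            p = end;
        }
}

/* Returns 1 on Accept, 0 on error; *reductions counts the Reduce moves made. */
static int parse(const char *s, int *reductions) {
    static const char syms[] = "+*i()$";          /* PLUS..END in enum order; 'i' = id */
    static int loaded = 0;
    int in[MAXLEN], n = 0, stack[MAXLEN], top = 0;
    for (; *s && n < MAXLEN - 1; s++) {           /* tokenize */
        const char *q = strchr(syms, *s);
        if (!q || (*s == 'i' && *++s != 'd')) return 0;
        in[n++] = (int)(q - syms);
    }
    in[n] = END;                                  /* sentinel in case $ is missing */
    if (!loaded) { load_table(); loaded = 1; }
    *reductions = 0;
    stack[0] = 0;
    for (const int *tok = in;;) {
        Act a = table[stack[top]][*tok];
        if (a.kind == 'S') {                      /* shift */
            assert(top < MAXLEN - 1);
            stack[++top] = a.n;
            tok++;
        } else if (a.kind == 'R') {               /* reduce, then go_to on the lhs */
            top -= rhs_len[a.n];
            Act g = table[stack[top]][lhs[a.n]];
            assert(g.kind == 'S');
            stack[++top] = g.n;
            ++*reductions;
        } else {
            return a.kind == 'A';                 /* Accept, or blank = Error */
        }
    }
}
```

On id ) the sketch makes exactly the three reductions of the SLR(1) column above before reporting the error.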


LALR(1) (1)

LALR(1) parsers can be built by first constructing an LR(1) parser and then merging states.

An LALR(1) parser is an LR(1) parser in which all states that differ only in the look-ahead components of their configurations are merged.

LALR is an acronym for Look-Ahead LR.

Two such states have the same core. Example: LR(1) states 3 and 17 both have the core s':

E → E+•T
T → •T*P
T → •P
P → •id
P → •(E)

\( \mathrm{Cognate}(s') = \{ c \mid c \in s,\ \mathrm{core}(s) = s' \} \)

Merged state 3: E → E+•T, {)$+}; T → •T*P, {)$+*}; T → •P, {)$+*}; P → •id, {)$+*}; P → •(E), {)$+*}
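Merging cognates amounts to grouping LR(1) states by core and taking the union of the look-aheads of corresponding configurations. A minimal C sketch, assuming each state is a list of (production, dot, look-ahead bit set) configurations with no repeated core inside a state; the names same_core and merge_into are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch. Terminals of G3 as look-ahead bit positions. */
enum { PLUS, STAR, ID, LP, RP, END };
#define MAX_ITEMS 16

/* An LR(1) configuration: production number, dot position, look-ahead bit set. */
typedef struct { int prod, dot; unsigned la; } Item;
typedef struct { Item it[MAX_ITEMS]; int n; } State;

/* Two configurations share a core if they differ at most in look-aheads. */
static bool same_config_core(const Item *c, const Item *d) {
    return c->prod == d->prod && c->dot == d->dot;
}

/* Two states are cognates if every configuration of one has a same-core
 * partner in the other (no core appears twice within a state). */
static bool same_core(const State *a, const State *b) {
    if (a->n != b->n) return false;
    for (int i = 0; i < a->n; i++) {
        bool found = false;
        for (int j = 0; j < b->n && !found; j++)
            found = same_config_core(&a->it[i], &b->it[j]);
        if (!found) return false;
    }
    return true;
}

/* LALR(1) merge: union the look-aheads of b's configurations into a's. */
static void merge_into(State *a, const State *b) {
    assert(same_core(a, b));
    for (int i = 0; i < a->n; i++)
        for (int j = 0; j < b->n; j++)
            if (same_config_core(&a->it[i], &b->it[j])) a->it[i].la |= b->it[j].la;
}
```

Merging state 17 into state 3 yields {)$+} for E → E+•T and {)$+*} for the other configurations, as in the merged state 3 above.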

Figures: LR(1) G3 diagram; LALR(1) G3 diagram; SLR(1) G3 diagram (CFSM)

Compare SLR(1) & LALR(1)

For G3, the action and go_to tables are the same whether SLR(1) or LALR(1) is used.

Follow(S) = {λ}, Follow(E) = {+, ), $}, Follow(T) = {+, *, ), $}, Follow(P) = {+, *, ), $}

Example: compare state 7 and state 10 in SLR(1) and in LALR(1). Are they the same? When do they differ?

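LALR(1) (4)

Each CFSM state s is transformed into its LALR(1) cognate, Cognate(s), and the projection function is

\( P : S_0 \times V_t \to 2^{Q} \)

\( P(s,a) = \{ \mathrm{Reduce}_i \mid [B \to \rho\,\bullet,\ a] \in \mathrm{Cognate}(s) \text{ and production } i \text{ is } B \to \rho \} \cup (\text{if } A \to \alpha \bullet a \beta \in s \text{ then } \{\mathrm{Shift}\} \text{ else } \emptyset) \)

G is LALR(1) if and only if \( \forall s \in S_0,\ \forall a \in V_t:\ |P(s,a)| \le 1 \).

If G is LALR(1), the action table is trivially extracted from P:

\( P(s,\$) = \{\mathrm{Shift}\} \Rightarrow \mathrm{action}[s][\$] = \mathrm{Accept} \)
\( P(s,a) = \{\mathrm{Shift}\},\ a \ne \$ \Rightarrow \mathrm{action}[s][a] = \mathrm{Shift} \)
\( P(s,a) = \{\mathrm{Reduce}_i\} \Rightarrow \mathrm{action}[s][a] = \mathrm{Reduce}_i \)
\( P(s,a) = \emptyset \Rightarrow \mathrm{action}[s][a] = \mathrm{Error} \)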

LALR(1) (5)

For grammar G5:

<prog> → <stmt> ; {<stmt> ;}
<stmt> → ID
<stmt> → <var> := <expr>
<var> → ID
<var> → ID [<expr>]
<expr> → <var>

Assume statements are separated by ;'s. The grammar is not SLR(1), because ; ∈ Follow(<stmt>) and ; ∈ Follow(<var>) (since <expr> → <var>), which gives a reduce-reduce conflict on ; in state 1:

state 0: <prog> → •<stmt>;{<stmt>;}; <stmt> → •ID; <stmt> → •<var>:=<expr>; <var> → •ID; <var> → •ID[<expr>]
  transitions: ID to 1
state 1: <stmt> → ID•; <var> → ID•; <var> → ID•[<expr>]

LALR(1) (6)

However, in LALR(1), if we use <var> → ID in state 1, the next symbol must be :=, so

action[1, :=] = reduce(<var> → ID)
action[1, ;] = reduce(<stmt> → ID)
action[1, [] = shift

There is no conflict.

state 0: <prog> → •<stmt>;{<stmt>;}, {$}; <stmt> → •ID, {;}; <stmt> → •<var>:=<expr>, {;}; <var> → •ID, {:=}; <var> → •ID[<expr>], {:=}
  transitions: ID to 1
state 1: <stmt> → ID•, {;}; <var> → ID•, {:=}; <var> → ID•[<expr>], {:=}

LALR(1) (7)

A common technique to put an LALR(1) grammar into SLR(1) form is to introduce a new non-terminal whose global (i.e. SLR) look-aheads more nearly correspond to LALR's exact look-aheads.

grammar G5:
<prog> → <stmt> ; {<stmt> ;}
<stmt> → ID
<stmt> → <var> := <expr>
<var> → ID
<var> → ID [<expr>]
<expr> → <var>

grammar G5 rewritten with <lhs>:
<prog> → <stmt> ; {<stmt> ;}
<stmt> → ID
<stmt> → <lhs> := <expr>
<lhs> → ID
<lhs> → ID [<expr>]
<var> → ID
<var> → ID [<expr>]
<expr> → <var>

Now Follow(<lhs>) = {:=}, so the SLR(1) parser reduces <lhs> → ID only on := and <stmt> → ID only on ;.

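LALR(1) (8)

SLR(1) and LALR(1) parsers are both built on the CFSM. Does a case ever occur in which the action table can't work? At times, it is the CFSM itself that is at fault.

For example, a different expression non-terminal may be used to allow error or warning diagnostics:

grammar G6:
S → ( <Exp1> )
S → [ <Exp1> ]
S → ( <Exp2> ]
S → [ <Exp2> )
<Exp1> → ID
<Exp2> → ID

State 4 is reached on ID from both the ( path and the [ path, so the CFSM merges the two contexts: <Exp1> → ID• and <Exp2> → ID• both get the look-aheads {), ]}. After the reduction we therefore do not know which state should be the next state (a reduce-reduce conflict). In LR(1), state 4 splits into two states, one per context, and the conflict disappears.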

Building LALR(1) Parsers (1)

By the definition of LALR(1), an LR(1) machine is first built, and then its states are merged to form an automaton identical in structure to the CFSM. This may be quite inefficient.

An alternative is to build the CFSM first; LALR(1) look-aheads are then "propagated" from configuration to configuration.

Propagate links:

Case 1: one configuration is created from another in a previous state via a shift operation:
\( [A \to \alpha \bullet X \beta,\ L_1] \;\Longrightarrow\; [A \to \alpha X \bullet \beta,\ L_2] \)

Case 2: one configuration is created as the result of a closure (prediction) operation on another configuration:
\( [B \to \alpha \bullet A \beta,\ L_1] \;\Longrightarrow\; [A \to \bullet \gamma,\ L_2], \quad L_2 = \{ x \mid x \in \mathrm{First}(\beta t) \text{ and } t \in L_1 \} \)

Building LALR(1) Parsers (2)

Step 1: After the CFSM is built, create all the propagate links needed to transmit look-aheads from one configuration to another (case 1, and also case 2 when β can derive λ).

Step 2: Determine the spontaneous look-aheads (case 2): for each configuration A → •γ, L2, include in L2 all spontaneous look-aheads induced by configurations of the form B → α•Aβ, L1. These are simply the non-λ values of First(β). Each configuration c in state s that receives spontaneous look-aheads L is pushed onto a stack as (s, c, L).

Step 3: Then propagate look-aheads via the propagate links:

While (stack is not empty)
{
    pop the top item and assign its components to (s, c, L)
    if (configuration c in state s has any propagate links)
    {
        Try, in turn, to add L to the look-ahead set of each
        configuration so linked.
        for (each configuration c' in state s' to which L is added)
            Push (s', c', L) onto the stack
    }
}
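The propagation loop can be sketched in C for the small example that follows (S → Opts $, Opts → Opt Opt, Opt → ID). The configuration numbering, the link table and the two-bit look-ahead encoding are hypothetical and hard-coded from that example rather than derived from a CFSM. As in the pseudocode, (s', c', L) is pushed only when L actually adds something, which is what makes the loop terminate.

```c
#include <assert.h>

/* Hypothetical sketch. Look-ahead bits for the two terminals of the example. */
enum { LA_DOLLAR = 1, LA_ID = 2 };

/* Configurations of the example CFSM, named (state, configuration). */
enum { S1C1, S1C2, S1C3, S2C1, S2C2, S3C1, NCONF };
#define MAX_LINKS 4
#define MAX_STACK 32

static unsigned la[NCONF];                      /* look-ahead set of each configuration */
static int links[NCONF][MAX_LINKS], nlinks[NCONF];

typedef struct { int conf; unsigned L; } Work;  /* a stack entry (s, c, L) */

static void add_link(int from, int to) {
    assert(nlinks[from] < MAX_LINKS);
    links[from][nlinks[from]++] = to;
}

/* Step 3: pop (s, c, L); add L to each configuration linked from c and
 * push (s', c', L) for every configuration that actually gained something. */
static void propagate(Work *stack, int top) {
    while (top > 0) {
        Work w = stack[--top];
        for (int k = 0; k < nlinks[w.conf]; k++) {
            int to = links[w.conf][k];
            if ((la[to] | w.L) != la[to]) {
                la[to] |= w.L;
                assert(top < MAX_STACK);
                stack[top++] = (Work){to, w.L};
            }
        }
    }
}

/* The example: propagate links, spontaneous look-aheads, then propagation. */
static void run_example(void) {
    add_link(S1C2, S2C1);   /* shift on Opt: Opts -> .Opt Opt to Opts -> Opt .Opt */
    add_link(S2C1, S2C2);   /* closure: nothing follows the second Opt */
    add_link(S2C2, S3C1);   /* shift on ID from state 2: Opt -> .ID to Opt -> ID. */
    add_link(S1C3, S3C1);   /* shift on ID from state 1 */
    la[S1C2] = LA_DOLLAR;   /* spontaneous look-aheads (Step 2) */
    la[S1C3] = LA_ID;
    Work stack[MAX_STACK] = {{S1C3, LA_ID}, {S1C2, LA_DOLLAR}};  /* (s1,c2,$) on top */
    propagate(stack, 2);
}
```

The pops happen in the same order as Steps 1 to 6 of the example, ending with {$, ID} for Opt → ID• in state 3.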

Building LALR(1) Parsers (3)

grammar G6: S → Opts $;  Opts → Opt Opt;  Opt → ID

Build the CFSM:
state 1: S → •Opts $; Opts → •Opt Opt; Opt → •ID
  transitions: Opt to 2, ID to 3
state 2: Opts → Opt•Opt; Opt → •ID
  transitions: ID to 3
state 3: Opt → ID•

Build the initial (spontaneous) look-aheads:
state 1: c1 S → •Opts $, {}; c2 Opts → •Opt Opt, {$}; c3 Opt → •ID, {ID}
Stack (top first): (s1, c2, $), (s1, c3, ID)

Step 1: Pop (s1, c2, $). Add $ to c1 in s2. Push (s2, c1, $).     Stack: (s2, c1, $), (s1, c3, ID)
Step 2: Pop (s2, c1, $). Add $ to c2 in s2. Push (s2, c2, $).     Stack: (s2, c2, $), (s1, c3, ID)
Step 3: Pop (s2, c2, $). Add $ to c1 in s3. Push (s3, c1, $).     Stack: (s3, c1, $), (s1, c3, ID)
Step 4: Pop (s3, c1, $). Nothing to add (no links).                 Stack: (s1, c3, ID)
Step 5: Pop (s1, c3, ID). Add ID to c1 in s3. Push (s3, c1, ID).   Stack: (s3, c1, ID)
Step 6: Pop (s3, c1, ID). Nothing to add (no links).                Stack: empty
Step 7: The stack is empty; terminate the algorithm.

Result:
state 2: Opts → Opt•Opt, {$}; Opt → •ID, {$}
state 3: Opt → ID•, {$, ID}

Building LALR(1) Parsers (6)

A number of LALR(1) parser generators use look-ahead propagation to compute the parser action table.

LALR-Gen uses the propagation algorithm.

YACC examines each state repeatedly.


Calling Semantic Routines in Shift-Reduce Parsers (1)

Shift-reduce parsers

– can normally handle larger classes of grammars than LL(1) parsers, which is a major reason for their popularity;

– are not predictive, so we cannot always be sure which production is being recognized until its entire right-hand side has been matched.

The semantic routines can therefore be invoked only after a production is recognized and reduced, so action symbols may appear only at the extreme right end of a right-hand side.


Calling Semantic Routines in Shift-Reduce Parsers (2)

Two common tricks are known that allow more flexible placement of semantic routine calls.

For example, in

<stmt> → if <expr> then <stmts> else <stmts> end if

we need to call semantic routines after the conditional expression is matched, when the else is reached, and after end if is matched.

Solution: create new non-terminals that generate λ:

<stmt> → if <expr> <test cond> then <stmts> <process then part> else <stmts> end if
<test cond> → λ
<process then part> → λ


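Calling Semantic Routines in Shift-Reduce Parsers (3)

If right-hand sides differ only in the semantic routines that are to be called, the parser will be unable to determine correctly which routines to invoke, and the ambiguity will manifest itself as a parsing conflict. For example:

<stmt> → if <expr> <test cond1> then <stmts> <process then part> else <stmts> end if;
<stmt> → if <expr> <test cond2> then <stmts> <process then part> else <stmts> end if;
<test cond1> → λ
<test cond2> → λ
<process then part> → λ

After <expr>, the parser cannot decide whether to reduce λ to <test cond1> or to <test cond2> (a reduce-reduce conflict).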


Calling Semantic Routines in Shift-Reduce Parsers (4)

An alternative to the use of λ-generating non-terminals is to break a production into a number of pieces, with the breaks placed where semantic routines are required:

<stmt> → <if head> <then part> <else part>
<if head> → if <expr>
<then part> → then <stmts>
<else part> → else <stmts> end if;

This approach can make productions harder to read, but it has the advantage that no λ-generating non-terminals are needed.

6.2 LR Parsers

6.3 LR(1) Parsing

6.4 SLR(1) Parsing

6.5 LALR(1)

6.7 Using a Parser Generator (TA course)

6.8 Optimizing Parse Tables

6.9 Practical LR(1) Parsers

6.10 Properties of LR Parsing

6.11 LL(1) or LALR(1), That is the question

6.12 Other Shift-Reduce Techniques

Optimizing Parse Tables (1)

Step 1: Merge the action table and the go_to table.

Action table (CFSM-based SLR(1) table for G3; "-" means Error, S means Shift):

Look-ahead  0    1    2    3    4    5    6    7    8    9    10   11   12
+           -    S    -    -    R5   R6   -    R3   -    R4   R7   R2   S
*           -    -    -    -    R5   R6   -    S    -    R4   R7   S    -
id          S    -    -    S    -    -    S    -    S    -    -    -    -
(           S    -    -    S    -    -    S    -    S    -    -    -    -
)           -    -    -    -    R5   R6   -    R3   -    R4   R7   R2   S
$           -    A    -    -    R5   R6   -    R3   -    R4   R7   R2   -

Go_to table ($ and S have no entries):

Symbol      0    1    2    3    4    5    6    7    8    9    10   11   12
+           -    3    -    -    -    -    -    -    -    -    -    -    3
*           -    -    -    -    -    -    -    8    -    -    -    8    -
id          5    -    -    5    -    -    5    -    5    -    -    -    -
(           6    -    -    6    -    -    6    -    6    -    -    -    -
)           -    -    -    -    -    -    -    -    -    -    -    -    10
E           1    -    -    -    -    -    12   -    -    -    -    -    -
T           7    -    -    11   -    -    7    -    -    -    -    -    -
P           4    -    -    4    -    -    4    -    9    -    -    -    -

Complete table (action table + go_to table):

Symbol      0    1    2    3    4    5    6    7    8    9    10   11   12
+           -    S3   -    -    R5   R6   -    R3   -    R4   R7   R2   S3
*           -    -    -    -    R5   R6   -    S8   -    R4   R7   S8   -
id          S5   -    -    S5   -    -    S5   -    S5   -    -    -    -
(           S6   -    -    S6   -    -    S6   -    S6   -    -    -    -
)           -    -    -    -    R5   R6   -    R3   -    R4   R7   R2   S10
$           -    A    -    -    R5   R6   -    R3   -    R4   R7   R2   -
E           S1   -    -    -    -    -    S12  -    -    -    -    -    -
T           S7   -    -    S11  -    -    S7   -    -    -    -    -    -
P           S4   -    -    S4   -    -    S4   -    S9   -    -    -    -

Optimizing Parse Tables (2)

Single-reduce states

A single-reduce state always simply reduces, by the same production, whatever the look-ahead: in the complete table above, states 4, 5, 9 and 10 are single-reduce states. Because such a state always reduces, can we represent it more compactly?

Step 2: Eliminate all single-reduce states, replacing them with a special marker, the L prefix.

Example: a shift to state 4 is replaced by the entry L5 (reduce by production 5, T → P). Because only one reduction is possible in that state, we never need to go to it: its column is cancelled, and every S4 entry becomes L5.

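Optimized table (the columns of states 4, 5, 9 and 10 are cancelled):

Symbol      0    1    2    3    6    7    8    11   12
+           -    S3   -    -    -    R3   -    R2   S3
*           -    -    -    -    -    S8   -    S8   -
id          L6   -    -    L6   L6   -    L6   -    -
(           S6   -    -    S6   S6   -    S6   -    -
)           -    -    -    -    -    R3   -    R2   L7
$           -    A    -    -    -    R3   -    R2   -
E           S1   -    -    -    S12  -    -    -    -
T           S7   -    -    S11  S7   -    -    -    -
P           L5   -    -    L5   L5   -    L4   -    -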
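Step 2 can be automated: find every state whose column contains only one reduce action and nothing else, then rewrite each Sn that enters such a state as Li. A minimal C sketch over the complete SLR(1) table for G3, encoded as "state:entry" strings (a hypothetical format); the function names are also hypothetical. Dropping the unused columns afterwards is left out.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sketch. G3 symbols: terminals ($ is END), then non-terminals. */
enum { PLUS, STAR, ID, LP, RP, END, S, E, T, P, NSYM };
#define NSTATES 13

typedef struct { char kind; int n; } Act;   /* kind: 0 (blank), 'S', 'R', 'A' or 'L' */
static Act table[NSTATES][NSYM];

/* Complete SLR(1) table for G3 as "state:entry" pairs per symbol. */
static const char *rows[NSYM] = {
    [PLUS] = "1:S3 4:R5 5:R6 7:R3 9:R4 10:R7 11:R2 12:S3",
    [STAR] = "4:R5 5:R6 7:S8 9:R4 10:R7 11:S8",
    [ID]   = "0:S5 3:S5 6:S5 8:S5",
    [LP]   = "0:S6 3:S6 6:S6 8:S6",
    [RP]   = "4:R5 5:R6 7:R3 9:R4 10:R7 11:R2 12:S10",
    [END]  = "1:A 4:R5 5:R6 7:R3 9:R4 10:R7 11:R2",
    [E]    = "0:S1 6:S12",
    [T]    = "0:S7 3:S11 6:S7",
    [P]    = "0:S4 3:S4 6:S4 8:S9",
};

static void load_table(void) {
    for (int sym = 0; sym < NSYM; sym++)
        for (const char *p = rows[sym]; p && *p;) {
            char *end;
            int st = (int)strtol(p, &end, 10);          /* state number */
            assert(*end == ':' && st < NSTATES);
            Act a = {end[1], 0};                         /* 'S', 'R' or 'A' */
            if (a.kind == 'A') end += 2;
            else a.n = (int)strtol(end + 2, &end, 10);
            table[st][sym] = a;
            p = end;
        }
}

/* The production a single-reduce state always reduces by, or 0 if st is not one. */
static int single_reduce(int st) {
    int prod = 0;
    for (int sym = 0; sym < NSYM; sym++) {
        Act a = table[st][sym];
        if (a.kind == 0) continue;                                  /* blank entry */
        if (a.kind != 'R' || (prod != 0 && a.n != prod)) return 0;  /* shift, accept or mixed */
        prod = a.n;
    }
    return prod;
}

/* Replace every "Sn" that enters a single-reduce state n with "Li". */
static void eliminate_single_reduce_states(void) {
    int red[NSTATES];
    for (int st = 0; st < NSTATES; st++) red[st] = single_reduce(st);
    for (int st = 0; st < NSTATES; st++)
        for (int sym = 0; sym < NSYM; sym++) {
            Act *a = &table[st][sym];
            if (a->kind == 'S' && red[a->n]) *a = (Act){'L', red[a->n]};
        }
}
```

After the rewrite, the surviving entries match the optimized table above (for example P in state 8 becomes L4 and ) in state 12 becomes L7).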

Shift-Reduce Parsers

void shift_reduce_driver(void)
{
    /* Push the Start State, S0,
     * onto an empty parse stack. */
    push(S0);
    while (TRUE) { /* forever */
        /* Let S be the top parse stack state;
         * let T be the current input token. */
        switch (action[S][T]) {
        case ERROR:
            announce_syntax_error();
            return; /* a plain break would loop on the same error forever */
        case ACCEPT:
            /* The input has been correctly
             * parsed. */
            clean_up_and_finish();
            return;
        case SHIFT:
            push(go_to[S][T]);
            scanner(&T); /* Get next token. */
            break;
        case REDUCEi:
            /* Assume i-th production is
             * X -> Y1 ... Ym.
             * Remove states corresponding to
             * the RHS of the production. */
            pop(m);
            /* S' is the new stack top. */
            push(go_to[S'][X]);
            break;
        case Li:
            /* Assume i-th production is
             * X -> Y1 ... Ym, where Ym is T.
             * T was never pushed, so only m-1
             * states are removed. */
            pop(m-1);
            scanner(&T); /* T has been matched: get next token. */
            /* S' is the new stack top. If go_to[S'][X] is itself
             * an L entry, reduce again instead of pushing it. */
            push(go_to[S'][X]);
            break;
        }
    }
}
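Because the optimized table also has L entries in the non-terminal rows (for example L5 for P), a runnable driver has to look up the left-hand side of each reduction in the same merged table rather than push go_to[S'][X] blindly. The C sketch below does this with a "pending" symbol; the table encoding, the names and the tokenizer are hypothetical, and it is one way to realize L actions, not necessarily the driver the slides intend.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch. G3 symbols: terminals ($ is END), then non-terminals. */
enum { PLUS, STAR, ID, LP, RP, END, S, E, T, P, NSYM };
#define NSTATES 13              /* states 4, 5, 9 and 10 are never entered */
#define MAXLEN 100

static const int lhs[8]     = {0, S, E, E, T, T, P, P};
static const int rhs_len[8] = {0, 2, 3, 1, 3, 1, 1, 3};

typedef struct { char kind; int n; } Act;   /* kind: 0 (Error), 'S', 'R', 'L', 'A' */
static Act table[NSTATES][NSYM];

/* The optimized G3 table as "state:entry" pairs per symbol. */
static const char *rows[NSYM] = {
    [PLUS] = "1:S3 7:R3 11:R2 12:S3",  [STAR] = "7:S8 11:S8",
    [ID]   = "0:L6 3:L6 6:L6 8:L6",    [LP]   = "0:S6 3:S6 6:S6 8:S6",
    [RP]   = "7:R3 11:R2 12:L7",       [END]  = "1:A 7:R3 11:R2",
    [E]    = "0:S1 6:S12",             [T]    = "0:S7 3:S11 6:S7",
    [P]    = "0:L5 3:L5 6:L5 8:L4",
};

static void load_table(void) {
    for (int sym = 0; sym < NSYM; sym++)
        for (const char *p = rows[sym]; p && *p;) {
            char *end;
            int st = (int)strtol(p, &end, 10);          /* state number */
            assert(*end == ':' && st < NSTATES);
            Act a = {end[1], 0};                         /* 'S', 'R', 'L' or 'A' */
            if (a.kind == 'A') end += 2;
            else a.n = (int)strtol(end + 2, &end, 10);
            table[st][sym] = a;
            p = end;
        }
}

/* Returns 1 on Accept, 0 on a syntax error. */
static int parse(const char *s) {
    static const char syms[] = "+*i()$";          /* PLUS..END in enum order; 'i' = id */
    static int loaded = 0;
    int in[MAXLEN], n = 0, stack[MAXLEN], top = 0;
    for (; *s && n < MAXLEN - 1; s++) {           /* tokenize */
        const char *q = strchr(syms, *s);
        if (!q || (*s == 'i' && *++s != 'd')) return 0;
        in[n++] = (int)(q - syms);
    }
    in[n] = END;                                  /* sentinel in case $ is missing */
    if (!loaded) { load_table(); loaded = 1; }
    const int *tok = in;
    int sym = *tok, pending = 0;    /* pending: sym is the lhs of a reduction, not a token */
    stack[0] = 0;
    for (;;) {
        Act a = table[stack[top]][sym];
        if (a.kind == 'A') return 1;
        if (a.kind == 0) return 0;                       /* blank entry: syntax error */
        if (a.kind == 'S') {
            assert(top < MAXLEN - 1);
            stack[++top] = a.n;                          /* enter state a.n */
        } else {
            /* R pops the whole rhs; L pops one state fewer, because the
             * last rhs symbol (sym itself) never got a state of its own. */
            top -= rhs_len[a.n] - (a.kind == 'L');
        }
        if (a.kind != 'R' && !pending) tok++;            /* sym was a token: consume it */
        if (a.kind == 'S') { sym = *tok; pending = 0; }  /* resume with the input */
        else { sym = lhs[a.n]; pending = 1; }            /* look up the lhs next */
    }
}
```

On (id+id)$ the stack passes through exactly the states shown in the example that follows and never holds 4, 5, 9 or 10.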

Example: parsing (id+id)$ for grammar G3 with the optimized table above

Step  Stack            Input      Action
1     0                (id+id)$   Shift ( (S6)
2     0 6              id+id)$    L6
3     0 6              +id)$      L5 (on P)
4     0 6              +id)$      S7 (on T)
5     0 6 7            +id)$      Reduce 3
6     0 6 12           +id)$      Shift + (S3)
7     0 6 12 3         id)$       L6
8     0 6 12 3         )$         L5 (on P)
9     0 6 12 3         )$         S11 (on T)
10    0 6 12 3 11      )$         Reduce 2
11    0 6 12           )$         L7
12    0                $          L5 (on P)
13    0                $          S7 (on T)
14    0 7              $          Reduce 3
15    0 1              $          Accept

The stack never holds states 4, 5, 9 or 10. The final parse tree is
[E [T [P ( [E [E [T [P id]]] + [T [P id]]] ) ]]]


LR(1) Parsers

– Very powerful: most languages can be recognized by them.

– But the LR(1) machine contains so many states that the go_to and action tables are prohibitively large.

Alternatives to LR(1) Parsers

LR(0) parsers
– Very compact tables.
– But with no look-ahead, not very powerful.

SLR(1) – Simple LR(1) parsers
– Add look-ahead to LR(0) tables.
– Almost as powerful as LR(1) but much smaller.

LALR(1) – Look-ahead LR(1) parsers
– Start with LR(1) states and merge states differing only in the look-ahead.
– Smaller and slightly weaker than LR(1).

LL(1) or LALR(1), That is the question (1)

--Modified by http://www.csie.ntu.edu.tw/~compiler/

LR(0) grammars ⊂ SLR(1) grammars ⊂ LALR(1) grammars ⊂ LR(1) grammars

                 LR(0)     SLR(1)     LALR(1)    LR(1)
state number     n         n          n          N
action table †   n × 1     n × |VT|   n × |VT|   N × |VT|
goto table †     n × |V|   n × |V|    n × |V|    N × |V|
power            increases from LR(0) to LR(1)

† before compression

LALR(1) is the most commonly used bottom-up parsing method.

LL(1) or LALR(1), That is the question (2)

--Modified by http://www.csie.ntu.edu.tw/~compiler/

LL(1) vs. LALR(1):

– simplicity: LL(1) is simpler.
– generality: all LL(1) grammars are LALR(1); a grammar in LALR(1) form is more readable.
– placement of action symbols: LL(1), anywhere in the rhs; LALR(1), essentially only at the extreme right end of the rhs.
– error repair: simpler in LL(1), because its parse stack has predicted information; the LALR(1) parse stack just has matched information.
– table sizes: LL(1), |VN| × |VT|; LALR(1), |states| × |V|, where |states| may be exponential.
– parsing speed: comparable.
– semantic stack: easier manipulation.

LL(1) and LALR(1) are the two most popular parsing methods.

Shift-reduce parsers differ in their use of Follow information:

– LR(0) parsers never consult the look-ahead at all.

– SLR(1) parsers use the Follow sets as previously constructed.

– LR(1) parsers use context to split the Follow sets into subsets for different parsing paths (huge, inefficient parsers).

– LALR(1) parsers are like LR(1), but coarser subsets are used (achieving most of the benefit with much smaller and faster parsers).

LL(1) vs LALR(1)

– LL(1) and LALR(1) are the dominant types, although variants are used (recursive descent and SLR(1)).

– LL(1) is simpler.

– LALR(1) is more general.

– Most languages can be represented by an LL(1) or an LALR(1) grammar, but it is easier to write the LALR(1) grammar.

– In LL(1) it can be easier to specify actions.

– Error repair is easier to do in LL(1).

– LL(1) tables are roughly half the size of LALR(1) tables.

A Comparison of Predictive Parsers with Shift-Reduce Parsers

Both parsers read the input from left to right and maintain a stack of grammar symbols, but their parsing operations are decidedly different. A predictive parser is a top-down (LL) parser; a shift-reduce parser is a bottom-up (LR) parser.

– Stack contents: the predictive parser's stack predicts what is to come; the shift-reduce parser's stack shows what has been seen so far.
– Initially: the predictive stack contains the start symbol of the grammar; the shift-reduce stack is empty.
– At acceptance: the predictive stack is empty; the shift-reduce stack contains the start symbol of the grammar.
– Input tokens: popped off the stack (predictive); pushed on the stack (shift-reduce).
– Left sides of productions: popped off the stack (predictive); pushed on the stack (shift-reduce).
– Right sides of productions: pushed on the stack (predictive); popped off the stack (shift-reduce).

Properties of LR(1) Parsers

– A correct rightmost parse is guaranteed.

– Since LR-style parsers accept only viable prefixes, syntax errors are detected as soon as the parser attempts to shift a token that isn't part of a viable prefix: prompt error reporting.

– They are linear in operation.

– All LR(1) grammars are unambiguous.

Will yacc generate a parser for an ambiguous grammar?
