l16lr

Upload: sumit-joshi

Post on 05-Apr-2018


  • 7/31/2019 L16LR


    CS780(Prasad) L16LR 1

    LR Parsing

    Lecture Notes by

    Profs Aiken and Necula (UCB)

    Outline

    Review of SLR parsing

    Limits of SLR parsing

    LR parsing

    LALR parsing

    Implementation of semantic actions

    Using parser generators

    Review of SLR(1) Parsing

    An LR parser maintains a stack ⟨sym1, state1⟩ . . . ⟨symn, staten⟩

    staten is the final state of the DFA on sym1 . . . symn

    Goto table: the transition function of the DFA. Goto[i, A] = j if there is an A-transition from state i to state j

    Action table: for each state and terminal,

    Action[i, a] = shift j | reduce X → α | accept | error

    LR Parsing Algorithm

    Let I = w$ be the initial input
    Let j = 0
    Let DFA state 1 have item S' → .S
    Let stack = ⟨dummy, 1⟩
    repeat
      case action[top_state(stack), I[j]] of
        shift k: push ⟨I[j++], k⟩
        reduce X → α:
          pop |α| pairs,
          push ⟨X, Goto[top_state(stack), X]⟩
        accept: halt normally
        error: halt and report error
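    The driver loop above can be sketched in Python. This is a minimal sketch under stated assumptions rather than the lecture's own code: the toy grammar S → ( S ) | x, its hand-built SLR(1) tables (PRODUCTIONS, ACTION, GOTO) and the function name lr_parse are hypothetical. States are numbered from 1 so that state 1 holds S' → .S, as in the pseudocode, and a missing ACTION entry plays the role of error.

```python
# Hypothetical hand-built SLR(1) tables for the toy grammar
#   S' -> S      S -> ( S )      S -> x
# State 1 contains the item S' -> .S, as in the pseudocode.

# Production name -> (left-hand side, length of right-hand side)
PRODUCTIONS = {
    "S -> ( S )": ("S", 3),
    "S -> x": ("S", 1),
}

# ACTION[(state, terminal)] = (kind, argument); missing entries mean "error"
ACTION = {
    (1, "("): ("shift", 3), (1, "x"): ("shift", 4),
    (2, "$"): ("accept", None),
    (3, "("): ("shift", 3), (3, "x"): ("shift", 4),
    (4, ")"): ("reduce", "S -> x"), (4, "$"): ("reduce", "S -> x"),
    (5, ")"): ("shift", 6),
    (6, ")"): ("reduce", "S -> ( S )"), (6, "$"): ("reduce", "S -> ( S )"),
}

# GOTO[(state, nonterminal)] = next state
GOTO = {(1, "S"): 2, (3, "S"): 5}


def lr_parse(tokens):
    """Run the LR driver loop; return True on accept, False on error."""
    inp = list(tokens) + ["$"]      # I = w$
    j = 0
    stack = [("dummy", 1)]          # stack of (symbol, state) pairs
    while True:
        top_state = stack[-1][1]
        kind, arg = ACTION.get((top_state, inp[j]), ("error", None))
        if kind == "shift":
            stack.append((inp[j], arg))
            j += 1
        elif kind == "reduce":
            lhs, n = PRODUCTIONS[arg]
            del stack[len(stack) - n:]                      # pop |alpha| pairs
            stack.append((lhs, GOTO[(stack[-1][1], lhs)]))  # Goto[top_state, X]
        elif kind == "accept":
            return True
        else:
            return False


print(lr_parse(list("((x))")))  # True
print(lr_parse(list("(x")))     # False
```

    Storing ⟨symbol, state⟩ pairs mirrors the pseudocode; a driver could keep only the states, because every state other than the start state is entered on a single grammar symbol.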

    Review. Items

    An item [X → α.β] says that the parser
      is looking for an X,
      has an α on top of the stack, and
      expects to find a string derived from β next in the input.

    Notes:
      [X → α.aβ] means that a should follow. Then we can shift it and still have a viable prefix.
      [X → α.] means that we could reduce X.
      But this is not always a good idea!

    SLR(1) Action Table

    For each state si and terminal a:
      If si has item X → α.aβ and Goto[i, a] = j, then Action[i, a] = shift j
      If si has item S' → S., then Action[i, $] = accept
      If si has item X → α. and a ∈ Follow(X) and X ≠ S', then Action[i, a] = reduce X → α
      Otherwise, Action[i, a] = error


    Limits of SLR Parsing

    SLR(1) is the simplest LR parsing method. SLR(1) is almost powerful enough, but some common programming language constructs are not SLR(1).

    Consider the grammar
      S → L = E | E
      L → *E | id
      E → L

    Limits of SLR Parsing (cont.)

    Consider two states of the DFA for recognizing viable prefixes.

    The first state:
      S' → .S
      S → .L = E
      S → .E
      L → .*E
      L → .id
      E → .L

    The second state (reached from the first on L):
      S → L. = E
      E → L.

    On input =, the SLR(1) parser in the second state can either shift (item S → L. = E) or reduce by E → L (since = ∈ Follow(E)): a shift/reduce conflict.
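    The conflict can be reproduced mechanically. The minimal Python sketch below computes Follow sets for the grammar (augmented with S' → S) and applies the SLR(1) rules to the second state. The names GRAMMAR, first_sets, follow_sets, slr_actions and STATE_2 are hypothetical, items are written as (lhs, rhs, dot) triples, and the sketch assumes, as is true for this grammar, that no production derives the empty string.

```python
GRAMMAR = [
    ("S'", ["S"]),
    ("S", ["L", "=", "E"]),
    ("S", ["E"]),
    ("L", ["*", "E"]),
    ("L", ["id"]),
    ("E", ["L"]),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_of(sym, first):
    """First set of a single symbol (a terminal is its own First set)."""
    return first[sym] if sym in NONTERMINALS else {sym}


def first_sets():
    # Assumes no epsilon-productions, so First(X Y ...) = First(X).
    first = {nt: set() for nt in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            new = first_of(rhs[0], first)
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def follow_sets():
    first = first_sets()
    follow = {nt: set() for nt in NONTERMINALS}
    follow["S'"].add("$")
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            for i, sym in enumerate(rhs):
                if sym not in NONTERMINALS:
                    continue
                # What follows sym: First of the next symbol, or Follow(lhs) at the end.
                new = first_of(rhs[i + 1], first) if i + 1 < len(rhs) else follow[lhs]
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return follow


def slr_actions(state_items, lookahead):
    """All entries SLR(1) puts in Action[state, lookahead]; items are (lhs, rhs, dot)."""
    follow = follow_sets()
    actions = []
    for lhs, rhs, dot in state_items:
        if dot < len(rhs) and rhs[dot] == lookahead:
            actions.append("shift")
        elif dot == len(rhs) and lhs != "S'" and lookahead in follow[lhs]:
            actions.append(f"reduce {lhs} -> {' '.join(rhs)}")
    return actions


# The second state: S -> L . = E  and  E -> L .
STATE_2 = [("S", ["L", "=", "E"], 1), ("E", ["L"], 1)]
print(sorted(follow_sets()["E"]))  # ['$', '=']
print(slr_actions(STATE_2, "="))   # ['shift', 'reduce E -> L']
```

    Both entries land in the same cell of the second state's row, which is exactly the shift/reduce conflict described above.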

    What's The Problem?

    The grammar is not SLR(1), but why?

    Focus on the reduce move in the second state. We are in the context S ⇒ E ⇒ L:
      no = can follow E in this context,
      even though = ∈ Follow(E) (in S ⇒ L = E ⇒ *E = E).

    The reduce move should not happen if an = follows in this context.

    What's The Problem? (Cont.)

    Problem: the SLR table has too many reduce actions. Using Follow is too coarse.

    In any given context, only some elements of Follow can actually follow a non-terminal.

    For example, Follow(E) = {=, $}, but
      in context S ⇒ E, only $ can follow E;
      in context S ⇒ L = E ⇒ *E1 = E, only = can follow E1.

    One Way to Fix The Problem: LR(1) Items

    Idea: refine Follow based on context.

    The context is described through items.

    An LR(1) item is a pair [X → α.β, a], where X → αβ is a production and a is the lookahead token or $.

    LR(k) is similar but with k tokens of lookahead. In practice, k = 1.

    LR(1) Items. Intuition

    [X → α.β, a] describes a state of the parser:
      we are trying to find an X, and
      we already have α on top of the stack, and
      we expect to see a prefix derived from βa.

    Back to reduce actions: if we have an item [X → α., a], perform the reduce only if the next token is a!
      This gives fewer reduce actions: we no longer reduce on every b ∈ Follow(X).


    Constructing Sets of LR(1) Items (1)

    Similar to the construction for LR(0).

    The states of the NFA are the LR(1) items of G.

    The start state is [S' → .S, $].

    Constructing Sets of LR(1) Items (2)

    1. For each LR(1) item [Y → α.Xβ, a],
         add an X-transition from [Y → α.Xβ, a] to [Y → αX.β, a].

    2. For each LR(1) item [Y → α.Xβ, a],
         for each production X → γ,
         for each terminal b ∈ First(βa),
         add an ε-transition from [Y → α.Xβ, a] to [X → .γ, b].
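    When the NFA is converted to a DFA, the ε-transitions of rule 2 are usually applied all at once as a closure operation. The minimal Python sketch below computes that closure for the grammar S → L = E | E, L → *E | id, E → L. Items are represented as hypothetical (lhs, rhs, dot, lookahead) tuples, and because this grammar has no ε-productions the sketch takes First(βa) to be First of the first symbol of βa; a general version would also have to handle nullable symbols.

```python
GRAMMAR = [
    ("S'", ("S",)),
    ("S", ("L", "=", "E")),
    ("S", ("E",)),
    ("L", ("*", "E")),
    ("L", ("id",)),
    ("E", ("L",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_sets():
    # Assumes no epsilon-productions, so First(X Y ...) = First(X).
    first = {nt: set() for nt in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            head = rhs[0]
            new = first[head] if head in NONTERMINALS else {head}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


FIRST = first_sets()


def closure(items):
    """Add [X -> .gamma, b] for every [Y -> alpha . X beta, a] and b in First(beta a)."""
    result = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot, a = work.pop()
        if dot == len(rhs) or rhs[dot] not in NONTERMINALS:
            continue                          # nothing to expand after the dot
        x = rhs[dot]
        head = (rhs[dot + 1:] + (a,))[0]      # first symbol of beta a
        lookaheads = FIRST[head] if head in NONTERMINALS else {head}
        for plhs, prhs in GRAMMAR:
            if plhs != x:
                continue
            for b in lookaheads:
                item = (x, prhs, 0, b)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)


start = closure({("S'", ("S",), 0, "$")})
print(len(start))  # 8
```

    The closure of the start item contains both [L → .id, =] and [L → .id, $]: the same production appears with two lookaheads, one for each context that introduced it.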

    NFA for Viable Prefixes in Detail

    For the grammar S → L = E | E, L → *E | id, E → L, the NFA is built up from the start item [S' → .S, $] in four steps:

    (1) [S' → .S, $] has an S-transition to [S' → S., $] and ε-transitions to [S → .E, $] and [S → .L = E, $].

    (2) [S → .L = E, $] has an L-transition to [S → L. = E, $] and ε-transitions to [L → .*E, =] and [L → .id, =]. The lookahead is = because First(= E $) = {=}.

    (3) [S → .E, $] has an E-transition to [S → E., $] and an ε-transition to [E → .L, $].

    (4) [E → .L, $] has an L-transition to [E → L., $] and ε-transitions to [L → .id, $] and [L → .*E, $].


    An Example Revisited

    Consider the DFA state reached on L from the start state, which contains the items
      [S → L. = E, $]
      [E → L., $]

    On input =, the LR(1) parser only shifts (item S → L. = E): the reduce by E → L happens only when the next token is $.

    Constructing LR(1) Parsing Tables

    1. Add a dummy S' → S production.
    2. Construct the NFA of LR(1) items as before.
    3. Convert the NFA into a DFA.
    4. Goto is defined exactly as before:
       Goto[i, A] = j if there is an A-transition from state i to state j
       (the transition function of the DFA).

    Constructing LR(1) Parsing Tables (Cont.)

    5. For each state si of the DFA and terminal a:
       If si has item [X → α.aβ, c] and Goto[i, a] = j, then action[i, a] = shift j
       If si has item [X → α., a] and X ≠ S', then action[i, a] = reduce X → α
       If si has item [S' → S., $], then action[i, $] = accept
       Otherwise, action[i, a] = error

    For an LR(1) grammar, every action[i, a] is uniquely defined (there are no conflicts).
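    Steps 1 to 5 can be combined into a small table builder. The Python sketch below is a self-contained illustration, not the construction used by any particular tool: it builds the LR(1) item-set DFA directly with closure and goto (skipping the explicit NFA), then fills action with sets so that conflicts remain visible. GRAMMAR, lr1_dfa, action_table and the other names are hypothetical, and First is again computed under the assumption that no production derives the empty string.

```python
# Hypothetical encoding of the grammar S -> L = E | E,  L -> * E | id,  E -> L
GRAMMAR = [
    ("S'", ("S",)),                 # step 1: the dummy production S' -> S
    ("S", ("L", "=", "E")),
    ("S", ("E",)),
    ("L", ("*", "E")),
    ("L", ("id",)),
    ("E", ("L",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}
SYMBOLS = sorted(NONTERMINALS | {s for _, rhs in GRAMMAR for s in rhs})


def first_sets():
    # Assumes no epsilon-productions, so First(X Y ...) = First(X).
    first = {nt: set() for nt in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            new = first[rhs[0]] if rhs[0] in NONTERMINALS else {rhs[0]}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


FIRST = first_sets()


def closure(items):
    """Items are (lhs, rhs, dot, lookahead); adds [X -> .gamma, b] for b in First(beta a)."""
    result, work = set(items), list(items)
    while work:
        lhs, rhs, dot, a = work.pop()
        if dot == len(rhs) or rhs[dot] not in NONTERMINALS:
            continue
        head = (rhs[dot + 1:] + (a,))[0]          # first symbol of beta a
        lookaheads = FIRST[head] if head in NONTERMINALS else {head}
        for plhs, prhs in GRAMMAR:
            if plhs != rhs[dot]:
                continue
            for b in lookaheads:
                item = (plhs, prhs, 0, b)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)


def goto(state, sym):
    moved = {(lhs, rhs, dot + 1, a) for lhs, rhs, dot, a in state
             if dot < len(rhs) and rhs[dot] == sym}
    return closure(moved) if moved else None


def lr1_dfa():
    """Steps 2-4: DFA states (sets of LR(1) items) and the transition function."""
    states = [closure({("S'", ("S",), 0, "$")})]
    trans = {}
    i = 0
    while i < len(states):
        for sym in SYMBOLS:
            target = goto(states[i], sym)
            if target is None:
                continue
            if target not in states:
                states.append(target)
            trans[(i, sym)] = states.index(target)
        i += 1
    return states, trans


def action_table(states, trans):
    """Step 5; each cell is a set, so a conflict shows up as a cell with 2+ entries."""
    action = {}
    for i, state in enumerate(states):
        for lhs, rhs, dot, a in state:
            if dot < len(rhs):
                if rhs[dot] in NONTERMINALS:
                    continue                      # handled by Goto, not action
                key, entry = rhs[dot], ("shift", trans[(i, rhs[dot])])
            elif lhs == "S'":
                key, entry = "$", ("accept",)
            else:
                key, entry = a, ("reduce", lhs, rhs)
            action.setdefault((i, key), set()).add(entry)
    return action


states, trans = lr1_dfa()
action = action_table(states, trans)
conflicts = {cell: entries for cell, entries in action.items() if len(entries) > 1}
after_L = trans[(0, "L")]           # holds [S -> L . = E, $] and [E -> L ., $]
print(len(states), len(conflicts))  # 14 0
print(action[(after_L, "$")])       # {('reduce', 'E', ('L',))}
```

    For this grammar no cell holds more than one entry, and in the state reached on L the reduce by E → L appears only under $, which is the behavior shown in the revisited example.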

    LALR Parsing

    Two bottom-up parsing methods: SLR and LR.

    Which one do we use? Neither:
      SLR is not powerful enough.
      LR parsing tables are too big (1000s of states vs. 100s of states for SLR).

    In practice, use LALR(1), which stands for Look-Ahead LR:
      a compromise between SLR(1) and LR(1).

    LALR Parsing (Cont.)

    Rough intuition: an LALR(1) parser for G has
      the number of states of an SLR parser, and
      some of the lookahead discrimination of LR(1).

    Idea: construct the DFA for LR(1), then merge the DFA states whose items differ only in the lookahead tokens. We say that such states have the same core.

    The Core of a Set of LR Items

    Definition: the core of a set of LR items is the set of first components.

    Example: the core of
      {[X → α.β, b], [Y → γ.δ, d]}
    is
      {X → α.β, Y → γ.δ}

    The core of an LR item is an LR(0) item.
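    Computing a core is a one-line projection. In this minimal Python sketch, items are hypothetical (lhs, rhs, dot, lookahead) tuples, and the Greek letters of the example are stood in for by single placeholder symbols such as "alpha".

```python
def core(items):
    """Core of a set of LR(1) items: drop the lookahead, keep the LR(0) item."""
    return frozenset((lhs, rhs, dot) for lhs, rhs, dot, _lookahead in items)


# [X -> alpha . beta, b] and [Y -> gamma . delta, d], with placeholder symbols
items = {("X", ("alpha", "beta"), 1, "b"), ("Y", ("gamma", "delta"), 1, "d")}
print(core(items) == {("X", ("alpha", "beta"), 1), ("Y", ("gamma", "delta"), 1)})  # True

# Two states that differ only in their lookaheads have the same core.
s1 = {("X", ("alpha",), 1, "a"), ("Y", ("beta",), 1, "b")}
s2 = {("X", ("alpha",), 1, "b"), ("Y", ("beta",), 1, "a")}
print(core(s1) == core(s2))  # True
```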


    An LALR(1) DFA

    Repeat until all states have distinct cores:
      Choose two distinct states with the same core.
      Merge the states by creating a new one with the union of all the items.
      Point edges from predecessors to the new state.
      The new state points to all the previous successors.

    [Figure: a DFA with states A to F, and the same DFA after states B and E, which share a core, are merged into a single state BE.]

    The LALR Parser Can Have Conflicts

    Consider, for example, the LR(1) states
      {[X → α., a], [Y → β., b]}
      {[X → α., b], [Y → β., a]}

    and the merged LALR(1) state
      {[X → α., a/b], [Y → β., a/b]}

    The merged state has a new reduce-reduce conflict.

    In practice such cases are rare.
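    The merge and the resulting conflict can be checked with a short Python sketch. It groups LR(1) states by core, unions their items, and lists the lookaheads on which more than one completed item asks for a reduce. The function names are hypothetical, α and β are single placeholder symbols, and the sketch only merges item sets; redirecting DFA edges to the merged state, as the merging procedure requires, is left out.

```python
from collections import defaultdict


def core(items):
    """LR(0) part of each item: (lhs, rhs, dot)."""
    return frozenset((lhs, rhs, dot) for lhs, rhs, dot, _ in items)


def merge_by_core(states):
    """Union the items of all LR(1) states that share a core (edges are not rebuilt)."""
    groups = defaultdict(set)
    for state in states:
        groups[core(state)] |= set(state)
    return [frozenset(items) for items in groups.values()]


def reduce_reduce_conflicts(state):
    """Lookaheads on which more than one completed item asks for a reduce."""
    reductions = defaultdict(set)
    for lhs, rhs, dot, a in state:
        if dot == len(rhs):
            reductions[a].add((lhs, rhs))
    return {a: prods for a, prods in reductions.items() if len(prods) > 1}


# The two LR(1) states above, with "alpha" and "beta" as placeholder symbols
A = frozenset({("X", ("alpha",), 1, "a"), ("Y", ("beta",), 1, "b")})
B = frozenset({("X", ("alpha",), 1, "b"), ("Y", ("beta",), 1, "a")})
print(reduce_reduce_conflicts(A), reduce_reduce_conflicts(B))  # {} {}

merged = merge_by_core([A, B])
print(len(merged))                                 # 1
print(sorted(reduce_reduce_conflicts(merged[0])))  # ['a', 'b']
```

    Neither LR(1) state has a conflict on its own; the merged state has reduce/reduce conflicts on both a and b.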

    LALR vs. LR Parsing

    LALR languages are not natural. They are an efficiency hack on LR languages.

    Any reasonable programming language has an LALR(1) grammar.

    LALR(1) has become a standard for programming languages and for parser generators.

    A Hierarchy of Grammar Classes

    Semantic Actions

    We can now illustrate how semantic actions are implemented for LR parsing.

    Keep attributes on the stack.

    On shift a, push the attribute for a on the stack.

    On reduce X → α:
      pop the attributes for α,
      compute the attribute for X,
      and push it on the stack.

    Performing Semantic Actions. Example

    Recall the example from an earlier lecture:

      E → T + E1     { E.val = T.val + E1.val }
        | T          { E.val = T.val }
      T → int * T1   { T.val = int.val * T1.val }
        | int        { T.val = int.val }

    Consider the parsing of the string 3 * 5 + 8.


    Performing Semantic Actions. Example

    Subscripts give the attribute value stored with each stack symbol; | separates the stack from the remaining input.

      | int * int + int            shift
      int₃ | * int + int           shift
      int₃ * | int + int           shift
      int₃ * int₅ | + int          reduce T → int
      int₃ * T₅ | + int            reduce T → int * T
      T₁₅ | + int                  shift
      T₁₅ + | int                  shift
      T₁₅ + int₈ |                 reduce T → int
      T₁₅ + T₈ |                   reduce E → T
      T₁₅ + E₈ |                   reduce E → T + E
      E₂₃ |                        accept
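    The trace can be reproduced with a value stack. The Python sketch below keeps one attribute next to each state: on shift it pushes the token's value, and on reduce it pops the right-hand side's attributes, runs the rule's semantic action, and pushes the result. The SLR(1) tables were built by hand for E → T + E | T, T → int * T | int (with E' → E added), and the names RULES, ACTION, GOTO and evaluate are hypothetical.

```python
# Hypothetical hand-built SLR(1) tables for
#   E' -> E    E -> T + E | T    T -> int * T | int
# States are numbered from 0 here.

# Rule name -> (left-hand side, right-hand-side length, semantic action)
RULES = {
    "E -> T + E":   ("E", 3, lambda v: v[0] + v[2]),  # E.val = T.val + E1.val
    "E -> T":       ("E", 1, lambda v: v[0]),         # E.val = T.val
    "T -> int * T": ("T", 3, lambda v: v[0] * v[2]),  # T.val = int.val * T1.val
    "T -> int":     ("T", 1, lambda v: v[0]),         # T.val = int.val
}

ACTION = {
    (0, "int"): ("shift", 3),
    (1, "$"): ("accept", None),
    (2, "+"): ("shift", 4), (2, "$"): ("reduce", "E -> T"),
    (3, "*"): ("shift", 6), (3, "+"): ("reduce", "T -> int"), (3, "$"): ("reduce", "T -> int"),
    (4, "int"): ("shift", 3),
    (5, "$"): ("reduce", "E -> T + E"),
    (6, "int"): ("shift", 3),
    (7, "+"): ("reduce", "T -> int * T"), (7, "$"): ("reduce", "T -> int * T"),
}
GOTO = {(0, "E"): 1, (0, "T"): 2, (4, "E"): 5, (4, "T"): 2, (6, "T"): 7}


def evaluate(tokens):
    """tokens: list of (terminal, value) pairs; returns the attribute of E."""
    tokens = list(tokens) + [("$", None)]
    j = 0
    stack = [(0, None)]                         # (state, attribute) pairs
    while True:
        state = stack[-1][0]
        terminal, value = tokens[j]
        kind, arg = ACTION.get((state, terminal), ("error", None))
        if kind == "shift":
            stack.append((arg, value))          # push the attribute of the token
            j += 1
        elif kind == "reduce":
            lhs, n, semantic_action = RULES[arg]
            rhs_values = [attr for _, attr in stack[len(stack) - n:]]
            del stack[len(stack) - n:]          # pop the attributes of the rhs
            new_state = GOTO[(stack[-1][0], lhs)]
            stack.append((new_state, semantic_action(rhs_values)))
        elif kind == "accept":
            return stack[-1][1]
        else:
            raise SyntaxError(f"unexpected {terminal!r} in state {state}")


tokens = [("int", 3), ("*", None), ("int", 5), ("+", None), ("int", 8)]
print(evaluate(tokens))  # 23
```

    Operator tokens carry None as their attribute; the semantic actions simply skip that slot (v[1]).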

    Notes

    The previous discussion shows how synthesized attributes are computed by LR parsers.

    It is also possible to compute inherited attributes in an LR parser.

    Using Parser Generators

    Most common parser generators are LALR(1).

    A parser generator constructs an LALR(1) table and reports an error when a table entry is multiply defined:
      a shift and a reduce, called a shift/reduce conflict;
      multiple reduces, called a reduce/reduce conflict.

    An ambiguous grammar will generate conflicts.

    What do we do in that case?

    Shift/Reduce Conflicts

    Typically due to ambiguities in the grammar.

    Classic example: the dangling else
      S → if E then S | if E then S else S | OTHER

    We will have a DFA state containing
      [S → if E then S., else]
      [S → if E then S. else S, x]

    If else follows, then we can shift or reduce.

    The default (bison, CUP, etc.) is to shift, which is the desired behavior in this case: the else attaches to the nearest then.

    More Shift/Reduce Conflicts

    Consider the ambiguous grammar
      E → E + E | E * E | int

    We will have a state containing
      [E → E * .E, +]
      [E → .E + E, +]
    and its E-successor containing
      [E → E * E., +]
      [E → E. + E, +]

    Again we have a shift/reduce conflict on input +.
      We need to reduce (* binds more tightly than +).
      Recall the solution: declare the precedence of * and +.

    More Shift/Reduce Conflicts

    In bison, declare precedence and associativity:
      %left '+'
      %left '*'
    Operators declared on later lines have higher precedence, so * binds more tightly than +.

    The precedence of a rule is that of its last terminal.
      See the bison manual for ways to override this default.

    Resolve a shift/reduce conflict with a shift if:
      the rule or the input terminal has no declared precedence,
      the input terminal has higher precedence than the rule, or
      the precedences are the same and the terminal is right-associative.
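    The resolution rules above can be written as a small decision function. This Python sketch is a simplified model of the rules as stated here, not bison's implementation: PRECEDENCE, TERMINALS, rule_precedence and resolve are hypothetical, the %right '^' level is an invented extra operator used only to exercise the right-associative case, and %nonassoc (which turns the table entry into an error) is not modeled.

```python
# Hypothetical model of the declarations
#   %left '+'
#   %left '*'
#   %right '^'
# Later declarations bind tighter, so the level grows down the list.
PRECEDENCE = {"+": (1, "left"), "*": (2, "left"), "^": (3, "right")}
TERMINALS = {"+", "*", "^", "int", "if", "then", "else"}


def rule_precedence(rhs):
    """A rule's precedence is that of its last terminal (None if undeclared)."""
    terminals = [sym for sym in rhs if sym in TERMINALS]
    return PRECEDENCE.get(terminals[-1]) if terminals else None


def resolve(rule_rhs, lookahead):
    """Choose 'shift' or 'reduce' for a shift/reduce conflict on lookahead."""
    rule = rule_precedence(rule_rhs)
    token = PRECEDENCE.get(lookahead)
    if rule is None or token is None:
        return "shift"                 # no precedence declared: default to shift
    if token[0] > rule[0]:
        return "shift"                 # input terminal binds tighter than the rule
    if token[0] < rule[0]:
        return "reduce"                # rule binds tighter than the input terminal
    return "shift" if token[1] == "right" else "reduce"  # same level: associativity


print(resolve(("E", "*", "E"), "+"))              # reduce
print(resolve(("E", "+", "E"), "+"))              # reduce (left-associative)
print(resolve(("E", "+", "E"), "*"))              # shift
print(resolve(("E", "^", "E"), "^"))              # shift (right-associative)
print(resolve(("if", "E", "then", "S"), "else"))  # shift (no precedence declared)
```

    The last call models the dangling-else case: neither then nor else has a declared precedence, so the default shift applies.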


    Using Precedence to Solve S/R Conflicts

    Back to our example: a state containing
      [E → E * .E, +]
      [E → .E + E, +]
    and its E-successor containing
      [E → E * E., +]
      [E → E. + E, +]

    We will choose reduce because the precedence of rule E → E * E is higher than that of terminal +.

    Using Precedence to Solve S/R Conflicts

    Same grammar as before:
      E → E + E | E * E | int

    We will also have a state containing
      [E → E + .E, +]
      [E → .E + E, +]
    and its E-successor containing
      [E → E + E., +]
      [E → E. + E, +]

    Now we also have an S/R conflict on input +.
      We choose reduce because E → E + E and + have the same precedence and + is left-associative.

    Using Precedence to Solve S/R Conflicts

    Back to our dangling else example:
      [S → if E then S., else]
      [S → if E then S. else S, x]

    We can eliminate the conflict by declaring else with higher precedence than then.

    But this starts to look like hacking the tables.

    Best to avoid overuse of precedence declarations, or you'll end up with unexpected parse trees.

    Reduce/Reduce Conflicts

    Usually due to gross ambiguity in the grammar.

    Example: a sequence of identifiers
      S → ε | id | id S

    There are two parse trees for the string id:
      S → id
      S → id S → id

    How does this confuse the parser?

    More on Reduce/Reduce Conflicts

    Consider the start state
      [S' → .S, $]
      [S → ., $]
      [S → .id, $]
      [S → .id S, $]
    and its id-successor
      [S → id., $]
      [S → id.S, $]
      [S → ., $]
      [S → .id, $]
      [S → .id S, $]

    The id-successor has a reduce/reduce conflict on input $: reduce by S → id, or reduce by S → ε. The two choices correspond to the two derivations
      S' ⇒ S ⇒ id
      S' ⇒ S ⇒ id S ⇒ id

    Better to rewrite the grammar: S → ε | id S

    Strange Reduce/Reduce Conflicts

    Consider the grammar
      S → P R ,
      NL → N | N , NL
      P → T | NL : T
      R → T | N : T
      N → id
      T → id

    P - parameters specification
    R - result specification
    N - a parameter or result name
    T - a type name
    NL - a list of names


    Strange Reduce/Reduce Conflicts

    In P, an id is
      an N when followed by , or :
      a T when followed by id
    In R, an id is
      an N when followed by :
      a T when followed by ,

    This is an LR(1) grammar.

    But it is not LALR(1). Why? For obscure reasons.

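    A Few LR(1) States

    Each line below is an item followed by its lookahead.

    State 1:
      P → .T           id
      P → .NL : T      id
      NL → .N          :
      NL → .N , NL     :
      N → .id          :
      N → .id          ,
      T → .id          id

    State 2:
      R → .T           ,
      R → .N : T       ,
      T → .id          ,
      N → .id          :

    State 3 (reached from state 1 on id):
      T → id.          id
      N → id.          :
      N → id.          ,

    State 4 (reached from state 2 on id):
      T → id.          ,
      N → id.          :

    LALR merge of states 3 and 4:
      T → id.          id/,
      N → id.          :/,

    LALR reduce/reduce conflict on ","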

    What Happened?

    Two distinct states were confused because they have the same core.

    Fix: add dummy productions to distinguish the two confused states.

    E.g., add
      R → id bogus
    where bogus is a terminal not used by the lexer.
      This production will never be used during parsing.
      But it distinguishes R from P.

    A Few LR(1) States After Fix

    Each line below is an item followed by its lookahead.

    State 1:
      P → .T           id
      P → .NL : T      id
      NL → .N          :
      NL → .N , NL     :
      N → .id          :
      N → .id          ,
      T → .id          id

    State 2:
      R → .T           ,
      R → .N : T       ,
      R → .id bogus    ,
      T → .id          ,
      N → .id          :

    State 3 (reached from state 1 on id):
      T → id.          id
      N → id.          :
      N → id.          ,

    State 4 (reached from state 2 on id):
      T → id.          ,
      N → id.          :
      R → id. bogus    ,

    Different cores, so no LALR merging.

    Notes on Parsing

    Parsing:
      a solid foundation: context-free grammars
      a simple parser: LL(1)
      a more powerful parser: LR(1)
      an efficiency hack: LALR(1)
      LALR(1) parser generators

    Next time we move on to semantic analysis.