lr p - parasol laboratoryrwerger/courses/434/lec12-sum.pdfesente d by stack is viable, p arser has...

26

Upload: others

Post on 03-Mar-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: LR P - Parasol Laboratoryrwerger/Courses/434/lec12-sum.pdfesente d by stack is viable, p arser has not se en a dete ctable err or. y If the grammar is unam biguous, there a unique

LR ParsingQuick Review:� top-down parsers build a parse tree from root to leaves� bottom-up parsers build a parse tree from leaves to root� LR(1) parsers: (bottom up)| scan the input from left to right| build a rightmost derivation in reverse| use a single token lookahead to disambiguate� have a simple, table-driven, shift-reduce skeleton� encode grammatical knowledge in tablesLR parsers are practical, e�cient, and easy to build.CPSC434 (LR Summary) Lecture 11, Page 1

Page 2: LR P - Parasol Laboratoryrwerger/Courses/434/lec12-sum.pdfesente d by stack is viable, p arser has not se en a dete ctable err or. y If the grammar is unam biguous, there a unique

LR(1) ParsingThe skeleton parser:token = next token()repeat forevers = top of stackif action[s,token] = "shift si"' thenpush tokenpush sitoken = next token()else if action[s,token] = "reduce A ::= �"thenpop 2 � j � j symbolss = top of stackpush Apush goto[s,A]else if action[s, token] = "accept" thenreturnelse error()This takes k shifts and l reduces, where k is the length of the inputstring and l depends on the grammar.Equivalent to Figure 4.30, Aho, Sethi, and Ullman.CPSC434 (LR Summary) Lecture 11, Page 2

Page 3: LR P - Parasol Laboratoryrwerger/Courses/434/lec12-sum.pdfesente d by stack is viable, p arser has not se en a dete ctable err or. y If the grammar is unam biguous, there a unique

Example TablesACTION GOTOid + * $ <expr> <term> <factor>S0 s4 | | | 1 2 3S1 | | | acc | | |S2 | s5 | r3 | | |S3 | r5 s6 r5 | | |S4 | r6 r6 r6 | | |S5 s4 | | | 7 2 3S6 s4 | | | | 8 3S7 | | | r2 | | |S8 | r4 | r4 | | |The Grammar1 <goal> ::= <expr>2 <expr> ::= <term> + <expr>3 j <term>4 <term> ::= <factor> * <term>5 j <factor>6 <factor> ::= idCPSC434 (LR Summary) Lecture 11, Page 3

Page 4: LR P - Parasol Laboratoryrwerger/Courses/434/lec12-sum.pdfesente d by stack is viable, p arser has not se en a dete ctable err or. y If the grammar is unam biguous, there a unique

LR(1) ParsingThere are three commonly used algorithms to build tables for an\LR" parser:1. SLR(1)� smallest class of grammars� smallest tables (number of states)� simple, fast construction2. LR(1)� full set of LR(1) grammars� largest tables (number of states)� slow, large construction3. LALR(1)� intermediate sized set of grammars� same number of states as SLR(1)� canonical construction is slow and large� better construction techniques existAn LR(1) parser for either ALGOL or PASCAL has several thousandstates, while an SLR(1) or LALR(1) parser for the same languagemay have several hundred states.CPSC434 (LR Summary) Lecture 11, Page 4

Page 5: LR P - Parasol Laboratoryrwerger/Courses/434/lec12-sum.pdfesente d by stack is viable, p arser has not se en a dete ctable err or. y If the grammar is unam biguous, there a unique

FIRSTFor a string of grammar symbols �, de�ne FIRST(�) as� the set of terminal symbols that begin strings derived from �� If �)� �, then � 2 FIRST(�)FIRST(�) contains the set of tokens that are valid in the initialposition in �To build FIRST(X):1. if X is a terminal, FIRST(X) is fXg2. if X ::= �, then � 2 FIRST(X).3. if X ::= Y1Y2 � � �Yk then put FIRST(Y1) in FIRST(X)4. if X is a non-terminal and X ::= Y1Y2 � � �Yk, thena 2 FIRST(X) if a 2 FIRST(Yi) and � 2 FIRST(Yj) for all1 � j < i.(If � 6 2 FIRST(Y1), then FIRST(Yi) is irrelevant, for 1 < i.)CPSC434 (LR Summary) Lecture 11, Page 5

Page 6: LR P - Parasol Laboratoryrwerger/Courses/434/lec12-sum.pdfesente d by stack is viable, p arser has not se en a dete ctable err or. y If the grammar is unam biguous, there a unique

FOLLOWFor a non-terminal A, de�ne FOLLOW(A) asthe set of terminals that can appear immediately to the right ofA in some sentential formThus, a non-terminal's FOLLOW set speci�es the tokens that canlegally appear after it.A terminal symbol has no FOLLOW set.To build FOLLOW(X):1. place eof in FOLLOW(<goal>)2. if A ::= �B�, then put fFIRST(�)� �g in FOLLOW(B)3. if A ::= �B then put FOLLOW(A) in FOLLOW(B)4. if A ::= �B� and � 2 FIRST(�), then put FOLLOW(A) inFOLLOW(B)CPSC434 (LR Summary) Lecture 11, Page 6

Page 7: LR P - Parasol Laboratoryrwerger/Courses/434/lec12-sum.pdfesente d by stack is viable, p arser has not se en a dete ctable err or. y If the grammar is unam biguous, there a unique

ExampleFor our example grammar, these sets are:Symbol FIRST FOLLOW<goal> f id,number g f eof g<expr> f id,number g f eof g<term> f id,number g f eof,+,- g<factor> f id,number g f eof,+,-,*,/ g+ f + g |- f - g |* f * g |/ f / g |id f id g |number f number g |Computing these sets is the 1st step in building LR(1) tablesCPSC434 (LR Summary) Lecture 11, Page 7

Page 8: LR P - Parasol Laboratoryrwerger/Courses/434/lec12-sum.pdfesente d by stack is viable, p arser has not se en a dete ctable err or. y If the grammar is unam biguous, there a unique

LR(1) GrammarsGiven these de�nitions, we can formally de�ne an LR(1) grammar.An augmented grammary G is LR(1) if the three conditions1. Start)� �Aw )� ��w,2. Start)� Bx)� ��y,3. FIRST(w) = FIRST(y)imply that �Ay = Bx.(That is, � = , A = B, and x = y.)To extend this to LR(k) grammars, we de�ne FIRSTk(�) as theleading k symbols that begin strings derived from �.The de�nition extends naturally by changing rule 3.y An \augmented grammar" is one where the start symbol appearsonly on the lhs of productions.For the rest of LR parsing, assume the grammar is augmented witha production S0 ::= S.CPSC434 (LR Summary) Lecture 11, Page 8

Page 9: LR P - Parasol Laboratoryrwerger/Courses/434/lec12-sum.pdfesente d by stack is viable, p arser has not se en a dete ctable err or. y If the grammar is unam biguous, there a unique

LR(k) ItemsThe table construction algorithms use LR(k) items to represent theset of possible states in a parse.An LR(k) item is a pair [�; �], where� is a production from G with a � at some position in the rhs� is a lookahead string containing k symbols (terminals or eof)Two cases of interest are k = 0 and k = 1.LR(0) items play a key role in the SLR(1) table constructionalgorithm.LR(1) items play a key role in the LR(1) and LALR(1) tableconstruction algorithms.CPSC434 (LR Summary) Lecture 11, Page 9

Page 10: LR P - Parasol Laboratoryrwerger/Courses/434/lec12-sum.pdfesente d by stack is viable, p arser has not se en a dete ctable err or. y If the grammar is unam biguous, there a unique

Viable pre�xA viable pre�x is1. a pre�x of a right-sentential form that does not continue pastthe right end of the rightmost handle of that sentential formy, or2. a pre�x of a right-sentential form that can appear on the stackof a shift-reduce parser.If the viable pre�x is a proper pre�x (that is, a handle), it is possibleto add terminals onto its end to form a right-sentential form.As long as the pre�x represented by the stack is viable, theparser has not seen a detectable error.y If the grammar is unambiguous, there is a unique rightmost handle.LR(k) grammars are unambiguous. Operator grammars may beambiguous, but are still parsed with shift-reduce parsers.CPSC434 (LR Summary) Lecture 11, Page 10

Page 11: LR P - Parasol Laboratoryrwerger/Courses/434/lec12-sum.pdfesente d by stack is viable, p arser has not se en a dete ctable err or. y If the grammar is unam biguous, there a unique

ExampleThe � indicates how much of an item we have seen at a given statein the parse.[A ::= �XY Z] indicates that the parser is looking for a string thatcan be derived from XY Z[A ::= XY � Z] indicates that the parser has seen a string derivedfrom XY and is looking for one derivable from ZLR(0) Items: (no lookahead)A ::= XY Z generates 4 LR(0) items.1. [A ::= �XY Z]2. [A ::= X � Y Z]3. [A ::= XY � Z]4. [A ::= XY Z�]CPSC434 (LR Summary) Lecture 11, Page 11

Page 12: LR P - Parasol Laboratoryrwerger/Courses/434/lec12-sum.pdfesente d by stack is viable, p arser has not se en a dete ctable err or. y If the grammar is unam biguous, there a unique

LR(1) ItemsWhat about LR(1) items?� In an LR(1) item, all the lookahead strings are constrained tohave length 1.� An LR(1) item might look like [A ::= X � Y Z; a].Key Observations:1. unambiguous grammar ) unique rightmost derivation2. handles appear on upper fringe of tree built by reverserightmost derivationcan keep fringe on the stack3. while L(G) isn't regular, the language of handles is, becausethere are only a �nite number of handles4. can recurse to match terms by leaving dfa state on the stackAll the dfa knowledge is encoded in the Action and Goto tablesCPSC434 (LR Summary) Lecture 11, Page 12

Page 13: LR P - Parasol Laboratoryrwerger/Courses/434/lec12-sum.pdfesente d by stack is viable, p arser has not se en a dete ctable err or. y If the grammar is unam biguous, there a unique

LR(1) ItemsThe LR(1) table construction algorithm uses a speci�c set of of setsof LR(1) items, called the canonical collection of sets of LR(1)items for a grammar G.The canonical collection represents the set of valid states forthe parser.The items in each set of the canonical collection fall into two classes:kernel items: items where � is not at the left end of the rhsand [S0 ::= �S; eof]non-kernel items: all items where � is at the left end of rhsEach item corresponds to a point in the parse.To generate a parser state from a kernel item, we take its closure.) if [A ::= � �B�; a] 2 Ij, then, in state j, the parser might nextsee a string derivable from B�) to form its closure, add all items of the form [B ::= � ; �] 2 GCPSC434 (LR Summary) Lecture 11, Page 13

Page 14: LR P - Parasol Laboratoryrwerger/Courses/434/lec12-sum.pdfesente d by stack is viable, p arser has not se en a dete ctable err or. y If the grammar is unam biguous, there a unique

LR(1) ItemsWhat's the point of the lookahead symbols?� carry them along to allow us to choose correct reduction whenthere is any choice� lookaheads are bookkeeping, unless item has � at right end.| in [A ::= X � Y Z; a], a has no direct use| in [A ::= XY Z�; a], a is useful� allows use of grammars that are not uniquely invertibleRecall, the SLR(1) construction uses LR(0) items!The point:For [A ::= ��,a] and [B ::= ��,b], we can decide between reducingto A and to B by looking at limited right context!CPSC434 (LR Summary) Lecture 11, Page 14

Page 15: LR P - Parasol Laboratoryrwerger/Courses/434/lec12-sum.pdfesente d by stack is viable, p arser has not se en a dete ctable err or. y If the grammar is unam biguous, there a unique

LR(1) ItemsThe canonical collection of LR(1) items:� set of items derivable from [S0 ::= �S; eof]� set of all items that can derive the �nal con�gurationThe set of sets where, for each item [A ::= X � Y; u], there exists arightmost derivationS0)� rAst) rxyst)� r0x0utwhere rx)� r0x0 and ys)� u.To construct the canonical collection we need two functions:� closure(I)� goto(I;X)CPSC434 (LR Summary) Lecture 11, Page 15

Page 16: LR P - Parasol Laboratoryrwerger/Courses/434/lec12-sum.pdfesente d by stack is viable, p arser has not se en a dete ctable err or. y If the grammar is unam biguous, there a unique

Closure(I)Given an item [A ::= � �B�; a], its closure contains the item andany other items that can generate legal substrings to follow �.Thus, if the parser has viable pre�x � on its stack, the input shouldreduce to B� (or for some item [B ::= � ; b] in the closure).To compute closure(I)function closure(I)add = 1while (add 6= 0)add = 0for each item [A ::= � �B�; a] 2 I,each production B ::= 2 G0,and each terminal b 2 FIRST(�a),if [B ::= � ; b] 6 2 I thenadd [B ::= � ; b] to Iadd = 1return IAho, Sethi, and Ullman, Figure 4.38CPSC434 (LR Summary) Lecture 11, Page 16

Page 17: LR P - Parasol Laboratoryrwerger/Courses/434/lec12-sum.pdfesente d by stack is viable, p arser has not se en a dete ctable err or. y If the grammar is unam biguous, there a unique

Goto(I,X)Let I be a set of LR(1) items and X be a grammar symbol.Then, goto(I;X) is the closure of the set of all items[A ::= �X � �; a] such that [A ::= � �X�; a] 2 IIf I is the set of valid items for some viable pre�x , thengoto(I;X) is the set of valid items for the viable pre�x X .goto(I;X) represents state after recognizing X in state I .To compute goto(I;X)function goto(I,X)let J be the set of items [A ::= �X � �; a]such that [A ::= � �X�; a] 2 Ireturn closure(J)Aho, Sethi, and Ullman, Figure 4.38CPSC434 (LR Summary) Lecture 11, Page 17

Page 18: LR P - Parasol Laboratoryrwerger/Courses/434/lec12-sum.pdfesente d by stack is viable, p arser has not se en a dete ctable err or. y If the grammar is unam biguous, there a unique

Collection of Sets of LR(1) ItemsWe start the construction of the collection of sets of LR(1) itemswith the item [S0 ::= �S; eof], whereS0 is the start symbol of the augmented grammar G0S is the start symbol of G, andeof is the right end of string markerTo compute the collection of sets of LR(1) itemsprocedure items(G0)C = fclosure(f[S0 ::= �S; eof]g)gadd = 1while (add 6= 0)add = 0for each set of items I in C andeach grammar symbol X such thatgoto(I;X) 6=; andgoto(I;X) 6 2 Cadd goto(I;X) to Cadd = 1Aho, Sethi, and Ullman, Figure 4.38CPSC434 (LR Summary) Lecture 11, Page 18

Page 19: LR P - Parasol Laboratoryrwerger/Courses/434/lec12-sum.pdfesente d by stack is viable, p arser has not se en a dete ctable err or. y If the grammar is unam biguous, there a unique

ExampleStep 1I0 f[g! � e,eof]gI0 closure(I0)f[g! � e,eof], [e! � t + e,eof], [e ! � t,eof],[t! � f * t,+], [t! � f * t,eof], [t! � f,+], [t! � f,eof],[f! � id,+], [f! � id,eof]gIteration 1I1 goto(I0; e)I2 goto(I0; t)I3 goto(I0; f)I4 goto(I0; id)Iteration 2I5 goto(I2; +)I6 goto(I3; *)Iteration 3I7 goto(I5; e)I8 goto(I6; t)CPSC434 (LR Summary) Lecture 11, Page 19

Page 20: LR P - Parasol Laboratoryrwerger/Courses/434/lec12-sum.pdfesente d by stack is viable, p arser has not se en a dete ctable err or. y If the grammar is unam biguous, there a unique

ExampleI0: [g ! � e,eof], [e ! � t + e,eof], [e! � t,eof],[t ! � f * t,f+,eofg], [t ! � f,f+,eofg], [f ! � id,f+,eofg]I1: [g! e �,eof]I2: [e ! t �,eof], [e! t � + e,eof]I3: [t ! f �,f+,eofg], [t ! f � * t, f+,eofg]I4: [f ! id �, f+,*,eofg]I5: [e ! t + � e,eof], [e! � t + e,eof], [e ! � t,eof],[t ! � f * t,f+,eofg], [t ! � f,f+,eofg],[f ! � id,f+,*,eofg]I6: [t ! f * � t,f+,eofg], [t ! � f * t,f+,eofg],[t ! � f,f+,eofg], [f ! �id; f+,*,eofg]I7: [e ! t + e �; eof]I8: [t ! f * t �; f+,eofg]CPSC434 (LR Summary) Lecture 11, Page 20

Page 21: LR P - Parasol Laboratoryrwerger/Courses/434/lec12-sum.pdfesente d by stack is viable, p arser has not se en a dete ctable err or. y If the grammar is unam biguous, there a unique

LR(1) Table ConstructionThe Algorithm1. construct the collection of sets of LR(1) items for G0.2. State i of the parser is constructed from Ii.(a) if [A! � � a�; b] 2 Ii and goto(Ii; a) = Ij, then setaction[i,a] to \shift j". (a must be a terminal)(b) if [A! ��; a] 2 Ii, then set action[i,a] to\reduce A! �".(c) if [S0 ! S�; eof] 2 Ii, then set action[i,eof] to\accept".3. If goto(Ii; A) = Ij, then set goto[i,A] to j.4. All other entries in action and goto are set to \error"5. The initial state of the parser is the state constructed from theset containing the item [S0 ! �S; eof].Aho, Sethi, and Ullman, Algorithm 4.10CPSC434 (LR Summary) Lecture 11, Page 21

Page 22: LR P - Parasol Laboratoryrwerger/Courses/434/lec12-sum.pdfesente d by stack is viable, p arser has not se en a dete ctable err or. y If the grammar is unam biguous, there a unique

What can go wrong?Rules 2a, 2b, and 2c can multiply de�ne a position inthe action table. In this case, the grammar is notLR(1).Two cases arise:shift/reduce This is called a shift/reduce con ict. In general, itindicates an ambiguous construct in the grammar.� can modify the grammar to eliminate it� can resolve in favor of shiftingclassical example: dangling elsereduce/reduce This is called a reduce/reduce con ict. Again, itindicates an ambiguous construct in the grammar.� often, no simple resolution� parse a nearby languageclassical example: PL/I call and subscriptCPSC434 (LR Summary) Lecture 11, Page 22

Page 23: LR P - Parasol Laboratoryrwerger/Courses/434/lec12-sum.pdfesente d by stack is viable, p arser has not se en a dete ctable err or. y If the grammar is unam biguous, there a unique

LALR(1) ParsingDe�ne the core of a set of LR(1) items to be the set ofLR(0) items derived by ignoring the lookahead symbols.Thus, the two sets� f[A) � � �; a]; [A) � � �; b]g, and� f[A) � � �; c]; [A) � � �; d]ghave the same core.Key Idea:If two sets of LR(1) items, Ii and Ij, have the same core, we canmerge the states that represent them in the action and gototables.CPSC434 (LR Summary) Lecture 11, Page 23

Page 24: LR P - Parasol Laboratoryrwerger/Courses/434/lec12-sum.pdfesente d by stack is viable, p arser has not se en a dete ctable err or. y If the grammar is unam biguous, there a unique

LALR(1) Table ConstructionTo construct LALR(1) parsing tables, we can insert a single stepinto the LR(1) algorithm.(1.5) For each core present among the set of LR(1) items, �nd allsets having that core and replace these sets by their union.The goto function must be updated to re ect the replacementsets.The resulting algorithm has large space requirementsCPSC434 (LR Summary) Lecture 11, Page 24

Page 25: LR P - Parasol Laboratoryrwerger/Courses/434/lec12-sum.pdfesente d by stack is viable, p arser has not se en a dete ctable err or. y If the grammar is unam biguous, there a unique

LALR(1) Table ConstructionA more space e�cient algorithm can be derived by observing that:� we can represent Ii by its kernel, those items that are either theinitial item [S0! �S; eof] or do not have the � at the left endof the rhs.� we can compute shift, reduce, and goto actions for the statederived from Ii directly from kernel(Ii).This method avoids building the complete collection of sets ofLR(1) items.CPSC434 (LR Summary) Lecture 11, Page 25

Page 26: LR P - Parasol Laboratoryrwerger/Courses/434/lec12-sum.pdfesente d by stack is viable, p arser has not se en a dete ctable err or. y If the grammar is unam biguous, there a unique

LR(k) LanguagesSLR(1)LALR(1)LR(1)LR(k)Grammars

LanguagesSLR(1) LALR(1)LR(1) LR(k)GrammarsCPSC434 (LR Summary) Lecture 11, Page 26