CSC 3130: Automata theory and formal languages
LR(k) grammars
Fall 2008
MELJUN P. CORTES, MBA, MPA, BSCS, ACS
LR(0) example from last time
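Grammar: A → aAb | ab

DFA states (sets of valid items):
1: A → •aAb, A → •ab
2: A → a•Ab, A → a•b, A → •aAb, A → •ab
3: A → ab•
4: A → aA•b
5: A → aAb•

Transitions: 1 --a--> 2, 2 --a--> 2, 2 --b--> 3, 2 --A--> 4, 4 --b--> 5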
LR(0) parsing example revisited
Grammar: A → aAb | ab
Derivation: A ⇒ aAb ⇒ aabb

Stack      Input   Action
1          aabb    S
1a2        abb     S
1a2a2      bb      S
1a2a2b3    b       R
1a2A4      b       S
1a2A4b5    ε       R
1A         ε

[Figure: the LR(0) DFA from the previous slide and the parse tree of aabb, with • marking the focus after each step]
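The trace above can be reproduced with a small table-driven parser. This is a minimal Python sketch, not code from the lecture: GOTO, REDUCE and lr0_parse are hypothetical names, and the tables simply hard-code the five-state DFA for A → aAb | ab, with states numbered as in the stack trace.

```python
# Hypothetical encoding of the five-state DFA for A -> aAb | ab.
GOTO = {(1, "a"): 2, (2, "a"): 2, (2, "b"): 3, (2, "A"): 4, (4, "b"): 5}
# Reduce states: state -> (left-hand side, length of the right-hand side).
REDUCE = {3: ("A", 2), 5: ("A", 3)}


def lr0_parse(word):
    """Return the (stack, input, action) steps for word, or None if it is rejected."""
    stack, pos, trace = [1], 0, []   # the stack alternates states and symbols
    while True:
        state = stack[-1]
        shown = "".join(map(str, stack))
        rest = word[pos:] or "ε"
        if state in REDUCE:
            lhs, n = REDUCE[state]
            trace.append((shown, rest, "R"))
            del stack[-2 * n:]               # pop n symbols and n states
            if stack == [1]:                 # back at the start state
                if pos < len(word):
                    return None              # input left over
                trace.append(("1" + lhs, "ε", "accept"))
                return trace
            nxt = GOTO.get((stack[-1], lhs))
            if nxt is None:
                return None
            stack += [lhs, nxt]
        else:
            ch = word[pos] if pos < len(word) else None
            nxt = GOTO.get((state, ch)) if ch in ("a", "b") else None
            if nxt is None:
                return None                  # no shift possible
            trace.append((shown, rest, "S"))
            stack += [ch, nxt]
            pos += 1


for step in lr0_parse("aabb"):
    print(*step)
```

The parser never looks at the next input symbol to decide between shifting and reducing; the state on top of the stack alone determines the action, which is what the 0 in LR(0) refers to.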
Meaning of LR(0) items
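An item A → α•Xβ describes a position in the parse tree of A: the focus • sits after α and just before X, and Xβ is the undiscovered part.

εNFA transitions from A → α•Xβ lead to:
• X → •γ: shift focus to the subtree rooted at X (if X is a nonterminal)
• A → αX•β: move past the subtree rooted at X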
Outline of LR(0) parsing algorithm
• Algorithm can perform two actions:
  – shift (S): no complete item is valid
  – reduce (R): there is one valid item, and it is complete
• What if:
  – some valid items are complete, some not: S / R conflict
  – more than one valid complete item: R / R conflict
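The case analysis above can be written as a short check on a set of valid items. This is an illustrative Python sketch under assumptions of its own: classify is a hypothetical helper, an item is a (left-hand side, right-hand side, dot position) triple, and the item set is assumed to be non-empty.

```python
def classify(items):
    """Classify a non-empty set of valid LR(0) items, each a (lhs, rhs, dot) triple."""
    complete = [it for it in items if it[2] == len(it[1])]    # dot at the end
    incomplete = [it for it in items if it[2] < len(it[1])]   # dot before a symbol
    conflicts = []
    if complete and incomplete:
        conflicts.append("S/R conflict")
    if len(complete) > 1:
        conflicts.append("R/R conflict")
    if conflicts:
        return conflicts
    return ["reduce"] if complete else ["shift"]


# No complete item: shift.
print(classify({("A", "aAb", 1), ("A", "ab", 1), ("A", "aAb", 0), ("A", "ab", 0)}))
# One valid item, and it is complete: reduce.
print(classify({("A", "ab", 2)}))
# A state mixing two complete items with incomplete ones: both kinds of conflict.
print(classify({("A", "aA", 1), ("A", "a", 1), ("B", "a", 1),
                ("B", "ab", 1), ("A", "aA", 0), ("A", "a", 0)}))
```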
Definition of LR(0) grammar
• A grammar is LR(0) if S/R, R/R conflicts never occur
  – LR means parsing happens left to right and produces a rightmost derivation
• LR(0) grammars are unambiguous and have a fast parsing algorithm
• Unfortunately, they are not “expressive” enough to describe programming languages
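Hierarchy of context-free grammars
Grammar classes, from the largest to the smallest (each contains the ones listed below it):
• context-free grammars: parse using CYK algorithm (slow)
• LR(∞) grammars
• …
• LR(1) grammars: java, perl, python, …
• LR(0) grammars: parse using LR(0) algorithm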
A grammar that is not LR(0)
S → A (1) | Bc (2)
A → aA (3) | a (4)
B → a (5) | ab (6)

input: a •
valid LR(0) items: A → a•A, A → a•, B → a•, B → a•b, A → •aA, A → •a
possibilities: shift (3), reduce (4), reduce (5), shift (6)
S/R, R/R conflicts!

[Figure: partial parse trees for the four possibilities]
Lookahead
S → A (1) | Bc (2)
A → aA (3) | a (4)
B → a (5) | ab (6)

input: a •   peek inside!
valid LR(0) items: A → a•A, A → a•, B → a•, B → a•b, A → •aA, A → •a

[Figure: partial parse trees for the four possibilities]
Lookahead
S → A (1) | Bc (2)
A → aA (3) | a (4)
B → a (5) | ab (6)

input: a • a   peek inside!
valid LR(0) items: A → a•A, A → a•, B → a•, B → a•b, A → •aA, A → •a
parse tree must look like this: [Figure: partial parse tree rooted at S]
action: shift
Lookahead
S → A (1) | Bc (2)
A → aA (3) | a (4)
B → a (5) | ab (6)

input: a a • a   peek inside!
valid LR(0) items: A → a•A, A → a•, A → •aA, A → •a
parse tree must look like this: [Figure: partial parse tree rooted at S]
action: shift
Lookahead
S → A (1) | Bc (2)
A → aA (3) | a (4)
B → a (5) | ab (6)

input: a a a •
valid LR(0) items: A → a•A, A → a•, A → •aA, A → •a
parse tree must look like this: [Figure: parse tree rooted at S]
action: reduce
LR(0) items vs. LR(1) items
Grammar: A → aAb | ab

LR(0) item: A → a•Ab
LR(1) item: [A → a•Ab, b]

[Figure: the same partial parse tree drawn for each item, with the focus • marked in the input]
LR(1) items
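• LR(1) items are of the form [A → α•β, x] or [A → α•β, ε] to represent this state in the parsing:
  – [A → α•β, x]: the focus • is inside the subtree of A, between α and β, and x is the input symbol that follows the subtree
  – [A → α•β, ε]: the same, except that the input ends right after the subtree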
Outline of LR(1) parsing algorithm
• Step 1: Build εNFA that describes valid item updates
• Step 2: Convert εNFA to DFA
  – As in LR(0), DFA will have shift and reduce states
• Step 3: Run DFA on input, using stack to remember sequence of states
  – Use lookahead to eliminate wrong reduce items
Recall εNFA transitions for LR(0)
• States of εNFA will be items (plus a start state q0)
• For every item S → •α we have a transition
    q0 --ε--> S → •α
• For every item A → α•Xβ we have a transition
    A → α•Xβ --X--> A → αX•β
• For every item A → α•Cβ and production C → δ we have a transition
    A → α•Cβ --ε--> C → •δ
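The three transition rules can be turned into code by following ε-moves eagerly, which yields the DFA states directly. This is a minimal Python sketch, assuming single-character grammar symbols and using the grammar S → A | Bc, A → aA | a, B → a | ab from the earlier slides; GRAMMAR, closure and goto are hypothetical names.

```python
# Nonterminals map to their right-hand sides; every symbol is one character.
GRAMMAR = {"S": ["A", "Bc"], "A": ["aA", "a"], "B": ["a", "ab"]}


def closure(items):
    """Follow all ε-transitions  A → α•Cβ --ε--> C → •δ  from the given items."""
    items = set(items)
    todo = list(items)
    while todo:
        lhs, rhs, dot = todo.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:      # dot before a nonterminal C
            for delta in GRAMMAR[rhs[dot]]:
                new = (rhs[dot], delta, 0)              # the item C → •δ
                if new not in items:
                    items.add(new)
                    todo.append(new)
    return frozenset(items)


def goto(items, symbol):
    """Follow  A → α•Xβ --X--> A → αX•β  for X = symbol, then take ε-moves."""
    moved = {(l, r, d + 1) for (l, r, d) in items if d < len(r) and r[d] == symbol}
    return closure(moved)


# q0 --ε--> S → •α for every production of S, followed by the remaining ε-moves.
start = closure({("S", rhs, 0) for rhs in GRAMMAR["S"]})
after_a = goto(start, "a")
for lhs, rhs, dot in sorted(after_a):
    print(f"{lhs} → {rhs[:dot]}•{rhs[dot:]}")
```

The set after_a contains the six valid items listed for the input a on the slide "A grammar that is not LR(0)".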
εNFA transitions for LR(1)
• For every item [S → •α, ε] we have a transition
    q0 --ε--> [S → •α, ε]
• For every item [A → α•Xβ, x] we have a transition
    [A → α•Xβ, x] --X--> [A → αX•β, x]
• For every item [A → α•Cβ, x] and production C → δ we have a transition
    [A → α•Cβ, x] --ε--> [C → •δ, y]
  for every y in FIRST(βx)
FIRST sets
FIRST(α) is the set of terminals that occur as the leftmost symbol of some string derived from α
• Example
S → A (1) | cB (2)
A → aA (3) | a (4)
B → a (5) | ab (6)

FIRST(a) = {a}
FIRST(A) = {a}
FIRST(S) = {a, c}
FIRST(bAc) = {b}
FIRST(BA) = {a}
FIRST(ε) = ∅
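The FIRST sets in this example can be computed by a fixed-point iteration. The Python sketch below is one possible way to do it, assuming single-character symbols and a grammar without ε-productions (true for the example grammar); first_sets and first_of are hypothetical names.

```python
# The example grammar: S → A | cB, A → aA | a, B → a | ab.
GRAMMAR = {"S": ["A", "cB"], "A": ["aA", "a"], "B": ["a", "ab"]}


def first_sets(grammar):
    """FIRST of every nonterminal; assumes no production has an empty right-hand side."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:                       # repeat until no set grows any more
        changed = False
        for nt, productions in grammar.items():
            for rhs in productions:
                head = rhs[0]
                new = first[head] if head in grammar else {head}
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def first_of(string, first):
    """FIRST of a string of symbols; the empty string gives the empty set."""
    if not string:
        return set()
    head = string[0]
    return set(first[head]) if head in first else {head}


FIRST = first_sets(GRAMMAR)
for s in ["a", "A", "S", "bAc", "BA", ""]:
    print(f"FIRST({s or 'ε'}) =", sorted(first_of(s, FIRST)))
```

With ε-productions the computation would also have to look past nullable symbols, which this sketch deliberately leaves out.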
Explaining the transitions
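[A → α•Xβ, x] --X--> [A → αX•β, x]
The focus moves past the subtree rooted at X. The symbol x that follows the subtree of A stays the same.

[A → α•Cβ, x] --ε--> [C → •δ, y], where y ∈ FIRST(βx)
The focus moves into the subtree rooted at C. The symbol y that follows this subtree is the first symbol derived from β, or x if nothing is left after C.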
Example
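S → A (1) | Bc (2)
A → aA (3) | a (4)
B → a (5) | ab (6)

q0 --ε--> [S → •A, ε]
q0 --ε--> [S → •Bc, ε]
[S → •A, ε] --A--> [S → A•, ε]
[S → •A, ε] --ε--> [A → •aA, ε]
[S → •A, ε] --ε--> [A → •a, ε]
[S → •Bc, ε] --B--> [S → B•c, ε]
[S → •Bc, ε] --ε--> [B → •a, c]
[S → •Bc, ε] --ε--> [B → •ab, c]
. . .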
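The items reachable from q0 by ε-moves in this example can be computed mechanically. The following Python sketch makes several assumptions that are not in the slides: symbols are single characters, the grammar has no ε-productions, the ε lookahead is written as the empty string, and first, lookaheads and closure are hypothetical names.

```python
GRAMMAR = {"S": ["A", "Bc"], "A": ["aA", "a"], "B": ["a", "ab"]}
END = ""   # stands for the lookahead ε (end of input)


def first(symbols, seen=()):
    """FIRST of a string of symbols, assuming there are no ε-productions."""
    if not symbols:
        return set()
    head = symbols[0]
    if head not in GRAMMAR:
        return {head}                       # a terminal
    if head in seen:
        return set()                        # already being expanded on this path
    out = set()
    for rhs in GRAMMAR[head]:
        out |= first(rhs, seen + (head,))
    return out


def lookaheads(beta, x):
    """FIRST(βx): FIRST(β) when β is non-empty, otherwise just the lookahead x."""
    return first(beta) if beta else {x}


def closure(items):
    """Follow  [A → α•Cβ, x] --ε--> [C → •δ, y]  for every y in FIRST(βx)."""
    items = set(items)
    todo = list(items)
    while todo:
        lhs, rhs, dot, x = todo.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            c, beta = rhs[dot], rhs[dot + 1:]
            for y in lookaheads(beta, x):
                for delta in GRAMMAR[c]:
                    new = (c, delta, 0, y)
                    if new not in items:
                        items.add(new)
                        todo.append(new)
    return frozenset(items)


start = closure({("S", rhs, 0, END) for rhs in GRAMMAR["S"]})
for lhs, rhs, dot, x in sorted(start):
    print(f"[{lhs} → {rhs[:dot]}•{rhs[dot:]}, {x or 'ε'}]")
```

The items for B receive the lookahead c because c follows B in S → Bc, while the items for A inherit ε from [S → •A, ε].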
Convert NFA to DFA
• Each DFA state is a subset of LR(1) items, e.g.
    [A → a•A, ε] [A → a•, ε] [B → a•, c] [B → a•b, c] [A → •aA, ε] [A → •a, ε]
• States can contain S/R, R/R conflicts
• But for an LR(1) grammar, lookahead can always resolve such conflicts
Example
S → A (1) | Bc (2)
A → aA (3) | a (4)
B → a (5) | ab (6)

stack   input   action   valid items
ε       abc     S        [S → •A, ε] [S → •Bc, ε] [A → •aA, ε] [A → •a, ε] [B → •a, c] [B → •ab, c]
a       bc      S        [A → a•A, ε] [A → a•, ε] [B → a•, c] [B → a•b, c] [A → •aA, ε] [A → •a, ε]
ab      c       R        [B → ab•, c]
B       c       S        [S → B•c, ε]
Bc      ε       R        [S → Bc•, ε]
S       ε

look ahead! After reading a, the next input symbol is b, which rules out the complete items [A → a•, ε] and [B → a•, c], so the parser shifts.
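The whole example can be replayed by a small parser that builds the item sets on the fly and uses one symbol of lookahead to choose between shift and reduce. This is a sketch under stated assumptions, not the lecture's algorithm verbatim: symbols are single characters, the grammar has no ε-productions, the ε lookahead is the empty string, item sets are computed on demand instead of being tabulated as a DFA, and all function names are hypothetical.

```python
GRAMMAR = {"S": ["A", "Bc"], "A": ["aA", "a"], "B": ["a", "ab"]}
START, END = "S", ""          # END stands for the lookahead ε


def first(symbols, seen=()):
    """FIRST of a non-empty string of symbols, assuming no ε-productions."""
    head = symbols[0]
    if head not in GRAMMAR:
        return {head}
    if head in seen:
        return set()
    return set().union(*(first(rhs, seen + (head,)) for rhs in GRAMMAR[head]))


def closure(items):
    items, todo = set(items), list(items)
    while todo:
        lhs, rhs, dot, x = todo.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            beta = rhs[dot + 1:]
            for y in (first(beta) if beta else {x}):      # y in FIRST(βx)
                for delta in GRAMMAR[rhs[dot]]:
                    new = (rhs[dot], delta, 0, y)
                    if new not in items:
                        items.add(new)
                        todo.append(new)
    return frozenset(items)


def goto(items, symbol):
    return closure({(l, r, d + 1, x) for (l, r, d, x) in items
                    if d < len(r) and r[d] == symbol})


def parse(word):
    """Return the list of actions ('S' or 'R') if word is accepted, else None."""
    states = [closure({(START, rhs, 0, END) for rhs in GRAMMAR[START]})]
    pos, actions = 0, []
    while True:
        la = word[pos] if pos < len(word) else END        # the lookahead symbol
        items = states[-1]
        # Complete items survive only if their lookahead matches the next symbol.
        reduces = [(l, r) for (l, r, d, x) in items if d == len(r) and x == la]
        can_shift = la not in GRAMMAR and any(
            d < len(r) and r[d] == la for (l, r, d, x) in items)
        if len(reduces) + can_shift != 1:
            return None       # no action, or a conflict one lookahead cannot resolve
        if can_shift:
            states.append(goto(items, la))
            pos += 1
            actions.append("S")
        else:
            lhs, rhs = reduces[0]
            del states[len(states) - len(rhs):]           # pop one state per symbol
            actions.append("R")
            if lhs == START and len(states) == 1 and la == END:
                return actions
            states.append(goto(states[-1], lhs))


print(parse("abc"))
```

For the input abc this prints the action sequence S, S, R, S, R from the table above.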
LR(k) grammars
• A context-free grammar is LR(1) if all S/R, R/R conflicts can be resolved with one lookahead symbol
• More generally, LR(k) grammars can resolve all conflicts with k lookahead symbols
  – Items have the form [A → α•β, x1...xk]
• LR(1) grammars describe the syntax of most programming languages