CSC 3130: Automata theory and formal languages
Parsers for programming languages
MELJUN P. CORTES, MBA, MPA, BSCS, ACS
CFG of the Java programming language
Identifier:
    IDENTIFIER
QualifiedIdentifier:
    Identifier { . Identifier }
Literal:
    IntegerLiteral
    FloatingPointLiteral
    CharacterLiteral
    StringLiteral
    BooleanLiteral
    NullLiteral
Expression:
    Expression1 [AssignmentOperator Expression1]
AssignmentOperator:
    = += -= *= /= &= |=
from http://java.sun.com/docs/books/jls/second_edition/html/syntax.doc.html#52996
…
Parsing Java programs
class Point2d {
    /* The X and Y coordinates of the point--instance variables */
    private double x;
    private double y;
    private boolean debug; // A trick to help with debugging

    public Point2d (double px, double py) { // Constructor
        x = px;
        y = py;
        debug = false; // turn off debugging
    }

    public Point2d () { // Default constructor
        this (0.0, 0.0); // Invokes 2 parameter Point2d constructor
    }
    // Note that a this() invocation must be the BEGINNING of
    // statement body of constructor

    public Point2d (Point2d pt) { // Another constructor
        x = pt.getX();
        y = pt.getY();
    }
}
…
Simple Java program: about 1000 symbols
Parsing algorithms
• How long would it take to parse this?
    exhaustive algorithm: about 10^80 years (longer than the life of the universe)
    CYK algorithm: about 1 week!
• Can we parse faster?
• No! CYK is essentially the fastest known practical general-purpose parsing algorithm
Another way of thinking
Scientist: Find an algorithm that can parse strings in any grammar
Engineer: Design your grammar so it has a very fast parsing algorithm
An example
S → Tc (1)
T → TA (2) | A (3)
A → aTb (4) | ab (5)
input: abaabbc

Stack   Input     Action
ε       abaabbc   shift
a       baabbc    shift
ab      aabbc     reduce (5)
A       aabbc     reduce (3)
T       aabbc     shift
Ta      abbc      shift
Taa     bbc       shift
Taab    bc        reduce (5)
TaA     bc        reduce (3)
TaT     bc        shift
TaTb    c         reduce (4)
TA      c         reduce (2)
T       c         shift
Tc      ε         reduce (1)
S       ε

[Figure: parse tree of abaabbc, built bottom-up by the reductions in the table]
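The trace above can be replayed mechanically. The following is a minimal sketch in Python, assuming the sequence of actions is supplied by hand (choosing the actions automatically is what the rest of the lecture develops). The names `PRODUCTIONS`, `ACTIONS` and `replay` are illustrative and do not come from the slides.

```python
# Productions, numbered as on the slide: number -> (left side, right side).
PRODUCTIONS = {
    1: ("S", "Tc"),
    2: ("T", "TA"),
    3: ("T", "A"),
    4: ("A", "aTb"),
    5: ("A", "ab"),
}

# Action sequence from the trace: "s" = shift, an integer = reduce by that production.
ACTIONS = ["s", "s", 5, 3, "s", "s", "s", 5, 3, "s", 4, 2, "s", 1]


def replay(word, actions):
    """Apply the given actions; return the (stack, remaining input) rows."""
    stack, rest = "", word
    rows = [(stack, rest)]
    for action in actions:
        if action == "s":
            # Shift: move the next input symbol onto the stack.
            if not rest:
                raise ValueError("shift with empty input")
            stack, rest = stack + rest[0], rest[1:]
        else:
            # Reduce: the right side must sit on top of the stack.
            left, right = PRODUCTIONS[action]
            if not stack.endswith(right):
                raise ValueError(f"cannot reduce by ({action}) on stack {stack!r}")
            stack = stack[: len(stack) - len(right)] + left
        rows.append((stack, rest))
    return rows


rows = replay("abaabbc", ACTIONS)
for stack, rest in rows:
    print(f"{stack or 'ε':<6}{rest or 'ε'}")
```

The replay only checks that each reduce is applicable; it does not decide between shift and reduce on its own.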
Items
S → •Tc     S → T•c     S → Tc•
T → •TA     T → T•A     T → TA•
T → •A      T → A•
A → •aTb    A → a•Tb    A → aT•b    A → aTb•
A → •ab     A → a•b     A → ab•
S → Tc (1)    T → TA (2)    T → A (3)    A → aTb (4)    A → ab (5)

Stack   Input     Action
ε       abaabbc   shift
a       baabbc    shift
ab      aabbc     reduce (5)
A       aabbc     reduce (3)
T       aabbc     shift
Ta      abbc      shift

Idea of parsing algorithm: Try to match complete items to top of stack
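The items listed above can be generated mechanically: an item is a production with a dot placed at some position of its right side, and it is complete when the dot is at the end. A minimal sketch in Python follows; the helper names `items` and `is_complete` and the string encoding of items are assumptions made for illustration.

```python
# Grammar from the slides as (left side, right side) pairs.
GRAMMAR = [("S", "Tc"), ("T", "TA"), ("T", "A"), ("A", "aTb"), ("A", "ab")]


def items(grammar):
    """Return every item: each production with a dot at each position."""
    result = []
    for left, right in grammar:
        for dot in range(len(right) + 1):
            result.append(f"{left} → {right[:dot]}•{right[dot:]}")
    return result


def is_complete(item):
    """An item is complete when the dot is at the very end."""
    return item.endswith("•")


all_items = items(GRAMMAR)
complete_items = [item for item in all_items if is_complete(item)]
print(len(all_items), "items,", len(complete_items), "of them complete")
```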
Some terminology
S → Tc (1)
T → TA (2) | A (3)
A → aTb (4) | ab (5)
input: abaabbc

Stack   Input     Action
ε       abaabbc   shift
a       baabbc    shift
ab      aabbc     reduce (5)
A       aabbc     reduce (3)
T       aabbc     shift
Ta      abbc      shift
Taa     bbc       shift
Taab    bc        reduce (5)
TaA     bc        reduce (3)
TaT     bc        shift
TaTb    c         reduce (4)
TA      c         reduce (2)
T       c         shift
Tc      ε         reduce (1)
S       ε

handle: the symbols on top of the stack that a reduce replaces
valid items: a•Tb, a•b
valid items: T•a, T•c, aT•b
Outline of LR(0) parsing algorithm
• As the string is being read, it is pushed on a stack
• Algorithm keeps track of all valid items
• Algorithm can perform two actions:
    shift: no complete item is valid
    reduce: there is one valid item, and it is complete
Running the algorithm
A → aAb | ab        A ⇒ aAb ⇒ aabb

Stack   Input   Action   Valid items
ε       aabb    S        A → •aAb, A → •ab
a       abb     S        A → a•Ab, A → a•b, A → •aAb, A → •ab
aa      bb      S        A → a•Ab, A → a•b, A → •aAb, A → •ab
aab     b       R        A → ab•
aA      b       S        A → aA•b
aAb     ε       R        A → aAb•
A       ε
How to update viable items
• Initial set of valid items:
    S → •α for every production S → α
• Updating valid items on “shift b”:
    A → α•bβ is updated to A → αb•β
    A → α•Xβ disappears if X ≠ b
  – After these updates, for every valid item A → α•Cβ and production C → δ, we also add C → •δ as a valid item

notation
    a, b: terminals
    A, B: variables
    X, Y: mixed symbols
    α, β: mixed strings
How to update viable items
• Updating valid items on “reduce β to B”
  – First, we backtrack to the valid items we had before the symbols of β were pushed
  – Then, we apply the same rules as for “shift B” (as if B were a terminal):
      A → α•Bβ is updated to A → αB•β
      A → α•Xβ disappears if X ≠ B
      C → •δ is added for every valid item A → α•Cβ and production C → δ
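The update rules for shift and reduce can be sketched in a few lines of Python. This is an illustration under assumptions, not the lecture's own code: items are encoded as `(left, right, dot)` triples, and the names `close`, `initial`, `update` and `show` are made up here. The grammar is the small example A → aAb | ab used in the slides.

```python
GRAMMAR = [("A", "aAb"), ("A", "ab")]
START = "A"


def close(valid):
    """Add C → •δ for every valid item A → α•Cβ and production C → δ."""
    valid = set(valid)
    changed = True
    while changed:
        changed = False
        for left, right, dot in list(valid):
            if dot < len(right):
                for c_left, c_right in GRAMMAR:
                    new_item = (c_left, c_right, 0)
                    if c_left == right[dot] and new_item not in valid:
                        valid.add(new_item)
                        changed = True
    return valid


def initial():
    """Initial valid items: S → •α for every production of the start variable."""
    return close({(l, r, 0) for l, r in GRAMMAR if l == START})


def update(valid, symbol):
    """Items with the dot before `symbol` advance; all other items disappear."""
    moved = {(l, r, d + 1) for l, r, d in valid if d < len(r) and r[d] == symbol}
    return close(moved)


def show(valid):
    return sorted(f"{l} → {r[:d]}•{r[d:]}" for l, r, d in valid)


v0 = initial()
v1 = update(v0, "a")   # shift a
v2 = update(v1, "a")   # shift a
v3 = update(v2, "b")   # shift b: only A → ab• remains
v4 = update(v1, "A")   # reduce ab to A: backtrack to v1, then treat A like a shifted symbol
for v in (v0, v1, v2, v3, v4):
    print(show(v))
```

The same `update` function serves both cases, which mirrors the remark that a reduce is handled as if B were a terminal.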
Viable item updates by εNFA
• States of εNFA will be items (plus a start state q0)
• For every item S → •α we have a transition
    q0 --ε--> S → •α
• For every item A → α•Xβ we have a transition
    A → α•Xβ --X--> A → αX•β
• For every item A → α•Cβ and production C → δ we have a transition
    A → α•Cβ --ε--> C → •δ
Example
A → aAb | ab
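εNFA transitions:
    q0 --ε--> A → •aAb
    q0 --ε--> A → •ab
    A → •aAb --a--> A → a•Ab
    A → a•Ab --A--> A → aA•b
    A → aA•b --b--> A → aAb•
    A → •ab --a--> A → a•b
    A → a•b --b--> A → ab•
    A → a•Ab --ε--> A → •aAb
    A → a•Ab --ε--> A → •ab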
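The three transition rules can be applied mechanically to any grammar. The sketch below builds the transition set of the εNFA for the example grammar; the function name `build_enfa`, the helper `fmt` and the encoding of states as strings are assumptions chosen for illustration.

```python
GRAMMAR = [("A", "aAb"), ("A", "ab")]
START = "A"
EPS = "ε"


def fmt(left, right, dot):
    """Render an item as a string, e.g. A → a•Ab."""
    return f"{left} → {right[:dot]}•{right[dot:]}"


def build_enfa(grammar, start):
    """Return the set of transitions (source state, label, target state)."""
    variables = {left for left, _ in grammar}
    transitions = set()
    for left, right in grammar:
        if left == start:
            # Rule 1: q0 --ε--> S → •α
            transitions.add(("q0", EPS, fmt(left, right, 0)))
        for dot in range(len(right)):
            symbol = right[dot]
            source = fmt(left, right, dot)
            # Rule 2: A → α•Xβ --X--> A → αX•β
            transitions.add((source, symbol, fmt(left, right, dot + 1)))
            if symbol in variables:
                # Rule 3: A → α•Cβ --ε--> C → •δ for every production C → δ
                for c_left, c_right in grammar:
                    if c_left == symbol:
                        transitions.add((source, EPS, fmt(c_left, c_right, 0)))
    return transitions


transitions = build_enfa(GRAMMAR, START)
for source, label, target in sorted(transitions):
    print(f"{source} --{label}--> {target}")
```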
Convert εNFA to DFA
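State 1: A → •aAb, A → •ab
State 2: A → a•Ab, A → a•b, A → •aAb, A → •ab
State 3: A → ab•
State 4: A → aA•b
State 5: A → aAb•

Transitions: 1 --a--> 2, 2 --a--> 2, 2 --b--> 3, 2 --A--> 4, 4 --b--> 5
All other transitions lead to a dead state (die)

states correspond to sets of valid items
transitions are labeled by variables / terminals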
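The DFA can be computed directly on sets of items, which gives the same result as the subset construction on the εNFA except that q0 is left out of the start state. The sketch below is written under that assumption; `close`, `step` and `build_dfa` are illustrative names, and terminals are explored before variables only so that the state numbers come out as on the slide.

```python
from collections import deque

GRAMMAR = [("A", "aAb"), ("A", "ab")]
START = "A"


def close(items):
    """Follow ε-transitions: add C → •δ whenever the dot stands before C."""
    items = set(items)
    work = list(items)
    while work:
        left, right, dot = work.pop()
        if dot < len(right):
            for c_left, c_right in GRAMMAR:
                new_item = (c_left, c_right, 0)
                if c_left == right[dot] and new_item not in items:
                    items.add(new_item)
                    work.append(new_item)
    return frozenset(items)


def step(state, symbol):
    """DFA transition: advance the dot over `symbol`, then close."""
    return close({(l, r, d + 1) for l, r, d in state if d < len(r) and r[d] == symbol})


def build_dfa():
    start = close({(l, r, 0) for l, r in GRAMMAR if l == START})
    number = {start: 1}          # item set -> state number
    edges = {}                   # (state number, symbol) -> state number
    queue = deque([start])
    while queue:
        state = queue.popleft()
        symbols = {r[d] for _, r, d in state if d < len(r)}
        # Terminals (lowercase) first, then variables, to match the slide's numbering.
        for symbol in sorted(symbols, key=lambda s: (s.isupper(), s)):
            target = step(state, symbol)
            if target not in number:
                number[target] = len(number) + 1
                queue.append(target)
            edges[(number[state], symbol)] = number[target]
    return number, edges


number, edges = build_dfa()
print(edges)
```

Missing entries in `edges` play the role of the dead state.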
Attempt at parsing with DFA
A → aAb | ab        A ⇒ aAb ⇒ aabb

Stack   Input   Action   DFA state   Valid items
ε       aabb    S        1           A → •aAb, A → •ab
a       abb     S        2           A → a•Ab, A → a•b, A → •aAb, A → •ab
aa      bb      S        2           A → a•Ab, A → a•b, A → •aAb, A → •ab
aab     b       R        3           A → ab•
aA      b                ?           A → aA•b
Remember the state in stack!
A → aAb | ab        A ⇒ aAb ⇒ aabb

Stack      Input   Action   DFA state   Valid items
1          aabb    S        1           A → •aAb, A → •ab
1a2        abb     S        2           A → a•Ab, A → a•b, A → •aAb, A → •ab
1a2a2      bb      S        2           A → a•Ab, A → a•b, A → •aAb, A → •ab
1a2a2b3    b       R        3           A → ab•
1a2A4      b       S        4           A → aA•b
1a2A4b5    ε       R        5           A → aAb•
1A         ε
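Keeping the DFA states on the stack makes the whole procedure easy to code. The sketch below hard-codes the five-state DFA for A → aAb | ab and parses by shift and reduce. The tables `GOTO` and `REDUCE`, the function `parse`, and the acceptance test (stack reduced to state 1 with the start variable and no input left) are assumptions made for this illustration; the slides stop at the stack 1A.

```python
# DFA for A → aAb | ab; a missing entry means the DFA dies.
GOTO = {(1, "a"): 2, (2, "a"): 2, (2, "b"): 3, (2, "A"): 4, (4, "b"): 5}
# States whose only valid item is complete: state -> (left side, right side).
REDUCE = {3: ("A", "ab"), 5: ("A", "aAb")}
START = "A"


def parse(word):
    """Return the list of actions if `word` is accepted, otherwise None."""
    stack = [1]              # alternating states and symbols: 1 a 2 a 2 ...
    rest = list(word)
    actions = []
    while True:
        state = stack[-1]
        if state in REDUCE:
            left, right = REDUCE[state]
            # Pop the right side together with the state stored after each symbol.
            del stack[len(stack) - 2 * len(right):]
            if stack == [1] and left == START and not rest:
                return actions + ["R"]   # everything reduced to the start variable
            target = GOTO.get((stack[-1], left))
            if target is None:
                return None
            stack += [left, target]
            actions.append("R")
        else:
            if not rest:
                return None              # input exhausted without a final reduce
            symbol = rest.pop(0)
            target = GOTO.get((state, symbol))
            if target is None:
                return None              # the DFA dies: reject
            stack += [symbol, target]
            actions.append("S")


print(parse("aabb"))
print(parse("aab"))
```

After a reduce, the state below the popped symbols is already on the stack, so no rescanning of the stack is needed.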
LR(0) grammars and deterministic PDAs
• The parsing procedure can be implemented by a deterministic pushdown automaton
• A PDA is deterministic if in every state there is at most one possible transition for every input symbol and pop symbol, including ε
• Example: PDA for w#w^R is deterministic, but PDA for ww^R is not
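To see why w#w^R is easy for a deterministic machine, the sketch below simulates a two-state PDA in Python: it pushes until it reads #, then pops and compares. The alphabet {a, b}, the state names and the function `accepts` are assumptions for illustration. For ww^R no such marker exists, so the machine would have to guess where the middle is.

```python
def accepts(word):
    """Simulate a deterministic PDA for { w#w^R : w in {a, b}* }."""
    stack = []
    state = "push"
    for symbol in word:
        if state == "push":
            if symbol == "#":
                state = "pop"            # the marker shows where the middle is
            elif symbol in ("a", "b"):
                stack.append(symbol)     # remember w
            else:
                return False             # symbol outside the assumed alphabet
        else:
            # Pop phase: each input symbol must match the top of the stack.
            if not stack or stack.pop() != symbol:
                return False
    return state == "pop" and not stack


print(accepts("ab#ba"), accepts("ab#ab"))
```

At every step exactly one move is possible, which is what makes the machine deterministic.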
LR(0) grammars and deterministic PDAs
• Not every PDA can be made deterministic
• Since PDAs are equivalent to CFGs (they recognize exactly the CFLs), the LR(0) parsing algorithm must fail for some CFLs!
• When does LR(0) parsing algorithm fail?
Outline of LR(0) parsing algorithm
• Algorithm can perform two actions:
    shift (S): no complete item is valid
    reduce (R): there is one valid item, and it is complete
• What if:
    some valid items complete, some not: S / R conflict
    more than one valid complete item: R / R conflict
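The four cases can be told apart by looking at a single set of valid items. The sketch below assumes items are `(left, right, dot)` triples and uses a made-up function `classify`. The two conflicting item sets at the end are hypothetical examples and are not taken from the slides. When a set has both kinds of conflict, this sketch reports the S / R conflict first.

```python
def classify(valid_items):
    """Decide what the LR(0) algorithm can do with this set of valid items."""
    complete = [item for item in valid_items if item[2] == len(item[1])]
    incomplete = [item for item in valid_items if item[2] < len(item[1])]
    if complete and incomplete:
        return "S / R conflict"      # some valid items complete, some not
    if len(complete) > 1:
        return "R / R conflict"      # more than one valid complete item
    if complete:
        return "reduce"              # one valid item, and it is complete
    return "shift"                   # no complete item is valid


# States 2 and 3 of the DFA for A → aAb | ab.
state2 = {("A", "aAb", 1), ("A", "ab", 1), ("A", "aAb", 0), ("A", "ab", 0)}
state3 = {("A", "ab", 2)}
# Hypothetical item sets that are not LR(0).
shift_reduce = {("E", "T", 1), ("T", "T*F", 1)}     # E → T•  and  T → T•*F
reduce_reduce = {("A", "a", 1), ("B", "a", 1)}      # A → a•  and  B → a•

for items in (state2, state3, shift_reduce, reduce_reduce):
    print(classify(items))
```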