meljun cortes automata12

CSC 3130: Automata theory and formal languages
Parsers for programming languages
MELJUN P. CORTES, MBA, MPA, BSCS, ACS

Upload: meljun-cortes

Post on 15-Jul-2015



CFG of the Java programming language

Identifier:
    IDENTIFIER

QualifiedIdentifier:
    Identifier { . Identifier }

Literal:
    IntegerLiteral
    FloatingPointLiteral
    CharacterLiteral
    StringLiteral
    BooleanLiteral
    NullLiteral

Expression:
    Expression1 [AssignmentOperator Expression1]

AssignmentOperator:
    = += -= *= /= &= |=

from http://java.sun.com/docs/books/jls/second_edition/html/syntax.doc.html#52996

Parsing Java programs

    class Point2d {
        /* The X and Y coordinates of the point--instance variables */
        private double x;
        private double y;
        private boolean debug; // A trick to help with debugging

        public Point2d (double px, double py) { // Constructor
            x = px;
            y = py;

            debug = false; // turn off debugging
        }

        public Point2d () { // Default constructor
            this (0.0, 0.0); // Invokes the two-parameter Point2d constructor
        } // Note that a this() invocation must be the first statement
          // in the body of a constructor

        public Point2d (Point2d pt) { // Another constructor
            x = pt.getX();
            y = pt.getY();
        }

        // Accessors used by the copy constructor above
        public double getX() { return x; }
        public double getY() { return y; }
    }

A simple Java program: about 1000 symbols

Parsing algorithms

• How long would it take to parse this?
   exhaustive algorithm: about 10^80 years (longer than the life of the universe)
   CYK algorithm: about 1 week!
• Can we parse faster?
• No! CYK is the fastest known general-purpose parsing algorithm

Another way of thinking

Scientist: Find an algorithm that can parse strings in any grammar

Engineer: Design your grammar so it has a very fast parsing algorithm

An example

S → Tc(1)

T → TA(2) | A(3)

A → aTb(4) | ab(5)

input: abaabbc

Stack     Input      Action
ε         abaabbc    shift
a         baabbc     shift
ab        aabbc      reduce (5)
A         aabbc      reduce (3)
T         aabbc      shift
Ta        abbc       shift
Taa       bbc        shift
Taab      bc         reduce (5)
TaA       bc         reduce (3)
TaT       bc         shift
TaTb      c          reduce (4)
TA        c          reduce (2)
T         c          shift
Tc        ε          reduce (1)
S         ε

[Figure: parse tree of abaabbc, with root S and leaves a b a a b b c]
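The trace can be replayed mechanically. Below is a minimal Java sketch that uses a hypothetical class ShiftReduceReplay. It is handed the action column ("s" for shift, a production number for reduce) rather than choosing the actions itself. It checks that every reduce finds its handle on top of the stack, and it records one "stack | remaining input" row per action.

```java
import java.util.ArrayList;
import java.util.List;

final class ShiftReduceReplay {
    // Productions numbered as on the slide (index 0 unused):
    // S -> Tc (1), T -> TA (2), T -> A (3), A -> aTb (4), A -> ab (5)
    static final char[] LHS = {' ', 'S', 'T', 'T', 'A', 'A'};
    static final String[] RHS = {"", "Tc", "TA", "A", "aTb", "ab"};

    /** Replays actions ("s" = shift, "1".."5" = reduce by that production) and
     *  returns "stack | remaining input" after each action. */
    static List<String> replay(String input, String[] actions) {
        StringBuilder stack = new StringBuilder();
        int pos = 0;
        List<String> trace = new ArrayList<>();
        for (String act : actions) {
            if (act.equals("s")) {
                stack.append(input.charAt(pos++));          // shift the next input symbol
            } else {
                int p = Integer.parseInt(act);
                String handle = RHS[p];
                int top = stack.length() - handle.length();
                // The handle (right-hand side of production p) must sit on top of the stack.
                if (top < 0 || !stack.substring(top).equals(handle)) {
                    throw new IllegalStateException("no handle for (" + p + ") on stack " + stack);
                }
                stack.setLength(top);                        // pop the handle ...
                stack.append(LHS[p]);                        // ... and push its variable
            }
            trace.add(stack + " | " + input.substring(pos));
        }
        return trace;
    }

    public static void main(String[] args) {
        String[] actions = {"s", "s", "5", "3", "s", "s", "s", "5", "3", "s", "4", "2", "s", "1"};
        for (String row : replay("abaabbc", actions)) {
            System.out.println(row);
        }
    }
}
```

The last row is "S | ": the whole input has been reduced to the start symbol.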

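Items

An item is a production with a dot (•) marking how much of its right-hand side has been matched so far; an item is complete when the dot is at the end.

S → •Tc     S → T•c     S → Tc•
T → •TA     T → T•A     T → TA•
T → •A      T → A•
A → •aTb    A → a•Tb    A → aT•b    A → aTb•
A → •ab     A → a•b     A → ab•

S → Tc (1)   T → A (3)   T → TA (2)   A → aTb (4)   A → ab (5)

Stack     Input      Action
ε         abaabbc    shift
a         baabbc     shift
ab        aabbc      reduce (5)
A         aabbc      reduce (3)
T         aabbc      shift
Ta        abbc       shift

Idea of the parsing algorithm: try to match complete items to the top of the stack.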
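The item list can be generated mechanically, and the matching idea can be tried literally. The sketch below uses a hypothetical class ItemList, with '.' standing for the dot •. It lists all 15 items of the grammar and reports which complete items match the top of a given stack. On stack TA it finds both T → TA• and T → A•, while the trace reduces by (2). Matching the top of the stack alone is therefore not enough, and that gap is what the valid items on the following slides resolve.

```java
import java.util.ArrayList;
import java.util.List;

final class ItemList {
    // Example grammar, numbered as on the slide:
    // S -> Tc (1), T -> TA (2), T -> A (3), A -> aTb (4), A -> ab (5)
    static final char[] LHS = {'S', 'T', 'T', 'A', 'A'};
    static final String[] RHS = {"Tc", "TA", "A", "aTb", "ab"};

    /** Every item X -> alpha.beta of the grammar, with '.' standing for the dot. */
    static List<String> allItems() {
        List<String> items = new ArrayList<>();
        for (int p = 0; p < RHS.length; p++) {
            for (int dot = 0; dot <= RHS[p].length(); dot++) {
                items.add(LHS[p] + " -> " + RHS[p].substring(0, dot) + "." + RHS[p].substring(dot));
            }
        }
        return items;
    }

    /** Numbers of the productions whose complete item matches the top of the stack. */
    static List<Integer> completeItemsOnTop(String stack) {
        List<Integer> matches = new ArrayList<>();
        for (int p = 0; p < RHS.length; p++) {
            if (stack.endsWith(RHS[p])) matches.add(p + 1);
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(allItems().size() + " items: " + allItems());
        System.out.println(completeItemsOnTop("ab"));  // [5]
        System.out.println(completeItemsOnTop("TA"));  // [2, 3]: top-of-stack matching alone is ambiguous
    }
}
```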

Some terminology

Same grammar, input and trace as in the example above: S → Tc (1), T → TA (2) | A (3), A → aTb (4) | ab (5), input abaabbc.

handle: the right-hand side on top of the stack that a reduce step replaces (for example, ab on stack ab, replaced by A using (5))

valid items: a•Tb, a•b

valid items: T•a, T•c, aT•b

Outline of LR(0) parsing algorithm

• As the string is being read, it is pushed on a stack
• The algorithm keeps track of all valid items
• The algorithm can perform two actions:
   – shift: if no complete item is valid
   – reduce: if there is one valid item, and it is complete
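The two rules translate into a few lines. Below is a minimal sketch in Java, assuming items are encoded as strings like "A->a.Ab" with '.' for the dot. The class Lr0Step and its method decide are hypothetical names, and the item sets in main come from the A → aAb | ab example that follows. Any set covered by neither rule is reported as such.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

final class Lr0Step {
    /**
     * Picks the LR(0) action from the current set of valid items, using the two rules.
     * Items are plain strings such as "A->a.Ab" with '.' for the dot (an assumed encoding).
     */
    static String decide(Set<String> validItems) {
        String completeItem = null;
        int complete = 0;
        for (String item : validItems) {
            if (item.endsWith(".")) { complete++; completeItem = item; }
        }
        if (complete == 0) return "shift";                                // no complete item is valid
        if (validItems.size() == 1) return "reduce by " + completeItem;  // one valid item, and it is complete
        return "no rule applies";                                         // mixed, or several complete items
    }

    public static void main(String[] args) {
        Set<String> afterA = new LinkedHashSet<>(Arrays.asList("A->a.Ab", "A->a.b", "A->.aAb", "A->.ab"));
        System.out.println(decide(afterA));                            // shift
        System.out.println(decide(Collections.singleton("A->ab.")));  // reduce by A->ab.
    }
}
```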

Running the algorithm

A → aAb | ab      A ⇒ aAb ⇒ aabb

Stack   Input   Action   Valid items
ε       aabb    shift    A → •aAb, A → •ab
a       abb     shift    A → a•Ab, A → a•b, A → •aAb, A → •ab
aa      bb      shift    A → a•Ab, A → a•b, A → •aAb, A → •ab
aab     b       reduce   A → ab•
aA      b       shift    A → aA•b
aAb     ε       reduce   A → aAb•
A       ε


How to update valid items

• Initial set of valid items: S → •α for every production S → α
• Updating valid items on “shift b”:
   – A → α•bβ is updated to A → αb•β
   – A → α•Xβ disappears if X ≠ b
   – After these updates, for every valid item A → α•Cβ and production C → δ, we also add C → •δ as a valid item (repeating until no new items appear)

Notation: a, b: terminals; A, B: variables; X, Y: mixed symbols; α, β: mixed strings
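Here is a sketch of the initial set and the “shift” update for the grammar S → Tc, T → TA | A, A → aTb | ab. It includes the repeated addition of C → •δ. Assumptions: productions are strings "X=rhs", an item is encoded as production index × 100 + dot position, variables are uppercase letters, and all names are hypothetical. Shifting a from the initial set yields A → a•Tb and A → a•b, plus the items C → •δ that they bring in.

```java
import java.util.*;

final class ValidItems {
    // Grammar S -> Tc, T -> TA | A, A -> aTb | ab; one production per string "X=rhs" (assumed encoding).
    static final String[] G = {"S=Tc", "T=TA", "T=A", "A=aTb", "A=ab"};

    static char lhs(int p) { return G[p].charAt(0); }
    static String rhs(int p) { return G[p].substring(2); }

    /** Readable form of an item encoded as production * 100 + dot, with '.' for the dot. */
    static String show(int item) {
        String r = rhs(item / 100);
        int dot = item % 100;
        return lhs(item / 100) + " -> " + r.substring(0, dot) + "." + r.substring(dot);
    }

    /** For every item A -> alpha.C beta, add C -> .delta; repeat until no new item appears. */
    static TreeSet<Integer> close(Collection<Integer> items) {
        TreeSet<Integer> set = new TreeSet<>(items);
        Deque<Integer> work = new ArrayDeque<>(items);
        while (!work.isEmpty()) {
            int it = work.pop();
            String r = rhs(it / 100);
            int dot = it % 100;
            if (dot < r.length() && Character.isUpperCase(r.charAt(dot))) {
                for (int p = 0; p < G.length; p++) {
                    if (lhs(p) == r.charAt(dot) && set.add(p * 100)) work.push(p * 100);
                }
            }
        }
        return set;
    }

    /** Initial valid items: S -> .alpha for every production S -> alpha, then closed. */
    static TreeSet<Integer> initial() {
        List<Integer> start = new ArrayList<>();
        for (int p = 0; p < G.length; p++) if (lhs(p) == 'S') start.add(p * 100);
        return close(start);
    }

    /** Update on "shift x": A -> alpha.x beta becomes A -> alpha x.beta, other items disappear, then close. */
    static TreeSet<Integer> shift(Set<Integer> items, char x) {
        List<Integer> moved = new ArrayList<>();
        for (int it : items) {
            String r = rhs(it / 100);
            int dot = it % 100;
            if (dot < r.length() && r.charAt(dot) == x) moved.add(it + 1);
        }
        return close(moved);
    }

    public static void main(String[] args) {
        TreeSet<Integer> s0 = initial();
        TreeSet<Integer> s1 = shift(s0, 'a');
        for (int it : s0) System.out.println("initial: " + show(it));
        for (int it : s1) System.out.println("after a: " + show(it));
    }
}
```

The same shift function also serves for “shift B” after a reduce, with B a variable.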

How to update valid items

• Updating valid items on “reduce β to B”:
   – First, we backtrack to the valid items we had before β was pushed onto the stack
   – Then, we apply the same rules as for “shift B” (as if B were a terminal):
      A → α•Bβ is updated to A → αB•β
      A → α•Xβ disappears if X ≠ B
      C → •δ is added for every valid item A → α•Cβ and production C → δ

Valid item updates by εNFA

• States of the εNFA will be items (plus a start state q0)
• For every item S → •α we have an ε-transition from q0 to S → •α
• For every item A → α•Xβ we have a transition labeled X from A → α•Xβ to A → αX•β
• For every item A → α•Cβ and production C → δ we have an ε-transition from A → α•Cβ to C → •δ

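Example

A → aAb | ab

εNFA (figure), transitions grouped by label:
ε: q0 to A → •aAb; q0 to A → •ab; A → a•Ab to A → •aAb; A → a•Ab to A → •ab
a: A → •aAb to A → a•Ab; A → •ab to A → a•b
A: A → a•Ab to A → aA•b
b: A → aA•b to A → aAb•; A → a•b to A → ab•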
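The three transition rules can be generated directly from the grammar. Below is a Java sketch with hypothetical names, productions written as "X=rhs", "." for the dot and "eps" for ε. For A → aAb | ab it produces the four ε-transitions and five labeled transitions of the figure.

```java
import java.util.ArrayList;
import java.util.List;

final class ItemNfa {
    // Grammar A -> aAb | ab, one production per string "X=rhs" (assumed encoding); A is the start symbol.
    static final String[] G = {"A=aAb", "A=ab"};
    static final char START = 'A';

    /** Item of production p with the dot at position dot, e.g. "A->a.Ab". */
    static String item(int p, int dot) {
        String rhs = G[p].substring(2);
        return G[p].charAt(0) + "->" + rhs.substring(0, dot) + "." + rhs.substring(dot);
    }

    /** All transitions of the item epsilon-NFA, written as "from --label--> to". */
    static List<String> transitions() {
        List<String> out = new ArrayList<>();
        for (int p = 0; p < G.length; p++) {
            if (G[p].charAt(0) == START) out.add("q0 --eps--> " + item(p, 0));  // q0 to S -> .alpha
        }
        for (int p = 0; p < G.length; p++) {
            String rhs = G[p].substring(2);
            for (int dot = 0; dot < rhs.length(); dot++) {
                char x = rhs.charAt(dot);
                // A -> alpha.X beta goes to A -> alpha X.beta on symbol X
                out.add(item(p, dot) + " --" + x + "--> " + item(p, dot + 1));
                if (Character.isUpperCase(x)) {
                    // A -> alpha.C beta goes to C -> .delta on eps, for every production C -> delta
                    for (int q = 0; q < G.length; q++) {
                        if (G[q].charAt(0) == x) out.add(item(p, dot) + " --eps--> " + item(q, 0));
                    }
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        transitions().forEach(System.out::println);
    }
}
```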

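Convert εNFA to DFA

States correspond to sets of valid items; transitions are labeled by variables / terminals.

State 1: A → •aAb, A → •ab
State 2: A → a•Ab, A → a•b, A → •aAb, A → •ab
State 3: A → ab•
State 4: A → aA•b
State 5: A → aAb•

Transitions: 1 on a to 2; 2 on a to 2; 2 on b to 3; 2 on A to 4; 4 on b to 5. All other transitions go to the die state.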
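The ε-transitions only ever add items C → •δ, so the subset construction reduces to two steps. First, take the ε-closure of the items S → •α as the start state. Then, for each state and symbol, move the dot over that symbol and close again. Below is a minimal Java sketch, assuming the encoding "X=rhs" for productions and production index × 100 + dot for items, with hypothetical names. Exploring symbols in the order a, b, A happens to number the states 1 to 5 as on the slide, and an empty set stands for the die state.

```java
import java.util.*;

final class ItemDfa {
    static final String[] G = {"A=aAb", "A=ab"};   // A -> aAb | ab (assumed encoding "X=rhs")
    static final String SYMBOLS = "abA";           // exploration order: a, b, then A

    static String rhs(int p) { return G[p].substring(2); }

    /** Epsilon-closure: add C -> .delta for every item with the dot before a variable C. */
    static TreeSet<Integer> close(Collection<Integer> items) {
        TreeSet<Integer> set = new TreeSet<>(items);
        Deque<Integer> work = new ArrayDeque<>(items);
        while (!work.isEmpty()) {
            int it = work.pop();
            String r = rhs(it / 100);
            int dot = it % 100;
            if (dot < r.length() && Character.isUpperCase(r.charAt(dot))) {
                for (int p = 0; p < G.length; p++) {
                    if (G[p].charAt(0) == r.charAt(dot) && set.add(p * 100)) work.push(p * 100);
                }
            }
        }
        return set;
    }

    /** Move the dot over x wherever possible, then close; an empty result is the die state. */
    static TreeSet<Integer> step(Set<Integer> state, char x) {
        List<Integer> moved = new ArrayList<>();
        for (int it : state) {
            String r = rhs(it / 100);
            int dot = it % 100;
            if (dot < r.length() && r.charAt(dot) == x) moved.add(it + 1);
        }
        return close(moved);
    }

    /** Builds the DFA; states are numbered from 1 and delta maps "i x" to the target state. */
    static List<TreeSet<Integer>> build(Map<String, Integer> delta) {
        List<Integer> start = new ArrayList<>();
        for (int p = 0; p < G.length; p++) if (G[p].charAt(0) == G[0].charAt(0)) start.add(p * 100);
        List<TreeSet<Integer>> states = new ArrayList<>();
        states.add(close(start));
        for (int i = 0; i < states.size(); i++) {            // the list grows while we scan it
            for (char x : SYMBOLS.toCharArray()) {
                TreeSet<Integer> next = step(states.get(i), x);
                if (next.isEmpty()) continue;                  // transition to "die"
                int j = states.indexOf(next);
                if (j < 0) { states.add(next); j = states.size() - 1; }
                delta.put((i + 1) + " " + x, j + 1);
            }
        }
        return states;
    }

    public static void main(String[] args) {
        Map<String, Integer> delta = new TreeMap<>();
        List<TreeSet<Integer>> states = build(delta);
        System.out.println(states.size() + " states, transitions " + delta);
    }
}
```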

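Attempt at parsing with DFA

A → aAb | ab      A ⇒ aAb ⇒ aabb

Stack   Input   Action   DFA state   Valid items
ε       aabb    shift    1           A → •aAb, A → •ab
a       abb     shift    2           A → a•Ab, A → a•b, A → •aAb, A → •ab
aa      bb      shift    2           A → a•Ab, A → a•b, A → •aAb, A → •ab
aab     b       reduce   3           A → ab•
aA      b                ?           A → aA•b

After the reduce, the DFA cannot continue from state 3: it has to resume from the state it was in before ab was read.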

Remember the state in stack!

A → aAb | ab      A ⇒ aAb ⇒ aabb

Stack      Input   Action   DFA state (valid items)
1          aabb    shift    1: A → •aAb, A → •ab
1a2        abb     shift    2: A → a•Ab, A → a•b, A → •aAb, A → •ab
1a2a2      bb      shift    2: A → a•Ab, A → a•b, A → •aAb, A → •ab
1a2a2b3    b       reduce   3: A → ab•
1a2A4      b       shift    4: A → aA•b
1a2A4b5    ε       reduce   5: A → aAb•
1A         ε
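The whole procedure fits in one loop. Below is a Java sketch under the same assumed encoding, with hypothetical names. It keeps the item sets themselves on the stack in place of the numbers 1 to 5, and it stores only states, not grammar symbols. On a reduce it pops one state per symbol of the handle, which is the backtracking to the valid items before the handle, and then takes the transition on the variable. It accepts when the last reduce produced A, only the start state remains, and the input is used up.

```java
import java.util.*;

final class Lr0Parser {
    static final String[] G = {"A=aAb", "A=ab"};   // A -> aAb | ab (assumed encoding "X=rhs")
    static final char START = 'A';

    static String rhs(int p) { return G[p].substring(2); }

    /** Adds C -> .delta for every item with the dot before a variable C, until nothing changes. */
    static TreeSet<Integer> close(Collection<Integer> items) {
        TreeSet<Integer> set = new TreeSet<>(items);
        Deque<Integer> work = new ArrayDeque<>(items);
        while (!work.isEmpty()) {
            int it = work.pop();
            String r = rhs(it / 100);
            int dot = it % 100;
            if (dot < r.length() && Character.isUpperCase(r.charAt(dot))) {
                for (int p = 0; p < G.length; p++) {
                    if (G[p].charAt(0) == r.charAt(dot) && set.add(p * 100)) work.push(p * 100);
                }
            }
        }
        return set;
    }

    /** Moves the dot over x where possible and closes the result; empty means "die". */
    static TreeSet<Integer> step(Set<Integer> state, char x) {
        List<Integer> moved = new ArrayList<>();
        for (int it : state) {
            String r = rhs(it / 100);
            int dot = it % 100;
            if (dot < r.length() && r.charAt(dot) == x) moved.add(it + 1);
        }
        return close(moved);
    }

    /** LR(0) parsing with a stack of DFA states; each action is appended to trace. */
    static boolean parse(String input, List<String> trace) {
        List<Integer> start = new ArrayList<>();
        for (int p = 0; p < G.length; p++) if (G[p].charAt(0) == START) start.add(p * 100);
        Deque<TreeSet<Integer>> states = new ArrayDeque<>();
        states.push(close(start));
        int pos = 0;
        while (true) {
            TreeSet<Integer> top = states.peek();
            List<Integer> complete = new ArrayList<>();
            for (int it : top) if (it % 100 == rhs(it / 100).length()) complete.add(it);
            if (complete.isEmpty()) {                                   // shift
                if (pos == input.length()) return false;                // input used up too early
                TreeSet<Integer> next = step(top, input.charAt(pos));
                if (next.isEmpty()) return false;                       // die state
                states.push(next);
                trace.add("shift " + input.charAt(pos++));
            } else if (top.size() == 1) {                               // reduce by the one complete item
                int p = complete.get(0) / 100;
                for (int i = 0; i < rhs(p).length(); i++) states.pop(); // back to the state before the handle
                trace.add("reduce " + G[p]);
                char v = G[p].charAt(0);
                if (states.size() == 1 && v == START && pos == input.length()) return true; // accept
                TreeSet<Integer> next = step(states.peek(), v);         // then treat v like a shifted symbol
                if (next.isEmpty()) return false;
                states.push(next);
            } else {
                return false;                                           // S/R or R/R conflict
            }
        }
    }

    public static void main(String[] args) {
        List<String> trace = new ArrayList<>();
        System.out.println(parse("aabb", trace) + " " + trace);
        // prints: true [shift a, shift a, shift b, reduce A=ab, shift b, reduce A=aAb]
    }
}
```

The recorded actions match the Action column of the table: three shifts, a reduce by A → ab, a shift, and a reduce by A → aAb.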

LR(0) grammars and deterministic PDAs

• The parsing procedure can be implemented by a deterministic pushdown automaton
• A PDA is deterministic if in every state there is at most one possible transition for every input symbol and pop symbol, including ε
• Example: the PDA for w#w^R is deterministic, but the PDA for ww^R is not
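The contrast between w#w^R and ww^R shows up in a direct simulation. With the # marker, the machine always knows whether to push or to pop, so at every step at most one move applies. Below is a Java sketch (class name hypothetical, alphabet {a, b} assumed) that simulates such a deterministic PDA with an explicit stack and two control states. Without the marker there is no signal for the midpoint, and a PDA for ww^R has to guess it.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class MirrorDpda {
    /** Deterministically recognizes { w#w^R : w over {a, b} } with one stack. */
    static boolean accepts(String s) {
        Deque<Character> stack = new ArrayDeque<>();
        boolean seenHash = false;                            // control state: before or after '#'
        for (char c : s.toCharArray()) {
            if (!seenHash) {
                if (c == '#') seenHash = true;               // switch phases: the only possible move
                else if (c == 'a' || c == 'b') stack.push(c); // push the first half
                else return false;
            } else {
                // The second half must equal the first half reversed: pop and compare.
                if (stack.isEmpty() || stack.pop() != c) return false;
            }
        }
        return seenHash && stack.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(accepts("ab#ba"));  // true
        System.out.println(accepts("ab#ab"));  // false
    }
}
```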

LR(0) grammars and deterministic PDAs

• Not every PDA can be made deterministic

• Since PDAs are equivalent to CFGs, some CFLs have no deterministic PDA, so the LR(0) parsing algorithm must fail for some CFGs!

• When does the LR(0) parsing algorithm fail?

Outline of LR(0) parsing algorithm

• The algorithm can perform two actions:
   – shift (S): no complete item is valid
   – reduce (R): there is one valid item, and it is complete
• What if:
   – some valid items are complete, some not: S / R conflict
   – more than one valid complete item: R / R conflict
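To see both failure modes concretely, the sketch below builds every DFA state of a grammar and classifies each state as on this slide. Assumptions: productions are strings "X=rhs" whose first left side is the start symbol, items are encoded as production index × 100 + dot, and the names are hypothetical. The conflicting grammars S → a | ab (S/R) and S → A | B, A → a, B → a (R/R) are illustrative examples chosen for this sketch. Both running examples of these slides pass the check.

```java
import java.util.*;

final class Lr0Check {
    static String rhs(String[] g, int p) { return g[p].substring(2); }

    /** Adds C -> .delta for every item with the dot before a variable C, until nothing changes. */
    static TreeSet<Integer> close(String[] g, Collection<Integer> items) {
        TreeSet<Integer> set = new TreeSet<>(items);
        Deque<Integer> work = new ArrayDeque<>(items);
        while (!work.isEmpty()) {
            int it = work.pop();
            String r = rhs(g, it / 100);
            int dot = it % 100;
            if (dot < r.length() && Character.isUpperCase(r.charAt(dot))) {
                for (int p = 0; p < g.length; p++) {
                    if (g[p].charAt(0) == r.charAt(dot) && set.add(p * 100)) work.push(p * 100);
                }
            }
        }
        return set;
    }

    /** Moves the dot over x where possible, then closes. */
    static TreeSet<Integer> step(String[] g, Set<Integer> state, char x) {
        List<Integer> moved = new ArrayList<>();
        for (int it : state) {
            String r = rhs(g, it / 100);
            int dot = it % 100;
            if (dot < r.length() && r.charAt(dot) == x) moved.add(it + 1);
        }
        return close(g, moved);
    }

    /** Explores every DFA state and reports "LR(0)", "S/R conflict" or "R/R conflict". */
    static String check(String[] g) {
        Set<Character> symbols = new LinkedHashSet<>();
        for (String prod : g) for (char c : prod.substring(2).toCharArray()) symbols.add(c);
        List<Integer> start = new ArrayList<>();
        for (int p = 0; p < g.length; p++) if (g[p].charAt(0) == g[0].charAt(0)) start.add(p * 100);
        List<TreeSet<Integer>> states = new ArrayList<>();
        states.add(close(g, start));
        for (int i = 0; i < states.size(); i++) {
            TreeSet<Integer> s = states.get(i);
            int complete = 0;
            for (int it : s) if (it % 100 == rhs(g, it / 100).length()) complete++;
            if (complete > 1) return "R/R conflict";                   // more than one valid complete item
            if (complete == 1 && s.size() > 1) return "S/R conflict";  // some valid items complete, some not
            for (char x : symbols) {
                TreeSet<Integer> next = step(g, s, x);
                if (!next.isEmpty() && !states.contains(next)) states.add(next);
            }
        }
        return "LR(0)";
    }

    public static void main(String[] args) {
        System.out.println(check(new String[] {"A=aAb", "A=ab"}));             // LR(0)
        System.out.println(check(new String[] {"S=a", "S=ab"}));               // S/R conflict
        System.out.println(check(new String[] {"S=A", "S=B", "A=a", "B=a"}));  // R/R conflict
    }
}
```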

Hierarchy of context-free grammars (each class contains the next)

• context-free grammars: parse using CYK algorithm (slow)
   – LR(∞) grammars
      – LR(1) grammars
         – LR(0) grammars: parse using LR(0) algorithm

[Figure labels: java, perl, python, …]

to be continued…