lecture 3: parsing cs 540 george mason university

63
Lecture 3: Parsing CS 540 George Mason University

Upload: loraine-hunt

Post on 29-Jan-2016

215 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Lecture 3: Parsing CS 540 George Mason University

Lecture 3: Parsing

CS 540

George Mason University

Page 2: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 2

Static Analysis - Parsing

Scanner(lexical

analysis)

Parser(syntax

analysis)

CodeOptimizer

SemanticAnalysis

(IC generator)

CodeGenerator

SymbolTable

Sourcelanguage

tokens Syntaticstructure

Syntatic/semanticstructure

Targetlanguage

•Syntax described formally•Tokens organized into syntax tree that describes structure•Error checking

Page 3: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 3

Static Analysis - Parsing

We can use context free grammars to specify the syntaxof programming languages.

Page 4: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 4

Context Free Grammars

Definition: A context free grammar is a formal model that consists of:

1. A set of tokens or terminal symbols Vt

2. A set of nonterminal symbols Vn

3. Start symbol S (in Vn)

4. Finite set of productions of form:

A R1…Rn (n >= 0) where A in Vn,

R in Vn Vt

Page 5: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 5

CFG Examples

Vt = {+,-,0..9}, Vn = {L,D}, s = {L}

L L + D | L – D | D

D 0 | … | 9

Vt={(,)}, Vn = {L}, s = {L}

L ( L ) L

L

Indicates a production

Shorthand for multiple productions

Page 6: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 6

LanguagesRegular A a B, C Context free A Context sensitive A Type 0

Page 7: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 7

Any regular language can be expressed using a CFG

Starting with a NFA:• For each state Si in the NFA

– Create non-terminal Ai

– If transition (Si,a) = Sk, create production Ai a Ak

– If transition (Si = Sk, create production Ai Ak

– If Si is a final state, create production Ai – If Si is the NFA start state, s = Ai

• What does the existence of this algorithm tell us about the relationship between regular and context free languages?

Page 8: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 8

NFA to CFG Example

ab*aa

b

a

A1 a A2

A2 b A2

A2 a A3

A3

1 2 3

Page 9: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 9

Writing Grammars

When writing a grammar (or RE) for some language, the following must be true:

1. All strings generated are in the language.

2. Your grammar produces all strings in the language.

Page 10: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 10

Try these:

• Integers divisible by 2

• Legal postfix expressions

• Floating point numbers with no extra zeros

• Strings of 0,1 where there are more 0 than 1

Page 11: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 11

Parsing

• The task of parsing is figuring out what the parse tree looks like for a given input and language.

• If a string is in the given language, a parse tree must exist.

• However, just because a parse tree exists for some string in a given language doesn’t mean a given algorithm can find it.

Page 12: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 12

Parse TreesThe parse tree for some string in a language that is

defined by the grammar G as follows:– The root is the start symbol of G

– The leaves are terminals or . When visited from left to right, the leaves form the input string

– The interior nodes are non-terminals of G

– For every non-terminal A in the tree with children B1 … Bk, there is some production A B1 … Bk

Page 13: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 13

Parse Tree for (())()

L

L L

( )L L( )L L

( )

L ( L ) LL

Page 14: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 14

Parse Tree for (())()

L

L L

( )L L( ) L L( )

L ( L ) LL

Page 15: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 15

Parsing• Parsing algorithms are based on the idea of

derivations.

• General Algorithms:– LL (top down)– LR (bottom up)

Page 16: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 16

Single Step Derivation

Definition: Given A (with , in

(Vn Vt)*) and a production A ,

A is a single step derivation.

Examples:

L + D L – D + D L L - D

( L ) ( L ) ( ( L ) L ) ( L ) L (L) L

Greek letters (,,,…)

denote a (possibly empty)

sequence of terminals and

non-terminals.

Page 17: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 17

DerivationsDefinition: A sequence of the form: w0 w1 … wn

is a derivation of wn from w0 (w0 * wn)

L production L ( L ) L

( L ) L production L ( ) L production L ( )

L * ()

If wi has non-terminal symbols, it is referred to as sentential form.

Page 18: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 18

L production L ( L ) L

( L ) L production L ( L ) L

( L ) ( L ) L production L

( L ) ( L ) production L ( L ) L

( ( L ) L ) ( L ) production L

( ( ) L ) ( L ) production L

( ( ) L ) ( ) production L

( ( ) ) ( )

L * (( )) ( )

Page 19: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 19

• L(G), the language generated by grammar G is {w in Vt

*: s * w, for start symbol s}

• We’ve just shown that both () and (())() are in L(G) for the previous grammar.

Page 20: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 20

Leftmost Derivations

• A derivation where the leftmost nonterminal is always chosen

• If a string is in a given language (i.e. a derivation exists), then a leftmost derivation must exist

• Rightmost derivation defined as you would expect

Page 21: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 21

Leftmost Derivation for (())()L production L ( L ) L

( L ) L production L ( L ) L

( ( L ) L ) L production L

( ( ) L ) L production L

( ( ) ) L production L

( ( ) ) ( L ) L production L ( L ) L

( ( ) ) ( ) L production L

( ( ) ) ( )

Page 22: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 22

Rightmost Derivation for (())()L production L ( L ) L

( L ) L production L ( L ) L

( L ) ( L ) L production L

( L ) ( L ) production L

( L ) ( ) production L ( L ) L

( ( L ) L ) ( ) production L

( ( L ) ) ( ) production L

( ( ) ) ( )

Page 23: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 23

Ambiguity

Two (or more) parse trees or leftmost derivations for some string in the language

E E + E

E E – E

E 0 | … | 9

E

E + E

E - E E + E

E - E

E

2 3

4 2

3 42 – 3 + 4

Page 24: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 24

• Two leftmost derivations E E + E E E – E

E – E + E 2 – E

2 – E + E 2 – E + E

2 – 3 + E 2 – 3 + E

2 – 3 + 4 2 – 3 + 4

Page 25: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 25

• An ambiguous grammar can sometimes be made unambiguous:E E + T | E – T | T

T 0 | … | 9

• Precedence can be specified as well:E E + T | E – T | T

T T * F | T / F | F

F ( E ) | 0 | …| 9

enforces the correctassociativity

Page 26: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 26

Input: begin simplestmt; simplestmt; end

begin simplestmt ; simplestmt ; end

S S SS

SS

SS

PP begin SS end

SS S ; SS

SS S simplestmt

S begin SS end

Page 27: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 27

Top Down (LL) Parsing

begin simplestmt ; simplestmt ; end

SS

PP begin SS end

SS S ; SS

SS S simplestmt

S begin SS end

Page 28: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 28

Top Down (LL) Parsing

begin simplestmt ; simplestmt ; end

S

SS

SS

PP begin SS end

SS S ; SS

SS S simplestmt

S begin SS end

Page 29: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 29

Top Down (LL) Parsing

begin simplestmt ; simplestmt ; end

S

SS

SS

PP begin SS end

SS S ; SS

SS S simplestmt

S begin SS end

Page 30: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 30

Top Down (LL) Parsing

begin simplestmt ; simplestmt ; end

S S SS

SS

SS

PP begin SS end

SS S ; SS

SS S simplestmt

S begin SS end

Page 31: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 31

Top Down (LL) Parsing

begin simplestmt ; simplestmt ; end

S S SS

SS

SS

PP begin SS end

SS S ; SS

SS S simplestmt

S begin SS end

Page 32: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 32

Top Down (LL) Parsing

begin simplestmt ; simplestmt ; end

S S SS

SS

SS

P 1

2

3

4

5

6

P begin SS end

SS S ; SS

SS S simplestmt

S begin SS end

Page 33: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 33

Bottomup (LR) Parsing

begin simplestmt ; simplestmt ; end

S

P begin SS end

SS S ; SS

SS S simplestmt

S begin SS end

Page 34: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 34

Bottomup (LR) Parsing

begin simplestmt ; simplestmt ; end

S S

P begin SS end

SS S ; SS

SS S simplestmt

S begin SS end

Page 35: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 35

Bottomup (LR) Parsing

begin simplestmt ; simplestmt ; end

S S SS

P begin SS end

SS S ; SS

SS S simplestmt

S begin SS end

Page 36: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 36

Bottomup (LR) Parsing

begin simplestmt ; simplestmt ; end

S S SS

SS

P begin SS end

SS S ; SS

SS S simplestmt

S begin SS end

Page 37: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 37

Bottomup (LR) Parsing

begin simplestmt ; simplestmt ; end

S S SS

SS

SS

P begin SS end

SS S ; SS

SS S simplestmt

S begin SS end

Page 38: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 38

Bottomup (LR) Parsing

begin simplestmt ; simplestmt ; end

S S SS

SS

SS

P 6

5

1

4

2

3

P begin SS end

SS S ; SS

SS S simplestmt

S begin SS end

Page 39: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 39

Parsing

• General Algorithms:– LL (top down)– LR (bottom up)

• Both algorithms are driven by the input grammar and the input to be parsed

• Two important sets: FIRST and FOLLOW

Page 40: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 40

FIRST SetsFIRST() is the set of all terminal symbols

that can begin some sentential form in a derivation that starts with … a

• FIRST() = {a in Vt | * a } { } if *

• Example: S simple | begin S end FIRST(S) = {simple, begin}

Page 41: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 41

To compute FIRST across strings of terminals and non-terminals:

FIRST() = { }

FIRST(A) = A if A is a terminal

FIRST(A) FIRST()

if A * FIRST(A)

otherwise

Page 42: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 42

Computing FIRST setsInitially FIRST(A) (for all A) is empty1. For productions A c , where c in Vt

Add { c } to FIRST(A) 2. For productions A

Add { } to FIRST(A)

3. For productions A B , where * and NOT (B *)

Add FIRST(B) to FIRST(A)

4. For productions A , where * Add FIRST() and { to FIRST(A)

same thing as FIRST(B)

Page 43: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 43

Example 1

• S a S e• S B• B b B e• B C• C c C e• C d

• FIRST(C) = {c,d}• FIRST(B) =• FIRST(S) =

Start with the ‘simplest’ non-terminal

Page 44: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 44

Example 1

• S a S e• S B• B b B e• B C• C c C e• C d

• FIRST(C) = {c,d}• FIRST(B) = {b,c,d}• FIRST(S) =

Now that we know FIRST(C) …

Page 45: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 45

Example 1

• S a S e• S B• B b B e• B C• C c C e• C d

• FIRST(C) = {c,d}• FIRST(B) = {b,c,d}• FIRST(S) = {a,b,c,d}

Page 46: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 46

Example 2

• P i | c | n T S• Q P | a S | d S c S T• R b |• S e | R n | • T R S q

• FIRST(P) = {i,c,n}• FIRST(Q) =• FIRST(R) = {b,}• FIRST(S) =• FIRST(T) =

Page 47: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 47

Example 2

• P i | c | n T S• Q P | a S | d S c S T• R b |• S e | R n | • T R S q

• FIRST(P) = {i,c,n}• FIRST(Q) = {i,c,n,a,d}• FIRST(R) = {b,}• FIRST(S) =• FIRST(T) =

Page 48: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 48

Example 2

• P i | c | n T S• Q P | a S | d S c S T• R b |• S e | R n | • T R S q

• FIRST(P) = {i,c,n}• FIRST(Q) = {i,c,n,a,d}• FIRST(R) = {b,}• FIRST(S) = {e,b,n,}• FIRST(T) =

Note: S R n n because R *

Page 49: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 49

Example 2

• P i | c | n T S• Q P | a S | d S c S T• R b |• S e | R n | • T R S q

• FIRST(P) = {i,c,n}• FIRST(Q) = {i,c,n,a,d}• FIRST(R) = {b,}• FIRST(S) = {e,b,n,}• FIRST(T) = {b,c,n,q}

Note: T R S q S q q because both R and S *

Page 50: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 50

Example 3

• S a S e | S T S• T R S e | Q• R r S r | • Q S T |

• FIRST(S) =• FIRST(R) =• FIRST(T) =• FIRST(Q) =

Page 51: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 51

Example 3

• S a S e | S T S• T R S e | Q• R r S r | • Q S T |

• FIRST(S) = {a}• FIRST(R) = {r, }• FIRST(T) = {r,a, }• FIRST(Q) = {a, }

Page 52: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 52

FOLLOW Sets

• FOLLOW(A) is the set of terminals (including end of file - $) that may follow non-terminal A in some sentential form.

• FOLLOW(A) = {c in Vt | S + …Ac…} {$} if S + …A

• For example, consider L + (())(L)L Both ‘)’ and end of file can follow L

• NOTE: is never in FOLLOW sets

Page 53: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 53

Computing FOLLOW(A)

1. If A is start symbol, put $ in FOLLOW(A)

2. Productions of the form B A ,Add FIRST() – {} to FOLLOW(A)

INTUITION: Suppose B AX and FIRST(X) = {c}

S + B A X + A c

= FIRST(X)

Page 54: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 54

3. Productions of the form B A or

B A where *Add FOLLOW(B) to FOLLOW(A)

INTUITION: – Suppose B Y A

S + B Y A

– Suppose B A X and X *S + B A X *A

FOLLOW(B)

FOLLOW(B)

Page 55: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 55

Example 4

• S a S e | B• B b B C f | C• C c C g | d |

• FIRST(C) = {c,d,}• FIRST(B) = {b,c,d,}• FIRST(S) = {a,b,c,d,}

• FOLLOW(C) =

• FOLLOW(B) =

• FOLLOW(S) = {$}

Assume the first non-terminal isthe start symbol

Using rule #1

Page 56: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 56

Example 4

• S a S e | B• B b B C f | C• C c C g | d |

• FIRST(C) = {c,d,}• FIRST(B) = {b,c,d,}• FIRST(S) = {a,b,c,d,}

• FOLLOW(C) = {f,g}

• FOLLOW(B) =

{c,d,f}

• FOLLOW(S) = {$,e}

Using rule #2

Page 57: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 57

Example 4

• S a S e | B• B b B C f | C• C c C g | d |

• FIRST(C) = {c,d,}• FIRST(B) = {b,c,d,}• FIRST(S) = {a,b,c,d,}

• FOLLOW(C) =

• FOLLOW(B) =

• FOLLOW(S) = { }

$, e

{c,d,f} FOLLOW(S)

= {c,d,e,f,$}

{f,g} FOLLOW(B)

= {c,d,e,f,g,$}

Using rule #3

Page 58: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 58

Example 5

• S A B C | A D• A | a A• B b | c | • C D d C• D e b | f c

• FIRST(D) = {e,f}• FIRST(C) = {e,f}• FIRST(B) = {b,c• FIRST(A) = {a, }• FIRST(S) = {a,b,c,e,f}

• FOLLOW(S) =• FOLLOW(A) =• FOLLOW(B) =• FOLLOW(C) =• FOLLOW(D) =

Page 59: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 59

Example 5

• S A B C | A D• A | a A• B b | c | • C D d C• D e b | f c

• FIRST(D) = {e,f}• FIRST(C) = {e,f}• FIRST(B) = {b,c• FIRST(A) = {a, }• FIRST(S) = {a,b,c,e,f}

• FOLLOW(S) = {$}• FOLLOW(A) = {b,c,e,f}• FOLLOW(B) = {e,f}• FOLLOW(C) = {$}• FOLLOW(D) = {$}

Page 60: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 60

Example 6

• S ( A) | • A T E• E & T E | • T ( A ) | a | b | c

• FIRST(T) = {(,a,b,c}• FIRST(E) = {&, }• FIRST(A) = {(,a,b,c}• FIRST(S) = {(, }

• FOLLOW(S) =

• FOLLOW(A) =

• FOLLOW(E) =

• FOLLOW(T) =

Page 61: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 61

Example 6

• S ( A) | • A T E• E & T E | • T ( A ) | a | b | c

• FIRST(T) = {(,a,b,c}• FIRST(E) = {&, }• FIRST(A) = {(,a,b,c}• FIRST(S) = {(, }

• FOLLOW(S) = {$}

• FOLLOW(A) = { ) }

• FOLLOW(E) =

FOLLOW(A) = { ) }

• FOLLOW(T) =

FIRST(E) FOLLOW(A) FOLLOW(E) = {&, )}

Page 62: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 62

Example 7

• E T E’

• E’ + T E’ | • T F T’

• T’ * F T’ | • F ( E ) | id

• FIRST(F) = FIRST(T) = FIRST(E) = {(,id}

• FIRST(T’) = {*,}

• FIRST(E’) = {+,}

• FOLLOW(E) =

• FOLLOW(E’) =

• FOLLOW(T) =

• FOLLOW(T’) =

• FOLLOW(F) =

Page 63: Lecture 3: Parsing CS 540 George Mason University

CS 540 Spring 2009 GMU 63

Example 7

• E T E’

• E’ + T E’ | • T F T’

• T’ * F T’ | • F ( E ) | id

• FIRST(F) = FIRST(T) = FIRST(E) = {(,id}

• FIRST(T’) = {*,}

• FIRST(E’) = {+,}

• FOLLOW(E) = {$,)}

• FOLLOW(E’) = FOLLOW(E) = {$,)}

• FOLLOW(T) = FIRST(E’) FOLLOW(E) FOLLOW(E’) = {+,$,)}

• FOLLOW(T’) = FOLLOW(T) = {+,$,)}

• FOLLOW(F) = FIRST(T’) FOLLOW(T) FOLLOW(T’) = {*,+,$,)}