cos 320 compilers david walker. the front end lexical analysis: create sequence of tokens from...

81
COS 320 Compilers David Walker

Post on 20-Dec-2015

217 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

COS 320Compilers

David Walker

Page 2: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

The Front End

• Lexical Analysis: Create sequence of tokens from characters (Chap 2)

• Parsing: Create abstract syntax tree from sequence of tokens (Chap 3)

• Type Checking: Check program for well-formedness constraints

Lexer Parser

stream ofcharacters

stream oftokens

abstractsyntax

TypeChecker

Page 3: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

Parsing with CFGs

• Context-free grammars are (often) given by BNF expressions (Backus-Naur Form)– Appel Chap 3.1

• More powerful than regular expressions– Matching parens

• wait, we could do nested comments with ML-LEX!• still can’t describe PL structure effectively in ML-LEX

• CFGs are good for describing the overall syntactic structure of programs.

Page 4: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

Context-Free Grammars

• Context-free grammars consist of:– Set of symbols:

• terminals that denotes token types• non-terminals that denotes a set of strings

– Start symbol– Rules:

• left-hand side: non-terminal• right-hand side: terminals and/or non-terminals• rules explain how to rewrite non-terminals (beginning with

start symbol) into terminals

symbol ::= symbol symbol ... symbol

Page 5: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

Context-Free Grammars

A string is in the language of the CFG if and only if it is possible to derive that string using the following non-deterministic procedure:

1. begin with the start symbol

2. while any non-terminals exist, pick a non-terminal and rewrite it using a rule <== could be many choices here

3. stop when all you have left are terminals (and check you arrived at the string your were hoping to)

Parsing is the process of checking that a string is in the CFG for your programming language. It is usually coupled with creating an abstract syntax tree.

Page 6: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

non-terminals: S, E, Elist

terminals: ID, NUM, PRINT, +, :=, (, ), ;

rules:

S ::= S; SS ::= ID := ES ::= PRINT ( Elist )

E ::= IDE ::= NUME ::= E + EE ::= ( S , Elist )

Elist ::= EElist ::= Elist , E

Page 7: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

non-terminals: S, E, Elist

terminals: ID, NUM, PRINT, +, :=, (, ), ;

rules:

1. S ::= S; S2. S ::= ID := E3. S ::= PRINT ( Elist )

4. E ::= ID5. E ::= NUM6. E ::= E + E7. E ::= ( S , Elist )

8. Elist ::= E9. Elist ::= Elist , E

ID = NUM ; PRINT ( NUM )

Derive me!

Page 8: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

non-terminals: S, E, Elist

terminals: ID, NUM, PRINT, +, :=, (, ), ;

rules:

1. S ::= S; S2. S ::= ID := E3. S ::= PRINT ( Elist )

4. E ::= ID5. E ::= NUM6. E ::= E + E7. E ::= ( S , Elist )

8. Elist ::= E9. Elist ::= Elist , E

S

ID = NUM ; PRINT ( NUM )

Derive me!

Page 9: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

non-terminals: S, E, Elist

terminals: ID, NUM, PRINT, +, :=, (, ), ;

rules:

1. S ::= S; S2. S ::= ID := E3. S ::= PRINT ( Elist )

4. E ::= ID5. E ::= NUM6. E ::= E + E7. E ::= ( S , Elist )

8. Elist ::= E9. Elist ::= Elist , E

SID = E

ID = NUM ; PRINT ( NUM )

Derive me!

Page 10: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

non-terminals: S, E, Elist

terminals: ID, NUM, PRINT, +, :=, (, ), ;

rules:

1. S ::= S; S2. S ::= ID := E3. S ::= PRINT ( Elist )

4. E ::= ID5. E ::= NUM6. E ::= E + E7. E ::= ( S , Elist )

8. Elist ::= E9. Elist ::= Elist , E

SID = E

ID = NUM ; PRINT ( NUM )

Derive me!

oops,can’t makeprogress

E

Page 11: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

non-terminals: S, E, Elist

terminals: ID, NUM, PRINT, +, :=, (, ), ;

rules:

1. S ::= S; S2. S ::= ID := E3. S ::= PRINT ( Elist )

4. E ::= ID5. E ::= NUM6. E ::= E + E7. E ::= ( S , Elist )

8. Elist ::= E9. Elist ::= Elist , E

S

ID = NUM ; PRINT ( NUM )

Derive me!

Page 12: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

non-terminals: S, E, Elist

terminals: ID, NUM, PRINT, +, :=, (, ), ;

rules:

1. S ::= S; S2. S ::= ID := E3. S ::= PRINT ( Elist )

4. E ::= ID5. E ::= NUM6. E ::= E + E7. E ::= ( S , Elist )

8. Elist ::= E9. Elist ::= Elist , E

SS ; S

ID = NUM ; PRINT ( NUM )

Derive me!

Page 13: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

non-terminals: S, E, Elist

terminals: ID, NUM, PRINT, +, :=, (, ), ;

rules:

1. S ::= S; S2. S ::= ID := E3. S ::= PRINT ( Elist )

4. E ::= ID5. E ::= NUM6. E ::= E + E7. E ::= ( S , Elist )

8. Elist ::= E9. Elist ::= Elist , E

SS ; SID := E ; S

ID = NUM ; PRINT ( NUM )

Derive me!

Page 14: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

non-terminals: S, E, Elist

terminals: ID, NUM, PRINT, +, :=, (, ), ;

rules:

1. S ::= S; S2. S ::= ID := E3. S ::= PRINT ( Elist )

4. E ::= ID5. E ::= NUM6. E ::= E + E7. E ::= ( S , Elist )

8. Elist ::= E9. Elist ::= Elist , E

SS ; SID = E ; SID = NUM ; SID = NUM ; PRINT ( Elist )ID = NUM ; PRINT ( E )ID = NUM ; PRINT ( NUM )

Derive me!

Page 15: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

non-terminals: S, E, Elist

terminals: ID, NUM, PRINT, +, :=, (, ), ;

rules:

1. S ::= S; S2. S ::= ID := E3. S ::= PRINT ( Elist )

4. E ::= ID5. E ::= NUM6. E ::= E + E7. E ::= ( S , Elist )

8. Elist ::= E9. Elist ::= Elist , E

SS ; SID = E ; SID = NUM ; SID = NUM ; PRINT ( Elist )ID = NUM ; PRINT ( E )ID = NUM ; PRINT ( NUM )

Another way toderive thesame string

SS ; SS ; PRINT ( Elist )S ; PRINT ( E )S ; PRINT ( NUM )ID = E ; PRINT ( NUM )ID = NUM ; PRINT ( NUM )

left-most derivation right-most derivation

Page 16: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

Parse Trees

• Representing derivations as trees– useful in compilers: Parse trees correspond quite

closely (but not exactly) with abstract syntax trees we’re trying to generate

• difference: abstract syntax vs concrete (parse) syntax

• each internal node is labeled with a non-terminal• each leaf node is labeled with a terminal• each use of a rule in a derivation explains how to

generate children in the parse tree from the parents

Page 17: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

Parse Trees

• Example:

S

S S

ID E:=

NUM

NUM

E

L )(PRINT

;

SS ; SID = E ; SID = NUM ; SID = NUM ; PRINT ( Elist )ID = NUM ; PRINT ( E )ID = NUM ; PRINT ( NUM )

Page 18: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

Parse Trees

• Example: 2 derivations, but 1 tree

S

S S

ID E:=

NUM

NUM

E

L )(PRINT

;

SS ; SID = E ; SID = NUM ; SID = NUM ; PRINT ( Elist )ID = NUM ; PRINT ( E )ID = NUM ; PRINT ( NUM )

SS ; SS ; PRINT ( Elist )S ; PRINT ( E )S ; PRINT ( NUM )ID = E ; PRINT ( NUM )ID = NUM ; PRINT ( NUM )

Page 19: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

Parse Trees

• parse trees have meaning.– order of children, nesting of subtrees is

significant

S

SS

ID E:=

NUM

NUM

E

L )(PRINT

;

S

SS

ID E:=

NUM

NUM

E

L )(PRINT

;

Page 20: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

Ambiguous Grammars

• a grammar is ambiguous if the same sequence of tokens can give rise to two or more parse trees

Page 21: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

Ambiguous Grammars

non-terminals: E

terminals: ID NUM PLUS MULT

E ::= ID | NUM | E + E | E * E

characters: 4 + 5 * 6tokens: NUM(4) PLUS NUM(5) MULT NUM(6)

E

E E

NUM(4)E

*E

+

NUM(6)NUM(5)

I like using this notation whereI avoid repeating E ::=

Page 22: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

Ambiguous Grammars

non-terminals: E

terminals: ID NUM PLUS MULT

E ::= ID | NUM | E + E | E * E

characters: 4 + 5 * 6tokens: NUM(4) PLUS NUM(5) MULT NUM(6)

E

E E

NUM(4)E

*E

+

NUM(6)NUM(5)

E

EE

NUM(6)

E+

E

*

NUM(5)NUM(4)

Page 23: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

Ambiguous Grammars

• problem: compilers use parse trees to interpret the meaning of parsed expressions– different parse trees have different meanings– eg: (4 + 5) * 6 is not 4 + (5 * 6)– languages with ambiguous grammars are DISASTROUS;

The meaning of programs isn’t well-defined! You can’t tell what your program might do!

• solution: rewrite grammar to eliminate ambiguity– fold precedence rules into grammar to disambiguate– fold associativity rules into grammar to disambiguate– other tricks as well

Page 24: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

Building Parsers

• In theory classes, you might have learned about general mechanisms for parsing all CFGs– algorithms for parsing all CFGs are expensive

• actually, with computers getting faster and bigger year over year, researchers are beginning to dispute this claim.

– for 1/10/100 million-line applications, compilers must be fast. • even for 20 thousand-line apps, speed is nice

– sometimes 1/3 of compilation time is spent in parsing

• compiler writers have developed specialized algorithms for parsing the kinds of CFGs that you need to build effective programming languages– LL(k), LR(k) grammars can be parsed.

Page 25: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

Recursive Descent Parsing

• Recursive Descent Parsing (Appel Chap 3.2):– aka: predictive parsing; top-down parsing– simple, efficient– can be coded by hand in ML quickly – parses many, but not all CFGs

• parses LL(1) grammars– Left-to-right parse; Leftmost-derivation; 1 symbol lookahead

– key ideas:• one recursive function for each non terminal• each production becomes one clause in the function

Page 26: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

1. S ::= IF E THEN S ELSE S2. | BEGIN S L3. | PRINT E

4. L ::= END5. | ; S L6. E ::= NUM = NUM

non-terminals: S, E, Lterminals: NUM, IF, THEN, ELSE, BEGIN, END, PRINT, ;, =rules:

Page 27: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

1. S ::= IF E THEN S ELSE S2. | BEGIN S L3. | PRINT E

4. L ::= END5. | ; S L6. E ::= NUM = NUM

non-terminals: S, E, Lterminals: NUM, IF, THEN, ELSE, BEGIN, END, PRINT, ;, =rules:

datatype token = NUM | IF | THEN | ELSE | BEGIN | END | PRINT | SEMI | EQ

Step 1: Represent the tokens

Step 2: build infrastructure for reading tokens from lexing stream

val tok = ref (getToken ())

fun advance () = tok := getToken ()

fun eat t = if (! tok = t) then advance () else error ()

function supplied by lexer

Page 28: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

1. S ::= IF E THEN S ELSE S2. | BEGIN S L3. | PRINT E

4. L ::= END5. | ; S L6. E ::= NUM = NUM

non-terminals: S, E, Lterminals: NUM, IF, THEN, ELSE, BEGIN, END, PRINT, ;, =rules:

fun S () = case !tok of IF => eat IF; E (); eat THEN; S (); eat ELSE; S () | BEGIN => eat BEGIN; S (); L () | PRINT => eat PRINT; E ()

and L () = case !tok of END => eat END | SEMI => eat SEMI; S (); L ()

and E () = eat NUM; eat EQ; eat NUM

Step 3: write parser => one function per non-terminal; one clause per rule

val tok = ref (getToken ())fun advance () = tok := getToken ()fun eat t = if (! tok = t) then advance () else error ()

datatype token = NUM | IF | THEN | ELSE | BEGIN | END | PRINT | SEMI | EQ

Page 29: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

1. S ::= A EOF2. A ::= ID := E3. | PRINT ( L )

4. E ::= ID5. | NUM 6. L ::= E7. | L , E

non-terminals: S, A, E, Lrules:

fun S () = A (); eat EOF

and A () = case !tok of ID => eat ID; eat ASSIGN; E () | PRINT => eat PRINT; eat LPAREN; L (); eat RPAREN

and E () = case !tok of ID => eat ID | NUM => eat NUM

and L () = case !tok of ID => ??? | NUM => ???

Page 30: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

problem

• predictive parsing only works for grammars where the first terminal symbol in the input provides enough information to choose which production to use– LL(1)

• when parsing L, if !tok = ID, the parser cannot determine which production to use:

6. L ::= E (E could be ID)7. | L , E (L could be E could be ID)

Page 31: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

solution

• eliminate left-recursion

• rewrite the grammar so it parses the same language but the rules are different:

L ::= E | L , E

S ::= A EOFA ::= ID := E | PRINT ( L )

E ::= ID | NUM

S ::= A EOFA ::= ID := E | PRINT ( L )

E ::= ID | NUM

Page 32: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

solution

• eliminate left-recursion

• rewrite the grammar so it parses the same language but the rules are different:

L ::= E | L , E

L ::= E M

M ::= , E M |

S ::= A EOFA ::= ID := E | PRINT ( L )

E ::= ID | NUM

S ::= A EOFA ::= ID := E | PRINT ( L )

E ::= ID | NUM

Page 33: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

eliminating single left-recursion

• Original grammar form:

• Transformed grammar:

X ::= base | X repeat

X ::= base Xnew

Xnew ::= repeat Xnew |

Strings: base repeat repeat ...

Strings: base repeat repeat ...

Think about: what if you have mutually left-recursive variables X,Y,Z?What’s the most general pattern of left recursion? How to eliminate it?

Page 34: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

Recursive Descent Parsing

• Unfortunately, can’t always eliminate left recursion

• Questions:– how do we know when we can parse

grammars using recursive descent?– Is there an algorithm for generating such

parsers automatically?

Page 35: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

Constructing RD Parsers

• To construct an RD parser, we need to know what rule to apply when– we are trying to parse a non terminal X– we see the next terminal a in input

• We apply rule X ::= s when– a is the first symbol that can be generated by string s,

OR– s reduces to the empty string (is nullable) and a is the

first symbol in any string that can follow X

Page 36: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

Constructing RD Parsers

• To construct an RD parser, we need to know what rule to apply when– we are trying to parse a non terminal X– we see the next terminal a in input

• We apply rule X ::= s when– a is the first symbol that can be generated by string s,

OR– s reduces to the empty string (is nullable) and a is the

first symbol in any string that can follow X

Page 37: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

Constructing Predictive Parsers

1. Y ::= 2. | bb

3. X ::= c4. | Y Z

5. Z ::= d

X c

X b

X d

non-terminalseen

nextterminal rule

Page 38: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

Constructing Predictive Parsers

1. Y ::= 2. | bb

3. X ::= c4. | Y Z

5. Z ::= d

X c 3

X b

X d

non-terminalseen

nextterminal rule

Page 39: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

Constructing Predictive Parsers

1. Y ::= 2. | bb

3. X ::= c4. | Y Z

5. Z ::= d

X c 3

X b 4

X d

non-terminalseen

nextterminal rule

Page 40: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

Constructing Predictive Parsers

1. Y ::= 2. | bb

3. X ::= c4. | Y Z

5. Z ::= d

X c 3

X b 4

X d 4

non-terminalseen

nextterminal rule

Page 41: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

Constricting Predictive Parsers

• in general, must compute:– for each production X ::= s, must determine if

s can derive the empty string.• if yes, X Nullable

– for each production X := s, must determine the set of all first terminals Q derivable from s

• Q First(X)

– for each non terminal X, determine all terminals symbols Q that immediately follow X

• Q Follow(X)

Page 42: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

Iterative Analysis• Many compilers algorithms are iterative

techniques.

• Iterative analysis applies when:– must compute a set of objects with some property P– P is defined inductively. ie, there are:

• base cases: objects o1, o2 “obviously” have property P• inductive cases: if certain objects (o3, o4) have property

P, this implies other objects (f o3; f o4) have property P

– The number of objects in the set is finite • or we can represent infinite collections using some finite

notation & we can find effective termination conditions

Page 43: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

Iterative Analysis• general form:

– initialize set S with base cases– applied inductive rules over and over until you reach a fixed

point

• a fixed point is a set that does not change when you apply an inductive rule (function)– Nullable, First and Follow sets can be determined through

iteration– many program analyses & optimizations use iteration– worst-case complexity is bad– average-case complexity can be good: iteration “usually”

terminates in a couple of rounds

Page 44: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

Iterative Analysis

Base Cases“obviously”have the property

Page 45: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

Iterative Analysis

apply function(rule) to thingsthat have theproperty toproduce new thingsthat have the property

ff

f

f

Page 46: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

Iterative Analysis

Base Cases+ things you getby applying ruleto base caseshave the property

Page 47: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

Iterative Analysis

Apply rules again

Page 48: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

Iterative Analysis

Apply rules again

Page 49: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

Iterative Analysis

Finally, you reacha fixed point

Page 50: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

Iterative Analysis

Example:• axioms are “obviously true”/taken for granted• rules of logic take basic axioms and prove newthings are true

“Dave teaches cos 320”

“Dave teaches cos 441”

“Dave is a great teacher”

rule r: X is a great teacher /\ X teaches Y => Y is a great class

axioms:

Page 51: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

Iterative Analysis

Example:• axioms are “obviously true”/taken for granted• rules of logic take basic axioms and prove newthings are true

“Dave teaches cos 320”

“Dave teaches cos 441”

“Dave is a great teacher”

rule r: X is a great teacher /\ X teaches Y => Y is a great class

axioms:

r

“cos 320 is a great class”

r

“cos 441 is a great class”

Page 52: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

Iterative Analysis

Example:• axioms are “obviously true”/taken for granted• rules of logic take basic axioms and prove newthings are true

“Dave teaches cos 320”

“Dave teaches cos 441”

“Dave is a great teacher”

rule r: X is a great teacher /\ X teaches Y => Y is a great class

axioms:

r

“cos 320 is a great class”

r

“cos 441 is a great class”

Fixed Point Reached!

Page 53: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

Nullable Sets

• Non-terminal X is Nullable only if the following constraints are satisfied – base case:

• if (X := ) then X is Nullable

– inductive case:• if (X := ABC...) and A, B, C, ... are all Nullable then

X is Nullable

Page 54: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

Computing Nullable Sets

• Compute X is Nullable by iteration: – Initialization:

• Nullable := { }• if (X := ) then Nullable := Nullable U {X}

– While Nullable different from last iteration do:• for all X,

– if (X := ABC...) and A, B, C, ... are all Nullable then Nullable := Nullable U {X}

Page 55: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

First Sets

• First(X) is specified like this:– base case:

• if T is a terminal symbol then First (T) = {T}

– inductive case:• if X is a non-terminal and (X:= ABC...) then

– First (X) = First (ABC...)

where First(ABC...) = F1 U F2 U F3 U ... and» F1 = First (A)» F2 = First (B), if A is Nullable; emptyset otherwise» F3 = First (C), if A is Nullable & B is Nullable; emp...» ...

Page 56: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

Computing First Sets

• Compute First(X):– initialize:

• if T is a terminal symbol then First (T) = {T}• if T is non-terminal then First(T) = { }

– while First(X) changes (for any X) do• for all X and all rules (X:= ABC...) do

– First (X) := First(X) U First (ABC...) where First(ABC...) := F1 U F2 U F3 U ... and

» F1 := First (A)» F2 := First (B), if A is Nullable; emptyset otherwise» F3 := First (C), if A is Nullable & B is Nullable; emp...» ...

Page 57: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

Computing Follow Sets

• Follow(X) is computed iteratively– base case:

• initially, we assume nothing in particular follows X– (when computing, Follow (X) is initially { })

– inductive case:• if (Y := s1 X s2) for any strings s1, s2 then

– Follow (X) = First (s2)

• if (Y := s1 X s2) for any strings s1, s2 then– Follow (X) = Follow(Y), if s2 is Nullable

Page 58: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

building a predictive parser

Z ::= X Y ZZ ::= d

Y ::= cY ::=

X ::= aX ::= b Y e

nullable first follow

Z

Y

X

Page 59: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

building a predictive parser

Z ::= X Y ZZ ::= d

Y ::= cY ::=

X ::= aX ::= b Y e

nullable first follow

Z no

Y yes

X no

base case

Page 60: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

building a predictive parser

Z ::= X Y ZZ ::= d

Y ::= cY ::=

X ::= aX ::= b Y e

nullable first follow

Z no

Y yes

X no

after one round of induction, we realize we have reached a fixed point

Page 61: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

building a predictive parser

Z ::= X Y ZZ ::= d

Y ::= cY ::=

X ::= aX ::= b Y e

nullable first follow

Z no { }

Y yes { }

X no { }

base case

Page 62: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

building a predictive parser

Z ::= X Y ZZ ::= d

Y ::= cY ::=

X ::= aX ::= b Y e

nullable first follow

Z no d

Y yes c

X no a,b

round 1

Page 63: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

building a predictive parser

Z ::= X Y ZZ ::= d

Y ::= cY ::=

X ::= aX ::= b Y e

nullable first follow

Z no d,a,b

Y yes c

X no a,b

round 2

Page 64: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

building a predictive parser

Z ::= X Y ZZ ::= d

Y ::= cY ::=

X ::= aX ::= b Y e

nullable first follow

Z no d,a,b

Y yes c

X no a,b

after three rounds of iteration, no more changes ==> fixed point

Page 65: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

building a predictive parser

Z ::= X Y ZZ ::= d

Y ::= cY ::=

X ::= aX ::= b Y e

nullable first follow

Z no d,a,b { }

Y yes c { }

X no a,b { }

base case

Page 66: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

building a predictive parser

Z ::= X Y ZZ ::= d

Y ::= cY ::=

X ::= aX ::= b Y e

nullable first follow

Z no d,a,b { }

Y yes c e,d,a,b

X no a,b c,e,d,a,b

after one round of induction, no fixed point

Page 67: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

building a predictive parser

Z ::= X Y ZZ ::= d

Y ::= cY ::=

X ::= aX ::= b Y e

nullable first follow

Z no d,a,b { }

Y yes c e,d,a,b

X no a,b c,e,d,a,b

after two rounds of induction, fixed point (but notice, computing Follow(X) before Follow (Y) would have required 3rd round)

Page 68: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

Z ::= X Y ZZ ::= d

Y ::= cY ::=

X ::= aX ::= b Y e

nullable first follow

Z no d,a,b { }

Y yes c e,d,a,b

X no a,b c,e,d,a,b

Grammar: Computed Sets:

Build parsing table where row X, col Ttells parser which clause to execute infunction X with next-token T:

a b c d e

Z

Y

X

• if T First(s) then enter (X ::= s) in row X, col T• if s is Nullable and T Follow(X) enter (X ::= s) in row X, col T

Page 69: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

Z ::= X Y ZZ ::= d

Y ::= cY ::=

X ::= aX ::= b Y e

nullable first follow

Z no d,a,b { }

Y yes c e,d,a,b

X no a,b c,e,d,a,b

Grammar: Computed Sets:

Build parsing table where row X, col Ttells parser which clause to execute infunction X with next-token T:

a b c d e

Z Z ::= XYZ Z ::= XYZ

Y

X

• if T First(s) then enter (X ::= s) in row X, col T• if s is Nullable and T Follow(X) enter (X ::= s) in row X, col T

Page 70: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

Z ::= X Y ZZ ::= d

Y ::= cY ::=

X ::= aX ::= b Y e

nullable first follow

Z no d,a,b { }

Y yes c e,d,a,b

X no a,b c,e,d,a,b

Grammar: Computed Sets:

Build parsing table where row X, col Ttells parser which clause to execute infunction X with next-token T:

a b c d e

Z Z ::= XYZ Z ::= XYZ Z ::= d

Y

X

• if T First(s) then enter (X ::= s) in row X, col T• if s is Nullable and T Follow(X) enter (X ::= s) in row X, col T

Page 71: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

Z ::= X Y ZZ ::= d

Y ::= cY ::=

X ::= aX ::= b Y e

nullable first follow

Z no d,a,b { }

Y yes c e,d,a,b

X no a,b c,e,d,a,b

Grammar: Computed Sets:

Build parsing table where row X, col Ttells parser which clause to execute infunction X with next-token T:

a b c d e

Z Z ::= XYZ Z ::= XYZ Z ::= d

Y Y ::= c

X

• if T First(s) then enter (X ::= s) in row X, col T• if s is Nullable and T Follow(X) enter (X ::= s) in row X, col T

Page 72: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

Z ::= X Y ZZ ::= d

Y ::= cY ::=

X ::= aX ::= b Y e

nullable first follow

Z no d,a,b { }

Y yes c e,d,a,b

X no a,b c,e,d,a,b

Grammar: Computed Sets:

Build parsing table where row X, col Ttells parser which clause to execute infunction X with next-token T:

a b c d e

Z Z ::= XYZ Z ::= XYZ Z ::= d

Y Y ::= Y ::= Y ::= c Y ::= Y ::=

X

• if T First(s) then enter (X ::= s) in row X, col T• if s is Nullable and T Follow(X) enter (X ::= s) in row X, col T

Page 73: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

Z ::= X Y ZZ ::= d

Y ::= cY ::=

X ::= aX ::= b Y e

nullable first follow

Z no d,a,b { }

Y yes c e,d,a,b

X no a,b c,e,d,a,b

Grammar: Computed Sets:

Build parsing table where row X, col Ttells parser which clause to execute infunction X with next-token T:

a b c d e

Z Z ::= XYZ Z ::= XYZ Z ::= d

Y Y ::= Y ::= Y ::= c Y ::= Y ::=

X X ::= a X ::= b Y e

• if T First(s) then enter (X ::= s) in row X, col T• if s is Nullable and T Follow(X) enter (X ::= s) in row X, col T

Page 74: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

Z ::= X Y ZZ ::= d

Y ::= cY ::=

X ::= aX ::= b Y e

nullable first follow

Z no d,a,b { }

Y yes c e,d,a,b

X no a,b c,e,d,a,b

Grammar: Computed Sets:

What are the blanks?

a b c d e

Z Z ::= XYZ Z ::= XYZ Z ::= d

Y Y ::= Y ::= Y ::= c Y ::= Y ::=

X X ::= a X ::= b Y e

Page 75: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

Z ::= X Y ZZ ::= d

Y ::= cY ::=

X ::= aX ::= b Y e

nullable first follow

Z no d,a,b { }

Y yes c e,d,a,b

X no a,b c,e,d,a,b

Grammar: Computed Sets:

What are the blanks? --> syntax errors

a b c d e

Z Z ::= XYZ Z ::= XYZ Z ::= d

Y Y ::= Y ::= Y ::= c Y ::= Y ::=

X X ::= a X ::= b Y e

Page 76: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

Z ::= X Y ZZ ::= d

Y ::= cY ::=

X ::= aX ::= b Y e

nullable first follow

Z no d,a,b { }

Y yes c e,d,a,b

X no a,b c,e,d,a,b

Grammar: Computed Sets:

Is it possible to put 2 grammar rules in the same box?

a b c d e

Z Z ::= XYZ Z ::= XYZ Z ::= d

Y Y ::= Y ::= Y ::= c Y ::= Y ::=

X X ::= a X ::= b Y e

Page 77: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

Z ::= X Y ZZ ::= dZ ::= d e

Y ::= cY ::=

X ::= aX ::= b Y e

nullable first follow

Z no d,a,b { }

Y yes c e,d,a,b

X no a,b c,e,d,a,b

Grammar: Computed Sets:

Is it possible to put 2 grammar rules in the same box?

a b c d e

Z Z ::= XYZ Z ::= XYZ Z ::= d

Z ::= d e

Y Y ::= Y ::= Y ::= c Y ::= Y ::=

X X ::= a X ::= b Y e

Page 78: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

predictive parsing tables

• if a predictive parsing table constructed this way contains no duplicate entries, the grammar is called LL(1)– Left-to-right parse, Left-most derivation, 1 symbol

lookahead

• if not, of the grammar is not LL(1)• in LL(k) parsing table, columns include every k-

length sequence of terminals:

aa ab ba bb ac ca ...

Page 79: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

another trick

• Previously, we saw that grammars with left-recursion were problematic, but could be transformed into LL(1) in some cases

• the example non-LL(1) grammar we just saw:

• how do we fix it?

Z ::= X Y ZZ ::= dZ ::= d e

Y ::= cY ::=

X ::= aX ::= b Y e

Page 80: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

another trick

• Previously, we saw that grammars with left-recursion were problematic, but could be transformed into LL(1) in some cases

• the example non-LL(1) grammar we just saw:

• solution here is left-factoring:

Z ::= X Y ZZ ::= dZ ::= d e

Y ::= cY ::=

X ::= aX ::= b Y e

Z ::= X Y ZZ ::= d W Y ::= c

Y ::= X ::= aX ::= b Y e

W ::= W ::= e

Page 81: COS 320 Compilers David Walker. The Front End Lexical Analysis: Create sequence of tokens from characters (Chap 2) Parsing: Create abstract syntax tree

summary

• CFGs are good at specifying programming language structure• parsing general CFGs is expensive so we define parsers for

simple classes of CFG– LL(k), LR(k)

• we can build a recursive descent parser for LL(k) grammars by:– computing nullable, first and follow sets– constructing a parse table from the sets– checking for duplicate entries, which indicates failure– creating an ML program from the parse table

• if parser construction fails we can– rewrite the grammar (left factoring, eliminating left recursion) and try

again– try to build a parser using some other method