cs415 compilers lexical analysis and - computer …zz124/cs415_spring2014/lectures/lec07...cs415...

CS415 Compilers

Lexical Analysis and

These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice

University

cs415, spring 14

First Programming Project

2 Lecture 7

Instruction Scheduling Project has been posted Due dates: code – February 25, report – March 4 - start early - problem is not totally defined; you may run into situations where you will need time to “redesign” your solution - the report is 40% of the grade, code is 60% - need to follow calling conventions since we will use automatic testing framework - we mainly provide help for C/C++ implementations - not a group project!

cs415, spring 14 Lecture 7

3

Constructing a Scanner – Review

→  The scanner is the first stage in the front end →  Specifications can be expressed using regular expressions →  Build tables and code from a DFA

Scanner

Scanner Generator

specifications

source code parts of speech & words

Tables and code


4

Review: Goal

•  We show how to construct a finite state automaton to recognize any RE

•  Overview: →  Direct construction of a nondeterministic finite automaton

(NFA) to recognize a given RE §  Requires ε-transitions to combine regular subexpressions

→  Construct a deterministic finite automaton (DFA) to simulate the NFA §  Use a set-of-states construction

→  Minimize the number of states §  Hopcroft state minimization algorithm

→  Generate the scanner code §  Additional specifications needed for details


5

Non-deterministic Finite Automata

Each RE corresponds to a deterministic finite automaton (DFA) •  May be hard to directly construct the right DFA What about an RE such as ( a | b )* abb ?

This is a little different

•  S0 has a transition on ε •  S1 has two transitions on a This is a non-deterministic finite automaton (NFA)

a | b

S0 S1 S4 S2 S3 ε a b b


6

Non-deterministic Finite Automata

•  An NFA accepts a string x iff ∃ a path though the transition graph from s0 to a final state such that the edge labels spell x

•  Transitions on ε consume no input •  To “run” the NFA, start in s0 and guess the right transition at

each step →  Always guess correctly →  If some sequence of correct guesses accepts x then accept

Why study NFAs? •  They are the key to automate the RE→DFA construction •  We can paste together NFAs with ε-transitions

NFA NFA becomes an NFA ε


7

Relationship between NFAs and DFAs

DFA is a special case of an NFA •  DFA has no ε transitions •  DFA’s transition function is single-valued •  Same rules will work

DFA can be simulated with an NFA →  Obviously

NFA can be simulated with a DFA (less obvious) •  Simulate sets of possible states •  Possible exponential blowup in the state space •  Still, one state per character in the input stream


8

Automating Scanner Construction

To convert a specification into code: 1  Write down the RE for the input language 2  Build a big NFA 3  Build the DFA that simulates the NFA 4  Systematically shrink the DFA 5  Turn it into code

Scanner generators •  Lex and Flex work along these lines •  Algorithms are well-known and well-understood •  Key issue is interface to parser (define all parts of speech) •  You could build one in a weekend!


9

Review: Automating Scanner Construction

RE→ NFA (Thompson’s construction) •  Build an NFA for each term •  Combine them with ε-moves

NFA → DFA (subset construction) •  Build the simulation

DFA → Minimal DFA •  Hopcroft’s algorithm

DFA →RE (Not part of the scanner construction) •  All pairs, all paths problem •  Take the union of all paths from s0 to an accepting state

minimal DFA

RE NFA DFA

The Cycle of Constructions


10

RE →NFA using Thompson’s Construction

Key idea •  NFA pattern for each symbol and each operator •  Each NFA has a single start and accept state •  Join them with ε moves in precedence order

S0 S1 a

NFA for a

S0 S1 a

S3 S4 b

NFA for ab


11



S0 S1 a

NFA for a

S0 S1 a

S3 S4 b

NFA for ab

ε


12



S0 S1 a

NFA for a

S0 S1 a

S3 S4 b

NFA for ab

ε

NFA for a | b

S1 S2 a

S3 S4 b


13



S0 S1 a

NFA for a

S0 S1 a

S3 S4 b

NFA for ab

ε

NFA for a | b

S0

S1 S2 a

S3 S4 b

S5


14



S0 S1 a

NFA for a

S0 S1 a

S3 S4 b

NFA for ab

ε

NFA for a | b

S0

S1 S2 a

S3 S4 b

S5 ε

ε ε

ε


15



S0 S1 a

NFA for a

S0 S1 a

S3 S4 b

NFA for ab

ε

NFA for a | b

S0

S1 S2 a

S3 S4 b

S5 ε

ε ε

ε

S1 S3

NFA for a*

a


16



S0 S1 a

NFA for a

S0 S1 a

S3 S4 b

NFA for ab

ε

NFA for a | b

S0

S1 S2 a

S3 S4 b

S5 ε

ε ε

ε

S1 S3

NFA for a*

a

ε


17



S0 S1 a

NFA for a

S0 S1 a

S3 S4 b

NFA for ab

ε

NFA for a | b

S0

S1 S2 a

S3 S4 b

S5 ε

ε ε

ε

S0 S1 S3 S4

NFA for a*

a

ε


18



S0 S1 a

NFA for a

S0 S1 a

S3 S4 b

NFA for ab

ε

NFA for a | b

S0

S1 S2 a

S3 S4 b

S5 ε

ε ε

ε

S0 S1 ε S3 S4

ε

NFA for a*

a

ε

ε


19



S0 S1 a

NFA for a

S0 S1 a

S3 S4 b

NFA for ab

ε

NFA for a | b

S0

S1 S2 a

S3 S4 b

S5 ε

ε ε

ε

S0 S1 ε S3 S4

ε

NFA for a*

a

ε

ε

Ken Thompson, CACM, 1968


20

Example of Thompson’s Construction

Let’s try a ( b | c )*

1. a, b, & c

2. b | c

3. ( b | c )*

S0 S1 a

S0 S1 b

S0 S1 c

S2 S3 b

S4 S5 c

S1 S6 S0 S7 ε ε

ε ε

S1 S2 b

S3 S4 c

S0 S5 ε

ε

ε

ε


21

Example of Thompson’s Construction

Let’s try a ( b | c )*

1. a, b, & c

2. b | c

3. ( b | c )*

S0 S1 a

S0 S1 b

S0 S1 c

S2 S3 b

S4 S5 c

S1 S6 S0 S7

ε

ε

ε ε

ε ε

ε ε

S1 S2 b

S3 S4 c

S0 S5 ε

ε

ε

ε


22

Example of Thompson’s Construction (con’t)

4. a ( b | c )*

Of course, a human would design something simpler ...

S0 S1 a

b | c But, we can automate production of the more complex one ...

S0 S1 a ε

S4 S5 b

S6 S7 c

S3 S8 S2 S9

ε

ε

ε ε

ε ε

ε ε


23

Review: Automating Scanner Construction

RE→ NFA (Thompson’s construction) •  Build an NFA for each term •  Combine them with ε-moves

NFA → DFA (subset construction) •  Build the simulation

DFA → Minimal DFA •  Hopcroft’s algorithm

DFA →RE (Not part of the scanner construction) •  All pairs, all paths problem •  Take the union of all paths from s0 to an accepting state

minimal DFA

RE NFA DFA



24

NFA →DFA with Subset Construction

Need to build a simulation of the NFA

Two key functions •  Move(si , a) is set of states reachable from si by a •  ε-closure(si) is set of states reachable from si by ε

The algorithm: •  Start state derived from s0 of the NFA •  Take its ε-closure S0 = ε-closure(s0) •  Take the image of S0, move(S0, a) for each a ∈ Σ, and take

its ε-closure, add it to the state set S •  For each state S, compute move(S, a) for each a ∈ Σ, and

take its ε-closure •  Iterate until no more states are added

Sounds more complex than it is…


25


The algorithm:

s0 ← ε-closure(q0 ) add s0 to S

while ( S is still changing ) for each si ∈ S for each a∈ Σ s?← ε-closure(move(si,a)) if ( s? ∉ S ) then add s? to S as sj T[ si,a] ← sj

else T[ si,a] ← s?

Let’s think about why this works


26


The algorithm:

s0 ← ε-closure(q0 ) add s0 to S

while ( S is still changing ) for each si ∈ S for each a∈ Σ s?← ε-closure(move(si,a)) if ( s? ∉ S ) then add s? to S as sj T[ si,a] ← sj

else T[ si,a] ← s?

Let’s think about why this works

The algorithm halts:

1. S contains no duplicates (test before adding)

2. 2Q is finite

3. while loop adds to S, but does not remove from S (monotone)

⇒ the loop halts

S contains all the reachable NFA states It tries each symbol in each si. It builds every possible NFA configuration.

⇒ S and T form the DFA


27


Example of a fixed-point computation •  Monotone construction of some finite set •  Halts when it stops adding to the set •  Proofs of halting & correctness are similar •  These computations arise in many contexts

Other fixed-point computations •  Canonical construction of sets of LR(1) items

→  Quite similar to the subset construction •  Classic data-flow analysis

→  Solving sets of simultaneous set equations •  DFA minimization algorithm (coming up!)

We will see many more fixed-point computations


28

q0 q1 a ε

q4 q5 b

q6 q7 c

q3 q8 q2 q9

ε

ε ε

ε ε

ε ε

ε-closure(move(s,*))

NFA states a b c

s0 q0 q1


Applying the subset construction:

a ( b | c )* : ε


29

q0 q1 a ε

q4 q5 b

q6 q7 c

q3 q8 q2 q9

ε

ε ε

ε ε

ε ε


NFA states a b c

s0 q0 q1, q2, q3, q4, q6, q9

none none



a ( b | c )* : ε


30

q0 q1 a ε

q4 q5 b

q6 q7 c

q3 q8 q2 q9

ε

ε ε

ε ε

ε ε


NFA states a b c

s0 q0 q1, q2, q3, q4, q6, q9

none none

s1 q1, q2, q3, q4, q6, q9



a ( b | c )* : ε


31

q0 q1 a ε

q4 q5 b

q6 q7 c

q3 q8 q2 q9

ε

ε ε

ε ε

ε ε


NFA states a b c

s0 q0 q1, q2, q3, q4, q6, q9

none none

s1 q1, q2, q3, q4, q6, q9

none



a ( b | c )* : ε


32

q0 q1 a ε

q4 q5 b

q6 q7 c

q3 q8 q2 q9

ε

ε ε

ε ε

ε ε


NFA states a b c

s0 q0 q1, q2, q3, q4, q6, q9

none none

s1 q1, q2, q3, q4, q6, q9

none q5

q7



a ( b | c )* : ε


33

q0 q1 a ε

q4 q5 b

q6 q7 c

q3 q8 q2 q9

ε

ε ε

ε ε

ε ε


NFA states a b c

s0 q0 q1, q2, q3, q4, q6, q9

none none

s1 q1, q2, q3, q4, q6, q9

none q5, q8, q9, q3, q4, q6

q7, q8, q9, q3, q4, q6



a ( b | c )* : ε


34

q0 q1 a ε

q4 q5 b

q6 q7 c

q3 q8 q2 q9

ε

ε ε

ε ε

ε ε


NFA states a b c

s0 q0 q1, q2, q3, q4, q6, q9

none none

s1 q1, q2, q3, q4, q6, q9

none q5, q8, q9, q3, q4, q6

q7, q8, q9, q3, q4, q6

s2 q5, q8, q9, q3, q4, q6



a ( b | c )* : ε


35

q0 q1 a ε

q4 q5 b

q6 q7 c

q3 q8 q2 q9

ε

ε ε

ε ε

ε ε


NFA states a b c

s0 q0 q1, q2, q3, q4, q6, q9

none none

s1 q1, q2, q3, q4, q6, q9

none q5, q8, q9, q3, q4, q6

q7, q8, q9, q3, q4, q6

s2 q5, q8, q9, q3, q4, q6

none s2 s3



a ( b | c )* : ε


36

q0 q1 a ε

q4 q5 b

q6 q7 c

q3 q8 q2 q9

ε

ε ε

ε ε

ε ε


NFA states a b c

s0 q0 q1, q2, q3, q4, q6, q9

none none

s1 q1, q2, q3, q4, q6, q9

none q5, q8, q9, q3, q4, q6

q7, q8, q9, q3, q4, q6

s2 q5, q8, q9, q3, q4, q6

none s2 s3

s3 q7, q8, q9, q3, q4, q6



a ( b | c )* : ε


37

q0 q1 a ε

q4 q5 b

q6 q7 c

q3 q8 q2 q9

ε

ε ε

ε ε

ε ε


NFA states a b c

s0 q0 q1, q2, q3, q4, q6, q9

none none

s1 q1, q2, q3,q4, q6, q9

none q5, q8, q9,q3, q4, q6

q7, q8, q9,q3, q4, q6

s2 q5, q8, q9,q3, q4, q6

none s2 s3

s3 q7, q8, q9,q3, q4, q6

none s2 s3



a ( b | c )* : ε


38

q0 q1 a ε

q4 q5 b

q6 q7 c

q3 q8 q2 q9

ε

ε ε

ε ε

ε ε


NFA states a b c

s0 q0 q1, q2, q3, q4, q6, q9

none none

s1 q1, q2, q3,q4, q6, q9

none q5, q8, q9,q3, q4, q6

q7, q8, q9,q3, q4, q6

s2 q5, q8, q9,q3, q4, q6

none s2 s3

s3 q7, q8, q9,q3, q4, q6

none s2 s3

Final states



a ( b | c )* : ε


39


The DFA for a ( b | c )*

•  Ends up smaller than the NFA •  All transitions are deterministic

δ a b c

s0 s1 - -

s1 - s2 s3

s2 - s2 s3

s3 - s2 s3

s3

s2

s0 s1 c

ba

b

c

c

b


40

Automating Scanner Construction

RE→NFA (Thompson’s construction) •  Build an NFA for each term •  Combine them with ε-moves

NFA →DFA (subset construction) •  Build the simulation

DFA →Minimal DFA •  Hopcroft’s algorithm

DFA →RE (not really part of scanner construction) •  All pairs, all paths problem •  Union together paths from s0 to a final state

minimal DFA

RE NFA DFA



41

DFA Minimization

The Big Picture •  Discover sets of equivalent states •  Represent each such set with just one state


42

DFA Minimization


Two states are equivalent if and only if: •  ∀ a ∈ Σ, transitions on a lead to equivalent states (DFA) •  a-transitions to distinct sets ⇒ states must be in distinct sets


43

DFA Minimization


Two states are equivalent if and only if: •  ∀ a ∈ Σ, transitions on a lead to equivalent states (DFA) •  a-transitions to distinct sets ⇒ states must be in distinct sets

A partition P of S •  Each state s ∈ S is in exactly one set pi ∈ P •  The algorithm iteratively partitions the DFA’s states


44

DFA Minimization

Details of the algorithm •  Group states into maximal size sets, optimistically •  Iteratively subdivide those sets, as needed •  States that remain grouped together are equivalent

Initial partition, P0 , has two sets: {F} & {Q-F} (D =(Q,Σ,δ,q0,F))

Splitting a set (“partitioning a set by a”) •  Assume qa, & qb ∈ s, and δ(qa,a) = qx, & δ(qb,a) = qy •  If qx & qy are not in the same set, then s must be split

→  qa has transition on a, qb does not ⇒ a splits s


45

DFA Minimization

The algorithm P ← { F, {Q-F}} while ( P is still changing) T ← { } for each set S ∈ P T ← T ∪ split(S) P ← T split(S): for each a∈ Σ if a splits S into S1 , S2 , … then return {S1 , S2, …} else return S

Why does this work? •  Partition P ∈ 2Q •  Start off with 2 subsets of Q

{F} and {Q-F} •  While loop takes Pi→Pi+1 by

splitting 1 or more sets •  Pi+1 is at least one step closer

to the partition with |Q| sets •  Maximum of |Q | splits Note that •  Partitions are never combined


46

DFA Minimization

The algorithm P ← { F, {Q-F}} while ( P is still changing) T ← { } for each set S ∈ P T ← T ∪ split(S) P ← T split(S): for each a∈ Σ if a splits S into S1 , S2 , … then return {S1 , S2, …} else return S

Why does this work? •  Partition P ∈ 2Q •  Start off with 2 subsets of Q

{F} and {Q-F} •  While loop takes Pi→Pi+1 by

splitting 1 or more sets •  Pi+1 is at least one step closer

to the partition with |Q| sets •  Maximum of |Q | splits Note that •  Partitions are never combined

This is a fixed-point algorithm!


47

Back to our DFA Minimization example

Then, apply the minimization algorithm

To produce the minimal DFA

s3

s2

s0 s1 c

ba

b

b

c

c

Split onCurrent Partition a b c

P0 { s1, s2, s3} {s0} none none none

s0 s1 a

b | c We observed that a human would design a simpler automaton than Thompson’s construction & the subset construction did.

Minimizing that DFA produces the one that a human would design!

final states


48

Abbreviated Register Specification

Start with a regular expression r0 | r1 | r2 | r3 | r4 | r5 | r6 | r7 | r8 | r9

minimal DFA

RE NFA DFA



49


Thompson’s construction produces r 0

r 1

r 2

r 8

r 9

… …

s0 sf

ε ε

ε

ε

ε ε

ε ε ε

ε

ε

ε ε

ε ε

ε

ε

ε

ε ε

…

minimal DFA

RE NFA DFA



50


The subset construction builds This is a DFA, but it has a lot of states …

r 0

sf0

s0

sf1 1 sf2 2

sf9 sf8

… 9

8

minimal DFA

RE NFA DFA



51


The DFA minimization algorithm builds This looks like what a skilled compiler writer would do!

r s0 sf

0,1,2,3,4, 5,6,7,8,9

minimal DFA

RE NFA DFA



52

Syntax Analysis Read EaC: 3.1 – 3.3

Homework and Next class

cs415 compilers lexical analysis and - computer …zz124/cs415_spring2014/lectures/lec07...cs415...

Documents