CS415 Compilers
Lexical Analysis and
These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice
University
cs415, spring 14
First Programming Project
2 Lecture 7
Instruction Scheduling Project has been posted
Due dates: code on February 25, report on March 4
- start early
- the problem is not totally defined; you may run into situations where you will need time to “redesign” your solution
- the report is 40% of the grade, the code is 60%
- you need to follow the calling conventions, since we will use an automatic testing framework
- we mainly provide help for C/C++ implementations
- not a group project!
cs415, spring 14 Lecture 7
3
Constructing a Scanner – Review
→ The scanner is the first stage in the front end
→ Specifications can be expressed using regular expressions
→ Build tables and code from a DFA
[Figure: source code → Scanner → parts of speech & words; specifications → Scanner Generator → tables and code used by the Scanner]
Review: Goal
• We show how to construct a finite state automaton to recognize any RE
• Overview:
  → Direct construction of a nondeterministic finite automaton (NFA) to recognize a given RE
    § Requires ε-transitions to combine regular subexpressions
  → Construct a deterministic finite automaton (DFA) to simulate the NFA
    § Use a set-of-states construction
  → Minimize the number of states
    § Hopcroft state minimization algorithm
  → Generate the scanner code
    § Additional specifications needed for details
Non-deterministic Finite Automata
Each RE corresponds to a deterministic finite automaton (DFA)
• May be hard to directly construct the right DFA
What about an RE such as ( a | b )* abb ?
[Figure: automaton for ( a | b )* abb with states S0, S1, S2, S3, S4 and edge labels ε, a | b, a, b, b]
This is a little different
• S0 has a transition on ε
• S1 has two transitions on a
This is a non-deterministic finite automaton (NFA)
Non-deterministic Finite Automata
• An NFA accepts a string x iff ∃ a path through the transition graph from s0 to a final state such that the edge labels spell x
• Transitions on ε consume no input
• To “run” the NFA, start in s0 and guess the right transition at each step
  → Always guess correctly
  → If some sequence of correct guesses accepts x then accept
Why study NFAs?
• They are the key to automating the RE → DFA construction
• We can paste together NFAs with ε-transitions
[Figure: two NFAs joined by an ε-transition become a single NFA]
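One way to make "always guess correctly" concrete is to track every state the NFA could be in after each character. The sketch below does that for ( a | b )* abb. The edge list is an assumption read from the flattened figure (S0 has the ε move, S1 loops on a and b, S4 is the final state), and the names `NFA`, `eps_closure`, and `accepts` are hypothetical.

```python
EPS = ""  # label used here for ε-transitions

# Assumed transition graph for ( a | b )* abb (states S0..S4, S4 final).
NFA = {
    ("S0", EPS): {"S1"},
    ("S1", "a"): {"S1", "S2"},   # two transitions on a: the nondeterminism
    ("S1", "b"): {"S1"},
    ("S2", "b"): {"S3"},
    ("S3", "b"): {"S4"},
}
START, FINAL = "S0", {"S4"}

def eps_closure(states):
    """All states reachable from `states` using only ε-transitions."""
    seen, work = set(states), list(states)
    while work:
        s = work.pop()
        for t in NFA.get((s, EPS), ()):
            if t not in seen:
                seen.add(t)
                work.append(t)
    return seen

def accepts(text):
    """Accept iff some path from START to a final state spells `text`."""
    current = eps_closure({START})
    for ch in text:
        nxt = set()
        for s in current:
            nxt |= NFA.get((s, ch), set())
        current = eps_closure(nxt)
    return bool(current & FINAL)

print(accepts("ababb"))  # True
print(accepts("abab"))   # False
```

Keeping a set of states replaces the guess: if any sequence of guesses would reach a final state, that state ends up in `current`.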
Relationship between NFAs and DFAs
DFA is a special case of an NFA
• DFA has no ε transitions
• DFA’s transition function is single-valued
• Same rules will work
DFA can be simulated with an NFA
  → Obviously
NFA can be simulated with a DFA (less obvious)
• Simulate sets of possible states
• Possible exponential blowup in the state space
• Still, one state transition per character in the input stream
Automating Scanner Construction
To convert a specification into code:
1 Write down the RE for the input language
2 Build a big NFA
3 Build the DFA that simulates the NFA
4 Systematically shrink the DFA
5 Turn it into code
Scanner generators
• Lex and Flex work along these lines
• Algorithms are well-known and well-understood
• Key issue is interface to parser (define all parts of speech)
• You could build one in a weekend!
Review: Automating Scanner Construction
RE → NFA (Thompson’s construction)
• Build an NFA for each term
• Combine them with ε-moves
NFA → DFA (subset construction)
• Build the simulation
DFA → Minimal DFA
• Hopcroft’s algorithm
DFA → RE (Not part of the scanner construction)
• All pairs, all paths problem
• Take the union of all paths from s0 to an accepting state
[Figure: The Cycle of Constructions: RE → NFA → DFA → minimal DFA, with DFA → RE]
RE → NFA using Thompson’s Construction
Key idea
• NFA pattern for each symbol and each operator
• Each NFA has a single start and accept state
• Join them with ε moves in precedence order
[Figure: NFA for a: S0 →a→ S1]
[Figure: NFA for ab: S0 →a→ S1 →ε→ S3 →b→ S4]
[Figure: NFA for a | b: states S0 through S5, with S1 →a→ S2 and S3 →b→ S4 joined to S0 and S5 by four ε moves]
[Figure: NFA for a*: states S0, S1, S3, S4, with S1 →a→ S3 and four ε moves]
Ken Thompson, CACM, 1968
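The patterns above can be sketched as small functions that each return an NFA fragment with one start state and one accept state. This is a minimal illustration, not the lecture's code: the names `Frag`, `symbol`, `concat`, `alt`, and `star` are hypothetical, and states are numbered in creation order, so the numbering will not match the figures.

```python
import itertools

EPS = ""                      # label used here for ε-moves
_ids = itertools.count()      # source of fresh state numbers

class Frag:
    """An NFA fragment with a single start and a single accept state."""
    def __init__(self, start, accept, edges):
        self.start, self.accept, self.edges = start, accept, edges

def symbol(ch):
    s, t = next(_ids), next(_ids)
    return Frag(s, t, [(s, ch, t)])

def concat(x, y):
    # Join x's accept state to y's start state with one ε-move.
    return Frag(x.start, y.accept,
                x.edges + y.edges + [(x.accept, EPS, y.start)])

def alt(x, y):
    # New start and accept states, connected by four ε-moves.
    s, t = next(_ids), next(_ids)
    extra = [(s, EPS, x.start), (s, EPS, y.start),
             (x.accept, EPS, t), (y.accept, EPS, t)]
    return Frag(s, t, x.edges + y.edges + extra)

def star(x):
    # New start and accept states; ε-moves allow zero or more passes through x.
    s, t = next(_ids), next(_ids)
    extra = [(s, EPS, x.start), (x.accept, EPS, t),
             (x.accept, EPS, x.start), (s, EPS, t)]
    return Frag(s, t, x.edges + extra)

# a ( b | c )*
nfa = concat(symbol("a"), star(alt(symbol("b"), symbol("c"))))
states = {q for (p, _, r) in nfa.edges for q in (p, r)}
eps_moves = [e for e in nfa.edges if e[1] == EPS]
print(len(states), len(eps_moves))  # 10 9
```

Because every fragment exposes exactly one start and one accept state, each operator only needs to add ε-moves at the boundaries and never has to look inside its operands.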
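Example of Thompson’s Construction
Let’s try a ( b | c )*
1. a, b, & c
[Figure: three two-state NFAs: S0 →a→ S1, S0 →b→ S1, S0 →c→ S1]
2. b | c
[Figure: NFA for b | c: states S0 through S5, with S1 →b→ S2, S3 →c→ S4, and four ε moves]
3. ( b | c )*
[Figure: NFA for ( b | c )*: states S0 through S7, with S2 →b→ S3, S4 →c→ S5, and eight ε moves]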
Example of Thompson’s Construction (cont’d)
4. a ( b | c )*
[Figure: NFA for a ( b | c )*: states S0 through S9, with S0 →a→ S1, S4 →b→ S5, S6 →c→ S7, and nine ε moves]
Of course, a human would design something simpler ...
[Figure: S0 →a→ S1, with S1 looping on b | c]
But, we can automate production of the more complex one ...
NFA → DFA with Subset Construction
Need to build a simulation of the NFA
Two key functions
• move(si, a) is the set of states reachable from si by a
• ε-closure(si) is the set of states reachable from si by ε
The algorithm:
• Start state derived from s0 of the NFA
• Take its ε-closure: S0 = ε-closure(s0)
• Take the image of S0, move(S0, a), for each a ∈ Σ, take its ε-closure, and add it to the state set S
• For each new state in S, compute move(state, a) for each a ∈ Σ, and take its ε-closure
• Iterate until no more states are added
Sounds more complex than it is…
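The two key functions can be sketched in a few lines. The edge list below is an assumption for the a ( b | c )* NFA with states q0 through q9, chosen to agree with the worked example later in the lecture; the dictionary `DELTA` and the function names are hypothetical.

```python
EPS = ""  # label used here for ε-transitions

# Assumed NFA for a ( b | c )*, written as (state, label) -> set of targets.
DELTA = {
    ("q0", "a"): {"q1"},
    ("q1", EPS): {"q2"},
    ("q2", EPS): {"q3", "q9"},
    ("q3", EPS): {"q4", "q6"},
    ("q4", "b"): {"q5"},
    ("q6", "c"): {"q7"},
    ("q5", EPS): {"q8"},
    ("q7", EPS): {"q8"},
    ("q8", EPS): {"q3", "q9"},
}

def move(states, a):
    """States reachable from any state in `states` by one transition on `a`."""
    out = set()
    for s in states:
        out |= DELTA.get((s, a), set())
    return out

def eps_closure(states):
    """`states` plus everything reachable from them by ε-transitions alone."""
    closure, work = set(states), list(states)
    while work:
        s = work.pop()
        for t in DELTA.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                work.append(t)
    return closure

s0 = eps_closure({"q0"})
s1 = eps_closure(move(s0, "a"))
print(sorted(s0))  # ['q0']
print(sorted(s1))  # ['q1', 'q2', 'q3', 'q4', 'q6', 'q9']
```

Here both functions take a set of NFA states, which is the form the subset construction needs, since each DFA state is itself a set of NFA states.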
NFA → DFA with Subset Construction
The algorithm:
    s0 ← ε-closure(q0)
    add s0 to S
    while ( S is still changing )
        for each si ∈ S
            for each a ∈ Σ
                s? ← ε-closure(move(si, a))
                if ( s? ∉ S ) then
                    add s? to S as sj
                    T[si, a] ← sj
                else
                    T[si, a] ← s?
Let’s think about why this works
The algorithm halts:
1. S contains no duplicates (test before adding)
2. 2^Q is finite
3. while loop adds to S, but does not remove from S (monotone)
⇒ the loop halts
S contains every reachable set of NFA states
• It tries each symbol in each si
• It builds every possible NFA configuration
⇒ S and T form the DFA
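A runnable sketch of the algorithm follows, under two stated assumptions: the edge list is the a ( b | c )* NFA with states q0 through q9 and q9 accepting, and a worklist stands in for the "while S is still changing" loop (it visits each DFA state once but reaches the same fixed point). All names are hypothetical.

```python
EPS = ""        # label used here for ε-transitions
SIGMA = "abc"

# Assumed NFA for a ( b | c )*: (state, label) -> set of targets.
DELTA = {
    ("q0", "a"): {"q1"}, ("q1", EPS): {"q2"},
    ("q2", EPS): {"q3", "q9"}, ("q3", EPS): {"q4", "q6"},
    ("q4", "b"): {"q5"}, ("q6", "c"): {"q7"},
    ("q5", EPS): {"q8"}, ("q7", EPS): {"q8"},
    ("q8", EPS): {"q3", "q9"},
}
NFA_FINAL = "q9"

def eps_closure(states):
    closure, work = set(states), list(states)
    while work:
        s = work.pop()
        for t in DELTA.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                work.append(t)
    return closure

def subset_construction(q0):
    start = frozenset(eps_closure({q0}))
    S = [start]          # DFA states (sets of NFA states), in discovery order
    T = {}               # T[(i, a)] = j, using indices into S
    work = [start]
    while work:
        si = work.pop(0)
        for a in SIGMA:
            moved = set()
            for q in si:
                moved |= DELTA.get((q, a), set())
            target = frozenset(eps_closure(moved))
            if not target:
                continue             # "none": no transition on a
            if target not in S:      # test before adding, so no duplicates
                S.append(target)
                work.append(target)
            T[(S.index(si), a)] = S.index(target)
    final = {i for i, s in enumerate(S) if NFA_FINAL in s}
    return S, T, final

S, T, final = subset_construction("q0")
print(len(S), sorted(final))  # 4 [1, 2, 3]
```

Each DFA state is a `frozenset`, so membership in `S` is exactly the "s? ∉ S" test, and the loop can only add to `S`, which is the monotonicity the halting argument relies on.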
NFA →DFA with Subset Construction
Example of a fixed-point computation
• Monotone construction of some finite set
• Halts when it stops adding to the set
• Proofs of halting & correctness are similar
• These computations arise in many contexts
Other fixed-point computations
• Canonical construction of sets of LR(1) items
  → Quite similar to the subset construction
• Classic data-flow analysis
  → Solving sets of simultaneous set equations
• DFA minimization algorithm (coming up!)
We will see many more fixed-point computations
NFA → DFA with Subset Construction
Applying the subset construction to the NFA for a ( b | c )*:
[Figure: NFA for a ( b | c )* with states q0 through q9; q0 →a→ q1, q4 →b→ q5, q6 →c→ q7, joined by ε moves]

                                    ε-closure(move(s, *))
      NFA states                a                         b                              c
s0    q0                        q1, q2, q3, q4, q6, q9    none                           none
s1    q1, q2, q3, q4, q6, q9    none                      q5, q8, q9, q3, q4, q6         q7, q8, q9, q3, q4, q6
s2    q5, q8, q9, q3, q4, q6    none                      s2                             s3
s3    q7, q8, q9, q3, q4, q6    none                      s2                             s3

Final states: s1, s2, s3 (the sets that contain q9)
NFA → DFA with Subset Construction
The DFA for a ( b | c )*
• Ends up smaller than the NFA
• All transitions are deterministic

δ     a     b     c
s0    s1    -     -
s1    -     s2    s3
s2    -     s2    s3
s3    -     s2    s3

[Figure: DFA with states s0, s1, s2, s3; edges follow the δ table]
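The δ table can drive a recognizer directly, which is the "turn it into code" step in miniature. In this sketch a missing dictionary entry plays the role of "-" in the table, and the final-state set {s1, s2, s3} is an assumption (the DFA states whose NFA sets contain q9). The name `accepts` is hypothetical.

```python
# δ for a ( b | c )*; absent keys mean "no transition".
DELTA = {
    ("s0", "a"): "s1",
    ("s1", "b"): "s2", ("s1", "c"): "s3",
    ("s2", "b"): "s2", ("s2", "c"): "s3",
    ("s3", "b"): "s2", ("s3", "c"): "s3",
}
FINAL = {"s1", "s2", "s3"}   # assumed final states

def accepts(text):
    """Run the DFA: exactly one table lookup per input character."""
    state = "s0"
    for ch in text:
        state = DELTA.get((state, ch))
        if state is None:        # fell off the table: reject
            return False
    return state in FINAL

print(accepts("abcb"))  # True
print(accepts("ba"))    # False
```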
Automating Scanner Construction
RE → NFA (Thompson’s construction)
• Build an NFA for each term
• Combine them with ε-moves
NFA → DFA (subset construction)
• Build the simulation
DFA → Minimal DFA
• Hopcroft’s algorithm
DFA → RE (not really part of scanner construction)
• All pairs, all paths problem
• Union together paths from s0 to a final state
DFA Minimization
The Big Picture
• Discover sets of equivalent states
• Represent each such set with just one state
Two states are equivalent if and only if:
• ∀ a ∈ Σ, transitions on a lead to equivalent states (DFA)
• a-transitions to distinct sets ⇒ states must be in distinct sets
A partition P of S
• Each state s ∈ S is in exactly one set pi ∈ P
• The algorithm iteratively partitions the DFA’s states
DFA Minimization
Details of the algorithm
• Group states into maximal size sets, optimistically
• Iteratively subdivide those sets, as needed
• States that remain grouped together are equivalent
Initial partition, P0, has two sets: {F} & {Q − F}, where D = (Q, Σ, δ, q0, F)
Splitting a set (“partitioning a set by a”)
• Assume qa & qb ∈ s, and δ(qa, a) = qx & δ(qb, a) = qy
• If qx & qy are not in the same set, then s must be split
  → qa has a transition on a, qb does not ⇒ a splits s
DFA Minimization
The algorithm
    P ← { F, Q − F }
    while ( P is still changing )
        T ← { }
        for each set S ∈ P
            T ← T ∪ split(S)
        P ← T

    split(S):
        for each a ∈ Σ
            if a splits S into S1, S2, …
                then return { S1, S2, … }
        return { S }
Why does this work?
• Partition P ⊆ 2^Q (each set in P is a subset of Q)
• Start off with 2 subsets of Q: {F} and {Q − F}
• While loop takes Pi → Pi+1 by splitting 1 or more sets
• Pi+1 is at least one step closer to the partition with |Q| sets
• Maximum of |Q| splits
Note that
• Partitions are never combined
This is a fixed-point algorithm!
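The refinement loop can be sketched as follows, using the four-state DFA for a ( b | c )* as input. This is an illustration under assumptions: `split` tries the symbols in order and returns the first split it finds, a missing transition is treated as its own "no target" group, and the names `block_of`, `split`, and `minimize` are hypothetical.

```python
SIGMA = "abc"

# DFA for a ( b | c )*; absent keys mean "no transition".
DELTA = {
    ("s0", "a"): "s1",
    ("s1", "b"): "s2", ("s1", "c"): "s3",
    ("s2", "b"): "s2", ("s2", "c"): "s3",
    ("s3", "b"): "s2", ("s3", "c"): "s3",
}
Q = {"s0", "s1", "s2", "s3"}
F = {"s1", "s2", "s3"}

def block_of(P, q):
    """Index of the set in P containing q; None when q is None (no transition)."""
    for i, block in enumerate(P):
        if q in block:
            return i
    return None

def split(S, P):
    """Return the pieces of S if some symbol splits it, else S unchanged."""
    for a in SIGMA:
        groups = {}
        for q in S:
            groups.setdefault(block_of(P, DELTA.get((q, a))), set()).add(q)
        if len(groups) > 1:
            return [frozenset(g) for g in groups.values()]
    return [S]

def minimize(Q, F):
    P = [b for b in (frozenset(F), frozenset(Q - F)) if b]
    while True:
        T = []
        for S in P:
            T.extend(split(S, P))
        if len(T) == len(P):     # nothing was split: fixed point reached
            return set(T)
        P = T

print(minimize(Q, F) == {frozenset({"s1", "s2", "s3"}), frozenset({"s0"})})  # True
```

Since sets are only ever split and never merged, comparing the number of sets before and after a pass is enough to detect that P has stopped changing.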
Back to our DFA Minimization example
[Figure: DFA for a ( b | c )* with states s0, s1, s2, s3; s1, s2, s3 are final states]
Then, apply the minimization algorithm

      Current Partition          Split on
                                 a       b       c
P0    { s1, s2, s3 } { s0 }      none    none    none

To produce the minimal DFA
[Figure: s0 →a→ s1, with s1 looping on b | c]
We observed that a human would design a simpler automaton than Thompson’s construction & the subset construction did.
Minimizing that DFA produces the one that a human would design!
Abbreviated Register Specification
Start with a regular expression r0 | r1 | r2 | r3 | r4 | r5 | r6 | r7 | r8 | r9
Abbreviated Register Specification
Thompson’s construction produces
[Figure: NFA from s0 to sf with one branch per alternative (r 0, r 1, r 2, …, r 8, r 9), joined by ε moves]
Abbreviated Register Specification
The subset construction builds
[Figure: DFA starting at s0 with a transition on r, followed by one transition per digit 0, 1, 2, …, 8, 9 to final states sf0, sf1, sf2, …, sf8, sf9]
This is a DFA, but it has a lot of states …
Abbreviated Register Specification
The DFA minimization algorithm builds
[Figure: minimal DFA from s0 to sf: a transition on r followed by a single transition on 0,1,2,3,4,5,6,7,8,9]
This looks like what a skilled compiler writer would do!