
Source: cse.iitkgp.ac.in/~goutam/bbsCompiler/lect/lect2.pdf

Compiler Design

Lexical Analysis/Scanning

Lect 2 Goutam Biswas


Input and Output

The input is a stream of characters (ASCII codes) of the source program.

The output is a stream of tokens or symbols corresponding to different syntactic categories. The output also contains attributes of tokens. Examples of tokens are different keywords, identifiers, constants, operators, delimiters etc.


Note

The scanner removes comments and white space, evaluates constants, keeps track of line numbers, etc.

This stage performs the main I/O and reduces the complexity of the syntax analyzer.

The syntax analyzer invokes the scanner whenever it requires a token.


Token

A token is an identifier (name/code) corresponding to a syntactic category of the language grammar. In other words, it is a symbol of the terminal alphabet of the grammar. Often we use an integer code for this.


Pattern

A pattern is a description (formal or informal) of the set of objects corresponding to a terminal (token) symbol of the grammar. Examples are the set of identifiers in the C language, the set of integer constants, etc.


Lexeme and Attribute

A lexeme is an actual string of characters that matches a pattern and generates a token.

An attribute of a token is a value that the scanner extracts from the corresponding lexeme and supplies to the syntax analyzer.
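The three notions above can be pictured as one small record. This is only a sketch: the names TokenCode, Token and make_int_const are hypothetical and the integer codes are arbitrary, not taken from the lecture.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Any


class TokenCode(IntEnum):
    # Integer codes for syntactic categories (the values are arbitrary).
    IDENT = 1
    INT_CONST = 2
    PLUS = 3


@dataclass
class Token:
    code: TokenCode        # the token: a terminal symbol of the grammar
    lexeme: str            # the actual characters that matched the pattern
    attribute: Any = None  # value extracted from the lexeme


def make_int_const(lexeme: str) -> Token:
    # The scanner evaluates the constant and passes the value to the parser.
    return Token(TokenCode.INT_CONST, lexeme, int(lexeme))
```

For instance, make_int_const("042") keeps the lexeme "042" but supplies the attribute 42.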


Specification of Token

The set of strings corresponding to a token (terminal) of a programming language is often a regular set and is specified by a regular expression.


Scanner from the Specification

The collection of tokens of a programming language can be specified by a set of regular expressions. A scanner or lexical analyzer for the language uses a DFA (recognizer of regular languages) at its core. Different final states of the DFA identify different tokens. Synthesis of this DFA from the set of regular expressions can be automated.


Regular Expression

1. ε, ∅ and all a ∈ Σ are regular expressions.

2. If r and s are regular expressions, then so are (r|s), (rs), (r∗) and (r). Nothing else is a regular expression.

We can reduce the use of parentheses by introducing precedence and associativity rules. Binary operators are left associative and the precedence rule is ∗ > concat > |.


IEEE POSIX Regular Expression

An enlarged set of (defined) operators for regular expressions is introduced in different software, e.g. awk, grep, lex etc.^a

• \x is the character x itself (a few exceptions are \n, \t, \r etc.).

• . is any character other than ‘\n’.

• [xyz] is x | y | z.

^a Consult the manual pages of lex/flex and Wikipedia for the details of the IEEE POSIX standard of regular expressions.


IEEE POSIX Regular Expression

• [abg-pT-Y] is any one of the characters a, b, g, · · · , p, T, · · · , Y.

• [^G-Q] is any character other than G, H, · · · , P, Q.

• r+ is one or more r’s.

• r? is zero or one r.

• r{2,} is two or more r’s, etc.


Language of a Regular Expression

The language of a regular expression is defined in the usual way on the inductive structure of the definition.

L(ε) = {ε}, L(∅) = ∅, L(a) = {a} for all a ∈ Σ,
L(r|s) = L(r) ∪ L(s), L(rs) = L(r)L(s),
L(r∗) = L(r)∗, L(r?) = L(r) ∪ {ε} etc.


C Identifier

The regular expression for the C identifier is

[_a-zA-Z][_a-zA-Z0-9]*

The first character is an underscore or a letter of the English alphabet. From the second character on, a decimal digit can also be used.
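As a quick check of this pattern, here is a minimal sketch using Python's re module (which is not a POSIX engine, but agrees with it on these operators). The helper name is_c_identifier is hypothetical.

```python
import re

# First character: underscore or letter; afterwards digits are also allowed.
C_IDENT = re.compile(r"[_a-zA-Z][_a-zA-Z0-9]*")


def is_c_identifier(s: str) -> bool:
    # fullmatch requires the whole string to be a lexeme of the pattern.
    return C_IDENT.fullmatch(s) is not None
```

Note that the pattern alone does not separate keywords from identifiers; that is a matter of priority among patterns.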


Regular Definition

We can give a name to a regular expression for convenience of use. The name of a regular expression can be used in the regular expressions that follow its definition.


Examples of a Regular Definition

sign: + | - | ε

digit: [0-9]

digits: {digit}*

frac: \.{digits} | ε

frace: \.{digit}{digits}

expo: ((E | e){sign}{digit}{digit}?) | ε

num: {sign}(({digit}+ {frac} {expo}) | ({frace} {expo}))
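The same regular definition can be mimicked by composing strings, each name being substituted into the definitions that follow it. This is a sketch in Python's re syntax; ε alternatives are written with the ? operator, and the helper is_num is a hypothetical name.

```python
import re

# Each name is defined once and reused in the later definitions.
sign = r"[+-]?"                              # + | - | ε
digit = r"[0-9]"
digits = rf"{digit}*"
frac = rf"(?:\.{digits})?"                   # \.{digits} | ε
frace = rf"\.{digit}{digits}"
expo = rf"(?:[Ee]{sign}{digit}{digit}?)?"    # ((E | e){sign}{digit}{digit}?) | ε
num = rf"{sign}(?:{digit}+{frac}{expo}|{frace}{expo})"

NUM = re.compile(num)


def is_num(s: str) -> bool:
    return NUM.fullmatch(s) is not None
```

With these definitions 3.14, 12, 3. and -.5e10 are accepted, while a lone "." and "1e123" (a three-digit exponent) are not.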


RE to NFA: Thompson’s Construction

We can mechanically construct a non-deterministic finite automaton (NFA) with only one initial and only one final state from a given regular expression. The total number of states of the NFA is linear in the number of symbols of the regular expression.^a

^a The construction is on the inductive structure of the definition of the regular expression.


[Figure: base-case NFAs of Thompson’s construction for ∅, ε and a, ∀a ∈ Σ; s marks a start state and f a final state.]


(r|s) and (rs)

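[Figure: NFAs for (r|s) and (rs), built from N(r) and N(s) (start states s1, s2 and final states f1, f2) with a new start state s, a new final state f and ε-transitions.]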


Kleene Closure: s∗

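[Figure: NFA for the Kleene closure s∗, built from N(s) (start state s1, final state f1) with a new start state s, a new final state f and ε-transitions.]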


Properties of Thompson’s Construction

• |Q| ≤ 2 · length(r), where Q is the set of states of the NFA and length(r) is the number of alphabet and operator symbols in r.

• Only one initial and one final state. No incoming edge to the initial state and no outgoing edge from the final state.


Properties of Thompson’s Construction

• At most one incoming and one outgoing transition on a symbol of the alphabet. At most two incoming and two outgoing ε-transitions.


a+ (ab)∗ - An Example

[Figure: NFA for a + (ab)∗ with states 0, 1, · · · , 9, transitions on a and b, and ε-transitions.]


Context-Free Grammar of RE

The set of regular expressions can be specified by a context-free grammar.

E → ∅ | ε | σ, ∀σ ∈ Σ
  → E.E | E + E | E∗ | (E)

We have put a ‘.’ for concatenation to make it an operator grammar and have replaced ‘|’ by ‘+’ for clarity.^a

^a This ambiguous grammar can be used with proper precedence and associativity rules.


Syntax Directed Thompson’s Construction

Rules of Thompson’s construction can be associated with the non-terminals of the production rules of the grammar. This is known as an attribute grammar. We assume the following data structures.


Syntax Directed Thompson’s Construction

• Global state counter S initialized to 0, and the state transition table: T[][].

• With every occurrence of the non-terminal E we associate two attributes E.ini and E.fin to store the initial and the final states of the NFA corresponding to the regular expression generated by this occurrence of E.


Some of the Rules: Basis

E → ε: {T[S][ε] = S+1; E.ini = S; S = S+1; E.fin = S; S = S+1;}

E → a: {T[S][a] = S+1; E.ini = S; S = S+1; E.fin = S; S = S+1;}

The second rule depends on the symbol of the alphabet.


Concatenation Rules

E → E1.E2: {E.ini = S; S = S+1;
            E.fin = S; S = S+1;
            T[E.ini][ε] = E1.ini;
            T[E1.fin][ε] = E2.ini;
            T[E2.fin][ε] = E.fin;}

Similarly other rules can be derived.


The Final NFA

The states of the final NFA are {0, 1, · · · , S − 1}. The initial state is in E.ini and the final state is in E.fin. The state transitions are in T[][].


a+ (a.b)∗ - An Example

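[Figure: annotated parse tree of a + (a.b)∗ with E → E1 + E2, E2 → E3∗, E3 → (E4), E4 → E5.E6. The attribute values at the nodes are listed below.]

E1 → a: E1.ini=0; E1.fin=1; T[0][a]=1

E5 → a: E5.ini=2; E5.fin=3; T[2][a]=3

E6 → b: E6.ini=4; E6.fin=5; T[4][b]=5

E4 → E5.E6: E4.ini=6; E4.fin=7; T[6][ε]=E5.ini (2); T[E5.fin (3)][ε]=E6.ini (4); T[E6.fin (5)][ε]=7

E3 → (E4): E3.ini=E4.ini; E3.fin=E4.fin

E2 → E3∗: E2.ini=8; E2.fin=9; T[8][ε]=E3.ini; T[E3.fin][ε]=9; T[8][ε]=9; T[E3.fin][ε]=E3.ini

E → E1 + E2: E.ini=10; E.fin=11; T[10][ε]=E1.ini; T[10][ε]=E2.ini; T[E1.fin][ε]=11; T[E2.fin][ε]=11

Q = {0, 1, · · · , 11}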
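The rules can be tried out with a few lines of Python. This is a sketch, not the lecture's code: the function names basis, concat, union and star are hypothetical, T is kept as a dictionary of sets because a state may have two ε-successors, and the union and star rules are read off the a + (a.b)∗ example rather than from an explicit rule in the slides. The functions are called by hand in the order a bottom-up parser would reduce.

```python
from collections import defaultdict

EPS = "ε"
S = 0                  # global state counter
T = defaultdict(set)   # T[(state, symbol)] = set of next states


def new_state() -> int:
    global S
    S += 1
    return S - 1


def basis(sym):
    # E → ε and E → a: two fresh states joined by one transition.
    ini, fin = new_state(), new_state()
    T[(ini, sym)].add(fin)
    return ini, fin


def concat(e1, e2):
    # E → E1.E2 with a fresh initial and a fresh final state.
    ini, fin = new_state(), new_state()
    T[(ini, EPS)].add(e1[0])
    T[(e1[1], EPS)].add(e2[0])
    T[(e2[1], EPS)].add(fin)
    return ini, fin


def union(e1, e2):
    # E → E1 + E2
    ini, fin = new_state(), new_state()
    T[(ini, EPS)].update({e1[0], e2[0]})
    T[(e1[1], EPS)].add(fin)
    T[(e2[1], EPS)].add(fin)
    return ini, fin


def star(e1):
    # E → E1∗: skip N(E1) entirely, or loop back from its final state.
    ini, fin = new_state(), new_state()
    T[(ini, EPS)].update({e1[0], fin})
    T[(e1[1], EPS)].update({e1[0], fin})
    return ini, fin


# a + (a.b)∗; a parenthesised E simply copies ini and fin.
E1 = basis("a")
E4 = concat(basis("a"), basis("b"))
E = union(E1, star(E4))
```

Built this way, the initial and final states come out as 10 and 11 and S ends at 12, which agrees with Q = {0, 1, · · · , 11}.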


NFA to DFA

Let the constructed ε-NFA be (N, Σ, δn, n0, nF). By taking ε-closure of states and doing the subset construction we can get an equivalent DFA (Q, Σ, δd, q0, QF).


Algorithm

q0 = ε-closure({n0})
Q = L = {q0}
while (L ≠ ∅)
    q = removeElm(L)
    for all σ ∈ Σ
        t = ε-closure(δn(q, σ))
        T[q][σ] = t
        if t ∉ Q
            Q = Q ∪ {t}
            L = L ∪ {t}


ε-closure(T )

for all q ∈ T push(St, q)
εT = T
while (isEmpty(St) == false)
    t = pop(St)
    for all u ∈ δ(t, ε)
        if u ∉ εT
            εT = εT ∪ {u}
            push(St, u)


Note

The time complexity of the ε-closure algorithm for each state is O(|M|) = O(|N| + |δ|).


Final State of the DFA

The set of final states of the equivalent DFA is QF = {q ∈ Q : nF ∈ q}. It is to be noted that different final states will recognize different tokens. It is also possible that one final state identifies more than one token.


Time Complexity of Subset Construction

The size of Q is O(2^|N|) and so the time complexity is also O(2^|N|), where N is the set of states of the NFA. But this is a one-time construction.


a+ (ab)∗ - NFA to DFA

The state transition table of the DFA is

Initial State             Final State on a   Final State on b
A : {0, 2, 6, 7, 8, 9}    {1, 3, 4, 9}       ∅
B : {1, 3, 4, 9}          ∅                  {2, 5, 7, 9}
C : {2, 5, 7, 9}          {3, 4}             ∅
D : {3, 4}                ∅                  {2, 5, 7, 9}
∅                         ∅                  ∅
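This table can be reproduced with a short Python sketch of the ε-closure and subset construction algorithms. The NFA transitions below are an assumption: the figure of the NFA is not recoverable from the text, so they were reconstructed to be consistent with the state sets in the table (start state 8, final state 9). The names eps_closure, move and subset_construction are hypothetical.

```python
EPS = "ε"
# NFA for a + (ab)∗ (assumed transitions, see the note above).
NFA = {
    (8, EPS): {0, 6}, (0, "a"): {1}, (1, EPS): {9},
    (6, EPS): {2, 7}, (2, "a"): {3}, (3, EPS): {4},
    (4, "b"): {5}, (5, EPS): {2, 7}, (7, EPS): {9},
}
START, FINAL, SIGMA = 8, 9, "ab"


def eps_closure(states):
    # Stack-based closure: every state is pushed at most once.
    closure, stack = set(states), list(states)
    while stack:
        t = stack.pop()
        for u in NFA.get((t, EPS), ()):
            if u not in closure:
                closure.add(u)
                stack.append(u)
    return frozenset(closure)


def move(q, sym):
    # All NFA states reachable from the set q on one symbol.
    return {u for n in q for u in NFA.get((n, sym), ())}


def subset_construction():
    q0 = eps_closure({START})
    Q, L, T = {q0}, [q0], {}
    while L:
        q = L.pop()
        for sym in SIGMA:
            t = eps_closure(move(q, sym))
            T[(q, sym)] = t
            if t not in Q:
                Q.add(t)
                L.append(t)
    return q0, Q, T


q0, Q, T = subset_construction()
finals = {q for q in Q if FINAL in q}   # QF = {q ∈ Q : nF ∈ q}
```

DFA states are frozensets so that they can be used as dictionary keys; the empty frozenset plays the role of ∅.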


a+ (ab)∗ - NFA to DFA

[Figure: state transition diagram of the DFA with states A, B, C, D and ∅, with transitions on a and b.]


Note

It may be advantageous to drop the transitions to ∅ when designing a scanner. This makes the DFA incompletely specified. Absence of a transition from a final state may identify a token.


DFA State Minimization

The constructed DFA may have sets of equivalent states^a and can be minimized. It is to be noted that the time complexity of a scanner of a DFA with a larger number of states is not different from that of a scanner of a DFA having a smaller number of states. Their code sizes are different and that may give rise to some difference in their speeds.

^a Let M = (Q, Σ, δ, s, F) be a DFA. Two states p, q ∈ Q are said to be equivalent if there is no x ∈ Σ∗ such that exactly one of δ(p, x) and δ(q, x) is in F.


DFA State Minimization

We start with two non-equivalent partitions of Q: F and Q \ F.

If p, q belong to the same partition P but there is some σ ∈ Σ such that δ(p, σ) ∈ P1 and δ(q, σ) ∈ P2, where P1 and P2 are two distinct partitions, then p, q cannot remain in the same partition, i.e. they are not equivalent.
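The splitting rule can be sketched as a simple repeated refinement (not an optimized algorithm such as Hopcroft's). The example DFA is hypothetical: it accepts strings over {a, b} ending in ab and has one redundant state, 3, that behaves like state 0. The function name minimize is also an assumption, and the DFA is taken to be completely specified.

```python
def minimize(states, alphabet, delta, finals):
    """Return the set of blocks of equivalent states of a complete DFA."""
    # Initial partition: F and Q \ F (empty blocks are dropped).
    partition = [b for b in (set(finals), set(states) - set(finals)) if b]
    changed = True
    while changed:
        changed = False
        block_of = {q: i for i, blk in enumerate(partition) for q in blk}
        new_partition = []
        for blk in partition:
            # States stay together only if, for every symbol, they move
            # into the same block of the current partition.
            groups = {}
            for q in blk:
                key = tuple(block_of[delta[q][s]] for s in alphabet)
                groups.setdefault(key, set()).add(q)
            new_partition.extend(groups.values())
            if len(groups) > 1:
                changed = True
        partition = new_partition
    return {frozenset(b) for b in partition}


# Hypothetical DFA for strings ending in "ab"; state 3 duplicates state 0.
delta = {
    0: {"a": 1, "b": 0},
    1: {"a": 1, "b": 2},
    2: {"a": 1, "b": 3},
    3: {"a": 1, "b": 0},
}
blocks = minimize({0, 1, 2, 3}, "ab", delta, {2})
```

Here the block {0, 1, 3} is split on b, because 1 moves to the final state while 0 and 3 do not, and states 0 and 3 end up in one block.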


DFA to Scanner

Given a regular expression r we can construct the recognizer of L(r). For every token class or syntactic category of a language we have a regular expression. Let {r1, r2, · · · , rk} be the total collection of all regular expressions of a language. The regular expression r = r1|r2| · · · |rk represents objects of all syntactic categories.


DFA to Scanner

Given the NFAs of r1, r2, · · · , rk we construct the NFA for r = r1|r2| · · · |rk by introducing a new start state and adding ε-transitions from this state to the initial states of the component NFAs. But we keep different final states as they are to identify different token classes.


Final Composite NFA

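[Figure: composite NFA with a new start state s and ε-transitions to the start states sr1, sr2, · · · , srk of N(r1), N(r2), · · · , N(rk); the final states fr1, fr2, · · · , frk are kept separate.]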


DFA to Scanner

The DFA corresponding to r can be constructed from the composite NFA. It can be implemented as a C program that will be used as a scanner of the language. But the following points are to be noted.


Note

A program is not a single word but a stream of words, and the notion of acceptance of a scanner should be different from that of a simple DFA. The following questions are of importance:

• when does the scanner report an acceptance?

• what does it do if the word (lexeme) matches more than one regular expression?


Example

Consider the following subset of C language operators: + ++ += * *= < << <= <<=


State Transition Diagram

[Figure: state transition diagram for the operators + ++ += * *= < << <= <<=, with final states numbered 1 to 9; the final states reached on ‘other’ are marked with $.]


Note

At the final state 1 we know that we have “++”. But we cannot decide whether it is the pre- or the post-increment operator. Though the scanner can take that decision, it is better to delay it for the parser.


Note

At the final state 3 we know that we have ‘+’. But we do not know whether it is binary or unary. Again that decision is deferred. Moreover, the last consumed symbol is not part of the lexeme. It is a look-ahead symbol. We mark such a final state with the number of look-ahead symbols to un-read before going back to the start state. Here we have done that by one $.


Note

• There are situations where there may be more than one look-ahead.

Fortran: DO 10 I = 1, 10 and DO 10 I = 1.10
The first one is a do-loop and the second one is an assignment DO10I=1.10.

PL/I: IF ELSE THEN THEN = ELSE; ELSE ELSE = THEN
IF, THEN and ELSE are not reserved as keywords.


Maximum Word Length Matching

The scanner will go on reading input as long as there is a transition. Let there be no transition for the current state q on the input σ (the machine is incompletely specified). The state q may or may not be final.


q is Final

If the final state q corresponds to only one regular expression ri, the scanner returns the corresponding token.^a But if it matches more than one regular expression then it is necessary to resolve the conflict. This is often done by specifying priority of expressions, e.g. keyword over an identifier.

^a It is necessary to identify the final state with the regular expression ri.


q is not Final

It is possible that while consuming symbols the scanner has crossed one or more final states. The decision may be to report the last final state. But then it is necessary to keep track of the final states and the position of the input.
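A minimal sketch of this bookkeeping, for the operator subset + ++ += * *= < << <= <<= used in the earlier example. The DFA here is an assumption made for brevity: it is a trie whose states are named by the string consumed so far, and next_operator is a hypothetical helper. The scanner keeps reading while a transition exists and remembers the last final state together with the input position at which it was seen.

```python
OPERATORS = ["+", "++", "+=", "*", "*=", "<", "<<", "<=", "<<="]

# Trie-shaped DFA: the state is the prefix read so far, "" is the start state.
DELTA = {}
for op in OPERATORS:
    for i in range(len(op)):
        DELTA[(op[:i], op[i])] = op[:i + 1]
FINAL = set(OPERATORS)


def next_operator(text, pos):
    """Return (lexeme, new_pos) for the longest operator at text[pos:], or None."""
    state, i = "", pos
    last_final = None   # (final state, input position) most recently crossed
    while i < len(text) and (state, text[i]) in DELTA:
        state = DELTA[(state, text[i])]
        i += 1
        if state in FINAL:
            last_final = (state, i)
    return last_final
```

On the input +++ this yields ++ first and then +, which is the maximum word length rule at work.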


Components of a Scanner

1. The transition table of the DFA or NFA.

2. Set of actions corresponding to the final states.

3. Other essential functions.


Maximum Prefix on NFA

1. Read input and keep track of the sequence of the sets of states. Stop when no more transition is possible (maximum prefix).

2. Trace back to the last set of states with a final state.

3. Push back the look-ahead symbols in the buffer and emit the appropriate token along with attribute value(s).


Note

It is possible that the last set of states has more than one final state corresponding to different patterns. Take the action corresponding to the pattern with the highest priority.^a

^a A pattern specified earlier may have higher priority.


From DFA to Code

Three possible implementations of DFA -

• table driven,

• direct coded,

• hand coded.


Table Driven Scanner

There is a driver code and a set of tables. The driver code essentially has four parts:

• Initialization,

• Main scanner loop,

• Roll-back loop,

• Token or error return.


Initialization

current state <-- start state
lexeme <-- Nil
push(stack, $)


Main Scanner Loop

while current state not = error state
    lexeme <-- lexeme + (c = getchar())
    if current state is an accept state
        clear(stack)
    push(stack, current state)
    sym <-- translate[c]
    current state <-- delta(current state, sym)


Roll Back Loop

while current state is not a final state and stack is not empty
    current state <-- pop(stack)
    unget() last symbol of lexeme


Token or Error

if final state
    return token[state]
else Error


Tables

• translate[] converts a character to a DFA symbol (reduces the size of the alphabet).

• delta[] is the state transition table.

• token[] has the token values corresponding to the final states.


Note

At times roll-back may be costly - consider the language ab|(ab)∗c and the input ababababab$. There will be roll-back of 8 + 6 + 4 + 2 = 20 characters.
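The count of 20 can be checked with a small table-driven driver. This is a sketch under stated assumptions: the DFA states and the token names AB and AB_STAR_C are made up for this example, the end of the input string plays the role of $, and an empty stack is used instead of a $ marker at the bottom of the stack.

```python
ERR = -1
translate = {"a": 0, "b": 1, "c": 2}          # character -> DFA symbol
delta = {                                     # hand-built DFA for ab|(ab)*c
    0: [1, ERR, 4], 1: [ERR, 2, ERR], 2: [3, ERR, 4],
    3: [ERR, 5, ERR], 4: [ERR, ERR, ERR], 5: [3, ERR, 4],
}
token = {2: "AB", 4: "AB_STAR_C"}             # final state -> token


def next_token(text, pos):
    """Return (token, new_pos, number of characters rolled back)."""
    state, start, stack = 0, pos, []
    # Main scanner loop: read as long as a transition exists.
    while state != ERR and pos < len(text):
        c = text[pos]
        pos += 1
        if state in token:
            stack.clear()          # nothing below a final state is needed
        stack.append(state)
        sym = translate.get(c)
        state = ERR if sym is None else delta[state][sym]
    # Roll-back loop: un-read characters until a final state is found.
    rolled = 0
    while state not in token and stack:
        state = stack.pop()
        pos -= 1
        rolled += 1
    if state not in token:
        raise ValueError(f"lexical error at position {start}")
    return token[state], pos, rolled


def scan(text):
    pos, tokens, total = 0, [], 0
    while pos < len(text):
        tok, pos, rolled = next_token(text, pos)
        tokens.append(tok)
        total += rolled
    return tokens, total
```

For the input ababababab the driver reads to the end each time while hoping for a c, then rolls back to the first ab, so the roll-backs are 8, 6, 4, 2 and finally 0.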


Direct Coded Scanner

• Each state is implemented as a fragment of code.

• It eliminates memory reference for transition table access.


Code Corresponding to a State

• Code is labelled by the state name.

• Read a character and append it to lexeme.

• Update the roll-back stack.

• Go to next appropriate state - a valid transition, roll-back and token return state etc.


Reading Characters: Input Buffer

A scanner or lexical analyzer reads the input character by character. The process will be very inefficient if it sends a request to the OS for every character read.


Input Buffer

• The OS reads a block of data, supplies the requesting process the required amount, and stores the remaining portion in a buffer called the buffer cache. In subsequent calls, the actual I/O does not take place as long as the data is available in the buffer.

• Requesting the OS for a single character is also costly due to context-switching overhead. So the scanner uses its own buffer.


Input Buffer

• A buffer at its end may contain an initial portion of a lexeme. It creates a problem in refilling the buffer. So a 2-buffer scheme is used. The buffers are filled alternately.

• A sentinel character is placed at the end-of-buffer to avoid two comparisons - character and end-of-buffer.

• We may run out of buffer space for a long character string or a comment.
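A rough sketch of the two-buffer scheme with sentinels follows. It is only illustrative: the class name TwoBufferReader, the tiny half size and the choice of "\0" as sentinel (assumed never to occur in the source text) are assumptions, and un-reading characters across a refill is not handled.

```python
import io

SENTINEL = "\0"   # assumed never to occur in the source text


class TwoBufferReader:
    def __init__(self, stream, size=4):
        self.stream, self.size = stream, size
        # Two halves of `size` characters, each followed by a sentinel slot.
        self.buf = [SENTINEL] * (2 * (size + 1))
        self.pos = 0
        self._fill(0)

    def _fill(self, half):
        base = half * (self.size + 1)
        data = self.stream.read(self.size)
        for i, ch in enumerate(data):
            self.buf[base + i] = ch
        # A full read puts the sentinel at the end of the half,
        # a short read puts it right after the last character (end of input).
        self.buf[base + len(data)] = SENTINEL

    def next_char(self):
        ch = self.buf[self.pos]
        if ch != SENTINEL:                  # common case: one comparison
            self.pos += 1
            return ch
        if self.pos == self.size:           # end of first half: load second
            self._fill(1)
            self.pos = self.size + 1
            return self.next_char()
        if self.pos == 2 * self.size + 1:   # end of second half: load first
            self._fill(0)
            self.pos = 0
            return self.next_char()
        return ""                           # sentinel inside a half: end of input
```

Only when the sentinel is seen does the reader pay for the extra checks that decide between a refill and the end of the input.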


Direct DFA Construction from a Regular Expression

A deterministic finite automaton can be constructed directly from the given regular expression.


Important States: a Definition

• All initial states of an NFA are important.

• Any other state p of an NFA is called important if p has an out-transition on some a ∈ Σ.

• The ε-closures of the important states of an NFA are used to calculate the next states of the equivalent DFA.


Important States

• Important states are introduced in the NFA by the start state and by each symbol position of the regular expression.

• In our example, a + (ab)∗, the important states are 8, 0, 2, 4.


a+ (ab)∗ - An Example

[Figure: NFA with ε-transitions for a + (ab)∗, with states 0 to 9 and transitions labelled a, b and ε.]


End Marker and Final State

We introduce a special end marker # ∉ Σ to the regular expression, r → (r)#. This makes the final state(s) of the original NFA important. It also helps to detect the final state(s) (a state that has a transition on #).


Syntax Tree of a Regular Expression

The regular expression is represented by a syntax tree where each leaf node corresponds to a symbol of the alphabet, a ∈ Σ, or ε. Each internal node corresponds to an operator symbol.


Syntax Tree of a+ (ab)∗#

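[Figure: syntax tree of (a + (ab)∗)#. The root is a concatenation node with children + and #. The + node has children a and ∗. The ∗ node has one child, a concatenation node with leaves a and b.]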


Labelling the Leaf Nodes

• We associate a positive integer p with each leaf node labelled by a symbol a ∈ Σ (not with leaves labelled ε). The positive integer p is called the position of the symbol of the leaf node.

• Following are a few definitions where n is a node and p is a position.


Definitions

• nullable(n): A node n is nullable if the language of its subexpression contains ε.

• firstpos(n): The set of positions in the subtree of n that can supply the first symbol of some string in the language of the subexpression of n.


DFA directly from Regular Expression

• lastpos(n): Similar to firstpos(n), except that these are the positions that can supply the last symbol.

• followpos(p): The set of positions in the syntax tree that can supply the symbol immediately following the symbol at position p in some string of L((r)#).


Computation of nullable(n)

n is a

• leaf node with label ε: true.

• leaf node with label a ∈ Σ: false.

• internal node of the form n1 + n2: nullable(n1) ∨ nullable(n2).

• internal node of the form n1 ◦ n2: nullable(n1) ∧ nullable(n2).

• internal node of the form n1∗: true.


Computation of firstpos(n)

n is a

• leaf node with label ε: ∅.

• leaf node with label a ∈ Σ at position p: {p}.

• internal node of the form n1 + n2: firstpos(n1) ∪ firstpos(n2).

• internal node of the form n1 ◦ n2: if nullable(n1), then firstpos(n1) ∪ firstpos(n2), else firstpos(n1).

• internal node of the form n1∗: firstpos(n1).


Computation of lastpos(n)

n is a

• leaf node with label ε: ∅.

• leaf node with label a ∈ Σ at position p: {p}.

• internal node of the form n1 + n2: lastpos(n1) ∪ lastpos(n2).

• internal node of the form n1 ◦ n2: if nullable(n2), then lastpos(n1) ∪ lastpos(n2), else lastpos(n2).

• internal node of the form n1∗: lastpos(n1).
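
The three recursive definitions can be written down almost literally. The following is a minimal sketch, assuming a hypothetical `Node` class with kinds `'leaf'`, `'eps'`, `'or'`, `'cat'` and `'star'`; the names are illustrative only. The tree built at the end is the running example (a + (ab)∗)# with positions 1 to 4.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Syntax-tree node; `kind` is 'leaf', 'eps', 'or', 'cat' or 'star'."""
    kind: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    symbol: str = ""   # label of a leaf
    pos: int = 0       # position of a leaf (unused for other kinds)


def nullable(n):
    if n.kind == "eps":
        return True
    if n.kind == "leaf":
        return False
    if n.kind == "or":
        return nullable(n.left) or nullable(n.right)
    if n.kind == "cat":
        return nullable(n.left) and nullable(n.right)
    return True  # star


def firstpos(n):
    if n.kind == "eps":
        return set()
    if n.kind == "leaf":
        return {n.pos}
    if n.kind == "or":
        return firstpos(n.left) | firstpos(n.right)
    if n.kind == "cat":
        if nullable(n.left):
            return firstpos(n.left) | firstpos(n.right)
        return firstpos(n.left)
    return firstpos(n.left)  # star


def lastpos(n):
    if n.kind == "eps":
        return set()
    if n.kind == "leaf":
        return {n.pos}
    if n.kind == "or":
        return lastpos(n.left) | lastpos(n.right)
    if n.kind == "cat":
        if nullable(n.right):
            return lastpos(n.left) | lastpos(n.right)
        return lastpos(n.right)
    return lastpos(n.left)  # star


def leaf(symbol, pos):
    return Node("leaf", symbol=symbol, pos=pos)


# Running example: (a + (ab)*)# with positions a=1, a=2, b=3, #=4.
star = Node("star", Node("cat", leaf("a", 2), leaf("b", 3)))
union = Node("or", leaf("a", 1), star)
root = Node("cat", union, leaf("#", 4))
```

The sketch recomputes nullable on every call for clarity; a practical implementation would store the three values in each node during a single bottom-up pass.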


Example

In our example there are two nullable nodes, the ‘+’ and the ‘∗’ nodes. We decorate the syntax tree with firstpos() and lastpos() data.


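[Figure: syntax tree of (a + (ab)∗)# with each node annotated by (firstpos, lastpos). The annotations are listed below.]

Node                       firstpos     lastpos
leaf a (position 1)        {1}          {1}
leaf a (position 2)        {2}          {2}
leaf b (position 3)        {3}          {3}
concatenation ab           {2}          {3}
star (ab)∗                 {2}          {3}
union a + (ab)∗            {1, 2}       {1, 3}
leaf # (position 4)        {4}          {4}
root (a + (ab)∗)#          {1, 2, 4}    {4}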


Computation of followpos(p)

Given a regular expression r, the symbol at one position can be followed by the symbol at another position in a string of L(r) in two different ways.

• If n is a concatenation node n1 ◦ n2 of the syntax tree, then for each position p in lastpos(n1), each position q of firstpos(n2) is in followpos(p).


Computation of followpos(p)

• If n is a Kleene-star node of the syntax tree, then for each position p in lastpos(n), each position q of firstpos(n) is in followpos(p).
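
The two rules can be applied in a single traversal of the syntax tree. The sketch below is one possible arrangement, not necessarily how the lecture implements it: the tuple encoding of nodes and the function name `analyse` are assumptions made for illustration. It returns (nullable, firstpos, lastpos) for each node and records followpos edges at concatenation and star nodes.

```python
from collections import defaultdict


def analyse(node, follow):
    """Return (nullable, firstpos, lastpos) of `node`, adding edges to `follow`.

    Hypothetical tuple encoding of the tree:
    ("leaf", pos), ("eps",), ("or", l, r), ("cat", l, r), ("star", child).
    """
    kind = node[0]
    if kind == "eps":
        return True, set(), set()
    if kind == "leaf":
        return False, {node[1]}, {node[1]}
    if kind == "star":
        _, first, last = analyse(node[1], follow)
        for p in last:               # star rule: lastpos(n) -> firstpos(n)
            follow[p] |= first
        return True, first, last
    n1, f1, l1 = analyse(node[1], follow)
    n2, f2, l2 = analyse(node[2], follow)
    if kind == "or":
        return n1 or n2, f1 | f2, l1 | l2
    for p in l1:                     # concatenation rule: lastpos(n1) -> firstpos(n2)
        follow[p] |= f2
    first = f1 | f2 if n1 else f1
    last = l1 | l2 if n2 else l2
    return n1 and n2, first, last


# Running example: (a + (ab)*)# with positions a=1, a=2, b=3, #=4.
tree = ("cat",
        ("or", ("leaf", 1), ("star", ("cat", ("leaf", 2), ("leaf", 3)))),
        ("leaf", 4))
follow = defaultdict(set)
_, first_root, _ = analyse(tree, follow)
followpos = {p: set(follow[p]) for p in range(1, 5)}
```

For the running example this yields followpos(1) = {4}, followpos(2) = {3}, followpos(3) = {2, 4} and followpos(4) = ∅.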


Example

In our example,

• from the concatenation nodes we get 3 ∈ followpos(2), 4 ∈ followpos(1) and 4 ∈ followpos(3).

• from the Kleene-star node we get 2 ∈ followpos(3).


Example

The following table summarises followpos() for the different positions.

Position p    followpos(p)
1             {4}
2             {3}
3             {2, 4}
4             ∅


Directed Graph of followpos()

• Each position p is represented by a node.

• There is a directed edge from a position p to a position q if q ∈ followpos(p).


Directed Graph of the Example

[Figure: directed graph on the positions 1 to 4 with edges 1 → 4, 2 → 3, 3 → 2 and 3 → 4.]


Directed Graph to NFA

This directed graph is actually an NFA without ε-transitions.

• All positions in firstpos(root) are initial states.

• A transition p → q is labelled by the symbol at position p.

• The node corresponding to the position of # is the accepting state.
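
To see that this graph really behaves as an NFA, it can be simulated directly on input strings. The sketch below hard-codes the data of the running example (a + (ab)∗)#; the names `SYMBOL`, `FOLLOWPOS`, `INITIAL`, `ACCEPTING` and `accepts` are illustrative assumptions.

```python
# Data of the running example (a + (ab)*)#.
SYMBOL = {1: "a", 2: "a", 3: "b", 4: "#"}          # symbol at each position
FOLLOWPOS = {1: {4}, 2: {3}, 3: {2, 4}, 4: set()}  # edges of the directed graph
INITIAL = {1, 2, 4}                                # firstpos(root)
ACCEPTING = 4                                      # position of #


def accepts(text):
    """Simulate the position NFA on `text` and report acceptance."""
    current = set(INITIAL)
    for ch in text:
        nxt = set()
        for p in current:
            # An edge p -> q is labelled by the symbol at position p.
            if SYMBOL[p] == ch:
                nxt |= FOLLOWPOS[p]
        current = nxt
    return ACCEPTING in current


for word in ["", "a", "ab", "abab", "aba", "b", "aa"]:
    print(repr(word), accepts(word))
```

The strings "", "a", "ab" and "abab" belong to L(a + (ab)∗) and are accepted, while "aba", "b" and "aa" are rejected.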


Directed Graph to NFA: the Example

[Figure: the NFA on states 1 to 4 with transitions 1 → 4 on a, 2 → 3 on a, 3 → 2 on b and 3 → 4 on b.]


DFA from Regular Expression - Direct Construction

Input: A regular expression r over Σ.

Output: A DFA M = (Q, Σ, s, F, δ).

Algorithm:

1. Construct a syntax tree T corresponding to the augmented regular expression (r)#, where # ∉ Σ.


DFA from Regular Expression - Directly

2. Compute nullable, firstpos, lastpos and followpos for the syntax tree T.

3. The construction of M is as follows: the states Q of M are subsets of the positions of T. The start state is s = firstpos(root(T)). The final states are all the subsets in Q containing the position of #.


Construction of δ

Q ← {firstpos(root(T))}
tag[firstpos(root(T))] ← 0
while there is an α ∈ Q with tag[α] = 0 do
    tag[α] ← 1
    for each a ∈ Σ do
        β ← union of followpos(p) over all positions p ∈ α whose symbol is a
        if β ∉ Q then
            tag[β] ← 0
            Q ← Q ∪ {β}
        δ(α, a) ← β
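
The loop above can be sketched in Python as a worklist algorithm, where the worklist plays the role of the states with tag 0. The followpos table and symbol map are copied from the running example; the function name `build_dfa` and the use of `frozenset` for DFA states are assumptions made for illustration.

```python
from collections import deque

# Data of the running example (a + (ab)*)#.
SYMBOL = {1: "a", 2: "a", 3: "b", 4: "#"}
FOLLOWPOS = {1: {4}, 2: {3}, 3: {2, 4}, 4: set()}
SIGMA = ["a", "b"]


def build_dfa(start, symbol, followpos, sigma):
    """Return (states, delta) of the DFA built from the followpos table."""
    start = frozenset(start)
    states = [start]             # Q, in order of discovery
    delta = {}
    pending = deque([start])     # states whose tag is still 0
    while pending:
        alpha = pending.popleft()            # tag[alpha] <- 1
        for a in sigma:
            # Union of followpos(p) over positions p in alpha carrying symbol a.
            beta = frozenset(q for p in alpha if symbol[p] == a
                             for q in followpos[p])
            if beta not in states:
                states.append(beta)
                pending.append(beta)
            delta[(alpha, a)] = beta
    return states, delta


states, delta = build_dfa({1, 2, 4}, SYMBOL, FOLLOWPOS, SIGMA)
finals = [s for s in states if 4 in s]       # subsets containing the position of #
```

Note that this sketch, like the pseudocode, also adds the empty set ∅ to Q as a dead state, so it reports five states for the example rather than the four non-empty ones.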


DFA of the Example

The state transition table:

Current state     Next state on a    Next state on b
A : {1, 2, 4}     {3, 4}             ∅
B : {3, 4}        ∅                  {2, 4}
C : {2, 4}        {3}                ∅
D : {3}           ∅                  {2, 4}

Start state: A = {1, 2, 4}. Final states: A = {1, 2, 4}, B = {3, 4}, C = {2, 4}.


DFA State Transition Diagram

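[Figure: state transition diagram with states A = {1, 2, 4}, B = {3, 4}, C = {2, 4} and D = {3}; transitions A → B on a, B → C on b, C → D on a and D → C on b.]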
