Two issues in lexical analysis: specifying tokens (regular expressions)


Upload: mindy

Post on 06-Jan-2016



Page 1: Two issues in lexical analysis Specifying tokens (regular expression)

• Two issues in lexical analysis
  – Specifying tokens (regular expressions)
  – Identifying tokens specified by regular expressions

Page 2

• How to recognize tokens specified by regular expressions?
  – A recognizer for a language is a program that takes a string x as input and answers “yes” if x is a sentence of the language and “no” otherwise.

• In the context of lexical analysis, given a string and a regular expression, a recognizer of the language specified by the regular expression answers “yes” if the string is in the language and “no” otherwise.

• A regular expression can be compiled into a recognizer (automatically) by constructing a finite automaton, which can be deterministic or non-deterministic.

Page 3

• Non-deterministic finite automata (NFA)
  – A non-deterministic finite automaton (NFA) is a mathematical model that consists of a 5-tuple (Q, Σ, δ, q0, F):
    • a set of states Q
    • a set of input symbols Σ
    • a transition function δ that maps state-symbol pairs to sets of states
    • a state q0 that is distinguished as the start (initial) state
    • a set of states F distinguished as accepting (final) states
  – An NFA accepts an input string x if and only if there is some path in the transition graph from the start state to some accepting state whose edge labels spell out x.
  – Show an NFA example (page 116, Figure 3.21).
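The 5-tuple and the "some path exists" acceptance rule can be sketched directly in Python. This is only an illustration: the `NFA` class, the `has_accepting_path` function, the use of `""` for the empty string, and the tiny two-state automaton are all assumptions made here, not material from the slides.

```python
from dataclasses import dataclass

@dataclass
class NFA:
    states: set      # Q
    alphabet: set    # Σ
    delta: dict      # δ: (state, symbol) -> set of states; "" stands for the empty string
    start: int       # q0
    finals: set      # F

def has_accepting_path(nfa, x):
    """Search for any path from the start state to an accepting state that spells x."""
    stack = [(nfa.start, 0)]   # (current state, number of characters consumed)
    visited = set()            # guards against looping on empty-string cycles
    while stack:
        state, pos = stack.pop()
        if (state, pos) in visited:
            continue
        visited.add((state, pos))
        if pos == len(x) and state in nfa.finals:
            return True
        # Follow empty-string edges without consuming input.
        for nxt in nfa.delta.get((state, ""), ()):
            stack.append((nxt, pos))
        # Follow edges labeled with the next input character.
        if pos < len(x):
            for nxt in nfa.delta.get((state, x[pos]), ()):
                stack.append((nxt, pos + 1))
    return False

# Hypothetical NFA for one or more a's: state 0 may stay put or jump to state 1 on 'a'.
one_or_more_a = NFA(states={0, 1}, alphabet={"a"},
                    delta={(0, "a"): {0, 1}}, start=0, finals={1})

print(has_accepting_path(one_or_more_a, "aaa"))
print(has_accepting_path(one_or_more_a, ""))
```

The search explores (state, position) pairs, so it terminates even when several transitions share a label or empty-string edges form a cycle.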

Page 4

• An NFA is non-deterministic in that (1) the same character can label two or more transitions out of one state, and (2) the empty string can label transitions.

• For example, here is an NFA that recognizes the language ???.

• An NFA can easily be implemented using a transition table.

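State   a        b
0       {0, 1}   {0}
1       -        {2}
2       -        {3}

[Transition diagram: states 0, 1, 2, 3 with edges 0 → 1 on a, 1 → 2 on b, 2 → 3 on b, and self-loops on state 0 labeled a and b]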
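The transition table maps naturally onto a dictionary keyed by (state, symbol). A minimal Python sketch follows; the names `NFA_TABLE` and `move` are chosen here for illustration, and the "-" cells are treated as absent entries that read as the empty set.

```python
# Transition table from the slide, keyed by (state, symbol).
# Cells shown as "-" are simply absent and are read as the empty set.
NFA_TABLE = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}

def move(states, symbol):
    """Return every state reachable from `states` by one transition on `symbol`."""
    result = set()
    for s in states:
        result |= NFA_TABLE.get((s, symbol), set())
    return result

print(move({0}, "a"))      # entry for state 0 on 'a'
print(move({0, 1}, "b"))   # union of the entries for states 0 and 1 on 'b'
```

Storing sets as the table values is what distinguishes this from a DFA table, where each cell would hold a single state.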

Page 5

• The algorithm that recognizes the language accepted by an NFA
  – Input: an NFA (transition table) and a string x (terminated by eof).
  – Output: “yes” if accepted, “no” otherwise.

    S := e-closure({s0});
    a := nextchar;
    while a != eof do begin
        S := e-closure(move(S, a));
        a := nextchar;
    end
    if (intersect(S, F) != empty) then return “yes”
    else return “no”

Note: e-closure(S) is the set of states that can be reached from the states in S through transitions labeled by the empty string (including the states in S themselves). move(S, a) is the set of states reachable from some state in S by one transition labeled a.
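The pseudocode translates almost line for line into Python. The sketch below is one possible rendering, with some assumptions: `""` marks empty-string transitions, the end of the Python string plays the role of eof, and the accepting set {3} for the earlier table is assumed because the diagram's accepting-state marking did not survive in the transcript.

```python
EPS = ""  # assumed marker for empty-string transitions

def e_closure(states, delta):
    """States reachable from `states` using only empty-string transitions."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in delta.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure

def move(states, symbol, delta):
    """States reachable from `states` by one transition labeled `symbol`."""
    result = set()
    for s in states:
        result |= set(delta.get((s, symbol), ()))
    return result

def nfa_accepts(delta, start, finals, x):
    """Simulate the NFA on x by tracking the set of states it could be in."""
    current = e_closure({start}, delta)
    for a in x:  # running off the end of x corresponds to reading eof
        current = e_closure(move(current, a, delta), delta)
    return bool(current & finals)

# Transition table from the earlier slide; accepting set {3} is an assumption.
DELTA = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}, (2, "b"): {3}}

print(nfa_accepts(DELTA, 0, {3}, "ababb"))  # True
print(nfa_accepts(DELTA, 0, {3}, "abab"))   # False
```

On "ababb" the tracked set evolves as {0}, {0, 1}, {0, 2}, {0, 1}, {0, 2}, {0, 3}, and the final set contains state 3.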

Page 6

– Example: recognizing ababb from previous NFA

– Example 2: Use the example in Fig. 3.27 for recognizing ababb

Space complexity O(|S|), time complexity O(|S|^2|x|)??

Page 7

• Construct an NFA from a regular expression:
  – Input: a regular expression r over an alphabet Σ
  – Output: an NFA N accepting L(r)
  – Algorithm (3.3, page 122):
    • For ε, construct the NFA [diagram not preserved]
    • For a in Σ, construct the NFA [diagram not preserved: a single edge labeled a]
    • Let N(s) and N(t) be NFAs for regular expressions s and t:
      – For s|t, construct the NFA N(s|t): [diagram not preserved: combines N(s) and N(t)]
      – For st, construct the NFA N(st): [diagram not preserved: N(s) followed by N(t)]
      – For s*, construct the NFA N(s*): [diagram not preserved: built around N(s)]
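Since the diagrams did not survive, here is a rough Python sketch of the construction, assuming the algorithm referred to is the standard Thompson construction. Several things are assumptions of this sketch rather than content from the slides: the regular expression is given as a nested tuple (no parser), `""` marks empty-string edges, all class and function names are illustrative, and concatenation links N(s) to N(t) with an empty-string edge instead of merging the two states, which accepts the same language.

```python
import itertools

EPS = ""  # assumed marker for empty-string edges

class ThompsonBuilder:
    """Builds NFA fragments; each fragment is a (start, accept) state pair."""

    def __init__(self):
        self.delta = {}                # (state, symbol) -> set of next states
        self._ids = itertools.count()  # source of fresh state numbers

    def _new(self):
        return next(self._ids)

    def _edge(self, src, symbol, dst):
        self.delta.setdefault((src, symbol), set()).add(dst)

    def build(self, r):
        kind = r[0]
        if kind == "eps":              # the empty string
            i, f = self._new(), self._new()
            self._edge(i, EPS, f)
        elif kind == "sym":            # a single alphabet symbol
            i, f = self._new(), self._new()
            self._edge(i, r[1], f)
        elif kind == "alt":            # s|t: new start and accept around both fragments
            s0, s1 = self.build(r[1])
            t0, t1 = self.build(r[2])
            i, f = self._new(), self._new()
            for src, dst in ((i, s0), (i, t0), (s1, f), (t1, f)):
                self._edge(src, EPS, dst)
        elif kind == "cat":            # st: N(s) followed by N(t)
            s0, s1 = self.build(r[1])
            t0, t1 = self.build(r[2])
            self._edge(s1, EPS, t0)    # simplification: link instead of merging states
            i, f = s0, t1
        elif kind == "star":           # s*: skip N(s) entirely or loop through it
            s0, s1 = self.build(r[1])
            i, f = self._new(), self._new()
            for src, dst in ((i, s0), (i, f), (s1, s0), (s1, f)):
                self._edge(src, EPS, dst)
        else:
            raise ValueError(f"unknown node kind: {kind!r}")
        return i, f

def accepts(delta, start, accept, x):
    """Set-of-states simulation, used here only to check the constructed NFA."""
    def closure(states):
        seen, stack = set(states), list(states)
        while stack:
            for t in delta.get((stack.pop(), EPS), ()):
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
        return seen
    current = closure({start})
    for a in x:
        current = closure({t for s in current for t in delta.get((s, a), ())})
    return accept in current

# (a|b)*abb written as a nested tuple
A, B = ("sym", "a"), ("sym", "b")
REGEX = ("cat", ("cat", ("cat", ("star", ("alt", A, B)), A), B), B)

builder = ThompsonBuilder()
start, accept = builder.build(REGEX)
print(accepts(builder.delta, start, accept, "ababb"))  # True
print(accepts(builder.delta, start, accept, "abab"))   # False
```

Every case adds at most two new states, so the resulting NFA has a number of states linear in the size of the expression.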

Page 8

• Example: r = (a|b)*abb.

• Example: using algorithm 3.3 to construct N( r ) for r = (ab | a)*b* | b.

Page 9

• Using an NFA, we can recognize a token in O(|S|^2 |x|) time. We can improve the time complexity by using a deterministic finite automaton (DFA) instead of an NFA.
  – An NFA is deterministic (a DFA) if
    • it has no transitions on the empty string, and
    • for each state s and input symbol a, there is at most one edge labeled a leaving s.
  – What is the time complexity to recognize a token when a DFA is used?
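For comparison with the NFA simulation, a DFA recognizer only has to track a single current state. The sketch below uses a hand-built four-state DFA for strings over {a, b} that end in abb; the table and the names `DFA_TABLE` and `dfa_accepts` are assumptions for illustration and do not come from the slides.

```python
# Hypothetical DFA for strings over {a, b} ending in "abb".
# Each (state, symbol) pair maps to exactly one next state.
DFA_TABLE = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 3,
    (3, "a"): 1, (3, "b"): 0,
}

def dfa_accepts(table, start, finals, x):
    """Run the DFA on x: one table lookup per input character."""
    state = start
    for a in x:
        state = table.get((state, a))
        if state is None:   # "at most one edge" allows a missing transition
            return False
    return state in finals

print(dfa_accepts(DFA_TABLE, 0, {3}, "ababb"))  # True
print(dfa_accepts(DFA_TABLE, 0, {3}, "abab"))   # False
```

The loop body does a constant amount of work per character, in contrast to the NFA simulation, which recomputes a set of states at every step.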

Page 10

• Algorithm to convert an NFA to a DFA that accepts the same language (algorithm 3.2, page 118)

    initially e-closure(s0) is the only state in Dstates and it is unmarked;
    while there is an unmarked state T in Dstates do begin
        mark T;
        for each input symbol a do begin
            U := e-closure(move(T, a));
            if (U is not in Dstates) then
                add U as an unmarked state to Dstates;
            Dtran[T, a] := U;
        end
    end;

Initial state = e-closure(s0), Final state = ?
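A Python sketch of the conversion, following the pseudocode's Dstates and Dtran, is given below under a few assumptions: DFA states are represented as frozensets of NFA states, `""` marks empty-string transitions, and a worklist of unmarked states stands in for the marking step. The function names are illustrative, and the sketch deliberately returns only the initial state, Dstates, and Dtran, leaving the slide's question about final states open.

```python
EPS = ""  # assumed marker for empty-string transitions

def e_closure(states, delta):
    """States reachable from `states` using only empty-string transitions."""
    closure, stack = set(states), list(states)
    while stack:
        for t in delta.get((stack.pop(), EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure

def move(states, symbol, delta):
    """States reachable from `states` by one transition labeled `symbol`."""
    return {t for s in states for t in delta.get((s, symbol), ())}

def subset_construction(delta, s0, alphabet):
    """Return (initial DFA state, Dstates, Dtran) for the given NFA."""
    initial = frozenset(e_closure({s0}, delta))
    dstates = {initial}      # every DFA state discovered so far
    unmarked = [initial]     # discovered but not yet processed
    dtran = {}
    while unmarked:
        T = unmarked.pop()   # taking T off the worklist marks it
        for a in alphabet:
            U = frozenset(e_closure(move(T, a, delta), delta))
            if U not in dstates:
                dstates.add(U)
                unmarked.append(U)
            dtran[(T, a)] = U
    return initial, dstates, dtran

# NFA transition table from the earlier slide
DELTA = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}, (2, "b"): {3}}

initial, dstates, dtran = subset_construction(DELTA, 0, "ab")
for (T, a), U in sorted(dtran.items(), key=lambda kv: (sorted(kv[0][0]), kv[0][1])):
    print(sorted(T), a, "->", sorted(U))
```

As in the pseudocode, an empty U would be added as an ordinary DFA state; it acts as a dead state from which no input can lead to acceptance.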

Page 11

• Example: page 120, fig 3.27.

• Question: for an NFA with |S| states, at most how many states can its corresponding DFA have?