Two Issues in Lexical Analysis: Specifying Tokens (Regular Expressions)
• Two issues in lexical analysis
  – Specifying tokens (regular expressions)
  – Identifying tokens specified by regular expressions
• How to recognize tokens specified by regular expressions?
  – A recognizer for a language is a program that takes a string x as input and answers “yes” if x is a sentence of the language and “no” otherwise.
• In the context of lexical analysis, given a string and a regular expression, a recognizer for the language specified by the regular expression answers “yes” if the string is in that language and “no” otherwise.
• A regular expression can be compiled into a recognizer automatically by constructing a finite automaton, which can be either deterministic or non-deterministic.
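As a small illustration of the yes/no interface of a recognizer, here is a sketch using Python's standard `re` module. Python's `re` is a backtracking matcher rather than a compiled finite automaton, so this shows only what a recognizer answers, not how the automata below compute the answer; the function name `recognize` is hypothetical.

```python
import re

def recognize(pattern: str, x: str) -> str:
    """Answer "yes" if the whole string x is in the language of pattern, else "no"."""
    # re.fullmatch succeeds only when the ENTIRE string matches, which is the
    # membership question a recognizer answers (re.search would also accept
    # a match of just some substring).
    return "yes" if re.fullmatch(pattern, x) else "no"

if __name__ == "__main__":
    for s in ["abb", "ababb", "abab", ""]:
        print(repr(s), recognize(r"(a|b)*abb", s))
```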
• Non-deterministic finite automata (NFA)
  – A non-deterministic finite automaton (NFA) is a mathematical model that consists of a 5-tuple $(Q, \Sigma, \delta, q_0, F)$:
    • a set of states Q
    • a set of input symbols Σ (the input alphabet)
    • a transition function δ that maps state-symbol pairs to sets of states, where the symbol may also be the empty string ε: $\delta: Q \times (\Sigma \cup \{\epsilon\}) \to 2^{Q}$
    • a state q0 in Q that is distinguished as the start (initial) state
    • a set of states F ⊆ Q distinguished as accepting (final) states.
  – An NFA accepts an input string x if and only if there is some path in the transition graph from the start state to some accepting state such that the labels along the path spell out x.
  – Show an NFA example (page 116, Figure 3.21).
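The 5-tuple can be written down directly as data. The sketch below is a minimal Python version under two assumptions: the NFA has no ε-transitions (which keeps the path search finite), and in the four-state example NFA from the transition table in this section, state 3 is the only accepting state. `NFA` and `accepts_by_path` are hypothetical names; the function checks the acceptance condition literally, by searching for a path whose labels spell x.

```python
from dataclasses import dataclass

@dataclass
class NFA:
    """An epsilon-free NFA given as the 5-tuple (Q, Sigma, delta, q0, F)."""
    states: set      # Q
    alphabet: set    # Sigma
    delta: dict      # (state, symbol) -> set of states; a missing key means the empty set
    start: int       # q0
    accepting: set   # F

def accepts_by_path(nfa: NFA, x: str) -> bool:
    """Return True iff some path from q0 whose labels spell x ends in a state of F.

    Depth-first search over paths, one input symbol per edge; this follows the
    definition literally and can take exponential time in the worst case.
    """
    def search(state: int, i: int) -> bool:
        if i == len(x):                      # all of x has been consumed
            return state in nfa.accepting
        return any(search(nxt, i + 1)
                   for nxt in nfa.delta.get((state, x[i]), set()))
    return search(nfa.start, 0)

# The example NFA from the transition table (assumption: F = {3}).
example = NFA(
    states={0, 1, 2, 3},
    alphabet={"a", "b"},
    delta={(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}, (2, "b"): {3}},
    start=0,
    accepting={3},
)

print(accepts_by_path(example, "ababb"))  # True
print(accepts_by_path(example, "abab"))   # False
```

Because it may try every path, this search can blow up on long inputs; the e-closure/move algorithm later in this section avoids that by tracking the set of all states reachable so far.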
• An NFA is non-deterministic in that (1) the same character can label two or more transitions out of one state, and (2) the empty string ε can label transitions.
• For example, here is an NFA. Which language does it recognize?
• An NFA can easily be implemented using a transition table:
State   a        b
0       {0, 1}   {0}
1       -        {2}
2       -        {3}
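(Figure: transition graph of the same NFA with states 0, 1, 2, 3; state 0 has self-loops on a and b, and the path 0 -a-> 1 -b-> 2 -b-> 3 leads to state 3.)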
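The transition table above maps naturally onto a nested dictionary. In this hypothetical Python sketch, a “-” entry is simply a missing key, `None` is an assumed label for ε-transitions (this NFA has none), and `nondeterministic_entries` reports the table entries that make the automaton non-deterministic in the sense described earlier: an ε label, or more than one target state.

```python
EPS = None  # assumed label for epsilon transitions (the example NFA has none)

# state -> {input symbol: set of next states}; a "-" entry is just a missing key.
# State 3 has no outgoing transitions in the figure.
TABLE = {
    0: {"a": {0, 1}, "b": {0}},
    1: {"b": {2}},
    2: {"b": {3}},
    3: {},
}

def lookup(table, state, symbol):
    """Next states for (state, symbol); the empty set plays the role of '-'."""
    return table.get(state, {}).get(symbol, set())

def nondeterministic_entries(table):
    """(state, symbol) entries that make the automaton non-deterministic:
    an epsilon label, or more than one target state."""
    return [(state, sym)
            for state, row in table.items()
            for sym, targets in row.items()
            if sym is EPS or len(targets) > 1]

print(lookup(TABLE, 0, "a"))            # {0, 1}
print(lookup(TABLE, 1, "a"))            # set()
print(nondeterministic_entries(TABLE))  # [(0, 'a')]
```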
• The algorithm that recognizes the language accepted by an NFA
  – Input: an NFA (transition table) and a string x (terminated by eof).
  – Output: “yes” if x is accepted, “no” otherwise.
    S := e-closure({s0});
    a := nextchar;
    while a != eof do begin
        S := e-closure(move(S, a));
        a := nextchar;
    end;
    if intersect(S, F) != empty then return “yes”
    else return “no”
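Note: e-closure(S) is the set of states that can be reached from states in S through transitions labeled by the empty string (including the states of S themselves), and move(S, a) is the set of states that can be reached from some state in S through a single transition labeled a.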
  – Example 1: recognizing ababb with the previous NFA
  – Example 2: use the example in Fig. 3.27 to recognize ababb
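The recognition algorithm above translates almost line for line into Python. In this sketch, `delta` is assumed to map (state, symbol) pairs to sets of states, with `None` standing for ε; the function names are hypothetical, and iterating over the string replaces the explicit `nextchar`/`eof` loop.

```python
EPS = None  # label used for epsilon (empty-string) transitions

def e_closure(delta, states):
    """All states reachable from `states` using only epsilon transitions
    (the states in `states` themselves are included)."""
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in delta.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure

def move(delta, states, a):
    """States reachable from some state in `states` by one transition labeled a."""
    return {t for s in states for t in delta.get((s, a), ())}

def nfa_accepts(delta, s0, accepting, x):
    """Simulate the NFA on x and answer "yes" or "no"."""
    S = e_closure(delta, {s0})
    for a in x:  # iterating over x plays the role of reading nextchar until eof
        S = e_closure(delta, move(delta, S, a))
    return "yes" if S & accepting else "no"

# The example NFA from the transition table (assumption: state 3 is accepting).
example_delta = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}, (2, "b"): {3}}
print(nfa_accepts(example_delta, 0, {3}, "ababb"))  # yes
print(nfa_accepts(example_delta, 0, {3}, "abab"))   # no
```

For ababb on the example NFA (with F = {3} assumed), S evolves as {0}, {0, 1}, {0, 2}, {0, 1}, {0, 2}, {0, 3}; the final set contains state 3, so the answer is “yes”.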
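Space complexity: O(|S|) for the current state set, where |S| is the number of NFA states; time complexity: O(|S|^2 · |x|), because each of the |x| steps computes move and e-closure over at most |S| states, each with at most |S| successors.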
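One way to justify the time bound, assuming the state sets are stored as bit vectors or hash sets (so inserting or testing one state costs O(1)) and writing n = |S| for the number of NFA states:

$$
\begin{aligned}
T_{\text{move}}(S, a) &\le n \cdot n = O(n^2) && \text{(at most } n \text{ states, each with at most } n \text{ successors on } a\text{)}\\
T_{\text{e-closure}}(S) &\le n + n \cdot n = O(n^2) && \text{(each state is pushed once and its } \epsilon\text{-edges scanned once)}\\
T_{\text{total}} &\le (|x| + 1)\,\bigl(T_{\text{move}} + T_{\text{e-closure}}\bigr) = O(n^2\,|x|) && (|x| \ge 1)
\end{aligned}
$$

The space bound follows similarly: apart from the transition table, only the current set S and the closure stack are kept, and each holds at most n states.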
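• Construct an NFA from a regular expression
  – Input: a regular expression r over an alphabet Σ
  – Output: an NFA N accepting L(r)
  – Algorithm (Algorithm 3.3, page 122):
    • For ε, construct the NFA with a new start state i and a new accepting state f, connected by a transition i -ε-> f.
    • For a in Σ, construct the NFA i -a-> f, again with a new start state i and a new accepting state f.
    • Let N(s) and N(t) be NFAs for the regular expressions s and t:
      – For s|t, construct N(s|t): add a new start state i with ε-transitions to the start states of N(s) and N(t), and ε-transitions from the accepting states of N(s) and N(t) to a new accepting state f.
      – For st, construct N(st): the start state of N(s) becomes the start state, the accepting state of N(s) is merged with the start state of N(t), and the accepting state of N(t) becomes the accepting state.
      – For s*, construct N(s*): add a new start state i and a new accepting state f, with ε-transitions from i to the start state of N(s), from i to f, from the accepting state of N(s) back to the start state of N(s), and from the accepting state of N(s) to f.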
• Example: r = (a|b)*abb.
• Example: use Algorithm 3.3 to construct N(r) for r = (ab | a)*b* | b.
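Algorithm 3.3 can be sketched with one Python function per case. This is a hypothetical implementation, not the textbook's code: regular expressions are built by calling `symbol`, `union`, `concat`, and `star` directly instead of being parsed, and `concat` joins N(s) and N(t) with an ε-transition rather than merging the two states (the accepted language is the same). `accepts` runs the e-closure/move simulation on the result, and the two example expressions above serve as test cases.

```python
import itertools
from dataclasses import dataclass

EPS = None  # label used for epsilon transitions

@dataclass
class Frag:
    """An NFA fragment with exactly one start state and one accepting state."""
    start: int
    accept: int
    edges: list  # (source, label, target) triples

_fresh = itertools.count()  # source of new, never-reused state numbers

def symbol(a):
    """N(a) for a symbol a; symbol(EPS) gives N(epsilon). Shape: i --a--> f."""
    i, f = next(_fresh), next(_fresh)
    return Frag(i, f, [(i, a, f)])

def union(s, t):
    """N(s|t): new i and f, epsilon edges into both branches and out of both."""
    i, f = next(_fresh), next(_fresh)
    return Frag(i, f, s.edges + t.edges + [
        (i, EPS, s.start), (i, EPS, t.start),
        (s.accept, EPS, f), (t.accept, EPS, f)])

def concat(s, t):
    """N(st): links the fragments with an epsilon edge instead of merging states."""
    return Frag(s.start, t.accept, s.edges + t.edges + [(s.accept, EPS, t.start)])

def star(s):
    """N(s*): new i and f; epsilon edges allow skipping N(s) or repeating it."""
    i, f = next(_fresh), next(_fresh)
    return Frag(i, f, s.edges + [
        (i, EPS, s.start), (i, EPS, f),
        (s.accept, EPS, s.start), (s.accept, EPS, f)])

def accepts(frag, x):
    """Simulate the fragment as an NFA (e-closure / move) on the string x."""
    delta = {}
    for src, label, dst in frag.edges:
        delta.setdefault((src, label), set()).add(dst)

    def closure(states):
        seen, stack = set(states), list(states)
        while stack:
            for t in delta.get((stack.pop(), EPS), ()):
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
        return seen

    S = closure({frag.start})
    for a in x:
        S = closure({t for s in S for t in delta.get((s, a), ())})
    return frag.accept in S

# r = (a|b)*abb
r1 = concat(concat(concat(star(union(symbol("a"), symbol("b"))),
                          symbol("a")), symbol("b")), symbol("b"))
# r = (ab | a)*b* | b
r2 = union(concat(star(union(concat(symbol("a"), symbol("b")), symbol("a"))),
                  star(symbol("b"))),
           symbol("b"))

print(accepts(r1, "ababb"), accepts(r1, "abab"))  # True False
print(accepts(r2, "aab"), accepts(r2, "ba"))      # True False
```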
• Using an NFA, we can recognize a token in O(|S|^2|x|) time; we can improve the time complexity by using a deterministic finite automaton instead of an NFA.
  – An NFA is deterministic (a DFA) if
    • it has no transitions on the empty string, and
    • for each state s and input symbol a, there is at most one edge labeled a leaving s.
– What is the time complexity to recognize a token when a DFA is used?
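The two DFA conditions, and the reason a DFA is faster, can both be seen in a short Python sketch. `is_dfa` checks the conditions on a table in the same (state, label) -> set format as the NFA sketches, and `dfa_accepts` follows exactly one edge per input character. Both names are hypothetical, and the hand-built table `DTRAN` for (a|b)*abb is an illustrative assumption, not taken from the slides.

```python
EPS = None  # label used for epsilon transitions

def is_dfa(delta):
    """delta maps (state, label) -> set of targets. Check the two DFA conditions:
    no epsilon-labeled edges, and at most one target per (state, symbol)."""
    return all(label is not EPS and len(targets) <= 1
               for (_state, label), targets in delta.items())

def dfa_accepts(dtran, start, accepting, x):
    """Run a DFA whose table dtran maps (state, symbol) -> next state.
    Each input symbol costs a single table lookup."""
    state = start
    for a in x:
        state = dtran.get((state, a))
        if state is None:  # no edge for this symbol: reject immediately
            return False
    return state in accepting

# A DFA for (a|b)*abb: state k means the longest suffix of the input read so far
# that is also a prefix of "abb" has length k.
DTRAN = {(0, "a"): 1, (0, "b"): 0,
         (1, "a"): 1, (1, "b"): 2,
         (2, "a"): 1, (2, "b"): 3,
         (3, "a"): 1, (3, "b"): 0}

print(dfa_accepts(DTRAN, 0, {3}, "ababb"))  # True
print(dfa_accepts(DTRAN, 0, {3}, "abab"))   # False
```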
• Algorithm to convert an NFA to a DFA that accepts the same language (the subset construction, Algorithm 3.2, page 118):
initially, e-closure(s0) is the only state in Dstates, and it is unmarked;
while there is an unmarked state T in Dstates do begin
    mark T;
    for each input symbol a do begin
        U := e-closure(move(T, a));
        if (U is not in Dstates) then
            add U as an unmarked state to Dstates;
        Dtran[T, a] := U;
    end
end;
Initial state = e-closure(s0), Final state = ?
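A Python sketch of the subset construction, following the pseudocode above. Assumptions: `delta` maps (state, symbol) pairs to sets of NFA states with `None` as the ε label, DFA states are represented as frozensets of NFA states, and a DFA state is made accepting when it contains at least one accepting NFA state (the usual convention). The function names are hypothetical.

```python
EPS = None  # label used for epsilon transitions

def e_closure(delta, states):
    """States reachable from `states` through epsilon transitions alone."""
    closure, stack = set(states), list(states)
    while stack:
        for t in delta.get((stack.pop(), EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)  # frozenset so it can be a set member / dict key

def move(delta, states, a):
    """States reachable from some state in `states` on one edge labeled a."""
    return {t for s in states for t in delta.get((s, a), ())}

def subset_construction(delta, s0, accepting, alphabet):
    """Build Dtran as in the pseudocode; DFA states are frozensets of NFA states."""
    start = e_closure(delta, {s0})
    dstates = {start}        # every DFA state discovered so far
    unmarked = [start]       # discovered but not yet marked
    dtran = {}
    while unmarked:
        T = unmarked.pop()   # mark T
        for a in alphabet:
            U = e_closure(delta, move(delta, T, a))
            if U not in dstates:
                dstates.add(U)
                unmarked.append(U)
            dtran[(T, a)] = U
    # A DFA state accepts when it contains at least one accepting NFA state.
    final = {T for T in dstates if T & accepting}
    return start, dtran, final

delta = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}, (2, "b"): {3}}
start, dtran, final = subset_construction(delta, 0, {3}, "ab")
for (T, a), U in sorted(dtran.items(), key=lambda kv: (sorted(kv[0][0]), kv[0][1])):
    print(sorted(T), a, "->", sorted(U))
print("final:", [sorted(T) for T in final])
```

On the four-state example NFA (F = {3} assumed), this produces four DFA states, {0}, {0, 1}, {0, 2}, and {0, 3}, with {0, 3} accepting. If some move(T, a) is empty, the sketch keeps the empty set as a dead state, just as the pseudocode would.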
• Example: page 120, Fig. 3.27.
• Question:
  – For an NFA with |S| states, at most how many states can its corresponding DFA have?
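One way to explore this question experimentally: the hypothetical Python sketch below builds, for each n, an NFA with n + 1 states for strings over {a, b} whose n-th symbol from the end is a, i.e. (a|b)*a(a|b)^(n-1), and counts the subsets reached by the subset construction. These NFAs have no ε-transitions, so e-closure is the identity and is omitted. The count grows as 2^n, so the DFA can be exponentially larger than the NFA.

```python
def nth_from_end_nfa(n):
    """Hypothetical example: an NFA with states 0..n for (a|b)*a(a|b)^(n-1)."""
    delta = {(0, "a"): {0, 1}, (0, "b"): {0}}  # state 0 loops and guesses the a
    for i in range(1, n):
        delta[(i, "a")] = {i + 1}              # count the remaining n - 1 symbols
        delta[(i, "b")] = {i + 1}
    return delta

def reachable_dfa_states(delta, s0, alphabet):
    """Number of subsets reached by the subset construction (no epsilon edges)."""
    start = frozenset({s0})
    seen, stack = {start}, [start]
    while stack:
        T = stack.pop()
        for a in alphabet:
            U = frozenset(t for s in T for t in delta.get((s, a), ()))
            if U not in seen:
                seen.add(U)
                stack.append(U)
    return len(seen)

for n in range(1, 6):
    count = reachable_dfa_states(nth_from_end_nfa(n), 0, "ab")
    print(n + 1, "NFA states ->", count, "DFA states")
```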