Post on 21-Dec-2015
Lecture 4 RegExpr → NFA → DFA
Topics
  Thompson Construction
  Subset construction
Readings: 3.7, 3.6
January 23, 2006
CSCE 531 Compiler Construction
– 2 – CSCE 531 Spring 2006
Overview
Last Time
  Flex
  Symbol table - hash table from K&R
Today's Lecture
  DFA review
  Simulating DFA, figure 3.22
  NFAs
  Thompson Construction: re → NFA
  Examples
  NFA → DFA, the subset construction
  ε-closure(s), ε-closure(T), move(T, a)
References
Hash Table
#define ENDSTR 0
#define MAXSTR 100
#include <stdio.h>
#include <string.h>   /* strcmp, strdup: used by lookup and install */
struct nlist {                /* basic table entry */
    char *name;
    int val;
    struct nlist *next;       /* next entry in chain */
};
#define HASHSIZE 100
static struct nlist *hashtab[HASHSIZE];   /* pointer table */
Hashtable
[Figure: the hashtab pointer array with chained entries. Labels recovered from the figure: xbar, foo, boatcount, x, int, int, float, func, double, null.]
The Hash Function
/* PURPOSE: Hash determines hash value based on the sum of the
            character values in the string.
   USAGE: n = hash(s);
   DESCRIPTION OF PARAMETERS: s(array of char) string to be hashed
   AUTHOR: Kernighan and Ritchie
   LAST REVISION: 12/11/83
*/
int hash(char *s)   /* form hash value for string s */
{
    int hashval;
    for (hashval = 0; *s != '\0'; )
        hashval += *s++;
    return (hashval % HASHSIZE);
}
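As a worked example, the additive hash can be traced by hand. The sketch below is a standalone variant for experimentation, not the lecture's code: the name additive_hash and the table_size parameter are assumptions made for illustration, and the arithmetic in the note that follows assumes ASCII.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical standalone variant of the slide's hash():
   sums the character codes of s, then reduces modulo table_size. */
unsigned additive_hash(const char *s, unsigned table_size)
{
    unsigned hashval = 0;

    assert(s != NULL && table_size > 0);
    while (*s != '\0')
        hashval += (unsigned char) *s++;   /* treat chars as non-negative */
    return hashval % table_size;
}
```

With a table size of 100, additive_hash("foo", 100) sums 102 + 111 + 111 = 324 and returns 24. Because addition ignores order, anagrams such as "abc" and "cba" always land in the same bucket, so this hash is simple rather than strong.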
The lookup Function
/* PURPOSE: Lookup searches for entry in symbol table and returns a pointer
   USAGE: np = lookup(s);
   DESCRIPTION OF PARAMETERS: s(array of char) string searched for
   AUTHOR: Kernighan and Ritchie
   LAST REVISION: 12/11/83 */
struct nlist *lookup(char *s)   /* look for s in hashtab */
{
    struct nlist *np;
    for (np = hashtab[hash(s)]; np != NULL; np = np->next)
        if (strcmp(s, np->name) == 0)
            return(np);   /* found it */
    return(NULL);         /* not found */
}
The install Function
/*
   PURPOSE: Install checks hash table using lookup and
            if entry not found, it "installs" the entry.
   USAGE: np = install(name);
   DESCRIPTION OF PARAMETERS: name(array of char)
            name to install in symbol table
   AUTHOR: Kernighan and Ritchie, modified by Ron Sobczak
   LAST REVISION: 12/11/83
*/
struct nlist *install(char *name)   /* put (name) in hashtab */
{
    struct nlist *np, *lookup();
    char *strdup(), *malloc();
    int hashval;
    if ((np = lookup(name)) == NULL) {   /* not found */
        np = (struct nlist *) malloc(sizeof(*np));
        if (np == NULL)
            return(NULL);
        if ((np->name = strdup(name)) == NULL)
            return(NULL);
        hashval = hash(np->name);
        np->next = hashtab[hashval];
        hashtab[hashval] = np;
    }
    return(np);
}
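The three routines can be exercised together. The following is a minimal sketch that restates them with modern prototypes so that it compiles on its own. The use of malloc plus memcpy in place of strdup, the unsigned hash value, and the initialisation of val to 0 are assumptions of this sketch rather than part of the slides.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define HASHSIZE 100

struct nlist {                /* basic table entry, as on the slides */
    char *name;
    int val;
    struct nlist *next;       /* next entry in chain */
};

static struct nlist *hashtab[HASHSIZE];   /* static storage: all chains start empty */

static unsigned hash(const char *s)
{
    unsigned hashval = 0;
    while (*s != '\0')
        hashval += (unsigned char) *s++;
    return hashval % HASHSIZE;
}

struct nlist *lookup(const char *s)
{
    struct nlist *np;
    for (np = hashtab[hash(s)]; np != NULL; np = np->next)
        if (strcmp(s, np->name) == 0)
            return np;        /* found it */
    return NULL;              /* not found */
}

struct nlist *install(const char *name)
{
    struct nlist *np;
    size_t len;
    unsigned h;

    assert(name != NULL);
    if ((np = lookup(name)) == NULL) {    /* not found: create the entry */
        len = strlen(name) + 1;
        np = malloc(sizeof *np);
        if (np == NULL)
            return NULL;
        np->name = malloc(len);
        if (np->name == NULL) {
            free(np);                     /* do not leak the half-built entry */
            return NULL;
        }
        memcpy(np->name, name, len);
        np->val = 0;
        h = hash(name);
        np->next = hashtab[h];            /* push onto the front of the chain */
        hashtab[h] = np;
    }
    return np;
}
```

Calling install("count") twice returns the same entry both times, and lookup("missing") returns NULL. A new name that hashes to an occupied bucket, for example the anagram "tnuoc", is pushed onto the front of that bucket's chain.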
NFAs (Non-deterministic Finite Automata)
Recall from last time
$M = (\Sigma, S, \delta, s_0, S_F)$
  $\Sigma$ - alphabet
  $S$ - states
  $\delta$ - state transition function
  $s_0$ - start state
  $S_F$ - set of final or accepting states
$L(M)$ = { x such that it is possible to follow a path in the transition diagram labeled x that ends in an accepting state }
NFA transition function
NFAs relax the functional nature of the transition function:
$\delta(s, a)$, the next state for state $s$ and input $a$, is a subset of states.
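One way to picture a set-valued transition function is a table whose entries are bitmasks. The sketch below assumes a small example NFA for (a|b)*abb with no ε arcs (states 0 to 3, start state 0, accepting state 3). The NFA, and the names StateSet, delta, nfa_step and nfa_accepts, are illustrative assumptions rather than material from the slides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t StateSet;              /* bit i set <=> state i is in the set */
#define BIT(i) ((StateSet)1 << (i))

enum { NSTATES = 4, NSYMS = 2 };        /* symbols: 0 = 'a', 1 = 'b' */

/* delta[s][c] is a SET of next states, not a single state. */
static const StateSet delta[NSTATES][NSYMS] = {
    /* state 0 */ { BIT(0) | BIT(1), BIT(0) },   /* on 'a': stay in 0 or guess 1 */
    /* state 1 */ { 0,               BIT(2) },
    /* state 2 */ { 0,               BIT(3) },
    /* state 3 */ { 0,               0      },
};

/* Union of delta(s, c) over every state s in T. */
StateSet nfa_step(StateSet T, int c)
{
    StateSet next = 0;

    assert(c >= 0 && c < NSYMS);
    for (int s = 0; s < NSTATES; s++)
        if (T & BIT(s))
            next |= delta[s][c];
    return next;
}

/* x is in L(M) if some path labeled x ends in the accepting state. */
int nfa_accepts(const char *x)
{
    StateSet current = BIT(0);          /* start in {0} */

    assert(x != NULL);
    for (; *x != '\0'; x++) {
        if (*x != 'a' && *x != 'b')
            return 0;                   /* symbol outside the alphabet */
        current = nfa_step(current, *x - 'a');
    }
    return (current & BIT(3)) != 0;
}
```

Tracking the whole set of possible states at once, instead of guessing one path, is the same idea the subset construction later turns into DFA states.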
Equivalence NFA, DFA, RE
  RegExpr → NFA: Thompson Construction
  NFA → DFA: Subset Construction
  DFA → DFA: DFA minimization
  DFA → tables for scanner
  DFA → RegExpr: Kleene Construction
Converting Regular Expressions to NFAs
Ken Thompson (1968) outlined a regular expression to NFA conversion algorithm for use in an editor.
  Future fame?
How would we use regular expressions in an editor?
Unix regular expressions
  Grep family - Global Regular Expression Print - prints all lines in a file that contain a match to the regular expression
  Variations
    Fgrep - fast, fixed regular expression, just a string
    Egrep - goes through NFA → DFA and minimization
Restrictions on NFAs in Thompson Construction
Constructs an NFA from the regular expression with the following restrictions:
1. The NFA has a single start state, $s_0$, and a single final state, $s_f$.
2. There are no transitions coming into the start state.
3. There are no transitions leaving the final state.
4. A state has at most 2 exiting ε-transitions and at most 2 entering ε-transitions.
[Figure: start state $s_0$ and final state $s_f$]
Base Cases of Thompson Construction
For $a \in \Sigma$, the NFA $M_a = (\Sigma, \{s_0, s_f\}, \delta, s_0, \{s_f\})$ that accepts it is:
  [Figure: $s_0$ with a single arc labeled $a$ to $s_f$]
For ε, the NFA $M_\varepsilon = (\Sigma, \{s_0, s_f\}, \delta, s_0, \{s_f\})$ that accepts it is:
  [Figure: $s_0$ with a single arc labeled ε to $s_f$]
Recursive Cases of Thompson Construction
For regular expressions R and S with machines $M_R$ and $M_S$
  $M_R = (\Sigma, S_R, \delta_R, r_0, \{r_f\})$
  $M_S = (\Sigma, S_S, \delta_S, s_0, \{s_f\})$
Then the NFA for R|S is
  $M_{R|S} = (\Sigma, S_R \cup S_S \cup \{\mathit{new}_0, \mathit{new}_f\}, \delta_{R|S}, \mathit{new}_0, \{\mathit{new}_f\})$
Recursive Cases of Thompson Construction RS
For regular expressions R and S with machines $M_R$ and $M_S$
  $M_R = (\Sigma, S_R, \delta_R, r_0, \{r_f\})$
  $M_S = (\Sigma, S_S, \delta_S, s_0, \{s_f\})$
Then the NFA for RS is
  $M_{RS} = (\Sigma, S_R \cup S_S \cup \{\mathit{new}_0, \mathit{new}_f\}, \delta_{RS}, \mathit{new}_0, \{\mathit{new}_f\})$
Recursive Cases of Thompson Construction R*
For regular expression R with machine $M_R$
  $M_R = (\Sigma, S_R, \delta_R, r_0, \{r_f\})$
Then the NFA for R* is
  $M_{R*} = (\Sigma, S_R \cup \{\mathit{new}_0, \mathit{new}_f\}, \delta_{R*}, \mathit{new}_0, \{\mathit{new}_f\})$
Thompson example
Fig 3.16 has one; let's do another: RegExpr = ab*b(a|b)*
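The symbol base case and the three recursive cases can be sketched in C and applied to this example expression. This is a sketch under assumptions: the names (th_sym, th_alt, th_cat, th_star, th_accepts), the array-based state storage, and the ε wiring are illustrative choices, since the figures that define the δ functions did not survive in this transcript. Following the state sets given on the slides, every recursive case, including RS, adds a fresh new0 and newf.

```c
#include <assert.h>
#include <string.h>

#define MAXSTATES 256
#define EPS 0                      /* label meaning "this state's arcs are ε arcs" */

/* Restriction 4 lets each state be stored with at most two outgoing arcs:
   either one arc on a symbol (out1), or up to two ε arcs (out1, out2). */
struct state { int label; int out1; int out2; };   /* -1 means "no arc" */
struct frag  { int start; int final; };            /* one start, one final state */

static struct state st[MAXSTATES];
static int nstates;

static int new_state(void)
{
    assert(nstates < MAXSTATES);
    st[nstates].label = EPS;
    st[nstates].out1 = st[nstates].out2 = -1;
    return nstates++;
}

static void add_eps(int from, int to)
{
    assert(st[from].label == EPS);
    if (st[from].out1 < 0) st[from].out1 = to;
    else { assert(st[from].out2 < 0); st[from].out2 = to; }
}

struct frag th_sym(int a)                          /* base case: s0 --a--> sf */
{
    struct frag f;
    assert(a != EPS);
    f.start = new_state();
    f.final = new_state();
    st[f.start].label = a;
    st[f.start].out1 = f.final;
    return f;
}

struct frag th_alt(struct frag r, struct frag s)   /* R|S */
{
    struct frag f;
    f.start = new_state();
    f.final = new_state();
    add_eps(f.start, r.start);  add_eps(f.start, s.start);
    add_eps(r.final, f.final);  add_eps(s.final, f.final);
    return f;
}

struct frag th_cat(struct frag r, struct frag s)   /* RS */
{
    struct frag f;
    f.start = new_state();
    f.final = new_state();
    add_eps(f.start, r.start);
    add_eps(r.final, s.start);
    add_eps(s.final, f.final);
    return f;
}

struct frag th_star(struct frag r)                 /* R* */
{
    struct frag f;
    f.start = new_state();
    f.final = new_state();
    add_eps(f.start, r.start);  add_eps(f.start, f.final);   /* enter R or skip it */
    add_eps(r.final, r.start);  add_eps(r.final, f.final);   /* repeat R or leave */
    return f;
}

/* Add s and everything reachable from it by ε arcs to the set in[]. */
static void closure(int s, char *in)
{
    if (s < 0 || in[s]) return;
    in[s] = 1;
    if (st[s].label == EPS) { closure(st[s].out1, in); closure(st[s].out2, in); }
}

/* Simulate the NFA on x by tracking the set of states it could be in. */
int th_accepts(struct frag m, const char *x)
{
    char cur[MAXSTATES] = {0}, next[MAXSTATES];
    closure(m.start, cur);
    for (; *x != '\0'; x++) {
        memset(next, 0, sizeof next);
        for (int s = 0; s < nstates; s++)
            if (cur[s] && st[s].label == *x)
                closure(st[s].out1, next);
        memcpy(cur, next, sizeof cur);
    }
    return cur[m.final];
}
```

Building ab*b(a|b)* as th_cat(th_cat(th_cat(th_sym('a'), th_star(th_sym('b'))), th_sym('b')), th_star(th_alt(th_sym('a'), th_sym('b')))) allocates 22 states in this sketch: 2 for each of the 5 symbol occurrences and 2 for each of the 6 operators. The resulting machine accepts "ab" and "abbbab" and rejects "a" and "ba".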
NFA to DFA: the Subset Construction
In an NFA, given an input string, we make choices about which way to go. We can think of it as being in a subset of the states.
To convert to a DFA
  The states of the DFA correspond to sets of states of the NFA
  The DFA has a transition from one set to another on input a when the NFA can move between those sets on a
Subset Construction Functions
We will use a collection of functions to facilitate seeing all of the states we can get to from a given state on a given input.
  ε-closure($s_i$) is the set of states reachable from $s_i$ by ε arcs alone
  ε-closure(T) is the set of states reachable from some state in T by ε arcs alone
  move(T, a) is the set of states reachable from some state in T by an arc labeled a
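To make the three functions concrete, here is a sketch using bitmask sets. The NFA tables are an assumed example (a Thompson-style NFA for (a|b)*abb with states 0 to 10 and start state 0), not necessarily the one drawn in lecture, and the names StateSet, eps, on, eps_closure and nfa_move are illustrative. ε-closure($s_i$) is the special case eps_closure(BIT(i)).

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t StateSet;              /* bit i set <=> NFA state i is in the set */
#define BIT(i) ((StateSet)1 << (i))

enum { NSTATES = 11, NSYMS = 2 };       /* symbols: 0 = 'a', 1 = 'b' */

/* eps[s]: states reachable from s by ONE ε arc (unlisted entries are empty). */
static const StateSet eps[NSTATES] = {
    [0] = BIT(1) | BIT(7),
    [1] = BIT(2) | BIT(4),
    [3] = BIT(6),
    [5] = BIT(6),
    [6] = BIT(1) | BIT(7),
};

/* on[s][c]: states reachable from s by one arc labeled c. */
static const StateSet on[NSTATES][NSYMS] = {
    [2] = { BIT(3), 0 },
    [4] = { 0, BIT(5) },
    [7] = { BIT(8), 0 },
    [8] = { 0, BIT(9) },
    [9] = { 0, BIT(10) },
};

/* ε-closure(T): T plus everything reachable from T by ε arcs alone. */
StateSet eps_closure(StateSet T)
{
    StateSet closure = T, frontier = T;

    while (frontier != 0) {
        StateSet reached = 0;
        for (int s = 0; s < NSTATES; s++)
            if (frontier & BIT(s))
                reached |= eps[s];
        frontier = reached & ~closure;  /* only states not seen before */
        closure |= reached;
    }
    return closure;
}

/* move(T, a): states reachable from some state in T by one arc labeled a. */
StateSet nfa_move(StateSet T, int sym)
{
    StateSet out = 0;

    assert(sym >= 0 && sym < NSYMS);
    for (int s = 0; s < NSTATES; s++)
        if (T & BIT(s))
            out |= on[s][sym];
    return out;
}
```

For this NFA, eps_closure(BIT(0)) is {0, 1, 2, 4, 7}; moving from that set on a gives {3, 8}, whose ε-closure is {1, 2, 3, 4, 6, 7, 8}.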
The Subset Construction Algorithm
D0 := ε-closure(s0)    // s0 the start state of the NFA
Add D0 to Dstates as unmarked state
While there is an unmarked state T in Dstates
    mark T
    for each input symbol a do
        U := ε-closure(move(T, a))
        if U is not in Dstates then
            add U as unmarked state to Dstates
        end
        Dtrans[T, a] := U
    end
end
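The algorithm above can be sketched in C as follows. The NFA is again an assumed example (a Thompson-style NFA for (a|b)*abb, states 0 to 10), and every name here (StateSet, eps, on, Dstates as a global array, dstate_index, subset_construction) is an illustrative choice. Because new DFA states are appended in discovery order, "unmarked" is represented simply as "index not yet reached by the loop".

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t StateSet;              /* bit i set <=> NFA state i is in the set */
#define BIT(i) ((StateSet)1 << (i))

enum { NSTATES = 11, NSYMS = 2, MAXD = 64 };   /* symbols: 0 = 'a', 1 = 'b' */

static const StateSet eps[NSTATES] = {  /* single ε arcs */
    [0] = BIT(1) | BIT(7), [1] = BIT(2) | BIT(4),
    [3] = BIT(6), [5] = BIT(6), [6] = BIT(1) | BIT(7),
};
static const StateSet on[NSTATES][NSYMS] = {   /* arcs labeled a / b */
    [2] = { BIT(3), 0 }, [4] = { 0, BIT(5) },
    [7] = { BIT(8), 0 }, [8] = { 0, BIT(9) }, [9] = { 0, BIT(10) },
};

static StateSet eps_closure(StateSet T)
{
    StateSet closure = T, frontier = T;
    while (frontier != 0) {
        StateSet reached = 0;
        for (int s = 0; s < NSTATES; s++)
            if (frontier & BIT(s))
                reached |= eps[s];
        frontier = reached & ~closure;
        closure |= reached;
    }
    return closure;
}

static StateSet nfa_move(StateSet T, int sym)
{
    StateSet out = 0;
    for (int s = 0; s < NSTATES; s++)
        if (T & BIT(s))
            out |= on[s][sym];
    return out;
}

StateSet Dstates[MAXD];     /* each DFA state is a set of NFA states */
int Dtrans[MAXD][NSYMS];    /* Dtrans[T][a] = index of the successor DFA state */
int ndstates;

/* Index of U in Dstates; U is appended (still unmarked) if it is new. */
static int dstate_index(StateSet U)
{
    for (int i = 0; i < ndstates; i++)
        if (Dstates[i] == U)
            return i;
    assert(ndstates < MAXD);
    Dstates[ndstates] = U;
    return ndstates++;
}

void subset_construction(int nfa_start)
{
    ndstates = 0;
    dstate_index(eps_closure(BIT(nfa_start)));         /* D0 */
    /* Indices below 'marked' are marked; the loop ends when none are unmarked. */
    for (int marked = 0; marked < ndstates; marked++) {
        StateSet T = Dstates[marked];
        for (int a = 0; a < NSYMS; a++)
            Dtrans[marked][a] = dstate_index(eps_closure(nfa_move(T, a)));
    }
}
```

Calling subset_construction(0) on this NFA yields 5 DFA states; the only one containing the NFA's accepting state 10 is the last one discovered. In this sketch an empty move set would become an ordinary DFA state (the empty set) acting as a dead state, although that does not arise for this particular NFA.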
Example of Subset Construction
Figure 3.35 fig 3.37 in text
Example 2
Lexical analyzer for subset of C
int constants: int, octal, hex
Float constants
C identifiers
Keywords
  for, while, if, else
Relational operators
  < > >= <= != ==
Arithmetic, Boolean and bit operators
  + - * / && || ! ~ & |
Other symbols
  ; { } [ ] * ->
Write core.l Flex Specification
Due Monday Jan 30
Notes
1. Install identifiers and constants into symbol table
2. Return separate token code for each relational operator. Not as in text!!
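To illustrate what "separate token code for each relational operator" means on the C side, here is a small sketch. The token names, their numeric values and the helper relop_token are all hypothetical; the assignment does not specify them. Starting the codes above 255 keeps them distinct from single-character tokens returned as their own character value, a common convention with lex-style scanners.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token codes: one per relational operator. */
enum token {
    TOK_NONE = 0,       /* lexeme is not a relational operator */
    TOK_LT = 258,       /* <  */
    TOK_GT,             /* >  */
    TOK_GE,             /* >= */
    TOK_LE,             /* <= */
    TOK_NE,             /* != */
    TOK_EQ              /* == */
};

/* Map a complete lexeme to its own token code. */
enum token relop_token(const char *lexeme)
{
    static const struct { const char *text; enum token code; } table[] = {
        { "<",  TOK_LT }, { ">",  TOK_GT }, { ">=", TOK_GE },
        { "<=", TOK_LE }, { "!=", TOK_NE }, { "==", TOK_EQ },
    };

    assert(lexeme != NULL);
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(lexeme, table[i].text) == 0)
            return table[i].code;
    return TOK_NONE;
}
```

In a Flex specification each operator pattern would simply return its own code from its action; the table lookup here only demonstrates that the six operators map to six distinct values.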
Homework 02 Due Thursday Jan 26 (now Saturday 28)
  Construct NFA for recognizing (a|b|ε)(ab)*
  Convert to DFA