Lecture 4 RegExpr → NFA → DFA

Topics
Thompson Construction
Subset construction

Readings: 3.7, 3.6

January 23, 2006

CSCE 531 Compiler Construction

– 2 – CSCE 531 Spring 2006

Overview

Last Time
  Flex
  Symbol table - hash table from K&R

Today’s Lecture
  DFA review
  Simulating DFA figure 3.22
  NFAs
  Thompson Construction: re → NFA
  Examples
  NFA → DFA, the subset construction
    ε-closure(s), ε-closure(T), move(T, a)

References

– 3 – CSCE 531 Spring 2006

Hash Table

#define ENDSTR 0
#define MAXSTR 100

#include <stdio.h>

struct nlist {               /* basic table entry */
    char *name;
    int val;
    struct nlist *next;      /* next entry in chain */
};

#define HASHSIZE 100

static struct nlist *hashtab[HASHSIZE];   /* pointer table */

– 4 – CSCE 531 Spring 2006

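Hashtable

[Figure: the hashtab array of bucket pointers, each pointing to a chain of entries that ends in null. Labels recoverable from the figure: xbar, foo, boatcount, x, int, float, func, double.]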

– 5 – CSCE 531 Spring 2006

The Hash Function

/* PURPOSE: Hash determines hash value based on the sum of the
            character values in the string.
   USAGE: n = hash(s);
   DESCRIPTION OF PARAMETERS: s(array of char) string to be hashed
   AUTHOR: Kernighan and Ritchie
   LAST REVISION: 12/11/83
*/
int hash(char *s)    /* form hash value for string s */
{
    unsigned hashval;    /* unsigned, so the bucket index is never negative */

    for (hashval = 0; *s != '\0'; )
        hashval += *s++;
    return (hashval % HASHSIZE);
}
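A quick check of the arithmetic may help here. The sketch below restates the additive hash in a standalone form; the `const` parameter, the `unsigned` return type, and the cast through `unsigned char` are assumptions of this sketch, not part of the slide's code.

```c
#include <assert.h>
#include <stddef.h>

#define HASHSIZE 100

/* Additive hash: sum the character codes, then reduce modulo the table
   size. The unsigned arithmetic is an assumption of this sketch so that
   the bucket index can never be negative. */
unsigned hash(const char *s)
{
    unsigned hashval = 0;

    assert(s != NULL);
    while (*s != '\0')
        hashval += (unsigned char) *s++;
    return hashval % HASHSIZE;
}
```

With ASCII codes, "foo" sums to 102 + 111 + 111 = 324, so it lands in bucket 324 % 100 = 24. Because only the sum matters, any anagram such as "oof" lands in the same bucket, which is one reason the table chains entries.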

– 6 – CSCE 531 Spring 2006

The lookup Function

/* PURPOSE: Lookup searches for entry in symbol table and returns a pointer
   USAGE: np = lookup(s);
   DESCRIPTION OF PARAMETERS: s(array of char) string searched for
   AUTHOR: Kernighan and Ritchie
   LAST REVISION: 12/11/83
*/
#include <string.h>    /* strcmp */

struct nlist *lookup(char *s)    /* look for s in hashtab */
{
    struct nlist *np;

    for (np = hashtab[hash(s)]; np != NULL; np = np->next)
        if (strcmp(s, np->name) == 0)
            return(np);    /* found it */
    return(NULL);          /* not found */
}

– 7 – CSCE 531 Spring 2006

The install Function

/*
   PURPOSE: Install checks hash table using lookup and if entry not found, it "installs" the entry.
   USAGE: np = install(name);
   DESCRIPTION OF PARAMETERS: name(array of char) name to install in symbol table
   AUTHOR: Kernighan and Ritchie, modified by Ron Sobczak
   LAST REVISION: 12/11/83
*/

– 8 – CSCE 531 Spring 2006

#include <stdlib.h>    /* malloc, free */
#include <string.h>    /* strdup (POSIX) */

struct nlist *install(char *name)    /* put (name) in hashtab */
{
    struct nlist *np;
    int hashval;

    if ((np = lookup(name)) == NULL) {    /* not found */
        np = (struct nlist *) malloc(sizeof(*np));
        if (np == NULL)
            return(NULL);
        if ((np->name = strdup(name)) == NULL) {
            free(np);    /* do not leak the half-built entry */
            return(NULL);
        }
        np->val = 0;
        hashval = hash(np->name);
        np->next = hashtab[hashval];    /* push on front of chain */
        hashtab[hashval] = np;
    }
    return(np);
}
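To see the three routines working together, here is a minimal standalone sketch of how a scanner might call them. The helper `note_identifier` is hypothetical, and the sketch copies the name with `malloc` and `strcpy` rather than `strdup` so that it needs only the standard C library; the table layout follows the slides.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define HASHSIZE 100

struct nlist {               /* same layout as the slide's table entry */
    char *name;
    int val;
    struct nlist *next;
};

struct nlist *hashtab[HASHSIZE];

unsigned hash(const char *s)
{
    unsigned h = 0;
    while (*s != '\0')
        h += (unsigned char) *s++;
    return h % HASHSIZE;
}

struct nlist *lookup(const char *s)
{
    struct nlist *np;
    for (np = hashtab[hash(s)]; np != NULL; np = np->next)
        if (strcmp(s, np->name) == 0)
            return np;
    return NULL;
}

struct nlist *install(const char *name)
{
    struct nlist *np = lookup(name);
    if (np == NULL) {
        unsigned h = hash(name);
        np = malloc(sizeof *np);
        if (np == NULL)
            return NULL;
        np->name = malloc(strlen(name) + 1);   /* stand-in for strdup */
        if (np->name == NULL) {
            free(np);
            return NULL;
        }
        strcpy(np->name, name);
        np->val = 0;
        np->next = hashtab[h];                 /* push on front of chain */
        hashtab[h] = np;
    }
    return np;
}

/* Hypothetical scanner-side helper: record one more sighting of an
   identifier and return how many times it has been seen so far. */
int note_identifier(const char *lexeme)
{
    struct nlist *np = install(lexeme);
    assert(np != NULL);
    return ++np->val;
}
```

Calling `note_identifier("count")` twice returns 1 and then 2, because the second `install` finds the existing entry instead of creating a new one. Names that collide, such as "foo" and "oof", share one chain, with the most recently installed name at the front.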

– 9 – CSCE 531 Spring 2006

NFAs (Non-deterministic Finite Automata)

Recall from last time

M = (Σ, S, s_0, δ, S_F)
  Σ - alphabet
  S - states
  δ - state transition function
  s_0 - start state
  S_F - set of final or accepting states

L(M) = { x such that it is possible to follow a path in the transition diagram, starting at s_0 and labeled x, that ends in an accepting state }

– 10 – CSCE 531 Spring 2006

NFA transition function

NFAs relax the functional nature of the transition function.

δ(s, a), the next state for state s and input a, is a subset of states.
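One way to picture a set-valued δ in code is a table whose entries are sets of states rather than single states. The sketch below is an illustration only: the 32-state limit, the three-column alphabet, and the names `StateSet`, `add_transition`, and `in_set` are all assumptions made here.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_STATES 32    /* assumption: small NFAs, one bit per state */
#define NSYMS 3          /* assumption: columns for a, b, and ε */
enum { SYM_A, SYM_B, SYM_EPS };

typedef uint32_t StateSet;    /* bit i is set when state i is in the set */

/* delta[s][c] is the set of possible next states for state s on symbol c.
   An entry of 0 is the empty set: no move is possible. */
StateSet delta[MAX_STATES][NSYMS];

void add_transition(int from, int sym, int to)
{
    assert(from >= 0 && from < MAX_STATES);
    assert(to >= 0 && to < MAX_STATES);
    assert(sym >= 0 && sym < NSYMS);
    delta[from][sym] |= (StateSet) 1 << to;
}

int in_set(StateSet set, int state)
{
    assert(state >= 0 && state < MAX_STATES);
    return (int) ((set >> state) & 1u);
}
```

After `add_transition(0, SYM_A, 0)` and `add_transition(0, SYM_A, 1)`, the entry `delta[0][SYM_A]` holds the set {0, 1}, which a DFA table with one next state per entry could not express.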

– 11 – CSCE 531 Spring 2006

Equivalence NFA, DFA, RE

RegExpr → NFA: Thompson Construction
NFA → DFA: Subset Construction
DFA → DFA: DFA minimization
DFA → tables for scanner
DFA → RegExpr: Kleene Construction

– 12 – CSCE 531 Spring 2006

Converting Regular Expressions to NFAs

Ken Thompson (1968) outlined a regular expression to NFA conversion algorithm for use in an editor.
  Future fame?

How would we use regular expressions in an editor?

Unix regular expressions

Grep family - Global Regular Expression Print - prints all lines in a file that contain a match to the regular expression

Variations
  Fgrep - fast, fixed regular expression, just a string
  Egrep - goes through NFA → DFA and minimization

– 13 – CSCE 531 Spring 2006

Restrictions on NFAs in Thompson Construction

Constructs an NFA from the regular expression with the following restrictions:

1. The NFA has a single start state, s_0, and a single final state, s_f.
2. There are no transitions coming into the start state.
3. There are no transitions leaving the final state.
4. A state has at most 2 exiting ε-transitions and at most 2 entering ε-transitions.

[Figure: schematic NFA with start state s_0 and final state s_f]

– 14 – CSCE 531 Spring 2006

Base Cases of Thompson Construction

For a ∈ Σ the NFA M_a = (Σ, {s_0, s_f}, δ, s_0, {s_f}) that accepts it has the single transition δ(s_0, a) = {s_f}.

For ε the NFA M_ε = (Σ, {s_0, s_f}, δ, s_0, {s_f}) that accepts it has the single ε-transition from s_0 to s_f.

– 16 – CSCE 531 Spring 2006

Recursive Cases of Thompson Construction R|S

For regular expressions R and S with machines M_R and M_S

M_R = (Σ, S_R, δ_R, r_0, {r_f})
M_S = (Σ, S_S, δ_S, s_0, {s_f})

Then the NFA

M_{R|S} = (Σ, S_R ∪ S_S ∪ {new_0, new_f}, δ_{R|S}, new_0, {new_f})

– 17 – CSCE 531 Spring 2006

Recursive Cases of Thompson Construction RS

For regular expressions R and S with machines M_R and M_S

M_R = (Σ, S_R, δ_R, r_0, {r_f})
M_S = (Σ, S_S, δ_S, s_0, {s_f})

Then the NFA

M_{RS} = (Σ, S_R ∪ S_S ∪ {new_0, new_f}, δ_{RS}, new_0, {new_f})

– 18 – CSCE 531 Spring 2006

Recursive Cases of Thompson Construction R*

For regular expression R with machine M_R

M_R = (Σ, S_R, δ_R, r_0, {r_f})

Then the NFA

M_{R*} = (Σ, S_R ∪ {new_0, new_f}, δ_{R*}, new_0, {new_f})
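The base and recursive cases can be sketched as a handful of C functions that each allocate new_0 and new_f and wire them to the sub-machines with ε-edges. This is a sketch under stated assumptions, not the lecture's own code: the state layout and all names (`frag`, `sym_frag`, `alt_frag`, `cat_frag`, `star_frag`) are invented here, and the ε-wiring for RS is one choice that is consistent with a state set containing new_0 and new_f. Other presentations merge r_f with s_0 instead.

```c
#include <assert.h>

#define MAX_STATES 256
#define NONE (-1)

/* One NFA state: at most one labeled edge and at most two ε-edges. */
struct state {
    int sym;       /* label of the non-ε edge, or NONE */
    int to;        /* target of the labeled edge */
    int eps[2];    /* ε-targets, NONE when unused */
};

/* A machine under construction: one start state, one final state. */
struct frag { int start, final; };

struct state nfa[MAX_STATES];
int nstates = 0;

int new_state(void)
{
    assert(nstates < MAX_STATES);
    nfa[nstates].sym = NONE;
    nfa[nstates].to = NONE;
    nfa[nstates].eps[0] = NONE;
    nfa[nstates].eps[1] = NONE;
    return nstates++;
}

void add_eps(int from, int to)
{
    int slot = (nfa[from].eps[0] == NONE) ? 0 : 1;
    assert(nfa[from].eps[slot] == NONE);    /* at most two exiting ε-edges */
    nfa[from].eps[slot] = to;
}

struct frag new_frag(void)
{
    struct frag f;
    f.start = new_state();    /* new_0 */
    f.final = new_state();    /* new_f */
    return f;
}

struct frag sym_frag(int a)                           /* base case: a in Σ */
{
    struct frag f = new_frag();
    nfa[f.start].sym = a;
    nfa[f.start].to = f.final;
    return f;
}

struct frag alt_frag(struct frag r, struct frag s)    /* R|S */
{
    struct frag f = new_frag();
    add_eps(f.start, r.start);
    add_eps(f.start, s.start);
    add_eps(r.final, f.final);
    add_eps(s.final, f.final);
    return f;
}

struct frag cat_frag(struct frag r, struct frag s)    /* RS (assumed wiring) */
{
    struct frag f = new_frag();
    add_eps(f.start, r.start);
    add_eps(r.final, s.start);
    add_eps(s.final, f.final);
    return f;
}

struct frag star_frag(struct frag r)                  /* R* */
{
    struct frag f = new_frag();
    add_eps(f.start, r.start);    /* enter R */
    add_eps(f.start, f.final);    /* or skip it: zero repetitions */
    add_eps(r.final, r.start);    /* repeat R */
    add_eps(r.final, f.final);    /* or leave */
    return f;
}
```

Building (a|b)* as `star_frag(alt_frag(sym_frag('a'), sym_frag('b')))` uses eight states: two for each symbol, two for the alternation, and two for the star. No edge enters the resulting start state and none leaves the resulting final state, which is what lets the pieces compose.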

– 19 – CSCE 531 Spring 2006

Thompson example

Fig 3.16 has one; let’s do another: RegExpr = ab*b(a|b)*

– 20 – CSCE 531 Spring 2006

NFA to DFA: the Subset Construction

In an NFA, given an input string, we make choices about which way to go. We can think of it as being in a subset of the states.

To convert to a DFA

The states of the DFA correspond to sets of states of the NFA.

Transitions of the DFA are when you can move between the sets in the NFA.

– 21 – CSCE 531 Spring 2006

Subset Construction Functions

We will use a collection of functions to facilitate seeing all of the states we can get to from one on a given input.

ε-closure(s_i) is the set of states reachable from s_i by ε arcs
ε-closure(T) is the set of states reachable from some state in T by ε arcs
move(T, a) is the set of states reachable from some state in T by a transition on a
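These three functions are short when state sets are stored as bitmasks. The following is a minimal sketch, assuming at most 32 NFA states and a two-symbol alphabet; the arrays `eps` and `delta` and the function names are choices made for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_STATES 32
#define NSYMS 2          /* assumption: alphabet {a, b} mapped to 0 and 1 */

typedef uint32_t StateSet;    /* bit i is set when state i is in the set */

StateSet eps[MAX_STATES];             /* eps[s]: states one ε-arc away from s */
StateSet delta[MAX_STATES][NSYMS];    /* delta[s][c]: states one c-arc away */

/* ε-closure(T): T itself plus everything reachable using only ε-arcs. */
StateSet eps_closure(StateSet T)
{
    StateSet closure = T, frontier = T;

    while (frontier != 0) {
        StateSet next = 0;
        for (int s = 0; s < MAX_STATES; s++)
            if ((frontier >> s) & 1u)
                next |= eps[s];
        frontier = next & ~closure;    /* keep only states not seen before */
        closure |= next;
    }
    return closure;
}

/* ε-closure(s) for one state is the closure of the singleton set {s}. */
StateSet eps_closure_state(int s)
{
    assert(s >= 0 && s < MAX_STATES);
    return eps_closure((StateSet) 1 << s);
}

/* move(T, a): states reachable from some state in T by one arc labeled a. */
StateSet move(StateSet T, int a)
{
    StateSet result = 0;

    assert(a >= 0 && a < NSYMS);
    for (int s = 0; s < MAX_STATES; s++)
        if ((T >> s) & 1u)
            result |= delta[s][a];
    return result;
}
```

Note that the closure always contains the starting states themselves, and that the frontier test makes the loop terminate even when the ε-arcs form a cycle.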

– 22 – CSCE 531 Spring 2006

The Subset Construction Algorithm

D_0 = ε-closure(s_0)    // s_0 the start state of the NFA
Add D_0 to Dstates as unmarked state
While there is an unmarked state T in Dstates
    mark T
    for each input symbol a do
        U := ε-closure(move(T, a))
        if U is not in Dstates then
            add U as unmarked state to Dstates
        Dtrans[T, a] = U
    end
end
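The loop above translates almost line for line into C. The sketch below is one possible rendering, not the course's reference solution: it assumes bitmask state sets, a two-symbol alphabet, and a hand-built NFA for (a|b)*abb whose state numbering is this sketch's own choice. Because Dstates grows in discovery order, "unmarked" simply means "index not yet visited".

```c
#include <assert.h>
#include <stdint.h>

#define MAX_NFA 32
#define MAX_DFA 64
#define NSYMS 2                         /* assumption: 0 is a, 1 is b */
#define BIT(s) ((StateSet) 1 << (s))

typedef uint32_t StateSet;              /* bit i is set when NFA state i is in the set */

StateSet eps[MAX_NFA];                  /* ε-arcs of the NFA */
StateSet delta[MAX_NFA][NSYMS];         /* labeled arcs of the NFA */
StateSet Dstates[MAX_DFA];              /* each DFA state is a set of NFA states */
int Dtrans[MAX_DFA][NSYMS];             /* DFA transition table */
int ndstates;

StateSet eps_closure(StateSet T)
{
    StateSet closure = T, frontier = T;
    while (frontier != 0) {
        StateSet next = 0;
        for (int s = 0; s < MAX_NFA; s++)
            if ((frontier >> s) & 1u)
                next |= eps[s];
        frontier = next & ~closure;
        closure |= next;
    }
    return closure;
}

StateSet move(StateSet T, int a)
{
    StateSet result = 0;
    for (int s = 0; s < MAX_NFA; s++)
        if ((T >> s) & 1u)
            result |= delta[s][a];
    return result;
}

/* Return the index of U in Dstates, adding it as a new state if needed. */
int find_or_add(StateSet U)
{
    for (int i = 0; i < ndstates; i++)
        if (Dstates[i] == U)
            return i;
    assert(ndstates < MAX_DFA);
    Dstates[ndstates] = U;
    return ndstates++;
}

/* Subset construction from NFA start state s0; returns the DFA state count. */
int subset_construction(int s0)
{
    ndstates = 0;
    find_or_add(eps_closure(BIT(s0)));               /* D_0 */
    for (int T = 0; T < ndstates; T++)               /* visiting T marks it */
        for (int a = 0; a < NSYMS; a++)
            Dtrans[T][a] = find_or_add(eps_closure(move(Dstates[T], a)));
    return ndstates;
}

/* Hand-built NFA for (a|b)*abb, states 0..10, accepting state 10. */
void build_example_nfa(void)
{
    eps[0] = BIT(1) | BIT(7);
    eps[1] = BIT(2) | BIT(4);
    eps[3] = BIT(6);
    eps[5] = BIT(6);
    eps[6] = BIT(1) | BIT(7);
    delta[2][0] = BIT(3);     /* 2 --a--> 3 */
    delta[4][1] = BIT(5);     /* 4 --b--> 5 */
    delta[7][0] = BIT(8);     /* 7 --a--> 8 */
    delta[8][1] = BIT(9);     /* 8 --b--> 9 */
    delta[9][1] = BIT(10);    /* 9 --b--> 10 */
}
```

After `build_example_nfa()`, calling `subset_construction(0)` yields five DFA states. A DFA state is accepting when its set contains NFA state 10, which here is true only of the last state discovered. As in the pseudocode, an empty U would be added as an ordinary (dead) state; this example never produces one.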

– 23 – CSCE 531 Spring 2006

Example of Subset Construction

Figure 3.35
fig 3.37 in text

Example 2

– 24 – CSCE 531 Spring 2006

Lexical analyzer for subset of C

int constants: int, octal, hex
Float constants
C identifiers
Keywords
  for, while, if, else
Relational operators
  < > >= <= != ==
Arithmetic, Boolean and bit operators
  + - * / && || ! ~ & |
Other symbols
  ; { } [ ] * ->

– 25 – CSCE 531 Spring 2006

Write core.l Flex Specification

Due Monday Jan 30

Notes
1. Install identifiers and constants into the symbol table
2. Return a separate token code for each relational operator. Not as in text!!

Homework 02 due Thursday Jan 26 (now Saturday 28)

Construct NFA for recognizing (a|b|ε)(ab)*
Convert to DFA