reg-sets

8/10/2019 reg-sets

Regular Sets and Expressions

Finite automata are important in science, mathematics, and engineering. Engineers like them because they are superb models for circuits. (And, since the advent of VLSI systems, sometimes finite automata are circuits!) Computer scientists adore them because they adapt very nicely to algorithm design, for example the lexical analysis portion of compiling and translation. Mathematicians are intrigued by them too, because there are several nifty mathematical characterizations of the sets they accept. This is partially what this section is about.

We shall build expressions from the symbols 0, 1, +, and & using the operations of union, concatenation, and Kleene closure. Several intuitive examples of our notation are:

a) 01 means a zero followed by a one (concatenation)
b) 0+1 means either a zero or a one (union)
c) 0* means ε + 0 + 00 + 000 + ... (Kleene closure)

With parentheses we can build larger expressions. And we can associate meanings with our expressions. Here's how:

Expression    Set Represented

(0+1)*        all strings over {0,1}.
0*10*10*      strings containing exactly two ones.
(0+1)*11      strings which end with two ones.
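The three rows of this table can be checked mechanically. The following is only a sketch using Python's `re` module, whose syntax writes union as `|` rather than the `+` used in these notes. The names `ALL_STRINGS`, `EXACTLY_TWO_ONES`, `ENDS_WITH_11`, and `in_set` are illustrative and not part of the notes.

```python
import re

# The "+" (union) of the notes is written "|" in Python's regex syntax.
ALL_STRINGS = re.compile(r"(0|1)*")
EXACTLY_TWO_ONES = re.compile(r"0*10*10*")
ENDS_WITH_11 = re.compile(r"(0|1)*11")

def in_set(pattern, s):
    """True when the whole string s belongs to the set the pattern represents."""
    return pattern.fullmatch(s) is not None

print(in_set(EXACTLY_TWO_ONES, "01010"))  # two ones
print(in_set(EXACTLY_TWO_ONES, "0111"))   # three ones
print(in_set(ENDS_WITH_11, "0011"))
```

`fullmatch` is used instead of `search` because an expression represents a set of whole strings, not substrings.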

That is the intuitive approach to these new expressions or formulas. Now for a precise, formal view. Several definitions should do the job.

Definition. 0, 1, ε, and ∅ are regular expressions.

Definition. If α and β are regular expressions, then so are (αβ), (α + β), and (α)*.

OK, fine. Regular expressions are strings put together with zeros, ones, epsilons, stars, plusses, and matched parentheses in certain ways. But why did we do it? And what do they mean? We shall answer this with a list of what various general regular expressions represent. First, let us define what some specific regular expressions represent.

a) 0 represents the set {0}
b) 1 represents the set {1}
c) ε represents the set {ε} (the empty string)
d) ∅ represents the empty set

Now for some general cases. If α and β are regular expressions representing the sets A and B, then:

a) (αβ) represents the set AB
b) (α + β) represents the set A ∪ B
c) (α)* represents the set A*

The sets which can be represented by regular expressions are called regular sets. When writing down regular expressions to represent regular sets we shall often drop parentheses around concatenations. Some examples are 11(0 + 1)* (the set of strings beginning with two ones), 0*1* (all strings which contain a possibly empty sequence of zeros followed by a possibly null string of ones), and the examples mentioned earlier. We also should note that {0,1} is not the only alphabet for regular sets. Any finite alphabet may be used.
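To make the "expression represents a set" rules concrete, here is a minimal sketch that lists every string of length at most n in the set an expression represents. The tuple encoding of expressions and the function name `language` are assumptions made for this illustration only. The length bound is needed because a starred expression usually represents an infinite set.

```python
# Hypothetical tuple encoding of regular expressions (not from the notes):
#   ("sym", "0")   a single symbol      ("eps",)      the empty string
#   ("empty",)     the empty set        ("cat", x, y) concatenation
#   ("alt", x, y)  union                ("star", x)   Kleene closure

def language(e, n):
    """Return the strings of length <= n in the set represented by e."""
    kind = e[0]
    if kind == "sym":
        return {e[1]} if len(e[1]) <= n else set()
    if kind == "eps":
        return {""}
    if kind == "empty":
        return set()
    if kind == "cat":
        A, B = language(e[1], n), language(e[2], n)
        return {x + y for x in A for y in B if len(x + y) <= n}
    if kind == "alt":
        return language(e[1], n) | language(e[2], n)
    if kind == "star":
        A = language(e[1], n)
        result, frontier = {""}, {""}
        while frontier:  # keep appending pieces of A until nothing new fits
            frontier = {x + y for x in frontier for y in A
                        if len(x + y) <= n} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node kind: {kind}")

zero, one = ("sym", "0"), ("sym", "1")
zeros_then_ones = ("cat", ("star", zero), ("star", one))                  # 0*1*
starts_with_11 = ("cat", ("cat", one, one), ("star", ("alt", zero, one)))  # 11(0+1)*

print(sorted(language(zeros_then_ones, 2)))
print(sorted(language(starts_with_11, 3)))
```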

From our precise definitions of the regular expressions and the sets they represent we can derive the following nice characterization of the regular sets. Then, very quickly, we shall relate them to finite automata.

Theorem 1. The class of regular sets is the smallest class containing the sets {0}, {1}, {ε}, and ∅ which is closed under union, concatenation, and Kleene closure.

See why the above characterization theorem is true? And why we left out the proof? Anyway, that is all rather neat, but what exactly does it have to do with finite automata?

Theorem 2. Every regular set can be accepted by a finite automaton.

Proof. The singleton sets {0}, {1}, {ε}, and ∅ can all be accepted by finite automata. The fact that the class of sets accepted by finite automata is closed under union, concatenation, and Kleene closure completes the proof.
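The proof leans on closure properties without spelling out a construction. As one illustration, the sketch below uses the standard product construction for union: two deterministic machines run in lockstep and the pair is accepted if either component accepts. The dictionary layout of a machine and the names `make_singleton_dfa`, `union_dfa`, and `accepts` are assumptions of this sketch, not constructions taken from these notes.

```python
def make_singleton_dfa(symbol, alphabet):
    """DFA accepting exactly the one-letter string `symbol`."""
    # States: 0 = start, 1 = accepting, 2 = dead.
    delta = {}
    for c in alphabet:
        delta[(0, c)] = 1 if c == symbol else 2
        delta[(1, c)] = 2
        delta[(2, c)] = 2
    return {"start": 0, "accept": {1}, "delta": delta}

def union_dfa(m1, m2, alphabet):
    """Product construction: a state is a pair (state of m1, state of m2)."""
    start = (m1["start"], m2["start"])
    delta, accept = {}, set()
    todo, seen = [start], {start}
    while todo:
        state = todo.pop()
        p, q = state
        if p in m1["accept"] or q in m2["accept"]:
            accept.add(state)
        for c in alphabet:
            nxt = (m1["delta"][(p, c)], m2["delta"][(q, c)])
            delta[(state, c)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return {"start": start, "accept": accept, "delta": delta}

def accepts(m, word):
    state = m["start"]
    for c in word:
        state = m["delta"][(state, c)]
    return state in m["accept"]

M_a = make_singleton_dfa("a", "ab")
M_b = make_singleton_dfa("b", "ab")
M_a_plus_b = union_dfa(M_a, M_b, "ab")  # accepts {a, b}

print([w for w in ["", "a", "b", "ab"] if accepts(M_a_plus_b, w)])
```

Only pairs reachable from the start pair are built, so the result can be smaller than the full product of the two state sets.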

Just from closure properties we know that we can build finite automata to accept all of the regular sets. And this is indeed done using the constructions from the theorems. For example, to build a machine accepting (a + b)a*b, we design:

M_a which accepts {a},
M_b which accepts {b},
M_(a+b) which accepts {a, b} (from M_a and M_b),
M_(a*) which accepts a*,
and so forth

until the desired machine has been built. This is easily done automatically, and is not too bad after the final machine is reduced. It would be nice, though, to have some algorithm for converting regular expressions directly to automata. The following algorithm for this will be presented in intuitive terms, in language reminiscent of language parsing and translation.

Initially, we shall take a regular expression and break it into subexpressions. For example, the regular expression (aa + b)*ab(bb)* can be broken into the three subexpressions: (aa + b)*, ab, and (bb)*. (These can be broken down later on in the same manner if necessary.) Then we number the symbols in the expression so that we can distinguish between them later. Our three subexpressions now are: (a1a2 + b1)*, a3b2, and (b3b4)*.
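The numbering step is purely mechanical: each occurrence of a letter receives the next counter value for that letter. A small sketch follows; the function name `number_symbols` and its `alphabet` parameter are illustrative assumptions.

```python
from collections import defaultdict

def number_symbols(expr, alphabet="ab"):
    """Attach a running index to each alphabet symbol, counted per letter."""
    counts = defaultdict(int)
    out = []
    for ch in expr:
        if ch in alphabet:
            counts[ch] += 1
            out.append(f"{ch}{counts[ch]}")
        else:
            out.append(ch)  # operators, parentheses, and spaces pass through
    return "".join(out)

print(number_symbols("(aa + b)*ab(bb)*"))
```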

Symbols which lead an expression are important, as are those which end the expression. We group these in sets named FIRST and LAST. These sets for our subexpressions are:

Expression      FIRST     LAST

(a1a2 + b1)*    a1, b1    a2, b1
a3b2            a3        b2
(b3b4)*         b3        b4

Note that since the first subexpression contained a union there were two symbols in its FIRST set. The FIRST set for the entire expression is: {a1, a3, b1}. The reason that a3 was in this set is that since the first subexpression was starred, it could be skipped and thus the first symbol of the next subexpression could be the first symbol for the entire expression. For similar reasons, the LAST set for the whole expression is {b2, b4}.

Formal, precise rules do govern the construction of the FIRST and LAST sets. We know that FIRST(a) = {a} and that we always build FIRST and LAST sets from the bottom up. Here are the remaining rules for FIRST sets.

Definition. If α and β are regular expressions then:

a) FIRST(α + β) = FIRST(α) ∪ FIRST(β)

b) FIRST(α*) = FIRST(α) ∪ {ε}

c) FIRST(αβ) = FIRST(α)              if ε ∉ FIRST(α)
   FIRST(αβ) = FIRST(α) ∪ FIRST(β)   otherwise
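These rules translate directly into a bottom-up recursive function. The sketch below applies them to the subexpressions of (a1a2 + b1)*a3b2(b3b4)*. The tuple encoding and the names `first` and `EPS` are assumptions of this sketch. It also departs from rule (c) in one labeled way: in the concatenation case it removes ε from FIRST(α) before taking the union, so that the whole expression yields {a1, a3, b1} as stated earlier rather than a set that also contains ε.

```python
EPS = "ε"  # marker for the empty string inside a FIRST set

# Hypothetical tuple encoding (not from the notes):
#   ("sym", "a1")  numbered symbol     ("alt", x, y)  union x + y
#   ("cat", x, y)  concatenation xy    ("star", x)    Kleene closure x*

def first(e):
    kind = e[0]
    if kind == "sym":
        return {e[1]}                      # FIRST(a) = {a}
    if kind == "alt":
        return first(e[1]) | first(e[2])   # rule (a)
    if kind == "star":
        return first(e[1]) | {EPS}         # rule (b)
    if kind == "cat":                      # rule (c)
        left = first(e[1])
        if EPS not in left:
            return left
        # Assumption: drop ε from the left part so that ε survives only
        # when the right part can also be empty.
        return (left - {EPS}) | first(e[2])
    raise ValueError(f"unknown node kind: {kind}")

a1, a2, a3 = ("sym", "a1"), ("sym", "a2"), ("sym", "a3")
b1, b2, b3, b4 = ("sym", "b1"), ("sym", "b2"), ("sym", "b3"), ("sym", "b4")

sub1 = ("star", ("alt", ("cat", a1, a2), b1))   # (a1a2 + b1)*
sub2 = ("cat", a3, b2)                           # a3b2
sub3 = ("star", ("cat", b3, b4))                 # (b3b4)*
whole = ("cat", ("cat", sub1, sub2), sub3)

for name, e in [("sub1", sub1), ("sub2", sub2), ("sub3", sub3), ("whole", whole)]:
    print(name, sorted(first(e)))
```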

Examining these rules with care reveals that the above chart was not quite what the rules call for, since empty strings were omitted. The correct, complete chart is:

Expression      FIRST        LAST

(a1a2 + b1)*    a1, b1, ε    a2, b1, ε
a3b2            a3           b2
(b3b4)*         b3, ε        b4, ε

Rules for the LAST sets are much the same in spirit and their formulation will be left as an exercise.

One more notion is needed: the set of symbols which might follow each symbol in any strings generated from the expression. We shall first provide an example and explain in a moment.

Symbol    a1    a2            a3    b1            b2    b3    b4
FOLLOW    a2    a1, a3, b1    b2    a1, a3, b1    b3    b4    b3

Now, how did we do this? It is almost obvious if given a little thought. The FOLLOW set for a symbol is all of the symbols which could come next. The algorithm goes as follows. To find FOLLOW(a), we keep breaking the expression into subexpressions until the symbol a is in the LAST set of a subexpression. Then FOLLOW(a) is the FIRST set of the next subexpression. Here is an example.

Suppose that we have αβ as our expression and know that a ∈ LAST(α). Then FOLLOW(a) = FIRST(β). In most cases, this is the way we compute FOLLOW sets.

But, there are three exceptions that must be noted.

1) If an expression of the form aγ* is in α then we must also include the FIRST set of this starred subexpression γ.

2) If α is of the form β* then FOLLOW(a) also contains α's FIRST set.

3) If the subexpression to the right of α has an ε in its FIRST set, then we keep on to the right unioning FIRST sets until we no longer find an ε in one.

Another example. Let's find the FOLLOW set for b1 in the regular expression (a1 + b1a2*)*b2*(a3 + b3). First we break it down into subexpressions until b1 is in a LAST set. These are:

(a1 + b1a2*)*    b2*    (a3 + b3)

Their FIRST and LAST sets are:

Expression       FIRST        LAST

(a1 + b1a2*)*    a1, b1, ε    a1, b1, a2, ε
b2*              b2, ε        b2, ε
(a3 + b3)        a3, b3       a3, b3

Since b1 is in the LAST set of a subexpression which is starred, we place that subexpression's FIRST set {a1, b1} into FOLLOW(b1). Since a2* came after b1 and was starred we must include a2 also. We also place the FIRST set of the next subexpression (b2*) in the FOLLOW set. Since that set contained an ε, we must put the next FIRST set in also. Thus in this example, all of the FIRST sets are combined and we have:

FOLLOW(b1) = {a1, b1, a2, b2, a3, b3}

Several other FOLLOW sets are:

FOLLOW(a1) = {a1, b1, b2, a3, b3}

FOLLOW(b2) = {b2, a3, b3}
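The basic rule and its three exceptions can be folded into two uniform steps applied at every node of the expression: at a concatenation αβ, every symbol in LAST(α) gains FIRST(β), and at a star α*, every symbol in LAST(α) gains FIRST(α). The sketch below takes that route. It tracks "can be empty" with a separate `nullable` function instead of storing ε in the sets, which is a choice of this sketch rather than the notes' formulation. The tuple encoding and all function names are likewise assumptions.

```python
# Hypothetical tuple encoding (not from the notes):
#   ("sym", "a1"), ("alt", x, y), ("cat", x, y), ("star", x)

def nullable(e):
    """True when e can generate the empty string."""
    k = e[0]
    if k == "sym":
        return False
    if k == "alt":
        return nullable(e[1]) or nullable(e[2])
    if k == "cat":
        return nullable(e[1]) and nullable(e[2])
    return True  # star

def first(e):
    k = e[0]
    if k == "sym":
        return {e[1]}
    if k == "alt":
        return first(e[1]) | first(e[2])
    if k == "cat":
        return first(e[1]) | first(e[2]) if nullable(e[1]) else first(e[1])
    return first(e[1])  # star

def last(e):
    k = e[0]
    if k == "sym":
        return {e[1]}
    if k == "alt":
        return last(e[1]) | last(e[2])
    if k == "cat":
        return last(e[1]) | last(e[2]) if nullable(e[2]) else last(e[2])
    return last(e[1])  # star

def follow(e, table=None):
    """Fill table[symbol] with the set of symbols that may come next."""
    if table is None:
        table = {}
    k = e[0]
    if k == "sym":
        table.setdefault(e[1], set())
        return table
    for child in e[1:]:
        follow(child, table)
    if k == "cat":      # symbols ending the left part precede FIRST(right)
        for p in last(e[1]):
            table[p] |= first(e[2])
    elif k == "star":   # a starred part may repeat: LAST loops to FIRST
        for p in last(e[1]):
            table[p] |= first(e[1])
    return table

a1, a2, a3 = ("sym", "a1"), ("sym", "a2"), ("sym", "a3")
b1, b2, b3 = ("sym", "b1"), ("sym", "b2"), ("sym", "b3")

# (a1 + b1a2*)*b2*(a3 + b3)
expr = ("cat",
        ("cat", ("star", ("alt", a1, ("cat", b1, ("star", a2)))), ("star", b2)),
        ("alt", a3, b3))

FOLLOW = follow(expr)
for sym in ["a1", "b1", "b2"]:
    print(sym, sorted(FOLLOW[sym]))
```

Because concatenations are nested pairwise, exception 3 (skipping over subexpressions that may be empty) falls out of the `nullable` checks inside `first` and `last`.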

After computing all of these sets it is not hard to set up a finite automaton for any regular expression. Begin with a state named s0. Connect it to states denoting the FIRST sets of the expression. (By sets we mean: split the FIRST set into two parts, one for each type of symbol.) Our first example (a1a2 + b1)*a3b2(b3b4)* provides:

[Figure: start state s0 with an edge labeled a to state a1,3 and an edge labeled b to state b1.]

Next, connect the states just generated to states denoting the FOLLOW sets of all their symbols. Again, we have:

[Figure: the automaton so far, with states s0, a1,3, b1, a2, and b2 connected by edges labeled a and b.]

Continue on until everything is connected. Any edges missing at this point should be connected to a rejecting state named sr. The states containing symbols in the expression's LAST set are the accepting states. The complete construction for our example (aa + b)*ab(bb)* is:

[Figure: the complete automaton, with states s0, a1,3, b1, a2, b2, b3, b4, and the rejecting state sr, connected by edges labeled a and b.]
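The construction can be simulated without drawing the machine. The sketch below hard-codes the FIRST, LAST, and FOLLOW sets given earlier for (a1a2 + b1)*a3b2(b3b4)*, reading the FOLLOW entry for a2 as {a1, a3, b1}. A state is the set of numbered symbols of one letter type, the empty set plays the role of sr, and s0 is treated as non-accepting because this expression cannot generate the empty string. The names `step`, `accepts`, and `mismatches` are assumptions of this sketch, and the comparison against Python's `re` engine is only a sanity check on short strings.

```python
import re
from itertools import product

FIRST = {"a1", "a3", "b1"}
LAST = {"b2", "b4"}
FOLLOW = {
    "a1": {"a2"},
    "a2": {"a1", "a3", "b1"},
    "a3": {"b2"},
    "b1": {"a1", "a3", "b1"},
    "b2": {"b3"},
    "b3": {"b4"},
    "b4": {"b3"},
}

def step(state, ch):
    """Next state on input ch; state None stands for the start state s0."""
    if state is None:
        candidates = FIRST
    else:
        candidates = set().union(*(FOLLOW[p] for p in state))
    # Keep only the numbered symbols whose letter matches the input.
    return frozenset(p for p in candidates if p[0] == ch)

def accepts(word):
    state = None
    for ch in word:
        state = step(state, ch)
        if not state:          # no symbol fits: this is the rejecting state sr
            return False
    return state is not None and bool(state & LAST)

# Compare with the same expression in Python regex syntax, all words up to length 6.
mismatches = [w
              for n in range(7)
              for w in map("".join, product("ab", repeat=n))
              if accepts(w) != bool(re.fullmatch(r"(aa|b)*ab(bb)*", w))]

print(accepts("aaab"), accepts("abb"), mismatches)
```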

This construction did indeed produce an equivalent finite automaton, and in not too inefficient a manner. Though if we note that b2 and b4 are basically the same, and that b1 and a2 are similar, we can easily streamline the automaton to:

[Figure: the streamlined automaton, in which b2 and b4 share the single state b2,4.]

Our construction method provides:

[Figure: the automaton with start state s0 and states a1,3, a1,2,3, and b1,2,3, connected by edges labeled a and b.]

for our final example. There is a very simple equivalent machine. Try to find it!

We now close this section with the equivalence theorem concerning finite automata and regular sets. Half of it was proven earlier in the section, but the translation of finite automata into regular expressions remains. This is not included for two reasons: first, it is very tedious, and second, nobody ever actually does that translation for any practical reason! (It is an interesting demonstration of a correctness proof which involves several levels of iteration and should be looked up by the interested reader.)

Theorem 3. The regular sets are exactly those sets accepted by finite automata.
