regular languages - computer sciencecobweb.cs.uga.edu/~potter/theory/2.2_regular_languages.pdf ·...

Regular Languages

CSCI 2670

Department of Computer Science

Fall 2014

CSCI 2670 Regular Languages

Outline

I Regular Expressions

I Converting Regular Expressions to NFAs

I Generalized Nondeterministic Finite Automata

I Converting GNFAs

I Nonregular Languages

I The Pumping Lemma


Regular Expressions

I Regular languages can also be defined via regular expressions (regexp), a form of shorthand for languages defined using regular operations.

Definition

Let Σ be any alphabet.

1. Each symbol a ∈ Σ is a regular expression;

2. ε is a regular expression;

3. ∅ is a regular expression;

4. if R1 and R2 are regular expressions, then (R1 ∪ R2) is a regular expression;

5. if R1 and R2 are regular expressions, then (R1 ◦ R2) is a regular expression;

6. if R1 is a regular expression, then R1∗ is a regular expression.
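The six clauses can be mirrored one-to-one in code. The following is a minimal sketch, assuming a nested-tuple encoding of regular expressions and a helper named language (both hypothetical, chosen for illustration); it lists the strings of L(R) up to a length bound n.

```python
# Regular expressions as nested tuples, one constructor per clause of the definition
# (hypothetical encoding for this sketch):
#   ("sym", a)  ("eps",)  ("empty",)  ("union", R1, R2)  ("concat", R1, R2)  ("star", R1)

def language(r, n):
    """Return the set of strings in L(r) whose length is at most n."""
    tag = r[0]
    if tag == "sym":                      # clause 1: a single symbol
        return {r[1]} if n >= 1 else set()
    if tag == "eps":                      # clause 2: the empty string
        return {""}
    if tag == "empty":                    # clause 3: the empty language
        return set()
    if tag == "union":                    # clause 4
        return language(r[1], n) | language(r[2], n)
    if tag == "concat":                   # clause 5
        left, right = language(r[1], n), language(r[2], n)
        return {x + y for x in left for y in right if len(x + y) <= n}
    if tag == "star":                     # clause 6: iterate until nothing new fits
        base = language(r[1], n) - {""}
        result, frontier = {""}, {""}
        while frontier:
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= n} - result
            result |= frontier
        return result
    raise ValueError(f"unknown constructor: {tag}")

# (a ∪ b)∗ ◦ a: every string over {a, b} that ends in a
r = ("concat", ("star", ("union", ("sym", "a"), ("sym", "b"))), ("sym", "a"))
print(sorted(language(r, 2)))   # ['a', 'aa', 'ba']
```

The length bound is only there to keep the sets finite; the definition itself has no such bound.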


Regular Expressions

I If a ∈ Σ, the regexp a denotes the language {a}.

I The regexp ε denotes the language {ε}.

I The regexp (a ∪ b) denotes the language {a} ∪ {b}.

I The regexp (a ◦ b) denotes the language {a} ◦ {b}.

I The regexp (a ∪ b)∗ ◦ a denotes {wa | w is any string over {a, b}}.

Note the following:

I R1 ◦ R2 is often abbreviated as R1R2.

I If Σ = {a1, a2, . . .}, Σ is used in place of (a1 ∪ a2 ∪ . . .).

I For any regexp R, R∅ = ∅.

I ∅∗ = {ε}.

I (R1 ∪ R2) is sometimes written (R1|R2).

I R+ abbreviates RR∗: the concatenation of one or more strings from L(R).

I Rk is the concatenation of k strings from L(R).

I The precedence of operators (greatest to least) is: ∗, ◦, ∪.


Regular Expressions

Example

What languages do the following denote (where Σ = {0, 1})?

I 0∗10∗

I Σ∗1Σ∗

I Σ∗001Σ∗

I 1∗(01+)∗

I (ΣΣ)∗

I (ΣΣΣ)∗

I (01 ∪ 10)

I 0Σ∗0 ∪ 1Σ∗1 ∪ 0 ∪ 1

I (0 ∪ ε)1∗

I (0 ∪ ε)(1 ∪ ε)
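One way to test a proposed answer for some of these expressions is to compare it against Python's re module on every short string. This is a rough sketch, not a proof: the translations into re syntax (| for ∪, [01] for Σ) and the English descriptions encoded as predicates are assumptions of the sketch, and agreement is only checked up to a length bound.

```python
import re
from itertools import product

SIGMA = "01"

def strings_up_to(n):
    """Yield every string over SIGMA of length 0..n."""
    for k in range(n + 1):
        for symbols in product(SIGMA, repeat=k):
            yield "".join(symbols)

# (pattern in Python re syntax, proposed description as a predicate)
claims = [
    (r"0*10*",               lambda w: w.count("1") == 1),           # exactly one 1
    (r"[01]*001[01]*",       lambda w: "001" in w),                  # contains 001
    (r"([01][01])*",         lambda w: len(w) % 2 == 0),             # even length
    (r"0[01]*0|1[01]*1|0|1", lambda w: len(w) >= 1 and w[0] == w[-1]),
    (r"(0|)(1|)",            lambda w: w in {"", "0", "1", "01"}),
]

def agrees(pattern, predicate, n=8):
    """True if the pattern and the predicate agree on all strings of length <= n."""
    return all((re.fullmatch(pattern, w) is not None) == predicate(w)
               for w in strings_up_to(n))

for pattern, predicate in claims:
    print(pattern, agrees(pattern, predicate))
```

re.fullmatch is used rather than re.search because a regular expression here must describe the whole string, not a substring of it.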


Two Identities

For any regular expression R:

I R ∪ ∅ = R

I R ◦ ε = R

The following do not hold in general, however.

I R ∪ ε = R

I R ◦ ∅ = R

Example

If R = ab, then

I L(R) = {ab}

I L(ab ∪ ∅) = {ab}

I L(ab ◦ ∅) = {}

I L(ab ∪ ε) = {ab, ε}

I L(ab ◦ ε) = {ab}
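The same example can be replayed with Python sets standing in for languages. This is a small sketch; the helper concat and the constant names are introduced here for illustration only.

```python
def concat(A, B):
    """Language concatenation: every x followed by every y."""
    return {x + y for x in A for y in B}

L_R = {"ab"}     # L(R) for R = ab
EMPTY = set()    # L(∅)
EPS = {""}       # L(ε)

print((L_R | EMPTY) == L_R)        # True:  R ∪ ∅ = R
print(concat(L_R, EPS) == L_R)     # True:  R ◦ ε = R
print((L_R | EPS) == L_R)          # False: R ∪ ε adds the empty string
print(concat(L_R, EMPTY) == L_R)   # False: R ◦ ∅ is the empty language
```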


Equivalence between Regular Expressions and Finite Automata

Theorem

A language is regular if and only if some regular expression describes it.

I That is, a language L is regular if and only if there exists a regular expression R such that L(R) = L.

I Note that the theorem is a biconditional statement, and so to prove it, both directions must be proven.

I If a language is described by a regular expression, then it is regular.

I If a language is regular, then it is described by some regular expression.



Lemma

If a language is described by a regular expression, then it is regular.

Proof.

The proof proceeds by constructing, for each regexp R, an NFA N to recognize L(R). The proof uses structural induction.

Basis:

1. R = a, where a ∈ Σ: N = ({q0, q1}, Σ, δ, q0, {q1}), where δ(q0, a) = {q1} and δ(q, b) = ∅ whenever q ≠ q0 or b ≠ a.

2. R = ε: N = ({q0}, Σ, δ, q0, {q0}), where δ(q, a) = ∅ for all states q and a ∈ Σ.

3. R = ∅: N = ({q0}, Σ, δ, q0, {}), where δ(q, a) = ∅ for all states q and a ∈ Σ.



1. L(a) = {a}

2. L(ε) = {ε}

3. L(∅) = ∅



Lemma

If a language is described by a regular expression, then it is regular.

Proof, Continued.

Recursion: The recursive cases are taken care of by the proofs that regular languages are closed under union, concatenation, and Kleene star.

4. R = R1 ∪ R2

5. R = R1 ◦ R2

6. R = R1∗
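The six cases of the proof translate directly into a recursive construction. The sketch below makes several assumptions of its own: regular expressions are nested tuples, every NFA fragment has exactly one accept state (a common simplification that differs in detail from the closure constructions, which allow several), and the names build, closure and accepts are hypothetical.

```python
from itertools import count

_fresh = count()   # supplies fresh state names 0, 1, 2, ...

def build(r):
    """Return (start, accept, edges) for an NFA recognizing L(r).
    edges is a list of (source, label, target); label None marks an ε-move."""
    tag = r[0]
    if tag == "sym":                       # case 1: two states joined by a
        s, t = next(_fresh), next(_fresh)
        return s, t, [(s, r[1], t)]
    if tag == "eps":                       # case 2: one state, start = accept
        s = next(_fresh)
        return s, s, []
    if tag == "empty":                     # case 3: accept state is unreachable
        return next(_fresh), next(_fresh), []
    s1, t1, e1 = build(r[1])
    if tag == "star":                      # case 6: loop back, allow zero passes
        s, t = next(_fresh), next(_fresh)
        return s, t, e1 + [(s, None, s1), (t1, None, s1), (t1, None, t), (s, None, t)]
    s2, t2, e2 = build(r[2])
    if tag == "union":                     # case 4: branch into either machine
        s, t = next(_fresh), next(_fresh)
        return s, t, e1 + e2 + [(s, None, s1), (s, None, s2),
                                (t1, None, t), (t2, None, t)]
    if tag == "concat":                    # case 5: chain the two machines
        return s1, t2, e1 + e2 + [(t1, None, s2)]
    raise ValueError(f"unknown constructor: {tag}")

def closure(states, edges):
    """All states reachable from `states` using ε-moves only."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for src, label, dst in edges:
            if src == q and label is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def accepts(r, w):
    """Simulate the NFA built for r on the string w."""
    start, accept, edges = build(r)
    current = closure({start}, edges)
    for ch in w:
        moved = {dst for src, label, dst in edges if src in current and label == ch}
        current = closure(moved, edges)
    return accept in current

a, b = ("sym", "a"), ("sym", "b")
r1 = ("star", ("union", ("concat", a, b), a))                 # (ab ∪ a)∗
print(accepts(r1, "aab"), accepts(r1, "abb"))                 # True False
```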



Example

1. Convert the regular expression (ab ∪ a)∗ to an NFA.

2. Convert the regular expression (a ∪ b)∗aba to an NFA.

??


Generalized NFAs

I To prove that each regular language is described by a regular expression, we define generalized nondeterministic finite automata (GNFAs).

I GNFAs are like NFAs, except:

1. The start state qstart has edges leading to every other state, but no incoming edges.

2. There is a unique accept state qaccept, qaccept ≠ qstart, with incoming edges coming from every other node. It has no outgoing edges.

3. For all q, q′ ∈ Q − {qaccept, qstart}, there is exactly one edge from q to q′. Note that q and q′ might be the same.


Generalized NFAs

I The labels of the edges in a GNFA will be arbitrary regular expressions.

I A DFA M can be converted into a GNFA:

I Add a new start state qstart with an ε edge leading to the old start state.

I Add a new accept state qaccept with an ε edge leading from each accept state in M to qaccept.

I If edges q →a q′ and q →b q′ exist, replace both with q →(a∪b) q′.

I If no edge leads from q to q′, add q →∅ q′.

I It’s not proven in the text, but it should be clear that none of these alterations changes the language accepted by the automaton.
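The four steps can be sketched as a function that turns a DFA's transition table into a table of edge labels. The representation (a dict from state pairs to label strings), the state names q_start and q_accept, and the example DFA are all assumptions made for this illustration.

```python
def dfa_to_gnfa(states, alphabet, delta, start, accepting):
    """Return a dict mapping (q, q') to a label string for the GNFA of a DFA.
    delta is a total dict delta[(state, symbol)] -> state."""
    S, A = "q_start", "q_accept"           # assumed not to clash with DFA states
    label = {(S, start): "ε"}              # step 1: new start state
    for q in accepting:                    # step 2: new accept state
        label[(q, A)] = "ε"
    for q in states:                       # step 3: merge parallel edges with ∪
        for a in alphabet:
            key = (q, delta[(q, a)])
            label[key] = f"{label[key]}∪{a}" if key in label else a
    for q in [S, *states]:                 # step 4: fill missing edges with ∅
        for r in [*states, A]:
            label.setdefault((q, r), "∅")
    return label

# A hypothetical two-state DFA over {a, b} that accepts strings containing a b.
delta = {(1, "a"): 1, (1, "b"): 2, (2, "a"): 2, (2, "b"): 2}
gnfa = dfa_to_gnfa([1, 2], "ab", delta, start=1, accepting={2})
print(gnfa[(2, 2)], gnfa[(2, 1)], len(gnfa))   # a∪b ∅ 9
```

The table ends up with one entry for every pair in (Q − {q_accept}) × (Q − {q_start}), which is why the example has 3 × 3 = 9 labels.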


From GNFAs to regular expressions

I The conversion from GNFA to regular expression proceeds by combining nodes and labels in the graph. If a qrip exists such that:

I qi →R1 qrip,

I qrip →R2 qrip,

I qrip →R3 qj, and

I qi →R4 qj,

I then,

I Delete qrip and each edge above.

I Add edge qi →((R1)(R2)∗(R3) ∪ (R4)) qj.

I Do this for each qi and qj connected via qrip.

I Repeat the process until only two nodes exist, qstart and qaccept .

Let CONVERT(G) be the regexp obtained as a result of this process.
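The ripping procedure can be sketched in a few lines. This is an illustration under stated assumptions rather than a definitive implementation: labels are written in Python re syntax so the result can be tested, None stands for ∅, the empty string stands for ε, every label is assumed to be atomic or already parenthesized, and the state names S and A and the example GNFA are hypothetical.

```python
import re
from itertools import product

def union(r, s):
    """R ∪ S, using R ∪ ∅ = R."""
    if r is None:
        return s
    if s is None:
        return r
    return f"(?:{r}|{s})"

def concat(*parts):
    """Concatenation, using R∅ = ∅."""
    return None if any(p is None for p in parts) else "".join(parts)

def star(r):
    """R∗, using ∅∗ = ε∗ = ε."""
    return "" if r in (None, "") else f"(?:{r})*"

def convert(states, label, start="S", accept="A"):
    """Rip out the states one at a time and return the label from start to accept.
    label maps (q, q') to a pattern; pairs that are absent count as ∅."""
    label = dict(label)
    remaining = list(states)
    while remaining:
        rip = remaining.pop()
        loop = star(label.get((rip, rip)))                    # R2∗
        for qi in [start] + remaining:
            for qj in remaining + [accept]:
                via = concat(label.get((qi, rip)), loop, label.get((rip, qj)))
                label[(qi, qj)] = union(via, label.get((qi, qj)))   # R1 R2∗ R3 ∪ R4
    return label.get((start, accept))

# Hypothetical GNFA for "strings over {a, b} that contain a b".
gnfa = {("S", 1): "", (1, 1): "a", (1, 2): "b", (2, 2): "[ab]", (2, "A"): ""}
regex = convert([1, 2], gnfa)
print(regex)   # (?:a)*b(?:[ab])*
```

Different ripping orders generally give different but equivalent expressions; the order here (last state first) is an arbitrary choice.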


Generalized NFAs (Definition)

Definition

I A generalized nondeterministic finite automaton (GNFA) is a 5-tuple (Q, Σ, δ, qstart, qaccept):

I Q is a finite, nonempty set of states.

I Σ is a finite, nonempty alphabet.

I δ : (Q − {qaccept}) × (Q − {qstart}) → R is the transition function, where R is the set of regular expressions over Σ.

I qstart ∈ Q is the start state.

I qaccept is the unique accept state.

I The function δ identifies the label for edge (qi, qj), where qi ∈ Q − {qaccept} and qj ∈ Q − {qstart}.

I Here, qi can’t be the accept state, because no edge originates there.

I Here, qj can’t be the start state, because no edge ends there.


Language Recognition for GNFAs

Definition

Let G be a GNFA and w = w1w2 . . . wk a string, where each wi ∈ Σ∗. G accepts w iff there is a sequence of states q0, q1, . . . , qk such that

I q0 = qstart.

I qk = qaccept.

I for each i, wi ∈ L(Ri), where Ri = δ(qi−1, qi).

I We split w into w1w2 . . . wk, where each wi corresponds to a string generated by a regular expression on an edge.

I Specifically, wi is in the language indicated by the label from qi−1 to qi .
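The definition asks for some way of cutting w into pieces that follow a path of edges. A minimal sketch of that search follows, assuming edge labels are given in Python re syntax, that absent pairs are ∅ edges, and that the state names S and A are placeholders; it explores (state, position) pairs breadth-first so that ε-labeled cycles cannot loop forever.

```python
import re
from collections import deque

def gnfa_accepts(label, w, start="S", accept="A"):
    """True if w can be split as w1 w2 ... wk along a path from start to accept,
    with each piece matching the label of the edge it travels."""
    seen = {(start, 0)}                    # (state, number of symbols consumed)
    queue = deque(seen)
    while queue:
        q, i = queue.popleft()
        if (q, i) == (accept, len(w)):
            return True
        for (src, dst), pattern in label.items():
            if src != q:
                continue
            for j in range(i, len(w) + 1):             # try every next piece w[i:j]
                if re.fullmatch(pattern, w[i:j]) and (dst, j) not in seen:
                    seen.add((dst, j))
                    queue.append((dst, j))
    return False

# Hypothetical GNFA: S →(ab)∗ 1, 1 →aa 1, 1 →b A, i.e. the language (ab)∗(aa)∗b.
label = {("S", 1): "(?:ab)*", (1, 1): "aa", (1, "A"): "b"}
print(gnfa_accepts(label, "ababaab"), gnfa_accepts(label, "aba"))   # True False
```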


Equivalence between Regular Expressions and GNFAs

Proposition

For any GNFA G , CONVERT (G ) is equivalent to G .

Proof.

The proof proceeds by induction on the number of nodes in G .

Basis: If G has only 2 nodes, then they must be the distinct start and accept states, and the regular expression between them is CONVERT(G) and describes exactly the strings accepted by G.

Induction: Suppose the claim holds for GNFAs with k − 1 states and that G has k states (where k > 2). Since G has more than 2 states, it can be reduced. Let G′ be a GNFA obtained by removing a state qrip from G according to the procedure described earlier. Let δ′ be the transition function for G′.



Proposition


Proof, Continued.

I Let w = w1w2 . . . wn be a string accepted by G. Then there exists a sequence qstart, q1, q2, . . . , qaccept demonstrating that G accepts w. Note that for each i, wi ∈ L(Ri), where Ri = δ(qi−1, qi).

I State qrip is either in this sequence, or it’s not.

1. If not, then the sequence qstart, q1, q2, . . . , qaccept demonstrates that G′ accepts w, since for each qi and qi+1, δ′(qi, qi+1) = R ∪ S, where δ(qi, qi+1) = R and S is some other regular expression. Any string described by R is also described by R ∪ S.

2. If qrip is in the sequence qstart, q1, q2, . . . , qaccept, then the sequence with all occurrences of qrip removed constitutes an accepting computation path for G′. This is clear from the construction of G′: the pieces of w read on the way into qrip, around its loop, and out of it together form a string described by (R1)(R2)∗(R3), which is part of the new label between the neighboring states.



Proposition


Proof, Continued.

I [A similar argument in the opposite direction shows that if G′ accepts w, then G accepts w.]

I So, for any string w, G accepts w if and only if G′ does. That is, G′ and G are equivalent.

I By the inductive hypothesis, CONVERT(G′) and G′ are equivalent.

I Since CONVERT(G′) is CONVERT(G) (they return the same regular expression), it follows that G and CONVERT(G) are equivalent.



Given the previous proposition, the following holds:

Lemma

If a language is regular, then it is described by a regular expression.

Given this lemma and the previous lemma, the following theorem holds.

Theorem

A language is regular if and only if some regular expression describes it.


Converting DFAs to Regular Expressions

The examples in the next several slides indicate how DFAs can be converted into regular expressions.

Example 1 (Pg. 75): Convert the following 2-state DFA into a regular expression.



First we convert to a GNFA by adding a new start and accept state, each with appropriate ε edges.

I draw an ε-edge from the new qstart to the old start state.

I draw an ε-edge from each old accept state to qaccept .



I In the GNFA, we must ensure exactly 1 edge connects each pair from (Q − {qaccept}) × (Q − {qstart}).

I If multiple edges from qi to qj exist, combine them into a single edge using ∪.

I If no edge from qi to qj already exists, add an edge labeled ∅.

I Unofficially, the ∅ edges are typically omitted.



I After the GNFA is constructed, we begin removing nodes.

I Here, node 2 has been removed.



I Here, node 1 has been removed.

I Since the resulting machine has only 2 states, we stop.



Example 2 (Pg. 76): Convert the following 3-state DFA into a regular expression.



Add the new start state and accept state. Combine multiple edges between nodes, if needed.

Then start removing nodes (start with node 1)...



Node 1 removed.



Node 2 removed.



Node 3 removed. Multiple edges should be combined.



Node 3 removed. Multiple edges combined.


The Pumping Lemma

I Consider DFA M over Σ = {a, b, c, d} (with edges leading to a trap state omitted).

I M recognizes the language ab(cb)∗d .

I The computation sequence for abcbcbd is

1→a 2→b 3→c 2→b 3→c 2→b 3→d 4.

I A cycle exists in the graph.

I The pattern cb can be repeated any number of times (“pumped”), each time yielding another string in the language.
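The cycle can be found mechanically by running M and watching for the first state that repeats. The sketch below encodes the transitions read off the computation sequence above; the function names are hypothetical. Note that splitting at the first repeated state pumps the substring bc rather than cb; both correspond to the same cycle between states 2 and 3.

```python
# Transition table for M; missing entries stand for the omitted trap state.
DELTA = {(1, "a"): 2, (2, "b"): 3, (3, "c"): 2, (3, "d"): 4}
START, ACCEPT = 1, {4}

def run(w):
    """Return the list of states visited on w, or None if M reaches the trap state."""
    states = [START]
    for ch in w:
        nxt = DELTA.get((states[-1], ch))
        if nxt is None:
            return None
        states.append(nxt)
    return states

def accepts(w):
    states = run(w)
    return states is not None and states[-1] in ACCEPT

def split_at_first_repeat(w):
    """Split w = xyz, where y is the part read between the first two visits
    to the first state that repeats. Returns None if no state repeats."""
    states = run(w)
    if states is None:
        return None
    first_seen = {}
    for pos, q in enumerate(states):
        if q in first_seen:
            i = first_seen[q]
            return w[:i], w[i:pos], w[pos:]
        first_seen[q] = pos
    return None

x, y, z = split_at_first_repeat("abcbcbd")
print((x, y, z))                                        # ('a', 'bc', 'bcbd')
print([accepts(x + y * i + z) for i in range(4)])       # [True, True, True, True]
```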


The Pumping Lemma

I All regular languages have this property.

I Any string in the language of length at least p (the pumping length) has a nonempty substring that can be pumped.

Theorem

If A is a regular language, then there exists an integer p such that for all s ∈ A with |s| ≥ p, s may be divided into pieces s = xyz such that

1. for each i ≥ 0, xy^i z ∈ A,

2. |y| > 0, and

3. |xy| ≤ p.

I This theorem is useful when showing that a language is not regular.

I Technique: Use a proof by contradiction.

I Assume A is regular.

I Use the pumping lemma to show that a string both is and is not in A.

I Conclude that A is not regular.
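When using this technique it can help to write out the negation of the pumping property explicitly, since the order of the quantifiers dictates who chooses what. The following restatement is a sketch in LaTeX notation, derived from the theorem above by negating its conclusion:

```latex
% A fails the pumping property (and hence is not regular) if:
\forall p \;\; \exists s \in A \text{ with } |s| \ge p \;\;
\forall x, y, z \text{ with } s = xyz,\ |y| > 0,\ |xy| \le p \;\;
\exists i \ge 0 : \; x y^{i} z \notin A
```

Read as a game: p is arbitrary, the prover picks s (usually depending on p), every admissible split x, y, z must be handled, and for each split the prover picks a single i that pushes the pumped string out of A.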


The Pumping Lemma

Example

The language B = {0^n 1^n | n ≥ 0} is not regular.

Proof.

??


The Pumping Lemma

Example


Proof.

Suppose that B is regular and so has pumping length p. Let w be any string of B of length at least p. Then w = xyz such that

1. for each i ≥ 0, xy^i z ∈ B,

2. |y| > 0, and

3. |xy| ≤ p.

It cannot be that y consists solely of 1s or solely of 0s, since xy^0 z = xz ∈ B and the numbers of 0s and 1s must remain equal. So y must contain as many 0s as 1s, and because of the definition of B it must be of the form 0^m 1^m, where m > 0, and x must consist solely of 0s and z must consist solely of 1s. However, since xy^2 z ∈ B, and y has the form 0^m 1^m, it must be that x 0^m 1^m 0^m 1^m z ∈ B. But this string clearly is not of the form 0^n 1^n and so cannot be in B. A contradiction! And so B cannot be regular.


The Pumping Lemma

Example


Alternative Proof.

Suppose that B is regular and so has pumping length p. Let w be the string 0^p 1^p, which is clearly in B. Since |w| ≥ p, the pumping lemma gives a split w = xyz. Given that |xy| ≤ p, it must be that x and y consist solely of 0s. By the pumping lemma, xyyz ∈ B. However, since |y| > 0, this string has more 0s than 1s and so can’t be in B. A contradiction! And so B cannot be regular.
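The argument can be spot-checked by brute force for small candidate values of p. This is only an illustration and not a substitute for the proof, which has to work for every p; the function names and the cutoff max_i are assumptions of the sketch.

```python
def in_B(w):
    """Membership test for B = {0^n 1^n | n >= 0}."""
    n = len(w) // 2
    return w == "0" * n + "1" * n

def splits(s, p):
    """Yield every (x, y, z) with s = xyz, |y| > 0 and |xy| <= p."""
    for end in range(1, min(p, len(s)) + 1):
        for begin in range(end):
            yield s[:begin], s[begin:end], s[end:]

def survives(s, p, member, max_i=3):
    """True if some admissible split keeps x y^i z in the language for i = 0..max_i."""
    return any(all(member(x + y * i + z) for i in range(max_i + 1))
               for x, y, z in splits(s, p))

for p in range(1, 8):
    s = "0" * p + "1" * p
    print(p, survives(s, p, in_B))   # False for every p tried
```

Each admissible split puts y inside the leading block of 0s, so pumping changes the number of 0s without touching the 1s, which is exactly the step the proof relies on.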


The Pumping Lemma

Example

The language C = {w | w has an equal number of 0s and 1s} is not regular.

Proof.

??


The Pumping Lemma

Example


Proof.

Suppose that C is regular and so has pumping length p. Let w be the string 0^p 1^p. Since |w| ≥ p, w = xyz such that

1. for each i ≥ 0, xy^i z ∈ C,

2. |y| > 0, and

3. |xy| ≤ p.

Given the third condition above, since w = 0^p 1^p, it must be that y consists solely of 0s. Given the first condition of the pumping lemma, it must be that xz ∈ C. However, since y consists solely of 0s, xz clearly has more 1s than 0s and so can’t be in language C. A contradiction! And so C cannot be regular.


The Pumping Lemma

Example


Alternative Proof.

Regular languages are closed under intersection (the proof of this proceeds similarly to the proof that they are closed under union). Given this result, if we assume C is regular, then C ∩ 0∗1∗ is regular (0∗1∗ is clearly regular). However, C ∩ 0∗1∗ = {0^n 1^n | n ≥ 0}, which we just showed to be nonregular. And so C cannot be regular, either.
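The closure property used here is usually shown with a product construction: run two DFAs side by side and accept when both accept. The sketch below assumes that construction (the slides only assert the result) and demonstrates it on two regular languages chosen for illustration, 0∗1∗ and "even number of 0s". C itself cannot appear, since the whole point is that it has no DFA.

```python
from itertools import product

def intersect(d1, d2):
    """Product construction. Each DFA is (states, alphabet, delta, start, accepting)
    with a total transition dict delta[(state, symbol)] -> state."""
    Q1, sigma, t1, s1, F1 = d1
    Q2, _, t2, s2, F2 = d2
    states = {(p, q) for p in Q1 for q in Q2}
    delta = {((p, q), a): (t1[(p, a)], t2[(q, a)]) for (p, q) in states for a in sigma}
    accepting = {(p, q) for (p, q) in states if p in F1 and q in F2}
    return states, sigma, delta, (s1, s2), accepting

def accepts(dfa, w):
    _, _, delta, q, accepting = dfa
    for ch in w:
        q = delta[(q, ch)]
    return q in accepting

# 0∗1∗: state "z" while reading 0s, "o" while reading 1s, "x" is the trap state.
zeros_then_ones = (
    {"z", "o", "x"}, "01",
    {("z", "0"): "z", ("z", "1"): "o", ("o", "1"): "o",
     ("o", "0"): "x", ("x", "0"): "x", ("x", "1"): "x"},
    "z", {"z", "o"},
)
# Even number of 0s: "e" for even so far, "d" for odd.
even_zeros = (
    {"e", "d"}, "01",
    {("e", "0"): "d", ("d", "0"): "e", ("e", "1"): "e", ("d", "1"): "d"},
    "e", {"e"},
)

both = intersect(zeros_then_ones, even_zeros)
print(accepts(both, "0011"), accepts(both, "011"), accepts(both, "0100"))
# True False False
```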


The Pumping Lemma

Example

The language F = {ww | w ∈ {0, 1}∗} is not regular.

Proof.

??


The Pumping Lemma

Example

The language F = {ww | w ∈ {0, 1}∗} is not regular.

Proof.

Suppose F is regular, with pumping length p, and let w = 0^p 1. As such, s = ww = 0^p 1 0^p 1 ∈ F. Since |s| > p, s can be split into s = xyz such that all conditions of the pumping lemma apply. From the 3rd condition, |xy| ≤ p and so x must have the form 0^i for some i ≥ 0, y must have the form 0^k for some k > 0, and z must have the form 0^j 1 0^i 0^k 0^j 1 for some j ≥ 0. So, s = 0^i 0^k 0^j 1 0^i 0^k 0^j 1. Since y can be pumped, 0^i 0^2k 0^j 1 0^i 0^k 0^j 1 ∈ F. However, this is clearly not of the form ww: it contains exactly two 1s, so each half would have to end in a 1, giving halves 0^(p+k) 1 and 0^p 1, which differ because k > 0. So it cannot be in F. A contradiction, and so F cannot be regular.


The Pumping Lemma

Example

1. The language A1 = {0^i 1^j | i > j} is not regular.

2. The language A2 = {w | w is a palindrome} is not regular. (A palindrome is a string that reads the same forward and backward.)

3. The language A3 = {0^n 1^n 2^n | n ≥ 0} is not regular.

4. The language A4 = {a^(2^n) | n ≥ 0} is not regular. (Here, a^(2^n) means a string of 2^n a’s.)

Proof.

??

