1 regular languages, regular operations september 11, 2001


1

Regular Languages, Regular Operations

September 11, 2001

2

Agenda

Today:

Regular languages
  Finite languages are regular

Regular operations on languages
  Union (∪)
  Concatenation (∘)
  Kleene star (*)

For next time: Read 1.3 and handout on minimization

Thursday, 9/20 (revised): HW1 collected

3

Definition of Regular Language

Recall the definition of a regular language:

DEF: The language accepted by an FA M is the set of all strings which are accepted by M and is denoted by L(M). A language is regular if it is accepted by some FA.

Would like to understand what types of languages are regular. Languages of this type are amenable to super-fast recognition of their elements.

Would be nice to know for example, which of the following are regular:

4

Language Examples

Unary prime numbers:
{ 11, 111, 11111, 1111111, 11111111111, … }
= { 1^2, 1^3, 1^5, 1^7, 1^11, 1^13, … }
= { 1^p | p is a prime number }

Unary squares:
{ ε, 1, 1^4, 1^9, 1^16, 1^25, 1^36, … }
= { 1^n | n is a perfect square }

Palindromic bit strings:
{ ε, 0, 1, 00, 11, 000, 010, 101, 111, … }
= { x ∈ {0,1}* | x = x^R }

Will explore whether or not these are regular in future lectures.

5

Finite Languages

All the previous examples had the following property in common: infinite cardinality

NOTE: The strings which made up the language were finite (as they always will be in this course); however, the collection of such strings was infinite.

Before looking at infinite languages, should definitely look at finite languages.

6

Languages of Cardinality 1

Q: Is the singleton language containing one string regular? For example, is

{ banana }

regular?

7

Languages of Cardinality 1

A: Yes.

Q: What's wrong with this example?

8

Languages of Cardinality 1

A: Nothing, really. This is an example of a nondeterministic FA. This turns out to be the most concise way to encapsulate the language { banana }.

But we will deal with nondeterminism in coming lectures. So:

Q: Is there a way of fixing this and making it deterministic?

9

Languages of Cardinality 1

A: Yes, just add a fail state q7; i.e., add a state that sucks in all strings different from “banana” for all eternity, unless they happen to be the “banana” prefixes {ε, b, ba, ban, bana, banan}.
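The fail-state idea can be sketched in a few lines of Python. This is only an illustration, assuming states q0 through q6 are numbered by how many letters of “banana” have been read so far and that q7 is the fail state named above; the function names `step` and `accepts` are made up for this sketch.

```python
WORD = "banana"
FAIL = len(WORD) + 1  # q7: the fail (sink) state


def step(state, ch):
    """One DFA transition: advance along "banana" or fall into the sink."""
    if state < len(WORD) and ch == WORD[state]:
        return state + 1
    return FAIL  # wrong letter, extra letter, or already in the sink


def accepts(s):
    state = 0  # q0: nothing read yet
    for ch in s:
        state = step(state, ch)
    return state == len(WORD)  # q6 is the only accept state


for s in ["banana", "ban", "bananas", "nab", ""]:
    print(repr(s), accepts(s))
```

Proper prefixes such as “ban” stop in a non-accepting state without ever entering q7, while anything that deviates from “banana” is trapped in q7 for good.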

11

Two Strings

Q: How about two strings? For example

{ banana, nab } ?

12

Two Strings

A: Just add another route:

13

Arbitrary Finite Number of Strings

Q1: How about more? For example

{ banana, nab, ban, babba } ?

Q2: Or fewer (the empty set):

Ø = {} ?

14

Arbitrary Finite Number of Strings

A1:

15

Arbitrary Finite Number of Strings: Empty Language

A2: Build a 1-state automaton whose set of accept states F is empty!

16

Arbitrary Finite Number of Strings

THM: All finite languages are regular.

Proof: Can always construct a tree whose leaves are word endings. In our example the tree is:

root
 ├─ b ─ a ─┬─ n* ─ a ─ n ─ a*     (ban, banana)
 │         └─ b ─ b ─ a*          (babba)
 └─ n ─ a ─ b*                    (nab)

(* marks the end of a word)

Now make word endings into accept states, add a fail sink-state and add links to the fail state to finish the construction. ∎
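The tree construction in the proof can be sketched in Python as follows. This is a minimal illustration, not the lecture's own code: the names `build_trie_dfa` and `accepts` are hypothetical, state 0 plays the role of the root, and the fail sink-state is represented implicitly by a missing transition.

```python
def build_trie_dfa(words):
    """Return (transitions, accept_states) for a tree-shaped DFA; state 0 is the root."""
    transitions = {}  # (state, char) -> state
    accept = set()
    next_state = 1
    for word in words:
        state = 0
        for ch in word:
            if (state, ch) not in transitions:
                transitions[(state, ch)] = next_state  # grow a new branch
                next_state += 1
            state = transitions[(state, ch)]
        accept.add(state)  # the word ending becomes an accept state
    return transitions, accept


def accepts(dfa, s):
    transitions, accept = dfa
    state = 0
    for ch in s:
        state = transitions.get((state, ch))  # a missing edge stands for the fail sink
        if state is None:
            return False
    return state in accept


dfa = build_trie_dfa(["banana", "nab", "ban", "babba"])
for s in ["banana", "ban", "bana", "babba", "nab", "nabs"]:
    print(repr(s), accepts(dfa, s))
```

Calling `build_trie_dfa([])` gives a root with no accept states, which matches the 1-state automaton for the empty language.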

17

Infinite Cardinality

Q: Are all regular languages finite?

18

Infinite Cardinality

A: No! Many infinite languages are regular.

Common Mistake 1: The strings of regular languages are finite, therefore the regular languages must be finite.

Common Mistake 2: Regular languages are, by definition, accepted by finite automata, therefore regular languages are finite.

Q: Give an example of an infinite but regular language.

19

Infinite Cardinality

A: Simplest example is Σ*

strings with an even number of b's

many, many more

Home exercise: think of a criterion for non-finiteness
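To see why the even-number-of-b's language is regular even though it is infinite, here is a small sketch of a two-state recognizer. It assumes that every symbol other than b simply leaves the state unchanged; the function name `even_bs` is made up for this illustration.

```python
def even_bs(s):
    """Two-state DFA: state 0 = even number of b's read so far, state 1 = odd."""
    state = 0
    for ch in s:
        if ch == "b":
            state = 1 - state  # each b flips the parity
    return state == 0  # accept exactly when the count of b's is even


for s in ["", "a", "b", "abba", "bab", "bbb"]:
    print(repr(s), even_bs(s))
```

Two states suffice no matter how long the input is, which is the point: a finite automaton can accept infinitely many strings.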

20

Regular Operations

You may have come across the regular operations when doing advanced searches utilizing programs such as emacs, egrep, perl, python, etc. There are three basic operations we will work with:

1. Union
2. Concatenation
3. Kleene-star

And a fourth definable in terms of the previous:

4. Kleene-plus

21

Regular Operations – Summarizing Table

Operation       Symbol   UNIX version       Meaning

Union           ∪        |                  match one of the patterns

Concatenation   ∘        implicit in UNIX   match patterns in sequence

Kleene-star     *        *                  match pattern 0 or more times

Kleene-plus     +        +                  match pattern 1 or more times

22

Regular operations - Union

UNIX: to search for all lines containing vowels in a text one could use the command

egrep -i 'a|e|i|o|u'

Here the pattern “vowel ” is matched by any line containing one of a, e, i, o or u.

Q: What is a string pattern?

23

String Patterns

A: A good way to define a pattern is as a set of strings, i.e. a language. The language for a given pattern is the set of all strings satisfying the predicate of the pattern.

EG: vowel-pattern = { the set of strings which contain at least one of: a e i o u }

24

UNIX patterns vs. Computability patterns

In UNIX, a pattern is implicitly assumed to occur as a substring of the matched strings.

In our course, however, a pattern needs to specify the whole string, and not just a substring.
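The difference between the two conventions can be illustrated with Python's `re` module, where `re.search` looks for the pattern anywhere in the string (the UNIX convention) and `re.fullmatch` requires the pattern to describe the whole string (the convention of this course). The helper names `unix_match` and `whole_match` are made up for this sketch.

```python
import re

VOWEL = r"a|e|i|o|u"


def unix_match(line):
    """UNIX-style: the pattern may occur as a substring of the line."""
    return re.search(VOWEL, line, re.IGNORECASE) is not None


def whole_match(s):
    """Computability-style: the pattern must specify the whole string."""
    return re.fullmatch(VOWEL, s, re.IGNORECASE) is not None


for s in ["banana", "a", "rhythm"]:
    print(repr(s), unix_match(s), whole_match(s))
```

Under the whole-string reading the vowel pattern describes just the five one-letter strings (ten if case is ignored), whereas under the substring reading it describes every string containing a vowel.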

25

Regular operations - Union

Computability: union is exactly what we expect. If you have patterns

A = {aardvark}, B = {bobcat}, C = {chimpanzee}

union the patterns together to get

A ∪ B ∪ C = {aardvark, bobcat, chimpanzee}

26

Regular operations - Concatenation

UNIX: to search for all consecutive double occurrences of vowels, use:

egrep -i '(a|e|i|o|u)(a|e|i|o|u)'

Here the pattern “vowel ” has been repeated. Parentheses have been introduced to specify where exactly in the pattern the concatenation is occurring.

27

Regular operations - Concatenation

Computability. Consider the previous result:

L = {aardvark, bobcat, chimpanzee}

Q: What language results when we concatenate L with itself obtaining

LL ?

28

Regular operations - Concatenation

A: LL = {aardvark, bobcat, chimpanzee}{aardvark, bobcat, chimpanzee}

= {aardvarkaardvark, aardvarkbobcat, aardvarkchimpanzee, bobcataardvark, bobcatbobcat, bobcatchimpanzee, chimpanzeeaardvark, chimpanzeebobcat, chimpanzeechimpanzee}

Q1: What is L{ε} ?

Q2: What is LØ ?

29

Algebra of Languages

A1: L{ε} = L. In general, {ε} is the identity in the “algebra” of languages. I.e., if we think of concatenation as being like multiplication, {ε} acts like the number 1.

A2: LØ = Ø. Opposite to {ε}, Ø acts like the number zero, obliterating everything it is concatenated with.

Note: We can carry on the analogy between numbers and languages. Addition becomes union, multiplication becomes concatenation. This forms a so-called “algebra”.
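The two identities are easy to check on finite languages by modelling a language as a Python set of strings, with the empty string standing for ε. This is just a sketch; the helper name `concat` is an assumption of the example.

```python
def concat(A, B):
    """Concatenation of two languages: every string of A followed by every string of B."""
    return {x + y for x in A for y in B}


L = {"aardvark", "bobcat", "chimpanzee"}
EPS = {""}     # the language {ε}
EMPTY = set()  # the language Ø

print(concat(L, EPS) == L)        # {ε} acts like the number 1
print(concat(L, EMPTY) == EMPTY)  # Ø acts like the number 0
print(len(concat(L, L)))          # number of strings in LL
print(L | EMPTY == L)             # union with Ø changes nothing, like adding 0
```

Concatenating with Ø yields Ø because there is no string in Ø to pair with, so the set comprehension produces nothing.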

30

Regular operations – Kleene-*

UNIX: search for lines consisting purely of vowels (including the empty line):

egrep -i '^(a|e|i|o|u)*$'

NOTE: ^ and $ are special symbols in UNIX regular expressions which respectively anchor the pattern at the beginning and end of a line. The trick above can be used to convert any Computability regular expression into an equivalent UNIX form.

31

Regular operations – Kleene-*

Computability: Suppose we have a language

B = { ba, na }

Q: What is the language B * ?

32

Regular operations – Kleene-*

A: B* = { ba, na }*
= { ε, ba, na, baba, bana, naba, nana, bababa, babana, banaba, banana, nababa, nabana, nanaba, nanana, babababa, bababana, … }
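Since B* is infinite, a program can only list it up to some length bound. The sketch below enumerates all strings of B* no longer than `max_len` by repeatedly appending strings of B; the function name `star_up_to` and the bound are assumptions of this illustration.

```python
def star_up_to(B, max_len):
    """All strings of B* with length at most max_len."""
    result = {""}    # ε is always in B*
    frontier = {""}  # strings found in the latest round
    while frontier:
        # extend each newly found string by one more string from B
        frontier = {
            x + y for x in frontier for y in B if len(x + y) <= max_len
        } - result
        result |= frontier
    return result


B = {"ba", "na"}
for s in sorted(star_up_to(B, 4), key=lambda w: (len(w), w)):
    print(repr(s))
```

Dropping the empty string from the result gives the corresponding bounded portion of B+, provided ε is not itself in B.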

33

Regular operations – Kleene-+

Kleene-+ is just like Kleene-* except that the pattern is forced to occur at least once.

UNIX: search for lines consisting purely of vowels (not including the empty line):

egrep -i '^(a|e|i|o|u)+$'

Computability: B+ = { ba, na }+
= { ba, na, baba, bana, naba, nana, bababa, babana, banaba, banana, nababa, nabana, nanaba, nanana, babababa, bababana, … }

34

Generating the Regular Languages

The real reason that regular languages are called regular is the following:

THM: The regular languages are all those languages which can be generated starting from the finite languages by applying the regular operations.

This will be proved in the coming lectures.

Q: Can we start with even more basic languages than arbitrary finite languages?

35

Generating the Regular Languages

A: Yes. We can start with languages consisting of single strings which are themselves just a single character. These are the “atomic” regular languages.

EG: To generate the finite language L = { banana, nab }

we can start with the atomic languages A = {a}, B = {b}, N = {n}.

Then we can express L as:

L = (B∘A∘N∘A∘N∘A) ∪ (N∘A∘B)
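This construction can be replayed with Python sets, using one set per atomic language. It is only a sketch: the helper `concat` is an assumed name, and `functools.reduce` is used to chain the concatenations from left to right.

```python
from functools import reduce


def concat(X, Y):
    """Concatenation of two languages given as sets of strings."""
    return {x + y for x in X for y in Y}


# the atomic languages
A, B, N = {"a"}, {"b"}, {"n"}

banana = reduce(concat, [B, A, N, A, N, A])  # B, A, N, A, N, A concatenated in order
nab = reduce(concat, [N, A, B])              # N, A, B concatenated in order
L = banana | nab                             # union of the two

print(L == {"banana", "nab"})
```

Only union and concatenation are needed here because L is finite; Kleene-star comes into play once infinite languages have to be generated.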

36

Blackboard Exercises

Express the DFA patterns from the previous board-exercises using regular operations in both UNIX-style and Computability-style.