regular expressions


Upload: rahmatalam

Post on 18-Nov-2014


TRANSCRIPT

Page 1: Regular Expressions

Regular Expressions

Page 2: Regular Expressions

Regular Expression

• A regular expression (RE) is defined inductively:
  a = an ordinary character from Σ
  ε = the empty string

Page 3: Regular Expressions

Regular Expression

R|S = either R or S
RS = R followed by S (concatenation)
R* = concatenation of R zero or more times (R* = ε|R|RR|RRR|...)

Page 4: Regular Expressions

RE Extensions

R? = ε | R (zero or one R)
R+ = RR* (one or more R)

Page 5: Regular Expressions

RE Extensions

[abc] = a|b|c (any of the listed characters)
[a-z] = a|b|...|z (range)
[^ab] = c|d|... (anything but 'a' or 'b')

Page 6: Regular Expressions

Regular Expression

RE        Strings in L(R)
a         "a"
ab        "ab"
a|b       "a" "b"
(ab)*     "" "ab" "abab" ...
(a|ε)b    "ab" "b"
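The table above can be checked mechanically. The following is a minimal sketch using Python's standard `re` module; the helper name `in_language` and the `examples` dictionary are assumptions made for illustration, and (a|ε)b is written as `(a|)b` because an empty alternative matches the empty string.

```python
import re

# Each RE from the table, in Python regex syntax, with the strings the
# table lists as members of L(R). (a|ε)b is written (a|)b.
examples = {
    "a": ["a"],
    "ab": ["ab"],
    "a|b": ["a", "b"],
    "(ab)*": ["", "ab", "abab"],
    "(a|)b": ["ab", "b"],
}

def in_language(pattern: str, s: str) -> bool:
    """Hypothetical helper: True if the whole string s is in L(pattern)."""
    return re.fullmatch(pattern, s) is not None

# Every listed string is a member of the corresponding language.
for pattern, strings in examples.items():
    for s in strings:
        assert in_language(pattern, s)

# Strings outside the languages are rejected.
assert not in_language("(ab)*", "aba")
assert not in_language("(a|)b", "a")
```

`re.fullmatch` is used rather than `re.search` because membership in L(R) is a statement about the whole string, not about some substring.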

Page 7: Regular Expressions

Example: integers

• integer: a non-empty string of digits
• digit = '0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9'
• integer = digit digit*

Page 8: Regular Expressions

Example: identifiers

• identifier: a string of letters or digits, starting with a letter
• C identifier: [a-zA-Z_][a-zA-Z0-9_]*
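The integer and C identifier patterns translate directly into Python's `re` syntax. This is a small sketch; the names `INTEGER`, `C_IDENTIFIER`, `is_integer`, and `is_c_identifier` are assumptions chosen for illustration.

```python
import re

# integer = digit digit*  (written with [0-9] so only ASCII digits match;
# \d would also accept other Unicode digits in Python 3)
INTEGER = re.compile(r"[0-9][0-9]*")

# C identifier, exactly as given on the slide
C_IDENTIFIER = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")

def is_integer(s: str) -> bool:
    """Hypothetical helper: True if s is a non-empty string of digits."""
    return INTEGER.fullmatch(s) is not None

def is_c_identifier(s: str) -> bool:
    """Hypothetical helper: True if s has the shape of a C identifier."""
    return C_IDENTIFIER.fullmatch(s) is not None
```

For example, `is_integer("007")` and `is_c_identifier("_tmp1")` hold, while `is_integer("")` and `is_c_identifier("1abc")` do not. The pattern only checks the shape of an identifier; it does not exclude C keywords.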

Page 9: Regular Expressions

Regular Definitions

• Writing a regular expression for some languages can be difficult, because the expression can become quite complex. In those cases, we may use regular definitions.

• We can give names to regular expressions, and we can use these names as symbols to define other regular expressions.

• A regular definition is a sequence of definitions of the form:

  d1 → r1
  d2 → r2
  ...
  dn → rn

  where each di is a distinct name and each ri is a regular expression over the symbols in Σ ∪ {d1, d2, ..., di-1}.

Page 10: Regular Expressions

Specification of Patterns for Tokens: Regular Definitions

• Example:

  letter → A | B | … | Z | a | b | … | z
  digit → 0 | 1 | … | 9
  id → letter ( letter | digit )*
  digits → digit digit*

Page 11: Regular Expressions

Regular Definitions (cont.)

• Ex: Identifiers in Pascal

  letter → A | B | ... | Z | a | b | ... | z
  digit → 0 | 1 | ... | 9
  id → letter (letter | digit)*

  – If we try to write the regular expression representing identifiers without using regular definitions, that regular expression will be complex:

    (A|...|Z|a|...|z) ( (A|...|Z|a|...|z) | (0|...|9) )*

• Ex: Unsigned numbers in Pascal

  digit → 0 | 1 | ... | 9
  digits → digit+
  opt-fraction → ( . digits )?
  opt-exponent → ( E (+|-)? digits )?
  unsigned-num → digits opt-fraction opt-exponent
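A regular definition can be turned into a single regular expression by substituting earlier names into later bodies. The sketch below does this for the unsigned-number example with Python's `re` module. The brace placeholder syntax, the underscore spelling of the names, and the function `expand` are assumptions of this sketch, not notation from the slides.

```python
import re

# Regular definitions in order; each body may use only earlier names.
# A name is referenced as {name} (an assumption of this sketch).
definitions = [
    ("digit", r"[0-9]"),
    ("digits", r"{digit}+"),
    ("opt_fraction", r"(\.{digits})?"),
    ("opt_exponent", r"(E[+-]?{digits})?"),
    ("unsigned_num", r"{digits}{opt_fraction}{opt_exponent}"),
]

def expand(defs):
    """Substitute every earlier definition into each later body."""
    expanded = {}
    for name, body in defs:
        for earlier, regex in expanded.items():
            # Wrap in a non-capturing group so operators such as + and ?
            # apply to the whole substituted expression.
            body = body.replace("{" + earlier + "}", "(?:" + regex + ")")
        expanded[name] = body
    return expanded

patterns = expand(definitions)
unsigned_num = re.compile(patterns["unsigned_num"])

def is_unsigned_num(s: str) -> bool:
    """Hypothetical helper: True if s matches unsigned-num as a whole."""
    return unsigned_num.fullmatch(s) is not None
```

With these definitions, `is_unsigned_num("6.02E+23")` holds, while `is_unsigned_num("1.")` does not, because opt-fraction requires at least one digit after the point.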

Page 12: Regular Expressions

Specification of Patterns for Tokens: Notational Shorthand

• The following shorthands are often used:
  – + one or more instances
  – ? zero or one instance

  r+ = rr*
  r? = r | ε
  [a-z] = a | b | c | … | z

• Examples:

  digit → [0-9]
  num → digit+ ( . digit+ )? ( E (+|-)? digit+ )?

Page 13: Regular Expressions

Definition

• For primitive regular expressions:

  L(∅) = ∅
  L(ε) = {ε}
  L(a) = {a}

Page 14: Regular Expressions

Definition (continued)

• For regular expressions r1 and r2:

  L(r1 + r2) = L(r1) ∪ L(r2)
  L(r1 · r2) = L(r1) L(r2)
  L(r1*) = (L(r1))*
  L((r1)) = L(r1)

Page 15: Regular Expressions

Concatenation of Languages

• If L1 and L2 are languages, we can define the concatenation
  L1L2 = {w | w = xy, x ∈ L1, y ∈ L2}

• Examples:
  – {ab, ba}{cd, dc} = {abcd, abdc, bacd, badc}
  – Ø{ab} = Ø
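For finite languages, the concatenation defined above is a one-line set comprehension. This is a minimal sketch in Python; the function name `concat` is an assumption, and languages are represented as Python sets of strings.

```python
def concat(l1, l2):
    """Concatenation L1L2 = {xy | x in L1, y in L2} for finite languages."""
    return {x + y for x in l1 for y in l2}

# The two examples from the slide.
assert concat({"ab", "ba"}, {"cd", "dc"}) == {"abcd", "abdc", "bacd", "badc"}
assert concat(set(), {"ab"}) == set()
```

The second assertion shows why Ø{ab} = Ø: with no x to choose from L1, no string xy can be formed.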

Page 16: Regular Expressions

Kleene Closure

• L* = ∪ (i = 0 to ∞) L^i = L^0 ∪ L^1 ∪ L^2 ∪ …
• Examples:
  – {ab, ba}* = {ε, ab, ba, abab, abba, …}
  – Ø* = {ε}
  – {ε}* = {ε}
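L* is usually infinite, so it cannot be enumerated in full, but the part of it up to a given length can. The sketch below computes that finite part for a finite language; the function name `kleene_up_to` and the length bound `max_len` are assumptions introduced for illustration.

```python
def kleene_up_to(language, max_len):
    """All strings of L* whose length is at most max_len."""
    result = {""}      # L^0 = {ε}, so ε is always in L*
    frontier = {""}    # strings found in the previous round
    while frontier:
        # Extend each new string by one more string from L,
        # keeping only results that are short enough and not seen before.
        frontier = {
            x + y
            for x in frontier
            for y in language
            if len(x + y) <= max_len
        } - result
        result |= frontier
    return result

# The examples from the slide, truncated to length 4.
assert kleene_up_to({"ab", "ba"}, 4) == {
    "", "ab", "ba", "abab", "abba", "baab", "baba"
}
assert kleene_up_to(set(), 4) == {""}    # Ø* = {ε}
assert kleene_up_to({""}, 4) == {""}     # {ε}* = {ε}
```

The loop stops as soon as a round produces no new string, which is what makes the Ø and {ε} cases terminate immediately.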

Page 17: Regular Expressions

Example

• Regular expression r = (0+1)* 00 (0+1)*

  L(r) = { all strings with at least two consecutive 0 }

Page 18: Regular Expressions

Example

• Regular expression r = (1+01)* (0+ε)

  L(r) = { all strings without two consecutive 0 }

Page 19: Regular Expressions

Equivalent Regular Expressions

• Definition:

• Regular expressions r1 and r2 are equivalent if L(r1) = L(r2)

Page 20: Regular Expressions

Example

• L = { all strings without two consecutive 0 }

  r1 = (1+01)* (0+ε)
  r2 = (1*011*)* (0+ε) + 1* (0+ε)

  L(r1) = L(r2) = L, so r1 and r2 are equivalent regular expressions.
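Equivalence of two expressions can be spot-checked by comparing them on every string up to some length. The sketch below does this for r1 = (1+01)*(0+ε) and r2 = (1*011*)*(0+ε) + 1*(0+ε), written in Python `re` syntax, where the slide's + (union) becomes `|` and ε becomes an empty alternative. The names `R1`, `R2`, `strings_up_to`, and `agree_up_to` are assumptions. A bounded check like this is evidence, not a proof of equivalence.

```python
import re
from itertools import product

R1 = re.compile(r"(1|01)*(0|)")
R2 = re.compile(r"((1*011*)*(0|))|(1*(0|))")

def strings_up_to(n, alphabet="01"):
    """Yield every string over the alphabet of length 0..n."""
    for length in range(n + 1):
        for chars in product(alphabet, repeat=length):
            yield "".join(chars)

def agree_up_to(n):
    """True if R1, R2 and the property 'no 00' agree on all strings up to length n."""
    return all(
        (R1.fullmatch(w) is not None)
        == (R2.fullmatch(w) is not None)
        == ("00" not in w)
        for w in strings_up_to(n)
    )

assert agree_up_to(10)
```

The second term of r2, 1*(0+ε), is needed for strings such as "1" and "10", which contain no occurrence of "01" and so cannot be built from the blocks 1*011*.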

Page 21: Regular Expressions

Assignment

• Σ = {0, 1}
• What is the language for
  – 0*1*
• What is the regular expression for
  – {w | w has at least one 1}
  – {w | w starts and ends with the same symbol}
  – {w | |w| 5}
  – {w | every 3rd position of w is 1}
  – L+ = L^1 ∪ L^2 ∪ …
  – L? (means an optional L)

Page 22: Regular Expressions

Regular Expressions
and
Regular Languages

Page 23: Regular Expressions

Theorem

Languages Generated by Regular Expressions = Regular Languages

Page 24: Regular Expressions

Standard Representations of Regular Languages

Regular Languages

FAs
NFAs
Regular Expressions

Page 25: Regular Expressions

Elementary Questions

about

Regular Languages

Page 26: Regular Expressions

Membership Question

Question: Given a regular language L and a string w, how can we check if w ∈ L?

Answer: Take the DFA that accepts L and check if w is accepted.
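The membership check amounts to running the DFA on w and looking at the state where it stops. The sketch below assumes a DFA stored as a Python dictionary with keys `start`, `finals`, and `delta`; this representation, the function name `accepts`, and the example automaton `NO_00` (for strings over {0, 1} without two consecutive 0s) are all assumptions made for illustration.

```python
def accepts(dfa, w):
    """Run the DFA on w and report whether it ends in a final state."""
    state = dfa["start"]
    for symbol in w:
        # delta is total here: every (state, symbol) pair has a target.
        state = dfa["delta"][(state, symbol)]
    return state in dfa["finals"]

# Hypothetical example DFA: strings without two consecutive 0s.
# q0: last symbol was not 0, q1: last symbol was 0, dead: "00" was seen.
NO_00 = {
    "start": "q0",
    "finals": {"q0", "q1"},
    "delta": {
        ("q0", "0"): "q1", ("q0", "1"): "q0",
        ("q1", "0"): "dead", ("q1", "1"): "q0",
        ("dead", "0"): "dead", ("dead", "1"): "dead",
    },
}

assert accepts(NO_00, "0101")
assert not accepts(NO_00, "1001")
```

The check takes one table lookup per input symbol, so its running time is linear in the length of w.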

Page 27: Regular Expressions

[Figure: two DFA diagrams, one for the case w ∈ L and one for the case w ∉ L]

Page 28: Regular Expressions

Question: Given a regular language L, how can we check if L is empty (L = ∅)?

Answer: Take the DFA that accepts L. Check if there is any path from the initial state to a final state. L is empty exactly when no such path exists.
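The emptiness check is a graph reachability problem on the DFA's transition graph. The following sketch uses breadth-first search; the function name `is_empty` and the representation of the transition function as a dictionary from (state, symbol) pairs to states are assumptions of this example.

```python
from collections import deque

def is_empty(start, finals, delta):
    """True if no final state is reachable from the initial state."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state in finals:
            return False          # found a path to a final state
        for (source, _symbol), target in delta.items():
            if source == state and target not in seen:
                seen.add(target)
                queue.append(target)
    return True

# Hypothetical DFA whose final state q1 is reachable: L is not empty.
reachable_final = {("q0", "a"): "q1", ("q1", "a"): "q1"}
assert not is_empty("q0", {"q1"}, reachable_final)

# Hypothetical DFA whose final state q1 cannot be reached from q0: L is empty.
unreachable_final = {("q0", "a"): "q0", ("q1", "a"): "q1"}
assert is_empty("q0", {"q1"}, unreachable_final)
```

Symbols are ignored during the search because only the existence of a path matters, not the string that labels it.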

Page 29: Regular Expressions

[Figure: two DFA diagrams, one where L ≠ ∅ and one where L = ∅]

Page 30: Regular Expressions

Question: Given a regular language L, how can we check if L is finite?

Answer: Take the DFA that accepts L. Check if there is a walk with a cycle from the initial state to a final state. L is infinite exactly when such a walk exists.
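One way to carry out the finiteness check is to keep only the states that lie on some path from the initial state to a final state, and then test whether those states contain a cycle. The sketch below does this with a topological-sort style cycle test (Kahn's algorithm); that choice, the function name `is_finite`, and the dictionary representation of the transition function are assumptions of this example.

```python
def is_finite(start, finals, delta):
    """True if the DFA accepts only finitely many strings."""
    # Successor and predecessor maps of the transition graph.
    succ, pred = {start: set()}, {}
    for (state, _symbol), target in delta.items():
        succ.setdefault(state, set()).add(target)
        succ.setdefault(target, set())
        pred.setdefault(target, set()).add(state)

    def reachable(sources, edges):
        seen, stack = set(sources), list(sources)
        while stack:
            s = stack.pop()
            for t in edges.get(s, ()):
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
        return seen

    # States on some walk from the initial state to a final state.
    useful = reachable({start}, succ) & reachable(set(finals), pred)

    # Kahn's algorithm: repeatedly remove states with no incoming edge.
    # A state that can never be removed lies on or behind a cycle.
    indegree = {s: 0 for s in useful}
    for s in useful:
        for t in succ[s] & useful:
            indegree[t] += 1
    queue = [s for s in useful if indegree[s] == 0]
    removed = 0
    while queue:
        s = queue.pop()
        removed += 1
        for t in succ[s] & useful:
            indegree[t] -= 1
            if indegree[t] == 0:
                queue.append(t)
    return removed == len(useful)

# Hypothetical DFA for {ab}: the only cycle is on the dead state, so L is finite.
only_ab = {
    ("q0", "a"): "q1", ("q0", "b"): "dead",
    ("q1", "b"): "q2", ("q1", "a"): "dead",
    ("q2", "a"): "dead", ("q2", "b"): "dead",
    ("dead", "a"): "dead", ("dead", "b"): "dead",
}
assert is_finite("q0", {"q2"}, only_ab)

# Hypothetical DFA for a*: a cycle on a final state, so L is infinite.
assert not is_finite("q0", {"q0"}, {("q0", "a"): "q0"})
```

Restricting the test to useful states matters: a cycle on a dead state, as in the first example, does not make the language infinite.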

Page 31: Regular Expressions

[Figure: two DFA diagrams, one where L is infinite and one where L is finite]

Page 32: Regular Expressions

From RE to ε-NFA

• For every regular expression R, we can construct an ε-NFA A such that L(A) = L(R).

• Proof by structural induction:

[Figure: base-case automata for Ø, ε, and a single symbol a]

Page 33: Regular Expressions

From RE to ε-NFA

[Figure: ε-NFA constructions for R+S, RS, and R*]
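The inductive construction can be sketched in code. The version below follows the common Thompson-style construction, which may differ in detail from the diagrams on the slides. An ε-NFA fragment is a tuple (start, accept, edges), with `None` as the label of an ε-transition; this representation and all function names are assumptions of this sketch.

```python
import itertools

_ids = itertools.count()   # source of fresh state numbers

def _new_state():
    return next(_ids)

def empty():               # Ø: no path from start to accept
    return (_new_state(), _new_state(), [])

def epsilon():             # ε: a single ε-transition
    s, f = _new_state(), _new_state()
    return (s, f, [(s, None, f)])

def symbol(a):             # a: a single transition labeled a
    s, f = _new_state(), _new_state()
    return (s, f, [(s, a, f)])

def union(r, s):           # R+S: branch into either fragment
    start, accept = _new_state(), _new_state()
    edges = r[2] + s[2] + [
        (start, None, r[0]), (start, None, s[0]),
        (r[1], None, accept), (s[1], None, accept),
    ]
    return (start, accept, edges)

def concat(r, s):          # RS: link accept of R to start of S
    return (r[0], s[1], r[2] + s[2] + [(r[1], None, s[0])])

def star(r):               # R*: allow skipping R and repeating R
    start, accept = _new_state(), _new_state()
    edges = r[2] + [
        (start, None, r[0]), (r[1], None, accept),
        (start, None, accept), (r[1], None, r[0]),
    ]
    return (start, accept, edges)

def eps_closure(states, edges):
    """All states reachable from `states` using only ε-transitions."""
    closure, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for src, label, dst in edges:
            if src == q and label is None and dst not in closure:
                closure.add(dst)
                stack.append(dst)
    return closure

def accepts(nfa, w):
    """Simulate the ε-NFA on w by tracking the set of current states."""
    start, accept, edges = nfa
    current = eps_closure({start}, edges)
    for ch in w:
        moved = {dst for src, label, dst in edges
                 if src in current and label == ch}
        current = eps_closure(moved, edges)
    return accept in current

def zero_or_one():         # the subexpression (0+1)
    return union(symbol("0"), symbol("1"))

# (0+1)*1(0+1): strings whose second-to-last symbol is 1
nfa = concat(concat(star(zero_or_one()), symbol("1")), zero_or_one())
assert accepts(nfa, "0110")
assert not accepts(nfa, "0101")
```

Each constructor mirrors one case of the structural induction: three base cases and one case per operator, so a fragment for any regular expression is obtained by composing these calls along the expression's structure.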

Page 34: Regular Expressions

Example: (0+1)*1(0+1)

[Figure: ε-NFA for (0+1)*1(0+1)]

Page 35: Regular Expressions

Example: (a+b)*aba