![Page 1: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/1.jpg)
Natural Language Processing
Lecture 4 : Regular Expressions and Automata
![Page 2: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/2.jpg)
Definitions
• A language is a set of strings
• String: A sequence of letters
Examples: “cat”, “dog”, “house”, …
Defined over an alphabet: $\Sigma = \{a, b, c, \ldots, z\}$
![Page 3: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/3.jpg)
Alphabets and Strings
• We will use small alphabets: $\Sigma = \{a, b\}$
• Strings:
a
ab
abba
baba
baaabbbaaba
$u = ab$
$v = bbbaaa$
$w = abba$
![Page 4: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/4.jpg)
Regular Expressions
• In computer science, a regular expression (RE) is a language for specifying text search strings.
• A regular expression is a formula in a special language that specifies a simple class of strings.
• Formally, a regular expression is an algebraic notation for characterizing a set of strings.
• RE search requires a pattern that we want to search for and a corpus of texts to search through.
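A minimal sketch of the pattern-plus-corpus idea, using Python's `re` module. The three-line corpus and the pattern `dog` are made up for illustration.

```python
import re

# Hypothetical corpus: a list of text lines to search through.
corpus = [
    "The cat sat on the mat.",
    "A dog barked at the other dog.",
    "No animals here.",
]
# The pattern we want to search for.
pattern = re.compile(r"dog")

# Keep every line with at least one match, together with the matches found.
hits = [(line, pattern.findall(line)) for line in corpus if pattern.search(line)]
for line, matches in hits:
    print(matches, "<-", line)
```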
![Page 5: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/5.jpg)
Basic Regular Expression Patterns
• The use of the brackets [] to specify a disjunction of characters.
• The use of the brackets [] plus the dash - to specify a range.
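The two bracket forms can be tried out with a small sketch in Python's `re` module. The sample sentence is invented for illustration.

```python
import re

text = "Woodchuck or woodchuck? Chapter 1 has 42 pages."

# [wW] is a disjunction of characters: it matches either w or W.
either_case = re.findall(r"[wW]oodchuck", text)
# [0-9] is a range: it matches any single digit.
digits = re.findall(r"[0-9]", text)
# [A-Z] is a range: it matches any single upper-case ASCII letter.
capitals = re.findall(r"[A-Z]", text)

print(either_case, digits, capitals)
```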
![Page 6: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/6.jpg)
Basic Regular Expression Patterns
• The caret ^ negates a character set when it comes first inside the brackets, as in [^a-z]; anywhere else inside the brackets it just means ^
• The question-mark ? marks optionality of the previous expression.
• The use of period . to specify any character
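A short sketch of the three operators above in Python's `re` module. The test strings are invented for illustration.

```python
import re

# [^a-z]: a caret placed first inside the brackets negates the set.
not_lower = re.findall(r"[^a-z]", "aB3c")
# [e^]: a caret anywhere else inside the brackets is a literal ^.
literal_caret = re.findall(r"[e^]", "look up ^ here")
# colou?r: the question mark makes the preceding u optional.
optional = re.findall(r"colou?r", "color and colour")
# beg.n: the period matches any single character (except a newline, by default).
any_char = re.findall(r"beg.n", "begin begun beg'n bean")

print(not_lower, literal_caret, optional, any_char)
```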
![Page 7: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/7.jpg)
Disjunction, Grouping, and Precedence
• Disjunction
• /cat|dog/
• Precedence
• /gupp(y|ies)/
• To find the English article the
• /the/
• /[tT]he/
• /[^a-zA-Z][tT]he[^a-zA-Z]/
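The patterns on this slide can be compared side by side with a small Python sketch. The sample sentences are invented. The group is written `(?:y|ies)` only because `re.findall` would otherwise return the group's contents rather than the whole match; that detail is specific to Python, not to the slide's notation.

```python
import re

pets = re.findall(r"cat|dog", "cat, dog, cow")
# Parentheses restrict the disjunction to the suffix after "gupp".
guppies = re.findall(r"gupp(?:y|ies)", "guppy guppies")

text = "The other day the cat hid there."
# /the/ misses the capitalized "The" and also fires inside "other" and "there".
plain = re.findall(r"the", text)
# /[tT]he/ catches "The" but still fires inside other words.
either_case = re.findall(r"[tT]he", text)
# Requiring a non-letter on both sides keeps only the standalone article.
# It also misses the sentence-initial "The", which has no preceding character.
standalone = re.findall(r"[^a-zA-Z][tT]he[^a-zA-Z]", text)

print(len(plain), len(either_case), standalone)
```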
![Page 8: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/8.jpg)
Aliases for common sets of characters
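A short sketch of how such aliases behave, assuming the usual Perl-style shorthands `\d`, `\w`, and `\s` as implemented by Python's `re` module. The sample text is invented. Note that for Python `str` patterns `\d` and `\w` also match non-ASCII digits and letters, so they are only roughly `[0-9]` and `[a-zA-Z0-9_]`.

```python
import re

text = "Party of 5, room B-12"

digits = re.findall(r"\d", text)       # any digit
words = re.findall(r"\w+", text)       # runs of alphanumerics or underscore
spaces = re.findall(r"\s", text)       # whitespace characters
non_digits = re.findall(r"\D+", "a1b22c")  # capitalized aliases negate the set

print(digits, words, len(spaces), non_digits)
```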
![Page 9: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/9.jpg)
Regular expression operators for counting
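A minimal sketch of the counting operators, assuming the common set `*`, `+`, `{n}`, `{n,m}`, and `{n,}` as supported by Python's `re` module. The helper `matching` and the candidate strings are made up for illustration.

```python
import re

strings = ["b!", "ba!", "baa!", "baaa!"]

def matching(pattern):
    """Return the candidate strings that the pattern matches in full."""
    return [s for s in strings if re.fullmatch(pattern, s)]

star = matching(r"ba*!")        # zero or more a
plus = matching(r"ba+!")        # one or more a
exact = matching(r"ba{2}!")     # exactly two a
ranged = matching(r"ba{1,2}!")  # between one and two a
at_least = matching(r"ba{2,}!") # at least two a

print(star, plus, exact, ranged, at_least)
```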
![Page 10: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/10.jpg)
Some characters that need to be backslashed
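A small sketch of backslashing in Python's `re` module, assuming the characters in question are the usual regex metacharacters such as `.`, `$`, `(`, and `)`. The sample text is invented.

```python
import re

text = "Dr. Who paid $5.00 (plus tax)? Yes*"

# Unescaped, the period matches every character; escaped, only a literal period.
every_char = re.findall(r".", text)
dots = re.findall(r"\.", text)
# The dollar sign and period must be backslashed to match a literal price.
price = re.findall(r"\$\d+\.\d\d", text)
# Parentheses must be backslashed to match literally rather than group.
parens = re.findall(r"\(plus tax\)", text)
# re.escape backslashes the special characters of a literal string for us.
escaped_ok = re.fullmatch(re.escape("$5.00"), "$5.00") is not None

print(len(every_char), dots, price, parens, escaped_ok)
```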
![Page 11: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/11.jpg)
Finite State Automata
• FSAs recognize the regular languages represented by regular expressions
• SheepTalk: /baa+!/
• Directed graph with labeled nodes and arc transitions
• Five states: q0 the start state, q4 the final state, 5 transitions
[Figure: SheepTalk FSA: q0 →b→ q1 →a→ q2 →a→ q3 →!→ q4, with a loop on q3 labeled a]
![Page 12: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/12.jpg)
Formally
• An FSA is a 5-tuple consisting of
Q: a set of states {q0, q1, q2, q3, q4}
Σ: an alphabet of symbols {a, b, !}
q0: a start state
F: a set of final states in Q, {q4}
δ(q, i): a transition function mapping Q × Σ to Q
[Figure: SheepTalk FSA: q0 →b→ q1 →a→ q2 →a→ q3 →!→ q4, with a loop on q3 labeled a]
![Page 13: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/13.jpg)
• An FSA recognizes (accepts) the strings of a regular language: baa! baaa! baaaa! …
• Tape input (a rejected input): a b a ! b
[Figure: SheepTalk FSA: q0 →b→ q1 →a→ q2 →a→ q3 →!→ q4, with a loop on q3 labeled a]
![Page 14: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/14.jpg)
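State Transition Table for SheepTalk

| State | b | a | ! |
|---|---|---|---|
| 0 | 1 | Ø | Ø |
| 1 | Ø | 2 | Ø |
| 2 | Ø | 3 | Ø |
| 3 | Ø | 3 | 4 |
| 4 | Ø | Ø | Ø |

baa! baaa! baaaa! baaaaa! ...
[Figure: SheepTalk FSA: q0 →b→ q1 →a→ q2 →a→ q3 →!→ q4, with a loop on q3 labeled a]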
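The state transition table for SheepTalk can drive a simple table lookup recognizer. The sketch below is one possible Python rendering, not the lecture's own code: the names `TABLE`, `START`, `FINAL`, and `recognizes` are made up, and a missing dictionary entry plays the role of Ø.

```python
# Transition table for SheepTalk; a missing entry stands for Ø (no transition).
TABLE = {
    0: {"b": 1},
    1: {"a": 2},
    2: {"a": 3},
    3: {"a": 3, "!": 4},
    4: {},
}
START = 0
FINAL = {4}

def recognizes(tape: str) -> bool:
    """Return True if the automaton ends in a final state after reading the tape."""
    state = START
    for symbol in tape:
        state = TABLE[state].get(symbol)
        if state is None:  # hit an Ø cell: reject immediately
            return False
    return state in FINAL

for tape in ["baa!", "baaaa!", "ba!", "aba!b"]:
    print(tape, "accept" if recognizes(tape) else "reject")
```

Because every step is a single table lookup, the running time is linear in the length of the tape.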
![Page 15: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/15.jpg)
Non-Deterministic FSAs for SheepTalk
[Figure: two non-deterministic FSAs for SheepTalk, each with states q0 to q4; the arcs of the first are labeled b, a, a, a, ! and the arcs of the second b, a, a, !]
![Page 16: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/16.jpg)
Finite Accepter
• Input: String
• Output: “Accept” or “Reject”
[Figure: String → Finite Automaton → “Accept” or “Reject”]
![Page 17: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/17.jpg)
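Transition Graph
• abba - Finite Accepter
[Figure: transition graph with initial state q0, final (“accept”) state q4, and transitions q0 →a→ q1 →b→ q2 →b→ q3 →a→ q4; the remaining arcs (b from q0, a from q1, a from q2, b from q3, and a, b from q4) lead to q5, which loops on a, b]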
![Page 18: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/18.jpg)
Initial Configuration
• Input string: a b b a
[Figure: the abba finite accepter in its initial state q0]
![Page 19: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/19.jpg)
04/21/23 19
Reading the Input
[Figure: the abba finite accepter reading the input string a b b a]
![Page 20: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/20.jpg)
![Page 21: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/21.jpg)
![Page 22: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/22.jpg)
![Page 23: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/23.jpg)
[Figure: the abba finite accepter after reading a b b a, ending in final state q4]
Output: “accept”
![Page 24: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/24.jpg)
Rejection
• Input string: a b a
[Figure: the abba finite accepter in its initial state q0]
![Page 25: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/25.jpg)
![Page 26: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/26.jpg)
![Page 27: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/27.jpg)
![Page 28: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/28.jpg)
[Figure: the abba finite accepter after reading a b a, stopped in a non-final state]
Output: “reject”
![Page 29: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/29.jpg)
Another Example
• Input string: a a b
[Figure: three-state accepter: q0 loops on a and moves to q1 on b; q1 moves to q2 on a, b; q2 loops on a, b]
![Page 30: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/30.jpg)
![Page 31: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/31.jpg)
![Page 32: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/32.jpg)
![Page 33: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/33.jpg)
[Figure: the three-state accepter after reading a a b, ending in q1]
Output: “accept”
![Page 34: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/34.jpg)
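Rejection
• Input string: three symbols, one a and two b's
[Figure: three-state accepter: q0 loops on a and moves to q1 on b; q1 moves to q2 on a, b; q2 loops on a, b]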
![Page 35: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/35.jpg)
![Page 36: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/36.jpg)
![Page 37: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/37.jpg)
![Page 38: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/38.jpg)
[Figure: the three-state accepter after reading the input, stopped in a non-final state]
Output: “reject”
![Page 39: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/39.jpg)
Formalities
• Deterministic Finite Accepter (DFA)
$M = (Q, \Sigma, \delta, q_0, F)$
$Q$: set of states
$\Sigma$: input alphabet
$\delta$: transition function
$q_0$: initial state
$F$: set of final states
![Page 40: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/40.jpg)
About Alphabets
• Having an alphabet means we need a finite set of symbols in the input.
• These symbols can and will stand for bigger objects that can have internal structure.
![Page 41: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/41.jpg)
Input Alphabet
$\Sigma = \{a, b\}$
[Figure: the abba finite accepter, with arcs labeled a and b]
![Page 42: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/42.jpg)
Set of States $Q$
$Q = \{q_0, q_1, q_2, q_3, q_4, q_5\}$
[Figure: the abba finite accepter]
![Page 43: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/43.jpg)
Initial State $q_0$
[Figure: the abba finite accepter, with initial state q0]
![Page 44: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/44.jpg)
Set of Final States $F$
$F = \{q_4\}$
[Figure: the abba finite accepter, with final state q4]
![Page 45: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/45.jpg)
Transition Function
$\delta : Q \times \Sigma \to Q$
[Figure: the abba finite accepter]
![Page 46: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/46.jpg)
$\delta(q_0, a) = q_1$
[Figure: the abba finite accepter, showing the arc from q0 to q1 labeled a]
![Page 47: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/47.jpg)
$\delta(q_0, b) = q_5$
[Figure: the abba finite accepter, showing the arc from q0 to q5 labeled b]
![Page 48: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/48.jpg)
$\delta(q_2, b) = q_3$
[Figure: the abba finite accepter, showing the arc from q2 to q3 labeled b]
![Page 49: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/49.jpg)
Transition Function

| $\delta$ | a | b |
|---|---|---|
| q0 | q1 | q5 |
| q1 | q5 | q2 |
| q2 | q5 | q3 |
| q3 | q4 | q5 |
| q4 | q5 | q5 |
| q5 | q5 | q5 |

[Figure: the abba finite accepter]
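The five components of the abba accepter can be written down directly as data. The following is a sketch in Python with made-up names (`Q`, `SIGMA`, `DELTA`, `Q0`, `F`, `accepts`); the entry for (q2, a) is assumed to be q5, as read off the arc labels of the transition graph.

```python
# M = (Q, Sigma, delta, q0, F) for the abba finite accepter.
Q = {"q0", "q1", "q2", "q3", "q4", "q5"}
SIGMA = {"a", "b"}
DELTA = {
    ("q0", "a"): "q1", ("q0", "b"): "q5",
    ("q1", "a"): "q5", ("q1", "b"): "q2",
    ("q2", "a"): "q5", ("q2", "b"): "q3",  # (q2, a) -> q5 is an assumption
    ("q3", "a"): "q4", ("q3", "b"): "q5",
    ("q4", "a"): "q5", ("q4", "b"): "q5",
    ("q5", "a"): "q5", ("q5", "b"): "q5",
}
Q0 = "q0"
F = {"q4"}

def accepts(w: str) -> bool:
    """Run the DFA on w and report whether it stops in a final state."""
    state = Q0
    for symbol in w:
        state = DELTA[(state, symbol)]
    return state in F

# A DFA's transition function is total: one entry per (state, symbol) pair.
is_total = set(DELTA) == {(q, s) for q in Q for s in SIGMA}
print(is_total, accepts("abba"), accepts("aba"))
```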
![Page 50: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/50.jpg)
Extended Transition Function $\delta^*$ (Reads the entire string)
$\delta^* : Q \times \Sigma^* \to Q$
[Figure: the abba finite accepter]
![Page 51: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/51.jpg)
$\delta^*(q_0, ab) = q_2$
[Figure: the abba finite accepter, showing the path q0, q1, q2]
![Page 52: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/52.jpg)
$\delta^*(q_0, abba) = q_4$
[Figure: the abba finite accepter]
![Page 53: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/53.jpg)
$\delta^*(q_0, abbbaa) = q_5$
[Figure: the abba finite accepter]
![Page 54: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/54.jpg)
$\delta^*(q_0, abbbaa) = q_5$
[Figure: the abba finite accepter]
Observation: There is a walk from $q_0$ to $q_5$ with label $abbbaa$
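One common way to define the extended transition function is recursively: reading the empty string leaves the state unchanged, and reading a string that ends in symbol σ means reading everything before σ and then taking one δ step. The Python sketch below follows that definition for the abba accepter. The names `CHAIN`, `delta`, `delta_star`, and `walk` are made up, and sending every transition not on the abba chain to q5 is an assumption read off the transition graph.

```python
# The four arcs that spell abba; every other transition goes to the trap state q5.
CHAIN = {("q0", "a"): "q1", ("q1", "b"): "q2", ("q2", "b"): "q3", ("q3", "a"): "q4"}

def delta(q: str, symbol: str) -> str:
    """One-step transition function."""
    return CHAIN.get((q, symbol), "q5")

def delta_star(q: str, w: str) -> str:
    """Extended transition function: the state reached after reading all of w."""
    if w == "":
        return q
    return delta(delta_star(q, w[:-1]), w[-1])

def walk(q: str, w: str) -> list:
    """The sequence of states visited while reading w, i.e. the walk labeled w."""
    states = [q]
    for symbol in w:
        states.append(delta(states[-1], symbol))
    return states

print(delta_star("q0", "ab"), delta_star("q0", "abba"), delta_star("q0", "abbbaa"))
print(walk("q0", "abbbaa"))
```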
![Page 55: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/55.jpg)
Example
$L(M) = \{abba\}$
[Figure: the automaton $M$ is the abba finite accepter; q4 is marked accept]
![Page 56: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/56.jpg)
Another Example
$L(M) = \{\lambda, ab, abba\}$
[Figure: the automaton $M$ has the same graph as the abba finite accepter, with three states marked accept]
![Page 57: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/57.jpg)
More Examples
$L(M) = \{a^n b : n \ge 0\}$
[Figure: q0 loops on a and moves to q1 (accept) on b; q1 moves to q2 (trap state) on a, b; q2 loops on a, b]
![Page 58: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/58.jpg)
$L(M)$ = { all strings with prefix $ab$ }
[Figure: q0 →a→ q1 →b→ q2 (accept), where q2 loops on a, b; b from q0 and a from q1 lead to q3, which loops on a, b]
![Page 59: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/59.jpg)
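$L(M)$ = { all strings without substring $001$ }
[Figure: DFA over {0, 1} whose states record how much of 001 has just been read, including states labeled 0, 00, and 001; state 001 loops on 0, 1]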
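A sketch of one DFA for this language in Python, checked against a brute-force substring test. The state names and transitions are an assumed reconstruction (each state records how much of 001 has just been read), not a transcription of the slide's figure.

```python
from itertools import product

# States: "" (nothing useful read), "0", "00", and the trap state "001".
DELTA = {
    "":    {"0": "0",   "1": ""},
    "0":   {"0": "00",  "1": ""},
    "00":  {"0": "00",  "1": "001"},
    "001": {"0": "001", "1": "001"},
}

def accepts(w: str) -> bool:
    """Accept exactly the binary strings that never contain 001."""
    state = ""
    for symbol in w:
        state = DELTA[state][symbol]
    return state != "001"

# Compare with a direct substring check on every binary string up to length 8.
all_strings = ["".join(p) for n in range(9) for p in product("01", repeat=n)]
agrees = all(accepts(w) == ("001" not in w) for w in all_strings)
print(agrees)
```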
![Page 60: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/60.jpg)
Regular Languages
• A language $L$ is regular if there is a DFA $M$ such that $L = L(M)$
• All regular languages form a language family
![Page 61: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/61.jpg)
Example
• The language $L = \{awa : w \in \{a, b\}^*\}$ is regular:
[Figure: a DFA $M$ with $L = L(M)$; its states include q0, q2, q3, and q4, and its arcs are labeled a and b]
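To see why this language is regular, it is enough to exhibit a DFA for it. The Python sketch below uses an assumed transition structure (the state names follow the figure, but the arcs are a reconstruction) and checks it against the regular expression `a[ab]*a` on all short strings.

```python
import re
from itertools import product

# Assumed DFA: q0 start, q3 accept, q4 trap.
DELTA = {
    "q0": {"a": "q2", "b": "q4"},  # a string in L must start with a
    "q2": {"a": "q3", "b": "q2"},  # wait for an a that could be the last symbol
    "q3": {"a": "q3", "b": "q2"},  # the last symbol read was a
    "q4": {"a": "q4", "b": "q4"},  # trap: the string started with b
}

def accepts(w: str) -> bool:
    """Accept strings of the form a w a with w in {a, b}*."""
    state = "q0"
    for symbol in w:
        state = DELTA[state][symbol]
    return state == "q3"

# Compare with the equivalent regular expression on all strings up to length 8.
all_strings = ["".join(p) for n in range(9) for p in product("ab", repeat=n)]
agrees = all(accepts(w) == bool(re.fullmatch(r"a[ab]*a", w)) for w in all_strings)
print(agrees)
```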
![Page 62: Natural Language Processing Lecture 4 : Regular Expressions and Automata](https://reader036.vdocument.in/reader036/viewer/2022062805/5697bfd91a28abf838caf878/html5/thumbnails/62.jpg)
Dollars and Cents