1
Finite state automaton (FSA)
LING 570
Fei Xia
Week 3: 10/8/2007
2
Hw1
• Need to test your code on patas:
– Windows carriage return vs. Unix newline
– Your machine could yield different results
– Compile your code and include the binary
– Include the shell scripts
– Make sure that your code does not crash
• Set the executable bit in *.(sh|pl|py|…)
• Including path names in the code could be a problem.
• See Bill’s GoPost message: “how to make sure that we can run your code”
3
Hw1 (cont)
• Whitespace means \s+
• Your code should be able to handle empty lines, etc.
• If make_voc.* runs slowly, use a hash table.
• It is really important to follow the instructions:
– Ex: retain the same line breaks
– Ex: the output of make_voc.sh should be "of 1152", not "> of 1152" or "1: (‘of’, 1152)"
4
Hw1 (cont)
• Grading criteria:
– Incorrect input / output format: -2
– Runtime error (e.g., it does not process the entire file): -10 (-5 for hw1)
– Missing shell script: -2
– No binary, Unix-incompatible: 0 this time only; later it will be treated as a runtime error.
• Grades: 20 students
– Average: 57.9
– Median: 60 (12 got 60)
• Time spent on the homework: 13 students
– Average: 12.2 hours
– Median: 10.5 hours
5
Hw2
• Any questions?
• n >= 0
6
From last time
7
Formal grammar
• A formal grammar is a 4-tuple: $G = (N, \Sigma, P, S)$ (nonterminals, terminals, productions, start symbol).
• Grammars generate languages.
• Chomsky hierarchy: unrestricted, context-sensitive, context-free, regular.
• There are other types of grammars.
• Human languages are beyond context-free.
8
Formal language
• A language is a set of strings.
• Regular languages are defined by recursion:
– They can be generated by regular grammars
– They can be expressed by regular expressions
• Given a regular language, grammar, or expression, how can we tell whether a string belongs to the language?
→ Create an FSA that acts as an acceptor.
9
Finite state automaton (FSA)
10
FSA / FST
• FSAs/FSTs are among the most important techniques in NLP.
• Multiple FSTs can be combined to form a larger, more powerful FST.
• Any regular language can be recognized by an FSA.
• Any regular relation can be recognized by an FST.
11
FST Toolkits
• AT&T: http://www.research.att.com/~fsmtools/fsm/man.html
• NLTK: http://nltk.sf.net/docs.html
• ISI: Carmel
• …
12
Outline
• Deterministic FSA (DFA)
• Non-deterministic FSA (NFA)
• Probabilistic FSA (PFA)
• Weighted FSA (WFA)
13
DFA
14
Definition of DFA
An automaton is a 5-tuple $(\Sigma, Q, q_0, F, \delta)$:
• An alphabet $\Sigma$ of input symbols
• A finite set of states $Q$
• A start state $q_0 \in Q$
• A set of final states $F \subseteq Q$
• A transition function $\delta: Q \times \Sigma \to Q$
15
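[Figure: DFA with start state q0 and final state q1; arcs q0 -a-> q0, q0 -b-> q1, q1 -b-> q1]
$\Sigma = \{a, b\}$, $Q = \{q_0, q_1\}$, $F = \{q_1\}$
$\delta = \{\, q_0 \times a \to q_0,\ q_0 \times b \to q_1,\ q_1 \times b \to q_1 \,\}$
What about $q_1 \times a$?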
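One way to read the question above: δ is a partial function, and a common convention (assumed here, not stated on the slide) is that a missing transition means rejection. Equivalently, δ can be made total by adding a non-final "sink" state. A minimal Python sketch; `SINK`, `complete`, and `accepts` are hypothetical names:

```python
# DFA from the example: Σ = {a, b}, Q = {q0, q1}, F = {q1}
SIGMA = {"a", "b"}
STATES = {"q0", "q1"}
START = "q0"
FINAL = {"q1"}
DELTA = {("q0", "a"): "q0", ("q0", "b"): "q1", ("q1", "b"): "q1"}  # q1 × a is undefined

SINK = "sink"  # hypothetical extra state: non-final, loops to itself on every symbol


def complete(delta, states, sigma, sink=SINK):
    """Return a total transition function by sending every missing (state, symbol) to sink."""
    total = dict(delta)
    for q in states | {sink}:
        for a in sigma:
            total.setdefault((q, a), sink)
    return total


TOTAL = complete(DELTA, STATES, SIGMA)


def accepts(s):
    """Run the completed DFA on s (assumed to be over Σ) and test for a final state."""
    q = START
    for ch in s:
        q = TOTAL[(q, ch)]
    return q in FINAL


print(TOTAL[("q1", "a")])               # sink
print(accepts("aabb"), accepts("aba"))  # True False
```

Once the machine enters the sink it can never reach a final state, so rejecting there is the same as rejecting on the missing arc.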
16
Representing an FSA as a directed graph
• The vertices denote states:
– Final states are represented as two concentric circles.
• The transitions form the edges.
• The edges are labeled with symbols.
17
An example
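[Figure: DFA with states q0, q1, q2; arcs q0 -a-> q0, q0 -b-> q1, q1 -b-> q1, q1 -a-> q2, q2 -a-> q0, q2 -b-> q1]
Input:  a  b  b  a  a
States: q0 q0 q1 q1 q2 q0
Input:  a  b  b  a  b
States: q0 q0 q1 q1 q2 q1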
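Because the machine is deterministic, each input has exactly one state sequence. A minimal Python sketch that reproduces the two traces above; the transition table is reconstructed from the figure and the traces, and `trace` is a hypothetical name:

```python
# Transitions of the three-state example, read off the figure and the two traces
DELTA = {
    ("q0", "a"): "q0", ("q0", "b"): "q1",
    ("q1", "a"): "q2", ("q1", "b"): "q1",
    ("q2", "a"): "q0", ("q2", "b"): "q1",
}


def trace(s, start="q0"):
    """Return the states visited while reading s, starting with the start state."""
    states = [start]
    for ch in s:
        states.append(DELTA[(states[-1], ch)])
    return states


print(" ".join(trace("abbaa")))  # q0 q0 q1 q1 q2 q0
print(" ".join(trace("abbab")))  # q0 q0 q1 q1 q2 q1
```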
18
DFA as an acceptor
• A string is said to be accepted by an FSA if the FSA is in a final state after it has read the entire string.
– That is, there is a path from the initial state to a final state which yields the string.
– Ex: does the FSA accept “abab”?
• The set of strings that can be accepted by an FSA is called the language accepted by the FSA.
19
An algorithm for deterministic recognition of DFAs
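The algorithm itself did not survive extraction. A minimal sketch of the usual table-driven recognition loop, applied to the three-state example from slide 17 and assuming q1 is its only final state (the figure's final-state marking is not recoverable); `d_recognize` is a hypothetical name:

```python
# Three-state example DFA; FINAL = {"q1"} is an assumption, not recovered from the slide
DELTA = {
    ("q0", "a"): "q0", ("q0", "b"): "q1",
    ("q1", "a"): "q2", ("q1", "b"): "q1",
    ("q2", "a"): "q0", ("q2", "b"): "q1",
}
START = "q0"
FINAL = {"q1"}


def d_recognize(tape, delta=DELTA, start=START, finals=FINAL):
    """Follow the unique path for tape; reject as soon as a transition is missing."""
    state = start
    for symbol in tape:
        if (state, symbol) not in delta:  # no arc for this symbol: reject
            return False
        state = delta[(state, symbol)]
    return state in finals                # whole input read: accept iff final


print(d_recognize("abab"))  # True under the FINAL = {"q1"} assumption
print(d_recognize("abba"))  # False (ends in q2)
print(d_recognize("abc"))   # False (no arc for "c")
```

Each input symbol is examined once, so recognition is linear in the length of the string.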
20
An example
Regular expression: a* b+
FSA: [Figure: start state q0, final state q1; arcs q0 -a-> q0, q0 -b-> q1, q1 -b-> q1]
Regular grammar:
q0 → a q0
q0 → b q1
q1 → b q1
q1 → ε
Regular language: {b, ab, bb, aab, abb, …}
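The grammar and the regular expression describe the same language. A small Python sketch, assuming the right-linear grammar above, that enumerates all derivable strings up to a length bound and compares them with Python's `re.fullmatch(r"a*b+", ...)`; `generate` and `regex_language` are hypothetical helpers:

```python
import itertools
import re

# Right-linear grammar: nonterminal -> list of (terminal, next nonterminal or None for ε-end)
GRAMMAR = {
    "q0": [("a", "q0"), ("b", "q1")],
    "q1": [("b", "q1"), ("", None)],  # q1 -> ε
}


def generate(max_len, start="q0"):
    """All strings of length <= max_len derivable from the start symbol."""
    results = set()
    agenda = [("", start)]
    while agenda:
        prefix, nt = agenda.pop()
        for term, nxt in GRAMMAR[nt]:
            s = prefix + term
            if len(s) > max_len:
                continue
            if nxt is None:
                results.add(s)  # derivation finished
            else:
                agenda.append((s, nxt))
    return results


def regex_language(max_len):
    """All strings over {a, b} of length <= max_len that match a*b+."""
    return {
        "".join(p)
        for n in range(max_len + 1)
        for p in itertools.product("ab", repeat=n)
        if re.fullmatch(r"a*b+", "".join(p))
    }


print(sorted(generate(3), key=lambda s: (len(s), s)))  # ['b', 'ab', 'bb', 'aab', 'abb', 'bbb']
print(generate(4) == regex_language(4))                 # True
```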
21
NFA
22
NFA
• A transition can lead to more than one state.
• There could be multiple start states.
• Transitions can be labeled with ε, meaning states can be reached without reading any input.
Now the transition function is $\delta: Q \times (\Sigma \cup \{\epsilon\}) \to 2^Q$.
23
NFA example
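[Figure: NFA with states q0, q1, q2; arcs q0 -b-> q0, q0 -b-> q1, q1 -a-> q1, q1 -b-> q2, q2 -b-> q1, q2 -a-> q0]
Input: b b a b b
Possible paths:
q0 q0 q1 q1 q2 q1
q0 q1 q2 q0 q0 q1
q0 q1 q2 q0 q1 q2
q0 q1 q2 q0 q0 q0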
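Unlike a DFA, an NFA may follow several paths for the same input. A short Python sketch that enumerates every path for "bbabb"; δ maps (state, symbol) to a set of states, the arcs are reconstructed from the figure residue and traces, and `all_paths` is a hypothetical name. Under these arcs, the four paths listed above are the only ones:

```python
# NFA from the example: δ maps (state, symbol) to a *set* of next states
DELTA = {
    ("q0", "b"): {"q0", "q1"},
    ("q1", "a"): {"q1"},
    ("q1", "b"): {"q2"},
    ("q2", "a"): {"q0"},
    ("q2", "b"): {"q1"},
}


def all_paths(s, start="q0"):
    """Enumerate every state sequence the NFA can follow while reading s."""
    paths = [[start]]
    for ch in s:
        # extend each partial path by every possible next state; dead ends drop out
        paths = [p + [q] for p in paths for q in sorted(DELTA.get((p[-1], ch), set()))]
    return paths


for p in all_paths("bbabb"):
    print(" ".join(p))
# q0 q0 q1 q1 q2 q1
# q0 q1 q2 q0 q0 q0
# q0 q1 q2 q0 q0 q1
# q0 q1 q2 q0 q1 q2
```

The number of paths can grow exponentially with the input length, which is one motivation for tracking only the *set* of current states, or for converting to a DFA.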
24
Definition of regular expression
• The set of regular expressions is defined as follows:
(1) Every symbol of Σ is a regular expression
(2) ε is a regular expression
(3) If r1 and r2 are regular expressions, so are (r1), r1 r2, r1 | r2, r1*
(4) Nothing else is a regular expression.
25
Regular expression → NFA
Base case:
Concatenation: connect the final states of FSA1 to the initial state of FSA2 by an ε-transition.
Union: create a new initial state and add ε-transitions from it to the initial states of FSA1 and FSA2.
Kleene closure:
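The Kleene-closure figure did not survive extraction. The sketch below implements the cases above in Python as NFA fragments with one initial and one final state, and uses the standard Thompson construction for Kleene closure (an assumption, not necessarily the slide's figure). Union here also adds a new final state, which the slide does not mention; `Frag`, `symbol`, `concat`, `union`, `star`, and `accepts` are hypothetical names:

```python
import itertools

EPS = None                # label used for ε-transitions
_ids = itertools.count()  # source of fresh state numbers


class Frag:
    """NFA fragment with a single start and a single final state."""

    def __init__(self, start, final, arcs):
        self.start, self.final, self.arcs = start, final, arcs  # arcs: (src, label, dst)


def symbol(a):
    """Base case: one arc labeled a (use a=EPS for the ε base case)."""
    s, f = next(_ids), next(_ids)
    return Frag(s, f, [(s, a, f)])


def concat(m1, m2):
    """ε-transition from the final state of m1 to the initial state of m2."""
    return Frag(m1.start, m2.final, m1.arcs + m2.arcs + [(m1.final, EPS, m2.start)])


def union(m1, m2):
    """New initial state with ε-transitions to both initial states (plus a new final state)."""
    s, f = next(_ids), next(_ids)
    glue = [(s, EPS, m1.start), (s, EPS, m2.start), (m1.final, EPS, f), (m2.final, EPS, f)]
    return Frag(s, f, m1.arcs + m2.arcs + glue)


def star(m):
    """Thompson-style Kleene closure: allow skipping m or repeating it."""
    s, f = next(_ids), next(_ids)
    glue = [(s, EPS, m.start), (s, EPS, f), (m.final, EPS, m.start), (m.final, EPS, f)]
    return Frag(s, f, m.arcs + glue)


def eps_closure(states, arcs):
    """All states reachable from `states` using only ε-transitions."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for src, label, dst in arcs:
            if src == q and label is EPS and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def accepts(m, s):
    """Simulate the NFA by tracking the set of states it could be in."""
    current = eps_closure({m.start}, m.arcs)
    for ch in s:
        moved = {dst for src, label, dst in m.arcs if src in current and label == ch}
        current = eps_closure(moved, m.arcs)
    return m.final in current


# a* b+ written with the basic operators as a* (b b*)
nfa = concat(star(symbol("a")), concat(symbol("b"), star(symbol("b"))))
print([w for w in ["b", "aab", "abbb", "", "ba", "aba"] if accepts(nfa, w)])
# ['b', 'aab', 'abbb']
```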
26
Regular expression → NFA (cont)
An example: \d+(\.\d+)?(e\-?\d+)?
Kleene closure:
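A DFA for the example regex can also be built by hand. A minimal Python sketch, checked against Python's `re` module; the state names are hypothetical, and the sketch treats only ASCII 0-9 as digits (Python's `\d` also matches other Unicode decimal digits):

```python
import re

DIGITS = set("0123456789")


def char_class(ch):
    """Map a character to the DFA's input symbol ("DIGIT" cannot collide with a single char)."""
    return "DIGIT" if ch in DIGITS else ch


# Hand-built DFA for \d+(\.\d+)?(e\-?\d+)?
DELTA = {
    ("start", "DIGIT"): "int",
    ("int", "DIGIT"): "int", ("int", "."): "dot", ("int", "e"): "exp",
    ("dot", "DIGIT"): "frac",
    ("frac", "DIGIT"): "frac", ("frac", "e"): "exp",
    ("exp", "-"): "sign", ("exp", "DIGIT"): "expnum",
    ("sign", "DIGIT"): "expnum",
    ("expnum", "DIGIT"): "expnum",
}
FINAL = {"int", "frac", "expnum"}


def is_number(s):
    q = "start"
    for ch in s:
        q = DELTA.get((q, char_class(ch)))
        if q is None:  # missing arc: reject
            return False
    return q in FINAL


PATTERN = re.compile(r"\d+(\.\d+)?(e\-?\d+)?")
for s in ["3", "3.14", "1e-5", "2.5e10", "3.", ".5", "1e", "1e-"]:
    print(s, is_number(s), bool(PATTERN.fullmatch(s)))
```

Each optional group in the regex corresponds to an optional detour in the DFA: "dot"/"frac" for the fraction and "exp"/"sign"/"expnum" for the exponent.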
27
Regular grammar and FSA
• Regular grammar:
• FSA:
• Conversion between them: Hw3
28
Relation between DFA and NFA
• DFA and NFA are equivalent.
• The conversion from NFA to DFA:
– Create a new DFA state for each set of NFA states that the NFA can be in at the same time
– The max number of states in the DFA is 2^N, where N is the number of states in the NFA.
• Why do we need both?
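The NFA-to-DFA conversion (subset construction) can be sketched in Python as follows. It reuses the NFA from the earlier example and assumes, hypothetically, that q2 is its only final state; `determinize`, `dfa_accepts`, and `nfa_accepts` are illustrative names. Of the 2^3 = 8 possible subsets, only 6 are reachable from {q0}:

```python
from itertools import product

# NFA from the earlier example; NFA_FINAL = {"q2"} is a hypothetical choice
NFA_DELTA = {
    ("q0", "b"): {"q0", "q1"},
    ("q1", "a"): {"q1"},
    ("q1", "b"): {"q2"},
    ("q2", "a"): {"q0"},
    ("q2", "b"): {"q1"},
}
SIGMA = ("a", "b")
NFA_START = frozenset({"q0"})
NFA_FINAL = {"q2"}


def determinize(delta, sigma, start):
    """Subset construction: each DFA state is the frozenset of NFA states reachable together."""
    dfa_delta, seen, todo = {}, {start}, [start]
    while todo:
        subset = todo.pop()
        for a in sigma:
            target = frozenset(r for q in subset for r in delta.get((q, a), ()))
            dfa_delta[(subset, a)] = target  # the empty set acts as a dead state
            if target not in seen:
                seen.add(target)
                todo.append(target)
    return seen, dfa_delta


DFA_STATES, DFA_DELTA = determinize(NFA_DELTA, SIGMA, NFA_START)
DFA_FINAL = {S for S in DFA_STATES if S & NFA_FINAL}  # final iff it contains an NFA final state


def dfa_accepts(s):
    state = NFA_START
    for ch in s:
        state = DFA_DELTA[(state, ch)]
    return state in DFA_FINAL


def nfa_accepts(s):
    current = set(NFA_START)
    for ch in s:
        current = {r for q in current for r in NFA_DELTA.get((q, ch), ())}
    return bool(current & NFA_FINAL)


print(len(DFA_STATES))  # 6
print(all(dfa_accepts("".join(w)) == nfa_accepts("".join(w))
          for n in range(7) for w in product(SIGMA, repeat=n)))  # True
```

This also hints at an answer to "why do we need both?": the NFA is often smaller and easier to build (e.g., from a regex), while the DFA is faster to run.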
29
Common algorithms for FSA packages
• Converting regular expressions to NFAs
• Converting NFAs to regular expressions
• Determinization: converting NFA to DFA
• Other useful closure properties: union, concatenation, Kleene closure, intersection
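Intersection is usually implemented with the product construction: run both DFAs in lockstep and accept iff both accept. A minimal Python sketch, assuming total DFAs; the even-length machine is a hypothetical second operand chosen for the demo, and `intersect` and `run` are illustrative names:

```python
# Each DFA is (start, delta, finals) over Σ = {a, b}; both must be total
SIGMA = ("a", "b")

# a* b+ (the earlier example, completed with a non-final sink state)
A_STAR_B_PLUS = ("q0", {
    ("q0", "a"): "q0", ("q0", "b"): "q1",
    ("q1", "a"): "sink", ("q1", "b"): "q1",
    ("sink", "a"): "sink", ("sink", "b"): "sink",
}, {"q1"})

# Strings of even length (hypothetical second machine)
EVEN = ("even", {
    ("even", "a"): "odd", ("even", "b"): "odd",
    ("odd", "a"): "even", ("odd", "b"): "even",
}, {"even"})


def intersect(d1, d2, sigma):
    """Product construction: states are pairs; accept iff both components accept."""
    (s1, delta1, f1), (s2, delta2, f2) = d1, d2
    start = (s1, s2)
    delta, seen, todo = {}, {start}, [start]
    while todo:
        p, q = todo.pop()
        for a in sigma:
            target = (delta1[(p, a)], delta2[(q, a)])
            delta[((p, q), a)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    finals = {(p, q) for (p, q) in seen if p in f1 and q in f2}
    return start, delta, finals


def run(dfa, s):
    start, delta, finals = dfa
    state = start
    for ch in s:
        state = delta[(state, ch)]
    return state in finals


BOTH = intersect(A_STAR_B_PLUS, EVEN, SIGMA)
print([w for w in ["b", "ab", "bb", "abb", "aabb", "ba"] if run(BOTH, w)])
# ['ab', 'bb', 'aabb']
```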
30
So far
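• A DFA is a 5-tuple: $(\Sigma, Q, q_0, F, \delta)$ with $\delta: Q \times \Sigma \to Q$
• An NFA is a 5-tuple: $(\Sigma, Q, q_0, F, \delta)$ with $\delta: Q \times (\Sigma \cup \{\epsilon\}) \to 2^Q$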
• DFA and NFA are equivalent.
• Any regular language can be recognized by an FSA.
– Regular expressions, NFAs, DFAs, and regular grammars all describe exactly the regular languages and can be converted into one another.
31
Outline
• Deterministic finite state automata (DFA)
• Non-deterministic finite state automata (NFA)
• Probabilistic finite state automata (PFA)
• Weighted Finite state automata (WFA)
32
An example of PFA
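[Figure: PFA with states q0 (final prob. 0) and q1 (final prob. 0.2); arcs q0 -a:1.0-> q1, q1 -b:0.8-> q1]
I(q0) = 1.0, I(q1) = 0.0
F(q0) = 0, F(q1) = 0.2
$$P(ab^n) = I(q_0) \cdot P(q_0, ab^n, q_1) \cdot F(q_1) = 1.0 \times 1.0 \times 0.8^n \times 0.2$$
$$\sum_x P(x) = \sum_{n=0}^{\infty} P(ab^n) = \sum_{n=0}^{\infty} 0.2 \times 0.8^n = \frac{0.2}{1 - 0.8} = 1$$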
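The arithmetic above can be checked numerically. A short Python sketch (`p_ab_n` is a hypothetical name) that evaluates P(ab^n) and a long partial sum of the geometric series:

```python
def p_ab_n(n):
    """P(a b^n) = I(q0) * P(q0 -a-> q1) * 0.8^n * F(q1) for the example PFA."""
    return 1.0 * 1.0 * 0.8 ** n * 0.2


partial = sum(p_ab_n(n) for n in range(200))  # equals 1 - 0.8**200, i.e. essentially 1
print(p_ab_n(2))          # about 0.128
print(round(partial, 6))  # 1.0
```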
33
Formal definition of PFA
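A PFA is a 6-tuple $(Q, \Sigma, I, F, \delta, P)$:
• $Q$: a finite set of N states
• $\Sigma$: a finite set of input symbols
• $I: Q \to \mathbb{R}^+$ (initial-state probabilities)
• $F: Q \to \mathbb{R}^+$ (final-state probabilities)
• $\delta \subseteq Q \times (\Sigma \cup \{\epsilon\}) \times Q$: the transition relation between states
• $P: \delta \to \mathbb{R}^+$ (transition probabilities)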
34
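Constraints on the functions:
$$\sum_{q \in Q} I(q) = 1$$
$$\forall q \in Q:\quad F(q) + \sum_{a \in \Sigma \cup \{\epsilon\},\; q' \in Q} P(q, a, q') = 1$$
Probability of a string:
$$P(w_{1,n}, q_{1,n+1}) = I(q_1) \cdot F(q_{n+1}) \cdot \prod_{i=1}^{n} P(q_i, w_i, q_{i+1})$$
$$P(w_{1,n}) = \sum_{q_{1,n+1}} P(w_{1,n}, q_{1,n+1})$$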
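Summing over all state sequences naively is exponential in the string length; the same sum can instead be accumulated left to right (the forward algorithm). A minimal Python sketch on the example PFA that also checks the constraints, assuming no ε-arcs; `check_constraints` and `string_prob` are hypothetical names:

```python
from collections import defaultdict

# The example PFA as dictionaries (ε-arcs are not handled in this sketch)
I = {"q0": 1.0, "q1": 0.0}
F = {"q0": 0.0, "q1": 0.2}
P = {("q0", "a", "q1"): 1.0, ("q1", "b", "q1"): 0.8}


def check_constraints(I, F, P, tol=1e-9):
    """Σ_q I(q) = 1 and, for every state q, F(q) + Σ_{a,q'} P(q, a, q') = 1."""
    if abs(sum(I.values()) - 1.0) > tol:
        return False
    out = defaultdict(float)
    for (q, _a, _r), p in P.items():
        out[q] += p
    return all(abs(F[q] + out[q] - 1.0) <= tol for q in I)


def string_prob(w, I, F, P):
    """P(w): sum over all state sequences, computed left to right (forward algorithm)."""
    alpha = dict(I)  # alpha[q] = prob. of starting, reading the prefix, and ending in q
    for sym in w:
        nxt = defaultdict(float)
        for (q, a, r), p in P.items():
            if a == sym:
                nxt[r] += alpha.get(q, 0.0) * p
        alpha = nxt
    return sum(alpha.get(q, 0.0) * F[q] for q in F)


print(check_constraints(I, F, P))   # True
print(string_prob("abb", I, F, P))  # about 0.128 = 1.0 * 1.0 * 0.8**2 * 0.2
print(string_prob("ba", I, F, P))   # 0.0
```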
35
PFA
• Informally, in a PFA, each arc is associated with a probability.
• The probability of a path is the product of the probabilities of the arcs on the path (times the initial-state probability of its first state and the final-state probability of its last state).
• The probability of a string x is the sum of the probabilities of all the paths for x.
• Tasks:
– Given a string x, find the best path for x.
– Given a string x, find the probability of x in a PFA.
– Find the string with the highest probability in a PFA.
– …
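For the first task, the best path can be found by replacing the sum in the forward computation with a max (the Viterbi algorithm). A minimal Python sketch on a small hypothetical PFA (not from the slides) in which the best paths for "a" and for "ab" end in different states; `best_path` is an illustrative name:

```python
# A small, hypothetical PFA in which "a" can lead to two different states
I = {"s": 1.0}
F = {"s": 0.0, "x": 0.5, "y": 0.9}
P = {("s", "a", "x"): 0.6, ("s", "a", "y"): 0.4, ("x", "b", "x"): 0.5, ("y", "b", "y"): 0.1}


def best_path(w, I, F, P):
    """Viterbi: the most probable single path for w, returned as (probability, states)."""
    best = {q: (p, [q]) for q, p in I.items() if p > 0}  # best[q] = (prob, path ending in q)
    for sym in w:
        nxt = {}
        for (q, a, r), p in P.items():
            if a == sym and q in best:
                cand = best[q][0] * p
                if r not in nxt or cand > nxt[r][0]:  # keep only the best way into r
                    nxt[r] = (cand, best[q][1] + [r])
        best = nxt
    scored = [(prob * F[q], path) for q, (prob, path) in best.items() if F[q] > 0]
    return max(scored, default=(0.0, None))


print(best_path("a", I, F, P))   # best path s y (0.4 * 0.9)
print(best_path("ab", I, F, P))  # best path s x x (0.6 * 0.5 * 0.5)
```

Note that the best path's probability is generally smaller than the string's probability: for "a", the best path scores 0.36 while P("a") = 0.3 + 0.36 = 0.66.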
36
Another PFA example: A bigram language model
$$P(\text{BOS}\; w_1 w_2 \ldots w_n\; \text{EOS}) = P(\text{BOS}) \cdot P(w_1 \mid \text{BOS}) \cdot P(w_2 \mid w_1) \cdots P(w_n \mid w_{n-1}) \cdot P(\text{EOS} \mid w_n)$$
Examples: I bought two/to/too books
How many states?
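Viewed as a PFA, a state remembers the previous word, each arc is labeled with the next word and weighted by P(w_i | w_{i-1}), and P(EOS | w_n) plays the role of the final-state probability (in the code it is simply the last factor). A minimal Python sketch with invented probabilities (not from the slides) that picks the most probable of the three candidate sentences; `BIGRAM` and `sentence_prob` are hypothetical names:

```python
# Hypothetical bigram probabilities P(next | prev); the numbers are invented for illustration
BIGRAM = {
    ("BOS", "I"): 0.2,
    ("I", "bought"): 0.1,
    ("bought", "two"): 0.05, ("bought", "to"): 0.01, ("bought", "too"): 0.001,
    ("two", "books"): 0.1, ("to", "books"): 0.001, ("too", "books"): 0.001,
    ("books", "EOS"): 0.3,
}


def sentence_prob(words):
    """P(BOS w1 ... wn EOS) as a product of bigram probabilities, taking P(BOS) = 1."""
    seq = ["BOS"] + words + ["EOS"]
    prob = 1.0
    for prev, cur in zip(seq, seq[1:]):
        prob *= BIGRAM.get((prev, cur), 0.0)  # unseen bigram -> probability 0 (no smoothing)
    return prob


candidates = [["I", "bought", w, "books"] for w in ("two", "to", "too")]
best = max(candidates, key=sentence_prob)
print(" ".join(best))  # I bought two books
```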
37
Weighted finite-state automata (WFA)
• Each arc is associated with a weight.
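• “Sum” and “multiplication” can have other meanings.
• Ex: weight is –log prob
- “Multiplication” → addition
- “Sum” → $-\log(e^{-x} + e^{-y})$, which requires exponentiation (“power”)
$$\text{weight}(x) = \bigoplus_{s,\, t \in Q} \big( I(s) \otimes P(s, x, t) \otimes F(t) \big)$$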
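With −log probabilities as weights, "multiplication" becomes addition and "sum" becomes log-addition (the log semiring); replacing log-addition with min gives the tropical semiring, which scores only the best path. A minimal Python sketch on the earlier PFA example; `log_add`, `LOG`, `TROPICAL`, and `weight` are hypothetical names:

```python
import math


def nl(p):
    """Convert a probability to a -log weight (probability 0 -> infinite weight)."""
    return math.inf if p == 0 else -math.log(p)


# The earlier PFA example, with -log probabilities as weights
I = {"q0": nl(1.0), "q1": nl(0.0)}
F = {"q0": nl(0.0), "q1": nl(0.2)}
ARCS = {("q0", "a", "q1"): nl(1.0), ("q1", "b", "q1"): nl(0.8)}


def log_add(x, y):
    """'Sum' in the log semiring: -log(e^-x + e^-y), computed stably."""
    if x == math.inf:
        return y
    if y == math.inf:
        return x
    return min(x, y) - math.log1p(math.exp(-abs(x - y)))


LOG = (log_add, lambda x, y: x + y, math.inf)   # (plus, times, zero)
TROPICAL = (min, lambda x, y: x + y, math.inf)  # "sum" = min keeps only the best path


def weight(w, semiring):
    """⊕ over all paths of I(s) ⊗ (arc weights) ⊗ F(t), computed left to right."""
    plus, times, zero = semiring
    alpha = dict(I)
    for sym in w:
        nxt = {}
        for (q, a, r), c in ARCS.items():
            if a == sym:
                nxt[r] = plus(nxt.get(r, zero), times(alpha.get(q, zero), c))
        alpha = nxt
    total = zero
    for q, f in F.items():
        total = plus(total, times(alpha.get(q, zero), f))
    return total


print(weight("abb", LOG), -math.log(0.128))  # equal up to rounding
print(weight("abb", TROPICAL))               # same here: "abb" has only one path
```

In this single-path example both semirings agree; with several paths for x, the tropical version returns the best path's weight while the log version returns −log P(x).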
38
Summary
• DFA and NFA are 5-tuples:
– They are equivalent
– Algorithm for constructing NFAs from regexps
• PFA and WFA are 6-tuples: $(Q, \Sigma, I, F, \delta, P)$
• Existing packages for FSA/FSM algorithms:
– Ex: intersection, union, Kleene closure, difference, complementation, …