LING 438/538 Computational Linguistics, Sandiway Fong, Lecture 11: 10/3
Post on 20-Dec-2015
Administrivia
• homework 2
  – will be returned tomorrow (by email)
• homework 3
  – will be out on Thursday
Last Tuesday
• textbook
  – Chapter 2: Regular Expressions and Finite State Automata
• regular expressions
  – Unix grep and wildcard search in Microsoft Word
• implementing the FSA in Prolog
  – Method 1:
    • two-line program fsa/2
    • plus transition/3 (the δ function) and final_state/1
  – Method 2:
    • define each state, e.g. x, as a predicate, e.g. x/1,
    • taking the input list as an argument
  – non-determinism handled by Prolog’s computation rule
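Method 2 can also be sketched outside Prolog. The Python below is a minimal sketch, assuming the machine defined on the Start State(s) slide (δ(a,s)=x, δ(a,x)=x, δ(b,x)=y, δ(b,y)=y, end state y); the function names simply mirror the state names. A non-deterministic state would have to try each alternative in turn (e.g. by combining the recursive calls with `or`), which is what Prolog's computation rule does for free.

```python
# Method 2 sketch: one function per state, each taking the remaining input.
# Assumed machine: s -a-> x, x -a-> x, x -b-> y, y -b-> y; end state y.

def s(chars: str) -> bool:
    # s is not an end state and has only an a-transition (to x)
    return len(chars) > 0 and chars[0] == "a" and x(chars[1:])

def x(chars: str) -> bool:
    if not chars:
        return False              # x is not an end state
    if chars[0] == "a":
        return x(chars[1:])       # δ(a,x) = x
    if chars[0] == "b":
        return y(chars[1:])       # δ(b,x) = y
    return False                  # no transition: reject

def y(chars: str) -> bool:
    if not chars:
        return True               # y is the end state
    return chars[0] == "b" and y(chars[1:])   # δ(b,y) = y

print(s("aab"), s("abb"), s("ba"))   # True True False
```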
Determinism
• deterministic FSA (DFSA)
  – no ambiguity about where to go at any given state
• non-deterministic FSA (NDFSA)
  – no restriction on ambiguity (surprisingly, no increase in formal power)
• textbook
  – D-RECOGNIZE (Figure 2.13)
  – ND-RECOGNIZE (Figure 2.21)
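    % fsa(S, L): starting in state S, the machine accepts input list L
    fsa(S, L) :-
        L = [C|M],             % C = next character, M = rest of input
        transition(S, C, T),   % δ: from S on C go to T
        fsa(T, M).             % continue from T on the rest
    fsa(E, []) :-              % input used up:
        end_state(E).          % accept iff E is an end state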
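The backtracking that fsa/2 gets from Prolog can be made explicit in Python. This is a sketch under stated assumptions: the transition relation is a list of (state, char, next_state) triples, and the example machine (strings over {a, b} ending in ab, start s, end state y) is hypothetical, chosen because from s on a it has two choices. The loop tries each matching transition in order and moves on to the next one when a later step fails, much as Prolog does.

```python
# Hypothetical NDFSA over {a, b}: accepts strings ending in "ab".
# From s on "a" there are two choices (stay in s, or go to x).
TRANSITIONS = [
    ("s", "a", "s"), ("s", "b", "s"),
    ("s", "a", "x"),
    ("x", "b", "y"),
]
END_STATES = {"y"}

def fsa(state, chars):
    """True if the machine, currently in `state`, accepts `chars`."""
    if not chars:                          # fsa(E, []) :- end_state(E).
        return state in END_STATES
    c, rest = chars[0], chars[1:]          # L = [C|M]
    for (s, ch, t) in TRANSITIONS:         # transition(S, C, T), in list order
        if s == state and ch == c and fsa(t, rest):
            return True                    # this choice led to acceptance
    return False                           # every choice failed

print(fsa("s", "abab"))   # True
print(fsa("s", "aba"))    # False
```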
NDFSA → (D)FSA
[discussed at the end of section 2.2 in the textbook]
• construct a new machine
  – each state of the new machine represents the set of possible states of the original machine when stepping through the input
• Note:
  – the new machine is equivalent to the old one (but may have more states: up to 2ⁿ for an n-state original)
  – the new machine is deterministic
• example
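[figure: an NDFSA with states s, x, y, z over {a, b} (left) and the equivalent deterministic machine built from it (right), whose states are s, {x,y}, {z}, {y,z} and {y}]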
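The set-of-states (subset) construction can be sketched in Python. Assumptions: the NDFSA is given as (state, char, next_state) triples, and the example is a hypothetical machine for strings ending in ab, not the machine in the slide's figure. Each new state is a frozenset of old states, and it is an end state if it contains at least one old end state.

```python
from collections import deque

def determinize(start, transitions, end_states, alphabet):
    """Subset construction: each DFSA state is a frozenset of NDFSA states."""
    def step(states, c):
        # every NDFSA state reachable from some state in `states` on c
        return frozenset(t for (s, ch, t) in transitions if s in states and ch == c)

    new_start = frozenset([start])
    delta = {}                               # (set-state, char) -> set-state
    seen = {new_start}
    queue = deque([new_start])
    while queue:
        current = queue.popleft()
        for c in alphabet:
            nxt = step(current, c)
            if not nxt:                      # empty set: leave δ undefined (reject)
                continue
            delta[(current, c)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    new_ends = {S for S in seen if S & end_states}
    return new_start, delta, new_ends, seen

def d_recognize(start, delta, end_states, string):
    state = start
    for c in string:
        if (state, c) not in delta:
            return False
        state = delta[(state, c)]
    return state in end_states

# Hypothetical NDFSA: strings over {a, b} ending in "ab"
NDFSA = [("s", "a", "s"), ("s", "b", "s"), ("s", "a", "x"), ("x", "b", "y")]
start, delta, ends, states = determinize("s", NDFSA, {"y"}, "ab")
print(sorted(sorted(S) for S in states))   # [['s'], ['s', 'x'], ['s', 'y']]
```

Only set-states reachable from {s} are ever built, so this example yields three states rather than all 2³ = 8 subsets of {s, x, y}.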
ε-transitions
• jump from one state to another on the empty character
  – called an ε-transition (textbook) or a λ-transition
  – no increase in expressive power
• examples
[figure: example automata over {a, b} that use an ε-transition]
• what’s the equivalent without the ε-transition?
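One common way to run a machine with ε-transitions is to follow them eagerly: after every step, add the ε-closure, i.e. all states reachable by ε-moves alone. A minimal Python sketch, assuming an encoding in which the empty string labels an ε-transition; the example machine (0 --a--> 1, 1 --ε--> 0, 1 --b--> 2, end state 2) is hypothetical and accepts a+b.

```python
EPS = ""   # assumption: "" labels an ε-transition

def eps_closure(states, transitions):
    """All states reachable from `states` using ε-transitions only."""
    closure = set(states)
    stack = list(states)
    while stack:
        p = stack.pop()
        for (src, ch, dst) in transitions:
            if src == p and ch == EPS and dst not in closure:
                closure.add(dst)
                stack.append(dst)
    return closure

def accepts(start, transitions, end_states, string):
    current = eps_closure({start}, transitions)
    for c in string:
        moved = {dst for (src, ch, dst) in transitions if src in current and ch == c}
        current = eps_closure(moved, transitions)
    return bool(current & end_states)

# Hypothetical machine for a+b: 0 --a--> 1, 1 --ε--> 0, 1 --b--> 2
EXAMPLE = [(0, "a", 1), (1, EPS, 0), (1, "b", 2)]
print(accepts(0, EXAMPLE, {2}, "aab"))   # True
print(accepts(0, EXAMPLE, {2}, "abb"))   # False
```

For this hypothetical machine the ε-free equivalent replaces 1 --ε--> 0 with the move it enables, a loop 1 --a--> 1.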
Start State(s)
• Finite State Automata (FSA)
  – (Q, s, f, Σ, δ)
    1. set of states (Q): {s, x, y}; must be a finite set
    2. start state (s): s
    3. end state(s) (f): y
    4. alphabet (Σ): {a, b}
    5. transition function δ:
       signature: character × state → state
       δ(a,s) = x, δ(a,x) = x, δ(b,x) = y, δ(b,y) = y
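[figure: the same FSA as a diagram, with > marking the start state s: s --a--> x, x --a--> x (loop), x --b--> y, y --b--> y (loop)]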
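With δ stored as a lookup table, deterministic recognition is a single left-to-right pass with no backtracking. The Python below is a sketch in the spirit of D-RECOGNIZE, not a transcription of the textbook's pseudocode; it uses exactly the δ listed above, keyed as (character, state) to match the slide's signature, and treats an undefined δ entry as rejection.

```python
# δ from the slide, keyed (character, state) to follow its signature
DELTA = {("a", "s"): "x", ("a", "x"): "x", ("b", "x"): "y", ("b", "y"): "y"}
START = "s"
END_STATES = {"y"}

def d_recognize(string):
    state = START
    for c in string:
        if (c, state) not in DELTA:   # δ undefined here: reject
            return False
        state = DELTA[(c, state)]
    return state in END_STATES        # accept only if we stop in an end state

for w in ["ab", "aabbb", "a", "ba"]:
    print(w, d_recognize(w))
```

This machine accepts a+b+, the language that reappears on the Limits of Finite State Technology slides.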
FSA Properties
• FSAs (and thus regular languages) are closed under the following operations, i.e. the result is still an FSA:
  – concatenation
  – union
  – intersection
  – complementation
  – and other operations...
  – [see section 2.3 of the textbook]
concatenation
• concatenate two FSAs: the result is an FSA
  – trick: use ε-transitions to link the automata
• example
  – [Figure 2.24]
union
• disjunction (union) of two FSAs: the result is an FSA
  – trick: use ε-transitions to link the automata
• example
  – [Figure 2.26]
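Both ε tricks can be sketched in Python. Assumptions: a machine is a (start, transitions, end_states) triple with "" labelling ε, and states are tagged 1 or 2 so the two machines cannot clash. Union adds a fresh start state s0 with ε-transitions into both machines; concatenation adds an ε-transition from every end state of the first machine to the start state of the second. The two small input machines (for a+ and b+) are hypothetical.

```python
EPS = ""   # assumption: "" labels an ε-transition

def tag(machine, t):
    """Prefix every state with t so two machines never share a state name."""
    start, trans, ends = machine
    return ((t, start),
            [((t, p), c, (t, q)) for (p, c, q) in trans],
            {(t, e) for e in ends})

def union(m1, m2):
    (s1, t1, e1), (s2, t2, e2) = tag(m1, 1), tag(m2, 2)
    # fresh start state s0 with ε-transitions into both machines
    return "s0", t1 + t2 + [("s0", EPS, s1), ("s0", EPS, s2)], e1 | e2

def concat(m1, m2):
    (s1, t1, e1), (s2, t2, e2) = tag(m1, 1), tag(m2, 2)
    # every end state of m1 gets an ε-transition to the start of m2
    return s1, t1 + t2 + [(e, EPS, s2) for e in e1], e2

def accepts(machine, string):
    start, trans, ends = machine

    def closure(states):                   # add everything reachable by ε alone
        stack, seen = list(states), set(states)
        while stack:
            p = stack.pop()
            for (src, c, dst) in trans:
                if src == p and c == EPS and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen

    current = closure({start})
    for ch in string:
        current = closure({dst for (src, c, dst) in trans if src in current and c == ch})
    return bool(current & ends)

A_PLUS = ("p", [("p", "a", "q"), ("q", "a", "q")], {"q"})   # a+
B_PLUS = ("p", [("p", "b", "q"), ("q", "b", "q")], {"q"})   # b+

AB = concat(A_PLUS, B_PLUS)        # a+b+
A_OR_B = union(A_PLUS, B_PLUS)     # a+ or b+
for w in ["ab", "aabb", "ba", "aaa", "bb"]:
    print(w, accepts(AB, w), accepts(A_OR_B, w))
```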
intersection
• (conjunction) intersect two FSAs: the result is an FSA
  – trick: use a (modified) set-of-states construction
• example
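[figure: FSA 1 for a+b* (states s1, x, y), FSA 2 for a*b+ (states s2, z), and their intersection with states {s1,s2}, {x,s2}, {y,z}: {s1,s2} --a--> {x,s2}, {x,s2} --a--> {x,s2}, {x,s2} --b--> {y,z}, {y,z} --b--> {y,z}]
look familiar? that’s because a+b* ∩ a*b+ = a+b+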
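The intersection can be sketched with the product construction: each new state pairs a state of FSA 1 with a state of FSA 2, a move exists only when both machines can make it, and a pair is an end state only when both halves are. This is a minimal Python sketch; the two input machines are our reading of the slide's figure (a+b* with end states x, y; a*b+ with end state z), and the comparison against Python's `re` module covers only strings up to length 6.

```python
import re
from itertools import product

def intersect(m1, m2, alphabet):
    """Product construction on two DFSAs given as (start, delta, end_states)."""
    (s1, d1, e1), (s2, d2, e2) = m1, m2
    start = (s1, s2)
    delta, seen, todo = {}, {start}, [start]
    while todo:
        p, q = todo.pop()
        for c in alphabet:
            if (p, c) in d1 and (q, c) in d2:        # both machines can move
                nxt = (d1[(p, c)], d2[(q, c)])
                delta[((p, q), c)] = nxt
                if nxt not in seen:
                    seen.add(nxt)
                    todo.append(nxt)
    ends = {(p, q) for (p, q) in seen if p in e1 and q in e2}
    return start, delta, ends, seen

def run(m, string):
    state, delta, ends = m[0], m[1], m[2]
    for c in string:
        if (state, c) not in delta:
            return False
        state = delta[(state, c)]
    return state in ends

# FSA 1, a+b*: s1 -a-> x, x -a-> x, x -b-> y, y -b-> y; end states x, y
M1 = ("s1", {("s1", "a"): "x", ("x", "a"): "x", ("x", "b"): "y", ("y", "b"): "y"}, {"x", "y"})
# FSA 2, a*b+: s2 -a-> s2, s2 -b-> z, z -b-> z; end state z
M2 = ("s2", {("s2", "a"): "s2", ("s2", "b"): "z", ("z", "b"): "z"}, {"z"})

M = intersect(M1, M2, "ab")
print(sorted(M[3]))   # [('s1', 's2'), ('x', 's2'), ('y', 'z')]
agree = all(run(M, "".join(w)) == bool(re.fullmatch(r"a+b+", "".join(w)))
            for n in range(7) for w in product("ab", repeat=n))
print(agree)          # True
```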
complementation
• (complementation) the negation, or opposite, of an FSA, taken with respect to Σ*
  – Σ* is the set of all possible strings over the alphabet
  – i.e. it accepts everything the original FSA rejects
  – and rejects everything the original FSA accepts
  – the result is still an FSA
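One standard way to build the complement, sketched in Python: assume the machine is already deterministic (an NDFSA would first go through the set-of-states construction), complete it with a sink state (named dead here, a hypothetical name) so that every (state, character) pair has a move, and then swap end and non-end states. The example is the a+b+ machine from the Start State(s) slide, with δ keyed (state, character) for convenience.

```python
DEAD = "dead"   # hypothetical sink state; assumed not to clash with existing names

def complement(start, delta, ends, states, alphabet):
    """Complete the DFSA with a sink state, then swap end and non-end states."""
    all_states = set(states) | {DEAD}
    full = dict(delta)
    for s in all_states:
        for c in alphabet:
            full.setdefault((s, c), DEAD)   # missing move -> sink (sink loops to itself)
    return start, full, all_states - set(ends)

def run(m, string):
    state, delta, ends = m
    for c in string:
        state = delta[(state, c)]           # δ is total after completion
    return state in ends

# a+b+ from the Start State(s) slide, keyed (state, character)
DELTA = {("s", "a"): "x", ("x", "a"): "x", ("x", "b"): "y", ("y", "b"): "y"}
NOT_A_PLUS_B_PLUS = complement("s", DELTA, {"y"}, {"s", "x", "y"}, "ab")

for w in ["", "ab", "aabb", "ba", "aba"]:
    print(repr(w), run(NOT_A_PLUS_B_PLUS, w))
```

Completing first matters: if missing moves were left undefined, a string such as ba would be rejected by both the original machine and its supposed complement.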
Limits of Finite State Technology
• Language = set of strings
• case 1
  – suppose the set is finite
  – e.g. L = {ba, abc, ccb, dd}
    • easy to encode as an FSA (by closure under union)
• case 2
  – set is infinite
  – ...
[figure: one chain of states per string (ba, abc, ccb, dd), each entered from a new start state s0 by an ε-transition]
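The union-of-chains idea for a finite language can be sketched in Python. Assumptions: "" labels an ε-transition, and the chain states are named (word index, position), which is our own naming. Because ε-transitions only leave s0, they can be taken once, up front, instead of after every step.

```python
EPS = ""   # assumption: "" labels an ε-transition

def finite_language_fsa(words):
    """One chain of states per word, each entered from s0 by an ε-transition."""
    transitions, ends = [], set()
    for i, w in enumerate(words):
        transitions.append(("s0", EPS, (i, 0)))            # s0 --ε--> start of chain i
        for k, ch in enumerate(w):
            transitions.append(((i, k), ch, (i, k + 1)))   # chain i spells out w
        ends.add((i, len(w)))                              # last state of chain i
    return "s0", transitions, ends

def accepts(machine, string):
    start, transitions, ends = machine
    # ε-moves only leave s0, so take them once, up front
    current = {start} | {q for (p, c, q) in transitions if p == start and c == EPS}
    for ch in string:
        current = {q for (p, c, q) in transitions if p in current and c == ch}
    return bool(current & ends)

M = finite_language_fsa(["ba", "abc", "ccb", "dd"])
print([w for w in ["ba", "abc", "ab", "dd", "ccbb"] if accepts(M, w)])   # ['ba', 'abc', 'dd']
```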
Limits of Finite State Technology
• Language = set of strings
• case 2
  – set is infinite
  – e.g. L = a+b+ = {ab, aab, abb, aabb, aaab, abbb, …}
    • “one or more a’s followed by one or more b’s”
    • we know this set is regular
  – however, consider L = {aⁿbⁿ | n ≥ 1} = {ab, aabb, aaabbb, …}
    • “the same number of b’s as a’s…”
    • this set is not regular. Why?
[figure: FSA for a+b+: s --a--> x, x --a--> x, x --b--> y, y --b--> y]
The Limits of Finite State Technology
• [Formally, we can use the Pumping Lemma to prove this particular case.]
• informally,
  – we can build an FSA for…
    – ab
    – aabb
    – aaabbb
    – …
[figure: a separate chain of states for each of ab, aabb and aaabbb, each ending in an end state]
The Limits of Finite State Technology
• we can merge the individual FSAs for…
  – ab
  – aabb
  – aaabbb
[figure: the individual FSAs merged into one, sharing their common a-prefix, with a separate run of b’s for each number of a’s]
• such a direct encoding would require an infinite number of states
  – but we’re using Finite State Automata
• quite different from the infinity obtained by looping
  – loops iterate freely (no counting)
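The informal argument can be made concrete with the pigeonhole principle. Among the prefixes a⁰, a¹, a², …, any DFSA with k states (plus a possible "stuck" condition) must reach the same state after two of them, say aⁱ and aʲ with i < j. It then treats aʲbʲ (in the language) and aⁱbʲ (not in it) identically, so it is wrong on one of them. The Python sketch below finds such a pair for a DFSA given as a dictionary keyed (state, character); the candidate machine is hypothetical, one that counts a's up to two and then just loops.

```python
def find_confusion(start, delta, n_states):
    """Return (i, j), i < j, such that a^i and a^j lead to the same state."""
    first_seen = {}                       # state -> smallest i reaching it on a^i
    state = start
    for i in range(n_states + 2):         # n_states states plus None ("stuck")
        if state in first_seen:
            return first_seen[state], i
        first_seen[state] = i
        state = delta.get((state, "a"))   # None means no transition: stuck
    raise AssertionError("unreachable: pigeonhole principle")

def run(start, delta, ends, string):
    state = start
    for c in string:
        state = delta.get((state, c))     # once None, stays None
    return state in ends

# Hypothetical candidate for a^n b^n: counts a's up to 2, then just loops
DELTA = {(0, "a"): 1, (1, "a"): 2, (2, "a"): 2,
         (1, "b"): "F", (2, "b"): "B1", ("B1", "b"): "F"}
ENDS = {"F"}

i, j = find_confusion(0, DELTA, n_states=5)
in_lang, not_in_lang = "a" * j + "b" * j, "a" * i + "b" * j
print(i, j, run(0, DELTA, ENDS, in_lang), run(0, DELTA, ENDS, not_in_lang))   # 2 3 False False
```

Adding more counting states only pushes the failure further out; no finite number of states avoids it.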
The Limits of Finite State Technology
• example
  – L = a+b+ = {ab, abb, aab, aabb, aaab, abbb, …}
  – “one or more a’s followed by one or more b’s”
• Note:
  – can be divided into two independent halves
  – each half can be replaced by iteration
[figure: one chain of states per string: ab, aab, abb, aabb, aaab, abbb]
[figure: the chains for these strings joined from a new start state s0 by ε-transitions, then progressively merged as each half is replaced by iteration]