LING/C SC/PSYC 438/538Computational Linguistics
Sandiway Fong
Lecture 10: 9/27
Today’s Topics
• Homework 2 review
• Recap: DCG system
• New topic: Finite State Automata (FSA)
Homework 2 Review
• Question 1– Consider a language L3or5
= {111, 11111, 111111, 111111111, 1111111111, 111111111111, 111111111111111,...}
– each member of L3or5 is a string containing only 1s
– the number of 1s in each string is divisible by 3 or 5
• Give a regular grammar that generates language L3or5
Homework 2 Review
• Regular grammar:– divisible by 3 side
1. s --> [1], a.
2. a --> [1], b.
3. b --> [1].
4. b --> [1], c.
5. c --> [1], d.
6. d --> [1], b.
• Regular grammar:– divisible by 5 side7. s --> [1], one.8. one --> [1], two.9. two --> [1], three.10.three --> [1], four.11.four --> [1].12.four --> [1], five.13.five --> [1], six.14.six --> [1], seven.15.seven --> [1], eight.16.eight --> [1], four.
take the disjunction of the two sub-grammars, i.e. merge them
Homework 2 Review
• Question 2: Language L = {a2nbn+1 | n ≥1} is also non-regular
• but can be generated by a regular grammar extended to allow left and right recursive rules
• Given a Prolog grammar satisfying these rules for L
• Legit rules:X -> aYX -> aX -> Ya– where X, Y are non-
terminals, a is some arbitirary terminal
Homework 2 Review
• L = {a2nbn+1 | n≥ 1}
• DCG:1. s --> [a], b.
2. b --> [a], c.
3. c --> [b], d.
4. c --> s, [b].
5. d --> [b].
generated byrules 1 + 2 + 3 + 5
generated byrules 1 + 2 + 4(center-embeddedrecursive tree)
d
b
b
b
s
c
a
a
bs
b
s
c
a
a
Homework 2 Review
• Question 3 (Optional 438)A right recursive regular grammar that generates a (rightmost) parse for the language {an| n>=2}:s(s(a,B)) --> [a], b(B).b(b(a,B)) --> [a], b(B).b(b(a))--> [a].
• Example?- s(Tree,[a,a,a],[]).Tree = s(a,b(a,b(a)))?
• A corresponding left recursive regular grammar will not halt in all cases when an input string is supplied
• Modify the right recursive grammar to produce a left recursive parse, e.g.?- s(Tree,[a,a,a],[]).
Tree = s(b(b(a),a),a)
• Can you do this for any right recursive regular grammar? e.g. the one for sheeptalk
Homework 2 Review
• Original:s(s(a,B)) --> [a], b(B).b(b(a,B)) --> [a], b(B).b(b(a))--> [a].
• New:1. s(s(B,a)) --> [a], b(B).2. b(b(B,a)) --> [a], b(B).3. b(b(a))--> [a].
taking advantage of the fact that we know we’re generating only a’sexample: in rule 1, we match a terminal a on the left sidebut place an a at the end for the tree
Extra Arguments: Recap
• some uses for extra arguments on non-terminals in the grammar:
1. to generate a parse tree • recording the derivation history
2. feature agreement
3. implement counting
Extra Arguments: Recap
Implement Determiner-Noun Number Agreement
• Data:– the man/men– a man/*a men
• Modified grammar:np(np(D,N)) --> det(D,Number),
common_noun(N,Number).det(det(the),_) --> [the].det(det(a),sg) --> [a].common_noun(n(ball),sg) --> [ball].common_noun(n(man),sg) --> [man].common_noun(n(men),pl) --> [men].
np(np(D,N)) --> detsg(D), common_nounsg(N).np(np(D,N)) --> detpl(D), common_nounpl(N).detsg(det(a)) --> [a].detsg(det(the)) --> [the].detpl(det(the)) --> [the].common_nounsg(n(man)) -->[man]. common_nounpl(n(men)) --> [men].
syntactic sugarcould just encodefeature values in
nonterminal name
Extra Arguments: Recap
• Class Exercise:– Implement Case = {nom,acc} agreement system for
the grammar– Examples: the man hit me vs. *the man hit I
• Modified grammar:s(Y,Z)) --> np(Y,Case), vp(Z), { Case = nom }.np(np(Y),Case) --> pronoun(Y,Case).pronoun(i,nom) --> [i].pronoun(we,nom) --> [we]. pronoun(me,acc) --> [me]. np(np(D,N),_) --> det(D,Num), common_noun(N,Num).vp(vp(Y,Z)) --> transitive(Y), np(Z,Case), { Case = acc }.
{ ... Prolog code ... }
Extra Arguments: Recap
• Class Exercise:– Implement Case = {nom,acc} agreement system for
the grammar– Examples: the man hit me vs. *the man hit I
• Modified grammar:s(Y,Z)) --> np(Y,nom), vp(Z).np(np(Y),Case) --> pronoun(Y,Case).pronoun(i,nom) --> [i].pronoun(we,nom) --> [we]. pronoun(me,acc) --> [me]. np(np(D,N),_) --> det(D,Num), common_noun(N,Num).vp(vp(Y,Z)) --> transitive(Y), np(Z,acc).
without{ ... Prolog code ... }
Extra Arguments: Recap
1. s --> [a],b.2. b --> [a],b.3. b --> [b],c.4. b --> [b].5. c --> [b],c.6. c --> [b].
L = a+b+
a regular grammar
Lab = { anbn | n ≥ 1 }regular grammar + extra argument
1. s(X) --> [a],b(a(X)).2. b(X) --> [a],b(a(X)).3. b(a(X)) --> [b],c(X).4. b(a(0)) --> [b].5. c(a(X)) --> [b],c(X).6. c(a(0)) --> [b].
Query: ?- s(0,L,[]).
Extra Arguments: Recap
1. s(X) --> [a],b(a(X)).2. b(X) --> [a],b(a(X)).3. b(a(X)) --> [b],c(X).4. b(a(0)) --> [b].5. c(a(X)) --> [b],c(X).6. c(a(0)) --> [b].
• derivation tree:
s(0) rule 1
logic:?- s(0,L,[]).
start with 0 wrap an a(_) for each “a”unwrap a(_) for each “b”
match #a’s with #b’s = counting
a b(a(0)) rule 2
a b(a(a(0))) rule 3
b
b c(a(0)) rule 6
So far
• Equivalent formalisms
regexpregular
grammars
FSA
FSA
• example (sheeptalk) – baa!– baaa! …
• regexp– baa+!
w x
z
a
!
ya
a
> s b
points to start state end state
marked in red
basic idea: “just follow the arrows”
state
state transition:in state s,if current input symbol is bgo to state w
accepting computation:in a final state,if there is no moreinput, accept string
FSA: Construction
• step-by-step• regexp
– baa+!=
– baaa*!s>
FSA: Construction
• step-by-step• regexp
– baaa*!
– b
– from state s, – see a ‘b’, – move to state w
s wb>
FSA: Construction
• step-by-step• regexp
– baaa*!
– ba
– from state w, – see an ‘a’, – move to state x
s wb xa>
FSA: Construction
• step-by-step• regexp
– baaa*!
– baa
– from state x, – see an ‘a’, – move to state y
s wb xa>
y
a
FSA: Construction
• step-by-step• regexp
– baaa*!
– baaa*– baa– baaa– baaaa...
– from y,– see an ‘a’, – move to ?
s wb xa>
y
y’
a
a
a...
but machine musthave a finite numberof states!
Regular Expressions: Example
• step-by-step• regexp
– baaa*!
– baa*– baa– baaa– baaaa...
– from state y,– see an ‘a’, – “loop”, i.e. move back to,
state y
s wb xa
a
>
y
a
Regular Expressions: Example
• step-by-step
• regexp– baaa*!
– baaa*!
– from state y,– see an ‘!’, – move to final state z
(indicated in red)
Note: machine cannot finish (i.e. reach the end of the input string) in states s, w, x or y
s wb xa
a
>
y
a
z!
Finite State Automata (FSA)
• construction– the step-by-step FSA construction method we just
used – works for any regular expression
• conclusion– anything we can encode with a regular expression,
we can build a FSA for it
– an important step in showing that FSA and regexps are formally equivalent
regexp operators and FSA
• basic wildcards– . and *
• . any single character• e.g. p.t• put, pit, pat, pet
• * zero or more characters
x yd
abc
e
z etc.
...
...
y
a etc.
one loopfor eachcharacter
over thewholealphabet
regexp operators and FSA
• basic wildcards– +
• one or more of the preceding character
• e.g. a+
– [ ]
• range of characters• e.g. [aeiou]
x ya
a
x yo
aei
u
regexp operators and FSA
• basic wildcards– ?
• zero or one of the preceding character
• e.g. a?• Non-determinism
– any FSA with an empty transition is non-deterministic
– see example: could be in state x or y simultaneous
– any FSA with an empty transition can be rewritten without the empty transition
x ya
λ
λ denotes the empty transition
x ya>
regexp operators and FSA
• backreferences:– by state splitting– e.g. ([aeiou])\1
x yo
aei
uy z
o
aei
u
x
yu
o
a
e
i
u
z
ya
ye
yi
yo
o
a
e
u
Finite State Automata (FSA)
• more formally– (Q,s,f,Σ,)1. set of states (Q): {s,w,x,y,z} 4 statesmust be a finite set2. start state (s): s3. end state(s) (f): z
4. alphabet (Σ): {a, b, !}5. transition function :
signature: character × state → state1. (b,s)=w2. (a,w)=x3. (a,x)=y4. (a,y)=y5. (!,y)=z
s wb xa
a
>
y
a
z!
FSA
• Finite State Automata (FSA) have a limited amount of expressive power
• Let’s look at a modification to FSA and its effect on its power
String Transitions
– so far...
• all machines have had just a single character label on the arc
• so if we allow strings to label arcs– do they endow the FSA with any
more power?
b
• Answer: No– because we can always convert a
machine with string-transitions into one without
abb
a b b
Finite State Automata (FSA)
• equivalent
s
z
baa
!
y
a
>
5 state machine
s wb xa
a
>
y
a
z!
3 state machineusing stringtransitions