ling/c sc/psyc 438/538 computational linguistics sandiway fong lecture 10: 9/27

32
LING/C SC/PSYC 438/538 Computational Linguistics Sandiway Fong Lecture 10: 9/27

Post on 15-Jan-2016

216 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: LING/C SC/PSYC 438/538 Computational Linguistics Sandiway Fong Lecture 10: 9/27

LING/C SC/PSYC 438/538Computational Linguistics

Sandiway Fong

Lecture 10: 9/27

Page 2: LING/C SC/PSYC 438/538 Computational Linguistics Sandiway Fong Lecture 10: 9/27

Today’s Topics

• Homework 2 review

• Recap: DCG system

• New topic: Finite State Automata (FSA)

Page 3: LING/C SC/PSYC 438/538 Computational Linguistics Sandiway Fong Lecture 10: 9/27

Homework 2 Review

• Question 1– Consider a language L3or5

= {111, 11111, 111111, 111111111, 1111111111, 111111111111, 111111111111111,...}

– each member of L3or5 is a string containing only 1s

– the number of 1s in each string is divisible by 3 or 5

• Give a regular grammar that generates language L3or5

Page 4: LING/C SC/PSYC 438/538 Computational Linguistics Sandiway Fong Lecture 10: 9/27

Homework 2 Review

• Regular grammar:– divisible by 3 side

1. s --> [1], a.

2. a --> [1], b.

3. b --> [1].

4. b --> [1], c.

5. c --> [1], d.

6. d --> [1], b.

• Regular grammar:– divisible by 5 side7. s --> [1], one.8. one --> [1], two.9. two --> [1], three.10.three --> [1], four.11.four --> [1].12.four --> [1], five.13.five --> [1], six.14.six --> [1], seven.15.seven --> [1], eight.16.eight --> [1], four.

take the disjunction of the two sub-grammars, i.e. merge them

Page 5: LING/C SC/PSYC 438/538 Computational Linguistics Sandiway Fong Lecture 10: 9/27

Homework 2 Review

• Question 2: Language L = {a2nbn+1 | n ≥1} is also non-regular

• but can be generated by a regular grammar extended to allow left and right recursive rules

• Given a Prolog grammar satisfying these rules for L

• Legit rules:X -> aYX -> aX -> Ya– where X, Y are non-

terminals, a is some arbitirary terminal

Page 6: LING/C SC/PSYC 438/538 Computational Linguistics Sandiway Fong Lecture 10: 9/27

Homework 2 Review

• L = {a2nbn+1 | n≥ 1}

• DCG:1. s --> [a], b.

2. b --> [a], c.

3. c --> [b], d.

4. c --> s, [b].

5. d --> [b].

generated byrules 1 + 2 + 3 + 5

generated byrules 1 + 2 + 4(center-embeddedrecursive tree)

d

b

b

b

s

c

a

a

bs

b

s

c

a

a

Page 7: LING/C SC/PSYC 438/538 Computational Linguistics Sandiway Fong Lecture 10: 9/27

Homework 2 Review

• Question 3 (Optional 438)A right recursive regular grammar that generates a (rightmost) parse for the language {an| n>=2}:s(s(a,B)) --> [a], b(B).b(b(a,B)) --> [a], b(B).b(b(a))--> [a].

• Example?- s(Tree,[a,a,a],[]).Tree = s(a,b(a,b(a)))?

• A corresponding left recursive regular grammar will not halt in all cases when an input string is supplied

• Modify the right recursive grammar to produce a left recursive parse, e.g.?- s(Tree,[a,a,a],[]).

Tree = s(b(b(a),a),a)

• Can you do this for any right recursive regular grammar? e.g. the one for sheeptalk

Page 8: LING/C SC/PSYC 438/538 Computational Linguistics Sandiway Fong Lecture 10: 9/27

Homework 2 Review

• Original:s(s(a,B)) --> [a], b(B).b(b(a,B)) --> [a], b(B).b(b(a))--> [a].

• New:1. s(s(B,a)) --> [a], b(B).2. b(b(B,a)) --> [a], b(B).3. b(b(a))--> [a].

taking advantage of the fact that we know we’re generating only a’sexample: in rule 1, we match a terminal a on the left sidebut place an a at the end for the tree

Page 9: LING/C SC/PSYC 438/538 Computational Linguistics Sandiway Fong Lecture 10: 9/27

Extra Arguments: Recap

• some uses for extra arguments on non-terminals in the grammar:

1. to generate a parse tree • recording the derivation history

2. feature agreement

3. implement counting

Page 10: LING/C SC/PSYC 438/538 Computational Linguistics Sandiway Fong Lecture 10: 9/27

Extra Arguments: Recap

Implement Determiner-Noun Number Agreement

• Data:– the man/men– a man/*a men

• Modified grammar:np(np(D,N)) --> det(D,Number),

common_noun(N,Number).det(det(the),_) --> [the].det(det(a),sg) --> [a].common_noun(n(ball),sg) --> [ball].common_noun(n(man),sg) --> [man].common_noun(n(men),pl) --> [men].

np(np(D,N)) --> detsg(D), common_nounsg(N).np(np(D,N)) --> detpl(D), common_nounpl(N).detsg(det(a)) --> [a].detsg(det(the)) --> [the].detpl(det(the)) --> [the].common_nounsg(n(man)) -->[man]. common_nounpl(n(men)) --> [men].

syntactic sugarcould just encodefeature values in

nonterminal name

Page 11: LING/C SC/PSYC 438/538 Computational Linguistics Sandiway Fong Lecture 10: 9/27

Extra Arguments: Recap

• Class Exercise:– Implement Case = {nom,acc} agreement system for

the grammar– Examples: the man hit me vs. *the man hit I

• Modified grammar:s(Y,Z)) --> np(Y,Case), vp(Z), { Case = nom }.np(np(Y),Case) --> pronoun(Y,Case).pronoun(i,nom) --> [i].pronoun(we,nom) --> [we]. pronoun(me,acc) --> [me]. np(np(D,N),_) --> det(D,Num), common_noun(N,Num).vp(vp(Y,Z)) --> transitive(Y), np(Z,Case), { Case = acc }.

{ ... Prolog code ... }

Page 12: LING/C SC/PSYC 438/538 Computational Linguistics Sandiway Fong Lecture 10: 9/27

Extra Arguments: Recap

• Class Exercise:– Implement Case = {nom,acc} agreement system for

the grammar– Examples: the man hit me vs. *the man hit I

• Modified grammar:s(Y,Z)) --> np(Y,nom), vp(Z).np(np(Y),Case) --> pronoun(Y,Case).pronoun(i,nom) --> [i].pronoun(we,nom) --> [we]. pronoun(me,acc) --> [me]. np(np(D,N),_) --> det(D,Num), common_noun(N,Num).vp(vp(Y,Z)) --> transitive(Y), np(Z,acc).

without{ ... Prolog code ... }

Page 13: LING/C SC/PSYC 438/538 Computational Linguistics Sandiway Fong Lecture 10: 9/27

Extra Arguments: Recap

1. s --> [a],b.2. b --> [a],b.3. b --> [b],c.4. b --> [b].5. c --> [b],c.6. c --> [b].

L = a+b+

a regular grammar

Lab = { anbn | n ≥ 1 }regular grammar + extra argument

1. s(X) --> [a],b(a(X)).2. b(X) --> [a],b(a(X)).3. b(a(X)) --> [b],c(X).4. b(a(0)) --> [b].5. c(a(X)) --> [b],c(X).6. c(a(0)) --> [b].

Query: ?- s(0,L,[]).

Page 14: LING/C SC/PSYC 438/538 Computational Linguistics Sandiway Fong Lecture 10: 9/27

Extra Arguments: Recap

1. s(X) --> [a],b(a(X)).2. b(X) --> [a],b(a(X)).3. b(a(X)) --> [b],c(X).4. b(a(0)) --> [b].5. c(a(X)) --> [b],c(X).6. c(a(0)) --> [b].

• derivation tree:

s(0) rule 1

logic:?- s(0,L,[]).

start with 0 wrap an a(_) for each “a”unwrap a(_) for each “b”

match #a’s with #b’s = counting

a b(a(0)) rule 2

a b(a(a(0))) rule 3

b

b c(a(0)) rule 6

Page 15: LING/C SC/PSYC 438/538 Computational Linguistics Sandiway Fong Lecture 10: 9/27

So far

• Equivalent formalisms

regexpregular

grammars

FSA

Page 16: LING/C SC/PSYC 438/538 Computational Linguistics Sandiway Fong Lecture 10: 9/27

FSA

• example (sheeptalk) – baa!– baaa! …

• regexp– baa+!

w x

z

a

!

ya

a

> s b

points to start state end state

marked in red

basic idea: “just follow the arrows”

state

state transition:in state s,if current input symbol is bgo to state w

accepting computation:in a final state,if there is no moreinput, accept string

Page 17: LING/C SC/PSYC 438/538 Computational Linguistics Sandiway Fong Lecture 10: 9/27

FSA: Construction

• step-by-step• regexp

– baa+!=

– baaa*!s>

Page 18: LING/C SC/PSYC 438/538 Computational Linguistics Sandiway Fong Lecture 10: 9/27

FSA: Construction

• step-by-step• regexp

– baaa*!

– b

– from state s, – see a ‘b’, – move to state w

s wb>

Page 19: LING/C SC/PSYC 438/538 Computational Linguistics Sandiway Fong Lecture 10: 9/27

FSA: Construction

• step-by-step• regexp

– baaa*!

– ba

– from state w, – see an ‘a’, – move to state x

s wb xa>

Page 20: LING/C SC/PSYC 438/538 Computational Linguistics Sandiway Fong Lecture 10: 9/27

FSA: Construction

• step-by-step• regexp

– baaa*!

– baa

– from state x, – see an ‘a’, – move to state y

s wb xa>

y

a

Page 21: LING/C SC/PSYC 438/538 Computational Linguistics Sandiway Fong Lecture 10: 9/27

FSA: Construction

• step-by-step• regexp

– baaa*!

– baaa*– baa– baaa– baaaa...

– from y,– see an ‘a’, – move to ?

s wb xa>

y

y’

a

a

a...

but machine musthave a finite numberof states!

Page 22: LING/C SC/PSYC 438/538 Computational Linguistics Sandiway Fong Lecture 10: 9/27

Regular Expressions: Example

• step-by-step• regexp

– baaa*!

– baa*– baa– baaa– baaaa...

– from state y,– see an ‘a’, – “loop”, i.e. move back to,

state y

s wb xa

a

>

y

a

Page 23: LING/C SC/PSYC 438/538 Computational Linguistics Sandiway Fong Lecture 10: 9/27

Regular Expressions: Example

• step-by-step

• regexp– baaa*!

– baaa*!

– from state y,– see an ‘!’, – move to final state z

(indicated in red)

Note: machine cannot finish (i.e. reach the end of the input string) in states s, w, x or y

s wb xa

a

>

y

a

z!

Page 24: LING/C SC/PSYC 438/538 Computational Linguistics Sandiway Fong Lecture 10: 9/27

Finite State Automata (FSA)

• construction– the step-by-step FSA construction method we just

used – works for any regular expression

• conclusion– anything we can encode with a regular expression,

we can build a FSA for it

– an important step in showing that FSA and regexps are formally equivalent

Page 25: LING/C SC/PSYC 438/538 Computational Linguistics Sandiway Fong Lecture 10: 9/27

regexp operators and FSA

• basic wildcards– . and *

• . any single character• e.g. p.t• put, pit, pat, pet

• * zero or more characters

x yd

abc

e

z etc.

...

...

y

a etc.

one loopfor eachcharacter

over thewholealphabet

Page 26: LING/C SC/PSYC 438/538 Computational Linguistics Sandiway Fong Lecture 10: 9/27

regexp operators and FSA

• basic wildcards– +

• one or more of the preceding character

• e.g. a+

– [ ]

• range of characters• e.g. [aeiou]

x ya

a

x yo

aei

u

Page 27: LING/C SC/PSYC 438/538 Computational Linguistics Sandiway Fong Lecture 10: 9/27

regexp operators and FSA

• basic wildcards– ?

• zero or one of the preceding character

• e.g. a?• Non-determinism

– any FSA with an empty transition is non-deterministic

– see example: could be in state x or y simultaneous

– any FSA with an empty transition can be rewritten without the empty transition

x ya

λ

λ denotes the empty transition

x ya>

Page 28: LING/C SC/PSYC 438/538 Computational Linguistics Sandiway Fong Lecture 10: 9/27

regexp operators and FSA

• backreferences:– by state splitting– e.g. ([aeiou])\1

x yo

aei

uy z

o

aei

u

x

yu

o

a

e

i

u

z

ya

ye

yi

yo

o

a

e

u

Page 29: LING/C SC/PSYC 438/538 Computational Linguistics Sandiway Fong Lecture 10: 9/27

Finite State Automata (FSA)

• more formally– (Q,s,f,Σ,)1. set of states (Q): {s,w,x,y,z} 4 statesmust be a finite set2. start state (s): s3. end state(s) (f): z

4. alphabet (Σ): {a, b, !}5. transition function :

signature: character × state → state1. (b,s)=w2. (a,w)=x3. (a,x)=y4. (a,y)=y5. (!,y)=z

s wb xa

a

>

y

a

z!

Page 30: LING/C SC/PSYC 438/538 Computational Linguistics Sandiway Fong Lecture 10: 9/27

FSA

• Finite State Automata (FSA) have a limited amount of expressive power

• Let’s look at a modification to FSA and its effect on its power

Page 31: LING/C SC/PSYC 438/538 Computational Linguistics Sandiway Fong Lecture 10: 9/27

String Transitions

– so far...

• all machines have had just a single character label on the arc

• so if we allow strings to label arcs– do they endow the FSA with any

more power?

b

• Answer: No– because we can always convert a

machine with string-transitions into one without

abb

a b b

Page 32: LING/C SC/PSYC 438/538 Computational Linguistics Sandiway Fong Lecture 10: 9/27

Finite State Automata (FSA)

• equivalent

s

z

baa

!

y

a

>

5 state machine

s wb xa

a

>

y

a

z!

3 state machineusing stringtransitions