LING 438/538 Computational Linguistics
Sandiway Fong
Lecture 16: 10/19


Posted on 18-Dec-2015



Administrivia

• review homework #3

• new homework #4
  – out today
  – usual rules apply
  – due next Thursday

Last Time

• Spelling errors and correction
  – Error Correction
  – correct
  – Bayesian Probability

• Minimum Edit Distance Computation
  – Dynamic Programming

Minimum Edit Distance

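• example: intention → execution
  – assuming
    • insert = 1
    • delete = 1
    • substitution = 2 (or 0 for substituting the same character)
  – cost: 1+2+2+1+2 = 8

• recursive formula
  – incrementally computed from the minimum edit distances of shorter strings
  – e.g. the cell for intent/execut is computed from the three cells one edit operation away: intent/execu, inten/execut and inten/execu
  – with L, D and B the distances stored in those neighbouring cells: min(L+1, D+0, B+1)
  – L and B are reached by an insertion or deletion (cost 1); D is the diagonal cell inten/execu, reached by substituting t for t (cost 0, since both strings end in t)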
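The recursive formula can also be written out as a recurrence. The notation is an assumption (it is not on the slide): s is the source string, t the target, and d(i, j) is the minimum edit distance between the first i characters of s and the first j characters of t, with the costs assumed above.

```latex
d(i,0) = i, \qquad d(0,j) = j, \qquad
d(i,j) = \min
\begin{cases}
  d(i-1,j) + 1 & \text{(delete } s_i\text{)}\\
  d(i,j-1) + 1 & \text{(insert } t_j\text{)}\\
  d(i-1,j-1) + \mathrm{sub}(s_i,t_j) & \text{(substitute)}
\end{cases}
\qquad
\mathrm{sub}(a,b) =
\begin{cases}
  0 & a = b\\
  2 & a \neq b
\end{cases}
```

For s = intent and t = execut (i = j = 6), the three terms are the neighbouring cells intent/execu, inten/execut and inten/execu, and the substitution term costs 0 because s_6 = t_6 = t.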

Minimum Edit Distance Computation

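• one formula: Microsoft Excel implementation
  – $ in a cell reference means “don’t change when copied from cell to cell”
  – e.g. in C$1, the 1 stays the same; in $A3, the A stays the same (not the 3)

• the formula in cell C3 (C2 = above, B3 = left, B2 = diagonal):
  min(C2+1,B3+1,B2+if(C$1=$A3,0,2))
  – copied one column right (D3), the column letters increment:
    min(D2+1,C3+1,C2+if(D$1=$A3,0,2))
  – copied one row down (C4), the row numbers increment:
    min(C3+1,B4+1,B3+if(C$1=$A4,0,2))
  – row 1 (one string) is row-protected and column A (the other string) is column-protected, so each copy still compares the right pair of characters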
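The spreadsheet fills the table one row at a time, and each cell only needs the row above it. The same computation can be sketched in Prolog by keeping just the previous row. This is a minimal sketch assuming SWI-Prolog (numlist/3 and last/2 from library(lists)); the predicate names med/3, rows/5, next_row/5 and cells/6 are hypothetical, and the costs are the ones assumed on the earlier slide (insert 1, delete 1, substitution 2, 0 for the same character).

```prolog
:- use_module(library(lists)).   % numlist/3, last/2 (SWI-Prolog)

% med(+Source, +Target, -Distance): minimum edit distance between two atoms.
% Costs: insert 1, delete 1, substitution 2 (0 for the same character).
med(Source, Target, Distance) :-
    atom_chars(Source, S),
    atom_chars(Target, T),
    length(T, N),
    numlist(0, N, Row0),             % row for the empty prefix of Source
    rows(S, 1, T, Row0, LastRow),
    last(LastRow, Distance).         % bottom-right cell of the table

% rows(+SourceChars, +I, +T, +PrevRow, -FinalRow):
% build one row per source character, like filling the sheet row by row.
rows([], _, _, Row, Row).
rows([C|Cs], I, T, Prev, Final) :-
    next_row(T, C, Prev, I, Row),
    I1 is I + 1,
    rows(Cs, I1, T, Row, Final).

% next_row(+T, +C, +PrevRow, +I, -Row): the first cell of row I is I.
next_row(T, C, [P0|Ps], I, [I|Cells]) :-
    cells(T, C, P0, Ps, I, Cells).

% cells(+T, +C, +Diag, +Above, +Left, -Cells):
% each cell = min(above + 1, left + 1, diagonal + substitution cost),
% the same three terms as min(C2+1,B3+1,B2+if(...,0,2)) in the sheet.
cells([], _, _, [], _, []).
cells([X|Xs], C, Diag, [Up|Ups], Left, [V|Vs]) :-
    (   X == C -> Sub = 0 ; Sub = 2 ),
    V is min(Up + 1, min(Left + 1, Diag + Sub)),
    cells(Xs, C, Up, Ups, V, Vs).
```

For example, `?- med(intention, execution, D).` gives D = 8, and the demo pairs intention/intent, intention/intentional, intention/ten, intention/ton and intention/teen give 3, 2, 6, 6 and 7.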

Minimum Edit Distance Computation

• demo example pairs, with min edit distance (assuming substitution cost 2):
  – intention, intent: 3
  – intention, intentional: 2
  – intention, ten: 6
  – intention, ton: 6
  – intention, teen: 7

Homework 3 Review

Question 1

• 438/538 (4pts)

• Give the minimum size regular expression for the FSA below (2pts)

[Figure: FSA with states s, x and y]

• Minimum size regular expression for the FSA:
  – a+b*

• not minimum size in terms of number of symbols:
  – aa*b*
  – (aa*)|(aa*b*)

Question 1

• 438/538 (4pts)

• Give an equivalent FSA without the ε-transition (2pts)
  – answer in the form of a diagram, a formal definition or a Prolog definition: all are ok

• Equivalent ε-free FSA

[Figure: the original FSA (states s, x, y) and the equivalent ε-free FSA]

• How to arrive at this answer?
  – by inspection, or by considering a+b*, where b* = ε | b+ (so a+b* = a+ | a+b+)

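In the same "one predicate per state" style used for Question 2, an ε-free machine for a+b* can be sketched as follows. The state names s, x and y are hypothetical here and need not match the states in the figure: s is the start state, x means "one or more a's seen" and y means "a+ followed by one or more b's"; x and y are final.

```prolog
% Hypothetical epsilon-free FSA for a+b*, one predicate per state.
s([a|L]) :- x(L).        % start: must read at least one a

x([]).                   % final: a+ read so far
x([a|L]) :- x(L).        % more a's
x([b|L]) :- y(L).        % first b

y([]).                   % final: a+ b+ read so far
y([b|L]) :- y(L).        % more b's (no a allowed after a b)
```

For example, `?- s([a,a,b,b]).` succeeds, while `?- s([a,b,a]).` and `?- s([b]).` fail.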

Question 1

• 438/538 (4pts)

• Give an equivalent FSA without the ε-transition (2pts)
  – answer in the form of a diagram, a formal definition or a Prolog definition: all are ok

• Set-of-States Construction method:

[Figure: the original FSA (states s, x, y) and the set-of-states machine with states {s}, {x,y} and {y}, over the symbols a and b]
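The set-of-states construction can also be run "on the fly": start from the ε-closure of the start state, and on each input symbol move every state in the current set, then take the ε-closure again. A minimal sketch, assuming SWI-Prolog. The arc/3 facts are an assumed reading of the original figure, chosen to be consistent with a+b* and with the sets {s}, {x,y} and {y} on the slide; the predicate names are hypothetical.

```prolog
:- use_module(library(lists)).   % member/2, append/3

% Assumed reading of the original FSA (consistent with a+b*):
% s --a--> x, x --a--> x, x --eps--> y, y --b--> y; y is final.
start(s).
final(y).
arc(s, a, x).
arc(x, a, x).
arc(x, eps, y).
arc(y, b, y).

% closure(+Set, -Closure): add every state reachable by eps arcs.
% Set is a sorted list of states; iterate until nothing new is added.
closure(Set, Closure) :-
    findall(Q, (member(P, Set), arc(P, eps, Q)), New),
    append(Set, New, All),
    sort(All, Sorted),
    (   Sorted == Set
    ->  Closure = Set
    ;   closure(Sorted, Closure)
    ).

% step(+Set, +Symbol, -Next): set-of-states transition on Symbol
% (fails if no state in Set has an arc on Symbol).
step(Set, Sym, Next) :-
    findall(Q, (member(P, Set), arc(P, Sym, Q)), Qs),
    Qs \== [],
    sort(Qs, Sorted),
    closure(Sorted, Next).

% accepts(+Symbols): run the set-of-states machine over the input.
accepts(Symbols) :-
    start(S0),
    closure([S0], Set0),
    run(Symbols, Set0).

run([], Set) :-
    member(Q, Set), final(Q), !.   % accept if any state in the set is final
run([X|Xs], Set) :-
    step(Set, X, Next),
    run(Xs, Next).
```

Under this reading, `?- step([s], a, S).` gives S = [x,y] and `?- step([x,y], b, S).` gives S = [y], matching the sets {x,y} and {y} in the construction.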

Question 2

• 438/538 (8pts)

• convert the NDFSA into a deterministic FSA (3pts)
  – figure 2.27 in the textbook

• set-of-states construction:
  – {1} --a--> {2}
  – {2} --b--> {3,4}
  – {3,4} --a--> {2,3}
  – {2,3} --b--> {3,4}
  – {2,3} --a--> {2}
  – final states: {3,4} and {2,3}

Question 2

• 438/538 (8pts)

• implement both the NDFSA and the equivalent FSA in Prolog using the “one predicate per state” encoding

• Prolog code (NDFSA):

    one([a|L]) :- two(L).
    two([b|L]) :- three(L).
    two([b|L]) :- four(L).
    three([]).
    three([a|L]) :- two(L).
    four([a|L]) :- three(L).

• For the strings abab and abaaba, how many steps (transitions + final stop) does the machine take?

Question 2

• 438/538 (8pts)

• implement both the NDFSA and the equivalent FSA in Prolog using the “one predicate per state” encoding

• Prolog code (deterministic FSA):

    s1([a|L]) :- s2(L).
    s2([b|L]) :- s34(L).
    s34([]).
    s34([a|L]) :- s23(L).
    s23([]).
    s23([b|L]) :- s34(L).
    s23([a|L]) :- s2(L).

• For the strings abab and abaaba, how many steps (transitions + final stop) does the machine take?
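One way to sanity-check the conversion is to run both encodings on every string over {a, b} up to some length and look for a string that exactly one of them accepts. A minimal sketch, assuming SWI-Prolog; the two machines are copied from the slides, and string_of_length/2 and disagree/2 are hypothetical helper names. This is a bounded test, not a proof of equivalence.

```prolog
:- use_module(library(lists)).   % member/2

% NDFSA (one predicate per state), as on the slide
one([a|L]) :- two(L).
two([b|L]) :- three(L).
two([b|L]) :- four(L).
three([]).
three([a|L]) :- two(L).
four([a|L]) :- three(L).

% Deterministic FSA from the set-of-states construction, as on the slide
s1([a|L]) :- s2(L).
s2([b|L]) :- s34(L).
s34([]).
s34([a|L]) :- s23(L).
s23([]).
s23([b|L]) :- s34(L).
s23([a|L]) :- s2(L).

% string_of_length(+N, -S): enumerate every string over {a,b} of length N
string_of_length(0, []).
string_of_length(N, [X|Xs]) :-
    N > 0,
    member(X, [a, b]),
    N1 is N - 1,
    string_of_length(N1, Xs).

% disagree(+Max, -S): S (length =< Max) is accepted by exactly one machine
disagree(Max, S) :-
    between(0, Max, N),
    string_of_length(N, S),
    ( one(S) -> A = yes ; A = no ),
    ( s1(S)  -> B = yes ; B = no ),
    A \== B.
```

Here `?- disagree(8, S).` fails, i.e. the two machines agree on all 511 strings of length 8 or less; both accept abab and abaaba.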

Question 3

• 438/538 (8pts)

• (5pts) Give an FSA in Prolog that accepts a binary string (made up of 0’s and 1’s) if and only if it begins with a 1 and contains exactly one 0
  – examples:
    – 1111011
    – 10
    – *111011101 (* = rejected: it contains two 0’s)

• FSA:

[Figure: three-state FSA; 1 --1--> 2, 2 loops on 1, 2 --0--> 3, 3 loops on 1; 3 is final]
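The FSA above can be encoded in the same "one predicate per state" style as the Question 2 code. A minimal sketch, reading the figure as states 1, 2 and 3 with 3 final; the predicate names q1, q2 and q3 are hypothetical, and the input is assumed to be a list of the integers 0 and 1.

```prolog
% Hypothetical encoding: q1 --1--> q2, q2 --1--> q2, q2 --0--> q3,
% q3 --1--> q3; q3 is the only final state.
q1([1|L]) :- q2(L).      % must begin with a 1

q2([1|L]) :- q2(L).      % any number of further 1's
q2([0|L]) :- q3(L).      % the single 0

q3([]).                  % final
q3([1|L]) :- q3(L).      % only 1's after the 0
```

For example, `?- q1([1,1,1,1,0,1,1]).` and `?- q1([1,0]).` succeed, while `?- q1([1,1,1,0,1,1,1,0,1]).` fails because of the second 0.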

Question 3

• 438/538 (8pts)

• (5pts) Give an FSA in Prolog that accepts a binary string (made up of 0’s and 1’s) if and only if it begins with a 1 and contains exactly one 0

• (3pts) Give the regular expression equivalent of the FSA

• Regular Expression:
  – 11*01*

Homework #4

Question 1

• 438/538 (8pts)

• Implement the e-insertion rule
  – (Context-Sensitive) Spelling Rule (3.5): ε → e / {x,s,z}^ __ s#
  – as an FST in Prolog

• Goals:
  – pass through non-matching cases unchanged
  – implement the rule exactly
  – no deletion of boundaries ^ and #
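A minimal sketch of rule (3.5) as a one-predicate-per-state transducer over lists, assuming the intermediate form uses the atoms '^' and '#' for the boundaries, e.g. [f,o,x,'^',s,'#'] for fox^s#. The state names e0, e1 and e2 are hypothetical. As a shortcut, e2 looks ahead at two symbols (s and #) before deciding to insert e, which a strict one-symbol-per-arc FST would instead handle with extra states. It is meant to be called with the input bound and the output unbound.

```prolog
:- use_module(library(lists)).   % member/2

% e0(+In, -Out): default state; copies symbols through unchanged.
e0([], []).
e0([C|In], [C|Out]) :-
    member(C, [x, s, z]), !,     % saw x, s or z: move to e1
    e1(In, Out).
e0([C|In], [C|Out]) :-
    e0(In, Out).

% e1: the previous symbol was x, s or z.
e1(['^'|In], ['^'|Out]) :- !,    % morpheme boundary: move to e2
    e2(In, Out).
e1(In, Out) :-
    e0(In, Out).                 % anything else: continue in e0

% e2: just read {x,s,z}^ ; insert e only if the next symbols are s and #.
e2([s, '#'|In], [e, s, '#'|Out]) :- !,
    e0(In, Out).
e2(In, Out) :-
    e0(In, Out).                 % no match: pass through unchanged
```

For example, `?- e0([f,o,x,'^',s,'#'], O).` gives O = [f,o,x,'^',e,s,'#'], while [c,a,t,'^',s,'#'] and [f,o,x,'^',i,n,g,'#'] are copied unchanged, with both boundary symbols kept.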

Question 2

• 438/538 (6pts)

• What does the Porter Stemmer output for the following words:
  – (2pts) availability
  – (2pts) shipping
  – (2pts) unbelievable

• Show the steps (stages) in your answer

Question 2

• 438/538 (6pts)
  – the Porter Stemmer handles -ement for cases like
    • replacement → replac(e)
  – it doesn’t handle statement → stat(e)
    • i.e. it outputs statement
  – Why? Explain (2pts)
  – Modify the Porter rule responsible to allow for statement → stat(e)
    • Submit your rule (2pts)
    • Give 2 examples where the modified rule would be too liberal, i.e. where it overstems (2pts)
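Many Porter rules carry a condition on the measure m of the stem, where the stem is written as [C](VC)^m[V], with C a run of consonants and V a run of vowels. A minimal sketch of m in Prolog, assuming SWI-Prolog and one simplification: only a, e, i, o, u count as vowels, whereas Porter also treats y as a vowel when it follows a consonant. The predicate names measure/2, cv/2, collapse/2 and count_vc/2 are hypothetical.

```prolog
:- use_module(library(lists)).   % member/2
:- use_module(library(apply)).   % maplist/3

vowel(C) :- member(C, [a, e, i, o, u]).   % simplification: y is a consonant

% measure(+Stem, -M): Porter's m for an atom such as replac
measure(Stem, M) :-
    atom_chars(Stem, Cs),
    maplist(cv, Cs, Pattern),    % e.g. replac -> [c,v,c,c,v,c]
    collapse(Pattern, Runs),     % merge runs   -> [c,v,c,v,c]
    count_vc(Runs, M).           % count VC pairs

cv(C, v) :- vowel(C), !.
cv(_, c).

% collapse(+Pattern, -Runs): replace each run of equal symbols by one symbol
collapse([], []).
collapse([X], [X]) :- !.
collapse([X, X|T], R) :- !, collapse([X|T], R).
collapse([X|T], [X|R]) :- collapse(T, R).

% count_vc(+Runs, -M): number of v,c pairs in the alternating run list
count_vc([], 0).
count_vc([v, c|T], M) :- !, count_vc(T, M0), M is M0 + 1.
count_vc([_|T], M) :- count_vc(T, M).
```

For example, `?- measure(replac, M).` gives M = 2 and `?- measure(stat, M).` gives M = 1, which can help when reasoning about which stems a rule's (m > ...) condition admits.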

Summary

• Q1: 8pts

• Q2: 6+6=12pts

• Total: 20 pts