final projects

10
1 Final Projects Please make an appointment to come talk to me (or office hours) What additional things should you add to your project? Are you on the right track with your method? Any time next week Next week: Invited speakers from Google: Wisam Dakka and Kevin Lerman

Upload: castor-santiago

Post on 01-Jan-2016

18 views

Category:

Documents


0 download

DESCRIPTION

Final Projects. Please make an appointment to come talk to me (or office hours) What additional things should you add to your project? Are you on the right track with your method? Any time next week Next week: Invited speakers from Google: Wisam Dakka and Kevin Lerman - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Final Projects

1

Final Projects

Please make an appointment to come talk to me (or office hours)

What additional things should you add to your project?

Are you on the right track with your method? Any time next week

Next week: Invited speakers from Google: Wisam Dakka and Kevin Lerman Stay tuned for room!

Page 2: Final Projects

2

Web-based Models for NLP

Can web-based counts be used to develop good bigram models?

Verified for same task on web vs. corpus British National Corpus (BNC)

Unsupervised ngram models

Many tasks. Let’s take a sample: Candidate target word selection for MT Compound noun bracketing PP attachment

Page 3: Final Projects

3

Obtaining web counts The number of hits for queries

generated for this n-gram Literal queries: “airplanes flew” Inflected queries: “airplane flew” “airplanes

fly” “airplane flies”…. Etc.

Google and Altavista Noise

False positives when punctuation ignored Matches may include links, filenames, etc.

Page 4: Final Projects

4

Combining Web and Corpus Counts Backoff model: if ngram count falls

below threshold, use web count

Interpolation model: count is combination of web and corpus:

Threshold and λ set through tuning on development corpus

Page 5: Final Projects

5

Candidate selection for MT A.Die Geschichte andert sich, nicht jedoch

die Geographie. b. {History, story, tale, saga, strip} changes

but geography does not. Translate verb object pairs where translation

of verb is known, select the most likely noun that goes with it

Web counts obtained for all verb/object translations and selected using Collocation frequency: f(v,n) Conditional probability: f(v,n)/f(n)

Page 6: Final Projects

6

Results

Page 7: Final Projects

7

Bracketing of Compound Nouns [backup compiler] disk vs. backup

[compiler disk]

Lauer: sophisticated dependency model using Thesauris: 77.5

Page 8: Final Projects

8

Interpretation of compound nouns Pet spray: a spray for pets Onion tears: tears caused by onions Lauer: avoids hand labeling with semantic

tags by using prepositions: War story: must find “story about the war” in

corpus over “story for the war” Uses 8 prepositions Model: Argmax(p) P(p|n1,n2) To avoid

sparse data:

Page 9: Final Projects

9

Page 10: Final Projects

10