SIMS 290-2: Applied Natural Language Processing
Marti Hearst, Sept 22, 2004
Today
– Cascaded Chunking
– Example of Using Chunking: Word Associations
– Evaluating Chunking
– Going to the next level: Parsing
Cascaded Chunking
Goal: create chunks that include other chunks
Examples:
– PP consists of a preposition + NP
– VP consists of a verb followed by PPs or NPs
How to make it work in NLTK
– The tutorial is a bit confusing; I attempt to clarify
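A cascaded chunker along these lines can be written with NLTK's `RegexpParser`, which applies each grammar stage in order, so later stages can match the chunk labels (e.g. `NP`) produced by earlier stages. A minimal sketch; the tag patterns and the sample sentence are illustrative, not from the slides:

```python
import nltk

# Stages run in order, so PP can match NP chunks built by the first stage,
# and VP can match both. Stage labels must not collide with POS tag names.
grammar = r"""
  NP: {<DT>?<JJ>*<NN.*>+}      # determiner + adjectives + nouns
  PP: {<IN><NP>}               # preposition followed by an NP chunk
  VP: {<VB.*><NP|PP>*}         # verb followed by NP or PP chunks
"""
chunker = nltk.RegexpParser(grammar)

# A hand-tagged sentence token: a list of (word, POS) pairs
sentence = [("the", "DT"), ("cat", "NN"), ("sat", "VBD"),
            ("on", "IN"), ("the", "DT"), ("mat", "NN")]
tree = chunker.parse(sentence)
print(tree)
# Nested chunks: the PP contains an NP, and the VP contains the PP
```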
Creating Cascaded Chunkers
Start with a sentence token
– A list of words with parts of speech assigned
– Create a fresh one or use one from a corpus
Creating Cascaded Chunkers
Create a set of chunk parsers, one for each chunk type
Each one takes as input some list of tokens, and produces as output a NEW list of tokens
– You can decide what this new list is called. Examples: NP-CHUNK, PP-CHUNK, VP-CHUNK
– You can also decide what to name each occurrence of the chunk type, as it is assigned to a subset of tokens. Examples: NP, VP, PP
How to match higher-level tags?
– The parser just matches their string description
– So be certain that their names do not overlap with POS tags
Let’s do some text analysis
Let’s try this on more complex sentences
– First, read in part of a corpus
– Then, count how often each word occurs with each POS
– Determine some common verbs; choose one
– Make a list of sentences containing that verb
– Test out the chunker on them; examine further
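The steps above can be sketched in plain Python. This is a hedged illustration: a tiny hand-tagged corpus stands in for a real one, and the sentences and tags are made up:

```python
from collections import Counter

# A toy stand-in for a tagged corpus: sentences of (word, POS) pairs
corpus = [
    [("the", "DT"), ("doctor", "NN"), ("examined", "VBD"),
     ("the", "DT"), ("patient", "NN")],
    [("nurses", "NNS"), ("examined", "VBD"), ("charts", "NNS")],
    [("the", "DT"), ("patient", "NN"), ("slept", "VBD")],
]

# Count how often each word occurs with each POS
word_pos = Counter((w.lower(), t) for sent in corpus for w, t in sent)

# Find common verbs (any VB* tag) and pick the most frequent one
verbs = Counter()
for (w, t), n in word_pos.items():
    if t.startswith("VB"):
        verbs[w] += n
verb = verbs.most_common(1)[0][0]

# Collect the sentences containing that verb, ready for the chunker
hits = [s for s in corpus if any(w.lower() == verb for w, t in s)]
print(verb, len(hits))   # examined 2
```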
Why didn’t this parse work?
Corpus Analysis for Discovery of Word Associations
Classic paper by Church & Hanks showed how to use a corpus and a shallow parser to find interesting dependencies between words
– “Word Association Norms, Mutual Information, and Lexicography”, Computational Linguistics, 16(1), 1990
– http://www.research.att.com/~kwc/publications.html
Some cognitive evidence:
Word association norms: which word do people say most often after hearing another word
– Given doctor: nurse, sick, health, medicine, hospital…
People respond more quickly to a word if they’ve seen an associated word
– E.g., if you show “bread”, they’re faster at recognizing “butter” than “nurse” (vs. a nonsense string)
Corpus Analysis for Discovery of Word Associations
Idea: use a corpus to estimate word associations
Association ratio: log ( P(x,y) / ( P(x) P(y) ) )
– The probability of seeing x followed by y, vs. the probability of seeing x anywhere times the probability of seeing y anywhere
– P(x) is how often x appears in the corpus
– P(x,y) is how often y follows x within w words
Interesting associations with “doctor”:
– x: honorary, y: doctor
– x: doctors, y: dentists
– x: doctors, y: nurses
– x: doctors, y: treating
– x: examined, y: doctor
– x: doctors, y: treat
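The association ratio translates directly from these definitions. A toy implementation; Church & Hanks also threshold by frequency to avoid unreliable scores for rare words, which is omitted here:

```python
import math
from collections import Counter

def association_ratio(tokens, w=5):
    """log2( P(x,y) / (P(x) P(y)) ), where P(x,y) counts y following x within w words."""
    n = len(tokens)
    unigrams = Counter(tokens)
    pairs = Counter()
    for i, x in enumerate(tokens):
        for y in tokens[i + 1:i + 1 + w]:
            pairs[(x, y)] += 1
    return {(x, y): math.log2((c / n) / ((unigrams[x] / n) * (unigrams[y] / n)))
            for (x, y), c in pairs.items()}

# Pairs that co-occur more often than chance get positive scores
scores = association_ratio("the doctor examined the patient".split(), w=2)
```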
Corpus Analysis for Discovery of Word Associations
Now let’s make use of syntactic information
Look at which words and syntactic forms follow a given verb, to see what kinds of arguments it takes
Compute triples of subject-verb-object
Example: nouns that appear as the object of the verb “drink”:
– martinis, cup_water, champagne, beverage, cup_coffee, cognac, beer, cup, coffee, toast, alcohol…
– What can we note about many of these words?
Example: verbs that have “telephone” in their object:
– sit_by, disconnect, answer, hang_up, tap, pick_up, return, be_by, spot, repeat, place, receive, install, be_on
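A rough sketch of extracting such triples from a cascaded chunker's output. This is an illustration, not the Church & Hanks method: the grammar is a simplified NP/VP cascade, and the "subject" is just the NP preceding the VP:

```python
import nltk

grammar = r"""
  NP: {<DT>?<JJ>*<NN.*>+}
  VP: {<VB.*><NP>}
"""
chunker = nltk.RegexpParser(grammar)

def svo_triples(tree):
    """Collect (subject, verb, object) from a flat chunk tree: NP ... VP(verb NP)."""
    triples, subject = [], None
    for child in tree:
        if isinstance(child, nltk.Tree) and child.label() == "NP":
            subject = " ".join(w for w, t in child.leaves())
        elif isinstance(child, nltk.Tree) and child.label() == "VP":
            verb = child[0][0]  # first child of a VP chunk is the (word, tag) pair
            for obj in child:
                if isinstance(obj, nltk.Tree) and obj.label() == "NP":
                    triples.append((subject, verb,
                                    " ".join(w for w, t in obj.leaves())))
    return triples

sent = [("the", "DT"), ("doctor", "NN"), ("examined", "VBD"),
        ("the", "DT"), ("patient", "NN")]
print(svo_triples(chunker.parse(sent)))
```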
Corpus Analysis for Discovery of Word Associations
The approach has become standard
Entire collections available
Dekang Lin’s Dependency Database
– Given a word, retrieve words that had a dependency relationship with the input word
Dependency-based Word Similarity
– Given a word, retrieve the words that are most similar to it, based on dependencies
http://www.cs.ualberta.ca/~lindek/demos.htm
Example Dependency Database: “sell”
Example Dependency-based Similarity: “sell”
Homework Assignment
Choose a verb of interest
Analyze the context in which the verb appears
– Can use any corpus you like
– Can train a tagger and run it on some fresh text
Example: What kinds of arguments does it take?
Improve on my chunking rules to get better characterizations
Evaluating the Chunker
Why not just use accuracy?
– Accuracy = #correct / total number
Definitions
– Total: number of chunks in the gold standard
– Guessed: set of chunks that were labeled
– Correct: of the guessed, which were correct
– Missed: how many correct chunks were not guessed?
– Precision: #correct / #guessed
– Recall: #correct / #total
– F-measure: 2 * (Prec * Recall) / (Prec + Recall)
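These definitions translate directly into code. A minimal sketch, assuming chunks are represented as (start, end) spans so that "correct" means an exact span match:

```python
def chunk_scores(gold, guessed):
    """Precision, recall, and F-measure over chunk spans."""
    gold, guessed = set(gold), set(guessed)
    correct = gold & guessed          # guessed chunks that exactly match gold
    precision = len(correct) / len(guessed)
    recall = len(correct) / len(gold)
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f

# Gold has 2 chunks; we guessed 3, of which 1 is exactly right
p, r, f = chunk_scores({(0, 2), (3, 5)}, {(0, 2), (3, 4), (6, 7)})
print(round(p, 2), round(r, 2), round(f, 2))   # 0.33 0.5 0.4
```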
Example
Assume the following numbers
– Total: 100
– Guessed: 120
– Correct: 80
– Missed: 20
– Precision: 80 / 120 = 0.67
– Recall: 80 / 100 = 0.80
– F-measure: 2 * (.67 * .80) / (.67 + .80) = 0.73
Evaluating in NLTK
We have some already chunked text from the Treebank
The code uses the existing parse to compare against, and generates tokens of type word/tag to parse with our own chunker
Have to add location information so the evaluation code can compare which words have been assigned which labels
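The `Token`-and-location machinery described here belongs to the old NLTK of 2004; in current NLTK the comparison against gold chunk trees is built in. A hedged sketch, with a hand-built gold tree standing in for a Treebank sentence:

```python
import nltk

chunker = nltk.RegexpParser(r"NP: {<DT>?<JJ>*<NN.*>+}")

# A hand-built gold chunking (normally read from an already-chunked corpus)
gold = nltk.Tree("S", [
    nltk.Tree("NP", [("the", "DT"), ("cat", "NN")]),
    ("sat", "VBD"), ("on", "IN"),
    nltk.Tree("NP", [("the", "DT"), ("mat", "NN")]),
])

# evaluate() re-parses the gold sentences and scores the chunks against them
score = chunker.evaluate([gold])
print(score.precision(), score.recall(), score.f_measure())
```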
How to get better accuracy?
Use a full syntactic parser
– These days the probabilistic ones work surprisingly well
– They are getting faster too
Prof. Dan Klein’s is very good and easy to run
– http://nlp.stanford.edu/downloads/lex-parser.shtml
Next Week
Shallow Parsing Assignment
– Due on Wed Sept 29
Next week:
– Read paper on end-of-sentence disambiguation
– Preslav and Barbara lecturing on categorization
– We will read the categorization tutorial the following week