Classifying Unknown Proper Noun Phrases Without Context

DESCRIPTION
Classifying Unknown Proper Noun Phrases Without Context. Joseph Smarr & Christopher D. Manning, Symbolic Systems Program, Stanford University, April 5, 2002.

TRANSCRIPT
Classifying Unknown Proper Noun Phrases Without Context
Joseph Smarr & Christopher D. Manning
Symbolic Systems Program
Stanford University
April 5, 2002
The Problem of Unknown Words
• No statistics are available for unknown words, which is problematic for statistical NLP
• Same problem for Proper Noun Phrases
– Also need to bracket the entire PNP
• Particularly acute in domains with a large number of terms or new words being constantly generated
– Drug names
– Company names
– Movie titles
– Place names
– People's names
Proper Noun Phrase Classification
• Task: Given a Proper Noun Phrase (one or more words that collectively refer to an entity), assign it a semantic class (e.g. drug name, company name, etc.)
• Example: MUC ENAMEX test (classifying PNPs in text as organizations, places, and people)
• Problem: How do we classify unknown PNPs?
Existing Techniques for PNP Classification
• Large, manually constructed lists of names
– Includes common words (Inc., Dr., etc.)
• Syntactic patterns in surrounding context
– "… XXXX himself …" → person
– "… [profession] of/at/with XXXX" → organization
• Machine learning with word-level features
– Capitalization, punctuation, special chars, etc.
Limitations of Existing Techniques
• Manually constructed lists and rules
– Slow/expensive to create and maintain
• Domain-specific solutions
– Won't generalize to new categories
• Misses a valuable source of information
– People often classify PNPs by how they look
Cotrimoxazole
Wethersfield
Alien Fury: Countdown to Invasion
What’s in a Name?
• Claim: If people can classify unknown PNPs without context, they must be using the composition of the PNP itself
– Common accompanying words
– Common letters and letter sequences
– Number and length of words in the PNP
• Idea: Build a statistical generative model that captures these features from data
Common Words and Letter Sequences
[Figure: examples of common letter sequences ("oxa", "field") and common words ("John", "Inc")]
Number and Length of Words
[Chart: proportion of words in category by word length (0–30 chars), for drug, company, movie, place]
[Chart: count of PNPs by number of words in the PNP (1–12), for drug, company, movie, place, person]
Generative Model Used for Classification
• Probabilistic generative model for each category
• Parameters set from – statistics in training data– cross-validation on held-out data (20%)
• Standard Bayesian Classification
Predicted-Category(pnp) = argmax_c P(c|pnp) = argmax_c P(c) · P(pnp|c)
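The decision rule above can be sketched in a few lines. `category_models` and the toy likelihood functions below are hypothetical stand-ins for the fitted generative models; scoring is done in log space to avoid underflow:

```python
import math

def classify(pnp, category_models):
    """Pick the category maximizing P(c) * P(pnp|c), in log space.

    `category_models` maps each category name to a (log_prior,
    log_likelihood_fn) pair; both are assumed fitted elsewhere.
    """
    best_cat, best_score = None, -math.inf
    for cat, (log_prior, log_likelihood) in category_models.items():
        score = log_prior + log_likelihood(pnp)
        if score > best_score:
            best_cat, best_score = cat, score
    return best_cat

# Toy usage: stand-in likelihoods that key off a single common word
models = {
    "company": (math.log(0.5), lambda p: 0.0 if "Inc." in p else -5.0),
    "person":  (math.log(0.5), lambda p: -5.0 if "Inc." in p else 0.0),
}
```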
Generative Model for Each Category
Length n-gram model and word model:

P(pnp|c) = P_ngram(word-lengths(pnp)) · ∏_{w_i ∈ pnp} P(w_i | word-length(w_i))

Word model: mixture of a character n-gram model and a common-word model:

P(w_i | len) = λ_len · P_ngram(w_i | len)^(k/len) + (1 − λ_len) · P_word(w_i | len)

N-gram models: deleted interpolation:

P_0gram(s | h) = uniform distribution
P_ngram(s | h) = λ_C(h) · P_empirical(s | h) + (1 − λ_C(h)) · P_(n−1)gram(s | h)
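The deleted-interpolation recursion can be sketched as below. This is a simplification: the interpolation weight λ is a fixed constant here rather than a function of the history count C(h), and the alphabet size is an assumed 27 symbols:

```python
from collections import defaultdict

class InterpolatedNgram:
    """Character n-gram interpolated recursively down to a uniform 0-gram."""

    def __init__(self, order, alphabet_size=27, lam=0.8):
        self.order = order
        self.alphabet_size = alphabet_size
        self.lam = lam  # fixed weight; the slides condition it on C(h)
        self.counts = [defaultdict(int) for _ in range(order + 1)]
        self.context = [defaultdict(int) for _ in range(order + 1)]

    def train(self, text):
        # Pad so the first real character has a full history.
        padded = "_" * (self.order - 1) + text
        for i in range(self.order - 1, len(padded)):
            for n in range(1, self.order + 1):
                h, s = padded[i - n + 1:i], padded[i]
                self.counts[n][(h, s)] += 1
                self.context[n][h] += 1

    def prob(self, s, h, n=None):
        """P_ngram(s|h) = lam * P_empirical(s|h) + (1-lam) * P_(n-1)gram(s|h)."""
        if n is None:
            n = self.order
        if n == 0:
            return 1.0 / self.alphabet_size  # uniform 0-gram
        h = h[len(h) - (n - 1):] if n > 1 else ""
        c_h = self.context[n][h]
        emp = self.counts[n][(h, s)] / c_h if c_h else 0.0
        return self.lam * emp + (1 - self.lam) * self.prob(s, h, n - 1)
```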
Walkthrough Example: Alec Baldwin
• Length sequence: [0, 0, 0, 4, 7, 0]
• Words: "____Alec ", "lec Baldwin$"
[Chart: cumulative log probability of "Alec Baldwin" under each category model (drug, nyse, movie, place, person), accumulating the prior, the length terms P[4|0,0,0], P[7|0,0,4], P[0|0,4,7], and the word terms P["____Alec "] and P["lec Baldwin$"]]
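The feature extraction for this walkthrough can be sketched as follows; the padding conventions (three leading zero lengths plus a trailing zero for the length model, '_' left-padding and a '$' end marker at the character level) are inferred from the example on the slide:

```python
def length_sequence(words, pad=3):
    """E.g. ["Alec", "Baldwin"] -> [0, 0, 0, 4, 7, 0]."""
    return [0] * pad + [len(w) for w in words] + [0]

def padded_words(words, context=4):
    """Each word with `context` chars of left context and its boundary.

    The left context comes from the preceding text, padded with '_' at
    the start of the phrase; the final word is closed with '$'.
    """
    text = " ".join(words) + "$"
    out, pos = [], 0
    for w in words:
        left = ("_" * context + text)[pos:pos + context]
        end = pos + len(w) + 1  # include trailing space or '$'
        out.append(left + text[pos:end])
        pos = end
    return out
```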
Walkthrough Example: Baldwin
Note: Baldwin appears both in a person’s name and in a place name
[Chart: per-character probabilities for "Baldwin" under each category model (drug, nyse, movie, place, person): P[B|Alec ], P[a|lec B], P[l|ec Ba], P[d|c Bal], P[w| Bald], P[i|Baldw], P[n|aldwi], P[$|ldwin], plus the length-normalized mixed char-word probability; log scale from 1E-13 to 1]
Experimental Setup
• Five categories of Proper Noun Phrases
– Drugs, companies, movies, places, people
• Train on 90% of data, test on 10%
– 20% of training data held out for parameter setting (cross-validation)
– ~5000 examples per category total
• Each result presented is the average/stdev of 10 separate train/test folds
• Three types of tests
– pairwise: 1 category vs. 1 category
– 1-all: 1 category vs. union of all other categories
– n-way: every category for itself
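The split described above can be sketched as a simple shuffled split; `seed` varies per fold, and the exact fold construction in the original experiments is not specified beyond the percentages:

```python
import random

def make_fold(examples, seed, test_frac=0.10, heldout_frac=0.20):
    """One fold: 10% test, then 20% of the remaining 90% held out
    for setting interpolation weights; the rest is training data."""
    rng = random.Random(seed)
    data = list(examples)
    rng.shuffle(data)
    n_test = int(len(data) * test_frac)
    test, rest = data[:n_test], data[n_test:]
    n_held = int(len(rest) * heldout_frac)
    heldout, train = rest[:n_held], rest[n_held:]
    return train, heldout, test

# Ten folds as in the experiments, averaging accuracy over the runs:
# folds = [make_fold(examples, seed=s) for s in range(10)]
```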
Experimental Results: Classification Accuracy
[Chart: classification accuracy on an 82%–100% scale for each test, grouped as pairwise, 1-all, and n-way. Tests: drug-nyse, nyse-drug_movie_place_person, nyse-place, nyse-person, drug-person, nyse-movie, drug-nyse_movie_place_person, drug-movie, person-drug_nyse_movie_place, drug-place, nyse-place-person, place-person, drug-nyse-place-person, movie-person, place-drug_nyse_movie_person, movie-drug_nyse_place_person, movie-place, drug-nyse-movie-place-person. Accuracies shown: 98.93%, 98.70%, 98.64%, 98.41%, 98.16%, 97.76%, 96.81%, 95.77%, 95.47%, 95.24%, 94.34%, 92.70%, 91.86%, 90.90%, 89.94%, 88.11%, 93.25%, 94.57%]
Experimental Results: Confusion Matrix

[Confusion matrix: correct category (drug, nyse, movie, place, person) vs. predicted category (same five)]
Sources of Incorrect Classification
• Words that appear in one category drive classification in other categories
– e.g. Delaware misclassified as a company because of GTE Delaware LP, etc.
• Inherent ambiguity
– e.g. movies named after people/places/etc.:
● Nuremberg
● John Henry
● Love, Inc.
● Prozac Nation
Examples of Misclassified PNPs
• Errors from misleading words
– Calcium Stanley
– Best Foods (24 movies with Best, 2 companies)
– Bloodhounds, Inc.
– Nebraska (movie: One Standing: Nebraska)
– Chris Rock (24 movies with Rock, no other people)
• Can you classify these PNPs?
– R & C
– Randall & Hopkirk
– Steeple Aston
– Nandanar
– Gerdau
Contribution of Model Features
• Character n-gram is the best single feature
• Word model is good, but subsumed by the character n-gram
• Length n-gram helps the character n-gram, but not much
[Chart: classification accuracy by feature set]
length n-gram only: 62.35%
word model only: 72.18%
char n-gram only: 89.66%
length+word: 62.35%
char+word: 89.59%
char+length: 92.09%
full model: 91.94%
Effect of Increasing N-Gram Length
• Classification accuracy of n-gram models alone
• Longer n-grams are useful, but only to a point
[Charts: classification accuracy vs. n-gram order, for the character n-gram model (n = 1–7, accuracy axis 50%–100%) and the length n-gram model (n = 1–5, accuracy axis 25%–75%)]
Effect of Increasing Training Data
• Classifier approaches full potential with little training data
• Increasing training data even more is unlikely to help much
[Chart: classification accuracy (0%–100%) vs. number of training examples (0–20,000)]
Compensating for Word-Length Bias
• Problem: The character n-gram model places more emphasis on longer words, because more terms get multiplied together
– But are longer words really more important?
• Solution: Raise each word's probability to the power k/length
– Treat each word like a single base with an ignored exponent
• Observation: Performance is best when k > 1
– Deviation from theoretical expectation
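The normalization can be sketched directly from the formula P_ngram(w|len)^(k/len); the function name here is illustrative:

```python
def length_normalized(char_prob, word_len, k=2.0):
    """Rescale a word's character-model probability to a common basis,
    p ** (k / len), so long words are not penalized merely for having
    more multiplied terms; k > 1 is reported to work best."""
    return char_prob ** (k / word_len)
```

With k = 2, a 12-character word with probability 1e-12 and a 4-character word with probability 1e-4 both normalize to 1e-2, so neither dominates purely by length.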
Compensating for Word-Length Bias
Effect of Word-Length Normalization Parameter
[Chart: classification accuracy (91.0%–94.0%) vs. k in P(word)^(k/word-len), for k = 0–11]
Generative Models Can Also Generate!
• Step 1: Stochastically generate word-length sequence using length n-gram model
• Step 2: Generate each word using character n-gram model
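The two steps can be sketched as below, assuming fitted models are exposed as functions mapping a history to a {symbol: probability} dict; for simplicity this sketch does not carry character context across word boundaries:

```python
import random

def sample(dist, rng):
    """Draw a symbol from a {symbol: probability} dict."""
    r, acc = rng.random(), 0.0
    for symbol, p in dist.items():
        acc += p
        if r < acc:
            return symbol
    return symbol  # guard against floating-point rounding

def generate_pnp(length_dist, char_dist, rng=None, order=3):
    """Step 1: sample word lengths until the 0 end marker appears.
    Step 2: sample each word's characters from the character model."""
    rng = rng or random.Random()
    lengths, hist = [], (0,) * order
    while True:
        n = sample(length_dist(hist), rng)
        if n == 0:
            break
        lengths.append(n)
        hist = hist[1:] + (n,)
    words = []
    for n in lengths:
        w = ""
        for _ in range(n):
            w += sample(char_dist(w[-(order - 1):] if order > 1 else ""), rng)
        words.append(w)
    return " ".join(words)
```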
drug: Ambenylin; Carbosil DM 49; Esidrine Plus Base with Moisturalent
nyse: Downe Financial Grp PR; Host Manage U.S.B. Householding Ltd.; Intermedia Inc.
movie: Alien in Oz; Dragons: The Ever Harlane; El Tombre
place: Archfield; Lee-Newcastleridge; Qatad
person: Benedict W. Suthberg; Elias Lindbert Atkinson; Hugh Grob II
Acquiring Proficiency in New Domains
• Challenge: quickly build a high-accuracy PNP classifier for two novel categories
• Example: "Cheese or Disease?"
– Game show on MTV's Idiot Savants
• Results: 93.5% accuracy within 10 minutes of suggesting categories!
– Not possible with previous methods
Conclusions
• Reliable regularities in the way names are constructed
– Can be used to complement contextual cues (e.g. as a Bayesian prior)
– Not surprising given the conscious process of constructing names (e.g. Prozac)
• Statistical methods perform well without the need for domain-specific knowledge
– Allows for quick generalization to new domains
Bonus: Does Your Name Look Like A Name?
Ron Kaplan
Dan Klein
Miler Lee
Chris Manning / Christopher D. Manning
Bob Moore / Robert C. Moore
Emily Bender
Ivan Sag
Chung-chieh Shan
Stu Shieber / Stuart M. Shieber
Joseph Smarr
Mark Stevenson
Dominic Widdows