Classifying Unknown Proper Noun Phrases Without Context

Joseph Smarr & Christopher D. Manning, Symbolic Systems Program, Stanford University, April 5, 2002


Page 1: Classifying Unknown Proper Noun Phrases Without Context

Classifying Unknown Proper Noun Phrases Without Context

Joseph Smarr & Christopher D. Manning

Symbolic Systems Program

Stanford University

April 5, 2002

Page 2: Classifying Unknown Proper Noun Phrases Without Context

The Problem of Unknown Words

• No statistics are generated for unknown words, which is problematic for statistical NLP

• Same problem for Proper Noun Phrases (PNPs)
  – Also need to bracket the entire PNP

• Particularly acute in domains with a large number of terms, or where new words are constantly being generated
  – Drug names
  – Company names
  – Movie titles
  – Place names
  – People’s names

Page 3: Classifying Unknown Proper Noun Phrases Without Context

Proper Noun Phrase Classification

• Task: Given a Proper Noun Phrase (one or more words that collectively refer to an entity), assign it a semantic class (e.g. drug name, company name, etc.)

• Example: MUC ENAMEX test (classifying PNPs in text as organizations, places, and people)

• Problem: How do we classify unknown PNPs?

Page 4: Classifying Unknown Proper Noun Phrases Without Context

Existing Techniques for PNP Classification

• Large, manually constructed lists of names
  – Includes common words (Inc., Dr., etc.)

• Syntactic patterns in surrounding context
  – … XXXX himself … → person
  – … [profession] of/at/with XXXX → organization

• Machine learning with word-level features
  – Capitalization, punctuation, special chars, etc.

Page 5: Classifying Unknown Proper Noun Phrases Without Context

Limitations of Existing Techniques

• Manually constructed lists and rules
  – Slow/expensive to create and maintain

• Domain-specific solutions
  – Won’t generalize to new categories

• Misses valuable source of information
  – People often classify PNPs by how they look

Cotrimoxazole

Wethersfield

Alien Fury: Countdown to Invasion

Page 6: Classifying Unknown Proper Noun Phrases Without Context

What’s in a Name?

• Claim: If people can classify unknown PNPs without context, they must be using the composition of the PNP itself
  – Common accompanying words
  – Common letters and letter sequences
  – Number and length of words in PNP

• Idea: Build a statistical generative model that captures these features from data

Page 7: Classifying Unknown Proper Noun Phrases Without Context

Common Words and Letter Sequences

[Figure: characteristic letter sequences and words highlighted on the slide: oxa, :, John, field, Inc]

Page 8: Classifying Unknown Proper Noun Phrases Without Context

Number and Length of Words

[Figure: distribution of word lengths by category. x-axis: Word Length (# Chars), 0 to 30; y-axis: Proportion of Words in Category, 0 to 0.25. Series: drug, company, movie, place]

[Figure: number of words per PNP by category. x-axis: Number of Words in PNP, 1 to 12; y-axis: Count, 0 to 4500. Series: drug, company, movie, place, person]

Page 9: Classifying Unknown Proper Noun Phrases Without Context

Generative Model Used for Classification

• Probabilistic generative model for each category

• Parameters set from
  – statistics in training data
  – cross-validation on held-out data (20%)

• Standard Bayesian Classification

$$\text{Predicted-Category}(pnp) = \arg\max_c P(c \mid pnp) = \arg\max_c P(c)\,P(pnp \mid c)$$
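The argmax rule above can be sketched in a few lines of Python. This is only an illustration: the `classify` helper, the two toy categories, and the hand-written likelihood functions with their numbers are assumptions made up for the example, standing in for the trained per-category generative models.

```python
import math

def classify(pnp, priors, likelihoods):
    """Return the category c maximizing log P(c) + log P(pnp | c), plus all scores."""
    scores = {
        c: math.log(priors[c]) + math.log(likelihoods[c](pnp))
        for c in priors
    }
    return max(scores, key=scores.get), scores

# Hypothetical priors and likelihoods; real values would come from training data.
priors = {"drug": 0.5, "company": 0.5}
likelihoods = {
    "drug": lambda pnp: 0.02 if "oxa" in pnp.lower() else 0.001,
    "company": lambda pnp: 0.03 if pnp.endswith("Inc.") else 0.001,
}

best, scores = classify("Cotrimoxazole", priors, likelihoods)
print(best, scores)
```

Working in log space avoids numerical underflow once P(pnp|c) becomes a product of many small character-level probabilities.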

Page 10: Classifying Unknown Proper Noun Phrases Without Context

Generative Model for Each Category

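Length n-gram model and word model

$$P(pnp \mid c) = P_{n\text{-gram}}(\text{word-lengths}(pnp)) \cdot \prod_{w_i \in pnp} P(w_i \mid \text{word-length}(w_i))$$

Word model: mixture of character n-gram model and common word model

$$P(w_i \mid len) = \lambda_{len} \cdot P_{n\text{-gram}}(w_i \mid len)^{k/len} + (1 - \lambda_{len}) \cdot P_{word}(w_i \mid len)$$

N-Gram Models: deleted interpolation

$$P_{0\text{-gram}}(symbol \mid history) = \text{uniform-distribution}$$

$$P_{n\text{-gram}}(s \mid h) = \lambda_{C(h)} \, P_{empirical}(s \mid h) + (1 - \lambda_{C(h)}) \, P_{(n-1)\text{-gram}}(s \mid h)$$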
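A minimal sketch of the interpolated n-gram recursion on this slide, assuming a character-level model. The class name `InterpolatedNGram`, the toy training words, and in particular the weight function `count / (count + 1)` are placeholders invented for illustration; the slides say the interpolation weights are set on held-out data, which this sketch does not do.

```python
from collections import Counter

class InterpolatedNGram:
    """n-gram model that backs off recursively down to a uniform 0-gram."""

    def __init__(self, n, alphabet):
        self.n = n
        self.alphabet = sorted(set(alphabet))
        self.ngram_counts = Counter()    # (history, symbol) -> count
        self.history_counts = Counter()  # history -> times it was followed by a symbol

    def train(self, sequences):
        for seq in sequences:
            for i, s in enumerate(seq):
                for k in range(self.n):  # history lengths 0 .. n-1
                    if i - k < 0:
                        break
                    h = tuple(seq[i - k:i])
                    self.ngram_counts[(h, s)] += 1
                    self.history_counts[h] += 1

    def weight(self, count):
        # Hypothetical stand-in for the held-out-tuned interpolation weight.
        return count / (count + 1.0)

    def prob(self, s, history):
        h = tuple(history)[-(self.n - 1):] if self.n > 1 else ()
        return self._prob(s, h, len(h) + 1)

    def _prob(self, s, h, order):
        if order == 0:  # 0-gram: uniform distribution over the alphabet
            return 1.0 / len(self.alphabet)
        c_h = self.history_counts[h]
        empirical = self.ngram_counts[(h, s)] / c_h if c_h else 0.0
        lam = self.weight(c_h)
        return lam * empirical + (1 - lam) * self._prob(s, h[1:], order - 1)

# Toy training data; "$" marks the end of a word.
words = ["cotrimoxazole$", "amoxapine$", "oxazepam$"]
model = InterpolatedNGram(3, "".join(words))
model.train(words)
print(model.prob("x", "o"), model.prob("z", "o"))
```

Because an unseen history gets weight 0, the model falls through to shorter histories and finally to the uniform distribution, so no symbol in the alphabet ever receives zero probability.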

Page 11: Classifying Unknown Proper Noun Phrases Without Context

Walkthrough Example: Alec Baldwin

• Length sequence: [0, 0, 0, 4, 7, 0]

• Words: “____Alec ”, “lec Baldwin$”

[Figure: cumulative log probability (y-axis, 0 down to -50) for each category as successive factors are included: prior, P[4|0,0,0], P[7|0,0,4], P[0|0,4,7], P[" Alec "], P["lec Baldwin$"]. Series: drug, nyse, movie, place, person]
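The length sequence and the three length factors in this walkthrough can be reproduced with a short sketch. The function names are illustrative, and the order of 4 (three lengths of history) is inferred from the factor labels P[4|0,0,0], P[7|0,0,4], P[0|0,4,7] rather than stated on the slide.

```python
def length_sequence(pnp, order=4):
    """Word lengths padded with (order - 1) leading zeros and one trailing zero."""
    lengths = [len(w) for w in pnp.split()]
    return [0] * (order - 1) + lengths + [0]

def length_ngram_events(pnp, order=4):
    """(predicted length, history) pairs scored by the length n-gram model."""
    seq = length_sequence(pnp, order)
    return [
        (seq[i], tuple(seq[i - order + 1:i]))
        for i in range(order - 1, len(seq))
    ]

print(length_sequence("Alec Baldwin"))
for length, history in length_ngram_events("Alec Baldwin"):
    print(f"P[{length}|{','.join(map(str, history))}]")
```

The trailing zero acts as an end-of-phrase symbol, so the model also scores how likely a name is to stop after a given run of word lengths.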

Page 12: Classifying Unknown Proper Noun Phrases Without Context

Walkthrough Example: Baldwin

Note: Baldwin appears both in a person’s name and in a place name

[Figure: probabilities for “Baldwin” on a log scale (y-axis, 1E-13 to 1) for each category. x-axis: P[B|Alec ], P[a|lec B], P[l|ec Ba], P[d|c Bal], P[w| Bald], P[i|Baldw], P[n|aldwi], P[$|ldwin], length-normalized, mixed char-word prob. Series: drug, nyse, movie, place, person]
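The per-character factors for “Baldwin” follow from sliding a fixed-length history over the phrase. The sketch below assumes a history of 5 characters (read off the factor labels), "_" as the padding character, a space as the terminator of non-final words, and "$" as the end-of-phrase symbol; the function name is made up for the example.

```python
def char_ngram_events(pnp, word_index, history_len=5, pad="_", end="$"):
    """(predicted character, history) pairs for one word of the phrase.

    The word's terminator (a space, or the end symbol for the last word)
    is predicted as well, and histories may reach into the previous word.
    """
    words = pnp.split()
    full = pad * history_len + " ".join(words) + end
    start = history_len + sum(len(w) + 1 for w in words[:word_index])
    stop = start + len(words[word_index])  # index of the terminator
    return [(full[j], full[j - history_len:j]) for j in range(start, stop + 1)]

for symbol, history in char_ngram_events("Alec Baldwin", 1):
    print(f"P[{symbol}|{history}]")
```

Letting the history span the word boundary is what allows the first name to influence how the surname is scored.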

Page 13: Classifying Unknown Proper Noun Phrases Without Context

Experimental Setup

• Five categories of Proper Noun Phrases
  – Drugs, companies, movies, places, people

• Train on 90% of data, test on 10%
  – 20% of training data held out for parameter setting (cross-validation)
  – ~5000 examples per category total

• Each result presented is the average/stdev over 10 separate train/test folds

• Three types of tests
  – pairwise: 1 category vs. 1 category
  – 1-all: 1 category vs. union of all other categories
  – n-way: every category for itself
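One way to realize the 90/10 split with a 20% held-out portion is sketched below. The slides do not say how the 10 folds were drawn, so treating them as disjoint test sets of a 10-fold cross-validation is an assumption, as are the function name and the fixed seed.

```python
import random

def make_folds(examples, n_folds=10, held_out_frac=0.2, seed=0):
    """Yield (train, held_out, test) triples, one per fold.

    Each fold tests on 1/n_folds of the data; held_out_frac of the
    remaining training data is set aside for parameter setting.
    """
    items = list(examples)
    random.Random(seed).shuffle(items)
    folds = []
    for k in range(n_folds):
        test = items[k::n_folds]
        rest = [x for i, x in enumerate(items) if i % n_folds != k]
        n_held = int(len(rest) * held_out_frac)
        folds.append((rest[n_held:], rest[:n_held], test))
    return folds

folds = make_folds(range(100))
train, held_out, test = folds[0]
print(len(train), len(held_out), len(test))
```

Reporting the mean and standard deviation of accuracy over the folds then matches the way results are presented on the following slides.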

Page 14: Classifying Unknown Proper Noun Phrases Without Context

Experimental Results: Classification Accuracy

[Figure: bar chart of classification accuracy per test, bars sorted from highest to lowest; x-axis from 82% to 100%; legend: pairwise, 1-all, n-way]

Test                            Type       Accuracy
drug-nyse                       pairwise   98.93%
nyse-drug_movie_place_person    1-all      98.70%
nyse-place                      pairwise   98.64%
nyse-person                     pairwise   98.41%
drug-person                     pairwise   98.16%
nyse-movie                      pairwise   97.76%
drug-nyse_movie_place_person    1-all      96.81%
drug-movie                      pairwise   95.77%
person-drug_nyse_movie_place    1-all      95.47%
drug-place                      pairwise   95.24%
nyse-place-person               n-way      94.57%
place-person                    pairwise   94.34%
drug-nyse-place-person          n-way      93.25%
movie-person                    pairwise   92.70%
place-drug_nyse_movie_person    1-all      91.86%
movie-drug_nyse_place_person    1-all      90.90%
movie-place                     pairwise   89.94%
drug-nyse-movie-place-person    n-way      88.11%

Page 15: Classifying Unknown Proper Noun Phrases Without Context

Experimental Results: Confusion Matrix

[Figure: confusion matrix over the categories drug, nyse, movie, place, person; axes: Correct Category and Predicted Category]

Page 16: Classifying Unknown Proper Noun Phrases Without Context

Sources of Incorrect Classification

• Words that appear in one category drive classification in other categories
  – e.g. Delaware misclassified as company because of GTE Delaware LP, etc.

• Inherent ambiguity
  – e.g. movies named after people/places/etc.:
    ● Nuremberg
    ● John Henry
    ● Love, Inc.
    ● Prozac Nation

Page 17: Classifying Unknown Proper Noun Phrases Without Context

Examples of Misclassified PNPs

• Errors from misleading words
  – Calcium Stanley
  – Best Foods (24 movies with Best, 2 companies)
  – Bloodhounds, Inc.
  – Nebraska (movie: One Standing: Nebraska)
  – Chris Rock (24 movies with Rock, no other people)

• Can you classify these PNPs?
  – R & C
  – Randall & Hopkirk
  – Steeple Aston
  – Nandanar
  – Gerdau

Page 18: Classifying Unknown Proper Noun Phrases Without Context

Contribution of Model Features

• Character n-gram is best single feature

• Word model is good, but subsumed by character n-gram

• Length n-gram helps character n-gram, but not much

[Figure: bar chart of classification accuracy (x-axis, 0% to 100%) by model configuration. Bar labels, in extraction order: length n-gram only, word model only, char n-gram only, length+word, char+word, char+length, full model. Bar values, in extraction order: 62.35%, 72.18%, 89.66%, 62.35%, 89.59%, 92.09%, 91.94%]

Page 19: Classifying Unknown Proper Noun Phrases Without Context

Effect of Increasing N-Gram Length

• Classification accuracy of n-gram models alone

• Longer n-grams are useful, but only to a point

[Figure, two panels of classification accuracy versus n-gram length. Character n-gram model: x-axis 1 to 7, y-axis 50% to 100%. Length n-gram model: x-axis 1 to 5, y-axis 25% to 75%]

Page 20: Classifying Unknown Proper Noun Phrases Without Context

Effect of Increasing Training Data

• Classifier approaches full potential with little training data

• Increasing training data even more is unlikely to help much

[Figure: line chart; y-axis ticks from 0% to 100%, x-axis ticks from 0 to 20000]

Page 21: Classifying Unknown Proper Noun Phrases Without Context

Compensating for Word-Length Bias

• Problem: Character n-gram model places more emphasis on longer words because more terms get multiplied
  – But are longer words really more important?

• Solution: Raise each word’s probability to the power k/length, i.e. take its (length/k)’th root
  – Treat each word like a single base with an ignored exponent

• Observation: Performance is best when k>1
  – Deviation from theoretical expectation
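In log space, the length normalization described on this slide is a single rescaling of the word's summed character log probabilities. The sketch below is illustrative only: the function name is made up, and using the number of character factors as the word length is a simplifying assumption.

```python
import math

def normalized_word_logprob(char_probs, k=1.0):
    """Return log(P(word) ** (k / length)) from per-character probabilities.

    length is taken to be the number of character factors (an assumption).
    """
    length = len(char_probs)
    return (k / length) * sum(math.log(p) for p in char_probs)

short_word = [0.1] * 4  # four character factors, each with probability 0.1
long_word = [0.1] * 8   # eight character factors, each with probability 0.1

# Unnormalized, the long word is penalized twice as heavily.
print(sum(map(math.log, short_word)), sum(map(math.log, long_word)))
# With k = 1 both words contribute the same amount.
print(normalized_word_logprob(short_word), normalized_word_logprob(long_word))
```

With k = 1 every word counts as one average character; larger k lets each word count as k such characters regardless of its actual length.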

Page 22: Classifying Unknown Proper Noun Phrases Without Context

Compensating for Word-Length Bias

[Figure: Effect of Word-Length Normalization Parameter. x-axis: k in p-word^(k/word-len), 0 to 11; y-axis: Classification Accuracy, 91.0% to 94.0%]

Page 23: Classifying Unknown Proper Noun Phrases Without Context

Generative Models Can Also Generate!

• Step 1: Stochastically generate word-length sequence using length n-gram model

• Step 2: Generate each word using character n-gram model

drug

Ambenylin

Carbosil DM 49

Esidrine Plus Base with Moisturalent

nyse

Downe Financial Grp PR

Host Manage U.S.B. Householding Ltd.

Intermedia Inc.

movie

Alien in Oz

Dragons: The Ever Harlane

El Tombre

place

Archfield

Lee-Newcastleridge

Qatad

person

Benedict W. Suthberg Elias Lindbert Atkinson Hugh Grob II
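The two generation steps on this slide can be sketched with simple bigram models. This is a toy under stated assumptions: it uses bigrams rather than the longer interpolated n-grams of the talk, it forces each word to the sampled length instead of conditioning the character model on length, and the helper names and the five training names (place-like names taken from these slides) are chosen for illustration.

```python
import random
from collections import Counter, defaultdict

def train_bigram(sequences, start, end):
    """Count how often each symbol follows the previous one."""
    table = defaultdict(Counter)
    for seq in sequences:
        prev = start
        for sym in list(seq) + [end]:
            table[prev][sym] += 1
            prev = sym
    return table

def sample_lengths(table, rng, max_words=10):
    """Step 1: sample a word-length sequence; 0 is both start and end symbol."""
    lengths, prev = [], 0
    while len(lengths) < max_words:
        symbols, counts = zip(*table[prev].items())
        prev = rng.choices(symbols, weights=counts)[0]
        if prev == 0:
            break
        lengths.append(prev)
    return lengths

def sample_word(table, length, rng):
    """Step 2: sample exactly `length` characters from the character bigrams."""
    chars, prev = [], "^"
    for _ in range(length):
        options = {s: c for s, c in table[prev].items() if s != "$"}
        if not options:  # character only seen word-finally: restart from "^"
            options = dict(table["^"])
        symbols, counts = zip(*options.items())
        prev = rng.choices(symbols, weights=counts)[0]
        chars.append(prev)
    return "".join(chars)

def generate_name(length_table, char_table, rng):
    lengths = sample_lengths(length_table, rng)
    return lengths, " ".join(sample_word(char_table, n, rng) for n in lengths)

names = ["Wethersfield", "Steeple Aston", "Nebraska", "Delaware", "Nuremberg"]
length_table = train_bigram([[len(w) for w in n.split()] for n in names], 0, 0)
char_table = train_bigram([w for n in names for w in n.split()], "^", "$")

print(generate_name(length_table, char_table, random.Random(0)))
```

With so little training data the output is mostly noise; the point is only that the same two models used for scoring a name can be run forward to produce one.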

Page 24: Classifying Unknown Proper Noun Phrases Without Context

Acquiring Proficiency in New Domains

• Challenge: quickly build a high-accuracy PNP classifier for two novel categories

• Example: “Cheese or Disease?”
  – Game show on MTV’s Idiot Savants

• Results: 93.5% accuracy within 10 minutes of suggesting categories!
  – Not possible with previous methods

Page 25: Classifying Unknown Proper Noun Phrases Without Context

Conclusions

• Reliable regularities in the way names are constructed
  – Can be used to complement contextual cues (e.g. Bayesian prior)
  – Not surprising given conscious process of constructing names (e.g. Prozac)

• Statistical methods perform well without the need for domain-specific knowledge
  – Allows for quick generalization to new domains

Page 26: Classifying Unknown Proper Noun Phrases Without Context

Bonus: Does Your Name Look Like A Name?

Ron Kaplan
Dan Klein
Miler Lee
Chris Manning / Christopher D. Manning
Bob Moore / Robert C. Moore
Emily Bender
Ivan Sag
Chung-chieh Shan
Stu Shieber / Stuart M. Shieber
Joseph Smarr
Mark Stevenson
Dominic Widdows