Linguistically Rich Statistical Models of Language
Joseph Smarr, M.S. Candidate
Symbolic Systems Program
Advisor: Christopher D. Manning
December 5th, 2002


Page 1: Linguistically Rich  Statistical Models of Language

Linguistically Rich Statistical Models of

Language

Joseph SmarrM.S. Candidate

Symbolic Systems ProgramAdvisor: Christopher D. Manning

December 5th, 2002

Page 2: Linguistically Rich  Statistical Models of Language

Grand Vision

Talk to your computer like another human (HAL, Star Trek, etc.)

Ask your computer a question, and it finds the answer: “Who’s speaking at this week’s SymSys Forum?”

Computer can read and summarize text for you: “What’s the cutting edge in NLP these days?”

Page 3: Linguistically Rich  Statistical Models of Language

We’re Not There (Yet)

Turns out behaving intelligently is difficult. What does it take to achieve the grand vision?

General Artificial Intelligence problems:
Knowledge representation, common sense reasoning, etc.

Language-specific problems:
Complexity, ambiguity, and flexibility of language
Always underestimated because language is so easy for us!

Page 4: Linguistically Rich  Statistical Models of Language

Are There Useful Sub-Goals?

Grand vision is still too hard, but we can solve simpler problems that are still valuable:
Filter news for stories about new tech gadgets
Take the SSP talk email and add it to my calendar
Dial my cell phone by speaking my friend’s name
Automatically reply to customer service e-mails
Find out which episode of The Simpsons is tonight

Two approaches to understanding language:
Theory-driven: Theoretical Linguistics
Task-driven: Natural Language Processing

Page 5: Linguistically Rich  Statistical Models of Language

Theoretical Linguistics vs. NLP

Theoretical Linguistics

Goal: understand people’s knowledge of language
Method: rich logical representations of language’s hidden structure and meaning
Guiding principles:
Separation of (hidden) knowledge of language and (observable) performance
Grammaticality is categorical (all or none)
Describe what are possible and impossible utterances

Natural Language Processing

Goal: develop practical tools for analyzing speech / text
Method: simple, robust models of everyday language use that are sufficient to perform tasks
Guiding principles:
Exploit (empirical) regularities and patterns in examples of language in text collections
Sentence “goodness” is gradient (better or worse)
Deal with the utterances you’re given, good or bad

Page 6: Linguistically Rich  Statistical Models of Language

Theoretical Linguistics vs. NLP

[Diagram contrasting Linguistics and NLP.]

Page 7: Linguistically Rich  Statistical Models of Language

Linguistic Puzzle

When dropping an argument, why do some verbs keep the subject and some keep the object?
John sang the song → John sang
John broke the vase → The vase broke

Not just “quirkiness of language”:
Similar patterns show up in other languages
Seems to involve deep aspects of verb meaning

Rules to account for this phenomenon:
Two classes of verbs (unergative & unaccusative)
Remaining argument must be realized as subject

Page 8: Linguistically Rich  Statistical Models of Language

Exception: Imperatives

“Open the pod bay doors, HAL”

Different goals lead to the study of different problems. In NLP...
Need to recognize this as a command
Need to figure out what specific action to take
Irrelevant how you’d say it in French

Describing language vs. working with language – but both tasks clearly share many sub-problems.

Page 9: Linguistically Rich  Statistical Models of Language

Theoretical Linguistics vs. NLP

There is potential for much synergy between linguistics and NLP. However, historically they have remained quite distinct.

Chomsky (founder of generative grammar): “It must be recognized that the notion ‘probability of a sentence’ is an entirely useless one, under any known interpretation of this term.”

Karttunen (founder of finite state technologies at Xerox), on linguists’ reaction to NLP: “Not interested. You do not understand Theory. Go away you geek.”

Jelinek (former head of IBM speech project): “Every time I fire a linguist, the performance of our speech recognition system goes up.”

Page 10: Linguistically Rich  Statistical Models of Language

Potential Synergies

Lexical acquisition (unknown words): statistically infer new lexical entries from context
Modeling “naturalness” and “conventionality”: use corpus data to weight constructions
Dealing with ungrammatical utterances: find the “most similar / most likely” correction
Richer patterns for finding information in text: use argument structure / semantic dependencies
More powerful models for speech recognition: progressively build the parse tree while listening

Page 11: Linguistically Rich  Statistical Models of Language

Finding Information in Text

The US Government has sponsored lots of research in “information extraction” from news articles:
Find mentions of terrorists and which locations they’re targeting
Find which companies are being acquired by which others, and for how much

Progress driven by simplifying the models used:
Early work used rich linguistic parsers, which were unable to robustly handle natural text
Modern work is mainly finite state patterns; regular expressions are very practical and successful

Page 12: Linguistically Rich  Statistical Models of Language

Web Information Extraction

How much does that textbook cost on Amazon? Learn patterns for finding the relevant fields:

Concept: Book
Title: Foundations of Statistical Natural Language Processing
Author(s): Christopher D. Manning & Hinrich Schütze
Price: $58.45

Learned price pattern: “Our Price: $##.##”
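As a rough illustration of how such field patterns work, here is a minimal regex-based sketch in Python. The patterns and page layout are hypothetical stand-ins, not the ones actually learned in this work:

```python
# Hypothetical wrapper for a product page: each field is found by a
# (here hand-written) finite state pattern over the page text.
import re

TITLE = re.compile(r"Title:\s*(.+)")                    # assumed page layout
AUTHORS = re.compile(r"Author\(s\):\s*(.+)")
PRICE = re.compile(r"Our Price:\s*\$(\d{1,4}\.\d{2})")  # the "$##.##" pattern

def extract_book(page_text):
    """Pull out whichever book fields the patterns match."""
    fields = {}
    for name, pattern in [("title", TITLE), ("authors", AUTHORS), ("price", PRICE)]:
        m = pattern.search(page_text)
        if m:
            fields[name] = m.group(1).strip()
    return fields
```

In a learned wrapper, the literal context strings (“Our Price:”, “Title:”) would be induced from labeled example pages rather than written by hand.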

Page 13: Linguistically Rich  Statistical Models of Language

Improving IE Performance on Natural Text Documents

How can we scale IE back up for natural text? Need to look elsewhere for regularities to exploit.

Idea: consider grammatical structure
Run a shallow parser on each sentence
Flatten the output into a sequence of “typed chunks”

Example of a tagged sentence:
Uba2p is located largely in the nucleus.
NP_SEG VP_SEG PP_SEG NP_SEG
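A minimal sketch of matching a pattern over typed chunks, using the sentence above; the chunk labels follow the example, but the extraction rule itself is illustrative, not the actual system:

```python
# Shallow-parse output flattened into a sequence of (chunk type, text) pairs.
chunks = [("NP_SEG", "Uba2p"),
          ("VP_SEG", "is located"),
          ("PP_SEG", "largely in"),
          ("NP_SEG", "the nucleus")]

def find_localizations(chunks):
    """Match NP ... 'located' ... NP and emit located_in(protein, place)."""
    for i, (tag, text) in enumerate(chunks):
        if tag == "VP_SEG" and "located" in text:
            protein = next((t for g, t in reversed(chunks[:i]) if g == "NP_SEG"), None)
            place = next((t for g, t in chunks[i + 1:] if g == "NP_SEG"), None)
            if protein and place:
                yield ("located_in", protein, place)

print(list(find_localizations(chunks)))  # [('located_in', 'Uba2p', 'the nucleus')]
```

The point is that the pattern ranges over grammatical chunk types rather than raw tokens, so it generalizes across surface variation within each chunk.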

Page 14: Linguistically Rich  Statistical Models of Language

Power of Linguistic Features

[Chart: gains from adding linguistic features to the IE model – 21%, 65%, and 45% increases in performance.]

Page 15: Linguistically Rich  Statistical Models of Language

Linguistically Rich(er) IE

Exploit more grammatical structure for patterns, e.g. Tim Grow’s work on IE with PCFGs.

[Parse tree for “First Union Corp will acquire Sheland Bank Inc for three million dollars”, with semantic-role annotations on the nodes: {pur} (purchaser) covering “First Union Corp”, {acq} (acquired) covering “Sheland Bank Inc”, and {amt} (amount) covering “three million dollars” – e.g. S{pur, acq, amt}, NP{pur}, VP{acq, amt}, PP{amt}.]
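To make the idea concrete, here is a toy sketch using NLTK’s PCFG tools, with the semantic roles folded into the nonterminal names. The grammar is invented for this one sentence and is not the grammar from the work described:

```python
import nltk

# Toy grammar: role annotations ({pur}, {acq}, {amt}) encoded as suffixes
# on the nonterminals, so reading off role fillers is just tree traversal.
grammar = nltk.PCFG.fromstring("""
    S -> NP_pur VP_top [1.0]
    VP_top -> MD VP_acq_amt [1.0]
    VP_acq_amt -> VB NP_acq PP_amt [1.0]
    NP_pur -> 'First' 'Union' 'Corp' [1.0]
    MD -> 'will' [1.0]
    VB -> 'acquire' [1.0]
    NP_acq -> 'Sheland' 'Bank' 'Inc' [1.0]
    PP_amt -> 'for' NP_amt [1.0]
    NP_amt -> 'three' 'million' 'dollars' [1.0]
""")

sentence = "First Union Corp will acquire Sheland Bank Inc for three million dollars".split()
for tree in nltk.ViterbiParser(grammar).parse(sentence):
    for sub in tree.subtrees(lambda t: t.label().startswith("NP_")):
        # The role suffix on the nonterminal says which slot this NP fills.
        print(sub.label().split("_", 1)[1], "=", " ".join(sub.leaves()))
# pur = First Union Corp
# acq = Sheland Bank Inc
# amt = three million dollars
```

In a real system the rule probabilities and role annotations would be estimated from an annotated corpus rather than fixed at 1.0.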

Page 16: Linguistically Rich  Statistical Models of Language

Classifying Unknown Words

Which of the following is the name of a city?
Cotrimoxazole
Wethersfield
Alien Fury: Countdown to Invasion

Most linguistic grammars assume a fixed lexicon. How do humans learn to deal with new words?
Context (“I spent a summer living in Wethersfield”)
Makeup of the word itself (“phonesthetics”)

Idea: learn distinguishing letter sequences

Page 17: Linguistically Rich  Statistical Models of Language

What’s in a Name?

[Figure: distinguishing letter sequences learned for different categories, e.g. “oxa” (drug names) vs. “field” (place names).]

Page 18: Linguistically Rich  Statistical Models of Language

Generative Model of PNPs (proper noun phrases)

Length n-gram model and word model:

P(pnp | c) = P_n-gram(word-lengths(pnp)) · ∏_{w_i ∈ pnp} P(w_i | word-length(w_i))

Word model: mixture of a character n-gram model and a common-word model:

P(w_i | len) = λ_len · P_n-gram(w_i | len)^(k/len) + (1 − λ_len) · P_word(w_i | len)

N-gram models use deleted interpolation:

P_0-gram(symbol | history) = uniform distribution
P_n-gram(s | h) = λ_C(h) · P_empirical(s | h) + (1 − λ_C(h)) · P_(n−1)-gram(s | h)
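A minimal Python sketch of the character-level piece, assuming fixed interpolation weights (the λ’s would be estimated on held-out data) and omitting the word-length and common-word components:

```python
import math
from collections import defaultdict

class CharNGramModel:
    """Character n-gram model smoothed by interpolating down to a uniform base."""
    def __init__(self, n=3, lam=0.8, alphabet_size=100):
        self.n, self.lam = n, lam
        self.uniform = 1.0 / alphabet_size            # P_0-gram: uniform distribution
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, words):
        for word in words:
            w = "^" + word + "$"                      # word-boundary markers
            for i in range(1, len(w)):
                for k in range(min(self.n - 1, i) + 1):
                    self.counts[w[i - k:i]][w[i]] += 1

    def prob(self, history, ch):
        # P_n-gram(s|h) = lam * P_empirical(s|h) + (1 - lam) * P_(n-1)-gram(s|h)
        p = self.uniform
        for k in range(min(self.n - 1, len(history)) + 1):
            h = history[len(history) - k:]
            total = sum(self.counts[h].values())
            if total > 0:
                p = self.lam * self.counts[h][ch] / total + (1 - self.lam) * p
        return p

    def logprob(self, word):
        w = "^" + word + "$"
        return sum(math.log(self.prob(w[:i], w[i])) for i in range(1, len(w)))

def classify(word, models):
    """Pick the category whose model gives the unknown word the highest score."""
    return max(models, key=lambda c: models[c].logprob(word))
```

Training one such model per category (drug, place, person, ...) and taking the argmax over log-probabilities is the classification scheme behind the results on the next slide.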

Page 19: Linguistically Rich  Statistical Models of Language

Experimental Results

[Chart: accuracy of unknown-word classification across name categories (drug, NYSE company, movie, place, person), comparing pairwise, 1-vs-all, and n-way settings. Accuracies range from 88.11% to 98.93%, with easy pairwise distinctions such as drug vs. NYSE at the top and harder ones such as movie vs. place near the bottom.]

Page 20: Linguistically Rich  Statistical Models of Language

Knowledge of Frequencies

Linguistics traditionally assumes that Knowledge of Language doesn’t involve counting.

Letter frequencies are clearly an important source of knowledge for unknown words. Similarly, we saw before that there are regular patterns to exploit in grammatical information.

Take-home point: combining Statistical NLP methods with richer linguistic representations is a big win!

Page 21: Linguistically Rich  Statistical Models of Language

Language is Ambiguous!

Ban on Nude Dancing on Governor’s Desk – from a Georgia newspaper column discussing current legislation
Lebanese chief limits access to private parts – talking about an Army General’s initiative
Death may ease tension – an article about the death of Colonel Jean-Claude Paul in Haiti

Iraqi Head Seeks Arms
Juvenile Court to Try Shooting Defendant
Teacher Strikes Idle Kids
Stolen Painting Found By Tree

Page 22: Linguistically Rich  Statistical Models of Language

Language is Ambiguous!

Local HS Dropouts Cut in Half
Obesity Study Looks for Larger Test Group
British Left Waffles on Falkland Islands
Red Tape Holds Up New Bridges
Man Struck by Lightning Faces Battery Charge
Clinton Wins on Budget, but More Lies Ahead
Hospitals Are Sued by 7 Foot Doctors
Kids Make Nutritious Snacks

Page 23: Linguistically Rich  Statistical Models of Language

Coping With Ambiguity

Categorical grammars like HPSG provide many possible analyses for sentences: 455 parses for “List the sales of the products produced in 1973 with the products produced in 1972.” (Martin et al., 1987)

In most cases, only one interpretation is intended.

The initial solution was hand-coded preferences among rules, but this is hard to manage as the number of rules increases, and we need to capture interactions among rules.

Page 24: Linguistically Rich  Statistical Models of Language

Statistical HPSG Parse Selection

HPSG provides deep analyses of sentence structure and meaning, useful for NLP tasks like question answering. We need to solve the disambiguation problem to make using these richer representations practical.

Idea: learn statistical preferences among constructions from a hand-disambiguated collection of sentences.

Result: the correct analysis is chosen >80% of the time.

StatNLP methods + Linguistic representation = Win
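The selection step can be pictured as a linear ranker over construction features. A minimal sketch, where the feature names and weights are invented and the real weights would be trained on the hand-disambiguated treebank:

```python
# Each candidate parse is reduced to a bag of construction/rule features;
# learned weights score each analysis and the best one is kept.
def score(parse_features, weights):
    return sum(weights.get(f, 0.0) for f in parse_features)

def select_parse(candidate_parses, weights):
    """Return the candidate analysis with the highest model score."""
    return max(candidate_parses, key=lambda p: score(p, weights))

# Hypothetical example: two analyses of a PP-attachment ambiguity.
weights = {"pp_attach_verb": 0.7, "pp_attach_noun": -0.2}
parses = [["pp_attach_verb"], ["pp_attach_noun"]]
print(select_parse(parses, weights))  # ['pp_attach_verb']
```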

Page 25: Linguistically Rich  Statistical Models of Language

Towards Semantic Extraction

HPSG provides a representation of meaning: who did what to whom? Computers need meaning to do inference.

Can we extend information extraction methods to extract meaning representations from pages?

Current project: IE for the semantic web
A large project to build rich ontologies that describe the content of web pages for intelligent agents
Use IE to extract new instances of concepts from web pages (as opposed to manual labeling)

student(Joseph), univ(Stanford), at(Joseph, Stanford)
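A minimal sketch of that last step, turning an extracted record into ground facts of the kind shown above; the record format is an assumption for illustration:

```python
def to_facts(record):
    """Render an extracted instance as predicate facts for the ontology."""
    name, univ = record["name"], record["university"]  # hypothetical IE output fields
    return [f"student({name})", f"univ({univ})", f"at({name}, {univ})"]

print(to_facts({"name": "Joseph", "university": "Stanford"}))
# ['student(Joseph)', 'univ(Stanford)', 'at(Joseph, Stanford)']
```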

Page 26: Linguistically Rich  Statistical Models of Language

Towards the Grand Vision?

Collaboration between Theoretical Linguistics and NLP is an important step forward: practical tools with sophisticated language power.

How can we ever teach computers enough about language and the world?
Hawking: Moore’s Law is sufficient
Moravec: mobile robots must learn like children
Kurzweil: reverse-engineer the human brain

The experts agree: Symbolic Systems is the future!

Page 27: Linguistically Rich  Statistical Models of Language

Upcoming Convergence Courses

Ling 139M – Machine Translation (Winter)
Ling 239E – Grammar Engineering (Winter)
CS 276B – Text Information Retrieval (Winter)
Ling 239A – Parsing and Generation (Spring)
CS 224N – Natural Language Processing (Spring)

Get Involved!!