
Slide 1

Word sense disambiguation (1)

Instructor: Paul Tarau, based on Rada Mihalcea’s original slides. Note: Some of the material in this slide set was adapted from a tutorial given by Rada Mihalcea & Ted Pedersen at ACL 2005.


Slide 2

Definitions

Word sense disambiguation is the problem of selecting a sense for a word from a set of predefined possibilities. The sense inventory usually comes from a dictionary or thesaurus. Approaches: knowledge-intensive methods, supervised learning, and (sometimes) bootstrapping.

Word sense discrimination is the problem of dividing the usages of a word into different meanings, without regard to any particular existing sense inventory. Approaches: unsupervised techniques.


Slide 3

Computers versus Humans

Polysemy – most words have many possible meanings. A computer program has no basis for knowing which one is appropriate, even if it is obvious to a human…

Ambiguity is rarely a problem for humans in their day-to-day communication, except in extreme cases…


Slide 4

Ambiguity for Humans - Newspaper Headlines!

DRUNK GETS NINE YEARS IN VIOLIN CASE
FARMER BILL DIES IN HOUSE
PROSTITUTES APPEAL TO POPE
STOLEN PAINTING FOUND BY TREE
RED TAPE HOLDS UP NEW BRIDGE
DEER KILL 300,000
RESIDENTS CAN DROP OFF TREES
INCLUDE CHILDREN WHEN BAKING COOKIES
MINERS REFUSE TO WORK AFTER DEATH


Slide 5

Ambiguity for a Computer

The fisherman jumped off the bank and into the water.
The bank down the street was robbed!
Back in the day, we had an entire bank of computers devoted to this problem.
The bank in that road is entirely too steep and is really dangerous.
The plane took a bank to the left, and then headed off towards the mountains.


Slide 6

Early Days of WSD

Noted as a problem for Machine Translation (Weaver, 1949): a word can often only be translated if you know the specific sense intended (a bill in English could be a pico or a cuenta in Spanish).

Bar-Hillel (1960) posed the following: "Little John was looking for his toy box. Finally, he found it. The box was in the pen. John was very happy." Is "pen" a writing instrument or an enclosure where children play? … He declared it unsolvable and left the field of MT!


Slide 7

Since then…

1970s - 1980s: Rule-based systems. Rely on hand-crafted knowledge sources.

1990s: Corpus-based approaches. Dependence on sense-tagged text. (Ide and Veronis, 1998) give an overview of the history from the early days to 1998.

2000s: Hybrid systems. Minimizing or eliminating the use of sense-tagged text; taking advantage of the Web.


Slide 8

Practical Applications

Machine Translation: Translate "bill" from English to Spanish. Is it a "pico" or a "cuenta"? Is it a bird jaw or an invoice?

Information Retrieval: Find all Web pages about "cricket". The sport or the insect?

Question Answering: What is George Miller’s position on gun control? The psychologist or the US congressman?

Knowledge Acquisition: Add to a KB: Herb Bergson is the mayor of Duluth. Minnesota or Georgia?


Slide 9

Knowledge-based WSD

Task definition: Knowledge-based WSD = the class of WSD methods relying (mainly) on knowledge drawn from dictionaries and/or raw text.

Resources:
Yes: machine readable dictionaries, raw corpora
No: manually annotated corpora

Scope: all open-class words


Slide 10

Machine Readable Dictionaries

In recent years, most dictionaries have been made available in Machine Readable format (MRD): Oxford English Dictionary, Collins, Longman Dictionary of Contemporary English (LDOCE).

Thesauruses – add synonymy information: Roget’s Thesaurus.

Semantic networks – add more semantic relations: WordNet, EuroWordNet.


Slide 11

MRD – A Resource for Knowledge-based WSD

For each word in the language vocabulary, an MRD provides: a list of meanings, definitions (for all word meanings), and typical usage examples (for most word meanings).

WordNet definitions/examples for the noun plant:
1. buildings for carrying on industrial labor; "they built a large plant to manufacture automobiles"
2. a living organism lacking the power of locomotion
3. something planted secretly for discovery by another; "the police used a plant to trick the thieves"; "he claimed that the evidence against him was a plant"
4. an actor situated in the audience whose acting is rehearsed but seems spontaneous to the audience


Slide 12

MRD – A Resource for Knowledge-based WSD

A thesaurus adds: an explicit synonymy relation between word meanings.

A semantic network adds: hypernymy/hyponymy (IS-A), meronymy/holonymy (PART-OF), antonymy, entailment, etc.

WordNet synsets for the noun "plant":
1. plant, works, industrial plant
2. plant, flora, plant life

WordNet related concepts for the meaning "plant life":
{plant, flora, plant life}
hypernym: {organism, being}
hyponym: {house plant}, {fungus}, …
meronym: {plant tissue}, {plant part}
holonym: {Plantae, kingdom Plantae, plant kingdom}
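
The WordNet entries quoted on these two slides can be reproduced programmatically. Below is a minimal sketch using NLTK's WordNet interface; it assumes the nltk package is installed, its wordnet data has been downloaded, and that the synset name plant.n.02 follows the WordNet 3.0 numbering.

    # Browse WordNet senses and relations for the noun "plant" with NLTK.
    # Assumes: pip install nltk, then nltk.download('wordnet') has been run.
    from nltk.corpus import wordnet as wn

    for synset in wn.synsets('plant', pos=wn.NOUN):
        print(synset.name(), '-', synset.definition())
        for example in synset.examples():
            print('    e.g.', example)

    # Relations around the "plant life" sense (plant.n.02 in WordNet 3.0).
    flora = wn.synset('plant.n.02')
    print('hypernyms:', flora.hypernyms())        # IS-A parents, e.g. organism
    print('hyponyms :', flora.hyponyms()[:3])     # a few IS-A children
    print('meronyms :', flora.part_meronyms())    # PART-OF relations
    print('holonyms :', flora.member_holonyms())  # e.g. the plant kingdom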


Slide 13

Lesk Algorithm

(Michael Lesk 1986): Identify senses of words in context using definition overlap.

Algorithm:
1. Retrieve from the MRD all sense definitions of the words to be disambiguated
2. Determine the definition overlap for all possible sense combinations
3. Choose the senses that lead to the highest overlap

Example: disambiguate PINE CONE
PINE
1. kinds of evergreen tree with needle-shaped leaves
2. waste away through sorrow or illness
CONE
1. solid body which narrows to a point
2. something of this shape whether solid or hollow
3. fruit of certain evergreen trees

Pine#1 Cone#1 = 0
Pine#2 Cone#1 = 0
Pine#1 Cone#2 = 1
Pine#2 Cone#2 = 0
Pine#1 Cone#3 = 2
Pine#2 Cone#3 = 0
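
The overlap counts above can be reproduced with a few lines of code. This is a minimal sketch of the original Lesk idea with the two gloss lists hard-coded (no real MRD lookup); the slide's counts of 1 and 2 assume stemming (shape/shaped, tree/trees), which this naive exact-token match does not do, although the winning combination is the same.

    # Original Lesk sketch: count shared (non-stopword) tokens between two glosses.
    STOPWORDS = {'of', 'to', 'a', 'or', 'this', 'through', 'which', 'with'}

    def overlap(gloss1, gloss2):
        tokens1 = set(gloss1.lower().split()) - STOPWORDS
        tokens2 = set(gloss2.lower().split()) - STOPWORDS
        return len(tokens1 & tokens2)

    pine = {1: 'kinds of evergreen tree with needle-shaped leaves',
            2: 'waste away through sorrow or illness'}
    cone = {1: 'solid body which narrows to a point',
            2: 'something of this shape whether solid or hollow',
            3: 'fruit of certain evergreen trees'}

    best = max(((p, c) for p in pine for c in cone),
               key=lambda pc: overlap(pine[pc[0]], cone[pc[1]]))
    print(best)   # (1, 3): Pine#1 and Cone#3 share "evergreen", the highest overlap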


Slide 14

Lesk Algorithm for More than Two Words?

"I saw a man who is 98 years old and can still walk and tell jokes"
Nine open-class words: see(26), man(11), year(4), old(8), can(5), still(4), walk(10), tell(8), joke(3)
43,929,600 sense combinations! How to find the optimal sense combination?

Simulated annealing (Cowie, Guthrie, Guthrie 1992)
Define a score E over an assignment of senses to the words in a given text; find the combination of senses that leads to the highest definition overlap (redundancy).
1. Start with the most frequent sense for each word
2. At each iteration, replace the sense of a random word in the set with a different sense, and measure E
3. Stop iterating when there is no change in the configuration of senses
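
A rough sketch of this search, with the score E taken to be the total pairwise gloss overlap of the current sense assignment. This is a greedy simplification for illustration, not a reconstruction of the 1992 system; a real simulated annealer would also accept some worse moves early on under a temperature schedule.

    import random

    def total_overlap(assignment, glosses):
        # assignment: {word: sense index}; glosses: {word: [gloss string, ...]}
        words = list(assignment)
        score = 0
        for i, w1 in enumerate(words):
            for w2 in words[i + 1:]:
                g1 = set(glosses[w1][assignment[w1]].lower().split())
                g2 = set(glosses[w2][assignment[w2]].lower().split())
                score += len(g1 & g2)
        return score

    def search_senses(glosses, iterations=1000):
        assignment = {w: 0 for w in glosses}   # sense 0 stands in for "most frequent"
        best = total_overlap(assignment, glosses)
        for _ in range(iterations):
            w = random.choice(list(glosses))
            old = assignment[w]
            assignment[w] = random.randrange(len(glosses[w]))
            new = total_overlap(assignment, glosses)
            if new >= best:
                best = new                     # keep the change
            else:
                assignment[w] = old            # undo it
        return assignment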


Slide 15

Lesk Algorithm: A Simplified Version

The original Lesk definition: measure the overlap between sense definitions for all words in the context; identify simultaneously the correct senses for all words in the context.

Simplified Lesk (Kilgarriff & Rosenzweig 2000): measure the overlap between the sense definitions of a word and its current context; identify the correct sense for one word at a time.

Search space significantly reduced.


Slide 16

Lesk Algorithm: A Simplified Version

Example: disambiguate PINE in

“Pine cones hanging in a tree”

• PINE

1. kinds of evergreen tree with needle-shaped leaves

2. waste away through sorrow or illness

Pine#1 Sentence = 1
Pine#2 Sentence = 0

• Algorithm for simplified Lesk:
1. Retrieve from the MRD all sense definitions of the word to be disambiguated
2. Determine the overlap between each sense definition and the current context
3. Choose the sense that leads to the highest overlap
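
NLTK ships a simplified-Lesk-style function that compares WordNet glosses against the context words. A minimal sketch for the PINE example; WordNet's glosses differ from the two dictionary definitions above, so the exact overlap counts (and possibly the chosen sense) may differ too.

    # Simplified Lesk over WordNet glosses, using NLTK's built-in implementation.
    # Assumes nltk is installed and the 'wordnet' corpus has been downloaded.
    from nltk.wsd import lesk

    context = 'Pine cones hanging in a tree'.split()
    sense = lesk(context, 'pine', pos='n')    # noun sense whose gloss overlaps
    print(sense, '-', sense.definition())     # most with the context words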


Slide 17

Evaluations of Lesk Algorithm

Initial evaluation by M. Lesk: 50-70% on short, manually annotated samples of text, with respect to the Oxford Advanced Learner’s Dictionary.

Simulated annealing: 47% on 50 manually annotated sentences.

Evaluation on Senseval-2 all-words data, with back-off to a random sense (Mihalcea & Tarau 2004): Original Lesk 35%, Simplified Lesk 47%.

Evaluation on Senseval-2 all-words data, with back-off to the most frequent sense (Vasilescu, Langlais, Lapalme 2004): Original Lesk 42%, Simplified Lesk 58%.


Slide 18

Selectional Preferences

A way to constrain the possible meanings of words in a given context

E.g. “Wash a dish” vs. “Cook a dish”: WASH-OBJECT vs. COOK-FOOD

Captures information about possible relations between semantic classes; common sense knowledge.

Alternative terminology: Selectional Restrictions, Selectional Preferences, Selectional Constraints


Slide 19

Acquiring Selectional Preferences

From annotated corpora: circular relationship with the WSD problem – need WSD to build the annotated corpus, and need selectional preferences to derive WSD.

From raw corpora: frequency counts, information-theoretic measures, class-to-class relations.


Slide 20

Preliminaries: Learning Word-to-Word Relations

An indication of the semantic fit between two words.

1. Frequency counts: pairs of words connected by a syntactic relation R
   Count(W1, W2, R)

2. Conditional probabilities: condition on one of the words
   P(W1 | W2, R) = Count(W1, W2, R) / Count(W2, R)
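
Given a list of (W1, W2) pairs extracted for one syntactic relation R (say, verb-object), both quantities reduce to simple counting. A small sketch with made-up pairs:

    from collections import Counter

    # Illustrative verb-object pairs for a single relation R = "object".
    pairs = [('drink', 'tea'), ('drink', 'beer'), ('drink', 'tea'), ('pour', 'tea')]

    pair_count = Counter(pairs)                  # Count(W1, W2, R)
    w2_count = Counter(w2 for _, w2 in pairs)    # Count(W2, R)

    def p_w1_given_w2(w1, w2):
        # P(W1 | W2, R) = Count(W1, W2, R) / Count(W2, R)
        return pair_count[(w1, w2)] / w2_count[w2]

    print(p_w1_given_w2('drink', 'tea'))         # 2 / 3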


Slide 21

Learning Selectional Preferences (1)

Word-to-class relations (Resnik 1993): quantify the contribution of a semantic class using all the concepts subsumed by that class:

A(W1, C2, R) = [ P(C2 | W1, R) · log( P(C2 | W1, R) / P(C2) ) ] / [ Σ_C P(C | W1, R) · log( P(C | W1, R) / P(C) ) ]

where

P(C2 | W1, R) = Count(C2, W1, R) / Count(W1, R)

Count(C2, W1, R) = Σ_{W2 ∈ C2} Count(W1, W2, R) / Count(W2)
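
A sketch of the word-to-class computation as reconstructed above. The inputs pair_count, w2_count, classes_of and p_class are assumed to exist (the word-to-word counts from the previous slide, a mapping from a noun to the semantic classes that subsume it, and the class prior P(C)); a faithful implementation would follow Resnik (1993) directly.

    import math
    from collections import defaultdict

    def class_counts(pair_count, w2_count, classes_of):
        # Count(C2, W1, R) and Count(W1, R), built from word-to-word counts.
        cls_count = defaultdict(float)   # keyed by (w1, class)
        w1_count = defaultdict(float)
        for (w1, w2), n in pair_count.items():
            w1_count[w1] += n
            for c in classes_of(w2):
                cls_count[(w1, c)] += n / w2_count[w2]   # credit split, as above
        return cls_count, w1_count

    def association(w1, c, cls_count, w1_count, p_class):
        # A(W1, C, R): P(C|W1,R) log(P(C|W1,R)/P(C)), normalized over all classes.
        def term(c2):
            p = cls_count[(w1, c2)] / w1_count[w1]
            return p * math.log(p / p_class[c2]) if p > 0 else 0.0
        denom = sum(term(c2) for (w, c2) in cls_count if w == w1)
        return term(c) / denom if denom else 0.0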


Slide 22

Learning Selectional Preferences (2)

Determine the contribution of a word sense based on the assumption of equal sense distributions: e.g. "plant" has two senses, so 50% of its occurrences are counted as sense 1 and 50% as sense 2.

Example: learning restrictions for the verb "to drink"
Find high-scoring verb-object pairs.
Find "prototypical" object classes (high association score).

Co-occurrence score  Verb   Object
11.75                drink  tea
11.75                drink  Pepsi
11.75                drink  champagne
10.53                drink  liquid
10.2                 drink  beer
9.34                 drink  wine

A(v,c)  Object class
3.58    (beverage, [drink, …])
2.05    (alcoholic_beverage, [intoxicant, …])


Slide 23

Using Selectional Preferences for WSD

Algorithm:
1. Learn a large set of selectional preferences for a given syntactic relation R
2. Given a pair of words W1–W2 connected by a relation R
3. Find all selectional preferences W1–C (word-to-class) or C1–C2 (class-to-class) that apply
4. Select the meanings of W1 and W2 based on the selected semantic class

Example: disambiguate coffee in "drink coffee"
1. (beverage) a beverage consisting of an infusion of ground coffee beans
2. (tree) any of several small trees native to the tropical Old World
3. (color) a medium to dark brown color

Given the selectional preference "DRINK BEVERAGE": coffee#1
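
Once a preference such as DRINK BEVERAGE has been learned, step 4 can be as simple as choosing the object sense whose hypernym closure contains the preferred class. A sketch with NLTK's WordNet; the synset name beverage.n.01 and the closure test are illustrative choices, not the method prescribed by the slide.

    from nltk.corpus import wordnet as wn

    def disambiguate_object(noun, preferred_class):
        # Return the first sense of `noun` subsumed by `preferred_class` (a Synset).
        for sense in wn.synsets(noun, pos=wn.NOUN):
            ancestors = set(sense.closure(lambda s: s.hypernyms()))
            if preferred_class == sense or preferred_class in ancestors:
                return sense
        return None

    beverage = wn.synset('beverage.n.01')
    print(disambiguate_object('coffee', beverage))   # expected: the drink sense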


Slide 24

Evaluation of Selectional Preferences for WSD

Data set: mainly verb-object and subject-verb relations extracted from SemCor; compared against a random baseline.

Results (Agirre and Martinez, 2000), averaged over 8 nouns; similar figures reported in (Resnik 1997):

                Object                Subject
                Precision  Recall     Precision  Recall
Random          19.2       19.2       19.2       19.2
Word-to-word    95.9       24.9       74.2       18.0
Word-to-class   66.9       58.0       56.2       46.8
Class-to-class  66.6       64.8       54.0       53.7


Slide 25

Semantic Similarity

Words in a discourse must be related in meaning for the discourse to be coherent (Halliday and Hasan, 1976).

Use this property for WSD – identify related meanings for words that share a common context.

Context span:
1. Local context: semantic similarity between pairs of words
2. Global context: lexical chains


Slide 26

Semantic Similarity in a Local Context

Similarity determined between pairs of concepts, or between a word and its surrounding context. Relies on similarity metrics over semantic networks (Rada et al. 1989).

[Figure: fragment of the WordNet noun hierarchy under carnivore – feline/felid, canine/canid, bear, fissiped mammal; under canine: dog, wolf, hyena, hyena dog, dingo, wild dog; under dog: dachshund, terrier, hunting dog]


Slide 27

Semantic Similarity Metrics for WSD

Disambiguate target words based on their similarity with one word to the left and one word to the right (Patwardhan, Banerjee, Pedersen 2002).

Evaluation: 1,723 ambiguous nouns from Senseval-2. Among 5 similarity metrics, (Jiang and Conrath 1997) provides the best precision (39%).

Example: disambiguate PLANT in "plant with flowers"
PLANT
1. plant, works, industrial plant
2. plant, flora, plant life

Similarity (plant#1, flower) = 0.2
Similarity (plant#2, flower) = 1.5 : plant#2
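
The plant/flower example can be approximated with NLTK's similarity measures. The numbers will not match the 0.2/1.5 figures above (those come from a specific Jiang-Conrath setup), but the ranking is what matters. The Jiang-Conrath measure needs an information-content file such as the Brown-based one distributed with nltk_data:

    from nltk.corpus import wordnet as wn, wordnet_ic

    brown_ic = wordnet_ic.ic('ic-brown.dat')   # requires nltk.download('wordnet_ic')
    flower = wn.synset('flower.n.01')

    def best_sense(word, neighbor_sense, ic=brown_ic):
        # Pick the sense of `word` most similar to a neighboring word's sense.
        return max(wn.synsets(word, pos=wn.NOUN),
                   key=lambda s: s.jcn_similarity(neighbor_sense, ic) or 0)

    print(best_sense('plant', flower))   # expected: the flora sense, not the factory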


Slide 28

Semantic Similarity in a Global Context

Lexical chains (Hirst and St-Onge 1998), (Halliday and Hasan 1976): "A lexical chain is a sequence of semantically related words, which creates a context and contributes to the continuity of meaning and the coherence of a discourse."

Algorithm for finding lexical chains (sketched in code below):
1. Select the candidate words from the text. These are words for which we can compute similarity measures, and therefore most of the time they have the same part of speech.
2. For each such candidate word, and for each meaning of this word, find a chain to receive the candidate word sense, based on a semantic relatedness measure between the concepts already in the chain and the candidate word meaning.
3. If such a chain is found, insert the word in this chain; otherwise, create a new chain.
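
A much-reduced greedy sketch of this loop: candidate nouns are processed left to right, and each one is attached, with a particular sense, to the existing chain it relates to most strongly, using WordNet path similarity as the relatedness measure. The threshold and the fallback to the first sense when a new chain is started are arbitrary choices for illustration; the published algorithms use richer relatedness measures.

    from nltk.corpus import wordnet as wn

    def relatedness(sense, chain):
        # Relatedness of a candidate sense to a chain = best similarity to any member.
        return max((sense.path_similarity(s) or 0) for s in chain)

    def build_chains(nouns, threshold=0.2):
        chains = []                            # each chain is a list of Synsets
        for word in nouns:
            senses = wn.synsets(word, pos=wn.NOUN)
            best = (None, 0.0, None)           # (chain, score, sense)
            for sense in senses:
                for chain in chains:
                    score = relatedness(sense, chain)
                    if score > best[1]:
                        best = (chain, score, sense)
            if best[0] is not None and best[1] >= threshold:
                best[0].append(best[2])        # insert the chosen sense in its chain
            elif senses:
                chains.append([senses[0]])     # no fitting chain: start a new one
        return chains

    print(build_chains(['train', 'rail', 'velocity', 'direction']))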


Slide 29

Semantic Similarity in a Global Context

"A very long train traveling along the rails with a constant velocity v in a certain direction …"

train: #1 public transport; #2 ordered set of things; #3 piece of cloth
travel: #1 change location; #2 undergo transportation
rail: #1 a barrier; #2 a bar of steel for trains; #3 a small bird


Slide 30

Lexical Chains for WSD

Identify lexical chains in a text, usually targeting one part of speech at a time.
Identify the meaning of words based on their membership in a lexical chain.

Evaluation:
(Galley and McKeown 2003): lexical chains on 74 SemCor texts give 62.09%
(Mihalcea and Moldovan 2000): on five SemCor texts, 90% precision with 60% recall, using lexical chains "anchored" on monosemous words
(Okumura and Honda 1994): lexical chains on five Japanese texts give 63.4%


Slide 31

Heuristics: Most Frequent Sense

Identify the most often used meaning and use this meaning by default.

Example: “plant/flora” is used more often than “plant/factory” - annotate any instance of PLANT as “plant/flora”.

Word meanings exhibit a Zipfian distribution, e.g. the distribution of word senses in SemCor:

[Figure: frequency of word senses in SemCor by sense number (1-10), shown separately for nouns, verbs, adjectives, and adverbs; frequency drops off sharply after the first sense.]
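
Because NLTK's WordNet lists a word's synsets in sense-number order, which reflects frequency in the sense-tagged corpora behind WordNet, the most-frequent-sense baseline is essentially one line per word:

    from nltk.corpus import wordnet as wn

    def most_frequent_sense(word, pos=wn.NOUN):
        # WordNet orders a word's senses by their frequency in sense-tagged text,
        # so the first listed synset is the most-frequent-sense baseline answer.
        senses = wn.synsets(word, pos=pos)
        return senses[0] if senses else None

    print(most_frequent_sense('bank'))   # -> bank.n.01, the "sloping land" sense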


Slide 32

Heuristics: One Sense Per Discourse

A word tends to preserve its meaning across all its occurrences in a given discourse (Gale, Church, Yarowsky 1992).

What does this mean? E.g. if the ambiguous word PLANT occurs 10 times in a discourse, all instances of "plant" carry the same meaning.

Evaluation: 8 words with two-way ambiguity, e.g. plant, crane, etc.; 98% of the occurrences of these words in the same discourse carry the same meaning.

The grain of salt: performance depends on granularity. (Krovetz 1998) experiments with words with more than two senses; the performance of "one sense per discourse" measured on SemCor is approx. 70%.
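
A sketch of using the heuristic as a post-processing step: run any per-occurrence tagger over one discourse, then relabel every occurrence of a word with that word's majority sense in the document. The input format here is made up for illustration.

    from collections import Counter, defaultdict

    def one_sense_per_discourse(tagged):
        # tagged: [(word, sense), ...] produced by some base WSD method for a
        # single discourse; each word's labels are collapsed to its majority sense.
        by_word = defaultdict(list)
        for word, sense in tagged:
            by_word[word].append(sense)
        majority = {w: Counter(s).most_common(1)[0][0] for w, s in by_word.items()}
        return [(word, majority[word]) for word, _ in tagged]

    print(one_sense_per_discourse([('plant', 'flora'), ('plant', 'factory'),
                                   ('plant', 'flora')]))
    # -> all three occurrences relabeled as ('plant', 'flora')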


Slide 33

Heuristics: One Sense per Collocation

A word tends to preserve its meaning when used in the same collocation (Yarowsky 1993). Strong for adjacent collocations; weaker as the distance between words increases.

An example: the ambiguous word PLANT preserves its meaning in all its occurrences within the collocation "industrial plant", regardless of the context where this collocation occurs.

Evaluation: 97% precision on words with two-way ambiguity.

Finer granularity: (Martinez and Agirre 2000) tested the "one sense per collocation" hypothesis on text annotated with WordNet senses; 70% precision on SemCor words.