
Page 1

Word sense disambiguation and information retrieval

Chapter 17

Jurafsky, D. & Martin J. H.

SPEECH and LANGUAGE PROCESSING

Jarmo Ritola - jarmo.ritola@hut.fi

Page 2

Lexical Semantic Processing

• Word sense disambiguation
– which sense of a word is being used
– non-trivial task
– robust algorithms

• Information retrieval
– broad field
– storage and retrieval of requested text documents
– vector space model

Page 3

Word Sense Disambiguation

• (17.1) “..., everybody has a career and none of them includes washing DISHES”

• (17.2) “In her tiny kitchen at home, Ms. Chen works efficiently, stir-frying several simple DISHES, including braised pig’s ears and chicken livers with green peppers”

• (17.6) “I’m looking for a restaurant that SERVES vegetarian DISHES”

Page 4

Selectional Restriction

• Rule-to-rule approach

• Blocks the formation of representations with selectional restriction violations

• Correct sense achieved as side effect

• PATIENT roles, mutual exclusion
– dishes + stir-fry => food sense
– dishes + wash => artifact sense

• Need: hierarchical types and restrictions (see the sketch below)
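To make the idea concrete, here is a minimal sketch of restriction-based disambiguation. The type hierarchy, the PATIENT restrictions and the sense names are toy inventions for illustration, not the book's lexicon.

```python
# Toy sketch of selectional-restriction-based disambiguation.
# The hierarchy, senses and restrictions below are invented illustrations.

# Hypernym links: child type -> parent type.
HIERARCHY = {
    "food_dish": "food",
    "artifact_dish": "artifact",
    "food": "physical_object",
    "artifact": "physical_object",
}

# Each verb restricts the semantic type of its PATIENT argument.
PATIENT_RESTRICTION = {
    "stir-fry": "food",
    "wash": "artifact",
}

# Candidate senses of the ambiguous noun "dish".
SENSES = {"dish": ["food_dish", "artifact_dish"]}


def is_a(sense, required_type):
    """Walk up the hypernym chain and check for the required type."""
    while sense is not None:
        if sense == required_type:
            return True
        sense = HIERARCHY.get(sense)
    return False


def disambiguate(noun, verb):
    """Keep only the senses that satisfy the verb's PATIENT restriction."""
    required = PATIENT_RESTRICTION[verb]
    return [s for s in SENSES[noun] if is_a(s, required)]


print(disambiguate("dish", "stir-fry"))  # ['food_dish']
print(disambiguate("dish", "wash"))      # ['artifact_dish']
```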

Page 5

S.R. Limitations

• Selectional restrictions too general
– (17.7) … kind of DISHES do you recommend?

• True restriction violations
– (17.8) … you can’t EAT gold for lunch … (negative environment)
– (17.9) … Mr. Kulkarni ATE glass …

• Metaphoric and metonymic uses

• Selectional association (Resnik)

Page 6

Robust Word Sense Disambiguation

• Robust, stand-alone systems

• Preprocessing
– part-of-speech tagging, context selection, stemming, morphological processing, parsing …

• Feature selection, feature vector

• Train classifier to assign words to senses

• Supervised, bootstrapping, unsupervised

• Does the system scale?

Page 7

Inputs: Feature Vectors

• Target word, context

• Select relevant linguistic features

• Encode them in a usable form

• Numeric or nominal values

• Collocational features

• Co-occurrence features

Page 8

Inputs: Feature Vectors (2)

• (17.11) An electric guitar and BASS player stand off to one side, not really part of the scene, just as a sort of nod to gringo expectations perhaps.

• Collocational (the words and POS tags immediately around BASS):
[guitar, NN1, and, CJC, player, NN1, stand, VVB]

• Co-occurrence (counts of selected content words anywhere in the context):
[0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0]
where the positions correspond to the words
fishing, big, sound, player, fly, rod, pound, double, runs, playing, guitar, band
(both encodings are computed in the sketch below)
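A small sketch of how the two feature encodings above can be computed. The sentence is assumed to be already tokenized and POS-tagged; the tags for the words not shown on the slide (AT0, AJ0, AVP) are my guesses in the same CLAWS style, and the co-occurrence vocabulary is the twelve words listed above.

```python
# Minimal sketch of the two feature encodings for a target word; the tagged
# sentence is a truncated prefix of example (17.11), tags partly assumed.

from collections import Counter

# Assumed co-occurrence vocabulary (the word list from the slide).
VOCAB = ["fishing", "big", "sound", "player", "fly", "rod",
         "pound", "double", "runs", "playing", "guitar", "band"]


def collocational_features(tagged, target_idx, window=2):
    """Words and POS tags in a +/-window around the target word."""
    feats = []
    for i in range(target_idx - window, target_idx + window + 1):
        if i == target_idx:
            continue
        word, tag = tagged[i] if 0 <= i < len(tagged) else ("", "")
        feats.extend([word, tag])
    return feats


def cooccurrence_features(words, vocab=VOCAB):
    """Counts of the selected content words anywhere in the context."""
    counts = Counter(w.lower() for w in words)
    return [counts[v] for v in vocab]


tagged = [("An", "AT0"), ("electric", "AJ0"), ("guitar", "NN1"),
          ("and", "CJC"), ("bass", "NN1"), ("player", "NN1"),
          ("stand", "VVB"), ("off", "AVP")]
words = [w for w, _ in tagged]

print(collocational_features(tagged, target_idx=4))
# ['guitar', 'NN1', 'and', 'CJC', 'player', 'NN1', 'stand', 'VVB']
print(cooccurrence_features(words))
# [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0]
```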

Page 9

Supervised Learning

• Feature-encoded inputs + categories

• Naïve Bayes classifier

• Decision list classifiers
– case statements
– tests ordered according to sense likelihood

Naïve Bayes:

$\hat{s} = \operatorname*{argmax}_{s \in S} P(s \mid V) = \operatorname*{argmax}_{s \in S} \frac{P(V \mid s)\, P(s)}{P(V)}$

$\hat{s} = \operatorname*{argmax}_{s \in S} P(s) \prod_{j=1}^{n} P(v_j \mid s)$
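A minimal sketch of the naïve Bayes decision rule above (V is the context feature vector, the v_j are its individual features, S the candidate senses). The tiny sense-labelled training set and the add-one smoothing are illustrative choices; a real system would train on a sense-tagged corpus.

```python
# Minimal naive Bayes word-sense classifier over bag-of-words contexts.
# Training examples are invented; add-one smoothing keeps P(v|s) non-zero.

import math
from collections import Counter, defaultdict

train = [
    ("fish",  ["fishing", "rod", "pound", "double"]),
    ("fish",  ["fly", "fishing", "runs", "big"]),
    ("music", ["guitar", "band", "player", "sound"]),
    ("music", ["playing", "guitar", "band", "sound"]),
]

sense_counts = Counter(s for s, _ in train)
word_counts = defaultdict(Counter)          # word_counts[sense][word]
vocab = set()
for sense, words in train:
    word_counts[sense].update(words)
    vocab.update(words)


def classify(context_words):
    """Return argmax_s log P(s) + sum_j log P(v_j | s)."""
    best_sense, best_score = None, float("-inf")
    for sense in sense_counts:
        score = math.log(sense_counts[sense] / len(train))
        total = sum(word_counts[sense].values())
        for w in context_words:
            # Add-one (Laplace) smoothing over the training vocabulary.
            score += math.log((word_counts[sense][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


print(classify(["guitar", "player", "sound"]))  # music
print(classify(["fishing", "rod"]))             # fish
```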

Page 10

Bootstrapping Approaches

• Seeds: a small number of labeled instances

• Initial classifier extracts a larger training set

• Repeat => a series of classifiers with improving accuracy and coverage (loop sketched below)

• Hand-labeling of examples

• One sense per collocation
– also automatic selection from a machine-readable dictionary
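A schematic sketch of the bootstrapping loop, in the spirit of "one sense per collocation" training. The contexts, the seed cues, the number of rounds and the promotion rule (a word must occur at least twice with exactly one sense) are all invented for illustration, not the book's exact procedure.

```python
# Schematic bootstrapping: start from seed collocations, label the unlabeled
# contexts they cover, learn new strongly associated cue words, and repeat.

from collections import Counter, defaultdict

unlabeled = [
    ["fishing", "rod", "bass", "pound"],
    ["big", "bass", "fishing", "rod"],
    ["caught", "bass", "fly", "rod"],
    ["bass", "guitar", "band", "player"],
    ["bass", "guitar", "player", "sound"],
    ["bass", "player", "band", "playing"],
]

# Seed collocations ("one sense per collocation"): one cue word per sense.
seeds = {"fish": {"fishing"}, "music": {"guitar"}}

labeled = {}                        # context index -> sense
for _ in range(3):                  # a few bootstrapping rounds
    # 1. Label every still-unlabeled context that contains a cue word.
    for i, ctx in enumerate(unlabeled):
        if i in labeled:
            continue
        hits = {s for s, cues in seeds.items() if cues & set(ctx)}
        if len(hits) == 1:          # only accept unambiguous evidence
            labeled[i] = hits.pop()
    # 2. "Train": count how often each word appears with each sense.
    assoc = defaultdict(Counter)
    for i, sense in labeled.items():
        assoc[sense].update(unlabeled[i])
    # 3. Promote words seen >= 2 times with exactly one sense to new cues.
    for sense, counts in assoc.items():
        for word, c in counts.items():
            seen_elsewhere = any(word in assoc[o] for o in assoc if o != sense)
            if c >= 2 and not seen_elsewhere:
                seeds[sense].add(word)

print(labeled)   # all six contexts end up labeled (0-2: 'fish', 3-5: 'music')
print(seeds)     # e.g. 'rod' and 'player' have been promoted to cue words
```

Each round extends the coverage: contexts 2 and 5 contain no seed word, but are picked up once "rod" and "player" have been learned from the contexts labeled in the first round.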

Page 11

Unsupervised Methods

• Unlabeled feature vectors are grouped into clusters according to a similarity metric

• Clusters are labeled by hand

• Agglomerative clustering (sketched below)

• Challenges
– correct senses may not be known
– heterogeneous clusters
– the number of clusters and the number of senses may differ
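A tiny sketch of bottom-up (agglomerative) clustering of unlabeled context vectors, using cosine similarity and average linkage; the vectors and the choice of k = 2 clusters are illustrative assumptions.

```python
# Tiny bottom-up (agglomerative) clustering of unlabeled feature vectors.

import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def avg_link(c1, c2, vecs):
    """Average pairwise similarity between two clusters of vector indices."""
    return sum(cosine(vecs[i], vecs[j]) for i in c1 for j in c2) / (len(c1) * len(c2))

def agglomerate(vecs, k):
    """Repeatedly merge the two most similar clusters until only k remain."""
    clusters = [[i] for i in range(len(vecs))]
    while len(clusters) > k:
        i, j = max(((a, b) for a in range(len(clusters))
                    for b in range(a + 1, len(clusters))),
                   key=lambda ab: avg_link(clusters[ab[0]], clusters[ab[1]], vecs))
        clusters[i] += clusters.pop(j)
    return clusters

# Invented co-occurrence vectors over (fishing, rod, guitar, band)
# for five contexts of "bass".
vectors = [(2, 1, 0, 0), (1, 2, 0, 0), (0, 0, 2, 1), (0, 0, 1, 2), (1, 1, 0, 1)]
print(agglomerate(vectors, k=2))   # [[0, 1, 4], [2, 3]]
```

The resulting clusters would then be labeled by hand, which is exactly where the listed challenges (heterogeneous clusters, mismatched cluster and sense counts) show up.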

Page 12

Dictionary Based Approaches

• Large-scale disambiguation possible

• Sense definitions retrieved from the dictionary
– the sense whose definition has the highest overlap with the context words is chosen (sketched below)

• Dictionary entries are relatively short
– often not enough overlap
– remedy: expand the word lists, use subject codes
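This overlap idea is essentially the simplified Lesk approach. A minimal sketch, with an invented two-sense mini-dictionary for "bass" and a hand-picked stopword list:

```python
# Dictionary-based disambiguation by definition overlap (Lesk-style).
# The two-sense mini-dictionary and the stopword list are invented.

STOPWORDS = {"a", "an", "the", "of", "or", "in", "with", "and", "to"}

DICTIONARY = {
    "bass": {
        "fish":  "an edible freshwater or sea fish with spiny fins",
        "music": "the lowest part in music or an instrument with a low pitch",
    }
}

def tokenize(text):
    return {w for w in text.lower().split() if w not in STOPWORDS}

def lesk(word, context_sentence):
    """Pick the sense whose definition overlaps most with the context."""
    context = tokenize(context_sentence)
    best_sense, best_overlap = None, -1
    for sense, definition in DICTIONARY[word].items():
        overlap = len(context & tokenize(definition))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense, best_overlap

print(lesk("bass", "he plays bass in a jazz band and loves low pitch sounds"))
# ('music', 2)
print(lesk("bass", "they caught a huge sea bass while fishing"))
# ('fish', 1)
```

The second call already illustrates the "not enough overlap" problem: only one definition word matches, which is why expanded word lists or subject codes help.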

Page 13

Information Retrieval

• Compositional semantics

• Bag of words methods

• Terminology
– document
– collection
– term
– query

• Ad hoc retrieval

Page 14

The Vector Space Model

• List of terms within the collection

• Document vector: presence/absence of terms

• Raw term frequency

• Normalization => direction of the vector

• Similarity is the cosine of the angle between vectors

$\vec{d}_j = (w_{1,j}, w_{2,j}, w_{3,j}, \ldots, w_{N,j}) \qquad \vec{q}_k = (w_{1,k}, w_{2,k}, w_{3,k}, \ldots, w_{N,k})$

$sim(\vec{q}_k, \vec{d}_j) = \vec{q}_k \cdot \vec{d}_j = \sum_{i=1}^{N} w_{i,k} \times w_{i,j}$
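A minimal sketch of this similarity computation: normalize the term-weight vectors to unit length, then rank documents by their dot product with the query, which is then the cosine of the angle between them. The three toy documents and the query vector are invented.

```python
# Rank documents for a query by the cosine of the angle between their
# term-weight vectors; all vectors here are invented toy examples.

import math

def normalize(vec):
    """Scale a term-weight vector to unit length (keep only its direction)."""
    length = math.sqrt(sum(w * w for w in vec))
    return [w / length for w in vec] if length else vec

def sim(q, d):
    """sim(q, d) = sum_i w_i,q * w_i,d  -- the cosine for normalized vectors."""
    return sum(wq * wd for wq, wd in zip(q, d))

# Raw term frequencies over the terms (speech, language, processing).
docs = {"d1": [2, 0, 1], "d2": [0, 3, 1], "d3": [1, 1, 1]}
query = normalize([1, 0, 1])     # a query mentioning "speech" and "processing"

ranked = sorted(docs, key=lambda name: sim(query, normalize(docs[name])),
                reverse=True)
print(ranked)                    # ['d1', 'd3', 'd2']
```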

Page 15

The Vector Space Model

• Document collection

• Term-by-weight matrix

[Figure: the three chapters Ch 1, Ch 7 and Ch 13 plotted as vectors in the space spanned by the terms speech, language and processing: Ch 1 = (1, 2, 1), Ch 7 = (6, 0, 1), Ch 13 = (0, 5, 1). The term-by-document matrix A holds these raw counts; normalizing each document vector to unit length gives approximately Ch 1 = (0.41, 0.82, 0.41), Ch 7 = (0.99, 0.00, 0.16), Ch 13 = (0.00, 0.98, 0.20).]
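The numbers in the figure can be reproduced directly, reusing the cosine measure from the previous slide; this is only a small check of the example, with the three chapter vectors taken from the figure.

```python
# Reproduce the normalized chapter vectors from the figure and compare the
# three chapters with the cosine measure from the previous slide.

import math

def normalize(vec):
    length = math.sqrt(sum(w * w for w in vec))
    return [w / length for w in vec]

chapters = {"Ch 1": [1, 2, 1], "Ch 7": [6, 0, 1], "Ch 13": [0, 5, 1]}

for name, counts in chapters.items():
    print(name, [round(w, 2) for w in normalize(counts)])
# Ch 1  [0.41, 0.82, 0.41]
# Ch 7  [0.99, 0.0, 0.16]
# Ch 13 [0.0, 0.98, 0.2]

def cosine(a, b):
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

print(round(cosine(chapters["Ch 1"], chapters["Ch 13"]), 2))  # 0.88, most similar pair
print(round(cosine(chapters["Ch 1"], chapters["Ch 7"]), 2))   # 0.47
print(round(cosine(chapters["Ch 7"], chapters["Ch 13"]), 2))  # 0.03
```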

Page 16

Term Weighting

• Enormous impact on retrieval effectiveness (see the weighting sketch below)
– term frequency within a single document
– distribution of the term across the collection

• Same weighting scheme for documents and queries

• Alternative weighting methods for queries
– AltaVista: the indexed collection contains on the order of 1,000,000,000 words
– average query: 2.3 words
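The slides do not commit to a particular weighting formula; one common way to combine the two factors above (frequency within the document, distribution across the collection) is tf·idf. A minimal sketch over an invented three-document collection:

```python
# Minimal tf-idf weighting: term frequency within a document times the
# inverse document frequency of the term across the collection.

import math
from collections import Counter

docs = [
    "speech recognition and speech synthesis",
    "language modeling for speech",
    "information retrieval and query processing",
]
tokenized = [d.split() for d in docs]
N = len(tokenized)

# Document frequency: number of documents containing each term.
df = Counter(term for doc in tokenized for term in set(doc))

def tfidf(doc_tokens):
    tf = Counter(doc_tokens)
    return {t: tf[t] * math.log(N / df[t]) for t in tf}

for name, doc in zip(("d0", "d1", "d2"), tokenized):
    weights = tfidf(doc)
    print(name, {t: round(w, 2) for t, w in sorted(weights.items())})
# In d0, "speech" has tf = 2 but a low idf (it occurs in 2 of the 3 documents),
# so its weight (0.81) ends up below that of "recognition" (1.10), which
# occurs only once but in no other document.
```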

Page 17

Recall versus precision

• Stemming

• Stop list

• Homonymy, polysemy, synonymy, hyponymy

• Improving user queries
– relevance feedback
– query expansion, thesaurus, thesaurus generation, term clustering

Page 18

Summary

• WSD: assign words to senses

• Selectional restriction

• Machine learning approaches (small scale)– supervised, bootstrapping, unsupervised

• Machine readable dictionaries (large scale)

• Bag of words method, Vector space model

• Query improvement (relevance feedback)

Page 19

Exercise - Relevance Feedback

The document collection is ordered according to the raw term frequency of the words "speech" and "language". The values and the ordering are shown in the table below.

Documents    Raw term frequency
             speech    language
doc 0           0         10
doc 1           1          9
doc 2           2          8
doc 3           3          7
doc 4           4          6
doc 5           5          5
doc 6           6          4
doc 7           7          3
doc 8           8          2
doc 9           9          1
doc 10         10          0

You want to find documents with many "speech" words but few "language" words (e.g. a ratio of about 8 : 2). Your initial query is {"speech", "language"}, i.e. the two terms have equal weight.

The search engine always returns the three most similar documents.

• Show that with relevance feedback you get the documents you want (see the sketch below).

• How important is the correctness of the feedback from the user?
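One way to work through the exercise is a standard relevance-feedback update (a Rocchio-style formula, which the slides do not spell out): the query vector is moved toward the documents the user marks relevant and away from the non-relevant ones. The weights alpha = 1, beta = 0.75, gamma = 0.25 and the simulated user, who always marks the returned document with the most "speech" occurrences as relevant, are illustrative assumptions.

```python
# Rocchio-style relevance feedback on the exercise collection: the query is
# nudged toward the documents the user marks relevant and away from the rest.

import math

docs = {f"doc {i}": (i, 10 - i) for i in range(11)}   # (speech, language) counts

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def top3(query):
    return sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)[:3]

def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.25):
    """Move the query toward relevant and away from non-relevant documents."""
    new = [alpha * w for w in query]
    for i in range(len(new)):
        if relevant:
            new[i] += beta * sum(docs[d][i] for d in relevant) / len(relevant)
        if nonrelevant:
            new[i] -= gamma * sum(docs[d][i] for d in nonrelevant) / len(nonrelevant)
    return [max(w, 0.0) for w in new]        # keep the weights non-negative

query = [1.0, 1.0]                # initial query: equal weights for both terms
for rnd in range(4):
    results = top3(query)
    print(rnd, [round(w, 2) for w in query], results)
    # Simulated user: the returned document with the most "speech" words is
    # relevant, the other two are not.
    best = max(results, key=lambda d: docs[d][0])
    query = rocchio(query, [best], [d for d in results if d != best])
# round 0: ['doc 5', 'doc 4', 'doc 6']    round 2: ['doc 7', 'doc 6', 'doc 8']
# round 3: ['doc 7', 'doc 8', 'doc 9']
```

With correct feedback the results drift from docs 4-6 to the speech-heavy docs 7-9 within a few rounds. If the user's judgements were reversed, the same update would push the query toward the "language"-heavy end instead, which is why the correctness of the feedback matters so much for the second question.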