Survey on WSD and IR

Apex@SJTU

WSD: Introduction

Problems in online news retrieval system:

query: “major”

Articles retrieved:

about “Prime Minister John Major MP”

“major” appears as an adjective

“major” appears as a military rank

WSD: Introduction

Gale, Church and Yarowsky (1992) cite work dating back to 1950.

For many years, WSD was applied only to limited domains and a small vocabulary.

In recent years, disambiguators have been applied to resolve the senses of words in a large heterogeneous corpus.

With a more accurate representation and a query also marked up with word sense, researchers believe that the accuracy of retrieval would have to improve.

Approaches to disambiguation

Disambiguation based on manually generated rules

Disambiguation using evidence from existing corpora.


Weiss (1973): general context rule:

If the word “type” appears near to “print”, it most likely meant a small block of metal bearing a raised character on one end.

template rule: If “of” appears immediately after “type”, it most likely meant a subdivision of a particular kind of thing.

Weiss (1973): Template rules were better, so they were applied first.

To create rules:

Examine 20 occurrences of an ambiguous word.

Test these manually created rules on a further 30 occurrences.

Accuracy: 90%

Cause of errors: idiomatic uses.
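The two rule types for “type” can be sketched in a few lines. This is only a minimal illustration: the sense labels, the window size of 5 and the function names are assumptions made for this sketch, not Weiss's actual rule format.

```python
import re

# Hypothetical sense labels for "type", taken from the two slide examples.
SENSE_PRINT = "metal block bearing a raised character"
SENSE_KIND = "subdivision of a particular kind of thing"


def template_rule(tokens, i):
    """Template rule: 'of' immediately after 'type' signals the 'kind' sense."""
    if i + 1 < len(tokens) and tokens[i + 1] == "of":
        return SENSE_KIND
    return None


def context_rule(tokens, i, window=5):
    """General context rule: 'print' near 'type' signals the printing sense.

    The window size is an assumption; the slide only says "near to".
    """
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    if "print" in tokens[lo:i] + tokens[i + 1:hi]:
        return SENSE_PRINT
    return None


def disambiguate_type(sentence):
    """Return one sense (or None) for every occurrence of 'type'."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    senses = []
    for i, tok in enumerate(tokens):
        if tok != "type":
            continue
        # Template rules are tried first, then the general context rule.
        senses.append(template_rule(tokens, i) or context_rule(tokens, i))
    return senses
```

A sentence matching neither rule yields None, and an idiomatic use that happens to match a rule is still tagged, which is the kind of error the slide reports.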


Kelly and Stone (1975): created a set of rules for 6,000 words

consisted of contextual rules similar to those of Weiss

in addition, used the grammatical category of a word as a strong indicator of sense: “the train” and “to train”

Kelly and Stone (1975): The grammar and context rules were grouped into sets so that only certain rules were applied in certain situations.

Conditional statements controlled the application of rule sets.

Unlike Weiss’s system, this disambiguator was designed to process a whole sentence at the same time.

Accuracy: not a success


Small and Rieger (1982) came to similar conclusions.

When this type of disambiguator was extended to work on larger vocabulary, the effort involved in building it became too great.

Since the 1980s, WSD research has concentrated on automatically generated rules based on sense evidence derived from a machine readable corpus.

Disambiguation using evidence from existing corpora

Lesk (1988): Resolve the sense of “ash” in:

There was ash from the coal fire.

Dictionary definitions looked up:

ash(1): The soft grey powder that remains after something has been burnt.

ash(2): A forest tree common in Britain.

Definitions of context words looked up:

coal(1): A black mineral which is dug from the earth, which can be burnt to give heat.

fire(1): The condition of burning; flames, light and great heat.

fire(2): The act of firing weapons or artillery at an enemy.

Lesk (1988): Sense definitions are ranked by a scoring function based on the number of words that co-occur.

Questionable: how often the word overlap necessary for disambiguation occurred.

Accuracy: “very brief experimentation”, 50%--70%

No analysis of the failures, although definition length is recognized as a possible factor in deciding which dictionary to use.
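The overlap scoring can be sketched on the “ash” example. This is a simplified illustration, not Lesk's implementation: the stopword list, the absence of stemming and the function names are assumptions of this sketch.

```python
import re

# Small illustrative stopword list (an assumption of this sketch).
STOPWORDS = {"a", "an", "the", "of", "in", "at", "to", "is", "be", "was",
             "from", "which", "that", "can", "has", "been", "there",
             "and", "or", "after"}

# Definitions as quoted on the slides for the sentence
# "There was ash from the coal fire."
DEFINITIONS = {
    "ash": {
        1: "The soft grey powder that remains after something has been burnt.",
        2: "A forest tree common in Britain.",
    },
    "coal": {
        1: "A black mineral which is dug from the earth, "
           "which can be burnt to give heat.",
    },
    "fire": {
        1: "The condition of burning; flames, light and great heat.",
        2: "The act of firing weapons or artillery at an enemy.",
    },
}


def content_words(text):
    """Lower-cased set of words in a definition, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower())
            if w not in STOPWORDS}


def rank_senses(target, context_words):
    """Rank the senses of `target` by word overlap with context definitions."""
    context_vocab = set()
    for word in context_words:
        for definition in DEFINITIONS.get(word, {}).values():
            context_vocab |= content_words(definition)
    scores = {
        sense: len(content_words(definition) & context_vocab)
        for sense, definition in DEFINITIONS[target].items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Here ash(1) wins only because “burnt” also appears in the definition of coal(1); with no stemming, “burning” in fire(1) does not match. A decision resting on a single shared word illustrates why the frequency of useful overlap is questionable.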

Disambiguation using evidence from existing corpora

Wilks et al. (1990): addressed this word overlap problem by using a technique of expanding a dictionary definition with words that commonly co-occurred with the text of that definition.

Co-occurrence information was derived from all definition texts in the dictionary.

Wilks et al. (1990): Longman’s Dictionary of Contemporary English (LDOCE): all its definitions were written using a simplified vocabulary of around 2,000 words.

The vocabulary has few synonyms, which are a distracting element in the co-occurrence calculation.

“bank”:

for economic sense: “money”, “check”, “rob”

for geographical sense: “river”, “flood”, “bridge”

Accuracy: “bank” in 200 sentences, judged correct if the chosen sense coincides with one manually chosen: 53% at the fine-grained level (13 senses) and 85% at the coarse-grained level (5 senses).

They suggested using simulated annealing to disambiguate a whole sentence simultaneously.

Disambiguating simultaneously

Cowie et al. (1992):

Accuracy: tested on 67 sentences, 47% for fine-grained senses and 72% for coarse-grained ones.

No comparison with Wilks et al.’s. No baseline.

A possible baseline: senses randomly chosen

A better one: select the most common sense
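The two baselines mentioned above can be sketched as follows. The function names and the toy sense labels in the usage note are assumptions for illustration; none of the figures come from the cited papers.

```python
import random
from collections import Counter


def most_common_sense_baseline(train_senses, test_senses):
    """Accuracy of always predicting the sense seen most often in training."""
    most_common, _ = Counter(train_senses).most_common(1)[0]
    return sum(s == most_common for s in test_senses) / len(test_senses)


def random_sense_baseline(sense_inventory, test_senses, seed=0):
    """Accuracy of picking a sense uniformly at random for each occurrence."""
    rng = random.Random(seed)
    hits = sum(rng.choice(sense_inventory) == s for s in test_senses)
    return hits / len(test_senses)
```

For example, with 8 “money” and 2 “river” tags in training and 7 “money” and 3 “river” tags in testing, the most common sense baseline scores 0.7, whereas random choice between two senses is expected to score about 0.5. With skewed sense distributions the most common sense baseline is much harder to beat, which is why it is the better reference point.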

Manually tagging a corpus

A technique in POS tagging: manually mark up a large text corpus with POS tags, and then train a statistical classifier to associate features with occurrences of the tags.

Ng and Lee (1996): disambiguate 192,000 occurrences of 191 words.

examine the following features:

POS and morphological form of the sense tagged word

unordered set of its surrounding words

local collocations relative to it

if the sense tagged word was a noun, the presence of a verb was also noted.

Ng and Lee (1996): Experiments:

separated their corpus into training and test sets on an 89%--11% split

accuracy: 63.7% (baseline: 58.1%)

sense definitions used were from WordNet, 7.8 senses per word for nouns and 12.0 senses for verbs

no comparison possible between WordNet definitions and LDOCE

Using thesauri: Yarowsky (1992)

Roget’s thesaurus: 1,042 semantic categories

Grolier Multimedia Encyclopedia

To decide which semantic category an ambiguous word occurrence should be assigned:

a set of clue words, one set for each category, was derived from a POS tagged corpus

the context of each occurrence was gathered

a term selection process similar to relevance feedback was used to derive clue words

Yarowsky (1992)

E.g. clue words for animal/insects: species, family, bird, fish, cm, animal, tail, egg, wild, common, coat, female, inhabit, eat, nest

Comparison between words in the context and the clue word sets

Accuracy: 12 ambiguous words, several hundred occurrences, 92% accuracy on average

Comparisons were suspect.
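The comparison between context words and clue word sets can be sketched as a simple count of matches. This is a deliberately simplified illustration: plain counting stands in for whatever weighting the original work used, and the “tools/machinery” clue set is a hypothetical example added only to give the sketch a second category.

```python
import re
from collections import Counter

CLUE_WORDS = {
    # Clue words listed on the slide.
    "animal/insects": {"species", "family", "bird", "fish", "cm", "animal",
                       "tail", "egg", "wild", "common", "coat", "female",
                       "inhabit", "eat", "nest"},
    # Hypothetical clue set, invented for this sketch.
    "tools/machinery": {"tool", "machine", "lift", "steel", "engine",
                        "load", "construction"},
}


def assign_category(context, clue_words=CLUE_WORDS):
    """Pick the category whose clue words occur most often in the context.

    Returns (category or None, per-category scores).
    """
    tokens = Counter(re.findall(r"[a-z]+", context.lower()))
    scores = {cat: sum(tokens[w] for w in clues)
              for cat, clues in clue_words.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] > 0 else None), scores
```

For an occurrence of “crane”, a context such as “a wild bird whose nest holds one egg” matches four animal/insects clue words and no tools/machinery ones, so the animal sense is chosen.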

Testing disambiguators

Few “pre-disambiguated” test corpora publicly available.

A sense tagged version of the Brown corpus, called SEMCOR, is available. A TREC-like effort, called SENSEVAL, is underway.

WSD and IR experiments

Voorhees (1993): based on WordNet:

Each of 90,000 words and phrases is assigned to one or more synsets.

A synset is a set of words that are synonyms of each other; the words of a synset define it and its meaning.

All synsets are linked together to form a mostly hierarchical semantic network based on hypernymy and hyponymy.

Other relations: meronymy, holonymy, antonymy.

Voorhees (1993): the hood of a word sense contained in synset s is the largest connected subgraph that:

contains s;

contains only descendants of an ancestor of s;

contains no synset that has a descendant that includes another instance of a member of s.

Results: retrieval was consistently worse, because senses were tagged inaccurately.

The hood of the first sense of “house” would include the words: housing, lodging, apartment, flat, cabin, gatehouse, bungalow, cottage.

Wallis (1993)

replace words with definitions from LDOCE.

“ocean” and “sea”:

ocean: The great mass of salt water that covers most of the earth;

sea: the great body of salty water that covers much of the earth’s surface.

disappointing results.

no analysis of the cause.

Sussna (1993)

Assign a weight to all relations and calculate the semantic distance between two synsets.

Calculate the semantic distance between context words and each of the synsets to rank the synsets.

Parameters: size of context (41 as optimal), the number of words disambiguated simultaneously (only 10 because of computational considerations).

Accuracy: 56%

Analyses of WSD & IR

Krovetz & Croft: sense mismatches were significantly more likely to occur in non-relevant documents.

word collocation

skewed frequency distribution

Situations under which WSD may prove useful:

where collocation is less prevalent

where query words were used in a minority sense

Analyses of WSD & IR

Sanderson (1994, 1997):

pseudo-words: banana/kalashnikov/anecdote

experiments on the factor of query length: the effectiveness of retrieval based on short queries was greatly affected by the introduction of ambiguity, but much less so for longer queries.
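Pseudo-words add controlled ambiguity to a collection by replacing every occurrence of several unrelated words with one combined token. A minimal sketch of that replacement, with hypothetical function names and a trivial tokenizer chosen for illustration:

```python
import re


def make_pseudo_word_map(groups):
    """Map every word in a group to the group's combined pseudo-word."""
    mapping = {}
    for group in groups:
        pseudo = "/".join(group)
        for word in group:
            mapping[word] = pseudo
    return mapping


def add_ambiguity(text, mapping):
    """Tokenize `text` and replace grouped words by their pseudo-word."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [mapping.get(tok, tok) for tok in tokens]
```

After the replacement, a query containing “banana” also matches documents about “kalashnikov” or “anecdote”. Because the original word is known, the amount of ambiguity and the error rate of a simulated disambiguator can both be controlled exactly.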

Analyses of WSD & IR

Gonzalo et al. (1998): experiments based on SEMCOR: write a summary for each document and use it as a query, which is related to only one relevant document.

Cause for error: a sense may be too specific, e.g. newspaper as a business concern as opposed to the physical object

Gonzalo et al. (1998):

synset based representation: retrieval based on synsets seems to be the best

erroneous disambiguation and its impact on retrieval effectiveness:

baseline precision: 52.6%

when error 30%, precision 54.4%

when error 60%, precision 49.1%

Sanderson (1997):

output word sense in a list ranked by a confidence score

accuracy: worse than the one without sense, better than the one tagged with one sense.

possible cause: errors.

Disambiguation without sense definition

Zernik (1991): generate clusters for an ambiguous word by three criteria: context words, grammatical category and derivational morphology.

associate each cluster with a dictionary sense.

e.g. “train”: 95% accuracy (distinguished by grammatical category)

“office”: full of errors

Disambiguation without sense definition

Schutze and Pedersen (1995): one of the very few results that show an improvement (14%)

Clusters based on context words only: words with similar contexts are put into the same cluster, but it is recognized as a cluster only if the context appears more than fifty times in the corpus

Similar context of “ball”: tennis, football, cricket. Thus this method breaks up a word’s commonest sense into a number of uses (the sporting sense of ball).

Schutze and Pedersen (1995):

score each use of a word

representing a word occurrence by:

just the word

the word with its commonest use

the word with n of its uses

WSD in IR Revisited (SIGIR’03)

Skewed frequency distributions coupled with the query term co-occurrence effect are the reasons why traditional IR techniques that don’t take sense into account are not penalized severely.

The impact of inaccurate fine grained WSD has an extreme negative effect on the performance of an IR system.

To achieve increases in performance, it is imperative to minimize the impact of the inaccurate disambiguation.

The need for 90% accurate disambiguation in order to see performance increases remains questionable.

The WSD methods applied

A number of experiments were tried, but nothing better than the following was found: applying each knowledge source (collocations, co-occurrence, and sense frequency) in a stepwise fashion:

a context window consisting of the sentence surrounding the target word is used to identify the sense of the word

examine the surrounding sentence to see if it contains any collocates observed in Semcor

specific sense data
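The stepwise use of knowledge sources can be sketched as a fallback chain over the sentence containing the target word. Everything concrete here is an assumption made for illustration: the dictionary layouts, the voting in step 2 and the function name are not taken from the paper.

```python
def disambiguate_stepwise(target, sentence_tokens, collocations,
                          cooccurrence, sense_frequency):
    """Return (sense, source) from the first knowledge source that answers.

    Hypothetical data layouts:
      collocations:    {(target, collocate): sense}
      cooccurrence:    {(target, context_word): {sense: count}}
      sense_frequency: {target: {sense: count}}
    """
    context = [t for t in sentence_tokens if t != target]

    # Step 1: a known collocate in the sentence decides the sense directly.
    for token in context:
        sense = collocations.get((target, token))
        if sense is not None:
            return sense, "collocation"

    # Step 2: context words vote for senses via co-occurrence counts.
    votes = {}
    for token in context:
        for sense, count in cooccurrence.get((target, token), {}).items():
            votes[sense] = votes.get(sense, 0) + count
    if votes:
        return max(votes, key=votes.get), "co-occurrence"

    # Step 3: fall back to the most frequent sense of the word.
    freqs = sense_frequency.get(target)
    if freqs:
        return max(freqs, key=freqs.get), "sense frequency"
    return None, "none"
```

Ordering the sources from most to least specific means the frequency fallback is used only when the sentence offers no stronger evidence, which is one way to keep precision high.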

WSD in IR Revisited: Conclusions

Reasons for success:

high precision WSD technique

sense frequency statistics

Resilience of the vector space model

Analysis of Schutze and Pedersen’s success: added tolerance

“A highly accurate bootstrapping algorithm for word sense disambiguation” Rada M. 2000

Disambiguate all nouns and verbs:

step 1: complex nominals

step 2: named entities

step 3: word pairs, based on SEMCOR: (previous word, word) pair, (word, successive word) pair

step 4: context, based on SEMCOR and WordNet; in WordNet, hypernyms are also part of its context

“A highly accurate bootstrapping algorithm for word sense disambiguation” (cont’d)

step 5: words with semantic distance 0 from a word that has already been disambiguated

step 6: words with semantic distance 1 from a word that has already been disambiguated

step 7: words with semantic distance 0 among ambiguous words

step 8: words with semantic distance 1 among ambiguous words

“An Effective Approach to Document Retrieval via Utilizing WordNet and Recognizing Phrases” (SIGIR’04)

Significant increase for short queries

WSD only on the query, and query expansion

Phrase-based and term-based

Pseudo-relevance feedback

Phrases identification

4 types of phrases: proper names (named entities), dictionary phrases (from WordNet), simple phrases, complex phrases

Decide the window size of simple/complex phrases by calculating correlation

Correlation

WSD

Unlike Rada Mihalcea’s WSD, Liu didn’t use Semcor, only WordNet

6 steps; basic ideas: use hypernyms, hyponyms, cross-references, etc.

Query Expansion

Add synonyms (conditional)

Add definition words (only the first shortest noun phrase), conditional: if it is highly globally correlated

Add hyponyms (conditional)

Add compound words (conditional)

PSEUDO RELEVANCE FEEDBACK

Using Global Correlations and Wordnet

Global_cor > 1 and one of the following conditions:

1. it is monosense

2. its definition contains some other query terms

3. it is in the top 10 ranked documents

Combining Local and Global Correlations:

Results

SO: standard Okapi (term-similarity)

NO: enhanced SO

NO+P: +phrase-similarity

NO+P+D: +WSD

NO+P+D+F: +Pseudo-feedback

Results:

Model conclusion

WSD on the query only

WSD only by WordNet, no Semcor

Complicated query expansion

Pseudo-relevance feedback

Phrase-based and term-based

Thank you!
