COMP791A: Statistical Language Processing
Word Sense Disambiguation (Chap. 7)
Overview of the problem
Many words have several meanings or senses (homonyms or polysemous words)
  Ex: "chair" --> furniture or person
  Ex: "dishes" --> plates or food
Need to determine which sense of a word is used in a specific sentence
Note:
  often, the different senses of a word are closely related
    Ex: "title" --> right of legal ownership, document that is evidence of the legal ownership, name of work, ...
  often, several senses can be "activated" in a single context (co-activation)
    Ex: "This could bring competition to the trade"
        competition --> the act of competing AND the people who are competing
Word Sense Disambiguation (WSD)
To determine which of the senses of an ambiguous word is invoked in a particular use of the word.
A potentially extremely useful problem
  Ex: in machine translation:
    "chair" --> (person) "directeur"
    "chair" --> (furniture) "chaise"
    "bureau" --> "desk"
    "bureau" --> "office"
Can be done:
  with rule-based methods
  with statistical methods
WordNet
The most widely-used lexical database for English; free!
  G. Miller at Princeton
  www.cogsci.princeton.edu/~wn
  used in many applications of NLP
EuroWordNet: Dutch, Italian, Spanish, German, French, Czech and Estonian
Includes entries for open-class words only (nouns, verbs, adjectives & adverbs)
WordNet
Entries in WordNet 1.6 (now 2.0): 118,000 different word forms, organized according to their meanings (senses)
Each entry has:
  a dictionary-style definition (gloss) of each sense
  AND a set of domain-independent lexical relations among WordNet's entries (words), senses, and sets of synonyms
Senses are grouped into synsets (i.e. sets of synonyms)
Example 1: WordNet entry for verb serve
Rule-based WSD
  "They served green-lipped mussels from New Zealand."
  "Which airlines serve Denver?"
Semantic restrictions that an argument places on its predicate:
  argument "mussels" --> needs a predicate with the sense {provide-food} --> sense 6 of WordNet
  argument "Denver" --> needs a predicate with the sense {attend-to} --> sense 10 of WordNet
Example 2: WordNet entry for dish
Rule-based WSD
  "In our house, everybody has a career and none of them includes washing dishes."
  "In her tiny kitchen, Ms. Chen works efficiently, stir-frying several simple dishes, including braised pig's ears and chicken livers with green peppers."
Semantic restrictions that a predicate places on its argument:
  predicate "wash" --> needs an argument with the sense {object} --> senses 1, 2 or 6 from WordNet
  predicate "stir-fry" --> needs an argument with the sense {food} --> sense 2 from WordNet
Problem with rule-based WSD
In some cases, the constraints on the predicate and on the argument are not enough to pinpoint one unique sense
  Ex: "What kind of dishes do you recommend?"
Figures of speech:
  the meaning of words can be generated dynamically, instead of being fixed and stored in a lexicon or a set of selectional restrictions
  Ex: metaphor, metonymy
Problem with rule-based WSD (con't)
Metaphor: using words or phrases whose meaning is appropriate to different kinds of concepts, suggesting a likeness or analogy between them
  "This deal does not scare Microsoft."
    "scare" has 2 senses in WordNet: to cause fear; to cause to lose courage
    metaphor: the corporation is viewed as a person
  "She is drowning in money."
    metaphor: money is viewed as a liquid
Problem with rule-based WSD (con't)
Metonymy: referring to a concept by naming some other concept closely related to it
  "We await word from the crown."
    a monarch is not the same thing as a crown, but we often refer to the monarch as "the crown" because the two are associated
    metonymy: the crown refers to the monarch
  "The White House had no comment."
    metonymy: the White House refers to the administration
WSD versus POS tagging
"butter" can be a verb or a noun:
  "I should butter my toast."
  "I like butter on my toast."
2 different POS --> 2 different usages with 2 different meanings
So WSD can be viewed as POS tagging (classifying using semantic tags rather than POS tags)
But the 2 tasks are considered different... because:
  nearby structural cues (ex: is the previous word a determiner?) are important in POS tagging, but are not effective for WSD
  distant content words are very effective for WSD, but are not interesting for POS tagging
So:
  in POS tagging, we typically only look at the local context
  in WSD, we use content words in a larger context
Approaches to Statistical WSD
Supervised Disambiguation
  based on a labeled training set
  the learning system has a training set of feature-encoded inputs AND their appropriate sense labels (categories)
Based on Lexical Resources
  use of external lexical resources such as dictionaries and thesauri
Discourse properties
Unsupervised Disambiguation
  based on unlabeled corpora
  the learning system has a training set of feature-encoded inputs BUT NOT their appropriate sense labels (categories)
Approaches to Statistical WSD
--> Supervised Disambiguation
      Naïve Bayes
      Decision Trees
    Use of Lexical Resources
      Dictionary-based
      Thesaurus-based
      Translation-based
    Discourse properties
    Unsupervised Disambiguation
Supervised WSD: Overview
A word is assumed to have a finite number of discrete senses.
The sense of a word depends on the sense of the surrounding words
  ex: bass = fish, musical instrument, ...

  Surrounding words    Most probable sense
  ...river...          fish
  ...violin...         instrument
  ...salmon...         fish
  play/Verb + bass     instrument
  bass + player        instrument
  ...striped...        fish
Supervised WSD: Overview (con't)
WSD is viewed as a typical classification problem:
  use machine learning techniques to train a system that learns a classifier (a function f) to assign to unseen examples one of a fixed number of senses (categories)
  f(input) = correct sense
Input:
  target word: the word to be disambiguated
  context (feature vector): a vector of relevant linguistic features that represents the context (ex: a window of words around the target word)
Examples of Feature Vectors
Take a window of n words around the target word
Encode information about the words around the target word
  typical features include: words, root forms, POS tags, frequency, ...
Example sentence:
  "An electric guitar and bass player stand off to one side, not really part of the scene, just as a sort of nod to gringo expectations perhaps."
with position information:
  [(guitar, NN1), (and, CJC), (player, NN1), (stand, VVB)]
no position information, but word frequency:
  vocabulary: [fishing, big, sound, player, fly, rod, pound, double, runs, playing, guitar, band]
  vector:     [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0]
other features:
  [followed by "player", contains "show" in the sentence, ...]
  [yes, no, ...]
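As a concrete illustration, here is a minimal sketch of how such a frequency-based feature vector could be built; the helper name context_vector, the window size, and the exact tokenization are assumptions, not part of the original slides:

```python
from collections import Counter

def context_vector(tokens, target_index, vocabulary, window=3):
    """Encode the words around tokens[target_index] as a frequency
    vector over a fixed vocabulary (no position information)."""
    lo = max(0, target_index - window)
    hi = min(len(tokens), target_index + window + 1)
    counts = Counter(t.lower() for i, t in enumerate(tokens[lo:hi], start=lo)
                     if i != target_index)
    return [counts[w] for w in vocabulary]

vocab = ["fishing", "big", "sound", "player", "fly", "rod",
         "pound", "double", "runs", "playing", "guitar", "band"]
sent = "An electric guitar and bass player stand off to one side".split()
print(context_vector(sent, sent.index("bass"), vocab))
# -> [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0]
```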
Supervised WSD
Training corpus:
  each occurrence of the ambiguous word w is annotated with a semantic label (its contextually appropriate sense sk)
Several approaches from ML:
  Bayesian classification
  Decision trees
  Neural networks
  K-nearest neighbor (kNN)
  ...
Approaches to Statistical WSD
--> Supervised Disambiguation
  --> Naïve Bayes
      Decision Trees
    Use of Lexical Resources
      Dictionary-based
      Thesaurus-based
      Translation-based
    Discourse properties
    Unsupervised Disambiguation
Naïve Bayes Classification
Goal: choose the most probable sense s* for a word given a vector V of surrounding words
The vector contains frequencies of words:
  vocabulary: [fishing, big, sound, player, fly, rod, ...]
  vector:     [0, 0, 0, 2, 1, 0, ...]
Bayes decision rule:
  s* = argmax_{s_k} P(s_k | V)
where:
  S is the set of possible senses for the target word
  s_k is a sense in S
  V is the feature vector (the representation of the context)
Using Bayes' rule:
  s* = argmax_{s_k} P(V | s_k) P(s_k) / P(V)
Decision Rule for Naive Bayes
Goal:
  s* = argmax_{s_k} P(s_k | V)
     = argmax_{s_k} P(V | s_k) P(s_k) / P(V)      // Bayes' rule
But P(V) is the same for all possible senses, so it does not affect the final ranking of the senses and we can drop it.
But the features of the context are conditionally independent (independence assumption):
  P(V | s_k) = ∏_{j=1}^{n} P(v_j | s_k)
So:
  s* = argmax_{s_k} P(s_k) ∏_{j=1}^{n} P(v_j | s_k)
To make the computations simpler, we often take the log of the probabilities:
  s* = argmax_{s_k} log [ P(s_k) ∏_{j=1}^{n} P(v_j | s_k) ]
     = argmax_{s_k} [ log P(s_k) + Σ_{j=1}^{n} log P(v_j | s_k) ]
Naïve Bayes WSD
Training a Naïve Bayes classifier = estimating P(v_j | s_k) and P(s_k) from a sense-tagged training corpus = finding the Maximum-Likelihood Estimates, perhaps with appropriate smoothing:

  P(v_j | s_k) = count(v_j, s_k) / Σ_t count(v_t, s_k)
    (nb of occurrences of feature v_j over the total nb of features appearing in windows of s_k)

  P(s_k) = count(s_k) / count(w)
    (nb of occurrences of sense s_k over the nb of all occurrences of the ambiguous word w)
Naïve Bayes Algorithm
// 1. training
for all senses s_k of word w
  for all words v_j in the vocabulary
    compute P(v_j | s_k) = count(v_j, s_k) / Σ_t count(v_t, s_k)
for all senses s_k of word w
  compute P(s_k) = count(s_k) / count(w)

// 2. disambiguation
for all senses s_k of word w
  score(s_k) = log P(s_k)
  for all words v_j in the context window
    score(s_k) = score(s_k) + log P(v_j | s_k)
choose s* = the sense s_k with the greatest score(s_k)
Example
Training corpus (context window = 3 words):
  ...Today the World Bank/BANK1 and partners are calling for greater relief...
  ...Welcome to the Bank/BANK1 of America the nation's leading financial institution...
  ...Welcome to America's Job Bank/BANK1 Visit our site and...
  ...Web site of the European Central Bank/BANK1 located in Frankfurt...
  ...The Asian Development Bank/BANK1 ADB a multilateral development finance...
  ...lounging against verdant banks/BANK2 carving out the...
  ...for swimming, had warned her off the banks/BANK2 of the Potomac. Nobody...

Training:
  P(the|BANK1) = 5/30       P(the|BANK2) = 3/12
  P(world|BANK1) = 1/30     P(world|BANK2) = 0/12
  P(and|BANK1) = 1/30       P(and|BANK2) = 0/12
  ...
  P(off|BANK1) = 0/30       P(off|BANK2) = 1/12
  P(Potomac|BANK1) = 0/30   P(Potomac|BANK2) = 1/12
  P(BANK1) = 5/7            P(BANK2) = 2/7

Disambiguation: "I lost my left shoe on the banks of the river Nile."
  score(BANK1) = log(5/7) + log(P(shoe|BANK1)) + log(P(on|BANK1)) + log(P(the|BANK1)) + ...
  score(BANK2) = log(2/7) + log(P(shoe|BANK2)) + log(P(on|BANK2)) + log(P(the|BANK2)) + ...
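A minimal sketch of this train-then-score loop in Python. The toy data, the function names, and the add-one smoothing (the slides only say "perhaps with appropriate smoothing") are all assumptions:

```python
import math
from collections import Counter, defaultdict

def train(tagged_contexts):
    """tagged_contexts: list of (sense, context_words) pairs for one ambiguous word."""
    sense_counts, feature_counts, vocab = Counter(), defaultdict(Counter), set()
    for sense, context in tagged_contexts:
        sense_counts[sense] += 1
        for w in context:
            feature_counts[sense][w] += 1
            vocab.add(w)
    return sense_counts, feature_counts, vocab

def disambiguate(context, sense_counts, feature_counts, vocab):
    total = sum(sense_counts.values())
    best, best_score = None, float("-inf")
    for sense, n in sense_counts.items():
        score = math.log(n / total)                                # log P(s_k)
        denom = sum(feature_counts[sense].values()) + len(vocab)   # add-one smoothing
        for w in context:
            score += math.log((feature_counts[sense][w] + 1) / denom)
        if score > best_score:
            best, best_score = sense, score
    return best

model = train([("BANK1", "today the world and partners are".split()),
               ("BANK1", "welcome to the of america the".split()),
               ("BANK2", "off the of the potomac nobody".split())])
print(disambiguate("on the of the river nile".split(), *model))  # -> BANK2
```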
Naïve Bayes Assumption
Independence assumption: the features (contextual words) are conditionally independent:

  P(V | s_k) = ∏_{j=1}^{n} P(v_j | s_k)

  i.e. the probability of an entire feature vector given a sense is the product of the probabilities of its individual features given that sense
Consequences:
  Bag-of-words model:
    the structure and linear ordering of words within the context is ignored
    the presence of one word in the bag is independent of another
The independence assumption is incorrect, but it is useful in WSD
  (Gale, Church & Yarowsky, 1992) report 90% correct disambiguation with 6 ambiguous nouns in the Hansard corpus
Approaches to Statistical WSD
--> Supervised Disambiguation
      Naïve Bayes
  --> Decision Trees
    Use of Lexical Resources
      Dictionary-based
      Thesaurus-based
      Translation-based
    Discourse properties
    Unsupervised Disambiguation
Decision Tree Classifier
The Bayes classifier uses information from all words in the context window
But some words are more reliable than others as indicators of which sense is used...
Decision Tree Classifier (con't)
Look for features that are very good indicators of the result
Place these features (as questions) in the nodes of a decision tree
Split the examples so that those with different values for the chosen feature are in different sets
Repeat the same process with another feature

A sequence of tests is applied to each feature vector:
  if a test succeeds --> return the sense associated with the test
  otherwise --> apply the next test
  if all features have been tested --> return a default sense (the most common one)
Example: bass

  Observation  Includes "fish"?  "striped bass"?  Includes "guitar"?  "bass player"?  Includes "piano"?  Sense
  1            Yes               Yes              No                  No              No                 fish
  2            Yes               Yes              No                  No              No                 fish
  3            No                No               Yes                 No              No                 instrument
  4            No                Yes              No                  No              No                 fish
  5            Yes               Yes              No                  No              No                 fish
  6            No                No               Yes                 Yes             Yes                instrument
  7            No                Yes              No                  No              No                 fish

Resulting decision tree:
  is "fish" in the feature vector?
    yes --> fish
    no  --> is "striped" the previous word?
              yes --> fish
              no  --> is "guitar" in the feature vector?
                        yes --> instrument
                        no  --> fish
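The tree above transcribes directly into nested conditionals. In this sketch the dictionary keys (includes_fish, prev_word_striped, includes_guitar) are invented names for the table's features:

```python
def classify_bass(features):
    """Walk the decision tree above; features is a dict of booleans."""
    if features.get("includes_fish"):
        return "fish"
    if features.get("prev_word_striped"):
        return "fish"
    if features.get("includes_guitar"):
        return "instrument"
    return "fish"  # default: the most common sense

print(classify_bass({"includes_guitar": True}))  # -> instrument
```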
Another Example: The restaurant
Training data: examples described by input attributes (Patrons, Type, ...) and an output class
[training-data table not reproduced]
A first decision tree
[decision tree figure not reproduced]
But is it the best decision tree we can build?

A better decision tree
[decision tree figure not reproduced]
4 tests instead of 9 & 11 branches instead of 21
Choosing the best feature
The key problem is choosing which feature to use to split a given set of examples
Most used strategy: information theory

Entropy (or self-information):
  H(X) = - Σ_{x∈X} p(x) log2 p(x)

  Ex: H(fair coin toss) = I(1/2, 1/2) = -1/2 log2(1/2) - 1/2 log2(1/2) = 1 bit
Choosing the best feature (con't)
The "discriminating power" of an attribute A given a set Set:
  gain(Set, A) = entropy(Set) - entropy(Set | A)

If the training set contains p positive examples and n negative examples:
  entropy(Set) = I(p/(p+n), n/(p+n))
               = -(p/(p+n)) log2(p/(p+n)) - (n/(p+n)) log2(n/(p+n))

  entropy(Set | A) = Σ_{i=1}^{v} ((p_i + n_i)/(p + n)) × I(p_i/(p_i+n_i), n_i/(p_i+n_i))

where:
  v is the number of distinct values that attribute A can take
  I(a, b) is the entropy for an attribute with prob. a of success and prob. b of failure
Some intuition

  Size   Color  Shape   Output
  Big    Red    Circle  +
  Small  Red    Circle  +
  Small  Red    Square  -
  Big    Blue   Circle  -

Size is the least discriminating attribute (i.e. smallest information gain)
Shape and color are the most discriminating attributes (i.e. highest information gain)
A small example

  Size   Color  Shape   Output
  Big    Red    Circle  +
  Small  Red    Circle  +
  Small  Red    Square  -
  Big    Blue   Circle  -

  Size:  big: 1+ 1-     small: 1+ 1-
  Color: red: 2+ 1-     blue: 0+ 1-
  Shape: circle: 2+ 1-  square: 0+ 1-

entropy(Output) = -1/2 log2(1/2) - 1/2 log2(1/2) = 1

entropy(Output | Size) = 1/2 (1) + 1/2 (1) = 1
gain(Output, Size) = 1 - 1 = 0

entropy(Output | red) = -2/3 log2(2/3) - 1/3 log2(1/3) = 0.918
entropy(Output | blue) = -1/1 log2(1/1) = 0
entropy(Output | Color) = 3/4 (0.918) + 1/4 (0) = 0.6885
gain(Output, Color) = 1 - 0.6885 = 0.3115

entropy(Output | Shape) = 3/4 (0.918) + 1/4 (0) = 0.6885
gain(Output, Shape) = 1 - 0.6885 = 0.3115

So we should first split according to either color or shape (root of the tree)
Note: by definition, 0 log 0 = 0
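These computations can be checked mechanically. A small sketch over the four-example dataset above; entropy and gain implement the formulas from the "Choosing the best feature" slides:

```python
import math
from collections import Counter

def entropy(labels):
    """H = -sum_x p(x) log2 p(x) over the label distribution (0 log 0 = 0)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(examples, attr, target="Output"):
    """gain(Set, A) = entropy(Set) - entropy(Set | A)."""
    n = len(examples)
    remainder = 0.0
    for value in {e[attr] for e in examples}:
        subset = [e[target] for e in examples if e[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy([e[target] for e in examples]) - remainder

data = [{"Size": "Big",   "Color": "Red",  "Shape": "Circle", "Output": "+"},
        {"Size": "Small", "Color": "Red",  "Shape": "Circle", "Output": "+"},
        {"Size": "Small", "Color": "Red",  "Shape": "Square", "Output": "-"},
        {"Size": "Big",   "Color": "Blue", "Shape": "Circle", "Output": "-"}]
for a in ("Size", "Color", "Shape"):
    print(a, round(gain(data, a), 4))  # Size 0.0, Color and Shape ~0.311
```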
The restaurant example
With the restaurant training data shown earlier, we have:

  gain(Output, Patrons) = 1 - [ 2/12 I(0/2, 2/2) + 4/12 I(4/4, 0/4) + 6/12 I(2/6, 4/6) ] ≈ 0.541 bits
  gain(Output, Type)    = 1 - [ 2/12 I(1/2, 1/2) + 2/12 I(1/2, 1/2) + 4/12 I(2/4, 2/4) + 4/12 I(2/4, 2/4) ] = 0 bits

So the root of the tree should be the attribute Patrons (we gain more information)
Then proceed recursively for the subtrees
Back to WSD
Ex: we need to translate the French word "prendre"; this can be seen as WSD
  possible translations/senses = {take, make, rise, speak}

  Observation  Tense  Word left  Direct object  Word right  ...  Sense
  1            ...    ...        mesure         ...         ...  take
  2            ...    ...        note           ...         ...  take
  3            ...    ...        exemple        ...         ...  take
  4            ...    ...        decision       ...         ...  make
  5            ...    ...        parole         ...         ...  speak
  6            ...    ...        parole         ...         ...  rise
Back to WSD (con't)
(Brown et al., 1991) found, on the Canadian Hansard:

  Ambiguous word  Possible senses / translations     Best feature      Example
  "prendre"       {"take", "make", "rise", "speak"}  direct object     "prendre une mesure" --> "to take"
                                                                       "prendre une décision" --> "to make"
  "vouloir"       {"to want", "to like"}             tense             present --> "to want"
                                                                       conditional --> "to like"
  "cent"          {"%", "¢"}                         word to the left  "pour" --> "%"
                                                                       number --> "¢"
Training Set
With supervised methods, we need a large sense-tagged training set... where do we get it from?
Using a "real" training set
  Main standard hand sense-tagged corpora:
    SEMCOR corpus: portion of the Brown corpus tagged with WordNet senses
    SENSEVAL corpus (www.senseval.org/): standard WSD "competition" like MUC, TREC & DUC
    Open Mind Word Expert (OMWE)
Using pseudowords (sketched below):
  artificial ambiguous words created by conflating two or more words
  Ex: occurrences of "banana" and "door" can be replaced by "banana-door"
  the disambiguation algorithm can then be tested on this data by disambiguating the pseudoword "banana-door" into either "banana" or "door"
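A minimal sketch of pseudoword creation, assuming whitespace-tokenized text; the function name and the return format are illustrative only:

```python
def make_pseudoword_corpus(tokens, w1="banana", w2="door"):
    """Replace every occurrence of w1 or w2 by the pseudoword "w1-w2",
    remembering the true word as the gold sense label."""
    pseudo = f"{w1}-{w2}"
    new_tokens, gold = [], []
    for t in tokens:
        if t.lower() in (w1, w2):
            gold.append((len(new_tokens), t.lower()))  # (position, true "sense")
            new_tokens.append(pseudo)
        else:
            new_tokens.append(t)
    return new_tokens, gold

toks, gold = make_pseudoword_corpus("the door of the banana shop".split())
print(toks)  # ['the', 'banana-door', 'of', 'the', 'banana-door', 'shop']
print(gold)  # [(1, 'door'), (4, 'banana')]
```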
Problems...
With supervised (or unsupervised) methods:
  a large amount of work is needed to create a classifier for each ambiguous word!
  so most work based on these techniques reports results on only a few words (2 to 12 words)
  scaling these approaches up to deal with all ambiguous words is an immense amount of work!
Solutions:
  use lexical resources (ex: machine-readable dictionaries)
  use distributional properties to improve disambiguation:
    ambiguous words are only used in one sense in any given discourse and with any given collocate
Approaches to Statistical WSD
    Supervised Disambiguation
      Naïve Bayes
      Decision Trees
--> Use of Lexical Resources
  --> Dictionary-based
      Thesaurus-based
      Translation-based
    Discourse properties
    Unsupervised Disambiguation
WSD based on sense definitions (Lesk, 1986)
A word's dictionary definitions are likely to be good indicators of the senses they define.
Method:
  express each dictionary definition of the ambiguous word as a bag of words
  express the context of the ambiguous word as a single bag of words, built from the dictionary definitions of the context words
  choose the definition of the ambiguous word that has the greatest overlap with the words occurring in its context
Example "Cone" in dictionary:
DEF-1: “solid body which narrows to a point” BAG = {body, narrows, point, solid}
DEF-2: “something of this shape whether solid or hollow” BAG = {hollow, shape, something, solid}
DEF-3: “fruit of certain evergreen tree” BAG = {evergreen, fruit, tree}
To disambiguate "cone" in "pine cone" "Pine" in dictionary
DEF-1: “kind of evergreen tree” DEF-2: “waste away through sorrow or illness” --> BAG = {evergreen, illness, kind, sorrow, tree, waste}
so "cone" is: score(DEF-1) = {body, narrows, point, solid} {evergreen, illness, kind, sorrow, tree, waste}
= 0 score(DEF-2) = {hollow,shape,something,solid} {evergreen, illness, kind, sorrow, tree,
waste} = 0
score(DEF-3) = {evergreen, fruit, tree} {evergreen, illness, kind, sorrow, tree, waste} = 2
Max overlap: DEF-3
46
The algorithm
for all senses s_k of word w:
  score(s_k) = overlap(
    the words in the dictionary definition of sense s_k,
    the union of the words in all context windows that also appear in a definition of w
  )
pick the sense s* with the highest score(s_k)
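A minimal sketch of this simplified Lesk scoring in Python; the toy glosses repeat the "cone"/"pine" example, and the lack of stop-word filtering is a simplification (function words inflate the overlap counts, though DEF-3 still wins here):

```python
def lesk(context_words, sense_glosses):
    """sense_glosses: dict mapping sense id -> gloss string.
    Returns the sense whose gloss overlaps most with the context bag."""
    context = set(w.lower() for w in context_words)
    best, best_overlap = None, -1
    for sense, gloss in sense_glosses.items():
        overlap = len(context & set(gloss.lower().split()))
        if overlap > best_overlap:
            best, best_overlap = sense, overlap
    return best

cone_defs = {
    "DEF-1": "solid body which narrows to a point",
    "DEF-2": "something of this shape whether solid or hollow",
    "DEF-3": "fruit of certain evergreen tree",
}
# context bag built from the definitions of "pine"
pine_bag = "kind of evergreen tree waste away through sorrow or illness".split()
print(lesk(pine_bag, cone_defs))  # -> DEF-3
```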
Analysis
Accuracies of 50-70% on short samples of text
Problem:
  dictionary entries for the target words are usually relatively short and may not provide sufficient material to build adequate classifiers, because the words in the context and those in the definitions must overlap directly
One solution: expand the list of words whose definitions make use of the target word
  Example: if "deposit" does not occur in the definition of "bank", but "bank" occurs in the definition of "deposit", we can expand the classifier for "bank" to include "deposit" as a relevant feature
However:
  just knowing that "deposit" is related to "bank" does not help much if we do not know which sense of "bank" it is related to
  --> to make use of "deposit" as a feature, we have to know which sense of "bank" was being used in its definition
Solution: use a thesaurus...
Approaches to Statistical WSD
    Supervised Disambiguation
      Naïve Bayes
      Decision Trees
--> Use of Lexical Resources
      Dictionary-based
  --> Thesaurus-based
      Translation-based
    Discourse properties
    Unsupervised Disambiguation
Thesaurus-Based Disambiguation
Thesauri include tags (subject codes) in their entries that correspond to broad semantic categories
Each word is assigned one or more subject codes which correspond to its different meanings
  ex: ANIMAL/INSECT (category 414), TOOLS/MACHINERY (category 348)
The semantic categories of the words in a context determine the semantic category of the whole context, and this category determines which word senses are used
Method (sketched below):
  for each subject code, count the number of words in the context that have the same subject code
  select the subject code with the highest count
Accuracy: ~50% (but with difficult and highly ambiguous words)
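A minimal sketch of that counting step, assuming a toy subject_codes table mapping words to category tags; a real table would come from a thesaurus such as Roget's:

```python
from collections import Counter

# hypothetical word -> subject codes table (invented for illustration)
subject_codes = {
    "fish":  ["ANIMAL/INSECT"],
    "river": ["ANIMAL/INSECT", "GEOGRAPHY"],
    "drill": ["TOOLS/MACHINERY"],
    "lathe": ["TOOLS/MACHINERY"],
}

def context_category(context_words):
    """Pick the subject code shared by the most words in the context."""
    votes = Counter(code
                    for w in context_words
                    for code in subject_codes.get(w.lower(), []))
    return votes.most_common(1)[0][0] if votes else None

print(context_category("the drill and the lathe".split()))  # -> TOOLS/MACHINERY
```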
Some Results (Yarowsky, 1992), using Roget categories:

  Word      Sense               Roget category  Accuracy
  bass      musical instrument  MUSIC           99%
            fish                ANIMAL/INSECT   100%
  star      space object        UNIVERSE        96%
            celebrity           ENTERTAINER     95%
            star-shaped object  INSIGNIA        82%
  interest  curiosity           REASONING       88%
            advantage           INJUSTICE       34%
            financial           DEBT            90%
            share               PROPERTY        38%
Approaches to Statistical WSD
    Supervised Disambiguation
      Naïve Bayes
      Decision Trees
--> Use of Lexical Resources
      Dictionary-based
      Thesaurus-based
  --> Translation-based
    Discourse properties
    Unsupervised Disambiguation
Translation-Based WSD
Words can be disambiguated by looking at how they are translated in other languages
Example: the word "interest"

                   sense1                      sense2
  definition       legal share                 attention, concern
  German transl.   "Beteiligung"               "Interesse"
  English phrase   "acquire an interest"       "show interest"
  translation      "erwerb eine Beteiligung"   "Interesse zeigen"

To disambiguate "interest" in "showed interest":
  the German translation of "show" is "zeigen"
  in a German corpus, we always find "zeigen Interesse" and never find "zeigen Beteiligung"
  so in the original phrase "showed interest", interest has sense2
To disambiguate "interest" in "acquired an interest":
  the German translation of "acquired" is "erwarb"
  in a German corpus: C("erwarb", "Beteiligung") > C("erwarb", "Interesse")
  so in the original phrase "acquired an interest", interest has sense1
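A minimal sketch of this co-occurrence test, assuming a pre-computed bigram count table for the second-language corpus; the counts here are invented for illustration:

```python
# hypothetical bigram counts from a German corpus
bigram_counts = {
    ("zeigen", "interesse"): 52,
    ("zeigen", "beteiligung"): 0,
    ("erwarb", "beteiligung"): 17,
    ("erwarb", "interesse"): 2,
}

def translate_sense(verb_translation, sense_translations):
    """Choose the sense whose translation co-occurs most often with the
    translated verb in the second-language corpus."""
    return max(sense_translations,
               key=lambda s: bigram_counts.get((verb_translation,
                                                sense_translations[s]), 0))

senses = {"sense1": "beteiligung", "sense2": "interesse"}
print(translate_sense("zeigen", senses))  # -> sense2 ("show interest")
print(translate_sense("erwarb", senses))  # -> sense1 ("acquire an interest")
```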
Approaches to Statistical WSD
    Supervised Disambiguation
      Naïve Bayes
      Decision Trees
    Use of Lexical Resources
      Dictionary-based
      Thesaurus-based
      Translation-based
--> Discourse properties
    Unsupervised Disambiguation
Discourse Properties (Yarowsky, 1995)
So far, all methods have considered each occurrence of the ambiguous word separately... But...
One sense per discourse:
  one document --> one sense
  i.e. assign the majority sense of the discourse to all occurrences of the target word
One sense per collocation:
  some nearby words give very strong clues
  i.e. the words of a collocation <-> the sense of the target word
We can combine these 2 heuristics
(Yarowsky, 1995) shows a reduction of the error rate by 27% when using the discourse constraint!
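A minimal sketch of the one-sense-per-discourse heuristic: run any base classifier on each occurrence, then overwrite every label with the document-level majority. The base classifier here is a stand-in, not part of the original method description:

```python
from collections import Counter

def one_sense_per_discourse(occurrences, base_classifier):
    """occurrences: list of context windows for the target word in ONE document.
    Classify each one, then assign the majority sense to all of them."""
    labels = [base_classifier(ctx) for ctx in occurrences]
    majority = Counter(labels).most_common(1)[0][0]
    return [majority] * len(labels)

# stand-in classifier: pretend anything mentioning "river" is BANK2
base = lambda ctx: "BANK2" if "river" in ctx else "BANK1"
doc = [["river", "banks"], ["bank", "holiday"], ["river", "side"]]
print(one_sense_per_discourse(doc, base))  # -> ['BANK2', 'BANK2', 'BANK2']
```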
Approaches to Statistical WSD
    Supervised Disambiguation
      Naïve Bayes
      Decision Trees
    Use of Lexical Resources
      Dictionary-based
      Thesaurus-based
      Translation-based
    Discourse properties
--> Unsupervised Disambiguation
Unsupervised Disambiguation
Disambiguate word senses:
  without supporting tools such as dictionaries and thesauri
  without a labeled training text
Without such resources, we cannot really identify/label the senses
  i.e. we cannot say bank-1 or bank-2; we do not even know the different senses of a word!
But we can:
  cluster/group the contexts of an ambiguous word into a number of groups
  discriminate between these groups without actually labeling them
Clustering
Represent each instance of the ambiguous word as a vector <f1, f2, f3, ..., fV>
  V is the vocabulary size
  fi is the frequency of word i in the context
Each vector can be visually represented as a point in a V-dimensional space
[figure: context vectors V1, V2, V3 plotted along the axes word1, word2, word3]
Clustering (con't)
Hypothesis: the same sense of a word will have similar neighboring words
Disambiguation algorithm:
  identify the context vectors corresponding to all occurrences of a particular word
  partition them into regions of high density
  tag each such region with a sense
  to disambiguate a new occurrence of the word:
    compute the context vector of the occurrence
    find the closest centroid of a region
    assign the occurrence the sense of that centroid
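A minimal sketch under strong assumptions: contexts are already encoded as vectors (e.g. with context_vector above), and a tiny hand-rolled k-means stands in for the density-based partitioning:

```python
import numpy as np

def kmeans(vectors, k=2, iters=20, seed=0):
    """Tiny k-means: returns (centroids, cluster id per vector)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(vectors, dtype=float)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assign each vector to its nearest centroid, then re-estimate centroids
        labels = np.argmin(((X[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels

def assign_sense(vector, centroids):
    """Disambiguate a new occurrence: index of the closest centroid."""
    return int(np.argmin(((centroids - np.asarray(vector)) ** 2).sum(-1)))

vecs = [[5, 0], [4, 1], [0, 6], [1, 5]]   # two clearly separated context groups
cents, labels = kmeans(vecs, k=2)
print(labels, assign_sense([0, 4], cents))  # [0,4] joins the [0,6]/[1,5] cluster
```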
Evaluating WSD
Metrics:
  Accuracy: the % of words that are tagged correctly
  Precision & Recall, based on:
    Good: nb of correct answers provided by the system
    Bad: nb of wrong answers provided by the system
    Null: nb of cases in which the system does not provide any answer
Compared to a gold standard:
  SEMCOR corpus, SENSEVAL corpus, original text before pseudoword substitution, ...
Difficulty in evaluation:
  the nature of the senses to distinguish has a huge impact on the results
  coarse VS fine-grained sense distinctions
    ex: "chair" --> person VS furniture
    ex: "bank" --> financial institution VS building
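A short sketch of how these three counts yield precision and recall. The formulas below follow the usual WSD convention (precision over attempted answers, recall over all cases), which is an assumption here since the slide only names the counts:

```python
def wsd_scores(good, bad, null):
    """precision = correct / answered; recall = correct / all cases (assumed convention)."""
    answered = good + bad
    total = good + bad + null
    precision = good / answered if answered else 0.0
    recall = good / total if total else 0.0
    return precision, recall

print(wsd_scores(good=80, bad=10, null=10))  # -> (0.888..., 0.8)
```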
Bounds on Performance
Upper and Lower Bounds on Performance: a measure of how well an algorithm performs relative to the difficulty of the task
Upper bound: human performance
  around 97%-99% with few and clearly distinct senses
  inter-judge agreement:
    with words with clear & distinct senses --> 95% and up
    with polysemous words with related senses --> 65%-70%
Lower bound (or baseline): usually the assignment of the most frequent sense
  90% is excellent for a word with 2 equiprobable senses
  90% is trivial for a word with 2 senses with probability ratios of 9 to 1!
SENSEVAL (www.senseval.org)
Standard WSD "competition" like MUC, TREC & DUC
Goals:
  provide a common framework to compare WSD systems
  standardise the task (especially evaluation procedures)
  build and distribute new lexical resources
Senseval-1 (1998): English, French and Italian; HECTOR senses (Oxford University Press)
Senseval-2 (2001): 13 languages, including Chinese; WordNet senses
Senseval-3 (March 2004): 7 languages (but various tasks); WordNet senses
Training text for "arm" (SENSEVAL-1)

<instance id="arm.n.om.053"> <answer instance="arm.n.om.053" senseid="arm%1:08:00::"/>
<context>
Many <p="JJ"/> terrestrial <p="JJ"/> vertebrate <p="JJ"/> animals <p="NNS"/> have <p="VBP"/> four <p="CD"/> <ne="_NUM"/> limbs <p="NNS"/> . <p="."/> Those <p="DT"/> attached <p="VBN"/> to <p="TO"/> the <p="DT"/> thoracic <p="JJ"/> portion <p="NN"/> of <p="IN"/> the <p="DT"/> body <p="NN"/> are <p="VBP"/> called <p="VBN"/> " <p="""/> <head> arms <p="NNS"/> </head> . <p="."/> " <p="""/>
</context> </instance>
<instance id="arm.n.om.045"> <answer instance="arm.n.om.045" senseid="arm%1:06:02::"/>
<context> You <p="PRP"/> are <p="VBP"/> likely <p="JJ"/> to <p="TO"/> find <p="VB"/> a <p="DT"/> rocking_chair <p="NN"/> with <p="IN"/> <head> arms <p="NNS"/> </head> in <p="IN"/> a <p="DT"/> museum <p="NN"/>
</context> </instance>
<instance id="arm.n.la.029"> <answer instance="arm.n.la.029" senseid="arm%1:06:01::"/>
<context>
" <p="""/> Unlike <p="IN"/> Linder <p="NNP"/> , <p=","/> who <p="WP"/> was <p="VBD"/> reportedly <p="RB"/> carrying <p="VBG"/> a <p="DT"/> Kalashnikov <p="NNP"/> assault_rifle <p="NN"/> for <p="IN"/> protection <p="NN"/> , <p=","/> APSNICA <p="NNP"/> volunteers <p="NNS"/> do <p="VBP"/> not <p="RB"/> bear <p="VB"/> <head> arms <p="NNS"/> </head> . <p="."/>
</context> </instance>
…
What is a word sense anyway?
"A mental representation of the different meanings of a word"
Experiments in psycho-linguistics:
  ask subjects to classify index cards with sentences containing an ambiguous word into different piles
    but inter-subject agreement is low...
  rely on introspection
    but introspection tends to rationalize often non-rational decisions
  ask subjects to classify ambiguous words according to dictionary definitions
    some results show high inter-subject agreement, some show low agreement!