

[Lecture Notes in Computer Science] Advances in Artificial Intelligence - IBERAMIA-SBIA 2006, Volume 4140

Word Sense Disambiguation Based on Word Sense Clustering

Henry Anaya-Sánchez1, Aurora Pons-Porrata1, and Rafael Berlanga-Llavori2

1 Center of Pattern Recognition and Data Mining, Universidad de Oriente, Cuba
[email protected], [email protected]

2 Universitat Jaume I, Castellón, Spain
[email protected]

Abstract. In this paper we address the problem of Word Sense Disambiguation by introducing a knowledge-driven framework for the disambiguation of nouns. The proposal is based on the clustering of noun sense representations and it serves as a general model that includes some existing disambiguation methods. A first prototype algorithm for the framework, relying on both topic signatures built from WordNet and the Extended Star clustering algorithm, is also presented. This algorithm yields encouraging experimental results for the SemCor corpus, showing improvements in recall over other knowledge-driven methods.

1 Introduction

Word Sense Disambiguation (WSD) is the general task of deciding the appropriate sense for a particular use of a polysemous word given its textual context. Despite the intermediate nature of this task, it defines an essential research area in Natural Language Processing that contributes to almost all intelligent text processing applications (e.g. Machine Translation, Information Extraction, Question Answering, Text Summarization, etc.).

The task of WSD can be specialized according to the sense definitions. For instance, word sense induction refers to the process of discovering the different senses of an ambiguous word without prior information about the inventory of senses [20]. On the other hand, there are two major approaches for disambiguation when predetermined sense definitions are provided: data-driven (or corpus-based) and knowledge-driven WSD. Data-driven methods are supervised, as they learn over hand-tagged samples. In contrast, knowledge-driven methods use a background knowledge source, avoiding the use of samples. Currently, lexical resources like WordNet [12] constitute the preferred source in most cases.

Both sense induction and data-driven approaches suffer from data sensitivity, together with acquisition and real-world data problems (e.g. completeness, correctness, etc.) [7], which give these approaches a domain-specific character. These circumstances, along with the availability of widely tested lexical resources, make the use of knowledge-driven methods preferable for WSD, even though supervised approaches clearly outperform them on well-known test sets (e.g. the SENSEVAL corpus).

J.S. Sichman et al. (Eds.): IBERAMIA-SBIA 2006, LNAI 4140, pp. 472–481, 2006.
© Springer-Verlag Berlin Heidelberg 2006


Most knowledge-driven methods have a similar behaviour: they try to match a textual context against the knowledge source, then select the best match, and finally retrieve from it the suitable senses for the context constituents. Therefore, the main differences among these methods stem from the knowledge source, the relations used to perform the match, and the best-match selection procedure.

For example, the Lesk method [8] is based on counting word overlaps between dictionary definitions and the context of an ambiguous word. In [4], simulated annealing is used to handle the combinatorial explosion of the Lesk method. Recently, several approaches such as [2], [5], [15] and [17] consider lexical relations (like hypernymy/hyponymy) among context elements1. Other works rely on the Web as knowledge source and use syntactic or text-proximity relations [18].

In this paper, we address the problem of word sense disambiguation by introducing a knowledge-driven framework with a first prototype to disambiguate nouns. Our approach is based on the clustering of sense representations as a natural way to capture the cohesion reflected among the words of a textual unit. Starting from an initial cluster distribution of all possible senses, the algorithm selects groups of senses and discards others by matching the textual context against the clusters. The selected senses are grouped again and the process is repeated until a certain disambiguation criterion holds. Finally, words are disambiguated with the remaining senses.

To the best of our knowledge, clustering algorithms have been explicitly used in the WSD area for two main purposes. The primary one consists of clustering textual contexts to represent different senses in corpus-driven WSD (e.g. [14]) and to induce word senses (e.g. [16], [3]). Other works (like [10] and [1]) cluster fine-grained word senses into coarse-grained ones. Hence, this paper shows a novel way of using clustering in this field.

In addition, our proposal aims to serve as a general model that includes some existing knowledge-driven methods and, at the same time, attempts to be a new method with better performance. Although in this paper we only treat the disambiguation of nouns, the approach can be extended straightforwardly to consider any kind of words.

The rest of the paper is organized as follows. First, Section 2 specifies the proposed framework. In Section 3 a first prototype is presented. Section 4 describes some experiments (carried out on the SemCor corpus) and their results. Finally, Section 5 offers some considerations and future work as conclusions.

2 A Knowledge-Driven Framework for WSD

Usually, text processing applications require the disambiguation of a specific subset of words (e.g. the most frequent nouns), instead of an exhaustive full-text word disambiguation. Following this idea, the goal of our framework is the disambiguation of a finite set of nouns N given a textual context T.2

1 These relations are present in well-known lexical databases like WordNet.
2 Here, we do not restrict the elements of N to be in T.


Our framework comprises the following elements:

i. a representation for senses, which is provided by the knowledge source,
ii. a clustering algorithm capable of grouping related sense representations,
iii. a matching function for comparing a sense cluster with the textual context,
iv. a filtering function for selecting sense clusters relying upon the previous function, and
v. a stopping criterion for ensuring the termination of the disambiguation process.

The idea behind this is that the noun senses in the given context must be related by means of a certain, and possibly complex, relation. As we are not interested in the precise definition of this relation but in the senses it links, we suggest the use of a clustering algorithm. The role of this algorithm consists of putting the related senses together into cohesive clusters. Assuming that each sense cluster represents a possible meaning for the set of its constituent nouns, the right ones must be identified using the textual context. Thus, a filtering method is needed to select those clusters that best match the context. Due to the intrinsic difficulties in modelling the mentioned relation, we propose an iterative process to refine the clustering. Hence, a stopping criterion is also required. The general steps of the framework are shown in Algorithm 1.

Algorithm 1. Framework for the disambiguation of the set of nouns N in the textual context T
Input: The finite set of nouns N and the textual context T.
Output: The disambiguated noun senses.

Let S be the set of all senses of nouns in N;
repeat
    G = group(S)
    G′ = filter(G, T, matching-function)
    S = ∪_{g ∈ G′} {s | s ∈ g}
until stopping-criterion
return S

In this algorithm, the functions group, matching-function, filter and stopping-criterion correspond to the framework components ii., iii., iv. and v., respectively. By specifying these functions and the sense representation, different disambiguation algorithms can be obtained from this framework. Moreover, some existing knowledge-driven disambiguation methods can be seen as instances of this proposal. For example, the Specification Marks method [13] can be expressed by representing each sense with the set of all its hypernym synsets, and by defining the other framework components as follows. The clustering algorithm builds the set of clusters {subsume(c) ∩ S | c ∈ subsumed-by-some-in(S)} ∪ {{s} | s ∈ S}, where subsumed-by-some-in(S) is the set of all WordNet senses that are subsumed by senses in S, and subsume(c) is the set of senses subsuming c according to the hyponymy relation. The matching function associates to each cluster the number of context words having at least one sense in it. Note that here the textual context T coincides with N. The filtering function first selects, for each noun n in N, the clusters with the greatest matching score that contain just one sense of n. If there is only one selected cluster for n, the output of the function will include the singleton containing this sense. Otherwise, the output will include the singletons (containing senses of n) obtained by applying five heuristics over the selected clusters. Finally, only one framework iteration is required in this method, so the stopping criterion is the constant True.

Similarly, the Conceptual Density algorithm [2] can be derived from the framework by using the WordNet hypernym relation to provide the sense representation. Also, group(S) must be defined as the trivial clustering {{s} | s ∈ S}, the matching function as the function that assigns a score to each cluster based on the conceptual density formula, the filtering function as the one that selects the cluster with the greatest matching score for each noun in N, and the stopping criterion as the constant True.
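The generic loop of Algorithm 1 is easy to phrase with the four components passed in as functions. The sketch below is illustrative only; the function and variable names (disambiguate, filter_clusters, etc.) are ours, not the paper's:

```python
def disambiguate(senses, context, group, matching_function,
                 filter_clusters, stopping_criterion):
    """Generic loop of Algorithm 1: cluster the current senses, select
    clusters that match the context, keep their senses, and repeat."""
    s = set(senses)
    iteration = 0
    while True:
        clusters = group(s)                                  # component ii.
        selected = filter_clusters(clusters, context,
                                   matching_function)        # iii. and iv.
        s = set().union(*selected) if selected else set()
        iteration += 1
        if stopping_criterion(s, iteration):                 # component v.
            return s
```

Instantiating group, matching_function, filter_clusters and stopping_criterion then yields a concrete disambiguation method, as the Specification Marks and Conceptual Density examples above show.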

3 A New WSD Method

In this section we introduce our first prototype algorithm by defining its framework components.

Sense representation: The algorithm uses Topic Signatures [9] as representations for the WordNet nominal senses. The topic signature of a noun sense s is a finite set {<t1, w1>, ..., <tm, wm>}, where each ti is a term (unigram, bigram or trigram) highly correlated to s with association weight wi. These signatures are represented using the Vector Space Model (VSM) [19]. Also, a correspondence between a sense and the noun in N it represents is assumed.

Clustering algorithm: We adopt the Extended Star Clustering Algorithm [6], which builds overlapped and star-shaped clusters. Each cluster consists of a star and its satellites, where the star is the object with the highest connectivity in the cluster. This algorithm relates sense representations in a manner analogous to the way in which syntactic or discourse relations link textual elements. To perform the clustering, the algorithm needs a similarity measure between objects (senses, in this case) and a minimum similarity threshold (β0). In this prototype, the cosine similarity measure is used to compare senses.

Matching function: The matching function associates a three-component vector to a cluster g according to the textual context T as follows.

matching-function(g, T) = ( |nouns(g)|,  (Σ_i min(ḡ_i, T_i)) / min(Σ_i ḡ_i, Σ_i T_i),  −Σ_{s∈g} number(s) )    (1)

In this definition, nouns(g) denotes the set of nouns associated to senses in g, ḡ is the centroid of g, and number(s) is the WordNet ordinal number of sense s (according to its corresponding noun). It is worth mentioning that T is represented in the sense VSM. The score assigned to each cluster thus considers the number of nouns it has associated, its overlap with the context, and the WordNet sense frequency of its senses.

Filtering function: To perform the filtering, this function first sorts the clusters by a lexicographic ordering of their matching vectors obtained with (1). The goal of this function is to select clusters covering the set of nouns N by using the previous order. Thus, a cluster g is selected if it contains at least one sense of an uncovered noun and its senses corresponding to covered nouns are included in the already-selected clusters. If g does not contain any sense of uncovered nouns, it is discarded. Otherwise, g is inserted into a queue Q. Finally, if the selected clusters do not cover N, clusters in Q adding senses of uncovered nouns are chosen until the cover is reached.

Stopping criterion: As the purpose of this algorithm is to disambiguate the set N, the cardinality of the set of senses obtained from the selected clusters in the filtering process must be equal to |N|. This is not always achieved in a single iteration. In order to obtain only one sense per noun, successive clusterings are required. Finer-grained clusterings can be generated by increasing the minimum similarity threshold. Equation (2) gives the threshold used at the i-th iteration.

β0(i) = { percentile(90, sim(S))                                             if i = 0,
        { min_{q∈{0,5,10}} {β = percentile(90+q, sim(S)) | β > β0(i−1)}      otherwise.    (2)

In this equation, percentile(p, sim(S)) represents the p-th percentile value of the set sim(S) = {cos(si, sj) | si, sj ∈ S, i ≠ j} ∪ {1}. The idea of this definition is to select at each step a threshold value that allows the formation of strongly cohesive clusters. The selection is done from the set of pairwise similarities, trying to pick out a high percentile value. Hence, the stopping criterion is |S| = |N| or β0(i+1) = 1.
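Equation (2) reduces to a few percentile computations over the pairwise similarities. A possible sketch (the percentile interpolation is not specified in the paper, so NumPy's default linear method is assumed; the fallback to 1.0 when no candidate exceeds the previous threshold is also our assumption, consistent with the stopping criterion β0(i+1) = 1):

```python
import numpy as np

def threshold_schedule(pairwise_sims, prev_beta=None):
    """Minimum-similarity threshold of Eq. (2).

    pairwise_sims: cosine similarities cos(s_i, s_j) for i != j;
    1.0 is appended, as in sim(S), so the schedule can always reach 1.
    prev_beta: beta_0(i-1), or None for the first iteration (i = 0).
    """
    sims = np.asarray(list(pairwise_sims) + [1.0])
    if prev_beta is None:
        return float(np.percentile(sims, 90))
    # smallest of the 90th/95th/100th percentiles strictly above
    # the previous threshold
    candidates = [float(np.percentile(sims, 90 + q)) for q in (0, 5, 10)]
    larger = [b for b in candidates if b > prev_beta]
    return min(larger) if larger else 1.0
```

Because 1 always belongs to sim(S), the 100th percentile is 1.0, so the schedule is guaranteed to reach the terminating value β0 = 1.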

3.1 A Disambiguation Example

Figure 1 illustrates graphically the disambiguation of nouns in the sentence "The competition gave evidence of the athlete's skills". In this example, we consider N to be the set of lemmas of nouns in the sentence (bold-faced words), and the textual context T composed of these nouns plus the verb give (the remaining words are not included because they bear little meaning). The correct senses of these nouns are competition#2, evidence#2, athlete#1 and skill#1.

Fig. 1. Disambiguation of nouns in "The competition gave evidence of the athlete's skills"

The disambiguation algorithm firstly clusters the set of noun senses S = { competition#1, competition#2, competition#3, competition#4, evidence#1, evidence#2, evidence#3, athlete#1, skill#1, skill#2 }, using the initial β0 = 0.056 (the 90th percentile of the similarities between the senses). The boxes in the figure represent the obtained clusters, which are sorted by their matching scores (the vectors under the boxes). As we can see, the first cluster comprises the single noun sense of athlete, competition#2, which is the sense referring to an athletic competition, and skill#1, which concerns an ability acquired by training. Note that overlapped clusters are obtained. It can also be appreciated that competition#2 and competition#4 form a cluster together with athlete#1 due to their strong relation, although competition#4 is not a correct sense for the sentence.

Initially, the filtering function includes the first cluster in the selection. As the second cluster contains one sense of the uncovered noun evidence, and all its other senses are included in the first cluster, it is selected too. No other cluster is selected because all nouns are covered. In the figure, doubly-boxed clusters depict those selected by the filtering function. In this case, only one iteration is enough to provide the disambiguated nouns, because each noun has a unique sense in the set of selected clusters.
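The greedy cover selection just illustrated can be sketched as follows. The function and parameter names are ours, and sorted_clusters is assumed to be already ranked (best first) by the matching vectors of Equation (1):

```python
from collections import deque

def select_clusters(sorted_clusters, sense_to_noun, nouns):
    """Greedy cover of the noun set N by ranked sense clusters.

    sorted_clusters: iterable of sets of senses, best-ranked first.
    sense_to_noun: maps each sense to the noun it belongs to.
    nouns: the set N to cover.
    """
    selected, covered, chosen_senses = [], set(), set()
    queued = deque()
    for g in sorted_clusters:
        cluster_nouns = {sense_to_noun[s] for s in g}
        if not (cluster_nouns - covered):
            continue  # no sense of an uncovered noun: discard
        # senses of already-covered nouns must agree with earlier picks
        if all(s in chosen_senses for s in g
               if sense_to_noun[s] in covered):
            selected.append(g)
            covered |= cluster_nouns
            chosen_senses |= set(g)
        else:
            queued.append(g)  # postponed: conflicts on a covered noun
    # complete the cover from the queue Q if needed
    while covered < set(nouns) and queued:
        g = queued.popleft()
        new = {sense_to_noun[s] for s in g} - covered
        if new:
            selected.append(g)
            covered |= new
            chosen_senses |= set(g)
    return selected
```

On the example above, the first cluster covers athlete, competition and skill; the second is accepted because its only new noun is evidence and its remaining sense (competition#2) agrees with the first pick.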

In addition, we have included at the bottom of the figure a portion of the representation of each resulting sense.

4 Experiments

In order to evaluate our approach, we use the SemCor corpus [11] and the traditional measures of Precision, Recall, Coverage and F-measure. This corpus comprises 190 documents containing 88,026 noun occurrences, of which 69,549 correspond to polysemous nouns.

In our experiments, the disambiguation process was carried out at the sentence level, assuming one sense per sentence. For each sentence, the set of its nouns is disambiguated considering all content words of the sentence as the textual context. To represent noun senses, topic signatures are built from the lexical relations of WordNet as follows. The topic signature for a sense includes all its hyponyms, its directly related terms, and their glosses. To weight signature terms, the tf-idf statistic is used, considering one collection for each noun, with its senses playing the role of documents.
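The described weighting, with each sense acting as a "document" of its noun's collection, can be sketched as below. The exact tf-idf variant is not given in the paper, so the plain tf · log(N/df) form here is an assumption:

```python
import math
from collections import Counter

def signature_weights(sense_term_lists):
    """tf-idf weights for the topic-signature terms of one noun.

    sense_term_lists: {sense_id: list of terms gathered from its
    hyponyms, directly related terms and glosses}. Each sense plays
    the role of a document in the noun's own collection.
    """
    n_senses = len(sense_term_lists)
    df = Counter()          # in how many senses each term occurs
    tfs = {}                # per-sense term frequencies
    for sense, terms in sense_term_lists.items():
        tf = Counter(terms)
        tfs[sense] = tf
        df.update(tf.keys())
    return {
        sense: {t: tf[t] * math.log(n_senses / df[t]) for t in tf}
        for sense, tf in tfs.items()
    }
```

Terms shared by every sense of a noun receive weight zero, so only terms that discriminate between the senses contribute to the cosine comparisons.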

Table 1 summarizes the overall recall split according to the SemCor categories. The columns include results for polysemous nouns only and for polysemous and monosemous nouns combined. In this table we have omitted the values of Precision, Coverage and F-measure because all nouns are disambiguated by the algorithm, i.e. Precision and F-measure coincide with Recall, and a Coverage of 100% is achieved.

Table 1. WSD performance in SemCor categories

Categories                              Polysemous nouns   All nouns
A. Press: reportage                     0.606              0.683
C. Press: reviews                       0.504              0.602
L. Mystery & detective fiction          0.498              0.589
F. Popular lore                         0.482              0.604
P. Romance & love story                 0.480              0.581
H. Miscellaneous                        0.479              0.590
M. Science fiction                      0.479              0.587
B. Press: editorial                     0.476              0.599
K. General fiction                      0.476              0.580
E. Skills & hobbies                     0.473              0.586
G. Belles-lettres, biography, essays    0.462              0.563
R. Humor                                0.461              0.576
N. Adventure & western fiction          0.452              0.552
J. Learned                              0.444              0.571
D. Religion                             0.388              0.494
Brown 1                                 0.475              0.588
Brown 2                                 0.467              0.576
Whole SemCor                            0.472              0.582

As shown in Table 1, our algorithm performs best in the Press: reportage category, and worst in Religion. In all other categories the recall values are similar. Thus, the performance does not seem to be affected by different knowledge domains.

Since WordNet has been criticized for its lack of relations between topically related concepts, we also evaluate the use of the topic signatures developed by the Ixa Research Group3, which attempt to overcome this drawback. These signatures were built by automatically acquiring examples from the Web with the monosemous relatives method for each WordNet nominal sense. In this case, the signature terms are those occurring in the retrieved snippets, and their weights are computed using the tf-idf statistic in a similar way.

Table 2 shows the results obtained in the disambiguation of all noun occurrences by using each kind of topic signature separately. The experiment was done using the 22 documents of Brown 1 belonging to categories A to E.

Surprisingly, topic signatures built using only WordNet information outperform the Web-based ones. We suspect this is because noisy terms are introduced into the signatures along with the topically related concepts. Note also that the disambiguation of all noun occurrences was not possible with the Web-based topic signatures, because some noun senses lack signatures.

3 http://ixa.si.ehu.es/Ixa/

Page 8: [Lecture Notes in Computer Science] Advances in Artificial Intelligence - IBERAMIA-SBIA 2006 Volume 4140 || Word Sense Disambiguation Based on Word Sense Clustering

Word Sense Disambiguation Based on Word Sense Clustering 479

Table 2. Results using different topic signatures

                        Polysemous nouns                    All nouns
Signatures              Recall  Precision  F      Coverage  Recall  Precision  F      Coverage
Based only on WordNet   0.501   0.501      0.501  100 %     0.603   0.603      0.603  100 %
Web-based               0.433   0.461      0.447  93.8 %    0.536   0.565      0.550  94.9 %

Finally, we compare our method with four knowledge-driven WSD algorithms: Conceptual Density [2], the UNED method [5], the Lesk method [8] and Specification Marks with voting heuristics [13]. Table 3 includes the recall values obtained over the whole SemCor corpus considering polysemous nouns only. The last column indicates whether the method achieves full coverage.

Table 3. Overall performance

WSD method             Recall   Full coverage
Conceptual density     0.220    no
Lesk                   0.274    no
UNED method            0.313    no
Specification marks    0.391    yes
Our method             0.472    yes

It can be appreciated that our approach improves over all the other methods by at least 20 % in recall. It is also important to notice that our method ensures full coverage.

5 Conclusions

In this paper a framework for the disambiguation of nouns has been introduced. Its novelty resides in the use of clustering as a natural way to connect semantically related word senses. Different knowledge-driven WSD methods can be obtained from it by specifying a representation for senses, a clustering algorithm, matching and filtering functions, and a stopping criterion. Some existing WSD methods can be seen as instances of this framework.

Most existing approaches attempt to disambiguate a target word in the context of its surrounding words using a particular taxonomical relation. Instead, we disambiguate a set of related words at once using a given textual context. Besides, we allow a flexible sense representation, with which more semantic information can be attached to the disambiguation process.

A first prototype algorithm for the framework was also introduced. It relies on both topic signatures built from WordNet and the Extended Star clustering algorithm. The way this clustering algorithm relates sense representations resembles the manner in which syntactic or discourse relations link textual components.


This algorithm was compared with other knowledge-driven disambiguation methods over the whole SemCor corpus. The experimental results show that our algorithm obtains better recall values, while achieving 100 % coverage.

Though in this work we treat only the disambiguation of nouns, the approach can be extended to consider other word categories. As further work, we plan to generate new algorithms from the framework, using other clustering algorithms and varying the textual contexts, to explore their impact on the disambiguation task.

References

1. Agirre, E., Lopez, O.: Clustering WordNet Word Senses. In: Proceedings of the Conference on Recent Advances in Natural Language Processing. Bulgaria (2003) 121–130

2. Agirre, E., Rigau, G.: Word Sense Disambiguation Using Conceptual Density. In: Proceedings of the 16th Conference on Computational Linguistics, Vol. 1. Denmark (1996) 16–22

3. Bordag, S.: Word Sense Induction: Triplet-Based Clustering and Automatic Evaluation. Accepted to the 11th Conference of the European Chapter of the Association for Computational Linguistics. Italy (2006)

4. Cowie, J., Guthrie, J.A., Guthrie, L.: Lexical Disambiguation Using Simulated Annealing. In: Proceedings of the 14th International Conference on Computational Linguistics, Vol. 1. France (1992) 359–365

5. Fernandez-Amoros, D., Gonzalo, J., Verdejo, F.: The Role of Conceptual Relations in Word Sense Disambiguation. In: Proceedings of the 6th International Workshop on Applications of Natural Language for Information Systems. Spain (2001) 87–98

6. Gil-García, R., Badía-Contelles, J.M., Pons-Porrata, A.: Extended Star Clustering Algorithm. In: Progress in Pattern Recognition, Speech and Image Analysis. Lecture Notes in Computer Science, Vol. 2905. Springer-Verlag (2003) 480–487

7. Ide, N., Véronis, J.: Word Sense Disambiguation: The State of the Art. Computational Linguistics 24:1 (1998) 1–40

8. Lesk, M.: Automatic Sense Disambiguation Using Machine Readable Dictionaries: How to Tell a Pine Cone from an Ice Cream Cone. In: Proceedings of the 5th Annual International Conference on Systems Documentation. Canada (1986) 24–26

9. Lin, C.-Y., Hovy, E.: The Automated Acquisition of Topic Signatures for Text Summarization. In: Proceedings of the COLING Conference. France (2000) 495–501

10. Mihalcea, R., Moldovan, D.I.: EZ.WordNet: Principles for Automatic Generation of a Coarse Grained WordNet. In: Proceedings of the FLAIRS Conference. Florida (2001) 454–458

11. Miller, G.A., Leacock, C., Randee, T., Bunker, R.: A Semantic Concordance. In: Proceedings of the 3rd DARPA Workshop on Human Language Technology. New Jersey (1993) 303–308

12. Miller, G.: WordNet: A Lexical Database for English. Communications of the ACM 38:11 (1995) 39–41

13. Montoyo, A., Suárez, A., Rigau, G., Palomar, M.: Combining Knowledge- and Corpus-based Word-Sense-Disambiguation Methods. Journal of Artificial Intelligence Research 23 (2005) 299–330

14. Niu, C., Li, W., Srihari, R.K., Li, H., Crist, L.: Context Clustering for Word Sense Disambiguation Based on Modeling Pairwise Context Similarities. In: SENSEVAL-3: Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text. Spain (2004) 187–190

15. Patwardhan, S., Banerjee, S., Pedersen, T.: Using Measures of Semantic Relatedness for Word Sense Disambiguation. In: Proceedings of the 4th International Conference on Computational Linguistics and Intelligent Text Processing. Mexico (2003) 16–22

16. Pedersen, T., Purandare, A., Kulkarni, A.: Name Discrimination by Clustering Similar Contexts. In: Proceedings of the 6th International Conference on Computational Linguistics and Intelligent Text Processing. Mexico (2005) 226–237

17. Rosso, P., Masulli, F., Buscaldi, D., Pla, F., Molina, A.: Automatic Noun Sense Disambiguation. In: Proceedings of the 4th International Conference on Computational Linguistics and Intelligent Text Processing. Mexico (2003) 16–22

18. Rosso, P., Montes-y-Gómez, M., Buscaldi, D., Pancardo-Rodríguez, A., Villaseñor-Pineda, L.: Two Web-Based Approaches for Noun Sense Disambiguation. In: Proceedings of the 6th International Conference on Computational Linguistics and Intelligent Text Processing. Mexico (2005) 267–279

19. Salton, G., Wong, A., Yang, C.S.: A Vector Space Model for Automatic Indexing. Communications of the ACM 18:11 (1975) 613–620

20. Udani, G., Dave, S., Davis, A., Sibley, T.: Noun Sense Induction Using Web Search Results. In: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Brazil (2005) 657–658