
Page 1: Word sense induction using continuous vector space models

Mikael Kågebäck, Fredrik Johansson, Richard Johansson*, Devdatt Dubhashi

LAB, Chalmers University of Technology; *Språkbanken, University of Gothenburg

Page 2: Word Sense Induction (WSI)

• Automatic discovery of word senses.
  – Given a corpus, discover the senses of a given word, e.g. rock.

Page 3: Applications of WSI

• Novel sense detection
• Temporal/geographical word sense drift
• Localized word sense lexicons
  – Machine translation
  – Text understanding
  – more…

Page 4: Context clustering

1. Compute embeddings for word instances in a corpus, based on their context.
2. Cluster the space.
3. Let the centroids represent the senses.

• Pioneered by Hinrich Schütze (1998).
• Assumption: the distributional hypothesis holds.
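A minimal sketch of the three steps, assuming `embed` maps a word to a dense vector and `instances` is a list of (tokens, target position) pairs; both names are placeholders, not from the slides, and later slides replace the plain mean with ICE and choose the cluster count nonparametrically:

```python
import numpy as np
from sklearn.cluster import KMeans

def context_vectors(instances, embed, window=5):
    """One vector per instance of the target word: the mean embedding of
    the words in a symmetric window around it. embed(word) -> np.ndarray."""
    vecs = []
    for tokens, pos in instances:  # pos = index of the target word in tokens
        ctx = tokens[max(0, pos - window):pos] + tokens[pos + 1:pos + 1 + window]
        vecs.append(np.mean([embed(w) for w in ctx], axis=0))
    return np.vstack(vecs)

def induce_senses(instances, embed, n_senses):
    """Cluster the instance space; each centroid represents one sense."""
    X = context_vectors(instances, embed)
    return KMeans(n_clusters=n_senses, n_init=10).fit(X).cluster_centers_
```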

Page 5: Instance-context Embeddings (ICE)

• Based on word embeddings computed using the skip-gram model.
  – A low-rank approximate factorization of a normalized co-occurrence matrix C.
  – Context word embeddings in V, word embeddings in U.
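Concretely, Levy and Goldberg (2014) show that skip-gram with negative sampling (k negative samples per pair) implicitly factorizes a shifted PMI matrix:

```latex
% Optimum of the skip-gram negative-sampling objective
% (Levy & Goldberg, 2014): the word vector u_w (row of U) and the
% context vector v_c (row of V) satisfy
u_w^\top v_c \;\approx\; \mathrm{PMI}(w, c) - \log k
            \;=\; \log\frac{p(w, c)}{p(w)\,p(c)} - \log k .
```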

Page 6: Instance-context Embeddings (ICE)

Let the mean skip-gram vector over the context form the instance vector, but:
1. Apply a triangular window function.
2. Weight each context word.
   – This naturally removes stop words.
   – Related to PMI; Levy and Goldberg (2014).
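A sketch of the construction; `context_vec(w)` returns a row of the skip-gram context matrix V, and `weight(w)` stands in for the PMI-related weighting, whose exact form is not recoverable from this transcript:

```python
import numpy as np

def ice_vector(tokens, pos, context_vec, weight, window=10):
    """ICE-style instance vector: a weighted mean of skip-gram *context*
    embeddings around position pos. The triangular window gives the words
    closest to the target the largest weight; weight(w) is a placeholder
    for the paper's PMI-related term, which also downweights stop words."""
    acc, norm = 0.0, 0.0
    for offset in range(1, window + 1):
        tri = (window - offset + 1) / float(window)  # triangular window
        for i in (pos - offset, pos + offset):
            if 0 <= i < len(tokens):
                w = tri * weight(tokens[i])
                acc = acc + w * context_vec(tokens[i])
                norm += w
    return acc / norm if norm > 0 else None
```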

Page 7: Plotted instances for 'paper'

[Figure: instance vectors for 'paper', mean vector vs. ICE, projected with t-SNE.]
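For reference, a projection like the one on this slide can be produced with scikit-learn's t-SNE; X (the instance vectors) and labels (the induced senses) are assumed inputs:

```python
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_instances(X, labels, title="Instances for 'paper'"):
    """Project instance vectors to 2-D with t-SNE, colored by induced sense."""
    proj = TSNE(n_components=2, random_state=0).fit_transform(X)
    plt.scatter(proj[:, 0], proj[:, 1], c=labels, s=12)
    plt.title(title)
    plt.show()
```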

Page 8: Proposed algorithm

1. Train a skip-gram model on the corpus.
2. Compute instance representations using ICE.
   – One for each instance of a word in the corpus.
3. Cluster using (nonparametric) k-means.
   – The number of clusters is selected with the criterion of Pham et al. (2005).

• (Evaluation) Disambiguate test data using the obtained cluster centroids.
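The k-means step is nonparametric in the sense that the number of clusters is not fixed in advance. A sketch of the f(K) selection criterion, assuming the standard form given in Pham et al. (2005); the constants come from that paper, not from these slides:

```python
import numpy as np
from sklearn.cluster import KMeans

def choose_k(X, k_max=10):
    """Select the number of senses with the f(K) criterion of
    Pham et al. (2005): f(K) compares the k-means distortion S_K with
    what S_{K-1} predicts; the smallest f(K) (typically < 0.85) wins."""
    d = X.shape[1]
    S, f, alpha = {}, {1: 1.0}, {}
    for k in range(1, k_max + 1):
        S[k] = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
        if k == 2:
            alpha[k] = 1.0 - 3.0 / (4.0 * d)
        elif k > 2:
            alpha[k] = alpha[k - 1] + (1.0 - alpha[k - 1]) / 6.0
        if k > 1:
            f[k] = S[k] / (alpha[k] * S[k - 1]) if S[k - 1] > 0 else 1.0
    return min(f, key=f.get)
```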

Page 9: SemEval 2013 task 13

• WSI: Identify senses in ukWaC.
• WSD: Disambiguate test words to one of the induced senses.
• Evaluation: Compare to the annotated WordNet labels.
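Disambiguation then reduces to assigning each test instance's vector to the closest induced centroid; cosine similarity is an assumption here, as the slides do not name the measure:

```python
import numpy as np

def disambiguate(instance_vec, centroids):
    """Return the index of the induced sense whose centroid is most
    similar to the instance vector (cosine similarity assumed)."""
    v = instance_vec / np.linalg.norm(instance_vec)
    C = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    return int(np.argmax(C @ v))
```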

Page 10: Detailed results, SemEval 2013 task 13

System                                   Fuzzy B-Cubed   Fuzzy NMI
Best baseline (FBC): one sense           57%              0%
Best baseline (FNMI): one per instance    0%              5%
Topic-modeling-based WSI (Unimelb)       44%              4%
Language-modeling-based WSI (AI-KU)      35%              5%
Multi-sense skip-gram (MSSG)             46%              4%
MSSG + ICE weights                       49%              6%
ICE-kmeans                               51%              6%

Page 11: Detailed results, SemEval 2013 task 13

[Figure: harmonic mean of Fuzzy B-Cubed and Fuzzy NMI (y-axis 0-12%) for the same systems: the two best baselines, Unimelb, AI-KU, MSSG, MSSG + ICE weights, and ICE-kmeans.]

Page 12: Detailed results, SemEval 2013 task 13

[Figure: total relative improvement (y-axis -20% to 40%) across Unimelb, AI-KU, MSSG, MSSG + ICE weights, and ICE-kmeans; annotated "Total relative improvement: 33%".]

Page 13: Conclusions

• Using skip-gram word embeddings clearly boosts the performance of WSI.
• Provides a semantic representation for each word.
• Shows which context words are most important.

Page 14: ICE profile

[Figure: ICE profile.]

Page 15: Evaluation

• SemEval 2013, task 13
  – ukWaC corpus
  – 50 lemmas, with 100 instances per lemma
  – Annotated with WordNet senses