
Multi-Relational Latent Semantic Analysis.

Kai-Wei Chang
Joint work with Scott Wen-tau Yih, Chris Meek

Microsoft Research

Natural Language Understanding

Build an intelligent system that can interact with humans using natural language

Research challenge:
- Meaning representation of text
- Support useful inferential tasks

Semantic word representation is the foundation

Language is compositional
The word is the basic semantic unit

Continuous Semantic Representations

A lot of popular methods for creating word vectors!

Vector Space Model [Salton & McGill 83]

Latent Semantic Analysis [Deerwester+ 90]

Latent Dirichlet Allocation [Blei+ 01]

Deep Neural Networks [Collobert & Weston 08]

Encode term co-occurrence information
Measure semantic similarity well

Continuous Semantic Representations

[Figure: words embedded in a continuous vector space, clustering into groups such as {sunny, rainy, windy, cloudy}, {car, wheel, cab}, and {sad, joy, emotion, feeling}]

Semantics Needs More Than Similarity

"Tomorrow will be rainy." vs. "Tomorrow will be sunny."

rainy, sunny? The two words are distributionally similar, yet the sentences have opposite meanings; similarity alone cannot distinguish them.

Leverage Linguistic Resources

Can’t we just use the existing linguistic resources?

Knowledge in these resources is never complete
They often lack the degree of a relation

Create a continuous semantic representation that

- Leverages existing rich linguistic resources
- Discovers new relations
- Enables us to measure the degree of multiple relations (not just similarity)

Roadmap

Introduction
Background:
- Latent Semantic Analysis (LSA)
- Polarity Inducing LSA (PILSA)
Multi-Relational Latent Semantic Analysis (MRLSA)
- Encoding multi-relational data in a tensor
- Tensor decomposition & measuring degree of a relation
Experiments
Conclusions


Latent Semantic Analysis [Deerwester+ 1990]

Data representation: encode single-relational data in a matrix
- Co-occurrence (e.g., from a general corpus)
- Synonyms (e.g., from a thesaurus)

Factorization: apply SVD to the matrix to find latent components

Measuring degree of relation: cosine of latent vectors (the cosine score)
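As a concrete reference for this cosine score, here is a minimal NumPy helper (the names are illustrative, not from the talk); the later sketches use the same formula:

```python
import numpy as np

def cosine(a, b):
    """Cosine of the angle between two vectors (the 'cosine score')."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```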

Encode Synonyms in Matrix

Input: synonyms from a thesaurus
- Joyfulness: joy, gladden
- Sad: sorrow, sadden

                       joy  gladden  sorrow  sadden  goodwill
Group 1: "joyfulness"   1      1        0       0        0
Group 2: "sad"          0      0        1       1        0
Group 3: "affection"    0      0        0       0        1

Target word: row vector
Term: column vector

Mapping to Latent Space via SVD

$\mathbf{W} \approx \mathbf{U}\,\boldsymbol{\Sigma}\,\mathbf{V}^T$, with $\mathbf{W}: d \times n$, $\mathbf{U}: d \times k$, $\boldsymbol{\Sigma}: k \times k$, $\mathbf{V}^T: k \times n$ (rows: target words; columns: terms)

SVD generalizes the original data
- Uncovers relationships not explicit in the thesaurus
- Term vectors are projected to the $k$-dimensional latent space

Word similarity: cosine of two column vectors in $\boldsymbol{\Sigma}\mathbf{V}^T$
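A minimal sketch of this pipeline on the toy synonym matrix above, assuming NumPy; the rank-$k$ truncation and the $\boldsymbol{\Sigma}\mathbf{V}^T$ projection follow the slide, everything else is illustrative:

```python
import numpy as np

# Toy synonym matrix W (thesaurus groups x terms), as in the slide.
terms = ["joy", "gladden", "sorrow", "sadden", "goodwill"]
W = np.array([
    [1, 1, 0, 0, 0],   # group "joyfulness"
    [0, 0, 1, 1, 0],   # group "sad"
    [0, 0, 0, 0, 1],   # group "affection"
], dtype=float)

# Rank-k SVD: W ~ U Sigma V^T; term vectors live in Sigma V^T.
k = 2
U, s, Vt = np.linalg.svd(W, full_matrices=False)
term_vecs = (np.diag(s[:k]) @ Vt[:k]).T   # one row per term, in latent space

def word_sim(t1, t2):
    a, b = term_vecs[terms.index(t1)], term_vecs[terms.index(t2)]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(word_sim("joy", "gladden"))   # high: same synonym group
```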

Problem: Handling Two Opposite Relations
Synonyms & Antonyms

LSA cannot distinguish antonyms [Landauer 2002]

“Distinguishing synonyms and antonyms is still perceived as a difficult open problem.” [Poon & Domingos 09]

Polarity Inducing LSA [Yih, Zweig, Platt 2012]

Data representation: encode two opposite relations in a matrix using "polarity"
- Synonyms & antonyms (e.g., from a thesaurus)

Factorization: apply SVD to the matrix to find latent components

Measuring degree of relation: cosine of latent vectors

Encode Synonyms & Antonyms in Matrix

Input: synonyms and antonyms from a thesaurus
- Joyfulness: joy, gladden; antonyms: sorrow, sadden
- Sad: sorrow, sadden; antonyms: joy, gladden

Inducing polarity:

                       joy  gladden  sorrow  sadden  goodwill
Group 1: "joyfulness"   1      1       -1      -1        0
Group 2: "sad"         -1     -1        1       1        0
Group 3: "affection"    0      0        0       0        1

Target word: row vector

Cosine score: synonyms get a positive score, antonyms a negative one.
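A minimal sketch of PILSA on the polarity-induced toy matrix above (illustrative code, not the authors' implementation); it shows the key effect, namely that synonyms keep a positive cosine while antonyms get a negative one:

```python
import numpy as np

terms = ["joy", "gladden", "sorrow", "sadden", "goodwill"]
# Polarity-induced matrix: +1 for synonyms of the group, -1 for antonyms.
W = np.array([
    [ 1,  1, -1, -1, 0],   # group "joyfulness"
    [-1, -1,  1,  1, 0],   # group "sad"
    [ 0,  0,  0,  0, 1],   # group "affection"
], dtype=float)

U, s, Vt = np.linalg.svd(W, full_matrices=False)
term_vecs = (np.diag(s[:2]) @ Vt[:2]).T   # rank-2 latent term vectors

def sim(t1, t2):
    a, b = term_vecs[terms.index(t1)], term_vecs[terms.index(t2)]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(sim("joy", "gladden"))   # ~ +1  (synonyms)
print(sim("joy", "sorrow"))    # ~ -1  (antonyms: negative cosine)
```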


Problem: How to Handle More Relations?

Limitation of the matrix representation:
- Each entry captures a particular type of relation between two entities, or
- Two opposite relations with the polarity trick

Other binary relations we want to encode:
- Is-A (hyponym): an ostrich is a bird
- Part-whole: an engine is a part of a car

Solution: encode multiple relations in a 3-way tensor (a 3-dimensional array)!

Multi-Relational LSA

Data representation: encode multiple relations in a tensor
- Synonyms, antonyms, hyponyms (is-a), … (e.g., from a linguistic knowledge base)

Factorization: apply tensor decomposition to the tensor to find latent components

Measuring degree of relation: cosine of latent vectors after projection


Encode Multiple Relations in Tensor

Represent word relations using a tensor: each slice encodes one relation between terms (columns) and target words (rows). Construct a tensor with two slices:

Synonym layer    joy  gladden  sadden  feeling
joyfulness        1      1        0       0
gladden           1      1        0       0
sad               0      0        1       0
anger             0      0        0       0

Antonym layer    joy  gladden  sadden  feeling
joyfulness        0      0        0       0
gladden           0      0        1       0
sad               1      0        0       0
anger             0      0        0       0

Encode Multiple Relations in Tensor

More relations can be encoded as additional slices, e.g., a hyponym layer:

Hyponym layer    joy  gladden  sadden  feeling
joyfulness        0      0        0       1
gladden           0      0        0       0
sad               0      0        0       1
anger             0      0        0       1
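A minimal sketch of building this toy 4 x 4 x 3 tensor in NumPy (target words x terms x relation slices); the names and layout are illustrative assumptions, and the later sketches reuse `W`, `terms`, and `relations`:

```python
import numpy as np

targets   = ["joyfulness", "gladden", "sad", "anger"]   # rows: target words
terms     = ["joy", "gladden", "sadden", "feeling"]     # columns: terms
relations = ["syn", "ant", "hypo"]                      # slices: relations

# W[i, j, k] = 1 iff relation k links target word i and term j.
W = np.zeros((len(targets), len(terms), len(relations)))

def add(target, term, rel):
    W[targets.index(target), terms.index(term), relations.index(rel)] = 1

for t, w in [("joyfulness", "joy"), ("joyfulness", "gladden"),
             ("gladden", "joy"), ("gladden", "gladden"), ("sad", "sadden")]:
    add(t, w, "syn")                 # synonym layer
for t, w in [("gladden", "sadden"), ("sad", "joy")]:
    add(t, w, "ant")                 # antonym layer
for t in ["joyfulness", "sad", "anger"]:
    add(t, "feeling", "hypo")        # hyponym layer: t is-a feeling
```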


Tensor Decomposition – Analogy to SVD

Derive a low-rank approximation to generalize the data and to discover unseen relations: apply Tucker decomposition and reformulate the results.

[Figure: the tensor (target words $w_1,\dots,w_d$ x terms $t_1,\dots,t_n$ x relations $R_1,\dots,R_m$) is approximated by a core tensor multiplied along each mode by rank-$r$ factor matrices: the latent representation of words, of terms, and of a relation.]
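The talk applies Tucker decomposition; below is a minimal sketch using truncated HOSVD, one standard way to compute a Tucker approximation (the transcript does not specify the authors' exact solver, so this choice is an assumption):

```python
import numpy as np

def unfold(T, mode):
    # Mode-n unfolding: make `mode` the leading axis, flatten the rest.
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd(T, ranks):
    """Truncated HOSVD: T ~ core x1 U0 x2 U1 x3 U2 (Tucker form)."""
    # Factor matrix per mode: top-r left singular vectors of the unfolding.
    factors = [np.linalg.svd(unfold(T, m), full_matrices=False)[0][:, :r]
               for m, r in enumerate(ranks)]
    # Core tensor: project T onto each factor matrix.
    core = T
    for m, U in enumerate(factors):
        core = np.moveaxis(
            np.tensordot(U.T, np.moveaxis(core, m, 0), axes=1), 0, m)
    return core, factors

# e.g., on the toy tensor W from the previous sketch:
# core, (U_words, V_terms, R_rels) = hosvd(W, ranks=(2, 2, 2))
```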


Measure Degree of Relation

Similarity: cosine of the latent vectors

Other relations (both symmetric and asymmetric):
- Take the latent matrix of the pivot relation (synonym)
- Take the latent matrix of the relation
- Cosine of the latent vectors after projection

Measure Degree of Relation – Raw Representation

$\text{ant}(\text{joy}, \text{sadden}) = \cos\left(\mathcal{W}_{:,\,\text{joy},\,\text{syn}},\ \mathcal{W}_{:,\,\text{sadden},\,\text{ant}}\right)$

i.e., the "joy" column of the synonym layer is compared with the "sadden" column of the antonym layer:

Synonym layer    joy  gladden  sadden  feeling
joyfulness        1      1        0       0
gladden           1      1        0       0
sad               0      0        1       0
anger             0      0        0       0

Antonym layer    joy  gladden  sadden  feeling
joyfulness        0      0        0       0
gladden           0      0        1       0
sad               1      0        0       0
anger             0      0        0       0


Estimate the Degree of a Relation – Raw Representation

$\text{hyper}(\text{joy}, \text{feeling}) = \cos\left(\mathcal{W}_{:,\,\text{joy},\,\text{syn}},\ \mathcal{W}_{:,\,\text{feeling},\,\text{hyper}}\right)$

Synonym layer    joy  gladden  sadden  feeling
joyfulness        1      1        0       0
gladden           1      1        0       0
sad               0      0        1       0
anger             0      0        0       0

Hypernym layer   joy  gladden  sadden  feeling
joyfulness        0      0        0       1
gladden           0      0        0       0
sad               0      0        0       1
anger             0      0        0       1

Measure Degree of Relation – Raw Representation

In general, for a relation $rel$:

$\text{rel}(w_i, w_j) = \cos\left(\mathcal{W}_{:,\,w_i,\,\text{syn}},\ \mathcal{W}_{:,\,w_j,\,\text{rel}}\right)$

(the $w_i$ column of the synonym layer vs. the $w_j$ column of the slice of the specific relation)
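A minimal sketch of this raw-representation score on the toy tensor W built earlier; the helper name is mine, and it assumes the compared columns are nonzero:

```python
import numpy as np

def rel_score_raw(W, terms, relations, wi, wj, rel):
    """cos( W[:, wi, syn], W[:, wj, rel] ) on the raw tensor."""
    a = W[:, terms.index(wi), relations.index("syn")]   # pivot: synonym slice
    b = W[:, terms.index(wj), relations.index(rel)]     # slice of the relation
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# e.g. rel_score_raw(W, terms, relations, "joy", "sadden", "ant")
```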

Measure Degree of Relation – Latent Representation

After decomposition, each term $w_i$ has a latent vector $\mathbf{V}_{i,:}$ ($v_1, v_2, \dots, v_n$) and each relation has an $r \times r$ latent slice $\mathbf{S}_{:,:,rel}$ ($S_1, S_2, \dots, S_m$):

$\text{rel}(w_i, w_j) = \cos\left(\mathbf{S}_{:,:,\text{syn}}\,\mathbf{V}_{i,:}^{T},\ \mathbf{S}_{:,:,\text{rel}}\,\mathbf{V}_{j,:}^{T}\right)$
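A minimal sketch of this projected score, assuming the HOSVD output from the earlier sketch (core tensor, factor matrices V for terms and R for relations); the reformulation $S_{rel} = \sum_c \text{core}[:,:,c]\, R[rel, c]$ follows from the Tucker form, but the mode convention here is my assumption:

```python
import numpy as np

def relation_matrix(core, R, rel_idx):
    # S_rel = sum_c core[:, :, c] * R[rel_idx, c]  ->  an r x r matrix.
    return np.tensordot(core, R[rel_idx], axes=([2], [0]))

def rel_score_latent(core, V, R, i, j, syn_idx, rel_idx):
    """cos( S_syn @ V[i], S_rel @ V[j] ): degree of `rel` between terms i, j."""
    u = relation_matrix(core, R, syn_idx) @ V[i]   # pivot projection
    v = relation_matrix(core, R, rel_idx) @ V[j]   # relation projection
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# core, (U_words, V_terms, R_rels) = hosvd(W, ranks=(2, 2, 2))
# rel_score_latent(core, V_terms, R_rels, terms.index("joy"),
#                  terms.index("sadden"),
#                  relations.index("syn"), relations.index("ant"))
```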

Roadmap

Introduction
Background:
- Latent Semantic Analysis (LSA)
- Polarity Inducing LSA (PILSA)
Multi-Relational Latent Semantic Analysis (MRLSA)
- Encoding multi-relational data in a tensor
- Tensor decomposition & measuring degree of a relation
Experiments
Conclusions

Experiment: Data for Building MRLSA Model

Encarta Thesaurus
- Records synonyms and antonyms of target words
- Vocabulary of 50k terms and 47k target words

WordNet
- Has synonym, antonym, hyponym, hypernym relations
- Vocabulary of 149k terms and 117k target words

Goals:
- MRLSA generalizes LSA to model multiple relations
- Improve performance by combining heterogeneous data

Example Antonyms Output by MRLSA

Target      High-Scoring Words
inanimate   alive, living, bodily, in-the-flesh, incarnate
alleviate   exacerbate, make-worse, inflame, amplify, stir-up
relish      detest, abhor, abominate, despise, loathe

* Words in blue (in the original slides) are antonyms listed in the Encarta thesaurus.

Results – GRE Antonym Test

Task: GRE closest-opposite questions, e.g.
"Which is the closest opposite of adulterate?
(a) renounce (b) forbid (c) purify (d) criticize (e) correct"

Accuracy:
Mohammad et al. 08   0.64
Lookup               0.56
PILSA                0.74
MRLSA                0.77
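One way such a question can be answered with the scores above: pick the choice that maximizes the antonym score against the target word. Hypothetical glue code, built on a scoring function like `rel_score_latent` from the earlier sketch:

```python
def answer_closest_opposite(target, choices, score_ant):
    # score_ant(w1, w2) -> degree of antonymy; return the best-scoring choice.
    return max(choices, key=lambda c: score_ant(target, c))

# answer_closest_opposite(
#     "adulterate",
#     ["renounce", "forbid", "purify", "criticize", "correct"],
#     score_ant=lambda a, b: ...)   # e.g. built on rel_score_latent
```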

Example Hyponyms Output by MRLSA

Target       High-Scoring Words
bird         ostrich, gamecock, nighthawk, amazon, parrot
automobile   minivan, wagon, taxi, minicab, gypsy cab
vegetable    buttercrunch, yellow turnip, romaine, chipotle, chilli

Results – Relational Similarity (SemEval-2012)

Task: Class-Inclusion relation ("is a kind of"); pick the most/least illustrative word pairs, e.g.
(a) art:abstract (b) song:opera (c) footwear:boot (d) hair:brown

Accuracy:
UTD      0.34
Lookup   0.37
MRLSA    0.56

Problem: Use Relational Domain Knowledge

Relational domain knowledge – the entity type:
- A relation can only hold between the right types of entities
- Words in an is-a relation have the same part of speech
- For the relation born-in, the entity types are (person, location)

Leverage type information to improve MRLSA

Idea #3: Change the objective function

Conclusions

Continuous semantic representation that
- Leverages existing rich linguistic resources
- Discovers new relations
- Enables us to measure the degree of multiple relations

Approaches
- Better data representation
- Matrix/tensor decomposition

Challenges & Future Work
- Capture more types of knowledge in the model
- Support more sophisticated inferential tasks
