Multi-Relational Latent Semantic Analysis
Kai-Wei Chang, joint work with Scott Wen-tau Yih and Chris Meek, Microsoft Research


Page 1:

Multi-Relational Latent Semantic Analysis.

Kai-Wei Chang
Joint work with Scott Wen-tau Yih, Chris Meek

Microsoft Research

Page 2:

Natural Language Understanding

Build an intelligent system that can interact with humans using natural language.

Research challenge:
  Meaning representation of text
  Support useful inferential tasks

Semantic word representation is the foundation:
  Language is compositional; the word is the basic semantic unit.

Page 3:

Continuous Semantic Representations

A lot of popular methods for creating word vectors!

Vector Space Model [Salton & McGill 83]

Latent Semantic Analysis [Deerwester+ 90]

Latent Dirichlet Allocation [Blei+ 01]

Deep Neural Networks [Collobert & Weston 08]

Encode term co-occurrence information
Measure semantic similarity well

Page 4:

Continuous Semantic Representations

[Figure: words embedded in a continuous space, with related words clustered together, e.g., {sunny, rainy, windy, cloudy}, {car, wheel, cab}, {sad, joy, emotion, feeling}.]

Page 5:

Semantics Needs More Than Similarity

Tomorrow will be rainy.
Tomorrow will be sunny.

rainy, sunny: similar or opposite?

Page 6:

Leverage Linguistic Resources

Can’t we just use the existing linguistic resources?

Knowledge in these resources is never complete, and they often lack the degree of a relation.

Create a continuous semantic representation that
  Leverages existing rich linguistic resources
  Discovers new relations
  Enables us to measure the degree of multiple relations (not just similarity)

Page 7:

Roadmap

Introduction
Background: Latent Semantic Analysis (LSA), Polarity Inducing LSA (PILSA)
Multi-Relational Latent Semantic Analysis (MRLSA)
  Encoding multi-relational data in a tensor
  Tensor decomposition & measuring the degree of a relation
Experiments
Conclusions


Page 9:

Latent Semantic Analysis [Deerwester+ 1990]

Data representation: encode single-relational data in a matrix
  Co-occurrence (e.g., from a general corpus)
  Synonyms (e.g., from a thesaurus)
Factorization: apply SVD to the matrix to find latent components
Measuring degree of relation: cosine of latent vectors

Page 10:

Encode Synonyms in Matrix

Input: synonyms from a thesaurus
  Joyfulness: joy, gladden
  Sad: sorrow, sadden

Target word: row vector; term: column vector

                        joy  gladden  sorrow  sadden  goodwill
Group 1: "joyfulness"     1        1       0       0         0
Group 2: "sad"            0        0       1       1         0
Group 3: "affection"      0        0       0       0         1

Cosine score: similarity between two terms is the cosine of their column vectors.
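A minimal sketch of this encoding in Python (NumPy), using only the toy vocabulary and thesaurus groups from this slide:

```python
import numpy as np

# Terms are columns; thesaurus groups (target words) are rows.
terms = ["joy", "gladden", "sorrow", "sadden", "goodwill"]
term_idx = {t: i for i, t in enumerate(terms)}

groups = {
    "joyfulness": ["joy", "gladden"],
    "sad":        ["sorrow", "sadden"],
    "affection":  ["goodwill"],
}

# W[g, t] = 1 if term t is listed as a synonym of group g's target word.
W = np.zeros((len(groups), len(terms)))
for g, (target, synonyms) in enumerate(groups.items()):
    for t in synonyms:
        W[g, term_idx[t]] = 1.0
```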

Page 11:

Mapping to Latent Space via SVD

$\mathbf{W}_{d \times n} \approx \mathbf{U}_{d \times k}\, \boldsymbol{\Sigma}_{k \times k}\, \mathbf{V}^T_{k \times n}$   (columns of $\mathbf{W}$ correspond to terms)

SVD generalizes the original data
Uncovers relationships not explicit in the thesaurus
Term vectors are projected to a $k$-dimensional latent space
Word similarity: cosine of two column vectors in the latent space (i.e., of $\boldsymbol{\Sigma}\mathbf{V}^T$)
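Continuing the sketch above, the SVD step and the cosine between two term (column) vectors in the latent space might look like this; the rank k = 2 is an arbitrary choice for the toy matrix:

```python
# Truncated SVD: W ≈ U Σ V^T
U, s, Vt = np.linalg.svd(W, full_matrices=False)
k = 2
latent_terms = np.diag(s[:k]) @ Vt[:k, :]   # columns of Σ V^T = term vectors in latent space

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

print(cosine(latent_terms[:, term_idx["joy"]],
             latent_terms[:, term_idx["gladden"]]))   # high: joy and gladden are synonyms
```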

Page 12:

Problem: Handling Two Opposite Relations

Synonyms & antonyms

LSA cannot distinguish antonyms [Landauer 2002]

“Distinguishing synonyms and antonyms is still perceived as a difficult open problem.” [Poon & Domingos 09]

Page 13:

Polarity Inducing LSA [Yih, Zweig, Platt 2012]

Data representation: encode two opposite relations in a matrix using "polarity"
  Synonyms & antonyms (e.g., from a thesaurus)
Factorization: apply SVD to the matrix to find latent components
Measuring degree of relation: cosine of latent vectors

Page 14:

Encode Synonyms & Antonyms in Matrix

Inducing polarity: a group's synonyms are marked +1, its antonyms -1
  Joyfulness: joy, gladden (synonyms); sorrow, sadden (antonyms)
  Sad: sorrow, sadden (synonyms); joy, gladden (antonyms)

Target word: row vector

                        joy  gladden  sorrow  sadden  goodwill
Group 1: "joyfulness"     1        1      -1      -1         0
Group 2: "sad"           -1       -1       1       1         0
Group 3: "affection"      0        0       0       0         1

Cosine score: synonyms now have positive cosine, antonyms negative.
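A sketch of the polarity trick on the same toy vocabulary: synonyms of a group are encoded as +1 and antonyms as -1, so that after the SVD step shown earlier the cosine between a word and its antonym tends to come out negative.

```python
import numpy as np

terms = ["joy", "gladden", "sorrow", "sadden", "goodwill"]
term_idx = {t: i for i, t in enumerate(terms)}

# Each group: (synonyms, antonyms) of the target word, as in the slide.
groups = {
    "joyfulness": (["joy", "gladden"], ["sorrow", "sadden"]),
    "sad":        (["sorrow", "sadden"], ["joy", "gladden"]),
    "affection":  (["goodwill"], []),
}

W = np.zeros((len(groups), len(terms)))
for g, (syns, ants) in enumerate(groups.values()):
    for t in syns:
        W[g, term_idx[t]] = +1.0   # synonym: positive polarity
    for t in ants:
        W[g, term_idx[t]] = -1.0   # antonym: negative polarity
```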


Page 16:

Problem: How to Handle More Relations?

Limitation of the matrix representation: each entry captures a particular type of relation between two entities, or two opposite relations with the polarity trick.

Encoding other binary relations:
  Is-A (hyponym): an ostrich is a bird
  Part-whole: an engine is a part of a car

Encode multiple relations in a 3-way tensor (3-dimensional array)!

Page 17:

Multi-Relational LSA

Data representation: encode multiple relations in a tensor
  Synonyms, antonyms, hyponyms (is-a), … (e.g., from a linguistic knowledge base)
Factorization: apply tensor decomposition to the tensor to find latent components
Measuring degree of relation: cosine of latent vectors after projection


Page 22:

Encode Multiple Relations in Tensor

Represent word relations using a tensor: each slice encodes a relation between terms and target words. Construct a tensor with two slices.

Synonym layer (rows: target words; columns: terms joy, gladden, sadden, feeling)
  joyfulness   1 1 0 0
  gladden      1 1 0 0
  sad          0 0 1 0
  anger        0 0 0 0

Antonym layer
  joyfulness   0 0 0 0
  gladden      0 0 1 0
  sad          1 0 0 0
  anger        0 0 0 0

Page 23:

Encode Multiple Relations in Tensor

Can encode multiple relations in the tensor, e.g., by stacking a hyponym (is-a) layer on the synonym and antonym layers above.

Hyponym layer (rows: target words; columns: terms joy, gladden, sadden, feeling)
  joyfulness   0 0 0 1
  gladden      0 0 0 0
  sad          0 0 0 1
  anger        0 0 0 1
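A minimal sketch of the tensor construction in NumPy, indexed as (target word, term, relation) and filled with the toy slices from these two slides:

```python
import numpy as np

targets   = ["joyfulness", "gladden", "sad", "anger"]   # rows of each slice
terms     = ["joy", "gladden", "sadden", "feeling"]     # columns of each slice
relations = ["syn", "ant", "hypo"]                      # one slice per relation

syn = np.array([[1, 1, 0, 0],
                [1, 1, 0, 0],
                [0, 0, 1, 0],
                [0, 0, 0, 0]], dtype=float)
ant = np.array([[0, 0, 0, 0],
                [0, 0, 1, 0],
                [1, 0, 0, 0],
                [0, 0, 0, 0]], dtype=float)
hypo = np.array([[0, 0, 0, 1],
                 [0, 0, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 0, 1]], dtype=float)

T = np.stack([syn, ant, hypo], axis=-1)   # shape: (4 target words, 4 terms, 3 relations)
```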

Page 24:

Multi-Relational LSA

Data representation: encode multiple relations in a tensor
  Synonyms, antonyms, hyponyms (is-a), … (e.g., from a linguistic knowledge base)
Factorization: apply tensor decomposition to the tensor to find latent components
Measuring degree of relation: cosine of latent vectors after projection

Page 25:

Tensor Decomposition – Analogy to SVD

Derive a low-rank approximation to generalize the data and to discover unseen relations.
Apply Tucker decomposition and reformulate the results.

[Figure: the tensor, over target words $w_1,\dots,w_d$ and terms $t_1,\dots,t_n$, is approximated by a word factor matrix, a small rank-$r$ core tensor, and a term factor matrix whose rows are the latent representations of words.]

Page 26:

Tensor Decomposition – Analogy to SVD

Derive a low-rank approximation to generalize the data and to discover unseen relations.
Apply Tucker decomposition and reformulate the results.

[Figure: as above, but the relation mode is factored as well, so each relation $R_1,\dots,R_m$ gets its own latent representation in addition to the latent representations of the words.]
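The deck calls for a Tucker decomposition of this tensor; as a hedged, self-contained stand-in, here is a truncated higher-order SVD (HOSVD) in NumPy, which yields the same kind of output (a small core tensor plus one factor matrix per mode). The authors' actual solver and reformulation may differ.

```python
import numpy as np

def unfold(T, mode):
    """Mode-n matricization: move the given mode to the front and flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_dot(T, M, mode):
    """Multiply tensor T by matrix M along the given mode."""
    return np.moveaxis(np.tensordot(M, np.moveaxis(T, mode, 0), axes=1), 0, mode)

def hosvd(T, ranks):
    """Truncated HOSVD: factor matrices from the unfoldings, then a projected core."""
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        factors.append(U[:, :r])
    core = T
    for mode, U in enumerate(factors):
        core = mode_dot(core, U.T, mode)
    return core, factors

# Example with the toy tensor T from the earlier sketch:
# core, (U_targets, V_terms, R_rels) = hosvd(T, ranks=(2, 2, 3))
# Rows of V_terms then play the role of the latent word representations on the slide.
```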

Page 27:

Multi-Relational LSA

Data representation: encode multiple relations in a tensor
  Synonyms, antonyms, hyponyms (is-a), … (e.g., from a linguistic knowledge base)
Factorization: apply tensor decomposition to the tensor to find latent components
Measuring degree of relation: cosine of latent vectors after projection

Page 28:

Measure Degree of Relation

Similarity: cosine of the latent vectors

Other relations (both symmetric and asymmetric):
  Take the latent matrix of the pivot relation (synonym)
  Take the latent matrix of the relation
  Cosine of the latent vectors after projection

Page 29:

Measure Degree of Relation: Raw Representation

$ant(\text{joy}, \text{sadden}) = \cos(\mathcal{W}_{:,\,\text{joy},\,syn},\ \mathcal{W}_{:,\,\text{sadden},\,ant})$

Take the column of "joy" in the synonym layer and the column of "sadden" in the antonym layer (the slices shown on Page 22), and compute their cosine.


Page 31:

Estimate the Degree of a Relation: Raw Representation

$Hyper(\text{joy}, \text{feeling}) = \cos(\mathcal{W}_{:,\,\text{joy},\,syn},\ \mathcal{W}_{:,\,\text{feeling},\,hyper})$

Take the column of "joy" in the synonym layer and the column of "feeling" in the hypernym (is-a) layer, and compute their cosine.

Page 32:

Measure Degree of Relation: Raw Representation

$rel(w_i, w_j) = \cos(\mathcal{W}_{:,\,w_i,\,syn},\ \mathcal{W}_{:,\,w_j,\,rel})$

Compare the column of $w_i$ in the synonym layer with the column of $w_j$ in the slice of the specific relation.
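As a sketch, this raw-representation score translates directly into NumPy; it assumes the toy tensor T and the targets/terms/relations lists from the earlier sketch:

```python
import numpy as np

term_idx = {t: i for i, t in enumerate(terms)}
rel_idx  = {r: i for i, r in enumerate(relations)}

def rel_score_raw(w_i, w_j, rel):
    """cos( T[:, w_i, syn], T[:, w_j, rel] ) on the raw tensor."""
    a = T[:, term_idx[w_i], rel_idx["syn"]]   # column of w_i in the synonym slice
    b = T[:, term_idx[w_j], rel_idx[rel]]     # column of w_j in the slice of relation rel
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

print(rel_score_raw("joy", "sadden", "ant"))     # > 0: joy and sadden look like antonyms
print(rel_score_raw("joy", "feeling", "hypo"))   # > 0: joy looks like a kind of feeling
```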

Page 33:

Measure Degree of Relation: Latent Representation

$rel(w_i, w_j) = \cos(\,\mathcal{S}_{:,:,\,syn}\, \mathbf{V}_{i,:}^T,\ \ \mathcal{S}_{:,:,\,rel}\, \mathbf{V}_{j,:}^T\,)$

where $\mathbf{V}_{i,:}$ is the latent vector of word $w_i$ (a row of the term factor $\mathbf{V}$) and $\mathcal{S}_{:,:,\,rel}$ is the latent slice of relation $rel$: project each word's latent vector through the appropriate relation slice, then take the cosine.
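Continuing the earlier sketches (the toy tensor T, the hosvd helper, and the term/relation index maps), a hedged sketch of this latent-representation score: fold the relation factor back into the core to get one latent slice per relation, take each word's latent vector as a row of the term factor V, project through the slices, and compare with the cosine.

```python
import numpy as np

core, (U, V, R) = hosvd(T, ranks=(2, 2, 3))

# One latent slice per relation: contract the core with the relation factor.
S = np.tensordot(core, R, axes=([2], [1]))   # shape: (r1, r2, number of relations)

def rel_score_latent(w_i, w_j, rel):
    """cos( S[:, :, syn] @ v_i,  S[:, :, rel] @ v_j ), with v = row of V."""
    v_i, v_j = V[term_idx[w_i]], V[term_idx[w_j]]
    a = S[:, :, rel_idx["syn"]] @ v_i
    b = S[:, :, rel_idx[rel]] @ v_j
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

print(rel_score_latent("joy", "sadden", "ant"))
```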

Page 34:

Roadmap

Introduction
Background: Latent Semantic Analysis (LSA), Polarity Inducing LSA (PILSA)
Multi-Relational Latent Semantic Analysis (MRLSA)
  Encoding multi-relational data in a tensor
  Tensor decomposition & measuring the degree of a relation
Experiments
Conclusions

Page 35:

Experiment: Data for Building MRLSA Model

Encarta Thesaurus: records synonyms and antonyms of target words; vocabulary of 50k terms and 47k target words

WordNet: has synonym, antonym, hyponym, and hypernym relations; vocabulary of 149k terms and 117k target words

Goals:
  MRLSA generalizes LSA to model multiple relations
  Improve performance by combining heterogeneous data

Page 36:

Example Antonyms Output by MRLSA

Target       High Score Words
inanimate    alive, living, bodily, in-the-flesh, incarnate
alleviate    exacerbate, make-worse, in-flame, amplify, stir-up
relish       detest, abhor, abominate, despise, loathe

* Words in blue are antonyms listed in the Encarta thesaurus.

Page 37:

Results – GRE Antonym Test

Task: GRE closest-opposite questions
  Which is the closest opposite of adulterate? (a) renounce (b) forbid (c) purify (d) criticize (e) correct

Accuracy:
  Mohammad et al. 08   0.64
  Lookup               0.56
  PILSA                0.74
  MRLSA                0.77

Page 38:

Example Hyponyms Output by MRLSA

Target        High Score Words
bird          ostrich, gamecock, nighthawk, amazon, parrot
automobile    minivan, wagon, taxi, minicab, gypsy cab
vegetable     buttercrunch, yellow turnip, romaine, chipotle, chilli

Page 39:

Results – Relational Similarity (SemEval-2012)

Task: Class-Inclusion relation (is a kind of)
  Most/least illustrative word pairs: (a) art:abstract (b) song:opera (c) footwear:boot (d) hair:brown

Accuracy:
  UTD      0.34
  Lookup   0.37
  MRLSA    0.56

Page 40:

Problem: Use Relational Domain Knowledge

Relational domain knowledge: the entity type. A relation can only hold between the right types of entities.
  Words in an is-a relation have the same part of speech.
  For the relation born-in, the entity types are (person, location).

Leverage type information to improve MRLSA.
Idea #3: change the objective function.

Page 41:

Conclusions

Continuous semantic representation that
  Leverages existing rich linguistic resources
  Discovers new relations
  Enables us to measure the degree of multiple relations

Approaches
  Better data representation
  Matrix/tensor decomposition

Challenges & future work
  Capture more types of knowledge in the model
  Support more sophisticated inferential tasks