Vector-based Models of Semantic Composition

Mirella Lapata and Jeff Mitchell
School of Informatics, University of Edinburgh

Seminar für Computerlinguistik, Heidelberg



Outline

1. Introduction: Semantic Space Models, Logic-based View, Connectionism
2. Composition Models
3. Evaluation: Phrase Similarity Task, Language Modeling
4. Conclusions

Introduction

Distributional Hypothesis

"You shall know a word by the company it keeps" (Firth, 1957).

- A word's context provides information about its meaning.
- Words are similar if they share similar linguistic contexts.
- Distributional vs. semantic similarity.

[Figure: neighbourhoods of the ambiguous word "space": one cluster with address, storage, disk, feet, square, leased; another with shuttle, astronaut, NASA.]

A Simple Semantic Space

Stuart B. Opotowsky was named vice president for this company with interests in insurance, tobacco, hotels and broadcasting.

- Select the 2,000 most common content words as contexts.
- Use a five-word context window on each side of the target word.
- Convert counts to probabilities: p(c|w).
- Divide through by the probability of each context word: p(c|w)/p(c).
- Cosine similarity: sim(w1, w2) = (w1 · w2) / (|w1| |w2|).
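The cosine measure can be sketched directly; the vectors below are made-up toy examples, not taken from the slide's corpus:

```python
import numpy as np

def cosine(w1, w2):
    """Cosine similarity: sim(w1, w2) = (w1 . w2) / (|w1| |w2|)."""
    w1, w2 = np.asarray(w1, float), np.asarray(w2, float)
    return float(w1 @ w2 / (np.linalg.norm(w1) * np.linalg.norm(w2)))

# Vectors pointing in the same direction score 1, orthogonal vectors score 0.
print(round(cosine([1, 2, 3], [2, 4, 6]), 6))  # 1.0
print(cosine([1, 0], [0, 1]))                  # 0.0
```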


The context vector for company is built up in stages.

Binary co-occurrence (does the context word appear near the target?):

             vice  president  interests  insurance  ...
  company     1        1          1          1      ...

Raw co-occurrence counts:

             vice  president  tax  interests  ...
  company     25      103      19      55     ...

Conditional probabilities p(c|w):

             vice  president  tax   interests  ...
  company    0.06     0.26    0.05     0.14    ...

Ratios p(c|w)/p(c):

             vice  president  tax   interests  ...
  company    1.52     2.32    1.14     1.06    ...
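The count-to-ratio transformation can be sketched as follows; the corpus-wide totals and N are made up for illustration, not the slide's actual figures:

```python
import numpy as np

# Toy co-occurrence counts: one target word, four context words.
contexts = ["vice", "president", "tax", "interests"]
counts = {"company": np.array([25.0, 103.0, 19.0, 55.0])}
context_totals = np.array([400.0, 1200.0, 500.0, 900.0])  # corpus-wide counts (made up)
N = 10000.0                                               # total context tokens (made up)

def ratio_vector(word):
    """p(c|w) / p(c): how much more often c occurs near w than by chance."""
    c = counts[word]
    p_c_given_w = c / c.sum()   # condition on the target word
    p_c = context_totals / N    # marginal probability of each context word
    return p_c_given_w / p_c

print(ratio_vector("company").round(2))
```

Values above 1 mean a context word co-occurs with the target more often than chance would predict.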

An Alternative: Topic Models

Key idea: documents are mixtures of topics, and topics are probability distributions over words (Blei et al., 2003; Griffiths and Steyvers, 2002, 2003, 2004).

Topic models are generative and structured. For a new document:

1. Choose a distribution over topics.
2. Choose a topic at random according to that distribution.
3. Draw a word from that topic.

Statistical techniques are used to invert the process: infer the set of topics that were responsible for generating a collection of documents.
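The three-step generative process can be sketched like this, using a toy two-topic model in the spirit of the bank/river example; all distributions here are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

# Topics are probability distributions over words.
topics = {
    0: {"money": 0.5, "loan": 0.2, "bank": 0.3},
    1: {"river": 0.4, "stream": 0.3, "bank": 0.3},
}

def generate_document(topic_mixture, length=8):
    words = []
    for _ in range(length):
        z = rng.choice(len(topic_mixture), p=topic_mixture)  # step 2: pick a topic
        vocab = list(topics[z])
        probs = list(topics[z].values())
        words.append(rng.choice(vocab, p=probs))             # step 3: draw a word
    return words

# Step 1: choose a distribution over topics for each document.
print(generate_document([1.0, 0.0]))   # only finance-topic words
print(generate_document([0.5, 0.5]))   # a mixture of both topics
```

Inference (e.g. Gibbs sampling in Griffiths and Steyvers' work) runs this process in reverse, recovering the topic assignments from the words alone.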

Probabilistic Generative Process and Statistical Inference

[Figure: TOPIC 1 is a distribution over money, loan, bank; TOPIC 2 over river, stream, bank. DOC1 draws only from TOPIC 1 (weight 1.0), DOC2 mixes the two topics (.5/.5), and DOC3 draws only from TOPIC 2; in the generative direction every word token is tagged with the topic that produced it. Statistical inference runs the process in reverse: given only the untagged documents, recover the topics and the per-word topic assignments.]

Meaning Representation

              Topic 1  Topic 2  ...  Topic n
  practical     0.39     0.02   ...
  difficulty    0.03     0.44   ...
  produce       0.06     0.17   ...

Topic 2: difficulty, problem, situation, crisis, hardship

- Topics are the dimensions of the space (500 or 1,000).
- Vector components: the probability of a word given a topic.
- Topics correspond to coarse-grained sense distinctions.
- Cosine similarity can be used (probabilistic alternatives exist).

Semantic Space Models

Semantic space models are extremely popular across disciplines!

- Semantic priming (Lund and Burgess, 1996)
- Text comprehension (Landauer and Dumais, 1997)
- Word association (McDonald, 2000)
- Information retrieval (Salton et al., 1975)
- Thesaurus extraction (Grefenstette, 1994)
- Word sense disambiguation (Schütze, 1998)
- Text segmentation (Hirst, 1997)
- Automatic and language independent

Catch: these are representations of the meaning of single words. What about phrases or sentences?


Quick Fix

It was not the sales manager who hit the bottle that day, but the office worker with the serious drinking problem.

That day the office manager, who was drinking, hit the problem sales worker with the bottle, but it was not serious.

- Vector averaging: p = 1/2 (u + v) (Foltz et al., 1998; Landauer et al., 1997); syntax insensitive, so the two sentences above receive the same representation.
- Add a neighbor to the sum: p = u + v + n (Kintsch, 2001); the meaning of a predicate depends on its argument.
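A quick numeric check of why averaging is syntax insensitive: the two example sentences contain the same content words, so their averaged vectors come out identical. The word vectors here are random and purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)
vocab = ["sales", "manager", "hit", "bottle", "day", "office",
         "worker", "serious", "drinking", "problem"]
vec = {w: rng.normal(size=5) for w in vocab}

def average(words):
    """Vector averaging: p = (1/n) * sum of the word vectors."""
    return np.mean([vec[w] for w in words], axis=0)

s1 = ["sales", "manager", "hit", "bottle", "day", "office",
      "worker", "serious", "drinking", "problem"]
s2 = ["day", "office", "manager", "drinking", "hit", "problem",
      "sales", "worker", "bottle", "serious"]  # same words, different syntax

print(np.allclose(average(s1), average(s2)))  # True
```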


Logic-based View

The meaning of the whole is a function of the meaning of its parts (Frege, 1957).

  a:     λu.λv.∃x(u@x ∧ v@x)
  horse: λy.HORSE(y)
  ran:   λz.RUN(z)

  a horse ran: ∃x(HORSE(x) ∧ RUN(x))

- Logic can account for sentential meaning (Montague, 1974).
- Differences in meaning are qualitative rather than quantitative.
- Logic cannot express degrees of similarity.
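The composed formula follows by two β-reductions, sketched here in the slide's notation (with @ for application):

```latex
\begin{align*}
(\lambda u.\lambda v.\,\exists x(u@x \land v@x))\;@\;\lambda y.\mathrm{HORSE}(y)
  &\Rightarrow \lambda v.\,\exists x(\mathrm{HORSE}(x) \land v@x) \\
(\lambda v.\,\exists x(\mathrm{HORSE}(x) \land v@x))\;@\;\lambda z.\mathrm{RUN}(z)
  &\Rightarrow \exists x(\mathrm{HORSE}(x) \land \mathrm{RUN}(x))
\end{align*}
```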


Compositionality

- Partee (1995): the meaning of the whole is a function of the meaning of the parts and of the way they are syntactically combined.
- Lakoff (1977): the meaning of the whole is greater than the meaning of the parts.
- Frege (1884): never ask the meaning of a word in isolation but only in the context of a statement.
- Pinker (1994): composition of simple elements must allow the construction of novel meanings which go beyond those of the individual elements.


Connectionism

[Figure: binding two three-unit vectors u and v; the tensor product is the 3×3 grid of pairwise products u_i v_j.]

- Tensor products: p = u ⊗ v (Smolensky, 1990); dimensionality grows with each composition.
- Circular convolution: p = u ⊛ v (Plate, 1991); components are randomly distributed.
- Spatter codes: take the XOR of two vectors (Kanerva, 1998); components are random bits.
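The dimensionality contrast can be seen directly with random vectors. The XOR line is only a sketch of the flavor of Kanerva's spatter codes (binary vectors, size-preserving composition), not a full implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Tensor product: composing two n-dimensional vectors yields n*n numbers,
# so the representation grows with every composition.
u, v = rng.normal(size=3), rng.normal(size=3)
p = np.outer(u, v)          # p[i, j] = u[i] * v[j]
print(p.shape)              # (3, 3)

# Spatter codes: vectors are random bits and composition is XOR,
# so the result stays the same size as the inputs.
a = rng.integers(0, 2, size=16)
b = rng.integers(0, 2, size=16)
print(np.bitwise_xor(a, b).shape)  # (16,)
```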


Composition Models

A Framework for Semantic Composition

  p = f(u, v, R, K)

where p is the composition of u and v, R is the syntactic relationship between them (e.g. OBJ), and K is background knowledge.

Assumptions:

1. Eliminate background knowledge K.
2. Vary the syntactic relationship R.
3. p is in the same space as u and v.
4. f is a linear function of the Cartesian product of u and v (additive models).
5. f is a linear function of the tensor product of u and v (multiplicative models).
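The role of R can be sketched as a function that selects different composition parameters per syntactic relation. This is a hypothetical interface; the relation names and the matrices A and B below are placeholders for illustration, not trained values:

```python
import numpy as np

# One (A, B) pair per syntactic relation: p = A u + B v.
PARAMS = {
    "OBJ":  (np.eye(3), 2 * np.eye(3)),   # hypothetical: weight the second word more
    "SUBJ": (2 * np.eye(3), np.eye(3)),
}

def compose(u, v, relation):
    """Additive instance of p = f(u, v, R), with K eliminated."""
    A, B = PARAMS[relation]
    return A @ u + B @ v

u, v = np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0, 1.0])
print(compose(u, v, "OBJ"))   # [1. 2. 3.]
```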


Models

Additive Models

General form: p = Au + Bv

Instances:

  p = u + v
  p = u + v + Σᵢ nᵢ
  p = αu + βv
  p = v

             music  solution  economy  craft  create
  practical    0       6         2      10      4
  difficulty   1       8         4       4      0
  problem      2      15         7       9      1

  practical + difficulty = [1 14 6 14 4]
  practical + difficulty + problem = [3 29 13 23 5]
  0.4 · practical + 0.6 · difficulty = [0.6 7.2 3.2 6.4 1.6]
  difficulty = [1 8 4 4 0]
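The worked examples can be reproduced directly from the table (the weighted case uses α = 0.4 and β = 0.6; note the second component works out to 0.4·6 + 0.6·8 = 7.2):

```python
import numpy as np

#                       music  solution  economy  craft  create
practical  = np.array([0, 6, 2, 10, 4])
difficulty = np.array([1, 8, 4, 4, 0])
problem    = np.array([2, 15, 7, 9, 1])

print((practical + difficulty).tolist())            # [1, 14, 6, 14, 4]
print((practical + difficulty + problem).tolist())  # [3, 29, 13, 23, 5]
print((0.4 * practical + 0.6 * difficulty).round(1).tolist())
```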


Models

Multiplicative Models

General form: p = Cuv

Instances:

  p = u ⊙ v,   p_i = u_i v_i
  p = u ⊗ v,   p_ij = u_i · v_j
  p = u ⊛ v,   p_i = Σ_j u_j · v_(i−j)

             music  solution  economy  craft  create
  practical    0       6         2      10      4
  difficulty   1       8         4       4      0

  practical ⊙ difficulty = [0 48 8 40 0]

  practical ⊗ difficulty =
     0   0   0   0   0
     6  48  24  24   0
     2  16   8   8   0
    10  80  40  40   0
     4  32  16  16   0

  practical ⊛ difficulty = [116 50 66 62 80]
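The three operations on the table's rows can be checked in a few lines. The circular convolution below uses the 0-based convention p_i = Σ_j u_j v_((i−j) mod n); the slide's 1-based indexing lists the same components in reverse order:

```python
import numpy as np

practical  = np.array([0, 6, 2, 10, 4])
difficulty = np.array([1, 8, 4, 4, 0])

# Element-wise product: p_i = u_i * v_i
print((practical * difficulty).tolist())            # [0, 48, 8, 40, 0]

# Tensor product: p[i, j] = u_i * v_j (second row, for u_1 = 6)
print(np.outer(practical, difficulty)[1].tolist())  # [6, 48, 24, 24, 0]

# Circular convolution: p_i = sum_j u_j * v_{(i-j) mod n}
n = len(practical)
conv = [int(sum(practical[j] * difficulty[(i - j) % n] for j in range(n)))
        for i in range(n)]
print(conv)  # [80, 62, 66, 50, 116]
```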


Models

Dilation Models

A special case of p = Cuv is p = Uv, where U is the diagonal matrix with U_ii = u_i and U_ij = 0 for i ≠ j.

Decompose v into a component along u and an orthogonal remainder:

  x = ((u·v)/(u·u)) u
  y = v − x = v − ((u·v)/(u·u)) u

Dilate the component along u by a factor λ:

  v′ = λx + y = (λ − 1) ((u·v)/(u·u)) u + v

Scaling by (u·u) gives:

  p = (λ − 1)(u·v) u + (u·u) v

[Figure: v decomposed into x, its projection onto u, and the orthogonal component y.]
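The decomposition and the final formula can be checked numerically with arbitrary example vectors:

```python
import numpy as np

def dilate(u, v, lam):
    """p = (lam - 1)(u . v) u + (u . u) v  -- stretch v along u by lam."""
    return (lam - 1) * (u @ v) * u + (u @ u) * v

u = np.array([1.0, 2.0, 0.0])
v = np.array([3.0, 1.0, 1.0])
lam = 2.0

# Sanity check against the derivation: p is (u.u) times v' = lam*x + y.
x = (u @ v) / (u @ u) * u        # projection of v onto u
y = v - x                        # orthogonal remainder
assert np.allclose(dilate(u, v, lam), (u @ u) * (lam * x + y))
print(dilate(u, v, lam).tolist())  # [20.0, 15.0, 5.0]
```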


Evaluation · Phrase Similarity Task

Phrase Similarity Task

Originally proposed in Kintsch (2002):

Elicit similarity judgments for adjective-noun, noun-noun, verb-object combinations.
Phrase pairs from three bands: High, Medium, Low.
Compute vectors for phrases, measure their similarity.
Correlate model similarities with human ratings.

                High             Medium           Low
old person      elderly lady     right hand       small house
kitchen door    bedroom window   office worker    housing department
produce effect  achieve result   consider matter  start work

Mirella Lapata and Jeff Mitchell 18


Evaluation · Phrase Similarity Task

Experimental Setup

Similarity Ratings
36 pairs (adj-noun, noun-noun, verb-noun) × 3 bands
(324 pairs in total, created automatically, substitutability test)
Ratings collected using Webexp (90 participants)
Participants use 7-point similarity scale

Semantic Space
Compare simple semantic space against LDA topic model (Blei et al. 2003)
2000 dimensions vs. 100 topics, using cosine similarity measure
Parameters for composition models tuned on dev set

Mirella Lapata and Jeff Mitchell 19
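The evaluation pipeline described above — compose two word vectors per phrase, take the cosine similarity of the composed vectors, then correlate model similarities with human ratings — can be sketched as follows (composition function, correlation choice, and all names are mine, for illustration only):

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def compose(u, v):
    """Placeholder composition function: pointwise multiplication."""
    return [ui * vi for ui, vi in zip(u, v)]

def pearson(xs, ys):
    """Pearson correlation between model similarities and ratings."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

For each phrase pair, `cosine(compose(u1, v1), compose(u2, v2))` gives the model score; `pearson` (or a rank correlation) is then computed against the elicited 7-point ratings.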

Evaluation · Phrase Similarity Task

Results (for verb-obj)

Model                  Simple   LDA
Additive               0.30     0.40
Kintsch                0.29     0.33
Weighted Additive      0.34     0.40
Multiplicative         0.37     0.34
Tensor Product         0.33     0.33
Circular Convolution   0.10     0.12
Dilation               0.38     0.41
Head Only              0.24     0.17
Humans                 0.55

Multiplicative and dilation models best for simple space
Dilation and Additive models best for LDA model
Circular convolution is worst performing model

Mirella Lapata and Jeff Mitchell 20



Evaluation · Phrase Similarity Task

Interim Summary

General framework of semantic composition
Different composition functions appropriate for different representations (additive vs. multiplicative)
Dilation models overall best, syntax sensitive, parametric
Results generalize to noun-noun, adj-noun, verb-obj combinations
What are composition models good for?

Mirella Lapata and Jeff Mitchell 21

Evaluation · Phrase Similarity Task

Modeling Brain Activity

Tom Mitchell and collaborators:
Wang et al., 2003; Mitchell et al., 2004; Mitchell et al., 2008; Hutchinson et al., 2009; Chang et al., 2009; Rustandi, 2009

Can we observe differences in neural activity as people think about different concepts?
Can we use vector-based models to explain observed neural activity?

Mirella Lapata and Jeff Mitchell 22

Evaluation · Phrase Similarity Task

Functional MRI

Monitors brain activity when people comprehend words or phrases.
Measures changes related to blood flow and blood oxygenation.

Mirella Lapata and Jeff Mitchell 23


Evaluation · Phrase Similarity Task

Functional MRI

[fMRI activation images for the stimulus phrases 'soft bear' and 'strong dog']

Mirella Lapata and Jeff Mitchell 24

Evaluation · Phrase Similarity Task

Chang et al. (ACL, 2009)

Participants see adjective-noun phrases
Adjectives emphasize semantic properties of nouns
Use vector-based models to account for variance in neural activity
Train regression model to fit activation profile of stimuli
Multiplicative model outperforms non-compositional and additive model

Mirella Lapata and Jeff Mitchell 25

Evaluation · Phrase Similarity Task

Interim Summary

General framework of semantic composition
Different composition functions appropriate for different representations (additive vs. multiplicative)
Dilation models overall best, syntax sensitive, parametric
Results generalize to noun-noun, adj-noun, verb-obj combinations
What are composition models good for?
  modeling brain activity
  sentential priming, inductive inference
  textual entailment, information retrieval, language modeling

Mirella Lapata and Jeff Mitchell 26


Evaluation · Language Modeling

Language Modeling

What is the next word?

He is now president and chief operating officer of the company.

'chief operating' is followed by 'officer' 99% of the time.
'of the' is very frequent but not very predictive.
Prior content is indicative of the domain the vocabulary is drawn from.

Given semantic representations for 'president', 'chief', 'operating' and 'officer', how do we combine them to make the most predictive representation of this history?

Mirella Lapata and Jeff Mitchell 27

Evaluation · Language Modeling

Language Modeling

Use vector composition in a language model as a way of capturing long-range dependencies.
Not a new idea: Bellegarda (2000), Coccaro & Jurafsky (1998), Gildea & Hofmann (1999), Deng and Khudanpur (2003)
How to combine vectors? How to construct them?
Focus on multiplicative and additive models.

Mirella Lapata and Jeff Mitchell 28

Evaluation · Language Modeling

A Language Model Based on Vector Composition

He is now president and chief operating officer of the company

p(company | president, chief, operating, officer)

p(w|h) = sim(w, h)

sim(w, h) ∝ w · h = Σᵢ wᵢhᵢ = Σᵢ [p(cᵢ|w)/p(cᵢ)] · [p(cᵢ|h)/p(cᵢ)]

p(w|h) = p(w) Σᵢ [p(cᵢ|w)/p(cᵢ)] · [p(cᵢ|h)/p(cᵢ)] · p(cᵢ)

hₙ = f(wₙ, hₙ₋₁),  h₁ = w₁

Mirella Lapata and Jeff Mitchell 29
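The similarity-based probability above — vector components are the ratios p(cᵢ|w)/p(cᵢ) and p(cᵢ|h)/p(cᵢ), and the history is built up incrementally with a composition function f — can be sketched as follows (names are mine; f is shown as pointwise multiplication, one of the two options the talk focuses on):

```python
def lm_prob(p_w, w_vec, h_vec, p_c):
    """p(w|h) = p(w) * sum_i w_i * h_i * p(c_i), where the vector
    components are ratios w_i = p(c_i|w)/p(c_i), h_i = p(c_i|h)/p(c_i)."""
    return p_w * sum(wi * hi * pc for wi, hi, pc in zip(w_vec, h_vec, p_c))

def update_history(h_vec, w_vec):
    """h_n = f(w_n, h_{n-1}); here f is pointwise multiplication
    (additive composition is the other candidate)."""
    return [hi * wi for hi, wi in zip(h_vec, w_vec)]
```

A useful sanity check: if a word and history are uninformative about the topics (all ratios equal 1), the sum collapses to Σᵢ p(cᵢ) = 1 and the model backs off to the unigram probability p(w).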

Evaluation · Language Modeling

Experimental Setup

BLLIP Corpus
Training set: 38M words
Development set: 50K words
Test set: 50K words

Numbers replaced with <NUM>
Vocabulary of 20K word types
Others replaced with <UNK>
Perplexity of model predictions on test set
Compare simple semantic space against LDA topic model

Mirella Lapata and Jeff Mitchell 30

Evaluation · Language Modeling

Integrating with an Ngram Model

Linear interpolation:

λ p₁(w) + (1 − λ) p₂(w)

But this will be most effective when the models are comparable in predictiveness.

Modify p(w|h):

p(wₙ) Σᵢ [p(cᵢ|wₙ)/p(cᵢ)] · [p(cᵢ|h)/p(cᵢ)] · p(cᵢ)

becomes

p(wₙ|wₙ₋₁, wₙ₋₂) Σᵢ [p(cᵢ|wₙ)/p(cᵢ)] · [p(cᵢ|h)/p(cᵢ)] · p(cᵢ)

Mirella Lapata and Jeff Mitchell 31
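The two combination strategies on this slide can be sketched side by side: plain linear interpolation of two models, and the modified semantic model in which the unigram prior p(wₙ) is replaced by a trigram probability so that the interpolated models are comparably predictive (function names are mine):

```python
def interpolate(p1, p2, lam):
    """Linear interpolation: lam * p1 + (1 - lam) * p2."""
    return lam * p1 + (1.0 - lam) * p2

def semantic_prob(prior, w_vec, h_vec, p_c):
    """Semantic-model term: prior * sum_i w_i * h_i * p(c_i).

    Pass prior = p(w_n) for the original model, or a trigram
    probability p(w_n | w_{n-1}, w_{n-2}) for the modified model
    on the slide.
    """
    return prior * sum(wi * hi * pc for wi, hi, pc in zip(w_vec, h_vec, p_c))
```
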


Vector-based Models of Semantic Composition · Seminar für Computerlinguistik, Heidelberg

Evaluation  Language Modeling

Perplexities

Trigram                78.72
+Add SemSpace          76.66
+Multiply SemSpace     75.01
+Add LDA               78.60
+Multiply LDA         123.95

Mirella Lapata and Jeff Mitchell 32
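Perplexity here is the standard intrinsic measure for language models; as a reminder, it can be computed from per-word probabilities like this (a generic sketch, not the talk's evaluation code):

```python
import math

def perplexity(word_probs):
    """Perplexity = exp(-(1/N) * sum_i log p(w_i)); lower is better."""
    n = len(word_probs)
    return math.exp(-sum(math.log(p) for p in word_probs) / n)

# A model assigning uniform probability 1/78.72 to every test word
# would score exactly the trigram baseline's perplexity of 78.72.
probs = [1 / 78.72] * 100
print(round(perplexity(probs), 2))  # 78.72
```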

Evaluation  Language Modeling

Comparison to Parsing

Model incorporates semantic dependencies into a trigram model.
Increases the probability of upcoming words which are semantically similar to the history.
Syntactic information also captures long-range dependencies.
Language models based on syntactic structure.
Interpolate composition models with Roark's (2001) parser.

Mirella Lapata and Jeff Mitchell 33
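Interpolating the syntactic and semantic models can be sketched as a simple linear mixture (a generic illustration with an assumed weight λ, not the talk's actual implementation):

```python
def interpolate(p_parser, p_sem, lam=0.5):
    """Linear interpolation of two language models' word probabilities:
    p(w | h) = lam * p_parser(w | h) + (1 - lam) * p_sem(w | h)."""
    return lam * p_parser + (1 - lam) * p_sem

# Toy probabilities for one upcoming word under each model:
print(interpolate(0.02, 0.04, lam=0.75))  # 0.025
```

The weight λ would normally be tuned on held-out data; each component distribution must already sum to one for the mixture to remain a valid distribution.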

Evaluation  Language Modeling

Perplexities (interpolated with the parser)

Trigram                78.72
+Parser                75.22
+Add SemSpace          73.45
+Multiply SemSpace     71.32
+Add LDA               71.58
+Multiply LDA          87.93

Mirella Lapata and Jeff Mitchell 34

Conclusions

Work so far
Vector composition for phrase similarity and language modeling
Compared a simple semantic space to LDA
Different composition functions appropriate for each model
Semantic dependencies complementary to syntactic ones
Cognitive Science (to appear), ACL 2008, EMNLP 2009.

Future work
Incorporate syntax into composition (parser that outputs a compositional vector-based representation of a sentence)
Optimize vectors and composition function on specific tasks

Mirella Lapata and Jeff Mitchell 35
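The two composition functions compared throughout the talk are vector addition and component-wise multiplication; a minimal sketch on toy co-occurrence vectors (the numbers are illustrative):

```python
def add_compose(u, v):
    """Additive composition: p = u + v (sums the contexts of both words)."""
    return [a + b for a, b in zip(u, v)]

def multiply_compose(u, v):
    """Multiplicative composition: p = u * v, component-wise; contexts
    shared by both words are emphasised, unshared ones are suppressed."""
    return [a * b for a, b in zip(u, v)]

u = [1.0, 2.0, 0.0]
v = [0.5, 2.0, 3.0]
print(add_compose(u, v))       # [1.5, 4.0, 3.0]
print(multiply_compose(u, v))  # [0.5, 4.0, 0.0]
```

Note how multiplication zeroes out the third component, which only one of the two words co-occurs with; this is why the two functions suit different tasks.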


Conclusions

LDA Topics

Mirella Lapata and Jeff Mitchell 36