
Page 1: Integrating term dependencies according to their utility Jian-Yun Nie University of Montreal 1

Integrating term dependencies according to their utility

Jian-Yun Nie, University of Montreal


Page 2:

Need for term dependency

• The meaning of a term often depends on the other terms used in the same context
  – Term dependency
  – E.g. computer architecture, hot dog, …

• A unigram model is unable to capture term dependency
  – hot + dog ≠ "hot dog"

• Dependency: a group of terms (here, a pair of terms)

Page 3:

Previous approaches

• Phrase + unigram
  – 2 representations: a phrase model and a unigram model
  – Interpolation (each model with a fixed weight λ)
  – Assumption: phrases represent useful dependencies between terms for IR
  – E.g. Q = the price of hot dog
    • P_unigram: price, hot, dog
    • P_phrase: price, hot_dog
    • P(price hot dog|D) = λ · P_phrase(price hot dog|D) + (1−λ) · P_unigram(price hot dog|D)
    • or score = λ · score_phrase + (1−λ) · score_unigram
  – Effect: documents containing the phrase "hot dog" obtain a higher score
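As a concrete illustration of this interpolation, here is a toy sketch in which simple relative-frequency counts stand in for the smoothed language models actually used in IR (the documents, query, and λ value below are illustrative):

```python
from collections import Counter

def unigram_score(query, doc):
    """Toy unigram model: summed relative frequency of each query term."""
    tf, n = Counter(doc), len(doc)
    return sum(tf[t] / n for t in query)

def phrase_score(phrases, doc):
    """Toy phrase model: summed relative frequency of each query phrase
    occurring as an adjacent bigram in the document."""
    bigrams = Counter(zip(doc, doc[1:]))
    n = max(len(doc) - 1, 1)
    return sum(bigrams[p] / n for p in phrases)

def interpolated_score(query, phrases, doc, lam=0.3):
    """score = lam * score_phrase + (1 - lam) * score_unigram,
    with one fixed lam shared by all queries."""
    return lam * phrase_score(phrases, doc) + (1 - lam) * unigram_score(query, doc)

d1 = "the hot dog price at the stand".split()      # contains the phrase "hot dog"
d2 = "the dog ran in hot weather price unknown".split()
q, ph = ["price", "hot", "dog"], [("hot", "dog")]
s1 = interpolated_score(q, ph, d1)
s2 = interpolated_score(q, ph, d2)
# d1, which contains the phrase, outranks d2
```

Both documents match all three unigrams, but only the phrase component separates them; the phrase weight λ is the same for every query, which is exactly the limitation discussed later.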


Page 4:

Dependency model

• Dependency language model (Gao et al. 2005)
  – Determine the strongest dependencies among query terms (a parsing process):
    • price hot dog
  – The determined dependencies define an additional requirement for documents:
    • Documents have to contain the unigrams
    • Documents have to contain the required dependencies
    • The two criteria are linearly interpolated

Page 5:

Markov Random Field (MRF) (Metzler & Croft)

• Variants: sequential, full
• Potential functions over cliques of the graph
• Sequential model: interpolation of the unigram model, ordered bigrams, and unordered bigrams (within a window)
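A sketch of the three feature types the sequential model interpolates, assuming documents are simple token lists (the λ weights in the final line are commonly cited defaults for this kind of model, not values from this talk):

```python
def count_window(a, b, doc, w=8):
    """Co-occurrences of terms a and b within w positions of each other (either order)."""
    pos_a = [i for i, t in enumerate(doc) if t == a]
    pos_b = [i for i, t in enumerate(doc) if t == b]
    return sum(1 for i in pos_a for j in pos_b if 0 < abs(j - i) < w)

def sdm_features(query, doc, w=8):
    """Unigram matches, ordered adjacent-bigram matches, and unordered
    within-window co-occurrences for each adjacent query term pair."""
    uni = sum(doc.count(t) for t in query)
    ordered = sum(
        1
        for a, b in zip(query, query[1:])
        for i in range(len(doc) - 1)
        if doc[i] == a and doc[i + 1] == b
    )
    unordered = sum(count_window(a, b, doc, w) for a, b in zip(query, query[1:]))
    return uni, ordered, unordered

doc = "sony digital camera review digital camera".split()
uni, o, u = sdm_features(["digital", "camera"], doc)
# One fixed weight per feature *type*, identical for all queries:
score = 0.85 * uni + 0.10 * o + 0.05 * u
```

Note that only adjacent query pairs generate bigram and window features here, which is the restriction criticized on the following slides.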


Page 6:

Limitations

• The importance of a (type of) dependency is fixed in the combined model, in the same way for all queries
  – A fixed weight is assigned to each component model
  – price-dog is as important as hot-dog (dependency model)
  – price-hot is as important as hot-dog (MRF, ordered model)

• Are they equally strong dependencies?
  – hot-dog > price-dog, price-hot

• Intuition: a stronger dependency forms a stronger constraint

Page 7:

Limitations

• Can a phrase model solve this problem?
  – Some phrases form a semantically stronger dependency than others
    • hot-dog > cute-dog
    • Sony digital-camera > Sony-digital camera, Sony-camera digital
  – Is a semantically stronger dependency more useful for IR?
    • Not necessarily
    • digital-camera could be less useful than Sony-camera
    • The importance of a dependency in IR depends on its usefulness for retrieving better documents

Page 8:

Limitations

• MRF sequential model
  – Only considers consecutive pairs of terms
  – No dependency between distant terms
    • Sony digital camera: Sony-digital, digital-camera

• Full model
  – Can cover long-distance dependencies
  – But at a large increase in complexity

Page 9:

Proximity: more flexible dependency

• Tao & Zhai, 2007
• Zhao & Yun, 2009

• ProxB(wi): proximity centrality
  – Min/average/sum distance to the other query terms

• However, the weight given to proximity is still fixed.
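A minimal sketch of the proximity-centrality idea (the exact definitions in Tao & Zhai and Zhao & Yun differ; the distance aggregation and absence penalty here are illustrative):

```python
def proximity_centrality(term, query, doc, agg=min):
    """Prox(term): aggregate (min/avg/sum via `agg`) of the closest distance
    between occurrences of `term` and each other query term in the document."""
    def closest(a, b):
        pos_a = [i for i, t in enumerate(doc) if t == a]
        pos_b = [i for i, t in enumerate(doc) if t == b]
        if not pos_a or not pos_b:
            return len(doc)  # penalty when either term is absent
        return min(abs(i - j) for i in pos_a for j in pos_b)
    return agg(closest(term, other) for other in query if other != term)

doc = "sony makes a digital camera with a zoom lens".split()
q = ["sony", "digital", "camera"]
# "camera" is adjacent to "digital" and 4 positions from "sony"
prox_min = proximity_centrality("camera", q, doc, min)
prox_sum = proximity_centrality("camera", q, doc, sum)
```

Unlike a fixed window, this distance varies per occurrence, which is why the slide calls proximity a more flexible dependency.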

Page 10:

A recent extension to the MRF model

• Bendersky, Metzler, Croft, 2010
  – Weighted dependencies
  – w_j^uni and w_j^bi: the importance of the different features
  – g_j^uni and g_j^bi: the weight of each unigram and bigram according to its utility
  – However:
    • The ordered (f_O) and unordered (f_U) features are mixed up
    • Only dependencies between pairs of adjacent terms are considered

Page 11:

Go further

• Use a discriminative model instead of MRF
  – Can consider dependencies between more distant terms, without the exponential growth in complexity

• We only consider pair-wise dependencies
  – Assumption: pair-wise dependencies capture the most important part of the dependencies

• Consider several types of dependency between query terms
  – Ordered bigram
  – Unordered pair of terms within some distance (2, 4, 8, 16)
    • Dependencies at different distances have different strengths
    • Co-occurrence dependency ~ variable proximity
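Enumerating these dependency types for a query is cheap even with distant pairs; a sketch (the feature labels are illustrative):

```python
from itertools import combinations

def dependency_features(query, windows=(2, 4, 8, 16)):
    """All pair-wise dependency features for a query: ordered adjacent
    bigrams, plus every unordered term pair at each window size."""
    feats = [("bi", a, b) for a, b in zip(query, query[1:])]
    for w in windows:
        feats += [(f"co{w}", a, b) for a, b in combinations(query, 2)]
    return feats

q = "corporate pension plans funds".split()
feats = dependency_features(q)
# 3 ordered bigrams + 4 windows x 6 unordered pairs = 27 candidate dependencies
```

Pair-wise enumeration stays quadratic in the query length, in contrast to the exponential clique growth of the full MRF model.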


Page 12:

General discriminative model

• Breaking down each component model to consider the strength/usefulness of each term dependency

• U, B, Cw: importance of a unigram, a bigram, and a co-occurrence pair within distance w in documents
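A toy sketch of this per-dependency weighting, with raw match counts standing in for the component models' scores (the weight values below are illustrative, not learned):

```python
def weighted_dependency_score(doc, query, bi_wt, co_wt, u_wt=1.0):
    """Score in which each bigram (B) and co-occurrence pair (Cw) carries its
    own weight, rather than one fixed weight per model type."""
    score = u_wt * sum(doc.count(t) for t in query)
    for (a, b), w in bi_wt.items():  # ordered adjacent bigrams
        score += w * sum(
            1 for i in range(len(doc) - 1) if doc[i] == a and doc[i + 1] == b
        )
    for (a, b, win), w in co_wt.items():  # unordered pairs within distance win
        pos_a = [i for i, t in enumerate(doc) if t == a]
        pos_b = [i for i, t in enumerate(doc) if t == b]
        score += w * sum(1 for i in pos_a for j in pos_b if 0 < abs(i - j) < win)
    return score

doc = "corporate pension funds and pension plans".split()
score = weighted_dependency_score(
    doc,
    ["pension", "plans"],
    bi_wt={("pension", "plans"): 0.5},
    co_wt={("pension", "plans", 4): 0.2},
)
```

Because every dependency has its own weight, a useless pair like price-dog can simply receive weight 0 while hot-dog keeps a high weight, within the same query.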


Page 13:

An example

• Query: corporate pension plans funds

[Figure: dependency graph over the four query terms, with bi, co2, co4 and co8 edge weights ranging from .07 to .80 (co16 omitted)]

Page 14:

Further development

• Set U to 1 and vary the other weights
• Features:

Page 15:

How to determine the usefulness of a bigram (B) and a co-occurrence pair (Cw)?

– Using a learning method based on some features
– Cross-validation

Page 16:

Learning method

• Parameters
• Goal:
  – Ti: training data
  – Ri: document ranking using the parameters
  – E: measure of effectiveness (MAP)

• Training data:
  – {xi, zi}: a bigram or a pair of terms within distance w, together with its best weight value for the query
  – The best value is found by coordinate ascent search

• ε-SVM (regression) with a radial basis kernel function
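The per-dependency target values zi can be found by coordinate ascent over a weight grid; a minimal sketch with a toy effectiveness function (a real run would re-rank the training queries and measure MAP at each step):

```python
def coordinate_ascent(weights, effectiveness, grid=(0.0, 0.1, 0.2, 0.3, 0.5), rounds=3):
    """Optimize one dependency weight at a time over a value grid, keeping
    the value that maximizes the effectiveness measure; repeat for a few rounds."""
    w = dict(weights)
    for _ in range(rounds):
        for k in w:
            best_v, best_e = w[k], effectiveness(w)
            for v in grid:
                cand = {**w, k: v}
                e = effectiveness(cand)
                if e > best_e:
                    best_v, best_e = v, e
            w[k] = best_v
    return w

# Toy stand-in for MAP, peaked at bi = 0.3, co2 = 0.1
toy_map = lambda w: -((w["bi"] - 0.3) ** 2 + (w["co2"] - 0.1) ** 2)
best = coordinate_ascent({"bi": 0.0, "co2": 0.0}, toy_map)
```

The regression model (ε-SVR with an RBF kernel) is then trained to predict these best values from the features, so that weights can be assigned to dependencies of unseen queries.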


Page 17:

Features


Page 18:

Test collections


Page 19:

Results with other models


Page 20:

With our model


Page 21:

Analysis

• Some intuitively strong dependencies should not be considered important in the retrieval process

• Disk1, query 088: "crude oil price trends"
  – Ideal weights (bi, co2,4,8,16) = 0, AP = 0.103
  – Learnt: bi = 0.2, co2..16 = 0, AP = 0.060

• Disk1, query 003: "joint ventures"
  – Ideal weights (bi, co2,4,8,16) = 0, AP = 0.086
  – Learnt: bi = 0.07, co2..16 = 0, AP = 0.084

• Disk1, query 094: "computer aided crime"
  – Ideal weights (bi, co2,4,8,16) = 0, AP = 0.223
  – Learnt: bi = 0.3, co2..16 = 0, AP = 0.158

Page 22:

Analysis

• Some intuitively weakly connected words should be considered as strong dependencies:

• Disk1, query 184: "corporate pension plans funds"
  – Ideal: wt.bi = 0.5, co2 = 0.7, co4 = 0.2, AP = 0.253
  – Learnt: wt.bi = 0.2, co8 = 0.01, co16 = 0.001, AP = 0.201 (Uni = 0.131)

• Disk1, query 115: "impact 1986 immigration law"
  – Ideal: wt.co2 = 0.1, co4 = 0.35, co8 = 0.05, AP = 0.511
  – Learnt: wt.bi = 0, co16 = 0.01, AP = 0.492 (Uni = 0.437)

Page 23:

Disk1, query 115: "impact 1986 immigration law"

Ideal AP = 0.511, uni = 0.437, learnt = 0.492

[Figure: ideal dependency weights drawn as bi, co2, co4 and co8 edges over the query terms (co16 omitted), values between .01 and .35]

Learnt weights:

(Learnt)   imp-1986   imp-imm   imp-law   1986-imm   1986-law   imm-law
wt.bi      -          -         .14       -          -          -
wt.co2     -          -         -         -          -          .05
wt.co8     -          .01       .01       .01        -          .01
wt.co16    -          .01       .01       .01        .01        .02

Page 24:

Disk1, query 184: "corporate pension plans funds"

• AP: ideal = 0.253, uni = 0.132, learnt = 0.201

[Figure: ideal dependency weights drawn as bi, co2, co4 and co8 edges over the query terms (co16 omitted), values between .07 and .80]

Learnt weights:

(Learnt)   corp-pen   corp-plan   corp-fund   pen-plan   pen-fund   plan-fund
wt.bi      -          -           -           .20        .18        -
wt.co2     -          .05         -           .59        .23        -
wt.co8     -          .01         -           .02        .02        .01
wt.co16    -          .02         .02         .04        -          .001

Page 25:

Typical case 1: weak bigram dependency, weak co-occurrence dependency


Page 26:

Typical case 2: strong dependencies


Page 27:

Typical case 3: Weak bigram dependency, strong co-occurrence dependency


Page 28:

Conclusions

• Different types of dependency between query terms should be considered

• They have variable importance/usefulness for IR, and should be integrated into the IR model with different weights
  – Importance does not necessarily correlate with semantic dependency strength

• The new model is better than the existing models in most cases (statistically significant in some cases)