
Page 1: Dependence Language Model for Information Retrieval

Dependence Language Model for Information Retrieval

Jianfeng Gao, Jian-Yun Nie, Guangyuan Wu, Guihong Cao, Dependence Language Model for Information Retrieval, SIGIR 2004

Page 2:

Reference

• Structure and Performance of a Dependency Language Model. Ciprian Chelba, David Engle, et al. Eurospeech 1997.

• Parsing English with a Link Grammar. Daniel D. K. Sleator and Davy Temperley. Technical Report CMU-CS-91-196, 1991.

Page 3:

Why do we use the independence assumption?

• The independence assumption is one of the assumptions most widely adopted in probabilistic retrieval theory.

• Why?
– It makes retrieval models simpler.
– It makes retrieval operations tractable.

• The shortcoming of the independence assumption:
– The independence assumption does not hold in textual data.

Page 4:

Recent ideas on the dependence assumption

• Bigram
– Some language modeling approaches try to incorporate word dependence by using bigrams.
– Shortcomings:
• Some word dependencies exist not only between adjacent words but also over longer distances.
• Some adjacent words are not actually related.
– The bigram language model showed only marginally better effectiveness than the unigram model.

• Bi-term
– The bi-term language model is similar to the bigram model except that the constraint on word order is relaxed.
– “information retrieval” and “retrieval of information” will be assigned the same probability of generating the query.
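The order-relaxation point can be illustrated with toy counts. A minimal sketch (the token stream and counting scheme below are illustrative, not the paper's implementation):

```python
from collections import Counter

# Toy token stream (hypothetical example).
tokens = "information retrieval systems rank documents by retrieval information".split()

# Bigram counts are order-sensitive; bi-term counts pool both orders.
bigrams = Counter(zip(tokens, tokens[1:]))
biterms = Counter(tuple(sorted(p)) for p in zip(tokens, tokens[1:]))

def bigram_count(w1, w2):
    return bigrams[(w1, w2)]

def biterm_count(w1, w2):
    return biterms[tuple(sorted((w1, w2)))]

print(bigram_count("information", "retrieval"))  # 1: only the exact order
print(biterm_count("information", "retrieval"))  # 2: both orders pooled
print(biterm_count("retrieval", "information"))  # 2: order is relaxed
```

The bi-term counter maps each adjacent pair to a canonical (sorted) key, so both word orders contribute to the same count.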

Page 5:

Structure and performance of a dependency language model

Page 6:

Introduction

• This paper presents a maximum entropy language model that incorporates both syntax and semantics via a dependency grammar.

• Dependency grammar: expresses the relations between words by a directed graph, which can incorporate the predictive power of words that lie outside of bigram or trigram range.

Page 7:

Introduction

• Why we use N-grams
– Assume a sentence S = w_0, w_1, ..., w_n. If we want to record

P(S) = P(w_0) P(w_1 | w_0) ... P(w_n | w_0, ..., w_{n-1})

we need to store V^i (V - 1) independent parameters for the factor P(w_i | w_0, ..., w_{i-1}), where V is the vocabulary size.

• The drawback of N-grams
– The N-gram model blindly discards relevant words that lie N or more positions in the past.
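The parameter blow-up is easy to check numerically; the vocabulary size below is illustrative:

```python
# Independent parameters needed to specify P(w_i | w_0 ... w_{i-1}) exactly:
# one distribution (V - 1 free probabilities) for each of the V^i histories.
def ngram_params(V, i):
    return (V ** i) * (V - 1)

V = 10_000  # illustrative vocabulary size
for i in range(3):
    print(i, ngram_params(V, i))
# grows from ~10^4 (unigram) to ~10^12 (conditioning on just two words)
```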

Page 8:

Structure of the model

Page 9:

Structure of the model

• Develop an expression for the joint probability P(S, K), where K is the set of linkages in the sentence.

• Then we get

P(S) = Σ_K P(S, K)

• Assume that the sum is dominated by a single term; then

P(S) ≈ P(S, K*), where K* = argmax_K P(S, K)

Page 10:

A dependency language model of IR

• Given a query Q = q_1 ... q_m, we want to rank documents by P(Q | D).
– Previous work:
• Assume independence between query terms:

P(Q | D) = Π_{i=1..m} P(q_i | D)

– New work:
• Assume that the term dependencies in a query form a linkage L:

P(Q | D) = Σ_L P(Q, L | D) = Σ_L P(L | D) P(Q | L, D)
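The "previous work" independence scoring is just a product of per-term probabilities. A minimal sketch, using an unsmoothed maximum-likelihood document model purely for illustration:

```python
import math

# Independence assumption: log P(Q|D) = sum_i log P(q_i|D),
# with an unsmoothed maximum-likelihood document model.
def unigram_query_loglik(query, doc_tokens):
    n = len(doc_tokens)
    return sum(math.log(doc_tokens.count(q) / n) for q in query)

doc = "dependence language model for information retrieval".split()
print(unigram_query_loglik(["information", "retrieval"], doc))  # 2 * log(1/6)
```

In practice the document model must be smoothed (as on a later slide), since any query term absent from the document would otherwise make the score minus infinity.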

Page 11:

A dependency language model of IR

• Assume that the sum over all possible linkages L is dominated by a single term:

P(Q | D) ≈ P(L | D) P(Q | L, D), where L = argmax_L P(L | D)

• Assume that each term q_j is dependent on exactly one related query term q_i generated previously (e.g. links q_h → q_i → q_j).

Page 12:

A dependency language model of IR

P(Q | L, D) = P(q_h | D) Π_{(i,j)∈L} P(q_j | q_i, L, D)
            = P(q_h | D) Π_{(i,j)∈L} P(q_i, q_j | L, D) / P(q_i | L, D)
            = P(q_h | D) Π_{(i,j)∈L} [ P(q_i, q_j | L, D) / (P(q_i | L, D) P(q_j | L, D)) ] P(q_j | L, D)

where q_h is the head term from which generation starts and (i, j) ranges over the links in L.

Page 13:

A dependency language model of IR

• Assume
– The generation of a single term is independent of L:

P(q_j | L, D) = P(q_j | D)

• By this assumption, we would have arrived at the same result by starting from any term; L can be represented as an undirected graph.

P(Q | L, D) = P(q_h | D) Π_{j≠h} P(q_j | D) Π_{(i,j)∈L} [ P(q_i, q_j | L, D) / (P(q_i | D) P(q_j | D)) ]
            = Π_{i=1..m} P(q_i | D) · Π_{(i,j)∈L} P(q_i, q_j | L, D) / (P(q_i | D) P(q_j | D))

Page 14:

A dependency language model of IR

Taking the log:

log P(Q | D) = log P(L | D) + Σ_{i=1..m} log P(q_i | D) + Σ_{(i,j)∈L} MI(q_i, q_j | L, D)

where the mutual information term is

MI(q_i, q_j | L, D) = log [ P(q_i, q_j | L, D) / (P(q_i | D) P(q_j | D)) ]
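The log-form ranking formula translates directly into a scoring function. A sketch in which all inputs (the linkage probability, the smoothed term probabilities, and the MI values) are assumed to be precomputed, hypothetical quantities:

```python
import math

# log P(Q|D) = log P(L|D) + sum_i log P(q_i|D) + sum_{(i,j) in L} MI(q_i,q_j|L,D)
def dependence_score(p_L_given_D, term_probs, mi_values):
    return (math.log(p_L_given_D)
            + sum(math.log(p) for p in term_probs)      # unigram part
            + sum(mi_values))                           # dependence part

# Hypothetical values: P(L|D)=0.5, two term probabilities, one linked pair.
score = dependence_score(0.5, [0.01, 0.02], [0.9])
print(score)
```

With all MI values set to zero and P(L|D) = 1, the score reduces to the unigram query likelihood, which is one sense in which the model covers simpler approaches as special cases.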

Page 15:

Parameter Estimation

• Estimating P(L | D)
– Assume that the links are independent:

P(L | D) = Π_{l∈L} P(l | D)

– Then count the relative frequency of a link l between q_i and q_j, given that they appear in the same sentence in the training data:

RF(R | q_i, q_j) = C(q_i, q_j, R) / C(q_i, q_j)

where C(q_i, q_j, R) is the number of training sentences in which q_i and q_j have a link, and C(q_i, q_j) the number in which they co-occur. This score, normalized over links, gives

P(l | Q) = RF(R | q_i, q_j) / Σ_l RF(R | q_i, q_j)
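Selecting the best linkage then reduces to maximizing the product (equivalently, the sum of logs) of RF scores over candidate linkages. A sketch in which the RF estimates and the candidate set are hypothetical:

```python
import math

# Score a candidate linkage L (a set of term-index pairs) by
# sum_{(i,j) in L} log RF(R|q_i,q_j); L* is the argmax over candidates.
def linkage_logscore(linkage, rf):
    return sum(math.log(rf[pair]) for pair in linkage)

rf = {(0, 1): 0.6, (1, 2): 0.3, (0, 2): 0.1}       # hypothetical RF estimates
candidates = [{(0, 1), (1, 2)}, {(0, 1), (0, 2)}]  # candidate linkages
best = max(candidates, key=lambda L: linkage_logscore(L, rf))
print(best == {(0, 1), (1, 2)})  # True: 0.6*0.3 beats 0.6*0.1
```

In the paper the candidate linkages come from a parser (the link grammar of Sleator and Temperley); enumerating an explicit candidate list here is a simplification.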

Page 16:

Parameter Estimation

Estimating P(l | Q) by RF(R | q_i, q_j), the best linkage is

L* = argmax_L P(L | Q) = argmax_L Π_{(i,j)∈L} RF(R | q_i, q_j)

P(L | D) ≈ P(L | Q, D) = Π_{l∈L} P(l | D) = Π_{(i,j)∈L} RF(R | q_i, q_j)

Assumption: P(L | Q) ≈ P(L | D)

The relative frequency is smoothed between document and collection estimates:

RF(R | q_i, q_j) = (1 − λ) RF_D(R | q_i, q_j) + λ RF_C(R | q_i, q_j)

Page 17:

Parameter Estimation

• Estimating P(q_i | D)
– The document language model is smoothed with a Dirichlet prior:

P'(q_i | D) = (1 − λ) P(q_i | D) + λ P(q_i | C)

with

P(q_i | D) = C_D(q_i) / Σ_i C_D(q_i),   P(q_i | C) = C_C(q_i) / Σ_i C_C(q_i)

where C_D and C_C are term counts in the document and in the collection; P(q_i | C) acts as the Dirichlet prior and λ as a constant discount.
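The smoothing above is a linear mixture of document and collection estimates. A sketch with illustrative counts; the choice λ = μ / (|D| + μ) shown in the comment is the standard way a Dirichlet prior induces this mixture weight:

```python
# P'(q|D) = (1 - lam) * C_D(q)/|D| + lam * C_C(q)/|C|
# lam plays the role of the Dirichlet discount, e.g. lam = mu / (|D| + mu).
def smoothed_term_prob(c_doc, doc_len, c_coll, coll_len, lam):
    p_doc = c_doc / doc_len if doc_len else 0.0
    p_coll = c_coll / coll_len
    return (1 - lam) * p_doc + lam * p_coll

# A term absent from the document still gets a small nonzero probability:
print(smoothed_term_prob(0, 100, 50, 1_000_000, lam=0.2))
```

This is what keeps the log-likelihood finite when a query term does not occur in the document.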

Page 18:

Parameter Estimation

• Estimating MI(q_i, q_j | L, D)

MI(q_i, q_j | L, D) = log [ P(q_i, q_j | L, D) / (P(q_i | L, D) P(q_j | L, D)) ]
≈ log [ (C_D(q_i, q_j, R) / N) / ((C_D(q_i, *, R) / N)(C_D(*, q_j, R) / N)) ]
= log [ C_D(q_i, q_j, R) · N / (C_D(q_i, *, R) · C_D(*, q_j, R)) ]

where N = C_D(*, *, R) and * matches any term.
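The MI estimate is a plug-in of counts. A sketch with hypothetical count values:

```python
import math

# MI(q_i, q_j | L, D) ≈ log( C(q_i,q_j,R) * N / (C(q_i,*,R) * C(*,q_j,R)) ),
# where N = C(*,*,R) and '*' matches any term.
def mutual_information(c_ij, c_i_star, c_star_j, n):
    return math.log(c_ij * n / (c_i_star * c_star_j))

print(round(mutual_information(8, 20, 16, 100), 3))  # log(2.5) ≈ 0.916
```

A positive value means q_i and q_j are linked more often than independence would predict, which raises the document's score for that query pair.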

Page 19:

Experimental Setting

• Words were stemmed and stop words were removed.

• Queries are TREC topics 202 to 250, on TREC disks 2 and 3.

Page 20:

The flow of the experiment

1. Given a query, find its linkage: choose the L maximizing P(l | Q), using RF(R | q_i, q_j) computed from the training data (for weight computation).

2. From the document and the collection, count the link frequencies to get RF_D(R | q_i, q_j) and RF_C(R | q_i, q_j), combine them into RF(R | q_i, q_j), and get P(L | D).

3. Count the term frequencies C_C(q_i) and C_D(q_i) to get P(q_i | D).

4. Count C_D(q_i, q_j, R), C_D(q_i, *, R), C_D(*, q_j, R), and C_D(*, *, R) to get MI(q_i, q_j | L, D).

5. Combine the three components and rank the documents.

Page 21:

Result: BM & UG

• BM: binary independence retrieval model

• UG: unigram language model approach

• UG achieves performance similar to, or worse than, that of BM.

Page 22:

Result: DM

• DM: dependency model

• The improvement of DM over UG is statistically significant.

Page 23:

Result: BG

• BG: bigram language model

• BG is slightly worse than DM in five out of six TREC collections, but substantially outperforms UG in all collections.

Page 24:

Result: BT1 & BT2

• BT: bi-term language model

P_BT1(q_i | q_{i-1}, D) = (1/2) [ P_BG(q_i | q_{i-1}, D) + P_BG(q_{i-1} | q_i, D) ]

P_BT2(q_i | q_{i-1}, D) = [ C_D(q_{i-1}, q_i) + C_D(q_i, q_{i-1}) ] / [ 2 · min{ C_D(q_{i-1}), C_D(q_i) } ]

Page 25:

Conclusion

• This paper introduces the linkage of a query as a hidden variable.

• Each term is generated in turn, depending on other related terms, according to the linkage.
– This approach covers several language modeling approaches as special cases.

• Experimentally, the proposed model substantially outperforms the unigram, bigram, and classical probabilistic retrieval models.