Information Models for Ad Hoc Information Retrieval, SIGIR 2010


Information-Based Models for Ad Hoc IR

Stéphane Clinchant 1,2, Eric Gaussier 2

1 Xerox Research Centre Europe

2 Laboratoire d’Informatique de Grenoble, Univ. Grenoble 1

SIGIR’10, 20 July 2010

S.Clinchant E.Gaussier (XRCE-LIG) Information-Based Models for Ad Hoc IR SIGIR’10, 20 July 2010 1 / 33


Overview

Information Models
- Normalization
- Probability Distribution
- RSV

Heuristic Constraints
- Condition 1
- Condition 2
- Condition 3
- Condition 4

Burstiness
- Phenomenon
- Property of Prob. Distributions


Informative Content

Use Shannon’s information to weigh words in documents

[Figure: −log P(X) as a function of P(X)]

$\mathrm{Inf}(x) = -\log P(x \mid \Theta_C)$ = Informative Content

Deviation from an average behavior:

- Observation by Harter (70): specialty words deviate from a Poisson distribution
- Informative Content, core to Divergence From Randomness models
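
As a minimal sketch of the weighting idea above, Shannon's information of an event can be computed directly from its probability. The function name `informative_content` and the example probabilities are illustrative assumptions, not taken from the slides.

```python
import math

def informative_content(p):
    """Shannon information -log P(x) of an event with probability p (natural log)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return -math.log(p)

# Rare events carry more information than frequent ones.
for p in (0.9, 0.5, 0.01):
    print(f"P = {p:<4}  Inf = {informative_content(p):.3f}")
```

The lower the probability of an observation under the corpus model, the larger its weight.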


Information-based Model

Main idea:

1 Discrete term frequencies x are renormalized into continuous values t(x), to account for differences in document length

2 For each term w, the values t(x) are assumed to follow a distribution P with parameter $\lambda_w$ on the corpus, i.e. $Tf_w \mid \lambda_w \sim P$

3 Queries and documents are compared with a surprise measure, a mean information:

$$RSV(q, d) = \sum_{w \in q} -x_w^q \log P(Tf_w > t(x_w^d) \mid \lambda_w)$$


Outline

1 Model Properties
- Retrieval Heuristics
- Burstiness Phenomenon

2 Two Power-Law Instances
- log-logistic model
- smoothed power-law model

3 Experiments

4 Extension to PRF


Notations

- $x_w^d$: frequency of word w in document d; $x_w^q$: frequency of word w in the query
- $t_w^d$: normalized term frequency
- $Tf_w$: random variable for the frequency of word w
- $l_d$: length of document d
- $idf_w$: corpus parameter for word w
- $\theta$: model parameter

Most (ad hoc) IR models can be written as:

$$RSV(q, d) = \sum_{w \in q} f(x_w^q)\, h(x_w^d, l_d, idf_w, \theta)$$

⇒ What do we know about h?


Condition 1

Documents with more occurrences of query terms get higher scores than documents with fewer occurrences

$$\forall (l, idf, \theta),\quad \frac{\partial h(x, l, idf, \theta)}{\partial x} > 0 \quad (h \text{ increases with } x)$$

[Figure: h(x) against x. "Good" h: increasing; "bad" h: decreasing]


Condition 2

The increase in the retrieval score should be smaller for larger term frequencies. Ex: going from 2 to 4 occurrences should count for more than going from 50 to 52

$$\forall (l, idf, \theta),\quad \frac{\partial^2 h(x, l, idf, \theta)}{\partial x^2} < 0 \quad (h \text{ concave})$$

[Figure: h(x) against x. "Good" h: concave, the difference of scores decreases; "bad" h: convex, the difference of scores increases]


Condition 3

Longer documents, when compared to shorter ones with exactly the same number of occurrences of query terms, should be penalized (they are likely to cover additional topics)

$$\forall (x, idf, \theta),\quad \frac{\partial h(x, l, idf, \theta)}{\partial l} < 0 \quad (h \text{ decreases with } l)$$


Condition 4: IDF Effect

It is important to downweight terms occurring in many documents

$$\forall (x, l, \theta),\quad \frac{\partial h(x, l, idf, \theta)}{\partial idf} > 0 \quad (\text{IDF effect})$$

[Figure: h(x) against x for IDF=10 and IDF=5. IDF effect: h(x, IDF=10) > h(x, IDF=5)]


Heuristic Constraints

Condition 1: h increases with x

Condition 2: h is concave

Condition 3: h decreases with l

Condition 4: h increases with idf (IDF Effect)

Additional conditions in the paper

⇒ Analytical reformulation of TFC1, TFC2, LNC1 and TDC:

Fang et al., A Formal Study of Information Retrieval Heuristics, SIGIR’04
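
The four conditions can be probed numerically with finite differences. The sketch below is an illustration only: `h_toy` and `h_convex` are hypothetical weighting functions chosen for this example (neither comes from the slides), and a check on a finite grid is evidence, not a proof.

```python
import math
from itertools import product

def h_toy(x, l, idf):
    """Hypothetical weighting function, used only to exercise the checks."""
    return idf * math.log(1.0 + x / l)

def h_convex(x, l, idf):
    """Hypothetical "bad" function: convex in x, so it violates Condition 2."""
    return idf * x * x / l

def check_conditions(h, xs, ls, idfs):
    """Finite-difference check of Conditions 1-4 on a grid of (x, l, idf) values."""
    ok = {1: True, 2: True, 3: True, 4: True}
    for x, l, idf in product(xs, ls, idfs):
        gain = h(x + 1, l, idf) - h(x, l, idf)
        next_gain = h(x + 2, l, idf) - h(x + 1, l, idf)
        ok[1] = ok[1] and gain > 0                          # h increases with x
        ok[2] = ok[2] and next_gain < gain                  # gains shrink: h concave
        ok[3] = ok[3] and h(x, l + 1, idf) < h(x, l, idf)   # h decreases with l
        ok[4] = ok[4] and h(x, l, idf + 1) > h(x, l, idf)   # h increases with idf
    return ok

grid = (range(1, 21), (50, 100, 500), (1.0, 5.0, 10.0))
print(check_conditions(h_toy, *grid))
print(check_conditions(h_convex, *grid))
```

The convex function passes Conditions 1, 3 and 4 but fails Condition 2, which is the case the concavity requirement is meant to rule out.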


Burstiness Phenomenon

We now turn to word frequency distributions:

Church and Gale [1] showed that a 2-Poisson model yields a poor fit to word frequencies

A possible explanation: words tend to appear in bursts, i.e. burstiness

Once a word appears in a document, it is much more likely to appear again

Recent work on the Dirichlet Compound Multinomial

⇒ Which distributions can account for burstiness?

[1] Church and Gale, Poisson Mixtures


Burstiness Property of Probability Distributions

Definition

A distribution P is bursty iff the function $g_\varepsilon$ defined by:

$$g_\varepsilon(x) = P(X \ge x + \varepsilon \mid X \ge x)$$

is a strictly increasing function of x ($\forall \varepsilon > 0$)

Interpretation: it becomes easier to generate more occurrences

$g_\varepsilon(x)$ strictly increasing ⇔ $\Delta = \log g_\varepsilon(x)$ strictly increasing ⇔ $\Delta = \log P(X \ge x + \varepsilon) - \log P(X \ge x)$ strictly increasing

As $\Delta < 0$, the absolute value of the successive differences $\Delta$ decreases
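
The definition can be explored numerically. This is a rough sketch under stated assumptions: the helper names, the two example tails, the grid, and the tolerance are all illustrative choices, and testing one value of ε on a finite grid only suggests (it does not prove) burstiness.

```python
import math

def conditional_survival(survival, x, eps):
    """g_eps(x) = P(X >= x + eps | X >= x) for a continuous survival function."""
    return survival(x + eps) / survival(x)

def looks_bursty(survival, xs, eps=1.0, tol=1e-12):
    """True if g_eps is strictly increasing (beyond rounding noise) along the grid xs."""
    g = [conditional_survival(survival, x, eps) for x in xs]
    return all(b - a > tol for a, b in zip(g, g[1:]))

def power_law_tail(x):
    """Hypothetical heavy-tailed example: P(X >= x) = 1 / (1 + x)."""
    return 1.0 / (1.0 + x)

def exponential_tail(x):
    """Memoryless example: P(X >= x) = exp(-x)."""
    return math.exp(-x)

grid = [0.5 * i for i in range(31)]          # x from 0 to 15
print(looks_bursty(power_law_tail, grid))    # g_eps grows with x
print(looks_bursty(exponential_tail, grid))  # g_eps is constant, so not strictly increasing
```

The exponential tail is the boundary case: its conditional survival is constant in x, so it does not satisfy the strict inequality in the definition.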


Geometric Interpretation of Burstiness

[Figure: log P(X > x) against x for a bursty distribution. Δ = log P(X > x + ε) − log P(X > x) increases; as Δ < 0, its absolute value decreases]


Gaussian(mean=5,std=1) is not bursty

[Figure: log P(X > x) against x for a Gaussian with mean 5 and standard deviation 1]
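
The claim for the Gaussian can be checked with the standard library. In this sketch the grid of x values and ε = 1 are arbitrary illustrative choices. It evaluates g_ε(x) = P(X > x + ε) / P(X > x) and shows that it decreases, which is the opposite of the bursty behavior.

```python
import math

def gaussian_survival(x, mean=5.0, std=1.0):
    """P(X > x) for a Gaussian, via the complementary error function."""
    return 0.5 * math.erfc((x - mean) / (std * math.sqrt(2.0)))

eps = 1.0
xs = list(range(0, 15))
g = [gaussian_survival(x + eps) / gaussian_survival(x) for x in xs]

# For a bursty distribution g would increase with x; here it shrinks.
is_decreasing = all(b < a for a, b in zip(g, g[1:]))
print(is_decreasing)
```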


Information Models & Heuristic Constraints

Models defined by:

$$RSV(q, d) = \sum_{w \in q} x_w^q\, \underbrace{\left(-\log P(Tf_w > t_w^d \mid \lambda_w)\right)}_{\text{function } h} \qquad (1)$$

Condition 1: h increasing with x ✓

Condition 3: h penalizes long documents ✓

Condition 2: h concave

Theorem

If the distribution P is bursty, then the information model defined with P is concave

IDF effect and 2 additional conditions depend on the choice of P
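
One way to see why the theorem holds is sketched below. It is an outline under the stated assumption that the survival function is continuous and positive, not the proof given in the paper; the symbol $S$ is introduced here for convenience.

```latex
% Sketch, assuming S(t) = P(Tf_w > t \mid \lambda_w) is continuous and positive on [0, +\infty)
\begin{align*}
P \text{ bursty}
  &\iff \forall \varepsilon > 0,\; t \mapsto \log S(t + \varepsilon) - \log S(t)
        \text{ is strictly increasing} \\
  &\implies \log S\!\left(\tfrac{a + b}{2}\right) - \log S(a)
        < \log S(b) - \log S\!\left(\tfrac{a + b}{2}\right)
        \quad \left(a < b,\ \varepsilon = \tfrac{b - a}{2}\right) \\
  &\implies \log S \text{ is midpoint convex, hence convex by continuity} \\
  &\implies h(t) = -\log S(t) \text{ is concave in } t .
\end{align*}
```

Concavity in x then follows whenever the normalization t(x) is linear in x.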


Characterization of Information Models

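1 Normalisation of Frequencies: increasing in x, decreasing in l

ex: DFR normalization $t_w^d = x_w^d \log\left(1 + c\, \frac{avg\_l}{l_d}\right)$

2 Probability Distribution: continuous and bursty. Support = [0, +∞)

3 Retrieval Function

$$RSV(q, d) = \sum_{w \in q} -x_w^q \log P(Tf_w > t_w^d \mid \lambda_w) = \sum_{w \in q \cap d} -x_w^q \log P(Tf_w > t_w^d \mid \lambda_w)$$

The second equality holds because a query term absent from d has $t_w^d = 0$, so $P(Tf_w > 0 \mid \lambda_w) = 1$ and its contribution is zero.

$$\lambda_w = \frac{F_w}{N} \quad \text{or} \quad \lambda_w = \frac{N_w}{N}$$

where:
- $F_w$: frequency of w in the corpus
- $N_w$: document frequency of w
- $N$: number of documents in the collection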
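
The normalization and the corpus parameter are simple to compute. The sketch below uses hypothetical function names and toy numbers chosen for illustration. It shows that, for a fixed count x, the normalized frequency shrinks as the document gets longer.

```python
import math

def dfr_normalize(x, doc_len, avg_len, c=1.0):
    """t = x * log(1 + c * avg_len / doc_len): increasing in x, decreasing in doc_len."""
    return x * math.log(1.0 + c * avg_len / doc_len)

def corpus_parameter(count, n_docs):
    """lambda_w = F_w / N or N_w / N, depending on which count is passed in."""
    return count / n_docs

# Toy numbers, for illustration only.
print(dfr_normalize(3, doc_len=100, avg_len=100))   # 3 * log(2)
print(dfr_normalize(3, doc_len=400, avg_len=100))   # same count, longer document: smaller t
print(corpus_parameter(50, 1000))                   # N_w = 50 documents out of N = 1000
```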


Two Power-law Instances

The log-logistic and smoothed power law models


Log-Logistic Model

Log-Logistic distribution

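$$P(Tf_w > t_w^d \mid \lambda_w) = \frac{\lambda_w}{t_w^d + \lambda_w}$$

The LGD model is defined by

1 DFR Normalization with parameter c

2 $Tf_w \sim \mathrm{LogLogistic}\left(\lambda_w = \frac{N_w}{N}\right)$

3 Ranking Model (as before):

$$RSV(q, d) = \sum_{w \in q \cap d} x_w^q \left[-\log P(Tf_w > t_w^d)\right]$$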

Meets all conditions for all parameter values
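
Putting the three ingredients together, an LGD scorer might look like the sketch below. The function signature, the dictionary-based inputs, and the toy query and document are assumptions made for illustration and are not the authors' implementation.

```python
import math

def lgd_score(query_tf, doc_tf, doc_len, avg_len, doc_freq, n_docs, c=1.0):
    """RSV(q, d) for the log-logistic model with DFR length normalization."""
    score = 0.0
    for w, xq in query_tf.items():
        x = doc_tf.get(w, 0)
        if x == 0:
            continue  # terms absent from d contribute -log(1) = 0
        lam = doc_freq[w] / n_docs                      # lambda_w = N_w / N
        t = x * math.log(1.0 + c * avg_len / doc_len)   # normalized frequency
        survival = lam / (t + lam)                      # P(Tf_w > t | lambda_w)
        score += xq * -math.log(survival)
    return score

# Toy data, for illustration only.
query = {"bursty": 1, "model": 1}
doc = {"bursty": 3, "model": 1, "retrieval": 2}
df = {"bursty": 20, "model": 400}
print(lgd_score(query, doc, doc_len=120, avg_len=100, doc_freq=df, n_docs=1000))
```

Each term's contribution simplifies to log(1 + t / λ_w), so rare terms (small λ_w) and frequent occurrences (large t) both raise the score.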


Smoothed Power Law SPL

Distribution on [0,+∞) with parameter 0 < λ < 1:

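$$P(Tf_w > t_w^d \mid \lambda_w) = \frac{\lambda_w^{\frac{t_w^d}{t_w^d + 1}} - \lambda_w}{1 - \lambda_w}$$

IR Model:

1 DFR Normalization with parameter c

2 $Tf_w \sim \mathrm{SPL}\left(\lambda_w = \frac{N_w}{N}\right)$

3 Ranking Model (as before):

$$RSV(q, d) = \sum_{w \in q \cap d} x_w^q \left[-\log P(Tf_w > t_w^d)\right]$$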

Meets all conditions
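
The smoothed power law can be scored in the same way. This sketch assumes 0 < λ_w < 1 (a term occurring in every document would make the denominator zero); the function names and argument layout are hypothetical.

```python
import math

def spl_survival(t, lam):
    """P(Tf > t | lam) = (lam ** (t / (t + 1)) - lam) / (1 - lam), for 0 < lam < 1."""
    return (lam ** (t / (t + 1.0)) - lam) / (1.0 - lam)

def spl_score(query_tf, doc_tf, doc_len, avg_len, doc_freq, n_docs, c=1.0):
    """RSV(q, d) for the smoothed power law model with DFR length normalization."""
    score = 0.0
    for w, xq in query_tf.items():
        x = doc_tf.get(w, 0)
        if x == 0:
            continue  # survival at t = 0 is 1, so the contribution is zero
        lam = doc_freq[w] / n_docs                      # lambda_w = N_w / N
        t = x * math.log(1.0 + c * avg_len / doc_len)   # normalized frequency
        score += xq * -math.log(spl_survival(t, lam))
    return score

print(spl_survival(0.0, 0.1))   # survival starts at 1 for t = 0
print(spl_score({"a": 1}, {"a": 2}, 100, 100, {"a": 100}, 1000))
```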


Experiments

Comparison with language models, BM25, DFR models

Corpora: ROBUST, TREC-3, CLEF03, GIRT with short (-t) and long queries (-d)

6 query sets: ROB-d, ROB-t, T3-t, GIRT, CLEF-d, CLEF-t

Methodology:

1 Divide each collection into 10 training/test splits

2 Learn the best parameter (µ, c, k1) to optimize MAP or P10 on the training set

3 Measure MAP or P10 on the 10 splits and test the difference with a t-test.
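
The significance step can be sketched as follows. The slides do not say which variant of the t-test was used, so the paired form below is an assumption, as are the function name and the made-up per-split scores; only the test statistic is computed, and it would be compared against a t distribution with n − 1 degrees of freedom.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(scores_a, scores_b):
    """t statistic for paired samples, e.g. per-split MAP of two models."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0.0:
        raise ValueError("all differences are identical; t is undefined")
    return mean(diffs) / (sd / math.sqrt(n))

# Made-up scores for two hypothetical systems over 5 splits, for illustration only.
model_a = [0.31, 0.29, 0.33, 0.30, 0.32]
model_b = [0.30, 0.27, 0.33, 0.28, 0.31]
print(paired_t_statistic(model_a, model_b))
```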


Comparison with Dirichlet Smoothing

Table: LGD and SPL versus LM-Dirichlet after 10 splits; bold indicates significant difference

MAP   ROB-d  ROB-t  GIR   T3-t  CL-t  CL-d
DIR   27.1   25.1   41.1  25.6  36.2  48.5
LGD   27.4   25.0   42.1  24.8  36.8  49.7

P10   ROB-d  ROB-t  GIR   T3-t  CL-t  CL-d
DIR   45.6   43.3   68.6  54.0  28.4  33.8
LGD   46.2   43.5   69.0  54.3  28.6  34.5

MAP   ROB-d  ROB-t  GIR   T3-t  CL-t  CL-d
DIR   26.7   25.0   40.9  27.1  36.2  50.2
SPL   25.6   24.9   42.1  26.8  36.4  46.9

P10   ROB-d  ROB-t  GIR   T3-t  CL-t  CL-d
DIR   45.2   43.8   68.2  52.8  27.3  32.8
SPL   46.6   44.7   70.8  55.3  27.1  32.9


Comparison with DFR models

Table: LGD and SPL versus PL2 after 10 splits; bold indicates significant difference

MAP   ROB-d  ROB-t  GIR   T3-t  CL-t  CL-d
PL2   26.2   24.8   40.6  24.9  36.0  47.2
LGD   27.3   24.7   40.5  24.0  36.2  47.5

P10   ROB-d  ROB-t  GIR   T3-t  CL-t  CL-d
PL2   46.4   44.1   68.2  55.0  28.7  33.1
LGD   46.6   43.2   66.7  53.9  28.5  33.7

MAP   ROB-d  ROB-t  GIR   T3-t  CL-t  CL-d
PL2   26.3   25.2   42.8  25.8  37.3  45.7
SPL   26.3   25.2   42.7  25.3  37.4  44.1

P10   ROB-d  ROB-t  GIR   T3-t  CL-t  CL-d
PL2   46.0   45.2   69.3  54.8  26.2  32.7
SPL   47.0   45.2   69.8  55.4  25.9  32.9


Extension to Pseudo Relevance Feedback

Mean information of the top retrieved documents

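$$\mathrm{Info}_R(w) = \frac{1}{|R|} \sum_{d \in R} -\log P(Tf_w > t_w^d ; \lambda_w)$$

Query Update:

$$x_w^{q_2} = \frac{x_w^q}{\max_w x_w^q} + \beta\, \frac{\mathrm{Info}_R(w)}{\max_w \mathrm{Info}_R(w)}$$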
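
The feedback step can be sketched in a few lines. The code below assumes the log-logistic survival function for the information measure and represents each feedback document as a hypothetical `(term_frequencies, doc_len)` pair; the function names and toy values are illustrative and not from the slides.

```python
import math

def log_logistic_information(t, lam):
    """-log P(Tf > t | lam) under the log-logistic model."""
    return -math.log(lam / (t + lam))

def mean_information(term, feedback_docs, avg_len, lam, c=1.0):
    """Info_R(w): average information of `term` over the feedback set R.

    feedback_docs is a list of (term_frequencies, doc_len) pairs.
    """
    total = 0.0
    for doc_tf, doc_len in feedback_docs:
        t = doc_tf.get(term, 0) * math.log(1.0 + c * avg_len / doc_len)
        total += log_logistic_information(t, lam)
    return total / len(feedback_docs)

def update_query(query_tf, info, beta):
    """New query weights: normalized original weight plus beta times normalized Info_R."""
    max_q = max(query_tf.values())
    max_info = max(info.values())
    terms = set(query_tf) | set(info)
    return {w: query_tf.get(w, 0) / max_q + beta * info.get(w, 0.0) / max_info
            for w in terms}

# Toy values, for illustration only.
print(update_query({"a": 2, "b": 1}, {"a": 1.0, "c": 4.0}, beta=0.5))
```

In practice only the top-scoring terms by Info_R would be added to the query (the term count TC in the experiments).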


Comparison with other PRF Models

Mixture Model (Zhai)

R comes from a mixture of a relevant topic model $\theta_w$ and the corpus language model (multinomial distribution)

Query Update:

$$p(w \mid q_2) = \alpha\, p(w \mid q) + (1 - \alpha)\, \theta_w$$

Bo2 Model (Amati)

Documents in R are merged together. A geometric probability model measures the informative content of a word

Query Update:

$$x_w^{q_2} = \frac{x_w^q}{\max_w x_w^q} + \beta\, \frac{\mathrm{Info}_{Bo2}(w)}{\max_w \mathrm{Info}_{Bo2}(w)}$$


Pseudo Relevance Feedback Experiments

1 Divide each collection into 10 training/test splits

2 Learn the best interpolation weight (β, α) to optimize MAP on the training set

3 Measure MAP on the 10 splits and test the difference with a t-test

4 Change |R| and the number of terms TC added to the queries

5 Repeat


Table: MAP, bold indicates best performance, ∗ significant difference over LM and Bo2 models

Model    |R|  TC  ROB-t  GIRT   TREC3-t  CLEF-t
LM+MIX   5    5   27.5   44.4   30.7     36.6
INL+Bo2  5    5   26.5   42.0   30.6     37.6
LGD      5    5   28.3∗  44.3   32.9∗    37.6
LM+MIX   5    10  28.3   45.7∗  33.6     37.4
INL+Bo2  5    10  27.5   42.7   32.6     37.5
LGD      5    10  29.4∗  44.9   35.0∗    40.2∗
LM+MIX   10   10  28.4   45.5   31.8     37.6
INL+Bo2  10   10  27.2   43.0   32.3     37.4
LGD      10   10  30.0∗  46.8∗  35.5∗    38.9
LM+MIX   10   20  29.0   46.2   33.7     38.2
INL+Bo2  10   20  27.7   43.5   33.8     37.7
LGD      10   20  30.3∗  47.6∗  37.4∗    38.6


Table: Mean average precision (MAP) of PRF experiments; bold indicates best performance, ∗ significant difference over LM and Bo2 models

Model  |R|  TC  ROB-t  GIR    T3-t   CL-t
LGD    5    5   28.3∗  44.3   32.9∗  37.6
SPL    5    5   28.9∗  45.6∗  32.9∗  39.0∗
LGD    5    10  29.4∗  44.9   35.0∗  40.2∗
SPL    5    10  29.6∗  47.0∗  34.6∗  39.5∗
LGD    10   10  30.0∗  46.8∗  35.5∗  38.9
SPL    10   10  30.0∗  48.9∗  33.8∗  39.1∗
LGD    10   20  30.3∗  47.6∗  37.4∗  38.6
SPL    10   20  29.9∗  50.2∗  34.3   39.7∗
LGD    20   20  29.5∗  48.9∗  37.2∗  41.0∗
SPL    20   20  28.8   50.3∗  33.9   39.0∗


Conclusion

Can we design IR models compatible with empirical evidence?

⇒ Proposal: Information Models modelling burstiness (better fit to data)

Analytical Characterization of Retrieval Constraints

Definition of Burstiness for Probability Distributions

Information-Based Models compliant with Retrieval Constraints
- Bursty Distribution ⇒ Concave Model

Extension to PRF

The Log-logistic and Smoothed Power Law Models
- Similar or better performance compared to LM and DFR without PRF, better with PRF

Questions?


Relation with DFR

DFR Models are defined by:

$$RSV(q, d) = \sum_{w \in q \cap d} -x_w^q\, \mathrm{Inf}_2(t_w^d) \log P(t_w^d)$$

We can show that:

$\mathrm{Inf}_2$ makes DFR models concave (condition 2)

Without $\mathrm{Inf}_2$, DFR models perform poorly

Discrete laws with continuous values

2 notions of information (non-homogeneous)

⇒ Information Models use continuous laws and a single concept of information
