Post on 23-Apr-2020
TRANSCRIPT
Query Anchoring Using Discriminative Query Models
Saar Kuzi, Anna Shtok, Oren Kurland
Technion – Israel Institute of Technology
We thank SIGIR for the conference travel grant
Pseudo-Feedback-Based Query Expansion
Highly ranked documents are used to induce a query model
Relying on pseudo feedback may result in query drift:
Documents in the result list could be non-relevant
Relevant documents can contain non-query-pertaining information (Harman '92, He&Ounis '09, Lv&Zhai '09)
[Diagram: the query retrieves an initial result list, 𝐷𝑖𝑛𝑖𝑡, which serves as pseudo feedback]
2
Query Anchoring
Techniques for mitigating the risk in relying on pseudo feedback
Direct:
Interpolation with a model of the original query (e.g., Zhai&Lafferty '01, Abdul-Jaleel et al. '04, Lv&Zhai '09)
Using the original query model as a prior (Tao&Zhai '04, Tao&Zhai '06)
Indirect:
Term clipping (e.g., Zhai&Lafferty '01, Abdul-Jaleel et al. '04, Ye et al. '10)
Differential impact of documents on the query model (Lavrenko&Croft '01, Abdul-Jaleel et al. '04, Lv&Zhai '14)
3
Our Approach
A novel indirect query anchoring approach using a new discriminative term-based model
An accurate term-based representation of the initial ranking
[Diagram: the query and the initial result list 𝐷𝑖𝑛𝑖𝑡 feed a learning-to-rank method that induces a discriminative query model, which is then used to produce an anchored query model]
4
Language Model Notation
Unigram language models are used. Given text 𝑥:
𝑝_MLE(𝑡|𝑥) ≝ tf(𝑡 ∈ 𝑥) / |𝑥|
𝑝_Dir(𝑡|𝑥) ≝ (tf(𝑡 ∈ 𝑥) + 𝜇 𝑝_MLE(𝑡|𝐶)) / (𝜇 + |𝑥|)
Two language models, 𝜃1 and 𝜃2, are compared using cross entropy:
CE(𝑝(∙|𝜃1) || 𝑝(∙|𝜃2)) = −Σ_𝑡 𝑝(𝑡|𝜃1) log 𝑝(𝑡|𝜃2)
𝑡 is a term, |𝑥| is the length of 𝑥, and 𝐶 is the collection of documents
5
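These estimates can be sketched in Python; this is a minimal illustration over token lists (the Dirichlet prior 𝜇 and the toy texts are placeholders, not values from the talk):

```python
import math
from collections import Counter

def p_mle(t, text):
    """Maximum-likelihood estimate: tf(t in x) / |x| (text is a token list)."""
    return Counter(text)[t] / len(text)

def p_dir(t, doc, collection, mu=1000.0):
    """Dirichlet-smoothed estimate: (tf(t in x) + mu * p_MLE(t|C)) / (mu + |x|)."""
    return (Counter(doc)[t] + mu * p_mle(t, collection)) / (mu + len(doc))

def cross_entropy(p1, p2):
    """CE(p(.|theta1) || p(.|theta2)) = -sum_t p1(t) * log p2(t).
    p1, p2 are dicts mapping terms to probabilities."""
    return -sum(pt * math.log(p2[t]) for t, pt in p1.items() if pt > 0)
```

With 𝜇 = 0, 𝑝_Dir reduces to the maximum-likelihood estimate, which is a quick sanity check for the implementation.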
Generative Query Models
Mixture Model (Zhai&Lafferty '01): 𝜃_𝑇 is induced by maximizing the log-likelihood of documents in 𝐷𝑖𝑛𝑖𝑡:
Σ_{𝑑∈𝐷𝑖𝑛𝑖𝑡} Σ_{𝑡∈𝑑} tf(𝑡 ∈ 𝑑) log((1 − 𝛾) 𝑝(𝑡|𝜃_𝑇) + 𝛾 𝑝(𝑡|𝐶))
𝑝(𝑡|MM) ≝ 𝜆 𝑝_MLE(𝑡|𝑞) + (1 − 𝜆) 𝑝(𝑡|𝜃_𝑇^clipped)
Relevance Model (Lavrenko&Croft '01):
𝑝(𝑡|RM1) ≝ Σ_{𝑑∈𝐷𝑖𝑛𝑖𝑡} 𝑝_Dir(𝑡|𝑑) 𝑝(𝑑|𝑞)
𝑝(𝑡|RM3) ≝ 𝜆 𝑝_MLE(𝑡|𝑞) + (1 − 𝜆) 𝑝(𝑡|RM1^clipped) (Abdul-Jaleel et al. '04)
6
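As an illustration, the relevance model estimates (RM1 and its anchored variant RM3) can be sketched as follows; this is a hypothetical minimal implementation, where the smoothed term probability function and the document weights 𝑝(𝑑|𝑞) are assumed to be supplied by the retrieval system:

```python
from collections import Counter

def rm1(d_init, p_d_given_q, p_term):
    """RM1: p(t|RM1) = sum_{d in D_init} p(t|d) * p(d|q).
    d_init: list of documents (token lists); p_d_given_q: weights summing to 1;
    p_term: function (t, d) -> (smoothed) term probability."""
    vocab = {t for d in d_init for t in d}
    return {t: sum(p_term(t, d) * w for d, w in zip(d_init, p_d_given_q))
            for t in vocab}

def clip(model, k):
    """Term clipping: keep the k highest-probability terms and sum-normalize."""
    top = dict(sorted(model.items(), key=lambda kv: -kv[1])[:k])
    z = sum(top.values())
    return {t: p / z for t, p in top.items()}

def rm3(query, rm1_model, lam, k):
    """RM3: interpolate the clipped RM1 with the original query model
    (direct query anchoring)."""
    clipped = clip(rm1_model, k)
    q_mle = Counter(query)
    terms = set(clipped) | set(q_mle)
    return {t: lam * q_mle[t] / len(query) + (1 - lam) * clipped.get(t, 0.0)
            for t in terms}
```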
Discriminative Model
The pseudo feedback assumption: the higher a document is ranked in the initial result list, the higher its relevance likelihood
∀𝑑𝑖, 𝑑𝑗 ∈ 𝐷𝑖𝑛𝑖𝑡, if 𝑟(𝑑𝑖) > 𝑟(𝑑𝑗), then 𝑑𝑖 is more likely to be relevant than 𝑑𝑗
7
SVM-rank (Joachims '02)
min (1/2) 𝑤 ∙ 𝑤 + 𝐶 Σ_{𝑖,𝑗} 𝜉𝑖,𝑗
s.t. ∀𝑖∀𝑗: 𝑟(𝑑𝑖) > 𝑟(𝑑𝑗) ⇒ 𝑤(𝜙(𝑑𝑖) − 𝜙(𝑑𝑗)) ≥ 1 − 𝜉𝑖,𝑗
∀𝑖∀𝑗: 𝑟(𝑑𝑖) > 𝑟(𝑑𝑗) ⇒ 𝜉𝑖,𝑗 ≥ 0
𝜙(𝑑) = (log 𝑝_Dir(𝑡1|𝑑), …, log 𝑝_Dir(𝑡|𝑉| |𝑑))
𝑉 is the vocabulary used in the initial result list
8
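SVM-rank itself is Joachims' standalone tool; as a rough illustration of the objective it optimizes, the equivalent pairwise hinge loss can be minimized with plain subgradient descent. This is a simplified sketch, not the actual optimizer; feature vectors 𝜙(𝑑) are assumed precomputed, and the step size and epoch count are arbitrary:

```python
def train_rank_svm(feats, ranks, c=1.0, lr=0.01, epochs=200):
    """Subgradient descent on the ranking-SVM objective:
    (1/2) w.w + C * sum_{r(d_i) > r(d_j)} max(0, 1 - w.(phi(d_i) - phi(d_j))).
    feats: list of feature vectors phi(d); ranks: higher value = ranked higher."""
    dim = len(feats[0])
    w = [0.0] * dim
    # Preference pairs induced by the initial ranking.
    pairs = [(i, j) for i in range(len(feats)) for j in range(len(feats))
             if ranks[i] > ranks[j]]
    for _ in range(epochs):
        grad = list(w)  # gradient of the regularizer (1/2) w.w
        for i, j in pairs:
            diff = [a - b for a, b in zip(feats[i], feats[j])]
            if sum(wk * dk for wk, dk in zip(w, diff)) < 1.0:  # margin violated
                grad = [g - c * dk for g, dk in zip(grad, diff)]
        w = [wk - lr * g for wk, g in zip(w, grad)]
    return w
```

The learned 𝑤 scores documents by 𝑤 ∙ 𝜙(𝑑), and its per-term weights are what the next slides turn into anchor models.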
Model Derivation
The learned weight vector 𝑤 is split into its positive components 𝑤+ and its negative components 𝑤−; 𝐿1-normalizing each yields the positive anchor model 𝜃𝑤+ and the negative anchor model 𝜃𝑤−
9
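The derivation of the two anchor models from the learned weight vector 𝑤 can be sketched as follows (a minimal illustration; the function name is ours):

```python
def anchor_models(w, vocab):
    """Split the learned weight vector into its positive and negative
    components and L1-normalize each, yielding the positive anchor model
    theta_w+ and the negative anchor model theta_w-."""
    pos = {t: wi for t, wi in zip(vocab, w) if wi > 0}
    neg = {t: -wi for t, wi in zip(vocab, w) if wi < 0}
    z_pos, z_neg = sum(pos.values()), sum(neg.values())
    theta_pos = {t: v / z_pos for t, v in pos.items()}
    theta_neg = {t: v / z_neg for t, v in neg.items()}
    return theta_pos, theta_neg
```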
AnchorPos Method
Anchoring a generative model, 𝜃, using the positive anchor model, 𝜃𝑤+:
∀𝑡: 𝑠(𝑡) = 𝜆2 𝑝(𝑡|𝜃) + 𝜆3 𝑝(𝑡|𝜃𝑤+)
1. Term clipping
2. Sum normalization, yielding 𝑝(𝑡|𝜗+)
Interpolation with the original query model:
𝑝(𝑡|𝜃_AnchorPos) = 𝜆1 𝑝_MLE(𝑡|𝑞) + (1 − 𝜆1) 𝑝(𝑡|𝜗+)
𝜆1 + 𝜆2 + 𝜆3 = 1, 𝜆𝑖 ≥ 0
10
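A minimal sketch of the AnchorPos construction, assuming all models are given as term→probability dicts (the function name and toy parameter values are illustrative):

```python
def anchor_pos(q_model, gen_model, theta_pos, lam1, lam2, lam3, k):
    """AnchorPos: score s(t) = lam2*p(t|theta) + lam3*p(t|theta_w+);
    clip to the k top-scoring terms and sum-normalize, yielding p(t|v+);
    then interpolate with the original query model:
    lam1*p_MLE(t|q) + (1 - lam1)*p(t|v+). Requires lam1+lam2+lam3 = 1."""
    scores = {t: lam2 * gen_model.get(t, 0.0) + lam3 * theta_pos.get(t, 0.0)
              for t in set(gen_model) | set(theta_pos)}
    top = dict(sorted(scores.items(), key=lambda kv: -kv[1])[:k])  # term clipping
    z = sum(top.values())
    v_plus = {t: s / z for t, s in top.items()}  # sum normalization
    terms = set(v_plus) | set(q_model)
    return {t: lam1 * q_model.get(t, 0.0) + (1 - lam1) * v_plus.get(t, 0.0)
            for t in terms}
```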
ClipNeg Method
Clipping negative anchor terms from a generative model, 𝑝(𝑡|𝜃):
1. Setting to 0 the probabilities of negative anchor terms
2. Term clipping
3. Sum normalization, yielding 𝑝(𝑡|𝜗−)
Interpolation with the original query model:
𝑝(𝑡|𝜃_ClipNeg) = 𝜆 𝑝_MLE(𝑡|𝑞) + (1 − 𝜆) 𝑝(𝑡|𝜗−)
11
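The three ClipNeg steps can likewise be sketched as (a minimal illustration; the function name is ours):

```python
def clip_neg(q_model, gen_model, theta_neg, lam, k):
    """ClipNeg: zero out the probabilities of negative anchor terms in the
    generative model, clip to the k top terms, sum-normalize (yielding
    p(t|v-)), and interpolate with the original query model."""
    pruned = {t: p for t, p in gen_model.items() if t not in theta_neg}
    top = dict(sorted(pruned.items(), key=lambda kv: -kv[1])[:k])  # term clipping
    z = sum(top.values())
    v_minus = {t: p / z for t, p in top.items()}  # sum normalization
    terms = set(v_minus) | set(q_model)
    return {t: lam * q_model.get(t, 0.0) + (1 - lam) * v_minus.get(t, 0.0)
            for t in terms}
```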
Related Work
Existing query anchoring techniques (direct query anchoring, term clipping, and differential weighting)
Applying our approach on top of these yields further improvements
Methods for improving the quality of the pseudo feedback result list (e.g., Mitra et al. '98, Lee et al. '08)
Our model can be induced from any ranked list
12
Related Work
A supervised term classification approach (Cao et al. '08)
Our approach is unsupervised and focuses on unigram query models
Clustering of terms in a query model (Udupa et al. '09)
A method for cluster selection was not proposed
Fusing the initial and the expansion-based result lists (Zighelnic&Kurland '08)
Our methods operate at the model level and yield better performance
13
Evaluation
TREC datasets: TREC123, ROBUST, and WT10G
The initial result list is retrieved using a standard language model method (Lafferty&Zhai '01): −CE(𝑝_MLE(∙|𝑞) || 𝑝_Dir(∙|𝑑))
Baselines:
◦ Generative models (RM3, MM)
◦ Fusion (Zighelnic&Kurland '08)
Values of free parameters are set using leave-one-out cross validation
14
The Discriminative Model
• Re-ranking an initial result list of 100 documents according to:
−𝛼 CE(𝑝(∙|𝜃𝑤+) || 𝑝_Dir(∙|𝑑)) + (1 − 𝛼) CE(𝑝(∙|𝜃𝑤−) || 𝑝_Dir(∙|𝑑))
• The positive and negative anchor models are clipped to use 𝜈 terms
[Plot: Kendall's tau as a function of 𝛼 on ROBUST, for 𝜈=100, 𝜈=500, and 𝜈=ALL]
16
Discriminative vs. Generative
Query: Airport Security, ROBUST, QL (AP=24.8)
[Figure: the top terms of RM1 (AP=24.8) and of 𝜃𝑤+ (AP=49.3)]
• The discriminative model assigns high probabilities to terms with high IDF values
• The generative models are much more similar to each other, with respect to the terms they promote, than they are to the discriminative model
16
AnchorPos
Anchoring a generative model using the positive anchor model
[Bar charts: MAP of RM3 and MM vs. their Fusion and AnchorPos variants on TREC123, ROBUST, and WT10G; 𝜓 and ∗ mark statistically significant differences with the generative model and with Fusion, respectively]
17
ClipNeg
Clipping negative anchor terms
[Bar charts: MAP of RM3 and MM vs. their ClipNeg variants on TREC123, ROBUST, and WT10G; ∗ marks statistically significant differences with the generative model]
18
Summary
We presented a novel unsupervised pseudo-feedback-based discriminative query model that is based on a learning-to-rank approach
We devised several methods that use the discriminative model to perform (indirect) query anchoring of existing query models
Empirical evaluation showed that using our methods can improve the performance of highly effective generative query models
19