Robust Query Expansion Based on Query-Drift Prevention
Liron Zighelnic
Academic advisor: Dr. Oren Kurland
Based on our work at SIGIR 08’
The Faculty of Industrial Engineering and Management
Technion - Israel Institute of Technology
30.6.2009 - Information Systems Seminar
Outline
1 Background
   Ad Hoc Retrieval
2 Query Expansion
   Motivation
   Pseudo Relevance Feedback
   The Performance Robustness Problem
   Query Expansion Models
3 Query-Drift Prevention
   Improving Robustness Using Fusion
   Improving Robustness Using Re-ordering Methods
4 Experimental Evaluation
5 Related Work
6 Summary
7 Questions
Our Mission - Ad Hoc Retrieval
[Figure: the information need is expressed as a query q; the retrieval system scores the documents d_i ∈ C of corpus C with Score_init(d) and returns a ranked list of documents, D_init = d_1, d_2, d_3, d_4, ..., d_n.]
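The diagram leaves the scoring function Score_init unspecified. The sketch below assumes, for illustration only, Dirichlet-smoothed query likelihood with a hypothetical smoothing parameter `mu`; the function name `initial_retrieval` and the toy corpus are likewise hypothetical. It returns the top-n list D_init as (document, score) pairs.

```python
import math
from collections import Counter


def initial_retrieval(query, corpus, n=3, mu=1000.0):
    """Return D_init: the top-n (doc_id, score) pairs for `query`.

    Assumed Score_init: log query likelihood with Dirichlet smoothing
    (an illustrative choice, not necessarily the one used in the talk).
    """
    # Collection language model p_C(w), used to smooth document models.
    coll_tf = Counter()
    for tokens in corpus.values():
        coll_tf.update(tokens)
    coll_len = sum(coll_tf.values())

    scored = []
    for doc_id, tokens in corpus.items():
        tf = Counter(tokens)
        score = 0.0
        for w in query:
            p_c = coll_tf[w] / coll_len
            if p_c == 0.0:
                # Term absent from the whole corpus: skip it, since its
                # smoothed probability would be zero in every document.
                continue
            score += math.log((tf[w] + mu * p_c) / (len(tokens) + mu))
        scored.append((doc_id, score))

    # Highest score first; ties broken by doc id so the output is deterministic.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:n]


corpus = {
    "d1": "paris hilton heiress model".split(),
    "d2": "paris france eiffel tower".split(),
    "d3": "hilton hotel rooms booking".split(),
    "d4": "nature pictures view photos".split(),
}
d_init = initial_retrieval(["paris", "hilton"], corpus, n=3)
```

Here "d1", which contains both query terms, is ranked first, and "d4", which shares no term with the query, falls outside D_init.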
Query Expansion - Motivation
Users tend to use (very) short queries.
The polysemy problem (e.g., q: "Paris Hilton").
The vocabulary mismatch problem (e.g., q: "view photos", d: "nature pictures").
Expansion: Relevance Feedback vs. Pseudo Relevance Feedback (a.k.a. blind feedback) (Buckley et al. 94’, Xu and Croft 96’).
Pseudo Relevance Feedback
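[Figure: the initial list D_init is used to build an expanded query; the retrieval system scores documents with Score_pf(d) and returns the expansion-based list PF(D_init) = d'_1, d'_2, d'_3, d'_4, ..., d'_n.]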
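The two-pass flow in the figure can be sketched as follows. This is a toy illustration under stated assumptions: the scoring function (a weighted sum of relative term frequencies), the choice of the top-k documents as pseudo-relevant, and the "most frequent terms" expansion rule are all hypothetical stand-ins, as are the function names.

```python
from collections import Counter


def score(weighted_query, tokens):
    """Hypothetical scoring: weighted sum of relative term frequencies in the document."""
    tf = Counter(tokens)
    return sum(weight * tf[w] / len(tokens) for w, weight in weighted_query.items())


def retrieve(weighted_query, corpus, n):
    """Return the top-n (doc_id, score) pairs, highest score first (ties by doc id)."""
    scored = [(doc_id, score(weighted_query, tokens)) for doc_id, tokens in corpus.items()]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:n]


def pseudo_relevance_feedback(query, corpus, k=2, num_terms=3, n=3):
    """Blind feedback: assume the top-k documents of D_init are relevant,
    add their most frequent terms to the query, and retrieve again."""
    original = {w: 1.0 / len(query) for w in query}
    d_init = retrieve(original, corpus, n)  # initial ranked list D_init
    feedback_tf = Counter()
    for doc_id, _ in d_init[:k]:
        feedback_tf.update(corpus[doc_id])
    top_terms = feedback_tf.most_common(num_terms)
    total = sum(count for _, count in top_terms)
    expanded = dict(original)  # keep the original query terms in the expanded query
    for w, count in top_terms:
        expanded[w] = expanded.get(w, 0.0) + count / total
    pf_list = retrieve(expanded, corpus, n)  # expansion-based list PF(D_init)
    return d_init, expanded, pf_list


corpus = {
    "d1": "paris hilton heiress model".split(),
    "d2": "paris france eiffel tower".split(),
    "d3": "hilton hotel rooms booking".split(),
    "d4": "heiress model fashion show".split(),
}
d_init, expanded, pf_list = pseudo_relevance_feedback(["paris", "hilton"], corpus)
```

Because the expansion terms come only from documents that are assumed relevant, a non-relevant top document would inject its own vocabulary into the expanded query, which is the root of the query-drift problem discussed next.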
The Performance Robustness Problem
Problems:
D_init may contain many non-relevant documents.
The initially retrieved document list D_init may not manifest all query-related aspects (Buckley 04’).
Consequences:
Query drift - the shift in “intention” from the original query to its expanded form (Mitra et al. 98’) (e.g., q: "Paris Hilton", q’: "Paris Hilton Whitney model heiress").
While on average pseudo-feedback-based query expansion methods improve retrieval effectiveness over that of retrieval using the original query, there are numerous queries for which this is not true.
The Performance Robustness Problem - Cont.
[Figure: RM1 Query Drift - ROBUST corpus, queries 301-350. Per-query difference in effectiveness (y axis, from -0.4 to 0.4) for each query (x axis).]
Query Expansion Models
The Relevance Model - RM1 (Lavrenko and Croft 01’): the relevance model paradigm assumes that there exists a (language) model RM1 that generates terms both in the query and in the relevant documents.¹
$$p_{RM1}(w) \stackrel{\mathrm{def}}{=} \sum_{d \in D_{init}} p_d(w)\, p_q(d)$$
The Interpolated Relevance Model - RM3 (Abdul-Jaleel et al. 04’): query anchoring at the model level:
$$p_{RM3}(w) \stackrel{\mathrm{def}}{=} \lambda\, p_q(w) + (1-\lambda)\, p_{RM1}(w)$$
—————————————
1. $p_x(y)$ denotes the "similarity" between x and y.
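A minimal sketch of the RM1 and RM3 definitions. It assumes maximum-likelihood document models for p_d(w) and takes p_q(d) as given similarity values that are normalized over D_init so that RM1 sums to 1 (both assumptions; the slide only says p_x(y) is a "similarity"). The function names and toy inputs are hypothetical.

```python
from collections import Counter


def doc_model(tokens):
    """Assumed p_d(w): maximum-likelihood (unsmoothed) document language model."""
    tf = Counter(tokens)
    return {w: c / len(tokens) for w, c in tf.items()}


def rm1(d_init_models, p_q_d):
    """p_RM1(w) = sum over d in D_init of p_d(w) * p_q(d).

    d_init_models: {doc_id: p_d}; p_q_d: {doc_id: similarity of d to q}.
    """
    z = sum(p_q_d.values())  # normalize p_q(d) over D_init so p_RM1 sums to 1
    model = Counter()
    for doc_id, p_d in d_init_models.items():
        for w, p in p_d.items():
            model[w] += p * p_q_d[doc_id] / z
    return dict(model)


def rm3(p_q, p_rm1, lam=0.5):
    """p_RM3(w) = lam * p_q(w) + (1 - lam) * p_RM1(w): query anchoring at the model level."""
    vocab = set(p_q) | set(p_rm1)
    return {w: lam * p_q.get(w, 0.0) + (1 - lam) * p_rm1.get(w, 0.0) for w in vocab}


docs = {"d1": "paris hilton heiress model".split(), "d2": "paris france eiffel tower".split()}
models = {doc_id: doc_model(tokens) for doc_id, tokens in docs.items()}
p_q_d = {"d1": 0.75, "d2": 0.25}  # hypothetical similarity values
p_q = {"paris": 0.5, "hilton": 0.5}  # hypothetical query model
p_rm1 = rm1(models, p_q_d)
p_rm3 = rm3(p_q, p_rm1, lam=0.5)
```

With λ = 1 RM3 reduces to the query model, and with λ = 0 it reduces to RM1, so λ directly controls how strongly the expanded model is anchored to the original query.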
Query Expansion Models - Cont.
Rocchio-1: if we take the RM1 model and set p_q(d) to a uniform distribution, we get the following model:
$$p_{Rocchio1}(w) \stackrel{\mathrm{def}}{=} \sum_{d \in D_{init}} p_d(w) \cdot \frac{1}{|D_{init}|}$$
where all documents in D_init are equal contributors to the constructed model.
Rocchio-3 (Rocchio 71’): query anchoring at the model level:
$$p_{Rocchio3}(w) \stackrel{\mathrm{def}}{=} \lambda\, p_q(w) + (1-\lambda) \sum_{d \in D_{init}} p_d(w) \cdot \frac{1}{|D_{init}|}$$
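Following the slide's definitions (not the vector-space form of Rocchio's original algorithm), Rocchio1 is RM1 with uniform document weights, and Rocchio3 anchors it to the query model. The sketch again assumes maximum-likelihood document models; the function names and toy inputs are hypothetical.

```python
from collections import Counter


def doc_model(tokens):
    """Assumed p_d(w): maximum-likelihood (unsmoothed) document language model."""
    tf = Counter(tokens)
    return {w: c / len(tokens) for w, c in tf.items()}


def rocchio1(d_init_models):
    """p_Rocchio1(w) = sum over d in D_init of p_d(w) / |D_init|: every document counts equally."""
    n = len(d_init_models)
    model = Counter()
    for p_d in d_init_models.values():
        for w, p in p_d.items():
            model[w] += p / n
    return dict(model)


def rocchio3(p_q, d_init_models, lam=0.5):
    """p_Rocchio3(w) = lam * p_q(w) + (1 - lam) * p_Rocchio1(w)."""
    p_r1 = rocchio1(d_init_models)
    vocab = set(p_q) | set(p_r1)
    return {w: lam * p_q.get(w, 0.0) + (1 - lam) * p_r1.get(w, 0.0) for w in vocab}


docs = {"d1": "paris hilton heiress model".split(), "d2": "paris france eiffel tower".split()}
models = {doc_id: doc_model(tokens) for doc_id, tokens in docs.items()}
p_q = {"paris": 0.5, "hilton": 0.5}  # hypothetical query model
p_r1 = rocchio1(models)
p_r3 = rocchio3(p_q, models, lam=0.5)
```

Compared with RM1, a document that matches the query poorly still contributes as much as a highly similar one, which is why Rocchio1 does not weight documents with respect to p_q(d).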
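Query Expansion Models - Cont.
Model    | Weighting with respect to p_q(d) | Interpolation with the original query | Model definition
RM1      | yes | no  | $\sum_{d \in D_{init}} p_d(w)\, p_q(d)$
RM3      | yes | yes | $\lambda p_q(w) + (1-\lambda) \sum_{d \in D_{init}} p_d(w)\, p_q(d)$
Rocchio1 | no  | no  | $\sum_{d \in D_{init}} p_d(w) \cdot \frac{1}{|D_{init}|}$
Rocchio3 | no  | yes | $\lambda p_q(w) + (1-\lambda) \sum_{d \in D_{init}} p_d(w) \cdot \frac{1}{|D_{init}|}$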
Our Idea - Using Fusion
Data fusion: combining retrieval methods or query representations.
Data fusion - motivation:
Using a variety of methods (result lists) exploits different aspects of the search space and hence can return more relevant results.
Effectiveness gains at minimal overhead.
Improving Robustness Using Fusion - Motivation
Documents ranked high by both retrieved lists are potentially relevant, since they constitute a good match to both forms of the presumed information need:
A document ranked high by the initial retrieval can be assumed to have high surface-level similarity to the original query.
Query expansion can add aspects that were not in the original query but may be relevant to the information need, and may improve retrieval.
A document that is ranked high by both the initial retrieval and the expansion-based retrieval is assumed to (potentially) suffer less from query drift.
Documents that are retrieved using a variety of query representations have a high chance of being relevant (Belkin et al. 93’, Robertson 97’).
Improving Robustness Using Fusion
The following retrieval methods operate on D_init ∪ PF(D_init).
Combmnz (Fox and Shaw 94’) rewards documents that are ranked high in both D_init and PF(D_init):²,³
$$Score_{combmnz}(d) \stackrel{\mathrm{def}}{=} \left(\delta[d \in D_{init}] + \delta[d \in PF(D_{init})]\right) \cdot \left( \frac{Score_{init}(d)}{\sum_{d' \in D_{init}} Score_{init}(d')} + \frac{Score_{pf}(d)}{\sum_{d' \in PF(D_{init})} Score_{pf}(d')} \right).$$
—————————————
2. For a statement s, $\delta[s] = 1$ if s is true and 0 otherwise.
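A sketch of the CombMNZ formula above, assuming non-negative scores (so that sum normalization is meaningful) and treating a document's score as 0 in a list that does not contain it. Inputs are dictionaries from document ids to scores; `sum_normalize`, `combmnz`, and the toy scores are hypothetical.

```python
def sum_normalize(scores):
    """Divide each score by the sum of the scores in its list (assumes non-negative scores)."""
    total = sum(scores.values())
    return {d: s / total for d, s in scores.items()}


def combmnz(init_scores, pf_scores):
    """Fuse D_init and PF(D_init) over their union.

    Score(d) = (delta[d in D_init] + delta[d in PF(D_init)])
               * (normalized Score_init(d) + normalized Score_pf(d)),
    where a list that does not contain d contributes 0 to the sum.
    """
    init_n = sum_normalize(init_scores)
    pf_n = sum_normalize(pf_scores)
    fused = {}
    for d in set(init_scores) | set(pf_scores):
        hits = (d in init_scores) + (d in pf_scores)  # True/False add as 1/0
        fused[d] = hits * (init_n.get(d, 0.0) + pf_n.get(d, 0.0))
    # Highest fused score first; ties broken by doc id.
    return sorted(fused.items(), key=lambda pair: (-pair[1], pair[0]))


init_scores = {"d1": 4.0, "d2": 3.0, "d3": 3.0}  # hypothetical D_init scores
pf_scores = {"d1": 5.0, "d4": 3.0, "d5": 2.0}    # hypothetical PF(D_init) scores
ranking = combmnz(init_scores, pf_scores)
```

The multiplier doubles the score of "d1", the only document in both lists, which is how CombMNZ rewards agreement between the original and the expanded query.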
Improving Robustness Using Fusion - Cont.
The interpolation algorithm differentially weights the initial score and the pseudo-feedback-based score using an interpolation parameter λ:
$$Score_{interpolation}(d) \stackrel{\mathrm{def}}{=} \lambda\, \delta[d \in D_{init}] \frac{Score_{init}(d)}{\sum_{d' \in D_{init}} Score_{init}(d')} + (1-\lambda)\, \delta[d \in PF(D_{init})] \frac{Score_{pf}(d)}{\sum_{d' \in PF(D_{init})} Score_{pf}(d')}.$$
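The interpolation rule can be sketched under the same assumptions as CombMNZ (non-negative scores, sum normalization, and a missing document contributing 0, which plays the role of the δ terms). The names `interpolation` and `sum_normalize` and the toy scores are hypothetical.

```python
def sum_normalize(scores):
    """Divide each score by the sum of the scores in its list (assumes non-negative scores)."""
    total = sum(scores.values())
    return {d: s / total for d, s in scores.items()}


def interpolation(init_scores, pf_scores, lam=0.5):
    """Score(d) = lam * normalized Score_init(d) + (1 - lam) * normalized Score_pf(d),
    over D_init ∪ PF(D_init); .get(d, 0.0) implements the delta[d in list] factor."""
    init_n = sum_normalize(init_scores)
    pf_n = sum_normalize(pf_scores)
    fused = {
        d: lam * init_n.get(d, 0.0) + (1 - lam) * pf_n.get(d, 0.0)
        for d in set(init_scores) | set(pf_scores)
    }
    # Highest fused score first; ties broken by doc id.
    return sorted(fused.items(), key=lambda pair: (-pair[1], pair[0]))


init_scores = {"d1": 4.0, "d2": 3.0, "d3": 3.0}  # hypothetical D_init scores
pf_scores = {"d1": 5.0, "d4": 3.0, "d5": 2.0}    # hypothetical PF(D_init) scores
ranking = interpolation(init_scores, pf_scores, lam=0.5)
```

Unlike CombMNZ, appearing in both lists is not rewarded by a multiplier; setting λ = 1 recovers the initial ranking and λ = 0 recovers the expansion-based ranking.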
Improving Robustness Using Re-ordering Methods
rerank
The rerank method (e.g., Kurland and Lee 04’) re-orders the (top) pseudo-feedback-based retrieval results by the initial scores of documents. This method anchors the documents in PF(D_init) to the query by using their initial scores.
$$Score_{rerank}(d) \stackrel{\mathrm{def}}{=} \delta[d \in PF(D_{init})]\, Score_{init}(d).$$
rev_rerank
The rev_rerank method re-orders the (top) initial retrieval results by the pseudo-feedback-based scores of documents.
$$Score_{rev\_rerank}(d) \stackrel{\mathrm{def}}{=} \delta[d \in D_{init}]\, Score_{pf}(d).$$
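Both re-ordering methods keep one list's documents and borrow the other list's scores. The sketch assumes Score_init is available for every document in PF(D_init) (e.g., by scoring it with the original query) and Score_pf for every document in D_init; the function names and toy scores are hypothetical.

```python
def rerank(pf_docs, init_scores):
    """Score_rerank(d) = delta[d in PF(D_init)] * Score_init(d):
    keep the expansion-based documents, order them by their initial scores.
    init_scores must cover every document in pf_docs."""
    return sorted(pf_docs, key=lambda d: (-init_scores[d], d))


def rev_rerank(init_docs, pf_scores):
    """Score_rev_rerank(d) = delta[d in D_init] * Score_pf(d):
    keep the initial documents, order them by their expansion-based scores.
    pf_scores must cover every document in init_docs."""
    return sorted(init_docs, key=lambda d: (-pf_scores[d], d))


# Hypothetical scores of each document under the original and the expanded query.
original_scores = {"d1": 4.0, "d2": 3.5, "d3": 3.0, "d4": 2.5, "d5": 2.0}
expanded_scores = {"d1": 5.0, "d2": 1.0, "d3": 2.5, "d4": 3.0, "d5": 4.0}
d_init = ["d1", "d2", "d3"]   # top-3 by original_scores
pf_list = ["d1", "d5", "d4"]  # top-3 by expanded_scores

reranked = rerank(pf_list, original_scores)
rev_reranked = rev_rerank(d_init, expanded_scores)
```

rerank never introduces a document that the expanded query did not retrieve, while rev_rerank never leaves the initial list; each method uses the other list only to decide the order.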
Evaluation
Evaluation measures:
MAP - Mean Average Precision (measure of effectiveness)
<Init - percentage of queries for which the expansion-based performance is worse than that of using the original query (measure of robustness)
TREC collections:
Corpus | Queries          | Disks
TREC   | 51-200           | 1-3
ROBUST | 301-450, 601-700 | 4,5
WSJ    | 151-200          | 1-2
SJMN   | 51-150           | 3
AP     | 51-150           | 1-3
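A sketch of the two evaluation measures. It assumes per-query performance is measured by average precision over the full returned ranking (no rank cutoff), and that <Init compares each query's expansion-based AP with its AP under the original query; the function names and toy relevance judgments are hypothetical.

```python
def average_precision(ranking, relevant):
    """AP: mean of precision@k over the ranks k at which relevant documents appear,
    divided by the total number of relevant documents."""
    hits, precision_sum = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0


def mean_average_precision(runs, qrels):
    """MAP: average of per-query AP over all judged queries."""
    return sum(average_precision(runs[q], qrels[q]) for q in qrels) / len(qrels)


def percent_worse_than_init(expanded_runs, init_runs, qrels):
    """<Init: percentage of queries whose expansion-based AP is lower than the initial AP."""
    worse = sum(
        average_precision(expanded_runs[q], qrels[q]) < average_precision(init_runs[q], qrels[q])
        for q in qrels
    )
    return 100.0 * worse / len(qrels)


qrels = {"q1": {"d1", "d3"}, "q2": {"d2"}}  # hypothetical relevance judgments
init_runs = {"q1": ["d1", "d2", "d3"], "q2": ["d2", "d1", "d3"]}
expanded_runs = {"q1": ["d1", "d3", "d2"], "q2": ["d1", "d2", "d3"]}

map_init = mean_average_precision(init_runs, qrels)
map_expanded = mean_average_precision(expanded_runs, qrels)
below_init = percent_worse_than_init(expanded_runs, init_runs, qrels)
```

The toy example shows why both measures are reported: expansion helps q1 and hurts q2, so a single average hides that half of the queries got worse.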
Query Drift Prevention Methods Applied for RM1
[Figure, first panel: Query Drift Prevention Methods Applied for RM1 - MAP (y axis: MAP; x axis: corpus, TREC, ROBUST, WSJ, SJMN, AP; series: RM1, Interpolation, combmnz, rerank, rev_rerank, RM3). Second panel: Query Drift Prevention Methods Applied for RM1 - Robustness (y axis: <Init; same corpora and series). “i” and “e” indicate statistically significant MAP differences with the initial ranking and RM1, respectively.]
Robustness of Expansion Methods w/o Combmnz
[Figure, first panel: Robustness of Expansion Methods (y axis: <Init; x axis: corpus, TREC, ROBUST, WSJ, SJMN, AP; series: RM1, RM3, Rocchio1, Rocchio3). Second panel: Robustness of Combmnz Applied for Expansion Methods (same axes; series: RM1 combmnz, RM3 combmnz, Rocchio1 combmnz, Rocchio3 combmnz).]
Robustness Improvement Due to Combmnz
[Figure: Robustness Improvement Due to Combmnz (y axis: % Improvement; x axis: corpus, TREC, ROBUST, WSJ, SJMN, AP, AVERAGE; series: RM3 combmnz, RM1 combmnz, Rocchio3 combmnz, Rocchio1 combmnz).]
RM3 - The Impact of λ on Effectiveness and Robustness
$$p_{RM3}(w) \stackrel{\mathrm{def}}{=} \lambda\, p_q(w) + (1-\lambda)\, p_{RM1}(w)$$
Comparison with a Cluster-Based Re-sampling Method (Lee et al. 08’)
Method      | TREC MAP | TREC <Init | ROBUST MAP | ROBUST <Init | WSJ MAP | WSJ <Init | SJMN MAP | SJMN <Init | AP MAP | AP <Init
RM3         | 20       | 28.7       | 30         | 28.1         | 34.8    | 20        | 24.6     | 29         | 29.1   | 28.3
RM3 combmnz | 17.9     | 16.7       | 27.1       | 19.3         | 30.7    | 18        | 21.6     | 23         | 26.5   | 16.2
RM3 rerank  | 16.9     | 22.7       | 25.5       | 15.3         | 28.4    | 14        | 19.9     | 11         | 25.1   | 12.1
Clusters    | 19.8     | 31.3       | 29.9       | 32.9         | 32.7    | 24        | 25       | 31         | 29.4   | 28.3
Related Work
Improving Robustness
Selecting, sampling, and weighting documents from the initial search (e.g., Billerbeck and Zobel 03’, Li and Croft 05’, Tao and Zhai 06’, Collins-Thompson and Callan 07’)
Selecting and weighting terms (Mitra et al. 98’, Carpineto et al. 01’, Cao et al. 08’)
Related Work - Cont.
Using clustering (Lu et al. 97’, Buckley et al. 98’, Lee et al. 08’)
Predicting whether a given expanded query will be more effective than the original one (Cronen-Townsend et al. 04’)
Predicting which expansion form from a set of candidates will perform best (Winaver et al. 07’)
Query anchoring at the model level (Zhai and Lafferty 01’, Abdul-Jaleel et al. 04’)
Summary
Fusion can potentially ameliorate query drift (similarity-based vs. rank-based)
Trade-off between effectiveness and robustness
Pre-retrieval vs. post-retrieval query anchoring
Questions?
Thank you for your time