2008 © ChengXiang Zhai China-US-France Summer School, Lotus Hill Inst., 2008 1
Formal Retrieval Frameworks
ChengXiang Zhai (翟成祥), Department of Computer Science
Graduate School of Library & Information Science
Institute for Genomic Biology, Statistics
University of Illinois, Urbana-Champaign
http://www-faculty.cs.uiuc.edu/~czhai, czhai@cs.uiuc.edu
Outline
• Risk Minimization Framework [Lafferty & Zhai 01, Zhai & Lafferty 06]
• Axiomatic Retrieval Framework [Fang et al. 04, Fang & Zhai 05, Fang & Zhai 06]
Risk Minimization Framework
Risk Minimization: Motivation
• Long-standing IR Challenges
– Improve IR theory
• Develop theoretically sound and empirically effective models
• Go beyond the limited traditional notion of relevance (independent, topical relevance)
– Improve IR practice
• Optimize retrieval parameters automatically
• Statistical language models (SLMs) are very promising tools …
– How can we systematically exploit SLMs in IR?
– Can SLMs offer anything hard/impossible to achieve in traditional IR?
Long-Standing IR Challenges
• Limitations of traditional IR models
– Strong assumptions on “relevance”
• Independent relevance
• Topical relevance
– Can we go beyond this traditional notion of relevance?
• Difficulty in IR practice
– Ad hoc parameter tuning
– Can’t go beyond “retrieval” to support info. access in general
More Than “Relevance”
[Figure: a pure relevance ranking vs. the desired ranking, which also accounts for redundancy and readability]
Retrieval Parameters
• Retrieval parameters are needed to
– model different user preferences
– customize a retrieval model according to different queries and documents
• So far, parameters have been set through empirical experimentation
• Can we set parameters automatically?
Systematic Applications of Language Models to IR
• Many different variants of language models have been developed, but are there many more models to be studied?
• Can we establish a road map for exploring language models in IR?
Two Main Ideas of the Risk Minimization Framework
• Retrieval as a decision process
• Systematic language modeling
Idea 1: Retrieval as Decision-Making (A More General Notion of Relevance)
Given a query,
- Which documents should be selected? (D)
- How should these docs be presented to the user? (π)
Choose: (D, π)
[Figure: presentation choices: a ranked list? an unordered subset? clustering?]
Idea 2: Systematic Language Modeling
Documents → Document Language Models (DOC MODELING)
Query → Query Language Model (QUERY MODELING)
User → Loss Function (USER MODELING)
→ Retrieval Decision: ?
Generative Model of Document & Query [Lafferty & Zhai 01b]
User U (partially observed) → query model θ_Q: p(θ_Q | U) → observed query q: p(q | θ_Q, U)
Source S (partially observed) → document model θ_D: p(θ_D | S) → observed document d: p(d | θ_D, S)
Relevance R (inferred): p(R | θ_Q, θ_D)
Applying Bayesian Decision Theory [Lafferty & Zhai 01b, Zhai 02, Zhai & Lafferty 06]
Observed: query q, user U, doc set C, source S. Hidden: θ.
Possible choices: (D_1, π_1), (D_2, π_2), …, (D_n, π_n), each incurring a loss L(D, π, θ).
The Bayes risk of a choice (D, π) is its expected loss; RISK MINIMIZATION selects
(D*, π*) = argmin_{D,π} ∫ L(D, π, θ) p(θ | q, U, C, S) dθ
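The decision rule above can be sketched with a tiny discrete example (all names and numbers here are illustrative, not from the slides): enumerate the candidate actions, compute each action's Bayes risk under the posterior over the hidden state, and pick the minimizer.

```python
# Minimal sketch of risk minimization over a discrete parameter space.
# The loss table and posterior are hypothetical, chosen only to illustrate.

def bayes_risk(loss, posterior):
    """Expected loss of one action: sum over theta of L(action, theta) * p(theta | obs)."""
    return sum(loss[theta] * p for theta, p in posterior.items())

def min_risk_action(actions, posterior):
    """Pick the action with the smallest Bayes risk."""
    return min(actions, key=lambda a: bayes_risk(actions[a], posterior))

# Posterior over a hidden state theta given (q, U, C, S), e.g. "relevant" vs not.
posterior = {"rel": 0.7, "nonrel": 0.3}
# Loss of each presentation choice under each value of theta.
actions = {
    "show": {"rel": 0.0, "nonrel": 1.0},   # showing a non-relevant doc costs 1
    "skip": {"rel": 1.0, "nonrel": 0.0},   # skipping a relevant doc costs 1
}
best = min_risk_action(actions, posterior)  # risk 0.3 for "show" vs 0.7 for "skip"
```

With a richer action space (rankings, clusterings) the same rule applies; only the loss function changes.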
Benefits of the Framework
• Systematic exploration of retrieval models (covering almost all the existing retrieval models as special cases)
• Derive general retrieval principles (risk ranking principle)
• Automatic parameter setting
• Go beyond independent-relevance (subtopic retrieval)
Special Cases of Risk Minimization
• Set-based models (choose D) → Boolean model
• Ranking models (choose π)
  – Independent loss
    • Relevance-based loss → probabilistic relevance model / generative relevance theory, two-stage LM
    • Distance-based loss → vector-space model, KL-divergence model
  – Dependent loss
    • MMR loss → subtopic retrieval model
    • MDR loss → subtopic retrieval model
Case 1: Two-stage Language Models
Generative model: p(θ_Q | U), q ~ p(q | θ_Q, U); p(θ_D | S), d ~ p(d | θ_D, S)
Loss function: l(d, θ_Q, θ_D) = 0 if θ_Q = θ_D, and c otherwise
Risk ranking formula:
R(d; q, U) ≈ p(θ̂_D | d, S) · p(q | θ̂_D, U) =rank p(q | θ̂_D, U),
where p(q | θ̂_D, U) = Π_i [ (1−λ) p(q_i | θ̂_D) + λ p(q_i | U) ]
Stage 1: compute θ̂_D (Dirichlet prior smoothing)
Stage 2: compute p(q | θ̂_D, U) (mixture model)
→ two-stage smoothing
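The two stages above can be sketched in a few lines (toy vocabulary and counts, not from the slides; the user model is approximated here by the collection model, which is one common assumption):

```python
import math

# Sketch of two-stage smoothing scoring. Stage 1: Dirichlet-prior smoothing
# of the document model. Stage 2: mix with a user/background query model.
# All counts and probabilities below are illustrative.

def dirichlet_model(doc_counts, coll_prob, mu=2000.0):
    """Stage 1: smoothed document model p(w | theta_d)."""
    dlen = sum(doc_counts.values())
    return lambda w: (doc_counts.get(w, 0) + mu * coll_prob[w]) / (dlen + mu)

def two_stage_score(query, doc_counts, coll_prob, mu=2000.0, lam=0.5):
    """log p(q | d): Stage 2 mixes the smoothed doc model with p(w | U) ~ p(w | C)."""
    p_d = dirichlet_model(doc_counts, coll_prob, mu)
    return sum(math.log((1 - lam) * p_d(w) + lam * coll_prob[w]) for w in query)

coll_prob = {"robot": 0.01, "arm": 0.02, "the": 0.2}
d1 = {"robot": 5, "arm": 1, "the": 10}
d2 = {"the": 16}
q = ["robot", "arm"]
assert two_stage_score(q, d1, coll_prob) > two_stage_score(q, d2, coll_prob)
```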
Case 2: KL-divergence Retrieval Models
Generative model: as above (p(θ_Q | U), q; p(θ_D | S), d)
Loss function: l(d, θ_Q, θ_D) = c · Δ(θ_Q, θ_D), with Δ(θ_Q, θ_D) = D(θ_Q || θ_D)
Risk ranking formula: R(d; q) =rank −D(θ̂_Q || θ̂_D)
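Ranking by −D(θ̂_Q || θ̂_D) can be sketched directly (toy models, not from the slides): at ranking time the query-model entropy is constant across documents, so only the cross-entropy part matters.

```python
import math

# Sketch of KL-divergence ranking. Score a document by
# sum_w p(w | theta_Q) * log p(w | theta_D), which is rank-equivalent to
# -D(theta_Q || theta_D). The distributions below are illustrative and
# the doc models are assumed already smoothed (no zero probabilities).

def kl_score(query_model, doc_model):
    return sum(p_q * math.log(doc_model[w])
               for w, p_q in query_model.items() if p_q > 0)

theta_q = {"robot": 0.6, "application": 0.4}
theta_d1 = {"robot": 0.3, "application": 0.2, "other": 0.5}
theta_d2 = {"robot": 0.05, "application": 0.05, "other": 0.9}
assert kl_score(theta_q, theta_d1) > kl_score(theta_q, theta_d2)
```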
Case 3: Aspect Generative Model of Document & Query
User U → θ_Q = (θ_1, …, θ_k): p(θ_Q | U); query q: p(q | θ_Q, U)
Source S → θ_D = (θ_1, …, θ_k): p(θ_D | S); document d: p(d | θ_D, S)
PLSI: p(d | θ_D, S) = Π_{i=1..n} Σ_{a=1..A} p(w_i | a) p(a | θ_D), where d = w_1 … w_n
LDA: p(d | S) = ∫ [ Π_{i=1..n} Σ_{a=1..A} p(w_i | a) p(a | θ) ] Dir(θ) dθ
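The PLSI-style likelihood above is straightforward to compute for fixed aspect models (the aspect-word distributions and mixing weights below are toy values, not from the slides; the LDA integral is omitted):

```python
import math

# Sketch of the PLSI document likelihood:
# log p(d | theta) = sum_i log( sum_a p(w_i | a) p(a | theta) ).

def plsi_log_likelihood(doc, aspect_word, theta):
    ll = 0.0
    for w in doc:
        # small floor stands in for smoothing of unseen words
        ll += math.log(sum(aspect_word[a].get(w, 1e-12) * theta[a] for a in theta))
    return ll

aspect_word = {
    "robotics": {"robot": 0.5, "weld": 0.3, "arm": 0.2},
    "finance":  {"stock": 0.6, "bank": 0.4},
}
doc = ["robot", "weld", "robot"]
better = plsi_log_likelihood(doc, aspect_word, {"robotics": 0.8, "finance": 0.2})
worse = plsi_log_likelihood(doc, aspect_word, {"robotics": 0.1, "finance": 0.9})
assert better > worse   # the robotics-heavy mixture explains the doc better
```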
Optimal Ranking for Independent Loss
Decision space = {rankings}. Assume sequential browsing and an independent loss:
L(π, θ) = Σ_{j=1..N} s_j · l(θ_{π_j})   (s_j: a position weight)
π* = argmin_π ∫ L(π, θ) p(θ | q, U, C, S) dθ
Independent risk = independent scoring: the optimal π* ranks documents in ascending order of the expected risk
r(d | q, U, C, S) = ∫ l(θ) p(θ | d, q, U, C, S) dθ
(the “risk ranking principle” [Zhai 02])
Automatic Parameter Tuning
• Retrieval parameters are needed to
– model different user preferences
– customize a retrieval model to specific queries and documents
• Retrieval parameters in traditional models
– EXTERNAL to the model, hard to interpret
– Parameters are introduced heuristically to implement “intuition”
– No principles to quantify them, must set empirically through many experiments
– Still no guarantee for new queries/documents
• Language models make it possible to estimate parameters…
The Way to Automatic Tuning ...
• Parameters must be PART of the model!
– Query modeling (explain difference in query)
– Document modeling (explain difference in doc)
• De-couple the influence of a query on parameter setting from that of documents
– To achieve stable setting of parameters
– To pre-compute query-independent parameters
Parameter Setting in Risk Minimization
Query → Query Language Model: estimate query model parameters
Documents → Document Language Models: estimate doc model parameters
User → Loss Function: set user model parameters
Generative Relevance Hypothesis [Lavrenko 04]
• Generative Relevance Hypothesis:
  – For a given information need, queries expressing that need and documents relevant to that need can be viewed as independent random samples from the same underlying generative model
• A special case of risk minimization when document models and query models are in the same space
• Implications for retrieval models: “the same underlying generative model” makes it possible to
  – Match queries and documents even if they are in different languages or media
  – Estimate/improve a relevant document model based on example queries or vice versa
Risk minimization can easily go beyond independent relevance…
Aspect Retrieval
Query: What are the applications of robotics in the world today?
Find as many DIFFERENT applications as possible.
Example Aspects:
A1: spot-welding robotics
A2: controlling inventory
A3: pipe-laying robots
A4: talking robot
A5: robots for loading & unloading memory tapes
A6: robot [telephone] operators
A7: robot cranes
…
Aspect judgments (documents × aspects A1 A2 A3 … Ak):
d1: 1 1 0 0 … 0 0
d2: 0 1 1 1 … 0 0
d3: 0 0 0 0 … 1 0
…
dk: 1 0 1 0 … 0 1
Must go beyond independent relevance!
Evaluation Measures
• Aspect Coverage (AC): measures per-doc coverage
  – #distinct-aspects / #docs
  – Equivalent to the “set cover” problem, NP-hard
• Aspect Uniqueness (AU): measures redundancy
  – #distinct-aspects / #aspects
  – Equivalent to the “volume cover” problem, NP-hard
• Example (accumulated counts over a ranking d1, d2, d3 with aspect vectors 0001001, 0101100, 1000101):
  #doc:       1        2        3      …
  #asp:       2        5        8      …
  #uniq-asp:  2        4        5
  AC:         2/1=2.0  4/2=2.0  5/3=1.67
  AU:         2/2=1.0  4/5=0.8  5/8=0.625
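The accumulated AC/AU computation above is easy to mechanize (the judgments here are a made-up example, with each doc given as a set of aspect ids):

```python
# Sketch of Aspect Coverage (AC) and Aspect Uniqueness (AU) at each rank.

def ac_au(ranked_docs):
    """Return a list of (AC, AU) after each rank position."""
    seen, total_aspects, out = set(), 0, []
    for i, aspects in enumerate(ranked_docs, start=1):
        total_aspects += len(aspects)          # all aspect occurrences so far
        seen |= set(aspects)                   # distinct aspects so far
        ac = len(seen) / i                     # #distinct-aspects / #docs
        au = len(seen) / total_aspects         # #distinct-aspects / #aspects
        out.append((ac, au))
    return out

ranked = [{1, 2}, {2, 3, 4}, {1, 5}]
res = ac_au(ranked)
assert res[:2] == [(2.0, 1.0), (2.0, 0.8)]
```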
Dependent Relevance Ranking
• In general, the computation of the optimal ranking is NP-hard
• A general greedy algorithm
– Pick the first document according to INDEPENDENT relevance
– Given that we have picked k documents, evaluate the CONDITIONAL relevance of each candidate document
– Choose the document that has the highest conditional relevance value
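The general greedy algorithm above can be sketched as follows; `conditional_score` is a hypothetical hook into which any MMR/MDR-style conditional relevance function could be plugged (the toy scorer here just penalizes already-covered aspects):

```python
# Sketch of greedy ranking under dependent relevance.

def greedy_rerank(candidates, conditional_score, k):
    """Pick k docs; each pick maximizes relevance conditioned on earlier picks."""
    picked, pool = [], list(candidates)
    while pool and len(picked) < k:
        best = max(pool, key=lambda d: conditional_score(d, picked))
        picked.append(best)
        pool.remove(best)
    return picked

# Illustrative data: aspect sets and independent-relevance scores per doc.
docs = {"d1": {1, 2}, "d2": {1, 2, 3}, "d3": {4}}
rel = {"d1": 0.9, "d2": 0.8, "d3": 0.5}

def score(d, picked):
    covered = set().union(*(docs[p] for p in picked)) if picked else set()
    return rel[d] - 0.3 * len(docs[d] & covered)   # penalty per covered aspect

assert greedy_rerank(docs, score, 2) == ["d1", "d3"]
```

Note how the second pick is `d3`, not the more relevant but redundant `d2`.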
Loss Function L(θ_{k+1} | θ_1 … θ_k)
Given already-ranked docs d_1 … d_k (with known models θ_1 … θ_k), which d_{k+1} comes next?
• Maximal Marginal Relevance (MMR): the best d_{k+1} is novel & relevant
  – Relevance: Rel(θ_{k+1}); Novelty/Redundancy: Nov(θ_{k+1} | θ_1 … θ_k)
• Maximal Diverse Relevance (MDR): the best d_{k+1} is complementary in coverage
  – Aspect coverage distributions p(a | θ_i)
Maximal Marginal Relevance (MMR) Models
• Maximizing aspect coverage indirectly through redundancy elimination
• Conditional-Rel. = novel + relevant
• Elements
– Redundancy/Novelty measure
– Combination of novelty and relevance
A Mixture Model for Redundancy
A new document’s words are generated from a two-component mixture:
  P(w | Background) (collection), with weight 1−λ
  P(w | Old) (reference document), with weight λ
λ = ? → Maximum Likelihood estimate via Expectation-Maximization
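The EM estimate of λ for this two-component mixture can be sketched directly (toy distributions, not from the slides): a document well explained by the reference (“Old”) model gets a high λ, i.e. it is largely redundant.

```python
# Sketch of EM for the redundancy mixture: each token of the new document is
# generated from p(w|Old) with probability lam, else from p(w|Background).

def em_lambda(doc, p_old, p_bg, iters=50):
    lam = 0.5
    for _ in range(iters):
        # E-step: posterior that each token came from the Old (reference) model
        z = [lam * p_old[w] / (lam * p_old[w] + (1 - lam) * p_bg[w]) for w in doc]
        # M-step: lam = expected fraction of tokens drawn from the Old model
        lam = sum(z) / len(z)
    return lam

p_old = {"robot": 0.5, "weld": 0.4, "the": 0.1}   # reference document model
p_bg = {"robot": 0.05, "weld": 0.05, "the": 0.9}  # collection model
redundant = ["robot", "weld", "robot"]            # mostly explained by Old
novel = ["the", "the", "the"]                     # mostly background
assert em_lambda(redundant, p_old, p_bg) > em_lambda(novel, p_old, p_bg)
```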
Cost-based Combination of Relevance and Novelty
l(d_{k+1} | q, d_1 … d_k, {θ_i}) = c_2 · p(Rel | θ_{k+1})(1 − p(New | θ_{k+1})) + c_3 · (1 − p(Rel | θ_{k+1}))
=rank p(Rel | θ_{k+1}) · (ρ + p(New | θ_{k+1})),  where ρ = (c_3 − c_2)/c_2 and c_3 > c_2 > 0
≈rank p(q | θ̂_{k+1}) · (ρ + p(New | θ̂_{k+1}))
      [relevance score]  [novelty score]
Maximal Diverse Relevance (MDR) Models
• Maximizing aspect coverage directly through aspect modeling
• Conditional-rel. = complementary coverage
• Elements
– Aspect loss function
– Generative Aspect Model
Aspect Generative Model of Document & Query
User U → θ_Q = (θ_1, …, θ_k): p(θ_Q | U); query q: p(q | θ_Q, U)
Source S → θ_D = (θ_1, …, θ_k): p(θ_D | S); document d: p(d | θ_D, S)
PLSI: p(d | θ_D, S) = Π_{i=1..n} Σ_{a=1..A} p(w_i | a) p(a | θ_D), where d = w_1 … w_n
LDA: p(d | S) = ∫ [ Π_{i=1..n} Σ_{a=1..A} p(w_i | a) p(a | θ) ] Dir(θ) dθ
Aspect Loss Function
l(d_{k+1} | q, d_1 … d_k, {θ_i}) = Δ(θ_Q, θ_{1,…,k+1}) = D(θ_Q || θ_{1,…,k+1})
where p(a | θ_{1,…,k+1}) = (1 − λ) · (1/k) Σ_{i=1..k} p(a | θ_i) + λ · p(a | θ_{k+1})
(generative model as before: p(θ_Q | U), q; p(θ_D | S), d)
Risk ranking: prefer the d_{k+1} minimizing D(θ̂_Q || θ̂_{1,…,k+1})
Aspect Loss Function: Illustration
Desired coverage: p(a | Q)
“Already covered”: p(a | θ_1) … p(a | θ_{k−1})
New candidate: p(a | θ_k), which may be non-relevant, redundant, or perfect
Combined coverage:
p(a | θ_{1,…,k}) = (1 − λ) · (1/(k−1)) Σ_{i=1..k−1} p(a | θ_i) + λ · p(a | θ_k)
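The selection step this loss induces can be sketched over two aspects (all distributions here are toy values): combine the already-covered distribution with each candidate and prefer the candidate whose combined coverage is closest, in KL divergence, to the desired p(a|Q).

```python
import math

# Sketch of the aspect-loss selection step.

def kl(p, q):
    """D(p || q) over a shared aspect set."""
    return sum(pa * math.log(pa / q[a]) for a, pa in p.items() if pa > 0)

def combined(covered_avg, candidate, lam=0.5):
    """Combined coverage: mix the average of covered docs with the candidate."""
    return {a: (1 - lam) * covered_avg[a] + lam * candidate[a] for a in covered_avg}

desired = {"A1": 0.5, "A2": 0.5}                 # p(a | Q)
covered_avg = {"A1": 0.9, "A2": 0.1}             # average of p(a | theta_1..k-1)
cand_redundant = {"A1": 0.9, "A2": 0.1}          # covers what is already covered
cand_complement = {"A1": 0.1, "A2": 0.9}         # complementary coverage
loss_r = kl(desired, combined(covered_avg, cand_redundant))
loss_c = kl(desired, combined(covered_avg, cand_complement))
assert loss_c < loss_r   # the complementary candidate is preferred
```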
Risk Minimization: Summary
• Risk minimization is a general probabilistic retrieval framework
– Retrieval as a decision problem (=risk min.)
– Separate/flexible language models for queries and docs
• Advantages
– A unified framework for existing models
– Automatic parameter tuning due to LMs
– Allows for modeling complex retrieval tasks
• Lots of potential for exploring LMs…
• For more information, see [Zhai 02]
Future Research Directions
• Modeling latent structures of documents
– Introduce source structures (naturally suggest structure-based smoothing methods)
• Modeling multiple queries and clickthroughs of the same user
– Let the observation include multiple queries and clickthroughs
• Collaborative search
– Introduce latent interest variables to tie similar users together
• Modeling interactive search
Axiomatic Retrieval Framework
Most of the following slides are from Hui Fang’s presentation
Traditional Way of Modeling the Relevance
Query → QRep; Document → DRep → Relevance?
Rel ≈ Sim(DRep, QRep): Vector Space Models [Salton et al. 75, Salton et al. 83, Salton et al. 89, Singhal 96]
Rel ≈ P(R=1 | DRep, QRep): Probabilistic Models [Fuhr et al. 92, Lafferty et al. 03, Ponte et al. 98, Robertson et al. 76, Turtle et al. 91, van Rijsbergen et al. 77]
(parameters tuned on a test collection)
• No way to predict the performance and identify the weaknesses
• Sophisticated parameter tuning
No Way to Predict the Performance
(e.g., the pivoted normalization formula)
S(Q,D) = Σ_{t∈Q∩D} c(t,Q) · log((N+1)/df(t)) · (1 + log(1 + log(c(t,D)))) / ((1−s) + s·|D|/avdl)
Sophisticated Parameter Tuning
S(Q,D) = Σ_{t∈Q∩D} log((N − df(t) + 0.5)/(df(t) + 0.5)) · ((k1 + 1)·c(t,D)) / (c(t,D) + k1·((1−b) + b·|D|/avdl)) · ((k3 + 1)·c(t,Q)) / (k3 + c(t,Q))
“k1, b and k3 are parameters which depend on the nature of the queries and possibly on the database; k1 and b default to 1.2 and 0.75 respectively, but smaller values of b are sometimes advantageous; in long queries k3 is often set to 7 or 1000.” [Robertson et al. 1999]
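The Okapi BM25 formula quoted above can be sketched directly (the collection statistics below are toy values):

```python
import math

# Sketch of Okapi BM25 with the usual default parameters.

def bm25(query, doc, N, df, avdl, k1=1.2, b=0.75, k3=1000.0):
    dlen = sum(doc.values())
    score = 0.0
    for t in set(query) & set(doc):
        qtf, tf = query.count(t), doc[t]
        idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5))
        tf_part = ((k1 + 1) * tf) / (tf + k1 * ((1 - b) + b * dlen / avdl))
        qtf_part = ((k3 + 1) * qtf) / (k3 + qtf)
        score += idf * tf_part * qtf_part
    return score

N, avdl = 1000, 10
df = {"robot": 20, "the": 900}
d1 = {"robot": 3, "the": 7}
d2 = {"the": 10}
assert bm25(["robot"], d1, N, df, avdl) > bm25(["robot"], d2, N, df, avdl)
```

Note the quoted caveat in code form: the IDF factor goes negative once df(t) > (N + 1)/2, which is exactly the weakness the axiomatic analysis later exposes.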
High Parameter Sensitivity
[Figure: retrieval performance varies sharply with the parameter value]
Hui Fang’s Thesis Work [Fang 07]
Propose a novel axiomatic framework, where relevance is directly modeled with term-based constraints
– Predict the performance of a function analytically[Fang et al., SIGIR04]
– Derive more robust and effective retrieval functions [Fang & Zhai, SIGIR05, Fang & Zhai, SIGIR06]
– Diagnose weaknesses and strengths of retrieval functions [Fang & Zhai, under review]
Traditional Way of Modeling the Relevance
Query → QRep; Document → DRep → Relevance?
Rel ≈ Sim(DRep, QRep): Vector Space Models
Rel ≈ P(R=1 | DRep, QRep): Probabilistic Models
(parameters tuned on a test collection)
Axiomatic Approach to Relevance Modeling
Query → QRep; Document → DRep → Rel(Q,D), constrained by Constraint 1, Constraint 2, …, Constraint m
(1) Predict performance
(2) Develop more robust functions
(3) Diagnose weaknesses: Collection (constraint 1), Collection (constraint 2), …, Collection (constraint m)
We are here.
Part 1: Define retrieval constraints [Fang et al. SIGIR 2004]
• Pivoted Normalization Method
S(Q,D) = Σ_{w∈Q∩D} (1 + ln(1 + ln(c(w,d)))) / ((1−s) + s·|d|/avdl) · c(w,q) · ln((N+1)/df(w))
• Dirichlet Prior Method
S(Q,D) = Σ_{w∈Q∩D} c(w,q) · ln(1 + c(w,d)/(μ·p(w|C))) + |q| · ln(μ/(μ+|d|))
• Okapi Method
S(Q,D) = Σ_{w∈Q∩D} ln((N − df(w) + 0.5)/(df(w) + 0.5)) · ((k1+1)·c(w,d)) / (c(w,d) + k1·((1−b) + b·|d|/avdl)) · ((k3+1)·c(w,q)) / (k3 + c(w,q))
All three combine term frequency (TF), inverse document frequency (IDF), and document length normalization. Empirical observations in IR also include the alternative TF transformation 1 + ln(c(w,d)) and parameter sensitivity.
Research Questions
• How can we formally characterize these necessary retrieval heuristics?
• Can we predict the empirical behavior of a method without experimentation?
Term Frequency Constraints (TFC1)
TF weighting heuristic I: Give a higher score to a document with more occurrences of a query term.
• TFC1: Let q be a query with only one term w.
  If |d1| = |d2| and c(w,d1) > c(w,d2),
  then f(d1,q) > f(d2,q).
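A constraint like TFC1 can also be checked empirically against a candidate scoring function by generating the document pairs the constraint quantifies over (a brute-force sketch; the scorer `f` below is a made-up illustration, not one of the slides' formulas):

```python
import math

# Illustrative scorer: damped TF, normalized by document length.
def f(doc, q):
    w = q[0]
    return math.log(1 + doc.count(w)) / len(doc)

def satisfies_tfc1(score_fn, w="w", filler="x", length=10):
    """Check TFC1 on all equal-length doc pairs up to the given length."""
    for c1 in range(1, length):
        for c2 in range(c1 + 1, length + 1):
            d1 = [w] * c2 + [filler] * (length - c2)   # more occurrences of w
            d2 = [w] * c1 + [filler] * (length - c1)   # same length, fewer w
            if not score_fn(d1, [w]) > score_fn(d2, [w]):
                return False
    return True

assert satisfies_tfc1(f)
```

The same scaffolding, with different generated pairs, checks the remaining constraints.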
Term Frequency Constraints (TFC2)
TF weighting heuristic II: Favor a document with more distinct query terms.
• TFC2: Let q be a query and w1, w2 be two query terms.
  Assume idf(w1) = idf(w2) and |d1| = |d2|.
  If c(w1,d2) = c(w1,d1) + c(w2,d1),
  with c(w2,d2) = 0, c(w1,d1) > 0, c(w2,d1) > 0,
  then f(d1,q) > f(d2,q).
Term Discrimination Constraint (TDC)
IDF weighting heuristic: Penalize the words popular in the collection; give higher weights to discriminative terms.
Query: SVM Tutorial. Assume IDF(SVM) > IDF(Tutorial).
Doc 1 contains more occurrences of the more discriminative term “SVM”; Doc 2 contains more occurrences of “Tutorial”.
Then f(Doc 1) > f(Doc 2).
Term Discrimination Constraint (Cont.)
• TDC: Let q be a query and w1, w2 be two query terms.
  Assume |d1| = |d2| and idf(w1) > idf(w2).
  If c(w1,d1) > c(w1,d2),
  c(w1,d1) + c(w2,d1) = c(w1,d2) + c(w2,d2),
  and c(w,d1) = c(w,d2) for all other words w,
  then f(d1,q) > f(d2,q).
Length Normalization Constraints (LNCs)
Document length normalization heuristic: Penalize long documents (LNC1); avoid over-penalizing long documents (LNC2).
• LNC1: Let q be a query.
  If for some word w′ ∉ q, c(w′,d2) = c(w′,d1) + 1,
  but for other words w, c(w,d2) = c(w,d1),
  then f(d1,q) ≥ f(d2,q).
• LNC2: Let q be a query and k > 1.
  If |d1| = k·|d2| and c(w,d1) = k·c(w,d2) for all words w,
  then f(d1,q) ≥ f(d2,q).
TF-LENGTH Constraint (TF-LNC)
TF-LN heuristic: Regularize the interaction of TF and document length.
• TF-LNC: Let q be a query with only one term w.
  If c(w,d1) > c(w,d2) and |d1| = |d2| + c(w,d1) − c(w,d2),
  then f(d1,q) > f(d2,q).
Analytical Evaluation
Retrieval Formula TFCs TDC LNC1 LNC2 TF-LNC
Pivoted Norm. Yes Conditional Yes Conditional Conditional
Dirichlet Prior Yes Conditional Yes Conditional Yes
Okapi (original) Conditional Conditional Conditional Conditional Conditional
Okapi (modified) Yes Conditional Yes Yes Yes
Term Discrimination Constraint (TDC)
IDF weighting heuristic: Penalize the words popular in the collection; give higher weights to discriminative terms.
Query: SVM Tutorial. Assume IDF(SVM) > IDF(Tutorial).
Doc 1: … SVM SVM SVM Tutorial Tutorial …
Doc 2: … Tutorial SVM SVM Tutorial Tutorial …
f(Doc 1) > f(Doc 2)
Benefits of Constraint Analysis
• Provide an approximate bound for the parameters
– A constraint may be satisfied only if the parameter is within a particular interval.
• Compare different formulas analytically without experimentation
– When a formula does not satisfy the constraint, it often indicates non-optimality of the formula.
• Suggest how to improve the current retrieval models
– Violation of constraints may pinpoint where a formula needs to be improved.
Benefits 1: Bounding Parameters
• Pivoted Normalization Method: LNC2 ⇒ s < 0.4
[Figure: average precision vs. s; the optimal s (for average precision) lies below 0.4]
Benefits 2: Analytical Comparison
• Okapi Method: its IDF factor ln((N − df(w) + 0.5)/(df(w) + 0.5)) is negative when df(w) is large, which violates many constraints.
[Figure: average precision vs. s or b for Pivoted and Okapi, on a keyword query and a verbose query]
Benefits 3: Improving Retrieval Formulas
• Modified Okapi Method: replace Okapi’s IDF factor ln((N − df(w) + 0.5)/(df(w) + 0.5)) with ln((N+1)/df(w)).
Make Okapi satisfy more constraints; expected to help verbose queries.
[Figure: average precision vs. s or b for Pivoted, Okapi, and Modified Okapi, on a keyword query and a verbose query]
Axiomatic Approach to Relevance Modeling
Query → QRep; Document → DRep → Rel(Q,D), constrained by Constraint 1, Constraint 2, …, Constraint m
(1) Predict performance
(2) Develop more robust functions
(3) Diagnose weaknesses: Collection (constraint 1), Collection (constraint 2), …, Collection (constraint m)
We are here.
Part 2: Derive new retrieval functions [Fang & Zhai SIGIR05, Fang & Zhai SIGIR06]
Basic Idea of Axiomatic Approach
[Figure: a function space containing candidate functions S1, S2, S3; retrieval constraints C1, C2, C3 carve out the region containing our target function]
Function Space
D = d1, d2, …, dn; Q = q1, q2, …, qm; S: Q × D → ℝ
Define the function space inductively:
• Primitive weighting function (f): scores a one-term query against a one-term document, S(Q, D) = f(q, d)
• Query growth function (h): adding a term q to the query, S(Q ∪ {q}, D) = S(Q, D) + h(q, D, Q)
• Document growth function (g): adding a term d to the document, S(Q, D ∪ {d}) = S(Q, D) + g(d, Q, D)
(The slide illustrates these cases with one-word queries and documents such as “cat”, “dog”, and “big”.)
Derivation of New Retrieval Functions
existing function S(Q,D) → decompose into (f, g, h) → generalize to function families (F, G, H) → constrain with C1, C2, C3 to obtain (f′, g′, h′) → assemble a new function S′(Q,D)
Representative Derived Function
S(Q,D) = Σ_{t∈Q∩D} c(t,Q) · (N/df(t))^0.35 · c(t,D) / (c(t,D) + s + s·|D|/avdl)
          [QTF]       [IDF]                 [TF with length normalization]
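The derived function above is simple to implement (a sketch with toy collection statistics; the function name is the common shorthand for this derived formula, used here only as a label):

```python
# Sketch of the representative derived axiomatic function (often labeled
# F2-EXP): QTF * (N/df)^0.35 * tf/(tf + s + s*|D|/avdl).

def f2exp(query, doc, N, df, avdl, s=0.5):
    dlen = sum(doc.values())
    score = 0.0
    for t in set(query) & set(doc):
        idf = (N / df[t]) ** 0.35          # always positive, unlike Okapi's IDF
        tf_norm = doc[t] / (doc[t] + s + s * dlen / avdl)
        score += query.count(t) * idf * tf_norm
    return score

N, avdl = 1000, 10
df = {"robot": 20, "the": 900}
d1 = {"robot": 3, "the": 7}
d2 = {"the": 10}
assert f2exp(["robot"], d1, N, df, avdl) > f2exp(["robot"], d2, N, df, avdl)
```

Note the bounded TF factor tf/(tf + …) < 1, one source of the flatter parameter sensitivity reported next.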
The derived function is less sensitive to the parameter setting
[Figure: average precision vs. parameter value; the axiomatic model’s curve is flatter and better]
Adding Semantic Term Matching
Query: dog
“Training puppies is not always easy: it requires work. Puppies should be touched and held from birth, although only briefly and occasionally until their eyes and ears open. Otherwise the puppy may become vicious.”
“A book is a collection of paper with text, pictures, usually bound together along one edge within covers. A book is also a literary work or a main division of such a work. A book produced in electronic format is known as an e-book.”
General Approach to Semantic Term Matching
• Select semantically similar terms
• Expand the original query with the selected terms
  dog → dog 1, puppy 0.5, doggy 0.5, hound 0.5, bone 0.1
Key challenge: How to weight selected terms?
The proposed axiomatic approach provides guidance on how to weight terms appropriately.
Effectiveness of Semantic Term Matching

                                     ROBUST04           ROBUST05
                                     MAP      P@20      MAP      P@20
Syntactic term matching (baseline)   0.248    0.352     0.192    0.379
Semantic term matching               0.302    0.399     0.292    0.502
                                     (+21.8%)           (+51.0%)
Axiomatic Approach to Relevance Modeling
Query → QRep; Document → DRep → Rel(Q,D), constrained by Constraint 1, Constraint 2, …, Constraint m
(1) Predict performance
(2) Develop more robust functions
(3) Diagnose weaknesses: Collection (constraint 1), Collection (constraint 2), …, Collection (constraint m)
We are here.
Part 3: Diagnostic evaluation for IR models
[Fang & Zhai, under review]
Existing evaluation provides little explanation for the performance differences

           trec8   wt-2g   fr88-89
Pivoted    0.244   0.288   0.218
Dirichlet  0.257   0.302   0.202

A retrieval function applied to a test collection (a query such as “dog” against docs Doc1 … Docn) yields a single number, e.g. MAP = 0.25. How to diagnose weaknesses and strengths of retrieval functions?
Relevance-Preserving Perturbations
Perturb term statistics in documents while keeping relevance status.
Document scaling perturbation c_D(d, d, K): concatenate every document with itself K times.
Summary of PerturbationsSummary of Perturbations
• Relevance addition
• Noise addition
• Internal term growth
• Document scaling
• Relevant document concatenation
• Non-relevant document concatenation
• Noise deletion
• Document addition
• Document deletion
Length Scaling Test
1. Identify the aspect to be diagnosed: does the retrieval function over-penalize long documents?
2. Choose an appropriate relevance-preserving perturbation: c_D(d, d, K)
3. Perform the test and interpret the results: Dirichlet over-penalizes long documents!
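The mechanics of the test can be sketched as follows (toy documents and a made-up scorer, not the slides' experiment): apply the scaling perturbation, which leaves relevance unchanged, and check whether the ranking produced by a scoring function survives it.

```python
# Sketch of the document scaling perturbation c_D(d, d, K) and a ranking
# stability check. A length-robust function should keep the ranking stable.

def scale_doc(doc_counts, K):
    """Concatenate a document with itself K times (term counts scale by K+1)."""
    return {w: c * (K + 1) for w, c in doc_counts.items()}

def ranking(score_fn, docs, q):
    return sorted(docs, key=lambda d: score_fn(docs[d], q), reverse=True)

def length_scaling_test(score_fn, docs, q, K=4):
    before = ranking(score_fn, docs, q)
    after = ranking(score_fn, {d: scale_doc(c, K) for d, c in docs.items()}, q)
    return before == after          # True -> ranking unchanged under scaling

# Illustrative scorer: raw TF with a simple additive length penalty.
score = lambda doc, q: sum(doc.get(t, 0) for t in q) / (1 + sum(doc.values()))
docs = {"d1": {"robot": 2, "x": 2}, "d2": {"robot": 1}}
ok = length_scaling_test(score, docs, ["robot"])
```

In the real diagnostic, the comparison is over MAP on a judged collection rather than a single ranking.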
Summary of Diagnostic TestsSummary of Diagnostic Tests
• Length variation sensitivity tests
– Length variance reduction test
– Length variance amplification test
– Length scaling test
• Term noise resistance tests
– Term noise addition test
• TF-LN Balance Tests
– Single query term growth
– Majority query term growth
– All query term growth
Identifying the weaknesses makes it possible to improve the performance

        trec8                  wt2g                   fr88-89
        MAP    #RRel  P@20     MAP    #RRel  P@20     MAP    #RRel  P@20
Dir.    0.257  2838   0.397    0.302  1875   0.372    0.207  741    0.185
M.D.    0.262  2874   0.415    0.321  1930   0.395    0.224  811    0.191
Piv.    0.244  2826   0.402    0.288  1924   0.369    0.223  822    0.206
M.P.    0.256  2848   0.411    0.316  1940   0.392    0.230  867    0.202
Axiomatic Framework: Summary
• A new way of examining and developing retrieval models
• Facilitate analytical study of retrieval models
• Applicable to the development of all kinds of ranking functions
• Limitation:
– Constraints can be subjective
– Not constructive (thus must rely on other techniques to reduce the search space)
• Combined with machine learning?
Lecture 4: Key Points
• Retrieval problem can be generally formalized as a statistical decision problem
– Nicely incorporate generative models into a retrieval framework
– Serve as a road map for exploring new retrieval models
– Make it easier to model complex retrieval problems (interactive retrieval)
• Axiomatic framework makes it possible to analyze a retrieval function without experimentation
– Facilitate theoretical study of retrieval models (“impossibility theorem”?)
– Offer a general methodology for thinking about and improving retrieval models
Readings
• The risk minimization paper:
– http://sifaka.cs.uiuc.edu/czhai/riskmin.pdf
• Hui Fang’s thesis:
– http://www.cs.uiuc.edu/techreports.php?report=UIUCDCS-R-2007-2847
Discussion
• Risk minimization for multimedia retrieval
– Add generative models of images and video to the framework
– Unifying multimedia with text as a common language
• Axiomatic approaches
– Constraints for ranking multimedia information items
– Add constraints to a statistical learning framework (e.g., add constraints as prior or regularization)