Automatic Term Mismatch Diagnosis for Selective Query Expansion
Le Zhao and Jamie Callan, Language Technologies Institute, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA. Presented at SIGIR 2012, Portland, OR.
![Page 1: Automatic Term Mismatch Diagnosis for Selective Query Expansion](https://reader035.vdocument.in/reader035/viewer/2022062521/5681685e550346895ddea039/html5/thumbnails/1.jpg)
Automatic Term Mismatch Diagnosis for Selective Query Expansion
Le Zhao and Jamie Callan
Language Technologies Institute, School of Computer Science, Carnegie Mellon University
Pittsburgh, PA
@SIGIR 2012, Portland, OR
![Page 2: Automatic Term Mismatch Diagnosis for Selective Query Expansion](https://reader035.vdocument.in/reader035/viewer/2022062521/5681685e550346895ddea039/html5/thumbnails/2.jpg)
Main Points
• An important problem – term mismatch & a traditional solution
• New diagnostic intervention approach
• Simulated user studies
• Diagnosis & intervention effectiveness
![Page 3: Automatic Term Mismatch Diagnosis for Selective Query Expansion](https://reader035.vdocument.in/reader035/viewer/2022062521/5681685e550346895ddea039/html5/thumbnails/3.jpg)
Term Mismatch Problem
• Average term mismatch rate: 30-40% [Zhao10]
– Relevant docs not returned
– Web, short queries, stemmed, inlinks included
• A common cause of search failure [Harman03, Zhao10]
• Frequent user frustration [Feild10]
• Here: 50% - 300% gain in retrieval accuracy
![Page 4: Automatic Term Mismatch Diagnosis for Selective Query Expansion](https://reader035.vdocument.in/reader035/viewer/2022062521/5681685e550346895ddea039/html5/thumbnails/4.jpg)
Term Mismatch Problem
Example query (TREC 2006 Legal discovery task): approval of (cigarette company) logos on television watched by children

| Term | approval | logos | television | watched | children |
|---|---|---|---|---|---|
| Mismatch | 94% | 86% | 79% | 90% | 82% |

• Highest mismatch rate: approval (94%)
• High mismatch rate for all query terms in this query
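The mismatch rates on this slide are the complement of term recall: the share of relevant documents that do not contain the query term. A minimal sketch of that calculation follows, assuming each relevant document is available as a set of tokens. The toy documents and function names are hypothetical and are not taken from the talk.

```python
def term_recall(term, relevant_docs):
    """P(t | R): fraction of relevant documents that contain the term."""
    if not relevant_docs:
        raise ValueError("need at least one relevant document")
    hits = sum(1 for doc in relevant_docs if term in doc)
    return hits / len(relevant_docs)


def term_mismatch(term, relevant_docs):
    """Mismatch rate: fraction of relevant documents that do NOT contain the term."""
    return 1.0 - term_recall(term, relevant_docs)


# Hypothetical toy data: each relevant document is a set of (stemmed) tokens.
relevant_docs = [
    {"logos", "tv", "kid"},
    {"brand", "television", "children"},
    {"mascot", "cable", "teen"},
    {"logos", "television", "child"},
]

rates = {t: term_mismatch(t, relevant_docs)
         for t in ["logos", "television", "children"]}
for term, rate in rates.items():
    print(f"{term:<10} mismatch = {rate:.0%}")
```

In this toy set "children" appears in only one of four relevant documents, so it mismatches the other three even though they discuss the same concept with "kid", "teen", or "child".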
![Page 5: Automatic Term Mismatch Diagnosis for Selective Query Expansion](https://reader035.vdocument.in/reader035/viewer/2022062521/5681685e550346895ddea039/html5/thumbnails/5.jpg)
The Traditional Solution: Boolean Conjunctive Normal Form (CNF) Expansion
Keyword query: approval of logos on television watched by children
Manual CNF (TREC Legal track 2006):
(approval OR guideline OR strategy)
AND (logos OR promotion OR signage OR brand OR mascot OR marque OR mark)
AND (television OR TV OR cable OR network)
AND (watched OR view OR viewer)
AND (children OR child OR teen OR juvenile OR kid OR adolescent)
– Expressive & compact (1 CNF == 100s alternatives)
– Highly effective (this work: 50-300% over base keyword)
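To make the "expressive & compact" point concrete, the sketch below stores the manual CNF as a list of OR-groups, checks whether a document satisfies it under strict Boolean matching, and counts how many one-term-per-group keyword conjunctions the CNF stands for. The list-of-lists representation, the function names, and the sample document are assumptions chosen for illustration.

```python
from math import prod

# The manual CNF from the slide: OR-groups that are ANDed together.
cnf = [
    ["approval", "guideline", "strategy"],
    ["logos", "promotion", "signage", "brand", "mascot", "marque", "mark"],
    ["television", "tv", "cable", "network"],
    ["watched", "view", "viewer"],
    ["children", "child", "teen", "juvenile", "kid", "adolescent"],
]


def matches(cnf_query, doc_tokens):
    """A document matches when every OR-group has at least one term in it."""
    tokens = set(doc_tokens)
    return all(any(term in tokens for term in group) for group in cnf_query)


def count_conjunctions(cnf_query):
    """Number of keyword conjunctions covered (one term chosen per group)."""
    return prod(len(group) for group in cnf_query)


# Hypothetical document that shares no term with the original keyword query.
doc = "strategy for brand mascot shown on cable network to teen viewer".split()
keyword_only = [[group[0]] for group in cnf]

print(matches(keyword_only, doc))   # strict keyword conjunction
print(matches(cnf, doc))            # expanded CNF
print(count_conjunctions(cnf))      # product of the group sizes
```

The sample document fails the plain keyword conjunction but satisfies the CNF, which is the mismatch case that expansion is meant to recover.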
![Page 6: Automatic Term Mismatch Diagnosis for Selective Query Expansion](https://reader035.vdocument.in/reader035/viewer/2022062521/5681685e550346895ddea039/html5/thumbnails/6.jpg)
The Potential
• Query: approval logos television watched children

| Query term | Recall | Expansion terms | Recall after expansion |
|---|---|---|---|
| approval | 6.49% | +guideline +strategy | 12.8% |
| logos | 14.1% | +promotion +signage... | 19.7% |
| television | 21.3% | +tv +cable +network | 22.4% |
| watched | 10.4% | +view +viewer | 19.5% |
| children | 18.0% | +child +teen +kid... | 19.3% |
| Overall | 2.04% | | 8.74% |

Potential gain from expansion: 50-300%
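The recall figures on this slide can be turned into relative gains with one line of arithmetic, which shows which terms benefit most from expansion. The sketch below copies the slide's numbers. Note that it measures recall, not the MAP-style retrieval accuracy quoted elsewhere in the talk, and the variable names are illustrative.

```python
# (recall before, recall after) in percent, copied from the slide.
recall = {
    "approval": (6.49, 12.8),
    "logos": (14.1, 19.7),
    "television": (21.3, 22.4),
    "watched": (10.4, 19.5),
    "children": (18.0, 19.3),
    "Overall": (2.04, 8.74),
}


def relative_gain(before, after):
    """Relative improvement in percent: (after - before) / before."""
    return (after - before) / before * 100.0


gains = {term: relative_gain(b, a) for term, (b, a) in recall.items()}
for term, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{term:<10} {gain:6.1f}%")
```

The terms with the lowest starting recall (approval, watched) gain the most, while television and children barely move, which is the motivation for expanding selectively rather than uniformly.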
![Page 7: Automatic Term Mismatch Diagnosis for Selective Query Expansion](https://reader035.vdocument.in/reader035/viewer/2022062521/5681685e550346895ddea039/html5/thumbnails/7.jpg)
CNF Expansion
• Widely used in practice
– Librarians [Lancaster68, Harter86]
– Lawyers [Lawlor62, Blair85, Baron07]
– Search experts [Clarke95, Hearst96, Mitra98]
• Less well studied in research
– Users do not create effective free form Boolean queries ([Hearst09] cites many studies).
• Question: How to guide user effort in productive directions
– restricting to CNF expansion (to the mismatch problem)
– focusing on problem terms when expanding
Ad: WikiQuery [Open Source IR Workshop]
![Page 8: Automatic Term Mismatch Diagnosis for Selective Query Expansion](https://reader035.vdocument.in/reader035/viewer/2022062521/5681685e550346895ddea039/html5/thumbnails/8.jpg)
Main Points
• An important problem – term mismatch & a traditional solution
• New diagnostic intervention approach
• Simulated user studies
• Diagnosis & intervention effectiveness
![Page 9: Automatic Term Mismatch Diagnosis for Selective Query Expansion](https://reader035.vdocument.in/reader035/viewer/2022062521/5681685e550346895ddea039/html5/thumbnails/9.jpg)
Diagnostic Intervention
• Goal
– Least amount of user effort → near optimal performance
– E.g. expand 2 terms → 90% of total improvement
Query: approval of logos on television watched by children
Diagnosis: select the problem terms, either high idf (rare) terms or low P(t | R) terms
Expansion: CNF, expanding only the selected terms. The two diagnoses select different terms, e.g.
– (approval OR guideline OR strategy) AND logos AND television AND (watch OR view OR viewer) AND children
– (approval OR guideline OR strategy) AND logos AND (television OR tv OR cable OR network) AND watch AND children
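The diagnose-then-expand step on this slide can be sketched as a small function that expands only the terms a diagnosis selects and leaves the rest as single keywords. The function names and the dictionary of expansion terms are assumptions for illustration. The expansion terms themselves are the ones shown on the slide.

```python
def selective_cnf(query_terms, expansions, problem_terms):
    """Build a CNF that expands only the diagnosed problem terms."""
    groups = []
    for term in query_terms:
        if term in problem_terms:
            groups.append([term] + expansions.get(term, []))
        else:
            groups.append([term])
    return groups


def cnf_to_string(cnf):
    """Render OR-groups joined by AND, parenthesising multi-term groups."""
    parts = []
    for group in cnf:
        if len(group) > 1:
            parts.append("(" + " OR ".join(group) + ")")
        else:
            parts.append(group[0])
    return " AND ".join(parts)


query_terms = ["approval", "logos", "television", "watch", "children"]
expansions = {
    "approval": ["guideline", "strategy"],
    "television": ["tv", "cable", "network"],
    "watch": ["view", "viewer"],
}

first = cnf_to_string(selective_cnf(query_terms, expansions, {"approval", "watch"}))
second = cnf_to_string(selective_cnf(query_terms, expansions, {"approval", "television"}))
print(first)
print(second)
```

Passing a different set of problem terms is the only change needed to switch between diagnosis methods, so the expansion step stays independent of how the terms were chosen.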
![Page 10: Automatic Term Mismatch Diagnosis for Selective Query Expansion](https://reader035.vdocument.in/reader035/viewer/2022062521/5681685e550346895ddea039/html5/thumbnails/10.jpg)
Diagnostic Intervention
• Goal
– Least amount of user effort → near optimal performance
– E.g. expand 2 terms → 90% of total improvement
Query: approval of logos on television watched by children
Diagnosis: select the problem terms, either high idf (rare) terms or low P(t | R) terms
Expansion: bag of word, a weighted combination of the original query (bag of word) and an expansion query, e.g.
– [ 0.9 (approval logos television watch children) 0.1 (0.4 guideline 0.3 strategy 0.5 view 0.4 viewer) ]
– [ 0.9 (approval logos television watch children) 0.1 (0.4 guideline 0.3 strategy 0.5 tv 0.4 cable 0.2 network) ]
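The bag-of-word expansion on this slide has a two-part shape: the original query with a high weight and a weighted list of expansion terms with a low weight. The sketch below only formats such a query string. The function name is hypothetical, the output is not tied to any particular engine's syntax, and the slide does not say how the per-term weights were obtained, so they are passed in as given.

```python
def bag_of_word_expansion(original_terms, weighted_expansions, orig_weight=0.9):
    """Format a two-part weighted query: original terms plus weighted expansion terms."""
    # Round to avoid floating-point noise such as 0.09999999999999998.
    exp_weight = round(1.0 - orig_weight, 10)
    original = " ".join(original_terms)
    expansion = " ".join(f"{weight:g} {term}" for term, weight in weighted_expansions)
    return f"[ {orig_weight:g} ({original}) {exp_weight:g} ({expansion}) ]"


original_terms = ["approval", "logos", "television", "watch", "children"]
# Weights copied from the slide's first example.
weighted_expansions = [
    ("guideline", 0.4),
    ("strategy", 0.3),
    ("view", 0.5),
    ("viewer", 0.4),
]

query = bag_of_word_expansion(original_terms, weighted_expansions)
print(query)
```

Unlike the CNF form, the expansion terms here are not attached to the query term they stand in for, which is one reason balanced expansion matters more for bag of word.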
![Page 11: Automatic Term Mismatch Diagnosis for Selective Query Expansion](https://reader035.vdocument.in/reader035/viewer/2022062521/5681685e550346895ddea039/html5/thumbnails/11.jpg)
Diagnostic Intervention
• Diagnosis methods
– Baseline: rareness (high idf)
– High predicted term mismatch P(t̄ | R), i.e. 1 - P(t | R) [Zhao10]
• Intervention methods
– Baseline: bag of word (Relevance Model [Lavrenko01])
  • w/ manual expansion terms
  • w/ automatic expansion terms
– CNF expansion (probabilistic Boolean ranking)
  • E.g. (approval OR guideline OR strategy) ANDP logos ANDP television ANDP (watch OR view OR viewer) ANDP children
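The slide does not spell out how ANDP and OR are scored. One plausible reading, assumed here and not confirmed by the source, treats each OR-group as a single synonym-style term with a smoothed language-model probability and treats ANDP as a product of the group probabilities. The sketch below implements exactly that assumption. The Dirichlet-style smoothing, the value of `mu`, the function names, and the toy documents are all illustrative choices.

```python
import math
from collections import Counter


def group_prob(group, doc_counts, doc_len, coll_counts, coll_len, mu):
    """Smoothed probability of an OR-group, counting its members as one term."""
    tf = sum(doc_counts[t] for t in group)    # occurrences in the document
    cf = sum(coll_counts[t] for t in group)   # occurrences in the collection
    p = (tf + mu * cf / coll_len) / (doc_len + mu)
    return max(p, 1e-12)                      # guard against log(0)


def andp_score(cnf, doc_tokens, coll_counts, coll_len, mu=10.0):
    """ANDP read as a product over groups, computed as a sum of logs."""
    doc_counts = Counter(doc_tokens)
    return sum(
        math.log(group_prob(g, doc_counts, len(doc_tokens), coll_counts, coll_len, mu))
        for g in cnf
    )


cnf = [
    ["approval", "guideline", "strategy"],
    ["logos"],
    ["television"],
    ["watch", "view", "viewer"],
    ["children"],
]

# Hypothetical two-document collection.
d1 = "guideline logos television viewer children".split()
d2 = "approval logos radio watch adults".split()
coll_counts = Counter(d1 + d2)
coll_len = len(d1) + len(d2)

s1 = andp_score(cnf, d1, coll_counts, coll_len)
s2 = andp_score(cnf, d2, coll_counts, coll_len)
print(s1, s2)
```

Under this reading a document that misses a group is penalised rather than excluded, so the query still produces a ranking instead of a strict Boolean filter.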
![Page 12: Automatic Term Mismatch Diagnosis for Selective Query Expansion](https://reader035.vdocument.in/reader035/viewer/2022062521/5681685e550346895ddea039/html5/thumbnails/12.jpg)
Main Points
• An important problem – term mismatch & a traditional solution
• New diagnostic intervention approach
• Evaluation: Simulated user studies
• Diagnosis & intervention effectiveness
![Page 13: Automatic Term Mismatch Diagnosis for Selective Query Expansion](https://reader035.vdocument.in/reader035/viewer/2022062521/5681685e550346895ddea039/html5/thumbnails/13.jpg)
Diagnostic Intervention (We Hope to)
Pipeline: User → Keyword query → Diagnosis system (P(t | R) or idf) → Problem query terms → User expansion → Expansion terms → Query formulation (CNF or Keyword) → Retrieval engine → Evaluation
Example:
– Keyword query: (child AND cigar)
– Problem query terms: (child > cigar)
– Expansion terms: (child → teen)
– Query formulation: (child OR teen) AND cigar
![Page 15: Automatic Term Mismatch Diagnosis for Selective Query Expansion](https://reader035.vdocument.in/reader035/viewer/2022062521/5681685e550346895ddea039/html5/thumbnails/15.jpg)
We Ended up Using Simulation
Pipeline: Expert user → Keyword query → Diagnosis system (P(t | R) or idf) → Problem query terms → User expansion → Expansion terms → Query formulation (CNF or Keyword) → Retrieval engine → Evaluation
Offline: Full CNF, e.g. (child OR teen) AND (cigar OR tobacco)
Online simulation, example:
– Keyword query: (child AND cigar)
– Problem query terms: (child > cigar)
– Expansion terms: (child → teen)
– Query formulation: (child OR teen) AND cigar
![Page 16: Automatic Term Mismatch Diagnosis for Selective Query Expansion](https://reader035.vdocument.in/reader035/viewer/2022062521/5681685e550346895ddea039/html5/thumbnails/16.jpg)
Diagnostic Intervention Datasets
• Document sets
– TREC 2007 Legal track, 7 million tobacco corp. documents, train on 2006
– TREC 4 Ad hoc track, 0.5 million newswire documents, train on TREC 3
• CNF Queries
– TREC 2007 by lawyers, TREC 4 by Univ. Waterloo [Clarke95]
– 50 topics each, 2-3 keywords per query
• Relevance Judgments
– TREC 2007 sparse, TREC 4 dense
• Evaluation measures
– TREC 2007 statAP, TREC 4 MAP
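For reference, MAP (the TREC 4 measure above) averages, over topics, the precision observed at each rank where a relevant document is retrieved. The sketch below is a plain textbook version with hypothetical document ids. It does not cover statAP, which estimates average precision from sampled judgments, and it is not the evaluation code used in the talk.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average of precision@rank over the ranks of retrieved relevant documents."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """runs: list of (ranked_ids, relevant_ids) pairs, one per topic."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical rankings for two topics.
runs = [
    (["d3", "d1", "d7", "d2"], {"d1", "d2"}),
    (["d5", "d9"], {"d5"}),
]
map_score = mean_average_precision(runs)
print(map_score)
```

Because unretrieved relevant documents add nothing to the sum but still count in the denominator, term mismatch lowers average precision directly, not only recall.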
![Page 17: Automatic Term Mismatch Diagnosis for Selective Query Expansion](https://reader035.vdocument.in/reader035/viewer/2022062521/5681685e550346895ddea039/html5/thumbnails/17.jpg)
Main Points
• An important problem – term mismatch & a traditional solution
• New diagnostic intervention approach
• Simulated user studies
• Diagnosis & intervention effectiveness
![Page 18: Automatic Term Mismatch Diagnosis for Selective Query Expansion](https://reader035.vdocument.in/reader035/viewer/2022062521/5681685e550346895ddea039/html5/thumbnails/18.jpg)
Results – Diagnosis
P(t | R) vs. idf diagnosis
[Figure: Diagnostic CNF expansion on TREC 4 and 2007. x-axis: # query terms selected (0, 1, 2, 3, 4, All), from No Expansion to Full Expansion. y-axis: Gain in retrieval (MAP), 0% to 100%. Series: P(t | R) on TREC 2007, idf on TREC 2007, P(t | R) on TREC 4, idf on TREC 4. Annotation: 8%-50%.]
![Page 19: Automatic Term Mismatch Diagnosis for Selective Query Expansion](https://reader035.vdocument.in/reader035/viewer/2022062521/5681685e550346895ddea039/html5/thumbnails/19.jpg)
Results – Expansion Intervention
CNF vs. bag-of-word expansion
[Figure: P(t | R) guided expansion on TREC 4 and 2007. x-axis: # query terms selected (0, 1, 2, 3, 4, All). y-axis: Retrieval performance (MAP), 0 to 0.35. Series: CNF on TREC 4, Bag of word on TREC 4, CNF on TREC 2007, Bag of word on TREC 2007. Annotation: 50% to 300% gain.]
Similar level of gain in top precision
![Page 20: Automatic Term Mismatch Diagnosis for Selective Query Expansion](https://reader035.vdocument.in/reader035/viewer/2022062521/5681685e550346895ddea039/html5/thumbnails/20.jpg)
Main Points
• An important problem – term mismatch & a traditional solution
• New diagnostic intervention approach
• Simulated user studies
• Diagnosis & intervention effectiveness
![Page 21: Automatic Term Mismatch Diagnosis for Selective Query Expansion](https://reader035.vdocument.in/reader035/viewer/2022062521/5681685e550346895ddea039/html5/thumbnails/21.jpg)
Conclusions
• One of the most effective ways to engage user interactions
– CNF queries gain 50-300% over keyword baseline.
• Mismatch diagnosis → simple & effective interactions
– Automatic diagnosis saves user effort by 33%.
• Expansion in CNF is easier and better than in bag of word
– Bag of word requires balanced expansion of all terms.
• New research questions:
– How to learn from manual CNF queries to improve automatic CNF expansion
– How to get ordinary users to create effective CNF expansions (with the help of interfaces or search tools)
![Page 22: Automatic Term Mismatch Diagnosis for Selective Query Expansion](https://reader035.vdocument.in/reader035/viewer/2022062521/5681685e550346895ddea039/html5/thumbnails/22.jpg)
Acknowledgements
• Helpful discussions & feedback: Chengtao Wen, Grace Hui Yang, Jin Young Kim, Charlie Clarke, SIGIR Reviewers
• Access to data: Charlie Clarke, Gordon Cormack, Ellen Voorhees, NIST
• NSF grant IIS-1018317
Opinions are solely the authors’.
![Page 23: Automatic Term Mismatch Diagnosis for Selective Query Expansion](https://reader035.vdocument.in/reader035/viewer/2022062521/5681685e550346895ddea039/html5/thumbnails/23.jpg)
END
![Page 24: Automatic Term Mismatch Diagnosis for Selective Query Expansion](https://reader035.vdocument.in/reader035/viewer/2022062521/5681685e550346895ddea039/html5/thumbnails/24.jpg)
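The Potential
• Query: approval logos television watched children

Expanding "logos" one term at a time:

| | logos | +promotion | +signage | +brand | All |
|---|---|---|---|---|---|
| Mismatch | 85.9% | 81.1% | 80.9% | 80.3% | 80.3% |
| Recall | 14.1% | 18.9% | 19.1% | 19.7% | 19.7% |

Recall per query term, before and after expansion:

| Query term | Recall | Expansion terms | Recall after expansion |
|---|---|---|---|
| logos | 14.1% | +promotion +signage... | 19.7% |
| approval | 6.49% | +guideline +strategy | 12.8% |
| television | 21.3% | +tv +cable +network | 22.4% |
| watched | 10.4% | +view +viewer | 19.5% |
| children | 18.0% | +child +teen +kid... | 19.3% |
| Overall | 2.04% | | 8.74% |

Potential gain from expansion: 50-300%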
![Page 25: Automatic Term Mismatch Diagnosis for Selective Query Expansion](https://reader035.vdocument.in/reader035/viewer/2022062521/5681685e550346895ddea039/html5/thumbnails/25.jpg)
Failure Analysis (vs. baseline)
Diagnosis:
• 4 topics: wrong P(t | R) prediction, lower MAP
Intervention:
• 3 topics: right diagnosis, but lower MAP
• 2 of the 3: no manual expansion for the selected term
– Users do not always recognize which terms need help.
• 1 of the 3: wrong expansion terms by expert
– “apatite rocks” in nature, not “apatite” chemical
– CNF expansion can be difficult w/o looking at retrieval results.
![Page 26: Automatic Term Mismatch Diagnosis for Selective Query Expansion](https://reader035.vdocument.in/reader035/viewer/2022062521/5681685e550346895ddea039/html5/thumbnails/26.jpg)
Failure Analysis -- Comparing diagnosis methods: P(t | R) vs. idf
[Figure: scatter plot of queries. x-axis: P(t | R) Difference (-1 to 0.2). y-axis: MAP Difference (-0.1 to 0.15). Regions: (P(t | R) better prediction and better MAP); (P(t | R) better prediction, but lower MAP); (idf better prediction, and better MAP). Annotations: User didn’t expand; Wrong expansion. Legend: query; query with unexpanded term(s).]
![Page 27: Automatic Term Mismatch Diagnosis for Selective Query Expansion](https://reader035.vdocument.in/reader035/viewer/2022062521/5681685e550346895ddea039/html5/thumbnails/27.jpg)
Term Mismatch Diagnosis
• Predicting term recall - P(t | R) [Zhao10]
– Query dependent features (model causes of mismatch)
  • Synonyms of term t based on query q’s context
  • How likely these synonyms occur in place of t
  • Whether t is an abstract term
  • How rarely t occurs in the collection C
– Regression prediction: f_i(t, q, C) → P(t | R)
– Used in term weighting for long queries
• Lower predicted P(t | R) → higher likelihood of mismatch → t more problematic
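Once a per-term score is available, both diagnosis methods reduce to a sort: the idf baseline picks the rarest terms, and the mismatch diagnosis picks the terms with the lowest predicted P(t | R). The sketch below shows only that selection step. The regression model that produces the predictions is not reproduced here, the document frequencies are made up, and the recall figures from the earlier slide stand in for predicted values.

```python
import math


def idf(doc_freq, num_docs):
    """Inverse document frequency; a higher value means a rarer term."""
    return math.log(num_docs / doc_freq)


def diagnose_by_idf(terms, doc_freqs, num_docs, k):
    """Baseline diagnosis: the k rarest (highest idf) query terms."""
    return sorted(terms, key=lambda t: idf(doc_freqs[t], num_docs), reverse=True)[:k]


def diagnose_by_recall(terms, predicted_recall, k):
    """Mismatch diagnosis: the k terms with the lowest predicted P(t | R)."""
    return sorted(terms, key=lambda t: predicted_recall[t])[:k]


terms = ["approval", "logos", "television", "watched", "children"]

# Hypothetical document frequencies in a collection of one million documents.
doc_freqs = {"approval": 50_000, "logos": 2_000, "television": 8_000,
             "watched": 3_000, "children": 30_000}

# Stand-in for model output: the recall values shown on the earlier slide.
predicted_recall = {"approval": 0.0649, "logos": 0.141, "television": 0.213,
                    "watched": 0.104, "children": 0.180}

by_idf = diagnose_by_idf(terms, doc_freqs, 1_000_000, k=2)
by_recall = diagnose_by_recall(terms, predicted_recall, k=2)
print(by_idf, by_recall)
```

With these illustrative inputs the two methods agree on only one term, showing that rareness and mismatch are related but distinct signals.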
![Page 28: Automatic Term Mismatch Diagnosis for Selective Query Expansion](https://reader035.vdocument.in/reader035/viewer/2022062521/5681685e550346895ddea039/html5/thumbnails/28.jpg)
Online or Offline Study?
• Controlling confounding variables
– Quality of expansion terms
– User’s prior knowledge of the topic
– Interaction effectiveness & effort
• Enrolling many users
• Offline simulations can avoid all these and still make reasonable observations
![Page 29: Automatic Term Mismatch Diagnosis for Selective Query Expansion](https://reader035.vdocument.in/reader035/viewer/2022062521/5681685e550346895ddea039/html5/thumbnails/29.jpg)
Simulation Assumptions
• Full expansion to simulate partial expansions
• 3 assumptions about user expansion process
– Independent expansion of query terms
  • A1: same set of expansion terms for a given query term, no matter which subset of query terms gets expanded
  • A2: same sequence of expansion terms, no matter …
– A3: Re-constructing keyword query from CNF
  • Procedure to ensure vocabulary faithful to that of the original keyword description
  • Highly effective CNF queries ensure reasonable kw baseline
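Assumptions A1 and A2 make the offline simulation mechanical: a partial expansion is the full CNF with unselected groups cut back to their original term and, optionally, selected groups cut to a prefix of their expansion terms. The sketch below assumes the first term of each group is the original query term, which is a convention chosen here for illustration. The A3 procedure for reconstructing the keyword query is not described in enough detail on the slide to reproduce.

```python
def simulate_partial_expansion(full_cnf, selected_terms, max_expansions=None):
    """Derive a partially expanded CNF from an expert's full CNF.

    full_cnf: list of OR-groups; group[0] is taken to be the original query
    term (an assumption), followed by expansion terms in the expert's order.
    selected_terms: query terms chosen by the diagnosis.
    max_expansions: keep only the first n expansion terms per selected term.
    """
    partial = []
    for group in full_cnf:
        original, expansion = group[0], group[1:]
        if original in selected_terms:
            # A1/A2: same expansion terms, in the same order, whatever else is selected.
            kept = expansion if max_expansions is None else expansion[:max_expansions]
            partial.append([original] + kept)
        else:
            partial.append([original])
    return partial


# Full CNF from the simulation slide: (child OR teen) AND (cigar OR tobacco)
full_cnf = [["child", "teen"], ["cigar", "tobacco"]]

no_expansion = simulate_partial_expansion(full_cnf, set())
expand_child = simulate_partial_expansion(full_cnf, {"child"})
expand_all = simulate_partial_expansion(full_cnf, {"child", "cigar"})
print(no_expansion, expand_child, expand_all)
```

Sweeping the number of selected terms from zero to all of them yields the sequence of queries plotted on the results slides, from no expansion to full expansion.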
![Page 30: Automatic Term Mismatch Diagnosis for Selective Query Expansion](https://reader035.vdocument.in/reader035/viewer/2022062521/5681685e550346895ddea039/html5/thumbnails/30.jpg)
Results – Level of Expansion
• More expansion per query term, better retrieval
• Result of expansion terms being effective
• Queries with significant gain in retrieval after expanding more than 4 terms:
– Topic 84, cigarette sales in James Bond movies
![Page 31: Automatic Term Mismatch Diagnosis for Selective Query Expansion](https://reader035.vdocument.in/reader035/viewer/2022062521/5681685e550346895ddea039/html5/thumbnails/31.jpg)
Pipeline: Expert User → Keyword Query → Diagnosis system (P(t | R) or idf) → Problem query terms → User expansion → Expansion terms → Query formulation (CNF or Keyword) → Retrieval engine → Evaluation
Offline: Full CNF Query, e.g. (child OR youth) AND (cigar OR tobacco)
Online simulation, example:
– Keyword query: (child AND cigar)
– Problem query terms: (child > cigar)
– Expansion terms: (child → youth)
– Query formulation: (child OR youth) AND cigar
Idf (rare) 1.22 1.92 1.69 1.87 1.40
Most infrequent