-
Analyzing Query Reformulation & Search Abandonment in Web Search
Efthimis N. Efthimiadis
University of Washington
University of Geneva, March 18, 2010 1
-
Work presented here is in collaboration with
Jeff Huang (UW) and Sofia Stamou (UPatras)
Related Publications:
• Huang, J. & Efthimiadis, E. N. (2009). Analyzing and Evaluating Query Reformulation Strategies in Web Search Logs. In Proceedings of the 18th ACM Conference on Information and Knowledge Management (CIKM 2009), Hong Kong, November 2-6, 2009, pp. 77-86. (Conference acceptance rate 15%; nominated for best student paper award)
• Huang, J. & Efthimiadis, E. N. Search Abandonment in Web Search Logs. Submitted for publication.
• Stamou, S. & Efthimiadis, E. N. (2009). Queries without Clicks: Successful or Failed Searches? In Proceedings of the SIGIR 2009 Workshop on the Future of IR Evaluation, Boston, MA, USA, July 23, 2009.
• Stamou, S. & Efthimiadis, E. N. (2010). Interpreting User Inactivity on Search Results. In Proceedings of the 32nd European Conference on IR Research (ECIR) on Advances in Information Retrieval, Milton Keynes, UK, 28-31 March 2010. Springer, 2010.
-
Agenda
A. Search Interaction: overview
B. Reformulations
   1. Classifier
   2. Study
   3. Findings
C. Abandonment
   1. Classifier
   2. Study
   3. Findings
D. User Study
E. Future work
-
Search Interaction
Overview
Query
User
i-Need
Research on Search Interaction: QF & models
e.g., Bates, Belkin, Fidel, … Ingwersen, Tenopir, … Hartley, Borlund, Toms, … Efthimiadis … & many others
Results
S.E.
• Beyond string matching
• User Intent (e.g., Broder, Rose)
• Query prediction performance
• difficulty
• QRF, QE expansion risk
• Customize results to intent
• present Results
• spelling correction: did you mean this?
• query refinement: related searches
• related works: more like this
• search trails
• Clickthroughs
Term suggestions
• Search logs
-
REFORMULATIONS
-
Query Reformulations are…
a modification of a previous query
made by computers or users
used to retrieve different search results
in a web search engine
We study these cases
-
Same Query
Query Reformulation
New Query
Query Types
-
We ask,
Which reformulation strategies work?
How do we classify these reformulation strategies?
How are users doing query reformulation?
-
Example
university of washington information school
-
More examples…
Word Reorder          Reorder Word
Stemming              Stemmed
Word Substitution     Term Replacement
Spelling Correction   Speling Correctoin
Remove Words          Remove
-
Prior Reformulation Taxonomies
Present Study Anick [3] Teevan [34] Jansen [16], He [15], Lau [24] Whittle [36] Bruza [7] Guo [13]
word reorder syntactic variant word order
whitespace and punctuation
non-alphanumerics, word merge SPL, PUN
word splitting, word merging
remove words remove words / duplicates generalization D(k)
add words head, modifier add words, add stopwords specialization C(k) ADD
url stripping domain
stemming morphological variant stemming and pluralization M(k) DER word stemming
acronym acronym abbreviations ABR expansion
substring
abbreviation
word substitution alternative, hyponym, change word swaps, synonyms reformulation W(k), w(k) SUB
spelling correction spelling misspellings M(k) SPE spelling correction
* not detected elaboration, location reformulation S(k), s(k)
* not in data capitalization, extra whitespace J(k) CAS
-
A Rule-Based Classifier
First automated reformulation strategy classifier
Based on heuristics,
– Reformulation types are intuitive, no machine learning
Primary Goal: High Precision
Secondary Goal: Adequate Recall
High precision enables accurate comparison between properties of reformulation types
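A minimal sketch of what an ordered, rule-based check might look like for a handful of the reformulation types named in this talk. The rule order, the `classify_reformulation` name, and the 0.85 similarity threshold are illustrative assumptions, not the rules used in the study; types such as stemming and acronyms are left out.

```python
import re
from difflib import SequenceMatcher


def _squash(text: str) -> str:
    """Drop everything except lowercase letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text)


def classify_reformulation(prev: str, curr: str) -> str:
    """Label curr relative to prev using ordered, hand-written rules.

    Hypothetical rules for illustration only; the first rule that
    matches wins, and anything unmatched is treated as a new query.
    """
    p, c = prev.strip().lower(), curr.strip().lower()
    if p == c:
        return "same"
    # Identical once spaces and punctuation are ignored.
    if _squash(p) == _squash(c):
        return "whitespace and punctuation"
    pw, cw = p.split(), c.split()
    if sorted(pw) == sorted(cw):
        return "word reorder"
    if set(cw) < set(pw):
        return "remove words"
    if set(pw) < set(cw):
        return "add words"
    # Assumed threshold: near-identical strings count as a spelling fix.
    if SequenceMatcher(None, p, c).ratio() >= 0.85:
        return "spelling correction"
    return "new"
```

Putting the strictest rules first is one way to favor precision over recall: a pair is only labeled when it matches a narrow pattern, and everything else falls through to "new".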
-
Architecture
user1, query string1, timestamp, rank, url
user1, query string2, timestamp, rank, url
user1, query string3, timestamp, rank, url
user2, query string1, timestamp, rank, url
user3, query string1, timestamp, rank, url
user3, query string2, timestamp, rank, url
Query Logs
Classifier
New Queries
Same Queries
Reformulation
Acronym
Stemming
etc...
-
36M Queries from AOL Query Logs
UserId  Query                          Timestamp        ClickRank  ClickUrl
16348   lucille roberts                5/3/2006 8:01    1          http://www.lucilleroberts.com
16348   tmobile                        5/22/2006 14:06
16348   torontolime                    5/23/2006 13:48  1          http://www.toronto-lime.com
16348   welime                         5/30/2006 14:58
16348   we lime                        5/30/2006 14:59
16348   back2basics                    5/30/2006 15:07  6          http://back2basics.mypicgallery.com
16348   nycaribbeanvibes               5/30/2006 15:15  2          http://nycaribbeanvibes.photosite.com
16473   theused.com                    3/1/2006 23:55
16473   slipknot masks                 3/2/2006 0:20    1          http://www.hauntmasters.com
16473   aol maps                       3/2/2006 22:08
16473   southeast missouri basketbal   3/2/2006 22:11
16473   southeast missouri basketball  3/2/2006 22:11   5          http://www.semohoops.com
16473   sikeston basketball            3/2/2006 22:13   2          http://www.semissourian.com
16473   sikeston basketball            3/2/2006 22:13   1          http://www.topix.net
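The rows above suggest how log lines could be turned into classifier input: merge the click rows belonging to one query, then pair each query with the same user's previous query. The sketch below assumes tab-separated fields in the column order shown and the timestamp style printed on the slide; `parse_row`, `query_events`, and `consecutive_pairs` are hypothetical helper names, not part of the study's code.

```python
from datetime import datetime
from itertools import groupby


def parse_row(line: str) -> dict:
    """Split one tab-separated log row; the click fields may be absent."""
    fields = line.rstrip("\n").split("\t")
    fields += [""] * (5 - len(fields))
    user, query, stamp, rank, url = fields[:5]
    return {
        "user": user,
        "query": query,
        # Assumed format, matching the slide (e.g. "5/30/2006 14:58").
        "time": datetime.strptime(stamp, "%m/%d/%Y %H:%M"),
        "click": (int(rank), url) if rank else None,
    }


def query_events(rows):
    """Merge adjacent rows sharing user, query and timestamp into one event."""
    key = lambda r: (r["user"], r["query"], r["time"])
    for (user, query, time), group in groupby(rows, key=key):
        clicks = [r["click"] for r in group if r["click"]]
        yield {"user": user, "query": query, "time": time, "clicks": clicks}


def consecutive_pairs(events):
    """Yield (previous, current) events issued by the same user."""
    prev = None
    for event in events:
        if prev is not None and prev["user"] == event["user"]:
            yield prev, event
        prev = event
```

This relies on the log already being sorted by user and time, as in the excerpt; unsorted input would need a sort before grouping.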
-
Comparative Evaluation (Unscientific)
Precision Recall Accuracy
Present Study 98.2% 61.3% 89.1%
He et al. 60% 98%
Jones et al. 87.3%
Murray et al. 97.3% 76%
Radlinski et al. 96.5% 92.3%
Unscientific because:
- Different data sources
- Different ways of counting (i.e., whether same queries are included or not)
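For readers unfamiliar with the three measures, here is a sketch of one plausible way to score a classifier's labels against hand-assigned ones. Treating any label other than "new" as a detected reformulation, and requiring an exact type match for a detection to count as correct, are assumptions made for illustration; the talk does not spell out how its figures were counted.

```python
def precision_recall_accuracy(gold, predicted, negative="new"):
    """Score predicted labels against hand-assigned gold labels.

    A pair counts as a detected reformulation when its predicted label
    is not `negative`; a detection is correct only on an exact match.
    """
    pairs = list(zip(gold, predicted))
    detected = [(g, p) for g, p in pairs if p != negative]
    actual = [(g, p) for g, p in pairs if g != negative]
    # Precision: share of detected reformulations labeled correctly.
    precision = sum(g == p for g, p in detected) / len(detected) if detected else 0.0
    # Recall: share of true reformulations that were labeled correctly.
    recall = sum(g == p for g, p in actual) / len(actual) if actual else 0.0
    # Accuracy: share of all pairs labeled correctly.
    accuracy = sum(g == p for g, p in pairs) / len(pairs) if pairs else 0.0
    return precision, recall, accuracy
```

Under this reading, a cautious classifier that labels uncertain pairs as "new" keeps precision high at the cost of recall, which matches the stated design goals.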
-
FINDINGS
-
[Figure: Comparing click pattern frequencies between reformulation types. Axis: Query Reformulation Type (word reorder, word substitution, stemming, spelling correction, url stripping, expand acronym, superstring, substring, whitespace / punctuation, form acronym, abbreviation, remove words, add words, same, new). Series: SkipSkip, ClickClick, SkipClick, ClickSkip.]
-
[Figure: Comparing click pattern frequencies between reformulation types. Axis: Query Reformulation Type (word reorder, word substitution, stemming, spelling correction, url stripping, expand acronym, superstring, substring, whitespace / punctuation, form acronym, abbreviation, remove words, add words, same, new). Series: SkipSkip, SkipClick.]
Compare the ratio of SkipSkip to SkipClick to see whether a user is more likely to click if the initial action is Skip.
Spelling correction and expand acronym have high ratios, i.e., people use these reformulations.
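The click patterns and the ratio discussed above can be sketched as follows. A pattern names the action after the initial query followed by the action after the reformulated query, with Skip meaning no click. The function names and the input shape (one tuple per reformulation) are assumptions for illustration.

```python
from collections import Counter, defaultdict


def click_pattern(initial_clicked: bool, reformulated_clicked: bool) -> str:
    """Name the pattern, e.g. SkipClick = no click, then a click."""
    first = "Click" if initial_clicked else "Skip"
    second = "Click" if reformulated_clicked else "Skip"
    return first + second


def pattern_counts(observations):
    """Count patterns per type.

    Each observation is (type, initial_clicked, reformulated_clicked).
    """
    counts = defaultdict(Counter)
    for reform_type, first, second in observations:
        counts[reform_type][click_pattern(first, second)] += 1
    return counts


def skipskip_to_skipclick(counter: Counter) -> float:
    """SkipSkip / SkipClick; reported as infinity when SkipClick is zero."""
    if counter["SkipClick"] == 0:
        return float("inf")
    return counter["SkipSkip"] / counter["SkipClick"]
```

Normalizing each type's counts by its total would give the frequencies plotted in the figures; the ratio alone only compares the two Skip-first patterns.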
-
[Figure: Comparing click pattern frequencies between reformulation types. Axis: Query Reformulation Type (word reorder, word substitution, stemming, spelling correction, url stripping, expand acronym, superstring, substring, whitespace / punctuation, form acronym, abbreviation, remove words, add words, same, new). Series: ClickClick, ClickSkip.]
Some reformulations are performed to improve the result set, while others redo the result set
Different reformulations are “effective” depending on the initial action, i.e., the action performed after the initial query
-
Comparing websites clicked between reformulation types
[Figure: websites clicked, Same vs. Different, by Query Reformulation Type (word reorder, superstring, word substitution, stemming, same, remove words, new, spelling correction, url stripping, expand acronym, substring, whitespace / punctuation, add words, form acronym, abbreviation).]
-
Reformulation Type        Median Time (s) between Queries  Mean Rank Change
word substitution         73                               +4.04
add words                 63                               +3.19
substring                 33                               +3.15
remove words              68                               +3.02
word reorder              85                               +2.86
expand acronym            42                               +2.02
stemming                  33                               +2.00
new                       2,417                            +1.91
abbreviation              35                               +1.39
superstring               53                               +1.10
spelling correction       22                               +1.03
form acronym              103                              +0.64
whitespace & punctuation  27                               +0.54
url stripping             57                               +0.29
same                      1                                -1.83
Comparing time between queries (secs) and rank change between reformulation types
Positive rank change = successful reformulation
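A sketch of how the two columns might be aggregated from classified query pairs. The talk does not define rank change precisely; the code assumes it is the click rank after the initial query minus the click rank after the reformulation, so that a positive value means the clicked result moved up. The `summarize` name and the input tuple shape are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median


def summarize(pairs):
    """Per type: (median seconds between queries, mean rank change).

    Each item is (type, seconds_between, rank_before, rank_after).
    Rank change is taken as rank_before - rank_after (an assumption);
    pairs lacking a click on either side are left out of the mean.
    """
    gaps, changes = defaultdict(list), defaultdict(list)
    for reform_type, seconds, before, after in pairs:
        gaps[reform_type].append(seconds)
        if before is not None and after is not None:
            changes[reform_type].append(before - after)
    return {
        t: (median(gaps[t]), mean(changes[t]) if changes[t] else None)
        for t in gaps
    }
```

The median is used for time because a few very long gaps, as with new queries, would dominate a mean.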
-
Future Work
Multi-reformulation
seattle pizza → seattle sausage pizza → sausage pizza
Using instances of sequential reformulations to detect multi-reformulations
Abandonment
Search abandonment redefined in terms of reformulation:
initial query → reformulations → session end (timeout or new query)
netbook → eee pc → eeepc → netbook deals
NoClick
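The session definition above (initial query, reformulations, then a session end by timeout or new query) could be sketched like this. The 30-minute timeout is an assumed value, `is_new_query` stands in for whatever classifier decides that a query is new, and reading NoClick as "no query in the session was clicked" is an interpretation, not something the slide states outright.

```python
from datetime import timedelta


def split_sessions(events, is_new_query, timeout=timedelta(minutes=30)):
    """Cut one user's time-ordered events into reformulation sessions.

    A session ends when the gap to the next query exceeds `timeout`
    (30 minutes is an assumed value) or the next query is a new one.
    """
    session = []
    for event in events:
        if session:
            prev = session[-1]
            timed_out = event["time"] - prev["time"] > timeout
            if timed_out or is_new_query(prev["query"], event["query"]):
                yield session
                session = []
        session.append(event)
    if session:
        yield session


def is_abandoned(session) -> bool:
    """True when no query in the session received a click (NoClick)."""
    return not any(event["clicks"] for event in session)
```

Each event is expected to carry `query`, `time`, and `clicks` keys; any per-user, time-ordered sequence of such dictionaries will do.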
-
Applications
• UIs supporting Reformulations
• Query session boundary detection
• Intelligent query assistance
• Personalized Search
-
Summary
• We created a taxonomy of query reformulation strategies and a rule-based classifier to classify reformulations from the AOL query logs, and measured characteristics of each reformulation strategy
• Different reformulations are useful depending on the initial action
• Some reformulations re-rank clicked results higher while others generate new results
-
Thank You!
Efthimis N. Efthimiadis
University of Washington
Questions?