Personalizing Atypical Web Search Sessions (WSDM'13)
DESCRIPTION
State-of-the-art web search personalization treats users as static or slowly evolving entities with a given set of preferences defined by their past behavior. However, recent publications as well as empirical evidence suggest that there is a significant number of search sessions in which users diverge from their regular search profiles in order to satisfy atypical, non-recurring information needs. In this work, we conduct a large-scale inspection of real-life search sessions to further the understanding of this problem. Subsequently, we design an automatic means of detecting and supporting such atypical sessions. We demonstrate significant improvements over state-of-the-art web search personalization techniques by accounting for the typicality of search sessions. The merit of the proposed method is evaluated based on web-scale search session data spanning several months of user activity. This work, carried out together with Kevyn Collins-Thompson, Paul Bennett and Susan Dumais, has been accepted for full oral presentation at the ACM International Conference on Web Search and Data Mining (WSDM) in Rome, Italy. The full version of this paper is available at: http://dl.acm.org/citation.cfm?id=2433434
TRANSCRIPT
February 7, 2013
Challenge the future
Delft University of Technology
Personalizing Atypical Web Search Sessions
Carsten Eickhoff, Kevyn Collins-Thompson, Paul Bennett and Susan Dumais
Introduction
• Web search personalization is used to account for user preferences based on historic observations
• This seems appropriate for typical search tasks
• Atypical search tasks in unfamiliar domains may not benefit as much from personalization
Overview
1. Investigation of nature, frequency and cause of atypical search sessions
2. Automatic prediction of atypical search sessions
3. Personalization of atypical search sessions
Atypical Search Tasks
• Motivation
  • Atypical search tasks are often caused by external needs
• Topic domain
  • They cover unfamiliar, previously unseen topics & genres
• Behavior
  • Due to limited domain knowledge, the searcher may encounter problems during query formulation and result selection
Atypical Search Sessions
• Sessions tend to be task-centered
• 200 most active Bing users in January 2012
• Navigational queries removed
• Based on a profile, each session is manually judged as typical or atypical
Human-readable User Profiles
• 55% Football(“nfl”, “philadelphia eagles”, “mark sanchez”)
• 14% Boxing(“espn boxing”, “mickey garcia”, “hbo boxing”)
• 9% Television(“modern family”, “dexter 8”, “tv guide”)
• 6% Travel(“rome hotels”, “tripadvisor seattle”, “rome pasta”)
• 5% Hockey(“elmira pioneers”, “umass lax”, “necbl”)
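A profile of this shape can be sketched as a topic histogram over a user's past queries. The snippet below is only an illustration: it assumes each query already carries a topic label (the slides do not say how topics are assigned), and `build_profile` and `format_profile` are hypothetical helper names.

```python
from collections import Counter, defaultdict

def build_profile(labeled_queries, top_queries=3):
    """Summarize (topic, query) pairs as (topic, share, example queries).

    Assumption: topic labels are given; this is not the authors' pipeline.
    """
    topic_counts = Counter(topic for topic, _ in labeled_queries)
    examples = defaultdict(Counter)
    for topic, query in labeled_queries:
        examples[topic][query] += 1
    total = sum(topic_counts.values())
    profile = []
    # most_common() orders topics by how often the user searched for them
    for topic, count in topic_counts.most_common():
        frequent = [q for q, _ in examples[topic].most_common(top_queries)]
        profile.append((topic, count / total, frequent))
    return profile

def format_profile(profile):
    """Render each profile entry in the style used on the slide."""
    lines = []
    for topic, share, queries in profile:
        quoted = ", ".join('"' + q + '"' for q in queries)
        lines.append(f"{share:.0%} {topic}({quoted})")
    return lines

# Toy history, invented for illustration only
history = [("Football", "nfl")] * 3 + [("Boxing", "espn boxing")]
for line in format_profile(build_profile(history)):
    print(line)
```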
Example 1 (session judged against the profile above):
• Boxing(“soto vs ortiz”)
• Boxing(“humberto soto”)
Example 2 (session judged against the profile above):
• Dentistry(“oral sores”)
• Dentistry(“aphthous sore”)
• Healthcare(“aphthous ulcer treatment”)
Frequency of Atypical Sessions
• 166 out of 2790 sessions (~6%) were judged atypical
• 74% of all users show atypical search sessions
• On average, 7.5% of a user's query volume is atypical
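The session share and the per-user query share are different aggregates: one pools all judged sessions, the other averages a ratio over users. A small sketch of the two computations, where the per-user counts are invented toy values and not data from the study:

```python
def pooled_share(atypical, total):
    """Share of atypical items when all items are pooled together."""
    return atypical / total

def mean_per_user_share(per_user_counts):
    """Average of each user's own atypical share.

    per_user_counts: list of (atypical_queries, total_queries) per user.
    """
    shares = [a / t for a, t in per_user_counts if t > 0]
    return sum(shares) / len(shares)

# Figure from the slide: 166 of 2790 judged sessions were atypical
print(f"{pooled_share(166, 2790):.1%}")  # 5.9%

# Hypothetical users, only to show that the two aggregates can differ
toy_users = [(1, 10), (0, 40), (5, 50)]
print(pooled_share(6, 100), mean_per_user_share(toy_users))
```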
Typical vs. Atypical Sessions
• Atypical sessions have:
• longer queries (often natural language questions)
• more diverse query vocabulary
• higher SAT reading level
• Observed differences persist within profiles
Topic Spread
• The most frequent topics for typical sessions include sports, celebrities & gossip, and entertainment
• Typicality often marks the line between what you choose to do (typical) and what you have to do (atypical)
Predicting Atypical Sessions
• User profiling based on activity in Jan. 2012
• Post hoc classification of sessions in Apr. 2012
• Session features (25)
  • Session length
  • Query length
  • Question words
  • POS ratios
  • Longest query position
  • Reading level
  • ...
• Profile features (27)
  • δ session length
  • δ query length
  • …
  • Query vocabulary divergence
  • Topic divergence
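A few of these features could be computed along the following lines. This is a sketch under stated assumptions, not the authors' definitions: the question-word list is invented, "longest query position" is read as the relative position of the longest query in the session, δ features are read as the session value minus the profile mean, and vocabulary divergence is approximated by the share of session words never seen in the profile.

```python
QUESTION_WORDS = {"who", "what", "when", "where", "why", "how", "which"}  # assumed list

def session_features(queries):
    """Compute a handful of session-level features from raw query strings."""
    if not queries:
        raise ValueError("a session needs at least one query")
    tokens = [q.lower().split() for q in queries]
    lengths = [len(t) for t in tokens]
    n_tokens = sum(lengths)
    n_question = sum(1 for t in tokens for w in t if w in QUESTION_WORDS)
    return {
        "session_length": len(queries),
        "mean_query_length": n_tokens / len(queries),
        "question_word_ratio": n_question / n_tokens if n_tokens else 0.0,
        # relative position (0 = first query) of the longest query
        "longest_query_position": lengths.index(max(lengths)) / len(queries),
    }

def profile_deltas(session, profile_means):
    """Delta features: session value minus the user's profile mean (assumed reading of δ)."""
    return {f"delta_{name}": session[name] - mean for name, mean in profile_means.items()}

def vocabulary_divergence(queries, profile_vocabulary):
    """Share of session words that never occur in the profile vocabulary (assumed measure)."""
    session_vocabulary = {w for q in queries for w in q.lower().split()}
    if not session_vocabulary:
        return 0.0
    return len(session_vocabulary - profile_vocabulary) / len(session_vocabulary)

session = ["oral sores", "aphthous sore", "aphthous ulcer treatment"]
features = session_features(session)
print(features)
print(profile_deltas(features, {"mean_query_length": 2.0}))  # hypothetical profile mean
print(vocabulary_divergence(session, {"nfl", "philadelphia", "eagles"}))
```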
Classification Performance
• Logistic regression model
• 52-dimensional feature space
• Cross-validation performance: F1 0.84 (precision 0.82 / recall 0.86)
• The resulting performance is comparable to the agreement among human judges
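As a quick consistency check, F1 is the harmonic mean of precision and recall, and the reported precision and recall do reproduce the reported F1:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

f1 = f1_score(0.82, 0.86)
print(round(f1, 2))  # 0.84
```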
Strongest Typicality Indicators
1. Query length divergence from profile
2. Absolute query length
3. Question word ratio
4. Verb ratio divergence from profile
5. Topic divergence from profile
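The slides do not state how topic divergence from the profile is measured. One plausible choice, used here purely as an assumption, is the Jensen-Shannon divergence between the session's topic distribution and the profile's. The profile weights below are the percentages from the example profile; the session counts follow the two example sessions.

```python
import math

def normalize(weights):
    """Scale non-negative topic weights so they sum to 1."""
    total = sum(weights.values())
    return {topic: w / total for topic, w in weights.items()}

def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2, so the value lies in [0, 1])."""
    p, q = normalize(p), normalize(q)

    def kl_to_mixture(a, b):
        # KL divergence from a to the mixture M = (a + b) / 2
        total = 0.0
        for topic, a_mass in a.items():
            if a_mass > 0:
                mixture = 0.5 * (a_mass + b.get(topic, 0.0))
                total += a_mass * math.log2(a_mass / mixture)
        return total

    return 0.5 * kl_to_mixture(p, q) + 0.5 * kl_to_mixture(q, p)

profile = {"Football": 55, "Boxing": 14, "Television": 9, "Travel": 6, "Hockey": 5}
boxing_session = {"Boxing": 2}                    # Example 1
dental_session = {"Dentistry": 2, "Healthcare": 1}  # Example 2

print(js_divergence(profile, boxing_session))
print(js_divergence(profile, dental_session))
```

Under this measure, the dentistry session shares no topic with the profile and therefore reaches the maximum divergence, while the boxing session scores lower.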
Robustness to Sparsity
• It takes 15-20 sessions to reliably characterize users
• Most users reach this point in 14 days of search activity
Retrieval Performance
• Moving from qualitative setting to web scale
• 155k users
• 10.4M sessions
• LambdaMART learning scheme
• Re-ranking the top 10 returned results
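Re-ranking only the top 10 results can be sketched as follows. The `score` callable is a hypothetical stand-in for a trained ranking model such as LambdaMART; nothing here reproduces the actual learner or its features.

```python
def rerank_top_k(results, score, k=10):
    """Reorder the first k results by descending score; leave the rest untouched.

    score: callable mapping a result to a number (stand-in for a learned ranker).
    """
    head, tail = results[:k], results[k:]
    # sorted() is stable, so results with equal scores keep their original order
    return sorted(head, key=score, reverse=True) + tail

# Toy example: a hypothetical personalized score that boosts a single document
original = [f"doc{i}" for i in range(12)]
boost = {"doc7": 1.0}
reranked = rerank_top_k(original, lambda doc: boost.get(doc, 0.0))
print(reranked)
```

Restricting the reordering to the head of the list keeps the personalized ranking close to the original one, since no result outside the top 10 can be promoted.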
Profiling Scopes
• Session
  • All previous activity in the same session
• Historic
  • All previous activity before this session
• Aggregate
  • All previous activity before the current query
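The three scopes can be read as three slices of a user's query log relative to the current query. A minimal sketch, assuming a chronologically sorted log with explicit session identifiers; the `Event` record and `profiling_scopes` function are hypothetical names:

```python
from typing import NamedTuple

class Event(NamedTuple):
    timestamp: int
    session_id: str
    query: str

def profiling_scopes(events, current_index):
    """Split the log before the current query into the three profiling scopes.

    Assumes events are sorted by time and belong to a single user.
    """
    current = events[current_index]
    previous = events[:current_index]
    return {
        # earlier activity within the ongoing session
        "session": [e for e in previous if e.session_id == current.session_id],
        # everything from sessions before the ongoing one
        "historic": [e for e in previous if e.session_id != current.session_id],
        # union of both: all activity before the current query
        "aggregate": list(previous),
    }

log = [
    Event(1, "s1", "nfl"),
    Event(2, "s1", "philadelphia eagles"),
    Event(3, "s2", "oral sores"),
    Event(4, "s2", "aphthous sore"),
    Event(5, "s2", "aphthous ulcer treatment"),
]
scopes = profiling_scopes(log, current_index=4)
print({name: [e.query for e in evts] for name, evts in scopes.items()})
```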
Personalization Performance
Hybrid Approach
Hybrid Performance
Conclusion
• Investigation of atypical web search sessions
  • Atypical sessions occur for most users
  • They account for a notable share of query volume (7.5% per user on average)
• Typicality prediction
  • Automatic prediction comparable to human accuracy
• Benefit for retrieval settings
  • Including typicality information in the personalization process improved retrieval performance
Future Work
• Classification of ongoing sessions
  • Classification currently happens at the last query of a session; doing it earlier would have more impact
  • Weakly supervised approach based on pre-classified complete sessions
• From atypical to typical
  • How do information needs change over time?
  • Life cycle of an information need / a profile
• Information value at risk
  • How important is a piece of information for the user?
  • How much effort will they invest?