Source: ryenwhite.com/talks/pdf/WhiteKapoorDumaisUMAP2010.pdf
Modeling Long-Term Search Engine Usage
Ryen White, Ashish Kapoor & Susan Dumais
Microsoft Research
Key Problem
• What are key trends in search engine usage?
– Identify long-term patterns of usage
– Understand key variables that affect behavior
• Can we predict long-term search engine usage?
– Determine indicators that are predictive of trends
Prior Work
• Short-term Usage:
– Predict Switch within Sessions (Heath & White 2008, Laxman et al. 2008, White & Dumais 2009)
– Predict good search engines for a query (White et al. 2008)
• Economic / Conceptual Models
– Identify factors influencing search engine choice (Capraro et al. 2003)
– Models of satisfaction (Keaveney et al. 2001, Mittal et al. 1998)
Long-Term Search Logs
• Six months of toolbar data (26 weeks)
– Sep 2008 through February 2009
• Three search engines
– Bing, Google and Yahoo
• Users with at least 10 queries every week
– 10K users for our analysis
– English speaking, located in US
Long-Term Search Logs (summarized for each week)

| Feature | Description |
| --- | --- |
| fractionEngine | Fraction of queries issued to search engine |
| queryCountEngine | Number of queries issued to search engine |
| avgEngineQueryLength | Average length (in words) of queries to search engine |
| fractionEngineSAT | Fraction of search engine queries that are satisfied |
| fractionNavEngine | Fraction of search engine queries defined as navigational |
| fractionNavEngineSAT | Fraction of queries in fractionNavEngine that are satisfied |

SAT score: Dwell time greater than or equal to 30 seconds (Fox et al. 2005)
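The weekly features above could be computed along the following lines. This is only a sketch: the record layout (`engine`, `text`, `dwell`, `is_nav`) is hypothetical, and it assumes a dwell time is available per query so that the 30-second SAT rule can be applied directly. How the talk's authors labeled navigational queries is not stated on the slide, so `is_nav` is taken as given.

```python
from collections import defaultdict

SAT_DWELL_SECONDS = 30  # SAT threshold from the slide (dwell time >= 30 s)

def weekly_engine_features(queries):
    """Summarize one user's queries for one week, per search engine.

    `queries` is a list of dicts with hypothetical keys:
    engine (str), text (str), dwell (seconds), is_nav (bool).
    """
    by_engine = defaultdict(list)
    for q in queries:
        by_engine[q["engine"]].append(q)

    total = len(queries)
    features = {}
    for engine, qs in by_engine.items():
        nav = [q for q in qs if q["is_nav"]]
        sat = [q for q in qs if q["dwell"] >= SAT_DWELL_SECONDS]
        nav_sat = [q for q in nav if q["dwell"] >= SAT_DWELL_SECONDS]
        features[engine] = {
            "fractionEngine": len(qs) / total,
            "queryCountEngine": len(qs),
            "avgEngineQueryLength": sum(len(q["text"].split()) for q in qs) / len(qs),
            "fractionEngineSAT": len(sat) / len(qs),
            "fractionNavEngine": len(nav) / len(qs),
            # Defined as 0.0 when the engine received no navigational queries.
            "fractionNavEngineSAT": len(nav_sat) / len(nav) if nav else 0.0,
        }
    return features

# Made-up week of activity; engines are labeled A and B as placeholders.
week = [
    {"engine": "A", "text": "weather seattle", "dwell": 45, "is_nav": False},
    {"engine": "A", "text": "facebook", "dwell": 10, "is_nav": True},
    {"engine": "A", "text": "umap 2010 conference hawaii", "dwell": 30, "is_nav": False},
    {"engine": "B", "text": "youtube", "dwell": 60, "is_nav": True},
]
features = weekly_engine_features(week)
print(features["A"])
```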
Outline
• Identifying Key Trends
• Indicators of User Behavior
• Predicting Search Engine Usage
• Conclusion and Future Work
Identifying Basis Behaviors
Primary Behavior Indicator: fractionEngine
[Figure: axes are Time and Search engine]
26 × 3 dimensional behavior vector (per user)
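As a rough illustration of the per-user behavior vector, the sketch below flattens 26 weeks of fractionEngine values for three engines into one 78-element list. The engine labels `A`, `B`, `C` and the week-major ordering are assumptions made for the example; the slide does not say how the vector is laid out.

```python
WEEKS = 26
ENGINES = ("A", "B", "C")  # placeholder labels for the three engines

def behavior_vector(weekly_fractions):
    """Flatten per-week engine fractions into one 26 x 3 = 78-dim vector.

    `weekly_fractions` is a list of 26 dicts mapping engine label to
    fractionEngine for that week; missing engines count as 0.0.
    """
    if len(weekly_fractions) != WEEKS:
        raise ValueError(f"expected {WEEKS} weeks, got {len(weekly_fractions)}")
    return [week.get(e, 0.0) for week in weekly_fractions for e in ENGINES]

# Hypothetical user: engine A only for 13 weeks, then engine B only.
user = [{"A": 1.0}] * 13 + [{"B": 1.0}] * 13
x = behavior_vector(user)
print(len(x))
```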
Identifying Basis Behaviors
[Figure: Observed Behavior matrix X, with Users along one axis, factorized into W and H]
Option 1: Clustering
corresponds to one user
• Good for identifying “user prototypes”
– e.g. users that switch engines towards the end of the 26 weeks as opposed to the beginning
• Might not recover basis behavior
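To make the "user prototype" idea concrete, here is a minimal k-means sketch (Lloyd's algorithm). It is not taken from the talk: the four-week, single-engine vectors are a toy simplification of the 26 × 3 vector, and seeding the centroids with the first k points is an arbitrary choice made to keep the example deterministic.

```python
def kmeans(points, k, iterations=20):
    """Plain k-means (Lloyd's algorithm); the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign each user to the nearest centroid (squared Euclidean distance).
        assignment = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Move each centroid to the mean of its members; keep it if empty.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

# Toy data: fraction of queries sent to one engine over four weeks.
users = [
    [1.0, 0.0, 0.0, 0.0],  # leaves the engine early
    [1.0, 1.0, 1.0, 0.0],  # leaves the engine late
    [1.0, 0.1, 0.0, 0.0],  # early
    [1.0, 1.0, 0.9, 0.1],  # late
]
assignment, prototypes = kmeans(users, k=2)
print(assignment)
print(prototypes)
```

Each centroid is an average user, which is why clustering yields prototypes (early switcher, late switcher) rather than a single reusable "switch" building block.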
Option 2: PCA a.k.a. Eigen Analysis
corresponds to one user
• Seeks an orthogonal basis that’s aligned with directions of maximal variation
• Basis vectors are hard to interpret because they will have negative values
Option 3: Non-negative matrix factorization
corresponds to one user
• Seeks basis with non-negative entries (easier to interpret)
• The basis can be considered as parts / building blocks
• Numerically harder problem
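One common way to compute such a factorization is the multiplicative-update rule of Lee and Seung, sketched below in plain Python. The talk does not say which NMF solver was used, so treat this as an illustrative stand-in. The toy matrix, the rank `k`, the iteration count, and the small `eps` guard are all assumptions.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def nmf(x, k, iterations=500, seed=0, eps=1e-9):
    """Factorize non-negative X (n x m) into W (n x k) and H (k x m)."""
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    w = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iterations):
        # H <- H * (W^T X) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        # W <- W * (X H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return w, h

def frobenius_error(x, w, h):
    """Frobenius norm of X - W H."""
    approx = matmul(w, h)
    return sum((a - b) ** 2 for ra, rb in zip(x, approx) for a, b in zip(ra, rb)) ** 0.5

# Toy data: 4 users, 2 weeks x 3 engines, built from two usage patterns.
x_toy = [
    [1.0, 0.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 1.0, 0.0],
    [0.5, 0.5, 0.0, 0.5, 0.5, 0.0],
    [0.8, 0.2, 0.0, 0.8, 0.2, 0.0],
]
w, h = nmf(x_toy, k=2)
print("reconstruction error:", frobenius_error(x_toy, w, h))
for basis in h:
    print([round(v, 2) for v in basis])
```

The rows of `H` play the role of basis behaviors, and each row of `W` says how much of each basis a given user exhibits. Because the updates only multiply non-negative numbers, both factors stay non-negative throughout.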
Key Trends in Long-Term Search Engine Usage
No Switch
Persistent Switch
Oscillating
What are key differentiating factors across the three groups?
Users in the oscillating group issue a significantly higher number of queries than the others
Oscillating == Skilled, aware of multiple search engines
Users in the oscillating group are hardest to please!
Low user satisfaction == Hard queries, more demanding in terms of required information
Users that make the persistent switch issue the shortest (possibly simpler) queries.
Shorter / simpler queries == Non-expert population, less familiar with search engines
Prediction Goal
[Figure: timeline of Time (weeks into study) from Week 0 to Week 26; candidate labels shown along the timeline: Oscillating? Persistent Switch? No Switch?]
Feature Extraction

| Feature | Description |
| --- | --- |
| fractionEngine | Fraction of queries issued to search engine |
| queryCountEngine | Number of queries issued to search engine |
| avgEngineQueryLength | Average length (in words) of queries to search engine |
| fractionEngineSAT | Fraction of search engine queries that are satisfied |
| fractionNavEngine | Fraction of search engine queries defined as navigational |
| fractionNavEngineSAT | Fraction of queries in fractionNavEngine that are satisfied |

Compute stats: max, min, mean, etc. for observed weeks
[Figure: feature vector F1, F2, F3, F4, ..., FK]
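The "compute stats for observed weeks" step might look like the sketch below, which collapses the weeks seen so far into max, min, and mean values per weekly feature. The function name and the dict-per-week input are hypothetical. The output keys follow the "min fractionEngine A" naming style that appears later in the talk. Only the three statistics named on the slide are computed, and the "etc." is left open.

```python
from statistics import mean

def summarize_observed_weeks(weekly_features, weeks_observed):
    """Collapse the first `weeks_observed` weeks into max/min/mean features.

    `weekly_features` is a list (one entry per week) of dicts such as
    {"fractionEngine A": 0.9, "fractionEngineSAT A": 0.6, ...}.
    """
    observed = weekly_features[:weeks_observed]
    if not observed:
        raise ValueError("need at least one observed week")
    summary = {}
    for name in sorted(observed[0]):
        values = [week[name] for week in observed]
        summary[f"max {name}"] = max(values)
        summary[f"min {name}"] = min(values)
        summary[f"mean {name}"] = mean(values)
    return summary

# Made-up user who drifts from engine A to engine B.
weeks = [
    {"fractionEngine A": 1.0, "fractionEngine B": 0.0},
    {"fractionEngine A": 0.5, "fractionEngine B": 0.5},
    {"fractionEngine A": 0.0, "fractionEngine B": 1.0},
]
f = summarize_observed_weeks(weeks, weeks_observed=2)
print(f)
```

Varying `weeks_observed` gives one fixed-length feature vector per prefix of the study, which is what allows the same classifier to be evaluated after 1, 2, ..., 26 weeks.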
Experimental Protocol
• Dataset
– 500 users from each class (1,500 total)
– 50-50 train-test split
– Results averaged over 10 random train-test splits
• Classifier
– Gaussian process regression
– Linear kernel
– Classify users as the number of observed weeks is varied
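The protocol above can be sketched end to end in plain Python. This is an illustration under several assumptions, not the authors' code: the data are synthetic (three Gaussian blobs, 20 users per class instead of 500, to keep the pure-Python linear solve fast), classification is done one-vs-rest by regressing +1 / -1 targets and taking the largest posterior mean, and the noise value 0.1 is arbitrary. Only the posterior mean of Gaussian process regression with the linear kernel k(u, v) = u · v is computed.

```python
import random

def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gp_fit(train_x, targets, noise=0.1):
    """Return alpha = (K + noise * I)^-1 y for the linear kernel k(u, v) = u . v."""
    k = [[dot(u, v) + (noise if i == j else 0.0) for j, v in enumerate(train_x)]
         for i, u in enumerate(train_x)]
    return solve(k, targets)

def gp_mean(train_x, alpha, x):
    """Posterior mean at x: sum_i alpha_i * k(x_i, x)."""
    return sum(a * dot(u, x) for a, u in zip(alpha, train_x))

def evaluate(data, labels, classes, splits=10, seed=0):
    """Mean test accuracy over `splits` random 50-50 train-test splits."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(splits):
        idx = list(range(len(data)))
        rng.shuffle(idx)
        half = len(idx) // 2
        train, test = idx[:half], idx[half:]
        train_x = [data[i] for i in train]
        # One-vs-rest: regress +1 / -1 targets, one model per class.
        alphas = {c: gp_fit(train_x, [1.0 if labels[i] == c else -1.0 for i in train])
                  for c in classes}
        correct = sum(
            max(classes, key=lambda c: gp_mean(train_x, alphas[c], data[i])) == labels[i]
            for i in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)

# Synthetic stand-in for the real feature vectors (values are made up).
rng = random.Random(1)
centers = {"No Switch": (3.0, 0.0), "Persistent Switch": (0.0, 3.0), "Oscillating": (-3.0, -3.0)}
data, labels = [], []
for label, (cx, cy) in centers.items():
    for _ in range(20):
        # The trailing 1.0 acts as a bias term for the linear kernel.
        data.append([rng.gauss(cx, 0.5), rng.gauss(cy, 0.5), 1.0])
        labels.append(label)
accuracy = evaluate(data, labels, list(centers))
print(f"mean accuracy over 10 splits: {accuracy:.2f}")
```

To mirror the last protocol step, `evaluate` would be called once per number of observed weeks, each time with feature vectors built from only those weeks.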
Can We Predict Search Engine Usage?
[Figure: Predicting User Trend. Classification Accuracy (y-axis, 20 to 100) against Weeks (x-axis, 1 to 26) for Gaussian Process Regression (Linear Kernel); plotted series: Predictions and Marginals]
Most Informative Features
$y = w^{T} x$

| No Switch vs. Rest | Persistent Switch vs. Rest | Oscillate vs. Rest |
| --- | --- | --- |
| isOneEngineDominant | min fractionEngine A | min fractionEngine C |
| min fractionEngine A | min fractionEngine C | isOneEngineDominant |
| ObservedPersistSwitch | min fractionEngine B | ObservedPersistSwitch |
| max fractionEngine A | max fractionEngine A | min fractionEngineSAT C |
| min fractionEngine B | max fractionEngine C | mean fractionEngineSAT A |
| mean fractionEngineSAT A | isOneEngineDominant | min fractionEngine B |
| mean fractionEngine A | max queryCountEngine C < 50 | mean fractionEngineSAT B |
| min fractionNavEngine A | min fractionEngineSAT C | mean fractionEngineSAT C |
| mean fractionNavEngine A | mean fractionNavEngine A | max queryCountEngine B < 50 |
| max fractionEngine C | ObservedPersistSwitch | min fractionEngineSAT B |
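With a linear model of the form y = wᵀx, one plausible way to arrive at a list like this is to rank features by the magnitude of their weights. The slide does not state the ranking criterion, so the sketch below is an assumption, and the weights in it are made-up numbers rather than results from the talk.

```python
def most_informative(weights, names, top=3):
    """Rank features of a linear model y = w^T x by absolute weight."""
    ranked = sorted(zip(names, weights), key=lambda p: abs(p[1]), reverse=True)
    return ranked[:top]

names = [
    "isOneEngineDominant",
    "min fractionEngine A",
    "mean fractionEngineSAT A",
    "max fractionEngine C",
]
weights = [1.8, -1.2, 0.4, -0.1]  # made-up values, not results from the talk
top = most_informative(weights, names, top=2)
print(top)
```

Comparing weights this way is only meaningful when the features are on comparable scales, for example after normalizing each feature.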
Conclusion and Future Work
• Discovered 3 key trends in long-term search engine usage
– No Switch, Persistent Switch, Oscillating
• Possible to predict usage behaviors
– Extract features about user satisfaction, past usage behavior
• In future:
– Additional data / features (e.g. demographics?)
– Can we dissuade users from making a persistent switch from our engine (if we detect it in advance)?
Questions?
{ryenw, akapoor, sdumais}@microsoft.com