![Page 1: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/1.jpg)
Fan Guo (Carnegie Mellon University) and Chao Liu (Microsoft Research-Redmond)
![Page 2: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/2.jpg)
Search Results for “CIKM”
04/22/23 CIKM'09 Tutorial, Hong Kong, China
Figure annotation: number of clicks received.
![Page 3: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/3.jpg)
Adapt ranking to user clicks?
Figure annotation: number of clicks received.
![Page 4: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/4.jpg)
Tools needed for non-trivial cases
Figure annotation: number of clicks received.
![Page 5: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/5.jpg)
One of the most extensive (yet indirect) surveys of user experience.
- For researchers:
  - Help understand human interaction with IR results.
  - Design and calibrate novel models and hypotheses.
- For practitioners:
  - Measure, monitor, and improve search engine performance.
  - Attract more page views and clicks, and boost profit.
![Page 6: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/6.jpg)
Introduce problems and applications in web search click modeling.
Present the latest developments of click models in web search.
Provide examples and discuss trade-offs for model design, implementation and evaluation.
![Page 7: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/7.jpg)
Ph.D. student (expected 2011), Computer Science Department, Carnegie Mellon University
- Advisor: Christos Faloutsos
- Dissertation topic: graph mining for large bioinformatics image databases
- 2008, M.S., CMU
- 2005, B.E., Tsinghua University, Beijing, China
![Page 8: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/8.jpg)
Researcher, Internet Services Research Center (ISRC), MSR-Redmond.
Research focus: large-scale search/browsing log analysis for effective Web information access.
- 2007, Ph.D., UIUC; 2005, M.S., UIUC
- Advisor: Jiawei Han
- Dissertation on statistical debugging and automated failure analysis
- 2003, B.S., Peking University, China
![Page 9: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/9.jpg)
- Introduction
- Designing click models
- Bayesian click models
- Selected topics on click models
- Conclusion
![Page 10: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/10.jpg)
- Introduction
  - Web search click logs
  - Interpret clicks as relevance feedback
  - Building statistical models for clicks
  - Applications of click models
- Designing click models
- Bayesian click models
- Selected topics on click models
- Conclusion
![Page 11: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/11.jpg)
- Click-through
- Browser action
- Dwelling time
- Explicit judgment
- Other page elements
![Page 12: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/12.jpg)
Auto-generated data that keeps important information about search activity.

Query: cikm; Session ID: f851c5af178384d12f3d

| Position | URL | Click |
|---|---|---|
| 1 | cikm2008.org | 1 |
| 2 | www.cikm.org | 0 |
| 3 | www.cikm.org/2002 | 0 |
| 4 | www.fc.ul.pt/cikm2007 | 0 |
| 5 | www.comp.polyu.edu.hk/conference/cikm2009 | 1 |
| 6 | cikmconference.org | 0 |
| 7 | Ir.iit.edu/cikm2004 | 0 |
| 8 | www.informatik.uni-trier.de/~ley/db/conf/cikm/index.html | 0 |
| 9 | www.tzi.de/CIKM2005 | 0 |
| 10 | www.cikm.com | 0 |
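
A minimal sketch of how one row of such a log might be held in memory. The `SessionRecord` type and its field names are hypothetical, not the log schema used in the tutorial; real logs carry many more fields. It also shows how the clicked positions, and the last clicked position used by several models, can be read off a session.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SessionRecord:
    """One query session (hypothetical layout for illustration only)."""
    session_id: str
    query: str
    urls: List[str]    # urls[i] is shown at position i + 1
    clicks: List[int]  # clicks[i] == 1 if the result at position i + 1 was clicked

    def clicked_positions(self) -> List[int]:
        """1-based positions of all clicked results, top to bottom."""
        return [i + 1 for i, c in enumerate(self.clicks) if c]

    def last_clicked_position(self) -> Optional[int]:
        """1-based position of the last click, or None if nothing was clicked."""
        positions = self.clicked_positions()
        return positions[-1] if positions else None
```

For the session in the table, the clicked positions are 1 and 5, so the last clicked position is 5.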
![Page 13: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/13.jpg)
A real-world example
![Page 14: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/14.jpg)
How large is the click log? Search logs: 10+ TB/day.
In existing publications:
- [Craswell+08]: 108k sessions
- [Dupret+08]: 4.5M sessions (21 subsets × 216k sessions)
- [Guo+09a]: 8.8M sessions from 110k unique queries
- [Guo+09b]: 8.8M sessions from 110k unique queries
- [Chapelle+09]: 58M sessions from 682k unique queries
- [Liu+09a]: 0.26 PB of data from 103M unique queries
![Page 15: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/15.jpg)
How large is one ?
![Page 16: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/16.jpg)
![Page 17: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/17.jpg)
Clicks are good… but are these two clicks equally “good”?
Non-clicks may have excuses:
- Not relevant
- Not examined
![Page 18: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/18.jpg)
![Page 19: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/19.jpg)
Higher positions receive more user attention (eye fixations) and clicks than lower positions.
This holds even in the extreme setting where the order of the results is reversed.
“Clicks are informative but biased” [Joachims+07].
(Figure [Joachims+07]: percentage by position, under the normal ranking and under the reversed impression.)
![Page 20: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/20.jpg)
“Clicked > Skipped Above” [Joachims02]
Preference pairs: #5 > #2, #5 > #3, #5 > #4.
Use a Rank SVM to optimize the retrieval function.
Limitations:
- Confidence of the judgments
- Little implication for user modeling

(Figure: a ranked list of results at positions 1 to 8.)
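
The “Clicked > Skipped Above” rule can be sketched as below. The function name and the 0/1 click-vector input are illustrative assumptions. Each clicked result is preferred over every unclicked result ranked above it, so a click vector with clicks at positions 1 and 5 (consistent with the preference pairs listed above) yields exactly #5 > #2, #5 > #3, #5 > #4.

```python
from typing import List, Tuple


def skip_above_pairs(clicks: List[int]) -> List[Tuple[int, int]]:
    """Return (preferred, less_preferred) pairs of 1-based positions.

    A clicked result is preferred over every unclicked result ranked above it.
    """
    pairs = []
    for i, clicked in enumerate(clicks):
        if not clicked:
            continue
        for j in range(i):
            if not clicks[j]:  # result above position i + 1 that was skipped
                pairs.append((i + 1, j + 1))
    return pairs


print(skip_above_pairs([1, 0, 0, 0, 1, 0, 0, 0]))  # [(5, 2), (5, 3), (5, 4)]
```

Note that the rule says nothing about unclicked results below the last click, which is one reason the pairs carry little information for user modeling.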
![Page 21: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/21.jpg)
![Page 22: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/22.jpg)
Given a set of web search click logs:
- Predict clicks: output the probability of each click vector given a new order of URLs.

With the top 10 results there are $2^{10}$ possible click vectors!
![Page 23: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/23.jpg)
Given a set of web search click logs:
- Estimate relevance: measure how good a URL is with regard to the information need of the query/user.

Relevance score = 0.5
![Page 24: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/24.jpg)
Relevance score: the probability of a click if the document appears at the top position. A relevance score of 0.5 means that, on average, the document is clicked once every 2 sessions in which it is shown at the top.
Bayesian click models characterize relevance with a probability distribution.
(Figure: density function over the relevance score.)
![Page 25: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/25.jpg)
- Effective: aware of the position bias and addresses it properly
- Scalable: linear time and space complexity; easy to parallelize
- Incremental: easy to update the model with new data
![Page 26: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/26.jpg)
![Page 27: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/27.jpg)
- Optimizing the retrieval function
  - Ranking alteration based on clicks [Liu+09b]

(Figure annotated with the values 0.90, 0.10, 0.08, 0.05, 0.20, 0.72.)
![Page 28: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/28.jpg)
- Optimizing the retrieval function
  - Ranking alteration based on clicks
  - As a feature for a learning-to-rank system (e.g., RankNet [Burges+05])
![Page 29: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/29.jpg)
- Online advertising
  - User model for sponsored search auctions
![Page 30: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/30.jpg)
- Online advertising
  - User model for sponsored search auctions
  - Click-through rate (CTR) prediction [Zhu+10]
![Page 31: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/31.jpg)
- Search engine evaluation
  - Pskip [Wang+09]: click-through rate above the last click; dwelling-time features could also be incorporated.
![Page 32: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/32.jpg)
- Search engine evaluation
  - Pskip [Wang+09]: click-through rate above the last click
  - Search relevance score [Guo+09c]: average relevance score weighted by the chance of examination
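
One plausible reading of “average relevance score weighted by the chance of examination”, sketched under the assumption that the weights are the per-position examination probabilities $P(E_i=1)$ produced by some click model and that they are normalized to sum to one. [Guo+09c] may define the normalization differently; the function name is hypothetical.

```python
from typing import Sequence


def examination_weighted_relevance(exam_probs: Sequence[float],
                                   relevances: Sequence[float]) -> float:
    """Average of per-position relevance, weighted by P(E_i = 1).

    Assumption: the weights are normalized by their total.
    """
    if len(exam_probs) != len(relevances):
        raise ValueError("one examination probability per position is required")
    total = sum(exam_probs)
    if total == 0:
        raise ValueError("at least one position must have nonzero examination probability")
    return sum(e * r for e, r in zip(exam_probs, relevances)) / total
```

For example, examination probabilities (1.0, 0.5) and relevances (0.8, 0.2) give (0.8 + 0.1) / 1.5 = 0.6: the top result dominates because it is examined twice as often.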
![Page 33: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/33.jpg)
- User behavior analysis
  - A preliminary study showing different user behavior patterns for navigational and informational queries [Guo+09c]
![Page 34: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/34.jpg)
- Introduction
- Designing click models
  - Basic user hypotheses
  - Modeling the first click
  - Extending to multiple clicks
  - Summary of model design
- Bayesian click models
- Selected topics on click models
- Conclusion
![Page 35: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/35.jpg)
A document must be examined before a click.
The (conditional) probability of click upon examination depends on document relevance.
![Page 36: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/36.jpg)
The click probability can be decomposed into:
- a global component: the examination probability, which reflects the position bias;
- a local component: the click probability upon examination, which depends only on the (query, URL) pair.

This decomposition is the building block of every existing model!
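
Written with the notation of the following slides ($E_i$ and $C_i$ are the binary examination and click events at position $i$, and $r_{u_i}$ is the relevance of the URL $u_i$ shown there), the decomposition follows from the examination hypothesis in one step, because a click without examination has probability zero:

```latex
P(C_i = 1)
  = P(E_i = 1)\,P(C_i = 1 \mid E_i = 1)
    + P(E_i = 0)\,\underbrace{P(C_i = 1 \mid E_i = 0)}_{=\,0}
  = \underbrace{P(E_i = 1)}_{\text{global: position bias}}
    \cdot \underbrace{P(C_i = 1 \mid E_i = 1)}_{\text{local: } r_{u_i}}
```

The models that follow differ mainly in how they specify the global factor $P(E_i = 1)$.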
![Page 37: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/37.jpg)
- The first document is always examined.
- First-order Markov property: examination at position (i+1) depends only on the examination and click at position i.
- Examination follows a strict linear order, from position i to position (i+1).
![Page 38: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/38.jpg)
![Page 39: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/39.jpg)
Limitation: under this hypothesis the examination/click rate decreases monotonically with rank, which is not always true in real data.
Some models do not follow this hypothesis (e.g., UBM).
(Figure panels: web search data in [Guo+09a]; ads click data in [Zhu+10].)
![Page 40: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/40.jpg)
![Page 41: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/41.jpg)
Put together the two hypotheses: Cascade Model [Craswell+08] = examination hypothesis + cascade hypothesis, modeling a single click.
Formal model specification, where $r_{u_i}$ is the relevance of the URL $u_i$ at position $i$:
$$P(C_i=1 \mid E_i=0) = 0, \qquad P(C_i=1 \mid E_i=1) = r_{u_i}$$
$$P(E_1=1) = 1, \qquad P(E_{i+1}=1 \mid E_i=0) = 0$$
$$P(E_{i+1}=1 \mid E_i=1, C_i=0) = 1$$
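
A minimal sketch of what this specification implies for each position (the function name and the list-of-floats input are illustrative assumptions). Since the user keeps going after every skip and the model covers only a single click, position $i$ is clicked exactly when it is reached and clicked, so $P(C_i=1) = r_{u_i}\prod_{j<i}(1-r_{u_j})$.

```python
from typing import List, Sequence


def cascade_click_probs(relevance: Sequence[float]) -> List[float]:
    """P(C_i = 1) for every position under the cascade model.

    relevance[i] plays the role of r_{u_i} for the URL at position i + 1.
    """
    probs = []
    p_reach = 1.0  # P(E_1 = 1) = 1: the first document is always examined
    for r in relevance:
        probs.append(p_reach * r)  # click requires examination
        p_reach *= 1.0 - r         # examination continues only after a skip
    return probs


print(cascade_click_probs([0.5, 0.5, 1.0]))  # [0.5, 0.25, 0.25]
```

With relevance 0.5 at the first two positions, the second URL is clicked with probability 0.5 × 0.5 = 0.25, even though its own relevance equals that of the first URL: this is the position bias the model is meant to explain.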
![Page 42: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/42.jpg)
The user behavior chart (i indexes the URL at position i):
Examine the URL → Click? Yes → Done. No → See next URL? Yes → i ← i+1, examine the next URL.
![Page 43: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/43.jpg)
First click in the Click Chain Model [Guo+09b] as well as the Dynamic Bayesian Network model [Chapelle+09]:
Examine the URL → Click? Yes → Done. No → See next URL? Yes → examine the next URL; No → Done.
The added “No” branch is the chance that the user immediately abandons examination without a click.
![Page 44: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/44.jpg)
First click in the User Browsing Model [Dupret+08]:
Examine the URL → Click? Yes → Done. No → See next URL? Yes → i ← i+1, examine the next URL; No → Done.
The “see next URL” probabilities are position-dependent parameters.
![Page 45: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/45.jpg)
![Page 46: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/46.jpg)
Generalize the cascade model to one or more clicks:
$$P(C_i=1 \mid E_i=0) = 0, \qquad P(C_i=1 \mid E_i=1) = r_{u_i}$$
$$P(E_1=1) = 1, \qquad P(E_{i+1}=1 \mid E_i=0) = 0$$
$$P(E_{i+1}=1 \mid E_i=1, C_i=0) = 1$$
$$P(E_{i+1}=1 \mid E_i=1, C_i=1) = \lambda_i$$
$\lambda_i$: global parameters characterizing user browsing behavior.
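
Under these equations every position above the last click is examined, so the probability of a whole click vector has a closed form. The sketch below (function name and argument layout are assumptions) computes it: a clicked position before the last click contributes $r\lambda$, a skipped one $(1-r)$, and at the last click $l$ the user either stops, $(1-\lambda_l)$, or continues and skips everything below, $\lambda_l\prod_{j>l}(1-r_{u_j})$.

```python
from typing import Sequence


def dcm_click_vector_prob(clicks: Sequence[int],
                          relevance: Sequence[float],
                          lam: Sequence[float]) -> float:
    """Exact probability of one click vector under the DCM equations.

    relevance[i] ~ r_{u_i} and lam[i] ~ lambda_i for 0-based position i.
    """
    n = len(clicks)
    if not (len(relevance) == len(lam) == n):
        raise ValueError("clicks, relevance and lam must have equal length")
    clicked = [i for i, c in enumerate(clicks) if c]
    if not clicked:
        # No click ever stops the user: every position is examined and skipped.
        p = 1.0
        for r in relevance:
            p *= 1.0 - r
        return p
    last = clicked[-1]
    p = 1.0
    for i in range(last):  # positions above the last click are all examined
        p *= relevance[i] * lam[i] if clicks[i] else 1.0 - relevance[i]
    p *= relevance[last]
    skip_rest = 1.0
    for r in relevance[last + 1:]:
        skip_rest *= 1.0 - r
    # Either stop after the last click, or continue and skip every URL below it.
    p *= (1.0 - lam[last]) + lam[last] * skip_rest
    return p
```

Summed over all $2^n$ click vectors the result is 1, which is a handy sanity check; for the top 10 positions this is the distribution over $2^{10}$ click vectors a click model is asked to predict.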
![Page 47: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/47.jpg)
Generalize the cascade model to 1+ clicks:
![Page 48: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/48.jpg)
DCM algorithms:
- Input: for each query session, the query term and the (URL, clicked) tuple for each of the top-10 positions.
- Output: the relevance of each (query, URL) pair; global parameters for user behavior.
- Method: approximate* maximum-likelihood estimation.

*Footnote: the algorithm maximizes a lower bound of the log-likelihood function.
![Page 49: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/49.jpg)
Query: cikm; Session ID: f851c5af178384d12f3d; last clicked position: 5

| Position | URL | Click |
|---|---|---|
| 1 | cikm2008.org | 1 |
| 2 | www.cikm.org | 0 |
| 3 | www.cikm.org/2002 | 0 |
| 4 | www.fc.ul.pt/cikm2007 | 0 |
| 5 | www.comp.polyu.edu.hk/... | 1 |
| 6 | cikmconference.org | 0 |
| 7 | Ir.iit.edu/cikm2004 | 0 |
| 8 | www.informatik.uni-trier.de... | 0 |
| 9 | www.tzi.de/CIKM2005 | 0 |
| 10 | www.cikm.com | 0 |
![Page 50: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/50.jpg)
Query: cikm; Session ID: ab8dee4c4dd21e6aaf03; last clicked position: 9

| Position | URL | Click |
|---|---|---|
| 1 | cikm2008.org | 0 |
| 2 | www.cikm.org | 1 |
| 3 | www.cikm.org/2002 | 0 |
| 4 | www.fc.ul.pt/cikm2007 | 0 |
| 5 | cikmconference.org | 0 |
| 6 | www.comp.polyu.edu.hk/... | 1 |
| 7 | Ir.iit.edu/cikm2004 | 0 |
| 8 | www.informatik.uni-trier.de... | 0 |
| 9 | www.tzi.de/CIKM2005 | 1 |
| 10 | www.cikm.com | 0 |
![Page 51: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/51.jpg)
The estimation formula for relevance:
empirical CTR measured before last clicked position
The estimation formula for global (user behavior) parameters:
empirical probability of “clicked-but-not-last”
![Page 52: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/52.jpg)
Keep 3 counts for each (query, URL) pair.
Then:
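
A count-based sketch of the two estimation formulas stated above. The exact three per-pair counts of the original slide are not recoverable here, so the counter names below are hypothetical. It also assumes that the last clicked position itself counts as “before the last click” and that a session without clicks contributes all of its positions; both are readings, not quotes from the tutorial.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Session = Tuple[str, Sequence[str], Sequence[int]]  # (query, urls, clicks)


def dcm_estimate(sessions: List[Session]):
    """Count-based estimates in the spirit of the DCM formulas.

    relevance[(query, url)]: clicks / impressions, counting only impressions
    at or above the last clicked position (all positions if no click).
    lam[i] (0-based position): clicks at i that were not the last click,
    divided by all clicks at i.
    """
    clicks_qu: Dict[Tuple[str, str], int] = defaultdict(int)
    shown_qu: Dict[Tuple[str, str], int] = defaultdict(int)
    not_last_i: Dict[int, int] = defaultdict(int)
    clicks_i: Dict[int, int] = defaultdict(int)

    for query, urls, clicks in sessions:
        clicked = [i for i, c in enumerate(clicks) if c]
        last = clicked[-1] if clicked else len(urls) - 1
        for i in range(last + 1):
            shown_qu[(query, urls[i])] += 1
            if clicks[i]:
                clicks_qu[(query, urls[i])] += 1
                clicks_i[i] += 1
                if i != last:
                    not_last_i[i] += 1  # "clicked-but-not-last"

    relevance = {k: clicks_qu[k] / n for k, n in shown_qu.items()}
    lam = {i: not_last_i[i] / n for i, n in clicks_i.items()}
    return relevance, lam
```

Applied to the two `cikm` sessions above, this gives cikm2008.org a relevance of 0.5 (clicked in one of the two sessions where it sat at or above the last click) and cikmconference.org a relevance of 0, while www.cikm.com, below the last click in both sessions, gets no estimate at all. A single pass over the sessions suffices, which is why DCM is the fastest of the models compared later.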
![Page 53: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/53.jpg)
The examine-next probability depends on the relevance of the URL clicked:
- Low relevance: “Not what I want, go on to examine the next one.”
- High relevance: “Aha, this is the right one, and I’m done!”
![Page 54: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/54.jpg)
The examine-next probability depends on the relevance of the clicked URL:
$$P(E_{i+1}=1 \mid E_i=1, C_i=1) = \alpha_2 (1 - r_{u_i}) + \alpha_3 r_{u_i}$$
$$P(E_{i+1}=1 \mid E_i=1, C_i=0) = \alpha_1$$
where $0 < \alpha_1 \le 1$ and $0 \le \alpha_3 < \alpha_2 \le 1$.
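
Because a user may now stop after a skip, examination below a skipped URL is no longer certain, and the probability of a click vector has to sum over both possibilities. A minimal forward-pass sketch, assuming the cascade hypothesis ($P(E_1=1)=1$, nothing below an unexamined URL is examined) and, as a simplification, a point value for each $r_{u_i}$ (the full CCM works with a distribution over relevance). The function name is hypothetical.

```python
from typing import Sequence


def ccm_click_vector_prob(clicks: Sequence[int], relevance: Sequence[float],
                          a1: float, a2: float, a3: float) -> float:
    """Probability of a click vector under the CCM examine-next equations.

    Tracks two joint probabilities while scanning top-down: the observed
    clicks so far together with "still examining" (p_exam) or with
    "already stopped" (p_stop).
    """
    p_exam, p_stop = 1.0, 0.0  # the first URL is always examined
    for c, r in zip(clicks, relevance):
        if c:
            joint = p_exam * r  # a click requires examination
            cont = a2 * (1.0 - r) + a3 * r
            p_exam, p_stop = joint * cont, joint * (1.0 - cont)
        else:
            skip = p_exam * (1.0 - r)  # examined but not clicked
            # a stopped user never clicks, so p_stop carries over unchanged
            p_exam, p_stop = skip * a1, p_stop + skip * (1.0 - a1)
    return p_exam + p_stop
```

The constraint $\alpha_3 < \alpha_2$ is what encodes the intuition above: after clicking a highly relevant URL the user is less likely to continue.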
![Page 55: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/55.jpg)
The full picture:
![Page 56: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/56.jpg)
There is a subtle difference between the relevance of the URL snippet and that of the landing page.
On the snippet: “hmmm…, this looks pretty nice.” On the landing page: “errr…, it’s way out of date.”
Conclusion: attractive, but not satisfactory.
![Page 57: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/57.jpg)
The examine-next probability depends on the “satisfaction score”:
$$P(E_{i+1}=1 \mid E_i=1, C_i=1) = \gamma (1 - s_{u_i}) + 0 \cdot s_{u_i}$$
$$P(E_{i+1}=1 \mid E_i=1, C_i=0) = \gamma$$
where $0 < \gamma \le 1$. The click probability is associated with the “attractiveness score”:
$$P(C_i=1 \mid E_i=1) = a_{u_i}$$
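
Since these transitions depend only on whether position $i$ was examined and clicked, the marginal examination probability has a simple recursion: given $E_i=1$, the user continues with probability $a\gamma(1-s) + (1-a)\gamma = \gamma(1 - a_{u_i}s_{u_i})$. A minimal sketch, assuming $P(E_1=1)=1$ and the cascade hypothesis for unexamined positions; the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def dbn_marginals(attract: Sequence[float], satisfy: Sequence[float],
                  gamma: float) -> Tuple[List[float], List[float]]:
    """Per-position P(E_i = 1) and P(C_i = 1) under the DBN equations.

    attract[i] ~ a_{u_i}, satisfy[i] ~ s_{u_i} for 0-based position i.
    """
    exam, click = [], []
    p_e = 1.0  # P(E_1 = 1)
    for a, s in zip(attract, satisfy):
        exam.append(p_e)
        click.append(p_e * a)           # P(C_i = 1) = P(E_i = 1) * a_{u_i}
        p_e *= gamma * (1.0 - a * s)    # continue after a click or a skip
    return exam, click
```

With $\gamma = 1$ and a first URL that is attractive half the time but fully satisfying when clicked, half of the users never reach the second URL: `dbn_marginals([0.5, 0.5], [1.0, 1.0], 1.0)` returns `([1.0, 0.5], [0.5, 0.25])`.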
![Page 58: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/58.jpg)
The full picture:
![Page 59: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/59.jpg)
The examine-next probability depends on both the preceding clicked position r and the distance d to that position.
Position 1: r = 0, d = 1.

| Position | URL | Click |
|---|---|---|
| 1 | cikm2008.org | 0 |
| 2 | www.cikm.org | 1 |
| 3 | www.cikm.org/2002 | 0 |
| 4 | www.fc.ul.pt/cikm2007 | 0 |
| 5 | cikmconference.org | 0 |
| 6 | www.comp.polyu.edu.hk/... | 1 |
| … | … | … |
![Page 60: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/60.jpg)
Position 2: r = 0, d = 2.
![Page 61: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/61.jpg)
Position 3: r = 2, d = 1.
![Page 62: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/62.jpg)
Position 4: r = 2, d = 2.
![Page 63: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/63.jpg)
Position 5: r = 2, d = 3.
![Page 64: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/64.jpg)
Users lose patience when they browse through results without issuing a click: the examine-next probability decreases monotonically as d increases while r stays the same.
![Page 65: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/65.jpg)
The examine-next probability depends on both the preceding clicked position r and the distance d to that position:
$$P(E_i=1 \mid C_{1:i-1}) = \beta_{r_i, d_i}, \qquad r_i = \max\{j \mid j < i,\ C_j = 1\}, \quad d_i = i - r_i$$
with $r_i = 0$ when there is no click above position $i$.
55 parameters are needed for the top-10 positions ($0 \le r < r + d \le 10$).
The cascade hypothesis is not assumed.
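
A small sketch (function names are hypothetical) that computes the $(r_i, d_i)$ index used to look up $\beta$ at each position, and enumerates the parameter grid to confirm the count of 55. For the click vector in the table above (clicks at positions 2 and 6) it reproduces the r and d values walked through position by position.

```python
from typing import List, Sequence, Tuple


def ubm_rd(clicks: Sequence[int]) -> List[Tuple[int, int]]:
    """(r_i, d_i) for every 1-based position i.

    r_i is the position of the last click strictly above i (0 if none),
    d_i = i - r_i; UBM examines position i with probability beta[(r_i, d_i)].
    """
    pairs = []
    last_click = 0
    for i, clicked in enumerate(clicks, start=1):
        pairs.append((last_click, i - last_click))
        if clicked:
            last_click = i  # only positions below i see this click
    return pairs


def ubm_param_keys(n: int = 10) -> List[Tuple[int, int]]:
    """All (r, d) index pairs with 0 <= r < r + d <= n: one beta per pair."""
    return [(r, d) for r in range(n) for d in range(1, n - r + 1)]


print(ubm_rd([0, 1, 0, 0, 0, 1]))  # [(0, 1), (0, 2), (2, 1), (2, 2), (2, 3), (2, 4)]
print(len(ubm_param_keys()))       # 55
```

The count is 10 + 9 + … + 1 = 55: for each previous click position r there are 10 - r possible distances.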
![Page 66: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/66.jpg)
The full picture:
![Page 67: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/67.jpg)
![Page 68: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/68.jpg)
Probability of examining the first URL:

| Model | $P(E_1=1)$ |
|---|---|
| Cascade | 1 |
| DCM | 1 |
| CCM | 1* |
| DBN | 1* |
| UBM | $\beta_{0,1}$ |

*Footnote: another parameter can easily be added to specify this probability.
![Page 69: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/69.jpg)
Probability of a click upon examination:

| Model | $P(C_i=1 \mid E_i=1)$ |
|---|---|
| Cascade | $r_{u_i}$ |
| DCM | $r_{u_i}$ |
| CCM | $r_{u_i}$* |
| DBN | $a_{u_i}$ |
| UBM | $r_{u_i}$ |

*Footnote: the mean of the relevance distribution, detailed in the next part.
![Page 70: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/70.jpg)
Probability of examining the next URL without a click:

| Model | $P(E_{i+1}=1 \mid E_i=1, C_i=0)$ |
|---|---|
| Cascade | 1 |
| DCM | 1 |
| CCM | $\alpha_1$ |
| DBN | $\gamma$ |
| UBM | $\beta_{r_{i+1}, d_{i+1}}$* |

*Footnote: the probability does not depend on $E_i$.
![Page 71: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/71.jpg)
Probability of examining the next URL after a click:

| Model | $P(E_{i+1}=1 \mid E_i=1, C_i=1)$ |
|---|---|
| Cascade | n/a |
| DCM | $\lambda_i$ |
| CCM | $\alpha_2(1 - r_{u_i}) + \alpha_3 r_{u_i}$ |
| DBN | $\gamma(1 - s_{u_i})$ |
| UBM | $\beta_{i,1}$ |
![Page 72: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/72.jpg)
![Page 73: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/73.jpg)
Size of parameter sets:

| Model | Number of global parameters |
|---|---|
| Cascade | 0 |
| DCM | 9 |
| CCM | 3 |
| DBN | 1 |
| UBM | 55 |
![Page 74: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/74.jpg)
Inference and estimation algorithms:

| Model | Single-pass | Details |
|---|---|---|
| DCM | Yes | Maximizes a lower bound of the log-likelihood (LL); fastest |
| CCM | Yes | No iteration needed, thanks to the Bayesian framework |
| DBN | No | EM-based, iterative algorithm |
| UBM | No | EM-based; usually takes ~30 iterations to converge |
![Page 75: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/75.jpg)
![Page 76: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/76.jpg)
- Introduction
- Designing click models
- Bayesian click models
  - Bayesian framework and the rationale
  - Bayesian Browsing Model: a case study
  - Click Chain Model in a nutshell
- Selected topics on click models
- Conclusion
![Page 77: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/77.jpg)
(Figure: coin toss. Frequentist: a single point estimate, p(H) = 0.8. Bayesian: a prior and a posterior density over p(H) on [0, 1], i.e., a “probability” of p(H).)
![Page 78: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/78.jpg)
Prior → posterior.
Density function (not normalized): $x \to x^2 \to x^3 \to x^3(1-x) \to x^4(1-x)$
![Page 79: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/79.jpg)
Prior → posterior.
Density function (not normalized): $x^1(1-x)^0 \to x^2(1-x)^0 \to x^3(1-x)^0 \to x^3(1-x)^1 \to x^4(1-x)^1$
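
Assuming a uniform prior and reading the sequence as the tosses H, H, H, T, H, each head multiplies the unnormalized density by $x$ and each tail by $(1-x)$, so the posterior is a Beta distribution, $\text{Beta}(h+1, t+1)$, and only the two exponents need to be stored. A minimal sketch (the function name is ours):

```python
from fractions import Fraction
from typing import Iterable, Tuple


def coin_posterior(tosses: Iterable[str]) -> Tuple[int, int, Fraction]:
    """Exponents of x**heads * (1 - x)**tails and the posterior mean of p(H).

    Assumes a uniform prior on [0, 1], so the posterior is
    Beta(heads + 1, tails + 1) with mean (heads + 1) / (heads + tails + 2).
    """
    heads = tails = 0
    for t in tosses:
        if t == "H":
            heads += 1
        elif t == "T":
            tails += 1
        else:
            raise ValueError(f"unexpected toss {t!r}")
    return heads, tails, Fraction(heads + 1, heads + tails + 2)


print(coin_posterior("HHHTH"))  # (4, 1, Fraction(5, 7))
```

For the same five tosses the frequentist (maximum-likelihood) estimate is 4/5 = 0.8, while the posterior mean is 5/7 ≈ 0.714; the posterior additionally tells us how uncertain that value still is after only five tosses.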
![Page 80: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/80.jpg)
The graphical model for coin-toss:
(Figure: a node X connected to the observed tosses C1, C2, C3, C4, C5.)
![Page 81: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/81.jpg)
![Page 82: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/82.jpg)
From the prior onward, with several linear factors: density function (not normalized), one line per update:
$$x^{1}(1-x)^{0}(1-0.6x)^{0}(1+0.3x)^{1}(1-0.5x)^{0}(1-0.2x)^{0}\cdots$$
$$x^{1}(1-x)^{1}(1-0.6x)^{0}(1+0.3x)^{1}(1-0.5x)^{0}(1-0.2x)^{0}\cdots$$
$$x^{2}(1-x)^{1}(1-0.6x)^{0}(1+0.3x)^{2}(1-0.5x)^{0}(1-0.2x)^{0}\cdots$$
$$x^{3}(1-x)^{1}(1-0.6x)^{1}(1+0.3x)^{2}(1-0.5x)^{0}(1-0.2x)^{0}\cdots$$
$$x^{3}(1-x)^{1}(1-0.6x)^{1}(1+0.3x)^{2}(1-0.5x)^{1}(1-0.2x)^{0}\cdots$$
![Page 83: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/83.jpg)
Representation of relevance: a probability distribution on [0, 1] for each (query, URL) pair.
The density function is a product of powers of a small set of linear factors, i.e., a polynomial.
The coefficients of these linear factors are shared across different (query, URL) pairs.
Example (not normalized):
$$x^{3}(1-x)^{1}(1-0.6x)^{1}(1+0.3x)^{2}(1-0.5x)^{1}(1-0.2x)^{0}\cdots$$
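Below is a minimal sketch of this representation. Each (query, URL) pair stores only an exponent vector over shared factors $a_k + b_k x$. The normalization constant, mean and variance come from numerical integration. The coefficients 0.6, 0.3, 0.5 and 0.2 are the illustrative values from the example, and the elided factors ("…") are omitted. The midpoint rule and the default of 100 bins are assumptions of this sketch, not a prescribed method.

```python
import math

# Linear factors a + b*x, in the order of the example above:
# x, (1 - x), (1 - 0.6x), (1 + 0.3x), (1 - 0.5x), (1 - 0.2x)
FACTORS = [(0.0, 1.0), (1.0, -1.0), (1.0, -0.6), (1.0, 0.3), (1.0, -0.5), (1.0, -0.2)]

def log_density(x, exponents, factors=FACTORS):
    """Unnormalized log-density: sum over k of e_k * log(a_k + b_k * x)."""
    return sum(e * math.log(a + b * x) for (a, b), e in zip(factors, exponents) if e)

def posterior_moments(exponents, bins=100, factors=FACTORS):
    """Posterior mean and variance of relevance via a midpoint rule with `bins` bins."""
    xs = [(k + 0.5) / bins for k in range(bins)]      # midpoints avoid log(0) at x = 0 or 1
    logs = [log_density(x, exponents, factors) for x in xs]
    top = max(logs)                                   # shift before exp() for numerical stability
    weights = [math.exp(v - top) for v in logs]
    z = sum(weights)                                  # normalization constant (up to the shift)
    mean = sum(w * x for w, x in zip(weights, xs)) / z
    var = sum(w * (x - mean) ** 2 for w, x in zip(weights, xs)) / z
    return mean, var

# Exponent vector of the example density above.
mean, var = posterior_moments([3, 1, 1, 2, 1, 0])
print(f"mean={mean:.3f}, var={var:.4f}")
```

With the exponent vector `[4, 1, 0, 0, 0, 0]` this reduces to the coin-toss posterior $x^4(1-x)$, and the mean comes out close to 5/7.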
![Page 84: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/84.jpg)
Inference: go over each query session once and update the exponents for every (query, URL) pair impressed.*
Analytical or numerical integration may be needed to compute the normalization constant.
*By virtue of Bayes' theorem and the conditional independence relationships (assumptions) of the model.
![Page 85: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/85.jpg)
Key problems: Which factor is the right one to update? How should all the coefficients be estimated?
![Page 86: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/86.jpg)
Modeling benefits:
- Confidence for each URL relevance estimate.
- Relative judgments: the probability that URL i is more relevant to the query than URL j.
- Easy to interpret: the coefficients in the linear factors reflect position bias and user browsing patterns.

Computational benefits:
- Single-pass, linear-time algorithms; no iterations.
- A parallel version is easy to implement.
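The "relative judgments" benefit can be sketched on a grid. The sketch assumes the two relevance posteriors are independent and discretizes each into midpoint bins on [0, 1]. P(R_i > R_j) is then the mass of R_i weighted by the CDF of R_j, with ties inside a bin split evenly. The example densities are hypothetical.

```python
def bin_masses(density, bins):
    """Normalized probability mass of each midpoint bin of [0, 1] for an unnormalized density."""
    raw = [density((k + 0.5) / bins) for k in range(bins)]
    z = sum(raw)
    return [w / z for w in raw]

def prob_more_relevant(density_i, density_j, bins=1000):
    """Approximate P(R_i > R_j), assuming the two posteriors are independent."""
    wi = bin_masses(density_i, bins)
    wj = bin_masses(density_j, bins)
    below = 0.0        # mass of R_j in bins strictly below bin k
    total = 0.0
    for k in range(bins):
        total += wi[k] * (below + 0.5 * wj[k])   # ties within a bin are split evenly
        below += wj[k]
    return total

# Hypothetical posteriors: x^4 (1 - x) for URL i and x (1 - x)^4 for URL j.
p = prob_more_relevant(lambda x: x**4 * (1 - x), lambda x: x * (1 - x)**4)
print(round(p, 3))
```

A point estimate cannot answer this question. Here it follows directly from the stored posteriors, at a cost linear in the number of bins.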
![Page 87: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/87.jpg)
Introduction
Designing click models
Bayesian click models: Bayesian framework and the rationale; Bayesian Browsing Model: a case study; Click Chain Model in a nutshell
Selected topics on click models
Conclusion
![Page 88: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/88.jpg)
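For a specific query session, let $S_i$ denote the relevance of the URL at position $i$, $E_i$ whether position $i$ is examined, and $C_i$ whether it is clicked, where $1 \le i \le M = 10$.
[Figure: graphical model with nodes $S_1, \dots, S_M$, $E_1, \dots, E_M$ and $C_1, \dots, C_M$.]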
![Page 89: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/89.jpg)
[Figure: the same graphical model, with the $S_i$ layer labeled Relevance, the $E_i$ layer Examination and the $C_i$ layer Click.]
![Page 90: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/90.jpg)
Compute the posterior distribution using the conditional independence relationships induced by the graphical model. The unnormalized posterior for URL j involves two kinds of counts:
- how many times URL j was clicked;
- how many times URL j was not clicked when it was at position r + d with the preceding click at position r.
Details
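A posterior built from these two kinds of counts can be written as below. This is a sketch, not the slide's own formula. It assumes a uniform prior on $R_j$ and borrows the examination parameters $\beta_{r,d}$ that appear with the parameter counts later in this section. A click contributes a factor $\beta_{r,d} R_j \propto R_j$, and a non-click contributes $1 - \beta_{r,d} R_j$:

$$
p(R_j \mid \text{click log}) \;\propto\; R_j^{\,N_j} \prod_{r=0}^{M-1} \prod_{d=1}^{M-r} \left(1 - \beta_{r,d}\, R_j\right)^{N_{j,r,d}}
$$

Here $N_j$ counts the clicks on URL j, and $N_{j,r,d}$ counts its non-clicks at position $r + d$ with the preceding click at position $r$ ($r = 0$ means no earlier click). There are $\sum_{r=0}^{M-1}(M-r) = M(M+1)/2$ factors of the form $(1 - \beta_{r,d} R_j)$, plus the $R_j$ term. That gives $M(M+1)/2 + 1$ exponents per URL, which is 56 for $M = 10$.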
![Page 91: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/91.jpg)
Worked example: only the top M = 3 positions are shown, with 3 query sessions and 4 distinct URLs.
[Figure: query sessions 1 to 3 (rows) over positions 1 to 3 (columns), showing which URL was impressed at each position and which results were clicked.]
![Page 92: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/92.jpg)
Initialize M(M+1)/2 + 1 counts for each URL (here 3 × 4 / 2 + 1 = 7):

| URL | Clicks | r=0, d=1 | r=0, d=2 | r=0, d=3 | r=1, d=1 | r=1, d=2 | r=2, d=1 |
|---|---|---|---|---|---|---|---|
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
![Page 93: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/93.jpg)
Update the counts for URL 4: if it was not impressed, do nothing; if it was clicked, increment "Clicks" by 1; otherwise, locate the right r and d and increment that count.

| URL | Clicks | r=0, d=1 | r=0, d=2 | r=0, d=3 | r=1, d=1 | r=1, d=2 | r=2, d=1 |
|---|---|---|---|---|---|---|---|
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
![Page 94: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/94.jpg)
Updated counts for URL 4:

| URL | Clicks | r=0, d=1 | r=0, d=2 | r=0, d=3 | r=1, d=1 | r=1, d=2 | r=2, d=1 |
|---|---|---|---|---|---|---|---|
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
![Page 95: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/95.jpg)
Updated counts for URL 4:

| URL | Clicks | r=0, d=1 | r=0, d=2 | r=0, d=3 | r=1, d=1 | r=1, d=2 | r=2, d=1 |
|---|---|---|---|---|---|---|---|
| 4 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
![Page 96: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/96.jpg)
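The posterior for URL 4: with one click, and one non-click at position 3 following a click at position 2, $p(R_4 \mid \text{data}) \propto R_4\,(1 - \beta_{2,1} R_4)$, assuming a uniform prior.
Interpretation: the larger the probability of examination, the stronger the penalty for a non-click.

| URL | Clicks | r=0, d=1 | r=0, d=2 | r=0, d=3 | r=1, d=1 | r=1, d=2 | r=2, d=1 |
|---|---|---|---|---|---|---|---|
| 4 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |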
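The per-URL update rule walked through above can be sketched as follows. The column order matches the tables: index 0 is "Clicks", followed by (r, d) in the order (0,1), (0,2), (0,3), (1,1), (1,2), (2,1). The three sessions are hypothetical. They were chosen so that the final vector matches the last URL 4 table, and they are not necessarily the sessions in the slide's figure.

```python
def factor_index(r, d, m):
    """Position of the (r, d) non-click count in a URL's count vector.

    Index 0 holds "Clicks"; the (r, d) columns follow in the order
    (0,1), ..., (0,m), (1,1), ..., (1,m-1), ..., (m-1,1).
    """
    return 1 + sum(m - k for k in range(r)) + (d - 1)

def update_url_counts(counts, session, url):
    """Apply one query session to the count vector of `url` (in place).

    session: list of (url, clicked) pairs for positions 1..m.
    """
    m = len(session)
    last_click = 0                               # r = 0: no click above this position yet
    for pos, (u, clicked) in enumerate(session, start=1):
        if u == url:
            if clicked:
                counts[0] += 1                   # clicked: increment "Clicks"
            else:
                counts[factor_index(last_click, pos - last_click, m)] += 1
        if clicked:
            last_click = pos
    return counts                                # URL not impressed: nothing changes

# Hypothetical sessions, listed as (url, clicked) for positions 1..3.
sessions = [
    [(1, True), (2, False), (3, False)],   # URL 4 not impressed
    [(2, False), (1, True), (4, False)],   # URL 4 skipped at position 3 after a click at 2
    [(4, True), (3, False), (1, False)],   # URL 4 clicked
]
m = 3
counts = [0] * (m * (m + 1) // 2 + 1)
for s in sessions:
    update_url_counts(counts, s, url=4)
print(counts)  # [1, 0, 0, 0, 0, 0, 1]
```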
![Page 97: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/97.jpg)
Keep 2 counts for each parameter (one for clicks, the other for non-clicks):

| Parameter | Click | Non-click |
|---|---|---|
| $\beta_{0,1}$ | 0 | 0 |
| $\beta_{0,2}$ | 0 | 0 |
| $\beta_{0,3}$ | 0 | 0 |
| $\beta_{1,1}$ | 0 | 0 |
| $\beta_{1,2}$ | 0 | 0 |
| $\beta_{2,1}$ | 0 | 0 |
![Page 98: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/98.jpg)
For each position in a query session, locate the right r and d and increment the corresponding count.

| Parameter | Click | Non-click |
|---|---|---|
| $\beta_{0,1}$ | 1 | 0 |
| $\beta_{0,2}$ | 0 | 0 |
| $\beta_{0,3}$ | 0 | 0 |
| $\beta_{1,1}$ | 0 | 1 |
| $\beta_{1,2}$ | 0 | 1 |
| $\beta_{2,1}$ | 0 | 0 |
![Page 99: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/99.jpg)
Counts after the next query session:

| Parameter | Click | Non-click |
|---|---|---|
| $\beta_{0,1}$ | 1 | 1 |
| $\beta_{0,2}$ | 1 | 0 |
| $\beta_{0,3}$ | 0 | 0 |
| $\beta_{1,1}$ | 0 | 1 |
| $\beta_{1,2}$ | 0 | 1 |
| $\beta_{2,1}$ | 0 | 1 |
![Page 100: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/100.jpg)
Counts after the next query session:

| Parameter | Click | Non-click |
|---|---|---|
| $\beta_{0,1}$ | 1 | 2 |
| $\beta_{0,2}$ | 1 | 0 |
| $\beta_{0,3}$ | 0 | 0 |
| $\beta_{1,1}$ | 1 | 1 |
| $\beta_{1,2}$ | 0 | 1 |
| $\beta_{2,1}$ | 1 | 1 |
![Page 101: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/101.jpg)
Maximum-likelihood estimate:

| Parameter | Click | Non-click |
|---|---|---|
| $\beta_{0,1}$ | 1 | 2 |
| $\beta_{0,2}$ | 1 | 0 |
| $\beta_{0,3}$ | 0 | 0 |
| $\beta_{1,1}$ | 1 | 1 |
| $\beta_{1,2}$ | 0 | 1 |
| $\beta_{2,1}$ | 1 | 1 |
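The parameter-count update can be sketched as follows. Each position is keyed by (r, d), where r is the position of the preceding click (0 if none) and d is the distance from it. Either the click or the non-click count is then incremented. The two click vectors are an inference: they are read back from the first two updated tables. The estimation formula itself is not reproduced here.

```python
from collections import defaultdict

def update_param_counts(counts, clicks):
    """Add one session's click vector to counts[(r, d)] = [clicks, non_clicks].

    Position i has r = position of the preceding click (0 if none) and d = i - r.
    """
    last_click = 0
    for pos, clicked in enumerate(clicks, start=1):
        key = (last_click, pos - last_click)
        counts[key][0 if clicked else 1] += 1
        if clicked:
            last_click = pos
    return counts

counts = defaultdict(lambda: [0, 0])
update_param_counts(counts, [True, False, False])    # reproduces the first updated table
update_param_counts(counts, [False, True, False])    # reproduces the second updated table
for (r, d), (c, n) in sorted(counts.items()):
    print(f"beta_{r},{d}: click={c} non-click={n}")
```

Only one pair of integers is stored per parameter. The same loop therefore serves a single pass over the whole log or an incremental pass over new sessions.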
![Page 102: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/102.jpg)
Initializing and updating the counts:
- Time: linear in the size of the click log.
- Space: almost constant storage required.
Details
![Page 103: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/103.jpg)
Initializing and updating the counts: Time: linear in the size of the click log; Space: almost constant.
Computing relevance scores using numerical integration with B bins: Time: … Space: …
Details
![Page 104: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/104.jpg)
- Step 1: initialize the counting statistics;
- Step 2: scan through the click log once and update the counts for both inference and estimation;
- Step 3: compute the parameter values;
- Step 4: use numerical integration to obtain relevance scores.

Step 2 also applies to (linear) incremental computation!
![Page 105: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/105.jpg)
Introduction
Designing click models
Bayesian click models: Bayesian framework and the rationale; Bayesian Browsing Model: a case study; Click Chain Model in a nutshell
Selected topics on click models
Conclusion
![Page 106: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/106.jpg)
The user behavior model:
![Page 107: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/107.jpg)
Graphical model:
[Figure: graphical model with three layers, Relevance ($S_1, \dots, S_M$), Examination ($E_1, \dots, E_M$) and Click ($C_1, \dots, C_M$).]
![Page 108: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/108.jpg)
Details
![Page 109: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/109.jpg)
| | CCM | UBM |
|---|---|---|
| Number of user behavior parameters | 3 | 55 |
| Number of distinct factors for a (query, URL) pair | 22 | 56 |
| Number of counts needed for parameters | 5 | 110 |
![Page 110: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/110.jpg)
Introduction
Designing click models
Bayesian click models
Selected topics on click models: Scaling click models for petabyte-scale data; Click model evaluation; Tailoring click models to user goals
Conclusion
![Page 111: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/111.jpg)
Data collected over 8 weeks; job k includes the data from week 1 through week k.
Both the time and the space costs are prohibitive for a single node.
![Page 112: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/112.jpg)
A simple task: counting the number of impressions for each (query, URL) pair.
![Page 113: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/113.jpg)
[Figure: dataflow on four machines (Machine #1 to #4); on each machine, Extent → GetPairs → Map → Sort → Count, and the Count results are combined into the final Output.]
![Page 114: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/114.jpg)
[Figure: the same four-machine dataflow (Extent → GetPairs → Map → Sort → Count → Output).]
"Map" puts all identical pairs onto one machine, which allows grouping by various fields in subsequent processes.
![Page 115: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/115.jpg)
Map = Bucket: the intermediate key is the (query, URL) pair.
![Page 116: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/116.jpg)
[Figure: the same four-machine dataflow (Extent → GetPairs → Map → Sort → Count → Output).]
"Count" performs the standard increment-by-1 for each distinct pair.
"Count" REDUCES the amount of data, since each pair has only one output value.
![Page 117: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/117.jpg)
Reduce = Count: it accepts a list of (key, value) tuples and outputs the final result for each distinct key.
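The impression-counting job can be simulated in a single process to show the roles of Map (bucket by the (query, URL) key) and Reduce (count per key). This is a toy sketch under its own assumptions. It does not use the SCOPE API, the `shuffle` step stands in for the framework's grouping across machines, and the log is hypothetical.

```python
from collections import defaultdict
from itertools import chain

def map_session(session):
    """Map (bucket): emit ((query, url), 1) for each impressed result of one session."""
    query, urls = session
    for url in urls:
        yield (query, url), 1

def shuffle(pairs):
    """Group values by intermediate key; a MapReduce system does this across machines."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_count(key, values):
    """Reduce (count): one output value per distinct (query, URL) pair."""
    return key, sum(values)

def count_impressions(sessions):
    mapped = chain.from_iterable(map_session(s) for s in sessions)
    return dict(reduce_count(k, vs) for k, vs in shuffle(mapped).items())

# Hypothetical three-session log: (query, impressed URLs).
log = [("cikm", ["u1", "u2"]), ("cikm", ["u2", "u3"]), ("kdd", ["u1"])]
print(count_impressions(log))
```

Each key's values can be reduced independently of every other key, which is what allows the Count stage to run on many machines in parallel.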
![Page 118: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/118.jpg)
[Figure: the same four-machine dataflow, annotated with its MAP and REDUCE phases.]
![Page 119: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/119.jpg)
[Figure: example of the values emitted for a query session; clicks map to factor index 0.]
![Page 120: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/120.jpg)
- Map: scan the click log. Intermediate key: (query, URL); value: the index of the linear factor (0 to 55 for the top-10 positions).
- Reduce: scan the list of (key, value) pairs. The key indicates which exponent vector to update; the value indicates the index of the element in the exponent vector to increment.
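The same pattern can build exponent vectors, sketched here in one process for M = 3. Map emits the factor index, with 0 for a click and an (r, d) slot otherwise. Reduce increments one element of the vector selected by the key. Because exponent vectors from separate batches simply add, the hypothetical `merge` helper also illustrates incremental computation. The sessions and helper names are illustrative assumptions.

```python
from collections import defaultdict

def factor_index(r, d, m):
    """0 is reserved for clicks; (r, d) non-click factors follow in row-major order."""
    return 1 + sum(m - k for k in range(r)) + (d - 1)

def map_session(query, urls, clicks):
    """Map: emit ((query, url), factor index) for every impressed position."""
    m = len(urls)
    last_click = 0
    for pos, (url, clicked) in enumerate(zip(urls, clicks), start=1):
        idx = 0 if clicked else factor_index(last_click, pos - last_click, m)
        yield (query, url), idx
        if clicked:
            last_click = pos

def reduce_exponents(pairs, m):
    """Reduce: the key selects the exponent vector, the value the element to increment."""
    vectors = defaultdict(lambda: [0] * (m * (m + 1) // 2 + 1))
    for key, idx in pairs:
        vectors[key][idx] += 1
    return dict(vectors)

def merge(a, b):
    """Exponent vectors from separate batches add element-wise (incremental updates)."""
    out = {k: list(v) for k, v in a.items()}
    for k, v in b.items():
        out[k] = [x + y for x, y in zip(out.get(k, [0] * len(v)), v)]
    return out

# Two hypothetical sessions for query "q", processed as separate batches.
batch1 = reduce_exponents(map_session("q", ["a", "b", "c"], [True, False, False]), m=3)
batch2 = reduce_exponents(map_session("q", ["b", "a", "c"], [False, True, False]), m=3)
both = merge(batch1, batch2)
print(both[("q", "b")])  # [0, 1, 0, 0, 1, 0, 0]
```

With M = 10, the largest index is `factor_index(9, 1, 10) == 55`, which matches the 0 to 55 range above.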
![Page 121: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/121.jpg)
[Figure: single-machine computation load vs. elapsed time on SCOPE across jobs; the computation load increases linearly while the elapsed time stays nearly constant.]
- 3 hours
- 265 TB of log data
- 1.15 billion (query, URL) pairs
![Page 122: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/122.jpg)
Introduction
Designing click models
Bayesian click models
Selected topics on click models: Scaling click models for petabyte-scale data; Click model evaluation; Tailoring click models to user goals
Conclusion
![Page 123: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/123.jpg)
[Figure: impression data and click data.]
![Page 124: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/124.jpg)
[Figure: impression data and click data (M = 10 positions) are used to obtain relevance scores and global parameters.]
![Page 125: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/125.jpg)
[Figure: for a new impression vector from an existing query, the learned relevance and global parameters give the predicted examination and the predicted clicks.]
![Page 126: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/126.jpg)
Data were collected from a commercial search engine after query-term normalization and spam removal.
For each query term, query sessions are split evenly into training and test sets according to their timestamps.
The most frequent and the least frequent query terms are removed.
![Page 127: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/127.jpg)
Most popular metrics:
- Average test-data log-likelihood (LL): the log-probability of correctly predicting the entire click vector (2^10 possibilities) [Guo+09a, Guo+09b, Liu+09a, Zhu+10].
- Perplexity of prediction at each position: 2 raised to the average cross-entropy (in bits) of the click/no-click binary prediction, with each position treated independently [Dupret+08, Guo+09a, Guo+09b, Zhu+10].
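Both metrics can be sketched from a model's predicted click probabilities. This sketch makes three assumptions. First, the LL function receives per-position click probabilities already conditioned on the clicks above them, so the chain rule makes the log terms add up to log P(click vector). Second, the perplexity function receives marginal click probabilities. Third, the log base is natural: the random-guess figure of −3.01 quoted below matches a base-10 logarithm, so the base is a choice here.

```python
import math

def average_log_likelihood(sessions):
    """Mean of log P(click vector) over test sessions (natural log).

    sessions: list of (cond_probs, clicks), where cond_probs[i] is the model's
    P(C_i = 1 | clicks above position i), so the chain rule lets the terms add up.
    """
    total = 0.0
    for probs, clicks in sessions:
        total += sum(math.log(p if c else 1.0 - p) for p, c in zip(probs, clicks))
    return total / len(sessions)

def perplexity_by_position(sessions):
    """Per-position perplexity: 2 ** (average cross-entropy in bits).

    sessions: list of (marginal_probs, clicks), where marginal_probs[i] is P(C_i = 1);
    each position is scored independently of the others.
    """
    n, m = len(sessions), len(sessions[0][1])
    result = []
    for i in range(m):
        bits = sum(-math.log2(p[i] if c[i] else 1.0 - p[i]) for p, c in sessions)
        result.append(2 ** (bits / n))
    return result

# Random guessing (probability 0.5 everywhere) on two hypothetical 10-position sessions.
clicks = [[1, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 1, 1, 0, 0, 0, 0, 0, 0, 0]]
guess = [([0.5] * 10, c) for c in clicks]
print(average_log_likelihood(guess), perplexity_by_position(guess)[0])
```

Random guessing gives a perplexity of exactly 2 at every position, and a perfect predictor gives 1 (with LL 0). These are the bounds quoted with the results below.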
![Page 128: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/128.jpg)
Other metrics:
- Click-through rate (CTR) prediction, especially CTR@1 [Chapelle+09, Zhu+10].
- Predicting the first/last clicked positions [Guo+09a, Guo+09b].
- Position-bias sanity check: plot the click-rate curve for the top-10 positions vs. the ground truth [Guo+09a, Guo+09b].
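The ground-truth side of the last two checks can be sketched directly from observed click vectors. The helpers compute the empirical click rate at each position, which is the curve a model's predictions are plotted against, and the first and last clicked positions of one session. The function names and sample data are illustrative.

```python
def click_rate_curve(click_vectors):
    """Empirical click rate at each position (the ground-truth position-bias curve)."""
    n = len(click_vectors)
    m = len(click_vectors[0])
    return [sum(cv[i] for cv in click_vectors) / n for i in range(m)]

def first_last_click(click_vector):
    """1-based first and last clicked positions, or None if the session has no click."""
    pos = [i for i, c in enumerate(click_vector, start=1) if c]
    return (pos[0], pos[-1]) if pos else None

# Four hypothetical 3-position sessions.
log = [[1, 0, 0], [1, 1, 0], [0, 0, 1], [1, 0, 0]]
print(click_rate_curve(log))            # [0.75, 0.25, 0.25]
print(first_last_click([0, 1, 0, 1]))   # (2, 4)
```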
![Page 129: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/129.jpg)
Average log-likelihood. Random guess: $\log(2^{-10}) = -3.01$; optimal value: 0.

| Model | CCM | UBM | DCM |
|---|---|---|---|
| LL | -1.171 | -1.264 | -1.302 |
| Improvement ratio of CCM | | 9.7% | 14% |
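The improvement ratios in this table agree with the relative gain in per-session likelihood, exp(LL_CCM − LL_other) − 1, computed with natural exponentials on the LL values shown. This formula is inferred from the numbers and is not stated on the slide.

```python
import math

def ll_improvement(ll_a, ll_b):
    """Relative gain in (geometric-mean) per-session likelihood of model A over model B."""
    return math.exp(ll_a - ll_b) - 1.0

for other, ll in [("UBM", -1.264), ("DCM", -1.302)]:
    print(other, f"{ll_improvement(-1.171, ll):.1%}")
```

This prints 9.7% for UBM and 14.0% for DCM, matching the table.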
![Page 130: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/130.jpg)
[Figure: model comparison, with the Better and Worse directions marked.]
![Page 131: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/131.jpg)
[Figure: model comparison, with the Better and Worse directions marked.]
![Page 132: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/132.jpg)
Average perplexity over the top 10 positions. Random guess: 2; optimal value: 1.

| Model | CCM | UBM | DCM |
|---|---|---|---|
| Perplexity | 1.1479 | 1.1577 | 1.1590 |
| Improvement ratio of CCM | | 7.5% | 8.3% |
![Page 133: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/133.jpg)
[Figure: model comparison, with the Worse and Better directions marked.]
![Page 134: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/134.jpg)
![Page 135: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/135.jpg)
For 1M query sessions, the estimated time in seconds:

| DCM | CCM* | BBM* | UBM** |
|---|---|---|---|
| 80 | 150 | 165 | 5,000 |

\* Time for CCM and BBM includes computing the posterior mean and variance using numerical integration with 100 bins.
\*\* UBM converges in 34 iterations.
![Page 136: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/136.jpg)
Introduction
Designing click models
Bayesian click models
Selected topics on click models: Scaling click models for petabyte-scale data; Click model evaluation; Tailoring click models to user goals
Conclusion
![Page 137: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/137.jpg)
Queries can be categorized into 2 sets:
- Navigational: to find the link to an existing website, e.g., bing;
- Informational: more exploration, and multiple clicks may arise, e.g., iron man.
![Page 138: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/138.jpg)
Different user goals result in different browsing and click patterns.
The straightforward mixture-modeling approach is not practical [Dupret+08].
Solution:
- Classify query terms a priori based on user goals.
- Fit and learn 2 sets of model parameters, one for navigational and one for informational queries.
![Page 139: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/139.jpg)
Two-way classification of query terms based on click data, using, e.g.:
- the median position of the click distribution;
- the mean position of the click distribution;
- the average number of clicks per query session;
- …

Pick the one that gives the best click prediction. Rule: if a single position receives 50% of the clicks, the query is navigational; otherwise, it is informational.
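The 50% rule can be sketched as a small classifier over a query's pooled click positions. Two details are assumptions of this sketch: the threshold is treated as "at least 50%", and a query with no clicks is labeled informational.

```python
from collections import Counter

def classify_query(click_positions, threshold=0.5):
    """Two-way user-goal label for one query term.

    click_positions: clicked positions pooled over all sessions of the query.
    """
    if not click_positions:
        return "informational"      # assumption: with no clicks, no position dominates
    top_count = Counter(click_positions).most_common(1)[0][1]
    share = top_count / len(click_positions)
    return "navigational" if share >= threshold else "informational"

print(classify_query([1, 1, 1, 2]))        # navigational: one position gets 75% of clicks
print(classify_query([1, 2, 3, 1, 4, 5]))  # informational: the top position gets 1/3
```

After classification, each query term is routed to the parameter set learned for its group.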
![Page 140: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/140.jpg)
Improvement of click prediction for DCM: log-likelihood 4.0%, perplexity 1.3%.
Examination/click position bias:
[Figure: examination and click position bias.]
![Page 141: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/141.jpg)
Introduction
Designing click models
Bayesian click models
Selected topics on click models
Conclusion
![Page 142: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/142.jpg)
Click models:
- A statistical tool to leverage valuable implicit user feedback in terabyte/petabyte-scale search logs.
- Provide click prediction as well as relevance estimates.
- Application domains include learning to rank, measuring search performance, online advertising, user behavior analysis, …
![Page 143: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/143.jpg)
Click models:
- Different model designs reflect different assumptions about user behavior to explain position bias.
- The modeling choice may depend on the application scenario.
![Page 144: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/144.jpg)
Click models:
- Efficient, single-pass, parallelizable algorithms are desired in real-world applications.
- The Bayesian framework can be applied to click models for both modeling and computational benefits.
- The Click Chain Model and the Bayesian Browsing Model represent state-of-the-art examples.
![Page 145: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/145.jpg)
- Bigger context: query reformulations, personalization.
- Richer inputs: universal search, diverse user feedback.
- Click models vs. human judgments.
![Page 146: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/146.jpg)
[Burges+05]: C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, and G. Hullender. Learning to rank using gradient descent. ICML’05.
[Chapelle+09]: O. Chapelle and Y. Zhang. A dynamic Bayesian network click model for web search ranking. WWW’09.
[Craswell+08]: N. Craswell, O. Zoeter, M. Taylor, and B. Ramsey. An experimental comparison of click position-bias models. WSDM’08.
[Dean+04]: J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. OSDI’04.
[Dupret+08]: G. Dupret and B. Piwowarski. A user browsing model to predict search engine click data from past observations. SIGIR’08.
![Page 147: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/147.jpg)
[Guo+09a]: F. Guo, C. Liu, and Y.-M. Wang. Efficient multiple-click models in web search. WSDM’09.
[Guo+09b]: F. Guo, C. Liu, A. Kannan, T. Minka, M. Taylor, Y.-M. Wang, and C. Faloutsos. Click chain model in web search. WWW’09.
[Guo+09c]: F. Guo, L. Li, and C. Faloutsos. Tailoring click models to user goals. WSCD’09.
[Joachims02]: T. Joachims. Optimizing search engines using clickthrough data. KDD’02.
[Joachims+07]: T. Joachims, L. Granka, B. Pan, H. Hembrooke, F. Radlinski, and G. Gay. Accurately interpreting clickthrough data as implicit feedback, ACM TOIS, 25(2), 2007.
![Page 148: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/148.jpg)
[Lee+05]: U. Lee, Z. Liu, and J. Cho. Automatic identification of user goals in web search. WWW’05.
[Liu+09a]: C. Liu, F. Guo, and C. Faloutsos. BBM: Deriving click models from petabyte-scale data. KDD’09.
[Liu+09b]: C. Liu, M. Li, and Y.-M. Wang. Post-rank reordering: resolving preference misalignments between search engines and end users. CIKM’09.
[Richardson+07]: M. Richardson, E. Dominowska, and R. Ragno. Predicting clicks: estimating the click-through rate for new ads. WWW’07.
[Zhu+10]: Z. Zhu, W. Chen, T. Minka, C. Zhu and Z. Chen. A novel click model and its applications to online advertising. To appear in WSDM’10.
![Page 149: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/149.jpg)
- Anitha Kannan, MSR Search Lab
- Tom Minka, MSR Cambridge
- Christos Faloutsos, Carnegie Mellon University
- Li-Wei He, MSR ISRC-Redmond
- Nick Craswell, MSR Cambridge
![Page 150: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/150.jpg)
- Yi-Min Wang, MSR ISRC-Redmond
- Mike Taylor, MSR Cambridge
- Ethan Tu, MSR ISRC-Redmond
![Page 151: Statistical Models for Web Search Click Log Analysis](https://reader035.vdocument.in/reader035/viewer/2022081514/568157c3550346895dc54731/html5/thumbnails/151.jpg)