Adapting Rankers Online
Maarten de Rijke
Adapting Rankers Online
Joint work with Katja Hofmann and Shimon Whiteson
2
Adapting Rankers Online 3
Growing complexity of search engines
Current methods for optimizing rankers mostly work offline
Adapting Rankers Online 4
Online learning to rank
No distinction between training and operating: the search engine observes users’ natural interactions with the search interface, infers information from them, and improves its ranking function automatically
Expensive data collection is not required; the collected data matches the target users and target setting
Adapting Rankers Online 5
Users’ natural interactions with the search interface
Oard and Kim, 2001; Kelly and Teevan, 2004

Behavior category | Minimum scope: Segment            | Minimum scope: Object                   | Minimum scope: Class
Examine           | View, Listen, Scroll, Find, Query | Select                                  | Browse
Retain            | Print                             | Bookmark, Save, Delete, Purchase, Email | Subscribe
Reference         | Copy-and-paste, Quote             | Forward, Reply, Link, Cite              |
Annotate          | Mark up                           | Rate, Publish                           | Organize
Create            | Type, Edit                        | Author                                  |

Behavior category refers to the purpose of the observed behavior; minimum scope refers to the smallest possible scope of the item being acted upon
Adapting Rankers Online
Users’ interactions
Relevance feedback: history goes back close to forty years; typically used for query expansion and user profiling
Explicit feedback: users explicitly give feedback via keywords, selecting or marking documents, or answering questions
Natural explicit feedback can be difficult to obtain; “unnatural” explicit feedback is obtained through TREC assessors and crowdsourcing
6
Adapting Rankers Online
Users’ interactions (2)
Implicit feedback for learning, query expansion, and user profiling: observe users’ natural interactions with the system
Reading time, saving, printing, bookmarking, selecting, clicking, …
Thought to be less accurate than explicit measures, but available in very large quantities at no cost
7
Adapting Rankers Online
Learning to rank online
Using online learning to rank approaches, retrieval systems can learn directly from implicit feedback while they are running
Algorithms need to explore new solutions to obtain feedback for effective learning, and to exploit what has been learned to produce results acceptable to users
Interleaved comparison methods can use implicit feedback to detect small differences between rankers and can be used to learn ranking functions online
8
Adapting Rankers Online
Agenda
Balancing exploration and exploitation
Inferring preferences from clicks
9
Adapting Rankers Online
Balancing Exploitation and Exploration
10
K. Hofmann et al. (2011), Balancing exploration and exploitation. In: ECIR ’11.
Recent work
Adapting Rankers Online
Challenges
Generalize over queries and documents
Learn from implicit feedback that is noisy, relative, and rank-biased
Keep users happy while learning
11
Adapting Rankers Online
Learning document pair-wise preferences
Insight: infer preferences from clicks
12
[Figure: example search result list for the query “Vienna” with observed clicks]
Joachims, T. (2002). Optimizing search engines using clickthrough data. In KDD '02, pages 133-142.
Adapting Rankers Online
Learning document pair-wise preferences
Input: feature vectors constructed from document pairs
Output: correct / incorrect order
Learning method: supervised learning, e.g., SVM
Input pairs and labels: $(\vec{x}(q, d_i), \vec{x}(q, d_j)) \in \mathbb{R}^n \times \mathbb{R}^n$, $y \in \{-1, +1\}$
13
Joachims, T. (2002). Optimizing search engines using clickthrough data. In KDD '02, pages 133-142.
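As a rough illustration of how such pairwise training examples could be built from clicks, here is a small Python sketch. It assumes each (query, document) pair already has a feature vector and uses the “clicked beats skipped-above” heuristic; the function and variable names are illustrative, not taken from the slides.

```python
import numpy as np

def pairwise_examples(features, ranking, clicks):
    """Turn clicks on one result list into pairwise training examples.

    features: dict doc_id -> np.ndarray, joint (query, document) feature vector
    ranking:  list of doc_ids in the order they were shown
    clicks:   set of clicked doc_ids

    Each clicked document is preferred over every non-clicked document ranked
    above it. Each preference yields a difference vector with label +1, plus
    the mirrored example with label -1.
    """
    X, y = [], []
    for rank, d_i in enumerate(ranking):
        if d_i not in clicks:
            continue
        for d_j in ranking[:rank]:          # documents shown above the click
            if d_j in clicks:
                continue
            diff = features[d_i] - features[d_j]
            X.append(diff)
            y.append(+1)
            X.append(-diff)
            y.append(-1)
    return np.array(X), np.array(y)

# A linear SVM (e.g., sklearn.svm.LinearSVC) could then be trained on (X, y);
# its weight vector scores documents via w . x(q, d).
```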
Adapting Rankers Online
Challenges
Generalize over queries and documents
Learn from implicit feedback that is noisy, relative, and rank-biased
Keep users happy while learning
14
Adapting Rankers Online
Dueling bandit gradient descent
Learns a ranking function, consisting of a weight vector for a linear weighted combination of feature vectors, from feedback about the relative quality of rankings
Outcome: weights for ranking, with documents scored as $S = \vec{w} \cdot \vec{x}(q, d)$
Approach: maintain a current “best” ranking function; on each incoming query:
generate a new candidate ranking function
compare it to the current “best”
if the candidate is better, update the “best” ranking function
15
Yue, Y. and Joachims, T. (2009). Interactively optimizing information retrieval systems as a dueling bandits problem. In ICML '09.
[Figure: weight space (axes x1, x2) showing the current best w and a candidate w]
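A minimal sketch of one Dueling Bandit Gradient Descent update, assuming a linear ranking function scored as above. The step-size names (delta, gamma) and the random unit exploration direction follow the general DBGD scheme of Yue and Joachims; the details here are illustrative rather than the exact implementation used in the experiments.

```python
import numpy as np

def dbgd_step(w, delta, gamma, interleaved_comparison):
    """One DBGD update step.

    w: current best weight vector
    delta: exploration step size (how far the candidate moves from w)
    gamma: learning step size (how far w moves when the candidate wins)
    interleaved_comparison: callable(w, w_candidate) -> True if the
        candidate's ranking wins the interleaved comparison on this query
    """
    # Sample a random unit vector as the exploration direction.
    u = np.random.randn(w.shape[0])
    u /= np.linalg.norm(u)
    w_candidate = w + delta * u
    if interleaved_comparison(w, w_candidate):
        w = w + gamma * u          # move towards the winning candidate
    return w

def rank(w, doc_features):
    """Score documents with S = w . x(q, d) and sort them by score."""
    scores = doc_features @ w
    return np.argsort(-scores)
```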
Adapting Rankers Online
Challenges
Generalize over queries and documents
Learn from implicit feedback that is noisy, relative, and rank-biased
Keep users happy while learning
16
Adapting Rankers Online
Exploration and exploitation
17
Need to learn effectively from rank-biased feedback (exploration)
Need to present high-quality results while learning (exploitation)
Previous approaches are either purely exploratory or purely exploitative
Adapting Rankers Online
Questions
Can we improve online performance by balancing exploration and exploitation?
How much exploration is needed for effective learning?
18
Adapting Rankers Online
Problem formulation
Reinforcement learning: no explicit labels; learn from feedback from the environment in response to actions (document lists)
Contextual bandit problem
19
[Figure: interaction loop between the retrieval system and the environment (user): the system tries something (presents documents) and gets feedback (clicks)]
Adapting Rankers Online
Our method
Learning based on Dueling Bandit Gradient Descent
Relative evaluations of the quality of two document lists; such comparisons are inferred from implicit feedback
Balance exploration and exploitation with k-greedy comparison of document lists
20
Adapting Rankers Online
k-greedy exploration
To compare document lists, interleave
An exploration rate k influences the relative number of documents from each list
21
[Figure: example interleaved list with exploration rate k = 0.5; the blue list wins the comparison]
Adapting Rankers Online 22
k-greedy exploration
[Figure: interleaved result lists for exploration rates k = 0.5 and k = 0.2]
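A minimal sketch of the k-greedy interleaving idea described above: the exploration rate k controls how often the exploratory list supplies the next document. The exact interleaving procedure and tie-breaking in the paper may differ; names are illustrative.

```python
import random

def k_greedy_interleave(exploit_list, explore_list, k=0.5, length=10):
    """Interleave an exploitative and an exploratory ranking.

    With probability k the next document is taken from the exploratory
    list, otherwise from the exploitative list; duplicates are skipped.
    Returns the interleaved list and, per rank, which list contributed
    the document (needed later to credit clicks).
    """
    interleaved, origin = [], []
    i = j = 0
    while len(interleaved) < length and (i < len(exploit_list) or j < len(explore_list)):
        take_explore = random.random() < k
        if take_explore and j < len(explore_list):
            source, doc = "explore", explore_list[j]
            j += 1
        elif i < len(exploit_list):
            source, doc = "exploit", exploit_list[i]
            i += 1
        else:
            source, doc = "explore", explore_list[j]
            j += 1
        if doc not in interleaved:               # skip duplicates
            interleaved.append(doc)
            origin.append(source)
    return interleaved, origin
```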
Adapting Rankers Online
Evaluation
Simulated interactions: we need to observe clicks on arbitrary result lists and measure online performance
Simulate clicks with a probabilistic click model: assume a dependent click model and define click and stop probabilities based on standard learning to rank data sets
Measure the cumulative reward of the rankings displayed to the user
23
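One way the online performance measure could be computed is as a (possibly discounted) cumulative reward over the result lists actually shown to users; the discount factor below is an illustrative assumption, not a value stated on the slides.

```python
def online_performance(ndcg_per_query, discount=0.995):
    """Cumulative reward of the result lists actually shown to users.

    ndcg_per_query: sequence of quality scores (e.g., NDCG) of the
    interleaved lists presented at each query impression.
    The discount factor is an illustrative choice: it weights early
    impressions slightly more than later ones.
    """
    return sum(score * discount ** t for t, score in enumerate(ndcg_per_query))
```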
Adapting Rankers Online
Experiments
Vary the exploration rate k
Three click models: “perfect”, “navigational”, “informational”
Evaluate on nine data sets (LETOR 3.0 and 4.0)
24
Adapting Rankers Online
“Perfect” click model
Click model (provides an upper bound):
P(c|R) = 1.0   P(c|NR) = 0.0   P(s|R) = 0.0   P(s|NR) = 0.0
25
[Figure: final performance over time (up to 1000 queries) for data set NP2003 and the perfect click model]
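A small Python sketch of how the dependent click model from these slides can be simulated. It uses the four probabilities given in the tables (click on a relevant / non-relevant document, stop after clicking a relevant / non-relevant document) and assumes binary relevance; function and parameter names are illustrative.

```python
import random

# Click/stop probabilities as given on the slides:
# P(c|R), P(c|NR), P(s|R), P(s|NR)
CLICK_MODELS = {
    "perfect":       dict(p_click_rel=1.0,  p_click_non=0.0,  p_stop_rel=0.0, p_stop_non=0.0),
    "navigational":  dict(p_click_rel=0.95, p_click_non=0.05, p_stop_rel=0.9, p_stop_non=0.2),
    "informational": dict(p_click_rel=0.9,  p_click_non=0.4,  p_stop_rel=0.5, p_stop_non=0.1),
}

def simulate_clicks(result_list, relevant_docs, model):
    """Dependent click model: scan the list top-down, click with a
    relevance-dependent probability, and after each click stop examining
    further results with a relevance-dependent probability."""
    p = CLICK_MODELS[model]
    clicks = []
    for doc in result_list:
        rel = doc in relevant_docs
        if random.random() < (p["p_click_rel"] if rel else p["p_click_non"]):
            clicks.append(doc)
            if random.random() < (p["p_stop_rel"] if rel else p["p_stop_non"]):
                break
    return clicks
```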
Adapting Rankers Online
“Perfect” online performance
Data set    k = 0.5   k = 0.4   k = 0.3   k = 0.2   k = 0.1
HP2003 119.91 125.71 129.99 130.55 128.50
HP2004 109.21 111.57 118.54 119.86 116.46
NP2003 108.74 113.61 117.44 120.46 119.06
NP2004 112.33 119.34 124.47 126.20 123.70
TD2003 82.00 84.24 88.20 89.36 86.20
TD2004 85.67 90.23 91.00 91.71 88.98
OHSUMED 128.12 130.40 131.16 133.37 131.93
MQ2007 96.02 97.48 98.54 100.28 98.32
MQ2008 90.97 92.99 94.03 95.59 95.14
Dark borders indicate significant improvements over the k = 0.5 baseline
Darker shades indicate higher performance
26
Best performance with only two exploratory documents for top-10 results
Adapting Rankers Online
“Navigational” click model
Click model (simulates realistic but reliable interaction):
P(c|R) = 0.95   P(c|NR) = 0.05   P(s|R) = 0.9   P(s|NR) = 0.2
27
[Figure: final performance over time (up to 1000 queries) for data set NP2003 and the navigational click model]
Adapting Rankers Online
“Navigational” online performance
28
Data set    k = 0.5   k = 0.4   k = 0.3   k = 0.2   k = 0.1
HP2003 102.58 109.78 118.84 116.38 117.52
HP2004 89.61 97.08 99.03 103.36 105.69
NP2003 90.32 100.94 105.03 108.15 110.12
NP2004 99.14 104.34 110.16 112.05 116.00
TD2003 70.93 75.20 77.64 77.54 75.70
TD2004 78.83 80.17 82.40 83.54 80.98
OHSUMED 125.35 126.92 127.37 127.94 127.21
MQ2007 95.50 94.99 95.70 96.02 94.94
MQ2008 89.39 90.55 91.24 92.36 92.25
Dark borders indicate significant improvements over the k = 0.5 baseline
Darker shades indicate higher performance
Best performance with little exploration and lots of exploitation
Adapting Rankers Online
“Informational” click model
Click model (simulates very noisy interaction):
P(c|R) = 0.9   P(c|NR) = 0.4   P(s|R) = 0.5   P(s|NR) = 0.1
29
[Figure: final performance over time (up to 1000 queries) for data set NP2003 and the informational click model, for k = 0.5, k = 0.2, and k = 0.1]
Adapting Rankers Online
“Informational” online performance
30
Data set    k = 0.5   k = 0.4   k = 0.3   k = 0.2   k = 0.1
HP2003 59.53 63.91 61.43 70.11 71.19
HP2004 41.12 52.88 48.54 55.88 55.16
NP2003 53.63 53.64 57.60 58.40 69.90
NP2004 60.59 63.38 64.17 63.23 69.96
TD2003 52.78 52.95 51.58 55.76 57.30
TD2004 58.49 61.43 59.75 62.88 63.37
OHSUMED 121.39 123.26 124.01 126.76 125.40
MQ2007 91.57 92.00 91.66 90.79 90.19
MQ2008 86.06 87.26 85.83 87.62 86.29
Dark borders indicate significant improvements over the k = 0.5 baseline
Darker shades indicate higher performance
Highest improvements with low exploration rates: interaction between noise and data set
Adapting Rankers Online
Summary
What?
Developed the first method for balancing exploration and exploitation in online learning to rank
Devised an experimental framework for simulating user interactions and measuring online performance
And so?
Balancing exploration and exploitation improves online performance for all click models and all data sets
Best results are achieved with 2 exploratory documents per result list
31
Adapting Rankers Online
What’s next here?
Validate simulation assumptions
Evaluate on click logs
Develop new algorithms for online learning to rank for IR that can balance exploration and exploitation
32
Adapting Rankers Online
Inferring Preferences from Clicks
33
Ongoing work
Adapting Rankers Online
Interleaved ranker comparison methods
Use implicit feedback (“clicks”) not to infer absolute judgments, but to compare two rankers by observing clicks on an interleaved result list
Interleave two ranked lists (the outputs of two rankers); use click data to detect even very small differences between rankers
Examine three existing methods for interleaving, identify issues with them, and propose a new one
34
Adapting Rankers Online
Three methods (1)
Balanced interleave method
An interleaved list is generated for each query based on the two rankers
The user’s clicks on the interleaved list are attributed to each ranker based on how it ranked the clicked documents
The ranker that obtains more clicks is deemed superior
35
Joachims, Evaluating retrieval performance using clickthrough data. In: Text Mining, 2003
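A sketch of how balanced interleaving and its comparison rule might be implemented. The cut-off rule follows the worked example on the next slide (k = min(4,3) = 3, etc.); edge cases and per-query randomization of which list goes first are simplified, and names are illustrative.

```python
def rank_of(doc, ranking):
    """1-based rank of doc in ranking; ranked beyond the list if absent."""
    return ranking.index(doc) + 1 if doc in ranking else len(ranking) + 1

def balanced_interleave(l1, l2, length=10, l1_first=True):
    """Merge two rankings by alternating between them, skipping duplicates.
    Which list contributes first would normally be randomized per query."""
    interleaved, i, j = [], 0, 0
    turn_l1 = l1_first
    while len(interleaved) < length and (i < len(l1) or j < len(l2)):
        if (turn_l1 and i < len(l1)) or j >= len(l2):
            doc, i = l1[i], i + 1
        else:
            doc, j = l2[j], j + 1
        if doc not in interleaved:
            interleaved.append(doc)
        turn_l1 = not turn_l1
    return interleaved

def balanced_comparison(l1, l2, clicked):
    """Attribute clicks back to the original lists: k is the smallest cut-off
    such that one of the original lists covers all clicked documents in its
    top k; the list with more clicked documents in its top k wins."""
    if not clicked:
        return 0
    k1 = max(rank_of(d, l1) for d in clicked)   # depth needed to cover all clicks in l1
    k2 = max(rank_of(d, l2) for d in clicked)   # ... and in l2
    k = min(k1, k2)
    c1 = sum(1 for d in clicked if rank_of(d, l1) <= k)
    c2 = sum(1 for d in clicked if rank_of(d, l2) <= k)
    return (c1 > c2) - (c1 < c2)   # 1: l1 wins, -1: l2 wins, 0: tie
```

For example, with l1 = [d1, d2, d3, d4] and l2 = [d2, d3, d4, d1] as on the next slide, if the second and fourth results of the interleaved list d1 d2 d3 d4 are clicked, this gives k = min(4, 3) = 3 and click counts 1 vs. 2, matching the numbers in the example.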
Adapting Rankers Online 36
1) Interleaving, 2) Comparison
List l1: d1, d2, d3, d4    List l2: d2, d3, d4, d1
Two possible interleaved lists l (with observed clicks c):
  d1 d2 d3 d4  →  k = min(4,3) = 3, click count: c1 = 1, c2 = 2
  d2 d1 d3 d4  →  k = min(4,4) = 4, click count: c1 = 2, c2 = 2
l2 wins the first comparison, and the lists tie for the second. In expectation l2 wins.
Adapting Rankers Online
Three methods (2)
Team draft method
Create an interleaved list following the model of “team captains” selecting their teams from a set of players
For each pair of documents to be placed in the interleaved list, a coin flip determines which list gets to select a document first
Record which list contributed which document
37
Radlinski et al., How does clickthrough data reflect retrieval quality? 2008
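A sketch of team draft interleaving and its click-credit comparison, written per pick rather than per pair: whichever team has fewer picks goes next, with a coin flip on ties, which realises the per-pair coin flip described above. Names are illustrative.

```python
import random

def team_draft_interleave(l1, l2, length=10):
    """Team draft interleaving (a sketch): the ranker with fewer picks so far
    adds its highest-ranked document not yet shown. The assignment records
    which ranker contributed each document."""
    interleaved, assignment = [], []
    picks1, picks2 = 0, 0
    while len(interleaved) < length:
        remaining1 = [d for d in l1 if d not in interleaved]
        remaining2 = [d for d in l2 if d not in interleaved]
        if not remaining1 and not remaining2:
            break
        take_l1 = remaining1 and (not remaining2 or picks1 < picks2
                                  or (picks1 == picks2 and random.random() < 0.5))
        if take_l1:
            interleaved.append(remaining1[0])
            assignment.append(1)
            picks1 += 1
        else:
            interleaved.append(remaining2[0])
            assignment.append(2)
            picks2 += 1
    return interleaved, assignment

def team_draft_comparison(interleaved, assignment, clicked):
    """Credit each click to the ranker that contributed the clicked document;
    the ranker with more credited clicks wins."""
    c1 = sum(1 for doc, team in zip(interleaved, assignment) if doc in clicked and team == 1)
    c2 = sum(1 for doc, team in zip(interleaved, assignment) if doc in clicked and team == 2)
    return (c1 > c2) - (c1 < c2)
```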
Adapting Rankers Online 38
1) Interleaving, 2) Comparison
List l1: d1, d2, d3, d4    List l2: d2, d3, d4, d1
Four possible interleaved lists l, with different assignments a (the number after each document is the list that contributed it):
  a) d1 (1), d2 (2), d3 (1), d4 (2)
  b) d2 (2), d1 (1), d3 (2), d4 (1)
  c) d1 (1), d2 (2), d3 (2), d4 (1)
  d) d2 (2), d1 (1), d3 (1), d4 (2)
For the interleaved lists a) and b), l1 wins the comparison; l2 wins in the other two cases.
Adapting Rankers Online
Three methods (3)
Document-constraint method
Result lists are interleaved and clicks observed as for the balanced interleave method
Constraints on pairs of individual documents are inferred based on clicks and ranks
For each pair of a clicked document and a higher-ranked non-clicked document, a constraint is inferred that requires the former to be ranked higher than the latter
The original list that violates fewer constraints is deemed superior
39
He et al., Evaluation of methods for relative comparison of retrieval systems based on clickthroughs, 2009
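A sketch of the document-constraint comparison: constraints are inferred from clicks on the interleaved list, and the original list that violates fewer of them wins. Names are illustrative and edge cases are simplified.

```python
def document_constraint_comparison(l1, l2, interleaved, clicked):
    """Infer pairwise constraints from clicks and count violations per list."""
    def rank_of(doc, ranking):
        return ranking.index(doc) + 1 if doc in ranking else len(ranking) + 1

    # For every clicked document and every non-clicked document shown above
    # it in the interleaved list, the clicked document should be ranked higher.
    constraints = []
    for pos, doc in enumerate(interleaved):
        if doc in clicked:
            for above in interleaved[:pos]:
                if above not in clicked:
                    constraints.append((doc, above))   # doc should outrank 'above'

    def violations(ranking):
        return sum(1 for pref, worse in constraints
                   if rank_of(pref, ranking) > rank_of(worse, ranking))

    v1, v2 = violations(l1), violations(l2)
    return (v1 < v2) - (v1 > v2)   # 1: l1 wins, -1: l2 wins, 0: tie
```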
Adapting Rankers Online 40
1) Interleaving, 2) Comparison
List l1: d1, d2, d3, d4    List l2: d2, d3, d4, d1
Two possible interleaved lists l:
  d1 d2 d3 d4  →  inferred constraints: d2 ≻ d1 (violated by l1), d3 ≻ d1 (violated by l1)
  d2 d1 d3 d4  →  inferred constraints: d1 ≻ d2 (violated by l2), d3 ≻ d2 (violated by l1 and l2)
l2 wins the first comparison, and loses the second. In expectation l2 wins.
Adapting Rankers Online
Assessing comparison methods
Bias: don’t prefer either ranker when clicks are random
Sensitivity: the ability of a comparison method to detect differences in the quality of rankings
Balanced interleave and document constraint are biased
Team draft may suffer from insensitivity
41
Adapting Rankers Online
A new proposal
Briefly: based on team draft
Instead of interleaving deterministically, model the interleaving process as random sampling from softmax functions that define probability distributions over documents
Derive an estimator that is unbiased and sensitive to small ranking changes
Marginalize over all possible assignments to make the estimates more reliable
42
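A sketch of the softmax-based sampling step. The 1/rank^tau form of the softmax and the value tau = 3 are assumptions chosen to reproduce the example probabilities on the next slide (≈ 0.85, 0.10, 0.03, 0.02 for ranks 1-4); the marginalization over assignments is not shown here, and names are illustrative.

```python
import random

def softmax_over_ranks(ranking, tau=3.0):
    """Turn a ranked list into a probability distribution over documents:
    higher-ranked documents get larger probabilities (1/rank^tau, normalized)."""
    weights = [1.0 / (r + 1) ** tau for r in range(len(ranking))]
    total = sum(weights)
    return {doc: w / total for doc, w in zip(ranking, weights)}

def probabilistic_interleave(l1, l2, length=10, tau=3.0):
    """For each rank of the interleaved list, pick one of the two softmax
    distributions uniformly at random, then sample a not-yet-used document
    from it; record which ranker was drawn at each rank."""
    s1, s2 = softmax_over_ranks(l1, tau), softmax_over_ranks(l2, tau)
    interleaved, assignment = [], []
    candidates = set(l1) | set(l2)
    while len(interleaved) < length and candidates - set(interleaved):
        which, dist = random.choice([(1, s1), (2, s2)])
        remaining = [d for d in dist if d not in interleaved]
        if not remaining:                      # fall back to the other ranker's documents
            which, dist = (2, s2) if which == 1 else (1, s1)
            remaining = [d for d in dist if d not in interleaved]
        weights = [dist[d] for d in remaining]
        doc = random.choices(remaining, weights=weights, k=1)[0]
        interleaved.append(doc)
        assignment.append(which)
    return interleaved, assignment
```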
1) Probabilistic interleave, 2) Probabilistic comparison
l1 → softmax s1 over d1, d2, d3, d4    l2 → softmax s2 over d2, d3, d4, d1
All permutations of the documents in D are possible.
For each rank of the interleaved list l, draw one of {s1, s2} and sample a document d
(e.g., under s1 the documents at ranks 1–4 are drawn with P ≈ 0.85, 0.10, 0.03, 0.02)
Observe data, e.g., the interleaved list d1, d2, d3, d4 with assignment (1, 2, 1, 2) and observed clicks
Marginalize over all 16 possible assignments a, weighting the outcome o(c_i, a) of each by its probability P(a | l_i, q_i)
Result: P(c1 > c2) = 0.108, P(c1 < c2) = 0.144
s2 (based on l2) wins the comparison. s1 and s2 tie in expectation.
Adapting Rankers Online 43
For an incoming query, the system generates an interleaved list
Observe clicks
Compute the probability of each possible outcome: all possible assignments are generated and the probability of each is computed
This is expensive, but it only needs to be done down to the rank of the lowest observed click
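In equation form, the marginalized comparison outcome can be written as follows, where c_1(a) and c_2(a) denote the clicks credited to each ranker under assignment a; this is a reconstruction of the quantities o(c, a) and P(a | l, q) shown on the previous slide, not a formula quoted from it.

$$P(c_1 > c_2) = \sum_{a} \mathbb{1}\big[c_1(a) > c_2(a)\big]\, P(a \mid l, q),
\qquad
P(c_1 < c_2) = \sum_{a} \mathbb{1}\big[c_1(a) < c_2(a)\big]\, P(a \mid l, q)$$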
Adapting Rankers Online 44
Question
Do analytical differences between the methods translate into performance differences?
Adapting Rankers Online
Evaluation
Set-up
Simulation based on the dependent click model, with perfect and realistic instantiations; not binary, but with relevance levels
MSLR-WEB30k: Microsoft learning to rank data set, 136 document features (i.e., rankers)
Three experiments: exhaustive comparison of all 9,180 distinct ranker pairs; selection of small subsets for detailed analysis; adding noise
45
Adapting Rankers Online
Results (1)
Experiment 1: Accuracy
Percentage of pairs of rankers for which a comparison method identified the better ranker after 1000 queries
46
Method                Accuracy
balanced interleave   0.881
team draft            0.898
document constraint   0.857
new                   0.914
Adapting Rankers Online
Results (2): overview
“Problematic” pairs
Pairs of rankers for which all methods correctly identified the better one
Three achieved perfect accuracy within 1000 queries
For each method, the incorrectly judged pair with the highest difference in NDCG
47
Adapting Rankers Online
Results (3): perfect model
48
[Figure: four panels showing accuracy (0–1) over the number of queries (1 to 10k, log scale) under the perfect click model, with curves for balanced interleave, team draft, document constraint, and marginalized probabilities]
Adapting Rankers Online
Results (4): realistic model
49
[Figure: two panels showing accuracy (0–1) over the number of queries (1 to 10k, log scale) under the realistic click model, for the same four methods]
Adapting Rankers Online
Summary
What?
Methods for evaluating rankers using implicit feedback
Analysis of interleaved comparison methods in terms of bias and sensitivity
And so?
Introduced a new probabilistic interleaved comparison method that is unbiased and sensitive
Experimental analysis: more accurate, with substantially fewer observed queries, and more robust
50
Adapting Rankers Online
What’s next here?
Evaluate in a real-life setting in the future
With more reliable and faster convergence, our approach can pave the way for online learning to rank methods that require many comparisons
51
Adapting Rankers Online
Wrap-up
52
Adapting Rankers Online
Online learning to rank: emphasis on implicit feedback collected during normal operation of the search engine
Balancing exploration and exploitation
Probabilistic method for inferring preferences from clicks
53
Adapting Rankers Online
Information retrieval observatory
Academic experiments on online learning and implicit feedback used simulators; these simulators need to be validated
What’s really needed:
Move away from artificial explicit feedback to natural implicit feedback
A shared experimental environment for observing users in the wild as they interact with systems
54
Adapting Rankers Online
Adapting Rankers Online Maarten de Rijke, derijke@uva.nl
55
Adapting Rankers Online 57
Bias
1) Interleaving, 2) Comparison
List l1: d1, d2, d3, d4    List l2: d2, d3, d4, d1
Two possible interleaved lists l (with observed clicks c):
  d1 d2 d3 d4  →  k = min(4,3) = 3, click count: c1 = 1, c2 = 2
  d2 d1 d3 d4  →  k = min(4,4) = 4, click count: c1 = 2, c2 = 2
l2 wins the first comparison, and the lists tie for the second. In expectation l2 wins.
Adapting Rankers Online
Sensitivity
58
1) Interleaving, 2) Comparison
List l1: d1, d2, d3, d4    List l2: d2, d3, d4, d1
Four possible interleaved lists l, with different assignments a (the number after each document is the list that contributed it):
  a) d1 (1), d2 (2), d3 (1), d4 (2)
  b) d2 (2), d1 (1), d3 (2), d4 (1)
  c) d1 (1), d2 (2), d3 (2), d4 (1)
  d) d2 (2), d1 (1), d3 (1), d4 (2)
For the interleaved lists a) and b), l1 wins the comparison; l2 wins in the other two cases.