
Page 1: Adapting Rankers Online, Maarten de Rijke

Adapting Rankers Online

Maarten de Rijke

Page 2: Adapting Rankers Online, Maarten de Rijke

Adapting Rankers Online

Joint work with Katja Hofmann and Shimon Whiteson

2

Page 3: Adapting Rankers Online, Maarten de Rijke

Adapting Rankers Online 3

Growing complexity of search engines

Current methods for optimizing mostly work offline

Page 4: Adapting Rankers Online, Maarten de Rijke

Adapting Rankers Online

Online learning to rank

No distinction between training and operating

Search engine observes users’ natural interactions with the search interface, infers information from them, and improves its ranking function automatically

Expensive data collection not required; the collected data matches target users and target setting

4

Page 5: Adapting Rankers Online, Maarten de Rijke

Adapting Rankers Online 5

Users’ natural interactions with the search interface

Oard and Kim, 2001; Kelly and Teevan, 2004

Behavior category (the purpose of the observed behavior) by minimum scope (the smallest possible scope of the item being acted upon):

Behavior category | Segment                           | Object                                  | Class
Examine           | View, Listen, Scroll, Find, Query | Select                                  | Browse
Retain            | Print                             | Bookmark, Save, Delete, Purchase, Email | Subscribe
Reference         | Copy-and-paste, Quote             | Forward, Reply, Link, Cite              |
Annotate          | Mark up                           | Rate, Publish                           | Organize
Create            | Type, Edit                        | Author                                  |

Page 6: Adapting Rankers Online, Maarten de Rijke

Adapting Rankers Online

Users’ interactions

Relevance feedback
History goes back close to forty years
Typically used for query expansion, user profiling

Explicit feedback
Users explicitly give feedback: keywords, selecting or marking documents, answering questions
Natural explicit feedback can be difficult to obtain
“Unnatural” explicit feedback through TREC assessors and crowd sourcing

6

Page 7: Adapting Rankers Online, Maarten de Rijke

Adapting Rankers Online

Users’ interactions (2)

Implicit feedback for learning, query expansion and user profiling
Observe users’ natural interactions with the system
Reading time, saving, printing, bookmarking, selecting, clicking, …
Thought to be less accurate than explicit measures
Available in very large quantities at no cost

7

Page 8: Adapting Rankers Online, Maarten de Rijke

Adapting Rankers Online

Learning to rank online

Using online learning to rank approaches, retrieval systems can learn directly from implicit feedback while they are running

Algorithms need to explore new solutions to obtain feedback for effective learning, and to exploit what has been learned to produce results acceptable to users

Interleaved comparison methods can use implicit feedback to detect small differences between rankers and can be used to learn ranking functions online

8

Page 9: Adapting Rankers Online, Maarten de Rijke

Adapting Rankers Online

Agenda

Balancing exploration and exploitation
Inferring preferences from clicks

9

Page 10: Adapting Rankers Online, Maarten de Rijke

Adapting Rankers Online

Balancing Exploitation and Exploration

10

K. Hofmann et al. (2011), Balancing exploration and exploitation. In: ECIR ’11.

Recent work

Page 11: Adapting Rankers Online, Maarten de Rijke

Adapting Rankers Online

Challenges

Generalize over queries and documents

Learn from implicit feedback that is …
noisy
relative
rank-biased

Keep users happy while learning

11

Page 12: Adapting Rankers Online, Maarten de Rijke

Adapting Rankers Online

Learning document pair-wise preferences

Insight: infer preferences from clicks

12

(Example: result list for the query “Vienna”)

Joachims, T. (2002). Optimizing search engines using clickthrough data. In KDD '02, pages 133-142.

Page 13: Adapting Rankers Online, Maarten de Rijke

Adapting Rankers Online

Learning document pair-wise preferences

Input: feature vectors constructed from document pairs, (x(q, d_i), x(q, d_j)) ∈ R^n × R^n

Output: correct / incorrect order, y ∈ {−1, +1}

Learning method: supervised learning, e.g., SVM

13

Joachims, T. (2002). Optimizing search engines using clickthrough data. In KDD '02, pages 133-142.
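To make the setup concrete, here is a minimal Python sketch of the pairwise construction, assuming scikit-learn; the feature function phi(q, d) and the click-derived preference triples are placeholders for illustration, not code from the talk.

```python
# A minimal sketch (not the talk's implementation) of Joachims-style pairwise
# learning to rank. phi(q, d) is a hypothetical feature function returning a
# numpy vector; "preferences" are click-derived pairs (d_i preferred over d_j).
import numpy as np
from sklearn.svm import LinearSVC

def build_pairwise_data(preferences, phi):
    """preferences: iterable of (query, preferred_doc, other_doc) triples."""
    X, y = [], []
    for q, d_i, d_j in preferences:
        diff = phi(q, d_i) - phi(q, d_j)   # difference of the two feature vectors
        X.append(diff)
        y.append(+1)                       # correct order
        X.append(-diff)
        y.append(-1)                       # reversed order
    return np.array(X), np.array(y)

# The learned weight vector then acts as a linear ranking function:
#   score(q, d) = w . phi(q, d)
# X, y = build_pairwise_data(click_preferences, phi)
# w = LinearSVC().fit(X, y).coef_.ravel()
```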

Page 14: Adapting Rankers Online, Maarten de Rijke

Adapting Rankers Online

Challenges

Generalize over queries and documents

Learn from implicit feedback that is …
noisy
relative
rank-biased

Keep users happy while learning

14

Page 15: Adapting Rankers Online, Maarten de Rijke

Adapting Rankers Online

Dueling bandit gradient descent

Learns a ranking function, consisting of a weight vector w for a linear weighted combination of feature vectors, from feedback about the relative quality of rankings: s = w · x(q, d)
Outcome: weights for ranking

Approach
Maintain a current “best” ranking function
On each incoming query:
Generate a new candidate ranking function
Compare it to the current “best”
If the candidate is better, update the “best” ranking function

15

Yue, Y. and Joachims, T. (2009). Interactively optimizing information retrieval systems as a dueling bandits problem. In ICML '09.

(Figure: weight space with axes x1 and x2, showing the current best w and a candidate w)
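A minimal Python sketch of this loop; rank_by, docs_for, phi and the interleaved comparison compare are assumed helpers, and the step sizes delta and alpha are illustrative defaults rather than values from the talk.

```python
# A minimal sketch of the Dueling Bandit Gradient Descent loop described above.
# docs_for, phi and the interleaved comparison "compare" are assumed helpers,
# not part of the original talk's code.
import numpy as np

def rank_by(w, q, phi, docs):
    """Rank the candidate documents for q by the linear score w . phi(q, d)."""
    return sorted(docs, key=lambda d: -np.dot(w, phi(q, d)))

def dbgd(queries, docs_for, phi, compare, n_features, delta=1.0, alpha=0.01, seed=42):
    """compare(q, current_list, candidate_list) -> True if the candidate wins."""
    rng = np.random.default_rng(seed)
    w = np.zeros(n_features)                   # current "best" ranker
    for q in queries:
        u = rng.normal(size=n_features)
        u /= np.linalg.norm(u)                 # random unit direction
        w_candidate = w + delta * u            # candidate ranker
        docs = docs_for(q)
        current = rank_by(w, q, phi, docs)
        candidate = rank_by(w_candidate, q, phi, docs)
        if compare(q, current, candidate):     # users prefer the candidate
            w = w + alpha * u                  # take a small step towards it
    return w
```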

Page 16: Adapting Rankers Online, Maarten de Rijke

Adapting Rankers Online

Challenges

Generalize over queries and documents

Learn from implicit feedback that is …
noisy
relative
rank-biased

Keep users happy while learning

16

Page 17: Adapting Rankers Online, Maarten de Rijke

Adapting Rankers Online

Exploration and exploitation

17

Exploration: need to learn effectively from rank-biased feedback

Exploitation: need to present high-quality results while learning

Previous approaches are either purely exploratory or purely exploitative

Page 18: Adapting Rankers Online, Maarten de Rijke

Adapting Rankers Online

Questions

Can we improve online performance by balancing exploration and exploitation?

How much exploration is needed for effective learning?

18

Page 19: Adapting Rankers Online, Maarten de Rijke

Adapting Rankers Online

Problem formulation

Reinforcement learning
No explicit labels
Learn from feedback from the environment in response to actions (document lists)
Contextual bandit problem

19

(Diagram: the retrieval system interacts with its environment, the user; the system tries something and gets feedback; concretely, it presents documents and observes clicks)

Page 20: Adapting Rankers Online, Maarten de Rijke

Adapting Rankers Online

Our method

Learning based on Dueling Bandit Gradient Descent
Relative evaluations of the quality of two document lists
Infers such comparisons from implicit feedback

Balance exploration and exploitation with k-greedy comparison of document lists

20

Page 21: Adapting Rankers Online, Maarten de Rijke

Adapting Rankers Online

k-greedy exploration

To compare document lists, interleave

An exploration rate k influences the relative number of documents from each list

21

(Example: interleaved result list with exploration rate k = 0.5; the blue list wins the comparison)
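A minimal sketch of such a k-greedy interleaving step, under simplifying assumptions about tie handling and list exhaustion; it is an illustration of the idea on the slide, not the paper's exact algorithm.

```python
# A minimal sketch of k-greedy interleaving: at each rank, the exploratory list
# contributes a document with probability k, otherwise the exploitative list does.
# Click attribution and edge cases are simplified.
import random

def k_greedy_interleave(exploit_list, explore_list, k=0.2, length=10, rng=random):
    interleaved, origin = [], []            # origin[i]: which list placed document i
    while len(interleaved) < length:
        if rng.random() < k:
            source, label = explore_list, "explore"
        else:
            source, label = exploit_list, "exploit"
        nxt = next((d for d in source if d not in interleaved), None)
        if nxt is None:                     # chosen list exhausted; stop (simplified)
            break
        interleaved.append(nxt)
        origin.append(label)
    return interleaved, origin

def compare_by_clicks(origin, clicked_ranks):
    """The list whose documents attract more clicks wins the comparison."""
    explore = sum(origin[r] == "explore" for r in clicked_ranks)
    exploit = len(clicked_ranks) - explore
    return "explore" if explore > exploit else "exploit" if exploit > explore else "tie"
```

A function like compare_by_clicks could serve as the compare callback in the DBGD sketch shown earlier.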

Page 22: Adapting Rankers Online, Maarten de Rijke

Adapting Rankers Online 22

k-greedy exploration

(Examples: interleaved result lists with exploration rate k = 0.5 and with exploration rate k = 0.2)

Page 23: Adapting Rankers Online, Maarten de Rijke

Adapting Rankers Online

Evaluation

Simulated interactions
We need to observe clicks on arbitrary result lists and to measure online performance

Simulate clicks and measure online performance
Probabilistic click model: assume a dependent click model and define click and stop probabilities based on standard learning to rank data sets
Measure cumulative reward of the rankings displayed to the user

23
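A minimal sketch of how such simulated clicks could be generated; the default probabilities are the “navigational” values given on a later slide, and binary relevance is a simplification of the graded labels in the data sets.

```python
# A minimal sketch of the simulated user: a dependent click model that scans
# the result list top-down, clicks with probability P(c|R) or P(c|NR) depending
# on relevance, and stops after a click with probability P(s|R) or P(s|NR).
import random

def simulate_clicks(result_list, relevant, p_click=(0.95, 0.05), p_stop=(0.9, 0.2), rng=random):
    """relevant: set of relevant documents. Returns the ranks that were clicked."""
    clicked_ranks = []
    for rank, doc in enumerate(result_list):
        rel = doc in relevant
        if rng.random() < (p_click[0] if rel else p_click[1]):
            clicked_ranks.append(rank)
            if rng.random() < (p_stop[0] if rel else p_stop[1]):
                break                      # the simulated user is satisfied and stops
    return clicked_ranks
```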

Page 24: Adapting Rankers Online, Maarten de Rijke

Adapting Rankers Online

Experiments

Vary the exploration rate k

Three click models: “perfect”, “navigational”, “informational”

Evaluate on nine data sets (LETOR 3.0 and 4.0)

24

Page 25: Adapting Rankers Online, Maarten de Rijke

Adapting Rankers Online

“Perfect” click model

Click model: provides an upper bound

25

P(c|R) = 1.0, P(c|NR) = 0.0, P(s|R) = 0.0, P(s|NR) = 0.0

(Figure: final performance over time for data set NP2003 and the perfect click model)

Page 26: Adapting Rankers Online, Maarten de Rijke

Adapting Rankers Online

“Perfect” online performance

Data set   k = 0.5   k = 0.4   k = 0.3   k = 0.2   k = 0.1
HP2003      119.91    125.71    129.99    130.55    128.50
HP2004      109.21    111.57    118.54    119.86    116.46
NP2003      108.74    113.61    117.44    120.46    119.06
NP2004      112.33    119.34    124.47    126.20    123.70
TD2003       82.00     84.24     88.20     89.36     86.20
TD2004       85.67     90.23     91.00     91.71     88.98
OHSUMED     128.12    130.40    131.16    133.37    131.93
MQ2007       96.02     97.48     98.54    100.28     98.32
MQ2008       90.97     92.99     94.03     95.59     95.14

(In the slide, dark borders indicate significant improvements over the k = 0.5 baseline, and darker shades indicate higher performance)

26

Best performance with only two exploratory documents for top-10 results

Page 27: Adapting Rankers Online, Maarten de Rijke

Adapting Rankers Online

“Navigational” click model

Click model: simulates realistic but reliable interaction

27

P(c|R) = 0.95, P(c|NR) = 0.05, P(s|R) = 0.9, P(s|NR) = 0.2

(Figure: final performance over time for data set NP2003 and the navigational click model)

Page 28: Adapting Rankers Online, Maarten de Rijke

Adapting Rankers Online

“Navigational” online performance

28

Data set   k = 0.5   k = 0.4   k = 0.3   k = 0.2   k = 0.1
HP2003      102.58    109.78    118.84    116.38    117.52
HP2004       89.61     97.08     99.03    103.36    105.69
NP2003       90.32    100.94    105.03    108.15    110.12
NP2004       99.14    104.34    110.16    112.05    116.00
TD2003       70.93     75.20     77.64     77.54     75.70
TD2004       78.83     80.17     82.40     83.54     80.98
OHSUMED     125.35    126.92    127.37    127.94    127.21
MQ2007       95.50     94.99     95.70     96.02     94.94
MQ2008       89.39     90.55     91.24     92.36     92.25

(In the slide, dark borders indicate significant improvements over the k = 0.5 baseline, and darker shades indicate higher performance)

Best performance with little exploration and lots of exploitation

Page 29: Adapting Rankers Online, Maarten de Rijke

Adapting Rankers Online

“Informational” click model

Click model: simulates very noisy interaction

29

P(c|R) = 0.9, P(c|NR) = 0.4, P(s|R) = 0.5, P(s|NR) = 0.1

(Figure: final performance over time for data set NP2003 and the informational click model, for k = 0.5, k = 0.2, and k = 0.1)

Page 30: Adapting Rankers Online, Maarten de Rijke

Adapting Rankers Online

“Informational” online performance

30

Data set   k = 0.5   k = 0.4   k = 0.3   k = 0.2   k = 0.1
HP2003       59.53     63.91     61.43     70.11     71.19
HP2004       41.12     52.88     48.54     55.88     55.16
NP2003       53.63     53.64     57.60     58.40     69.90
NP2004       60.59     63.38     64.17     63.23     69.96
TD2003       52.78     52.95     51.58     55.76     57.30
TD2004       58.49     61.43     59.75     62.88     63.37
OHSUMED     121.39    123.26    124.01    126.76    125.40
MQ2007       91.57     92.00     91.66     90.79     90.19
MQ2008       86.06     87.26     85.83     87.62     86.29

(In the slide, dark borders indicate significant improvements over the k = 0.5 baseline, and darker shades indicate higher performance)

Highest improvements with low exploration rates: interaction between noise and data set

Page 31: Adapting Rankers Online, Maarten de Rijke

Adapting Rankers Online

Summary

What?
Developed the first method for balancing exploration and exploitation in online learning to rank
Devised an experimental framework for simulating user interactions and measuring online performance

And so?
Balancing exploration and exploitation improves online performance for all click models and all data sets
Best results are achieved with 2 exploratory documents per results list

31

Page 32: Adapting Rankers Online, Maarten de Rijke

Adapting Rankers Online

What’s next here?

Validate simulation assumptions
Evaluate using click logs
Develop new algorithms for online learning to rank for IR that can balance exploration and exploitation

32

Page 33: Adapting Rankers Online, Maarten de Rijke

Adapting Rankers Online

Inferring Preferences from Clicks

33

Ongoing work

Page 34: Adapting Rankers Online, Maarten de Rijke

Adapting Rankers Online

Interleaved ranker comparison methods

Use implicit feedback (“clicks”), not to infer absolute judgments, but to compare two rankers by observing clicks on an interleaved result list
Interleave two ranked lists (“outputs of two rankers”)
Use click data to detect even very small differences between rankers

Examine three existing methods for interleaving, identify issues with them and propose a new one

34

Page 35: Adapting Rankers Online, Maarten de Rijke

Adapting Rankers Online

Three methods (1)

Balanced interleave method
An interleaved list is generated for each query based on the two rankers
The user’s clicks on the interleaved list are attributed to each ranker based on how it ranked the clicked documents
The ranker that obtains more clicks is deemed superior

35

Joachims, Evaluating retrieval performance using clickthrough data. In: Text Mining, 2003

Page 36: Adapting Rankers Online, Maarten de Rijke

Adapting Rankers Online 36

(Example: balanced interleaving of list l1 = (d1, d2, d3, d4) and list l2 = (d2, d3, d4, d1))

1) Interleaving: two possible interleaved lists l, depending on which ranker contributes first:
d1, d2, d3, d4 and d2, d1, d3, d4

2) Comparison, given the observed clicks c:
First interleaved list: k = min(4, 3) = 3; click counts c1 = 1, c2 = 2
Second interleaved list: k = min(4, 4) = 4; click counts c1 = 2, c2 = 2

l2 wins the first comparison, and the lists tie for the second. In expectation l2 wins.
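A minimal sketch of balanced interleaving and its click-based comparison, consistent with the example above; it assumes both rankers rank the same documents and that clicked is the set of clicked documents on the interleaved list.

```python
# A minimal sketch of balanced interleaving (Joachims, 2003) and the comparison
# described above; tie-breaking and edge cases are simplified.
import random

def balanced_interleave(a, b, a_first=None, rng=random):
    """Build a list whose every prefix contains the top ka of a and top kb of b."""
    if a_first is None:
        a_first = rng.random() < 0.5
    out, ka, kb = [], 0, 0
    while ka < len(a) and kb < len(b):
        if ka < kb or (ka == kb and a_first):
            if a[ka] not in out:
                out.append(a[ka])
            ka += 1
        else:
            if b[kb] not in out:
                out.append(b[kb])
            kb += 1
    return out

def balanced_comparison(a, b, interleaved, clicked):
    """Attribute clicks to a and b; the list with more clicked docs in its top k wins."""
    lowest = max(clicked, key=interleaved.index)     # deepest clicked document
    k = min(a.index(lowest), b.index(lowest)) + 1    # k = min rank over both lists
    clicks_a = len(clicked & set(a[:k]))
    clicks_b = len(clicked & set(b[:k]))
    return "a" if clicks_a > clicks_b else "b" if clicks_b > clicks_a else "tie"
```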

Page 37: Adapting Rankers Online, Maarten de Rijke

Adapting Rankers Online

Three methods (2)

Team draft method
Create an interleaved list following the model of “team captains” selecting their team from a set of players
For each pair of documents to be placed in the interleaved list, a coin flip determines which list gets to select a document first
Record which list contributed which document

37

Radlinski et al., How does click-through data reflect retrieval quality? 2008

Page 38: Adapting Rankers Online, Maarten de Rijke

Adapting Rankers Online 38

(Example: team-draft interleaving of list l1 = (d1, d2, d3, d4) and list l2 = (d2, d3, d4, d1))

1) Interleaving: four possible interleaved lists l, with different assignments a (the number after each document indicates which list contributed it):
a) d1 (1), d2 (2), d3 (1), d4 (2)
b) d2 (2), d1 (1), d3 (2), d4 (1)
c) d1 (1), d2 (2), d3 (2), d4 (1)
d) d2 (2), d1 (1), d3 (1), d4 (2)

2) Comparison, given the observed clicks: for the interleaved lists a) and b), l1 wins the comparison; l2 wins in the other two cases.
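A minimal sketch of team-draft interleaving and its comparison, following the description above; truncation to a fixed result length and tie handling are simplified.

```python
# A minimal sketch of team-draft interleaving (Radlinski et al., 2008): in each
# round a coin flip decides which ranker picks first; each ranker then adds its
# highest-ranked document that is not yet in the interleaved list.
import random

def team_draft_interleave(a, b, rng=random):
    interleaved, team_a, team_b = [], set(), set()
    all_docs = set(a) | set(b)
    while len(interleaved) < len(all_docs):
        order = [("a", a, team_a), ("b", b, team_b)]
        if rng.random() < 0.5:                 # coin flip: who picks first this round
            order.reverse()
        for _, ranked, team in order:
            pick = next((d for d in ranked if d not in interleaved), None)
            if pick is not None:
                interleaved.append(pick)
                team.add(pick)
    return interleaved, team_a, team_b

def team_draft_comparison(team_a, team_b, clicked):
    """Each click counts for the team (ranker) that contributed the clicked document."""
    clicks_a, clicks_b = len(clicked & team_a), len(clicked & team_b)
    return "a" if clicks_a > clicks_b else "b" if clicks_b > clicks_a else "tie"
```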

Page 39: Adapting Rankers Online, Maarten de Rijke

Adapting Rankers Online

Three methods (3)

Document-constraint method
Result lists are interleaved and clicks are observed as for the balanced interleave method
Constraints on pairs of individual documents are inferred from clicks and ranks
For each pair of a clicked document and a higher-ranked non-clicked document, a constraint is inferred that requires the former to be ranked higher than the latter
The original list that violates fewer constraints is deemed superior

39

He et al., Evaluation of methods for relative comparison of retrieval systems based on clickthroughs, 2009

Page 40: Adapting Rankers Online, Maarten de Rijke

Adapting Rankers Online 40

(Example: the document-constraint method applied to list l1 = (d1, d2, d3, d4) and list l2 = (d2, d3, d4, d1))

1) Interleaving: two possible interleaved lists l:
d1, d2, d3, d4 and d2, d1, d3, d4

2) Comparison, given the observed clicks:
First interleaved list, inferred constraints: d2 ≻ d1 (violated by l1, not by l2), d3 ≻ d1 (violated by l1, not by l2)
Second interleaved list, inferred constraints: d1 ≻ d2 (violated by l2, not by l1), d3 ≻ d2 (violated by both l1 and l2)

l2 wins the first comparison, and loses the second. In expectation l2 wins.
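A minimal sketch of the document-constraint comparison described above; it assumes the clicked and non-clicked documents appear in both original lists, and skips any constraint involving a document missing from one of them.

```python
# A minimal sketch of the document-constraint comparison (He et al., 2009):
# every click implies the clicked document should outrank each non-clicked
# document shown above it; the original list violating fewer constraints wins.
def document_constraint_comparison(a, b, interleaved, clicked):
    constraints = []                             # pairs (preferred, dispreferred)
    for rank, doc in enumerate(interleaved):
        if doc in clicked:
            for above in interleaved[:rank]:
                if above not in clicked:
                    constraints.append((doc, above))

    def violations(ranked):
        return sum(ranked.index(pref) > ranked.index(other)
                   for pref, other in constraints
                   if pref in ranked and other in ranked)

    va, vb = violations(a), violations(b)
    return "a" if va < vb else "b" if vb < va else "tie"
```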

Page 41: Adapting Rankers Online, Maarten de Rijke

Adapting Rankers Online

Assessing comparison methods

Bias: a method should not prefer either ranker when clicks are random

Sensitivity: the ability of a comparison method to detect differences in the quality of rankings

Balanced interleave and document constraint are biased
Team draft may suffer from insensitivity

41

Page 42: Adapting Rankers Online, Maarten de Rijke

Adapting Rankers Online

A new proposal

Briefly
Based on team draft
Instead of interleaving deterministically, model the interleaving process as random sampling from softmax functions that define probability distributions over documents
Derive an estimator that is unbiased and sensitive to small ranking changes
Marginalize over all possible assignments to make estimates more reliable

42
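A minimal sketch of the probabilistic interleaving step; the rank-based softmax with decay parameter tau is an assumption about the exact form of the distribution, and the marginalized probabilistic comparison is omitted.

```python
# A minimal sketch of probabilistic interleaving: each ranker defines a softmax
# over documents that decays with rank (tau controls the decay); at every
# position one ranker is drawn at random and a document is sampled from its
# renormalized distribution. Assumes l1 and l2 rank the same candidate set.
import random

def softmax_over_ranks(ranked, remaining, tau=3.0):
    """P(d) proportional to 1 / rank(d)^tau, renormalized over the remaining docs."""
    weights = {d: 1.0 / (ranked.index(d) + 1) ** tau for d in remaining}
    total = sum(weights.values())
    return {d: w / total for d, w in weights.items()}

def probabilistic_interleave(l1, l2, length=10, rng=random):
    interleaved, assignment = [], []      # assignment[i]: which softmax produced doc i
    remaining = set(l1)                   # same candidate set for both rankers
    while remaining and len(interleaved) < length:
        ranker, ranked = rng.choice([(1, l1), (2, l2)])
        probs = softmax_over_ranks(ranked, remaining)
        docs, ps = zip(*probs.items())
        d = rng.choices(docs, weights=ps, k=1)[0]
        interleaved.append(d)
        assignment.append(ranker)
        remaining.discard(d)
    return interleaved, assignment
```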

Page 43: Adapting Rankers Online, Maarten de Rijke

1) Probabilistic interleave
l1 = (d1, d2, d3, d4) defines a softmax s1; l2 = (d2, d3, d4, d1) defines a softmax s2
All permutations of the documents in D are possible
For each rank of the interleaved list l, draw one of {s1, s2} and sample a document d
Example softmax over ranks: P(d at rank 1) = 0.85, P(d at rank 2) = 0.10, P(d at rank 3) = 0.03, P(d at rank 4) = 0.02

2) Probabilistic comparison
Observe data, e.g. the interleaved list d1, d2, d3, d4 with assignments (1, 2, 1, 2) and the observed clicks
Marginalize over all 16 possible assignments a, computing the click counts o(c_i, a) and the probability P(a | l_i, q_i) for each
Result: P(c1 > c2) = 0.108, P(c1 < c2) = 0.144
s2 (based on l2) wins the comparison; s1 and s2 tie in expectation

Adapting Rankers Online 43

For an incoming query
The system generates an interleaved list
Observe clicks
Compute the probability of each possible outcome

All possible assignments are generated and the probability of each is computed
Expensive; only needs to be done up to the lowest observed click

Page 44: Adapting Rankers Online, Maarten de Rijke

Adapting Rankers Online 44

Question

Do analytical differences between the methods translate into performance differences?

Page 45: Adapting Rankers Online, Maarten de Rijke

Adapting Rankers Online

Evaluation

Set-up
Simulation based on the dependent click model
Perfect and realistic instantiations
Not binary, but with relevance levels

MSLR-WEB30k
Microsoft learning to rank data set
136 doc features (i.e., rankers)

Three experiments
Exhaustive comparison of all distinct ranker pairs (9,180 distinct pairs)
Selection of small subsets for detailed analysis
Add noise

45

Page 46: Adapting Rankers Online, Maarten de Rijke

Adapting Rankers Online

Results (1)

Experiment 1: Accuracy
Percentage of pairs of rankers for which a comparison method identified the better ranker after 1000 queries

46

Method                 Accuracy
balanced interleave    0.881
team draft             0.898
document constraint    0.857
new                    0.914

Page 47: Adapting Rankers Online, Maarten de Rijke

Adapting Rankers Online

Results (2): overview

“Problematic” pairs
Pairs of rankers for which all methods correctly identified the better one
Three achieved perfect accuracy within 1000 queries
For each method, the incorrectly judged pair with the highest difference in NDCG

47

Page 48: Adapting Rankers Online, Maarten de Rijke

Adapting Rankers Online

Results (3): perfect model

48

(Figure: results as a function of the number of queries, from 1 to 10k, for balanced interleave, team draft, document constraint, and marginalized probabilities under the perfect click model; four panels, y-axis from 0 to 1)

Page 49: Adapting Rankers Online, Maarten de Rijke

Adapting Rankers Online

Results (4): realistic model

49

(Figure: results as a function of the number of queries, from 1 to 10k, under the realistic click model; two panels, y-axis from 0 to 1)

Page 50: Adapting Rankers Online, Maarten de Rijke

Adapting Rankers Online

Summary

What?
Methods for evaluating rankers using implicit feedback
Analysis of interleaved comparison methods in terms of bias and sensitivity

And so?
Introduced a new probabilistic interleaved comparison method that is unbiased and sensitive
Experimental analysis: more accurate, with substantially fewer observed queries, and more robust

50

Page 51: Adapting Rankers Online, Maarten de Rijke

Adapting Rankers Online

What’s next here?

Evaluate in a real-life setting in the future
With more reliable and faster convergence, our approach can pave the way for online learning to rank methods that require many comparisons

51

Page 52: Adapting Rankers Online, Maarten de Rijke

Adapting Rankers Online

Wrap-up

52

Page 53: Adapting Rankers Online, Maarten de Rijke

Adapting Rankers Online

Online learning to rank
Emphasis on implicit feedback collected during normal operation of the search engine

Balancing exploration and exploitation

Probabilistic method for inferring preferences from clicks

53

Page 54: Adapting Rankers Online, Maarten de Rijke

Adapting Rankers Online

Information retrieval observatory

Academic experiments on online learning and implicit feedback used simulators
Need to validate the simulators

What’s really needed
Move away from artificial explicit feedback to natural implicit feedback
A shared experimental environment for observing users in the wild as they interact with systems

54

Page 55: Adapting Rankers Online, Maarten de Rijke

Adapting Rankers Online

Adapting Rankers Online Maarten de Rijke, [email protected]

55

Page 56: Adapting Rankers Online, Maarten de Rijke

Adapting Rankers Online 56

(Intentionally left blank)

Page 57: Adapting Rankers Online, Maarten de Rijke

Adapting Rankers Online 57

Bias

(Backup example, identical to page 36: balanced interleaving of l1 = (d1, d2, d3, d4) and l2 = (d2, d3, d4, d1); l2 wins the first comparison, the lists tie for the second, and in expectation l2 wins.)

Page 58: Adapting Rankers Online, Maarten de Rijke

Adapting Rankers Online

Sensitivity

58

(Backup example, identical to page 38: team-draft interleaving of l1 = (d1, d2, d3, d4) and l2 = (d2, d3, d4, d1) with four possible interleaved lists and assignments; for lists a) and b) l1 wins the comparison, and l2 wins in the other two cases.)