
Diversifying Search Results

WSDM 2009

Intelligent Database Systems Lab.

School of Computer Science & Engineering

Seoul National University

Center for E-Business Technology, Seoul National University, Seoul, Korea

Presented by Sung Eun Park, 1/25/2011

Rakesh Agrawal, Sreenivas Gollapudi, Alan Halverson, Samuel Ieong (Microsoft Research)


Contents

Introduction

Intuition

Preliminaries

Model

Problem Formulation

Complexity

Greedy algorithm

Evaluation

Measure

Empirical analysis


Introduction

Ambiguity and diversification

For ambiguous queries, diversification may help users find at least one relevant document

Ex) The other day, we were trying to find the meaning of the word "왕건".

– In the context of "우와 저거 진짜 왕건이다" ("Wow, that one is a real whopper"), where "왕건" is slang for a big thing

– But the search results were all about Wang Geon (왕건), the king who founded Goryeo

[Slide images: King Wang Geon vs. "왕건" as a big thing]


Preliminaries

Model: a taxonomy C of categories (intents); for a query q, P(c | q) is the distribution of intents behind q, and V(d | q, c) is the probability that document d satisfies a user who issued q with intent c.


Problem Formulation

The probability that document d fails to satisfy a user who issues query q with intended category c is $1 - V(d \mid q, c)$.

Multiple intents: for a result set S, the probability that some document in S satisfies intent c is $1 - \prod_{d \in S} \big(1 - V(d \mid q, c)\big)$.

Objective (DIVERSIFY(k)): choose S with |S| = k maximizing
$$P(S \mid q) = \sum_{c} P(c \mid q) \Big(1 - \prod_{d \in S} \big(1 - V(d \mid q, c)\big)\Big)$$
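To make the objective concrete, here is a minimal Python sketch (the names and numbers are illustrative, not from the paper) that brute-forces the best two-document set under P(S | q):

```python
from itertools import combinations

def p_satisfied(S, P_cq, V):
    """P(S|q): probability that set S satisfies the average user of query q.

    P_cq: dict category -> P(c|q)
    V:    dict (doc, category) -> V(d|q,c); missing pairs count as 0
    """
    total = 0.0
    for c, p_c in P_cq.items():
        p_all_fail = 1.0
        for d in S:
            p_all_fail *= 1.0 - V.get((d, c), 0.0)
        total += p_c * (1.0 - p_all_fail)
    return total

# Toy example: two intents, three documents, pick k = 2 by brute force.
P_cq = {"king": 0.8, "big_thing": 0.2}
V = {("d1", "king"): 0.9, ("d2", "king"): 0.8, ("d3", "big_thing"): 0.7}
best = max(combinations(["d1", "d2", "d3"], 2),
           key=lambda S: p_satisfied(S, P_cq, V))
print(best, p_satisfied(best, P_cq, V))
```

With these toy numbers the best pair is (d1, d3) with P(S | q) = 0.86: mixing intents beats taking the two highest-quality "king" documents (0.784), which is exactly the effect diversification is after.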


Complexity

The DIVERSIFY(k) problem is NP-hard. However, the objective P(S | q) is submodular, so the greedy algorithm on the next slide comes with a (1 − 1/e) approximation guarantee.

A Greedy Algorithm

Let R(q) be the top k documents selected by some classical ranking algorithm for the target query. The algorithm reorders R(q) to maximize the objective P(S | q).

Input: k, q, C, D, P(c | q), V(d | q, c); Output: set of documents S

[Slide worked example: a table of V(d | q, c) values and the marginal utilities g(d | q, c) = U(c | q) · V(d | q, c), comparing the utility U(R | q) of the original ranking against U(B | q) for the reordered set as documents are greedily picked.]

• Produces an ordered set of results

• Results are not proportional to the intent distribution

• Results are not ordered by raw quality


Greedy Algorithm (IA-SELECT)

Input: k, q, C, D, P(c | q), V (d | q, c)

Output : set of documents S

When documents may belong to multiple categories, IA-SELECT is no longer guaranteed to be optimal (note the underlying problem is NP-hard).

S ← ∅
∀c ∈ C, U(c | q) ← P(c | q)
while |S| < k do
  for d ∈ D do
    g(d | q, c) ← Σ_c U(c | q) · V(d | q, c)
  end for
  d* ← argmax_d g(d | q, c)
  S ← S ∪ {d*}
  ∀c ∈ C, U(c | q) ← (1 − V(d* | q, c)) · U(c | q)
  D ← D \ {d*}
end while

Marginal Utility

U(c | q): probability that intent c of query q is still unsatisfied by the documents selected so far (initialized to P(c | q))
g(d | q, c): current marginal probability that d satisfies query q with intent c
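A minimal Python sketch of IA-SELECT as reconstructed above (the dictionary-based inputs are my assumption about how to represent P(c | q) and V(d | q, c)):

```python
def ia_select(k, R, categories, P_cq, V):
    """Greedy IA-SELECT: reorder candidates R to maximize P(S|q).

    R:          candidate documents (e.g. top k of a classical ranker)
    categories: the intents C of the query
    P_cq:       dict c -> P(c|q)
    V:          dict (d, c) -> V(d|q,c); missing pairs count as 0
    """
    S = []
    U = dict(P_cq)          # U(c|q), initialized to P(c|q)
    D = list(R)
    while len(S) < k and D:
        # marginal utility g(d|q) = sum_c U(c|q) * V(d|q,c)
        def g(d):
            return sum(U[c] * V.get((d, c), 0.0) for c in categories)
        d_star = max(D, key=g)
        S.append(d_star)
        # discount the intents that d* already satisfies
        for c in categories:
            U[c] *= 1.0 - V.get((d_star, c), 0.0)
        D.remove(d_star)
    return S
```

On the toy numbers from the Problem Formulation sketch, ia_select(2, ["d1", "d2", "d3"], ...) picks d1 first (g = 0.72), after which U(king | q) drops to 0.08, so d3 (g = 0.14) beats d2 (g = 0.064): the greedy order matches the brute-force optimum.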


Classical IR Measures (1)

Result Doc Set (graded relevance):
1. Doc 1, rel=3
2. Doc 2, rel=3
3. Doc 3, rel=2
4. Doc 4, rel=0
5. Doc 5, rel=1
6. Doc 6, rel=2
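Graded labels like these are the usual input to DCG/NDCG, which the evaluation later extends to NDCG-IA; a minimal sketch, assuming the common 2^rel − 1 gain and log2 discount:

```python
import math

def dcg(rels):
    """Discounted cumulative gain with 2^rel - 1 gain and log2 rank discount."""
    return sum((2**r - 1) / math.log2(i + 2) for i, r in enumerate(rels))

def ndcg(rels):
    """DCG normalized by the DCG of the ideal (sorted) ranking."""
    ideal = dcg(sorted(rels, reverse=True))
    return dcg(rels) / ideal if ideal > 0 else 0.0

print(ndcg([3, 3, 2, 0, 1, 2]))  # the slide's result doc set
```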


Classical IR Measures (2)

RR, MRR

Navigational search / question answering

– The user needs only one or a few high-ranked results

Reciprocal Rank

– How far is the answer document from rank 1? RR = 1 / (rank of the first relevant document)

Example) first relevant document at rank 2 → RR = ½ = 0.5

Mean Reciprocal Rank

– Mean of the RR over the query test set

Result Doc Set (P = relevant, N = non-relevant):
1. Doc N
2. Doc P
3. Doc N
4. Doc N
5. Doc N
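A minimal sketch of RR and MRR with binary labels, matching the P/N example above:

```python
def reciprocal_rank(results):
    """RR: 1/rank of the first relevant result, 0 if there is none."""
    for rank, relevant in enumerate(results, start=1):
        if relevant:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(result_lists):
    """MRR: mean of RR over the result lists of a query test set."""
    return sum(reciprocal_rank(r) for r in result_lists) / len(result_lists)

print(reciprocal_rank([False, True, False, False, False]))  # 0.5, the slide's example
```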


Classical IR Measures (3)

MAP

Average Precision

– Sum of the precision values at the ranks of the relevant documents, divided by the total number of relevant documents

– Example) (1.00 + 1.00 + 0.75 + 0.67 + 0.38) / 6 = 0.633

Mean Average Precision

– Average of the average-precision values over a set of queries

– MAP = (AP1 + AP2 + ... + APn) / (# of queries)
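A minimal sketch of AP and MAP as defined above (binary relevance; num_relevant is the total number of relevant documents for the query, so unretrieved relevant documents still count in the denominator):

```python
def average_precision(results, num_relevant):
    """AP: sum of precision@rank at each relevant result,
    divided by the total number of relevant documents."""
    hits, total = 0, 0.0
    for rank, relevant in enumerate(results, start=1):
        if relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / num_relevant if num_relevant else 0.0

def mean_average_precision(runs):
    """MAP: mean AP over (results, num_relevant) pairs, one per query."""
    return sum(average_precision(r, n) for r, n in runs) / len(runs)
```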


Evaluation Measure

Intent-aware versions of the classical measures: compute the measure per intent c and weight by the intent distribution, e.g. NDCG-IA(S, k) = Σ_c P(c | q) · NDCG(S, k | c); MAP-IA and MRR-IA are defined analogously.
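A minimal sketch of the intent-aware weighting; the measure argument could be the ndcg function from the earlier sketch, and the dictionary shapes are my assumption:

```python
def intent_aware(measure, P_cq, rels_by_intent):
    """Measure-IA: expectation of a per-intent measure under P(c|q).

    measure:        function(relevance labels for one intent) -> float
    P_cq:           dict c -> P(c|q)
    rels_by_intent: dict c -> relevance labels of the ranking under intent c
    """
    return sum(p * measure(rels_by_intent[c]) for c, p in P_cq.items())
```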


Empirical Evaluation

10,000 queries randomly sampled from logs

Queries classified according to ODP (level 2)

Keep only queries with at least two intents (~900)

Top 50 results from Live, Google, and Yahoo!

Documents are rated on a 5-point scale; >90% of docs have ratings

Docs without ratings are assigned a random grade according to the distribution of rated documents

[Slide diagram: a query classifier assigns intents (ODP categories) to each query; per-category document judgments come from a proprietary repository of human judgments.]


Results

[Slide charts: NDCG-IA; MAP-IA and MRR-IA.]


Evaluation using Mechanical Turk

Sample 200 queries from the dataset used in Experiment 1

[Slide figure: the query shown alongside its candidate categories (category1, category2, category3, ...).]

Workers pick the category they most closely associate with the given query.

Result Doc Set:
1. Doc 1, rel=?
2. Doc 2, rel=?
3. Doc 3, rel=?
4. Doc 4, rel=?
5. Doc 5, rel=?

Workers then judge the corresponding results with respect to the chosen category, using the same 4-point scale.


Evaluation using Mechanical Turk

[Slide charts: results of the Mechanical Turk evaluation.]

Conclusion

Studied how best to diversify results in the presence of ambiguous queries

Provided a greedy algorithm for the objective with good approximation guarantees

Q&A

Thank you
