Diversifying Search Result WSDM 2009 Intelligent Database Systems Lab. School of Computer Science & Engineering Seoul National University Center for E-Business Technology Seoul National University Seoul, Korea Presented by Sung Eun, Park 1/25/2011 Rakesh Agrawal, Sreenivas Gollapudi, Alan Halverson, Samuel Ieong Microsoft Research


Page 1:

Diversifying Search Results

WSDM 2009

Intelligent Database Systems Lab.

School of Computer Science & Engineering

Seoul National University

Center for E-Business Technology

Seoul National University

Seoul, Korea

Presented by Sung Eun, Park

1/25/2011

Rakesh Agrawal, Sreenivas Gollapudi, Alan Halverson, Samuel Ieong

Microsoft Research

Page 2:

Copyright 2010 by CEBT

Contents

Introduction

Intuition

Preliminaries

Model

Problem Formulation

Complexity

Greedy algorithm

Evaluation

Measure

Empirical analysis

Page 3:

Introduction

Ambiguity and diversification

For ambiguous queries, diversification may help users find at least one relevant document

Ex) The other day, we were trying to find the meaning of the word “왕건” (Wang Geon).

– In the context of “우와 저거 진짜 왕건이다” (“Wow, that one is a real 왕건”)

– But the search results were all about King Wang Geon of Goryeo

[Figure: King Wang Geon (왕건) vs. 왕건 as slang for a big thing]

Page 4:

Preliminaries

Page 5:

Problem Formulation

1 − V(d | q, c): the probability that d fails to satisfy a user who issues query q with intended category c

Multiple intents

The probability that some document in S will satisfy category c
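The formula these labels annotate did not survive in the transcript. A sketch of the objective, reconstructed from the labels above and from the U(c | q) update in IA-SELECT (so it should be read as an assumption about what the slide showed), is:

```latex
% Probability that the selected set S satisfies the average user issuing q.
% (1 - V(d | q, c)) : d fails to satisfy intent c
% \prod_{d \in S}   : every document in S fails for intent c
% 1 - \prod         : some document in S satisfies intent c
P(S \mid q) \;=\; \sum_{c \in C} P(c \mid q)
  \left( 1 - \prod_{d \in S} \bigl( 1 - V(d \mid q, c) \bigr) \right)
```

The task is then to choose S ⊆ D with |S| = k that maximizes P(S | q).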

Page 6:

Complexity

Page 7:

A Greedy Algorithm

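R(q): the top k documents selected by some classical ranking algorithm for the target query

The algorithm reorders R(q) to maximize the objective P(S | q)

Input: k, q, C, D, P(c | q), V(d | q, c)

Output: set of documents S

Example with two intents, U(R | q) = 0.8 and U(B | q) = 0.2

D     c    V(d | q, c)    g(d | q, c)
d1    B    0.4            0.4 × 0.2 = 0.08
d2    R    0.9            0.9 × 0.8 = 0.72
d3    R    0.5            0.5 × 0.8 = 0.40
d4    R    0.4            0.4 × 0.8 = 0.32
d5    B    0.4            0.4 × 0.2 = 0.08

Step 1: select the document with g = 0.72 (V = 0.9); U(R | q) ← 0.8 × (1 − 0.9) = 0.08

Step 2: g is now 0.08, 0.04, 0.03, 0.08; select a document with g = 0.08 (V = 0.4); U(B | q) ← 0.2 × (1 − 0.4) = 0.12

Step 3: g is now 0.04, 0.03, 0.05; select the document with g = 0.05 (V = 0.4); U(B | q) ← 0.12 × (1 − 0.4) = 0.07

S: documents with V = 0.9, 0.4, 0.4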

• Produces an ordered set of results

• Results not proportional to intent distribution

• Results not according to (raw) quality

Page 8:

Greedy Algorithm (IA-SELECT)

Input: k, q, C, D, P(c | q), V (d | q, c)

Output : set of documents S

When documents may belong to multiple categories, IA-SELECT is no longer guaranteed to be optimal. (Note that this problem is NP-hard.)

S = ∅
∀c ∈ C, U(c | q) ← P(c | q)
while |S| < k do
    for d ∈ D do
        g(d | q, c) ← Σ_c U(c | q) V(d | q, c)
    end for
    d∗ ← argmax g(d | q, c)
    S ← S ∪ {d∗}
    ∀c ∈ C, U(c | q) ← (1 − V(d∗ | q, c)) U(c | q)
    D ← D \ {d∗}
end while

Marginal Utility

U(c | q): conditional prob. of intent c given query q

g(d | q, c): current prob. of d satisfying q, c
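The greedy loop can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `ia_select`, the dictionary layout, and the document ids `d1`..`d5` are hypothetical, and the assignment of each document to intent R or B is inferred from the numbers in the worked example on the previous slide.

```python
from typing import Dict, List, Tuple


def ia_select(
    k: int,
    p_intent: Dict[str, float],
    value: Dict[str, Dict[str, float]],
) -> Tuple[List[str], Dict[str, float]]:
    """Greedy selection sketch.

    p_intent[c] plays the role of P(c | q); value[d][c] plays the role
    of V(d | q, c), with missing categories treated as 0.
    """
    u = dict(p_intent)            # U(c | q), initialised to P(c | q)
    remaining = list(value)       # candidate documents D
    selected: List[str] = []      # ordered result S

    def gain(d: str) -> float:
        # Marginal utility g(d | q, c) = sum over c of U(c | q) * V(d | q, c)
        return sum(u[c] * value[d].get(c, 0.0) for c in u)

    while len(selected) < k and remaining:
        best = max(remaining, key=gain)   # ties go to the earlier document
        selected.append(best)
        for c in u:
            # Discount each intent by how well `best` already covers it
            u[c] *= 1.0 - value[best].get(c, 0.0)
        remaining.remove(best)
    return selected, u


# Data assumed from the worked example: two intents, five documents.
P = {"R": 0.8, "B": 0.2}
V = {
    "d1": {"B": 0.4},
    "d2": {"R": 0.9},
    "d3": {"R": 0.5},
    "d4": {"R": 0.4},
    "d5": {"B": 0.4},
}
selected, u = ia_select(3, P, V)
print(selected, u)
```

With these inputs the second and third picks both come from intent B even though P(R | q) = 0.8, which is the behaviour the bullets "not proportional to intent distribution" and "not according to (raw) quality" describe.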

Page 9:

Classical IR Measures(1)

Result Doc Set

1. Doc 1, rel=3
2. Doc 2, rel=3
3. Doc 3, rel=2
4. Doc 4, rel=0
5. Doc 5, rel=1
6. Doc 6, rel=2
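The measure this slide illustrates is not named in the transcript. Assuming it is NDCG (the deck later reports NDCG-IA), a minimal sketch over the graded list above could look like this. The discount `1 / log2(rank + 1)` is one common DCG variant and is an assumption here; the function names are hypothetical.

```python
import math
from typing import List


def dcg(grades: List[int]) -> float:
    # Discounted cumulative gain with a log2(rank + 1) discount (assumed variant)
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(grades, start=1))


def ndcg(grades: List[int]) -> float:
    # Normalise by the DCG of the ideal (descending) ordering of the same grades
    ideal = dcg(sorted(grades, reverse=True))
    return dcg(grades) / ideal if ideal > 0 else 0.0


# Relevance grades of Doc 1..Doc 6 from the result set above
grades = [3, 3, 2, 0, 1, 2]
score = ndcg(grades)
print(round(score, 3))
```

The score is below 1 only because Doc 4 (rel=0) is ranked ahead of Doc 5 and Doc 6.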

Page 10:

Classical IR Measures(2)

RR, MRR

Navigational Search / Question Answering

– A need for a few high-ranked results

Reciprocal Rank

– How far is an answer document from rank 1?

Example) 1/2 = 0.5

Mean Reciprocal Rank

– Mean of the RR over the query test set

Result Doc Set (P marks the answer document)

1. Doc N
2. Doc P
3. Doc N
4. Doc N
5. Doc N
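A small sketch of RR and MRR, assuming each result list is given as booleans where True marks an answer document (the function names are illustrative, not from the slides):

```python
from typing import List


def reciprocal_rank(is_answer: List[bool]) -> float:
    # 1 / rank of the first answer document, or 0 if none is retrieved
    for rank, hit in enumerate(is_answer, start=1):
        if hit:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(result_lists: List[List[bool]]) -> float:
    # Mean of RR over the query test set
    return sum(reciprocal_rank(r) for r in result_lists) / len(result_lists)


# The slide's example: N, P, N, N, N -> first answer at rank 2
rr = reciprocal_rank([False, True, False, False, False])
print(rr)
```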

Page 11:

Classical IR Measures(3)

MAP

Average Precision

– ( 1.00 + 1.00 + 0.75 + 0.67 + 0.38 ) / 6 = 0.633

Mean Average Precision

– Average of the average precision value for a set of queries

– MAP = ( AP1 + AP2 + ... + APn ) / (# of Queries)
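The AP arithmetic above can be reproduced with a short sketch. The ranking itself is not in the transcript; the ranks used below (relevant documents at positions 1, 2, 4, 6 and 13, with 6 relevant documents in total) are an assumption chosen because they yield the precisions 1.00, 1.00, 0.75, 0.67 and 0.38.

```python
from typing import List


def average_precision(relevant_flags: List[bool], total_relevant: int) -> float:
    # Sum precision@rank at each relevant hit, divided by all relevant docs
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevant_flags, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / total_relevant if total_relevant else 0.0


def mean_average_precision(ap_values: List[float]) -> float:
    # MAP = (AP1 + AP2 + ... + APn) / (# of queries)
    return sum(ap_values) / len(ap_values)


# Hypothetical ranking of 13 results; one relevant document is never retrieved
relevant_ranks = {1, 2, 4, 6, 13}
ranking = [rank in relevant_ranks for rank in range(1, 14)]
ap = average_precision(ranking, total_relevant=6)
print(round(ap, 3))
```

Dividing by the total number of relevant documents (6) rather than the number retrieved (5) is what penalises the missing document.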

Page 12:

Evaluation Measure
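The content of this slide did not survive in the transcript. The Results slide names NDCG-IA, MAP-IA and MRR-IA; a sketch of the intent-aware idea, assuming each "-IA" measure is the classical measure computed per intent and averaged with weights P(c | q), is shown below. The helper names and the toy data are hypothetical.

```python
from typing import Callable, Dict, List


def reciprocal_rank(is_answer: List[bool]) -> float:
    # Classical RR, used here only as an example per-intent measure
    for rank, hit in enumerate(is_answer, start=1):
        if hit:
            return 1.0 / rank
    return 0.0


def intent_aware(
    metric: Callable[[List[bool]], float],
    p_intent: Dict[str, float],
    judgments_by_intent: Dict[str, List[bool]],
) -> float:
    # Expected value of the classical metric over the intent distribution
    return sum(p * metric(judgments_by_intent[c]) for c, p in p_intent.items())


# Toy data: the same ranked list judged separately under each intent
p_intent = {"R": 0.8, "B": 0.2}
judgments = {
    "R": [False, True, False],   # first R-relevant result at rank 2
    "B": [True, False, False],   # first B-relevant result at rank 1
}
mrr_ia = intent_aware(reciprocal_rank, p_intent, judgments)
print(mrr_ia)
```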

Page 13:

Empirical Evaluation

10,000 queries randomly sampled from logs

Queries classified according to ODP (level 2)

Keep only queries with at least two intents (~900)

Top 50 results from Live, Google, and Yahoo!

Documents are rated on a 5-pt scale

>90% of docs have ratings

Docs without ratings are assigned a random grade according to the distribution of rated documents

[Diagram: query → intents (categories) via a query classifier over ODP; category → doc via a proprietary repository of human judgment]

Page 14:

Results

NDCG-IA

MAP-IA and MRR-IA

Page 15:

Evaluation using Mechanical Turk

Sample 200 queries from the dataset used in Experiment 1

Workers are shown a query with its candidate categories (category1, category2, category3) and pick a category they most closely associate with the given query

Result Doc Set

1. Doc 1, rel=?
2. Doc 2, rel=?
3. Doc 3, rel=?
4. Doc 4, rel=?
5. Doc 5, rel=?

They then judge the corresponding results with respect to the chosen category using the same 4-point scale

Page 16:

Page 17:

Evaluation using Mechanical Turk

Page 18:

Conclusion

Studied how best to diversify results in the presence of ambiguous queries

Provided a greedy algorithm for the objective with good approximation guarantees

Page 19:

Q&A

Thank you
