Models for Metasearch - Khoury College of Computer Sciences


Models for Metasearch

Javed Aslam


The Metasearch Problem

Search for: chili peppers


Search Engines

Provide a ranked list of documents.
May provide relevance scores.
May have performance information.


Search Engine: Alta Vista


Search Engine: Ultraseek


Search Engine: inq102 TREC3

Queryid (Num): 50
Total number of documents over all queries
  Retrieved: 50000
  Relevant: 9805
  Rel_ret: 7305

Interpolated Recall - Precision Averages:
  at 0.00  0.8992
  at 0.10  0.7514
  at 0.20  0.6584
  at 0.30  0.5724
  at 0.40  0.4982
  at 0.50  0.4272
  at 0.60  0.3521
  at 0.70  0.2915
  at 0.80  0.2173
  at 0.90  0.1336
  at 1.00  0.0115

Average precision (non-interpolated) for all rel docs (averaged over queries)
  0.4226

Precision:
  At 5 docs:     0.7440
  At 10 docs:    0.7220
  At 15 docs:    0.6867
  At 20 docs:    0.6740
  At 30 docs:    0.6267
  At 100 docs:   0.4902
  At 200 docs:   0.3848
  At 500 docs:   0.2401
  At 1000 docs:  0.1461

R-Precision (precision after R (= num_rel for a query) docs retrieved):
  Exact: 0.4524


External Metasearch

[Diagram labels: Metasearch Engine; Search Engine A, Database A; Search Engine B, Database B; Search Engine C, Database C]


Internal Metasearch

[Diagram labels: Metasearch core; Text Module; URL Module; Image Module; HTML Database; Image Database; Search Engine]


Outline

Introduce problem
Characterize problem
Survey current techniques
Describe new approaches
  decision theory, social choice theory
  experiments with TREC data
Upper bounds for metasearch
Future work


Classes of Metasearch Problems

                    no training data              training data
relevance scores    CombMNZ                       LC model
ranks only          Borda, Condorcet, rCombMNZ    Bayes


CombSUM [Fox, Shaw, Lee, et al.]

Normalize scores: [0,1].
For each doc:
  sum relevance scores given to it by each system (use 0 if unretrieved).
Rank documents by score.
Variants: MIN, MAX, MED, ANZ, MNZ


CombMNZ [Fox, Shaw, Lee, et al.]

Normalize scores: [0,1].
For each doc:
  sum relevance scores given to it by each system (use 0 if unretrieved), and
  multiply by number of systems that retrieved it (MNZ).
Rank documents by score.
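
The CombSUM and CombMNZ steps can be sketched in a few lines of Python. This is a minimal illustration, not the original authors' code: the function names are hypothetical, and min-max scaling is only one assumed way to normalize scores into [0,1], since the slide does not say which normalization is used.

```python
from collections import defaultdict

def normalize(scores):
    """Min-max scale one system's scores into [0, 1] (assumed normalization)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def comb_fuse(systems, mnz=False):
    """Fuse per-system {doc: score} dicts with CombSUM (or CombMNZ if mnz=True)."""
    total = defaultdict(float)  # summed normalized scores; unretrieved docs add 0
    hits = defaultdict(int)     # number of systems that retrieved each doc
    for scores in systems:
        for doc, s in normalize(scores).items():
            total[doc] += s
            hits[doc] += 1
    fused = {doc: total[doc] * (hits[doc] if mnz else 1) for doc in total}
    # Rank by fused score, highest first; break ties by doc id for determinism.
    return sorted(fused, key=lambda d: (-fused[d], d))

systems = [
    {"a": 10, "b": 4, "c": 0},
    {"d": 10, "b": 5, "c": 0},
]
comb_sum_order = comb_fuse(systems)
comb_mnz_order = comb_fuse(systems, mnz=True)
```

In this toy input, document b is retrieved by both systems with moderate scores, so CombMNZ promotes it above a and d, while CombSUM leaves it below them.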


How well do they perform?

Need performance metric.
Need benchmark data.


Metric: Average Precision

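Ranked list, top to bottom (R = relevant, N = not relevant): R N R N R N N R

Precision at each relevant document: 1/1, 2/3, 3/5, 4/8

Average precision: (1/1 + 2/3 + 3/5 + 4/8) / 4 = 0.6917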
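
The average precision computation can be checked with a short sketch. It assumes the ranked list reads R N R N R N N R from the top (relevant documents at ranks 1, 3, 5, and 8), which is the ordering consistent with the slide's precisions 1/1, 2/3, 3/5, 4/8. The function name and the optional `num_relevant` parameter are illustrative.

```python
def average_precision(judgments, num_relevant=None):
    """Non-interpolated average precision.

    judgments: booleans, top-ranked document first (True = relevant).
    num_relevant: total relevant docs for the query; defaults to the number
    of relevant docs that appear in the list.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_rel in enumerate(judgments, start=1):
        if is_rel:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant doc
    total = num_relevant if num_relevant is not None else hits
    return precision_sum / total if total else 0.0

ranked = [c == "R" for c in "RNRNRNNR"]
print(round(average_precision(ranked), 4))  # 0.6917
```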


Benchmark Data: TREC

Annual Text Retrieval Conference.
Millions of documents (AP, NYT, etc.)
50 queries.
Dozens of retrieval engines.
Output lists available.
Relevance judgments available.


Data Sets

Data set    Number systems    Number queries    Number of docs
TREC3       40                50                1000
TREC5       61                50                1000
Vogt        10                10                1000
TREC9       105               50                1000


CombX on TREC5 Data


Experiments

Randomly choose n input systems.
For each query:
  combine, trim, calculate avg precision.
Calculate mean avg precision.
Note best input system.
Repeat (statistical significance).


CombMNZ on TREC5


New Approaches [Aslam, Montague]

Analog to decision theory.
  Requires only rank information.
  Training required.

Analog to election strategies.
  Requires only rank information.
  No training required.


Decision Theory

Consider two alternative explanations for some observed data.

Medical example:
  Perform a set of blood tests.
  Does patient have disease or not?

Optimal method for choosing among the explanations: likelihood ratio test. [Neyman-Pearson Lemma]


Metasearch via Decision Theory

Metasearch analogy:
  Observed data – document rank info over all systems.
  Hypotheses – document is relevant or not.

Ratio test:

$$O_{rel} = \frac{\Pr[rel \mid r_1, r_2, \ldots, r_n]}{\Pr[irr \mid r_1, r_2, \ldots, r_n]}$$


Bayesian Analysis

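$$P_{rel} = \Pr[rel \mid r_1, r_2, \ldots, r_n]$$

$$P_{rel} = \frac{\Pr[r_1, r_2, \ldots, r_n \mid rel] \cdot \Pr[rel]}{\Pr[r_1, r_2, \ldots, r_n]}$$

$$O_{rel} = \frac{\Pr[r_1, r_2, \ldots, r_n \mid rel] \cdot \Pr[rel]}{\Pr[r_1, r_2, \ldots, r_n \mid irr] \cdot \Pr[irr]}$$

$$O_{rel} \cong \frac{\Pr[rel] \cdot \prod_i \Pr[r_i \mid rel]}{\Pr[irr] \cdot \prod_i \Pr[r_i \mid irr]}$$

$$LO_{rel} \sim \sum_i \log \frac{\Pr[r_i \mid rel]}{\Pr[r_i \mid irr]}$$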
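
The final log-odds score, the sum over systems of log(Pr[r_i | rel] / Pr[r_i | irr]), can be sketched as follows. This is an illustration under stated assumptions: ranks are collapsed into three hypothetical buckets ("top", "low", "none"), and the probability tables are made-up values standing in for estimates that would come from training data. The function names are not from the slides.

```python
import math

def log_odds_score(ranks, p_rel, p_irr):
    """Sum over systems i of log(Pr[r_i | rel] / Pr[r_i | irr]).

    ranks[i] is the rank bucket system i assigned to the document;
    p_rel[i] and p_irr[i] map buckets to trained probabilities.
    """
    return sum(math.log(p_rel[i][r] / p_irr[i][r]) for i, r in enumerate(ranks))

def bayes_fuse(doc_ranks, p_rel, p_irr):
    """Rank documents by decreasing log-odds of relevance."""
    scores = {d: log_odds_score(r, p_rel, p_irr) for d, r in doc_ranks.items()}
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical per-system tables (illustrative numbers, not real estimates).
table_rel = {"top": 0.6, "low": 0.3, "none": 0.1}
table_irr = {"top": 0.1, "low": 0.3, "none": 0.6}
p_rel = [table_rel, table_rel]
p_irr = [table_irr, table_irr]

doc_ranks = {
    "d1": ("top", "top"),   # both systems rank it highly
    "d2": ("top", "none"),  # the systems disagree
    "d3": ("none", "low"),  # weak evidence from both
}
fused = bayes_fuse(doc_ranks, p_rel, p_irr)
```

The prior ratio Pr[rel] / Pr[irr] is omitted because it is the same for every document and so does not affect the ordering.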


Bayes on TREC3


Bayes on TREC5


Bayes on TREC9


Beautiful theory, but…

In theory, there is no difference between theory and practice; in practice, there is.

–variously: Chuck Reid, Yogi Berra

Issue: independence assumption…


Naïve-Bayes Assumption

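$$O_{rel} = \frac{\Pr[r_1, r_2, \ldots, r_n \mid rel] \cdot \Pr[rel]}{\Pr[r_1, r_2, \ldots, r_n \mid irr] \cdot \Pr[irr]}$$

$$O_{rel} \cong \frac{\Pr[rel] \cdot \prod_i \Pr[r_i \mid rel]}{\Pr[irr] \cdot \prod_i \Pr[r_i \mid irr]}$$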


Bayes on Vogt Data


Election Strategies

Plurality vote.
Approval vote.
Run-off.
Preferential rankings:
  instant run-off,
  Borda count (positional),
  Condorcet method (head-to-head).


Metasearch Analogy

Documents are candidates.
Systems are voters expressing preferential rankings among candidates.


Condorcet Voting

Each ballot ranks all candidates.
Simulate head-to-head run-off between each pair of candidates.
Condorcet winner: candidate that beats all other candidates, head-to-head.


Condorcet Paradox

Voter 1: A, B, C
Voter 2: B, C, A
Voter 3: C, A, B
Cyclic preferences: cycle in Condorcet graph.
Condorcet consistent path: Hamiltonian.
For metasearch: any CC path will do.
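
The cycle can be verified directly from the three ballots. The sketch below builds the edges of the Condorcet graph, with an edge (x, y) whenever a strict majority of voters ranks x above y. The helper name `beats` is illustrative.

```python
from itertools import permutations

ballots = [
    ["A", "B", "C"],  # Voter 1
    ["B", "C", "A"],  # Voter 2
    ["C", "A", "B"],  # Voter 3
]

def beats(x, y, ballots):
    """True if a strict majority of ballots rank x above y."""
    wins = sum(b.index(x) < b.index(y) for b in ballots)
    return wins > len(ballots) - wins

# Edge (x, y) in the Condorcet graph means x beats y head-to-head.
edges = [(x, y) for x, y in permutations("ABC", 2) if beats(x, y, ballots)]
print(edges)  # [('A', 'B'), ('B', 'C'), ('C', 'A')]
```

A beats B, B beats C, and C beats A, so no candidate beats all the others and there is no Condorcet winner.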


Condorcet Consistent Path


Hamiltonian Path Proof

Base Case:

Inductive Step:


Condorcet-fuse: Sorting

Insertion-sort suggested by proof.
Quicksort too; O(n log n) comparisons.
  n documents.
Each comparison: O(m).
  m input systems.
Total: O(m n log n).

Need not compute entire graph.
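
A quicksort-based Condorcet-fuse can be sketched as below. This is a minimal illustration, not the authors' implementation. It assumes that a document a system did not retrieve is ranked below everything that system did retrieve, and it sends ties to the "below pivot" side; both choices, and all function names, are assumptions. Each pairwise comparison is computed on demand in O(m), so the full Condorcet graph is never built.

```python
def majority_prefers(a, b, positions):
    """True if more systems rank a above b than rank b above a.

    positions: one {doc: rank index} dict per system. A doc a system did not
    retrieve is treated as ranked below all docs it did retrieve (assumption).
    """
    a_wins = b_wins = 0
    for pos in positions:
        ra = pos.get(a, float("inf"))
        rb = pos.get(b, float("inf"))
        if ra < rb:
            a_wins += 1
        elif rb < ra:
            b_wins += 1
    return a_wins > b_wins

def condorcet_fuse(ranked_lists):
    """Order documents by quicksort with the majority comparator."""
    positions = [{doc: i for i, doc in enumerate(lst)} for lst in ranked_lists]
    docs = sorted({doc for lst in ranked_lists for doc in lst})

    def quicksort(items):
        if len(items) <= 1:
            return items
        pivot, above, below = items[0], [], []
        for d in items[1:]:
            # Ties go below the pivot (arbitrary tie-breaking assumption).
            (above if majority_prefers(d, pivot, positions) else below).append(d)
        return quicksort(above) + [pivot] + quicksort(below)

    return quicksort(docs)

fused = condorcet_fuse([["a", "b", "c"], ["a", "c", "b"], ["b", "a", "c"]])
partial = condorcet_fuse([["x", "y"], ["y", "x", "z"], ["y", "z"]])
```

Quicksort on average makes O(n log n) comparisons, which gives the O(m n log n) total stated on the slide; the naive first-element pivot used here can degrade to O(n^2) comparisons on unlucky inputs.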


Condorcet-fuse on TREC3


Condorcet-fuse on TREC5


Condorcet-fuse on Vogt


Condorcet-fuse on TREC9


Breaking Cycles

SCCs (strongly connected components) are properly ordered.

How are ties within an SCC broken? (Quicksort)


Upper Bounds on Metasearch

How good can metasearch be?
Are there fundamental limits that methods are approaching?
Need an analog to running time lower bounds…


Upper Bounds on Metasearch

Constrained oracle model:
  omniscient metasearch oracle,
  constraints placed on oracle that any reasonable metasearch technique must obey.

What are “reasonable” constraints?


Naïve Constraint

Naïve constraint:
  Oracle may only return docs from underlying lists.
  Oracle may return these docs in any order.
  Omniscient oracle will return relevant docs above irrelevant docs.
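
The naïve bound for a single query can be sketched as follows: pool every document that appears in any input list, place the relevant ones first, and score the result. This is an illustrative sketch with hypothetical function names. It assumes average precision is divided by the total number of relevant documents for the query, including any that no input system retrieved, and it does not trim the fused list.

```python
def naive_oracle_ranking(ranked_lists, relevant):
    """Best ordering allowed by the naive constraint: relevant docs first."""
    pool, seen = [], set()
    for lst in ranked_lists:
        for doc in lst:
            if doc not in seen:
                seen.add(doc)
                pool.append(doc)
    return ([d for d in pool if d in relevant] +
            [d for d in pool if d not in relevant])

def average_precision(ranking, relevant):
    """Average precision over all relevant docs for the query."""
    hits, precision_sum = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0

lists = [["d1", "d2", "d3"], ["d4", "d2"]]
relevant = {"d2", "d4", "d9"}  # d9 is relevant but retrieved by no system
oracle = naive_oracle_ranking(lists, relevant)
bound = average_precision(oracle, relevant)
```

Here the oracle places d2 and d4 at ranks 1 and 2, but the bound is 2/3 rather than 1 because d9 never appears in any underlying list.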


TREC5: Naïve Bound


Pareto Constraint

Pareto constraint:
  Oracle may only return docs from underlying lists.
  Oracle must respect unanimous will of underlying systems.
  Omniscient oracle will return relevant docs above irrelevant docs, subject to the above constraint.


TREC5: Pareto Bound


Majoritarian Constraint

Majoritarian constraint:
  Oracle may only return docs from underlying lists.
  Oracle must respect majority will of underlying systems.
  Omniscient oracle will return relevant docs above irrelevant docs and break cycles optimally, subject to the above constraint.


TREC5: Majoritarian Bound


Upper Bounds: TREC3


Upper Bounds: Vogt


Upper Bounds: TREC9


TREC8: Avg Prec vs Feedback


TREC8: System Assessments vs TREC


Metasearch Engines

Query multiple search engines.
May or may not combine results.


Metasearch: Dogpile


Metasearch: Metacrawler


Metasearch: Profusion


Characterizing Metasearch

Three axes:
  common vs. disjoint database,
  relevance scores vs. ranks,
  training data vs. no training data.


Axis 1: DB Overlap

High overlap: data fusion.
Low overlap: collection fusion (distributed retrieval).
Very different techniques for each…
This work: data fusion.


CombMNZ on TREC3


CombMNZ on Vogt


CombMNZ on TREC9


Borda Count

Consider an n candidate election.
For each ballot:
  assign n points to top candidate,
  assign n-1 points to next candidate,
  …

Rank candidates by point sum.
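
The Borda count can be sketched in a few lines. This minimal version assumes every ballot ranks all n candidates, as the slide describes; handling partial ballots (as metasearch inputs usually are) would need an extra rule that is not shown here. The function name is illustrative.

```python
from collections import defaultdict

def borda_count(ballots):
    """Return (ranking, points) for complete preferential ballots.

    The top candidate on a ballot of n candidates gets n points,
    the next gets n-1, and so on down to 1.
    """
    points = defaultdict(int)
    for ballot in ballots:
        n = len(ballot)
        for position, candidate in enumerate(ballot):
            points[candidate] += n - position
    # Highest point sum first; ties broken by candidate name.
    ranking = sorted(points, key=lambda c: (-points[c], c))
    return ranking, dict(points)

ranking, points = borda_count([
    ["A", "B", "C"],
    ["A", "C", "B"],
    ["B", "C", "A"],
])
```

For these three ballots the point sums are A = 7, B = 6, C = 5.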


Borda Count: Election 2000

Ideological order: Nader, Gore, Bush.
Ideological voting:
  Bush voter: Bush, Gore, Nader.
  Nader voter: Nader, Gore, Bush.
  Gore voter:
    Gore, Bush, Nader.
    Gore, Nader, Bush.
    50/50, 100/0


Election 2000: Ideological Florida Voting

Gore voter split    Gore          Bush          Nader
50/50               14,734,379    13,185,542    7,560,864
100/0               14,734,379    14,639,267    6,107,138

Gore Wins


Borda Count: Election 2000

Ideological order: Nader, Gore, Bush.
Manipulative voting:
  Bush voter: Bush, Nader, Gore.
  Gore voter: Gore, Nader, Bush.
  Nader voter: Nader, Gore, Bush.


Election 2000: Manipulative Florida Voting

Gore          Bush          Nader
11,825,203    11,731,816    11,923,765

Nader Wins


Future Work

Bayes
  approximate dependence.
Condorcet
  weighting, dependence.
Upper bounds
  other constraints.
Meta-retrieval
  Metasearch is approaching fundamental limits.
  Need to incorporate user feedback: learning…