Lecture 9: Rank Aggregation in MetaSearch MetaSearch Engine Social Choice Rules Rank Aggregation

Post on 21-Dec-2015


Lecture 9: Rank Aggregation in MetaSearch

• MetaSearch Engine
• Social Choice Rules
• Rank Aggregation

Choices of Search Engines

• Many search engines exist to compete for users
– The results are not necessarily the same
– Different users prefer different search engines
– Search results may, in the future, be biased towards paid advertisements.

MetaSearch Engine

• Metasearch engines are designed to increase coverage of the web by forwarding users' queries to multiple search engines
– Users' requests are sent to multiple search engines such as AlltheWeb, Google, MSN.

• The results from the individual search engines are then combined into a single result set that is presented to users.

Different Forms of MetaSearch

• Submit different representations of the same query to the same search engine, then combine the results.

• Submit the same query to several search engines that adopt different information retrieval models, then combine the results.

Issues

• How to combine the results retrieved by different source search engines is crucial for the success of a metasearch engine.

• This is the problem that social choice theory has been trying to answer.

Search Engine Watch

• Interesting metasearch engines are listed at
– http://www.searchenginewatch.com/links/article.php/2156241

Social Choice Theory

• The study of protocols that help a group of people make collective decisions, such as voting.

A Fundamental Problem

• Given a collection of agents (voters)
– with preferences over different alternatives (allocations, outcomes),

• how should society evaluate these alternatives and make a decision for all
– one that may accord with the will of some voters but go against that of others?

Applications

• Voters elect a president from several candidates.

• National polls on the economic or political policy of the government

• The procedure or rule of an election

• The ranking of a metasearch engine, obtained from the rankings of the individual search engines

Group Decisions

How do we make decisions?

• Flip a coin?

• Dictatorship?

• Democracy (Majority rule)?

Group Decision Rules

• Majority rule

• Condorcet paradox (voting cycle)

• Borda rule

Mathematical model

• A set of voters V = {v1, v2, v3, …, vn}

• A set of alternatives or outcomes S = {s1, s2, s3, …, sm}, with |S| = m; and

• A set of preference relations P = {R1, R2, R3, …, Rn}, called a preference profile,
– the preference relation Ri of each voter i is a permutation (ordering) of the elements of S.

Example 1 Majority Rule

• 3 rational people have rational preferences over 2 alternatives {x, y}

Pref.   Person 1   Person 2   Person 3
1st     X          Y          X
2nd     Y          X          Y

i.e. Person 1: X>Y; Person 2: Y>X; Person 3: X>Y.

How to aggregate their preferences? How to choose?

• Use majority rule.

• Since more than half of the people (two out of three) prefer x to y,

• the group prefers x to y.

Example 2 Condorcet Paradox

• 3 rational people have rational preferences over 3 alternatives {x, y, z}

Pref.   Person 1   Person 2   Person 3
1st     X          Y          Z
2nd     Y          Z          X
3rd     Z          X          Y

i.e. Person 1: X>Y>Z; Person 2: Y>Z>X; Person 3: Z>X>Y.

Binary/Paired Comparison with Majority Rule

Pref.   Person 1   Person 2   Person 3
1st     X          Y          Z
2nd     Y          Z          X
3rd     Z          X          Y

• For (X,Y): Person 1: X>Y; Person 2: Y>X; Person 3: X>Y. So the group prefers X>Y.

• Similarly, for (Y,Z) we get Y>Z; for (Z,X) we get Z>X.

• Then X>Y>Z>X (cycling): the group preference is intransitive, hence not rational.

• Condorcet noted in the 18th century that there may be no alternative that wins a majority against every other alternative.

• Pairwise majority is therefore not satisfactory in all cases.
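The pairwise comparisons of Examples 1 and 2 can be checked mechanically. The following is a minimal sketch, assuming each voter's preference is given as a Python list from most to least preferred; the function names `pairwise_majority` and `condorcet_winner` are illustrative and do not come from the lecture.

```python
from itertools import combinations


def pairwise_majority(profile):
    """Map each ordered pair (a, b) to the number of voters ranking a above b."""
    alternatives = profile[0]
    votes = {}
    for a, b in combinations(alternatives, 2):
        a_over_b = sum(r.index(a) < r.index(b) for r in profile)
        votes[(a, b)] = a_over_b
        votes[(b, a)] = len(profile) - a_over_b
    return votes


def condorcet_winner(profile):
    """Return the alternative that beats every other one by strict majority, or None."""
    votes = pairwise_majority(profile)
    n = len(profile)
    for a in profile[0]:
        if all(votes[(a, b)] > n / 2 for b in profile[0] if b != a):
            return a
    return None


# Example 1: two alternatives, majority rule picks X.
example1 = [["X", "Y"], ["Y", "X"], ["X", "Y"]]
# Example 2: the Condorcet paradox profile, X>Y, Y>Z and Z>X each win 2 to 1.
example2 = [["X", "Y", "Z"], ["Y", "Z", "X"], ["Z", "X", "Y"]]

print(condorcet_winner(example1))
print(condorcet_winner(example2))
```

For Example 2 no alternative beats both others, which is exactly the voting cycle described above.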

Example 3 Borda Rule

• For each voter,
– associate the number 1 with the most preferred alternative,
– 2 with the second, and so on.

• Assign to each alternative the number equal to
– the sum of the numbers the individual voters assigned to that alternative.

• The alternative with the smallest sum is ranked first.

Pref.   Person 1   Person 2   Person 3   Borda sum   Group ranking
1st     X(1)       Y(1)       X(1)       X(4)        X
2nd     Y(2)       X(2)       W(2)       Y(7)        Y
3rd     Z(3)       W(3)       Z(3)       Z(10)       W
4th     W(4)       Z(4)       Y(4)       W(9)        Z

Then we get the choice X>Y>W>Z.

• For the above example, if we use binary/paired comparison with majority rule, we get

X>Y in 2 out of 3, Y>W in 2 out of 3,

W>Z in 2 out of 3, X>W in 3 out of 3,

X>Z in 3 out of 3, Y>Z in 2 out of 3

So we arrive at the same choice: X>Y>W>Z

• For the previous example (Example 2), where majority rule via binary/paired comparison ran into a cycle, Borda's rule gives a tie between all three alternatives:
– All three alternatives get a sum of 6.
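The Borda sums quoted for Example 3 and for the Condorcet paradox profile can be reproduced with a few lines of code. This is a minimal sketch, assuming each voter's ranking is a list from most to least preferred; the names `borda_sums` and `borda_ranking` are illustrative only.

```python
def borda_sums(profile):
    """Sum, for each alternative, the positions (1 = most preferred) given by all voters."""
    sums = {}
    for ranking in profile:
        for position, alternative in enumerate(ranking, start=1):
            sums[alternative] = sums.get(alternative, 0) + position
    return sums


def borda_ranking(profile):
    """Order alternatives by increasing Borda sum (smaller sum = more preferred)."""
    sums = borda_sums(profile)
    # sorted() is stable, so tied alternatives keep their first-seen order.
    return sorted(sums, key=lambda alternative: sums[alternative])


# Example 3: persons 1, 2, 3 rank four alternatives.
example3 = [list("XYZW"), list("YXWZ"), list("XWZY")]
# Example 2: the Condorcet paradox profile.
example2 = [list("XYZ"), list("YZX"), list("ZXY")]

print(borda_sums(example3), borda_ranking(example3))
print(borda_sums(example2))
```

The first profile yields the sums X=4, Y=7, W=9, Z=10, and the second gives every alternative a sum of 6, so Borda's rule reports a three-way tie rather than a cycle.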

• Some variations

1. With relevance scores available: allot each input system p points, to be distributed among the documents according to their relevance scores.

2. Weighted Borda rule: each voter need not have an equal effect on the final result. We may give more weight to good-quality input systems.


• Condorcet winner algorithm

It also comes from social choice theory. The Condorcet algorithm says that any candidate that can beat all other candidates in a head-to-head contest (pair-wise comparison) should win the election.

• Step 1: Construct the Condorcet graph. For each candidate pair (x, y), there is an edge from x to y if x would receive at least as many votes as y in a head-to-head contest.

In the Condorcet graph there is at least one directed edge between every pair of candidates (we call such a graph semi-complete).

The graph may contain cycles. This is due to the voting paradox of Condorcet voting.

• Step 2: Form a new acyclic graph from the old cyclic one by contracting all of the nodes of each strongly connected component into a single node. The result is the strongly connected component graph (SCCG).

A directed graph is strongly connected if for any two nodes u and v there are paths from u to v and from v to u.

Definition of strongly connected component (SCC): a strongly connected subgraph S of a directed graph D such that no vertex or subset of vertices of D can be added to S while keeping the new subgraph strongly connected.

The graph is totally orderable at the level of the SCCs, and each SCC is a "pocket" of cycles within which the candidates are tied. (Why?)

• Step 3: A Condorcet-consistent Hamiltonian path is any Hamiltonian path through the Condorcet graph.

Definition of Hamiltonian path: a path in a graph that visits each vertex exactly once.
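Steps 1 and 2 can be sketched in code as follows. This is only an illustration under stated assumptions: every voter ranks all candidates, the SCCs are found with Kosaraju's algorithm (a standard choice, not one named in the lecture), and the function names are hypothetical. Listing the SCCs in topological order, with any order inside an SCC, gives one Condorcet-consistent ordering.

```python
from itertools import permutations


def condorcet_graph(profile):
    """Edge x -> y iff x gets at least as many head-to-head votes as y."""
    candidates = profile[0]
    edges = {c: set() for c in candidates}
    for x, y in permutations(candidates, 2):
        x_votes = sum(r.index(x) < r.index(y) for r in profile)
        if x_votes >= len(profile) - x_votes:
            edges[x].add(y)
    return edges


def strongly_connected_components(edges):
    """Kosaraju's algorithm; SCCs are returned in topological order of the SCCG."""
    order, seen = [], set()

    def visit(u):  # first pass: record nodes by DFS finishing time
        seen.add(u)
        for v in edges[u]:
            if v not in seen:
                visit(v)
        order.append(u)

    for u in edges:
        if u not in seen:
            visit(u)

    reverse = {u: set() for u in edges}
    for u in edges:
        for v in edges[u]:
            reverse[v].add(u)

    components, assigned = [], set()

    def collect(u, component):  # second pass: DFS on the reversed graph
        assigned.add(u)
        component.append(u)
        for v in reverse[u]:
            if v not in assigned:
                collect(v, component)

    for u in reversed(order):
        if u not in assigned:
            component = []
            collect(u, component)
            components.append(sorted(component))
    return components


# The Borda example profile: pairwise majorities are transitive, so every SCC is a single node.
transitive = [list("XYZW"), list("YXWZ"), list("XWZY")]
# A beats everyone, while B, C, D form a voting cycle and end up tied in one SCC.
mixed = [list("ABCD"), list("ACDB"), list("ADBC")]

print(strongly_connected_components(condorcet_graph(transitive)))
print(strongly_connected_components(condorcet_graph(mixed)))
```

The recursive depth-first search is adequate for small candidate sets; for lists of thousands of documents an iterative version would be needed.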

• Theorem 1. Suppose x and y are nodes in a graph g, and X and Y are nodes of the associated SCCG G such that x ∈ X and y ∈ Y. If there exists a path from X to Y in G, then every Condorcet path of g has x before y.

Refer to [Javed A. Aslam, Mark Montague 2001] for the proof.

Rank Aggregation in MetaSearch

Here we discuss two cases that use algorithms rooted in social choice theory for metasearch rank aggregation.

• Data fusion track in TREC: [Javed A. Aslam, Mark Montague 2001], Models for Metasearch, in SIGIR2001

• Rank aggregation for web search engines: [Cynthia Dwork, Ravi Kumar, Moni Naor, D. Sivakumar 2001], Rank Aggregation Methods for the Web, in WWW10

Data fusion track in TREC

• TREC (Text Retrieval Conference, see http://trec.nist.gov/) maintains about 6 GB of SGML-tagged text, queries and their respective answers for evaluation purposes.

• The TREC organizers distribute the data sets in advance, along with 50 new queries each year.

• The competing teams then submit the ranked lists of documents that their systems returned in response to each query, and these retrieval systems are evaluated.

• These ranked lists are available for metasearch researchers to download and use.

• For each query, every retrieval system returns its top 1000 documents, and relevance scores are available.

• Given these results retrieved by many different retrieval systems, how can we aggregate them for better performance?


Previous algorithms

• Min, Max and Average Models [Fox and Shaw,1995]

• Linear Combination Model [Bartell 1995]

• Logistic Regression Model

Example

• Min, Max and Average model

The final score of each document d is based on the scores given to d by each input system (voter).

Algorithm   Final score
CombMin     minimum of individual relevance scores
CombMed     median of individual relevance scores
CombMax     maximum of individual relevance scores
CombSum     sum of individual relevance scores
CombANZ     CombSum / number of non-zero relevance scores
CombMNZ     CombSum * number of non-zero relevance scores
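The six combination rules in the table reduce to a few arithmetic operations on the list of scores a document received. Below is a minimal sketch, assuming a score of 0 stands for "this input system did not retrieve the document"; the function name `comb_scores` is illustrative.

```python
from statistics import median


def comb_scores(scores):
    """Combine one document's relevance scores (one per input system).

    A score of 0 is assumed to mean the system did not retrieve the document.
    """
    nonzero = sum(1 for s in scores if s != 0)
    total = sum(scores)
    return {
        "CombMin": min(scores),
        "CombMed": median(scores),
        "CombMax": max(scores),
        "CombSum": total,
        # Guard against a document that no system scored.
        "CombANZ": total / nonzero if nonzero else 0.0,
        "CombMNZ": total * nonzero,
    }


# One document scored by four hypothetical input systems.
result = comb_scores([0.5, 0.0, 0.25, 0.25])
print(result)
```

CombMNZ rewards documents retrieved by many systems, while CombANZ averages only over the systems that actually returned the document.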

• Linear Combination Model (LC model)

The final score of document d is a weighted linear combination of the normalized relevance scores given to d by the input systems:

$$S_{LC}(d) = \sum_i a_i \, s_i(d)$$

where a_i is the weight of input system i and s_i(d) is the relevance score that system i gives to d.
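As a small numeric illustration of the LC score, the sketch below computes the weighted sum for one document. The slide only says the scores are normalized; the min-max normalization shown here is an assumption, and both function names are hypothetical.

```python
def min_max_normalize(raw_scores):
    """Rescale one system's scores to [0, 1] (an assumed normalization scheme)."""
    lo, hi = min(raw_scores.values()), max(raw_scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in raw_scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in raw_scores.items()}


def lc_score(weights, scores):
    """S_LC(d) = sum_i a_i * s_i(d) for one document d."""
    if len(weights) != len(scores):
        raise ValueError("need exactly one weight per input system")
    return sum(a * s for a, s in zip(weights, scores))


# Three hypothetical input systems with weights a_i and normalized scores s_i(d).
print(lc_score([0.5, 0.25, 0.25], [1.0, 0.0, 0.5]))
print(min_max_normalize({"d1": 2.0, "d2": 4.0, "d3": 3.0}))
```

In practice the weights would be tuned on training queries, which is why this model needs training data.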

Experiment result on TREC Model

• The performance of rank aggregation is evaluated by average precision over the queries.

• Score-based Borda-fuse (LC model) is usually the best method among the several Borda variants.

• It outperforms the best input system on most of the data collections, such as TREC3 and TREC5.


Experiment result II

• The performance of rank aggregation is evaluated by average precision over the queries.

• Condorcet-fusion is the only algorithm that, without training data, ever matches the performance of the best input system over TREC 9.

• Condorcet-fusion seems particularly sensitive to the dependence of input systems. If the input systems (voters) are too similar, the performance will decrease.

Rank aggregation methods for web

New challenges: unlike the case of TREC data fusion,

– The coverage of the various search engines differs

– Thus some highly relevant web pages may not be ranked by some search engines.

– Therefore, each voter ranks only a partial candidate list

Preliminaries

• Given a universe U, an ordered list $\tau$ with respect to U is an ordering of a subset $S \subseteq U$, i.e., $\tau = [x_1 \ge x_2 \ge \cdots \ge x_d]$, with each $x_i \in S$, where $\ge$ is some ordering relation on S.

• If $\tau$ contains
– all the elements in U, then it is said to be a full list,
– otherwise it is called a partial list.

• Distance measures between two full lists with respect to a set S
– The Kendall tau distance
– It counts the number of pairwise disagreements between two lists.
– For full lists $\sigma$ and $\tau$, where $\sigma(i)$ denotes the position of element i in $\sigma$, the distance is given by

$$K(\sigma,\tau) = |\{(i,j) : i<j,\ \sigma(i)<\sigma(j)\ \text{but}\ \tau(i)>\tau(j)\}|$$

– Normalize it by dividing by the maximum possible value $\binom{|S|}{2} = |S|(|S|-1)/2$.
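The Kendall tau distance can be computed directly from its definition by checking every pair of elements. The sketch below assumes both arguments are full lists over the same elements, counts each unordered pair once, and normalizes by the number of pairs |S|(|S|-1)/2, which is the largest value the distance can take (reached when one list is the reverse of the other). Function names are illustrative.

```python
from itertools import combinations


def kendall_tau(sigma, tau):
    """Number of element pairs that the two full lists order differently."""
    pos_sigma = {x: i for i, x in enumerate(sigma)}
    pos_tau = {x: i for i, x in enumerate(tau)}
    return sum(
        1
        for a, b in combinations(sigma, 2)
        # The product is negative exactly when the two lists disagree on (a, b).
        if (pos_sigma[a] - pos_sigma[b]) * (pos_tau[a] - pos_tau[b]) < 0
    )


def normalized_kendall_tau(sigma, tau):
    """Kendall tau distance divided by the number of pairs (needs at least 2 elements)."""
    n = len(sigma)
    return kendall_tau(sigma, tau) / (n * (n - 1) / 2)


print(kendall_tau(list("XYZW"), list("YXWZ")))
print(normalized_kendall_tau(list("XYZW"), list("WZYX")))
```

This quadratic-time version is fine for illustration; an O(n log n) computation via merge sort exists but is not needed here.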

• Spearman footrule distance

• Given two full lists $\sigma$ and $\tau$, the distance is given by

$$F(\sigma,\tau) = \sum_{i=1}^{|S|} |\sigma(i) - \tau(i)|$$

• Normalize it by dividing by the maximum value $|S|^2/2$.
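The footrule distance only needs the position of each element in both lists. A minimal sketch, assuming two full lists over the same elements and the normalization constant |S|^2/2 stated above; the function names are illustrative.

```python
def footrule(sigma, tau):
    """Sum over all elements of the absolute difference of their positions."""
    pos_tau = {x: i for i, x in enumerate(tau)}
    return sum(abs(i - pos_tau[x]) for i, x in enumerate(sigma))


def normalized_footrule(sigma, tau):
    """Footrule distance divided by |S|^2 / 2."""
    n = len(sigma)
    return footrule(sigma, tau) / (n * n / 2)


print(footrule(list("XYZW"), list("YXWZ")))
print(normalized_footrule(list("XYZW"), list("WZYX")))
```

For the reversed four-element list the raw distance is 3 + 1 + 1 + 3 = 8, which equals 4^2/2, so the normalized distance is 1.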

• Distance measures for more than 2 lists

Given several full lists $\sigma, \tau_1, \tau_2, \ldots, \tau_k$, for instance, the normalized footrule distance of $\sigma$ to $\tau_1, \tau_2, \ldots, \tau_k$ is given by

$$F(\sigma, \tau_1, \tau_2, \ldots, \tau_k) = (1/k) \sum_{i=1}^{k} F(\sigma, \tau_i)$$

If $\tau_1, \tau_2, \ldots, \tau_k$ are partial lists, let U denote the union of elements in $\tau_1, \tau_2, \ldots, \tau_k$ and let $\sigma$ be a full list with respect to U. Considering the distance between $\tau_i$ and the projection of $\sigma$ with respect to $\tau_i$, written $\sigma_{|\tau_i}$, we have the induced footrule distance

$$F(\sigma, \tau_1, \ldots, \tau_k) = \sum_{i=1}^{k} F(\sigma_{|\tau_i}, \tau_i)$$
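The induced footrule distance for partial lists can be sketched as "project, then compare". The code below assumes the unnormalized footrule distance for each term and sums over the partial lists as in the induced formula; dividing the result by k would give the average instead. The helper names `project` and `induced_footrule` are illustrative.

```python
def footrule(sigma, tau):
    """Sum of absolute position differences between two lists over the same elements."""
    pos_tau = {x: i for i, x in enumerate(tau)}
    return sum(abs(i - pos_tau[x]) for i, x in enumerate(sigma))


def project(sigma, tau):
    """Restrict the full list sigma to the elements of tau, keeping sigma's order."""
    keep = set(tau)
    return [x for x in sigma if x in keep]


def induced_footrule(sigma, partial_lists):
    """Sum of footrule distances between each partial list and sigma's projection onto it."""
    return sum(footrule(project(sigma, tau), tau) for tau in partial_lists)


sigma = ["a", "b", "c", "d"]                       # full list over U = {a, b, c, d}
partial_lists = [["b", "a"], ["a", "c", "d"], ["d", "c", "b"]]
print(induced_footrule(sigma, partial_lists))
```

Here the three terms are 2, 0 and 4: the second partial list agrees with the projection of sigma, while the third is its exact reverse.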

Optimal rank aggregation

The question is: given (full or partial) lists $\tau_1, \tau_2, \ldots, \tau_k$, find a $\sigma$ such that $\sigma$ is a full list with respect to the union of the elements of $\tau_1, \tau_2, \ldots, \tau_k$, and $\sigma$ minimizes $K(\sigma, \tau_1, \tau_2, \ldots, \tau_k)$.

The aggregation obtained by optimizing Kendall distance is called Kemeny optimal aggregation.
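For a handful of elements, a Kemeny optimal aggregation can be found by exhaustive search. The sketch below is a brute-force illustration only: it enumerates all |U|! permutations, so it is usable for tiny universes and nothing more. For partial lists it counts disagreements only over pairs that appear in the input list, which is an assumption about how the distance is induced; the function names are illustrative.

```python
from itertools import combinations, permutations


def kendall_tau(sigma, tau):
    """Pairs of elements of tau (possibly a partial list) that sigma orders differently."""
    pos_sigma = {x: i for i, x in enumerate(sigma)}
    pos_tau = {x: i for i, x in enumerate(tau)}
    return sum(
        1
        for a, b in combinations(tau, 2)
        if (pos_sigma[a] - pos_sigma[b]) * (pos_tau[a] - pos_tau[b]) < 0
    )


def kemeny_optimal(lists):
    """Return (ordering, total distance) minimizing the summed Kendall distance."""
    universe = sorted(set().union(*lists))

    def total_distance(sigma):
        return sum(kendall_tau(sigma, tau) for tau in lists)

    best = min(permutations(universe), key=total_distance)
    return list(best), total_distance(best)


# The three rankings from the Borda example.
rankings = [list("XYZW"), list("YXWZ"), list("XWZY")]
print(kemeny_optimal(rankings))
```

On this profile the pairwise majorities are all strict and transitive, so the optimum coincides with the ordering X>Y>W>Z found by the Borda rule and by pairwise comparison.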

• When k ≥ 4, computing the Kemeny optimal aggregation is NP-hard.

(Please refer to [Cynthia Dwork, Ravi Kumar, Moni Naor, D. Sivakumar 2001] for the detailed proof.)

We can use Spearman footrule distance to approximate the Kendall distance.

LCS approach (My own method)

• Given m lists

$l_{1,1}, l_{1,2}, \ldots, l_{1,n_1}$;

$l_{2,1}, l_{2,2}, \ldots, l_{2,n_2}$;

$l_{3,1}, l_{3,2}, \ldots, l_{3,n_3}$; …

$l_{m,1}, l_{m,2}, \ldots, l_{m,n_m}$,

find a longest common subsequence (LCS) of these lists.

LCS approach (My own method)

• LCS is NP-hard for m sequences if some elements appear twice in a sequence.

• In the lists obtained from search engines, each document appears at most once.

• There exists an efficient algorithm that solves the problem in this special case.

• Assume n_i = n_j for all i, j = 1, 2, …, m.

Efficient algorithm for LCS of m sequences

• Fix the order of the first sequence as 1, 2, …, n1.

• Define d(i) to be the length of the longest common subsequence over the elements 1, 2, …, i that contains i (and therefore ends with i).

Computation of d(i,1) and d(i,2)

d(i) = max_k d(k) + 1, where the maximum is taken over all k that come before i in all m lists (if no such k exists, d(i) = 1).

The length of the LCS is max d(i) for i = 1, 2, …, n1.

A backtracking process then recovers the LCS itself.

An Example:

l1 = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
l2 = 2, 1, 3, 4, 5, 6, 7, 9, 8, 10
l3 = 2, 3, 5, 4, 1, 6, 7, 8, 9, 10
l4 = 2, 3, 5, 4, 6, 1, 7, 8, 9, 10

d(1)=1, d(2)=1, d(3)=d(2)+1=2, d(4)=d(3)+1=3, d(5)=d(3)+1=3, d(6)=d(5)+1=4, d(7)=d(6)+1=5, d(8)=d(7)+1=6, d(9)=d(7)+1=6, d(10)=d(9)+1=7.

The final length is 7. The LCS is 2, 3, 4, 6, 7, 8, 10; 2, 3, 4, 6, 7, 9, 10 is an LCS, too.
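The recurrence for d(i) and the backtracking step can be sketched as below. The code assumes every list contains the same elements, each exactly once; for that reason the fourth list is taken as 2, 3, 5, 4, 6, 1, 7, 8, 9, 10 (an assumption: a single 7, placed after 6, which is what the quoted values d(7)=d(6)+1 require). The function name and the back-pointer table are illustrative, not from the lecture.

```python
def lcs_of_permutations(lists):
    """Longest common subsequence of m lists over the same elements (no repeats).

    Returns (d, lcs): d[x] is the length of the longest common subsequence
    ending at x, and lcs is one longest common subsequence.
    """
    first = lists[0]
    positions = [{x: i for i, x in enumerate(lst)} for lst in lists]

    def precedes(a, b):
        # True when a comes before b in every one of the m lists.
        return all(pos[a] < pos[b] for pos in positions)

    d, previous = {}, {}
    for i, x in enumerate(first):
        d[x], previous[x] = 1, None
        for k in first[:i]:
            if precedes(k, x) and d[k] + 1 > d[x]:
                d[x], previous[x] = d[k] + 1, k

    # Backtrack from an element with the largest d value.
    end = max(first, key=lambda x: d[x])
    lcs = []
    while end is not None:
        lcs.append(end)
        end = previous[end]
    return d, lcs[::-1]


l1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
l2 = [2, 1, 3, 4, 5, 6, 7, 9, 8, 10]
l3 = [2, 3, 5, 4, 1, 6, 7, 8, 9, 10]
l4 = [2, 3, 5, 4, 6, 1, 7, 8, 9, 10]  # assumed form of the fourth list
d, lcs = lcs_of_permutations([l1, l2, l3, l4])
print(d)
print(lcs)
```

With m lists of n elements the double loop performs O(n^2) pair checks of cost O(m) each, so the running time is O(m n^2). Ties are broken in favour of the earliest predecessor, so other longest subsequences of the same length may exist.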

When the ni's are different

We delete those elements that are absent from some sequence.

Example:

l1 = 1, 2, 3, 4, 5, 6
l2 = 2, 1, 5, 4, 6
l3 = 2, 3, 4, 5, 6
l4 = 1, 4, 3, 5, 6

Since 1 is not in l3, 2 is not in l4 and 3 is not in l2, we compute the LCS for

l'1 = 4, 5, 6
l'2 = 5, 4, 6
l'3 = 4, 5, 6
l'4 = 4, 5, 6

The final result is 4, 6.
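The handling of lists with different lengths amounts to a filtering step before the usual computation. Here is a minimal self-contained sketch, assuming no element repeats within a list; the function names are hypothetical and the dynamic program is the d(i) recurrence applied to the filtered lists.

```python
def restrict_to_common(lists):
    """Drop every element that is missing from at least one list."""
    common = set(lists[0]).intersection(*lists[1:])
    return [[x for x in lst if x in common] for lst in lists]


def longest_common_subsequence(lists):
    """LCS of lists over the same elements, each appearing once per list."""
    first = lists[0]
    if not first:
        return []
    positions = [{x: i for i, x in enumerate(lst)} for lst in lists]
    d, previous = {}, {}
    for i, x in enumerate(first):
        d[x], previous[x] = 1, None
        for k in first[:i]:
            # k may precede x in the LCS only if it does so in every list.
            if all(pos[k] < pos[x] for pos in positions) and d[k] + 1 > d[x]:
                d[x], previous[x] = d[k] + 1, k
    end = max(first, key=lambda x: d[x])
    lcs = []
    while end is not None:
        lcs.append(end)
        end = previous[end]
    return lcs[::-1]


lists = [
    [1, 2, 3, 4, 5, 6],
    [2, 1, 5, 4, 6],
    [2, 3, 4, 5, 6],
    [1, 4, 3, 5, 6],
]
reduced = restrict_to_common(lists)
print(reduced)
print(longest_common_subsequence(reduced))
```

Only 4, 5 and 6 occur in all four lists, and because the second list places 5 before 4, the longest common subsequence has length 2; the sketch returns 4, 6, and 5, 6 would be equally valid.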