the rank aggregation problem david p. williamson cornell university universidade federal de minas...

55
The Rank Aggregation The Rank Aggregation Problem Problem David P. Williamson David P. Williamson Cornell University Cornell University Universidade Federal de Minas Gerais Universidade Federal de Minas Gerais December 10, 2012 December 10, 2012

Upload: anne-york

Post on 28-Dec-2015

215 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: The Rank Aggregation Problem David P. Williamson Cornell University Universidade Federal de Minas Gerais December 10, 2012

The Rank Aggregation The Rank Aggregation ProblemProblem

David P. WilliamsonDavid P. WilliamsonCornell UniversityCornell University

Universidade Federal de Minas GeraisUniversidade Federal de Minas GeraisDecember 10, 2012December 10, 2012

Page 2: The Rank Aggregation Problem David P. Williamson Cornell University Universidade Federal de Minas Gerais December 10, 2012

OutlineOutline

An old problem and one formulation of itAn old problem and one formulation of it Some modern-day applicationsSome modern-day applications Related work in approximation Related work in approximation

algorithmsalgorithms Some computational resultsSome computational results ConclusionConclusion

Page 3: The Rank Aggregation Problem David P. Williamson Cornell University Universidade Federal de Minas Gerais December 10, 2012

An old questionAn old question

How can the preferences of multiple How can the preferences of multiple competing agents be fairly taken into competing agents be fairly taken into account?account?– Groups deciding where to go to dinnerGroups deciding where to go to dinner– ElectionsElections

Page 4: The Rank Aggregation Problem David P. Williamson Cornell University Universidade Federal de Minas Gerais December 10, 2012

Rank aggregationRank aggregation

Input:Input:– N candidatesN candidates– K voters giving (partial) K voters giving (partial)

preference list of preference list of candidatescandidates

Goal:Goal:– Want single ordering of Want single ordering of

candidates expressing candidates expressing votersvoters’’ preferences preferences

– ??????

Ballot

1. Labour

2. Liberal Democrats

Ballot

1. Conservative

2. Liberal Democrats

3. Labour

Ballot

1. Sinn Fein

2. Labour

3. Liberal Democrats

Page 5: The Rank Aggregation Problem David P. Williamson Cornell University Universidade Federal de Minas Gerais December 10, 2012

A well-known answerA well-known answer

Arrow (1950): They canArrow (1950): They can’’t.t. CanCan’’t simultaneously have a means of t simultaneously have a means of

aggregating preferences that has:aggregating preferences that has:– Non-dictatorshipNon-dictatorship– Pareto efficiency (if everyone prefers A to B, Pareto efficiency (if everyone prefers A to B,

then final order should prefer A to B)then final order should prefer A to B)– Independence of irrelevant alternatives (Given Independence of irrelevant alternatives (Given

two inputs in which A and B are ranked two inputs in which A and B are ranked identically by everyone, the two outputs should identically by everyone, the two outputs should order A and B the same)order A and B the same)

Page 6: The Rank Aggregation Problem David P. Williamson Cornell University Universidade Federal de Minas Gerais December 10, 2012

Still…Still…

As with computational intractability, we As with computational intractability, we still need to do the best we can.still need to do the best we can.

Why is this any more relevant now Why is this any more relevant now than before?than before?

Page 7: The Rank Aggregation Problem David P. Williamson Cornell University Universidade Federal de Minas Gerais December 10, 2012

The information ageThe information age

Can easily see the preferences of Can easily see the preferences of millions (e.g. Netflix Challenge).millions (e.g. Netflix Challenge).

……and those of a few.and those of a few. What if the main players are What if the main players are

systematically biased in some way?systematically biased in some way?

Page 8: The Rank Aggregation Problem David P. Williamson Cornell University Universidade Federal de Minas Gerais December 10, 2012

The Rank Aggregation The Rank Aggregation ProblemProblem Question raised by Dwork, Kumar, Question raised by Dwork, Kumar,

Naor, Sivakumar, Naor, Sivakumar, ““Rank aggregation Rank aggregation methods for the webmethods for the web””, WWW10, 2001., WWW10, 2001.– Q: How can search-engine bias be Q: How can search-engine bias be

overcome?overcome?– A: By combining results from multiple A: By combining results from multiple

search enginessearch engines

Page 9: The Rank Aggregation Problem David P. Williamson Cornell University Universidade Federal de Minas Gerais December 10, 2012

Sample search: WaterlooGoogle

1. Wikipedia: Battle of Waterloo

2. Wikipedia: Waterloo, ON

3. www.city.waterloo.on.ca (City of Waterloo website)

4. www.uwaterloo.ca (University of Waterloo)

5. www.waterlooindustries.com (High performance tool storage)

Yahoo!

1. www.uwaterloo.ca

2. Wikipedia: Battle of Waterloo

3. www.city.waterloo.on.ca

4. Wikipedia: Waterloo, ON

5. www.waterloorecords.com (Record store in Austin, TX)

MSN

1. Wikipedia: Battle of Waterloo

2. Wikipedia: Waterloo Station (in London)

3. Youtube: Video of ABBA’s “Waterloo”

4. www.waterloorecords.com

5. www.waterloo.il.us (City in Illinois)

Page 10: The Rank Aggregation Problem David P. Williamson Cornell University Universidade Federal de Minas Gerais December 10, 2012

Kemeny optimal aggregationKemeny optimal aggregation

Want to find ordering of all elements that minimizes thetotal number of pairs "out of order" with respect to all the lists.

Google1. Wikipedia: Battle of Waterloo2. Wikipedia: Waterloo, ON3. www.city.waterloo.on.ca 4. www.uwaterloo.ca 5. www.waterlooindustries.com

Yahoo!1. www.uwaterloo.ca2. Wikipedia: Battle of Waterloo3. www.city.waterloo.on.ca4. Wikipedia: Waterloo, ON5. www.waterloorecords.com

www.uwaterloo.ca

Wikipedia: Battle of Waterloo

Wikipedia: Waterloo, ON

www.city.waterloo.on.ca

www.waterloo.il.us

Page 11: The Rank Aggregation Problem David P. Williamson Cornell University Universidade Federal de Minas Gerais December 10, 2012

A metric on permutationsA metric on permutations

KendallKendall’’s tau distance K(S,T)s tau distance K(S,T)

number of pairs (i,j) that S and T disagree onnumber of pairs (i,j) that S and T disagree on

BDAC

ABCD

number of disagreements: 3 (AB, AD, number of disagreements: 3 (AB, AD, CD)CD)

Page 12: The Rank Aggregation Problem David P. Williamson Cornell University Universidade Federal de Minas Gerais December 10, 2012

Thus given input top k lists TThus given input top k lists T11,,……,T,Tnn, we find , we find permutation S on universe of elements to minimize permutation S on universe of elements to minimize K*(S,TK*(S,T11,,……,T,Tnn) ) == ii K(S,T K(S,Tii) (essentially)) (essentially)

Yields Yields extended Condorcet criterionextended Condorcet criterion: if every cand. : if every cand. in in AA is preferred by some majority to every cand. in is preferred by some majority to every cand. in BB, all of , all of AA ranked ahead of all of ranked ahead of all of BB..

But K* NP-hard to compute for 4 or more lists.

My home page Legit.com

Spam.com Spam.org

Page 13: The Rank Aggregation Problem David P. Williamson Cornell University Universidade Federal de Minas Gerais December 10, 2012

How then to compute an How then to compute an aggregation?aggregation? Answer in Dwork et al.: heuristicsAnswer in Dwork et al.: heuristics Markov chain techniques: given chain Markov chain techniques: given chain

on candidates, compute stationary on candidates, compute stationary probs, rank by probs.probs, rank by probs.

Page 14: The Rank Aggregation Problem David P. Williamson Cornell University Universidade Federal de Minas Gerais December 10, 2012

Local KemenizationLocal Kemenization

Can achieve extended Condorcet by finding Can achieve extended Condorcet by finding S a local min of K*(S,TS a local min of K*(S,T11,,……,T,Tnn); i.e. ); i.e. interchanging candidates i and i+1 of S interchanging candidates i and i+1 of S does not decrease score.does not decrease score.

Easy to compute.Easy to compute.

Page 15: The Rank Aggregation Problem David P. Williamson Cornell University Universidade Federal de Minas Gerais December 10, 2012

UsesUses

Internal IBM metasearch engine: Internal IBM metasearch engine: SangamSangam

IBM experimental IBM experimental intranet intranet search search engine: iSearchengine: iSearch

Fagin, Kumar, McCurley, Novak, Sivakumar, Tomlin, W, “Searching the Workplace Web”, WWW 2003.

Page 16: The Rank Aggregation Problem David P. Williamson Cornell University Universidade Federal de Minas Gerais December 10, 2012

Internet vs. intranet searchInternet vs. intranet search

Different social forces at work in content Different social forces at work in content creationcreation

Different types of queries and results; intranet Different types of queries and results; intranet search closer to search closer to ‘‘home pagehome page’’ finding finding

No spamNo spameAMTPBCHRMTSASOISSISametimeEA2000IDP

global printe-AMTjobsTDSPintranet passwordglobal campusprintershuman resourcesESPP

TravelReqcatPSMEPPredbooksILCvirusprinterreserve

WebsphereITCS204ITCS300vacation plannerpasswordmobilitycell phonePCFBPFJ

Page 17: The Rank Aggregation Problem David P. Williamson Cornell University Universidade Federal de Minas Gerais December 10, 2012

iSearchiSearch

Idea: aggregate different ranking heuristics to see what works Idea: aggregate different ranking heuristics to see what works best for intranet searchbest for intranet search

Page 18: The Rank Aggregation Problem David P. Williamson Cornell University Universidade Federal de Minas Gerais December 10, 2012

Method and resultsMethod and results

Found ground truth, determined Found ground truth, determined ““influenceinfluence”” of each ranking heuristic on of each ranking heuristic on getting pages into top spot (top 3, top getting pages into top spot (top 3, top 5, top 10, etc.)5, top 10, etc.)

Best: Anchortext, Titles, PageRankBest: Anchortext, Titles, PageRank Worst: Content, URL Depth, IndegreeWorst: Content, URL Depth, Indegree Used Dwork et al. random walk Used Dwork et al. random walk

heuristic for aggregationheuristic for aggregation

Page 19: The Rank Aggregation Problem David P. Williamson Cornell University Universidade Federal de Minas Gerais December 10, 2012

The Rank Aggregation The Rank Aggregation ProblemProblem Formulate as a graph problemFormulate as a graph problem Input:Input:

– Set of elements VSet of elements V

– Pairwise information w(i,j),w(j,i)Pairwise information w(i,j),w(j,i)

w(w(j,ij,i) = fraction of voters ranking ) = fraction of voters ranking jj before ibefore i– Find a permutation Find a permutation that minimizes that minimizes

(i) < (i) < (j)(j) w(j,i) w(j,i)

(scaled Kemeny aggregation)(scaled Kemeny aggregation)

Page 20: The Rank Aggregation Problem David P. Williamson Cornell University Universidade Federal de Minas Gerais December 10, 2012

Full vs. partial rank Full vs. partial rank aggregationaggregation FullFull rank aggregation: input permutations rank aggregation: input permutations

are total ordersare total orders PartialPartial rank aggregation: otherwise rank aggregation: otherwise Inputs from partial rank aggregation obey Inputs from partial rank aggregation obey

triangle inequality:triangle inequality:– w(i,j) + w(j,k) w(i,j) + w(j,k) ≥≥ w(i,k) w(i,k)

Full rank aggregation also obeys probability Full rank aggregation also obeys probability constraints:constraints:– w(i,j) + w(j,i) = 1w(i,j) + w(j,i) = 1

Page 21: The Rank Aggregation Problem David P. Williamson Cornell University Universidade Federal de Minas Gerais December 10, 2012

Approximation algorithmsApproximation algorithms

An An -approximation algorithm is a -approximation algorithm is a polynomial-time algorithm that polynomial-time algorithm that produces a solution of cost at most produces a solution of cost at most times the optimal cost.times the optimal cost.

Page 22: The Rank Aggregation Problem David P. Williamson Cornell University Universidade Federal de Minas Gerais December 10, 2012

Remainder of talkRemainder of talk

Approximation algorithms for rank Approximation algorithms for rank aggregationaggregationA very simple 2-approximation algorithm A very simple 2-approximation algorithm

for full rank aggregationfor full rank aggregationPivoting algorithms Pivoting algorithms A simple, deterministic 2-approximation A simple, deterministic 2-approximation

algorithm for triangle inequalityalgorithm for triangle inequalityComputational experimentsComputational experiments

Page 23: The Rank Aggregation Problem David P. Williamson Cornell University Universidade Federal de Minas Gerais December 10, 2012

A simple approximation A simple approximation algorithmalgorithmAn easy 2-approximation algorithm for full rank An easy 2-approximation algorithm for full rank

aggregation:aggregation:choose one of choose one of M M input permutations at randominput permutations at randomprobability i is ranked before j =probability i is ranked before j =

# {# {mm s.t. s.t. mm(i) < (i) < mm(j)} / (j)} / M = M = w(i,j)w(i,j)““costcost”” if i is ranked before j = w(j,i) if i is ranked before j = w(j,i)

expected cost for {i,j} : expected cost for {i,j} : 2w(i,j)w(j,i) 2w(i,j)w(j,i) 2 min {w(i,j), w(j,i)} 2 min {w(i,j), w(j,i)}

Every feasible ordering has cost for {i,j} at Every feasible ordering has cost for {i,j} at least min {w(i,j), w(j,i)}.least min {w(i,j), w(j,i)}.

Page 24: The Rank Aggregation Problem David P. Williamson Cornell University Universidade Federal de Minas Gerais December 10, 2012

Doing betterDoing better

To do better, consider a more general To do better, consider a more general problem in which weights obey triangle problem in which weights obey triangle inequality and/or probability constraintsinequality and/or probability constraints– e.g. problems on tournamentse.g. problems on tournaments

Ailon, Charikar, and Newman (STOC Ailon, Charikar, and Newman (STOC 2005) give first constant-factor 2005) give first constant-factor approximation algorithms for these approximation algorithms for these more general problems.more general problems.

Page 25: The Rank Aggregation Problem David P. Williamson Cornell University Universidade Federal de Minas Gerais December 10, 2012

A Quicksort-style algorithmA Quicksort-style algorithm

Choose a vertex k as pivotChoose a vertex k as pivot Order vertex i Order vertex i

left of k if (i,k) in A left of k if (i,k) in A right of k if (k,i) in Aright of k if (k,i) in A

Recurse on left and rightRecurse on left and right

pivotleft right

Page 26: The Rank Aggregation Problem David P. Williamson Cornell University Universidade Federal de Minas Gerais December 10, 2012

If graph is weighted, then form a If graph is weighted, then form a majority majority tournamenttournament G=(V,A) that has (i,j) in A if w(i,j) G=(V,A) that has (i,j) in A if w(i,j) w(j,i); run algorithm. w(j,i); run algorithm.

Ailon et al. show that this gives a 3-Ailon et al. show that this gives a 3-approximation algorithm for weights obeying approximation algorithm for weights obeying triangle inequality triangle inequality

Van Zuylen & W Van Zuylen & W ‘‘07 give a 2-approximation 07 give a 2-approximation algorithm that chooses the pivot algorithm that chooses the pivot deterministically.deterministically.

Page 27: The Rank Aggregation Problem David P. Williamson Cornell University Universidade Federal de Minas Gerais December 10, 2012

Bounding the cost?Bounding the cost?

Some arcs in the majority tournament become backward arcsSome arcs in the majority tournament become backward arcs

Observation: backward arcs can be attributed to a particular pivotObservation: backward arcs can be attributed to a particular pivot

cost of cost of forwardforward arc = min{w(i,j),w(j,i)} =: arc = min{w(i,j),w(j,i)} =: wwijij

cost of cost of backwardbackward arc = max{w(i,j), w(j,i)} =: arc = max{w(i,j), w(j,i)} =: wwijij

Idea: choose pivot carefully, so that the total cost of the backward Idea: choose pivot carefully, so that the total cost of the backward arcs is not much more than the total budget for these arcsarcs is not much more than the total budget for these arcs

ij pivot k

“budget” for {i,j}

Page 28: The Rank Aggregation Problem David P. Williamson Cornell University Universidade Federal de Minas Gerais December 10, 2012

How to choose a good How to choose a good pivotpivot

Choose pivot minimizingChoose pivot minimizing

cost of backward arcscost of backward arcs

budget of backward arcsbudget of backward arcs

ThmThm: If the weights satisfy the triangle : If the weights satisfy the triangle inequality, there exists a pivot such that inequality, there exists a pivot such that this ratio is at most 2this ratio is at most 2

Page 29: The Rank Aggregation Problem David P. Williamson Cornell University Universidade Federal de Minas Gerais December 10, 2012

How to choose a good How to choose a good pivotpivotThere exists a pivot such thatThere exists a pivot such that

cost of backward arcs cost of backward arcs 2 (budget of backward arcs) 2 (budget of backward arcs)

Proof:Proof: By averaging argument: By averaging argument:

pivotspivots (cost of backward arcs) = (cost of backward arcs) =

directed triangles tdirected triangles t ( (backwardbackward cost of arcs in t) cost of arcs in t)

pivotspivots (budget of backward arcs) = (budget of backward arcs) =

directed triangles tdirected triangles t((forwardforward cost of all arcs in t) cost of all arcs in t)

k

ijj

pivot k

i

k

pivot ij

k

ipivot j

k

ij

Page 30: The Rank Aggregation Problem David P. Williamson Cornell University Universidade Federal de Minas Gerais December 10, 2012

How to choose a good How to choose a good pivotpivot

Proof (continued): pivots (cost of backward arcs) =

directed triangles t (backward cost of arcs in t)

pivots (budget of backward arcs) =

directed triangles t (forward cost of arcs in t)

k

ij

w(t) = w(j,i) + w(i,k) + w(k,j)

= 2 w(t)

w(t)

w(t)

There exists a pivot such that

cost of backward arcs 2 (budget of backward arcs )

w(j,k) + w(k,i) + w(i,j) + w(j,k)

+ w(k,i) + w(i,j)

Not hard to show that

Page 31: The Rank Aggregation Problem David P. Williamson Cornell University Universidade Federal de Minas Gerais December 10, 2012

Combining the two 2-Combining the two 2-approximationsapproximations

Can show that running both the random Can show that running both the random dictator algorithm and the pivoting dictator algorithm and the pivoting algorithm, choosing best solution, algorithm, choosing best solution, gives a 1.6-approximation algorithm gives a 1.6-approximation algorithm for full rank aggregation.for full rank aggregation.

Can be extended to partial rank Can be extended to partial rank aggregationaggregation

Page 32: The Rank Aggregation Problem David P. Williamson Cornell University Universidade Federal de Minas Gerais December 10, 2012

More resultsMore results

Ailon, Charikar, Newman Ailon, Charikar, Newman ’’05 give a 05 give a randomized LP-rounding 4/3-approximation randomized LP-rounding 4/3-approximation algorithm for full rank aggregation.algorithm for full rank aggregation.

Ailon Ailon ’’07 gives 3/2-approximation algorithm 07 gives 3/2-approximation algorithm for partial rank aggregation.for partial rank aggregation.

Van Zuylen & W Van Zuylen & W ’’07 give deterministic 07 give deterministic variants.variants.

Kenyon-Mathieu and Schudy Kenyon-Mathieu and Schudy ’’07 give an 07 give an approximation scheme for full rank approximation scheme for full rank aggregation.aggregation.

Page 33: The Rank Aggregation Problem David P. Williamson Cornell University Universidade Federal de Minas Gerais December 10, 2012

Similar problemsSimilar problems

The same sort of pivoting algorithms can The same sort of pivoting algorithms can be applied to problems in clustering be applied to problems in clustering and hierarchical clustering resulting in and hierarchical clustering resulting in approximation algorithms with similar approximation algorithms with similar performance.performance.

Page 34: The Rank Aggregation Problem David P. Williamson Cornell University Universidade Federal de Minas Gerais December 10, 2012

ClusteringClustering

Input: Input: – Set of elements VSet of elements V– Pairwise information wPairwise information w++{i,j}, w{i,j}, w--{i,j}{i,j}– Assumption: weights satisfy Assumption: weights satisfy

triangle inequality ortriangle inequality or probability constraintsprobability constraints

Goal: Goal: – Find a clustering that minimizes Find a clustering that minimizes

i,j togetheri,j togetherww--{i,j} + {i,j} + i,j separated i,j separated ww{i,j}{i,j}

Page 35: The Rank Aggregation Problem David P. Williamson Cornell University Universidade Federal de Minas Gerais December 10, 2012

ClusteringClustering

““Majority tournamentMajority tournament”” – ‘‘++’’ edge {i,j} if w edge {i,j} if w++{i,j} {i,j} w w--{i,j}{i,j}– ‘‘--’’edge {i,j} if wedge {i,j} if w--{i,j} {i,j} w w++{i,j}{i,j}

Pivoting on vertex k:Pivoting on vertex k:– If {i,k} is a If {i,k} is a ‘‘++’’ edge, put i in same cluster as k edge, put i in same cluster as k– If {i,k} is a If {i,k} is a ‘‘--’’ edge, separate i from k edge, separate i from k

Recurse on vertices separated from kRecurse on vertices separated from k

““Directed triangleDirected triangle”” + +

-

Page 36: The Rank Aggregation Problem David P. Williamson Cornell University Universidade Federal de Minas Gerais December 10, 2012

Hierarchical ClusteringHierarchical Clustering

M-level hierarchical clustering :M-level hierarchical clustering :– M nested clusterings of same set of objectsM nested clusterings of same set of objects

Input: pairwise information DInput: pairwise information Dij ij {0, …, M} {0, …, M}

Goal: Minimize LGoal: Minimize L11-distance from D: -distance from D: i,j i,j ||ijij - D - Dijij||

i

i j k l

i j l k

j l k

jk = 2

ij = 1

Page 37: The Rank Aggregation Problem David P. Williamson Cornell University Universidade Federal de Minas Gerais December 10, 2012

Hierarchical ClusteringHierarchical Clustering

Hierarchical clustering:Hierarchical clustering:– Construct hierarchical clustering top-down: Construct hierarchical clustering top-down:

Use clustering algorithm to get top level clusteringUse clustering algorithm to get top level clustering Recursively invoke algorithm for each top level clusterRecursively invoke algorithm for each top level cluster

(M+2)-approximation algorithm (M = # levels)(M+2)-approximation algorithm (M = # levels)

Matches bound of a more complicated, randomized Matches bound of a more complicated, randomized algorithm of Ailon and Charikar (FOCS algorithm of Ailon and Charikar (FOCS ’’05)05)

Page 38: The Rank Aggregation Problem David P. Williamson Cornell University Universidade Federal de Minas Gerais December 10, 2012

Empirical resultsEmpirical results

How well do the ranking algorithms do in How well do the ranking algorithms do in practice?practice?

Two data sets:Two data sets:– Repeat of Dwork et al. experimentsRepeat of Dwork et al. experiments

37 queries to Ask, Google, MSN, Yahoo!37 queries to Ask, Google, MSN, Yahoo! Take top 100 results of each; pages are Take top 100 results of each; pages are ““samesame”” if if

canonicalized URLs are samecanonicalized URLs are same

– Web Communities Data SetWeb Communities Data Set From 9 full rankings of 25 million documentsFrom 9 full rankings of 25 million documents 50 samples of 100 documents, induced 9 rankings of 50 samples of 100 documents, induced 9 rankings of

the 100 documentsthe 100 documents

Page 39: The Rank Aggregation Problem David P. Williamson Cornell University Universidade Federal de Minas Gerais December 10, 2012

Pivoting variantsPivoting variants

Deterministic algorithm too slowDeterministic algorithm too slow Take K elements at random, use best Take K elements at random, use best

of K for pivot (using ratio test)of K for pivot (using ratio test)

Page 40: The Rank Aggregation Problem David P. Williamson Cornell University Universidade Federal de Minas Gerais December 10, 2012

Dwork et al.Dwork et al.

Page 41: The Rank Aggregation Problem David P. Williamson Cornell University Universidade Federal de Minas Gerais December 10, 2012

Web CommunitiesWeb Communities

Page 42: The Rank Aggregation Problem David P. Williamson Cornell University Universidade Federal de Minas Gerais December 10, 2012

Other heuristicsOther heuristics

Borda scoringBorda scoring– Sort vertices in ascending order of weighted Sort vertices in ascending order of weighted

indegreeindegree MC4MC4

– The Dwork et al. Markov Chain heuristicThe Dwork et al. Markov Chain heuristic Local KemenizationLocal Kemenization

– Interchange neighbors to improve overall scoreInterchange neighbors to improve overall score Local searchLocal search

– Move single vertices to improve overall scoreMove single vertices to improve overall score CPLEX LP/IPCPLEX LP/IP

– Most LP solutions integralMost LP solutions integral

Page 43: The Rank Aggregation Problem David P. Williamson Cornell University Universidade Federal de Minas Gerais December 10, 2012

Dwork et al.Dwork et al.

Page 44: The Rank Aggregation Problem David P. Williamson Cornell University Universidade Federal de Minas Gerais December 10, 2012

Web CommunitiesWeb Communities

Page 45: The Rank Aggregation Problem David P. Williamson Cornell University Universidade Federal de Minas Gerais December 10, 2012

Open questionsOpen questions

Approximation scheme for partial rank Approximation scheme for partial rank aggregation?aggregation?

Does the model accurately capture Does the model accurately capture ““goodgood”” combined rankings? combined rankings?– Back to metasearch?Back to metasearch?

Page 46: The Rank Aggregation Problem David P. Williamson Cornell University Universidade Federal de Minas Gerais December 10, 2012

Open questionsOpen questions

Hope for other linear ordering problems?Hope for other linear ordering problems?– Recent results seem to say no:Recent results seem to say no:

Guruswami, Manokaran, Raghavendra (FOCS 2008): canGuruswami, Manokaran, Raghavendra (FOCS 2008): can’’t t do better than ½ for Max Acyclic Subgraph if Unique Games do better than ½ for Max Acyclic Subgraph if Unique Games has no polytime algorithms.has no polytime algorithms.

Bansal, Khot (FOCS 2009): canBansal, Khot (FOCS 2009): can’’t do better than 2 for single t do better than 2 for single machine scheduling with precedence to minimize weighted machine scheduling with precedence to minimize weighted completion time if variant of Unique Games has no polytime completion time if variant of Unique Games has no polytime algorithms.algorithms.

Svensson (STOC 2010): canSvensson (STOC 2010): can’’t do better than 2 for scheduling t do better than 2 for scheduling identical parallel machines with precedence constraints to identical parallel machines with precedence constraints to minimize schedule length if variant of Unique Games has no minimize schedule length if variant of Unique Games has no polytime algorithms.polytime algorithms.

Perhaps prove that 4/3 is best possible given Perhaps prove that 4/3 is best possible given Unique Games?Unique Games?

Page 47: The Rank Aggregation Problem David P. Williamson Cornell University Universidade Federal de Minas Gerais December 10, 2012

Obrigado.

Any questions?

[email protected]

www.davidpwilliamson.net/work

Page 48: The Rank Aggregation Problem David P. Williamson Cornell University Universidade Federal de Minas Gerais December 10, 2012

Open questionsOpen questions

Linear ordering polytope has integrality gap of 4/3 Linear ordering polytope has integrality gap of 4/3 for weights from full rank aggregation:for weights from full rank aggregation:Min Min i,ji,j x(i,j)w(j,i) + x(j,i)w(i,j) x(i,j)w(j,i) + x(j,i)w(i,j)s.t. s.t. x(i,j) + x(j,i)x(i,j) + x(j,i) = 1= 1 for all i,j for all i,j

x(i,k) + x(k,j) + x(j,i) x(i,k) + x(k,j) + x(j,i) ≥≥ 1 1 for for all distinct i,j,kall distinct i,j,k

x(i,j) x(i,j) ¸̧ 0 0when when w(i,j) + w(j,i) = 1, w(i,j) + w(j,i) = 1,

w(i,j) + w(k,j) + w(j,i) w(i,j) + w(k,j) + w(j,i) ¸̧ 1. 1.

Is this the worst case for these instances?Is this the worst case for these instances?

Page 49: The Rank Aggregation Problem David P. Williamson Cornell University Universidade Federal de Minas Gerais December 10, 2012

Remainder of talkRemainder of talk

Approximation algorithms for rank aggregationApproximation algorithms for rank aggregationA very simple 2-approximation algorithm for full A very simple 2-approximation algorithm for full

rank aggregationrank aggregationPivoting algorithms Pivoting algorithms A simple, deterministic 2-approximation A simple, deterministic 2-approximation

algorithm for triangle inequalityalgorithm for triangle inequalityA 1.6-approximation algorithm for full rank A 1.6-approximation algorithm for full rank

aggregationaggregationLP-based pivotingLP-based pivoting

Page 50: The Rank Aggregation Problem David P. Williamson Cornell University Universidade Federal de Minas Gerais December 10, 2012

Further resultsFurther results

To get results for other classes of weights To get results for other classes of weights (e.g. for tournaments) and stronger results (e.g. for tournaments) and stronger results for rank aggregation, we need linear for rank aggregation, we need linear programming based algorithms.programming based algorithms.

Ailon, Charikar, Newman (STOC Ailon, Charikar, Newman (STOC ’’05) and 05) and Ailon (SODA Ailon (SODA ’’07) give randomized rounding 07) give randomized rounding algorithms; made deterministic by Van algorithms; made deterministic by Van Zuylen, Hegde, Jain, W (SODA Zuylen, Hegde, Jain, W (SODA ’’06) and 06) and Van Zuylen, W Van Zuylen, W ’’07.07.

Page 51: The Rank Aggregation Problem David P. Williamson Cornell University Universidade Federal de Minas Gerais December 10, 2012

Why LP based?Why LP based?

Consider tournamentsConsider tournamentsw(i,j) = w(i,j) = 1 if (i,j) in tournament1 if (i,j) in tournament

0 otherwise0 otherwise

wwijij 0 0 ijij wwijij = 0 = 0 Lower bound of 0!Lower bound of 0!

Need better lower bound!Need better lower bound!

Page 52: The Rank Aggregation Problem David P. Williamson Cornell University Universidade Federal de Minas Gerais December 10, 2012

LP based algorithmsLP based algorithms

Solve LP relaxation, and round solution:Solve LP relaxation, and round solution:x(i,j) = 1 if i before j, 0 otherwisex(i,j) = 1 if i before j, 0 otherwise

Min Min i,ji,j x(i,j)w(j,i) + x(j,i)w(i,j) x(i,j)w(j,i) + x(j,i)w(i,j)

s.t. s.t. x(i,j) + x(j,i)x(i,j) + x(j,i) = 1= 1 for all i,j for all i,j

x(i,k) + x(k,j) + x(j,i) x(i,k) + x(k,j) + x(j,i) ≥≥ 1 1 for all distinct i,j,k for all distinct i,j,kx(i,j) x(i,j) {0,1} {0,1} 0

i j

k

Page 53: The Rank Aggregation Problem David P. Williamson Cornell University Universidade Federal de Minas Gerais December 10, 2012

LP based algorithmsLP based algorithms

Two types of rounding:Two types of rounding:1.1. - Form tournament G=(V,A) that has (i,j) in A if - Form tournament G=(V,A) that has (i,j) in A if

x(i,j)x(i,j)11//22

- Pivot to get an acyclic solution (where a pivot is - Pivot to get an acyclic solution (where a pivot is chosen similar to before)chosen similar to before)

2.2. - - Choose a vertex j as pivotChoose a vertex j as pivot

order i left of j with probability x(i,j)order i left of j with probability x(i,j)

order i right of j with probability x(j,i)order i right of j with probability x(j,i)

- Recurse on left and right- Recurse on left and right

use method of conditional

expectation to derandomize

Page 54: The Rank Aggregation Problem David P. Williamson Cornell University Universidade Federal de Minas Gerais December 10, 2012

LP based algorithms: LP based algorithms: approximation guaranteesapproximation guarantees

1.1. ““Deterministic roundingDeterministic rounding”” probability constraints: probability constraints: 33

2.2. ““Conditional expectationConditional expectation”” probability constraints: probability constraints: 55//22

triangle inequality constraintstriangle inequality constraints

(partial rank aggregation): (partial rank aggregation): 33//22

full rank aggregation: full rank aggregation: 44//33

Randomized versions due to Ailon et al. and Ailon; deterministic versions by Van Zuylen Randomized versions due to Ailon et al. and Ailon; deterministic versions by Van Zuylen

et al. and Van Zuylen and W.et al. and Van Zuylen and W.

Page 55: The Rank Aggregation Problem David P. Williamson Cornell University Universidade Federal de Minas Gerais December 10, 2012

Remainder of talkRemainder of talk

Approximation algorithms for rank aggregationApproximation algorithms for rank aggregationA very simple 2-approximation algorithm for full A very simple 2-approximation algorithm for full

rank aggregationrank aggregationPivoting algorithms Pivoting algorithms A simple, deterministic 2-approximation A simple, deterministic 2-approximation

algorithm for triangle inequalityalgorithm for triangle inequalityA 1.6-approximation algorithm for partial rank A 1.6-approximation algorithm for partial rank

aggregationaggregationLP-based pivotingLP-based pivoting