rank aggregation
Rank Aggregation
Rank Aggregation: Settings
• Multiple items
– Web-pages, cars, apartments,…
• Multiple scores for each item
– By different reviewers, users, according to different features…
• Some aggregation function on the scores
– Sum, Average, Max…
• Goal: compute the top-k items
Rank Aggregation Example
Model    PriceRank  ComfortRank  BeautyRank  TotalRank(min)  TotalRank(avg)
Honda    9          7            3           3               6.333
Volvo    3          10           8           3               7
Subaru   9          5            4           4               6
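The two aggregated columns in the example can be recomputed with a few lines of Python. This is only a sketch: the dictionary layout and the helper name `aggregate` are illustrative choices, not part of the slides.

```python
from statistics import mean

# Scores copied from the car example (price, comfort, beauty).
scores = {
    "Honda":  [9, 7, 3],
    "Volvo":  [3, 10, 8],
    "Subaru": [9, 5, 4],
}

def aggregate(scores, func):
    """Apply one aggregation function to every item's list of scores."""
    return {item: func(values) for item, values in scores.items()}

total_min = aggregate(scores, min)    # TotalRank(min)
total_avg = aggregate(scores, mean)   # TotalRank(avg)

for item in scores:
    print(item, total_min[item], round(total_avg[item], 3))
```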
Naïve Algorithm
• Compute the aggregated rank for all items
• Find the best one, then the second best one, … up to the k-th best one
• Good for small-scale problems
• Still not feasible at web scale…
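A minimal sketch of the naïve approach, assuming that a larger aggregated score is better and that all scores fit in memory. The function name `naive_top_k` is hypothetical; the data is the car example.

```python
import heapq

def naive_top_k(scores, agg, k):
    """Aggregate every item, then pick the k largest aggregated scores."""
    aggregated = {item: agg(values) for item, values in scores.items()}
    return heapq.nlargest(k, aggregated.items(), key=lambda pair: pair[1])

def avg(values):
    return sum(values) / len(values)

cars = {
    "Honda":  [9, 7, 3],
    "Volvo":  [3, 10, 8],
    "Subaru": [9, 5, 4],
}

top2 = naive_top_k(cars, avg, 2)
print([name for name, _ in top2])
```

Every item is touched once, which is what makes this approach impractical when the item set is web-sized.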
Can we do any better?
• An assumption to help us: each individual list comes sorted
– Reasonable for search engines, user rankings…
• Another assumption: monotonicity of the aggregation function
• Now can we do any better?
Fagin's algorithm (FA)
• Do sorted access on all lists in parallel
• For every item seen, do random access to the other lists to fetch all of its values
• Stop when at least k items have been seen (under sorted access) in all lists
• Sort the seen items by aggregated score
• Why is this enough?
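The steps of FA can be sketched as follows. This is an illustrative implementation under stated assumptions: every list contains the same items, each list is given as (item, score) pairs sorted by descending score, and random access is modelled by a dictionary lookup. The names `fagin_top_k`, `beauty`, and `comfort` are chosen for this sketch.

```python
def fagin_top_k(lists, agg, k):
    """Return (top-k items with scores, depth reached under sorted access)."""
    lookup = [dict(lst) for lst in lists]   # random access: item -> score
    m = len(lists)
    n = max(len(lst) for lst in lists)
    seen_in = {}            # item -> set of list indices where it was seen
    seen_everywhere = 0     # number of items seen in all m lists
    depth = 0
    # Phase 1: sorted access in parallel, one row of all lists at a time.
    while seen_everywhere < k and depth < n:
        for i, lst in enumerate(lists):
            if depth < len(lst):
                item, _ = lst[depth]
                where = seen_in.setdefault(item, set())
                if i not in where:
                    where.add(i)
                    if len(where) == m:
                        seen_everywhere += 1
        depth += 1
    # Phase 2: random access to fill in the missing scores of every seen item.
    scored = [(item, agg([lk[item] for lk in lookup])) for item in seen_in]
    # Phase 3: sort the seen items and keep the best k.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k], depth

def avg(values):
    return sum(values) / len(values)

beauty = [("A", 9), ("B", 9), ("C", 3), ("D", 1)]
comfort = [("B", 10), ("C", 5), ("A", 4), ("D", 3)]

top3, depth = fagin_top_k([beauty, comfort], avg, 3)
print(top3, depth)
```

With the Beauty and Comfort lists from the example, the sketch reads three rows of each list before three items have been seen in both.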
Example
Beauty         Comfort        Average
Item  Score    Item  Score    Item  Score
A     9        B     10       A     6.5
B     9        C     5
C     3        A     4
D     1        D     3
Example
Beauty         Comfort        Average
Item  Score    Item  Score    Item  Score
A     9        B     10       A     6.5
B     9        C     5        B     9.5
C     3        A     4
D     1        D     3
Example
Beauty         Comfort        Average
Item  Score    Item  Score    Item  Score
A     9        B     10       A     6.5
B     9        C     5        B     9.5
C     3        A     4        C     4
D     1        D     3
Example (top-3)
Beauty         Comfort        Average
Item  Score    Item  Score    Item  Score
A     9        B     10       A     6.5
B     9        C     5        B     9.5
C     3        A     4        C     4
D     1        D     3
How do we know not to look further?
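One way to write down the answer, using the two assumptions made earlier (sorted lists and a monotone aggregation function). The symbols are notation introduced for this note only: t is the aggregation function, m the number of lists, x one of the k items seen under sorted access in every list, and y an item never seen under sorted access.

```latex
% y lies below x in every list, because x was reached by sorted access and y was not:
\[
y_i \le x_i \quad \text{for all } i = 1, \dots, m
\]
% Monotonicity of the aggregation function t then gives:
\[
t(y_1, \dots, y_m) \le t(x_1, \dots, x_m)
\]
```

So at least k seen items score no lower than any unseen item, and the best k among the seen items (all fully scored by random access) form a valid top-k answer.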
Complexity
• Probabilistic analysis on the order of items can be used to show better bounds (with good probability)
• Can we do even better?
Cost model
• This is a very simple setting, so we can define a finer cost model than worst-case complexity
• In a web context it is important to do so, since the scale is huge
• We associate some cost Cs with every sorted access, and some cost Cr with every random access
• Denote the cost for algorithm A on input instance I by cost(A,I)
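Written as a formula, one natural reading of this cost model is a weighted count of accesses. The counters s and r are notation introduced here: the number of sorted and random accesses that A performs on I.

```latex
\[
\mathrm{cost}(A, I) = s \cdot C_s + r \cdot C_r
\]
```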
Instance-optimality
• An algorithm A is instance-optimal if for every input instance I, cost(A,I) = O(cost(A',I)) for every algorithm A'
• A very strong notion
• But we can realize it here!
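Spelling out the O(·) in the definition: the hidden constants must not depend on the instance or on the competing algorithm. The names c and c' for the constants, and the sets 𝒜 (algorithms considered) and 𝒟 (legal inputs), are notation assumed for this sketch.

```latex
\[
\exists\, c, c' > 0 \;\; \forall A' \in \mathcal{A} \;\; \forall I \in \mathcal{D} : \quad
\mathrm{cost}(A, I) \le c \cdot \mathrm{cost}(A', I) + c'
\]
```

This is stronger than worst-case optimality, which only compares the maximum cost over all instances.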
Threshold Algorithm (TA)
• Idea: sometimes we can stop before seeing k objects in every list
• Use a threshold that bounds how good the score of an unseen object can be
• The threshold is the aggregation of the lowest score seen so far in each list
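A minimal sketch of TA under stated assumptions: every list holds the same items, sorted by descending score; sorted access proceeds one list at a time in round-robin order; a list that has not been read yet contributes an infinite bound to the threshold. The function name `threshold_top_k` is hypothetical.

```python
import heapq
import math

def threshold_top_k(lists, agg, k):
    """Return (top-k items with scores, number of sorted accesses made)."""
    lookup = [dict(lst) for lst in lists]   # random access: item -> score
    m, n = len(lists), len(lists[0])
    last = [math.inf] * m                   # lowest score read so far per list
    scores = {}                             # item -> aggregated score
    for step in range(n * m):
        i, depth = step % m, step // m      # round-robin over the lists
        item, score = lists[i][depth]       # one sorted access
        last[i] = score
        if item not in scores:              # random access to the other lists
            scores[item] = agg([lk[item] for lk in lookup])
        threshold = agg(last)               # best score an unseen item could have
        top = heapq.nlargest(k, scores.items(), key=lambda pair: pair[1])
        if len(top) == k and top[-1][1] >= threshold:
            return top, step + 1
    top = heapq.nlargest(k, scores.items(), key=lambda pair: pair[1])
    return top, n * m

def avg(values):
    return sum(values) / len(values)

beauty = [("A", 9), ("B", 9), ("C", 3), ("D", 1)]
comfort = [("B", 10), ("C", 5), ("A", 4), ("D", 3)]

top3, accesses = threshold_top_k([beauty, comfort], avg, 3)
print(top3, accesses)
```

On the Beauty and Comfort lists with k = 3, the threshold in this sketch goes through 9.5, 9.5, 7 and 4, and the run halts after five sorted accesses, where reading three full rows of both lists would take six.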
Example
Beauty         Comfort        Average
Item  Score    Item  Score    Item  Score
A     9        B     10       A     6.5
B     9        C     5
C     3        A     4
D     1        D     3
Example
Beauty         Comfort        Average
Item  Score    Item  Score    Item  Score
A     9        B     10       A     6.5
B     9        C     5
C     3        A     4
D     1        D     3
T=9.5
Example
Beauty         Comfort        Average
Item  Score    Item  Score    Item  Score
A     9        B     10       A     6.5
B     9        C     5        B     9.5
C     3        A     4
D     1        D     3
T=9.5
Example
Beauty         Comfort        Average
Item  Score    Item  Score    Item  Score
A     9        B     10       A     6.5
B     9        C     5        B     9.5
C     3        A     4        C     4
D     1        D     3
T=7
Example
Beauty         Comfort        Average
Item  Score    Item  Score    Item  Score
A     9        B     10       A     6.5
B     9        C     5        B     9.5
C     3        A     4        C     4
D     1        D     3
T=4
One step less!
Instance-optimality
• Theorem: If the aggregation function is strictly monotone and every two items in a list have distinct grades, then TA is instance-optimal
– Intuition: If an algorithm stops on input I before reaching the threshold, then we can design an input I' on which it is wrong, by changing values it did not see
– TA sees at most k items more than any algorithm on any input
• Strict monotonicity is needed to avoid "lucky guesses" in breaking ties
– Thm. In general no instance-optimal algorithm exists
• Theorem: TA is instance-optimal against all algorithms that do not "guess"
– i.e. do not do random access to an item they did not see in sorted access
Restricted Sorted Access
• Some rankings are not available in sorted order
– E.g. distances from a map site
• Then we can revise TA to do sorted access only on the lists where it is possible
• And it is still instance-optimal! (Against algorithms that work under the same restrictions, of course)
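A rough sketch of the revised TA, assuming a known maximum grade (`ceiling`, a parameter introduced here) that stands in for the threshold contribution of every list that cannot be read in sorted order. The function name `restricted_ta_top_k` and the argument `sorted_ok` (indices of the lists that allow sorted access) are hypothetical; all lists are assumed to hold the same items.

```python
import heapq

def restricted_ta_top_k(lists, sorted_ok, agg, k, ceiling):
    """Return (top-k items with scores, number of sorted accesses made)."""
    lookup = [dict(lst) for lst in lists]   # random access works on every list
    bound = [ceiling] * len(lists)          # lists without sorted access keep the ceiling
    scores = {}                             # item -> aggregated score
    accesses = 0
    for depth in range(len(lists[0])):
        for i in sorted_ok:                 # sorted access only where it is allowed
            item, score = lists[i][depth]
            accesses += 1
            bound[i] = score
            if item not in scores:
                scores[item] = agg([lk[item] for lk in lookup])
            top = heapq.nlargest(k, scores.items(), key=lambda pair: pair[1])
            if len(top) == k and top[-1][1] >= agg(bound):
                return top, accesses
    top = heapq.nlargest(k, scores.items(), key=lambda pair: pair[1])
    return top, accesses

def avg(values):
    return sum(values) / len(values)

beauty = [("A", 9), ("B", 9), ("C", 3), ("D", 1)]
comfort = [("B", 10), ("C", 5), ("A", 4), ("D", 3)]

# Only Beauty (index 0) can be read in sorted order; grades are assumed to be at most 10.
top1, accesses = restricted_ta_top_k([beauty, comfort], [0], avg, 1, ceiling=10)
print(top1, accesses)
```

The threshold is looser than in plain TA because the restricted lists never lower their bound, so the run can only stop once the sorted lists alone push the threshold down far enough.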
No Random Access
• Maintain lower and upper bounds for every item (worst and best grades)
• Best is the aggregation of what we have seen for the item and, for each list where it is still unseen, the lowest score seen so far in that list; Worst is the aggregation of what we have seen and zeros
• Keep in the list the items with the top-k "worst" grades
– Break ties by "best" grades
• Halt if we have k items in the list, and the best grade for every item out of the list is less than the k-th worst grade in the list
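The bookkeeping above can be sketched as follows, under several assumptions: all lists hold the same items, scores are non-negative (so `floor=0.0` is a valid lower bound), sorted access reads one row of every list at a time, and the halting test uses "less than or equal", which is still safe because ties can be broken arbitrarily. The function name `nra_top_k` is hypothetical.

```python
def nra_top_k(lists, agg, k, floor=0.0):
    """Return (top-k items as (item, worst, best), depth reached)."""
    m, n = len(lists), len(lists[0])
    known = {}            # item -> {list index: score seen under sorted access}
    last = [None] * m     # lowest score read so far in each list

    def worst(item):      # unseen fields replaced by the assumed minimum score
        return agg([known[item].get(i, floor) for i in range(m)])

    def best(item):       # unseen fields replaced by the last score read in that list
        return agg([known[item].get(i, last[i]) for i in range(m)])

    ranked = []
    for depth in range(n):
        for i, lst in enumerate(lists):
            item, score = lst[depth]
            known.setdefault(item, {})[i] = score
            last[i] = score
        # Order by worst grade, breaking ties by best grade.
        ranked = sorted(known, key=lambda it: (worst(it), best(it)), reverse=True)
        top, rest = ranked[:k], ranked[k:]
        if len(top) == k:
            kth = worst(top[-1])
            # An item never seen anywhere can score at most agg(last).
            no_unseen_rival = len(known) == n or agg(last) <= kth
            if no_unseen_rival and all(best(it) <= kth for it in rest):
                return [(it, worst(it), best(it)) for it in top], depth + 1
    return [(it, worst(it), best(it)) for it in ranked[:k]], n

def avg(values):
    return sum(values) / len(values)

beauty = [("A", 9), ("B", 9), ("C", 3), ("D", 1)]
comfort = [("B", 10), ("C", 5), ("A", 4), ("D", 3)]

top1, depth1 = nra_top_k([beauty, comfort], avg, 1)
top2, depth2 = nra_top_k([beauty, comfort], avg, 2)
print(top1, depth1)
print(top2, depth2)
```

Note that the sketch returns bounds rather than exact grades: without random access, an item can be confirmed as a top-k member before all of its scores are known.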
Example
Beauty         Comfort        Average
Item  Score    Item  Score    Item  Score
A     9        B     10       A     4.5<S<9
B     9        C     5
C     3        A     4
D     1        D     3
Example
Beauty         Comfort        Average
Item  Score    Item  Score    Item  Score
A     9        B     10       A     4.5<S<9
B     9        C     5        B     5<S<10
C     3        A     4
D     1        D     3
Example
Beauty         Comfort        Average
Item  Score    Item  Score    Item  Score
A     9        B     10       A     4.5<S<9
B     9        C     5        B     9.5
C     3        A     4
D     1        D     3
Example
Beauty         Comfort        Average
Item  Score    Item  Score    Item  Score
A     9        B     10       A     4.5<S<9
B     9        C     5        B     9.5
C     3        A     4        C     2.5<S<5
D     1        D     3
Example
Beauty         Comfort        Average
Item  Score    Item  Score    Item  Score
A     9        B     10       A     6.5
B     9        C     5        B     9.5
C     3        A     4        C     4
D     1        D     3
Item  Score
A     4.5<S<9
B     9.5
C     4
Example
Beauty         Comfort        Average
Item  Score    Item  Score    Item  Score
A     9        B     10       A     6.5
B     9        C     5        B     9.5
C     3        A     4        C     4
D     1        D     3
Item  Score
A     6.5
B     9.5
C     4
Example
Beauty         Comfort        Average
Item  Score    Item  Score    Item  Score
A     9        B     10       A     6.5
B     9        C     5        B     9.5
C     3        A     4        C     4
D     1        D     3
Item  Score
A     6.5
B     9.5
C     4
Score(D)<3