Ranking, Aggregation, and You
Lester Mackey†
Collaborators: John C. Duchi† and Michael I. Jordan∗
†Stanford University ∗UC Berkeley
October 5, 2014
A simple question

- On a scale of 1 (very white) to 10 (very black), how black is this box?
- Which box is blacker?
Another question
On a scale of 1 to 10, how relevant is this result for the query flowers?
![Page 7: Ranking, Aggregation, and Youlmackey/papers/rank... · Ranking, Aggregation, and You Lester Mackeyy Collaborators: John C. Duchiyand Michael I. Jordan yStanford University UC Berkeley](https://reader034.vdocument.in/reader034/viewer/2022042712/5f96c3978b13700fd774ffe4/html5/thumbnails/7.jpg)
Another question
![Page 8: Ranking, Aggregation, and Youlmackey/papers/rank... · Ranking, Aggregation, and You Lester Mackeyy Collaborators: John C. Duchiyand Michael I. Jordan yStanford University UC Berkeley](https://reader034.vdocument.in/reader034/viewer/2022042712/5f96c3978b13700fd774ffe4/html5/thumbnails/8.jpg)
What have we learned?

1. We are good at pairwise comparisons
   - Much worse at absolute relevance judgments [Miller, 1956, Shiffrin and Nosofsky, 1994, Stewart, Brown, and Chater, 2005]
2. We are good at expressing sparse, partial preferences
   - Much worse at expressing complete preferences

Complete preferences:
ftd.com
en.wikipedia.org/...
1800flowers.com

What you express:
ftd.com
en.wikipedia.org/...
1800flowers.com
Ranking
Goal: Order set of items/results to best match your preferences
- Web search: Return most relevant URLs for user queries
- Recommendation systems:
  - Movies to watch based on user's past ratings
  - News articles to read based on past browsing history
  - Items to buy based on patron's or other patrons' purchases
Ranking procedures

Goal: Order set of items/results to best match your preferences

1. Tractable: Run in polynomial time
2. Consistent: Recover true preferences given sufficient data
3. Realistic: Make use of ubiquitous partial preference data

Past work: 1+2 are possible given complete preference data [Ravikumar, Tewari, and Yang, 2011, Buffoni, Calauzenes, Gallinari, and Usunier, 2011]

This work [Duchi, Mackey, and Jordan, 2013]:
- Standard (tractable) procedures for ranking with partial preferences are inconsistent
- Aggregating partial preferences into more complete preferences can restore consistency
- New estimators based on U-statistics achieve 1+2+3
Outline
Supervised Ranking
- Formal definition
- Tractable surrogates
- Pairwise inconsistency

Aggregation
- Restoring consistency
- Estimating complete preferences

U-statistics
- Practical procedures
- Experimental results
Supervised ranking
Observe: Sequence of training examples

- Query Q: e.g., search term “flowers”
- Set of m items I_Q to rank
  - e.g., websites {1, 2, 3, 4}
- Label Y representing some preference structure over items
  - e.g., item 1 preferred to {2, 3} and item 3 to 4

[Figure: Y as a preference graph on items {1, 2, 3, 4}, with weighted edges y12, y13, and y34]
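As a concrete sketch (variable names are illustrative, not from the slides), the example label Y can be stored as a weighted adjacency matrix whose nonzero entries are the preference edges:

```python
import numpy as np

# Preference graph Y over items {1, 2, 3, 4} (0-indexed below):
# item 1 preferred to items 2 and 3, and item 3 preferred to item 4.
m = 4
Y = np.zeros((m, m))
Y[0, 1] = 1.0  # y12: 1 preferred to 2
Y[0, 2] = 1.0  # y13: 1 preferred to 3
Y[2, 3] = 1.0  # y34: 3 preferred to 4

# Each nonzero Y[i, j] encodes "i preferred to j" with its edge weight.
edges = [(i + 1, j + 1) for i in range(m) for j in range(m) if Y[i, j] > 0]
print(edges)  # [(1, 2), (1, 3), (3, 4)]
```

Unit weights are used here; in general yij can be any nonnegative preference strength.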
Supervised ranking
Observe: (Q1, Y1), …, (Qn, Yn)

Learn: Scoring function f to induce item rankings for each query

- Real-valued score for each item i in item set I_Q: αi := fi(Q)
- Vector of scores f(Q) induces ranking over I_Q: i ranked above j ⟺ αi > αj
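A minimal sketch of how a score vector induces a ranking (sorting item indices by decreasing score; the score values are hypothetical):

```python
import numpy as np

# Scores alpha = f(Q) for the items in I_Q (illustrative values).
alpha = np.array([0.9, 0.1, 0.5, 0.3])  # alpha_i = f_i(Q)

# i is ranked above j iff alpha_i > alpha_j, so sort by decreasing score.
ranking = np.argsort(-alpha)  # item indices, best first
print(ranking)  # [0 2 3 1]

# Spot-check the induced pairwise relation for one pair:
i, j = 0, 2
assert (alpha[i] > alpha[j]) == (list(ranking).index(i) < list(ranking).index(j))
```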
Supervised ranking
Example: A scoring function f with scores f1(Q) > f2(Q) > f3(Q) induces the same ranking as the preference graph Y.

[Figure: preference graph Y on items 1, 2, 3, matched by the score conditions f1(Q) > f2(Q) and f2(Q) > f3(Q)]
Supervised ranking
Observe: (Q1, Y1), …, (Qn, Yn)

Learn: Scoring function f to predict item ranking

Suffer loss: L(f(Q), Y)

- Encodes discord between observed label Y and prediction f(Q)
- Depends on specific ranking task and available data
Supervised ranking

Example: Pairwise loss

- Let Y = (weighted) adjacency matrix for a preference graph
  - Yij = the preference weight on edge (i, j)
- Let α = f(Q) be the predicted scores for query Q
- Then L(α, Y) = ∑_{i≠j} Yij 1(αi ≤ αj)
- Imposes a penalty for each misordered edge

For the example graph with edges y12, y13, y34:
L(α, Y) = Y12 1(α1 ≤ α2) + Y13 1(α1 ≤ α3) + Y34 1(α3 ≤ α4)
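The pairwise loss can be computed directly from the adjacency matrix; here is a sketch on the example graph (0-indexed, unit edge weights, score values mine):

```python
import numpy as np

def pairwise_loss(alpha, Y):
    """L(alpha, Y) = sum over i != j of Y_ij * 1(alpha_i <= alpha_j)."""
    m = len(alpha)
    return sum(Y[i, j] * (alpha[i] <= alpha[j])
               for i in range(m) for j in range(m) if i != j)

# Example graph: edges y12, y13, y34 (0-indexed), unit weights.
Y = np.zeros((4, 4))
Y[0, 1] = Y[0, 2] = Y[2, 3] = 1.0

alpha = np.array([3.0, 2.0, 1.0, 0.5])      # respects every edge -> loss 0
print(pairwise_loss(alpha, Y))              # 0.0

alpha_bad = np.array([0.0, 2.0, 1.0, 0.5])  # item 1 misordered vs 2 and 3
print(pairwise_loss(alpha_bad, Y))          # 2.0
```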
Supervised ranking
Observe: (Q1, Y1), …, (Qn, Yn)

Learn: Scoring function f to rank items

Suffer loss: L(f(Q), Y)

Goal: Minimize the risk R(f) := E[L(f(Q), Y)]

Main Question: Are there tractable ranking procedures that minimize R as n → ∞?
Tractable ranking

First try: Empirical risk minimization ← Intractable!

min_f Rn(f) := En[L(f(Q), Y)] = (1/n) ∑_{k=1}^{n} L(f(Qk), Yk)

Idea: Replace loss L(α, Y) with convex surrogate ϕ(α, Y)

Hard: L(α, Y) = ∑_{i≠j} Yij 1(αi ≤ αj)
Tractable: ϕ(α, Y) = ∑_{i≠j} Yij φ(αi − αj)
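The swap can be illustrated with a logistic φ, one common convex choice of the slide's φ(αi − αj) form (the specific φ here is my choice for illustration, not prescribed by the talk):

```python
import numpy as np

def zero_one_loss(alpha, Y):
    """Hard loss: sum_{i != j} Y_ij 1(alpha_i <= alpha_j) -- piecewise constant."""
    m = len(alpha)
    return sum(Y[i, j] * (alpha[i] <= alpha[j])
               for i in range(m) for j in range(m) if i != j)

def surrogate_loss(alpha, Y, phi=lambda t: np.log1p(np.exp(-t))):
    """Convex surrogate: sum_{i != j} Y_ij phi(alpha_i - alpha_j), logistic phi."""
    m = len(alpha)
    return sum(Y[i, j] * phi(alpha[i] - alpha[j])
               for i in range(m) for j in range(m) if i != j)

Y = np.zeros((3, 3))
Y[0, 1] = Y[1, 2] = 1.0
alpha = np.array([2.0, 1.0, 0.0])
# The surrogate is smooth and convex in alpha, so it can be minimized by
# gradient methods, while the indicator loss is flat almost everywhere.
print(zero_one_loss(alpha, Y), surrogate_loss(alpha, Y))
```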
Surrogate ranking

Idea: Empirical surrogate risk minimization

min_f Rϕ,n(f) := En[ϕ(f(Q), Y)] = (1/n) ∑_{k=1}^{n} ϕ(f(Qk), Yk)

- If ϕ is convex, then minimization is tractable
- argmin_f Rϕ,n(f) → argmin_f Rϕ(f) := E[ϕ(f(Q), Y)] as n → ∞

Main Question: Are these tractable ranking procedures consistent?
⟺ Does argmin_f Rϕ(f) also minimize the true risk R(f)?
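A toy sketch of empirical surrogate risk minimization by gradient descent, using the logistic φ. To keep it short, f(Q) is reduced to a single free score vector over one fixed item set; that simplification is mine, not the paper's estimator:

```python
import numpy as np

# Observed labels: two preference matrices, a vote for edge (0,1) and one for (1,2).
Y1 = np.zeros((3, 3)); Y1[0, 1] = 1.0
Y2 = np.zeros((3, 3)); Y2[1, 2] = 1.0
labels = [Y1, Y2]

def surrogate_risk_grad(alpha, labels):
    """Gradient of (1/n) sum_k sum_{i!=j} Y_ij * log(1 + exp(-(alpha_i - alpha_j)))."""
    g = np.zeros_like(alpha)
    for Y in labels:
        for i in range(len(alpha)):
            for j in range(len(alpha)):
                if Y[i, j] > 0:
                    # d/dt log(1 + e^{-t}) = -1 / (1 + e^{t}) at t = alpha_i - alpha_j
                    s = 1.0 / (1.0 + np.exp(alpha[i] - alpha[j]))
                    g[i] -= Y[i, j] * s
                    g[j] += Y[i, j] * s
    return g / len(labels)

alpha = np.zeros(3)
for _ in range(200):  # plain gradient descent on the convex surrogate risk
    alpha -= 0.5 * surrogate_risk_grad(alpha, labels)

print(alpha)  # scores end up ordered alpha[0] > alpha[1] > alpha[2]
```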
Classification consistency

Consider the special case of classification

- Observe: query X, items {0, 1}, label Y01 = 1 or Y10 = 1
- Pairwise loss: L(α, Y) = Y01 1(α0 ≤ α1) + Y10 1(α1 ≤ α0)
- Surrogate loss: ϕ(α, Y) = Y01 φ(α0 − α1) + Y10 φ(α1 − α0)

Theorem: If φ is convex, the procedure based on minimizing φ is consistent if and only if φ′(0) < 0. [Bartlett, Jordan, and McAuliffe, 2006]

⇒ Tractable consistency for boosting, SVMs, logistic regression
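The φ′(0) < 0 condition is easy to check numerically for the losses behind those three methods (a finite-difference sketch; the list of losses matches the slide's examples, the code itself is illustrative):

```python
import numpy as np

losses = {
    "exponential (boosting)": lambda t: np.exp(-t),
    "hinge (SVM)":            lambda t: np.maximum(0.0, 1.0 - t),
    "logistic":               lambda t: np.log1p(np.exp(-t)),
}

h = 1e-6
for name, phi in losses.items():
    # One-sided difference approximates phi'(0); all three losses are
    # differentiable at 0 (the hinge kink is at t = 1, not t = 0).
    slope = (phi(h) - phi(0.0)) / h
    print(f"{name}: phi'(0) ≈ {slope:.3f}  (< 0, so consistent)")
```

All three slopes come out negative, matching the theorem's sufficient condition.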
Classification consistency

Consider the special case of classification
I Observe: query X, items {0, 1}, label Y01 = 1 or Y10 = 1
I Pairwise loss: L(α, Y) = Y01 1(α0 ≤ α1) + Y10 1(α1 ≤ α0)
I Surrogate loss: ϕ(α, Y) = Y01 φ(α0 − α1) + Y10 φ(α1 − α0)

Theorem: If φ is convex, the procedure based on minimizing φ is consistent if and only if φ′(0) < 0. [Bartlett, Jordan, and McAuliffe, 2006]

⇒ Tractable consistency for boosting, SVMs, logistic regression
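As a quick illustrative check (not from the slides), the condition φ′(0) < 0 can be verified numerically for the three surrogates behind the methods listed above:

```python
import math

# Surrogates underlying boosting, SVMs, and logistic regression.
surrogates = {
    "exponential (boosting)": lambda t: math.exp(-t),
    "hinge (SVM)":            lambda t: max(0.0, 1.0 - t),
    "logistic":               lambda t: math.log1p(math.exp(-t)),
}

h = 1e-6
slopes = {}
for name, phi in surrogates.items():
    # Central finite difference approximates phi'(0).
    slopes[name] = (phi(h) - phi(-h)) / (2 * h)
    print(name, "phi'(0) ~", round(slopes[name], 3))
```

All three slopes come out negative, matching the Bartlett-Jordan-McAuliffe condition.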
Ranking consistency?

Good news: Can characterize surrogate ranking consistency

Theorem:¹ Procedure based on minimizing ϕ is consistent ⇐⇒

min_α { E[ϕ(α, Y) | q] : α ∉ argmin_{α′} E[L(α′, Y) | q] } > min_α E[ϕ(α, Y) | q].

I Translation: ϕ is consistent if and only if minimizing the conditional surrogate risk gives the correct ranking for every query

¹[Duchi, Mackey, and Jordan, 2013]
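For intuition, the condition is easy to verify in the two-item case (an illustrative check, not from the talk): with the logistic surrogate, the conditional surrogate risk over the score gap d = α0 − α1 is minimized at d* = log(s01/s10), whose sign matches s01 − s10, so the surrogate-optimal scores give the correct ranking for every query.

```python
import math

def cond_surrogate_risk(d, s01, s10):
    # E[phi(alpha, Y) | q] for two items with logistic phi,
    # as a function of the score gap d = alpha0 - alpha1.
    phi = lambda t: math.log1p(math.exp(-t))
    return s01 * phi(d) + s10 * phi(-d)

results = {}
for s01, s10 in [(0.7, 0.3), (0.2, 0.8), (0.55, 0.45)]:
    # Grid search for the minimizing score gap.
    grid = [x / 100.0 for x in range(-500, 501)]
    d_star = min(grid, key=lambda d: cond_surrogate_risk(d, s01, s10))
    results[(s01, s10)] = d_star
```

The theorem says consistency requires exactly this: any α outside the argmin of the true conditional risk must pay strictly more surrogate risk than the best α.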
Ranking consistency?

Bad news: The consequences are dire...

Consider the pairwise loss:

L(α, Y) = ∑_{i≠j} Yij 1(αi ≤ αj)

[Figure: preference graph on items 1-4 with observed edges y12, y13, y34]

Task: Find argmin_α E[L(α, Y) | q]
I Classification (two node) case: Easy
  I Choose α0 > α1 ⇐⇒ P[Class 0 | q] > P[Class 1 | q]
I General case: NP-hard
  I Unless P = NP, must restrict problem for tractable consistency
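A brute-force sketch (hypothetical preference matrix, not from the talk) makes the hardness concrete: exact minimization of the conditional pairwise risk searches all m! orderings, which is the minimum feedback arc set problem once average preferences contain cycles.

```python
from itertools import permutations

def best_ordering(s):
    # s[i][j] = average preference for i over j. The risk of an
    # ordering counts s[i][j] for every pair it places in the
    # wrong order (i ranked below j).
    m = len(s)
    def risk(order):
        pos = {item: r for r, item in enumerate(order)}
        return sum(s[i][j] for i in range(m) for j in range(m)
                   if i != j and pos[i] > pos[j])
    # m! candidate orderings: fine for m = 3, hopeless in general.
    return min(permutations(range(m)), key=risk)

# Cyclic average preferences: 0 beats 1, 1 beats 2, 2 beats 0.
s = [[0, .9, .1],
     [0, 0, .9],
     [.8, 0, 0]]
print(best_ordering(s))
```

Here the cycle forces some pair to be mis-ordered no matter what, and only exhaustive search finds the least costly way to break it.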
Low noise distribution

Define: Average preference for item i over item j: sij = E[Yij | q]
I We say i ≻ j on average if sij > sji

Definition (Low noise distribution): If i ≻ j on average and j ≻ k on average, then i ≻ k on average.

[Figure: preference graph on items 1-3 with weights s12, s23, s13, s31; low noise ⇒ s13 > s31]

I No cyclic preferences on average
I Find argmin_α E[L(α, Y) | q]: Very easy
  I Choose αi > αj ⇐⇒ sij > sji
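Under the low-noise condition, finding the risk minimizer really is just a sort; a minimal sketch (hypothetical s matrix) with a pairwise comparator:

```python
from functools import cmp_to_key

def rank_low_noise(s):
    # With acyclic average preferences, the comparator
    # "i before j iff s_ij > s_ji" defines a total order.
    items = list(range(len(s)))
    beats = lambda i, j: -1 if s[i][j] > s[j][i] else 1
    return sorted(items, key=cmp_to_key(beats))

# Transitive average preferences: 2 beats 0, 0 beats 1 (so 2 beats 1).
s = [[0, .6, .1],
     [.2, 0, .0],
     [.7, .9, 0]]
print(rank_low_noise(s))
```

The comparator is only well-defined because low noise rules out cycles; with cyclic s this sort has no consistent answer, which is where the NP-hardness above comes from.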
Ranking consistency?

Pairwise ranking surrogate: [Herbrich, Graepel, and Obermayer, 2000; Freund, Iyer, Schapire, and Singer, 2003; Dekel, Manning, and Singer, 2004]

ϕ(α, Y) = ∑_{ij} Yij φ(αi − αj)

for φ convex with φ′(0) < 0. Common in the ranking literature.

Theorem: ϕ is not consistent, even in low noise settings. [Duchi, Mackey, and Jordan, 2013]

⇒ Inconsistency for RankBoost, RankSVM, Logistic Ranking...
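For reference, the surrogate itself is simple to evaluate; a sketch with the exponential φ of RankBoost (the inconsistency result applies to this whole family of convex φ, not any one choice):

```python
import math

def pairwise_surrogate(alpha, Y, phi=lambda t: math.exp(-t)):
    # phi(alpha, Y) = sum_{ij} Y_ij * phi(alpha_i - alpha_j)
    m = len(alpha)
    return sum(Y[i][j] * phi(alpha[i] - alpha[j])
               for i in range(m) for j in range(m) if i != j)

Y = [[0, 1, 0],
     [0, 0, 1],
     [0, 0, 0]]  # observed preferences: 0 > 1 and 1 > 2

print(pairwise_surrogate([2.0, 1.0, 0.0], Y))  # scores respect Y
print(pairwise_surrogate([0.0, 1.0, 2.0], Y))  # scores reverse Y
```

Scores that respect the observed preferences get a smaller surrogate value, as intended; the theorem says that despite this the population minimizer can still rank incorrectly.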
Ranking with pairwise data is challenging

I Inconsistent in general (unless P = NP)
I Low noise distributions
  I Inconsistent for standard convex losses
    ϕ(α, Y) = ∑_{ij} Yij φ(αi − αj)
  I Inconsistent for margin-based convex losses
    ϕ(α, Y) = ∑_{ij} φ(αi − αj − Yij)

Question: Do tractable consistent losses exist for partial preference data?

Yes, if we aggregate!
![Page 75: Ranking, Aggregation, and Youlmackey/papers/rank... · Ranking, Aggregation, and You Lester Mackeyy Collaborators: John C. Duchiyand Michael I. Jordan yStanford University UC Berkeley](https://reader034.vdocument.in/reader034/viewer/2022042712/5f96c3978b13700fd774ffe4/html5/thumbnails/75.jpg)
Outline
Supervised Ranking
  Formal definition
  Tractable surrogates
  Pairwise inconsistency

Aggregation
  Restoring consistency
  Estimating complete preferences

U-statistics
  Practical procedures
  Experimental results
![Page 76: Ranking, Aggregation, and Youlmackey/papers/rank... · Ranking, Aggregation, and You Lester Mackeyy Collaborators: John C. Duchiyand Michael I. Jordan yStanford University UC Berkeley](https://reader034.vdocument.in/reader034/viewer/2022042712/5f96c3978b13700fd774ffe4/html5/thumbnails/76.jpg)
An observation
Can rewrite risk of pairwise loss
E[L(α, Y ) | q] =∑
i 6=j
sij1(αi≤αj)
=∑
i 6=j
max{sij − sji, 0}1(αi≤αj)
where sij = E[Yij | q].
I Only depends on net expected preferences: sij − sjiConsider the surrogate
ϕ(α, s) :=∑
i 6=j
max{sij − sji, 0}φ(αi − αj)
6=∑
i6=j
sijφ(αi − αj)
for φ non-increasing and convex, with φ′(0) < 0.
I Either i→ j penalized or j → i but not both
I Consistent whenever average preferences are acyclic
![Page 77: Ranking, Aggregation, and Youlmackey/papers/rank... · Ranking, Aggregation, and You Lester Mackeyy Collaborators: John C. Duchiyand Michael I. Jordan yStanford University UC Berkeley](https://reader034.vdocument.in/reader034/viewer/2022042712/5f96c3978b13700fd774ffe4/html5/thumbnails/77.jpg)
An observation
Can rewrite risk of pairwise loss
E[L(α, Y ) | q] =∑
i 6=j
sij1(αi≤αj) =∑
i 6=j
max{sij − sji, 0}1(αi≤αj)
where sij = E[Yij | q].I Only depends on net expected preferences: sij − sji
Consider the surrogate
ϕ(α, s) :=∑
i 6=j
max{sij − sji, 0}φ(αi − αj)
6=∑
i6=j
sijφ(αi − αj)
for φ non-increasing and convex, with φ′(0) < 0.
I Either i→ j penalized or j → i but not both
I Consistent whenever average preferences are acyclic
![Page 78: Ranking, Aggregation, and Youlmackey/papers/rank... · Ranking, Aggregation, and You Lester Mackeyy Collaborators: John C. Duchiyand Michael I. Jordan yStanford University UC Berkeley](https://reader034.vdocument.in/reader034/viewer/2022042712/5f96c3978b13700fd774ffe4/html5/thumbnails/78.jpg)
An observation
Can rewrite risk of pairwise loss
E[L(α, Y ) | q] =∑
i 6=j
sij1(αi≤αj) =∑
i 6=j
max{sij − sji, 0}1(αi≤αj)
where sij = E[Yij | q].I Only depends on net expected preferences: sij − sji
Consider the surrogate
ϕ(α, s) :=∑
i 6=j
max{sij − sji, 0}φ(αi − αj)
6=∑
i 6=j
sijφ(αi − αj)
for φ non-increasing and convex, with φ′(0) < 0.
I Either i→ j penalized or j → i but not both
I Consistent whenever average preferences are acyclic
![Page 79: Ranking, Aggregation, and Youlmackey/papers/rank... · Ranking, Aggregation, and You Lester Mackeyy Collaborators: John C. Duchiyand Michael I. Jordan yStanford University UC Berkeley](https://reader034.vdocument.in/reader034/viewer/2022042712/5f96c3978b13700fd774ffe4/html5/thumbnails/79.jpg)
An observation
Can rewrite risk of pairwise loss
E[L(α, Y ) | q] =∑
i 6=j
sij1(αi≤αj) =∑
i 6=j
max{sij − sji, 0}1(αi≤αj)
where sij = E[Yij | q].I Only depends on net expected preferences: sij − sji
Consider the surrogate
ϕ(α, s) :=∑
i 6=j
max{sij − sji, 0}φ(αi − αj) 6=∑
i 6=j
sijφ(αi − αj)
for φ non-increasing and convex, with φ′(0) < 0.
I Either i→ j penalized or j → i but not both
I Consistent whenever average preferences are acyclic
![Page 80: Ranking, Aggregation, and Youlmackey/papers/rank... · Ranking, Aggregation, and You Lester Mackeyy Collaborators: John C. Duchiyand Michael I. Jordan yStanford University UC Berkeley](https://reader034.vdocument.in/reader034/viewer/2022042712/5f96c3978b13700fd774ffe4/html5/thumbnails/80.jpg)
An observation

Can rewrite risk of pairwise loss

E[L(α, Y) | q] = ∑_{i≠j} sij 1(αi ≤ αj) = ∑_{i≠j} max{sij − sji, 0} 1(αi ≤ αj)

where sij = E[Yij | q].
I Only depends on net expected preferences: sij − sji

Consider the surrogate

ϕ(α, s) := ∑_{i≠j} max{sij − sji, 0} φ(αi − αj) ≠ ∑_{i≠j} sij φ(αi − αj)

for φ non-increasing and convex, with φ′(0) < 0.
I Either i → j penalized or j → i but not both
I Consistent whenever average preferences are acyclic
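A small sketch (hypothetical numbers) of the reweighted surrogate, showing it depends only on the net preferences sij − sji:

```python
import math

def net_surrogate(alpha, s, phi=lambda t: math.log1p(math.exp(-t))):
    # Only the net preference max{s_ij - s_ji, 0} is penalized,
    # so for each pair at most one direction contributes.
    m = len(alpha)
    return sum(max(s[i][j] - s[j][i], 0.0) * phi(alpha[i] - alpha[j])
               for i in range(m) for j in range(m) if i != j)

alpha = [1.0, 0.0]
s       = [[0.0, 0.6], [0.2, 0.0]]
s_noisy = [[0.0, 0.9], [0.5, 0.0]]  # both directions inflated by 0.3

print(net_surrogate(alpha, s), net_surrogate(alpha, s_noisy))
```

Inflating both directions of a pair by the same amount leaves the surrogate unchanged, which is exactly the cancellation that the plain weighting ∑ sij φ(αi − αj) lacks.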
What happened?

Old surrogates: E[ϕ(α, Y) | q] = lim_{k→∞} (1/k) ∑_k ϕ(α, Yk)
I Loss ϕ(α, Y) applied to a single datapoint

New surrogates: ϕ(α, E[Y | q]) = lim_{k→∞} ϕ(α, (1/k) ∑_k Yk)
I Loss applied to aggregation of many datapoints

New framework: Ranking with aggregate losses

L(α, sk(Y1, . . . , Yk)) and ϕ(α, sk(Y1, . . . , Yk))

where sk is a structure function that aggregates the first k datapoints
I sk combines partial preferences into more complete estimates
I Consistency characterization extends to this setting
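The simplest structure function is a running average of the observed preference matrices; a sketch (hypothetical observations) of such an sk:

```python
def s_k(observations):
    # Average the first k pairwise-preference matrices, turning
    # sparse observations into an estimate of the complete
    # preference structure.
    m = len(observations[0])
    k = len(observations)
    return [[sum(Y[i][j] for Y in observations) / k for j in range(m)]
            for i in range(m)]

Y1 = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]  # saw 0 > 1
Y2 = [[0, 0, 0], [0, 0, 1], [0, 0, 0]]  # saw 1 > 2
Y3 = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]  # saw 0 > 1 again

print(s_k([Y1, Y2, Y3]))
```

As k grows this average converges to E[Y | q], which is why the "new surrogates" above can be written as the limit ϕ(α, (1/k) ∑_k Yk).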
Aggregation via structure function

[Figure: several partial preference graphs Y1, Y2, . . . , Yk over items 1-4, combined into a single aggregate graph sk(Y1, . . . , Yk)]

Question: When does aggregation help?
Complete data losses
I Normalized Discounted Cumulative Gain (NDCG)
I Precision, Precision@k
I Expected reciprocal rank (ERR)

Pros: Popular, well-motivated, admit tractable consistent surrogates
I e.g., Penalize mistakes at top of ranked list more heavily

Cons: Require complete preference data

Idea:
I Use aggregation to estimate complete preferences from partial preferences
I Plug estimates into consistent surrogates
I Check that aggregation + surrogacy retains consistency
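As a reminder of what these complete-data losses compute, here is NDCG in a few lines (standard definition with gain 2^rel − 1 and a log2 position discount):

```python
import math

def dcg(relevances):
    # Discounted cumulative gain: rewards putting highly relevant
    # items near the top (position 0 gets discount log2(2) = 1).
    return sum((2**rel - 1) / math.log2(pos + 2)
               for pos, rel in enumerate(relevances))

def ndcg(ranked_relevances):
    # Normalize by the DCG of the ideal (sorted) ordering.
    ideal = sorted(ranked_relevances, reverse=True)
    return dcg(ranked_relevances) / dcg(ideal)

print(ndcg([3, 2, 0, 1]))  # near-ideal ordering
print(ndcg([0, 1, 2, 3]))  # reversed ordering scores lower
```

Note the inputs are complete relevance scores for every item, which is exactly the data requirement the aggregation step is meant to satisfy.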
Cascade model for click data[Craswell, Zoeter, Taylor, and Ramsey, 2008, Chapelle, Metzler, Zhang, and Grinspan, 2009]
I Person i clicks on first relevant result, k(i)
I Relevance probability of item k is pkI Probability of a click on item k is
pk
k−1∏
j=1
(1− pj)
I ERR loss assumes p is known
Estimate p via maximum likelihood on n clicks:
s = argmaxp∈[0,1]m
n∑
i=1
log pk(i) +
k(i)∑
j=1
log(1− pj).
⇒ Consistent ERR minimization under our framework
1
2
3
4
5
![Page 98: Ranking, Aggregation, and Youlmackey/papers/rank... · Ranking, Aggregation, and You Lester Mackeyy Collaborators: John C. Duchiyand Michael I. Jordan yStanford University UC Berkeley](https://reader034.vdocument.in/reader034/viewer/2022042712/5f96c3978b13700fd774ffe4/html5/thumbnails/98.jpg)
Benefits of aggregation

- Tractable consistency for partial preference losses:

  argmin_f lim_{k→∞} E[ϕ(f(Q), s_k(Y_1, ..., Y_k))] ⇒ argmin_f lim_{k→∞} E[L(f(Q), s_k(Y_1, ..., Y_k))]

- Use complete-data losses with realistic partial preference data
- Models the process of generating relevance scores from clicks/comparisons
What remains?

Before aggregation, we had

  argmin_f (1/n) Σ_{k=1}^{n} ϕ(f(Q_k), Y_k)   [empirical]  →  argmin_f E[ϕ(f(Q), Y)]   [population]

What's a suitable empirical analogue R_{ϕ,n}(f) with aggregation? Equivalently, when does

  argmin_f R_{ϕ,n}(f)   [empirical]  →  argmin_f lim_{k→∞} E[ϕ(f(Q), s_k(Y_1, ..., Y_k))]   [population]?
Outline

- Supervised Ranking: formal definition; tractable surrogates; pairwise inconsistency
- Aggregation: restoring consistency; estimating complete preferences
- U-statistics: practical procedures; experimental results
Data with aggregation

[Figure: queries q1, ..., q5, each with preference judgments Y1, Y2, Y3, ...; query q has n_q judgments]

- A datapoint consists of a query q and a preference judgment Y
- n_q datapoints for query q
- Structure functions for aggregation: s(Y_1, Y_2, ..., Y_k)
Data with aggregation

- Simple idea: for query q, aggregate all of Y_1, Y_2, ..., Y_{n_q}
- Loss ϕ for query q is n_q · ϕ(α, s(Y_1, ..., Y_{n_q}))

Cons:

- Requires detailed knowledge of ϕ and of s_k(Y_1, ..., Y_k) as k → ∞

Ideal procedure:

- Agnostic to the form of aggregation
- Takes advantage of the independence of Y_1, Y_2, ...
Digression: U-statistics

- The U-statistic is a classical tool in statistics
- Given X_1, ..., X_n, estimate E[g(X_1, ..., X_k)] for a symmetric kernel g
- Idea: average the estimates from all subsets of k datapoints:

  U_n = (n choose k)^{−1} Σ_{i_1 < ... < i_k} g(X_{i_1}, X_{i_2}, ..., X_{i_k})
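For concreteness, the U-statistic with kernel g(x, y) = (x − y)²/2 recovers the unbiased sample variance, since E[g(X_1, X_2)] = Var(X). A small sketch (it enumerates all subsets, so it is only for tiny n):

```python
from itertools import combinations
from statistics import variance

def u_statistic(xs, k, g):
    """Average a symmetric kernel g over all (n choose k) size-k subsets."""
    subsets = list(combinations(xs, k))
    return sum(g(*s) for s in subsets) / len(subsets)

xs = [2.0, 4.0, 4.0, 8.0]
var_kernel = lambda x, y: (x - y) ** 2 / 2  # kernel whose mean is Var(X)
print(u_statistic(xs, 2, var_kernel))  # matches the unbiased sample variance
print(variance(xs))
```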
Data with aggregation: U-statistic in the loss

- Target: E[ϕ(α, s(Y_1, ..., Y_k)) | q]
- Idea: estimate it with a U-statistic:

  (n_q choose k)^{−1} Σ_{i_1 < ... < i_k} ϕ(α, s(Y_{i_1}, ..., Y_{i_k}))

- Empirical risk for scoring function f:

  R_{ϕ,n}(f) = (1/n) Σ_q n_q (n_q choose k)^{−1} Σ_{i_1 < ... < i_k} ϕ(f(q), s(Y_{i_1}, ..., Y_{i_k}))
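The risk above can be computed exactly when n_q is small. A minimal sketch using a hypothetical squared-error surrogate and coordinate-wise-mean aggregation (both are stand-ins; the framework allows any surrogate ϕ and structure function s):

```python
from itertools import combinations

def empirical_risk(queries, k, phi, aggregate, f):
    """R_{phi,n}(f): for each query, average phi over all k-subsets of its
    judgments (a U-statistic), weight by n_q, and normalize by n."""
    n = sum(len(ys) for ys in queries.values())
    total = 0.0
    for q, ys in queries.items():
        subsets = list(combinations(ys, k))
        u_stat = sum(phi(f(q), aggregate(sub)) for sub in subsets) / len(subsets)
        total += len(ys) * u_stat
    return total / n

mean_agg = lambda sub: [sum(col) / len(sub) for col in zip(*sub)]      # s(Y_1..Y_k)
sq_phi = lambda alpha, s: sum((a - b) ** 2 for a, b in zip(alpha, s))  # surrogate

queries = {"q": [[1.0, 0.0], [0.0, 1.0]]}  # one query, two judgments, two items
risk = empirical_risk(queries, 2, sq_phi, mean_agg, lambda q: [0.5, 0.5])
print(risk)  # 0.0: the predicted scores match the aggregated judgment
```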
Convergence of U-statistic procedures

Empirical risk for scoring function f:

  R_{ϕ,n}(f) = (1/n) Σ_q n_q (n_q choose k)^{−1} Σ_{i_1 < ... < i_k} ϕ(f(q), s(Y_{i_1}, ..., Y_{i_k}))

Theorem: If we choose k_n = o(n) but k_n → ∞, then uniformly in f,

  R_{ϕ,n}(f) → lim_{k→∞} E[ϕ(f(Q), s(Y_1, ..., Y_k))]   (the limiting aggregated loss)
New procedure for learning to rank

- Use a loss function that aggregates per query:

  R_{ϕ,n}(f) = (1/n) Σ_q n_q (n_q choose k)^{−1} Σ_{i_1 < ... < i_k} ϕ(f(q), s(Y_{i_1}, ..., Y_{i_k}))

- Learn the ranking function by taking f̂ ∈ argmin_{f ∈ F} R_{ϕ,n}(f)
- Can optimize by stochastic gradient descent over queries q and subsets (i_1, ..., i_k)
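The double sum never needs to be enumerated: each SGD step can draw one query and one random k-subset, aggregate only those judgments, and step down the surrogate gradient. A sketch with per-query score vectors α_q, mean aggregation, and a pairwise logistic surrogate (all illustrative choices, not the talk's exact setup):

```python
import math
import random

def grad_phi(alpha, s):
    """Gradient of the pairwise logistic surrogate
    phi(alpha, s) = sum over pairs with s_i > s_j of log(1 + exp(alpha_j - alpha_i))."""
    g = [0.0] * len(alpha)
    for i in range(len(s)):
        for j in range(len(s)):
            if s[i] > s[j]:
                sig = 1.0 / (1.0 + math.exp(alpha[i] - alpha[j]))
                g[i] -= sig  # pull the preferred item's score up
                g[j] += sig  # push the dispreferred item's score down
    return g

random.seed(0)
# hypothetical judgments: three relevance vectors per query, three items each
data = {"q1": [[3, 1, 2], [3, 2, 1], [2, 1, 3]],
        "q2": [[1, 3, 2], [2, 3, 1], [1, 3, 2]]}
k, lr = 2, 0.1
alpha = {q: [0.0, 0.0, 0.0] for q in data}
for step in range(2000):
    q = random.choice(list(data))               # random query
    subset = random.sample(data[q], k)          # random k-subset (i_1, ..., i_k)
    s = [sum(col) / k for col in zip(*subset)]  # aggregate s(Y_{i_1}, ..., Y_{i_k})
    g = grad_phi(alpha[q], s)
    alpha[q] = [a - lr * gi for a, gi in zip(alpha[q], g)]

# for q1, item 0 wins every aggregated comparison against item 1, so its score rises
```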
Experiments
I Web search
I Image ranking
Web search

- Microsoft Learning to Rank Web10K dataset
  - 10,000 queries issued
  - 100 items per query
  - Estimated relevance score r ∈ R for each query/result pair
- Generating pairwise preferences
  - Choose a query q uniformly at random
  - Choose a pair (i, j) of items, and set i ≻ j with probability p_ij = 1 / (1 + exp(r_j − r_i))
- Aggregate scores by setting s_i = Σ_{j ≠ i} log [ P(j ≺ i) / P(i ≺ j) ]
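Under this logistic model, log[P(j ≺ i)/P(i ≺ j)] = r_i − r_j, so the aggregate score s_i = Σ_{j≠i}(r_i − r_j) preserves the ordering of the underlying relevance scores. A sketch of both steps, with the aggregator using the model probabilities directly (illustrative code, not the experiment's implementation):

```python
import math
import random

def sample_preference(r, rng):
    """Draw one pairwise preference: i beats j with probability
    p_ij = 1 / (1 + exp(r_j - r_i))."""
    i, j = rng.sample(range(len(r)), 2)
    p_ij = 1.0 / (1.0 + math.exp(r[j] - r[i]))
    return (i, j) if rng.random() < p_ij else (j, i)  # (winner, loser)

def aggregate_scores(r):
    """s_i = sum over j != i of log[P(j ≺ i) / P(i ≺ j)],
    which under the logistic model simplifies to sum of (r_i - r_j)."""
    s = []
    for i in range(len(r)):
        total = 0.0
        for j in range(len(r)):
            if j != i:
                p = 1.0 / (1.0 + math.exp(r[j] - r[i]))  # P(i beats j)
                total += math.log(p / (1.0 - p))
        s.append(total)
    return s

r = [2.0, 1.0, 0.0]
print(aggregate_scores(r))  # approximately [3.0, 0.0, -3.0], same order as r
```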
Benefits of aggregation

[Figure: NDCG@10 risk as a function of the aggregation level (order) k, for k = 10^0 to 10^4 and n = 10^6 samples, comparing Aggregate, Pairwise, and Score-based methods; the NDCG@10 axis runs from 0.65 to 0.85]
Image ranking

- Setup [Grangier and Bengio, 2008]
  - Take the most common image search queries on google.com
  - Train an independent ranker based on aggregated preference statistics for each query
  - Compare with standard, disaggregated image-ranking approaches
Image ranking experiments
Highly ranked items from Corel Image Database for query tree car:
Aggregated
SVM
PLSA
Conclusions

1. Partial preference data is abundant and (more) reliable
2. General theory of ranking consistency: when is
   argmin_f E[ϕ(f(Q), s)] ⊆ argmin_f E[L(f(Q), s)]?
   - Tractable consistency is difficult with partial preference data
   - Possible with complete preference data
3. Aggregation can bridge the gap
   - Can transform pairwise preferences/click data into scores s
4. Practical consistent procedures via U-statistic aggregation
   - Allows for arbitrary aggregation s
   - High-probability convergence of the learned ranking function
Future work

- Empirical directions
  - Apply to more ranking problems!
  - Which aggregation procedures perform best?
  - How much aggregation is enough?
- Statistical questions: beyond consistency
  - How does aggregation impact the rate of convergence?
  - Can we design statistically efficient ranking procedures?
- Other ways of dealing with realistic partial preference data?
References

P. L. Bartlett, M. I. Jordan, and J. McAuliffe. Convexity, classification, and risk bounds. Journal of the American Statistical Association, 101:138–156, 2006.

D. Buffoni, C. Calauzenes, P. Gallinari, and N. Usunier. Learning scoring functions with order-preserving losses and standardized supervision. In Proceedings of the 28th International Conference on Machine Learning, 2011.

O. Chapelle, D. Metzler, Y. Zhang, and P. Grinspan. Expected reciprocal rank for graded relevance. In Conference on Information and Knowledge Management, 2009.

N. Craswell, O. Zoeter, M. J. Taylor, and B. Ramsey. An experimental comparison of click position-bias models. In Web Search and Data Mining (WSDM), pages 87–94, 2008.

O. Dekel, C. Manning, and Y. Singer. Log-linear models for label ranking. In Advances in Neural Information Processing Systems 16, 2004.

J. C. Duchi, L. Mackey, and M. I. Jordan. The asymptotics of ranking algorithms. Annals of Statistics, 2013.

Y. Freund, R. Iyer, R. E. Schapire, and Y. Singer. Efficient boosting algorithms for combining preferences. Journal of Machine Learning Research, 4:933–969, 2003.

R. Herbrich, T. Graepel, and K. Obermayer. Large margin rank boundaries for ordinal regression. In Advances in Large Margin Classifiers. MIT Press, 2000.

G. Miller. The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63:81–97, 1956.

P. Ravikumar, A. Tewari, and E. Yang. On NDCG consistency of listwise ranking methods. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics, 2011.

R. Shiffrin and R. Nosofsky. Seven plus or minus two: A commentary on capacity limitations. Psychological Review, 101(2):357–361, 1994.

N. Stewart, G. Brown, and N. Chater. Absolute identification by relative judgment. Psychological Review, 112(4):881–911, 2005.
What is the problem?

Surrogate loss ϕ(α, s) = Σ_{ij} s_ij φ(α_i − α_j)

[Figure: three preference graphs over items 1, 2, 3 — one with edges s12, s23, s13 (probability p(s) = .5), one with the single edge s31 (probability p(s′) = .5), and their aggregate with all four edges]

  Σ_s p(s) ϕ(α, s) = (1/2) ϕ(α, s) + (1/2) ϕ(α, s′)
                   ∝ s12 φ(α1 − α2) + s13 φ(α1 − α3) + s23 φ(α2 − α3) + s31 φ(α3 − α1)
What is the problem?

  s12 φ(α1 − α2) + s13 φ(α1 − α3) + s23 φ(α2 − α3) + s31 φ(α3 − α1)

You get more bang for your $$ by driving φ toward 0 from the left, which pulls α1 down. Result:

  α* = argmin_α Σ_{ij} s_ij φ(α_i − α_j)

can have α*_2 > α*_1, even if s13 − s31 > s12 + s23.