Learning to Rank: From Heuristics to Theoretic Approaches. Guest Lecture by Hongning Wang (wang296@illinois.edu)
TRANSCRIPT
Congratulations
• Job offer
  – Design the ranking module for Bing.com
How should I rank documents?
Answer: Rank by relevance!
Relevance ?!
The Notion of Relevance

• Similarity between representations: (Rep(q), Rep(d))
  – Different representations & similarity measures
    • Vector space model (Salton et al., 75)
    • Prob. distr. model (Wong & Yao, 89)
    • …
• Probability of relevance: P(r=1|q,d), r ∈ {0,1}
  – Generative models
    • Document generation: classical prob. model (Robertson & Sparck Jones, 76)
    • Query generation: LM approach (Ponte & Croft, 98; Lafferty & Zhai, 01a)
  – Regression model (Fuhr, 89)
• Probabilistic inference: P(d→q) or P(q→d)
  – Different inference systems
    • Prob. concept space model (Wong & Yao, 95)
    • Inference network model (Turtle & Croft, 91)
• Other approaches
  – Div. from randomness (Amati & van Rijsbergen, 02)
  – Learning to rank (Joachims, 02; Burges et al., 05)
  – Relevance constraints (Fang et al., 04)

Lecture notes from CS598CXZ
Relevance Estimation
• Query matching
  – Language model
  – BM25
  – Vector space cosine similarity
• Document importance
  – PageRank
  – HITS
Did I do a good job of ranking documents?
• IR evaluation metrics
  – Precision
  – MAP
  – NDCG
Take advantage of different relevance estimators?

• Ensemble the cues
  – Linear?
    • α₁×BM25 + α₂×LM + α₃×PageRank + α₄×HITS
    • {α₁=0.1, α₂=0.1, α₃=0.5, α₄=0.2} → {MAP=0.18, NDCG=0.7}
    • {α₁=0.4, α₂=0.2, α₃=0.1, α₄=0.1} → {MAP=0.20, NDCG=0.6}
  – Non-linear? Decision-tree-like:
    • BM25 > 0.5?
      – True → LM > 0.1? (True → r = 1.0, False → r = 0.7)
      – False → PageRank > 0.3? (True → r = 0.4, False → r = 0.1)
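The linear ensemble can be sketched in a few lines of Python. This is a minimal illustration, not a production ranker; the feature values and weights below are hypothetical, echoing the α settings above:

```python
# Minimal sketch of a linear ensemble of relevance cues.
def linear_score(features, alphas):
    """Score = sum_i alpha_i * feature_i."""
    return sum(a * f for a, f in zip(alphas, features))

# Each document: (BM25, LM, PageRank, HITS) -- hypothetical values.
docs = {
    "d1": (1.6, 1.1, 0.9, 0.3),
    "d2": (2.7, 1.9, 0.2, 0.5),
}
alphas = (0.1, 0.1, 0.5, 0.2)  # one candidate weight setting

# Rank documents by descending ensemble score.
ranking = sorted(docs, key=lambda d: linear_score(docs[d], alphas), reverse=True)
```

A different α setting can reorder the documents and hence change MAP/NDCG, which is exactly why these weights need tuning.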
What if we have thousands of features?
• Is there any way I can do better?
  – Optimizing the metrics automatically!

How to determine those α's? Where to find those tree structures?
Rethink the task
• Given: (query, document) pairs represented by a set of relevance estimators, a.k.a. features
• Needed: a way of combining the estimators that produces an ordering
• Criterion: optimize IR metrics (P@k, MAP, NDCG, etc.)

DocID  BM25  LM   PageRank  Label
0001   1.6   1.1  0.9       0
0002   2.7   1.9  0.2       1

Key!
Machine Learning
• Input: {(X_i, Y_i)}_{i=1}^M, where X_i is a feature vector and Y_i its label
• Objective function: O(Y′, Y)
• Output: f: X → Y, such that f optimizes the objective on the training data

Classification: O(Y′, Y) = δ(Y′ = Y)
http://en.wikipedia.org/wiki/Statistical_classification

Regression: O(Y′, Y) = −‖Y′ − Y‖
http://en.wikipedia.org/wiki/Regression_analysis

NOTE: We will only talk about supervised learning.
Learning to Rank
• General solution in an optimization framework
  – Input: (query, document) pairs with their ranking features and relevance labels
  – Objective: O = {P@k, MAP, NDCG}
  – Output: a scoring function f(q, d), s.t. ranking by f optimizes O

DocID  BM25  LM   PageRank  Label
0001   1.6   1.1  0.9       0
0002   2.7   1.9  0.2       1
Challenge: how to optimize?
• Evaluation metric recap
  – Average Precision: AP = (1/#relevant) Σ_k P@k × rel(k)
  – DCG: DCG = Σ_i (2^{rel_i} − 1) / log₂(i + 1)
• Order is essential!
  – order → metric

Not continuous with respect to f(X)!
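Both metrics are easy to state in code. A minimal sketch using their standard definitions (DCG here uses the (2^rel − 1)/log₂(rank + 1) gain common in web search):

```python
import math

def average_precision(rels):
    """AP over a binary relevance list given in ranked order."""
    hits, total = 0, 0.0
    for k, r in enumerate(rels, start=1):
        if r:
            hits += 1
            total += hits / k          # precision at each relevant position
    return total / max(1, sum(rels))

def dcg(gains):
    """DCG with gain (2^rel - 1) discounted by log2(rank + 1)."""
    return sum((2 ** g - 1) / math.log2(i + 1)
               for i, g in enumerate(gains, start=1))
```

Note that both functions depend only on the induced order: an infinitesimal change in a score that swaps two documents changes the metric in a jump, which is the discontinuity the slide points out.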
Approximating the Objective!
• Pointwise– Fit the relevance labels individually
• Pairwise– Fit the relative orders
• Listwise– Fit the whole order
Pointwise Learning to Rank
• Ideally, perfect relevance prediction leads to perfect ranking
  – score → order → metric
• Reducing the ranking problem to
  – Regression
    • Subset Ranking using Regression, D. Cossock and T. Zhang, COLT 2006
  – (Multi-)classification
    • Ranking with Large Margin Principles, A. Shashua and A. Levin, NIPS 2002
Subset Ranking using Regression
D. Cossock and T. Zhang, COLT 2006

• Fit relevance labels via regression
  – A weighted regression loss, with a weight on each document
  – Emphasize the relevant documents more: the most positive documents get the largest weights

http://en.wikipedia.org/wiki/Regression_analysis
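The idea can be sketched as weighted least squares on a single feature, with larger weights on the relevant documents. The closed-form solution below is the standard one; the weighting values are hypothetical and this is not the paper's exact estimator:

```python
def weighted_least_squares(xs, ys, ws):
    """Fit y ~ a*x + b minimizing sum_i w_i * (a*x_i + b - y_i)^2 (closed form)."""
    sw = sum(ws)
    sx = sum(w * x for w, x in zip(ws, xs))
    sy = sum(w * y for w, y in zip(ws, ys))
    sxx = sum(w * x * x for w, x in zip(ws, xs))
    sxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    a = (sw * sxy - sx * sy) / (sw * sxx - sx * sx)
    b = (sy - a * sx) / sw
    return a, b

# BM25-like feature vs. graded relevance label (toy numbers);
# the most relevant document gets the largest weight.
xs = [0.0, 1.0, 2.0]
ys = [1.0, 3.0, 5.0]
ws = [1.0, 1.0, 3.0]
a, b = weighted_least_squares(xs, ys, ws)
```

Ranking then simply sorts documents by the predicted score a·x + b.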
Ranking with Large Margin PrinciplesA. Shashua and A. Levin, NIPS 2002
• Goal: correctly place each document in its corresponding category (e.g., Y = 0, 1, 2) and maximize the margin
  – Category boundaries are thresholds α_s on the score
  – Reduce the violations
Ranking with Large Margin Principles
A. Shashua and A. Levin, NIPS 2002

• Alternative: maximize the sum of margins between adjacent categories (Y = 0, 1, 2), again with thresholds α_s
Ranking with Large Margin Principles A. Shashua and A. Levin, NIPS 2002
• Ranking loss decreases consistently with more training data
What did we learn
• Machine learning helps!
  – Derive something optimizable
  – More efficient and guided than hand tuning
There is always a catch
• Cannot directly optimize IR metrics
  – E.g., regression loss prefers (0→1, 2→0) over (0→−2, 2→4), although the latter preserves the correct order
• Positions of documents are ignored
  – The penalty on documents at higher positions should be larger
• Favors the queries with more documents
Pairwise Learning to Rank
• Ideally, a perfect partial order leads to perfect ranking
  – partial order → order → metric
• Ordinal regression
• The relative ordering between different documents is what matters
  – E.g., (0→−2, 2→4) is better than (0→1, 2→0)
• Large body of work
Optimizing Search Engines using Clickthrough DataThorsten Joachims, KDD’02
• Minimizing the number of mis-ordered pairs
  – Scoring function: linear combination of features, f(q, d) = w^T X_{q,d}
  – For a pair with y₁ > y₂, keep the relative order: f(q, d₁) > f(q, d₂)
  – RankingSVM
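The pairwise objective can be sketched as a hinge loss over preference pairs. This is a simplified, unregularized version of the RankingSVM loss; the weight vector and feature vectors below are hypothetical:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pairwise_hinge_loss(w, pairs):
    """Sum of max(0, 1 - w . (x_pref - x_other)) over preference pairs.

    Each pair (x_pref, x_other) says x_pref should outrank x_other;
    the loss is zero only when w separates them with margin >= 1.
    """
    loss = 0.0
    for x_pref, x_other in pairs:
        margin = dot(w, [p - o for p, o in zip(x_pref, x_other)])
        loss += max(0.0, 1.0 - margin)
    return loss

w = [1.0, 0.0]  # hypothetical learned weights
pairs = [([2.0, 0.0], [0.0, 0.0]),   # correctly ordered with margin 2: no loss
         ([0.0, 1.0], [1.0, 0.0])]   # mis-ordered under w: positive loss
```

RankingSVM minimizes this loss (plus an L2 regularizer on w) over all labeled preference pairs.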
Optimizing Search Engines using Clickthrough DataThorsten Joachims, KDD’02
• How to use it?
  – score → order: sort documents by the learned scoring function
Optimizing Search Engines using Clickthrough DataThorsten Joachims, KDD’02
• What did it learn from the data?
  – Linear correlations
    • Positively correlated features
    • Negatively correlated features
Optimizing Search Engines using Clickthrough Data
Thorsten Joachims, KDD'02

• How good is it?
  – Test on a real system
An Efficient Boosting Algorithm for Combining Preferences
Y. Freund, R. Iyer, et al. JMLR 2003
• Smooth the loss on mis-ordered pairs
  – Surrogates for the 0/1 loss: exponential loss (RankBoost), hinge loss (RankingSVM), square loss

(figure from Pattern Recognition and Machine Learning, p. 337)
An Efficient Boosting Algorithm for Combining Preferences
Y. Freund, R. Iyer, et al., JMLR 2003

• RankBoost: optimize via boosting
  – Vote by a committee of weak rankers (e.g., BM25, PageRank, cosine)
  – Iteratively update the credibility (weight) of each committee member, i.e., each ranking feature

(figure from Pattern Recognition and Machine Learning, p. 658)
An Efficient Boosting Algorithm for Combining Preferences
Y. Freund, R. Iyer, et al., JMLR 2003

• How good is it?
A Regression Framework for Learning Ranking Functions Using Relative Relevance Judgments
Zheng et al., SIGIR'07

• Non-linear ensemble of features
  – Gradient descent boosting trees optimize the objective
• Boosting trees
  – Use regression trees to minimize the residuals
  – Example tree: BM25 > 0.5?
    • True → LM > 0.1? (True → r = 1.0, False → r = 0.7)
    • False → PageRank > 0.3? (True → r = 0.4, False → r = 0.1)
A Regression Framework for Learning Ranking Functions Using Relative Relevance Judgments
Zheng et al., SIGIR'07

• Non-linear vs. linear
  – Comparison with RankingSVM
Where do we get the relative orders?

• Human annotations
  – Small scale, expensive to acquire
• Clickthroughs
  – Large amounts, easy to acquire
Accurately Interpreting Clickthrough Data as Implicit Feedback
Thorsten Joachims, et al., SIGIR'05

• Position bias
  – Your click may be due not to relevance, but to position!
Accurately Interpreting Clickthrough Data as Implicit Feedback
Thorsten Joachims, et al., SIGIR'05

• Controlled experiment
  – Users over-trust the top-ranked positions
Accurately Interpreting Clickthrough Data as Implicit Feedback
Thorsten Joachims, et al., SIGIR'05

• Pairwise preferences matter
  – Click: examined and clicked document
  – Skip: examined but non-clicked document
  – Click > Skip
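This "Click > Skip" idea can be sketched directly. The sketch below is a simplified reading of the paper's "Click > Skip Above" heuristic: a click implies the user examined (and skipped) everything ranked above it:

```python
def click_skip_pairs(ranking, clicked):
    """'Click > Skip Above': a clicked document is preferred over every
    document ranked above it that was examined but not clicked."""
    pairs = []
    for i, doc in enumerate(ranking):
        if doc in clicked:
            for skipped in ranking[:i]:
                if skipped not in clicked:
                    pairs.append((doc, skipped))  # doc preferred over skipped
    return pairs

# Results page a, b, c, d (top to bottom); the user clicked only c.
prefs = click_skip_pairs(["a", "b", "c", "d"], {"c"})
```

The extracted pairs feed straight into a pairwise learner such as RankingSVM, which is how clickthrough logs become training data.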
What did we learn
• Predicting relative order
  – Gets closer to the nature of ranking
• Promising performance in practice
  – Pairwise preferences from clickthroughs
Listwise Learning to Rank
• Can we directly optimize the ranking?
  – order → metric
• Tackling the challenge
  – Optimization without a gradient
From RankNet to LambdaRank to LambdaMART: An Overview
Christopher J.C. Burges, 2010
• Does minimizing mis-ordered pairs maximize IR metrics?
  – Example: a ranking with 6 mis-ordered pairs vs. one with 4; fewer mis-ordered pairs does not necessarily mean higher AP or DCG
  – Position is crucial!
From RankNet to LambdaRank to LambdaMART: An Overview
Christopher J.C. Burges, 2010
• Weight the mis-ordered pairs?
  – Some pairs are more important to place in the right order
  – Inject into the objective function, or inject directly into the gradient
    • The gradient of the approximated objective (e.g., exponential loss on mis-ordered pairs) is scaled by the change in the original objective (e.g., NDCG) if documents i and j were swapped, leaving the other documents unchanged
    • This weight depends on the ranks of documents i and j in the whole list
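A sketch of how that pairwise gradient gets scaled. The |ΔNDCG| factor follows the standard LambdaRank formulation; exact sign conventions vary across presentations, so treat this as illustrative:

```python
import math

def gain(rel, pos):
    """DCG contribution of a document with grade `rel` at 1-based position `pos`."""
    return (2 ** rel - 1) / math.log2(pos + 1)

def delta_ndcg(rels, i, j, ideal_dcg):
    """|Change in NDCG| if the documents at 0-based ranks i and j swap,
    all other documents unchanged."""
    before = gain(rels[i], i + 1) + gain(rels[j], j + 1)
    after = gain(rels[i], j + 1) + gain(rels[j], i + 1)
    return abs(after - before) / ideal_dcg

def lambda_weight(s_i, s_j, rels, i, j, ideal_dcg):
    """RankNet-style pairwise gradient magnitude scaled by |delta NDCG|."""
    rank_net = 1.0 / (1.0 + math.exp(s_i - s_j))  # sigmoid of score difference
    return rank_net * delta_ndcg(rels, i, j, ideal_dcg)
```

Pairs whose swap would move a highly relevant document near the top get a large |ΔNDCG| and therefore a large gradient, which is how position enters the learning signal.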
From RankNet to LambdaRank to LambdaMART: An Overview
Christopher J.C. Burges, 2010

• Lambda functions
  – Are they gradients?
    • Yes: they meet the sufficient and necessary conditions for being partial derivatives
  – Do they lead to the optimal solution of the original problem?
    • Empirically, yes
From RankNet to LambdaRank to LambdaMART: An Overview
Christopher J.C. Burges, 2010

• Evolution
  – RankNet: objective = cross entropy over the pairs; gradient = gradient of cross entropy; optimized with a neural network via stochastic gradient descent
  – LambdaRank: objective unknown; gradient (λ function) = gradient of cross entropy times the pairwise change in the target metric; optimized solely through the gradient
  – LambdaMART: objective unknown; same λ gradient as LambdaRank; optimized with Multiple Additive Regression Trees (MART), a non-linear combination, as we discussed for RankBoost
From RankNet to LambdaRank to LambdaMART: An Overview
Christopher J.C. Burges, 2010

• A Lambda tree
  – Splitting on combinations of features
AdaRank: a boosting algorithm for information retrieval
Jun Xu & Hang Li, SIGIR'07

• Loss defined directly by IR metrics
  – Target metrics: MAP, NDCG, MRR
  – Optimized by boosting: update the credibility (weight) of each committee member, i.e., each ranking feature (e.g., BM25, PageRank, cosine)

(figure from Pattern Recognition and Machine Learning, p. 658)
A Support Vector Machine for Optimizing Average Precision
Yisong Yue, et al., SIGIR’07
• RankingSVM
  – Minimizes the pairwise loss: loss defined on the number of mis-ordered document pairs
• SVM-MAP
  – Minimizes the structural loss: loss defined on the quality of the whole list of ordered documents, i.e., the MAP difference
A Support Vector Machine for Optimizing Average Precision
Yisong Yue, et al., SIGIR'07

• Max-margin principle
  – Push the ground truth far away from any mistakes you might make
  – Finding the most violated constraints
• Finding the most violated constraints
  – MAP is invariant to permutations among the relevant (or among the irrelevant) documents
  – Maximize MAP over a series of swaps between relevant and irrelevant documents
    • Greedy solution for the right-hand side of the constraints: start from the reverse order of the ideal ranking

A Support Vector Machine for Optimizing Average Precision
Yisong Yue, et al., SIGIR'07
A Support Vector Machine for Optimizing Average Precision
Yisong Yue, et al., SIGIR'07

• Experiment results
Other listwise solutions
• Soften the metrics to make them differentiable
  – Michael Taylor et al., SoftRank: optimizing non-smooth rank metrics, WSDM'08
• Minimize a loss function defined on permutations
  – Zhe Cao et al., Learning to rank: from pairwise approach to listwise approach, ICML'07
What did we learn
• Taking a list of documents as a whole
  – Positions are visible to the learning algorithm
  – Directly optimizing the target metric
• Limitation
  – The search space is huge!
Summary
• Learning to rank
  – Automatic combination of ranking features for optimizing IR evaluation metrics
• Approaches
  – Pointwise: fit the relevance labels individually
  – Pairwise: fit the relative orders
  – Listwise: fit the whole order
Experimental Comparisons
• Ranking performance
Experimental Comparisons
• Winning counts
  – Over seven different data sets
Experimental Comparisons
• My experiments
  – 1.2K queries, 45.5K documents with 1890 features
  – 800 queries for training, 400 queries for testing

            MAP     P@1     ERR     MRR     NDCG@5
ListNET     0.2863  0.2074  0.1661  0.3714  0.2949
LambdaMART  0.4644  0.4630  0.2654  0.6105  0.5236
RankNET     0.3005  0.2222  0.1873  0.3816  0.3386
RankBoost   0.4548  0.4370  0.2463  0.5829  0.4866
RankingSVM  0.3507  0.2370  0.1895  0.4154  0.3585
AdaRank     0.4321  0.4111  0.2307  0.5482  0.4421
pLogistic   0.4519  0.3926  0.2489  0.5535  0.4945
Logistic    0.4348  0.3778  0.2410  0.5526  0.4762
Analysis of the Approaches
• What are they really optimizing?
  – Their relation with IR metrics: there is a gap between the surrogate objectives and the metrics
Pointwise Approaches
• Regression based
  – The DCG gap is bounded by the regression loss weighted by the discount coefficients in DCG
• Classification based
  – The DCG gap is bounded by the classification loss weighted by the discount coefficients in DCG
Listwise Approaches
• No general analysis
  – Method dependent
  – Varies in directness and consistency
Connection with Traditional IR
• People foresaw this topic a long time ago
  – It fits nicely into the risk minimization framework
Applying Bayesian Decision Theory

• Choose among rankings: (D₁, π₁), (D₂, π₂), …, (Dₙ, πₙ)
• Observed: user U, query q, source S, doc set C; hidden: θ; loss L
• Risk minimization: pick the choice with the minimum Bayes risk

  (D*, π*) = argmin_{(D,π)} ∫ L(D, π, θ) p(θ | U, q, S, C) dθ

  – L: the metric to be optimized
  – p(θ | U, q, S, C): inferred from the available ranking features
Traditional Solution
• Set-based models (choose D)
  – Boolean model
• Ranking models (choose π)
  – Independent loss (pointwise in spirit)
    • Relevance-based loss: probabilistic relevance model, generative relevance theory, vector-space model, two-stage LM
    • Distance-based loss: KL-divergence model
  – Dependent loss (pairwise/listwise in spirit)
    • MMR loss: subtopic retrieval model
    • MDR loss
• All unsupervised!
Traditional Notion of Relevance

• Similarity between representations: (Rep(q), Rep(d))
  – Different representations & similarity measures
    • Vector space model (Salton et al., 75)
    • Prob. distr. model (Wong & Yao, 89)
    • …
• Probability of relevance: P(r=1|q,d), r ∈ {0,1}
  – Generative models
    • Document generation: classical prob. model (Robertson & Sparck Jones, 76)
    • Query generation: LM approach (Ponte & Croft, 98; Lafferty & Zhai, 01a)
  – Regression model (Fuhr, 89)
• Probabilistic inference: P(d→q) or P(q→d)
  – Different inference systems
    • Prob. concept space model (Wong & Yao, 95)
    • Inference network model (Turtle & Croft, 91)
• Other approaches
  – Div. from randomness (Amati & van Rijsbergen, 02)
  – Learning to rank (Joachims, 02; Burges et al., 05)
  – Relevance constraints (Fang et al., 04)
Broader Notion of Relevance
• Traditional view: content-driven
  – Vector space model
  – Probability relevance model
  – Language model
  – Query-document specific; unsupervised
• Modern view: anything related to the quality of the document
  – Clicks/views
  – Link structure
  – Visual structure
  – Social network
  – …
  – Query, document, and query-document specific; supervised
Broader Notion of Relevance
(Diagrams: query-document graphs with progressively richer evidence. First, query and documents connected only by BM25, language model, and cosine; then adding linkage structure, query relations, and likes; finally adding clicks/views, visual structure, and social network.)
Future
• Tighter bounds
• Faster solutions
• Larger scale
• Wider application scenarios
Resources

• Books
  – Liu, Tie-Yan. Learning to Rank for Information Retrieval. Vol. 13. Springer, 2011.
  – Li, Hang. "Learning to rank for information retrieval and natural language processing." Synthesis Lectures on Human Language Technologies 4.1 (2011): 1-113.
• Helpful pages
  – http://en.wikipedia.org/wiki/Learning_to_rank
• Packages
  – RankingSVM: http://svmlight.joachims.org/
  – RankLib: http://people.cs.umass.edu/~vdang/ranklib.html
• Data sets
  – LETOR: http://research.microsoft.com/en-us/um/beijing/projects/letor//
  – Yahoo! Learning to Rank Challenge: http://learningtorankchallenge.yahoo.com/
References
• Liu, Tie-Yan. "Learning to rank for information retrieval." Foundations and Trends in Information Retrieval 3.3 (2009): 225-331.
• Cossock, David, and Tong Zhang. "Subset ranking using regression." Learning Theory (2006): 605-619.
• Shashua, Amnon, and Anat Levin. "Ranking with large margin principle: Two approaches." Advances in Neural Information Processing Systems 15 (2003): 937-944.
• Joachims, Thorsten. "Optimizing search engines using clickthrough data." Proceedings of the eighth ACM SIGKDD. ACM, 2002.
• Freund, Yoav, et al. "An efficient boosting algorithm for combining preferences." The Journal of Machine Learning Research 4 (2003): 933-969.
• Zheng, Zhaohui, et al. "A regression framework for learning ranking functions using relative relevance judgments." Proceedings of the 30th annual international ACM SIGIR. ACM, 2007.
References (continued)
• Joachims, Thorsten, et al. "Accurately interpreting clickthrough data as implicit feedback." Proceedings of the 28th annual international ACM SIGIR. ACM, 2005.
• Burges, C. "From RankNet to LambdaRank to LambdaMART: An overview." Learning 11 (2010): 23-581.
• Xu, Jun, and Hang Li. "AdaRank: a boosting algorithm for information retrieval." Proceedings of the 30th annual international ACM SIGIR. ACM, 2007.
• Yue, Yisong, et al. "A support vector method for optimizing average precision." Proceedings of the 30th annual international ACM SIGIR. ACM, 2007.
• Taylor, Michael, et al. "SoftRank: optimizing non-smooth rank metrics." Proceedings of the international conference on WSDM. ACM, 2008.
• Cao, Zhe, et al. "Learning to rank: from pairwise approach to listwise approach." Proceedings of the 24th ICML. ACM, 2007.
Thank you!
• Q&A
Recap of last lecture
• Goal: design the ranking module for Bing.com
Basic Search Engine Architecture
• "The anatomy of a large-scale hypertextual Web search engine"
– Sergey Brin and Lawrence Page. Computer Networks and ISDN Systems 30.1 (1998): 107-117.
Your Job
Learning to Rank
• Given: (query, document) pairs represented by a set of relevance estimators, a.k.a. features
• Needed: a way of combining the estimators into an ordered list of documents
• Criterion: optimize IR metrics (P@k, MAP, NDCG, etc.)
| QueryID | DocID | BM25 | LM  | PageRank | Label |
|---------|-------|------|-----|----------|-------|
| 0001    | 0001  | 1.6  | 1.1 | 0.9      | 0     |
| 0001    | 0002  | 2.7  | 1.9 | 0.2      | 1     |
Key!
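The table above can be read as one feature vector per (query, document) pair. A minimal sketch of the "combining the estimators" step, with made-up weights (in learning to rank, estimating these weights is the actual learning problem):

```python
# Feature vectors [BM25, LM, PageRank] per (query, document) pair,
# mirroring the example table above.
rows = [
    {"doc": "0001", "features": [1.6, 1.1, 0.9], "label": 0},
    {"doc": "0002", "features": [2.7, 1.9, 0.2], "label": 1},
]

# Hypothetical weights; these are what a learning-to-rank method would fit.
w = [0.5, 0.3, 0.2]

def score(features):
    # Linear combination of the relevance estimators.
    return sum(wi * xi for wi, xi in zip(w, features))

ranked = sorted(rows, key=lambda r: score(r["features"]), reverse=True)
print([r["doc"] for r in ranked])  # ['0002', '0001'], agreeing with the labels
```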
Challenge: how to optimize?
• Order is essential! The metric depends only on the induced order: order → metric
• Evaluation metrics are not continuous and not differentiable
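A minimal NDCG sketch makes the difficulty concrete (standard log2 discount; labels and scores are made up). Perturbing a score changes NDCG only when the induced order flips, so the metric is piecewise constant in the model scores and its gradient is zero almost everywhere:

```python
import math

def dcg(rels):
    # Discounted cumulative gain: rel_i / log2(rank + 1), ranks starting at 1.
    return sum(r / math.log2(i + 2) for i, r in enumerate(rels))

def ndcg(scores, labels):
    # Sort documents by model score, then discount their true labels.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return dcg([labels[i] for i in order]) / dcg(sorted(labels, reverse=True))

labels = [2, 0, 1]
a = ndcg([0.9, 0.5, 0.40], labels)  # correct top document, wrong tail
b = ndcg([0.9, 0.5, 0.49], labels)  # third score moved, order unchanged: same NDCG
c = ndcg([0.9, 0.5, 0.51], labels)  # order flips: NDCG jumps to 1.0
```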
Approximating the Objective!
• Pointwise: fit the relevance labels individually
• Pairwise: fit the relative orders between documents
• Listwise: fit the whole order
Pointwise Learning to Rank
• Ideally, perfect relevance prediction leads to perfect ranking: score → order → metric
• Reduces the ranking problem to
– Regression: http://en.wikipedia.org/wiki/Regression_analysis
– Classification: http://en.wikipedia.org/wiki/Statistical_classification
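A minimal pointwise sketch (illustrative numbers, plain least squares rather than any specific published system): fit the graded labels directly as regression targets, then rank by predicted score:

```python
import numpy as np

# Toy (query, document) features [BM25, PageRank] with graded relevance labels.
X = np.array([[1.6, 0.9], [2.7, 0.2], [1.3, 0.2], [1.2, 0.7]])
y = np.array([4.0, 2.0, 1.0, 0.0])

# Pointwise reduction: least-squares regression on individual labels,
# ignoring the relative order between documents.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

scores = X @ w
ranking = np.argsort(-scores)  # documents ordered by predicted relevance
```

Note the reduction never looks at pairs: each document's label is fit in isolation, which is exactly what the deficiencies on the next slide exploit.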
Deficiency
• Cannot directly optimize IR metrics: predicting (0→1, 2→0) has lower regression error than (0→−2, 2→4), yet the former inverts the ranking while the latter preserves it
• Positions of documents are ignored: errors on documents at higher ranks should be penalized more
• Favors queries with more documents
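The first point can be checked numerically. With true labels (0, 2), the prediction (1, 0) achieves a lower squared error than (−2, 4), yet it inverts the ranking while the larger-error prediction preserves it:

```python
def mse(pred, true):
    # Mean squared error, the pointwise regression objective.
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)

true = [0, 2]      # relevance labels of two documents
pred_a = [1, 0]    # MSE 2.5, but ranks the documents in the wrong order
pred_b = [-2, 4]   # MSE 4.0, but preserves the correct order

assert mse(pred_a, true) < mse(pred_b, true)
assert (pred_a[0] > pred_a[1]) != (true[0] > true[1])  # order inverted
assert (pred_b[0] > pred_b[1]) == (true[0] > true[1])  # order preserved
```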
Pairwise Learning to Rank
• Ideally, a perfect partial order leads to a perfect ranking: partial order → order → metric
• Ordinal regression
• Relative ordering between different documents is significant
RankingSVM (Thorsten Joachims, KDD'02)
• Minimizing the number of mis-ordered pairs
– Score with a linear combination of features: f(q, d) = wᵀX_{q,d}
– If y₁ > y₂, keep the relative order: f(q, d₁) > f(q, d₂)
| QueryID | DocID | BM25 | PageRank | Label |
|---------|-------|------|----------|-------|
| 0001    | 0001  | 1.6  | 0.9      | 4     |
| 0001    | 0002  | 2.7  | 0.2      | 2     |
| 0001    | 0003  | 1.3  | 0.2      | 1     |
| 0001    | 0004  | 1.2  | 0.7      | 0     |
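A minimal sketch of the pairwise objective on the data above (subgradient descent on the hinge loss; this is the idea behind RankingSVM, not Joachims' actual SVMlight solver, and the step size and iteration count are arbitrary):

```python
import numpy as np

# Features [BM25, PageRank] and graded labels from the example table.
X = np.array([[1.6, 0.9], [2.7, 0.2], [1.3, 0.2], [1.2, 0.7]])
y = np.array([4, 2, 1, 0])

# Preference pairs (i, j) with y[i] > y[j]; each yields delta = X[i] - X[j],
# and we want the margin w . delta to be positive (ideally >= 1).
pairs = [(i, j) for i in range(len(y)) for j in range(len(y)) if y[i] > y[j]]
deltas = np.array([X[i] - X[j] for i, j in pairs])

w = np.zeros(2)
lr, lam = 0.1, 0.01
for _ in range(200):
    margins = deltas @ w
    # Subgradient of (lam/2)||w||^2 + sum_k max(0, 1 - w . delta_k).
    grad = lam * w - deltas[margins < 1].sum(axis=0)
    w -= lr * grad

mis_ordered = int((deltas @ w <= 0).sum())  # pairs the learned w still inverts
```

This toy data is not perfectly separable in pair space, so a few mis-ordered pairs remain; the slack variables ξ in the constraint absorb exactly these.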
RankingSVM (Thorsten Joachims, KDD'02), continued
• Minimizing the number of mis-ordered pairs
min_w ½‖w‖² + C Σ_{i,j,n} ξ_{i,j,n}
s.t. wᵀΔΦ(q_n, d_i, d_j) ≥ 1 − ξ_{i,j,n}, ξ_{i,j,n} ≥ 0
General Idea of Pairwise Learning to Rank
• For any pair of documents (d_i, d_j) under query q_n with y_i > y_j, penalize the margin wᵀΔΦ(q_n, d_i, d_j) with a surrogate loss:
– 0/1 loss: δ(wᵀΔΦ(q_n, d_i, d_j) < 0)
– hinge loss (RankingSVM): max(0, 1 − wᵀΔΦ(q_n, d_i, d_j))
– square loss (GBRank): max(0, 1 − wᵀΔΦ(q_n, d_i, d_j))²
– exponential loss (RankBoost): exp(−wᵀΔΦ(q_n, d_i, d_j))
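These surrogates can be compared directly as functions of the pairwise margin m = wᵀΔΦ(q_n, d_i, d_j). A small sketch (labeling each loss with the method usually associated with it) checks that every convex surrogate upper-bounds the 0/1 loss, which is what makes the pairwise objectives tractable:

```python
import math

def zero_one(m):     return 1.0 if m < 0 else 0.0  # the loss we actually care about
def hinge(m):        return max(0.0, 1.0 - m)       # RankingSVM
def square_hinge(m): return max(0.0, 1.0 - m) ** 2  # GBRank-style
def exponential(m):  return math.exp(-m)            # RankBoost-style

# Every surrogate sits on or above the 0/1 loss for any margin value.
for m in (-2.0, -0.5, 0.0, 0.5, 1.0, 3.0):
    assert hinge(m) >= zero_one(m)
    assert square_hinge(m) >= zero_one(m)
    assert exponential(m) >= zero_one(m)
```

Minimizing any of these convex upper bounds drives the number of mis-ordered pairs down without ever touching the discontinuous 0/1 loss directly.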