Application of Ensemble Models in Web Ranking
Homa B. Hashemi, Nasser Yazdani, Azadeh Shakery, Mahdi Pakdaman Naeini
School of Electrical and Computer Engineering, University of Tehran
Information Explosion
Web Challenges
  Huge size of information
    25 billion pages
  Proliferation and dynamic nature
    Creation of new pages
    New links are created at a rate of 25% per week
  Heterogeneous contents
    HTML/Text/Audio/…
Search Engine as A Tool
http://seo-related.com/
Inside Search Engine
  Crawling
  Indexing
  Ranking
Ranking Approaches
  Content-based (query dependent)
    TF, IDF
    BM25
    Classical IR
    …
  Connectivity-based (web)
    PageRank
    HITS
    …
Our General Framework
[Diagram: Query → Retrieval Model → List 1, List 2, …, List N → Ensemble Model → Final List]
Simple Ensemble Models
  Sum rule
    Add the (normalized) values of the different methods
  Product rule
    Multiply the (normalized) values of the different methods
  Borda rule
    Combination of rankings
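The three simple rules can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: min-max normalization, the Borda point scheme (n points for first place down to 1 for last), the tie-breaking by document id, and the example scores are all assumptions made for the sketch.

```python
from collections import defaultdict

def min_max_normalize(scores):
    """Rescale a {doc: score} map to [0, 1] (assumed normalization)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def sum_rule(score_maps):
    """Add the normalized scores that each method gives to a document."""
    total = defaultdict(float)
    for scores in score_maps:
        for doc, s in min_max_normalize(scores).items():
            total[doc] += s
    return dict(total)

def product_rule(score_maps):
    """Multiply the normalized scores; assumes every map covers the same documents."""
    result = {doc: 1.0 for doc in score_maps[0]}
    for scores in score_maps:
        for doc, s in min_max_normalize(scores).items():
            result[doc] *= s
    return result

def borda_rule(rankings):
    """Combine rankings: a list of n documents gives n points to its first entry, 1 to its last."""
    points = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, doc in enumerate(ranking):
            points[doc] += n - position
    return dict(points)

def rank_by(scores):
    """Sort documents by descending score, breaking ties by document id."""
    return sorted(scores, key=lambda doc: (-scores[doc], doc))

# Hypothetical raw scores from two methods for three documents.
bm25 = {"D1": 12.0, "D2": 8.0, "D3": 4.0}
pagerank = {"D1": 1.0, "D2": 3.0, "D3": 2.0}

sum_ranking = rank_by(sum_rule([bm25, pagerank]))
product_ranking = rank_by(product_rule([bm25, pagerank]))
borda_ranking = rank_by(borda_rule([rank_by(bm25), rank_by(pagerank)]))
print(sum_ranking, product_ranking, borda_ranking)
```

With min-max normalization the lowest-scoring document of any method gets 0, so the product rule zeroes it out entirely; a different normalization would soften that effect.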
Complicated Ensemble Models
  OWA (Ordered Weighted Averaging)
  Click-Through Data
  SVM
    Use the distance from the discriminating hyperplane as the measure of a page's relevance to a specific query
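For the SVM item, the relevance score of a page is its signed distance from the separating hyperplane. The sketch below shows only that scoring step for a linear model; the weight vector, bias, and feature values are hypothetical stand-ins for a trained model, and a kernel SVM would use its decision function instead.

```python
import math

def hyperplane_distance(weights, bias, features):
    """Signed distance of a feature vector from the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(w * w for w in weights))
    activation = sum(w * x for w, x in zip(weights, features)) + bias
    return activation / norm

# Hypothetical linear model over two features (content score, link score).
weights = [3.0, 4.0]
bias = -2.5

# Hypothetical feature vectors for three candidate pages of one query.
pages = {
    "D1": [1.0, 0.0],
    "D2": [0.5, 1.0],
    "D3": [0.0, 0.5],
}

distances = {doc: hyperplane_distance(weights, bias, x) for doc, x in pages.items()}
# Pages further on the positive side of the hyperplane are ranked first.
svm_ranking = sorted(distances, key=lambda doc: -distances[doc])
print(svm_ranking)
```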
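OWA operator
  $$F(a_1, a_2, \ldots, a_n) = \sum_{j=1}^{n} w_j b_j$$
  where b_j is the j-th largest of the values a_1, …, a_n and w_j are the weights of the ordered vector
  [Weight formula for w_1, …, w_n: not recoverable from the transcript; the value 0.3 appears in it]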
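A minimal sketch of the OWA operator as defined above: the input values are sorted in descending order and the weights are applied by position, not by source. The example weights are hypothetical and are not the weights used in the slides.

```python
def owa(values, weights):
    """Ordered Weighted Averaging: weight the j-th largest value by weights[j]."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    ordered = sorted(values, reverse=True)  # b_1 >= b_2 >= ... >= b_n
    return sum(w * b for w, b in zip(weights, ordered))

# Hypothetical normalized scores that three ranking methods give to one page.
scores = [0.2, 0.9, 0.5]

combined = owa(scores, [0.5, 0.3, 0.2])        # emphasizes the highest scores
as_max = owa(scores, [1.0, 0.0, 0.0])          # all weight on the largest value
as_min = owa(scores, [0.0, 0.0, 1.0])          # all weight on the smallest value
as_mean = owa(scores, [1 / 3, 1 / 3, 1 / 3])   # equal weights give the plain average
print(combined, as_max, as_min, as_mean)
```

The special cases show why the weight vector matters: shifting weight toward the first positions moves the operator toward max, and toward the last positions moves it toward min.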
Simulated Click-Through Data
  How can we use user behavior?
    80% of user clicks are related to the query
    Click-through data
Simulated Click-Through Data (example)
L(a)
1. D1
2. D3
3. D2
4. D4
5. D5
6. D6
L(b)
1. D1
2. D4
3. D7
4. D9
5. D2
6. D8
Interleaved results L(a,b)
1. D1
2. D4
3. D3
4. D7
5. D2
6. D9
7. D5
8. D8
9. D6
Interleaved results L(a,b)
1. D1 (First)
2. D4
3. D3
4. D7
5. D2 (Second)
6. D9
7. D5 (Third)
8. D8
9. D6
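The interleaved list in this example can be reproduced by alternating between the two rankings and skipping documents that are already present. The sketch below uses one such alternating scheme (often called balanced interleaving) with L(b) contributing first; that this is the exact scheme behind the slides is an assumption, as is reading First, Second, and Third as the order of user clicks.

```python
def balanced_interleave(list_a, list_b, a_first=True):
    """Merge two rankings by alternating between them and skipping duplicates."""
    merged, seen = [], set()
    ka = kb = 0  # number of entries consumed from each list
    while ka < len(list_a) or kb < len(list_b):
        # Take from the list that has contributed fewer entries so far.
        take_a = ka < kb or (ka == kb and a_first)
        if ka >= len(list_a):
            take_a = False
        elif kb >= len(list_b):
            take_a = True
        if take_a:
            doc = list_a[ka]
            ka += 1
        else:
            doc = list_b[kb]
            kb += 1
        if doc not in seen:
            seen.add(doc)
            merged.append(doc)
    return merged

list_a = ["D1", "D3", "D2", "D4", "D5", "D6"]
list_b = ["D1", "D4", "D7", "D9", "D2", "D8"]

interleaved = balanced_interleave(list_a, list_b, a_first=False)
print(interleaved)
# ['D1', 'D4', 'D3', 'D7', 'D2', 'D9', 'D5', 'D8', 'D6']

# Positions (1-based) of the documents marked First, Second, Third.
clicked = ["D1", "D2", "D5"]
click_positions = {doc: interleaved.index(doc) + 1 for doc in clicked}
print(click_positions)
# {'D1': 1, 'D2': 5, 'D5': 7}
```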
Experimental Datasets
  LETOR benchmark (English)
    Microsoft Research Asia, 2007
  DotIR benchmark (Persian)
    Iran Telecommunication Research Center (ITRC), 2009
LETOR Benchmark – p@k
[Figure: P@k on LETOR, two panels. Panel 1: TF-IDF, BM25, HITS, PageRank, Borda. Panel 2: product, Normal_sum, Sum, Weighted_Sum, SVM_Linear, SVM_RBF, OWA, SimClick.]
LETOR Benchmark – MAP
[Figure: MAP on LETOR for TF-IDF, BM25, HITS, PageRank, Borda, product, Normal_sum, Sum, Weighted_Sum, Weighted_Normal_Sum, SVM_Linear, SVM_RBF, OWA, SimClick.]
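The two metrics used in these results, P@k and MAP, can be computed as in the sketch below. It uses the standard definitions with binary relevance judgments; the rankings and relevant sets are hypothetical and do not come from LETOR or DotIR.

```python
def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k

def average_precision(ranking, relevant):
    """Mean of the precision values at each rank where a relevant document appears."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)

def mean_average_precision(runs):
    """Average of the per-query average precision over (ranking, relevant) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Hypothetical results for two queries.
query_1 = (["D1", "D4", "D3", "D7", "D2"], {"D1", "D2"})
query_2 = (["D9", "D5"], {"D5"})

p_at_1 = precision_at_k(*query_1, k=1)
p_at_5 = precision_at_k(*query_1, k=5)
map_score = mean_average_precision([query_1, query_2])
print(p_at_1, p_at_5, map_score)
```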
DotIR Benchmark – p@k
[Figure: P@k on DotIR, two panels. Panel 1: TF-IDF, BM25, HITS, PageRank, Borda. Panel 2: product, Normal_sum, Sum, SVM_RBF, OWA, SimClick.]
DotIR Benchmark – MAP
[Figure: MAP on DotIR for TF-IDF, BM25, HITS, PageRank, Borda, product, Normal_sum, Sum, Weighted_Sum, Weighted_Normal_Sum, SVM_Linear, SVM_RBF, OWA, SimClick.]
Summary
  Motivation:
    Important role of ranking algorithms
    Low precision of content and connectivity algorithms
  Solution:
    Use different ensemble models to combine ranking algorithms based on learning
  Results:
    The LETOR benchmark has been used for evaluation
    More research is needed on the newly built DotIR collection
LABS
Reference
  Ali Mohammad Zareh Bidoki, Pedram Ghodsnia, Nasser Yazdani, "A3CRank: An Adaptive Ranking method based on Connectivity, Content and Click-through data", Information Processing and Management, 2010.
  Ali Mohammad Zareh Bidoki, "Combination of Documents Features Based on Simulated Click-through Data", ECIR 2009.
Thank You
Any Questions?