Methods for Learning Classifier Combinations: No Clear Winner
Dmitriy Fradkin, Paul Kantor, DIMACS, Rutgers University
[Diagram: Topic 1, Topic 2, ..., and new topics each receive scores from System 1 and System 2. Local fusion combines the systems separately for each topic, while federated or global fusion combines information across topics; for new topics the appropriate combination rule is an open question ("?").]
Overview
• Discuss local fusion methods
• Describe a new fusion approach for multi-topic problems that we call "federated"
• Compare it empirically to the global approach, previously described in [Bartell et al. 1994]
• Interpret the results
Related Work in IR
• [Bartell et al. 1994] - global fusion of systems
• [Hull et al. 1996] - local fusion methods for document filtering (averaging, linear and logistic regression, grid search)
• [Lam and Lai 2001] used category-specific features to model error rate, and then picked the single best system for a category
• [Bennett et al. 2002] uses "reliability indicators" together with scores as input to a metaclassifier
Combination of Classifiers
Relevance judgment: $y(d,q) \in \{0,1\}$

Decision rule: $C(d,q) \in \{0,1\}$, $C(d,q) = \mathrm{sign}(r(d,q) - \theta_q)$
The problem of fusion can be formulated as the problem of finding a way to combine several decision rules.
Linear Combinations
$$C_F(d,q) = \mathrm{sign}\Big(\sum_{j=1}^{l} \lambda_j x_j(d,q) - \theta_q\Big)$$

where $\lambda$ is an $l$-dimensional vector of weights, $\theta_q$ is a threshold, and $x(d,q)$ is an $l$-dimensional vector of normalized scores given by the systems to document $d$ on topic $q$:

$$x_s(d,q) = \frac{r_s(d,q) - \min_{d'} r_s(d',q)}{\max_{d'} r_s(d',q) - \min_{d'} r_s(d',q)}$$
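A minimal sketch in Python/NumPy of this linear fusion rule under the min-max normalization above; the score matrix, weights, and threshold are illustrative placeholders, not values from the paper.

```python
import numpy as np

def normalize_scores(raw):
    """Min-max normalize each system's scores over the documents of one topic.

    raw: (n_docs, n_systems) array of raw scores r_s(d, q).
    Assumes no system gives identical scores to every document.
    """
    lo = raw.min(axis=0)
    hi = raw.max(axis=0)
    return (raw - lo) / (hi - lo)

def linear_fusion(x, lam, theta):
    """C_F(d, q) = sign(sum_j lam_j * x_j(d, q) - theta), reported as 0/1."""
    return (x @ lam - theta > 0).astype(int)

# Illustrative example: 4 documents scored by 2 systems on one topic.
raw = np.array([[2.0, 0.3],
                [5.0, 0.9],
                [1.0, 0.1],
                [4.0, 0.7]])
x = normalize_scores(raw)
print(linear_fusion(x, lam=np.array([0.5, 0.5]), theta=0.4))  # -> [0 1 0 1]
```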
Input to Local Fusion
Documents $j = 1, \dots, n$; $y_j$ is the relevance judgement for the $j$th document; $x_j$ is the vector of scores for a given document.

         System 1   System 2   ...   System L   Relevance
doc 1    x_11       x_12       ...   x_1l       y_1
doc 2    x_21       x_22       ...   x_2l       y_2
...      ...        ...        ...   ...        ...
doc n    x_n1       x_n2       ...   x_nl       y_n
Local Fusion Methods
A new fusion method:
Centroid: $\lambda = \bar{x}_+ - \bar{x}_-$, the difference between the mean score vectors of the relevant and the non-relevant training documents.

Other methods:
Linear: $\min_{\lambda} \sum_{j=1,\dots,n} (y_j - \lambda \cdot x_j)^2$
Linear 2: $\min_{\lambda} \sum_{j=1,\dots,n} (y_j - \lambda \cdot x_j)^2$
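A minimal sketch of the two fitting strategies as reconstructed above, assuming the centroid rule is the difference of mean score vectors and the linear rule is an ordinary least-squares fit of the relevance labels; the data and variable names are illustrative.

```python
import numpy as np

def centroid_weights(x, y):
    """lambda = mean score vector of relevant docs minus mean of non-relevant docs."""
    return x[y == 1].mean(axis=0) - x[y == 0].mean(axis=0)

def linear_weights(x, y):
    """Least-squares fit: minimize sum_j (y_j - lambda . x_j)^2."""
    lam, *_ = np.linalg.lstsq(x, y.astype(float), rcond=None)
    return lam

# Illustrative per-topic training data: normalized scores from 2 systems.
x = np.array([[0.9, 0.8], [0.7, 0.9], [0.2, 0.1], [0.3, 0.2]])
y = np.array([1, 1, 0, 0])
print(centroid_weights(x, y))  # -> [0.55 0.7 ]
print(linear_weights(x, y))
```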
Local Fusion Methods (cont.)
Logistic: $\min_{\lambda,\theta} -\sum_{j=1,\dots,n} \big(y_j \log p_j + (1-y_j)\log(1-p_j)\big)$

where $p_j = p(y_j = 1 \mid x_j, \lambda, \theta)$ and $\log\dfrac{p_j}{1-p_j} = \lambda \cdot x_j + \theta$.
Since log is a monotone function, the underlying decision rule is linear
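As an illustration only, the logistic variant can be fit with an off-the-shelf logistic regression; scikit-learn here is my stand-in, not necessarily the solver used in the paper, and the training data are made up.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative per-topic training data: normalized system scores and relevance labels.
x = np.array([[0.9, 0.8], [0.7, 0.9], [0.2, 0.1], [0.3, 0.2]])
y = np.array([1, 1, 0, 0])

# Fit p(y=1 | x) = sigmoid(lambda . x + theta); the decision boundary is linear in x,
# which is why the underlying decision rule remains a linear combination of scores.
model = LogisticRegression(C=1.0).fit(x, y)
lam, theta = model.coef_[0], model.intercept_[0]
print(lam, theta, model.predict_proba(x)[:, 1])
```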
Threshold Tuning
• Once a vector of parameters $\lambda$ is found for a local rule, we compute fusion scores on the training set and find a threshold $\theta$ maximizing a particular utility measure.
Different combinations lead to different scores and decisions.
The local rule learned for query $q_i$ is thus the pair $(\lambda_i, \theta_i)$.
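A sketch of this threshold search, assuming candidate thresholds are taken from the observed training fusion scores and the utility measure is supplied as a function; accuracy in the example is only a placeholder for T11SU.

```python
import numpy as np

def tune_threshold(scores, y, utility):
    """Pick the threshold on training fusion scores that maximizes a utility measure.

    scores:  fusion scores (lambda . x_j) on the training documents
    y:       relevance judgements (0/1)
    utility: callable utility(y_true, y_pred) -> float, e.g. an implementation of T11SU
    """
    best_theta, best_u = None, -np.inf
    for theta in np.unique(scores):          # candidate thresholds from observed scores
        pred = (scores > theta).astype(int)  # submit documents scoring above the threshold
        u = utility(y, pred)
        if u > best_u:
            best_theta, best_u = theta, u
    return best_theta, best_u

# Illustrative use with accuracy as a placeholder utility.
scores = np.array([0.9, 0.7, 0.4, 0.2])
y = np.array([1, 1, 0, 0])
print(tune_threshold(scores, y, lambda yt, yp: float((yt == yp).mean())))  # -> (0.4, 1.0)
```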
Global Fusion
When there are many topics:
• Combine all document-query relevance judgments and the corresponding scores together (as if for a single query)
• Compute a local fusion rule

When data for a new training topic becomes available we can either:
• solve the problem from scratch, or
• continue using the same rule.
Input to Global Fusion
                System 1   System 2   ...   System L   Relevance
doc 1/query 1   x_111      x_112      ...   x_11l      y_11
doc 1/query 2   x_121      x_122      ...   x_12l      y_12
...             ...        ...        ...   ...        ...
doc 1/query m   x_1m1      x_1m2      ...   x_1ml      y_1m
doc 2/query 1   x_211      x_212      ...   x_21l      y_21
...             ...        ...        ...   ...        ...
doc n/query m   x_nm1      x_nm2      ...   x_nml      y_nm
Question:
Suppose we know local fusion rules on a set of queries.
• Can we exploit this knowledge on other queries?
• Can we come up with a scheme that can easily incorporate new training queries?
Federated Fusion
Given training queries $q_1,\dots,q_m$ with their local fusion rules $(\lambda_j, \theta_j)$, define

$$(\lambda^*, \theta^*) = \Big(\frac{1}{m}\sum_{j=1}^{m}\lambda_j,\ \frac{1}{m}\sum_{j=1}^{m}\theta_j\Big)$$
New training topics are easy to incorporate!
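A sketch of the federated combination as reconstructed above: average the per-query weight vectors and thresholds, and fold a new training topic in by updating the running means. Function and variable names are illustrative.

```python
import numpy as np

def federated_rule(local_rules):
    """Average per-query local rules (lambda_j, theta_j) into (lambda*, theta*)."""
    lams = np.array([lam for lam, _ in local_rules])
    thetas = np.array([theta for _, theta in local_rules])
    return lams.mean(axis=0), thetas.mean()

def add_training_topic(lam_star, theta_star, m, new_lam, new_theta):
    """Incorporate one new topic's local rule by updating the running averages."""
    lam_star = (m * lam_star + new_lam) / (m + 1)
    theta_star = (m * theta_star + new_theta) / (m + 1)
    return lam_star, theta_star, m + 1
```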
Experimental Evaluation
• Reuters Corpus v1, version 2 (RCV1-v2)
• 99 topics
• Completely judged
• ~23K documents (as in Lewis et al. 2004) to train individual systems
• Selected 4060 (from ~800K) to construct fusion rules
• 9-fold cross-validation over topics
Utility Measures
T+ - all positive documents; D+ - submitted positive; D- - submitted negative.

$$\mathrm{T11NU} = \frac{2|D^+| - |D^-|}{2|T^+|}$$

$$\mathrm{T11SU} = \frac{\max(\mathrm{T11NU}, -0.5) + 0.5}{1.5}$$
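A sketch of these measures as reconstructed above, assuming D+ and D- count the relevant and non-relevant documents among those the system submitted; the counts in the example are made up.

```python
def t11nu(n_rel_total, n_rel_submitted, n_nonrel_submitted):
    """T11NU = (2|D+| - |D-|) / (2|T+|)."""
    return (2 * n_rel_submitted - n_nonrel_submitted) / (2 * n_rel_total)

def t11su(n_rel_total, n_rel_submitted, n_nonrel_submitted):
    """T11SU = (max(T11NU, -0.5) + 0.5) / 1.5."""
    nu = t11nu(n_rel_total, n_rel_submitted, n_nonrel_submitted)
    return (max(nu, -0.5) + 0.5) / 1.5

# Example: 10 relevant docs in total; the system submits 6 relevant and 3 non-relevant.
print(t11su(10, 6, 3))  # T11NU = 0.45, T11SU = (0.45 + 0.5) / 1.5 = 0.633...
```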
Term Representation
$$f(t,d) = \begin{cases} 1 + \log f'(t,d) & \text{if } f'(t,d) > 0 \\ 0 & \text{otherwise} \end{cases}$$

where $f'(t,d)$ is the number of times a term occurs in a document.
IDF weighting: let $i'(t)$ be the number of documents in the training set $T$ containing term $t$. Then:

$$i_D(t) = \log\frac{|T| + 1}{i'(t) + 1}$$
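A sketch of this term weighting as reconstructed above; the term counts and collection size in the example are made up.

```python
import math

def log_tf(raw_count):
    """f(t, d) = 1 + log(f'(t, d)) if the term occurs in the document, else 0."""
    return 1.0 + math.log(raw_count) if raw_count > 0 else 0.0

def idf(doc_count, n_docs):
    """i_D(t) = log((|T| + 1) / (i'(t) + 1)), with i'(t) documents containing t."""
    return math.log((n_docs + 1) / (doc_count + 1))

# Example: a term occurring 3 times in a document and in 50 of 1000 training documents.
print(log_tf(3) * idf(50, 1000))
```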
Individual Classifiers
• Bayesian Binary Regression (BBR) [Genkin et al. 2004]
• kNN, k=384 (k was chosen on the basis of prior experiments)
• Rocchio Classifier
Single Classifiers and BBR-kNN fusion
[Chart: per-topic T11SU (y-axis, 0 to 1.2) for kNN, Rocchio, BBR, BBR-kNN global, and BBR-kNN federated; topics on the x-axis are labeled by their number of relevant documents (from 1911 down to 4).]
Global vs. Federated
[Chart: T11SU of BBR-kNN global vs. BBR-kNN federated fusion on individual topics (CCAT, M14, C18, C181, GCRIM, E12, C21, G15, C172, E512, E11, E13, C182, GENT, C173, E31, E311, G152, C331, E141), shown in decreasing order of number of relevant documents.]
Global vs. Federated
[Scatter plot: per-topic T11SU of BBR-kNN federated fusion (y-axis) vs. BBR-kNN global fusion (x-axis).]
Results
Local Fusion   kNN     Rocchio   BBR     BBR-kNN global   BBR-kNN federated
none           0.583   0.54      0.578   …                …
Centroid       …       …         …       0.569            0.587
Linear         …       …         …       0.569            0.574
Linear 2       …       …         …       0.569            0.575
Logistic       …       …         …       0.556            0.549
Average T11SU measure across 99 topics of RCV1
Conclusions
• The Centroid method performs best with federated fusion.
• Federated fusion gives higher average utility, but global fusion performs better on a greater number of topics.
• This seems to be related to the number of relevant documents for individual topics (federated is better for topics with few relevant documents).
• No clear winner: the choice of method depends on the user's objectives.
• However, federated fusion is computationally more efficient.
• Topic properties have to be considered when choosing a combination method.
Acknowledgments
• KD-D group via NSF grant EIA-0087022
• Members of the DIMACS MMS project: Fred Roberts (PI), Andrei Anghelescu, Alex Genkin, Dave Lewis, David Madigan, Vladimir Menkov
• Kwong Bor Ng
• Anonymous reviewers