Methods for Learning Classifier Combinations: No Clear Winner
Dmitriy Fradkin, Paul Kantor, DIMACS, Rutgers University
[Diagram: Topic 1, Topic 2, ..., and new topics each receive scores from System 1 and System 2. Local fusion combines the systems separately for each topic, while federated or global fusion combines information across topics; for new topics the appropriate combination rule is an open question ("?").]
Overview
• Discuss local fusion methods
• Describe a new fusion approach for multi-topic problems that we call "federated"
• Compare it empirically to the global approach, previously described in [Bartell et al. 1994]
• Interpret the results
Related Work in IR
• [Bartell et al. 1994] - global fusion of systems
• [Hull et al. 1996] - local fusion methods for document filtering (averaging, linear and logistic regression, grid search)
• [Lam and Lai 2001] used category-specific features to model error rate, and then picked the single best system for a category
• [Bennett et al. 2002] uses "reliability indicators" together with scores as input to a metaclassifier
Combination of Classifiers
Relevance judgment: $y(d,q) \in \{0,1\}$

Decision rule: $C(d,q) \in \{0,1\}$, $C(d,q) = \mathrm{sign}(r(d,q) - \theta_q)$
The problem of fusion can be formulated as the problem of finding a way to combine several decision rules.
Linear Combinations
$$C_F(d,q) = \mathrm{sign}\Big(\sum_{j=1}^{l} \lambda_j x_j(d,q) - \theta_q\Big)$$

where $\lambda$ is an $l$-dimensional vector of weights, $\theta_q$ is a threshold, and $x(d,q)$ is an $l$-dimensional vector of normalized scores given by the systems to document $d$ on topic $q$:

$$x_s(d,q) = \frac{r_s(d,q) - \min_{d'} r_s(d',q)}{\max_{d'} r_s(d',q) - \min_{d'} r_s(d',q)}$$
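A minimal sketch in Python/NumPy of this linear fusion rule under the min-max normalization above; the score matrix, weights, and threshold are illustrative placeholders, not values from the paper.

```python
import numpy as np

def normalize_scores(raw):
    """Min-max normalize each system's scores over the documents of one topic.

    raw: (n_docs, n_systems) array of raw scores r_s(d, q).
    Assumes no system gives identical scores to every document.
    """
    lo = raw.min(axis=0)
    hi = raw.max(axis=0)
    return (raw - lo) / (hi - lo)

def linear_fusion(x, lam, theta):
    """C_F(d, q) = sign(sum_j lam_j * x_j(d, q) - theta), reported as 0/1."""
    return (x @ lam - theta > 0).astype(int)

# Illustrative example: 4 documents scored by 2 systems on one topic.
raw = np.array([[2.0, 0.3],
                [5.0, 0.9],
                [1.0, 0.1],
                [4.0, 0.7]])
x = normalize_scores(raw)
print(linear_fusion(x, lam=np.array([0.5, 0.5]), theta=0.4))  # -> [0 1 0 1]
```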
Input to Local Fusion
Documents $j = 1, \dots, n$; $y_j$ is the relevance judgement for the $j$th document; $x_j$ is the vector of scores for a given document.

         System 1   System 2   ...   System L   Relevance
doc 1    x_11       x_12       ...   x_1l       y_1
doc 2    x_21       x_22       ...   x_2l       y_2
...      ...        ...        ...   ...        ...
doc n    x_n1       x_n2       ...   x_nl       y_n
Local Fusion Methods
A new fusion method:
Centroid: $\lambda = \bar{x}_+ - \bar{x}_-$, the difference between the mean score vectors of the relevant and the non-relevant training documents.

Other methods:
Linear: $\min_{\lambda} \sum_{j=1,\dots,n} (y_j - \lambda \cdot x_j)^2$
Linear 2: $\min_{\lambda} \sum_{j=1,\dots,n} (y_j - \lambda \cdot x_j)^2$
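A minimal sketch of the two fitting strategies as reconstructed above, assuming the centroid rule is the difference of mean score vectors and the linear rule is an ordinary least-squares fit of the relevance labels; the data and variable names are illustrative.

```python
import numpy as np

def centroid_weights(x, y):
    """lambda = mean score vector of relevant docs minus mean of non-relevant docs."""
    return x[y == 1].mean(axis=0) - x[y == 0].mean(axis=0)

def linear_weights(x, y):
    """Least-squares fit: minimize sum_j (y_j - lambda . x_j)^2."""
    lam, *_ = np.linalg.lstsq(x, y.astype(float), rcond=None)
    return lam

# Illustrative per-topic training data: normalized scores from 2 systems.
x = np.array([[0.9, 0.8], [0.7, 0.9], [0.2, 0.1], [0.3, 0.2]])
y = np.array([1, 1, 0, 0])
print(centroid_weights(x, y))  # -> [0.55 0.7 ]
print(linear_weights(x, y))
```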
Local Fusion Methods (cont.)
Logistic: $\min_{\lambda,\theta} -\sum_{j=1,\dots,n} \big(y_j \log p_j + (1-y_j)\log(1-p_j)\big)$

where $p_j = p(y_j = 1 \mid x_j, \lambda, \theta)$ and $\log\dfrac{p_j}{1-p_j} = \lambda \cdot x_j + \theta$.
Since log is a monotone function, the underlying decision rule is linear
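As an illustration only, the logistic variant can be fit with an off-the-shelf logistic regression; scikit-learn here is my stand-in, not necessarily the solver used in the paper, and the training data are made up.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative per-topic training data: normalized system scores and relevance labels.
x = np.array([[0.9, 0.8], [0.7, 0.9], [0.2, 0.1], [0.3, 0.2]])
y = np.array([1, 1, 0, 0])

# Fit p(y=1 | x) = sigmoid(lambda . x + theta); the decision boundary is linear in x,
# which is why the underlying decision rule remains a linear combination of scores.
model = LogisticRegression(C=1.0).fit(x, y)
lam, theta = model.coef_[0], model.intercept_[0]
print(lam, theta, model.predict_proba(x)[:, 1])
```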
Threshold Tuning
• Once a vector of parameters $\lambda$ is found for a local rule, we compute fusion scores on the training set and find a threshold $\theta$ maximizing a particular utility measure.
Different combinations lead to different scores and decisions.
The local rule learned for query $q_i$ is thus the pair $(\lambda_i, \theta_i)$.
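A sketch of this threshold search, assuming candidate thresholds are taken from the observed training fusion scores and the utility measure is supplied as a function; accuracy in the example is only a placeholder for T11SU.

```python
import numpy as np

def tune_threshold(scores, y, utility):
    """Pick the threshold on training fusion scores that maximizes a utility measure.

    scores:  fusion scores (lambda . x_j) on the training documents
    y:       relevance judgements (0/1)
    utility: callable utility(y_true, y_pred) -> float, e.g. an implementation of T11SU
    """
    best_theta, best_u = None, -np.inf
    for theta in np.unique(scores):          # candidate thresholds from observed scores
        pred = (scores > theta).astype(int)  # submit documents scoring above the threshold
        u = utility(y, pred)
        if u > best_u:
            best_theta, best_u = theta, u
    return best_theta, best_u

# Illustrative use with accuracy as a placeholder utility.
scores = np.array([0.9, 0.7, 0.4, 0.2])
y = np.array([1, 1, 0, 0])
print(tune_threshold(scores, y, lambda yt, yp: float((yt == yp).mean())))  # -> (0.4, 1.0)
```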
Global Fusion
When there are many topics:
• Combine all document-query relevance judgments and the corresponding scores together (as if for a single query)
• Compute a local fusion rule

When data for a new training topic becomes available we can either:
• solve the problem from scratch, or
• continue using the same rule.
Input to Global Fusion
                System 1   System 2   ...   System L   Relevance
doc 1/query 1   x_111      x_112      ...   x_11l      y_11
doc 1/query 2   x_121      x_122      ...   x_12l      y_12
...             ...        ...        ...   ...        ...
doc 1/query m   x_1m1      x_1m2      ...   x_1ml      y_1m
doc 2/query 1   x_211      x_212      ...   x_21l      y_21
...             ...        ...        ...   ...        ...
doc n/query m   x_nm1      x_nm2      ...   x_nml      y_nm
Question:
Suppose we know local fusion rules on a set of queries.
• Can we exploit this knowledge on other queries?
• Can we come up with a scheme that can easily incorporate new training queries?
Federated Fusion
Given training queries $q_1,\dots,q_m$ with their local fusion rules $(\lambda_j, \theta_j)$, define

$$(\lambda^*, \theta^*) = \Big(\frac{1}{m}\sum_{j=1}^{m}\lambda_j,\ \frac{1}{m}\sum_{j=1}^{m}\theta_j\Big)$$
New training topics are easy to incorporate!
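A sketch of the federated combination as reconstructed above: average the per-query weight vectors and thresholds, and fold a new training topic in by updating the running means. Function and variable names are illustrative.

```python
import numpy as np

def federated_rule(local_rules):
    """Average per-query local rules (lambda_j, theta_j) into (lambda*, theta*)."""
    lams = np.array([lam for lam, _ in local_rules])
    thetas = np.array([theta for _, theta in local_rules])
    return lams.mean(axis=0), thetas.mean()

def add_training_topic(lam_star, theta_star, m, new_lam, new_theta):
    """Incorporate one new topic's local rule by updating the running averages."""
    lam_star = (m * lam_star + new_lam) / (m + 1)
    theta_star = (m * theta_star + new_theta) / (m + 1)
    return lam_star, theta_star, m + 1
```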
Experimental Evaluation
• Reuters Corpus v1, version 2 (RCV1-v2)
• 99 topics
• Completely judged
• ~23K documents (as in Lewis et al. 2004) to train individual systems
• Selected 4060 (from ~800K) to construct fusion rules
• 9-fold cross-validation over topics
Utility Measures
T+ - all positive documents; D+ - submitted positive; D- - submitted negative.

$$\mathrm{T11NU} = \frac{2|D^+| - |D^-|}{2|T^+|}$$

$$\mathrm{T11SU} = \frac{\max(\mathrm{T11NU}, -0.5) + 0.5}{1.5}$$
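A sketch of these measures as reconstructed above, assuming D+ and D- count the relevant and non-relevant documents among those the system submitted; the counts in the example are made up.

```python
def t11nu(n_rel_total, n_rel_submitted, n_nonrel_submitted):
    """T11NU = (2|D+| - |D-|) / (2|T+|)."""
    return (2 * n_rel_submitted - n_nonrel_submitted) / (2 * n_rel_total)

def t11su(n_rel_total, n_rel_submitted, n_nonrel_submitted):
    """T11SU = (max(T11NU, -0.5) + 0.5) / 1.5."""
    nu = t11nu(n_rel_total, n_rel_submitted, n_nonrel_submitted)
    return (max(nu, -0.5) + 0.5) / 1.5

# Example: 10 relevant docs in total; the system submits 6 relevant and 3 non-relevant.
print(t11su(10, 6, 3))  # T11NU = 0.45, T11SU = (0.45 + 0.5) / 1.5 = 0.633...
```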
Term Representation
$$f(t,d) = \begin{cases} 1 + \log f'(t,d) & \text{if } f'(t,d) > 0 \\ 0 & \text{otherwise} \end{cases}$$

where $f'(t,d)$ is the number of times a term occurs in a document.
IDF weighting: let $i'(t)$ be the number of documents in the training set $T$ containing term $t$. Then:

$$i_D(t) = \log\frac{|T| + 1}{i'(t) + 1}$$
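A sketch of this term weighting as reconstructed above; the term counts and collection size in the example are made up.

```python
import math

def log_tf(raw_count):
    """f(t, d) = 1 + log(f'(t, d)) if the term occurs in the document, else 0."""
    return 1.0 + math.log(raw_count) if raw_count > 0 else 0.0

def idf(doc_count, n_docs):
    """i_D(t) = log((|T| + 1) / (i'(t) + 1)), with i'(t) documents containing t."""
    return math.log((n_docs + 1) / (doc_count + 1))

# Example: a term occurring 3 times in a document and in 50 of 1000 training documents.
print(log_tf(3) * idf(50, 1000))
```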
Individual Classifiers
• Bayesian Binary Regression (BBR) [Genkin et al. 2004]
• kNN, k=384 (k was chosen on the basis of prior experiments)
• Rocchio Classifier
Single Classifiers and BBR-kNN fusion
[Chart: per-topic T11SU (y-axis, 0 to 1.2) for kNN, Rocchio, BBR, BBR-kNN global, and BBR-kNN federated; topics on the x-axis are labeled by their number of relevant documents (from 1911 down to 4).]
Global vs. Federated
[Chart: T11SU of BBR-kNN global vs. BBR-kNN federated fusion on individual topics (CCAT, M14, C18, C181, GCRIM, E12, C21, G15, C172, E512, E11, E13, C182, GENT, C173, E31, E311, G152, C331, E141), shown in decreasing order of number of relevant documents.]
Global vs. Federated
[Scatter plot: per-topic T11SU of BBR-kNN federated fusion (y-axis) vs. BBR-kNN global fusion (x-axis).]
Results
Local Fusion   kNN     Rocchio   BBR     BBR-kNN global   BBR-kNN federated
none           0.583   0.54      0.578   …                …
Centroid       …       …         …       0.569            0.587
Linear         …       …         …       0.569            0.574
Linear 2       …       …         …       0.569            0.575
Logistic       …       …         …       0.556            0.549
Average T11SU measure across 99 topics of RCV1
Conclusions
• The Centroid method performs best with federated fusion.
• Federated fusion gives higher average utility, but global fusion performs better on a greater number of topics.
• This seems to be related to the number of relevant documents for individual topics (federated is better for topics with few relevant documents).
• No clear winner: the choice of method depends on the user's objectives.
• However, federated fusion is computationally more efficient.
• Topic properties have to be considered when choosing a combination method.
Acknowledgments
• KD-D group via NSF grant EIA-0087022
• Members of the DIMACS MMS project: Fred Roberts (PI), Andrei Anghelescu, Alex Genkin, Dave Lewis, David Madigan, Vladimir Menkov
• Kwong Bor Ng
• Anonymous reviewers