Study on Ensemble Learning By Feng Zhou


Page 1

Study on Ensemble Learning

By Feng Zhou

Page 2

Content

• Introduction
• A Statistical View of M3 Network
• Future Works

Page 3

Introduction

• Ensemble learning:
  – Combines a group of classifiers rather than designing a new one.
  – The decisions of multiple hypotheses are combined to produce more accurate results.
• Problems in traditional learning algorithms
  – Statistical problem
  – Computational problem
  – Representation problem
• Related works
  – Resampling techniques: Bagging, Boosting
  – Approaches for extending to multi-class problems: One-vs-One, One-vs-All.

Page 4

Min-Max-Modular (M3) Network (Lu, IEEE TNN 1999)

• Steps
  – Dividing training sets (Chen, IJCNN 2006; Wen, ICONIP 2005)
  – Training pair-wise classifiers
  – Integrating the outcomes (Zhao, IJCNN 2005)
    • Min process
    • Max process

Integration example (matrix of pairwise outputs; a Min unit reduces each row, and a Max unit combines the row minima):

  0.1  0.5  0.7  0.2   → Min → 0.1
  0.4  0.3  0.5  0.6   → Min → 0.3
  0.8  0.5  0.4  0.2   → Min → 0.2
  0.5  0.9  0.7  0.3   → Min → 0.3
                               Max → 0.3
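The min-max integration step can be sketched in a few lines of Python, using the matrix of pairwise outputs from the slide (reading each row as the outputs belonging to one module, which is what the slide's row minima suggest):

```python
# Pairwise classifier outputs from the slide's example; each row holds
# the outputs that one Min unit reduces.
outputs = [
    [0.1, 0.5, 0.7, 0.2],
    [0.4, 0.3, 0.5, 0.6],
    [0.8, 0.5, 0.4, 0.2],
    [0.5, 0.9, 0.7, 0.3],
]

# Min process: each module keeps its weakest pairwise score.
row_mins = [min(row) for row in outputs]   # [0.1, 0.3, 0.2, 0.3]

# Max process: the final score is the strongest of the min outputs.
score = max(row_mins)
print(score)  # 0.3
```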

Page 5

A Statistical View

• Assumption
  – The pair-wise classifier outputs a probabilistic value, obtained with the sigmoid function (J. C. Platt, ALMC 1999):

      P(w | x) = 1 / (1 + e^(A·x + B))

• Bayesian decision theory

      ŵ = argmax P(w | x),  w ∈ {w+, w−}

  where, by Bayes' rule,

      P(w | x) = P(x | w) P(w) / [ P(x | w+) P(w+) + P(x | w−) P(w−) ]
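A minimal sketch of the assumed sigmoid mapping; the parameter values A and B below are illustrative placeholders (in Platt's method they are fitted by maximum likelihood on held-out classifier scores):

```python
import math

def platt_posterior(f, A=-2.0, B=0.0):
    """Map a raw classifier score f to a probabilistic output via the
    sigmoid 1 / (1 + exp(A*f + B)).

    A and B would normally be fitted to the data; the defaults here
    are placeholders chosen only so that larger scores map to larger
    posteriors (A < 0).
    """
    return 1.0 / (1.0 + math.exp(A * f + B))

print(platt_posterior(2.0))   # ≈ 0.98
print(platt_posterior(-2.0))  # ≈ 0.018
print(platt_posterior(0.0))   # 0.5
```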

Page 6

A Simple Discrete Example

P(w | x):

         w+     w−
  x1     1/2
  x2     1/2    2/5
  x3     2/5
  x4     1/5

Page 7

A Simple Discrete Example (II)

  Classifier 0 (w+ : w−)     Pc0(w+ | x = x2) = 1/3
  Classifier 1 (w+ : w1−)    Pc1(w+ | x = x2) = 1/2
  Classifier 2 (w+ : w2−)    Pc2(w+ | x = x2) = 1/2

  Pc0 < min(Pc1, Pc2)

Page 8

A More Complicated Example

• When one more classifier is considered, the evidence that x belongs to w+ shrinks:

      Pglobal(w+) < min(Ppartial(w+))

• The classifier reporting the minimum value contains the most information about w− (minimization principle).
• If Ppartial(w+) = 1, no information about w− is contained.

Classifier 1 (w+ : w1−), Classifier 2 (w+ : w2−), …
→ Information about w− is increasing.
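The shrinking of the evidence can be checked numerically. The joint probabilities below are invented purely for illustration (they are not from the slides); the point is that every pairwise posterior overestimates w+, so the fully informed posterior sits below the smallest of them:

```python
# Hypothetical joint probabilities at a fixed x: P(w+, x) and P(wj-, x)
# for three sub-negative classes. Values are invented for illustration.
p_pos = 0.2
p_neg = [0.1, 0.3, 0.2]

# Partial posterior reported by each pairwise classifier (w+ : wj-),
# which ignores every other sub-negative class.
partials = [p_pos / (p_pos + pj) for pj in p_neg]

# Global posterior once all sub-negative classes are accounted for.
global_post = p_pos / (p_pos + sum(p_neg))

# The global value (≈ 0.25) lies below every partial posterior.
print(global_post <= min(partials))  # True
```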

Page 9

Analysis

• For each classifier c_ij:

      M_ij = P(w_i+ | x, w_i+ ∪ w_j−) = P(w_i+, x) / [ P(w_i+, x) + P(w_j−, x) ]

• For each sub-positive class w_i+ (writing the result as q_i):

      q_i = P(w_i+ | x, w_i+ ∪ w−) = 1 / ( 1 + Σ_{j=1}^{n−} (1/M_ij − 1) )

• For the positive class w+:

      P(w+ | x) = 1 / ( 1 + [ Σ_{i=1}^{n+} (1/q_i − 1)^(−1) ]^(−1) )

Page 10

Analysis (II)

• Decomposition of a complex problem
• Restoration to the original resolution

Page 11

Composition of Training Sets

[Figure: grid over the sub-classes w1+ … wn++ of w+ and w1− … wn−− of w−, marking each pair of training subsets as "have been used", "trivial set, useless", or "not used yet".]

Page 12

Another Way of Combination

[Figure: the same grid of sub-class training-set pairs as on the previous slide, now drawing on the previously unused pairs; the accompanying formulas for the modified modules M′_ki did not survive extraction.]

Training and testing time: O(n+ × n−) → O(n+ + n−)

Page 13

Experiments - Synthetic Data

Page 14

Experiments - Text Categorization (20 Newsgroups corpus)

Experiment setup:

• Removing words: stemming, stop words, and words occurring < 30 times
• Using Naïve Bayes as the elementary classifier
• Estimating the probability with a sigmoid function
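A minimal multinomial Naive Bayes in the spirit of the elementary classifier above; the toy documents, class names, and smoothing parameter are invented for illustration, not taken from the experiments:

```python
import math
from collections import Counter

def train_nb(docs, labels, alpha=1.0):
    """Train a multinomial Naive Bayes with Laplace smoothing alpha."""
    vocab = {w for d in docs for w in d.split()}
    classes = set(labels)
    prior = {c: labels.count(c) / len(labels) for c in classes}
    counts = {c: Counter() for c in classes}
    for d, c in zip(docs, labels):
        counts[c].update(d.split())
    loglik = {}
    for c in classes:
        total = sum(counts[c].values()) + alpha * len(vocab)
        loglik[c] = {w: math.log((counts[c][w] + alpha) / total)
                     for w in vocab}
    return prior, loglik

def predict(doc, prior, loglik):
    """Return the class with the highest log-posterior for doc."""
    scores = {c: math.log(prior[c]) +
                 sum(loglik[c].get(w, 0.0) for w in doc.split())
              for c in prior}
    return max(scores, key=scores.get)

# Toy corpus, invented for illustration.
docs = ["nasa space shuttle", "orbit launch space",
        "stock market price", "market trading stock"]
labels = ["sci", "sci", "biz", "biz"]
prior, loglik = train_nb(docs, labels)
print(predict("space launch", prior, loglik))  # sci
```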

Page 15

Future Work

• Situation with consideration of noise
  – The essence of the problem: to estimate the underlying distribution
  – Independent parameters for the model
  – Constraints we get
  – To obtain the best estimate: the Kullback-Leibler distance (T. Hastie, Ann Statist 1998)
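The Kullback-Leibler distance referred to here is the standard one, D(p ‖ q) = Σ_x p(x) log(p(x)/q(x)); a minimal sketch over invented discrete distributions:

```python
import math

def kl_distance(p, q):
    """Kullback-Leibler distance D(p || q) between two discrete
    distributions given as aligned lists of probabilities."""
    return sum(pi * math.log(pi / qi)
               for pi, qi in zip(p, q) if pi > 0)

# Toy distributions over four outcomes (values invented for
# illustration). D(p || q) is nonnegative and zero only when p == q.
p = [0.4, 0.3, 0.2, 0.1]
q = [0.25, 0.25, 0.25, 0.25]
print(kl_distance(p, q))  # > 0
print(kl_distance(p, p))  # 0.0
```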

Page 16

References

[1] T. Hastie & R. Tibshirani, "Classification by pairwise coupling," Annals of Statistics, 1998.
[2] J. C. Platt, "Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods," in Advances in Large Margin Classifiers (ALMC), 1999.
[3] B. Lu & M. Ito, "Task decomposition and module combination based on class relations: a modular neural network for pattern classification," IEEE Trans. Neural Networks, 1999.
[4] Y. M. Wen & B. Lu, "Equal clustering makes min-max modular support vector machines more efficient," ICONIP 2005.
[5] H. Zhao & B. Lu, "On efficient selection of binary classifiers for min-max modular classifier," IJCNN 2005.
[6] K. Chen & B. Lu, "Efficient classification of multi-label and imbalanced data using min-max modular classifiers," IJCNN 2006.