
Study on Ensemble Learning

By Feng Zhou

Content

• Introduction
• A Statistical View of M3 Network
• Future Works

Introduction

• Ensemble learning:
  – Combines a group of classifiers rather than designing a new one.
  – The decisions of multiple hypotheses are combined to produce more accurate results.

• Problems in traditional learning algorithms
  – Statistical problem
  – Computational problem
  – Representation problem

• Related works
  – Resampling techniques: Bagging, Boosting
  – Approaches for extending to multi-class problems: One-vs-One, One-vs-All (see the sketch below)
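As a rough illustration of the two decompositions (not from the slides; the class labels are made up), the sketch below enumerates the binary sub-problems each strategy creates:

```python
from itertools import combinations

classes = ["c1", "c2", "c3", "c4"]  # hypothetical class labels

# One-vs-One: one binary problem per unordered pair of classes
ovo_problems = [(a, b) for a, b in combinations(classes, 2)]

# One-vs-All: one binary problem per class, against the union of the rest
ova_problems = [(c, [o for o in classes if o != c]) for c in classes]

print(len(ovo_problems))  # 6 pairwise problems for 4 classes
print(len(ova_problems))  # 4 one-vs-rest problems
```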

Min-Max-Modular (M3) Network (Lu, IEEE TNN 1999)

• Steps
  – Dividing training sets (Chen, IJCNN 2006; Wen, ICONIP 2005)
  – Training pair-wise classifiers
  – Integrating the outcomes (Zhao, IJCNN 2005)
    • Min process
    • Max process

[Figure: Min-Max combination of module outputs. Each row of the grid

  0.1 0.5 0.7 0.2   → Min = 0.1
  0.4 0.3 0.5 0.6   → Min = 0.3
  0.8 0.5 0.4 0.2   → Min = 0.2
  0.5 0.9 0.7 0.3   → Min = 0.3

is reduced by a Min unit, and the Max process over the row minima gives 0.3.]
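A minimal sketch of the Min-Max integration step on the grid above, in plain Python (the values are taken from the figure; the orientation, with each row of modules feeding one Min unit, follows the figure):

```python
# Outputs of the pairwise modules for one test sample, as in the figure above.
outputs = [
    [0.1, 0.5, 0.7, 0.2],
    [0.4, 0.3, 0.5, 0.6],
    [0.8, 0.5, 0.4, 0.2],
    [0.5, 0.9, 0.7, 0.3],
]

# Min process: each row of modules is reduced to its minimum.
row_minima = [min(row) for row in outputs]   # [0.1, 0.3, 0.2, 0.3]

# Max process: the row minima are reduced to their maximum.
final_score = max(row_minima)                # 0.3

print(row_minima, final_score)
```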

A Statistical View

• Assumption
  – The pair-wise classifier outputs a probabilistic value, fitted with a sigmoid function (J.C. Platt, ALMC 1999):

    $P(\omega \mid x) = \dfrac{1}{1 + e^{Ax + B}}$

• Bayesian decision theory (a small numeric sketch follows below)

    $\hat{\omega} = \underset{\omega \in \{\omega^+,\, \omega^-\}}{\arg\max}\; P(\omega \mid x)$

  where

    $P(\omega^+ \mid x) = \dfrac{P(x \mid \omega^+)\, P(\omega^+)}{P(x \mid \omega^+)\, P(\omega^+) + P(x \mid \omega^-)\, P(\omega^-)}$
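A small sketch of the two formulas above; the sigmoid parameters A, B, the priors, and the example numbers are made up for illustration, not values from the slides:

```python
import math

def platt_sigmoid(score, A=-2.0, B=0.0):
    """Map a raw classifier score to a probability: P(w|x) = 1 / (1 + exp(A*score + B))."""
    return 1.0 / (1.0 + math.exp(A * score + B))

def posterior_positive(lik_pos, lik_neg, prior_pos=0.5):
    """Bayes rule for the two-class posterior P(w+ | x)."""
    prior_neg = 1.0 - prior_pos
    num = lik_pos * prior_pos
    return num / (num + lik_neg * prior_neg)

print(platt_sigmoid(1.2))            # probability from a raw score
print(posterior_positive(0.5, 0.4))  # P(x|w+)=0.5, P(x|w-)=0.4, equal priors -> 5/9
```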

A Simple Discrete Example

P(w|x):

        w+     w−
  x1    1/2
  x2    1/2    2/5
  x3           2/5
  x4           1/5

A Simple Discrete Example (II)

• Classifier 0 (w+ : w−):   Pc0(w+ | x = x2) = 1/3
• Classifier 1 (w+ : w1−):  Pc1(w+ | x = x2) = 1/2
• Classifier 2 (w+ : w2−):  Pc2(w+ | x = x2) = 1/2

• Pc0 < min(Pc1, Pc2)

A More Complicated Example

• When one more classifier is taken into account, the evidence that x belongs to w+ keeps shrinking.

• Pglobal(w+) < min(Ppartial(w+))

• The classifier reporting the minimum value contains the most information about w− (minimization principle).

• If Ppartial(w+) = 1, no information about w− is contained. (A short derivation of the inequality follows below.)

[Figure: a growing chain of classifiers, Classifier 1 (w+ : w1−), Classifier 2 (w+ : w2−), …; the information about w− increases along the chain.]
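A one-line justification of the inequality, under the slides' assumption that the modules output exact probabilities; the shorthand $a = P(x, \omega^+)$ and $b_j = P(x, \omega_j^-)$ is mine:

$$
P_{\mathrm{global}}(\omega^+ \mid x)
  = \frac{a}{a + \sum_{j} b_j}
  \;\le\; \frac{a}{a + b_j}
  = P_{\mathrm{partial},j}(\omega^+ \mid x)
  \quad \text{for every } j,
\qquad\text{hence}\qquad
P_{\mathrm{global}}(\omega^+ \mid x) \;\le\; \min_j P_{\mathrm{partial},j}(\omega^+ \mid x).
$$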

Analysis

• For each classifier $c_{ij}$ (trained on $\omega_i^+$ vs. $\omega_j^-$), write $M_{ij} = \dfrac{P(x, \omega_i^+)}{P(x, \omega_j^-)}$; its output is

  $q_{ij} = P(\omega_i^+ \mid x, \omega_i^+ \cup \omega_j^-) = \dfrac{P(x, \omega_i^+)}{P(x, \omega_i^+) + P(x, \omega_j^-)} = \dfrac{M_{ij}}{1 + M_{ij}}$

• For each sub-positive class $\omega_i^+$, combining over the $n^-$ sub-negative classes,

  $q_i = P(\omega_i^+ \mid x, \omega_i^+ \cup \omega^-) = \dfrac{1}{1 + \sum_j M_{ij}^{-1}} = \dfrac{1}{\sum_j q_{ij}^{-1} - (n^- - 1)}$

• For the positive class $\omega^+$, combining over the $n^+$ sub-positive classes (implemented in the sketch below),

  $P(\omega^+ \mid x) = \dfrac{1}{1 + \left( \sum_{i=1}^{n^+} \dfrac{q_i}{1 - q_i} \right)^{-1}}$
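A sketch of this restoration, starting from toy joint probabilities so the closed-form combination can be checked against a direct computation; the numbers are made up, not from the slides:

```python
# Toy joint probabilities P(x, w_i+) and P(x, w_j-) for a single sample x (made-up values).
a = [0.10, 0.05]          # sub-positive classes w_1+, w_2+
b = [0.04, 0.08, 0.02]    # sub-negative classes w_1-, w_2-, w_3-

# Pairwise module outputs q_ij = P(w_i+ | x, w_i+ or w_j-).
q = [[ai / (ai + bj) for bj in b] for ai in a]

# Combine over the sub-negative classes:
# P(w_i+ | x, w_i+ or w-) = 1 / (sum_j 1/q_ij - (n_minus - 1)).
n_minus = len(b)
q_i = [1.0 / (sum(1.0 / qij for qij in row) - (n_minus - 1)) for row in q]

# Combine over the sub-positive classes:
# P(w+ | x) = 1 / (1 + 1 / sum_i q_i / (1 - q_i)).
p_pos = 1.0 / (1.0 + 1.0 / sum(qi / (1.0 - qi) for qi in q_i))

# Direct computation from the joint probabilities, for comparison.
p_pos_direct = sum(a) / (sum(a) + sum(b))

print(q_i)                    # restored per-sub-class posteriors
print(p_pos, p_pos_direct)    # the two values agree
```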

Analysis (II)

• Decomposition of a complex problem
• Restoration to the original resolution

Composition of Training Sets

[Figure: grid of candidate training-set pairings among the sub-classes w1+, …, wn+ of w+ and w1−, …, wn− of w−. The (wi+, wj−) pairs have been used; pairs drawn from within the same class form trivial, useless sets; the remaining pairings have not been used yet.]

Another Way of Combination

[Figure: the same grid of sub-class pairings, combined in the reverse order: for each sub-negative class wk−, the outputs are first combined over the sub-positive classes wi+, giving analogous closed-form quantities $M'_{ki}$ in terms of the module outputs $q_{ki}$.]

Training and testing time: $O(n^+ \cdot n^-)$ and $O(n^+ + n^-)$

Experiments – Synthetic Data

Experiments – Text Categorization (20 Newsgroups corpus)

Experiments Setup

• Removing words: stemming, stop words, words with frequency < 30

• Using Naïve Bayes as the elementary classifier (see the sketch below)

• Estimating the probability with a sigmoid function
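A minimal sketch of one such pairwise module, assuming scikit-learn and its built-in 20 Newsgroups loader; the two category names and all parameters are illustrative, stemming is omitted, and the sigmoid fit is done with scikit-learn's CalibratedClassifierCV as a stand-in for the sigmoid estimation mentioned on the slide:

```python
# One pairwise module: Naive Bayes on two newsgroups, with a Platt-style sigmoid
# fitted on its outputs via cross-validated calibration.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.calibration import CalibratedClassifierCV

# Two illustrative categories standing in for one (w_i+, w_j-) pair.
cats = ["sci.space", "rec.autos"]
train = fetch_20newsgroups(subset="train", categories=cats)
test = fetch_20newsgroups(subset="test", categories=cats)

# Bag-of-words features; English stop words removed, rare words dropped via min_df.
vec = CountVectorizer(stop_words="english", min_df=30)
Xtr = vec.fit_transform(train.data)
Xte = vec.transform(test.data)

# Naive Bayes as the elementary classifier, calibrated with a sigmoid.
module = CalibratedClassifierCV(MultinomialNB(), method="sigmoid", cv=3)
module.fit(Xtr, train.target)

# Probabilistic outputs of this module, to be fed into the Min/Max processes.
probs = module.predict_proba(Xte)[:, 1]
print(probs[:5])
```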

Future Work

• Situations where noise is taken into account
  – The essence of the problem: to access the underlying distribution
  – Independent parameters of the model:
  – Constraints we get:
  – To obtain the best estimate

Kullback-Leibler Distance (T. Hastie, Ann Statist 1998)

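For reference, the weighted Kullback-Leibler criterion used in the cited paper (Hastie & Tibshirani, 1998) has the form below, with $r_{ij}$ the observed pairwise probabilities, $\mu_{ij}$ the model's, and $n_{ij}$ the number of training examples in pair $(i, j)$; this is a restatement for orientation, not a formula recovered from the slides:

$$
\ell(p) \;=\; \sum_{i<j} n_{ij}
  \left[ r_{ij} \log \frac{r_{ij}}{\mu_{ij}}
       + (1 - r_{ij}) \log \frac{1 - r_{ij}}{1 - \mu_{ij}} \right]
$$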

References

[1] T. Hastie and R. Tibshirani, Classification by pairwise coupling, Ann. Statist., 1998.
[2] J. C. Platt, Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods, ALMC, 1999.
[3] B. Lu and M. Ito, Task decomposition and module combination based on class relations: a modular neural network for pattern classification, IEEE Trans. Neural Networks, 1999.
[4] Y. M. Wen and B. Lu, Equal clustering makes min-max modular support vector machines more efficient, ICONIP 2005.
[5] H. Zhao and B. Lu, On efficient selection of binary classifiers for min-max modular classifier, IJCNN 2005.
[6] K. Chen and B. Lu, Efficient classification of multi-label and imbalanced data using min-max modular classifiers, IJCNN 2006.
