Improve Naïve Bayesian Classifier by Discriminative Training


Page 1: Improve Naïve Bayesian Classifier by Discriminative Training

ICONIP 2005

Improve Naïve Bayesian Classifier by Discriminative Training

Kaizhu Huang, Zhangbing Zhou, Irwin King, Michael R. Lyu

Oct. 2005

Page 2: Improve Naïve Bayesian Classifier by Discriminative Training


Outline

Background
– Classifiers
» Discriminative classifiers: Support Vector Machines
» Generative classifiers: Naïve Bayesian Classifiers
Motivation
Discriminative Naïve Bayesian Classifier
Experiments
Discussions
Conclusion

Page 3: Improve Naïve Bayesian Classifier by Discriminative Training


Background

Discriminative Classifiers
– Directly maximize a discriminative function or posterior function
– Example: Support Vector Machines

[Figure: SVM illustration]

Page 4: Improve Naïve Bayesian Classifier by Discriminative Training


Background

Generative Classifiers
– Model the joint distribution for each class, P(x|C), and then use Bayes rule to construct the posterior classifier P(C|x), where C is the class label and x is the feature vector.
– Example: Naïve Bayesian Classifiers
» Model the distribution for each class under the assumption that each feature of the data is independent of the other features, given the class label.

C = \arg\max_{c_i} P(C_i \mid x) ..........(1)
  = \arg\max_{c_i} \frac{P(x \mid C_i)\, P(C_i)}{p(x)} ..........(2)
  = \arg\max_{c_i} P(x \mid C_i)\, P(C_i) ..........(3)    (p(x) is constant w.r.t. C)
  = \arg\max_{c_i} \prod_{j=1}^{m} P(x_j \mid C_i)\, P(C_i) ..........(4)

combining the assumption

P(x_i, x_j \mid C) = P(x_i \mid C)\, P(x_j \mid C), \quad i \neq j, \; 1 \le i, j \le m
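To make decision rule (4) concrete, here is a minimal Python sketch of a discrete naïve Bayes classifier; the function names and the Laplace-smoothing choice are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def fit_nb(X, y, n_values, alpha=1.0):
    """Estimate P(C_i) and P(x_j | C_i) from integer-coded data,
    with Laplace smoothing (alpha) to avoid zero probabilities."""
    classes = np.unique(y)
    priors = {c: np.mean(y == c) for c in classes}
    cond = {}  # cond[c][j][v] = P(x_j = v | C = c)
    for c in classes:
        Xc = X[y == c]
        cond[c] = [
            (np.bincount(Xc[:, j], minlength=n_values[j]) + alpha)
            / (len(Xc) + alpha * n_values[j])
            for j in range(X.shape[1])
        ]
    return priors, cond

def predict_nb(x, priors, cond):
    """Decision rule (4): argmax_c P(C) * prod_j P(x_j | C), in log space."""
    scores = {
        c: np.log(priors[c]) + sum(np.log(cond[c][j][v]) for j, v in enumerate(x))
        for c in priors
    }
    return max(scores, key=scores.get)
```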

Page 5: Improve Naïve Bayesian Classifier by Discriminative Training


Background

Comparison

Example of missing information (from left to right): original digit, 50% missing digit, 75% missing digit, and occluded digit.

Page 6: Improve Naïve Bayesian Classifier by Discriminative Training


Background

Why are generative classifiers not as accurate as discriminative classifiers?

Scheme for generative classifiers in two-category classification tasks: training set → subset D1 labeled as Class 1, subset D2 labeled as Class 2 → estimate distribution P1 to approximate D1, estimate distribution P2 to approximate D2 → construct Bayes rule for classification.

1. It is incomplete for generative classifiers to just approximate the inner-class information.
2. The inter-class discriminative information between classes is discarded, yet it is needed!

Page 7: Improve Naïve Bayesian Classifier by Discriminative Training


Background

Why are generative classifiers superior to discriminative classifiers in handling missing-information problems?
– SVM lacks the ability to reason under uncertainty.
– NB can conduct uncertainty inference under the estimated distribution.

A is the feature set; T is the subset of A which is missing; A − T is thus the set of known features.
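Since the NB joint distribution factorizes over features, marginalizing out the missing subset T amounts to simply dropping those factors from the product. A minimal sketch, reusing the hypothetical predict_nb helpers above and marking missing features with None:

```python
def predict_nb_missing(x, priors, cond):
    """Infer the class from the known features A - T only: under the
    naive Bayes factorization, summing out a missing feature just
    removes its factor from the product."""
    scores = {
        c: np.log(priors[c])
        + sum(np.log(cond[c][j][v]) for j, v in enumerate(x) if v is not None)
        for c in priors
    }
    return max(scores, key=scores.get)
```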

Page 8: Improve Naïve Bayesian Classifier by Discriminative Training


Motivation

It seems that a good classifier should combine the strategies of discriminative classifiers and generative classifiers.

Our work trains one of the generative classifiers, the Naïve Bayesian Classifier, in a discriminative way.

Page 9: Improve Naïve Bayesian Classifier by Discriminative Training


Discriminative Naïve Bayesian Classifier

Working scheme of the Naïve Bayesian Classifier: training set → sub-set D1 labeled as Class 1, sub-set D2 labeled as Class 2 → estimate the distribution P1 to approximate D1, estimate the distribution P2 to approximate D2 → use Bayes rule for classification. Interaction between the two estimates is needed!

Mathematical explanation of the Naïve Bayesian Classifier: the per-class maximum-likelihood estimation is easily solved by the Lagrange multiplier method.
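For reference, the Lagrange-multiplier solution mentioned here is the standard one: maximizing the per-class log-likelihood under the sum-to-one constraint yields the relative-frequency estimates (a textbook result, written in notation of my own choosing, where n_{jv}^{(c)} counts how often feature j takes value v in class c):

\max_{P} \sum_{v} n_{jv}^{(c)} \log P(x_j = v \mid C_c)
\quad \text{s.t.} \quad \sum_{v} P(x_j = v \mid C_c) = 1
\;\;\Rightarrow\;\;
\hat{P}(x_j = v \mid C_c) = \frac{n_{jv}^{(c)}}{\sum_{v'} n_{jv'}^{(c)}}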

Page 10: Improve Naïve Bayesian Classifier by Discriminative Training


Discriminative Naïve Bayesian Classifier (DNB)

Optimization function of DNB

• On one hand, the minimization of this function tries to approximate the dataset as accurately as possible.
• On the other hand, the optimization of this function also tries to enlarge the divergence between classes (the divergence term).
• Optimizing the joint distribution directly inherits the ability of NB to handle missing-information problems.
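The formula itself did not survive the transcript, so the following is only a schematic sketch of an objective with the two stated properties, a data-fit term plus a class-divergence term; the symmetric-KL divergence and the weight lam are my assumptions, not the paper's exact form.

```python
def dnb_objective(log_p1, log_p2, counts1, counts2, lam=1.0):
    """Schematic DNB-style objective (assumed form): negative
    log-likelihood of each class's data, minus a divergence term
    that pushes the class distributions P1 and P2 apart."""
    p1, p2 = np.exp(log_p1), np.exp(log_p2)
    nll = -(counts1 * log_p1).sum() - (counts2 * log_p2).sum()  # data fit
    divergence = ((p1 - p2) * (log_p1 - log_p2)).sum()  # symmetric KL
    return nll - lam * divergence
```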

Page 11: Improve Naïve Bayesian Classifier by Discriminative Training


Discriminative Naïve Bayesian Classifier (DNB)

Complete optimization problem

A nonlinear optimization problem under linear constraints.
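The complete problem is likewise lost in the transcript; the linear constraints are presumably the usual normalization and nonnegativity conditions on the class-conditional probabilities, which would make the problem look roughly like this (my reconstruction, not the slide's exact statement):

\min_{P_1, P_2} \; f(P_1, P_2)
\quad \text{s.t.} \quad
\sum_{v} P_k(x_j = v \mid C_k) = 1, \;\;
P_k(x_j = v \mid C_k) \ge 0, \;\; k = 1, 2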

Page 12: Improve Naïve Bayesian Classifier by Discriminative Training


Discriminative Naïve Bayesian Classifier (DNB)

Solving the optimization problem
– Using the Rosen gradient projection method

Page 13: Improve Naïve Bayesian Classifier by Discriminative Training


Discriminative Naïve Bayesian Classifier (DNB)

Gradient and projection matrix
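The slide's matrices themselves are lost; as a generic reminder of Rosen's method in this setting, assume the probability parameters are stacked into a vector p with active linear equality constraints A p = b (the normalization conditions). The standard projection matrix P = I − Aᵀ(AAᵀ)⁻¹A maps the gradient onto the feasible surface:

```python
def rosen_projection_step(p, grad, A, eta=0.01):
    """One Rosen gradient-projection step: project the gradient onto
    the surface {p : A p = b}, then take a descent step that stays
    feasible with respect to the equality constraints (A must have
    full row rank for the inverse to exist)."""
    n = len(p)
    P = np.eye(n) - A.T @ np.linalg.inv(A @ A.T) @ A  # projection matrix
    return p - eta * (P @ grad)
```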

Page 14: Improve Naïve Bayesian Classifier by Discriminative Training


Extension to Multi-category Classification Problems

Page 15: Improve Naïve Bayesian Classifier by Discriminative Training


Experimental results

Experimental setup
– Datasets
» 4 benchmark datasets from the UCI machine learning repository
– Experimental environment
» Platform: Windows 2000
» Developing tool: Matlab 6.5

Page 16: Improve Naïve Bayesian Classifier by Discriminative Training


Without information missing

Observations
– DNB outperforms NB on every dataset.
– Compared with SVM, DNB wins on 2 datasets and loses on the other 2.
– SVM outperforms DNB on Segment and Satimage.

Page 17: Improve Naïve Bayesian Classifier by Discriminative Training


With information missing

Scheme
– DNB uses equation (5) to conduct inference when there is missing information:

C = \arg\max_{c_i} P(C_i) \prod_{x_j \in A-T} P(x_j \mid C_i) ..........(5)

– SVM sets the missing features to 0 (the default way to process unknown features in LIBSVM).
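In code, the two schemes compared here differ as follows; this reuses the hypothetical helpers sketched earlier, and svm_model stands for any fitted SVM (the zero-filling mirrors the LIBSVM default described above):

```python
# DNB-style inference: marginalize missing features out, per equation (5).
label_dnb = predict_nb_missing(x_partial, priors, cond)

# SVM-style handling: impute 0 for each missing feature, then classify.
x_zeroed = [0 if v is None else v for v in x_partial]
label_svm = svm_model.predict([x_zeroed])  # svm_model: assumed fitted SVM
```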

Page 18: Improve Naïve Bayesian Classifier by Discriminative Training


With information missing

Setup: randomly discard features, gradually from a small percentage to a large percentage.

[Figures] Error rate on Iris and on Vote with missing information.

Page 19: Improve Naïve Bayesian Classifier by Discriminative Training


With information missing

[Figures] Error rate on Satimage and on DNA with missing information.

Page 20: Improve Naïve Bayesian Classifier by Discriminative Training


Summary of experimental results

Observations
– NB demonstrates a robust ability in handling missing-information problems.
– DNB inherits NB's ability to handle missing-information problems while achieving higher classification accuracy than NB.
– SVM cannot deal with missing-information problems easily.

Page 21: Improve Naïve Bayesian Classifier by Discriminative Training


Discussion

Can DNB be extended to a general Bayesian Network (BN) classifier?
– The structure-learning problem will be involved: direct application of DNB will encounter difficulties since, unlike in restricted BNs, the structure is not fixed.
– Finding optimal general Bayesian Network classifiers is an NP-complete problem.

Discriminative training of a constrained Bayesian Network classifier is possible, however…

Page 22: Improve Naïve Bayesian Classifier by Discriminative Training


Conclusion

We develop a novel model named the Discriminative Naïve Bayesian Classifier.
– It outperforms the Naïve Bayesian Classifier when no information is missing.
– It outperforms SVMs in handling missing-information problems.