Gaussian Bayes Classifiers


Slide 1 (Sep 10th, 2001)
Copyright © 2001, Andrew W. Moore

Learning Gaussian Bayes Classifiers

Andrew W. Moore
Associate Professor
School of Computer Science
Carnegie Mellon University
www.cs.cmu.edu/~awm
[email protected]

412-268-7599

Note to other teachers and users of these slides. Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or the following link to the source repository of Andrew’s tutorials: http://www.cs.cmu.edu/~awm/tutorials . Comments and corrections gratefully received.

Slide 2

Maximum Likelihood learning of Gaussians for Classification

• Why we should care
• 3 seconds to teach you a new learning algorithm
• What if there are 10,000 dimensions?
• What if there are categorical inputs?
• Examples "out the wazoo"

Slide 3

Why we should care

• One of the original "Data Mining" algorithms
• Very simple and effective
• Demonstrates the usefulness of our earlier groundwork

Slide 4

Where we were at the end of the MLE lecture…

[Diagram: three tasks, each taking Inputs. Inputs → Classifier → predict category; Inputs → Density Estimator → probability; Inputs → Regressor → predict real no. The methods so far, grouped by the inputs they accept: categorical inputs only: Joint BC, Naïve BC, Joint DE, Naïve DE; mixed Real / Cat okay: Dec Tree; real-valued inputs only: Gauss DE.]

Slide 5

This lecture…

[The same diagram, with one new entry under real-valued inputs only: Gauss BC.]

Slide 6

Road Map

[Road-map diagram of the topics so far: Probability, PDFs, Gaussians, MLE, MLE of Gaussians, Density Estimation, Bayes Classifiers, Decision Trees.]

Slide 7

Road Map

[The same road-map diagram, now adding this lecture's topic: Gaussian Bayes Classifiers.]

Slide 8

Gaussian Bayes Classifier Assumption

• The i'th record in the database is created using the following algorithm:

1. Generate the output (the "class") by drawing y_i ~ Multinomial(p_1, p_2, …, p_Ny)

2. Generate the inputs from a Gaussian PDF that depends on the value of y_i: x_i ~ N(µ_i, Σ_i)

Test your understanding: given Ny classes and m input attributes, how many distinct scalar parameters need to be estimated?

Slide 9

MLE Gaussian Bayes Classifier

(Same generative assumption as the previous slide.) Let DB_i = subset of the database DB in which the output class is y = i. Then the MLE of the class prior is

p_i^mle = |DB_i| / |DB|

Slide 10

MLE Gaussian Bayes Classifier

(µ_i^mle, Σ_i^mle) = MLE Gaussian fitted to DB_i

Slide 11

MLE Gaussian Bayes Classifier

The MLE Gaussian fitted to DB_i (the subset of DB in which y = i):

µ_i^mle = (1/|DB_i|) Σ_{x_k ∈ DB_i} x_k

Σ_i^mle = (1/|DB_i|) Σ_{x_k ∈ DB_i} (x_k − µ_i^mle)(x_k − µ_i^mle)^T
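The MLE recipe above, count-based class priors plus one fitted Gaussian per class, is easy to state in code. Below is a minimal sketch (my own Python/NumPy, not from the slides; the function name is made up):

```python
import numpy as np

def fit_gaussian_bayes(X, y):
    """X: (n, m) real inputs; y: (n,) integer class labels."""
    params = {}
    n = len(y)
    for i in np.unique(y):
        Xi = X[y == i]                   # DB_i: records whose class is i
        p = len(Xi) / n                  # p_i^mle = |DB_i| / |DB|
        mu = Xi.mean(axis=0)             # mu_i^mle: mean of DB_i
        diff = Xi - mu
        Sigma = diff.T @ diff / len(Xi)  # Sigma_i^mle (MLE: divides by |DB_i|)
        params[int(i)] = (p, mu, Sigma)
    return params
```

Note the MLE covariance divides by |DB_i|, not |DB_i| − 1.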

Slide 12

Gaussian Bayes Classification

P(y = i | x) = p(x | y = i) P(y = i) / p(x)

Slide 13

Gaussian Bayes Classification

P(y = i | x) = p(x | y = i) P(y = i) / p(x)

  = [ 1 / ( (2π)^(m/2) ||Σ_i||^(1/2) ) ] exp( −(1/2) (x − µ_i)^T Σ_i^(−1) (x − µ_i) ) P(y = i) / p(x)

How do we deal with that?
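The question on the slide has a standard answer: p(x) is the same for every class, so it cancels when we take the argmax over classes. A sketch of the resulting decision rule (my own code, not from the slides; `params` follows the `{class: (prior, mean, covariance)}` layout assumed here):

```python
import numpy as np

def log_gaussian(x, mu, Sigma):
    """Log of the multivariate Gaussian density at x."""
    m = len(mu)
    diff = x - mu
    _, logdet = np.linalg.slogdet(Sigma)
    quad = diff @ np.linalg.solve(Sigma, diff)
    return -0.5 * (m * np.log(2 * np.pi) + logdet + quad)

def predict(x, params):
    """params: {class i: (p_i, mu_i, Sigma_i)}. The p(x) denominator is
    shared by all classes, so only the numerator is scored."""
    scores = {i: np.log(p) + log_gaussian(x, mu, S)
              for i, (p, mu, S) in params.items()}
    return max(scores, key=scores.get)
```

Working in log space avoids underflow when m is large.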

Slide 14

Here is a dataset

48,000 records, 16 attributes [Kohavi 1995]

age | employment | education | edunum | marital | … | job | relation | race | gender | hours_worked | country | wealth
39 | State_gov | Bachelors | 13 | Never_married | … | Adm_clerical | Not_in_family | White | Male | 40 | United_States | poor
51 | Self_emp_not_inc | Bachelors | 13 | Married | … | Exec_managerial | Husband | White | Male | 13 | United_States | poor
39 | Private | HS_grad | 9 | Divorced | … | Handlers_cleaners | Not_in_family | White | Male | 40 | United_States | poor
54 | Private | 11th | 7 | Married | … | Handlers_cleaners | Husband | Black | Male | 40 | United_States | poor
28 | Private | Bachelors | 13 | Married | … | Prof_specialty | Wife | Black | Female | 40 | Cuba | poor
38 | Private | Masters | 14 | Married | … | Exec_managerial | Wife | White | Female | 40 | United_States | poor
50 | Private | 9th | 5 | Married_spouse_absent | … | Other_service | Not_in_family | Black | Female | 16 | Jamaica | poor
52 | Self_emp_not_inc | HS_grad | 9 | Married | … | Exec_managerial | Husband | White | Male | 45 | United_States | rich
31 | Private | Masters | 14 | Never_married | … | Prof_specialty | Not_in_family | White | Female | 50 | United_States | rich
42 | Private | Bachelors | 13 | Married | … | Exec_managerial | Husband | White | Male | 40 | United_States | rich
37 | Private | Some_college | 10 | Married | … | Exec_managerial | Husband | Black | Male | 80 | United_States | rich
30 | State_gov | Bachelors | 13 | Married | … | Prof_specialty | Husband | Asian | Male | 40 | India | rich
24 | Private | Bachelors | 13 | Never_married | … | Adm_clerical | Own_child | White | Female | 30 | United_States | poor
33 | Private | Assoc_acdm | 12 | Never_married | … | Sales | Not_in_family | Black | Male | 50 | United_States | poor
41 | Private | Assoc_voc | 11 | Married | … | Craft_repair | Husband | Asian | Male | 40 | *MissingValue* | rich
34 | Private | 7th_8th | 4 | Married | … | Transport_moving | Husband | Amer_Indian | Male | 45 | Mexico | poor
26 | Self_emp_not_inc | HS_grad | 9 | Never_married | … | Farming_fishing | Own_child | White | Male | 35 | United_States | poor
33 | Private | HS_grad | 9 | Never_married | … | Machine_op_inspct | Unmarried | White | Male | 40 | United_States | poor
38 | Private | 11th | 7 | Married | … | Sales | Husband | White | Male | 50 | United_States | poor
44 | Self_emp_not_inc | Masters | 14 | Divorced | … | Exec_managerial | Unmarried | White | Female | 45 | United_States | rich
41 | Private | Doctorate | 16 | Married | … | Prof_specialty | Husband | White | Male | 60 | United_States | rich
⋮

Slide 15

Predicting wealth from age

Slide 16

Predicting wealth from age

Slide 17

Wealth from hours worked

Slide 18

Wealth from years of education

Slide 19

age, hours → wealth

Slide 20

age, hours → wealth

Slide 21

age, hours → wealth

Having 2 inputs instead of one helps in two ways:
1. Combining evidence from two 1-d Gaussians
2. Off-diagonal covariance distinguishes class "shape"

Slide 22

age, hours → wealth


Slide 23

age, edunum → wealth

Slide 24

age, edunum → wealth

Slide 25

hours, edunum → wealth

Slide 26

hours, edunum → wealth

Slide 27

Accuracy

Slide 28

An “MPG” example

Slide 29

An “MPG” example

Slide 30

An "MPG" example

Things to note:
• Class boundaries can be weird shapes (hyperconic sections)
• Class regions can be non-simply-connected
• But it's impossible to model arbitrarily weirdly shaped regions
• Test your understanding: with one input, must classes be simply connected?

Slide 31

Overfitting dangers

• Problem with "Joint" Bayes classifier: #parameters is exponential in #dimensions. This means we just memorize the training data, and can overfit.

Slide 32

Overfitting dangers

• Problem with "Joint" Bayes classifier: #parameters is exponential in #dimensions, so we just memorize the training data and can overfit.
• Problemette with Gaussian Bayes classifier: #parameters is quadratic in #dimensions. With 10,000 dimensions and only 1,000 datapoints we could overfit.

Question: Any suggested solutions?
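To make the "quadratic in #dimensions" point concrete: a full covariance matrix has m(m+1)/2 free entries per class. A small parameter-counting helper (my own sketch; the function name is made up, and the "aligned"/"spherical" options anticipate the restricted covariances on the following slides):

```python
def gbc_param_count(n_classes, m, cov="full"):
    """Scalar parameters in a Gaussian Bayes classifier with m inputs."""
    per_class_cov = {"full": m * (m + 1) // 2,  # symmetric matrix
                     "aligned": m,              # diagonal only
                     "spherical": 1}[cov]       # one shared variance
    # (n_classes - 1) free multinomial probs, plus a mean and covariance per class
    return (n_classes - 1) + n_classes * (m + per_class_cov)
```

With 2 classes and m = 10,000, the full model needs over 100 million parameters; far beyond what 1,000 datapoints can pin down.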

Slide 33

General: O(m²) parameters

Σ = [ σ_1²  σ_12  …  σ_1m ]
    [ σ_12  σ_2²  …  σ_2m ]
    [  ⋮     ⋮    ⋱   ⋮   ]
    [ σ_1m  σ_2m  …  σ_m² ]

Slide 34

General: O(m²) parameters (same matrix as the previous slide)

Slide 35

Aligned: O(m) parameters

Σ = [ σ_1²   0     0    …   0   ]
    [  0    σ_2²   0    …   0   ]
    [  0     0    σ_3²  …   0   ]
    [  ⋮     ⋮     ⋮    ⋱   ⋮   ]
    [  0     0     0    …  σ_m² ]

Slide 36

Aligned: O(m) parameters (same matrix as the previous slide)

Slide 37

Spherical: O(1) cov parameters

Σ = [ σ²  0   0   …  0  ]
    [ 0   σ²  0   …  0  ]
    [ 0   0   σ²  …  0  ]
    [ ⋮   ⋮   ⋮   ⋱  ⋮  ]
    [ 0   0   0   …  σ² ]

Slide 38

Spherical: O(1) cov parameters (same matrix as the previous slide)
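The three restrictions (full, aligned, spherical) differ only in which entries of the MLE covariance are kept. A sketch (my own code, not from the slides; the helper name is made up):

```python
import numpy as np

def restricted_cov(X, kind="full"):
    """MLE covariance of X under the three structures from the slides."""
    mu = X.mean(axis=0)
    d = X - mu
    full = d.T @ d / len(X)                # O(m^2) parameters
    if kind == "full":
        return full
    if kind == "aligned":                  # keep the diagonal: O(m) parameters
        return np.diag(np.diag(full))
    if kind == "spherical":                # sigma^2 * I: one parameter
        return np.mean(np.diag(full)) * np.eye(X.shape[1])
    raise ValueError(kind)
```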

Slide 39

BCs that have both real and categorical inputs?

[The task diagram again. Under mixed Real / Cat inputs, next to Dec Tree, there is a gap labeled "BC Here???"; the other methods are as before: Joint BC, Naïve BC, Joint DE, Naïve DE (categorical only); Gauss DE, Gauss BC (real-valued only).]

Slide 40

BCs that have both real and categorical inputs?

[Same diagram as the previous slide.]

Easy! Guess how?

Slide 41

BCs that have both real and categorical inputs?

[The diagram with the gap filled in. Under mixed Real / Cat inputs: Dec Tree, Gauss/Joint BC, Gauss Naïve BC, Gauss/Joint DE, Gauss Naïve DE.]


Slide 43

Mixed Categorical / Real Density Estimation

• Write x = (u, v) = (u_1, u_2, …, u_q, v_1, v_2, …, v_{m−q}), where u is real-valued and v is categorical.

P(x | M) = P(u, v | M)

(where M is any Density Estimation Model)

Slide 44

Not sure which tasty DE to enjoy? Try our…

Joint / Gauss DE Combo

P(u, v | M) = P(u | v, M) P(v | M)

where P(u | v, M) is a Gaussian with parameters depending on v, and P(v | M) is a big (m−q)-dimensional lookup table.

Slide 45

MLE learning of the Joint / Gauss DE Combo

P(u, v | M) = P(u | v, M) P(v | M), with u | v, M ~ N(µ_v, Σ_v) and P(v | M) = q_v, where

q_v = fraction of records that match v
µ_v = mean of u among records matching v
Σ_v = covariance of u among records matching v

Slide 46

MLE learning of the Joint / Gauss DE Combo

u | v, M ~ N(µ_v, Σ_v), P(v | M) = q_v. Let R_v = # records that match v, and R = total # records. Then

q_v = R_v / R

µ_v = (1/R_v) Σ_{k s.t. v_k = v} u_k

Σ_v = (1/R_v) Σ_{k s.t. v_k = v} (u_k − µ_v)(u_k − µ_v)^T
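The MLE recipe above can be sketched as a group-by over the categorical part v (my own code, not from the slides; names are made up):

```python
import numpy as np
from collections import defaultdict

def fit_joint_gauss_de(U, V):
    """U: (n, q) real parts; V: length-n list of hashable categorical parts.
    Returns {v: (q_v, mu_v, Sigma_v)}."""
    groups = defaultdict(list)
    for u, v in zip(U, V):
        groups[v].append(u)
    model = {}
    n = len(V)
    for v, us in groups.items():
        us = np.array(us)
        mu = us.mean(axis=0)              # mu_v
        d = us - mu
        model[v] = (len(us) / n,          # q_v = R_v / R
                    mu,
                    d.T @ d / len(us))    # Sigma_v
    return model
```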

Slide 47

Gender and Hours Worked*

*As with all the results from the UCI “adult census” dataset, we can’t draw any real-world conclusions since it’s such a non-real-world sample

Slide 48

Joint / Gauss DE Combo

What we just did

Slide 49

Joint / Gauss BC Combo

What we do next

Slide 50

Joint / Gauss BC Combo

P(Y = i | u, v) = p(u, v | Y = i) P(Y = i) / p(u, v)
               = p(u, v | M_i) P(Y = i) / p(u, v)
               = N(u; µ_{i,v}, Σ_{i,v}) q_{i,v} p_i / p(u, v)

Slide 51

Joint / Gauss BC Combo

P(Y = i | u, v) = p(u, v | Y = i) P(Y = i) / p(u, v)
               = p(u, v | M_i) P(Y = i) / p(u, v)
               = N(u; µ_{i,v}, Σ_{i,v}) q_{i,v} p_i / p(u, v)

(Rather so-so notation: N(u; µ_{i,v}, Σ_{i,v}) means "Gaussian with mean µ_{i,v} and covariance Σ_{i,v}, evaluated at u".)

p_i = fraction of records that match "y = i"
q_{i,v} = fraction of "y = i" records that match v
µ_{i,v} = mean of u among records matching v and in which y = i
Σ_{i,v} = covariance of u among records matching v and in which y = i

Slide 52

Gender, Hours → Wealth

Slide 53

Gender, Hours → Wealth

Slide 54

Joint / Gauss DE Combo and Joint / Gauss BC Combo: the downside

• (Yawn… we've done this before…) More than a few categorical attributes blah blah blah massive table blah blah lots of parameters blah blah just memorize training data blah blah blah do worse on future data blah blah need to be more conservative blah

Slide 55

Naïve/Gauss combo for Density Estimation

p(u, v | M) = [ Π_{j=1..q} p(u_j | M) ] [ Π_{j=1..m−q} P(v_j | M) ]

where, for the real attributes, u_j | M ~ N(µ_j, σ_j²), and for the categorical attributes, v_j | M ~ Multinomial[q_j1, q_j2, …, q_jN_j].

How many parameters?

Slide 56

Naïve/Gauss combo for Density Estimation

p(u, v | M) = [ Π_{j=1..q} p(u_j | M) ] [ Π_{j=1..m−q} P(v_j | M) ]

u_j | M ~ N(µ_j, σ_j²), v_j | M ~ Multinomial[q_j1, q_j2, …, q_jN_j]

MLEs (R = # records):

µ_j = (1/R) Σ_k u_kj

σ_j² = (1/R) Σ_k (u_kj − µ_j)²

q_jh = (# records in which v_j = h) / R
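A sketch of these per-attribute MLEs (my own code, not from the slides; names are made up):

```python
import numpy as np
from collections import Counter

def fit_naive_gauss_de(U, V):
    """U: (n, q) real columns; V: length-n rows of categorical values.
    Returns ([(mu_j, sigma_j^2), ...], [{value h: q_j[h]}, ...])."""
    R = len(U)
    # One independent 1-d Gaussian per real attribute
    gauss = [(col.mean(), ((col - col.mean()) ** 2).mean())
             for col in np.asarray(U, float).T]
    # One independent multinomial (value frequencies) per categorical attribute
    multi = [{h: c / R for h, c in Counter(col).items()}
             for col in zip(*V)]
    return gauss, multi
```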

Slide 57

Naïve/Gauss DE Example

Slide 58

Naïve/Gauss DE Example

Slide 59

Naïve / Gauss BC

P(Y = i | u, v) = p(u, v | Y = i) P(Y = i) / p(u, v)

with

p(u, v | Y = i) P(Y = i) = [ Π_{j=1..q} N(u_j; µ_ij, σ_ij²) ] [ Π_{j=1..m−q} q_ij[v_j] ] p_i

where
p_i = fraction of records that match "y = i"
µ_ij = mean of u_j among records in which y = i
σ_ij² = variance of u_j among records in which y = i
q_ij[h] = fraction of "y = i" records in which v_j = h
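Putting the pieces together, the classification rule above can be sketched as a log-space argmax (my own code, not from the slides; p(u, v) is dropped because it is shared across classes, and the parameter layout is an assumption of this sketch):

```python
import math

def naive_gauss_bc_predict(u, v, classes):
    """classes: {i: (p_i, [(mu_ij, var_ij), ...], [{value: q_ij[value]}, ...])}."""
    def score(p, gauss, multi):
        s = math.log(p)                                    # log p_i
        for uj, (mu, var) in zip(u, gauss):                # log N(u_j; mu_ij, var_ij)
            s += -0.5 * (math.log(2 * math.pi * var) + (uj - mu) ** 2 / var)
        for vj, q in zip(v, multi):                        # log q_ij[v_j]
            s += math.log(q[vj])
        return s
    return max(classes, key=lambda i: score(*classes[i]))
```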

Slide 60

Gauss / Naïve BC Example

Slide 61

Gauss / Naïve BC Example

Slide 62

Learn Wealth from 15 attributes

Slide 63

Learn Wealth from 15 attributes

Same data, except all real values discretized to 3 levels

Slide 64

Learn Race from 15 attributes

Slide 65

What you should know

• A lot of this should have just been a corollary of what you already knew
• Turning Gaussian DEs into Gaussian BCs
• Mixing categorical and real-valued attributes

Slide 66

Questions to Ponder

• Suppose you wanted to create an example dataset where a BC involving Gaussians crushed decision trees like a bug. What would you do?
• Could you combine Decision Trees and Bayes Classifiers? How? (Maybe there is more than one possible way.)