TRANSCRIPT
UVA CS 4501: Machine Learning
Lecture 17: Generative Bayes Classifiers
Dr. Yanjun Qi
University of Virginia
Department of Computer Science
Where are we? → Major sections of this course
• Regression (supervised)
• Classification (supervised)
• Unsupervised models
• Learning theory
Where are we? → Three major sections for classification
• We can divide the large variety of classification approaches into roughly three major types:
1. Discriminative: directly estimate a decision rule/boundary (e.g., SVM, NN, decision trees)
2. Generative: build a generative statistical model (e.g., naïve Bayes classifier, Bayesian networks)
3. Instance-based classifiers: use the observations directly, with no model (e.g., k-nearest neighbors)
Today Recap: Generative Bayes Classifiers
• Bayes Classifier
  – Generative Bayes Classifier
• Naïve Bayes Classifier
• Gaussian Bayes Classifiers
  – Gaussian distribution
  – Naïve Gaussian BC
  – Not-naïve Gaussian BC → LDA, QDA
Review: Bayes classifiers
• Treat each feature attribute and the class label as random variables.
• Given a sample x with attributes (x1, x2, …, xp):
  – Goal is to predict its class c.
  – Specifically, we want to find the class that maximizes p(c | x1, x2, …, xp).
• Can we estimate p(Ci | x) = p(Ci | x1, x2, …, xp) directly from data?

MAP Rule: c_MAP = argmax_{cj ∈ C} P(cj | x1, x2, …, xp)
Bayes classifiers → MAP classification rule
• Establishing a probabilistic model for classification:
  – (1) Discriminative
  – (2) Generative
Review: Establishing a probabilistic model for classification
– (1) Discriminative model: estimate P(c | x) directly.
  A discriminative probabilistic classifier maps the input x = (x1, x2, ⋅⋅⋅, xp) to the posteriors P(c1 | x), P(c2 | x), …, P(cL | x), and predicts
  argmax_{c ∈ C} P(c | X), where C = {c1, ⋅⋅⋅, cL}.
(Adapted from Prof. Ke Chen's NB slides.)
– (2) Generative model: estimate P(X | C) for each class, with C = c1, ⋅⋅⋅, cL and X = (X1, ⋅⋅⋅, Xp).
  One generative probabilistic model per class, each over the full input x = (x1, x2, ⋅⋅⋅, xp): the model for class 1 gives P(x | c1), the model for class 2 gives P(x | c2), …, and the model for class L gives P(x | cL).
(Adapted from Prof. Ke Chen's NB slides.)
Review of probability: if a quantity is hard to estimate directly from data, most likely we can estimate:
• 1. Joint probability – use the chain rule
• 2. Marginal probability – use the law of total probability
• 3. Conditional probability – use Bayes' rule
Review: Bayes' Rule – for Generative Bayes Classifiers

P(C | X) = P(X | C) P(C) / P(X)

Prior: P(C1), P(C2), …, P(CL)
Posterior: P(C1 | x), P(C2 | x), …, P(CL | x)

For each class: P(Ci | X) = P(X | Ci) P(Ci) / P(X)
Establishing a probabilistic model for classification through generative modeling:
Build one generative probabilistic model P(x | ci) per class over the input x = (x1, x2, ⋅⋅⋅, xp), then classify with

argmax_{Ci} P(Ci | X) = argmax_{Ci} P(X, Ci) = argmax_{Ci} P(X | Ci) P(Ci)

(Adapted from Prof. Ke Chen's NB slides.)
Summary:
– Apply Bayes' rule to get the posterior probabilities:
  P(C = ci | X = x) = P(X = x | C = ci) P(C = ci) / P(X = x)
                    ∝ P(X = x | C = ci) P(C = ci),  for i = 1, 2, ⋅⋅⋅, L
– Then apply the MAP rule.
(Adapted from Prof. Ke Chen's NB slides.)
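To make the two steps concrete, here is a minimal Python sketch assuming made-up likelihoods and priors (the numbers are illustrative, not from the lecture); since P(X = x) is identical for every class, it drops out of the argmax.

```python
# Hypothetical two-class MAP decision: pick the class maximizing
# P(X = x | C = ci) * P(C = ci); the evidence P(X = x) cancels in the argmax.
likelihood = {"c1": 0.30, "c2": 0.10}   # assumed P(X = x | C = ci)
prior      = {"c1": 0.20, "c2": 0.80}   # assumed P(C = ci)

scores = {c: likelihood[c] * prior[c] for c in prior}
c_map = max(scores, key=scores.get)
print(scores, c_map)   # {'c1': 0.06, 'c2': 0.08} -> c2 wins despite its lower likelihood
```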
A Dataset for classification
• Data / points / instances / examples / samples / records: [rows]
• Features / attributes / dimensions / independent variables / covariates / predictors / regressors: [columns, except the last]
• Target / outcome / response / label / dependent variable: special column to be predicted [last column]

Output as discrete class label: C = C1, C2, …, CL

Discriminative: argmax_{c ∈ C} P(c | X), C = {c1, ⋅⋅⋅, cL}
Generative: argmax_{c ∈ C} P(c | X) = argmax_{c ∈ C} P(X, c) = argmax_{c ∈ C} P(X | c) P(c)  ← this lecture!
An Example
• Example: Play Tennis
(Play-Tennis training data table shown on the slide.)
• Maximum likelihood estimates: simply use the frequencies in the data, estimated directly from the data. (Check the L12 MLE lecture for why.)
Generative Bayes Classifier:
• Learning Phase – estimate the priors P(C1), P(C2), …, P(CL) and the full class-conditional joint P(X1, X2, …, Xp | C1), P(X1, X2, …, Xp | C2):

Outlook (3 values) | Temperature (3 values) | Humidity (2 values) | Wind (2 values) | Play=Yes | Play=No
sunny | hot | high | weak | 0/9 | 1/5
sunny | hot | high | strong | …/9 | …/5
sunny | hot | normal | weak | …/9 | …/5
sunny | hot | normal | strong | …/9 | …/5
… | … | … | … | … | …

3*3*2*2 [conjunctions of attributes] * 2 [two classes] = 72 parameters
P(Play=Yes) = 9/14, P(Play=No) = 5/14
Generative Bayes Classifier:
• Test Phase
– Given an unknown instance X'ts = (a'1, ⋅⋅⋅, a'p), e.g., x' = (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong)
– Look up the tables to assign the label c* to X'ts if
  P̂(a'1, ⋅⋅⋅, a'p | c*) P̂(c*) > P̂(a'1, ⋅⋅⋅, a'p | c) P̂(c),  c ≠ c*, c = c1, ⋅⋅⋅, cL
Naïve Bayes Classifier
• Bayes classification: argmax_{cj ∈ C} P(x1, x2, …, xp | cj) P(cj)
  Difficulty: learning the joint probability P(x1, x2, …, xp | cj).
• Naïve Bayes classification – assumption: all input attributes are conditionally independent given the class!
Naïve Bayes Classifier
• Naïve Bayes classification – assumption: all input attributes are conditionally independent given the class!

P(X1, X2, ⋅⋅⋅, Xp | C) = P(X1 | X2, ⋅⋅⋅, Xp, C) P(X2, ⋅⋅⋅, Xp | C)
                       = P(X1 | C) P(X2, ⋅⋅⋅, Xp | C)
                       = P(X1 | C) P(X2 | C) ⋅⋅⋅ P(Xp | C)
Naïve Bayes Classifier
• With P(X1, X2, ⋅⋅⋅, Xp | C) = P(X1 | C) P(X2 | C) ⋅⋅⋅ P(Xp | C), the MAP classification rule for a sample x = (x1, x2, ⋅⋅⋅, xp) assigns the label c* if

[P(x1 | c*) ⋅⋅⋅ P(xp | c*)] P(c*) > [P(x1 | c) ⋅⋅⋅ P(xp | c)] P(c),  c ≠ c*, c = c1, ⋅⋅⋅, cL
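The factorization replaces one huge joint table with p small per-attribute factors. A minimal sketch of the resulting score, assuming hypothetical table names (`cond_prob`, `prior`) just for illustration:

```python
# Naïve Bayes score for the MAP rule: [P(x1|c) ... P(xp|c)] * P(c).
from math import prod

def nb_score(x, c, cond_prob, prior):
    # cond_prob[j] maps (attribute value, class) -> P̂(Xj = value | C = c)
    return prod(cond_prob[j][(v, c)] for j, v in enumerate(x)) * prior[c]

# Usage sketch: label = max(classes, key=lambda c: nb_score(x, c, cond_prob, prior))
```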
Naïve Bayes Classifier (for discrete input attributes) – training
• Naïve Bayes algorithm (for discrete input attributes)
– Learning Phase: given a training set S,
  For each target value ci (ci = c1, ⋅⋅⋅, cL):
    P̂(C = ci) ← estimate P(C = ci) with examples in S;
    For every attribute value xjk of each attribute Xj (j = 1, ⋅⋅⋅, p; k = 1, ⋅⋅⋅, Kj):
      P̂(Xj = xjk | C = ci) ← estimate P(Xj = xjk | C = ci) with examples in S;
  Output: conditional probability tables; for each Xj, a table of Kj × L elements.
Naïve Bayes (for discrete input attributes) – testing
• Naïve Bayes algorithm (for discrete input attributes)
– Test Phase: given an unknown instance X' = (a'1, ⋅⋅⋅, a'p),
  look up the tables to assign the label c* to X' if
  [P̂(a'1 | c*) ⋅⋅⋅ P̂(a'p | c*)] P̂(c*) > [P̂(a'1 | c) ⋅⋅⋅ P̂(a'p | c)] P̂(c),  c ≠ c*, c = c1, ⋅⋅⋅, cL
An Example
• Example: Play Tennis
Learning (training) the NBC Model
(Graphical model: class node C with arrows to attribute nodes X1, X2, X3, X4, X5, X6.)
• Maximum likelihood estimates: simply use the frequencies in the data:

P̂(xi | cj) = N(Xi = xi, C = cj) / N(C = cj)
P̂(cj) = N(C = cj) / N
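A minimal sketch of this counting, over a list of (attribute tuple, label) training examples; the function and variable names are illustrative assumptions, not the lecture's reference code:

```python
# MLE by counting: P̂(xi | cj) = N(Xi = xi, C = cj) / N(C = cj) and P̂(cj) = N(C = cj) / N.
from collections import Counter, defaultdict

def train_nbc(samples, labels):
    class_count = Counter(labels)        # N(C = cj)
    pair_count = defaultdict(Counter)    # pair_count[j][(value, cj)] = N(Xj = value, C = cj)
    for x, c in zip(samples, labels):
        for j, v in enumerate(x):
            pair_count[j][(v, c)] += 1
    prior = {c: n / len(labels) for c, n in class_count.items()}
    cond = {j: {(v, c): n / class_count[c] for (v, c), n in pc.items()}
            for j, pc in pair_count.items()}
    return prior, cond
```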
Learning Phase – one conditional-probability table per attribute, to be filled in from the training data:

Outlook: Sunny / Overcast / Rain  vs.  Play=Yes / Play=No
Temperature: Hot / Mild / Cool  vs.  Play=Yes / Play=No
Humidity: High / Normal  vs.  Play=Yes / Play=No
Wind: Strong / Weak  vs.  Play=Yes / Play=No

Priors: P(Play=Yes) = ??, P(Play=No) = ??  — i.e., P(C1), P(C2), …, P(CL)
Conditional tables, e.g., P(X2 | C1), P(X2 | C2) and P(X4 | C1), P(X4 | C2)
Estimate each P(Xj = xjk | C = ci) with examples in training.
• Learning Phase (estimated from the 14 training examples):

Outlook | Play=Yes | Play=No
Sunny | 2/9 | 3/5
Overcast | 4/9 | 0/5
Rain | 3/9 | 2/5

Temperature | Play=Yes | Play=No
Hot | 2/9 | 2/5
Mild | 4/9 | 2/5
Cool | 3/9 | 1/5

Humidity | Play=Yes | Play=No
High | 3/9 | 4/5
Normal | 6/9 | 1/5

Wind | Play=Yes | Play=No
Strong | 3/9 | 3/5
Weak | 6/9 | 2/5

P(Play=Yes) = 9/14, P(Play=No) = 5/14
(3+3+2+2) [naïve assumption] * 2 [two classes] = 20 parameters
Testing the NBC Model
• Test Phase
– Given a new instance x' = (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong)
– Look up the conditional-probability tables:
  P(Outlook=Sunny | Play=Yes) = 2/9      P(Outlook=Sunny | Play=No) = 3/5
  P(Temperature=Cool | Play=Yes) = 3/9   P(Temperature=Cool | Play=No) = 1/5
  P(Humidity=High | Play=Yes) = 3/9      P(Humidity=High | Play=No) = 4/5
  P(Wind=Strong | Play=Yes) = 3/9        P(Wind=Strong | Play=No) = 3/5
  P(Play=Yes) = 9/14                     P(Play=No) = 5/14
– MAP rule: [P̂(a'1 | c*) ⋅⋅⋅ P̂(a'p | c*)] P̂(c*) > [P̂(a'1 | c) ⋅⋅⋅ P̂(a'p | c)] P̂(c)
  P(Yes | x') ∝ [P(Sunny|Yes) P(Cool|Yes) P(High|Yes) P(Strong|Yes)] P(Play=Yes) = 0.0053
  P(No | x') ∝ [P(Sunny|No) P(Cool|No) P(High|No) P(Strong|No)] P(Play=No) = 0.0206
Given the fact that P(Yes | x') < P(No | x'), we label x' to be "No".
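A quick sketch that reproduces the two scores from the tables above:

```python
# Play-Tennis test computation for
# x' = (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong).
score_yes = (2/9) * (3/9) * (3/9) * (3/9) * (9/14)   # [Sunny,Cool,High,Strong | Yes] * prior
score_no  = (3/5) * (1/5) * (4/5) * (3/5) * (5/14)   # [Sunny,Cool,High,Strong | No] * prior

print(round(score_yes, 4), round(score_no, 4))       # 0.0053 0.0206
print("Yes" if score_yes > score_no else "No")       # "No" -- the MAP label
```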
WHY? The Naïve Bayes Assumption
• P(cj)
  – Can be estimated from the frequency of classes in the training examples.
• Not naïve: P(x1, x2, …, xp | cj)
  – O(|X1|·|X2|·|X3|·…·|Xp|·|C|) parameters.
  – Could only be estimated if a very, very large number of training examples was available.
• Naïve: P(xk | cj)
  – O([|X1|+|X2|+|X3|+…+|Xp|]·|C|) parameters.
  – Naïve Bayes conditional independence assumption: assume that the probability of observing the conjunction of attributes is equal to the product of the individual probabilities P(xi | cj).
(Adapted from Manning's text categorization tutorial.)
Challenges during Learning (training) the NBC Model
(Graphical model: class node C=Flu with arrows to X1, …, X6=Muscle-ache.)
For instance:
• What if we have seen no training cases where the patient had no flu but did have muscle aches?

  P̂(X6 = t | C = not_flu) = N(X6 = t, C = not_flu) / N(C = not_flu) = 0

• Zero probabilities cannot be conditioned away, no matter the other evidence: in

  argmax_c P̂(c) ∏_i P̂(xi | c)

  a single zero factor forces the whole product to zero.
Smoothing to Avoid Overfitting

P̂(xi | cj) = (N(Xi = xi, C = cj) + 1) / (N(C = cj) + ki)

where ki is the number of values of feature Xi; adding ki in the denominator makes Σ_i P̂(xi | cj) = 1.
(Adapted from Manning's text categorization tutorial.)

• A somewhat more subtle version:

P̂(xi,k | cj) = (N(Xi = xi,k, C = cj) + m·pi,k) / (N(C = cj) + m)

where pi,k is the overall fraction in the data where Xi = xi,k, and m is the extent of "smoothing".
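A sketch of both smoothed estimators, where `n_xc` and `n_c` stand for the counts N(Xi = xi, C = cj) and N(C = cj):

```python
# Add-one (Laplace) smoothing: never outputs a zero probability.
def laplace(n_xc, n_c, k_i):
    # k_i = number of distinct values of feature Xi
    return (n_xc + 1) / (n_c + k_i)

# m-estimate: shrink toward the overall fraction p_ik with smoothing strength m.
def m_estimate(n_xc, n_c, p_ik, m):
    return (n_xc + m * p_ik) / (n_c + m)

print(laplace(0, 9, 2))          # ~0.0909 instead of 0 for an unseen (value, class) pair
print(m_estimate(0, 9, 0.3, 1))  # 0.03 with mild smoothing
```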
Summary: Generative Bayes Classifier
Task: classify a new instance X = (X1, X2, …, Xp), a tuple of attribute values, into one of the classes:

c_MAP = argmax_{cj ∈ C} P(cj | x1, x2, …, xp)
      = argmax_{cj ∈ C} P(x1, x2, …, xp | cj) P(cj) / P(x1, x2, …, xp)
      = argmax_{cj ∈ C} P(x1, x2, …, xp | cj) P(cj)

MAP = Maximum A Posteriori
(Adapted from Carlos's probability tutorial.)
Today: Generative Bayes Classifiers
• Bayes Classifier
  – Generative Bayes Classifier
• Naïve Bayes Classifier
• Gaussian Bayes Classifiers
  – Gaussian distribution
  – Naïve Gaussian BC
  – Not-naïve Gaussian BC → LDA, QDA
Gaussian Distribution

X ~ N(µ, σ²)

(Figure: Gaussian densities with mean and covariance matrix annotated. Courtesy: http://research.microsoft.com/~cmbishop/PRML/index.htm)
Example: the Bivariate Normal distribution

p(x) = f(x1, x2) = (1 / (2π |Σ|^(1/2))) exp( −(1/2) (x − µ)ᵀ Σ⁻¹ (x − µ) )

with

µ = [µ1, µ2]ᵀ

Σ = [ σ1²   σ12 ]  =  [ σ1²     ρσ1σ2 ]
    [ σ12   σ2² ]     [ ρσ1σ2   σ2²   ]

and

|Σ| = σ1²σ2² − σ12² = σ1²σ2²(1 − ρ²)
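A sketch that evaluates this density with NumPy; the mean, correlation, and standard deviations below are assumed toy values:

```python
import numpy as np

mu = np.array([0.0, 0.0])        # assumed mean vector
rho, s1, s2 = 0.5, 1.0, 2.0      # assumed correlation and standard deviations
Sigma = np.array([[s1**2,      rho*s1*s2],
                  [rho*s1*s2,  s2**2   ]])

def bivariate_normal_pdf(x, mu, Sigma):
    # 1 / (2*pi*|Sigma|^(1/2)) * exp(-0.5 * (x-mu)^T Sigma^{-1} (x-mu))
    d = x - mu
    quad = d @ np.linalg.inv(Sigma) @ d
    return np.exp(-0.5 * quad) / (2 * np.pi * np.sqrt(np.linalg.det(Sigma)))

print(bivariate_normal_pdf(np.array([0.0, 0.0]), mu, Sigma))  # density at the mean
```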
(Figure: scatter plots of data drawn from the bivariate Normal distribution.)
How to Estimate a Gaussian: MLE
• In the 1-D Gaussian case, we simply set the mean and the variance to the sample mean and the sample variance:

µ̂ = (1/n) Σ_{i=1..n} xi
σ̂² = (1/n) Σ_{i=1..n} (xi − µ̂)²
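A sketch of these two estimates on an assumed toy sample:

```python
# 1-D Gaussian MLE: sample mean and (biased, 1/n) sample variance.
import numpy as np

x = np.array([4.8, 5.1, 5.3, 4.9, 5.4])   # assumed toy data
mu_hat = x.mean()                          # (1/n) * sum_i x_i
var_hat = ((x - mu_hat) ** 2).mean()       # (1/n) * sum_i (x_i - mu_hat)^2
print(mu_hat, var_hat)
```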
The p-multivariate Normal distribution: (X1, X2, …, Xp) ~ N(µ, Σ)
Gaussian Naïve Bayes Classifier

argmax_C P(C | X) = argmax_C P(X, C) = argmax_C P(X | C) P(C)

Naïve Bayes factorization:
P(X | C) = P(X1, X2, ⋅⋅⋅, Xp | C)
         = P(X1 | X2, ⋅⋅⋅, Xp, C) P(X2, ⋅⋅⋅, Xp | C)
         = P(X1 | C) P(X2, ⋅⋅⋅, Xp | C)
         = P(X1 | C) P(X2 | C) ⋅⋅⋅ P(Xp | C)

with each factor modeled as a Gaussian:

P̂(Xj | C = ci) = (1 / (√(2π) σji)) exp( −(Xj − µji)² / (2σji²) )

µji: mean (average) of attribute values Xj of examples for which C = ci
σji: standard deviation of attribute values Xj of examples for which C = ci
Gaussian Naïve Bayes Classifier
• Continuous-valued input attributes
– Conditional probability modeled with the normal distribution:

  P̂(Xj | C = ci) = (1 / (√(2π) σji)) exp( −(Xj − µji)² / (2σji²) )

  µji: mean (average) of attribute values Xj of examples for which C = ci
  σji: standard deviation of attribute values Xj of examples for which C = ci

– Learning Phase: for X = (X1, ⋅⋅⋅, Xp), C = c1, ⋅⋅⋅, cL
  Output: p × L normal distributions and P(C = ci), i = 1, ⋅⋅⋅, L
– Test Phase: for X' = (X'1, ⋅⋅⋅, X'p)
  • Calculate conditional probabilities with all the normal distributions
  • Apply the MAP rule to make a decision
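Putting the learning and test phases together, a minimal NumPy sketch of a Gaussian naïve Bayes classifier (log probabilities are used to avoid underflow; this is an illustrative sketch, not the lecture's reference code):

```python
import numpy as np

def fit_gnb(X, y):
    # One (mu_ji, sigma_ji) per attribute j and class i, plus the class prior.
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (Xc.mean(axis=0), Xc.std(axis=0), len(Xc) / len(y))
    return params

def predict_gnb(x, params):
    def log_score(c):
        mu, sigma, prior = params[c]
        # sum_j log N(x_j; mu_j, sigma_j^2) + log P(C = c)
        log_lik = -0.5 * np.sum(np.log(2 * np.pi * sigma**2) + (x - mu)**2 / sigma**2)
        return log_lik + np.log(prior)
    return max(params, key=log_score)
```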
Naïve Gaussian means?

P(X1, X2, ⋅⋅⋅, Xp | C = cj) = P(X1 | C) P(X2 | C) ⋅⋅⋅ P(Xp | C)
                            = ∏_j (1 / (√(2π) σji)) exp( −(Xj − µji)² / (2σji²) )

i.e., each class's covariance matrix is diagonal: Σ_ck = Λ_ck.

Not-naïve Gaussian means?

P(X1, X2, ⋅⋅⋅, Xp | C) is a full multivariate Gaussian whose per-class covariance matrix need not be diagonal.
Today: Generative Bayes Classifiers
• Gaussian Bayes Classifiers
  – Gaussian distribution
  – Naïve Gaussian BC
  – Not-naïve Gaussian BC
  – LDA: Linear Discriminant Analysis
  – QDA: Quadratic Discriminant Analysis
(1) If the covariance matrices are the same across classes → LDA (Linear Discriminant Analysis)
Each class's covariance matrix is the same (class k vs. class l).
(Figure: visualization with three classes.)
argmax_k P(Ck | X) = argmax_k P(X, Ck) = argmax_k P(X | Ck) P(Ck) = argmax_k log{ P(X | Ck) P(Ck) }
log[ P(Ck | X) / P(Cl | X) ] = log[ P(X | Ck) / P(X | Cl) ] + log[ P(Ck) / P(Cl) ]

The above is derived from Bayes' rule: P(Ck | X) = P(X | Ck) P(Ck) / P(X) and P(Cl | X) = P(X | Cl) P(Cl) / P(X), so P(X) cancels in the ratio of posteriors.
LDA Classification Rule (also called the linear discriminant function) – Note:

argmax_k P(Ck | X) = argmax_k P(X, Ck) = argmax_k P(X | Ck) P(Ck)
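With a shared covariance matrix Σ, the rule above reduces to the linear discriminant δk(x) = xᵀΣ⁻¹µk − ½ µkᵀΣ⁻¹µk + log πk. A minimal sketch, assuming the class means, shared covariance, and priors have already been estimated:

```python
import numpy as np

def lda_predict(x, mus, Sigma, priors):
    # delta_k(x) = x^T Sigma^{-1} mu_k - 0.5 * mu_k^T Sigma^{-1} mu_k + log(pi_k)
    Sinv = np.linalg.inv(Sigma)
    scores = {k: x @ Sinv @ mu - 0.5 * mu @ Sinv @ mu + np.log(priors[k])
              for k, mu in mus.items()}
    return max(scores, key=scores.get)
```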
(2) If the covariance matrices are not the same across classes → QDA (Quadratic Discriminant Analysis)
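With class-specific covariances Σk, the discriminant keeps a log-determinant term and a class-specific quadratic form, δk(x) = −½ log|Σk| − ½ (x − µk)ᵀΣk⁻¹(x − µk) + log πk, which is quadratic in x. A minimal sketch:

```python
import numpy as np

def qda_score(x, mu_k, Sigma_k, prior_k):
    # delta_k(x) = -0.5*log|Sigma_k| - 0.5*(x-mu_k)^T Sigma_k^{-1} (x-mu_k) + log(pi_k)
    d = x - mu_k
    return (-0.5 * np.log(np.linalg.det(Sigma_k))
            - 0.5 * d @ np.linalg.inv(Sigma_k) @ d
            + np.log(prior_k))
```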
QDA vs. LDA on an Expanded Basis
(Figure: LDA with quadratic basis functions versus QDA.)
(3) Regularized Discriminant Analysis
An example: Gaussian Bayes Classifier
(Figure: Gaussian Bayes decision boundary separating the Red Team from the Blue Team.)
Naïve Gaussian Bayes Classifier is not a linear classifier!
Generative Bayes Classifier

Task: classification
Representation: probabilistic models p(X | C), i.e., P(X1, ⋅⋅⋅, Xp | C)
Score Function: EPE with 0-1 loss → MAP rule
  argmax_k P(Ck | X) = argmax_k P(X, Ck) = argmax_k P(X | Ck) P(Ck)
Search/Optimization: many options
Models, Parameters: the probabilistic models' parameters, e.g.:

  Gaussian naïve: P̂(Xj | C = ck) = (1 / (√(2π) σjk)) exp( −(Xj − µjk)² / (2σjk²) )
  Multinomial: P(W1 = n1, …, Wv = nv | ck) = (N! / (n1k! n2k! ⋅⋅⋅ nvk!)) θ1k^n1k θ2k^n2k ⋅⋅⋅ θvk^nvk
  Bernoulli naïve: p(Wi = true | ck) = pi,k
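For the multinomial model in the table, the coefficient N! / (n1k! ⋅⋅⋅ nvk!) is the same for every class, so the MAP comparison only needs Σi ni log θik plus the log prior. A minimal sketch:

```python
from math import log

def multinomial_log_score(counts, theta_k, prior_k):
    # log P(ck) + sum_i n_i * log(theta_ik); the multinomial coefficient cancels in the argmax.
    return log(prior_k) + sum(n * log(t) for n, t in zip(counts, theta_k))

# Usage sketch: label = max(classes, key=lambda k: multinomial_log_score(counts, theta[k], prior[k]))
```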
References
• Prof. Andrew Moore's review tutorial
• Prof. Ke Chen's NB slides
• Prof. Carlos Guestrin's recitation slides