TRANSCRIPT
UVA CS 4501: Machine Learning
Lecture 17: Generative Bayes Classifiers
Dr. Yanjun Qi
University of Virginia
Department of Computer Science
Where are we? → Major sections of this course
• Regression (supervised)
• Classification (supervised)
• Unsupervised models
• Learning theory
Where are we? → Three major sections for classification
• We can divide the large variety of classification approaches into roughly three major types:
1. Discriminative: directly estimate a decision rule/boundary (e.g., SVM, NN, decision trees)
2. Generative: build a generative statistical model (e.g., naïve Bayes classifier, Bayesian networks)
3. Instance-based classifiers: use the observations directly, with no model (e.g., k-nearest neighbors)
Today Recap: Generative Bayes Classifiers
• Bayes Classifier
  – Generative Bayes Classifier
• Naïve Bayes Classifier
• Gaussian Bayes Classifiers
  – Gaussian distribution
  – Naïve Gaussian BC
  – Not-naïve Gaussian BC → LDA, QDA
Review: Bayes classifiers
• Treat each feature attribute and the class label as random variables.
• Given a sample x with attributes (x1, x2, …, xp):
  – Goal is to predict its class c.
  – Specifically, we want to find the class that maximizes p(c | x1, x2, …, xp).
• Can we estimate p(Ci | x) = p(Ci | x1, x2, …, xp) directly from data?

MAP Rule: c_MAP = argmax_{cj ∈ C} P(cj | x1, x2, …, xp)
Bayes classifiers → MAP classification rule
• Establishing a probabilistic model for classification:
  – (1) Discriminative
  – (2) Generative
Review: Establishing a probabilistic model for classification
– (1) Discriminative model: estimate P(c | x) directly.
  A discriminative probabilistic classifier maps the input x = (x1, x2, ⋅⋅⋅, xp) to the posteriors P(c1 | x), P(c2 | x), …, P(cL | x), and predicts
  argmax_{c ∈ C} P(c | X), where C = {c1, ⋅⋅⋅, cL}.
(Adapted from Prof. Ke Chen's NB slides.)
– (2) Generative model: estimate P(X | C) for each class, with C = c1, ⋅⋅⋅, cL and X = (X1, ⋅⋅⋅, Xp).
  One generative probabilistic model per class, each over the full input x = (x1, x2, ⋅⋅⋅, xp): the model for class 1 gives P(x | c1), the model for class 2 gives P(x | c2), …, and the model for class L gives P(x | cL).
(Adapted from Prof. Ke Chen's NB slides.)
Review of probability: if a quantity is hard to estimate directly from data, most likely we can estimate:
• 1. Joint probability – use the chain rule
• 2. Marginal probability – use the law of total probability
• 3. Conditional probability – use Bayes' rule
Review: Bayes' Rule – for Generative Bayes Classifiers

P(C | X) = P(X | C) P(C) / P(X)

Prior: P(C1), P(C2), …, P(CL)
Posterior: P(C1 | x), P(C2 | x), …, P(CL | x)

For each class: P(Ci | X) = P(X | Ci) P(Ci) / P(X)
Establishing a probabilistic model for classification through generative modeling:
Build one generative probabilistic model P(x | ci) per class over the input x = (x1, x2, ⋅⋅⋅, xp), then classify with

argmax_{Ci} P(Ci | X) = argmax_{Ci} P(X, Ci) = argmax_{Ci} P(X | Ci) P(Ci)

(Adapted from Prof. Ke Chen's NB slides.)
Summary:
– Apply Bayes' rule to get the posterior probabilities:
  P(C = ci | X = x) = P(X = x | C = ci) P(C = ci) / P(X = x)
                    ∝ P(X = x | C = ci) P(C = ci),  for i = 1, 2, ⋅⋅⋅, L
– Then apply the MAP rule.
(Adapted from Prof. Ke Chen's NB slides.)
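To make the two steps concrete, here is a minimal Python sketch assuming made-up likelihoods and priors (the numbers are illustrative, not from the lecture); since P(X = x) is identical for every class, it drops out of the argmax.

```python
# Hypothetical two-class MAP decision: pick the class maximizing
# P(X = x | C = ci) * P(C = ci); the evidence P(X = x) cancels in the argmax.
likelihood = {"c1": 0.30, "c2": 0.10}   # assumed P(X = x | C = ci)
prior      = {"c1": 0.20, "c2": 0.80}   # assumed P(C = ci)

scores = {c: likelihood[c] * prior[c] for c in prior}
c_map = max(scores, key=scores.get)
print(scores, c_map)   # {'c1': 0.06, 'c2': 0.08} -> c2 wins despite its lower likelihood
```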
A Dataset for classification
• Data / points / instances / examples / samples / records: [rows]
• Features / attributes / dimensions / independent variables / covariates / predictors / regressors: [columns, except the last]
• Target / outcome / response / label / dependent variable: special column to be predicted [last column]

Output as discrete class label: C = C1, C2, …, CL

Discriminative: argmax_{c ∈ C} P(c | X), C = {c1, ⋅⋅⋅, cL}
Generative: argmax_{c ∈ C} P(c | X) = argmax_{c ∈ C} P(X, c) = argmax_{c ∈ C} P(X | c) P(c)  ← this lecture!
An Example
• Example: Play Tennis
(Play-Tennis training data table shown on the slide.)
• Maximum likelihood estimates: simply use the frequencies in the data, estimated directly from the data. (Check the L12 MLE lecture for why.)
Generative Bayes Classifier:
• Learning Phase – estimate the priors P(C1), P(C2), …, P(CL) and the full class-conditional joint P(X1, X2, …, Xp | C1), P(X1, X2, …, Xp | C2):

Outlook (3 values) | Temperature (3 values) | Humidity (2 values) | Wind (2 values) | Play=Yes | Play=No
sunny | hot | high | weak | 0/9 | 1/5
sunny | hot | high | strong | …/9 | …/5
sunny | hot | normal | weak | …/9 | …/5
sunny | hot | normal | strong | …/9 | …/5
… | … | … | … | … | …

3*3*2*2 [conjunctions of attributes] * 2 [two classes] = 72 parameters
P(Play=Yes) = 9/14, P(Play=No) = 5/14
Generative Bayes Classifier:
• Test Phase
– Given an unknown instance X'ts = (a'1, ⋅⋅⋅, a'p), e.g., x' = (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong)
– Look up the tables to assign the label c* to X'ts if
  P̂(a'1, ⋅⋅⋅, a'p | c*) P̂(c*) > P̂(a'1, ⋅⋅⋅, a'p | c) P̂(c),  c ≠ c*, c = c1, ⋅⋅⋅, cL
Naïve Bayes Classifier
• Bayes classification: argmax_{cj ∈ C} P(x1, x2, …, xp | cj) P(cj)
  Difficulty: learning the joint probability P(x1, x2, …, xp | cj).
• Naïve Bayes classification – assumption: all input attributes are conditionally independent given the class!
Naïve Bayes Classifier
• Naïve Bayes classification – assumption: all input attributes are conditionally independent given the class!

P(X1, X2, ⋅⋅⋅, Xp | C) = P(X1 | X2, ⋅⋅⋅, Xp, C) P(X2, ⋅⋅⋅, Xp | C)
                       = P(X1 | C) P(X2, ⋅⋅⋅, Xp | C)
                       = P(X1 | C) P(X2 | C) ⋅⋅⋅ P(Xp | C)
Naïve Bayes Classifier
• With P(X1, X2, ⋅⋅⋅, Xp | C) = P(X1 | C) P(X2 | C) ⋅⋅⋅ P(Xp | C), the MAP classification rule for a sample x = (x1, x2, ⋅⋅⋅, xp) assigns the label c* if

[P(x1 | c*) ⋅⋅⋅ P(xp | c*)] P(c*) > [P(x1 | c) ⋅⋅⋅ P(xp | c)] P(c),  c ≠ c*, c = c1, ⋅⋅⋅, cL
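The factorization replaces one huge joint table with p small per-attribute factors. A minimal sketch of the resulting score, assuming hypothetical table names (`cond_prob`, `prior`) just for illustration:

```python
# Naïve Bayes score for the MAP rule: [P(x1|c) ... P(xp|c)] * P(c).
from math import prod

def nb_score(x, c, cond_prob, prior):
    # cond_prob[j] maps (attribute value, class) -> P̂(Xj = value | C = c)
    return prod(cond_prob[j][(v, c)] for j, v in enumerate(x)) * prior[c]

# Usage sketch: label = max(classes, key=lambda c: nb_score(x, c, cond_prob, prior))
```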
Naïve Bayes Classifier (for discrete input attributes) – training
• Naïve Bayes algorithm (for discrete input attributes)
– Learning Phase: given a training set S,
  For each target value ci (ci = c1, ⋅⋅⋅, cL):
    P̂(C = ci) ← estimate P(C = ci) with examples in S;
    For every attribute value xjk of each attribute Xj (j = 1, ⋅⋅⋅, p; k = 1, ⋅⋅⋅, Kj):
      P̂(Xj = xjk | C = ci) ← estimate P(Xj = xjk | C = ci) with examples in S;
  Output: conditional probability tables; for each Xj, a table of Kj × L elements.
Naïve Bayes (for discrete input attributes) – testing
• Naïve Bayes algorithm (for discrete input attributes)
– Test Phase: given an unknown instance X' = (a'1, ⋅⋅⋅, a'p),
  look up the tables to assign the label c* to X' if
  [P̂(a'1 | c*) ⋅⋅⋅ P̂(a'p | c*)] P̂(c*) > [P̂(a'1 | c) ⋅⋅⋅ P̂(a'p | c)] P̂(c),  c ≠ c*, c = c1, ⋅⋅⋅, cL
An Example
• Example: Play Tennis
Learning (training) the NBC Model
(Graphical model: class node C with arrows to attribute nodes X1, X2, X3, X4, X5, X6.)
• Maximum likelihood estimates: simply use the frequencies in the data:

P̂(xi | cj) = N(Xi = xi, C = cj) / N(C = cj)
P̂(cj) = N(C = cj) / N
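A minimal sketch of this counting, over a list of (attribute tuple, label) training examples; the function and variable names are illustrative assumptions, not the lecture's reference code:

```python
# MLE by counting: P̂(xi | cj) = N(Xi = xi, C = cj) / N(C = cj) and P̂(cj) = N(C = cj) / N.
from collections import Counter, defaultdict

def train_nbc(samples, labels):
    class_count = Counter(labels)        # N(C = cj)
    pair_count = defaultdict(Counter)    # pair_count[j][(value, cj)] = N(Xj = value, C = cj)
    for x, c in zip(samples, labels):
        for j, v in enumerate(x):
            pair_count[j][(v, c)] += 1
    prior = {c: n / len(labels) for c, n in class_count.items()}
    cond = {j: {(v, c): n / class_count[c] for (v, c), n in pc.items()}
            for j, pc in pair_count.items()}
    return prior, cond
```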
Learning Phase – one conditional-probability table per attribute, to be filled in from the training data:

Outlook: Sunny / Overcast / Rain  vs.  Play=Yes / Play=No
Temperature: Hot / Mild / Cool  vs.  Play=Yes / Play=No
Humidity: High / Normal  vs.  Play=Yes / Play=No
Wind: Strong / Weak  vs.  Play=Yes / Play=No

Priors: P(Play=Yes) = ??, P(Play=No) = ??  — i.e., P(C1), P(C2), …, P(CL)
Conditional tables, e.g., P(X2 | C1), P(X2 | C2) and P(X4 | C1), P(X4 | C2)
Estimate each P(Xj = xjk | C = ci) with examples in training.
• Learning Phase (estimated from the 14 training examples):

Outlook | Play=Yes | Play=No
Sunny | 2/9 | 3/5
Overcast | 4/9 | 0/5
Rain | 3/9 | 2/5

Temperature | Play=Yes | Play=No
Hot | 2/9 | 2/5
Mild | 4/9 | 2/5
Cool | 3/9 | 1/5

Humidity | Play=Yes | Play=No
High | 3/9 | 4/5
Normal | 6/9 | 1/5

Wind | Play=Yes | Play=No
Strong | 3/9 | 3/5
Weak | 6/9 | 2/5

P(Play=Yes) = 9/14, P(Play=No) = 5/14
(3+3+2+2) [naïve assumption] * 2 [two classes] = 20 parameters
Testing the NBC Model
• Test Phase
– Given a new instance x' = (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong)
– Look up the conditional-probability tables:
  P(Outlook=Sunny | Play=Yes) = 2/9      P(Outlook=Sunny | Play=No) = 3/5
  P(Temperature=Cool | Play=Yes) = 3/9   P(Temperature=Cool | Play=No) = 1/5
  P(Humidity=High | Play=Yes) = 3/9      P(Humidity=High | Play=No) = 4/5
  P(Wind=Strong | Play=Yes) = 3/9        P(Wind=Strong | Play=No) = 3/5
  P(Play=Yes) = 9/14                     P(Play=No) = 5/14
– MAP rule: [P̂(a'1 | c*) ⋅⋅⋅ P̂(a'p | c*)] P̂(c*) > [P̂(a'1 | c) ⋅⋅⋅ P̂(a'p | c)] P̂(c)
  P(Yes | x') ∝ [P(Sunny|Yes) P(Cool|Yes) P(High|Yes) P(Strong|Yes)] P(Play=Yes) = 0.0053
  P(No | x') ∝ [P(Sunny|No) P(Cool|No) P(High|No) P(Strong|No)] P(Play=No) = 0.0206
Given the fact that P(Yes | x') < P(No | x'), we label x' to be "No".
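A quick sketch that reproduces the two scores from the tables above:

```python
# Play-Tennis test computation for
# x' = (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong).
score_yes = (2/9) * (3/9) * (3/9) * (3/9) * (9/14)   # [Sunny,Cool,High,Strong | Yes] * prior
score_no  = (3/5) * (1/5) * (4/5) * (3/5) * (5/14)   # [Sunny,Cool,High,Strong | No] * prior

print(round(score_yes, 4), round(score_no, 4))       # 0.0053 0.0206
print("Yes" if score_yes > score_no else "No")       # "No" -- the MAP label
```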
WHY? The Naïve Bayes Assumption
• P(cj)
  – Can be estimated from the frequency of classes in the training examples.
• Not naïve: P(x1, x2, …, xp | cj)
  – O(|X1|·|X2|·|X3|·…·|Xp|·|C|) parameters.
  – Could only be estimated if a very, very large number of training examples was available.
• Naïve: P(xk | cj)
  – O([|X1|+|X2|+|X3|+…+|Xp|]·|C|) parameters.
  – Naïve Bayes conditional independence assumption: assume that the probability of observing the conjunction of attributes is equal to the product of the individual probabilities P(xi | cj).
(Adapted from Manning's text categorization tutorial.)
Challenges during Learning (training) the NBC Model
(Graphical model: class node C=Flu with arrows to X1, …, X6=Muscle-ache.)
For instance:
• What if we have seen no training cases where the patient had no flu but did have muscle aches?

  P̂(X6 = t | C = not_flu) = N(X6 = t, C = not_flu) / N(C = not_flu) = 0

• Zero probabilities cannot be conditioned away, no matter the other evidence: in

  argmax_c P̂(c) ∏_i P̂(xi | c)

  a single zero factor forces the whole product to zero.
Smoothing to Avoid Overfitting

P̂(xi | cj) = (N(Xi = xi, C = cj) + 1) / (N(C = cj) + ki)

where ki is the number of values of feature Xi; adding ki in the denominator makes Σ_i P̂(xi | cj) = 1.
(Adapted from Manning's text categorization tutorial.)

• A somewhat more subtle version:

P̂(xi,k | cj) = (N(Xi = xi,k, C = cj) + m·pi,k) / (N(C = cj) + m)

where pi,k is the overall fraction in the data where Xi = xi,k, and m is the extent of "smoothing".
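A sketch of both smoothed estimators, where `n_xc` and `n_c` stand for the counts N(Xi = xi, C = cj) and N(C = cj):

```python
# Add-one (Laplace) smoothing: never outputs a zero probability.
def laplace(n_xc, n_c, k_i):
    # k_i = number of distinct values of feature Xi
    return (n_xc + 1) / (n_c + k_i)

# m-estimate: shrink toward the overall fraction p_ik with smoothing strength m.
def m_estimate(n_xc, n_c, p_ik, m):
    return (n_xc + m * p_ik) / (n_c + m)

print(laplace(0, 9, 2))          # ~0.0909 instead of 0 for an unseen (value, class) pair
print(m_estimate(0, 9, 0.3, 1))  # 0.03 with mild smoothing
```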
Summary: Generative Bayes Classifier
Task: classify a new instance X = (X1, X2, …, Xp), a tuple of attribute values, into one of the classes:

c_MAP = argmax_{cj ∈ C} P(cj | x1, x2, …, xp)
      = argmax_{cj ∈ C} P(x1, x2, …, xp | cj) P(cj) / P(x1, x2, …, xp)
      = argmax_{cj ∈ C} P(x1, x2, …, xp | cj) P(cj)

MAP = Maximum A Posteriori
(Adapted from Carlos's probability tutorial.)
Today: Generative Bayes Classifiers
• Bayes Classifier
  – Generative Bayes Classifier
• Naïve Bayes Classifier
• Gaussian Bayes Classifiers
  – Gaussian distribution
  – Naïve Gaussian BC
  – Not-naïve Gaussian BC → LDA, QDA
Gaussian Distribution

X ~ N(µ, σ²)

(Figure: Gaussian densities with mean and covariance matrix annotated. Courtesy: http://research.microsoft.com/~cmbishop/PRML/index.htm)
Example: the Bivariate Normal distribution

p(x) = f(x1, x2) = (1 / (2π |Σ|^(1/2))) exp( −(1/2) (x − µ)ᵀ Σ⁻¹ (x − µ) )

with

µ = [µ1, µ2]ᵀ

Σ = [ σ1²   σ12 ]  =  [ σ1²     ρσ1σ2 ]
    [ σ12   σ2² ]     [ ρσ1σ2   σ2²   ]

and

|Σ| = σ1²σ2² − σ12² = σ1²σ2²(1 − ρ²)
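A sketch that evaluates this density with NumPy; the mean, correlation, and standard deviations below are assumed toy values:

```python
import numpy as np

mu = np.array([0.0, 0.0])        # assumed mean vector
rho, s1, s2 = 0.5, 1.0, 2.0      # assumed correlation and standard deviations
Sigma = np.array([[s1**2,      rho*s1*s2],
                  [rho*s1*s2,  s2**2   ]])

def bivariate_normal_pdf(x, mu, Sigma):
    # 1 / (2*pi*|Sigma|^(1/2)) * exp(-0.5 * (x-mu)^T Sigma^{-1} (x-mu))
    d = x - mu
    quad = d @ np.linalg.inv(Sigma) @ d
    return np.exp(-0.5 * quad) / (2 * np.pi * np.sqrt(np.linalg.det(Sigma)))

print(bivariate_normal_pdf(np.array([0.0, 0.0]), mu, Sigma))  # density at the mean
```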
(Figure: scatter plots of data drawn from the bivariate Normal distribution.)
How to Estimate a Gaussian: MLE
• In the 1-D Gaussian case, we simply set the mean and the variance to the sample mean and the sample variance:

µ̂ = (1/n) Σ_{i=1..n} xi
σ̂² = (1/n) Σ_{i=1..n} (xi − µ̂)²
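A sketch of these two estimates on an assumed toy sample:

```python
# 1-D Gaussian MLE: sample mean and (biased, 1/n) sample variance.
import numpy as np

x = np.array([4.8, 5.1, 5.3, 4.9, 5.4])   # assumed toy data
mu_hat = x.mean()                          # (1/n) * sum_i x_i
var_hat = ((x - mu_hat) ** 2).mean()       # (1/n) * sum_i (x_i - mu_hat)^2
print(mu_hat, var_hat)
```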
The p-multivariate Normal distribution: (X1, X2, …, Xp) ~ N(µ, Σ)
Gaussian Naïve Bayes Classifier

argmax_C P(C | X) = argmax_C P(X, C) = argmax_C P(X | C) P(C)

Naïve Bayes factorization:
P(X | C) = P(X1, X2, ⋅⋅⋅, Xp | C)
         = P(X1 | X2, ⋅⋅⋅, Xp, C) P(X2, ⋅⋅⋅, Xp | C)
         = P(X1 | C) P(X2, ⋅⋅⋅, Xp | C)
         = P(X1 | C) P(X2 | C) ⋅⋅⋅ P(Xp | C)

with each factor modeled as a Gaussian:

P̂(Xj | C = ci) = (1 / (√(2π) σji)) exp( −(Xj − µji)² / (2σji²) )

µji: mean (average) of attribute values Xj of examples for which C = ci
σji: standard deviation of attribute values Xj of examples for which C = ci
Gaussian Naïve Bayes Classifier
• Continuous-valued input attributes
– Conditional probability modeled with the normal distribution:

  P̂(Xj | C = ci) = (1 / (√(2π) σji)) exp( −(Xj − µji)² / (2σji²) )

  µji: mean (average) of attribute values Xj of examples for which C = ci
  σji: standard deviation of attribute values Xj of examples for which C = ci

– Learning Phase: for X = (X1, ⋅⋅⋅, Xp), C = c1, ⋅⋅⋅, cL
  Output: p × L normal distributions and P(C = ci), i = 1, ⋅⋅⋅, L
– Test Phase: for X' = (X'1, ⋅⋅⋅, X'p)
  • Calculate conditional probabilities with all the normal distributions
  • Apply the MAP rule to make a decision
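Putting the learning and test phases together, a minimal NumPy sketch of a Gaussian naïve Bayes classifier (log probabilities are used to avoid underflow; this is an illustrative sketch, not the lecture's reference code):

```python
import numpy as np

def fit_gnb(X, y):
    # One (mu_ji, sigma_ji) per attribute j and class i, plus the class prior.
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (Xc.mean(axis=0), Xc.std(axis=0), len(Xc) / len(y))
    return params

def predict_gnb(x, params):
    def log_score(c):
        mu, sigma, prior = params[c]
        # sum_j log N(x_j; mu_j, sigma_j^2) + log P(C = c)
        log_lik = -0.5 * np.sum(np.log(2 * np.pi * sigma**2) + (x - mu)**2 / sigma**2)
        return log_lik + np.log(prior)
    return max(params, key=log_score)
```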
Naïve Gaussian means?

P(X1, X2, ⋅⋅⋅, Xp | C = cj) = P(X1 | C) P(X2 | C) ⋅⋅⋅ P(Xp | C)
                            = ∏_j (1 / (√(2π) σji)) exp( −(Xj − µji)² / (2σji²) )

i.e., each class's covariance matrix is diagonal: Σ_ck = Λ_ck.

Not-naïve Gaussian means?

P(X1, X2, ⋅⋅⋅, Xp | C) is a full multivariate Gaussian whose per-class covariance matrix need not be diagonal.
Today: Generative Bayes Classifiers
• Gaussian Bayes Classifiers
  – Gaussian distribution
  – Naïve Gaussian BC
  – Not-naïve Gaussian BC
  – LDA: Linear Discriminant Analysis
  – QDA: Quadratic Discriminant Analysis
(1) If the covariance matrices are the same across classes → LDA (Linear Discriminant Analysis)
Each class's covariance matrix is the same (class k vs. class l).
(Figure: visualization with three classes.)
argmax_k P(Ck | X) = argmax_k P(X, Ck) = argmax_k P(X | Ck) P(Ck) = argmax_k log{ P(X | Ck) P(Ck) }
log[ P(Ck | X) / P(Cl | X) ] = log[ P(X | Ck) / P(X | Cl) ] + log[ P(Ck) / P(Cl) ]

The above is derived from Bayes' rule: P(Ck | X) = P(X | Ck) P(Ck) / P(X) and P(Cl | X) = P(X | Cl) P(Cl) / P(X), so P(X) cancels in the ratio of posteriors.
LDA Classification Rule (also called the linear discriminant function) – Note:

argmax_k P(Ck | X) = argmax_k P(X, Ck) = argmax_k P(X | Ck) P(Ck)
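With a shared covariance matrix Σ, the rule above reduces to the linear discriminant δk(x) = xᵀΣ⁻¹µk − ½ µkᵀΣ⁻¹µk + log πk. A minimal sketch, assuming the class means, shared covariance, and priors have already been estimated:

```python
import numpy as np

def lda_predict(x, mus, Sigma, priors):
    # delta_k(x) = x^T Sigma^{-1} mu_k - 0.5 * mu_k^T Sigma^{-1} mu_k + log(pi_k)
    Sinv = np.linalg.inv(Sigma)
    scores = {k: x @ Sinv @ mu - 0.5 * mu @ Sinv @ mu + np.log(priors[k])
              for k, mu in mus.items()}
    return max(scores, key=scores.get)
```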
(2) If the covariance matrices are not the same across classes → QDA (Quadratic Discriminant Analysis)
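With class-specific covariances Σk, the discriminant keeps a log-determinant term and a class-specific quadratic form, δk(x) = −½ log|Σk| − ½ (x − µk)ᵀΣk⁻¹(x − µk) + log πk, which is quadratic in x. A minimal sketch:

```python
import numpy as np

def qda_score(x, mu_k, Sigma_k, prior_k):
    # delta_k(x) = -0.5*log|Sigma_k| - 0.5*(x-mu_k)^T Sigma_k^{-1} (x-mu_k) + log(pi_k)
    d = x - mu_k
    return (-0.5 * np.log(np.linalg.det(Sigma_k))
            - 0.5 * d @ np.linalg.inv(Sigma_k) @ d
            + np.log(prior_k))
```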
QDA vs. LDA on an Expanded Basis
(Figure: LDA with quadratic basis functions versus QDA.)
(3) Regularized Discriminant Analysis
An example: Gaussian Bayes Classifier
(Figure: Gaussian Bayes decision boundary separating the Red Team from the Blue Team.)
Naïve Gaussian Bayes Classifier is not a linear classifier!
Generative Bayes Classifier

Task: classification
Representation: probabilistic models p(X | C), i.e., P(X1, ⋅⋅⋅, Xp | C)
Score Function: EPE with 0-1 loss → MAP rule
  argmax_k P(Ck | X) = argmax_k P(X, Ck) = argmax_k P(X | Ck) P(Ck)
Search/Optimization: many options
Models, Parameters: the probabilistic models' parameters, e.g.:

  Gaussian naïve: P̂(Xj | C = ck) = (1 / (√(2π) σjk)) exp( −(Xj − µjk)² / (2σjk²) )
  Multinomial: P(W1 = n1, …, Wv = nv | ck) = (N! / (n1k! n2k! ⋅⋅⋅ nvk!)) θ1k^n1k θ2k^n2k ⋅⋅⋅ θvk^nvk
  Bernoulli naïve: p(Wi = true | ck) = pi,k
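For the multinomial model in the table, the coefficient N! / (n1k! ⋅⋅⋅ nvk!) is the same for every class, so the MAP comparison only needs Σi ni log θik plus the log prior. A minimal sketch:

```python
from math import log

def multinomial_log_score(counts, theta_k, prior_k):
    # log P(ck) + sum_i n_i * log(theta_ik); the multinomial coefficient cancels in the argmax.
    return log(prior_k) + sum(n * log(t) for n, t in zip(counts, theta_k))

# Usage sketch: label = max(classes, key=lambda k: multinomial_log_score(counts, theta[k], prior[k]))
```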
References
• Prof. Andrew Moore's review tutorial
• Prof. Ke Chen's NB slides
• Prof. Carlos Guestrin's recitation slides