Machine Learning: Bayesian Decision Theory and Classification

S.V.N. (vishy) Vishwanathan

University of California, Santa Cruz
[email protected]

October 21, 2016

S.V. N. Vishwanathan (UCSC) CMPS242 1 / 21

Outline

1 Binary Classification

2 Generative Models
   Gaussian Generative Model
   Naive Bayes

3 Discriminative Classifiers
   Logistic Regression

Binary Classification

Problem Setting: Binary Classification

Data: $x = (x_1, x_2, \ldots, x_N)^\top$, where each $x_n$ is a $D$-dimensional feature vector

Labels: $t = (t_1, t_2, \ldots, t_N)^\top$ with $t_i \in \{0, 1\}$; $t = 1$ implies class $C_1$ and $t = 0$ implies class $C_2$

Let us call the two classes $C_1$ and $C_2$, and let $p(C_1) = \pi$ and $p(C_2) = 1 - \pi$.

Basic Idea

Estimate $p(C_i \mid x)$ and predict $C_1$ if $p(C_1 \mid x) > p(C_2 \mid x)$.

Key problem: how to estimate $p(C_i \mid x)$?

Two philosophies:

Generative: model $p(x \mid C_i)$ and $p(C_i)$, then obtain $p(C_i \mid x)$ via Bayes' theorem
Discriminative: model $p(C_i \mid x)$ directly

Generative Models


By Bayes' theorem,

$$p(C_i \mid x) = \frac{p(x \mid C_i) \cdot p(C_i)}{p(x)}.$$

Decision function: predict $C_1$ if $p(C_1 \mid x) > p(C_2 \mid x)$, that is, if

$$\frac{p(x \mid C_1) \cdot p(C_1)}{p(x)} > \frac{p(x \mid C_2) \cdot p(C_2)}{p(x)}.$$

The positive factor $p(x)$ appears on both sides and cancels:

$$p(x \mid C_1) \cdot p(C_1) > p(x \mid C_2) \cdot p(C_2).$$

Because $\ln$ is strictly increasing, taking logarithms preserves the inequality:

$$\ln p(x \mid C_1) + \ln p(C_1) > \ln p(x \mid C_2) + \ln p(C_2),$$

and substituting the priors gives

$$\ln p(x \mid C_1) + \ln \pi > \ln p(x \mid C_2) + \ln(1 - \pi).$$
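As an illustration (not part of the original slides), the following minimal Python sketch applies the rule in both forms to a few made-up values of $p(x \mid C_1)$, $p(x \mid C_2)$ and $\pi$; the function names are hypothetical. The log form takes log-likelihoods as input, which is how it is normally used: log-likelihoods do not underflow when $p(x \mid C_k)$ is a product of many small factors.

```python
import math


def predict_c1_posterior(lik1, lik2, pi):
    """Predict C1 if p(C1|x) > p(C2|x), computing both posteriors explicitly."""
    evidence = lik1 * pi + lik2 * (1 - pi)  # p(x) by the sum rule
    post1 = lik1 * pi / evidence            # p(C1|x) by Bayes' theorem
    post2 = lik2 * (1 - pi) / evidence      # p(C2|x)
    return post1 > post2


def predict_c1_log(log_lik1, log_lik2, pi):
    """Equivalent rule in log space: p(x) cancels and ln is strictly increasing."""
    return log_lik1 + math.log(pi) > log_lik2 + math.log(1 - pi)


# Made-up values of p(x|C1), p(x|C2) and the prior pi = p(C1).
cases = [(0.30, 0.10, 0.5), (0.30, 0.10, 0.2), (0.05, 0.40, 0.9)]
for lik1, lik2, pi in cases:
    direct = predict_c1_posterior(lik1, lik2, pi)
    in_logs = predict_c1_log(math.log(lik1), math.log(lik2), pi)
    print(direct, in_logs)
```

The second case shows the prior at work: the likelihood favours $C_1$, but a small $\pi$ flips the decision to $C_2$.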

Gaussian Generative Model

Class-Conditional Gaussian Distribution

$$p(x \mid C_k) = \mathcal{N}(x \mid \mu_k, \Sigma_k) = \frac{1}{(2\pi)^{D/2}} \frac{1}{|\Sigma_k|^{1/2}} \exp\left\{ -\frac{1}{2} (x - \mu_k)^\top \Sigma_k^{-1} (x - \mu_k) \right\}$$

where $D$ is the dimension of $x$.
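A minimal pure-Python sketch of $\ln \mathcal{N}(x \mid \mu_k, \Sigma_k)$, assuming $D = 2$ so that $\Sigma_k^{-1}$ and $|\Sigma_k|$ can be written out by hand; `gaussian_log_density_2d` is a hypothetical name, and for general $D$ one would normally use a linear-algebra library instead. The usage lines check the formula against the sum of two 1-D Gaussian log-densities, which it must equal when $\Sigma_k$ is diagonal.

```python
import math


def gaussian_log_density_2d(x, mu, sigma):
    """ln N(x | mu, Sigma) for D = 2; sigma is a symmetric positive-definite [[a, b], [c, d]]."""
    (a, b), (c, d) = sigma
    det = a * d - b * c                               # |Sigma|
    inv = [[d / det, -b / det], [-c / det, a / det]]  # Sigma^{-1}
    diff = [x[0] - mu[0], x[1] - mu[1]]               # x - mu
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu)
    maha = sum(diff[i] * inv[i][j] * diff[j] for i in range(2) for j in range(2))
    D = 2
    return -0.5 * D * math.log(2 * math.pi) - 0.5 * math.log(det) - 0.5 * maha


# With a diagonal covariance the density factorizes into two 1-D Gaussians.
x, mu, sigma = [1.0, 2.0], [0.0, 0.0], [[4.0, 0.0], [0.0, 1.0]]
one_d = sum(-0.5 * math.log(2 * math.pi * v) - 0.5 * xi ** 2 / v
            for xi, v in zip(x, [4.0, 1.0]))
print(gaussian_log_density_2d(x, mu, sigma), one_d)
```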

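Decision Rule

Start from

$$\ln p(x \mid C_1) + \ln p(C_1) > \ln p(x \mid C_2) + \ln p(C_2)$$

and substitute the Gaussian densities:

$$\ln\left( \frac{1}{(2\pi)^{D/2}} \frac{1}{|\Sigma_1|^{1/2}} \exp\left\{ -\frac{1}{2} (x - \mu_1)^\top \Sigma_1^{-1} (x - \mu_1) \right\} \right) + \ln p(C_1) > \ln\left( \frac{1}{(2\pi)^{D/2}} \frac{1}{|\Sigma_2|^{1/2}} \exp\left\{ -\frac{1}{2} (x - \mu_2)^\top \Sigma_2^{-1} (x - \mu_2) \right\} \right) + \ln p(C_2)$$

The $(2\pi)^{D/2}$ terms cancel. Expanding each quadratic form as $x^\top \Sigma_k^{-1} x - 2 \mu_k^\top \Sigma_k^{-1} x + \mu_k^\top \Sigma_k^{-1} \mu_k$ (using the symmetry of $\Sigma_k^{-1}$) and collecting terms gives

$$\frac{1}{2} x^\top \left( \Sigma_2^{-1} - \Sigma_1^{-1} \right) x + \mu_1^\top \Sigma_1^{-1} x - \mu_2^\top \Sigma_2^{-1} x + b > 0$$

where

$$b = \ln\left( \frac{\pi}{1 - \pi} \right) - \frac{1}{2} \ln \frac{|\Sigma_1|}{|\Sigma_2|} + \frac{1}{2} \mu_2^\top \Sigma_2^{-1} \mu_2 - \frac{1}{2} \mu_1^\top \Sigma_1^{-1} \mu_1.$$

The decision boundary is therefore quadratic in $x$.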
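The expansion can be checked numerically. This sketch (pure Python, assuming $D = 2$ and symmetric covariances; all helper names and parameter values are made up for illustration) evaluates the quadratic form above and the direct log-density comparison at a few points; the two should agree up to rounding.

```python
import math


def inv2(m):
    """Inverse of a 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def det2(m):
    (a, b), (c, d) = m
    return a * d - b * c


def quad(u, m, v):
    """Bilinear form u^T M v for 2-vectors u, v and a 2x2 matrix M."""
    return sum(u[i] * m[i][j] * v[j] for i in range(2) for j in range(2))


def log_gauss(x, mu, s):
    """ln N(x | mu, s) for D = 2, where -(D/2) ln(2 pi) = -ln(2 pi)."""
    diff = [x[0] - mu[0], x[1] - mu[1]]
    return -math.log(2 * math.pi) - 0.5 * math.log(det2(s)) - 0.5 * quad(diff, inv2(s), diff)


def quadratic_rule(x, pi, mu1, s1, mu2, s2):
    """Left-hand side of the quadratic rule; predict C1 when it is > 0."""
    i1, i2 = inv2(s1), inv2(s2)
    i_diff = [[i2[r][c] - i1[r][c] for c in range(2)] for r in range(2)]
    b = (math.log(pi / (1 - pi))
         - 0.5 * math.log(det2(s1) / det2(s2))
         + 0.5 * quad(mu2, i2, mu2)
         - 0.5 * quad(mu1, i1, mu1))
    return 0.5 * quad(x, i_diff, x) + quad(mu1, i1, x) - quad(mu2, i2, x) + b


def log_ratio(x, pi, mu1, s1, mu2, s2):
    """Direct form: [ln p(x|C1) + ln pi] - [ln p(x|C2) + ln(1 - pi)]."""
    return (log_gauss(x, mu1, s1) + math.log(pi)) - (log_gauss(x, mu2, s2) + math.log(1 - pi))


# Made-up parameters with different (symmetric) covariances.
pi, mu1, s1 = 0.3, [1.0, 0.0], [[1.0, 0.3], [0.3, 2.0]]
mu2, s2 = [-1.0, 0.5], [[2.0, -0.4], [-0.4, 1.0]]
for x in ([0.0, 0.0], [1.5, -1.0], [-2.0, 2.0]):
    print(quadratic_rule(x, pi, mu1, s1, mu2, s2), log_ratio(x, pi, mu1, s1, mu2, s2))
```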

Special Case: $\Sigma_i = \Sigma$

With a shared covariance, $\Sigma_2^{-1} - \Sigma_1^{-1} = 0$ and $\ln(|\Sigma_1| / |\Sigma_2|) = 0$, so the quadratic term and the log-determinant term vanish:

$$\underbrace{(\mu_1 - \mu_2)^\top \Sigma^{-1}}_{w^\top} x + b > 0$$

where

$$b = \ln\left( \frac{\pi}{1 - \pi} \right) + \frac{1}{2} \mu_2^\top \Sigma^{-1} \mu_2 - \frac{1}{2} \mu_1^\top \Sigma^{-1} \mu_1.$$

With $w = \Sigma^{-1} (\mu_1 - \mu_2)$ the rule reads

$$w^\top x + b > 0,$$

so the decision boundary is linear in $x$.
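A similar sketch for the shared-covariance case, again assuming $D = 2$ and made-up parameters (the helper names are hypothetical): it computes $w = \Sigma^{-1}(\mu_1 - \mu_2)$ and $b$, then checks that $w^\top x + b$ matches the direct log-ratio, in which the Gaussian normalizers cancel because both classes share $\Sigma$.

```python
import math


def inv2(m):
    """Inverse of a 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matvec(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


def lda_weights(pi, mu1, mu2, sigma):
    """w = Sigma^{-1}(mu1 - mu2) and the bias b of the linear rule (Sigma symmetric)."""
    s_inv = inv2(sigma)
    w = matvec(s_inv, [mu1[0] - mu2[0], mu1[1] - mu2[1]])
    b = (math.log(pi / (1 - pi))
         + 0.5 * dot(mu2, matvec(s_inv, mu2))
         - 0.5 * dot(mu1, matvec(s_inv, mu1)))
    return w, b


def log_ratio_shared(x, pi, mu1, mu2, sigma):
    """[ln p(x|C1) + ln pi] - [ln p(x|C2) + ln(1 - pi)]; shared normalizers omitted."""
    s_inv = inv2(sigma)
    d1 = [x[0] - mu1[0], x[1] - mu1[1]]
    d2 = [x[0] - mu2[0], x[1] - mu2[1]]
    score1 = math.log(pi) - 0.5 * dot(d1, matvec(s_inv, d1))
    score2 = math.log(1 - pi) - 0.5 * dot(d2, matvec(s_inv, d2))
    return score1 - score2


pi, mu1, mu2 = 0.4, [2.0, 1.0], [0.0, -1.0]
sigma = [[1.5, 0.5], [0.5, 1.0]]
w, b = lda_weights(pi, mu1, mu2, sigma)
for x in ([0.0, 0.0], [1.0, 2.0], [-3.0, 0.5]):
    print(dot(w, x) + b, log_ratio_shared(x, pi, mu1, mu2, sigma))
```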

Parameter Estimation via MLE

$$p(x, t \mid \pi, \mu_1, \mu_2, \Sigma) = \prod_{n=1}^{N} \left[ \pi \cdot \mathcal{N}(x_n \mid \mu_1, \Sigma) \right]^{t_n} \cdot \left[ (1 - \pi) \cdot \mathcal{N}(x_n \mid \mu_2, \Sigma) \right]^{1 - t_n}$$

$$\ln p(x, t \mid \pi, \mu_1, \mu_2, \Sigma) = \sum_{n=1}^{N} \Big[ t_n \ln \pi + (1 - t_n) \ln(1 - \pi) + t_n \ln \mathcal{N}(x_n \mid \mu_1, \Sigma) + (1 - t_n) \ln \mathcal{N}(x_n \mid \mu_2, \Sigma) \Big]$$

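Focus on $\pi$

Only the first two terms of the log-likelihood depend on $\pi$:

$$\sum_{n=1}^{N} \Big[ t_n \ln \pi + (1 - t_n) \ln(1 - \pi) \Big].$$

Take the derivative with respect to $\pi$ and set it to zero. With $N_1 = \sum_n t_n$ the number of examples in $C_1$ and $N_2 = N - N_1$ the number in $C_2$,

$$\frac{N_1}{\pi} - \frac{N_2}{1 - \pi} = 0 \quad\Rightarrow\quad \pi = \frac{1}{N} \sum_{n=1}^{N} t_n = \frac{N_1}{N} = \frac{N_1}{N_1 + N_2}.$$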
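A quick numerical sanity check of the closed form, using made-up labels (function names hypothetical): a grid search over $\pi \in (0, 1)$ on the $\pi$-dependent part of the log-likelihood should land on $N_1 / N$. With 5 positives out of 8 labels, both values should come out as 0.625.

```python
import math


def pi_mle(t):
    """Closed-form MLE of pi = p(C1): the fraction of labels equal to 1, N1 / N."""
    return sum(t) / len(t)


def pi_log_lik(pi, t):
    """The part of the log-likelihood that depends on pi."""
    return sum(tn * math.log(pi) + (1 - tn) * math.log(1 - pi) for tn in t)


t = [1, 0, 0, 1, 1, 0, 1, 1]                # made-up labels: N1 = 5, N2 = 3
grid = [k / 1000 for k in range(1, 1000)]   # candidate values of pi in (0, 1)
best = max(grid, key=lambda p: pi_log_lik(p, t))
print(pi_mle(t), best)
```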

Focus on $\mu_1$

Only the term $\sum_n t_n \ln \mathcal{N}(x_n \mid \mu_1, \Sigma)$ depends on $\mu_1$; its gradient is $\sum_n t_n \Sigma^{-1} (x_n - \mu_1)$. Setting it to zero gives

$$\mu_1 = \frac{1}{N_1} \sum_{n=1}^{N} t_n x_n,$$

the mean of the inputs in class $C_1$. A similar calculation for $\mu_2$ gives

$$\mu_2 = \frac{1}{N_2} \sum_{n=1}^{N} (1 - t_n) x_n.$$

Focus on $\Sigma$

Take the gradient of the log-likelihood with respect to $\Sigma$ and set it to zero:

$$\Sigma = \frac{N_1}{N} S_1 + \frac{N_2}{N} S_2$$

where

$$S_1 = \frac{1}{N_1} \sum_{n : t_n = 1} (x_n - \mu_1)(x_n - \mu_1)^\top, \qquad S_2 = \frac{1}{N_2} \sum_{n : t_n = 0} (x_n - \mu_2)(x_n - \mu_2)^\top.$$

That is, $\Sigma$ is the average of the two within-class covariance matrices, weighted by class size.
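Putting the three estimates together, here is a minimal pure-Python sketch of the maximum-likelihood fit (hypothetical function name; it assumes both classes occur in the data so that $N_1, N_2 > 0$). On the made-up toy data below the class means are $(1, 1)$ and $(6, 5)$, and the shared covariance works out to $\mathrm{diag}(1, 2/3)$.

```python
def fit_shared_gaussian(X, t):
    """MLE of (pi, mu1, mu2, Sigma) for the shared-covariance Gaussian model.

    X is a list of N feature vectors (lists of length D); t is a list of 0/1 labels.
    """
    N, D = len(X), len(X[0])
    N1 = sum(t)
    N2 = N - N1
    pi = N1 / N
    mu1 = [sum(tn * x[d] for x, tn in zip(X, t)) / N1 for d in range(D)]
    mu2 = [sum((1 - tn) * x[d] for x, tn in zip(X, t)) / N2 for d in range(D)]

    def scatter(mu, label, count):
        # S_k = (1 / N_k) * sum over class-k points of (x_n - mu)(x_n - mu)^T
        S = [[0.0] * D for _ in range(D)]
        for x, tn in zip(X, t):
            if tn == label:
                diff = [x[d] - mu[d] for d in range(D)]
                for r in range(D):
                    for c in range(D):
                        S[r][c] += diff[r] * diff[c]
        return [[S[r][c] / count for c in range(D)] for r in range(D)]

    S1, S2 = scatter(mu1, 1, N1), scatter(mu2, 0, N2)
    sigma = [[N1 / N * S1[r][c] + N2 / N * S2[r][c] for c in range(D)] for r in range(D)]
    return pi, mu1, mu2, sigma


X = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0], [5.0, 5.0], [7.0, 5.0]]
t = [1, 1, 1, 1, 0, 0]
print(fit_shared_gaussian(X, t))
```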

Naive Bayes

Class-Conditional Distribution

For simplicity, let each component $x_i \in \{0, 1\}$, and assume the components are conditionally independent given the class. Writing $\mu_{ki} = p(x_i = 1 \mid C_k)$,

$$p(x \mid C_k) = \prod_{i=1}^{D} \mu_{ki}^{x_i} (1 - \mu_{ki})^{1 - x_i}.$$
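A minimal sketch of this class-conditional distribution (hypothetical function name, made-up parameters $\mu_{ki}$ with $D = 3$). Summing it over all $2^D$ binary vectors is a useful check that the product of Bernoulli factors is a valid distribution.

```python
import itertools
import math


def bernoulli_nb_likelihood(x, mu_k):
    """p(x|C_k) = prod_i mu_ki^x_i * (1 - mu_ki)^(1 - x_i) for a binary vector x."""
    return math.prod(m ** xi * (1 - m) ** (1 - xi) for xi, m in zip(x, mu_k))


mu_k = [0.8, 0.3, 0.5]   # made-up p(x_i = 1 | C_k) for D = 3
print(bernoulli_nb_likelihood([1, 0, 1], mu_k))

# Summed over all 2^D binary vectors, the likelihood should total 1.
total = sum(bernoulli_nb_likelihood(x, mu_k) for x in itertools.product([0, 1], repeat=3))
print(total)
```

`math.prod` requires Python 3.8 or later.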

Decision Rule

Start from

$$\ln p(x \mid C_1) + \ln p(C_1) > \ln p(x \mid C_2) + \ln p(C_2).$$

Substituting the naive Bayes likelihood:

$$\sum_{i=1}^{D} \Big[ x_i \ln \mu_{1i} + (1 - x_i) \ln(1 - \mu_{1i}) \Big] + \ln p(C_1) > \sum_{i=1}^{D} \Big[ x_i \ln \mu_{2i} + (1 - x_i) \ln(1 - \mu_{2i}) \Big] + \ln p(C_2).$$

Moving everything to the left-hand side and using $p(C_1) = \pi$, $p(C_2) = 1 - \pi$:

$$\sum_{i=1}^{D} \left( x_i \ln \frac{\mu_{1i}}{\mu_{2i}} + (1 - x_i) \ln \frac{1 - \mu_{1i}}{1 - \mu_{2i}} \right) + \ln\left( \frac{\pi}{1 - \pi} \right) > 0.$$

Collecting the terms that multiply $x_i$:

$$\underbrace{\sum_{i=1}^{D} x_i \ln \frac{\mu_{1i} (1 - \mu_{2i})}{\mu_{2i} (1 - \mu_{1i})}}_{w^\top x} + \underbrace{\sum_{i=1}^{D} \ln \frac{1 - \mu_{1i}}{1 - \mu_{2i}} + \ln\left( \frac{\pi}{1 - \pi} \right)}_{b} > 0.$$

So naive Bayes with binary features is also a linear classifier.
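To check the algebra, this sketch (hypothetical names, made-up parameters with $D = 3$) computes $w$ and $b$ from the last line of the derivation and compares $w^\top x + b$ against the direct log-ratio for every binary input $x$; the two columns should match.

```python
import itertools
import math


def nb_linear_weights(pi, mu1, mu2):
    """w_i = ln[mu1i (1 - mu2i) / (mu2i (1 - mu1i))] and b = sum_i ln[(1 - mu1i) / (1 - mu2i)] + ln[pi / (1 - pi)]."""
    w = [math.log(a * (1 - c) / (c * (1 - a))) for a, c in zip(mu1, mu2)]
    b = sum(math.log((1 - a) / (1 - c)) for a, c in zip(mu1, mu2)) + math.log(pi / (1 - pi))
    return w, b


def nb_log_ratio(x, pi, mu1, mu2):
    """Direct form [ln p(x|C1) + ln pi] - [ln p(x|C2) + ln(1 - pi)]."""
    def log_lik(mu):
        return sum(xi * math.log(m) + (1 - xi) * math.log(1 - m) for xi, m in zip(x, mu))
    return (log_lik(mu1) + math.log(pi)) - (log_lik(mu2) + math.log(1 - pi))


pi, mu1, mu2 = 0.35, [0.9, 0.2, 0.6], [0.3, 0.5, 0.1]   # made-up parameters, D = 3
w, b = nb_linear_weights(pi, mu1, mu2)
for x in itertools.product([0, 1], repeat=3):
    linear = sum(wi * xi for wi, xi in zip(w, x)) + b
    print(x, round(linear, 6), round(nb_log_ratio(x, pi, mu1, mu2), 6))
```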

Discriminative Classifiers


Rewriting the Model

$$p(C_1 \mid x) = \frac{p(x \mid C_1) \cdot p(C_1)}{p(x)} = \frac{p(x \mid C_1) \cdot p(C_1)}{p(x \mid C_1) \cdot p(C_1) + p(x \mid C_2) \cdot p(C_2)} = \frac{\exp(a_1)}{\exp(a_1) + \exp(a_2)}$$

where $a_k = \ln\big( p(x \mid C_k) \cdot p(C_k) \big)$. Dividing numerator and denominator by $\exp(a_1)$,

$$p(C_1 \mid x) = \frac{1}{1 + \exp(-a)} = \sigma(a)$$

where $a = a_1 - a_2$ and $\sigma$ is the logistic sigmoid.
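Numerically, $\sigma(a)$ and the normalized-exponential form give the same value; this sketch (hypothetical function names, made-up values of $a_1$ and $a_2$) evaluates both. Both functions are written to avoid overflow in `math.exp`, which a literal `1 / (1 + exp(-a))` would hit for very negative $a$.

```python
import math


def sigmoid(a):
    """Logistic sigmoid 1 / (1 + exp(-a)), evaluated without overflow for large |a|."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    z = math.exp(a)          # a < 0, so exp(a) < 1 cannot overflow
    return z / (1.0 + z)


def posterior_two_class(a1, a2):
    """exp(a1) / (exp(a1) + exp(a2)) with a_k = ln(p(x|C_k) p(C_k)).

    Shifting both exponents by max(a1, a2) leaves the ratio unchanged but avoids overflow.
    """
    m = max(a1, a2)
    e1, e2 = math.exp(a1 - m), math.exp(a2 - m)
    return e1 / (e1 + e2)


# Made-up log joint values a1, a2; the two expressions should agree with a = a1 - a2.
for a1, a2 in [(-3.2, -1.1), (10.0, -5.0), (-700.0, -702.0)]:
    print(posterior_two_class(a1, a2), sigmoid(a1 - a2))
```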

Key Idea

Recall that in the Gaussian case with $\Sigma_i = \Sigma$,

$$a = \ln \frac{p(x \mid C_1) \cdot p(C_1)}{p(x \mid C_2) \cdot p(C_2)} = \ln p(x \mid C_1) + \ln p(C_1) - \ln p(x \mid C_2) - \ln p(C_2),$$

which simplifies to

$$a = \underbrace{(\mu_1 - \mu_2)^\top \Sigma^{-1}}_{w^\top} x + b = \underbrace{\mu_1^\top \Sigma^{-1}}_{w_1^\top} x - \underbrace{\mu_2^\top \Sigma^{-1}}_{w_2^\top} x + b$$

where

$$b = \ln\left( \frac{\pi}{1 - \pi} \right) + \frac{1}{2} \mu_2^\top \Sigma^{-1} \mu_2 - \frac{1}{2} \mu_1^\top \Sigma^{-1} \mu_1.$$

Hence $p(C_1 \mid x) = \sigma(w^\top x + b)$. Why not model $a$ directly as $w^\top x + b$, with an arbitrary $w$ and $b$ learned from the data rather than derived from $\pi$, $\mu_1$, $\mu_2$ and $\Sigma$?
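The claim that $p(C_1 \mid x) = \sigma(w^\top x + b)$ in the shared-covariance case can be verified numerically. This sketch (pure Python, $D = 2$, hypothetical helper names and made-up parameters) computes the posterior once with Bayes' theorem from the Gaussian densities and once as a logistic function of a linear score.

```python
import math


def inv2(m):
    """Inverse of a 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matvec(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


def gauss_pdf_2d(x, mu, sigma):
    """N(x | mu, Sigma) for D = 2, so (2 pi)^(D/2) = 2 pi."""
    (a, b), (c, d) = sigma
    diff = [x[0] - mu[0], x[1] - mu[1]]
    maha = dot(diff, matvec(inv2(sigma), diff))
    return math.exp(-0.5 * maha) / (2 * math.pi * math.sqrt(a * d - b * c))


def sigmoid(a):
    """Overflow-safe logistic sigmoid."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    z = math.exp(a)
    return z / (1.0 + z)


def bayes_posterior(x, pi, mu1, mu2, sigma):
    """p(C1|x) from the generative model via Bayes' theorem."""
    joint1 = gauss_pdf_2d(x, mu1, sigma) * pi
    joint2 = gauss_pdf_2d(x, mu2, sigma) * (1 - pi)
    return joint1 / (joint1 + joint2)


def logistic_posterior(x, pi, mu1, mu2, sigma):
    """sigma(w^T x + b) with w = Sigma^{-1}(mu1 - mu2) and b as defined above."""
    s_inv = inv2(sigma)
    w = matvec(s_inv, [mu1[0] - mu2[0], mu1[1] - mu2[1]])
    b = (math.log(pi / (1 - pi))
         + 0.5 * dot(mu2, matvec(s_inv, mu2))
         - 0.5 * dot(mu1, matvec(s_inv, mu1)))
    return sigmoid(dot(w, x) + b)


pi, mu1, mu2 = 0.4, [2.0, 1.0], [0.0, -1.0]
sigma = [[1.5, 0.5], [0.5, 1.0]]
for x in ([0.0, 0.0], [1.0, 2.0], [-3.0, 0.5]):
    print(bayes_posterior(x, pi, mu1, mu2, sigma), logistic_posterior(x, pi, mu1, mu2, sigma))
```

Here $w$ and $b$ are still computed from $\pi$, $\mu_1$, $\mu_2$ and $\Sigma$; a discriminative model would instead treat them as free parameters.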

Questions?
