
Page 1: Bayesian Decision Theory – Continuous Features

Bayesian Decision Theory – Continuous Features

Team teaching

Page 2: Bayesian Decision Theory – Continuous Features

Pattern Classification, Chapter 2 (Part 1) 2

Introduction

• The sea bass/salmon example

– State of nature, prior

• State of nature is a random variable

• The catch of salmon and sea bass is equiprobable

– P(ω1) = P(ω2) (uniform priors)

– P(ω1) + P(ω2) = 1 (exclusivity and exhaustivity)

Page 3: Bayesian Decision Theory – Continuous Features


• Decision rule with only the prior information
– Decide ω1 if P(ω1) > P(ω2); otherwise decide ω2

• Use of the class-conditional information

• P(x | ω1) and P(x | ω2) describe the difference in lightness between populations of sea bass and salmon

Page 4: Bayesian Decision Theory – Continuous Features


Page 5: Bayesian Decision Theory – Continuous Features


• Posterior, likelihood, evidence

– P(ωj | x) = P(x | ωj) · P(ωj) / P(x)

– where, in the case of two categories, the evidence is

  P(x) = Σ_{j=1}^{2} P(x | ωj) P(ωj)

– Posterior = (Likelihood · Prior) / Evidence
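A minimal numeric sketch of this rule, using the uniform priors from the earlier slide; the likelihood values at the observed x are made up purely for illustration:

```python
# Bayes rule for two categories: P(w_j | x) = P(x | w_j) P(w_j) / P(x).
# Priors are the uniform priors from the example; the likelihood values
# at the observed x are illustrative assumptions.
priors = [0.5, 0.5]            # P(w1), P(w2)
likelihoods = [0.6, 0.2]       # assumed P(x | w1), P(x | w2)

evidence = sum(l * p for l, p in zip(likelihoods, priors))            # P(x)
posteriors = [l * p / evidence for l, p in zip(likelihoods, priors)]  # P(w_j | x)
print(posteriors)  # [0.75, 0.25]
```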

Page 6: Bayesian Decision Theory – Continuous Features


Page 7: Bayesian Decision Theory – Continuous Features


• Decision given the posterior probabilities

X is an observation for which:

if P(ω1 | x) > P(ω2 | x), the true state of nature = ω1

if P(ω1 | x) < P(ω2 | x), the true state of nature = ω2

Therefore, whenever we observe a particular x, the probability of error is:

P(error | x) = P(ω1 | x) if we decide ω2

P(error | x) = P(ω2 | x) if we decide ω1

Page 8: Bayesian Decision Theory – Continuous Features


• Minimizing the probability of error

• Decide ω1 if P(ω1 | x) > P(ω2 | x); otherwise decide ω2

Therefore: P(error | x) = min [P(ω1 | x), P(ω2 | x)]

(Bayes decision)
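A small sketch of this minimum-error rule, continuing the illustrative posteriors from the previous snippet:

```python
# Bayes minimum-error decision for two categories: pick the class with the
# larger posterior; for two classes the error probability at x is the
# smaller of the two posteriors.
def bayes_decision(posteriors):
    decision = max(range(len(posteriors)), key=lambda j: posteriors[j])
    p_error = min(posteriors)   # P(error | x), valid for the two-class case
    return decision, p_error

print(bayes_decision([0.75, 0.25]))  # -> (0, 0.25): decide w1, P(error|x) = 0.25
```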

Page 9: Bayesian Decision Theory – Continuous Features


Bayesian Decision Theory – Continuous Features

• Generalization of the preceding ideas

– Use of more than one feature
– Use of more than two states of nature
– Allowing actions, not only deciding on the state of nature
– Introducing a loss function that is more general than the probability of error

Page 10: Bayesian Decision Theory – Continuous Features


• Allowing actions other than classification primarily allows the possibility of rejection

• Refusing to make a decision in close or bad cases!

• The loss function states how costly each action is

Page 11: Bayesian Decision Theory – Continuous Features


Let {ω1, ω2, …, ωc} be the set of c states of nature (or “categories”)

Let {α1, α2, …, αa} be the set of a possible actions

Let λ(αi | ωj) be the loss incurred for taking action αi when the state of nature is ωj

Page 12: Bayesian Decision Theory – Continuous Features


Overall risk
R = sum of all R(αi | x) for i = 1, …, a

Minimizing R ⟺ minimizing R(αi | x) for each i = 1, …, a

Conditional risk

R(αi | x) = Σ_{j=1}^{c} λ(αi | ωj) P(ωj | x)
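A short sketch of the conditional-risk computation; the loss matrix λ(αi | ωj) and the posteriors below are hypothetical values used only to show the mechanics:

```python
import numpy as np

# Hypothetical loss matrix: rows = actions a_i, columns = states of nature w_j,
# so loss[i, j] = lambda(a_i | w_j). All values are illustrative assumptions.
loss = np.array([[0.0, 1.0],     # action a_1
                 [2.0, 0.0]])    # action a_2

posteriors = np.array([0.7, 0.3])   # assumed P(w_1 | x), P(w_2 | x)

# Conditional risk R(a_i | x) = sum_j lambda(a_i | w_j) P(w_j | x)
cond_risk = loss @ posteriors        # -> array([0.3, 1.4])

best_action = int(np.argmin(cond_risk))   # pick the minimum-risk action
print(cond_risk, best_action)             # [0.3 1.4] 0
```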

Page 13: Bayesian Decision Theory – Continuous Features


Select the action αi for which R(αi | x) is minimum.

R is then minimum, and R in this case is called the Bayes risk = best performance that can be achieved!

Page 14: Bayesian Decision Theory – Continuous Features

Lecture note for Stat 231: Pattern Recognition and Machine Learning

Diagram of pattern classification

Procedure of pattern recognition and decision making

[Diagram: subjects → observables X → features x → inner belief w → action α]

X — all the observables using existing sensors and instruments
x — a set of features selected from components of X, or linear/non-linear functions of X
w — our inner belief/perception about the subject class
α — the action that we take for x

We denote the three spaces by

x = (x1, x2, …, xd) ∈ Ω_d is a feature vector

w ∈ Ω_C = {w1, w2, …, wk} is the index of the class

α ∈ Ω_α is the action

Page 15: Bayesian Decision Theory – Continuous Features


Examples

Ex 1: Fish classification

X=I is the image of fish,

x =(brightness, length, fin#, ….)

w is our belief of what the fish type is, Ω_C = {“sea bass”, “salmon”, “trout”, …}

α is a decision for the fish type; in this case Ω_α = Ω_C = {“sea bass”, “salmon”, “trout”, …}

Ex 2: Medical diagnosis
X = all the available medical tests and imaging scans that a doctor can order for a patient

x = (blood pressure, glucose level, cough, x-ray, …)

w is an illness type, Ω_C = {“Flu”, “cold”, “TB”, “pneumonia”, “lung cancer”, …}

α is a decision for treatment, Ω_α = {“Tylenol”, “Hospitalize”, …}

Page 16: Bayesian Decision Theory – Continuous Features


Tasks

[Diagram: subjects → (sensor control) → observables X → (selecting informative features) → features x → (statistical inference) → inner belief w → (risk/cost minimization) → decision α]

In Bayesian decision theory, we are concerned with the last three steps in the big ellipse, assuming that the observables are given and the features are selected.

Page 17: Bayesian Decision Theory – Continuous Features

Bayes Decision
It is decision making when all underlying probability distributions are known. It is optimal given that the distributions are known.

For two classes ω1 and ω2,

Prior probabilities for an unknown new observation:

P(ω1): the new observation belongs to class ω1
P(ω2): the new observation belongs to class ω2

P(ω1) + P(ω2) = 1

This reflects our prior knowledge. It is our decision rule when no feature on the new object is available:
Classify as class ω1 if P(ω1) > P(ω2)

Page 18: Bayesian Decision Theory – Continuous Features


Bayesian Decision Theory

[Diagram: features x → (statistical inference) → inner belief p(w|x) → (risk/cost minimization) → decision α(x)]

Two probability tables: a). Prior p(w) b). Likelihood p(x|w)

A risk/cost function (a two-way table) λ(α, w)

The belief on the class w is computed by the Bayes rule:

p(w | x) = p(x | w) p(w) / p(x)

The risk is computed by:

R(αi | x) = Σ_{j=1}^{k} λ(αi | wj) p(wj | x)
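To make the two probability tables and the cost table concrete, here is a small sketch with a discrete feature; every numerical value is an assumption for illustration:

```python
import numpy as np

# a) prior table p(w), b) likelihood table p(x|w) over 3 discrete feature
# values, and a two-way cost table lambda(alpha, w). Numbers are assumed.
prior = np.array([0.6, 0.4])                 # p(w1), p(w2)
likelihood = np.array([[0.2, 0.5, 0.3],      # p(x | w1)
                       [0.6, 0.3, 0.1]])     # p(x | w2)
loss = np.array([[0.0, 5.0],                 # lambda(a1 | w1), lambda(a1 | w2)
                 [1.0, 0.0]])                # lambda(a2 | w1), lambda(a2 | w2)

x = 1                                        # index of the observed feature value
joint = likelihood[:, x] * prior             # p(x | w) p(w)
posterior = joint / joint.sum()              # Bayes rule: p(w | x)

risk = loss @ posterior                      # R(a_i | x) = sum_j lambda(a_i|w_j) p(w_j|x)
print(posterior, risk, risk.argmin())        # minimum-risk action index
```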

Page 19: Bayesian Decision Theory – Continuous Features

Bayes Decision
We observe features on each object.

P(x | ω1) and P(x | ω2): class-specific densities

The Bayes rule: P(ωj | x) = P(x | ωj) P(ωj) / P(x)

Page 20: Bayesian Decision Theory – Continuous Features


Decision Rule

A decision rule is a mapping function from feature space to the set of actions,

α(x): Ω_d → Ω_α

A decision is made to minimize the average cost/risk,

R = ∫ R(α(x) | x) p(x) dx

It is minimized when our decision is made to minimize the cost/risk for each instance x:

α(x) = argmin_α R(α | x) = argmin_α Σ_{j=1}^{k} λ(α | wj) p(wj | x)

We will show that randomized decisions won’t be optimal.

Page 21: Bayesian Decision Theory – Continuous Features


Bayesian error

In a special case like fish classification, where the action is the classification itself, we assume a 0/1 loss:

λ(αi | wj) = 0 if i = j
λ(αi | wj) = 1 if i ≠ j

The risk for classifying x to class i is

R(αi | x) = Σ_{j≠i} p(wj | x) = 1 − p(wi | x)

The optimal decision is to choose the class that has maximum posterior probability:

α(x) = argmin_i (1 − p(wi | x)) = argmax_i p(wi | x)

The total risk for a decision rule, in this case, is called the Bayesian error:

R = p(error) = ∫ p(error | x) p(x) dx = ∫ (1 − p(wα(x) | x)) p(x) dx
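A quick numerical check that, under 0/1 loss, minimizing the conditional risk is the same as picking the maximum posterior (the posterior values are assumed):

```python
import numpy as np

# 0/1 loss: lambda(a_i | w_j) = 0 if i == j else 1
posterior = np.array([0.2, 0.5, 0.3])       # assumed p(w_j | x) for 3 classes
loss = 1.0 - np.eye(len(posterior))         # 0/1 loss matrix

risk = loss @ posterior                     # R(a_i | x) = 1 - p(w_i | x)
assert np.allclose(risk, 1.0 - posterior)

# argmin of the risk coincides with argmax of the posterior
assert risk.argmin() == posterior.argmax()  # both pick class index 1
```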

Page 22: Bayesian Decision Theory – Continuous Features


An example of fish classification

Page 23: Bayesian Decision Theory – Continuous Features

Example

3. It is known that 1% of the population suffers from a particular disease. A blood test has a 97% chance of identifying the disease for a diseased individual, but also has a 6% chance of falsely indicating that a healthy person has the disease.

a. What is the probability that a random person has a positive blood test?

b. If a blood test is positive, what’s the probability that the person has the disease?

c. If a blood test is negative, what’s the probability that the person does not have the disease?

Page 24: Bayesian Decision Theory – Continuous Features

• S is a boolean RV indicating whether a person has a disease. P(S) = 0.01; P(S’) = 0.99.

• T is a boolean RV indicating the test result (T = true indicates that the test is positive).
– P(T|S) = 0.97; P(T’|S) = 0.03
– P(T|S’) = 0.06; P(T’|S’) = 0.94

• (a) P(T) = P(S)P(T|S) + P(S’)P(T|S’) = 0.01*0.97 + 0.99*0.06 = 0.0691

• (b) P(S|T) = P(T|S)*P(S)/P(T) = 0.97*0.01/0.0691 = 0.1403

• (c) P(S’|T’) = P(T’|S’)P(S’)/P(T’) = P(T’|S’)P(S’)/(1 − P(T)) = 0.94*0.99/(1 − 0.0691) = 0.9997
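A quick script to reproduce these three numbers; nothing here is assumed beyond the probabilities stated in the problem:

```python
# Reproduce the blood-test calculations from the slide.
p_s = 0.01                     # P(S): prior probability of disease
p_t_given_s = 0.97             # P(T|S): positive test given diseased
p_t_given_not_s = 0.06         # P(T|S'): false-positive rate

p_t = p_s * p_t_given_s + (1 - p_s) * p_t_given_not_s                 # (a) P(T)
p_s_given_t = p_t_given_s * p_s / p_t                                 # (b) P(S|T)
p_not_s_given_not_t = (1 - p_t_given_not_s) * (1 - p_s) / (1 - p_t)   # (c) P(S'|T')

print(round(p_t, 4), round(p_s_given_t, 4), round(p_not_s_given_not_t, 4))
# 0.0691 0.1404 0.9997   (the slide truncates the middle value to 0.1403)
```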

Page 25: Bayesian Decision Theory – Continuous Features

• A physician can take two possible actions after seeing a patient’s test results:
– A1 – Decide the patient is sick
– A2 – Decide the patient is healthy

• The costs of those actions are:
– If the patient is healthy, but the doctor decides he/she is sick – $20,000
– If the patient is sick, but the doctor decides he/she is healthy – $100,000

• When the test is positive:
– R(A1|T) = R(A1|S)P(S|T) + R(A1|S’)P(S’|T) = R(A1|S’)P(S’|T) = 20,000 * 0.8597 = $17,194.00
– R(A2|T) = R(A2|S)P(S|T) + R(A2|S’)P(S’|T) = R(A2|S)P(S|T) = 100,000 * 0.1403 = $14,030.00

Page 26: Bayesian Decision Theory – Continuous Features

• A physician can take three possible actions after seeing a patient’s test results:
– Decide the patient is sick
– Decide the patient is healthy
– Send the patient for another test

• The costs of those actions are:
– If the patient is healthy, but the doctor decides he/she is sick – $20,000
– If the patient is sick, but the doctor decides he/she is healthy – $100,000
– Sending the patient for another test costs $15,000

Page 27: Bayesian Decision Theory – Continuous Features

• When the test is positive:
– R(A1|T) = R(A1|S)P(S|T) + R(A1|S’)P(S’|T) = R(A1|S’)P(S’|T) = 20,000 * 0.8597 = $17,194.00
– R(A2|T) = R(A2|S)P(S|T) + R(A2|S’)P(S’|T) = R(A2|S)P(S|T) = 100,000 * 0.1403 = $14,030.00
– R(A3|T) = $15,000.00

• When the test is negative:
– R(A1|T’) = R(A1|S)P(S|T’) + R(A1|S’)P(S’|T’) = R(A1|S’)P(S’|T’) = 20,000 * 0.9997 = $19,994.00
– R(A2|T’) = R(A2|S)P(S|T’) + R(A2|S’)P(S’|T’) = R(A2|S)P(S|T’) = 100,000 * 0.0003 = $30.00
– R(A3|T’) = $15,000.00

Page 28: Bayesian Decision Theory – Continuous Features

Exercise
• Consider the example of the sea bass – salmon classifier with two possible actions: A1: decide the input is sea bass; A2: decide the input is salmon. The priors for sea bass and salmon are 2/3 and 1/3, respectively.

• The cost of classifying a fish as salmon when it truly is sea bass is $2, and the cost of classifying a fish as sea bass when it truly is salmon is $1.

• Find the decision for the input X = 13, where the likelihoods are P(X|ω1) = 0.28 and P(X|ω2) = 0.17.
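A sketch of how the exercise can be checked numerically, following the conditional-risk formula from the earlier slides; the final decision is simply the action with the smaller risk:

```python
# Sketch for the exercise: conditional risks for the two actions at X = 13.
priors = [2/3, 1/3]             # P(w1) sea bass, P(w2) salmon
likelihoods = [0.28, 0.17]      # P(X|w1), P(X|w2) at X = 13

evidence = sum(l * p for l, p in zip(likelihoods, priors))
post = [l * p / evidence for l, p in zip(likelihoods, priors)]   # P(w_j | X)

# Loss values from the exercise: lambda(A2|w1) = 2 (sea bass called salmon),
# lambda(A1|w2) = 1 (salmon called sea bass); correct decisions cost 0.
risk_a1 = 1 * post[1]           # R(A1|x): decide sea bass
risk_a2 = 2 * post[0]           # R(A2|x): decide salmon

print(post, risk_a1, risk_a2)   # choose the action with the smaller risk
```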