Bayesian Decision Theory – Continuous Features
Team teaching
Pattern Classification, Chapter 2 (Part 1)
Introduction
• The sea bass/salmon example
– State of nature, prior
• State of nature is a random variable
• The catch of salmon and sea bass is equiprobable
– P(ω1) = P(ω2) (uniform priors)
– P(ω1) + P(ω2) = 1 (exclusivity and exhaustivity)
• Decision rule with only the prior information
– Decide ω1 if P(ω1) > P(ω2); otherwise decide ω2
• Use of the class-conditional information
• P(x | ω1) and P(x | ω2) describe the difference in lightness between populations of sea bass and salmon
• Posterior, likelihood, evidence
– P(ωj | x) = P(x | ωj) P(ωj) / P(x)
– where, in the case of two categories, $P(x) = \sum_{j=1}^{2} P(x \mid \omega_j)\, P(\omega_j)$
– Posterior = (Likelihood × Prior) / Evidence
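A minimal Python sketch of the posterior computation above. The function name `posteriors` and the numbers passed to it are illustrative assumptions, not values from the slides.

```python
def posteriors(likelihoods, priors):
    """Return P(w_j | x) for every class j using Bayes' rule."""
    # Numerators: P(x | w_j) * P(w_j)
    joint = [lk * pr for lk, pr in zip(likelihoods, priors)]
    # Evidence: P(x) = sum over j of P(x | w_j) * P(w_j)
    evidence = sum(joint)
    return [j / evidence for j in joint]


# Hypothetical inputs: uniform priors and two likelihood values at some lightness x.
post = posteriors(likelihoods=[0.6, 0.2], priors=[0.5, 0.5])
print(post)
```

Dividing by the evidence only rescales the numerators so that the posteriors sum to 1; it does not change which class has the larger posterior.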
• Decision given the posterior probabilities
x is an observation for which:
if P(ω1 | x) > P(ω2 | x) True state of nature = ω1
if P(ω1 | x) < P(ω2 | x) True state of nature = ω2
Therefore: whenever we observe a particular x, the probability of error is:
P(error | x) = P(ω1 | x) if we decide ω2
P(error | x) = P(ω2 | x) if we decide ω1
• Minimizing the probability of error
• Decide ω1 if P(ω1 | x) > P(ω2 | x); otherwise decide ω2
Therefore: P(error | x) = min [P(ω1 | x), P(ω2 | x)]
(Bayes decision)
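The minimum-error rule can be sketched in a few lines of Python. The helper name `bayes_decide` and the posterior values are assumptions chosen for illustration.

```python
def bayes_decide(posteriors):
    """Return (index of the chosen class, P(error | x)) for one observation x."""
    # Choose the class with the largest posterior probability.
    best = max(range(len(posteriors)), key=lambda j: posteriors[j])
    # The error probability is the posterior mass of the classes not chosen.
    # With two classes whose posteriors sum to 1, this equals min[P(w1|x), P(w2|x)].
    p_error = 1.0 - posteriors[best]
    return best, p_error


# Hypothetical posteriors P(w1 | x) = 0.75 and P(w2 | x) = 0.25.
decision, p_error = bayes_decide([0.75, 0.25])
print(f"decide w{decision + 1}, P(error | x) = {p_error:.2f}")
```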
Bayesian Decision Theory – Continuous Features
• Generalization of the preceding ideas
– Use of more than one feature
– Use of more than two states of nature
– Allowing actions other than deciding on the state of nature
– Introducing a loss function, which is more general than the probability of error
• Allowing actions other than classification primarily allows the possibility of rejection
• Refusing to make a decision in close or bad cases!
• The loss function states how costly each action taken is
Let {ω1, ω2,…, ωc} be the set of c states of nature (or “categories”)
Let {α1, α2,…, αa} be the set of possible actions
Let λ(αi | ωj) be the loss incurred for taking action αi when the state of nature is ωj
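Conditional risk
$R(\alpha_i \mid x) = \sum_{j=1}^{c} \lambda(\alpha_i \mid \omega_j)\, P(\omega_j \mid x)$ for i = 1,…,a
Overall risk
R = the expected loss of the decision rule, i.e. the conditional risk of the chosen action averaged over all x
Minimizing R ⇔ minimizing R(αi | x) over i = 1,…,a for every x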
Select the action αi for which R(αi | x) is minimum
R is then minimum; this minimum R is called the Bayes risk, the best performance that can be achieved!
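A short Python sketch of the conditional-risk rule, including a rejection action. The loss table (unit cost for a misclassification, 0.25 for rejecting) and all names are hypothetical choices made for illustration; the slides do not give numeric losses here.

```python
def conditional_risks(loss, posteriors):
    """R(a_i | x) = sum over j of loss[i][j] * P(w_j | x), for every action i."""
    return [sum(l_ij * p_j for l_ij, p_j in zip(row, posteriors)) for row in loss]


def bayes_action(loss, posteriors):
    """Return (index of the minimum-risk action, list of all conditional risks)."""
    risks = conditional_risks(loss, posteriors)
    best = min(range(len(risks)), key=lambda i: risks[i])
    return best, risks


# Hypothetical loss table: rows are actions, columns are the true states w1, w2.
LOSS = [
    [0.0, 1.0],    # decide w1
    [1.0, 0.0],    # decide w2
    [0.25, 0.25],  # reject (refuse to decide)
]
ACTIONS = ["decide w1", "decide w2", "reject"]

for post in ([0.9, 0.1], [0.55, 0.45]):
    i, risks = bayes_action(LOSS, post)
    print(post, "->", ACTIONS[i], risks)
```

With these assumed losses, a confident posterior leads to a classification, while a close call such as (0.55, 0.45) makes rejection the cheapest action.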
Lecture note for Stat 231: Pattern Recognition and Machine Learning
Diagram of pattern classification
Procedure of pattern recognition and decision making
[Diagram: subjects → observables X → features x → inner belief w → action α]
X --- all the observables using existing sensors and instruments
x --- a set of features selected from components of X, or linear/non-linear functions of X
w --- our inner belief/perception about the subject class
α --- the action that we take for x
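We denote the three spaces by $\Omega^d$, $\Omega^C$, $\Omega^\alpha$:
$x \in \Omega^d$, where $x = (x_1, x_2, \ldots, x_d)$ is a vector
$w \in \Omega^C$, where $w$ is the index of the class and $\Omega^C = \{w_1, w_2, \ldots, w_k\}$
$\alpha \in \Omega^\alpha$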
Examples
Ex 1: Fish classification
X = I is the image of the fish
x = (brightness, length, fin#, …)
w is our belief about what the fish type is; the class space is $\Omega^C$ = {“sea bass”, “salmon”, “trout”, …}
α is a decision for the fish type; in this case the action space equals the class space, $\Omega^\alpha = \Omega^C$ = {“sea bass”, “salmon”, “trout”, …}
Ex 2: Medical diagnosis
X = all the available medical tests and imaging scans that a doctor can order for a patient
x = (blood pressure, glucose level, cough, x-ray, …)
w is an illness type; the class space is $\Omega^C$ = {“Flu”, “cold”, “TB”, “pneumonia”, “lung cancer”, …}
α is a decision for treatment; the action space is $\Omega^\alpha$ = {“Tylenol”, “Hospitalize”, …}
Tasks
[Diagram: subjects → (control sensors) → observables X → (selecting informative features) → features x → (statistical inference) → inner belief w → (risk/cost minimization) → decision α]
In Bayesian decision theory, we are concerned with the last three steps in the big ellipse, assuming that the observables are given and the features are selected.
Bayes Decision
It is the decision making when all underlying probability distributions are known. It is optimal given that the distributions are known.
For two classes ω1 and ω2,
Prior probabilities for an unknown new observation:
P(ω1): the new observation belongs to class 1
P(ω2): the new observation belongs to class 2
P(ω1) + P(ω2) = 1
The priors reflect our prior knowledge. They give our decision rule when no feature of the new object is available:
Classify as class 1 if P(ω1) > P(ω2)
Bayesian Decision Theory
[Diagram: features x → (statistical inference) → inner belief p(w|x) → (risk/cost minimization) → decision α(x)]
Two probability tables: a) Prior p(w); b) Likelihood p(x|w)
A risk/cost function (a two-way table) λ(α | w)
The belief on the class w is computed by the Bayes rule
$p(w \mid x) = \frac{p(x \mid w)\, p(w)}{p(x)}$
The risk is computed by
$R(\alpha_i \mid x) = \sum_{j=1}^{k} \lambda(\alpha_i \mid w_j)\, p(w_j \mid x)$
Bayes Decision
We observe features on each object.
P(x | ω1) & P(x | ω2): class-specific densities
The Bayes rule: P(ωj | x) = P(x | ωj) P(ωj) / P(x)
Decision Rule
A decision rule is a mapping function from feature space to the set of actions
$\alpha(x): \Omega^d \to \Omega^\alpha$
We will show that randomized decisions won’t be optimal.
A decision is made to minimize the average cost / risk,
$R = \int R(\alpha(x) \mid x)\, p(x)\, dx$
It is minimized when our decision is made to minimize the cost / risk for each instance x:
$\alpha(x) = \arg\min_{\Omega^\alpha} R(\alpha \mid x) = \arg\min_{\Omega^\alpha} \sum_{j=1}^{k} \lambda(\alpha \mid w_j)\, p(w_j \mid x)$
Bayesian error
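In a special case like fish classification, where the action is classification, we assume a 0/1 error:
$\lambda(\alpha_i \mid w_j) = 0$ if $\alpha_i = w_j$
$\lambda(\alpha_i \mid w_j) = 1$ if $\alpha_i \neq w_j$
The risk for classifying x to class $\alpha_i$ is
$R(\alpha_i \mid x) = \sum_{w_j \neq \alpha_i} p(w_j \mid x) = 1 - p(\alpha_i \mid x)$
The optimal decision is to choose the class that has maximum posterior probability:
$\alpha(x) = \arg\min_{\Omega^\alpha} \left(1 - p(\alpha \mid x)\right) = \arg\max_{\Omega^\alpha} p(\alpha \mid x)$
The total risk for a decision rule, in this case, is called the Bayesian error:
$R = p(\text{error}) = \int p(\text{error} \mid x)\, p(x)\, dx = \int \left(1 - p(\alpha(x) \mid x)\right) p(x)\, dx$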
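The Bayesian error integral can be approximated numerically. The sketch below assumes one-dimensional Gaussian class-conditional densities with equal priors; the densities, the parameter values, and the function names are illustrative assumptions, since the slides do not specify a distribution.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution (assumed form of p(x | w))."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def bayes_error(priors, params, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of (1 - max_j p(w_j | x)) p(x) dx."""
    dx = (hi - lo) / steps
    total = 0.0
    for n in range(steps):
        x = lo + (n + 0.5) * dx
        # joint[j] = p(x | w_j) * p(w_j); their sum is the evidence p(x)
        joint = [pr * gaussian_pdf(x, m, s) for pr, (m, s) in zip(priors, params)]
        # (1 - max posterior) * p(x) simplifies to p(x) - max joint
        total += (sum(joint) - max(joint)) * dx
    return total


# Hypothetical classes: means 0 and 2, unit standard deviation, equal priors.
err = bayes_error(priors=[0.5, 0.5], params=[(0.0, 1.0), (2.0, 1.0)], lo=-10.0, hi=12.0)
print(round(err, 4))
```

For this symmetric setup the decision boundary sits at x = 1, so the result can be checked against the standard normal tail probability beyond one standard deviation, about 0.1587.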
An example of fish classification
Example
3. It is known that 1% of the population suffers from a particular disease. A blood test has a 97% chance of identifying the disease in a diseased individual, but also has a 6% chance of falsely indicating that a healthy person has the disease.
a. What is the probability that a random person has a positive blood test?
b. If a blood test is positive, what is the probability that the person has the disease?
c. If a blood test is negative, what is the probability that the person does not have the disease?
• S is a boolean RV indicating whether a person has the disease. P(S) = 0.01; P(S’) = 0.99.
• T is a boolean RV indicating the test result (T = true indicates that the test is positive).
– P(T|S) = 0.97; P(T’|S) = 0.03
– P(T|S’) = 0.06; P(T’|S’) = 0.94
• (a) P(T) = P(S) P(T|S) + P(S’) P(T|S’) = 0.01 × 0.97 + 0.99 × 0.06 = 0.0691
• (b) P(S|T) = P(T|S) P(S) / P(T) = 0.97 × 0.01 / 0.0691 ≈ 0.1403
• (c) P(S’|T’) = P(T’|S’) P(S’) / P(T’) = P(T’|S’) P(S’) / (1 − P(T)) = 0.94 × 0.99 / (1 − 0.0691) ≈ 0.9997
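The three answers can be reproduced with a few lines of Python. The variable names are illustrative; the probabilities are the ones given in the problem.

```python
p_s = 0.01              # P(S): prior probability of the disease
p_t_given_s = 0.97      # P(T | S): positive test for a diseased person
p_t_given_not_s = 0.06  # P(T | S'): false positive for a healthy person

# (a) Evidence: total probability of a positive test
p_t = p_s * p_t_given_s + (1 - p_s) * p_t_given_not_s

# (b) Posterior probability of disease given a positive test
p_s_given_t = p_t_given_s * p_s / p_t

# (c) Posterior probability of being healthy given a negative test
p_not_s_given_not_t = (1 - p_t_given_not_s) * (1 - p_s) / (1 - p_t)

print(f"{p_t:.4f} {p_s_given_t:.4f} {p_not_s_given_not_t:.4f}")
```

The unrounded value for (b) is about 0.14038, so it prints as 0.1404 when rounded to four decimals; the figure 0.1403 in the worked solution is the same value truncated.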
• A physician can take two possible actions after seeing the patient’s test results:
– A1 - Decide the patient is sick
– A2 - Decide the patient is healthy
• The costs of those actions are:
– If the patient is healthy, but the doctor decides he/she is sick - $20,000
– If the patient is sick, but the doctor decides he/she is healthy - $100,000
• When the test is positive (a correct decision costs nothing, so R(A1|S) = R(A2|S’) = 0):
– R(A1|T) = R(A1|S) P(S|T) + R(A1|S’) P(S’|T) = R(A1|S’) P(S’|T) = 20,000 × 0.8597 = $17,194.00
– R(A2|T) = R(A2|S) P(S|T) + R(A2|S’) P(S’|T) = R(A2|S) P(S|T) = 100,000 × 0.1403 = $14,030.00
– Since R(A2|T) < R(A1|T), the minimum-risk action is A2
• A physician can take three possible actions after seeing the patient’s test results:
– A1 - Decide the patient is sick
– A2 - Decide the patient is healthy
– A3 - Send the patient for another test
• The costs of those actions are:
– If the patient is healthy, but the doctor decides he/she is sick - $20,000
– If the patient is sick, but the doctor decides he/she is healthy - $100,000
– Sending the patient for another test costs $15,000
• When the test is positive:
– R(A1|T) = R(A1|S) P(S|T) + R(A1|S’) P(S’|T) = R(A1|S’) P(S’|T) = 20,000 × 0.8597 = $17,194.00
– R(A2|T) = R(A2|S) P(S|T) + R(A2|S’) P(S’|T) = R(A2|S) P(S|T) = 100,000 × 0.1403 = $14,030.00
– R(A3|T) = $15,000.00
– The minimum-risk action is A2
• When the test is negative:
– R(A1|T’) = R(A1|S) P(S|T’) + R(A1|S’) P(S’|T’) = R(A1|S’) P(S’|T’) = 20,000 × 0.9997 = $19,994.00
– R(A2|T’) = R(A2|S) P(S|T’) + R(A2|S’) P(S’|T’) = R(A2|S) P(S|T’) = 100,000 × 0.0003 = $30.00
– R(A3|T’) = $15,000.00
– The minimum-risk action is A2
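The risk tables above can be recomputed with a small Python sketch. It assumes, as the worked solution does implicitly, that correct decisions cost nothing and that another test costs $15,000 whatever the true state; the dictionary layout and function names are illustrative.

```python
# Loss table: action -> (cost if the patient is sick, cost if healthy).
LOSS = {
    "A1: decide sick": (0.0, 20_000.0),
    "A2: decide healthy": (100_000.0, 0.0),
    "A3: another test": (15_000.0, 15_000.0),
}


def risks(p_sick):
    """Conditional risk of every action given the posterior probability of sickness."""
    return {a: l_s * p_sick + l_h * (1.0 - p_sick) for a, (l_s, l_h) in LOSS.items()}


def best_action(p_sick):
    """Action with the minimum conditional risk."""
    r = risks(p_sick)
    return min(r, key=r.get)


# Posteriors from the blood-test example, kept unrounded.
p_t = 0.01 * 0.97 + 0.99 * 0.06
p_sick_pos = 0.97 * 0.01 / p_t        # P(S | T)
p_sick_neg = 0.03 * 0.01 / (1 - p_t)  # P(S | T')

for label, p in (("positive", p_sick_pos), ("negative", p_sick_neg)):
    print(label, best_action(p), {a: round(v, 2) for a, v in risks(p).items()})
```

Because the posteriors are not rounded to four decimals first, the dollar amounts differ from the hand-computed ones by a few dollars, but the ranking of the actions is the same.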
Exercise
• Consider the sea bass/salmon classifier with two possible actions: A1: Decide the input is sea bass; A2: Decide the input is salmon. The priors for sea bass and salmon are 2/3 and 1/3, respectively.
• The cost of classifying a fish as a salmon when it truly is a sea bass is $2, and the cost of classifying a fish as a sea bass when it truly is a salmon is $1.
• Find the decision for input X = 13, where the likelihoods are P(X|ω1) = 0.28 and P(X|ω2) = 0.17.
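One way to check an answer to the exercise is the Python sketch below. It assumes that ω1 denotes sea bass and ω2 denotes salmon (the exercise does not say so explicitly) and that correct classifications cost nothing; the variable names are illustrative.

```python
# Assumption: w1 = sea bass, w2 = salmon.
priors = {"sea bass": 2 / 3, "salmon": 1 / 3}
likelihoods = {"sea bass": 0.28, "salmon": 0.17}  # P(X = 13 | w)

# Loss of each action for each true state; correct decisions are assumed free.
loss = {
    "A1: decide sea bass": {"sea bass": 0.0, "salmon": 1.0},
    "A2: decide salmon": {"sea bass": 2.0, "salmon": 0.0},
}

# Bayes' rule: posterior = likelihood * prior / evidence
joint = {w: likelihoods[w] * priors[w] for w in priors}
evidence = sum(joint.values())
posterior = {w: joint[w] / evidence for w in joint}

# Conditional risk of each action, then the minimum-risk decision
risk = {a: sum(row[w] * posterior[w] for w in posterior) for a, row in loss.items()}
decision = min(risk, key=risk.get)

print(posterior)
print(risk)
print(decision)
```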