Materi 3: Bayesian Decision Theory

Page 1: Materi 3 Bayesian Decision Theory

Bayesian Decision Theory

Team teaching

Page 2: Materi 3 Bayesian Decision Theory

Lecture note for Stat 231: Pattern Recognition and Machine Learning

Lecture 2:

Bayesian Decision Theory
1. Diagram and formulation
2. Bayes rule for inference
3. Bayesian decision
4. Discriminant functions and space partition
5. Advanced issues

Page 3: Materi 3 Bayesian Decision Theory


Diagram of pattern classification: the procedure of pattern recognition and decision making

[Diagram] subjects → Observables X → Features x → Inner belief w → Action α

X: all the observables using existing sensors and instruments.
x: a set of features selected from components of X, or linear/non-linear functions of X.
w: our inner belief/perception about the subject class.
α: the action that we take for x.

We denote the three spaces by

$$x \in \Omega_d, \qquad w \in \Omega_C, \qquad \alpha \in \Omega_\alpha$$

where $x = (x_1, x_2, \dots, x_d)$ is a vector, $w$ is the index of the class, and $\Omega_C = \{w_1, w_2, \dots, w_k\}$.

Page 4: Materi 3 Bayesian Decision Theory


Examples  

Ex 1: Fish classification

X = I is the image of the fish, x = (brightness, length, fin#, ...), and w is our belief about what the fish type is:

Ωc = {"sea bass", "salmon", "trout", ...}

α is a decision for the fish type; in this case Ωc = Ωα:

Ωα = {"sea bass", "salmon", "trout", ...}

 

Ex 2: Medical diagnosis

X = all the available medical tests and imaging scans that a doctor can order for a patient
x = (blood pressure, glucose level, cough, x-ray, ...)
w is an illness type, Ωc = {"flu", "cold", "TB", "pneumonia", "lung cancer", ...}
α is a decision for treatment, Ωα = {"Tylenol", "hospitalize", ...}

Page 5: Materi 3 Bayesian Decision Theory


Tasks  

[Diagram] subjects → (control sensors) → Observables X → (selecting informative features) → Features x → (statistical inference) → Inner belief w → (risk/cost minimization) → Decision α

In Bayesian decision theory, we are concerned with the last three steps (the big ellipse in the original figure), assuming that the observables are given and the features have been selected.

Page 6: Materi 3 Bayesian Decision Theory

Bayes Decision

It is decision making when all underlying probability distributions are known; it is optimal given that the distributions are known.

For two classes ω1 and ω2, the prior probabilities for an unknown new observation are:

P(ω1): the new observation belongs to class 1
P(ω2): the new observation belongs to class 2
P(ω1) + P(ω2) = 1

The prior reflects our prior knowledge. It gives our decision rule when no feature on the new object is available: classify as class 1 if P(ω1) > P(ω2).

Page 7: Materi 3 Bayesian Decision Theory


Bayesian Decision Theory

[Diagram] Features x → (statistical inference) → Inner belief p(w|x) → (risk/cost minimization) → Decision α(x)

Two probability tables:
a) Prior p(w)
b) Likelihood p(x|w)

A risk/cost function (a two-way table): λ(α | w)

The belief on the class w is computed by the Bayes rule, and the risk is computed from the posterior:

$$p(w \mid x) = \frac{p(x \mid w)\,p(w)}{p(x)}$$

$$R(\alpha_i \mid x) = \sum_{j=1}^{k} \lambda(\alpha_i \mid w_j)\,p(w_j \mid x)$$
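To make the two tables and the risk computation concrete, here is a minimal Python sketch; the numbers and the dictionary layout are illustrative assumptions, not from the slides.

```python
# Minimal sketch of Bayesian decision with explicit tables (numbers assumed).

def posterior(prior, likelihood_x):
    """Bayes rule: p(w|x) = p(x|w) p(w) / p(x), with p(x) as the normalizer."""
    evidence = sum(likelihood_x[w] * prior[w] for w in prior)   # p(x)
    return {w: likelihood_x[w] * prior[w] / evidence for w in prior}

def conditional_risk(loss, post):
    """R(alpha|x) = sum_j lambda(alpha|w_j) p(w_j|x), for every action."""
    return {a: sum(loss[a][w] * post[w] for w in post) for a in loss}

prior = {"w1": 0.7, "w2": 0.3}                 # p(w), assumed
likelihood_x = {"w1": 0.2, "w2": 0.6}          # p(x|w) at the observed x, assumed
loss = {"a1": {"w1": 0.0, "w2": 1.0},          # lambda(alpha|w), assumed
        "a2": {"w1": 1.0, "w2": 0.0}}

post = posterior(prior, likelihood_x)
print(post)                                    # inner belief p(w|x)
print(conditional_risk(loss, post))            # risk of each action at x
```

The Bayes decision at x is then simply the action with the smallest conditional risk.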

Page 8: Materi 3 Bayesian Decision Theory

Bayes Decision

We observe features on each object. P(x | ω1) and P(x | ω2) are the class-specific (class-conditional) densities. The Bayes rule:

$$P(\omega_i \mid x) = \frac{P(x \mid \omega_i)\,P(\omega_i)}{P(x)}, \qquad P(x) = \sum_{j} P(x \mid \omega_j)\,P(\omega_j)$$

Page 9: Materi 3 Bayesian Decision Theory


Decision Rule

A decision is made to minimize the average cost/risk over all instances. The average is minimized when the decision is made to minimize the cost/risk for each individual instance x.

$$R = \int R(\alpha(x) \mid x)\,p(x)\,dx$$

$$\alpha(x):\ \Omega_d \to \Omega_\alpha$$

$$\alpha(x) = \arg\min_{\alpha \in \Omega_\alpha} R(\alpha \mid x) = \arg\min_{\alpha \in \Omega_\alpha} \sum_{j=1}^{k} \lambda(\alpha \mid w_j)\,p(w_j \mid x)$$

A decision rule is a mapping function from the feature space to the set of actions. We will show that randomized decisions are not optimal: the risk of a randomized rule is an average of the risks of the deterministic rules it mixes, so it can never fall below the risk of the best deterministic rule.

Page 10: Materi 3 Bayesian Decision Theory


Bayesian error

In a special case, like fish classification, where the action is the classification itself, we assume a 0/1 loss:

$$\lambda(\alpha_i \mid w_j) = \begin{cases} 0 & \text{if } \alpha_i = w_j \\ 1 & \text{if } \alpha_i \neq w_j \end{cases}$$

The risk for classifying x to class αi is

$$R(\alpha_i \mid x) = \sum_{j \neq i} p(w_j \mid x) = 1 - p(w_i \mid x)$$

The optimal decision is to choose the class that has maximum posterior probability

$$\alpha(x) = \arg\min_{\alpha \in \Omega_\alpha} \left(1 - p(\alpha \mid x)\right) = \arg\max_{\alpha \in \Omega_\alpha} p(\alpha \mid x)$$

The total risk for a decision rule, in this case, is called the Bayesian error

$$R = p(\text{error}) = \int p(\text{error} \mid x)\,p(x)\,dx = \int \left(1 - p(\alpha(x) \mid x)\right) p(x)\,dx$$
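As a hypothetical illustration: under 0/1 loss the Bayes rule reduces to picking the class with maximum posterior, and the Bayesian error can be estimated by Monte Carlo. The sketch below does this for two assumed univariate Gaussian class densities.

```python
# Estimate the Bayesian error of the MAP rule by Monte Carlo
# (two Gaussian class densities; all parameters are assumed).
import math
import random

def pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

p1, p2 = 0.5, 0.5                  # priors p(w1), p(w2)
mu1, mu2, sigma = 0.0, 2.0, 1.0    # class-conditional Gaussians

random.seed(0)
n, errors = 100_000, 0
for _ in range(n):
    w = 1 if random.random() < p1 else 2                # sample the true class
    x = random.gauss(mu1 if w == 1 else mu2, sigma)     # sample x ~ p(x|w)
    # MAP decision: maximize the (unnormalized) posterior p(x|w) p(w)
    decide = 1 if p1 * pdf(x, mu1, sigma) > p2 * pdf(x, mu2, sigma) else 2
    errors += (decide != w)
print(errors / n)
```

For these assumed parameters the decision boundary sits at x = 1 and the true Bayesian error is Φ(−1) ≈ 0.159, which the estimate should approach.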

Page 11: Materi 3 Bayesian Decision Theory


Partition of feature space

$$\alpha(x):\ \Omega_d \to \Omega_\alpha$$

The decision is a partition/coloring of the feature space into k subspaces:

$$\Omega = \cup_{i=1}^{k} \Omega_i, \qquad \Omega_i \cap \Omega_j = \emptyset \ \text{ for } i \neq j$$

[Figure: a feature space partitioned into regions labeled 1-5]

Page 12: Materi 3 Bayesian Decision Theory


An example of fish classification

Page 13: Materi 3 Bayesian Decision Theory


Decision/classification boundaries

Page 14: Materi 3 Bayesian Decision Theory


Closed-form solutions

In two-class classification (k = 2), with p(x|w) being normal densities, the decision boundaries can be computed in closed form.
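For reference, the standard discriminant-function form for normal class densities (standard textbook material, e.g. Duda, Hart & Stork; the slide itself shows only the heading) is

$$g_i(x) = \ln p(x \mid \omega_i) + \ln P(\omega_i) = -\tfrac{1}{2}(x-\mu_i)^T \Sigma_i^{-1} (x-\mu_i) - \tfrac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i) + \text{const},$$

and the decision boundary is the set $\{x : g_1(x) = g_2(x)\}$. It is quadratic in x in general, and becomes linear when $\Sigma_1 = \Sigma_2$.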

Page 15: Materi 3 Bayesian Decision Theory

Exercise

•  Consider the example of the sea bass vs. salmon classifier, with these two possible actions: A1, decide the input is sea bass; A2, decide the input is salmon. The priors for sea bass and salmon are 2/3 and 1/3, respectively.

•  The cost of classifying a fish as a salmon when it truly is a sea bass is $2, and the cost of classifying a fish as a sea bass when it truly is a salmon is $1.

•  Find the decision for the input X = 13, for which the likelihoods are P(X|ω1) = 0.28 and P(X|ω2) = 0.17. (A solution sketch follows below.)
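One way to work the exercise in Python, assuming zero cost for the two correct decisions (the slide leaves those entries implicit):

```python
# Sea bass vs. salmon exercise: compute posteriors and conditional risks.
prior = {"sea_bass": 2 / 3, "salmon": 1 / 3}
likelihood = {"sea_bass": 0.28, "salmon": 0.17}     # p(X = 13 | w)
loss = {"A1": {"sea_bass": 0, "salmon": 1},         # A1: decide sea bass
        "A2": {"sea_bass": 2, "salmon": 0}}         # A2: decide salmon

evidence = sum(likelihood[w] * prior[w] for w in prior)          # p(X = 13)
post = {w: likelihood[w] * prior[w] / evidence for w in prior}   # p(w | X = 13)
risk = {a: sum(loss[a][w] * post[w] for w in post) for a in loss}
print(post)   # {'sea_bass': ~0.767, 'salmon': ~0.233}
print(risk)   # R(A1|x) ~ 0.233 < R(A2|x) ~ 1.534
```

Since R(A1 | x) < R(A2 | x), the minimum-risk decision is A1: declare the fish a sea bass.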

Page 16: Materi 3 Bayesian Decision Theory

Nominal features

Page 17: Materi 3 Bayesian Decision Theory
Page 18: Materi 3 Bayesian Decision Theory

Training data: three input dimensions, rash (R), temperature (T), and dizzy (D); output class (C). All variables are binary.

Page 19: Materi 3 Bayesian Decision Theory

Training phase:

We can summarise the training data as follows, and estimate the likelihoods as relative frequencies.

Page 20: Materi 3 Bayesian Decision Theory

•  Classify the test data by computing the posterior probabilities: if P(C = 1 | X) > 0.5, classify as C = 1; else classify as C = 0 (since it is a two-class problem). A sketch of this recipe follows below.
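The training table itself is an image that did not survive extraction, so the rows below are hypothetical; the sketch only illustrates the recipe described above (likelihoods as relative frequencies, classification by whether P(C = 1 | X) exceeds 0.5), under the naive assumption that the three binary features are conditionally independent given the class.

```python
# Relative-frequency likelihoods for binary features + posterior classification.
# Each record is (rash, temperature, dizzy, class); the data is made up.
data = [(1, 1, 0, 1), (1, 0, 1, 1), (0, 1, 1, 1), (1, 1, 1, 1),
        (0, 0, 0, 0), (0, 1, 0, 0), (1, 0, 0, 0), (0, 0, 1, 0)]

def relative_freq(rows, dim, cls):
    """Estimate p(X_dim = 1 | C = cls) as a relative frequency."""
    rows_c = [r for r in rows if r[3] == cls]
    return sum(r[dim] for r in rows_c) / len(rows_c)

prior = {c: sum(r[3] == c for r in data) / len(data) for c in (0, 1)}
lik = {c: [relative_freq(data, d, c) for d in range(3)] for c in (0, 1)}

def posterior_c1(x):
    """P(C = 1 | x) assuming conditionally independent features."""
    def joint(c):
        p = prior[c]
        for d in range(3):
            p *= lik[c][d] if x[d] == 1 else 1 - lik[c][d]
        return p
    return joint(1) / (joint(0) + joint(1))

x_test = (1, 0, 1)   # rash, no temperature, dizzy
print("C = 1" if posterior_c1(x_test) > 0.5 else "C = 0")
```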

Page 21: Materi 3 Bayesian Decision Theory

Testing phase:

Page 22: Materi 3 Bayesian Decision Theory

Univariate Normal Distribution (Continuous)

Page 23: Materi 3 Bayesian Decision Theory

Training phase

Page 24: Materi 3 Bayesian Decision Theory

 

Compute the standard deviation of both trolls and smurfs.

Page 25: Materi 3 Bayesian Decision Theory

Finally we compute the class-conditional probability of observing a height given either a smurf or a troll.

 

Page 26: Materi 3 Bayesian Decision Theory

Finally, we compute the prior probability of observing a troll or a smurf, regardless of height, as

Page 27: Materi 3 Bayesian Decision Theory

Testing phase:

Page 28: Materi 3 Bayesian Decision Theory

In general, we have the most important parts done. However, let's take a look at Bayes rule in this context:

$$P(j \mid x) = \frac{p(x \mid j)\,P(j)}{p(x)}$$

We notice that in order to figure out the final probability, we need the marginal probability p(x), which is the base probability of a height. Basically, p(x) is there to make sure that if we summed up all the P(j|x) they would equal one. This is simply

$$p(x) = \sum_{j} p(x \mid j)\,P(j)$$

Page 29: Materi 3 Bayesian Decision Theory

Finally... we have everything we need to determine whether a creature is most likely a troll or a smurf based upon its height. Thus, we can now ask: if we observe a creature that is 2" tall, how likely is it to be a smurf? Next we ask: what is the likelihood that we have a troll, given the observation of 2"? The end result is that if p(smurf | 2") > p(troll | 2"), then we can decide that we are most likely observing a smurf.

Page 30: Materi 3 Bayesian Decision Theory

Exercise  

Height   Creature
2.70"    Smurf
2.52"    Smurf
2.57"    Smurf
2.22"    Smurf
3.16"    Troll
3.58"    Troll
3.16"    Troll

If we test a creature that is 2.90" tall, which is it more likely to be: a Smurf or a Troll? (A solution sketch follows below.)
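A solution sketch in Python, following the recipe of the preceding troll/smurf walkthrough (Gaussian class-conditional likelihoods, relative-frequency priors). The population standard deviation is an assumption here, since the slides' exact estimator is not visible in the transcript.

```python
# Classify a 2.90" creature with univariate Gaussian class densities.
import math

smurfs = [2.70, 2.52, 2.57, 2.22]
trolls = [3.16, 3.58, 3.16]

def mean(xs):
    return sum(xs) / len(xs)

def std(xs):
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))  # population std

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

x = 2.90
n = len(smurfs) + len(trolls)
score = {}
for name, heights in (("Smurf", smurfs), ("Troll", trolls)):
    prior = len(heights) / n                                   # 4/7 and 3/7
    score[name] = normal_pdf(x, mean(heights), std(heights)) * prior
print(max(score, key=score.get), score)
```

With these numbers the troll score comes out slightly higher (roughly 0.11 vs 0.10 unnormalized), so the 2.90" creature would be classified as a Troll; using the sample (n − 1) variance instead gives the same ranking.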

Page 31: Materi 3 Bayesian Decision Theory

The Multivariate Normal Distribution (Continuous)

In two dimensions, the so-called bivariate normal, we are concerned with two means and a variance-covariance matrix:
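The density formula itself is an image in the original; the standard form it refers to is

$$p(x) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}} \exp\!\left(-\tfrac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right),$$

with d = 2 in the bivariate case, mean vector $\mu = [\mu_1 \ \ \mu_2]^T$, and a 2×2 covariance matrix $\Sigma$.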

Page 32: Materi 3 Bayesian Decision Theory

Curvature   Diameter   Quality Control Result
2.95        6.63       Passed
2.53        7.79       Passed
3.57        5.65       Passed
3.16        5.47       Passed
2.58        4.46       Not passed
2.16        6.22       Not passed
3.27        3.52       Not passed

Page 33: Materi 3 Bayesian Decision Theory

•  As a consultant to the factory, you get the task of setting up the criteria for automatic quality control. The manager of the factory also wants to test your criteria on a new type of chip ring over which even the human experts argue with each other. The new chip ring has curvature 2.81 and diameter 5.46.

•  Can you solve this problem by employing the Bayes classifier?

Page 34: Materi 3 Bayesian Decision Theory

•  X = the features (or independent variables) of all data. Each row (denoted xk below) represents one object; each column stands for one feature.

•  Y = the group of the object (or dependent variable) of all data. Each row represents one object, and it has only one column.

Page 35: Materi 3 Bayesian Decision Theory

Training phase

Page 36: Materi 3 Bayesian Decision Theory

 

$$x = \begin{bmatrix} 2.95 & 6.63 \\ 2.53 & 7.79 \\ 3.57 & 5.65 \\ 3.16 & 5.47 \\ 2.58 & 4.46 \\ 2.16 & 6.22 \\ 3.27 & 3.52 \end{bmatrix}, \qquad y = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 2 \\ 2 \\ 2 \end{bmatrix}$$

Page 37: Materi 3 Bayesian Decision Theory

•  Xk = the data of row k; for example, x3 = [3.57  5.65].

•  g = the number of groups in y; in our example, g = 2.

•  Xi = the features data for group i. Each row represents one object; each column stands for one feature. We separate x into several groups based on the number of categories in y.

Page 38: Materi 3 Bayesian Decision Theory

$$x_1 = \begin{bmatrix} 2.95 & 6.63 \\ 2.53 & 7.79 \\ 3.57 & 5.65 \\ 3.16 & 5.47 \end{bmatrix}, \qquad x_2 = \begin{bmatrix} 2.58 & 4.46 \\ 2.16 & 6.22 \\ 3.27 & 3.52 \end{bmatrix}$$

Page 39: Materi 3 Bayesian Decision Theory

•  μi = the mean of the features in group i, which is the average of xi:

$$\mu_1 = \begin{bmatrix} 3.05 & 6.38 \end{bmatrix}, \qquad \mu_2 = \begin{bmatrix} 2.67 & 4.73 \end{bmatrix}$$

•  μ = the global mean vector, that is, the mean of the whole data set. In this example,

$$\mu = \begin{bmatrix} 2.88 & 5.676 \end{bmatrix}$$

Page 40: Materi 3 Bayesian Decision Theory

•  x_i^0 = the mean-corrected data, that is, the features data for group i, xi, minus the global mean vector μ:

$$x_1^0 = \begin{bmatrix} 0.060 & 0.951 \\ -0.357 & 2.109 \\ 0.679 & -0.025 \\ 0.269 & -0.209 \end{bmatrix}, \qquad x_2^0 = \begin{bmatrix} -0.305 & -1.218 \\ -0.732 & 0.547 \\ 0.386 & -2.155 \end{bmatrix}$$

Page 41: Materi 3 Bayesian Decision Theory

Covariance matrix of group i:

$$\Sigma_i = \frac{(x_i^0)^T x_i^0}{n_i}$$

$$\Sigma_1 = \begin{bmatrix} 0.166 & -0.192 \\ -0.192 & 1.349 \end{bmatrix}, \qquad \Sigma_2 = \begin{bmatrix} 0.259 & -0.286 \\ -0.286 & 2.142 \end{bmatrix}$$
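A numpy sketch of the whole training phase so far (the code is not part of the original slides). Note that the covariance here subtracts the global mean μ and divides by n_i, exactly as defined above.

```python
# Training phase: group means, global mean, mean-corrected data, covariances.
import numpy as np

x1 = np.array([[2.95, 6.63], [2.53, 7.79], [3.57, 5.65], [3.16, 5.47]])  # Passed
x2 = np.array([[2.58, 4.46], [2.16, 6.22], [3.27, 3.52]])                # Not passed

mu1, mu2 = x1.mean(axis=0), x2.mean(axis=0)   # ~[3.05 6.38] and ~[2.67 4.73]
mu = np.vstack([x1, x2]).mean(axis=0)         # global mean ~[2.88 5.676]

x1_0, x2_0 = x1 - mu, x2 - mu                 # mean-corrected data
sigma1 = x1_0.T @ x1_0 / len(x1)              # ~[[0.166 -0.192], [-0.192 1.349]]
sigma2 = x2_0.T @ x2_0 / len(x2)              # ~[[0.259 -0.286], [-0.286 2.142]]
print(sigma1, sigma2, sep="\n")
```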

Page 42: Materi 3 Bayesian Decision Theory

Finally we compute the class-conditional probability of observing curvature = 2.81 and diameter = 5.46, given either a passed or a not-passed result:

$$p(2.81, 5.46 \mid \text{Passed}) = (2\pi)^{-1} \left| \begin{matrix} 0.166 & -0.192 \\ -0.192 & 1.349 \end{matrix} \right|^{-1/2} \exp\!\left( -\tfrac{1}{2} \left( \begin{bmatrix} 2.81 \\ 5.46 \end{bmatrix} - \begin{bmatrix} 3.05 \\ 6.38 \end{bmatrix} \right)^{\!T} \begin{bmatrix} 0.166 & -0.192 \\ -0.192 & 1.349 \end{bmatrix}^{-1} \left( \begin{bmatrix} 2.81 \\ 5.46 \end{bmatrix} - \begin{bmatrix} 3.05 \\ 6.38 \end{bmatrix} \right) \right)$$

$$p(2.81, 5.46 \mid \text{Not\_passed}) = (2\pi)^{-1} \left| \begin{matrix} 0.259 & -0.286 \\ -0.286 & 2.142 \end{matrix} \right|^{-1/2} \exp\!\left( -\tfrac{1}{2} \left( \begin{bmatrix} 2.81 \\ 5.46 \end{bmatrix} - \begin{bmatrix} 2.67 \\ 4.73 \end{bmatrix} \right)^{\!T} \begin{bmatrix} 0.259 & -0.286 \\ -0.286 & 2.142 \end{bmatrix}^{-1} \left( \begin{bmatrix} 2.81 \\ 5.46 \end{bmatrix} - \begin{bmatrix} 2.67 \\ 4.73 \end{bmatrix} \right) \right)$$

Page 43: Materi 3 Bayesian Decision Theory

P = the prior probability vector (each row represents the prior probability of a group). If we do not know the prior probabilities, we just assume each is equal to the number of samples in the group divided by the total number of samples, that is:

•  p(Passed) = 4/7
•  p(Not_passed) = 3/7

Page 44: Materi 3 Bayesian Decision Theory

Testing phase

Page 45: Materi 3 Bayesian Decision Theory

     

In general, we have the most important parts done. However, let's take a look at Bayes rule in this context:

$$P(j \mid x) = \frac{p(x \mid j)\,P(j)}{p(x)}$$

Basically, p(x) is there to make sure that if we summed up all the P(j|x) they would equal one. This is simply

$$p(x) = \sum_{j} p(x \mid j)\,P(j)$$

Page 46: Materi 3 Bayesian Decision Theory

Finally... we have everything we need to determine whether a chip ring most likely passed or did not pass quality control, based upon its curvature and diameter. Thus, we can now ask: if we observe curvature = 2.81 and diameter = 5.46, how likely is it to have passed?

$$p(\text{Passed} \mid 2.81, 5.46) = \frac{p(2.81, 5.46 \mid \text{Passed})\,p(\text{Passed})}{p(2.81, 5.46)}$$

Next we ask: what is the likelihood that it did not pass, given the observation of curvature = 2.81 and diameter = 5.46?

$$p(\text{Not\_passed} \mid 2.81, 5.46) = \frac{p(2.81, 5.46 \mid \text{Not\_passed})\,p(\text{Not\_passed})}{p(2.81, 5.46)}$$

If the end result is

$$p(\text{Not\_passed} \mid 2.81, 5.46) > p(\text{Passed} \mid 2.81, 5.46),$$

then we can decide that the chip ring is most likely classified as Not passed. (A numerical sketch follows below.)
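A numpy sketch of the testing phase, plugging in the tables computed above (the helper function and its name are mine, not the slides'):

```python
# Testing phase: bivariate normal likelihoods -> posteriors -> decision.
import numpy as np

x = np.array([2.81, 5.46])
mu1, mu2 = np.array([3.05, 6.38]), np.array([2.67, 4.73])
sigma1 = np.array([[0.166, -0.192], [-0.192, 1.349]])
sigma2 = np.array([[0.259, -0.286], [-0.286, 2.142]])
p1, p2 = 4 / 7, 3 / 7                                 # priors

def density(x, mu, sigma):
    """Bivariate normal: (2*pi)^-1 |Sigma|^-1/2 exp(-0.5 d^T Sigma^-1 d)."""
    d = x - mu
    quad = d @ np.linalg.inv(sigma) @ d
    return np.exp(-0.5 * quad) / (2 * np.pi * np.sqrt(np.linalg.det(sigma)))

lik_pass = density(x, mu1, sigma1)
lik_fail = density(x, mu2, sigma2)
evidence = lik_pass * p1 + lik_fail * p2              # p(2.81, 5.46)
print("p(Passed | x)     =", lik_pass * p1 / evidence)
print("p(Not_passed | x) =", lik_fail * p2 / evidence)
```

With the rounded tables above, p(Passed | 2.81, 5.46) comes out near 0.55 and p(Not_passed | 2.81, 5.46) near 0.45, so this particular chip ring would be classified as Passed.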