TRANSCRIPT

Lecture Slides for
INTRODUCTION TO MACHINE LEARNING, 3RD EDITION
ETHEM ALPAYDIN © The MIT Press, 2014
[email protected]
http://www.cmpe.boun.edu.tr/~ethem/i2ml3e

CHAPTER 3: BAYESIAN DECISION THEORY
Probability and Inference

Result of tossing a coin is ∈ {Heads, Tails}; random variable X ∈ {1, 0}
Bernoulli: P{X = x} = p_o^x (1 − p_o)^(1−x)
Sample: X = {x^t}, t = 1,…,N
Estimation: p_o = #{Heads} / #{Tosses} = ∑_t x^t / N
Prediction of next toss: Heads if p_o > 1/2, Tails otherwise
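The estimate and the prediction rule above can be sketched in a few lines of Python; the sample values are made up for illustration.

```python
def estimate_p(sample):
    """Maximum-likelihood estimate: p_o = #{Heads}/#{Tosses} = sum_t x^t / N."""
    return sum(sample) / len(sample)

def predict_next(p):
    """Predict Heads if p_o > 1/2, Tails otherwise."""
    return "Heads" if p > 0.5 else "Tails"

sample = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]  # 1 = Heads, 0 = Tails (toy data)
p_hat = estimate_p(sample)
print(p_hat)                # 0.7
print(predict_next(p_hat))  # Heads
```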
Classification

Credit scoring: Inputs are income and savings; output is low-risk vs high-risk.
Input: x = [x1, x2]^T, Output: C ∈ {0, 1}
Prediction:
  choose C = 1 if P(C = 1 | x1, x2) > P(C = 0 | x1, x2), otherwise choose C = 0
or equivalently
  choose C = 1 if P(C = 1 | x1, x2) > 0.5, otherwise choose C = 0
Bayes’ Rule

P(C | x) = p(x | C) P(C) / p(x)

where
  P(C = 0) + P(C = 1) = 1
  p(x) = p(x | C = 1) P(C = 1) + p(x | C = 0) P(C = 0)
  P(C = 0 | x) + P(C = 1 | x) = 1

posterior = (prior × likelihood) / evidence
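Bayes’ rule for the two-class case can be computed directly; the prior and likelihood values below are made-up numbers for illustration.

```python
def posterior(prior1, lik1, lik0):
    """P(C=1 | x) = p(x|C=1) P(C=1) / p(x), with the evidence
    p(x) = p(x|C=1) P(C=1) + p(x|C=0) P(C=0)."""
    prior0 = 1.0 - prior1
    evidence = lik1 * prior1 + lik0 * prior0
    return lik1 * prior1 / evidence

# P(C=1) = 0.3, p(x|C=1) = 0.6, p(x|C=0) = 0.2  (illustrative values)
print(posterior(0.3, 0.6, 0.2))  # 0.5625
```

Note that even with a low prior (0.3), a high enough likelihood ratio pushes the posterior past 0.5.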
Bayes’ Rule: K > 2 Classes

P(Ci | x) = p(x | Ci) P(Ci) / p(x)
          = p(x | Ci) P(Ci) / ∑_{k=1}^{K} p(x | Ck) P(Ck)

with P(Ci) ≥ 0 and ∑_{i=1}^{K} P(Ci) = 1

choose Ci if P(Ci | x) = max_k P(Ck | x)
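A minimal sketch of the K-class rule: normalize the prior-times-likelihood products by the evidence and pick the largest posterior. The numbers are illustrative.

```python
def posteriors(priors, likelihoods):
    """P(Ci|x) = p(x|Ci) P(Ci) / sum_k p(x|Ck) P(Ck)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    evidence = sum(joint)
    return [j / evidence for j in joint]

def choose(priors, likelihoods):
    """Index of the class with the largest posterior."""
    post = posteriors(priors, likelihoods)
    return max(range(len(post)), key=post.__getitem__)

priors = [0.5, 0.3, 0.2]   # P(C1), P(C2), P(C3) -- toy values
liks = [0.1, 0.4, 0.3]     # p(x|C1), p(x|C2), p(x|C3)
print(choose(priors, liks))  # 1  (second class wins despite a smaller prior)
```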
Losses and Risks

Actions: αi
Loss of αi when the state is Ck: λik
Expected risk (Duda and Hart, 1973):
  R(αi | x) = ∑_{k=1}^{K} λik P(Ck | x)
choose αi if R(αi | x) = min_k R(αk | x)
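The expected risk is a loss-weighted sum over the posteriors; the loss matrix and posteriors below are made-up values chosen to show that unequal losses can override the most probable class.

```python
def expected_risks(loss, post):
    """R(alpha_i|x) = sum_k lambda_ik P(Ck|x), one value per action."""
    return [sum(l * p for l, p in zip(row, post)) for row in loss]

def best_action(loss, post):
    """Action with minimum expected risk."""
    risks = expected_risks(loss, post)
    return min(range(len(risks)), key=risks.__getitem__)

loss = [[0.0, 10.0],   # lambda_1k: choosing action 1 when the class is C1 or C2
        [1.0, 0.0]]    # lambda_2k
post = [0.8, 0.2]      # P(C1|x), P(C2|x)
print(expected_risks(loss, post))  # [2.0, 0.8]
print(best_action(loss, post))     # 1 -- the costly mistake is avoided
```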
Losses and Risks: 0/1 Loss

λik = 0 if i = k, 1 if i ≠ k

R(αi | x) = ∑_{k=1}^{K} λik P(Ck | x)
          = ∑_{k ≠ i} P(Ck | x)
          = 1 − P(Ci | x)

For minimum risk, choose the most probable class.
Losses and Risks: Reject

λik = 0 if i = k
      λ if i = K + 1 (reject), with 0 < λ < 1
      1 otherwise

R(α_{K+1} | x) = ∑_{k=1}^{K} λ P(Ck | x) = λ
R(αi | x) = ∑_{k ≠ i} P(Ck | x) = 1 − P(Ci | x)

choose Ci if P(Ci | x) > P(Ck | x) for all k ≠ i and P(Ci | x) > 1 − λ
reject otherwise
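The reject rule above reduces to a single posterior threshold: pick the most probable class only when its posterior exceeds 1 − λ. A short sketch with illustrative posteriors:

```python
def decide(post, lam):
    """Choose the most probable class if its posterior beats 1 - lambda,
    otherwise reject (0 < lam < 1)."""
    i = max(range(len(post)), key=post.__getitem__)
    return i if post[i] > 1.0 - lam else "reject"

print(decide([0.5, 0.3, 0.2], lam=0.3))  # reject (0.5 <= 0.7)
print(decide([0.8, 0.1, 0.1], lam=0.3))  # 0 (0.8 > 0.7)
```

A smaller λ makes rejection cheaper, so the classifier rejects more often.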
Different Losses and Reject

[Figure: decision boundaries under equal losses, unequal losses, and with reject]
Discriminant Functions

gi(x), i = 1,…,K
choose Ci if gi(x) = max_k gk(x)

gi(x) can be taken as −R(αi | x), p(x | Ci) P(Ci), or P(Ci | x)

K decision regions R1,…,RK:
  Ri = {x | gi(x) = max_k gk(x)}
K = 2 Classes

Dichotomizer (K = 2) vs polychotomizer (K > 2)
g(x) = g1(x) − g2(x)
  choose C1 if g(x) > 0, C2 otherwise

Log odds: log [P(C1 | x) / P(C2 | x)]
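As a small sketch, the log odds give a single discriminant for the two-class case: positive means C1, negative means C2. The posterior value is illustrative.

```python
from math import log

def log_odds(p1):
    """g(x) = log[P(C1|x) / P(C2|x)], with P(C2|x) = 1 - P(C1|x)."""
    return log(p1 / (1.0 - p1))

def dichotomize(p1):
    """Choose C1 if g(x) > 0, C2 otherwise."""
    return "C1" if log_odds(p1) > 0 else "C2"

print(dichotomize(0.7))  # C1
print(dichotomize(0.4))  # C2
```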
Utility Theory

Probability of state k given evidence x: P(Sk | x)
Utility of αi when the state is k: Uik
Expected utility:
  EU(αi | x) = ∑_k Uik P(Sk | x)
Choose αi if EU(αi | x) = max_j EU(αj | x)
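Maximizing expected utility mirrors minimizing expected risk. A sketch with made-up utilities, where a safe constant payoff beats a risky one:

```python
def expected_utility(U_row, post):
    """EU(alpha_i|x) = sum_k U_ik P(Sk|x)."""
    return sum(u * p for u, p in zip(U_row, post))

def choose_by_utility(U, post):
    """Action with maximum expected utility."""
    return max(range(len(U)), key=lambda i: expected_utility(U[i], post))

U = [[100.0, -50.0],   # action 0: big gain in state S1, big loss in S2
     [20.0, 20.0]]     # action 1: safe constant payoff
post = [0.4, 0.6]      # P(S1|x), P(S2|x) -- toy values
print(choose_by_utility(U, post))  # 1
```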
Association Rules

Association rule: X → Y
People who buy/click/visit/enjoy X are also likely to buy/click/visit/enjoy Y.
A rule implies association, not necessarily causation.
Association measures

Support (X → Y):
  P(X, Y) = #{customers who bought X and Y} / #{customers}
Confidence (X → Y):
  P(Y | X) = P(X, Y) / P(X)
           = #{customers who bought X and Y} / #{customers who bought X}
Lift (X → Y):
  P(X, Y) / (P(X) P(Y)) = P(Y | X) / P(Y)
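The three measures can be computed by simple counting over a basket dataset; the transactions below are made up for illustration.

```python
def measures(transactions, X, Y):
    """Support, confidence, and lift for the rule X -> Y."""
    n = len(transactions)
    n_x = sum(1 for t in transactions if X in t)
    n_y = sum(1 for t in transactions if Y in t)
    n_xy = sum(1 for t in transactions if X in t and Y in t)
    support = n_xy / n             # P(X, Y)
    confidence = n_xy / n_x        # P(Y | X)
    lift = confidence / (n_y / n)  # P(Y|X) / P(Y)
    return support, confidence, lift

baskets = [{"milk", "bread"}, {"milk"}, {"bread"},
           {"milk", "bread"}, {"bread", "eggs"}]
s, c, l = measures(baskets, "milk", "bread")
print(s)  # 0.4 = 2/5 baskets contain both
```

A lift below 1 (here 5/6) means buying milk actually makes bread slightly *less* likely than its base rate in this toy data.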
Example

[Example table from the slides not transcribed]
Apriori algorithm (Agrawal et al., 1996)

For (X, Y, Z), a 3-item set, to be frequent (have enough support), (X, Y), (X, Z), and (Y, Z) should be frequent.
If (X, Y) is not frequent, none of its supersets can be frequent.
Once we find the frequent k-item sets, we convert them to rules: X, Y → Z, … and X → Y, Z, …
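A minimal sketch of the pruning idea above (not the full algorithm, and the baskets are toy data): keep frequent 1-item sets, then only test candidate k-item sets all of whose (k−1)-subsets are frequent.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Levelwise search: a k-item set is a candidate only if every
    (k-1)-subset was found frequent at the previous level."""
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items
             if sum(1 for t in transactions if i in t) / n >= min_support]
    frequent, k = [], 1
    while level:
        frequent.extend(level)
        k += 1
        prev = set(level)
        # Candidate generation: unions of frequent (k-1)-sets ...
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # ... pruned unless all (k-1)-subsets are frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        level = [c for c in candidates
                 if sum(1 for t in transactions if c <= t) / n >= min_support]
    return frequent

baskets = [{"milk", "bread"}, {"milk", "bread", "eggs"},
           {"bread"}, {"milk", "bread"}]
freq = frequent_itemsets(baskets, min_support=0.5)
print(sorted(tuple(sorted(f)) for f in freq))
# [('bread',), ('bread', 'milk'), ('milk',)]  -- eggs is pruned at level 1
```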