pr_unit-iii
TRANSCRIPT
UNIT-III The Normal Density
Univariate density
A density which is analytically tractable
A continuous density
Many processes are asymptotically Gaussian
Handwritten characters and speech sounds can be modeled as an ideal or prototype corrupted by a random process (central limit theorem)
$$P(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\,\exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right]$$

where:
$\mu$ = mean (or expected value) of $x$
$\sigma^2$ = expected squared deviation, or variance
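A minimal numeric sketch of this density in NumPy (the function name univariate_normal is my own, not from the slides):

```python
import numpy as np

def univariate_normal(x, mu, sigma):
    # P(x) = 1/(sqrt(2*pi)*sigma) * exp(-0.5*((x - mu)/sigma)^2)
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

print(univariate_normal(0.0, mu=0.0, sigma=1.0))  # 0.3989..., the peak of the standard normal
```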
-
Multivariate density
Multivariate normal density in d dimensions is:
$$P(x) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}}\,\exp\left[-\frac{1}{2}(x-\mu)^t\,\Sigma^{-1}(x-\mu)\right]$$

where:
$x = (x_1, x_2, \ldots, x_d)^t$ ($t$ stands for the transpose vector form)
$\mu = (\mu_1, \mu_2, \ldots, \mu_d)^t$ is the mean vector
$\Sigma$ is the $d \times d$ covariance matrix
$|\Sigma|$ and $\Sigma^{-1}$ are its determinant and inverse, respectively
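The same formula transcribed into NumPy as a sketch (function name mine; scipy.stats.multivariate_normal.pdf should give the same value):

```python
import numpy as np

def multivariate_normal_pdf(x, mu, sigma):
    # P(x) = exp(-0.5 (x-mu)^t Sigma^{-1} (x-mu)) / ((2 pi)^{d/2} |Sigma|^{1/2})
    d = len(mu)
    diff = x - mu
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(sigma))
    return np.exp(-0.5 * diff @ np.linalg.inv(sigma) @ diff) / norm

x = np.array([0.5, -0.2])
mu = np.zeros(2)
sigma = np.array([[1.0, 0.3], [0.3, 2.0]])
print(multivariate_normal_pdf(x, mu, sigma))
```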
-
Discriminant Functions for the Normal Density
Bayes Decision Theory: Discrete Features
Bayesian Decision Theory
-
Discriminant Functions for the Normal Density
The minimum error-rate classification can be achieved by the discriminant function

$$g_i(x) = \ln p(x \mid \omega_i) + \ln P(\omega_i)$$
Case of multivariate normal
$$g_i(x) = -\frac{1}{2}(x-\mu_i)^t\,\Sigma_i^{-1}(x-\mu_i) - \frac{d}{2}\ln 2\pi - \frac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i)$$
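A direct transcription of this discriminant into Python (a sketch; the name gaussian_discriminant is mine):

```python
import numpy as np

def gaussian_discriminant(x, mu, sigma, prior):
    # g_i(x) = -1/2 (x-mu)^t Sigma^{-1} (x-mu) - d/2 ln 2pi - 1/2 ln|Sigma| + ln P(w_i)
    d = len(mu)
    diff = x - mu
    return (-0.5 * diff @ np.linalg.inv(sigma) @ diff
            - 0.5 * d * np.log(2 * np.pi)
            - 0.5 * np.log(np.linalg.det(sigma))
            + np.log(prior))
```

Classification assigns $x$ to the category with the largest $g_i(x)$; each of the three cases that follow is a simplification of this one function.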
-
Case 1: $\Sigma_i = \sigma^2 I$

All features have the same variance $\sigma^2$ and are uncorrelated. The discriminant becomes

$$g_i(x) = -\frac{(x-\mu_i)^t(x-\mu_i)}{2\sigma^2} + \ln P(\omega_i)$$

where $(x-\mu_i)^t(x-\mu_i) = \|x-\mu_i\|^2$ is the squared Euclidean distance from $x$ to $\mu_i$.

Expanding the quadratic term gives a linear discriminant function:

$$g_i(x) = w_i^t x + w_{i0}$$

where:

$$w_i = \frac{\mu_i}{\sigma^2}; \qquad w_{i0} = -\frac{\mu_i^t \mu_i}{2\sigma^2} + \ln P(\omega_i)$$

($w_{i0}$ is called the threshold for the $i$th category)
-
A classifier that uses linear discriminant functions is called a linear machine.
The decision surfaces for a linear machine are pieces of hyperplanes defined by:

$$g_i(x) = g_j(x)$$
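A sketch of a Case 1 linear machine in Python (the helper name linear_machine and the example numbers are mine): compute $g_i(x) = w_i^t x + w_{i0}$ for every class and pick the maximum.

```python
import numpy as np

def linear_machine(x, means, priors, sigma2):
    # Case 1: Sigma_i = sigma^2 I, so g_i(x) = w_i^t x + w_i0 with
    # w_i = mu_i / sigma^2 and w_i0 = -mu_i^t mu_i / (2 sigma^2) + ln P(w_i).
    scores = []
    for mu, prior in zip(means, priors):
        w = mu / sigma2
        w0 = -mu @ mu / (2 * sigma2) + np.log(prior)
        scores.append(w @ x + w0)
    return int(np.argmax(scores))  # index of the winning category

means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
priors = [0.5, 0.5]
print(linear_machine(np.array([1.0, 1.0]), means, priors, sigma2=1.0))  # -> 0
```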
-
The hyperplane separating $R_i$ and $R_j$ passes through the point

$$x_0 = \frac{1}{2}(\mu_i + \mu_j) - \frac{\sigma^2}{\|\mu_i - \mu_j\|^2}\,\ln\frac{P(\omega_i)}{P(\omega_j)}\,(\mu_i - \mu_j)$$

and is always orthogonal to the line linking the means!

If $P(\omega_i) = P(\omega_j)$, then $x_0 = \frac{1}{2}(\mu_i + \mu_j)$.
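A quick numeric check of the formula for $x_0$ (the means, variance, and priors are my own illustration):

```python
import numpy as np

mu_i, mu_j = np.array([0.0, 0.0]), np.array([2.0, 0.0])
sigma2, p_i, p_j = 1.0, 0.8, 0.2

d2 = np.sum((mu_i - mu_j) ** 2)  # ||mu_i - mu_j||^2
x0 = 0.5 * (mu_i + mu_j) - sigma2 / d2 * np.log(p_i / p_j) * (mu_i - mu_j)
print(x0)  # [1.693 0.] -- shifted toward mu_j, the less probable class
```

With equal priors the log term vanishes and $x_0$ falls back to the midpoint of the means.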
-
Case 2: $\Sigma_i = \Sigma$

The covariance matrices of all classes are identical but otherwise arbitrary, i.e., equal to each other for all classes. The features then form hyperellipsoidal clusters of equal size and shape. This again results in linear discriminant functions whose decision boundaries are hyperplanes.

The hyperplane separating $R_i$ and $R_j$ passes through

$$x_0 = \frac{1}{2}(\mu_i + \mu_j) - \frac{\ln\left[P(\omega_i)/P(\omega_j)\right]}{(\mu_i - \mu_j)^t\,\Sigma^{-1}(\mu_i - \mu_j)}\,(\mu_i - \mu_j)$$

(This hyperplane is generally not orthogonal to the line between the means!)
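For Case 2, dropping the terms of $g_i(x)$ that are common to all classes leaves the squared Mahalanobis distance plus the log prior. A sketch (function name and example numbers mine):

```python
import numpy as np

def case2_discriminant(x, mu, prior, sigma_inv):
    # Shared covariance: g_i(x) = -1/2 (x-mu_i)^t Sigma^{-1} (x-mu_i) + ln P(w_i)
    # (the -d/2 ln 2pi and -1/2 ln|Sigma| terms are the same for every class)
    diff = x - mu
    return -0.5 * diff @ sigma_inv @ diff + np.log(prior)

sigma_inv = np.linalg.inv(np.array([[2.0, 0.5], [0.5, 1.0]]))
mus = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]
priors = [0.5, 0.5]
x = np.array([1.5, 0.5])
print(max(range(2), key=lambda i: case2_discriminant(x, mus[i], priors[i], sigma_inv)))
```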
-
Case 3: $\Sigma_i$ arbitrary

The covariance matrices are different for each category. The discriminant functions are now, in general, quadratic (not linear):

$$g_i(x) = x^t W_i x + w_i^t x + w_{i0}$$

where:

$$W_i = -\frac{1}{2}\Sigma_i^{-1}, \qquad w_i = \Sigma_i^{-1}\mu_i, \qquad w_{i0} = -\frac{1}{2}\mu_i^t\Sigma_i^{-1}\mu_i - \frac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i)$$

In the two-class case, the decision boundaries are hyperquadrics: hyperplanes, pairs of hyperplanes, hyperspheres, hyperellipsoids, hyperparaboloids, or hyperhyperboloids.
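A sketch of the Case 3 quadratic discriminant in Python (function name mine):

```python
import numpy as np

def case3_discriminant(x, mu, sigma, prior):
    # W_i = -1/2 Sigma_i^{-1};  w_i = Sigma_i^{-1} mu_i
    # w_i0 = -1/2 mu_i^t Sigma_i^{-1} mu_i - 1/2 ln|Sigma_i| + ln P(w_i)
    sigma_inv = np.linalg.inv(sigma)
    W = -0.5 * sigma_inv
    w = sigma_inv @ mu
    w0 = (-0.5 * mu @ sigma_inv @ mu
          - 0.5 * np.log(np.linalg.det(sigma))
          + np.log(prior))
    return x @ W @ x + w @ x + w0
```

Because each class keeps its own $\Sigma_i$, none of the log terms cancel, and the boundary $g_i(x) = g_j(x)$ is quadratic in $x$.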
-
For the multi-class case, the boundaries look even more complicated.
[Figure: multi-class decision boundaries]
-
Bayes Decision Theory: Discrete Features

Components of $x$ are binary or integer valued; $x$ can take only one of $m$ discrete values $v_1, v_2, \ldots, v_m$.

Case of independent binary features in a two-category problem:

Let $x = [x_1, x_2, \ldots, x_d]^t$, where each $x_i$ is either 0 or 1, with probabilities:

$p_i = P(x_i = 1 \mid \omega_1)$
$q_i = P(x_i = 1 \mid \omega_2)$
-
The discriminant function in this case is:

$$g(x) = \sum_{i=1}^{d} w_i x_i + w_0$$

where:

$$w_i = \ln\frac{p_i(1-q_i)}{q_i(1-p_i)}, \qquad i = 1, \ldots, d$$

and:

$$w_0 = \sum_{i=1}^{d}\ln\frac{1-p_i}{1-q_i} + \ln\frac{P(\omega_1)}{P(\omega_2)}$$

Decide $\omega_1$ if $g(x) > 0$ and $\omega_2$ if $g(x) \le 0$.
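A sketch of this binary-feature discriminant in Python (function name and the probability values are mine):

```python
import numpy as np

def binary_feature_discriminant(x, p, q, prior1, prior2):
    # g(x) = sum_i w_i x_i + w_0; decide w1 if g(x) > 0
    w = np.log(p * (1 - q) / (q * (1 - p)))
    w0 = np.sum(np.log((1 - p) / (1 - q))) + np.log(prior1 / prior2)
    return w @ x + w0

p = np.array([0.8, 0.7, 0.6])   # P(x_i = 1 | w1)
q = np.array([0.2, 0.3, 0.4])   # P(x_i = 1 | w2)
x = np.array([1, 1, 0])
print(binary_feature_discriminant(x, p, q, 0.5, 0.5) > 0)  # True -> decide w1
```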
-
Compound Bayesian Decision Theory and Context

Consecutive states of nature $\omega(1), \omega(2), \omega(3), \ldots$ may be statistically dependent. Exploit this statistical dependence to gain improved performance by using context.

Compound decision problem
Sequential compound decision problem
-
Compound Bayesian Decision Theory and Context

Let $\omega = (\omega(1), \omega(2), \ldots, \omega(n))^t$ denote the $n$ states of nature, where each $\omega(i)$ takes one of the $c$ values $\omega_1, \ldots, \omega_c$, and let $X = (x_1, \ldots, x_n)$ be the observed feature vectors.

The posterior probability of $\omega$ is

$$P(\omega \mid X) = \frac{p(X \mid \omega)\,P(\omega)}{p(X)}, \qquad p(X) = \sum_{\omega} p(X \mid \omega)\,P(\omega)$$

The optimal procedure is to minimize the compound conditional risk. If there is no loss for being correct and all errors are equally costly, the procedure reduces to computing $P(\omega \mid X)$ for all $\omega$ and selecting the $\omega$ for which the posterior probability is maximum.

In practice this is an enormous task ($c^n$ possible values of $\omega$), and it depends on the prior $P(\omega)$. If $x_i$ depends only on $\omega(i)$, the class-conditional density factors:

$$p(X \mid \omega) = \prod_{i=1}^{n} p(x_i \mid \omega(i))$$
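To make the $c^n$ cost concrete, here is a brute-force sketch of the compound decision rule (all names are mine; feasible only for very small $n$):

```python
import itertools
import numpy as np

def compound_bayes(X, densities, prior):
    # densities[j] is a callable p(x | w_j); prior is a callable P(omega) over
    # whole sequences, which is where the statistical dependence lives.
    n, c = len(X), len(densities)
    best, best_score = None, -np.inf
    for omega in itertools.product(range(c), repeat=n):  # all c^n sequences
        # p(X | omega) factors as prod_i p(x_i | omega(i))
        like = np.prod([densities[j](x) for j, x in zip(omega, X)])
        score = like * prior(omega)  # proportional to P(omega | X)
        if score > best_score:
            best_score, best = score, omega
    return best
```

Sequential compound methods avoid this full enumeration by exploiting the structure of $P(\omega)$.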