pr_unit-iii
TRANSCRIPT
UNIT-III The Normal Density
Univariate density
A density which is analytically tractable
A continuous density
Many processes are asymptotically Gaussian
Handwritten characters and speech sounds can be modeled as an ideal or prototype corrupted by a random process (central limit theorem)
$$P(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\,\exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right]$$

where:
$\mu$ = mean (or expected value) of $x$
$\sigma^2$ = expected squared deviation, or variance
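A minimal numeric sketch of this density in NumPy (the function name univariate_normal is my own, not from the slides):

```python
import numpy as np

def univariate_normal(x, mu, sigma):
    # P(x) = 1/(sqrt(2*pi)*sigma) * exp(-0.5*((x - mu)/sigma)^2)
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

print(univariate_normal(0.0, mu=0.0, sigma=1.0))  # 0.3989..., the peak of the standard normal
```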
-
Multivariate density
Multivariate normal density in d dimensions is:
$$P(x) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}}\,\exp\left[-\frac{1}{2}(x-\mu)^t\,\Sigma^{-1}(x-\mu)\right]$$

where:
$x = (x_1, x_2, \ldots, x_d)^t$ ($t$ stands for the transpose vector form)
$\mu = (\mu_1, \mu_2, \ldots, \mu_d)^t$ is the mean vector
$\Sigma$ is the $d \times d$ covariance matrix
$|\Sigma|$ and $\Sigma^{-1}$ are its determinant and inverse, respectively
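The same formula transcribed into NumPy as a sketch (function name mine; scipy.stats.multivariate_normal.pdf should give the same value):

```python
import numpy as np

def multivariate_normal_pdf(x, mu, sigma):
    # P(x) = exp(-0.5 (x-mu)^t Sigma^{-1} (x-mu)) / ((2 pi)^{d/2} |Sigma|^{1/2})
    d = len(mu)
    diff = x - mu
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(sigma))
    return np.exp(-0.5 * diff @ np.linalg.inv(sigma) @ diff) / norm

x = np.array([0.5, -0.2])
mu = np.zeros(2)
sigma = np.array([[1.0, 0.3], [0.3, 2.0]])
print(multivariate_normal_pdf(x, mu, sigma))
```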
-
Discriminant Functions for the Normal Density
Bayes Decision Theory: Discrete Features
Bayesian Decision Theory
-
Discriminant Functions for the Normal Density
The minimum error-rate classification can be achieved by the discriminant function

$$g_i(x) = \ln p(x \mid \omega_i) + \ln P(\omega_i)$$
Case of multivariate normal
$$g_i(x) = -\frac{1}{2}(x-\mu_i)^t\,\Sigma_i^{-1}(x-\mu_i) - \frac{d}{2}\ln 2\pi - \frac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i)$$
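A direct transcription of this discriminant into Python (a sketch; the name gaussian_discriminant is mine):

```python
import numpy as np

def gaussian_discriminant(x, mu, sigma, prior):
    # g_i(x) = -1/2 (x-mu)^t Sigma^{-1} (x-mu) - d/2 ln 2pi - 1/2 ln|Sigma| + ln P(w_i)
    d = len(mu)
    diff = x - mu
    return (-0.5 * diff @ np.linalg.inv(sigma) @ diff
            - 0.5 * d * np.log(2 * np.pi)
            - 0.5 * np.log(np.linalg.det(sigma))
            + np.log(prior))
```

Classification assigns $x$ to the category with the largest $g_i(x)$; each of the three cases that follow is a simplification of this one function.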
-
Case 1: $\Sigma_i = \sigma^2 I$

All features have the same variance $\sigma^2$ and are uncorrelated. The discriminant becomes

$$g_i(x) = -\frac{(x-\mu_i)^t(x-\mu_i)}{2\sigma^2} + \ln P(\omega_i)$$

where $(x-\mu_i)^t(x-\mu_i) = \|x-\mu_i\|^2$ is the squared Euclidean distance from $x$ to $\mu_i$.

Expanding the quadratic term gives a linear discriminant function:

$$g_i(x) = w_i^t x + w_{i0}$$

where:

$$w_i = \frac{\mu_i}{\sigma^2}; \qquad w_{i0} = -\frac{\mu_i^t \mu_i}{2\sigma^2} + \ln P(\omega_i)$$

($w_{i0}$ is called the threshold for the $i$th category)
-
A classifier that uses linear discriminant functions is called a linear machine.
The decision surfaces for a linear machine are pieces of hyperplanes defined by:

$$g_i(x) = g_j(x)$$
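A sketch of a Case 1 linear machine in Python (the helper name linear_machine and the example numbers are mine): compute $g_i(x) = w_i^t x + w_{i0}$ for every class and pick the maximum.

```python
import numpy as np

def linear_machine(x, means, priors, sigma2):
    # Case 1: Sigma_i = sigma^2 I, so g_i(x) = w_i^t x + w_i0 with
    # w_i = mu_i / sigma^2 and w_i0 = -mu_i^t mu_i / (2 sigma^2) + ln P(w_i).
    scores = []
    for mu, prior in zip(means, priors):
        w = mu / sigma2
        w0 = -mu @ mu / (2 * sigma2) + np.log(prior)
        scores.append(w @ x + w0)
    return int(np.argmax(scores))  # index of the winning category

means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
priors = [0.5, 0.5]
print(linear_machine(np.array([1.0, 1.0]), means, priors, sigma2=1.0))  # -> 0
```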
-
The hyperplane separating $R_i$ and $R_j$ passes through the point

$$x_0 = \frac{1}{2}(\mu_i + \mu_j) - \frac{\sigma^2}{\|\mu_i - \mu_j\|^2}\,\ln\frac{P(\omega_i)}{P(\omega_j)}\,(\mu_i - \mu_j)$$

and is always orthogonal to the line linking the means!

If $P(\omega_i) = P(\omega_j)$, then $x_0 = \frac{1}{2}(\mu_i + \mu_j)$.
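A quick numeric check of the formula for $x_0$ (the means, variance, and priors are my own illustration):

```python
import numpy as np

mu_i, mu_j = np.array([0.0, 0.0]), np.array([2.0, 0.0])
sigma2, p_i, p_j = 1.0, 0.8, 0.2

d2 = np.sum((mu_i - mu_j) ** 2)  # ||mu_i - mu_j||^2
x0 = 0.5 * (mu_i + mu_j) - sigma2 / d2 * np.log(p_i / p_j) * (mu_i - mu_j)
print(x0)  # [1.693 0.] -- shifted toward mu_j, the less probable class
```

With equal priors the log term vanishes and $x_0$ falls back to the midpoint of the means.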
-
Case 2: $\Sigma_i = \Sigma$

The covariance matrices of all classes are identical but otherwise arbitrary, i.e., equal to each other for all classes. The features then form hyperellipsoidal clusters of equal size and shape. This again results in linear discriminant functions whose decision boundaries are hyperplanes.

The hyperplane separating $R_i$ and $R_j$ passes through

$$x_0 = \frac{1}{2}(\mu_i + \mu_j) - \frac{\ln\left[P(\omega_i)/P(\omega_j)\right]}{(\mu_i - \mu_j)^t\,\Sigma^{-1}(\mu_i - \mu_j)}\,(\mu_i - \mu_j)$$

(This hyperplane is generally not orthogonal to the line between the means!)
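For Case 2, dropping the terms of $g_i(x)$ that are common to all classes leaves the squared Mahalanobis distance plus the log prior. A sketch (function name and example numbers mine):

```python
import numpy as np

def case2_discriminant(x, mu, prior, sigma_inv):
    # Shared covariance: g_i(x) = -1/2 (x-mu_i)^t Sigma^{-1} (x-mu_i) + ln P(w_i)
    # (the -d/2 ln 2pi and -1/2 ln|Sigma| terms are the same for every class)
    diff = x - mu
    return -0.5 * diff @ sigma_inv @ diff + np.log(prior)

sigma_inv = np.linalg.inv(np.array([[2.0, 0.5], [0.5, 1.0]]))
mus = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]
priors = [0.5, 0.5]
x = np.array([1.5, 0.5])
print(max(range(2), key=lambda i: case2_discriminant(x, mus[i], priors[i], sigma_inv)))
```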
-
Case 3: $\Sigma_i$ arbitrary

The covariance matrices are different for each category. The discriminant functions are now, in general, quadratic (not linear):

$$g_i(x) = x^t W_i x + w_i^t x + w_{i0}$$

where:

$$W_i = -\frac{1}{2}\Sigma_i^{-1}, \qquad w_i = \Sigma_i^{-1}\mu_i, \qquad w_{i0} = -\frac{1}{2}\mu_i^t\Sigma_i^{-1}\mu_i - \frac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i)$$

In the two-class case, the decision boundaries are hyperquadrics: hyperplanes, pairs of hyperplanes, hyperspheres, hyperellipsoids, hyperparaboloids, or hyperhyperboloids.
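A sketch of the Case 3 quadratic discriminant in Python (function name mine):

```python
import numpy as np

def case3_discriminant(x, mu, sigma, prior):
    # W_i = -1/2 Sigma_i^{-1};  w_i = Sigma_i^{-1} mu_i
    # w_i0 = -1/2 mu_i^t Sigma_i^{-1} mu_i - 1/2 ln|Sigma_i| + ln P(w_i)
    sigma_inv = np.linalg.inv(sigma)
    W = -0.5 * sigma_inv
    w = sigma_inv @ mu
    w0 = (-0.5 * mu @ sigma_inv @ mu
          - 0.5 * np.log(np.linalg.det(sigma))
          + np.log(prior))
    return x @ W @ x + w @ x + w0
```

Because each class keeps its own $\Sigma_i$, none of the log terms cancel, and the boundary $g_i(x) = g_j(x)$ is quadratic in $x$.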
-
For the multi-class case, the boundaries look even more complicated.
[Figure: multi-class decision boundaries]
-
Bayes Decision Theory: Discrete Features

Components of $x$ are binary or integer valued; $x$ can take only one of $m$ discrete values $v_1, v_2, \ldots, v_m$.

Case of independent binary features in a two-category problem:

Let $x = [x_1, x_2, \ldots, x_d]^t$, where each $x_i$ is either 0 or 1, with probabilities:

$p_i = P(x_i = 1 \mid \omega_1)$
$q_i = P(x_i = 1 \mid \omega_2)$
-
The discriminant function in this case is:

$$g(x) = \sum_{i=1}^{d} w_i x_i + w_0$$

where:

$$w_i = \ln\frac{p_i(1-q_i)}{q_i(1-p_i)}, \qquad i = 1, \ldots, d$$

and:

$$w_0 = \sum_{i=1}^{d}\ln\frac{1-p_i}{1-q_i} + \ln\frac{P(\omega_1)}{P(\omega_2)}$$

Decide $\omega_1$ if $g(x) > 0$ and $\omega_2$ if $g(x) \le 0$.
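A sketch of this binary-feature discriminant in Python (function name and the probability values are mine):

```python
import numpy as np

def binary_feature_discriminant(x, p, q, prior1, prior2):
    # g(x) = sum_i w_i x_i + w_0; decide w1 if g(x) > 0
    w = np.log(p * (1 - q) / (q * (1 - p)))
    w0 = np.sum(np.log((1 - p) / (1 - q))) + np.log(prior1 / prior2)
    return w @ x + w0

p = np.array([0.8, 0.7, 0.6])   # P(x_i = 1 | w1)
q = np.array([0.2, 0.3, 0.4])   # P(x_i = 1 | w2)
x = np.array([1, 1, 0])
print(binary_feature_discriminant(x, p, q, 0.5, 0.5) > 0)  # True -> decide w1
```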
-
Compound Bayesian Decision Theory and Context

Consecutive states of nature $\omega(1), \omega(2), \omega(3), \ldots$ may be statistically dependent. Exploit this statistical dependence to gain improved performance by using context.

Compound decision problem
Sequential compound decision problem
-
Compound Bayesian Decision Theory and Context

Let $\omega = (\omega(1), \omega(2), \ldots, \omega(n))^t$ denote the $n$ states of nature, where each $\omega(i)$ takes one of the $c$ values $\omega_1, \ldots, \omega_c$, and let $X = (x_1, \ldots, x_n)$ be the observed feature vectors.

The posterior probability of $\omega$ is

$$P(\omega \mid X) = \frac{p(X \mid \omega)\,P(\omega)}{p(X)}, \qquad p(X) = \sum_{\omega} p(X \mid \omega)\,P(\omega)$$

The optimal procedure is to minimize the compound conditional risk. If there is no loss for being correct and all errors are equally costly, the procedure reduces to computing $P(\omega \mid X)$ for all $\omega$ and selecting the $\omega$ for which the posterior probability is maximum.

In practice this is an enormous task ($c^n$ possible values of $\omega$), and it depends on the prior $P(\omega)$. If $x_i$ depends only on $\omega(i)$, the class-conditional density factors:

$$p(X \mid \omega) = \prod_{i=1}^{n} p(x_i \mid \omega(i))$$
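To make the $c^n$ cost concrete, here is a brute-force sketch of the compound decision rule (all names are mine; feasible only for very small $n$):

```python
import itertools
import numpy as np

def compound_bayes(X, densities, prior):
    # densities[j] is a callable p(x | w_j); prior is a callable P(omega) over
    # whole sequences, which is where the statistical dependence lives.
    n, c = len(X), len(densities)
    best, best_score = None, -np.inf
    for omega in itertools.product(range(c), repeat=n):  # all c^n sequences
        # p(X | omega) factors as prod_i p(x_i | omega(i))
        like = np.prod([densities[j](x) for j, x in zip(omega, X)])
        score = like * prior(omega)  # proportional to P(omega | X)
        if score > best_score:
            best_score, best = score, omega
    return best
```

Sequential compound methods avoid this full enumeration by exploiting the structure of $P(\omega)$.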