TRANSCRIPT
ENG 8801/9881 - Special Topics in Computer Engineering: Pattern Recognition
Memorial University of Newfoundland
Pattern Recognition
Lecture 18, July 13, 2006
Charles Robertson
http://www.engr.mun.ca/~charlesr
Office Hours: Tuesdays & Thursdays 8:30 - 9:30 PM
EN-3026
Dates and Times
• Presentations next week
• Assignment 5 due on Monday, July 24th
• Final Reports due on July 28th
Presentations - July 18th and 20th
• July 18th
• Liang Chen
• Chen Hao
• Chao Ying
• Shenqiu Zhang
• July 20th
• Zhang Chong
• Anjana Punchihewa
• Liang Zhang
• Yan Zhang
Recap
• Feature selection
• choose n of m measurements
• Evaluation:
• Feature interclass distance
• Selection:
• Feature ranking
• Incrementally best feature
• Successive additions/deletions
Feature Extraction
Given m measurements $\{x_1, \dots, x_m\}$, find $n < m$ functions $y_j = f_j(x_1, \dots, x_m)$, $j = 1..n$, which produce the n best features. Some applications warrant non-linear functions, including look-up tables. We will only consider the linear case:

$$y = A x$$
Find suitable criteria for A.
Usual approaches:
- Intraclass distance (for 1 class)
- Interclass distance (for k labelled classes)
- Representation error (clustering)
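To make the linear case concrete, here is a minimal NumPy sketch (random data and an arbitrary A, purely illustrative): the n extracted features are just the rows of A applied to the measurement vector.

```python
import numpy as np

# Illustrative shapes only: the rest of the lecture is about how to choose A.
m, n = 5, 2                         # m measurements in, n features out
rng = np.random.default_rng(0)
x = rng.standard_normal(m)          # one measurement vector
A = rng.standard_normal((n, m))     # an (arbitrary) n x m transformation
y = A @ x                           # the n extracted features
print(y.shape)                      # (2,)
```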
Single-class Feature Extraction
Choose A to minimize the intra-class distance:
$$J(A) = \sum_{i=1}^{N}\sum_{j=1}^{N}\left\|y_i - y_j\right\|^2 = \sum_{i=1}^{N}\sum_{j=1}^{N}(x_i - x_j)^T A^T A\,(x_i - x_j)$$

We could formally solve this optimization, but we have done the same calculation before in orthonormal whitening. We want the minimum variance subspace.
Minimum variance direction = minimum eigenvalue eigenvector.
Minimum variance plane = pair of minimum eigenvalue eigenvectors.
Min. variance plane: $\phi_1, \phi_2$, with $\lambda_1 \le \lambda_2 \le \dots \le \lambda_m$.

[3D diagram: axes $x_1, x_2, x_3$ showing the eigenvector directions $\phi_1, \phi_2, \phi_3$]

$$|S_y| = \left|A S A^T\right|$$
If columns of A are eigenvectors of S:
$$S A^T = A^T \Lambda, \qquad S\phi_i = \lambda_i \phi_i$$

$$\left|A S A^T\right| = \left|A A^T \Lambda\right|$$
so
$$S_y = \begin{pmatrix} \phi_1^T \\ \vdots \\ \phi_n^T \end{pmatrix}\begin{pmatrix} \phi_1 & \cdots & \phi_n \end{pmatrix}\begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{pmatrix}, \qquad |S_y| = \prod_{i=1}^{n}\lambda_i$$

since $\phi_i^T\phi_j = \delta_{ij}$.
So choosing the smallest eigenvalues provides the smallest scatter of the new features.
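A minimal NumPy sketch of this single-class recipe (the function name and data layout are my own, assuming the N samples are the rows of X): the rows of A are the n minimum-eigenvalue eigenvectors of the sample covariance S.

```python
import numpy as np

def min_variance_features(X, n):
    """Single-class extraction: project onto the n minimum-eigenvalue
    eigenvectors of the sample covariance S (minimum scatter)."""
    S = np.cov(X, rowvar=False)        # m x m sample covariance
    lam, Phi = np.linalg.eigh(S)       # ascending eigenvalues; S is symmetric,
                                       # so Phi's columns are orthonormal
    A = Phi[:, :n].T                   # n x m: smallest-variance directions
    Y = (X - X.mean(axis=0)) @ A.T     # N x n extracted features
    return Y, A, lam[:n]               # |S_y| is the product of lam[:n]
```

Because `eigh` returns an orthonormal basis, the identity $\phi_i^T\phi_j = \delta_{ij}$ used above holds exactly.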
K-classes feature extraction
We know that there are k classes, and we have labelled samples.
We want to maximize interclass distance to give the maximum
separation.
Fisher's Criterion:

$$J(A) = \frac{\left|A S_B A^T\right|}{\left|A S_W A^T\right|}$$

This is a multidimensional version of Fisher's linear discriminant. Recall the 1-D case:

$$J(w) = \frac{w^T S_B w}{w^T S_W w}, \qquad w = S_W^{-1}(m_1 - m_2)$$
Now we need to compute $\partial J(A)/\partial A$ and set it equal to 0.
We can use the following linear algebra result:
$$\frac{\partial}{\partial A}\left|A S A^T\right| = 2\left|A S A^T\right|\left(A S A^T\right)^{-1} A S$$
So...
$$\frac{\partial J(A)}{\partial A} = 2\,\frac{\left|A S_B A^T\right|}{\left|A S_W A^T\right|}\left(A S_B A^T\right)^{-1} A S_B - 2\,\frac{\left|A S_B A^T\right|}{\left|A S_W A^T\right|}\left(A S_W A^T\right)^{-1} A S_W = 0$$
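The matrix derivative result can be checked numerically. A small sketch with a random symmetric positive definite S and finite differences (all names illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 2
A = rng.standard_normal((n, m))
B = rng.standard_normal((m, m))
S = B @ B.T + m * np.eye(m)             # symmetric positive definite

f = lambda A: np.linalg.det(A @ S @ A.T)

# Closed form: d/dA |A S A^T| = 2 |A S A^T| (A S A^T)^{-1} A S
M = A @ S @ A.T
grad = 2.0 * np.linalg.det(M) * np.linalg.solve(M, A @ S)

# Central-difference approximation, entry by entry
eps, fd = 1e-6, np.zeros_like(A)
for i in range(n):
    for j in range(m):
        E = np.zeros_like(A)
        E[i, j] = eps
        fd[i, j] = (f(A + E) - f(A - E)) / (2 * eps)

print(np.allclose(grad, fd, rtol=1e-4))  # True
```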
Continuing...
$$0 = \left(A S_B A^T\right)^{-1} A S_B - \left(A S_W A^T\right)^{-1} A S_W$$

$$A S_B = \left(A S_B A^T\right)\left(A S_W A^T\right)^{-1} A S_W$$

Recall the 1-D case:

$$S_B w = \frac{w^T S_B w}{w^T S_W w}\, S_W w, \qquad \text{a scalar multiple:}\quad S_B w = \lambda S_W w$$

Here we have a system of such equations, and

$$A S_B = \Lambda A S_W \;\Rightarrow\; S_B A^T = S_W A^T \Lambda \;\Rightarrow\; S_W^{-1} S_B A^T = A^T \Lambda$$
Therefore the columns of $A^T$ are eigenvectors of $S_W^{-1} S_B$.
Notes:
$$J(A) = \frac{\left|A S_B A^T\right|}{\left|A S_W A^T\right|} = |\Lambda|$$

$$A = \begin{pmatrix} \phi_1^T \\ \vdots \\ \phi_n^T \end{pmatrix}, \qquad y_i = \phi_i^T x$$
Projection onto the $i$th eigenvector of $S_W^{-1} S_B$.
In fact, they are the maximum eigenvalue eigenvectors of $S_W^{-1} S_B$.
Notes:
$S_W^{-1} S_B$ is not generally symmetric. Thus the eigenvectors are not orthogonal!
Also, n must be less than the number of classes k for $\left|A S_B A^T\right| \neq 0$, since $\operatorname{rank}(S_B) \le k - 1$.
To get more than k-1 features, we can use

$$J(A) = \frac{\left|A S_T A^T\right|}{\left|A S_W A^T\right|} \;\Rightarrow\; S_T A^T = S_W A^T \Lambda \;\Rightarrow\; S_W^{-1} S_T A^T = A^T \Lambda$$

whose solutions are the maximum eigenvalue eigenvectors of $S_W^{-1} S_T$.
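A NumPy sketch of the k-class recipe (function name and data layout are my own; it assumes labelled samples as rows of X and an invertible S_W): build the scatter matrices and take the maximum-eigenvalue eigenvectors of S_W^{-1} S_B as the rows of A.

```python
import numpy as np

def fisher_features(X, labels, n):
    """k-class extraction via Fisher's criterion. The rows of A are the
    n maximum-eigenvalue eigenvectors of S_W^{-1} S_B; n should be at
    most k-1 (see the note above)."""
    m = X.shape[1]
    mean = X.mean(axis=0)
    Sw = np.zeros((m, m))                  # within-class scatter
    Sb = np.zeros((m, m))                  # between-class scatter
    for c in np.unique(labels):
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        d = (mc - mean)[:, None]
        Sb += len(Xc) * (d @ d.T)
    # S_W^{-1} S_B is not symmetric, so use the general eigensolver;
    # the eigenvalues are real here but come back as complex dtype.
    lam, Phi = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(lam.real)[::-1]     # descending eigenvalues
    A = Phi[:, order[:n]].real.T           # n x m; rows are NOT orthogonal
    return X @ A.T, A
```

To get more than k-1 features as noted above, the same code applies with the total scatter $S_T = S_W + S_B$ in place of $S_B$.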
[Figure 3.6 from Duda, Hart & Stork, Pattern Classification (Wiley, 2001): three three-dimensional distributions are projected onto two-dimensional subspaces, described by normal vectors W1 and W2. Informally, multiple discriminant methods seek the optimum such subspace, that is, the one with the greatest separation of the projected distributions for a given total within-scatter matrix, here associated with W1.]
Clustering
What are appropriate criteria for extracting features for clusters?
- Uncorrelated features
- Maximize the variance of the features (assumes between-class scatter > within-class scatter)
- Representation error
Suppose:
$$y = Ax = \begin{pmatrix} \phi_1^T \\ \vdots \\ \phi_n^T \end{pmatrix} x$$
So $\hat{x} = y_1\phi_1 + y_2\phi_2 + \dots + y_n\phi_n$ is an approximation to x in n dimensions.
We'd like to minimize $E\left(|x - \hat{x}|^2\right) = E\left(\left|x - \sum_{i=1}^{n} y_i\phi_i\right|^2\right)$ over the $\phi_i$. If we require that all $\phi_i$ are orthogonal and assume we have shifted the origin to the mean of all samples ($x \leftarrow x - m$),
and we can write
$$x = \sum_{i=1}^{m} y_i \phi_i$$

if we use all m components of some orthogonal basis,
then the representation error is
$$e = x - \hat{x} = \sum_{i=n+1}^{m} y_i \phi_i$$

with expected squared representation error

$$E\left(|e|^2\right) = E\left(\sum_{i=n+1}^{m}\sum_{j=n+1}^{m} y_i y_j\, \phi_i^T \phi_j\right) = \sum_{i=n+1}^{m} E\left(y_i^2\right)$$
But $E\left(y_i^2\right)$ is the variance of feature $i$, since

$$E[y] = E[A(x - m)] = 0.$$
So we can minimize the error if we choose the n maximum variance directions as our set $\{\phi_1, \dots, \phi_n\}$.

The n maximum eigenvalue eigenvectors of the total sample covariance matrix produce a set of n
- uncorrelated features
- maximum variance features
- minimum representation error features.
These eigenvectors are called the Principal Components:
- they account for as much variance as possible
- they span the maximum scatter subspace.
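A NumPy sketch of this clustering recipe, i.e. principal component extraction (names and layout are illustrative assumptions): project onto the n maximum-eigenvalue eigenvectors of the total sample covariance and report the representation error.

```python
import numpy as np

def pca_features(X, n):
    """Project onto the n maximum-eigenvalue eigenvectors (principal
    components) of the total sample covariance, after shifting the
    origin to the mean of all samples."""
    mean = X.mean(axis=0)
    S = np.cov(X, rowvar=False)             # total sample covariance
    lam, Phi = np.linalg.eigh(S)            # ascending eigenvalues
    A = Phi[:, ::-1][:, :n].T               # n largest-variance directions
    Y = (X - mean) @ A.T                    # extracted features y = A(x - m)
    X_hat = Y @ A + mean                    # x_hat = sum_i y_i phi_i (+ mean)
    # Mean squared representation error: approximately the sum of the
    # m - n discarded (smallest) eigenvalues.
    err = np.mean(np.sum((X - X_hat) ** 2, axis=1))
    return Y, A, err
```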
Feature Extraction Summary
$$y = A x$$

                 Criterion               A
Single Class     MICD                    Eigenvectors of $S$ (minimum eigenvalue)
k Classes        Fisher's Criterion      Eigenvectors of $S_W^{-1} S_B$ (maximum eigenvalue)
Clustering       Representation Error    Eigenvectors of $S$ (maximum eigenvalue)

where y contains the extracted features and A is the linear transformation applied to x, the original measurements.