EE 7700 Pattern Classification


TRANSCRIPT

Page 1:

EE 7700

Pattern Classification

Page 2:

Classification Example

Goal: Automatically classify incoming fish according to species, and send them to the respective packing plants.
Features: length, width, color, brightness, etc.
Model: Sea bass have some typical length, and it is greater than that for salmon.
Classifier: If the fish is longer than some threshold value l*, classify it as sea bass; otherwise, classify it as salmon.
Training samples: To choose l*, make length measurements on training samples and inspect the results.
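A minimal sketch of this threshold rule in Python (my illustration, not from the slides; the measurements and the midpoint heuristic for choosing l* are hypothetical; the slides only say to inspect the training results):

```python
import numpy as np

def choose_threshold(salmon_lengths, bass_lengths):
    # A simple hypothetical heuristic: midpoint between the sample means.
    return (np.mean(salmon_lengths) + np.mean(bass_lengths)) / 2.0

def classify(length, l_star):
    # Decision rule from the slide: longer than l* -> sea bass.
    return "sea bass" if length > l_star else "salmon"

# Hypothetical training measurements (cm).
salmon = np.array([55.0, 60.0, 58.0, 62.0])
bass = np.array([70.0, 75.0, 68.0, 80.0])
l_star = choose_threshold(salmon, bass)
print(classify(66.0, l_star))
```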

Page 3:

Classification Example

Page 4:

Classification Example

Page 5:

Classification Example

Now, we have two features to classify the fish: the lightness x1 and the width x2. Feature vector: x = [x1 x2]'. The feature extractor reduces the image of a fish to a feature vector x in a 2D feature space.

Decision boundary

Page 6:

Classification Example

Page 7:

Classification Example

Page 8:

Feature Extraction

The goal of the feature extractor is to characterize an object to be recognized by measurements whose values are very similar for objects in the same category, and very different for objects in different categories.

The features should be invariant to irrelevant transformations of the input. For example, the location of a fish on the belt is irrelevant, and thus the representation should be insensitive to the location of the fish.

Page 9:

Classification

The task of the classifier is to use feature vectors (provided by the feature extractor) to assign the object to a category.

Perfect classification is often impossible; a more general task is to determine the probability for each of the possible categories.

The process of using data to determine the classifier is referred to as training the classifier.

Page 10:

Classical Model

[Block diagram: Raw data → Feature Extractor → features x1, x2, ..., xd → Classifier → Class 1 or 2 or ... or c]

We measure a fixed set of d features for an object that we want to classify. For example,

x1 = height
x2 = perimeter
...
xd = average pixel intensity

Page 11:

Feature Vectors

We can think of our feature set as a feature vector x, where x is the d-dimensional column vector

$\mathbf{x} = [x_1, x_2, \ldots, x_d]^T$

We can think of x as being a point in a d-dimensional feature space.

By this process of feature measurement, we can represent an object as a point in feature space.

Page 12:

What is ahead

Template matching
Minimum-distance classifiers
Metrics
Inner products
Linear discriminants
Bayesian approach

Page 13:

Template Matching

To classify one of the noisy characters, simply compare it to the two 'templates' on the left.

Comparison can be done in many ways; here are two:
Count the number of places where the template and pattern agree. Pick the class that has the maximum number of agreements.
Count the number of places where the template and pattern disagree. Pick the class that has the smallest number of disagreements.

This may not work well when there is rotation, scaling, warping, occlusion, etc.
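A minimal sketch of both comparison rules (my illustration; it assumes binary 0/1 images stored as NumPy arrays, and the templates and the noisy pattern are hypothetical):

```python
import numpy as np

def classify_by_agreement(pattern, templates):
    # Count matching pixels against each template; pick the class
    # with the maximum number of agreements.
    agreements = [np.sum(pattern == t) for t in templates]
    return int(np.argmax(agreements))

def classify_by_disagreement(pattern, templates):
    # Equivalent rule: minimize the number of disagreements.
    disagreements = [np.sum(pattern != t) for t in templates]
    return int(np.argmin(disagreements))

# Hypothetical 3x3 binary templates for two classes, plus a noisy pattern.
templates = [np.array([[1, 1, 1], [0, 1, 0], [0, 1, 0]]),   # 'T'-like
             np.array([[1, 0, 1], [1, 0, 1], [1, 1, 1]])]   # 'U'-like
noisy = np.array([[1, 1, 1], [0, 1, 0], [0, 0, 0]])
print(classify_by_agreement(noisy, templates))     # -> 0
print(classify_by_disagreement(noisy, templates))  # same class
```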

Page 14:

Template Matching

[Figure: does image f match template g? This comparison is the most popular approach.]

Question: How can we achieve rotation invariance?

Page 15:

Page 16:

Minimum Distance Classifiers

Template matching can be expressed mathematically through a notion of distance.

Let x be the feature vector for the unknown input, and let m1, m2, ..., mc be templates (i.e., perfect, noise-free feature vectors) for the c classes.

The error in matching x against mk is given by || x - mk ||.

Choose the class for which the error is a minimum.

Since || x - mk || is the distance from x to mk, the technique is called minimum distance classification.
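A short sketch of minimum distance classification (my illustration; the templates here reuse the m1, m2 values from the example on Page 20, and the test point is hypothetical):

```python
import numpy as np

def min_distance_classify(x, templates):
    # templates: list of c noise-free feature vectors m_1, ..., m_c.
    # Choose the class whose template is nearest in Euclidean distance.
    errors = [np.linalg.norm(x - m) for m in templates]
    return int(np.argmin(errors))

m = [np.array([4.3, 1.3]), np.array([1.5, 0.3])]
print(min_distance_classify(np.array([3.0, 1.0]), m))  # -> 0
```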

Page 17:

Minimum Distance Classifiers

[Block diagram: input x is compared against each template; Distance(x, m1), Distance(x, m2), ..., Distance(x, mc) feed a minimum selector, which outputs the class.]

Possible distance metrics:

$\|\mathbf{a}\| = (\mathbf{a}^T \mathbf{a})^{1/2}$  (Euclidean distance)

$\sum_{k=1}^{d} |a_k| = |a_1| + |a_2| + \cdots + |a_d|$  ("sum of absolute values")

Page 18:

Euclidean Distance

x is a column vector of d features, x1, x2, ..., xd. By using the transpose operator ' we can convert the column vector x to the row vector x':

$\mathbf{x}^T = [x_1, x_2, \ldots, x_d]$

The inner product of two column vectors x and y is defined by

$\mathbf{x}^T \mathbf{y} = x_1 y_1 + x_2 y_2 + \cdots + x_d y_d = \sum_{k=1}^{d} x_k y_k$

Thus the norm of x (using the Euclidean metric) is given by

$\|\mathbf{x}\| = \sqrt{\mathbf{x}^T \mathbf{x}}$

Page 19:

Inner Products

Important additional properties of inner products:

$\mathbf{x}^T \mathbf{y} = \mathbf{y}^T \mathbf{x} = \|\mathbf{x}\|\,\|\mathbf{y}\| \cos(\text{angle between } \mathbf{x} \text{ and } \mathbf{y})$

$\mathbf{x}^T (\mathbf{y} + \mathbf{z}) = \mathbf{x}^T \mathbf{y} + \mathbf{x}^T \mathbf{z}$

The inner product of x and y is maximum when the angle between them is zero, i.e., when one is just a positive multiple of the other. Sometimes we say that x'y is the correlation between x and y, and that the correlation is maximum when x and y point in the same direction.

If x'y = 0, the vectors x and y are said to be orthogonal or uncorrelated.

Page 20:

Minimum Distance Classifiers

Example: Let m1=[4.3 1.3]’ and m2=[1.5 0.3]’. Find the decision boundary.
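One way to work this example (my addition; the slide leaves the solution to the reader): the boundary is the set of points equidistant from the two templates, i.e., the perpendicular bisector of the segment joining them:

$\|\mathbf{x}-\mathbf{m}_1\|^2 = \|\mathbf{x}-\mathbf{m}_2\|^2 \;\Rightarrow\; 2(\mathbf{m}_1-\mathbf{m}_2)^T\mathbf{x} = \|\mathbf{m}_1\|^2 - \|\mathbf{m}_2\|^2$

With m1 = [4.3 1.3]' and m2 = [1.5 0.3]', this gives $2(2.8\,x_1 + 1.0\,x_2) = 20.18 - 2.34 = 17.84$, so the decision boundary is the line $2.8\,x_1 + x_2 = 8.92$, which passes through the midpoint (2.9, 0.8) of the two templates.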

Page 21:

Linear Discriminants

For the minimum distance classifier, we chose the nearest class. Use the inner product to express the Euclidean distance from x to mk:

$\|\mathbf{x}-\mathbf{m}_k\|^2 = (\mathbf{x}-\mathbf{m}_k)^T(\mathbf{x}-\mathbf{m}_k) = \mathbf{x}^T\mathbf{x} - \mathbf{m}_k^T\mathbf{x} - \mathbf{x}^T\mathbf{m}_k + \mathbf{m}_k^T\mathbf{m}_k = -2\left[\mathbf{m}_k^T\mathbf{x} - \tfrac{1}{2}\mathbf{m}_k^T\mathbf{m}_k\right] + \mathbf{x}^T\mathbf{x}$

Since x'x is constant (the same for every class), to find the template mk which minimizes ||x - mk|| it is sufficient to find the mk which maximizes the bracketed term above. Define the linear discriminant function g(x) as

$g_k(\mathbf{x}) = \mathbf{m}_k^T\mathbf{x} - \tfrac{1}{2}\|\mathbf{m}_k\|^2$

Page 22:

Minimum Euclidean Distance Classifier

A minimum-Euclidean-distance classifier classifies an input feature vector x by computing c linear discriminant functions

g1(x), g2(x), ..., gc(x)

and assigning x to the class corresponding to the maximum discriminant function.

[Block diagram: x feeds the c linear discriminant functions g1(x), g2(x), ..., gc(x), built from templates m1, m2, ..., mc; a maximum selector outputs the class.]
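A sketch (mine, with the same hypothetical templates as before) showing that the linear discriminants reproduce the nearest-template decision:

```python
import numpy as np

def discriminants(x, templates):
    # g_k(x) = m_k'x - 0.5 ||m_k||^2, one value per class.
    return [m @ x - 0.5 * (m @ m) for m in templates]

def classify(x, templates):
    # Maximum discriminant <=> minimum Euclidean distance.
    return int(np.argmax(discriminants(x, templates)))

m = [np.array([4.3, 1.3]), np.array([1.5, 0.3])]
x = np.array([3.0, 1.0])
assert classify(x, m) == int(np.argmin([np.linalg.norm(x - mk) for mk in m]))
```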

Page 23:

Feature Scaling

The numerical value for a feature x depends on the units used, i.e., on the scale. If x is multiplied by a scale factor a, both the mean and the standard deviation are multiplied by a. The variance is multiplied by a².

Sometimes it is desirable to scale the data so that the resulting standard deviation is unity: divide x by the standard deviation s.

Similarly, in measuring the distance from x to m, it often makes sense to measure it relative to the standard deviation.

Page 24:

Feature Scaling

This suggests an important generalization of a minimum-Euclidean-distance classifier.

Let x(i) be the value for Feature i,

let m(i,j) be the mean value of Feature i for Class j, and

let s(i,j) be the standard deviation of Feature i for Class j.

In measuring the distance between the feature vector x and the mean vector mj for Class j, use the standardized distance

$r(\mathbf{x},\mathbf{m}_j)^2 = \left(\frac{x_1 - m_{1j}}{s_{1j}}\right)^2 + \left(\frac{x_2 - m_{2j}}{s_{2j}}\right)^2 + \cdots + \left(\frac{x_d - m_{dj}}{s_{dj}}\right)^2$

Page 25:

Covariance

The covariance of two features measures their tendency to vary together, i.e., to co-vary.

The variance is the average of the squared deviation of a feature from its mean; the covariance is the average of the products of the deviations of feature values from their means.

Consider Feature i and Feature j.

Let { x(1,i), x(2,i), ... , x(n,i) } be a set of n examples of Feature i

Let { x(1,j), x(2,j), ... , x(n,j) } be a corresponding set of n examples of Feature j

Page 26:

Covariance

Let m(i) be the mean of Feature i, and m(j) be the mean of Feature j. Then the covariance of Feature i and Feature j is defined by

$c(i,j) = \frac{[x(1,i) - m(i)][x(1,j) - m(j)] + \cdots + [x(n,i) - m(i)][x(n,j) - m(j)]}{n-1}$

The covariance has several important properties:
If Feature i and Feature j tend to increase together, then c(i,j) > 0.
If Feature i tends to decrease when Feature j increases, then c(i,j) < 0.
If Feature i and Feature j are independent, then c(i,j) = 0.
|c(i,j)| <= s(i) s(j), where s(i) is the standard deviation of Feature i.
c(i,i) = s(i)², the variance of Feature i.
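A minimal NumPy check of this definition (my sketch; the sample values are hypothetical):

```python
import numpy as np

def covariance(xi, xj):
    # c(i,j) = sum_k [x(k,i) - m(i)][x(k,j) - m(j)] / (n - 1)
    n = len(xi)
    return np.sum((xi - xi.mean()) * (xj - xj.mean())) / (n - 1)

xi = np.array([1.0, 2.0, 3.0, 4.0])   # hypothetical samples of Feature i
xj = np.array([2.0, 4.0, 5.0, 9.0])   # hypothetical samples of Feature j
print(covariance(xi, xj))             # matches the formula above
print(np.cov(xi, xj)[0, 1])           # NumPy's built-in estimate agrees
```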

Page 27:

Covariance Matrix

All of the covariances c(i,j) can be collected together into a covariance matrix C:

$C = \begin{bmatrix} c(1,1) & c(1,2) & \cdots & c(1,d) \\ c(2,1) & c(2,2) & \cdots & c(2,d) \\ \vdots & \vdots & & \vdots \\ c(d,1) & c(d,2) & \cdots & c(d,d) \end{bmatrix}$

Page 28:

Covariance Matrix

We need to normalize the distance. Recall what we did earlier to get a standardized distance for a single feature:

$r^2 = \left(\frac{x - m}{s}\right)^2 = (x-m)\,\frac{1}{s^2}\,(x-m)$

What is the matrix generalization of this scalar equation?

$r^2 = (\mathbf{x}-\mathbf{m}_x)^T C_x^{-1} (\mathbf{x}-\mathbf{m}_x)$

This is the "Mahalanobis distance".
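A short sketch of the Mahalanobis distance (my illustration; the class statistics are hypothetical). With a diagonal covariance it reduces to the standardized distance above:

```python
import numpy as np

def mahalanobis_sq(x, m, C):
    # r^2 = (x - m)' C^{-1} (x - m); solve() avoids forming C^{-1} explicitly.
    d = x - m
    return float(d @ np.linalg.solve(C, d))

# Hypothetical class mean and covariance.
m = np.array([2.0, 3.0])
C = np.array([[4.0, 1.0],
              [1.0, 2.0]])
print(mahalanobis_sq(np.array([3.0, 5.0]), m, C))  # -> 2.0
```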

Page 29:

Pros and Cons

The use of the Mahalanobis distance removes several of the limitations of the Euclidean metric:
It automatically accounts for the scaling of the coordinate axes.
It corrects for correlation between the different features.
It can provide curved as well as linear decision boundaries.

Cons:
Covariance matrices can be hard to determine accurately.
Memory and time requirements grow with the number of features.

Page 30:

Bayesian Decision Theory

Return to the fish example. There are two categories. Denote these categories as w1 for sea bass and w2 for salmon.

Assume that there is some prior probability (or simply prior) P(w1) that the next fish is sea bass, and some prior probability P(w2) that it is salmon.

Suppose that we make a decision without making a measurement. The logical decision rule is

Decide w1 if P(w1) > P(w2); otherwise decide w2

Page 31:

Bayesian Decision Theory

Suppose that we have a feature vector x; now the decision rule is

Decide w1 if P(w1 | x) > P(w2 | x); otherwise decide w2

Using the Bayes formula

$P(w_i \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid w_i)\, P(w_i)}{p(\mathbf{x})}$

where

$p(\mathbf{x}) = \sum_i p(\mathbf{x} \mid w_i)\, P(w_i)$
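A minimal sketch of this rule (my illustration; the likelihood values and priors are hypothetical):

```python
import numpy as np

def posterior(likelihoods, priors):
    # P(w_i | x) = p(x | w_i) P(w_i) / p(x),
    # with p(x) = sum_i p(x | w_i) P(w_i).
    joint = likelihoods * priors
    return joint / joint.sum()

# Hypothetical class-conditional densities evaluated at some x.
p_x_given_w = np.array([0.6, 0.2])      # [sea bass, salmon]
P_w = np.array([0.3, 0.7])              # priors
P_w_given_x = posterior(p_x_given_w, P_w)
decision = int(np.argmax(P_w_given_x))  # decide w1 if P(w1|x) > P(w2|x)
print(P_w_given_x, decision)
```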

Page 32:

Bayesian Decision Theory

Define a set of discriminant functions gi(x), i=1,…,c

$P(w_i \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid w_i)\, P(w_i)}{p(\mathbf{x})}$

Since p(x) is the same for all classes, equivalent choices are

$g_i(\mathbf{x}) = p(\mathbf{x} \mid w_i)\, P(w_i)$

$g_i(\mathbf{x}) = \ln p(\mathbf{x} \mid w_i) + \ln P(w_i)$

Page 33:

Gaussian Density

Univariate:

$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right]$

Multivariate:

$p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2}\,|\boldsymbol{\Sigma}|^{1/2}} \exp\left[-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x}-\boldsymbol{\mu})\right]$
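A direct transcription of the multivariate density into NumPy (my sketch; the μ and Σ values are hypothetical):

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    # p(x) = exp(-0.5 (x-mu)' Sigma^{-1} (x-mu)) / ((2 pi)^{d/2} |Sigma|^{1/2})
    d = len(mu)
    diff = x - mu
    maha = diff @ np.linalg.solve(Sigma, diff)
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * maha) / norm

mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.0], [0.0, 1.0]])
print(gaussian_pdf(np.array([0.0, 0.0]), mu, Sigma))  # 1/(2*pi) ~ 0.159
```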

Page 34:

Gaussian Density

Center of the cluster is determined by the mean vector, and the shape of the cluster is determined by the covariance matrix.

$p(\mathbf{x}) \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$

$r^2 = (\mathbf{x}-\boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x}-\boldsymbol{\mu})$

is the Mahalanobis distance from x to the mean.

Page 35:

Discriminant Functions for Gaussian

Let us examine the discriminant function

$g_i(\mathbf{x}) = \ln p(\mathbf{x} \mid w_i) + \ln P(w_i)$

for

$p(\mathbf{x} \mid w_i) \sim N(\boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i)$

$g_i(\mathbf{x}) = \ln\left[\frac{1}{(2\pi)^{d/2}|\boldsymbol{\Sigma}_i|^{1/2}} \exp\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1}(\mathbf{x}-\boldsymbol{\mu}_i)\right)\right] + \ln P(w_i)$

$g_i(\mathbf{x}) = -\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1}(\mathbf{x}-\boldsymbol{\mu}_i) - \frac{d}{2}\ln 2\pi - \frac{1}{2}\ln|\boldsymbol{\Sigma}_i| + \ln P(w_i)$

Page 36:

Discriminant Functions for Gaussian

Case I: $\boldsymbol{\Sigma}_i = \sigma^2 \mathbf{I}$, so $\boldsymbol{\Sigma}_i^{-1} = (1/\sigma^2)\,\mathbf{I}$. Then

$g_i(\mathbf{x}) = -\frac{1}{2\sigma^2}(\mathbf{x}-\boldsymbol{\mu}_i)^T(\mathbf{x}-\boldsymbol{\mu}_i) + \ln P(w_i)$

Page 37:

Discriminant Functions for Gaussian

Case I: $\boldsymbol{\Sigma}_i = \sigma^2 \mathbf{I}$:

$g_i(\mathbf{x}) = -\frac{1}{2\sigma^2}(\mathbf{x}-\boldsymbol{\mu}_i)^T(\mathbf{x}-\boldsymbol{\mu}_i) + \ln P(w_i)$

As the priors change, the decision boundaries shift.

Page 38:

Discriminant Functions for Gaussian

Case I: $\boldsymbol{\Sigma}_i = \sigma^2 \mathbf{I}$:

$g_i(\mathbf{x}) = -\frac{1}{2\sigma^2}(\mathbf{x}-\boldsymbol{\mu}_i)^T(\mathbf{x}-\boldsymbol{\mu}_i) + \ln P(w_i)$

Page 39:

Discriminant Functions for Gaussian

Examples: Find the decision boundaries for 1D and 2D Gaussian data. Solve for x from

$g_1(\mathbf{x}) = g_2(\mathbf{x})$, where $g_i(\mathbf{x}) = -\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1}(\mathbf{x}-\boldsymbol{\mu}_i) - \frac{d}{2}\ln 2\pi - \frac{1}{2}\ln|\boldsymbol{\Sigma}_i| + \ln P(w_i)$
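A worked 1D step (my addition; the slide leaves it as an exercise). With equal variances, $p(x \mid w_i) \sim N(\mu_i, \sigma^2)$, setting $g_1(x) = g_2(x)$ gives

$-\frac{(x-\mu_1)^2}{2\sigma^2} + \ln P(w_1) = -\frac{(x-\mu_2)^2}{2\sigma^2} + \ln P(w_2)$

$\Rightarrow\; x = \frac{\mu_1 + \mu_2}{2} + \frac{\sigma^2}{\mu_1 - \mu_2}\, \ln\frac{P(w_2)}{P(w_1)}$

With equal priors the boundary sits at the midpoint of the two means; as the priors change, the boundary shifts toward the less probable class.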

Page 40:

Discriminant Functions for Gaussian

Case: $\boldsymbol{\Sigma}_i$ arbitrary.

Page 41:

Parameter Estimation

We learned how we could design an optimal classifier if we knew the prior probabilities P(wi) and the class-conditional densities p(x|wi).

In a typical application, we rarely have complete knowledge. We typically have some general knowledge and a number of design samples (or training data).

We use the samples to estimate the unknown probabilities and probability densities, and then use these estimates as if they were the true values.

If the densities can be parameterized, the problem is simplified significantly. (For example, for the Gaussian distribution, the mean and covariance matrix are the only parameters we need to estimate.)

Page 42:

Parameter Estimation

Gaussian case:

$\hat{\boldsymbol{\mu}} = \frac{1}{n}\sum_{k=1}^{n} \mathbf{x}_k$

$\hat{\boldsymbol{\Sigma}} = \frac{1}{n}\sum_{k=1}^{n} (\mathbf{x}_k - \hat{\boldsymbol{\mu}})(\mathbf{x}_k - \hat{\boldsymbol{\mu}})^T$
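A sketch of these estimates in NumPy (my illustration; the data matrix is hypothetical):

```python
import numpy as np

def estimate_gaussian(X):
    # X: n x d matrix, one sample x_k per row.
    n = X.shape[0]
    mu_hat = X.mean(axis=0)           # (1/n) sum_k x_k
    D = X - mu_hat
    Sigma_hat = (D.T @ D) / n         # (1/n) sum_k (x_k - mu)(x_k - mu)'
    return mu_hat, Sigma_hat

X = np.random.default_rng(0).normal(size=(500, 2))  # hypothetical samples
mu_hat, Sigma_hat = estimate_gaussian(X)
print(mu_hat, Sigma_hat, sep="\n")
```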

Page 43:

Dimensionality

The accuracy degrades when the dimensionality is large. The dimensionality can be reduced by combining features. Linear combinations are attractive because they are simple to compute and analytically tractable.

Dimensionality reduction techniques include:
Principal Component Analysis
Fisher's Discriminant Analysis

Page 44:

Principal Component Analysis (PCA)

Find a lower dimensional space that best represents the data in a least-squares sense.

[Figure: data in the full N-dimensional space (here N = 2) projected onto a d-dimensional subspace (here d = 1). Figure credit: U. of Delaware]

Page 45:

Principal Component Analysis (PCA)

We begin by considering the problem of representing N-dimensional vectors x1, x2, …, xn by a single vector x0. To be more specific, suppose that we want to find a vector x0 such that the sum of squared differences between x0 and the xk is as small as possible.

Define the cost function to be minimized:

$J_0(\mathbf{x}_0) = \sum_{k=1}^{n} \|\mathbf{x}_0 - \mathbf{x}_k\|^2$

The solution is the sample mean:

$\mathbf{x}_0 = \boldsymbol{\mu} = \frac{1}{n}\sum_{k=1}^{n} \mathbf{x}_k$

Page 46:

Principal Component Analysis (PCA)

The sample mean does not reveal any of the variability in the data. Let's now consider a solution of the form

$\mathbf{x}_k = \boldsymbol{\mu} + a_k \mathbf{e}$

where ak is a scalar and e is a unit vector. Define the cost function to be minimized:

$J_1(a_1, \ldots, a_n, \mathbf{e}) = \sum_{k=1}^{n} \|\boldsymbol{\mu} + a_k \mathbf{e} - \mathbf{x}_k\|^2$

The solution is

$a_k = \mathbf{e}^T (\mathbf{x}_k - \boldsymbol{\mu})$

Page 47:

Principal Component Analysis (PCA)

What is the best direction e for the line?

Using $a_k = \mathbf{e}^T(\mathbf{x}_k - \boldsymbol{\mu})$ in $J_1(a_1, \ldots, a_n, \mathbf{e}) = \sum_{k=1}^{n} \|\boldsymbol{\mu} + a_k \mathbf{e} - \mathbf{x}_k\|^2$, we get

$J_1(\mathbf{e}) = -\mathbf{e}^T \mathbf{S}\, \mathbf{e} + \sum_{k=1}^{n} \|\mathbf{x}_k - \boldsymbol{\mu}\|^2$

where

$\mathbf{S} = \sum_{k=1}^{n} (\mathbf{x}_k - \boldsymbol{\mu})(\mathbf{x}_k - \boldsymbol{\mu})^T$

Find the e that maximizes $\mathbf{e}^T \mathbf{S}\, \mathbf{e}$ subject to $\mathbf{e}^T \mathbf{e} = 1$.

Page 48:

Principal Component Analysis (PCA)

The solution is

$\mathbf{S}\,\mathbf{e} = \lambda \mathbf{e}$, where $\mathbf{S} = \sum_{k=1}^{n} (\mathbf{x}_k - \boldsymbol{\mu})(\mathbf{x}_k - \boldsymbol{\mu})^T$

Since

$\mathbf{e}^T \mathbf{S}\, \mathbf{e} = \lambda\, \mathbf{e}^T \mathbf{e} = \lambda$

we select the eigenvector corresponding to the largest eigenvalue.

Page 49:

Principal Component Analysis (PCA)

Generalize to d dimensions (d <= n):

$\mathbf{x}_k = \boldsymbol{\mu} + \sum_{i=1}^{d} a_i \mathbf{e}_i, \qquad a_i = \mathbf{e}_i^T(\mathbf{x}_k - \boldsymbol{\mu}), \quad i = 1, \ldots, d$

Find the eigenvectors e1, e2, …, ed corresponding to the d largest eigenvalues of S.
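A compact sketch of the whole procedure (my illustration; the data are hypothetical):

```python
import numpy as np

def pca(X, d):
    # X: n x N data matrix. Returns the mean, the top-d eigenvectors
    # of the scatter matrix S, and the coefficients a_i for each sample.
    mu = X.mean(axis=0)
    D = X - mu
    S = D.T @ D                          # scatter matrix
    evals, evecs = np.linalg.eigh(S)     # eigenvalues in ascending order
    E = evecs[:, ::-1][:, :d]            # d largest -> columns e_1..e_d
    A = D @ E                            # a_i = e_i'(x_k - mu)
    return mu, E, A

X = np.random.default_rng(1).normal(size=(200, 5))
mu, E, A = pca(X, d=2)
X_approx = mu + A @ E.T                  # x_k ~ mu + sum_i a_i e_i
print(np.linalg.norm(X - X_approx))      # residual of the d-dimensional fit
```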

Page 50:

Face Recognition

[Figure: a gallery of known faces and an unknown probe face to identify.]

Page 51:

Eigenface Approach

Reduce the dimensionality by applying PCA:
Apply PCA to a training dataset to find the first d principal components (here d = 8).
Find the weights $a_1, \ldots, a_8$ for all images.
Classify the probe using norm distance, as sketched below.
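A self-contained sketch of this pipeline (my illustration; the "faces" are random vectors standing in for flattened images, and the labels, sizes, and noise level are all hypothetical):

```python
import numpy as np

# Hypothetical gallery: 20 flattened 8x8 face images (one row each).
rng = np.random.default_rng(2)
gallery = rng.normal(size=(20, 64))
labels = np.arange(20)                         # one identity per image

# PCA on the training set: first d = 8 principal components.
mu = gallery.mean(axis=0)
D = gallery - mu
evals, evecs = np.linalg.eigh(D.T @ D)
E = evecs[:, ::-1][:, :8]                      # eigenfaces e_1..e_8
weights = D @ E                                # a_1..a_8 for every image

# Project a noisy probe and classify by norm distance in weight space.
probe = gallery[7] + 0.1 * rng.normal(size=64)
a_probe = (probe - mu) @ E
nearest = int(np.argmin(np.linalg.norm(weights - a_probe, axis=1)))
print(labels[nearest])                         # -> 7
```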

Page 52:

Fisher’s Linear Discriminant

Although PCA finds components that are useful for representing data, there is no reason to assume that these components must be useful for discriminating between data in different classes.

Page 53:

Fisher’s Linear Discriminant

Suppose that we have a set of N-dimensional vectors x1, x2, …, xn; n1 of them are in the subset D1, and n2 of them are in the subset D2.

We want to find a unit vector w that enables accurate classification.

We would like to have the means of the projected points well separated.

Page 54:

Fisher’s Linear Discriminant

Let m1 and m2 be the sample means of D1 and D2, respectively:

$\mathbf{m}_1 = \frac{1}{n_1}\sum_{\mathbf{x} \in D_1} \mathbf{x}, \qquad \mathbf{m}_2 = \frac{1}{n_2}\sum_{\mathbf{x} \in D_2} \mathbf{x}$

The means for the projected points are

$\tilde{m}_1 = \mathbf{w}^T \mathbf{m}_1, \qquad \tilde{m}_2 = \mathbf{w}^T \mathbf{m}_2$

Page 55:

Fisher's Linear Discriminant

The distance between the projected means is

$|\tilde{m}_1 - \tilde{m}_2| = |\mathbf{w}^T \mathbf{m}_1 - \mathbf{w}^T \mathbf{m}_2| = |\mathbf{w}^T (\mathbf{m}_1 - \mathbf{m}_2)|$

We want this difference to be large relative to the standard deviations for each class. Define the scatter for the projected samples as

$\tilde{s}_i^2 = \sum_{\mathbf{x} \in D_i} (\mathbf{w}^T \mathbf{x} - \tilde{m}_i)^2$

Page 56:

Fisher's Linear Discriminant

The criterion function we want to maximize is

$J(\mathbf{w}) = \frac{(\tilde{m}_1 - \tilde{m}_2)^2}{\tilde{s}_1^2 + \tilde{s}_2^2}$

Let's write J(w) as an explicit function of w:

$(\tilde{m}_1 - \tilde{m}_2)^2 = (\mathbf{w}^T \mathbf{m}_1 - \mathbf{w}^T \mathbf{m}_2)^2 = \mathbf{w}^T (\mathbf{m}_1 - \mathbf{m}_2)(\mathbf{m}_1 - \mathbf{m}_2)^T \mathbf{w} = \mathbf{w}^T \mathbf{S}_B\, \mathbf{w}$

SB is called the "between-class scatter matrix".

Page 57:

Fisher's Linear Discriminant

Let's write J(w) as an explicit function of w. For the scatters,

$\tilde{s}_i^2 = \sum_{\mathbf{x} \in D_i} (\mathbf{w}^T \mathbf{x} - \mathbf{w}^T \mathbf{m}_i)^2 = \sum_{\mathbf{x} \in D_i} \mathbf{w}^T (\mathbf{x} - \mathbf{m}_i)(\mathbf{x} - \mathbf{m}_i)^T \mathbf{w} = \mathbf{w}^T \mathbf{S}_i\, \mathbf{w}$

$\tilde{s}_1^2 + \tilde{s}_2^2 = \mathbf{w}^T \mathbf{S}_1 \mathbf{w} + \mathbf{w}^T \mathbf{S}_2 \mathbf{w} = \mathbf{w}^T (\mathbf{S}_1 + \mathbf{S}_2)\, \mathbf{w} = \mathbf{w}^T \mathbf{S}_W\, \mathbf{w}$

SW is called the "within-class scatter matrix".

Page 58:

Fisher's Linear Discriminant

Therefore

$J(\mathbf{w}) = \frac{\mathbf{w}^T \mathbf{S}_B\, \mathbf{w}}{\mathbf{w}^T \mathbf{S}_W\, \mathbf{w}}$

This expression is known as the generalized Rayleigh quotient. The vector w that maximizes J(w) must satisfy

$\mathbf{S}_B\, \mathbf{w} = \lambda\, \mathbf{S}_W\, \mathbf{w}$

Page 59:

Fisher's Linear Discriminant

If SW is nonsingular, we have the conventional eigenvalue problem:

$\mathbf{S}_W^{-1} \mathbf{S}_B\, \mathbf{w} = \lambda\, \mathbf{w}$

Since SBw is always in the direction of m1 - m2, and the scale factor for w is not an issue, we can immediately write the solution for the w that maximizes J(w):

$\mathbf{w} = \mathbf{S}_W^{-1}(\mathbf{m}_1 - \mathbf{m}_2)$
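A two-class sketch of this closed-form solution (my illustration; the data are hypothetical Gaussian clusters):

```python
import numpy as np

def fisher_direction(X1, X2):
    # w = S_W^{-1} (m_1 - m_2), with S_W the within-class scatter matrix.
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    D1, D2 = X1 - m1, X2 - m2
    Sw = D1.T @ D1 + D2.T @ D2
    return np.linalg.solve(Sw, m1 - m2)

rng = np.random.default_rng(3)
X1 = rng.normal(loc=[0, 0], size=(50, 2))
X2 = rng.normal(loc=[3, 1], size=(50, 2))
w = fisher_direction(X1, X2)
# The projected class means are well separated along w:
print((X1 @ w).mean(), (X2 @ w).mean())
```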

Page 60:

Fisher's Linear Discriminant

In general, when there are c classes, we solve for the eigenvectors of

$\mathbf{S}_B\, \mathbf{w}_i = \lambda_i\, \mathbf{S}_W\, \mathbf{w}_i$

where

$\mathbf{S}_B = \sum_{i=1}^{c} n_i (\mathbf{m}_i - \mathbf{m})(\mathbf{m}_i - \mathbf{m})^T$

$\mathbf{S}_W = \sum_{i=1}^{c} \sum_{\mathbf{x} \in D_i} (\mathbf{x} - \mathbf{m}_i)(\mathbf{x} - \mathbf{m}_i)^T$

$\mathbf{m} = \frac{1}{n}\sum_{\mathbf{x}} \mathbf{x} = \frac{1}{n}\sum_{i=1}^{c} n_i \mathbf{m}_i$

Page 61:

Fisher's Linear Discriminant

Therefore, dimensionality is reduced by

$\mathbf{y} = \mathbf{W}^T \mathbf{x}, \qquad \mathbf{W} = [\mathbf{w}_1 \; \cdots \; \mathbf{w}_{c-1}]$

Note that SB is the sum of c matrices of rank 1 or 0, and only c-1 of these matrices are independent; therefore, the maximum rank of SB is c-1.

Page 62:

Fisher's Linear Discriminant

Belhumeur et al., "Eigenfaces vs. Fisherfaces," PAMI 1997.

Page 63:

Fisher's Linear Discriminant

Belhumeur et al., "Eigenfaces vs. Fisherfaces," PAMI 1997.

Page 64:

Fisher's Linear Discriminant

Belhumeur et al., "Eigenfaces vs. Fisherfaces," PAMI 1997.

Page 65:

Fisher's Linear Discriminant

Belhumeur et al., "Eigenfaces vs. Fisherfaces," PAMI 1997.

Page 66:

Linear Discriminant Functions

Define linear discriminant functions

$g_i(\mathbf{x}) = \mathbf{w}_i^T \mathbf{x} + w_{i0} = \sum_{k=1}^{d} w_{ik} x_k + w_{i0}, \qquad i = 1, \ldots, c$

Assign x to wi if

$g_i(\mathbf{x}) > g_j(\mathbf{x})$ for all $j \neq i$

The boundary between classes i and j is given by

$(\mathbf{w}_i - \mathbf{w}_j)^T \mathbf{x} + (w_{i0} - w_{j0}) = 0$

Page 67:

Linear Discriminant Functions

Two-class case

Page 68:

Linear Discriminant Functions

Hyperplane Hij is defined by $g_i(\mathbf{x}) = g_j(\mathbf{x})$.

Page 69:

Multilayer Neural Networks

Linear decision boundaries are useful, but often not very powerful.

One way to get more complex boundaries is to apply several linear classifiers and then apply another classifier to their outputs.

Page 70:

Multilayer Neural Networks

Page 71:

Multilayer Neural Networks

Page 72:

Multilayer Neural Networks

Page 73:

Multilayer Neural Networks

Find w that minimizes

$J(\mathbf{w}) = \frac{1}{2}\sum_{k=1}^{c} (t_k - z_k)^2$

where tk is the target (desired) output and zk is the network output.

Page 74:

Multilayer Neural Networks

Training: Choose parameters that minimize the error on the training set,

$J(\mathbf{w}) = \frac{1}{2}\sum_{k=1}^{c} (t_k - z_k)^2$

using stochastic gradient descent:

$\mathbf{w}(i+1) = \mathbf{w}(i) - \eta\, \nabla J(\mathbf{w}(i))$

Page 75:

Multilayer NN

$\frac{\partial J}{\partial w_{kj}} = \frac{\partial J}{\partial net_k} \frac{\partial net_k}{\partial w_{kj}}$

$\frac{\partial J}{\partial net_k} = \frac{\partial J}{\partial z_k} \frac{\partial z_k}{\partial net_k}$

$\frac{\partial J}{\partial w_{ji}} = \frac{\partial J}{\partial y_j} \frac{\partial y_j}{\partial net_j} \frac{\partial net_j}{\partial w_{ji}}$
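A minimal two-layer network trained with these chain-rule updates (a sketch under my own assumptions: sigmoid hidden units, a linear output, a single hypothetical training pair, and learning rate eta; none of this is fixed by the slides beyond the gradient equations):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(4)
W1 = rng.normal(scale=0.5, size=(2, 3))    # input -> hidden weights w_ji
W2 = rng.normal(scale=0.5, size=(3, 1))    # hidden -> output weights w_kj
eta = 0.5

x, t = np.array([1.0, 0.0]), np.array([1.0])  # hypothetical training pair

for _ in range(100):
    # Forward pass.
    net_j = x @ W1
    y = sigmoid(net_j)                       # hidden activations y_j
    z = y @ W2                               # linear output z_k
    # Backward pass for J = 0.5 * sum_k (t_k - z_k)^2.
    delta_k = -(t - z)                       # dJ/dnet_k (linear output unit)
    dW2 = np.outer(y, delta_k)               # dJ/dw_kj = delta_k * y_j
    delta_j = (W2 @ delta_k) * y * (1 - y)   # dJ/dnet_j via the chain rule
    dW1 = np.outer(x, delta_j)               # dJ/dw_ji = delta_j * x_i
    W2 -= eta * dW2
    W1 -= eta * dW1

print(sigmoid(x @ W1) @ W2)  # output approaches the target t = 1
```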

Page 76:

Multilayer Neural Networks

An activation (or squashing) function

Page 77:

Multilayer Neural Networks

Page 78:

Multilayer Neural Networks

Page 79:

Multilayer Neural Networks

Page 80:

Multilayer Neural Networks