linear discriminant functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –the...

63
Linear Discriminant Functions

Upload: others

Post on 25-Jul-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

Linear Discriminant Functions

Page 2: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

1

Linear discriminant functions anddecision surfaces

• DefinitionIt is a function that is a linear combination of the components of x

g(x) = wtx + w0 (1)where w is the weight vector and w0 the bias

• A two-category classifier with a discriminant function ofthe form (1) uses the following rule:Decide ω1 if

g(x) > 0 and ω2 if g(x) < 0⇔ Decide ω1 if

wtx > -w0 and ω2 otherwiseIf g(x) = 0 ⇒ x is assigned to either class

Page 3: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

2

Page 4: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

3

– The equation g(x) = 0 defines the decisionsurface that separates points assigned to thecategory ω1 from points assigned to thecategory ω2

– When g(x) is linear, the decision surface is ahyperplane

– Algebraic measure of the distance from x tothe hyperplane (interesting result!)

Page 5: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

4

Page 6: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

5

– In conclusion, a linear discriminant function dividesthe feature space by a hyperplane decision surface

– The orientation of the surface is determined by thenormal vector w and the location of the surface isdetermined by the bias

!

x = xp +rw

w

since g(xp ) = 0 and wtw = w2

g(x) = wtx + w0 " w

txp +

rw

w

#

$ %

&

' ( + w0

= g(xp ) + wtw

r

w

" r =g(x)

w

in particular d([0,0],H) =w 0

w

H

w

x

xtw

r

xp

Page 7: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

6

– The multi-category case

• We define c linear discriminant functions

and assign x to ωi if gi(x) > gj(x) ∀ j ≠ i; in case of ties, the classificationis undefined

• In this case, the classifier is a “linear machine”• A linear machine divides the feature space into c decision regions, with

gi(x) being the largest discriminant if x is in the region Ri

• For a two contiguous regions Ri and Rj; the boundary that separatesthem is a portion of hyperplane Hij defined by:

gi(x) = gj(x)⇔ (wi – wj)tx + (wi0 – wj0) = 0

• wi – wj is normal to Hij and

!

gi(x) = wi

tx + wi0 i =1,...,c

ji

ji

ijww

gg)H,x(d

!

!=

Page 8: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

7

Page 9: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

8

Generalized Linear Discriminant Functions

• Decision boundaries which separate between classesmay not always be linear

• The complexity of the boundaries may sometimesrequest the use of highly non-linear surfaces

• A popular approach to generalize the concept of lineardecision functions is to consider a generalized decisionfunction as:

g(x) = w1f1(x) + w2f2(x) + … + wNfN(x) + wN+1 (1)

where fi(x), 1 ≤ i ≤ N are scalar functions of thepattern x, x ∈ Rn (Euclidean Space)

Page 10: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

9

• Introducing fn+1(x) = 1 we get:

• This latter representation of g(x) implies that anydecision function defined by equation (1) can be treatedas linear in the (N + 1) dimensional space (N + 1 > n)

• g(x) maintains its non-linearity characteristics in Rn

!

g(x) = wi fi(x) = wT.y

i=1

N +1

"

where w = (w1,w2,...,wN ,wN +1)T

and y = (f1(x), f2(x),..., fN (x), fN +1(x))T

Page 11: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

10

• The most commonly used generalized decisionfunction is g(x) for which fi(x) (1 ≤ i ≤N) arepolynomials

• Quadratic decision functions for a 2-dimensionalfeature space

!

g(x) = w0

+ wixii=1:N

" + # ijxij= 0

"i=1:N

" x j + $ ijkxij=1:N

"i=1:N

" x j

k=1:N

" xk + ....

!

g(x) = w1x1

2+ w2x1x2 + w3x2

2+ w4x1 + w5x2 + w6

here : w = (w1,w2,...,w6)T

and y = (x1

2,x1x2,x2

2,x1,x2,1)

T

Page 12: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

11

• For patterns x ∈Rn, the most general quadratic decisionfunction is given by:

• The number of terms at the right-hand side is:

This is the total number of weights which are the freeparameters of the problem– n = 3 => 10-dimensional– n = 10 => 65-dimensional

!

g(x) = " ii xi2

+ " ij xix j + wixi + wn+1 (2)

i=1

n

#j= i+1

n

#i=1

n$1

#i=1

n

#

!

l = N +1

= n +n(n "1)

2+ n +1

=(n +1)(n + 2)

2

Page 13: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

12

• In the case of polynomial decision functions of order m, atypical fi(x) is given by:

– It is a polynomial with a degree between 0 and m. To avoidrepetitions, i1 ≤ i2 ≤ …≤ im

where gm(x) is the most general polynomial decisionfunction of order m

!

fi(x) = xi1e1 xi2

e2 ...ximem

where 1" i1,i2,...,im " n and ei,1" i " m is 0 or 1.

!

gm(x) = ... wi1i2 ...im

xi1 xi2 ...xim + gm"1(x)

im= im"1

n

#i2= i1

n

#i1=1

n

#

Page 14: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

13

Example 1: Let n = 3 and m = 2 then:

Example 2: Let n = 2 and m = 3 then:

4332211

2

3333223

2

22231132112

2

111

3

ii

4332211iiii

3

1i

2

wxwxwxw

xwxxwxwxxwxxwx w

wxwxwxwxxw)x(g12

2121

1

++++

+++++=

++++= !!==

32211

2

2222112

2

111

2

ii

1

iiii

2

1i

2

23

2222

2

211222

2

1112

3

1111

2

ii

2

iiiiii

2

ii

2

1i

3

wxwxwxwxxwx w

)x(gxxw)x(gwhere

)x(gxwxxwxxwx w

)x(gxxxw)x(g

12

2121

1

23

321321

121

+++++=

+=

++++=

+=

!!

!!!

==

===

Page 15: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

14

Page 16: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

15

Page 17: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if
Page 18: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

17

• Augmentation

• Incorporate labels into data

• Margin

Page 19: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

18

Page 20: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

Learning Linear Classifiers

Page 21: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

20

Basic Ideas

Directly “fit” linear boundary in feature space.• 1) Define an error function for classification

– Number of errors?– Total distance of mislabeled points from boundary?– Least squares error– Within/between class scatter

• Optimize the error function on training data– Gradient descent– Modified gradient descent– Various search proedures.– How much data is needed?

Page 22: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

21

Two-category case• Given x1, x2,…, xn sample points, with true category labels:α1, α2,…,αn

• Decision are made according to:

• Now these decisions are wrong when atxi is negative andbelongs to class ω1.Let yi = αi xi Then yi >0 when correctly labelled,negative otherwise.

!

if atxi> 0 class "

1 is chosen

if atxi< 0 class "

2 is chosen

!

"i=1

"i= #1

$ % &

if point xi is from class '1

if point xi is from class '2

Page 23: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

22

• Separating vector (Solution vector)– Find a such that atyi>0 for all i.

• atyi=0 defines a hyperplane with a as normal

x2

x1

Page 24: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

23

• Problem: solution not unique• Add Additional constraints: Margin, the distance away from boundary atyi>b

Page 25: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

24

Margins in data space

b

Larger margins promote uniqueness forunderconstrained problems

Page 26: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

25

Finding a solution vector byminimizing an error criterion

What error function?How do we weight the points? All the points or

only error points?Only error points:

!

Perceptron Criterion

JP (at ) = "(atyi

yi #Y

$ ) Y = {yi | atyi < 0}

Perceptron Criterion with Margin

JP (at ) = "(atyi

yi #Y

$ ) Y = {yi | atyi < b}

Page 27: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

26

Page 28: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

27

Minimizing Error via GradientDescent

Page 29: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

28

• The min(max) problem:

• But we learned in calculus how to solve thatkind of question!

)(min xfx

Page 30: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

29

Motivation

• Not exactly,• Functions:• High order polynomials:

• What about function that don’t have ananalytic presentation: “Black Box”

! + ! x

1

6x3 1

120x5 1

5040x7

RRf n!:

Page 31: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

30

:= f ! ( ),x y"

#$$

%

&''cos

1

2x

"

#$$

%

&''cos

1

2y x

Page 32: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

31

Directional Derivatives:first, the one dimension derivative:

!

Page 33: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

32

x

yxf

!

! ),(

y

yxf

!

! ),(

Directional Derivatives :Along the Axes…

Page 34: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

33

v

yxf

!

! ),(

2Rv!

1=v

Directional Derivatives :In general direction…

Page 35: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

34

DirectionalDerivatives

x

yxf

!

! ),(

y

yxf

!

! ),(

Page 36: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

35

In the plane

2R

RRf !2

:

!!"

#$$%

&

'

'

'

'=(

y

f

x

fyxf :),(

The Gradient: Definition in

Page 37: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

36

!!"

#$$%

&

'

'

'

'=(

n

nx

f

x

fxxf ,...,:),...,(

1

1

RRf n!:

The Gradient: Definition

Page 38: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

37

The Gradient Properties• The gradient defines (hyper) plane

approximating the function infinitesimally

yy

fx

x

fz !"

#

#+!"

#

#=!

Page 39: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

38

The Gradient properties• Computing directional derivatives• By the chain rule: (important for later use)

vfpv

fp ,)()( !=

"

"1=v

Want rate ofchange along ν, ata point p

Page 40: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

39

The Gradient properties

is maximal choosing

is minimal choosing

(intuition: the gradient points to the greatest change in direction)

v

f

!

!p

p

ff

v )()(

1!"

!=

p

p

ff

v )()(

1!"

!

#=

Page 41: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

40

The Gradient Properties

• Let be a smooth functionaround P,

if f has local minimum (maximum) at pthen,

(Necessary for local min(max))

!

f :Rn" # " R

!

("f )p =r 0

Page 42: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

41

The Gradient Properties

Intuition

Page 43: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

42

The Gradient Properties

Formally: for anyWe get: }0{\nRv!

!

0 =df (p + t " v)

dt= (#f )p,v

$ (#f )p =r 0

Page 44: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

43

The Gradient Properties• We found the best INFINITESIMAL

DIRECTION at each point,• Looking for minimum using “blind man”

procedure• How can get to the minimum using this

knowledge?

Page 45: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

44

Steepest Descent

• AlgorithmInitialization:loop while

compute search directionupdate

end loop

nRx !

0

!

hi ="f (xi)

!

xi+1 = x

i" # $ h

i

!

"f (xi) > #

Page 46: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

45

Setting the Learning RateIntelligently

Minimizing this approximation:

Page 47: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

46

Minimizing Perceptron Criterion

!

Perceptron Criterion

JP (at ) = "atyi

yi #Y

$ Y = {yi | atyi < 0}

%a JP (at )( ) =%a "atyi

yi #Y

$

= "yiyi #Y

$

Which gives the update rule :

a( j +1) = a(k) +&k yiyi #Yk

$

Where Yk is the misclassified set at step k

Page 48: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

47

Batch Perceptron Update

Page 49: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

48

Page 50: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

49

Simplest Case

Page 51: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

50

When does perceptron ruleconverge?

Page 52: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

51

+

+

Page 53: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

52

Different Error Functions Four learning criteriaas a function ofweights in a linearclassifier.

At the upper left is the totalnumber of patternsmisclassified, which ispiecewise constant and henceunacceptable for gradientdescent procedures.

At the upper right is thePerceptron criterion (Eq. 16),which is piecewise linear andacceptable for gradientdescent.

The lower left is squarederror (Eq. 32), which has niceanalytic properties and isuseful even when thepatterns are not linearlyseparable.

The lower right is the squareerror with margin. A designermay adjust the margin b inorder to force the solutionvector to lie toward themiddle of the b = 0 solutionregion in hopes of improvinggeneralization of the resultingclassifier.

Page 54: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

53

Three Different issues

1) Class boundaryexpressiveness

2) Error definition

3) Learning method

Page 55: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

54

Fisher’s linear disc.

Page 56: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

55

Fisher’s Linear DiscriminantIntuition

Page 57: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

56

Write this in terms of w

We are looking for a good projection

So,

Next,

Page 58: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

57

Define the scatter matrix:

Then,

Thus the denominator can be written:

Page 59: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

58

Rayleigh QuotientMaximizing equivalent tosolving:

With solution:

Page 60: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

59

Page 61: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

60

Page 62: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

61

Page 63: Linear Discriminant Functionsvision.psych.umn.edu/users/schrater/schrater_lab/...6 –The multi-category case •We define c linear discriminant functions and assign x to ω i if

62