
Page 1:

Kernel-Based Methods

Presented by
Jason Friedman and Lena Gorelick

Advanced Topics in Computer and Human Vision
Spring 2003

Page 2:

Agenda…

• Structural Risk Minimization (SRM)
• Support Vector Machines (SVM)
• Feature Space vs. Input Space
• Kernel PCA
• Kernel Fisher Discriminant Analysis (KFDA)

Page 3:

Agenda…

• Structural Risk Minimization (SRM)
• Support Vector Machines (SVM)
• Feature Space vs. Input Space
• Kernel PCA
• Kernel Fisher Discriminant Analysis (KFDA)

Page 4:

Structural Risk Minimization (SRM)

• Definition:
– A training set consists of l observations: (x1, y1), …, (xl, yl)
– Each observation consists of a pair: a pattern xi ∈ Rn (e.g., a 16×16 = 256-pixel image) and a label yi ∈ {−1, 1}

Page 5:

Structural Risk Minimization (SRM)

• The task: “generalization” – find a mapping x ↦ y
• Assumption: training and test data are drawn from the same probability distribution p(x, y), i.e.
(x, y) is “similar” to (x1, y1), …, (xl, yl)

Page 6:

Structural Risk Minimization (SRM) – Learning Machine

• Definition:
– A learning machine is a family of functions {f(α)}, where α is a set of parameters.
– For the task of learning two classes, f(x, α) ∈ {−1, 1} ∀ x, α

Example: the class of oriented lines in R2, f(x, α) = sign(α1x1 + α2x2 + α3)
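To make the parameterized family concrete, here is a minimal Python sketch of the oriented-line machine (the function names are illustrative, not from the slides):

```python
import numpy as np

def oriented_line(alpha):
    # One member of the family {f(., alpha)}: an oriented line in R^2,
    # f(x, alpha) = sign(alpha1*x1 + alpha2*x2 + alpha3) in {-1, +1}.
    a1, a2, a3 = alpha
    return lambda x: int(np.sign(a1 * x[0] + a2 * x[1] + a3))

f = oriented_line((1.0, -1.0, 0.5))
print(f((2.0, 1.0)))  # 1: the point falls on the positive side of the line
```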

Page 7:

Structural Risk Minimization (SRM) – Capacity vs. Generalization

• Definition:
– The capacity of a learning machine measures its ability to learn any training set without error.

[Figure: too much capacity leads to overfitting (e.g., “Does it have the same # of leaves?”); too little capacity leads to underfitting (e.g., “Is the color green?”)]

Page 8:

Structural Risk Minimization (SRM) – Capacity vs. Generalization

• For small sample sizes, overfitting or underfitting may occur
• Best generalization = the right balance between accuracy and capacity

Page 9:

Structural Risk Minimization (SRM) – Capacity vs. Generalization

• Solution: restrict the complexity (capacity) of the function class.
• Intuition: a “simple” function that explains most of the data is preferable to a “complex” one.

Page 10:

Structural Risk Minimization (SRM) – VC Dimension

• What is a “simple”/“complex” function?
• Definition:
– Given l points (they can be labeled in 2^l ways)
– The set of points is shattered by the function class {f(α)} if for each labeling there is a function in the class which correctly assigns those labels.
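As a sanity check of the definition (a brute-force search, not a proof), the sketch below verifies that three points in general position in R2 are shattered by oriented lines; all names are illustrative:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # general position

def shattered_by_oriented_lines(points, trials=5000):
    # For each of the 2^l labelings, search for an oriented line
    # sign(a1*x1 + a2*x2 + a3) that realizes it.
    for labels in itertools.product([-1, 1], repeat=len(points)):
        if not any(np.array_equal(np.sign(points @ a[:2] + a[2]), labels)
                   for a in rng.normal(size=(trials, 3))):
            return False
    return True

print(shattered_by_oriented_lines(points))  # True, consistent with VC dim 3
```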

Page 11:

Structural Risk Minimization (SRM) – VC Dimension

• Definition:
– The VC dimension of {f(α)} is the maximum number of points that can be shattered by {f(α)}; it is a measure of capacity.

Page 12:

Structural Risk Minimization (SRM) – VC Dimension

• Theorem: the VC dimension of the set of oriented hyperplanes in Rn is n + 1.
• A low number of parameters ⇒ a low VC dimension

Page 13:

Structural Risk Minimization (SRM) – Bounds

• Definition: the actual risk is
R(α) = ∫ ½ |y − f(x, α)| dP(x, y)
• We want to minimize R(α)
• But we can’t measure the actual risk, since we don’t know p(x, y)

Page 14:

Structural Risk Minimization (SRM) – Bounds

• Definition: the empirical risk is
Remp(α) = (1 / 2l) Σi |yi − f(xi, α)|
• Remp(α) → R(α) as l → ∞, but for a small training set deviations may occur

Page 15:

Structural Risk Minimization (SRM) – Bounds

• Risk bound: with probability (1 − η),
R(α) ≤ Remp(α) + √( ( h (log(2l/h) + 1) − log(η/4) ) / l )
where h is the VC dimension of the function class; the square-root term is the confidence term.
• Note: the bound is independent of p(x, y)
• Not valid for infinite VC dimension
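The confidence term is simple to evaluate; a small sketch (η, h, l as on the slide; assuming h ≤ l so the logarithm stays positive):

```python
import math

def vc_confidence(h, l, eta=0.05):
    # sqrt( (h*(log(2l/h) + 1) - log(eta/4)) / l )
    return math.sqrt((h * (math.log(2 * l / h) + 1) - math.log(eta / 4)) / l)

for h in (10, 100, 500):
    print(h, round(vc_confidence(h, l=1000), 3))  # grows with capacity h
```

It illustrates the trade-off behind SRM: for a fixed sample size, the bound loosens as the VC dimension grows.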

Page 16:

Structural Risk Minimization (SRM) – Bounds

Page 17:

Structural Risk Minimization (SRM) – Bounds

Page 18:

Structural Risk Minimization (SRM) – Principled Method

• A principled method for choosing a learning machine for a given task:

Page 19:

SRM

• Divide the class of functions into nested subsets
• Either calculate h for each subset, or get a bound on it
• Train each subset to achieve minimal empirical error
• Choose the subset with the minimal risk bound

[Figure: risk bound vs. complexity]

Page 20:

Agenda…

• Structural Risk Minimization (SRM)
• Support Vector Machines (SVM)
• Feature Space vs. Input Space
• Kernel PCA
• Kernel Fisher Discriminant Analysis (KFDA)

Page 21:

Support Vector Machines (SVM)

• Currently the “en vogue” approach to classification
• Successful applications in bioinformatics, text, handwriting recognition, and image processing
• Introduced by Boser, Guyon and Vapnik, 1992
• SVMs are a particular instance of kernel machines

Page 22:

Linear SVM – Separable Case

• The two given classes are linearly separable

Page 23:

Linear SVM – Definitions

• Separating hyperplane H: w · x + b = 0
• w is normal to H
• |b| / ||w|| is the perpendicular distance from H to the origin
• d+ (d−) is the shortest distance from H to the closest positive (negative) point

Page 24:

Linear SVM – Definitions

Page 25:

Linear SVM – Definitions

• If H is a separating hyperplane, then w · xi + b > 0 for yi = +1 and w · xi + b < 0 for yi = −1
• H1 and H2 are the hyperplanes parallel to H through the closest positive and negative points; no training points fall between H1 and H2

Page 26:

Linear SVM – Definitions

• By rescaling w and b, we can require that
w · xi + b ≥ +1 for yi = +1
w · xi + b ≤ −1 for yi = −1
or more simply: yi (w · xi + b) − 1 ≥ 0 ∀ i
• Equality holds ⇔ xi lies on H1 or H2

Page 27:

Linear SVM – Definitions

• Note: w is no longer a unit vector
• The margin is now 2 / ||w||
• Find the hyperplane with the largest margin.
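For a hyperplane in the canonical form above, the margin is easy to compute; a minimal numpy sketch (the data and names are illustrative):

```python
import numpy as np

def geometric_margin(w, b, X, y):
    # Distance from the hyperplane w.x + b = 0 to the closest point;
    # under the canonical scaling min_i y_i (w.x_i + b) = 1 this equals
    # 1/||w||, so the full margin between H1 and H2 is 2/||w||.
    return np.min(y * (X @ w + b)) / np.linalg.norm(w)

X = np.array([[2.0, 2.0], [1.0, 3.0], [-1.0, -1.0], [-2.0, 0.0]])
y = np.array([1, 1, -1, -1])
print(geometric_margin(np.array([1.0, 1.0]), -1.0, X, y))  # ~2.121
```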

Page 28:

Linear SVM – Maximizing the Margin

• Maximizing the margin ⇔ minimizing ||w||²
• ⇒ more room for unseen points to fall into
• ⇒ restricts the capacity: the VC dimension of large-margin hyperplanes is bounded in terms of R²||w||², where R is the radius of the smallest ball around the data

Page 29:

Linear SVM – Constrained Optimization

• Introduce Lagrange multipliers αi ≥ 0
• “Primal” formulation:
LP = ½ ||w||² − Σi αi [yi (w · xi + b) − 1]
• Minimize LP with respect to w and b, requiring αi ≥ 0

Page 30:

Linear SVM – Constrained Optimization

• The objective function is quadratic
• The linear constraints define a convex set
• The intersection of convex sets is a convex set
• ⇒ can formulate the equivalent “Wolfe dual” problem

Page 31:

31

Linear SVM – Constrained Optimization

• Maximize LP with respect to the αi; require that the derivatives of LP with respect to w and b vanish:
∂LP/∂w = 0 ⇒ w = Σi αi yi xi (the solution for w)
∂LP/∂b = 0 ⇒ Σi αi yi = 0
• Substituting into LP gives the dual:
LD = Σi αi − ½ Σi,j αi αj yi yj (xi · xj)
• Maximize LD with respect to the αi
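The dual objective is straightforward to write down; a numpy sketch (illustrative; the constraints αi ≥ 0 and Σi αi yi = 0 are not enforced here):

```python
import numpy as np

def dual_objective(alpha, X, y):
    # L_D = sum_i alpha_i - 1/2 sum_ij alpha_i alpha_j y_i y_j (x_i . x_j)
    Yx = y[:, None] * X                      # row i is y_i * x_i
    return alpha.sum() - 0.5 * alpha @ (Yx @ Yx.T) @ alpha
```

Maximizing this over the feasible set (with any QP solver) yields the αi, and then w = Σi αi yi xi.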

Page 32:

Linear SVM – Constrained Optimization

• Using the Karush–Kuhn–Tucker (KKT) conditions:
αi [yi (w · xi + b) − 1] = 0 ∀ i
• If αi > 0 then xi lies either on H1 or H2
⇒ the solution is sparse in the αi
• Those training points are called “support vectors”; their removal would change the solution

Page 33:

SVM – Test Phase

• Given an unseen sample x, we take the class of x to be
f(x) = sign(w · x + b) = sign(Σi αi yi (xi · x) + b)
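In this form the test phase uses only the support vectors; a minimal sketch (a linear kernel is assumed as the default):

```python
import numpy as np

def svm_predict(x, sv_X, sv_y, sv_alpha, b, kernel=lambda u, v: u @ v):
    # class(x) = sign( sum_i alpha_i y_i k(x_i, x) + b )
    s = sum(a * yi * kernel(xi, x) for a, yi, xi in zip(sv_alpha, sv_y, sv_X))
    return int(np.sign(s + b))
```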

Page 34:

Linear SVM – Non-Separable Case

• The separable case corresponds to an empirical risk of zero
• For noisy data this might not be the minimum of the actual risk (overfitting)
• There is no feasible solution in the non-separable case

Page 35:

Linear SVM – Non-Separable Case

• Relax the constraints by introducing positive slack variables ξi:
yi (w · xi + b) ≥ 1 − ξi,  ξi ≥ 0
• Σi ξi is an upper bound on the number of training errors

Page 36:

Linear SVM – Non-Separable Case

• Assign an extra cost to errors
• Minimize ½ ||w||² + C Σi ξi, where C is a penalty parameter chosen by the user
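The resulting objective is the familiar hinge-loss form; a numpy sketch of evaluating it (names illustrative):

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    # 1/2 ||w||^2 + C * sum_i xi_i, with slack xi_i = max(0, 1 - y_i(w.x_i + b))
    xi = np.maximum(0.0, 1.0 - y * (X @ w + b))
    return 0.5 * (w @ w) + C * xi.sum()
```

A large C punishes training errors heavily; a small C tolerates them, acting as stronger regularization.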

Page 37:

Linear SVM – Non-Separable Case

• Lagrange formulation again, with a multiplier αi for each constraint
• “Wolfe dual” problem – maximize:
LD = Σi αi − ½ Σi,j αi αj yi yj (xi · xj)
subject to: 0 ≤ αi ≤ C and Σi αi yi = 0
• The solution: w = Σi αi yi xi

Page 38:

Linear SVM – Non-Separable Case

• Using the Karush–Kuhn–Tucker conditions, the solution is again sparse in the αi

Page 39:

Nonlinear SVM

• A nonlinear decision function might be needed

Page 40:

Nonlinear SVM – Feature Space

• Map the data to a high-dimensional (possibly infinite-dimensional) feature space F via Φ: Rn → F
• The solution depends only on the dot products Φ(xi) · Φ(xj)
• If there were a function k(xi, xj) such that k(xi, xj) = Φ(xi) · Φ(xj)
⇒ no need to know Φ explicitly
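A worked check of this trick for the homogeneous quadratic kernel on R2, whose explicit feature map is Φ(x) = (x1², √2·x1x2, x2²):

```python
import numpy as np

def phi(x):
    # Explicit feature map for k(x, y) = (x . y)^2 on R^2
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x, y = np.array([1.0, 2.0]), np.array([3.0, 0.5])
print((x @ y) ** 2, phi(x) @ phi(y))  # both 16.0: k is the dot product in F
```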

Page 41:

Nonlinear SVM – Toy Example

[Figure: input space vs. feature space]

Page 42:

Nonlinear SVM – Avoiding the Curse

• Curse of dimensionality: the difficulty of estimating a problem increases drastically with its dimension
• But! Learning in F may be simple if one uses a low-complexity function class (hyperplanes)

Page 43:

Nonlinear SVM – Kernel Functions

• Kernel functions exist!
– They effectively compute dot products in feature space
• We can use a kernel without knowing Φ and F explicitly
• Given a kernel, Φ and F are not unique
• The F with the smallest dimension is called the minimal embedding space

Page 44:

Nonlinear SVM – Kernel Functions

• Mercer’s condition: there exists a pair {Φ, F} such that k(x, y) = Φ(x) · Φ(y) iff, for any g(x) such that ∫ g(x)² dx is finite,
∫∫ k(x, y) g(x) g(y) dx dy ≥ 0

Page 45:

Nonlinear SVM – Kernel Functions

• Formulation of the algorithm in terms of kernels: replace every dot product xi · xj with k(xi, xj), e.g.
LD = Σi αi − ½ Σi,j αi αj yi yj k(xi, xj)

Page 46:

Nonlinear SVM – Kernel Functions

• Kernels frequently used:
– Polynomial: k(x, y) = (x · y + 1)^p
– Gaussian RBF: k(x, y) = exp(−||x − y||² / 2σ²)
– Sigmoid: k(x, y) = tanh(κ (x · y) − δ)
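Sketches of the three kernels above in Python (the parameter defaults are illustrative):

```python
import numpy as np

def poly_kernel(x, y, p=3):
    return (x @ y + 1.0) ** p

def rbf_kernel(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

def sigmoid_kernel(x, y, kappa=1.0, delta=1.0):
    # Note: satisfies Mercer's condition only for some (kappa, delta)
    return np.tanh(kappa * (x @ y) - delta)
```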

Page 47:

Nonlinear SVM – Feature Space

• A hyperplane {w, b} requires dim(F) + 1 parameters
• Solving the SVM means adjusting only l + 1 parameters
• Example: d = 256, p = 4 ⇒ dim(F) = 183,181,376
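The dimension is the number of degree-p monomials in d variables, which can be verified directly:

```python
from math import comb

d, p = 256, 4
print(comb(d + p - 1, p))  # 183181376 monomials of degree p in d variables
```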

Page 48:

SVM – Solution

• LD is convex ⇒ the solution is global
• Two types of non-uniqueness:
– {w, b} is not unique
– {w, b} is unique, but the set {αi} is not; prefer the set with fewer support vectors (sparser)

Page 49:

Nonlinear SVM – Toy Example

Page 50:

Nonlinear SVM – Toy Example

Page 51:

Nonlinear SVM – Toy Example

Page 52:

Methods of Solution

• Quadratic programming packages
• Chunking
• Decomposition methods
• Sequential minimal optimization (SMO)

Page 53:

Applications – SVM

Extracting Support Data for a Given Task
Bernhard Schölkopf, Chris Burges, Vladimir Vapnik
Proceedings, First International Conference on Knowledge Discovery & Data Mining, 1995

Page 54:

Applications – SVM

• Input (USPS handwritten digits, 16×16 pixels):
– Training set: 7300
– Test set: 2000
• Constructed:
– 10 class/non-class SVM classifiers
– Take the class with the maximal output

Page 55:

Applications – SVM

• Three different types of classifiers for each digit:

Page 56:

Applications – SVM

Page 57:

Applications – SVM

Predicting the Optimal Decision Functions – SRM

Page 58:

Applications – SVM

Page 59:

Agenda…

• Structural Risk Minimization (SRM)
• Support Vector Machines (SVM)
• Feature Space vs. Input Space
• Kernel PCA
• Kernel Fisher Discriminant Analysis (KFDA)

Page 60:

Feature Space vs. Input Space

• Suppose the solution is a linear combination in feature space: w = Σi αi Φ(xi)
• One cannot generally say that each such point has a pre-image z with Φ(z) = w

Page 61:

Feature Space vs. Input Space

• If there exists z such that Φ(z) = Σi αi Φ(xi), and k is an invertible function fk of (x · y), i.e. k(x, y) = fk(x · y), then z can be computed as
z = Σj fk⁻¹( Σi αi k(xi, ej) ) ej
where {e1, …, eN} is an orthonormal basis of the input space.

Page 62:

Feature Space vs. Input Space

• The polynomial kernel k(x, y) = (x · y)^p is invertible when p is odd
• Then the pre-image of w is given by the expression above

Page 63:

Feature Space vs. Input Space

• In general, one cannot find an exact pre-image
• Instead, look for an approximation z such that
ρ(z) = ||Φ(z) − Σi αi Φ(xi)||²
is small

Page 64:

Agenda…

• Structural Risk Minimization (SRM)
• Support Vector Machines (SVM)
• Feature Space vs. Input Space
• Kernel PCA
• Kernel Fisher Discriminant Analysis (KFDA)

Page 65:

PCA

• Regular PCA: find the direction u such that projecting the n points in d dimensions onto u gives the largest variance.
• u is an eigenvector of the covariance matrix C: Cu = λu

Page 66:

Kernel PCA

• Extension to feature space:
– compute the covariance matrix C of the mapped data Φ(xi)
– solve the eigenvalue problem CV = λV
⇒ every eigenvector V with λ ≠ 0 lies in the span of Φ(x1), …, Φ(xl)

Page 67:

Kernel PCA

– Define V in terms of dot products: V = Σi αi Φ(xi)
– Then the problem becomes:
lλ α = K α, where Kij = Φ(xi) · Φ(xj) = k(xi, xj)
– An l × l problem rather than d × d
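A compact numpy sketch of the whole procedure, given a precomputed Gram matrix K (centering and normalization as in the kernel PCA literature; a sketch, not a reference implementation):

```python
import numpy as np

def kernel_pca(K, n_components):
    l = K.shape[0]
    J = np.full((l, l), 1.0 / l)
    Kc = K - J @ K - K @ J + J @ K @ J        # center the Phi(x_i) in F
    lam, A = np.linalg.eigh(Kc)               # eigenvalues in ascending order
    lam = lam[::-1][:n_components]
    A = A[:, ::-1][:, :n_components]
    # Normalize each alpha_k so V_k = sum_i alpha_ik Phi(x_i) has unit norm:
    # V_k . V_k = lam_k * ||alpha_k||^2 = 1   (using Kc alpha = lam alpha)
    return A / np.sqrt(np.maximum(lam, 1e-12))
```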

Page 68:

Kernel PCA – Extracting Features

• To extract features of a new pattern x with kernel PCA, project the pattern onto the first n eigenvectors:
Vk · Φ(x) = Σi αik k(xi, x)
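Continuing the sketch above, extracting features for a new pattern only needs its kernel values against the training set (test-point centering is omitted for brevity):

```python
def kpca_features(k_new, alphas):
    # k_new[i] = k(x_i, x) for the new pattern x; alphas from kernel_pca
    return k_new @ alphas  # the first n kernel principal components of x
```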

Page 69:

Kernel PCA

• Kernel PCA is used for:
– De-noising
– Compression
– Interpretation (visualization)
– Extracting features for classifiers

Page 70:

Kernel PCA – Toy Example

• Regular PCA:

Page 71:

Kernel PCA – Toy Example

Page 72:

Applications – Kernel PCA

Kernel PCA Pattern Reconstruction via Approximate Pre-Images
B. Schölkopf, S. Mika, A. Smola, G. Rätsch, and K.-R. Müller
In L. Niklasson, M. Bodén, and T. Ziemke, editors, Proceedings of the 8th International Conference on Artificial Neural Networks, Perspectives in Neural Computing, pages 147–152, Berlin, 1998. Springer Verlag.

Page 73:

Applications – Kernel PCA

• Recall w = Σi αi Φ(xi)
• Φ(x) can be reconstructed from its principal components
• Projection operator: Pn Φ(x) = Σk=1..n (Vk · Φ(x)) Vk

Page 74:

Applications – Kernel PCA

Denoising
• When Pn Φ(x) ≠ Φ(x), we can’t guarantee the existence of a pre-image
• The first n directions → the main structure
• The remaining directions → noise

Page 75:

Applications – Kernel PCA

• Find z that minimizes ||Φ(z) − Pn Φ(x)||²
• Use gradient descent starting with z = x
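For the RBF kernel, this minimization is usually run as the fixed-point iteration derived by Mika et al. (1999), which the gradient view reduces to; a hedged sketch (names illustrative):

```python
import numpy as np

def rbf_preimage(gamma, X, sigma, z0, iters=100):
    # Approximate pre-image z of P_n Phi(x) = sum_i gamma_i Phi(x_i) for an
    # RBF kernel: z <- sum_i w_i x_i / sum_i w_i with Gaussian weights w_i.
    z = z0.copy()
    for _ in range(iters):
        w = gamma * np.exp(-np.sum((X - z) ** 2, axis=1) / (2.0 * sigma ** 2))
        if abs(w.sum()) < 1e-12:   # degenerate starting point; give up
            break
        z = (w @ X) / w.sum()
    return z
```

Starting from z0 = x, as the slide suggests, this typically converges in a few iterations.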

Page 76:

Applications – Kernel PCA

• Input toy data: 3 point sources (100 points each) with Gaussian noise σ = 0.1
• Using an RBF kernel

Page 77:

Applications – Kernel PCA

Page 78:

Applications – Kernel PCA

Page 79:

Applications – Kernel PCA

Page 80:

Applications – Kernel PCA

Compare to linear PCA

Page 81:

Applications – Kernel PCA

Real-world data

Page 82:

Agenda…

• Structural Risk Minimization (SRM)
• Support Vector Machines (SVM)
• Feature Space vs. Input Space
• Kernel PCA
• Kernel Fisher Discriminant Analysis (KFDA)

Page 83:

Fisher Linear Discriminant

• Finds a direction w such that, projected onto it, the classes are “best” separated

Page 84:

Fisher Linear Discriminant

• For “best” separation, maximize
J(w) = (μ1 − μ2)² / (s1² + s2²)
where μi is the projected mean of class i and si is the projected standard deviation.

Page 85:

Fisher Linear Discriminant

• Equivalent to finding the w which maximizes
J(w) = (wᵀ SB w) / (wᵀ SW w)
where SB = (m1 − m2)(m1 − m2)ᵀ is the between-class scatter matrix and SW = Σj=1,2 Σx∈Cj (x − mj)(x − mj)ᵀ is the within-class scatter matrix.

Page 86:

Fisher Linear Discriminant

• The solution is given by w ∝ SW⁻¹ (m1 − m2)
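In numpy the closed-form solution is a few lines (names illustrative):

```python
import numpy as np

def fisher_direction(X1, X2):
    # w ~ S_W^{-1} (m1 - m2), the direction maximizing J(w)
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    Sw = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    return np.linalg.solve(Sw, m1 - m2)
```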

Page 87:

Kernel Fisher Discriminant

• Kernel formulation: maximize the same criterion J(w) in feature space F, where the scatter matrices are computed from the mapped data Φ(xi).

Page 88:

Kernel Fisher Discriminant

• From the theory of reproducing kernels, the solution can be expanded as w = Σi αi Φ(xi)
• Substituting this into J(w) reduces the problem to maximizing
J(α) = (αᵀ M α) / (αᵀ N α)

Page 89:

Kernel Fisher Discriminant

where M = (M1 − M2)(M1 − M2)ᵀ with (Mj)i = (1/lj) Σk k(xi, xk(j)), and N = Σj=1,2 Kj (I − 1lj) Kjᵀ, where Kj holds the kernel values against class j and 1lj is the lj × lj matrix with all entries 1/lj (Mika et al., 1999).

Page 90:

Kernel Fisher Discriminant

• The solution is the leading eigenvector of the generalized eigenproblem M α = λ N α
• The projection of a new pattern x is then
w · Φ(x) = Σi αi k(xi, x)
• Find a suitable threshold on these projections
• Again an l × l problem rather than d × d
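Because M = (M1 − M2)(M1 − M2)ᵀ has rank one, the leading generalized eigenvector is simply α ∝ N⁻¹(M1 − M2); a sketch with the regularization N + μI used by Mika et al. (the matrix layout is an assumption of this sketch):

```python
import numpy as np

def kfd_alpha(K1, K2, reg=1e-3):
    # K1, K2: the columns of the full l x l Gram matrix belonging to class 1
    # and class 2 (shapes l x l1 and l x l2).
    l, l1 = K1.shape
    l2 = K2.shape[1]
    d = K1.mean(axis=1) - K2.mean(axis=1)           # M1 - M2
    C1 = np.eye(l1) - np.full((l1, l1), 1.0 / l1)   # within-class centering
    C2 = np.eye(l2) - np.full((l2, l2), 1.0 / l2)
    N = K1 @ C1 @ K1.T + K2 @ C2 @ K2.T + reg * np.eye(l)
    return np.linalg.solve(N, d)                     # alpha ~ N^{-1}(M1 - M2)
```

The projection of a new pattern x is then Σi αi k(xi, x), thresholded as on the slide.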

Page 91:

Kernel Fisher Discriminant – Constrained Optimization

• The problem can also be formulated as constrained optimization

Page 92:

Kernel Fisher Discriminant – Toy Example

[Figure: projections found by KFDA vs. the 1st and 2nd KPCA eigenvectors]

Page 93:

Applications – Fisher Discriminant Analysis

Fisher Discriminant Analysis with Kernels
S. Mika et al.
In Y.-H. Hu, J. Larsen, E. Wilson, and S. Douglas, editors, Neural Networks for Signal Processing IX, pages 41–48. IEEE, 1999.

Page 94:

Applications – Fisher Discriminant Analysis

• Input (USPS handwritten digits):
– Training set: 3000
• Constructed:
– 10 class/non-class KFD classifiers
– Take the class with the maximal output

Page 95:

Applications – Fisher Discriminant Analysis

• Results: 3.7% error with the ten-class classifier, using an RBF kernel with width parameter 0.3 · 256
• Compare to 4.2% using SVM
• KFDA vs. SVM

Page 96:

Summary…

• Structural Risk Minimization (SRM)
• Support Vector Machines (SVM)
• Feature Space vs. Input Space
• Kernel PCA
• Kernel Fisher Discriminant Analysis (KFDA)