
Page 1: SVM – Support Vector Machines

SVM – Support Vector Machines

Presented By: Bella Specktor

Page 2: SVM – Support Vector Machines

Lecture Topics:

► Motivation for SVM

► SVM – Algorithm Description

► SVM Applications

Page 3: SVM – Support Vector Machines

Motivation - Learning

• Our task is to detect and exploit complex patterns in data.

• For this, we should use learning algorithms.

• We would like an algorithm that is able to generalize, but not over-generalize.

• Neural Networks can be used for this task

Page 4: SVM – Support Vector Machines

Linear Classification

• Suppose we have linearly separable data that we want to classify into 2 classes. We label the training data $\{x_i, y_i\}$, $i = 1 \ldots l$, $y_i \in \{-1, 1\}$, $x_i \in R^d$.

• Linear separation of the input space is done by the function $f(x) = \langle w, x \rangle + b = w_1 x_1 + w_2 x_2 + \ldots + w_d x_d + b$, where $w = (w_1, \ldots, w_d)$.

• Our purpose is to find $w$ and $b$.

• Any of these hyperplanes would be fine for the separation. Which one should we choose?
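As a concrete illustration, here is a minimal sketch of this decision function in Python; the weight vector, bias, and test point are made-up values, not taken from the lecture.

```python
import numpy as np

# Minimal sketch of the linear decision function f(x) = w.x + b
# (w, b, and the test point x are illustrative values).
w = np.array([2.0, -1.0, 0.5])    # weight vector, one entry per feature (d = 3)
b = -0.3                          # bias term

def f(x):
    return np.dot(w, x) + b

x = np.array([1.0, 0.5, 2.0])
label = 1 if f(x) >= 0 else -1    # classify by the sign of f(x)
print(f(x), label)
```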

Page 5: SVM – Support Vector Machines

Perceptron Algorithm

If $y_i (w_k \cdot x_i + b) \le 0$ (sample $i$ is misclassified), update:

$w_{new} = w_{old} + y_i x_i$
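A short sketch of this update rule as a training loop; the toy data and the fixed number of epochs are my own illustrative choices.

```python
import numpy as np

def perceptron(X, y, epochs=100):
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (np.dot(w, xi) + b) <= 0:   # sample misclassified (or on the boundary)
                w = w + yi * xi                  # w_new = w_old + y_i x_i
                b = b + yi
    return w, b

X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])
print(perceptron(X, y))
```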

• What about the non-linear case?

Page 6: SVM – Support Vector Machines

Neural Networks


We can use advanced network architectures with multiple layers.

But…

• Some of them have many local minima.

• We need to figure out how many neurons are needed.

• Sometimes we get many different solutions.

Page 7: SVM – Support Vector Machines

SVM - History


SVMs are said to have started in 1979 with Vladimir Vapnik’s paper.

Major developments came throughout the 1990s: SVMs were introduced in 1992 by Boser, Guyon & Vapnik.

Centralized web site:

www.kernel-machines.org

SVMs have been applied very successfully to diverse problems in the last 10-15 years.

Page 8: SVM – Support Vector Machines


The SVM Algorithm

• The margin of a linear classifier is the width by which the boundary could be increased before hitting a data point.

• SVM chooses the maximal margin, where the distance to the closest negative example equals the distance to the closest positive example.

Page 9: SVM – Support Vector Machines

Why Maximum Margin?

Better empirical performance.

Even if we have a small error in the location of the boundary, we have the least chance of misclassification.

Avoiding local minima.


Page 10: SVM – Support Vector Machines

VC (Vapnik-Chervonenkis) dimensions and Structural Risk Minimization


• The VC dimension of a model class $F$ is the maximal cardinality of a set of data points that can be shattered by $F$. For example:

• A set of points $P$ is said to be shattered by $F$ if, for any assignment of the points to the two classes, there exists $f \in F$ such that $f$ separates $P$ perfectly.

Page 11: SVM – Support Vector Machines


• The bound on the test error of the classification model is given by:

$err_{true} \le err_{train} + \sqrt{\dfrac{VC\,\big(\log(2n/VC) + 1\big) - \log(\eta/4)}{n}}$

(Vapnik, 1995, the “Structural Risk Minimization” principle)

• Intuitively, functions with high VC dimension represent many dichotomies for a given data set.

Page 12: SVM – Support Vector Machines


VC dimensions and Structural Risk Minimization

A function that minimizes the empirical risk and has low VC dimension will generalize well regardless of the dimensionality of the input space (structural risk minimization).

Vapnik has shown that maximizing the margin of separation between classes is equivalent to minimizing the VC dimension.

Page 13: SVM – Support Vector Machines


• Support Vectors are the points closest to the separating hyperplane. Those are critical points whose removal would change the solution found.

• Optimal hyperplane is completely defined by the support vectors.

Page 14: SVM – Support Vector Machines


• Let $x_i$ be an example closest to the boundary. Set $|w \cdot x_i + b| = 1$.

Thus, support vectors lie in the hyperplanes $H_1: w \cdot x_i + b = 1$ and $H_2: w \cdot x_i + b = -1$.

• Notice that $x^+ = x^- + \lambda w$ and $|x^+ - x^-| = M$, where $x^+ \in H_1$ and $x^- \in H_2$ are support vectors on opposite sides of the margin.

Page 15: SVM – Support Vector Machines


$w \cdot (x^- + \lambda w) + b = 1$

$\Rightarrow\; w \cdot x^- + b + \lambda\, w \cdot w = 1$

$\Rightarrow\; -1 + \lambda\, w \cdot w = 1 \;\Rightarrow\; \lambda = \dfrac{2}{w \cdot w}$

$M = |x^+ - x^-| = |\lambda w| = \lambda\,|w| = \dfrac{2}{w \cdot w}\,|w| = \dfrac{2}{|w|}$

$M = \dfrac{2}{|w|}$
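A small numeric check of this result, with a made-up $w$ and a point placed on $H_2$:

```python
import numpy as np

# Support vectors sit on H1: w.x + b = 1 and H2: w.x + b = -1.
w = np.array([3.0, 4.0])              # ||w|| = 5
b = 0.0
x_minus = np.array([-0.12, -0.16])    # chosen so that w.x_minus + b = -1 (on H2)
lam = 2.0 / np.dot(w, w)              # lambda = 2 / (w . w)
x_plus = x_minus + lam * w            # lands on H1: w.x_plus + b = +1
M = np.linalg.norm(x_plus - x_minus)
print(M, 2.0 / np.linalg.norm(w))     # both are 0.4
```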

Page 16: SVM – Support Vector Machines

The margin is $\frac{2}{\|w\|}$. Thus, we will get the widest margin by minimizing $\|w\|$. But how do we do it?

For this purpose, we will switch to the dual representation and use Quadratic Programming.

Convert the problem to: minimize $J(w) = \frac{1}{2}\, w \cdot w$

Subject to the constraints: $y_i (w \cdot x_i + b) - 1 \ge 0 \quad \forall i$
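Before moving to the dual, here is a sketch that hands this primal problem directly to a general-purpose constrained solver (scipy's SLSQP); the toy data and all names are illustrative assumptions, not part of the lecture.

```python
import numpy as np
from scipy.optimize import minimize

# min (1/2) w.w  subject to  y_i (w . x_i + b) - 1 >= 0, solved over v = [w1, w2, b]
X = np.array([[2.0, 2.0], [3.0, 1.0], [-1.0, -1.0], [-2.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

def objective(v):
    w = v[:-1]
    return 0.5 * np.dot(w, w)

constraints = [{'type': 'ineq',             # SLSQP expects g(v) >= 0
                'fun': lambda v, xi=xi, yi=yi: yi * (np.dot(v[:-1], xi) + v[-1]) - 1.0}
               for xi, yi in zip(X, y)]

res = minimize(objective, x0=np.zeros(X.shape[1] + 1), method='SLSQP', constraints=constraints)
w, b = res.x[:-1], res.x[-1]
print(w, b, 2.0 / np.linalg.norm(w))        # weight vector, bias, margin width
```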

Page 17: SVM – Support Vector Machines

In a convex problem, the matrix R of the quadratic form is positive semidefinite. In this case, the QP has a global minimizer.


Page 18: SVM – Support Vector Machines


• Our problem is: minimize $J(w) = \frac{1}{2}\, w \cdot w$

Subject to the constraints: $y_i (w \cdot x_i + b) - 1 \ge 0 \quad \forall i$

• Introduce Lagrange multipliers $\alpha_i \ge 0$ associated with the constraints. The solution to the primal problem is equivalent to determining the saddle point of the function:

$L_P = L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{n} \alpha_i \big( y_i (x_i \cdot w + b) - 1 \big)$

At the saddle point:

$\frac{\partial L_P}{\partial w} = w - \sum_i \alpha_i y_i x_i = 0 \;\Rightarrow\; w = \sum_i \alpha_i y_i x_i$

$\frac{\partial L_P}{\partial b} = -\sum_i \alpha_i y_i = 0 \;\Rightarrow\; \sum_i \alpha_i y_i = 0$

Page 19: SVM – Support Vector Machines

• Substituting these back into $L_P$ gives the dual $L_D = \sum_i \alpha_i - \frac{1}{2}\sum_i \sum_j \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j$, which can be optimized by quadratic programming.

• $L_D$ is formulated in terms of the $\alpha_i$ alone, but the final classifier still depends on $w$ and $b$, which are recovered from the optimal $\alpha_i$.
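A minimal sketch of solving this dual with an off-the-shelf QP package (cvxopt, assuming it is installed); the hard-margin case is shown, and the toy data are made up.

```python
import numpy as np
from cvxopt import matrix, solvers

# Maximize L_D = sum(a) - 1/2 sum_ij a_i a_j y_i y_j (x_i . x_j)
# s.t. a_i >= 0 and sum_i a_i y_i = 0, rewritten as the minimization cvxopt expects:
#   min 1/2 a' P a + q' a   s.t.   G a <= h,  A a = b
X = np.array([[2.0, 2.0], [3.0, 1.0], [-1.0, -1.0], [-2.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n = len(y)

K = X @ X.T                            # matrix of dot products x_i . x_j
P = matrix(np.outer(y, y) * K)         # P_ij = y_i y_j (x_i . x_j)
q = matrix(-np.ones(n))                # the -sum(a) part
G = matrix(-np.eye(n))                 # -a_i <= 0, i.e. a_i >= 0
h = matrix(np.zeros(n))
A_eq = matrix(y.reshape(1, -1))        # sum_i a_i y_i = 0
b_eq = matrix(np.zeros(1))

solution = solvers.qp(P, q, G, h, A_eq, b_eq)
alphas = np.ravel(solution['x'])       # the optimal Lagrange multipliers
print(alphas)
```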

Page 20: SVM – Support Vector Machines

b can be determined from the optimal $\alpha$ and the condition:

$\alpha_i \big[ y_i (w \cdot x_i + b) - 1 \big] = 0 \quad \forall i$

$\alpha_i > 0$ implies: $y_i (w \cdot x_i + b) = 1 \;\Rightarrow\; w \cdot x_i + b = y_i \;\Rightarrow\; b = y_i - w \cdot x_i$

For every sample $i$, one of the following must hold:

$\alpha_i = 0$, or $\alpha_i > 0$ and $y_i (w \cdot x_i + b) - 1 = 0$

Samples with $\alpha_i > 0$ are Support Vectors.

Many $\alpha_i = 0$: the solution is sparse.

$w = \sum_i \alpha_i y_i x_i$
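Continuing the hypothetical toy sketch above (it reuses `alphas`, `X`, `y` from the cvxopt example), $w$ and $b$ can be recovered exactly as described:

```python
import numpy as np

# Continuation of the cvxopt sketch above (reuses alphas, X, y from it).
sv = alphas > 1e-6                               # a_i > 0 marks the support vectors
w = ((alphas * y)[:, None] * X).sum(axis=0)      # w = sum_i a_i y_i x_i
b = np.mean(y[sv] - X[sv] @ w)                   # b = y_i - w . x_i, averaged over support vectors
print(w, b, np.flatnonzero(sv))
```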

Page 21: SVM – Support Vector Machines

Test Phase: determine on which side of the decision boundary a given test pattern lies and assign the corresponding label:

$\mathrm{sgn}(w \cdot x + b)$

By duality, $w = \sum_i \alpha_i y_i x_i$, so the label is:

$\mathrm{sgn}\Big( \sum_{i=1}^{n} \alpha_i y_i \,(x_i \cdot x) + b \Big)$
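And, continuing the same toy sketch, the test phase needs only dot products with the training points:

```python
import numpy as np

# Continuation of the toy sketch (reuses alphas, y, X, b): classify a new point using
# only dot products with the training data; only support vectors actually contribute,
# since a_i = 0 for all other samples.
def predict(x_new):
    score = np.sum(alphas * y * (X @ x_new)) + b
    return np.sign(score)

print(predict(np.array([2.5, 1.5])), predict(np.array([-1.5, -0.5])))
```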

Page 22: SVM – Support Vector Machines

Soft Margin Classifier

• In real-world problems it is not likely that an exact separating line divides the data.

• We might have a curved decision boundary.

• Exact linear separation may not be desirable if the data has noise in it.


Smoothing the boundary: introduce slack variables $\xi_i$.

We want: $x_i \cdot w + b \ge 1 - \xi_i$ for $y_i = +1$

$x_i \cdot w + b \le -1 + \xi_i$ for $y_i = -1$

$\xi_i \ge 0 \quad \forall i$

Page 23: SVM – Support Vector Machines

$\sum_i \xi_i$ is an upper bound on the number of training errors. Thus, in order to control the error rate, we also minimize $C \sum_i \xi_i$, where a larger C corresponds to assigning a higher penalty to errors. The new QP:

$L_D = \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j$

Constraints: $\sum_i \alpha_i y_i = 0$ and $0 \le \alpha_i \le C$

Where: $w = \sum_i \alpha_i y_i x_i$

Define: $I = \arg\max_i \alpha_i$, then $b = y_I (1 - \xi_I) - x_I \cdot w$
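A quick illustration of the role of C with scikit-learn's soft-margin SVC (toy data; the specific C values are arbitrary):

```python
import numpy as np
from sklearn.svm import SVC

# Soft-margin linear SVM: C trades margin width against training errors.
X = np.array([[2.0, 2.0], [3.0, 1.0], [1.5, 1.8],
              [-1.0, -1.0], [-2.0, 0.0], [0.5, 0.4]])
y = np.array([1, 1, 1, -1, -1, -1])

for C in (0.1, 1.0, 100.0):
    clf = SVC(kernel='linear', C=C).fit(X, y)
    w = clf.coef_[0]
    # margin width 2/||w|| and number of support vectors per class
    print(C, 2.0 / np.linalg.norm(w), clf.n_support_)
```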

Page 24: SVM – Support Vector Machines

Non Linear SVM

Limitations of linear SVM: it does not work on data that is not linearly separable, and it has a noise problem.

But… the advantage is that it deals with vectorial data.


We saw earlier that we can use Neural Networks, but they have many limitations. What should we do?

Page 25: SVM – Support Vector Machines


Let’s look at the following example: we would like to map the samples so that they become linearly separable. If we lift them to a two-dimensional space with $\Phi(x) = (x, x^2)$, we get:
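A small sketch of this lifting on made-up 1-D data (the exact sample values are illustrative):

```python
import numpy as np
from sklearn.svm import SVC

# 1-D points that no single threshold separates (negatives on both sides of the positives)
# become linearly separable after the lift phi(x) = (x, x^2).
x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = np.array([-1, -1, 1, 1, 1, -1, -1])

phi = np.column_stack([x, x ** 2])        # lift to 2-D
clf = SVC(kernel='linear').fit(phi, y)
print(clf.score(phi, y))                  # 1.0: perfectly separable in the lifted space
```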

Page 26: SVM – Support Vector Machines


So, a possible solution is: map the data into a richer feature space (usually called a Hilbert space), including non-linear features $x \mapsto \Phi(x)$, and then use a linear classifier there.

But… there is a computational problem, and there is a generalization problem.

Page 27: SVM – Support Vector Machines

Solution: using kernels

Remember that we used the dual representation, and hence the data appears only in the form of dot products:

$f(x) = \sum_{i=1}^{n} \alpha_i y_i \,(x_i \cdot x) + b$; substituting $x \mapsto \Phi(x)$: $f(x) = \sum_{i} \alpha_i y_i \,\langle \Phi(x_i), \Phi(x) \rangle + b$

A Kernel is a function that returns the value of the dot product between the images of its two arguments: $K(x_1, x_2) = \langle \Phi(x_1), \Phi(x_2) \rangle$

Thus, we can replace the dot products $\langle \Phi(x_i), \Phi(x) \rangle$ with Kernels $K(x_i, x)$.
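A tiny check of the kernel idea for the degree-2 polynomial kernel $K(x, z) = (x \cdot z)^2$; the explicit feature map below is one standard choice, written out only to show that the kernel gives the same value without it:

```python
import numpy as np

# For K(x, z) = (x . z)^2 in 2-D, one explicit feature map is
# phi(x) = (x1^2, sqrt(2) x1 x2, x2^2); the kernel returns <phi(x), phi(z)>
# without ever forming phi.
def phi(x):
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def K(x, z):
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
print(np.dot(phi(x), phi(z)), K(x, z))    # same value either way
```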

Page 28: SVM – Support Vector Machines


Now, rather than computing the inner product of the new, larger vectors explicitly, the kernel gives us the dot product of the data after the non-linear mapping:

$K(x_1, x_2) = \langle \Phi(x_1), \Phi(x_2) \rangle$

With a Kernel we only need to use K in the training algorithm, and we never need to know explicitly what $\Phi$ is.

The Kernel Matrix:
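As an illustration (not the matrix from the slide), here is how such a kernel matrix could be built for a Gaussian RBF kernel on three made-up points:

```python
import numpy as np

# Building a kernel (Gram) matrix K_ij = K(x_i, x_j) with a Gaussian RBF kernel;
# gamma is an illustrative bandwidth choice.
def rbf(x, z, gamma=0.5):
    return np.exp(-gamma * np.sum((x - z) ** 2))

X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
K = np.array([[rbf(xi, xj) for xj in X] for xi in X])
print(K)    # symmetric, ones on the diagonal, positive semidefinite
```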

Page 29: SVM – Support Vector Machines


Page 30: SVM – Support Vector Machines

Mercer’s Condition

Which functions can serve as Kernels?

Every positive (semi)definite symmetric function is a Kernel, i.e. there exists a mapping $\Phi$ such that it is possible to write:

$K(x_1, x_2) = \langle \Phi(x_1), \Phi(x_2) \rangle$

Page 31: SVM – Support Vector Machines

Different Kernel Functions

1. Polynomial: $K(x, z) = (x \cdot z + 1)^p$, where p is the degree of the polynomial.

2. Gaussian Radial Basis Function: $K(x, z) = \exp\!\big(-\|x - z\|^2 / 2\sigma^2\big)$

3. Two-layer sigmoidal NN: $K(x, z) = \tanh(\kappa\, x \cdot z - \delta)$
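For reference, these three families map onto scikit-learn's built-in kernels roughly as sketched below; the parameter values are illustrative, not tuned:

```python
from sklearn.svm import SVC

# scikit-learn's parameterizations: poly = (gamma * x.z + coef0)^degree,
# rbf = exp(-gamma * ||x - z||^2), sigmoid = tanh(gamma * x.z + coef0).
svm_poly = SVC(kernel='poly', degree=3, gamma=1.0, coef0=1.0)
svm_rbf = SVC(kernel='rbf', gamma=0.5)
svm_sigmoid = SVC(kernel='sigmoid', gamma=0.01, coef0=-1.0)
```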


Page 32: SVM – Support Vector Machines

Multi-Class Classification

Two basic strategies:

1. One Versus All: Q SVMs are trained, and each of the SVMs separates a single class from all the others. Classification is done by a “winner takes all” strategy, in which the classifier with the highest output function assigns the class.


Page 33: SVM – Support Vector Machines


Multi-Class Classification

2. Pairwise: Q(Q-1)/2 machines are trained; each SVM separates a pair of classes. The classification is done by a “max-wins” voting strategy, in which the class with the most votes determines the instance’s classification.

The first strategy is preferable in terms of training complexity. Experiments did not show big performance differences between the two.
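A short sketch of both strategies with scikit-learn's wrappers on a standard 3-class dataset (my own example, not from the lecture):

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

# One-vs-all trains Q classifiers; pairwise (one-vs-one) trains Q(Q-1)/2.
X, y = load_iris(return_X_y=True)          # Q = 3 classes

ova = OneVsRestClassifier(SVC(kernel='linear')).fit(X, y)
ovo = OneVsOneClassifier(SVC(kernel='linear')).fit(X, y)
print(len(ova.estimators_), len(ovo.estimators_))   # 3 and 3 (equal only because Q = 3)
print(ova.score(X, y), ovo.score(X, y))
```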

Page 34: SVM – Support Vector Machines

Summary - SVM Algorithm for pattern classification

1. Start with data $x_1, \ldots, x_n$, which lives in a feature space of dimension d.

2. Implicitly define the feature space by choosing a Kernel.

3. Find the largest-margin linear discriminant function in the higher-dimensional space by using a quadratic programming package to solve:


maximize $L(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j\, K(x_i, x_j)$

subject to: $0 \le \alpha_i \le C$ and $\sum_i \alpha_i y_i = 0$
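An end-to-end sketch of these three steps with scikit-learn (the data, kernel, and parameter choices are illustrative):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# 1) toy data in d = 2, with a non-linear (circular) class boundary
rng = np.random.RandomState(0)
X = rng.randn(200, 2)
y = np.where(X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0, -1, 1)

# 2) choose a kernel, 3) let the QP solver inside SVC find the max-margin discriminant
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = SVC(kernel='rbf', C=1.0, gamma=1.0).fit(X_tr, y_tr)
print(clf.score(X_te, y_te), len(clf.support_))   # held-out accuracy, number of support vectors
```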

Page 35: SVM – Support Vector Machines

Strengths and Weaknesses of SVM

Strengths:

1. Training is relatively easy.

2. No local minima (unlike in NN).

3. Scales relatively well to high-dimensional data.

4. The trade-off between complexity and error can be controlled explicitly.


Major Weakness:

Need for a good Kernel function.

Page 36: SVM – Support Vector Machines

What Is SVM useful for?

Pattern Recognition:
- Object Recognition
- Handwriting Recognition
- Speaker Identification
- Text Categorization
- Face Recognition

Regression Estimation

Page 37: SVM – Support Vector Machines

Face Recognition with SVM - Global Versus Component Approach

Bernd Heisele, Purdy Ho & Tomaso Poggio


Global Approach – basic algorithm:

A one-versus-all strategy was used, with one linear SVM for each person in the dataset. Each SVM was trained to distinguish between all images of a single person and all other images.

Page 38: SVM – Support Vector Machines


Face Recognition with SVM - Global Versus Component Approach

Page 39: SVM – Support Vector Machines


Given a set of q people (a set of q SVMs), the class label y of a face pattern x is computed as follows:

Face Recognition with SVM - Global Versus Component Approach

Let $|d_j|$ be the distance from $x$ to the $j$-th hyperplane:

$d_j(x) = \dfrac{\sum_{i=1}^{l} \alpha_i y_i \,(x \cdot x_i) + b}{\big\| \sum_{i=1}^{l} \alpha_i y_i x_i \big\|}$

The label is the class $n$ for which $d_n(x) = \max_{j=1 \ldots q} \{ d_j(x) \}$.

The gray values of the face picture were converted to a feature vector.
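A sketch of this winner-takes-all rule using scikit-learn's one-vs-rest linear SVM; the random vectors below merely stand in for face feature vectors:

```python
import numpy as np
from sklearn.svm import LinearSVC

# Toy stand-in for face feature vectors: 3 "people", 30 vectors each.
rng = np.random.RandomState(0)
X = rng.randn(90, 10)
y = np.repeat([0, 1, 2], 30)
X += y[:, None] * 1.5                       # shift each person's vectors apart

clf = LinearSVC().fit(X, y)                 # one linear SVM per person (one-vs-rest)
scores = clf.decision_function(X)           # w_j . x + b_j for each person j
dists = scores / np.linalg.norm(clf.coef_, axis=1)   # normalize by ||w_j|| to get d_j(x)
pred = np.argmax(dists, axis=1)             # winner takes all over d_j(x)
print((pred == y).mean())
```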

Page 40: SVM – Support Vector Machines


Face Recognition with SVM - Global Versus Component Approach

Global Approach – improved algorithm:

A variation of this algorithm used a second-degree polynomial SVM (an SVM with a second-degree polynomial Kernel).

Page 41: SVM – Support Vector Machines


Face Recognition with SVM - Global Versus Component Approach

Component-based algorithm:

In the detection phase, facial components were detected.

Page 42: SVM – Support Vector Machines


Then, the final detection was made by combining the results of the component classifiers. Each of the components was normalized in size, and their gray levels were combined into a single feature vector:

Again, a one-versus-all linear SVM was used.

Face Recognition with SVM - Global Versus Component Approach

Page 43: SVM – Support Vector Machines


The Component-Based algorithm showed much better results than the Global approach.

Face Recognition with SVM - Global Versus Component Approach - Results

Page 44: SVM – Support Vector Machines


References

B. Heisele, P. Ho & T. Poggio. Face Recognition with Support Vector Machines. Computer Vision and Image Understanding, vol. 91, no. 1-2, pp. 6-21, 2003.

C.J.C. Burges. A Tutorial on Support Vector Machines for Pattern Recognition. Data Mining and Knowledge Discovery, vol. 2(1), pp. 121-167, 1998.

J.P Lewis. A Short SVM (Support Vector Machines) Tutorial.

Page 45: SVM – Support Vector Machines


Prof. Bebis. Support Vector Machines. Pattern Recognition Course Spring 2006 Lecture Slides.

Prof. A.W Moore. Support Vector Machines. 2003 Lecture Slides.

R. Osadchy. Support Vector Machines. 2008 Lecture Slides.

Youtube – Facial Expressions Recognition http://www.youtube.com/watch?v=iPFg52yOZzY


Page 46: SVM – Support Vector Machines