
Page 1:

Support Vector Machines
Jordan Smith
MUMT 611
14 February 2008

Page 2:

Topics to cover

- What do Support Vector Machines (SVMs) do?
- How do SVMs work?
  - Linear data
  - Non-linear data (kernel functions)
  - Non-separable data (added cost function)
- Search optimization
- Why?

Pages 3-10: What SVMs do

[Figure sequence: two classes of points in the plane; a separating line is drawn and its margin widened step by step; legend: margin, support vectors, optimum separating hyperplane] (Sherrod 230)
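To make the picture concrete, here is a minimal sketch (mine, not the deck's; it uses scikit-learn rather than the DTREG package the slides cite) that fits a linear SVM on a toy set and reads off the support vectors, the points that define the optimum separating hyperplane:

```python
# Sketch: fit a linear SVM on a toy 2-D set and inspect the
# support vectors that define the separating hyperplane.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [1.0, 0.5], [3.0, 3.0], [4.0, 3.5]])
y = np.array([-1, -1, +1, +1])

clf = SVC(kernel="linear", C=1e6)  # very large C approximates a hard margin
clf.fit(X, y)

print("support vectors:\n", clf.support_vectors_)
print("w =", clf.coef_[0], " b =", clf.intercept_[0])  # hyperplane w·x + b = 0
```

Only the support vectors matter: deleting any other training point leaves the fitted hyperplane unchanged.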

Page 11:

Topics to cover

- What do Support Vector Machines (SVMs) do?
- How do SVMs work?
  - Linear data
  - Non-linear data (kernel functions)
  - Non-separable data (added cost function)
- Search optimization
- Why?

Page 12:

The linear, separable case

- Training data: {x_i, y_i}
- The separating hyperplane is defined by its normal vector w:
  - hyperplane equation: w·x + b = 0
  - distance from the plane to the origin: |b|/|w| (a short check follows this slide)
- The distances from the hyperplane to the nearest point in each class are d+ and d- (the margins)
- Goal: maximize d+ + d-
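The distance formula is not derived on the slide; the one-line argument: any point x_0 on the plane satisfies w·x_0 = -b, and projecting x_0 onto the unit normal gives the distance to the origin.

```latex
% Distance from the hyperplane {x : w·x + b = 0} to the origin.
% Take any x_0 on the plane, so w \cdot x_0 = -b; its projection
% onto the unit normal w/\lVert w \rVert has length
\[
  \left\lvert x_0 \cdot \tfrac{w}{\lVert w \rVert} \right\rvert
  = \frac{\lvert w \cdot x_0 \rvert}{\lVert w \rVert}
  = \frac{\lvert b \rvert}{\lVert w \rVert}.
\]
```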

Page 13:

The linear, separable case

1) x_i·w + b ≥ +1 (for y_i = +1)
2) x_i·w + b ≤ -1 (for y_i = -1)

Combined: y_i(x_i·w + b) - 1 ≥ 0

For our support vectors, the distance from the origin to the plane is |1 - b|/|w|.

Algebra: d+ + d- = 2/|w|

New goal: maximize 2/|w|, i.e., minimize |w|
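A numerical sanity check of d+ + d- = 2/|w| (a sketch under the same toy-data assumption as before, using scikit-learn):

```python
# Sketch: verify the margin formula d+ + d- = 2/|w| numerically.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0], [4.0, 4.0]])
y = np.array([-1, -1, +1, +1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)  # ~hard margin
w, b = clf.coef_[0], clf.intercept_[0]

# Distance of each point from the hyperplane w·x + b = 0:
dists = np.abs(X @ w + b) / np.linalg.norm(w)
d_plus, d_minus = dists[y == +1].min(), dists[y == -1].min()
print(d_plus + d_minus, 2 / np.linalg.norm(w))  # the two values match
```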

Page 14:

Nonlinear SVMs

[Figure: data that no straight line can separate] (Sherrod 235)

Page 15:

Nonlinear SVMs

Kernel trick (a sketch follows this list):

- Map the data into a higher-dimensional space using Φ: R^d → H
- The training problem involves only dot products, so H can even be of infinite dimension
- The kernel trick makes nonlinear solutions linear again! (YouTube example)
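A tiny sketch of why the trick works (my example, not the deck's): for the degree-2 polynomial kernel on R^2, evaluating K(x, z) = (x·z)² directly equals the dot product Φ(x)·Φ(z) in the mapped space H, so Φ never has to be computed during training.

```python
# Sketch: K(x, z) = Φ(x)·Φ(z) for the degree-2 polynomial kernel,
# whose explicit feature map on R^2 is
# Φ(x) = (x1², sqrt(2)·x1·x2, x2²).
import numpy as np

def phi(x):
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])

lhs = phi(x) @ phi(z)   # dot product in the mapped space H
rhs = (x @ z) ** 2      # kernel evaluated directly in R^2
print(lhs, rhs)         # identical (up to rounding): 16.0 16.0
```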

Page 16:

Nonlinear SVMs

Radial basis function kernel (standard form; the slide's formula did not survive the transcript): K(x_i, x_j) = exp(-γ |x_i - x_j|²)

[Figure: RBF-kernel decision boundary] (Sherrod 236)
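A short sketch (mine) of the RBF kernel as a similarity score; gamma sets how quickly similarity decays with distance:

```python
# Sketch: the RBF kernel as a distance-based similarity.
# Larger gamma -> faster decay -> more local decision boundaries.
import numpy as np

def rbf(x, z, gamma=0.5):
    return np.exp(-gamma * np.sum((x - z) ** 2))

x = np.zeros(2)
for z in ([0.1, 0.0], [1.0, 0.0], [3.0, 0.0]):
    print(z, rbf(x, np.array(z)))  # similarity shrinks as z moves away
```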

Page 17:

Nonlinear SVMs

Sigmoid kernel (standard form; the slide's formula did not survive the transcript): K(x_i, x_j) = tanh(κ x_i·x_j + c)

[Figure: sigmoid-kernel decision boundary] (Sherrod 237)

Page 18:

Another demonstration

(applet)

Page 19:

The non-separable case

Classifiers need to have a balanced capacity:

- Bad botanist: "It has 847 leaves. Not a tree!" (too much capacity: memorizes irrelevant detail)
- Bad botanist: "It's green. That's a tree!" (too little capacity: overgeneralizes)

Pages 20-22: The non-separable case

[Figure sequence: overlapping classes that no hyperplane separates cleanly; legend: errors, fuzzy margin] (Sherrod 237)

Page 23:

The non-separable case

Add a cost function, with slack variables ξ_i:

x_i·w + b ≥ +1 - ξ_i (for y_i = +1)
x_i·w + b ≤ -1 + ξ_i (for y_i = -1)
ξ_i ≥ 0

Old goal: minimize |w|²/2
New goal: minimize |w|²/2 + C(∑_i ξ_i)^k
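A sketch (mine, again with scikit-learn) of what the cost constant C buys: small C tolerates slack and gives a wide, "fuzzy" margin; large C penalizes slack and narrows the margin.

```python
# Sketch: C trades margin width against training errors.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1.2, (50, 2)),   # two overlapping classes
               rng.normal(+1, 1.2, (50, 2))])
y = np.array([-1] * 50 + [+1] * 50)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    w = clf.coef_[0]
    print(f"C={C}: margin={2 / np.linalg.norm(w):.2f}, "
          f"support vectors={len(clf.support_)}")
```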

Page 24:

Optimizing your search

To find the separating hyperplane, you must tune several parameters, depending on which kernel function you select:

- C, the cost constant
- gamma and the other kernel parameters

There are two basic methods (see the sketch below):

- Grid search
- Pattern search
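A grid-search sketch (mine) over C and gamma for an RBF-kernel SVM, using scikit-learn's cross-validated exhaustive search; a pattern search would instead refine the grid around the best cell rather than enumerating every combination.

```python
# Sketch: cross-validated grid search over C and gamma.
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

grid = GridSearchCV(
    SVC(kernel="rbf"),
    param_grid={"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1, 1]},
    cv=5,  # 5-fold cross-validation scores each (C, gamma) cell
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```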

Page 25:

Topics to cover

- What do Support Vector Machines (SVMs) do?
- How do SVMs work?
  - Linear data
  - Non-linear data (kernel functions)
  - Non-separable data (added cost function)
- Search optimization
- Why?

Page 26:

Why use SVMs?

Uses:

- Optical character recognition
- Spam detection
- MIR:
  - genre and artist classification (Mandel 2004, 2005)
  - mood classification (Laurier 2007)
  - popularity classification, based on lyrics (Dhanaraj 2005)

Page 27:

Why use SVMs?

- Machine learner of choice for high-dimensional data, such as text, images, and music!
- Conceptually simple.
- Generalizable and efficient.
- Next slides: results of a benchmark study (Meyer 2004) comparing SVMs with other learning techniques.

Pages 28-31: [Benchmark tables from Meyer 2004; not reproduced in the transcript]
Page 32:

Questions?

Page 33:

Key References

Burges, C. J. C. 1998. "A tutorial on support vector machines for pattern recognition." Data Mining and Knowledge Discovery 2 (2): 121-67. http://citeseer.ist.psu.edu/burges98tutorial.html

Cortes, C., and V. Vapnik. 1995. "Support-vector networks." Machine Learning 20 (3): 273-97. http://citeseer.ist.psu.edu/cortes95supportvector.html

Sherrod, P. H. 2008. DTREG: Predictive Modeling Software (user's guide), 227-41. <http://www.dtreg.com/DTREG.pdf>

Smola, A. J., and B. Schölkopf. 1998. "A tutorial on support vector regression." NeuroCOLT Technical Report NC-TR-98-030. Royal Holloway College, London.