9. support vector machines - chloé azencottcazencott.info/dotclear/public/lectures/ma2823... · 9....

9. Support Vector Machines

Foundations of Machine Learning
École Centrale Paris, Fall 2015

Chloé-Agathe Azencott
Centre for Computational Biology, Mines ParisTech

chloe-agathe.azencott@mines-paristech.fr

Learning objectives
● Define a large-margin classifier in the separable case.
● Write the corresponding primal and dual optimization problems.
● Re-write the optimization problem in the case of non-separable data.
● Use the kernel trick to apply soft-margin SVMs to non-linear cases.
● Define kernels for real-valued data, strings, and graphs.

The linearly separable case: hard-margin SVMs

Linear classifier

Assume the data are linearly separable: there exists a line that separates the positive (+) points from the negative (-) points.

Which one is better?

Margin of a linear classifier

Largest margin classifier: Support vector machines

Support vectors

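Formalization
● Training set: $\mathcal{D} = \{(x_i, y_i)\}_{i=1,\dots,n}$ with $x_i \in \mathbb{R}^p$ and $y_i \in \{-1, +1\}$.
● Assume the data to be linearly separable.
● Goal: Find $(w^*, b^*)$ that define the hyperplane $\langle w, x \rangle + b = 0$ with the largest margin.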

Largest margin hyperplane

What is the size of the margin γ?
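
One way to work this out, as a sketch: it assumes the canonical scaling of (w, b) in which the training points closest to the hyperplane satisfy |⟨w, x⟩ + b| = 1, a convention the transcript does not spell out.

```latex
% Distance from a point x to the hyperplane H = { x : <w, x> + b = 0 }
\[
d(x, H) = \frac{\lvert \langle w, x \rangle + b \rvert}{\lVert w \rVert}
\]
% Under the canonical scaling, the closest points satisfy |<w, x> + b| = 1
\[
\min_i d(x_i, H) = \frac{1}{\lVert w \rVert},
\qquad
\gamma = \frac{2}{\lVert w \rVert}
\]
```

Here γ is read as the width of the band between the closest positive and the closest negative points; if γ is instead taken as the distance from the hyperplane to the closest point, it is 1/‖w‖. Either way, maximizing the margin amounts to minimizing ‖w‖.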

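Optimization problem
● Margin maximization: minimize $\frac{1}{2} \lVert w \rVert^2$
● Correct classification of the training points:
– For positive examples: $\langle w, x_i \rangle + b \geq 1$
– For negative examples: $\langle w, x_i \rangle + b \leq -1$
– Summarized as: $y_i (\langle w, x_i \rangle + b) \geq 1$
● Optimization problem: $\min_{w, b} \frac{1}{2} \lVert w \rVert^2$ subject to $y_i (\langle w, x_i \rangle + b) \geq 1$ for $i = 1, \dots, n$.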
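
To make the constraints concrete, here is a minimal sketch that checks whether a candidate (w, b) classifies every training point with functional margin at least 1, and compares the margin width 2/‖w‖ of two feasible candidates. The toy data and the helper names (`functional_margins`, `is_feasible`, `margin_width`) are illustrative assumptions, not part of the lecture.

```python
import math

def functional_margins(w, b, X, y):
    """Return y_i * (<w, x_i> + b) for every training point."""
    return [yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            for xi, yi in zip(X, y)]

def is_feasible(w, b, X, y, tol=1e-9):
    """True if every point satisfies y_i * (<w, x_i> + b) >= 1."""
    return all(m >= 1.0 - tol for m in functional_margins(w, b, X, y))

def margin_width(w):
    """Width 2 / ||w|| of the band between the two margin hyperplanes."""
    return 2.0 / math.sqrt(sum(wj * wj for wj in w))

# Hypothetical, linearly separable toy data in the plane.
X = [(2.0, 0.0), (3.0, 1.0), (-2.0, 0.0), (-3.0, -1.0)]
y = [1, 1, -1, -1]

w_a, b_a = (0.5, 0.0), 0.0   # feasible, smaller norm
w_b, b_b = (1.0, 0.0), 0.0   # feasible, larger norm

for w, b in ((w_a, b_a), (w_b, b_b)):
    print(w, is_feasible(w, b, X, y), margin_width(w))
```

Both candidates satisfy the constraints, but the one with the smaller norm has the wider margin, which is why the objective minimizes ‖w‖².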

Karush-Kuhn-Tucker conditions
● minimize f(w) under the constraint g(w) ≥ 0
Case 1: the unconstrained minimum lies in the feasible region.
Case 2: it does not.
[Figure: the feasible region, the iso-contours of f, and the unconstrained minimum of f.]
How do we write this in terms of the gradients of f and g?
By abuse of notation, g(w) stands for g(w, b).

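Karush-Kuhn-Tucker conditions
● minimize f(w) under the constraint g(w) ≥ 0
Case 1: the unconstrained minimum lies in the feasible region: $\nabla f(w^*) = 0$ and $g(w^*) \geq 0$.
Case 2: it does not: $g(w^*) = 0$ and $\nabla f(w^*) = \alpha \nabla g(w^*)$ with $\alpha > 0$.
– Summarized as: $\nabla f(w^*) = \alpha \nabla g(w^*)$, $\alpha \geq 0$, $g(w^*) \geq 0$, $\alpha \, g(w^*) = 0$.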

Karush-Kuhn-Tucker conditions
● minimize f(w) under the constraint g(w) ≥ 0
Lagrangian: $L(w, \alpha) = f(w) - \alpha \, g(w)$
α is called the Lagrange multiplier.

Karush-Kuhn-Tucker conditions
● minimize f(w) under the constraints $g_i(w) \geq 0$, $i = 1, \dots, n$
How do we deal with n constraints?

Karush-Kuhn-Tucker conditions
● minimize f(w) under the constraints $g_i(w) \geq 0$, $i = 1, \dots, n$
Use n Lagrange multipliers.
– Lagrangian: $L(w, \alpha) = f(w) - \sum_{i=1}^{n} \alpha_i \, g_i(w)$
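
A small worked example may help here. It assumes the sign convention L(w, α) = f(w) − α g(w) with α ≥ 0, and the functions f and g below are made up for illustration.

```latex
% Toy problem: minimize f(w) = w^2 subject to g(w) = w - 1 >= 0
\[
L(w, \alpha) = w^2 - \alpha (w - 1), \qquad \alpha \geq 0
\]
% Stationarity and complementary slackness
\[
\frac{\partial L}{\partial w} = 2w - \alpha = 0,
\qquad
\alpha \, (w - 1) = 0
\]
% alpha = 0 would give w = 0, which violates g(w) >= 0, so the constraint is active
\[
w^* = 1, \qquad \alpha^* = 2 > 0, \qquad g(w^*) = 0
\]
```

The unconstrained minimum w = 0 lies outside the feasible region, so this is an instance of Case 2: the solution sits on the boundary g(w) = 0 with a strictly positive multiplier.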

Duality
● Lagrangian: $L(w, \alpha) = f(w) - \sum_{i} \alpha_i \, g_i(w)$
● Lagrange dual function: $q(\alpha) = \inf_{w} L(w, \alpha)$
● q is concave in α (even if L is not convex in w)

Duality
● Primal problem: minimize f(w) s.t. g(w) ≥ 0.
Equivalently: minimize $\max_{\alpha \geq 0} L(w, \alpha)$ over w, where L is the Lagrangian.
● Lagrange dual problem: maximize q(α) over α ≥ 0.
● Weak duality: If f* is the optimal value of the primal and d* the optimal value of the dual, then d* ≤ f*. This always holds.
● Strong duality: f* = d*. Holds under specific conditions (constraint qualification), e.g. Slater's condition: f convex and g affine.
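
Weak and strong duality can be checked numerically on a one-dimensional problem. The sketch below uses a made-up example (minimize w² subject to w − 1 ≥ 0) and assumes the sign convention L(w, α) = w² − α(w − 1); the grid of α values is an arbitrary choice.

```python
def f(w):
    """Primal objective of the toy problem."""
    return w * w

def q(alpha):
    """Dual function: minimum over w of L(w, alpha) = w^2 - alpha * (w - 1).

    L is a convex quadratic in w, minimized at w = alpha / 2.
    """
    w = alpha / 2.0
    return w * w - alpha * (w - 1.0)

# Primal optimum: the constraint w >= 1 is active, so w* = 1.
f_star = f(1.0)

# Dual optimum, approximated on a grid of alpha >= 0 with step 0.1.
alphas = [i / 10.0 for i in range(0, 51)]
d_star = max(q(a) for a in alphas)

print("f* =", f_star, "d* =", d_star)
```

Every value q(α) is a lower bound on f* (weak duality), and because this toy problem is convex with an affine constraint, the best lower bound reaches f* (strong duality).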

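Back to hard-margin SVMs
● Minimize $\frac{1}{2} \lVert w \rVert^2$ under the n constraints $y_i (\langle w, x_i \rangle + b) - 1 \geq 0$
● We introduce one dual variable $\alpha_i$ for each constraint (i.e. each training point)
● Lagrangian: $L(w, b, \alpha) = \frac{1}{2} \lVert w \rVert^2 - \sum_{i=1}^{n} \alpha_i \left( y_i (\langle w, x_i \rangle + b) - 1 \right)$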

Lagrangian of the SVM
● L(w, b, α) is convex quadratic in w and minimized for: $w = \sum_{i=1}^{n} \alpha_i y_i x_i$
● L(w, b, α) is affine in b. Its minimum is −∞ except if: $\sum_{i=1}^{n} \alpha_i y_i = 0$

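SVM dual problem
● Lagrange dual function: $q(\alpha) = \sum_{i} \alpha_i - \frac{1}{2} \sum_{i} \sum_{j} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle$ if $\sum_{i} \alpha_i y_i = 0$, and $-\infty$ otherwise.
● Dual problem: maximize q(α) subject to α ≥ 0.
● Maximizing a concave quadratic function under box constraints is a quadratic program that dedicated software solves efficiently.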
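
The dual function follows by substitution. The sketch below assumes the hard-margin Lagrangian $L(w, b, \alpha) = \frac{1}{2} \lVert w \rVert^2 - \sum_i \alpha_i (y_i (\langle w, x_i \rangle + b) - 1)$ and the two stationarity conditions $w = \sum_i \alpha_i y_i x_i$ and $\sum_i \alpha_i y_i = 0$.

```latex
\begin{align*}
L(w, b, \alpha)
  &= \tfrac{1}{2} \lVert w \rVert^2
     - \sum_{i} \alpha_i y_i \langle w, x_i \rangle
     - b \sum_{i} \alpha_i y_i
     + \sum_{i} \alpha_i \\
% With w = \sum_i \alpha_i y_i x_i, the second term equals ||w||^2 and the b term vanishes
q(\alpha)
  &= \sum_{i} \alpha_i - \tfrac{1}{2} \lVert w \rVert^2
   = \sum_{i} \alpha_i
     - \tfrac{1}{2} \sum_{i} \sum_{j} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle
\end{align*}
```

The training points enter q only through the inner products ⟨x_i, x_j⟩.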

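Optimal hyperplane
● Once the optimal α* is found, we recover (w*, b*): $w^* = \sum_{i} \alpha_i^* y_i x_i$, and $b^* = y_i - \langle w^*, x_i \rangle$ for any i with $\alpha_i^* > 0$.
● The decision function is hence: $f(x) = \sum_{i} \alpha_i^* y_i \langle x_i, x \rangle + b^*$
● KKT conditions: Either $\alpha_i = 0$ or $g_i = 0$
Case 1: $\alpha_i = 0$. The point does not contribute to w*.
Case 2: $\alpha_i > 0$. Then $y_i f(x_i) = 1$: the point lies on the margin and is a support vector.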
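
As an illustration of recovering the hyperplane from the dual solution, the sketch below uses w* = Σ α_i y_i x_i and b* = y_s − ⟨w*, x_s⟩ for a support vector s. The two-point data set is a made-up example; for it, the equality constraint forces both multipliers to a common value a, the dual reduces to 2a − 2a², and the maximum is at a = 0.5. The function names are hypothetical.

```python
def recover_hyperplane(alpha, X, y, tol=1e-12):
    """Recover (w, b) from dual variables alpha of a hard-margin SVM."""
    p = len(X[0])
    # w = sum_i alpha_i * y_i * x_i, computed coordinate by coordinate
    w = [sum(a * yi * xi[j] for a, xi, yi in zip(alpha, X, y))
         for j in range(p)]
    # Any support vector (alpha_s > 0) lies on the margin: y_s * f(x_s) = 1
    s = next(i for i, a in enumerate(alpha) if a > tol)
    b = y[s] - sum(wj * xj for wj, xj in zip(w, X[s]))
    return w, b

def decision(w, b, x):
    """Linear decision function f(x) = <w, x> + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

# Hypothetical toy set: one positive and one negative point.
X = [(1.0, 0.0), (-1.0, 0.0)]
y = [1, -1]
alpha = [0.5, 0.5]  # dual optimum for this toy set

w, b = recover_hyperplane(alpha, X, y)
print(w, b, decision(w, b, (3.0, 5.0)))
```

Both training points are support vectors here: each has a positive multiplier and sits exactly on the margin.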

Support vectors
[Figure: training points labeled α = 0 and α > 0.]

The non-linearly separable case: soft-margin SVMs.

Soft-margin SVMs
What if the data are not linearly separable?

Soft-margin SVMs
● Find a trade-off between large margin and few errors.
What does this remind you of?

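SVM error: hinge loss
● We want: $y f(x) \geq 1$
● Hinge loss function: $\ell_{\text{hinge}}(f(x), y) = \max(0, 1 - y f(x))$
[Plot: $\ell_{\text{hinge}}(f(x), y)$ as a function of $y f(x)$; the loss equals 1 at $y f(x) = 0$ and vanishes for $y f(x) \geq 1$.]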
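
A minimal sketch of the hinge loss, assuming labels in {−1, +1} and a real-valued score f(x); the function name is illustrative.

```python
def hinge_loss(score, label):
    """Hinge loss max(0, 1 - y * f(x)) for a label y in {-1, +1}."""
    return max(0.0, 1.0 - label * score)

# Correct with enough margin, correct but inside the margin, and wrong.
for score, label in [(2.0, 1), (0.5, 1), (1.0, -1)]:
    print(score, label, hinge_loss(score, label))
```

The loss is zero only when the point is on the correct side with y f(x) ≥ 1; a correctly classified point inside the margin is still penalized.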

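Soft-margin SVMs
● Find a trade-off between large margin and few errors.
● Error: $\sum_{i=1}^{n} \ell_{\text{hinge}}(f(x_i), y_i)$
● The soft-margin SVM solves: $\min_{w, b} \frac{1}{2} \lVert w \rVert^2 + C \sum_{i=1}^{n} \max(0, 1 - y_i (\langle w, x_i \rangle + b))$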

The C parameter
● Large C: makes few errors
● Small C: ensures a large margin
● Intermediate C: finds a trade-off
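
The effect of C can be seen by evaluating the soft-margin objective, ½‖w‖² + C Σ max(0, 1 − y_i(⟨w, x_i⟩ + b)), for two fixed candidate hyperplanes. This is a sketch on made-up one-dimensional data; the candidates `wide` and `narrow` and the two values of C are illustrative assumptions, and no optimization is performed.

```python
def soft_margin_objective(w, b, X, y, C):
    """0.5 * ||w||^2 + C * (sum of hinge losses over the training set)."""
    reg = 0.5 * sum(wj * wj for wj in w)
    err = sum(
        max(0.0, 1.0 - yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b))
        for xi, yi in zip(X, y)
    )
    return reg + C * err

# Hypothetical 1-D data: the positive point at -1 sits close to the negatives.
X = [(2.0,), (3.0,), (-1.0,), (-3.0,), (-4.0,)]
y = [1, 1, 1, -1, -1]

wide = ((0.5,), 0.0)    # margin width 4, but misclassifies the point at -1
narrow = ((1.0,), 2.0)  # no hinge loss at all, but margin width 2

for C in (0.1, 10.0):
    print(C,
          soft_margin_objective(*wide, X, y, C),
          soft_margin_objective(*narrow, X, y, C))
```

With the small C the wide-margin candidate has the lower objective despite its error; with the large C the error dominates and the error-free candidate wins.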

It is important to control C
[Plot: prediction error as a function of C, on training data and on new data.]

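Slack variables
$\min_{w, b} \frac{1}{2} \lVert w \rVert^2 + C \sum_{i} \max(0, 1 - y_i f(x_i))$
is equivalent to:
$\min_{w, b, \xi} \frac{1}{2} \lVert w \rVert^2 + C \sum_{i} \xi_i$ subject to $y_i f(x_i) \geq 1 - \xi_i$ and $\xi_i \geq 0$ for $i = 1, \dots, n$
Slack variable $\xi_i$: distance between $y_i f(x_i)$ and 1 (zero when $y_i f(x_i) \geq 1$).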

Dual formulation of the soft-margin SVM
● Primal
● Lagrangian
● Minimize the Lagrangian (partial derivatives in w, b, ξ)
● KKT conditions
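
These four steps can be sketched as follows. The multipliers β_i attached to the constraints ξ_i ≥ 0 are a naming assumption (the transcript does not preserve the slide's symbols), and f(x) = ⟨w, x⟩ + b.

```latex
% Primal
\[
\min_{w, b, \xi} \; \tfrac{1}{2} \lVert w \rVert^2 + C \sum_{i} \xi_i
\quad \text{s.t.} \quad y_i f(x_i) \geq 1 - \xi_i, \;\; \xi_i \geq 0
\]
% Lagrangian, with alpha_i >= 0 and beta_i >= 0
\[
L(w, b, \xi, \alpha, \beta)
  = \tfrac{1}{2} \lVert w \rVert^2 + C \sum_{i} \xi_i
  - \sum_{i} \alpha_i \bigl( y_i f(x_i) - 1 + \xi_i \bigr)
  - \sum_{i} \beta_i \xi_i
\]
% Setting the partial derivatives in w, b, xi to zero
\[
w = \sum_{i} \alpha_i y_i x_i,
\qquad
\sum_{i} \alpha_i y_i = 0,
\qquad
C - \alpha_i - \beta_i = 0
\]
% KKT conditions (complementary slackness)
\[
\alpha_i \bigl( y_i f(x_i) - 1 + \xi_i \bigr) = 0,
\qquad
\beta_i \xi_i = 0
\]
```

Since β_i ≥ 0, the third stationarity condition bounds each multiplier: 0 ≤ α_i ≤ C.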

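Dual formulation of the soft-margin SVM
● Dual: Maximize $\sum_{i} \alpha_i - \frac{1}{2} \sum_{i} \sum_{j} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle$
● under the constraints $0 \leq \alpha_i \leq C$ and $\sum_{i} \alpha_i y_i = 0$
● KKT conditions:
– $\alpha_i = 0$: $y_i f(x_i) \geq 1$ (“easy”)
– $\alpha_i = C$: $y_i f(x_i) \leq 1$ (“hard”)
– $0 < \alpha_i < C$: $y_i f(x_i) = 1$ (“somewhat hard”)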

Support vectors of the soft-margin SVM
[Figure: training points labeled α = 0, 0 < α < C, and α = C.]

Primal vs. dual
● Primal: (w, b) has dimension (p+1). Favored if the data is low-dimensional.
● Dual: α has dimension n. Favored if there is little data available.

The non-linear case: kernel SVMs.

Non-linear SVMs

Non-linear mapping to a feature space
[Figure: a non-linear mapping from $\mathbb{R}$ to $\mathbb{R}^2$.]

Kernels
For a given mapping $\phi: \mathcal{X} \to \mathcal{H}$ from the space of objects X to some Hilbert space H, the kernel between two objects x and x' is the inner product of their images in the feature space: $K(x, x') = \langle \phi(x), \phi(x') \rangle_{\mathcal{H}}$.

Kernels allow us to formalize the notion of similarity.

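Which functions are kernels?
● A function K(x, x') defined on a set X is a kernel iff there exists a Hilbert space H and a mapping φ: X → H such that, for any x, x' in X: $K(x, x') = \langle \phi(x), \phi(x') \rangle_{\mathcal{H}}$
● A function K(x, x') defined on a set X is positive definite iff it is symmetric and satisfies, for any $N \geq 1$, any $x_1, \dots, x_N$ in X and any $a_1, \dots, a_N$ in $\mathbb{R}$: $\sum_{i=1}^{N} \sum_{j=1}^{N} a_i a_j K(x_i, x_j) \geq 0$
● Theorem [Aronszajn, 1950]: K is a kernel iff it is positive definite.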

Positive definite matrices
● Have a unique Cholesky decomposition $K = L L^*$
L: lower triangular, with positive elements on the diagonal
● The associated sesquilinear form is an inner product:
– conjugate symmetry
– linearity in the first argument
– positive definiteness

Polynomial kernels

More generally, for $d \geq 1$, $K(x, x') = (\langle x, x' \rangle + 1)^d$ is an inner product in a feature space of all monomials of degree up to d.
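
A minimal check of this claim for d = 2 and two-dimensional inputs, assuming the kernel form K(x, x') = (⟨x, x'⟩ + 1)² (the transcript does not preserve the slide's formula). The explicit feature map below lists all monomials of degree up to 2, scaled so that the inner products agree; the function names are illustrative.

```python
import math

def poly2_kernel(x, z):
    """K(x, z) = (<x, z> + 1)^2 for 2-dimensional inputs."""
    return (x[0] * z[0] + x[1] * z[1] + 1.0) ** 2

def poly2_features(x):
    """Explicit feature map: monomials of degree 0, 1 and 2, suitably scaled."""
    r = math.sqrt(2.0)
    return [1.0, r * x[0], r * x[1], x[0] ** 2, r * x[0] * x[1], x[1] ** 2]

def dot(u, v):
    """Standard inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

x, z = (1.0, 2.0), (3.0, -1.0)
k_direct = poly2_kernel(x, z)                             # one inner product in R^2
k_explicit = dot(poly2_features(x), poly2_features(z))    # inner product in R^6

print(k_direct, k_explicit)
```

The kernel evaluation needs a single inner product in the input space, while the explicit route builds six features per point; the gap widens quickly with the degree and the input dimension.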

Gaussian kernel
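
Only the slide title survives in the transcript. A commonly used form, assumed here, is K(x, x') = exp(−‖x − x'‖² / (2σ²)); the parameter name `sigma` and its default value are illustrative choices.

```python
import math

def gaussian_kernel(x, z, sigma=1.0):
    """Gaussian (RBF) kernel exp(-||x - z||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

# Identical points, nearby points, and distant points.
for z in [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]:
    print(z, gaussian_kernel((0.0, 0.0), z))
```

The value is 1 for identical points and decays towards 0 as the points move apart, with σ controlling how fast.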

Kernel trick

● Many linear algorithms (in particular, linear SVMs) can be run in the feature space H without explicitly computing the images φ(x), by computing only kernels K(x, x').

● It is sometimes easy to compute kernels that correspond to high-dimensional feature spaces: K(x, x') is often much simpler to compute than φ(x).

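SVM in the feature space
● Train: maximize $\sum_{i} \alpha_i - \frac{1}{2} \sum_{i} \sum_{j} \alpha_i \alpha_j y_i y_j \langle \phi(x_i), \phi(x_j) \rangle_{\mathcal{H}}$
– under the constraints $0 \leq \alpha_i \leq C$ and $\sum_{i} \alpha_i y_i = 0$
● Predict with the decision function $f(x) = \sum_{i} \alpha_i y_i \langle \phi(x_i), \phi(x) \rangle_{\mathcal{H}} + b$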

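SVM with a kernel
● Train: maximize $\sum_{i} \alpha_i - \frac{1}{2} \sum_{i} \sum_{j} \alpha_i \alpha_j y_i y_j K(x_i, x_j)$
– under the constraints $0 \leq \alpha_i \leq C$ and $\sum_{i} \alpha_i y_i = 0$
● Predict with the decision function $f(x) = \sum_{i} \alpha_i y_i K(x_i, x) + b$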
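
The prediction step can be sketched as below: the decision function f(x) = Σ α_i y_i K(x_i, x) + b only ever calls the kernel, so swapping kernels requires no other change. The dual variables are taken as given (they would come from a solver, which is not shown), and the data, multipliers, and function names are illustrative assumptions.

```python
def linear_kernel(x, z):
    """Plain inner product <x, z>."""
    return sum(a * b for a, b in zip(x, z))

def quadratic_kernel(x, z):
    """Polynomial kernel (<x, z> + 1)^2."""
    return (linear_kernel(x, z) + 1.0) ** 2

def kernel_decision(x, alpha, X, y, b, kernel):
    """f(x) = sum_i alpha_i * y_i * K(x_i, x) + b."""
    return sum(a * yi * kernel(xi, x) for a, xi, yi in zip(alpha, X, y)) + b

# Hypothetical trained model: two support vectors with equal multipliers.
X = [(1.0, 0.0), (-1.0, 0.0)]
y = [1, -1]
alpha = [0.5, 0.5]
b = 0.0

print(kernel_decision((1.0, 0.0), alpha, X, y, b, linear_kernel))
print(kernel_decision((-2.0, 3.0), alpha, X, y, b, linear_kernel))
print(kernel_decision((1.0, 0.0), alpha, X, y, b, quadratic_kernel))
```

Only the training points with non-zero α_i contribute to the sum, so prediction cost scales with the number of support vectors.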

Toy example

Toy example: linear SVM

Toy example: polynomial SVM (d=2)

Kernels for strings

Protein sequence classification
Goal: predict whether a protein is secreted or not, based on its sequence.

Substring-based representations
● Represent strings based on the presence/absence of substrings of fixed length.

Strings of length k?

Substring-based representations
● Represent strings based on the presence/absence of substrings of fixed length.
– Number of occurrences of u in x: spectrum kernel [Leslie et al., 2002].
– Number of occurrences of u in x, up to m mismatches: mismatch kernel [Leslie et al., 2004].
– Number of occurrences of u in x, allowing gaps, with a weight decaying exponentially with the number of gaps: substring kernel [Lodhi et al., 2002].

Spectrum kernel

● Implementation:
– Formally, a sum over $|\mathcal{A}^k|$ terms
– At most |x| - k + 1 non-zero terms in Φ(x)
– Hence: Computation in O(|x| + |x'|)

● Fast prediction for a new sequence x:
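
A minimal sketch of the spectrum kernel, assuming the usual definition K(x, x') = Σ_u Φ_u(x) Φ_u(x'), where Φ_u(x) counts the occurrences of the length-k substring u in x (the transcript does not preserve the slide's formula). Only the k-mers that actually occur are stored, which is what keeps the computation linear in the sequence lengths for fixed k; the function names are illustrative.

```python
from collections import Counter

def spectrum_features(x, k):
    """Sparse feature map: counts of every length-k substring of x."""
    return Counter(x[i:i + k] for i in range(len(x) - k + 1))

def spectrum_kernel(x, z, k):
    """Sum over shared k-mers of (count in x) * (count in z)."""
    fx, fz = spectrum_features(x, k), spectrum_features(z, k)
    if len(fx) > len(fz):
        fx, fz = fz, fx  # iterate over the smaller dictionary
    # A Counter returns 0 for missing keys, so absent k-mers contribute nothing
    return sum(count * fz[u] for u, count in fx.items())

print(spectrum_features("abab", 2))
print(spectrum_kernel("abab", "baba", 2))
```

The same sparse counts could be reused at prediction time by precomputing one weight per k-mer from the support vectors, so that scoring a new sequence needs a single pass over its k-mers; that step is not shown here.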

The choice of kernel matters

Performance of several kernels on the SCOP superfamily recognition benchmark [Saigo et al., 2004]

Kernels for graphs

Graph data
● Molecules
● Images [Harchaoui & Bach, 2007]

Subgraph-based representations

0 1 1 0 0 1 0 0 0 1 0 1 0 0 1

0: no occurrence of the 1st feature
1: 1+ occurrences of the 10th feature

Tanimoto & MinMax
● The Tanimoto and MinMax similarities are kernels
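
The slide's formulas are not preserved in the transcript. The sketch below assumes the usual definitions: Tanimoto as ⟨x, x'⟩ / (⟨x, x⟩ + ⟨x', x'⟩ − ⟨x, x'⟩) on binary fingerprints, and MinMax as Σ min(x_i, x'_i) / Σ max(x_i, x'_i) on count vectors. Returning 0.0 when both vectors are all zeros is a convention chosen here, not something the lecture states.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two binary fingerprints (shared / union of set bits)."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    return dot / denom if denom else 0.0

def minmax(a, b):
    """MinMax similarity of two non-negative count vectors."""
    num = sum(min(x, y) for x, y in zip(a, b))
    den = sum(max(x, y) for x, y in zip(a, b))
    return num / den if den else 0.0

# Hypothetical fingerprints: presence/absence, then occurrence counts.
fp_a = [0, 1, 1, 0, 1]
fp_b = [1, 1, 0, 0, 1]
counts_a = [2, 0, 1]
counts_b = [1, 3, 1]

print(tanimoto(fp_a, fp_b), minmax(counts_a, counts_b))
```

On binary vectors the two similarities coincide, so MinMax can be read as the extension of Tanimoto from presence/absence to counts.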

Which subgraphs to use?
● Indexing by all subgraphs...
– Computing all subgraph occurrences is NP-hard.
– Actually, finding whether a given subgraph occurs in a graph is NP-hard in general.

http://jeremykun.com/2015/11/12/a-quasipolynomial-time-algorithm-for-graph-isomorphism-the-details/

Which subgraphs to use?
● Specific subgraphs that lead to computationally efficient indexing:
– Subgraphs selected based on domain knowledge, e.g. chemical fingerprints
– All frequent subgraphs [Helma et al., 2004]
– All paths up to length k [Nicholls 2005]
– All walks up to length k [Mahé et al., 2005]
– All trees up to depth k [Rogers, 2004]
– All shortest paths [Borgwardt & Kriegel, 2005]
– All subgraphs up to k vertices (graphlets) [Shervashidze et al., 2009]

Which subgraphs to use?

[Figure: a path of length 5, a walk of length 5, and a tree of depth 2.]
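
To illustrate the difference between walks and paths, the sketch below enumerates both on a small graph. It assumes the standard graph-theory definitions (a walk may revisit vertices, a path may not) and measures length in edges; the adjacency-list representation, the triangle graph, and the function names are illustrative, and the brute-force enumeration is only meant for tiny graphs.

```python
def walks(adj, k):
    """All walks with exactly k edges; vertices may repeat."""
    result = [[v] for v in adj]
    for _ in range(k):
        # Extend every walk by each neighbor of its last vertex
        result = [w + [u] for w in result for u in adj[w[-1]]]
    return result

def paths(adj, k):
    """All paths with exactly k edges: walks that never revisit a vertex."""
    return [w for w in walks(adj, k) if len(set(w)) == len(w)]

# Hypothetical graph: a triangle on vertices 0, 1, 2.
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}

for k in (1, 2, 3):
    print(k, len(walks(triangle, k)), len(paths(triangle, k)))
```

The number of walks keeps growing with k while paths run out once every vertex has been used, which hints at why walk-based features are easier to count than path-based ones.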

Which subgraphs to use?

[Figure: paths, walks, and trees [Harchaoui & Bach, 2007].]

The choice of kernel matters

Predicting inhibitors for 60 cancer cell lines [Mahé & Vert, 2009]

The choice of kernel matters

[Harchaoui & Bach, 2007]

● COREL14: 1400 natural images, 14 classes
● Kernels: histogram (H), walk kernel (W), subtree kernel (TW), weighted subtree kernel (wTW), combination (M).

Summary
● Linearly separable case: hard-margin SVM
● Non-separable, but still linear: soft-margin SVM
● Non-linear: kernel SVM
● Kernels for
– real-valued data
– strings
– graphs.