Page 1: Classification by Nearest Shrunken Centroids and Support ...compdiag.molgen.mpg.de/ngfn/docs/2005/may/markowetz-svm.pdf · Nearest Shrunken Centroids cont’d The group centroid

Classification by Nearest Shrunken Centroids and Support Vector Machines

Florian Markowetz

Max Planck Institute for Molecular Genetics, Computational Diagnostics Group, Berlin

Practical DNA microarray analysis 2005


Two roads to classification

1. Model class probabilities → QDA, LDA, ...

2. Model class boundaries directly → Optimal Separating Hyperplanes, SVM

Florian Markowetz, Classification by SVM and PAM, Practical Microarray Analysis 2004 2


What’s the problem?

In classification you have to trade off overfitting vs. underfitting and bias vs. variance.

In 12’000 dimensions even linear methods are very complex → high variance!

Simplify your models


Discriminant analysis and gene selection


Comparing Gaussian likelihoods

Assumption: each group of patients is well described by a Normal density.

Training: estimate mean and covariance matrix for each group.

Prediction: assign new patient to group with higher likelihood.

Constraints on covariance structure lead to different forms of discriminant analysis.
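The training and prediction steps can be sketched in base R. This is only an illustration on simulated two-gene data: the data, the function names `fit.gaussian`, `log.likelihood` and `classify`, and the group labels "A" and "B" are assumptions, not part of the slides. Each group gets its own covariance matrix, which corresponds to the unconstrained (quadratic) case.

```r
# Toy data (assumed): 2 genes, 2 groups of 20 patients each.
set.seed(1)
group.a <- matrix(rnorm(40, mean = 0), ncol = 2)
group.b <- matrix(rnorm(40, mean = 3), ncol = 2)

# Training: estimate mean and covariance matrix for each group.
fit.gaussian <- function(x) list(mean = colMeans(x), cov = cov(x))

# Log-density of a multivariate Normal at point z.
log.likelihood <- function(z, fit) {
  d <- z - fit$mean
  -0.5 * (length(z) * log(2 * pi) +
          as.numeric(determinant(fit$cov, logarithm = TRUE)$modulus) +
          as.numeric(t(d) %*% solve(fit$cov, d)))
}

fit.a <- fit.gaussian(group.a)
fit.b <- fit.gaussian(group.b)

# Prediction: assign the new patient to the group with the higher likelihood.
classify <- function(z) {
  if (log.likelihood(z, fit.a) > log.likelihood(z, fit.b)) "A" else "B"
}

classify(c(0, 0))  # a profile near the centre of group A
classify(c(3, 3))  # a profile near the centre of group B
```

With thousands of genes and few patients the full covariance matrix cannot be estimated reliably (it is not even invertible), which is why the constrained variants matter in practice.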


Discriminant analysis in a nutshell

[Figure: QDA, LDA, DLDA, Nearest Centroid]

Characterize each class by mean and covariance structure.

• Quadratic D.A.: different COVs.

• Linear D.A.: requires same COVs.

• Diagonal linear D.A.: same diagonal COVs.

• Nearest centroids: forces COVs to σ²I.
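A minimal sketch of the last and simplest case, assuming equal class priors: when every class has covariance σ²I, comparing Gaussian likelihoods reduces to comparing Euclidean distances to the class centroids. The centroid values and the function name `nearest.centroid` are made up for illustration.

```r
# Toy centroids (assumed values) for two classes measured on three genes.
centroids <- rbind(classA = c(1, 0, 2),
                   classB = c(4, 3, 2))

# With covariance sigma^2 * I, the Gaussian log-likelihood of z under class k
# is a constant minus ||z - centroid_k||^2 / (2 * sigma^2), so the most likely
# class is the one with the closest centroid.
nearest.centroid <- function(z, centroids) {
  sq.dist <- rowSums(sweep(centroids, 2, z)^2)  # squared distance to each row
  names(which.min(sq.dist))
}

nearest.centroid(c(1.5, 0.5, 2), centroids)
nearest.centroid(c(3.5, 2.0, 1), centroids)
```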


Feature selection

Next simplification: Base the classification only on a small number of genes.

Feature selection: Find the most discriminative genes.

This task is different from testing for differential expression. Genes can be significantly differentially expressed, but still useless for classification.


Feature selection

1. Filter:

• Rank genes according to discriminative power by t-statistic, Wilcoxon, ...

• Use only the first k for classification.

• Discrete, hard thresholding.

2. Shrinkage:

• Continuously shrink genes until only a few have influence on classification.

• Example: Nearest Shrunken Centroids.
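The filter approach can be sketched in a few lines of base R. The expression matrix below is simulated, with five genes shifted between the classes on purpose; all object names (`expr`, `class.labels`, `selected`) and the choice k = 5 are assumptions for illustration.

```r
# Simulated expression matrix (assumed): 100 genes x 20 samples, 2 classes.
set.seed(42)
n.genes <- 100
class.labels <- rep(c("neg", "pos"), each = 10)
expr <- matrix(rnorm(n.genes * 20), nrow = n.genes,
               dimnames = list(paste0("gene", 1:n.genes), NULL))

# Make the first five genes truly discriminative.
is.pos <- class.labels == "pos"
expr[1:5, is.pos] <- expr[1:5, is.pos] + 5

# Rank genes by the absolute value of the two-sample t-statistic.
t.stat <- apply(expr, 1, function(g) t.test(g[is.pos], g[!is.pos])$statistic)
ranking <- order(abs(t.stat), decreasing = TRUE)

# Hard thresholding: keep only the first k genes.
k <- 5
selected <- rownames(expr)[ranking[1:k]]
selected
```

In a real analysis the ranking has to be recomputed inside every cross-validation fold; selecting genes on the full data set first gives overly optimistic error estimates.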


Shrunken Centroids

[Figure: overall centroid, first class centroid and second class centroid; axes: gene A and gene B]


Nearest Shrunken Centroids cont’d

The group centroid $\bar{x}_{gk}$ for gene $g$ and class $k$ is compared to the overall centroid $\bar{x}_g$ by

$$\bar{x}_{gk} = \bar{x}_g + m_k (s_g + s_0)\, d_{gk},$$

where $s_g$ is the pooled within-class standard deviation of gene $g$ and $s_0$ is an offset to guard against genes with low expression levels.

Shrinkage: Each $d_{gk}$ is reduced by $\Delta$ in absolute value, until it reaches zero. Genes with $d_{gk} = 0$ for all classes do not contribute to the classification.

(Tibshirani et al., 2002)
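The shrinkage step amounts to soft thresholding, sketched here in base R. All numbers are made up: the d values, the overall centroid, s_g, s_0 and m_k are illustrative assumptions, and m_k (a class-size-dependent factor that the slide does not define) is simply passed in as given.

```r
# Soft thresholding: reduce |d| by delta and stop at zero.
soft.threshold <- function(d, delta) sign(d) * pmax(abs(d) - delta, 0)

# Toy standardized differences d_gk (assumed): 4 genes (rows) x 2 classes (columns).
d <- rbind(gene1 = c( 2.5, -2.5),
           gene2 = c( 0.4, -0.4),
           gene3 = c(-1.2,  1.2),
           gene4 = c( 0.1, -0.1))

d.shrunk <- soft.threshold(d, delta = 1)
d.shrunk

# Genes whose shrunken d_gk are zero for all classes drop out of the classifier.
active <- rownames(d)[rowSums(d.shrunk != 0) > 0]
active

# Shrunken class centroids from the formula on the slide
# (overall centroid, s_g, s_0 and m_k are invented values).
overall <- c(5, 6, 7, 8)
s.g <- c(1, 1, 2, 1)
s.0 <- 0.5
m.k <- c(0.3, 0.3)
shrunken.centroids <- overall + sweep(d.shrunk * (s.g + s.0), 2, m.k, `*`)
shrunken.centroids
```

For gene2 and gene4 the shrunken centroids of both classes coincide with the overall centroid, so these genes cannot influence which class is nearest.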


Shortcomings of filter and shrinkage methods

1. Highly correlated genes get similar scores but offer no new information.

But see (Jaeger et al., 2003) for a cure.

2. Filter and shrinkage methods work only on single genes.

They don’t find interactions between groups of genes.

3. Filter and shrinkage methods are only heuristics.

Exhaustive search for the best subset is infeasible for more than 30 genes.
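A quick back-of-the-envelope calculation, assuming that "search for best subset" means scoring every non-empty subset of the candidate genes, shows why this becomes infeasible so quickly. The helper name `n.subsets` is hypothetical.

```r
# Number of non-empty gene subsets for p candidate genes: 2^p - 1.
n.subsets <- function(p) 2^p - 1

n.subsets(10)   # 1023 subsets: easy
n.subsets(30)   # more than a billion subsets

# For a full array with 12000 genes the count overflows double precision,
# so look at the number of decimal digits instead.
digits.12000 <- 12000 * log10(2)
digits.12000    # roughly 3612 digits
```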


Support Vector Machines (SVM)


Which hyperplane is the best?

[Figure: four candidate separating hyperplanes, panels A, B, C and D]


No sharp knife, but a fat plane

[Figure: a FAT PLANE between samples with positive label and samples with negative label]


Separate the training set with maximal margin

[Figure: separating hyperplane and margin between samples with positive label and samples with negative label]


What are Support Vectors?

The points nearest to the separating hyperplane are called Support Vectors.

Only they determine the position of the hyperplane. All other points have no influence!

Mathematically: the weighted sum of the Support Vectors is the normal vector of the hyperplane.

[Figure: separating hyperplane and margin between samples with positive label and samples with negative label]
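The statement about the weighted sum can be checked on a toy example in base R. The four points and their weights `alpha` are assumptions: the weights were worked out by hand for this particular data set and are not the output of an SVM solver. Only two points carry non-zero weight, and these two alone fix the hyperplane.

```r
# Toy training set (assumed): four points in two dimensions with labels +1 / -1.
x <- rbind(c(2, 2), c(3, 3), c(0, 0), c(-1, 0))
y <- c(1, 1, -1, -1)

# Weights of the maximal margin solution, worked out by hand for this toy set.
# Points with weight zero are not Support Vectors.
alpha <- c(0.25, 0, 0.25, 0)

# Normal vector: weighted sum of the training points (labels give the sign).
w <- colSums(alpha * y * x)

# Offset: Support Vectors lie exactly on the margin, where y * (<w, x> + b) = 1.
sv <- which(alpha > 0)
b <- mean(y[sv] - x[sv, , drop = FALSE] %*% w)

# Decision values; the sign is the predicted label.
decision <- as.numeric(x %*% w + b)
decision
```

The two Support Vectors get decision values of exactly +1 and -1, the other two points lie beyond the margin, and removing those other points would leave `w` and `b` unchanged.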


Non-separable training sets

Use linear separation, but admit training errors.

[Figure: separating hyperplane with training errors]

Penalty of error: distance to hyperplane multiplied by error cost C.
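A minimal sketch of this penalty, assuming the standard soft-margin formulation: there the "distance" is the slack of a point, that is, how far its decision value falls short of the margin. The decision values and labels below are invented numbers, not taken from the slides.

```r
# Decision values f(x) = <w, x> + b for five training points (assumed numbers)
# and their true labels.
f <- c(2.0, 0.5, -1.5, 0.3, -0.2)
y <- c(1,   1,   -1,  -1,   1)

# Slack: zero for points on the correct side of their margin,
# otherwise the amount by which y * f(x) falls short of 1.
slack <- pmax(0, 1 - y * f)
slack

# Total penalty for a given error cost C.
penalty <- function(C) C * sum(slack)
penalty(1)
penalty(10)
```

A large C makes training errors expensive and pushes towards a narrow margin that fits the training set closely; a small C tolerates errors in exchange for a wider margin.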


The end?

The story of how to simplify your models is finished.

But for the sake of completeness:

How do we get from the simple linear Optimal Separating Hyperplane

to a full-grown Support Vector Machine?

It’s a trick, a kernel trick.


Separation may be easier in higher dimensions

[Figure: feature map and separating hyperplane; panels "complex in low dimensions" and "simple in higher dimensions"]


The kernel trick

Maximal margin hyperplanes in feature space

If classification is easier in a high-dimensional feature space, we would like to build a maximal margin hyperplane there.

The construction depends on inner products ⇒ we will have to evaluate inner products in the feature space.

This can be computationally intractable if the dimensions become too large!

Loophole: Use a kernel function that lives in low dimensions, but behaves like an inner product in high dimensions.
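The loophole can be made concrete with a small example in base R. For two-dimensional inputs, the squared inner product is a kernel whose feature space is three-dimensional and can be written down explicitly. The feature map `phi` and the example vectors are illustrative choices, not taken from the slides.

```r
# Explicit feature map into three dimensions (one possible choice).
phi <- function(p) c(p[1]^2, sqrt(2) * p[1] * p[2], p[2]^2)

# Kernel evaluated in the original two dimensions.
kernel <- function(p, q) sum(p * q)^2

p <- c(1, 2)
q <- c(3, -1)

# Inner product computed in feature space ...
sum(phi(p) * phi(q))

# ... and the same number obtained without ever leaving the input space.
kernel(p, q)
```

Here the saving is small, but for a polynomial of degree d on thousands of genes the explicit feature space is enormous, while the kernel still costs a single inner product.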


Kernel functions

Expression profiles $p = (p_1, p_2, \dots, p_g) \in \mathbb{R}^g$ and $q = (q_1, q_2, \dots, q_g) \in \mathbb{R}^g$.

Similarity in gene space: INNER PRODUCT

$$\langle p, q \rangle = p_1 q_1 + p_2 q_2 + \dots + p_g q_g$$

Similarity in feature space: KERNEL FUNCTION

$K(p, q)$ = polynomial, radial basis, ...


Examples of Kernels

linear: $K(p, q) = \langle p, q \rangle$

polynomial: $K(p, q) = (\gamma \langle p, q \rangle + c_0)^d$

radial basis function: $K(p, q) = \exp(-\gamma \lVert p - q \rVert^2)$
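The three kernels translate directly into base R. The function names, the default parameter values and the example profiles are assumptions for illustration; the last lines build the matrix of all pairwise kernel values, which is the only thing the hyperplane construction needs to know about the data.

```r
linear.kernel <- function(p, q) sum(p * q)

polynomial.kernel <- function(p, q, gamma = 1, c0 = 0, d = 2) {
  (gamma * sum(p * q) + c0)^d
}

rbf.kernel <- function(p, q, gamma = 1) exp(-gamma * sum((p - q)^2))

# Two toy expression profiles on three genes (assumed values).
p <- c(1, 2, 3)
q <- c(1, 0, 3)

linear.kernel(p, q)
polynomial.kernel(p, q, d = 2)
rbf.kernel(p, q, gamma = 0.5)

# Kernel matrix for a small set of profiles (rows of x).
x <- rbind(p, q, c(0, 0, 0))
n <- nrow(x)
K <- outer(1:n, 1:n, Vectorize(function(i, j) rbf.kernel(x[i, ], x[j, ])))
K
```

The radial basis kernel of a profile with itself is always 1, and it decays towards 0 as two profiles move apart, at a rate set by γ.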


Why is it a trick?

We do not need to know what the feature space really looks like; we just need the kernel function as a measure of similarity.

This is kind of black magic: we do not know what happens inside the kernel, we just get the output.

Still, we have the geometric interpretation of the maximal margin hyperplane, so SVMs are more transparent than e.g. Artificial Neural Networks.


The kernel trick: summary

Non-linear separation between vectors in gene space using kernel functions

=

Linear separation between vectors in feature space using inner product

[Figure: separating hyperplane]


Support Vector Machines

A Support Vector Machine is

a maximal margin hyperplane in feature space

built by using a kernel function in gene space.


Parameters of SVM

Kernel parameters:

• γ: width of rbf; coeff. in polynomial (= 1)

• d: degree of polynomial

• c0: additive constant in polynomial (= 0)

Error weight:

• C: influence of training errors


SVM@work: low complexity

Figure taken from Schölkopf and Smola, Learning with Kernels, MIT Press 2002, p. 217


SVM@work: medium complexity

Figure taken from Schölkopf and Smola, Learning with Kernels, MIT Press 2002, p. 217


SVM@work: high complexity

Figure taken from Schölkopf and Smola, Learning with Kernels, MIT Press 2002, p. 217


References

1. Trevor Hastie, Robert Tibshirani, Jerome Friedman. The Elements of Statistical Learning. Springer, 2001.

2. Bernhard Schölkopf and Alex Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2002.

3. Robert Tibshirani, Trevor Hastie, Balasubramanian Narasimhan, Gilbert Chu. Diagnosis of multiple cancer types by shrunken centroids of gene expression. PNAS, 99(10), 6567–6572, 2002.

4. Jochen Jaeger, R. Sengupta and W.L. Ruzzo. Improved Gene Selection for Classification of Microarrays. Proc. PSB, 2003.


Introduction to the practical session


Computational Diagnosis

TASK:

For 3 new patients in your hospital, decide whether they have a

chromosomal translocation resulting in a BCR/ABL fusion gene or

not.

IDEA:

Learn the difference between the cancer types from an archive of 76

expression profiles, which were analyzed and classified by an expert.


Training ... tuning ... testing

TRAINING:

# schematic call: the quoted strings are placeholders, not real argument values
model <- svm(data = "76 profiles", labels = "by an expert", kernel = "..", parameters = "..")

TUNING:

svm.doctor <- tune.svm(data, labels, different.parameter.values)

TESTING:

diagnosis <- predict(svm.doctor, new.patients)


Training ... tuning ... testing

TRAINING:

model <- pamr.train(data, labels)

TUNING:

pamr.cv(data, labels)

TESTING:

diagnosis <- pamr.predict(new.patients, best.threshold)
