
Class discrimination for microarray studies

Vlad Popovici

Swiss Institute of Bioinformatics

February 5th, 2008


Outline

1 Introduction

2 Discriminant analysis

3 Performance assessment

4 Estimating the performance parameters


Introduction

Example: ER status prediction

Questions:

How to decide which patient is ER+ and which is ER-?

What is the expected error?

What if I prefer to detect most of the ER+ cases?


Introduction

Know your problem!

Remember: good study ←→ clear objectives.

Problems:

Class Comparison: find genes differentially expressed between predefined classes;

Class Prediction: predict one of the predefined classes using the gene expressions;

Class Discovery: cluster analysis – define new classes using clusters of genes/specimens.


Introduction

Class prediction

Typical applications:

predict treatment response

predict patient relapse

predict the phenotype

toxicogenomics: predict which chemicals are toxic

. . .

Characteristics:

supervised learning: requires labelled training data

the goal is prediction accuracy

uses some measure of similarity

relies on feature selection

quite often incorrectly used


Introduction

Usage problems:

improper methodological approach:
- a well-fitted model does not ensure good prediction (overfitted model)
- too many features used in the model (curse of dimensionality)
- feature selection on the full dataset (!)

reproducibility:
- improper/insufficient validation
- batch effects unaccounted for
- insufficiently documented

therapeutic relevance


Introduction

External validation

The modeling workflow (figure): data acquisition → design decisions (feature selection method(s), classifier(s), performance criterion) → model construction (feature selection, classifier design, performance estimation, model selection) → external validation.

Data acquisition: everything up to (and including) normalization

Design decisions: should be made before the actual modeling

Model design: DO NOT USE ALL DATA AT ONCE!!

External validation: other datasets; clinical trials: phase II and III


Discriminant analysis

Discriminant analysis

Goal: find a separation boundary between the classes.

(Figure: two classes separated by a boundary f(x) = 0, with f(x) > 0 on one side and f(x) < 0 on the other.)


Discriminant analysis

Representing data

(Figure: the expression matrix — p features (probesets such as 1007_s_at, 117_at, 1053_at, 211585_at, 211584_s_at) in rows, n samples (Tumor 1, ..., Tumor n) in columns.)

each element to be classified is a vector

usually we classify tumors/samples/patients → use the columns

p ≫ n


Discriminant analysis

Formalism:

Data points: X = {x_i ∈ R^p | i = 1, ..., n}

x could be the log-ratios or the log-signals

Labels: Y = {y_i ∈ {ω_1, ..., ω_k} | i = 1, ..., n} (k classes); e.g. ω_1 = pCR and ω_2 = non-pCR

Easier: take y_i ∈ {1, 2, ..., k}, or y_i ∈ {−1, +1} for two classes

2-class problem (dichotomy): given a finite set of points X and their corresponding labels Y (a training set), find a discriminant function f such that

f(x) > 0 for x ∈ ω_1,
f(x) < 0 for x ∈ ω_2,

for all x.


Discriminant analysis

Assumptions:

the training set is representative of the whole population

the characteristics of the population do not change over time (e.g. same experimental conditions)

Comments:

for all x: i.e. infer a rule that works for unseen data – generalization

perfect classification of the training data does not ensure generalization; e.g. the lookup rule f(x_i) = y_i will hardly work on new data

as stated, the problem is ill-posed: there are infinitely many solutions

real data is noisy: usually there is no perfect solution


Discriminant analysis

Linear discriminants

With x = (x_1, ..., x_p)^t and w = (w_1, ..., w_p)^t:

Linear discriminant functions

f(x) = w^t x + w_0 = Σ_{i=1}^{p} w_i x_i + w_0,   with w, x ∈ R^p, w_0 ∈ R

New problem: optimize some criterion

(w*, w_0*) = arg max_{w, w_0} J(X, Y; w, w_0)
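To make the rule concrete, here is a minimal NumPy sketch (toy numbers, purely illustrative) that classifies a sample by the sign of f(x):

    import numpy as np

    def linear_discriminant(x, w, w0):
        # f(x) = w^t x + w_0; the sign of f(x) gives the predicted class
        return w @ x + w0

    # hypothetical weights and sample for p = 3 features
    w = np.array([0.8, -1.2, 0.5])
    w0 = 0.1
    x = np.array([1.0, 0.3, -0.7])
    predicted = 1 if linear_discriminant(x, w, w0) > 0 else 2  # omega_1 if f(x) > 0
    print(predicted)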


Discriminant analysis

Geometry of the linear discriminants

(Figure: the weight vector w is orthogonal to the separating hyperplane, while w_0 sets its offset from the origin; a point x is classified according to the side of the hyperplane it falls on.)


Discriminant analysis

Fisher's LDA

Fisher's criterion

J(w) = (w^t S_B w) / (w^t S_W w)

where

S_B = (µ_1 − µ_2)(µ_1 − µ_2)^t is the between-class scatter matrix (µ_i is the average of class i)

S_W = (1/(n − 2)) (n_1 Σ_1 + n_2 Σ_2) is the pooled within-class covariance matrix (Σ_i is the estimated covariance of class i)


Discriminant analysis

Fisher's criterion (in plain English): find the direction w along which the two classes are best separated – in some sense.

Solution:

w = S_W^{-1} (µ_1 − µ_2)

w_0 = ?
- assuming the data is normal with equal covariances: closed-form formula
- alternative: estimate w_0 by line search
- w_0 can be used to embed prior probabilities in the classifier

(Figure: w and w_0 define the separating hyperplane.)
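A minimal NumPy sketch of this solution, under the assumptions above (two classes, pooled covariance, equal priors; variable names are illustrative):

    import numpy as np

    def fisher_lda(X1, X2):
        # Fisher's direction w = S_W^{-1} (mu_1 - mu_2); rows of X1/X2 are samples
        n1, n2 = X1.shape[0], X2.shape[0]
        mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
        S1 = np.cov(X1, rowvar=False, bias=True)   # ML estimate of Sigma_1
        S2 = np.cov(X2, rowvar=False, bias=True)   # ML estimate of Sigma_2
        SW = (n1 * S1 + n2 * S2) / (n1 + n2 - 2)   # pooled within-class covariance
        w = np.linalg.solve(SW, mu1 - mu2)
        w0 = -0.5 * w @ (mu1 + mu2)                # threshold for equal priors
        return w, w0

With equal priors, a sample x is then assigned to ω_1 when w @ x + w0 > 0.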


Discriminant analysis

Versions of Fisher’s DA

Under the normality assumption:
- if the covariance matrices are equal and the features are uncorrelated: Diagonal LDA (the covariance matrices are diagonal);
- if the covariance matrices are not equal: Quadratic DA.


(Figure: the variants of Fisher's DA, from Duda, Hart & Stork, Pattern Classification.)


Discriminant analysis

A Bayesian perspective

2 classes, ω_1, ω_2

one continuous feature (e.g. log-expression)

p(x|ω_i): class-conditional distribution

from Bayes' formula:

p(ω_i|x) = p(x|ω_i) p(ω_i) / p(x)

posterior = (likelihood × prior) / evidence

optimal decision (Bayes decision rule): decide x ∈ ω_1 if p(ω_1|x) > p(ω_2|x), otherwise decide x ∈ ω_2

p(ω_i) = ?  p(x|ω_i) = ?


Discriminant analysis

Comments:

taking the p(x|ω_i) to be normal densities ⇒ the Bayes decision boundary is one of the previous DA forms

this hypothesis also allows estimating the posteriors ⇒ we can have a confidence measure on the decision

estimating p(x|ω_i) from the data ⇒ Naïve Bayes classifier
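As a small illustration, a sketch of the Bayes rule for one feature, assuming Gaussian class-conditionals with made-up parameters:

    from scipy.stats import norm

    # hypothetical priors and class-conditionals for one log-expression feature
    prior = {1: 0.3, 2: 0.7}
    cond = {1: norm(loc=2.0, scale=0.8),   # p(x | omega_1)
            2: norm(loc=0.5, scale=1.1)}   # p(x | omega_2)

    def bayes_decision(x):
        # decide omega_1 iff p(omega_1|x) > p(omega_2|x); the evidence p(x) cancels
        return 1 if cond[1].pdf(x) * prior[1] > cond[2].pdf(x) * prior[2] else 2

    print(bayes_decision(1.4))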


Discriminant analysis

Support Vector Machines (SVM)

separation of the classes is measured in terms of the margin

rigorous mathematical framework (structural risk minimization)

linear discriminant (optimal hyperplane)

(Figure: the maximal-margin hyperplane, with the margin between the hyperplane and the closest training points; points on the wrong side contribute training errors.)


Discriminant analysis

Principle of SVM: maximize the margin while minimizing the training error.

Optimization problem: find the hyperplane such that

1/margin² + cost · Σ_i error_i

is minimized.
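In practice this is what, e.g., scikit-learn's SVC solves; a sketch on toy data, where the cost above corresponds to the C parameter:

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 100))     # toy data: n = 40 samples, p = 100 features
    y = np.repeat([0, 1], 20)

    clf = SVC(kernel="linear", C=1.0)  # C is the cost of training errors
    clf.fit(X, y)
    print(clf.predict(X[:5]))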


Discriminant analysis

Kernel trick

Q: What if the data is not linearly separable?
A: Transform the data such that it becomes (more or less) linearly separable.

Kernels: polynomial, radial basis function, etc.
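For instance, switching the SVC sketch above to a radial basis function kernel is a one-line change:

    clf = SVC(kernel="rbf", gamma="scale", C=1.0)  # implicit nonlinear mapping via the RBF kernel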


Discriminant analysis

Other commonly used classifiers

nearest centroid, k−NN

logistic regression

classification trees (CART, C4.5); random forests

compound covariate predictor


Discriminant analysis

Final remarks on classifiers

try to understand the classifier: assumptions, strong/weak points

feature selection should be related to the classifier used (e.g. the t-test is suited for LDA but not for SVM)

complex classifiers generally need more data for training

use the simplest classifier that gives good results...

...but not a simpler one!


Performance assessment

Aspects of classifiers’ performance

discriminability: how well the rule predicts unseen data

reliability: robustness of the prediction, or how well the posterior probabilities are estimated


Performance assessment

Counting errors

basic: how many times the predicted class differs from the true class (0-1 loss)

cost-aware methods: some errors may be more costly

root mean square error (RMSE): requires y_i ∈ {0, 1} and that the classifier outputs posterior probabilities p_i:

RMSE = √( (1/n) Σ_i (y_i − p_i)² )

mean absolute error (MAE): same requirements as RMSE:

MAE = (1/n) Σ_i |y_i − p_i|
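In code (NumPy; a direct transcription of the two formulas):

    import numpy as np

    def rmse(y, p):
        # y: 0/1 labels, p: predicted posterior probabilities
        return np.sqrt(np.mean((y - p) ** 2))

    def mae(y, p):
        return np.mean(np.abs(y - p))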


Performance assessment

Confusion matrix

(for 2 classes, called positive and negative, respectively)

                         Ground truth (gold standard)
                         Positive    Negative
Predicted    Positive    a           b
             Negative    c           d

a + b + c + d = n

ideally: b = c = 0

a: true positives, b: false positives, c: false negatives, d: true negatives


Performance assessment

Error/performance metrics

These are point estimates:

apparent positives = a + c, apparent negatives = b + d

false positive rate FPR = b/(b + d), false negative rate FNR = c/(a + c)

accuracy ACC = (a + d)/n

misclassification rate Err = (b + c)/n

sensitivity SN = a/(a + c) = true positive rate

specificity SP = d/(b + d) = true negative rate

positive predictive value PPV = a/(a + b)

negative predictive value NPV = d/(c + d)
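A small helper computing these point estimates from the four counts (a sketch; names follow the table above):

    def confusion_metrics(a, b, c, d):
        # a = TP, b = FP, c = FN, d = TN, as in the confusion matrix
        n = a + b + c + d
        return {
            "ACC": (a + d) / n,
            "Err": (b + c) / n,
            "SN": a / (a + c),    # sensitivity = TPR
            "SP": d / (b + d),    # specificity = TNR
            "FPR": b / (b + d),
            "FNR": c / (a + c),
            "PPV": a / (a + b),
            "NPV": d / (c + d),
        }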


Performance assessment

ROC and AUC

What if we would like to see the effect of trading off the true positive rate against the false positive rate (SN and 1 − SP)?


vary the threshold

for each value of the threshold, measure TPR and FPR

(Figure: the ROC curve plots the true positive rate (SN) against the false positive rate (1 − SP) as the threshold varies.)

ROC = Receiver Operating Characteristic
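With scikit-learn, the curve and its summary can be obtained from any continuous classifier output; a sketch with made-up scores:

    import numpy as np
    from sklearn.metrics import roc_curve, roc_auc_score

    y_true = np.array([0, 0, 1, 1, 0, 1])               # hypothetical labels
    scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])  # continuous classifier output
    fpr, tpr, thr = roc_curve(y_true, scores)           # one (FPR, TPR) point per threshold
    print(roc_auc_score(y_true, scores))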


Performance assessment

ROC and AUC

Different types of ROC curves:

(Figure: several ROC curves, R0, R1, R2, compared against the diagonal of a random classifier.)


Performance assessment

ROC and AUC

ROC curves can be used for comparing classifiers...

(Figure: two ROC curves, R1 and R2, one dominating the other.)


Performance assessment

ROC and AUC

ROC curves can be used for comparing classifiers... but not always.

(Figure: two crossing ROC curves, R1 and R2 – neither dominates over the whole range.)


Performance assessment

ROC and AUC

The Area Under the Curve (AUC) is a summary of the ROC...

(Figure: the area under the ROC curve, shaded.)

...and can be used for comparing classifiers.


Estimating the performance parameters

Estimation schemes

Why estimation? Generally there is no analytic form of the error rate.

We discuss the error rate, but this applies to any performance metric.

Simple schemes:

apparent error: estimate on the training set

holdout estimate: a single train/test split

Resampling schemes:

k–fold cross–validation

repeated k–fold cross–validation

leave–one–out

bootstrapping


Estimating the performance parameters

k–fold cross–validation

separate train and test sets

randomly divide the data into k subsets (folds) – you may also choose to preserve the class proportions in each fold (stratified CV)

train on k − 1 folds and test on the held-out fold

estimate the error as the average error measured on the held-out folds

(Figure: the data split into k folds; in each round one fold is the test set and the remaining folds form the train set.)

usually k = 5 or k = 10

if k = n ⇒ leave-one-out estimator

improved estimation: repeated k-CV (e.g. 100 × (5-CV))
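A sketch of stratified k-fold CV with scikit-learn (toy data; LDA as the classifier):

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.model_selection import StratifiedKFold

    rng = np.random.default_rng(1)
    X = rng.normal(size=(60, 30))   # toy data: 60 samples, 30 features
    y = np.repeat([0, 1], 30)

    errors = []
    for tr, ts in StratifiedKFold(n_splits=5, shuffle=True, random_state=0).split(X, y):
        clf = LinearDiscriminantAnalysis().fit(X[tr], y[tr])
        errors.append(np.mean(clf.predict(X[ts]) != y[ts]))

    print(np.mean(errors))   # E_{k-CV}: average error over the held-out folds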


Estimating the performance parameters

k-fold cross-validation

From the k folds:

ε_1, ..., ε_k: the errors on the test folds

E_{k-CV} = (1/k) Σ_{j=1}^{k} ε_j

estimated standard deviation

Confidence intervals (simple version – binomial approximation):

E ± ( 0.5/n + z √( E(1 − E)/n ) )

where n is the dataset size and z = Φ^{-1}(1 − α/2), for a 1 − α confidence interval (e.g. z = 1.96 for a 95% confidence interval)
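Numerically (SciPy supplies Φ^{-1}; the numbers in the call are hypothetical):

    from scipy.stats import norm

    def cv_error_ci(E, n, alpha=0.05):
        # binomial-approximation CI with the 0.5/n continuity correction
        z = norm.ppf(1 - alpha / 2)            # z = Phi^{-1}(1 - alpha/2)
        half = 0.5 / n + z * (E * (1 - E) / n) ** 0.5
        return max(0.0, E - half), min(1.0, E + half)

    print(cv_error_ci(0.10, 100))   # hypothetical: E = 0.10 on n = 100 samples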


Estimating the performance parameters

Bootstrap error estimation

(Figure: from the original dataset (X, Y), draw bootstrap datasets (X_1, Y_1), (X_2, Y_2), ..., (X_B, Y_B); each yields an error estimate E_1, E_2, ..., E_B.)

1. generate a new dataset (X_b, Y_b) by resampling with replacement from the original dataset (X, Y)

2. train the classifier on (X_b, Y_b) and test on the left-out data, to obtain an error E_b

3. repeat 1-2 for b = 1, ..., B and collect the E_b


Estimating the performance parameters

Bootstrap error estimation

estimate the error: for example, use the .632 estimator

E = 0.368 E_0 + 0.632 · (1/B) Σ_{b=1}^{B} E_b

where E_0 is the error rate on the full training set (X, Y).

use the empirical distribution of the E_b to obtain confidence intervals
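A sketch of the .632 estimator (NumPy; LDA stands in for an arbitrary classifier with fit/predict):

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    def bootstrap_632(X, y, B=100, seed=0):
        rng = np.random.default_rng(seed)
        n = len(y)
        # E0: apparent error (train and test on the full dataset)
        E0 = np.mean(LinearDiscriminantAnalysis().fit(X, y).predict(X) != y)
        Eb = []
        for _ in range(B):
            idx = rng.integers(0, n, size=n)        # resample with replacement
            oob = np.setdiff1d(np.arange(n), idx)   # left-out samples
            if oob.size == 0 or len(np.unique(y[idx])) < 2:
                continue                            # skip degenerate resamples
            clf = LinearDiscriminantAnalysis().fit(X[idx], y[idx])
            Eb.append(np.mean(clf.predict(X[oob]) != y[oob]))
        return 0.368 * E0 + 0.632 * np.mean(Eb)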


Estimating the performance parameters

Case study (1)

Build a classifier to predict ER status:

select the top 2 probesets using the ratio of between-groups to within-groups sum of squares (similar to a t-test)

use LDA to discriminate between ER- and ER+

Algorithm

Let (X, Y) be the full dataset. For each fold k = 1, ..., K:

let the training set be X_tr = X \ X^(k), Y_tr = Y \ Y^(k) (all but the k-th fold), and let the test set be X_ts = X^(k), Y_ts = Y^(k)

select the best 2 probesets on (X_tr, Y_tr) and train LDA on these data

test on (X_ts, Y_ts) and record the values for the error, AUC, ...

Compute the expected error, AUC, ...
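A sketch of this recipe with scikit-learn (placeholder data). The probeset selection is re-run inside every training fold, which avoids the selection bias discussed in the introduction; the BW sum-of-squares filter is approximated here by the F-score filter f_classif, which gives an equivalent ranking for two classes:

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.model_selection import StratifiedKFold, cross_val_score
    from sklearn.pipeline import make_pipeline

    rng = np.random.default_rng(2)
    X = rng.normal(size=(80, 1000))   # placeholder expression matrix
    y = rng.integers(0, 2, size=80)   # placeholder ER labels

    model = make_pipeline(SelectKBest(f_classif, k=2), LinearDiscriminantAnalysis())
    acc = cross_val_score(model, X, y, cv=StratifiedKFold(5, shuffle=True, random_state=0))
    print(1 - acc.mean())             # expected error over the folds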


Estimating the performance parameters

Case study (1)

CV scheme       Error    95% CI for Error     AUC
5-CV            0.0692   0.0073-0.1311        0.966
10-CV           0.0615   0.0026-0.1204        0.970
20 × (5-CV)     0.0615   (0.0538-0.0692)*     0.969

* from the quantiles of the empirical distribution of the error


Estimating the performance parameters

Case study (2)

Build a classifier to predict patients with pathologic complete response (pCR):

find the best number of probesets to include in the model, using the ratio of between-groups to within-groups sum of squares (similar to a t-test)

use LDA to discriminate between patients with pCR and those without

the best number of probesets is a meta-parameter, which has to be estimated within a cross-validation

problem: if we estimate the meta-parameter on the full training set, we will likely fail to correctly estimate the performance (optimistic bias)


Estimating the performance parameters

Case study (2)

Two external (nested) CV loops:

(Figure: an outer CV loop splits the data into train and test sets; within each outer training set, an inner CV loop is used to choose the meta-parameter.)
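A sketch of nested CV with scikit-learn (the inner loop selects k, the number of probesets; data are placeholders):

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
    from sklearn.pipeline import Pipeline

    rng = np.random.default_rng(3)
    X = rng.normal(size=(80, 500))    # placeholder data
    y = rng.integers(0, 2, size=80)   # placeholder pCR labels

    pipe = Pipeline([("sel", SelectKBest(f_classif)), ("lda", LinearDiscriminantAnalysis())])
    inner = GridSearchCV(pipe, {"sel__k": [2, 5, 10, 20, 50]},
                         cv=StratifiedKFold(5))   # inner loop: choose the meta-parameter
    acc = cross_val_score(inner, X, y,
                          cv=StratifiedKFold(5))  # outer loop: honest performance estimate
    print(1 - acc.mean())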


Estimating the performance parameters

What we do learn from CV:

the expected performance of the modeling recipe;

the imprecision in estimating the performance.

We can also have a look at:
- which features are the most stable
- which points are always misclassified

What we do not learn from CV:

the best features

the best classifier

the best meta-parameters

We obtain these by training on the full dataset (no CV).


Bibliography

Bibliography

Duda, Hart, Stork: Pattern Classification

Hastie, Tibshirani, Friedman: The Elements of Statistical Learning

T. Fawcett: ROC Graphs: Notes and Practical Considerations for Data Mining Researchers, HP Laboratories Tech. Rep. HPL-2003-4

A. Webb: Statistical Pattern Recognition

I. Shmulevich, E. Dougherty: Genomic Signal Processing
