TRANSCRIPT
Class discrimination for microarray studies
Vlad Popovici
Swiss Institute of Bioinformatics
February 5th, 2008
Vlad Popovici (SIB) Class discrimination for microarray studies February 5th, 2008 1 / 45
Outline
1 Introduction
2 Discriminant analysis
3 Performance assessment
4 Estimating the performance parameters
Introduction
Example: ER status prediction
Questions:
How to decide which patient is ER+ and which is ER-?
What is the expected error?
What if I prefer to detect most of ER+?
Introduction
Know your problem!
Remember: good study ←→ clear objectives.
Problems:
Class Comparison: find genes differentially expressed between predefined classes;
Class Prediction: predict one of the predefined classes using the gene expressions;
Class Discovery: cluster analysis – define new classes using clusters of genes/specimens.
Introduction
Class prediction
Typical applications:
predict treatment response
predict patient relapse
predict the phenotype
toxico–genomics: predict which chemicals are toxic
. . .
Characteristics:
supervised learning: requires labelled training data
the goal is prediction accuracy
uses some measure of similarity
relies on feature selection
quite often incorrectly used
Introduction
Usage problems:
improper methodological approach:
− a well-fitted model does not ensure good prediction (overfitted model)
− too many features used in the model (curse of dimensionality)
− feature selection on the full dataset(!)
reproducibility:
− improper/insufficient validation
− batch effects unaccounted for
− insufficiently documented
therapeutic relevance
Introduction
External validation
[Workflow: Data acquisition → Design decisions (feature selection method(s), classifier(s), performance criterion) → Model construction (feature selection, classifier design, performance estimation, model selection) → External validation]
Data acquisition: everything up to (and including) normalization
Design decisions: should be taken before real modeling
Model design: DO NOT USE ALL DATA AT ONCE!!
External validation: other datasets; clinical trials: phase II and III
Discriminant analysis
Discriminant analysis
Goal: Find a separation boundary between the classes.
[Figure: decision boundary f(x) = 0 separating the regions f(x) > 0 and f(x) < 0]
Discriminant analysis
Representing data
[Figure: expression matrix of p features (probesets, e.g. 1007_s_at, 117_at, 1053_at, 211584_s_at, 211585_at) × n samples (Tumor 1, . . . , Tumor n)]
each element to be classified is a vector
usually we classify tumors/samples/patients → use columns
p ≫ n
Discriminant analysis
Formalism:
Data points: X = {xi ∈ R^p | i = 1, . . . , n}
x could be the log ratios or log signals
Labels: Y = {yi ∈ {ω1, . . . , ωk} | i = 1, . . . , n} (k classes); e.g. ω1 = pCR and ω2 = non-pCR
Easier: take yi ∈ {1, 2, . . . , k} or yi ∈ {−1, +1} (for two classes)
2-class problem (dichotomy): Given a finite set of points X and their corresponding labels Y (a training set), find a discriminant function f such that
f(x) > 0 for x ∈ ω1 and f(x) < 0 for x ∈ ω2, for all x.
Discriminant analysis
Assumptions:
the training set is representative of the whole population
the characteristics of the population do not change over time (e.g. same experimental conditions)
Comments:
for all x: i.e. infer a rule that works for unseen data – generalization
perfect classification of the training data does not ensure generalization; e.g. f(xi) = yi will hardly work on new data
as stated, the problem is ill-posed: there is an infinity of solutions
real data is noisy: usually there is no perfect solution
Discriminant analysis
Linear discriminants
With x = (x1, . . . , xn)^t, w = (w1, . . . , wn)^t,
Linear discriminant functions
f(x) = w^t x + w0 = ∑_{i=1}^{n} wi xi + w0,  w, x ∈ R^n, w0 ∈ R
New problem: optimize some criterion
(w*, w0*) = arg max_{w, w0} J(X, Y; w, w0)
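As a minimal sketch (not from the slides), a linear discriminant and its sign-based decision rule look like this in Python; the weights `w` and offset `w0` are illustrative values, which in practice come from optimizing a criterion J:

```python
import numpy as np

# Illustrative weights and offset; in practice they are obtained by
# optimizing a criterion J (e.g. Fisher's LDA, next slides).
w = np.array([1.0, -2.0])
w0 = 0.5

def f(x):
    # linear discriminant: f(x) = w^t x + w0
    return float(w @ x + w0)

def predict(x):
    # decide omega1 if f(x) > 0, otherwise omega2
    return "omega1" if f(x) > 0 else "omega2"
```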
Discriminant analysis
Geometry of the linear discriminants
[Figure: separating hyperplane with normal vector w and offset w0; w is orthogonal to the boundary and w0 shifts it away from the origin]
Discriminant analysis
Fisher’s LDA
Fisher’s criterion
J(w) = (w^t S_B w) / (w^t S_W w)
where
S_B = (µ1 − µ2)(µ1 − µ2)^t is the between-class scatter matrix (µi is the average of class i)
S_W = (1/(n − 2)) (n1 Σ1 + n2 Σ2) is the pooled within-class covariance matrix (Σi is the estimated covariance of class i)
Discriminant analysis
Fisher’s criterion (in plain English): Find the direction w along which the two classes are best separated – in some sense.
Solution:
w = S_W^{-1} (µ1 − µ2)
w0 = ??
− assuming data is normal with equal covariances: closed-form formula
− alternative: estimate w0 by line search
− can be used to embed prior probabilities in the classifier
[Figure: projection direction w and threshold w0]
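The solution above can be sketched in NumPy; `fisher_lda` is an illustrative name, and the midpoint threshold is just one of the w0 choices the slide mentions (it assumes normal classes with equal covariances and equal priors):

```python
import numpy as np

def fisher_lda(X1, X2):
    """Fisher's LDA direction w = S_W^{-1}(mu1 - mu2); a minimal sketch.

    X1, X2: arrays of shape (n_i, p) holding the samples of each class.
    The threshold w0 uses the midpoint rule (equal covariances, equal
    priors); line search is the alternative mentioned in the slides.
    """
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    n1, n2 = len(X1), len(X2)
    S1 = np.cov(X1, rowvar=False, bias=True)   # covariance of class 1
    S2 = np.cov(X2, rowvar=False, bias=True)   # covariance of class 2
    Sw = (n1 * S1 + n2 * S2) / (n1 + n2 - 2)   # pooled within-class covariance
    w = np.linalg.solve(Sw, mu1 - mu2)         # S_W^{-1}(mu1 - mu2)
    w0 = -0.5 * w @ (mu1 + mu2)                # midpoint threshold
    return w, w0

# classify by the sign of f(x) = w^t x + w0: positive -> class 1
```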
Discriminant analysis
Versions of Fisher’s DA
Under the normality assumption:
if the covariance matrices are equal and the features are uncorrelated: Diagonal LDA (the covariance matrices are diagonal);
if the covariance matrices are not equal: Quadratic DA
Discriminant analysis
Versions of Fisher’s DA
[Figure: decision boundaries of the DA variants, from Duda, Hart & Stork, Pattern Classification]
Discriminant analysis
A Bayesian perspective
2 classes, ω1, ω2
one continuous feature (e.g. log expression)
p(x|ωi): class-conditional distribution
from Bayes’ formula:
p(ωi|x) = p(x|ωi) p(ωi) / p(x)
posterior = likelihood × prior / evidence
optimal decision (Bayes decision rule): decide x ∈ ω1 if p(ω1|x) > p(ω2|x), otherwise decide x ∈ ω2
p(ωi) = ?  p(x|ωi) = ?
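A sketch of the Bayes decision rule for one continuous feature and two classes; the Gaussian class-conditional densities and all parameter values are illustrative assumptions, not from the slides:

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    # Gaussian density, used here as an (assumed) model for p(x | omega_i)
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

def bayes_decide(x, prior1, mu1, sigma1, mu2, sigma2):
    """Decide between omega1 and omega2 by comparing posteriors.
    Returns the decision and p(omega1 | x) as a confidence measure."""
    prior2 = 1.0 - prior1
    evidence = (normal_pdf(x, mu1, sigma1) * prior1
                + normal_pdf(x, mu2, sigma2) * prior2)
    post1 = normal_pdf(x, mu1, sigma1) * prior1 / evidence  # p(omega1 | x)
    return ("omega1" if post1 > 0.5 else "omega2"), post1
```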
Discriminant analysis
Comments:
taking p(x|ωi) to be normal densities ⇒ the Bayes decision boundary is one of the previous DA forms
this hypothesis also allows estimating the posteriors ⇒ we can have a confidence measure on the decision
estimating p(x|ωi) from data ⇒ Naïve Bayes classifier
Discriminant analysis
Support Vector Machines (SVM)
separation of the classes is measured in terms of margin
rigorous mathematical framework (structural risk minimization)
linear discriminant (optimal hyperplane)
[Figure: maximum-margin hyperplane; points inside the margin or on the wrong side count as errors]
Discriminant analysis
Principle of SVM: Maximize the margin while minimizing the training error.
Optimization problem: find the hyperplane such that
1/margin² + cost · ∑_i error_i
is minimized.
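A rough sketch of this trade-off by subgradient descent on the hinge loss (minimizing ||w||², which is proportional to 1/margin², plus cost times the sum of margin violations); this is a toy illustration, not a production SVM solver, and the learning-rate/epoch settings are arbitrary assumptions:

```python
import numpy as np

def linear_svm(X, y, cost=1.0, lr=0.01, epochs=500):
    """Soft-margin linear SVM via subgradient descent (a sketch).
    Minimizes ||w||^2 + cost * sum of hinge losses, i.e. (1/margin^2)
    plus the weighted training error of the slide.
    X: (n, p) data matrix; y: labels in {-1, +1}."""
    n, p = X.shape
    w, w0 = np.zeros(p), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + w0)
        viol = margins < 1                     # inside margin or misclassified
        gw = 2 * w - cost * (y[viol, None] * X[viol]).sum(axis=0)
        g0 = -cost * y[viol].sum()
        w, w0 = w - lr * gw, w0 - lr * g0
    return w, w0
```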
Discriminant analysis
Kernel trick
Q: What if data is not linearly separable?
A: Transform data such that it becomes (more or less) linearly separable.
Kernels: polynomial, radial basis function, etc.
Discriminant analysis
Other commonly used classifiers
nearest centroid, k−NN
logistic regression
classification trees (CART, C4.5); random forests
compound covariate predictor
Discriminant analysis
Final remarks on classifiers
try to understand the classifier: assumptions, strong/weak points
feature selection should be related to the classifier used (e.g. the t-test is suited for LDA but not for SVM)
complex classifiers generally need more data for training
use the simplest classifier that gives good results...
...but not a simpler one!
Performance assessment
Aspects of classifiers’ performance
discriminability: how well the rule predicts unseen data
reliability: robustness of the prediction, or how well the posterior probabilities are estimated
Performance assessment
Counting errors
basic: how many times the predicted class differs from the true class (0–1 loss)
cost-aware methods: some errors may be more costly
root mean square error (RMSE): requires yi ∈ {0, 1} and that the classifier outputs posterior probabilities pi:
RMSE = sqrt( (1/n) ∑_i (yi − pi)² )
mean absolute error (MAE): same requirements as RMSE:
MAE = (1/n) ∑_i |yi − pi|
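The two formulas translate directly to NumPy; a small sketch with illustrative function names:

```python
import numpy as np

def rmse(y, p):
    # y: true labels in {0, 1}; p: predicted posterior probabilities
    y, p = np.asarray(y, float), np.asarray(p, float)
    return float(np.sqrt(np.mean((y - p) ** 2)))

def mae(y, p):
    # mean absolute error, same inputs as rmse
    y, p = np.asarray(y, float), np.asarray(p, float)
    return float(np.mean(np.abs(y - p)))
```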
Performance assessment
Confusion matrix
(for 2 classes, called positive and negative, respectively)

                        Ground truth (gold standard)
                        Positive    Negative
Predicted   Positive    a           b
            Negative    c           d

a + b + c + d = n
ideally: b = c = 0
a: true positive, b: false positive, c: false negative, d: true negative
Performance assessment
Error/performance metrics
These are point estimates:
apparent positives = a + c, apparent negatives = b + d
false positive rate FPR = b/(b + d), false negative rate FNR = c/(a + c)
accuracy ACC = (a + d)/n
misclassification rate Err = (b + c)/n
sensitivity SN = a/(a + c) = true positive rate
specificity SP = d/(b + d) = true negative rate
positive predictive value PPV = a/(a + b)
negative predictive value NPV = d/(c + d)
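The point estimates above can be collected in one helper; a sketch with an illustrative name:

```python
def confusion_metrics(a, b, c, d):
    """Point estimates from the 2x2 confusion matrix
    (a: TP, b: FP, c: FN, d: TN), as defined above."""
    n = a + b + c + d
    return {
        "ACC": (a + d) / n,   # accuracy
        "Err": (b + c) / n,   # misclassification rate
        "SN":  a / (a + c),   # sensitivity = true positive rate
        "SP":  d / (b + d),   # specificity = true negative rate
        "FPR": b / (b + d),
        "FNR": c / (a + c),
        "PPV": a / (a + b),
        "NPV": d / (c + d),
    }
```

Note that SN + FNR = 1 and SP + FPR = 1 by construction.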
Performance assessment
ROC and AUC
What if we would like to see the effect of trading off the true positive rate against the false positive rate (SN and 1 − SP)?
Performance assessment
ROC and AUC
What if we would like to see the effect of trading off the true positive rate against the false positive rate (SN and 1 − SP)?
vary the threshold
for each value of the threshold, measure TPR and FPR
[Figure: ROC curve, true positive rate (SN) vs. false positive rate (1 − SP), both from 0 to 1]
ROC = Receiver Operating Characteristic
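Sweeping the threshold can be sketched as follows (an illustration assuming no tied scores; `roc_points` and `auc` are hypothetical names, and the area is computed with the trapezoidal rule, anticipating the AUC slide):

```python
import numpy as np

def roc_points(scores, labels):
    """(FPR, TPR) pairs obtained by sweeping the decision threshold.
    scores: classifier outputs (higher = more 'positive'); labels in {0, 1}."""
    order = np.argsort(-np.asarray(scores, float))   # sort by decreasing score
    labels = np.asarray(labels)[order]
    P = labels.sum()
    N = len(labels) - P
    tpr = np.concatenate([[0.0], np.cumsum(labels) / P])
    fpr = np.concatenate([[0.0], np.cumsum(1 - labels) / N])
    return fpr, tpr

def auc(fpr, tpr):
    # trapezoidal rule for the area under the ROC curve
    fpr, tpr = np.asarray(fpr), np.asarray(tpr)
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))
```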
Performance assessment
ROC and AUC
Different types of ROC curves:
[Figure: three ROC curves R0, R1, R2 compared against the diagonal of a random classifier]
Performance assessment
ROC and AUC
ROC curves can be used for comparing classifiers...
[Figure: two ROC curves, R1 and R2, one lying above the other over the whole range]
Performance assessment
ROC and AUC
ROC curves can be used for comparing classifiers... but not always:
[Figure: two crossing ROC curves, R1 and R2 – neither dominates over the whole range]
Performance assessment
ROC and AUC
The Area Under the Curve (AUC) is a summary of the ROC...
[Figure: shaded area under an ROC curve]
...and can be used for comparing classifiers.
Estimating the performance parameters
Estimation schemes
Why estimation? Generally there is no analytic form of the error rate.
We discuss the error rate, but this applies to any performance metric.
Simple schemes:
apparent error: estimate on the training set
holdout estimate: unique split train/test
Resampling schemes:
k–fold cross–validation
repeated k–fold cross–validation
leave–one–out
bootstrapping
Estimating the performance parameters
k-fold cross-validation
separate train and test sets
randomly divide the data into k subsets (folds) – you may also choose to enforce the proportions of the classes (stratified CV)
train on k − 1 folds and test on the held-out fold
estimate the error as the average error measured on the held-out folds
[Figure: data split into k folds; each fold serves once as the test set]
usually k = 5 or k = 10
if k = n ⇒ leave-one-out estimator
improved estimation: repeated k-CV (e.g. 100 × (5-CV))
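The procedure can be sketched generically (a plain, non-stratified version; `kfold_error` and the train/predict callables are illustrative conventions, not from the slides):

```python
import numpy as np

def kfold_error(X, y, train_fn, predict_fn, k=5, seed=0):
    """Plain k-fold CV error estimate (a sketch; not stratified).
    train_fn(Xtr, ytr) -> model; predict_fn(model, Xts) -> predicted labels."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))          # random assignment to folds
    folds = np.array_split(idx, k)
    errs = []
    for j in range(k):
        ts = folds[j]                      # held-out fold
        tr = np.concatenate([folds[i] for i in range(k) if i != j])
        model = train_fn(X[tr], y[tr])
        errs.append(float(np.mean(predict_fn(model, X[ts]) != y[ts])))
    return float(np.mean(errs)), errs      # E_{k-CV} and the per-fold errors
```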
Estimating the performance parameters
k-fold cross-validation
From the k folds:
ε1, . . . , εk: errors on the test folds
E_{k-CV} = (1/k) ∑_{j=1}^{k} εj
estimated standard deviation
Confidence intervals (simple version – binomial approximation):
E ≈ E ± ( 0.5/n + z sqrt( E(1 − E)/n ) )
where n is the dataset size and z = Φ⁻¹(1 − α/2), for a 1 − α confidence interval (e.g. z = 1.96 for a 95% conf. interval)
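The interval formula in code form; `error_ci` is an illustrative name, and the clamping to [0, 1] is an added convenience not stated on the slide:

```python
from math import sqrt

def error_ci(E, n, z=1.96):
    """Binomial-approximation confidence interval for an error rate E
    estimated from n samples (the simple version from the slide):
    E +/- (0.5/n + z * sqrt(E(1-E)/n)); z = 1.96 for a 95% interval."""
    half = 0.5 / n + z * sqrt(E * (1 - E) / n)
    return max(0.0, E - half), min(1.0, E + half)   # clamp to [0, 1]
```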
Estimating the performance parameters
Bootstrap error estimation
[Figure: B bootstrap datasets (X1, Y1), (X2, Y2), . . . , (XB, YB) resampled from (X, Y), giving errors E1, E2, . . . , EB]
1. generate a new dataset (Xb, Yb) by resampling with replacement from the original dataset (X, Y)
2. train the classifier on (Xb, Yb) and test on the left-out data, to obtain an error Eb
3. repeat 1–2 for b = 1, . . . , B and collect the Eb
Estimating the performance parameters
Bootstrap error estimation
estimate the error: for example, use the .632 estimator
E = 0.368 E0 + 0.632 (1/B) ∑_{b=1}^{B} Eb
where E0 is the error rate on the full training set (X, Y)
use the empirical distribution of the Eb to obtain confidence intervals
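Steps 1–3 and the .632 estimator together, as a sketch (`bootstrap632` and the train/predict callables are illustrative conventions; replicates with an empty left-out set are simply skipped, an edge case the slides do not discuss):

```python
import numpy as np

def bootstrap632(X, y, train_fn, predict_fn, B=100, seed=0):
    """The .632 bootstrap error estimate, as a sketch.
    Each replicate trains on a resample (with replacement) and tests on
    the left-out points; E = 0.368*E0 + 0.632*mean(Eb)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    model0 = train_fn(X, y)
    E0 = np.mean(predict_fn(model0, X) != y)    # error on the full training set
    Eb = []
    for _ in range(B):
        boot = rng.integers(0, n, n)            # resample with replacement
        oob = np.setdiff1d(np.arange(n), boot)  # left-out points
        if len(oob) == 0:
            continue                            # rare: nothing left out
        model = train_fn(X[boot], y[boot])
        Eb.append(np.mean(predict_fn(model, X[oob]) != y[oob]))
    return float(0.368 * E0 + 0.632 * np.mean(Eb))
```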
Estimating the performance parameters
Case study (1)
Build a classifier to predict ER status:
select the top 2 probesets using the ratio of between-groups to within-groups sum of squares (similar to a t-test)
use LDA to discriminate between ER- and ER+
Algorithm
Let (X, Y) be the full dataset. For all k = 1 . . . K folds:
let the training set be Xtr = X \ X(k), Ytr = Y \ Y(k) (all but the k-th fold), and let the test set be Xts = X(k), Yts = Y(k)
select the best 2 probesets on (Xtr, Ytr) and train LDA on these data
test on (Xts, Yts) and record the values for error, AUC, ...
Compute the expected error, AUC, ...
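The feature-ranking step can be sketched as follows (`bw_ratio` and `top_features` are illustrative names; the score is the per-feature ratio of between-groups to within-groups sum of squares described above):

```python
import numpy as np

def bw_ratio(X, y):
    """Per-feature ratio of between-groups to within-groups sum of squares
    (two classes, labels y in {0, 1}) – the gene-ranking score of the case study."""
    m = X.mean(axis=0)
    m0, m1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    n0, n1 = (y == 0).sum(), (y == 1).sum()
    B = n0 * (m0 - m) ** 2 + n1 * (m1 - m) ** 2
    W = ((X[y == 0] - m0) ** 2).sum(axis=0) + ((X[y == 1] - m1) ** 2).sum(axis=0)
    return B / W

def top_features(X, y, d=2):
    # indices of the d best-ranked features; call this on the TRAINING
    # fold only, never on the full dataset (selection bias!)
    return np.argsort(-bw_ratio(X, y))[:d]
```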
Estimating the performance parameters
Case study (1)
CV scheme       Error    95% CI for Error     AUC
5-CV            0.0692   0.0073–0.1311        0.966
10-CV           0.0615   0.0026–0.1204        0.970
20 × (5-CV)     0.0615   (0.0538–0.0692)*     0.969

* from the quantiles of the empirical distribution of the error
Estimating the performance parameters
Case study (2)
Build a classifier to predict patients with pathologic complete response (pCR):
find the best number of probesets to include in the model using the ratio of between-groups to within-groups sum of squares (similar to a t-test)
use LDA to discriminate between patients with pCR and those without
the best number of probesets is a meta-parameter, which has to be estimated within a cross-validation
problem: if we estimate the meta-parameter on the full training set, we will likely fail to correctly estimate the performance (optimistic bias)
Estimating the performance parameters
Case study (2)
Two nested (external) CV loops:
[Figure: an outer CV loop splits the data into train/test sets; an inner CV loop, run on the outer training set only, selects the meta-parameter]
Estimating the performance parameters
What we do learn from CV:
the expected performance of the modeling recipe;
the imprecision in estimating the performance; we can also have a look at:
− which features are most stable
− which points are always misclassified
What we do not learn from CV:
the best features
the best classifier
the best meta-parameters
We obtain these by training on the full dataset (no CV).
Bibliography
Bibliography
Duda, Hart, Stork: Pattern Classification
Hastie, Tibshirani, Friedman: The Elements of Statistical Learning
T. Fawcett: ROC Graphs: Notes and Practical Considerations for Data Mining Researchers, HP Laboratories Tech. Rep. HPL-2003-4
A. Webb: Statistical Pattern Recognition
I. Shmulevich, E. Dougherty: Genomic Signal Processing