Cross-validation to assess decoder performance: the good, the bad, and the ugly
TRANSCRIPT
Gaël Varoquaux
https://hal.archives-ouvertes.fr/hal-01332785
Measuring prediction accuracy
To find the best method (computer scientists)
For information mapping = an omnibus test (cognitive neuroimaging)
Cross-validation: asymptotically unbiased, non-parametric
1 Some theory
2 Empirical results on brain imaging
1 Some theory
(Diagram: the full data is split into a train set and a test set)
1 Cross-validation: test on independent data
(Diagram: in a loop, the full data is split into a train set and a held-out test/validation set)
Measures prediction accuracy
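A minimal sketch of this loop with scikit-learn; the data and estimator here are illustrative, not those of the talk:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

# Synthetic decoding-like problem: 200 samples, 50 features
X, y = make_classification(n_samples=200, n_features=50, random_state=0)

# Each fold trains on part of the data and measures accuracy
# on the held-out rest
scores = cross_val_score(LinearSVC(), X, y, cv=5)
print(scores.mean())
```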
1 Choice of cross-validation strategy
Test on independent data, and be robust to confounding dependences:
leave subjects out, or sessions out.
Loop: more loops = more data points.
Need to balance error in training the model against error on the test set.
1 Choice of cross-validation strategy: theory
Negative bias (underestimates performance), decreasing with the size of the training set [Arlot & Celisse 2010, sec. 5.1]
Variance decreases with the size of the test set [Arlot & Celisse 2010, sec. 5.2]
Recommendation: leave out 10–20% of the data, with many random splits respecting the dependency structure
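The recommendation above (many random splits leaving out 20%, respecting the grouping structure) can be sketched as follows; the groups are synthetic:

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

groups = np.repeat(np.arange(10), 20)  # 10 groups of 20 samples
X = np.random.rand(len(groups), 5)

# 50 random splits, each holding out 20% of the groups
gss = GroupShuffleSplit(n_splits=50, test_size=0.2, random_state=0)
n_splits = sum(1 for _ in gss.split(X, groups=groups))
print(n_splits)
```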
1 Tuning hyper-parameters
Computer scientist says: "You need to set C in your SVM"
(Plot: training-set and validation-set accuracy as a function of C, swept from 10^-4 to 10^4)
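The sweep over C described above can be sketched with scikit-learn's `validation_curve`; the dataset is synthetic:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import validation_curve
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)
Cs = np.logspace(-4, 4, 9)  # the 10^-4 ... 10^4 grid from the slide

# Training vs. validation accuracy across the grid of C values
train_scores, valid_scores = validation_curve(
    SVC(kernel="linear"), X, y, param_name="C", param_range=Cs, cv=5)
print(train_scores.shape)  # one row per C value, one column per fold
```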
1 Nested cross-validation: test on independent data
(Diagram, two loops: an outer loop splits the full data into a train set and a test set; a nested loop further splits the train set into a train set and a validation set)
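A sketch of the two loops with scikit-learn, on illustrative data: the inner loop (GridSearchCV) tunes C, and the outer loop measures accuracy on data the tuning never saw:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)

# Nested loop: chooses C on train/validation splits
inner = GridSearchCV(SVC(kernel="linear"), {"C": [0.01, 1, 100]}, cv=3)
# Outer loop: assesses the tuned model on independent test folds
scores = cross_val_score(inner, X, y, cv=5)
print(scores.mean())
```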
2 Empirical results on brain imaging
2 Datasets and tasks
7 fMRI datasets (6 from OpenfMRI)
Haxby: 5 subjects, 15 inter-subject predictions
Inter-subject predictions on 6 studies
OASIS VBM, gender discrimination
HCP MEG task, intra-subject, working memory
# samples: ∼200 (min 80, max 400); accuracy: min 62%, max 96%
2 Experiment 1: measuring cross-validation error
Leave out a large validation set; measure error by cross-validation on the rest; compare.
(Diagram: the outer loop holds out the validation set; the nested loop splits the remaining data into train and test sets)
2 Cross-validated measure versus validation set
(Scatter plot: accuracy measured by cross-validation against accuracy on the validation set, both from 50% to 100%, for intra-subject and inter-subject analyses)
2 Different cross-validation strategies
Difference in accuracy measured by cross-validation and on the validation set:

| Cross-validation strategy | Intra-subject | Inter-subject |
|---------------------------|---------------|---------------|
| Leave one sample out | −22% to +19% | +3% to +43% |
| Leave one subject/session out | −10% to +10% | −21% to +17% |
| 20% left out, 3 splits | −11% to +11% | −24% to +16% |
| 20% left out, 10 splits | −9% to +9% | −24% to +14% |
| 20% left out, 50 splits | −9% to +8% | −23% to +13% |
2 Simple simulations
Two Gaussian-separated clouds, with auto-correlated noise
(Figure: the two clouds in the (X1, X2) plane; X1 as a function of time)
200 decoding samples, 10 000 validation samples ⇒ the validation set gives the asymptotic accuracy
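A sketch in the spirit of this simulation: two class-separated clouds with AR(1) noise along the time axis, mimicking the temporal dependence of fMRI/MEG data. The generative parameters below are illustrative, not those of the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 2
y = rng.integers(0, 2, size=n)                  # two classes
signal = np.where(y[:, None] == 1, 1.0, -1.0)   # separated cloud centers

# AR(1) noise: each time point's noise depends on the previous one
noise = np.zeros((n, p))
eps = rng.normal(size=(n, p))
for t in range(1, n):
    noise[t] = 0.9 * noise[t - 1] + eps[t]

X = signal + noise
print(X.shape)
```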
2 Different cross-validation strategies
Difference in accuracy measured by cross-validation and on the validation set:

| Cross-validation strategy | MEG data | Simulations |
|---------------------------|----------|-------------|
| Leave one sample out | −16% to +14% | +4% to +33% |
| Leave one block out | −15% to +13% | −8% to +8% |
| 20% left out, 3 splits | −15% to +12% | −10% to +11% |
| 20% left out, 10 splits | −13% to +10% | −8% to +8% |
| 20% left out, 50 splits | −12% to +10% | −7% to +7% |
2 Experiment 2: parameter tuning
Compare different strategies on the validation set:
1. Use the default C = 1
2. Use C = 1000
3. Choose the best C by cross-validation and refit
4. Average the best models from cross-validation
Non-sparse decoders: SVM-ℓ2, log-reg-ℓ2
Sparse decoders: SVM-ℓ1, log-reg-ℓ1
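Strategies 1 and 3 above can be sketched as follows, measuring both on a held-out validation set; the dataset and the C grid are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=300, n_features=50, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Strategy 1: keep the default C = 1
acc_default = LinearSVC(C=1).fit(X_tr, y_tr).score(X_val, y_val)

# Strategy 3: choose the best C by cross-validation, then refit
tuned = GridSearchCV(LinearSVC(), {"C": [0.01, 1, 100, 1000]}, cv=5)
acc_tuned = tuned.fit(X_tr, y_tr).score(X_val, y_val)
print(acc_default, acc_tuned)
```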
2 Cross-validation for tuning?
(Bar plots: impact on prediction accuracy, from −8% to +8%, of each tuning strategy (CV + averaging, CV + refitting, C = 1, C = 1000) for SVM and log-reg; left panel: non-sparse models, right panel: sparse models)
@GaelVaroquaux
Cross-validation: lessons learned
Don't use leave-one-out; prefer random 10–20% splits respecting the sample structure
Cross-validation has error bars of ±10%
Cross-validation is inefficient for parameter tuning: use C = 1 for SVM-ℓ2, and model averaging for SVM-ℓ1
https://hal.archives-ouvertes.fr/hal-01332785
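The lessons above can be put together in one sketch: group-aware random 20% splits, many repetitions, and a score reported with its spread rather than as a single number. Dataset and group labels are synthetic:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GroupShuffleSplit, cross_val_score
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, n_features=20, random_state=0)
groups = np.repeat(np.arange(20), 10)  # e.g. 20 subjects, 10 samples each

# 50 random 80/20 splits respecting the group structure, default C = 1
cv = GroupShuffleSplit(n_splits=50, test_size=0.2, random_state=0)
scores = cross_val_score(LinearSVC(C=1), X, y, cv=cv, groups=groups)
print(f"{scores.mean():.2f} +/- {scores.std():.2f}")
```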
References
S. Arlot and A. Celisse. A survey of cross-validation procedures for model selection. Statistics Surveys, 4:40–79, 2010.