assessing and comparing classification algorithms introduction resampling and cross validation...
TRANSCRIPT
![Page 1: Assessing and Comparing Classification Algorithms Introduction Resampling and Cross Validation Measuring Error Interval Estimation and Hypothesis Testing](https://reader030.vdocument.in/reader030/viewer/2022032703/56649d355503460f94a0c389/html5/thumbnails/1.jpg)
Assessing and Comparing Classification Algorithms
•Introduction•Resampling and Cross Validation•Measuring Error•Interval Estimation and Hypothesis Testing•Assessing and Comparing Performance
![Page 2: Assessing and Comparing Classification Algorithms Introduction Resampling and Cross Validation Measuring Error Interval Estimation and Hypothesis Testing](https://reader030.vdocument.in/reader030/viewer/2022032703/56649d355503460f94a0c389/html5/thumbnails/2.jpg)
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)2
Introduction
Questions: Assessment of the expected error of a learning
algorithm: Is the error rate of 1-NN less than 2%?
Comparing the expected errors of two algorithms: Is k-NN more accurate than MLP ?
Training/validation/test sets Resampling methods: K-fold cross-validation
![Page 3: Assessing and Comparing Classification Algorithms Introduction Resampling and Cross Validation Measuring Error Interval Estimation and Hypothesis Testing](https://reader030.vdocument.in/reader030/viewer/2022032703/56649d355503460f94a0c389/html5/thumbnails/3.jpg)
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)3
Algorithm Preference
Criteria (Application-dependent):Misclassification error, or risk (loss functions) Training time/space complexity Testing time/space complexity Interpretability Easy programmability
Cost-sensitive learning
![Page 4: Assessing and Comparing Classification Algorithms Introduction Resampling and Cross Validation Measuring Error Interval Estimation and Hypothesis Testing](https://reader030.vdocument.in/reader030/viewer/2022032703/56649d355503460f94a0c389/html5/thumbnails/4.jpg)
Assessing and Comparing Classification Algorithms
•Introduction•Resampling and Cross Validation•Measuring Error•Interval Estimation and Hypothesis Testing•Assessing and Comparing Performance
![Page 5: Assessing and Comparing Classification Algorithms Introduction Resampling and Cross Validation Measuring Error Interval Estimation and Hypothesis Testing](https://reader030.vdocument.in/reader030/viewer/2022032703/56649d355503460f94a0c389/html5/thumbnails/5.jpg)
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)5
Resampling and K-Fold Cross-Validation The need for multiple training/validation sets
{Xi,Vi}i: Training/validation sets of fold i
K-fold cross-validation: Divide X into k, Xi,i=1,...,K
Ti share K-2 parts121
31222
32111
KKKK
K
K
XXXTXV
XXXTXV
XXXTXV
![Page 6: Assessing and Comparing Classification Algorithms Introduction Resampling and Cross Validation Measuring Error Interval Estimation and Hypothesis Testing](https://reader030.vdocument.in/reader030/viewer/2022032703/56649d355503460f94a0c389/html5/thumbnails/6.jpg)
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)6
1510
2510
259
159
124
224
223
123
112
212
211
111
XVXT
XVXT
XVXT
XVXT
XVXT
XVXT
5×2 Cross-Validation
5 times 2 fold cross-validation (Dietterich, 1998)
![Page 7: Assessing and Comparing Classification Algorithms Introduction Resampling and Cross Validation Measuring Error Interval Estimation and Hypothesis Testing](https://reader030.vdocument.in/reader030/viewer/2022032703/56649d355503460f94a0c389/html5/thumbnails/7.jpg)
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)7
Bootstrapping
Draw instances from a dataset with replacement Prob that we do not pick an instance after N
draws
that is, only 36.8% is new!
36801
1 1 .eN
N
![Page 8: Assessing and Comparing Classification Algorithms Introduction Resampling and Cross Validation Measuring Error Interval Estimation and Hypothesis Testing](https://reader030.vdocument.in/reader030/viewer/2022032703/56649d355503460f94a0c389/html5/thumbnails/8.jpg)
Assessing and Comparing Classification Algorithms
•Introduction•Resampling and Cross Validation•Measuring Error•Interval Estimation and Hypothesis Testing•Assessing and Comparing Performance
![Page 9: Assessing and Comparing Classification Algorithms Introduction Resampling and Cross Validation Measuring Error Interval Estimation and Hypothesis Testing](https://reader030.vdocument.in/reader030/viewer/2022032703/56649d355503460f94a0c389/html5/thumbnails/9.jpg)
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)9
Measuring Error
Error rate = # of errors / # of instances = (FN+FP) / N Recall = # of found positives / # of positives
= TP / (TP+FN) = sensitivity = hit rate Precision = # of found positives / # of found
= TP / (TP+FP) Specificity = TN / (TN+FP) False alarm rate = FP / (FP+TN) = 1 - Specificity
![Page 10: Assessing and Comparing Classification Algorithms Introduction Resampling and Cross Validation Measuring Error Interval Estimation and Hypothesis Testing](https://reader030.vdocument.in/reader030/viewer/2022032703/56649d355503460f94a0c389/html5/thumbnails/10.jpg)
Methods for Performance Evaluation How to obtain a reliable estimate of performance?
Performance of a model may depend on other factors besides the learning algorithm: Class distribution Cost of misclassification Size of training and test sets
![Page 11: Assessing and Comparing Classification Algorithms Introduction Resampling and Cross Validation Measuring Error Interval Estimation and Hypothesis Testing](https://reader030.vdocument.in/reader030/viewer/2022032703/56649d355503460f94a0c389/html5/thumbnails/11.jpg)
Learning Curve Learning curve shows
how accuracy changes with varying sample size
Requires a sampling schedule for creating learning curve: Arithmetic
sampling(Langley, et al)
Geometric sampling(Provost et al)
Effect of small sample size:
- Bias in the estimate
- Variance of estimate
![Page 12: Assessing and Comparing Classification Algorithms Introduction Resampling and Cross Validation Measuring Error Interval Estimation and Hypothesis Testing](https://reader030.vdocument.in/reader030/viewer/2022032703/56649d355503460f94a0c389/html5/thumbnails/12.jpg)
ROC (Receiver Operating Characteristic)
Developed in 1950s for signal detection theory to analyze noisy signals Characterize the trade-off between positive hits and
false alarms ROC curve plots TP (on the y-axis) against FP (on
the x-axis) Performance of each classifier represented as a
point on the ROC curve changing the threshold of algorithm, sample distribution
or cost matrix changes the location of the point
http://en.wikipedia.org/wiki/Receiver_operating_characteristic
http://www.childrensmercy.org/stats/ask/roc.asp
![Page 13: Assessing and Comparing Classification Algorithms Introduction Resampling and Cross Validation Measuring Error Interval Estimation and Hypothesis Testing](https://reader030.vdocument.in/reader030/viewer/2022032703/56649d355503460f94a0c389/html5/thumbnails/13.jpg)
ROC Curve
At threshold t:
TP=0.5, FN=0.5, FP=0.12, FN=0.88
- 1-dimensional data set containing 2 classes (positive and negative)
- any points located at x > t is classified as positive
![Page 14: Assessing and Comparing Classification Algorithms Introduction Resampling and Cross Validation Measuring Error Interval Estimation and Hypothesis Testing](https://reader030.vdocument.in/reader030/viewer/2022032703/56649d355503460f94a0c389/html5/thumbnails/14.jpg)
ROC Curve(TP,FP): (0,0): declare everything
to be negative class (1,1): declare everything
to be positive class (1,0): ideal
Diagonal line: Random guessing Below diagonal line: prediction is opposite of the
true class
![Page 15: Assessing and Comparing Classification Algorithms Introduction Resampling and Cross Validation Measuring Error Interval Estimation and Hypothesis Testing](https://reader030.vdocument.in/reader030/viewer/2022032703/56649d355503460f94a0c389/html5/thumbnails/15.jpg)
Using ROC for Model Comparison No model
consistently outperform the other M1 is better for
small FPR M2 is better for
large FPR
Area Under the ROC curve Ideal:
Area = 1 Random guess:
Area = 0.5
![Page 16: Assessing and Comparing Classification Algorithms Introduction Resampling and Cross Validation Measuring Error Interval Estimation and Hypothesis Testing](https://reader030.vdocument.in/reader030/viewer/2022032703/56649d355503460f94a0c389/html5/thumbnails/16.jpg)
How to Construct an ROC curveInstance P(+|A) True Class
1 0.95 +
2 0.93 +
3 0.87 -
4 0.85 -
5 0.85 -
6 0.85 +
7 0.76 -
8 0.53 +
9 0.43 -
10 0.25 +
• Use classifier that produces posterior probability for each test instance P(+|A)
• Sort the instances according to P(+|A) in decreasing order
• Apply threshold at each unique value of P(+|A)
• Count the number of TP, FP, TN, FN at each threshold
• TP rate, TPR = TP/(TP+FN)
• FP rate, FPR = FP/(FP + TN)
![Page 17: Assessing and Comparing Classification Algorithms Introduction Resampling and Cross Validation Measuring Error Interval Estimation and Hypothesis Testing](https://reader030.vdocument.in/reader030/viewer/2022032703/56649d355503460f94a0c389/html5/thumbnails/17.jpg)
How to construct an ROC curveClass + - + - - - + - + +
P 0.25 0.43 0.53 0.76 0.85 0.85 0.85 0.87 0.93 0.95 1.00
TP 5 4 4 3 3 3 3 2 2 1 0
FP 5 5 4 4 3 2 1 1 0 0 0
TN 0 0 1 1 2 3 4 4 5 5 5
FN 0 1 1 2 2 2 2 3 3 4 5
TPR 1 0.8 0.8 0.6 0.6 0.6 0.6 0.4 0.4 0.2 0
FPR 1 1 0.8 0.8 0.6 0.4 0.2 0.2 0 0 0
Threshold >=
ROC Curve:
+ + - + - - - + - +
+
Reverse of above order
![Page 18: Assessing and Comparing Classification Algorithms Introduction Resampling and Cross Validation Measuring Error Interval Estimation and Hypothesis Testing](https://reader030.vdocument.in/reader030/viewer/2022032703/56649d355503460f94a0c389/html5/thumbnails/18.jpg)
Assessing and Comparing Classification Algorithms
•Introduction•Resampling and Cross Validation•Measuring Error•Interval Estimation and Hypothesis Testing•Assessing and Comparing Performance
![Page 19: Assessing and Comparing Classification Algorithms Introduction Resampling and Cross Validation Measuring Error Interval Estimation and Hypothesis Testing](https://reader030.vdocument.in/reader030/viewer/2022032703/56649d355503460f94a0c389/html5/thumbnails/19.jpg)
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)19
Interval Estimation
X = { xt }t where xt ~ N ( μ, σ2) m ~ N ( μ, σ2/N)
100(1- α) percentconfidence interval
1
950961961
950961961
22N
zmN
zmP
.N
.mN
.mP
..m
N.P
~m
N
//
Z
![Page 20: Assessing and Comparing Classification Algorithms Introduction Resampling and Cross Validation Measuring Error Interval Estimation and Hypothesis Testing](https://reader030.vdocument.in/reader030/viewer/2022032703/56649d355503460f94a0c389/html5/thumbnails/20.jpg)
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)20
When σ2 is not known:
1
950641
950641
NzmP
.N
.mP
..m
NP
1
1
1212
1
22
N
Stm
N
StmP
t~SmN
N/mxS
N,/N,/
Nt
t
![Page 21: Assessing and Comparing Classification Algorithms Introduction Resampling and Cross Validation Measuring Error Interval Estimation and Hypothesis Testing](https://reader030.vdocument.in/reader030/viewer/2022032703/56649d355503460f94a0c389/html5/thumbnails/21.jpg)
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)21
Hypothesis Testing
Reject a null hypothesis if not supported by the sample with enough confidence
X = { xt }t where xt ~ N ( μ, σ2)
H0: μ = μ0 vs. H1: μ ≠ μ0
Accept H0 with level of significance α if μ0 is in the 100(1- α) confidence interval
Two-sided test
220
// z,zmN
![Page 22: Assessing and Comparing Classification Algorithms Introduction Resampling and Cross Validation Measuring Error Interval Estimation and Hypothesis Testing](https://reader030.vdocument.in/reader030/viewer/2022032703/56649d355503460f94a0c389/html5/thumbnails/22.jpg)
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)22
One-sided test: H0: μ ≤ μ0 vs. H1: μ > μ0
Accept if
Variance unknown: Use t, instead of z Accept H0: μ = μ0 if
z,
mN 0
12120
N,/N,/ t,tS
mN
![Page 23: Assessing and Comparing Classification Algorithms Introduction Resampling and Cross Validation Measuring Error Interval Estimation and Hypothesis Testing](https://reader030.vdocument.in/reader030/viewer/2022032703/56649d355503460f94a0c389/html5/thumbnails/23.jpg)
Assessing and Comparing Classification Algorithms
•Introduction•Resampling and Cross Validation•Measuring Error•Interval Estimation and Hypothesis Testing•Assessing and Comparing Performance
![Page 24: Assessing and Comparing Classification Algorithms Introduction Resampling and Cross Validation Measuring Error Interval Estimation and Hypothesis Testing](https://reader030.vdocument.in/reader030/viewer/2022032703/56649d355503460f94a0c389/html5/thumbnails/24.jpg)
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)24
Assessing Error: H0: p ≤ p0 vs. H1: p > p0 Single training/validation set: Binomial Test
If error prob is p0, prob that there are e errors or less in N validation trials is
1- α
Accept if this prob is less than 1- α
N=100, e=20
jNjje
j
ppj
NeXP
00
1
1
![Page 25: Assessing and Comparing Classification Algorithms Introduction Resampling and Cross Validation Measuring Error Interval Estimation and Hypothesis Testing](https://reader030.vdocument.in/reader030/viewer/2022032703/56649d355503460f94a0c389/html5/thumbnails/25.jpg)
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)25
Normal Approximation to the Binomial Number of errors X is approx N with mean Np0
and var Np0(1-p0)
Accept if this prob for X = e is less than z1-α
1- α
Z~
pNp
NpX
00
0
1
![Page 26: Assessing and Comparing Classification Algorithms Introduction Resampling and Cross Validation Measuring Error Interval Estimation and Hypothesis Testing](https://reader030.vdocument.in/reader030/viewer/2022032703/56649d355503460f94a0c389/html5/thumbnails/26.jpg)
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)26
Paired t Test
Multiple training/validation sets xt
i = 1 if instance t misclassified on fold i Error rate of fold i:
With m and s2 average and var of pi
we accept p0 or less error if
is less than tα,K-1
N
xp
N
t
ti
i 1
1
0
Kt~
SpmK
![Page 27: Assessing and Comparing Classification Algorithms Introduction Resampling and Cross Validation Measuring Error Interval Estimation and Hypothesis Testing](https://reader030.vdocument.in/reader030/viewer/2022032703/56649d355503460f94a0c389/html5/thumbnails/27.jpg)
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)27
K-Fold CV Paired t Test
Use K-fold cv to get K training/validation folds pi
1, pi2: Errors of classifiers 1 and 2 on fold i
pi = pi1 – pi
2 : Paired difference on fold i
The null hypothesis is whether pi has mean 0
12121
1
2
21
00
inif Accept0
1
0 vs.0
K,/K,/K
K
i i
K
i i
t,tt~s
mKsmK
K
mps
K
pm
:H:H