introduction to machine learningethem/i2ml3e/3e_v1-0/... · 01 + e 10)/2 comparing classifiers: h...

33
INTRODUCTION TO MACHINE LEARNING 3RD EDITION ETHEM ALPAYDIN © The MIT Press, 2014 [email protected] http://www.cmpe.boun.edu.tr/~ethem/i2ml3e Lecture Slides for

Upload: others

Post on 20-May-2020

9 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Introduction to Machine Learningethem/i2ml3e/3e_v1-0/... · 01 + e 10)/2 Comparing Classifiers: H 0:μ ... Introduction to Machine Learning Author: ethem Created Date: 7/10/2014 12:49:46

INTRODUCTION

TO

MACHINE

LEARNING 3RD EDITION

ETHEM ALPAYDIN

© The MIT Press, 2014

[email protected]

http://www.cmpe.boun.edu.tr/~ethem/i2ml3e

Lecture Slides for

Page 2: Introduction to Machine Learningethem/i2ml3e/3e_v1-0/... · 01 + e 10)/2 Comparing Classifiers: H 0:μ ... Introduction to Machine Learning Author: ethem Created Date: 7/10/2014 12:49:46

CHAPTER 19: DESIGN AND ANALYSIS OF MACHINE LEARNING EXPERIMENTS

Page 3: Introduction to Machine Learningethem/i2ml3e/3e_v1-0/... · 01 + e 10)/2 Comparing Classifiers: H 0:μ ... Introduction to Machine Learning Author: ethem Created Date: 7/10/2014 12:49:46

Introduction 3

Questions:

Assessment of the expected error of a learning algorithm: Is

the error rate of 1-NN less than 2%?

Comparing the expected errors of two algorithms: Is k-NN

more accurate than MLP ?

Training/validation/test sets

Resampling methods: K-fold cross-validation

Page 4: Introduction to Machine Learningethem/i2ml3e/3e_v1-0/... · 01 + e 10)/2 Comparing Classifiers: H 0:μ ... Introduction to Machine Learning Author: ethem Created Date: 7/10/2014 12:49:46

Algorithm Preference 4

Criteria (Application-dependent):

Misclassification error, or risk (loss functions)

Training time/space complexity

Testing time/space complexity

Interpretability

Easy programmability

Cost-sensitive learning

Page 5: Introduction to Machine Learningethem/i2ml3e/3e_v1-0/... · 01 + e 10)/2 Comparing Classifiers: H 0:μ ... Introduction to Machine Learning Author: ethem Created Date: 7/10/2014 12:49:46

Factors and Response

Response function based on output to be maximized

Depends on controllable factors

Uncontrollable factors introduce randomness

Find the configuration of controllable factors that maximizes response and minimally affected by uncontrollable factors

5

Page 6: Introduction to Machine Learningethem/i2ml3e/3e_v1-0/... · 01 + e 10)/2 Comparing Classifiers: H 0:μ ... Introduction to Machine Learning Author: ethem Created Date: 7/10/2014 12:49:46

Strategies of Experimentation 6

Response surface design for approximating and maximizing

the response function in terms of the controllable factors

How to search the factor space?

Page 7: Introduction to Machine Learningethem/i2ml3e/3e_v1-0/... · 01 + e 10)/2 Comparing Classifiers: H 0:μ ... Introduction to Machine Learning Author: ethem Created Date: 7/10/2014 12:49:46

Guidelines for ML experiments 7

A. Aim of the study

B. Selection of the response variable

C. Choice of factors and levels

D. Choice of experimental design

E. Performing the experiment

F. Statistical Analysis of the Data

G. Conclusions and Recommendations

Page 8: Introduction to Machine Learningethem/i2ml3e/3e_v1-0/... · 01 + e 10)/2 Comparing Classifiers: H 0:μ ... Introduction to Machine Learning Author: ethem Created Date: 7/10/2014 12:49:46

The need for multiple training/validation sets

{Xi,Vi}i: Training/validation sets of fold i

K-fold cross-validation: Divide X into k, Xi,i=1,...,K

Ti share K-2 parts

Resampling and

K-Fold Cross-Validation 8

121

3122

3211

KKKK

K

K

XXXTXV

XXXTXV

XXXTXV

2

1

Page 9: Introduction to Machine Learningethem/i2ml3e/3e_v1-0/... · 01 + e 10)/2 Comparing Classifiers: H 0:μ ... Introduction to Machine Learning Author: ethem Created Date: 7/10/2014 12:49:46

5×2 Cross-Validation 9

1510

2510

259

159

124

224

223

123

112

212

211

111

XVXT

XVXT

XVXT

XVXT

XVXT

XVXT

5 times 2 fold cross-validation (Dietterich, 1998)

Page 10: Introduction to Machine Learningethem/i2ml3e/3e_v1-0/... · 01 + e 10)/2 Comparing Classifiers: H 0:μ ... Introduction to Machine Learning Author: ethem Created Date: 7/10/2014 12:49:46

Bootstrapping 10

36801

1 1 .

eN

N

Draw instances from a dataset with replacement

Prob that we do not pick an instance after N draws

that is, only 36.8% is new!

Page 11: Introduction to Machine Learningethem/i2ml3e/3e_v1-0/... · 01 + e 10)/2 Comparing Classifiers: H 0:μ ... Introduction to Machine Learning Author: ethem Created Date: 7/10/2014 12:49:46

Performance Measures 11

Error rate = # of errors / # of instances = (FN+FP) / N

Recall = # of found positives / # of positives

= TP / (TP+FN) = sensitivity = hit rate

Precision = # of found positives / # of found

= TP / (TP+FP)

Specificity = TN / (TN+FP)

False alarm rate = FP / (FP+TN) = 1 - Specificity

Page 12: Introduction to Machine Learningethem/i2ml3e/3e_v1-0/... · 01 + e 10)/2 Comparing Classifiers: H 0:μ ... Introduction to Machine Learning Author: ethem Created Date: 7/10/2014 12:49:46

ROC Curve 12

Page 13: Introduction to Machine Learningethem/i2ml3e/3e_v1-0/... · 01 + e 10)/2 Comparing Classifiers: H 0:μ ... Introduction to Machine Learning Author: ethem Created Date: 7/10/2014 12:49:46

13

Page 14: Introduction to Machine Learningethem/i2ml3e/3e_v1-0/... · 01 + e 10)/2 Comparing Classifiers: H 0:μ ... Introduction to Machine Learning Author: ethem Created Date: 7/10/2014 12:49:46

Precision and Recall 14

Page 15: Introduction to Machine Learningethem/i2ml3e/3e_v1-0/... · 01 + e 10)/2 Comparing Classifiers: H 0:μ ... Introduction to Machine Learning Author: ethem Created Date: 7/10/2014 12:49:46

Interval Estimation 15

1

950961961

950961961

22N

zmN

zmP

Nm

NmP

mNP

mN

//

...

...

~Z

X = { xt }t where xt ~ N ( μ, σ2)

m ~ N ( μ, σ2/N)

100(1- α) percent

confidence interval

Page 16: Introduction to Machine Learningethem/i2ml3e/3e_v1-0/... · 01 + e 10)/2 Comparing Classifiers: H 0:μ ... Introduction to Machine Learning Author: ethem Created Date: 7/10/2014 12:49:46

16

When σ2 is not known:

1

950641

950641

NzmP

NmP

mNP

..

..

1

1

1212

1

22

N

Stm

N

StmP

tS

mNNmxS

NN

Nt

t

,/,/

~ /

100(1- α) percent one-sided

confidence interval

Page 17: Introduction to Machine Learningethem/i2ml3e/3e_v1-0/... · 01 + e 10)/2 Comparing Classifiers: H 0:μ ... Introduction to Machine Learning Author: ethem Created Date: 7/10/2014 12:49:46

Reject a null hypothesis if not supported by the sample with enough confidence

X = { xt }t where xt ~ N ( μ, σ2)

H0: μ = μ0 vs. H1: μ ≠ μ0

Accept H0 with level of significance α if μ0 is in the

100(1- α) confidence interval

Two-sided test

Hypothesis Testing 17

220

// ,

zz

mN

Page 18: Introduction to Machine Learningethem/i2ml3e/3e_v1-0/... · 01 + e 10)/2 Comparing Classifiers: H 0:μ ... Introduction to Machine Learning Author: ethem Created Date: 7/10/2014 12:49:46

18

One-sided test: H0: μ ≤ μ0 vs. H1: μ > μ0

Accept if

Variance unknown: Use t, instead of z

Accept H0: μ = μ0 if

z

mN,

0

12120

NN ttS

mN,/,/ ,

Page 19: Introduction to Machine Learningethem/i2ml3e/3e_v1-0/... · 01 + e 10)/2 Comparing Classifiers: H 0:μ ... Introduction to Machine Learning Author: ethem Created Date: 7/10/2014 12:49:46

Assessing Error: H0:p ≤ p0 vs. H1:p > p0

19

jNjje

j

ppj

NeXP

00

1

1

Single training/validation set: Binomial Test

If error prob is p0, prob that there are e errors or

less in N validation trials is

1- α

Accept if this prob is less than 1- α

N=100, e=20

Page 20: Introduction to Machine Learningethem/i2ml3e/3e_v1-0/... · 01 + e 10)/2 Comparing Classifiers: H 0:μ ... Introduction to Machine Learning Author: ethem Created Date: 7/10/2014 12:49:46

Normal Approximation to the Binomial

20

Z~

00

0

1 pNp

NpX

Number of errors X is approx N with mean Np0 and

var Np0(1-p0)

Accept if this prob for X = e is

less than z1-α

1- α

Page 21: Introduction to Machine Learningethem/i2ml3e/3e_v1-0/... · 01 + e 10)/2 Comparing Classifiers: H 0:μ ... Introduction to Machine Learning Author: ethem Created Date: 7/10/2014 12:49:46

Multiple training/validation sets

xti = 1 if instance t misclassified on fold i

Error rate of fold i:

With m and s2 average and var of pi , we accept p0 or less error if

is less than tα,K-1

Paired t Test 21

N

xp

N

t

ti

i

1

1

0

Kt

S

pmK~

Page 22: Introduction to Machine Learningethem/i2ml3e/3e_v1-0/... · 01 + e 10)/2 Comparing Classifiers: H 0:μ ... Introduction to Machine Learning Author: ethem Created Date: 7/10/2014 12:49:46

Single training/validation set: McNemar’s Test

Under H0, we expect e01= e10=(e01+ e10)/2

Comparing Classifiers: H0:μ0=μ1 vs.

H1:μ0≠μ1 22

2

1

1001

2

1001 1X~

ee

ee

Accept if < X2α,1

Page 23: Introduction to Machine Learningethem/i2ml3e/3e_v1-0/... · 01 + e 10)/2 Comparing Classifiers: H 0:μ ... Introduction to Machine Learning Author: ethem Created Date: 7/10/2014 12:49:46

K-Fold CV Paired t Test 23

12121

1

2

21

00

0

1

00

KKK

K

i i

K

i i

ttts

mK

s

mK

K

mps

K

pm

HH

,/,/ ,~

::

in if Accept

vs.

Use K-fold cv to get K training/validation folds

pi1, pi

2: Errors of classifiers 1 and 2 on fold i

pi = pi1 – pi

2 : Paired difference on fold i

The null hypothesis is whether pi has mean 0

Page 24: Introduction to Machine Learningethem/i2ml3e/3e_v1-0/... · 01 + e 10)/2 Comparing Classifiers: H 0:μ ... Introduction to Machine Learning Author: ethem Created Date: 7/10/2014 12:49:46

5×2 cv Paired t Test 24

55

1

2

1

1

2221221

5

2

ts

p

ppppsppp

i i

iiiiiiii

~/

/

Use 5×2 cv to get 2 folds of 5 tra/val replications

(Dietterich, 1998)

pi(j) : difference btw errors of 1 and 2 on fold j=1,

2 of replication i=1,...,5

Two-sided test: Accept H0: μ0 = μ1 if in (-tα/2,5,tα/2,5)

One-sided test: Accept H0: μ0 ≤ μ1 if < tα,5

Page 25: Introduction to Machine Learningethem/i2ml3e/3e_v1-0/... · 01 + e 10)/2 Comparing Classifiers: H 0:μ ... Introduction to Machine Learning Author: ethem Created Date: 7/10/2014 12:49:46

5×2 cv Paired F Test 25

5105

1

2

5

1

2

1

2

2,~ F

s

p

i i

i j

ji

Two-sided test: Accept H0: μ0 = μ1 if < Fα,10,5

Page 26: Introduction to Machine Learningethem/i2ml3e/3e_v1-0/... · 01 + e 10)/2 Comparing Classifiers: H 0:μ ... Introduction to Machine Learning Author: ethem Created Date: 7/10/2014 12:49:46

Comparing L>2 Algorithms:

Analysis of Variance (Anova) 26

LH 210 :

Errors of L algorithms on K folds

We construct two estimators to σ2 .

One is valid if H0 is true, the other is always valid.

We reject H0 if the two estimators disagree.

KiLjX jij ,..., ,,...,,,~ 112 N

Page 27: Introduction to Machine Learningethem/i2ml3e/3e_v1-0/... · 01 + e 10)/2 Comparing Classifiers: H 0:μ ... Introduction to Machine Learning Author: ethem Created Date: 7/10/2014 12:49:46

27

2

12

0

22

12

2

1

2

2

2

2

21

2

1

0

1

1

L

jjL

j

j

L

j

j

j j

L

j j

K

i

ij

j

SSb

H

mmKSSbK

mm

L

mmK

SK

L

mmS

L

mm

KK

Xm

H

X

X

N

~

~/

ˆ

/,~

:

have wetrue, is whenSo

namely, , is of estimator an Thus

true is If

2

Page 28: Introduction to Machine Learningethem/i2ml3e/3e_v1-0/... · 01 + e 10)/2 Comparing Classifiers: H 0:μ ... Introduction to Machine Learning Author: ethem Created Date: 7/10/2014 12:49:46

28

11210

11

22

2

12

2

12

2

2

2

1

2

1

2

2

2

0

1

1

11

1

11

KLLL

KLL

KLK

j

j ijij

j i

jijL

j

j

K

i jij

j

j

FH

FKLSSw

LSSb

KL

SSw

L

SSb

SSwSK

mXSSw

KL

mX

L

S

K

mXS

S

H

,,

,

:

~/

///

/

~~

ˆ

if

: variances group of average

the is to estimator second our of Regardless

2

2

XX

Page 29: Introduction to Machine Learningethem/i2ml3e/3e_v1-0/... · 01 + e 10)/2 Comparing Classifiers: H 0:μ ... Introduction to Machine Learning Author: ethem Created Date: 7/10/2014 12:49:46

ANOVA table 29

If ANOVA rejects, we do pairwise posthoc tests

)(~

: vs :

1

10

2

KL

w

ji

jiji

tmm

t

HH

Page 30: Introduction to Machine Learningethem/i2ml3e/3e_v1-0/... · 01 + e 10)/2 Comparing Classifiers: H 0:μ ... Introduction to Machine Learning Author: ethem Created Date: 7/10/2014 12:49:46

Comparison over Multiple Datasets 30

Comparing two algorithms:

Sign test: Count how many times A beats B over N datasets, and check if this could have been by chance if A and B did have the same error rate

Comparing multiple algorithms

Kruskal-Wallis test: Calculate the average rank of all algorithms on N datasets, and check if these could have been by chance if they all had equal error

If KW rejects, we do pairwise posthoc tests to find which ones have significant rank difference

Page 31: Introduction to Machine Learningethem/i2ml3e/3e_v1-0/... · 01 + e 10)/2 Comparing Classifiers: H 0:μ ... Introduction to Machine Learning Author: ethem Created Date: 7/10/2014 12:49:46

Instead of testing using a single performance

measure, e.g., error, use multiple measures for

better discrimination, e.g., [fp-rate,fn-rate]

Compare p-dimensional distributions

Parametric case: Assume p-variate Gaussians

Multivariate Tests 31

Page 32: Introduction to Machine Learningethem/i2ml3e/3e_v1-0/... · 01 + e 10)/2 Comparing Classifiers: H 0:μ ... Introduction to Machine Learning Author: ethem Created Date: 7/10/2014 12:49:46

Multivariate Pairwise Comparison 32

Paired differences:

Hotelling’s multivariate T2 test

For p=1, reduces to paired t test

Page 33: Introduction to Machine Learningethem/i2ml3e/3e_v1-0/... · 01 + e 10)/2 Comparing Classifiers: H 0:μ ... Introduction to Machine Learning Author: ethem Created Date: 7/10/2014 12:49:46

Multivariate ANOVA 33

Comparsion of L>2 algorithms