
Machine Learning: Classifier Evaluation

Madhavan Mukund

Chennai Mathematical Institute
http://www.cmi.ac.in/~madhavan

AlgoLabs Certification Course on Machine Learning
23 February, 2015

Evaluating a classifier

Accuracy: What fraction of predictions are correct?

Need access to an “oracle” to validate answers
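As a minimal Python sketch, assuming the oracle's answers are available as a list of labels alongside the classifier's predictions (the function name `accuracy` and the sample labels are illustrative, not from the course):

```python
def accuracy(predicted, actual):
    """Fraction of predictions that agree with the oracle's labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one labelled example")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)

# Hypothetical labels: the oracle's answers vs. a classifier's predictions
oracle = ["Yes", "No", "No", "Yes", "No"]
predictions = ["Yes", "No", "Yes", "Yes", "No"]
print(accuracy(predictions, oracle))  # 0.8
```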




Classification is often asymmetric

Suppose 1% of email traffic constitutes phishing

An email filter that always says “No” is 99% accurate, but totally useless!

Note: It is conventional to assume that “Yes” is the minority answer




Need a finer classification of correct predictions and errors
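To see the asymmetry concretely, here is a small Python sketch with hypothetical traffic of 1000 emails, 10 of them phishing (matching the 1% figure). The always-“No” filter scores 99% accuracy while catching no phishing at all:

```python
# Hypothetical traffic: 1000 emails, 1% (10) of them phishing ("Yes")
actual = ["Yes"] * 10 + ["No"] * 990

# A filter that always answers "No"
predicted = ["No"] * len(actual)

accuracy = sum(p == a for p, a in zip(predicted, actual)) / len(actual)
# Count phishing emails that were actually flagged as "Yes"
phishing_caught = sum(p == a == "Yes" for p, a in zip(predicted, actual))

print(accuracy)         # 0.99
print(phishing_caught)  # 0
```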

Evaluating a classifier . . .

Confusion matrix

                   Classified positive   Classified negative
Actual positive    TP                    FN
Actual negative    FP                    TN
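One way to fill in the four cells from labelled predictions is sketched below in Python, assuming “Yes” is the positive (minority) class; the function name `confusion_matrix` and the sample labels are illustrative only:

```python
from collections import Counter

def confusion_matrix(predicted, actual, positive="Yes"):
    """Return (TP, FN, FP, TN), treating `positive` as the positive class."""
    counts = Counter()
    for p, a in zip(predicted, actual):
        if a == positive:                      # actual positive: TP or FN
            counts["TP" if p == positive else "FN"] += 1
        else:                                  # actual negative: FP or TN
            counts["FP" if p == positive else "TN"] += 1
    return counts["TP"], counts["FN"], counts["FP"], counts["TN"]

actual    = ["Yes", "Yes", "No",  "No", "No", "Yes"]
predicted = ["Yes", "No",  "Yes", "No", "No", "Yes"]
print(confusion_matrix(predicted, actual))  # (2, 1, 1, 2)
```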


Confusion matrix


Precision: What fraction of positive classifications are correct?

$$p = \frac{TP}{TP + FP}$$



Recall: What fraction of actual positive cases are correctly classified?

$$r = \frac{TP}{TP + FN}$$


                   Classified positive   Classified negative
Actual positive    1                     99
Actual negative    0                     1000

Here $p = \frac{1}{1 + 0} = 1$ but $r = \frac{1}{1 + 99} = 0.01$
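The two definitions translate directly into code. This Python sketch (function names are illustrative) recomputes the example from the counts in the table:

```python
def precision(tp, fp):
    """Fraction of positive classifications that are correct: TP / (TP + FP).
    Raises ZeroDivisionError if the classifier never says "Yes"."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of actual positives classified positive: TP / (TP + FN)."""
    return tp / (tp + fn)

# Counts from the table above
tp, fn, fp, tn = 1, 99, 0, 1000
print(precision(tp, fp))  # 1.0
print(recall(tp, fn))     # 0.01
```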


No functional relationship between p and r

In practice, they are typically inversely related: increasing p reduces r, and vice versa

Conservative classifier: higher precision, but misses valid cases

Permissive classifier: higher recall, but makes more mistakes
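The trade-off can be illustrated with a classifier that outputs a score and says “Yes” when the score clears a threshold. In this Python sketch the scores and labels are invented for illustration, and treating precision as 1 when nothing is classified positive is an assumed convention. Raising the threshold makes the classifier more conservative:

```python
# Hypothetical classifier scores (higher = more likely "Yes") with true labels
scored = [(0.95, "Yes"), (0.85, "Yes"), (0.80, "No"), (0.60, "Yes"),
          (0.55, "No"), (0.40, "Yes"), (0.30, "No"), (0.10, "No")]

def precision_recall_at(threshold):
    """Classify as "Yes" when score >= threshold; return (precision, recall)."""
    tp = sum(1 for s, y in scored if s >= threshold and y == "Yes")
    fp = sum(1 for s, y in scored if s >= threshold and y == "No")
    fn = sum(1 for s, y in scored if s < threshold and y == "Yes")
    # Assumed convention: if nothing is classified "Yes", report precision 1
    p = tp / (tp + fp) if tp + fp else 1.0
    r = tp / (tp + fn)
    return p, r

for t in (0.9, 0.5, 0.2):   # from conservative to permissive
    print(t, precision_recall_at(t))
# precision falls (1.0, 0.6, ~0.57) as recall rises (0.25, 0.75, 1.0)
```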


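Combine p and r into a single F-score: a weighted harmonic mean

$$F = \frac{1}{\alpha \frac{1}{p} + (1 - \alpha) \frac{1}{r}} = \frac{(\beta^2 + 1)\, p\, r}{\beta^2 p + r}$$

where $\alpha \in (0, 1]$ and $\beta^2 = \frac{1 - \alpha}{\alpha}$

With equal weights ($\alpha = \frac{1}{2}$, so $\beta = 1$):

$$F_{\beta=1} = \frac{2pr}{p + r}$$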
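A quick numerical check that the α form and the β form agree, as a Python sketch; the precision and recall values are arbitrary illustrative numbers and the function names are hypothetical:

```python
import math

def f_score_alpha(p, r, alpha):
    """Weighted harmonic mean of p and r: 1 / (alpha * (1/p) + (1 - alpha) * (1/r))."""
    return 1 / (alpha / p + (1 - alpha) / r)

def f_score_beta(p, r, beta):
    """The same quantity written with beta^2 = (1 - alpha) / alpha."""
    b2 = beta ** 2
    return (b2 + 1) * p * r / (b2 * p + r)

p, r = 0.5, 0.25   # illustrative precision and recall
for alpha in (0.2, 0.5, 0.8):
    beta = math.sqrt((1 - alpha) / alpha)
    # Both forms should agree up to floating-point rounding
    assert math.isclose(f_score_alpha(p, r, alpha), f_score_beta(p, r, beta))

# alpha = 1/2 gives beta = 1, i.e. F_1 = 2pr / (p + r)
print(f_score_beta(p, r, 1))   # 0.333... (= 0.25 / 0.75)
```

Smaller α (larger β) puts more weight on recall; larger α puts more weight on precision.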


Who provides the “oracle” to validate answers?



Holdout sets

Exclude a random sample of training data

Build classifier on remaining data, check answers on holdout set

Suitable if we have a large volume of training data
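A minimal Python sketch of the holdout procedure, assuming labelled data as `(features, label)` pairs. The 20% holdout fraction, the fixed seed, and the `majority_classifier` stand-in for a real learner are all illustrative assumptions:

```python
import random

def holdout_split(data, holdout_fraction=0.2, seed=0):
    """Randomly exclude `holdout_fraction` of the data; return (train, holdout)."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * holdout_fraction)
    return shuffled[cut:], shuffled[:cut]

def majority_classifier(train):
    """Hypothetical stand-in for a real learner: always predict the most common label."""
    labels = [y for _, y in train]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority

# Hypothetical labelled data: (feature, label) pairs, 25% "Yes"
data = [(i, "Yes" if i % 4 == 0 else "No") for i in range(100)]
train, holdout = holdout_split(data)
classify = majority_classifier(train)
holdout_accuracy = sum(classify(x) == y for x, y in holdout) / len(holdout)
print(len(train), len(holdout), holdout_accuracy)
```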




Cross validation

Systematically exclude 1/n of training data

Build classifier on remaining data and check answers on excluded set

Repeat n times to span entire training data

Aggregate the scores obtained
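The cross-validation loop can be sketched in Python as follows. Assumptions: folds are formed by taking every n-th item (shuffle first if the data is sorted by class), `train_and_score` is a hypothetical callback standing in for “build a classifier and score it”, and the scores are aggregated by a simple mean:

```python
def cross_validate(data, n, train_and_score):
    """n-fold cross validation: each fold is excluded once; returns the mean score.

    `train_and_score(train, test)` is a hypothetical callback that builds a
    classifier on `train` and returns its score (e.g. accuracy) on `test`.
    """
    if not 2 <= n <= len(data):
        raise ValueError("need 2 <= n <= len(data)")
    folds = [data[i::n] for i in range(n)]   # n disjoint folds covering all data
    scores = []
    for i in range(n):
        test = folds[i]
        train = [item for j, fold in enumerate(folds) if j != i for item in fold]
        scores.append(train_and_score(train, test))
    return sum(scores) / n

def majority_accuracy(train, test):
    """Illustrative learner: predict the majority training label, report accuracy."""
    labels = [y for _, y in train]
    majority = max(set(labels), key=labels.count)
    return sum(y == majority for _, y in test) / len(test)

data = [(i, "Yes" if i % 4 == 0 else "No") for i in range(100)]
print(cross_validate(data, 5, majority_accuracy))  # 0.75
```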

Overfitting

Model is too specific

Tailored to fit anomalies in training data

Performs suboptimally on general data

More formally, there is another classifier such that:

The current classifier beats the other one on the training data, but the other one is better on unseen data

Overfitting . . .

Synthetic data, two classes, 0.75 yes and 0.25 no

Blindly saying yes has 0.25 error

Decision tree has 119 nodes, 0.35 error!

Overfitting . . .

Two classes, Pr(Y) = p and Pr(N) = 1 − p, with p > 0.5

Always choose the majority class Y: error is 1 − p

Assign each item class Y with probability p, N with probability 1 − p

Errors on Y: p(1 − p)
Errors on N: (1 − p)p
Total error 2p(1 − p) > 1 − p, since p > 0.5!
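Plugging in the 0.75/0.25 split of the synthetic data, the two strategies give errors of 0.25 and 0.375. This Python sketch computes both exactly and adds a Monte Carlo check; the seed and trial count are arbitrary choices:

```python
import random

def majority_error(p):
    """Always predict the majority class Y: wrong exactly on the N items."""
    return 1 - p

def random_assignment_error(p):
    """Predict Y with probability p, N with probability 1 - p, independently of the item."""
    return p * (1 - p) + (1 - p) * p   # = 2p(1 - p)

p = 0.75   # the synthetic data: 0.75 yes, 0.25 no
print(majority_error(p), random_assignment_error(p))   # 0.25 0.375

# Monte Carlo check (illustrative; seed fixed for reproducibility)
rng = random.Random(42)
trials = 100_000
errors = 0
for _ in range(trials):
    actual = "Y" if rng.random() < p else "N"
    guess = "Y" if rng.random() < p else "N"
    errors += actual != guess
print(errors / trials)   # close to 0.375
```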

Overfitting . . .

Prune the tree

Top-down: stop expanding tree if information gain drops below a threshold

Bottom-up:

Remove the children of a node if the estimated error across the children is more than the estimated error of the node on its own

Estimate error using holdout data
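The bottom-up rule with holdout estimates is in the style of what is often called reduced-error pruning. Below is a Python sketch over a hypothetical `Node` class (items are dicts of attribute values; each node stores its majority training label). It follows the slide's rule literally: collapse a node only when its subtree makes more holdout errors than the node's own label would:

```python
class Node:
    """Hypothetical decision-tree node; a leaf when `feature` is None."""
    def __init__(self, label, feature=None, children=None):
        self.label = label              # majority training label at this node
        self.feature = feature          # attribute tested here (None for a leaf)
        self.children = children or {}  # attribute value -> child Node

    def classify(self, x):
        if self.feature is None or x[self.feature] not in self.children:
            return self.label
        return self.children[x[self.feature]].classify(x)

def errors(node, data):
    """Number of (item, label) pairs in `data` that the subtree misclassifies."""
    return sum(node.classify(x) != y for x, y in data)

def prune(node, holdout):
    """Prune children first (bottom-up), then collapse this node to a leaf if
    its subtree makes more holdout errors than its own label alone."""
    if node.feature is None:
        return node
    for value, child in list(node.children.items()):
        reaching = [(x, y) for x, y in holdout if x[node.feature] == value]
        node.children[value] = prune(child, reaching)
    as_leaf_errors = sum(y != node.label for _, y in holdout)
    if errors(node, holdout) > as_leaf_errors:
        return Node(node.label)
    return node

# Hypothetical tree: the split on "b" under a = "y" fits a training anomaly
tree = Node("D", "a", {
    "y": Node("R", "b", {"y": Node("R"), "n": Node("D")}),
    "n": Node("D"),
})
holdout = [({"a": "y", "b": "y"}, "R"), ({"a": "y", "b": "n"}, "R"),
           ({"a": "y", "b": "n"}, "R"), ({"a": "n", "b": "y"}, "D")]
pruned = prune(tree, holdout)
print(pruned.children["y"].feature, errors(pruned, holdout))  # None 0
```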

Overfitting . . .

Party affiliation of US legislators based on voting pattern

Overfitting . . .

Party affiliation of US legislators based on voting pattern, after pruning
