ROC curve estimation

Index

• Introduction to ROC

• ROC curve

• Area under ROC curve

• Visualization using ROC curve

ROC curve

• ROC stands for Receiver Operating Characteristic.

• It is used widely in biomedical applications like radiology and imaging.

• Another important use is assessing classifiers in machine learning.

Example situation

• Consider a diagnostic test for a disease.

• The test has 2 possible outcomes: positive or negative.

• The next slides use this example to explain the terms used in ROC curves.

Data distribution available

[Figure: distributions of the test result for patients with the disease and patients without the disease; horizontal axis: Test Result.]

Threshold

[Figure: a threshold on the Test Result axis splits the patients into two groups: call the patients on one side “negative” and the patients on the other side “positive”.]

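Some definitions ...

[Figures: four panels over the Test Result axis, each showing the "without the disease" and "with the disease" distributions with one region highlighted.]

• True Positives: patients with the disease who are called “positive”.

• False Positives: patients without the disease who are called “positive”.

• True negatives: patients without the disease who are called “negative”.

• False negatives: patients with the disease who are called “negative”.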
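The four outcomes can be counted directly once a threshold is chosen. The sketch below is only an illustration: the function name `count_outcomes`, the sample data, and the convention that a test result at or above the threshold is called "positive" are assumptions, not taken from the slides.

```python
def count_outcomes(scores, has_disease, threshold):
    """Count (TP, FP, TN, FN), calling a result >= threshold "positive" (assumed convention)."""
    tp = fp = tn = fn = 0
    for score, diseased in zip(scores, has_disease):
        called_positive = score >= threshold
        if called_positive and diseased:
            tp += 1      # diseased patient called positive
        elif called_positive and not diseased:
            fp += 1      # healthy patient called positive
        elif not called_positive and not diseased:
            tn += 1      # healthy patient called negative
        else:
            fn += 1      # diseased patient called negative
    return tp, fp, tn, fn

# Hypothetical test results for six patients (not data from the slides).
scores = [0.1, 0.3, 0.4, 0.6, 0.7, 0.9]
has_disease = [False, False, True, False, True, True]

counts = count_outcomes(scores, has_disease, threshold=0.5)
print(counts)
```

Moving the threshold trades one kind of error for the other: lowering it turns some false negatives into true positives, but also turns some true negatives into false positives.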

Confusion Matrix

• For a two-class problem, the confusion matrix is a matrix with two rows and two columns.

• Call the confusion matrix CMat. Its entries are laid out so that rows correspond to the predicted class and columns to the true class.

• CMat[1][1] = True Positives and CMat[1][2] = False Positives.

• Similarly, CMat[2][1] = False Negatives and CMat[2][2] = True Negatives.
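As a small illustration of this layout, the sketch below builds CMat as a nested Python list. The helper name `confusion_matrix` and the counts are hypothetical. Python indices start at 0, so the slide's CMat[1][1] corresponds to `cmat[0][0]` here.

```python
def confusion_matrix(tp, fp, fn, tn):
    """Build a 2x2 matrix: rows = predicted class, columns = true class."""
    return [
        [tp, fp],  # predicted positive: truly positive, truly negative
        [fn, tn],  # predicted negative: truly positive, truly negative
    ]

# Hypothetical counts, chosen only to show where each entry lands.
cmat = confusion_matrix(tp=40, fp=30, fn=60, tn=70)

for row in cmat:
    print(row)
```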

2-class Confusion Matrix

• Reduce the 4 numbers to two rates:

true positive rate = TP = (#TP)/(#P)

false positive rate = FP = (#FP)/(#N)

• Rates are independent of class ratio*

                     Predicted class
True class           positive     negative
positive (#P)        #TP          #P - #TP
negative (#N)        #FP          #N - #FP
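The two rates, and the sense in which they do not depend on the class ratio, can be sketched as follows. The function names and counts are hypothetical. The second data set has ten times as many negatives, with the classifier assumed to behave the same way within each class.

```python
def rates(n_tp, n_p, n_fp, n_n):
    """Return (true positive rate, false positive rate) from the counts."""
    return n_tp / n_p, n_fp / n_n

def accuracy(n_tp, n_p, n_fp, n_n):
    # Correct decisions: true positives plus true negatives (#N - #FP).
    return (n_tp + (n_n - n_fp)) / (n_p + n_n)

# Hypothetical counts: 100 positives and 100 negatives.
balanced = rates(40, 100, 30, 100)
# Same per-class behaviour, but ten times as many negatives.
skewed = rates(40, 100, 300, 1000)

print(balanced, skewed)
print(accuracy(40, 100, 30, 100), accuracy(40, 100, 300, 1000))
```

Both data sets give the same pair of rates, while accuracy changes with the class ratio.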

Comparing classifiers using Confusion Matrix

             Predicted
True         pos    neg
pos          60     40
neg          20     80

             Predicted
True         pos    neg
pos          70     30
neg          50     50

             Predicted
True         pos    neg
pos          40     60
neg          30     70

Classifier 1: TP = 0.4, FP = 0.3
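The rates for each of the three tables on this slide can be recomputed with a few lines of Python. This is a sketch: the name `tp_fp_rates` is hypothetical, and the tables are listed in the order they appear here, without assuming which classifier number belongs to which table.

```python
# Each table as ((#TP, #FN), (#FP, #TN)): true class in rows, predicted class in columns.
tables = [
    ((60, 40), (20, 80)),
    ((70, 30), (50, 50)),
    ((40, 60), (30, 70)),
]

def tp_fp_rates(table):
    """Return (TP rate, FP rate) for one 2x2 table."""
    (tp, fn), (fp, tn) = table
    return tp / (tp + fn), fp / (fp + tn)

points = [tp_fp_rates(table) for table in tables]
for tp_rate, fp_rate in points:
    print(f"TP = {tp_rate:.1f}, FP = {fp_rate:.1f}")
```

The last table reproduces the stated TP = 0.4, FP = 0.3. It has both a lower TP rate and a higher FP rate than the first table, so on these numbers the first table's classifier is better on both counts.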



Interpretations from the Confusion matrix

• The following metrics can be calculated from the confusion matrix and used to evaluate a classifier. Here TP, FP, TN and FN are counts, not rates.

• Accuracy = (TP+TN)/(TP+FP+TN+FN)

• Precision = TP/(TP+FP)

• Recall = TP/(TP+FN)

• F-Score = 2*recall*precision/(recall + precision)
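A minimal sketch of these four metrics, assuming TP, FP, TN and FN are counts. The function name `metrics` is hypothetical, the example counts are taken from one of the tables on the previous slide, and the sketch does not guard against zero denominators.

```python
def metrics(tp, fp, tn, fn):
    """Return (accuracy, precision, recall, F-score) from the four counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)   # fraction of predicted positives that are correct
    recall = tp / (tp + fn)      # fraction of actual positives that are found
    f_score = 2 * recall * precision / (recall + precision)
    return accuracy, precision, recall, f_score

# Counts from the table with 60/40 in the positive row and 20/80 in the negative row.
acc, prec, rec, f1 = metrics(tp=60, fp=20, tn=80, fn=40)
print(acc, prec, rec, round(f1, 4))
```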

ROC curve

[Figure: ROC curve. Vertical axis: True Positive Rate (sensitivity), 0% to 100%. Horizontal axis: False Positive Rate (1-specificity), 0% to 100%.]

ROC curve comparison

[Figure: two ROC curves, one for "a good test" and one for "a poor test". Axes: True Positive Rate versus False Positive Rate, each from 0% to 100%.]
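An ROC curve can be estimated from scored examples by sweeping the threshold from high to low and recording the (False Positive Rate, True Positive Rate) pair at each step. The sketch below is one simple way to do this, not necessarily the procedure behind the plots here: `roc_points` and the sample data are hypothetical, and a higher score is assumed to mean "more likely positive".

```python
def roc_points(scores, labels):
    """Return ROC points (FPR, TPR), from (0, 0) to (1, 1), by lowering the threshold."""
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    # Visit examples from the highest score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]  # threshold above every score: nothing is called positive
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Examples with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

# Hypothetical scores and true labels (1 = positive, 0 = negative).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]

curve = roc_points(scores, labels)
for fpr, tpr in curve:
    print(f"FPR = {fpr:.2f}, TPR = {tpr:.2f}")
```

A curve that rises quickly towards the top-left corner corresponds to the "good test" case, while a curve close to the diagonal corresponds to the "poor test" case.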

Area under ROC curve (AUC)

• Overall measure of test performance

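• Two tests can be compared by the difference between their (estimated) AUCs.

• For continuous data, AUC is equivalent to the Mann-Whitney U-statistic divided by the product of the two sample sizes (a nonparametric test of difference in location between two populations).

• In machine learning, AUC summarizes how well a classifier ranks positives above negatives over all thresholds; it is not the accuracy at any single threshold.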
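The link between AUC and the Mann-Whitney U-statistic can be checked numerically. The sketch below computes AUC in two ways: as the fraction of positive/negative pairs ranked correctly (U divided by the number of pairs, ties counted as one half), and as the trapezoidal area under a list of ROC points. The function names, the sample scores, and the hand-listed ROC points are assumptions made for illustration.

```python
def auc_pairwise(scores, labels):
    """AUC as U / (n_pos * n_neg), where U counts correctly ordered pairs."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    u = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                u += 1.0   # positive ranked above negative
            elif p == n:
                u += 0.5   # a tie counts as half
    return u / (len(pos) * len(neg))

def auc_trapezoid(points):
    """Area under ROC points (FPR, TPR) sorted by increasing FPR."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Hypothetical scores and labels (1 = positive, 0 = negative).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
# ROC points for the same data, listed by hand from a threshold sweep.
points = [(0, 0), (0, 1/3), (0, 2/3), (1/3, 2/3), (1/3, 1), (2/3, 1), (1, 1)]

auc_u = auc_pairwise(scores, labels)
auc_area = auc_trapezoid(points)
print(round(auc_u, 4), round(auc_area, 4))
```

Both routes give 8/9 for this data. A diagonal ROC curve, `[(0, 0), (1, 1)]`, gives an area of 0.5, the value expected from a classifier that ranks at random.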

AUC for ROC curves

[Figure: four ROC curves (True Positive Rate versus False Positive Rate, each axis from 0% to 100%) with AUC = 50%, AUC = 90%, AUC = 65% and AUC = 100%.]

Further Evaluation methods

• ROC curve based visualization

• Plotting the ROC curve is a useful way to evaluate a classifier, because it shows the trade-off between true positive rate and false positive rate at every threshold.

• Tools like Matlab, Weka and Orange support visualization of the ROC curve.

• ROCR, an R package, is one such tool for ROC visualization.
