© 2009, KAISER PERMANENTE CENTER FOR HEALTH RESEARCH Receiver Operating Characteristic (ROC) Curves Assessing the predictive properties of a test statistic – Decision Theory

Upload: adela-gaines

Post on 16-Dec-2015



Binary Prediction Problem: Conceptual Framework

Suppose we have a test statistic for predicting the presence or absence of disease.

                   True Disease Status
Test Criterion     Pos     Neg     Total
Pos                TP      FP      TP+FP
Neg                FN      TN      FN+TN
Total              P       N       P+N

TP = True Positive
FP = False Positive
FN = False Negative
TN = True Negative

Binary Prediction Problem: Test Properties

Accuracy = Probability that the test yields a correct result

= (TP+TN) / (P+N)

Binary Prediction Problem: Test Properties

Sensitivity = Probability that a true case will test positive

= TP / P

Also referred to as True Positive Rate (TPR) or True Positive Fraction (TPF).

Binary Prediction Problem: Test Properties

Specificity = Probability that a true negative will test negative

= TN / N

Also referred to as True Negative Rate (TNR) or True Negative Fraction (TNF).

Binary Prediction Problem: Test Properties

1-Specificity = Probability that a true negative will test positive

= FP / N

Also referred to as False Positive Rate (FPR) or False Positive Fraction (FPF).

Binary Prediction Problem: Test Properties

Positive Predictive Value (PPV)

= Probability that a person with a positive test truly has the disease

= TP / (TP+FP)

Binary Prediction Problem: Test Properties

Negative Predictive Value (NPV)

= Probability that a person with a negative test is truly disease free

= TN / (TN+FN)

Binary Prediction Problem: Example

                   True Disease Status
Test Criterion     Pos     Neg     Total
Pos                 27     173      200
Neg                 73     727      800
Total              100     900     1000

Se = 27/100 = .27

Sp = 727/900 = .81

FPF = 1 - Sp = .19

Acc = (27+727)/1000 = .75

PPV = 27/200 = .14

NPV = 727/800 = .91
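The arithmetic in this example can be reproduced with a short Python sketch. The function name `diagnostic_properties` and its argument names are illustrative choices, not part of the original slides; the cell counts are the ones from the example table.

```python
def diagnostic_properties(tp, fp, fn, tn):
    """Compute the test properties defined earlier from the four 2x2 cell counts."""
    p = tp + fn  # column total P: people who truly have the disease
    n = fp + tn  # column total N: people who are truly disease free
    return {
        "Se": tp / p,                # sensitivity (TPR)
        "Sp": tn / n,                # specificity (TNR)
        "FPF": fp / n,               # 1 - specificity (FPR)
        "Acc": (tp + tn) / (p + n),  # accuracy
        "PPV": tp / (tp + fp),       # positive predictive value
        "NPV": tn / (tn + fn),       # negative predictive value
    }


# Cell counts taken from the example table
props = diagnostic_properties(tp=27, fp=173, fn=73, tn=727)
for name, value in props.items():
    print(f"{name} = {value:.3f}")
```

Printing three decimals shows the unrounded values behind the slide's two-decimal figures; for instance, PPV is 27/200 = 0.135, which the slide reports as .14.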

Binary Prediction Problem: Test Properties

Of these properties, only Se and Sp (and hence FPR) are considered invariant test characteristics.

Accuracy, PPV, and NPV will vary according to the underlying prevalence of disease.

Se and Sp are thus “fundamental” test properties and hence are the most useful measures for comparing different test criteria, even though PPV and NPV are probably the most clinically relevant properties.
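A small sketch can make the dependence on prevalence concrete. It holds Se and Sp fixed at the values from the earlier example (27/100 and 727/900) and recomputes PPV and NPV at several prevalences. The function name `predictive_values` and the prevalences 0.01 and 0.50 are hypothetical choices for illustration; 0.10 matches the example table (100 cases out of 1000).

```python
def predictive_values(se, sp, prevalence):
    """Return (PPV, NPV) implied by a fixed Se and Sp at a given prevalence."""
    tp = se * prevalence              # expected fraction of true positives
    fn = (1 - se) * prevalence        # expected fraction of false negatives
    tn = sp * (1 - prevalence)        # expected fraction of true negatives
    fp = (1 - sp) * (1 - prevalence)  # expected fraction of false positives
    return tp / (tp + fp), tn / (tn + fn)


se, sp = 27 / 100, 727 / 900  # Se and Sp from the example table

# Prevalence 0.10 matches the example; 0.01 and 0.50 are hypothetical
results = {prev: predictive_values(se, sp, prev) for prev in (0.01, 0.10, 0.50)}
for prev, (ppv, npv) in results.items():
    print(f"prevalence={prev:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

With Se and Sp unchanged, PPV rises and NPV falls as prevalence increases, which is the sense in which only Se and Sp are treated as properties of the test itself.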

ROC Curves

Now assume that our test statistic is no longer binary, but takes on a series of values (for instance how many of five distinct risk factors a person exhibits).

Clinically we make a rule that says the test is positive if the number of risk factors meets or exceeds some threshold (#RF ≥ x).

Suppose our previous table resulted from using x = 4.

Let’s see what happens as we vary x.

ROC Curves: Impact of using a threshold of 3 or more RFs

                   True Disease Status
Test Criterion     Pos     Neg     Total
Pos                 45     200      245
Neg                 55     700      755
Total              100     900     1000

Se = 45/100 = .45 (was .27)

Sp = 700/900 = .78 (was .81)

FPF = 1 - Sp = .22

Acc = (45+700)/1000 = .745 (was .754)

PPV = 45/245 = .18 (was .14)

NPV = 700/755 = .93 (was .91)

Se increases, Sp decreases, and interestingly both PPV and NPV increase.

ROC Curves: Summary of all possible options

Threshold   TPR    FPR
6           0.00   0.00
5           0.10   0.11
4           0.27   0.19
3           0.45   0.22
2           0.73   0.27
1           0.98   0.80
0           1.00   1.00

As we relax our threshold for defining “disease,” our true positive rate (sensitivity) increases, but so does the false positive rate (FPR).

The ROC curve is a way to visually display this information.
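As a rough sketch of how such a table can be built, the code below sweeps the threshold x from 6 down to 0 and applies the rule "test positive if #RF ≥ x" at each step. The `scores` and `diseased` lists are hypothetical toy data (4 cases and 8 controls), not the data behind the table, and the function name `roc_points` is an illustrative choice.

```python
def roc_points(scores, diseased, thresholds):
    """Return (threshold, TPR, FPR) for the rule 'positive if score >= threshold'."""
    p = sum(diseased)       # number of true cases
    n = len(diseased) - p   # number of true non-cases
    points = []
    for x in thresholds:
        tp = sum(1 for s, d in zip(scores, diseased) if d and s >= x)
        fp = sum(1 for s, d in zip(scores, diseased) if not d and s >= x)
        points.append((x, tp / p, fp / n))
    return points


# Hypothetical toy data: number of risk factors (0-5) and true disease status
scores = [5, 4, 3, 1, 4, 2, 2, 1, 1, 0, 0, 0]
diseased = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

points = roc_points(scores, diseased, thresholds=range(6, -1, -1))
for x, tpr, fpr in points:
    print(f"threshold={x}  TPR={tpr:.3f}  FPR={fpr:.3f}")
```

The strictest threshold classifies nobody as positive (TPR = FPR = 0) and the most relaxed one classifies everybody as positive (TPR = FPR = 1), so every curve built this way runs from the lower left to the upper right corner.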

ROC Curves: Summary of all possible options

[Figure: ROC curve (TPR against FPR) for the thresholds tabulated above, with points labeled x=5, x=4, and x=2, and a diagonal reference line.]

The diagonal line shows what we would expect from simple guessing (i.e., pure chance).

What might an even better ROC curve look like?

ROC Curves: Summary of a more optimal curve

Threshold   TPR    FPR
6           0.00   0.00
5           0.10   0.01
4           0.77   0.02
3           0.90   0.03
2           0.95   0.04
1           0.99   0.40
0           1.00   1.00

Note the immediate sharp rise in sensitivity. Perfect accuracy is represented by the upper left corner.

ROC Curves: Use and interpretation

The ROC curve allows us to see, in a simple visual display, how sensitivity and specificity vary as our threshold varies.

The shape of the curve also gives us some visual clues about the overall strength of association between the underlying test statistic (in this case #RFs that are present) and disease status.

ROC Curves: Use and interpretation

The ROC methodology easily generalizes to test statistics that are continuous (such as lung function or a blood gas). We simply fit a smoothed ROC curve through all observed data points.

ROC Curves: Use and interpretation

See demo from www.anaesthetist.com/mnm/stats/roc/index.htm

ROC Curves: Area under the curve (AUC)

The total area of the grid represented by an ROC curve is 1, since both TPR and FPR range from 0 to 1.

The portion of this total area that falls below the ROC curve is known as the area under the curve, or AUC.
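One simple way to approximate this area, sketched here under the assumption that adjacent ROC points are joined by straight lines, is the trapezoid rule. The (FPR, TPR) pairs are taken from the two threshold tables shown earlier; the function and variable names are illustrative.

```python
def auc_trapezoid(points):
    """Area under a piecewise-linear ROC curve given (FPR, TPR) points."""
    pts = sorted(points)  # order by FPR, then TPR
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2  # area of one trapezoid
    return area


# (FPR, TPR) pairs from the risk-factor table, thresholds 6 down to 0
roc_first = [(0.00, 0.00), (0.11, 0.10), (0.19, 0.27), (0.22, 0.45),
             (0.27, 0.73), (0.80, 0.98), (1.00, 1.00)]
# (FPR, TPR) pairs from the "more optimal" table
roc_better = [(0.00, 0.00), (0.01, 0.10), (0.02, 0.77), (0.03, 0.90),
              (0.04, 0.95), (0.40, 0.99), (1.00, 1.00)]

auc_first = auc_trapezoid(roc_first)
auc_better = auc_trapezoid(roc_better)
auc_chance = auc_trapezoid([(0.0, 0.0), (1.0, 1.0)])  # diagonal line
print(f"first curve:  {auc_first:.3f}")
print(f"better curve: {auc_better:.3f}")
print(f"diagonal:     {auc_chance:.3f}")
```

Under this straight-line assumption the first curve gives an area of about 0.71 and the sharper curve about 0.97, while the diagonal gives exactly 0.5.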

Area Under the Curve (AUC): Interpretation

The AUC serves as a quantitative summary of the strength of association between the underlying test statistic and disease status.

An AUC of 1.0 would mean that the test statistic could be used to perfectly discriminate between cases and controls.

An AUC of 0.5 (reflected by the diagonal 45° line) is equivalent to simply guessing.

Area Under the Curve (AUC): Interpretation

The AUC can be shown to equal the Mann-Whitney U statistic divided by the product of the numbers of cases and controls (equivalently, a rescaling of the Wilcoxon rank-sum statistic), which is used to test whether the test measure differs for individuals with and without disease.

It also equals the probability that the value of our test measure would be higher for a randomly chosen case than for a randomly chosen control, with ties counted as one half.
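The probability interpretation can be checked directly by comparing every case with every control. The sketch below does this for hypothetical toy scores (4 cases, 8 controls); the function name `mann_whitney_u` is an illustrative choice, and ties are scored as one half, which is a common convention assumed here.

```python
def mann_whitney_u(case_scores, control_scores):
    """Count case-control pairs where the case scores higher (ties count 1/2)."""
    u = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                u += 1.0
            elif c == k:
                u += 0.5
    return u


# Hypothetical toy data: number of risk factors for cases and for controls
cases = [5, 4, 3, 1]
controls = [4, 2, 2, 1, 1, 0, 0, 0]

u = mann_whitney_u(cases, controls)
auc = u / (len(cases) * len(controls))  # U rescaled to the 0-1 range
print(f"U = {u}, AUC = {auc}")
```

Dividing U by the number of case-control pairs turns the count into the proportion of pairs in which the case has the higher value, which is the AUC.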

Area Under the Curve (AUC): Interpretation

[Figure: ROC curve (TPR against FPR, each axis from 0 to 1) with AUC ≈ 0.54, shown with an accompanying plot labeled cases and controls.]

Area Under the Curve (AUC): Interpretation

[Figure: ROC curve (TPR against FPR, each axis from 0 to 1) with AUC ≈ .95, shown with an accompanying plot labeled cases and controls.]

Area Under the Curve (AUC): Interpretation

What defines a “good” AUC?

Opinions vary.

Probably context specific.

What may be a good AUC for predicting COPD may be very different from what is a good AUC for predicting prostate cancer.

Area Under the Curve (AUC): Interpretation

From http://gim.unmc.edu/dxtests/roc3.htm:

.90-1.0 = excellent
.80-.90 = good
.70-.80 = fair
.60-.70 = poor
.50-.60 = fail

Remember that <.50 is worse than guessing!

Area Under the Curve (AUC): Interpretation

From www.childrens-mercy.org/stats/ask/roc.asp:

.97-1.0 = excellent
.92-.97 = very good
.75-.92 = good
.50-.75 = fair

ROC Curves: Comparing multiple ROC curves

Suppose we have two candidate test statistics to use to create a binary decision rule. Can we use ROC curves to choose an optimal one?

ROC Curves: Comparing multiple ROC curves

Adapted from curves at: http://gim.unmc.edu/dxtests/roc3.htm

ROC Curves: Comparing multiple ROC curves

http://en.wikipedia.org/wiki/Receiver_operating_characteristic

ROC Curves: Comparing multiple ROC curves

We can formally compare AUCs for two competing test statistics, but does this answer our question?

The AUC speaks to which measure, as a continuous variable, best discriminates between cases and controls.

It does not tell us which specific cutpoint to use, or even which test statistic will ultimately provide the “best” cutpoint.

ROC Curves: Choosing an optimal cutpoint

The choice of a particular Se and Sp should reflect the relative costs of FP and FN results.

What if a positive test triggers an invasive procedure?

What if the disease is life threatening and I have an inexpensive and effective treatment?

How do you balance these and other competing factors?

See excellent discussion of these issues at www.anaesthetist.com/mnm/stats/roc/index.htm
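One possible way to make this trade-off explicit, offered only as a sketch, is to assign a cost to each false negative and each false positive and pick the threshold with the lowest total expected cost. The cost values below are hypothetical, as are the function names; the thresholds, TPR, and FPR come from the risk-factor table, and P = 100 and N = 900 come from the earlier example.

```python
def expected_cost(tpr, fpr, p, n, cost_fn, cost_fp):
    """Total cost of the false negatives and false positives at one threshold."""
    false_negatives = (1 - tpr) * p
    false_positives = fpr * n
    return cost_fn * false_negatives + cost_fp * false_positives


def best_threshold(table, p, n, cost_fn, cost_fp):
    """Return the threshold whose (TPR, FPR) pair minimizes the expected cost."""
    best = min(table, key=lambda row: expected_cost(row[1], row[2], p, n,
                                                    cost_fn, cost_fp))
    return best[0]


# (threshold, TPR, FPR) rows from the risk-factor table
table = [(6, 0.00, 0.00), (5, 0.10, 0.11), (4, 0.27, 0.19), (3, 0.45, 0.22),
         (2, 0.73, 0.27), (1, 0.98, 0.80), (0, 1.00, 1.00)]

# Hypothetical costs: a missed case is ten times as costly as a false alarm
costly_miss = best_threshold(table, p=100, n=900, cost_fn=10, cost_fp=1)
# Hypothetical costs: both kinds of error are equally costly
equal_costs = best_threshold(table, p=100, n=900, cost_fn=1, cost_fp=1)
print(costly_miss, equal_costs)
```

With these made-up costs, weighting missed cases heavily favors a relaxed threshold (2 or more RFs), while equal costs at 10% prevalence favor never calling the test positive. This illustrates why the cutpoint cannot be chosen from the ROC curve alone.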

ROC Curves: Generalizations

These techniques can be applied to any binary outcome. It doesn’t have to be disease status.

In fact, ROC curves were first introduced during WWII in response to the challenge of how to accurately identify enemy planes on radar screens.

ROC Curves: Final cautionary notes

We assume throughout the existence of a gold standard for measuring “disease,” when in practice no such gold standard exists.

Examples: COPD, asthma, even cancer (can we truly rule out cancer in a given patient?).

As a result, even Se and Sp may not be inherently stable test characteristics, but may vary depending on how we define disease and the clinical context in which it is measured.

Are we evaluating the test in the general population or only among patients referred to a specialty clinic?

The degree to which P and N are incorrectly specified will differ between these two settings.