roc curves studentslev3

Upload: ashish-pandey

Post on 03-Apr-2018


TRANSCRIPT

  • 7/28/2019 ROC Curves Studentslev3

    1/43

    2009, KAISER PERMANENTE CENTER FOR HEALTH RESEARCH

    Receiver Operating Characteristic (ROC) Curves

    Assessing the predictive properties of a test statistic

    Decision Theory

    Binary Prediction Problem: Conceptual Framework

    Suppose we have a test statistic for predicting the presence or absence of disease.

                          True Disease Status
                          Pos     Neg
    Test Criterion  Pos
                    Neg


    TP = True Positive (test positive, disease truly present)


    FP = False Positive (test positive, disease truly absent)


    FN = False Negative (test negative, disease truly present)


    TN = True Negative (test negative, disease truly absent)

    Putting the four cells together, with column totals P = TP + FN and N = FP + TN:

                          True Disease Status
                          Pos     Neg
    Test Criterion  Pos   TP      FP
                    Neg   FN      TN
                    Total P       N       P + N


    Binary Prediction Problem: Test Properties

    Accuracy = Probability that the test yields a correct result
             = (TP + TN) / (P + N)

    Sensitivity = Probability that a true case will test positive
                = TP / P

    Also referred to as True Positive Rate (TPR) or True Positive Fraction (TPF).

    Specificity = Probability that a true negative will test negative
                = TN / N

    Also referred to as True Negative Rate (TNR) or True Negative Fraction (TNF).

    1 - Specificity = Probability that a true negative will test positive
                    = FP / N

    Also referred to as False Positive Rate (FPR) or False Positive Fraction (FPF).

    Positive Predictive Value (PPV) = Probability that a person who tests positive truly has disease
                                    = TP / (TP + FP)

    Negative Predictive Value (NPV) = Probability that a person who tests negative is truly disease free
                                    = TN / (TN + FN)

    Binary Prediction Problem: Example

                          True Disease Status
                          Pos     Neg     Total
    Test Criterion  Pos    27     173       200
                    Neg    73     727       800
                    Total 100     900      1000

    Se  = 27/100 = .27
    Sp  = 727/900 = .81
    FPF = 1 - Sp = .19
    Acc = (27 + 727)/1000 = .75
    PPV = 27/200 = .14
    NPV = 727/800 = .91
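The arithmetic in this example can be checked with a few lines of Python. This is only a minimal sketch: the function name `diagnostic_properties` and the dictionary keys are illustrative choices, not part of the original slides. The cell counts are the ones in the example table.

```python
def diagnostic_properties(tp, fp, fn, tn):
    """Compute the test properties defined above from the four cell counts."""
    p = tp + fn  # all truly diseased (column total P)
    n = fp + tn  # all truly disease free (column total N)
    return {
        "Se": tp / p,                # sensitivity (TPR)
        "Sp": tn / n,                # specificity (TNR)
        "FPF": fp / n,               # 1 - specificity (FPR)
        "Acc": (tp + tn) / (p + n),  # accuracy
        "PPV": tp / (tp + fp),       # positive predictive value
        "NPV": tn / (tn + fn),       # negative predictive value
    }


# Cell counts from the example table (test positive at 4 or more risk factors).
props = diagnostic_properties(tp=27, fp=173, fn=73, tn=727)
for name, value in props.items():
    print(f"{name} = {value:.3f}")
```

Rounded to two decimals, the printed values should agree with the figures listed in the example.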

    Binary Prediction Problem: Test Properties

    Of these properties, only Se and Sp (and hence FPR) are considered invariant test characteristics. Accuracy, PPV, and NPV will vary according to the underlying prevalence of disease.

    Se and Sp are thus fundamental test properties and hence are the most useful measures for comparing different test criteria, even though PPV and NPV are probably the most clinically relevant properties.
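To see how PPV and NPV move with prevalence while Se and Sp stay fixed, the sketch below applies Bayes' rule to the Se and Sp from the earlier example. The function name `predictive_values` and the prevalence values 1%, 10%, and 50% are assumptions chosen for illustration; only the 10% case corresponds to the example table (100 cases out of 1000).

```python
def predictive_values(se, sp, prevalence):
    """Return (PPV, NPV) for a test with fixed Se and Sp at a given prevalence."""
    tp = se * prevalence              # fraction of everyone who is a true positive
    fn = (1 - se) * prevalence        # false negatives
    fp = (1 - sp) * (1 - prevalence)  # false positives
    tn = sp * (1 - prevalence)        # true negatives
    return tp / (tp + fp), tn / (tn + fn)


se, sp = 27 / 100, 727 / 900  # Se and Sp from the example table

# Illustrative prevalence values; 0.10 matches the example table.
for prevalence in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(se, sp, prevalence)
    print(f"prevalence = {prevalence:.2f}  PPV = {ppv:.3f}  NPV = {npv:.3f}")
```

With the same Se and Sp, PPV rises and NPV falls as prevalence increases, which is why neither is treated as an invariant property of the test.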

    ROC Curves

    Now assume that our test statistic is no longer binary, but takes on a series of values (for instance, how many of five distinct risk factors a person exhibits).

    Clinically we make a rule that says the test is positive if the number of risk factors meets or exceeds some threshold (#RF ≥ x).

    Suppose our previous table resulted from using x = 4. Let's see what happens as we vary x.

    ROC Curves: Impact of using a threshold of 3 or more RFs

                          True Disease Status
                          Pos     Neg     Total
    Test Criterion  Pos    45     200       245
                    Neg    55     700       755
                    Total 100     900      1000

    Se  = 45/100 = .45
    Sp  = 700/900 = .78
    FPF = 1 - Sp = .22
    Acc = (45 + 700)/1000 = .745 (about .75)
    PPV = 45/245 = .18
    NPV = 700/755 = .93

    Compared with the threshold of 4 (Se = .27, Sp = .81, Acc = .75, PPV = .14, NPV = .91), Se rises, Sp falls, and interestingly both PPV and NPV rise.

    ROC Curves: Summary of all possible options

    Threshold   TPR    FPR
    6           0.00   0.00
    5           0.10   0.11
    4           0.27   0.19
    3           0.45   0.22
    2           0.73   0.27
    1           0.98   0.80
    0           1.00   1.00

    As we relax our threshold for defining disease, our true positive rate (sensitivity) increases, but so does the false positive rate (FPR).

    The ROC curve is a way to visually display this information.
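A table like this one can be produced by sweeping the threshold over the observed scores. The sketch below is illustrative only: the function name `roc_points` and the two small lists of risk-factor counts are hypothetical and are not the data behind the slides.

```python
def roc_points(case_scores, control_scores, thresholds):
    """Return (threshold, TPR, FPR) for the rule 'test positive if score >= threshold'."""
    points = []
    for x in thresholds:
        tpr = sum(s >= x for s in case_scores) / len(case_scores)
        fpr = sum(s >= x for s in control_scores) / len(control_scores)
        points.append((x, tpr, fpr))
    return points


# Hypothetical numbers of risk factors (0 to 5) for 10 cases and 10 controls.
cases = [5, 4, 4, 3, 3, 2, 2, 2, 1, 1]
controls = [4, 3, 2, 2, 1, 1, 1, 0, 0, 0]

# Sweep from the strictest threshold (6: nobody is positive) to the loosest (0: everyone is).
table = roc_points(cases, controls, thresholds=range(6, -1, -1))
print("Threshold  TPR   FPR")
for x, tpr, fpr in table:
    print(f"{x:<9}  {tpr:.2f}  {fpr:.2f}")
```

Plotting TPR against FPR for each row gives the ROC curve; both rates can only stay the same or increase as the threshold is relaxed.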

    [Figure: ROC curve (TPR versus FPR) for the thresholds above, with the points for x=5, x=4, and x=2 labeled.]

    The diagonal line shows what we would expect from simple guessing (i.e., pure chance).

    What might an even better ROC curve look like?

    ROC Curves: Summary of a more optimal curve

    Threshold   TPR    FPR
    6           0.00   0.00
    5           0.10   0.01
    4           0.77   0.02
    3           0.90   0.03
    2           0.95   0.04
    1           0.99   0.40
    0           1.00   1.00

    Note the immediate sharp rise in sensitivity. Perfect accuracy is represented by the upper left corner.

    ROC Curves: Use and interpretation

    The ROC curve allows us to see, in a simple visual display, how sensitivity and specificity vary as our threshold varies.

    The shape of the curve also gives us some visual clues about the overall strength of association between the underlying test statistic (in this case, the number of RFs that are present) and disease status.

    The ROC methodology easily generalizes to test statistics that are continuous (such as lung function or a blood gas). We simply fit a smoothed ROC curve through all observed data points.

    See demo from www.anaesthetist.com/mnm/stats/roc/index.htm

    ROC Curves: Area under the curve (AUC)

    The total area of the grid represented by an ROC curve is 1, since both TPR and FPR range from 0 to 1.

    The portion of this total area that falls below the ROC curve is known as the area under the curve, or AUC.
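One simple way to approximate the AUC from a handful of (FPR, TPR) points is the trapezoidal rule, which joins neighboring points with straight lines. This is a sketch under that assumption and is not the smoothed-curve fit mentioned earlier. The function name `auc_trapezoid` is illustrative, and the two point lists are copied from the two threshold tables in the slides.

```python
def auc_trapezoid(points):
    """Approximate the area under an ROC curve given (FPR, TPR) points."""
    pts = sorted(points)  # order by FPR so the curve runs from (0, 0) to (1, 1)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2  # one trapezoid per pair of neighboring points
    return area


# (FPR, TPR) pairs from the "all possible options" table.
first_curve = [(0.00, 0.00), (0.11, 0.10), (0.19, 0.27), (0.22, 0.45),
               (0.27, 0.73), (0.80, 0.98), (1.00, 1.00)]

# (FPR, TPR) pairs from the "more optimal curve" table.
better_curve = [(0.00, 0.00), (0.01, 0.10), (0.02, 0.77), (0.03, 0.90),
                (0.04, 0.95), (0.40, 0.99), (1.00, 1.00)]

print(f"AUC, first curve:  {auc_trapezoid(first_curve):.3f}")
print(f"AUC, better curve: {auc_trapezoid(better_curve):.3f}")
```

Under this straight-line approximation the first curve gives an AUC of roughly 0.71 and the more optimal curve roughly 0.97.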

    Area Under the Curve (AUC): Interpretation

    The AUC serves as a quantitative summary of the strength of association between the underlying test statistic and disease status.

    An AUC of 1.0 would mean that the test statistic could be used to perfectly discriminate between cases and controls.

    An AUC of 0.5 (reflected by the diagonal 45° line) is equivalent to simply guessing.

    The AUC can be shown to equal the Mann-Whitney U statistic divided by the number of case-control pairs (equivalently, a rescaled Wilcoxon rank-sum statistic), for testing whether the test measure differs for individuals with and without disease.

    It also equals the probability that the value of our test measure would be higher for a randomly chosen case than for a randomly chosen control.
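The probability interpretation can be computed directly by comparing every case with every control. The sketch below is a minimal illustration: the function name `auc_pairwise` and the score lists are hypothetical, and ties are counted as one half, a common convention for discrete scores that the slides do not spell out.

```python
def auc_pairwise(case_scores, control_scores):
    """Fraction of case-control pairs in which the case scores higher (ties count 1/2)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    # 'wins' is the Mann-Whitney U count for the cases; dividing by the
    # number of pairs turns it into a probability between 0 and 1.
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical numbers of risk factors for 10 cases and 10 controls.
cases = [5, 4, 4, 3, 3, 2, 2, 2, 1, 1]
controls = [4, 3, 2, 2, 1, 1, 1, 0, 0, 0]

print(f"AUC = {auc_pairwise(cases, controls):.2f}")
```

For these hypothetical scores the result is 0.76, the same value the trapezoidal area gives for the ROC points built from the same two lists.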

    [Figure: distributions of the test measure for cases and controls, with the resulting ROC curve (TPR versus FPR); AUC ~ 0.54]

    [Figure: distributions of the test measure for cases and controls, with the resulting ROC curve (TPR versus FPR); AUC ~ .95]

    What defines a good AUC?

    Opinions vary.

    Probably context specific: what may be a good AUC for predicting COPD may be very different from what is a good AUC for predicting prostate cancer.

    http://gim.unmc.edu/dxtests/roc3.htm

    .90-1.0 = excellent
    .80-.90 = good
    .70-.80 = fair
    .60-.70 = poor
    .50-.60 = fail

    Remember that

    www.childrens-mercy.org/stats/ask/roc.asp

    .97-1.0 = excellent
    .92-.97 = very good
    .75-.92 = good
    .50-.75 = fair

    ROC Curves: Comparing multiple ROC curves

    Suppose we have two candidate test statistics to use to create a binary decision rule. Can we use ROC curves to choose an optimal one?


    Adapted from curves at: http://gim.unmc.edu/dxtests/roc3.htm

    http://en.wikipedia.org/wiki/Receiver_operating_characteristic

    We can formally compare AUCs for two competing test statistics, but does this answer our question?

    AUC speaks to which measure, as a continuous variable, best discriminates between cases and controls.

    It does not tell us which specific cutpoint to use, or even which test statistic will ultimately provide the best cutpoint.

    ROC Curves: Choosing an optimal cutpoint

    The choice of a particular Se and Sp should reflect the relative costs of FP and FN results.

    What if a positive test triggers an invasive procedure?

    What if the disease is life threatening and I have an inexpensive and effective treatment?

    How do you balance these and other competing factors?

    See excellent discussion of these issues at www.anaesthetist.com/mnm/stats/roc/index.htm
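One way to make the cost trade-off concrete is to assign a cost to each FN and each FP and pick the threshold with the lowest expected cost per person. This is only a sketch of that idea, not a method given in the slides: the function names, the cost values, and the 10% prevalence are illustrative assumptions, while the (threshold, TPR, FPR) rows come from the first threshold table.

```python
def expected_cost(tpr, fpr, prevalence, cost_fn, cost_fp):
    """Expected misclassification cost per person tested."""
    missed_cases = prevalence * (1 - tpr)  # false negatives per person
    false_alarms = (1 - prevalence) * fpr  # false positives per person
    return missed_cases * cost_fn + false_alarms * cost_fp


def best_threshold(roc_table, prevalence, cost_fn, cost_fp):
    """Return the threshold whose (TPR, FPR) pair has the lowest expected cost."""
    best = min(
        roc_table,
        key=lambda row: expected_cost(row[1], row[2], prevalence, cost_fn, cost_fp),
    )
    return best[0]


# (threshold, TPR, FPR) rows from the "all possible options" table.
roc_table = [(6, 0.00, 0.00), (5, 0.10, 0.11), (4, 0.27, 0.19), (3, 0.45, 0.22),
             (2, 0.73, 0.27), (1, 0.98, 0.80), (0, 1.00, 1.00)]

# Hypothetical costs in arbitrary units; prevalence of 10% is assumed.
print(best_threshold(roc_table, prevalence=0.10, cost_fn=5, cost_fp=1))
print(best_threshold(roc_table, prevalence=0.10, cost_fn=50, cost_fp=1))
```

Under these assumed costs, a missed case that is 5 times as costly as a false alarm favors a threshold of 2, while a 50-fold cost pushes the choice down to 1, trading many more false positives for fewer missed cases.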

    ROC Curves: Generalizations

    These techniques can be applied to any binary outcome. It doesn't have to be disease status.

    In fact, the use of ROC curves was first introduced during WWII in response to the challenge of how to accurately identify enemy planes on radar screens.

    ROC Curves: Final cautionary notes

    We assume throughout the existence of a gold standard for measuring disease, when in practice no such gold standard exists. COPD, asthma, even cancer (can we truly rule out the absence of cancer in a given patient?)

    As a result, even Se and Sp may not be inherently stable test characteristics, but may vary depending on how we define disease and the clinical context in which it is measured. Are we evaluating the test in the general population or only among patients referred to a specialty clinic?

    Incorrect specification of P and N will vary in these two settings.