past is prologue: limitations of statistical prediction ... · 11/22/2019  · derek j smolenski,...


Past Is Prologue: Limitations of Statistical Prediction Persist in Predictive Modeling

“Medically Ready Force…Ready Medical Force” 1

Derek J Smolenski, PhD, MPH

Epidemiologist Psychological Health Center of Excellence, Defense Health Agency

UNCLASSIFIED


Disclosure

Derek J Smolenski has nothing to disclose.

Disclosure will be made when a product is discussed for an unapproved use.

The views expressed in this presentation are those of the presenters and do not necessarily reflect the official policy or position of the Department of Defense (DoD) or the U.S. Government.

This continuing education activity is managed and accredited by AffinityCE in collaboration with AMSUS. AffinityCE and AMSUS staff as well as Planners and Reviewers, have no relevant financial or non-financial interests to disclose.

Commercial support was not received for this activity.


Objectives

Participants will be able to:

∎ Discriminate between sensitivity and positive predictive value.

∎ Identify two strategies to improve the positive predictive performance of a predictive algorithm.

∎ Explain why the low prevalence of a target outcome is detrimental to the positive predictive value.


Overview

∎ Introduction

∎ Historical perspectives

∎ Key concepts

∎ Overview of literature

∎ Simulation findings

∎ Clinical utility

∎ Summary


Introduction

∎ Death by suicide is a concern for both the US general population and the military population

∎ Rates for both groups have shown increases over time (DoDSER, 2017)

∎ Statistical models have been proposed to improve potential case identification

∎ Unclear how useful these models will be in practical application

∎ Recently reviewed by Belsher et al. (2019)


Historical Perspectives

∎ “Using empirically derived schedules to predict suicide with any clinical certainty is unlikely” (MacKinnon & Farberow, 1976; p. 91)

An instrument that has a 17-20% positive predictive accuracy could be useful

∎ Low base rate and instrumentation issues (Pokorny, 1983)

∎ Accurate assessment and clinical utility differ – for violence prediction, insufficiently accurate to sort individuals into substantively distinct risk groups (Mossman, 2000)

∎ Inaccuracies in actuarial and clinical risk assessment, and lack of evidence of meaningful clinical intervention (Undrill, 2007)


Key Concepts

∎ Accuracy: $(a + d)/N$

∎ Sensitivity (Se; Recall): $a/(Np)$

∎ Specificity (Sp): $d/[N(1 - p)]$

∎ Positive predictive value (PPV; Precision): $a/(a + b)$


| Classification | Suicide | No Suicide | Total |
|---|---|---|---|
| Positive | a | b | a+b |
| Negative | c | d | c+d |
| Total | Np | N(1-p) | N |
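The four measures defined on this slide can be computed directly from the cells of the 2x2 table. The following is a minimal sketch, not part of the original presentation; the class name `TwoByTwo` and the example counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Cells of the 2x2 table, labeled a, b, c, d as on the slide."""
    a: int  # classified positive, outcome occurred (true positive)
    b: int  # classified positive, outcome did not occur (false positive)
    c: int  # classified negative, outcome occurred (false negative)
    d: int  # classified negative, outcome did not occur (true negative)

    @property
    def n(self) -> int:
        return self.a + self.b + self.c + self.d

    def accuracy(self) -> float:
        return (self.a + self.d) / self.n

    def sensitivity(self) -> float:
        # a / (N p): the column total a + c equals N p
        return self.a / (self.a + self.c)

    def specificity(self) -> float:
        # d / (N (1 - p)): the column total b + d equals N (1 - p)
        return self.d / (self.b + self.d)

    def ppv(self) -> float:
        return self.a / (self.a + self.b)


# Hypothetical counts for illustration only (not data from the presentation).
table = TwoByTwo(a=8, b=90, c=2, d=900)
print(f"accuracy={table.accuracy():.3f}  Se={table.sensitivity():.3f}  "
      f"Sp={table.specificity():.3f}  PPV={table.ppv():.3f}")
```

In this made-up example, accuracy, sensitivity, and specificity are all high while PPV stays below 10%, which is the distinction the objectives ask participants to draw.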


Key Concepts

∎ Sensitivity and specificity depend on classification threshold

Tend to be stable across populations

∎ Predictive values heavily influenced by population prevalence in addition to classification thresholds
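The dependence of PPV on prevalence follows from Bayes' rule. The sketch below is an illustration rather than the presenter's code; the function name `ppv` and the prevalence values in the loop are assumptions, and the Se/Sp pair is one of those shown in the PPV figures.

```python
def ppv(se: float, sp: float, prevalence: float) -> float:
    """Positive predictive value from sensitivity, specificity, and prevalence."""
    true_pos = se * prevalence                    # share of population: true positives
    false_pos = (1.0 - sp) * (1.0 - prevalence)   # share of population: false positives
    return true_pos / (true_pos + false_pos)


# Same classifier (Se = 50%, Sp = 95%) applied at different prevalences.
for prev in (0.0002, 0.01, 0.10, 0.50):
    print(f"prevalence={prev:.4f}  PPV={ppv(0.50, 0.95, prev):.4f}")
```

Holding sensitivity and specificity fixed, the false-positive term dominates the denominator as prevalence shrinks, so PPV falls toward zero.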


Positive Predictive Value

[Figure: PPV (%) versus prevalence (%), prevalence from 0 to 100%. Curves: Se=30, Sp=99; Se=50, Sp=95; Se=80, Sp=50; Se=99, Sp=99. Reference line at PPV = 50%.]


Positive Predictive Value

[Figure: PPV (%) versus prevalence (%), prevalence from 0 to 2%. Curves: Se=30, Sp=99; Se=50, Sp=95; Se=80, Sp=50; Se=99, Sp=99. Reference line at PPV = 50%.]


Area Under the Receiver-Operating Characteristic Curve

[Figure: ROC curve, sensitivity (%) versus 1 - specificity (%), model versus random. AUC = 0.86. Data from Simon et al., 2018.]


Advances in Predictive Models

∎ Enhanced computing capabilities

∎ Machine-learning algorithms

∎ Intensive validation techniques


Application in the Literature

∎ Systematic literature review of suicide mortality and suicide attempt prediction models

∎ 17 prospective studies included

∎ AUC values were considered ‘good’ across models at 0.80 or above

∎ Positive predictive values were very low (<1%) in most instances.

Driven in large part by low base rate

∎ Risk predicted over set time horizons (e.g., 30 days, 90 days, 3 months, 1 year)


Simulation

∎ Used literature estimates of sensitivity and risk threshold to simulate results in different population configurations

Population risk = 200, 500, 1000, and 2000 per 1,000,000 individuals (200 per 1,000,000 is close to the annual suicide mortality rate in the US adult population [WISQARS])

Thresholds = 99th, 95th, 90th, and 50th percentile

Mean sensitivity = 0.12, 0.23, 0.44, and 0.82, corresponding to the thresholds above
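The simulation inputs above are enough to work out expected counts for each configuration. The sketch below is a deterministic approximation, not the simulation code behind the presentation. It assumes that a threshold at the k-th percentile flags the top (100 - k)% of the population and that the mean sensitivity applies exactly; the function name `simulate_counts` is hypothetical.

```python
def simulate_counts(rate_per_million, percentile, sensitivity, n=1_000_000):
    """Expected counts when everyone above `percentile` of predicted risk is flagged.

    Assumption: the percentile threshold flags the top (100 - percentile)% of
    the population, and `sensitivity` is the share of true cases among them.
    """
    cases = n * rate_per_million / 1_000_000   # individuals with the outcome
    flagged = n * (100 - percentile) / 100     # individuals classified positive
    tp = sensitivity * cases                   # cases the model flags
    fn = cases - tp                            # cases the model misses
    fp = flagged - tp                          # flagged individuals without the outcome
    return {"tp": tp, "fn": fn, "fp": fp, "ppv": tp / flagged}


thresholds = [99, 95, 90, 50]              # percentiles from the slide
sensitivities = [0.12, 0.23, 0.44, 0.82]   # mean sensitivities from the slide

for rate in (200, 500, 1000, 2000):
    for pct, se in zip(thresholds, sensitivities):
        r = simulate_counts(rate, pct, se)
        print(f"rate={rate:>4}/1M  threshold={pct}th  TP={r['tp']:.0f}  "
              f"FN={r['fn']:.0f}  FP={r['fp']:.0f}  PPV={100 * r['ppv']:.3f}%")
```

Under these assumptions, the 99th-percentile threshold at 200 per 1M flags 10,000 individuals, about 24 of whom are true positives, so PPV is roughly 0.24%.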


Results

[Figure: number of individuals (0 to 3000) by risk threshold (99th, 95th, 90th, 50th percentile). Series: True Positive, False Negative, No. Needed. Base rate = 200 per 1M.]


Results

[Figure: number of individuals (0 to 3000) by risk threshold (99th, 95th, 90th, 50th percentile). Series: True Positive, False Negative, No. Needed. Base rate = 1000 per 1M.]


Results

[Figure, two panels. Panel "AUC": ROC curve, sensitivity (%) versus 1 - specificity (%), model versus random. Panel "Precision-Recall (Saito & Rehmsmeier, 2015)": precision (PPV; %) versus recall (sensitivity; %), model versus random. Data from Simon et al., 2018.]


Results

[Figure: precision-recall curve with the precision axis limited to 0 to 1%. Precision (PPV; %) versus recall (sensitivity; %), model versus prevalence. AUC = 0.005.]
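The gap between the ROC and precision-recall views can be illustrated by mapping ROC operating points to precision at a given prevalence. This is a rough sketch under stated assumptions, not a reconstruction of the curves from Simon et al. (2018): the operating points are hypothetical, loosely based on the sensitivity and specificity pairs used elsewhere in this presentation, and the areas use simple trapezoids over a handful of points.

```python
def precision_at(tpr: float, fpr: float, prevalence: float) -> float:
    """Precision (PPV) implied by one ROC operating point at a given prevalence."""
    tp = tpr * prevalence
    fp = fpr * (1.0 - prevalence)
    return tp / (tp + fp)


def trapezoid_area(xs, ys):
    """Area under a piecewise-linear curve through the points (xs[i], ys[i])."""
    area = 0.0
    for i in range(1, len(xs)):
        area += (xs[i] - xs[i - 1]) * (ys[i] + ys[i - 1]) / 2.0
    return area


prevalence = 0.0002  # 200 per 1M
# Hypothetical ROC points as (1 - specificity, sensitivity), for illustration only.
roc_points = [(0.0, 0.0), (0.01, 0.12), (0.05, 0.23), (0.10, 0.44), (0.50, 0.82), (1.0, 1.0)]

fprs = [p[0] for p in roc_points]
tprs = [p[1] for p in roc_points]
roc_area = trapezoid_area(fprs, tprs)

# Precision is undefined at the origin (nobody flagged), so skip that point.
recalls = tprs[1:]
precisions = [precision_at(t, f, prevalence) for f, t in roc_points[1:]]
pr_area = trapezoid_area(recalls, precisions)

print(f"ROC area ~ {roc_area:.3f}, precision-recall area ~ {pr_area:.5f}")
```

A classifier with no discrimination (sensitivity equal to 1 - specificity) has precision equal to the prevalence at every threshold, which is why the prevalence line serves as the reference in a precision-recall plot.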


Results

∎ Two-stage simulation didn’t improve performance dramatically

∎ Populations with higher base rate performed better

Argues against whole-population implementation

∎ AUC estimates provided an overly positive assessment of model accuracy

∎ Models can be effective as an exclusionary measure (good negative predictive value), but not as an inclusionary measure (Streiner, 2003)


Clinical Utility

∎ Clinical utility (CU) index (Mitchell, 2011)

$CU^{+} = Se \times PPV$

$CU^{-} = Sp \times NPV$

Values <0.49 (49%) subjectively considered not useful

∎ Decision curve analysis (Steyerberg et al., 2010)

Compares various courses of action to identify best choice (net benefit)

Varies by conditional risk threshold


Clinical Utility

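| Prevalence (per 1M) | Threshold (percentile) | CU+ (%) | CU- (%) |
|---|---|---|---|
| 200 | 99th | 0.03 | 98.98 |
| 200 | 95th | 0.02 | 94.99 |
| 200 | 90th | 0.04 | 90.00 |
| 200 | 50th | 0.03 | 50.00 |
| 1000 | 99th | 0.14 | 98.92 |
| 1000 | 95th | 0.11 | 94.94 |
| 1000 | 90th | 0.20 | 89.98 |
| 1000 | 50th | 0.14 | 50.01 |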
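The clinical utility indices can be computed from the four cells of a 2x2 table. The sketch below is illustrative only. The counts are not taken from the presentation: they are derived by assuming a base rate of 200 per 1M, a 99th-percentile threshold that flags the top 1% of 1,000,000 individuals, and a sensitivity of 0.12. The function name `clinical_utility` is hypothetical.

```python
def clinical_utility(tp: float, fp: float, fn: float, tn: float):
    """Return (CU+, CU-), where CU+ = Se x PPV and CU- = Sp x NPV."""
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return se * ppv, sp * npv


# Assumed counts: 200 cases per 1M, top 1% flagged (10,000), Se = 0.12.
cu_pos, cu_neg = clinical_utility(tp=24, fp=9976, fn=176, tn=989824)
print(f"CU+ = {100 * cu_pos:.2f}%  CU- = {100 * cu_neg:.2f}%")
```

Under these assumptions the result is close to the first row of the table (CU+ of about 0.03% and CU- of about 98.98%), far below the 49% level described as the floor for usefulness on the positive side.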


Decision Curve Analysis

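$$NB = \frac{TP}{N} - \frac{FP}{N} \times \frac{p_t}{1 - p_t}$$

where NB is the net benefit and $p_t$ is the risk threshold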

∎ Assume 200 per 1M population risk and 95th percentile risk threshold

∎ Options

Treat no one

NB = 0

Treat everyone

Treat those identified by the model
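The three options can be compared numerically with the net benefit formula. The sketch below is an illustration under stated assumptions, not the analysis behind the presentation. It takes the 200 per 1M, 95th-percentile scenario, assumes the top 5% of 1,000,000 individuals are flagged, and assumes a sensitivity of 0.23 (the mean reported for that threshold in the simulation). The function names and the example risk threshold of 2000 per 1M are hypothetical choices.

```python
def net_benefit(tp: float, fp: float, n: float, pt: float) -> float:
    """Net benefit of acting on a positive classification at risk threshold pt."""
    return tp / n - (fp / n) * (pt / (1.0 - pt))


def net_benefit_treat_all(prevalence: float, pt: float) -> float:
    """Net benefit when everyone is treated: all cases are TP, all non-cases are FP."""
    return prevalence - (1.0 - prevalence) * (pt / (1.0 - pt))


n = 1_000_000
prevalence = 200 / n
tp = 0.23 * 200      # assumed: Se = 0.23 at the 95th percentile
fp = 0.05 * n - tp   # assumed: top 5% flagged, remainder are false positives
pt = 2000 / n        # example risk threshold of 2000 per 1M

options = {
    "treat no one": 0.0,
    "treat everyone": net_benefit_treat_all(prevalence, pt),
    "treat model positives": net_benefit(tp, fp, n, pt),
}
for name, nb in options.items():
    print(f"{name:>22}: {nb * n:9.1f} per 1M individuals")
```

At this example threshold both active strategies have negative net benefit under the stated assumptions, so treating no one scores highest. Treating everyone breaks even only when the risk threshold equals the prevalence.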


Decision Curve Analysis

[Figure: decision curves, net benefit (per 1M individuals, -300 to 100) versus risk threshold (per 1M individuals, 0 to 10000). Curves: Se=.10, Sp=.99; Se=.25, Sp=.95; Se=.44, Sp=.90; Se=.82, Sp=.50; Se=.50, Sp=.95; No Intervention; All Intervention. Annotation: 2000 per 1M = 500 individuals per positive case.]


Ways Ahead

∎ Consider modeling in subsets with higher base rate

∎ Improve description of accuracy

∎ Consider the interventions that follow a positive screen

How many false positives are we willing to tolerate?

How effective is any intervention?

What is the resource burden?

Opportunity costs?


References

Belsher, B., Smolenski, D., Pruitt, L., Bush, N., Beech, E. H., Workman, D., Morgan, R., . . . Skopp, N. (2019). Prediction models for suicide attempts and deaths: a systematic review and simulation. JAMA Psychiatry, 76(6), 642-651. doi:10.1001/jamapsychiatry.2019.0174

Pruitt, L., Smolenski, D., Tucker, J., Issa, F., Chodacki, J., McGraw, K., & Kennedy, C. (2019). DoDSER: Department of Defense Suicide Event Report Calendar Year 2017 Annual Report. Retrieved from https://www.pdhealth.mil/research-analytics/department-defense-suicide-event-report-dodser.

MacKinnon, D. R., & Farberow, N. L. (1976). An assessment of the utility of suicide prediction. Suicide and Life-Threatening Behavior, 6(2), 86-92.

Mitchell, A. J. (2011). Sensitivity X PPV is a recognized test called the clinical utility index (CUI+). European Journal of Epidemiology, 26, 251-252. doi:10.1007/s10654-011-9561-x

Mossman, D. (2000). Assessing the risk of violence--are "accurate" predictions useful? Journal of the American Academy of Psychiatry and the Law, 28, 272-281.

Pokorny, A. D. (1983). Prediction of suicide in psychiatric patients. Archives of General Psychiatry, 40, 249-257.


References

Saito, T., & Rehmsmeier, M. (2015). The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLOS One, 10(3). doi:10.1371/journal.pone.0118432

Simon, G. E., Johnson, E. J., Lawrence, J. M., Rossom, R. C., Ahmedani, R., Lynch, F. L., . . . Shortreed, S. M. (2018). Predicting suicide attempts and suicide deaths following outpatient visits using electronic health records. American Journal of Psychiatry, 175(10), 951-960. doi:10.1176/appi.ajp.2018.17101167

Steyerberg, E. W., Vickers, A. J., Cook, N. R., Gerds, T., Gonen, M., Obuchowski, N., . . . Kattan, M. W. (2010). Assessing the performance of prediction models: A framework for traditional and novel measures. Epidemiology, 21(1), 128-138. doi:10.1097/EDE.0b013e3181c30fb2

Streiner, D. L. (2003). Diagnosing tests: using and misusing diagnostic and screening tests. Journal of Personality Assessment, 81(3), 209-219.

Undrill, G. (2007). The risks of risk assessment. Advances in Psychiatric Treatment, 13, 291-297. doi:10.1192/apt.bp.106.003160

Vickers, A. J. (2008). Decision analysis for the evaluation of diagnostic tests, prediction models and molecular markers. American Statistician, 62(4), 314-320. doi:10.1198/000313008X370302


How to Earn CE

If you would like to earn continuing education credit for this activity, please visit:

http://amsus.cds.pesgce.com

Hurry, CE Certificates will only be available for 30 Days after this event!
