past is prologue: limitations of statistical prediction ... · 11/22/2019 · derek j smolenski,...
TRANSCRIPT
Past Is Prologue: Limitations of Statistical Prediction Persist in Predictive Modeling
“Medically Ready Force…Ready Medical Force” 1
Derek J Smolenski, PhD, MPH
Epidemiologist, Psychological Health Center of Excellence, Defense Health Agency
UNCLASSIFIED
Disclosure
Derek J Smolenski has nothing to disclose.
Disclosure will be made when a product is discussed for an unapproved use.
The views expressed in this presentation are those of the presenters and do not necessarily reflect the official policy or position of the Department of Defense (DoD) or the U.S. Government.
This continuing education activity is managed and accredited by AffinityCE in collaboration with AMSUS. AffinityCE and AMSUS staff, as well as Planners and Reviewers, have no relevant financial or non-financial interests to disclose.
Commercial support was not received for this activity.
Objectives
Participants will be able to
discriminate between sensitivity and positive predictive value.
identify two strategies to improve the positive predictive performance of a predictive algorithm.
explain why the low prevalence of a target outcome is detrimental to the positive predictive value.
Overview
∎ Introduction
∎ Historical perspectives
∎ Key concepts
∎ Overview of literature
∎ Simulation findings
∎ Clinical utility
∎ Summary
Introduction
∎ Death by suicide is a concern for both the US general population and the military population
∎ Rates for both groups have increased over time (DoDSER, 2017)
∎ Statistical models have been proposed to improve identification of potential cases
∎ It is unclear how useful these models will be in practical application
∎ Recently reviewed by Belsher et al. (2019)
Historical Perspectives
∎ “Using empirically derived schedules to predict suicide with any clinical certainty is unlikely” (MacKinnon & Farberow, 1976; p. 91)
An instrument that has a 17-20% positive predictive accuracy could be useful
∎ Low base rate and instrumentation issues (Pokorny, 1983)
∎ Accurate assessment and clinical utility differ: violence prediction was insufficiently accurate to sort individuals into substantively distinct risk groups (Mossman, 2000)
∎ Inaccuracies in actuarial and clinical risk assessment, and lack of evidence of meaningful clinical intervention (Undrill, 2007)
Key Concepts
∎ Accuracy: $(a + d)/N$
∎ Sensitivity (Se; Recall): $a/(Np)$
∎ Specificity (Sp): $d/[N(1 - p)]$
∎ Positive predictive value (PPV; Precision): $a/(a + b)$
Classification   Suicide   No Suicide   Total
Positive         a         b            a+b
Negative         c         d            c+d
Total            Np        N(1-p)       N
Key Concepts
∎ Sensitivity and specificity depend on classification threshold
Tend to be stable across populations
∎ Predictive values heavily influenced by population prevalence in addition to classification thresholds
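The dependence of PPV on prevalence follows from applying Bayes' theorem to the 2x2 table above. The following is a minimal Python sketch of that relationship; the function name `ppv` and the example values are illustrative and do not come from the presentation.

```python
def ppv(se: float, sp: float, prevalence: float) -> float:
    """Positive predictive value from sensitivity, specificity, and prevalence.

    All arguments are proportions in [0, 1]. This is a / (a + b) from the
    2x2 table, with a = Se * N * p and b = (1 - Sp) * N * (1 - p); N cancels.
    """
    true_pos = se * prevalence
    false_pos = (1.0 - sp) * (1.0 - prevalence)
    if true_pos + false_pos == 0.0:
        raise ValueError("no positive classifications; PPV is undefined")
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Illustrative values only: identical Se and Sp, falling prevalence.
    for p in (0.20, 0.01, 0.0002):
        print(f"prevalence={p:.4%}  PPV={ppv(0.99, 0.99, p):.2%}")
```

With sensitivity and specificity held fixed, only the prevalence argument changes between calls, which isolates the base-rate effect on PPV.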
Positive Predictive Value
[Figure: PPV (%) versus prevalence (%), prevalence axis 0 to 100. Legend: Se=30, Sp=99; Se=50, Sp=95; Se=80, Sp=50; Se=99, Sp=99; PPV = 50%.]
Positive Predictive Value
[Figure: PPV (%) versus prevalence (%), prevalence axis 0 to 2. Legend: Se=30, Sp=99; Se=50, Sp=95; Se=80, Sp=50; Se=99, Sp=99; PPV = 50%.]
Area Under the Receiver-Operating Characteristic Curve
[Figure: ROC curve, sensitivity (%) versus 1-Specificity (%). Legend: Model; Random. AUC = 0.86. Data from Simon et al., 2018.]
Advances in Predictive Models
∎ Enhanced computing capabilities
∎ Machine-learning algorithms
∎ Intensive validation techniques
Application in the Literature
∎ Systematic literature review of suicide mortality and suicide attempt prediction models
∎ 17 prospective studies included
∎ Across models, AUC values were considered ‘good’ (0.80 or above)
∎ Positive predictive values were very low (<1%) in most instances.
Driven in large part by low base rate
∎ Risk predicted over set time horizons (e.g., 30 or 90 days; 3 months; 1 year)
Simulation
∎ Used literature estimates of sensitivity and risk threshold to simulate results in different population configurations
Population risk = 200, 500, 1000, and 2000 per 1,000,000 individuals (200 per 1,000,000 approximates the annual suicide mortality rate in the US adult population [WISQARS])
Thresholds = 99th, 95th, 90th, and 50th percentile
Mean sensitivities = 0.12, 0.23, 0.44, and 0.82, corresponding to the thresholds above
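The configuration above can be turned into expected classification counts. The sketch below is a simplification and not the original simulation code: it treats the mean sensitivities as fixed values and assumes that a threshold at the k-th percentile flags the top (100 - k)% of the population as positive. The names `expected_counts` and `Expected` are hypothetical.

```python
from dataclasses import dataclass

POPULATION = 1_000_000
# Threshold percentile -> mean sensitivity, as listed above.
SENSITIVITY = {99: 0.12, 95: 0.23, 90: 0.44, 50: 0.82}


@dataclass
class Expected:
    flagged: float    # individuals classified positive
    true_pos: float
    false_neg: float
    false_pos: float
    ppv: float


def expected_counts(cases_per_million: float, percentile: int) -> Expected:
    """Expected counts in a population of 1,000,000.

    Assumption (not stated in the source): a threshold at the k-th
    percentile flags the top (100 - k)% of the population as positive.
    """
    cases = cases_per_million * POPULATION / 1_000_000
    flagged = POPULATION * (100 - percentile) / 100
    true_pos = SENSITIVITY[percentile] * cases
    return Expected(
        flagged=flagged,
        true_pos=true_pos,
        false_neg=cases - true_pos,
        false_pos=flagged - true_pos,
        ppv=true_pos / flagged,
    )


if __name__ == "__main__":
    for rate in (200, 500, 1000, 2000):
        for pct in (99, 95, 90, 50):
            e = expected_counts(rate, pct)
            print(f"rate={rate:>4}/1M  threshold={pct}th  "
                  f"TP={e.true_pos:7.1f}  FN={e.false_neg:7.1f}  PPV={e.ppv:.3%}")
```

Under these assumptions, a base rate of 200 per 1M with the 99th percentile threshold gives about 24 expected true positives among 10,000 flagged individuals.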
Results
[Figure: Individuals (axis 0 to 3000) by risk threshold (99th, 95th, 90th, 50th percentile). Legend: True Positive; False Negative; No. Needed. Base rate = 200 per 1M.]
Results
[Figure: Individuals (axis 0 to 3000) by risk threshold (99th, 95th, 90th, 50th percentile). Legend: True Positive; False Negative; No. Needed. Base rate = 1000 per 1M.]
Results
[Figure, two panels. AUC panel: sensitivity (%) versus 1-Specificity (%); legend: Model; Random. Precision-Recall panel (Saito & Rehmsmeier, 2015): precision (PPV; %) versus recall (sensitivity; %); legend: Model; Random. Data from Simon et al., 2018.]
Results
[Figure: Precision (PPV; %) versus recall (sensitivity; %), precision axis 0 to 1. Legend: Model; Prevalence. AUC = 0.005.]
Results
∎ Two-stage simulation did not improve performance dramatically
∎ Populations with higher base rate performed better
Argues against whole-population implementation
∎ AUC estimates provided an overly positive assessment of model accuracy
∎ Models can be effective as an exclusionary measure (good negative predictive value), but not as an inclusionary one (Streiner, 2003)
Clinical Utility
∎ Clinical utility (CU) index (Mitchell, 2011)
$CU^{+} = Se \times PPV$
$CU^{-} = Sp \times NPV$
Values <0.49 (49%) subjectively considered not useful
∎ Decision curve analysis (Steyerberg et al., 2010)
Compares various courses of action to identify best choice (net benefit)
Varies by conditional risk threshold
Clinical Utility
Prevalence (per 1M)   Threshold (percentile)   CU+ (%)   CU- (%)
200                   99th                     0.03      98.98
200                   95th                     0.02      94.99
200                   90th                     0.04      90.00
200                   50th                     0.03      50.00
1000                  99th                     0.14      98.92
1000                  95th                     0.11      94.94
1000                  90th                     0.20      89.98
1000                  50th                     0.14      50.01
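The clinical utility index can be computed directly from sensitivity, the fraction of the population flagged, and prevalence. The sketch below assumes, as a simplification, that a threshold at the k-th percentile flags the top (100 - k)% of the population and that the mean sensitivity is a fixed value; the function name `clinical_utility` is hypothetical.

```python
def clinical_utility(se: float, flagged_fraction: float, prevalence: float):
    """Return (CU+, CU-) as proportions, where CU+ = Se * PPV and CU- = Sp * NPV.

    All cell values are expressed as fractions of the total population.
    Assumption (not from the source): flagged_fraction is (100 - k) / 100
    for a threshold at the k-th percentile.
    """
    tp = se * prevalence
    fp = flagged_fraction - tp
    fn = prevalence - tp
    tn = 1.0 - flagged_fraction - fn
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    sp = tn / (tn + fp)
    return se * ppv, sp * npv


if __name__ == "__main__":
    # 200 per 1M, 99th percentile threshold, mean sensitivity 0.12.
    cu_pos, cu_neg = clinical_utility(0.12, 0.01, 200 / 1_000_000)
    print(f"CU+ = {cu_pos:.2%}, CU- = {cu_neg:.2%}")
```

Under these assumptions the example call works out to roughly CU+ = 0.03% and CU- = 98.98%, in line with the first row of the table.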
Decision Curve Analysis
$$NB = \frac{TP}{N} - \frac{FP}{N} \times \frac{p_t}{1 - p_t}$$
∎ Assume 200 per 1M population risk and 95th percentile risk threshold
∎ Options
Treat no one
NB = 0
Treat everyone
Treat those identified by the model
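The three options can be compared with the standard net benefit expression from decision curve analysis, NB = TP/N - (FP/N) x p_t/(1 - p_t), where p_t is the risk threshold. The sketch below is illustrative: the function names are hypothetical, and the model values (Se = 0.23, Sp = 0.95) are assumed from the 95th percentile scenario rather than taken from the presenter's calculations.

```python
def net_benefit(se: float, sp: float, prevalence: float, pt: float) -> float:
    """Net benefit per individual at risk threshold pt (0 < pt < 1).

    TP/N = Se * p and FP/N = (1 - Sp) * (1 - p), so N cancels out.
    """
    tp_rate = se * prevalence
    fp_rate = (1.0 - sp) * (1.0 - prevalence)
    return tp_rate - fp_rate * pt / (1.0 - pt)


def net_benefit_treat_all(prevalence: float, pt: float) -> float:
    """Treating everyone is a 'test' with Se = 1 and Sp = 0."""
    return net_benefit(1.0, 0.0, prevalence, pt)


NET_BENEFIT_TREAT_NONE = 0.0  # no true positives and no false positives

if __name__ == "__main__":
    prevalence = 200 / 1_000_000
    # Assumed model characteristics for the 95th percentile threshold.
    se, sp = 0.23, 0.95
    for pt_per_million in (100, 200, 1000, 2000):
        pt = pt_per_million / 1_000_000
        model = net_benefit(se, sp, prevalence, pt) * 1_000_000
        everyone = net_benefit_treat_all(prevalence, pt) * 1_000_000
        print(f"pt={pt_per_million:>5}/1M  model NB={model:9.2f}  "
              f"treat-all NB={everyone:9.2f}  treat-none NB=0 (per 1M)")
```

A strategy is preferred at a given threshold only if its net benefit exceeds that of the alternatives, including the zero net benefit of treating no one.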
Decision Curve Analysis
[Figure: Net benefit (per 1M individuals; axis -300 to 100) versus risk threshold (per 1M individuals; axis 0 to 10,000). Legend: Se=.10, Sp=.99; Se=.25, Sp=.95; Se=.44, Sp=.90; Se=.82, Sp=.50; Se=.50, Sp=.95; No Intervention; All Intervention. Annotation: 2000 per 1M = 500 individuals per positive case.]
Ways Ahead
∎ Consider modeling in subsets with higher base rate
∎ Improve description of accuracy
∎ Consider interventions following a positive screen
How many false positives are we willing to tolerate?
How effective is any intervention?
What is the resource burden?
Opportunity costs?
References
Belsher, B., Smolenski, D., Pruitt, L., Bush, N., Beech, E. H., Workman, D., Morgan, R., . . . Skopp, N. (2019). Prediction models for suicide attempts and deaths: A systematic review and simulation. JAMA Psychiatry, 76(6), 642-651. doi:10.1001/jamapsychiatry.2019.0174
Pruitt, L., Smolenski, D., Tucker, J., Issa, F., Chodacki, J., McGraw, K., & Kennedy, C. (2019). DoDSER: Department of Defense Suicide Event Report Calendar Year 2017 Annual Report. Retrieved from https://www.pdhealth.mil/research-analytics/department-defense-suicide-event-report-dodser.
MacKinnon, D. R., & Farberow, N. L. (1976). An assessment of the utility of suicide prediction. Suicide and Life-Threatening Behavior, 6(2), 86-92.
Mitchell, A. J. (2011). Sensitivity × PPV is a recognized test called the clinical utility index (CUI+). European Journal of Epidemiology, 26, 251-252. doi:10.1007/s10654-011-9561-x
Mossman, D. (2000). Assessing the risk of violence: Are "accurate" predictions useful? Journal of the American Academy of Psychiatry and the Law, 28, 272-281.
Pokorny, A. D. (1983). Prediction of suicide in psychiatric patients. Archives of General Psychiatry, 40, 249-257.
Saito, T., & Rehmsmeier, M. (2015). The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLOS One, 10(3). doi:10.1371/journal.pone.0118432
Simon, G. E., Johnson, E. J., Lawrence, J. M., Rossom, R. C., Ahmedani, R., Lynch, F. L., . . . Shortreed, S. M. (2018). Predicting suicide attempts and suicide deaths following outpatient visits using electronic health records. American Journal of Psychiatry, 175(10), 951-960. doi:10.1176/appi.ajp.2018.17101167
Steyerberg, E. W., Vickers, A. J., Cook, N. R., Gerds, T., Gonen, M., Obuchowski, N., . . . Kattan, M. W. (2010). Assessing the performance of prediction models: A framework for traditional and novel measures. Epidemiology, 21(1), 128-138. doi:10.1097/EDE.0b013e3181c30fb2
Streiner, D. L. (2003). Diagnosing tests: using and misusing diagnostic and screening tests. Journal of Personality Assessment, 81(3), 209-219.
Undrill, G. (2007). The risks of risk assessment. Advances in Psychiatric Treatment, 13, 291-297. doi:10.1192/apt.bp.106.003160
Vickers, A. J. (2008). Decision analysis for the evaluation of diagnostic tests, prediction models and molecular markers. American Statistician, 62(4), 314-320. doi:10.1198/000313008X370302
How to Earn CE
If you would like to earn continuing education credit for this activity, please visit:
http://amsus.cds.pesgce.com
Hurry, CE Certificates will only be available for 30 Days after this event!