comparison of the c-statistic with new model discriminators in the prediction of long versus short...

Comparison of the C-statistic with new model discriminators in the prediction of long versus short hospital stay
Richard J Woodman (1), Campbell H Thompson (2), Susan W Kim (1), Paul Hakendorf (3)
1 Flinders Centre for Epidemiology and Biostatistics, Flinders University, Adelaide
2 Discipline of General Medicine, Adelaide University, Adelaide
3 Redesigning Care, Flinders Medical Centre, Adelaide
2011 Australia and New Zealand Stata Users Group meeting, 17th September 2011

Usefulness of new predictors
Meaningful new risk predictors
  Traditionally rely on the concordance statistic (C-statistic / area under the ROC curve) for assessing the usefulness of new predictive measures
C-statistic
  Measures overall test/model accuracy (sensitivity/specificity)
  A weighted average of sensitivity over all possible cut-points
    Weighted by the pdf of non-events
    High sensitivities (low cut-points) have high weights
  Probability interpretation: the probability of assigning a greater risk to a randomly selected patient with the event than to a randomly selected patient without the event, i.e. $P(\hat{p}_{\text{event}} > \hat{p}_{\text{non-event}})$ for a random pair
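The probability interpretation can be checked directly by counting pairs. The following is a minimal Python sketch (not the Stata code used in the talk); the function name `c_statistic` and the predicted probabilities are hypothetical, and ties are assumed to count as one half.

```python
from itertools import product


def c_statistic(event_probs, nonevent_probs):
    """Share of (event, non-event) pairs in which the event subject has the
    higher predicted probability; tied pairs are assumed to count as 0.5."""
    score = 0.0
    for p_event, p_nonevent in product(event_probs, nonevent_probs):
        if p_event > p_nonevent:
            score += 1.0
        elif p_event == p_nonevent:
            score += 0.5
    return score / (len(event_probs) * len(nonevent_probs))


# Hypothetical predicted probabilities, not taken from the study data.
events = [0.80, 0.65, 0.40]
nonevents = [0.70, 0.35, 0.20]

c = c_statistic(events, nonevents)
print(round(c, 4))
```

In this toy example 7 of the 9 possible pairs are concordant, so the C-statistic is 7/9.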

Receiver Operating Characteristic (ROC) curve
[Figure: ROC curve, true positive rate against false positive rate, labelled by predicted p]
Interpretation of an increase in the C-statistic: the increase in the probability that a random event subject will have a higher predicted p than a random non-event subject.
Usually small once a few good predictors are included in the model.

New risk reclassification measures
Clinicians want to know whether an added predictor will change risk such that they should treat patients differently
Can we better quantify improvement in risk prediction from new biomarkers?
  Net Reclassification Improvement (NRI)
  Integrated Discrimination Improvement (IDI)
  Pencina, D'Agostino et al., Statist. Med. 2008; 27:157-172.
How do they differ from the C-statistic?
How and when should we be using them?

Net Reclassification Improvement
NRI can be calculated as the sum of two separate components: one for individuals with events and the other for individuals without events
For events, assign 1 for upward reclassification, -1 for downward reclassification and 0 for people who do not change risk category
The opposite is done for non-events
Sum the individual scores within each group and divide by the number of people in that group
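The scoring rule above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the user-written Stata command: the function names, the data, and the convention that a probability equal to a cut-point falls in the upper category are all hypothetical choices.

```python
def category(p, cuts):
    """Index of the risk category for p, given ascending cut-points.
    Assumption: a value equal to a cut-point belongs to the upper category."""
    return sum(p >= c for c in cuts)


def categorical_nri(y, p_old, p_new, cuts):
    """Return (event NRI, non-event NRI, overall NRI)."""
    n_event = n_nonevent = 0
    event_score = nonevent_score = 0
    for yi, po, pn in zip(y, p_old, p_new):
        move = category(pn, cuts) - category(po, cuts)
        step = (move > 0) - (move < 0)  # +1 up, -1 down, 0 unchanged
        if yi == 1:
            n_event += 1
            event_score += step      # upward moves are good for events
        else:
            n_nonevent += 1
            nonevent_score -= step   # downward moves are good for non-events
    event_nri = event_score / n_event
    nonevent_nri = nonevent_score / n_nonevent
    return event_nri, nonevent_nri, event_nri + nonevent_nri


# Hypothetical data: 4 events followed by 4 non-events, one cut-point at 0.5.
y = [1, 1, 1, 1, 0, 0, 0, 0]
p_old = [0.45, 0.55, 0.60, 0.30, 0.55, 0.40, 0.45, 0.20]
p_new = [0.55, 0.45, 0.70, 0.52, 0.45, 0.35, 0.60, 0.10]

event_nri, nonevent_nri, nri = categorical_nri(y, p_old, p_new, cuts=[0.5])
print(event_nri, nonevent_nri, nri)
```

Here two events move up and one moves down (event NRI = 0.25), while one non-event moves down and one moves up (non-event NRI = 0), so the overall NRI is 0.25.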

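Category-free NRI
Calculate p1 and p2 (old model = p1, new model = p2); "up" means p2 > p1 and "down" means p2 < p1
$\text{Event NRI} = P(\text{up} \mid \text{event}) - P(\text{down} \mid \text{event})$
$\text{Non-event NRI} = P(\text{down} \mid \text{non-event}) - P(\text{up} \mid \text{non-event})$
NRI = Event NRI + Non-event NRI (Pencina 2008)
Or NRI (Pencina 2010)
Or wNRI (Pencina 2010)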
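A minimal Python sketch of the category-free calculation, assuming any change in predicted probability counts as a move; the function name and the data are hypothetical and only the simple sum of the two components (the 2008 form) is shown.

```python
def category_free_nri(y, p_old, p_new):
    """Return (event NRI, non-event NRI, overall NRI) without risk categories."""
    up_e = down_e = n_e = 0
    up_n = down_n = n_n = 0
    for yi, po, pn in zip(y, p_old, p_new):
        if yi == 1:
            n_e += 1
            up_e += pn > po
            down_e += pn < po
        else:
            n_n += 1
            up_n += pn > po
            down_n += pn < po
    event_nri = (up_e - down_e) / n_e        # P(up | event) - P(down | event)
    nonevent_nri = (down_n - up_n) / n_n     # P(down | non-event) - P(up | non-event)
    return event_nri, nonevent_nri, event_nri + nonevent_nri


# Hypothetical data: 4 events followed by 4 non-events.
y = [1, 1, 1, 1, 0, 0, 0, 0]
p_old = [0.45, 0.55, 0.60, 0.30, 0.55, 0.40, 0.45, 0.20]
p_new = [0.55, 0.45, 0.70, 0.52, 0.45, 0.35, 0.60, 0.10]

event_nri, nonevent_nri, nri = category_free_nri(y, p_old, p_new)
print(event_nri, nonevent_nri, nri)
```

With these values three of four events move up and three of four non-events move down, giving 0.5 for each component and 1.0 overall; the overall measure can range from -2 to 2.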

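Integrated Discrimination Improvement (IDI)
Absolute IDI: difference in discrimination slopes (the discrimination slope is the mean difference in p between events and non-events).
$\text{IDI} = (\bar{p}_{2E} - \bar{p}_{2NE}) - (\bar{p}_{1E} - \bar{p}_{1NE}) = (\bar{p}_{2E} - \bar{p}_{1E}) - (\bar{p}_{2NE} - \bar{p}_{1NE})$
$\text{Relative IDI} = (\bar{p}_{2E} - \bar{p}_{2NE}) / (\bar{p}_{1E} - \bar{p}_{1NE})$
Subscripts 1 and 2 denote the old and new model; E and NE denote events and non-events.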
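The two IDI formulas amount to four group means. A minimal Python sketch follows, with hypothetical data and function name; the relative IDI is computed as the plain ratio of slopes, exactly as written on the slide.

```python
from statistics import mean


def idi(y, p_old, p_new):
    """Return (absolute IDI, relative IDI) from discrimination slopes."""
    old_e = mean(p for yi, p in zip(y, p_old) if yi == 1)
    old_ne = mean(p for yi, p in zip(y, p_old) if yi == 0)
    new_e = mean(p for yi, p in zip(y, p_new) if yi == 1)
    new_ne = mean(p for yi, p in zip(y, p_new) if yi == 0)
    slope_old = old_e - old_ne   # discrimination slope, old model
    slope_new = new_e - new_ne   # discrimination slope, new model
    return slope_new - slope_old, slope_new / slope_old


# Hypothetical data: 4 events followed by 4 non-events.
y = [1, 1, 1, 1, 0, 0, 0, 0]
p_old = [0.45, 0.55, 0.60, 0.30, 0.55, 0.40, 0.45, 0.20]
p_new = [0.55, 0.45, 0.70, 0.52, 0.45, 0.35, 0.60, 0.10]

absolute_idi, relative_idi = idi(y, p_old, p_new)
print(round(absolute_idi, 3), round(relative_idi, 3))
```

The old slope is 0.475 - 0.400 = 0.075 and the new slope is 0.555 - 0.375 = 0.180, so the absolute IDI is 0.105 and the ratio of slopes is 2.4.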

Recent example JACC 2011; 58(10): 1025-33. August 2011

Veeranna et al.

Category-dependent NRI

NRI Am J Epidemiology 174 (5); June 27, 2011

Stratified versus unstratified NRI
[Table residue: unstratified NRI (noncases, cases) and stratified NRI by quartile (Q1-Q4; noncases, cases). Values shown: 0.085, 0.088, 0.003; 0.055, 0.053, -0.002; -0.01 (0.016); 0.72]
Statistical testing: Z-score for discordance, analogous to McNemar's test.

Predicting length of hospital stay
Short-stay wards are necessary because of bed shortages in specialist wards
But incorrectly assigning patients to short-stay wards
  Would overfill short-stay units
  Would prevent correct treatment for long-stay patients
Clinicians are trained to diagnose and treat, not to predict length of stay
Few variables beyond age appear informative

Dataset
3 major hospitals: FMC, RGH, Auckland
N = 1457 general medical patients
Complete data on:
  Age
  SBP
  HR
  RR
  Mobility
  WBC count
  Cardiac failure (CF)
  Need for supplementary oxygen (SuO2)
All previously collected for predicting outcome
Modified Early Warning Score (MEWS)
  Used by Emergency Medical Services to quickly determine risk of death
  SBP, HR, RR, Temperature

Statistical Analysis
Logistic regression model for predicting p: P(long stay)
Scaling using 2 Stata commands:
  lintrend (Joanne Garrett, Univ North Carolina)
  fracpoly (Patrick Royston)
Calibration
  HL deciles and LR tests
Measures of discrimination
  C-statistic
  IDI
  Category-dependent NRI
    50% cut-off
    57% cut-off
  Category-free NRI

Stata lintrend command
lintrend longstay age, round(10) plot(log) xlab ylab
[Plot: log odds of long stay against age]

Stata lintrend command
lintrend longstay wbc, round(1) plot(log) xlab ylab
[Plot: log odds of long stay against WBC count]

Fracpoly WBC
. fracpoly logistic longstay wbc, table compare
........
-> gen double Iwbc__1 = X^.5-.9876731667 if e(sample)
-> gen double Iwbc__2 = X^.5*ln(X)+.0245010876 if e(sample)
   (where: X = wbc/10)

Logistic regression                               Number of obs   =       1457
                                                  LR chi2(2)      =      49.38
                                                  Prob > chi2     =     0.0000
Log likelihood = -971.8662                        Pseudo R2       =     0.0248

------------------------------------------------------------------------------
    longstay | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     Iwbc__1 |   .0040704   .0076682    -2.92   0.003     .0001014    .1633818
     Iwbc__2 |   34.78284   33.17947     3.72   0.000     5.362915    225.5948
------------------------------------------------------------------------------
Deviance: 1943.73. Best powers of wbc among 44 models fit: .5 .5.

Fractional polynomial model comparisons:
---------------------------------------------------------------
wbc               df    Deviance    Dev. dif.   P (*)   Powers
---------------------------------------------------------------
Not in model       0    1993.113     49.380     0.000
Linear             1    1954.819     11.087     0.011   1
m = 1              2    1949.234      5.502     0.064   2
m = 2              4    1943.732       --        --     .5 .5
---------------------------------------------------------------
(*) P-value from deviance difference comparing reported model with m = 2 model

             Odds ratio   95% CI      P-value
Age (yrs)    1.07         1.04-1.10

Calibration
number of observations         =   1457
number of groups               =     10
Hosmer-Lemeshow chi2(8)        =  14.66
Prob > chi2                    =   0.07

number of observations         =   1457
number of groups               =      5
Hosmer-Lemeshow chi2(3)        =   5.64
Prob > chi2                    =   0.13

number of observations         =   1457
number of covariate patterns   =   1457
Pearson chi2(1445)             = 1486.69
Prob > chi2                    =   0.22
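For readers unfamiliar with the grouped calibration test, the statistic behind the first two outputs can be sketched as follows. This is a simplified Python illustration, not Stata's implementation: the function name and data are hypothetical, groups are formed by equal counts after sorting on predicted probability, and the p-value step (a chi-square with groups - 2 degrees of freedom, matching chi2(8) for 10 groups and chi2(3) for 5) is left out.

```python
def hosmer_lemeshow_stat(y, p, groups=10):
    """Hosmer-Lemeshow chi-square: sum over groups of
    (observed - expected)^2 / (n_g * mean_p_g * (1 - mean_p_g)).
    Assumes no group has a mean predicted probability of exactly 0 or 1."""
    pairs = sorted(zip(p, y))  # order subjects by predicted probability
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        observed = sum(yi for _, yi in chunk)   # observed events in group
        expected = sum(pi for pi, _ in chunk)   # expected events in group
        mean_p = expected / size
        stat += (observed - expected) ** 2 / (size * mean_p * (1 - mean_p))
    return stat


# Hypothetical data: two risk groups of four subjects each.
p = [0.25] * 4 + [0.75] * 4
y = [1, 1, 0, 0, 1, 1, 1, 0]

hl = hosmer_lemeshow_stat(y, p, groups=2)
print(round(hl, 4))
```

In the toy data the low-risk group has 2 observed events against 1 expected, contributing 1/0.75, while the high-risk group matches its expectation exactly, so the statistic is 4/3.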

C-statistic
* Compare Age with Age + Heart rate using roccomp
quietly logistic longstay age
predict p1 if e(sample), p
quietly logistic longstay c.age##c.hrby10
predict p2 if e(sample), p
roccomp longstay p1 p2

                              ROC                    -Asymptotic Normal--
                   Obs       Area     Std. Err.      [95% Conf. Interval]
-------------------------------------------------------------------------
p1                1457     0.7167       0.0136        0.69000     0.74338
p2                1457     0.7433       0.0131        0.71767     0.76897
-------------------------------------------------------------------------
Ho: area(p1) = area(p2)
    chi2(1) =    15.68       Prob>chi2 =   0.0001

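ROC curves
[Figure: ROC curves for the two models]
Age: area under ROC = 0.717
Age + heart rate: area under ROC = 0.743
Increase in $P(\hat{p}_{\text{event}} > \hat{p}_{\text{non-event}})$ for a random pair: ~2.5%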

Sensitivity and Specificity
Improved sensitivity only at high cut-points.
The C-statistic weights large sensitivities more heavily.
This may be why improvements in sensitivity from later predictors don't translate into an increased C-statistic.

Predicted probabilities
[Figure: distributions of predicted probabilities]
Distribution of probabilities shifts lower
Distribution of probabilities flattens

Stata NRI command
User written
Author: Liisa Byberg, Department of Surgical Sciences, Orthopedics unit, and Uppsala Clinical Research Center, Uppsala University, Sweden
Type: net from http://www.ucr.uu.se/sv/images/stories/downloads
Syntax
  nri1 depvar varlist1, prvars(varlist2) cut(#)
  nri2 depvar varlist1, prvars(varlist2) cut(# #)
  nri3 depvar varlist1, prvars(varlist2) cut(# # #)

nri1 heart rate (probability cut-point = 50)
nri1 longstay age, prvars(hrby10 agehrby10) cut(50)

------------------------------------------------------------------
      NRI |    Estimate    Std. Err.          Z     P-value
----------+-------------------------------------------------------
          |     0.05170     0.01792     2.88484     0.00392
------------------------------------------------------------------

Rows: established risk factors. Columns: established risk factors + new predictors.
-----------------------------------------
longstay = 1 |    <50%    >=50%    Total
-------------+---------------------------
        <50% |     108       63      171
       >=50% |      36      620      656
       Total |     144      683      827
-------------+---------------------------
longstay = 0 |    <50%    >=50%    Total
-------------+---------------------------
        <50% |     294       29      323
       >=50% |      41      266      307
       Total |     335      295      630
-----------------------------------------

               Reclassified downward   Reclassified upward   Upward - Downward   NRI      P-value
longstay = 1   36/827 (0.0435)         63/827 (0.0762)       (0.0327)
longstay = 0   41/630 (0.0651)         29/630 (0.0460)       (-0.0190)
Overall                                                                          0.0517   0.004

NRI = 0.0327 - (-0.0190) = 0.0517
$SE = \sqrt{(0.0762 + 0.0435)/827 + (0.0460 + 0.0651)/630} = 0.0179$
z = 0.0517/0.0179 = 2.88 (McNemar asymptotic test for correlated proportions)
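The arithmetic behind this output can be reproduced from the six reclassification counts on the slide (63 up and 36 down among 827 long-stay patients; 29 up and 41 down among 630 short-stay patients). The Python sketch below is an illustration, not the nri1 source code; the function name is hypothetical, the standard error is the square root of the summed variance terms, and a two-sided normal p-value is assumed.

```python
from math import erfc, sqrt


def nri_from_counts(up_e, down_e, n_e, up_n, down_n, n_n):
    """NRI, standard error, z and two-sided p-value from reclassification counts."""
    nri = (up_e - down_e) / n_e + (down_n - up_n) / n_n
    # Variance terms: (P(up) + P(down)) / n in each group.
    se = sqrt((up_e + down_e) / n_e ** 2 + (up_n + down_n) / n_n ** 2)
    z = nri / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided normal tail probability
    return nri, se, z, p_value


# Counts read from the reclassification output above.
nri, se, z, p_value = nri_from_counts(up_e=63, down_e=36, n_e=827,
                                      up_n=29, down_n=41, n_n=630)
print(f"NRI={nri:.5f} SE={se:.5f} z={z:.5f} p={p_value:.5f}")
```

The results agree with the reported estimate (0.05170), standard error (0.01792) and Z (2.88484) to the precision shown.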

Stata IDI command
Syntax
  idi depvar varlist1, prvars(varlist2)
idi longstay age, prvars(hrby10 agehrby10)

----------------------------------------------------
      IDI |    Estimate    Std. Err.    P-value
----------+-----------------------------------------
          |     0.04195     0.00525     0.00000
----------------------------------------------------

Definition:
$\text{IDI} = (IS_2 - IS_1) - (IP_2 - IP_1)$
$\text{IDI} = (\bar{p}_2 - \bar{p}_1)_{\text{events}} - (\bar{p}_2 - \bar{p}_1)_{\text{non-events}}$
IS = (integrated) sensitivity
IP = (integrated) 1 - specificity

Predicted probabilities and the IDI
IDI interpretation: improvement in average sensitivity plus any potential decrease in average (1 - specificity).
Magnitude is hard to interpret.
Some studies also present relative IDI (%).

C-statistic and IDI
[Figure: two panels, C-statistic and IDI, for each added variable: HR, Mobility, BP, WBC, RR, CCF, Supp_O2]

NRI50 and NRI57
[Figure: two panels, NRI50 and NRI57, for each added variable: HR, Mobility, BP, WBC, RR, CCF, Supp_O2]
Effect of each variable on re-classification depends on the classification cut-point
Small changes in the chosen cut-point can have large influences

Overall category-free NRI
[Figure: overall category-free NRI for each added variable: HR, Mobility, BP, WBC, RR, CCF, Supp_O2]
Interpretation: proportion of subjects with movement of p in the correct direction, averaged for event and non-event subjects.

Category-free event NRI and category-free non-event NRI
[Figure: two panels, event NRI and non-event NRI, for each added variable: HR, Mobility, BP, WBC, RR, CCF, Supp_O2]
Event NRI, Pr(p is higher) - Pr(p is lower): mostly poorer re-classification
Non-event NRI, Pr(p is lower) - Pr(p is higher): consistently improved re-classification
Interpretation: net movement of p in the correct direction, for event and non-event subjects separately.

[Figure: two panels for each added variable: HR, Mobility, BP, WBC, RR, CCF, Supp_O2]
Proportion of long-stay patients whose p went up: mostly < 50% with each new variable
Proportion of short-stay patients whose p went down: consistently > 50% with each new variable

Summary
IDI
  Mirrored the C-statistic but was more sensitive.
  Equally weights sensitivity across cut-points, whereas the C-statistic weights large sensitivities more heavily.
Category-dependent NRI
  The variables selected were heavily dependent on the chosen cut-points
  Fewer variables identified as important discriminators than for the C-statistic, the IDI or the category-free NRI.
Category-free NRI
  Overall, quite similar results to the C-statistic and IDI
  Very different performances amongst the short-stay and long-stay patients

Conclusions
Discrimination statistics cannot be used interchangeably
May be necessary to present all 4 for greatest insight.
C-statistic: averaged sensitivity
  Does not weight equally across cut-points
  Does not assess risk re-classification.
IDI: averaged sensitivity
  Weights cut-points equally
  Adjusts for specificity differently to the C-statistic
  May better highlight potentially important predictors.
Category-free NRI: % subjects with correct movement in p.
  Event and non-event NRI may perform quite differently
Category-dependent NRI: % correct movement across categories.
  Results may be heavily influenced by the chosen cut-points.
Be wary of studies using the category-dependent NRI with cut-points that were not predefined.
