RTI International
RTI International is a trade name of Research Triangle Institute. www.rti.org
Can Replicate-Based Methods Be
Used in Variance Estimation for
a Cut-Point Estimator Derived
through ROC Analysis?
Dan Liao, Phillip S. Kott,
and Sarra L. Hedden
RTI International
Acknowledgements
The presentation is sponsored by RTI Statistics and Epidemiology,
with the research included in the presentation stemming from
ongoing methodological work conducted under the contract for the
National Survey on Drug Use and Health (NSDUH). The NSDUH
is funded by the Substance Abuse and Mental Health Services
Administration (SAMHSA), Center for Behavioral Health Statistics
and Quality under contract no. 283-2008-00004C and project no.
0212682.
The views expressed in this presentation do not necessarily reflect
the official position or policies of SAMHSA or the U.S.
Department of Health and Human Services; nor does mention of
trade names, commercial practices, or organizations imply
endorsement by the U.S. Government.
Study Background
The NSDUH provides national, state and substate data on
substance use and mental health in the civilian,
noninstitutionalized population age 12 and older.
Data are collected on a quarterly basis each year.
Approximately 700 field interviewers (FIs) staffed.
Approximately 140,000 household screenings and 67,500
interviews completed annually.
Conducted by RTI under contract with SAMHSA.
Mental Health Surveillance Study
In 2008, the Mental Health Surveillance Study (MHSS) was
initiated to produce prevalence estimates of serious mental
illness (SMI).
The MHSS included:
– A clinical follow-up study on a subsample of NSDUH respondents, to
diagnose SMI
– Short scales in the main NSDUH interview, to be used as predictors of
SMI in a model
– A regression model, based on MHSS subsample data, to be applied to
the main NSDUH sample data to predict SMI for each respondent
MHSS Clinical Follow-up
Randomly subsampled roughly 1,500 NSDUH respondents in 2008,
2011 and 2012; 500 each in 2009 and 2010 (out of ~45K adult
NSDUH respondents), through a stratified sampling scheme.
Telephone clinical psychiatric interview 2-4 weeks after main
interview
For variance estimation, NSDUH stratification and clustering design
variables were modified for the MHSS data. Some variance strata
were collapsed to yield a total of 50 MHSS variance strata.
Only included adult respondents who completed the NSDUH
interview in English
Mental Health Scales on NSDUH
• Psychological Distress:
Kessler 6 (K6) scale (Kessler et al., 2003)
6 items: item scores = 0 – 4, total score = 0 – 24
• Functional Impairment:
World Health Organization Disability Assessment
Schedule (WHODAS, abbreviated)
8 items: item scores = 0 – 3, total score = 0 – 24
SMI Status Determination on the MHSS
A respondent to the MHSS is determined to have SMI if
he/she is diagnosed as having any applicable past-year DSM-IV
mental disorder and has a Global Assessment of Functioning
(GAF) score ≤ 50.
This is viewed as the Gold Standard (“true”) SMI
determination
Estimation Step 1: Determine Best
Weighted Logistic Regression Model
Using MHSS Subsample
Let π = Pr(“true” SMI│X1, X2)
logit(π) = b0 + b1X1 + b2X2
X1 = recoded K6 total score (0-17)
X2 = recoded WHODAS total score (0-8)
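As a minimal sketch of how a fitted model of this form turns the two recoded scores into a predicted probability: the coefficient values below are hypothetical (the slides do not report b0, b1, b2), and the function name is an assumption used only for illustration.

```python
import math

def predicted_smi_probability(x1, x2, b0, b1, b2):
    """Inverse logit of the linear predictor b0 + b1*x1 + b2*x2."""
    eta = b0 + b1 * x1 + b2 * x2
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical coefficients, for illustration only (not the fitted MHSS values).
B0, B1, B2 = -5.0, 0.2, 0.3

# Lowest and highest recoded scores: X1 in 0-17, X2 in 0-8.
p_low = predicted_smi_probability(0, 0, B0, B1, B2)
p_high = predicted_smi_probability(17, 8, B0, B1, B2)
```

With positive slopes, higher K6 and WHODAS scores map to a higher predicted probability of SMI.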
Estimation Step 2: Determine
Minimum-Bias Cutpoint using MHSS
sample
1. Based on the model, each MHSS respondent k has predicted
Pr(SMI+) = $\hat{\pi}_k$
2. Based on the clinical interview, each MHSS respondent has a
“true” SMI diagnosis
3. Select the cutpoint, $\pi_0$, for which false positives (FPs)
approximately equal false negatives (FNs) in the MHSS
subsample
- If $\hat{\pi}_k \ge \pi_0$ then predicted SMI status = positive
- If $\hat{\pi}_k < \pi_0$ then predicted SMI status = negative
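The cutpoint search in this step can be sketched roughly as below. This is an illustration under stated assumptions, not the presenters' code: it assumes the candidate cutpoints are the distinct predicted probabilities, that FP and FN totals may be weighted, and that ties go to the smallest candidate. The function name `select_cutpoint` is hypothetical. The data are the ten-respondent example that appears later in these slides.

```python
def select_cutpoint(probs, truth, weights=None):
    """Return (cutpoint, |FP - FN|) for the candidate with the smallest gap.

    A respondent is predicted positive when the probability is >= the cutpoint.
    """
    if weights is None:
        weights = [1.0] * len(probs)
    best_cut, best_gap = None, float("inf")
    for cut in sorted(set(probs)):
        # Weighted false positives: predicted positive, truly negative.
        fp = sum(w for p, y, w in zip(probs, truth, weights) if p >= cut and y == 0)
        # Weighted false negatives: predicted negative, truly positive.
        fn = sum(w for p, y, w in zip(probs, truth, weights) if p < cut and y == 1)
        gap = abs(fp - fn)
        if gap < best_gap:  # strict inequality keeps the smallest tied cutpoint
            best_cut, best_gap = cut, gap
    return best_cut, best_gap

truth = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
probs = [0.2, 0.2, 0.5, 0.6, 0.7, 0.5, 0.2, 0.3, 0.3, 0.2]
cut, gap = select_cutpoint(probs, truth)
```

In this small example no candidate balances FPs and FNs exactly, which is the situation the corrected method is meant to address.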
Estimation Step 3: Apply Model to
Main Sample (NSDUH)
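1. Based on the model, and reported K6 and WHODAS scores,
each NSDUH respondent k has predicted Pr(SMI+) = $\hat{\pi}_k$
2. If $\hat{\pi}_k \ge \pi_0$ then SMI status $y_k^{\text{assigned}}$ = yes (1)
If $\hat{\pi}_k < \pi_0$ then SMI status $y_k^{\text{assigned}}$ = no (0)
3. Cutpoint Estimator:
$$\frac{\sum_{k \,\in\, \text{NSDUH respondents in domain}} w_k^{\text{NSDUH}} \, y_k^{\text{assigned}}}{\sum_{k \,\in\, \text{NSDUH respondents in domain}} w_k^{\text{NSDUH}}}$$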
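A minimal sketch of the cutpoint estimator as a weighted domain mean of the assigned 0/1 statuses. The function name, argument layout, and toy values are assumptions made for illustration.

```python
def cutpoint_estimate(probs, weights, in_domain, cutpoint):
    """Weighted share of domain respondents whose predicted probability
    is at or above the cutpoint."""
    num = 0.0
    den = 0.0
    for p, w, d in zip(probs, weights, in_domain):
        if not d:
            continue  # respondent is outside the domain of interest
        y_assigned = 1.0 if p >= cutpoint else 0.0
        num += w * y_assigned
        den += w
    return num / den

# Toy values (hypothetical): four respondents, the last one outside the domain.
est = cutpoint_estimate(
    probs=[0.1, 0.4, 0.6, 0.9],
    weights=[1.0, 2.0, 3.0, 4.0],
    in_domain=[True, True, True, False],
    cutpoint=0.5,
)
```

Here only the third respondent in the domain is assigned SMI, so the estimate is that respondent's weight divided by the total domain weight.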
Challenges in Variance Estimation
We do not know how to develop a linearized variance
estimator for the cutpoint estimator of a domain (i.e., subgroup,
subpopulation) mean, even under the assumption that the fitted
logistic model is true.
Replication-based variance estimators:
– Jackknife
– Bootstrap
– Balanced repeated replication (BRR)
– Fay’s BRR
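For reference, Fay's BRR can be sketched as follows for a design with two variance PSUs per stratum. This is a generic illustration, not the presenters' implementation: the Fay factor of 0.5, the Sylvester-built Hadamard matrix, and the names `hadamard` and `fay_brr_variance` are assumptions. Strata are assumed to be coded 0..H-1 and PSUs coded 0 or 1.

```python
def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of 2)."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h

def fay_brr_variance(estimator, weights, stratum, psu, fay=0.5):
    """Fay's BRR variance of estimator(weights).

    In each replicate, one PSU per stratum has its weights multiplied by
    (2 - fay) and the other by fay; fay = 0 gives standard BRR.
    """
    n_strata = max(stratum) + 1
    n_reps = 1
    while n_reps <= n_strata:  # smallest power of 2 exceeding the stratum count
        n_reps *= 2
    h = hadamard(n_reps)
    theta = estimator(weights)
    total = 0.0
    for r in range(n_reps):
        rep_weights = []
        for w, s, p in zip(weights, stratum, psu):
            # Skip column 0 (all ones); column 1 + s drives stratum s.
            up_weighted = (h[r][1 + s] == 1) == (p == 0)
            rep_weights.append(w * ((2.0 - fay) if up_weighted else fay))
        total += (estimator(rep_weights) - theta) ** 2
    return total / (n_reps * (1.0 - fay) ** 2)

# Toy check with a weighted total over two strata (hypothetical data).
y = [1.0, 3.0, 2.0, 7.0]
weighted_total = lambda w: sum(wi * yi for wi, yi in zip(w, y))
v = fay_brr_variance(weighted_total, [1.0] * 4, stratum=[0, 0, 1, 1], psu=[0, 1, 0, 1])
```

For a linear statistic such as a weighted total, a fully balanced set of replicates reproduces the usual two-PSU-per-stratum variance estimator (the sum over strata of squared differences between PSU totals). The cutpoint estimator is not linear, which is why its behavior under Fay's BRR has to be checked by simulation.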
Simulation Study
Step 1: Simulate a full NSDUH sample N using the combined 2008-2010
MHSS samples. Treat it as the population.
Step 2: Keep the same 50 variance strata as in the MHSS, but
randomly reassign the part of N in each variance stratum to the
two MHSS variance PSUs.
Step 3: In the t-th simulation, call the selected sample $S^{(t)}$. Compute
a cutpoint estimate of SMI in $S^{(t)}$, denoted $r^{(t)}$, and estimate its
variance using Fay’s BRR, denoted $v_F^{(t)}$.
Step 4: We also compute a cutpoint estimate from the generated
population, denoted R. The empirical variance/MSE of cut-point
estimation is then: $EV = \sum_{t=1}^{T} \left(r^{(t)} - R\right)^2 / T$.
Number of simulations: T = 1,679
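The empirical variance/MSE in Step 4 amounts to the following small computation. The simulated estimates and the population value below are hypothetical numbers, and taking the empirical standard error as the square root of EV is an assumption about how it was derived.

```python
import math

def empirical_mse(estimates, target):
    """Mean squared deviation of the simulated estimates r(t) from the target R."""
    return sum((r - target) ** 2 for r in estimates) / len(estimates)

# Hypothetical simulated estimates and population value, for illustration only.
r_t = [4.0, 6.0, 5.0, 5.0]
R = 5.0
ev = empirical_mse(r_t, R)
empirical_se = math.sqrt(ev)
```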
The Corrected Method for Cutpoint
Estimators
It is usually impossible to find a perfect cut-point where false
positives equal false negatives exactly.
This leads to a small bias in the cut-point estimator. This bias is
almost always negligible in practice. However, it can
inflate the estimated variance when using Fay’s BRR method.
A corrected version of each replicate cut-point estimator
assigns the NSDUH respondents who share one particular predicted
probability of SMI a value between 0 (no SMI) and 1
(SMI), so that the bias in the estimator disappears.
An Example for Corrected Method
Assume we have 10 respondents in a sample:
SMI = True SMI status
Pr. = Predicted probability of having SMI from a logistic model
C50 = Assigned SMI status when the cut-point equals 0.5 (2 FN; 1 FP)
C30 = Assigned SMI status when the cut-point equals 0.3 (2 FN; 3 FP)
CC30 = Assigned SMI status using the corrected method with the cut-point
equal to 0.3 (2 FN; 2 FP)
SMI   1    1    1    1    1    0    0    0    0    0
Pr.   0.2  0.2  0.5  0.6  0.7  0.5  0.2  0.3  0.3  0.2
C50   0    0    1    1    1    1    0    0    0    0
C30   0    0    1    1    1    1    0    1    1    0
CC30  0    0    1    1    1    1    0    0.5  0.5  0
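The rows of this example can be reproduced with a short sketch. The rule used for the corrected method is one reading of the slides and should be treated as an assumption: respondents whose predicted probability equals the cut-point receive a common fractional value, chosen so that false positives equal false negatives. The function names are hypothetical, and the counts are unweighted.

```python
SMI = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
PR = [0.2, 0.2, 0.5, 0.6, 0.7, 0.5, 0.2, 0.3, 0.3, 0.2]

def assign(probs, cut):
    """Ordinary assignment: 1 if the probability is at or above the cut-point."""
    return [1.0 if p >= cut else 0.0 for p in probs]

def fp_fn(assigned, truth):
    """False-positive and false-negative totals; fractional values count partially."""
    fp = sum(a for a, y in zip(assigned, truth) if y == 0)
    fn = sum(1.0 - a for a, y in zip(assigned, truth) if y == 1)
    return fp, fn

def assign_corrected(probs, truth, cut):
    """Give respondents exactly at the cut-point a fraction that balances FP and FN."""
    fp_above = sum(1 for p, y in zip(probs, truth) if p > cut and y == 0)
    fn_below = sum(1 for p, y in zip(probs, truth) if p < cut and y == 1)
    n0 = sum(1 for p, y in zip(probs, truth) if p == cut and y == 0)
    n1 = sum(1 for p, y in zip(probs, truth) if p == cut and y == 1)
    if n0 + n1 == 0:
        return assign(probs, cut)  # nobody sits exactly at the cut-point
    # Solve fp_above + f*n0 == fn_below + (1 - f)*n1 for f, then clip to [0, 1].
    frac = (fn_below + n1 - fp_above) / (n0 + n1)
    frac = min(1.0, max(0.0, frac))
    return [1.0 if p > cut else (frac if p == cut else 0.0) for p in probs]

c50 = assign(PR, 0.5)
c30 = assign(PR, 0.3)
cc30 = assign_corrected(PR, SMI, 0.3)
fp_cc30, fn_cc30 = fp_fn(cc30, SMI)  # balanced: 2.0 and 2.0
```

Calling `fp_fn(c50, SMI)` and `fp_fn(c30, SMI)` gives the error counts for the two uncorrected rows.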
Average Estimated and Empirical
Standard Errors Using Fay’s BRR
Estimator    Domain     Average Est. SE   Empirical SE   Relative Error (%)
Cut-point    All        0.856             0.813           5.26
Cut-point    Hispanic   0.705             0.575          22.51
Cut-point*   All        0.806             0.804           0.36
Cut-point*   Hispanic   0.692             0.58           19.44
* The corrected method was used to determine cut-points in the replicates.
Discussions
Although a version of Fay’s BRR [balanced repeated
replication variance estimator] can be reliable at the all-adults
level, it can seriously overestimate standard errors for some
domains.
– The standard errors for the cut-point estimates of SMI prevalence for
Hispanics were roughly 20% too high.
Using corrected cut-point estimation in the replicates helped
reduce this bias modestly.
Future Directions and Questions
What about other mental illness indicators (e.g., any mental
illness [AMI])?
Will Fay’s BRR behave better in the simulation with a better-fitting
logistic model?
– SMI prevalence rate 5%; AMI prevalence rate 20%
– More covariates (e.g., lifetime major depressive episode)
Develop a workable measure of the cutpoint estimator’s
standard error?
More Information
Dan Liao
Research Statistician
301.816.4605