
RTI International

RTI International is a trade name of Research Triangle Institute. www.rti.org

Can Replicate-Based Methods Be Used in Variance Estimation for a Cut-Point Estimator Derived through ROC Analysis?

Dan Liao, Phillip S. Kott, and Sarra L. Hedden

[email protected]


Acknowledgements

The presentation is sponsored by RTI Statistics and Epidemiology, with the research included in the presentation stemming from ongoing methodological work conducted under the contract for the National Survey on Drug Use and Health (NSDUH). The NSDUH is funded by the Substance Abuse and Mental Health Services Administration (SAMHSA), Center for Behavioral Health Statistics and Quality under contract no. 283-2008-00004C and project no. 0212682.

The views expressed in this presentation do not necessarily reflect the official position or policies of SAMHSA or the U.S. Department of Health and Human Services; nor does mention of trade names, commercial practices, or organizations imply endorsement by the U.S. Government.


Study Background

The NSDUH provides national, state, and substate data on substance use and mental health in the civilian, noninstitutionalized population age 12 and older.

Data are collected on a quarterly basis each year.

Approximately 700 field interviewers (FIs) staff the survey.

Approximately 140,000 household screenings and 67,500 interviews are completed annually.

The survey is conducted by RTI under contract with SAMHSA.


Mental Health Surveillance Study

In 2008, the Mental Health Surveillance Study (MHSS) was initiated to produce prevalence estimates of serious mental illness (SMI).

The MHSS included:

– A clinical follow-up study on a subsample of NSDUH respondents, to diagnose SMI

– Short scales in the main NSDUH interview, to be used as predictors of SMI in a model

– A regression model, based on MHSS subsample data, to be applied to the main NSDUH sample data to predict SMI for each respondent


MHSS Clinical Follow-up

Randomly subsampled roughly 1,500 NSDUH respondents in each of 2008, 2011, and 2012, and 500 each in 2009 and 2010 (out of ~45K adult NSDUH respondents), through a stratified sampling scheme.

Telephone clinical psychiatric interview conducted 2-4 weeks after the main interview.

For variance estimation, NSDUH stratification and clustering design variables were modified for the MHSS data. Some variance strata were collapsed to yield a total of 50 MHSS variance strata.

Only adult respondents who completed the NSDUH interview in English were included.


Mental Health Scales on NSDUH

• Psychological Distress:

Kessler 6 (K6) scale (Kessler et al., 2003)

6 items: item scores = 0 – 4, total score = 0 – 24

• Functional Impairment:

World Health Organization Disability Assessment Schedule (WHODAS, abbreviated)

8 items: item scores = 0 – 3, total score = 0 – 24


SMI Status Determination on the MHSS

A respondent to the MHSS is determined to have SMI if he/she is diagnosed with any applicable past-year DSM-IV mental disorder and has a Global Assessment of Functioning (GAF) score ≤ 50.

This is viewed as the gold-standard (“true”) SMI determination.


Estimation Step 1: Determine Best Weighted Logistic Regression Model Using MHSS Subsample

Let $\pi = \Pr(\text{``true'' SMI} \mid X_1, X_2)$, with

$\operatorname{logit}(\pi) = b_0 + b_1 X_1 + b_2 X_2$

$X_1$ = recoded K6 total score (0-17)

$X_2$ = recoded WHODAS total score (0-8)
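As a minimal sketch of how a fitted model of this form turns a respondent's two recoded scores into a predicted probability, the inverse-logit transform can be written as below. The function name and the coefficient values are hypothetical placeholders chosen for illustration; they are not the fitted MHSS estimates.

```python
import math


def predicted_smi_probability(b0, b1, b2, x1, x2):
    """Inverse logit of the linear predictor b0 + b1*x1 + b2*x2.

    x1: recoded K6 total score (0-17)
    x2: recoded WHODAS total score (0-8)
    """
    linear_predictor = b0 + b1 * x1 + b2 * x2
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical coefficients, for illustration only (not the fitted MHSS values).
b0, b1, b2 = -5.0, 0.1, 0.4
low = predicted_smi_probability(b0, b1, b2, x1=0, x2=0)
high = predicted_smi_probability(b0, b1, b2, x1=17, x2=8)
print(low, high)
```

With positive slopes, as assumed here, the predicted probability rises with both scores.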


Estimation Step 2: Determine Minimum-Bias Cutpoint using MHSS sample

1. Based on model, each MHSS respondent has predicted Pr(SMI+) = $\hat{\pi}$

2. Based on clinical interview, each MHSS respondent has a “true” SMI diagnosis

3. Select cutpoint, $\pi_0$, for which false positives (FPs) approximately equal false negatives (FNs) in the MHSS subsample

- If $\hat{\pi} \ge \pi_0$ then predicted SMI status = positive

- If $\hat{\pi} < \pi_0$ then predicted SMI status = negative
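The cutpoint search in Step 2 can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: candidate cutpoints are taken to be the observed predicted probabilities, a respondent is classified positive when the predicted probability is at least the cutpoint, and the function names and toy data are hypothetical.

```python
def weighted_errors(truth, probs, weights, cut):
    """Weighted false positives and false negatives when status = (prob >= cut)."""
    rows = list(zip(truth, probs, weights))
    fp = sum(w for y, p, w in rows if y == 0 and p >= cut)
    fn = sum(w for y, p, w in rows if y == 1 and p < cut)
    return fp, fn


def select_cutpoint(truth, probs, weights):
    """Return the observed probability that minimizes |FP - FN| when used as cutpoint."""
    def imbalance(cut):
        fp, fn = weighted_errors(truth, probs, weights, cut)
        return abs(fp - fn)

    return min(sorted(set(probs)), key=imbalance)


# Hypothetical toy subsample with equal weights (not MHSS data).
truth = [1, 1, 1, 0, 0, 0, 0, 1]
probs = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.4, 0.6]
weights = [1.0] * len(truth)
cut = select_cutpoint(truth, probs, weights)
print(cut, weighted_errors(truth, probs, weights, cut))
```

For the toy data the selected cutpoint is 0.6, where one false positive balances one false negative. With real survey weights an exact balance is rarely available, which is why the slide says "approximately equal".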


Estimation Step 3: Apply Model to Main Sample (NSDUH)

1. Based on model, and reported K6 and WHODAS scores, each NSDUH respondent has predicted Pr(SMI+) = $\hat{\pi}$

2. If $\hat{\pi} \ge \pi_0$ then SMI status $y_k^{\text{assigned}}$ = yes (1)

If $\hat{\pi} < \pi_0$ then SMI status $y_k^{\text{assigned}}$ = no (0)

3. $$\text{Cutpoint Estimator} = \frac{\sum_{k \in \text{NSDUH respondents in domain}} w_k \, y_k^{\text{assigned}}}{\sum_{k \in \text{NSDUH respondents in domain}} w_k}$$
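The cutpoint estimator is a weighted mean of the assigned statuses within a domain. A minimal sketch, assuming each respondent in the domain is represented by an analysis weight and a predicted probability (the function name and the numbers are hypothetical):

```python
def cutpoint_estimate(weights, probs, cut):
    """Weighted share of domain respondents whose predicted probability is >= cut."""
    assigned = [1 if p >= cut else 0 for p in probs]
    return sum(w * y for w, y in zip(weights, assigned)) / sum(weights)


# Hypothetical domain of four respondents (weights and probabilities are made up).
weights = [100.0, 200.0, 150.0, 50.0]
probs = [0.05, 0.40, 0.10, 0.30]
estimate = cutpoint_estimate(weights, probs, cut=0.25)
print(estimate)
```

Here the second and fourth respondents are assigned SMI = 1, so the estimate is (200 + 50) / 500 = 0.5.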


Challenges in Variance Estimation

We do not know how to develop a linearized variance estimator for the cutpoint estimator of a domain (i.e., subgroup or subpopulation) mean, even under the assumption that the fitted logistic model is true.

Replication-based variance estimators:

– Jackknife

– Bootstrap

– Balanced repeated replication (BRR)

– Fay’s BRR
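For reference, the textbook form of Fay's BRR variance estimator divides the sum of squared deviations of the replicate estimates from the full-sample estimate by A(1 - k)^2, where A is the number of replicates and k is the Fay coefficient. The sketch below assumes that form; the presentation does not state the coefficient used, so the default of 0.5 and the function name are assumptions. Each replicate estimate would come from recomputing the estimator with that replicate's weights.

```python
def fay_brr_variance(full_estimate, replicate_estimates, fay_k=0.5):
    """Fay's BRR variance: sum of squared deviations / (A * (1 - k)^2).

    fay_k = 0 reduces to ordinary BRR. The default 0.5 is an assumption,
    not a value taken from the presentation.
    """
    if not 0 <= fay_k < 1:
        raise ValueError("fay_k must satisfy 0 <= k < 1")
    n_reps = len(replicate_estimates)
    squared = sum((r - full_estimate) ** 2 for r in replicate_estimates)
    return squared / (n_reps * (1 - fay_k) ** 2)


# Hypothetical full-sample and replicate estimates.
variance = fay_brr_variance(0.05, [0.06, 0.04, 0.05, 0.05], fay_k=0.5)
print(variance, variance ** 0.5)
```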


Simulation Study

Step 1: Simulate a full NSDUH sample N using the combined 2008-2010 MHSS samples. Treat it as the population.

Step 2: Keep the same 50 variance strata as in the MHSS, but randomly reassign the part of N in each variance stratum to the two MHSS variance PSUs.

Step 3: In the t-th simulation, call the selected sample $S^{(t)}$. Compute a cutpoint estimate of SMI in $S^{(t)}$, denoted $r^{(t)}$, and estimate its variance using Fay’s BRR, denoted $v_F^{(t)}$.

Step 4: We also compute a cutpoint estimate from the generated population, denoted as R. The empirical variance/MSE of cut-point estimation is then:

$EV = \sum_{t=1}^{T} \left( r^{(t)} - R \right)^2 / T$

Simulation: T = 1,679
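The empirical variance/MSE in Step 4 is a mean squared deviation over the T simulated samples. A minimal sketch, assuming the deviations are taken from the population-based value R defined in Step 4 (the function name and the numbers are hypothetical):

```python
def empirical_variance(sample_estimates, reference):
    """Mean squared deviation of the simulated estimates r(t) from the reference value."""
    total = sum((r - reference) ** 2 for r in sample_estimates)
    return total / len(sample_estimates)


# Hypothetical estimates from four simulated samples and a population value.
simulated = [4.0, 6.0, 5.0, 5.0]
ev = empirical_variance(simulated, reference=5.0)
print(ev, ev ** 0.5)  # empirical variance and empirical standard error
```

The square root of this quantity is the empirical standard error against which the average Fay's BRR standard error can be compared.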


The Corrected Method for Cutpoint Estimators

It is usually impossible to find a perfect cut-point where false positives equal false negatives exactly.

This leads to a small bias in the cut-point estimator. The bias is almost always negligibly small in practice. However, it can inflate the estimated variance when using Fay’s BRR method.

A corrected version of each replicate cut-point estimator assigns the NSDUH respondents who share one particular predicted probability of SMI a value between 0 (no SMI) and 1 (SMI), so that the bias in the estimator disappears.


An Example for Corrected Method

Assume we have 10 respondents in a sample:

SMI = true SMI status

Pr. = predicted probability of having SMI from a logistic model

C50 = assigned SMI status when the cut-point equals 0.5 (2 FN; 1 FP)

C30 = assigned SMI status when the cut-point equals 0.3 (2 FN; 3 FP)

CC30 = assigned SMI status using the corrected method with the cut-point equal to 0.3 (2 FN; 2 FP)

SMI    1    1    1    1    1    0    0    0    0    0
Pr.    0.2  0.2  0.5  0.6  0.7  0.5  0.2  0.3  0.3  0.2
C50    0    0    1    1    1    1    0    0    0    0
C30    0    0    1    1    1    1    0    1    1    0
CC30   0    0    1    1    1    1    0    0.5  0.5  0
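The CC30 row can be reproduced with a short sketch. The slides do not say how the fractional value is computed; the code below assumes it is the fraction f, assigned to every respondent whose predicted probability equals the cut-point, that makes weighted false positives equal weighted false negatives. The function name is hypothetical and equal weights are assumed for the ten respondents.

```python
def corrected_assignment(truth, probs, weights, cut):
    """Assign 1 above the cut-point, 0 below it, and a fraction f at the cut-point.

    f solves fp_above + f * neg_at_cut == fn_below + (1 - f) * pos_at_cut.
    Assumes at least one respondent has a predicted probability equal to cut.
    """
    rows = list(zip(truth, probs, weights))
    fp_above = sum(w for y, p, w in rows if y == 0 and p > cut)
    fn_below = sum(w for y, p, w in rows if y == 1 and p < cut)
    pos_at_cut = sum(w for y, p, w in rows if y == 1 and p == cut)
    neg_at_cut = sum(w for y, p, w in rows if y == 0 and p == cut)
    f = (fn_below + pos_at_cut - fp_above) / (pos_at_cut + neg_at_cut)
    # Keep the assigned value in [0, 1]; if clamping occurs, FP and FN stay unequal.
    f = min(1.0, max(0.0, f))
    return [1 if p > cut else 0 if p < cut else f for p in probs]


smi = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
pr = [0.2, 0.2, 0.5, 0.6, 0.7, 0.5, 0.2, 0.3, 0.3, 0.2]
cc30 = corrected_assignment(smi, pr, [1] * 10, cut=0.3)
print(cc30)
print(sum(cc30), sum(smi))  # assigned total versus true total
```

In this example the two respondents at probability 0.3 each receive 0.5, and the assigned statuses sum to 5, the true number of SMI cases.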


Average Estimated and Empirical Standard Errors Using Fay’s BRR

Estimator    Domain     Average Est. SE   Empirical SE   Relative Error (%)
Cut-point    All        0.856             0.813          5.26
Cut-point    Hispanic   0.705             0.575          22.51
Cut-point*   All        0.806             0.804          0.36
Cut-point*   Hispanic   0.692             0.58           19.44

* The corrected method used to determine cut-points in replicates.
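The slide does not define the relative error column. The sketch below assumes it is 100 × (average estimated SE − empirical SE) / empirical SE, which is consistent with the reported figures. Because the inputs are the rounded values shown in the table, the results agree with the reported column only approximately.

```python
def relative_error_pct(avg_est_se, empirical_se):
    """Percent by which the average estimated SE exceeds the empirical SE (assumed definition)."""
    return 100.0 * (avg_est_se - empirical_se) / empirical_se


# (estimator, domain, average estimated SE, empirical SE, reported relative error)
table = [
    ("Cut-point", "All", 0.856, 0.813, 5.26),
    ("Cut-point", "Hispanic", 0.705, 0.575, 22.51),
    ("Cut-point*", "All", 0.806, 0.804, 0.36),
    ("Cut-point*", "Hispanic", 0.692, 0.58, 19.44),
]
recomputed = [relative_error_pct(est, emp) for _, _, est, emp, _ in table]
for (name, domain, _, _, reported), value in zip(table, recomputed):
    print(f"{name:<11}{domain:<9}{value:6.2f}  (reported {reported})")
```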


Discussions

Although a version of Fay’s BRR [balanced-repeated-replication variance estimator] can be reliable at the all-adults level, it can seriously overestimate standard errors for some domains.

– The standard errors for the cut-point estimates of SMI prevalence for Hispanics were roughly 20% too high.

Using corrected cut-point estimation in the replicates helped reduce this bias modestly.


Future Directions and Questions

What about other mental illness indicators (e.g., any mental illness [AMI])?

Will Fay’s BRR behave better in the simulation with a better-fitting logistic model?

SMI prevalence rate 5%; AMI prevalence rate 20%

More covariates (e.g., lifetime major depression episode)

Develop a workable measure of the cutpoint estimator’s standard error?


More Information

Dan Liao

Research Statistician

301.816.4605

[email protected]
