Development of a Brief Screening Instrument for Detecting Depressive Disorders
Author(s): M. Audrey Burnam, Kenneth B. Wells, Barbara Leake, John Landsverk
Source: Medical Care, Vol. 26, No. 8 (Aug., 1988), pp. 775-789
Published by: Lippincott Williams & Wilkins
Stable URL: http://www.jstor.org/stable/3765462
Accessed: 28/11/2009 22:32


MEDICAL CARE August 1988, Vol. 26, No. 8

Development of a Brief Screening Instrument for Detecting Depressive Disorders

M. AUDREY BURNAM, PHD,* KENNETH B. WELLS, MD,*† BARBARA LEAKE, PHD,‡ AND JOHN LANDSVERK, PHD†

A very short (8-item), self-report measure was developed to screen for depressive disorders (major depression and dysthymia). The screener departs from traditional depressive symptom scales in that 1) individual items are differentially weighted and 2) two of the eight items concern diagnostically relevant durations of depressed mood. Analyses of data from a general population and from primary care and mental health patients showed that the screener had high sensitivity and good positive predictive value for detecting depressive disorder, especially for recent disorders and those that met full DSM-III criteria. The high predictive utility of the screener, in combination with its brevity, suggests that it may be a useful tool for screening for depression in health care settings. Key words: screening measure; depressive disorders; mental health measures. (Med Care 1988; 26:775-789)

* From The RAND Corporation, Santa Monica, California.

† From The Department of Psychiatry and Biobehavioral Sciences, University of California, Los Angeles.

‡ From The Department of Internal Medicine, University of California, Los Angeles.

This work was supported by grants from: The Robert Wood Johnson Foundation, The Kaiser Family Foundation, The Pew Memorial Trust, and the National Institute of Mental Health. The conclusions are those of the authors and do not necessarily reflect the views of the funding organizations.

Address correspondence to: M. Audrey Burnam, PHD, The RAND Corporation, 1700 Main St., Santa Monica, CA 90406-2138.

In the last decade, there has been considerable interest in clinical and policy circles in developing instruments that identify psychopathology in patient and in general populations. These instruments have been primarily of two types: self-report measures of symptoms that reflect general psychological distress, and structured interview protocols that identify specific psychiatric disorders. Examples of the former type of instrument are the General Health Questionnaire (GHQ)1 and the Center for Epidemiologic Studies Depression Scale (CES-D);2 examples of the latter are the Diagnostic Interview Schedule (DIS)3 and the Schedule for Affective Disorders and Schizophrenia.4 The self-report measures of symptoms are much easier and cheaper to administer, but they do not provide a specific diagnosis. While the structured interviews do provide a measure of psychiatric disorder, their cost may preclude their use in many larger-scale studies. More recently, several investigators have used a two-stage case identification technique in which persons with a high level of general distress symptoms are identified at the first stage.5,6 Those positive at the first stage then receive a structured interview that determines the presence of specific disorders.

For some case identification purposes, even this two-stage screening process may be too expensive or burdensome to be feasible. For example, for studies in which data are collected directly in health care settings (e.g., waiting rooms of private physicians' offices), screening procedures must often be very brief. A first-stage screener that requires 5 or 10 minutes to complete, as do most available symptom scales, may take too long when data on other variables are needed at the same time.

In this paper, the authors describe the development of a very short (eight-item) self-report screener for depressive disorders (major depression and dysthymia). The screener was developed specifically for use in the National Study of Medical Care Outcomes (MOS). The MOS has a large sample of patients, who were recruited when they visited health care providers in selected primary care and mental health specialty practices. The study required that a screener for depressive disorders be included in a 10-minute self-report instrument that also screened for three chronic medical diseases and obtained data on use of services, demographic characteristics, and general health status and functioning. The specific requirements for this screener were: 1) high sensitivity to current major depression and/or dysthymia so that the MOS sample would, as nearly as possible, include all patients with one of these depressive disorders, and 2) a positive predictive value that would ensure that at least one third of those testing positive at the first-stage screening would subsequently test positive at the second stage. The use of a single-stage design, in which every patient received a diagnostic interview, would have required between 30 and 35 lengthy face-to-face interviews in order to find one case with current depressive disorder. If, on the other hand, a brief screener could be developed, having the above properties of high sensitivity and positive predictive value, then data collection from 30-35 persons using the brief self-report screening measure would limit the number of persons completing the subsequent diagnostic interview to about three per identified case. Use of such a screener would thus considerably reduce the cost of case finding in this patient population.
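The arithmetic behind this comparison can be sketched in a few lines of Python. This is an illustration only, not code from the study: the function names are hypothetical, the prevalence of about 1 in 32 is inferred from the "30 to 35 interviews per case" figure, and the positive predictive value of one third is the design target stated above.

```python
def interviews_per_case_single_stage(prevalence: float) -> float:
    """Expected diagnostic interviews per case when every patient is interviewed."""
    if not 0 < prevalence <= 1:
        raise ValueError("prevalence must be in (0, 1]")
    return 1.0 / prevalence


def interviews_per_case_two_stage(ppv: float) -> float:
    """Expected diagnostic interviews per case when only screen-positives are interviewed."""
    if not 0 < ppv <= 1:
        raise ValueError("ppv must be in (0, 1]")
    return 1.0 / ppv


# Assumed values for illustration: prevalence of roughly 1 in 32 (inferred from
# the 30-35 interviews quoted in the text) and the target PPV of one third.
single = interviews_per_case_single_stage(1 / 32)
two_stage = interviews_per_case_two_stage(1 / 3)
print(f"single-stage: about {single:.0f} interviews per case")
print(f"two-stage:    about {two_stage:.0f} interviews per case")
```

The two-stage figure counts only second-stage interviews per identified case; cases missed by the screener (imperfect sensitivity) are a separate cost that this sketch does not model.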

In the MOS, depressive disorders were assessed at the second stage using a modified version of the Diagnostic Interview Schedule (DIS) that allowed determination of DSM-III diagnoses of major depression and dysthymia. This DIS-generated definition was, therefore, the criterion for development of the screener. To select items for a first-stage screener, the authors reviewed existing self-report measures of depressive symptoms and general psychological distress.7 The review included the Beck Depression Inventory (21 items),8 the General Health Questionnaire (60 items),1 the Zung Self-Assessment Depression scale (20 items),9 the Hopkins Symptoms Checklist (90 items),10 the Mental Health Inventory (38 items),11 and the Center for Epidemiologic Studies Depression Scale (20 items).2 The authors decided to base their screening instrument on the Center for Epidemiologic Studies Depression Scale (CES-D) because its test-retest reliability is acceptable,2 and it is short and easy to administer. Furthermore, the CES-D has been consistently predictive of depression and other psychiatric disorders.12-15 Other screener scale candidates were not consistently highly correlated with measures of depression (the Zung Self-Assessment Depression scale and the Hopkins Symptoms Checklist), were developed to assess severity of depression rather than to ascertain caseness (the Beck Depression Inventory), had not been evaluated against the presence or absence of depressive disorders (the Mental Health Inventory), or were much too long (the General Health Questionnaire).

The sensitivity of a screener is the proportion of true positive cases that it correctly detects. Generally, it is desirable to minimize the number of true positive cases missed in screening (false negative errors). A good screener, therefore, has high sensitivity. Previous studies of the CES-D suggest that it is a sensitive indicator of depression.13-15 The specificity of a screener is the proportion of true negative cases that it correctly detects. High numbers of incorrectly classified negative cases (false positive errors) are often the cost of maintaining high screener sensitivity. Prior reports show that a large proportion of cases classified positive by the CES-D are not actually depressed, but have some other psychiatric disorder. In short, the CES-D does not adequately differentiate depression from other psychiatric disorders; its positive predictive value (the proportion with true depression from among those screened positive) is relatively low.13-15

The authors suspected that the specificity and, therefore, the positive predictive value of the CES-D items as a screener for depressive disorder could be enhanced using two strategies: altering the scoring method and adding items that asked about persistent periods of depressed mood. The CES-D is scored by summing responses across 20 items which are equally weighted. Such a scoring strategy does not take into account the possibility that some items may be better predictors of depressive disorders than others. A weighted scoring strategy might, therefore, increase the ability of the items to predict depressive disorder. The addition of items to the CES-D that ask about persistent periods of depressed mood may also add predictive power to the screener because DSM-III definitions of major depression and dysthymia include length-of-episode criteria that exceed the 1-week period of time covered by the CES-D. The DIS contains two items that determine whether persistent periods of depressed affect have occurred; these items were chosen as potential screener items.

The authors had access to data from two studies that both administered the CES-D and the DIS to the same respondents. Using the sample of one study, the authors performed analyses to identify, from the pool of 20 CES-D and two DIS items, a subset of items that in combination best predicted current depressive disorder as determined by the DIS. The effectiveness of the resulting measure as a screener for depressive disorder was subsequently examined in four samples: two were subsamples from the study in which the screener was developed, while two were from a second study and were, therefore, completely independent from the sample in which the screener was developed.

Several questions were addressed as part of the evaluation of the screener. What is the best cutoff score to use for the screener scale? Does the screener identify depressive disorder equally well for primary care and mental health patients? How well does the screener identify depressive disorder when DSM-III criteria for current depressive disorder are in operation according to stringent or lenient rules using the DIS? How useful is the screener for diagnoses established within varying time frames (e.g., lifetime, past year, past 6 months, and past month)? What is the differential utility of the screener for major depression versus dysthymia, and for depressive disorders versus other major psychiatric disorders?

The results of this evaluation not only contributed to a solution to the case-identification problem presented by the MOS, but may eventually be useful to others needing efficient and effective mental health screening instruments. First, the study illustrates a method of mental health screener development that maximizes the discriminant utility of each screener item. Secondly, the screener is unique in combining traditional self-report symptom items with traditional diagnostic assessment items, a quality that may be key in increasing the predictive utility of self-report mental health screeners. Finally, for those interested in screening for depressive disorders, the scale presented here may be a good choice.


TABLE 1. Demographic Characteristics of the ECA and PSP Samples

                        ECA Study                                   PSP Study
                        Total     Primary Care   Mental Health     Primary Care   Mental Health
                                  Subsample      Subsample         Sample         Sample
Mean age in years       41        43             38                39             40
% Male                  47        40             36                33             39
% Hispanic              46        39             31                31             28
% Nonhispanic white     42        51             62                69             72
% Spanish Interview     21        17             11                16             19
Number of Persons       3132      1450           211               525            101

Methods

The data of this report are from two studies, the Los Angeles Epidemiologic Catchment Area Study (ECA), and the study of Psychiatric Screening Questionnaires for Primary Care Patients (PSP). Each of these studies conducted personal interviews with adult samples in Los Angeles. The samples are described below.

Los Angeles ECA Sample

The Los Angeles ECA is described in detail elsewhere.16,17 The Los Angeles ECA survey data were collected from 3132 adults who were sampled from the adult household populations of two mental health catchment areas in metropolitan Los Angeles. One of the catchment areas contained a population that is predominantly hispanic American (83%), while the other catchment area had a largely nonhispanic white population. Households were selected using a two-stage probability sampling design. One adult in each household was then randomly selected for inclusion in the study. Of those sampled, 68% participated in the survey. Survey interviews were conducted between January 1983 and August 1984. Demographic characteristics of the ECA sample are shown in Table 1.

Since the MOS is a study of patients of primary care physicians and mental health specialists, the authors defined two ECA subsamples for comparison: 1) those who utilized outpatient health care services for a physical problem in the six months prior to the interview were labeled the "primary care" subsample (N = 1450); and 2) those who utilized outpatient general or mental health services for a mental health problem in the six months prior to the interview were labeled the "mental health" subsample (N = 211).

The basic demographic characteristics and the number of persons in each of these subsamples are shown in Table 1. The primary care subsample was older and the mental health subsample was younger than the total ECA sample. Both the primary care and mental health subsamples had somewhat higher proportions of females, nonhispanic whites, and English speaking respondents than did the total ECA sample.

PSP Samples

The PSP study, described by Hough et al.,15 was conducted among a primary care outpatient sample and a mental health center outpatient sample. These two samples, therefore, provided a comparison that paralleled the primary care and mental health subsamples defined in the ECA study. The major difference was that, in the ECA study, two groups of individuals were subsetted from the larger sample on the basis of their reports of recent health care visits, while the PSP samples were designed as two distinct samples from primary care and mental health settings. All individuals in that study, therefore, were included in the two comparison groups.

The PSP primary care sample was randomly selected from consecutive patients scheduled for a medical visit to a primary care physician in a Health Maintenance Organization in Southern California. Only hispanic and nonhispanic white adults, who had at least one prior visit with an HMO physician, were eligible for selection. Hispanics were oversampled to meet a 30% quota. Of 997 selected patients, 716 agreed by telephone to participate in the study, of whom 525 (53%) came to the facility and completed the personal interview.

The PSP mental health sample was selected from adult patients of a community mental health center in Los Angeles. Nonhispanic white and hispanic patients with active chart diagnoses of affective, anxiety, or schizophrenic disorders were eligible for selection. This diagnostic grouping was a stratifying variable within which patients were randomly selected in numbers proportional to the size of the stratum. As with the primary care patient sample, Hispanics in the mental health sample were oversampled to meet a 30% quota. Of 128 selected patients, 101 (78.9%) completed interviews.

The demographic characteristics and number of persons in each of the PSP samples are shown in Table 1. Like the ECA primary care and mental health subsamples, the PSP primary care and mental health samples had higher proportions of women and nonhispanic whites than did the ECA total sample.

CES-D

Both the ECA and the PSP included the 20-item Center for Epidemiologic Studies Depression Scale (CES-D). The CES-D items ask how often in the past week the respondent has experienced each of 20 symptoms, with responses given on a four-point scale ranging from "rarely or none (less than one day)" to "most or all (5-7 days)." These responses were scored from 0 (rarely or none) to 3 (most or all).
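As a point of comparison with the weighted screener developed later, the conventional equally weighted CES-D total can be sketched as a simple sum. This is a minimal illustration, not the study's scoring program: the function name is hypothetical, and the sketch assumes each response has already been coded 0-3 in the direction of greater depression (any reverse scoring of positively worded items is assumed to have been applied beforehand).

```python
def cesd_total(responses):
    """Sum 20 CES-D item responses, each already coded 0-3 (assumed pre-scored)."""
    if len(responses) != 20:
        raise ValueError("the CES-D has 20 items")
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("each response must be 0, 1, 2, or 3")
    return sum(responses)


# Hypothetical respondent: ten items answered 1 and ten items answered 2.
print(cesd_total([1] * 10 + [2] * 10))
```

Because every item carries the same weight, the total ranges from 0 to 60 and cannot reflect differences in how strongly individual items predict disorder.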

DIS Screener Items

Two items from the DIS were included in the pool of items considered for the screener. These determined whether the respondent had experienced 1) two or more weeks of depression in the past year, and 2) two or more years of depression that was either ongoing or had ended as recently as within the past year. These items were scored 0 if the respondent did not report such a feeling, and 1 if he or she did.

DIS/DSM-III Definitions of Disorder

Psychiatric disorders were assessed using the DIS in both the ECA and PSP studies. The DIS is a highly structured diagnostic instrument that was developed for administration by trained lay interviewers. Survey information is scored with a computerized algorithm that assigns diagnoses according to DSM-III criteria. The development, scoring, reliability, and validity of the DIS have been described by others.3,18-20 A Spanish version of the DIS has also been developed and tested,21,22 and was used in the surveys reported here when the respondent's primary language was Spanish.

The DIS was used to assess major depression and dysthymia, the depressive disorders of interest. It is possible to generate DIS diagnoses that operationalize the hierarchical exclusion rules of DSM-III, in which dominant disorders preclude assignment of diagnoses for subordinate disorders, but those exclusion rules were not applied for the purposes of the analyses here.

The specific major depression criteria used to develop the screener required a lifetime diagnosis of major depression with a reported episode within the previous year. For dysthymia, a lifetime diagnosis was required, along with 2 or more years of depressed mood that was current or had persisted into the past year. For both major depression and dysthymia, evidence of continuing depressive symptoms was required. This was indicated by the presence of a symptom in at least three DSM-III-defined symptom groups or the presence of a symptom in two symptom groups in addition to a period of depressed mood within the past month. Those with a lifetime history of manic episodes were excluded from the definition. Presence of major depression or dysthymia meeting these criteria is referred to in this report as MOS-defined depressive disorders.

Although the screener was developed using the above criteria, its validity was tested across a variety of definitions of past and current disorder that can be obtained using the DIS. DIS diagnoses can be made for several prevalence periods, including lifetime, past year, past six months, and past month. Two alternative methods were employed to date the most recent period of disorder when defining current prevalence periods: 1) the time of the most recent spell of depression reported by the respondent, and 2) the time within which a sufficient number of symptoms were reported to meet full DSM-III criteria. The former approach is the standard method of determining current prevalences using the DIS;23 the latter, stricter criterion was based on the recommendations of Von Korff and Anthony.24 The two approaches yield identical definitions for lifetime diagnoses. They are subsequently called the "standard" and "stringent" DIS definitions of depressive disorders.

The DIS assesses other major DSM-III disorders, including panic disorder, phobia, obsessive-compulsive disorder, somatization disorder, alcohol abuse and dependence, drug abuse and dependence, antisocial personality, schizophrenia, and schizophreniform disorder. Severe cognitive impairment, although not a specific DSM-III diagnosis, is also determined as part of the DIS. DIS data regarding these conditions were combined in an index of nonaffective disorders in analyses that examined the discriminant utility of the screener.

Development of Screener

To select the best subset of items for the screener, stepwise multiple logistic regression analysis was employed, with the 20 CES-D and two DIS items tested as predictors of the probability of having MOS-defined current major depression or dysthymia. Logistic regression has been successfully used in a number of diverse, general medical applications involving dichotomous outcomes. For example, it has been used to calculate the risk of developing coronary heart disease as a function of certain personal characteristics,25 and to examine hospital mortality rates among ICU patients where adequate information is available regarding severity of illness and diagnostic profiles.26 The logistic model assumes that the logarithm of the odds, i.e., ln[P/(1 - P)], is a linear combination of the predictive variables; no assumptions are made about the distributions of the predictors, which is a major reason for the wide applicability of this technique. For this study, P represents an individual's probability of having MOS-defined depressive disorder.

The total ECA sample was used in the development of our logistic model. As a first step, an all-possible-subsets regression procedure (BMDP9R),27 identical to discriminant analysis when the criterion variable is dichotomous, was used to uncover underlying dimensions in the data and identify good sets of predictors from the 20 CES-D and two DIS variables. The best set of predictive items, according to Mallows' Cp, was then submitted to a logistic regression program (BMDPLR),27 with stepwise backward elimination used to produce a final reduced model.
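For readers unfamiliar with the subset-selection criterion named above, the usual textbook form of Mallows' statistic is reproduced here. The source does not restate the formula, so the notation below is an assumption drawn from standard regression references rather than from the BMDP documentation.

```latex
\[
C_p = \frac{\mathrm{SSE}_p}{\hat{\sigma}^2} - n + 2p
\]
% SSE_p        : residual sum of squares of the candidate subset model
% p            : number of fitted parameters in that model, including the intercept
% \hat{\sigma}^2 : residual mean square of the full model (all 22 candidate items here)
% n            : number of respondents
```

Subsets with a small value of the statistic, close to p, are generally regarded as having little bias relative to the full model while using fewer predictors.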

The final set of items selected for the screener included six CES-D items and the two DIS items. The prediction equation, showing the unstandardized logistic regression coefficients, is given in Table 2. As the authors had expected, the two DIS items were important predictors of depression and, in fact, had the largest coefficients in the final regression model. Removing items 7, 8, and 1 from the regression procedure would have the greatest impact (Ps < .0001), while removing one of the remaining items would have less impact (Ps ranging from .03 to .09).

TABLE 2. Screener Items and Unstandardized Coefficients Derived from Logistic Regression

Screener Item                                                     Possible values   Coefficient
1. I felt depressed                                               0-3                1.078
2. My sleep was restless                                          0-3                0.185
3. I enjoyed life (reverse scored)                                0-3               -0.269
4. I had crying spells                                            0-3                0.329
5. I felt sad                                                     0-3               -0.280
6. I felt that people disliked me                                 0-3                0.288
7. In the past year, have you had 2 weeks or more during          0-1                2.712
   which you felt sad, blue, or depressed, or lost pleasure
   in things that you usually cared about or enjoyed?
8. Have you had 2 years or more in your life when you felt        0-1                2.182
   depressed or sad most days, even if you felt okay
   sometimes? (If yes) Have you felt depressed or sad much
   of the time in the past year?

\[
P = \frac{e^{a + Bx}}{1 + e^{a + Bx}}
\]

where

e = the base of the natural logarithm

a = -6.543

\[
\begin{aligned}
Bx = {} & (1.078 \times \text{Item 1}) + (0.185 \times \text{Item 2}) - (0.269 \times \text{Item 3}) \\
        & + (0.329 \times \text{Item 4}) - (0.280 \times \text{Item 5}) + (0.288 \times \text{Item 6}) \\
        & + (2.712 \times \text{Item 7}) + (2.182 \times \text{Item 8})
\end{aligned}
\]

The contribution of each item to the screener score represents a partial effect of that item after the contributions of all the other items are removed. Therefore, although each of the eight screener items is positively correlated with each of the other items and with the criterion measure, two items enter the regression with negative signs. Given the high degree of multicollinearity among the items, the negative coefficients are not surprising. This pattern of multivariate relationship can be interpreted as "net suppression."28 The two items with negative coefficients increased the total variance explained by the set of eight items by suppressing a portion of the variance of the other screener items that was uncorrelated with the depression criterion. In additional analyses, the authors determined that the direction of the signs remained the same when sociodemographic factors were included in the regression, suggesting that the coefficient directions are not due to different responses on the items from different demographic subgroups. A screener that excluded the two items in question was also tested. This six-item screener, with coefficients recalculated, performed very similarly to the eight-item screener. The eight-item screener, however, was slightly better overall. The screener was scored by solving for the probability of being depressed, using the equation shown in Table 2, and assigning this value as a scale score for each individual.
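The scoring rule described above can be sketched in Python using the intercept and coefficients reported in Table 2. This is an illustrative sketch, not the authors' scoring program: the function names are hypothetical, item values are assumed to be already coded as in the "Possible values" column (including whatever reverse scoring applies to item 3), and treating a score exactly equal to the cutpoint as positive is an assumption, since the text only distinguishes "above" from "below" the cutpoint.

```python
import math

# Intercept and unstandardized coefficients as reported in Table 2.
INTERCEPT = -6.543
COEFFICIENTS = (1.078, 0.185, -0.269, 0.329, -0.280, 0.288, 2.712, 2.182)
# Maximum allowed value per item: items 1-6 are coded 0-3, items 7-8 are coded 0/1.
ITEM_MAX = (3, 3, 3, 3, 3, 3, 1, 1)


def screener_probability(items):
    """Return P = e^(a+Bx) / (1 + e^(a+Bx)) for eight pre-coded item values."""
    if len(items) != len(COEFFICIENTS):
        raise ValueError("expected eight item values")
    for value, upper in zip(items, ITEM_MAX):
        if not 0 <= value <= upper:
            raise ValueError("item value out of range")
    logit = INTERCEPT + sum(c * v for c, v in zip(COEFFICIENTS, items))
    # 1 / (1 + e^-z) is algebraically the same as e^z / (1 + e^z).
    return 1.0 / (1.0 + math.exp(-logit))


def screens_positive(items, cutpoint=0.060):
    """Assumption: a score at or above the cutpoint counts as screen-positive."""
    return screener_probability(items) >= cutpoint


# Hypothetical respondent who endorses only the two DIS duration items (7 and 8).
example = [0, 0, 0, 0, 0, 0, 1, 1]
print(round(screener_probability(example), 3), screens_positive(example))
```

For that hypothetical respondent the score is roughly 0.16, above the 0.060 cutpoint, whereas endorsing item 7 alone gives roughly 0.02, which shows how heavily the two duration items weigh in the score.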

Evaluation of Screener

The screener was evaluated by examining its ability to predict depressive disorder, using the DIS assessment as the criterion, in two subsamples (the primary care and mental health subsamples) of the ECA study and in the primary care and mental health samples of the PSP study. Using a range of screener cutpoints, the authors examined screener sensitivity, specificity, positive predictive value, and false negative rate. Definitions of these terms are given above. When comparing the screener score (above cutpoint = positive or below cutpoint = negative) with the criterion case definition (DIS depression is positive or negative), the four possible classifications of an individual will be labeled a-d (Fig. 1).

                        Screener
                        +      -
DIS depression    +     a      b
                  -     c      d

FIG. 1. Four classifications of an individual using the screener. Sensitivity = a/(a + b); specificity = d/(c + d); positive predictive value = a/(a + c); and false negative rate = b/(b + d).
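The four quantities defined in Fig. 1 can be computed directly from the cell counts a-d. The sketch below follows the figure's formulas exactly, including its definition of the false negative rate as b/(b + d); the function name and the example counts are hypothetical and are not data from the study.

```python
def screening_metrics(a, b, c, d):
    """Compute the Fig. 1 quantities from the four cell counts.

    a: DIS positive, screener positive    b: DIS positive, screener negative
    c: DIS negative, screener positive    d: DIS negative, screener negative
    """
    if min(a + b, c + d, a + c, b + d) == 0:
        raise ValueError("a row or column of the 2x2 table is empty")
    return {
        "sensitivity": a / (a + b),
        "specificity": d / (c + d),
        "positive_predictive_value": a / (a + c),
        "false_negative_rate": b / (b + d),  # as defined in Fig. 1
    }


# Hypothetical counts for 1,000 screened patients (illustration only).
for name, value in screening_metrics(a=43, b=7, c=86, d=864).items():
    print(f"{name}: {value:.3f}")
```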

Results

Table 3 shows the screener's ability to predict MOS-defined depressive disorder in the primary care and mental health subsamples of the ECA study (subsets of the total sample in which the screener was developed), and in the primary care and mental health samples of the PSP study (independent samples). The sensitivity, specificity, and positive predictive value of the screener for current depressive disorder, as defined in the MOS study, were examined using a range of different cutpoints. Results are shown for each of several cutpoints in a range that maintains high sensitivity of the screener. Cutpoints above 0.080 are not shown because they resulted in sensitivities considered too low for screening purposes. Maximal sensitivity for all but one of the samples was achieved using a cutpoint of 0.009, but at this level of sensitivity the positive predictive values were quite low (ranging from 9 to 37). At the cutpoint of 0.060, the positive predictive value is only slightly lower than the highest achievable predictive value for this range of sensitivity.

Sensitivity of the screener tends to be higher and specificity lower among the mental health samples when they are compared with the primary care samples. Although specificity of the screener is somewhat lower among the mental health samples, positive predictive values of the screener remain high as a result of the high base rates of depressive disorder in this population.
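The dependence of positive predictive value on the base rate can be made explicit with Bayes' rule. The sketch below is an illustration rather than the authors' computation: the function name is an assumption, and the inputs are the rounded Table 3 percentages at the 0.060 cutpoint for the two ECA subsamples, so the results only approximate the published figures.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """PPV via Bayes' rule; all arguments are proportions in [0, 1]."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Rounded Table 3 values at cutpoint 0.060 (ECA subsamples).
primary_care = positive_predictive_value(0.86, 0.95, 0.030)
mental_health = positive_predictive_value(0.89, 0.87, 0.125)

print(f"primary care PPV:  {100 * primary_care:.0f}")
print(f"mental health PPV: {100 * mental_health:.0f}")
```

Even though specificity is lower in the mental health subsample, its higher base rate yields the higher predictive value, and both results land close to the published 37 and 50.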

Screener Utility for Standard Versus Stringent DIS Definition of Current Depressive Disorder

Table 4 shows the utility of the screener for identifying depressive disorder within the past month, using the standard DIS definition and the more stringent definition suggested by Von Korff and Anthony.24 A screener cutpoint of 0.060 was used for this analysis. Using the stringent definition reduces the prevalence of depressive disorder found within the past month to half that found when using the standard definition. Sensitivity of the screener is higher and specificity only slightly lower for the more stringent definition of depression. The positive predictive value of the screener is better for the standard definition of depression because base rates are higher using this definition. Findings for the stringent versus standard definitions of depression followed similar patterns when results were examined for other current prevalence periods of depressive disorder (past 6 months and past year), using the range of screener cutpoints.


TABLE 3. Screener Detection of Current Depressive Disorder as Defined in MOS Using Varying Cutpoints

                ECA                             PSP
                Primary Care    Mental Health   Primary Care    Mental Health
                (N = 1416)      (N = 206)       (N = 497)       (N = 97)

Sensitivity for cutpoint =
  0.009         100             100             86              100
  0.026         93              96              86              92
  0.043         91              92              86              92
  0.060         86              89              86              92
  0.080         86              88              79              81

Specificity for cutpoint =
  0.009         83              63              75              38
  0.026         92              83              85              56
  0.043         94              86              88              62
  0.060         95              87              90              63
  0.080         96              89              91              69

Positive predictive value for cutpoint =
  0.009         16              28              9               37
  0.026         26              46              14              44
  0.043         33              48              17              47
  0.060         37              50              20              48
  0.080         39              55              20              49

% criterion positive
                3.0             12.5            3.0             26.5

TABLE 4. Screener Detection of Depressive Disorder Within the Past Month Using Standard and Stringent DIS Definition†

                            ECA                                             PSP
                            Total           Primary Care    Mental Health   Primary Care    Mental Health
                            (N = 3015)      (N = 1416)      (N = 206)       (N = 501)       (N = 97)

Sensitivity
  Stringent                 89              96              94              86              94
  Standard                  72              70              75              71              87

Specificity
  Stringent                 95              95              85              90              58
  Standard                  96              96              87              91              65

Positive predictive value
  Stringent                 23              24              37              20              32
  Standard                  41              40              52              33              54

% criterion positive
  Stringent                 1.5             1.8             8.7             2.8             17.5
  Standard                  3.4             4.0             15.5            5.6             32.0

† Screener cutpoint = 0.060.

Screener Utility for Varying Diagnostic Timeframes

The utility of the screener was examined across each of the four prevalence periods within which diagnoses can be determined using the DIS: lifetime, within the past year, within the past 6 months, and within the past month. Current prevalence periods were defined according to the stringent DIS definition for data presented in Table 5, but similar variations by time period were found when using the standard DIS definition. Data are provided for two screener cutpoints, 0.060 and 0.009.

TABLE 5. Screener Detection of Depressive Disorder Defined for Lifetime and 3 Current Prevalence Periods Using Stringent DIS Definition

                  ECA                                             PSP
                  Total           Primary Care    Mental Health   Primary Care    Mental Health
                  (N = 3015)      (N = 1416)      (N = 206)       (N = 501)       (N = 97)
Cutpoint          0.060   0.009   0.060   0.009   0.060   0.009   0.060   0.009   0.060   0.009

Sensitivity
  Lifetime        37      71      34      72      47      80      41      72      75      91
  Past year       67      95      61      95      74      96      68      88      87      100
  Past 6 months   74      93      70      93      83      96      76      91      90      100
  Past month      89      98      96      100     94      100     86      93      94      100

Specificity
  Lifetime        97      89      97      88      88      70      93      81      79      52
  Past year       96      86      95      84      85      63      91      76      67      46
  Past 6 months   96      85      95      83      86      62      91      76      69      41
  Past month      95      84      95      82      85      61      90      75      58      34

Positive predictive value
  Lifetime        36      40      33      45      47      52      48      38      82      71
  Past year       36      18      35      20      43      28      28      16      66      54
  Past 6 months   31      14      31      15      43      25      26      14      56      44
  Past month      23      9       24      9       37      20      20      10      32      24

% criterion positive
  Lifetime        9.6             12.2            29.1            14.2            56.7
  Past year       3.2             4.0             13.1            5.0             39.2
  Past 6 months   2.5             3.1             11.7            4.2             32.0
  Past month      1.5             1.8             8.7             2.8             17.5

When the screener cutpoint was 0.060, sensitivity of the screener dramatically increased with more recent diagnoses. The specificity of the screener decreased only slightly ranging from lifetime to more recent diagnostic periods; positive predictive values also decreased somewhat for more recent diagnostic periods. Although the specificity of the screener was adequate for all prevalence periods, sensitivity was very low (ranging from 34 to 75) for lifetime depressive disorder, and fell below 80 for most of the samples when examining diagnoses obtained within the past year and within the past six months. It was only in predicting depressive disorder in the past month that a cutpoint of 0.060 resulted in acceptable performance of the screener.

Using a screener cutpoint of 0.009, sensitivity of the screener reached more acceptable levels even for the lifetime prevalence period. The result of this increased sensitivity was loss of specificity of the screener, with particularly low positive predictive values for current diagnoses (within the past year or more recent). A cutpoint of 0.009, however, resulted in quite good performance of the screener for detecting lifetime depressive disorder.
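The trade-off between the two cutpoints can be examined on any set of screener scores by sweeping the cutpoint and recomputing each measure. The sketch below does this on a small, made-up list of scores and criterion statuses; the data, the function name, and the choice to count a score equal to the cutpoint as positive are all assumptions made for illustration, and the sketch assumes both criterion-positive and criterion-negative individuals are present.

```python
def sweep_cutpoints(scores, has_disorder, cutpoints):
    """Return (cutpoint, sensitivity, specificity, ppv) for each cutpoint.

    A score at or above the cutpoint counts as screener positive
    (an assumption; ties are not discussed in the text).
    """
    rows = []
    for cut in cutpoints:
        a = b = c = d = 0
        for score, positive in zip(scores, has_disorder):
            flagged = score >= cut
            if positive and flagged:
                a += 1
            elif positive:
                b += 1
            elif flagged:
                c += 1
            else:
                d += 1
        sensitivity = a / (a + b)
        specificity = d / (c + d)
        ppv = a / (a + c) if (a + c) else float("nan")
        rows.append((cut, sensitivity, specificity, ppv))
    return rows


# Made-up scores and criterion statuses, for illustration only.
scores = [0.004, 0.007, 0.012, 0.020, 0.031, 0.050, 0.065, 0.090, 0.150, 0.400]
has_disorder = [False, False, False, False, True, False, False, True, True, True]

for cut, sens, spec, ppv in sweep_cutpoints(scores, has_disorder, [0.009, 0.060]):
    print(f"cutpoint {cut:.3f}: sensitivity {sens:.2f}, "
          f"specificity {spec:.2f}, PPV {ppv:.2f}")
```

On these toy data the lower cutpoint catches every criterion-positive case at the cost of more false positives, which is the same qualitative pattern reported for the screener.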

Screener Utility for Specific Depressive Disorders

Table 6 presents sensitivities, specificities, and positive predictive values of the screener separately for lifetime major depression and dysthymia, using the cutpoint of 0.009 that was found to be adequate for screening lifetime depressive disorder. A substantial number of individuals with depressive disorder have both dysthymia and major depression. Among those with either lifetime major depression or lifetime dysthymia, 22% to 31%, depending upon the sample, had both disorders, while among those with one of these depressive disorders in the past month, 13% to 31% had both. Those patients with both disorders are included in analysis of each of the separate diagnostic categories in Table 6. For all samples except the PSP mental health sample, the screener was more sensitive to dysthymia than to major depression, while specificities and positive predictive values of the screener were similar for the two types of depressive disorder.

TABLE 6. Screener Detection of Lifetime Major Depression and Lifetime Dysthymia†

                            ECA                                             PSP
                            Total           Primary Care    Mental Health   Primary Care    Mental Health
                            (N = 3051)      (N = 1416)      (N = 206)       (N = 501)       (N = 97)

Sensitivity
  Major depression          65              66              76              67              100
  Dysthymia                 90              90              89              85              92

Specificity
  Major depression          87              85              65              78              34
  Dysthymia                 87              85              62              78              35

Positive predictive value
  Major depression          29              32              41              25              24
  Dysthymia                 25              26              26              26              33

% criterion positive
  Major depression          7.0             9.5             24.0            10.3            48.5
  Dysthymia                 4.7             5.7             13.1            8.3             25.8

† Screener cutpoint = 0.009.

Although the results are not shown here, the authors also examined specific diagnoses of major depression and dysthymia within the past month, using a screener cutpoint of 0.060. The pattern of results for that prevalence period was similar to the pattern of results for lifetime diagnoses, with the screener displaying somewhat greater sensitivity for dysthymia than major depression.

Screener Detection of Depressive Disorder Versus Other Psychiatric Disorders

Table 7 presents data showing the discriminant utility of the screener, that is, the screener's ability to detect depressive disorder relative to its ability to detect other nonaffective psychiatric disorders. Persons with lifetime major depression or dysthymia (with or without manic episodes) were compared to persons with any other nonaffective DIS/DSM-III psychiatric disorder. Those persons with both depressive disorder and a nonaffective disorder were excluded from the analysis. Persons in each disorder category were compared to those persons with no disorder, a comparison that resulted in identical specificities of the screener for depressive and nonaffective disorders.

TABLE 7. Screener Detection of Lifetime Affective Disorders Only and Lifetime Nonaffective Disorders Only Versus No Lifetime Disorder†

                            ECA                                             PSP
                            Total           Primary Care    Mental Health   Primary Care    Mental Health
                            (N = 2074)      (N = 934)       (N = 91)        (N = 327)       (N = 21)
                            (N = 2697)      (N = 1227)      (N = 141)       (N = 421)       (N = 37)

Sensitivity
  Affective                 69              75              84              69              100
  Nonaffective              19              18              33              22              58

Specificity
  Affective                 92              90              74              83              72
  Nonaffective              92              90              74              83              72

Positive predictive value
  Affective                 32              36              46              28              73
  Nonaffective              45              42              55              34              83

False negative rate
  Affective                 2               2               5               4               20
  Nonaffective              25              27              46              28              58

% criterion positive
  Affective                 5.3             6.9             20.9            8.9             47.7
  Nonaffective              27.2            29.1            48.9            29.2            70.3

† Cutpoint = 0.009.

The screener is considerably more sensitive to depressive disorder than to nonaffective disorder. As the false negative rates show, a much greater proportion of those with nonaffective disorder than of those with depressive disorder are therefore incorrectly classified. Although the positive predictive value of the screener for nonaffective disorder exceeds, to a moderate extent, that of affective disorders, the high positive predictive value can be explained by the higher base rates of the combined nonaffective disorders compared with the affective disorders.
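The base-rate argument above can also be checked numerically for the false negative rate, which Fig. 1 defines as b/(b + d). As a rough check, the sketch below recomputes it from the rounded ECA total-sample entries of Table 7; the function name is an assumption, and because the inputs are rounded percentages the results are approximate.

```python
def false_negative_rate(sensitivity, specificity, prevalence):
    """Proportion of screener negatives who meet the criterion, b / (b + d)."""
    missed = (1.0 - sensitivity) * prevalence      # cell b, as a proportion
    true_neg = specificity * (1.0 - prevalence)    # cell d, as a proportion
    return missed / (missed + true_neg)


# Rounded ECA total-sample values from Table 7 (cutpoint 0.009).
affective = false_negative_rate(0.69, 0.92, 0.053)
nonaffective = false_negative_rate(0.19, 0.92, 0.272)
print(f"affective: {100 * affective:.0f}, nonaffective: {100 * nonaffective:.0f}")
```

With identical specificity in both comparisons, the gap between the two rates comes from the lower sensitivity and the higher base rate of the combined nonaffective disorders.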

Discussion

An eight-item symptom scale was developed as the first of a two-stage procedure identifying depressive disorder in patients visiting offices of general medical or mental health specialty providers. The results of analyses designed to assess the effectiveness of the screener for this purpose indicated that sensitivity and specificity of the scale were consistently high across four different samples from primary care and mental health user populations. Although sensitivity of the screener was high for recent (within the past month) depressive disorders, it dropped substantially when disorders were defined for longer prevalence intervals. Varying the cutpoint for the screener improved sensitivity for longer prevalence intervals (within the past 6 months, within the past year, and lifetime), but, overall, the screener was not as effective for these longer prevalence intervals. This characteristic of the screener is very likely a function of the timeframe for the screener items themselves. Six of the screener items are from the CES-D; they ask how the respondent felt in the past week. The remaining two items, taken from the DIS, ask about periods of depressed affect in the past year. A modification of the screener that might increase its ability to detect depressive disorder for longer prevalence intervals would be to extend the timeframe of the screener items.

In the primary health care samples, as in the general household sample, the screener tended to have somewhat lower sensitivity and higher specificity for depressive disorders than in the mental health care samples. This result can be explained by considering the impact that borderline cases have on overall sensitivities and specificities.29,30 Because a screener is very likely to detect severe cases of disorder more accurately than mild cases, sensitivity is higher when the proportion of severe to mild cases in the population is higher. Among mental health care users, where the prevalence of depressive disorder is high, this effect is expected. Similarly, specificity will be higher when the proportion of borderline-negative cases to clear-cut-negative cases is lower, as one expects in primary care or general populations. In spite of these differences in the properties of the screener when used in the primary care and mental health samples, the authors found that in both types of samples the screener had adequate sensitivities (over 85) and positive predictive values (20 to 50) for use as part of the two-stage case identification procedure for which it was developed.

The screener was more effective in detecting depressive disorders when using a stringent interpretation of current disorder than when using the standard DIS definition. The distinction between these two methods of operationalizing current disorder with the DIS has been discussed in detail by Von Korff and Anthony24 using major depression as an example. The standard method for assessing the period of current major depression using the DIS is to determine whether the respondent has met full criteria at any time in his or her life, and then determine when the last spell of feeling depressed with "some" other previously mentioned lifetime symptoms ended. The alternative, stringent definition requires that symptoms from at least four DSM-III criterion B symptom groups were experienced by the respondent within a specific, past timeframe (e.g., 1 month). Of those who met the standard DIS definition of current major depression or dysthymia, 44% to 56%, depending on the sample, also met the more stringent definition. Von Korff and Anthony's finding of 54% in a Baltimore household survey falls within this range. Sensitivity of the screener was 7 to 26 points higher using the stringent definition rather than the standard definition, with no substantial loss in specificity. This suggests that the screener is better for detecting depressive disorder during more severe or acute stages of disorder than during periods of diminished or residual symptomatology.

This screener was designed as the first of a two-stage case identification process. The second-stage instrument was intended to be the DIS. One limitation of the analyses here is that the screener instrument and the DIS were not independently administered. In developing and testing the screener, practical constraints necessitated reliance upon existing survey data in which (in the ECA study) the CES-D and the DIS were administered as part of a single interview. The authors did not expect this to substantially influence the findings regarding the CES-D items in the screener since these items were independently administered in the PSP study and were asked at a different point in the interview protocols than the DIS affective disorder items in the ECA study. The two DIS items that were included in the screener, however, were also used to define the criterion measure. In both the ECA and PSP studies, these two items were not separately administered; the two DIS screener items and two DIS items used in the diagnostic algorithm, therefore, are perfectly correlated.

In the case of a true staged case-identification design, in which the screener and criterion instrument are independently administered, the authors would expect a high correlation between the items from a first to a repeat administration, but not a perfect correlation because some measurement error will exist. The test-retest reliability of these two items was examined by the authors as part of a study of 230 community adults who received an in-person DIS, followed approximately 3 months later by a telephone administration of the depression section of the DIS. Overall agreements for the two DIS items asked on a lifetime basis were 86% for 2 weeks of feeling depressed and 91% for 2 years of feeling depressed.31 Kappa statistics, which correct for chance agreement, were 0.73 and 0.46, both of which are considered acceptable levels of agreement. These data suggest that an independent administration of the screener and the criterion instruments would not have a large effect on the findings presented here.
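Kappa compares the observed agreement with the agreement expected by chance from the marginal totals. The sketch below shows the standard computation for a yes/no item asked on two occasions; the function name and the counts are hypothetical and are not the data from the test-retest study.

```python
def cohen_kappa(both_yes, first_only, second_only, both_no):
    """Return (observed agreement, kappa) for two administrations of a yes/no item."""
    n = both_yes + first_only + second_only + both_no
    observed = (both_yes + both_no) / n
    # Chance agreement from the marginal "yes" proportions of each administration.
    first_yes = (both_yes + first_only) / n
    second_yes = (both_yes + second_only) / n
    expected = first_yes * second_yes + (1 - first_yes) * (1 - second_yes)
    return observed, (observed - expected) / (1 - expected)


# Hypothetical counts for 100 respondents interviewed twice.
observed, kappa = cohen_kappa(both_yes=40, first_only=8, second_only=6, both_no=46)
print(f"observed agreement {observed:.2f}, kappa {kappa:.2f}")
```

When an item is rarely endorsed, chance agreement is already high, which is how a higher raw agreement can accompany a lower kappa, as with the 2-year item.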

The method used to score the screener in this study was a predictive model based upon logistic regression techniques. This model differs from the alternative approach of summing unit-weighted items, which is the traditional method of scoring mental health symptom indices. The traditional method assumes that all items measure the target dimension of mental health status equally well. The regression model assumes that different items may have different value in predicting a criterion measure; weights represent the partial effect of that item, net of all the other items. The authors used this approach to make optimal use of the information contained in a very short set of items. Further studies are needed to determine the robustness of the particular weighted model presented here.
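To make the contrast concrete, the sketch below scores two sets of responses both ways. The intercept, weights, and yes/no item coding are placeholders invented for illustration; they are not the coefficients or response format of the screener.

```python
import math


def unit_weighted_score(responses):
    """Traditional scoring: sum the item responses with equal weight."""
    return sum(responses)


def logistic_score(responses, weights, intercept):
    """Predicted probability from a logistic regression model."""
    linear = intercept + sum(w * x for w, x in zip(weights, responses))
    return 1.0 / (1.0 + math.exp(-linear))


# Placeholder values only: eight items, hypothetical weights and intercept.
weights = [0.4, 0.3, 0.5, 0.2, 0.3, 0.4, 1.5, 1.8]
intercept = -6.0
respondent_a = [1, 1, 1, 1, 1, 1, 0, 0]  # six low-weight items endorsed
respondent_b = [0, 0, 0, 0, 0, 0, 1, 1]  # two high-weight items endorsed

for name, r in [("A", respondent_a), ("B", respondent_b)]:
    print(name, unit_weighted_score(r), round(logistic_score(r, weights, intercept), 3))
```

With these placeholder numbers, respondent A has the higher unit-weighted total but respondent B has the higher predicted probability, so the two scoring methods can rank the same people differently.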

The authors' screener differs from other measures of general psychological distress in another respect. Other measures have assessed symptoms experienced within a given time frame, such as the past week, but the items have not included specific criteria regarding the length of time that a symptom must persist before it is considered positive. The screener used here was based on the assumption that such information may significantly add to predictive utility. The results indicated that two items added to the screener, which asked about persistent periods of depressed mood, were very powerful predictors of depressive disorder.

In sum, the authors have developed a very brief screener having high sensitivity and specificity for the depressive disorders of major depression and dysthymia. The screener functions well both for a household population and for users of medical and mental health services. The screener functions especially well for depressive disorder that is current and active. In the future, the authors will report data on the utility of the screener in the National Study of Medical Care Outcomes.

Acknowledgements

Data from The Los Angeles ECA Study were collected as part of the Epidemiologic Catchment Area Program (ECA). The ECA is a series of five epidemiologic research studies performed by independent research teams in collaboration with staff of the Division of Biometry and Epidemiology (DBE) of the National Institute of Mental Health (NIMH). The NIMH Principal Collaborators are Darrel A. Regier, Ben Z. Locke, and Jack D. Burke, Jr.; the NIMH Project Officer is William J. Huber. The Principal Investigators and Co-Investigators from the five sites are: Yale University, U01 MH 34224, Jerome K. Myers, Myrna M. Weissman, and Gary L. Tischler; the Johns Hopkins University, U01 MH 33870, Morton Kramer and Sam Shapiro; Washington University, St. Louis, U01 MH 33883, Lee N. Robins and John E. Helzer; Duke University, U01 MH 35386, Dan Blazer and Linda George; University of California, Los Angeles, U01 MH 35865, Marvin Karno, Richard L. Hough, Javier I. Escobar, M. Audrey Burnam, and Dianne M. Timbers.

Data from the PSP study were collected as part of a study conducted by Richard Hough, principal investi- gator, under contract from the National Institute of Mental Health (#DB-81-0036).

References

1. Goldberg DP. The Detection of Psychiatric Illness by Questionnaire: A Technique for the Identification and Assessment of Nonpsychotic Psychiatric Illness. London: Oxford University Press, 1972.

2. Radloff LS. The CES-D scale: A self-report depression scale for research in the general population. Appl Psychol Measurement 1977;1:385.

3. Robins LN, Helzer JE, Croughan J, et al. National Institute of Mental Health Diagnostic Interview Schedule: Its history, characteristics, and validity. Arch Gen Psychiatry 1981;38:381.

4. Endicott J, Spitzer RL. A diagnostic interview: The schedule for affective disorders and schizophrenia. Arch Gen Psychiatry 1978;35:837.

5. Hoeper EW, Nycz PD, Cleary PD, et al. Estimated prevalence of RDC mental disorder in primary medical care. Int J Mental Health 1979;8:6.

6. Schulberg HC, Saul M, McClelland M, et al. Assessing depression in primary medical and psychiatric practices. Arch Gen Psychiatry 1985;42:1164.

7. Wells KB. Depression as a tracer condition for the national study of medical care outcomes. Santa Monica: The Rand Corporation, 1985. (R-3293-RWJ/HJK).

8. Beck AT, Ward CH, Mendelson M, et al. An inventory for measuring depression. Arch Gen Psychiatry 1961;4:561.

9. Zung WWK. Self-rating depression scale. Arch Gen Psychiatry 1965;12:63.

10. Kelman HC, Parloff MB. Interrelations among three criteria of improvement in group therapy: Comfort, effectiveness, and self-awareness. J Abnorm & Soc Psychol 1957;54:281.

11. Ware JE Jr., Johnson SA, Davies-Avery A, et al. Conceptualization and measurement of health for adults in the Health Insurance Study: Vol III, Mental Health. Santa Monica: The Rand Corporation, 1979. (R-1987/3-HEW).

12. Weissman MM, Sholomskas D, Pottenger M, et al. Assessing depressive symptoms in five psychiatric populations: A validation study. Am J Epidemiol 1977;106:203.

13. Myers JK, Weissman MM. Use of a self-report symptom scale to detect depression in a community sample. Am J Psychiatry 1980;137:1081.

14. Roberts RE, Vernon SW. The Center for Epidemiological Studies Depression Scale: Its use in a community sample. Am J Psychiatry 1983;140:41.

15. Hough RL, Landsverk JA, Stone JD, et al. Psychiatric screening scale project: Final report. Contract #DB-81-0036. National Institute of Mental Health, 1983.

16. Hough RL, Karno M, Burnam MA, et al. The Los Angeles Epidemiological Catchment Area research program and the epidemiology of psychiatric disorders among Mexican Americans. J Oper Psychiatry 1983;14:42.

17. Burnam MA, Hough RL, Escobar JI, et al. Six-month prevalence of specific psychiatric disorders among Mexican-Americans and Non-Hispanic Whites in Los Angeles. Arch Gen Psychiatry 1987;44:687.

18. Robins LN, Helzer JE, Ratcliff KS, et al. Validity of the Diagnostic Interview Schedule Version II: DSM-III diagnoses. Psychol Med 1982;12:855.

19. Helzer JE, Robins LN, McEvoy LT, et al. A comparison of clinical and Diagnostic Interview Schedule diagnoses: Physician reexamination of lay-interviewed cases in the general population. Arch Gen Psychiatry 1985;42:657.

20. Anthony JC, Folstein M, Romanoski AJ, et al. Comparison of the Lay Diagnostic Interview Schedule and a Standardized Psychiatric Diagnosis: Experience in Eastern Baltimore. Arch Gen Psychiatry 1985;42:667.

21. Karno M, Burnam MA, Escobar JI, et al. Development of the Spanish Language version of the NIMH Diagnostic Interview Schedule. Arch Gen Psychiatry 1983;40:1183.

22. Burnam MA, Karno M, Hough RL, et al. The Spanish Diagnostic Interview Schedule: Reliability and comparison with clinical diagnoses. Arch Gen Psychiatry 1983;40:1189.

23. Myers JK, Weissman M, Tischler GL, et al. Six-month prevalence of psychiatric disorders in three communities. Arch Gen Psychiatry 1984;41:959.

24. Von Korff MR, Anthony JC. The NIMH Diagnostic Interview Schedule modified to record current mental status. J Affective Disord 1982;4:365.

25. Brittain E. Probability of developing coronary artery disease. Technical Report 54. Stanford: Stanford University Division of Biostatistics, 1980.

26. Knaus WA, Draper EA, Wagner DP, et al. An evaluation of outcome from intensive care in major medical centers. Ann Intern Med 1986;104:410.

27. Dixon WJ, ed. BMDP Statistical Software. Berkeley: University of California Press, 1983.

28. Cohen J, Cohen P. Applied Multiple Regression/Correlation Analysis for The Behavioral Sciences. Hillsdale NJ: Lawrence Erlbaum, 1975.

29. Sackett DL, Haynes RB, Tugwell P. Clinical Epidemiology: A Basic Science for Clinical Medicine. Boston: Little Brown & Co., 1985.

30. Robins LN. Epidemiology: Reflections on testing the validity of psychiatric interviews. Arch Gen Psychiatry 1985;42:918.

31. Wells KB, Burnam MA, Leake B, et al. Agreement between face-to-face and telephone-administered versions of the depression section of the NIMH Diagnostic Interview Schedule. J Psychiatr Res (in press).
