
ORIGINAL REPORT

A systematic review of validated methods for identifying depression using administrative data

Lisa Townsend 1*, James T Walkup 1, Stephen Crystal 1 and Mark Olfson 2

1 Institute for Health, Health Care Policy & Aging Research, Rutgers University, New Brunswick, NJ, USA
2 Department of Psychiatry, College of Physicians and Surgeons, Columbia University and the New York State Psychiatric Institute, New York, NY, USA

key words—depression; antidepressants; epidemiology; surveillance

INTRODUCTION

Administrative and claims data (hereafter administrative data) represent an important resource for safety surveillance and research on the use and effectiveness of medical products. They offer opportunities to conduct safety surveillance or study drug use, other treatments, and selected outcomes for large and diverse patient populations across a broad range of usual care settings. However, because these datasets are not designed for either surveillance or research purposes, their use is subject to significant challenges and limitations.1 One key challenge involves assessing the validity of algorithms for identifying various outcomes and coexisting conditions. This was recognized by the Food and Drug Administration as an important need for its Mini-Sentinel pilot program, which is currently focused on conducting safety surveillance to refine safety signals that emerge for its regulated medical products. An important step is to assess the validity of algorithms for identifying health outcomes of interest (HOI) in administrative data.

Depression represents an important HOI. It is a leading source of disability, accounting for approximately one third of healthy life years lost to disability for people aged 15 years and older.2 Given the significant disease burden caused by depression, large-scale efforts to identify the disorder and monitor its outcomes are of paramount importance to public health. In the present report, we review studies that have sought to validate algorithms for identifying cases of depression using administrative data. This report summarizes the process and findings of the depression algorithm review. The full report is available on the Mini-Sentinel Web site at http://mini-sentinel.org/foundational_activities/related_projects/default.aspx.

*Correspondence to: L. Townsend, Rutgers University School of Social Work, 536 George Street, Room 103, New Brunswick, NJ 08901. E-mail: [email protected]

Copyright © 2012 John Wiley & Sons, Ltd. Pharmacoepidemiology and Drug Safety 2012; 21(S1): 163–173. Published online in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/pds.2310

METHODS

Data sources were limited to administrative datasets from the USA or Canada. The general search strategy was developed based on prior work by the Observational Medical Outcomes Partnership (OMOP)3 and its contractors and modified slightly for these reports. The modified OMOP search strategy was combined with PubMed terms representing the HOI. Medical subject heading (MeSH) terms were used as HOI search terms. Details of the methods for these systematic reviews can be found in the accompanying article by Carnahan and Moores.4 Briefly, the base PubMed search was combined with the following terms to represent depression: depression, major depressive disorder, dysthymic disorder, or seasonal affective disorder. The workgroup also searched the database of the Iowa Drug Information Service (IDIS) using a similar search strategy to identify other relevant articles that were not found in the PubMed search. To identify depression validation studies that were unpublished, in prepublication, or that were not identified by the search strategies, Mini-Sentinel investigators were requested to provide information on any published or unpublished administrative data studies that validated an algorithm for depression. Results were aggregated into two sets of files, one containing the abstracts for review and the other for documenting abstract review results. The PubMed search was conducted on 14 May 2010 and the IDIS searches on 11 June 2010.
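For illustration only, the depression portion of such a strategy might be assembled as in the short Python sketch below. The exact MeSH strings and the base OMOP validation-study filter used by the workgroup are not reproduced here, so every term and the placeholder filter should be read as assumptions.

    # Illustrative sketch: building the depression term block named in the text
    # as a PubMed-style query string. The base OMOP filter is a placeholder.
    depression_terms = (
        '"depression"[MeSH Terms] OR '
        '"depressive disorder, major"[MeSH Terms] OR '
        '"dysthymic disorder"[MeSH Terms] OR '
        '"seasonal affective disorder"[MeSH Terms]'
    )
    base_omop_filter = "<base OMOP validation-study search>"  # placeholder, not the actual filter
    full_query = f"({depression_terms}) AND ({base_omop_filter})"
    print(full_query)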

Abstract review. Each abstract was reviewed independently by the first and second authors (L.T. and J.W.) to determine whether the full-text article should be reviewed. The following abstract exclusion criteria were applied: (i) the abstract did not mention depression or dysthymia; (ii) the study did not use an administrative database (eligible sources included insurance claims databases and other secondary databases that identify health outcomes using billing codes); and (iii) the data source was not from the USA or Canada. Exclusion criteria were documented sequentially (i.e., if one exclusion criterion was met, then the other criteria were not documented). If the reviewers disagreed on whether the full text should be reviewed, then it was selected for review. Interrater agreement on whether to include or exclude an abstract was calculated using Cohen's kappa.
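For readers who want to reproduce this type of agreement statistic, the sketch below computes Cohen's kappa for two reviewers from a 2x2 include/exclude table; the counts shown are hypothetical illustrations, not the workgroup's data.

    # Minimal sketch (hypothetical counts): Cohen's kappa for two reviewers
    # each making an include/exclude call on every abstract.
    def cohens_kappa(both_include, r1_only, r2_only, both_exclude):
        n = both_include + r1_only + r2_only + both_exclude
        observed = (both_include + both_exclude) / n
        p1 = (both_include + r1_only) / n            # reviewer 1 "include" rate
        p2 = (both_include + r2_only) / n            # reviewer 2 "include" rate
        expected = p1 * p2 + (1 - p1) * (1 - p2)     # agreement expected by chance
        return (observed - expected) / (1 - expected)

    print(cohens_kappa(both_include=250, r1_only=40, r2_only=35, both_exclude=1436))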

Full-text review. Full-text articles were reviewed independently by the first and second authors (L.T. and J.W.), with the goal of identifying validation studies. The full-text review included examination of articles' reference sections as an additional means of capturing relevant citations. Citations from the references were selected for full-text review if they were cited as a source for a depression algorithm or were otherwise deemed likely to be relevant. Full-text studies were excluded from the final evidence table if they met one or more of the following criteria: (i) the article contained a poorly described or difficult to operationalize depression algorithm, defined by the absence of Diagnostic and Statistical Manual of Mental Disorders (DSM) depression diagnosis codes (296.2, 296.3, 300.4, or 311) or the International Classification of Diseases (ICD) diagnosis codes for depression (296.2, 296.3, 300.4, 311, 298.0, or 309.1), and (ii) the article provided no validation measure of depression or did not report validity statistics. Full-text review exclusion criteria were applied sequentially. If there was disagreement on whether a study should be included, the two reviewers (L.T. and J.W.) attempted to reach consensus on inclusion by discussion. If they could not agree, an additional investigator (M.O.) was consulted to make the final decision.

All studies that survived the exclusion screen were included in the final evidence table. A single investigator abstracted each study for the table. A second investigator confirmed the accuracy of the abstracted data. A clinician or topic expert was consulted to review the results of the evidence table and to evaluate how the findings compared with the findings of diagnostic methods used in clinical practice. This assessment helped to determine whether the algorithms excluded any depression diagnosis codes commonly used in clinical practice and the appropriateness of the validation measures in relation to clinical diagnostic criteria.

RESULTS

The total number of unique citations from the combined searches was 1731. A second PubMed search incorporating additional database names identified 30 citations.

Abstract reviews. Of the 1761 abstracts reviewed, 286 were selected for full-text review; 219 were excluded because they did not study depression, 994 were excluded because they were not administrative database studies, and 262 were excluded because the data source was not from the USA or Canada. Cohen's kappa for reviewer agreement regarding abstract inclusion or exclusion was 0.77.
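As a quick arithmetic check on the abstract-level tally (using only the counts reported above), the exclusions plus the abstracts carried forward account for all screened records:

    # 219 + 994 + 262 abstracts were excluded; 286 went to full-text review.
    excluded = 219 + 994 + 262          # = 1475
    assert excluded + 286 == 1761       # all screened abstracts accounted for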

Full-text reviews. Of the 286 full-text articles reviewed, 10 were included in the final evidence tables; 142 were excluded because the identification algorithm was poorly defined (e.g., did not use the specified DSM or ICD diagnosis codes for depression), and 134 were excluded because they included no validation of depression or validity statistics. Reviewers identified 36 new citations from full-text article references. Of these, one was included in the final report; 9 did not study depression, 12 were not database studies, 5 were excluded because the depression algorithm was poorly defined, and 9 were excluded because they included no validation of depression or validity statistics. The final evidence tables collectively include 11 articles (see Tables 1 and 2). Cohen's kappa for reviewer agreement regarding inclusion or exclusion of full-text articles was 0.95.

Depression algorithms and validation statistics. All 11 publications listed in the evidence tables included algorithms with various combinations of four different ICD-9 diagnostic codes to define depression: depression NOS (311), dysthymic disorder (300.4), and major depressive disorder, single episode (296.2) or recurrent (296.3) (Tables 1 and 2).

Table 1. Depression Algorithm Definitions/Validation Characteristics for Studies Using Standardized Scales or Structured Diagnostic Interviews as a Validation Standard

Kahn et al., 2008 (ref. 14)
Study population and time period: Adult (≥18 years) Medicaid behavioral health managed care organization enrollees (n=249) with a diagnosed mental disorder and diabetes, 63.1% female, mean age 52.2 years, 2006.
Outcome studied: PHQ-9 assessed depression by mail survey.
Algorithm: Depression ICD-9 code 296.2, 296.3, 298.0, 300.4, 309.0, 309.1, 309.28, or 311* in the encounter data.
Validation/adjudication procedure and operational definition: Operating characteristics of depression diagnosis code in encounter data (screen) in relation to PHQ-9 score ≥10.
Positive predictive value: PPV = 66.4% (71 of 107).
Sensitivity: Sens = 51.1% (71 of 139).

Katon et al., 2006 (ref. 5)
Study population and time period: Adolescent (11-17 years) primary care outpatients (n=769) with a history of asthma treatment; excluded patients treated for bipolar disorder or schizophrenia; depressed subsample 64% female; 2004-2005.
Outcome studied: C-DISC assessed DSM-IV major depression or dysthymia; C-DISC assessed DSM-IV panic, generalized anxiety, social phobia, agoraphobia, or separation anxiety disorder.
Algorithm: ≥1 ICD-9 depressive disorder diagnosis (296.2, 296.3, 298.0, 300.4, 309.0, 309.1, 309.28, or 311*) in utilization record during 12 months before C-DISC.
Validation/adjudication procedure and operational definition: Proportion of patients with utilization claim for depression who met C-DISC criteria for a depressive disorder (PPV1); proportion of patients with utilization claim for depression who met C-DISC depressive or anxiety disorder criteria (PPV2).
Positive predictive value: PPV1 = 31.5% (29 of 92); PPV2 = 39.4% (41 of 104).
Sensitivity: Sens1 = 48.3% (29 of 60); Sens2 = 36.6% (41 of 112).

Katon et al., 2004 (ref. 6)
Study population and time period: Adult HMO patients (n=4385) with treatment of diabetes mellitus and major depression as assessed by PHQ-9, 60% female, mean age 59 years; excluded patients without diabetes, with cognitive impairment, too ill to participate, or with a language/hearing problem.
Outcome studied: PHQ-9 positive screen for current major depression.
Algorithm: In 12 months before assessment: (1) ≥1 ICD-9 depression code (296.2, 296.3, 298.0, 300.4, 309.0, 309.1, 309.28, or 311*) AND (2) antidepressant (AD) prescription.
Validation/adjudication procedure and operational definition: Proportion of patients with PHQ-9 major depressive disorder who were detected by ICD-9 code (S1) or AD prescription (S2).
Positive predictive value: NA.
Sensitivity: Sens (S1) = 36.3% (190 of 524); Sens (S2) = 42.9% (225 of 524).

McCusker et al., 2008 (ref. 12)
Study population and time period: Emergently admitted medical inpatients aged ≥65 years (n=185) from 2 university-affiliated acute care hospitals in Montreal, oversampled for depression, excluding patients with cognitive impairment.
Outcome studied: DIS assessed major depressive disorder of >6 months or <6 months duration.
Algorithm: During 12 months after index inpatient admission, 3 algorithms: (1) outpatient claim for physician services for ICD-9 311 or 300.4; (2) antidepressant prescription; (3) psychiatrist visit.
Validation/adjudication procedure and operational definition: Proportion with DIS assessed major depression of >6 months (Sens1) and <6 months (Sens2) duration with outpatient claims for depression during 12 months before or after index inpatient admission.
Positive predictive value: PPV1 = 56.3% (9 of 16); PPV2 = 54.5%; PPV3 = 62.5% (10 of 16).
Sensitivity: Sens1 = 15.8% (9 of 57); Sens2 = 52.6% (30 of 57); Sens3 = 17.5% (10 of 57).

Solberg et al., 2003 (ref. 13)
Study population and time period: Adult (≥18 years) outpatients (n=274) from 9 staff model primary care clinics in a metropolitan area with a depression code, no antidepressant prescription in the prior 6 months, and no diagnosis of bipolar disorder, schizophrenia, or alcoholism in the past year, 1998-1999.
Outcome studied: (1) Depressive symptoms (CES-D ≥6) (PPV1).
Algorithm: ICD-9 311 code (the only code available for depression), no other 311 codes in previous 6 months, no antidepressant fills in previous 6 months.
Validation/adjudication procedure and operational definition: Proportion of patients meeting the administrative code definition of depression who met each of the four outcomes (CES-D score, self-reported current depression, told by health care professional at visit had depression, and chart audit with depression diagnosis or treatment at index visit).
Positive predictive value: PPV1 = 71.5% (196 of 274).
Sensitivity: NA.

Abbreviations: Sens, sensitivity; DSM-IV, Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition; PHQ-9, Patient Health Questionnaire-9; C-DISC, Computerized Diagnostic Interview Schedule for Children; DIS, Diagnostic Interview Schedule; CES-D, Center for Epidemiologic Studies Depression Scale; PPV, positive predictive value; ICD-9, International Classification of Diseases, Ninth Revision.


Table 2. Depression Algorithm Definitions/Validation Characteristics for Studies Using Medical Records/Self-Report as a Validation Standard

Frayne et al., 2010 (ref. 7)
Study population and time period: National sample (n=133,068) of veterans who were treated for diabetes mellitus and responded to a health survey, 98.1% male, 76.6% white, mean age 66.3 years, 1998-1999.
Outcome studied: Patient self-report of depression: "Has a doctor ever told you that you have depression?"
Algorithm: Diagnosis of ICD-9 296.2x, 296.3x, or 311.
Validation/adjudication procedure and operational definition: Proportion of patients meeting algorithm criteria told that they have depression (PPV) and proportion not meeting algorithm criteria not told that they have depression (NPV).
Positive predictive value: PPV = 84%.
Sensitivity: NA.

Kramer et al., 2003 (ref. 10)
Study population and time period: Veterans (n=109) from 3 VA medical centers with ≥1 outpatient claim for a depressive disorder (296.2-296.26, 296.3-296.36, or 311) after 180 days without such a claim or an AD fill, 95.2% male, mean age 55.5 years, 1999-2001.
Outcome studied: Presence of a depression diagnosis in the medical record during the 180-day period after the new depressive disorder claim.
Algorithm: ≥1 outpatient claim for depression (ICD-9 296.2, 296.3, or 311) in any service setting after 180 days without such a claim or an antidepressant fill.
Validation/adjudication procedure and operational definition: Estimate of PPV for new onset depression by determining the proportion of cases with a depression diagnosis in the medical record during the 180 days prior to the new index claim for depression.
Positive predictive value: PPV = 48.6% (53 of 109) (new onset depression).
Sensitivity: NA.

Rawson et al., 1997 (ref. 15)
Study population and time period: Randomly selected inpatients with a first listed discharge diagnosis of depression (311) in administrative files, Saskatchewan, 1986.
Outcome studied: A diagnosis of depression (311) or a depression-related disorder (ICD-9 296.1, 296.4, 296.6, 300.4, 309, 311) in the medical record discharge note.
Algorithm: First listed inpatient discharge diagnosis of depression (311) in administrative files.
Validation/adjudication procedure and operational definition: Proportion with first listed discharge diagnosis of depression with ICD-9 311 (PPV1) or a depression-related diagnosis (ICD-9 296.1, 296.4, 296.6, 300.4, 309, 311) (PPV2) in the medical record discharge note.
Positive predictive value: PPV1 = 58.3% (91 of 156); PPV2 = 93.6% (146 of 156).
Sensitivity: NA.

Smith et al., 2009 (ref. 11)
Study population and time period: Adult (19-64 years) work-disabled Medicaid beneficiaries with depression claims who responded to an employment and disability survey, 2003 or 2005.
Outcome studied: Self-rated depressed mood item.
Algorithm: ≥1 Medicaid claim for ICD-9 296.2, 296.3, or 311 during the 12 months prior to the survey.
Validation/adjudication procedure and operational definition: Proportion of beneficiaries with depression claims who report depressed mood (PPV1); also the corresponding proportion among the subgroup with adequately treated depression (ATHF score of 3 or 4) (PPV2).
Positive predictive value: PPV1 = 87.9% (175 of 199); PPV2 = 91.1% (153 of 168).
Sensitivity: NA.

Solberg et al., 2006 (ref. 9)
Study population and time period: Adults (>19 years) from a large private health plan (N=135,842); 5 random samples (N=20) meeting different algorithms for depression, 2000.
Outcome studied: Medical record diagnosis of depression.
Algorithm: D1, prevalent depression: ≥2 outpatient or ≥1 inpatient ICD-9 296.2, 296.3, 300.4, or 311 codes in 12 months. D2, antidepressant (AD) treatment (A and B and C): (A) 6 months with no AD prior to a new AD fill; (B) ≥1 depression code 3 months before or after the AD; (C) ≥1 more depression code or AD fill in the 1.5 years before or after the new AD.
Validation/adjudication procedure and operational definition: Proportion of selected plan members with the D1 claims-based algorithm found to have a depression diagnosis in the medical record (PPV1) and proportion with the D2 algorithm found in the medical record to have started a new antidepressant treatment episode for depression (PPV2).
Positive predictive value: PPV1 = 98.8% (79 of 80); PPV2 = 65.0% (13 of 20) and 90.0% (18 of 20).
Sensitivity: NA.

Solberg et al., 2003 (ref. 13)
Study population and time period: Adult (≥18 years) outpatients (n=274) from 9 staff model primary care clinics in a metropolitan area with a depression code, no antidepressant prescription in the prior 6 months, and no diagnosis of bipolar disorder, schizophrenia, or alcoholism in the past year, 1998-1999.
Outcome studied: (2) Self-reported current depression (PPV2); (3) reported told at index visit has depression (PPV3); (4) chart audit depression diagnosis or treatment (PPV4).
Algorithm: ICD-9 311 code (the only code available for depression), no other 311 codes in previous 6 months, no antidepressant fills in previous 6 months.
Validation/adjudication procedure and operational definition: Proportion of patients meeting the administrative code definition of depression who met each of the four outcomes (CES-D score, self-reported current depression, told by health care professional at visit had depression, and chart audit with depression diagnosis or treatment at index visit).
Positive predictive value: PPV2 = 71.5% (196 of 274); PPV3 = 54.6% (149 of 274); PPV4 = 94.9% (260 of 274).
Sensitivity: NA.

Spettell et al., 2003 (ref. 8)
Study population and time period: Primary care physician panel members (≥12 years) from a large MCO (n=892,786) selected for meeting algorithm 1 or 2, and members matched by age, gender, and number of comorbid conditions not meeting the algorithms, 1997.
Outcome studied: Physician diagnosis of depression in the medical record during the 12-month study period.
Algorithm: Algorithm 1: in 12 months, ≥2 of (A) first listed ICD-9 296.2, 296.3, 300.4, or 311, or (B) AD fill. Algorithm 2: in 12 months, A above and ≥1 more of A or B. Bipolar disorder, depressive psychosis, or lithium fills excluded; A or B during the 12 months before the study period also excluded.
Validation/adjudication procedure and operational definition: Sensitivity (Sens), specificity (Spec), positive predictive value (PPV), and negative predictive value (NPV) determined for a sample of 465 patients for algorithms 1 and 2 with physician diagnosis of depression in the medical record as the criterion standard.
Positive predictive value: PPV1 = 49.1% (115 of 234); PPV2 = 60.6% (63 of 103).
Sensitivity: Sens1 = 95.0% (115 of 121); Sens2 = 52.1% (63 of 121).

Abbreviations: Sens, sensitivity; Spec, specificity; MCO, managed care organization; CES-D, Center for Epidemiologic Studies Depression Scale; ATHF, Antidepressant Treatment History Form; ICD-9, International Classification of Diseases, Ninth Revision; PPV, positive predictive value; NPV, negative predictive value.


Two studies further permitted adjustment disorder with brief depressive reaction (309.0), adjustment reaction with prolonged depressive reaction (309.1), adjustment reaction with mixed emotional features (309.28), and depressive type psychosis (298.0).5,6 For some algorithms in one study,7 the following codes were also included: alcohol-induced mental disorder (291.89); other specified drug-induced mental disorders (292.89); bipolar affective disorder, depressed (296.5); and bipolar affective disorder, mixed (296.6). Algorithms in three studies also required a claim for a filled antidepressant prescription.8–10
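To make the structure of these code-based definitions concrete, the sketch below flags a claim as depression-related if its diagnosis code falls in the core set shared by all 11 studies, with the broader codes as an optional extension. The record layout, field names, and prefix-matching rule are hypothetical choices for illustration, not taken from any of the reviewed studies.

    # Minimal sketch of a code-set depression flag, assuming a hypothetical claim
    # record whose diagnosis is an ICD-9-CM code stored as a string.
    CORE_DEPRESSION_CODES = ("296.2", "296.3", "300.4", "311")       # used by all 11 studies
    EXTENDED_DEPRESSION_CODES = CORE_DEPRESSION_CODES + (
        "298.0", "309.0", "309.1", "309.28")                          # permitted by two studies

    def is_depression_claim(dx_code: str, extended: bool = False) -> bool:
        """Return True if the diagnosis code begins with one of the depression codes."""
        codes = EXTENDED_DEPRESSION_CODES if extended else CORE_DEPRESSION_CODES
        return any(dx_code.startswith(code) for code in codes)

    print(is_depression_claim("296.32"))                  # True: major depression, recurrent
    print(is_depression_claim("309.28"))                  # False under the core set
    print(is_depression_claim("309.28", extended=True))   # True under the broader set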

Two categories of studies are presented. Table 1 summarizes studies that used standardized scales or structured diagnostic interviews as validation standards. Table 2 presents studies that employed medical record or self-report information as validation standards. Each table includes studies that were population based, allowing for calculation of both positive predictive values (PPVs) and sensitivities, and cohort-based studies that permitted calculation of PPV only.
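For reference, these two operating characteristics are defined from the standard 2x2 cross-classification of algorithm result against criterion standard, where TP, FP, and FN denote true positives, false positives, and false negatives:

    \mathrm{PPV} = \frac{TP}{TP + FP}, \qquad \mathrm{Sensitivity} = \frac{TP}{TP + FN}

Cohort-based designs that enroll only algorithm-positive patients supply TP and FP but not FN, which is why they yield PPV but not sensitivity.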

Algorithms for individual studies varied with respect to selected codes, treatment setting (outpatient or inpatient), position of code listing (principal vs. secondary), billing health care professional, number of required codes, and timing of the codes. The heterogeneity of the algorithms across the studies constrains strict comparisons. For example, the study that defined depression on the basis of at least two outpatient or one inpatient codes for depression (296.2, 296.3, 300.4, and 311) in a 12-month period likely included a narrower patient population9 than did the study that required only a single depression claim (296.2, 296.3, and 311).11

Validation criteria and method. The studies used three main comparators for validating depression algorithms: structured diagnostic interviews for depressive disorders,5,12 self-report items or questionnaires,6,7,11,13,14 and depression diagnoses contained in the medical record.8–10,13,15 Some of the self-report validations were based on a single item for depression or depressed mood11,13 or whether patients had ever been told by a doctor that they had depression.7,13 One study13 used a 20-item self-report depression scale (Center for Epidemiologic Studies Depression Scale [CES-D]),16 and two studies6,14 used the PHQ-9, which is a brief, 9-item validated screen for major depressive disorder.17 One of the studies permitted direct comparison of four different validation criteria: a CES-D score of ≥6, a self-report depression item, being told by a physician of a depression diagnosis, and a medical record diagnosis of depression.13 This study appears in both Tables 1 and 2 for ease of comparison.

Validation statistics. Four of the 11 studies provided sufficient information to derive a chance-corrected measure of agreement (kappa) between the algorithm and a criterion standard. Landis and Koch suggested the following kappa agreement standards: 0.0–0.20 (slight), 0.21–0.40 (fair), 0.41–0.60 (moderate), and 0.61–0.80 (substantial).18 In the three studies that used an independent assessment of depression as the criterion standard, agreement was either in the slight12,14 or fair5,12 range. The only study that permitted calculation of kappa and used a physician diagnosis of depression in the medical record as the criterion standard achieved a moderate level of chance-corrected agreement.8 However, use of medical record or chart information is not an optimal method for validating depression diagnoses in administrative data given the variability with which diagnostic information is represented in chart notation.19–21

The other seven studies did not provide sufficient information to calculate kappa values. In six of these studies,7,9–11,13,15 the analysis was limited to patients who met criteria for the depression algorithm. These studies only permit assessment of the extent to which patients who meet algorithm criteria also meet the criterion standard (PPV) but do not permit determination of the proportion of patients with the criterion standard who meet algorithm criteria (sensitivity). In four of these six studies,9,10,13,15 the criterion standard was a medical record diagnosis of depression. PPVs ranged from 48.6%10 to 98.8%.9 Because PPVs vary as a function of prevalence, the relevance of these PPVs is undermined by the absence of prevalence data. Furthermore, findings based on medical record diagnoses of depression must be interpreted with caution given that chart information may not reference diagnoses used for billing purposes.19–21

None of the algorithms achieved a high level of agreement with depression as measured by independent assessment. The highest agreement with clinically diagnosed depression was achieved by an algorithm that required, over a 12-month period, at least two first-listed ICD-9 codes for 296.2 (major depressive episode, single episode), 296.3 (major depressive episode, recurrent episode), 300.4 (dysthymic disorder), or 311 (depression not elsewhere classified) as well as a filled prescription for an antidepressant medication.8 In a large population of primary care patients, the chance-corrected agreement of this algorithm was moderate (kappa = 0.464).18

Factors influencing algorithm performance

Algorithm construction. Operating characteristics varied as a function of the algorithm definition. These relationships were most readily apparent in studies that tested different algorithms against the same criterion standard within the same patient population. In a study of veterans treated for diabetes mellitus, for example, broadening the algorithm from unipolar depression codes to include bipolar disorder and substance-related mood disorder codes markedly increased the percentage of patients captured by the algorithm, from 4.5% to 16.5%, but reduced the PPV (from 0.90 to 0.82). In this study, the criterion standard was patient report of being told by a doctor that the patient had depression.7

In one primary care study in which physician medical record diagnosis of depression was the criterion standard, narrowing the algorithm markedly altered the operating characteristics.8 In the first algorithm, patients were required to have at least two events, either of which could be an outpatient encounter with a primary diagnosis of depression or a pharmacy claim for an antidepressant medication. Thus, patients with antidepressant claims but without an outpatient depression diagnosis met algorithm requirements. The second algorithm required one outpatient claim with a primary depression diagnosis and another event, which could be either a second outpatient depression diagnosis or an antidepressant claim. Perhaps not surprisingly, the less stringent first algorithm had a lower specificity (65.4% vs. 88.4%) but a much higher sensitivity (95.0% vs. 52.1%) than the more stringent second algorithm. The kappa was slightly higher for the first (0.464) than for the second (0.425) algorithm.8
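These kappa values can be reproduced (to rounding) from the reported sensitivities and specificities once a prevalence is supplied. The sketch below uses the 121 criterion-standard cases among the 465 validation patients reported in Table 2; treating that fraction as the prevalence in the kappa formula is our simplifying assumption rather than a calculation taken from the study.

    # Sketch: chance-corrected agreement (kappa) implied by an algorithm's
    # sensitivity and specificity at a given prevalence of the criterion standard.
    def kappa_from_operating_characteristics(sens, spec, prevalence):
        p_obs = prevalence * sens + (1 - prevalence) * spec             # overall agreement
        p_alg_pos = prevalence * sens + (1 - prevalence) * (1 - spec)   # algorithm-positive rate
        p_chance = prevalence * p_alg_pos + (1 - prevalence) * (1 - p_alg_pos)
        return (p_obs - p_chance) / (1 - p_chance)

    prevalence = 121 / 465  # depressed fraction of the validation sample (assumption)
    print(kappa_from_operating_characteristics(0.950, 0.654, prevalence))  # ~0.46 (algorithm 1)
    print(kappa_from_operating_characteristics(0.521, 0.884, prevalence))  # ~0.43 (algorithm 2)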

The position of the listed depression code may affect algorithm performance. Only one study included algorithms that required the depression code to appear in the principal position.8 The other studies did not specify the position of depression codes. Whether the validity of depression codes varies with code position has not been subjected to systematic study.


Patient–illness characteristics. The studies varied with respect to patient population, ranging from unselected primary care HMO beneficiaries to populations selected for chronic illness. Clinical differences were evident among the study populations: three studies were limited to patients treated for diabetes mellitus,6,7,14 one focused exclusively on patients with a history of asthma treatment,5 and one involved only disabled persons.11 Some evidence suggests that co-occurrence of general medical disorders may compromise the management and perhaps the recognition of depression, either because comorbid medical disorders compete for clinical attention22 or because physicians may attribute signs and symptoms of depression to other medical disorders.23 For these reasons, the validity of claims-based measures of depression may vary by the general medical status of the patient population.

Level of care. The studies included in this report examined depression outcomes based on billing codes for inpatient treatment,7,9,15 outpatient visits,7–10,12,13 or emergency encounters.7,10,13 None of the studies separately assessed or compared the validation of depression outcomes from different treatment settings, although one study compared validity measures from primary care visit codes with mental health service codes.7

Patient age. One study was limited to adolescents,5 one included children and adults,15 and one included only patients at least 65 years of age.12 The remaining eight studies involved non-elderly and elderly adults. The average patient age in these studies was between 50 and 60 years. None of the studies included validation information stratified by patient age. Because PPV depends upon the prevalence of the underlying condition, it is likely that PPV will be lower in children and adolescents than in adults because of the lower treated prevalence of depression in children and adolescents.24

Patient gender. Except for two studies of veterans,7,10 all of the studies involved a predominance of female patients. None of the studies, however, provided validation data stratified by patient sex. After controlling for measures of severity and impairment, patient sex has not been found to be related to the rate of treatment of major depression.25

Payment source. The patient populations included a range of payment sources for medical services. Although some of the studies were limited to patients with Medicaid coverage,11,14 others included only patients with private insurance.5,6,9 Two studies were based on patients receiving care that was financed and provided by the Veterans Health Administration.7,10 The effects of differences in payment source and associated billing systems on the validity of depression codes remain unknown. One study of veterans' care indicated that supplementing Veterans Health Administration data with Medicare data enhanced the rate of depression detection7 (Table 2).

Period of data collection. The 11 reviewed studies were published between 1997 and 2010. The earliest data were based on care delivered in 1986,15 and the most recent data were derived from 2006,14 although 2 studies did not specify the dates of service delivery.6,12 It is likely that increases in rates of antidepressant prescribing26 over time may influence the performance of algorithms that include antidepressant prescriptions as a criterion, rendering it difficult to compare studies conducted at different periods. Although time effects were not examined specifically in this study, the influence of improvements in depression detection on algorithm performance over time remains an important question for further research.

Excluded populations and diagnoses. Five studies focused exclusively on populations with specific conditions including diabetes,6,7,14 asthma,5 and disability.11 Of the other six studies, two were limited to patients who had received inpatient treatment,12,15 and one was limited to veterans and focused on new onsets of depression.10 Only one of the studies was based on a national population.7 The other studies were derived from a Canadian province,15 two acute care hospitals,12 three Veterans Health Administration medical centers,10 large managed care organizations,8,14 a private health plan or health maintenance organization,5,6,9 or a group of nine primary care clinics.13 The composition of the study samples included herein is an important variable in evaluating the utility of algorithms to detect depression in administrative data. Population-based studies that are likely to include both individuals with and without depression allow for assessment of algorithm sensitivity in identifying depression and provide a measure of precision (PPV). Cohort-based studies, especially those that oversample individuals who are likely to have depression, only allow for estimation of precision and do not provide information regarding algorithm sensitivity. It is possible that cohort-based studies designed to measure PPV may utilize stricter algorithms, leading to a loss of sensitivity. The magnitude of the reduction in sensitivity cannot be determined from cohort-based data. In addition, the interpretive value of PPV statistics depends on the degree to which the prevalence of depression in the sample matches the epidemiological prevalence of depression.

DISCUSSION

When their performance was measured in terms of agreement with independent assessment of depression, most algorithms produced only slight or fair levels of agreement with the criterion standard. Performance improved when a physician diagnosis of depression in the medical record was used as the criterion standard. However, physician diagnoses and measures based on administrative data do not capture individuals whose depression has not come to medical attention. Furthermore, using medical records can be problematic for ascertaining whether administrative data can accurately identify individuals with depression given potential variability in the types of information recorded in patient charts.19–21 For depression cases identified algorithmically, PPV was somewhat higher, ranging from 48.6% to 98.8%. However, the overall performance of algorithms to detect depression in administrative data has shortcomings, suggesting that, in view of current limitations in the clinical recognition of depression in primary care, it is unlikely that an algorithm derived from administrative data will be developed that has generally acceptable validity for identifying depression as an outcome. For this reason, it is suggested that, when possible, research should focus on clinical practices that systematically screen for depression, while recognizing that routine screening for depression in adults and youth, although consistent with recommendations from the US Preventive Services Task Force,27,28 may not be conducted in all service settings.

The goal of identifying depression from administrative data is constrained by incomplete clinical detection. In the community, just over one half of adults with major depression receive treatment for their symptoms during the course of 1 year.25,29 In addition, primary care physicians recognize as depressed only about half of patients with depression who present for medical treatment.30 The detection rate may be even lower among patients with medical morbidity (30%)23 and veterans (40%).31 Deficiencies in the clinical diagnosis of depression, especially varying identification rates in different clinical subgroups, impose a ceiling on the performance of efforts to detect depression using administrative data. One countervailing consideration is that clinically detected or treated depression may be more severe than undetected or untreated depression.25,32 Screening initiatives33 and other efforts to improve the detection and management of depression in primary care practice34 may have an incidental salutary effect on the validity of electronic-health-record-based detection of depression.

Beyond problems with clinical detection, psychosocial considerations may further limit the accuracy of administrative data for identifying depression. Although attributions regarding the causes of depression are changing,35 concern about protecting patient confidentiality may lead some physicians to substitute non-mental disorder diagnoses on claims and patient encounter forms.36 Underreporting of depression may occur in a conscious effort by the clinician to reduce social stigma that might otherwise have adverse occupational or legal consequences for the patient.37,38

Constraints on clinical diagnosis may help to explain the range of sensitivities in the reported studies. When self-report patient assessments that capture clinically undetected depression serve as the criterion standard, sensitivities are low, ranging from 12.5%15 to 51.1%.14 When medical record diagnoses are treated as the criterion standard, the sensitivity of algorithms based on administrative data reaches as high as 95.0%.8 These differences are also reflected in the kappa values that measure agreement between the algorithm and a criterion standard.

Studies with independent assessments of depression5–7,12,13 rather than medical record diagnoses8–10,15 provide more credible evidence of algorithm validity. In this regard, the most rigorous studies involve structured diagnostic assessments5,12 followed by those that use a depression screening instrument14 as the criterion standard. Unfortunately, the low chance-corrected agreement in these studies precludes their algorithms from being recommended for case identification.

One strategy for broadening algorithms and capturing more patients with depression is to include pharmacy records indicating an antidepressant prescription fill.6,8–10 Although this strategy tends to increase sensitivity by capturing more patients with depression, it comes at the expense of false positive cases that diminish PPV.6,8,9


These trade-offs arise from the large proportion of antidepressant prescriptions that are for psychiatric and general medical conditions other than depression. In one national study, only 27% of individuals treated with antidepressants reported receiving them for depression.26

Positive predictive value depends upon the prevalence of the underlying condition. Many of the studies were performed in highly enriched samples with base prevalence rates of depression that greatly exceed those of primary care populations. Without careful attention to the prevalence of depression in the base population, the PPVs may appear deceptively high. Although this may be obvious in samples that are limited only to patients with depression,8,13 it also distorts estimates in samples that are enriched by oversampling of patients with depression. In one study, for example, the base prevalence of depression in a sample selected for treatment of a mental disorder (28.5%)14 yielded a PPV of 66.4%. In another study that involved matching patients without depression to patients with depression, the base prevalence of depression in the resulting study sample was 26.0% and the PPV was 49.1% to 60.6% depending upon the algorithm.8

Substantially lower PPVs would be expected in general populations of primary care patients. Major depression occurs in only 5%–10% of primary care patients and approximately 10%–14% of medical inpatients.39
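To make the prevalence dependence explicit, the sketch below recomputes PPV from fixed sensitivity and specificity at different prevalences using the standard relationship. The operating characteristics are those reported for the less stringent algorithm in the matched-sample study;8 carrying them unchanged into a primary care population, and using 7.5% as the midpoint of the 5%–10% range cited above, are illustrative assumptions.

    # Sketch: PPV as a function of prevalence for fixed sensitivity and specificity.
    def ppv(sens, spec, prevalence):
        true_pos = sens * prevalence
        false_pos = (1 - spec) * (1 - prevalence)
        return true_pos / (true_pos + false_pos)

    sens, spec = 0.950, 0.654          # reported operating characteristics (algorithm 1)
    print(ppv(sens, spec, 0.26))       # ~0.49, matching the enriched validation sample
    print(ppv(sens, spec, 0.075))      # ~0.18 at an assumed 7.5% primary care prevalence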

No algorithm achieved a high level of concordance with independent assessments of depression. Most routine electronic billing data do not currently include sufficient information to identify depression in a reasonably reliable manner. Electronic health records do provide sufficient information to identify adult primary care patients who have been diagnosed with depression with a moderate level of agreement. The algorithm with the strongest psychometric properties required at least two first-listed depression codes (296.2, 296.3, 300.4, and 311) over the course of 12 months as well as an antidepressant prescription claim (kappa: 0.464).8 However, this algorithm utilized a medical record diagnosis of depression as the validation standard and therefore cannot be recommended for identifying clinically diagnosed depression on the basis of administrative data.

Given that depression is an established side effect of several medications,40,41 a well-validated algorithm to identify depression in electronic health data would be valuable for postmarketing evaluation of drug safety. To improve detection of depression in electronic health records, recommendations for future research efforts include the following:

(1) Replicating the evaluation of the most promising depression algorithm8 for identifying clinically diagnosed depression in a fee-for-service or other primary care setting, using standardized scales or structured diagnostic interviews as the validation standard, is encouraged.

(2) Future research assessing promising algorithms should be conducted in general medical or specialty mental health settings that routinely screen for depression. It should be noted that even within specialty mental health settings, correlations are modest between clinical diagnoses and structured diagnostic interviews.42

(3) Improvement in the clinical recognition of depression through implementation of routine depression screening is an important variable that can affect algorithm performance. Future research should specifically examine algorithm performance over time as depression screening becomes more thoroughly implemented in youth and primary care populations.

(4) Because of increasing concern over depression-related adverse events in youth43,44 and the paucity of information that is currently available for this age group,5 priority should be given to developing algorithms to identify depression in children and adolescents.

CONCLUSION

Incomplete recognition of depression in routine clinical practice constrains the performance of electronic health information to identify depression. Unless substantial progress is made in the clinical detection of depression, algorithms based on administrative depression codes are unlikely to achieve acceptable sensitivity in identifying depression as measured by independent assessment. Depression estimates are partly a reflection of the base rates of depression in any given sample; therefore, depression rates may be elevated in samples with a high prevalence of mental disorders (e.g., Medicaid samples) and underestimated in samples with a comparatively low prevalence of mental disorders (e.g., general primary care samples). At the same time, administrative depression codes have been demonstrated to have reasonable concordance with medical record diagnoses of depression. In several contexts, most adults who receive administrative codes for depression have notations of depression in their medical records. Much less is known, however, about the sensitivity of administrative depression codes for identifying depression in the medical record. However, given the variability of information contained in medical records,19–21 these sources are not considered an optimal means of validating diagnoses in administrative data. Given that antidepressants are prescribed for a wide variety of psychiatric disorders and some general medical conditions, inclusion of prescription claims for antidepressant medications may or may not improve the PPV of algorithms to identify depression. The current value of administrative depression codes appears to be limited to identification of selected cases with a reasonable probability of having clinically recognized depression.

CONFLICT OF INTEREST

The authors declare no conflict of interest.

KEY POINTS

• None of the algorithms evaluated achieved a high level of agreement with depression as measured by independent assessment.

• Detection of depression in administrative records is influenced by the base prevalence of depression in the population of interest and is constrained by incomplete clinical detection of the disorder.

• The algorithms with the strongest psychometric properties employed the following: encounter depression codes (296.2, 296.3, 298.0, 300.4, 309.0, 309.1, 309.28, or 311) validated against the PHQ-9 (sensitivity = 51.1%, PPV = 66.4%), or an antidepressant claim within 12 months after an emergent medical hospital admission validated against a structured diagnostic interview (sensitivity = 52.6%, PPV = 54.5%).

• Use of antidepressant claims to identify individuals with depression is problematic given the widespread use of antidepressants for purposes unrelated to depressive disorders.

• The strongest algorithm performance was found when medical record notation of depression in patient charts was used as the criterion standard. However, use of medical record notation is problematic due to the variability with which depression is referenced in these documents.

ACKNOWLEDGEMENTS

This work was supported by the Food and Drug Administration through the Department of Health and Human Services Contract Number HHSF223200910006I. The views expressed in this document do not necessarily reflect the official policies of the Department of Health and Human Services, nor does mention of trade names, commercial practices, or organizations imply endorsement by the US government.

REFERENCES

1. Stang P. Epidemiological context of signalling. Drug Saf 2007; 30(7): 611–613.
2. World Health Organization. The Global Burden of Disease: 2004 Update. 2004. Available at: http://www.who.int/healthinfo/global_burden_disease/2004_report_update/en/index.html. [1/16/2011].
3. The Observational Medical Outcomes Partnership (OMOP). Health Outcomes of Interest. Available at: http://omop.fnih.org/HOI. [6/4/2010].
4. Carnahan RM, Moores KG. Mini-Sentinel's systematic reviews of validated methods for identifying health outcomes using administrative and claims data: methods and lessons learned. Pharmacoepidemiol Drug Saf 2012; 21(S1): 82–89.
5. Katon WJ, Richardson L, Russo J, et al. Quality of mental health care for youth with asthma and comorbid anxiety and depression. Med Care 2006; 44: 1064–1072. DOI: 10.1097/01.mlr.0000237421.17555.8f
6. Katon WJ, Simon G, Russo J, et al. Quality of depression care in a population-based sample of patients with diabetes and major depression. Med Care 2004; 42: 1222–1229. DOI: 10.1097/00005650-200412000-00009
7. Frayne SM, Sharkansky EJ, Wang D, et al. Using administrative data to identify mental illness: What approach is best? Am J Med Qual 2010; 25(1): 42–50. DOI: 10.1177/10628606093446347
8. Spettell CM, Wall TC, Allison J, et al. Identifying physician-recognized depression from administrative data: Consequences for quality measurement. Health Serv Res 2003; 38: 1081–1102. DOI: 10.1111/1475-6773.00164
9. Solberg LI, Engebretson KI, Sperl-Hillen JM, et al. Are claims data accurate enough to identify patients for performance measures or quality improvement? The case of diabetes, heart disease, and depression. Am J Med Qual 2006; 21: 238–245. DOI: 10.1177/1062860606288243
10. Kramer TL. How well do automated performance measures assess guideline adherence for new-onset depression in the Veterans Health Administration? Joint Comm J Quality & Safety 2003; 9: 479–489.
11. Smith EG, Henry AD, Zhang J, et al. Antidepressant adequacy and work status among Medicaid enrollees with disabilities: A restriction-based, propensity score-adjusted analysis. Community Ment Health J 2009; 45: 333–340.
12. McCusker J, Cole M, Latimer E, et al. Recognition of depression in older medical inpatients discharged to ambulatory care settings: A longitudinal study. Gen Hosp Psychiatry 2008; 30: 245–251. DOI: 10.1016/j.genhosppsych.2008.01.006
13. Solberg LI, Fischer LR, Rush WA, et al. When depression is the diagnosis, what happens to patients and are they satisfied? Am J Manag Care 2003; 9: 131–140.
14. Kahn LS, Fox CH, McIntyre RS, et al. Assessing the prevalence of depression among individuals with diabetes in a Medicaid managed-care program. Int J Psychiatry Med 2008; 38: 13–29.
15. Rawson NS, Malcolm E, D'Arcy C. Reliability of the recording of schizophrenia and depressive disorder in the Saskatchewan health care datafiles. Soc Psychiatry Psychiatr Epidemiol 1997; 32: 191–199.
16. Radloff LS. The CES-D scale: A self-report depression scale for research in the general population. Applied Psychological Measurement 1977; 1: 385–401. DOI: 10.1007/BF00788238
17. Kroenke K, Spitzer RL, Williams JB. The PHQ-9: validity of a brief depression severity measure. J Gen Intern Med 2001; 16: 606–613. DOI: 10.1046/j.1525-1497.2001.016009606.x
18. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977; 33: 159–174. DOI: 10.2307/2529310
19. Mojtabai R, Olfson M. Proportion of antidepressants prescribed without a psychiatric diagnosis is growing. Health Aff 2011; 30: 1434–1442. DOI: 10.1377/hlthaff.2010.1024
20. Desai MM, Rosenheck RA, Thomas JC. Case finding for depression among medical outpatients in the Veterans Health Administration. Med Care 2006; 44: 175–181. DOI: 10.1097/01.mlr.0000196962.97345.21
21. Jencks SF. Recognition of mental distress and diagnosis of mental disorder in primary care. JAMA 1985; 253: 1903–1907. DOI: 10.1001/jama.253.13.1903
22. Rost K, Nutting P, Smith J, et al. The role of competing demands in the treatment provided primary care patients with major depression. Arch Fam Med 2000; 9: 150–154. DOI: 10.1001/archfami.9.2.150
23. Tylee AT, Freeling P, Kerry S. Why do general practitioners recognize major depression in one woman patient yet miss it in another? Br J Gen Pract 1993; 43(373): 327–330.
24. Olfson M, Marcus SC, Druss BG, et al. National trends in the outpatient treatment of depression. JAMA 2002; 287: 203–209. DOI: 10.1001/jama.287.2.203


25. Kessler RC, Berglund P, Demler O, et al. The epidemiology of major depressive disorder: Results from the National Comorbidity Survey Replication (NCS-R). JAMA 2003; 289: 3095–3105. DOI: 10.1001/jama.289.23.3095
26. Olfson M, Marcus SC. National patterns in antidepressant medication treatment. Arch Gen Psychiatry 2009; 66: 848–856. DOI: 10.1001/archgenpsychiatry.2009.81
27. U.S. Preventive Services Task Force. Screening for Depression in Adults. December 2009. Available at: http://www.uspreventiveservicestaskforce.org/uspstf/uspsaddepr.htm
28. Williams SB, O'Connor EA, Eder M, et al. Screening for child and adolescent depression in primary care settings: A systematic evidence review for the US Preventive Services Task Force. Pediatrics 2009; 123(4): e716–e735. DOI: 10.1542/peds.2008-2415
29. Hasin DS, Goodwin RD, Stinson FS, et al. Epidemiology of major depressive disorder: Results from the National Epidemiologic Survey on Alcoholism and Related Conditions. Arch Gen Psychiatry 2005; 62: 1097–1106. DOI: 10.1001/archpsyc.62.10.1097
30. Wells KB, Hays RD, Burnam A, et al. Detection of depressive disorder for patients receiving prepaid or fee-for-service care: results from the Medical Outcomes Study. JAMA 1989; 262(23): 3298–3302. DOI: 10.1001/jama.262.23.3298
31. Liu CF, Campbell DG, Chaney EF, et al. Depression diagnosis and antidepressant treatment among depressed VA primary care patients. Adm Policy Ment Health Ment Health Serv Res 2006; 33: 331–341. DOI: 10.1007/s10488-006-0043-5
32. Valenstein M, Ritsema T, Green L, et al. Targeting quality improvement activities for depression: implications of using administrative data. J Fam Pract 2000; 49(8): 721–728.
33. Valenstein M, Vijan S, Zeber JE, et al. The cost-utility of screening for depression in primary care. Ann Intern Med 2001; 134: 345–360.
34. Wells KB, Sherbourne C, Schoenbaum M, et al. Impact of disseminating quality improvement programs for depression in managed primary care: a randomized controlled trial. JAMA 2000; 283(2): 212–220. DOI: 10.1001/jama.283.2.212
35. Blumner KH, Marcus SC. Changing perceptions of depression: Ten-year trends from the General Social Survey. Psychiatr Serv 2009; 60: 306–312. DOI: 10.1176/appi.ps.60.3.306
36. Rost K, Smith R, Matthews DB, et al. The deliberate misdiagnosis of major depression in primary care. Arch Fam Med 1994; 3(4): 333–337. DOI: 10.1001/archfami.3.4.333
37. Hoyt DR, Conger RD, Valde JG, et al. Psychological distress and help seeking in rural America. Am J Community Psychol 1997; 25(4): 449–470. DOI: 10.1023/A:1024655521619
38. Hirschfeld RM, Keller MB, Panico S, et al. The National Depressive and Manic-Depressive Association consensus statement on the undertreatment of depression. JAMA 1997; 277(4): 333–340. DOI: 10.1001/jama.277.4.333
39. Katon W, Schulberg H. Epidemiology of depression in primary care. Gen Hosp Psychiatry 1992; 14: 237–247.
40. Patten SB, Barbui C. Drug-induced depression: a systematic review to inform clinical practice. Psychother Psychosom 2004; 73: 207–215. DOI: 10.1159/000077739
41. Hull PR, D'Arcy C. Isotretinoin use and subsequent depression and suicide: Presenting the evidence. Am J Clin Dermatol 2003; 4: 493–505. DOI: 10.2165/00128071-200304070-00005
42. Shear MK, Greeno C, Kang J, et al. Diagnosis of nonpsychotic patients in community clinics. Am J Psychiatry 2000; 157: 581–587. DOI: 10.1176/appi.ajp.157.4.581
43. Hammad TA, Laughren T, Racoosin J. Suicidality in pediatric patients treated with antidepressant drugs. Arch Gen Psychiatry 2006; 63: 332–339. DOI: 10.1001/archpsyc.63.3.332
44. Stone M, Laughren T, Jones ML, et al. Risk of suicidality in clinical trials of antidepressants in adults: analysis of proprietary data submitted to US Food and Drug Administration. BMJ 2009; 339: b2880. DOI: 10.1136/bmj.b2880
