identify opiod use problem - iupui

IDENTIFYOPIODUSEPROBLEM

AbdullahHamadAlzeer

SubmittedtothefacultyoftheUniversityGraduateSchoolinpartialfulfillmentoftherequirements

forthedegree DoctorofPhilosophy

intheSchoolofInformaticsandComputing,IndianaUniversity

December2018

ii

AcceptedbytheGraduateFacultyofIndianaUniversity,inpartial

fulfillmentoftherequirementsforthedegreeofDoctorofPhilosophy.

DoctoralCommittee

______________________________________JosetteJones,Ph.D.,Chair

______________________________________BrianDixon,Ph.D.

September11,2018

______________________________________

MatthewBair,MD.,MS.

______________________________________

XiaowenLiu,Ph.D.

iii

DEDICATION

To those I admire themost:my parents, Hamad andHessah, andmy beloved

siblingswhobelievedinmeallalong,Ibraheem,Saleh,Abdulaziz,Tarfah,Ahmed,and

Sarah.

iv

ACKNOWLEDGMENTS

Everyoneinthislifehashisjourney,andtheroadtocompletingthisdissertation

wasmyodyssey.Forthepast6years,IhaveencounteredenormouschallengesthatI

would not have overcome without the wisdom and support of my committee

members: Josette Jones, Brain Dixon, Mathew Bair and Xiaowen Liu. They have

mastered the art of babysitting my ego while I sought to maintain the highest

academicstandardsinresearch.Thankyouverymuch;yourlessonswentbeyondthe

pagesofthisdissertation,andIpromisetopassthetorchinthefuture.

Inadditiontomycommitteemembers,onescholarandbestfriendhashelpedthe

most in this journey.LinaAlfantoukh,withoutyour supportandexpertise indata

management,thisdissertationmaynothavebeenaccomplishedontime.Onelifemay

notbeenoughtothankyou.

IamgratefultoallmycolleaguesandprofessorsattheSchoolofInformaticsfor

theiracademicsupportandtotheemployeesattheOfficeofInternationalAffairsfor

goingbeyondexpectationstosupporttheinternationalcommunityatIUPUI.

Finally, this research would not have been possible without the trust of and

financialsupportfromthegovernmentofSaudiArabia,representedbytheCollegeof

PharmacyatKingSaudUniversity.MayIusewhatI’velearnedforthegoodofhealth

researchandpracticesinSaudiArabiaandtheworld.

v

AbdullahHamadAlzeer

IDENTIFYOPIOIDUSEPROBLEM

The aimof this research is to design a newmethod to identify the opioid use

problems (OUP) among long-term opioid therapy patients in Indiana University

Healthusingtextminingandmachinelearningapproaches.First,asystematicreview

was conducted to investigate the current variables,methods, and opioid problem

definitionsusedintheliterature.Weidentified75distinctvariablesin9modelsthat

majorlyusedICDcodestoidentifytheopioidproblem(OUP).Thereviewconcluded

thatusingICDcodesalonemaynotbeenoughtodeterminetherealsizeoftheopioid

problemandmoreeffortisneededtoadoptothermethodstounderstandtheissue.

Next,wedevelopedatextminingapproachtoidentifyOUPandcomparedtheresults

with the current conventional method of identifying OUP using ICD-9 codes.

Following the institutional review board and an approval from the Regenstrief

Institute, structured andunstructureddata of 14,298 IUHpatientswere collected

fromtheIndianaNetworkforPatientCare.Ourtextminingapproachidentified127

opioidcasescomparedto45casesidentifiedbyICDcodes.Weconcludedthatthetext

mining approachmay be used successfully to identify OUP from patients clinical

notes. Moreover, we developed a machine learning approach to identify OUP by

analyzingpatients’clinicalnotes.OurmodelwasabletoclassifypositiveOUPfrom

clinical notes with a sensitivity of 88% on unseen data. We concluded that the

machine learning approach may be used successfully to identify the opioid use

problemfrompatients’clinicalnotes.

JosetteJones,Ph.D.,Chair

vi

TABLEOFCONTENTS

1. INTRODUCTION................................................................................................................................1

Synopsis...........................................................................................................................................1

OverviewoftheOpioidUseProblems(OUP)..................................................................2

OpioidOverdoseDeath........................................................................................................2

OpioidUseProblemsandWomen...................................................................................3

OpioidUseProblemsinAdolescentsandYoungAdults........................................3

TheOpioidUseProblemsinIndiana..............................................................................4

ProblemStatement.....................................................................................................................5

Physicians’RoleinIdentifyingOpioidUseProblems.............................................5

Self-ReportOpioidRisk-AssessmentTools.................................................................6

TheStatementofPurpose........................................................................................................9

DissertationAims........................................................................................................................9

DefinitionofTerms..................................................................................................................10

2. CONDUCTASYSTEMATICREVIEWOFPREVIOUSOPIOIDUSEPROBLEMS

(OUP)............................................................................................................................................................13

Objective.......................................................................................................................................13

Method..........................................................................................................................................13

Results...........................................................................................................................................14

Variables..................................................................................................................................15

Method:StudiedPopulation,DatabaseUsedand,Analysis..............................16

OutcomeDefinition.............................................................................................................19

Discussion....................................................................................................................................22

vii

InconsistencyofPredictionDirectionofVariables...............................................22

ClaimsDatavs.ElectronicHealthRecordData......................................................23

VariationinOutcomeDefinition...................................................................................23

Limitations..............................................................................................................................26

Conclusion...................................................................................................................................28

3. IDENTIFYANDCOMPAREPATIENTSWITHOPIOIDUSEPROBLEM....................30

Introduction................................................................................................................................30

Method..........................................................................................................................................32

MethodsOverview..............................................................................................................32

SampleDefinition................................................................................................................33

VariablesofInterest...........................................................................................................33

ReportTypeSelection........................................................................................................33

IdentifyPatientswithOpioidUseProblems............................................................35

IdentifyOUPUsingICD-9Codes....................................................................................36

Analysis....................................................................................................................................37

Results...........................................................................................................................................39

SamplingResults..................................................................................................................39

IdentificationofOpioidUseProblemsUsingaText-MiningApproach.......39

ComparisonBetweenTextMiningandICD-9.........................................................41

VariablesofInterest...........................................................................................................43

Discussion....................................................................................................................................44

IdentificationofOpioidUseProblems........................................................................45

FrequencyofICDCriteriatoIdentifyOUP................................................................46

viii

FrequencyofText-MiningCriteriatoIdentifyOUP..............................................47

Limitations..............................................................................................................................49

Conclusion...................................................................................................................................50

4. MACHINE-LEARNINGAPPROACHTOIDENTIFYOPIOIDUSEPROBLEMS.........51

Introduction................................................................................................................................51

Method..........................................................................................................................................52

Machine-LearningProcessSummary.........................................................................52

Analysis.........................................................................................................................................54

WEKAConfigurationandModelComparison.........................................................54

ModelFinalization...............................................................................................................56

Results...........................................................................................................................................57

ModelComparison..............................................................................................................57

ModelOptimization............................................................................................................57

ModelTesting........................................................................................................................60

Discussion....................................................................................................................................60

Algorithms’Performance.................................................................................................60

Cost-SensitiveClassifier....................................................................................................62

Limitation................................................................................................................................62

Conclusion...................................................................................................................................63

5. SUMMARY.........................................................................................................................................64

SummaryOverview.................................................................................................................64

Self-reportTools(FirstGeneration)................................................................................64

StructuredDataModels(SecondGeneration).............................................................65

ix

ClinicalNotesModels(ThirdGeneration).....................................................................65

NLPandTextMining..........................................................................................................65

MachineLearning................................................................................................................71

Limitations..................................................................................................................................73

Conclusion..............................................................................................................................74

FutureDirectionandRecommendations.......................................................................74

6. APPENDICES....................................................................................................................................76

CHAPTER3SUPPLEMENTS.................................................................................................76

CHAPTER4SUPPLEMENTS.................................................................................................79

WEKADEMONSTRATION.....................................................................................................81

JAVACODESTOEXTRACTTEXT....................................................................................101

7. REFERENCES................................................................................................................................104

CURRICULUMVITAE

x

LISTOFTABLES

Table2.1Mostcommonlyusedvariablesinstatisticalmodelstopredictopioid

abuse.....................................................................................................................................................16

Table2.2Outcomedefinitionandmethodsdatasource.......................................................20

Table2.3Outcomedefinitionandmethodsdatasource.......................................................21

Table3.1Thecriteriaforidentifyingopioiduseproblemsinpatients’clinical

notes......................................................................................................................................................37

Table3.2Demographicscharacteristicsofstudypopulation.............................................40

Table3.3FrequencydistributionofOUPpositivecohorts’characteristics..................42

Table3.4ComparisonofcareutilizationamongOUPpositivecohorts..........................42

Table3.5Measureofassociationandlikelihoodratioforcategoricalvariables........44

Table3.6ComparisonofcontinuousvariablesamongpositiveOUPusingtext

miningvs.negativecohorts.........................................................................................................44

Table4.1HighlightofWEKAconfigurations...............................................................................56

Table4.2Comparisonofmultiplemodels’performancestoidentifypositive

OUP.........................................................................................................................................................58

Table4.3Detailedaccuracyperformancemeasureforsimplelogisticalgorithm

toclassifyOUP...................................................................................................................................58

Table4.4Initialsimplelogisticmodelperformanceon80%ofthegoldstandard

data........................................................................................................................................................58

Table4.5Optimizedsimplelogisticmodelperformanceon80%ofthegold

standarddata.....................................................................................................................................58

xi

Table4.6Simplelogisticconfusionmatrixcomparisonbeforeandafter

optimization.......................................................................................................................................58

Table4.7Simplelogisticmodelperformancetestedon20%ofunseengold

standarddata.....................................................................................................................................58

Table5.1Multivariatelogisticregressionoftext-miningmeasureofopioiduse

problems..............................................................................................................................................68

xii

LISTOFFIGURES

Figure1.1Increasesindrugandopioidoverdosedeaths–U.S.,2000–2014.................3

Figure1.2KeysubstanceuseandmentalhealthindicatorsintheU.S.:Results

fromthe2015nationalsurveyondruguseandhealth.Source:

SAMHSA,2016......................................................................................................................................5

Figure1.3ReportonthetollofopioiduseinIndianaandMarioncounty.

Source:Duwve,J.etal.,2016.........................................................................................................5

Figure1.4DistributionofopioidsbydifferenttypesofMedicareprescribers...............6

Figure1.5Overviewofsourcesandpurposeofopioidriskassessmenttools................7

Figure2.1Articlesextractionprocess...........................................................................................15

Figure3.1Identificationofopioiduseproblemssummarymethodsandresults......38

Figure3.2Positiveopioiduseproblemsratestratifiedbyagegroup.............................43

Figure4.1WEKAmodelconfigurations........................................................................................56

Figure4.2Summaryofchapter4methodandresults............................................................59

Figure5.1Opioidoverdosedeathratesper100,000populationfrom

2006-2014.Source:KaiserFamilyFoundation[123].....................................................70

1

1. INTRODUCTION

Synopsis

Medicalandnon-medicaluseofopioidshasincreasedintheUnitedStates(U.S.)

forthelastdecadeforpatientswithchronic,non-cancerpain[1–4].Prescriptionsof

opioidanalgesics,suchasmorphine,fentanyl,oxycodone,andhydromorphone,have

increased dramatically as shown in multiple studies [1,5–7]. According to The

NationalHealthandNutritionExaminationSurvey,25%ofAmericans20yearsand

older have experiencedpain that has lasted over 24hours in the pastmonth [8].

Despitetheclinical importanceofopioidsinthemanagementofpain,opioidsmay

havesignificantandadversesocietaleffects.Notably,thesecanincludetheepidemic

ofopioidabuseandopioidusedisorderwithrelatedsocio-economicandcriminal

impacts, as well as deaths attributed to overdose [9]. Achieving effective pain

management, along with maintaining functionality and healthy participation in

society, cannot be accomplished with the high occurrence of opioid addiction.

Therefore,understandingtherisksandbenefitsofprescribingopioidsisimportant

toreducetherelatedopioidabuseepidemic[10–13].Thefinancialburdenassociated

withtheopioiduseproblems(OUP)issignificant.Thecostoftreatmentforpatients

withOUP represents a large portion of the cost of rehabilitationprograms, social

consequences, public safety, legal proceedings, and so on [14,15]. The total US

societalcostsofprescriptionopioidOUPwereestimatedtobe$11.8billionin2001,

increasing to $55.7 billion in 2009 [16]. The societal effect of prescribed opioids

extendsbeyondfinancialburdentohealthmatters,accordingtotheNationalInstitute

2

ofDrugAbuse;prescribedopioidswereresponsibleforover16,000deathsin2014,

alone.Thisnumberincreasedfromnearly

6,000deathsin2001[17].

OverviewoftheOpioidUseProblems(OUP)

OpioidOverdoseDeath

In2015,accordingtotheCentersforDiseaseControlandPrevention(CDC),drug

overdosewas the leading causeof accidental death in theU.S.,with55,403death

incidents [18]. Opioid addiction alone is responsible for 20,101 (36%) of these

overdosedeathincidents[19].TheCDChasconcludedthat91Americansdieevery

day using prescription opioids and heroin, together [19]. To understand the

relationshipbetweenheroinandOUP,accordingtoJones(2013),80%(fouroutfive)

ofnewheroinusershavestartedwithprescribedpainkillers,suchasopioids[20].

Figure1.1showsthepatternofdrugoverdosedeaths involvingopioidsbytypeof

opioidinU.Sfrom2000to2014.[21]Unfortunately,alargepartoftheopioidsthat

are available illicitly are originate from prescribed opioids. In 2012, 259 million

prescriptionswereprescribedforopioids[22].Thisquantityissufficienttosupply

everyAmericanadultwithhisorherownbottle[18].

3

Figure1.1Increasesindrugandopioidoverdosedeaths–U.S.,2000–2014

OpioidUseProblemsandWomen

Drugabuseaffectsmenandwomenindifferentways;despitemenbeingmore

likelytodieofpainkilleroverdose,between1999and2010,theoverdoserateamong

women has increased by more than 400% compared to men, whose rate has

increased256% [23].Moreover,womenmaybecomedependentonopioidsmore

quickly than men and engage in doctor shopping more than men, as well [23].

Anotherimportantfactoruniquetowomenispregnancy;addictedpregnantwomen

canputinfantshealthatriskfromneonatalabstinencesyndrome[23].

OpioidUseProblemsinAdolescentsandYoungAdults

Despite the fact that the focus of this proposal is adult chronic patients, it is

importanttounderstandthatopioidconsumptionalsoaffectsadolescents.In2015,

an estimated276,000 adolescents between ages12 and17misusedopioids,with

122,000 having an addiction to prescription pain relievers [18]. Moreover, the

numbernearly tripled to829,0000youngadults fromage18 to25 inonemonth

(Figure1.2)[24].AccordingtotheNationalInstituteofDrugAbuse,mostadolescents

4

who misuse opioids obtained their drugs for free from a friend or relative [25].

Adolescentscanobtaintheirownprescriptions,aswell.Infact,from1999to2010,

thenumberofopioidprescriptionsamongadolescenthasnearlydoubled[26].

TheOpioidUseProblemsinIndiana

TheStateofIndianahasitsshareofopioidepidemics;arecentreportfromthe

Richard M. Fairbanks School of Public Health (September 2016) has reported

shockingfactsregardingOUP.Accordingtothesource,between1999to2014,the

numberof opioidoverdoseshas increasedby600% [27] (Figure1.3). In fact, the

leadingcauseofinjurydeathsinIndianaispoisoning,andmostofthesedeathsare

drugoverdoses (9outof10) [27].Drugoverdosesovertook thenumberofmotor

vehicledeathsinIndiana[27].Mostofthesedrugoverdoseswereinthegroupaged

30-39,followedbythegroupaged50-59[27].Withregardtogender,moremendie

fromdrugoverdoses;however,thegaphasshrunkovertheyears[27].Heroindeaths

havealsoincreasedsince2007(Figure1.3);mostweremale,non-Hispanicwhites,

andpeoplebetweentheagesof30-39hadthehighestdeathratefromheroin[27].

Moreinterestingfactscanbefoundinthereport[27].

5

Figure1.2KeysubstanceuseandmentalhealthindicatorsintheU.S.:Resultsfromthe2015

nationalsurveyondruguseandhealth.Source:SAMHSA,2016

Figure1.3ReportonthetollofopioiduseinIndianaandMarioncounty.

Source:Duwve,J.etal.,2016

ProblemStatement

Physicians’RoleinIdentifyingOpioidUseProblems

Pain-specialist physicians would be best suited to assess OUP overall and

prescribeproperpain-managementaction.However,duetothe limitednumberof

painspecialistphysiciansandtheiraccessibilitytothegeneralpublic,primarycare

physiciansmostoftenprovidethemajorityofpaincareinthehealthsystem[28].In

astudypublished in2016 investigating theprescribingofschedule IImedications

amongtypesofphysicians,itwasfoundthatfamilypractitionersprescribeover15.3

millionprescriptions;whereas internalmedicinepractitionersprescribeover12.7

6

million prescriptions [29] (Figure 1.4). This prescribingmay be due to improper

specialized training and educational/technical support, among other factors.

However, despite various root casues of the issue, primary care physicians have

reported frustration when providing care for chronic-pain patients [30–34]. In a

qualitative study published in 2015, some physicians admitted they have been

manipulatedmanytimesandhaveprescribedopioidsforpatientswhoexaggerated

their pain or for other patientswho had tested positive several times for heroin.

However,physicianssometimesmakethemistakeofnotprescribingmedicationto

patients who actually are in need, according to the same qualitative study [28].

Therefore,relyingonclinical judgmentalonemightmisguidephysicianstochoose

thecorrectcourseofactiontomanagechronic-painpatients.

Figure1.4DistributionofopioidsbydifferenttypesofMedicareprescribers

Self-ReportOpioidRisk-AssessmentTools

Multipleself-reportrisk-assessmenttoolshavebeendesignedandvalidatedtosome

extent in recent years [35–38]. Existing opioid risk-assessment tools vary in the

sourceofidentifyingpredictingvariables(knowledgebasevs.databaseprediction)

and the purpose of use (initiation of treatment vs. monitoring of care). For the

purpose of this dissertation, we categorize opioid risk-assessment tools into two

categories: self-report base vs. database tools. We define self-report-based risk-

assessmenttoolsasrisk-assessmenttoolsthatpredicttheriskofcurrentorfuture

opioiduseproblemsbasedonpredictivevariablesextractedfromdataself-reported

7

by the patient or physician. We define database risk-assessment tools as risk-

assessmenttoolsthatpredicttheriskofcurrentorfutureopioiduseproblemsbased

on objective data extracted from a database, such as electronic health records or

claimsadministrativedata(Figure1.5).

Figure1.5Overviewofsourcesandpurposeofopioidriskassessmenttools

Common examples of self-report tools are SOAPP, ORT, and COMM. However,

havingthemajorityofrisk-assessmenttoolssolelyrelyonpatientcompletionmight

limittheirabilitytopredictaberrantdrugbehaviorduetoreportingbias[39]orlack

ofpatientcomprehension[40].Jonesetal.,(2012)conductedastudythatisunique

in that it compared original ORT (patient self-reported) vs. ORT (psychologist

completedafterconductinganinterview)among51patients;only30(59%)patients

matchedinthesameriskcategory.Amongallvariables,agehadthehighestrateof

8

agreement, with only an 86% level of agreement [40]. Overall predictive ability

varied,aswellbetweenthetwoORTs;predictiveabilityforpsychologist-completed

ORTwas better (43% vs. 70%missing rate) [40]. These results indicate that the

processofadministrationoftheopioidrisk-assessmenttoolitselfcouldplayarolein

determiningtheoverallpredictiveabilityofthetool;perhapspatienteducationand

leveragingphysicianswithmoreaccurateandrobustdatafromhealthcaredatabases

might help tominimize the issue. A second issue of using self-reportmethods to

assessopioidrisk istheprocessofdocumentationitself.Thischallengebecomesa

monthlyburdenontheclinicalstaff,nurses,andphysicianswhoareconductingthe

documentation process. Understanding the challenge of documentation, Butler

conducted a study to examine the effect of computerizing current methods of

administratingSOAPPandCOMM(self-reporttype)ontherateofdocumentationof

thesetools.Theauthordesignedtheweb-basedtoolandimplementeditintwopain

centersforaperiodof10months.Bothsitesusedpaper-basedversionsofSOAPP-R

andCOMM.Theresultsshowedasignificantincreaseinthedocumentationofopioid

riskassessmentforbothinitialandfollow-upvisits(SOAPP30.3%vs.76.9/5,COMM

4.5% vs. 43.6%) (p < .001). Such a study highlights the significant issue with

documentationincurrentself-reportmethods[41].

A third issue with self-reportedmethods is the variation of prediction ability

(reliability)intheliterature;SOAPPisoneofthemostvalidatedtoolswefoundinthe

literature.Basedonthisreview,thereisalargevariationinthesensitivityofthetool

topredictOUP.ThesensitivityandspecificityofSOAPPforonesourcewas0.91and

0.69[42]and0.392and0.693inanother[43],respectively.Thesameargumentcan

9

beappliedtoORT,inwhichonestudyreportedasensitivityof0.45[44]andanother

reportedasensitivityof0.195[43].Onewaytoexplainthisvariationcouldbethe

differenceindefiningtheoutcomeinthevalidationstudiesreviewed.

TheStatementofPurpose

Duetothelimitednumberofpainspecialists,primarycarephysiciansprescribea

significant portion of opioids. Primary care physicians have reported frustration

whenprovidingcareforchronic-painpatients.Ourobjectiveistoinformtheclinical

andresearchcommunityabout factors contributing to the incidenceofopioiduse

problems.

Given the lack of census on structured variables’ rule to identify opioid use

problemsintheliterature,weseektoidentifystudiedvariablesintheliteratureand

classifytheirrulesintermsofprotectionorbeingriskfactorstoopioiduseproblems.

Additionally,while80%ofhealthdataarestoredasunstructureddata,thecurrent

tools to identify opioid use problems majorly rely on patient-reported data or

structured data in the electronic medical records. Thus, we seek to utilize

unstructured data to develop a predictive model for those who are at risk of

developingopioiduseproblemsusingtextminingandmachinelearningapproaches.

DissertationAims

1. Conductasystematicreviewofpreviousopioiduseproblems(OUP)predictive

modelsto:

(a) IdentifyvariablesavailableintheliteraturetopredictOUP.

(b) Exploreandcomparemethods(population,database,andanalysis)used

todevelopstatisticalmodelsthatpredictOUP.

10

(c) UnderstandhowoutcomesweredefinedineachstatisticalmodelOUP.

2. Identifyandcomparepatientswithopioiduseproblems(OUP)usingthetext

miningvs.ICDcodes.

(a) Identifypositiveopioiduseproblemspatientsusingtext-miningapproach.

(b) IdentifypositiveopioiduseproblemspatientsusingtheICD-9codes.

(c) Comparetheresultsandpopulationcharacteristics.

3. DevelopapredictivemodelforthosewhoareatriskofdevelopingOUPusinga

machinelearningapproach.

(a) Identifyagoldstandardsubsetofmedicalreports.

(b) Identifybestconfigurationstobuildmachinelearningmodel.

(c) Buildamodelthatpredictsopioiduseproblemsbasedonpatients’clinical

notes.

DefinitionofTerms

Tounderstandthecomplexityoftheproblem,thereareseveraltermsrelatedto

opioid use that need to be defined: opioid addiction, opioid abuse, aberrant drug

takingbehavior(ADTB),andopioiduseproblems(OUP).AccordingtoR.WestandJ.

Brown in their book,TheoryofAddiction, 2012, addiction is defined as a chronic

conditioninwhichthereisarepeatedpowerfulmotivationtoengageinarewarding

behavior, acquired as a result of engaging in that behavior, that has significant

potentialforunintendedharm.Itisnotall-or-none,butamatterofdegree.[45].

The second term is opioid abuse. In general, substance abuse is an initial step

towardsaddictionanddependence.TheWorldHealthOrganization(WHO)defines

substance abuse as the harmful or hazardous use of psychoactive substances,

11

includingalcoholandillicitdrugs.[46]Someattributesthatcharacterizesubstance

abuseare:failuretofulfillsocialorworkobligations,continueduseofasubstancein

hazardoussituations,legalproblemsrelatedtosubstanceabuse,andpersistentuse

despitecontinuedandrecurrentproblems[47].

Substanceabuseanddependencearenowcombinedintosubstanceusedisorder

(SUD),whichismeasuredonacontinuumfrommildtosevereaccordingtoDiagnostic

andStatisticalManualofMentalDisorders,FifthEdition(DSM5).Asthisareaofstudy

has developed, so have the defining criteria related to substance use disorders.

CategorizationwithinthespectrumofSUDrequiresatleasttwotothreesymptoms-

insteadofone-fromalistof11criteriaformildstage[48].Whilethesecriteriaare

availableindetailatwww.DSM5.org[48],itisimportanttonotethatthetolerance

criterion and withdrawal symptoms for opioid use disorder “does not apply for

diminishedeffectwhenusedappropriatelyundermedicalsupervision”[49].

Aberrantdrugtakingbehaviors(ADTB)aredefinedas“behaviorsthataremore

likelytobeassociatedwithmedicationabuseand/oraddiction”[50].Examplesof

thesebehaviorsare:1)sellingprescriptiondrugs,2)concurrentlyabusingalcoholor

other illicitdrugs,and3) losingprescribedmedicationonmultipleoccasions [51–

53].ItisworthmentioningthatsomebehaviorscouldlooklikeADTB,butinfact,they

mightindicateundertreatmentormaybemorepartofstabilizingthepaincondition

[54]. Examples of these behaviors are: 1) asking for, or even demanding, more

medication,2)askingforspecificmedications,3)useofthepainmedicationtotreat

other symptoms [54]. Despite the evolving nature of these terminologies, it is

12

important for researchers and treating prescribers to understand that there is a

spectrumofaddiction.

For this study, we define the final term, opioid use problems (OUP) as the

presenceofopioidaberrantbehaviors,opioidmisuse,opioidabuseandopioiduse

disorderinthepatientelectronichealthrecord(clinicalnotesand/orICDcodes).A

listofICDcodesandalistofspecificcriteriaofaberrantbehaviors(fortheclinical

note)areprovidedinthemethodsection(AppendixA.3andTable3.1).

13

2. CONDUCTASYSTEMATICREVIEWOFPREVIOUSOPIOIDUSEPROBLEMS

(OUP)

Objective

Several opioid risk assessment tools are available to prescribers to evaluate

opioidanalgesicabuseamongpatientswithchronicpain.Theaimofthissystematic

reviewistoanswerthefollowingquestions:1)Whatvariableshavebeenexamined

in the literature to predict opioid abuse? 2) What are the methods (population,

database,andanalysis)usedtodevelopthestatisticalmodelstocreatethetool?3)

Howweretheoutcomesdefinedineachstatisticalmodel?Theresultsofthischapter

waspublishedinapapertitled:“Reviewoffactors,methods,andoutcomedefinition

indesigningopioidabusepredictivemodels.PainMedicine,19(5),997-1009.”

Method

ApharmacistwithresearchexperienceusedEBSCOandPubMedtoconducta

two-stagesystematicsearchtoidentifyseveralarticlesthataredirectlyrelatedtothe

topicofriskassessment.First,thepharmacistidentified8articlesthatweredeemed

tobedirectlyrelatedtothetopic.Thesecondstepwastoconductasystematicsearch

usingOVIDtogeneratealistofarticlesthatshouldhaveall8articlesofrelevance.

Theideabehindchoosingthismethodisensuringallcommonarticlesofrelevance

are included inone combined search,whichperhapswill yieldmorenewarticles

relatedtothetopic,aswell.Finally,authorsofsomeofthearticlesfoundwerealso

contactedmainlythroughemailoroverthephonetorequirefurtherdataabouttheir

workortoidentifyotherrelatedwork.Thesearchwaslimitedtoarticleswrittenin

EnglishfeaturinghumanadultsubjectspublishedfromJanuary1990toApril2016.

14

Thissearchgenerated1409articles.Duplicatearticleswereautomaticallychecked

usingEndnoteX7andthenmanuallyreviewed.Outof1409articles,43articleswere

duplicates and excluded. The pharmacist reviewed the titles and abstracts of the

remaining1366articles(Figure2.1).Theinclusion/exclusioncriteriawerethe

following:

1. Theincludedarticleswereoriginalstudies,notnarrativereviews.

2. Theincludedstudiesfocusedonriskforaberrantdrug-relatedbehaviors.

3. Studiesincludedquantitativedataspecifictopredictioncapability(oddsratio,

p-value,orconfidenceinterval).

4. Studiesthatfocusedonself-administeredtoolswereexcluded.

5. Onlystudiesthatusedalgorithmstopredictopioidabusefromdataextracted

fromelectronichealthrecordsoradministrativeclaimswereincluded.

After applying the inclusion/exclusion criteria, 27 full-text articles were

downloaded.Ofthesearticles,20articleswereexcluded,becauseoftheirfocuson

selfreport tools (Figure2.1).Variablesof interestwereextractedand listedonan

Excel spreadsheet for all 7 articles included. To check face validity and data

consistency,aprimarycarephysicianreviewedextracteddata, includingvariables

fromarticles,categorizationofvariablesfromarticles,method(population,database,

analysis)used,andoutcomedefinition(opioidabuse).

Results

Duringoursystematicreview,weidentifiedsevenarticles[55–61](ninemodels)

toassessopioidriskfromdatabases(claimsdataorEHR),whichwillbethesubject

ofthisreview.Allninemodelsprovideddefinitionsoftheoutcomeofopioidabuseas

15

well as variables used topredict the outcome.The following sectionswill discuss

importantfindingswithregardtovariables,methods,andoutcomedefinitionsforthe

coincidingmodels.

Figure2.1Articlesextractionprocess

Variables

Amongtheretrievedstudies,9modelsand75distinctvariableswereidentified.

The variables were grouped into 7 main categories; demographics (6 variables),

medications(33variables),careutilizationvisits(3variables),behavior(7variables),

mental status (1 variable), pain andmedical comorbidities (15variables), pain (4

variables),andfamilyhistoryofsubstanceabuseandcomorbidities(6variables).

The identified models most commonly included demographic variables; age and

genderwerementionedinall9models.Historyofalcoholabuse,smokingstatusand

16

mentaldiagnosiswerementionedin5models.Table2.1listsallvariablesmentioned

in3modelsormoreoutof9models.

Method:StudiedPopulation,DatabaseUsedand,Analysis

Asmentionedpreviously, this review found3 studies thatusedadministrative

claimsdataand4studiesthatusedelectronichealthrecorddata.In1ofthe4articles,

adiseasemanagementprogramdatabasewasusedinadditiontohealthrecorddata.

Thenumberofpatientsincludedinthestudiesvariedgreatlyfrom196to1,552,489.

StudiesthatusedICD-9codesasaprimarysourceofdefiningtheircohortincluded

largersamplesizescomparedtostudiesthatdidnotuseICDcodes.Thissamplesize

variation is related to study design and coincides with the nature of data used.

Administrativeclaimsdataislikelynationaldata,whileEMRdataisrepresentativeof

a single site or single health care system. The inclusion criteria for a study’s

population were similar, age from 12 (or 18) to 64, at least one claim for a

prescriptionopioidinthepast12months,orahistoryofreceivingopioidsfor30to

90consecutivedays.Exclusioncriteriawerecancerpainandheroinpoisoning.Table

2.2detailsthemethodsforeacharticle.Finally,intermsofanalysisused,allstudies

usedlogisticregressionastheirprimarymultivariateanalysismethod.Usinglogistic

regressionforthispurposeisconsistentwiththecommonpracticeinthefield,which

recommendsusinglogisticregressiontoanalyzeadichotomousdependentvariable

[62](e.g.opioidabusersvs.non-abusers).Someofthestudiespreliminarilyanalyzed

variablesinbivariateanalysis,withthesignificantvariablesultimatelyenteredinto

themultivariatemodel.

17

Table2.1Mostcommonlyusedvariablesinstatisticalmodelstopredictopioidabuse

Category VariableInterpretation(Risk=Opioidabuse)

#ofmodels

Timesitwasprotective

Timesitwasriskfactor

Oddsnotreportedor=exactly1

Demographics Age Olderagedecreasesrisk 9 9 0 0Demographics Gender Maleincreasesrisk 9 0 6 3Behavior Historyofethanolabuse Alcoholabuseincreasesrisk 5 1 3 1Behavior Smokingstatus Smokingincreasesrisk 5 0 4 1Mental Mentalhealthdiagnosis Mentaldisorderincreases

risk5 0 4 1

Medications #ofopioidprescriptions Increasenumberofopioidprescriptionincreasesrisk

4 0 4 0

Behavior Earlyrefillsofopioidprescriptions

Increasenumberofearlyrefillsofopioidprescriptionincreasesrisk

4 0 4 0

Medications Non-opioidsubstanceabuse/dependence

Non-opioidsubstanceabuse/dependenceincreasesrisk

4 0 4 0

Careutilization Inpatienthospitalizationdays

Greaterhospitalizationdaysincreasesrisk

3 0 3 0

Careutilization Dayswithphysicalcarevisits

Greateroutpatientvisitsincreasesrisk

3 0 3 0

Behavior #ofpharmacieswhereopioidprescriptionwerefilled

Greaternumberofpharmacieswhereopioidprescriptionwerefilledincreasesrisk

3 0 3 0

Behavior #ofopioidprescribers Greaternumberofopioidprescriptionincreasesrisk

3 0 3 0

Medications Methadone Presenceofmethadoneprescriptionincreasesrisk

3 1 2 0

Medications Fentanyl(IR) Presenceoffentanylprescriptionhasnoeffectonrisk

3 1 1 1

Medications Fentanyl(ER) PresenceofFentanyl(ER)prescriptionincreasesrisk

3 0 2 1

18

Category VariableInterpretation(Risk=Opioidabuse)

#ofmodels

Timesitwasprotective

Timesitwasriskfactor

Oddsnotreportedor=exactly1

Medications Morphine(IR) Presenceofmorphineprescriptionincreasesrisk

3 0 2 1

Medications Morphine(ER) PresenceofMorphine(ER)prescriptionincreasesrisk

3 0 2 1

Medications Hydromorphone PresenceofHydromorphoneprescriptionhasnoeffectonrisk

3 1 1 1

Medications Oxycodone PresenceofOxycodoneprescriptiondecreasesrisk

3 1 0 2

Medications Tramadol PresenceofTramadolprescriptiondecreasesrisk

3 3 0 0

Medications Codeine PresenceofCodeineprescriptiondecreasesrisk

3 2 0 1

Medications Propoxyphene PresenceofPropoxypheneprescriptiondecreasesrisk

3 2 0 1

Medications Hydrocodone PresenceofHydrocodoneprescriptiondecreasesrisk

3 2 0 1

Medications Morphine(ER) PresenceofMorphine(ER)prescriptionincreasesrisk

3 0 2 1

Medications Fentanyl(ER) PresenceofFentanyl(ER)prescriptionincreasesrisk

3 0 2 1

MedicalComorbidities

Hepatitis PresenceofHepatitisdiagnosisincreasesrisk

3 0 3 0

19

OutcomeDefinition

The examined articles defined the outcome of opioid abuse differently.

Specifically,4outof7articles(6outof9models)dependedprimarilytheonpresence

or absenceof anopioid abuseordependencediagnosis according to ICD-9 codes,

whiletheother2studiesusedapredefinedlistofopioid-relatedaberrantbehaviors.

Additionally, they also reviewed a patient’s profile for any release from the pain-

management program due to aberrant behaviors. The last article and respective

model,usedahybridtechniquethatincludednaturallanguageprocessingmethods,

alongwithICD-9codestodefineopioidproblems(Table2.3).[55–61]

20

Table2.2OutcomedefinitionandmethodsdatasourceArticle/#ofModels #ofPatients Studysample Sourceofdata/Period CodingLewis,2014/1 202 Prescriptionsofanopioidfor30ormoredaysof

consecutivedosing.Agegreaterthan18years,andadocumentedhistoryofanychronicpainsyndromedefinedaspainlastinggreaterthan90daysinduration.

ElectronicHealthRecord/12Months

ManualChartreview

Ives,2006/1 196 chronic,non-cancerpainofatleastthreemonthsduration

Medicalrecordandourdiseasemanagementprogramdatabase/12months

Followup

White,2009/2 632,000 12to64yearswithatleast1claimforaprescriptionopioidandatleast1medicalclaimfrom2005to2006.

Administrativeclaimsdata/3months(AlternativeA)&12months(AlternativeB)

ICD-9-CM

White,2012/2 1,552,489 Patientsages12-64yearsthroughoutthe12monthspriortotheindexopioidRx

Administrativeclaimsdata/12Months

ICD-9-CM

Edlund,2007/1 15,160 Veteranswithatleastoneprescriptionforanopioidandhadmorethan91daysofsupplyin2002.(excludedindividualswithanycancerdiagnosis(ICD-9-CMcodesbetween140.0and208.9)

Administrativeclaimsdata/24months

ICD-9-CM

Turner,2014/1 5,420UrineDrugtests

PatientAge20orabove.StudyincludedonlyUrineDrugTest(UDTs)forpatientsonChronicopioidtherapy(COT)definedbyGroupHealthas70daysopioidsupplyintheprior90day.TolimitthesampletoUDTsforCNCPpatients,thestudyexcludedthoseofpatientswho,intheone-yearperiodpriortotheUDT,hadhadhospicecare,opioidprescriptionsfromoncologists,ormorethanonevisitforcancerotherthannon-melanomaskincancer.

ElectronicHealthRecord/12months

ICD-9

Hylan,2015 2,752 18yearsorolderwhoinitiatedCOTfornoncancerpainbetween2008&2010.COTwasdefinedasreceiptofequalormorethan70dayssupplyoftransdermalororalopioids(exceptbuprenorphine)inacalendarquarter,whichcorrespondedto>75%ofthedaysinthequartercoveredbyanopioidprescription.Studypatientsreceiveatleast2quartersofCOTwithina1-yearperiod

ICD-9codes:304[0,.00,.01,.02;304.7,.70,.71,.72];305[5,.50,.51,.52]

ICD-9+NaturalLanguageProcessing(NLP)toidentify[overuse,misuse,abuseorad-diction]And/or[Dependance]

21

Table2.3OutcomedefinitionandmethodsdatasourceArticle/#ofModels

OutcomeDefinition

Lewis,2014/1 multiplepharmacies,multiplenonBellevilleFamilyHealthCenter(BFHC)prescribersofopioids(anyprescriptionpreceded and followed by a BFHC provider prescribed opioid), unsanctioned dose escalation, lost or stolenprescription, UDS anomalies, illicit drug use (either self-reported or identified via UDS), multiple emergencydepartmentvisitsresultinginreceiptofopioids,familyreportofopioidsaleorotherformofdiversion,Forgingortamperingwithcontrolledsubstance,prescriptionsandmisuse/adulterationofprescribedformulationorintendedrouteofadministration.

Ives,2006/1 Negativeurinetoxicologicalscreen(UTS)forprescribedopioids,UTSpositiveforopioidsorcontrolledsubstancesnotprescribedbyourpractice,Evidenceofprocurementofopioidsfrommultipleproviders,DiversionofopioidsPrescriptionforgery,Stimulants(cocaineoramphetamines)onUTS.

White,2009/2 304.0(opioid-typedependence),304.7(combinationsofopioidtypewithanyother),305.5(opioidabuse),or965.0(poisoningbyopiatesorrelatednarcoticsbutexcluding965.01[heroinpoisoning])

White,2012/2 ICD9codes:304.0,304.7,305.5,965.00,965.02,and965.09

Edlund,2007/1 ICD9codes:304.00304.03(opioiddependence),304.70304.73(dependence;combinationsofopioidtypedrugwithanyother),and305.50305.53(opioidabuse)

Turner,2014/1 ClinicalClassificationSoftware(CCS)categorization.SmokingStatusused305.1ICD-9code.

Hylan,2015 ICD-9codes:304[0,.00,.01,.02;304.7,.70,.71,.72];305[5,.50,.51,.52]

22

Discussion

InconsistencyofPredictionDirectionofVariables

Ofnote,wefoundinconsistencyintheabilityofindividualvariablestopredictthe

outcome(aberrantdrug-relatedbehavior).Somevariables,suchasusingmethadone

(foropioidusedisordertreatment)weresignificantriskfactorsintwomodels(OR=

2.97,CI=2.57-3.42,p-value<0.0001)and(OR=3.22,CI=2.62-3.95,p-value<0.0001),

andsignificantlyprotectivefactorinathirdmodel(OR=0.26,CI=0.08-0.94,p-value

=0.02).Asecondexampleisseenwiththevariablehistoryofalcoholabuse.Despite

apredominanceintheliterature,whichshowsthatahistoryofalcoholabuseisarisk

factor (4 studies), the same variablewas slightly protective in a fifthmodel. This

variation in the role of a predictive variable (protective vs. risk factor)might be

solvedbyhavingawidermeasurementscale(continuousormultiplecategories)for

thevariablesinsteadofadichotomousscale(YesorNo).Tosupportthisargument,

Zale,E.et.al.,2014,conductedthefirststudyofitskindtotestwhethervaryinglevels

of current/historical smoking and indices of smoking heaviness/nicotine

dependence may be associated with greater likelihood of past-year prescription

opioidmisuseinthegeneralpopulation.[63]Thestudyfoundapositiveassociation

between levelof smokingheaviness/nicotinedependenceandopioidmisuse [63].

Other variables, such as gender, number of opioid prescriptions, early refills,

inpatienthospitalizationdays,dayswithphysicalcarevisits,non-opioidsubstance

abuse/dependence, number of morphine prescriptions, and number of opioid

prescriberswereallconsistentriskfactorsacrossthemodels.Othervariables,such

23

as age (older groups) and tramadol use,were consistently reported as protective

variables(Table2.1).

ClaimsDatavs.ElectronicHealthRecordData

Usingaspecifictypeofdatabase(administrativeclaimsdatavs.electronichealth

recorddata)couldpotentiallyalterdevelopmentofthemodeldesign.Forexample,

administrativeclaimdatabasesgenerallyprovideaconsistentdataformat,because

theyuseapre-definedsetofcodes,[64]Asresult, it isrelativelyeasiertocreatea

study cohort using a claims data compared to most electronic health records

databases.However,onepotentialdisadvantageofusingclaimsdataisthatthecare

providers do not always document these relevant codes. This incomplete

documentationleadstomissingoutcomesofinterestforsomepatients.[65,66]On

the other hand, using electronic health record databases, which hasmuch richer

clinical data (for each patient), [64] can facilitate improved cohort/outcome

definitions.Specificitywithinthecohortcanbeachievedbyreviewingsomeaspects

ofthepatient’sfilemanuallyorusingsomeadvancedtoolstodefinethestudycohort

basedoncertainmetrics(e.g.,estimatedglomerularfiltrationratetodefinechronic

kidneyfailurepatients)[64].

VariationinOutcomeDefinition

The InternationalClassificationofDiseases(ICD) isadiagnosticcodingsystem

that canbeused to categorizepatients.However, the issuewith creating amodel

based on an outcome structured by ICD codes is the fact that these codes often

understatetheactualnumberofpatientsexhibitingthetargetcategorization[57,58].

24

Thismeansthosepatientswhoarelabeledwithopioidproblems,accordingtoICD

codes,arelikelytohaveanopioidproblem;however,ICDcodesmayoverlookother

patientswhohaveopioidproblemsbutwerenotdocumented[57].Thiscouldbedue

an intentional lack of documentation, or in other cases, prescribers try to avoid

stigmatizing patients with opioid problems using these codes. [57] The issue of

underdocumentation in particular can create less representativemodels. Another

limitationtothemodelsfoundisthefactthatmostofthesemodelswerebasedon

claims data,which is less likely to provide the in-depth anddetailed clinical data

available in an electronic health record database. The variation in the outcome

definition across studies and its potential effect can be noticed in the self-report

opioidriskassessmenttools,aswell.Whentools,suchasSOAPP-R(SOAPPrevised)

orORTarevalidatedacrossdifferentpopulationsorwhendifferentdefinitionsofthe

outcomeareused,thepredictionsensitivityandspecificitychangesgreatly.SOAPP-

R is a self-report questionnaire designed to predict aberrant medication-related

behaviorsamongpersonswithchronicpain[67].SOAPP-Rcameasarevisedversion

fortheoriginalSOAPP,aninstrumentdevelopedbyInflexxion,(Newton,MA,USA)

withsupportfromEndoPharmaceuticalsandtheNationalInstituteonDrugAbuse

[68].Version1,consideredasaninitialsteptowarddevelopmentofascreenerfor

aberrantmedication-relatedbehaviorsinchronicpainpatients[37].SOAPP-Risone

of the most validated tools; based on literature, there is a large variation in its

sensitivity.ThesensitivityandspecificityofSOAPPwere91%and69%,respectively,

in one study [42], and 39.2% and 69.3%, respectively, in another [43]. The same

statementcanbeappliedtoORT,aswell.Onestudyreportedsensitivityof45%[46]

25

andtheotherstudyreportedsensitivityaslowas19.5%[43].Onewaytoexplainthe

variationinopioidriskassessmenttoolsperformancecouldbeduetoadifferencein

outcome definitions in the reviewed validation studies. For example, SOAPP

performance was measured in some studies based on a positive result on the

Aberrant Drug Behavior Index (ADBI) [67–69]; while in other studies, it was

measuredbasedonthepresenceofadischargereviewform[70].Atpresent,such

cross-comparisonsforopioidriskassessmentmodels,whicharebasedondatabases,

arenotyetpossibleduetoalackoffurthervalidationfrommultiplestudiesforany

ofthemodelsfound.However,thisreviewindicatesthatmoreeffortinthefutureis

likely toutilizesecondary-dataanalysis.This isdueto thenecessityofdeveloping

more accurate and comprehensivemodels from current and futurehealth system

electronicrecords.

However,thesefutureeffortswillsurelyfaceaninterestingchallenge,considering

that,asof2016,mostof theorganizationaldataareunstructured,andhealthcare

dataisnoexception.AccordingtoIBM,in2015,unstructureddatarepresented80%

of the total health data. [71] Thus, it is important for future efforts to utilize the

untappedpotentialofunstructureddatatodevelopanopioidaddictionmodelthat

usesdataminingtechniques.

Anotheraspectofdefiningoutcomesisthecohortfollow-upperiod.Whetherthe

studiesretrievedwereretrospectiveorprospective,thecoveredperiodtomeasure

theoutcomerangedfrom3monthsinonemodel,including12monthsin6models,

up to24months inonemodel,and finally24-60months(acombinationofapre-

indexingperiodandpost-indexingperiod).Thisvariationintheperiodoffollowup

26

mighthaveanunforeseeable impactonthepredictionaccuracyofthemodels.For

example,aperiodof3monthsof follow-upmaynotbesufficienttodetectopioid-

relatedaberrantbehaviors.Thus,patientsmightbecategorizedfalsenegativeasa

result.

Limitations

Thereareonly7studies,including9models,thatmetourinclusioncriteria.This

couldbeduetousingonlyonesearchengine,OVIDMedline.However, thestudy’s

searchtermswereexhaustiveandusedresultsfromcombined3searchsessionsto

mitigate this shortcoming (Figure 2.1). Another reason for the relatively small

numberofarticlesistheexclusionofself-reportopioidriskassessmentmethods(21

articles).Thejustificationforomittingthesearticlesisspecifictothestudyquestions.

Thestudyattemptstoidentifyvariablesandresourcesusedforthedesignofmodels

basedondatabases,electronichealthrecords,andadministrativeclaimdata.Another

limitation related to data collection is having only one primary care physician to

checkforfacevalidityanddataconsistency.Havinganotherreviewerandreporting

interraterreliabilitycouldaddvaluetothestudy.Inourcase,thiswasnotfeasible,

duetolimitedfundsandresourcesavailableforthestudy.

Asecondlimitationpertainstothequantitativeanalysisofthevariablesfound,

whichdidnotincludemagnitudeoftheoddratioorsignificanceofp-valuesforeach

variable.Rather,theanalysisonlydescribesavariablesruleintermsofbeingarisk

factororprotectivemeasure(Table2.1).Thiswasdue toa lackof reportingodds

ratiosin2articlesandp-valuedatainathirdarticle.These2articlesdidnotreport

theoddsdatafortheinsignificantvariables,whilethethirdarticledidnotreportany

27

p-valuesforthemodel(butreportedadjustedORandCI,instead).Finally,theway

articlescategorizedthevariablessubsetvariedacrossthemodel.Forexample,the

variable age was sub-categorized into 6 age groups, while it was treated as a

continuousvariableintheother.Moreover,determiningthereferencegroupvaried

among models, as well. In the same example of the variable “Age”, one model

identifiedanolderagesub-groupasareferencegroup,while5modelsidentifieda

youngeragesub-groupasareference,andonearticleidentifiedmiddleage(40-49)

asareferencegroup.

A third limitation of this review is the inconsistency in the definition of the

outcome,opioidabuse,acrossthearticles.Despiteadescriptionandanalysisofthe

sameissues,theparticulardefinitionsemployedvariedacrossarticles.ShannonM.

Smithetalexamined the issueofdefinition inconsistency in2013. [72]Thestudy

summarizeddifferencesbetweenmultipleopioid-use-disorderterminologiesbased

onexperts’opinion.Thestudyfoundthattheterm“misuse”“emphasizestheuseof

the substance does not follow medical indications or prescribed dosing” [72].

However,theterm“abuse”iscommonlyappliedtosubstanceusefornontherapeutic

purposes.Addiction,ontheotherhand,wasdefinedas“compulsivesubstanceuse

thatoccursdespitepersonalharmornegativeconsequences”[72].Toaddressthis

variation of the definition in the examined articles, we summarized outcome

definitionsbyeacharticleinTable2.3.Ouranalysiswaslimitedtothedataprovided

inthearticles.Thus,accesstocertaindatathatmeasuredthemodelsperformance,

such as c-statistics, sensitivity, and specificity for analyzed models, were not

available.

28

Conclusion

Opioidriskassessmenttoolsarebecomingstandardpracticeinpainandprimary

clinicspriorandduringprescribingopioidtherapyforchronicpain.However,this

review indicates that there is inter/intravariation in these tools’performance for

assessingopioidabuse.Thisreviewconcludesthatthisvariationcanbeexplainedby

thevariationinstudyperiod,samplesize,opioidabusedefinition,typeofdatabase,

and structureddocumentation to ICDcodes. Inaddition, this review illustratesan

overviewofcommonmethodsofopioidriskassessmenttoolsandcategorizesthem

based on method of reporting into self-report and database-related models.

Moreover,thestudyprovidesacomparisonBetweenthetwocategorieswhenever

possible.Thissystematicreviewpresentsalistofvariablesthatwereusedtopredict

opioidabusefromelectronichealthrecordsandadministrativedata.Furthermore,

thereviewprovidedacountofhowmanytimeseachvariablewasmentionedand

howoftenitwascountedasariskfactororprotectivemeasureintheliterature.To

ourknowledge,thisistheonlysystematicreviewthathasdoneso.Providingsuch

datatoresearcherscouldleadtodevelopingatoolthatismoreaccurateinpredicting

theriskofdevelopingaberrantdrug-relatedbehaviors.Thereviewalsoidentifiesand

compares other aspects related to opioid risk assessmentmodels design, such as

periodof the study,numberofpatients, subject inclusion criteria, and ICD-9 code

usedtoidentifypatientsfromthedatabase.Thestudyhighlightsmajordifferences

betweenarticlesdefiningopioid abusederived fromdatabases for thepurposeof

developingopioiduseriskassessmenttools.Thiscouldhelpfutureresearchersbuild

onpreviousworktocreateadvancedmodelsfor improvedpredictabilityofopioid

29

abuse. Despite the availability and presence of many self-report and database-

orientedriskassessmenttools,prescribersmightnotyetrelysolelyonthesetools

duetotheirlackofvalidationandconsistencyintheirresults.However,opioidrisk

assessmentprocedurescanbeimprovedbyenhancingstructureddatacapturedby

theelectronichealthrecordsystem.Physiciansandnursescanplayamajorrolein

thisstepbydocumentingtheproperICDcodethatspecificallyidentifiesthecategory

ofopioidabuseforpatients.Clinicalpracticesandhospitalsshouldpayattentionto

theviabilityof adopting these tools into theirpractice. Someexpectedbarriers to

adoptionofthesetoolsare:thepopulationvariationfromwithintheclinicwherethe

toolwasvalidated,proceduraldifferences,differencesinoutcomedefinitionbetween

clinics,andlackofpropertechnicalexperiencetoimplementthesetools.

30

3. IDENTIFYANDCOMPAREPATIENTSWITHOPIOIDUSEPROBLEM

Introduction

Opioidsareagroupofmedicationsprescribedtorelievepain.Beforethe1980s

thesedrugsweremainlyprescribedbysurgeonstorelieveimmediatepostoperative

painorpainrelatedtocancer[73,74].However,duringthe1980s,themovetoward

aggressivepaintreatmentandprescribingopioidsforchronicnon-cancerpainhas

been encouraged [73,75]. Advocacy health groups, such as American Pain

Foundation,continuedtopushformoreopioidprescriptionsduringthelate1990s

andthroughthe2000s[73,76–78].By2012,opioidprescriptionsincreasedfrom142

millionin1999to248millionprescriptions[79,80]andopioidsalesquadrupledfrom

1999 to 2010 [81]. Particular medications, such as hydrocodone, doubled in

consumptionbetween1999and2011[82].

Inconjunctionwiththeincreaseinprescribedopioidsperpopulation,therewas

also a higher percentage of opioid overdose deaths and substance use disorder

treatment increased in parallel from 1999 to 2008 [19]. In 2015, drug overdose

incidentsaccountedfor52,404U.S.deaths,ofwhichopioidswereinvolvedin33,091

overdoseincidents(63.1%)[83].The2016NationalSurveyonDrugUseandHealth

indicatedthattherewasapproximately11.8millionpeopleintheU.S.age12yearsor

olderwhomisused opioids in the past year [24]. At the state level, Indiana state

reported794opioid-relatedoverdosedeathswithasubstantialincreaseinheroin-

related overdose deaths. Compared to 2012, heroin-related overdose deaths

increasedfrom114to297in2016.Deathsfromsyntheticopioidshaveincreasedfor

thesameperiodfrom43to304deathsinIndiana[84].

31

Despite the importance of the opioid issue, reporting prevalence of opioid

problemsmaystillimposesomechallenges.Forexample,differencesinthetargeted

outcomedefinitions(“opioidusedisorder”vs.“opioidoverdose”vs.“opioidabuse”)

andpopulationinclusioncriteria(minimummedicationday’ssupplyornumberof

opioidprescriptions)mayalterthestudy’sabilitytoidentifytheissue[85].Despite

thesedifferences,InternationalClassificationofDisease(ICD)identificationmaybea

standardmethodthatmayusedtoreportopioidusedisorder,overdose,abuse,and

dependence(whichwewillrefertointhisstudyasOpioidUseProblem(OUP)).ICD

isacodesystemusedbyphysicianstoclassifyanddiagnosedisease.However,the

issuewithreportinganoutcomedefinedby ICDcodes is the fact that thesecodes

oftenunderstatetheactualnumberofpatientsexhibitingthetargetcategorization

[57,58].

Thus,weexaminedtheliteraturetoidentifyalternativemethodstoidentifyopioid

useproblem in addition to ICD codes.We found a limitednumberof studies that

discussalternativemethodsusingnaturallanguageprocessing(NLP),ortextmining,

inageneralhealthcaresetting[86–89].Theuseoftextmininginthesestudiesvaried

from accelerating the documentation process of patient records to automatically

parsing and coding clinical events or assisting decision-making processes by

classifyingspecificcomplexdiagnoses.

More relevant studies addressing NLP/text mining to identify opioid use

problemswereidentifiedaswell[61,90,91].Carrelletal.(2015)developedanNLP

process to identify evidence of opioid use problems in electronic health records

(EHRs)atGroupHealth,ahealthcaresysteminSeattle,WA,USA.Thestudyfoundthat

32

conventionaldiagnosticcodesforopioiduseproblemshasidentified2,240(10.1%)

patientsandNLPidentifiedanadditional728(3.1%)patientsbyanalyzingpatients’

clinicalnotes[90].Hylanetal.(2015)hasalsousedNLPto“reportonapredictive

modeldevelopedtoassessthelikelihoodofproblemopioiduseovera2-to5-year

periodfollowinginitiationofchronicopioidtherapy”withintheGroupHealthsystem.

Thestudyfoundthesensitivityoftheregressionmodelpredictionofproblemopioid

usetobe58.3%,withspecificitybeing71.2%[61].Despitetheseearlyeffortsofto

use NLP/text mining encouraging, no study has attempted to identify opioid use

problemsusingtextmininginamulti-sitestudysetting.

In this study, we compared different strategies for identifying OUP using

administrativeandEHRdata.Weexaminedwhetheraddingatextminingapproach

couldimproveidentificationofOUPforpatientsonlong-termopioidtherapy(LTOT)

foralargepopulationacrossamulti-sitehealthcaresystem.

Method

MethodsOverview

FollowingInstitutionReviewBoard(IRB#:1710876219)andanapprovalfrom

theRegenstriefInstitute(RI)DataManagementCommittee,wequeriedidentifiedthe

patients’ clinical notes from Indiana University Health (IUH), a large healthcare

networkinIndianawith3,541staffedbedsand2,563,086outpatientvisits[92].The

IUHsystemproducesover30reporttypes(e.g.,Visitnotes,Progressnotes,Discharge

notes).Next, a text-mining algorithmwas applied to identify opioiduseproblems

usingpatients’clinicalnotesandreportedincidenceresults(Figure3.1).Finally,to

investigatewhetherornottherewasadifferencebetweenpositiveOUPusingatext

33

mining approach vs. an ICD conventional approach, we compared the 2 positive

cohorts characteristics in termsof frequency and rateper1,000 long-termopioid

therapypatients andhighlightedkeypoints.Additionally,we compared results of

care utilization (outpatient visits, emergency department visits, hospitalizations,

cumulativehospitalizationdays)andtheywerealsoreported.

SampleDefinition

OursamplewasdefinedasadultIUHmembersage18yearsorolderwhohave

beenprescribedLTOT,whichisdefinedaspatientswith70daysofsupplywithinany

given90-dayperiodbetween1January2013and31December2014(24months).

WeexcludedpatientswithactivecancertofocusonLTOTfornon-cancerchronicpain

(ICD-9 codes 140.x 172.x, 174.x 209.xx, 235.x 239.xx, 338.3). Patients with

schizophreniawerealsoexcludedduetothedocumentedhighpercentageofopioid

dependenceamongthispopulation(ICD-9code295.9)[93].

VariablesofInterest

Patientcharacteristics,includingdemographics(age,gender,race,andethnicity),

alcoholabuse,non-opioidabuse,tobaccouse,mentaldisorders,andhepatitisCwere

reported for the study population. For this study, mental health disorders were

identifiedasdepressivedisorder (ICD-9codes296.2x,296.3x,300.4,311), suicide

attemptorotherself-injury(ICD-9codesE95x.x,E98x.x),oranxietydisorder(ICD9

codes300.0x,300.21,300.22,300.23,300.3,308.3,309.81).

ReportTypeSelection

DuetothelargenumberofreporttypesgeneratedbyIUHsystems,wefocusedon

themostprevalent and relevantnote types available in thedatabase.To increase

34

efficiency,thetextminingprocesswaslimitedtoreporttypesthatwereobjectively

deemedrelevanttoOUPidentificationthroughconsultingclinicalanddatasubject

expertsat IndianaUniversityand theRegenstrief Institute.Three typesof reports

wereinitiallyexcludedduetopresumedirrelevancetoOUPidentification:radiology

reports, medication fills, and patient instructions. Next, the study investigators

selected9reporttypeswithassumedrelevancytoOUPidentificationandqueriedthe

studypopulationfortheir2014counts.Thesystemreturned142,971reportsasthe

following: Emergency Department Doctor Progress Notes (48,898), Emergency

Department Discharge Notes (28,637), Primary Care Doctor Outpatient Progress

Notes (26,669), Visit Notes (21,868), Discharge Summary (11,731), History and

Physical (2,759), Admission History & Physical (1,390), Preadmission

History/Physical(576),PrimaryCareDoctorOutpatientHistoryandPhysical/Initial

Consult(443).Byexploringthe9relevantreporttypes,wefoundthatthetop5report

typesgeneratedover96%(137,802)oftotalreports,whilethebottom4wereonly

responsible for 4% of reports generated. Due to labor constraints, the bottom 4

reports (AdmissionHistory and Physical, PreadmissionHistory/Physical, Primary

CareDoctorOutpatientHistoryandPhysical/InitialConsult)wereexcluded.Thus,

EmergencyDepartmentDoctorProgressNotes, EmergencyDepartmentDischarge

Notes, Primary Care Doctor Outpatient Progress Notes, Visit Notes, Discharge

Summary,wereusedforthetext-mininganalysis.

35

IdentifyPatientswithOpioidUseProblems

Weusedtwocriteriawhichareexplainedinthefollowingsectiontoidentify

opioiduseproblem.Thecriteriawillcovertext-miningandICD-9approacheswhich

areexplainedindetailsinthefollowingsections.

IdentifyOUPusingaTextMimingProcess

The process of identifying OUP using text mining involved 2 main steps: 1)

algorithm development to flag potential positive reports; and 2) results of the

validationprocessusingsemi-assistedmanualreview.

1- AlgorithmDevelopmenttoFlagPotentialPositiveCasesUsingnDepth™

Weappliedthetext-miningpackageprovidedbynDepth™toparsemedicalnotes

todetectOUP.nDepth™isanNLPtooldesignedbytheRegenstriefInstitutein

IndianapolistoextractdatafromtheIndianaNetworkforPatientCare(INPC),which

isahealthcaredatabasemanagedbyRegenstriefInstituteonbehalfoftheIndiana

Health InformationExchange(IHIE) [94,95].As IUH ispartof INPC,nDepth™was

used toquerypatients’ clinical notes to identifypossibleopioiduseproblems.To

developthealgorithm,2keywordlistsadoptedfromtheliteraturewereenteredinto

nDepth™[91].Thefirstlistwascomprisedofopioidterms(e.g.,Vicodin,Opiate),and

thesecondlistwascomprisedofproblemterms(e.g.,addiction,abuse)(Appendix

A.1). nDepth™ creates state machines, an algorithm that parses report types for

certaincriteria,tocheckforallpossiblecombinationsofthe2listswithina5-word

distance. Flagged results are automatically checked for whether the statement is

negated, hypothetical, historical, or experienced. This process itself was adopted

fromtheConTextalgorithmdevelopedbyHarkemaetal. (2009) [96].Thus, if the

36

systemdeterminedthepatientstatementwasnotnegatedordeemedhypotheticalor

historical,thoseflagsweredeemedasexperienced(consideredpositive).

2- Semi-AssistedManualReviewValidationProcess

Flaggedreportsweresemi-assistedmanualreviewedby2trainedreviewers.In

caseofdisagreement,anexpertphysicianactedasa thirdreviewer toresolve the

dispute. To avoid overlooking any signs of opioid use problems, nDepth™ was

programmedtohighlight27suggestivephrasesintheflaggedreports.Thesephrases

werecollectedfromtheliteratureandmodifiedbasedoncommonclinicaldialogin

Indiana(AppendixA.2).Incaseswheretherewasmorethanoneflaggedreportper

patient,thesystemrandomlyselectedonetypeoftheflaggedreportstobereviewed

perpatient.ThecriteriachosentodetermineOUPwereadoptedfromCarrelletal.s’

(2015) work and listed in Table 3.1. Of note, as using marijuana medically and

recreationallyisillegalinIndiana,itsusewastreatedasconcurrentuseofanillicit

drugduring theprocessofmanualreview.Othermodificationswerealsoadopted

basedoninitialreviewsofsubsetsofpatients’clinicalreports.

IdentifyOUPUsingICD-9Codes

TwodefinitionsfromtheliteratureutilizingICD-9codeswerecombinedtocreate

a case definition of OUP: opioid abuse and dependence (304.00, 304.01, 305.50,

305.51,304.71,304.02,304.70,305.52)andopioidpoisoning(965.0,965.00,965.01,

965.02, 965.09, E850.0, E850.1, E850.2) (Appendix A.3). These definitions were

combined to minimize the chance of systematically creating type II errors by

capturingthewiderspectrumofopioiduseproblemsamongthestudypopulation.

37

Table3.1Thecriteriaforidentifyingopioiduseproblemsinpatients’clinicalnotesNo. Criteriaforopioiduseproblems1 Substanceabusetreatment,includingreferralorrecommendation2 Methadoneorsuboxonetreatmentforaddiction3 Obtainedopioidsfromnonmedicalsources4 Lossofcontrolofopioids,craving5 Familymemberreportedpatient’sopioidaddictiontoclinician6 Significanttreatmentcontractviolation7 Concurrentalcoholabuse/dependence(notremitted)8 Concurrentuseofillicitdrugs9 Currentorrecentopioidoverdose10 Patternofearlyrefills(notanisolatedevent)11 Manipulationofphysiciantoobtainopioids12 Obtainedopioidsfrommultiplephysicianssurreptitiously13 Opioidtaper/weanduetoproblems,lackofefficacy(notduetoexpectedpain

improvement)14 Unsuccessfultaperattempt15 Reboundheadacherelatedtochronicopioiduse16 Concurrentuseofunauthorizednarcotics(polypharmacy)17 Physicianstatesopioidabuse/overuse/addictionorlistedICDcodesforopioid

abuse/dependence

Analysis

Frequencytableswillbeusedtodescribestudiedpopulationcharacteristics.For

inferentialstatistics,characteristicsofpositiveOUPidentifiedbytextminingandICD

cohortswillbecompared.Chi-squarewillbeusedforcategoricaldataanalysisand

independentt-testwillbeusedtocomparemeansforcontinuousvariables.

38

Figure3.1Identificationofopioiduseproblemssummarymethodsandresults

39

Results

WeidentifiedandcomparedOUPusingICD-9codesandtextminingtechniques.

TheresultsaresummarizedinFigure3.1.

SamplingResults

Adult patients from IUH on LTOT were queried, and our inclusion criteria

returned34,661patients.Ourexclusioncriteria,cancerandschizophrenia,excluded

11,579(33.4%)patients(11,121cancerpatients,272withschizophrenia[186with

bothschizophreniaandcancer]).Weidentified23,082patientswhoreceivedLTOT

andgenerated559,464reportsacrosstheIUHsystem.Ourreportselectioncriteria

excluded8,784patients, leaving14,298eligiblepatientswhichgenerated137,804

reports.Nearly 60%of the studypopulationwere over 55 years old, andwomen

represented62%ofthestudypopulation.Thewhiteracewasdominantamongthe

study population (86%), while black patients represented 12%, and other or

unknownraceswere2%.Non-HispanicorLatinoethnicityrepresented87%ofthe

studypopulation(Table3.2).

IdentificationofOpioidUseProblemsUsingaText-MiningApproach

Ouralgorithmflagged468distinctstatementsin366uniquereportsrepresenting

154patients.ForvalidationandtoconfirmthepresenceofOUP,2reviewerseach

reviewed1flaggedreportperpositivepatientbasedonspecificcriteria(Table3.1).

Outof154flaggedpatients,only127patientsweredeemedaspositivecasesofOUP

withapositivepredictivevalueof82%.Cohen’sκwasruntodetermineiftherewas

agreementbetween2reviewers’judgementonwhetherasubsetof200patientsin

40

the study cohortweremeeting anyOUP criteria. Therewasmoderate agreement

betweenthe2reviewersjudgements:κ=.0.691(95%CI,0.58to0.79),p<.0005.

Table3.2Demographicscharacteristicsofstudypopulation

Demographics StudyCohortN(%)

Age(18-24) 236(1.7%)25-34 891(6.2%)35-44 1,749(12.2%)45-54 2,808(19.6%)55-64 3,222(22.5%)>65 5,392(37.7%)Sex(women) 8,923(62.4%)Men 5,375(37.6%)Race(Black) 1,719(12.0%)White 12,367(86.5%)Other/Unknown 212(2.0%)Ethnicity(NotHispanicorLatino) 12,369(86.5%)HispanicorLatino 138(1.0%)Unknown 1,791(12.5%)AlcoholAbuse(Yes) 88(0.6%)Non-opioidAbuse(Yes) 88(0.6%)TobaccoUse(Yes) 912(6.4%)Depression(Yes) 845(5.9%)Self-injury(Yes) 14(0.1%)HepatitisC(Yes) 94(0.7%)Total 14,298

FrequencydistributionofcriteriatoidentifyOUPinpatients’clinicalnoteswere

rankedasfollows[OUPcriteria,Frequency(percentage)]:[Physicianstatesnarcotic

abuse/overuse/addiction or listed ICD codes for opioid abuse/dependence, 48

(38%)]; [Concurrent use of illicit drugs, 18 (14%); Methadone or suboxone

treatment for addiction, 15 (12%)]; [Concurrent use of unauthorized narcotics

(polypharmacy), 11 (9%)]; [Current or recent opioid overdose, 10 (8%)];

[Concurrentalcoholabuse/dependence(notremitted),9(7%)];[Obtainedopioids

from multiple physicians surreptitiously, 8 (6%)]; [Physician or patient wants

immediate taper, 3 (2%)]; [Substance abuse treatment, including referral or

41

recommendation, 2 (2%)], [Significant treatment contract violation, 2 (2%)];

[Familymemberreportedpatient’sopioidaddictiontoclinician,1(1%)]

ComparisonBetweenTextMiningandICD-9

Incomparisontothetext-miningapproach,queryingthesamestudycohortfor

designatedICD-9codestoidentifyOUPreturned49distincteventsrepresenting45

uniquepatients(4patientshad2distinctpositiveOUPICD-9codes).Thefrequency

distributionofpositiveICD-9codesforOUPwererankedasthefollowing[ICD-9

code,Frequency(percentage)]:[304,24(49%),305.5,9(18%)],[304.01,5(10%),

965,3(6%),965.01,3(6%)],[E850.2,2(4%)],[304.71,1(2%)],[965.09,1(2%)],

[E850.0,1(2%)](AppendixA.4andAppendixA.5).

Theoverlapbetweenthe2OUPpositivecohortswasmeasured,andwefound

8cases thatwerepositive inbotha textminingapproachandICD-9codesquery

(AppendixA.1).ThefrequencydistributionofOUPpositivecohortscharacteristics

aresummarizedinTable3.3.

ThefrequencydistributionpercentageofpositiveOUPamongstratifiedpositive

cohorts(ICDandtextmining)bygendershowsthatwomenhadahigherfrequency

amongICDcohortscomparedtothetextminingcohort(71%and48%).However,

therateofoverallpositiveOUPindicatesahigherrateofpositiveOUPofmenvs.

women(14.7vs.10.4per1,000).

ThefrequencydistributionpercentageofpositiveOUPamongstratifiedpositive

cohorts(ICDandtextmining)byagegroupshowsthatthe45-54agegrouphadthe

highestfrequencypercentageofOUP(31%and24%).However,therateofpositive

OUPpergroupageindicatesthe18-24groupagehadthehighestOUPrateamong

42

agegroupsinICDandtextminingcohorts(12.7and42.4per1,000)(Table3.3and

Figure3.2).

Table3.3FrequencydistributionofOUPpositivecohorts’characteristics

Characteristic

StudyCohorts* Significance

ICD-9N(%)Text-MiningN(%)

CombinedN(%) X2(P-value)**

Age(18-24) 3(6.7%) 10(7.9%) 13(7.9%) 10.2(<0.001)†25-34 7(15.6%) 22(17.3%) 26(15.9%) 35-44 4(8.9%) 29(22.8%) 33(20.1%) 45-54 14(31.1%) 31(24.4%) 42(25.6%) 55-64 7(15.6%) 25(19.7%) 31(18.9%) >65 10(22.2%) 10(7.9%) 19(11.6%) Sex(Women) 32(71.1%) 61(48.0%) 89(54.3%) 7.1(0.0076)†Men 13(28.9%) 66(52.0%) 75(45.7%) Race(Black) 5(11.1%) 23(18.1%) 27(16.5%) 1.6(0.6014)White 39(87.7%) 101(79.5%) 134(81.7%) Other/Unknown 1(2.0%) 3(2.4%) 3(1.8%) Ethnicity(NotHispanicorLatino) 39(86.7%) 106(83.5%) 139(84.8%) 0.8(1)HispanicorLatino 0(0.0%) 2(1.6%) 2(1.2%) Unknown 6(13.3%) 19(15.0%) 23(14.0%) OtherCharacteristicsAlcoholAbuse(Yes) 2(4.4%) 3(2.4%) 5(3.0%) 0.5(0.6069)Non-opioidAbuse(Yes) 10(22.2%) 7(5.5%) 14(8.5%) 10.4(0.0028)†TobaccoUse(Yes) 14(31.1%) 17(13.4%) 30(18.3%) 7(0.0079)Depression(Yes) 12(26.7%) 14(11.0%) 23(14.0%) 6.3(0.0118)†Self-injury(Yes) 2(4.4%) 2(1.6%) 3(1.8%) 1.2(0.2804)HepatitisC(Yes) 3(6.7%) 3(2.4%) 5(3.0%) 1.8(0.1848)TotalCohorts 45 127 164* —*ICD-9: Positive OUP cases using ICD-9 codes; Text Mining: Positive OUP cases using a text miningapproach;Combined:CombinedpositivecasesinICD-9codesoratextminingapproach(Thereare8casesthatoverlappedbetweenICDandtextminingcohorts);(%):Columnpercentage.**Chi-squarewasreportedfor22measurementsandFishersexacttest(Freeman-Haltontest)forrxctables.

†SignificantdifferencebetweenICDandText-miningcohortonα=0.05.

Table3.4ComparisonofcareutilizationamongOUPpositivecohorts

CareUtilizationICDPositiveMean±(SD)

Text-miningPositiveMean±(SD) P-value

OutpatientVisits 12.84±(15.05) 12.87±(12.402) 0.992EmergencyDepartmentvisits 1.91±(2.193) 3.17±(4.915) 0.022†Hospitalizations 2.24±(2.838) 1.55±(2.572) 0.154CumulativeHospitalizationsDays 16.24±(23.985) 7.43±(12.016) 0.022††Resultsaresignificantonα=0.05

Similar to the overall study population, race and ethnicity were dominantly

White, non-Hispanic in both positive cohorts (Table 3.3). Other characteristics

associatedwithOUPintheliteraturewerereported,includingalcoholabuse,non-

43

opioid abuse, tobacco use, depression, self-injury, and hepatitis C. All showed a

patternofbeingattheirlowerpointinthenegativecohort,withamodestincrease

inthepositivetext-miningcohortandpeakingatthepositiveICD-9cohort(Table

3.3). An independent-samples t-test was conducted to compare care utilization

(outpatient visits, emergency department visit, hospitalizations, cumulative

hospitalizationdays)amongOUPpositive cohorts (Table3.4).Ouranalysis shows

that positive text mining cohort had significantly higher average of visiting

emergencydepartment(M=3.17,SD=4.9)comparingtoICDpositivecohort(M=

1.9,SD=2.1),P=0.022.TheICDpositivecohorthadasignificantlyhigheraverageof

cumulativehospitalizationdays(M=16.2,SD=23.9)comparingtothetext-mining

positivecohort(M=7.4,SD=12,P=0.022.)

Figure3.2Positiveopioiduseproblemsratestratifiedbyagegroup

VariablesofInterest

Inthisstudy,weconductedabi-variateanalysistocomparepositivetext-mining

cohorts vs. negative cohort. Chi-square and likelihood ratio were reported for

categorical variables (Table 3.5) and an independent t-test were reported for

44

continuousvariables(Table3.6).Thefollowingvariablesweresignificantonalpha=

0.05 and were ranked based on likelihood ratio and approximate change in

probability:Non-opioidAbuse,Gender,TobaccoUse,Self-Injury,Depression,Alcohol,

Abuse,andHepatitisC.Thet-testwasalsosignificantonthe followingcontinuous

variables: age, outpatient visits, emergency department visits, hospitalizations,

cumulativehospitalizationdays.

Table3.5Measureofassociationandlikelihoodratioforcategoricalvariables

Variable X2 p-valueLikelihood

ratio p-value

Approximatechangein

probabilityNon-opioidAbuse 55.457 <0.001 20.083 <0.001 >45%Gender 11.23 0.001 10.849 0.001 >45%TobaccoUse 10.881 0.001 8.41 0.004 >40%Self-Injury 30.971 <0.001 7.947 0.005 >35%Depression 6.186 0.013 4.994 0.025 >25%AlcoholAbuse 6.616 0.01 3.823 0.051 >20%HepatitsC 5.895 0.015 3.517 0.061 >20%

Table3.6ComparisonofcontinuousvariablesamongpositiveOUPusingtextminingvs.negativecohorts

VariableText-mining

positive MeanStd.

Deviation p-valueAge Yes 44.98 13.24 <0.001 No 59.02 16.34 OutpatientVisits Yes 12.86 12.4 0.001

No 9.15 10.48 EmergencyDepartmentvisits

Yes 3.17 4.91 <0.001No 0.94 1.86

Hospitalizations Yes 1.55 2.57 <0.001 No 0.54 1.138 CumulativeHospitalizationDays

Yes 7.42 12.01 <0.001No 3.21 17

Discussion

We identified 23,082 patientswho received LTOT during the study period, of

which 14,298 have eligible text mining of relevant electronic clinical notes that

yielded127positiveOUPcases,comparedto45casesusingICD-9codesforthesame

population.OurstudymethodologyfollowedthefootstepsofpreviousworkofCarrell

45

etal.(2015),Palmaretal.(2015),andHylanetal.(2015)[61,90,91].Thisoffersa

uniqueopportunitytocomparepopulationcharacteristicsandtextminingresultsto

identify opioid use problems in multi-healthcare settings. Below, we highlight

takeawayfindingsofourstudy.

IdentificationofOpioidUseProblems

WemeasuredtheprevalenceofopioiduseproblemsinIUHadultpatientsusing

textminingandICD-9codes.Inourstudy,validated-textminingidentified127(0.8%)

positive OUP cases out of 14,298 patients, while ICD-9 had identified 45 (0.3%)

positiveOUPcasesfromthesamepopulation(Totalcombined=164[1.1%]).Carrell

et al. (2015)–validated NLP have identified 1,875 (8.5%) patients out of 21,795

patients eligible for the study,while ICD-9had identified 2,240 (10.1%) from the

same population. We believe the difference of opioid use problem prevalence

betweenthe2studiescouldbeduetoseveralfactorssuchasthedifferencebetween

opioid problem rates in both populations or the study design. In our study, we

included5differentreportstypesthatcoversemergencydepartmentandprimary

carevisits,however,ourstudy,astheoppositefromCarrell’s,didnothaveaccessto

behavioral\mentalhealthreportsdueto42CFRpart2-confidentialityofsubstance

usedisorderpatientrecords[97].Thesereportscouldpotentiallyincludeopioiduse

problemrelateddataandaddingthemtoouranalysiscoulddecreasetheprevalence

disparitiesbetweenthe2studies.

Despitethedifferencesbetweenthe2combinedpercentages,bothfindings fall

within the estimated range reported in the literature. According to Ballantyne’s

commentarypublishedinthejournaljournalPAINin2015,publishedestimatesof

46

opioiduseproblemsrangedfrom<1%to50%[98].Ourfindingsareconsistentwith

otherstudiesestimatingopioidproblemsinthecontextoflong-termopioidtherapy.

Nobleetal. (2010)conductedastudytoassesssafety,efcacy,andeffectivenessof

opioids taken long-term for chronic non-cancer patients. The study reviewed 26

articlesandincludedonlypatientswhohadat least6monthsofopioidtreatment.

Accordingtothestudy,signsofopioidaddictionwerereportedinonly0.27%of4,893

studyparticipants[99].

FrequencyofICDCriteriatoIdentifyOUP

Ourstudyusedabroad-spectrumdefinitionofopioiduseproblemscomprisedof

2setsofICD-9codesforopioidabuseanddependence-aswellasopioidpoisoning.

This is different from Carrell’s study which only included those for opioid abuse

(305.5, 305.51, or 305.52), and opioid dependence (304, 304.01, 304.02, 304.7,

304.71,or304.72).Ourinclusionofopioidpoisingcodeshasdetectedanadditional

8distinctpatients.Thisequals17%of theoverallpositiveOUPdetectedby ICD-9

codesinourstudy.

Overall,ourstudypopulationgenerated49ICD-9events,representing45distinct

patients. The most commonly used ICD-9 codes were 304 [opioid dependence

unspecified] (49%), 305.5 [opioid abuse unspecified] (18%), and 304.1 [opioid

dependencecontinuous](10%).Thesefindingsaresimilartotheresultsreportedby

Palmeretal.(2015).Thestudyinvestigatedtheprevalenceofopioidproblemsamong

chronicopioidtherapypatientsintheGroupHealthCooperativefrom2006to2012.

ThemostcommonlyreportedICD-9codeswere304[opioiddependenceunspecified]

(55%), 304.1 [opioid dependence continuous] (26%), and 305.5 [opioid abuse

47

unspecified] (10%) [100]. Similar to the commonuse of unspecified ICD codes to

identifyOUP,apatternwasnoticedbyourreviewersduringamanualreviewprocess

ofpatients’clinicalnotes.Inthisstudy,wefoundunspecifiedcriteria[Physicianstates

opioidabuse/overuse/addictionor listedICDcodes foropioidabuse/dependence]

represented38%ofpositiveOUPinthetextminingapproach.Thesefindingssuggest

a common pattern in OUP identification between ICD-9 codes and a text-mining

approach.

FrequencyofText-MiningCriteriatoIdentifyOUP

Inourstudy,wehaveused17criteriatoidentifyOUPfromclinicalnotes.Themain

difference in these criteria compared to Carrells’ are the following: 1- we have

includedmarijuanaaspartofourlistofillicitdrugs2-Weaddedconcurrentuseof

authorizednarcoticstoincludespecificcasesofpolypharmacyofnarcoticabuse.We

added a criteria to cover cases where physician states narcotic abuse, overuse,

addictionorlistedICDcodesforopioidabuse/dependence.Inourstudy,useofan

illicitdrugcriteriacomprisedof18cases,ofwhich3casesindicatedactivemarijuana

use (within the last month) without mentioning other illicit drug abuse.

Polypharmacy of narcotic abuse flagged another 11 OUP cases. The third criteria

yielded48positiveOUPcases,ofwhich,16caseshaveoneormorementionsofICD-

9codediagnosisofabuseanddependencewithintheclinicalnotes.Outofthose16

patients,therewasonlyonepatientwhohadadocumentedICD-9foropioidabuse

anddependenceinthestructureddata.

48

Demographics

Inthisstudy,wefoundwomenwererepresentedmorethanmen,overall.Thisis

consistentwithCarrelletal.(2015)andHylanetal.(2015)which,bothhavefound

women to representabout two-thirdsof the chronicopioid therapypopulation in

theirstudies.Overall,higherwomen’srepresentationisconsistentwiththenational

estimatesofopioidprescriptionspergender.Accordingtothenationalstatisticsby

the CDC, there were 58% women vs. 42 % men who filled at least one opioid

prescription in2014[95].However,despitemorewomenthanmenseekingopioid

prescriptionsintheU.S.,menhadhigherratesofopioid-relateddisorders,suchas

opioid abuse or opioid overdose. In 2016, the opioid overdose deaths by gender

showedmenhad18.1deathsper100,000vs.8.5forwomenintheU.S.and16.7vs.

8.5,respectively, inIndiana[96].Ourstudyresultsareconsistentwiththisnotion;

whilepositiveOUPdistributionstratifiedbygendershowedahigherrateofwomen

withOUP,therateofoverallpositiveOUPindicatesahigherrateofpositiveOUPfor

men.

Mirroringnationwidetrendsofopioidprescriptionbyage,agedistributionofthe

overallstudycohortwasskewedtowardanolderage(>65),whichwasthehighest

representationinthestudycohort[97].FrequencyofpositiveOUPdistributionwas

thehighestinthegroupage45-54inboththeICDandtext-miningmethods.However,

therateofpositiveOUPpergroupageindicatesyoungeragegroupshadthehighest

OUP rate among age groups in ICD and text-mining cohorts. This should inform

physiciansinIndianathatdespitethehigherfrequencyofolderpatientswithopioid

49

problemsthattheymightencounter,younger-agepatientsaretheonesathigherrisk

foranopioidproblem.

ANoteAboutOpioidProblemDetectionUsingStructuredData

Despite the studywhichuseda combinationof2 typesof ICD-9 codes (opioid

abuseanddependenceandpoisoning),overall,ourtext-miningapproachidentified

morecases.Thetext-miningapproachalsodetected15patientswithamentionofan

ICD-9codeofopioidabuseanddependenceinpatients’clinicalnotes,butthesecodes

were undocumented in patients’ record as structured data. Thismight indicate a

significantlackofdocumentationtowardopioiduseproblemsandconfirmprevious

studies’ findings regarding the under-reporting of OUP using ICD-9. For future

researchpertainingtoidentifyingOUP,werecommendusingalternatemethods

(in addition to ICD codes), such as textmining andmachine learning, and further

exploringother commonlydocumentedstructureddata, suchasprocedural codes

andelectroniclabresults.

Limitations

Inthisstudy,weusedatext-miningapproachtoanalyzepatients’clinicalnotesto

identifyOUP.However, theanalysiswas limitedto5report types.Developingand

implementingNLPtechniquesgenerallyrequiresintensivecomputingandaninitial

commitmentofsubstantialresourcesandexpertise[101].Thus,itwasnotfinancially

feasiblewithintheallocatedbudgetforthisstudytotestallreporttypesgenerated

by the IUH study population. Rather, the authors relied on expert opinions to

meaningfully limit thenumber of report types.Reliance on expert opinionduring

algorithmdevelopmentiscommoninopioidproblemidentificationliterature.Canan

50

etal. (2017)reviewed15automatedalgorithmsto identifynonmedicalopioiduse

using electronic health record data. The study found that investigators explicitly

reliedonsubjectmatterexpertsduringtheprocessofalgorithmdevelopmentandto

identifycandidatevariables[101].

Anotherlimitationpertainingtostudyresultsisthat,despitetextminingcorrectly

identifying 127 positive cases out of 154 initial positive cases (PPV = 82%), the

overlapbetweenthetext-miningresultsandICD-9positiveresultswasonly8cases

(17%) (AppendixA.1). Thismay suggest that the textminingmethod hasmissed

some positive cases (false negative). To investigate this assumption, we have

reviewed500randomlyselectedreports thatwerenot flaggedbyour text-mining

criteria.Amongthe500reports,we identified10 falsenegativecases, indicatinga

negativepredictivevalueof98%onthereviewedsubset.

Finally, we used multiple behavioral concepts to define opioid use problems

(dependence, abuse, and addiction) and thus, distinguishing specific behavioral

conceptswasnotautomatedinthisresearch.Futureworkmayinvestigateusingtext

mining,naturallanguageprocessing,andmachinelearningtobettertargetspecific

behaviors.

Conclusion

Inthisstudy,wedevelopedatext-miningapproachtoidentifyOUPamonglong-

term opioid therapy patients at IUH. Compared to ICD-9 codes, text mining

successfully identified more OUP cases within the study population. Future

development of text-mining techniques can help identify cases undiscovered by

conventionalICD-9reportingmethods.

51

4. MACHINE-LEARNINGAPPROACHTOIDENTIFYOPIOIDUSEPROBLEMS

Introduction

In2012,opioidprescriptionsreachedtheirpeakwith81.3prescriptionsper100

people[98].Despitemorerecentreportsshowingthatthisnumberhasdecreasedto

66.5prescriptionsper100people, theopioidprescriptionrate in2015remains3

timeshigherthantheratesfrom1999[99],andmorethanaquarterofU.S.counties

arestillshowingdispensedopioidprescriptionnumbersequivalenttoorhigherthan

theirpopulations[98].Thedisparitiesoverprescribingopioidsacrossthenationhas

continuedthroughouttheyears.Infact,in2016,somecountieshadprescriptionrates

7timeshigherthanthenation’saverage[98–100].

Thelinkbetweenaveragedailyopioidprescriptionandopioidoverdosehasbeen

established[101].AccordingtoDunnetal.(2010),long-termopioidtherapypatients

receivinghigherdosesofopioidsareatahigherriskofdrugoverdose[102].Other

studies have found an association between heroin use and an increased rate of

nonmedicalopioiduseandmeetthecriteriaforopioidabuse[103].

Despitetheimportanceoftheopioidabuseissue,thecurrenthealthcaresystem

mainly relies on the International Classification of Disease (ICD) to report use

problems,whichunderestimatestheactualnumberofpatientsexhibitingthetarget

categorization[57,58].

Inrecentyears,severalstudieshavebeenconductedtoinvestigatethevalidityof

othermethods, such as natural language processing and textmining, tomeasure

opioidproblems, inaddition to ICDcodes.However, toourknowledge,nostudies

havebeenconductedtomeasurethepresenceofopioidproblemsamonglong-term

52

opioidusersviamachine-learningtechniques.Theaimofthischapteristoinvestigate

theapplicationofmachine-learningtechniquestoclassifyopioiduseproblems(OUP)

utilizing patients clinical notes. For this study, we define OUP as the presence of

opioidaberrantbehaviors,opioidmisuse,opioidabuseandopioidusedisorderinthe

patient’clinicalnotes.

Method

Machine-LearningProcessSummary

Toprepareformachine-learningprocess,2reviewershadreviewed700clinical

notesusingnDepth™todetermineopioiduseproblemcases.Reportcontents,along

with review results, were extracted by the Regenstrief Institute data core into

comma-separatedvalues(CSV)filesandPortabledocumentformat(PDF)files.The

reviewof700clinicalnotesresultedin145positivereportsforopioiduseproblems

(OUP) and 555 negative reportswith no indication of OUP in the text. To ensure

HIPAAcompliance,thefileswerestoredandprocessedontheKarstdesktop,anIU

remotesecuresupercomputer(Figure4.2).

Topreparedataformachine learning,2Javaprogramswerewrittentoextract

text from each of the PDF and CSV clinical notes files and they were saved into

separate TXT files. For the machine-learning process, Waikato Environment for

Knowledge Analysis (WEKA) was used to build and compare machine-learning

modelsonadocumentlevel.PerformancewascomparedbasedonF-measure,recall,

andprecisionfor5models.Thebestmodelperformancewasoptimizedandfinalized

into a standalone product to identify the OUP outcome among long-term opioid

53

therapypatientsatIUHbasedontheirmedicalrecords.Finally,themodelwastested

onasubsetofunseendata,andthemodelreportedtheresults(Figure4.2).

GoldStandardDevelopment(nDepth™)

Tocreateagoldstandardforthemachine-learningprocess,2trainedreviewers

manuallyreviewed700reportswithasemi-assistedmanualreviewprocessusing

nDepth™.TheRegenstrief Institute in IndianapolisdesignednDepth™asanatural

languageprocessingtoolwithwhichtoextractdata fromthe IndianaNetwork for

Patient Care (INPC). Furthermore, nDepth™ was programmed to highlight 27

suggestivewords in flagged reports (Table A.2).Whenmore than one report per

patientwasflagged,thesystemrandomlyselectedonetypeoftheflaggedreportsper

patienttobereviewed.ToestablishtheKappacoefficient,bothreviewersreviewed

200 identical reports. In case of the discrepancy, files were reevaluated and

consensus was reached. Cohen’s κwas run to determine if there was agreement

betweenthe2reviewersjudgmentregardingwhetherasubsetof200patientsinthe

studycohortmetanyOUPcriteria.Therewasmoderateagreementbetweenthe2

reviewers judgments:κ=0.691 (95%CI, 0.58 to0.79),p< .001.Ouroverall gold

standarddatasetconsistedof136positiveOUPreportsand480negativereports.The

criteriatodetermineopioiduseproblemsinclinicalnotesarelistedinTable3.1.

DataTransformation(Java)

The700manuallyreviewedclinicalnotes,whichconsistedof132PDFand568

CSVreports,wereuploadedintoKarst.KarstDesktopisaremotedesktopservicefor

users with accounts on the Karst research supercomputer [92]. As part of the

machinelearning processes, text from both CSV and PDF reports needed to be

54

transformed into TXT files for each report. Thus, a Java programwas written to

extractdatafromthePDFandCSVfiles(AppendixD.1andAppendixD.2).Asample

oftheextracteddatawasreviewedtovalidatetheextractionprocesstoconfirmdata

correctnessandcompletion.AsafinalsteptopreparethedataforWEKA,textwas

convertedintoAttribute-RelationFileFormat(ARFF).ARFFis“anASCIItextfilesthat

describesalistofinstancessharingasetofattributes.ARFFfilesweredevelopedby

themachine-learningprojectatthedepartmentofcomputerscienceoftheUniversity

ofWaikatoforusewiththeWEKA”[103].

Analysis

WEKAConfigurationandModelComparison

Intheprocessofmodelbuilding,trainingthemodelonalargersamplesizeusually

produceshighermodelperformancecompared to training themodelonasmaller

sample size [104]. However, the model-building process also considers class

representation balance in the training process, [105,106] which can lead to the

inabilitytoclassifytheunbalanceddataduetolackofpriorinformationinthemodel

training [106,107]. Class imbalance is characterized as there being “many more

instancesof someclasses thanothers” [107,108].The fundamental challengewith

trainingamodelonimbalanceddataistheabilityofimbalanceddatato“significantly

compromisetheperformanceofthemoststandardlearningalgorithms”[109].

In previous years, many solutions have been introduced to deal with the

imbalanceddataproblemandcanbesummarizedas follows:1-Datasampling: in

which themodel is trained on amore balanced distributed dataset [107,110]; 2-

Algorithmicmodification:inwhichadoptingbase-learningalgorithmsneeds“tobe

55

more attuned to class imbalanced” [107,111]; 3- Cost-sensitive learning: this

approach considersmodification on data level, algorithm level, or both, in which

consideringmisclassificationofonedataclass(e.g.,positiveclass)costishigherthan

anotherdataclass(e.g.,negativeclass) thus,algorithmperformance isadjusted to

minimizethehighercosterror(misclassifypositiveclass)[107,112].

Ourgoldstandarddatasethasaboutan80%negativeanda20%positiveclass

imbalance.BalancingthedataintheWEKAtrainingmodelon50/50or40/60class

representation is recommended [113]. However, adopting this approach might

jeopardize the real-world application of the model, given the difference in prior

probabilitybetweenthetrainingenvironmentandreal-worldapplication[107,114].

Hence, for this chapter,we decided to use an ensemble approach,where firstwe

exploremultiplealgorithmsandvariousclassificationconfigurationsandcompare

their performance on a balanced subset of data that has similar representation

betweenpositiveandnegativereports.Second,oncewedeterminethebestWEKA

configurationwithwhich to classify the 2 classes,wewill optimize themodel by

introducing560(80%)of thegoldstandardreportstoenhancetheaspectofreal-

worldapplicationofthemodel(becauserealworlddataisnotbalanced)[111].To

standardizedtestingenvironment,cross-validation(k=10)willbeusedacrossallof

themodels.

Finally,theoptimizedmodelwillbetestedonaunseendatasubsetconsistingof

140 (20%) clinical notes. Our intention in the optimization step is to maximize

positive report detection (less false negatives),whilemaintainingminimal loss of

precision. The rationale behind the ensemble approach is that classifiers are

56

commonlydesignedtominimizeerrorwhentheymakepredictionsaboutnewdata.

However, this assumption is proper when the cost of different errors is equally

important [112,115]. In this research,we sought toprioritize theminimizationof

false negatives (by capturing more true positives, while risking an increase in

capturing false positives). Accuracy measures, such as F-measures, recalls, and

precision,willbereportedforeachstep.ThehighlightsoftheWEKAconfiguration

(Table4.1)andthecompletemodelschema(Figure4.1)forbestmodelperformance

will be reported. The complete model training steps of WEKA are reported in

AppendixC.1toAppendixC.19.

Figure4.1WEKAmodelconfigurations

Table4.1HighlightofWEKAconfigurationsOption ChosenFeatures CommentClassifier FilteredClassifer Cost-sensitiveclassifierwasaddedto

optimizethefinalmodel.Algorithm 1-LinearAlgorithm:Simple

Logistic2-Non-LinearAlgorithm:SMO/J48/NaiveBayes3-EnsembleAlgorithm:RandomForest

ForSimpleLogistic:batchsize:100,heuristicstop:50,maxboostingiteration:500,numberofboostingiteration:0numberofdecimalplaces:2

Filter StringToWordVector: IDFTransform:True,TFTransform:True,LowerCaseToken:True.Wordstokeep:1000

StopWordsHandler MultiStopWords Tokenizer NGramTokenizer Max3,Min1

ModelFinalization

Themodelwithbestaccuracymeasureperformancewillbesavedasstandalone

MODEL files. The MODEL extension is a common files type for digital product

“Scheme:weka.classifiers.meta.FilteredClassifier-F"weka.filters.unsupervised.attribute.StringToWordVector-Rfirst-last-W1000-prune-rate-1.0-T-I-N0-L-stemmerweka.core.stemmers.IteratedLovinsStemmer-stopwords-handlerweka.core.stopwords.MultiStopwords-M1-tokenizer\"weka.core.tokenizers.NGramTokenizer-max3-min1-delimiters\\\"\\\\r\\\\n\\\\t.,;:\\\\\\\'\\\\\\\"()?!\\\"\""-S1-Wweka.classifiers.functions.SimpleLogistic---I0-M500-H50-W0.0”

57

definitionsandsimulations[116].ThefilescanbeloadedbyWEKAontoanyPCor

Mackintoshenvironmentandusedtoclassifycasesofopioiduseproblemsinunseen

reportsfromIUH.

Results

ModelComparison

We usedWEKA to build and test files among differentmodels to classify IUH

clinicalnotesintopositiveandnegativeforopioiduseproblemsusingabalancedclass

dataset. The training set consisted of 136 positive reports and similar negative

reports,whichwereuploaded toWEKA.The following algorithmswereused: J48

(defaultWEKAalgorithm),simplelogistic,naivebayes,randomforest,andsequential

minimal optimization (SMO). All models were tested based on a 10-fold

crossvalidation. Our model comparison has shown the Simple Logistic algorithm

showed better performance than other models for predicting positive class

(Precision: 91.9%; Recall: 91.2%; F-measure: 91.5%; ROC: 92%) (Table 4.2). The

detailedaccuracymeasureperformancesforsimplelogisticarelistedinTable4.3.

ModelOptimization

Toenhancemodelreal-worldapplication,simplelogisticmodelconfigurationwas

optimizedon80%ofthegoldstandarddataset,whichwascomprisedof560reports

(136positiveand424negative).Compared to thebalancedataset, themodelhad

initially underperformed when larger portions of the negative dataset were

introduced(Table4.4).Themodelhasclassified37(27%)out136positivecasesas

falsenegativecases.

58

Table4.2Comparisonofmultiplemodels’performancestoidentifypositiveOUPAccuracymeasurestoclassifypositivereportsAlgorithm Precision Recall F-MeasureSimpleLogistic 0.919 0.912 0.915J48 0.847 0.853 0.85RandomForest 0.756 0.934 0.836SMO 0.797 0.838 0.817NaiveBayes 0.821 0.64 0.719

Table4.3DetailedaccuracyperformancemeasureforsimplelogisticalgorithmtoclassifyOUP

Class TPRate FPRate Precision Recall F-Measure ROCAreaPositive 0.921 0.079 0.919 0.912 0.915 0.931

Negative 0.921 0.088 0.915 0.921 0.918 WeightedAvg. 0.917 0.083 0.917 0.917 0.917 Table4.4Initialsimplelogisticmodelperformanceon80%ofthegoldstandarddataClass TPRate FPRate Precision Recall F-Measure ROCAreaPositive 0.728 0.052 0.818 0.728 0.77 0.94

Negative 0.948 0.272 0.916 0.948 0.932 WeightedAvg. 0.895 0.219 0.892 0.895 0.892 Table4.5Optimizedsimplelogisticmodelperformanceon80%ofthegoldstandarddataClass TPRate FPRate Precision Recall F-Measure ROCAreaPositive 0.897 0.094 0.753 0.897 0.819 0.890Negative 0.906 0.103 0.965 0.906 0.934

WeightedAvg. 0.904 0.101 0.913 0.904 0.906

Table4.6Simplelogisticconfusionmatrixcomparisonbeforeandafteroptimization

ConfusionMatrix(N=560:136Positive,424Negative)Beforeoptimization AfterOptimization

Negative Positive Negative Positive402 22 384 4037 99 14 122

Table4.7Simplelogisticmodelperformancetestedon20%ofunseengoldstandarddataClass TPRate FPRate Precision Recall F-Measure ROCAreaPositive 0.889 0.030 0.667 0.889 0.762 0.924Negative 0.970 0.111 0.992 0.970 0.981

WeightedAvg. 0.965 0.106 0.971 0.965 0.967

59

Figure4.2Summaryofchapter4methodandresults

To account for this effect, a cost-sensitive classifying technique was adopted.

Costsensitive classifying is one method to fine-tune model performance through

increasingthepenaltyofmisclassifyingoneormoreclassesintheconfusionmatrix

60

(AppendixC.20toAppendixC.28).Adjustingthemodelcostsensitivelymatrixto2:1

has doubled the penalty of misclassifying positive cases as false positives, which

resultedinadecreaseinfalsepositivecases(Table4.5),droppingthetotaltoonly14

(10%)casesoutof136.Acompleteconfusionmatrixforbothmodels’outcomesis

listedinTable4.6.

ModelTesting

Thefinalizedmodelwastestedon140reports(20%ofourgoldstandarddata).

Thesedatawerenotusedinitiallyintheprocessoftrainingandtestingthemodel.On

theunseendata,themodelhascorrectlyclassified136(96.4%)reportswith

88.9% sensitivity to identify positive cases. The accuracy measures to classify

positive.reportswerethefollowing:Precision:0.667,Recall:0.889,andF-Measure:

0.762.ThecompleteperformancemeasuresarelistedinTable4.7

Discussion

Inthischapter,webuiltamachine-learningmodeltoidentifyopioiduseproblems

basedonIndianaUniversityHealthclinicalnotes.Thepositiveandnegativereports,

which had been extracted andmanually reviewed using nDepth™ in the previous

chapter,werepre-processedanduploadedintoWEKA,whichwasusedtocreatea

modeltoidentifyopioiduseproblems.Inthissection,wehighlightsomeofthekey

pointsofthiswork.

Algorithms’Performance

Wetested5differentalgorithmstoclassifyopioiduseproblemsonadocument

level (Appendix B.1 to Appendix B.5). simple logistic regression showed overall

higheraccuracyperformancemeasures inclassifyingpositiveopioiduseproblems

61

on a balanced dataset (Table 4.2). We found this interesting, given the growing

number of published articles discussing the superiority of random forest and/or

support vector machines over logistic regressions [117,118]. However, a recent

article by Pranckeviˇcius et al. (2017) compared Naive Bayes, Random Forest,

DecisionTree,SupportVectorMachineandlogisticregressionclassifierstoclassify

textreviews(multi-classtextualdata).Thestudyfoundthatlogisticregressionhas

achieved thehighestclassificationaccuracycomparedwithotherclassifiers [119].

This can be explained by the signal-to-noise ratio. One opinion article we found

discussedtheadvantagesoflogisticregressionsovertreedecisionmodels;thearticle

foundthat logisticregressionsperformbetterwhenthesignal-tonoiseratio is low

[120].

Giventhelargetextdata(noise)comparedtothesignal(phrasesorstatements

indicatinganopioiduseproblems),wedecidedtoinvestigatethishypothesisandsee

if logistic regression will still perform better if we limit our model training to

sentencesandphrases.Thus,wehavemanuallyreviewedallofthepositivereports

and300ofthenegativereportstoannotatestatementsandphrasesthatindicateor

negatethepresenceofopioiduseproblemsusingtheExtensibleHumanOracleSuite

ofTools(eHOST).Thisprocesshasresultedin255sentencesandphrasesindicating

thepresenceofanopioiduseproblems.However,itonlyyielded55sentencesand

phrases,negatingthepresenceofanopioiduseproblems.Finally,weappliedsimilar

settings to compare the following 4 classifiers: logistic regression, random forest,

NaiveBayesandSMO.OurresultsshowedthatrandomforestandSMOscoredslightly

higherthanlogisticregressionandNaiveBayes(AppendixB.6).

62

Cost-SensitiveClassifier

Inthisstudy,wefoundthatcost-sensitiveclassifierisaneffectivemethodtouse

to adjust for the data imbalance. Adjusting the cost matrix has increased recall

performancewhileloweringtheprecisionofpredictingpositivecasesinthemodel

byincreasing(doubling)thepenaltyformakingafalsenegativeerror.Ourrationale

forthisapproacharisesfromtheintentiontominimizemissingtrue-positivecases.

WebelievethatidentifyingmorepositiveOUPcaseswillallowfutureresearchersto

better understand why this sector of positive OUP patients (according to their

reports)wasnotdocumentedusingICDasanopioiduseproblemscase.Furthermore,

duetotheimportanceofopioidissuesfromaclinicalperspective,andgiventhesmall

representationofopioiduseproblemscasesintheIUHsystem(about1%),building

ahigh-sensitivitymodeltodetectpositiveOUPinlong-termopioidtherapypatients

based on their clinical notesmay seem like a practical approach, given the small

percentage of positive OUP cases. Such a pragmatic approach might not be

uncommon in screening tests;mammogramsare a commonexampleof screening

testswheresensitivityishigherthanspecificity[121,122].

Limitation

Ourgoldstandarddataweredeterminedbymanuallyreviewingthetext-mining

algorithmtoidentifyopioiduseproblems.Anypositivereportsubclassthatwasnot

captured by the text-mining algorithmmay not be represented in ourmodel. To

accountforthislimitation,wehaveintentionallyaddedallfalsepositivereportsas

partofourtrainingmodelandlabeledthemasnegativereports.Furthermore,500

63

randomnegativereports(fromthetext-miningalgorithm)weremanuallyreviewed,

andfalsenegativereportswereaddedtothetrainingsetaspositivecases.

Conclusion

Inthischapter,weadoptedanensembleapproachtobuildamachine-learning

model to identifyopioiduseproblemsbasedon IndianaUniversityHealth clinical

notes.Ourfinalmodelwastestedonunseendataandreportedasensitivityof88%

whenidentifyingpositivecases.Weconcludedthatamachine-learningapproachmay

beusedtoidentifyopioiduseproblemsinIndiana.

64

5. SUMMARY

SummaryOverview

In this thesis, we aimed to build a classification model to identify opioid use

problems using text-mining and machine-learning approaches. Additionally, we

examined several opioid use problems classification tools/models and identified

theirstrengthsandweaknesses.Generally,thesetoolscanbecategorizedintothree

classes, depending on the data populationmethod as follows: 1- Self-report tools

(Firstgeneration),2-Structureddatamodels(Secondgeneration),3-Clinicalnotes

models(Thirdgeneration).Belowwesynthesizethefindingsofthisthesisinlightof

thisclassification.

Self-reportTools(FirstGeneration)

Thesearerisk-assessmenttoolsthatpredicttheriskofcurrentorfutureopioid

useproblemsbasedonpredictivevariablesextractedfromdataself-reportedbythe

patient or physician. Common examples of self-report tools are SOAPP, ORT, and

COMM. Themain advantage of these tools is their independence from electronic

healthrecorddataandcanbeadministeredtothepatientorfilledoutbythetreating

physician. However, having the majority of these tools solely rely on patient

completionmighthavelimitedtheirabilitytopredictaberrantdrug-takingbehavior

duetoreportingbiasora lackofpatientcomprehension.Other factors thatmight

hinderusingself-reporttoolsaretheneedfordocumentationfromthemedicalstaff

and the largevariation in its sensitivity acrossdifferentpopulationsanddifferent

outcomedefinitions.

65

StructuredDataModels(SecondGeneration)

Thesearerisk-assessmentmodelsthatpredicttheriskofcurrentorfutureopioid

use problems based on predictive variables extracted from structured data of

electronichealthrecords.CommonexamplesofthesemodelsaretheworkofEdlund

etal.(2007),Whiteetal.(2009),Riceetal.(2012),Turneretal.(2014),andothers.

Thesemodelsdonotrelyonpatientreporteddata(directly),rather,itdependson

the structured data available in the electronic health records. However, this

dependenceonelectronichealthrecordsstructureddatamakesthesemodelsasgood

asthequalityandrobustnatureofcaptureddatastoredintheelectronicrecord.For

example,thenumberofprescribersandthenumberofdispensershavebeenshown

tobeariskindicatorforopioiduseproblems.However,includingthesedataintoa

classifyingorpredictivemodelmayrequirepre-worktoconsolidatethesedatafrom

multiple databases (such as Prescription Monitoring Programs). Thus, it was no

surprisetoseeAgeandGenderastheonlycommonvariablesacrossthe9modelswe

found using structured data. However, with the constant growth of the Health

InformationExchangeinsizeandnumber,weexpecttoseemorerobustmodelsin

the future that identify and utilize new variables to better understand opioid use

problems.

ClinicalNotesModels(ThirdGeneration)

NLPandTextMining

These are risk assessment models that identify the presence of opioid use

problems through classifying patients’ clinical notes using natural language

processingandtext-miningapproaches. Incomparisontothe2previousmethods,

66

NLP and text mining are more complex techniques that require intensive

computationalandlaborskills.Thus, inourreview,onlyahandfulofstudieswere

foundusingthesetechniquestoidentifyopioiduseproblems.Inthisthesis,weused

atext-miningapproachtoidentifyopioiduseproblemsusingpatient’sclinicalnotes.

Belowwehighlightthefindingsofourresearchwithinthecontextofsimilarresearch

intheliterature.

MethodsandVariables

Hylanetal.(2015),Carrelletal.(2015),andthesubsequentworkofPalmeretal.

(2015)area fewmodels foundthatuseNLPto identifyopioiduseproblemsfrom

patients’ clinical notes. Hylan et al. (2015), a two-year prospective study, used 3

phasesofNLPtoidentifyclinicalnoteswithindicationsof1-opioidoveruse,misuse,

abuse,oraddiction2-opioiddependence,and3-“mentions”ofICD-9codes(12ICD-

9codes)foropioidabuseordependenceintheclinicalnotes.Thestudyhasidentified

158(5.7%)patientswithopioiduseproblems,whileICD-9forthesamecohorthas

identified25(0.9%).

Our study used dictionary-based NLP which was adopted from Carrell et al.

(2015)for2reasons.First,astheoppositeofCarrellstudy’s,Hylanetal.(2015)did

not specify the technical details of theirNLP process. Second, theNLPmethod of

Carrellstudy’shaveresultedindetectingalargerpercentageofopioiduseproblems.

Third,Hylanetal.(2015)conductedaprospectivestudyanditwaslimitedtoprimary

caresettings,whileCarrelletal.(2015)conductedaretrospectivestudyandincluded

primarycareandemergencydepartmentnotes.Tonote,bothstudiesdrewsamples

fromtheGroupHealthpopulation.Themajordifferencewasinthestudyperiod(2

67

years vs. 5 years), study design (prospective vs. retrospective), and study sample

(primary care vs. primary care and emergency). In our study, the validated-text

miningidentified127(0.8%)positiveOUPcasesoutof14,298patientswhileICD-9

has identified 45 (0.3%) positive OUP cases from the same population (total

combined=164(1.1%)).

InadditiontotheNLPclassification,bothstudieshaveincludedancillaryanalysis

of study cohort to estimate risk.Hylan studiedprediction ability of 7 variables to

identifyapositiveNLPcohort.Thesevariablesweredocumentedforthe2yearsprior

toinitiationofchronicopioidtherapy.Thesevariableswere:Age(15to44or45to

64),smokingstatus,and2-yearpriorICDcodesforthefollowingdiagnoses:opioid

abuse/dependence, drug abuse/dependence (excluding opioids), alcohol

abuse/dependence,mental health disorder, and hepatitis C. The study found that

usingmultivariatelogisticregression,variablesindicatingage18to44,opioidabuse

diagnosis,positivesmokingstatus,andmentaldisorderdiagnosesweresignificant

predictors during the measured period. This study was limited to primary care

patientsofanintegratedgrouppractice,salariedphysiciansworkinginGroupHealth

Cooperative(GHC)clinics.

Inourstudy,validated-textminingidentified127(0.8%)positiveOUPcasesout

of14,298patients,whileICD-9identified45(0.3%)positiveOUPcasesfromthesame

population (total combined = 164 [1.1%]). We compared demographic

characteristicsandvariablesof interestamong2positivegroups(textminingand

ICD)usingChisquareandindependentt-test.WehavealsocomparedpositiveOUP

usingtextminingvs.negativeOUPforthesamevariablesandreportedmultivariate

68

oddsratio.Ourvariablesofinterestincludednon-opioidabuse,gender,tobaccoUse,

self-Injury,depression,alcoholabuse,hepatitisC,age,outpatientvisits,emergency

department visits, hospitalizations, and cumulative hospitalization days. These

variableswerechosensubjectivelybasedonoursystematicreviewofpreviouswork

and the feasibility of obtaining these variables from IUH and INPC databases.

Multivariateanalysislogisticregressionindicatesthatonlygender,non-opioidabuse,

selfinjury,outpatientvisits,emergencydepartmentvisitsandinpatientvisitswere

significantpredictorstopositivetext-miningoutcome(Table5.1).

Table5.1Multivariatelogisticregressionoftext-miningmeasureofopioiduseproblems

Variables OddsRatio Lower* Upper*Depression 1.166 0.612 2.225AgeContinuous 0.915 0.868 0.965AgeGroup 1.465 0.854 2.512Sex 1.62 1.124 2.335Race 0.928 0.591 1.456Ethnicity 1.202 0.931 1.553AlcoholAbuse 1.869 0.536 6.519OtherMedicalAbuse 3.583 1.409 9.116TobaccoUse 0.919 0.473 1.784SelfInjury 7.884 1.35 46.028HepatitisC 1.38 0.348 5.468Outpatientvisits 1.016 1.003 1.03EDvisits 1.093 1.053 1.135InpatientVisits 1.34 1.138 1.579HospitalDays 0.983 0.958 1.009*Oddsratiovaluebasedon95%confidenceinterval

Collectively, in addition to younger age, prior opioid diagnosis and mental

disorder,ourstudyfindingsindicatenon-opioidabuse,selfinjury,outpatientvisits,

EDvisitsandinpatientvisitsaresignificantvariablestopredictopioiduseproblems

inpositiveNLPandtext-miningpositivecohort.Ourreportedbi-variatemeasureof

association has also found alcohol abuse, hepatitis C, tobacco use, depression are

69

significantassociatedwiththemeasuredoutcomes.Onecommonlimitationofboth

studieswasthesmallnumberofpositiveoutcome(127outof14,298and158outof

2,752),whichaffectedthemodelsignificancetopredicttheoutcome.Thiscouldbe

Duethe limitations intheprocessoftextmining, itself,andinabilityofdictionary-

based method to detect indication of opioid use problems in indirect or long

statements.Suchstatementsmightbedeemedfalsepositive.

PrevalenceofOpioidUseProblems

WemeasuredtheprevelenceofopioiduseproblemsinIUHadultpatientsusing

textminingandICD-9codes.Inourstudy,validated-textminingidentified127

(0.8%) positive OUP cases out of 14,298 patients, while ICD-9 has Identified 45

(0.3%)positiveOUPcasesfromthesamepopulation(totalcombined=164[1.1%]).

Carrelletal(2015)Validated-NLPmethodhasidentified1,875(8.5%)patientsoutof

21,795patientseligibleforthestudy,whileICD-9hasidentified2,240(10.1%)from

thesamepopulation.thedisparitiesbetweenthe2resultswasvastinboththeICD

and text mining/NLP approaches despite both studies using the same eligibility

criteriaoflong-termopioidtherapy(orchronicopioidtherapy).Thisdifferencecould

beattributedtooneormoreofthefollowingfactors:

First,despitebothstudiesusingthesameinclusioncriteria,ourstudyexcluded

patientswithschizophreniaduetodocumentedhigherprevalenceofOUPamongthis

population. However, this factor would have minimal effect, since only 86

schizophrenicpatients (whowerealreadyexcludeddue to thepresenceof cancer

diagnosis)wereadditionallyexcludedfromthestudy.

70

Second,thisdifferencecouldbeattributedtothetimeandstudyperiodofthe

2 studies.While the longest available follow-up period in our studywas 2 years,

Carrell’sstudyassessedpatientsovera5-yearperiodfrom2006-2012.Thiscould

indicatethatthelengthoftimeforopioidconsumptionwouldleadtoanincreasein

prevalenceofopioidproblem.Onecritiquetothisassumptionisthatneitherstudy

measuredthedirecteffectofthelengthofopioidtherapyondevelopmentofthestudy

outcomes.

Figure5.1Opioidoverdosedeathratesper100,000populationfrom2006-2014.

Source:KaiserFamilyFoundation[123]

Third, this couldbe attributed to thedifferencebetweenopioidproblem rates

between the statesofWashingtonand Indianaduring the studiedperiod forboth

studies.Nationalstatisticsofopioidoverdosedeathrates(per100,000)showsthat

therateofopioidoverdosedeathsinWashingtonstaterangedfrom10.2to9.2during

Carrell’s studyperiod (2006-2012),while the samerate ranged from5.7 to7.3 in

Indianaduringourstudyperiod(2013-2014)(Figure5.1).

71

Finally, in the oppositition of Carrell’s study, our study did not have access to

behavioral\mentalhealthreportsdueto42CFRPart2-confidentialityofsubstance

usedisorderpatientrecordsregulations[97].Webelievethatincludingbehavioral

healthreportsislikelytoidentifymorepositivecasesofopioiduseproblemsinour

studypopulation.

Collectively,both studiesusedNLP to identifyopioiduseproblems inprimary

careandemergencysettingsandcomparedtheresultswithICD-9codes.TheNLPtext

mining technique identified additional positive cases in both studies. Text-mining

approachalsodetected15patientswithamentionofICD-9codeofopioidabuseand

dependence in patients’ clinical notes but these codes were undocumented in

patients’ record as structured data. This might indicate a significant lack of

documentation for opioid use problems and confirm previous studies’ findings

regarding the underreporting of opioid problems using ICD codes. For future

researchpertainingtoidentifyingOUP,werecommendusingalternatemethods(in

addition to ICD codes), such as text mining and machine learning, and further

exploringother commonlydocumented structureddata, suchasprocedural codes

andelectroniclabresults.

MachineLearning

Theseriskassessmenttechniquesutilizeasetofstatisticaltechniquestoidentify

partsofspeech,entities,sentiment,andotheraspectsoftext.Thechosentechnique

canbeexpressedinmodelsthatcanbeappliedtoanothertext[124].Thesemodels

canbeusedtoidentifyaparticularoutcomeintheclinicalnotes.Thesemodelsare

more flexible; to improve performance by training themodel on larger andmore

72

representative training sets and by choosing a suitable classifying algorithm. In

general,thelearningprocessofmachinelearningcanbecategorizedintosupervised

and unsupervised learning. Supervised learning means training documents are

taggedwithaspecificclass(e.g.,positive,negative),whileunsupervised,canextract

meaningwithoutaspecifictrainingset.Acommonexampleofunsupervisedtraining

isdataclustering.

MachineLearninginClassificationofDisease

Inourreview,wedidnotfindanypublishedworkpertainingtomachine-learning

models that classify clinical notes to identify opioid-addiction related behavior.

However,wefoundseveralstudieswhichhaveusedmachine-learningalgorithmsto

classifyothermedicaleventssuchaschronicdiseasehospitalization,orparticular

medicalconditionsuchasheartfailurestage,cancerdiagnosisoracutepancreatitis

[125–128].

Despitethedifferencesintheoutcomeofinterest,thegeneralthemeofbuildinga

machine-learningmodelissimilar.First,identifytheoutcomeofinterestandsource

ofclinicalnotes(e.g.heartfailurediagnosisintheemergencydepartmentdischarge

note). Second, identify training corpus (gold standard) using NLP/text-mining

approach (e.g. rule-based). Third, buildmachine-learningmodel and evaluate the

performance(byreportingF-measure,recallandprecision).

Machine-LearningtoIdentifyOpioidUseProblems

In this thesis, similar stepswere adopted tobuildmachine-learningpredictive

models to identify opioid use problems among long-term therapy patients at IUH

usingclinicalnotes.Webuiltonourpreviouseffortintextminingtoinvestigatethe

73

applicationofmachine-learningtechniquestoclassifyopioiduseproblemsutilizing

patientsclinicalnotes.Wemanuallyreviewedalistofpositiveandnegativeclinical

notes that resulted from the text-mining study, whichwe then used to build our

machine-learning model. We tested 5 different algorithms to classify opioid use

problemsat thedocument level. SimpleLogistic regressionshowedoverallhigher

accuracy performance measures in classifying positive opioid use problems on

balanceddataset.Ourfinalmodelwastestedonunseendataandreportedarecallof

88%andprecisionof66%(F-measure=76%)whenidentifyingpositivecases.

Despite our findings the results of identifying opioid use problems are

encouraging, our outcome of interest (OUP), comprised of multiple behavioral

concepts,suchasdependence,abuse,addiction,andoverdose.Whiledistinguishing

specificbehavioralconceptswasnotautomated in this research, futureworkmay

investigatetheuseoftextmining,naturallanguageprocessing,andmachinelearning

to better target specific behaviors before considering using these for surveillance

purposesorincorporatingthemintoaclinicaldecisionsystemforprimarycareand

emergencyencounters.

Limitations

Inthisthesis,weusedatext-miningapproachtoanalyzepatientsclinicalnotesto

identifyOUP.Due to labor constraints,ouranalysiswas limited to5 report types,

despite our effort to rely on expert opinions tomeaningfully limit the number of

reporttypes.Thisapproachmighthavesystematicallyoverlookedacertaingroupof

positive OUP patients. Since our gold standard data wasmade throughmanually

reviewing a text-mining algorithm to identify opioid use problems, any positive-

74

report subclass that was not captured by the text-mining algorithm may not be

representedinourmodel.Toaccountforthislimitation,wehaveintentionallyadded

all falsepositive reports as part of our trainingmodel labeled asnegative reports.

Further, a randomsample of 500negative reports (by the text-mining algorithm)

weremanuallyreviewedandfalse-negativereportswereaddedtointothetraining

setaspositivecases.

Conclusion

Thisstudy’smaincontributionwastodemonstrate thevalidityofusinga text-

miningapproachandmachine-learningtechniquestoidentifyopioiduseproblems

frompatients’clinicalnotesinmulti-sitemedicalcarefacilities.Ourresearchhasalso

highlightedthemaindemographiccharacteristics,careutilization,andco-morbidity

similarities and differences among opioid use problems identified by textmining

versustheopioiduseproblemsidentifiedbyICDcodes.Additionally,thisresearch

hascomparedmultiplemachine-learningalgorithms’performancetoidentifyopioid

use problems frompatients’ clinical notes and has reported their performance to

informfutureresearch.

FutureDirectionandRecommendations

Basedontheresultsfromthisthesis,futureresearchmayfocuson:

1. Furtheranalyzingmachine-learningcapabilitytoidentifyopioiduseproblems

from clinical notes.We recommend to buildmachine-learningmodel that is

basedonlargervalidatedtrainingcorpuswhichcomprisedofmultipleyearsof

patient’clinicalnotes.

75

2. Comparingtext-miningapproachagainstICD-10tounderstandfirst,whether

ICD-10documentationtoOUPhaschangedfromICD-9andsecond,tofurther

studythepattern,ifany,fortheOUPcasesmissedbyICDcodes.

3. ExploringtheideaofadoptingahybridapproachthatcombinesICDcodesin

addition to patients clinical notes (text mining or machine learning).

Incorporating such approach may synergize the ability of future models to

identifytheopioidissue.

4. Focusingonmeasuringtherelationshipbetweenopioiduseproblemsandthe

lengthofopioidtreatmentinlong-termopioidtherapypatients.

5. Investigatingtheuseoftextmining,naturallanguageprocessingandmachine

learning to better target specific opioid use problems behaviors (such as

overdose vs. abuse) before incorporating these techniques into surveillance

purposes or adopting it into a clinical decision system for primary care and

emergencyencounters.

76

6. APPENDICES

CHAPTER3SUPPLEMENTS

AppendixA.1VenndiagramoftheoverlapbetweenthetwopositivecohortsofopioidUseprobleminthestudy

77

Appendix A.2 Terms used by nDepth™ algorithm to identify (flag) OUP positivereports

Opioidterms Problemuseterms

codeine analgesic addict abusing aberrancy

codeine narcotic addicted misusing aberrant

codeine opiate addiction overusing daiversion

fentanyl opioid overusing divert

hydrocodone abuse

methadone abuser

morphine misuse

oxy misuser

oxy-ir overuse

oxycod overuser

oxycodone overuse

oxyContinpercocetpolypharmacyvicoVicodin

overuser

AppendixA.3Highlightedphrasesusedinthesemi-assistedmanuallyreviewprocess

Semiassistedmanualreviewhighlightedwords

abus* opioidusedisorder diversion/divert

misus* dependent high

overus* dependency euphoria

overus* heroin abnormalurinedrugscreen

Addict* Crav* dirtyurine

inconsistenturine Lostprescriptions alcoholabus*

unauthorizeddoseincrease

Lesssuggestivewords earlyrefills

takingmorethanprescribed

marijuana immediatetaper

Narcan narcotic naloxone

78

AppendixA.4ICD-9codesusedtodefineOUP

ClassCodes

DiagnosisCategory ICD-9-CMCodes

OpioidAbuseanddependence

1. 304.00(opioiddependenceunspecified)2. 304.01(opioiddependencecontinuous)3. 305.50(opioidabuseunspecified)4. 305.51(opioidabusecontinuous)5. 304.71(opioid/otherdependencecontinuous)6. 304.02(opioiddependenceepisodic)7. 304.70(opioid/otherdependenceunspecified)8. 305.52(opioidabuseepisodic)9. 304.72(opioid/otherdependenceepisodic)

Poisoning 10. Non-specificcode965Poisoningbyanalgesicsantipyreticsandantirheumatics

11. Non-specificcode965.0Poisoningbyopiatesandrelatednarcotics12. Specificcode965.00Poisoningbyopium(alkaloids),unspecified13. Specificcode965.01Poisoningbyheroin14. Specificcode965.02Poisoningbymethadone15. Specificcode965.09Poisoningbyotheropiatesandrelated

narcotics16. E850.0Accidentalpoisoningbyheroin17. E850.1Accidentalpoisoningbymethadone18. E850.2Accidentalpoisoningbyotheropiatesandrelatednarcotics

AppendixA.5FrequencydistributionofpositiveICD-9codesforOUP

FrequencydistributionofpositiveICD-9codesforOUP

ICD-9Codes Description N(%)

304 opioiddependenceunspecified 24(49%)

305.5 opioidabuseunspecified 9(18%)

304.01 opioiddependencecontinuous 5(10%)

965 Non-specificcode965.0Poisoningbyopiatesandrelatednarcotics

3(6%)

965.01 Specificcode965.01Poisoningbyheroin 3(6%)

E850.2 E850.2Accidentalpoisoningbyotheropiatesandrelatednarcotics

2(4%)

E850.0 E850.0Accidentalpoisoningbyheroin 1(2%)

965.09 Specificcode965.09Poisoningbyotheropiatesandrelatednarcotics

1(2%)

304.71 opioid/otherdependencecontinuous 1(2%)

GrandTotal 49(100)*

*4outof45patientshadtwopositiveICD-9codesforOUP

79

CHAPTER4SUPPLEMENTS

AppendixB.1J48modelperformanceon80%ofthegoldstandarddata

Class TPRate

FPRate

sPrecision Recall F-Measure

ROCArea

Positive 0.691 0.092 0.707 0.691 0.699 0.808

Negative 0.908 0.309 0.902 0.908 0.905

WeightedAvg.

0.855 0.256 0.854 0.855 0.855

AppendixB.2NaiveBayesmodelperformanceon80%ofthegoldstandarddata

Class TPRate

FPRate

Precision Recall F-Measure

ROCArea

Positive 0.625 0.146 0.578 0.625 0.601 0.851

Negative 0.854 0.375 0.877 0.854 0.865 0.776

WeightedAvg.

0.798 0.319 0.804 0.798 0.801 0.794

AppendixB.3Randomforestmodelperformanceon80%ofthegoldstandarddata

Class TPRate

FPRate


ROCArea

Positive 0.485 0.160 0.493 0.485 0.489 0.662

Negative 0.840 0.515 0.836 0.840 0.838

WeightedAvg.

0.754 0.429 0.752 0.754 0.753

AppendixB.4SMOmodelperformanceon80%ofthegoldstandarddata

Class TPRate

FPRate


ROCArea

Positive 0.721 0.083 0.737 0.721 0.729 0.819

Negative 0.917 0.279 0.911 0.917 0.914

WeightedAvg.

0.870 0.232 0.869 0.870 0.869

AppendixB.5Simplelogisticmodelperformanceon80%ofthegoldstandarddata

Class TPRate

FPRate


ROCArea

Positive 0.728 0.052 0.818 0.728 0.770 0.940

Negative 0.948 0.272 0.916 0.948 0.932

WeightedAvg.

0.895 0.219 0.892 0.895 0.892

80

AppendixB.6Alistofdocument,sentenceandphraseperformancemeasuretoclassifypositiveclass

AccuracyMeasures

Level Algorithm Precision Recall F-Measure

Document SimpleLogistic 0.918 0.904 0.911

J48 0.847 0.853 0.85

RandomForest 0.756 0.934 0.836

BinarySMO 0.797 0.838 0.817

NaiveBase 0.821 0.64 0.719

Sentence RandomForest 0.867 1 0.929

BinarySMO 0.867 1 0.929

SimpleLogistic 0.856 1 0.922

NaiveBase 0.856 1 0.922

Phrase RandomForest 0.864 1 0.927

BinarySMO 0.864 1 0.927

SimpleLogistic 0.859 1 0.924

NaiveBase 0.859 1 0.924

81

WEKADEMONSTRATION

AppendixC.1ClickExplorertoopenWEKA

AppendixC.2ClickOpenfile

82

AppendixC.3ChooseARFFtrainingdatafilefromcomputer

AppendixC.4Double-checkdatabeforeproceeding

83

AppendixC.5ClickChoosetoopenclassifieroptions

AppendixC.6Choosefilteredclassifier

AppendixC.7Rightclickonclassifiernameandclickshowproperties

84

AppendixC.8ClickChoosetoopenclassifierslist

85

AppendixC.9Chooseaclassifier[e.g.,simplelogistic]

86

AppendixC.10ClickChoosetoopenfilteroptions

87

AppendixC.11ChooseStringToWordVector

88

AppendixC.12Rightclickonfilternameandclickshowproperties

89

AppendixC.13Suggestedfiltersettingsfortextclassifications

90

AppendixC.14Suggestedfiltersettingsfortextclassificationscontinues

91

AppendixC.15Frommainclassifytab,clickMoreoptions

92

AppendixC.16ClickChoosetoopenoutputpredictions

93

AppendixC.17Choosecross-validationandclassifyingattribute

94

AppendixC.18Predictionresultsareshowninthebuffer

AppendixC.19Accuracymeasuresandmodelperformanceresultsareshownatthe

endofthebufferresults

95

AppendixC.20Rightclickonclassifiername,thenclickshowproperties

AppendixC.21ClickChoosetoopenclassifieroptions

96

AppendixC.22ChooseCostSensitiveClassifier

97

AppendixC.23Rightclicktoshowproperties

AppendixC.24Clickchoosetoopenclassifieroptionsandchooseaclassifier

98

AppendixC.25Rightclickoncostmatrixtoshowproperties

99

AppendixC.26Changeclassesnumberaccordingtothestudyclasses,thenclick

resize

100

AppendixC.27CostMatrixnowreflectsstudyclassescount

AppendixC.28Adjustpenaltyasneeded.Trainthemodel

101

JAVACODESTOEXTRACTTEXT

AppendixD.1CodestoextracttextfromPDFinspiredbyDaraYukoriginal

code[129]

packageabdullah;import java . io . File ; import java . io . FileInputStream ; import java . io .FileOutputStream;importjava.io.FileReader;importjava.io.FileWriter;importjava.io.BufferedWriter;importjava.io.IOException;importcom.itextpdf.text.Document;importcom.itextpdf.text.pdf.PdfReader;importcom.itextpdf.text.pdf.parser.PdfTextExtractor;importjava.awt.Desktop;importjavax.swing.filechooser.FileNameExtensionFilter;importjavax.swing.JFileChooser;/∗∗∗∗@author∗/public classPDFToTextConverter{publicstaticvoidmain(String[]args){selectPDFFiles();}//allow pdf f i l e s selection for converting public static voidselectPDFFiles(){JFileChooserchooser=newJFileChooser();FileNameExtensionFilterfilt

e r = new FileNameExtensionFilter (”PDF” ,” pdf ”); chooser .setFileFilter(filter);chooser.setMultiSelectionEnabled(true);intreturnVal=chooser.showOpenDialog(null);Stringpath=FilePath;if(returnVal==JFileChooser.APPROVEOPTION){

File[]Files=chooser.getSelectedFiles();System.out.println(” Please wait . . . ” ) ; for ( int i =0;i<Files . length ; i++){convertPDFToText(Files[i].toString(),path+Files[i].getName().substring(0,

Files[i].getName().length()−4)+”.txt”);}

System.out.println(”Conversioncomplete”);}

}public static voidconvertPDFToText(String src,String desc){

try{ //create file writer

FileWriterfw=newFileWriter(desc); //create buffered writer

BufferedWriterbw=newBufferedWriter(fw); //create pdfreader

PdfReaderpr=newPdfReader(src);//getthenumberofpagesinthedocumentintpNum=pr.getNumberOfPages();for(intpage=1;page<=pNum;page++){Stringtext=PdfTextExtractor.getTextFromPage(pr,page);bw.write(text);bw.newLine();

102

}bw.flush();bw.close();}catch(Exceptione){e.printStackTrace();}}

}

103

AppendixD.2CodestoextracttextfromCSV

packageabdullah;importjava.io.BufferedReader;importjava.io.BufferedWriter;importjava.io.File;importjava.io.FileNotFoundException;importjava.io.FileReader;importjava.io.FileWriter;importjava.io.IOException;importjava.util.StringTokenizer;/∗∗∗∗@author∗/public class Abdullah{

/∗∗ ∗@paramargsthecommandline arguments

∗/ public static void main( String [ ] args ) throwsFileNotFoundException,IOException{

File = new File ( F i l e P a t h ); BufferedReader br = newBufferedReader(newFileReader(file));

String line; StringTokenizer st;

intcounter=0;while((line=br.readLine())!=null){st=newStringTokenizer(line,”\t”);if(st.countTokens()>=3){st.nextToken();

Stringname=st.nextToken()+”.txt”;BufferedWriterout=newBufferedWriter(newFileWriter(”FilePath”+name));st.nextToken();if (st.hasMoreTokens()){out.write(st.nextToken());}else{

out.write(””);}out.close();counter++;

} //(st.countTokens()==4)else{System.err.println(”line\t”+counter);counter++;

}}//while}}

104

7. REFERENCES

[1] D.E. Joranson,“etal., trendsinmedicaluseandabuseofopioidanalgesics,”

Jama,vol.283,no.13,pp.1710–1714,2000.

[2] S. E. McCabe, B. T. West, and H. Wechsler, “Trends and college-level

characteristics associated with the non-medical use of prescription drugs

amonguscollegestudentsfrom1993to2001,”Addiction,vol.102,no.3,pp.

455–465,2007.

[3] R.K.Portenoy,Chronicopioidtherapyinnonmalignantpain.Journalofpainand

symptommanagement,1990.

[4] L.Manchikanti,M.B. F.M., andM.H.Ailinani, “Therapeuticuse, abuse, and

nonmedicaluseofopioids:aten-yearperspective,”PainPhysician,vol.13,pp.

401–435,2010.

[5] CDC, “National vital statistics system: Mortality data,”

http://www.cdc.gov/nchs/deaths.htm,2013,accessed:2018-01-23.

[6] N.Volkow,American’sAddictiontoOpioids:HeroinandPrescriptionDrugAbuse,

vol. 2014, 2018. [Online]. Available: https://www.drugabuse.gov/about-

nida/legislative-activities/testimonyto-congress/2016/americas-addiction-

to-opioids-heroin-prescription-drug-abuse

[7] W.M.ComptonandN.D.Volkow,“Majorincreasesinopioidanalgesicabusein

theunitedstates:concernsandstrategies,”Drugandalcoholdependence,vol.

81,no.2,pp.103–107,2006.

[8] U.D.Health andH.H. Services, “With chartbook on trends in the health of

americans.”2006Claitor’sLawBooksandPublishingDivision,2005.

105

[9] J. Clarke, A. Skoufalos, and R. Scranton, “The american opioid epidemic:

Population health implications and potential solutions. report from the

national stakeholderpanel,”Populationhealthmanagement, vol.19,pp.S1–

S10,2016.

[10] L.Manchikanti, “National drug control policy and prescription drug abuse:

factsandfallacies,”Painphysician,vol.10,no.3,pp.399–424,2007.

[11] L. Manchikanti, E. Whitfield, and F. Pallone, “Evolution of the national all

schedules prescription electronic reporting act (nasper): a public law for

balancingtreatmentofpainanddrugabuseanddiversion,”Painphysician,vol.

8,no.4,pp.335–47,2005.

[12] L.Manchikanti,“Prescriptiondrugabuse:whatisbeingdonetoaddressthis

newdrugepidemic?testimonybeforethesubcommitteeoncriminaljustice,

drugpolicyandhumanresources,”Painphysician,vol.9,no.4,pp.287–321,

2006.

[13] L.ManchikantiandA.Singh,“Therapeuticopioids:aten-yearperspectiveon

the complexities and complications of the escalating use, abuse, and

nonmedicaluseofopioids,”Painphysician,vol.11,no.2,pp.63–88,2008.

[14] A.M.GilsonandP.G.Kreis,“Theburdenofthenonmedicaluseofprescription

opioidanalgesics,”PainMedicine,vol.10,pp.S89–S100,2009.

[15] H.G.Birnbaum,“etal.,estimatedcostsofprescriptionopioidanalgesicabuse

intheunitedstatesin2001:asocietalperspective,”TheClinicaljournalofpain,

vol.22,no.8,pp.667–76,2006.

106

[16] H. G. Birnbaum and et al, “Societal costs of prescription opioid abuse,

dependence,andmisuseintheunitedstates,”Painmedicine(Malden,Mass),

vol.12,no.4,pp.657–67,2011.

[17] N.National,OverdoseDeaths-NumberofDeathsfromPrescriptionOpioidPain

Relievers, vol. 2015, 2018. [Online]. Available:

http://www.drugabuse.gov/related-topics/trends-statistics/overdosedeath-

rates

[18] Asam, “Opioid addiction 2016 facts and figures,” Addiction, 2016. [Online].

Available: http://www.asam.org/docs/default-

source/advocacy/opioidaddiction-disease-facts-figures.pdf

[19] CDC.,“Drugoverdosedeathsintheunitedstatescontinuetoincreasein2015,”

2015.[Online].Available:https://www.cdc.gov/drugoverdose/epidemic/

[20] C. M. Jones, “Heroin use and heroin use risk behaviors among nonmedical

usersofprescriptionopioidpainrelievers-unitedstates,2002-2004and2008-

2010,”Drugandalcoholdependence,vol.132,no.1,pp.95–100,2013.

[21] R.A. Rudd and et al, “Increases in drug and opioid overdose deaths-united

states,”American Journal of Transplantation, vol. 16, no. 4, pp. 1323–1327,

2016.

[22] Cdc., “Opioid painkiller prescribing,” p.1, 2014. [Online].

Available:https://www.cdc.gov/vitalsigns/opioid-prescribing/

[23] CDC, “Prescription painkiller overdoses,” vol. 2018, p. 1, 2013. [Online].

Available:

https://www.cdc.gov/vitalsigns/prescriptionpainkilleroverdoses/index.html

107

[24] Samhsa.,“Keysubstanceuseandmentalhealthindicatorsintheunitedstates:

Resultsfromthe2015nationalsurveyondruguseandhealth,”2016.[Online].

Available: https://www.samhsa.gov/data/sites/default/files/NSDUH-

FFR12015/NSDUH-FFR1-2015/NSDUH-FFR1-2015.pdf

[25] Abuse, “N. i.o.d. prescription andover-the-countermedications,” p. 1, 2015.

[Online].Available:

https://www.drugabuse.gov/publications/drugfacts/prescription-over-

countermedications

[26] R.J.Fortunaandetal.,“Prescribingofcontrolledmedicationstoadolescents

andyoungadults in theunitedstates,”Pediatrics, vol.126,no.6,pp.1108–

1116,2010.

[27] J.Duwveetal.,“Reportonthetollofopioiduseinindianaandmarioncounty,”

vol.2016,2018.[Online].Available:

https://www.inphilanthropy.org/sites/default/files/Richard

[28] C.A.Harleandetal.,“Decisionsupportforchronicpaincare:howdoprimary

carephysiciansdecidewhen toprescribeopioids?aqualitativestudy,”BMC

familypractice,vol.16,no.1,p.48,2015.

[29] J.H.Chenandet al., “Distributionofopioidsbydifferent typesofmedicare

prescribers,”JAMAinternalmedicine,vol.176,no.2,pp.259–261,2016.

[30] P.Bendtsenandetal,“Whatarethequalitiesofdilemmasexperiencedwhen

prescribingopioidsingeneralpractice?”Pain,vol.82,no.1,pp.89–96,1999.

108

[31] A.A.Bergmanandetal.,“Contrastingtensionsbetweenpatientsandpcpsin

chronicpainmanagement:aqualitativestudy,”PainMedicine,vol.14,no.11,

pp.1689–1697,2013.

[32] R.R.Leverenceandetal.,“Chronicnon-cancerpain:asirenforprimarycarea

reportfromtheprimarycaremultiethnicnetwork(primenet),”TheJournalof

theAmericanBoardofFamilyMedicine,vol.24,no.5,pp.551–561,2011.

[33] M.S.Matthias andet al., “Thepatient-provider relationship in chronicpain

care:providers’perspectives,”PainMedicine,vol.11,no.11,pp.1688–1697,

2010.

[34] A.Spitzandetal.,“Primarycareproviders’perspectiveonprescribingopioids

to older adults with chronic non-cancer pain: a qualitative study,” BMC

geriatrics,vol.11,no.1,p.35,2011.

[35] S.F.a.e.a.Butler,“Validationofascreenerandopioidassessmentmeasurefor

patientswithchronicpain,”Pain,vol.112,no.1-2,pp.65–75,2004.

[36] H. Akbik and et al., “Validation and clinical application of the screener and

opioidassessmentforpatientswithpain(soapp),”JournalofPain&Symptom

Management,vol.32,no.3,pp.287–93,2006.

[37] S.F.Butlerandetal.,“Validationoftherevisedscreenerandopioidassessment

forpatientswithpain(soapp-r),”JournalofPain,vol.9,no.4,pp.360–72,2008.

[38] ClinJPain,“Crossvalidationofthecurrentopioidmisusemeasuretomonitor

chronicpainpatientsonopioidtherapy,”ClinicalJournalofPain,vol.26,no.9,

pp.770–6,2010.

109

[39] S.M.Wuand et al., “The addictionbehaviors checklist: validationof a new

clinician-basedmeasureofinappropriateopioiduseinchronicpain,”Journal

ofpainandsymptommanagement,vol.32,no.4,pp.342–351,2006.

[40] T.JonesandA.S.D.Passik,“comparisonofmethodsofadministeringtheopioid

risktool,”JournalofOpioidManagement,vol.7,no.5,pp.347–51,2011.

[41] S.F.Butlerandetal.,“Electronicopioidriskassessmentprogramforchronic

painpatients:Barriersandbenefitsofimplementation,”PainPractice,vol.14,

no.3,pp.E98–E105,2014.

[42] Inflexxion,“Soapplongformvs.soappshortform:Tradeoffinaccuracy,”2005.

[Online].Available:https://www.painedu.com/soapp/SOAPPtradeoff.pdf

[43] T. Jones and et al., “A comparison of various risk screening methods in

predictingdischargefromopioidtreatment,”ClinicalJournalofPain,vol.28,

no.2,pp.93–100,2012.

[44] T.M.Moore, , and et al., “A comparison of common screeningmethods for

predictingaberrantdrug-relatedbehavioramongpatients receivingopioids

forchronicpainmanagement,”PainMedicine,vol.10,no.8,pp.1426–33,2009.

[45] R.WestandJ.Brown,TheoryofAddiction,2nded.-Blackwell:Wiley,2013.

[46] W.S.abuse,p.1,2016.[Online].Available:

http://www.who.int/topics/substanceabuse/en/

[47] “Triwesthealthcarealliance.substanceabusevssubstancedependence,”2016.

[Online]. Available: http://www.triwest.com/en/behavioral-

health/resourcecenter/addiction-recovery/substance-abuse/more/

110

[48] “Highlights of changes from dsm-iv-tr to dsm-5,” 2013. [Online]. Available:

http://www.dsm5.org/documents/changes

[49] C.Tools,“Dsm5criteriaforsubstanceusedisorder,”2013.[Online].Available:

http://www.buppractice.com/node/12351

[50] N.I.Abuse,“Aberrantdrugtakingbehaviors,”vol.4,2016.[Online].

Available:

https://www.drugabuse.gov/sites/default/files/files/AberrantDrugTakingBe

haviors.pdf

[51] J.Jaffe,C.Knapp,andD.Ciraulo,“Opiates:clinicalaspectssubstanceabuse:a

comprehensivetext,”WilliamsandWilkens,pp.158–66,1997.

[52] M.J.Cohenandetal.,“Ethicalperspectives:Opioidtreatmentofchronicpain

inthecontextofaddiction,”ClinicalJournalofPain,vol.18,no.4,pp.S99–S107,

2002.

[53] E.Michnaandetal.,“Predictingaberrantdrugbehaviorinpatientstreatedfor

chronic pain: importance of abuse history,” Journal of pain and symptom

management,vol.28,no.3,pp.250–258,2004.

[54] Drugabuse.gov,“Aberrantdrugtakingbehaviorsinformationsheet,”vol.1,

2018.[Online].Available:

https://www.drugabuse.gov/sites/default/files/files/AberrantDrugTakingBe

haviors.pdf

[55] T.J.Ivesandetal.,“Predictorsofopioidmisuseinpatientswithchronicpain:

aprospectivecohortstudy,”BMCHealthServicesResearch,vol.6,p.46,2006.

111

[56] M.J.Edlundandetal.,“Riskfactorsforclinicallyrecognizedopioidabuseand

dependenceamongveteransusingopioidsforchronicnon-cancerpain,”Pain,

vol.129,no.3,pp.355–62,2007.

[57] A. G. White and et al., “Analytic models to identify patients at risk for

prescriptionopioidabuse,”AmericanJournalofManagedCare,vol.15,no.12,

pp.897–906,2009.

[58] J.B.Riceandetal.,“Amodeltoidentifypatientsatriskforprescriptionopioid

abuse,dependence, andmisuse,”PainMedicine, vol. 13,no.9, pp.1162–73,

2012.

[59] M. Lewis, C. M. Herndon, and J. T. Chibnall, “Patient aberrant drug taking

behaviorsinalargefamilymedicineresidencyprogram:aretrospectivechart

review of screening practices, incidence, and predictors,” Journal of Opioid

Management,vol.10,no.3,pp.169–75,2014.

[60] J.A.Turnerandetal.,“Chronicopioidtherapyurinedrugtestinginprimary

care:prevalenceandpredictorsofaberrantresults,”JournalofGeneralInternal

Medicine,vol.29,no.12,pp.1663–71,2014.

[61] T.R.Hylanandetal.,“Automatedpredictionofriskforproblemopioiduseina

primarycaresetting,”JournalofPain,vol.16,no.4,pp.380–7,2015.

[62] P.D.Allison,LogisticregressionusingSAS:TheoryandapplicationSASInstitute,

S.Institute,Ed.,2012.

[63] E.L.Zaleandetal.,“Tobaccosmoking,nicotinedependence,andpatternsof

prescriptionopioidmisuse:Resultsfromanationallyrepresentativesample,”

Nicotine&TobaccoResearch,vol.17,no.9,pp.1096–103,2015.

112

[64] J.WilsonandA.Bock, “Thebenefitofusingbothclaimsdataandelectronic

medical record data in health care analysis,” 2012. [Online]. Available:

https://www.optum.com/content/dam/optum/resources/whitePapers/Ben

efitsof-using-both-claims-and-EMR-data-in-HC-analysis-WhitePaper-ACS.pdf

[65] E. F. Kern and et al., “Failure of icd?9?cm codes to identify patients with

comorbidchronickidneydiseaseindiabetes,”Healthservicesresearch,vol.41,

no.2,pp.564–580,2006.

[66] M.L.Hansen,P.W.Gunn,andD.C.Kaelber,“Underdiagnosisofhypertensionin

childrenandadolescents,”Jama,vol.298,no.8,pp.874–879,2007.

[67] S.F.Butlerandetal.,“Cross-validationofascreenertopredictopioidmisuse

inchronicpainpatients(soapp-r),”Journalofaddictionmedicine,vol.3,no.2,

p.66,2009.

[68] Inflexxion,“Soapp,”2015. [Online].Available:

https://www.inflexxion.com/soapp-comm/

[69] J.S.Kniselyandetal.,“Prescriptionopioidmisuseindex:abriefquestionnaire

toassessmisuse,”JournalofSubstanceAbuseTreatment,vol.35,no.4,pp.380–

6,2008.

[70] T.Jonesandetal.,“Furthervalidationofanopioidriskassessmenttool:the

briefriskinterview,”JournalofOpioidManagement,vol.10,no.5,pp.353–64,

2014.

[71] Zdnet, Within two years, 2013. [Online]. Available:

http://www.zdnet.com/article/within-two-years-80-of-all-medical-datawill-

be-unstructured/

113

[72] S. M. Smith and et al., “Classification and definition of misuse, abuse, and

related events in clinical trials: Acttion systematic review and

recommendations,”PAIN,vol.154,no.11,pp.2287–2296,2013.

[73] T.P.Stratton,L.Palombi,H.Blue,andM.E.Schneiderhan,“Ethicaldimensions

of the prescription opioid abuse crisis,” American Journal of HealthSystem

Pharmacy,vol.75,no.15,pp.1145–1150,2018.

[74] C.Gounder,“Whoisresponsibleforthepain-pillepidemic,”TheNewYorker,

2013.

[75] B.Meier,Pain killer: A”wonder” drug’s trail of addiction and death. Rodale,

2003.

[76] G. Baucus, “Baucus Grassley opioid investigation letter to American pain

foundation,” https://www.finance.senate.gov/download/baucus-

grassleyopioid-investigation-letter-to-american-pain-foundation, May 2012,

accessed:2018-08-20.

[77] G.Kounang,“Senatereportsayspatientadvocacygroupsgetkickbacksfrom

opioid-manufacturers,”

https://www.cnn.com/2018/02/12/health/senatereport-opioid-

manufacturers-donations/index.html,February2018,accessed:2018-08-20.

[78] Ornstein,“Americanpainfoundationshutsdownassenatorslaunch

investigationofprescriptionnarcotics,”

https://www.propublica.org/article/senatepanel-investigates-drug-

company-ties-to-pain-groups,May2012,accessed:2018-08-20.

114

[79] R.C.Dart,H.L.Surratt,T.J.Cicero,M.W.Parrino,S.G.Severtson,B.Bucher-

Bartelson,andJ.L.Green,“Trendsinopioidanalgesicabuseandmortalityin

theunitedstates,”NewEnglandJournalofMedicine,vol.372,no.3,pp.241–

248,2015.

[80] T. D. Saha, B. T. Kerridge, R. B. Goldstein, S. P. Chou, H. Zhang, J. Jung, R. P.

Pickering,W. J.Ruan, S.M.Smith,B.Huangetal., “Nonmedicalprescription

opioid use and dsm-5 nonmedical prescription opioid use disorder in the

unitedstates,”TheJournalofclinicalpsychiatry,vol.77,no.6,p.772,2016.

[81] S.M.Frenk,K.S.Porter,andL.Paulozzi,Prescriptionopioidanalgesicuseamong

adults: United States, 1999-2012. US Department of Health and Human

Services, Centers for Disease Control and Prevention, National Center for

HealthStatistics,2015,no.2015.

[82] A.Kolodny,D.T.Courtwright,C.S.Hwang,P.Kreiner,J.L.Eadie,T.W.Clark,and

G. C. Alexander, “The prescription opioid and heroin crisis: a public health

approachtoanepidemicofaddiction,”Annualreviewofpublichealth,vol.36,

pp.559–574,2015.

[83] Cdc., “Increases in drug and opioid-involved overdose deaths united states,

20102015,” 2016. [Online]. Available:

https://www.cdc.gov/mmwr/volumes/65/wr/mm655051e1.htm

[84] national institute of drug abuse, “Indiana opioid summary,”

https://www.drugabuse.gov/drugs-abuse/opioids/opioid-summaries-

bystate/indiana-opioid-summary,February2018,accessed:2018-08-21.

115

[85] A.H.Alzeer,J.Jones,andM.J.Bair,“Reviewoffactors,methods,andoutcome

definitionindesigningopioidabusepredictivemodels,”PainMedicine,vol.19,

no.5,pp.997–1009,2017.

[86] U.Rajaandetal.,“Textmininginhealthcare.applicationsandopportunities,”

Journalofhealthcareinformationmanagement:JHIM,vol.22,no.3,pp.52–56,

2008.

[87] R. Kukafka and et al., “Human and automated coding of rehabilitation

discharge summaries according to the international classification of

functioning,disability,andhealth,”JournaloftheAmericanMedicalInformatics

Association,vol.13,no.5,pp.508–515,2006.

[88] S. G. Peters and J. D. Buntrock, “Big data and the electronic health record,”

JournalofAmbulatoryCareManagement,vol.37,no.3,pp.206–10,2014.

[89] L.Pereiraetal.,Usingtextminingtodiagnoseandclassifyepilepsyinchildren.

ine-HealthNetworking,Applications&Services(Healthcom),2013IEEE15th

InternationalConferenceon.2013.IEEE.

[90] D.S.Carrellandetal.,“Usingnaturallanguageprocessingtoidentifyproblem

usageofprescriptionopioids,”Internationaljournalofmedicalinformatics,vol.

84,no.12,pp.1057–1064,2015.

[91] R. E. Palmer and et al., “The prevalence of problem opioid use in patients

receiving chronic opioid therapy: computer-assisted review of electronic

healthrecordclinicalnotes,”Pain,vol.156,no.7,pp.1208–14,2015.

[92] I. Hleath, “About indiana university health,” 2013. [Online]. Available:

http://iuhealth.org/images/glo-new-doc/IUH-FactSheet-07-22-131(2).pdf

116

[93] A.Ghaffarinejad andM.Kerdegary, “Relationshipof opioiddependence and

positiveandnegativesymptomsinschizophrenicpatients,”Addiction&health,

vol.1,no.2,p.69,2009.

[94] Regenstrief, “ndepth,” 2016. [Online]. Available:

http://www.regenstrief.org/resources/ndepth/

[95] Regenstrief institute, “Regenstriefdatacore,” 2018. [Online].

Available:https://www.indianactsi.org/translational-informatics/data-core/

[96] H. Harkema and et al., “Context: an algorithm for determining negation,

experiencer,andtemporalstatusfromclinicalreports,”Journalofbiomedical

informatics,vol.42,no.5,pp.839–851,2009.

[97] Cornell, “42 cfr part 2 - confidentiality of substance use disorder patient

records,” https://www.law.cornell.edu/cfr/text/42/part-2, February 2017,

accessed:2018-09-15.

[98] CDC., “U. s. prescribing rate maps,”2017. [OnlinAvailable:

https://www.cdc.gov/drugoverdose/maps/rxrate-maps.html

[99] A.Goodnough,OpioidPrescriptionsFallAfter2010Peak,C.D.C.ReportFinds,in

TheNewYorkTimes,2017.

[100] K.Newman,O.PrescriptionsandA.D.D.inUSNews,Eds.,2018.

[101] A.S.Bohnertandetal.,“Associationbetweenopioidprescribingpatternsand

opioidoverdose-relateddeaths,”Jama,vol.305,no.13,pp.1315–1321,2011.

[102] K.M.Dunnandetal.,“Opioidprescriptionsforchronicpainandoverdose:A

cohortstudy,”AnnalsofInternalMedicine,vol.152,no.2,pp.85–92,2010.

117

[103] W. M. Compton, C. M. Jones, and G. T. Baldwin, “Relationship between

nonmedicalprescription-opioiduseandheroinuse,”NewEnglandJournalof

Medicine,vol.374,no.2,pp.154–163,2016.

[104] M.Okamotoetal.,“Anasymptoticexpansionforthedistributionofthelinear

discriminantfunction,”TheAnnalsofMathematicalStatistics,vol.34,no.4,pp.

1286–1301,1963.

[105] J. Beck, “Class balancing in weka,”2018. [OnlinAvailable:

https://www.youtube.com/watch?v=V9PNyx5-kxM

[106] O.Kwonand J.M. Sim, “Effects of data set features on theperformances of

classificationalgorithms,”ExpertSystemswithApplications,vol.40,no.5,pp.

1847–1857,2013.

[107] V.Lo´pez,A.Fern´andez,S.Garc´ıa,V.Palade,andF.Herrera,“Aninsightinto

classificationwithimbalanceddata:Empiricalresultsandcurrenttrendson

usingdata intrinsic characteristics,” InformationSciences, vol.250,pp.113–

141,2013.

[108] Y.Sun,M.S.Kamel,andY.Wang,“Boostingforlearningmultipleclasseswith

imbalancedclassdistribution,”innull.IEEE,2006,pp.592–602.

[109] H.HeandE.A.Garcia,“Learningfromimbalanceddata,”IEEETransactionson

Knowledge&DataEngineering,no.9,pp.1263–1284,2008.

[110] G.E.Batista,R.C.Prati,andM.C.Monard,“Astudyofthebehaviorofseveral

methods for balancing machine learning training data,” ACM SIGKDD

explorationsnewsletter,vol.6,no.1,pp.20–29,2004.

118

[111] B. Zadrozny and C. Elkan, “Learning andmaking decisionswhen costs and

probabilitiesarebothunknown,” inProceedingsof the seventhACMSIGKDD

internationalconferenceonKnowledgediscoveryanddatamining.ACM,2001,

pp.204–213.

[112] P.Domingos,“Metacost:Ageneralmethodformakingclassifierscostsensitive,”

inProceedingsofthefifthACMSIGKDDinternationalconferenceonKnowledge

discoveryanddatamining.ACM,1999,pp.155–164.

[113] S. Lomax, “How to know that our dataset is imbalance?” 2017. [Online].

Available:

https://www.researchgate.net/post/How toknowthat ourdatasetis

imbalance

[114] Y. Sun, A. K. Wong, and M. S. Kamel, “Classification of imbalanced data: A

review,”InternationalJournalofPatternRecognitionandArtificialIntelligence,

vol.23,no.04,pp.687–719,2009.

[115] Z.-H.ZhouandX.-Y.Liu,“Trainingcost-sensitiveneuralnetworkswithmethods

addressingtheclassimbalanceproblem,”IEEETransactionsonKnowledgeand

DataEngineering,vol.18,no.1,pp.63–77,2006.

[116] “.modelfile,”http://filext.com/file-extension/MODEL,accessed:2018-08-15.

[117] R. Caruana and A. Niculescu-Mizil, “An empirical comparison of supervised

learning algorithms,” inProceedings of the 23rd international conference on

Machinelearning.ACM,2006,pp.161–168.

119

[118] R.Couronn´e,P.Probst, andA.-L.Boulesteix, “Random forestversus logistic

regression:alarge-scalebenchmarkexperiment,”BMCbioinformatics,vol.19,

no.1,p.270,2018.

[119] T.PranckeviˇciusandV.Marcinkeviˇcius,“Comparisonofnaivebayes,random

forest, decision tree, support vector machines, and logistic regression

classifiersfortextreviewsclassification,”BalticJournalofModernComputing,

vol.5,no.2,p.221,2017.

[120] C.Perlich,“Whataretheadvantagesoflogisticregressionoverdecisiontrees?

are there any cases where it’s better to use logistic regression instead of

decision trees?” https://www.quora.com/What-are-the-advantages-

oflogistic-regression-over-decision-trees-Are-there-any-cases-where-its-

better-touse-logistic-regression-instead-of-decision-trees/answer/Claudia-

Perlich,Jun2017,accessed:2018-08-16.

[121] L.M.Schwartz,S.Woloshin,H.C.Sox,B.Fischhoff,andH.G.Welch,“Uswomen’s

attitudes to false positive mammography results and detection of ductal

carcinomainsitu:crosssectionalsurvey,”Bmj,vol.320,no.7250,pp.1635–

1640,2000.

[122] J. G. Elmore, D. L. Miglioretti, L. M. Reisch, M. B. Barton, W. Kreuter, C. L.

Christiansen, and S. W. Fletcher, “Screening mammograms by community

radiologists:variabilityinfalse-positiverates,”JournaloftheNationalCancer

Institute,vol.94,no.18,pp.1373–1380,2002.

[123] K. F. Foundation, “Opioidoverdosedeath rates andall drugoverdosedeath

rates per 100,000 population (age-adjusted),”

120

https://www.kff.org/other/stateindicator/opioid-overdose-death-rates/,

2018,accessed:2018-08-29.

[124] S.Redmore,“Machinelearningvs.naturallanguageprocessing,”2013.

[125] R.Zhang,S.Ma,L. Shanahan, J.Munroe,S.Horn,andS. Speedie, “Automatic

methods to extract new york heart association classification from clinical

notes,” in Bioinformatics and Biomedicine (BIBM), 2017 IEEE International

Conferenceon.IEEE,2017,pp.1296–1299.

[126] C.-H.Lee,C.-H.Wu,andH.-C.Yang,“Textminingofclinicalrecordsforcancer

diagnosis,”inInnovativeComputing,InformationandControl,2007.

ICICIC’07.SecondInternationalConferenceon. IEEE,2007,pp.172–172.

[127] X.Dong,L.Qian,Y.Guan,L.Huang,Q.Yu,andJ.Yang,“Amulticlassclassification

method based on deep learning for named entity recognition in electronic

medical records,” in Scientific Data Summit (NYSDS), 2016 New York. IEEE,

2016,pp.1–10.

[128] Z.Wei, Z. X. Ju, X. Chun, J. Hua, andP. Jin, “An automatic electronic nursing

recordsanalysissystembasedonthetextclassificationandmachinelearning,”

in Intelligent Human-Machine Systems and Cybernetics (IHMSC), 2013 5th

InternationalConferenceon,vol.2.IEEE,2013,pp.494–498.

[129] D. Yuk, “Pdf to text converter,” http://java.worldbestlearningcenter.com/

2013/06/pdf-to-text-converter.html,June2013,accessed:2018-08-05.

CURRICULUMVITAE

AbdullahHamadAlzeer

EDUCATION

Master of Science inHealth Informatics atNorthernKentuckyUniversity,May

2012.

BachelorofScienceinPharmaceuticalscienceatKingSaudUniversity,February

2009.

EMPLOYMENT

Undergraduate Instructor, Department of Health Information Management,

IndianaUniversity,January2017-2018.

TeachingAssistant,DepartmentofClinicalPharmacy,KingSaudUniversity,June

2011-2018.

Pharmacist, Pharmacy Department, King Fahad Medical City, Riyadh, Saudi

Arabia,May2009-May2010.

PUBLICATION

Alzeer, A., Patel, J., Dixon, B., Bair, M. and Jones, J., 2018, June. A Comparison

BetweenTwoApproachestoIdentifyOpioidUseProblems:ICD-9vs.Text-Mining

Approach.In2018IEEEInternationalConferenceonHealthcareInformatics(ICHI)

(pp.455-456).IEEE.

Alzeer,A.H.,Jones,J.andBair,M.J.,2017.Reviewoffactors,methods,andoutcome

definitionindesigningopioidabusepredictivemodels.PainMedicine,19(5),pp.997-

1009.

Dixon, B.E., Alzeer, A.H., Phillips, E.O.K. andMarrero,D.G., 2016. Integration of

provider,pharmacy,andpatient-reporteddatatoimprovemedicationadherencefor

type2diabetes:acontrolledbefore-afterpilotstudy.JMIRmedicalinformatics,4(1).

PROFESSIONALPRESENTATIONS

Alzeer,AbdullahH.IdentifyOpioidUseProblem:TextMiningApproachIEEEICHI

2018.NewYork,NY.June2018.

AWARDS

BestSaudiStudentsClubPresidentinUSbySaudiArabianCultureMission.

Washington,DC.October2017.

identify opiod use problem - iupui

Documents