identify opiod use problem - iupui
TRANSCRIPT
IDENTIFYOPIODUSEPROBLEM
AbdullahHamadAlzeer
SubmittedtothefacultyoftheUniversityGraduateSchoolinpartialfulfillmentoftherequirements
forthedegree DoctorofPhilosophy
intheSchoolofInformaticsandComputing,IndianaUniversity
December2018
ii
AcceptedbytheGraduateFacultyofIndianaUniversity,inpartial
fulfillmentoftherequirementsforthedegreeofDoctorofPhilosophy.
DoctoralCommittee
______________________________________JosetteJones,Ph.D.,Chair
______________________________________BrianDixon,Ph.D.
September11,2018
______________________________________
MatthewBair,MD.,MS.
______________________________________
XiaowenLiu,Ph.D.
iii
DEDICATION
To those I admire themost:my parents, Hamad andHessah, andmy beloved
siblingswhobelievedinmeallalong,Ibraheem,Saleh,Abdulaziz,Tarfah,Ahmed,and
Sarah.
iv
ACKNOWLEDGMENTS
Everyoneinthislifehashisjourney,andtheroadtocompletingthisdissertation
wasmyodyssey.Forthepast6years,IhaveencounteredenormouschallengesthatI
would not have overcome without the wisdom and support of my committee
members: Josette Jones, Brain Dixon, Mathew Bair and Xiaowen Liu. They have
mastered the art of babysitting my ego while I sought to maintain the highest
academicstandardsinresearch.Thankyouverymuch;yourlessonswentbeyondthe
pagesofthisdissertation,andIpromisetopassthetorchinthefuture.
Inadditiontomycommitteemembers,onescholarandbestfriendhashelpedthe
most in this journey.LinaAlfantoukh,withoutyour supportandexpertise indata
management,thisdissertationmaynothavebeenaccomplishedontime.Onelifemay
notbeenoughtothankyou.
IamgratefultoallmycolleaguesandprofessorsattheSchoolofInformaticsfor
theiracademicsupportandtotheemployeesattheOfficeofInternationalAffairsfor
goingbeyondexpectationstosupporttheinternationalcommunityatIUPUI.
Finally, this research would not have been possible without the trust of and
financialsupportfromthegovernmentofSaudiArabia,representedbytheCollegeof
PharmacyatKingSaudUniversity.MayIusewhatI’velearnedforthegoodofhealth
researchandpracticesinSaudiArabiaandtheworld.
v
AbdullahHamadAlzeer
IDENTIFYOPIOIDUSEPROBLEM
The aimof this research is to design a newmethod to identify the opioid use
problems (OUP) among long-term opioid therapy patients in Indiana University
Healthusingtextminingandmachinelearningapproaches.First,asystematicreview
was conducted to investigate the current variables,methods, and opioid problem
definitionsusedintheliterature.Weidentified75distinctvariablesin9modelsthat
majorlyusedICDcodestoidentifytheopioidproblem(OUP).Thereviewconcluded
thatusingICDcodesalonemaynotbeenoughtodeterminetherealsizeoftheopioid
problemandmoreeffortisneededtoadoptothermethodstounderstandtheissue.
Next,wedevelopedatextminingapproachtoidentifyOUPandcomparedtheresults
with the current conventional method of identifying OUP using ICD-9 codes.
Following the institutional review board and an approval from the Regenstrief
Institute, structured andunstructureddata of 14,298 IUHpatientswere collected
fromtheIndianaNetworkforPatientCare.Ourtextminingapproachidentified127
opioidcasescomparedto45casesidentifiedbyICDcodes.Weconcludedthatthetext
mining approachmay be used successfully to identify OUP from patients clinical
notes. Moreover, we developed a machine learning approach to identify OUP by
analyzingpatients’clinicalnotes.OurmodelwasabletoclassifypositiveOUPfrom
clinical notes with a sensitivity of 88% on unseen data. We concluded that the
machine learning approach may be used successfully to identify the opioid use
problemfrompatients’clinicalnotes.
JosetteJones,Ph.D.,Chair
vi
TABLEOFCONTENTS
1. INTRODUCTION................................................................................................................................1
Synopsis...........................................................................................................................................1
OverviewoftheOpioidUseProblems(OUP)..................................................................2
OpioidOverdoseDeath........................................................................................................2
OpioidUseProblemsandWomen...................................................................................3
OpioidUseProblemsinAdolescentsandYoungAdults........................................3
TheOpioidUseProblemsinIndiana..............................................................................4
ProblemStatement.....................................................................................................................5
Physicians’RoleinIdentifyingOpioidUseProblems.............................................5
Self-ReportOpioidRisk-AssessmentTools.................................................................6
TheStatementofPurpose........................................................................................................9
DissertationAims........................................................................................................................9
DefinitionofTerms..................................................................................................................10
2. CONDUCTASYSTEMATICREVIEWOFPREVIOUSOPIOIDUSEPROBLEMS
(OUP)............................................................................................................................................................13
Objective.......................................................................................................................................13
Method..........................................................................................................................................13
Results...........................................................................................................................................14
Variables..................................................................................................................................15
Method:StudiedPopulation,DatabaseUsedand,Analysis..............................16
OutcomeDefinition.............................................................................................................19
Discussion....................................................................................................................................22
vii
InconsistencyofPredictionDirectionofVariables...............................................22
ClaimsDatavs.ElectronicHealthRecordData......................................................23
VariationinOutcomeDefinition...................................................................................23
Limitations..............................................................................................................................26
Conclusion...................................................................................................................................28
3. IDENTIFYANDCOMPAREPATIENTSWITHOPIOIDUSEPROBLEM....................30
Introduction................................................................................................................................30
Method..........................................................................................................................................32
MethodsOverview..............................................................................................................32
SampleDefinition................................................................................................................33
VariablesofInterest...........................................................................................................33
ReportTypeSelection........................................................................................................33
IdentifyPatientswithOpioidUseProblems............................................................35
IdentifyOUPUsingICD-9Codes....................................................................................36
Analysis....................................................................................................................................37
Results...........................................................................................................................................39
SamplingResults..................................................................................................................39
IdentificationofOpioidUseProblemsUsingaText-MiningApproach.......39
ComparisonBetweenTextMiningandICD-9.........................................................41
VariablesofInterest...........................................................................................................43
Discussion....................................................................................................................................44
IdentificationofOpioidUseProblems........................................................................45
FrequencyofICDCriteriatoIdentifyOUP................................................................46
viii
FrequencyofText-MiningCriteriatoIdentifyOUP..............................................47
Limitations..............................................................................................................................49
Conclusion...................................................................................................................................50
4. MACHINE-LEARNINGAPPROACHTOIDENTIFYOPIOIDUSEPROBLEMS.........51
Introduction................................................................................................................................51
Method..........................................................................................................................................52
Machine-LearningProcessSummary.........................................................................52
Analysis.........................................................................................................................................54
WEKAConfigurationandModelComparison.........................................................54
ModelFinalization...............................................................................................................56
Results...........................................................................................................................................57
ModelComparison..............................................................................................................57
ModelOptimization............................................................................................................57
ModelTesting........................................................................................................................60
Discussion....................................................................................................................................60
Algorithms’Performance.................................................................................................60
Cost-SensitiveClassifier....................................................................................................62
Limitation................................................................................................................................62
Conclusion...................................................................................................................................63
5. SUMMARY.........................................................................................................................................64
SummaryOverview.................................................................................................................64
Self-reportTools(FirstGeneration)................................................................................64
StructuredDataModels(SecondGeneration).............................................................65
ix
ClinicalNotesModels(ThirdGeneration).....................................................................65
NLPandTextMining..........................................................................................................65
MachineLearning................................................................................................................71
Limitations..................................................................................................................................73
Conclusion..............................................................................................................................74
FutureDirectionandRecommendations.......................................................................74
6. APPENDICES....................................................................................................................................76
CHAPTER3SUPPLEMENTS.................................................................................................76
CHAPTER4SUPPLEMENTS.................................................................................................79
WEKADEMONSTRATION.....................................................................................................81
JAVACODESTOEXTRACTTEXT....................................................................................101
7. REFERENCES................................................................................................................................104
CURRICULUMVITAE
x
LISTOFTABLES
Table2.1Mostcommonlyusedvariablesinstatisticalmodelstopredictopioid
abuse.....................................................................................................................................................16
Table2.2Outcomedefinitionandmethodsdatasource.......................................................20
Table2.3Outcomedefinitionandmethodsdatasource.......................................................21
Table3.1Thecriteriaforidentifyingopioiduseproblemsinpatients’clinical
notes......................................................................................................................................................37
Table3.2Demographicscharacteristicsofstudypopulation.............................................40
Table3.3FrequencydistributionofOUPpositivecohorts’characteristics..................42
Table3.4ComparisonofcareutilizationamongOUPpositivecohorts..........................42
Table3.5Measureofassociationandlikelihoodratioforcategoricalvariables........44
Table3.6ComparisonofcontinuousvariablesamongpositiveOUPusingtext
miningvs.negativecohorts.........................................................................................................44
Table4.1HighlightofWEKAconfigurations...............................................................................56
Table4.2Comparisonofmultiplemodels’performancestoidentifypositive
OUP.........................................................................................................................................................58
Table4.3Detailedaccuracyperformancemeasureforsimplelogisticalgorithm
toclassifyOUP...................................................................................................................................58
Table4.4Initialsimplelogisticmodelperformanceon80%ofthegoldstandard
data........................................................................................................................................................58
Table4.5Optimizedsimplelogisticmodelperformanceon80%ofthegold
standarddata.....................................................................................................................................58
xi
Table4.6Simplelogisticconfusionmatrixcomparisonbeforeandafter
optimization.......................................................................................................................................58
Table4.7Simplelogisticmodelperformancetestedon20%ofunseengold
standarddata.....................................................................................................................................58
Table5.1Multivariatelogisticregressionoftext-miningmeasureofopioiduse
problems..............................................................................................................................................68
xii
LISTOFFIGURES
Figure1.1Increasesindrugandopioidoverdosedeaths–U.S.,2000–2014.................3
Figure1.2KeysubstanceuseandmentalhealthindicatorsintheU.S.:Results
fromthe2015nationalsurveyondruguseandhealth.Source:
SAMHSA,2016......................................................................................................................................5
Figure1.3ReportonthetollofopioiduseinIndianaandMarioncounty.
Source:Duwve,J.etal.,2016.........................................................................................................5
Figure1.4DistributionofopioidsbydifferenttypesofMedicareprescribers...............6
Figure1.5Overviewofsourcesandpurposeofopioidriskassessmenttools................7
Figure2.1Articlesextractionprocess...........................................................................................15
Figure3.1Identificationofopioiduseproblemssummarymethodsandresults......38
Figure3.2Positiveopioiduseproblemsratestratifiedbyagegroup.............................43
Figure4.1WEKAmodelconfigurations........................................................................................56
Figure4.2Summaryofchapter4methodandresults............................................................59
Figure5.1Opioidoverdosedeathratesper100,000populationfrom
2006-2014.Source:KaiserFamilyFoundation[123].....................................................70
1
1. INTRODUCTION
Synopsis
Medicalandnon-medicaluseofopioidshasincreasedintheUnitedStates(U.S.)
forthelastdecadeforpatientswithchronic,non-cancerpain[1–4].Prescriptionsof
opioidanalgesics,suchasmorphine,fentanyl,oxycodone,andhydromorphone,have
increased dramatically as shown in multiple studies [1,5–7]. According to The
NationalHealthandNutritionExaminationSurvey,25%ofAmericans20yearsand
older have experiencedpain that has lasted over 24hours in the pastmonth [8].
Despitetheclinical importanceofopioidsinthemanagementofpain,opioidsmay
havesignificantandadversesocietaleffects.Notably,thesecanincludetheepidemic
ofopioidabuseandopioidusedisorderwithrelatedsocio-economicandcriminal
impacts, as well as deaths attributed to overdose [9]. Achieving effective pain
management, along with maintaining functionality and healthy participation in
society, cannot be accomplished with the high occurrence of opioid addiction.
Therefore,understandingtherisksandbenefitsofprescribingopioidsisimportant
toreducetherelatedopioidabuseepidemic[10–13].Thefinancialburdenassociated
withtheopioiduseproblems(OUP)issignificant.Thecostoftreatmentforpatients
withOUP represents a large portion of the cost of rehabilitationprograms, social
consequences, public safety, legal proceedings, and so on [14,15]. The total US
societalcostsofprescriptionopioidOUPwereestimatedtobe$11.8billionin2001,
increasing to $55.7 billion in 2009 [16]. The societal effect of prescribed opioids
extendsbeyondfinancialburdentohealthmatters,accordingtotheNationalInstitute
2
ofDrugAbuse;prescribedopioidswereresponsibleforover16,000deathsin2014,
alone.Thisnumberincreasedfromnearly
6,000deathsin2001[17].
OverviewoftheOpioidUseProblems(OUP)
OpioidOverdoseDeath
In2015,accordingtotheCentersforDiseaseControlandPrevention(CDC),drug
overdosewas the leading causeof accidental death in theU.S.,with55,403death
incidents [18]. Opioid addiction alone is responsible for 20,101 (36%) of these
overdosedeathincidents[19].TheCDChasconcludedthat91Americansdieevery
day using prescription opioids and heroin, together [19]. To understand the
relationshipbetweenheroinandOUP,accordingtoJones(2013),80%(fouroutfive)
ofnewheroinusershavestartedwithprescribedpainkillers,suchasopioids[20].
Figure1.1showsthepatternofdrugoverdosedeaths involvingopioidsbytypeof
opioidinU.Sfrom2000to2014.[21]Unfortunately,alargepartoftheopioidsthat
are available illicitly are originate from prescribed opioids. In 2012, 259 million
prescriptionswereprescribedforopioids[22].Thisquantityissufficienttosupply
everyAmericanadultwithhisorherownbottle[18].
3
Figure1.1Increasesindrugandopioidoverdosedeaths–U.S.,2000–2014
OpioidUseProblemsandWomen
Drugabuseaffectsmenandwomenindifferentways;despitemenbeingmore
likelytodieofpainkilleroverdose,between1999and2010,theoverdoserateamong
women has increased by more than 400% compared to men, whose rate has
increased256% [23].Moreover,womenmaybecomedependentonopioidsmore
quickly than men and engage in doctor shopping more than men, as well [23].
Anotherimportantfactoruniquetowomenispregnancy;addictedpregnantwomen
canputinfantshealthatriskfromneonatalabstinencesyndrome[23].
OpioidUseProblemsinAdolescentsandYoungAdults
Despite the fact that the focus of this proposal is adult chronic patients, it is
importanttounderstandthatopioidconsumptionalsoaffectsadolescents.In2015,
an estimated276,000 adolescents between ages12 and17misusedopioids,with
122,000 having an addiction to prescription pain relievers [18]. Moreover, the
numbernearly tripled to829,0000youngadults fromage18 to25 inonemonth
(Figure1.2)[24].AccordingtotheNationalInstituteofDrugAbuse,mostadolescents
4
who misuse opioids obtained their drugs for free from a friend or relative [25].
Adolescentscanobtaintheirownprescriptions,aswell.Infact,from1999to2010,
thenumberofopioidprescriptionsamongadolescenthasnearlydoubled[26].
TheOpioidUseProblemsinIndiana
TheStateofIndianahasitsshareofopioidepidemics;arecentreportfromthe
Richard M. Fairbanks School of Public Health (September 2016) has reported
shockingfactsregardingOUP.Accordingtothesource,between1999to2014,the
numberof opioidoverdoseshas increasedby600% [27] (Figure1.3). In fact, the
leadingcauseofinjurydeathsinIndianaispoisoning,andmostofthesedeathsare
drugoverdoses (9outof10) [27].Drugoverdosesovertook thenumberofmotor
vehicledeathsinIndiana[27].Mostofthesedrugoverdoseswereinthegroupaged
30-39,followedbythegroupaged50-59[27].Withregardtogender,moremendie
fromdrugoverdoses;however,thegaphasshrunkovertheyears[27].Heroindeaths
havealsoincreasedsince2007(Figure1.3);mostweremale,non-Hispanicwhites,
andpeoplebetweentheagesof30-39hadthehighestdeathratefromheroin[27].
Moreinterestingfactscanbefoundinthereport[27].
5
Figure1.2KeysubstanceuseandmentalhealthindicatorsintheU.S.:Resultsfromthe2015
nationalsurveyondruguseandhealth.Source:SAMHSA,2016
Figure1.3ReportonthetollofopioiduseinIndianaandMarioncounty.
Source:Duwve,J.etal.,2016
ProblemStatement
Physicians’RoleinIdentifyingOpioidUseProblems
Pain-specialist physicians would be best suited to assess OUP overall and
prescribeproperpain-managementaction.However,duetothe limitednumberof
painspecialistphysiciansandtheiraccessibilitytothegeneralpublic,primarycare
physiciansmostoftenprovidethemajorityofpaincareinthehealthsystem[28].In
astudypublished in2016 investigating theprescribingofschedule IImedications
amongtypesofphysicians,itwasfoundthatfamilypractitionersprescribeover15.3
millionprescriptions;whereas internalmedicinepractitionersprescribeover12.7
6
million prescriptions [29] (Figure 1.4). This prescribingmay be due to improper
specialized training and educational/technical support, among other factors.
However, despite various root casues of the issue, primary care physicians have
reported frustration when providing care for chronic-pain patients [30–34]. In a
qualitative study published in 2015, some physicians admitted they have been
manipulatedmanytimesandhaveprescribedopioidsforpatientswhoexaggerated
their pain or for other patientswho had tested positive several times for heroin.
However,physicianssometimesmakethemistakeofnotprescribingmedicationto
patients who actually are in need, according to the same qualitative study [28].
Therefore,relyingonclinical judgmentalonemightmisguidephysicianstochoose
thecorrectcourseofactiontomanagechronic-painpatients.
Figure1.4DistributionofopioidsbydifferenttypesofMedicareprescribers
Self-ReportOpioidRisk-AssessmentTools
Multipleself-reportrisk-assessmenttoolshavebeendesignedandvalidatedtosome
extent in recent years [35–38]. Existing opioid risk-assessment tools vary in the
sourceofidentifyingpredictingvariables(knowledgebasevs.databaseprediction)
and the purpose of use (initiation of treatment vs. monitoring of care). For the
purpose of this dissertation, we categorize opioid risk-assessment tools into two
categories: self-report base vs. database tools. We define self-report-based risk-
assessmenttoolsasrisk-assessmenttoolsthatpredicttheriskofcurrentorfuture
opioiduseproblemsbasedonpredictivevariablesextractedfromdataself-reported
7
by the patient or physician. We define database risk-assessment tools as risk-
assessmenttoolsthatpredicttheriskofcurrentorfutureopioiduseproblemsbased
on objective data extracted from a database, such as electronic health records or
claimsadministrativedata(Figure1.5).
Figure1.5Overviewofsourcesandpurposeofopioidriskassessmenttools
Common examples of self-report tools are SOAPP, ORT, and COMM. However,
havingthemajorityofrisk-assessmenttoolssolelyrelyonpatientcompletionmight
limittheirabilitytopredictaberrantdrugbehaviorduetoreportingbias[39]orlack
ofpatientcomprehension[40].Jonesetal.,(2012)conductedastudythatisunique
in that it compared original ORT (patient self-reported) vs. ORT (psychologist
completedafterconductinganinterview)among51patients;only30(59%)patients
matchedinthesameriskcategory.Amongallvariables,agehadthehighestrateof
8
agreement, with only an 86% level of agreement [40]. Overall predictive ability
varied,aswellbetweenthetwoORTs;predictiveabilityforpsychologist-completed
ORTwas better (43% vs. 70%missing rate) [40]. These results indicate that the
processofadministrationoftheopioidrisk-assessmenttoolitselfcouldplayarolein
determiningtheoverallpredictiveabilityofthetool;perhapspatienteducationand
leveragingphysicianswithmoreaccurateandrobustdatafromhealthcaredatabases
might help tominimize the issue. A second issue of using self-reportmethods to
assessopioidrisk istheprocessofdocumentationitself.Thischallengebecomesa
monthlyburdenontheclinicalstaff,nurses,andphysicianswhoareconductingthe
documentation process. Understanding the challenge of documentation, Butler
conducted a study to examine the effect of computerizing current methods of
administratingSOAPPandCOMM(self-reporttype)ontherateofdocumentationof
thesetools.Theauthordesignedtheweb-basedtoolandimplementeditintwopain
centersforaperiodof10months.Bothsitesusedpaper-basedversionsofSOAPP-R
andCOMM.Theresultsshowedasignificantincreaseinthedocumentationofopioid
riskassessmentforbothinitialandfollow-upvisits(SOAPP30.3%vs.76.9/5,COMM
4.5% vs. 43.6%) (p < .001). Such a study highlights the significant issue with
documentationincurrentself-reportmethods[41].
A third issue with self-reportedmethods is the variation of prediction ability
(reliability)intheliterature;SOAPPisoneofthemostvalidatedtoolswefoundinthe
literature.Basedonthisreview,thereisalargevariationinthesensitivityofthetool
topredictOUP.ThesensitivityandspecificityofSOAPPforonesourcewas0.91and
0.69[42]and0.392and0.693inanother[43],respectively.Thesameargumentcan
9
beappliedtoORT,inwhichonestudyreportedasensitivityof0.45[44]andanother
reportedasensitivityof0.195[43].Onewaytoexplainthisvariationcouldbethe
differenceindefiningtheoutcomeinthevalidationstudiesreviewed.
TheStatementofPurpose
Duetothelimitednumberofpainspecialists,primarycarephysiciansprescribea
significant portion of opioids. Primary care physicians have reported frustration
whenprovidingcareforchronic-painpatients.Ourobjectiveistoinformtheclinical
andresearchcommunityabout factors contributing to the incidenceofopioiduse
problems.
Given the lack of census on structured variables’ rule to identify opioid use
problemsintheliterature,weseektoidentifystudiedvariablesintheliteratureand
classifytheirrulesintermsofprotectionorbeingriskfactorstoopioiduseproblems.
Additionally,while80%ofhealthdataarestoredasunstructureddata,thecurrent
tools to identify opioid use problems majorly rely on patient-reported data or
structured data in the electronic medical records. Thus, we seek to utilize
unstructured data to develop a predictive model for those who are at risk of
developingopioiduseproblemsusingtextminingandmachinelearningapproaches.
DissertationAims
1. Conductasystematicreviewofpreviousopioiduseproblems(OUP)predictive
modelsto:
(a) IdentifyvariablesavailableintheliteraturetopredictOUP.
(b) Exploreandcomparemethods(population,database,andanalysis)used
todevelopstatisticalmodelsthatpredictOUP.
10
(c) UnderstandhowoutcomesweredefinedineachstatisticalmodelOUP.
2. Identifyandcomparepatientswithopioiduseproblems(OUP)usingthetext
miningvs.ICDcodes.
(a) Identifypositiveopioiduseproblemspatientsusingtext-miningapproach.
(b) IdentifypositiveopioiduseproblemspatientsusingtheICD-9codes.
(c) Comparetheresultsandpopulationcharacteristics.
3. DevelopapredictivemodelforthosewhoareatriskofdevelopingOUPusinga
machinelearningapproach.
(a) Identifyagoldstandardsubsetofmedicalreports.
(b) Identifybestconfigurationstobuildmachinelearningmodel.
(c) Buildamodelthatpredictsopioiduseproblemsbasedonpatients’clinical
notes.
DefinitionofTerms
Tounderstandthecomplexityoftheproblem,thereareseveraltermsrelatedto
opioid use that need to be defined: opioid addiction, opioid abuse, aberrant drug
takingbehavior(ADTB),andopioiduseproblems(OUP).AccordingtoR.WestandJ.
Brown in their book,TheoryofAddiction, 2012, addiction is defined as a chronic
conditioninwhichthereisarepeatedpowerfulmotivationtoengageinarewarding
behavior, acquired as a result of engaging in that behavior, that has significant
potentialforunintendedharm.Itisnotall-or-none,butamatterofdegree.[45].
The second term is opioid abuse. In general, substance abuse is an initial step
towardsaddictionanddependence.TheWorldHealthOrganization(WHO)defines
substance abuse as the harmful or hazardous use of psychoactive substances,
11
includingalcoholandillicitdrugs.[46]Someattributesthatcharacterizesubstance
abuseare:failuretofulfillsocialorworkobligations,continueduseofasubstancein
hazardoussituations,legalproblemsrelatedtosubstanceabuse,andpersistentuse
despitecontinuedandrecurrentproblems[47].
Substanceabuseanddependencearenowcombinedintosubstanceusedisorder
(SUD),whichismeasuredonacontinuumfrommildtosevereaccordingtoDiagnostic
andStatisticalManualofMentalDisorders,FifthEdition(DSM5).Asthisareaofstudy
has developed, so have the defining criteria related to substance use disorders.
CategorizationwithinthespectrumofSUDrequiresatleasttwotothreesymptoms-
insteadofone-fromalistof11criteriaformildstage[48].Whilethesecriteriaare
availableindetailatwww.DSM5.org[48],itisimportanttonotethatthetolerance
criterion and withdrawal symptoms for opioid use disorder “does not apply for
diminishedeffectwhenusedappropriatelyundermedicalsupervision”[49].
Aberrantdrugtakingbehaviors(ADTB)aredefinedas“behaviorsthataremore
likelytobeassociatedwithmedicationabuseand/oraddiction”[50].Examplesof
thesebehaviorsare:1)sellingprescriptiondrugs,2)concurrentlyabusingalcoholor
other illicitdrugs,and3) losingprescribedmedicationonmultipleoccasions [51–
53].ItisworthmentioningthatsomebehaviorscouldlooklikeADTB,butinfact,they
mightindicateundertreatmentormaybemorepartofstabilizingthepaincondition
[54]. Examples of these behaviors are: 1) asking for, or even demanding, more
medication,2)askingforspecificmedications,3)useofthepainmedicationtotreat
other symptoms [54]. Despite the evolving nature of these terminologies, it is
12
important for researchers and treating prescribers to understand that there is a
spectrumofaddiction.
For this study, we define the final term, opioid use problems (OUP) as the
presenceofopioidaberrantbehaviors,opioidmisuse,opioidabuseandopioiduse
disorderinthepatientelectronichealthrecord(clinicalnotesand/orICDcodes).A
listofICDcodesandalistofspecificcriteriaofaberrantbehaviors(fortheclinical
note)areprovidedinthemethodsection(AppendixA.3andTable3.1).
13
2. CONDUCTASYSTEMATICREVIEWOFPREVIOUSOPIOIDUSEPROBLEMS
(OUP)
Objective
Several opioid risk assessment tools are available to prescribers to evaluate
opioidanalgesicabuseamongpatientswithchronicpain.Theaimofthissystematic
reviewistoanswerthefollowingquestions:1)Whatvariableshavebeenexamined
in the literature to predict opioid abuse? 2) What are the methods (population,
database,andanalysis)usedtodevelopthestatisticalmodelstocreatethetool?3)
Howweretheoutcomesdefinedineachstatisticalmodel?Theresultsofthischapter
waspublishedinapapertitled:“Reviewoffactors,methods,andoutcomedefinition
indesigningopioidabusepredictivemodels.PainMedicine,19(5),997-1009.”
Method
ApharmacistwithresearchexperienceusedEBSCOandPubMedtoconducta
two-stagesystematicsearchtoidentifyseveralarticlesthataredirectlyrelatedtothe
topicofriskassessment.First,thepharmacistidentified8articlesthatweredeemed
tobedirectlyrelatedtothetopic.Thesecondstepwastoconductasystematicsearch
usingOVIDtogeneratealistofarticlesthatshouldhaveall8articlesofrelevance.
Theideabehindchoosingthismethodisensuringallcommonarticlesofrelevance
are included inone combined search,whichperhapswill yieldmorenewarticles
relatedtothetopic,aswell.Finally,authorsofsomeofthearticlesfoundwerealso
contactedmainlythroughemailoroverthephonetorequirefurtherdataabouttheir
workortoidentifyotherrelatedwork.Thesearchwaslimitedtoarticleswrittenin
EnglishfeaturinghumanadultsubjectspublishedfromJanuary1990toApril2016.
14
Thissearchgenerated1409articles.Duplicatearticleswereautomaticallychecked
usingEndnoteX7andthenmanuallyreviewed.Outof1409articles,43articleswere
duplicates and excluded. The pharmacist reviewed the titles and abstracts of the
remaining1366articles(Figure2.1).Theinclusion/exclusioncriteriawerethe
following:
1. Theincludedarticleswereoriginalstudies,notnarrativereviews.
2. Theincludedstudiesfocusedonriskforaberrantdrug-relatedbehaviors.
3. Studiesincludedquantitativedataspecifictopredictioncapability(oddsratio,
p-value,orconfidenceinterval).
4. Studiesthatfocusedonself-administeredtoolswereexcluded.
5. Onlystudiesthatusedalgorithmstopredictopioidabusefromdataextracted
fromelectronichealthrecordsoradministrativeclaimswereincluded.
After applying the inclusion/exclusion criteria, 27 full-text articles were
downloaded.Ofthesearticles,20articleswereexcluded,becauseoftheirfocuson
selfreport tools (Figure2.1).Variablesof interestwereextractedand listedonan
Excel spreadsheet for all 7 articles included. To check face validity and data
consistency,aprimarycarephysicianreviewedextracteddata, includingvariables
fromarticles,categorizationofvariablesfromarticles,method(population,database,
analysis)used,andoutcomedefinition(opioidabuse).
Results
Duringoursystematicreview,weidentifiedsevenarticles[55–61](ninemodels)
toassessopioidriskfromdatabases(claimsdataorEHR),whichwillbethesubject
ofthisreview.Allninemodelsprovideddefinitionsoftheoutcomeofopioidabuseas
15
well as variables used topredict the outcome.The following sectionswill discuss
importantfindingswithregardtovariables,methods,andoutcomedefinitionsforthe
coincidingmodels.
Figure2.1Articlesextractionprocess
Variables
Amongtheretrievedstudies,9modelsand75distinctvariableswereidentified.
The variables were grouped into 7 main categories; demographics (6 variables),
medications(33variables),careutilizationvisits(3variables),behavior(7variables),
mental status (1 variable), pain andmedical comorbidities (15variables), pain (4
variables),andfamilyhistoryofsubstanceabuseandcomorbidities(6variables).
The identified models most commonly included demographic variables; age and
genderwerementionedinall9models.Historyofalcoholabuse,smokingstatusand
16
mentaldiagnosiswerementionedin5models.Table2.1listsallvariablesmentioned
in3modelsormoreoutof9models.
Method:StudiedPopulation,DatabaseUsedand,Analysis
Asmentionedpreviously, this review found3 studies thatusedadministrative
claimsdataand4studiesthatusedelectronichealthrecorddata.In1ofthe4articles,
adiseasemanagementprogramdatabasewasusedinadditiontohealthrecorddata.
Thenumberofpatientsincludedinthestudiesvariedgreatlyfrom196to1,552,489.
StudiesthatusedICD-9codesasaprimarysourceofdefiningtheircohortincluded
largersamplesizescomparedtostudiesthatdidnotuseICDcodes.Thissamplesize
variation is related to study design and coincides with the nature of data used.
Administrativeclaimsdataislikelynationaldata,whileEMRdataisrepresentativeof
a single site or single health care system. The inclusion criteria for a study’s
population were similar, age from 12 (or 18) to 64, at least one claim for a
prescriptionopioidinthepast12months,orahistoryofreceivingopioidsfor30to
90consecutivedays.Exclusioncriteriawerecancerpainandheroinpoisoning.Table
2.2detailsthemethodsforeacharticle.Finally,intermsofanalysisused,allstudies
usedlogisticregressionastheirprimarymultivariateanalysismethod.Usinglogistic
regressionforthispurposeisconsistentwiththecommonpracticeinthefield,which
recommendsusinglogisticregressiontoanalyzeadichotomousdependentvariable
[62](e.g.opioidabusersvs.non-abusers).Someofthestudiespreliminarilyanalyzed
variablesinbivariateanalysis,withthesignificantvariablesultimatelyenteredinto
themultivariatemodel.
17
Table2.1Mostcommonlyusedvariablesinstatisticalmodelstopredictopioidabuse
Category VariableInterpretation(Risk=Opioidabuse)
#ofmodels
Timesitwasprotective
Timesitwasriskfactor
Oddsnotreportedor=exactly1
Demographics Age Olderagedecreasesrisk 9 9 0 0Demographics Gender Maleincreasesrisk 9 0 6 3Behavior Historyofethanolabuse Alcoholabuseincreasesrisk 5 1 3 1Behavior Smokingstatus Smokingincreasesrisk 5 0 4 1Mental Mentalhealthdiagnosis Mentaldisorderincreases
risk5 0 4 1
Medications #ofopioidprescriptions Increasenumberofopioidprescriptionincreasesrisk
4 0 4 0
Behavior Earlyrefillsofopioidprescriptions
Increasenumberofearlyrefillsofopioidprescriptionincreasesrisk
4 0 4 0
Medications Non-opioidsubstanceabuse/dependence
Non-opioidsubstanceabuse/dependenceincreasesrisk
4 0 4 0
Careutilization Inpatienthospitalizationdays
Greaterhospitalizationdaysincreasesrisk
3 0 3 0
Careutilization Dayswithphysicalcarevisits
Greateroutpatientvisitsincreasesrisk
3 0 3 0
Behavior #ofpharmacieswhereopioidprescriptionwerefilled
Greaternumberofpharmacieswhereopioidprescriptionwerefilledincreasesrisk
3 0 3 0
Behavior #ofopioidprescribers Greaternumberofopioidprescriptionincreasesrisk
3 0 3 0
Medications Methadone Presenceofmethadoneprescriptionincreasesrisk
3 1 2 0
Medications Fentanyl(IR) Presenceoffentanylprescriptionhasnoeffectonrisk
3 1 1 1
Medications Fentanyl(ER) PresenceofFentanyl(ER)prescriptionincreasesrisk
3 0 2 1
18
Category VariableInterpretation(Risk=Opioidabuse)
#ofmodels
Timesitwasprotective
Timesitwasriskfactor
Oddsnotreportedor=exactly1
Medications Morphine(IR) Presenceofmorphineprescriptionincreasesrisk
3 0 2 1
Medications Morphine(ER) PresenceofMorphine(ER)prescriptionincreasesrisk
3 0 2 1
Medications Hydromorphone PresenceofHydromorphoneprescriptionhasnoeffectonrisk
3 1 1 1
Medications Oxycodone PresenceofOxycodoneprescriptiondecreasesrisk
3 1 0 2
Medications Tramadol PresenceofTramadolprescriptiondecreasesrisk
3 3 0 0
Medications Codeine PresenceofCodeineprescriptiondecreasesrisk
3 2 0 1
Medications Propoxyphene PresenceofPropoxypheneprescriptiondecreasesrisk
3 2 0 1
Medications Hydrocodone PresenceofHydrocodoneprescriptiondecreasesrisk
3 2 0 1
Medications Morphine(ER) PresenceofMorphine(ER)prescriptionincreasesrisk
3 0 2 1
Medications Fentanyl(ER) PresenceofFentanyl(ER)prescriptionincreasesrisk
3 0 2 1
MedicalComorbidities
Hepatitis PresenceofHepatitisdiagnosisincreasesrisk
3 0 3 0
19
OutcomeDefinition
The examined articles defined the outcome of opioid abuse differently.
Specifically,4outof7articles(6outof9models)dependedprimarilytheonpresence
or absenceof anopioid abuseordependencediagnosis according to ICD-9 codes,
whiletheother2studiesusedapredefinedlistofopioid-relatedaberrantbehaviors.
Additionally, they also reviewed a patient’s profile for any release from the pain-
management program due to aberrant behaviors. The last article and respective
model,usedahybridtechniquethatincludednaturallanguageprocessingmethods,
alongwithICD-9codestodefineopioidproblems(Table2.3).[55–61]
20
Table2.2OutcomedefinitionandmethodsdatasourceArticle/#ofModels #ofPatients Studysample Sourceofdata/Period CodingLewis,2014/1 202 Prescriptionsofanopioidfor30ormoredaysof
consecutivedosing.Agegreaterthan18years,andadocumentedhistoryofanychronicpainsyndromedefinedaspainlastinggreaterthan90daysinduration.
ElectronicHealthRecord/12Months
ManualChartreview
Ives,2006/1 196 chronic,non-cancerpainofatleastthreemonthsduration
Medicalrecordandourdiseasemanagementprogramdatabase/12months
Followup
White,2009/2 632,000 12to64yearswithatleast1claimforaprescriptionopioidandatleast1medicalclaimfrom2005to2006.
Administrativeclaimsdata/3months(AlternativeA)&12months(AlternativeB)
ICD-9-CM
White,2012/2 1,552,489 Patientsages12-64yearsthroughoutthe12monthspriortotheindexopioidRx
Administrativeclaimsdata/12Months
ICD-9-CM
Edlund,2007/1 15,160 Veteranswithatleastoneprescriptionforanopioidandhadmorethan91daysofsupplyin2002.(excludedindividualswithanycancerdiagnosis(ICD-9-CMcodesbetween140.0and208.9)
Administrativeclaimsdata/24months
ICD-9-CM
Turner,2014/1 5,420UrineDrugtests
PatientAge20orabove.StudyincludedonlyUrineDrugTest(UDTs)forpatientsonChronicopioidtherapy(COT)definedbyGroupHealthas70daysopioidsupplyintheprior90day.TolimitthesampletoUDTsforCNCPpatients,thestudyexcludedthoseofpatientswho,intheone-yearperiodpriortotheUDT,hadhadhospicecare,opioidprescriptionsfromoncologists,ormorethanonevisitforcancerotherthannon-melanomaskincancer.
ElectronicHealthRecord/12months
ICD-9
Hylan,2015 2,752 18yearsorolderwhoinitiatedCOTfornoncancerpainbetween2008&2010.COTwasdefinedasreceiptofequalormorethan70dayssupplyoftransdermalororalopioids(exceptbuprenorphine)inacalendarquarter,whichcorrespondedto>75%ofthedaysinthequartercoveredbyanopioidprescription.Studypatientsreceiveatleast2quartersofCOTwithina1-yearperiod
ICD-9codes:304[0,.00,.01,.02;304.7,.70,.71,.72];305[5,.50,.51,.52]
ICD-9+NaturalLanguageProcessing(NLP)toidentify[overuse,misuse,abuseorad-diction]And/or[Dependance]
21
Table2.3OutcomedefinitionandmethodsdatasourceArticle/#ofModels
OutcomeDefinition
Lewis,2014/1 multiplepharmacies,multiplenonBellevilleFamilyHealthCenter(BFHC)prescribersofopioids(anyprescriptionpreceded and followed by a BFHC provider prescribed opioid), unsanctioned dose escalation, lost or stolenprescription, UDS anomalies, illicit drug use (either self-reported or identified via UDS), multiple emergencydepartmentvisitsresultinginreceiptofopioids,familyreportofopioidsaleorotherformofdiversion,Forgingortamperingwithcontrolledsubstance,prescriptionsandmisuse/adulterationofprescribedformulationorintendedrouteofadministration.
Ives,2006/1 Negativeurinetoxicologicalscreen(UTS)forprescribedopioids,UTSpositiveforopioidsorcontrolledsubstancesnotprescribedbyourpractice,Evidenceofprocurementofopioidsfrommultipleproviders,DiversionofopioidsPrescriptionforgery,Stimulants(cocaineoramphetamines)onUTS.
White,2009/2 304.0(opioid-typedependence),304.7(combinationsofopioidtypewithanyother),305.5(opioidabuse),or965.0(poisoningbyopiatesorrelatednarcoticsbutexcluding965.01[heroinpoisoning])
White,2012/2 ICD9codes:304.0,304.7,305.5,965.00,965.02,and965.09
Edlund,2007/1 ICD9codes:304.00304.03(opioiddependence),304.70304.73(dependence;combinationsofopioidtypedrugwithanyother),and305.50305.53(opioidabuse)
Turner,2014/1 ClinicalClassificationSoftware(CCS)categorization.SmokingStatusused305.1ICD-9code.
Hylan,2015 ICD-9codes:304[0,.00,.01,.02;304.7,.70,.71,.72];305[5,.50,.51,.52]
22
Discussion
InconsistencyofPredictionDirectionofVariables
Ofnote,wefoundinconsistencyintheabilityofindividualvariablestopredictthe
outcome(aberrantdrug-relatedbehavior).Somevariables,suchasusingmethadone
(foropioidusedisordertreatment)weresignificantriskfactorsintwomodels(OR=
2.97,CI=2.57-3.42,p-value<0.0001)and(OR=3.22,CI=2.62-3.95,p-value<0.0001),
andsignificantlyprotectivefactorinathirdmodel(OR=0.26,CI=0.08-0.94,p-value
=0.02).Asecondexampleisseenwiththevariablehistoryofalcoholabuse.Despite
apredominanceintheliterature,whichshowsthatahistoryofalcoholabuseisarisk
factor (4 studies), the same variablewas slightly protective in a fifthmodel. This
variation in the role of a predictive variable (protective vs. risk factor)might be
solvedbyhavingawidermeasurementscale(continuousormultiplecategories)for
thevariablesinsteadofadichotomousscale(YesorNo).Tosupportthisargument,
Zale,E.et.al.,2014,conductedthefirststudyofitskindtotestwhethervaryinglevels
of current/historical smoking and indices of smoking heaviness/nicotine
dependence may be associated with greater likelihood of past-year prescription
opioidmisuseinthegeneralpopulation.[63]Thestudyfoundapositiveassociation
between levelof smokingheaviness/nicotinedependenceandopioidmisuse [63].
Other variables, such as gender, number of opioid prescriptions, early refills,
inpatienthospitalizationdays,dayswithphysicalcarevisits,non-opioidsubstance
abuse/dependence, number of morphine prescriptions, and number of opioid
prescriberswereallconsistentriskfactorsacrossthemodels.Othervariables,such
23
as age (older groups) and tramadol use,were consistently reported as protective
variables(Table2.1).
ClaimsDatavs.ElectronicHealthRecordData
Usingaspecifictypeofdatabase(administrativeclaimsdatavs.electronichealth
recorddata)couldpotentiallyalterdevelopmentofthemodeldesign.Forexample,
administrativeclaimdatabasesgenerallyprovideaconsistentdataformat,because
theyuseapre-definedsetofcodes,[64]Asresult, it isrelativelyeasiertocreatea
study cohort using a claims data compared to most electronic health records
databases.However,onepotentialdisadvantageofusingclaimsdataisthatthecare
providers do not always document these relevant codes. This incomplete
documentationleadstomissingoutcomesofinterestforsomepatients.[65,66]On
the other hand, using electronic health record databases, which hasmuch richer
clinical data (for each patient), [64] can facilitate improved cohort/outcome
definitions.Specificitywithinthecohortcanbeachievedbyreviewingsomeaspects
ofthepatient’sfilemanuallyorusingsomeadvancedtoolstodefinethestudycohort
basedoncertainmetrics(e.g.,estimatedglomerularfiltrationratetodefinechronic
kidneyfailurepatients)[64].
VariationinOutcomeDefinition
The InternationalClassificationofDiseases(ICD) isadiagnosticcodingsystem
that canbeused to categorizepatients.However, the issuewith creating amodel
based on an outcome structured by ICD codes is the fact that these codes often
understatetheactualnumberofpatientsexhibitingthetargetcategorization[57,58].
24
Thismeansthosepatientswhoarelabeledwithopioidproblems,accordingtoICD
codes,arelikelytohaveanopioidproblem;however,ICDcodesmayoverlookother
patientswhohaveopioidproblemsbutwerenotdocumented[57].Thiscouldbedue
an intentional lack of documentation, or in other cases, prescribers try to avoid
stigmatizing patients with opioid problems using these codes. [57] The issue of
underdocumentation in particular can create less representativemodels. Another
limitationtothemodelsfoundisthefactthatmostofthesemodelswerebasedon
claims data,which is less likely to provide the in-depth anddetailed clinical data
available in an electronic health record database. The variation in the outcome
definition across studies and its potential effect can be noticed in the self-report
opioidriskassessmenttools,aswell.Whentools,suchasSOAPP-R(SOAPPrevised)
orORTarevalidatedacrossdifferentpopulationsorwhendifferentdefinitionsofthe
outcomeareused,thepredictionsensitivityandspecificitychangesgreatly.SOAPP-
R is a self-report questionnaire designed to predict aberrant medication-related
behaviorsamongpersonswithchronicpain[67].SOAPP-Rcameasarevisedversion
fortheoriginalSOAPP,aninstrumentdevelopedbyInflexxion,(Newton,MA,USA)
withsupportfromEndoPharmaceuticalsandtheNationalInstituteonDrugAbuse
[68].Version1,consideredasaninitialsteptowarddevelopmentofascreenerfor
aberrantmedication-relatedbehaviorsinchronicpainpatients[37].SOAPP-Risone
of the most validated tools; based on literature, there is a large variation in its
sensitivity.ThesensitivityandspecificityofSOAPPwere91%and69%,respectively,
in one study [42], and 39.2% and 69.3%, respectively, in another [43]. The same
statementcanbeappliedtoORT,aswell.Onestudyreportedsensitivityof45%[46]
25
andtheotherstudyreportedsensitivityaslowas19.5%[43].Onewaytoexplainthe
variationinopioidriskassessmenttoolsperformancecouldbeduetoadifferencein
outcome definitions in the reviewed validation studies. For example, SOAPP
performance was measured in some studies based on a positive result on the
Aberrant Drug Behavior Index (ADBI) [67–69]; while in other studies, it was
measuredbasedonthepresenceofadischargereviewform[70].Atpresent,such
cross-comparisonsforopioidriskassessmentmodels,whicharebasedondatabases,
arenotyetpossibleduetoalackoffurthervalidationfrommultiplestudiesforany
ofthemodelsfound.However,thisreviewindicatesthatmoreeffortinthefutureis
likely toutilizesecondary-dataanalysis.This isdueto thenecessityofdeveloping
more accurate and comprehensivemodels from current and futurehealth system
electronicrecords.
However,thesefutureeffortswillsurelyfaceaninterestingchallenge,considering
that,asof2016,mostof theorganizationaldataareunstructured,andhealthcare
dataisnoexception.AccordingtoIBM,in2015,unstructureddatarepresented80%
of the total health data. [71] Thus, it is important for future efforts to utilize the
untappedpotentialofunstructureddatatodevelopanopioidaddictionmodelthat
usesdataminingtechniques.
Anotheraspectofdefiningoutcomesisthecohortfollow-upperiod.Whetherthe
studiesretrievedwereretrospectiveorprospective,thecoveredperiodtomeasure
theoutcomerangedfrom3monthsinonemodel,including12monthsin6models,
up to24months inonemodel,and finally24-60months(acombinationofapre-
indexingperiodandpost-indexingperiod).Thisvariationintheperiodoffollowup
26
mighthaveanunforeseeable impactonthepredictionaccuracyofthemodels.For
example,aperiodof3monthsof follow-upmaynotbesufficienttodetectopioid-
relatedaberrantbehaviors.Thus,patientsmightbecategorizedfalsenegativeasa
result.
Limitations
Thereareonly7studies,including9models,thatmetourinclusioncriteria.This
couldbeduetousingonlyonesearchengine,OVIDMedline.However, thestudy’s
searchtermswereexhaustiveandusedresultsfromcombined3searchsessionsto
mitigate this shortcoming (Figure 2.1). Another reason for the relatively small
numberofarticlesistheexclusionofself-reportopioidriskassessmentmethods(21
articles).Thejustificationforomittingthesearticlesisspecifictothestudyquestions.
Thestudyattemptstoidentifyvariablesandresourcesusedforthedesignofmodels
basedondatabases,electronichealthrecords,andadministrativeclaimdata.Another
limitation related to data collection is having only one primary care physician to
checkforfacevalidityanddataconsistency.Havinganotherreviewerandreporting
interraterreliabilitycouldaddvaluetothestudy.Inourcase,thiswasnotfeasible,
duetolimitedfundsandresourcesavailableforthestudy.
Asecondlimitationpertainstothequantitativeanalysisofthevariablesfound,
whichdidnotincludemagnitudeoftheoddratioorsignificanceofp-valuesforeach
variable.Rather,theanalysisonlydescribesavariablesruleintermsofbeingarisk
factororprotectivemeasure(Table2.1).Thiswasdue toa lackof reportingodds
ratiosin2articlesandp-valuedatainathirdarticle.These2articlesdidnotreport
theoddsdatafortheinsignificantvariables,whilethethirdarticledidnotreportany
27
p-valuesforthemodel(butreportedadjustedORandCI,instead).Finally,theway
articlescategorizedthevariablessubsetvariedacrossthemodel.Forexample,the
variable age was sub-categorized into 6 age groups, while it was treated as a
continuousvariableintheother.Moreover,determiningthereferencegroupvaried
among models, as well. In the same example of the variable “Age”, one model
identifiedanolderagesub-groupasareferencegroup,while5modelsidentifieda
youngeragesub-groupasareference,andonearticleidentifiedmiddleage(40-49)
asareferencegroup.
A third limitation of this review is the inconsistency in the definition of the
outcome,opioidabuse,acrossthearticles.Despiteadescriptionandanalysisofthe
sameissues,theparticulardefinitionsemployedvariedacrossarticles.ShannonM.
Smithetalexamined the issueofdefinition inconsistency in2013. [72]Thestudy
summarizeddifferencesbetweenmultipleopioid-use-disorderterminologiesbased
onexperts’opinion.Thestudyfoundthattheterm“misuse”“emphasizestheuseof
the substance does not follow medical indications or prescribed dosing” [72].
However,theterm“abuse”iscommonlyappliedtosubstanceusefornontherapeutic
purposes.Addiction,ontheotherhand,wasdefinedas“compulsivesubstanceuse
thatoccursdespitepersonalharmornegativeconsequences”[72].Toaddressthis
variation of the definition in the examined articles, we summarized outcome
definitionsbyeacharticleinTable2.3.Ouranalysiswaslimitedtothedataprovided
inthearticles.Thus,accesstocertaindatathatmeasuredthemodelsperformance,
such as c-statistics, sensitivity, and specificity for analyzed models, were not
available.
28
Conclusion
Opioidriskassessmenttoolsarebecomingstandardpracticeinpainandprimary
clinicspriorandduringprescribingopioidtherapyforchronicpain.However,this
review indicates that there is inter/intravariation in these tools’performance for
assessingopioidabuse.Thisreviewconcludesthatthisvariationcanbeexplainedby
thevariationinstudyperiod,samplesize,opioidabusedefinition,typeofdatabase,
and structureddocumentation to ICDcodes. Inaddition, this review illustratesan
overviewofcommonmethodsofopioidriskassessmenttoolsandcategorizesthem
based on method of reporting into self-report and database-related models.
Moreover,thestudyprovidesacomparisonBetweenthetwocategorieswhenever
possible.Thissystematicreviewpresentsalistofvariablesthatwereusedtopredict
opioidabusefromelectronichealthrecordsandadministrativedata.Furthermore,
thereviewprovidedacountofhowmanytimeseachvariablewasmentionedand
howoftenitwascountedasariskfactororprotectivemeasureintheliterature.To
ourknowledge,thisistheonlysystematicreviewthathasdoneso.Providingsuch
datatoresearcherscouldleadtodevelopingatoolthatismoreaccurateinpredicting
theriskofdevelopingaberrantdrug-relatedbehaviors.Thereviewalsoidentifiesand
compares other aspects related to opioid risk assessmentmodels design, such as
periodof the study,numberofpatients, subject inclusion criteria, and ICD-9 code
usedtoidentifypatientsfromthedatabase.Thestudyhighlightsmajordifferences
betweenarticlesdefiningopioid abusederived fromdatabases for thepurposeof
developingopioiduseriskassessmenttools.Thiscouldhelpfutureresearchersbuild
onpreviousworktocreateadvancedmodelsfor improvedpredictabilityofopioid
29
abuse. Despite the availability and presence of many self-report and database-
orientedriskassessmenttools,prescribersmightnotyetrelysolelyonthesetools
duetotheirlackofvalidationandconsistencyintheirresults.However,opioidrisk
assessmentprocedurescanbeimprovedbyenhancingstructureddatacapturedby
theelectronichealthrecordsystem.Physiciansandnursescanplayamajorrolein
thisstepbydocumentingtheproperICDcodethatspecificallyidentifiesthecategory
ofopioidabuseforpatients.Clinicalpracticesandhospitalsshouldpayattentionto
theviabilityof adopting these tools into theirpractice. Someexpectedbarriers to
adoptionofthesetoolsare:thepopulationvariationfromwithintheclinicwherethe
toolwasvalidated,proceduraldifferences,differencesinoutcomedefinitionbetween
clinics,andlackofpropertechnicalexperiencetoimplementthesetools.
30
3. IDENTIFYANDCOMPAREPATIENTSWITHOPIOIDUSEPROBLEM
Introduction
Opioidsareagroupofmedicationsprescribedtorelievepain.Beforethe1980s
thesedrugsweremainlyprescribedbysurgeonstorelieveimmediatepostoperative
painorpainrelatedtocancer[73,74].However,duringthe1980s,themovetoward
aggressivepaintreatmentandprescribingopioidsforchronicnon-cancerpainhas
been encouraged [73,75]. Advocacy health groups, such as American Pain
Foundation,continuedtopushformoreopioidprescriptionsduringthelate1990s
andthroughthe2000s[73,76–78].By2012,opioidprescriptionsincreasedfrom142
millionin1999to248millionprescriptions[79,80]andopioidsalesquadrupledfrom
1999 to 2010 [81]. Particular medications, such as hydrocodone, doubled in
consumptionbetween1999and2011[82].
Inconjunctionwiththeincreaseinprescribedopioidsperpopulation,therewas
also a higher percentage of opioid overdose deaths and substance use disorder
treatment increased in parallel from 1999 to 2008 [19]. In 2015, drug overdose
incidentsaccountedfor52,404U.S.deaths,ofwhichopioidswereinvolvedin33,091
overdoseincidents(63.1%)[83].The2016NationalSurveyonDrugUseandHealth
indicatedthattherewasapproximately11.8millionpeopleintheU.S.age12yearsor
olderwhomisused opioids in the past year [24]. At the state level, Indiana state
reported794opioid-relatedoverdosedeathswithasubstantialincreaseinheroin-
related overdose deaths. Compared to 2012, heroin-related overdose deaths
increasedfrom114to297in2016.Deathsfromsyntheticopioidshaveincreasedfor
thesameperiodfrom43to304deathsinIndiana[84].
31
Despite the importance of the opioid issue, reporting prevalence of opioid
problemsmaystillimposesomechallenges.Forexample,differencesinthetargeted
outcomedefinitions(“opioidusedisorder”vs.“opioidoverdose”vs.“opioidabuse”)
andpopulationinclusioncriteria(minimummedicationday’ssupplyornumberof
opioidprescriptions)mayalterthestudy’sabilitytoidentifytheissue[85].Despite
thesedifferences,InternationalClassificationofDisease(ICD)identificationmaybea
standardmethodthatmayusedtoreportopioidusedisorder,overdose,abuse,and
dependence(whichwewillrefertointhisstudyasOpioidUseProblem(OUP)).ICD
isacodesystemusedbyphysicianstoclassifyanddiagnosedisease.However,the
issuewithreportinganoutcomedefinedby ICDcodes is the fact that thesecodes
oftenunderstatetheactualnumberofpatientsexhibitingthetargetcategorization
[57,58].
Thus,weexaminedtheliteraturetoidentifyalternativemethodstoidentifyopioid
useproblem in addition to ICD codes.We found a limitednumberof studies that
discussalternativemethodsusingnaturallanguageprocessing(NLP),ortextmining,
inageneralhealthcaresetting[86–89].Theuseoftextmininginthesestudiesvaried
from accelerating the documentation process of patient records to automatically
parsing and coding clinical events or assisting decision-making processes by
classifyingspecificcomplexdiagnoses.
More relevant studies addressing NLP/text mining to identify opioid use
problemswereidentifiedaswell[61,90,91].Carrelletal.(2015)developedanNLP
process to identify evidence of opioid use problems in electronic health records
(EHRs)atGroupHealth,ahealthcaresysteminSeattle,WA,USA.Thestudyfoundthat
32
conventionaldiagnosticcodesforopioiduseproblemshasidentified2,240(10.1%)
patientsandNLPidentifiedanadditional728(3.1%)patientsbyanalyzingpatients’
clinicalnotes[90].Hylanetal.(2015)hasalsousedNLPto“reportonapredictive
modeldevelopedtoassessthelikelihoodofproblemopioiduseovera2-to5-year
periodfollowinginitiationofchronicopioidtherapy”withintheGroupHealthsystem.
Thestudyfoundthesensitivityoftheregressionmodelpredictionofproblemopioid
usetobe58.3%,withspecificitybeing71.2%[61].Despitetheseearlyeffortsofto
use NLP/text mining encouraging, no study has attempted to identify opioid use
problemsusingtextmininginamulti-sitestudysetting.
In this study, we compared different strategies for identifying OUP using
administrativeandEHRdata.Weexaminedwhetheraddingatextminingapproach
couldimproveidentificationofOUPforpatientsonlong-termopioidtherapy(LTOT)
foralargepopulationacrossamulti-sitehealthcaresystem.
Method
MethodsOverview
FollowingInstitutionReviewBoard(IRB#:1710876219)andanapprovalfrom
theRegenstriefInstitute(RI)DataManagementCommittee,wequeriedidentifiedthe
patients’ clinical notes from Indiana University Health (IUH), a large healthcare
networkinIndianawith3,541staffedbedsand2,563,086outpatientvisits[92].The
IUHsystemproducesover30reporttypes(e.g.,Visitnotes,Progressnotes,Discharge
notes).Next, a text-mining algorithmwas applied to identify opioiduseproblems
usingpatients’clinicalnotesandreportedincidenceresults(Figure3.1).Finally,to
investigatewhetherornottherewasadifferencebetweenpositiveOUPusingatext
33
mining approach vs. an ICD conventional approach, we compared the 2 positive
cohorts characteristics in termsof frequency and rateper1,000 long-termopioid
therapypatients andhighlightedkeypoints.Additionally,we compared results of
care utilization (outpatient visits, emergency department visits, hospitalizations,
cumulativehospitalizationdays)andtheywerealsoreported.
SampleDefinition
OursamplewasdefinedasadultIUHmembersage18yearsorolderwhohave
beenprescribedLTOT,whichisdefinedaspatientswith70daysofsupplywithinany
given90-dayperiodbetween1January2013and31December2014(24months).
WeexcludedpatientswithactivecancertofocusonLTOTfornon-cancerchronicpain
(ICD-9 codes 140.x 172.x, 174.x 209.xx, 235.x 239.xx, 338.3). Patients with
schizophreniawerealsoexcludedduetothedocumentedhighpercentageofopioid
dependenceamongthispopulation(ICD-9code295.9)[93].
VariablesofInterest
Patientcharacteristics,includingdemographics(age,gender,race,andethnicity),
alcoholabuse,non-opioidabuse,tobaccouse,mentaldisorders,andhepatitisCwere
reported for the study population. For this study, mental health disorders were
identifiedasdepressivedisorder (ICD-9codes296.2x,296.3x,300.4,311), suicide
attemptorotherself-injury(ICD-9codesE95x.x,E98x.x),oranxietydisorder(ICD9
codes300.0x,300.21,300.22,300.23,300.3,308.3,309.81).
ReportTypeSelection
DuetothelargenumberofreporttypesgeneratedbyIUHsystems,wefocusedon
themostprevalent and relevantnote types available in thedatabase.To increase
34
efficiency,thetextminingprocesswaslimitedtoreporttypesthatwereobjectively
deemedrelevanttoOUPidentificationthroughconsultingclinicalanddatasubject
expertsat IndianaUniversityand theRegenstrief Institute.Three typesof reports
wereinitiallyexcludedduetopresumedirrelevancetoOUPidentification:radiology
reports, medication fills, and patient instructions. Next, the study investigators
selected9reporttypeswithassumedrelevancytoOUPidentificationandqueriedthe
studypopulationfortheir2014counts.Thesystemreturned142,971reportsasthe
following: Emergency Department Doctor Progress Notes (48,898), Emergency
Department Discharge Notes (28,637), Primary Care Doctor Outpatient Progress
Notes (26,669), Visit Notes (21,868), Discharge Summary (11,731), History and
Physical (2,759), Admission History & Physical (1,390), Preadmission
History/Physical(576),PrimaryCareDoctorOutpatientHistoryandPhysical/Initial
Consult(443).Byexploringthe9relevantreporttypes,wefoundthatthetop5report
typesgeneratedover96%(137,802)oftotalreports,whilethebottom4wereonly
responsible for 4% of reports generated. Due to labor constraints, the bottom 4
reports (AdmissionHistory and Physical, PreadmissionHistory/Physical, Primary
CareDoctorOutpatientHistoryandPhysical/InitialConsult)wereexcluded.Thus,
EmergencyDepartmentDoctorProgressNotes, EmergencyDepartmentDischarge
Notes, Primary Care Doctor Outpatient Progress Notes, Visit Notes, Discharge
Summary,wereusedforthetext-mininganalysis.
35
IdentifyPatientswithOpioidUseProblems
Weusedtwocriteriawhichareexplainedinthefollowingsectiontoidentify
opioiduseproblem.Thecriteriawillcovertext-miningandICD-9approacheswhich
areexplainedindetailsinthefollowingsections.
IdentifyOUPusingaTextMimingProcess
The process of identifying OUP using text mining involved 2 main steps: 1)
algorithm development to flag potential positive reports; and 2) results of the
validationprocessusingsemi-assistedmanualreview.
1- AlgorithmDevelopmenttoFlagPotentialPositiveCasesUsingnDepth™
Weappliedthetext-miningpackageprovidedbynDepth™toparsemedicalnotes
todetectOUP.nDepth™isanNLPtooldesignedbytheRegenstriefInstitutein
IndianapolistoextractdatafromtheIndianaNetworkforPatientCare(INPC),which
isahealthcaredatabasemanagedbyRegenstriefInstituteonbehalfoftheIndiana
Health InformationExchange(IHIE) [94,95].As IUH ispartof INPC,nDepth™was
used toquerypatients’ clinical notes to identifypossibleopioiduseproblems.To
developthealgorithm,2keywordlistsadoptedfromtheliteraturewereenteredinto
nDepth™[91].Thefirstlistwascomprisedofopioidterms(e.g.,Vicodin,Opiate),and
thesecondlistwascomprisedofproblemterms(e.g.,addiction,abuse)(Appendix
A.1). nDepth™ creates state machines, an algorithm that parses report types for
certaincriteria,tocheckforallpossiblecombinationsofthe2listswithina5-word
distance. Flagged results are automatically checked for whether the statement is
negated, hypothetical, historical, or experienced. This process itself was adopted
fromtheConTextalgorithmdevelopedbyHarkemaetal. (2009) [96].Thus, if the
36
systemdeterminedthepatientstatementwasnotnegatedordeemedhypotheticalor
historical,thoseflagsweredeemedasexperienced(consideredpositive).
2- Semi-AssistedManualReviewValidationProcess
Flaggedreportsweresemi-assistedmanualreviewedby2trainedreviewers.In
caseofdisagreement,anexpertphysicianactedasa thirdreviewer toresolve the
dispute. To avoid overlooking any signs of opioid use problems, nDepth™ was
programmedtohighlight27suggestivephrasesintheflaggedreports.Thesephrases
werecollectedfromtheliteratureandmodifiedbasedoncommonclinicaldialogin
Indiana(AppendixA.2).Incaseswheretherewasmorethanoneflaggedreportper
patient,thesystemrandomlyselectedonetypeoftheflaggedreportstobereviewed
perpatient.ThecriteriachosentodetermineOUPwereadoptedfromCarrelletal.s’
(2015) work and listed in Table 3.1. Of note, as using marijuana medically and
recreationallyisillegalinIndiana,itsusewastreatedasconcurrentuseofanillicit
drugduring theprocessofmanualreview.Othermodificationswerealsoadopted
basedoninitialreviewsofsubsetsofpatients’clinicalreports.
IdentifyOUPUsingICD-9Codes
TwodefinitionsfromtheliteratureutilizingICD-9codeswerecombinedtocreate
a case definition of OUP: opioid abuse and dependence (304.00, 304.01, 305.50,
305.51,304.71,304.02,304.70,305.52)andopioidpoisoning(965.0,965.00,965.01,
965.02, 965.09, E850.0, E850.1, E850.2) (Appendix A.3). These definitions were
combined to minimize the chance of systematically creating type II errors by
capturingthewiderspectrumofopioiduseproblemsamongthestudypopulation.
37
Table3.1Thecriteriaforidentifyingopioiduseproblemsinpatients’clinicalnotesNo. Criteriaforopioiduseproblems1 Substanceabusetreatment,includingreferralorrecommendation2 Methadoneorsuboxonetreatmentforaddiction3 Obtainedopioidsfromnonmedicalsources4 Lossofcontrolofopioids,craving5 Familymemberreportedpatient’sopioidaddictiontoclinician6 Significanttreatmentcontractviolation7 Concurrentalcoholabuse/dependence(notremitted)8 Concurrentuseofillicitdrugs9 Currentorrecentopioidoverdose10 Patternofearlyrefills(notanisolatedevent)11 Manipulationofphysiciantoobtainopioids12 Obtainedopioidsfrommultiplephysicianssurreptitiously13 Opioidtaper/weanduetoproblems,lackofefficacy(notduetoexpectedpain
improvement)14 Unsuccessfultaperattempt15 Reboundheadacherelatedtochronicopioiduse16 Concurrentuseofunauthorizednarcotics(polypharmacy)17 Physicianstatesopioidabuse/overuse/addictionorlistedICDcodesforopioid
abuse/dependence
Analysis
Frequencytableswillbeusedtodescribestudiedpopulationcharacteristics.For
inferentialstatistics,characteristicsofpositiveOUPidentifiedbytextminingandICD
cohortswillbecompared.Chi-squarewillbeusedforcategoricaldataanalysisand
independentt-testwillbeusedtocomparemeansforcontinuousvariables.
38
Figure3.1Identificationofopioiduseproblemssummarymethodsandresults
39
Results
WeidentifiedandcomparedOUPusingICD-9codesandtextminingtechniques.
TheresultsaresummarizedinFigure3.1.
SamplingResults
Adult patients from IUH on LTOT were queried, and our inclusion criteria
returned34,661patients.Ourexclusioncriteria,cancerandschizophrenia,excluded
11,579(33.4%)patients(11,121cancerpatients,272withschizophrenia[186with
bothschizophreniaandcancer]).Weidentified23,082patientswhoreceivedLTOT
andgenerated559,464reportsacrosstheIUHsystem.Ourreportselectioncriteria
excluded8,784patients, leaving14,298eligiblepatientswhichgenerated137,804
reports.Nearly 60%of the studypopulationwere over 55 years old, andwomen
represented62%ofthestudypopulation.Thewhiteracewasdominantamongthe
study population (86%), while black patients represented 12%, and other or
unknownraceswere2%.Non-HispanicorLatinoethnicityrepresented87%ofthe
studypopulation(Table3.2).
IdentificationofOpioidUseProblemsUsingaText-MiningApproach
Ouralgorithmflagged468distinctstatementsin366uniquereportsrepresenting
154patients.ForvalidationandtoconfirmthepresenceofOUP,2reviewerseach
reviewed1flaggedreportperpositivepatientbasedonspecificcriteria(Table3.1).
Outof154flaggedpatients,only127patientsweredeemedaspositivecasesofOUP
withapositivepredictivevalueof82%.Cohen’sκwasruntodetermineiftherewas
agreementbetween2reviewers’judgementonwhetherasubsetof200patientsin
40
the study cohortweremeeting anyOUP criteria. Therewasmoderate agreement
betweenthe2reviewersjudgements:κ=.0.691(95%CI,0.58to0.79),p<.0005.
Table3.2Demographicscharacteristicsofstudypopulation
Demographics StudyCohortN(%)
Age(18-24) 236(1.7%)25-34 891(6.2%)35-44 1,749(12.2%)45-54 2,808(19.6%)55-64 3,222(22.5%)>65 5,392(37.7%)Sex(women) 8,923(62.4%)Men 5,375(37.6%)Race(Black) 1,719(12.0%)White 12,367(86.5%)Other/Unknown 212(2.0%)Ethnicity(NotHispanicorLatino) 12,369(86.5%)HispanicorLatino 138(1.0%)Unknown 1,791(12.5%)AlcoholAbuse(Yes) 88(0.6%)Non-opioidAbuse(Yes) 88(0.6%)TobaccoUse(Yes) 912(6.4%)Depression(Yes) 845(5.9%)Self-injury(Yes) 14(0.1%)HepatitisC(Yes) 94(0.7%)Total 14,298
FrequencydistributionofcriteriatoidentifyOUPinpatients’clinicalnoteswere
rankedasfollows[OUPcriteria,Frequency(percentage)]:[Physicianstatesnarcotic
abuse/overuse/addiction or listed ICD codes for opioid abuse/dependence, 48
(38%)]; [Concurrent use of illicit drugs, 18 (14%); Methadone or suboxone
treatment for addiction, 15 (12%)]; [Concurrent use of unauthorized narcotics
(polypharmacy), 11 (9%)]; [Current or recent opioid overdose, 10 (8%)];
[Concurrentalcoholabuse/dependence(notremitted),9(7%)];[Obtainedopioids
from multiple physicians surreptitiously, 8 (6%)]; [Physician or patient wants
immediate taper, 3 (2%)]; [Substance abuse treatment, including referral or
41
recommendation, 2 (2%)], [Significant treatment contract violation, 2 (2%)];
[Familymemberreportedpatient’sopioidaddictiontoclinician,1(1%)]
ComparisonBetweenTextMiningandICD-9
Incomparisontothetext-miningapproach,queryingthesamestudycohortfor
designatedICD-9codestoidentifyOUPreturned49distincteventsrepresenting45
uniquepatients(4patientshad2distinctpositiveOUPICD-9codes).Thefrequency
distributionofpositiveICD-9codesforOUPwererankedasthefollowing[ICD-9
code,Frequency(percentage)]:[304,24(49%),305.5,9(18%)],[304.01,5(10%),
965,3(6%),965.01,3(6%)],[E850.2,2(4%)],[304.71,1(2%)],[965.09,1(2%)],
[E850.0,1(2%)](AppendixA.4andAppendixA.5).
Theoverlapbetweenthe2OUPpositivecohortswasmeasured,andwefound
8cases thatwerepositive inbotha textminingapproachandICD-9codesquery
(AppendixA.1).ThefrequencydistributionofOUPpositivecohortscharacteristics
aresummarizedinTable3.3.
ThefrequencydistributionpercentageofpositiveOUPamongstratifiedpositive
cohorts(ICDandtextmining)bygendershowsthatwomenhadahigherfrequency
amongICDcohortscomparedtothetextminingcohort(71%and48%).However,
therateofoverallpositiveOUPindicatesahigherrateofpositiveOUPofmenvs.
women(14.7vs.10.4per1,000).
ThefrequencydistributionpercentageofpositiveOUPamongstratifiedpositive
cohorts(ICDandtextmining)byagegroupshowsthatthe45-54agegrouphadthe
highestfrequencypercentageofOUP(31%and24%).However,therateofpositive
OUPpergroupageindicatesthe18-24groupagehadthehighestOUPrateamong
42
agegroupsinICDandtextminingcohorts(12.7and42.4per1,000)(Table3.3and
Figure3.2).
Table3.3FrequencydistributionofOUPpositivecohorts’characteristics
Characteristic
StudyCohorts* Significance
ICD-9N(%)Text-MiningN(%)
CombinedN(%) X2(P-value)**
Age(18-24) 3(6.7%) 10(7.9%) 13(7.9%) 10.2(<0.001)†25-34 7(15.6%) 22(17.3%) 26(15.9%) 35-44 4(8.9%) 29(22.8%) 33(20.1%) 45-54 14(31.1%) 31(24.4%) 42(25.6%) 55-64 7(15.6%) 25(19.7%) 31(18.9%) >65 10(22.2%) 10(7.9%) 19(11.6%) Sex(Women) 32(71.1%) 61(48.0%) 89(54.3%) 7.1(0.0076)†Men 13(28.9%) 66(52.0%) 75(45.7%) Race(Black) 5(11.1%) 23(18.1%) 27(16.5%) 1.6(0.6014)White 39(87.7%) 101(79.5%) 134(81.7%) Other/Unknown 1(2.0%) 3(2.4%) 3(1.8%) Ethnicity(NotHispanicorLatino) 39(86.7%) 106(83.5%) 139(84.8%) 0.8(1)HispanicorLatino 0(0.0%) 2(1.6%) 2(1.2%) Unknown 6(13.3%) 19(15.0%) 23(14.0%) OtherCharacteristicsAlcoholAbuse(Yes) 2(4.4%) 3(2.4%) 5(3.0%) 0.5(0.6069)Non-opioidAbuse(Yes) 10(22.2%) 7(5.5%) 14(8.5%) 10.4(0.0028)†TobaccoUse(Yes) 14(31.1%) 17(13.4%) 30(18.3%) 7(0.0079)Depression(Yes) 12(26.7%) 14(11.0%) 23(14.0%) 6.3(0.0118)†Self-injury(Yes) 2(4.4%) 2(1.6%) 3(1.8%) 1.2(0.2804)HepatitisC(Yes) 3(6.7%) 3(2.4%) 5(3.0%) 1.8(0.1848)TotalCohorts 45 127 164* —*ICD-9: Positive OUP cases using ICD-9 codes; Text Mining: Positive OUP cases using a text miningapproach;Combined:CombinedpositivecasesinICD-9codesoratextminingapproach(Thereare8casesthatoverlappedbetweenICDandtextminingcohorts);(%):Columnpercentage.**Chi-squarewasreportedfor22measurementsandFishersexacttest(Freeman-Haltontest)forrxctables.
†SignificantdifferencebetweenICDandText-miningcohortonα=0.05.
Table3.4ComparisonofcareutilizationamongOUPpositivecohorts
CareUtilizationICDPositiveMean±(SD)
Text-miningPositiveMean±(SD) P-value
OutpatientVisits 12.84±(15.05) 12.87±(12.402) 0.992EmergencyDepartmentvisits 1.91±(2.193) 3.17±(4.915) 0.022†Hospitalizations 2.24±(2.838) 1.55±(2.572) 0.154CumulativeHospitalizationsDays 16.24±(23.985) 7.43±(12.016) 0.022††Resultsaresignificantonα=0.05
Similar to the overall study population, race and ethnicity were dominantly
White, non-Hispanic in both positive cohorts (Table 3.3). Other characteristics
associatedwithOUPintheliteraturewerereported,includingalcoholabuse,non-
43
opioid abuse, tobacco use, depression, self-injury, and hepatitis C. All showed a
patternofbeingattheirlowerpointinthenegativecohort,withamodestincrease
inthepositivetext-miningcohortandpeakingatthepositiveICD-9cohort(Table
3.3). An independent-samples t-test was conducted to compare care utilization
(outpatient visits, emergency department visit, hospitalizations, cumulative
hospitalizationdays)amongOUPpositive cohorts (Table3.4).Ouranalysis shows
that positive text mining cohort had significantly higher average of visiting
emergencydepartment(M=3.17,SD=4.9)comparingtoICDpositivecohort(M=
1.9,SD=2.1),P=0.022.TheICDpositivecohorthadasignificantlyhigheraverageof
cumulativehospitalizationdays(M=16.2,SD=23.9)comparingtothetext-mining
positivecohort(M=7.4,SD=12,P=0.022.)
Figure3.2Positiveopioiduseproblemsratestratifiedbyagegroup
VariablesofInterest
Inthisstudy,weconductedabi-variateanalysistocomparepositivetext-mining
cohorts vs. negative cohort. Chi-square and likelihood ratio were reported for
categorical variables (Table 3.5) and an independent t-test were reported for
44
continuousvariables(Table3.6).Thefollowingvariablesweresignificantonalpha=
0.05 and were ranked based on likelihood ratio and approximate change in
probability:Non-opioidAbuse,Gender,TobaccoUse,Self-Injury,Depression,Alcohol,
Abuse,andHepatitisC.Thet-testwasalsosignificantonthe followingcontinuous
variables: age, outpatient visits, emergency department visits, hospitalizations,
cumulativehospitalizationdays.
Table3.5Measureofassociationandlikelihoodratioforcategoricalvariables
Variable X2 p-valueLikelihood
ratio p-value
Approximatechangein
probabilityNon-opioidAbuse 55.457 <0.001 20.083 <0.001 >45%Gender 11.23 0.001 10.849 0.001 >45%TobaccoUse 10.881 0.001 8.41 0.004 >40%Self-Injury 30.971 <0.001 7.947 0.005 >35%Depression 6.186 0.013 4.994 0.025 >25%AlcoholAbuse 6.616 0.01 3.823 0.051 >20%HepatitsC 5.895 0.015 3.517 0.061 >20%
Table3.6ComparisonofcontinuousvariablesamongpositiveOUPusingtextminingvs.negativecohorts
VariableText-mining
positive MeanStd.
Deviation p-valueAge Yes 44.98 13.24 <0.001 No 59.02 16.34 OutpatientVisits Yes 12.86 12.4 0.001
No 9.15 10.48 EmergencyDepartmentvisits
Yes 3.17 4.91 <0.001No 0.94 1.86
Hospitalizations Yes 1.55 2.57 <0.001 No 0.54 1.138 CumulativeHospitalizationDays
Yes 7.42 12.01 <0.001No 3.21 17
Discussion
We identified 23,082 patientswho received LTOT during the study period, of
which 14,298 have eligible text mining of relevant electronic clinical notes that
yielded127positiveOUPcases,comparedto45casesusingICD-9codesforthesame
population.OurstudymethodologyfollowedthefootstepsofpreviousworkofCarrell
45
etal.(2015),Palmaretal.(2015),andHylanetal.(2015)[61,90,91].Thisoffersa
uniqueopportunitytocomparepopulationcharacteristicsandtextminingresultsto
identify opioid use problems in multi-healthcare settings. Below, we highlight
takeawayfindingsofourstudy.
IdentificationofOpioidUseProblems
WemeasuredtheprevalenceofopioiduseproblemsinIUHadultpatientsusing
textminingandICD-9codes.Inourstudy,validated-textminingidentified127(0.8%)
positive OUP cases out of 14,298 patients, while ICD-9 had identified 45 (0.3%)
positiveOUPcasesfromthesamepopulation(Totalcombined=164[1.1%]).Carrell
et al. (2015)–validated NLP have identified 1,875 (8.5%) patients out of 21,795
patients eligible for the study,while ICD-9had identified 2,240 (10.1%) from the
same population. We believe the difference of opioid use problem prevalence
betweenthe2studiescouldbeduetoseveralfactorssuchasthedifferencebetween
opioid problem rates in both populations or the study design. In our study, we
included5differentreportstypesthatcoversemergencydepartmentandprimary
carevisits,however,ourstudy,astheoppositefromCarrell’s,didnothaveaccessto
behavioral\mentalhealthreportsdueto42CFRpart2-confidentialityofsubstance
usedisorderpatientrecords[97].Thesereportscouldpotentiallyincludeopioiduse
problemrelateddataandaddingthemtoouranalysiscoulddecreasetheprevalence
disparitiesbetweenthe2studies.
Despitethedifferencesbetweenthe2combinedpercentages,bothfindings fall
within the estimated range reported in the literature. According to Ballantyne’s
commentarypublishedinthejournaljournalPAINin2015,publishedestimatesof
46
opioiduseproblemsrangedfrom<1%to50%[98].Ourfindingsareconsistentwith
otherstudiesestimatingopioidproblemsinthecontextoflong-termopioidtherapy.
Nobleetal. (2010)conductedastudytoassesssafety,efcacy,andeffectivenessof
opioids taken long-term for chronic non-cancer patients. The study reviewed 26
articlesandincludedonlypatientswhohadat least6monthsofopioidtreatment.
Accordingtothestudy,signsofopioidaddictionwerereportedinonly0.27%of4,893
studyparticipants[99].
FrequencyofICDCriteriatoIdentifyOUP
Ourstudyusedabroad-spectrumdefinitionofopioiduseproblemscomprisedof
2setsofICD-9codesforopioidabuseanddependence-aswellasopioidpoisoning.
This is different from Carrell’s study which only included those for opioid abuse
(305.5, 305.51, or 305.52), and opioid dependence (304, 304.01, 304.02, 304.7,
304.71,or304.72).Ourinclusionofopioidpoisingcodeshasdetectedanadditional
8distinctpatients.Thisequals17%of theoverallpositiveOUPdetectedby ICD-9
codesinourstudy.
Overall,ourstudypopulationgenerated49ICD-9events,representing45distinct
patients. The most commonly used ICD-9 codes were 304 [opioid dependence
unspecified] (49%), 305.5 [opioid abuse unspecified] (18%), and 304.1 [opioid
dependencecontinuous](10%).Thesefindingsaresimilartotheresultsreportedby
Palmeretal.(2015).Thestudyinvestigatedtheprevalenceofopioidproblemsamong
chronicopioidtherapypatientsintheGroupHealthCooperativefrom2006to2012.
ThemostcommonlyreportedICD-9codeswere304[opioiddependenceunspecified]
(55%), 304.1 [opioid dependence continuous] (26%), and 305.5 [opioid abuse
47
unspecified] (10%) [100]. Similar to the commonuse of unspecified ICD codes to
identifyOUP,apatternwasnoticedbyourreviewersduringamanualreviewprocess
ofpatients’clinicalnotes.Inthisstudy,wefoundunspecifiedcriteria[Physicianstates
opioidabuse/overuse/addictionor listedICDcodes foropioidabuse/dependence]
represented38%ofpositiveOUPinthetextminingapproach.Thesefindingssuggest
a common pattern in OUP identification between ICD-9 codes and a text-mining
approach.
FrequencyofText-MiningCriteriatoIdentifyOUP
Inourstudy,wehaveused17criteriatoidentifyOUPfromclinicalnotes.Themain
difference in these criteria compared to Carrells’ are the following: 1- we have
includedmarijuanaaspartofourlistofillicitdrugs2-Weaddedconcurrentuseof
authorizednarcoticstoincludespecificcasesofpolypharmacyofnarcoticabuse.We
added a criteria to cover cases where physician states narcotic abuse, overuse,
addictionorlistedICDcodesforopioidabuse/dependence.Inourstudy,useofan
illicitdrugcriteriacomprisedof18cases,ofwhich3casesindicatedactivemarijuana
use (within the last month) without mentioning other illicit drug abuse.
Polypharmacy of narcotic abuse flagged another 11 OUP cases. The third criteria
yielded48positiveOUPcases,ofwhich,16caseshaveoneormorementionsofICD-
9codediagnosisofabuseanddependencewithintheclinicalnotes.Outofthose16
patients,therewasonlyonepatientwhohadadocumentedICD-9foropioidabuse
anddependenceinthestructureddata.
48
Demographics
Inthisstudy,wefoundwomenwererepresentedmorethanmen,overall.Thisis
consistentwithCarrelletal.(2015)andHylanetal.(2015)which,bothhavefound
women to representabout two-thirdsof the chronicopioid therapypopulation in
theirstudies.Overall,higherwomen’srepresentationisconsistentwiththenational
estimatesofopioidprescriptionspergender.Accordingtothenationalstatisticsby
the CDC, there were 58% women vs. 42 % men who filled at least one opioid
prescription in2014[95].However,despitemorewomenthanmenseekingopioid
prescriptionsintheU.S.,menhadhigherratesofopioid-relateddisorders,suchas
opioid abuse or opioid overdose. In 2016, the opioid overdose deaths by gender
showedmenhad18.1deathsper100,000vs.8.5forwomenintheU.S.and16.7vs.
8.5,respectively, inIndiana[96].Ourstudyresultsareconsistentwiththisnotion;
whilepositiveOUPdistributionstratifiedbygendershowedahigherrateofwomen
withOUP,therateofoverallpositiveOUPindicatesahigherrateofpositiveOUPfor
men.
Mirroringnationwidetrendsofopioidprescriptionbyage,agedistributionofthe
overallstudycohortwasskewedtowardanolderage(>65),whichwasthehighest
representationinthestudycohort[97].FrequencyofpositiveOUPdistributionwas
thehighestinthegroupage45-54inboththeICDandtext-miningmethods.However,
therateofpositiveOUPpergroupageindicatesyoungeragegroupshadthehighest
OUP rate among age groups in ICD and text-mining cohorts. This should inform
physiciansinIndianathatdespitethehigherfrequencyofolderpatientswithopioid
49
problemsthattheymightencounter,younger-agepatientsaretheonesathigherrisk
foranopioidproblem.
ANoteAboutOpioidProblemDetectionUsingStructuredData
Despite the studywhichuseda combinationof2 typesof ICD-9 codes (opioid
abuseanddependenceandpoisoning),overall,ourtext-miningapproachidentified
morecases.Thetext-miningapproachalsodetected15patientswithamentionofan
ICD-9codeofopioidabuseanddependenceinpatients’clinicalnotes,butthesecodes
were undocumented in patients’ record as structured data. Thismight indicate a
significantlackofdocumentationtowardopioiduseproblemsandconfirmprevious
studies’ findings regarding the under-reporting of OUP using ICD-9. For future
researchpertainingtoidentifyingOUP,werecommendusingalternatemethods
(in addition to ICD codes), such as textmining andmachine learning, and further
exploringother commonlydocumentedstructureddata, suchasprocedural codes
andelectroniclabresults.
Limitations
Inthisstudy,weusedatext-miningapproachtoanalyzepatients’clinicalnotesto
identifyOUP.However, theanalysiswas limitedto5report types.Developingand
implementingNLPtechniquesgenerallyrequiresintensivecomputingandaninitial
commitmentofsubstantialresourcesandexpertise[101].Thus,itwasnotfinancially
feasiblewithintheallocatedbudgetforthisstudytotestallreporttypesgenerated
by the IUH study population. Rather, the authors relied on expert opinions to
meaningfully limit thenumber of report types.Reliance on expert opinionduring
algorithmdevelopmentiscommoninopioidproblemidentificationliterature.Canan
50
etal. (2017)reviewed15automatedalgorithmsto identifynonmedicalopioiduse
using electronic health record data. The study found that investigators explicitly
reliedonsubjectmatterexpertsduringtheprocessofalgorithmdevelopmentandto
identifycandidatevariables[101].
Anotherlimitationpertainingtostudyresultsisthat,despitetextminingcorrectly
identifying 127 positive cases out of 154 initial positive cases (PPV = 82%), the
overlapbetweenthetext-miningresultsandICD-9positiveresultswasonly8cases
(17%) (AppendixA.1). Thismay suggest that the textminingmethod hasmissed
some positive cases (false negative). To investigate this assumption, we have
reviewed500randomlyselectedreports thatwerenot flaggedbyour text-mining
criteria.Amongthe500reports,we identified10 falsenegativecases, indicatinga
negativepredictivevalueof98%onthereviewedsubset.
Finally, we used multiple behavioral concepts to define opioid use problems
(dependence, abuse, and addiction) and thus, distinguishing specific behavioral
conceptswasnotautomatedinthisresearch.Futureworkmayinvestigateusingtext
mining,naturallanguageprocessing,andmachinelearningtobettertargetspecific
behaviors.
Conclusion
Inthisstudy,wedevelopedatext-miningapproachtoidentifyOUPamonglong-
term opioid therapy patients at IUH. Compared to ICD-9 codes, text mining
successfully identified more OUP cases within the study population. Future
development of text-mining techniques can help identify cases undiscovered by
conventionalICD-9reportingmethods.
51
4. MACHINE-LEARNINGAPPROACHTOIDENTIFYOPIOIDUSEPROBLEMS
Introduction
In2012,opioidprescriptionsreachedtheirpeakwith81.3prescriptionsper100
people[98].Despitemorerecentreportsshowingthatthisnumberhasdecreasedto
66.5prescriptionsper100people, theopioidprescriptionrate in2015remains3
timeshigherthantheratesfrom1999[99],andmorethanaquarterofU.S.counties
arestillshowingdispensedopioidprescriptionnumbersequivalenttoorhigherthan
theirpopulations[98].Thedisparitiesoverprescribingopioidsacrossthenationhas
continuedthroughouttheyears.Infact,in2016,somecountieshadprescriptionrates
7timeshigherthanthenation’saverage[98–100].
Thelinkbetweenaveragedailyopioidprescriptionandopioidoverdosehasbeen
established[101].AccordingtoDunnetal.(2010),long-termopioidtherapypatients
receivinghigherdosesofopioidsareatahigherriskofdrugoverdose[102].Other
studies have found an association between heroin use and an increased rate of
nonmedicalopioiduseandmeetthecriteriaforopioidabuse[103].
Despitetheimportanceoftheopioidabuseissue,thecurrenthealthcaresystem
mainly relies on the International Classification of Disease (ICD) to report use
problems,whichunderestimatestheactualnumberofpatientsexhibitingthetarget
categorization[57,58].
Inrecentyears,severalstudieshavebeenconductedtoinvestigatethevalidityof
othermethods, such as natural language processing and textmining, tomeasure
opioidproblems, inaddition to ICDcodes.However, toourknowledge,nostudies
havebeenconductedtomeasurethepresenceofopioidproblemsamonglong-term
52
opioidusersviamachine-learningtechniques.Theaimofthischapteristoinvestigate
theapplicationofmachine-learningtechniquestoclassifyopioiduseproblems(OUP)
utilizing patients clinical notes. For this study, we define OUP as the presence of
opioidaberrantbehaviors,opioidmisuse,opioidabuseandopioidusedisorderinthe
patient’clinicalnotes.
Method
Machine-LearningProcessSummary
Toprepareformachine-learningprocess,2reviewershadreviewed700clinical
notesusingnDepth™todetermineopioiduseproblemcases.Reportcontents,along
with review results, were extracted by the Regenstrief Institute data core into
comma-separatedvalues(CSV)filesandPortabledocumentformat(PDF)files.The
reviewof700clinicalnotesresultedin145positivereportsforopioiduseproblems
(OUP) and 555 negative reportswith no indication of OUP in the text. To ensure
HIPAAcompliance,thefileswerestoredandprocessedontheKarstdesktop,anIU
remotesecuresupercomputer(Figure4.2).
Topreparedataformachine learning,2Javaprogramswerewrittentoextract
text from each of the PDF and CSV clinical notes files and they were saved into
separate TXT files. For the machine-learning process, Waikato Environment for
Knowledge Analysis (WEKA) was used to build and compare machine-learning
modelsonadocumentlevel.PerformancewascomparedbasedonF-measure,recall,
andprecisionfor5models.Thebestmodelperformancewasoptimizedandfinalized
into a standalone product to identify the OUP outcome among long-term opioid
53
therapypatientsatIUHbasedontheirmedicalrecords.Finally,themodelwastested
onasubsetofunseendata,andthemodelreportedtheresults(Figure4.2).
GoldStandardDevelopment(nDepth™)
Tocreateagoldstandardforthemachine-learningprocess,2trainedreviewers
manuallyreviewed700reportswithasemi-assistedmanualreviewprocessusing
nDepth™.TheRegenstrief Institute in IndianapolisdesignednDepth™asanatural
languageprocessingtoolwithwhichtoextractdata fromthe IndianaNetwork for
Patient Care (INPC). Furthermore, nDepth™ was programmed to highlight 27
suggestivewords in flagged reports (Table A.2).Whenmore than one report per
patientwasflagged,thesystemrandomlyselectedonetypeoftheflaggedreportsper
patienttobereviewed.ToestablishtheKappacoefficient,bothreviewersreviewed
200 identical reports. In case of the discrepancy, files were reevaluated and
consensus was reached. Cohen’s κwas run to determine if there was agreement
betweenthe2reviewersjudgmentregardingwhetherasubsetof200patientsinthe
studycohortmetanyOUPcriteria.Therewasmoderateagreementbetweenthe2
reviewers judgments:κ=0.691 (95%CI, 0.58 to0.79),p< .001.Ouroverall gold
standarddatasetconsistedof136positiveOUPreportsand480negativereports.The
criteriatodetermineopioiduseproblemsinclinicalnotesarelistedinTable3.1.
DataTransformation(Java)
The700manuallyreviewedclinicalnotes,whichconsistedof132PDFand568
CSVreports,wereuploadedintoKarst.KarstDesktopisaremotedesktopservicefor
users with accounts on the Karst research supercomputer [92]. As part of the
machinelearning processes, text from both CSV and PDF reports needed to be
54
transformed into TXT files for each report. Thus, a Java programwas written to
extractdatafromthePDFandCSVfiles(AppendixD.1andAppendixD.2).Asample
oftheextracteddatawasreviewedtovalidatetheextractionprocesstoconfirmdata
correctnessandcompletion.AsafinalsteptopreparethedataforWEKA,textwas
convertedintoAttribute-RelationFileFormat(ARFF).ARFFis“anASCIItextfilesthat
describesalistofinstancessharingasetofattributes.ARFFfilesweredevelopedby
themachine-learningprojectatthedepartmentofcomputerscienceoftheUniversity
ofWaikatoforusewiththeWEKA”[103].
Analysis
WEKAConfigurationandModelComparison
Intheprocessofmodelbuilding,trainingthemodelonalargersamplesizeusually
produceshighermodelperformancecompared to training themodelonasmaller
sample size [104]. However, the model-building process also considers class
representation balance in the training process, [105,106] which can lead to the
inabilitytoclassifytheunbalanceddataduetolackofpriorinformationinthemodel
training [106,107]. Class imbalance is characterized as there being “many more
instancesof someclasses thanothers” [107,108].The fundamental challengewith
trainingamodelonimbalanceddataistheabilityofimbalanceddatato“significantly
compromisetheperformanceofthemoststandardlearningalgorithms”[109].
In previous years, many solutions have been introduced to deal with the
imbalanceddataproblemandcanbesummarizedas follows:1-Datasampling: in
which themodel is trained on amore balanced distributed dataset [107,110]; 2-
Algorithmicmodification:inwhichadoptingbase-learningalgorithmsneeds“tobe
55
more attuned to class imbalanced” [107,111]; 3- Cost-sensitive learning: this
approach considersmodification on data level, algorithm level, or both, in which
consideringmisclassificationofonedataclass(e.g.,positiveclass)costishigherthan
anotherdataclass(e.g.,negativeclass) thus,algorithmperformance isadjusted to
minimizethehighercosterror(misclassifypositiveclass)[107,112].
Ourgoldstandarddatasethasaboutan80%negativeanda20%positiveclass
imbalance.BalancingthedataintheWEKAtrainingmodelon50/50or40/60class
representation is recommended [113]. However, adopting this approach might
jeopardize the real-world application of the model, given the difference in prior
probabilitybetweenthetrainingenvironmentandreal-worldapplication[107,114].
Hence, for this chapter,we decided to use an ensemble approach,where firstwe
exploremultiplealgorithmsandvariousclassificationconfigurationsandcompare
their performance on a balanced subset of data that has similar representation
betweenpositiveandnegativereports.Second,oncewedeterminethebestWEKA
configurationwithwhich to classify the 2 classes,wewill optimize themodel by
introducing560(80%)of thegoldstandardreportstoenhancetheaspectofreal-
worldapplicationofthemodel(becauserealworlddataisnotbalanced)[111].To
standardizedtestingenvironment,cross-validation(k=10)willbeusedacrossallof
themodels.
Finally,theoptimizedmodelwillbetestedonaunseendatasubsetconsistingof
140 (20%) clinical notes. Our intention in the optimization step is to maximize
positive report detection (less false negatives),whilemaintainingminimal loss of
precision. The rationale behind the ensemble approach is that classifiers are
56
commonlydesignedtominimizeerrorwhentheymakepredictionsaboutnewdata.
However, this assumption is proper when the cost of different errors is equally
important [112,115]. In this research,we sought toprioritize theminimizationof
false negatives (by capturing more true positives, while risking an increase in
capturing false positives). Accuracy measures, such as F-measures, recalls, and
precision,willbereportedforeachstep.ThehighlightsoftheWEKAconfiguration
(Table4.1)andthecompletemodelschema(Figure4.1)forbestmodelperformance
will be reported. The complete model training steps of WEKA are reported in
AppendixC.1toAppendixC.19.
Figure4.1WEKAmodelconfigurations
Table4.1HighlightofWEKAconfigurationsOption ChosenFeatures CommentClassifier FilteredClassifer Cost-sensitiveclassifierwasaddedto
optimizethefinalmodel.Algorithm 1-LinearAlgorithm:Simple
Logistic2-Non-LinearAlgorithm:SMO/J48/NaiveBayes3-EnsembleAlgorithm:RandomForest
ForSimpleLogistic:batchsize:100,heuristicstop:50,maxboostingiteration:500,numberofboostingiteration:0numberofdecimalplaces:2
Filter StringToWordVector: IDFTransform:True,TFTransform:True,LowerCaseToken:True.Wordstokeep:1000
StopWordsHandler MultiStopWords Tokenizer NGramTokenizer Max3,Min1
ModelFinalization
Themodelwithbestaccuracymeasureperformancewillbesavedasstandalone
MODEL files. The MODEL extension is a common files type for digital product
“Scheme:weka.classifiers.meta.FilteredClassifier-F"weka.filters.unsupervised.attribute.StringToWordVector-Rfirst-last-W1000-prune-rate-1.0-T-I-N0-L-stemmerweka.core.stemmers.IteratedLovinsStemmer-stopwords-handlerweka.core.stopwords.MultiStopwords-M1-tokenizer\"weka.core.tokenizers.NGramTokenizer-max3-min1-delimiters\\\"\\\\r\\\\n\\\\t.,;:\\\\\\\'\\\\\\\"()?!\\\"\""-S1-Wweka.classifiers.functions.SimpleLogistic---I0-M500-H50-W0.0”
57
definitionsandsimulations[116].ThefilescanbeloadedbyWEKAontoanyPCor
Mackintoshenvironmentandusedtoclassifycasesofopioiduseproblemsinunseen
reportsfromIUH.
Results
ModelComparison
We usedWEKA to build and test files among differentmodels to classify IUH
clinicalnotesintopositiveandnegativeforopioiduseproblemsusingabalancedclass
dataset. The training set consisted of 136 positive reports and similar negative
reports,whichwereuploaded toWEKA.The following algorithmswereused: J48
(defaultWEKAalgorithm),simplelogistic,naivebayes,randomforest,andsequential
minimal optimization (SMO). All models were tested based on a 10-fold
crossvalidation. Our model comparison has shown the Simple Logistic algorithm
showed better performance than other models for predicting positive class
(Precision: 91.9%; Recall: 91.2%; F-measure: 91.5%; ROC: 92%) (Table 4.2). The
detailedaccuracymeasureperformancesforsimplelogisticarelistedinTable4.3.
ModelOptimization
Toenhancemodelreal-worldapplication,simplelogisticmodelconfigurationwas
optimizedon80%ofthegoldstandarddataset,whichwascomprisedof560reports
(136positiveand424negative).Compared to thebalancedataset, themodelhad
initially underperformed when larger portions of the negative dataset were
introduced(Table4.4).Themodelhasclassified37(27%)out136positivecasesas
falsenegativecases.
58
Table4.2Comparisonofmultiplemodels’performancestoidentifypositiveOUPAccuracymeasurestoclassifypositivereportsAlgorithm Precision Recall F-MeasureSimpleLogistic 0.919 0.912 0.915J48 0.847 0.853 0.85RandomForest 0.756 0.934 0.836SMO 0.797 0.838 0.817NaiveBayes 0.821 0.64 0.719
Table4.3DetailedaccuracyperformancemeasureforsimplelogisticalgorithmtoclassifyOUP
Class TPRate FPRate Precision Recall F-Measure ROCAreaPositive 0.921 0.079 0.919 0.912 0.915 0.931
Negative 0.921 0.088 0.915 0.921 0.918 WeightedAvg. 0.917 0.083 0.917 0.917 0.917 Table4.4Initialsimplelogisticmodelperformanceon80%ofthegoldstandarddataClass TPRate FPRate Precision Recall F-Measure ROCAreaPositive 0.728 0.052 0.818 0.728 0.77 0.94
Negative 0.948 0.272 0.916 0.948 0.932 WeightedAvg. 0.895 0.219 0.892 0.895 0.892 Table4.5Optimizedsimplelogisticmodelperformanceon80%ofthegoldstandarddataClass TPRate FPRate Precision Recall F-Measure ROCAreaPositive 0.897 0.094 0.753 0.897 0.819 0.890Negative 0.906 0.103 0.965 0.906 0.934
WeightedAvg. 0.904 0.101 0.913 0.904 0.906
Table4.6Simplelogisticconfusionmatrixcomparisonbeforeandafteroptimization
ConfusionMatrix(N=560:136Positive,424Negative)Beforeoptimization AfterOptimization
Negative Positive Negative Positive402 22 384 4037 99 14 122
Table4.7Simplelogisticmodelperformancetestedon20%ofunseengoldstandarddataClass TPRate FPRate Precision Recall F-Measure ROCAreaPositive 0.889 0.030 0.667 0.889 0.762 0.924Negative 0.970 0.111 0.992 0.970 0.981
WeightedAvg. 0.965 0.106 0.971 0.965 0.967
59
Figure4.2Summaryofchapter4methodandresults
To account for this effect, a cost-sensitive classifying technique was adopted.
Costsensitive classifying is one method to fine-tune model performance through
increasingthepenaltyofmisclassifyingoneormoreclassesintheconfusionmatrix
60
(AppendixC.20toAppendixC.28).Adjustingthemodelcostsensitivelymatrixto2:1
has doubled the penalty of misclassifying positive cases as false positives, which
resultedinadecreaseinfalsepositivecases(Table4.5),droppingthetotaltoonly14
(10%)casesoutof136.Acompleteconfusionmatrixforbothmodels’outcomesis
listedinTable4.6.
ModelTesting
Thefinalizedmodelwastestedon140reports(20%ofourgoldstandarddata).
Thesedatawerenotusedinitiallyintheprocessoftrainingandtestingthemodel.On
theunseendata,themodelhascorrectlyclassified136(96.4%)reportswith
88.9% sensitivity to identify positive cases. The accuracy measures to classify
positive.reportswerethefollowing:Precision:0.667,Recall:0.889,andF-Measure:
0.762.ThecompleteperformancemeasuresarelistedinTable4.7
Discussion
Inthischapter,webuiltamachine-learningmodeltoidentifyopioiduseproblems
basedonIndianaUniversityHealthclinicalnotes.Thepositiveandnegativereports,
which had been extracted andmanually reviewed using nDepth™ in the previous
chapter,werepre-processedanduploadedintoWEKA,whichwasusedtocreatea
modeltoidentifyopioiduseproblems.Inthissection,wehighlightsomeofthekey
pointsofthiswork.
Algorithms’Performance
Wetested5differentalgorithmstoclassifyopioiduseproblemsonadocument
level (Appendix B.1 to Appendix B.5). simple logistic regression showed overall
higheraccuracyperformancemeasures inclassifyingpositiveopioiduseproblems
61
on a balanced dataset (Table 4.2). We found this interesting, given the growing
number of published articles discussing the superiority of random forest and/or
support vector machines over logistic regressions [117,118]. However, a recent
article by Pranckeviˇcius et al. (2017) compared Naive Bayes, Random Forest,
DecisionTree,SupportVectorMachineandlogisticregressionclassifierstoclassify
textreviews(multi-classtextualdata).Thestudyfoundthatlogisticregressionhas
achieved thehighestclassificationaccuracycomparedwithotherclassifiers [119].
This can be explained by the signal-to-noise ratio. One opinion article we found
discussedtheadvantagesoflogisticregressionsovertreedecisionmodels;thearticle
foundthat logisticregressionsperformbetterwhenthesignal-tonoiseratio is low
[120].
Giventhelargetextdata(noise)comparedtothesignal(phrasesorstatements
indicatinganopioiduseproblems),wedecidedtoinvestigatethishypothesisandsee
if logistic regression will still perform better if we limit our model training to
sentencesandphrases.Thus,wehavemanuallyreviewedallofthepositivereports
and300ofthenegativereportstoannotatestatementsandphrasesthatindicateor
negatethepresenceofopioiduseproblemsusingtheExtensibleHumanOracleSuite
ofTools(eHOST).Thisprocesshasresultedin255sentencesandphrasesindicating
thepresenceofanopioiduseproblems.However,itonlyyielded55sentencesand
phrases,negatingthepresenceofanopioiduseproblems.Finally,weappliedsimilar
settings to compare the following 4 classifiers: logistic regression, random forest,
NaiveBayesandSMO.OurresultsshowedthatrandomforestandSMOscoredslightly
higherthanlogisticregressionandNaiveBayes(AppendixB.6).
62
Cost-SensitiveClassifier
Inthisstudy,wefoundthatcost-sensitiveclassifierisaneffectivemethodtouse
to adjust for the data imbalance. Adjusting the cost matrix has increased recall
performancewhileloweringtheprecisionofpredictingpositivecasesinthemodel
byincreasing(doubling)thepenaltyformakingafalsenegativeerror.Ourrationale
forthisapproacharisesfromtheintentiontominimizemissingtrue-positivecases.
WebelievethatidentifyingmorepositiveOUPcaseswillallowfutureresearchersto
better understand why this sector of positive OUP patients (according to their
reports)wasnotdocumentedusingICDasanopioiduseproblemscase.Furthermore,
duetotheimportanceofopioidissuesfromaclinicalperspective,andgiventhesmall
representationofopioiduseproblemscasesintheIUHsystem(about1%),building
ahigh-sensitivitymodeltodetectpositiveOUPinlong-termopioidtherapypatients
based on their clinical notesmay seem like a practical approach, given the small
percentage of positive OUP cases. Such a pragmatic approach might not be
uncommon in screening tests;mammogramsare a commonexampleof screening
testswheresensitivityishigherthanspecificity[121,122].
Limitation
Ourgoldstandarddataweredeterminedbymanuallyreviewingthetext-mining
algorithmtoidentifyopioiduseproblems.Anypositivereportsubclassthatwasnot
captured by the text-mining algorithmmay not be represented in ourmodel. To
accountforthislimitation,wehaveintentionallyaddedallfalsepositivereportsas
partofourtrainingmodelandlabeledthemasnegativereports.Furthermore,500
63
randomnegativereports(fromthetext-miningalgorithm)weremanuallyreviewed,
andfalsenegativereportswereaddedtothetrainingsetaspositivecases.
Conclusion
Inthischapter,weadoptedanensembleapproachtobuildamachine-learning
model to identifyopioiduseproblemsbasedon IndianaUniversityHealth clinical
notes.Ourfinalmodelwastestedonunseendataandreportedasensitivityof88%
whenidentifyingpositivecases.Weconcludedthatamachine-learningapproachmay
beusedtoidentifyopioiduseproblemsinIndiana.
64
5. SUMMARY
SummaryOverview
In this thesis, we aimed to build a classification model to identify opioid use
problems using text-mining and machine-learning approaches. Additionally, we
examined several opioid use problems classification tools/models and identified
theirstrengthsandweaknesses.Generally,thesetoolscanbecategorizedintothree
classes, depending on the data populationmethod as follows: 1- Self-report tools
(Firstgeneration),2-Structureddatamodels(Secondgeneration),3-Clinicalnotes
models(Thirdgeneration).Belowwesynthesizethefindingsofthisthesisinlightof
thisclassification.
Self-reportTools(FirstGeneration)
Thesearerisk-assessmenttoolsthatpredicttheriskofcurrentorfutureopioid
useproblemsbasedonpredictivevariablesextractedfromdataself-reportedbythe
patient or physician. Common examples of self-report tools are SOAPP, ORT, and
COMM. Themain advantage of these tools is their independence from electronic
healthrecorddataandcanbeadministeredtothepatientorfilledoutbythetreating
physician. However, having the majority of these tools solely rely on patient
completionmighthavelimitedtheirabilitytopredictaberrantdrug-takingbehavior
duetoreportingbiasora lackofpatientcomprehension.Other factors thatmight
hinderusingself-reporttoolsaretheneedfordocumentationfromthemedicalstaff
and the largevariation in its sensitivity acrossdifferentpopulationsanddifferent
outcomedefinitions.
65
StructuredDataModels(SecondGeneration)
Thesearerisk-assessmentmodelsthatpredicttheriskofcurrentorfutureopioid
use problems based on predictive variables extracted from structured data of
electronichealthrecords.CommonexamplesofthesemodelsaretheworkofEdlund
etal.(2007),Whiteetal.(2009),Riceetal.(2012),Turneretal.(2014),andothers.
Thesemodelsdonotrelyonpatientreporteddata(directly),rather,itdependson
the structured data available in the electronic health records. However, this
dependenceonelectronichealthrecordsstructureddatamakesthesemodelsasgood
asthequalityandrobustnatureofcaptureddatastoredintheelectronicrecord.For
example,thenumberofprescribersandthenumberofdispensershavebeenshown
tobeariskindicatorforopioiduseproblems.However,includingthesedataintoa
classifyingorpredictivemodelmayrequirepre-worktoconsolidatethesedatafrom
multiple databases (such as Prescription Monitoring Programs). Thus, it was no
surprisetoseeAgeandGenderastheonlycommonvariablesacrossthe9modelswe
found using structured data. However, with the constant growth of the Health
InformationExchangeinsizeandnumber,weexpecttoseemorerobustmodelsin
the future that identify and utilize new variables to better understand opioid use
problems.
ClinicalNotesModels(ThirdGeneration)
NLPandTextMining
These are risk assessment models that identify the presence of opioid use
problems through classifying patients’ clinical notes using natural language
processingandtext-miningapproaches. Incomparisontothe2previousmethods,
66
NLP and text mining are more complex techniques that require intensive
computationalandlaborskills.Thus, inourreview,onlyahandfulofstudieswere
foundusingthesetechniquestoidentifyopioiduseproblems.Inthisthesis,weused
atext-miningapproachtoidentifyopioiduseproblemsusingpatient’sclinicalnotes.
Belowwehighlightthefindingsofourresearchwithinthecontextofsimilarresearch
intheliterature.
MethodsandVariables
Hylanetal.(2015),Carrelletal.(2015),andthesubsequentworkofPalmeretal.
(2015)area fewmodels foundthatuseNLPto identifyopioiduseproblemsfrom
patients’ clinical notes. Hylan et al. (2015), a two-year prospective study, used 3
phasesofNLPtoidentifyclinicalnoteswithindicationsof1-opioidoveruse,misuse,
abuse,oraddiction2-opioiddependence,and3-“mentions”ofICD-9codes(12ICD-
9codes)foropioidabuseordependenceintheclinicalnotes.Thestudyhasidentified
158(5.7%)patientswithopioiduseproblems,whileICD-9forthesamecohorthas
identified25(0.9%).
Our study used dictionary-based NLP which was adopted from Carrell et al.
(2015)for2reasons.First,astheoppositeofCarrellstudy’s,Hylanetal.(2015)did
not specify the technical details of theirNLP process. Second, theNLPmethod of
Carrellstudy’shaveresultedindetectingalargerpercentageofopioiduseproblems.
Third,Hylanetal.(2015)conductedaprospectivestudyanditwaslimitedtoprimary
caresettings,whileCarrelletal.(2015)conductedaretrospectivestudyandincluded
primarycareandemergencydepartmentnotes.Tonote,bothstudiesdrewsamples
fromtheGroupHealthpopulation.Themajordifferencewasinthestudyperiod(2
67
years vs. 5 years), study design (prospective vs. retrospective), and study sample
(primary care vs. primary care and emergency). In our study, the validated-text
miningidentified127(0.8%)positiveOUPcasesoutof14,298patientswhileICD-9
has identified 45 (0.3%) positive OUP cases from the same population (total
combined=164(1.1%)).
InadditiontotheNLPclassification,bothstudieshaveincludedancillaryanalysis
of study cohort to estimate risk.Hylan studiedprediction ability of 7 variables to
identifyapositiveNLPcohort.Thesevariablesweredocumentedforthe2yearsprior
toinitiationofchronicopioidtherapy.Thesevariableswere:Age(15to44or45to
64),smokingstatus,and2-yearpriorICDcodesforthefollowingdiagnoses:opioid
abuse/dependence, drug abuse/dependence (excluding opioids), alcohol
abuse/dependence,mental health disorder, and hepatitis C. The study found that
usingmultivariatelogisticregression,variablesindicatingage18to44,opioidabuse
diagnosis,positivesmokingstatus,andmentaldisorderdiagnosesweresignificant
predictors during the measured period. This study was limited to primary care
patientsofanintegratedgrouppractice,salariedphysiciansworkinginGroupHealth
Cooperative(GHC)clinics.
Inourstudy,validated-textminingidentified127(0.8%)positiveOUPcasesout
of14,298patients,whileICD-9identified45(0.3%)positiveOUPcasesfromthesame
population (total combined = 164 [1.1%]). We compared demographic
characteristicsandvariablesof interestamong2positivegroups(textminingand
ICD)usingChisquareandindependentt-test.WehavealsocomparedpositiveOUP
usingtextminingvs.negativeOUPforthesamevariablesandreportedmultivariate
68
oddsratio.Ourvariablesofinterestincludednon-opioidabuse,gender,tobaccoUse,
self-Injury,depression,alcoholabuse,hepatitisC,age,outpatientvisits,emergency
department visits, hospitalizations, and cumulative hospitalization days. These
variableswerechosensubjectivelybasedonoursystematicreviewofpreviouswork
and the feasibility of obtaining these variables from IUH and INPC databases.
Multivariateanalysislogisticregressionindicatesthatonlygender,non-opioidabuse,
selfinjury,outpatientvisits,emergencydepartmentvisitsandinpatientvisitswere
significantpredictorstopositivetext-miningoutcome(Table5.1).
Table5.1Multivariatelogisticregressionoftext-miningmeasureofopioiduseproblems
Variables OddsRatio Lower* Upper*Depression 1.166 0.612 2.225AgeContinuous 0.915 0.868 0.965AgeGroup 1.465 0.854 2.512Sex 1.62 1.124 2.335Race 0.928 0.591 1.456Ethnicity 1.202 0.931 1.553AlcoholAbuse 1.869 0.536 6.519OtherMedicalAbuse 3.583 1.409 9.116TobaccoUse 0.919 0.473 1.784SelfInjury 7.884 1.35 46.028HepatitisC 1.38 0.348 5.468Outpatientvisits 1.016 1.003 1.03EDvisits 1.093 1.053 1.135InpatientVisits 1.34 1.138 1.579HospitalDays 0.983 0.958 1.009*Oddsratiovaluebasedon95%confidenceinterval
Collectively, in addition to younger age, prior opioid diagnosis and mental
disorder,ourstudyfindingsindicatenon-opioidabuse,selfinjury,outpatientvisits,
EDvisitsandinpatientvisitsaresignificantvariablestopredictopioiduseproblems
inpositiveNLPandtext-miningpositivecohort.Ourreportedbi-variatemeasureof
association has also found alcohol abuse, hepatitis C, tobacco use, depression are
69
significantassociatedwiththemeasuredoutcomes.Onecommonlimitationofboth
studieswasthesmallnumberofpositiveoutcome(127outof14,298and158outof
2,752),whichaffectedthemodelsignificancetopredicttheoutcome.Thiscouldbe
Duethe limitations intheprocessoftextmining, itself,andinabilityofdictionary-
based method to detect indication of opioid use problems in indirect or long
statements.Suchstatementsmightbedeemedfalsepositive.
PrevalenceofOpioidUseProblems
WemeasuredtheprevelenceofopioiduseproblemsinIUHadultpatientsusing
textminingandICD-9codes.Inourstudy,validated-textminingidentified127
(0.8%) positive OUP cases out of 14,298 patients, while ICD-9 has Identified 45
(0.3%)positiveOUPcasesfromthesamepopulation(totalcombined=164[1.1%]).
Carrelletal(2015)Validated-NLPmethodhasidentified1,875(8.5%)patientsoutof
21,795patientseligibleforthestudy,whileICD-9hasidentified2,240(10.1%)from
thesamepopulation.thedisparitiesbetweenthe2resultswasvastinboththeICD
and text mining/NLP approaches despite both studies using the same eligibility
criteriaoflong-termopioidtherapy(orchronicopioidtherapy).Thisdifferencecould
beattributedtooneormoreofthefollowingfactors:
First,despitebothstudiesusingthesameinclusioncriteria,ourstudyexcluded
patientswithschizophreniaduetodocumentedhigherprevalenceofOUPamongthis
population. However, this factor would have minimal effect, since only 86
schizophrenicpatients (whowerealreadyexcludeddue to thepresenceof cancer
diagnosis)wereadditionallyexcludedfromthestudy.
70
Second,thisdifferencecouldbeattributedtothetimeandstudyperiodofthe
2 studies.While the longest available follow-up period in our studywas 2 years,
Carrell’sstudyassessedpatientsovera5-yearperiodfrom2006-2012.Thiscould
indicatethatthelengthoftimeforopioidconsumptionwouldleadtoanincreasein
prevalenceofopioidproblem.Onecritiquetothisassumptionisthatneitherstudy
measuredthedirecteffectofthelengthofopioidtherapyondevelopmentofthestudy
outcomes.
Figure5.1Opioidoverdosedeathratesper100,000populationfrom2006-2014.
Source:KaiserFamilyFoundation[123]
Third, this couldbe attributed to thedifferencebetweenopioidproblem rates
between the statesofWashingtonand Indianaduring the studiedperiod forboth
studies.Nationalstatisticsofopioidoverdosedeathrates(per100,000)showsthat
therateofopioidoverdosedeathsinWashingtonstaterangedfrom10.2to9.2during
Carrell’s studyperiod (2006-2012),while the samerate ranged from5.7 to7.3 in
Indianaduringourstudyperiod(2013-2014)(Figure5.1).
71
Finally, in the oppositition of Carrell’s study, our study did not have access to
behavioral\mentalhealthreportsdueto42CFRPart2-confidentialityofsubstance
usedisorderpatientrecordsregulations[97].Webelievethatincludingbehavioral
healthreportsislikelytoidentifymorepositivecasesofopioiduseproblemsinour
studypopulation.
Collectively,both studiesusedNLP to identifyopioiduseproblems inprimary
careandemergencysettingsandcomparedtheresultswithICD-9codes.TheNLPtext
mining technique identified additional positive cases in both studies. Text-mining
approachalsodetected15patientswithamentionofICD-9codeofopioidabuseand
dependence in patients’ clinical notes but these codes were undocumented in
patients’ record as structured data. This might indicate a significant lack of
documentation for opioid use problems and confirm previous studies’ findings
regarding the underreporting of opioid problems using ICD codes. For future
researchpertainingtoidentifyingOUP,werecommendusingalternatemethods(in
addition to ICD codes), such as text mining and machine learning, and further
exploringother commonlydocumented structureddata, suchasprocedural codes
andelectroniclabresults.
MachineLearning
Theseriskassessmenttechniquesutilizeasetofstatisticaltechniquestoidentify
partsofspeech,entities,sentiment,andotheraspectsoftext.Thechosentechnique
canbeexpressedinmodelsthatcanbeappliedtoanothertext[124].Thesemodels
canbeusedtoidentifyaparticularoutcomeintheclinicalnotes.Thesemodelsare
more flexible; to improve performance by training themodel on larger andmore
72
representative training sets and by choosing a suitable classifying algorithm. In
general,thelearningprocessofmachinelearningcanbecategorizedintosupervised
and unsupervised learning. Supervised learning means training documents are
taggedwithaspecificclass(e.g.,positive,negative),whileunsupervised,canextract
meaningwithoutaspecifictrainingset.Acommonexampleofunsupervisedtraining
isdataclustering.
MachineLearninginClassificationofDisease
Inourreview,wedidnotfindanypublishedworkpertainingtomachine-learning
models that classify clinical notes to identify opioid-addiction related behavior.
However,wefoundseveralstudieswhichhaveusedmachine-learningalgorithmsto
classifyothermedicaleventssuchaschronicdiseasehospitalization,orparticular
medicalconditionsuchasheartfailurestage,cancerdiagnosisoracutepancreatitis
[125–128].
Despitethedifferencesintheoutcomeofinterest,thegeneralthemeofbuildinga
machine-learningmodelissimilar.First,identifytheoutcomeofinterestandsource
ofclinicalnotes(e.g.heartfailurediagnosisintheemergencydepartmentdischarge
note). Second, identify training corpus (gold standard) using NLP/text-mining
approach (e.g. rule-based). Third, buildmachine-learningmodel and evaluate the
performance(byreportingF-measure,recallandprecision).
Machine-LearningtoIdentifyOpioidUseProblems
In this thesis, similar stepswere adopted tobuildmachine-learningpredictive
models to identify opioid use problems among long-term therapy patients at IUH
usingclinicalnotes.Webuiltonourpreviouseffortintextminingtoinvestigatethe
73
applicationofmachine-learningtechniquestoclassifyopioiduseproblemsutilizing
patientsclinicalnotes.Wemanuallyreviewedalistofpositiveandnegativeclinical
notes that resulted from the text-mining study, whichwe then used to build our
machine-learning model. We tested 5 different algorithms to classify opioid use
problemsat thedocument level. SimpleLogistic regressionshowedoverallhigher
accuracy performance measures in classifying positive opioid use problems on
balanceddataset.Ourfinalmodelwastestedonunseendataandreportedarecallof
88%andprecisionof66%(F-measure=76%)whenidentifyingpositivecases.
Despite our findings the results of identifying opioid use problems are
encouraging, our outcome of interest (OUP), comprised of multiple behavioral
concepts,suchasdependence,abuse,addiction,andoverdose.Whiledistinguishing
specificbehavioralconceptswasnotautomated in this research, futureworkmay
investigatetheuseoftextmining,naturallanguageprocessing,andmachinelearning
to better target specific behaviors before considering using these for surveillance
purposesorincorporatingthemintoaclinicaldecisionsystemforprimarycareand
emergencyencounters.
Limitations
Inthisthesis,weusedatext-miningapproachtoanalyzepatientsclinicalnotesto
identifyOUP.Due to labor constraints,ouranalysiswas limited to5 report types,
despite our effort to rely on expert opinions tomeaningfully limit the number of
reporttypes.Thisapproachmighthavesystematicallyoverlookedacertaingroupof
positive OUP patients. Since our gold standard data wasmade throughmanually
reviewing a text-mining algorithm to identify opioid use problems, any positive-
74
report subclass that was not captured by the text-mining algorithm may not be
representedinourmodel.Toaccountforthislimitation,wehaveintentionallyadded
all falsepositive reports as part of our trainingmodel labeled asnegative reports.
Further, a randomsample of 500negative reports (by the text-mining algorithm)
weremanuallyreviewedandfalse-negativereportswereaddedtointothetraining
setaspositivecases.
Conclusion
Thisstudy’smaincontributionwastodemonstrate thevalidityofusinga text-
miningapproachandmachine-learningtechniquestoidentifyopioiduseproblems
frompatients’clinicalnotesinmulti-sitemedicalcarefacilities.Ourresearchhasalso
highlightedthemaindemographiccharacteristics,careutilization,andco-morbidity
similarities and differences among opioid use problems identified by textmining
versustheopioiduseproblemsidentifiedbyICDcodes.Additionally,thisresearch
hascomparedmultiplemachine-learningalgorithms’performancetoidentifyopioid
use problems frompatients’ clinical notes and has reported their performance to
informfutureresearch.
FutureDirectionandRecommendations
Basedontheresultsfromthisthesis,futureresearchmayfocuson:
1. Furtheranalyzingmachine-learningcapabilitytoidentifyopioiduseproblems
from clinical notes.We recommend to buildmachine-learningmodel that is
basedonlargervalidatedtrainingcorpuswhichcomprisedofmultipleyearsof
patient’clinicalnotes.
75
2. Comparingtext-miningapproachagainstICD-10tounderstandfirst,whether
ICD-10documentationtoOUPhaschangedfromICD-9andsecond,tofurther
studythepattern,ifany,fortheOUPcasesmissedbyICDcodes.
3. ExploringtheideaofadoptingahybridapproachthatcombinesICDcodesin
addition to patients clinical notes (text mining or machine learning).
Incorporating such approach may synergize the ability of future models to
identifytheopioidissue.
4. Focusingonmeasuringtherelationshipbetweenopioiduseproblemsandthe
lengthofopioidtreatmentinlong-termopioidtherapypatients.
5. Investigatingtheuseoftextmining,naturallanguageprocessingandmachine
learning to better target specific opioid use problems behaviors (such as
overdose vs. abuse) before incorporating these techniques into surveillance
purposes or adopting it into a clinical decision system for primary care and
emergencyencounters.
76
6. APPENDICES
CHAPTER3SUPPLEMENTS
AppendixA.1VenndiagramoftheoverlapbetweenthetwopositivecohortsofopioidUseprobleminthestudy
77
Appendix A.2 Terms used by nDepth™ algorithm to identify (flag) OUP positivereports
Opioidterms Problemuseterms
codeine analgesic addict abusing aberrancy
codeine narcotic addicted misusing aberrant
codeine opiate addiction overusing daiversion
fentanyl opioid overusing divert
hydrocodone abuse
methadone abuser
morphine misuse
oxy misuser
oxy-ir overuse
oxycod overuser
oxycodone overuse
oxyContinpercocetpolypharmacyvicoVicodin
overuser
AppendixA.3Highlightedphrasesusedinthesemi-assistedmanuallyreviewprocess
Semiassistedmanualreviewhighlightedwords
abus* opioidusedisorder diversion/divert
misus* dependent high
overus* dependency euphoria
overus* heroin abnormalurinedrugscreen
Addict* Crav* dirtyurine
inconsistenturine Lostprescriptions alcoholabus*
unauthorizeddoseincrease
Lesssuggestivewords earlyrefills
takingmorethanprescribed
marijuana immediatetaper
Narcan narcotic naloxone
78
AppendixA.4ICD-9codesusedtodefineOUP
ClassCodes
DiagnosisCategory ICD-9-CMCodes
OpioidAbuseanddependence
1. 304.00(opioiddependenceunspecified)2. 304.01(opioiddependencecontinuous)3. 305.50(opioidabuseunspecified)4. 305.51(opioidabusecontinuous)5. 304.71(opioid/otherdependencecontinuous)6. 304.02(opioiddependenceepisodic)7. 304.70(opioid/otherdependenceunspecified)8. 305.52(opioidabuseepisodic)9. 304.72(opioid/otherdependenceepisodic)
Poisoning 10. Non-specificcode965Poisoningbyanalgesicsantipyreticsandantirheumatics
11. Non-specificcode965.0Poisoningbyopiatesandrelatednarcotics12. Specificcode965.00Poisoningbyopium(alkaloids),unspecified13. Specificcode965.01Poisoningbyheroin14. Specificcode965.02Poisoningbymethadone15. Specificcode965.09Poisoningbyotheropiatesandrelated
narcotics16. E850.0Accidentalpoisoningbyheroin17. E850.1Accidentalpoisoningbymethadone18. E850.2Accidentalpoisoningbyotheropiatesandrelatednarcotics
AppendixA.5FrequencydistributionofpositiveICD-9codesforOUP
FrequencydistributionofpositiveICD-9codesforOUP
ICD-9Codes Description N(%)
304 opioiddependenceunspecified 24(49%)
305.5 opioidabuseunspecified 9(18%)
304.01 opioiddependencecontinuous 5(10%)
965 Non-specificcode965.0Poisoningbyopiatesandrelatednarcotics
3(6%)
965.01 Specificcode965.01Poisoningbyheroin 3(6%)
E850.2 E850.2Accidentalpoisoningbyotheropiatesandrelatednarcotics
2(4%)
E850.0 E850.0Accidentalpoisoningbyheroin 1(2%)
965.09 Specificcode965.09Poisoningbyotheropiatesandrelatednarcotics
1(2%)
304.71 opioid/otherdependencecontinuous 1(2%)
GrandTotal 49(100)*
*4outof45patientshadtwopositiveICD-9codesforOUP
79
CHAPTER4SUPPLEMENTS
AppendixB.1J48modelperformanceon80%ofthegoldstandarddata
Class TPRate
FPRate
sPrecision Recall F-Measure
ROCArea
Positive 0.691 0.092 0.707 0.691 0.699 0.808
Negative 0.908 0.309 0.902 0.908 0.905
WeightedAvg.
0.855 0.256 0.854 0.855 0.855
AppendixB.2NaiveBayesmodelperformanceon80%ofthegoldstandarddata
Class TPRate
FPRate
Precision Recall F-Measure
ROCArea
Positive 0.625 0.146 0.578 0.625 0.601 0.851
Negative 0.854 0.375 0.877 0.854 0.865 0.776
WeightedAvg.
0.798 0.319 0.804 0.798 0.801 0.794
AppendixB.3Randomforestmodelperformanceon80%ofthegoldstandarddata
Class TPRate
FPRate
Precision Recall F-Measure
ROCArea
Positive 0.485 0.160 0.493 0.485 0.489 0.662
Negative 0.840 0.515 0.836 0.840 0.838
WeightedAvg.
0.754 0.429 0.752 0.754 0.753
AppendixB.4SMOmodelperformanceon80%ofthegoldstandarddata
Class TPRate
FPRate
Precision Recall F-Measure
ROCArea
Positive 0.721 0.083 0.737 0.721 0.729 0.819
Negative 0.917 0.279 0.911 0.917 0.914
WeightedAvg.
0.870 0.232 0.869 0.870 0.869
AppendixB.5Simplelogisticmodelperformanceon80%ofthegoldstandarddata
Class TPRate
FPRate
Precision Recall F-Measure
ROCArea
Positive 0.728 0.052 0.818 0.728 0.770 0.940
Negative 0.948 0.272 0.916 0.948 0.932
WeightedAvg.
0.895 0.219 0.892 0.895 0.892
80
AppendixB.6Alistofdocument,sentenceandphraseperformancemeasuretoclassifypositiveclass
AccuracyMeasures
Level Algorithm Precision Recall F-Measure
Document SimpleLogistic 0.918 0.904 0.911
J48 0.847 0.853 0.85
RandomForest 0.756 0.934 0.836
BinarySMO 0.797 0.838 0.817
NaiveBase 0.821 0.64 0.719
Sentence RandomForest 0.867 1 0.929
BinarySMO 0.867 1 0.929
SimpleLogistic 0.856 1 0.922
NaiveBase 0.856 1 0.922
Phrase RandomForest 0.864 1 0.927
BinarySMO 0.864 1 0.927
SimpleLogistic 0.859 1 0.924
NaiveBase 0.859 1 0.924
81
WEKADEMONSTRATION
AppendixC.1ClickExplorertoopenWEKA
AppendixC.2ClickOpenfile
82
AppendixC.3ChooseARFFtrainingdatafilefromcomputer
AppendixC.4Double-checkdatabeforeproceeding
83
AppendixC.5ClickChoosetoopenclassifieroptions
AppendixC.6Choosefilteredclassifier
AppendixC.7Rightclickonclassifiernameandclickshowproperties
84
AppendixC.8ClickChoosetoopenclassifierslist
85
AppendixC.9Chooseaclassifier[e.g.,simplelogistic]
86
AppendixC.10ClickChoosetoopenfilteroptions
87
AppendixC.11ChooseStringToWordVector
88
AppendixC.12Rightclickonfilternameandclickshowproperties
89
AppendixC.13Suggestedfiltersettingsfortextclassifications
90
AppendixC.14Suggestedfiltersettingsfortextclassificationscontinues
91
AppendixC.15Frommainclassifytab,clickMoreoptions
92
AppendixC.16ClickChoosetoopenoutputpredictions
93
AppendixC.17Choosecross-validationandclassifyingattribute
94
AppendixC.18Predictionresultsareshowninthebuffer
AppendixC.19Accuracymeasuresandmodelperformanceresultsareshownatthe
endofthebufferresults
95
AppendixC.20Rightclickonclassifiername,thenclickshowproperties
AppendixC.21ClickChoosetoopenclassifieroptions
96
AppendixC.22ChooseCostSensitiveClassifier
97
AppendixC.23Rightclicktoshowproperties
AppendixC.24Clickchoosetoopenclassifieroptionsandchooseaclassifier
98
AppendixC.25Rightclickoncostmatrixtoshowproperties
99
AppendixC.26Changeclassesnumberaccordingtothestudyclasses,thenclick
resize
100
AppendixC.27CostMatrixnowreflectsstudyclassescount
AppendixC.28Adjustpenaltyasneeded.Trainthemodel
101
JAVACODESTOEXTRACTTEXT
AppendixD.1CodestoextracttextfromPDFinspiredbyDaraYukoriginal
code[129]
packageabdullah;import java . io . File ; import java . io . FileInputStream ; import java . io .FileOutputStream;importjava.io.FileReader;importjava.io.FileWriter;importjava.io.BufferedWriter;importjava.io.IOException;importcom.itextpdf.text.Document;importcom.itextpdf.text.pdf.PdfReader;importcom.itextpdf.text.pdf.parser.PdfTextExtractor;importjava.awt.Desktop;importjavax.swing.filechooser.FileNameExtensionFilter;importjavax.swing.JFileChooser;/∗∗∗∗@author∗/public classPDFToTextConverter{publicstaticvoidmain(String[]args){selectPDFFiles();}//allow pdf f i l e s selection for converting public static voidselectPDFFiles(){JFileChooserchooser=newJFileChooser();FileNameExtensionFilterfilt
e r = new FileNameExtensionFilter (”PDF” ,” pdf ”); chooser .setFileFilter(filter);chooser.setMultiSelectionEnabled(true);intreturnVal=chooser.showOpenDialog(null);Stringpath=FilePath;if(returnVal==JFileChooser.APPROVEOPTION){
File[]Files=chooser.getSelectedFiles();System.out.println(” Please wait . . . ” ) ; for ( int i =0;i<Files . length ; i++){convertPDFToText(Files[i].toString(),path+Files[i].getName().substring(0,
Files[i].getName().length()−4)+”.txt”);}
System.out.println(”Conversioncomplete”);}
}public static voidconvertPDFToText(String src,String desc){
try{ //create file writer
FileWriterfw=newFileWriter(desc); //create buffered writer
BufferedWriterbw=newBufferedWriter(fw); //create pdfreader
PdfReaderpr=newPdfReader(src);//getthenumberofpagesinthedocumentintpNum=pr.getNumberOfPages();for(intpage=1;page<=pNum;page++){Stringtext=PdfTextExtractor.getTextFromPage(pr,page);bw.write(text);bw.newLine();
102
}bw.flush();bw.close();}catch(Exceptione){e.printStackTrace();}}
}
103
AppendixD.2CodestoextracttextfromCSV
packageabdullah;importjava.io.BufferedReader;importjava.io.BufferedWriter;importjava.io.File;importjava.io.FileNotFoundException;importjava.io.FileReader;importjava.io.FileWriter;importjava.io.IOException;importjava.util.StringTokenizer;/∗∗∗∗@author∗/public class Abdullah{
/∗∗ ∗@paramargsthecommandline arguments
∗/ public static void main( String [ ] args ) throwsFileNotFoundException,IOException{
File = new File ( F i l e P a t h ); BufferedReader br = newBufferedReader(newFileReader(file));
String line; StringTokenizer st;
intcounter=0;while((line=br.readLine())!=null){st=newStringTokenizer(line,”\t”);if(st.countTokens()>=3){st.nextToken();
Stringname=st.nextToken()+”.txt”;BufferedWriterout=newBufferedWriter(newFileWriter(”FilePath”+name));st.nextToken();if (st.hasMoreTokens()){out.write(st.nextToken());}else{
out.write(””);}out.close();counter++;
} //(st.countTokens()==4)else{System.err.println(”line\t”+counter);counter++;
}}//while}}
104
7. REFERENCES
[1] D.E. Joranson,“etal., trendsinmedicaluseandabuseofopioidanalgesics,”
Jama,vol.283,no.13,pp.1710–1714,2000.
[2] S. E. McCabe, B. T. West, and H. Wechsler, “Trends and college-level
characteristics associated with the non-medical use of prescription drugs
amonguscollegestudentsfrom1993to2001,”Addiction,vol.102,no.3,pp.
455–465,2007.
[3] R.K.Portenoy,Chronicopioidtherapyinnonmalignantpain.Journalofpainand
symptommanagement,1990.
[4] L.Manchikanti,M.B. F.M., andM.H.Ailinani, “Therapeuticuse, abuse, and
nonmedicaluseofopioids:aten-yearperspective,”PainPhysician,vol.13,pp.
401–435,2010.
[5] CDC, “National vital statistics system: Mortality data,”
http://www.cdc.gov/nchs/deaths.htm,2013,accessed:2018-01-23.
[6] N.Volkow,American’sAddictiontoOpioids:HeroinandPrescriptionDrugAbuse,
vol. 2014, 2018. [Online]. Available: https://www.drugabuse.gov/about-
nida/legislative-activities/testimonyto-congress/2016/americas-addiction-
to-opioids-heroin-prescription-drug-abuse
[7] W.M.ComptonandN.D.Volkow,“Majorincreasesinopioidanalgesicabusein
theunitedstates:concernsandstrategies,”Drugandalcoholdependence,vol.
81,no.2,pp.103–107,2006.
[8] U.D.Health andH.H. Services, “With chartbook on trends in the health of
americans.”2006Claitor’sLawBooksandPublishingDivision,2005.
105
[9] J. Clarke, A. Skoufalos, and R. Scranton, “The american opioid epidemic:
Population health implications and potential solutions. report from the
national stakeholderpanel,”Populationhealthmanagement, vol.19,pp.S1–
S10,2016.
[10] L.Manchikanti, “National drug control policy and prescription drug abuse:
factsandfallacies,”Painphysician,vol.10,no.3,pp.399–424,2007.
[11] L. Manchikanti, E. Whitfield, and F. Pallone, “Evolution of the national all
schedules prescription electronic reporting act (nasper): a public law for
balancingtreatmentofpainanddrugabuseanddiversion,”Painphysician,vol.
8,no.4,pp.335–47,2005.
[12] L.Manchikanti,“Prescriptiondrugabuse:whatisbeingdonetoaddressthis
newdrugepidemic?testimonybeforethesubcommitteeoncriminaljustice,
drugpolicyandhumanresources,”Painphysician,vol.9,no.4,pp.287–321,
2006.
[13] L.ManchikantiandA.Singh,“Therapeuticopioids:aten-yearperspectiveon
the complexities and complications of the escalating use, abuse, and
nonmedicaluseofopioids,”Painphysician,vol.11,no.2,pp.63–88,2008.
[14] A.M.GilsonandP.G.Kreis,“Theburdenofthenonmedicaluseofprescription
opioidanalgesics,”PainMedicine,vol.10,pp.S89–S100,2009.
[15] H.G.Birnbaum,“etal.,estimatedcostsofprescriptionopioidanalgesicabuse
intheunitedstatesin2001:asocietalperspective,”TheClinicaljournalofpain,
vol.22,no.8,pp.667–76,2006.
106
[16] H. G. Birnbaum and et al, “Societal costs of prescription opioid abuse,
dependence,andmisuseintheunitedstates,”Painmedicine(Malden,Mass),
vol.12,no.4,pp.657–67,2011.
[17] N.National,OverdoseDeaths-NumberofDeathsfromPrescriptionOpioidPain
Relievers, vol. 2015, 2018. [Online]. Available:
http://www.drugabuse.gov/related-topics/trends-statistics/overdosedeath-
rates
[18] Asam, “Opioid addiction 2016 facts and figures,” Addiction, 2016. [Online].
Available: http://www.asam.org/docs/default-
source/advocacy/opioidaddiction-disease-facts-figures.pdf
[19] CDC.,“Drugoverdosedeathsintheunitedstatescontinuetoincreasein2015,”
2015.[Online].Available:https://www.cdc.gov/drugoverdose/epidemic/
[20] C. M. Jones, “Heroin use and heroin use risk behaviors among nonmedical
usersofprescriptionopioidpainrelievers-unitedstates,2002-2004and2008-
2010,”Drugandalcoholdependence,vol.132,no.1,pp.95–100,2013.
[21] R.A. Rudd and et al, “Increases in drug and opioid overdose deaths-united
states,”American Journal of Transplantation, vol. 16, no. 4, pp. 1323–1327,
2016.
[22] Cdc., “Opioid painkiller prescribing,” p.1, 2014. [Online].
Available:https://www.cdc.gov/vitalsigns/opioid-prescribing/
[23] CDC, “Prescription painkiller overdoses,” vol. 2018, p. 1, 2013. [Online].
Available:
https://www.cdc.gov/vitalsigns/prescriptionpainkilleroverdoses/index.html
107
[24] Samhsa.,“Keysubstanceuseandmentalhealthindicatorsintheunitedstates:
Resultsfromthe2015nationalsurveyondruguseandhealth,”2016.[Online].
Available: https://www.samhsa.gov/data/sites/default/files/NSDUH-
FFR12015/NSDUH-FFR1-2015/NSDUH-FFR1-2015.pdf
[25] Abuse, “N. i.o.d. prescription andover-the-countermedications,” p. 1, 2015.
[Online].Available:
https://www.drugabuse.gov/publications/drugfacts/prescription-over-
countermedications
[26] R.J.Fortunaandetal.,“Prescribingofcontrolledmedicationstoadolescents
andyoungadults in theunitedstates,”Pediatrics, vol.126,no.6,pp.1108–
1116,2010.
[27] J.Duwveetal.,“Reportonthetollofopioiduseinindianaandmarioncounty,”
vol.2016,2018.[Online].Available:
https://www.inphilanthropy.org/sites/default/files/Richard
[28] C.A.Harleandetal.,“Decisionsupportforchronicpaincare:howdoprimary
carephysiciansdecidewhen toprescribeopioids?aqualitativestudy,”BMC
familypractice,vol.16,no.1,p.48,2015.
[29] J.H.Chenandet al., “Distributionofopioidsbydifferent typesofmedicare
prescribers,”JAMAinternalmedicine,vol.176,no.2,pp.259–261,2016.
[30] P.Bendtsenandetal,“Whatarethequalitiesofdilemmasexperiencedwhen
prescribingopioidsingeneralpractice?”Pain,vol.82,no.1,pp.89–96,1999.
108
[31] A.A.Bergmanandetal.,“Contrastingtensionsbetweenpatientsandpcpsin
chronicpainmanagement:aqualitativestudy,”PainMedicine,vol.14,no.11,
pp.1689–1697,2013.
[32] R.R.Leverenceandetal.,“Chronicnon-cancerpain:asirenforprimarycarea
reportfromtheprimarycaremultiethnicnetwork(primenet),”TheJournalof
theAmericanBoardofFamilyMedicine,vol.24,no.5,pp.551–561,2011.
[33] M.S.Matthias andet al., “Thepatient-provider relationship in chronicpain
care:providers’perspectives,”PainMedicine,vol.11,no.11,pp.1688–1697,
2010.
[34] A.Spitzandetal.,“Primarycareproviders’perspectiveonprescribingopioids
to older adults with chronic non-cancer pain: a qualitative study,” BMC
geriatrics,vol.11,no.1,p.35,2011.
[35] S.F.a.e.a.Butler,“Validationofascreenerandopioidassessmentmeasurefor
patientswithchronicpain,”Pain,vol.112,no.1-2,pp.65–75,2004.
[36] H. Akbik and et al., “Validation and clinical application of the screener and
opioidassessmentforpatientswithpain(soapp),”JournalofPain&Symptom
Management,vol.32,no.3,pp.287–93,2006.
[37] S.F.Butlerandetal.,“Validationoftherevisedscreenerandopioidassessment
forpatientswithpain(soapp-r),”JournalofPain,vol.9,no.4,pp.360–72,2008.
[38] ClinJPain,“Crossvalidationofthecurrentopioidmisusemeasuretomonitor
chronicpainpatientsonopioidtherapy,”ClinicalJournalofPain,vol.26,no.9,
pp.770–6,2010.
109
[39] S.M.Wuand et al., “The addictionbehaviors checklist: validationof a new
clinician-basedmeasureofinappropriateopioiduseinchronicpain,”Journal
ofpainandsymptommanagement,vol.32,no.4,pp.342–351,2006.
[40] T.JonesandA.S.D.Passik,“comparisonofmethodsofadministeringtheopioid
risktool,”JournalofOpioidManagement,vol.7,no.5,pp.347–51,2011.
[41] S.F.Butlerandetal.,“Electronicopioidriskassessmentprogramforchronic
painpatients:Barriersandbenefitsofimplementation,”PainPractice,vol.14,
no.3,pp.E98–E105,2014.
[42] Inflexxion,“Soapplongformvs.soappshortform:Tradeoffinaccuracy,”2005.
[Online].Available:https://www.painedu.com/soapp/SOAPPtradeoff.pdf
[43] T. Jones and et al., “A comparison of various risk screening methods in
predictingdischargefromopioidtreatment,”ClinicalJournalofPain,vol.28,
no.2,pp.93–100,2012.
[44] T.M.Moore, , and et al., “A comparison of common screeningmethods for
predictingaberrantdrug-relatedbehavioramongpatients receivingopioids
forchronicpainmanagement,”PainMedicine,vol.10,no.8,pp.1426–33,2009.
[45] R.WestandJ.Brown,TheoryofAddiction,2nded.-Blackwell:Wiley,2013.
[46] W.S.abuse,p.1,2016.[Online].Available:
http://www.who.int/topics/substanceabuse/en/
[47] “Triwesthealthcarealliance.substanceabusevssubstancedependence,”2016.
[Online]. Available: http://www.triwest.com/en/behavioral-
health/resourcecenter/addiction-recovery/substance-abuse/more/
110
[48] “Highlights of changes from dsm-iv-tr to dsm-5,” 2013. [Online]. Available:
http://www.dsm5.org/documents/changes
[49] C.Tools,“Dsm5criteriaforsubstanceusedisorder,”2013.[Online].Available:
http://www.buppractice.com/node/12351
[50] N.I.Abuse,“Aberrantdrugtakingbehaviors,”vol.4,2016.[Online].
Available:
https://www.drugabuse.gov/sites/default/files/files/AberrantDrugTakingBe
haviors.pdf
[51] J.Jaffe,C.Knapp,andD.Ciraulo,“Opiates:clinicalaspectssubstanceabuse:a
comprehensivetext,”WilliamsandWilkens,pp.158–66,1997.
[52] M.J.Cohenandetal.,“Ethicalperspectives:Opioidtreatmentofchronicpain
inthecontextofaddiction,”ClinicalJournalofPain,vol.18,no.4,pp.S99–S107,
2002.
[53] E.Michnaandetal.,“Predictingaberrantdrugbehaviorinpatientstreatedfor
chronic pain: importance of abuse history,” Journal of pain and symptom
management,vol.28,no.3,pp.250–258,2004.
[54] Drugabuse.gov,“Aberrantdrugtakingbehaviorsinformationsheet,”vol.1,
2018.[Online].Available:
https://www.drugabuse.gov/sites/default/files/files/AberrantDrugTakingBe
haviors.pdf
[55] T.J.Ivesandetal.,“Predictorsofopioidmisuseinpatientswithchronicpain:
aprospectivecohortstudy,”BMCHealthServicesResearch,vol.6,p.46,2006.
111
[56] M.J.Edlundandetal.,“Riskfactorsforclinicallyrecognizedopioidabuseand
dependenceamongveteransusingopioidsforchronicnon-cancerpain,”Pain,
vol.129,no.3,pp.355–62,2007.
[57] A. G. White and et al., “Analytic models to identify patients at risk for
prescriptionopioidabuse,”AmericanJournalofManagedCare,vol.15,no.12,
pp.897–906,2009.
[58] J.B.Riceandetal.,“Amodeltoidentifypatientsatriskforprescriptionopioid
abuse,dependence, andmisuse,”PainMedicine, vol. 13,no.9, pp.1162–73,
2012.
[59] M. Lewis, C. M. Herndon, and J. T. Chibnall, “Patient aberrant drug taking
behaviorsinalargefamilymedicineresidencyprogram:aretrospectivechart
review of screening practices, incidence, and predictors,” Journal of Opioid
Management,vol.10,no.3,pp.169–75,2014.
[60] J.A.Turnerandetal.,“Chronicopioidtherapyurinedrugtestinginprimary
care:prevalenceandpredictorsofaberrantresults,”JournalofGeneralInternal
Medicine,vol.29,no.12,pp.1663–71,2014.
[61] T.R.Hylanandetal.,“Automatedpredictionofriskforproblemopioiduseina
primarycaresetting,”JournalofPain,vol.16,no.4,pp.380–7,2015.
[62] P.D.Allison,LogisticregressionusingSAS:TheoryandapplicationSASInstitute,
S.Institute,Ed.,2012.
[63] E.L.Zaleandetal.,“Tobaccosmoking,nicotinedependence,andpatternsof
prescriptionopioidmisuse:Resultsfromanationallyrepresentativesample,”
Nicotine&TobaccoResearch,vol.17,no.9,pp.1096–103,2015.
112
[64] J.WilsonandA.Bock, “Thebenefitofusingbothclaimsdataandelectronic
medical record data in health care analysis,” 2012. [Online]. Available:
https://www.optum.com/content/dam/optum/resources/whitePapers/Ben
efitsof-using-both-claims-and-EMR-data-in-HC-analysis-WhitePaper-ACS.pdf
[65] E. F. Kern and et al., “Failure of icd?9?cm codes to identify patients with
comorbidchronickidneydiseaseindiabetes,”Healthservicesresearch,vol.41,
no.2,pp.564–580,2006.
[66] M.L.Hansen,P.W.Gunn,andD.C.Kaelber,“Underdiagnosisofhypertensionin
childrenandadolescents,”Jama,vol.298,no.8,pp.874–879,2007.
[67] S.F.Butlerandetal.,“Cross-validationofascreenertopredictopioidmisuse
inchronicpainpatients(soapp-r),”Journalofaddictionmedicine,vol.3,no.2,
p.66,2009.
[68] Inflexxion,“Soapp,”2015. [Online].Available:
https://www.inflexxion.com/soapp-comm/
[69] J.S.Kniselyandetal.,“Prescriptionopioidmisuseindex:abriefquestionnaire
toassessmisuse,”JournalofSubstanceAbuseTreatment,vol.35,no.4,pp.380–
6,2008.
[70] T.Jonesandetal.,“Furthervalidationofanopioidriskassessmenttool:the
briefriskinterview,”JournalofOpioidManagement,vol.10,no.5,pp.353–64,
2014.
[71] Zdnet, Within two years, 2013. [Online]. Available:
http://www.zdnet.com/article/within-two-years-80-of-all-medical-datawill-
be-unstructured/
113
[72] S. M. Smith and et al., “Classification and definition of misuse, abuse, and
related events in clinical trials: Acttion systematic review and
recommendations,”PAIN,vol.154,no.11,pp.2287–2296,2013.
[73] T.P.Stratton,L.Palombi,H.Blue,andM.E.Schneiderhan,“Ethicaldimensions
of the prescription opioid abuse crisis,” American Journal of HealthSystem
Pharmacy,vol.75,no.15,pp.1145–1150,2018.
[74] C.Gounder,“Whoisresponsibleforthepain-pillepidemic,”TheNewYorker,
2013.
[75] B.Meier,Pain killer: A”wonder” drug’s trail of addiction and death. Rodale,
2003.
[76] G. Baucus, “Baucus Grassley opioid investigation letter to American pain
foundation,” https://www.finance.senate.gov/download/baucus-
grassleyopioid-investigation-letter-to-american-pain-foundation, May 2012,
accessed:2018-08-20.
[77] G.Kounang,“Senatereportsayspatientadvocacygroupsgetkickbacksfrom
opioid-manufacturers,”
https://www.cnn.com/2018/02/12/health/senatereport-opioid-
manufacturers-donations/index.html,February2018,accessed:2018-08-20.
[78] Ornstein,“Americanpainfoundationshutsdownassenatorslaunch
investigationofprescriptionnarcotics,”
https://www.propublica.org/article/senatepanel-investigates-drug-
company-ties-to-pain-groups,May2012,accessed:2018-08-20.
114
[79] R.C.Dart,H.L.Surratt,T.J.Cicero,M.W.Parrino,S.G.Severtson,B.Bucher-
Bartelson,andJ.L.Green,“Trendsinopioidanalgesicabuseandmortalityin
theunitedstates,”NewEnglandJournalofMedicine,vol.372,no.3,pp.241–
248,2015.
[80] T. D. Saha, B. T. Kerridge, R. B. Goldstein, S. P. Chou, H. Zhang, J. Jung, R. P.
Pickering,W. J.Ruan, S.M.Smith,B.Huangetal., “Nonmedicalprescription
opioid use and dsm-5 nonmedical prescription opioid use disorder in the
unitedstates,”TheJournalofclinicalpsychiatry,vol.77,no.6,p.772,2016.
[81] S.M.Frenk,K.S.Porter,andL.Paulozzi,Prescriptionopioidanalgesicuseamong
adults: United States, 1999-2012. US Department of Health and Human
Services, Centers for Disease Control and Prevention, National Center for
HealthStatistics,2015,no.2015.
[82] A.Kolodny,D.T.Courtwright,C.S.Hwang,P.Kreiner,J.L.Eadie,T.W.Clark,and
G. C. Alexander, “The prescription opioid and heroin crisis: a public health
approachtoanepidemicofaddiction,”Annualreviewofpublichealth,vol.36,
pp.559–574,2015.
[83] Cdc., “Increases in drug and opioid-involved overdose deaths united states,
20102015,” 2016. [Online]. Available:
https://www.cdc.gov/mmwr/volumes/65/wr/mm655051e1.htm
[84] national institute of drug abuse, “Indiana opioid summary,”
https://www.drugabuse.gov/drugs-abuse/opioids/opioid-summaries-
bystate/indiana-opioid-summary,February2018,accessed:2018-08-21.
115
[85] A.H.Alzeer,J.Jones,andM.J.Bair,“Reviewoffactors,methods,andoutcome
definitionindesigningopioidabusepredictivemodels,”PainMedicine,vol.19,
no.5,pp.997–1009,2017.
[86] U.Rajaandetal.,“Textmininginhealthcare.applicationsandopportunities,”
Journalofhealthcareinformationmanagement:JHIM,vol.22,no.3,pp.52–56,
2008.
[87] R. Kukafka and et al., “Human and automated coding of rehabilitation
discharge summaries according to the international classification of
functioning,disability,andhealth,”JournaloftheAmericanMedicalInformatics
Association,vol.13,no.5,pp.508–515,2006.
[88] S. G. Peters and J. D. Buntrock, “Big data and the electronic health record,”
JournalofAmbulatoryCareManagement,vol.37,no.3,pp.206–10,2014.
[89] L.Pereiraetal.,Usingtextminingtodiagnoseandclassifyepilepsyinchildren.
ine-HealthNetworking,Applications&Services(Healthcom),2013IEEE15th
InternationalConferenceon.2013.IEEE.
[90] D.S.Carrellandetal.,“Usingnaturallanguageprocessingtoidentifyproblem
usageofprescriptionopioids,”Internationaljournalofmedicalinformatics,vol.
84,no.12,pp.1057–1064,2015.
[91] R. E. Palmer and et al., “The prevalence of problem opioid use in patients
receiving chronic opioid therapy: computer-assisted review of electronic
healthrecordclinicalnotes,”Pain,vol.156,no.7,pp.1208–14,2015.
[92] I. Hleath, “About indiana university health,” 2013. [Online]. Available:
http://iuhealth.org/images/glo-new-doc/IUH-FactSheet-07-22-131(2).pdf
116
[93] A.Ghaffarinejad andM.Kerdegary, “Relationshipof opioiddependence and
positiveandnegativesymptomsinschizophrenicpatients,”Addiction&health,
vol.1,no.2,p.69,2009.
[94] Regenstrief, “ndepth,” 2016. [Online]. Available:
http://www.regenstrief.org/resources/ndepth/
[95] Regenstrief institute, “Regenstriefdatacore,” 2018. [Online].
Available:https://www.indianactsi.org/translational-informatics/data-core/
[96] H. Harkema and et al., “Context: an algorithm for determining negation,
experiencer,andtemporalstatusfromclinicalreports,”Journalofbiomedical
informatics,vol.42,no.5,pp.839–851,2009.
[97] Cornell, “42 cfr part 2 - confidentiality of substance use disorder patient
records,” https://www.law.cornell.edu/cfr/text/42/part-2, February 2017,
accessed:2018-09-15.
[98] CDC., “U. s. prescribing rate maps,”2017. [OnlinAvailable:
https://www.cdc.gov/drugoverdose/maps/rxrate-maps.html
[99] A.Goodnough,OpioidPrescriptionsFallAfter2010Peak,C.D.C.ReportFinds,in
TheNewYorkTimes,2017.
[100] K.Newman,O.PrescriptionsandA.D.D.inUSNews,Eds.,2018.
[101] A.S.Bohnertandetal.,“Associationbetweenopioidprescribingpatternsand
opioidoverdose-relateddeaths,”Jama,vol.305,no.13,pp.1315–1321,2011.
[102] K.M.Dunnandetal.,“Opioidprescriptionsforchronicpainandoverdose:A
cohortstudy,”AnnalsofInternalMedicine,vol.152,no.2,pp.85–92,2010.
117
[103] W. M. Compton, C. M. Jones, and G. T. Baldwin, “Relationship between
nonmedicalprescription-opioiduseandheroinuse,”NewEnglandJournalof
Medicine,vol.374,no.2,pp.154–163,2016.
[104] M.Okamotoetal.,“Anasymptoticexpansionforthedistributionofthelinear
discriminantfunction,”TheAnnalsofMathematicalStatistics,vol.34,no.4,pp.
1286–1301,1963.
[105] J. Beck, “Class balancing in weka,”2018. [OnlinAvailable:
https://www.youtube.com/watch?v=V9PNyx5-kxM
[106] O.Kwonand J.M. Sim, “Effects of data set features on theperformances of
classificationalgorithms,”ExpertSystemswithApplications,vol.40,no.5,pp.
1847–1857,2013.
[107] V.Lo´pez,A.Fern´andez,S.Garc´ıa,V.Palade,andF.Herrera,“Aninsightinto
classificationwithimbalanceddata:Empiricalresultsandcurrenttrendson
usingdata intrinsic characteristics,” InformationSciences, vol.250,pp.113–
141,2013.
[108] Y.Sun,M.S.Kamel,andY.Wang,“Boostingforlearningmultipleclasseswith
imbalancedclassdistribution,”innull.IEEE,2006,pp.592–602.
[109] H.HeandE.A.Garcia,“Learningfromimbalanceddata,”IEEETransactionson
Knowledge&DataEngineering,no.9,pp.1263–1284,2008.
[110] G.E.Batista,R.C.Prati,andM.C.Monard,“Astudyofthebehaviorofseveral
methods for balancing machine learning training data,” ACM SIGKDD
explorationsnewsletter,vol.6,no.1,pp.20–29,2004.
118
[111] B. Zadrozny and C. Elkan, “Learning andmaking decisionswhen costs and
probabilitiesarebothunknown,” inProceedingsof the seventhACMSIGKDD
internationalconferenceonKnowledgediscoveryanddatamining.ACM,2001,
pp.204–213.
[112] P.Domingos,“Metacost:Ageneralmethodformakingclassifierscostsensitive,”
inProceedingsofthefifthACMSIGKDDinternationalconferenceonKnowledge
discoveryanddatamining.ACM,1999,pp.155–164.
[113] S. Lomax, “How to know that our dataset is imbalance?” 2017. [Online].
Available:
https://www.researchgate.net/post/How toknowthat ourdatasetis
imbalance
[114] Y. Sun, A. K. Wong, and M. S. Kamel, “Classification of imbalanced data: A
review,”InternationalJournalofPatternRecognitionandArtificialIntelligence,
vol.23,no.04,pp.687–719,2009.
[115] Z.-H.ZhouandX.-Y.Liu,“Trainingcost-sensitiveneuralnetworkswithmethods
addressingtheclassimbalanceproblem,”IEEETransactionsonKnowledgeand
DataEngineering,vol.18,no.1,pp.63–77,2006.
[116] “.modelfile,”http://filext.com/file-extension/MODEL,accessed:2018-08-15.
[117] R. Caruana and A. Niculescu-Mizil, “An empirical comparison of supervised
learning algorithms,” inProceedings of the 23rd international conference on
Machinelearning.ACM,2006,pp.161–168.
119
[118] R.Couronn´e,P.Probst, andA.-L.Boulesteix, “Random forestversus logistic
regression:alarge-scalebenchmarkexperiment,”BMCbioinformatics,vol.19,
no.1,p.270,2018.
[119] T.PranckeviˇciusandV.Marcinkeviˇcius,“Comparisonofnaivebayes,random
forest, decision tree, support vector machines, and logistic regression
classifiersfortextreviewsclassification,”BalticJournalofModernComputing,
vol.5,no.2,p.221,2017.
[120] C.Perlich,“Whataretheadvantagesoflogisticregressionoverdecisiontrees?
are there any cases where it’s better to use logistic regression instead of
decision trees?” https://www.quora.com/What-are-the-advantages-
oflogistic-regression-over-decision-trees-Are-there-any-cases-where-its-
better-touse-logistic-regression-instead-of-decision-trees/answer/Claudia-
Perlich,Jun2017,accessed:2018-08-16.
[121] L.M.Schwartz,S.Woloshin,H.C.Sox,B.Fischhoff,andH.G.Welch,“Uswomen’s
attitudes to false positive mammography results and detection of ductal
carcinomainsitu:crosssectionalsurvey,”Bmj,vol.320,no.7250,pp.1635–
1640,2000.
[122] J. G. Elmore, D. L. Miglioretti, L. M. Reisch, M. B. Barton, W. Kreuter, C. L.
Christiansen, and S. W. Fletcher, “Screening mammograms by community
radiologists:variabilityinfalse-positiverates,”JournaloftheNationalCancer
Institute,vol.94,no.18,pp.1373–1380,2002.
[123] K. F. Foundation, “Opioidoverdosedeath rates andall drugoverdosedeath
rates per 100,000 population (age-adjusted),”
120
https://www.kff.org/other/stateindicator/opioid-overdose-death-rates/,
2018,accessed:2018-08-29.
[124] S.Redmore,“Machinelearningvs.naturallanguageprocessing,”2013.
[125] R.Zhang,S.Ma,L. Shanahan, J.Munroe,S.Horn,andS. Speedie, “Automatic
methods to extract new york heart association classification from clinical
notes,” in Bioinformatics and Biomedicine (BIBM), 2017 IEEE International
Conferenceon.IEEE,2017,pp.1296–1299.
[126] C.-H.Lee,C.-H.Wu,andH.-C.Yang,“Textminingofclinicalrecordsforcancer
diagnosis,”inInnovativeComputing,InformationandControl,2007.
ICICIC’07.SecondInternationalConferenceon. IEEE,2007,pp.172–172.
[127] X.Dong,L.Qian,Y.Guan,L.Huang,Q.Yu,andJ.Yang,“Amulticlassclassification
method based on deep learning for named entity recognition in electronic
medical records,” in Scientific Data Summit (NYSDS), 2016 New York. IEEE,
2016,pp.1–10.
[128] Z.Wei, Z. X. Ju, X. Chun, J. Hua, andP. Jin, “An automatic electronic nursing
recordsanalysissystembasedonthetextclassificationandmachinelearning,”
in Intelligent Human-Machine Systems and Cybernetics (IHMSC), 2013 5th
InternationalConferenceon,vol.2.IEEE,2013,pp.494–498.
[129] D. Yuk, “Pdf to text converter,” http://java.worldbestlearningcenter.com/
2013/06/pdf-to-text-converter.html,June2013,accessed:2018-08-05.
CURRICULUMVITAE
AbdullahHamadAlzeer
EDUCATION
Master of Science inHealth Informatics atNorthernKentuckyUniversity,May
2012.
BachelorofScienceinPharmaceuticalscienceatKingSaudUniversity,February
2009.
EMPLOYMENT
Undergraduate Instructor, Department of Health Information Management,
IndianaUniversity,January2017-2018.
TeachingAssistant,DepartmentofClinicalPharmacy,KingSaudUniversity,June
2011-2018.
Pharmacist, Pharmacy Department, King Fahad Medical City, Riyadh, Saudi
Arabia,May2009-May2010.
PUBLICATION
Alzeer, A., Patel, J., Dixon, B., Bair, M. and Jones, J., 2018, June. A Comparison
BetweenTwoApproachestoIdentifyOpioidUseProblems:ICD-9vs.Text-Mining
Approach.In2018IEEEInternationalConferenceonHealthcareInformatics(ICHI)
(pp.455-456).IEEE.
Alzeer,A.H.,Jones,J.andBair,M.J.,2017.Reviewoffactors,methods,andoutcome
definitionindesigningopioidabusepredictivemodels.PainMedicine,19(5),pp.997-
1009.
Dixon, B.E., Alzeer, A.H., Phillips, E.O.K. andMarrero,D.G., 2016. Integration of
provider,pharmacy,andpatient-reporteddatatoimprovemedicationadherencefor
type2diabetes:acontrolledbefore-afterpilotstudy.JMIRmedicalinformatics,4(1).
PROFESSIONALPRESENTATIONS
Alzeer,AbdullahH.IdentifyOpioidUseProblem:TextMiningApproachIEEEICHI
2018.NewYork,NY.June2018.
AWARDS
BestSaudiStudentsClubPresidentinUSbySaudiArabianCultureMission.
Washington,DC.October2017.