predicting microculture results for optimized antibiotic...

Predicting Microculture Results for Optimized Antibiotic Treatment

Conor K Corbin 1

AbstractOver 700000 patients die a year due to antibi-otic resistant infections Antibiotic resistance isa growing public health concern It is estimatedthat up to 50 of antibiotic use in medical set-tings is either inappropriate or sub-optimal (CDC)We target a specific inpatient clinical workflowinvolving antibiotic distribution and design ma-chine learning classifiers to intervene at variousinstances We build classifiers to predict micro-cultures labs that will not grow bacteria (a proxyfor predicting lack of infection) and classifiersthat predict bug susceptibility to an array of an-tibiotics Our classifiers achieve an AUROC of067 and 070 when predicting no growth bloodand urine cultures respectively Our susceptibilityclassifiers contain operating regions that allow de-escalation to targeted antibiotics in cases wherede-escalation would not occur using current stateof the art tools

1 IntroductionAntibiotic resistance is one of the most serious public healthconcerns and threatens to return us to an age where infec-tions were the leading cause of death in the United StatesAntibiotics grant enormous utility but when used improp-erly enable bacteria to develop immunity In an effort tocombat this issue the Joint Commission has developed an-timicrobial stewardship standards that hospitals around thecountry are expected to follow These standards establishclinical workflows designed to optimize antibiotic treatmentWhen a patient presents with infection like symptoms mi-crocultures are ordered Within a week results of thesecultures are returned detailing the organism causing theinfection (if one exists) and a set of antibiotics to which it issusceptible In the meantime empiric antibiotics (usuallybroad-spectrum) are administered to the patient to maximizethe likelihood that they positively respond Once results arereturned physicians transition their patients to a more tar-geted antibiotic therapy A graphical illustration of the thisworkflow is depicted in Figure 1 Some institutions aug-ment this workflow with the use of antibiograms - tablesdepicting drugbug susceptibility patterns using local hos-

pital data (Hindler amp Stelling 2007) In some cases usingthis tool allows clinicians to de-escalate their patients toless broad-spectrum drugs as soon as the infecting agent isknown Most of the time de-escalation occurs only aftersusceptibility results are returned An example antibiogramis shown in Figure 2 During empiric treatment patients areexposed to broad-spectrum drugs that 1) contribute to thegrowing problem of antibiotic resistance and 2) often causeserious side effects (Giuliano et al 2016) We experimentwith predictive models designed to intervene at several in-stances of this workflow in an attempt to maximize patientsafety and minimize wasteful antibiotic use

2 Related WorkSeveral studies have looked at inefficiencies of empiric an-tibiotic treatment in hospital settings These studies breakdown into roughly three categories inference studies MLstudies and ML + policy studies On the inference side(Anstey et al 2018) compute likelihood ratios of infectiongiven various predictors and find significant associationswith transient hypotension and central line presence On theML side (Hernandez et al 2017) fit a series of classifiers(Naive Bayes Decision Tree Random Forest and SupportVector Machine) to microculture data to predict the pres-ence of infection They limit their analysis to six predictorsalanine aminotransferase alkaline phosphatase bilirubincreatinine C-Reactive protein and white blood cell countand achieve an AUROC of 08 Another group looks specif-ically at predicting the presence of urinary tract infectionfrom urine cultures taken by primary care doctors in Den-mark (Ribers amp Ullrich 2019) They fit a Random Forestclassifier and use a variety of different features includingprior lab tests and values medications and comorbiditiesto aid their predictions The authors work with a datasetcontaining 95594 urine samples and achieve an overall AU-ROC of 073 when predicting bacterial growth on out ofbag samples The authors take a step further and use thepredicted probabilities that their machine learning classi-fier generates to implement an antibiotic prescribing policyThey compare the policy to the prescribing patterns of physi-cians The TREAT decision support system uses a causalprobabilistic network to estimate the probability of infectionits severity source pathogen distribution and antibiotic cov-erage (Paul et al 2006) A prospective cohort study testing


Figure 1 Graphic illustration of microculture workflow in an in-patient setting On Day 1 a patient shows up to the hospital withsigns of infection Their physician prescribes empiric antibiotics(broad-spectrum) to maximize the likelihood the patient respondsAt the same time a microculture is ordered At Day 3 microcul-ture results come back detailing the infecting agent if one existsAt Day 5 susceptibility results come back detailing an array ofantibiotics for which the bacteria is susceptible to and the patientis de-escalated to directed therapy

this technology on 1203 patients revealed an increased rateof appropriate empiric antibiotic prescriptions compared tophysicians (70 vs 57) while using less broad-spectrumantibiotics Antibiotic resistance patterns are non-stationaryand new analysis and support systems are needed to leveragepatient data to optimize antimicrobial treatment pathways

3 Dataset and FeaturesWe fit predictive models using data from STRIDEs (Stan-ford Translational Research Integrated Database Environ-ment) inpatient clinical data warehouse which contains de-identified patient EMRs (electronic medical records) thatspan between 2008-2017 Our dataset contains 115k uniquepatient records and 200k unique encounters Patient in-formation stored in EMRs include patient demographicscomorbidities lab orders and results vital signs medica-tions and treatment teams Of particular interest in thisstudy is STRIDErsquos microculture table which includes mi-croculture orders results and bacteria susceptibility testinginformation on over 80000 blood and 30000 urine culturesrespectively

4 MethodsWe implement predictive models to intervene at both day1 - the point at which microcultures are ordered and day3 - the point at which the identity of the infecting agent isobserved For our first level of analysis we train binaryclassifiers to predict whether or not bacteria will grow at allWe do this for two types of microcultures - blood culturesand urine cultures Our dataset includes 80k blood culturesof which 95 come back negative and 30k urine culturesof which 75 come back negative Clinically the ability to

Figure 2 Antibiogram constructed from Stanford Hospital dataRows are bugs columns are drugs Each value indicates the percentof time a particular bug was susceptible to a drug Clinicians usethese tables to inform their antibiotic prescription decisions Use ofan antibiogram sometimes allows clinicians to de-escalate patientsto more targeted therapy at Day 3 - when the name of the infectingagent is known Often the drop in percent effectiveness from abroad spectrum drug to something more targeted is too much torisk

predict negative cultures with high confidence is interestingbecause it suggests that a machine learning tool could helpprevent needless antibiotic treatment Our positive labels arethus negative cultures Our second level of analysis predictsbacteria susceptibility to a set of antibiotics - independent ofthe type of infecting agent Our last set of classifiers predictsusceptibility to a set of antibiotics given the bacteria typeWe contrast the utility of these classifiers with antibiograms -the current state of the art tool informing clinician antibioticprescription decisions

41 Feature Engineering

We use patient medical information stored in EMRs to makepredictions We leverage patient demographics comorbidi-ties prior lab tests (including prior microcultures and bugsusceptibility screenings) vital signs (including heart beatand blood pressure) medication use (including prior an-tibiotic prescriptions) and treatment teams Categoricalvariables (comorbidities treatment teams and lab orders)are treated as counts over varying time windows in a patienthistory Each categorical features is represented as countsover the past (1 2 4 7 14 30 90 180 365 730 1460)patient days We also include the total number of occur-rences over the patientrsquos entire medical history - and thetime in days since the last occurrence For features thatcontain numerical values (heart beat blood pressure whiteblood cell count) we compute summary statistics over thepast 14 day window These summary statistics include theminimum maximum median mean standard deviationfirst last and slope over the window Missing values areimputed by taking the mean over columns using our trainingset

42 Feature amp Model Selection

Our resulting design matrix above is sparse and containsmany columns with redundant information We utilize arecursive feature selection algorithm in an attempt to reduce


Figure 3 Patient data available in an EMR illustrated as a medical timeline (A) Data include patient demographics diagnosis labs vitalstreatment teams and prior medications The data is time series and needs to be transformed into a patient feature matrix (B) before itcan be fed into downstream machine learning models This is done by taking counts of categorical events and summary statistics of realvalued events over varying time windows

the variance of our resulting classifiers and eliminate uselessfeatures We do this by recursively training a Random Forestclassifier on our current feature space ranking each featurebased on how well it decreases node impurity (GINI score)(Menze et al 2009) and removing the least important fea-tures until a desired number remain (we empirically choose5) This is done using only our training data We use theresulting design matrix to construct our models We trainboth a linear Logistic Regression and non-linear RandomForest We add L1-regularization to our Logistic Regres-sion to further reduce the variance of our model We tunethe regularization coefficient with a k=10 cross-validationgrid-search using only our training data We similarly tunethe number of trees max depth min samples required forsplit and max features of our Random Forest model Wesubsequently train our best linear and non-linear model onour entire training set and evaluate on our test set Ourtraining and test sets are obtained with an 8020 split Wesplit over the patient id column ensuring that microculturesfrom the same patient do not exist in both the training andtest set

5 Experiments amp Results51 Predicting No Growth Cultures

Our first set of analysis involves predicting whether or notbacteria will grow at all in microcultures ordered at StanfordHospital We look at two types of microcultures bloodcultures (N=80000) and urine cultures (N=30000) 96of all blood cultures at Stanford come back negative 75 ofall urine cultures are negative Our L1 Logistic Regressionachieved an AUROC of 064 and 065 predicting no growth

blood and urine cultures respectively The Random Forestmodel achieved an AUROC of 066 and 070 We showan ROC and precision recall curve for the Random Forestmodel in Figure 4

52 Predicting Bug Susceptibility

We next predict bug susceptibility to an array of commonlyprescribed antibiotics given that bacteria has grown in theextracted culture Our positive labels are cultures that grewbacteria susceptible to the antibiotic in question Negativelabels are cultures that grew bacteria not susceptible to theantibiotic Because we rely on the prior of bacterial growththese classifiers are currently designed to intervene at Day 3of the workflow It is worth noting however that we only usedata up until the point at which the microculture is ordered -and we do not make use of the name of the infecting agentwhen making our predictions To instead intervene at ordertime (Day 1) we could more carefully design our test setsuch that both positive and negative growth cultures areappropriately represented Here we train a Random Forestclassifier for 16 binary prediction tasks corresponding to16 different antibiotics In Table 1 we show the AUROCAUPRC and baseline prevalence for the top and bottom per-formers Our top two performers (Vancomycin and Zosyn)motivate a path forward we illustrate in our discussion

53 Personalized Antibiograms

Our last analysis focuses on classifiers specifically designedto intervene at Day 3 - the point at which the infectingagent is returned Here we use data up until the point theresults of the culture are returned including the name ofthe infecting agent to predict a set of antibiotics for which


Figure 4 ROC and precision recall curves for our Random Forest Model predicting no growth blood and urine cultures Our modelachieves an AUROC of 066 and 070 on blood and urine cultures respectively They achieve an AUPRC of 098 and 088 respectively Inboth precision recall curves we box an operating region where our classifiers can predict no growth with gt 95 confidence While a nogrowth culture does not completely imply no infection it is fair to assume that a large portion of these patients are receiving inappropriateantibiotics

Table 1 AUROC AUPRC and baseline positive class prevalencefor best and worst classifiers predicting bug susceptibility

DRUG AUROC AUPRC PREVALENCE

VANCOMYCIN 086 096 092ZOSYN 070 096 093DAPTOMYCIN 067 097 095 AMPSULBACTAM 055 061 056CEFTRIAXONE 050 088 087OXACILLIN 049 071 071

a specific type of bacteria (E Coli) is susceptible to Be-cause we use the name of the infecting agent as a featurein these classifiers we can directly compare the utility ofour predictions to the utility of antibiograms - the tool seenin Figure 2 clinicians currently use to help make properprescription choices We ask the question - do personal-ized antibiograms allow clinicians to de-escalate to lessbroad-spectrum drugs when antibiograms alone do not Wefocus on the following de-escalation pathway - a set of 4antibiotics listed from broadest-spectrum to most targetedMeropenem - PiperacillonTazobactam - Ceftriaxone - Ce-fazolin An antibiogram revealing baseline susceptibility ofEColi to these four drugs constructed from Stanford Hos-pital data is seen in Table 2 We train classifiers to predictEColi susceptibility to PiperacillonTazobactam Ceftriax-one and Cefazolin Precision recall curves are shown foreach of these three classifiers in Figure 4 Note the baselinefor each precision recall curve perfectly corresponds to thevalue in the antibiogram - because the antibiogram depictspercent susceptible

Table 2 Antibiogram depicting E Coli susceptibility to a set ofantibiotics in a common de-escalation pathway

DRUG SUSCEPTIBILITY ISOLATES

MEROPENEM 099 1560PIPTAZO 092 3129CEFTRIAXONE 076 883CEFAZOLIN 075 3632

6 Discussion amp Future Work61 Predicting No Growth Cultures

As seen from Figure 3 we are able to predict no growthmicrocultures with reasonable performance We highlighton operating region for both blood and urine culture clas-sifiers where we are able to predict no growth with gt 95certainty Although a no growth blood or urine culture isnot entirely indicative of the fact that a patient is not in-fected (infection may not be in the blood stream or may notbe a urinary tract infection) it is highly likely that a largeportion of the patients depicted in this operating region arereceiving antibiotics without actually being infected A pathforward would be to generate more sophisticated class la-bels to further separate infected from not infected and get atrue number of how many times the misuse of antibiotics onnon-infected patients could have been prevented


Our second set of classifiers had high variance in predictiveperformance We were able to predict bug susceptibilityto Vancomycin Zosyn and Daptomycin with fairly high


Figure 5 Precision recall curves showing how well we can predict E Coli susceptibility to PiperacillonTazobactam Ceftriaxone andCefazolin using EMR data These classifiers allow us to make patient level predictions Baseline positive class prevalence is shown withthe dotted black line Dotted colored lines indicate baseline prevalence of the more broad-spectrum drug immediately to the left of thedrug in question We box operating regions in these precision recall curves where the classifiers would allow clinicians to de-escalate to aless broad-spectrum drug in cases where an antibiogram alone would not

performance - refer to Table 1 for values Other drugs(AmpicillinSulbactam Ceftriaxone and Oxacillin) we didessentially no better than random chance Our top classi-fiers (Vancomycin and Zosyn) are interesting because thesetwo drugs are often prescribed together as an empiric regi-ment Zosyn is a broad-spectrum drug adept at killing gramnegative bacteria (bacteria with an outer cell wall) Zosyndoes not cover MRSA (Methecillin Resistant Staph Aureus)Vancomycin does 85 of the time Vancomycin and Zosynare prescribed together MRSA is not the infecting agentand Zosyn alone would have sufficed This is worrisomenot just because it needlessly contributes to antibiotic re-sistance but also because Vancomycin and Zosyn takentogether is associated with acute kidney disease (Giulianoet al 2016) An interesting path forward would be to traina classifier to predict the absence of MRSA and attempt tofind an operating region where certain patients could safelybe de-escaleted from Vancomycin + Zosyn to Zosyn

63 Personalized Antibiogram

Our last set of classifiers (operating at Day 3) allow clin-icians to safely de-escalate to less broad-spectrum drugswhere antibiograms alone could not This is illustrated inFigure 3 As an example the drop in percent of EColisusceptible to piperacillontazpbactam vs ceftriaxone (dataavailable with an antibiogram) is quite large - 92 to 76In most cases a clinician will not be willing to take theadded risk and de-escalate their patient infected with EColito ceftriaxone By leveraging a classifier trained on EMRdata we are able to find an operating region where patientsinfected with EColi are just as likely to be susceptible toceftriaxone as they are piperacillontazobactam given the

antibiogram data - 92 In these cases a clincian would bewilling to de-escalate their patient to ceftriaxone - the moretargeted therapy

7 ConclusionsAntibiotic resistance is a growing public health concernand is only accelerated by systemic misuse of antibioticsin medical settings We target a specific clinical workflowand build various machine learning classifiers designed tointervene at particular instances We demonstrate an abilityto predict no growth cultures at order time and an ability toleverage ML to de-escalate patients to more targeted therapyin cases where this could not be done with the current stateof the art Machine learning shows promise in its ability tohelp physicians optimize antibiotic prescriptions

8 ContributionsThough I did not have a CS229 partner I would like to ac-knowledge Kojo Osei Rich Medford and Song Xu from theStanford HealthRex lab Kojo began the project as a Mas-terrsquos student last year and developed the original frameworkto predict no growth cultures Rich Medford is a infectiousdisease specialist at the UT Southwest Medical center andgave useful insight when I became project lead Song Xuwas a post doc at Stanfordrsquos HealthRex Lab and helped de-velop the machine learning pipeline Irsquod also like to thankmy research advisor Jonathan Chen for guidance throughoutthe quarter


9 CodeMicroCulture Project

ReferencesCDC Antibiotic use in the united

states httpswwwcdcgovantibiotic-usestewardship-reportpdfstewardship-reportpdf Accessed2019-04-26

Anstey J E Murray S and Yim J W Predicting bac-teremia in hospitalized patients An analysis of elec-tronic health record data In JOURNAL OF GENERALINTERNAL MEDICINE volume 33 pp S297ndashS297SPRINGER 233 SPRING ST NEW YORK NY 10013USA 2018

Giuliano C A Patel C R and Kale-Pradhan P B Is thecombination of piperacillin-tazobactam and vancomycinassociated with development of acute kidney injury ameta-analysis Pharmacotherapy The Journal of HumanPharmacology and Drug Therapy 36(12)1217ndash12282016

Hernandez B Herrero P Rawson T M Moore L SEvans B Toumazou C Holmes A H and GeorgiouP Supervised learning for infection risk inference usingpathology data BMC medical informatics and decisionmaking 17(1)168 2017

Hindler J F and Stelling J Analysis and presentation of cu-mulative antibiograms a new consensus guideline fromthe clinical and laboratory standards institute Clinicalinfectious diseases 44(6)867ndash873 2007

Menze B H Kelm B M Masuch R HimmelreichU Bachert P Petrich W and Hamprecht F A Acomparison of random forest and its gini importance withstandard chemometric methods for the feature selectionand classification of spectral data BMC bioinformatics10(1)213 2009

Paul M Andreassen S Tacconelli E Nielsen A DAlmanasreh N Frank U Cauda R Leibovici L andGroup T S Improving empirical antibiotic treatment us-ing treat a computerized decision support system clusterrandomized trial Journal of Antimicrobial Chemother-apy 58(6)1238ndash1245 2006

Ribers M A and Ullrich H Battling antibiotic resistanceCan machine learning improve prescribing 2019


Figure 1 Graphic illustration of microculture workflow in an in-patient setting On Day 1 a patient shows up to the hospital withsigns of infection Their physician prescribes empiric antibiotics(broad-spectrum) to maximize the likelihood the patient respondsAt the same time a microculture is ordered At Day 3 microcul-ture results come back detailing the infecting agent if one existsAt Day 5 susceptibility results come back detailing an array ofantibiotics for which the bacteria is susceptible to and the patientis de-escalated to directed therapy

this technology on 1203 patients revealed an increased rateof appropriate empiric antibiotic prescriptions compared tophysicians (70 vs 57) while using less broad-spectrumantibiotics Antibiotic resistance patterns are non-stationaryand new analysis and support systems are needed to leveragepatient data to optimize antimicrobial treatment pathways

3 Dataset and FeaturesWe fit predictive models using data from STRIDEs (Stan-ford Translational Research Integrated Database Environ-ment) inpatient clinical data warehouse which contains de-identified patient EMRs (electronic medical records) thatspan between 2008-2017 Our dataset contains 115k uniquepatient records and 200k unique encounters Patient in-formation stored in EMRs include patient demographicscomorbidities lab orders and results vital signs medica-tions and treatment teams Of particular interest in thisstudy is STRIDErsquos microculture table which includes mi-croculture orders results and bacteria susceptibility testinginformation on over 80000 blood and 30000 urine culturesrespectively

4 MethodsWe implement predictive models to intervene at both day1 - the point at which microcultures are ordered and day3 - the point at which the identity of the infecting agent isobserved For our first level of analysis we train binaryclassifiers to predict whether or not bacteria will grow at allWe do this for two types of microcultures - blood culturesand urine cultures Our dataset includes 80k blood culturesof which 95 come back negative and 30k urine culturesof which 75 come back negative Clinically the ability to

Figure 2 Antibiogram constructed from Stanford Hospital dataRows are bugs columns are drugs Each value indicates the percentof time a particular bug was susceptible to a drug Clinicians usethese tables to inform their antibiotic prescription decisions Use ofan antibiogram sometimes allows clinicians to de-escalate patientsto more targeted therapy at Day 3 - when the name of the infectingagent is known Often the drop in percent effectiveness from abroad spectrum drug to something more targeted is too much torisk

predict negative cultures with high confidence is interestingbecause it suggests that a machine learning tool could helpprevent needless antibiotic treatment Our positive labels arethus negative cultures Our second level of analysis predictsbacteria susceptibility to a set of antibiotics - independent ofthe type of infecting agent Our last set of classifiers predictsusceptibility to a set of antibiotics given the bacteria typeWe contrast the utility of these classifiers with antibiograms -the current state of the art tool informing clinician antibioticprescription decisions

41 Feature Engineering

We use patient medical information stored in EMRs to makepredictions We leverage patient demographics comorbidi-ties prior lab tests (including prior microcultures and bugsusceptibility screenings) vital signs (including heart beatand blood pressure) medication use (including prior an-tibiotic prescriptions) and treatment teams Categoricalvariables (comorbidities treatment teams and lab orders)are treated as counts over varying time windows in a patienthistory Each categorical features is represented as countsover the past (1 2 4 7 14 30 90 180 365 730 1460)patient days We also include the total number of occur-rences over the patientrsquos entire medical history - and thetime in days since the last occurrence For features thatcontain numerical values (heart beat blood pressure whiteblood cell count) we compute summary statistics over thepast 14 day window These summary statistics include theminimum maximum median mean standard deviationfirst last and slope over the window Missing values areimputed by taking the mean over columns using our trainingset

42 Feature amp Model Selection

Our resulting design matrix above is sparse and containsmany columns with redundant information We utilize arecursive feature selection algorithm in an attempt to reduce











































predicting microculture results for optimized antibiotic...

Documents