using machine learning to automate clinical pathways
TRANSCRIPT
Using Machine Learning to Automate Clinical Pathways
David Sontag, PhD Department of Computer Science
Courant Ins@tute of Mathema@cal Sciences NYU
Joint work with my student Yoni Halpern (NYU) and Steven Horng (Beth Israel Deaconess Medical Center)
Health Informa@on Technology is Rapidly Changing
• Aided by HITECH Act, hospital adop@on of EHRs has increased 5-‐fold since 2008
[Charles et al., ONC Data Brief, May 2014]
• Over $4 billion of investment in digital health startups in 2014
Health Informa@on Technology is Rapidly Changing
Analy@cs / Big Data
Healthcare Consumer Engagement
[Wang et al., “Digital health funding in Q1 2015 over $600M”, Rock Health, April 2015]
EHR / Clinical Workflow
Digital Diagnos@cs
Popula@on Health
Management
Digital Medical Device
[Weber et al. (2014). Finding the Missing Link for Big Biomedical Data. JAMA.]
Wealth of digital health data available
Research in my clinical ML lab
• Next-‐genera*on electronic health records focus of today’s talk
• Popula@on-‐level risk stra@fica@on
• Beber managing pa@ents with chronic disease
clinicalml.org
Triage Informa@on (Free text)
Lab results (Con@nuous valued)
MD comments (free text)
Specialist consults Physician documenta@on
Repeated vital signs (con@nuous values) Measured every 30 s
T=0
30 min 2 hrs
Disposi@on
Next-Generation EHR for the Emergency Department
All pa*ent observa*ons
MD/nurse documenta@on
Billing codes Vitals Orders Labs History
Built on Top of Real-‐@me Predic@on of Clinical State Variables
All pa*ent observa*ons
Clinical state variables
MD/nurse documenta@on
Billing codes Vitals Orders Labs History
From nursing home?
Has altered mental status?
Has cardiac e@ology?
Has infec@on?
Will die in next 30 days?
Built on Top of Real-‐@me Predic@on of Clinical State Variables
Machine learning and natural language processing
All pa*ent observa*ons
Clinical state variables
MD/nurse documenta@on
Billing codes Vitals Orders Labs History
Ac*on Alerts/Reminders
Decision support Cohort Selec@on QA review Contextual display
From nursing home?
Has altered mental status?
Has cardiac e@ology?
Has infec@on?
Will die in next 30 days?
Built on Top of Real-‐@me Predic@on of Clinical State Variables
Machine learning and natural language processing
All pa*ent observa*ons
Clinical state variables
MD/nurse documenta@on
Billing codes Vitals Orders Labs History
Ac*on Alerts/Reminders
Decision support Cohort Selec@on QA review Contextual display
From nursing home?
Has altered mental status?
Has cardiac e@ology?
Has infec@on?
Will die in next 30 days?
Built on Top of Real-‐@me Predic@on of Clinical State Variables
Machine learning and natural language processing
Advise fall precau@ons
Suggested order sets
Triggering celluli@s pathway
Sepsis alert Panel management
Example: Triggering Clinical Pathways
• Clinical Pathways project at Beth Israel Deaconess Medical Center (BIDMC)
• Standardizing care in the Emergency Department
– Reduce possibili@es for error
– Enforce established best prac@ces
• Pathways have been shown to reduce in-‐hospital complica@ons, without increasing costs [Rober et al 2010]
Current triggering mechanism (Celluli@s pathway)
Trigger if chief complaint contains any of the following:
CELLULITIS, REDDENED HOT LIMB, ERYTHEMA, LEG SWELLING, INFECTION, HAND, LEG, FOOT, TOE, ARM, FACE, FINGER
Current triggering mechanism (Celluli@s pathway)
Trigger if chief complaint contains any of the following:
CELLULITIS, REDDENED HOT LIMB, ERYTHEMA, LEG SWELLING, INFECTION, HAND, LEG, FOOT, TOE, ARM, FACE, FINGER
Expert constructed rule – built for sensi*vity
Could we learn a beber rule?
Supervised learning is a non-‐starter
• Leverage large clinical databases to learn predic@ve rules.
• Need labeled data • Classifiers onen don’t generalize across ins@tu@ons
LOINC& UMLS&CUID& RXnorm& ICD9& Unstructured&Data&
Our contribu@on: Anchor & Learn Framework
• Use a combina@on of domain exper@se (simple rules) and vast amounts of data (machine learning).
• Method does not require any manual labeling.
• Anchors are highly transferable between ins@tu@ons.
[Halpern et al., AMIA 2014]
What are anchors?
• Rather than provide gold-‐standard labels, construct a simple rule that can catch some posi@ve cases.
What are anchors?
• Rather than provide gold-‐standard labels, construct a simple rule that can catch some posi@ve cases.
• Examples:
Phenotype Possible Anchor Diabe@c gsn:016313 (insulin) in Medica@ons
Cardiac ICD9:428.X (heart failure) in Diagnoses
Nursing home “from nursing home” in text
Social work “social work consulted” in text
What are anchors?
• Rather than provide gold-‐standard labels, construct a simple rule that can catch some posi@ve cases. Low sensi*vity here is ok!
• Examples:
Phenotype Possible Anchor Diabe@c gsn:016313 (insulin) in Medica@ons
Cardiac ICD9:428.X (heart failure) in Diagnoses
Nursing home “from nursing home” in text
Social work “social work consulted” in text
Learning with Anchors LOINC& UMLS&CUID& RXnorm& ICD9& Unstructured&Data&
Pa@ent database
• Iden@fy anchors
Learning with Anchors LOINC& UMLS&CUID& RXnorm& ICD9& Unstructured&Data&
Pa@ent database
1011001
• Iden@fy anchors
Learning with Anchors LOINC& UMLS&CUID& RXnorm& ICD9& Unstructured&Data&
Pa@ent database
1011001
• Iden@fy anchors • Learn to predict the anchors (anchor as pseudo-‐labels)
Learning with Anchors LOINC& UMLS&CUID& RXnorm& ICD9& Unstructured&Data&
Pa@ent database
1011001
• Iden@fy anchors • Learn to predict the anchors (anchor as pseudo-‐labels) • Account for the difference between anchors and labels
Transform
Predict anchor Predict label
LOINC& UMLS&CUID& RXnorm& ICD9& Different&data&types&
New ins@tu@on
Generalizability/Portability
Data may be very different: • Language • Representa@on • Popula@on
New ins@tu@on
Generalizability/Portability
As long as our anchors appear in the new data as well…
LOINC& UMLS&CUID& RXnorm& ICD9& Different&data&types&
New ins@tu@on
Generalizability/Portability
As long as our anchors appear in the new data as well… Can learn a new model, specific to the new ins@tu@on.
LOINC& UMLS&CUID& RXnorm& ICD9& Different&data&types&
New ins@tu@on
Generalizability/Portability
As long as our anchors appear in the new data as well… Can learn a new model, specific to the new ins@tu@on.
Only need to share anchor defini*ons, Each site trains models on its own data.
LOINC& UMLS&CUID& RXnorm& ICD9& Different&data&types&
Theore@cal basis for anchors • Unobserved variable: Y, Observa@on: A • A is an anchor for Y if condi@oning on A=1 gives uniform samples from the set of posi8ve cases.
Theore@cal basis for anchors • Unobserved variable: Y, Observa@on: A • A is an anchor for Y if condi@oning on A=1 gives uniform samples from the set of posi8ve cases.
• Alterna@ve formula@on – two necessary condi@ons:
P (Y = 1|A = 1) = 1Posi*ve condi*on
A ? X|YCondi*onal independence
AND
X represents all other observa@ons.
Theore@cal basis for anchors • Unobserved variable: Y, Observa@on: A • A is an anchor for Y if condi@oning on A=1 gives uniform samples from the set of posi8ve cases.
• Alterna@ve formula@on – two necessary condi@ons:
P (Y = 1|A = 1) = 1Posi*ve condi*on
A ? X|YCondi*onal independence
AND
X represents all other observa@ons.
e.g. If pa@ent is taking insulin, the pa@ent is surely diabe*c.
e.g. If we know the pa@ent had heart failure, knowing whether the diagnosis code appears does inform us about the rest of the record.
Theore@cal basis for anchors • Unobserved variable: Y, Observa@on: A • A is an anchor for Y if condi@oning on A=1 gives uniform samples from the set of posi8ve cases.
• Theorem [Elkan & Noto 2008]:
In the above se>ng, a func8on to predict A
can be transformed to predict Y
• Can also use more recent advances on learning with noisy labels (e.g., Natarajan et al., NIPS ‘13)
Learning with anchors Input: anchor A
unlabeled pa@ents Output: predic@on rule 1. Learn a calibrated classifier (e.g.
logis@c regression) to predict:
2. Using a validate set, let P be the pa@ents with A=1. Compute:
3. For a previously unseen pa@ent t, predict:
Pr(A = 1 | X̃ )
C =1
|P|X
k2PPr(A = 1 | X̃ (k))
[Elkan & Noto 2008]
1
CPr(A = 1|X (t)) if A(t) = 0
1 if A(t) = 1
Calibra*on C is the average model
predic@on for pa@ents with anchors.
Learning Learn to predict A from the other variables.
Transforma*on If no anchor present,
according to a scaled version of the anchor-‐predic@on
model.
…
…
Specified anchors Automated sugges@ons
Detailed pa@ent display
Ranked pa@ent list
Pa@ent filters
User interface to specify anchors
Rapid itera*on ~30 min to add a new clinical state
variable
Sonware freely available: clinicalml.org
Learned model: Celluli@s
Pyxis
Unstructured text
Anchors Highly weighted features (covariates)
ICD9 680-‐686: Infec*ons of skin and subcutaneous *ssue
celluli*s celluli*c cellulits paronychia pilonidal bite cyst boil
abcess abscess abcesses red redness reddness erythema
unasyn vanco finger thumb rle lle gluteal
cephalexin vancomycin clindamycin
cephazolin amoxicillin sulfameth/trimeth
(using 200K pa@ents’ data, 2008-‐2013)
Learned model: Cardiac E@ology
ICD9 codes 410.* acute MI
411.* other acute … 413.* angina pectoris 785.51 card. shock
Pyxis coron. vasodilators
loop diure@c
Anchors
cmed
Ages age=80-‐90 age=70-‐80 age=90+
nstemi stemi ntg lasix nitro
lasix furosemide
Medica*ons aspirin
clopidogrel Heparin Sodium Metoprolol Tartrate
Morphine Sulfate Integrilin Labetalol
Pyxis
Unstructured text
cp chest pain edema cmed
chf exacerba@on sob
pedal edema
Sex=M
Highly weighted features (covariates)
(using 200K pa@ents’ data, 2008-‐2013)
Learned model: Nursing Home
nursing facility
nursing home nsg facility
nsg home nsg. home
from staff at
resident sent
reported
Ages age=90+ age=80-‐90 age=70-‐80
baseline changes nonverbal
ams unwitnessed_fall
confusion
senna colace
trazodone
dnr full code g tube foley nh
Medica*ons
vancomycin levofloxacin
Pyxis
Unstructured text
Anchors
mirtazapine maalox tums
Highly weighted features (covariates)
(using 200K pa@ents’ data, 2008-‐2013)
Evalua@on: ED red flags
• Ac@ve malignancy • Fall • Cardiac E@ology • Infec@on • From Nursing Home
• An@coagulated • Immunosuppressed • Sep@c Shock • Pneumonia
We gathered gold standard labels for these 9 variables by adding ques@ons to EMR at @me of ED disposi@on:
Comparison to Exis@ng Approaches
• (Rules) Predict just according to the anchors. – 1 if anchor is present, 0 otherwise
• (ML) Machine learning (logis@c regression) – Using up to 3K labels – Improves with more labels, but labels are expensive!
Scaling this up • Currently making predic@ons for 40 clinical variables within the BIDMC pa*ent display – e.g. allergic reac@on, motor vehicle accident, hiv+
• Only turned on for a small number of clinicians
Suggested tags: MD can accept/reject
Scaling this up • Currently making predic@ons for 40 clinical variables within the BIDMC pa*ent display – e.g. allergic reac@on, motor vehicle accident, hiv+
Accep@ng a tag triggers events (pathway enrollment, specialized order sets, etc)