2014-01-07 unreasonable effectiveness of data...the learning health care system every clinical...
TRANSCRIPT
![Page 1: 2014-01-07 Unreasonable Effectiveness of Data...The Learning Health Care System Every clinical encounter is a “natural experiment” Comprehensive data to support analysis of observational](https://reader034.vdocument.in/reader034/viewer/2022042310/5ed7532a60a80d707700c251/html5/thumbnails/1.jpg)
The Unreasonable Effectiveness of Data
Peter Szolovits
MIT Computer Science and Artificial Intelligence Lab
Professor of Computer Science and Engineering
Professor of Health Sciences and Technology, IMES
Critical Data: Secondary Use of Big Data from Critical Care
January 7, 2014
Peter Norvig, IEEE Intelligent Systems, 2009
“... a large training set of the input-output behavior that we seek to automate is available to us in the wild.”
Google’s Lessons
• Much of human knowledge is not like physics!
• “... invariably, simple models and a lot of data trump more elaborate models based on less data”
• “... simple n-gram models or linear classifiers based on millions of specific features perform better than elaborate models that try to discover general rules”
• “... all the experimental evidence from the last decade suggests that throwing away rare events is almost always a bad idea, because much Web data consists of individually rare but collectively frequent events”
More Data vs. Better Algorithms (Word Sense Disambiguation Task)
http://www.stanford.edu/group/mmds/slides2010/Norvig.pdf
Large Language Models in Machine Translation
Thorsten Brants, Ashok C. Popat, Peng Xu, Franz J. Och, and Jeffrey Dean. Large language models in machine translation. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 858–867, Prague, Czech Republic, 2007.
Vioxx and Heart Attacks
“Trends in inpatient stay due to MI were tightly coupled to the rise and fall of prescriptions of COX-2 inhibitors, with an 18.5% increase in inpatient stays for MI when both rofecoxib and celecoxib were on the market (P<0.001). For every million prescriptions of rofecoxib and celecoxib, there was a 0.5% increase in MI (95%CI 0.1 to 0.9) explaining 50.3% of the deviance in yearly variation of MI-related hospitalizations.”
Brownstein JS, Sordo M, Kohane IS, Mandl KD (2007) The Tell-Tale Heart: Population-Based Surveillance Reveals an Association of Rofecoxib and Celecoxib with Myocardial Infarction. PLoS ONE 2(9): e840.
Shift from Knowledge to Data
Model Prediction = f(Inputs)
The Learning Health Care System
• Every clinical encounter is a “natural experiment”
• Comprehensive data to support analysis of observational data and validate hypotheses
• Principles
  • Use “found data”
    • Clinical records, not experimental data collection protocols
    • Everything: labs, meds, notes, reports, discharge summaries, billing codes, vitals, monitoring data, gene sequence & expression, environment, geography, social media, metagenomics, epigenomics, proteomics, ...
    • Crimson: use discarded samples
    • “Anecdotes are not data” (folk wisdom)
    • “A million anecdotes are data!” (Zak Kohane)
  • Focus on predictive modeling
Using MIMIC Data to Build Predictive Models
• Mortality
  • Comparison to SAPS II
  • Daily Acuity Scores
  • Real-time Acuity Scores (real-time risk assessment)
• Other clinical events
  • pressor weaning
  • intra-aortic balloon pump weaning
  • onset of septic shock
  • acute kidney injury
• Data set (MIMIC 2, earlier snapshot; today ~3-4x larger)
  • 10,066 patients: 7,048 development, 3,018 validation
  • selected cases with adequate data
  • excluded neurological and trauma cases
http://dspace.mit.edu/handle/1721.1/46690
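The slide gives the sizes of the development and validation sets but not how patients were assigned to them. As a minimal sketch, assuming a random split at the patient level, the function `split_patients`, the seed, and the integer IDs below are all hypothetical stand-ins:

```python
import random

def split_patients(patient_ids, n_development, seed=0):
    """Shuffle patient IDs reproducibly, then cut into development and validation sets."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    return ids[:n_development], ids[n_development:]

# Hypothetical integer IDs standing in for the 10,066 selected patients.
development, validation = split_patients(range(10066), n_development=7048)
print(len(development), len(validation))  # 7048 3018
```

Splitting by patient rather than by record keeps all of one patient's data on the same side of the split, so the validation set stays independent of the development set.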
What Kinds of Models to Build?
• Patient state depends on pathophysiology
  • Genetic complement, environmental exposures, pathogens, auto-regulatory mechanisms, treatments, ...
• Possible formalism:
  • POMDPs, but intractable
  • Graphical models (Bayes Nets, Influence Diagrams, etc.), but require many independence assumptions
  • Simple models: Cox proportional hazard, naïve Bayes, linear/logistic regression
• Derived variables can summarize essential contributions of dynamic variation
  • integrals, slopes, ranges, frequencies, etc.
  • Transformed variables: inverse, abs, square, square root, log-abs, abs deviation from mean, log abs deviation, ...
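The derived and transformed variables listed above can be sketched for a single signal as follows. This is an illustration under stated assumptions, not the method used in the talk: the function names, the trapezoidal integral, the least-squares slope, the small `eps` inside the logarithms, and taking the square root of the absolute value are all choices made here.

```python
import math

def derived_features(times, values):
    """Summarize one irregularly sampled signal; times are in hours."""
    n = len(values)
    mean = sum(values) / n
    # Trapezoidal integral of the signal over time (assumed integration rule).
    integral = sum(
        (times[i + 1] - times[i]) * (values[i] + values[i + 1]) / 2
        for i in range(n - 1)
    )
    # Ordinary least-squares slope of value against time.
    t_mean = sum(times) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (v - mean) for t, v in zip(times, values))
    slope = sxy / sxx if sxx else 0.0
    return {
        "integral": integral,
        "slope": slope,
        "range": max(values) - min(values),
        "mean": mean,
    }

def transformed(x, mean, eps=1e-9):
    """Apply the listed transforms to one value; eps guards log(0) (assumption)."""
    dev = abs(x - mean)
    return {
        "inverse": 1.0 / x if x != 0 else float("nan"),
        "abs": abs(x),
        "square": x * x,
        "sqrt": math.sqrt(abs(x)),  # assumption: root of |x| so negatives are defined
        "log_abs": math.log(abs(x) + eps),
        "abs_dev": dev,
        "log_abs_dev": math.log(dev + eps),
    }

# Hypothetical heart-rate readings at hours 0 to 3.
features = derived_features([0, 1, 2, 3], [80, 84, 90, 94])
last_value = transformed(94, features["mean"])
```

Each summary collapses a variable-length time series into a fixed number of columns, which is what simple models such as logistic regression need as input.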
Summary of Mortality Models
Models for Therapeutic Opportunities and Risks
• Predictions are not as accurate as mortality models, but still impressive
• Using acuity score instead of such specific models is worse
  • E.g., vasopressor weaning: AUC 0.679 (acuity score) vs. 0.809 (specific model)

| Prediction | AUC |
|---|---|
| Weaning from vasopressors within next 12 hours, remain off for 4 hours | 0.809 |
| Pressor weaning + Survival | 0.825 |
| Weaning from Intra-Aortic Balloon Pump | 0.816 |
| Onset of Septic Shock | 0.843 |
| Acute kidney injury | 0.742 |
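For readers unfamiliar with the metric in the table, AUC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it by direct pairwise comparison; the function name `auc` and the toy labels and scores are made up for illustration and are not from the talk.

```python
def auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy example: 1 = event occurred, 0 = it did not.
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.3, 0.6, 0.7, 0.8, 0.2]
result = auc(labels, scores)  # 8 of the 9 positive/negative pairs are ordered correctly
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration; a rank-based formulation gives the same value more efficiently on large cohorts.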
But where to go from here?
• Include data from narrative records: discharge summaries, radiology/pathology/... reports, nursing/doctor notes, ...
  • Requires natural language processing (in progress)
• Improved intermediate abstractions of data
  • Use knowledge
  • Unsupervised machine learning to find clusters of similar data patterns
Clustering Snapshots
http://groups.csail.mit.edu/medg/ftp/kshetri/Kshetri_MEng.pdf
• ~10,000 patients x ~12,000 possible features/patient ≈ 1M rows (sparse), 30 min snapshots ⇒ 11 clusters
[Figure: per-cluster panels labeled # in cluster, Survival, GCS, Heart Rate, Events]
Prediction using multi-layer abstraction
[Figure: three-layer abstraction diagram. Layers: Plane of Observations; Plane of Pathophysiological Clusters (Foci); Plane of Disease Clusters. Observation labels: Na, K, Glu, pH, CO2, Cr, Urine, BUN, RR, Vasopressors, Orientation, SPO2, Vent, MAP, Jaundice, AIDS, INR, WBC. Other node labels: Kidney States, Breathing, Injury, Acute/Chronic aggregate states, Electrolytes Imbalances, CVS, Coagulation, Organ Failure, Rhythms.]
Slide from Rohit Joshi
Foci and Features
• Selected ~10,000 patients in MIMIC II database (excluded pediatrics, trauma)
• Over 12 domain foci; many clinical variables per focus
• Over 1 million chart events (after binning into 1 hour windows)

| Focus | Features in each Focus |
|---|---|
| Kidney | Creatinine, BUN, BUN2Cr, UrineOut/Hr/Kg, … |
| Liver | AST, Alt, TBili, Dbili, Albumin, tProtein |
| Cardio | MAP, HR, CVP, BPSys, BPDias, Cardiac Index, … |
| Respiration | RR, SpO2, FiO2Set, PEEPSet, TidVolSet, SaO2, PIP, MinVent, … |
| Hematology | Hematocrit, Hgb, Platelets, INR, WBC, RBC, PT |
| Electrolytes | Na, Mg, K, Ca, Glucose, … |
| Acid-base | Art CO2, Art PaCO2, Art pH, Art BE |
| General | GCS, Age, temp |
| Medication type | Diuretic, Antiarrhythmic, Antiplatelet, Sympathomimetic, … |
| Chronic | AIDS, HematMalig, Metacarcinoma |
| Electrocardio (EKG) | PVC, Rhythm types, Ectopic frequency, … |

Slide from Rohit Joshi
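The slide says chart events were binned into 1 hour windows but does not say how values falling in the same window were combined. A minimal sketch, assuming the mean is taken per patient, window, and variable; the tuple layout, the function name `bin_events`, and the sample readings are hypothetical:

```python
import math
from collections import defaultdict

def bin_events(events, window_hours=1.0):
    """Group (patient_id, hours_since_admission, variable, value) tuples into fixed windows.

    Returns {(patient_id, window_index, variable): mean value in that window}.
    Averaging within a window is an assumption; the slide does not specify it.
    """
    buckets = defaultdict(list)
    for patient_id, t_hours, variable, value in events:
        window = math.floor(t_hours / window_hours)
        buckets[(patient_id, window, variable)].append(value)
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}

# Hypothetical chart events for one patient.
events = [
    ("p1", 0.25, "HR", 80.0),
    ("p1", 0.75, "HR", 90.0),
    ("p1", 1.50, "HR", 100.0),
    ("p1", 0.50, "MAP", 70.0),
]
binned = bin_events(events)
```

Passing `window_hours=0.5` gives 30 minute windows with the same code.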
Layer 2: Disease Severity using RDF
Learned clusters are closely related to mortality!
Slide from Rohit Joshi
Mortality Prediction on the Test Data
Our model: AUC of 0.9; SAPS-II: AUC of 0.81
Our method outperforms customized SAPS-II score
Panels: High Severity Patients; All Patients
Our model: AUC of 0.91; SAPS-II: AUC of 0.77
Slide from Rohit Joshi
Patient-State Transition: Effects of Past Transitions of Other Foci
[Figure: dependency diagram among foci (Lungs, Electrolytes, Kidney, General, Hemat, Acidbase, Liver, CardioVascular), with edges labeled "Past Transition"]
ln(organ_transition) ~ current_state(organ) + current_state(otherorgans) + latest_transition_trend(otherorgans) + duration_in_current_state(organ) + duration_since_last_change(otherorgans)
Slide from Rohit Joshi
Forecasting the transitions of organ systems
[Figure: confusion matrices (plots of true states vs. predicted states) for the Cardiovascular and Respiratory systems]
Slide from Rohit Joshi
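A confusion matrix of true versus predicted states can be tabulated as in the sketch below. The state names, the sample sequences, and the function name `confusion_matrix` are hypothetical; the slide does not list the states used for each organ system.

```python
from collections import Counter

def confusion_matrix(true_states, predicted_states, states):
    """Rows are true states, columns are predicted states, both in the order of `states`."""
    counts = Counter(zip(true_states, predicted_states))
    return [[counts[(t, p)] for p in states] for t in states]

# Hypothetical severity states for one organ system.
states = ["low", "medium", "high"]
true_states = ["low", "low", "medium", "high", "high", "medium"]
predicted_states = ["low", "medium", "medium", "high", "medium", "medium"]

matrix = confusion_matrix(true_states, predicted_states, states)
# Diagonal entries are correct forecasts; accuracy is their share of all cases.
accuracy = sum(matrix[i][i] for i in range(len(states))) / len(true_states)
```

Off-diagonal cells show which states get mistaken for which, which a single accuracy number hides.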
Future Needs/Opportunities
• More comprehensive data collection
• Prospective analysis
• Decision support
  – Improved visualization of patient state
  – Optimization of interventions
Thanks
• Funding from
  • NIH-National Library of Medicine (i2b2)
  • NIH-National Institute of Biomedical Imaging and Bioengineering (MIMIC)
  • ONC SHARP 4 project (Secondary Use of Clinical Data)
  • Simons Center for the Social Brain
  • Siemens Corp.
• MIMIC Team: Roger Mark, George Moody, Leo Celi, etc.
• Modeling collaborators: Rohit Joshi, Bill Long, Caleb Hug, Kanak Kshetri
• i2b2 collaborators: Tianxi Cai, Susanne Churchill, Vivian Gainer, Sergey Goryachev, Elizabeth Karlson, Isaac Kohane, Fina Kurreeman, Katherine Liao, Shawn Murphy, Robert Plenge, Soumya Raychaudhuri, Qing Zeng-Treitler, etc.
• NLP collaborators: Özlem Uzuner, Bill Long, Anna Rumshisky, Guergana Savova, Marzyeh Ghassemi, Yuan Luo, Andreea Bodnari, Tawanda Sibanda
• My research group: http://medg.csail.mit.edu