ehr-based phenotyping: bulk learning and evaluaon (with … · bulk learning is a batch-phenotyping...
TRANSCRIPT
![Page 1: EHR-Based Phenotyping: Bulk Learning and Evaluaon (with … · Bulk Learning is a batch-phenotyping framework that uses multiple diseases collectively (i.e. bulk learning set)](https://reader034.vdocument.in/reader034/viewer/2022051804/5ff0e88bb53fc833ae0e9ec9/html5/thumbnails/1.jpg)
EHR-BasedPhenotyping:BulkLearningandEvalua;on(with
Infec;ousDiseases) Po-Hsiang (Barnett) Chiu
![Page 2: EHR-Based Phenotyping: Bulk Learning and Evaluaon (with … · Bulk Learning is a batch-phenotyping framework that uses multiple diseases collectively (i.e. bulk learning set)](https://reader034.vdocument.in/reader034/viewer/2022051804/5ff0e88bb53fc833ae0e9ec9/html5/thumbnails/2.jpg)
Phenotypesandphenotyping
Physicallyobservabletraitsofgenotypes(andtheirinterac;onswithenvironments)
Diseases(anddiseasesubtypes)
AFribu;onsofdiseases(e.g.suscep;bility)
Biochemicalorphysiologicalproper;es,behavior,andproductsofbehavior
![Page 3: EHR-Based Phenotyping: Bulk Learning and Evaluaon (with … · Bulk Learning is a batch-phenotyping framework that uses multiple diseases collectively (i.e. bulk learning set)](https://reader034.vdocument.in/reader034/viewer/2022051804/5ff0e88bb53fc833ae0e9ec9/html5/thumbnails/3.jpg)
Data-DrivenPhenotyping• Data-drivenphenotyping
– Twomainmethodologies• Rule-basedapproach(e.g.eMerge,hFps://emerge.mc.vanderbilt.edu)• Predic3veAnaly3cs
– Datasources:• EHRs/EMRs:Medicinaltreatments,diagnoses,labmeasurements,etc.• Genomicdata:SNParrays,copynumbervaria;on(CNVs),etc.
– Phenotypes• Diseases,subtypes,orvariablesaFributedtodiseasepredic;ons
![Page 4: EHR-Based Phenotyping: Bulk Learning and Evaluaon (with … · Bulk Learning is a batch-phenotyping framework that uses multiple diseases collectively (i.e. bulk learning set)](https://reader034.vdocument.in/reader034/viewer/2022051804/5ff0e88bb53fc833ae0e9ec9/html5/thumbnails/4.jpg)
Diagnos;cConceptUnits• Variousdiseasessharingthesamesetofdiagnos;cconceptunits
• Infec;ousdiseases– Labtests
• Microorganism,blood,urine,body;ssues,stool– Medica;ons
• An;bio;c,an;virus,anthelmin;c
• Buildsta;s;calmodelsforeachdiagnos;ccomponentandcombinethemappropriately– Ensemblelearning
![Page 5: EHR-Based Phenotyping: Bulk Learning and Evaluaon (with … · Bulk Learning is a batch-phenotyping framework that uses multiple diseases collectively (i.e. bulk learning set)](https://reader034.vdocument.in/reader034/viewer/2022051804/5ff0e88bb53fc833ae0e9ec9/html5/thumbnails/5.jpg)
BulkLearninginaNutshell…Bulk Learning is a batch-phenotyping framework that uses multiple diseases collectively (i.e. bulk learning set) as a substrate for model learning and evaluation wherein (a given) medical ontology is used to perform feature selection and model stacking is used to construct abstract feature representation of low sample complexity in order to reduce training requirements.
KeyConcepts:
1.Buildphenotypingmodelsontopofmul;plediseases
3.Modelsarecombinedviamodelstacking(aformofensemblelearning)4.Abstractfeatures
Dimensionalityreduc;on
2.Automa;cfeatureselec;onusinganexis;ngontology
5.Lesslabeleddatarequiredformodelevalua;ons
![Page 6: EHR-Based Phenotyping: Bulk Learning and Evaluaon (with … · Bulk Learning is a batch-phenotyping framework that uses multiple diseases collectively (i.e. bulk learning set)](https://reader034.vdocument.in/reader034/viewer/2022051804/5ff0e88bb53fc833ae0e9ec9/html5/thumbnails/6.jpg)
PhenotypingviaBulkLearning• Undermodelstacking,wethenarriveattheno;onof“concept-drivenphenotyping”– Asubsetorcombina;onsoflabtestsaremoreaFributabletosomediseaseswhiletheothersarebeFerexplainedbymedica;ons
• Inthisstudy,infec;ousdiseasesassociatedwith100ICD-9codesasthedomainofstudyforbulklearning– Forsimplicity,considerdifferentdiagnos;ccodesasdifferentdiseases…
– Why100codes?– Codeselec;onstrategy?
![Page 7: EHR-Based Phenotyping: Bulk Learning and Evaluaon (with … · Bulk Learning is a batch-phenotyping framework that uses multiple diseases collectively (i.e. bulk learning set)](https://reader034.vdocument.in/reader034/viewer/2022051804/5ff0e88bb53fc833ae0e9ec9/html5/thumbnails/7.jpg)
BulkLearningBasicsI
• Addressestwocentralissuesinpredic;veanaly;calapproachtocomputa;onalphenotyping– Featureengineering
• Medicalontologyforfeaturedecomposi;on• MedicalEn;;esDict(hFp://med.dmi.columbia.edu)
– Dataannota;on• Ensemblelearning(e.g.stackedgeneraliza;on[Wolpert1992])
• Featureabstrac;onfordimensionalityreduc;on
![Page 8: EHR-Based Phenotyping: Bulk Learning and Evaluaon (with … · Bulk Learning is a batch-phenotyping framework that uses multiple diseases collectively (i.e. bulk learning set)](https://reader034.vdocument.in/reader034/viewer/2022051804/5ff0e88bb53fc833ae0e9ec9/html5/thumbnails/8.jpg)
MedicalOntologyforGroupingFeatures
• SnapshotofMedicalEn;;esDic;onary(hFp://med.dmi.columbia.edu)
![Page 9: EHR-Based Phenotyping: Bulk Learning and Evaluaon (with … · Bulk Learning is a batch-phenotyping framework that uses multiple diseases collectively (i.e. bulk learning set)](https://reader034.vdocument.in/reader034/viewer/2022051804/5ff0e88bb53fc833ae0e9ec9/html5/thumbnails/9.jpg)
ModelStacking• Whyinspec;ngmul;ple(infec;ous)diseases?
– Usingmul3plediseasesassubstrateandiden;fytheircommonelements– Examplestackingarchitecture(understackedgeneraliza;onmethod)
Level 1
Level 0
Antibiotic Measure
Urinary Chemistry Measure
Intravenous Chemistry Measure
Microbiology Measure
Level 2
Attributes: Level-0 Probabilities and IndicatorsTarget: Diagnostic Codes (Silver Standard)
Other Phenotypic Measures (e.g. Antiviral)
Attributes: Level-1 Probabilities and ICD-9Target: True Labels (Gold Standard)
![Page 10: EHR-Based Phenotyping: Bulk Learning and Evaluaon (with … · Bulk Learning is a batch-phenotyping framework that uses multiple diseases collectively (i.e. bulk learning set)](https://reader034.vdocument.in/reader034/viewer/2022051804/5ff0e88bb53fc833ae0e9ec9/html5/thumbnails/10.jpg)
SurrogateLabelsvsTrueLabels
• Modelstackingisusedtoachieve:– Improveuponbasemodelperformances– TransformEHRdatatoadenserform
• Usesdiagnos;ccodes(e.g.ICD-9)assurrogatelabelstoestablish“approximatepredic;vemodels.”
• Whysurrogatelabels(e.g.ICD-9)?– FeaturesextractedfromEHRcanbelarge– Usedtoderivecompactrepresenta;onofthetrainingdata– “Free”supervisedsignalsthataresufficientlyclosebutcanbeobtained
withoutextrawork• Objec;ve:Buildsta;s;calmodelsinabstractfeaturespace
– Createasparseannota;onset(i.e.goldstandard)thatservesaproxydatasetfordownstreammodelevalua;ons
– 83annotatedcases
![Page 11: EHR-Based Phenotyping: Bulk Learning and Evaluaon (with … · Bulk Learning is a batch-phenotyping framework that uses multiple diseases collectively (i.e. bulk learning set)](https://reader034.vdocument.in/reader034/viewer/2022051804/5ff0e88bb53fc833ae0e9ec9/html5/thumbnails/11.jpg)
⌃
⌃
⌃
⌃
⌃
m1
a1
b1
u1
m1(1)
m1g a1g b1g u1g
global2
(i)
(i-1)
(i+1)
a1(1) b1
(1) u1(1)
logistic unitsraw
features
microbiology
antibioticblood test
urine test
f11
f12
f1j
f21
f2j
f31
f41
f3j
Four Example Base Models
![Page 12: EHR-Based Phenotyping: Bulk Learning and Evaluaon (with … · Bulk Learning is a batch-phenotyping framework that uses multiple diseases collectively (i.e. bulk learning set)](https://reader034.vdocument.in/reader034/viewer/2022051804/5ff0e88bb53fc833ae0e9ec9/html5/thumbnails/12.jpg)
PerformanceEvalua;ons
• HowwelldoesthemodelpredictICD-9s(usingaseparatetestdata)?
• Howwelldoesthemodelpredictannotateddata(assoc.with“truelabels”)?– (Binarized)ICD-9becomesacandidatefeatureamongabstractfeatures(e.g.probabilityscores,indicators)
• AnnotatedsampleconsistsofrandomlyselectedcasesinwhicherrorsofICD-9codingarecorrected
• Dataannota;onsandcodingproceduresaretwoindependentprocesses
![Page 13: EHR-Based Phenotyping: Bulk Learning and Evaluaon (with … · Bulk Learning is a batch-phenotyping framework that uses multiple diseases collectively (i.e. bulk learning set)](https://reader034.vdocument.in/reader034/viewer/2022051804/5ff0e88bb53fc833ae0e9ec9/html5/thumbnails/13.jpg)
BaseLevelPerformances
![Page 14: EHR-Based Phenotyping: Bulk Learning and Evaluaon (with … · Bulk Learning is a batch-phenotyping framework that uses multiple diseases collectively (i.e. bulk learning set)](https://reader034.vdocument.in/reader034/viewer/2022051804/5ff0e88bb53fc833ae0e9ec9/html5/thumbnails/14.jpg)
127.4Enterobiasis
047.8(Other)viralmeningi;s
009.1Gastroenteri;s...
053.9Herpezzoster
117.9Mycoses
![Page 15: EHR-Based Phenotyping: Bulk Learning and Evaluaon (with … · Bulk Learning is a batch-phenotyping framework that uses multiple diseases collectively (i.e. bulk learning set)](https://reader034.vdocument.in/reader034/viewer/2022051804/5ff0e88bb53fc833ae0e9ec9/html5/thumbnails/15.jpg)
![Page 16: EHR-Based Phenotyping: Bulk Learning and Evaluaon (with … · Bulk Learning is a batch-phenotyping framework that uses multiple diseases collectively (i.e. bulk learning set)](https://reader034.vdocument.in/reader034/viewer/2022051804/5ff0e88bb53fc833ae0e9ec9/html5/thumbnails/16.jpg)
OtherComponents
• Semi-supervisedlearningandvirtualannota;onset
• The3rd;erinmodelstackinghierarchy– Trade-offbetweenlearnedabstractfeaturesandtheICD-9codesassurrogatelabels.
– Performanceevalua;ononpredic;ngannotatedlabels• Ontology-basedfeatureengineering• Properdesignoftreatmentandcontrol(training)data
![Page 17: EHR-Based Phenotyping: Bulk Learning and Evaluaon (with … · Bulk Learning is a batch-phenotyping framework that uses multiple diseases collectively (i.e. bulk learning set)](https://reader034.vdocument.in/reader034/viewer/2022051804/5ff0e88bb53fc833ae0e9ec9/html5/thumbnails/17.jpg)
ModelingPerspec;ve• EHRdataconsistofobserva;onsandlatentvariables
– Observa;onscanbedirectlyansweredviasimplequeries• Didthepa;enthavetestsonE.Coli?• Didthepa;enttakeCekriaxon?
• Latentvariablesrepresentquan;;esthatcannotbedirectlyobservedinEHRorcomputedviasimplequeries– Doesthepa;enthaveaninfec;on?– Diagnos;cques;ons:specificallywhichinfec;onsdothepa;enthave?
• Learnclassifierstopredictlatentvariables(withonlyaccesstoobserva;ons)
![Page 18: EHR-Based Phenotyping: Bulk Learning and Evaluaon (with … · Bulk Learning is a batch-phenotyping framework that uses multiple diseases collectively (i.e. bulk learning set)](https://reader034.vdocument.in/reader034/viewer/2022051804/5ff0e88bb53fc833ae0e9ec9/html5/thumbnails/18.jpg)
MedicalPerspec;ve• Seeminglydifferentinfec;ousdiseasesmaysharesimilarsetsoflabtestsandmedica;ons– Staph.aureus
• Skininfec;ons,pneumonia,bloodpoisoning– Cekriaxone
• Meningi;s• Infec;onsatdifferentsitesofthebody(e.g.bloodstream,lungs,urinarytracts)
• Mul;pleclassifiersforthesamedisease– 4classifiersperICD-9code,eachofwhichisbinaryclassifier
• 400classifiersatbaselevel
![Page 19: EHR-Based Phenotyping: Bulk Learning and Evaluaon (with … · Bulk Learning is a batch-phenotyping framework that uses multiple diseases collectively (i.e. bulk learning set)](https://reader034.vdocument.in/reader034/viewer/2022051804/5ff0e88bb53fc833ae0e9ec9/html5/thumbnails/19.jpg)
DataDistribu;onPerspec;ve
“Canwebuildajointmodelapplicabletoalldiseases?”
![Page 20: EHR-Based Phenotyping: Bulk Learning and Evaluaon (with … · Bulk Learning is a batch-phenotyping framework that uses multiple diseases collectively (i.e. bulk learning set)](https://reader034.vdocument.in/reader034/viewer/2022051804/5ff0e88bb53fc833ae0e9ec9/html5/thumbnails/20.jpg)
AbstractFeatureRepresenta;on:DesignChoices
• Relatedworkinconstruc;nghigh-levelfeatures– PCA,unsupervisedfeaturelearning,manifoldlearning,etc.
• Designchoices– Datacharacteris;cs– Interpretability
• DeepNeuralNetwork– Linearcombina;on– Non-lineartransforma;on(e.g.sigmoid,rec;fier,etc.)
• Featureset:con;nuous,dense,and“homogeneous”– Imagepixels– Timesseriesoflabmeasurements– word2vec
• EHRdatahoweverareverydifferent– sparseandincomplete– consistofmanydifferenttypes(binary,categorical,con;nuous,etc.)– Featuresassociatedwithmul;pleconcepts
![Page 21: EHR-Based Phenotyping: Bulk Learning and Evaluaon (with … · Bulk Learning is a batch-phenotyping framework that uses multiple diseases collectively (i.e. bulk learning set)](https://reader034.vdocument.in/reader034/viewer/2022051804/5ff0e88bb53fc833ae0e9ec9/html5/thumbnails/21.jpg)
MovingForward…• Summary
– Bulklearningisaframeworkwithatleastthefollowingsystemchoices• Thebulklearningset(oftargetcondi;ons)=>basemodels• Classifica;onalgorithms(guideline:probabilis;cclassifiers+well-calibrated)• Stackingarchitecture(mul;ple;ers=>levelsofabstrac;ons)• Strategyforcombiningindividual(local)diseasemodelstoaglobalmodel
– Advantage:Canuseasmallannotatedsampleformodelconstruc;onandevalua;onwithintheabstractfeaturespace(e.g.level-1data)
• 83clinicalcaseswerelabeledinthisstudy– Challenge:Themodelinvolvingtheinterac;onbetweenabstractfeaturesand
ICD-9donotgeneralizewellintotheregionofthedatawheretheICD-9codingwasincorrect
• Mul;pletypesofsurrogatelabels⌃
⌃
⌃
⌃
⌃
m1
a1
b1
u1
m1(1)
m1(i)
a1(i)
b1(i)
u1(i)
⌃
m1g a1g b1g u1g
local2(i)
global2
(i)
(i-1)
(i+1)
a1(1) b1
(1) u1(1)
(i-1)(i)
(i+1)
Semi-supervisedlearningAc3velearning
Complexdecisionboundary?
Othersurrogatelabels
• Ongoingandfuturework
![Page 22: EHR-Based Phenotyping: Bulk Learning and Evaluaon (with … · Bulk Learning is a batch-phenotyping framework that uses multiple diseases collectively (i.e. bulk learning set)](https://reader034.vdocument.in/reader034/viewer/2022051804/5ff0e88bb53fc833ae0e9ec9/html5/thumbnails/22.jpg)
Reference[1]D.H.Wolpert,Stackedgeneraliza;on,NeuralNetworks.5(1992)241–259.[2]K.M.Ting,I.H.WiFen,Issuesinstackedgeneraliza;on,J.Ar;f.Intell.Res.10(1999)271–289.[3]J.JinChen,C.ChengWang,R.RunshengWang,UsingStackedGeneraliza;ontoCombineSVMsinMagnitudeandShapeFeatureSpacesforClassifica;onofHyperspectralData,IEEETrans.Geosci.RemoteSens.47(2009)2193-2205.[4]DavidBaorto,JamesCimino,etal.Available:hFp://med.dmi.columbia.edu.Accessdate:Oct20,2016.[5]T.A.Lasko,J.C.Denny,M.A.Levy,Computa;onalPhenotypeDiscoveryUsingUnsupervisedFeatureLearningoverNoisy,Sparse,andIrregularClinicalData,PLoSOne.8(2013)e66341.
![Page 23: EHR-Based Phenotyping: Bulk Learning and Evaluaon (with … · Bulk Learning is a batch-phenotyping framework that uses multiple diseases collectively (i.e. bulk learning set)](https://reader034.vdocument.in/reader034/viewer/2022051804/5ff0e88bb53fc833ae0e9ec9/html5/thumbnails/23.jpg)
T H A N K
Y O U ⌃
⌃
⌃
⌃
m1
a1
b1
u1
f11f12
f1j
f21
f2j
f31
f41
f3j
![Page 24: EHR-Based Phenotyping: Bulk Learning and Evaluaon (with … · Bulk Learning is a batch-phenotyping framework that uses multiple diseases collectively (i.e. bulk learning set)](https://reader034.vdocument.in/reader034/viewer/2022051804/5ff0e88bb53fc833ae0e9ec9/html5/thumbnails/24.jpg)
⌃
⌃
⌃
⌃
m1
a1
b1
u1
logistic unitsraw
featuresf11
f12
f1j
f21
f2j
f31
f41
f3j
m1
a1
b1
u1
⌃
Level 0 Level 1
Microbiology
An;bio;c
Bloodtest
Urinetest
![Page 25: EHR-Based Phenotyping: Bulk Learning and Evaluaon (with … · Bulk Learning is a batch-phenotyping framework that uses multiple diseases collectively (i.e. bulk learning set)](https://reader034.vdocument.in/reader034/viewer/2022051804/5ff0e88bb53fc833ae0e9ec9/html5/thumbnails/25.jpg)
ExampleFeatures
![Page 26: EHR-Based Phenotyping: Bulk Learning and Evaluaon (with … · Bulk Learning is a batch-phenotyping framework that uses multiple diseases collectively (i.e. bulk learning set)](https://reader034.vdocument.in/reader034/viewer/2022051804/5ff0e88bb53fc833ae0e9ec9/html5/thumbnails/26.jpg)
![Page 27: EHR-Based Phenotyping: Bulk Learning and Evaluaon (with … · Bulk Learning is a batch-phenotyping framework that uses multiple diseases collectively (i.e. bulk learning set)](https://reader034.vdocument.in/reader034/viewer/2022051804/5ff0e88bb53fc833ae0e9ec9/html5/thumbnails/27.jpg)
⌃
⌃
⌃
⌃
⌃
m1
a1
b1
u1
m1(1)
m1(i)
a1(i)
b1(i)
u1(i)
⌃
m1g a1g b1g u1g
local2(i)
global2
(i)
(i-1)
(i+1)
a1(1) b1
(1) u1(1)
(i-1)(i)
(i+1)
logistic unitsraw
features
microbiology
antibioticblood test
urine test
2. Compute Base Models
Level-1 Global Unit
Individual Level-1 Local Units
Level-1 abstractfeatures
f11
f12
f1j
f21
f2j
f31
f41
f3j
Four Example Base Models
3. Compute Meta Models (via Ensemble Learning)1. Define Feature Groups Using Medical Ontology
1a. Gather EHR data according to medical concepts
1b. Use Medical Entities Dictionary to delineate feature scopes
1c. Apply feature selection within each
concept group
3a. Per-disease ensembles:compute local level-1 models
3b. Cross-disease ensemble: compute a global
level-1 model
Global level-1 features