![Page 1: Phenotype generation from EMR by tensor factorization SEDI Durham Cohort James Lu M.D. Ph.D. Department of Electrical and Computer Engineering Department](https://reader037.vdocument.in/reader037/viewer/2022110103/5697c01b1a28abf838ccf8a7/html5/thumbnails/1.jpg)
Phenotype generation from EMR by tensor factorization
SEDI Durham Cohort
James Lu M.D. Ph.D.
Department of Electrical and Computer Engineering
Department of Medicine
![Page 2: Phenotype generation from EMR by tensor factorization SEDI Durham Cohort James Lu M.D. Ph.D. Department of Electrical and Computer Engineering Department](https://reader037.vdocument.in/reader037/viewer/2022110103/5697c01b1a28abf838ccf8a7/html5/thumbnails/2.jpg)
3.2 Trillion / yr (~21% of GDP)
Health System Under Pressure
![Page 3: Phenotype generation from EMR by tensor factorization SEDI Durham Cohort James Lu M.D. Ph.D. Department of Electrical and Computer Engineering Department](https://reader037.vdocument.in/reader037/viewer/2022110103/5697c01b1a28abf838ccf8a7/html5/thumbnails/3.jpg)
Novel technology: Small molecules, medical devices, biologics, diagnostics, genomics, transcriptomics, …
Operations: Align incentives, risk sharing, quality metrics, reducing readmissions, Six Sigma / Lean, …
Where do I achieve cost arbitrage?
How do we identify which patients to study?
Where is my patient going to go next?
Can we reorganize patient flow?
![Page 4: Phenotype generation from EMR by tensor factorization SEDI Durham Cohort James Lu M.D. Ph.D. Department of Electrical and Computer Engineering Department](https://reader037.vdocument.in/reader037/viewer/2022110103/5697c01b1a28abf838ccf8a7/html5/thumbnails/4.jpg)
Computable phenotypes are a top-down process
PheKB, Northwestern
![Page 5: Phenotype generation from EMR by tensor factorization SEDI Durham Cohort James Lu M.D. Ph.D. Department of Electrical and Computer Engineering Department](https://reader037.vdocument.in/reader037/viewer/2022110103/5697c01b1a28abf838ccf8a7/html5/thumbnails/5.jpg)
Many variations of computable phenotypes require adjudication by physicians.
Richesson, et al. 2013
Expensive and time consuming
![Page 6: Phenotype generation from EMR by tensor factorization SEDI Durham Cohort James Lu M.D. Ph.D. Department of Electrical and Computer Engineering Department](https://reader037.vdocument.in/reader037/viewer/2022110103/5697c01b1a28abf838ccf8a7/html5/thumbnails/6.jpg)
EMR Data is Large and Complicated
Durham County, 2007-2011
Patient level
• >240,000 patients
• Birthday
• Death (where available)
• Gender
• Race
• Ethnicity
Visit level
• 4.4 million patient visits
• Average of 18 measurements recorded per visit
• Indicator of presence/absence of particular diseases (computed)
• Encounter date (start, end)
• Location (DHRH, DUH, DRH)
• Path (ED -> inpatient, for example)
• Inpatient / Outpatient
Intervention level
> 60,000 types of observations
• CPT
• ICD9 diagnoses
• ICD9 procedures
• Lab values
• Medications
• Vitals
Caveats:
• Temporal gaps – People are only patients when they are sick
• We want to incorporate all of this information
• Don’t want to be fooled by mistakes and bias
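One way to picture how records like these could be turned into counts is sketched below. It is only an illustration under stated assumptions: the record layout, the `year_bin` helper, the yearly time bins, and the example rows are all hypothetical and are not taken from the Durham data.

```python
from collections import Counter
from datetime import date

# Hypothetical event records: (patient_id, observation_type, code, encounter_date).
# The field layout and the values are made up for illustration.
events = [
    ("p1", "icd9_dx", "250.00", date(2008, 3, 14)),
    ("p1", "lab", "A1c", date(2008, 3, 14)),
    ("p1", "icd9_dx", "250.00", date(2009, 7, 2)),
    ("p2", "medication", "allopurinol", date(2010, 1, 20)),
]

def year_bin(d, start_year=2007):
    """Map a date to a yearly bin index (0 for 2007, 1 for 2008, ...)."""
    return d.year - start_year

def count_events(records):
    """Tally events into a sparse table keyed by (patient, type, code, time bin)."""
    counts = Counter()
    for patient, obs_type, code, when in records:
        counts[(patient, obs_type, code, year_bin(when))] += 1
    return counts

counts = count_events(events)
```

Any key that never occurs reads back as a zero count, so a period with no visits looks the same as a period with no findings. That is the temporal-gap caveat in concrete form.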
![Page 7: Phenotype generation from EMR by tensor factorization SEDI Durham Cohort James Lu M.D. Ph.D. Department of Electrical and Computer Engineering Department](https://reader037.vdocument.in/reader037/viewer/2022110103/5697c01b1a28abf838ccf8a7/html5/thumbnails/7.jpg)
Decompose each touch with the health care system into its parts
● Each visit is a 5-D tensor (~1 billion elements)
  ● Patient
  ● Diagnosis / Billing Codes
  ● Labs
  ● Medications
  ● Time
● Model as counts
● Decompose into a set of K rank-1 components (outer products of one vector per mode)
With Piyush Rai and Changwei Hui
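$\mathcal{Y} \sim \mathrm{Pois}(\ldots)$
[Figure: the tensor drawn as a sum of rank-1 terms, each an outer product of Codes, Labs, Medications, and Time vectors (+ …)]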
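The idea of modeling counts as Poisson with a sum of K rank-1 terms can be sketched in a few lines of NumPy. This is a minimal illustration, not the method used in the talk: it assumes only three modes (patient, code, time) instead of five, fits by maximum likelihood with standard multiplicative updates for the Poisson (KL) objective, and runs on synthetic data. The names `reconstruct`, `poisson_loss`, and `poisson_cp` are hypothetical.

```python
import numpy as np

def reconstruct(A, B, C):
    """Sum of K rank-1 terms: M[i, j, l] = sum_k A[i, k] * B[j, k] * C[l, k]."""
    return np.einsum("ik,jk,lk->ijl", A, B, C)

def poisson_loss(Y, A, B, C, eps=1e-10):
    """Poisson negative log-likelihood, dropping the term that depends only on Y."""
    M = reconstruct(A, B, C)
    return float(np.sum(M - Y * np.log(M + eps)))

def poisson_cp(Y, rank, n_iter=200, seed=0, eps=1e-10):
    """Nonnegative CP factorization of a 3-way count tensor (multiplicative updates)."""
    rng = np.random.default_rng(seed)
    # Strictly positive random start so the multiplicative updates can move.
    A, B, C = (rng.random((n, rank)) + 0.1 for n in Y.shape)
    for _ in range(n_iter):
        # Each mode is rescaled by the ratio of observed to modeled counts.
        R = Y / (reconstruct(A, B, C) + eps)
        A *= np.einsum("ijl,jk,lk->ik", R, B, C) / (B.sum(0) * C.sum(0) + eps)
        R = Y / (reconstruct(A, B, C) + eps)
        B *= np.einsum("ijl,ik,lk->jk", R, A, C) / (A.sum(0) * C.sum(0) + eps)
        R = Y / (reconstruct(A, B, C) + eps)
        C *= np.einsum("ijl,ik,jk->lk", R, A, B) / (A.sum(0) * B.sum(0) + eps)
    return A, B, C

# Synthetic counts: 30 patients x 12 codes x 5 time bins, generated from 2 factors.
rng = np.random.default_rng(1)
true_factors = [rng.gamma(1.0, 1.0, size=(n, 2)) for n in (30, 12, 5)]
Y = rng.poisson(reconstruct(*true_factors))

A, B, C = poisson_cp(Y, rank=2)
```

In this sketch each row of `A` holds one patient's loadings on the K factors, and each column of `B` shows which codes a factor weights most heavily.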
![Page 8: Phenotype generation from EMR by tensor factorization SEDI Durham Cohort James Lu M.D. Ph.D. Department of Electrical and Computer Engineering Department](https://reader037.vdocument.in/reader037/viewer/2022110103/5697c01b1a28abf838ccf8a7/html5/thumbnails/8.jpg)
Computational phenotypes are a bottom-up process. Factors represent latent phenotypes.
Evaluated 11,242 patients with ~23 million data points with morbidity outcomes in diabetes
[Figure: top-weighted features for Factor 2 and Factor 10. Labels shown: Alprazolam; Urate; Malignant Neoplasm Prostate; Clinical Trial Participation; Secondary Malignant Neoplasms of Bone; External Catheter Set; CEAAG 15-3; Allopurinol; Evening Primrose Oil; Systemic Lupus Erythematosus; Side Effects from Statins; Shoulder Pain; Calcidiol; Jo-1]
![Page 9: Phenotype generation from EMR by tensor factorization SEDI Durham Cohort James Lu M.D. Ph.D. Department of Electrical and Computer Engineering Department](https://reader037.vdocument.in/reader037/viewer/2022110103/5697c01b1a28abf838ccf8a7/html5/thumbnails/9.jpg)
Patients are composites of common and rare latent phenotypes.
[Figure: Patient by Factor Score Matrix, 40 most common phenotypes. Labeled factors: ER / EKG; Standard Labs (e.g. CBC / BMP); Kidney Disease; Hypertension; Surgical Patient]
![Page 10: Phenotype generation from EMR by tensor factorization SEDI Durham Cohort James Lu M.D. Ph.D. Department of Electrical and Computer Engineering Department](https://reader037.vdocument.in/reader037/viewer/2022110103/5697c01b1a28abf838ccf8a7/html5/thumbnails/10.jpg)
Compare Outcome prediction to Known Algorithm (UKPDS)
UKPDS: UK Prospective Diabetes Study outcomes model used to predict MI, Death, and Stroke
7 demographic + lab variables: age, ethnicity, smoking status, A1c, HDL, total cholesterol, and systolic BP
Datasets compared:
• Original 7 variable model
• All Data
• Non Matrix Factorization
• Tensor Factorization
Can we predict outcomes in the next year?
Outcomes: Death, AMI, Stroke
Classification model: fit data with Random Forests, 10-fold cross-validation
With Joseph Lucas
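The evaluation setup named on this slide, a Random Forest scored with 10-fold cross-validation, could look roughly like the scikit-learn sketch below. Everything about the data is an assumption: the patient-by-factor scores and the binary outcome are synthetic, and ROC AUC is chosen here only as an example metric, since the slide does not state which metric was used.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in for a patient-by-factor score matrix (500 patients, 40 factors).
n_patients, n_factors = 500, 40
scores = rng.gamma(1.0, 1.0, size=(n_patients, n_factors))

# Synthetic binary outcome (e.g. an event within the next year), driven by two factors.
logit = scores[:, 0] - scores[:, 1] - 0.5
outcome = (rng.random(n_patients) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

model = RandomForestClassifier(n_estimators=200, random_state=0)
# Stratified folds keep the outcome rate similar in every fold.
folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
auc = cross_val_score(model, scores, outcome, cv=folds, scoring="roc_auc")

print("mean AUC over 10 folds:", auc.mean())
```

Swapping `scores` for a different feature set (for example the seven UKPDS variables, or all raw features) while keeping the same folds would give a like-for-like comparison across datasets.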
![Page 11: Phenotype generation from EMR by tensor factorization SEDI Durham Cohort James Lu M.D. Ph.D. Department of Electrical and Computer Engineering Department](https://reader037.vdocument.in/reader037/viewer/2022110103/5697c01b1a28abf838ccf8a7/html5/thumbnails/11.jpg)
Tensor-derived factors perform better than the original UKPDS model in all outcomes and provide comparable performance to the “all-data” model
Stroke is similar to Dat