Electronic phenotyping with APHRODITE and the Observational Health Sciences and Informatics (OHDSI) data network
S16: Papers - EHR-based Phenotyping, 2017 Joint Summits on Translational Science, March 28th, 2017
Juan M. Banda, PhD (1), Yoni Halpern, PhD (2), David Sontag, PhD (3), Nigam H. Shah, PhD (1)
(1) Stanford Center for Biomedical Informatics Research, Stanford, CA; (2) New York Univ., New York, NY; (3) MIT, Cambridge, MA

Page 1: AMIA Joint Summits 2017 - Electronic phenotyping with APHRODITE and the Observational Health Sciences and Informatics (OHDSI) data network

Electronic phenotyping with APHRODITE and the Observational Health Sciences and Informatics (OHDSI) data network

S16: Papers - EHR-based Phenotyping
2017 Joint Summits on Translational Science, March 28th, 2017

Juan M. Banda, PhD (1), Yoni Halpern, PhD (2), David Sontag, PhD (3), Nigam H. Shah, PhD (1)

(1) Stanford Center for Biomedical Informatics Research, Stanford, CA; (2) New York Univ., New York, NY; (3) MIT, Cambridge, MA

Page 2: AMIA Joint Summits 2017 - Electronic phenotyping with APHRODITE and the Observational Health Sciences and Informatics (OHDSI) data network

DISCLOSURE:

SPEAKER DISCLOSES THAT HE HAS NO RELATIONSHIPS WITH COMMERCIAL INTERESTS.


Page 3: AMIA Joint Summits 2017 - Electronic phenotyping with APHRODITE and the Observational Health Sciences and Informatics (OHDSI) data network

Motivation

- Phenotyping is the first step of almost every biomedical study, yet in many cases it is very complicated

- Popular modern approaches are rule-based, and the rules take a long time to develop

- Can we build a statistical model which provides performance comparable to a consensus definition and requires less development effort? It turns out we can! (Agarwal V, et al. and Halpern Y, et al.)

- Can we port these two approaches to the Observational Health Sciences and Informatics (OHDSI) tech stack?

Agarwal V, et al. Learning statistical models of phenotypes using noisy labeled training data. JAMIA. 2016; 23(6): 1166-1173.
Halpern Y, et al. Electronic medical record phenotyping using the anchor and learn framework. JAMIA. 2016; 23(4): 731-740.

Page 4: AMIA Joint Summits 2017 - Electronic phenotyping with APHRODITE and the Observational Health Sciences and Informatics (OHDSI) data network

Automated PHenotype Routine for Observational Definition, Identification, Training and Evaluation (APHRODITE)

- Framework is designed to enable phenotyping via supervised (or semi-supervised) learning of phenotype models

- Reads patient data from the OHDSI/OMOP CDM version 5

- Combines two recently published phenotyping approaches:
  - XPRESS (Agarwal V, et al.)
  - Anchor learning (Halpern Y, et al.)

- In addition, to enable sharing and reproducibility of the underlying phenotype 'recipes', APHRODITE allows sharing either the trained model or the configuration settings and anchor selections across multiple sites

Agarwal V, et al. Learning statistical models of phenotypes using noisy labeled training data. JAMIA. 2016; 23(6): 1166-1173.
Halpern Y, et al. Electronic medical record phenotyping using the anchor and learn framework. JAMIA. 2016; 23(4): 731-740.
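One way a trained model or its configuration could be moved between sites in R is plain serialization. The sketch below is an assumption for illustration, not APHRODITE's own sharing mechanism: the toy model, the `recipe` list, and all of its field names and values are hypothetical.

```r
# Hypothetical stand-in for a locally trained phenotype model (not APHRODITE output).
set.seed(1)
toy <- data.frame(outcome = rep(c(0, 1), each = 10), feature = rnorm(20))
model <- glm(outcome ~ feature, data = toy, family = binomial())

# Hypothetical "recipe": settings another site would need to retrain locally.
recipe <- list(
  concept_name = "myocardial infarction",
  nCases = 750,
  nControls = 750,
  anchors = c("clopidogrel", "Dobutamine")  # illustrative anchor choices only
)

# Option 1: share the trained model object itself.
model_file <- tempfile(fileext = ".rds")
saveRDS(model, model_file)
restored_model <- readRDS(model_file)

# Option 2: share only the configuration and anchor selections.
recipe_file <- tempfile(fileext = ".rds")
saveRDS(recipe, recipe_file)
restored_recipe <- readRDS(recipe_file)
```

Sharing only the recipe keeps patient-derived model parameters on site, at the cost of retraining at each receiving site.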

Page 5: AMIA Joint Summits 2017 - Electronic phenotyping with APHRODITE and the Observational Health Sciences and Informatics (OHDSI) data network

+140 collaborators

16 countries

Common CDM and vocabulary

The patient network available includes 84 databases, both clinical and claims, totaling over 650 million patients

Page 6: AMIA Joint Summits 2017 - Electronic phenotyping with APHRODITE and the Observational Health Sciences and Informatics (OHDSI) data network

APHRODITE Overview

[Workflow diagram; recoverable label: OHDSI/OMOP CDMv5]

Page 7: AMIA Joint Summits 2017 - Electronic phenotyping with APHRODITE and the Observational Health Sciences and Informatics (OHDSI) data network

Step 1 – Build Keyword List

Function call:

wordLists <- buildKeywordList(conn, aphrodite_concept_name, cdmSchema, dbms)

Generated keywords for “Myocardial Infarction”

Vocabulary v5

Page 8: AMIA Joint Summits 2017 - Electronic phenotyping with APHRODITE and the Observational Health Sciences and Informatics (OHDSI) data network

Step 2 – Find Cases and Controls

CDMv5

Function call:

casesANDcontrolspatient_ids_df <- getdPatientCohort(conn, dbms, as.character(keywordList_FF$V3), as.character(ignoreList_FF$V3), cdmSchema, nCases, nControls)
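The idea behind keyword-based noisy labeling can be sketched on toy data. This is not the `getdPatientCohort` implementation: the `records` data frame, its column names, the matching rule, and the way the ignore list is applied are all assumptions made for illustration.

```r
# Toy patient records; in practice these would come from the CDM (hypothetical data).
records <- data.frame(
  patient_id = c(1, 1, 2, 3, 3, 4),
  concept_name = c("Old myocardial infarction", "Aspirin",
                   "Chest pain", "Microinfarct of heart",
                   "Family history of myocardial infarction", "Edema"),
  stringsAsFactors = FALSE
)
keywords <- c("Old myocardial infarction", "Microinfarct of heart")
ignore <- c("Family history of myocardial infarction")

# Drop ignored concepts, then flag patients with at least one keyword mention.
usable <- records[!(records$concept_name %in% ignore), ]
case_ids <- unique(usable$patient_id[usable$concept_name %in% keywords])
control_ids <- setdiff(unique(records$patient_id), case_ids)

# Cap the cohort sizes (the role played by nCases / nControls in the call above).
nCases <- 2
nControls <- 2
cases <- head(case_ids, nCases)
controls <- head(control_ids, nControls)
```

The labels are "noisy" because a keyword mention is only a proxy for the phenotype; no chart review is involved at this stage.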

Page 9: AMIA Joint Summits 2017 - Electronic phenotyping with APHRODITE and the Observational Health Sciences and Informatics (OHDSI) data network

Step 3 – Extract Patient Data

CDMv5

Function calls:

dataFcases <- getPatientData(conn, dbms, cases, as.character(ignoreList_FF$V3), flag, cdmSchema)
dataFcontrols <- getPatientData(conn, dbms, controls, as.character(ignoreList_FF$V3), flag, cdmSchema)

Page 10: AMIA Joint Summits 2017 - Electronic phenotyping with APHRODITE and the Observational Health Sciences and Informatics (OHDSI) data network

Step 4 – Create Feature Vector

Function calls:

fv_all <- buildFeatureVector(flag, dataFcases, dataFcontrols)
fv_full_data <- combineFeatureVectors(flag, data.frame(cases), controls, fv_all, outcomeName)
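As a rough picture of what a feature vector looks like, the sketch below pivots toy long-format patient data into a patient-by-feature count matrix with an outcome column. It is an assumption-laden illustration in base R, not the internals of `buildFeatureVector` or `combineFeatureVectors`; the `events` data frame and its columns are hypothetical.

```r
# Toy long-format data: one row per (patient, feature) occurrence (hypothetical).
events <- data.frame(
  patient_id = c(1, 1, 1, 2, 2, 3),
  feature = c("drugEx:1112807", "obs:4329041", "drugEx:1112807",
              "lab:3004410", "obs:4329041", "drugEx:1503297"),
  stringsAsFactors = FALSE
)
case_ids <- c(1, 3)

# Pivot to a patient-by-feature table of occurrence counts.
counts <- as.data.frame.matrix(table(events$patient_id, events$feature))

# Attach the noisy outcome label as the final column (1 = case, 0 = control).
counts$outcome <- as.integer(rownames(counts) %in% as.character(case_ids))
```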

Page 11: AMIA Joint Summits 2017 - Electronic phenotyping with APHRODITE and the Observational Health Sciences and Informatics (OHDSI) data network

Step 5 – Build Model

Function calls:

model_predictors <- buildModel(flag, fv_full_data, outcomeName, folder)
model <- model_predictors$model
predictorsNames <- model_predictors$predictorsNames

Page 12: AMIA Joint Summits 2017 - Electronic phenotyping with APHRODITE and the Observational Health Sciences and Informatics (OHDSI) data network

Step 6 – Get Evaluation Results

Page 13: AMIA Joint Summits 2017 - Electronic phenotyping with APHRODITE and the Observational Health Sciences and Informatics (OHDSI) data network

How do we know ‘noisy labeling’ works?

Using the evaluation of Agarwal V, et al.:

- Selected one phenotype each from those defined by the Electronic Medical Records and Genomics (eMERGE) and the Observational Medical Outcomes Partnership (OMOP) initiatives:
  - Myocardial infarction
  - Type 2 diabetes mellitus

- Created a manually reviewed gold standard of cases and controls

Agarwal V, et al. Learning statistical models of phenotypes using noisy labeled training data. JAMIA. 2016; 23(6): 1166-1173.

                           Myocardial Infarction (MI)                 Type 2 Diabetes Mellitus (T2DM)
Source                     Cases  Controls  Accuracy  Recall  PPV     Cases  Controls  Accuracy  Recall  PPV
OMOP / PheKB Definition    94     94        0.87      0.91    0.84    152    152       0.92      0.88    0.96
XPRESS Noisy labels        94     94        0.85      0.93    0.8     152    152       0.89      0.99    0.81
APHRODITE Noisy labels     94     94        0.94      0.87    1.00    152    152       0.91      0.98    0.87
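For readers less familiar with the three reported metrics, here is a minimal sketch of how accuracy, recall, and PPV follow from predictions on a gold-standard set. The function name `phenotype_metrics` and the toy labels are hypothetical and are not part of APHRODITE.

```r
# Accuracy, recall, and PPV from binary predictions and reference labels (1 = case).
phenotype_metrics <- function(predicted, actual) {
  tp <- sum(predicted == 1 & actual == 1)  # cases correctly identified
  tn <- sum(predicted == 0 & actual == 0)  # controls correctly identified
  fp <- sum(predicted == 1 & actual == 0)  # controls flagged as cases
  fn <- sum(predicted == 0 & actual == 1)  # cases that were missed
  c(accuracy = (tp + tn) / (tp + tn + fp + fn),
    recall = tp / (tp + fn),
    ppv = tp / (tp + fp))
}

# Toy example: 4 true cases and 4 true controls (hypothetical labels).
actual    <- c(1, 1, 1, 1, 0, 0, 0, 0)
predicted <- c(1, 1, 1, 0, 1, 0, 0, 0)
metrics <- phenotype_metrics(predicted, actual)
```

With one missed case and one false alarm, all three metrics come out at 0.75 in this toy example.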

Page 14: AMIA Joint Summits 2017 - Electronic phenotyping with APHRODITE and the Observational Health Sciences and Informatics (OHDSI) data network

Building phenotype models with APHRODITE

Using the following keywords

Myocardial Infarction (MI)                 Type 2 diabetes mellitus (T2DM)
Old myocardial infarction                  Type 2 diabetes mellitus with hyperosmolar coma
True posterior myocardial infarction       Type 2 diabetes mellitus
Myocardial infarction with complication    Pre-existing type 2 diabetes mellitus
Myocardial infarction in recovery phase    Type 2 diabetes mellitus with multiple complications
Microinfarct of heart                      Type 2 diabetes mellitus in non-obese

Performance of models trained on noisily labeled data

                               Myocardial Infarction (MI)    Type 2 Diabetes Mellitus (T2DM)
Source       Cases    Cont.    Acc.   Recall  PPV            Acc.   Recall  PPV
XPRESS       750      750      0.86   0.89    0.84           0.88   0.89    0.87
APHRODITE    750      750      0.9    0.92    0.89           0.89   0.92    0.87
APHRODITE    1,500    1,500    0.9    0.93    0.9            0.91   0.93    0.88
APHRODITE    10,000   10,000   0.91   0.93    0.91           0.92   0.94    0.89

Performance on the gold-standard test set

                      Myocardial Infarction (MI)           Type 2 Diabetes Mellitus (T2DM)
Source                Cases  Cont.  Acc.   Recall  PPV     Cases  Cont.  Acc.   Recall  PPV
OMOP/PheKB            94     94     0.87   0.91    0.84    152    152    0.92   0.88    0.96
XPRESS                94     94     0.89   0.93    0.86    152    152    0.89   0.88    0.9
APHRODITE (750)       94     94     0.91   0.93    0.90    152    152    0.91   0.95    0.88
APHRODITE (1,500)     94     94     0.92   0.93    0.91    152    152    0.92   0.95    0.89
APHRODITE (10,000)    94     94     0.92   0.94    0.91    152    152    0.93   0.96    0.89

Page 15: AMIA Joint Summits 2017 - Electronic phenotyping with APHRODITE and the Observational Health Sciences and Informatics (OHDSI) data network

Top 15 features for T2DM and MI

Type 2 diabetes mellitus (T2DM)

Rank  Feature            Concept Name                 Concept Vocabulary
1     drugEx:1503297     Metformin                    RxNorm
2     obs:433736         Obesity                      SNOMED
3     obs:16890009       Insulin measurement          SNOMED
4     obs:434005         Morbid Obesity               SNOMED
5     lab:4149519:NA     Glucose measurement          SNOMED
6     drugEx:253182      Regular Insulin, Human       RxNorm
7     drugEx:1112807     Aspirin                      RxNorm
8     lab:4144235:NA     Glucose measurement, blood   SNOMED
9     obs:45889912       Hemoglobin                   SNOMED
10    obs:45889669       Insulin                      CPT4
11    lab:3004410        Hemoglobin A1c (Glycated)    LOINC
12    obs:40398391       Hypertensive disease         SNOMED
13    obs:1503297        Metformin                    RxNorm
14    obs:4325480        Aspirin                      NDFRT
15    obs:40301556       Past medical history         SNOMED

Myocardial Infarction (MI)

Rank  Feature            Concept Name                         Concept Vocabulary
1     obs:40398391       Hypertensive disease                 SNOMED
2     obs:4243768        Auscultation                         SNOMED
3     obs:4342792        Infarction                           NDFRT
4     obs:4191705        Coronary heart disease review        SNOMED
5     obs:4167217        Family history of clinical finding   SNOMED
6     obs:4282140        Chewable tablet                      SNOMED
7     drugEx:1112807     Aspirin                              RxNorm
8     obs:4325480        Aspirin                              NDFRT
9     lab:3002030        Lymphocytes %                        LOINC
10    lab:3014576        Chloride serum/plasma                LOINC
11    obs:4329041        Pain                                 SNOMED
12    lab:40789893       Potassium | Bld-Ser-Plas             LOINC
13    obs:433595         Edema                                SNOMED
14    visit:77670        Chest Pain                           SNOMED
15    obs:1126658        Hydromorphone                        RxNorm

Page 16: AMIA Joint Summits 2017 - Electronic phenotyping with APHRODITE and the Observational Health Sciences and Informatics (OHDSI) data network

Using Anchors with APHRODITE:

Step 1 – Build Keyword List - Anchors

Function call:

wordLists <- buildKeywordList(conn, aphrodite_concept_name, cdmSchema, dbms)

Generated keywords for “Myocardial Infarction”

Page 17: AMIA Joint Summits 2017 - Electronic phenotyping with APHRODITE and the Observational Health Sciences and Informatics (OHDSI) data network

Step 2 – Find Anchors

CDMv5

Function call:

anchor_list <- getAnchors(conn, dbms, cdmSchema, cases, controls, as.character(ignoreList_FF$V3), studyName, outcomeName, flag, numAnchors=50)

Page 18: AMIA Joint Summits 2017 - Electronic phenotyping with APHRODITE and the Observational Health Sciences and Informatics (OHDSI) data network

Step 3 – Curate Anchors

Manual Step

Page 19: AMIA Joint Summits 2017 - Electronic phenotyping with APHRODITE and the Observational Health Sciences and Informatics (OHDSI) data network

Step 4 – Find Cases and Controls - Anchors

CDMv5

Function call:

Anchored_casesANDcontrolspatient_ids_df <- getPatientCohort_w_Anchors(conn, dbms, as.character(keywordList_FF$V3), as.character(ignoreList_FF$V3), cdmSchema, nCases, nControls, flag)

Page 20: AMIA Joint Summits 2017 - Electronic phenotyping with APHRODITE and the Observational Health Sciences and Informatics (OHDSI) data network

Step 5 – Extract Patient Data - Anchors

CDMv5

Function calls:

dataFcases <- getPatientData(conn, dbms, cases, as.character(ignoreList_FF$V3), flag, cdmSchema)
dataFcontrols <- getPatientData(conn, dbms, controls, as.character(ignoreList_FF$V3), flag, cdmSchema)

Page 21: AMIA Joint Summits 2017 - Electronic phenotyping with APHRODITE and the Observational Health Sciences and Informatics (OHDSI) data network

Step 6 – Create Feature Vector - Anchors

Function calls:

fv_all <- buildFeatureVector(flag, dataFcases, dataFcontrols)
fv_full_data <- combineFeatureVectors(flag, data.frame(cases), controls, fv_all, outcomeName)

Page 22: AMIA Joint Summits 2017 - Electronic phenotyping with APHRODITE and the Observational Health Sciences and Informatics (OHDSI) data network

Step 7 – Build Model - Anchors

Function calls:

model_predictors <- buildModel(flag, fv_full_data, outcomeName, folder)
model <- model_predictors$model
predictorsNames <- model_predictors$predictorsNames

Page 23: AMIA Joint Summits 2017 - Electronic phenotyping with APHRODITE and the Observational Health Sciences and Informatics (OHDSI) data network

Anchors suggested for MI and T2DM

Myocardial Infarction (MI)

Source   Importance   Rank   concept_name
obs      1.9036       1      Renal function
obs      1.8554       2      Every eight hours
obs      1.7831       3      Chest CT
obs      1.7108       4      Cataract
obs      1.6386       5      Hypercholesterolemia
obs      1.5663       6      Osteopenia
obs      1.4699       7      Palpitations
obs      1.3735       8      Commode
obs      1.3012       9      Tightness sensation quality
obs      1.253        10     Afternoon
obs      1.012        11     Deficiency
drugEx   0.8916       12     ferrous sulfate
drugEx   0.7711       13     Dobutamine
obs      0.7229       14     Fibrovascular
lab      0.5542       15     Sodium [Moles/volume] in Blood
drugEx   0.4819       16     clopidogrel
lab      0.3855       17     Lactic acid measurement
obs      0.3133       18     Pressure ulcer
drugEx   0.1928       19     zolpidem
obs      0.0964       20     Aspirin 81 MG Enteric Coated Tablet

Type 2 diabetes mellitus (T2DM)

Source   Importance   Rank   concept_name
lab      0.5637       1      Serum HDL/non-HDL cholesterol ratio measurement
lab      0.5466       2      Glucose measurement
obs      0.5328       3      Lipid panel
obs      0.5156       4      Asthma
obs      0.4984       5      Diabetes mellitus
lab      0.4778       6      Hyaline casts
obs      0.4675       7      Pulmonary edema
obs      0.4366       8      Obesity
obs      0.4228       9      Hypoglycemia
obs      0.3987       10     Metformin
obs      0.3816       11     Duplex
obs      0.3712       12     Palpitations
visit    0.3541       13     Obesity
lab      0.3334       14     Glucose
lab      0.3059       15     Hemoglobin A1c
obs      0.2784       16     Subclavicular approach
obs      0.2303       17     Colonoscopy
obs      0.2131       18     Cholesterol
drugEx   0.1925       19     Insulin, Regular, Human
drugEx   0.0206       20     Oxycodone

Page 24: AMIA Joint Summits 2017 - Electronic phenotyping with APHRODITE and the Observational Health Sciences and Informatics (OHDSI) data network

APHRODITE anchored phenotype performance

                       Myocardial Infarction (MI)           Type 2 Diabetes Mellitus (T2DM)
Source                 Cases  Cont.  Acc.   Recall  PPV     Cases  Cont.  Acc.   Recall  PPV
OMOP/PheKB             94     94     0.87   0.91    0.84    152    152    0.92   0.88    0.96
XPRESS                 94     94     0.89   0.93    0.86    152    152    0.89   0.99    0.9
APHRODITE              94     94     0.91   0.93    0.9     152    152    0.91   0.98    0.88
APHRODITE (Anchors)    94     94     0.92   0.97    0.89    152    152    0.92   0.95    0.9

Page 25: AMIA Joint Summits 2017 - Electronic phenotyping with APHRODITE and the Observational Health Sciences and Informatics (OHDSI) data network

Conclusions

We have successfully implemented the framework proposed by Agarwal et al. and the anchor learning framework by Halpern et al. to build and refine phenotype models

We have demonstrated that it is possible to identify anchors during the model building process to generate a better labeled training set, which leads to a better performing model than using keywords alone for noisy labeling

Our main contribution is the APHRODITE package, which allows for the potential redistribution of locally validated phenotype models as well as the sharing of the workflows for learning phenotype models at multiple sites of the OHDSI data network

Agarwal V, et al. Learning statistical models of phenotypes using noisy labeled training data. JAMIA. 2016; 23(6): 1166-1173.
Halpern Y, et al. Electronic medical record phenotyping using the anchor and learn framework. JAMIA. 2016; 23(4): 731-740.

Page 26: AMIA Joint Summits 2017 - Electronic phenotyping with APHRODITE and the Observational Health Sciences and Informatics (OHDSI) data network

Acknowledgements

Stanford
- Nigam Shah
- Vibhu Agarwal
- Shah lab

New York University
- Yoni Halpern

MIT
- David Sontag

OHDSI Community

Page 27: AMIA Joint Summits 2017 - Electronic phenotyping with APHRODITE and the Observational Health Sciences and Informatics (OHDSI) data network

QUESTIONS?

Download Aphrodite: https://github.com/OHDSI/Aphrodite

Learn more: http://tinyurl.com/use-aphrodite

@drjmbanda

Page 28: AMIA Joint Summits 2017 - Electronic phenotyping with APHRODITE and the Observational Health Sciences and Informatics (OHDSI) data network

Why do noisy labels work?

Learning theory - The generalization error of a model trained using data with inaccurate labels can be made small

A model trained using $m$ observations with accurate labels has the same generalization error as a model trained using $m / (1 - 2\tau)^2$ observations with noisy labels.

Inaccuracies in automatic labeling are modeled as random classification noise with error rate $\tau$

Unless $\tau$ is 0.5, having more observations will help

The cost of getting additional observations is negligible
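The sample-size relationship above is easy to evaluate numerically. The helper below is a hypothetical illustration (the name `noisy_sample_size` is not from the source), and it assumes the usual random-classification-noise setting with an error rate below 0.5.

```r
# Noisy-labeled observations needed to match m accurately labeled ones:
# m / (1 - 2 * tau)^2, rounded up to a whole observation.
noisy_sample_size <- function(m, tau) {
  stopifnot(tau >= 0, tau < 0.5)  # assumption: error rate in [0, 0.5)
  ceiling(m / (1 - 2 * tau)^2)
}

noisy_sample_size(1000, 0.0)   # 1000: clean labels need no extra data
noisy_sample_size(1000, 0.1)   # 1563
noisy_sample_size(1000, 0.25)  # 4000
```

The requirement grows without bound as the error rate approaches 0.5, which is why that value is the one case where extra observations cannot help.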

Page 29: AMIA Joint Summits 2017 - Electronic phenotyping with APHRODITE and the Observational Health Sciences and Informatics (OHDSI) data network

Performance results for data removal experiments

                          Myocardial Infarction (MI)           Type 2 Diabetes Mellitus (T2DM)
Source                    Cases  Cont.  Acc.   Recall  PPV     Cases  Cont.  Acc.   Recall  PPV
APHRODITE                 94     94     0.91   0.93    0.9     152    152    0.91   0.98    0.88
  Observations removed    94     94     0.75   0.84    0.78    152    152    0.67   0.76    0.71
  Labs removed            94     94     0.87   0.85    0.82    152    152    0.69   0.78    0.72
  Drugs removed           94     94     0.85   0.85    0.82    152    152    0.83   0.9     0.84
  Visits removed          94     94     0.89   0.88    0.86    152    152    0.86   0.91    0.86
APHRODITE w Anchors       94     94     0.92   0.97    0.89    152    152    0.92   0.95    0.9
  Observations removed    94     94     0.77   0.89    0.79    152    152    0.7    0.77    0.73
  Labs removed            94     94     0.89   0.88    0.81    152    152    0.71   0.79    0.75
  Drugs removed           94     94     0.86   0.87    0.84    152    152    0.86   0.91    0.85
  Visits removed          94     94     0.91   0.9     0.87    152    152    0.88   0.9     0.84
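A data removal experiment of this kind amounts to dropping every feature from one source before training. The sketch below assumes, as the feature names on the top-15 slide suggest, that the source is encoded as a prefix of the column name (`obs:`, `lab:`, `drugEx:`, `visit:`); the helper `remove_source` and the toy values are hypothetical and not part of APHRODITE.

```r
# Toy feature matrix whose column names carry the source prefix (hypothetical values).
fv <- data.frame(
  "obs:433736" = c(1, 0, 1),
  "lab:3004410" = c(0, 2, 1),
  "drugEx:1503297" = c(1, 1, 0),
  "visit:77670" = c(0, 1, 0),
  outcome = c(1, 0, 1),
  check.names = FALSE  # keep the colon-separated names untouched
)

# Drop every column from one source, identified by its name prefix.
remove_source <- function(data, prefix) {
  keep <- !startsWith(names(data), paste0(prefix, ":"))
  data[, keep, drop = FALSE]
}

fv_no_labs <- remove_source(fv, "lab")
```

Retraining on each reduced matrix and re-scoring on the gold standard would then give one row of the table per removed source.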