Extending an I2B2-based Clinical Data Repository with the R Statistical Platform

Daniel W. Connolly, Bhargav Adagarla, John Keighley, Lemuel R. Waitman

University of Kansas Medical Center, Kansas City, Kansas

This project is supported in part by NIH grant UL1TR000001 and NSF Award CNS-1258315


TRANSCRIPT

Page 1

Extending an I2B2-based Clinical Data Repository with the R Statistical Platform

Daniel W. Connolly, Bhargav Adagarla, John Keighley, Lemuel R. Waitman

University of Kansas Medical Center, Kansas City, Kansas

This project is supported in part by NIH grant UL1TR000001 and NSF Award CNS-1258315

Page 2

HERON Research Support Goals

Clinical Data Repository supports:

• Cohort Discovery
  o prospective trials: feasibility
  o retrospective studies: data use

• Hypothesis Generation
  o explore data
  o summarize
  o visualize

Waitman LR, Warren JJ, Manos EL, Connolly DW. Expressing Observations from Electronic Medical Record Flowsheets in an i2b2 based Clinical Data Repository to Support Research and Quality Improvement. AMIA Annu Symp Proc. 2011;2011:1454-63.

photo credit: Christopher Harshaw

informatics.kumc.edu

Healthcare Enterprise Repository for Ontological Narration

Page 3

HERON helps Investigators Identify Cohorts using I2B2

Murphy SN, Weber G, Mendis M, Chueh HC, Churchill S, Glaser JP, Kohane IS. Serving the Enterprise and beyond with Informatics for Integrating Biology and the Bedside (i2b2). J Am Med Inform Assoc. 2010;17(2):124-30.

cancer prevention, treatment

Page 4

HERON Architecture: Where Informatics and Governance Roles Meet

• Data from Epic Clarity database (> 7,000 tables & 60,000 columns)
• Transformed into an I2B2-compatible schema, then de-identified and loaded on a separate database server to be accessed by I2B2.
• De-identified data used by I2B2 is deemed non-human subjects research by our institutional review board

patient privacy, institutional liability

python, SQL

Page 5

Richer Analysis Without Bulk Export

• HERON includes i2b2 query on a largely self-service basis.

• Bulk export of data for off-line analysis involves approval by an oversight committee.

• Aim: support richer analysis without the need for bulk export.

patient privacy, institutional liability

python, SQL

biostatistics, R

cancer prevention, treatment

Investigator

Biostatistician

Privacy Officer

Software Developer

Page 6

R Engine Cell adds Survival Plotting

[Figure: R Engine Cell architecture. Components: Kaplan Meier Web Client Plug-in, RECell, Kaplan Meier jar application, JRI libraries, R statistical software, CRC Cell, I2B2 DW, I2B2 HIVE; data-path steps numbered 1 to 5.]

Segagni D, Ferrazzi F, Larizza C, Tibollo V, Napolitano C, Priori SG, Bellazzi R. R engine cell: integrating R into the i2b2 software infrastructure. J Am Med Inform Assoc. 2011 May 1;18(3):314-7. Epub 2011 Jan 24.

Page 7

Programs and Products

• Program: works for the programmer
• Product: works for the customer
• Sharing programs is useful even though they’re not products yet, or ever!

“Programs must be written for people to read, and only incidentally for machines to execute.”

Hal Abelson, Structure and Interpretation of Computer Programs

Page 8

Integrating the R Engine Cell with HERON for Cancer Research

[Figure: the R Engine Cell architecture diagram from Page 6, repeated.]

Issues:

• Clinical Domain
  o cardio vs. cancer
  o start at birth vs. start at diagnosis
  o stratification: gender vs. stage

• Version Skew
  o RE Cell: I2B2 version 1.4
  o HERON: I2B2 version 1.6

• Architecture...

photo credit: Christopher Harshaw

Page 9

Efficiency, Scalability: R Engine Cell Data Path

[Figure: the R Engine Cell architecture diagram from Page 6, repeated.]

CRC Cell sends back to the plug-in an XML response containing the requested data (extracted from the i2b2 datawarehouse).

725,000,000 facts incl. 60,000 cancer cases

Page 10

Efficiency, Scalability: rgate connects R to Oracle directly

[Figure: rgate architecture. Components: Kaplan Meier Web Client Plug-in, apache, rgate, km_analysis.R, rpy libraries, R statistical software, I2B2 DW, and the PM cell in the I2B2 HIVE; data-path steps numbered 1 to 5.]

Like the CRC cell, rgate calls the PM cell to validate authorization.

Page 11

R Engine Cell approach to R Integration: Kaplan Meier jar application

R Code Generation in KMAnalysis.java:

    ...
    Integer[] statusInteger = (Integer[])status.toArray(new Integer[status.size()]);
    StringBuffer statusStr = new StringBuffer();
    statusStr.append("status<-c(");
    for(int i=0;i<statusInteger.length;i++){
        statusStr.append(statusInteger[i].intValue());
        if(i!=(statusInteger.length-1))
            statusStr.append(",");
    }
    statusStr.append(")");
    ...
    re.eval("data=data.frame(time,status,gender)");
    re.eval("names(data)=c('time','status','gender')");
    re.eval("setwd(\""+resultFolder+"\")");
    re.eval("library(survival)");
    re.eval("fit <- survfit(Surv(data$time, data$status) ~ gender, data)");

python, SQL, HTML, JavaScript
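The Java loop on this slide builds an R assignment statement as a string, one integer at a time. As a rough illustration of the same code-generation idea, here is a minimal sketch in Python. The function name `r_vector_assignment` is hypothetical and is not part of the R Engine Cell or rgate; handing the text to an R evaluator (the `re.eval` calls in the excerpt) is left out.

```python
def r_vector_assignment(name, values):
    """Build R source text such as 'status<-c(1,0,1)' from Python integers."""
    # Join the values with commas, mirroring the Java loop that appends
    # each element and a separator to a StringBuffer.
    body = ",".join(str(int(v)) for v in values)
    return name + "<-c(" + body + ")"


# Hypothetical status flags: 1 = event observed, 0 = censored.
status_src = r_vector_assignment("status", [1, 0, 1])
print(status_src)
```

The sketch makes the trade-off visible: the statistical code exists only as strings inside the host program, so a statistician cannot read or edit it as ordinary R.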

Page 12

R Engine Cell approach to R Integration: Kaplan Meier jar application

[Slide repeats the KMAnalysis.java excerpt from Page 11.]

biostatistics, R

Page 13

Separation of Concerns in rgate: R code goes in .R files

Analysis is written in the language of statisticians:

    ##' km_analysis -- Kaplan Meier analysis from i2b2 observations
    library(ROracle)
    acct = db_config()
    patient.set.survival <- function(concept.paths, patient.set.id,
                                     web.folder, filename) {
      conn <- dbConnect(Oracle(), acct$username, acct$password, access)
      sql <- paste("
        select '", concept.paths$event, "' panel
             , to_char(f.start_date, 'YYYY-MM-DD HH24:MI:SS') start_date
             , pset.patient_num
             , cd.name_char
             , cd.concept_cd
        from blueherondata.observation_fact f, ...")
      data = transform.observations(dbGetQuery(conn, sql))
      fit <- survfit(Surv(data$time, data$status) ~ concept.paths$stratum, data)
      png(paste(web.folder, filename, sep='/'))
      plot(fit, xlab="Time (Years)", ylab="Survival probability")
      dev.off()
    }

biostatistics, R
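For readers less familiar with what `survfit(Surv(time, status) ~ ...)` estimates, the Kaplan-Meier (product-limit) survival estimate is S(t) = product over event times t_i <= t of (1 - d_i / n_i), where d_i is the number of events at t_i and n_i the number still at risk. The following is a minimal sketch in Python with made-up data. It is not the survival package's implementation and omits strata, confidence intervals and plotting.

```python
def kaplan_meier(times, status):
    """Return [(time, survival)] at each distinct time with an observed event.

    status: 1 = event observed, 0 = censored.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        events = sum(1 for ti, s in zip(times, status) if ti == t and s == 1)
        leaving = sum(1 for ti in times if ti == t)
        if events:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        # Everyone observed at t (event or censored) leaves the risk set.
        at_risk -= leaving
    return curve


# Hypothetical follow-up times in years, with censoring flags.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
for t, s in curve:
    print(t, round(s, 3))
```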

Page 14

Separation of Concerns in rgate: R code goes in .R files, but...

How well does the R code behave when the author is not there?

[Slide repeats the km_analysis.R excerpt from Page 13.]

patient privacy, institutional liability

python, SQL, HTML, JavaScript

what the R author needs

???

Page 15

Object Capability Discipline supports the Principle of Least Authority

  Memory safety and encapsulation [1]
  + Effects only by using held references [2]
  + No powerful references by default [3]

  Reference graph ≡ Access graph
  Only connectivity begets connectivity
  Natural Least Authority
  OO expressiveness for security patterns

acct = db_config()

1. closure inspection is not safe: environment(function), as.list(function)
2. plot(fit) implicitly uses results of png(paste(web.folder, filename))
3. R global environment most likely includes lots of powerful references

[Figure: reference diagram with objects A, B and C and message m, shown twice.]

M. Miller, C. Morningstar, B. Frantz; "Capability-based Financial Instruments"; Proceedings of Financial Cryptography (Springer-Verlag); 2000. erights.org

in a: b.m(c)
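Note 2 on this slide points out that `plot(fit)` draws on whatever device `png(...)` opened implicitly. As a hedged illustration of the alternative, "effects only by using held references", here is a minimal Python sketch. `write_summary` and the values are hypothetical and unrelated to rgate's actual code.

```python
import io


def write_summary(out, rows):
    """Write tab-separated rows to `out`, the only output reference held."""
    # The function never opens a file or touches a global device; it can
    # only affect the object its caller chose to pass in.
    for label, value in rows:
        out.write(f"{label}\t{value}\n")


# The caller decides where output may go; here, an in-memory buffer.
buffer = io.StringIO()
write_summary(buffer, [("N", 440), ("Observed", 197)])
report = buffer.getvalue()
print(report, end="")
```

Because the destination is an explicit argument, a reviewer can see from the call site alone what the function is able to modify.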

Page 16

rgate Security Architecture: km_analysis.R can only read patient set

Attenuated patient data access:

    ##' km_analysis -- Kaplan Meier analysis from i2b2 observations
    library(survival)
    run_analysis <- function(patient.set, folder, filename, progress, paths, title, xmax) {
      obs.db = observations(patient.set, unlist(paths))
      progress(paste("query returned", nrow(obs.db), " observations."))
      data <- db2km(obs.db, paths)
      progress(paste("db2km resulted in ", nrow(data), "data points for plotting."))
      survplot(data, title, folder, xmax, filename)
      progress(paste("KM plot stored in", filename, "in", folder))
    }

biostatistics, R

patient privacy, institutional liability
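One way to read "attenuated patient data access" is that the analysis is handed a narrow reader for a single patient set instead of database credentials. The sketch below shows that pattern in Python with an in-memory SQLite database. The table layout is a simplified, hypothetical stand-in for the i2b2 schema, and `make_observations` and `run_analysis` are illustrative names rather than rgate's API. Python closures can be inspected (compare note 1 on Page 15), so this shows the shape of the pattern, not a sandbox.

```python
import sqlite3


def make_observations(conn, patient_set_id):
    """Return a reader limited to one patient set's observations."""
    def observations(concept_codes):
        marks = ",".join("?" for _ in concept_codes)
        sql = (
            "select f.patient_num, f.concept_cd, f.start_date "
            "from observation_fact f "
            "join patient_set pset on pset.patient_num = f.patient_num "
            "where pset.set_id = ? and f.concept_cd in (" + marks + ") "
            "order by f.patient_num, f.start_date"
        )
        return conn.execute(sql, [patient_set_id, *concept_codes]).fetchall()
    return observations


def run_analysis(observations, progress, codes):
    """Analysis code sees only the reader and a progress callback."""
    rows = observations(codes)
    progress(f"query returned {len(rows)} observations.")
    return rows


# Hypothetical, made-up data: patient 2 is outside patient set 7.
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table observation_fact (patient_num integer, concept_cd text, start_date text);
create table patient_set (set_id integer, patient_num integer);
insert into observation_fact values (1, 'DX:cancer', '2010-01-01');
insert into observation_fact values (1, 'DEATH', '2012-06-01');
insert into observation_fact values (2, 'DX:cancer', '2011-03-15');
insert into patient_set values (7, 1);
""")
messages = []
rows = run_analysis(make_observations(conn, 7), messages.append,
                    ["DX:cancer", "DEATH"])
print(rows, messages)
```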

Page 17

Interactive R statistical visualization in HERON Clinical Data Repository

Page 18

Exploring Breast Cancer comorbidities: Obesity, Diabetes

HERON brings together diabetes diagnosis and BMI from hospital EMR with cancer staging from tumor registry and vital status from the U.S. SSA death index.

Page 19

R Survival Plug-in in Regular use by HERON Users

python, SQL, HTML, JavaScript

cancer prevention, treatment

biostatistics, R

patient privacy, institutional liability

informatics.kumc.edu

Page 20

A Balanced Approach to R plug-ins for I2B2

[Figure: the rgate architecture diagram from Page 10 (Kaplan Meier Web Client Plug-in, apache, rgate, km_analysis.R, rpy libraries, R statistical software, I2B2 DW, PM cell in the I2B2 HIVE; steps 1 to 5), with additional analysis scripts abc_analysis.R and xyz_analysis.R and their abc and xyz Web Client Plug-ins. Annotated with the concern labels: biostatistics, R; patient privacy, institutional liability; cancer prevention, treatment; python, SQL, HTML, JavaScript.]

Page 21

Survival Comparison Between Cohorts

Page 22

Survival Comparison Between Cohorts

Log-rank test:

    Groups   N     Observed   Expected
    1        440   197        210
    2        259   125        111

    Chisq= 2.676 on 1 degree of freedom; p= 0.102.
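The last step of the log-rank output, going from the chi-square statistic to the p-value, can be checked by hand. For 1 degree of freedom the upper-tail probability equals erfc(sqrt(x / 2)). The Python sketch below covers only that step. It does not recompute the statistic itself, which depends on a variance term the slide does not show, and the function name is illustrative.

```python
import math


def chisq_p_value_1df(chisq):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    # A chi-square variable with 1 df is the square of a standard normal,
    # so P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chisq / 2.0))


p = chisq_p_value_1df(2.676)
print(round(p, 3))
```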

Page 23

R Data Builder

Alpha!

Page 24

RStudio Server

Page 25

Systems Architecture

[Figure: systems architecture diagram. Labeled elements:
• Source System files (EMR dump, UHC CDB extract)
• secure FTP/ETL
• Identified data server: staged source data; i2b2 compatible star schema
• de-identification process; monthly refresh ETL
• De-identified server: i2b2 compatible star schema
• Application server: i2b2 Hive, rgate, RStudio Server (R scripts, plots, statistics)
• Investigator’s client: i2b2 web client in one browser tab; RStudio IDE web client in another tab]

Page 26

Emerging Functionality: From Data Aggregation to Hospital Quality Preliminary Analysis

• R Data Builder plugin in i2b2 and integration with RStudio Server
• (http://www.rstudio.com/ide/docs/server/getting_started)

• Test Case: Antibiotic Administration for Septic patients in the Emergency Room
  – Past publication to bring in flowsheet data an important foundation
  – University HealthSystem Consortium CDB “gold” standard for KU Hospital
  – What can you solve in i2b2 “same financial encounter” versus send to R?

Page 27

Repurposing i2b2 Clinical Research Infrastructure for Inpatient Quality Improvement

• I2b2 originally ambulatory or population/genomics focused
• Is i2b2 version 1.6 with same financial encounter and modifiers now useful for inpatient research?

• Goal: understand medication timing and antibiotic selection
• Suspect vancomycin preferred
• Validate HERON medications
  – Especially administration timing

Page 28

R Data Builder Plugin and RStudio Server

Web based for user. Just another tab in the browser.

All data stays on the server so there’s no data release and risk of re-identification due to a lost file.

i2b2 Plugin invokes a program that creates an Rda file in the user's directory on the server.

Page 29

UHC, Flowsheets, Medications data sources: what i2b2 could answer versus R analysis

[Figure: cohort flow chart. Boxes, in the order extracted:
• 3513 patients had a UHC-defined septicemia diagnosis
• 2912 patients were an Emergency Admission
• 2861 patients were age 18 years or older
• 2722 patients had an exposure to an Antibiotic in the encounter
• 1839 had ED Triage documentation during the encounter
• A: 1244 patients had 1st antibiotic admin within 24 hours (1474 encounters)
• B: 993 had 1st antibiotic admin given in ED (1140 encounters)
• C: 316 had 1st antibiotic admin not in ED (334 encounters)
• 1836 had the Sepsis Screen Used during the encounter
• D: 261 had 1st antibiotic admin before sepsis screening (277 encounters)
• E: 1040 had 1st antibiotic admin after sepsis screening (1197 encounters)
• 1223 had 2 SIRS criteria, organ dysfunction and suspicion/treatment of infection
• 717 MD notified]

Cohorts above line defined with i2b2 (i2b2 could define cohort)

Cohorts below line further refined with R (cohort refinement with R)

Timing annotations on the chart:
• Average time spent in ED is 8.7 hours, median 7.6
• Average time in ED is 7.9 hours, median 7.1
• Average time spent in ED is 6.7 hours, median 6.6
• Average time to sepsis screening 2.9 hours, median 49 minutes

Note: 28 patients who lacked an ED departure time were excluded from further analysis
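The patient and encounter counts on this slide can be cross-checked with a little arithmetic. The Python sketch below uses only the numbers for cohorts A to E. The explanation that a patient with several encounters may fall on both sides of a split is one reading of the figures, not something the slide states, and the helper name `split_check` is illustrative.

```python
# (patients, encounters) for each labeled cohort on the slide.
cohorts = {
    "A": (1244, 1474),  # 1st antibiotic admin within 24 hours
    "B": (993, 1140),   # given in ED
    "C": (316, 334),    # not given in ED
    "D": (261, 277),    # before sepsis screening
    "E": (1040, 1197),  # after sepsis screening
}


def split_check(parent, parts):
    """Compare a parent cohort with the sum of its sub-cohorts."""
    parent_patients, parent_encounters = cohorts[parent]
    patients = sum(cohorts[k][0] for k in parts)
    encounters = sum(cohorts[k][1] for k in parts)
    return {
        "encounters_match": encounters == parent_encounters,
        "patient_excess": patients - parent_patients,
    }


by_location = split_check("A", ["B", "C"])
by_screening = split_check("A", ["D", "E"])
print(by_location, by_screening)
```

In both splits the encounter counts add up to the 1474 encounters of cohort A, while the patient counts exceed 1244, which is consistent with encounters being the unit that is partitioned.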

Page 30

Density Plots: Time from Arrival to First Antibiotic

[Figure: four density plots; x-axis Hours (0 to 25), y-axis Proportion of Encounters.
1. Broad Spectrum versus Vancomycin (legend Drug: broad, vanc)
2. Lag in Broad Spectrum after Vancomycin (legend Drug: broad, vanc)
3. Lag when given outside Emergency Room (legend When: in.ed, not.in)
4. Administration relative to RN Sepsis Screen (legend Admin: before, after)]

Page 31

Aligning Clinical Research Informatics for Quality: Registry Abstraction and Data Delivery

• REDCap registries into i2b2 allows intuitive exploration
  – Researchers may need less abstraction as data is extracted from the EMR.

• i2b2 into REDCap: inherit security model, graphical/export tools

Page 32

1st Annual HERON Fishing Tournament: HERON training workshop

• August 1st:
  o 1:00-5:00 PM Classroom style training
  o Convene for social gathering

• August 2nd:
  o 8:00-10:00 AM Hands-on training on attendee topics
  o 10:15-11:30 Discussion

Invited: KUMC/Frontiers researchers and regional informaticians

informatics.kumc.edu frontiersresearch.org