
Post on 11-Jan-2016


Using EMR Data for Population Registries

Diana Gumas, JHMCIS Senior Director for Research Systems, EPR and EPR2020/Amalga

David Thiemann, Center for Clinical Data Analysis


Potential Data Uses

• Sample Size Estimates (aggregate data without IRB approval)

– Feasibility, grant applications, statistical planning

• Identifying patients for enrollment/recruitment

– By diagnosis, pathology, stage, labs, meds

• Identifying/creating matched study controls

• Obtaining current demographics (name, address) for mail solicitation

– From research list or by clinic, provider, clinical criteria

• Obtaining ongoing clinical + administrative data on a registry panel

– Labs, visits, procedures, immunizations, CPT/ICD9 codes, resource use


Possible research data sources

• EPR (JHH & JHBMC)

• Sunrise Clinical Manager (JHH – inpatient)

• Meditech (Bayview)

• Casemix Datamart

• GE Centricity (JHCP)

• EPR2020

• Departmental Systems (ED, OR, Anesthesia)

• Clinical Research Management System (CRMS)

• IDX (professional fees)

• Death Registry


Methods for Data Access

• Historical: Researcher Negotiates Access With Clinical System /DBA

– Logistic nightmare, technical challenge

• Clinical Research Management System (CRMS)

– Study cohort with real-time links to enterprise data

• Center for Clinical Data Analysis

– Monthly/quarterly data extracts from designated systems


Clinical Research Management System (CRMS)

• 1,054 Users

• 1,079 Active Studies

• 25,430 Participants

Data Available in CRMS

– eIRB

– EPR (patient demographics)

– Study participants / accruals

– Electronic Case Report Forms (in next 2-3 months)


Clinical Research Management System (CRMS)

Ways to extract data

– Canned Reports

– Ad-hoc querying using SQL

– Automated study-specific data extracts (possible with CCDA support)


EPR2020 Data for Researchers

Today (from EPR)

• 4.2M Patients, 23.4M Visits

• 12.3M Documents, 6.8M Radiology Reports

• 25.6M Lab Results

• 1.5M Problems, 2.2M Medications, 140K Allergies

Planned

• Bayview & JHCP data

• ICD9 diagnosis codes and CPT charges (IDX)

Future

• Death Registry

• Blood Product Data for Transfusions

• Eclipsys SCM Order data

• HMED (ED), ORMIS, eADR/Medivision


My Participant’s Lab Data


Reliable. Driven by the CRMS Participant Registry. Exportable.


Registry Cohort Discovery using EPR2020

A JHM investigator wants to find and enroll diabetic patients:

• aged 45-65 years

• with hemoglobin A1C between 7 and 9%

• with serum creatinine < 2 mg/dL
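These criteria translate fairly directly into a query. The following is a minimal sketch, not the EPR2020 interface: the tables (`patients`, `problems`, `labs`), the column names and the `DIABETES`/`HBA1C`/`CREATININE` labels are all hypothetical. It also assumes that the most recent result of each lab is the one that counts and that the A1C bounds are inclusive.

```python
import sqlite3
from datetime import date

# Hypothetical schema for illustration only; not the EPR2020 data model.
SCHEMA = """
CREATE TABLE patients (patient_id INTEGER PRIMARY KEY, birth_date TEXT);
CREATE TABLE problems (patient_id INTEGER, problem TEXT);
CREATE TABLE labs (patient_id INTEGER, lab_name TEXT, value REAL, result_date TEXT);
"""

# Most recent value per patient for one lab (an assumed selection rule).
LATEST_LAB = """
SELECT l.patient_id, l.value
FROM labs AS l
WHERE l.lab_name = ?
  AND l.result_date = (SELECT MAX(result_date) FROM labs
                       WHERE patient_id = l.patient_id
                         AND lab_name = l.lab_name)
"""


def age_on(birth, as_of):
    """Age in whole years on the date as_of."""
    before_birthday = (as_of.month, as_of.day) < (birth.month, birth.day)
    return as_of.year - birth.year - before_birthday


def find_cohort(conn, as_of):
    """Diabetic patients aged 45-65 with latest A1C in [7, 9] and creatinine < 2."""
    a1c = dict(conn.execute(LATEST_LAB, ("HBA1C",)))
    creatinine = dict(conn.execute(LATEST_LAB, ("CREATININE",)))
    diabetics = conn.execute(
        "SELECT DISTINCT p.patient_id, p.birth_date FROM patients AS p "
        "JOIN problems AS pr ON pr.patient_id = p.patient_id "
        "WHERE pr.problem = 'DIABETES'"
    )
    cohort = []
    for patient_id, birth_date in diabetics:
        age = age_on(date.fromisoformat(birth_date), as_of)
        if (45 <= age <= 65
                and patient_id in a1c and 7.0 <= a1c[patient_id] <= 9.0
                and patient_id in creatinine and creatinine[patient_id] < 2.0):
            cohort.append(patient_id)
    return sorted(cohort)


def demo_db():
    """Small in-memory database with made-up patients."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO patients VALUES (?, ?)", [
        (1, "1960-05-01"), (2, "1958-03-15"), (3, "1930-01-20"),
        (4, "1962-07-04"), (5, "1955-11-30"),
    ])
    conn.executemany("INSERT INTO problems VALUES (?, ?)", [
        (1, "DIABETES"), (2, "DIABETES"), (3, "DIABETES"),
        (4, "HYPERTENSION"), (5, "DIABETES"),
    ])
    conn.executemany("INSERT INTO labs VALUES (?, ?, ?, ?)", [
        (1, "HBA1C", 9.5, "2008-02-01"), (1, "HBA1C", 8.1, "2009-10-01"),
        (1, "CREATININE", 1.0, "2009-10-01"),
        (2, "HBA1C", 10.5, "2009-09-01"), (2, "CREATININE", 1.1, "2009-09-01"),
        (3, "HBA1C", 8.0, "2009-08-01"), (3, "CREATININE", 1.0, "2009-08-01"),
        (4, "HBA1C", 7.5, "2009-07-01"), (4, "CREATININE", 0.9, "2009-07-01"),
        (5, "HBA1C", 8.0, "2009-06-01"), (5, "CREATININE", 2.4, "2009-06-01"),
    ])
    return conn
```

Calling `find_cohort(demo_db(), date(2010, 1, 1))` keeps only the patient who meets every criterion. Patients with no result for either lab are excluded rather than assumed normal, which is itself a design choice an investigator would need to confirm.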


Center for Clinical Data Analysis (CCDA)

Provides periodic (monthly/quarterly) bulk data extracts (delimited/flat files, .xls):

• Preliminary, anonymous data for feasibility, grant applications and statistical sample-size estimates

• IRB-approved case-finding for study enrollment (mailings, phone solicitation), chart review, and cohort/case-control studies

• Research data extracts - monthly/quarterly integrated extracts from EPR, POE, ORMIS, lab/PDS, billing systems, vaccination/transfusion/culture data, etc.


How CCDA works

• Email [email protected], cc: [email protected]; phone 410-955-65558 (Thiemann)

• For IRB-approved research:

– Provide full protocol + IRB approval

– Meet to discuss query methods, format

– Iterate, then schedule production extracts (email extracts, Jshare)

– Cost: $100/hour

• For non-IRB projects (exploratory analyses, QI)

– Same process, cost subsidized by ICTR/JHM

– Do NOT implicitly morph a QI project into IRB research


The Basics: Getting Clinical Data Into a Registry Database

• Real work, not ad hoc/bootstrap

• Need $$$ and FTE(s)

• Smart analyst(s) who know database technology and understand (or can learn) nuances of the sources and content domain

• Hands-on PI management/guidance

• Statistical liaison early, before database schema and ETL methods are set in stone


The Extract-Transform-Load Process: Getting Clinical Data into Research DB

• Raw clinical/administrative data is useless for research

• Build an intermediate (staging) database

– Don’t do data management in SAS/Stata/Excel

• Data dictionary—derivation for each field

• Templated, tested, documented cleanup scripts/routines.

• Intermediate tables: Log each step/modification

– For each batch, be able to re-create data transform from scratch

– Version control, change control and documentation are vital

– Build data versioning into the database
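One way to picture the staging approach is a database where raw rows are never edited, every transform step writes a log row, and any batch can be rebuilt from its raw rows. The sketch below uses SQLite purely for illustration. The table names (`raw_lab`, `stage_lab`, `transform_log`), the `script_version` label and the single cleanup step are assumptions, not a description of the CCDA pipeline.

```python
import sqlite3

# Hypothetical staging schema: raw rows stay untouched, staged rows are derived.
SCHEMA = """
CREATE TABLE raw_lab (batch_id INTEGER, mrn TEXT, lab_name TEXT, result_text TEXT);
CREATE TABLE stage_lab (batch_id INTEGER, mrn TEXT, lab_name TEXT, result_text TEXT);
CREATE TABLE transform_log (
    log_id INTEGER PRIMARY KEY,
    batch_id INTEGER,
    step TEXT,
    script_version TEXT,
    rows_affected INTEGER,
    run_at TEXT DEFAULT CURRENT_TIMESTAMP
);
"""


def run_step(conn, batch_id, step, script_version, sql):
    """Run one transform statement for a batch and record it in the log."""
    cur = conn.execute(sql, {"batch_id": batch_id})
    conn.execute(
        "INSERT INTO transform_log (batch_id, step, script_version, rows_affected) "
        "VALUES (?, ?, ?, ?)",
        (batch_id, step, script_version, cur.rowcount),
    )
    return cur.rowcount


def rebuild_batch(conn, batch_id, script_version="v1"):
    """Re-create the staged rows for one batch from scratch, logging each step."""
    run_step(conn, batch_id, "clear_stage", script_version,
             "DELETE FROM stage_lab WHERE batch_id = :batch_id")
    run_step(conn, batch_id, "copy_raw", script_version,
             "INSERT INTO stage_lab (batch_id, mrn, lab_name, result_text) "
             "SELECT batch_id, mrn, lab_name, result_text "
             "FROM raw_lab WHERE batch_id = :batch_id")
    run_step(conn, batch_id, "normalize_lab_name", script_version,
             "UPDATE stage_lab SET lab_name = UPPER(TRIM(lab_name)) "
             "WHERE batch_id = :batch_id")
    conn.commit()
```

Because `rebuild_batch` starts by clearing the staged rows for that batch, running it twice gives the same staged data, and `transform_log` shows which script version produced it.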


Transforming Data

• Raw data typically string (char/text) fields

• Unanalyzable characters (* < >, comments) still have meaning

– Put non-numeric data in separate field. Avoid numerical recoding (999)

• ~3% of pts have multiple/non-preferred MRNs

– Need 1-to-many link table

• Assays/reference ranges/coding changes

– Avoid using raw codes (CPT/ICD) in research db

– Map clinical codes to research terms

• Defer analytic assumptions. When recoding data, anticipate problems. Keep options open.
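Two of the points above can be sketched in a few lines: keeping the non-numeric parts of a result string in separate fields instead of recoding them, and resolving several MRNs to one person through a link table. The regular expression, the field names and the sample MRNs are illustrative assumptions. Real result strings vary by source system, so a pattern like this needs checking against the actual data.

```python
import re
from typing import NamedTuple, Optional


class ParsedResult(NamedTuple):
    value: Optional[float]      # numeric part, if one was found
    comparator: Optional[str]   # "<", ">", "<=", ">=" kept as data, not dropped
    flag: Optional[str]         # "*" marker, kept in its own field
    comment: Optional[str]      # trailing or unparsed text, kept verbatim


# Assumed shape: optional comparator, a number, optional "*", optional comment.
_PATTERN = re.compile(
    r"^\s*(?P<cmp>[<>]=?)?\s*(?P<num>-?\d+(?:\.\d+)?)\s*(?P<flag>\*)?\s*(?P<rest>.*?)\s*$"
)


def parse_result(raw: str) -> ParsedResult:
    """Split a raw result string into fields without inventing numeric codes."""
    match = _PATTERN.match(raw)
    if match is None:
        # No leading number: keep the whole string as a comment, value stays None.
        return ParsedResult(None, None, None, raw.strip() or None)
    return ParsedResult(
        float(match["num"]), match["cmp"], match["flag"], match["rest"] or None
    )


# Hypothetical 1-to-many link table: several MRNs resolve to one person id.
MRN_LINK = {"A100": "P1", "A200": "P1", "B300": "P2"}


def resolve_patient(mrn: str) -> Optional[str]:
    """Return the person id for an MRN, or None if the MRN is unknown."""
    return MRN_LINK.get(mrn)
```

A string such as `"<0.5"` keeps its comparator next to the number, and `"hemolyzed"` yields no value and a comment, so nothing is silently turned into a sentinel like 999.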


More Data Transform Challenges

• NEVER trust raw data. Learn business logic of source system.

– CPTs morph annually, internal complexity/redundancy

– Lab assays/reference/terms change

– Parsing is inherently unreliable

– Administrative names/groups change (clinic #s, departments).

• Duplicate-value problems (labs, orders)

• System-attribution source/datetime (POE, lab)

• Always run an aggregate (“group by”) query to identify alternative names (e.g. lab name) and values (number, result) before transform. Otherwise you’ll miss something.
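The aggregate check is short to write. Here is a minimal sketch against SQLite; the `profile_column` helper and any table or column names passed to it are hypothetical.

```python
import sqlite3


def profile_column(conn, table, column):
    """Return (value, count) pairs for one column, most frequent first."""
    # Identifiers cannot be bound as SQL parameters, so validate before interpolating.
    for name in (table, column):
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    sql = (
        f"SELECT {column}, COUNT(*) AS n FROM {table} "
        f"GROUP BY {column} ORDER BY n DESC, {column} ASC"
    )
    return conn.execute(sql).fetchall()
```

Run against a lab-name column, a listing like this shows spellings such as `HbA1c`, `HBA1C` and `Hemoglobin A1c` side by side, which is the cue to map them to one research term before transforming.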


Understanding Business Logic

• Trust but verify: Test coding accuracy

– Providers may habitually use imprecise/inaccurate diagnosis codes (especially in profee data)

– ICD9 procedure indications often a billing fiction

– Trained coders may make systematic errors

– Different content domains may have different standards (inpt vs outpt coders)

– Don’t infer/assume dependencies unless enforced by source system.

• Run min/max queries, aggregates, outer joins

– Confirm date ranges, data ranges, relative proportions by year

• Don’t assume that null rows actually are empty. Maybe the query missed something
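The checks named above (min/max, counts by year, outer joins) might look like the following. This is a sketch under assumptions: the `patients` and `labs` tables and their columns are hypothetical, and dates are assumed to be stored as ISO `YYYY-MM-DD` text.

```python
import sqlite3


def yearly_profile(conn):
    """Per year: row count, first/last result date, min/max value."""
    sql = """
    SELECT strftime('%Y', result_date) AS yr,
           COUNT(*),
           MIN(result_date), MAX(result_date),
           MIN(value), MAX(value)
    FROM labs
    GROUP BY yr
    ORDER BY yr
    """
    return conn.execute(sql).fetchall()


def patients_without_labs(conn):
    """Outer join: patients with no lab rows at all, for manual follow-up."""
    sql = """
    SELECT p.patient_id
    FROM patients AS p
    LEFT JOIN labs AS l ON l.patient_id = p.patient_id
    WHERE l.patient_id IS NULL
    ORDER BY p.patient_id
    """
    return [row[0] for row in conn.execute(sql)]
```

A year with far fewer rows than its neighbors, or a patient list from the outer join that is longer than expected, points to a gap in the extract rather than a true absence of data.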


JHM Clinical Data Landscape: Past, Present and Future

Past: Babble of unintegrated systems

• EPR (antiquated technology, VSAM files, DB2) contains text, not queryable, analyzable data

Present: EPR2020 (aka Amalga) – integrated data!

• Has everything in EPR, plus JHCP, plus gradually adding data from clinical/departmental/administrative systems (IDX CPTs, transfusion medicine, ORMIS, HMED, eADR, death registry, ad infinitum)

Future: ? Epic, ? JHM Data Warehouse

• Epic: One system replacing all major JHM systems

• JHH timeline: 4+ years


JHM Data Sources: Casemix Datamart

• Gold standard for JHM (non-profee) administrative data, including payer/insurance data

• Combines data from Keane (hospital charges), ADT (admission/discharge/transfer), HDM (ICD9 diagnosis + procedure coding), HSCRC (regulatory submissions)

• Not a true data warehouse; meager reconciliation

• Best source for length of stay, resource use, ICD9 diagnoses

• Outpatient ICD9s limited

• Has JHH + BMC + HCGH data


JHM Data Sources: IDX (profee)

• Gold standard for inpatient + outpatient CPT (profee charge) data

• ICD9 diagnosis data problematic

• Limitation: No data from non-faculty providers (private physicians, etc.)

• Difficult to query. Has a data warehouse, limited access.

• Early target for EPR2020/Amalga integration.


JHH Data Sources: SCM/POE

• Sunrise Clinical Manager/Provider Order Entry

• Replicated transactional database, difficult to query

• For registry purposes POE has large attribution/process challenges: Stutter-step orders, multiple alerts, imputed times

• Great source for inpatient meds, labs, physiologic monitor data

• No codified ICD9/Snomed/RxNorm data

• No outpatient data


JHH Data Sources: SCC/AIM

• Sunrise Critical Care (aka Emtek, Eclipsys). JHH ICUs + stepdown units + oncology

• AIM analytic database contains selected but comprehensive batch extract

• Sunsets as ICUs switch to POE ClinDoc

• Challenging to query. Lots of denormalized fields


JHH + BMC Data Sources: PDS

• PDS=Pathology Data Systems

• Includes lab, transfusion medicine, anatomic pathology, cytopath, John Boitnott’s death registry

• Lab data also available via EPR2020/Amalga and POE


BMC Data Sources: Meditech

• Shrink-wrapped, comprehensive inpatient + outpatient clinical + financial system

• Difficult for ad hoc research queries.

• Exports data to Datamart and EPR2020

• BMC-JHH patient linkage doable but difficult, needs caution


JHCP Data Sources: GE Centricity

• All clinical + administrative data for JHCP clinics

• Largely opaque to research query; JHCP sometimes collaborates directly, especially for its physician/investigators

• Early target for EPR2020/Amalga integration

• Linkage challenges to BMC and JHH MRNs


JHH Departmental Data: ORMIS + eADR/Medivision

• ORMIS: Operating Room Management Information System

• Mostly transactional scheduling/tracking/administrative data, limited clinical data.

• Has diagnoses, procedures, case start/stop times

• eADR/Medivision (anesthesia) still evolving, limited research data access

• Design challenges similar to legacy SCC critical-care system.


JHH Departmental Data: HMED (Emergency Department)

• Mostly opaque to research

• Replicated data hosted by Datamart
