Secondary Use of Healthcare Data for Translational Research
Shawn Murphy MD, Ph.D. Indivo Conference June 18, 2012
Example: PPARγ Pro12Ala and Diabetes
0.1 0.2
0.3 0.4
0.5 0.6
0.7 0.8
0.9 1
1.1 1.2 Estimated risk
(Ala allele) 1.3 2.0
Deeb et al. Mancini et al.
Ringel et al.
Meirhaeghe et al.
Clement et al.
Hara et al.
Altshuler et al.
Hegele et al.
Oh et al.
Douglas et al.
All studies
Lei et al. Hasstedt et al.
1.4 1.5
1.6 1.7
1.8 1.9
Sample size
Ala is protective
Mori et al.
Overall P value = 2 x 10-7
Odds ratio = 0.79 (0.72-0.86)
Courtesy J. Hirschhorn
High Throughput Methods for supporting Translational Research
Set of patients is selected from medical record data in a high throughput fashion
Investigators explore phenotypes of these patients using i2b2 tools and a translational team developed to work specifically with medical record data
Distributed networks cross institutional boundaries for phenotype selection, public health, and hypothesis testing
Working with Indivo for advancement of patient participation in Translational Research
High Throughput Methods for supporting Translational Research
Set of patients is selected from medical record data in a high throughput fashion
Investigators explore phenotypes of these patients using i2b2 tools and a translational team developed to work specifically with medical record data
Distributed networks cross institutional boundaries for phenotype selection, public health, and hypothesis testing
Working with Indivo for advancement of patient participation in Translational Research
De-identified
Data Warehouse
1) Queries for aggregate patient numbers
0000004 2185793 ... ...
0000004 2185793 ... ...
2) Returns identified patient data
Z731984X Z74902XX ... ...
Real identifiers
Query construction in web tool
Encrypted identifiers
OR - Start with list of specific patients, usually from (1) - Authorized use by IRB Protocol - Returns contact and PCP information, demographics, providers, visits, diagnoses, medications, procedures, laboratories, microbiology, reports (discharge, LMR, operative, radiology, pathology, cardiology, pulmonary, endoscopy), and images into a Microsoft Access database and text files.
- Warehouse of in & outpatient clinical data - 6.0 million Partners Healthcare patients - 1.5 billion diagnoses, medications, procedures, laboratories, & physical findings coupled to demographic & visit data - Authorized use by faculty status - Clinicians can construct complex queries - Queries cannot identify individuals, internally can produce identifiers for (2)
Research Patient Data Registry exists at Partners Healthcare to find patient cohorts for clinical research
Organizing data in the Clinical Data Warehouse
Binary Tree
start search
Patient-Concept FACTS patient_key concept_key start_date end_date practitioner_key encounter_key
Patient DIMENSION patient_key patient_id (encrypted) sex age birth_date race
ZIP deceased
Concept DIMENSION concept_key concept_text search_hierarchy
Encounter DIMENSION encounter_key encounter_date
Pract . DIMENSION practitioner_key name service
hospital_of_service
value_type numeric_value textual_value abnormal_flag
Star schema
1500 million
.16 .06 150 6.0
Query items Person who is using tool
Query construction
Results - broken down by number distinct of patients
FINDING PATIENTS
High Throughput Methods for supporting Translational Research
Set of patients is selected from medical record data in a high throughput fashion
Investigators explore phenotypes of these patients using i2b2 tools and a translational team developed to work specifically with medical record data
Distributed networks cross institutional boundaries for phenotype selection, public health, and hypothesis testing
Working with Indivo for advancement of patient participation in Translational Research
The National Center for Biomedical Computing entitled Informatics for Integrating Biology and the Bedside (i2b2), what is it?
Software for explicitly organizing and transforming person-oriented clinical data to a way that is optimized for clinical genomics research Allows integration of clinical data, trials data, and genotypic data
A portable and extensible application framework Software is built in a modular pattern that allows additions without
disturbing core parts Available as open source at https://www.i2b2.org
i2b2 Cell: The Canonical Software Module
i2b2
HTTP XML (minimum: RESTful)
Business Logic
Data Access
Data Objects
An i2b2 Environment (the Hive) is built from i2b2 Cells
A B
C
“Hive” of software services provided by i2b2 cells
remote
local
observation_fact
PK Patient_NumPK Encounter_NumPK Concept_CDPK Observer_CDPK Start_DatePK Modifier_CDPK Instance_Num
End_Date ValType_CD TVal_Char NVal_Num ValueFlag_CD Observation_Blob
visit_dimension
PK Encounter_Num
Start_Date End_Date Active_Status_CD Location_CD*
patient_dimension
PK Patient_Num
Birth_Date Death_Date Vital_Status_CD Age_Num* Gender_CD* Race_CD* Ethnicity_CD*
concept_dimension
PK Concept_Path
Concept_CD Name_Char
observer_dimension
PK Observer_Path
Observer_CD Name_Char
1
∞
∞
∞∞
∞
∞
1
modifier_dimension
PK Modifier_Path
Modifier_CD Name_Char
∞
∞
observation_fact
PK Patient_NumPK Encounter_NumPK Concept_CDPK Observer_CDPK Start_DatePK Modifier_CDPK Instance_Num
End_Date ValType_CD TVal_Char NVal_Num ValueFlag_CD Observation_Blob
visit_dimension
PK Encounter_Num
Start_Date End_Date Active_Status_CD Location_CD*
patient_dimension
PK Patient_Num
Birth_Date Death_Date Vital_Status_CD Age_Num* Gender_CD* Race_CD* Ethnicity_CD*
concept_dimension
PK Concept_Path
Concept_CD Name_Char
observer_dimension
PK Observer_Path
Observer_CD Name_Char
1
∞
∞
∞∞
∞
∞
1
modifier_dimension
PK Modifier_Path
Modifier_CD Name_Char
∞
∞
observation_fact
PK Patient_NumPK Encounter_NumPK Concept_CDPK Observer_CDPK Start_DatePK Modifier_CDPK Instance_Num
End_Date ValType_CD TVal_Char NVal_Num ValueFlag_CD Observation_Blob
visit_dimension
PK Encounter_Num
Start_Date End_Date Active_Status_CD Location_CD*
patient_dimension
PK Patient_Num
Birth_Date Death_Date Vital_Status_CD Age_Num* Gender_CD* Race_CD* Ethnicity_CD*
concept_dimension
PK Concept_Path
Concept_CD Name_Char
observer_dimension
PK Observer_Path
Observer_CD Name_Char
1
∞
∞
∞∞
∞
∞
1
modifier_dimension
PK Modifier_Path
Modifier_CD Name_Char
∞
∞
observation_fact
PK Patient_NumPK Encounter_NumPK Concept_CDPK Observer_CDPK Start_DatePK Modifier_CDPK Instance_Num
End_Date ValType_CD TVal_Char NVal_Num ValueFlag_CD Observation_Blob
visit_dimension
PK Encounter_Num
Start_Date End_Date Active_Status_CD Location_CD*
patient_dimension
PK Patient_Num
Birth_Date Death_Date Vital_Status_CD Age_Num* Gender_CD* Race_CD* Ethnicity_CD*
concept_dimension
PK Concept_Path
Concept_CD Name_Char
observer_dimension
PK Observer_Path
Observer_CD Name_Char
1
∞
∞
∞∞
∞
∞
1
modifier_dimension
PK Modifier_Path
Modifier_CD Name_Char
∞
∞
Data Model
Data Repository Cell
I2b2 Software components are distributed as open source
Set of patients is selected through Enterprise Repository and data is gathered into a data mart
EDR
Selected patients
Data directly from EDR
Data from other sources
Data imported specifically for project
Automated Queries search for Patients and add Data
Project Specific
Phenotypic Data
Interogation can occur through i2b2 web client
Data is available through the i2b2 Workbench
RPDR
Final Project
DB
RPDR Mart
Local Clinical EDC
Local sources Ex: BICS
Project Manager Biostatistician Analyst
Local data extract analyst Programmer
RPDR Support Programmers
Team support for Projects
Natural Language Processing
HOSPITAL COURSE: ... It was recommended that she receive …We also added Lactinax, oral form of Lactobacillus acidophilus to attempt a repopulation of her gut.
SH: widow,lives alone,2 children,no tob/alcohol.
BRIEF RESUME OF HOSPITAL COURSE: 63 yo woman with COPD, 50 pack-yr tobacco (quit 3 wks ago), spinal stenosis, ...
SOCIAL HISTORY: Negative for tobacco, alcohol, and IV drug abuse.
SOCIAL HISTORY: The patient is a nonsmoker. No alcohol.
SOCIAL HISTORY: The patient is married with four grown daughters, uses tobacco, has wine with dinner. Smoker
Non-Smoker
SOCIAL HISTORY: The patient lives in rehab, married. Unclear smoking history from the admission note…
Past Smoker
Hard to pick
Hard to pick
???
NLP Workflow
NLP Specialists
I2b2 Project Investigators
Research Investigator Workflow enabled by mi2b2
Images Retrieved
from Clinical PACS
BIRN/XNAT
Use i2b2
Request Images with
Accession #’s
Query is done To find patients
Study Images
Derive new data from images
mi2b2
Builds complex “Custom Study” displays
Ontology-driven data organization allows simplistic data models that paste together
i2b2 DB Project 1
i2b2 DB Project 2
i2b2 DB Project 3
of Project 3
of Project 2
Shared data of Project 1
[ Enterprise Shared Data ]
Ontology
Consent/Tracking
Security
Custom views supports clinical trials in i2b2
A Patient-Centered View is welcomed by clinical researchers when they review patients for clinical trials
The Patient-Centered View can be focused to support specific needs of each clinical trial review
New Apps can be written to assist eligibility determination during the viewing of a patient
1.
2. SMART Connect
SMART REST
SMART Addition to i2b2
A “Patient-Centered View” enabled by SMART
Up close …
Patient-Centered View can be edited by user
High Throughput Methods for supporting Translational Research
Set of patients is selected from medical record data in a high throughput fashion
Investigators explore phenotypes of these patients using i2b2 tools and a translational team developed to work specifically with medical record data
Distributed networks cross institutional boundaries for phenotype selection, public health, and hypothesis testing
Working with Indivo for advancement of patient participation in Translational Research
Community Arizona State University Beth Israel Deaconness Hospital, Boston, MA Boston University School of Medicine, Boston, MA Brigham and Women's Hospital, Boston, MA Case Western Reserve Hospital Children's Hospital, Boston, MA (Denver) Children's Hospital, Denver, CO Children's Hospital of Philadelphia, PA Childrens's National Medical Center (GWU) Cincinnati Children's Hospital, Cincinnati, OH Cleveland Clinic, Cleveland, OH (Weil Medical College of) Cornell, NYC, NY Duke Medical College Group Health Cooperative Harvard Pilgrim Healthcare Harvard Medical School, Boston, MA Health Sciences South Carolina Kaiser Permanente Health Kimmel Cancer Center (Thomas Jefferson University) Massachusetts General Hospital, Boston, MA Maine Medical Center, Portland, ME Marshfield Clinic, Wisconsin Morehouse School of Medicine, Atlanta, GA Ohio State University Medical Center, Columbus, OH Oregon Health & Science University, Portland, OR Renaissance Computing Institute, Chapel Hill, NC South Carolina Clinical and Translational Research Institute Tufts Medical Center, Boston, MA University of Alabama University of Arkansas Medical School University of California Davis, Davis, CA University of California San Francisco, SF, CA University of Chicago University of Massachusetts Medical School, Worcester, MA University of Michigan Medical Center, Ann Arbor, MI University of Pennsylvania School of Medicine, Philadelphia, PA University of Rochester Medical Center, Rochester, NY University of Texas Health Sciences Center at Houston, Houston, TX University of Texas Health Sciences Center at San Antonio, SA, TX University of Texas Health Sciences Center Southwestern, Dallas, TX Utah Health Science Center, Salt Lake City, UT University of Washington, Seattle, WA University of Wisconsin Madison Veterans Administration Boston and Utah
Georges Pompidous Hospital, Paris, France Institute for Data Technology and Informatics (IDI), NTNU, Norway Karolinska Institute, Sweden University of Erlangen-Nuremberg, Germany University of Goettingen, Goettingen, Germany University of Leicester and Hospitals, England (Biomed. Res. Informatics Ctr. for
Clin. Sci) University of Pavia, Pavia, Italy University of Seoul, Seoul, Korea
United States International
Aggregating across 4 hospitals, 3 i2b2 instances SHRINE (Shared Research Informatics Network) = Distributed Queries
Clinical data in SHRINE
10 years (2001-2011) 4 hospitals 6 million total patients >1 billion medical observations
Demographics Diagnoses (ICD9-CM) Medications (RxNorm) Labs (LOINC)
2012
Query Health = I2b2 queries distributed to hospitals that may or may not be hosting i2b2
High Throughput Methods for supporting Translational Research
Set of patients is selected from medical record data in a high throughput fashion
Investigators explore phenotypes of these patients using i2b2 tools and a translational team developed to work specifically with medical record data
Distributed networks cross institutional boundaries for phenotype selection, public health, and hypothesis testing
Working with Indivo for advancement of patient participation in Translational Research
Integration of i2b2 and Indivo
INDIVO
PATIENT MATCH
DATA FILTER
PATIENT PUBLISH
Key new components
PATIENT MATCH
DATA FILTER
PATIENT PUBLISH
Allows patients permission to be attached to their records in i2b2
Allows only specified parts of the i2b2 record on the patient to be viewed by the patient
Allows patients to publish their outcomes to i2b2 = WRITE BACK TO i2b2
Advantages for Patients
Able to see data on themselves from medical record Important to learn lessons form Beth Israel – Microsoft Healthvault
venture
Integration can span across several institutions May be able to take advantage of SHRINE infrastructure
SMART will allow integrated view to focus on patient’s illnesses and concerns
Indivo may provide a way to return research results to patients
Advantages for Research
Indivo makes it possible to collect Patient Reported Outcomes Can allow engagement of patient for recruitment for Clinical
Trials
Collaborators RPDR
Eugene Braunwald John Glaser Diane Keogh Henry Chueh
I2b2 and SMART
Isaac Kohane Susanne Churchill Griffin Weber Michael Mendis Nich Wattanasin Vivian Gainer Lori Phillips Rajesh Kuttan Wensong Pan Janice Donahue William Simons (SHRINE) Andy McMurry (SHRINE) Doug McFadden (SHRINE) Ken Mandl (SMART) Josh Mandel (SMART)
Medical Imaging (mi2b2) Christopher Herrick David Wang Bill Wang
i2b2 Driving Biology Projects
Vivian Gainer Victor Castro Raul Guzman Robert Plenge Scott Weiss Stan Shaw John Brownstein Qing Zeng Guergana Savova
Query Health
Jeff Klann