Moving from Big Data to Better Models of Disease and Drug Response
CityAge: The Data Effect Vancouver
Joel Dudley, PhD
Director of Biomedical Informatics & Assistant Professor of Genetics and Genomic Sciences, Mount Sinai School of Medicine
Icahn School of Medicine at Mount Sinai @IcahnInstitute
Mount Sinai Health System Facts
>6,000 Physicians
7 Member hospital campuses
>3,500 Hospital beds
>3,100,000 Patient visits
Mount Sinai is attracting key talent to thrive in a Big Data world
Demeter
There are rarely smoking guns in human disease biology
HEART
VASCULATURE
KIDNEY
IMMUNE SYSTEM
transcriptional network
protein network
metabolite network
Non-coding RNA network
GI TRACT
BRAIN
ENVIRONMENT
That promise to enable the construction of molecular networks that define the biological processes that comprise living systems
We must embrace complexity to fully understand human physiology and disease
“A complex adaptive system has three characteristics. The first is that the system consists of a number of heterogeneous agents, and each of those agents makes decisions about how to behave. The most important dimension here is that those decisions will evolve over time. The second characteristic is that the agents interact with one another. That interaction leads to the third—something that scientists call emergence: In a very real way, the whole becomes greater than the sum of the parts. The key issue is that you can’t really understand the whole system by simply looking at its individual parts”.
- Michael J. Mauboussin (investment banker)
Although our ability to embrace complexity will bump up against our desire to tell stories
Zeus, the sky god; when he is angry he throws lightning bolts out of the sky
Ptolemaic astronomy: the earth is the center of the universe
The earth is flat
Biological processes are driven by simple linearly ordered pathways (e.g. TGF-beta signaling)
Integrating and modeling the digital universe of information
We need to be able to leverage the digital universe of information to solve complex problems
1.8 ZETTABYTES
Last year we cracked 1 zettabyte
1.8 zettabytes (1.8 trillion gigabytes) of information will be created and replicated in 2011,
and growing fast (it has grown by a factor of 9 in just five years)
2011 IDC Digital Universe Study sponsored by EMC
Being masters of really big data is now critical for biomedical research (TB→PB→EB→ZB)
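The unit ladder and growth figures above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal (SI) units and, purely for illustration, a constant compound growth rate over the five years; the function names are hypothetical and not from the IDC study.

```python
# Decimal (SI) storage units expressed in bytes.
UNITS = {"GB": 10**9, "TB": 10**12, "PB": 10**15, "EB": 10**18, "ZB": 10**21}


def convert(value, src, dst):
    """Convert a quantity from unit `src` to unit `dst`."""
    return value * UNITS[src] / UNITS[dst]


def annual_growth_factor(total_factor, years):
    """Yearly multiplier that compounds to `total_factor` over `years`.

    Assumes the same growth rate every year, which is a simplification.
    """
    return total_factor ** (1.0 / years)


# 1.8 ZB expressed in gigabytes: about 1.8e12, i.e. 1.8 trillion GB.
gigabytes_2011 = convert(1.8, "ZB", "GB")

# A 9-fold increase over five years is roughly a 1.55x increase per year.
yearly = annual_growth_factor(9, 5)
```

Under that constant-rate assumption, the digital universe would be growing by a bit more than half again every year.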
Organisms Tissues Single cells
Single cell, real-time, continuous?
Inter Pulse Distance (IPD)
Real time observation systems add complex but powerful new dimensions to NGS
We measure more than we know
Exploring the transcriptional landscape of human disease
~300 Diseases and Conditions
20k+ Genes
Blue: gene goes down in disease Yellow: gene goes up in disease
Suthram S, Dudley J et al. Network-based elucidation of human disease similarities reveals common functional modules enriched for pluripotent drug targets. PLoS Computational Biology (2010)
Figure 2. Significant disease-disease similarities. (A) Hierarchical clustering of the disease correlations. The distance between two diseases was defined to be (1-correlation coefficient) of the two diseases. The tree was constructed using the average method of hierarchical clustering. The red line corresponds to a p-value of 0.01 and FDR of 10.37% and, disease correlations below this line are considered significant. The different colors represent the various categories of significant disease correlations. (B) The network of all the 138 significant disease correlations. The colors correspond to significant disease correlation categories in (A). The nodes colored in grey are not marked in (A). doi:10.1371/journal.pcbi.1000662.g002
Network-Based Elucidation of Disease Relationships
PLoS Computational Biology | www.ploscompbiol.org 4 February 2010 | Volume 6 | Issue 2 | e1000662
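The clustering described in the Figure 2 caption (distance = 1 - correlation coefficient, average linkage) can be sketched in plain Python. This is an illustrative toy, not the authors' pipeline: the disease names and expression values are made up, Pearson correlation is an assumption (the caption only says "correlation coefficient"), and no p-value or FDR cutoff is computed.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_linkage(profiles):
    """Agglomerative clustering with distance = 1 - correlation.

    profiles: dict mapping a name to a list of expression values
    (same gene order for every entry).
    Returns the merges in order as (cluster_a, cluster_b, distance).
    """
    names = list(profiles)
    dist = {
        frozenset((a, b)): 1.0 - pearson(profiles[a], profiles[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }
    clusters = [(n,) for n in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster pairs.
                d = mean(
                    dist[frozenset((a, b))]
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical differential-expression profiles over four genes.
toy = {
    "disease_A": [2.0, 1.0, -1.0, -2.0],
    "disease_B": [1.8, 1.1, -0.9, -2.1],
    "disease_C": [-2.0, -1.0, 1.0, 2.0],
}
merges = average_linkage(toy)
```

In the toy data, disease_A and disease_B merge first at a distance near 0, while disease_C joins last at a distance near 2 because its profile is inverted.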
Building molecular taxonomies of human disease
Data Driven Approach to Connect Drugs and Disease Using Molecular Profiles
Sirota, M., Dudley, J. T., et al. (2011). Discovery and Preclinical Validation of Drug Indications Using Compendia of Public Gene Expression Data. Science Translational Medicine, 3(96).
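One way to picture connecting drugs and diseases through molecular profiles is signature reversal: a drug whose expression signature is anti-correlated with a disease signature becomes a repositioning candidate. The sketch below uses Spearman rank correlation as a simplified stand-in; it is not the scoring method of the cited papers, and all names and numbers are hypothetical.

```python
def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def rank_candidates(disease_sig, drug_sigs):
    """Sort drugs from most reversing (most negative score) to most mimicking."""
    scores = {drug: spearman(disease_sig, sig) for drug, sig in drug_sigs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1])


# Hypothetical log fold changes over the same five genes.
disease = [2.1, 1.4, -0.3, -1.8, 0.6]
drugs = {
    "drug_x": [-1.9, -1.2, 0.4, 1.5, -0.5],  # reverses the disease pattern
    "drug_y": [1.5, 1.0, 0.1, -1.1, 0.4],    # mimics the disease pattern
    "drug_z": [0.2, -0.7, 1.1, 0.3, -0.4],
}
ranked = rank_candidates(disease, drugs)
```

Here drug_x ranks first because it pushes every gene in the opposite direction from the disease; a prediction like this is only a hypothesis until validated in a model system.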
Topiramate Reduces IBD Severity in a TNBS Rodent Model of IBD
• TNBS chemically induced rat model of IBD
• Animals treated with 80mg/kg topiramate oral after sensitization
• Prednisolone positive control (approved for IBD in humans)
Dudley, J. T., Sirota, M., et al. (2011). Computational Repositioning of the Anticonvulsant Topiramate for Inflammatory Bowel Disease. Science Translational Medicine, 3(96).
Control Imipramine
Approved compound for non-cancer indication prevents formation of SCLC tumors in a genetic model of SCLC
p53/Rb/p130 triple knockout model of SCLC
Mice dosed after tumor formation
Supplementary Fig. 2 | Inhibitory effects of Imipramine, Promethazine, and Bepridil on SCLC allografts and xenografts. a, Strategy used for the treatment of mice growing SCLC tumors under their skin. NSG immunocompromised mice were subcutaneously implanted with 2 different mouse SCLC cell lines (Kp1 and Kp3) (b) and one human SCLC cell line (H187) (c) and tumor volume was measured at the times indicated of daily IP injections with vehicle control (Saline and corn oil; n=10 in (b) and n=4 in (c)), Imipramine (25mg/kg; n=7 in (b) and n=4 in (c)), Promethazine (25mg/kg; n=7 in (b) and n=4 in (c)), and Bepridil (10mg/kg; n=7 in (b) and n=3 in (c)) (3 independent experiments in (b) and 1 experiment in (c)). Values are shown as mean ± s.e.m. The unpaired t-test was used to calculate the p-values of treated versus control tumors at different days of treatment. *P<0.05, **P<0.01, and ***P<0.001. Values that are not significant are not indicated. d, Representative images of SCLC xenografts (H187) collected 14 days after daily treatment with Saline, Imipramine, and Promethazine. e, MTT survival assay of Cisplatin- and saline-treated SCLC cells cultured in 2% serum (n=3 independent experiments) for 48 hours with increasing doses of Imipramine. ns, not significant. f, Representative images of Cisplatin- and saline-treated SCLC allografts collected 17 days after daily treatment with Saline, and Imipramine.
[Figure: fold change of tumor volume versus days of treatment for Saline, Imipramine, Promethazine, and Bepridil (panels b and c). Further panels: % survival (MTT) for control versus imipramine and promethazine doses; RNA expression levels in neuroblastoma, pheochromocytoma, Merkel cell carcinoma, and midgut carcinoid tumors.]
Molecular networks act as sensors and mediators of complex and adaptive cellular physiology
Population
Sample acquisition
Predictive Network Model
What we are about: Integrating big data across many domains to build predictive models that improve how we diagnose and treat disease
Slide courtesy of Eric Schadt
Causal network models generate testable predictions from in silico experiments
Ultimately want to drive decision making in drug discovery
PPM1L
Sh3gl2 Grit
C6 Irx3 Prr7
Insulin
Glucose
Fat Mass
Glra2 Atp1a3
Tcf7l2
Slc38a1
Novel phosphatase under development at Merck for T2D
Lowers glucose
Raises insulin
Increases fat mass
Negatively impacts hypertension genes
GOOD BAD
Predictions derived from the predictive models
Slide courtesy of Eric Schadt
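An in silico experiment on a causal network can be sketched as propagating a perturbation through a signed directed graph. The edges and signs below are hypothetical, chosen only so that the toy reproduces the predictions listed on the slide; the real predictive network is far larger and probabilistic.

```python
# Hypothetical signed edges: (source, target, sign).
# +1 means the target moves with the source, -1 means it moves against it.
EDGES = [
    ("PPM1L", "Glucose", +1),
    ("PPM1L", "Insulin", -1),
    ("PPM1L", "Fat Mass", -1),
]


def predict(edges, node, direction):
    """Predict the direction of change of downstream nodes.

    direction: +1 for activating `node`, -1 for inhibiting it.
    Every simple path from `node` is followed and the edge signs are
    multiplied along it. Returns a dict mapping each reachable node to
    +1, -1, or 0 when different paths disagree.
    """
    children = {}
    for src, dst, sign in edges:
        children.setdefault(src, []).append((dst, sign))

    signs = {}  # node -> set of signs reached over all simple paths

    def walk(current, sign, seen):
        signs.setdefault(current, set()).add(sign)
        for dst, edge_sign in children.get(current, []):
            if dst not in seen:  # skip nodes already on this path
                walk(dst, sign * edge_sign, seen | {dst})

    walk(node, direction, {node})
    return {n: (next(iter(s)) if len(s) == 1 else 0) for n, s in signs.items()}


# Simulate inhibiting PPM1L.
knockdown = predict(EDGES, "PPM1L", -1)
```

With these assumed edges, inhibiting PPM1L lowers glucose, raises insulin, and increases fat mass, matching the slide. A node reached by paths with opposite signs is reported as 0, meaning the sketch cannot call the direction.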
Predictions are great, but only meaningful if they are validated
PPM1L
Sh3gl2 Grit
C6 Irx3 Prr7
Insulin
Glucose
Fat Mass
Glra2 Atp1a3
Tcf7l2
Slc38a1
GLUCOSE LOWERED: GOOD
FAT MASS INCREASED: BAD
BLOOD PRESSURE INCREASED: BAD
Slide courtesy of Eric Schadt
Validation of network model prediction in a patient population
But wait, the network also shows PPM1L and PPARG (target of Avandia) in a causal relationship
PPARG
PPM1L
Network Predicts:
- Avandia will lower glucose
- Avandia will make you fat
- Avandia will increase cardiovascular risk
Validation 2 years later:
Leveraging NGS and Predictive Network Models to Drive Personalized Cancer Therapy
Clinical
Tumor RNA
Tumor DNA
Germline DNA
Somatic variation
Network integration
CD8 epitope prediction
Chemo-genomic
Public data integration
Cancer Patient Profiling
Patient-Specific Analyses
Interpretation & Screens Informed by Patient-Specific Tumor Network
Personalized Report & Treatment Options Delivered to Clinician
Human cell system screening
Patient-specific xenograft models
Patient-specific mutant fly models
Patient-specific subnetwork
Predictive network model of cancer
Genomics Core Facility (Illumina, PacBio, Ion)
RNA + DNA
Tumor biopsy + normal
= key driver
Personalized multiscale tumor networks to diagnose and treat cancers
Key driver targeted therapy
Patient network targeted therapy
Th17, Th1
0:00 min, 0:05 min, 0:10 min
DNA, Cell-specific RNA, Cytokines, Clinical labs, Physiometrics
Personalized multiscale networks to model dynamics of complex disease
How to capture all of the clinical data exhaust?
CPOE
EMR
Billing Telemetry
Data driven translational medicine pipeline at Mount Sinai
EMR (EPIC)
Clinical Labs
Sequencing Facility
Data Warehouse
BioBank
Patient Traffic
Clinical Data
Primary Data
High-Performance Computing
Research and Clinical Queries; Experiment Creation; etc.
Disease Model Construction and Prediction Generation
Actionable Feedback
Multiscale analysis of patient networks enables precision medicine
Genomic, Environment, Clinical
Multiscale measures of patients becoming available through the Mount Sinai Biobank
Drugs
Diagnoses
DNA
RNA
Labs
Procedures
Microbiome
Immune
Image credit: Li Li (ISMMS)
DMSEA
DMSHA
DMSAA
Topological network generated using SNP data separates race
Low enrich. diabetes
High enrich. diabetes
DMSHA, diabetes enriched
Many possible topological analyses can be driven using Mt. Sinai genotype/phenotype data
The personal biosensor wave is forming
Printable tattoo biosensor
Key challenge: incorporate data-driven models into clinical decision support at the point-of-care
PRACTICE
useful genomic information, regardless of how it is generated. As new pharmacogenomic practice guidelines become available for “actionable” gene–drug pairs, stored genotype data will be released for use in CLIPMERGE PGx pending regulatory approval.
PROVIDER RECRUITMENT, EDUCATION, AND FEEDBACK
Following consultation with institutional leadership, it was decided that the program would initially be limited to a group of consented providers, by practice, in order to minimize the potential disruption to the institution at large while the infrastructure is established and its impact evaluated. The eventual aim as the program develops is to include all Mount Sinai providers, which will allow for greater generalizability of the outcome data generated. Before participating in CLIPMERGE PGx, providers are required to attend a 1-h recruitment session. Sessions are held regularly and on an ongoing basis to ensure a high level of provider enrollment. In addition to stand-alone sessions, scheduled teaching slots for trainees and divisional meetings have been used for recruitment. All sessions are advertised and communicated to relevant providers through existing channels. During the sessions, providers first complete a pretraining questionnaire about their current knowledge of genomics, personalized medicine, and CDS (Supplementary Data online). After completing the questionnaire, they watch a 30-min presentation outlining the scientific justification and content of the CDS. They then complete a posttraining questionnaire about their background and attitudes toward prescribing decision aids and personal genome testing. Those who complete the sessions are invited to consent, and their credentials are added to a list of participating users. In addition, each time providers encounter CLIPMERGE CDS in the course of patient care, they will receive a survey via e-mail to gauge their opinions and the appropriateness of the CDS they encountered.
DEVELOPMENT AND EVALUATION OF CDS CONTENT
The Clinical Pharmacogenetics Implementation Consortium of the Pharmacogenomic Research Network develops practice guidelines for implementing pharmacogenomics.5 These guidelines include recommendations regarding medication selection or dosing based on combinations of genotype and phenotype, which are an ideal resource for CLIPMERGE PGx. For example, initial CDS content was developed for clopidogrel (CYP2C19), warfarin (CYP2C9 and VKORC1), simvastatin (SLCO1B1), tricyclic antidepressants (CYP2D6 and CYP2C19), and selective serotonin reuptake inhibitors (CYP2D6) because these have Clinical Pharmacogenetics Implementation Consortium guidelines published or in development and/or modifications to the FDA label, with clinically approved genotyping assays available for implicated alleles. The creation of CDS content was undertaken by a multidisciplinary working group of geneticists, pharmacologists, physicians, and informaticists that formed CDS content by consensus. This group will continue to review existing CDS and to extend new CDS in response to developments in the field, FDA label revisions, and publication of new guidelines. CDS content was also evaluated as part of user acceptance testing by a group of CLIPMERGE-enrolled providers.
OUTCOME AND PROCESS MEASURES
CLIPMERGE PGx is concerned predominantly with the process of genomic medicine implementation. This program will contribute to the emerging body of pilot data needed for forthcoming larger studies that will assess the utility of genomic information in optimizing medication efficacy and safety. In addition to quantitative transactional data (e.g., genotype results, CDS type, and frequency) and questionnaire data (e.g., appropriateness of CDS deployment), qualitative data are being collected to provide a deeper understanding of the barriers and facilitators to genome-informed CDS adoption. The program is
Figure 1 A platform for the implementation of genome-informed clinical decision support (CDS). Saliva samples from BioMe patients sent to the Mount Sinai Genetic Testing Laboratory are subjected to clinical pharmacogenomic testing. Valid genotypes are released to the CLIPMERGE database, which also contains longitudinal clinical data extracted from the electronic health record (EHR). These data are assessed by the clinical risk assessment engine (CRAE), which contains prespecified rules relating actionable genotype–drug pairs to genome-informed advice messages. If a rule is fulfilled, decision support is delivered in real time via the EHR. A mockup of CDS for a clopidogrel (Plavix) poor metabolizer is shown, consisting of a text segment, a reference link, and an order set with suggested alternative medications.
CRAE: Rules for actionable gene/drug pairs
CLIPMERGE database
Mount Sinai Genetic Testing Laboratory
CLIPMERGE PGx saliva sample from consented BioMe participant
Clinical genotype data
Longitudinal clinical data
Reference material
Drug information
Genome-informed CDS: This patient has been prescribed clopidogrel (Plavix®) and is a CYP2C19-poor metabolizer (*2/*2) according to genomic testing. Poor metabolizer status is associated with significantly diminished antiplatelet response to clopidogrel and increased risk for adverse cardiovascular events following percutaneous coronary intervention (PCI). If no contraindication, consider alternative medication from order set below. Click here to learn more.
If no contraindication, consider prescribing an alternative medication. Click the medication name for further information including indications, dosage and contraindications.
OK
PRASUGREL (Effient®)
TICAGRELOR (Brilinta®)
Electronic health record
CLIPMERGE platform
CLINICAL PHARMACOLOGY & THERAPEUTICS 3
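The rule check performed by the clinical risk assessment engine (CRAE) in Figure 1 can be pictured as a lookup of prespecified genotype-drug rules against a new medication order. The sketch below is an assumption-laden toy, not the CLIPMERGE implementation: the `Rule` class and `advise` function are hypothetical, and only the clopidogrel example from the figure is encoded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One actionable gene-drug pair and the advice it triggers."""
    drug: str
    gene: str
    phenotype: str
    message: str
    alternatives: tuple


# Content taken from the clopidogrel mockup in Figure 1.
RULES = [
    Rule(
        drug="clopidogrel",
        gene="CYP2C19",
        phenotype="poor metabolizer",
        message=(
            "CYP2C19 poor metabolizer: diminished antiplatelet response "
            "to clopidogrel. If no contraindication, consider an "
            "alternative medication."
        ),
        alternatives=("prasugrel", "ticagrelor"),
    ),
]


def advise(order_drug, genotypes, rules=RULES):
    """Return the rules fulfilled by a new order.

    order_drug: name of the drug being ordered.
    genotypes: dict mapping gene name to the patient's phenotype call.
    """
    drug = order_drug.lower()
    return [
        rule
        for rule in rules
        if rule.drug == drug and genotypes.get(rule.gene) == rule.phenotype
    ]


# A hypothetical patient record and order.
patient = {"CYP2C19": "poor metabolizer"}
hits = advise("Clopidogrel", patient)
```

When a rule fires, the message and suggested alternatives would be surfaced to the prescriber in the EHR; an order with no matching rule produces no alert.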
Erwin Bottinger Omri Gottesman
New from Oxford University Press
EXPLORING PERSONAL GENOMICS
JOEL T. DUDLEY & KONRAD J. KARCZEWSKI
Foreword by George M. Church
http://exploringpersonalgenomics.org
• Visualization
• Disease risk modeling
• Pharmacogenomics
• DNA-to-physiology
• Gene-by-environment
• More!
Thank you for your attention
Email: [email protected] Twitter: @jdudley Web: research.mssm.edu/dudley/
Icahn School of Medicine at Mount Sinai