moving from big data to better models of disease and drug response - joel dudley

Moving from Big Data to Better Models of Disease and Drug Response

Joel Dudley, PhD

Director of Biomedical Informatics & Assistant Professor of Genetics and Genomic Sciences, Mount Sinai School of Medicine

Icahn School of Medicine at Mount Sinai @IcahnInstitute

Mount Sinai Health System Facts

• >6,000 physicians
• 7 member hospital campuses
• >3,500 hospital beds
• >3,100,000 patient visits

Mount Sinai is attracting key talent to thrive in a Big Data world

Demeter

There are rarely smoking guns in human disease biology

[Figure: transcriptional, protein, metabolite, and non-coding RNA networks in the heart, vasculature, kidney, immune system, GI tract, and brain, all embedded in the environment]

That promise to enable the construction of molecular networks that define the biological processes that comprise living systems

We must embrace complexity to fully understand human physiology and disease


“A complex adaptive system has three characteristics. The first is that the system consists of a number of heterogeneous agents, and each of those agents makes decisions about how to behave. The most important dimension here is that those decisions will evolve over time. The second characteristic is that the agents interact with one another. That interaction leads to the third—something that scientists call emergence: In a very real way, the whole becomes greater than the sum of the parts. The key issue is that you can’t really understand the whole system by simply looking at its individual parts”.

- Michael J. Mauboussin (investment banker)

However, our ability to embrace complexity will bump up against our desire to tell stories

Zeus, the sky god: when he is angry, he throws lightning bolts out of the sky

Ptolemaic astronomy: the earth is the center of the universe

The earth is flat

Biological processes are driven by simple linearly ordered pathways (e.g. TGF-beta signaling)

Integrating and modeling the digital universe of information

We need to be able to leverage the digital universe of information to solve complex problems

1.8 ZETTABYTES

Last year we crossed the 1-zettabyte mark: 1.8 zettabytes (1.8 trillion gigabytes) of information will be created and replicated in 2011, and the total is growing fast (it has grown by a factor of 9 in just five years)

2011 IDC Digital Universe Study sponsored by EMC
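A quick check of the numbers on this slide, assuming decimal SI units (1 ZB = 10^21 bytes, 1 GB = 10^9 bytes) and assuming the ninefold growth compounded evenly over the five years, with r the implied annual growth rate:

```latex
\begin{aligned}
1.8\ \text{ZB} &= 1.8 \times 10^{21}\ \text{B}
  = \frac{1.8 \times 10^{21}}{10^{9}}\ \text{GB}
  = 1.8 \times 10^{12}\ \text{GB} \\
r &= 9^{1/5} - 1 \approx 1.552 - 1 \approx 0.55
\end{aligned}
```

So 1.8 ZB is indeed 1.8 trillion GB, and a ninefold increase over five years corresponds to roughly 55% growth per year.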

Being masters of really big data is now critical for biomedical research (TB→PB→EB→ZB)

Organisms Tissues Single cells

Single cell, real-time, continuous?

Inter Pulse Distance (IPD)

Real-time observation systems add complex but powerful new dimensions to NGS

We measure more than we know

Exploring the transcriptional landscape of human disease

~300 Diseases and Conditions

20k+ Genes

Blue: gene goes down in disease Yellow: gene goes up in disease

Suthram S, Dudley J et al. Network-based elucidation of human disease similarities reveals common functional modules enriched for pluripotent drug targets. PLoS Computational Biology (2010)

Figure 2. Significant disease-disease similarities. (A) Hierarchical clustering of the disease correlations. The distance between two diseases was defined to be (1 - correlation coefficient) of the two diseases. The tree was constructed using the average method of hierarchical clustering. The red line corresponds to a p-value of 0.01 and FDR of 10.37%, and disease correlations below this line are considered significant. The different colors represent the various categories of significant disease correlations. (B) The network of all the 138 significant disease correlations. The colors correspond to significant disease correlation categories in (A). The nodes colored in grey are not marked in (A). doi:10.1371/journal.pcbi.1000662.g002

Network-Based Elucidation of Disease Relationships

PLoS Computational Biology | www.ploscompbiol.org | February 2010 | Volume 6 | Issue 2 | e1000662
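The clustering described in the Figure 2 caption (distance = 1 - correlation, average-linkage hierarchical clustering) can be sketched in standard-library Python. This is a minimal illustration, not the authors' code: the function names and toy disease profiles are hypothetical, and the significance cutoff (the red line at p = 0.01) is not modeled. With SciPy available, `scipy.cluster.hierarchy.linkage(X, method="average", metric="correlation")` builds the same kind of tree.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def _average_distance(dist, a, b):
    """Average linkage: mean pairwise leaf distance between clusters a and b."""
    return sum(dist[frozenset((i, j))] for i in a for j in b) / (len(a) * len(b))


def average_linkage(profiles):
    """Agglomerative clustering with distance = 1 - correlation (hypothetical helper).

    profiles maps a disease name to its vector of gene-expression changes.
    Returns the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    names = list(profiles)
    dist = {frozenset((i, j)): 1 - pearson(profiles[i], profiles[j])
            for i, j in combinations(names, 2)}
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        # Merge the closest pair of clusters under average linkage.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: _average_distance(dist, *pair))
        merges.append((a, b, _average_distance(dist, a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Toy profiles (invented): A and B change in the same direction, C in the opposite one.
profiles = {
    "disease_A": [2.0, 1.5, -1.0, -2.0],
    "disease_B": [1.8, 1.2, -0.8, -2.2],
    "disease_C": [-1.5, -1.0, 1.2, 2.0],
}
for a, b, d in average_linkage(profiles):
    print(a, "+", b, f"at distance {d:.3f}")
```

Here disease_A and disease_B merge first at a distance near 0, and disease_C joins last at a distance near 2, the maximum possible for anti-correlated profiles.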

Building molecular taxonomies of human disease

Data Driven Approach to Connect Drugs and Disease Using Molecular Profiles

Sirota, M., Dudley, J. T., et al. (2011). Discovery and Preclinical Validation of Drug Indications Using Compendia of Public Gene Expression Data. Science Translational Medicine, 3(96).
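The idea behind this approach is that a drug whose expression profile is the reverse of a disease's profile is a candidate therapy for that disease. A minimal sketch of one way to score that reversal, assuming a Connectivity Map style Kolmogorov-Smirnov statistic; the function names and toy genes are hypothetical, and the published pipeline's exact scoring, normalization, and significance testing are not reproduced here:

```python
def ks_score(positions, n):
    """Connectivity Map style Kolmogorov-Smirnov statistic for one gene set.

    positions: 1-based ranks of the set's genes in a drug's ranked gene list
    (rank 1 = most up-regulated by the drug); n: length of that list.
    """
    v = sorted(positions)
    t = len(v)
    if t == 0:
        return 0.0
    a = max(j / t - v[j - 1] / n for j in range(1, t + 1))
    b = max(v[j - 1] / n - (j - 1) / t for j in range(1, t + 1))
    return a if a > b else -b


def reversal_score(disease_up, disease_down, drug_ranking):
    """Hypothetical helper: negative values suggest the drug reverses the disease signature."""
    rank = {gene: i + 1 for i, gene in enumerate(drug_ranking)}
    n = len(drug_ranking)
    ks_up = ks_score([rank[g] for g in disease_up if g in rank], n)
    ks_down = ks_score([rank[g] for g in disease_down if g in rank], n)
    # As in the Connectivity Map, same-sign enrichment of both sets is ambiguous and scores 0.
    if ks_up * ks_down > 0:
        return 0.0
    return ks_up - ks_down


# Toy data (invented): the drug ranks G1 most up-regulated and G10 most down-regulated.
drug = [f"G{i}" for i in range(1, 11)]
print(reversal_score({"G9", "G10"}, {"G1", "G2"}, drug))  # drug reverses the disease
print(reversal_score({"G1", "G2"}, {"G9", "G10"}, drug))  # drug mimics the disease
```

A negative score (disease-up genes pushed to the bottom of the drug's ranking, disease-down genes pulled to the top) flags the drug as a repositioning candidate; a positive score means the drug mimics the disease.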

Topiramate Reduces IBD Severity in a TNBS Rodent Model of IBD

• TNBS chemically induced rat model of IBD

• Animals treated with 80 mg/kg oral topiramate after sensitization

• Prednisolone positive control (approved for IBD in humans)

Dudley, J. T., Sirota, M., et al. (2011). Computational Repositioning of the Anticonvulsant Topiramate for Inflammatory Bowel Disease. Science Translational Medicine, 3(96).

Control Imipramine

Approved compound for non-cancer indication prevents formation of SCLC tumors in a genetic model of SCLC

p53/Rb/p130 triple knockout model of SCLC

Mice dosed after tumor formation

Supplementary Fig. 2 | Inhibitory effects of Imipramine, Promethazine, and Bepridil on SCLC allografts and xenografts. a, Strategy used for the treatment of mice growing SCLC tumors under their skin. NSG immunocompromised mice were subcutaneously implanted with 2 different mouse SCLC cell lines (Kp1 and Kp3) (b) and one human SCLC cell line (H187) (c), and tumor volume was measured at the indicated times during daily IP injections with vehicle control (saline and corn oil; n=10 in (b) and n=4 in (c)), Imipramine (25 mg/kg; n=7 in (b) and n=4 in (c)), Promethazine (25 mg/kg; n=7 in (b) and n=4 in (c)), and Bepridil (10 mg/kg; n=7 in (b) and n=3 in (c)) (3 independent experiments in (b) and 1 experiment in (c)). Values are shown as mean ± s.e.m. The unpaired t-test was used to calculate the p-values of treated versus control tumors at different days of treatment. *P<0.05, **P<0.01, and ***P<0.001. Values that are not significant are not indicated. d, Representative images of SCLC xenografts (H187) collected 14 days after daily treatment with Saline, Imipramine, and Promethazine. e, MTT survival assay of Cisplatin- and saline-treated SCLC cells cultured in 2% serum (n=3 independent experiments) for 48 hours with increasing doses of Imipramine. ns, not significant. f, Representative images of Cisplatin- and saline-treated SCLC allografts collected 17 days after daily treatment with Saline and Imipramine.

[Figure panels b and c: fold change of tumor volume vs. days of treatment (0 to 13) for Saline, Imipramine, Promethazine, and Bepridil; asterisks mark significance]
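The caption states that treated and control tumors were compared with an unpaired t-test at each day of treatment, and the plotted quantity is fold change of tumor volume. A minimal sketch of those two calculations, assuming fold change is taken relative to the first measurement and assuming the equal-variance (Student's) form of the test, since the caption does not say which variant was used; the function names and numbers are invented:

```python
from math import sqrt
from statistics import mean, variance


def fold_change(volumes):
    """Tumor volume at each time point divided by the first (baseline) volume."""
    return [v / volumes[0] for v in volumes]


def unpaired_t(control, treated):
    """Student's two-sample t statistic with pooled variance (equal variances assumed).

    Returns (t, degrees_of_freedom). A positive t means the control mean is larger.
    """
    n1, n2 = len(control), len(treated)
    pooled = ((n1 - 1) * variance(control) + (n2 - 1) * variance(treated)) / (n1 + n2 - 2)
    t = (mean(control) - mean(treated)) / sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


# Invented example: one mouse's volumes over time, then fold changes on a single day.
print(fold_change([100.0, 150.0, 300.0]))
control_fc = [4.0, 5.0, 6.0]
treated_fc = [1.0, 2.0, 3.0]
print(unpaired_t(control_fc, treated_fc))
```

The sketch stops at the t statistic; turning it into a p-value needs the t distribution, which `scipy.stats.ttest_ind(control, treated)` handles directly.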

Molecular networks act as sensors and mediators of complex and adaptive cellular physiology

Population

Sample acquisition

Predictive Network Model

What we are about: Integrating big data across many domains to build predictive models that improve how we diagnose and treat disease

Slide courtesy of Eric Schadt

Causal network models generate testable predictions from in silico experiments; ultimately, we want them to drive decision making in drug discovery

[Figure: causal network linking PPM1L, Sh3gl2, Grit, C6, Irx3, Prr7, Glra2, Atp1a3, Tcf7l2, and Slc38a1 to insulin, glucose, and fat mass]

Novel phosphatase under development at Merck for T2D

GOOD: lowers glucose, raises insulin

BAD: increases fat mass, negatively impacts hypertension genes

Predictions derived from the predictive models
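In their simplest form, in silico experiments on a causal network amount to perturbing one node and asking which way each downstream trait should move. The sketch below reduces this to sign propagation on a small directed acyclic graph. It is a deliberate simplification: the predictive network models on these slides are probabilistic, and every node name, edge, and sign here is hypothetical.

```python
def propagate(edges, source, effect):
    """Predict the direction of change of every node downstream of `source`.

    edges: dict mapping a node to a list of (child, sign) pairs with sign +1 or -1;
    the graph is assumed to be acyclic.
    effect: +1 to activate the source, -1 to inhibit it.
    Returns node -> +1 (up), -1 (down), or 0 (paths of opposite sign disagree).
    """
    reached = {}  # node -> set of signs arriving along different paths

    def visit(node, sign):
        for child, edge_sign in edges.get(node, []):
            child_sign = sign * edge_sign
            reached.setdefault(child, set()).add(child_sign)
            visit(child, child_sign)

    visit(source, effect)
    return {node: next(iter(s)) if len(s) == 1 else 0 for node, s in reached.items()}


# Hypothetical toy network: all nodes, edges, and signs are invented for illustration.
edges = {
    "target": [("gene_a", -1), ("gene_b", +1)],
    "gene_a": [("glucose", +1)],
    "gene_b": [("insulin", +1), ("fat_mass", +1)],
}
print(propagate(edges, "target", +1))
```

Nodes reached by paths of opposite sign come back as 0; resolving those needs a quantitative model (edge weights or probabilities) rather than signs alone.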


Predictions are great, but only meaningful if they are validated

[Figure: the same causal network, annotated with validation results]

GLUCOSE LOWERED: GOOD

FAT MASS INCREASED: BAD

BLOOD PRESSURE INCREASED: BAD


Validation of network model prediction in a patient population

But wait, the network also shows PPM1L and PPARG (target of Avandia) in a causal relationship

PPARG

PPM1L

The network predicts that Avandia will:
• lower glucose
• make you fat
• increase cardiovascular risk

Validation 2 years later:

Leveraging NGS and Predictive Network Models to Drive Personalized Cancer Therapy

Clinical

Tumor RNA

Tumor DNA

Germline DNA

Somatic variation

Network integration

CD8 epitope prediction

Chemo-genomic

Public data integration

Cancer Patient Profiling

Patient-Specific Analyses

Interpretation & Screens Informed by Patient-Specific Tumor Network

Personalized Report & Treatment Options Delivered to Clinician

Human cell system screening

Patient-specific xenograft models

Patient-specific mutant fly models

Patient-specific subnetwork

Predictive network model of cancer

Genomics Core Facility (Illumina, PacBio, Ion)

RNA + DNA

Tumor biopsy + normal

= key driver

Personalized multiscale tumor networks to diagnose and treat cancers

Key driver targeted therapy


Patient network targeted therapy
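The slides mark key drivers in each patient-specific subnetwork but do not say how they are chosen. One common recipe, assumed here, calls a node a key driver when its downstream neighborhood (within a few hops) is significantly enriched for disease signature genes under a hypergeometric test. A minimal sketch with hypothetical names and a toy network; it omits multiple-testing correction:

```python
from collections import deque
from math import comb


def downstream(edges, node, max_hops):
    """Nodes reachable from `node` within `max_hops` directed steps (excluding itself)."""
    seen, frontier = {node}, deque([(node, 0)])
    while frontier:
        current, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for child in edges.get(current, ()):
            if child not in seen:
                seen.add(child)
                frontier.append((child, depth + 1))
    seen.discard(node)
    return seen


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    total = comb(population, draws)
    return sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, min(successes, draws) + 1)) / total


def key_drivers(edges, nodes, signature, max_hops=2, alpha=0.05):
    """Hypothetical helper: nodes whose downstream neighborhood is enriched for the signature.

    Returns (node, signature_hits, neighborhood_size, p_value), sorted by p-value.
    """
    hits = []
    for node in nodes:
        neighborhood = downstream(edges, node, max_hops)
        if not neighborhood:
            continue
        k = len(neighborhood & signature)
        p = hypergeom_sf(k, len(nodes), len(signature), len(neighborhood))
        if p < alpha:
            hits.append((node, k, len(neighborhood), p))
    return sorted(hits, key=lambda h: h[3])


# Toy network (invented): "hub" regulates every signature gene, "other" regulates none.
nodes = {"hub", "other"} | {f"s{i}" for i in range(4)} | {f"n{i}" for i in range(14)}
edges = {"hub": ["s0", "s1", "s2", "s3"], "other": ["n0", "n1"]}
signature = {"s0", "s1", "s2", "s3"}
print(key_drivers(edges, nodes, signature))
```

Only "hub" is reported; "other" reaches no signature genes, so its enrichment p-value is 1.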

[Figure: Th17 and Th1 network states at 0:00, 0:05, and 0:10 min, built from DNA, cell-specific RNA, cytokines, clinical labs, and physiometrics]

Personalized multiscale networks to model dynamics of complex disease

How to capture all of the clinical data exhaust?

CPOE

EMR

Billing

Telemetry

Data driven translational medicine pipeline at Mount Sinai

EMR (EPIC)

Clinical Labs

Sequencing Facility

Data Warehouse

BioBank

Patient Traffic

Clinical Data

Primary Data

High-Performance Computing

Research and Clinical Queries; Experiment Creation; etc.

Disease Model Construction and Prediction Generation

Actionable Feedback

Multiscale analysis of patient networks enables precision medicine

Genomic, Environment, Clinical

Multiscale measures of patients becoming available through the Mount Sinai Biobank

Drugs

Diagnoses

DNA

RNA

Labs

Procedures

Microbiome

Immune

Image credit: Li Li (ISMMS)

DMSEA

DMSHA

DMSAA

Topological network generated using SNP data separates race

Low enrichment for diabetes

High enrichment for diabetes

DMSHA, diabetes enriched

Many possible topological analyses can be driven using Mt. Sinai genotype/phenotype data

The personal biosensor wave is forming

Printable tattoo biosensor

Key challenge: incorporate data-driven models into clinical decision support at the point-of-care

PRACTICE

useful genomic information, regardless of how it is generated. As new pharmacogenomic practice guidelines become available for “actionable” gene–drug pairs, stored genotype data will be released for use in CLIPMERGE PGx pending regulatory approval.

PROVIDER RECRUITMENT, EDUCATION, AND FEEDBACK

Following consultation with institutional leadership, it was decided that the program would initially be limited to a group of consented providers, by practice, in order to minimize the potential disruption to the institution at large while the infrastructure is established and its impact evaluated. The eventual aim as the program develops is to include all Mount Sinai providers, which will allow for greater generalizability of the outcome data generated. Before participating in CLIPMERGE PGx, providers are required to attend a 1-h recruitment session. Sessions are held regularly and on an ongoing basis to ensure a high level of provider enrollment. In addition to stand-alone sessions, scheduled teaching slots for trainees and divisional meetings have been used for recruitment. All sessions are advertised and communicated to relevant providers through existing channels. During the sessions, providers first complete a pretraining questionnaire about their current knowledge of genomics, personalized medicine, and CDS (Supplementary Data online). After completing the questionnaire, they watch a 30-min presentation outlining the scientific justification and content of the CDS. They then complete a posttraining questionnaire about their background and attitudes toward prescribing decision aids and personal genome testing. Those who complete the sessions are invited to consent, and their credentials are added to a list of participating users. In addition, each time providers encounter CLIPMERGE CDS in the course of patient care, they will receive a survey via e-mail to gauge their opinions and the appropriateness of the CDS they encountered.

DEVELOPMENT AND EVALUATION OF CDS CONTENT

The Clinical Pharmacogenetics Implementation Consortium of the Pharmacogenomic Research Network develops practice guidelines for implementing pharmacogenomics.⁵ These guidelines include recommendations regarding medication selection or dosing based on combinations of genotype and phenotype, which are an ideal resource for CLIPMERGE PGx. For example, initial CDS content was developed for clopidogrel (CYP2C19), warfarin (CYP2C9 and VKORC1), simvastatin (SLCO1B1), tricyclic antidepressants (CYP2D6 and CYP2C19), and selective serotonin reuptake inhibitors (CYP2D6) because these have Clinical Pharmacogenetics Implementation Consortium guidelines published or in development and/or modifications to the FDA label, with clinically approved genotyping assays available for implicated alleles. The creation of CDS content was undertaken by a multidisciplinary working group of geneticists, pharmacologists, physicians, and informaticists that developed CDS content by consensus. This group will continue to review existing CDS and to create new CDS in response to developments in the field, FDA label revisions, and publication of new guidelines. CDS content was also evaluated as part of user acceptance testing by a group of CLIPMERGE-enrolled providers.

OUTCOME AND PROCESS MEASURES

CLIPMERGE PGx is concerned predominantly with the process of genomic medicine implementation. This program will contribute to the emerging body of pilot data needed for forthcoming larger studies that will assess the utility of genomic information in optimizing medication efficacy and safety. In addition to quantitative transactional data (e.g., genotype results, CDS type, and frequency) and questionnaire data (e.g., appropriateness of CDS deployment), qualitative data are being collected to provide a deeper understanding of the barriers and facilitators to genome-informed CDS adoption. The program is

Figure 1 A platform for the implementation of genome-informed clinical decision support (CDS). Saliva samples from BioMe patients sent to the Mount Sinai Genetic Testing Laboratory are subjected to clinical pharmacogenomic testing. Valid genotypes are released to the CLIPMERGE database, which also contains longitudinal clinical data extracted from the electronic health record (EHR). These data are assessed by the clinical risk assessment engine (CRAE), which contains prespecified rules relating actionable genotype–drug pairs to genome-informed advice messages. If a rule is fulfilled, decision support is delivered in real time via the EHR. A mockup of CDS for a clopidogrel (Plavix) poor metabolizer is shown, consisting of a text segment, a reference link, and an order set with suggested alternative medications.

[Figure 1 components: Mount Sinai Genetic Testing Laboratory; CLIPMERGE PGx saliva sample from consented BioMe participant; clinical genotype data; CLIPMERGE database; longitudinal clinical data; CRAE rules for actionable gene/drug pairs; reference material; drug information; electronic health record; CLIPMERGE platform]

Genome-informed CDS: This patient has been prescribed clopidogrel (Plavix®) and is a CYP2C19 poor metabolizer (*2/*2) according to genomic testing. Poor metabolizer status is associated with significantly diminished antiplatelet response to clopidogrel and increased risk for adverse cardiovascular events following percutaneous coronary intervention (PCI). If no contraindication, consider alternative medication from order set below. Click here to learn more.

If no contraindication, consider prescribing an alternative medication. Click the medication name for further information including indications, dosage and contraindications.

PRASUGREL (Effient®)

TICAGRELOR (Brilinta®)
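The CRAE described in Figure 1 holds prespecified rules that relate actionable gene-drug pairs to advice messages, and it fires when a rule is fulfilled. A minimal sketch of that rule lookup, using the clopidogrel/CYP2C19 *2/*2 example from the mockup; the `Rule` structure and `check_order` function are hypothetical and are not CLIPMERGE's actual schema or code. A production rule would list every diplotype that maps to the poor-metabolizer phenotype under the relevant CPIC guideline, not just *2/*2.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    """One actionable gene-drug pair (hypothetical structure, not CLIPMERGE's schema)."""
    drug: str
    gene: str
    diplotypes: frozenset  # genotypes that trigger the advice
    advice: str
    alternatives: tuple = ()


# Hypothetical rule table with a single entry taken from the Figure 1 mockup.
RULES = [
    Rule(
        drug="clopidogrel",
        gene="CYP2C19",
        diplotypes=frozenset({"*2/*2"}),
        advice=("CYP2C19 poor metabolizer: diminished antiplatelet response to "
                "clopidogrel. If no contraindication, consider an alternative."),
        alternatives=("prasugrel", "ticagrelor"),
    ),
]


def check_order(drug, genotypes, rules=RULES) -> Optional[Rule]:
    """Return the first rule fired by a new drug order, or None if no rule applies.

    genotypes maps gene symbol -> released clinical genotype for this patient.
    """
    for rule in rules:
        if rule.drug == drug.lower() and genotypes.get(rule.gene) in rule.diplotypes:
            return rule
    return None


patient_genotypes = {"CYP2C19": "*2/*2"}  # example genotype from the mockup
fired = check_order("Clopidogrel", patient_genotypes)
if fired is not None:
    print(fired.advice)
    print("Suggested alternatives:", ", ".join(fired.alternatives))
```

Keeping rules as data rather than code mirrors the workflow described in the excerpt, where a working group reviews and extends CDS content as guidelines and FDA labels change.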

CLINICAL PHARMACOLOGY & THERAPEUTICS

Erwin Bottinger, Omri Gottesman

New from Oxford University Press

http://exploringpersonalgenomics.org

EXPLORING PERSONAL GENOMICS
JOEL T. DUDLEY & KONRAD J. KARCZEWSKI

Foreword by George M. Church

• Visualization

• Disease risk modeling

• Pharmacogenomics

• DNA-to-physiology

• Gene-by-environment

• More!

Thank you for your attention

Email: [email protected] Twitter: @jdudley Web: research.mssm.edu/dudley/

