transmart community meeting 5-7 nov 13 - session 3: transmart a data warehouse for translational...

tranSMART: a data warehouse for Translational Medicine at Takeda Pharmaceuticals International Co. David Merberg, Bin Li, William Trepicchio. tranSMART Community Workshop, November 2013

Upload: david-peyruc

Post on 19-Jan-2015


DESCRIPTION

tranSMART Community Meeting 5-7 Nov 13 - Session 3: tranSMART, a Data Warehouse for Translational Medicine at Takeda Pharmaceuticals International. Dave Merberg, Takeda.

We have used the tranSMART platform to construct a warehouse containing data from several Takeda clinical trials, proprietary preclinical drug activity studies, 1600 Gene Expression Omnibus studies, and data from TCGA, CCLE, and other sources. All gene expression data has been globally normalized. We extended the tranSMART platform with a set of R function calls to enable cross-study queries and analysis via the rich toolset available in R. The utility of the data warehouse is exemplified by a study in which we built a predictive model for drug sensitivities. The model was trained on gene expression and IC50 data from cell lines and was found to correctly predict drug activity in oncology indications.

TRANSCRIPT

Page 1: tranSMART Community Meeting 5-7 Nov 13 - Session 3: tranSMART a Data Warehouse for Translational Medicine at Takeda Pharmaceuticals International

tranSMART: a data warehouse for Translational Medicine at Takeda Pharmaceuticals International Co.

David Merberg, Bin Li, William Trepicchio

tranSMART Community Workshop, November 2013

Page 2: Outline

• Takeda’s tranSMART instance
– Goal
– Data content
– Enhancements

• Case Studies
– Models for predicting erlotinib and sorafenib efficacy

Page 3: Takeda rationale for implementing tranSMART

• To provide a large, well-organized, and integrated dataset consisting of MPI/Takeda proprietary data, outsourced data, and valuable public data.

• To provide an integrated environment for accessing clinical data and molecular profiling data
– Low-dimensional data – age, sex, weight, previous treatments, survival, etc.
– High-dimensional data – gene expression microarray, SNP, mutation, NGS

• To provide tools that will enable Medical and Discovery scientists to use this data warehouse for biomarker identification, patient stratification, drug targeting, disease prediction, etc.

Page 4: Public data currently in Takeda tranSMART

• Gene Expression Omnibus (GEO)
– Approximately 1600 studies
– Approximately 200 key cancer studies manually curated; another ~150 cancer studies curated via text mining
– Most GEO datasets are cancer studies, but there are also samples from cardiovascular disease, metabolic diseases, hematopoietic diseases, and many others.

• The Cancer Genome Atlas (TCGA)
– Gene expression, SNP, and clinical data from close to 1000 patients (brain, lung, and ovarian cancer)

• Large cell line panels
– The CCLE dataset, ~1000 cell lines, screened for 24 SOC drugs
– The Sanger dataset, ~1000 cell lines, screened on > 100 SOC drugs

Page 5: Proprietary data currently in Takeda tranSMART

• Velcade Trials
– Clinical observations
– Gene expression results
– Mutation data

• Commissioned Studies
– Oncopanel 240 – cell line response to Takeda and SOC compounds
  • Drug response (IC50, EC50, cell cycle blocks, apoptosis induction, etc.)
  • Mutation status
  • Gene expression
– Oncotest – xenograft response to Takeda and SOC compounds
  • Drug response (IC50)
  • Mutation status
  • Gene expression
  • SNP

Page 6: OncoPanel 240 (Ricerca/Eurofins Panlabs)

• 240 well-defined tumor cell lines representing diverse tumor types

• Drug sensitivity screen results (IC50, EC50)
– for 13 Standard of Care anti-tumor compounds
– for 8 Takeda compounds targeting diverse pathways

• Baseline gene expression

• Mutation data

Page 7: Normalization of information in the data warehouse

• Gene expression data
– Globally normalized GEO gene expression data using frozen Robust Multiarray Analysis (fRMA)
  • Quantile-based normalization
  • Currently, only selected Affymetrix platforms are globally normalized
– Enabled grouping gene expression results from different labs and different studies by disease

• Clinical information
– Curate clinical information to create consistent vocabulary
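The quantile-based step can be illustrated with a small base-R sketch. This is plain quantile normalization across samples, not fRMA itself (fRMA normalizes each array against a frozen, precomputed reference); the function name `quantile_normalize` and the toy matrix are assumptions made purely for illustration.

```r
# Minimal quantile normalization in base R (illustrative; not fRMA itself).
# Rows are probes/genes, columns are samples.
quantile_normalize <- function(x) {
  # Rank the values within each sample; ties are broken by order of appearance.
  ranks <- apply(x, 2, rank, ties.method = "first")
  # Reference distribution: mean of the sorted values across samples.
  reference <- rowMeans(apply(x, 2, sort))
  # Replace every value by the reference value at its rank.
  normalized <- apply(ranks, 2, function(r) reference[r])
  dimnames(normalized) <- dimnames(x)
  normalized
}

# Toy log-intensity matrix: 4 probes x 3 samples (made-up numbers).
expr <- matrix(c(5, 2, 3, 4,
                 4, 1, 4, 2,
                 3, 4, 6, 8),
               nrow = 4,
               dimnames = list(paste0("probe", 1:4), paste0("sample", 1:3)))
normalized <- quantile_normalize(expr)
print(normalized)
```

After normalization every sample shares the same set of values, which is what makes intensities comparable across arrays from different labs.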

Page 8: R interface

• Enable direct access to tranSMART database tables
– Eliminates some limitations of the web interface, e.g., the inability to perform multi-study queries and analyses
– Provides a connection to the R environment, including diverse analysis packages

• Sample functions
– getDistinctConcepts – given a keyword/string, returns study codes for matching clinical concepts in the tranSMART database
– getGEXData – given study codes, gets gene expression data from the tranSMART database

> # Find clinical concepts matching 'Breast_Cancer' and collect their study codes
> br_concepts <- transmart.getDistinctConcepts(, 'Breast_Cancer')
> study_list <- unique(br_concepts$STUDYCODE)
> # Retrieve ITGB2 expression for all matching studies
> ITGB2_GEP_BR2 <- transmart.getGEXData(study_list, gene.list='ITGB2', data.pivot=FALSE)
> # Plot the distribution of ITGB2 log intensities
> hist(ITGB2_GEP_BR2$LOG_INTENSITY, breaks=50, xlim=c(5,12), main="All ITGB2 GEP", xlab="GEP")
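Once expression values from several studies sit in one data frame, a per-study summary is a one-liner in R. The sketch below uses a mock data frame in place of a real query result, since the Takeda functions are not publicly available; the `STUDYCODE` and `GENE_SYMBOL` columns in the expression result, the study codes, and the intensities are all assumptions for illustration.

```r
# Mock of a long-format result such as transmart.getGEXData() might return.
# Column names other than LOG_INTENSITY, and all values, are invented.
gex <- data.frame(
  STUDYCODE     = c("GSE_A", "GSE_A", "GSE_A", "GSE_B", "GSE_B", "GSE_C"),
  GENE_SYMBOL   = "ITGB2",
  LOG_INTENSITY = c(7.1, 7.9, 8.3, 9.0, 9.4, 6.5)
)

# Median ITGB2 log intensity per study.
per_study <- aggregate(LOG_INTENSITY ~ STUDYCODE, data = gex, FUN = median)

# Number of samples contributing to each median.
per_study$N <- as.vector(table(gex$STUDYCODE)[per_study$STUDYCODE])
print(per_study)
```

Such cross-study summaries are only meaningful when the intensities have been normalized on a common scale.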

Page 9: Summary

• A data warehouse with a large store of gene expression, SNP, and phenotypic data
– Clinical samples and cell lines
– Data normalized so that comparisons across studies are meaningful
– Vocabulary standardized across studies

• An R interface to facilitate cross-study analysis using a large collection of methods from statistics and machine learning

• A “toolbox” for achieving key Translational Medicine goals
– Bridging the gap between “omic” data generated in preclinical studies and clinical results
– Predicting drug efficacy using clinical and preclinical information collected for different purposes

• Case studies in using this toolbox follow . . .

Page 10: Building and using a model to predict drug sensitivity

Can we identify a relationship between baseline gene expression and drug sensitivity in cell lines . . .

. . . and then extrapolate from that relationship to use gene expression to predict drug efficacy in the clinic?

[Figure: MLN7243 IC50 distribution on Ricerca panel; x-axis: Cell lines; y-axis: IC50s]

Page 11: Building the predictive models

• Normalize all Oncopanel 240 expression data

• Remove low-intensity and low-variance genes (to get robust signal)

• Correlation-based feature selection (gene expression vs IC50s)

• Develop a methodology for deriving drug sensitivity models
– Based on Partial Least Squares Regression (PLSR)
– Captures consensus information from cancer cell line panel data

• Use two SOC drugs as proof of concept for methodology
– Predict erlotinib (inhibits EGFR) sensitivity
– Predict sorafenib (inhibits VEGFR and PDGFR) sensitivity
– Use PFS from BATTLE trial to evaluate performance of models

[Figure: Oncopanel 240 expression data combined with Oncopanel 240 drug sensitivity (MLN7243 IC50 distribution on Ricerca panel; x-axis: Cell lines; y-axis: IC50s)]
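The modeling steps listed on this slide can be sketched in base R on simulated data. This is a minimal illustration, not the authors' pipeline: the filter thresholds, the signature size of 10 genes, and the use of a single PLS component are assumptions, and the expression and IC50 values are randomly generated.

```r
set.seed(1)
n_lines <- 60
n_genes <- 200

# Simulated log2 expression (rows = cell lines, columns = genes).
expr <- matrix(rnorm(n_lines * n_genes, mean = 8, sd = 1), nrow = n_lines,
               dimnames = list(NULL, paste0("gene", seq_len(n_genes))))
# Simulated log2(IC50), driven by the first three genes plus noise.
log_ic50 <- as.vector(expr[, 1:3] %*% rep(1, 3)) + rnorm(n_lines, sd = 0.3)

# 1. Remove low-intensity and low-variance genes (arbitrary thresholds).
keep <- colMeans(expr) > 6 & apply(expr, 2, var) > 0.5
filtered <- expr[, keep, drop = FALSE]

# 2. Correlation-based feature selection: genes most correlated with IC50.
gene_cor <- cor(filtered, log_ic50)[, 1]
signature <- names(sort(abs(gene_cor), decreasing = TRUE))[1:10]

# 3. One-component PLS regression (PLS1) on mean-centered data.
x <- scale(filtered[, signature], center = TRUE, scale = FALSE)
y <- log_ic50 - mean(log_ic50)
w <- crossprod(x, y)                  # weights: covariance of each gene with y
w <- w / sqrt(sum(w^2))               # normalize to unit length
scores <- x %*% w                     # latent component for each cell line
q <- sum(scores * y) / sum(scores^2)  # regression of y on the component
fitted <- as.vector(scores * q) + mean(log_ic50)

# In-sample agreement between fitted and simulated log2(IC50).
fit_cor <- cor(fitted, log_ic50)
print(fit_cor)
```

In practice a dedicated PLS implementation with several components and cross-validation would be preferable; the single-component version is shown only because it fits in a few lines.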

Page 12: Accuracy of the erlotinib sensitivity model

[Figure: Re-predicting Oncopanel 240 log2(IC50)]

Accuracy estimation:
– Upper boundary: 91%
– Lower boundary: 77%

Page 13: Signature genes in the erlotinib model reflect known drug mechanism

[Figure: network and pathway views centered on EGFR]

• Signature genes are over-connected to EGFR

• Signature genes over-represent pathways that contain an EGFR node

• Also, EGFR ligand NRG1 is among the signature genes

Page 14: Real data tests of the models

• Test 1: The BATTLE clinical trial
– 255 lung cancer (NSCLC) patients, 131 with gene expression profile data (GSE33072)
  • 25 patients in erlotinib arm
  • 39 patients in sorafenib arm
– Are the predictions of the PLSR models consistent with the results of the BATTLE trial?

• Test 2: Predicting drug sensitivity across indications
– Use model to predict erlotinib and sorafenib sensitivity based on gene expression data from 484 Gene Expression Omnibus datasets in Takeda tranSMART instance
  • 11,331 samples grouped into 19 major oncology indications
  • Calculate percentage predicted drug sensitive tumors for each indication
  • Compare predictions to results of phase III clinical trials and FDA approvals
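The per-indication percentage in Test 2 amounts to a grouped mean of a sensitive/resistant call. The sketch below assumes a sample counts as sensitive when its predicted log2(IC50) falls below a cutoff; the slides do not state how the call was made, so the cutoff, the column names, and the values are hypothetical.

```r
# Invented predictions: one row per tumor sample.
predictions <- data.frame(
  indication     = c("Lung", "Lung", "Lung", "Lung",
                     "Kidney", "Kidney",
                     "Liver", "Liver", "Liver", "Liver"),
  pred_log2_ic50 = c(-1.2, 0.8, -0.4, 1.5,
                     1.1, 0.9,
                     2.0, -0.1, 1.7, 0.6)
)

# Hypothetical cutoff: samples below it are called drug sensitive.
threshold <- 0
predictions$sensitive <- predictions$pred_log2_ic50 < threshold

# Percentage of predicted-sensitive tumors per indication.
pct_sensitive <- 100 * tapply(predictions$sensitive, predictions$indication, mean)
print(pct_sensitive)
```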

Page 15: Test 1 – The BATTLE Trial: Survival analysis of groups predicted to be drug sensitive/resistant by PLSR model

[Figure: four survival-curve panels labeled (A) to (D); x-axis: Months from Start of Therapy; y-axis: Proportion of Cases]

Panel titles: E_model pred E_PFS; S_model pred S_PFS; E_model pred S_PFS; S_model pred E_PFS

Panel statistics: P = 0.09, HR = 0.43; P = 0.006, HR = 0.32; P = 0.54, HR = 1.32; P = 0.32, HR = 1.87

E: Erlotinib; S: Sorafenib; red: predicted sensitive; green: predicted resistant
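A comparison of progression-free survival between predicted-sensitive and predicted-resistant groups is commonly done with a log-rank test. The slides do not say which test produced the P values, so the base-R sketch below is only an assumption about the kind of analysis involved; the function name `logrank_test` and the toy PFS data are invented, and the hazard ratio is not computed here.

```r
# Two-group log-rank test in base R.
# time:  follow-up time in months
# event: 1 = progression observed, 0 = censored
# group: TRUE = predicted sensitive, FALSE = predicted resistant
logrank_test <- function(time, event, group) {
  observed <- 0
  expected <- 0
  variance <- 0
  for (event_time in sort(unique(time[event == 1]))) {
    at_risk <- time >= event_time
    n  <- sum(at_risk)                                   # all at risk
    n1 <- sum(at_risk & group)                           # at risk, sensitive group
    d  <- sum(time == event_time & event == 1)           # events at this time
    d1 <- sum(time == event_time & event == 1 & group)   # events, sensitive group
    observed <- observed + d1
    expected <- expected + d * n1 / n
    if (n > 1) {
      variance <- variance + d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    }
  }
  chisq <- (observed - expected)^2 / variance
  list(observed = observed, expected = expected, chisq = chisq,
       p_value = pchisq(chisq, df = 1, lower.tail = FALSE))
}

# Toy PFS data (months); not taken from the BATTLE trial.
time  <- c(6, 8, 9, 10, 12,  1, 2, 2, 3, 4)
event <- c(1, 1, 0,  1,  0,  1, 1, 1, 1, 1)
group <- rep(c(TRUE, FALSE), each = 5)

result <- logrank_test(time, event, group)
print(result)
```

Fewer observed than expected events in the predicted-sensitive group indicates longer progression-free survival in that group.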

Page 16: Test 2: Are predictions of erlotinib sensitivity, grouped by indication, consistent with clinical results?

Kidney cancer is predicted to be erlotinib insensitive; a phase III clinical trial failed

Lung cancer is predicted to be erlotinib sensitive; a phase III clinical trial succeeded (companion diagnostic available)

Potential new indication? Multiple head and neck cancer trials are going on now

Page 17: Test 2: Are predictions of sorafenib sensitivity, grouped by indication, consistent with clinical results?

Kidney and liver cancers are predicted to be sorafenib sensitive

Sorafenib has been approved for kidney and liver cancers

Potential new indication?

Page 18: Conclusions

• Using tranSMART, we created a large data warehouse to provide computational support for biomarker identification, patient stratification, and other Translational Medicine goals.

• Patient and cell line data can be grouped across studies by indication or other attributes to increase statistical power. Grouping is enabled by:
– Global normalization of numeric data
– Standardization of vocabulary
– An R interface that provides direct access to database tables

• Using erlotinib and sorafenib as case studies, we demonstrated that the data warehouse and the R interface enable us to predict patient stratification and drug efficacy in cancer indications.

Page 19: Acknowledgements

Takeda: Andy Dorner, Gene Shin, Andrew Krueger, Seema Grover, Jike Cui (now at Sanofi)

Recombinant by Deloitte: Jinlei Liu, Mike McDuffie, Hiaping Xia

Thomson Reuters: Elona Kolpakova-Hart

Page 20: Backup Slides

Page 21: Model test 2: How well do the models predict the drug-indication efficacy profile?

Cancer Type | Successful Phase III trial / FDA approval | Number of samples | % tumors predicted erlotinib sensitive | % tumors predicted sorafenib sensitive
Lung Cancer | Erlotinib | 329 | 15.81 | 0.61
Liver Cancer | Sorafenib | 85 | 0.00 | 31.76
Kidney Cancer | Sorafenib | 218 | 0.46 * | 24.77

* Erlotinib failed to show efficacy for kidney cancer in a phase III trial
