analysis and integration of large-scale molecular and clinical data in cancers

29
Integration of Large- scale Molecular and Clinical Data in Cancers Sampsa Hautaniemi, DTech Systems Biology Laboratory Institute of Biomedicine Genome-Scale Biology Research Program Centre of Excellence in Cancer Genetics Faculty of Medicine University of Helsinki

Upload: ferris

Post on 02-Feb-2016

37 views

Category:

Documents


0 download

DESCRIPTION

Analysis and Integration of Large-scale Molecular and Clinical Data in Cancers. Sampsa Hautaniemi, DTech Systems Biology Laboratory Institute of Biomedicine Genome-Scale Biology Research Program Centre of Excellence in Cancer Genetics Faculty of Medicine University of Helsinki. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Analysis and Integration of Large-scale Molecular and Clinical Data in Cancers

Analysis and Integration of Large-scale Molecular

and Clinical Data in Cancers

Sampsa Hautaniemi, DTech

Systems Biology LaboratoryInstitute of Biomedicine

Genome-Scale Biology Research ProgramCentre of Excellence in Cancer Genetics

Faculty of MedicineUniversity of Helsinki

Page 2: Analysis and Integration of Large-scale Molecular and Clinical Data in Cancers

Table of Contents The essence of systems biology: Iteration

and collaboration. Iteration in ovarian cancer.

The essence of systems biology II: Multi-level data. Multi-levelity of breast cancer.

The essence of systems biology III: Computation. Anduril computational framework &

glioblastoma multiforme.

Page 3: Analysis and Integration of Large-scale Molecular and Clinical Data in Cancers

Systems Biology: Iteration

Adapted from a slide by Peter Sorger

Page 4: Analysis and Integration of Large-scale Molecular and Clinical Data in Cancers

Ovarian Cancer Epithelial ovarian cancer is the fifth most

frequent cause of female cancer deaths, with an overall 5-year survival rate below 50%.

The standard chemotherapy for high-grade serous ovarian cancer (HGS-OvCa) is platinum-taxane combination. Majority of patients suffer relapse <18 months. No clinically applicable methods to predict the

prognostic outcome or even to identify the patients unresponsive to current therapies.

Page 5: Analysis and Integration of Large-scale Molecular and Clinical Data in Cancers

Aims of the HGS-OvCa Study To identify poor response and good response

subtypes of HGS-OvCa. Report biomarkers that allow to identify

whether a HGS-OvCa patient responds to the platinum treatment. We developed a computational method that

integrates transcriptomics and clinical data in subtype finding step.

We used transcriptomics and clinical data from 184 HGS-OvCa patients treated with platinum and taxane from TCGA repository.

Page 6: Analysis and Integration of Large-scale Molecular and Clinical Data in Cancers

Three Subtypes of HGS-OvCa

Chen et al. In preparation.

Page 7: Analysis and Integration of Large-scale Molecular and Clinical Data in Cancers

Validation, validation, validation We also used an independent prospective

HGS-OvCa cohort of 29 patients. Data measured with qRT-PCR.

Chen et al. In preparation.

Page 8: Analysis and Integration of Large-scale Molecular and Clinical Data in Cancers

Pathway Analysis Our pathway analysis (too) identified TR3

as a potential driver for platinum resistance.

Page 9: Analysis and Integration of Large-scale Molecular and Clinical Data in Cancers

TR3 Inhibition with Two Drugs We identified two signaling pathway

regulators for TR3 and associated inhibitors. The use of two inhibitors should transform the

HGS-OvCa cells sensitive to platinum.

Chen et al. In preparation.

AKT inh

+ AKT inh + ERK5 inh

Page 10: Analysis and Integration of Large-scale Molecular and Clinical Data in Cancers

Systems Biology II: Multi-level Data

eAtlas of Pathology

While cancer cells are clearly visible the exact molecular causes for are still unknown. Need to study cancer samples at multiple levels.

Page 11: Analysis and Integration of Large-scale Molecular and Clinical Data in Cancers

Multiple Levels of Data

Clinical

Genetics

Transcriptome

Epigenetics

Proteomics

100 samples lead to~200 million data points.

Page 12: Analysis and Integration of Large-scale Molecular and Clinical Data in Cancers

Multiple level data: Estrogen Receptor

Nuclear receptor:Estrogen receptor

Gene regulation

Transcription factor

Genomic action Non-genomic action

Page 13: Analysis and Integration of Large-scale Molecular and Clinical Data in Cancers

Why Is This Important? Estrogen receptor is the most

important clinical variable in determining how to treat a breast cancer patient.

There are several anti-cancer drugs targeting estrogen receptor pathway. Currently unknown which tumors

do not response to therapy. Finding genes respond to

estrogen receptor stimulus may give clues which genes are important in ER inhibition resistance.

Hugo Simberg: Garden of Death

Page 14: Analysis and Integration of Large-scale Molecular and Clinical Data in Cancers

Data We used chromatin immunoprecipitation

combined with massive parallel sequencing (ChIP-seq) to determine genome-wide occupancy (eight time points) after estradiol stimuli in MCF-7 breast cancer cell line: Estrogene receptor RNA polymerase II

Histone marks (H3K4me3, H2A.Z)

These experiments resulted in >2.0 billion data points to the initial analysis.

Page 15: Analysis and Integration of Large-scale Molecular and Clinical Data in Cancers

SYNERGY database SYNERGY database is available and fully

operational. http://csblsynergy.fimm.fi/

Page 16: Analysis and Integration of Large-scale Molecular and Clinical Data in Cancers

Finding ER Responsive Genes

Page 17: Analysis and Integration of Large-scale Molecular and Clinical Data in Cancers

Results We identified 777 estrogen receptor early

responding genes. Interestingly, the major estrogen receptor

related changes in cells were due to non-genomic action.

Page 18: Analysis and Integration of Large-scale Molecular and Clinical Data in Cancers

Results Next we searched for genes that have

survival association in a breast cancer cohort of 150 ER+/HER2-/postmenopausal patients in The Cancer Genome Atlas (TCGA) cohort. Based on Kaplan-Meier analysis we identified

23 genes with survival p<0.05. The best survival associated gene was ATAD3B.

Page 19: Analysis and Integration of Large-scale Molecular and Clinical Data in Cancers

Kaplan-Meier for ATAD3B

Page 20: Analysis and Integration of Large-scale Molecular and Clinical Data in Cancers

Intermission Pol2 activity is much better way of

searching for responsive genes to a cue that mRNA.

In deep sequencing, the sequencing depth is important (with our 200 mill. short-read Pol2 data, we found many ER responsive genes not found in 20 mill. short-read GRO-seq).

How to systematically analyze multi-level data?

Page 21: Analysis and Integration of Large-scale Molecular and Clinical Data in Cancers

Multi-level Cancer Research Requires Computational Methods Storing the data and computing power are

the first (but relatively small) hurdles. Analysis of large-scale, heterogeneous

data is much more challenging than single genomics or proteomics data analysis.

There is a need for computational infrastructure. Writing an analysis program fast without

proper infrastructure will lead to delays and errors in larger projects.

Page 22: Analysis and Integration of Large-scale Molecular and Clinical Data in Cancers

Infrastructure: Anduril Anduril is a computational framework to integrate

large-scale and heterogeneous data, knowledge in bio-databases and analysis tools.

The main design principles are: Modular pipeline analysis approach Scalable Open source, thorough documentation

http://www.anduril.org/ Method written in any programming language

executable from the command prompt can be included.

Produces automatically the result PDF and website containing the results.

Page 23: Analysis and Integration of Large-scale Molecular and Clinical Data in Cancers

Complex Pipelines Are Fragile

Page 24: Analysis and Integration of Large-scale Molecular and Clinical Data in Cancers

Glioblastoma Multiforme (GBM) Glioblastoma multiforme (GBM) is one of the

deadliest cancers. The Cancer Genome Atlas (TCGA) has

published data from >500 GBM patients: comparative genomic hybridization arrays single nucleotide polymorphism arrays exon and gene expression arrays microRNA arrays methylation arrays clinical data

Which genes or genetic regions have survival effect?

Page 25: Analysis and Integration of Large-scale Molecular and Clinical Data in Cancers

GBM Results in Anduril Website

Page 26: Analysis and Integration of Large-scale Molecular and Clinical Data in Cancers

Latest on moesin in GBM

Page 27: Analysis and Integration of Large-scale Molecular and Clinical Data in Cancers

(Sequence) Component Libraries Over 400 Anduril components already available. Pipelines:

ChIP-seq (EMBO J 2011, Cancer Res 2012, ...) RNA-seq (not published) miRNA-seq (not published) DNA methylation-seq (not published) Whole-genome sequence & exome-sequence (not

published) Image analysis (manuscript)

Page 28: Analysis and Integration of Large-scale Molecular and Clinical Data in Cancers

Summary Characterization of a complex disease first requires

identifying the key variables. This requires integration data from multiple levels, iterative

mode of research and collaboration. Multi-level data integration requires computational

infrastructure and data-intensive computing. We have developed Anduril to organize large-scale data

analysis projects (imaging, deep sequencing, database usage, conversions, etc.)

The need for computational infrastructure is evident in particular when analyzing deep sequencing data.

All our methods are (will be) freely available.

http://research.med.helsinki.fi/gsb/hautaniemi/software.html

Page 29: Analysis and Integration of Large-scale Molecular and Clinical Data in Cancers

AcknowledgementsSystems Biology Lab

FundingAcademy of FinlandFinnish Cancer OrganizationsSigrid Jusélius FoundationEU FP7ERA-NET SysBio+Biocenter FinlandBiocentrum Helsinki

CollaboratorsOlli Carpén Henk StunnenbergGeorge ReidJukka Westermarck