Informatics of Cancer Genomes Lincoln Stein Ontario Institute for Cancer Research AMIA Summit on Translational Genomics San Francisco, March 2011


The Genome Project’s Challenge

Translating genome sciences into improved healthcare has been more difficult than we expected!


Promises of cancer genome research to patients, health care providers and payers

Risk prediction & prognosis: Germline variants predict risk, help optimize screening programs to identify tumours at earlier, more curable, stages;

Diagnosis: Cancer diagnosis will be more precise, allowing optimization of treatment interventions;

New therapies will be developed that target specific alterations in cancer cells, reducing the need for highly toxic, nonspecific chemotherapies.

A few successes – potential far from being realized

Cancer is a Complex Genomic Disease

[Figure panels: Healthy Breast Tissue; Early Cancer; Invasive Cancer]

Cancer is a Genomic Disease

Why Apply Genomics to Cancer?

Every cancer genome is different

Cancers currently treated with a one-size-fits-all strategy (w/ a few exceptions)

Knowledge of genomic changes will inform therapy

Challenges in Understanding Cancer

Tumours are heterogeneous & evolve over time.

Host factors are poorly understood.

Different sets of mutated genes may lead to similar tumours.

Different tumour types may have similar sets of mutated genes.

Deep & broad sequencing necessary.

Whole Genome Sequencing - sequencing platform reagents only

[Chart: cost on a log scale with ticks at $10,000,000, $1,000,000, $100,000, $10,000 and $1,000; years 2005 to 2010]

OICR Sequencing/Biocomputing Platform

>7 terabases per month (2000 human genomes) capacity and growing

5500 cores

185 nodes with 16 GB RAM

221 nodes with 24 GB RAM

32 nodes with 96 GB RAM

5 nodes with 256 GB RAM

2.5PB of online storage

1Gb, 10Gb and fibre connectivity

ABI Solid 5500

Illumina GA1

Illumina HiSeq 2000

Pac Bio

Three Parts of Talk

• Part 1 – International Cancer Genome Consortium (ICGC)

• Part 2 – Network Analysis of Cancer Genomes

• Part 3 – Genome Pathways Sequencing (GPS)

Part 1: International Cancer Genome Consortium

Discover and catalog the driver genes in cancer tissues

ICGC – Jan 2011

Nature 464, 993-998 (15 April 2010)


Rationale for an international consortium

The scope is huge, such that no country can do it all;

Coordinated cancer genome initiatives will reduce duplication of effort for common tumours and ensure complete studies for many less frequent forms of cancer;

Standardization and uniform quality measures across studies will enable the merging of datasets, increasing power to detect additional targets;

The spectrum of many cancers varies across the world for many tumour types;

The ICGC will accelerate the dissemination of genomic and analytical methods across participating sites and the user community.

The Strategy

Identify genomic abnormalities in 50 different major cancer types

Make the data available to the research community & public

Identify genome changes

…GATTATTCCAGGTAT…

…GATTATTGCAGGTAT…

…GATTATTGCAGGTAT…
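A minimal sketch of what "identify genome changes" means for the three aligned fragments on this slide: walk the columns and report where the sequences disagree. The slide does not say which fragment is the reference, so the code makes no claim about that; the function name is hypothetical.

```python
# The three aligned fragments from the slide (ellipses dropped).
# Which one is tumour and which is normal is not stated on the slide.
reads = [
    "GATTATTCCAGGTAT",
    "GATTATTGCAGGTAT",
    "GATTATTGCAGGTAT",
]


def differing_positions(seqs):
    """Return (0-based index, bases) for each column where the sequences disagree."""
    diffs = []
    for index, column in enumerate(zip(*seqs)):
        if len(set(column)) > 1:
            diffs.append((index, column))
    return diffs


changes = differing_positions(reads)
for index, column in changes:
    print(f"position {index}: {'/'.join(column)}")
```

Real variant calling works on many overlapping reads with base qualities and alignment gaps; this only shows the column-by-column comparison.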

Data being Collected

• Clinical data – tumor pathology, age, gender, treatment, survival (partly controlled access)

• Germline data – SNPs (controlled access)

• Somatic mutations in tumour

• Copy number variations

• RNA abundance & splicing

• DNA methylation

Big Data

50 tumor types and/or subtypes

A minimum of 500 specimens and control tissues per tumor type

50,000 Human Genome Projects
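The "50,000" figure can be reproduced with simple arithmetic, assuming (this is an interpretation, not stated on the slide) that each of the 500 specimens per tumour type comes with one matched control genome.

```python
TUMOUR_TYPES = 50
SPECIMENS_PER_TYPE = 500   # stated minimum per tumour type
GENOMES_PER_SPECIMEN = 2   # assumption: one tumour genome plus one matched control

tumour_specimens = TUMOUR_TYPES * SPECIMENS_PER_TYPE
total_genomes = tumour_specimens * GENOMES_PER_SPECIMEN

print(tumour_specimens, "tumour specimens")
print(total_genomes, "genomes in total")
```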

ICGC data is distributed, but is accessible through a common portal

Federate or Centralize?

[Diagram comparing two layouts. Centralized Model: Source DBs, Staging Area, Mart DB. Federated Model: Site 1, Site 2, Site 3, each with a Source DB and a Mart DB.]
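A rough sketch of the federated idea: the portal sends the same query to every site's mart and merges the answers, so the data never has to be copied into one place. This is not the BioMart API; the site names, records and function below are all invented for illustration.

```python
# Hypothetical per-site marts; every record here is made up.
SITE_MARTS = {
    "site1": [{"gene": "KRAS", "tumour": "pancreas"}],
    "site2": [{"gene": "TP53", "tumour": "breast"},
              {"gene": "KRAS", "tumour": "colon"}],
    "site3": [{"gene": "TP53", "tumour": "pancreas"}],
}


def federated_query(predicate):
    """Run the same predicate against each site's mart and merge the hits."""
    hits = []
    for site, mart in SITE_MARTS.items():
        for record in mart:
            if predicate(record):
                # Tag each hit with the site it came from.
                hits.append({**record, "site": site})
    return hits


kras_hits = federated_query(lambda record: record["gene"] == "KRAS")
print(kras_hits)
```

In a centralized model the loop over sites would disappear, at the cost of shipping every source database to one staging area first.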

BioMart: A Federated Data Warehouse (EBI, CSHL, OICR)

[Diagram: BioMart instances for Reactome, Ensembl and HapMap, with client tools Bioclipse, Taverna, Galaxy, Cytoscape and BioConductor]

Arek Kasprzyk

Queries Offered

Search for somatic mutations affecting a gene of interest. Optionally filtered by various criteria.

Search for mutated genes affecting a tumor type of interest. Optionally filtered by various clinical criteria.

Search for donors and samples of interest. Optionally filtered by clinical & molecular criteria.
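The three query types share one shape: pick a record set, then apply optional filters. A minimal sketch follows, with toy records and field names that are assumptions and do not reflect the portal's real schema.

```python
# Toy mutation records; donors, genes and consequences are invented.
MUTATIONS = [
    {"gene": "KRAS", "tumour_type": "pancreas", "donor": "D1", "consequence": "missense"},
    {"gene": "KRAS", "tumour_type": "colon", "donor": "D2", "consequence": "missense"},
    {"gene": "TP53", "tumour_type": "pancreas", "donor": "D1", "consequence": "nonsense"},
]


def search(records, **criteria):
    """Keep records that match every supplied field=value filter."""
    return [r for r in records
            if all(r.get(field) == value for field, value in criteria.items())]


# Somatic mutations affecting a gene of interest, with and without a filter.
by_gene = search(MUTATIONS, gene="KRAS")
by_gene_and_type = search(MUTATIONS, gene="KRAS", tumour_type="pancreas")

# Mutated genes for a tumour type of interest.
genes_in_pancreas = sorted({r["gene"] for r in search(MUTATIONS, tumour_type="pancreas")})

# Donors of interest, filtered by a molecular criterion.
donors_with_nonsense = sorted({r["donor"] for r in search(MUTATIONS, consequence="nonsense")})
```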

Controlled Data Tier

Potentially identifiable information is in a controlled tier.

Includes germline mutations; NOT somatic mutations.

Apply to Data Access Control Office (DACO) and certify you will not attempt to reidentify patients.

Send list of OpenIDs to allow access.

After review of application, DACO will authorize DCC to accept the OpenIDs indicated for controlled access.
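The two-tier rule can be sketched as an allow-list check: open data needs no credentials, controlled data needs an OpenID that has been approved. The data-type names, the example OpenID URLs and the functions are hypothetical; the real DCC implementation is not described on the slide.

```python
OPEN, CONTROLLED = "open", "controlled"

# Tier assignment as described on the slide: germline data is controlled,
# somatic mutations are open.
DATA_TIER = {"somatic_mutation": OPEN, "germline_variant": CONTROLLED}

# OpenIDs accepted for controlled access; filled in after an approved application.
authorized_openids = set()


def authorize(openids):
    """Record the OpenIDs listed on an approved application."""
    authorized_openids.update(openids)


def can_access(data_type, openid=None):
    """Open data is always readable; controlled data needs an authorized OpenID."""
    if DATA_TIER[data_type] == OPEN:
        return True
    return openid in authorized_openids


authorize(["https://example.org/id/alice"])  # placeholder identity
```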

Part 2: Network Analysis of Cancer Genomes

Discover patterns & mechanisms of altered genes in cancer

Why Network Analysis is Useful

• No single mutated gene is necessary & sufficient to cause cancer.

• Typically one or two common mutations (e.g. TP53) plus many rare mutations.

• Network analysis reduces hundreds of mutated genes to fewer than a dozen mutated pathways.

• Can elucidate mechanism of action of drivers.
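A toy illustration of the reduction from genes to pathways: patients whose mutated genes barely overlap can still hit the same pathways. The gene-to-pathway table and the patients are invented for illustration; a real analysis would take annotations from a pathway database.

```python
# Illustrative gene-to-pathway annotation (an assumption, not a curated table).
GENE_TO_PATHWAY = {
    "KRAS": "KRAS signaling",
    "BRAF": "KRAS signaling",
    "TP53": "p53 signaling",
    "MDM2": "p53 signaling",
    "SMAD4": "TGFβ signaling",
    "TGFBR2": "TGFβ signaling",
}


def pathways_hit(mutated_genes):
    """Group a patient's mutated genes by the pathway they belong to."""
    hits = {}
    for gene in mutated_genes:
        pathway = GENE_TO_PATHWAY.get(gene)
        if pathway is not None:
            hits.setdefault(pathway, set()).add(gene)
    return hits


# Hypothetical patients: P1 and P2 share no mutated gene.
per_patient = {
    "P1": ["KRAS", "MDM2"],
    "P2": ["BRAF", "TP53"],
    "P3": ["KRAS", "TGFBR2"],
}
summary = {patient: sorted(pathways_hit(genes)) for patient, genes in per_patient.items()}
```

Here P1 and P2 look unrelated at the gene level but identical at the pathway level.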

Reactome

www.reactome.org

Reactome Pathway Browser

Human EGFR Pathway

Reactome Pathway Coverage

Curated Human Data – Version 35

5078 proteins

4166 reactions

3870 complexes

1112 pathways

Expanding Reactome’s Coverage

Curated Pathways Uncurated Information

human PPI

PPI inferred from fly, worm & yeast

PPI from text mining

Gene co-expression

GO annotation on biological processes

Protein domain-domain interactions

CellMap TRED

GeneWays

Annotated Functional Interactions

Naïve Bayes Classifier

Predicted Functional Interactions

Wu et al. (2010) Genome Biology
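A minimal sketch of how a naïve Bayes classifier can score a protein pair as a functional interaction from binary evidence. The three features echo items in the diagram, but the training pairs are invented and the published classifier's features and training data differ.

```python
import math

FEATURES = ["human_ppi", "coexpressed", "shared_go_process"]

# Toy training pairs: (feature vector, is_functional_interaction). Invented data.
TRAINING = [
    ((1, 1, 1), True),
    ((1, 0, 1), True),
    ((0, 1, 1), True),
    ((0, 0, 1), False),
    ((0, 0, 0), False),
    ((0, 1, 0), False),
]


def train(examples):
    """Estimate a prior and Laplace-smoothed P(feature = 1 | label) per label."""
    model = {}
    for label in (True, False):
        rows = [x for x, y in examples if y == label]
        prior = len(rows) / len(examples)
        likelihoods = [(sum(r[i] for r in rows) + 1) / (len(rows) + 2)
                       for i in range(len(FEATURES))]
        model[label] = (prior, likelihoods)
    return model


def posterior_fi(model, x):
    """Posterior probability that a pair with evidence x is a functional interaction."""
    log_scores = {}
    for label, (prior, likelihoods) in model.items():
        score = math.log(prior)
        for present, p in zip(x, likelihoods):
            score += math.log(p if present else 1 - p)
        log_scores[label] = score
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    return weights[True] / (weights[True] + weights[False])


model = train(TRAINING)
print(posterior_fi(model, (1, 1, 1)), posterior_fi(model, (0, 0, 0)))
```

The "naïve" part is the assumption that the evidence sources are independent given the label, which is why the per-feature log-likelihoods are simply added.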

Functional Interaction (FI) Network

10,956 proteins (9,542 genes)

209,988 FIs

5% of network shown here

A Paradigm for Interpreting Gene Lists

Reactome Functional Interaction network

Extract mutated, overexpressed, underexpressed, expanded/deleted genes

Add linker genes

Disease subnetwork

Apply community clustering algorithms

Disease “modules”

Disease gene prediction

Sample classification

Hypothesis generation
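The paradigm can be sketched on a toy graph: take the genes of interest, add linker genes, then split the resulting subnetwork into modules. Two simplifications are assumed here: a linker is any other gene adjacent to at least two genes of interest, and connected components stand in for a real community clustering algorithm. Gene names and edges are invented.

```python
def build_adjacency(edges):
    """Undirected adjacency sets from a list of (a, b) interaction pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def find_linkers(adjacency, genes_of_interest):
    """Genes outside the list that touch at least two genes on the list."""
    return {node for node, neighbours in adjacency.items()
            if node not in genes_of_interest
            and len(neighbours & genes_of_interest) >= 2}


def connected_modules(adjacency, nodes):
    """Connected components of the subnetwork (a crude stand-in for clustering)."""
    unvisited, modules = set(nodes), []
    while unvisited:
        stack, module = [min(unvisited)], set()
        while stack:
            node = stack.pop()
            if node in module:
                continue
            module.add(node)
            stack.extend((adjacency.get(node, set()) & nodes) - module)
        unvisited -= module
        modules.append(module)
    return modules


# Toy FI network and gene list.
FI_EDGES = [("G1", "G2"), ("G1", "L1"), ("G3", "L1"),
            ("G4", "G5"), ("G6", "X1"), ("X1", "X2")]
GENES_OF_INTEREST = {"G1", "G2", "G3", "G4", "G5", "G6"}

adjacency = build_adjacency(FI_EDGES)
linkers = find_linkers(adjacency, GENES_OF_INTEREST)
subnetwork_nodes = GENES_OF_INTEREST | linkers
modules = connected_modules(adjacency, subnetwork_nodes)
```

Without the linker L1, gene G3 would sit alone; with it, G3 joins the module containing G1 and G2.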

Reactome Cytoscape Plugin Runs this Paradigm

Find and Annotate Network Modules

Human EGFR Pathway

Back to Reactome

Human EGFR Pathway

Export as Nice Diagrams

OICR Pancreatic Cancer Whole Genome Sequencing

• 5 Patients

– Primary, xenograft, cell line, normal

• Whole genome sequencing

• Somatic mutation calling

• Very conservative – only mutations appearing in primary+xenograft+cell line kept

• 310 genes with non-synonymous (NS) somatic mutations
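The conservative filter amounts to a set intersection: keep only calls seen in primary, xenograft and cell line. Subtracting calls also seen in the normal sample is an assumption about how "somatic" was enforced, and the variant tuples below are invented.

```python
# Hypothetical calls per sample, written as (chromosome, position, ref, alt).
CALLS = {
    "primary":   {("chr1", 100, "A", "G"), ("chr2", 200, "C", "T"), ("chr3", 300, "G", "A")},
    "xenograft": {("chr1", 100, "A", "G"), ("chr2", 200, "C", "T"), ("chr4", 400, "T", "C")},
    "cell_line": {("chr1", 100, "A", "G"), ("chr2", 200, "C", "T"), ("chr3", 300, "G", "A")},
    "normal":    {("chr2", 200, "C", "T")},
}


def conservative_somatic(calls):
    """Keep variants present in all three tumour-derived samples and absent from normal."""
    tumour_derived = [calls["primary"], calls["xenograft"], calls["cell_line"]]
    shared = set.intersection(*tumour_derived)
    return shared - calls["normal"]


kept = conservative_somatic(CALLS)
```

The chr3 and chr4 variants are dropped because one tumour-derived sample lacks them, and the chr2 variant is dropped as germline.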

OICR Pooled Data from 5 Pancreatic Cancer Genomes

(108 mutated genes in network)

p53 & p38 MAPK signaling

KRAS signaling

Hedgehog signaling

Wnt & Cadherin signaling

Zinc fingers

Olfactory signaling

Transcription

Apoptosis

Syndecan-3-mediated signaling

p53 module – 47 genes

[Figure legend: patients PCSI0002, PCSI0005, PCSI0006, PCSI0022, PCSI0024; 2 patients; 3 patients; linker]

KRAS module – 24 genes


Hedgehog module – 11 genes

BMP8B = bone morphogenetic protein 8b


Comparison to 2008 Johns Hopkins Dataset

SCG2 SLC1A6 SMAD4 SMARCA4 ST6GAL2 TGFBR2 TNR TPO ZNF835 DZ4

Genes mutated in ≥ 3 patients

Genes mutated in ≥ 2 patients

p-value = 1.87E-4

Jones et al. Science (2008) 321: 1801

ABLIM2 AHNAK BAI3 CDH10 CDKN2A CTNNA2 DPP6 FMN2 GPR133 LRP1B

MYH2 ODZ4 OVCH1 PCDH15 PCDH18 PIK3CG PPP1R3A PREX2 PXDN RYR2

SEZ6L SLC45A1 TP53 TRPM3 TTN USP20 ZNF443

AGXT2L2 ALDH8A1 ARHGEF7 ARID1A ARSA ATN1 BOC CAND2 CNTNAP2 DCHS2

DMD DOCK2 DUOX2 FAM123C FAT4 FRAXA GAS7 KRAS LGR6 LRRTM4

MLL3 MUC16 NANOS1 NKX2-2 NLRP4 NPY1R OTOF PKD1L2 PODN RBM27

PDE4DIP POTEH RGPD3 TBX20 TPTE2 TUBB2C WASH7P ZNF705A ZNF717 ZNF814

AGAP4 ANKRD36 AQP7 BMP8B CLEC18B FLG FRG2B HERC2 HRNR IKZF2 KRTAP5-10

LYZL2 MST1P9 NBPF1 NBPF10 NBPF14 NBPF8 NCOR1 NF1P4 NOTCH2NL OR2T34 PABPC3
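The slide does not say which test produced the p-value for the overlap between the two gene lists. One common choice for this kind of question is a hypergeometric tail probability, sketched here with placeholder numbers only; it is not an attempt to reproduce 1.87E-4.

```python
from math import comb


def overlap_p_value(universe, list_a, list_b, overlap):
    """P(at least `overlap` shared genes) when list_b is drawn at random
    from a universe that contains the list_a genes."""
    max_overlap = min(list_a, list_b)
    total = comb(universe, list_b)
    tail = sum(comb(list_a, i) * comb(universe - list_a, list_b - i)
               for i in range(overlap, max_overlap + 1))
    return tail / total


# Placeholder example: two 5-gene lists from a 10-gene universe, fully overlapping.
p_toy = overlap_p_value(universe=10, list_a=5, list_b=5, overlap=5)
print(p_toy)
```

The result depends strongly on the assumed universe size, so that choice has to be reported alongside any p-value.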

Hopkins data: 1278 genes mutated in 24 patients

KRAS signaling

p53 signaling

Integrin signaling

Wnt & Cadherin signaling

TGFβ signaling

Heterotrimeric G-protein signaling

Cell cycle

G2/M transition

Lipid metabolism

Hedgehog signaling

Rho GTPase signaling

Muscle contraction

ADAM metallopeptidase with thrombospondin

Transcription

Zinc fingers

Very Similar at Module Level

p53 & p38 MAPK signaling

KRAS signaling

Hedgehog signaling

Wnt & Cadherin signaling

Zinc fingers

Olfactory signaling

Transcription

Apoptosis

Syndecan-3-mediated signaling

Discovering Prognostic Signatures in Cancer Module Datasets

Expression analysis of tumours from multiple patients

Disease Module Map

Principal component analysis on modules

Correlate principal components with clinical parameters

Module-Based Signatures of Breast Cancer Survival

• NEJM: van de Vijver et al. 2002

– 295 Samples, ~12,000 genes

– Event: death

• GSE4922: Ivshina et al. Cancer Res. 2006

– 249 Samples, ~13,000 genes

– Event: recurrence or death

Building the Network

• Built based on the NEJM data set

– 27 modules selected based on size cutoff 7 and average correlation cutoff 0.25.

• Validated using GSE4922.
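A sketch of the module selection step, under two assumptions the slide leaves open: "size cutoff 7" means at least 7 genes, and "average correlation" means the mean pairwise Pearson correlation of expression across samples. The expression values are toy data, and the example call lowers the size cutoff so the data stays small.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def average_correlation(genes, expression):
    """Mean Pearson correlation over all gene pairs in a module."""
    pairs = list(combinations(genes, 2))
    if not pairs:
        return 0.0
    return sum(pearson(expression[a], expression[b]) for a, b in pairs) / len(pairs)


def select_modules(modules, expression, min_size=7, min_avg_corr=0.25):
    """Names of modules passing both the size and the correlation cutoff."""
    return [name for name, genes in modules.items()
            if len(genes) >= min_size
            and average_correlation(genes, expression) >= min_avg_corr]


# Toy expression profiles across four samples.
EXPRESSION = {
    "g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [1, 3, 5, 7],
    "g4": [1, 2, 3, 4], "g5": [4, 3, 2, 1], "g6": [2, 3, 4, 5],
}
MODULES = {"tight": ["g1", "g2", "g3"], "loose": ["g4", "g5", "g6"]}

selected = select_modules(MODULES, EXPRESSION, min_size=3)
```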

PC Analysis Identifies Module 2 as Explaining Much of Variation in Survival

Same Signature Predicts Survival in Independent Data Set

And Three More Data Sets as Well…

Module 2: Kinetochore + Aurora B Signaling

Summary

• Reactome, coupled with the FI network, can lead to useful insights in analyzing genomic datasets.

• Cytoscape plugin lets anyone perform the paradigmatic workflow of discovering and annotating network modules and finding potential drug targets.

• All data and software are open to public; no licensing required.

• www.reactome.org.

Part 3: The Genome Pathways Sequencing Project

Apply Genomics in a Clinical Setting

• Collaboration between OICR & Princess Margaret Hospital

• Recruit patients with metastatic breast, colorectal, lung & ovarian cancer who have “failed” standard therapy.

• Sequence ~1000 cancer-related genes.

• Identify “actionable” mutations.

• Route patients to clinical trials of drugs targeting their particular mutated genes.

• Assess clinical & sociological outcomes.

Why are We Doing This?

• Feasibility – can we adapt high-throughput sequencing to the clinical laboratory?

• Performance – can we turn the results around in ~3 weeks?

• Efficacy – Does targeting mutations improve health outcome?

• Sociology – How do clinicians and patients deal with genomic data?

Information Flow

Single Molecule Sequencing

• Pacific Biosciences RS

• Single-molecule sequencing; circular consensus

• >1000 bp reads, ~8x coverage

• 15 min/run
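The idea behind circular consensus is that repeated passes over the same molecule let random errors be voted out. A minimal sketch follows, assuming the passes are already aligned, equal in length and free of insertions or deletions, which real reads are not; the eight passes are invented.

```python
from collections import Counter


def consensus(passes):
    """Majority base per column across aligned, equal-length passes."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*passes))


# Eight hypothetical passes (about 8x coverage) over one 15-base molecule,
# four of them carrying a single substitution error.
PASSES = [
    "GATTATTGCAGGTAT",
    "GATTCTTGCAGGTAT",
    "GATTATTGCAGGTAT",
    "GATTATTGCAGCTAT",
    "AATTATTGCAGGTAT",
    "GATTATTGCAGGTAT",
    "GATTATTGCTGGTAT",
    "GATTATTGCAGGTAT",
]

called = consensus(PASSES)
print(called)
```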

Reporting Mutation Consequences

Mutation Consequences Knowledgebase

• ~200 cancer-related genes selected by consensus of local oncologists.

• ~800 being added from knowledgebases at MSKCC, COSMIC and NCI.

• Actionable common mutations being annotated by oncology fellows at PMH.

• Informatics system will generate a draft report; reviewed and revised by expert panel of oncologists.

• Results fed back to knowledgebase.

Actionable Mutations

• Patients with mutations in KRAS, BRAF, & PI3K referred to ongoing clinical trials with targeted inhibitors of those pathways.

– Other actionable mutations to be added as suitable trials become available.

– Does treatment with targeted therapy improve patient outcome?

• Patients with germline variants associated with increased risk of cancer will receive counseling and be offered genetic counseling for potentially affected family members.
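The routing rule for the draft report can be sketched as a lookup against the current actionable list. The list uses the labels from the slide (PI3K names a kinase family, not a single gene symbol); the function name and report wording are hypothetical.

```python
# Actionable list as named on the slide; to be extended as trials open.
ACTIONABLE = ("KRAS", "BRAF", "PI3K")


def draft_referrals(patient_mutations):
    """Draft report lines for an expert panel to review; not a clinical decision."""
    found = [gene for gene in ACTIONABLE if gene in patient_mutations]
    if not found:
        return ["no actionable mutation found; no targeted-trial referral"]
    return [f"refer to trial of targeted inhibitor for {gene} pathway" for gene in found]


report = draft_referrals({"KRAS", "TP53"})
print(report)
```

Keeping the actionable list as data means adding a gene is a one-line change once a trial opens.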

Sociological Questions

• How will patients respond to being told that they will be denied treatment based on absence of targeted mutation?

• How will patients respond to learning they carry cancer risk alleles?

• How will patients & clinicians respond to “incidental” findings and nonactionable mutations?

Sociology Study Design

Status of GPS Project

• Oncologists, pathologists, radiologists, clinical trials nurse and other clinical staff recruited.

• Sociologists recruited.

• Clinical & genomics databases built.

• Consequences knowledgebase prototype running.

• Study approved by IRB.

• First patient to be recruited week of March 13, 2011.

Summary

Three OICR Projects

1. ICGC – Discover cancer driver genes.

2. Reactome – Discover how driver genes relate to disease mechanisms & to clinical behavior.

3. GPS – Translate genomics to the bedside.

Tentative first steps towards personalized medicine.

Acknowledgements

• ICGC DCC & Portal

– Arek Kasprzyk

– Junjun Zhang

– Francis Ouellette

• Network Analysis

– Guanming Wu

– Irina Kalatskaya

– Christina Yung

• GPS Project

– Lillian Siu

– John McPherson

– Suzanne Karmel-Reid

• My Boss

– Thomas Hudson

• Funding

– National Institutes of Health

– Ministry of Research & Innovation, Ontario