fda ngs and big data conference september 2014

43
National Cancer Institute U.S. DEPARTMENT OF HEALTH AND HUMAN SERVICES National Institutes of Health NCI Genomics Data Commons and cloud pilots September 2014

Upload: warren-kibbe

Post on 25-Jun-2015

329 views

Category:

Government & Nonprofit


3 download

DESCRIPTION

Presentation for the FDA NGS and Big Data Conference September 2014 held on the NIH campus. NCI initiatives, including Cancer Genomics Data Commons, NCI Cloud Pilots, big data issues for cancer

TRANSCRIPT

Page 1: FDA NGS and Big Data Conference September 2014

National Cancer InstituteU.S. DEPARTMENT OF HEALTH AND HUMAN SERVICESNational Institutes of Health

NCI Genomics Data Commons and cloud pilots

September 2014

Page 2: FDA NGS and Big Data Conference September 2014

Overview• National Challenges in Cancer Data

• Disruptive Technologies

• NCI Genomics Data Commons

• NCI Cloud Pilots

• Building a national learning health system for cancer clinical genomics

Page 3: FDA NGS and Big Data Conference September 2014

National Challenges in Cancer Informatics

• Lowering barriers to data access, analysis and modeling for cancer research

• Integration of data and learning from basic and clinical research with cancer care that enable prediction and improved outcomes

Page 4: FDA NGS and Big Data Conference September 2014

We need:

• Open Science (Open Access, Open Data, Open Source) and Data Liquidity for the cancer community

• Semantic interoperability through CDEs and Case Report Forms mapped to standards

• Sustainable models for informatics infrastructure, services, data

Page 5: FDA NGS and Big Data Conference September 2014

Where we areDisruptive technologies

Getting social

Open access to data

Page 6: FDA NGS and Big Data Conference September 2014

Disruptive Technologies• Printing• Steam power• Transportation• Electricity• Antibiotics• Semiconductors &VLSI design• http• High throughput biology

Systems view - end of reductionism?

Page 7: FDA NGS and Big Data Conference September 2014

Disruptive Technologies• Printing• Steam power• Transportation• Electricity• Antibiotics• Semiconductors &VLSI design• http• High throughput biology• Ubiquitous computing Everyone is a data provider

Data immersion

World:6.6B active mobile contracts1.9B smart phone contracts1.1B land linesWorld population 7.1B

US:345M active mobile contracts287M smart phone contractsUS population 313M

Page 8: FDA NGS and Big Data Conference September 2014

What about social media?

• Social media may be one avenue for modifying behaviors that result in cancer

• Properly orchestrated, social media can have dramatic impact on quality of life for patients and survivors

• It can reach into all segments of our society, including underserved populations

Page 9: FDA NGS and Big Data Conference September 2014

Public Health

• These three modifiable factors - infectious disease, smoking, and poor nutrition and lack of exercise contribute to at least 50% of our current cancer burden. And the cost from loss of quality of life, pain and suffering is incalculable.

Page 10: FDA NGS and Big Data Conference September 2014

Some NCI Big Data activities

• TCGA, TARGET and ICGC

– Cancer Genomics Data Commons

– NCI Cloud Pilots

• Molecular Clinical Trials:

– MPACT, MATCH, Exceptional Responders

Page 11: FDA NGS and Big Data Conference September 2014

Data are accumulating!

Page 12: FDA NGS and Big Data Conference September 2014

From the Second Machine Age

From: The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies by Erik Brynjolfsson & Andrew McAfee

Page 13: FDA NGS and Big Data Conference September 2014

Molecular data is Big Data

• Brief trip down memory lane• Sequencing and the Human Genome

Project

Page 14: FDA NGS and Big Data Conference September 2014
Page 15: FDA NGS and Big Data Conference September 2014
Page 16: FDA NGS and Big Data Conference September 2014

GenBank High Throughput Genome Sequence (HTGS)

Page 17: FDA NGS and Big Data Conference September 2014

February 12, 2001

Page 18: FDA NGS and Big Data Conference September 2014

HGP outcomes

Page 19: FDA NGS and Big Data Conference September 2014

19

Page 20: FDA NGS and Big Data Conference September 2014

Assays and Data Types

20

Page 21: FDA NGS and Big Data Conference September 2014

TCGA history

• Initiated in 2005• Collaboration of NHGRI and NCI to

examine GBM, Lung and Ovarian cancer using genomic techniques in 2006.

• Expanded to 20+ tumor types.

Page 22: FDA NGS and Big Data Conference September 2014

TCGA drivers• Providing high quality reference sets for

20+ tissue types• Providing a platform for systems biology

and hypothesis generation• Providing a test bed for understanding the

real world implications of consent and data access policies on genomic and clinical data.

Page 23: FDA NGS and Big Data Conference September 2014

Focus on TCGA• TCGA consortium slides• Thanks to Lou Staudt and Jean Claude

Zenklusen

Page 24: FDA NGS and Big Data Conference September 2014

TCGA –Lessons from

structural genomics

Jean Claude Zenklusen, Ph.D.DirectorTCGA Program OfficeNational Cancer Institute

Page 25: FDA NGS and Big Data Conference September 2014

The Mutational Burden of Human Cancer

Mike Lawrence and Gaddy Getz

Increasing genomiccomplexity

Childhoodcancers

Carcinogens

Page 26: FDA NGS and Big Data Conference September 2014
Page 27: FDA NGS and Big Data Conference September 2014
Page 28: FDA NGS and Big Data Conference September 2014

TCGA Nature 497:67 (2013)

Molecular Subgroups Refine Histological DiagnosisOf Endometrial Carcinoma

POLE(ultra-

mutated)MSI

(hypermutated)Copy-number low

(endometriod)Copy-number high

(serous-like)

Histology

MutationsPer Mb

PolEMSI / MSH2

Copy #PTEN

p53

Serousmisdiagnosed

as endometrioid?EndometrioidSerous

Histology

Page 29: FDA NGS and Big Data Conference September 2014

TCGA Nature 497:67 (2013)

Molecular Diagnosis of Endometrial Cancer MayInfluence Choice of Therapy

POLE(ultra-

mutated)MSI

(hypermutated)Copy-number low

(endometriod)Copy-number high

(serous-like)

Histology

MutationsPer Mb

PolEMSI / MSH2

Copy #PTEN

p53

Adjuvantchemotherapy?

Adjuvantradiotherapy?

Surgery only?

Page 30: FDA NGS and Big Data Conference September 2014

GDC

NCI Cancer Genomics Data Commons

NCI GenomicsData Commons

Genomic +clinical data

. . .

Page 31: FDA NGS and Big Data Conference September 2014

GDC

NCI Cancer Genomics Data Commons

NCI GenomicsData Commons

Genomic +clinical data

. . .

Cancerinformation

donor

Page 32: FDA NGS and Big Data Conference September 2014

GDC

Identifylow-frequencycancer drivers

Define genomicdeterminants of response

to therapy

Compose clinical trialcohorts sharing

Targeted genetic lesions

Cancerinformation

donor

Utility of a Cancer Knowledge Base

Page 33: FDA NGS and Big Data Conference September 2014

DACO

ICGC

dbGaP

EGA

TCGA

BAM

Open

Open

ERA

BAM

Germ

Line

+ EGA id

BAMBAM

Page 34: FDA NGS and Big Data Conference September 2014

ICGCBAM/FASTQ

TCGABAM/FASTQ

ICGCOpenData

(includes TCGA

Open Data)

COSMICOpen Data

Page 35: FDA NGS and Big Data Conference September 2014

Driver for the Cloud Pilots• An inflection point for TCGA is looming

Gig

abyt

es (G

B)

Page 36: FDA NGS and Big Data Conference September 2014

Local copies and computation

• Assuming the 2.5 PB TCGA data set

• Storage and backups ~ $1M US

• Downloading TCGA data at 10 Gb/sec = 23 days

• Size + high dimensionality = high computational requirements that grow quickly

Page 37: FDA NGS and Big Data Conference September 2014

GDC

Relationship of the Cancer Genomics Data Commons and NCI Cloud Pilots

NCI Cloud Computational Centers

PeriodicData Freezes

Search /retrieve

AnalysisNCI GenomicsData Commons

Page 38: FDA NGS and Big Data Conference September 2014

Cancer Genomics Cloud Pilots

Page 39: FDA NGS and Big Data Conference September 2014

NCI Cloud Pilots

• Funding for up to 3 cloud pilots - 24

month pilots that are meant to inform the

Cancer Genomics Data Commons

– Explore models for cancer genomics APIs

– Explore cloud models for data+analysis

Page 40: FDA NGS and Big Data Conference September 2014

NCI Cloud Pilots• A way to move computation to the data• Sustainable models for providing access

to data• Reproducible pipelines for QA, variant

calling, knowledge sharing• Define genomics/phenomics APIs for

discovering new variants contributing to cancer, enhancing response, modulating risk

Page 41: FDA NGS and Big Data Conference September 2014

The future• Elastic computing ‘clouds’• Social networks• Big Data analytics

• Precision medicine• Measuring health• Practicing protective medicine

Semantic and synoptic data

Intervening before health is

compromised

Learning systems that enable learning from every cancer patient

Page 42: FDA NGS and Big Data Conference September 2014

Thank you

Warren A. [email protected]

Page 43: FDA NGS and Big Data Conference September 2014