EBI Industry Programme: TCGA, Warren Kibbe, November 2013
DESCRIPTION
Presentation to the EBI Industry Programme to highlight the TCGA project at the NCI
TRANSCRIPT
NCI CBIIT Re-engaged Warren Kibbe
[email protected] 240-276-7300
The views expressed are my own and not a reflection of DHHS or NCI policy
General strategic objectives
• Reduce cancer risk
• Improve cancer outcomes
• Education and dissemination of information
• Provide informative data and powerful examples
Broad strategic activities
• Understand social media as a mechanism for communication, education, and improving lifestyle choices
• Work productively with patient advocates
• Understand risk factors leading to cancer
• Model cancer initiation and progression
• Enable precision oncology
• Help build learning healthcare systems
Informatics strategic objectives
• Lower barriers to data access, analysis and modeling
• Promote agility, flexibility, data liquidity
• Promote Open Access, Open Data, Open Source, Open Science
• Promote semantic interoperability, standards, CDEs and Case Report Forms
Informatics strategic objectives
• Promote mobile and BYOD for patient reported outcomes, education, surveillance, eligibility …
• Use informatics to improve and lower barriers to clinical trials accrual
• Use informatics to blur the distinction between care and research – support clinical standards in research
• Identify and disseminate innovations and practices that make research more efficient and effective
A few specific activities
• Genomic Data Commons
• Cloud Pilot
• EVS, NCI Thesaurus, NCI Metathesaurus
• CDEs, Case Report Forms
• MPACT, MATCH, Exceptional Responders
• Integrated informatics for the cooperative groups
• FDA Clinical Trials Repository
  – Janus
  – Collaboration between the FDA and NCI
• RAS Initiative – hub at NCI Frederick
TCGA history
• Initiated in 2005
• Collaboration of NHGRI and NCI to examine GBM, Lung and Ovarian cancer using genomic techniques in 2006.
• Expanded to 20+ tumor types.
TCGA snapshot
• Data collection will complete in Q3 2014
• As of October 2013, 700 TB of data has been collated and integrated.
• 2.5 PB of data is anticipated by the end of Q3 2014
• Some tumor types are complete, others nearly complete, and still others are just getting to the point of submission
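The two volume figures above imply how much data was still expected when the slide was written. A minimal sketch of that arithmetic, assuming decimal units (1 PB = 1000 TB); the slide does not say which convention was used:

```python
# Figures taken from the slide; the unit convention is an assumption.
TB_PER_PB = 1000  # assumption: decimal units, 1 PB = 1000 TB

collated_tb = 700                  # collated and integrated as of October 2013
anticipated_tb = 2.5 * TB_PER_PB   # anticipated by the end of Q3 2014

remaining_tb = anticipated_tb - collated_tb      # data still to arrive
growth_factor = anticipated_tb / collated_tb     # final size relative to October 2013

print(f"Still to collect: {remaining_tb:.0f} TB")
print(f"Growth factor: {growth_factor:.2f}x")
```

Under that assumption, roughly 1800 TB was still to be collected, about 3.6 times the volume already in hand.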
TCGA snapshot
• Today there is a standardized analysis pipeline with standardized protocols
• Today there is a standardized consent and consenting process
• Today there is a standardized data access policy
TCGA drivers
• Providing high quality reference sets for 20+ tissue types
• Providing a platform for systems biology and hypothesis generation
• Providing a test bed for understanding the real world implications of consent and data access policies on genomic and clinical data.
Focus on the TCGA
• The TCGA consortium slides
TCGA – Lessons from structural genomics
Jean Claude Zenklusen, Ph.D. Director TCGA Program Office National Cancer Institute
Tumor Project Progress
[Figure: bar chart, axis scale 0 to 1200. Legend: Manuscript submitted or published; Analysis underway; Sample acquisition phase; Rare tumor project. Annotations: Accepting AA cases only; Goal of 500 reached]
The Mutational Burden of Human Cancer
Mike Lawrence and Gaddy Getz
[Figure labels: Increasing genomic complexity; Childhood cancers; Carcinogens]
Response of RCC to Everolimus
[Figure labels: Everolimus; Placebo; mTOR mutations; Progression-free survival (months)]
PI(3)K aberrations (28% of cases)
Frequent Activation of the PI(3)K Pathway in Clear Cell Renal Carcinoma
Motzer et al. Lancet 372:449 (2008); Hakimi et al. Nat Gen 45:849 (2013); Sato et al. Nat Gen 45:860 (2013); TCGA Nature 499:45 (2013)
TCGA Nature 497:67 (2013)
Four Molecular Subgroups of Endometrial Cancer Defined by Integrative Analysis
Subgroups: POLE (ultramutated); MSI (hypermutated); Copy-number low (endometrioid); Copy-number high (serous-like)
[Figure tracks: Histology; Mutations per Mb; PolE; MSI / MSH2; Copy #; PTEN; p53]
TCGA Nature 497:67 (2013)
Molecular Subgroups Refine Histological Diagnosis of Endometrial Carcinoma
[Same subgroup figure as the previous slide, with the histology track annotated: Endometrioid; Serous; Serous misdiagnosed as endometrioid?]
TCGA Nature 497:67 (2013)
Molecular Diagnosis of Endometrial Cancer May Influence Choice of Therapy
[Same subgroup figure as the previous slides, annotated with therapy questions: Adjuvant chemotherapy? Adjuvant radiotherapy? Surgery only?]
NCI Cancer Genomics Data Commons Functionality
[Diagram labels: NCI Genomics Data Commons; Genomic + clinical data; Cancer information donor]
[Diagram labels: DACO; ICGC; dbGaP; EGA; TCGA; ERA; BAM; Open; Germ Line + EGA id; ICGC BAM/FASTQ; TCGA BAM/FASTQ; ICGC Open Data (includes TCGA Open Data); COSMIC Open Data]
Relationship of the Cancer Genomics Data Commons and NCI Clouds
[Diagram labels: NCI Genomics Data Commons; NCI Cloud Computational Centers; Cancer Genomics Cloud Pilots; Periodic Data Freezes; Search / retrieve; Analysis]
Essential Functions of a Genomics Data Commons
• Perform data quality control
• Harmonize primary data across studies
  => realign all primary sequence data to the reference genome
• Provide “gold standard” derived data:
  => mutations / copy number / digital gene expression
Jones et al. Genome Biol. 2010;11(8):R82.
[Figure: Cancer Genome Diagnostic Report. Legend: Copy # gain; Copy # loss; Overexpressed; Under expressed; Mutated]
Essential Functions of a Genomics Data Commons
• Perform data quality control
• Harmonize primary data across studies
  => realign all primary sequence data to the reference genome
• Provide “gold standard” derived data:
  => mutations / copy number / digital gene expression
• Permit integrative analysis across data types
• Enable integrative analysis across all cancer samples
TCGA PanCan Working Group: Giovanni Ciriello, Nikolaus Schultz, Chris Sander
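The "integrative analysis across data types" function listed above can be illustrated with a small sketch that merges per-gene mutation, copy number, and expression calls into a single report, using the flag names from the diagnostic report legend. This is only an illustration under stated assumptions: the gene calls, the `integrate` helper, the copy number sign convention, and the z-score cutoff are all hypothetical and are not taken from TCGA or the GDC.

```python
from collections import defaultdict

# Hypothetical per-gene calls for a single tumor sample (not real TCGA data).
mutations = {"TP53", "PTEN"}                              # genes with a called mutation
copy_number = {"PTEN": -1, "MYC": 2}                      # assumed: <0 is loss, >0 is gain
expression_z = {"MYC": 3.1, "PTEN": -2.4, "GAPDH": 0.2}   # expression z-scores
Z_CUTOFF = 2.0                                            # hypothetical threshold


def integrate(mutations, copy_number, expression_z, z_cutoff=Z_CUTOFF):
    """Collect, per gene, the flags raised by each data type."""
    report = defaultdict(list)
    for gene in mutations:
        report[gene].append("Mutated")
    for gene, cn in copy_number.items():
        if cn > 0:
            report[gene].append("Copy # gain")
        elif cn < 0:
            report[gene].append("Copy # loss")
    for gene, z in expression_z.items():
        if z >= z_cutoff:
            report[gene].append("Overexpressed")
        elif z <= -z_cutoff:
            report[gene].append("Under expressed")
    return dict(report)


report = integrate(mutations, copy_number, expression_z)
for gene in sorted(report):
    print(gene, "->", ", ".join(report[gene]))
```

Genes flagged by more than one data type (here the hypothetical PTEN entry) are the ones that only show up when the data types are analyzed together, which is the point of harmonizing them in one place.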
Utility of a Cancer Knowledge Base
• Identify low-frequency cancer drivers
• Define genomic determinants of response to therapy
• Compose clinical trial cohorts sharing targeted genetic lesions
[Diagram label: Cancer information donor]