national cancer data ecosystem and data sharing

62
NCI Support for a National Cancer Data Ecosystem and Data Sharing Warren Kibbe, PhD [email protected] @wakibbe February 6 th , 2017

Upload: warren-kibbe

Post on 09-Feb-2017

86 views

Category:

Health & Medicine


3 download

TRANSCRIPT

NCI Support for a National Cancer Data

Ecosystem and Data SharingWarren Kibbe, PhD

[email protected]

@wakibbe

February 6th, 2017

2

To develop the knowledge base that will lessen the burden of

cancer in the United States and around the world.

NCI Mission

3

Cancer Statistics

In 2016 there were an estimated

1,700,000 new cancer cases and

600,000 cancer deaths- American Cancer Society 2015

Cancer remains the second most common cause of death in the U.S.

- Centers for Disease Control and Prevention 2015

4

Understanding Cancer

Precision medicine will lead to fundamental understanding of the complex interplay between genetics, epigenetics, nutrition, environment and clinical presentation and direct effective, evidence-based prevention and treatment.

5

Cancer is a grand challenge

Deep biological understanding

Advances in scientific methods

Advances in instrumentation

Advances in technology

Data and computation

Cancer Research and Care generatedetailed data that is critical to create a learning health system for cancer

Requires:

6

7

8

Biological Scales

10

Digital technology is changing rapidly

11

Cancer therapy is changing

12

Application of Cancer Genomics is changing

13

The Beau BidenCancer Moonshot

How do we enable meaningful, patient-centered and patient-level

data sharing for cancer and promote access to clinical trials for

all Americans?

14

Goals of the Beau Biden Cancer Moonshot

Accelerate progress in cancer, including prevention & screening From cutting edge basic research to wider uptake of standard

of care

Encourage greater cooperation and collaboration Within and between academia, government, and private sector

Enhance data sharing(Presidential Memo 2016)

15

A Few Beau Biden Cancer Moonshot Milestones• Announced by Former President Obama at the State of the Union January 12, 2016

• Blue Ribbon Panel convened at AACR, April 18, 2016

• Genomic Data Commons went public June 6, 2016

• Vice President’s Cancer Moonshot Summit – June 29, 2016

• Rethinking Clinical Trial Search – Open API at https://clinicaltrialsapi.cancer.gov • Blue Ribbon Panel recommendations – accepted by the National Cancer Advisory

Board on September 7th, 2016

• Cancer Moonshot Task Force and BRP recommendations sent to President on October 17th, 2016 https://www.cancer.gov/research/key-initiatives/moonshot-cancer-initiative/milestones and released at https://cancer.gov/brp

• 21st Century Cures Act funding the Beau Biden Cancer Moonshot passed 94-5 by the Senate on December 8 and signed by Former President Obama December 13, 2016.

16https://cancer.gov/brp

Blue Ribbon Panel Recommendations

Network for Direct Patient Engagement Cancer Immunotherapy Translational Science Network Therapeutic Target Identification to Overcome Drug Resistance A National Cancer Data Ecosystem for Sharing and Analysis Fusion Oncoproteins in Childhood Cancers Symptom Management Research Prevention and Early Detection – Implementation of Evidence-based Approaches Retrospective Analysis of Biospecimens from Patients Treated with Standard of Care Generation of 4D Human Tumor Atlas Development of New Enabling Cancer Technologies

18

Relationship Between Bypass Budget and Blue Ribbon Panel Report Bypass Budget addresses NCI’s

entire research portfolio

Lays out the plan for NCI’s continued investment in cancer research

Cancer Moonshot is a unique opportunity to enhance cancer research in specific areas that are poised for acceleration

The BRP report made 10 bold, yet feasible, recommendations that will fast-track initiatives if infused with Moonshot funding

Cancer Research Data Ecosystem

@wakibbe

20

Changing the conversation around data sharing

How do we find data, software, standards? How can we make data, annotations, software, metadata accessible? How do we increase data reuse? Reproducibility? How do we make more data machine readable?

NIH Data CommonsNCI Genomic Data Commons

National Cancer Data Ecosystem

Data Commons co-locate data, storage and computing infrastructure, and frequently used tools for analyzing and sharing data to create an

interoperable resource for the research community.*Robert L. Grossman, Allison Heath, Mark Murphy, Maria Patterson, A Case for Data Commons Towards Data Science as a Service, to appear. Source of image: Interior of one of Google’s Data Center, www.google.com/about/datacenters/.

Cancer Research Data Ecosystem – Cancer Moonshot BRP

Well characterized research data

sets

Cancer cohorts Patient data

EHR, Lab Data, Imaging, PROs, Smart Devices,

Decision Support

Learning from everycancer patient

Active researchparticipation

Research informationdonor

Clinical ResearchObservational studies

ProteogenomicsImaging dataClinical trials

Discovery Patient engaged Research

SurveillanceBig Data

Implementation research

SEERGDC

22

Principles for the Cancer Research Data Ecosystem

• Open Science. Supporting Open Access, Open Data, Open Source, and Data Liquidity for the cancer community

• Standardization through CDEs and Case Report Forms

• Interoperability by exposing existing knowledge through appropriate integration of ontologies, vocabularies and taxonomies

• Consistency of curation and capture of evidence (tissue types, recurrence, therapy – including genomic, epigenomic, etc context)

• Sustainable models for informatics infrastructure, services, data, metadata

23

Cancer Data Sharing& Data Commons

• Support open science • Support data reusability• Aligned with Cancer Moonshot• Part of Precision Medicine• Reduce Health Disparities• Improve patient access to clinical

trials• Work toward a learning National

Cancer Data Ecosystem

Reduce the risk, improve early detection, outcomes and survivorship in cancer

24

Data Commons Structure

Data Contributors and Consumers

Researchers PatientsCliniciansInstitutions

NCI ThesauruscaDSR

NLM UMLSRxNormLOINC

SNOMED

25

The Cancer Genomic Data Commons (GDC) is an existing effort to standardize and simplify submission of genomic data to NCI and follow the principles of FAIR – Findable, Accessible, Attributable, Interoperable, Reusable, and Provide Recognition.

The GDC is part of the NIH Big Data to Knowledge (BD2K) initiative and an example of the NIH Commons

Genomic Data Commons

Microattribution, nanopublications, tracking the use of data, annotation of data, use of algorithms, supports

the data /software /metadata life cycle to provide credit and analyze impact of data, software, analytics,

algorithm, curation and knowledge sharing

Force11 white paperhttps://www.force11.org/group/fairgroup/fairprinciples

NCI Genomic Data Commons The GDC went live on June 6, 2016 with approximately 4.1 PB of data. This includes:

2.6 PB of legacy data

1.5 PB of “harmonized” data

577,878 files about 14194 cases (patients), in 42 cancer types, across 29 primary sites.

10 major data types, ranging from Raw Sequencing Data, Raw Microarray Data, to Copy Number Variation, Simple Nucleotide Variation and Gene Expression.

Data are derived from 17 different experimental strategies, with the major ones being RNA-Seq, WXS, WGS, miRNA-Seq, Genotyping Array and Expression Array.

Foundation Medicine announced the release of 18,000 genomic profiles to the GDC at the Cancer Moonshot Summit.

Genomic Data Commons Data Portal

GDC Content

GDC

TCGA11,353 cases TARGET 3,178 cases

Current

Foundation Medicine 18,000 cases Cancer studies in dbGAP ~4,000 cases

Coming soon

NCI-MATCH ~5,000 cases Clinical Trial Sequencing Program ~3,000 cases

Planned (1-3 years)

Cancer Driver Discovery Program ~5,000 cases Human Cancer Model Initiative ~1,000 cases APOLLO – VA-DoD ~8,000

cases

~58,000 cases

Exome-seq

Whole genome-seq

RNA-seq

Copy number

Genomealignment

Genomealignment

Genomealignment

Datasegmentation

1° processing

Mutations

Mutations +structural variants

Digital geneexpression

Copy numbercalls

2° processingOncogene vs.

Tumor suppressor

Translocations

Relative RNA levelsAlternative splicing

Gene amplification/ deletion

3° processing

GDC Data HarmonizationMultiple data types and levels of processing

Mutect2pipeline

GDC Data HarmonizationOpen Source, Dockerized Pipelines

Recoveryrate

(% true positives) A0F0

SomaticSniper 81.1% 76.5%VarScan 93.9% 84.3%

MuSE 93.1% 87.3%All Three 96.4% 91.2%

GDC variant callingpipelinesWash UBaylorBroad

GDC Data HarmonizationMultiple pipelines needed to recover all variants

Development of the NCI Genomic Data Commons (GDC)To Foster the Molecular Diagnosis and Treatment of Cancer

GDC

Bob Grossman PIUniv. of Chicago

Ontario Inst. Cancer Res.Leidos

Institute of MedicineTowards Precision Medicine

2011

Discovery of Cancer Drivers With 2% Prevalence

Lung adeno.+ 2,900

Colorectal+ 1,200

Ovarian+ 500

Lawrence et al, Nature 2014

Power Calculation for Cancer Driver Discovery

Need to resequence >100,000 tumors to identify all cancer drivers at >2% prevalence

What Makes GDC Special? Stores raw genomic data, allowing continuous reanalysis as

computation methods and genome annotations improve

NCI commitment to maintain long-term storage of cancer genomic data in the GDC with free access to researchers

Utilizes shared bioinformatic pipelines to facilitate cross-study comparisons and integrated analysis of multiple data types

Maintains harmonized clinical data in a highly structured and extensible schema

Enables researchers to comply with the NIH Genomic Data Sharing policy as well as journal requirements for data sharing

GDC The explanatory power of data in the GDC will grow over time as

it accrues more cases => GDC will promote precision oncology

Other Cancer Data Sharing EffortsSignature Efforts Data

BRCA ChallengeSomatic variant sharing

Isolated genetic variantsNo raw sequencing data

Precision medicine questionsSomatic variant sharing

Panel gene resequencingClinical response

Clinical trialPublic-private partnerships

Comprehensive genomicsDetailed clinical

phenotype data

Clinical trial accessClinical/genomic data aggregation

EHR dataClinical sequencing

Clinical oncology standardsEHR dataClinical sequencing

GDC

Utility of a Cancer Knowledge System

Identifylow-frequencycancer drivers

Define genomicdeterminants of response

to therapy

Compose clinical trialcohorts sharing

targeted genetic lesions

Cancerinformation

donors

GenomicData

Commons

Towards a Cancer Knowledge System Continue genomic investigations of cancer

• Need > 100,000 cases analyzed• Embrace all genomic platforms• Relationship of relapse and primary biopsies

Incorporate associated clinical annotations• Clinical trial data• Observational, longitudinal standard-of-care data• N-of-1 clinical data

Promote and curate biological investigations of cancer genetic variants• Driver vs. passenger mutations• Multiple phenotypic assays• Alterations in regulatory pathways – proteomics• Mechanisms of therapeutic resistance• Functional genomic investigations

Integrative models for high-dimensional data

GenomicData

Commons

40

Support the Precision Medicine Initiative

• Expand data model to include other data (e.g. imaging and proteomics)

• Allow easy publication of persistent links to data, annotations, algorithms, tools, workflows

• Measure usage and impact

• Change incentives for public contributions

The Genomic Data Commons and Cloud Pilots

41

PMI – Oncology, the GDC and the Cloud Pilots Goals

Support precision medicine-focused clinical research Enable researchers to deposit well-annotated

(Interoperable) genomic data sets with the GDC Provide a single source (and single dbGaP access

request!) to Find and Access these data Enable effective analysis and meta-analysis of these data

without requiring local downloads – data Reuse Understand Contributions, Assess value through usage,

and give Attribution to all users

42

PMI – Oncology, the GDC and the Cloud Pilots Goals

Provide a data integration platform to allow multiple data types, multi-scalar data, temporal data from cancer models and patients through open APIs Work with the Global Alliance for Genomics and Health

(GA4GH) to define the next generation of secure, flexible, meaningful, interoperable, lightweight interfaces – open APIs

Engage the cancer research community in evaluating the open APIs for ease of use and effectiveness

GDC AcknowledgementsNCI Center for Cancer Genomics Univ. of Chicago

Bob GrossmanAllison Heath

Mike FordZhenyu Zhang

Ontario Institute for Cancer Research

Lou StaudtZhining Wang

Martin FergusonJC Zenklusen

Daniela GerhardDeb Steverson

Vincent Ferretti'Francois Gerthoffert

JunJun Zhang

Leidos Biomedical ResearchMark Jensen

Sharon GaheenHimanso Sahni

NCI NCI CBIITTony KerlavageTanya Davidsen

CGC Pilot Team Principal Investigators • Gad Getz, Ph.D - Broad Institute - http://firecloud.org • Ilya Shmulevich, Ph.D - ISB - http://cgc.systemsbiology.net/ • Deniz Kural, Ph.D - Seven Bridges – http://www.cancergenomicscloud.org

NCI Project Officer & CORs• Anthony Kerlavage, Ph.D –Project Officer• Juli Klemm, Ph.D – COR, Broad Institute• Tanja Davidsen, Ph.D – COR, Institute for Systems Biology • Ishwar Chandramouliswaran, MS, MBA – COR, Seven Bridges Genomics

GDC Principal Investigator• Robert Grossman, Ph.D - University of Chicago• Allison Heath, Ph.D - University of Chicago• Vincent Ferretti, Ph.D - Ontario Institute for Cancer Research

Cancer Genomics Project Teams

NCI Leadership Team• Doug Lowy, M.D.• Lou Staudt, M.D., Ph.D.• Stephen Chanock, M.D.• George Komatsoulis, Ph.D.• Warren Kibbe, Ph.D.

Center for Cancer Genomics Partners• JC Zenklusen, Ph.D.• Daniela Gerhard, Ph.D.• Zhining Wang, Ph.D.• Liming Yang, Ph.D.• Martin Ferguson, Ph.D.

Cancer Research Data Ecosystem – Cancer Moonshot BRP

Well characterized research data

sets

Cancer cohorts Patient data

EHR, Lab Data, Imaging, PROs, Smart Devices,

Decision Support

Learning from everycancer patient

Active researchparticipation

Research informationdonor

Clinical ResearchObservational studies

ProteogenomicsImaging dataClinical trials

Discovery Patient engaged Research

SurveillanceBig Data

Implementation research

SEERGDC

46

Data Commons Structure

Data Contributors and Consumers

Researchers PatientsCliniciansInstitutions

NCI ThesauruscaDSR

NLM UMLSRxNormLOINC

SNOMED

47

Questions?

Warren Kibbe, Ph.D.

[email protected]

@wakibbe

48

Rethinking Cancer Clinical Trials Search

for patients and providers

Rethinking Clinical Trials Search

Engaging the Presidential Innovation Fellows Create an Application Programming Interface (API) for Clinical Trials Create an example search interface based on the API Create a twitter feed for all new clinical trials Incorporation of these innovations into cancer.gov

2/6/17

50

Rethinking and Enhancing Clinical Trial Search: June, 2016• Initial Release of an API (Application Programming Interface) (API)1, developed by the

Presidential Innovation Fellows, for testing. This tool, found at https://clinicaltrialsapi.cancer.gov, makes publicly available trial registration information from the CTRP database, currently found on cancer.gov, assessable to third-party innovators so that they can build new digital tools tailored to the clinical trial search needs of their users.

• Launch of @NCICancerTrials on Twitter and dissemination of clinical trial information via GovDelivery: https://public.govdelivery.com/accounts/USNIHNCI/subscriber/new

• Changes the Cancer.gov Website to enhance clinical trial searching

1A set of protocols designed to provide communication between a software application and a computer operating system or between applications.

Rethinking Clinical Trial Search – Next Steps

• Cancer.gov

- Work with the CTAC Clinical Trials Informatics Working Group (CTIWG) on the design on a “front end” to the API for use on the Cancer.gov website.- This will allow search and retrieval of information that is currently available on

Cancer.gov directly from NCI’s Clinical Trials Reporting Program

- The CTIWG will provide input regarding design and usability of the Cancer.gov website, as well as:- Prioritization of requested enhancements (e.g., structured eligibility criteria)

• Other websites and/or providers of clinical trial search

- Test API and use publicly assessable CTRP data for use in their systems.

Clinical Trials Search API https://clinicaltrialsapi.cancer.gov

2/6/17

https://clinicaltrialsapi.cancer.gov/clinical-trial/NCI-2014-01509

2/6/17

2/6/17

2/6/17

2/6/17

2/6/17

59

60

Questions?

Warren Kibbe, Ph.D.

[email protected]

@wakibbe

www.cancer.gov www.cancer.gov/espanol

62

NIH Genomic Data Sharing Policy

https://gds.nih.gov/ Went into effect January 25, 2015

NCI guidance:http://

www.cancer.gov/grants-training/grants-management/nci-policies/genomic-data

Requires public sharing of genomic data sets