The National COVID Cohort Collaborative (N3C): Let’s Get Involved ! Warren A. Kibbe, PhD, FACMI June 15, 2021 Purdue Big Data in Cancer Workshop @data2health @ncats_nih_gov covid.cd2h.org ncats.nih.gov/n3c @wakibbe


The National COVID Cohort Collaborative (N3C): Let’s Get Involved!

Warren A. Kibbe, PhD, FACMI

June 15, 2021

Purdue Big Data in Cancer Workshop

@data2health @ncats_nih_gov

covid.cd2h.org ncats.nih.gov/n3c @wakibbe

Speaker Objectives

Warren Kibbe

Duke Biostatistics & Bioinformatics

CTSA Informatics

Duke Cancer Institute

Member N3C

● Real World Data

● Open Science

● Overview of N3C

● N3C Data Enclave statistics

● How common data models and variables are harmonized

● The scope of answerable questions

● Data access and security


● Oncology research in N3C

A program of NIH’s National Center for Advancing Translational Sciences

Special thanks to:

● Chris Chute, N3C, Johns Hopkins

● Melissa Haendel, N3C, Colorado University

● Umit Topaloglu, N3C, Wake Forest

● Frank Rockhold, Duke

● Noha Sharafeldin, N3C, UAB


Take homes

• N3C represents a unique resource to examine effects of COVID-19 on cancer outcomes

• Largest COVID-19 and cancer cohort within the US

• Consistent with previous literature, older age, male gender, increasing comorbidities, and hematological malignancies were associated with higher mortality in patients with cancer and COVID-19

• The N3C dataset confirmed that cancer patients with COVID-19 who received recent immunotherapy or targeted therapy were not at higher risk of overall mortality

What is Real World Data?

Collected in the context of patient care. Real World Data was called out as part of the 21st Century Cures Act.

21st Century Cures Act: https://www.fda.gov/regulatory-information/selected-amendments-fdc-act/21st-century-cures-act

Graphic from HealthCatalyst: https://www.healthcatalyst.com/insights/real-world-data-chief-driver-drug-development

Our ability to generate biomedical data continues to grow in terms of variety and volume

Current sources of data

molecular genome pathology imaging labs notes sensors

icons by the Noun Project

AI is changing our ability to go both deep and broad

Trustworthy AI

Provenance

Reusable

Reproducible

Having a health equity lens

● Digital Health, precision medicine, and real world data all have the power to transform healthcare. However, we must pay attention to structural racism and implicit bias if we want to achieve equity.

21st Century Cures Act

Last year I discussed the NCI Cancer Moonshot and Precision Medicine activities funded under the 21st Century Cures Act

FDA was directed by Congress to focus on the use of real world data (RWD) and real world evidence (RWE) in drug design, development and outcomes assessment

https://www.fda.gov/regulatory-information/selected-amendments-fdc-act/21st-century-cures-act

Is it just about Real World Data?

What about Open Science? Data transparency? Data Access?

The importance of Open Science

Calls for greater transparency and ‘open data access’ in clinical research continue.

● “Open science is the movement to make scientific research, data and dissemination accessible to all levels of an inquiring society”*

● Open Science Project**: “If we want open science to flourish, we should raise our expectations to: Work. Finish. Publish. Release.”

● FAIR Principles: Findability, Accessibility, Interoperability, and Reusability***

● TRUST Principles: Transparency, Responsibility, User focus, Sustainability and Technology****

* https://www.fosteropenscience.eu/resources

** http://openscience.org/

*** https://www.nature.com/articles/sdata201618

**** https://www.nature.com/articles/s41597-020-0486-7

Open Science and Patient Data Access

Some of the challenges are:

● Patient privacy

● Academic credit

● Commercial sensitivity and intellectual property

● Data standards

● Resources (money and people)

There should be room for researchers and patients alike to gain from this effort.

Informatics experts and data scientists are essential elements of this discussion.

One problem with Clinical Trials Data Sharing

● “The tendency for researchers to ‘sit’ on their data for an unduly long period of time is neither desirable from a scientific point of view nor acceptable from an ethical perspective.”

● “After all, the data belong to the patients who agreed to participate in the research, not to the investigators who coordinated it, as the new European General Data Protection Regulation emphasizes.”*

*Rockhold, F, et al. Open science: The open clinical trials data journey, Clinical Trials, Vol 16 (5) 1-8, 2019

Access to patient-level data is important for research

There are certainly challenges, but the question is not whether data should be shared, but rather how and when access should be granted.

Responsible open access enables secondary analyses that:

● Enhance reproducibility of clinical research

● Honor the contributions of trial participants

● Improve the design of future trials

● Generate new research findings

This journey of making patient data available is part of an evolution in transparency and not a sudden awakening.

What about N3C?

It is an open science, controlled access environment

Clinical and Translational Science Awards (CTSA) Program

The pandemic highlights urgent needs

● Algorithms (diagnosis, triage, predictive, etc.)

● Drug discovery & pharmacogenetics

● Multimodal analytics (EHR, imaging, genomics)

● Interventions that reduce disease severity

● Best practices for resource allocation

● Coordinated research efforts to maximize efficiency and reproducibility

These all require the creation of a comprehensive clinical data set

What Kinds of Questions Can N3C Address?

The scope and scale of the information in the platform will support probing questions such as:

● What social determinants of health are risk factors for mortality?

● Do some therapies work better than others? By region? By demographics?

● Can we compare local rare clinical observations with national occurrences?

● Can we predict who might have severe outcomes if they have COVID-19?

● What factors will predict the effectiveness of vaccines?

● Can we predict acute kidney injury in COVID-19 patients?

● Who might need a ventilator because of lung failure?

Cohort characterization objectives

To clinically characterize the N3C cohort

● Largest U.S. COVID-19 cohort to date (+ representative controls)

● Racially, ethnically, and geographically diverse

To develop and share validated, versioned OMOP representations of common variables (labs, vital signs, medications, treatments)

To generate hypotheses to be tested within N3C and elsewhere

● Clinical phenotypes and trajectories

● Treatment patterns and response

● … and many others


Benefits for Participation

● Access to large scale COVID-19 data from across the nation

● Pilot data for grant proposals

● Opportunities for KL2 and TL1 and other scholars

● Team science opportunities for new questions and access to Teams, statistics, machine learning (ML), informatics expertise

● Learn ML analytics, NLP methods & access to tools, software, additional datasets

Who is in the N3C? The N3C Computable Phenotype

● At a high level, our phenotype looks for patients:

○ With a positive COVID-19 test (PCR or antibody) OR

○ With an ICD-10-CM code of U07.1 OR

○ Two or more COVID-like diagnosis codes (ARDS, pneumonia, etc.) during the same encounter, but only on or prior to 5/1/2020

● Each one of these patients is then demographically matched to two patients with negative or equivocal COVID-19 tests.

● Each site securely sends this set of patients, along with their longitudinal EHR data from 1/1/2018 to the present, to the N3C on a regular basis.
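The three inclusion rules above can be sketched in a few lines of Python. This is only an illustration of the logic as stated on the slide, not the phenotype scripts that sites actually run: the `Encounter` structure, the `meets_phenotype` function, and the short `COVID_LIKE_CODES` set are hypothetical placeholders (the real list of COVID-like codes is maintained by N3C).

```python
from dataclasses import dataclass, field
from datetime import date

# From the slide: COVID-like diagnosis codes only count on or before 5/1/2020.
COVID_LIKE_CUTOFF = date(2020, 5, 1)

# Illustrative placeholder only (ARDS and two pneumonia codes);
# the real code list is defined by N3C, not here.
COVID_LIKE_CODES = {"J80", "J12.89", "J18.9"}


@dataclass
class Encounter:
    """Hypothetical, simplified view of one encounter in a site's EHR."""
    encounter_date: date
    diagnosis_codes: set = field(default_factory=set)  # ICD-10-CM codes
    positive_covid_test: bool = False  # PCR or antibody


def meets_phenotype(encounters):
    """Return True if any encounter satisfies one of the three rules."""
    for enc in encounters:
        # Rule 1: positive COVID-19 test
        if enc.positive_covid_test:
            return True
        # Rule 2: ICD-10-CM code U07.1
        if "U07.1" in enc.diagnosis_codes:
            return True
        # Rule 3: two or more COVID-like codes in the same encounter,
        # only on or before the cutoff date
        covid_like = enc.diagnosis_codes & COVID_LIKE_CODES
        if len(covid_like) >= 2 and enc.encounter_date <= COVID_LIKE_CUTOFF:
            return True
    return False
```

Under these assumptions, an encounter on 4/1/2020 with both an ARDS and a pneumonia code qualifies, while the same codes on 6/1/2020 do not.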

Matching algorithm

Status          Age  Gender  Race   Ethnicity
COVID Positive  47   M       Black  Unknown
COVID Negative  49   M       Black  Hispanic/Latino
COVID Negative  46   M       Black  Not Hispanic
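A minimal sketch of how such a demographic match could work is shown below. The slide does not give the actual matching rules, so everything here is an assumption made for illustration: the `Patient` class, the `match_controls` function, exact matching on gender and race, and the 5-year age tolerance are all hypothetical. Ethnicity is left out of the match only because the example above pairs patients whose ethnicity differs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    """Hypothetical minimal demographic record."""
    patient_id: str
    age: int
    gender: str
    race: str


def match_controls(case, candidates, n_controls=2, max_age_gap=5):
    """Pick up to n_controls COVID-negative patients for one positive case.

    Assumed rule: same gender and race, age within max_age_gap years,
    closest age first (patient_id breaks ties so the result is stable).
    """
    eligible = [
        c for c in candidates
        if c.gender == case.gender
        and c.race == case.race
        and abs(c.age - case.age) <= max_age_gap
    ]
    eligible.sort(key=lambda c: (abs(c.age - case.age), c.patient_id))
    return eligible[:n_controls]
```

With the 47-year-old case above and a candidate pool containing the 49- and 46-year-old patients plus a 60-year-old and a patient of a different gender, this sketch returns the 46- and 49-year-old as the two controls.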

N3C Timeline

N3C Dashboard

covid.cd2h.org/dashboard

55 sites with data released (purple) and 37 sites with data pending (open circle). OCHIN is a national network of 131 sites (diamond).

covid.cd2h.org/teams

31 Domain teams!

As of June 14, 2021

https://ncats.nih.gov/n3c/resources/data-contribution/data-transfer-agreement-signatories

Data Transfer Agreement Signatories

6/14/2021

88 DTA Signatories

Northwestern University at Chicago ᛫ Tufts Medical Center ᛫ Advocate Health Care Network ᛫ University of Alabama at Birmingham ᛫ Oregon Health & Science University ᛫ University of Washington ᛫ Stanford University ᛫ The University of Michigan at Ann Arbor ᛫ Children's Hospital Colorado ᛫ Duke University ᛫ Medical College of Wisconsin ᛫ The Ohio State University ᛫ University of Nebraska Medical Center ᛫ University of Arkansas for Medical Sciences ᛫ George Washington University ᛫ Johns Hopkins University ᛫ West Virginia University ᛫ Medical University of South Carolina ᛫ University of North Carolina at Chapel Hill ᛫ University of Virginia ᛫ The University of Texas Medical Branch at Galveston ᛫ University of Minnesota ᛫ University of Cincinnati ᛫ Columbia University Irving Medical Center ᛫ Cincinnati Children's Hospital Medical Center ᛫ Rush University Medical Center ᛫ Nemours ᛫ University of Wisconsin-Madison ᛫ The State University of New York at Buffalo ᛫ Washington University in St. Louis ᛫ University of Rochester ᛫ The University of Chicago ᛫ University of Miami ᛫ The Scripps Research Institute ᛫ University of Texas Health Science Center at San Antonio ᛫ University of Kentucky ᛫ University of Illinois at Chicago ᛫ Virginia Commonwealth University ᛫ Weill Medical College of Cornell University ᛫ Carilion Clinic ᛫ University Medical Center New Orleans ᛫ The University of Iowa ᛫ Emory University ᛫ Maine Medical Center ᛫ The University of Texas Health Science Center at Houston ᛫ Boston University Medical Campus ᛫ The University of Utah ᛫ University of Southern California ᛫ George Washington Children's Research Institute ᛫ University of Colorado Denver | Anschutz Medical Campus ᛫ Mayo Clinic Rochester ᛫ The Rockefeller University ᛫ Montefiore Medical Center ᛫ University of Mississippi Medical Center ᛫ University of Oklahoma Health Sciences Center, Board of Regents ᛫ University of Massachusetts Medical School Worcester ᛫ Aurora Health Care ᛫ Penn State ᛫ University of New Mexico Health Sciences Center ᛫ NorthShore University HealthSystem ᛫ Wake Forest University Health Sciences ᛫ Vanderbilt University Medical Center ᛫ Regenstrief Institute ᛫ Brown University ᛫ Stony Brook University ᛫ University of California, Davis ᛫ Yale New Haven Hospital ᛫ Rutgers, The State University of New Jersey ᛫ MedStar Health Research Institute ᛫ Loyola University Chicago ᛫ Loyola University Medical Center ᛫ University of Delaware ᛫ Children's Hospital of Philadelphia

N3C Enclave Data Stats

Pediatric cases


Predicting Clinical Severity using machine learning (64 input variables)

The most powerful predictors are patient age and widely available vital sign and laboratory values.

The National COVID Cohort Collaborative: Clinical Characterization and Early Severity Prediction

https://pubmed.ncbi.nlm.nih.gov/33469592/

How does data get into N3C?

● We have gone through the high-level purpose: EHR data about COVID-19 patients

● Identified the contributing sites

● Know what the inclusion criteria for N3C are: documented COVID-19 testing

● Seen the dashboard overview of N3C and the overall cohort characteristics

● What are the data ingestion, harmonization, query, and publication processes?

● Data governance and security?

● And finally, what about cancer and COVID-19?

Leveraging Common Data Models

● These four data models are commonly used by academic medical centers throughout the US.

● CDMs are used to store EHR data in a consistent way.

● Sites participating in N3C may send data in one of these four formats—the idea is to make it as convenient as possible for sites to submit.

● Common data models also allow us to write a consistent computable phenotype that can be run with few local changes at sites with one or more of these data models.

Harmonization of N3C Data

Data Availability vs Utility

● Collections of data are not always useful

● Even if they are available

● Consistently classified data is always more useful

FAIR: Findable, Accessible, Interoperable, Reusable

What does Interoperable mean with respect to data? Harmonized!

Syntactic Interoperability (harmonization)

● One can make sense of the structure

● Metaphor: sentence has good grammar

● Domain of the data standards and data model communities

Semantic interoperability (harmonization)

● One can make sense of the meaning

● Metaphor: the words are understandable

● Domain of the vocabulary, ontology, classification communities
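The two layers can be illustrated with a single record conversion. The sketch below is an assumption-laden example rather than N3C's pipeline: it takes a demographic row laid out in the style of a PCORnet table (fields `PATID`, `BIRTH_DATE`, `SEX`, with an ISO-format date assumed) and produces a row in the style of the OMOP `person` table. The function name `to_omop_person` and the dict layout are hypothetical; 8507 and 8532 are the standard OMOP concept IDs for male and female.

```python
# OMOP standard gender concepts: 8507 = MALE, 8532 = FEMALE.
# 0 is the OMOP convention for "no matching concept".
SEX_TO_OMOP_CONCEPT = {"M": 8507, "F": 8532}


def to_omop_person(source_row):
    """Convert one PCORnet-style demographic row (a dict) to an OMOP-style row."""
    return {
        # Syntactic harmonization: same facts, restructured into the
        # target model's field names and types.
        "person_id": source_row["PATID"],
        "year_of_birth": int(source_row["BIRTH_DATE"][:4]),
        # Semantic harmonization: the source code "M"/"F" is mapped to a
        # shared vocabulary concept so every site means the same thing.
        "gender_concept_id": SEX_TO_OMOP_CONCEPT.get(source_row["SEX"], 0),
        # Keep the original value for provenance.
        "gender_source_value": source_row["SEX"],
    }
```

Renaming fields alone would give good grammar with words nobody else understands; the vocabulary lookup is what makes the value comparable across sites.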

N3C Data Ingestion & Harmonization Pipeline

(future)

Span manual curation of mapping resources to industrial scale production transformation

Harmonized, not Homogeneous

CDMs are built for purpose. Different CDMs emphasize and prioritize different things.

Secure, reproducible, transparent, versioned, provenanced, attributed, and shareable analytics on patient-level EHR data

Collaborative Analytics - N3C Secure Data Enclave

Federated versus Centralized DQ

Many clinical data research networks are federated; N3C is centralized. Centralized datasets have some advantages where data quality assessment is concerned.

Federated Network vs. Centralized Data

Centralized data: questions asked directly against all sites’ data combined

Federated versus Centralized DQ

With federated data, sites are benchmarked against themselves.

With centralized data, sites can be benchmarked against each other.

Site 1: “We have 43 qualifying inpatient visits.”

Site 2: “We have 27 qualifying inpatient visits.”

Site 3: “We have 806 qualifying inpatient visits.”

Site  Patient  Visit Type  Adm. Date  Disc. Date
1     123      IP          7/4/2020   7/8/2020
1     456      IP          5/6/2020   5/20/2020
2     987      IP          8/2/2019   8/7/2019
2     654      IP          9/3/2019   9/14/2019
3     234      IP          1/26/2021  1/26/2021
3     234      IP          1/26/2021  1/29/2021
3     234      IP          1/26/2021  1/30/2021
3     234      IP          1/26/2021  1/27/2021

Clearly, sites differ in how they define “a visit.”
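One way to see the benefit of centralization is to compute the same metric for every site and compare. The sketch below uses the (site, patient) pairs from the table above and flags sites whose visits-per-patient ratio is far above the typical site. The function names and the "more than twice the median" threshold are assumptions chosen for illustration, not N3C's actual DQ checks.

```python
from collections import defaultdict
from statistics import median

# (site, patient) pairs, one per inpatient visit row in the table above
VISITS = [
    (1, 123), (1, 456),
    (2, 987), (2, 654),
    (3, 234), (3, 234), (3, 234), (3, 234),
]


def visits_per_patient(visits):
    """Average number of visit rows per distinct patient, by site."""
    visit_counts = defaultdict(int)
    patients = defaultdict(set)
    for site, patient in visits:
        visit_counts[site] += 1
        patients[site].add(patient)
    return {site: visit_counts[site] / len(patients[site]) for site in visit_counts}


def flag_outlier_sites(ratios, factor=2.0):
    """Sites whose ratio exceeds `factor` times the median site (assumed rule)."""
    typical = median(ratios.values())
    return sorted(site for site, ratio in ratios.items() if ratio > factor * typical)


ratios = visits_per_patient(VISITS)
outliers = flag_outlier_sites(ratios)
```

On this toy data, Sites 1 and 2 average one visit per patient while Site 3 averages four, so only Site 3 is flagged. A federated network could not make this comparison because no one sees all three sites' rows together.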

N3C’s DQ Process

How Would N3C Deal with This Finding?

● Discover and discuss at weekly DQ meetings.

● Determine: Is this an issue…

○ For the site to fix?

○ For us to handle on our end?

● Reach out to the site to get more information.

○ What if they can’t fix it?


We can write an algorithm to make this site’s visits look more like the other sites:

if:

● the visit type is inpatient

● and there are > 1 per patient per day

then:

● merge into a single “macro” visit
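The rule above can be sketched as follows. This is a simplified reading of the slide, not N3C's production logic: visits are plain tuples, "more than one per patient per day" is interpreted as inpatient visits sharing the same site, patient, and admission date, and the merged macro visit keeps the latest discharge date. The function name `merge_macro_visits` is hypothetical.

```python
from collections import defaultdict
from datetime import date


def merge_macro_visits(visits):
    """Collapse same-day inpatient visits per patient into one macro visit.

    Each visit is a tuple: (site, patient, visit_type, admit_date, discharge_date).
    Non-inpatient visits are passed through unchanged.
    """
    groups = defaultdict(list)
    passthrough = []
    for visit in visits:
        site, patient, visit_type, admit, _discharge = visit
        if visit_type == "IP":
            groups[(site, patient, admit)].append(visit)
        else:
            passthrough.append(visit)

    merged = []
    for (site, patient, admit), group in groups.items():
        # The macro visit spans from the shared admission date
        # to the latest discharge date in the group.
        latest_discharge = max(v[4] for v in group)
        merged.append((site, patient, "IP", admit, latest_discharge))
    return sorted(merged + passthrough)


visits = [
    (1, 123, "IP", date(2020, 7, 4), date(2020, 7, 8)),
    (1, 456, "IP", date(2020, 5, 6), date(2020, 5, 20)),
    (2, 987, "IP", date(2019, 8, 2), date(2019, 8, 7)),
    (2, 654, "IP", date(2019, 9, 3), date(2019, 9, 14)),
    (3, 234, "IP", date(2021, 1, 26), date(2021, 1, 26)),
    (3, 234, "IP", date(2021, 1, 26), date(2021, 1, 29)),
    (3, 234, "IP", date(2021, 1, 26), date(2021, 1, 30)),
    (3, 234, "IP", date(2021, 1, 26), date(2021, 1, 27)),
]
analysis_ready = merge_macro_visits(visits)
```

Applied to the eight rows shown for Sites 1 to 3, this leaves five rows, with patient 234 represented by a single visit from 1/26/2021 to 1/30/2021. The input list is not modified, which fits the idea that transformations stay reversible.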


N3C’s DQ Process

Original Table

Site  Patient  Visit Type  Adm. Date  Disc. Date
1     123      IP          7/4/2020   7/8/2020
1     456      IP          5/6/2020   5/20/2020
2     987      IP          8/2/2019   8/7/2019
2     654      IP          9/3/2019   9/14/2019
3     234      IP          1/26/2021  1/26/2021
3     234      IP          1/26/2021  1/29/2021
3     234      IP          1/26/2021  1/30/2021
3     234      IP          1/26/2021  1/27/2021

DQ fix

Ready for Analysis

Site  Patient  Visit Type  Adm. Date  Disc. Date
1     123      IP          7/4/2020   7/8/2020
1     456      IP          5/6/2020   5/20/2020
2     987      IP          8/2/2019   8/7/2019
2     654      IP          9/3/2019   9/14/2019
3     234      IP          1/26/2021  1/30/2021

Takeaways

● Centralized DQ processes allow us to fully realize the potential of N3C’s large sample size.

● All transformations are fully logged and always completely reversible if needed.

N3C Data Ingestion & Harmonization Pipeline

(future)

Harmonizing numeric data

● Problem: Different sites provide their data in different units

● Solution: Harmonize each to a standard unit

Kilograms = Pounds / 2.20462

Kilograms = Ounces / 35.274

Kilograms = Grams / 1000
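The three conversions above fit naturally into a lookup table of divisors. The sketch below is a minimal illustration for body weight only; the unit labels (`"lb"`, `"oz"`, `"g"`, `"kg"`) and the function name `to_kilograms` are assumptions, since each site may spell its units differently.

```python
# Divisors taken from the conversions on the slide; kilograms is the canonical unit.
KG_DIVISORS = {
    "kg": 1.0,
    "lb": 2.20462,
    "oz": 35.274,
    "g": 1000.0,
}


def to_kilograms(value, unit):
    """Convert a weight to kilograms, or raise ValueError for an unknown unit."""
    key = unit.strip().lower()
    if key not in KG_DIVISORS:
        raise ValueError(f"No known conversion to kilograms for unit {unit!r}")
    return value / KG_DIVISORS[key]
```

Raising an error for an unrecognized unit, rather than guessing, keeps unconvertible values visible so they can be handled separately.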

Harmonizing numeric data

● Problem: Some units are missing

● Solution 1: Contact the source

● Solution 2: N3C inference engine

Kilograms = x / 2.20462 ?

Kilograms = x / 35.274 ?

Kilograms = x / 1000 ?
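The slide does not describe how the N3C inference engine works, so the following is only one plausible approach, sketched under stated assumptions: try each candidate unit, convert the site's median value to kilograms, and keep the unit whose result lands closest (on a log scale) to a reference median. The reference value of 80 kg and the function name `infer_weight_unit` are hypothetical; in practice the reference would come from data whose units are known.

```python
import math
from statistics import median

KG_DIVISORS = {"kg": 1.0, "lb": 2.20462, "oz": 35.274, "g": 1000.0}

# Hypothetical reference: a typical adult body weight in kilograms.
REFERENCE_MEDIAN_KG = 80.0


def infer_weight_unit(values):
    """Guess the unit of a batch of unit-less weights from one site."""
    if not values:
        raise ValueError("need at least one value")
    observed = median(values)
    if observed <= 0:
        raise ValueError("weights must be positive")

    def distance(unit):
        converted = observed / KG_DIVISORS[unit]
        # Log-ratio distance treats "twice too big" and "half too small" alike.
        return abs(math.log(converted / REFERENCE_MEDIAN_KG))

    return min(KG_DIVISORS, key=distance)
```

For example, a batch with a median near 170 is far more plausible as pounds than as kilograms, and a batch with a median near 70,000 only makes sense as grams. Working on the whole batch rather than single values is a deliberate choice here, because an individual reading such as 90 could reasonably be either kilograms or pounds.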

Harmonization progress

● Harmonized measurements

○ By original unit

○ Across many sites

Homogeneity after harmonization

Humans measured in grams do not look the same as humans measured in kilograms!

Unit harmonization progress

Canonical unit

Uses a known conversion

Unit not plausible

Missing unit inferred

Unit still missing

● ~2x increase in usable data from our harmonization procedures

We can rescue a lot of data!

N3C Data Ingestion & Harmonization Pipeline

(future)

Pharyngalgia = Sore throat

Plain-language medical vocabulary for precision diagnosis. Nat Genet. 2018 50:474-476.

Long-COVID phenotypes are myriad; patient-reported and researcher-measured phenotypes are starkly different

Map literature and patient-reported terms to HPO (Human Phenotype Ontology)

N3C Harmonization Takeaways

What N3C has revealed most in terms of needs:

● Interoperability - we need syntactic and semantic!

○ FHIR ⇒ OMOP (syntactic)

○ Common vocabulary/codeset mapping provenance and management (semantic)

● Approach data harmonization from an end-to-end data life cycle perspective

● Leverage USCDI, but build for interoperable semantic modeling and extensions

Governing N3C Data

N3C: Unique Data Use and Privacy

Goal of the Data Use Agreement is Privacy Protection to Promote broad access:

● COVID-Related research only

● NIH housed secure repository

● No re-identification of individuals or data source

● No download or capture of raw data

● Open platform to all researchers

● Investigator activities are recorded and can be audited for security and reproducibility

N3C: Governance and Access

Data Levels to Access

Goal of the Data Use Agreement is Privacy Protection to Promote broad access:

● COVID-Related research only

● No re-identification of individuals or data source

● No download or capture of raw data

● Open platform to all researchers

● Security: Activities in the N3C Data Enclave are recorded and can be audited

● Disclosure of research results to the N3C Data Enclave for the public good

● Analytics provenance

● Contributor Attribution tracking

Data Use and Privacy

● Transparent and collaborative environment where all contributions are acknowledged

● Provenance and reproducibility

● Promptly sharing research results with N3C users

● Publish in high-impact journals

● Attribution for all N3C artifacts

N3C Attribution and Publication Principles

Researchers, projects, and artifacts are all linked together in the enclave using the Contributor Attribution Model (CAM).

N3C Provenance, Transparency, Attribution & Rapid Sharing

N3C Data Access: Process

Data Use Request

HSP / Security Training

Data Use Agreement

https://ncats.nih.gov/n3c/about/applying-for-access

Realizing Team Science

Key functions can nucleate projects:

● Education & training

● Biostatistics

● Study design

● Evaluation

● Informatics

● Clinical expertise

● Innovation & commercialization

● Community & partnerships

N3C Domain Team Expertise:

● Enclave technology

● Data model (OMOP)

● Terminologies

● Data quality

● Codesets, variables, phenotype

● Using/parsing N3C data

● Workflows, methods, algorithms

Roles

Ingredients (Methods, datasets, instruments)

Scientific questions

N3C Team Science within & across institutions

https://covid.cd2h.org/domain-teams

CTSAs

OUTCOMES OF COVID-19 IN CANCER PATIENTS: REPORT FROM THE NATIONAL COVID COHORT COLLABORATIVE (N3C)

Noha Sharafeldin, Benjamin Bates, Qianqian Song, Vithal Madhira, Yao Yan, Sharlene Dong, Eileen Lee, Nathaniel Kuhrt, Yu Raymond Shao, Feifan Liu, Timothy Bergquist, Justin Guinney, Jing Su, Umit Topaloglu on behalf of the N3C Consortium

Given on June 4, 2021

https://covid.cd2h.org/ cd2h.slack.com @data2health

N3C Oncology Domain Team (ODT)

Leadership

Benjamin Bates, MD, Rutgers University

Umit Topaloglu, PhD, Wake Forest University

Noha Sharafeldin, MD, PhD, The University of Alabama at Birmingham

https://covid.cd2h.org/oncology

Slack channel: #n3c-tt-oncology

N3C ODT Expertise

Informatics Biostatistics Clinical Epidemiology N3C data and Logic

Noha Sharafeldin, Umit Topaloglu, Jing Su, Benjamin Bates, Justin Guinney, Vithal Madhira, Tim Bergquist, Feifan Liu, Qianqian Song, Yu Raymond Shao, Nate Kuhrt, Sharlene Dong, Eileen Lee, Yao Yan

N3C Oncology

http://ascopubs.org/doi/full/10.1200/JCO.21.01074

N3C Cancer Cohort

Primary Diagnosis

Primary Outcome

• All-cause mortality

Secondary Outcomes (Clinical severity indicators requiring hospitalization)

• Mechanical Ventilation

Demographic, clinical, and tumor characteristics

[Figure: charts for COVID-19 positive patients. Age: 18-29, 30-49, 50-64, 65+ (labels 2%, 13%, 31%, 54%). Race: Hispanic, Non-Hispanic Black, Non-Hispanic White, Other or Unknown (labels 4%, 13%, 61%, 22%). Sex: Female, Male (labels 51%, 49%). Geographical Location: US-Northeast, US-Midwest, US-South, US-West, Unknown (labels 11%, 34%, 28%, 5%, 22%).]

[Figure: charts for COVID-19 positive patients. Smoking status: Non-smoker, Current or Former smoker (labels 86%, 14%). Adjusted CCI: 0, 1, 2, 3, ≥4 (labels 41%, 16%, 9%, 6%, 28%; count axis 0 to 18,000).]

[Figure: charts for COVID-19 positive patients. Primary malignancy: Skin cancers, Breast cancer, Prostate cancer, Hematological cancers, Gastrointestinal cancers, Multi-site (labels 15%, 14%, 12%, 12%, 9%, 11%; count axis 0 to 7,000). Type of primary malignancy: Solid, Liquid, Multi-Site, Unknown, Undefined Primary (labels 71%, 12%, 11%, 3%, 3%).]

COVID-19 Treatment

COVID-19 Treatment (Yes)   COVID positive (n=38,614)
Systemic antibiotics       4032 (15.75%)
Systemic steroids          3514 (13.73%)
Azithromycin               1197 (4.68%)
Remdesivir                 1047 (4.09%)
Dexamethasone              1029 (4.02%)
Hydroxychloroquine (HCQ)   364 (1.42%)

Death and invasive ventilation in hospitalized patients

Outcome               COVID positive (n=19,515)   COVID negative (n=184,988)
Death                 2,894 (14.8%)               23,207 (12.5%)
Invasive Ventilation  1,606 (8.2%)                9,576 (5.2%)
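As a quick arithmetic check, the percentages for hospitalized patients follow directly from the counts and group sizes given on this slide. The short sketch below recomputes them and also derives a crude (unadjusted) risk ratio for death. The risk ratio is added here purely for illustration: it is a different quantity from a hazard ratio estimated by a survival model, and the variable names are arbitrary.

```python
# Counts and denominators as reported for hospitalized patients
COVID_POSITIVE_N = 19_515
COVID_NEGATIVE_N = 184_988

deaths = {"positive": 2_894, "negative": 23_207}
ventilated = {"positive": 1_606, "negative": 9_576}


def percent(events, total):
    """Share of patients with the event, as a percentage rounded to 1 decimal."""
    return round(100 * events / total, 1)


death_pct_positive = percent(deaths["positive"], COVID_POSITIVE_N)
death_pct_negative = percent(deaths["negative"], COVID_NEGATIVE_N)
vent_pct_positive = percent(ventilated["positive"], COVID_POSITIVE_N)
vent_pct_negative = percent(ventilated["negative"], COVID_NEGATIVE_N)

# Crude risk ratio for death: no adjustment for age, comorbidity,
# or follow-up time, so it is descriptive only.
crude_death_risk_ratio = (
    (deaths["positive"] / COVID_POSITIVE_N)
    / (deaths["negative"] / COVID_NEGATIVE_N)
)
```

The recomputed percentages agree with the reported 14.8%, 12.5%, 8.2%, and 5.2%, and the crude risk ratio for death comes out at roughly 1.18.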

Survival Probability by COVID status

HR = 1.20 (95% CI: 1.15 – 1.24, p<0.001)

Survival Probability by cancer type among COVID positive patients

Hazard ratios associated with 1-year all-cause mortality among COVID-positive patients

Limitations

• RWD Challenges (e.g. data missingness)

• Limited capture of recent cancer therapy

• Potential misclassification of cancer patients

• Challenges in primary cancer diagnosis mapping and limited historical data

• Method for construction of COVID-19 negative control

Conclusions

• N3C represents a unique resource to examine effects of COVID-19 on cancer outcomes

• Largest COVID-19 and cancer cohort within the US

• Consistent with previous literature, older age, male gender, increasing comorbidities, and hematological malignancies were associated with higher mortality in patients with cancer and COVID-19

• The N3C dataset confirmed that cancer patients with COVID-19 who received recent immunotherapy or targeted therapy were not at higher risk of overall mortality

Acknowledgements

The Patients

US Data Partners

N3C Consortial Authors

Christopher Chute

Melissa Haendel

Amit Mitra

Ramakanth Kavuluru

NCATS U24 TR002306

NIGMS 5U54GM104942-04

NCI P30CA012197 [UT, QS]

LLS 3386-19 [NS]

Indiana University Precision Health Initiative [JS]

N3C Core Teams

Acknowledgements

We gratefully acknowledge contributions from the following N3C core teams:

• Principal Investigators: Melissa A. Haendel*, Christopher G. Chute*, Kenneth R. Gersing, Anita Walden

• Workstream, subgroup and administrative leaders: Melissa A. Haendel*, Tellen D. Bennett, Christopher G. Chute, David A. Eichmann, Justin Guinney, Warren A. Kibbe, Hongfang Liu, Philip R.O. Payne, Emily R. Pfaff, Peter N. Robinson, Joel H. Saltz, Heidi Spratt, Justin Starren, Christine Suver, Adam B. Wilcox, Andrew E. Williams, Chunlei Wu

• Key liaisons at data partner sites

• Regulatory staff at data partner sites

• Individuals at the sites who are responsible for creating the datasets and submitting data to N3C

• Data Ingest and Harmonization Team: Christopher G. Chute*, Emily R. Pfaff*, Davera Gabriel, Stephanie S. Hong, Kristin Kostka, Harold P. Lehmann, Richard A. Moffitt, Michele Morris, Matvey B. Palchuk, Xiaohan Tanner Zhang, Richard L. Zhu

• Phenotype Team (Individuals who create the scripts that the sites use to submit their data, based on the COVID and Long COVID definitions): Emily R. Pfaff*, Benjamin Amor, Mark M. Bissell, Marshall Clark, Andrew T. Girvin, Stephanie S. Hong, Kristin Kostka, Adam M. Lee, Robert T. Miller, Michele Morris, Matvey B. Palchuk, Kellie M. Walters

• Project Management and Operations Team: Anita Walden*, Yooree Chae, Connor Cook, Alexandra Dest, Racquel R. Dietz, Thomas Dillon, Patricia A. Francis, Rafael Fuentes, Alexis Graves, Julie A. McMurry, Andrew J. Neumann, Shawn T. O'Neil, Andréa M. Volz, Elizabeth Zampino

• Partners from NIH and other federal agencies: Christopher P. Austin*, Kenneth R. Gersing*, Samuel Bozzette, Mariam Deacy, Nicole Garbarini, Michael G. Kurilla, Sam G. Michael, Joni L. Rutter, Meredith Temple-O'Connor

• Analytics Team (Individuals who build the Enclave infrastructure, help create codesets, variables, and help Domain Teams and project teams with their datasets): Benjamin Amor*, Mark M. Bissell, Katie Rebecca Bradwell, Andrew T. Girvin, Amin Manna, Nabeel Qureshi

• Publication Committee Management Team: Mary Morrison Saltz*, Christine Suver*, Christopher G. Chute, Melissa A. Haendel, Julie A. McMurry, Andréa M. Volz, Anita Walden

• Publication Committee Review Team: Carolyn Bramante, Jeremy Richard Harper, Wenndy Hernandez, Farrukh M Koraishy, Federico Mariona, Saidulu Mattapally, Amit Saha, Satyanarayana Vedula

N3C Registration/Training

https://covid.cd2h.org/tutorials

Training Office Hours: Tuesdays & Thursdays at 10-11 am PT/1-2 pm ET Registration Required at this link

Orientation Video Coming Soon

Additional Training Tutorials available in the Enclave

Registration for Documents, Meetings & the N3C Data Enclave

Requires Authentication

Enclave Checklist


● N3C comprises the largest, most representative patient-level COVID-19 cohort in the US and continues to grow

● We CAN do transparent, reproducible, innovative science (including ML) on sensitive observational data at scale, together!

● N3C is an innovative partnership between clinical sites, CDM communities, NIH ICs, CD2H, and commercial partners

● Automation of data extraction and minimum requirements reduces burden and increases site participation

● Robust attribution of all contributors; also provides great venue for trainees

● N3C data is complicated, but there are many people and resources to help users do good science

Takeaways

Register with N3C: https://labs.cd2h.org/registration/

Joining Workstreams:

● N3C Data Ingestion & Harmonization Workstream: Slack Channel Harmonization, Google Group Harmonization

● N3C Phenotype & Data Acquisition Workstream: Slack Channel Phenotype, Google Group Phenotype

● N3C Collaborative Analytics Workstream: Slack Channel Analytics, Google Group Analytics

● N3C Data Partnership & Governance Workstream: Slack Channel Governance, Google Group Governance

● N3C Synthetic Clinical Data Workstream: Slack Channel Synthetic, Google Group Synthetic

● N3C Implementation Workstream (coming soon)

Additional Information:

Onboarding N3C, Slack, Google | Finding and Joining a Google Group

NCATS N3C Webpage | N3C Website

How to Get Involved with N3C

Melissa A. Haendel,1,4,7,8,10,13,14,52,78,101 Christopher G. Chute,1,4,8,10,13,14,52,78,100,101 Tellen D. Bennett,9,10,13,14,52,100,101 David A. Eichmann,4,9,10,13,78,101 Justin Guinney,4,9,10,14,78,101 Warren A. Kibbe,9,10,52,78,101 Philip R.O. Payne,4,9,10,78,101 Emily R. Pfaff,9,10,13,15,52,78 Peter N. Robinson,4,9,10,15,52,78,100 Joel H. Saltz,10,13,14,15,52,78,101 Heidi Spratt,9,10,100 Christine Suver,10,78,101 John Wilbanks,10,78,101 Adam B. Wilcox,10,101 Andrew E. Williams,10,13,78 Chunlei Wu,9,13,14,78

Clair Blacketer,15,52 Robert L. Bradford,9,52 James J. Cimino,10,14,101 Marshall Clark,9,15,52 Evan W. Colmenares,9,15,52 Patricia A. Francis,78 Davera Gabriel,9,10,13,14,15,52 Alexis Graves,7,9,78 Raju Hemadri,9,15,52 Stephanie S. Hong,9,15,52 George Hripcsak,10,52 Dazhi Jiao,9,15,52 Jeffrey G. Klann,14,52,101 Kristin Kostka,9,15,52 Adam M. Lee,9,15,52 Harold P. Lehmann,9,15,52 Lora Lingrey,9,15,52 Robert T. Miller,9,15,52 Michele Morris,9,15,52 Shawn N. Murphy,9,15,52 Karthik Natarajan,9,15,52 Matvey B. Palchuk,9,15,52 Usman Sheikh,9,78 Harold Solbrig,9,15,52 Shyam Visweswaran,10,15,52,101 Anita Walden,7,10,13,14,52,101 Kellie M. Walters,10,14,101 Griffin M. Weber,10,101 Xiaohan Tanner Zhang,9,15,52 Richard L. Zhu,9,15,52 Benjamin Amor,78 Andrew T. Girvin,15,78 Amin Manna,78 Nabeel Qureshi,15,78 Michael G. Kurilla,10,78 Sam G. Michael,10,78 Lili M. Portilla,101 Joni L. Rutter,1,101 Christopher P. Austin,101 Ken R. Gersing,78,101

Shaymaa Al-Shukri,4,15 Adil Alaoui,101 Ahmad Baghal,15 Pamela D. Banning,15,100 Edward M. Barbour,8,15 Michael J. Becich,15,52,101 Afshin Beheshti,14 Gordon R. Bernard,8,15 Sharmodeep Bhattacharyya,100 Mark M. Bissell,9,15 L. Ebony Boulware,14,100 Samuel Bozzette,100,101 Donald E. Brown,101 John B. Buse,14 Brian J. Bush,8,101 Tiffany J. Callahan,14,52 Thomas R. Campion,8,15 Elena Casiraghi,9,15 Ammar A. Chaudhry,13,14 Guanhua Chen,9 Anjun Chen,13 Gari D. Clifford,8,15 Megan P. Coffee,14,100 Tom Conlin,14 Connor Cook,7,78 Keith A. Crandall,9,14,101 Mariam Deacy,78 Racquel R. Dietz,78 Nicholas J. Dobbins,8,9

Peter L. Elkin,15,52,100 Peter J. Embi,52,101 Julio C. Facelli,8,15 Karamarie Fecho,13 Xue Feng,9 Randi E. Foraker,8,13,15 Tamas S. Gal,8,15 Linqiang Ge,14 George Golovko,15,101 Ramkiran Gouripeddi,14,15 Casey S. Greene,13,14 Sangeeta Gupta,52,101 Ashish Gupta,13,101 Janos G. Hajagos,9,15 David A. Hanauer,15,52 Jeremy Richard Harper,9,14,52 Nomi L. Harris,14 Paul A. Harris,101 Mehadi R. Hassan,9 Yongqun He,15,52,100

Elaine L. Hill,9,14 Maureen E. Hoatlin,14 Kristi L. Holmes,4,101 LaRon Hughes,14 Randeep S. Jawa,14 Guoqian Jiang,14 Xia Jing,7,14 Marcin P. Joachimiak,8,15 Steven G. Johnson,9,14,101 Rishikesan Kamaleswaran,9,15,78 Thomas George Kannampallil,15,101 Andrew S. Kanter,15,52 Ramakanth Kavuluru,9,13,14 Kamil Khanipov,8,14 Hadi Kharrazi,9,14 Dongkyu Kim,15,52 Boyd M. Knosp,8,15 Arunkumar Krishnan,9

Tahsin Kurc,9,15 Albert M. Lai,101 Christophe G. Lambert,52,101 Michael Larionov,14 Stephen B. Lee,1,14 Michael D. Lesh,9 Olivier Lichtarge,14 John Liu,9 Sijia Liu,8,9,101 Hongfang Liu,9,15 Johanna J. Loomba,1,15,78,101

Sandeep K. Mallipattu,9,14,15 Chaitanya K. Mamillapalli,14 Christopher E. Mason,15 Jomol P. Mathew,8,15,52 James C. McClay,101 Julie A. McMurry,1,4,7,9,13,14,78 Paras P. Mehta,14 Ofer Mendelevitch,9 Stephane Meystre,8,14,15 Richard A. Moffitt,9,13,15 Jason H. Moore,8,9 Hiroki Morizono,13,14,15,52 Christopher J. Mungall,15,52 Monica C. Munoz-Torres,7,10,78 Andrew J. Neumann,78 Xia Ning,14 Jennifer E. Nyland,13,14 Lisa O'Keefe,78 Anna O'Malley,78 Shawn T. O'Neil,78 Jihad S. Obeid,10,14,15 Elizabeth L. Ogburn,13 Jimmy Phuong,9,15,52,100,101 Jose D Posada,8,15 Prateek Prasanna,14,52 Fred Prior,9,14,15 Justin Prosser,9,78 Amanda Lienau Purnell,101 Ali Rahnavard,9,52 Harish Ramadas,9,52,78 Justin T. Reese,9,10 Jennifer L. Robinson,14,100 Daniel L. Rubin,101 Cody D. Rutherford,9,101 Eugene M. Sadhu,8,15 Amit Saha,9 Mary Morrison Saltz,15,52,101 Thomas Schaffter,78 Titus KL Schleyer,14 Soko Setoguchi,8,14,15 Nigam H. Shah,8,14 Noha Sharafeldin,14 Evan Sholle,15,52 Jonathan C. Silverstein,15,52,101 Anthony Solomonides,101 Julian Solway,14,101

Jing Su,101 Vignesh Subbian,9,52,101 Hyo Jung Tak,15 Bradley W. Taylor,9,14 Anne E. Thessen,14,101 Jason A. Thomas,15 Umit Topaloglu,15,52 Deepak R. Unni,8,9,15,52 Joshua T. Vogelstein,14 Andréa M. Volz,7 David A. Williams,14,15 Kelli M. Wilson,9,78 Clark B. Xu,8,9,15 Hua Xu,9,10,14 Yao Yan,9,15,52 Elizabeth Zak,8,15 Lanjing Zhang,101 Chengda Zhang,14 Jingyi Zheng,14

Contributor role key: 1 = CREDIT_00000001 (Conceptualization); 4 = CREDIT_00000004 (Funding acquisition); 7 = CRO_0000007 (Marketing and Communications); 8 = CREDIT_00000008 (Resources); 9 = CREDIT_00000009 (Software role); 10 = CREDIT_00000010 (Supervision role); 13 = CREDIT_00000013 (Original draft); 14 = CREDIT_00000014 (Review and editing); 15 = CRO_0000015 (Data role); 52 = CRO_0000052 (Standards role); 78 = CRO_0000078 (Infrastructure role); 100 = Clinical Use Cases; 101 = Governance

https://academic.oup.com/jamia/advance-article/doi/10.1093/jamia/ocaa196/5893482

Questions or Comments?

Thank you!
