bio it 2014-published

© 2013 New York Genome Center 1 NYGC PRIVILEGED & CONFIDENTIAL Privacy, Regulatory and Security Requirements in a Collaborative Clinical Genomics Environment TOBY BLOOM, PH.D BIO-IT WORLD APRIL 29, 2014

Page 1: Bio it 2014-published

© 2013 New York Genome Center 1 NYGC PRIVILEGED & CONFIDENTIAL

Privacy, Regulatory and Security Requirements in a Collaborative Clinical Genomics Environment

TOBY BLOOM, PH.D.
BIO-IT WORLD
APRIL 29, 2014

Page 2: Bio it 2014-published

NYGC OVERVIEW

Independent, non-profit research organization
Founded as a collaboration of 12 NYC medical institutions
Focused on clinical genomics
Expecting to handle PHI, HIPAA regulations, FISMA-moderate security from the beginning
Merging many kinds of data
The center’s mission is to save lives by creating an unprecedented collaboration of technology, science and medicine.

Page 3: Bio it 2014-published

MEMBER INSTITUTIONS

Page 4: Bio it 2014-published

NEW YORK BIOMEDICAL COMMUNITY

Fostering collaboration
Enhancing efficiencies
Promoting advances in medicine faster
Sharing data is essential!!

Page 5: Bio it 2014-published

HOW DO SECURITY, PRIVACY & REGULATIONS AFFECT OUR MISSION?

Page 6: Bio it 2014-published

MY DEFINITIONS

Privacy: Ensuring that information that anyone considers personal and would not want known by others is protected.

Security: The means by which we constrain access to data, so that private data is protected from access by unauthorized individuals, and is not changed, removed, or made unavailable by unauthorized individuals.

Regulations: Laws and governmental or organizational rules that govern how data may be accessed and used.

ALL OF THESE IMPACT SHARING OF DATA

Page 7: Bio it 2014-published

DATA SHARING AND AGGREGATION ARE CRITICAL

Complex diseases may need huge numbers of samples to gain statistical power
Sequencing more patients when enough sequence exists for a new study is a waste of resources and precious research funding
In rare diseases, you may not ever see the same thing twice...

Page 8: Bio it 2014-published

RISKS OF SHARING YOUR GENOMIC DATA SHOULDN’T BE UNDERESTIMATED EITHER

GINA does not protect against denial of disability coverage, life insurance, or long-term care insurance based on genetic information!

For you or your family members!!!!

Some people can afford not to worry about those issues

But for some, it’s critical!

Does sharing only for research projects, not publicly, reduce this risk sufficiently?

Page 9: Bio it 2014-published

AN EXAMPLE: NYC CLINICAL DATA RESEARCH NETWORK

“Both the opportunity and the anxiety are pretty electrifying,” Francis S. Collins, director of the National Institutes of Health, said in an interview.

Page 10: Bio it 2014-published

NYC CLINICAL DATA RESEARCH NETWORK

FUNDED by PCORI

Individual Researchers

Page 11: Bio it 2014-published

NYC-CDRN GOALS

Collect de-identified data from all patients from all of the member health systems
2.5-6.5 million patient records

De-duplicated across health systems
Expect the first 2.5M records (with incomplete data) by August 1, assuming legal approvals

Available for retrospective studies
Available for cohort identification
Will eventually host prospective studies as well
Proposal promised connections to genomic data

Page 12: Bio it 2014-published

THE DETAILS

Expect to have at least 2.4 million patient records by August

Currently have 2M “dummy” records
Waiting for the legalities...

De-duplicated across health systems!
NYGC provides de-identified information only
But we receive “limited data sets” under HIPAA
Healthix and Bronx RHIO, trusted brokers, have identifying information but no health data
What are we permitted to do with this data?
What are the privacy, security, regulatory issues?
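One way to picture the broker arrangement on this slide is the following minimal sketch. It is an illustration only: the slides do not describe how Healthix or Bronx RHIO actually match patients, so the key name `BROKER_KEY`, the field names, and the exact-match normalization are all assumptions (real record linkage is usually probabilistic). The broker derives a stable pseudonymous id from identifiers it alone holds; the research side sees only that id plus clinical fields, which is enough to merge records from different health systems.

```python
import hashlib
import hmac

# Hypothetical secret held only by the trusted broker, never by the research site.
BROKER_KEY = b"example-key-held-by-broker-only"

def study_id(first: str, last: str, dob: str) -> str:
    """Broker side: derive a stable pseudonymous id from normalized identifiers."""
    normalized = "|".join(part.strip().lower() for part in (first, last, dob))
    digest = hmac.new(BROKER_KEY, normalized.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]

def deduplicate(records):
    """Research side: group de-identified records by study id.

    Each record is a dict with 'study_id' plus clinical fields only.
    """
    merged = {}
    for rec in records:
        merged.setdefault(rec["study_id"], []).append(rec)
    return merged

# Broker side: the same person seen at two health systems maps to one id.
id_a = study_id("Jane", "Doe", "1970-01-02")
id_b = study_id(" jane ", "DOE", "1970-01-02")
id_c = study_id("John", "Roe", "1980-05-06")

# Research side: made-up rows carrying only study ids and clinical fields.
records = [
    {"study_id": id_a, "system": "S1", "value": "lab-1"},
    {"study_id": id_b, "system": "S2", "value": "lab-2"},
    {"study_id": id_c, "system": "S1", "value": "lab-3"},
]
patients = deduplicate(records)
```

In this sketch the research site cannot recompute the ids, because it never holds the key or the identifiers.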

Page 13: Bio it 2014-published

PRIVACY: AT WHAT LEVEL CAN WE GUARANTEE THIS?

Patients are “fully de-identified” in any data we make available (according to HIPAA standards)

Is that really true?
One physician tells me that 3 consecutive phosphate readings are fully identifying

Providers do not want to be identified, and we will keep NO provider information
Plan was to provide proxy ids for health systems

Allowing comparisons, but not identification
But patient 3-digit zip codes are permitted by HIPAA in NY
And that will identify the hospital!!!!
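The zip-code concern can be made concrete with a small sketch. The rows, proxy ids, and function names below are made up for illustration and are not taken from the CDRN data. The idea is that whenever a 3-digit zip prefix appears under only one proxy id, anyone who knows the geography can work out which health system the proxy stands for.

```python
from collections import defaultdict

def zip3(zip_code: str) -> str:
    """Truncate a 5-digit ZIP code to its first three digits."""
    return zip_code[:3]

def systems_by_zip3(rows):
    """Map each 3-digit ZIP prefix to the set of proxy health-system ids seen with it."""
    seen = defaultdict(set)
    for proxy_id, zip_code in rows:
        seen[zip3(zip_code)].add(proxy_id)
    return dict(seen)

def revealing_zip3s(rows):
    """Return {zip3: proxy_id} for prefixes that co-occur with exactly one proxy id.

    For those prefixes, public geography is enough to unmask the proxy id.
    """
    return {
        prefix: next(iter(ids))
        for prefix, ids in systems_by_zip3(rows).items()
        if len(ids) == 1
    }

# Made-up rows: (proxy health-system id, patient ZIP). Not real data.
rows = [
    ("SYS-A", "10451"), ("SYS-A", "10467"),
    ("SYS-B", "10021"), ("SYS-B", "10028"), ("SYS-C", "10016"),
]
leaks = revealing_zip3s(rows)
print(leaks)
```

A check like this could run before any release; prefixes it flags would need to be coarsened or suppressed if provider anonymity is to hold.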

Page 14: Bio it 2014-published

REGULATORY

Lawyers do not agree on what constitutes re-identification under HIPAA
I can identify cohorts for prospective studies from the collected data.

Can I give those anonymized ids back to the hospital they came from to ask that the patients be contacted for consent to participate in the study?
Or does that constitute knowingly using de-identified data for re-identification purposes, even though I will never see the patient identity?

Page 15: Bio it 2014-published

CLINICAL GENOMICS

Many more challenges
Identifiable information
Many types of data

Electronic Health Records
Genomic Data
Personally reported data
Device data
Image data

Current Auto-Immune Disease Project uses most of these and more

Page 16: Bio it 2014-published

LINKING TO OTHER DATA

Prospective studies with additional (possibly identifiable) data collection

Linking to genomic data
Linking to personal device data, patient-provided data, etc.

How do we isolate identifiable information from the de-identified data, to prevent re-identification, and still allow the data to be linked for studies with appropriate consents?

A security question!!!!

Page 17: Bio it 2014-published

HOW DO WE CONNECT THIS TO GENOMIC DATA?

Genomic data does not fall under HIPAA, yet
But it is considered “identifying information”
Does accessing genomic data and the de-identified patient data by matching anonymized ids constitute re-identification of the de-identified data?
We may need to keep a new copy (consented) of the same data for each project.
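The "new consented copy per project" idea could look roughly like the sketch below. This is an assumption-laden illustration, not NYGC's design: the record layout, the `anon_id` and `project_id` field names, and the helper `build_project_copy` are all hypothetical. Only consented records are copied, and each gets a fresh project-local id, so the project copy cannot be joined back to the de-identified warehouse without the separately held linkage table.

```python
import secrets

def build_project_copy(records, consented_ids):
    """Copy only consented records, swapping warehouse ids for project-local ids.

    Returns (project_records, linkage). The linkage table is the only place
    the two id spaces meet, so it can be stored and access-controlled apart
    from both the warehouse and the project copy.
    """
    linkage = {}
    project_records = []
    for rec in records:
        if rec["anon_id"] not in consented_ids:
            continue  # no consent for this project: record is never copied
        if rec["anon_id"] not in linkage:
            linkage[rec["anon_id"]] = secrets.token_hex(8)
        copy = {k: v for k, v in rec.items() if k != "anon_id"}
        copy["project_id"] = linkage[rec["anon_id"]]
        project_records.append(copy)
    return project_records, linkage

# Made-up warehouse rows; values are placeholders, not real data.
warehouse = [
    {"anon_id": "a1", "value": "obs-1"},
    {"anon_id": "a2", "value": "obs-2"},
    {"anon_id": "a1", "value": "obs-3"},
]
project_records, linkage = build_project_copy(warehouse, consented_ids={"a1"})
```

Whether holding such a linkage table itself counts as re-identification is exactly the open regulatory question the slide raises; the code only shows the mechanics.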

Page 18: Bio it 2014-published

PCORI: A MIX OF PRIVACY, REGULATIONS AND SECURITY ISSUES

Are we using the data in acceptable ways without explicit patient consent?
Are we meeting HIPAA regulations around re-identification and use of limited datasets?
Do we have adequate security around data transfers and access control from external networks (e.g., PCORNet)?

Page 19: Bio it 2014-published

MAINTAINING A GENOMIC DATA WAREHOUSE

Page 20: Bio it 2014-published

NYGC’S GOAL IS TO ENABLE DATA SHARING!

Collecting yet more data
Maintaining a catalog of data hosted by collaborators
Security for multi-tenancy models also!
Secure transmission of data among collaborators
Maintaining our own data securely

Page 21: Bio it 2014-published

DATA SECURITY IS VERY GRANULAR

Protecting researchers from themselves
Ensure protection of unpublished data
IRB approvals and informed consents limit who can use data

Researchers don’t always understand the details
Project-level access control works initially
But data sharing agreements can allow access to only some samples in a project for secondary use

Check boxes on informed consents are a big culprit
And sample-level security is insufficient because owners of data may allow the same samples to be used in multiple studies

But preclude researchers in one study from seeing results of others
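The granularity argument above suggests that the unit of permission is a (study, sample) pair rather than a project or a sample alone. A minimal sketch of that model follows; the class name `AccessPolicy`, its methods, and the user, study, and sample names are hypothetical and not a description of NYGC's system.

```python
from dataclasses import dataclass, field

@dataclass
class AccessPolicy:
    """Grants are keyed on (study, sample) pairs, not on projects or samples alone."""
    grants: dict = field(default_factory=dict)  # user -> set of (study, sample)

    def grant(self, user: str, study: str, sample: str) -> None:
        self.grants.setdefault(user, set()).add((study, sample))

    def can_read(self, user: str, study: str, sample: str) -> bool:
        return (study, sample) in self.grants.get(user, set())

policy = AccessPolicy()
# Sample S1 is consented for two studies; each researcher works in only one.
policy.grant("alice", "study-1", "S1")
policy.grant("bob", "study-2", "S1")
# A data sharing agreement gives alice secondary use of one study-2 sample only.
policy.grant("alice", "study-2", "S7")

print(policy.can_read("alice", "study-1", "S1"))  # alice's own study
print(policy.can_read("alice", "study-2", "S1"))  # same sample, other study
```

Under this model alice sees S1 within study-1 but not the study-2 results on the same sample, which is the case that project-level and sample-level schemes each fail to express.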

Page 22: Bio it 2014-published

OPTIONS FOR ACCESS CONTROL

Force all access through a catalog
Doesn’t work for methods requiring file paths
Users hate it

FUSE file systems
User-space virtual file system
Too slow

Linux access control
Doesn’t work with NFS V3

NFS allows only 16 groups per user
That limits everyone to 16 project-sample combinations

And it doesn’t work with databases!!
May well need cell-level access within databases
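The 16-group ceiling can be checked with simple arithmetic. The sketch below assumes one Unix group per project-sample combination, which is one possible layout and not necessarily the one NYGC tried; the user names and group names are made up. It flags users whose memberships exceed the 16 supplementary group ids carried in an NFS AUTH_SYS credential.

```python
AUTH_SYS_MAX_GROUPS = 16  # supplementary gids carried in an AUTH_SYS RPC credential

def users_over_limit(memberships, limit=AUTH_SYS_MAX_GROUPS):
    """Return {user: group_count} for users who need more groups than the limit."""
    return {
        user: len(groups)
        for user, groups in memberships.items()
        if len(groups) > limit
    }

# Hypothetical layout: one Unix group per project-sample combination.
memberships = {
    "alice": {f"proj1-s{i}" for i in range(10)},                    # 10 groups
    "bob": {f"proj{p}-s{s}" for p in range(3) for s in range(8)},   # 3 * 8 = 24 groups
}
over = users_over_limit(memberships)
print(over)
```

Three projects of eight samples each already put a single user past the limit, which is why group-based control does not scale to project-sample granularity.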

Page 23: Bio it 2014-published

SECURITY OF GENOMIC DATA

Supporting prospective studies means maintaining identifiable data

As does storing genomic data, connected or not
Our infrastructure is FISMA-moderate compliant
Is this sufficient?
BAM files are too big to encrypt at rest and still access in pipelines!!

Hardware assisted encryption still takes 3 hours to decrypt a BAM file
Encrypted disk may be sufficient, but expensive at least

Can’t follow standard HIPAA/HITECH suggestions

Page 24: Bio it 2014-published

EDGE SECURITY

Edge Security
We’re FISMA moderate compliant
We’ve passed pharma security audits
We’ve passed independent security audits
We regularly do penetration testing
We monitor logs

Is this sufficient?
We’ll never be entirely sure

Page 25: Bio it 2014-published

THE BALANCING ACT!

Collaboration Restrictions

Page 26: Bio it 2014-published

ACKNOWLEDGEMENTS

PCORI
Rainu Kaushal (Cornell, PCORI PI)
George Hripcsak (Columbia)
Parsa Mirhaji (Montefiore)
Alex Low (Cornell)
Tom Check (Healthix)
Tom Campion (Cornell)
Deborah Ascheim (Mt Sinai)
Many others

Rockefeller
Mayu Frank
Dana Orange

NYGC
Cristyn Kells
Dorian Leary
Uday Evani
Nina Lapchyk
Shailu Gargeya
Chris Black
Scott Collins
Jen Baldwin
Bob Darnell

Cornell Tech
Deborah Estrin

Funded In Part by the Patient-Centered Outcomes Research Institute