bio it 2014-published
TRANSCRIPT
© 2013 New York Genome Center 1NYGC PRIVILEGED & CONFIDENTIAL
Privacy, Regulatory and Security Requirements in a Collaborative Clinical Genomics Environment
TOBY BLOOM, PH.D.
BIO-IT WORLD
APRIL 29, 2014
NYGC OVERVIEW
Independent, non-profit research organization
Founded as a collaboration of 12 NYC medical institutions
Focused on clinical genomics
Expecting to handle PHI, HIPAA regulations, FISMA-moderate security from the beginning.
Merging many kinds of data
The center’s mission is to save lives by creating an unprecedented collaboration of technology, science and medicine.
MEMBER INSTITUTIONS
NEW YORK BIOMEDICAL COMMUNITY
Fostering collaboration
Enhancing efficiencies
Promoting advances in medicine faster
Sharing data is essential!!
HOW DO SECURITY, PRIVACY & REGULATIONS AFFECT OUR MISSION?
MY DEFINITIONS
Privacy: Ensuring that information that anyone considers personal and would not want known by others is protected
Security: The means by which we constrain access to data, so that private data is protected from access by unauthorized individuals, and is not changed, removed, or made unavailable by unauthorized individuals.
Regulations: Laws and governmental or organization rules that govern how data may be accessed and used.
ALL OF THESE IMPACT SHARING OF DATA
DATA SHARING AND AGGREGATION ARE CRITICAL
Complex diseases may need huge numbers of samples to gain statistical power
Sequencing more patients when enough sequence exists for a new study is a waste of resources and precious research funding
In rare diseases, you may not ever see the same thing twice...
RISKS OF SHARING YOUR GENOMIC DATA SHOULDN’T BE UNDERESTIMATED EITHER
GINA does not protect against denial of disability coverage, life insurance, long-term care insurance based on genetic information!
For you or your family members!!!!
Some people can afford not to worry about those issues
But for some, it’s critical!
Does sharing only for research projects, not publicly, reduce this risk sufficiently?
AN EXAMPLE: NYC CLINICAL DATA RESEARCH NETWORK
“Both the opportunity and the anxiety are pretty electrifying,” Francis S. Collins, director of the National Institutes of Health, said in an interview.
NYC CLINICAL DATA RESEARCH NETWORK
FUNDED by PCORI
Individual Researchers
NYC-CDRN GOALS
Collect de-identified data from all patients from all of the member health systems
2.5-6.5 million patient records
De-duplicated across health systems
Expect the first 2.5M records (with incomplete data) by August 1, assuming legal approvals
Available for retrospective studies
Available for cohort identification
Will eventually host prospective studies as well
Proposal promised connections to genomic data
THE DETAILS
Expect to have at least 2.4 million patient records by August
Currently have 2M “dummy” records
Waiting for the legalities...
De-duplicated across health systems!
NYGC provides de-identified information only
But we receive “limited data sets” under HIPAA
Healthix and Bronx RHIO (trusted brokers) have identifying information but no health data
What are we permitted to do with this data?
What are the privacy, security, regulatory issues?
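One way the trusted-broker split described above could work is sketched here in Python. This is an illustrative assumption, not a description of how Healthix, Bronx RHIO, or NYGC actually do it: the broker derives a proxy id from identifying fields with a keyed hash (HMAC-SHA256), and the warehouse de-duplicates on that proxy id without ever seeing the identifiers. The key, the field names, and the helpers `proxy_id` and `deduplicate` are all hypothetical, and real record linkage would need fuzzier matching than exact normalization.

```python
import hashlib
import hmac

# Hypothetical secret held only by the trusted broker, never by the warehouse.
BROKER_KEY = b"example-key-held-only-by-the-broker"


def proxy_id(first: str, last: str, dob: str) -> str:
    """Broker side: derive a stable proxy id from normalized identifiers."""
    normalized = "|".join(part.strip().lower() for part in (first, last, dob))
    return hmac.new(BROKER_KEY, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def deduplicate(records: list[dict]) -> dict[str, list[dict]]:
    """Warehouse side: group de-identified records by proxy id only."""
    merged: dict[str, list[dict]] = {}
    for record in records:
        merged.setdefault(record["proxy_id"], []).append(record)
    return merged


# The same (made-up) patient seen at two health systems gets the same proxy id.
pid_a = proxy_id("Jane", "Doe", "1970-01-01")
pid_b = proxy_id(" jane ", "DOE", "1970-01-01")
records = [
    {"proxy_id": pid_a, "system": "A", "dx": "E11"},
    {"proxy_id": pid_b, "system": "B", "dx": "I10"},
]
patients = deduplicate(records)
```

In this arrangement the warehouse holds health data keyed by proxy id and the broker holds identifiers plus the key, so neither side alone can tie a diagnosis to a name.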
PRIVACY: AT WHAT LEVEL CAN WE GUARANTEE THIS?
Patients are “fully de-identified” in any data we make available (according to HIPAA standards)
Is that really true?
One physician tells me that 3 consecutive phosphate readings are fully identifying
Providers do not want to be identified, and we will keep NO provider information
Plan was to provide proxy ids for health systems
Allowing comparisons, but not identification
But patient 3-digit zip codes are permitted by HIPAA in NY
And that will identify the hospital!!!!
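The zip-code problem above can be made concrete with a short Python sketch. It applies the HIPAA Safe Harbor rule for zip codes (keep the first three digits only when that 3-digit area holds more than 20,000 people, otherwise use 000) and then shows how the surviving prefixes profile each health-system proxy id. The population table, the proxy ids, and the sample rows are hypothetical, not real census or patient data.

```python
from collections import Counter, defaultdict

# Hypothetical population counts per 3-digit ZIP prefix (not real census data).
ZIP3_POPULATION = {"100": 1_500_000, "104": 1_400_000, "999": 12_000}
SAFE_HARBOR_MIN_POPULATION = 20_000


def generalize_zip(zip_code: str) -> str:
    """Keep the first three digits only if that area has more than 20,000 people."""
    prefix = zip_code[:3]
    if ZIP3_POPULATION.get(prefix, 0) > SAFE_HARBOR_MIN_POPULATION:
        return prefix
    return "000"


def zip3_profile(rows: list[tuple[str, str]]) -> dict[str, Counter]:
    """Count generalized patient ZIP prefixes per health-system proxy id."""
    profile: dict[str, Counter] = defaultdict(Counter)
    for zip_code, system_proxy in rows:
        profile[system_proxy][generalize_zip(zip_code)] += 1
    return dict(profile)


# (patient zip, health-system proxy id) pairs, all made up.
rows = [("10029", "SYS-1"), ("10032", "SYS-1"), ("10467", "SYS-2"), ("99950", "SYS-3")]
profile = zip3_profile(rows)
dominant = {system: counts.most_common(1)[0][0] for system, counts in profile.items()}
```

Even though every zip code here is generalized as the rule requires, `dominant` still maps each proxy id to the area most of its patients live in, which is the leak the slide describes.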
REGULATORY
Lawyers do not agree on what constitutes re-identification under HIPAA
I can identify cohorts for prospective studies from the collected data.
Can I give those anonymized ids back to the hospital they came from to ask that the patients be contacted for consent to participate in the study?
Or does that constitute knowingly using de-identified data for re-identification purposes, even though I will never see the patient identity?
CLINICAL GENOMICS
Many more challenges
Identifiable information
Many types of data
Electronic Health Records
Genomic Data
Personally reported data
Device data
Image data
Current Auto-Immune Disease Project uses most of these and more
LINKING TO OTHER DATA
Prospective studies with additional (possibly identifiable) data collection
Linking to genomic data
Linking to personal device data, patient-provided data, etc.
How do we isolate identifiable information from the de-identified data, to prevent re-identification, and still allow the data to be linked for studies with appropriate consents?
A security question!!!!
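A minimal Python sketch of one possible answer to the question above: keep the link between proxy ids and identifiable records in a separate store that resolves a link only when consent for that specific project is on file. The `LinkageVault` class, its methods, and the ids used are hypothetical and are not a description of NYGC's system.

```python
from dataclasses import dataclass, field


@dataclass
class LinkageVault:
    """Hypothetical store that alone knows how proxy ids map to identifiable records."""

    _links: dict[str, str] = field(default_factory=dict)          # proxy id -> identifiable record id
    _consents: set[tuple[str, str]] = field(default_factory=set)  # (proxy id, project)

    def register(self, proxy_id: str, identifiable_id: str) -> None:
        """Store the link; callers holding de-identified data never see this table."""
        self._links[proxy_id] = identifiable_id

    def record_consent(self, proxy_id: str, project: str) -> None:
        """Record that this patient consented to linkage for one project."""
        self._consents.add((proxy_id, project))

    def resolve(self, proxy_id: str, project: str) -> str:
        """Return the identifiable record id only if consent exists for this project."""
        if (proxy_id, project) not in self._consents:
            raise PermissionError(f"no consent on file for {proxy_id!r} in {project!r}")
        return self._links[proxy_id]


vault = LinkageVault()
vault.register("p-001", "genome-sample-42")
vault.record_consent("p-001", "autoimmune-study")
linked = vault.resolve("p-001", "autoimmune-study")  # allowed: consent is on file
```

Consent is keyed by project, so a link resolved for one study gives no access for another; a call such as `vault.resolve("p-001", "other-study")` raises `PermissionError`.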
HOW DO WE CONNECT THIS TO GENOMIC DATA?
Genomic data does not fall under HIPAA (yet)
But it is considered “identifying information”
Does accessing genomic data and the de-identified patient data by matching anonymized ids constitute re-identification of the de-identified data?
We may need to keep a new copy (consented) of the same data for each project.
PCORI: A MIX OF PRIVACY, REGULATIONS AND SECURITY ISSUES
Are we using the data in acceptable ways without explicit patient consent?
Are we meeting HIPAA regulations around re-identification and use of limited datasets?
Do we have adequate security around data transfers and access control from external networks (e.g. PCORNet)?
MAINTAINING A GENOMIC DATA WAREHOUSE
NYGC’S GOAL IS TO ENABLE DATA SHARING!
Collecting yet more data
Maintaining a catalog of data hosted by collaborators
Security for multi-tenancy models also!
Secure transmission of data among collaborators
Maintaining our own data securely
DATA SECURITY IS VERY GRANULAR
Protecting researchers from themselves
Ensure protection of unpublished data
IRB approvals and informed consents limit who can use data
Researchers don’t always understand the details
Project-level access control works initially
But data sharing agreements can allow access to only some samples in a project for secondary use
Check boxes on informed consents are a big culprit
And sample-level security is insufficient because owners of data may allow the same samples to be used in multiple studies
But preclude researchers in one study from seeing results of others
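The granularity argument above implies that access has to be decided per (user, project, sample) rather than per project or per sample alone. A minimal Python sketch, with hypothetical names (`Grant`, `AccessPolicy`) and made-up users, projects, and samples:

```python
from typing import Iterable, NamedTuple


class Grant(NamedTuple):
    user: str
    project: str
    sample: str


class AccessPolicy:
    """Hypothetical policy: access is granted per (user, project, sample) triple."""

    def __init__(self) -> None:
        self._grants: set[Grant] = set()

    def allow(self, user: str, project: str, samples: Iterable[str]) -> None:
        """Grant a user access to specific samples within one project."""
        for sample in samples:
            self._grants.add(Grant(user, project, sample))

    def can_read(self, user: str, project: str, sample: str) -> bool:
        """True only if this exact user/project/sample combination was granted."""
        return Grant(user, project, sample) in self._grants


policy = AccessPolicy()
policy.allow("alice", "study-A", ["S1", "S2"])  # primary study
policy.allow("bob", "study-B", ["S1"])          # secondary use of one sample only

bob_own_study = policy.can_read("bob", "study-B", "S1")      # same sample, his study
bob_other_study = policy.can_read("bob", "study-A", "S1")    # same sample, other study's results
bob_other_sample = policy.can_read("bob", "study-B", "S2")   # sample not covered by the agreement
```

Sample S1 appears in both studies, yet bob can read it only in the context of study-B, which is the case that neither project-level nor sample-level control can express.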
OPTIONS FOR ACCESS CONTROL
Force all access through a catalog
  Doesn’t work for methods requiring file paths
  Users hate it
FUSE file systems
  User-space virtual file system
  Too slow
Linux access control
  Doesn’t work with NFS V3
  NFS allows only 16 groups per user
  That limits everyone to 16 project-sample combinations
And it doesn’t work with databases!!
  May well need cell-level access within databases
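To see how quickly the 16-group limit above is reached, here is a small Python sketch. It assumes, as the slide implies, that each project-sample combination a user may read is modelled as one Unix group. The helper names and the example grants are hypothetical.

```python
NFS_MAX_GROUPS = 16  # per-user group limit cited above


def groups_needed(grants: dict[str, set[tuple[str, str]]]) -> dict[str, int]:
    """One Unix group per (project, sample) pair a user may read."""
    return {user: len(pairs) for user, pairs in grants.items()}


def over_limit(grants: dict[str, set[tuple[str, str]]], limit: int = NFS_MAX_GROUPS) -> list[str]:
    """Users whose grants cannot be expressed within the group limit."""
    return sorted(user for user, count in groups_needed(grants).items() if count > limit)


# Made-up grants: alice reads 6 samples in each of 3 projects, bob 4 samples in each of 2.
grants = {
    "alice": {(f"proj{p}", f"s{s}") for p in range(3) for s in range(6)},
    "bob": {(f"proj{p}", f"s{s}") for p in range(2) for s in range(4)},
}
needed = groups_needed(grants)
blocked = over_limit(grants)
```

Three small projects with six samples each already need 18 groups, so alice's access cannot be represented, while bob's 8 combinations still fit.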
SECURITY OF GENOMIC DATA
Supporting prospective studies means maintaining identifiable data
As does storing genomic data, connected or not
Our infrastructure is FISMA-moderate compliant
Is this sufficient?
BAM files are too big to encrypt at rest and still access in pipelines!!
Hardware-assisted encryption still takes 3 hours to decrypt a BAM file
Encrypted disk may be sufficient, but expensive at least
Can’t follow standard HIPAA/HITECH suggestions
EDGE SECURITY
We’re FISMA-moderate compliant
We’ve passed pharma security audits
We’ve passed independent security audits
We regularly do penetration testing
We monitor logs
Is this sufficient?
We’ll never be entirely sure
THE BALANCING ACT!
Collaboration Restrictions
ACKNOWLEDGEMENTS
PCORI: Rainu Kaushal (Cornell, PCORI PI), George Hripsak (Columbia), Parsa Mirhaji (Montefiore), Alex Low (Cornell), Tom Check (Healthix), Tom Campion (Cornell), Deborah Ascheim (Mt Sinai), many others
Rockefeller: Mayu Frank, Dana Orange
NYGC: Cristyn Kells, Dorian Leary, Uday Evani, Nina Lapchyk, Shailu Gargeya, Chris Black, Scott Collins, Jen Baldwin, Bob Darnell
Cornell Tech: Deborah Estrin
Funded In Part by the Patient-Centered Outcomes Research Institute