Overcoming obstacles to sharing data about human subjects
Force11 conference
Portland, Oregon
18 April, 2016
Robin Rice
EDINA and Data Library
University of Edinburgh, UK
The elephant in the room (David Blackwell on Flickr)
The status quo
Most data underlying published research, even publicly funded research, are not shared. How can research claims be verified?
Common barriers are well known; confidentiality concerns are high
Qualitative research data and small-scale surveys are not commonly re-used
Tendency is to err on side of caution, given legal & ethical responsibilities
As open science agenda pushes disciplines toward reproducibility, there is a danger of human subject-oriented research falling behind
Redressing the imbalance
Caution vs open data sharing
(Seesaw by harmishhk on Flickr)
What a researcher can do to be able to share
Plan for sharing (via a data management plan)
Don’t collect personal information that is not needed
Principle of informed consent: get consent to share data
Document all data processing (inside & outside analysis package)
Attribute, anonymise, or aggregate individuals’ data
Anonymise it! (by Greendoula on Flickr)
How to create an anonymised, open dataset
Numeric data, e.g. surveys
Remove names and identifiers
Renumber and resort case IDs
Group numbers into categories (banding)
Top and bottom code numbers (age, salaries)
Use standard codes (e.g. SOC, SIC) and geographic boundaries at appropriate levels; not fine-grained
Check for low cell counts in cross-tabs
Qualitative data, e.g. interviews
Share the edited transcript, not video or audio unless consented
Agree a pseudonym with each subject
Remind subject not to disclose personal or sensitive information, e.g. about family members
Replace proper nouns in text (names, placenames etc.) using square brackets; don’t blank out
Avoid over-anonymising or data will lose value
Keep a log of all replacements, generalisations or removals made; store separately from anonymised data
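Several of the steps above can be sketched in a few lines of Python. This is a minimal illustration only, not a method taken from the slides: the function names, field names (`name`, `case_id`), band width, age limits and cell-count threshold are all assumptions chosen for the example, and real thresholds should follow the disclosure-control guidance that applies to the dataset.

```python
import random
import re
from collections import Counter


def band(value, width=10):
    """Group a number into a band, e.g. 37 -> '30-39' (width is an assumed choice)."""
    low = (value // width) * width
    return f"{low}-{low + width - 1}"


def top_bottom_code(value, bottom, top):
    """Pull extreme values in to the chosen limits so outliers do not stand out."""
    return max(bottom, min(top, value))


def renumber_cases(records, seed=None):
    """Resort records at random, drop names and old ids, assign new case ids."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    return [
        {**{k: v for k, v in rec.items() if k not in ("name", "case_id")},
         "case_id": new_id}
        for new_id, rec in enumerate(shuffled, start=1)
    ]


def low_cells(records, row_key, col_key, threshold=3):
    """Return cross-tab cells whose count is below the (assumed) threshold."""
    counts = Counter((r[row_key], r[col_key]) for r in records)
    return {cell: n for cell, n in counts.items() if n < threshold}


def replace_proper_nouns(text, replacements):
    """Swap listed proper nouns for [bracketed] labels and log every change."""
    log = []
    for original, label in replacements.items():
        pattern = re.compile(rf"\b{re.escape(original)}\b")
        text, n = pattern.subn(f"[{label}]", text)
        if n:
            log.append((original, f"[{label}]", n))
    return text, log


if __name__ == "__main__":
    # Made-up records, for illustration only.
    survey = [
        {"case_id": 101, "name": "A", "age": 37, "region": "North", "sex": "F"},
        {"case_id": 102, "name": "B", "age": 91, "region": "North", "sex": "F"},
        {"case_id": 103, "name": "C", "age": 16, "region": "South", "sex": "M"},
    ]
    for rec in renumber_cases(survey, seed=1):
        rec["age"] = band(top_bottom_code(rec["age"], bottom=18, top=85))
        print(rec)
    print(low_cells(survey, "region", "sex"))
```

The log returned by `replace_proper_nouns` contains the original names, so it is itself identifying and should be stored separately from the anonymised text.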
Restrict access, if necessary (James Emery on Flickr)
When open data access is not plausible
When potential for harm to research subjects is too great
Information that can be used to discriminate requires extra protection
When required by the data producer, funder, health authority, etc.
Sometimes precautions are required even for anonymised data
When anonymisation is either not feasible or would destroy value of dataset
Population too small to be anonymous, e.g. those with genetic condition
Lock it up to keep safe (Eric Parker on Flickr)
Take proportionate precautions; ease route to access
Make documentation and/or code about dataset openly available
Use a template for a data access application & data use agreement
Make arrangements for unbiased review of applications for access
Transfer data safely; use secure channels, encryption
Consider options for remote access in preference to on-site-only access
The dangers of data linkage
Data linkage
Probabilistic or ‘fuzzy’ matching is one method used to identify individuals by combining information from different datasets
This can be done for legitimate research purposes, such as matching cases in different government (administrative) datasets
Informed consent is normally impossible for this technique; the data were collected for a different purpose than the current research proposal
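To make the idea of ‘fuzzy’ matching concrete, here is a toy Python sketch. It is an assumption-laden illustration, not the technique used by any particular linkage service: it scores string similarity with the standard library's `difflib.SequenceMatcher` and averages the scores across fields, whereas production record linkage normally uses proper probabilistic models. The field names, weights and threshold are invented for the example.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity of two strings between 0 and 1, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(rec_a, rec_b, weights):
    """Weighted average similarity over the fields named in `weights`."""
    total = sum(weights.values())
    return sum(
        w * similarity(str(rec_a[field]), str(rec_b[field]))
        for field, w in weights.items()
    ) / total


def link(dataset_a, dataset_b, weights, threshold=0.85):
    """Pair each record in A with its best-scoring record in B, if good enough."""
    links = []
    for a in dataset_a:
        best = max(dataset_b, key=lambda b: match_score(a, b, weights))
        score = match_score(a, best, weights)
        if score >= threshold:
            links.append((a["id"], best["id"], round(score, 2)))
    return links


if __name__ == "__main__":
    # Made-up records: neither dataset alone needs to hold a shared identifier.
    health = [{"id": "A1", "name": "Jon Smith", "dob": "1980-02-01", "postcode": "EH8 9LJ"}]
    education = [
        {"id": "B1", "name": "John Smith", "dob": "1980-02-01", "postcode": "EH8 9LJ"},
        {"id": "B2", "name": "Mary Jones", "dob": "1975-11-30", "postcode": "G12 8QQ"},
    ]
    print(link(health, education, {"name": 1, "dob": 1, "postcode": 1}))
```

The same mechanism that lets researchers join administrative datasets also shows why removing names alone may not prevent re-identification: a few quasi-identifiers in combination can be enough to produce a confident match.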
Information governance to the rescue (Ryan Stevens on Flickr)
Information governance
Requires a bigger infrastructure than one researcher can create
Has been developed to meet ethical standards where informed consent is not possible and research is in the public interest
Allowed by the current European Data Directive and the forthcoming new regulation
Makes use of the ‘five safes’: safe data, safe researcher, safe project, safe settings, safe outputs
Check out our free educational resources
Research Data Management Training MANTRA
http://datalib.edina.ac.uk
Research Data Management & Sharing MOOC
www.coursera.org/learn/data-management