overcoming obstacles to sharing data about human subjects

Overcoming obstacles to sharing data about human subjects

Force11 conference

Portland, Oregon

18 April 2016

Robin Rice

EDINA and Data Library

University of Edinburgh, UK

The elephant in the room (David Blackwell on Flickr)

The status quo

Most data underlying published research, even publicly funded research, are not shared. How can research claims be verified?

Common barriers are well known; confidentiality concerns are high

Qualitative research data and small-scale surveys are not commonly re-used

The tendency is to err on the side of caution, given legal and ethical responsibilities

As the open science agenda pushes disciplines toward reproducibility, there is a danger of human subject-oriented research falling behind

Redressing the imbalance

Caution vs open data sharing

(Seesaw by harmishhk on Flickr)

What a researcher can do to be able to share

Plan for sharing (via a data management plan)

Don’t collect personal information that is not needed

Principle of informed consent: get consent to share data

Document all data processing (inside & outside analysis package)

Attribute, anonymise, or aggregate individuals’ data

Anonymise it! (by Greendoula on Flickr)

How to create an anonymised, open dataset

Numeric data, e.g. surveys

Remove names and identifiers

Renumber and re-sort case IDs

Group numbers into categories (banding)

Top- and bottom-code numbers (age, salaries)

Use standard codes (e.g. SOC, SIC) and geographic boundaries at appropriate levels; not fine-grained

Check for low cell counts in cross-tabs

Qualitative data, e.g. interviews

Share the edited transcript, not video or audio unless consented

Agree a pseudonym with each subject

Remind subject not to disclose personal or sensitive information, e.g. about family members

Replace proper nouns in text (names, placenames etc.) using square brackets; don’t blank out

Avoid over-anonymising or data will lose value

Keep a log of all replacements, generalisations or removals made; store separately from anonymised data
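Several of the anonymisation steps above can be sketched in a few lines of Python. This is a minimal illustration only, not a complete disclosure-control procedure: the function names, the ten-year band width, the cell-count threshold of 3 and the field name `case_id` are all assumptions made for the example, and real projects should pick values in line with their own ethical and legal guidance.

```python
from collections import Counter
import random
import re


def band_age(age, width=10):
    """Group an exact age into a band such as '30-39' (width is an assumption)."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def top_bottom_code(value, bottom, top):
    """Clamp extreme values so that unusual cases do not stand out."""
    return max(bottom, min(top, value))


def renumber_cases(records, seed=None):
    """Shuffle the records and give each a fresh case id, discarding the old one."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    renumbered = []
    for new_id, record in enumerate(shuffled, start=1):
        row = {k: v for k, v in record.items() if k != "case_id"}
        row["case_id"] = new_id
        renumbered.append(row)
    return renumbered


def low_cell_counts(records, keys, threshold=3):
    """Return the cross-tab cells that hold fewer than `threshold` cases."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return {cell: n for cell, n in counts.items() if n < threshold}


def replace_proper_nouns(text, replacements):
    """Swap listed proper nouns for bracketed labels and log each change.

    The log should be stored separately from the anonymised text.
    """
    log = []
    for original, label in replacements.items():
        pattern = re.compile(r"\b" + re.escape(original) + r"\b")
        text, n = pattern.subn(lambda m: f"[{label}]", text)
        if n:
            log.append((original, f"[{label}]", n))
    return text, log
```

For example, `band_age(34)` gives `'30-39'`, `top_bottom_code(250000, 15000, 100000)` gives `100000`, and `replace_proper_nouns("I met Moira in Leith.", {"Moira": "friend", "Leith": "neighbourhood"})` returns the text `I met [friend] in [neighbourhood].` together with a two-entry log. Any cell reported by `low_cell_counts` is a candidate for wider bands or coarser geography.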

Restrict access, if necessary (James Emery on Flickr)

When open data access is not plausible

When potential for harm to research subjects is too great

Information that can be used to discriminate requires extra protection

When required by the data producer, funder, health authority, etc.

Sometimes precautions are required even for anonymised data

When anonymisation is either not feasible or would destroy value of dataset

Population too small to be anonymous, e.g. those with genetic condition

Lock it up to keep safe (Eric Parker on Flickr)

Take proportionate precautions; ease route to access

Make documentation and/or code about dataset openly available

Use a template for a data access application & data use agreement

Make arrangements for unbiased review of applications for access

Transfer data safely; use secure channels, encryption

Consider options for remote access in preference to on-site-only access

The dangers of data linkage

Data linkage

Probabilistic or ‘fuzzy’ matching is one method used to identify individuals by combining information from different datasets

This can be done for legitimate research purposes, such as matching cases in different government (administrative) datasets

Informed consent is normally impossible for this technique; the data were collected for a different purpose than the current research proposal
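To make the idea of ‘fuzzy’ matching concrete, here is a minimal sketch using plain string similarity from Python's standard library. It is a deliberately simplified stand-in: true probabilistic linkage weights each field by how informative agreement on it is, which this example does not do. The function names, the field names and the 0.85 threshold are assumptions for illustration, and the records are invented.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0-1 similarity ratio between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def link_records(left, right, fields, threshold=0.85):
    """Pair each left record with its best-scoring right record.

    A pair is kept only if the average similarity across `fields`
    reaches `threshold` (an arbitrary value chosen for this sketch).
    """
    links = []
    for l in left:
        best, best_score = None, 0.0
        for r in right:
            score = sum(similarity(str(l[f]), str(r[f])) for f in fields) / len(fields)
            if score > best_score:
                best, best_score = r, score
        if best is not None and best_score >= threshold:
            links.append((l, best, round(best_score, 2)))
    return links


# Invented records: the same person spelled differently in two datasets.
dataset_a = [{"name": "Jon Smith", "postcode": "EH8 9LJ"}]
dataset_b = [
    {"name": "John Smith", "postcode": "EH8 9LJ"},
    {"name": "Mary Jones", "postcode": "G12 8QQ"},
]
links = link_records(dataset_a, dataset_b, ["name", "postcode"])
```

Here the two spellings of the name are linked even though they are not identical, which is exactly why removing or altering a single identifier may not be enough once datasets can be combined.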

Information governance to the rescue (Ryan Stevens on Flickr)

Information governance

Requires a bigger infrastructure than one researcher can create

Has been developed to meet ethical standards where informed consent is not possible and research is in the public interest

Allowed by the current European Data Protection Directive and the forthcoming new regulation

Makes use of the ‘five safes’: safe data, safe researcher, safe project, safe settings, safe outputs

Check out our free educational resources

Research Data Management Training MANTRA

http://datalib.edina.ac.uk

Research Data Management & Sharing MOOC

www.coursera.org/learn/data-management
