Preparing Data for Sharing: The FAIR Principles

PREPARING DATA FOR SHARING The FAIR Principles Gareth Knight London School of Hygiene & Tropical Medicine [email protected] ADMIT Network Meeting 01 December 2015

Upload: london-school-of-hygiene-and-tropical-medicine

Post on 12-Jan-2017





TRANSCRIPT

Page 1: Preparing Data for Sharing: The FAIR Principles

PREPARING DATA FOR SHARING

The FAIR Principles

Gareth Knight

London School of Hygiene & Tropical Medicine

[email protected]

ADMIT Network Meeting

01 December 2015

Page 2: Preparing Data for Sharing: The FAIR Principles

FAIR Principles

Findable

• Descriptive metadata

• Persistent Identifiers

Accessible

• Determining what to share

• Participant consent and risk management

• Access status

Interoperable

• XML standards

• Data Documentation Initiative

• CDISC

Reusable

• Rights and licence models

• Permitted and non-permitted use

http://datafairport.org/

Make your data:

• Findable

• Accessible

• Interoperable

• Reusable

Page 3: Preparing Data for Sharing: The FAIR Principles

Data Sharing in the sciences

• Data sharing has always taken place in some form

• The Enlightenment of the 17th and 18th centuries was built upon open debate and the sharing of knowledge

• Science depends on openness and transparency to advance:

– Replicate results

– Correct errors & address bias

• Negative as well as positive findings need to be in the public domain

“Systematic Dictionary of the Sciences, Arts, and Crafts”, Diderot & d'Alembert (1751 onwards)

Page 4: Preparing Data for Sharing: The FAIR Principles

Data Sharing in the News

“To make progress in science, we need to be open and share.” Neelie Kroes (2012)

Vice-President of the European Commission

http://europa.eu/rapid/press-release_SPEECH-12-258_en.htm


Page 5: Preparing Data for Sharing: The FAIR Principles

Key Motivators

• Research / policy development

• Ensure validity

• Funder requirements

• Publisher requirements

Page 6: Preparing Data for Sharing: The FAIR Principles

Data reuse improves citation rate

• Studies that made data available in a public repository received 9% more citations than similar studies where data was not available

• Data creators tend to cite their own data for up to 2 years

• Third-party use grew over time: for 100 datasets deposited in year 0,

– 40 reuse papers in PubMed in year 2

– 100 by year 4

– 150+ by year 5.

Piwowar, H.A. & Vision, T.J. (2013). Data reuse and the open data citation advantage. PeerJ 1:e175. https://peerj.com/articles/175/

Study of 10,557 articles published between 2001 and 2009 that collected gene expression microarray data

Page 7: Preparing Data for Sharing: The FAIR Principles

Plan for Sharing

Data Management Plan:

• Data to be produced

• Management approach

• Sharing approach

– In what form?

– When will it take place?

– How will it be shared?

[Figure: data management lifecycle: Planning; Data Collection; Database Setup; Data Capture; Data Processing & curation; Archiving & sharing]

https://globalhealthdatamanagement.tghn.org/data-dudes/tools-templates/

Page 8: Preparing Data for Sharing: The FAIR Principles

DATA DISCOVERY

Is your data findable?

Page 9: Preparing Data for Sharing: The FAIR Principles

Discovery Metadata

• Descriptive metadata created to describe key attributes of data:

– Title

– Creator

– Content description

• Data repositories/journals capture and publish discovery metadata in several formats (Dublin Core, DataCite, DDI)

• Metadata is ‘harvested’ by research data catalogues & search engines

• Metadata can be available to all, even if the data are not

Registry of Research Data Repositories

http://service.re3data.org
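As a minimal sketch of what a discovery record contains, the Python below builds a Dublin Core record in the oai_dc container format that OAI-PMH harvesters commonly collect. The function name and example values are hypothetical; in practice repositories generate these records (and their DataCite or DDI equivalents) themselves.

```python
import xml.etree.ElementTree as ET

# Namespaces for Dublin Core elements and the OAI-PMH oai_dc container
DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
ET.register_namespace("dc", DC_NS)
ET.register_namespace("oai_dc", OAI_DC_NS)


def dublin_core_record(title: str, creator: str, description: str, identifier: str) -> str:
    """Return a minimal oai_dc record holding the key discovery attributes."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for element, value in [
        ("title", title),
        ("creator", creator),
        ("description", description),
        ("identifier", identifier),
    ]:
        ET.SubElement(root, f"{{{DC_NS}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical example values
print(dublin_core_record(
    "Example household survey, 2014",
    "Doe, Jane",
    "Anonymised survey responses on household health.",
    "https://doi.org/10.5072/example",
))
```

The record can be published even when access to the data itself is restricted.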


Page 10: Preparing Data for Sharing: The FAIR Principles

Citing Data

• Research data are a citable resource, in the same way as papers & books

• The estimated average lifespan of a web URL is 44-75 days

• A unique, long-term identifier is necessary to enable citation

• Many persistent identifier systems have been developed to solve this problem:

– DOI, Handle, ARK, etc.

• Data citation in reports and publications

UK Data Service: Citing Data

https://www.ukdataservice.ac.uk/use-data/citing-data
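A DOI only helps citation if it is recorded consistently. The sketch below strips common resolver prefixes, applies a loose syntax check ("10." + registrant code + "/" + suffix) and formats a DataCite-style citation ending in a resolvable link. The function names and the example DOI are hypothetical, and the pattern is not a complete DOI validator.

```python
import re

# Loose DOI syntax check: "10." + 4-9 digit registrant code + "/" + non-blank suffix
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
RESOLVER_PREFIXES = ("https://doi.org/", "http://doi.org/",
                     "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalise_doi(raw: str) -> str:
    """Strip a resolver or 'doi:' prefix and check the remainder looks like a DOI."""
    doi = raw.strip()
    for prefix in RESOLVER_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {raw!r}")
    return doi


def cite_dataset(creator: str, year: int, title: str, publisher: str, doi: str) -> str:
    """Format a DataCite-style citation ending in an actionable DOI link."""
    return f"{creator} ({year}): {title}. {publisher}. https://doi.org/{normalise_doi(doi)}"


# Hypothetical dataset; 10.5072 is used here only as an example prefix
print(cite_dataset("Doe, J.", 2015, "Example household survey",
                   "Example Data Repository", "doi:10.5072/EXAMPLE-1"))
```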


Page 11: Preparing Data for Sharing: The FAIR Principles

DATA ACCESS

Do you have permission to share? If so, what can you share?

Page 12: Preparing Data for Sharing: The FAIR Principles

Data Selection

Motivation:

• Meet funder / journal obligations

• Encourage research use

• Higher citation rate

• Reproduce & validate results

Constraints:

• Concern that sharing will lead to a lower response rate or make people less honest

• Intellectual Property Rights issues

• Participant consent doesn't address sharing

• Data Protection legislation

Data sharing decisions should be built upon recognition of all influencing factors

Information Commissioner's Office. Data Sharing Code of Practice

http://www.ico.org.uk/for_organisations/data_protection/topic_guides/data_sharing/


Page 13: Preparing Data for Sharing: The FAIR Principles

Handling individual level data

• Collected and analysed for a specific purpose

• Stored no longer than is necessary

• Kept securely and safely to prevent unauthorised or unlawful access, processing, loss, or destruction

EU Data Protection Directive 95/46/EC establishes limitations on how information on living individuals is held and used

Reform of the data protection legal framework in the EU

http://ec.europa.eu/justice/data-protection/reform/index_en.htm

Page 14: Preparing Data for Sharing: The FAIR Principles

Informed Consent

Covered data:

• Variables

• Anonymised / identifiable

Allowed activities:

• Use in current project, e.g. topics

• Preserve and archive with 3rd party

• Future research – access & use

Communication method:

• Information Sheet

• Face-to-face discussion

Time period for decision:

• Prior to capture

• Following capture & review

https://globalhealthtrainingcentre.tghn.org/articles/informed-consent/

http://retractionwatch.com/2014/02/05/journal-and-authors-apologize-unreservedly-for-distress-caused-to-deceased-childs-family-by-case-report/

Page 15: Preparing Data for Sharing: The FAIR Principles

Data Sharing as a barrier

Investigation of the influence of open data policies on consent rates:

• No participants declined to participate, regardless of condition

• Rates of drop-out vs completion did not vary between open/non-open policies

• No significant change in potential consent rates when participants were openly asked about the influence of open data policies on their likelihood of consent

Some researchers consider sharing obligations to be a barrier to research participation

Page 16: Preparing Data for Sharing: The FAIR Principles

Risk Management

Assess likelihood that data can be used to:

• Identify a person directly

• Infer information about a person

• Link records relating to a person to other information

Determine action to address issue:

• Randomisation - noise addition, permutation

• Generalisation - aggregating results, limiting geographic details

• Pseudonymisation - hash functions

Is there a risk of sharing personal or sensitive information?

UK Information Commissioner's Office: Anonymisation Code of Practice

http://www.ico.org.uk/for_organisations/data_protection/topic_guides/anonymisation


https://www.flickr.com/photos/estherase/2190068148
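The three techniques listed above can be sketched in a few lines of Python. This is an illustration under assumptions, not a complete anonymisation procedure: the function names, the 10-year age bands and the UK-style postcode truncation are hypothetical choices. Pseudonymisation here uses a keyed hash (HMAC-SHA256) in place of a plain hash function, because a plain hash of an identifier drawn from a small set of possible values can be reversed by hashing every candidate.

```python
import hashlib
import hmac
import random


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Pseudonymisation: replace an identifier with a keyed hash (HMAC-SHA256).
    The same identifier always gives the same pseudonym, so records can still be
    linked, but pseudonyms cannot be recomputed without the secret key."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def generalise_age(age: int, band: int = 10) -> str:
    """Generalisation: report an age band instead of the exact age."""
    lower = (age // band) * band
    return f"{lower}-{lower + band - 1}"


def generalise_postcode(postcode: str) -> str:
    """Generalisation: keep only the outward part of a UK-style postcode."""
    return postcode.strip().split()[0].upper()


def add_noise(values, scale: float, rng: random.Random):
    """Randomisation: add Gaussian noise with standard deviation `scale`."""
    return [v + rng.gauss(0, scale) for v in values]


def permute(values, rng: random.Random):
    """Randomisation: shuffle a column so values are no longer tied to their rows."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled


# Hypothetical record; in practice the key is generated securely and stored apart from the data
key = b"replace-with-a-securely-stored-secret"
print(pseudonymise("PARTICIPANT-0042", key), generalise_age(37), generalise_postcode("AB12 3CD"))
```

Which technique is appropriate depends on the risk assessment: noise and permutation distort individual values, while generalisation keeps values true but less precise.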

Page 17: Preparing Data for Sharing: The FAIR Principles

When anonymisation goes wrong

The New York City Taxi & Limousine Commission released an anonymised 20 GB file covering 173 million journeys in response to a Freedom of Information request

Drivers' hack licence and medallion numbers were recovered, revealing drivers' annual incomes

Home addresses and destinations of residents could be identified

Could journeys made by celebrities be identified?

http://research.neustar.biz/2014/09/15/riding-with-the-stars-passenger-privacy-in-the-nyc-taxicab-dataset/

http://arstechnica.com/tech-policy/2014/06/poorly-anonymized-logs-reveal-nyc-cab-drivers-detailed-whereabouts/
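The release is widely reported to have replaced medallion and hack licence numbers with unsalted MD5 hashes. Because those identifiers follow short, known formats, every possible value can be hashed in advance and each published hash reversed by lookup. The sketch below demonstrates the idea on a hypothetical four-character ID format (digit, letter, digit, digit: 26,000 possible values); it is not the real identifier format or the code used in the published analyses.

```python
import hashlib
import itertools
import string


def md5_hex(text: str) -> str:
    """Unsalted MD5 of an identifier: the weak 'anonymisation' step."""
    return hashlib.md5(text.encode("ascii")).hexdigest()


def all_candidate_ids():
    """Every ID in a hypothetical format: digit, letter, digit, digit (e.g. '7B42')."""
    for d1, letter, d2, d3 in itertools.product(
        string.digits, string.ascii_uppercase, string.digits, string.digits
    ):
        yield f"{d1}{letter}{d2}{d3}"


def build_lookup_table():
    """Hash every possible ID once; reversing a hash is then a dictionary lookup."""
    return {md5_hex(candidate): candidate for candidate in all_candidate_ids()}


table = build_lookup_table()
published_value = md5_hex("7B42")   # what an 'anonymised' release would contain
print(len(table), table[published_value])  # 26000 7B42
```

The attack works because nothing secret enters the hash; a keyed hash or random surrogate IDs avoid this.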

Page 18: Preparing Data for Sharing: The FAIR Principles

Access Status

Control method:

• Data Transfer Agreement

• Access controls

Application process:

• Request form

• Review process

Access criteria:

• Permitted users – how do you identify?

• Permitted use – topic, academic use

• Other criteria: encryption, time period

Open vs. controlled access

https://www.flickr.com/photos/toruokada/16958186672/
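The access criteria above can be expressed as a simple check applied during the review process. In this sketch the class names, fields and example values are hypothetical; in practice these conditions are normally written into a Data Transfer Agreement and assessed by people, with code at most supporting the review.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class AccessPolicy:
    permitted_user_types: frozenset   # e.g. {"academic"}
    permitted_purposes: frozenset     # e.g. {"malaria epidemiology"}
    requires_encryption: bool
    max_access_days: int


@dataclass
class AccessRequest:
    user_type: str
    purpose: str
    will_encrypt: bool
    start: date
    end: date


def review(request: AccessRequest, policy: AccessPolicy) -> list:
    """Return the criteria a request fails; an empty list means all criteria are met."""
    failures = []
    if request.user_type not in policy.permitted_user_types:
        failures.append("user type not permitted")
    if request.purpose not in policy.permitted_purposes:
        failures.append("purpose not permitted")
    if policy.requires_encryption and not request.will_encrypt:
        failures.append("encryption required")
    if (request.end - request.start).days > policy.max_access_days:
        failures.append("access period too long")
    return failures


# Hypothetical policy and request
policy = AccessPolicy(frozenset({"academic"}), frozenset({"malaria epidemiology"}), True, 365)
request = AccessRequest("academic", "malaria epidemiology", True, date(2016, 1, 1), date(2016, 6, 30))
print(review(request, policy))  # []
```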

Page 19: Preparing Data for Sharing: The FAIR Principles

DATA INTEROPERABILITY

Can data be analysed and harmonised?

Page 20: Preparing Data for Sharing: The FAIR Principles

Data Standards

Data exchange is dependent upon:

• Open formats

• Common standards

• Documented metadata specification

• Consistent vocabulary

• Documented workflows

https://biosharing.org/
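Consistent vocabulary and open formats can be checked before data leave the project. The sketch below validates coded values against a documented codelist and writes the rows as CSV, an open format. The codelist, column names and function names are hypothetical.

```python
import csv
import io

# Hypothetical codelist, documented alongside the data file
SEX_CODES = {"1": "Male", "2": "Female", "9": "Not stated"}


def invalid_codes(rows, column, codelist):
    """Return (row index, value) for every value not defined in the codelist."""
    return [(i, row[column]) for i, row in enumerate(rows) if row[column] not in codelist]


def to_csv(rows, fieldnames) -> str:
    """Serialise rows as CSV, an open, non-proprietary format."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = [{"id": "001", "sex": "2"}, {"id": "002", "sex": "F"}]
print(invalid_codes(rows, "sex", SEX_CODES))  # [(1, 'F')]
```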

Page 21: Preparing Data for Sharing: The FAIR Principles

Clinical Data Interchange Standards Consortium (CDISC)

Standards intended to improve consistency across the clinical trial lifecycle:

• Protocol: Protocol Representation Model

• Data collection: Clinical Data Acquisition Standards Harmonization (CDASH)

• Data tabulation: Study Data Tabulation Model (SDTM)

• Data analysis: Analysis Data Model (ADaM)

• Archiving and exchange: Operational Data Model (ODM) and Define-XML

Page 22: Preparing Data for Sharing: The FAIR Principles

Data Documentation Initiative

• Maintained & developed by DDI Alliance

• Supported by data archives, producers, research data centers, university data libraries, statistics organizations, etc.

• Two versions:

– DDI2 / Codebook: An archived instance of a study

– DDI3 / DDI Lifecycle: Suitable for longitudinal and repeated surveys

An XML-based metadata standard developed for social science and economic statistics

http://www.ddialliance.org/

Page 23: Preparing Data for Sharing: The FAIR Principles

[Figure: Survey Data Model. Entities: Study; Concepts; Survey Instruments; Questions; Universes; Responses; Variables; Categories/Codes, Numbers; Data Files. Relations shown: measures, using, made up of, about, collect, resulting in, with values of, comprised of.]

Slide source: https://www.unece.org/fileadmin/DAM/stats/documents/ece/ces/ge.33/2011/mtg2/WP_1_Arofan.ppt

Page 24: Preparing Data for Sharing: The FAIR Principles

DDI Codebook

A codeBook consists of:

1. docDscr: describes the DDI document

2. stdyDscr: Title, abstract, methodologies, agencies, access policy

3. fileDscr: a description of files in the dataset

4. dataDscr: variables (name, code, etc.), variable groups, cubes

5. otherMat: other related materials, e.g. document citation

Three levels: study, dataset, variable

Preserves the collection of files associated with an archival copy of a survey
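The five top-level sections can be illustrated by generating a skeleton codeBook with Python's standard XML library. The DDI Codebook schema spells the fifth element `otherMat`. This sketch omits the DDI namespace and most required content, so it is not schema-valid; the helper name and example values are hypothetical.

```python
import xml.etree.ElementTree as ET


def codebook_skeleton(study_title: str, file_name: str, variables) -> ET.Element:
    """Build an outline DDI Codebook document; `variables` is a list of (name, label) pairs."""
    codebook = ET.Element("codeBook")
    ET.SubElement(codebook, "docDscr")                        # 1. the DDI document itself
    study = ET.SubElement(codebook, "stdyDscr")               # 2. the study
    title_stmt = ET.SubElement(ET.SubElement(study, "citation"), "titlStmt")
    ET.SubElement(title_stmt, "titl").text = study_title
    file_dscr = ET.SubElement(codebook, "fileDscr", ID="F1")  # 3. the data file
    ET.SubElement(ET.SubElement(file_dscr, "fileTxt"), "fileName").text = file_name
    data = ET.SubElement(codebook, "dataDscr")                # 4. the variables
    for name, label in variables:
        var = ET.SubElement(data, "var", name=name, files="F1")
        ET.SubElement(var, "labl").text = label
    ET.SubElement(codebook, "otherMat")                       # 5. related materials
    return codebook


# Hypothetical study, file and variable
cb = codebook_skeleton("Example household survey", "survey.csv", [("V101", "Age at last birthday")])
print(ET.tostring(cb, encoding="unicode"))
```

Tools such as Nesstar Publisher produce full, valid codebooks; the sketch only shows how the study, file and variable levels nest.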

Page 25: Preparing Data for Sharing: The FAIR Principles

DDI Lifecycle

http://www.ddialliance.org/what

• Data collector

• Data analyst

• Data curator

• Secondary user

Each stage may be performed by different groups

Page 26: Preparing Data for Sharing: The FAIR Principles

DDI Metadata reuse

Basic metadata can be reused during the life of a study:

• Concepts, questions, responses, variables, categories, codes, survey instruments, etc. may be adopted from earlier waves

Referencing earlier iterations:

• Unique identifier

• Version number - control over time

Common metadata ‘groups’ maintained by specific agencies:

• Schemes: lists of items of a single type

• Modules: metadata for a specific purpose or lifecycle stage

• All maintainable metadata has a known owner or agency

Page 27: Preparing Data for Sharing: The FAIR Principles

Unique ID example

urn="urn:ddi:3_0:VariableScheme.Variable=pop.umn.edu:STUDY0145_VarSch01(1_0).V101(1_1)"

This is a URN from DDI version 3.0 for a variable:

• The scheme agency is pop.umn.edu

• The variable scheme identifier is STUDY0145_VarSch01, version 1.0

• The variable ID is V101, version 1.1

http://www.iza.org/conference_files/eddi09/ppt/thomas_wendy_course.pdf
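The structure of the URN can be made explicit by splitting it into its parts. The regular expression below is fitted to the single DDI 3.0 example above; it is an assumption, not an implementation of the full DDI URN grammar, and the function name is hypothetical.

```python
import re

# Pattern fitted to the DDI 3.0 example (an assumption, not the full URN grammar):
# urn:ddi:<version>:<SchemeType>.<ItemType>=<agency>:<schemeId>(<ver>).<itemId>(<ver>)
DDI_URN = re.compile(
    r"^urn:ddi:(?P<ddi_version>[^:]+):"
    r"(?P<scheme_type>\w+)\.(?P<item_type>\w+)="
    r"(?P<agency>[^:]+):"
    r"(?P<scheme_id>[^(]+)\((?P<scheme_version>[^)]+)\)\."
    r"(?P<item_id>[^(]+)\((?P<item_version>[^)]+)\)$"
)


def parse_ddi_urn(urn: str) -> dict:
    """Split a DDI 3.0-style URN into agency, scheme, item and version parts."""
    match = DDI_URN.match(urn)
    if match is None:
        raise ValueError(f"unrecognised DDI URN: {urn!r}")
    parts = match.groupdict()
    for key in ("ddi_version", "scheme_version", "item_version"):
        parts[key] = parts[key].replace("_", ".")   # "1_0" -> "1.0"
    return parts


example = "urn:ddi:3_0:VariableScheme.Variable=pop.umn.edu:STUDY0145_VarSch01(1_0).V101(1_1)"
print(parse_ddi_urn(example))
```

Because the agency, identifier and version are all part of the URN, a later wave can reference exactly the variable version it reuses.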

Page 28: Preparing Data for Sharing: The FAIR Principles

DDI Cross-study comparison

Variables are comparable if they possess same properties:

• Age is comparable if it has:

– Same concept (e.g., age at last birthday)

– Same top-level universe (people)

– Same representation (i.e., an integer from 0-99)

DDI Comparison module:

• Place similar items in the same group and perform a tailored comparison

• Mappings are context-dependent, i.e. sufficient for the purposes of a particular piece of research
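The comparability rule can be sketched as a comparison of the three properties, ignoring variable names, which often differ between studies. The class, field names and example values are hypothetical and far simpler than the DDI Comparison module.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariableMetadata:
    name: str              # local name; often differs between studies
    concept: str           # e.g. "age at last birthday"
    universe: str          # top-level universe, e.g. "people"
    representation: tuple  # e.g. ("integer", 0, 99)


COMPARED_PROPERTIES = ("concept", "universe", "representation")


def mismatches(a: VariableMetadata, b: VariableMetadata) -> list:
    """List the properties that prevent two variables being treated as comparable."""
    return [p for p in COMPARED_PROPERTIES if getattr(a, p) != getattr(b, p)]


# Hypothetical variables from three studies
study_a = VariableMetadata("AGE", "age at last birthday", "people", ("integer", 0, 99))
study_b = VariableMetadata("V101", "age at last birthday", "people", ("integer", 0, 99))
study_c = VariableMetadata("AGE_YRS", "age at last birthday", "people", ("integer", 0, 120))
print(mismatches(study_a, study_b), mismatches(study_a, study_c))  # [] ['representation']
```

Whether a mismatch such as a wider age range matters is the context-dependent judgement the slide describes.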

Page 29: Preparing Data for Sharing: The FAIR Principles

DDI Tools

DDI Codebook:

• Nesstar Publisher & Server

• IHSN Microdata Management Toolkit

• Colectica

• NADA

• UKDA - DExT, ODaF DeXtris

DDI Lifecycle

• Colectica Designer, Colectica for Excel, Colectica Portal

• Sledgehammer

DDI Tools

http://www.ddialliance.org/resources/tools


Page 30: Preparing Data for Sharing: The FAIR Principles

DATA REUSE

Can data be used for further research?

Page 31: Preparing Data for Sharing: The FAIR Principles

Data Rights

• Many rights apply to data:

– Copyright

– Moral

– Database

– Patents & trade secrets

• Rights issues vary between countries

• Ensure your project has clarified rights issues before sharing

https://www.flickr.com/photos/riekhavoc/4813140176/

Rights issues influence how data can be shared, used and cited

Page 32: Preparing Data for Sharing: The FAIR Principles

Data Licence Models

Many licence models exist, which can be applied at different levels of granularity:

• Creative Commons

• Open Data Commons

• GNU GPL, BSD and others for software

Do you have a standard Data Sharing Agreement within your institution?

A data licence outlines permitted & prohibited use
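A licence can be treated as a set of conditions that a reuser checks against an intended use. The sketch below encodes a deliberately simplified summary of four common data licences, keyed by their SPDX identifiers; the table and function are hypothetical, and the licence text itself remains authoritative.

```python
# Simplified summary of main conditions, keyed by SPDX licence identifier (not exhaustive)
LICENCE_TERMS = {
    "CC0-1.0":      {"attribution": False, "commercial": True,  "share_alike": False},
    "CC-BY-4.0":    {"attribution": True,  "commercial": True,  "share_alike": False},
    "CC-BY-NC-4.0": {"attribution": True,  "commercial": False, "share_alike": False},
    "ODbL-1.0":     {"attribution": True,  "commercial": True,  "share_alike": True},
}


def obligations_for(licence_id: str, commercial: bool) -> list:
    """Return a reuser's main obligations, or raise if the intended use is prohibited."""
    terms = LICENCE_TERMS[licence_id]
    if commercial and not terms["commercial"]:
        raise PermissionError(f"{licence_id} does not permit commercial use")
    duties = []
    if terms["attribution"]:
        duties.append("attribute the data creator")
    if terms["share_alike"]:
        duties.append("share adapted databases under the same licence")
    return duties


print(obligations_for("ODbL-1.0", commercial=True))
```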

Page 33: Preparing Data for Sharing: The FAIR Principles

What secondary use is allowed?

http://www.bbc.co.uk/news/uk-scotland-tayside-central-14744240

http://www.theguardian.com/society/2011/sep/01/cigarette-university-smoking-research-information

Page 34: Preparing Data for Sharing: The FAIR Principles

FAIR data

Reusable

• Consider permitted use

• Apply an appropriate licence

Interoperable

• Use open formats

• Use a consistent vocabulary

• Use common metadata standards

Accessible

• Consider what will be shared

• Obtain participant consent & perform risk management

Findable

• Describe your data in a data repository

• Apply a persistent identifier

Page 35: Preparing Data for Sharing: The FAIR Principles

Thank you for your attention!

Questions