Data sharing as part of the research workflow



Page 1: Data sharing as part of the research workflow

Data sharing as part of the research workflow

Varsha Khodiyar, PhD

Data Curation Editor, Scientific Data

Nature Publishing Group

@varsha_khodiyar

@scientificdata

Perspective from Scientific Data

Data Perspective beyond Alliances, 3rd March 2016

Page 2

Why the push to share data?

Research conduct

• Publication bias – what is submitted

• Experimental design

• Statistics

• Lab supervision and training

Research reporting and sharing

• Gels, microscopy images

• Statistical reporting

• Methods description

• Data deposition and availability


Page 3

Generating research data is expensive

Just 18.1% of NIH grant applications were funded in 2014*

• Hours spent writing grants?

• Hours spent reviewing grants?

Resources are finite and expensive

• Modified animals

• Specialized reagents

Time and effort taken in the laboratory to generate good, valid data

* report.nih.gov/success_rates/Success_ByIC.cfm

Page 4

Data needs to be…

• Discoverable: need to know it’s there

• Accessible: must be able to get to the data

• Usable: requires sufficient information about how the data was generated

• Persistent: historical data access as part of the scientific record, as well as for new research

• Reliable: data provenance informs data reuse decisions

Joint Declaration of Data Citation Principles. www.force11.org/group/joint-declaration-data-citation-principles-final

Starr et al. Achieving human and machine accessibility of cited data in scholarly publications. PeerJ Computer Science (2015). doi:10.7717/peerj-cs.1

Kratz & Strasser. Making data count. Sci. Data (2015). doi:10.1038/sdata.2015.39

Wilkinson et al. The FAIR guiding principles for scientific data management and stewardship. Sci. Data (in press)

Page 5

Researchers already share data

• Most researchers are sharing data, and using the data of others

• Direct contact between researchers (on request) is a common way of sharing data

• Repositories are the second most common method of sharing

Kratz and Strasser (2015) doi:10.1371/journal.pone.0117619

Page 6

But… sharing data on request from the authors of published articles:

• relies heavily on trust

• fails over time when data are stored informally: the odds of a dataset still existing fall by ~17% per year (Vines et al. 2014; doi: 10.1016/j.cub.2013.11.014)

Data shared in a repository

• often not reusable due to insufficient context

• may not be possible to determine reliability (peer review?)

• may not be easily findable if not referenced in a scholarly article

• no scholarly credit for data producers

Page 7

Data papers and journals

• Ensure formal storage in a repository

• Allow space for authors to include sufficient context for reuse

• Peer reviewers are often specifically asked to comment on the reusability of the data archive

• Data papers are formal works, giving scholarly credit to data producers

• Formal data citations enable data discovery via the bibliographic indexes researchers already use

Page 8

Data journals and multidisciplinary research

Cross-domain data sharing is vital for solving the world’s most pressing issues:

• Public health (social science, epidemiology & molecular biology)

• Resource management & sustainability (energy research, policy, ecology & climate science)

Researchers in different domains differ in vocabulary and in how they express reliability. For that reason, clear descriptions of data become even more essential for cross-domain data sharing.

Multidisciplinary data journals (e.g. Data Science Journal, Scientific Data):

• provide a data sharing outlet to researchers in all domains

• help datasets cross domain boundaries, making data more visible and searchable, i.e. less siloed

Page 9

Data reuse by the research community


“The Data Descriptor made it easier to use the data, for me it was critical that everything was there…all the technical details like voxel size.”

Professor Daniele Marinazzo

Page 10

Data reuse by the non-research community


http://www.nytimes.com/interactive/2014/12/30/science/history-of-ebola-in-24-outbreaks.html

Page 11

Increasing the discoverability of data

• Is data truly discoverable by researchers outside the original authors’ domain?

• There are too many papers to read even in one’s own field.

• Could increasing the machine accessibility of data result in increased data reuse?

Page 12

Data Descriptors have human and machine readable components

• Human-readable representation of the study, i.e. the article (HTML & PDF)

• Machine-readable representation of the study, i.e. the metadata

Page 13

Metadata at Scientific Data

• We capture metadata about the data being described in each Data Descriptor

• The manuscript captures the human-readable metadata needed for data reuse

• The curated metadata records capture the machine-readable metadata needed for machine-based data discovery

Page 14

ISA format for machine readable metadata

• Study workflow

• Key sample characteristics needed for data discovery

• Relates samples to data files

• Shows location of dataset

• Uses controlled vocabularies and ontologies (where possible)
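To make these points concrete, here is a minimal Python sketch of the kind of information such a record ties together. It covers an ordered study workflow and samples whose characteristics are annotated with ontology terms where possible. It also links data files back to the samples they came from and records where the dataset is deposited. The class and field names are hypothetical. This is not the ISA-Tab or ISA-JSON schema, nor the API of any ISA software. The example values, including the dataset location, are placeholders.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OntologyTerm:
    # A characteristic value, optionally mapped to a controlled-vocabulary term.
    # Field names are hypothetical, not taken from the ISA specification.
    label: str
    source: Optional[str] = None     # ontology acronym, e.g. "NCBITaxon"
    accession: Optional[str] = None  # term identifier within that ontology


@dataclass
class Sample:
    name: str
    characteristics: dict[str, OntologyTerm] = field(default_factory=dict)


@dataclass
class DataFile:
    filename: str
    derived_from: list[str]  # names of the samples this file was generated from


@dataclass
class StudyRecord:
    title: str
    workflow: list[str]        # ordered protocol steps
    samples: list[Sample]
    data_files: list[DataFile]
    dataset_location: str      # where the dataset is deposited (placeholder below)

    def files_for_sample(self, sample_name: str) -> list[str]:
        # Follow the sample -> data file links.
        return [f.filename for f in self.data_files if sample_name in f.derived_from]

    def unannotated_characteristics(self) -> list[tuple[str, str]]:
        # (sample, characteristic) pairs still recorded only as free text.
        return [
            (s.name, key)
            for s in self.samples
            for key, term in s.characteristics.items()
            if term.accession is None
        ]


# Hypothetical example record; all values are placeholders.
example = StudyRecord(
    title="Example transcriptomics study",
    workflow=["sample collection", "RNA extraction", "sequencing"],
    samples=[
        Sample("sample-1", {"organism": OntologyTerm("Homo sapiens", "NCBITaxon", "9606")}),
        Sample("sample-2", {"organism": OntologyTerm("Homo sapiens", "NCBITaxon", "9606"),
                            "tissue": OntologyTerm("liver")}),
    ],
    data_files=[
        DataFile("sample1_reads.fastq", ["sample-1"]),
        DataFile("sample2_reads.fastq", ["sample-2"]),
        DataFile("counts_matrix.tsv", ["sample-1", "sample-2"]),
    ],
    dataset_location="https://repository.example.org/dataset/placeholder",
)

print(example.files_for_sample("sample-1"))     # ['sample1_reads.fastq', 'counts_matrix.tsv']
print(example.unannotated_characteristics())    # [('sample-2', 'tissue')]
```

The `unannotated_characteristics` helper shows why the "where possible" caveat matters. Values that could not be mapped to an ontology term stay searchable only as free text.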

Page 15

Metadata for data discovery

Search by:

• Data repositories

• Experiment design

• Measurements made

• Technologies used

• Factor types

• Sample characteristics: organism, environment types, geographic locations

scientificdata.isa-explorer.org
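Searching by these facets amounts to filtering curated metadata records on their values. The sketch below assumes each dataset's metadata has been reduced to a mapping from facet name to a set of values. The facet names, identifiers and records are hypothetical, and this is not how the ISA explorer is actually implemented.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    identifier: str
    facets: dict[str, set[str]]  # facet name -> values recorded for this dataset


def facet_search(records: list[DatasetRecord], **filters: str) -> list[str]:
    # Keep datasets that carry every requested facet value (logical AND).
    return [
        r.identifier
        for r in records
        if all(value in r.facets.get(facet, set()) for facet, value in filters.items())
    ]


def facet_counts(records: list[DatasetRecord], facet: str) -> dict[str, int]:
    # Count how many datasets carry each value of one facet,
    # as a search interface might show next to each filter option.
    counts: dict[str, int] = {}
    for r in records:
        for value in r.facets.get(facet, set()):
            counts[value] = counts.get(value, 0) + 1
    return counts


# Hypothetical records for illustration only.
records = [
    DatasetRecord("ds-1", {"organism": {"Homo sapiens"}, "technology": {"RNA sequencing"},
                           "repository": {"GEO"}}),
    DatasetRecord("ds-2", {"organism": {"Mus musculus"}, "technology": {"RNA sequencing"},
                           "repository": {"GEO"}}),
    DatasetRecord("ds-3", {"organism": {"Homo sapiens"}, "technology": {"mass spectrometry"},
                           "repository": {"PRIDE"}}),
]

print(facet_search(records, organism="Homo sapiens"))                               # ['ds-1', 'ds-3']
print(facet_search(records, organism="Homo sapiens", technology="RNA sequencing"))  # ['ds-1']
print(facet_counts(records, "repository"))                                          # {'GEO': 2, 'PRIDE': 1}
```

Combining facets with AND narrows the result set step by step. That suits a researcher from outside the original domain who does not know which journal or repository to look in.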

Page 16

Data as part of the publication workflow

Data Descriptors can be submitted and published at any point in the research workflow:

• After the data analysis has been published

• Before the analysis has been published

• Alongside the analysis article

• When the authors do not intend to analyse the data

Page 17

Data as part of the research workflow?

Papers are usually written after the analyses, so key details can be forgotten

• Ideally metadata would be captured during data generation process

• Takes time and effort to capture adequate metadata of sufficient quality for data reuse

Machine readable metadata

• Metadata format needs to be decided prospectively

• Researchers require professional expertise and guidance to use ontologies (essential for machine readability and discovery)

How to ensure data generators are able to capture metadata easily and in sufficient detail for reuse?


Page 18

Building infrastructure to promote data sharing as part of the research workflow

Discoverable

• Machine-based data discovery

• Implement data citations

• Use community ontologies

Accessible & Persistent

• Encourage use of repositories

• Use persistent identifiers for data

Usable

• Metadata capture during the data generation process

• Encourage use of minimal reporting standards

Reliable

• Encourage peer reviewers to evaluate the data archive (structure, format) alongside the article

Researcher incentives

• Recognise data as a first-class scholarly work

• Provide tools for data visualization and discovery

Page 19

Scientific Data at RDA (Research Data Alliance)

Working groups

• Publishing Data Workflows (co-chair)

• BioSharing Registry (Susanna Sansone is co-chair)

Interest groups

• Publishing Data

• Data Fabric

• Data in Context

• Metadata

• Certification of Digital Repositories

Page 20

Visit nature.com/sdata

Email [email protected]

Tweet @ScientificData

Honorary Academic Editor: Susanna-Assunta Sansone

Managing Editor: Andrew L. Hufton

Data Curation Editor: Varsha K. Khodiyar

Advisory Panel and Editorial Board including senior researchers, funders, librarians and curators

Supported by