TRANSCRIPT
1
Research Data Management
GFII, Paris, February 2014
2
Overview
• Key drivers (and barriers) to the increase in volume of
research data, and its sharing
• Research data management today
• What is done with research data, and what is needed
• Elsevier plans with respect to research data management
3
Key Driver: Advances in Technology
• Faster, easier, cheaper, more computational methods of doing
science by data reuse
• The coming of age of analytics yields new layers of insight on the same
data
• Easier, better linked-data technologies for building webs of data
4
Key Driver: Government Regulations
http://www.nihms.nih.gov/stats/
• Despite their independent nature, scientists need to follow regulations
• Independent grants (as opposed to direct funding) are increasingly driving science
• Example: Open Access submissions to BMC:
Policy is announced
Non-compliance is punished
Trends:
• HIPAA (patient privacy act) drove wide-
scale adoption in DMS systems in Health
• Title 21 CFR Part 11 (tracing provenance
of data points) likewise drove usage of
FDA-compliant systems in Pharma
• More detailed data sharing mandates are
in process, in the US and in Europe, e.g.
the OSTP mandate
• Compliance enforcement will be a
main driver for use of data
management systems in academia,
and for sharing of data
“We’re not going to spend any more
money for you to go out and get
more data! We want you first to show
us how you’re going to use all the
data we paid y’all to collect in the
past!”, Barbara Ransom, NSF
Program Director Earth Sciences
2/13
5
Key Driver: Research Institutions’ Reactions to Mandates
• Data Management training
• Dedicated Research Data Management staff
• Encouraging and helping researchers with Data Management Plans
• Establishing Research Data Management Infrastructures
• And more …
Encouraging and facilitating good data management
practices, from curation to sharing
6
Key Driver: Reward Systems
NSF Grant proposal guidelines Chapter II.C.2.f(i)(c)
Tenopir et al., 2013
Piwowar et al., 2007
• Growing awareness of citation impact of shared data:
[Chart: citations, 2004-2005, data not shared vs. data shared]
• NSF allows data to be a research output relevant for
grant applications
• Increasing number of data journals, enabling
sharing, recognition and citation
• Data citation policies are becoming standardised,
e.g. DataCite, CODATA Data Citation report,
Force11 Data Citation Synthesis group
• Growing interest among forward-looking research offices
in gathering data-related statistics
There are various trends driving towards the increased value of data in ‘merit
metrics’:
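A standardised data citation of the kind mentioned above can be sketched in a few lines. This is only an illustration, assuming a creators / year / title / publisher / identifier layout similar to the one DataCite recommends; the `DataCitation` class, its field names, and the example record are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataCitation:
    """Hypothetical minimal record for citing a dataset."""
    creators: List[str]
    year: int
    title: str
    publisher: str   # the repository that holds the data
    identifier: str  # a persistent identifier such as a DOI

    def format(self) -> str:
        # Assumed layout: Creators (Year). Title. Publisher. Identifier
        names = "; ".join(self.creators)
        return f"{names} ({self.year}). {self.title}. {self.publisher}. {self.identifier}"


# Placeholder values for illustration only; this is not a real dataset.
example = DataCitation(
    creators=["Doe, J.", "Roe, R."],
    year=2013,
    title="Example sediment measurements",
    publisher="Example Repository",
    identifier="doi:10.0000/example",
)
print(example.format())
```

Keeping the identifier as its own field is a deliberate choice in this sketch: it is the part a citation index or research office would most likely need in order to count reuse.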
7
Key Driver: Better Science
• Data sharing can (be perceived to) lead to better science. This opinion is
commonly shared by scientists:
Trends:
• Data sharing and reuse are increasingly on the agenda. Along with mandates,
credit, and other aspects this will play a positive role in improved research data
management and sharing
“Data needs to be stored and
organized in a way that will
allow researchers to access,
share, and analyze the
material.”, Tenopir et al, citing
a 2009 PARSE Study
“Integration of disparate data, often at
different biological scales, is a major
characteristic of current and future
biomedical research discoveries …
Data sharing policies are a great step
forward …”, from Phil Bourne’s letter
to apply for the post of NIH’s
Associate Director for Data Science
8
Key Barrier: Researcher Mind-set & Effort
• Changing the workflow, managing & sharing data can cost precious time
• Sharing data is not sufficiently part of the “culture”
Trends:
• Tools tailored to the academic environment (e.g. web-based ELNs) are likely to
facilitate data management and sharing without disrupting the workflow
• Continued culture shift increasingly encouraging the sharing of data
Constant pressure to publish!
“There’s no way I’m going to spend more time documenting my
experiments. I wouldn’t have time left for my research!”,
researcher at King’s College, London
9
Key Barrier: Intellectual Property
• There is also a perception that managing
data for reuse implies data will be shared: storing/curating/sharing are sometimes
conflated – adversely affecting not only sharing, but also use of data management tools
Trends:
• The ‘Open Science’ trend is encouraging data sharing
• Technology will likely be increasingly recognized as facilitating IP protection
• Embargo periods are becoming accepted (sharing data after publication, not before)
“Once I have tenure, I’ll share my data. Until then, I’ll use it to get
tenure. I don’t want to get scooped.”, PI at CMU, voicing the view of many …
“I really should share my data. I’m not sure why I don’t. I guess I am
afraid that someone will use it.”, researcher at University of Michigan
10
Research Data Management Today:
Using antibodies
and squishy bits
Grad Students experiment
and enter details into their
lab notebook.
The PI then tries to make
sense of their slides,
and writes a paper.
End of story.
11
Where Research Data Currently Goes (and what
needs to be done):
Research Data Output
Majority of data (90%?) is stored on
local/university hard drives
MiRB
PetDB
TAIR
PDB
SedDB
A small portion of data (1-2%?) stored in small, topic-focused data repositories
Figshare
Dataverse
Institutional Repositories
Some data (8%?) stored in the main
generic data repositories
Zenodo
Dryad
1. INCREASE DATA DIGITISATION
3. IMPROVE REPOSITORY INTEROPERABILITY
4. DEVELOP SUSTAINABLE MODELS
12
Elsevier and Research Data
*Based on the Royal Society’s “Science as an Open Enterprise” Report, 2012
Discoverability Accessibility Intelligibility Assessability Reusability
Linking articles to data in
repositories (ongoing effort)
Publishing data journals / “microarticles”
Collaborative projects with research institutions (ongoing effort)
Journal Data Policies (in 2014)
Data Search
Citation Index
Management
Pure
“Intelligent Openness”*
Longer term efforts
13
Elsevier Content Innovation – Adding Value to Content
Elsevier offers a range of possibilities to
interlink articles and data
Linking through in-article data accession
numbers (e.g. GenBank, Protein Data
Bank)
Database banners shown next to the
article on ScienceDirect (e.g. DRYAD)
Data-integration and visualization tools
that integrate relevant data into the article
page view for exploration (e.g. PANGAEA)
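The first option above, linking through in-article accession numbers, can be illustrated with a small sketch. It assumes that mentions carry an explicit "PDB" label followed by a four-character ID; the pattern, the URL template, and the `find_pdb_links` helper are assumptions made for illustration and are not a description of how ScienceDirect implements linking.

```python
import re

# Assumed pattern: an explicit "PDB" label followed by a 4-character ID
# (a digit 1-9, then three letters or digits). Real linking rules may differ.
PDB_MENTION = re.compile(r"\bPDB(?:\s+ID)?:?\s+([1-9][A-Za-z0-9]{3})\b")

# Assumed URL template, used here only for illustration.
PDB_URL = "https://www.rcsb.org/structure/{id}"


def find_pdb_links(text: str) -> dict:
    """Map each labelled PDB ID found in `text` to a repository URL."""
    links = {}
    for match in PDB_MENTION.finditer(text):
        pdb_id = match.group(1).upper()
        links[pdb_id] = PDB_URL.format(id=pdb_id)
    return links


# Made-up sentence; the IDs are chosen only to exercise the pattern.
sample = "Coordinates were deposited under PDB ID 1abc and PDB: 2XYZ."
print(find_pdb_links(sample))
```

Requiring the "PDB" label is a conservative choice: a bare four-character match would also pick up ordinary tokens such as years.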
Interactive in-article data viewers enable authors to better present their research and
help readers to build unique insights
Chemical
compounds
Interactive plots
3D models
Inline Supplementary Material presents data
within the article, giving the appropriate
context
• Supplementary material inserted at the
place of reference/citation
• Make it easier for readers to find data
• Put data in the right context
14
“Urban Legend” Project
Reducing Barriers to Data Sharing
• Pilot project focusing on collection of metadata in an electrophysiology lab at CMU
• How can we increase data digitization and curation, enabling sharing and
collaboration?
Moving from this:
15
“Urban Legend” System Overview