realising the value of open data: some disciplinary perspectives
DESCRIPTION
Presentation from the CIRCE workshop on ISS data preservation and use. Presents findings from the RECODE project on the value of making data open from the perspective of different research disciplines.
TRANSCRIPT
Realising the value of open data: Some disciplinary perspectives
Susan Reilly, LIBER Projects
@skreilly
Overview
• Introduction: Policy RECommendations for Open access to research Data in Europe (RECODE)
• The open research data agenda
• Case studies: drivers and barriers
• The way forward
Project ReCODE
The project will leverage existing networks, communities and projects to
address challenges within the open access and data dissemination and
preservation sector and produce policy recommendations for open access to research data based on existing good
practice.
Project ReCODE Objectives
• Reduce stakeholder fragmentation
• Identify stakeholder values and inter-relationships
• Identify gaps, tensions and good practices
• Produce a set of guidelines for the sharing of scientific data
• Engagement of stakeholders
• Use 5 cases from different disciplines
By Ken Lund (Flickr: Why, Arizona (2)) [CC-BY-SA-2.0 (http://creativecommons.org/licenses
Clear benefits of open data
http://fav.me/d1y5efr
But if we really want researchers to open their data, maybe we should move from
the general to the specific
Because there are barriers too…
• Cultural differences
• Definition of research data
• Lack of skills/education
• Poorly defined roles and responsibilities
• Lack of infrastructure
• Lack of career incentives
5 case studies
• Particle physics
• Clinical science
• Human physiology
• Environmental science
• Archeology and related disciplines
Particle Physics
• Practice
– Large scale collaborative
– Numerical data, complex analysis software and hardware
– Long time scale
– Grid analysis
• Motivation
– Access for comparison, error testing, less duplication of effort
Particle physics
• Barriers
– Size of data
– Relevance
– Cost of openness
– Complexity
– Needs context (metadata)
– Culture of collaboration + competition
Health Science
• Practices
– Interdisciplinary
– Different data types and sources
– Many stakeholders (commercial, government, practice)
• Motivations
– Faster advancement, more reliable results, access to negative results, duplication, understand genome
Health Science
• Barriers
– Anonymisation
– Commercial interests (competition)
– Variety of formats
– Quality metadata
Archeology
• Practice
– Highly individual, fieldwork
– Lots of data formats
– Lacks standardisation in language, terminology and measurement
• Motivations
– Not replicable, cumulative knowledge, creating narrative
Archeology
• Barriers
– Legacy data
– Not digital
– Context is key: metadata, interoperability
– Unclear research parameters
– Specific skill sets needed (e.g. coding)
– Cost
How do we define open access to research data?
• We can define ‘open access’ (see Berlin Declaration): license to copy, use, distribute and display material subject to proper attribution of authorship and appropriate standard format, online repository, enable unrestricted distribution, interoperability, and long-term archiving.
• But how do we define research data? Data underlying publications, all experimental data? Disciplines need to define what data should be made open
The entire data lifecycle must be addressed
• Open access to data extends across the life cycle of the production of knowledge, from ethical concerns about data collection, characteristics of data collection, data analysis, data management, access to findings, and the status of findings.
• Although some developments are shared across research practices, these are adapted within specific disciplines
Stakeholder fragmentation
• What is the real cost of open data?
• Universities, publishers, public and private research organizations, software developers, libraries, funding bodies and repositories within national, world regions and global science eco-systems
• High interdependency, but lack of clarity around roles and responsibilities
By Oneblackline (Own work) [GFDL (http://www.gnu.org/copyleft/fdl.html) or CC-BY-3.0 (http://creativecommons.org/licenses/by/3.0)], via Wikimedia Commons
Infrastructure & technologies
• Interoperability
• Scalability
• Data quality
• Automatically executable policies
By Anonymous (Guillaume Blanchard, Juillet 2004, Fujifilm S6900.) [CC-BY-SA-2.5-2.0-1.0 (http://creativecommons.org/licenses/by-sa/2.5-2.0-1.0), GFDL (http://www.gnu.org/copyleft/fdl.html), CC-BY-SA-3.0 (http://creativecommons.org/licenses/by-sa/3.0/) or FAL], via Wikimedia Commons
Legal and ethical issues
• Intellectual property
– the database directive, copyright agreements with publishers, can we (libraries/repositories) change the format of data?
• Data protection
– right to be forgotten
A word on the long tail of research data…
• Data that does not fall within the scope of discipline/government repositories
• https://rd-alliance.org/groups/long-tail-research-data-ig/wiki/objectives-interest-group.html
Thank you from the ReCODE partners!