LIBER Webinar: Are the FAIR Data Principles Really Fair?
TRANSCRIPT
WEBINAR: Research Data Services
WEBINAR: 23 Things for Research Data Management
Join the conversation: #FAIRdata
Speakers
Alastair [email protected]
Moderating
Birgit Schmidt, Goettingen State and University [email protected]
Rob Grim, Erasmus University [email protected]
• Alastair Dunning @alastairdunning
• Jasmin Böhmer @JasminBoehmer
Technical University of Delft, hosts of 4TU.Centre for Research Data
Are the FAIR Data Principles Fair?
• Madeleine de Smaele @MadeleineSmaele
Blog post with all the information: http://bit.ly/2lIgc9p
LIBER Webinar, 10th March
Motivation for this Project
● H2020 / EU demands on open data and research data management.
● Providing insight and support for repositories to improve their information architecture and digital infrastructure to comply with H2020 and FAIR demands.
● Own aspiration to offer the best possible service and support for 4TU.Centre for Research Data.
● Working towards practices to improve interoperability and reuse-value of data-sets in research data repositories.
Methodology
● Using the FAIR principles and corresponding facets as a scoring matrix
● Applying a traffic-light rating system
● Using the information available on the repository's web interface to evaluate the FAIR principles
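A scoring matrix with traffic-light ratings could be represented roughly as follows. This is only a sketch: the repository names, the ratings, and the meaning attached to each colour are assumptions made for illustration, not the project's actual data or definitions.

```python
from collections import Counter
from enum import Enum


class Rating(Enum):
    """Traffic-light ratings; the meaning of each colour is an assumption."""
    GREEN = "complies"
    AMBER = "partly complies or unclear"
    RED = "does not comply"


# Hypothetical scoring matrix: one row per repository, one column per facet.
# Repository names and ratings are invented for illustration only.
scores = {
    "repo_a": {"F1": Rating.GREEN, "A4": Rating.RED, "I2": Rating.RED, "R1": Rating.AMBER},
    "repo_b": {"F1": Rating.RED, "A4": Rating.RED, "I2": Rating.RED, "R1": Rating.GREEN},
}


def share_not_compliant(matrix, facet):
    """Return the fraction of repositories rated RED for one facet."""
    ratings = [row[facet] for row in matrix.values()]
    return Counter(ratings)[Rating.RED] / len(ratings)


f1_red_share = share_not_compliant(scores, "F1")
a4_red_share = share_not_compliant(scores, "A4")
print(f"F1 not compliant: {f1_red_share:.0%}")
print(f"A4 not compliant: {a4_red_share:.0%}")
```

Keeping one rating per repository and facet makes it straightforward to report percentages per principle, as in the findings that follow.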
Our Interpretation of the FAIR Principles
http://bit.ly/2lI2CCJ
1. Compliance is not high
General Overview Charts (N = 37): https://data.4tu.nl/repository/uuid:5146dd06-98e4-426c-9ae5-dc8fa65c549f
[Charts: Findable, Accessible, Interoperable, Re-Usable]
F1 (meta)data are assigned a globally unique and eternally persistent identifier.
49% of the repositories do not assign a DOI, Handle, or URN.
E.g. subject-based repositories use project IDs or subject-specific ID systems. These links do not work in the public sphere.
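A check of this kind can be partly automated. The sketch below guesses whether an identifier string looks like a DOI, Handle, or URN. The regular expressions are deliberately simplified assumptions rather than the official syntax rules, and the sample identifiers are hypothetical.

```python
import re

# Simplified patterns for illustration; they do not cover every valid identifier.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
HANDLE_PATTERN = re.compile(r"^\d+(\.\d+)*/\S+$")
URN_PATTERN = re.compile(r"^urn:[a-z0-9][a-z0-9-]{0,31}:\S+$", re.IGNORECASE)

# Common prefixes that wrap the bare identifier.
PREFIXES = (
    "https://doi.org/", "http://doi.org/",
    "https://dx.doi.org/", "http://dx.doi.org/",
    "https://hdl.handle.net/", "http://hdl.handle.net/",
    "doi:", "hdl:",
)


def identifier_scheme(identifier):
    """Return 'DOI', 'Handle', 'URN', or None if no pattern matches."""
    value = identifier.strip()
    for prefix in PREFIXES:
        if value.lower().startswith(prefix):
            value = value[len(prefix):]
            break
    # DOIs are a kind of Handle, so test the more specific pattern first.
    if DOI_PATTERN.match(value):
        return "DOI"
    if URN_PATTERN.match(value):
        return "URN"
    if HANDLE_PATTERN.match(value):
        return "Handle"
    return None


# Hypothetical sample identifiers.
for sample in ("https://doi.org/10.1234/example-dataset", "PROJECT-42"):
    print(sample, "->", identifier_scheme(sample))
```

A project ID such as the second sample matches none of the patterns, which is the situation described for the subject-based repositories.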
A4 metadata are accessible, even when the data are no longer available.
97% of the repositories do not clearly state whether their metadata persist when the data are no longer available.
The transparency and integrity of the repository are improved by providing metadata records for closed, restricted, or unavailable data-sets.
I2 (meta)data use vocabularies that follow FAIR principles.
100% of the repositories have no visible ontologies or (controlled) vocabularies.
Adding a semantic layer that enables links to unambiguous terms and definitions needs a lot of curation effort.
Is e.g. ORCID (Open Researcher and Contributor ID) a vocabulary?
R1 (meta)data have a plurality of accurate and relevant attributes.
38% of the repositories do not provide enough information for the information seeker to judge the reuse value of the data.
Specific information is mostly included in the documentation. Displaying that information in appropriate metadata fields would be beneficial.
2. Some principles are easily measured; some are much more subjective
Pretty Obvious - (meta)data are assigned a globally unique and eternally persistent identifier.
Vague - data are described with rich metadata
What makes metadata rich?
Subjective - (meta)data meet domain-relevant community standards
Philosophically dubious - (meta)data use vocabularies that follow FAIR principles
3. Some principles are narrow; others are broad
Narrow - (meta)data are retrievable by their identifier using a standardized communications protocol.
the protocol is open, free, and universally implementable.
the protocol allows for an authentication and authorization procedure, where necessary.
Broad - (meta)data include qualified references to other (meta)data.
(meta)data meet domain-relevant community standards (takes a long time to figure out)
Technical vs Policy
● (meta)data are retrievable by their identifier using a standardized communications protocol.
● the protocol is open, free, and universally implementable.
● the protocol allows for an authentication and authorization procedure, where necessary.
● metadata are accessible, even when the data are no longer available.
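On the technical side, retrieval by identifier over a standardized protocol usually means HTTP. The sketch below prepares, but does not send, an HTTP request to the public DOI resolver at https://doi.org/. The DOI is hypothetical, and the helper name `build_lookup_request` is an assumption made for illustration.

```python
from urllib.parse import quote
from urllib.request import Request

DOI_RESOLVER = "https://doi.org/"  # public DOI resolver


def build_lookup_request(doi):
    """Prepare a HEAD request that asks the resolver where a DOI points."""
    url = DOI_RESOLVER + quote(doi, safe="/")
    return Request(url, method="HEAD")


# Hypothetical DOI; nothing is sent over the network here.
request = build_lookup_request("10.1234/example-dataset")
print(request.get_method(), request.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would perform the lookup. Whether the metadata remain reachable once the data are gone is a policy question that no request can settle by itself.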
4. Some subject areas fare badly
Compliance of Social Science Data Repositories against FAIR Findable Principles (F1, F2, F3 and F4)
Practice for Social Science Repositories Analysed
● Data only available on request
● Licence not visible / clear
● Plenty of free text documentation on collection of data exists
● No structured metadata per dataset / no machine readable metadata
● But still seem to work well within the discipline
LASA - Longitudinal Aging Study Amsterdam. Aging research and collecting data on aging in the Netherlands
No global identifier
No structured metadata
But plenty of documentation
Practice for Climate Data Repositories Analysed
● Licence sometimes clear (no data protection issues)
● Some free text documentation on the overall collection of data exists
● No structured metadata per dataset / sometimes the data are dynamically created in response to a query
● No global identifiers per dataset
● Meeting existing disciplinary norms but not fully embedded as machine readable data
SACA - Southeast Asian Climate Assessment
No structured metadata
But plenty of documentation
No global identifier
5. For repositories, doing some simple(ish) things vastly helps compliance
● Create a permanent identifier for each dataset
● Always use an open or otherwise clear licence
● Make sure each dataset has rich metadata associated with it (Dublin Core is a good starting place!)
● Make data available via HTTP
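As a rough illustration of the first three points, the sketch below builds a small Dublin Core record with Python's standard library. The namespace URI is the standard Dublin Core elements namespace. The wrapper element name, the helper function, and every field value (title, creator, DOI, licence URL, date) are hypothetical.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core elements namespace
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build an XML element holding one dc:* child per metadata field."""
    record = ET.Element("metadata")  # wrapper element name is an assumption
    for name, value in fields.items():
        element = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        element.text = value
    return record


# Hypothetical values for illustration only.
example_fields = {
    "title": "Example dataset",
    "creator": "Doe, Jane",
    "identifier": "https://doi.org/10.1234/example-dataset",
    "rights": "https://creativecommons.org/licenses/by/4.0/",
    "date": "2017-01-01",
    "description": "Hourly temperature readings from a fictional station.",
}

record = dublin_core_record(example_fields)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

A record like this gives each dataset a permanent identifier, an explicit licence, and machine-readable descriptive fields in one place.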
Some Final Points (I)
● FAIR principles are deliberately vague - principles to be interpreted
● Nothing about back-up and preservation. Relationship to Data Seal of Approval?
● Much more work to be done on relationship between FAIR data and FAIR repository
Some Final Points (II)
● Creating a FAIR dataset demands an alliance between repository and dataset creator
● Governance? How are the principles updated?
● FAIR principles derive not from libraries / archives but more from life sciences; but still require good knowledge of metadata / archiving practice
Questions?
• Type your questions in the chat box.
• Rob Grim (moderator) will select and pose questions to the speakers.
• Unanswered questions will be addressed by Alastair in a blog post (to be published following the webinar).
Thank You!
Final Notes:
1. Blog post with more information https://openworking.wordpress.com/2017/02/10/fair-principles-connecting-the-dots-for-the-idcc-2017
We’ll email a link to the recording shortly.