Metadata for Research Objects
![Page 1: Metadata for Research Objects](https://reader038.vdocument.in/reader038/viewer/2022102901/554e738cb4c9054a698b4c30/html5/thumbnails/1.jpg)
Sean Bechhofer
@seanbechhofer
Making Metadata Work, ISKOLondon, 23rd June 2014
Metadata for Research Objects
![Page 2: Metadata for Research Objects](https://reader038.vdocument.in/reader038/viewer/2022102901/554e738cb4c9054a698b4c30/html5/thumbnails/2.jpg)
Publication
• Publications are about argumentation: convince the reader of the validity of a position
  – Reproducible Results System: facilitates enactment and publication of reproducible research.
• Results are reinforced by reproducibility
  – Explicit representation of method.
• Verifiability as a key factor in scientific discovery.
J. Mesirov. Accessible Reproducible Research. Science 327(5964), pp. 415-416, 2010. doi:10.1126/science.1179653
Stodden et al. Reproducible Research: Addressing the Need for Data and Code Sharing in Computational Science. Computing in Science and Engineering 12(5), pp. 8-13, 2010. doi:10.1109/MCSE.2010.113
C. Goble et al. Accelerating Scientists’ Knowledge Turns. Communications in Computer and Information Science 348, pp. 3-25, 2013. doi:10.1007/978-3-642-37186-8_1
![Page 3: Metadata for Research Objects](https://reader038.vdocument.in/reader038/viewer/2022102901/554e738cb4c9054a698b4c30/html5/thumbnails/3.jpg)
Reproducible Science
Goble: SSI Collaborations Workshop 2014
![Page 4: Metadata for Research Objects](https://reader038.vdocument.in/reader038/viewer/2022102901/554e738cb4c9054a698b4c30/html5/thumbnails/4.jpg)
Scientific Workflows
» Scientific workflows are at the heart of experimental science
  › Enable automation of scientific methods
  › Support experimental reproducibility
  › Encourage best practices
» There is then a need to preserve these workflows
  › Scientific development based on method reuse and repurpose
  › Conservation is key
» Workflow preservation is a multidimensional challenge
  › Representation of complex objects
  › Decay analysis, diagnosis, and prevention
  › Social Objects that can be inspected, reused, repurposed and credited
Preservation of scientific workflows in data-intensive science
![Page 5: Metadata for Research Objects](https://reader038.vdocument.in/reader038/viewer/2022102901/554e738cb4c9054a698b4c30/html5/thumbnails/5.jpg)
Preservation
Technical
• Multi-step computational process
• Repeatable and comparative
• Explicate computation
Social
• Virtual Witnessing
• Transparent, precise, citable documentation
• Accurate provenance logs
• Reusable protocols, know-how, best practice
Can I review / repeat your method?
Can I defend my method?
Can I reuse / reproduce this method?
![Page 6: Metadata for Research Objects](https://reader038.vdocument.in/reader038/viewer/2022102901/554e738cb4c9054a698b4c30/html5/thumbnails/6.jpg)
Context: Semantic Web and Linked Data
• SW: Explicit machine-readable representation of information
• LD: A set of best practices for publishing and connecting data on the Web
  1. Use URIs to name things
  2. Use dereferenceable HTTP URIs
  3. Provide useful content on lookup using standards
  4. Include links to other stuff
![Page 7: Metadata for Research Objects](https://reader038.vdocument.in/reader038/viewer/2022102901/554e738cb4c9054a698b4c30/html5/thumbnails/7.jpg)
Research Objects
• An aggregation object that bundles together experimental resources that are essential to a computational scientific study or investigation.
  – data used
  – results produced in an experimental study;
  – (computational) methods employed to produce and analyse that data;
  – people involved in the investigation.
• Plus annotation information that provides additional information about both the bundle itself and the resources of the bundle
  – descriptions
  – provenance
![Page 8: Metadata for Research Objects](https://reader038.vdocument.in/reader038/viewer/2022102901/554e738cb4c9054a698b4c30/html5/thumbnails/8.jpg)
ROs as a Currency
• Creator, Contributor, Collaborator
• Comparator, Re-User
• Evaluator, Reviewer, Trainee, Trainer, Reader
• Publisher
• Curator
• Librarian
• Repository Manager
![Page 9: Metadata for Research Objects](https://reader038.vdocument.in/reader038/viewer/2022102901/554e738cb4c9054a698b4c30/html5/thumbnails/9.jpg)
Research Objects
• Three principles underlie the approach:
• Identity
  – Referring to resources (and the aggregation itself)
• Aggregation
  – Describing the aggregation structure and its constituent parts
• Annotation
  – Associating information with aggregated resources.
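The three principles can be pictured as a small data structure. The Python sketch below is illustrative only: the class and field names (`ResearchObject`, `Annotation`, `aggregates`) are assumptions made for this example rather than terms from the RO model, and the example.org URIs are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """Information associated with a target (a resource or the aggregation)."""
    target: str  # URI of the thing being described
    body: str    # URI of the resource carrying the description


@dataclass
class ResearchObject:
    """Hypothetical in-memory view of the three principles."""
    uri: str                                         # Identity: names the aggregation itself
    aggregates: list = field(default_factory=list)   # Aggregation: URIs of the parts
    annotations: list = field(default_factory=list)  # Annotation: statements about parts

    def annotate(self, target: str, body: str) -> Annotation:
        # Only the aggregation or one of its aggregated parts may be annotated here.
        if target != self.uri and target not in self.aggregates:
            raise ValueError(f"{target} is not part of {self.uri}")
        note = Annotation(target, body)
        self.annotations.append(note)
        return note


ro = ResearchObject("http://example.org/ro/1/")
ro.aggregates.append("http://example.org/ro/1/workflow.t2flow")
ro.annotate("http://example.org/ro/1/workflow.t2flow",
            "http://example.org/ro/1/description.ttl")
```

Keeping annotations in their own list, separate from the aggregated resources, mirrors the idea that descriptions are attached to resources rather than embedded in them.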
![Page 10: Metadata for Research Objects](https://reader038.vdocument.in/reader038/viewer/2022102901/554e738cb4c9054a698b4c30/html5/thumbnails/10.jpg)
Identity
• Mechanisms for referring to the resources that are aggregated within a Research Object
• URIs
  – Web Resources
• DOIs
  – Documents/papers/datasets
• ORCID IDs
  – Researchers
![Page 11: Metadata for Research Objects](https://reader038.vdocument.in/reader038/viewer/2022102901/554e738cb4c9054a698b4c30/html5/thumbnails/11.jpg)
Identifier Issues
• HTTP URIs provide both access and identification
• PIDs: Persistent Identifiers (e.g. DOIs) tend to resolve to human-readable landing pages
  – With embedded links to further (possibly machine-readable) resources
• ROs seen as non-information resources with descriptive (RDF) metadata
  – Redirection/negotiation
  – Standard patterns for Linked Data resources
• Bidirectional mappings between URIs and PIDs
• Versioning through, e.g., Memento
H. Van de Sompel et al. Persistent Identifiers for Scholarly Assets and the Web: The Need for an Unambiguous Mapping. 9th International Digital Curation Conference
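A bidirectional mapping between a PID and an HTTP URI can be sketched in a few lines. The example below assumes the public `https://doi.org/` resolver and uses a DOI cited elsewhere in these slides; the function names are hypothetical, and whether a given resolver answers the `Accept` header with RDF rather than a landing page is an assumption, so the request is built but never sent.

```python
from urllib.parse import quote, unquote, urlsplit
from urllib.request import Request

DOI_RESOLVER = "https://doi.org/"  # public DOI resolver


def doi_to_uri(doi: str) -> str:
    """Map a bare DOI to the HTTP URI that resolves it."""
    return DOI_RESOLVER + quote(doi, safe="/")


def uri_to_doi(uri: str) -> str:
    """Recover the bare DOI from a resolver URI (the reverse mapping)."""
    parts = urlsplit(uri)
    if parts.netloc not in ("doi.org", "dx.doi.org"):
        raise ValueError(f"not a DOI resolver URI: {uri}")
    return unquote(parts.path.lstrip("/"))


def rdf_request(uri: str) -> Request:
    """Build (but do not send) a request asking for RDF via content negotiation."""
    return Request(uri, headers={"Accept": "text/turtle, application/rdf+xml;q=0.9"})


doi = "10.1109/eScience.2012.6404482"
uri = doi_to_uri(doi)
request = rdf_request(uri)
```

Passing `request` to `urllib.request.urlopen` would perform the lookup; it is left out here so the sketch runs without network access.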
![Page 12: Metadata for Research Objects](https://reader038.vdocument.in/reader038/viewer/2022102901/554e738cb4c9054a698b4c30/html5/thumbnails/12.jpg)
Aggregation
• Open Archives Initiative Object Reuse and Exchange (OAI ORE) is a standard for describing aggregations of web resources
  – http://www.openarchives.org/ore/
• Uses a Resource Map to describe the aggregated resources
• Proxies allow for statements about the resources within the aggregation
  – Capturing context and viewpoints
• Several concrete serialisations
  – RDF/XML, Atom, RDFa
Graceful Degradation
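As a rough illustration of a Resource Map, the sketch below emits plain (subject, predicate, object) triples using terms from the ORE vocabulary (`ore:ResourceMap`, `ore:describes`, `ore:aggregates`, `ore:Proxy`, `ore:proxyFor`, `ore:proxyIn`). The example.org URIs and the `#proxy-N` naming scheme are assumptions made for this example, and a real deployment would use an RDF library rather than hand-built tuples.

```python
ORE = "http://www.openarchives.org/ore/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def resource_map(rem_uri, aggregation_uri, resource_uris):
    """Return (subject, predicate, object) triples for a minimal Resource Map."""
    triples = [
        (rem_uri, RDF_TYPE, ORE + "ResourceMap"),
        (rem_uri, ORE + "describes", aggregation_uri),
        (aggregation_uri, RDF_TYPE, ORE + "Aggregation"),
    ]
    for index, resource in enumerate(resource_uris):
        proxy = f"{aggregation_uri}#proxy-{index}"  # hypothetical proxy naming scheme
        triples += [
            (aggregation_uri, ORE + "aggregates", resource),
            # The proxy stands for the resource in the context of this aggregation.
            (proxy, RDF_TYPE, ORE + "Proxy"),
            (proxy, ORE + "proxyFor", resource),
            (proxy, ORE + "proxyIn", aggregation_uri),
        ]
    return triples


def to_ntriples(triples):
    """Serialise URI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


triples = resource_map(
    "http://example.org/ro/1/manifest.rdf",
    "http://example.org/ro/1/",
    ["http://example.org/ro/1/data.csv", "http://example.org/ro/1/workflow.t2flow"],
)
print(to_ntriples(triples))
```

Statements made about a proxy, rather than about the resource itself, apply only within this aggregation, which is how context and viewpoints can be captured.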
![Page 13: Metadata for Research Objects](https://reader038.vdocument.in/reader038/viewer/2022102901/554e738cb4c9054a698b4c30/html5/thumbnails/13.jpg)
Annotation
• Open Annotation specification is a community-developed data model for annotation of web resources
  – http://www.openannotation.org/spec/core/
• Developed by the W3C Open Annotation Community Group
• Allows for “stand-off” annotations
  – Annotation as a first class citizen
• Developed to fit with Web Architecture
Graceful Degradation
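A stand-off annotation can be sketched as a resource of its own that points at a target and a body, leaving the target untouched. The example below uses the `oa:Annotation`, `oa:hasTarget` and `oa:hasBody` terms; the helper names and the example.org URIs are assumptions made for illustration.

```python
OA = "http://www.w3.org/ns/oa#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def annotate(annotation_uri, target_uri, body_uri):
    """Triples for one stand-off annotation: the target itself is not modified."""
    return [
        (annotation_uri, RDF_TYPE, OA + "Annotation"),
        (annotation_uri, OA + "hasTarget", target_uri),
        (annotation_uri, OA + "hasBody", body_uri),
    ]


def bodies_for(triples, target_uri):
    """Find the bodies of every annotation that targets target_uri."""
    annotations = {s for s, p, o in triples if p == OA + "hasTarget" and o == target_uri}
    return sorted(o for s, p, o in triples if p == OA + "hasBody" and s in annotations)


graph = []
graph += annotate("http://example.org/anno/1", "http://example.org/ro/1/data.csv",
                  "http://example.org/ro/1/data-description.ttl")
graph += annotate("http://example.org/anno/2", "http://example.org/ro/1/data.csv",
                  "http://example.org/ro/1/data-provenance.ttl")
```

Because each annotation has its own URI, several parties can describe the same resource independently, and an annotation can itself be the target of further statements.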
![Page 14: Metadata for Research Objects](https://reader038.vdocument.in/reader038/viewer/2022102901/554e738cb4c9054a698b4c30/html5/thumbnails/14.jpg)
Annotation Content
• Essential to the understanding and interpretation of the scientific outcomes captured by a Research Object as well as the reuse of the resources within it.
  – Provenance information about the experiments, the study or any other experimental resources
  – Evolution information about the Research Object and its resources
  – Descriptions of computational methods or processes
  – Dependency information or settings about the experiment executions
![Page 15: Metadata for Research Objects](https://reader038.vdocument.in/reader038/viewer/2022102901/554e738cb4c9054a698b4c30/html5/thumbnails/15.jpg)
Core & Extensions
• Core model provides support for aggregation and annotation
• Extensions provide additional vocabularies for domain-specific tasks
• Workflow Provenance
  – Information capturing workflow executions
• Workflow Description
  – Abstractions describing Processes, inputs and outputs
• Research Object Evolution
  – Information describing change and “snapshots”
![Page 16: Metadata for Research Objects](https://reader038.vdocument.in/reader038/viewer/2022102901/554e738cb4c9054a698b4c30/html5/thumbnails/16.jpg)
RO Model
![Page 17: Metadata for Research Objects](https://reader038.vdocument.in/reader038/viewer/2022102901/554e738cb4c9054a698b4c30/html5/thumbnails/17.jpg)
Provenance
• W3C’s PROV model allows for capture of information relating to
  – Attribution: Who did it?
  – Derivation: Data sources used
  – Activities: What happened (and when)
• Significant eco-system (generators, viewers, consumers) has grown up around PROV
  – IPAW & TAPP
Copyright © 2013 W3C® (MIT, ERCIM, Keio, Beihang), All Rights Reserved.
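The three kinds of question map onto PROV-O properties such as `prov:wasAttributedTo`, `prov:wasDerivedFrom`, `prov:wasGeneratedBy` and `prov:startedAtTime`. The sketch below stores a handful of made-up statements as tuples and queries them; the example.org URIs, the timestamps and the `objects` helper are assumptions for illustration, not output from any real workflow run.

```python
PROV = "http://www.w3.org/ns/prov#"

# (subject, predicate, object) statements; all example.org URIs are placeholders.
statements = [
    ("http://example.org/results.csv", PROV + "wasAttributedTo", "http://example.org/people/alice"),
    ("http://example.org/results.csv", PROV + "wasDerivedFrom", "http://example.org/raw-data.csv"),
    ("http://example.org/results.csv", PROV + "wasGeneratedBy", "http://example.org/runs/42"),
    ("http://example.org/runs/42", PROV + "used", "http://example.org/raw-data.csv"),
    ("http://example.org/runs/42", PROV + "startedAtTime", "2014-06-23T09:00:00Z"),
    ("http://example.org/runs/42", PROV + "endedAtTime", "2014-06-23T09:05:00Z"),
]


def objects(subject, prov_term):
    """All objects of statements about subject that use the given PROV-O term."""
    return [o for s, p, o in statements if s == subject and p == PROV + prov_term]


result = "http://example.org/results.csv"
who = objects(result, "wasAttributedTo")                # Attribution: who did it?
sources = objects(result, "wasDerivedFrom")             # Derivation: data sources used
runs = objects(result, "wasGeneratedBy")                # Activities: what happened...
when = [objects(run, "startedAtTime") for run in runs]  # ...and when
```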
![Page 18: Metadata for Research Objects](https://reader038.vdocument.in/reader038/viewer/2022102901/554e738cb4c9054a698b4c30/html5/thumbnails/18.jpg)
Tooling
![Page 19: Metadata for Research Objects](https://reader038.vdocument.in/reader038/viewer/2022102901/554e738cb4c9054a698b4c30/html5/thumbnails/19.jpg)
ROs and OAIS
• ROs as Information Packages in OAIS
• myExperiment as live/access repository
• ROHUB as archival repository
![Page 20: Metadata for Research Objects](https://reader038.vdocument.in/reader038/viewer/2022102901/554e738cb4c9054a698b4c30/html5/thumbnails/20.jpg)
SCAPE: Planning and Watch
[Diagram: Watch, Planning, Operations, Repository, and Env & Users components, linked by plan, deploy, monitor, access, ingest/harvest, and execution flows]
http://www.scape-project.eu/
• SCAPE project concerned with Digital Preservation.
• Planning and Watch infrastructure to help monitor the state of a repository and co-ordinate appropriate actions
• Driven by policies.
![Page 21: Metadata for Research Objects](https://reader038.vdocument.in/reader038/viewer/2022102901/554e738cb4c9054a698b4c30/html5/thumbnails/21.jpg)
Wf4Ever: Monitoring and Watch
[Diagram: Watch, Planning, Operations, Repository, and Env & Users components, linked by plan, deploy, monitor, access, ingest/harvest, and execution flows; labelled with "myExperiment and RODL" and "Decay, Service Deprecation, Data source monitoring, Checklists, Minimal Models"]
• Ideas applied to workflow preservation
![Page 22: Metadata for Research Objects](https://reader038.vdocument.in/reader038/viewer/2022102901/554e738cb4c9054a698b4c30/html5/thumbnails/22.jpg)
Decay
• Survey of 92 Taverna workflows from myExperiment
• Volatile Third-Party Resources
• Missing Data
• Missing Execution Environments
• Poor descriptions
Belhajjame et. al. Why workflows break — Understanding and combating decay in Taverna workflows e-Science 2012 doi:10.1109/eScience.2012.6404482
![Page 23: Metadata for Research Objects](https://reader038.vdocument.in/reader038/viewer/2022102901/554e738cb4c9054a698b4c30/html5/thumbnails/23.jpg)
Checklists and Validation
• Checklists widely used to support safety, quality and consistency
• Common in experimental science
  – Expressing minimum information required
  – Supporting “health” monitoring of workflow-centric ROs.
• Checklists can be defined in terms of the RO model and its annotations
  – Generic checklist service then executes against that model and the given annotations
  – Provenance
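The idea of a generic service running a checklist against annotations can be sketched as follows. This is not the Minim model or the Wf4Ever checklist service: the `Requirement` type, the MUST/SHOULD/MAY levels, the property names and the sample annotations are all assumptions chosen to keep the example small.

```python
from typing import Callable, Dict, List, NamedTuple


class Requirement(NamedTuple):
    level: str  # "MUST", "SHOULD" or "MAY"
    label: str
    test: Callable[[Dict[str, List[str]]], bool]


def evaluate(checklist, annotations):
    """Run every requirement against the annotations and summarise the outcome."""
    failed = [r for r in checklist if not r.test(annotations)]
    return {
        # The checklist is satisfied as long as no MUST requirement failed.
        "satisfied": not any(r.level == "MUST" for r in failed),
        "failed": [(r.level, r.label) for r in failed],
    }


# Property name -> values found for a hypothetical workflow-centric RO.
annotations = {
    "title": ["Pathway analysis workflow"],
    "input_data": ["http://example.org/ro/1/input.csv"],
    "creator": [],
}

checklist = [
    Requirement("MUST", "has a title", lambda a: bool(a.get("title"))),
    Requirement("MUST", "input data is present", lambda a: bool(a.get("input_data"))),
    Requirement("SHOULD", "names a creator", lambda a: bool(a.get("creator"))),
]

report = evaluate(checklist, annotations)
```

Because the checklist is data rather than code baked into the service, the same evaluator can be reused for different communities' minimum-information requirements.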
![Page 24: Metadata for Research Objects](https://reader038.vdocument.in/reader038/viewer/2022102901/554e738cb4c9054a698b4c30/html5/thumbnails/24.jpg)
Minim Data Model
Zhao et al. A Checklist-Based Approach for Quality Assessment of Scientific Information. 3rd Int. Workshop on Linked Science, 2013
![Page 25: Metadata for Research Objects](https://reader038.vdocument.in/reader038/viewer/2022102901/554e738cb4c9054a698b4c30/html5/thumbnails/25.jpg)
Checklist Evaluation
![Page 26: Metadata for Research Objects](https://reader038.vdocument.in/reader038/viewer/2022102901/554e738cb4c9054a698b4c30/html5/thumbnails/26.jpg)
Checklist Evaluation
![Page 27: Metadata for Research Objects](https://reader038.vdocument.in/reader038/viewer/2022102901/554e738cb4c9054a698b4c30/html5/thumbnails/27.jpg)
RO Bundle
• A single, transferable object encapsulating the description and resources of an RO
  – Download, transfer, publish
• ZIP-based format (resources) plus a manifest describing aggregation and annotations (description)
  – Unpack with standard tooling
• JSON-LD as a representation for manifest
  – Lightweight linked-data format
  – Compatible with existing JSON tooling and services
  – PROV-O and OAC for annotations
http://wf4ever.github.io/ro/bundle/
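The ZIP-plus-manifest idea can be tried with nothing but the Python standard library. The sketch below is a simplification under stated assumptions: the manifest path `.ro/manifest.json`, the context URI and the manifest keys (`aggregates`, `annotations`, `about`, `content`) are written from memory of the bundle specification and should be checked against the URL above, and the resource names and contents are invented.

```python
import io
import json
import zipfile


def build_bundle(resources, annotations):
    """Write resources plus a JSON manifest into an in-memory ZIP and return its bytes."""
    manifest = {
        "@context": ["https://w3id.org/bundle/context"],  # assumed context URI
        "id": "/",
        "aggregates": [{"uri": "/" + name} for name in resources],
        "annotations": annotations,
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as bundle:
        for name, payload in resources.items():
            bundle.writestr(name, payload)
        # Assumed manifest location inside the archive.
        bundle.writestr(".ro/manifest.json", json.dumps(manifest, indent=2))
    return buffer.getvalue()


def read_manifest(bundle_bytes):
    """Unpack with standard tooling: any ZIP reader plus a JSON parser."""
    with zipfile.ZipFile(io.BytesIO(bundle_bytes)) as bundle:
        return json.loads(bundle.read(".ro/manifest.json"))


data = build_bundle(
    {"data/results.csv": "gene,score\nBRCA1,0.9\n"},
    [{"about": "/data/results.csv", "content": "http://example.org/results-prov.ttl"}],
)
manifest = read_manifest(data)
```

A consumer that knows nothing about Research Objects can still open the archive and read the files, while one that understands JSON-LD can interpret the manifest as linked data.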
![Page 28: Metadata for Research Objects](https://reader038.vdocument.in/reader038/viewer/2022102901/554e738cb4c9054a698b4c30/html5/thumbnails/28.jpg)
Bundling via git/Zenodo/figshare
• Scientist works with local folder structure.
  – Version management via GitHub.
  – Local tooling produces metadata description
  – Metadata about the aggregation (and its resources) provided by “hidden folder”
• Zenodo/figshare pull snapshot from GitHub
  – Providing DOIs for the aggregations
  – Additional release cycles can prompt new DOIs
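The local tooling step might look something like the sketch below, which scans a working folder and writes a description of its files into a hidden folder. The folder name `.ro`, the `manifest.json` file name and the manifest layout are assumptions for illustration; the sketch does not talk to GitHub, Zenodo or figshare.

```python
import json
import os
import tempfile
from pathlib import Path

HIDDEN_DIR = ".ro"  # assumed name for the hidden metadata folder


def describe_folder(root):
    """Scan a working folder and write a manifest of its files into the hidden folder."""
    root = Path(root)
    aggregated = []
    for current, dirs, files in os.walk(root):
        # Skip hidden folders such as .git and the metadata folder itself.
        dirs[:] = sorted(d for d in dirs if not d.startswith("."))
        for name in sorted(files):
            aggregated.append((Path(current) / name).relative_to(root).as_posix())
    manifest_path = root / HIDDEN_DIR / "manifest.json"
    manifest_path.parent.mkdir(exist_ok=True)
    manifest_path.write_text(json.dumps({"aggregates": aggregated}, indent=2))
    return manifest_path


# Demonstrate on a throwaway folder so nothing in the real working tree is touched.
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "analysis.py").write_text("print('hello')\n")
    Path(folder, "data").mkdir()
    Path(folder, "data", "input.csv").write_text("a,b\n1,2\n")
    manifest = json.loads(describe_folder(folder).read_text())
```

Committing the hidden folder alongside the data means any snapshot taken from the repository carries its own description with it.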
![Page 29: Metadata for Research Objects](https://reader038.vdocument.in/reader038/viewer/2022102901/554e738cb4c9054a698b4c30/html5/thumbnails/29.jpg)
Zenodo
![Page 30: Metadata for Research Objects](https://reader038.vdocument.in/reader038/viewer/2022102901/554e738cb4c9054a698b4c30/html5/thumbnails/30.jpg)
figshare
![Page 31: Metadata for Research Objects](https://reader038.vdocument.in/reader038/viewer/2022102901/554e738cb4c9054a698b4c30/html5/thumbnails/31.jpg)
ROs as RDFa
http://rohub.linkeddata.es
![Page 32: Metadata for Research Objects](https://reader038.vdocument.in/reader038/viewer/2022102901/554e738cb4c9054a698b4c30/html5/thumbnails/32.jpg)
RDFa
http://rohub.linkeddata.es
![Page 33: Metadata for Research Objects](https://reader038.vdocument.in/reader038/viewer/2022102901/554e738cb4c9054a698b4c30/html5/thumbnails/33.jpg)
Code as a Research Object
![Page 34: Metadata for Research Objects](https://reader038.vdocument.in/reader038/viewer/2022102901/554e738cb4c9054a698b4c30/html5/thumbnails/34.jpg)
COMBINE Archive
http://co.mbine.org/documents/archive
![Page 35: Metadata for Research Objects](https://reader038.vdocument.in/reader038/viewer/2022102901/554e738cb4c9054a698b4c30/html5/thumbnails/35.jpg)
GigaScience/ISA
http://isa-tools.github.io/soapdenovo2/
![Page 36: Metadata for Research Objects](https://reader038.vdocument.in/reader038/viewer/2022102901/554e738cb4c9054a698b4c30/html5/thumbnails/36.jpg)
IPython
![Page 37: Metadata for Research Objects](https://reader038.vdocument.in/reader038/viewer/2022102901/554e738cb4c9054a698b4c30/html5/thumbnails/37.jpg)
Wrap Up
• Aggregation objects bundling together experimental resources that are essential to a computational scientific study or investigation
  – Intended to support greater transparency and reproducibility
• Annotations provide additional information about the bundle and its contents
  – Metadata is key here
• Use of existing standards, vocabularies and infrastructure
• Nascent tooling to support creation, management and publication
![Page 38: Metadata for Research Objects](https://reader038.vdocument.in/reader038/viewer/2022102901/554e738cb4c9054a698b4c30/html5/thumbnails/38.jpg)
Thanks!
• All the members of the Wf4Ever team
  – iSOCO: Intelligent Software Components S.A., Spain
  – University of Manchester, School of Computer Science, Manchester, United Kingdom
  – University of Oxford, Department of Zoology, Oxford, UK
  – Poznan Supercomputing and Networking Center, Poznan, Poland
  – IAA: Instituto de Astrofísica de Andalucía, Granada, Spain
  – Leiden University Medical Centre, Centre for Human and Clinical Genetics, The Netherlands
• Colleagues in Manchester’s Information Management Group
• RO Advisory Board Members
http://www.researchobject.org
http://www.wf4ever-project.org