2013-01-17 research object
Preserving scientific data and methods - presentation by myGrid for NIHBI
Research Objects: Preserving scientific data and methods
Stian Soiland-Reyes, Khalid Belhajjame
School of Computer Science, Univ of Manchester
myGrid NIHBI meet-up Manchester 2013-01-17
2
Agenda
» Preserving digital science
» The Research Object
» Anatomy
» Lifecycle
» Wf4Ever Tools
» Future developments
3
Computation Processes in Today’s Research
» Research is being conducted in an increasingly digital and online environment
» This has led to the emergence of new digital artifacts
» In some respects, these objects can be regarded as data
» However, some objects include the description of the research method, captured as a computational process
» Such processes encapsulate the knowledge related to the generation, (re)use and general transformation of data in experimental sciences
Figure labels: Raw data, Computational process, Results
4
Scientific Workflow
» A scientific workflow is a precise, executable description of a scientific procedure - a series of analysis operations connected using data links
» Each operation represents the execution of a computational process
» Operations can be supplied by independently developed web services
» Operations can also use existing data sources that are accessible on the Web
In this work, we focus on a particular kind of computational process called a scientific workflow
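The idea of "operations connected using data links" can be illustrated with a minimal sketch. This is not how any particular workflow system is implemented: the three operations and the `WORKFLOW` table are hypothetical, and plain Python functions stand in for what would usually be web service calls.

```python
from graphlib import TopologicalSorter

# Hypothetical operations: each plain function stands in for a
# computational process (in practice, often a web service call).
def fetch_sequences():
    return ["ACGT", "GGCA", "ACGT"]

def deduplicate(sequences):
    return sorted(set(sequences))

def count(sequences):
    return len(sequences)

# operation name -> (function, upstream operations whose outputs it consumes)
# The dependency lists play the role of the data links.
WORKFLOW = {
    "fetch": (fetch_sequences, []),
    "dedup": (deduplicate, ["fetch"]),
    "count": (count, ["dedup"]),
}

def run_workflow(workflow):
    """Execute every operation once, in an order that respects the data links."""
    graph = {name: set(deps) for name, (_, deps) in workflow.items()}
    outputs = {}
    for name in TopologicalSorter(graph).static_order():
        func, deps = workflow[name]
        outputs[name] = func(*(outputs[d] for d in deps))
    return outputs

results = run_workflow(WORKFLOW)
print(results["count"])
```

Because the description is explicit about which output feeds which input, the same table can be re-executed later, which is what makes the workflow both a record of the method and a runnable artifact.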
5
Preservation Challenges
» Changes by 3rd parties
» Workflow may produce different lists at different times
» Workflow may become inoperable
The challenges concern the executable aspects of workflows and their vulnerability to the volatility of the resources required for their execution
» Workflow decay – The execution of the workflow may fail or yield different results, due to dependencies on resources and services subject to independent changes, e.g., services provided by EMBL-EBI. Even workflows that depend on local resources are vulnerable.
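One simple way to notice decay is to re-run a workflow and compare its output with a digest recorded at an earlier run. The sketch below assumes outputs are JSON-serialisable; the function names and the three example runs are hypothetical and only illustrate the two failure modes named above (different results, inoperable workflow).

```python
import hashlib
import json

def digest(value):
    """Stable SHA-256 digest of a JSON-serialisable workflow output."""
    payload = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def check_decay(run, recorded_digest):
    """Re-run a workflow and classify the outcome against a recorded digest.

    `run` is any zero-argument callable that executes the workflow.
    """
    try:
        output = run()
    except Exception as error:  # e.g. a third-party service has gone away
        return "inoperable", str(error)
    if digest(output) != recorded_digest:
        return "changed", None
    return "stable", None

# Hypothetical runs standing in for real workflow executions.
def original_run():
    return ["geneA", "geneB"]

def later_run():
    return ["geneA", "geneB", "geneC"]  # an upstream database was updated

def broken_run():
    raise ConnectionError("service unavailable")

baseline = digest(original_run())
statuses = [check_decay(r, baseline)[0] for r in (original_run, later_run, broken_run)]
print(statuses)
```

A digest comparison only detects that something changed; explaining why still needs provenance about the services and data the run depended on.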
Figure labels: Laboratory, Instruments, Methods, Materials, Publication, Models, Techniques, Algorithms, Data, Provenance, Attribution, Credit, Context, Investigation, Study, Experiment
Replicate / Repeat: Exactly replicate the original experiment and experimental conditions. Eliminate change. Observe.
Reproduce: Run the experiment with differences in experimental conditions. Compare to test for the same result. Observe.
Capture, Curate, Discover, Use, Reuse, Preserve
Reproduce: Between Labs
Repeat: Within Lab
RO Architecture is Hourglass
ROs structured packages
Provenance, Versioning, Mim services
Viewing, collaboration services/protocols
Astronomy, Biology, services/protocols
Exchange services (media specific)
Storage services (media specific)
8
Research Object
Figure labels: Electronic paper, Datasets, Results, Scientists, Hypothesis, Experiments, Annotations, Provenance, Workflows
From Electronic papers to Research objects
9
10
Research Object: A user scenario
11
Why research objects?
A research object aggregates all elements deemed necessary to understand research investigations
» Promote reuse, sharing
» Enable the verification of reproducibility of the results
» Trackable, versionable, referenceable
12
Anatomy of a research object
Figure labels: ro:Resource, ro:ResearchObject, ro:Manifest, ore:aggregates, ore:describes, ro:Folder, ro:FolderEntry, ore:proxyFor, ore:proxyIn, Subclass of, ro:SemanticAnnotation, ore:aggregates, ro:annotatesAggregatedResource, RDF file, ao:body
Grounding Workflow-centric Research Objects Using Semantic Technologies
Workflow-centric research objects are encoded using RDF, according to a set of ontologies that are publicly available
Research objects extend the Object Reuse and Exchange (ORE) model to represent aggregation.
13
ORE
We use the Annotation Ontology (AO) to annotate research object resources and their relationships.
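A rough sketch of what such an encoding might look like, using the term names from the anatomy slide. Several things here are assumptions rather than statements of the published model: the namespace URIs, the way the terms are wired together, the manifest location, and all `example.org` URIs. Plain tuples are used instead of an RDF library to keep the sketch self-contained.

```python
# Namespace URIs are assumptions; check them against the published model.
RO = "http://purl.org/wf4ever/ro#"
ORE = "http://www.openarchives.org/ore/terms/"
AO = "http://purl.org/ao/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def build_manifest(ro_uri, resources, annotations):
    """Return (subject, predicate, object) URI triples for one research object.

    resources: URIs aggregated by the research object.
    annotations: dicts with 'uri', 'target' and 'body' keys (hypothetical shape).
    """
    manifest = ro_uri + ".ro/manifest.rdf"  # illustrative location only
    triples = [
        (ro_uri, RDF_TYPE, RO + "ResearchObject"),
        (manifest, RDF_TYPE, RO + "Manifest"),
        (manifest, ORE + "describes", ro_uri),
    ]
    for res in resources:
        # ORE aggregation: the research object aggregates each resource.
        triples.append((ro_uri, ORE + "aggregates", res))
        triples.append((res, RDF_TYPE, RO + "Resource"))
    for ann in annotations:
        # The annotation is itself aggregated; its body is an RDF file.
        triples.append((ro_uri, ORE + "aggregates", ann["uri"]))
        triples.append((ann["uri"], RDF_TYPE, RO + "SemanticAnnotation"))
        triples.append((ann["uri"], RO + "annotatesAggregatedResource", ann["target"]))
        triples.append((ann["uri"], AO + "body", ann["body"]))
    return triples

def to_ntriples(triples):
    """Serialise URI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)

ro = "http://example.org/ro/1/"
triples = build_manifest(
    ro,
    resources=[ro + "workflow.t2flow"],
    annotations=[{
        "uri": ro + "annotations/1",
        "target": ro + "workflow.t2flow",
        "body": ro + "annotations/1.rdf",
    }],
)
print(to_ntriples(triples))
```

In practice an RDF library would handle serialisation and literals; the point here is only that aggregation and annotation reduce to a small set of triples.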
14
Grounding Workflow-centric Research Objects Using Semantic Technologies
15
Relating resources in research object
Figure labels: Workflow_16, Workflow_13, Common pathways, QTL, Results, Logs, Metadata, Paper, Slides
Figure relations: Feeds into, produces, Included in, Published in
The provenance of the RO elements is key to understanding, comparing and debugging scientific workflows and to verifying the validity of a claim made within the context of an RO
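To make the use of provenance concrete, here is a small sketch that traces everything a given resource depends on by following relations backwards. The edge list is hypothetical: it borrows some labels from the figure, but the exact links are an assumption, not a transcription of the diagram.

```python
from collections import defaultdict

# Hypothetical edges, loosely modelled on the labels in the figure.
EDGES = [
    ("QTL", "feeds into", "Workflow_16"),
    ("Workflow_16", "produces", "Results"),
    ("Workflow_16", "produces", "Logs"),
    ("Results", "included in", "Paper"),
    ("Results", "included in", "Slides"),
]

def trace_back(target, edges):
    """Return every resource that `target` transitively depends on."""
    incoming = defaultdict(list)
    for source, _relation, dest in edges:
        incoming[dest].append(source)
    seen, stack = set(), [target]
    while stack:
        node = stack.pop()
        for source in incoming[node]:
            if source not in seen:
                seen.add(source)
                stack.append(source)
    return seen

print(sorted(trace_back("Paper", EDGES)))
```

A reviewer checking a claim in the paper could use such a trace to find which workflow and input data produced the result being cited.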
16
Figure labels:
» Actors: Scientist, Librarian/Curator
» Live RO
» RO snapshot <<copy>>: identified by a URI, some metadata, some curation, mostly private (for my group)
» RO snapshot <<copy>>: identified by a URI, some metadata, some curation, mostly private (for my group and for paper reviewers)
» Archived RO <<copy, filter and curate>>: identified by a URI, good metadata and curation, mostly public
» Relations between versions: <<versionOf>>
» My supervisor calls me to report my work
» My supervisor calls me again and we decide to publish our RO+paper
» Reviews received and final version published
» A new PhD student continues my work: Live RO <<copy>>
Evolution of a research object
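The lifecycle can be sketched as a few state transitions. Everything below is illustrative: the class, the function names, the URIs and in particular the direction and placement of the `version_of` links are assumptions, not the Wf4Ever evolution model itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ResearchObject:
    uri: str
    state: str                          # "live", "snapshot" or "archived"
    resources: List[str] = field(default_factory=list)
    copied_from: Optional[str] = None   # stands in for the <<copy>> link
    version_of: Optional[str] = None    # stands in for the <<versionOf>> link

def snapshot(live, uri, previous=None):
    """Freeze a copy of a live RO; optionally link it to an earlier snapshot."""
    return ResearchObject(uri, "snapshot", list(live.resources),
                          copied_from=live.uri,
                          version_of=previous.uri if previous else None)

def archive(snap, uri, keep):
    """Copy, filter and curate a snapshot into an archived RO."""
    kept = [r for r in snap.resources if keep(r)]
    return ResearchObject(uri, "archived", kept,
                          copied_from=snap.uri, version_of=snap.uri)

def fork(ro, uri):
    """Start a new live RO from an existing one (e.g. a new PhD student)."""
    return ResearchObject(uri, "live", list(ro.resources), copied_from=ro.uri)

live = ResearchObject("ro/live", "live", ["workflow.t2flow", "scratch.txt"])
s1 = snapshot(live, "ro/v1")                 # report to supervisor
live.resources.append("results.csv")         # work continues on the live RO
s2 = snapshot(live, "ro/v2", previous=s1)    # submitted with the paper
final = archive(s2, "ro/archived", keep=lambda r: r != "scratch.txt")
student = fork(final, "ro/live-2")
print(final.resources, s2.version_of)
```

Snapshots copy the resource list, so later edits to the live RO do not alter what was reported or reviewed.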
17
PROV standard - Basis for evolution model
http://www.w3.org/TR/prov-primer/
Candidate Recommendation
18
Wf4Ever Tools: Customizable preservability checklists
19
Wf4Ever Tools: Portal (browsing and annotating)
20
Wf4Ever Tools: Command line tools, client libraries
https://github.com/wf4ever/
21
Wf4Ever Tools: Specifications and APIs
22
Current Status and Ongoing Work
Models/spec v0.1 public: http://purl.org/wf4ever/model
- Upcoming revision v0.2 (Q1 2013):
  • Minor additions to workflow model terms
  • “RO Terms”: upper, user-level view of RO (hypothesis, results); many are “shortcuts” for the structured model
- TODO: Update annotation model to Open Annotation Data Model (OAC)
- TODO: PAV for detailed authorship provenance
Showing, managing and sharing of Research Objects through myExperiment web site
[3] http://www.myexperiment.org/
23
Open Annotation Data Model
http://www.openannotation.org/spec/core/
“Almost final” spec: 2013-01-28
Roll out meeting in Manchester: March 2013
Community Draft
24
myExperiment RO support
Thank you!
http://www.wf4ever-project.org/ http://www.mygrid.org.uk/