2013-01-17 research object

Research Objects: Preserving scientific data and methods. Stian Soiland-Reyes, Khalid Belhajjame, School of Computer Science, University of Manchester. myGrid NIHBI meet-up, Manchester, 2013-01-17

Upload: stian-soiland-reyes

Post on 11-May-2015


DESCRIPTION

Preserving scientific data and methods - presentation by myGrid for NIHBI

TRANSCRIPT

Page 1: 2013-01-17 Research Object

Research Objects: Preserving scientific data and methods

Stian Soiland-Reyes, Khalid Belhajjame
School of Computer Science, University of Manchester

myGrid NIHBI meet-up, Manchester, 2013-01-17

Page 2: 2013-01-17 Research Object

Agenda

» Preserving digital science
» The Research Object
» Anatomy
» Lifecycle
» Wf4Ever Tools
» Future developments

Page 3: 2013-01-17 Research Object

Computation Processes in Today’s Research

» Research is being conducted in an increasingly digital and online environment
» This has led to the emergence of new digital artifacts
» In some respects, these objects can be regarded as data
» However, some objects include the description of the research method, captured as a computational process
» Such processes encapsulate the knowledge related to the generation, (re)use and general transformation of data in experimental sciences

[Figure: Raw data → Computational process → Results]

Page 4: 2013-01-17 Research Object

Scientific Workflow

» A scientific workflow is a precise, executable description of a scientific procedure: a series of analysis operations connected using data links
» Each operation represents the execution of a computational process
» Operations can be supplied by independently developed web services
» Operations can also use existing data sources that are accessible on the Web

In this work, we focus on a particular kind of computational process: scientific workflows
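As a rough illustration of "analysis operations connected using data links", here is a minimal Python sketch. The operation names and the toy data are hypothetical and are not taken from any workflow in the talk. In a real workflow each operation might call a remote web service instead of a local function.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical operations; each receives the outputs of its upstream operations.
def fetch_sequences(_inputs):
    return ["ACGT", "GGCA", "ACGT"]

def deduplicate(inputs):
    return sorted(set(inputs["fetch_sequences"]))

def count(inputs):
    return len(inputs["deduplicate"])

operations = {
    "fetch_sequences": fetch_sequences,
    "deduplicate": deduplicate,
    "count": count,
}

# Data links: each operation maps to the operations whose output it consumes.
data_links = {
    "fetch_sequences": set(),
    "deduplicate": {"fetch_sequences"},
    "count": {"deduplicate"},
}

def run_workflow(operations, data_links):
    """Execute every operation once, after all of its upstream operations."""
    results = {}
    for name in TopologicalSorter(data_links).static_order():
        inputs = {upstream: results[upstream] for upstream in data_links[name]}
        results[name] = operations[name](inputs)
    return results

results = run_workflow(operations, data_links)
```

Keeping the data links separate from the operations means the same description can be re-executed later, which is what makes the preservation questions on the following slides relevant.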

Page 5: 2013-01-17 Research Object

Preservation Challenges

» Changes by 3rd parties
» Workflow may produce different lists at different times
» Workflow may become inoperable

The challenges concern the executable aspects of workflows and their vulnerability to the volatility of the resources required for their execution.

» Workflow decay: the execution of the workflow may fail or yield different results, due to dependencies on resources and services subject to independent changes, e.g., EMBL-EBI. Even workflows that depend on local resources are vulnerable.
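One way to picture the two decay symptoms (different results, or failure to run) is to re-run a workflow and compare a fingerprint of its output with one recorded earlier. The sketch below is only an assumption about how such a check could look and is not one of the Wf4Ever tools. The three `*_run` functions are hypothetical stand-ins for workflow runs that depend on third-party services.

```python
import hashlib
import json

def fingerprint(result):
    """Stable SHA-256 digest of a JSON-serialisable workflow result."""
    canonical = json.dumps(result, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()

def check_decay(run, reference_fingerprint):
    """Re-run a workflow and classify it as 'ok', 'changed' or 'inoperable'."""
    try:
        result = run()
    except Exception:
        return "inoperable"
    return "ok" if fingerprint(result) == reference_fingerprint else "changed"

# Hypothetical runs standing in for calls to external services.
def original_run():
    return ["geneA", "geneB"]

def updated_service_run():
    return ["geneA", "geneB", "geneC"]  # the service now returns a different list

def broken_service_run():
    raise ConnectionError("service unavailable")

reference = fingerprint(original_run())
statuses = [
    check_decay(run, reference)
    for run in (original_run, updated_service_run, broken_service_run)
]
```

A changed fingerprint only shows that the output differs. Deciding whether the difference matters scientifically still needs a human or a domain-specific comparison.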

Page 6: 2013-01-17 Research Object

[Figure labels: Laboratory, Instruments, Methods, Materials, Publication, Models, Techniques, Algorithms, Data, Provenance, Attribution, Credit, Context, Investigation, Study, Experiment]

» Replicate / Repeat (within lab): Exactly replicate the original experiment and experimental conditions. Eliminate change. Observe.
» Reproduce (between labs): Run the experiment with differences in experimental conditions. Compare to test for the same result. Observe.

Capture, Curate, Discover, Use, Reuse, Preserve

Page 7: 2013-01-17 Research Object

RO Architecture is Hourglass

ROs structured packages

Provenance, Versioning, Mim services

Viewing, collaboration services/protocols

Astronomy, Biology, services/protocols

Exchange services (media specific)

Storage services (media specific)

Page 8: 2013-01-17 Research Object

Research Object

From electronic papers to research objects

[Figure labels: Electronic paper, Datasets, Results, Scientists, Hypothesis, Experiments, Annotations, Provenance, Workflows]

Page 9: 2013-01-17 Research Object


Page 10: 2013-01-17 Research Object


Research Object: A user scenario

Page 11: 2013-01-17 Research Object

Why research objects?

A research object aggregates all elements deemed necessary to understand research investigations

» Promote reuse and sharing
» Enable the verification of reproducibility of the results
» Trackable, versionable, referenceable

Page 12: 2013-01-17 Research Object

Anatomy of a research object

[Figure: class diagram. Classes: ro:ResearchObject, ro:Resource, ro:Manifest, ro:Folder, ro:FolderEntry, ro:SemanticAnnotation, RDF file. Relations: ore:aggregates, ore:describes, ore:proxyFor, ore:proxyIn, ro:annotatesAggregatedResource, ao:body, subclass of]

Page 13: 2013-01-17 Research Object

Grounding Workflow-centric Research Objects Using Semantic Technologies

Workflow-centric research objects are encoded using RDF, according to a set of publicly available ontologies.

Research objects extend the Object Reuse and Exchange (ORE) model to represent aggregation.

[Figure: ORE]
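To make the ORE-based aggregation concrete, here is a minimal Python sketch that writes a manifest as N-Triples using only the standard library. The ORE terms namespace is the published one. The `ro` namespace URI and all `example.org` URIs (including the manifest path) are assumptions for illustration and should be checked against the model published at http://purl.org/wf4ever/model.

```python
ORE = "http://www.openarchives.org/ore/terms/"
RO = "http://purl.org/wf4ever/ro#"  # assumed namespace URI for the ro: prefix
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def manifest_triples(ro_uri, manifest_uri, resource_uris):
    """Triples stating that the manifest describes an RO aggregating the resources."""
    triples = [
        (ro_uri, RDF_TYPE, RO + "ResearchObject"),
        (manifest_uri, RDF_TYPE, RO + "Manifest"),
        (manifest_uri, ORE + "describes", ro_uri),
    ]
    for uri in resource_uris:
        triples.append((ro_uri, ORE + "aggregates", uri))
        triples.append((uri, RDF_TYPE, RO + "Resource"))
    return triples

def to_ntriples(triples):
    """Serialise URI-only triples as N-Triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)

# Hypothetical research object with two aggregated resources.
triples = manifest_triples(
    "http://example.org/ro/1/",
    "http://example.org/ro/1/.ro/manifest.rdf",
    ["http://example.org/ro/1/workflow.t2flow", "http://example.org/ro/1/results.csv"],
)
ntriples = to_ntriples(triples)
```

A production tool would normally use an RDF library rather than string formatting. Plain tuples are used here only to keep the sketch dependency-free.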

Page 14: 2013-01-17 Research Object

Grounding Workflow-centric Research Objects Using Semantic Technologies

We use the Annotation Ontology (AO) to annotate research object resources and their relationships.
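An annotation in this style can be pictured as a small set of triples: the research object aggregates the annotation, the annotation points at the resource it describes, and its body is a separate RDF file. The sketch below uses the term names from the anatomy slide (`ro:SemanticAnnotation`, `ro:annotatesAggregatedResource`, `ao:body`). The `ro` and `ao` namespace URIs and all `example.org` URIs are assumptions for illustration.

```python
ORE = "http://www.openarchives.org/ore/terms/"
RO = "http://purl.org/wf4ever/ro#"  # assumed namespace URI for the ro: prefix
AO = "http://purl.org/ao/"          # assumed namespace URI for the ao: prefix
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def annotation_triples(ro_uri, annotation_uri, target_uri, body_uri):
    """Triples for one annotation whose body is an RDF file about the target."""
    return [
        (ro_uri, ORE + "aggregates", annotation_uri),
        (annotation_uri, RDF_TYPE, RO + "SemanticAnnotation"),
        (annotation_uri, RO + "annotatesAggregatedResource", target_uri),
        (annotation_uri, AO + "body", body_uri),
    ]

# Hypothetical annotation describing a workflow held in the research object.
annotation = annotation_triples(
    ro_uri="http://example.org/ro/1/",
    annotation_uri="http://example.org/ro/1/.ro/annotations/1",
    target_uri="http://example.org/ro/1/workflow.t2flow",
    body_uri="http://example.org/ro/1/.ro/annotations/workflow-description.rdf",
)
```

Keeping the body in its own RDF file means the statements about a resource can be replaced or versioned without touching the resource itself.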

Page 15: 2013-01-17 Research Object

Relating resources in research object

[Figure: graph of resources in a research object. Nodes: Workflow_13, Workflow_16, Common pathways, QTL, Results, Logs, Metadata, Paper, Slides. Edge labels: feeds into, produces, included in, published in]

The provenance of the RO elements is key to understanding, comparing and debugging scientific workflows, and to verifying the validity of a claim made within the context of an RO
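Relations such as "produces", "feeds into" and "included in" make it possible to trace a claim back to the workflows behind it. The Python sketch below walks such relations backwards. The edge list is entirely hypothetical: it reuses node and relation names from the slide, but the actual connections in the figure are not recoverable from this transcript.

```python
from collections import defaultdict

# Hypothetical edges (source, relation, target); NOT the real figure contents.
edges = [
    ("Workflow_13", "produces", "results_13"),
    ("results_13", "feeds into", "Workflow_16"),
    ("Workflow_16", "produces", "results_16"),
    ("results_16", "included in", "paper"),
]

def upstream(resource, edges):
    """Return every resource that directly or indirectly leads to `resource`."""
    incoming = defaultdict(list)
    for source, _relation, target in edges:
        incoming[target].append(source)
    seen, stack = set(), [resource]
    while stack:
        for source in incoming[stack.pop()]:
            if source not in seen:
                seen.add(source)
                stack.append(source)
    return seen

paper_lineage = upstream("paper", edges)
```

With these illustrative edges, asking for the lineage of the paper returns both workflows and both result sets, which is the kind of question a reviewer checking a claim would ask.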

Page 16: 2013-01-17 Research Object

Evolution of a research object

[Figure: lifecycle of a research object. Roles: Scientist, Librarian/Curator]

» Live RO: the working research object of the scientist
» "My supervisor calls me to report my work": <<copy>> to an RO snapshot, identified by a URI, with some metadata and some curation, mostly private (for my group)
» "My supervisor calls me again and we decide to publish our RO+paper": <<copy>> to an RO snapshot, identified by a URI, with some metadata and some curation, mostly private (for my group and for paper reviewers)
» "Reviews received and final version published": <<copy, filter and curate>> to an Archived RO, identified by a URI, with good metadata and curation, mostly public
» "A new PhD student continues my work": <<copy>> to a Live RO
» <<versionOf>> links relate the versions
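The copy and "copy, filter and curate" steps can be sketched as functions that derive an immutable object from a live one. This is only an illustrative model in Python. The class, the field names, the file names and the `example.org` URIs are hypothetical and are not the Wf4Ever evolution vocabulary.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class ResearchObject:
    uri: str
    status: str                        # "live", "snapshot" or "archived"
    resources: Tuple[str, ...]
    copied_from: Optional[str] = None  # URI of the RO this one was copied from

def snapshot(live, uri):
    """<<copy>>: freeze the current content of a live RO under a new URI."""
    return ResearchObject(uri, "snapshot", live.resources, copied_from=live.uri)

def archive(live, uri, keep):
    """<<copy, filter and curate>>: keep only the resources chosen for publication."""
    kept = tuple(r for r in live.resources if keep(r))
    return ResearchObject(uri, "archived", kept, copied_from=live.uri)

live = ResearchObject(
    "http://example.org/ro/live",
    "live",
    ("workflow.t2flow", "results.csv", "scratch-notes.txt"),
)
snap = snapshot(live, "http://example.org/ro/snapshot-1")
archived = archive(
    live,
    "http://example.org/ro/archived-1",
    keep=lambda name: not name.startswith("scratch"),
)
```

Frozen dataclasses are used so that a snapshot cannot change after it is taken, mirroring the idea that each snapshot and archived RO is identified by its own URI.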

Page 17: 2013-01-17 Research Object

PROV standard - Basis for evolution model

http://www.w3.org/TR/prov-primer/

Candidate Recommendation

Page 18: 2013-01-17 Research Object

Wf4Ever Tools: Customizable preservability checklists

Page 19: 2013-01-17 Research Object

Wf4Ever Tools: Portal for browsing and annotating

Page 20: 2013-01-17 Research Object

Wf4Ever Tools: Command line tools, client libraries

https://github.com/wf4ever/

Page 21: 2013-01-17 Research Object

Wf4Ever Tools: Specifications and APIs

Page 22: 2013-01-17 Research Object

Current Status and Ongoing Work

Models/spec v0.1 public: http://purl.org/wf4ever/model

- Upcoming revision v0.2 (Q1 2013):
  • Minor additions to workflow model terms
  • “RO Terms”: upper, user-level view of the RO (hypothesis, results); many are “shortcuts” for the structured model
- TODO: Update annotation model to Open Annotation Data Model (OAC)
- TODO: PAV for detailed authorship provenance

Showing, managing and sharing of Research Objects through the myExperiment web site [3]

[3] http://www.myexperiment.org/

Page 23: 2013-01-17 Research Object

Open Annotation Data Model

http://www.openannotation.org/spec/core/

Community Draft

“Almost final” spec: 2013-01-28

Roll out meeting in Manchester: March 2013

Page 24: 2013-01-17 Research Object


myExperiment RO support

Page 25: 2013-01-17 Research Object

Thank you!

http://www.wf4ever-project.org/
http://www.mygrid.org.uk/