Reproducibility Using Semantics: An Overview Dagstuhl Seminar Jan 2016 Daniel Garijo, Olga Giraldo, Idafen Santana-Pérez, Victor Rodriguez Doncel, Oscar Corcho Ontology Engineering Group Universidad Politécnica de Madrid Madrid, Spain

Upload: dgarijo

Post on 21-Jan-2017


Page 1: Reproducibility Using Semantics: An Overview

Reproducibility Using Semantics: An Overview

Dagstuhl Seminar, Jan 2016

Daniel Garijo, Olga Giraldo, Idafen Santana-Pérez, Victor Rodriguez Doncel, Oscar Corcho

Ontology Engineering Group Universidad Politécnica de Madrid

Madrid, Spain

Page 2: Reproducibility Using Semantics: An Overview

The Research Method in different disciplines

In vivo / in vitro: input data, laboratory protocol, equipment

In silico: dataset, scientific workflow, infrastructure

Page 3: Reproducibility Using Semantics: An Overview

Some problems in lab protocols

Some protocols present insufficient granularity.

Their instructions can be imprecise or ambiguous due to the use of natural language:

• Incubate the centrifuge tubes in a water bath.

• Incubate the samples for 5 min with gentle shaking.

• Rinse DNA briefly in 1-2 ml of wash.

• Incubate at -20C overnight.
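The ambiguity in the instructions above can be made concrete with a small check. The following is a minimal sketch, not the SMART Protocols tooling: the regular expressions and the list of vague terms are assumptions chosen for illustration. It reports which parameters an instruction states explicitly and which imprecise terms it relies on.

```python
import re

# Hypothetical heuristics for illustration only; they are not part of
# SMART Protocols or GATE.
PARAMETER_PATTERNS = {
    "temperature": re.compile(r"-?\d+(\.\d+)?\s*°?\s*C\b"),
    "duration": re.compile(r"\b\d+(\.\d+)?\s*(s|sec|min|h|hr|hours?)\b", re.IGNORECASE),
    "volume": re.compile(
        r"\b\d+(\.\d+)?(\s*-\s*\d+(\.\d+)?)?\s*(µl|ul|ml|l)\b", re.IGNORECASE
    ),
}

# Assumed examples of wording that leaves the exact value to the reader.
VAGUE_TERMS = ("briefly", "gentle", "overnight")


def review_instruction(text):
    """Return the parameters stated explicitly and the vague terms used."""
    stated = sorted(
        name for name, pattern in PARAMETER_PATTERNS.items() if pattern.search(text)
    )
    vague = [term for term in VAGUE_TERMS if term in text.lower()]
    return {"stated": stated, "vague_terms": vague}


if __name__ == "__main__":
    for line in (
        "Incubate the centrifuge tubes in a water bath.",
        "Incubate the samples for 5 min with gentle shaking.",
        "Rinse DNA briefly in 1-2 ml of wash.",
        "Incubate at -20C overnight.",
    ):
        print(line, review_instruction(line))
```

For the first instruction the check finds no stated parameter at all: neither the bath temperature nor the incubation time is given.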


Page 4: Reproducibility Using Semantics: An Overview

Currently…

Semi-structured information

Unstructured information

How to formalize the information from laboratory protocols as a knowledge base?

NLP tools + Ontologies


Page 5: Reproducibility Using Semantics: An Overview

Semantic annotation

SMART Protocols ontology is available here: http://vocab.linkeddata.es/SMARTProtocols/

GATE Smart Protocols
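To illustrate what a semantic annotation of a protocol step could look like once extracted, here is a minimal sketch that serializes annotations as N-Triples. The `http://example.org/protocol#` namespace and the property names (`action`, `duration`) are hypothetical placeholders, not terms from the SMART Protocols ontology.

```python
# Hypothetical namespace used only for this sketch; real annotations would
# use terms from an ontology such as SMART Protocols.
EX = "http://example.org/protocol#"


def to_ntriples(step_id, annotations):
    """Serialize {property: literal value} pairs for one step as N-Triples."""
    subject = f"<{EX}{step_id}>"
    lines = []
    for prop, value in sorted(annotations.items()):
        # Escape backslashes and double quotes inside the literal.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'{subject} <{EX}{prop}> "{escaped}" .')
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_ntriples("step2", {"action": "incubate", "duration": "5 min"}))
```

Once the step is expressed as triples, the duration becomes a queryable value instead of a phrase buried in free text.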


Page 6: Reproducibility Using Semantics: An Overview

The Research Method in different disciplines

In vivo / in vitro: input data, laboratory protocol, equipment

In silico: dataset, scientific workflow, infrastructure

Page 7: Reproducibility Using Semantics: An Overview

Vocabularies and methodologies for representing and publishing workflows

[Figure: Methodology for workflow publishing. Wings workflow generation runs on a local laptop, a shared host, or a web server; each WINGS installation (core and portal) holds workflow templates and workflow instances and offers a PROV export. After OPM/PROV conversion, the workflow plan and the workflow provenance are published as Linked Data in an RDF triple store. Users and other workflow environments can then share and reuse them through interactive browsing (Pubby frontend) and programmatic access (external apps).]

Repository of linked workflows: http://www.opmw.org/sparql

P-Plan ontology: http://purl.org/net/p-plan

OPMW ontology: http://www.opmw.org/ontology/

Daniel Garijo and Yolanda Gil. 2011. A new approach for publishing workflows: abstractions, standards, and linked data. (WORKS '11). ACM, New York, NY, USA, 47-56.

Daniel Garijo and Yolanda Gil. 2012. Augmenting PROV with Plans in P-PLAN: Scientific Processes as Linked Data. In Proceedings of the 2nd International Workshop on Linked Science 2012, Boston.
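Programmatic access to the repository of linked workflows could look like the sketch below. It only builds a SPARQL query and the corresponding GET request URL; it does not contact the endpoint, and whether the endpoint is still reachable is not assumed. The namespace `http://purl.org/net/p-plan#` and the terms `p-plan:Step` and `p-plan:isStepOfPlan` are assumptions based on the P-Plan vocabulary and should be checked against the ontology documentation.

```python
from urllib.parse import urlencode

ENDPOINT = "http://www.opmw.org/sparql"  # endpoint named on the slide
P_PLAN = "http://purl.org/net/p-plan#"   # assumed namespace form (with '#')


def build_steps_query(limit=10):
    """Build a SPARQL query listing workflow steps and the plans they belong to."""
    return (
        f"PREFIX p-plan: <{P_PLAN}>\n"
        "SELECT ?step ?plan WHERE {\n"
        "  ?step a p-plan:Step ;\n"
        "        p-plan:isStepOfPlan ?plan .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_request_url(query):
    """Encode the query as the 'query' parameter of a SPARQL protocol GET request."""
    return ENDPOINT + "?" + urlencode({"query": query})


if __name__ == "__main__":
    print(build_request_url(build_steps_query(5)))
```

The resulting URL can be passed to any HTTP client; requesting the `application/sparql-results+json` media type is the usual way to get machine-readable bindings.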

Page 8: Reproducibility Using Semantics: An Overview

The Research Method in different disciplines

In vivo / in vitro: input data, laboratory protocol, equipment

In silico: dataset, scientific workflow, infrastructure

Page 9: Reproducibility Using Semantics: An Overview

Reproducibility of Computational Scientific Experiments

[Figure: the former equipment is annotated, and the resulting semantic annotations are used to reproduce an equivalent execution environment in the cloud.]

Workflow systems and workflows shown:

• Pegasus: Montage, SoyKB, Epigenomics
• Dispel4Py: Internal Extinction, Seismic Cross Correlation
• Makeflow: Blast
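The annotate-then-reproduce idea can be sketched as follows: if the annotations record which software components a workflow needs and what each depends on, an installation order for the new environment follows from a topological sort. This is a simplified illustration, not the authors' system; the component names are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def install_order(requirements):
    """Return an order in which every component comes after its dependencies.

    `requirements` maps a component name to the set of components it
    depends on, as they might be recorded in semantic annotations.
    """
    return list(TopologicalSorter(requirements).static_order())


if __name__ == "__main__":
    # Hypothetical annotations for illustration only.
    annotations = {
        "workflow-engine": {"java-runtime"},
        "java-runtime": {"operating-system"},
        "application-binaries": {"operating-system"},
    }
    print(install_order(annotations))
```

Several valid orders may exist; the sort only guarantees that dependencies are installed first.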

Page 10: Reproducibility Using Semantics: An Overview

Some results

• Pegasus Montage Workflow
  • Astronomy workflow
  • Construct large image mosaics of the sky
  • Montage Software distribution
  • 59 binaries
• Target IaaS Cloud Providers
  • Amazon EC2 & FutureGrid
  • Vagrant

RO available at http://pegasus.isi.edu/publications/reppar

Page 11: Reproducibility Using Semantics: An Overview

The Research Method in different disciplines

In vivo / in vitro: input data, laboratory protocol, equipment

In silico: dataset, scientific workflow, infrastructure

+ CONTEXT!

Page 12: Reproducibility Using Semantics: An Overview

Research Objects

ROs as web pages: http://rohub.linkeddata.es/

ROs as part of a Linked Data Platform (alpha): http://purl.org/net/ldp4ro

Page 13: Reproducibility Using Semantics: An Overview

How to preserve Workflows/Research Objects?

Three main ways/levels:
• Descriptive reproducibility
  • Documentation
• Workflow execution reproducibility
  • Can we run the workflow?
• Workflow results reproducibility
  • Can we get the same results?

Checklists!
• Corcho et al: Checklist for workflow conservation.
  • http://dx.doi.org/10.6084/m9.figshare.1285011
  • 40 different aspects
    • Documentation
    • Goals
    • Results
    • Metadata
• Corcho et al: Checklist for a workflow conservation plan
  • http://dx.doi.org/10.6084/m9.figshare.1285012
  • Based on the DCC's data management plan
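The question "Can we get the same results?" can be checked mechanically in the simplest case by comparing checksums of the outputs of the original run and the re-run. The sketch below assumes outputs are available as raw bytes keyed by a name; the function names are illustrative and not taken from the checklists.

```python
import hashlib


def digest(data):
    """SHA-256 hex digest of a result's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def compare_results(original, rerun):
    """Compare two runs given as {output name: bytes}; return a status per output."""
    report = {}
    for name in sorted(set(original) | set(rerun)):
        if name not in rerun:
            report[name] = "missing in rerun"
        elif name not in original:
            report[name] = "unexpected in rerun"
        elif digest(original[name]) == digest(rerun[name]):
            report[name] = "identical"
        else:
            report[name] = "different"
    return report


if __name__ == "__main__":
    first = {"mosaic.fits": b"pixels", "log.txt": b"run 1"}
    second = {"mosaic.fits": b"pixels", "log.txt": b"run 2"}
    print(compare_results(first, second))
```

Byte-level equality is a strict criterion: outputs that embed timestamps or host names will differ even when the scientific result is the same, so a domain-specific comparison may be needed in practice.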

Page 14: Reproducibility Using Semantics: An Overview

Some examples


Levels of reproducibility

Workflow conservation Plan

Page 15: Reproducibility Using Semantics: An Overview

Intellectual property rights


Visit http://licensius.com!

Page 16: Reproducibility Using Semantics: An Overview

Acknowledgements

• The Semantic e-Science team at UPM
  • Carlos Badenes
  • Daniel Garijo
  • Olga Giraldo
  • Rafael González-Cabero
  • Idafen Santana
  • Victor Rodriguez Doncel
• The Wf4Ever team
  • Carole Goble, José Manuel Gómez Pérez, Raúl Palma, Jun Zhao, Stian Soiland-Reyes, Khalid Belhajjame, José Enrique Ruíz, Marco Roos, Lourdes Verdes-Montenegro, Norman Morrison, Sean Bechhofer, Graham Klyne, Matt Gamble, and many others
• The Research Object community group
  • http://www.researchobject.org/