Post on 14-Dec-2015

An Open Provenance Model for Scientific Workflows

Professor Luc Moreau, L.Moreau@ecs.soton.ac.uk, University of Southampton

www.ecs.soton.ac.uk/~lavm

Provenance & PASOA Teams

University of Southampton: Luc Moreau, Paul Groth, Simon Miles, Victor Tan, Miguel Branco, Sofia Tsasakou, Sheng Jiang, Steve Munroe, Zheng Chen

IBM UK (EU Project Coordinator): John Ibbotson, Neil Hardman, Alexis Biller

University of Wales, Cardiff: Omer Rana, Arnaud Contes, Vikas Deora, Ian Wootten, Shrija Rajbhandari

Universitat Politècnica de Catalunya (UPC): Steven Willmott, Javier Vazquez

SZTAKI: Laszlo Varga, Arpad Andics, Tamas Kifor

German Aerospace: Andreas Schreiber, Guy Kloss, Frank Danneman

Contents

Motivation

Provenance Concept Map

Process documentation in a concrete bioinformatics application

Conclusions

Motivation

Peer Review/Audit

Accounting

Banking

Healthcare

Academic publishing

e-Science datasets

How to undertake peer-reviewing and validation of e-Scientific results?

Current Solutions

Proprietary, monolithic

Silos, closed

Do not inter-operate with other applications

Not adaptable to new regulations

Provenance

Oxford English Dictionary:

the fact of coming from some particular source or quarter; origin, derivation

the history or pedigree of a work of art, manuscript, rare book, etc.; concretely, a record of the passage of an item through its various owners.

Concept vs representation

Application Drivers

Aerospace engineering: maintain a historical record of design processes, up to 99 years.

Organ transplant management: tracking of previous decisions, crucial to maximise the efficiency in matching and recovery rate of patients

High Energy Physics: tracking, analysing, verifying data sets in the ATLAS Experiment of the Large Hadron Collider (CERN)

Bioinformatics: verification and auditing of “experiments” (e.g. for drug approval)

Provenance Concept Map

[Figure: concept map. Nodes: Application, Services, Process, Data product, Process Documentation, P-structure, P-assertions, Provenance (concept), Provenance (representation), Provenance Query. Edge labels: is an execution of, consists of, assert, produces, documents, contains, has a structure, operates over, is defined as a past, is represented by, is obtained by, has.]

Making Applications Provenance Aware

[Figure: Applications, Data Product, Provenance Store]

Assert p-assertions and record them as process documentation

Obtain the provenance of data by issuing provenance queries

Process Documentation

[Figure: a service exchanging messages M1, M2, M3, M4, with internal functions f1 and f2]

Interaction p-assertions: I received M1, M4; I sent M2, M3

Relationship p-assertions: M3 = f1(M1); M2 = f2(M1, M4); M2 is in reply to M1

Service state p-assertions: I received M1 at time t; I used algorithm x.y.z

Data flow

Interaction p-assertions allow us to specify a flow of data between services

Relationship p-assertions allow us to characterise the flow of data “inside” a service

Overall data flow (internal + external) constitutes a DAG, which characterises the process that led to a result
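
The data-flow idea can be sketched in a few lines of Python. This is a minimal illustration, not the PASOA implementation: the tuple encoding of relationship p-assertions and the helper names `build_dag` and `ancestors` are assumptions made here. The messages M1, M2, M3, M4 and the functions f1, f2 are taken from the Process Documentation slide.

```python
from collections import defaultdict

# Hypothetical encoding of relationship p-assertions: (output, function, inputs).
# Mirrors the slide's example: M3 = f1(M1), M2 = f2(M1, M4).
RELATIONSHIPS = [
    ("M3", "f1", ["M1"]),
    ("M2", "f2", ["M1", "M4"]),
]

def build_dag(relationships):
    """Map each derived item to the set of items it was computed from."""
    parents = defaultdict(set)
    for output, _function, inputs in relationships:
        parents[output].update(inputs)
    return parents

def ancestors(parents, item):
    """Return every item that `item` transitively depends on."""
    seen = set()
    stack = [item]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

dag = build_dag(RELATIONSHIPS)
print(sorted(ancestors(dag, "M2")))  # ['M1', 'M4']
```

In a multi-service setting, the same message identifier appearing in a sender's and a receiver's interaction p-assertions would be what joins the per-service edges into a single graph; the sketch covers only one service's internal edges.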

Process Documentation in a Concrete Bioinformatics Application

Biology

Determine how protein sequences fold into a 3D structure.

Structure of protein sequences may help to answer this question.

Structure can be quantified by textual compressibility.

Determine the amino acid groupings that maximise compressibility.
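
As a rough illustration of "textual compressibility", the Python sketch below recodes a sequence with a reduced alphabet and measures how well it compresses. Everything specific here is an assumption for illustration: `zlib` stands in for whatever compressor the application used, and the grouping and the toy sequence are made up rather than real protein data.

```python
import zlib

def recode(sequence, grouping):
    """Replace each amino acid letter by the symbol of its group."""
    return "".join(grouping[residue] for residue in sequence)

def compressibility(text):
    """Compressed size divided by original size; lower means more compressible."""
    raw = text.encode("ascii")
    return len(zlib.compress(raw, 9)) / len(raw)

# Hypothetical two-group alphabet over a toy set of residues (illustration only).
GROUPING = {"A": "h", "L": "h", "V": "h", "K": "p", "E": "p", "D": "p"}

# Toy sequence, not real protein data.
SEQUENCE = "ALVKEDAVLKDEALVEKD" * 20

print(compressibility(SEQUENCE))
print(compressibility(recode(SEQUENCE, GROUPING)))
```

Comparing scores across groupings with different alphabet sizes would need some normalisation, which this sketch leaves out.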

Collaboration Diagram

Actual Call DAG

The P-Structure

The logical structure of a provenance store

Interaction Record

The set of p-assertions pertaining to a given interaction (i.e., message exchange between a sender and a receiver)

Interaction Key

A unique identifier for an interaction

Sender identity

Receiver identity

Local id

View

The set of p-assertions created by an asserter involved in an interaction (sender or receiver view)

Asserter

The identity of an asserter

Interaction P-Assertion

An assertion of the contents of a message by an actor that has sent or received that message

Interaction P-Assertion Content

The content of an interaction p-assertion: here, the invocation of blast (through a wrapper)

Interaction Content

Provenance-related information passed in application messages

Actor State P-Assertion

An assertion made by an actor about its internal state in the context of a specific interaction

Relationship P-Assertion

With respect to an interaction, a relationship p-assertion is an assertion, made by an actor, that describes how the actor obtained output data or the whole message sent in that interaction by applying some function to input data or messages from other interactions.

Subject Id

The identity of the subject of a relationship

Object Id

The identity of the object of a relationship
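
The definitions above can be mirrored in a small Python data structure. This is a sketch under stated assumptions, not the PASOA schema: the class and field names (`InteractionKey`, `PAssertion`, `InteractionRecord`, `PStructure`, `kind`, `content`) and the actor names are chosen here for illustration, and p-assertion content is reduced to a plain string.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass(frozen=True)
class InteractionKey:
    """Unique identifier for one message exchange."""
    sender: str
    receiver: str
    local_id: str

@dataclass
class PAssertion:
    """One assertion; `kind` is 'interaction', 'actor_state' or 'relationship'."""
    kind: str
    asserter: str
    content: str

@dataclass
class InteractionRecord:
    """All p-assertions about one interaction, split into the two views."""
    key: InteractionKey
    sender_view: List[PAssertion] = field(default_factory=list)
    receiver_view: List[PAssertion] = field(default_factory=list)

class PStructure:
    """In-memory stand-in for the logical structure of a provenance store."""

    def __init__(self) -> None:
        self.records: Dict[InteractionKey, InteractionRecord] = {}

    def record(self, key: InteractionKey, assertion: PAssertion) -> None:
        """File the assertion under the view of whoever asserted it."""
        entry = self.records.setdefault(key, InteractionRecord(key))
        if assertion.asserter == key.sender:
            entry.sender_view.append(assertion)
        elif assertion.asserter == key.receiver:
            entry.receiver_view.append(assertion)
        else:
            raise ValueError("asserter took no part in this interaction")

# Hypothetical actors; only the blast wrapper is mentioned in the slides.
store = PStructure()
key = InteractionKey(sender="client", receiver="blast-wrapper", local_id="1")
store.record(key, PAssertion("interaction", "client", "invoke blast"))
store.record(key, PAssertion("actor_state", "blast-wrapper", "received at time t"))
print(len(store.records[key].sender_view), len(store.records[key].receiver_view))  # 1 1
```

Keying records by interaction lets sender and receiver file their views independently, which fits the later point that documentation can be produced autonomously and asynchronously.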

Process Documentation Characteristics

Common logical structure of the provenance store shared by all asserting and querying actors

Can be produced autonomously, asynchronously by the different application components

Open, extensible model, for which we are producing a public specification

Tools can operate on it (e.g. visualisation, reasoning)

Performance (HPDC’05)

Standardisation Philosophy

Thin layer common between systems: extensible data model

Model can be extended for specific technologies (WS, Web, …) or application domains (Bio, Healthcare, Desktop, …)

Service interfaces

Proposed List of Specifications

[Figure: diagram of the proposed specifications. Specifications: WS-Prov-Intro, WS-Prov-Primer, WS-Prov-Glo, WS-Prov-DM, WS-Prov-Rec, WS-Prov-Query, WS-Prov-DM-Link, WS-Prov-DM-Infer, WS-Prov-DM-DS, WS-Prov-DM-Sec, WS-Prov-DM-Rel, WS-Prov-SOAP, WS-Prov-WWW. Group labels: Generic Profiles, Domain Specific Profiles, Technology Bindings.]

Conclusions

To Sum Up

[Figure: Provenance Store diagram. Labels: Record; Query; Compliance check; Rerun/Reproduce; Analyse; Standardising the documentation of Business Processes; Provenance Architecture; Methodology; Apply. Sectors: Healthcare, Distribution, Finance, Aerospace, Automobile, Pharmaceutical.]

Slide from John Ibbotson

Conclusions

Crucial topic for many applications

Full architectural specification

Implementation available for download

Methodology to make applications provenance-aware

Draft standardisation proposal to be released

www.pasoa.org

www.gridprovenance.org

twiki.ipaw.info

Provenance Challenge

Provenance Challenge Workshop at OGF18, Washington, September 11-14

Questions