![Page 1: A Framework for Scientific Workflow Reproducibility in the Cloudescience-2016.idies.jhu.edu/.../11/Qasha-Rawaa-slides.pdf · 2016. 11. 1. · Rawaa Qasha, Jacek Cała, Paul Watson](https://reader036.vdocument.in/reader036/viewer/2022071401/60eaeba48e52f359f63ec557/html5/thumbnails/1.jpg)
A Framework for Scientific Workflow Reproducibility in the Cloud
Rawaa Qasha, Jacek Cała, Paul Watson
Newcastle University, Newcastle upon Tyne, UK
Email: {r.qasha, jacek.cala, paul.watson}@newcastle.ac.uk
In this paper
• A new framework for repeatability and reproducibility of scientific workflows
• Integrating logical and physical preservation approaches
• Offering workflow and task repositories with version control
• Supporting automatic deployment and image capture of workflows and tasks
Outline
• Background
• Challenges for workflow reproducibility
• Our solution for logical and physical preservation
• Overview of the reproducibility framework
• Experiments and results
• Conclusions
Workflows & Reproducibility
[Bar chart. Y-axis: number of workflows (0 to 1600). X-axis: study1*, study2**.]
• Total no. of workflows: 92 (study1), 1443 (study2)
• Workflows that can be re-executed: 18 (~20%) (study1), 341 (~24%) (study2)
*Zhao et al., “Why workflows break: Understanding and combating decay in Taverna workflows,” 2012
**Mayer et al., “A Quantitative Study on the Re-executability of Publicly Shared Scientific Workflows,” 2015
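The rounded percentages on this slide follow directly from the counts. As a quick check, here is a minimal Python sketch; the dictionary layout and the helper name `reexecutable_share` are illustrative and do not come from the slides.

```python
# Counts as read from the slide's bar chart; the data structure is illustrative.
studies = {
    "study1 (Zhao et al., 2012)": {"total": 92, "re_executable": 18},
    "study2 (Mayer et al., 2015)": {"total": 1443, "re_executable": 341},
}


def reexecutable_share(total: int, re_executable: int) -> float:
    """Return the percentage of workflows that could be re-executed."""
    return 100.0 * re_executable / total


for name, counts in studies.items():
    share = reexecutable_share(counts["total"], counts["re_executable"])
    print(f"{name}: {counts['re_executable']}/{counts['total']} = {share:.1f}%")
```

Both shares round to the figures quoted on the slide (~20% and ~24%), so in each study roughly three out of four published workflows could not be re-executed.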
Challenges for workflow reproducibility
• Insufficiently detailed workflow description
• Insufficient description of the execution environment
• Unavailable execution environments
• Absence of, and changes in, external dependencies
• Missing input data
Common reproducibility approaches
[Diagram: workflow with tasks T1, T2, T3, T4]
• Logical preservation
• Physical preservation
Using TOSCA for logical preservation
Workflow and execution environment description
[Diagram labels: workflow tasks T1, T2, T3, T4; Node Type; Relationship Type; Node Template (T1), Node Template (T2), Node Template (T3), Node Template (T4); Service Template]
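To make the idea of a logical description more concrete, here is a rough Python sketch that lays out a four-task workflow as a TOSCA-style structure: one node template per task, grouped in a service template, with typed relationships between tasks. It builds a plain dictionary rather than a spec-conformant TOSCA document. The type names (`example.nodes.WorkflowTask`, `example.relationships.DependsOn`), the `artifact` property, and the task ordering are assumptions made for illustration; the slide names T1 to T4 but does not show these details, and this is not the authors' actual template.

```python
import json

# Hypothetical task ordering: the slide names T1..T4 but does not show the edges.
DEPENDENCIES = {"T1": [], "T2": ["T1"], "T3": ["T1"], "T4": ["T2", "T3"]}


def build_service_template(dependencies: dict) -> dict:
    """Build a TOSCA-style description with one node template per task."""
    node_templates = {}
    for task, upstream in dependencies.items():
        node_templates[task] = {
            "type": "example.nodes.WorkflowTask",  # hypothetical node type
            "properties": {"artifact": f"{task}.jar"},  # hypothetical property
            "requirements": [
                {"target": dep, "relationship": "example.relationships.DependsOn"}
                for dep in upstream
            ],
        }
    return {
        "node_types": {"example.nodes.WorkflowTask": {}},
        "relationship_types": {"example.relationships.DependsOn": {}},
        "service_template": {"node_templates": node_templates},
    }


template = build_service_template(DEPENDENCIES)
print(json.dumps(template, indent=2))
```

Because the whole workflow is plain structured text, it can be stored and versioned like source code, which is what makes this kind of description useful for preservation.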
Using Docker for physical preservation
Preserving execution environment and dependencies, tracking changes
[Diagram, two panels.]
(a) Initial task deployment & execution: base image, container creation, data, task artifact, tools & libs., container with dependencies, image creation, task image
(b) Task deployment & execution with task image: task image, container creation, data
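The two deployment paths on this slide can be read as a simple caching rule: the first deployment builds the task environment from a base image and captures the result as a task image, and later deployments start directly from that image. The Python sketch below simulates this rule with an in-memory dictionary standing in for the image repository. It does not call Docker; the function name `deploy_task`, the step strings, the image naming scheme, and the exact order of steps are all hypothetical.

```python
def deploy_task(task: str, image_repo: dict, base_image: str = "base-image") -> list:
    """Return the ordered deployment steps for a task.

    image_repo maps task names to captured task images and is only a
    stand-in for a real image repository.
    """
    if task in image_repo:
        # Path (b): a task image exists, so dependencies are already baked in.
        return [
            f"create container from {image_repo[task]}",
            "attach input data",
            "execute task",
        ]

    # Path (a): initial deployment starting from the base image.
    steps = [
        f"create container from {base_image}",
        "install tools and libraries",
        "deploy task artifact",
        "capture container as task image",
        "attach input data",
        "execute task",
    ]
    image_repo[task] = f"{task}-image"  # hypothetical naming scheme
    return steps


repo = {}
first_run = deploy_task("T1", repo)
second_run = deploy_task("T1", repo)
print(len(first_run), "steps on first deployment")
print(len(second_run), "steps with the captured task image")
```

The second call skips installation entirely, which is the effect the slide attributes to deploying with a task image.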
Reproducibility Framework
[Architecture diagram components:]
• Task/WF Repository (GitHub)
• Images Repository (Docker Hub)
• Core Repository (GitHub)
• LifeCycle Scripts, Basic Types
• Workflow Deployment & Enactment Engine (TOSCA Runtime Environment: Cloudify)
• Automated Image Creation
• Target Execution Environment (Docker over local VM, AWS, Azure, GCE, …)
Multi-container deployment
Single container deployment
Timeline of workflow DevOps
Workflow repository
Preserving description, input data, tracking changes and deployment instructions
Experiments and Results
1- Repeatability of a workflow on different clouds
2- Automatic image capture for improved performance
3- Reproducibility in the face of development changes
Conclusions
• Full workflow reproducibility is a long-standing issue
• TOSCA description is used for logical preservation
• Docker images for tasks/workflows support physical preservation
• Change tracking and automatic deployment also contribute to a comprehensive solution to the problem
• Integrating these techniques addresses the majority of the issues related to workflow decay
THANK YOU