Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users
eScience 2014, Guarujá (Brasil). Abstract: Workflow reuse is a major benefit of workflow systems and shared workflow repositories, but there are barely any studies that quantify the degree of reuse of workflows or the practical barriers that may stand in the way of successful reuse. In our own work, we hypothesize that defining workflow fragments improves reuse, since end-to-end workflows may be very specific and only partially reusable by others. This paper reports on a study of the current use of workflows and workflow fragments in labs that use the LONI Pipeline, a popular workflow system used mainly for neuroimaging research that enables users to define and reuse workflow fragments. We present an overview of the benefits of workflows and workflow fragments reported by users in informal discussions. We also report on a survey of researchers in a lab that has the LONI Pipeline installed, asking them about their experiences with reuse of workflow fragments and the actual benefits they perceive. This leads to quantifiable indicators of the reuse of workflows and workflow fragments in practice. Finally, we discuss barriers to further adoption of workflow fragments and workflow reuse that motivate further work.
![Page 1: Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users](https://reader031.vdocument.in/reader031/viewer/2022020207/5584fe13d8b42ae71b8b4bbd/html5/thumbnails/1.jpg)
Date: 22/10/2014
Workflow Reuse in Practice:
A Study of Neuroimaging Pipeline Users
Daniel Garijo *, Oscar Corcho *, Yolanda Gil Ŧ, Meredith N. Braskieⱡ, Derrek Hibarⱡ, Xue Huaⱡ, Neda Jahanshadⱡ, Paul Thompsonⱡ, and Arthur W. Togaⱡ
* Universidad Politécnica de Madrid, Ŧ USC Information Sciences Institute,
ⱡ USC Laboratory of Neuroimaging
![Page 2: Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users](https://reader031.vdocument.in/reader031/viewer/2022020207/5584fe13d8b42ae71b8b4bbd/html5/thumbnails/2.jpg)
Main Contributions
• Highlight the benefits of workflows and workflow fragments reported by users in a neuroscience research lab
• Survey of workflow users
• Quantitative perspective on the identified benefits
IEEE eScience 2014. Guarujá, Brasil
[Diagram labels: create and collaborate, repository, reuse, repurpose]
![Page 3: Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users](https://reader031.vdocument.in/reader031/viewer/2022020207/5584fe13d8b42ae71b8b4bbd/html5/thumbnails/3.jpg)
Background
• Workflows are software artifacts that capture computational experiments
  • Addition to paper publication
  • Provenance of results
  • Reuse
• Existing repositories of workflows (Galaxy, myExperiment, the LONI Pipeline, CrowdLabs, etc.)
  • Sharing workflows
  • Exploring existing workflows
• Problems to address:
  • How does workflow reuse happen in a research lab environment?
  • Are workflow fragments more useful than workflows?
![Page 4: Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users](https://reader031.vdocument.in/reader031/viewer/2022020207/5584fe13d8b42ae71b8b4bbd/html5/thumbnails/4.jpg)
Use case: The LONI Pipeline
Workflow system for neuroimaging analysis http://pipeline.loni.usc.edu/explore/library-navigator/
![Page 5: Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users](https://reader031.vdocument.in/reader031/viewer/2022020207/5584fe13d8b42ae71b8b4bbd/html5/thumbnails/5.jpg)
Why LONI Pipeline?
• Need for reuse
• Grouping tools
• Manual annotation of workflow fragments
• Workflow Miner
![Page 6: Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users](https://reader031.vdocument.in/reader031/viewer/2022020207/5584fe13d8b42ae71b8b4bbd/html5/thumbnails/6.jpg)
Approach
Discussions with scientists
User survey
Collect responses from users (21 responses)
Discuss results
![Page 7: Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users](https://reader031.vdocument.in/reader031/viewer/2022020207/5584fe13d8b42ae71b8b4bbd/html5/thumbnails/7.jpg)
Possible benefits of workflows and workflow fragments
• Sharing workflows with collaborators
• Time savings
• Copy and paste fragments of workflows
• Reuse existing workflows
• Teaching
• Reduce the learning curve for new students
• Visualization
• Simplify workflows
• Design for modularity
• Highlight the most relevant steps in a workflow
![Page 8: Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users](https://reader031.vdocument.in/reader031/viewer/2022020207/5584fe13d8b42ae71b8b4bbd/html5/thumbnails/8.jpg)
Possible benefits of workflows and workflow fragments (2)
• Design for understandability
• Design for standardization
• Debugging
• Provenance exploration
• Paper writing
• Linking papers to pipelines
• Reproducibility and inspectability
![Page 9: Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users](https://reader031.vdocument.in/reader031/viewer/2022020207/5584fe13d8b42ae71b8b4bbd/html5/thumbnails/9.jpg)
Survey Analysis
![Page 10: Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users](https://reader031.vdocument.in/reader031/viewer/2022020207/5584fe13d8b42ae71b8b4bbd/html5/thumbnails/10.jpg)
Writing and Sharing Code
• Writing code is considered very important for this area of research.
• Sharing code is not considered to be as important.
![Page 11: Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users](https://reader031.vdocument.in/reader031/viewer/2022020207/5584fe13d8b42ae71b8b4bbd/html5/thumbnails/11.jpg)
Adopting a Workflow System
The overwhelming majority of respondents found the workflow system useful.
•Creation of workflows.
![Page 12: Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users](https://reader031.vdocument.in/reader031/viewer/2022020207/5584fe13d8b42ae71b8b4bbd/html5/thumbnails/12.jpg)
Adopting a workflow system: workflow size
• Scientists seem to prefer workflows of fewer than 10 steps
[Bar chart: responses by number of workflow components (1-5, 5-10, 10-20, >20); vertical axis from 0 to 14]
![Page 13: Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users](https://reader031.vdocument.in/reader031/viewer/2022020207/5584fe13d8b42ae71b8b4bbd/html5/thumbnails/13.jpg)
Reusing workflows
• Respondents answered that creating workflows is very useful
• Reuse of workflows was seen as less useful
• Reuse is not the only reason why workflows are created
• Reusing workflows from a user’s prior work is considered as useful as reusing workflows from others
![Page 14: Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users](https://reader031.vdocument.in/reader031/viewer/2022020207/5584fe13d8b42ae71b8b4bbd/html5/thumbnails/14.jpg)
Reusing workflows (2)
According to the respondents, the major benefits of workflows include:
• Time savings
• Organizing and storing code
• Having a visualization of the overall analysis
• Facilitating reproducibility
Workflows save time 13
Easier to track and debug complex code 9
Convenient way to organize/store code 11
Help write more organized code 6
Help make code more modular/reusable 4
Help make methods more understandable 8
Visualization of overall analysis 11
Workflows facilitate reproducibility 10
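To put the counts above in proportion, they can be converted to percentages. The sketch below is only an illustration: it assumes that each count is the number of respondents, out of the 21 survey responses, who selected that benefit, which the slides do not state explicitly. The names `TOTAL_RESPONSES`, `workflow_benefits` and `as_percentages` are hypothetical.

```python
# Assumption: every count is out of the 21 survey responses.
TOTAL_RESPONSES = 21

# Counts copied from the slide on the benefits of workflows.
workflow_benefits = {
    "Workflows save time": 13,
    "Easier to track and debug complex code": 9,
    "Convenient way to organize/store code": 11,
    "Help write more organized code": 6,
    "Help make code more modular/reusable": 4,
    "Help make methods more understandable": 8,
    "Visualization of overall analysis": 11,
    "Workflows facilitate reproducibility": 10,
}


def as_percentages(counts, total=TOTAL_RESPONSES):
    """Convert raw counts into percentages of `total`, rounded to one decimal."""
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


if __name__ == "__main__":
    shares = as_percentages(workflow_benefits)
    # Print the benefits from most to least frequently selected.
    for label, pct in sorted(shares.items(), key=lambda item: -item[1]):
        print(f"{pct:5.1f}%  {label}")
```

Under that assumption, "Workflows save time" corresponds to 13 of 21 respondents, or roughly 62%.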
![Page 15: Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users](https://reader031.vdocument.in/reader031/viewer/2022020207/5584fe13d8b42ae71b8b4bbd/html5/thumbnails/15.jpg)
Reusing workflows (3)
• The overwhelming majority of respondents said workflows are useful both for non-programmers and for teaching new students
Non-programmers can use them 20
New students can easily learn 19
No need for others to re-implement code 14
Adoption of standard ways to do things 9
![Page 16: Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users](https://reader031.vdocument.in/reader031/viewer/2022020207/5584fe13d8b42ae71b8b4bbd/html5/thumbnails/16.jpg)
Reusing workflows (4)
• Respondents did not report strong reasons for not sharing workflows
• Respondents did not report strong reasons for not reusing workflows from others
Others would not want to use them 1
Others ask too many questions of the creators 2
Workflows from others are difficult to understand 3
It is difficult to understand how to prepare data for a workflow 3
Workflows from others are difficult to understand 4
It is difficult to understand how to prepare data for a workflow 2
Workflows created by others are too specific 1
It is hard to take workflows created by others and make them work 2
![Page 17: Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users](https://reader031.vdocument.in/reader031/viewer/2022020207/5584fe13d8b42ae71b8b4bbd/html5/thumbnails/17.jpg)
Reusing groupings
•Reuse is not the only reason why groupings are created. Unlike workflows, reusing groupings from one’s own work is more useful than reusing groupings from others
![Page 18: Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users](https://reader031.vdocument.in/reader031/viewer/2022020207/5584fe13d8b42ae71b8b4bbd/html5/thumbnails/18.jpg)
Reusing groupings (2)
• Most respondents agreed that groupings help simplify workflows. Groupings also make workflows more understandable to others
• Other grouping benefits:
  • Time savings
  • Help make code modular and understandable, more so than workflows
  • Seen as useful to non-programmers and students
Visualization of the analysis 10
To simplify workflows that are complex overall 12
To make workflows more understandable to others 12
Groupings save time 12
Help make code more modular/reusable 10
Help make methods more understandable 7
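The grouping counts above can be set side by side with the workflow counts reported earlier in the deck. The following sketch is only illustrative: the pairing of labels across the two charts (for example, "Workflows save time" with "Groupings save time") is an assumption made here, and the names `paired_counts` and `differences` are hypothetical.

```python
# Counts copied from the slides as (workflows, groupings).
# Assumption: these labels refer to the same benefit in both charts.
paired_counts = {
    "Saves time": (13, 12),
    "Visualization of the analysis": (11, 10),
    "Helps make code more modular/reusable": (4, 10),
    "Helps make methods more understandable": (8, 7),
}


def differences(pairs):
    """Return the groupings count minus the workflows count for each benefit."""
    return {label: g - w for label, (w, g) in pairs.items()}


if __name__ == "__main__":
    # A positive value means more respondents chose the benefit for groupings.
    for label, diff in differences(paired_counts).items():
        print(f"{diff:+d}  {label}")
```

Only benefits that appear in both charts are compared; items reported for one artifact alone are left out.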
![Page 19: Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users](https://reader031.vdocument.in/reader031/viewer/2022020207/5584fe13d8b42ae71b8b4bbd/html5/thumbnails/19.jpg)
Reusing groupings (3)
Very few respondents gave reasons for not sharing groupings or for not reusing groupings from others
Overall, workflows are considered more useful than groupings. On the other hand, more respondents said that groupings help make their code more modular and understandable
Others would not want to use them 0
Others ask too many questions of the creators 1
Workflows from others are difficult to understand 4
It is difficult to understand how to prepare data for a grouping 1
Groupings from others are difficult to understand 2
It is difficult to understand how to prepare data for a grouping 3
Groupings created by others are too specific 1
It is hard to take groupings created by others and make them work 4
![Page 20: Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users](https://reader031.vdocument.in/reader031/viewer/2022020207/5584fe13d8b42ae71b8b4bbd/html5/thumbnails/20.jpg)
Paper Writing
Workflows are not systematically linked to publications
• Most respondents believe that the link between a workflow and a publication is kept in private laboratory notes, rather than in a publicly accessible manner
![Page 21: Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users](https://reader031.vdocument.in/reader031/viewer/2022020207/5584fe13d8b42ae71b8b4bbd/html5/thumbnails/21.jpg)
Discussion
Workflows have a clear benefit to the lab. This work suggests important directions for future research:
• Improve the use of groupings
  • If users had more assistance in specifying and finding groupings, workflows and fragments might be reused more
• Debugging and checking results
  • Better mechanisms for checking intermediate execution results would allow users to define larger workflows
• Better documentation of workflows
  • Documentation of workflows tends to be private and scattered, and is not usually linked to papers
• Facilitating workflow publication and linking to papers
  • Papers provide important context and documentation for workflows
![Page 22: Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users](https://reader031.vdocument.in/reader031/viewer/2022020207/5584fe13d8b42ae71b8b4bbd/html5/thumbnails/22.jpg)
Conclusions
• Contributions:
  • Highlight the benefits of workflows and workflow fragments reported by users in a neuroscience research lab
  • Quantitative survey of workflow users about these benefits
• Our work can be expanded by:
  • Validating our findings with more respondents
  • Reflecting the experience level of the respondents in the questionnaire
  • Including statistics on the use of groupings in the workflows they create
• There are clear opportunities to develop best practices for designing workflow components and modularizing code, encouraging standards adoption, and facilitating understanding by other users
All materials used and the survey are available at:
http://purl.org/net/wfSurvey-eScience2014
![Page 23: Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users](https://reader031.vdocument.in/reader031/viewer/2022020207/5584fe13d8b42ae71b8b4bbd/html5/thumbnails/23.jpg)
Who are we?
• Daniel Garijo, Oscar Corcho: Ontology Engineering Group, UPM
• Yolanda Gil: Information Sciences Institute, USC
• Meredith N. Braskie, Derrek Hibar, Xue Hua, Neda Jahanshad, Paul Thompson, Arthur W. Toga: USC Laboratory of Neuro Imaging
![Page 24: Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users](https://reader031.vdocument.in/reader031/viewer/2022020207/5584fe13d8b42ae71b8b4bbd/html5/thumbnails/24.jpg)
Questions?
![Page 25: Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users](https://reader031.vdocument.in/reader031/viewer/2022020207/5584fe13d8b42ae71b8b4bbd/html5/thumbnails/25.jpg)