Trends in Use of Scientific Workflows: Insights from a Public Repository and Recommendations for Best Practices
Richard Littauer, Karthik Ram, Bertram Ludäscher, William Michener, Rebecca Koskela
DataONE
Scientific Workflows
Tools that help scientists:
• Automate repetitive or difficult work
• Make their experiments reproducible
• Track provenance
• Share their data with other scientists
Workflow Workbenches
These facilitate:
• Creation
• Mapping
• Scheduling
• Execution
• Visualization
• Re-Use
Example Workflow
http://www.myexperiment.org/workflows/140.html
Our Study
• How are workflows being used?
• How are they being shared?
• What best practices can researchers follow to maximize the longevity and use of their work?
http://www.flickr.com/photos/eleaf/2536358399
Our Study
• www.myexperiment.org
• Est. 2007
• 5000+ users
• 2000+ workflows (mostly Taverna 1, Taverna 2, and RapidMiner)
• Minable RDF storage for workflows, groups, packs, users, and files.
• Minable data gathered through the SCUFL XML language for the Taverna workflows.
• Taverna 1: 479 workflows; Taverna 2: 684 workflows.
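To illustrate how component counts and embedded workflows can be read from a Taverna 1 definition, here is a minimal Python sketch using only the standard library's XML parser. It assumes the XScufl namespace `http://org.embl.ebi.escience/xscufl/0.1alpha` and that a nested workflow appears as an `<s:workflow>` child of a processor; `summarize_scufl` and the example document are hypothetical, not code or data from the project repository.

```python
import xml.etree.ElementTree as ET

# Assumed XScufl namespace for Taverna 1 files; verify against real files.
SCUFL_NS = {"s": "http://org.embl.ebi.escience/xscufl/0.1alpha"}


def summarize_scufl(xml_text: str) -> dict:
    """Count top-level processors and nested workflows in one SCUFL document."""
    root = ET.fromstring(xml_text)
    # Direct children of the root <s:scufl> only, so processors that belong
    # to a nested workflow are not double-counted.
    processors = root.findall("s:processor", SCUFL_NS)
    # Assumption: a nested workflow is an <s:workflow> child of a processor.
    nested = [p for p in processors if p.find("s:workflow", SCUFL_NS) is not None]
    return {
        "processors": len(processors),
        "nested_workflows": len(nested),
        "names": [p.get("name") for p in processors],
    }


# Hand-written toy document, not a real myExperiment workflow.
EXAMPLE = """<s:scufl xmlns:s="http://org.embl.ebi.escience/xscufl/0.1alpha">
  <s:processor name="fetch_sequence"><s:local>example</s:local></s:processor>
  <s:processor name="inner">
    <s:workflow>
      <s:scufl>
        <s:processor name="deep_step"><s:local>example</s:local></s:processor>
      </s:scufl>
    </s:workflow>
  </s:processor>
</s:scufl>"""

print(summarize_scufl(EXAMPLE))
```

Counting only direct children keeps each workflow's own component count separate from the components of any workflow embedded in it.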
Our Study
• We harvested information using a combination of SPARQL and Python (https://github.com/RichardLitt/Understanding-Workflows)
• We gathered view and download statistics, metadata, descriptions, tags, and so on for users, workflows, files, packs, and groups (http://thedatahub.org/dataset/myexperiment-screenscrape)
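A minimal sketch of the SPARQL side of such a harvest, using only the Python standard library: it encodes a query as a SPARQL 1.1 Protocol GET URL and flattens a response in the SPARQL 1.1 JSON results format into plain rows. The endpoint and the `Workflow` class IRI are hypothetical placeholders, and the use of the Dublin Core `title` property is an assumption about the repository's vocabulary; this is not the project's own script.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint and vocabulary: replace with the repository's real
# SPARQL endpoint and the class/property IRIs its RDF actually uses.
ENDPOINT = "http://example.org/sparql"
QUERY = """
SELECT ?workflow ?title WHERE {
  ?workflow a <http://example.org/ontology/Workflow> ;
            <http://purl.org/dc/terms/title> ?title .
}
"""


def build_request_url(endpoint: str, query: str) -> str:
    """Encode a query as a SPARQL 1.1 Protocol GET request URL."""
    return endpoint + "?" + urlencode({"query": query})


def parse_bindings(results_json: str) -> list[dict]:
    """Flatten SPARQL 1.1 JSON results into plain {variable: value} rows."""
    data = json.loads(results_json)
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in data["results"]["bindings"]
    ]


# Hand-written response in the SPARQL JSON results format, for illustration.
SAMPLE = json.dumps({
    "head": {"vars": ["workflow", "title"]},
    "results": {"bindings": [{
        "workflow": {"type": "uri", "value": "http://example.org/workflows/1"},
        "title": {"type": "literal", "value": "Demo workflow"},
    }]},
})

print(build_request_url(ENDPOINT, QUERY)[:40])
print(parse_bindings(SAMPLE))
```

In practice the URL would be fetched (for example with `urllib.request`, sending `Accept: application/sparql-results+json`) and the response body passed to `parse_bindings`.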
Findings
• A large percentage of workflows consist of few components.
• The number of components ranges from 1 to 250; the average workflow contains 24.3 tasks.
• More complex workflows are downloaded more often.
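A minimal sketch of how summaries like these could be computed, assuming each harvested workflow has been reduced to a `(component_count, downloads)` pair. The records below are invented toy values, not the study's data, and splitting at the median component count is only one hypothetical way to define "complex".

```python
from statistics import mean, median

# Invented toy records: (number of components, number of downloads).
WORKFLOWS = [(1, 3), (2, 5), (3, 4), (8, 10), (25, 30), (60, 41), (250, 55)]


def summarize(records):
    """Range and mean of component counts, plus mean downloads above/below the median count."""
    counts = [c for c, _ in records]
    cutoff = median(counts)
    simple = [d for c, d in records if c <= cutoff]
    complex_ = [d for c, d in records if c > cutoff]
    return {
        "min_components": min(counts),
        "max_components": max(counts),
        "mean_components": round(mean(counts), 1),
        "mean_downloads_simple": round(mean(simple), 1),
        "mean_downloads_complex": round(mean(complex_), 1),
    }


print(summarize(WORKFLOWS))
```

Note how a single very large workflow pulls the mean well above the median; a distribution where most workflows are small can still have a fairly high average component count.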
Findings
• Most workflow contributors submit a single workflow.
• Only 13 users have uploaded more than 30 workflows.
• Just over 5% of the users on myExperiment have uploaded workflows.
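The contribution figures reduce to counting uploads per user. A minimal sketch, assuming a list with one uploader ID per workflow and a known total number of registered users; `contribution_profile` and the example data are hypothetical.

```python
from collections import Counter


def contribution_profile(uploaders, total_users, threshold=30):
    """Summarize how uploads are spread across users.

    uploaders: one user ID per uploaded workflow.
    total_users: number of registered users, including those who never uploaded.
    """
    per_user = Counter(uploaders)
    return {
        "contributors": len(per_user),
        "single_workflow_contributors": sum(1 for n in per_user.values() if n == 1),
        "heavy_contributors": sum(1 for n in per_user.values() if n > threshold),
        "share_of_users_uploading": len(per_user) / total_users,
    }


# Invented example: one prolific user, three occasional ones, 100 users overall.
UPLOADS = ["ana"] * 35 + ["ben", "cy", "dee", "dee"]
print(contribution_profile(UPLOADS, total_users=100))
```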
Findings
• Most workflows have only one version uploaded.
• Workflows with several versions are downloaded more frequently than “single-edition” workflows.
Findings
• Workflow use declined significantly a month after initial upload.
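One way to quantify such a drop-off is the share of a workflow's downloads that occur within its first month. The sketch below assumes per-download dates are available, which may not match how the study measured use; the function and dates are hypothetical.

```python
from datetime import date, timedelta


def early_download_share(uploaded, downloads, window_days=30):
    """Fraction of a workflow's downloads that fall within `window_days` of its upload."""
    if not downloads:
        return 0.0
    cutoff = uploaded + timedelta(days=window_days)
    early = sum(1 for d in downloads if d < cutoff)
    return early / len(downloads)


# Invented dates for a single workflow.
UPLOADED = date(2012, 1, 1)
DOWNLOADS = [date(2012, 1, 2), date(2012, 1, 5), date(2012, 1, 20), date(2012, 3, 1)]
print(early_download_share(UPLOADED, DOWNLOADS))
```

A value close to 1.0 across many workflows would indicate that most use happens shortly after upload.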
Findings
• Approximately 38% of workflow components are shims.
• Shims are components that make the output of one step conform to the format expected by a subsequent step.
• This is a problem for developers.
• This is 8% more than previous studies reported (Lin et al.).
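A shim does no scientific work of its own; it only adapts data between steps. In the hypothetical Python sketch below, an upstream step returns newline-separated identifiers while the downstream step expects a list, so a small shim function bridges the two. The step names are invented; in a workbench such as Taverna, the same role might be filled by a small scripted processor.

```python
def fetch_ids() -> str:
    """Hypothetical upstream step: returns identifiers as one newline-separated string."""
    return "P12345\nQ67890\n"


def count_ids(ids: list[str]) -> int:
    """Hypothetical downstream step: expects a list of identifiers, not a string."""
    if not isinstance(ids, list):
        raise TypeError("count_ids expects a list of identifiers")
    return len(ids)


def split_lines_shim(text: str) -> list[str]:
    """Shim: reshape the upstream string into the list the next step expects."""
    return [line.strip() for line in text.splitlines() if line.strip()]


# Wiring the two steps directly would fail; the shim sits between them.
print(count_ids(split_lines_shim(fetch_ids())))
```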
Findings
• 60% of workflows have embedded workflows within them.
• Documentation on the site (tags, descriptions) does not improve use…
• … but community engagement does.
Recommendations
Remember that workflows are evolving entities.
They are updated in response to user feedback, engagement, and improvements in methodology.
Recommendations
Use relevant social annotation tools.
But they need to be constrained, for instance through a controlled tag vocabulary.
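One way to constrain social tagging is to map free-text tags onto a controlled vocabulary and flag everything else for review. A minimal sketch; the vocabulary, its variants, and `normalize_tags` are hypothetical examples rather than an existing myExperiment feature.

```python
# Hypothetical controlled vocabulary: canonical tag -> accepted variants.
VOCABULARY = {
    "sequence-alignment": {"alignment", "seq alignment", "sequence alignment"},
    "text-mining": {"textmining", "text mining"},
}


def normalize_tags(raw_tags):
    """Map free-text tags to canonical terms; return (accepted, rejected)."""
    lookup = {}
    for canonical, variants in VOCABULARY.items():
        lookup[canonical] = canonical
        for variant in variants:
            lookup[variant] = canonical
    accepted, rejected = set(), []
    for tag in raw_tags:
        key = tag.strip().lower()
        if key in lookup:
            accepted.add(lookup[key])
        else:
            rejected.append(tag)  # candidates for review or for extending the vocabulary
    return sorted(accepted), rejected


print(normalize_tags(["Sequence Alignment", "alignment", "TextMining", "cool"]))
```

Rejected tags need not be discarded; collecting them shows where the vocabulary may need new terms.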
Recommendations
Talk about your workflows.
Cite the workflow in publications.
Share it with colleagues.
Advertise the workflow.
Recommendations
Provide sufficient descriptions of your workflows.
Recommendations
Keep in mind that one size does not fit all.
Recommendations
Workflow re-use could benefit significantly from the assignment of stable identifiers, such as Digital Object Identifiers (DOIs).
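A DOI becomes a stable, citable link by prefixing it with the `https://doi.org/` resolver. The sketch below checks the common modern `10.NNNN/suffix` shape with a pattern along the lines of the one Crossref suggests (it will reject some valid legacy DOIs) and builds the resolver URL; the DOI in the example is hypothetical.

```python
import re
from urllib.parse import quote

# Matches the common modern DOI shape "10.<4-9 digit registrant>/<suffix>".
# Some older DOIs use other characters and will be rejected by this check.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)


def doi_url(doi: str) -> str:
    """Return the https://doi.org/ resolver URL for a DOI, after a basic syntax check."""
    doi = doi.strip()
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI of the form 10.NNNN/suffix: {doi!r}")
    # Percent-encode characters such as parentheses; keep "/" as-is.
    return "https://doi.org/" + quote(doi, safe="/")


# Hypothetical DOI, for illustration only.
print(doi_url("10.1234/example.workflow.1"))
```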
Recommendations
Education is the key to more use.
For example, in professional society meetings, online courses, and undergraduate and graduate courses.
Impact on Science
Following these recommendations can help:
• Make science more efficient.
• Facilitate reproducible science.
• Support collaborative research.
• Speed up the peer review process.
• Increase your impact. (For instance, NSF has said these are valuable contributions.)
Links
• Mendeley Research Group: http://www.mendeley.com/groups/1189721/scientific-workflows-and-workflow-systems/
• GitHub: https://github.com/RichardLitt/Understanding-Workflows
• Data: http://thedatahub.org/dataset/myexperiment-screenscrape
• Notebook: https://notebooks.dataone.org/workflows
http://www.flickr.com/photos/wwworks/4759535950/