Laia Pujol Priego, Jonathan Wareham January – 2019

EN

REANA: Reproducible research data analysis platform

Open Science Monitor Case Study

REANA - Open Science Monitor Case Study

European Commission

Directorate-General for Research and Innovation

Directorate A — Policy Development and Coordination

Unit A.2 — Open Data Policy and Science Cloud

E-mail [email protected]

[email protected]

European Commission

B-1049 Brussels

Manuscript completed in January 2019.

This document has been prepared for the European Commission; however, it reflects the views only of the authors, and the Commission cannot be held responsible for any use which may be made of the information contained therein.

More information on the European Union is available on the internet (http://europa.eu).

Luxembourg: Publications Office of the European Union, 2019

EN PDF ISBN 978-92-76-00910-8

doi: 10.2777/45007

KI-02-19-176-EN-N

© European Union, 2019.

Reuse is authorized provided the source is acknowledged. The reuse policy of European Commission documents is regulated by Decision 2011/833/EU (OJ L 330, 14.12.2011, p. 39).

For any use or reproduction of photos or other material that is not under the EU copyright, permission must be sought directly from the copyright holders.

STUDY ON OPEN SCIENCE: MONITORING TRENDS AND DRIVERS (Reference: PP-05622-2017)

EUROPEAN COMMISSION

REANA (Reusable Analyses) Open Science Monitor Case Study

2019 Directorate-General for Research and Innovation EN

Table of contents

Acknowledgments

1 Introduction

2 Background
2.1 Main steps for Reproducibility
2.2 Main features of REANA
2.3 REANA Architecture

3 Drivers

4 Barriers

5 Impact
5.1 Science
5.2 Industry

6 Lessons learned

7 Policy conclusions

References

Acknowledgments

Disclaimer: The information and views set out in this study report are those of the author(s) and do not necessarily reflect the official opinion of the Commission. The Commission does not guarantee the accuracy of the data included in this case study. Neither the Commission nor any person acting on the Commission’s behalf may be held responsible for the use which may be made of the information contained therein.

The case study is part of Open Science Monitor led by the Lisbon Council together with CWTS, ESADE, and Elsevier.

Authors

Laia Pujol Priego – Ramon Llull University, ESADE

Jonathan Wareham – Ramon Llull University, ESADE

1 Introduction

“Science moves forward by corroboration, when researchers verify others' results” (Nature, 2018). Reproducibility, defined as the extent to which consistent results are obtained when an experiment is repeated (Casadevall and Fang, 2010), has been at the forefront of research policy agendas over the last decade, since studies began to sound the alarm by showing that scientists in different fields were failing to reproduce the research results of their peers.

In 2012, Nature published a study reviewing ten years of research, in which scholars found that 47 out of 53 biomedical research papers on cancer were irreproducible (Begley and Ellis, 2012). The problems found in those irreproducible studies included failure to show all the data, inappropriate use of statistical tests, use of reagents¹ that were not adequately validated, and failure to repeat the experiments. The study itself was reproduced in PLOS ONE to confirm the irreproducibility of such studies. Four years later, in 2016, Nature surveyed 1576 scientists and found that more than 70% of researchers had failed to reproduce another scientist’s experiments and more than 50% had failed to reproduce their own experiments. In psychology, a well-known open, registered empirical study of reproducibility was carried out in 2015, in which 270 researchers around the world worked together to try to replicate 100 empirical studies published in three top psychology journals. This collective effort, called the Reproducibility Project, showed that fewer than half of the attempted replications were successful (Jarrett, 2015). Reasons that some scholars give for this reproducibility crisis include the increased complexity of experiments and statistics, and increased scrutiny of researchers' findings (Nature, 2018).

As a result of this growing awareness, reproducible research has become a pervasive objective in the research policy agenda at different governmental levels, including funders and journal policies. The scientific community has launched several initiatives of its own to fight the reproducibility crisis. One of the most significant comes from the particle physics field at CERN, which came up with the idea of REANA, a reusable and reproducible research data analysis platform. It has already captured the attention of other disciplines, such as biomedical research (see section 5.1), and can potentially function in any scientific setting. The platform makes it possible to reuse and reinterpret preserved data analyses years after the original publication, facilitating the reproducibility of scientific results. Concretely, REANA helps scientists to structure their input data, analysis code, containerised environments, and computational workflows in order to run the analysis on remote compute clouds (REANA, 2019).

Although REANA is still in an early alpha stage of development aimed at early adopters and testers, the case shows powerfully that the reproducibility of research requires more than opening up the scientific data: it requires access to the original computing environment, the experimental datasets, the analysis software, and the computational workflow steps that the researcher used to produce the original scientific results in the first place (Simko et al., 2018). Despite being at an early stage of deployment, the case already provides significant insights for the scientific community, industry, and policy-makers about how long-term policies need to be accompanied by short-term fixes that help to close the reproducibility gap. In its testing phase, REANA already makes a valuable contribution by offering a systematic approach that generalises computational practices and by providing a technological platform that supports reproducibility.

1 Reagents are defined as substances employed (as in detecting or measuring a component, in preparing a product, or in developing photographs) because of their chemical or biological activity (Merriam Webster, 2019).

2 Background

"While currently there is a unilateral emphasis on 'first' discoveries, there should be as much emphasis on replication of discoveries." (Begley, 2013).

REANA is a reusable and reproducible research data analysis platform developed by CERN to enable code and data reuse. REANA has the following characteristics: it is flexible, because it can run several different computational workflow engines; it is scalable, because it supports remote compute clouds; it is reusable, because it can be deployed elsewhere by different organisations in diverse settings; and it is free software, developed at CERN and released under the MIT license (REANA, 2019).

The significant contribution of REANA is that it generalizes computational practices employed in the particle physics community and enables the adoption of declarative workflow systems to implement data analysis processes on remote compute clouds. REANA abstracts user practices and helps to systematise reproducibility. Different research teams use different technologies and tools to support the computational workflows that reproduce scientific outputs. The REANA team has analysed these different scientific pipelines, or sequence workflows (van der Aalst et al., 2003), which has led to a systematised, abstract sequence of steps that provides a clear entry point to the REANA platform. The platform provides a “simple ‘shell script’ use case where commands are run sequentially, and each step produces outputs for the next step” (Simko et al., 2018). The steps are summarized in section 2.1.
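The 'shell script' use case quoted above can be pictured with a minimal Python sketch. It is not REANA code: the function name `run_serial` and the two example steps are hypothetical, and the steps run as local subprocesses rather than as containers on a remote cloud.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_serial(steps, workdir):
    """Run commands one after another in a shared working directory.

    Each step is an argument list. A failing step stops the chain, so a
    later step never runs on missing or partial output.
    """
    for argv in steps:
        subprocess.run(argv, cwd=workdir, check=True)


# Two hypothetical steps: the first writes numbers.txt, the second reads it
# and writes total.txt. sys.executable keeps the sketch portable.
STEPS = [
    [sys.executable, "-c",
     "open('numbers.txt', 'w').write('1\\n2\\n3\\n')"],
    [sys.executable, "-c",
     "nums = [int(x) for x in open('numbers.txt')];"
     "open('total.txt', 'w').write(str(sum(nums)))"],
]

with tempfile.TemporaryDirectory() as tmp:
    run_serial(STEPS, tmp)
    total = Path(tmp, "total.txt").read_text()
    print(total)
```

The only coupling between the steps is the file left in the working directory, which is what allows a platform to rerun the same chain elsewhere once the files, commands and environment have been recorded.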

2.1 Main steps for Reproducibility

Making a research data analysis reproducible means providing structured “runnable recipes” that answer four questions (Simko et al., 2018) (see figure 1):

(1) Data: What is the input? Data files, parameters, live database calls;

(2) Code: What software was used to analyse the data? Custom code, frameworks, notebooks;

(3) Environment: Which computing environments were used to run the analysis software? Operating systems, software packages and libraries, CPU and memory resources;

(4) Steps: Which computational steps were taken to run the analysis? Simple shell commands, structured computational workflows, local or remote task execution.

These four elements make it possible to instantiate the analysis on compute clouds and run it to obtain the output scientific results.
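As an illustration, the four questions can be written down as a small data structure that reports which parts of a recipe are still missing. This is a sketch under stated assumptions: the class `Recipe`, its field names, the file names and the container image tag are all hypothetical and do not reproduce REANA's own schema.

```python
from dataclasses import dataclass, field


@dataclass
class Recipe:
    """Hypothetical container for the four parts of a runnable recipe."""
    data: list = field(default_factory=list)     # input files, parameters
    code: list = field(default_factory=list)     # scripts, notebooks
    environment: str = ""                        # e.g. a container image
    steps: list = field(default_factory=list)    # commands, in order

    def missing_parts(self):
        """Return the names of the parts that are still empty."""
        parts = {
            "data": self.data,
            "code": self.code,
            "environment": self.environment,
            "steps": self.steps,
        }
        return [name for name, value in parts.items() if not value]


# Illustrative values only: file names and image tag are made up.
recipe = Recipe(
    data=["data/events.csv"],
    code=["fit.py"],
    environment="python:3.11",
    steps=["python fit.py data/events.csv"],
)
incomplete = Recipe(data=["data/events.csv"], code=["fit.py"])

print(recipe.missing_parts())       # []
print(incomplete.missing_parts())   # ['environment', 'steps']
```

A check of this kind reflects the point made in the introduction: publishing data and code alone leaves the environment and the steps undocumented, and the analysis cannot be rerun.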

Figure 1. Step by step process for REANA implementation (based on: REANA, 2019)

2.2 Main features of REANA

The main features of REANA can be summarized as follows (REANA, 2019):

• Structure research data analysis in a reusable manner

• Instantiate computational workflows on remote clouds

• Rerun analyses with modified input data, parameters or code

• Support for several compute clouds (Kubernetes/OpenStack)

• Support for several workflow specifications (CWL, Serial, Yadage)

• Support for several shared storage systems (Ceph)

• Support for several container technologies (Docker)

2.3 REANA Architecture

Building on industry-standard container technologies, which are useful for preserving and reinstantiating runtime environments, CERN has developed the REANA platform, which allows users to structure their analyses and run them on remote compute clouds (see figure 2).

A YAML file captures the information about the analysis inputs, parameters, and processes. The REANA platform offers a set of micro-services for running and assessing container-based computational workflow jobs on the cloud. The REANA user interface and command-line client enable scientists to rerun analysis workflows with new input parameters in a simple way. The platform supports a plurality of container technologies (Docker), workflow engines (CWL, Yadage), shared storage systems (Ceph, EOS) and compute cloud infrastructures (Kubernetes/OpenStack, HTCondor) used by the scientific community (Simko et al., 2018).
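The idea of rerunning a workflow with new input parameters can be sketched in a few lines of Python. The dictionary below stands in for the YAML description; its keys, the script name `fit.py` and the parameters `data` and `bins` are assumptions made for illustration and are not guaranteed to match REANA's real file format or client.

```python
from string import Template

# Hypothetical analysis description. The keys loosely follow the "inputs,
# parameters, and processes" wording above and are NOT REANA's actual schema.
SPEC = {
    "inputs": {
        "files": ["fit.py", "data/events.csv"],
        "parameters": {"data": "data/events.csv", "bins": "50"},
    },
    "environment": "python:3.11",
    "commands": ["python fit.py --data $data --bins $bins"],
}


def render_commands(spec, overrides=None):
    """Fill command templates with default parameters plus any overrides."""
    params = dict(spec["inputs"]["parameters"])   # copy, so SPEC is untouched
    params.update(overrides or {})
    # substitute() raises KeyError if a command needs an undefined parameter.
    return [Template(cmd).substitute(params) for cmd in spec["commands"]]


original = render_commands(SPEC)
rerun = render_commands(SPEC, {"bins": "100"})
print(original)
print(rerun)
```

Keeping the parameters separate from the commands means a rerun changes only the values, while the recorded description of the analysis stays the same.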

Figure 2. REANA Architecture (based on: REANA, 2019)

3 Drivers

REANA was born with the goal of fighting the reproducibility crisis, and as a natural next step in CERN’s open science efforts. CERN has continuously embraced the principles of open science: open access, with the Sponsoring Consortium for Open Access Publishing in Particle Physics (SCOAP3) and the publication of all LHC papers under open access conditions; open data, with the Open Data Portal for the LHC experiments and Zenodo, a free open data repository launched by the organisation for use beyond the high-energy physics community; open source software, with Invenio, a library management software package; and the use of open source licenses (Nilsen and Anelli, 2016; Murillo and Kauttu, 2018). REANA was the next step: bringing the different open science layers together to achieve the ultimate goal of facilitating the reproducibility of results.

In doing so, CERN knew that not only the particle physics community would benefit, but others as well. Experimental data in particle physics is expensive to collect, because testing the theory empirically requires high investment in equipment and infrastructure. At CERN, researchers and engineers have used powerful particle accelerators and detectors to test the predictions and limits of the Standard Model. Besides making such results open for reuse by others, REANA filled a gap: it allows the scientific outputs produced at CERN and elsewhere to be reproduced effectively and efficiently, by the particle physics community and beyond.

To give an example from outside the particle physics community: in biomedical research, data is also expensive, because gathering it often requires large experimental projects. The lack of reproducibility affects not only the scientific community, by holding back scientific discovery, but also industry, especially sectors with high R&D intensity that rely on basic science groundwork. An example is the pharmaceutical industry: “scientists at Bayer recently evaluated about 70 targets that they had worked on. They observed that for almost two-thirds of the targets, the initial basic research data that prompted interest could not be replicated” (Rosenblatt, 2013). Drug development is long and convoluted, taking an average of 12 years from drug discovery to final approval. The process relies on the drug discovery phases (target identification and validation), which are typically carried out by industry and academia together. Scientific results in such settings need to be reproducible to avoid wasted investment.

4 Barriers

One of the significant challenges in scaling up the use of REANA and fostering reproducibility is the actual adoption of preservation tools and techniques by the scientific community. Several behavioral aspects among scientists are difficult to change quickly. Limited awareness of the steps that scientists need to complete (see section 6) so that others can reproduce their results is one of the bottlenecks that REANA may face before it can be fully exploited and widely deployed.

Scientists need to carefully preserve all the pieces used in the scientific process and document the steps followed (figure 1) so that others can use REANA to reproduce the results. Testing new theories sometimes demands "resurrecting" data from a long time ago. The main challenge is that scientists need to preserve well-structured information about the data they used, the software, the computing environments, and the associated analysis pipelines so that their data can be reused.

There is already a discussion, extending beyond the REANA community, about what actions should be put in place to create the academic incentives and reward structures that foster a "culture of preservation" for data, software and all the other inputs to the scientific process (Hildreth et al., 2018). While this more general institutional transformation takes place, some scientific communities are at the forefront of open science efforts, making their work more productive by developing, implementing and raising awareness about platforms such as REANA, which help to systematise the steps needed to make an experiment reproducible while providing the technological support to make it happen.

5 Impact

5.1 Science

Because it aims to support diverse reproducibility practices in diverse research domains, REANA has been developed to handle multiple workflow systems, different scientific settings, and different types of user.

To do so, REANA supports several workflow systems, for instance the Common Workflow Language standard, applied in bioinformatics and the life sciences, and the Yadage workflow system from the particle physics community. Both of them permit to (Simko et al., 2018).

• The platform has already been tested and validated on a set of realistic particle physics analyses from the four LHC experiments at CERN, and it is currently being tested in diverse settings. These experimental examples have demonstrated the successful applicability of the platform. As a result, the experimental collaborations implementing REANA have started to recommend that partial analysis preservation become part of the official pre-publication approval (LHCb, ATLAS Exotics group). See figure 3 for selected images from several runnable examples showing the successful implementation of REANA.

Figure 3. Selected images from runnable examples of REANA

• At the current stage, different organisations are both 1) contributing improvements to the REANA code, and 2) asking CERN about the platform with a view to further implementations, which shows its potential to foster reproducibility in different scientific settings. Examples include, but are not limited to, bioinformatics and life science groups, the Swiss Data Science Centre, and an astronomy group at the University of Geneva.

• The Swiss Data Science Centre² runs a similar system called RENKU, which addresses data science needs and focuses primarily on interactive notebooks. The CERN and Swiss Data Science Centre teams have been discussing a collaboration around running batch workflows. The two organisations held a joint workshop to collaborate on the implementation of REANA in this setting (see further information about this use case³).

2 About the Swiss Data Science Centre: https://datascience.ch/

3 About the CERN and Swiss Data Science Centre workshop: https://indico.cern.ch/event/736300

The astronomy community at the University of Geneva runs an integrated online analysis system that has similar needs for running scientific workflows on a containerised cloud. CERN is exploring different synergies with this scientific community. The astronomy case is just one example showing that many scientific disciplines are looking for solutions such as REANA because they all have similar needs.

• Furthermore, interest in REANA has crossed the Atlantic and attracted contributors from the US. As a result, CERN has established several partnerships with US projects supported by the National Science Foundation (NSF). One example is the Scalable CyberInfrastructure for Artificial Intelligence and Likelihood-Free Inference (SCAILFIN) project⁴, which seeks to deploy artificial intelligence and likelihood-free inference techniques and software using scalable cyberinfrastructure (CI) developed to be integrated into existing CI elements, such as the REANA system. The analysis of LHC data is SCAILFIN’s primary science driver, yet the technology is sufficiently generic to be widely applicable to other data-intensive domains. Additionally, the National Center for Supercomputing Applications (NCSA) in Illinois, another US partner, has also started to hire people to build reproducible workflows using REANA⁵.

Finally, work on REANA has been presented at multiple conferences, which has led to different implementations of the platform. One example is Open Repositories 2017, where REANA was cited in the closing keynote as a best-practice example of transforming data repositories from a resource-oriented present towards a problem-solving future⁶. At KubeCon 2018, CERN also presented and demonstrated running scientific workflows on a distributed containerised cloud⁷.

5.2 Industry

The failure to reproduce research results is a dramatic problem, especially for R&D-intensive industries such as biomedical research, which ‘stand on the shoulders of giants’ to bring new drugs to market and have to rely on progress in basic science and on sound scientific knowledge developed both in academia and in their corporate R&D departments. In fields with complex and sophisticated experimental environments, a platform such as REANA that fosters the reproducibility of scientific experiments can be a breakthrough in the current reproducibility crisis.

• At present, the REANA team is looking at bridging “scientific computing on the cloud” with public and private cloud infrastructures to foster the principles of reusable and reproducible science. There is currently good momentum for this thanks to advances in container technology in the IT industry (Simko et al., 2018).

Unfortunately, the main challenge remains adoption, and the community’s behaviour regarding the preservation of the inputs needed to run the platform. REANA highlights the strict discipline required of everyone involved in carrying out research and experiments, in both academia and industry, to gather and preserve their data, the specific versions of their software, and all the steps followed (see figure 1).

4 About SCAILFIN: https://scailfin.github.io/

5 About the National Center for Supercomputing hiring for REANA: https://twitter.com/danielskatz/status/1043138449908396037

6 About the keynote and the mention of the REANA case: https://twitter.com/tiborsimko/status/880648144383553537

7 About the KubeCon 2018 conference and REANA: https://twitter.com/sparkycollier/status/1003631920754974721

6 Lessons learned

Open is not enough: “Our own experience from opening up vast volumes of data is that openness cannot simply be tacked on as an afterthought at the end of the scientific endeavor. Besides, openness alone does not guarantee reproducibility or reusability, so it should not be pursued as a goal in itself. Focusing on data is also not enough: it needs to be accompanied by software, workflow, and explanations, all of which need to be captured throughout the usual iterative and closed research lifecycle, ready for a timely open release with the results” (Chen et al., 2018).

The example of REANA, implemented by the high-energy physics community, is an inspiring practice that other scientific communities could widely adopt. One of the main takeaways from this case is that reproducibility requires going beyond openness (Chen et al., 2018).

Practical lessons to make your research reproducible: In the process of developing REANA, ten rules were embraced as requirements for making research reproducible (Sandve et al., 2013):

1) For every result, keep track of how it was produced

2) Avoid manual data manipulation steps

3) Archive the exact versions of all external programs used

4) Version control all custom scripts

5) Record all intermediate results, when possible in standardised formats

6) For analyses that include randomness, note underlying random seeds

7) Always store raw data behind plots

8) Generate hierarchical analysis output, allowing layers of increasing detail to be inspected

9) Connect textual statements to underlying results

10) Provide public access to scripts, runs, and results
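Several of these rules lend themselves to automation. The sketch below, which is an illustration and not part of REANA, records the interpreter version and platform (in the spirit of rule 3), the random seed (rule 6) and a checksum of the raw input (rule 1) next to a toy analysis. The function names `provenance_record` and `noisy_mean`, the seed value and the data are hypothetical.

```python
import hashlib
import json
import platform
import random
import sys


def provenance_record(raw_data: bytes, seed: int) -> dict:
    """Collect facts needed to rerun an analysis later."""
    return {
        "python_version": sys.version.split()[0],
        "platform": platform.platform(),
        "random_seed": seed,
        "raw_data_sha256": hashlib.sha256(raw_data).hexdigest(),
    }


def noisy_mean(values, seed):
    """Toy analysis with randomness: the mean of a random half of the values."""
    rng = random.Random(seed)   # seeded generator, independent of global state
    sample = rng.sample(values, k=len(values) // 2)
    return sum(sample) / len(sample)


values = list(range(100))
seed = 12345
record = provenance_record(json.dumps(values).encode(), seed)

# Rerunning with the recorded seed gives the same number both times.
first = noisy_mean(values, record["random_seed"])
second = noisy_mean(values, record["random_seed"])
print(json.dumps(record, indent=2))
print(first == second)
```

Storing such a record alongside each result gives a later reader the seed and input checksum needed to check that a rerun used the same starting point.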

7 Policy conclusions

While policymakers and funders advance the open science agenda to foster the reproducibility of research, which is one of its ultimate goals, the research community itself is taking laudable and necessary steps to move beyond the rhetoric. It is making the necessary quick fixes by designing systematic approaches, supported by technological tools, that help to bring about the profound change required to make reproducibility a reality. REANA is one such example, led by the particle physics community, which has been one of the scientific communities most actively engaged with open science goals.

Besides providing a technological platform that integrates different tools, REANA offers a systematic approach to making the research outputs generated by the scientific community reproducible. Although the platform originated from the needs of the particle physics community, it applies to any domain, as has already happened in the past with other tools (e.g., Zenodo) and as the different use cases already reflect, and it promises to be a relevant piece of the profound transformation needed.

References

Begley, C. G.; Ellis, L. M. (2012). "Drug development: Raise standards for preclinical cancer research". Nature. 483 (7391): 531–533. Bibcode:2012Natur.483..531B. doi:10.1038/483531a. PMID 22460880.

Begley, C. G. (2013). "Reproducibility: Six red flags for suspect work". Nature. 497: 433–434. Bibcode:2013Natur.497..433B. doi:10.1038/497433a. PMID 23698428.

Casadevall, Arturo, and Ferric C. Fang (2010) "Reproducible science.": 4972-4975.

Hildreth, M. D., Boehnlein, A., Cranmer, K., Dallmeier, S., Gardner, R., Hacker, T., ... & Malik, T. (2018). HEP Software Foundation Community White Paper Working Group-Data and Software Preservation to Enable Reuse. arXiv preprint arXiv:1810.01191.

Jarrett, Christian (27 August 2015). "This is what happened when psychologists tried to replicate 100 previously published findings". Research Digest. BPS Research Digest. Retrieved 8 November 2016.

Jasny BR, Chin G, Chong L, Vignieri S. Again, and again, and again: Data replication and reproducibility. Science 2011;334:1225.

Naik G. Scientists’ Elusive Goal: Reproducing Study Results. Wall Street Journal 2011 Dec [accessed 2012 Aug 5]. Available from: http://online.wsj.com/article/SB10001424052970203764804577059841672541590.HTML.

Nature (2018) https://www.nature.com/collections/prbfkwmwvz, accessed January 2019

Prinz, F.; Schlange, T.; Asadullah, K. (2011). "Believe it or not: How much can we rely on published data on potential drug targets?". Nature Reviews Drug Discovery. 10 (9): 712. doi:10.1038/nrd3439-c1. PMID 21892149.

REANA http://REANA.io/, accessed January 2019

Reliability of ‘new drug target’ claims called into question. Nature.com 2011 Sep [accessed 2012 Jul 27]. Available from: http://blogs.nature.com/news/2011/09/reliability_of_new_drug_target.html.

Šimko, T., Cranmer, K., Crusoe, M. R., Heinrich, L., Khodak, A., Kousidis, D., & Rodríguez, D. (2018, October). Search for computational workflow synergies in reproducible research data analyses in particle physics and life sciences. In 2018 IEEE 14th International Conference on e-Science (e-Science) (pp. 403-404). IEEE.

X. Chen, S. Dallmeier-Tiessen, R. Dasler, S. Feger, P. Fokianos, J. B. Gonzalez, H. Hirvonsalo, D. Kousidis, A. Lavasa, S. Mele, D. Rodriguez Rodriguez, T. Simko, T. Smith, A. Trisovic, A. Trzcinska, I. Tsanaktsidis, M. Zimmermann, K. Cranmer, L. Heinrich, G. Watts, M. Hildreth, L. Lloret Iglesias, K. Lassila-Perini, S. Neubert, “Open is not enough”, Nature Physics (2018), doi:10.1038/s41567-018-0342-2.

Weir, Kristen. "A reproducibility crisis?". American Psychological Association. American Psychological Association. Retrieved 24 November 2016

W. M. P. van der Aalst, A. H. M. ter Hofstede, B. Kiepuszewski, A. P. Barros, "Workflow Patterns", Distributed and Parallel Databases 14, 5–51 (2003), doi:10.1023/A:1022883727209.

As a result of this growing awareness, reproducible research has become a pervasive objective in the research policy agenda at different governmental levels, including funders and journal policies. This case study analyses one of the significant initiatives coming from the particle physics field at CERN called REANA. REANA is a reusable and reproducible research data analysis platform enabling users to re-use and reinterpret preserved data analysis years after the original publication, facilitating reproducibility of scientific results. This case study analyses the impact of the platform on science and industry whilst highlighting the drivers behind the project and the barriers that it has overcome. Finally, the study includes the key lessons learned and the main policy conclusions.
