

FAIR bioinfo: Open Science and FAIR principles in a bioinformatics project

How to make a bioinformatics project more reproducible

C. Hernandez (1), T. Denecker (2), J. Seiler (2), G. Le Corguillé (2), C. Toffano-Nioche (1)

(1) Institute for Integrative Biology of the Cell (I2BC), UMR 9198, Université Paris-Sud, CNRS, CEA, 91190 Gif-sur-Yvette, France

(2) IFB Core Cluster taskforce

June 2021

Celine, Claire (I2BC-IFB) FAIR Bioinfo IFB 2021 1 / 263


General information

Practical information:

Dates: June 28th - 30th

Location: Institut des Systèmes Complexes, 113 rue Nationale, 75013 Paris

Courses: 9:00 to 17:30

Meal: 12:30-14:00

Breaks: 10:30-11:00 and 15:30-16:00

2 days of courses + 1 day of course building

Round table:

Teachers

Learners

Resources:

GitLab

LaTeX


Training schedule

Day 1:

Introduction to reproducibility

History management (3 practical sessions)

Control your development environment (1 practical session)

Encapsulation (2 practical sessions)

Day 2:

Workflow (2 practical sessions)

Traceability with notebooks (2 practical sessions)

IFB resources (2 practical sessions)

Sharing and disseminating

Conclusion

Day 3:

Empowerment and improvement of resources


Table of contents

1 Introduction to reproducibility

2 History management

3 Control your development environment

4 Workflow

5 Traceability with Notebook
    Introduction
    Markdown

6 IFB resources

7 Sharing and dissemination

8 Conclusion

9 3rd Day


Literate programming


Introduction

What is literate programming?

"Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to humans what we want the computer to do."
Donald E. Knuth, Literate Programming, 1984


Introduction

What is literate programming ?

Definition

"Literate programming is a programming paradigm introduced by Donald Knuth in which a computer program is given an explanation of its logic in a natural language, such as English, interspersed with snippets of macros and traditional source code, from which compilable source code can be generated." Donald Knuth, 1984.

Wikipedia, 18/08/2020
https://en.wikipedia.org/wiki/Literate_programming#Workflow
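The last part of the definition ("from which compilable source code can be generated") can be illustrated with a small sketch. It is not Knuth's tooling: the document content, the chunk syntax and the function name `tangle` are all assumptions made for this example. It only shows the idea of pulling the code chunks out of a mixed prose/code document, in order, to obtain a runnable script.

```python
import re

# Three backticks, built this way so the listing stays readable.
FENCE = "`" * 3

# Hypothetical literate document: prose interleaved with fenced Python chunks.
DOCUMENT = "\n".join([
    "# GC content",
    "",
    "We first define the sequence to analyse.",
    "",
    FENCE + "python",
    'seq = "ATGCGC"',
    FENCE,
    "",
    "Then we compute the fraction of G and C bases.",
    "",
    FENCE + "python",
    'gc = (seq.count("G") + seq.count("C")) / len(seq)',
    FENCE,
])

# A chunk is whatever sits between an opening "python" fence and the next fence.
CHUNK = re.compile(FENCE + r"python\n(.*?)\n" + FENCE, re.DOTALL)


def tangle(document: str) -> str:
    """Return only the code chunks, in document order, as one script."""
    return "\n".join(CHUNK.findall(document)) + "\n"


script = tangle(DOCUMENT)
print(script)

# The generated script is ordinary source code and can be executed as such.
namespace = {}
exec(script, namespace)
print(namespace["gc"])
```

The prose never reaches the interpreter: it is there for the human reader, while the extracted script is what the computer runs.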


Introduction

What does it look like ?


Introduction

Interactive programming interface that combines natural and computer languages.

In one file:

Explanations

Code

Results

Graphs and plots


Introduction

Why use literate programming frameworks?

Use cases:

Day to day analyses

Analysis reports

Writing scientific articles


Example of an article entirely written using a notebook

[Figure panels: File (on a repository); Published article; Executable file]


Literate programming

This session :

Markdown

Rmarkdown / RStudio

Jupyter


Markup and markdown

Definition

A markup language uses tags to define elements within a document.

Three different types and usages:

Presentational (used by traditional word-processing systems)
    Markup is invisible.

Procedural, provides instructions to process the text (e.g. TeX, PostScript)
    Markup is visible and can be directly manipulated by the author.

Descriptive, to label document parts (e.g. LaTeX, HTML, XML...)
    Emphasizes the document structure.


Markdown language

Markdown is a lightweight markup language, designed to be:

easy to write using any generic text editor (plain-text formatting syntax)

easy to read in its raw form


Markdown language

You have probably already seen it, or a similar lightweight markup: GitHub (README), Wikipedia...

GitHub guide: https://guides.github.com/features/mastering-markdown/


Literate programming

But how is this useful for literate programming?

When you want to weave both code (to be interpreted) and formatting information, you need precisely a lightweight language for the formatting part.


The challengers

No need to hide, there are currently two main frameworks used in bioinformatics:

RMarkdown and Jupyter


RMarkdown

RMarkdown


RMarkdown

At the beginning, there was nothing.

Then came Sweave.
Leisch, Friedrich (2002). "Sweave, Part I: Mixing R and LaTeX: A short introduction to the Sweave file format and corresponding R functions"

And people saw that the path would be long...


RMarkdown

knitr (2011)

"The knitr package was designed to be a transparent engine for dynamic report generation with R, solve some long-standing problems in Sweave, and combine features in other add-on packages into one package"
https://yihui.org/knitr/


RMarkdown

RMarkdown

"When you run render, R Markdown feeds the .Rmd file to knitr, which executes all of the code chunks and creates a new markdown (.md) document which includes the code and its output. The markdown file generated by knitr is then processed by pandoc which is responsible for creating the finished format."
https://rmarkdown.rstudio.com
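The first half of that pipeline (execute every chunk, then emit a Markdown document containing both the code and its output) can be mimicked in a few lines. This is only an analogy written in Python: it is not knitr, it does not call pandoc, and the names `weave`, `SOURCE` and the chunk syntax are assumptions made for the illustration.

```python
import contextlib
import io
import re

# Three backticks, built this way so the listing stays readable.
FENCE = "`" * 3

# A chunk is whatever sits between an opening "python" fence and the next fence.
CHUNK = re.compile(FENCE + r"python\n(.*?)\n" + FENCE, re.DOTALL)


def weave(document: str) -> str:
    """Run each chunk in a shared namespace and append what it printed."""
    namespace = {}  # shared, so later chunks can reuse earlier variables

    def run(match):
        code = match.group(1)
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)
        output = buffer.getvalue().rstrip("\n")
        block = FENCE + "python\n" + code + "\n" + FENCE
        if output:
            # The output goes in its own plain fenced block, after the code.
            block += "\n\n" + FENCE + "\n" + output + "\n" + FENCE
        return block

    return CHUNK.sub(run, document)


# Hypothetical source document with a single chunk.
SOURCE = "\n".join([
    "Read counts per sample:",
    "",
    FENCE + "python",
    "counts = [12, 30, 18]",
    "print(sum(counts))",
    FENCE,
])

woven = weave(SOURCE)
print(woven)
```

The result is still plain Markdown, which is why a separate converter (pandoc in the R Markdown case) can then turn it into HTML, PDF or another finished format.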


RMarkdown

RMarkdown

Integrated into RStudio, IDE for R.


RMarkdown

R Notebooks


RMarkdown

R Notebooks and more...


Jupyter

Jupyter


Jupyter

A bit of history...

2011: IPython (interactive Python shell) with notebook functionalities

2014: Spin-off project called Project Jupyter

A non-profit, open-source project maintained by a strong community

"Jupyter will always be 100% open-source software, free for all to use and released under the liberal terms of the modified BSD license"

The name is a reference to the three core programming languages supported by Jupyter (Julia, Python and R)

https://jupyter.org/


Jupyter

What can it do?

Everything (except coffee)


Jupyter

But what is it exactly ?

Web-based interactive computational environment.

Web-based : client/server

Interactive : notebook system

Computational environment : console, many kernels available...
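On disk, a notebook is a JSON document (the .ipynb file) holding an ordered list of cells, which is what lets explanations, code and results live in one file. Here is a minimal sketch, assuming version 4 of the notebook format and using only the standard `json` module; the helper names `markdown_cell` and `code_cell` and the cell contents are made up for the example.

```python
import json


def markdown_cell(text):
    """Build a Markdown cell (the explanations)."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(code):
    """Build a code cell that has not been executed yet."""
    return {
        "cell_type": "code",
        "execution_count": None,  # filled in by the kernel at run time
        "metadata": {},
        "outputs": [],  # results and plots are stored here after execution
        "source": code,
    }


notebook = {
    "cells": [
        markdown_cell("# GC content\nFraction of G and C bases in a sequence."),
        code_cell('seq = "ATGCGC"\n(seq.count("G") + seq.count("C")) / len(seq)'),
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

serialized = json.dumps(notebook, indent=1)
print(serialized)
```

Writing `serialized` to a file with the .ipynb extension should give something the notebook editor can open; the kernel to use is normally recorded in the top-level metadata, left empty in this sketch.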


Jupyter

Dashboard


Jupyter

Notebook editor


Jupyter

Project Jupyter

A non-profit, open-source project maintained by a strong Community

Adopted by the biggest in the Cloud industry (Google, Microsoft, Amazon...)

And financed by the biggest (Google, Microsoft, EU Horizon 2020 program, Alfred P. Sloan Foundation...)

Inside the Python community (snakemake, conda...)

Integration with GitHub since 2015 (renderer)


Jupyter

Nbviewer : a static renderer for Jupyter notebooks

https://nbviewer.jupyter.org/


Jupyter

Jupyter + Docker = Binder

https://mybinder.org/


Jupyter

Since June 2019: JupyterLab v1.0 (now v3.0.16)


Conclusion ?

Who’s the best?

It depends...

R analyses? Go for RMarkdown/RStudio

R analyses for a publication ? Consider Jupyter with an R kernel

Python analyses ? Why do you even ask...


Practical session

Savoir FAIRe

Markdown

Learn the structure of an Rmd file

Turn a script into a notebook

Extend the notebook with new functionalities

This afternoon: Jupyter with the IFB cluster
