sharing reproducible computations on binder...+ data science + a community of people and an...

36
Sharing reproducible computations on Binder Lindsey Heagy, UC Berkeley & Project Jupyter

Upload: others

Post on 08-Jul-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Sharing reproducible computations on Binder...+ data science + a community of people and an ecosystem of open tools and standards for interactive computing . the science is the code

Sharing reproducible computations on BinderLindsey Heagy, UC Berkeley & Project Jupyter

Page 2: Sharing reproducible computations on Binder...+ data science + a community of people and an ecosystem of open tools and standards for interactive computing . the science is the code

you?

Page 3: Sharing reproducible computations on Binder...+ data science + a community of people and an ecosystem of open tools and standards for interactive computing . the science is the code

hello (a bit about me)

geophysical inversions

open-source software

open research & education

Jupyter, geoscience + data science

+

Page 4: Sharing reproducible computations on Binder...+ data science + a community of people and an ecosystem of open tools and standards for interactive computing . the science is the code

a community of people and an ecosystem of open tools and standards for interactive computing

Page 5: Sharing reproducible computations on Binder...+ data science + a community of people and an ecosystem of open tools and standards for interactive computing . the science is the code

the science is the code

An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures.

-- Buckheit and Donoho WaveLab and Reproducible Research, 1995

Page 6: Sharing reproducible computations on Binder...+ data science + a community of people and an ecosystem of open tools and standards for interactive computing . the science is the code

the science is the code

An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures.

-- Buckheit and Donoho WaveLab and Reproducible Research, 1995

Page 7: Sharing reproducible computations on Binder...+ data science + a community of people and an ecosystem of open tools and standards for interactive computing . the science is the code

the science is the code

An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures.

-- Buckheit and Donoho WaveLab and Reproducible Research, 1995

(and a place to run the code?)

Page 8: Sharing reproducible computations on Binder...+ data science + a community of people and an ecosystem of open tools and standards for interactive computing . the science is the code

live, computational environments, running on the cloud, built from your research repositories

Page 9: Sharing reproducible computations on Binder...+ data science + a community of people and an ecosystem of open tools and standards for interactive computing . the science is the code

mybinder.org

origins

GitHub

+

+explicit dependencies

+

docker kubernetes

Page 10: Sharing reproducible computations on Binder...+ data science + a community of people and an ecosystem of open tools and standards for interactive computing . the science is the code

● creates reproducible containers from repositories (repo2docker)

● generates user sessions that serve these containers (JupyterHub)

● provides an interface to create, share, and use these sessions (BinderHub)

● demonstrates the above as a free public service/tech demo (mybinder.org)

Page 11: Sharing reproducible computations on Binder...+ data science + a community of people and an ecosystem of open tools and standards for interactive computing . the science is the code

open science

(why?)

● in order to collaborate● to build on the work of others● for others to build upon your work● to make revisions to your paper when you

hear back from reviewers in 8 months

Page 12: Sharing reproducible computations on Binder...+ data science + a community of people and an ecosystem of open tools and standards for interactive computing . the science is the code

open science

(how?)

● complete set of instructions● complete development environment● a place to run the code

Page 13: Sharing reproducible computations on Binder...+ data science + a community of people and an ecosystem of open tools and standards for interactive computing . the science is the code

open science

(how?)

● complete set of instructions● complete development environment● a place to run the code

Page 14: Sharing reproducible computations on Binder...+ data science + a community of people and an ecosystem of open tools and standards for interactive computing . the science is the code

complete set of instructionsopen-source languages are the raw material

Page 15: Sharing reproducible computations on Binder...+ data science + a community of people and an ecosystem of open tools and standards for interactive computing . the science is the code

complete set of instructions

https://stackoverflow.blog/2017/09/06/incredible-growth-python/ https://stackoverflow.blog/2017/10/10/impressive-growth-r/

Page 16: Sharing reproducible computations on Binder...+ data science + a community of people and an ecosystem of open tools and standards for interactive computing . the science is the code

complete set of instructionsmature ecosystems of tools

http://www.focusedsupport.com/blog/getting-setup-with-scientific-python/

Page 17: Sharing reproducible computations on Binder...+ data science + a community of people and an ecosystem of open tools and standards for interactive computing . the science is the code

complete set of instructionsweb-native interfaces for interacting with code

Page 18: Sharing reproducible computations on Binder...+ data science + a community of people and an ecosystem of open tools and standards for interactive computing . the science is the code

complete set of instructionscapture the steps

Page 19: Sharing reproducible computations on Binder...+ data science + a community of people and an ecosystem of open tools and standards for interactive computing . the science is the code

complete set of instructionscapture the steps: what is a notebook?

Page 20: Sharing reproducible computations on Binder...+ data science + a community of people and an ecosystem of open tools and standards for interactive computing . the science is the code

complete set of instructionsmaintenance and sharing

● version control● issue tracking● licensing● integrations with

○ testing services○ documentation hosting○ ...

GitHub GitLab

Bitbucket

Page 21: Sharing reproducible computations on Binder...+ data science + a community of people and an ecosystem of open tools and standards for interactive computing . the science is the code

complete set of instructionsmaintenance and sharing

https://github.com/binder-examples/r

Page 22: Sharing reproducible computations on Binder...+ data science + a community of people and an ecosystem of open tools and standards for interactive computing . the science is the code

open science

(how?)

● complete set of instructions● complete development environment● a place to run the code

Page 23: Sharing reproducible computations on Binder...+ data science + a community of people and an ecosystem of open tools and standards for interactive computing . the science is the code

repo2docker deterministically build a docker image from a repository with documented dependencies

repo2docker

Page 24: Sharing reproducible computations on Binder...+ data science + a community of people and an ecosystem of open tools and standards for interactive computing . the science is the code

complete development environmentdefine dependencies following community standards of practice

Page 25: Sharing reproducible computations on Binder...+ data science + a community of people and an ecosystem of open tools and standards for interactive computing . the science is the code

complete development environment

repo2dockerdependencies

repo2docker

container file

Page 26: Sharing reproducible computations on Binder...+ data science + a community of people and an ecosystem of open tools and standards for interactive computing . the science is the code

complete development environmentdefine dependencies

https://github.com/binder-examples/r

Page 27: Sharing reproducible computations on Binder...+ data science + a community of people and an ecosystem of open tools and standards for interactive computing . the science is the code

open science

(how?)

● complete set of instructions● complete development environment● a place to run the code

Page 28: Sharing reproducible computations on Binder...+ data science + a community of people and an ecosystem of open tools and standards for interactive computing . the science is the code

a place to run the codebinder: computational environments on the cloud

Page 29: Sharing reproducible computations on Binder...+ data science + a community of people and an ecosystem of open tools and standards for interactive computing . the science is the code

a place to run the code binder!

binder repo2docker JupyterHub(next talk!)

Page 30: Sharing reproducible computations on Binder...+ data science + a community of people and an ecosystem of open tools and standards for interactive computing . the science is the code

mybinder.org

Page 31: Sharing reproducible computations on Binder...+ data science + a community of people and an ecosystem of open tools and standards for interactive computing . the science is the code
Page 32: Sharing reproducible computations on Binder...+ data science + a community of people and an ecosystem of open tools and standards for interactive computing . the science is the code

as predicted by Einstein

Page 33: Sharing reproducible computations on Binder...+ data science + a community of people and an ecosystem of open tools and standards for interactive computing . the science is the code

LIGO collaboration discovery: Sept 14, 2015

Detection problem: ● ~ 1/1000 proton over 4 km.● Sensitivity ~ 1e-21● Milky Way: 1e+21m across!

Page 34: Sharing reproducible computations on Binder...+ data science + a community of people and an ecosystem of open tools and standards for interactive computing . the science is the code

demo time

( )

https://www.gw-openscience.org/tutorials/https://github.com/binder-examples/r

Page 35: Sharing reproducible computations on Binder...+ data science + a community of people and an ecosystem of open tools and standards for interactive computing . the science is the code

Future

Page 36: Sharing reproducible computations on Binder...+ data science + a community of people and an ecosystem of open tools and standards for interactive computing . the science is the code

thank you!

try out binder:

mybinder.org

connect with the community:

discourse.jupyter.org