all about Versioning? - UZH

all about Versioning? Dr. Moritz Neun

Posted on 20-Apr-2020

Maps and Metrics

karhoo.com

Replicability

• Sumatra: a toolkit for reproducible research - Open Science Framework (https://osf.io/rc5jf/?action=download&version=1)

Replicability
• manifold and changing tools
• manual work / tweaking
• environment and dependencies
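One way to address the environment-and-dependencies point is to write the environment down alongside each result. Below is a minimal sketch, assuming a Python workflow (the slides do not name a language). `capture_environment` is a hypothetical helper built only on the standard library (`importlib.metadata`, `platform`, `sys`). It does not cover system libraries, compilers or hardware.

```python
import importlib.metadata
import json
import platform
import sys


def capture_environment() -> dict:
    """Return interpreter, OS and installed-package versions (hypothetical helper)."""
    packages = {}
    for dist in importlib.metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installations that lack a name
            packages[name] = dist.version
    return {
        "python": sys.version.split()[0],
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        # sort case-insensitively so two snapshots can be diffed line by line
        "packages": dict(sorted(packages.items(), key=lambda kv: kv[0].lower())),
    }


if __name__ == "__main__":
    # Redirect this output to a file that is versioned together with the results.
    print(json.dumps(capture_environment(), indent=2))
```

Because the package list is sorted, two snapshots taken before and after an upgrade can be compared with an ordinary text diff.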

Building Blocks for Replicability

• documentation and record keeping
• versioning

Versioning?

Version control is the means by which different versions and drafts of a document (or file or record or dataset) are managed.

http://www2.le.ac.uk/services/research-data/organise-data/version-control

Applications

• Documents
• Code
• Data

➡ track evolution of work
➡ backup

Version Control

https://webinerds.com/version-control-systems-keep-your-code-in-order/

Versioning Changes

[Figure: timeline of version control systems, 1972 to 2015: SCCS, RCS, CVS, Perforce, SVN, BitKeeper, TFS, Git, Hg, VSTS; eras: local to central, central to distributed, everything is a branch]

Source Code Hosting: SourceForge (users 2016: 3.7M), Bitbucket, GitHub (users 2016: 15M), Google Code (discontinued 2016)

Version Control Variants

• file name versions or Gmail ➡ snapshots vs. diffs
• local / single-user version control
• distributed / shared version control
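To make the snapshots-vs.-diffs distinction concrete, here is a minimal Python sketch (the language is an assumption) that stores the same hypothetical drafts both ways. `SnapshotStore` keeps every full version. `DiffStore` keeps the first version plus line-level deltas computed with the standard library's `difflib.SequenceMatcher`, and rebuilds later versions by replaying them. The class names are illustrative and do not reflect how any particular VCS stores data internally.

```python
from difflib import SequenceMatcher


def make_delta(old, new):
    """Keep only the changed regions as (start, end, replacement lines) against `old`."""
    ops = SequenceMatcher(a=old, b=new, autojunk=False).get_opcodes()
    return [(i1, i2, new[j1:j2]) for tag, i1, i2, j1, j2 in ops if tag != "equal"]


def apply_delta(old, delta):
    """Rebuild the newer version; patch from the end so earlier indices stay valid."""
    result = list(old)
    for i1, i2, lines in reversed(delta):
        result[i1:i2] = lines
    return result


class SnapshotStore:
    """Stores a full copy of every version: simple, but uses more space."""

    def __init__(self):
        self.versions = []

    def commit(self, lines):
        self.versions.append(list(lines))

    def checkout(self, n):
        return list(self.versions[n])


class DiffStore:
    """Stores the first version in full and only deltas for later versions."""

    def __init__(self):
        self.base = None
        self.deltas = []
        self._head = None  # latest version, needed to compute the next delta

    def commit(self, lines):
        lines = list(lines)
        if self.base is None:
            self.base = lines
        else:
            self.deltas.append(make_delta(self._head, lines))
        self._head = lines

    def checkout(self, n):
        version = list(self.base)
        for delta in self.deltas[:n]:  # replay deltas up to version n
            version = apply_delta(version, delta)
        return version


# Three hypothetical drafts of a document, one list entry per line.
DRAFTS = [
    ["Title", "Intro", "Method", "Results"],
    ["Title", "Intro", "Method v2", "Results"],
    ["Title", "Abstract", "Intro", "Method v2", "Results", "Outlook"],
]

if __name__ == "__main__":
    snapshots, diffs = SnapshotStore(), DiffStore()
    for draft in DRAFTS:
        snapshots.commit(draft)
        diffs.commit(draft)
    for n in range(len(DRAFTS)):
        assert snapshots.checkout(n) == diffs.checkout(n) == DRAFTS[n]
    print(diffs.deltas)
    # [[(2, 3, ['Method v2'])], [(1, 1, ['Abstract']), (4, 4, ['Outlook'])]]
```

The trade-off is the one named on the slide: snapshots make any version instantly available at the cost of space, while diffs save space but require replaying every delta up to the requested version.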

Version Control ➡ workhorse for record keeping

Reproducible Workflows

• notebook (journal, log, lab book)
• keeping track of everything (incl. manual work, bash history, …)
• avoid manual work whenever possible

➡ documentation and knowledge transfer
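A lab notebook that also captures manual work can be as simple as an append-only log. Below is a minimal sketch, assuming Python and Git are in use. `log_step`, `read_log` and the JSON Lines format are illustrative choices, not a tool from the slides. Each entry records:

* a timestamp
* a free-text note
* the command, or `None` for a manual step
* the current Git commit, when one is available

```python
import json
import subprocess
import sys
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def current_commit() -> Optional[str]:
    """Git commit of the current directory, or None if Git or a repository is missing."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()


def log_step(logfile: Path, note: str, command: Optional[str] = None) -> dict:
    """Append one notebook entry (manual steps included) as a JSON line."""
    entry = {
        "time": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "note": note,
        "command": command,  # None marks a manual step with no command to re-run
        "commit": current_commit(),
        "python": sys.version.split()[0],
    }
    with logfile.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry


def read_log(logfile: Path) -> list:
    """Load all entries back, oldest first."""
    with logfile.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Usage might look like `log_step(Path("notebook.jsonl"), "corrected station IDs by hand")`, where the file name and note are hypothetical. Committing the log file to the same repository keeps the record itself under version control.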

Record Keeping, Versioning & Sharing

Adequate documentation & knowledge transfer:
• Shared docs for small projects
• Wiki
• documentation in the code repository (README)
• notebook applications (Jupyter) or toolsets (e.g. RStudio & knitr & Make & LaTeX & Git)

➡ records also need version control!

Versioning Data

Depends on the size and update behaviour:
• Finalised data (keep in VCS or store on web object stores if allowed)
• Datasets with discrete updates (usually snapshots with DB tools; a VCS also works well)
• Continuously updated/appended data (e.g. timeline data)
  • DB versioning or full snapshots
  • make sure to annotate events and changes to the pipeline or other tools
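For datasets with discrete updates, full snapshots plus annotations can be sketched in a few lines. This is a minimal Python sketch, assuming file-based data small enough to copy on every update. `snapshot`, `history`, `restore` and the `manifest.jsonl` layout are hypothetical.

* Each copy is stored under its SHA-256 hash, so an unchanged dataset is not duplicated.
* Every call appends a manifest entry whose `note` carries the annotation of events or pipeline changes.

Large, continuously appended data would more likely call for DB-level versioning, which this sketch does not attempt.

```python
import hashlib
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Hash the file in 1 MiB chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(data_file: Path, store: Path, note: str) -> str:
    """Store a full copy under its content hash and log an annotated manifest entry."""
    store.mkdir(parents=True, exist_ok=True)
    sha = file_sha256(data_file)
    target = store / sha
    if not target.exists():  # unchanged content is stored only once
        shutil.copy2(data_file, target)
    entry = {
        "time": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "file": data_file.name,
        "sha256": sha,
        "note": note,  # annotate events and pipeline changes here
    }
    with (store / "manifest.jsonl").open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return sha


def history(store: Path) -> list:
    """All manifest entries, oldest first."""
    with (store / "manifest.jsonl").open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def restore(sha: str, store: Path, dest: Path) -> None:
    """Copy a stored snapshot back out, e.g. to rerun an old analysis."""
    shutil.copy2(store / sha, dest)
```

A call such as `snapshot(Path("trips.csv"), Path("snapshots"), "new GPS filter in pipeline")` before each analysis run would tie results to an exact data version. The file names here are hypothetical.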

Sharing Code/Tools

How to share and make code/tools reusable:
• Source code, e.g. on GitHub, SourceForge, university sites, web spaces (Dropbox, S3)
• Executables, bytecode
• Virtualisation and containers (e.g. VM, Docker)
• Web services (e.g. shared models)

Sharing Data

Many open questions:
• long-term persistence?
• cost / who pays?
• privacy / copyright?

be pragmatic but aware

Thank you