Reproducible research in practice

Edzer Pebesma
ifgi, Institute for Geoinformatics, University of Münster

Reproducible Research Workshop, UZH, Sep 13-14, 2016




Overview

1. Who am I?

2. What is reproducible research? What is replication?

3. Reasons to not do reproducible research

4. Publication cycle

5. Low-hanging fruit

6. More difficult targets

7. http://o2r.info


Who am I?

• Co-Editor-in-Chief for
  • Computers & Geosciences (1977)
  • Journal of Statistical Software (1996)
• Co-author of Applied Spatial Data Analysis with R
• Author of several R packages
• Active member (and developer) in the R community


What is reproducible research? What is replication?


Why is the ability to reproduce important?

• transparency, credibility: science is about truths, not opinions
• the ability to verify correctness


Reasons to not do reproducible research

“Good” reasons:

• I can’t reveal the data – privacy, politics, size
• There is no (scientific) reward – lack of incentives
• Just tell me how! – it is hard; where are the guidelines?

“Bad” reasons:

• I want to keep a competitive advantage – data, procedures, software
• I fear a loss of funding – someone else may benefit financially from my work (NC clause)
• I fear that someone finds a mistake or reveals my messy practice (climate community)


Low-hanging fruit

• The “bad” reasons are hard to fight; this is an appeal to research ethics, really.
• Some of the “good” reasons can be fought:
  • There can be good reasons not to reveal the data ⇒ these are hard to remove, but why not provide the procedures together with data that is anonymized, scrambled, simulated, subsetted, ...
  • Lack of incentives: there is no (scientific) reward ⇒ create incentives: reuse → citations
  • It is hard: where are the guidelines? ⇒ make it simple
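
One way to provide procedures without the original data is to ship a scrambled copy: each column is permuted independently, so per-column values survive while rows no longer correspond to real records. The following is a minimal sketch, assuming a small table held as a list of dictionaries; the function name `scramble_columns` and the example records are hypothetical, and scrambling on its own is not a privacy guarantee.

```python
import random


def scramble_columns(rows, seed=0):
    """Return a copy of `rows` with every column shuffled independently.

    Each column keeps its set of values (so per-column summaries are
    unchanged), but the link between columns within a row is broken.
    """
    if not rows:
        return []
    rng = random.Random(seed)  # fixed seed: the scrambled file is itself reproducible
    columns = list(rows[0].keys())
    shuffled = {}
    for name in columns:
        values = [row[name] for row in rows]
        rng.shuffle(values)
        shuffled[name] = values
    return [{name: shuffled[name][i] for name in columns} for i in range(len(rows))]


# Hypothetical records, for illustration only.
records = [
    {"id": 1, "age": 34, "income": 41000},
    {"id": 2, "age": 51, "income": 58000},
    {"id": 3, "age": 27, "income": 36000},
    {"id": 4, "age": 45, "income": 72000},
]
scrambled = scramble_columns(records, seed=42)
```

An analysis script that runs on `scrambled` exercises the same code paths as on the real data, which lets readers check the procedure even when the numbers themselves are meaningless.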


http://o2r.info

“Opening Reproducible Research”: instead of papers, publish research compendia [1], consisting of paper, data, and software.

• DFG-LIS call “Open Access Transformation”
• Cooperation: ULB (library), Chris Kray (HCI), me (journals, geoscience)
• Funding: 3 FTE x 2 years, possibly +3 years; start 2016

Central to the proposal is a new form for creating and providing research results, the executable research compendium (ERC), which not only enables third parties to reproduce the original research and hence recreate the original research results (figures, tables), but also facilitates interaction with them and the recombination of them with new data or methods.

Focus on the publication cycle.

[1] Gentleman and Temple Lang, 2007. Statistical analyses and reproducible research. Journal of Computational and Graphical Statistics 16:1


Publication cycle


Figure: the publication cycle. Research (data, analysis, description) enters the publication process (prepare, validate, review, publish) and is then used. Compendium states shown along the process: URC, ERC, RERC, PERC.

prepare
• add metadata
• generate reference results
• convert/clean data
• convert/clean analysis procedure
• specify licenses
• specify UI bindings (parameters, tables, figures)

validate
• check metadata
• check execution
• compare results from execution to reference results
• check UI bindings

review
• human inspection in different contexts: self-publication, peer-review, library check
• confirm validation outcomes
• examine content

publish
• assign DOI(s)/URI(s)
• make accessible: for download, one-click repro., via specific platforms/formats
• store
• archive
• make discoverable

use
• one-click reproduce
• interact and query (change parameters, visualisations, etc.)
• discover & compare
• re-use components (data, analysis, etc.)
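
The validation step "compare results from execution to reference results" can be illustrated with a checksum comparison between two directories. This is a minimal sketch and not the o2r implementation; the function names and the directory layout are assumptions, and a byte-identical comparison is stricter than appropriate for outputs that embed timestamps or differ by floating-point noise.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digests(directory):
    """Map each file's path (relative to `directory`) to its SHA-256 digest."""
    base = Path(directory)
    return {
        path.relative_to(base).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(base.rglob("*"))
        if path.is_file()
    }


def compare_results(reference_dir, rerun_dir):
    """Report files that are missing, unexpected, or different in the rerun."""
    ref = file_digests(reference_dir)
    new = file_digests(rerun_dir)
    return {
        "missing": sorted(set(ref) - set(new)),
        "unexpected": sorted(set(new) - set(ref)),
        "different": sorted(k for k in set(ref) & set(new) if ref[k] != new[k]),
    }


# Self-contained demonstration with temporary directories (hypothetical file names).
with tempfile.TemporaryDirectory() as ref_dir, tempfile.TemporaryDirectory() as run_dir:
    (Path(ref_dir) / "table1.csv").write_text("x,y\n1,2\n")
    (Path(run_dir) / "table1.csv").write_text("x,y\n1,2\n")
    (Path(ref_dir) / "figure1.txt").write_text("reference")
    (Path(run_dir) / "figure1.txt").write_text("changed")
    report = compare_results(ref_dir, run_dir)
```

An empty report in all three categories would count as a successful reproduction under this strict definition.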


O2R goals:

(i) to define the formal structure to which an executable research compendium has to conform,

(ii) to develop tools for automating validation,

(iii) to demonstrate and evaluate (i) and (ii) by means of fully fledged use cases, and

(iv) to go beyond mere reproduction by developing tools for interactive exploration of executable research compendia.

Partners:

• Elsevier (H. Koers, content innovation management)
• Copernicus (X. van Edig, journals)
• UCSB (Kuhn), Aalto Univ. School of Science (Kauppinen), Utrecht (Scheider)


Role of the library

• long-term preservation & archiving
• search & find
• library workflows: what can the library offer to all scientists? What do they have to understand, and what is managed by the domains?
• use & extend library standards for digital archives: OAIS, BagIt
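
To make the BagIt reference concrete, the sketch below writes a compendium as a minimal bag: a `bagit.txt` declaration, a `data/` payload directory, and a `manifest-sha256.txt` of checksums, then verifies the payload against the manifest. It is a simplified illustration under the assumption of a flat payload; the function names are hypothetical, the verification is partial, and an archive would more likely rely on an established BagIt library.

```python
import hashlib
import tempfile
from pathlib import Path


def write_minimal_bag(bag_dir, payload):
    """Write `payload` (dict of flat file name -> bytes) as a minimal bag."""
    bag = Path(bag_dir)
    data = bag / "data"
    data.mkdir(parents=True, exist_ok=True)

    manifest_lines = []
    for name in sorted(payload):
        content = payload[name]
        (data / name).write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        manifest_lines.append(f"{digest}  data/{name}")

    # Bag declaration and payload manifest.
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )
    (bag / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
    return bag


def verify_bag(bag_dir):
    """Return True if every file listed in the manifest matches its checksum."""
    bag = Path(bag_dir)
    manifest = (bag / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        digest, rel_path = line.split(maxsplit=1)
        if hashlib.sha256((bag / rel_path).read_bytes()).hexdigest() != digest:
            return False
    return True


# Demonstration with a hypothetical two-file compendium.
with tempfile.TemporaryDirectory() as tmp:
    bag = write_minimal_bag(
        Path(tmp) / "compendium",
        {"paper.txt": b"text of the paper", "analysis.R": b"1 + 1\n"},
    )
    intact = verify_bag(bag)
    (bag / "data" / "paper.txt").write_bytes(b"tampered")
    still_intact = verify_bag(bag)
```

The manifest gives the archive a way to detect silent corruption of paper, data, or software years after deposit.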


More difficult targets

Out of O2R’s scope:

• my data set is large (try to reproduce Google Earth Engine)
• my computation only runs on dedicated hardware (GPU, clusters, Arduino)
• my computation requires supercomputing
• licensed software, software constrained to particular platforms
• business models

Inside O2R’s scope:

• which interactions are valuable?
• software is dynamic: fix versions and rebuild? fix runtime?
• primarily R; secondarily, anything that can be encapsulated in a Docker container
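
Fixing versions presupposes that the versions in use were recorded in the first place. In R, `sessionInfo()` reports this; the sketch below shows the same idea in Python, purely as an illustration and not as part of O2R. The function name `snapshot_environment` and the shape of the returned dictionary are assumptions.

```python
import json
import platform
import sys
from importlib import metadata


def snapshot_environment():
    """Collect interpreter, OS, and installed package versions."""
    packages = {}
    for dist in metadata.distributions():
        meta = dist.metadata
        name = meta.get("Name") if meta is not None else None
        if name:  # skip broken installs that carry no name
            packages[name] = meta.get("Version", "unknown")
    return {
        "python": platform.python_version(),
        "implementation": sys.implementation.name,
        "platform": platform.platform(),
        "packages": dict(sorted(packages.items())),
    }


snapshot = snapshot_environment()
# Serialised form that could be stored next to the paper, data, and software.
snapshot_json = json.dumps(snapshot, indent=2)
```

A snapshot like this records what ran; rebuilding the same environment later still depends on those versions remaining available.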


Why docker?

• VMs abstract away the hardware/OS layer
• mainstream
• lightweight, copy-on-write
• Dockerfiles make the Docker container transparent and reproducible

Challenges:

• not developed primarily for the purpose of reproducibility (luckily?)
• for this, software versioning systems need to be better developed


Reproducible Research in practice: Docker container

https://github.com/benmarwick/1989-excavation-report-Madjebebe


Discussion & Conclusions

• Reproducible research is not hard; benefit now from the lack of guidelines!
• Start early and small-scale: share workflows, scripts, software, data and papers from day 1 rather than just before submitting the manuscript
• How do we teach our students what open science is?
