What is Reproducibility? The R* Brouhaha Professor Carole Goble The University of Manchester, UK Software Sustainability Institute UK [email protected] Alan Turing Institute Symposium Reproducibility, Sustainability and Preservation 6-7 April 2016, Oxford, UK

Upload: carole-goble

Post on 12-Jan-2017


TRANSCRIPT

Page 1: What is reproducibility goble-clean

What is Reproducibility? The R* Brouhaha

Professor Carole Goble

The University of Manchester, UK
Software Sustainability Institute UK

[email protected]

Alan Turing Institute Symposium Reproducibility, Sustainability and Preservation, 6-7 April 2016, Oxford, UK

Page 2: What is reproducibility goble-clean

“When I use a word,” Humpty Dumpty said in rather a scornful tone, “it means just what I choose it to mean - neither more nor less.”

Carroll, Through the Looking-Glass

re-compute, replicate, rerun, repeat, re-examine, repurpose, recreate, reuse, restore, reconstruct, review, regenerate, revise, recycle, redo, remix

robustness, tolerance

verification, compliance, validation, assurance

Page 3: What is reproducibility goble-clean

Reproducibility of Reproducibility Research

Page 4: What is reproducibility goble-clean

Even in Computer Science

Page 5: What is reproducibility goble-clean

http://www.dagstuhl.de/16041

24-29 January 2016, Dagstuhl Seminar 16041

Reproducibility of Data-Oriented Experiments in e-Science

Page 6: What is reproducibility goble-clean

Computational Workflow

Page 7: What is reproducibility goble-clean

10 Simple Rules for Reproducible Computational Research

1. For Every Result, Keep Track of How It Was Produced
2. Avoid Manual Data Manipulation Steps
3. Archive the Exact Versions of All External Programs Used
4. Version Control All Custom Scripts
5. Record All Intermediate Results, When Possible in Standardized Formats
6. For Analyses That Include Randomness, Note Underlying Random Seeds
7. Always Store Raw Data behind Plots
8. Generate Hierarchical Analysis Output, Allowing Layers of Increasing Detail to Be Inspected
9. Connect Textual Statements to Underlying Results
10. Provide Public Access to Scripts, Runs, and Results

Sandve GK, Nekrutenko A, Taylor J, Hovig E (2013) Ten Simple Rules for Reproducible Computational Research. PLoS Comput Biol 9(10): e1003285. doi:10.1371/journal.pcbi.1003285

Record Everything

Automate Everything
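
"Record Everything" and "Automate Everything" can be illustrated with a small Python sketch that stores, next to each result, the random seed, a hash of the input and the interpreter version. This is only a minimal sketch under stated assumptions: the toy analysis, the function names and the record fields are hypothetical and are not taken from the talk or from the cited paper.

```python
import hashlib
import json
import random
import sys
from datetime import datetime, timezone


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of the raw input bytes."""
    return hashlib.sha256(data).hexdigest()


def run_analysis(values, seed):
    """Toy analysis (hypothetical): mean of a seeded random half-sample."""
    rng = random.Random(seed)  # local generator, so the seed alone fixes the draw
    sample = rng.sample(values, k=len(values) // 2)
    return sum(sample) / len(sample)


def run_with_record(values, seed):
    """Run the analysis and return (result, provenance record)."""
    raw = json.dumps(values).encode("utf-8")
    result = run_analysis(values, seed)
    record = {
        "result": result,
        "seed": seed,                                  # note the random seed
        "input_sha256": sha256_of(raw),                # identify the exact input
        "python_version": sys.version.split()[0],      # note the interpreter used
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
    }
    return result, record


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    result, record = run_with_record(data, seed=42)
    print(json.dumps(record, indent=2))
```

Because the record is produced by the same call that produces the result, it cannot be forgotten or typed in by hand, which is the point of automating the bookkeeping.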

Page 8: What is reproducibility goble-clean

Scientific publications have two goals: (i) announce a result and (ii) convince readers that the result is correct.

Papers in experimental science should describe the results and provide a clear enough protocol to allow successful repetition and extension.

Papers in computational science should describe the results and provide the complete software development environment, data and set of instructions which generated the figures.

Virtual Witnessing*

*Leviathan and the Air-Pump: Hobbes, Boyle, and the Experimental Life (1985) Shapin and Schaffer.

Why?

Jill Mesirov

David Donoho

Page 9: What is reproducibility goble-clean

Datasets, data collections
Standard operating procedures
Software, algorithms
Configurations
Tools and apps, services
Codes, code libraries
Workflows, scripts
System software
Infrastructure
Compilers, hardware

Page 10: What is reproducibility goble-clean

Analogy: The Lab

data science | data-driven science

1. Philosophy

2. Practice

Page 11: What is reproducibility goble-clean

Witnessing “Datascopes”

Input Data
Software
Output Data
Config, Parameters

Experiment
Methods: techniques, algorithms, spec. of the steps, models
Materials: datasets, parameters, algorithm seeds

Setup
Instruments: codes, services, scripts, underlying libraries, workflows, ref resources
Laboratory: sw and hw infrastructure, systems software, integrative platforms, computational environment

Page 12: What is reproducibility goble-clean

Model Driven Science: can I rerun my model?

Model Sweeps: what are the sensitivities?

Why?
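
A model sweep and its sensitivities can be sketched in a few lines of Python. The example below is a minimal sketch, not a model from the talk: `logistic_growth` is a hypothetical toy model, and the growth rates swept over are arbitrary illustrative values. It reruns the model once per parameter value and estimates sensitivity as the finite-difference slope between neighbouring sweep points.

```python
def logistic_growth(rate, capacity=100.0, start=1.0, steps=20):
    """Hypothetical toy model: discrete logistic growth, returns the final population."""
    population = start
    for _ in range(steps):
        population += rate * population * (1.0 - population / capacity)
    return population


def sweep(model, values):
    """Rerun the model once per parameter value and collect (value, output) pairs."""
    return [(v, model(v)) for v in values]


def sensitivities(results):
    """Finite-difference slope of the output between neighbouring sweep points."""
    return [
        (y1 - y0) / (x1 - x0)
        for (x0, y0), (x1, y1) in zip(results, results[1:])
    ]


if __name__ == "__main__":
    rates = [0.1, 0.2, 0.3, 0.4, 0.5]
    results = sweep(logistic_growth, rates)
    for rate, final in results:
        print(f"rate={rate:.1f} final={final:.3f}")
    print("slopes:", [round(s, 2) for s in sensitivities(results)])
```

The sweep only answers the sensitivity question if each rerun is deterministic, so a model that draws random numbers would also need its seed fixed and recorded.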

Page 13: What is reproducibility goble-clean

“Micro” Reproducibility

“Macro” Reproducibility

When?

Page 14: What is reproducibility goble-clean

Repeat, Replicate, Robust

Why the differences?

Reproduce

[C Titus Brown]

https://2016-oslo-repeatability.readthedocs.org/en/latest/repeatability-discussion.html

Page 15: What is reproducibility goble-clean

Computational analyses

Productivity

Track differences

Page 16: What is reproducibility goble-clean

Computational analyses?

Repeatability: Sameness. Same result, 1 lab, 1 experiment.

Reproducibility: Similarity. Similar result, > 1 lab, > 1 experiment.
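
The distinction between sameness and similarity can be made concrete as two different checks. The following is a minimal Python sketch under stated assumptions: the measurements are invented for illustration, and the 5% relative tolerance is an arbitrary choice, since what counts as "similar" depends on the field and the experiment.

```python
import math


def is_repeatable(original, rerun):
    """Sameness: the rerun must give exactly the same values."""
    return original == rerun


def is_reproducible(original, independent, rel_tol=0.05):
    """Similarity: each independent value must lie within rel_tol of the original."""
    return len(original) == len(independent) and all(
        math.isclose(a, b, rel_tol=rel_tol) for a, b in zip(original, independent)
    )


if __name__ == "__main__":
    # Invented numbers, for illustration only.
    lab_a_run1 = [0.82, 1.47, 3.10]
    lab_a_run2 = [0.82, 1.47, 3.10]  # same lab, same experiment
    lab_b_run = [0.80, 1.50, 3.05]   # another lab, its own experiment

    print(is_repeatable(lab_a_run1, lab_a_run2))   # True
    print(is_reproducible(lab_a_run1, lab_b_run))  # True
    print(is_repeatable(lab_a_run1, lab_b_run))    # False
```

The last line shows why the two words should not be used interchangeably: the second lab's result reproduces the first within tolerance without repeating it exactly.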

Page 17: What is reproducibility goble-clean

“an experiment is reproducible until another laboratory tries to repeat it.”

Alexander Kohn

Page 18: What is reproducibility goble-clean

reviewers want additional work
statistician wants more runs
analysis needs to be repeated
post-doc leaves, student arrives
new data, revised data
updated versions of algorithms/codes
sample was contaminated

Page 19: What is reproducibility goble-clean

Measuring Information Gain from Reproducibility

Experiment components: Research goal, Method/Alg., Implementation/Code, Platform/Exec Env, Data Parameters, Input data, Actors

Information Gain: Consistency, Robustness/Sensitivity, Generality, Portability/Adoption 1, Portability/Adoption 2, Independent validation, Repurposability

Legend: No change, Change, Don’t care

https://linkingresearch.wordpress.com/2016/02/21/dagstuhl-seminar-report-reproducibility-of-data-oriented-experiments-in-e-scienc/

http://www.dagstuhl.de/16041

Page 20: What is reproducibility goble-clean

Taxonomy of actions towards improving reproducibility in Computer Science.

https://linkingresearch.wordpress.com/2016/02/21/dagstuhl-seminar-report-reproducibility-of-data-oriented-experiments-in-e-scienc/

http://www.dagstuhl.de/16041

Page 21: What is reproducibility goble-clean

Practice

Page 22: What is reproducibility goble-clean

Input Data
Software
Output Data
Config, Parameters

Experiment
Methods: techniques, algorithms, spec. of the steps, models
Materials: datasets, parameters, algorithm seeds

Setup
Instruments: codes, services, scripts, underlying libraries, workflows, ref datasets
Laboratory: sw and hw infrastructure, systems software, integrative platforms, computational environment

“Datascope” Entropy -> Preservation
“Replication / Reproducibility Window”

Page 23: What is reproducibility goble-clean

Change: science, methods, datasets. Questions don’t change, answers do.

Materials unavailable: one-offs, streams, stochastics, sensitivities, licensing, scale, non-portable data

Change: instruments break, labs decay. Active ref datasets and services.

Platforms & resources unavailable: supercomputer scale, non-portable software

“Datascope” Entropy -> Preservation
“Replication / Reproducibility Window”

Archived vs Active Environment
Isolated vs Open Distributed Ecosystem

Page 24: What is reproducibility goble-clean

T1 T2

Evolving Reference Knowledge Bases, e.g. UNIPROT
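
When a reference knowledge base changes between two times T1 and T2, an analysis rerun at T2 may give different answers even if nothing else changed. The Python sketch below shows one way to detect such drift by comparing two archived snapshots. It is a minimal sketch: the entry identifiers and annotations are invented, and the snapshots are plain dictionaries rather than any real UniProt format or API.

```python
def diff_snapshots(t1, t2):
    """Compare two snapshots (dicts of entry id -> record) of a reference knowledge base."""
    added = sorted(set(t2) - set(t1))
    removed = sorted(set(t1) - set(t2))
    changed = sorted(k for k in set(t1) & set(t2) if t1[k] != t2[k])
    return {"added": added, "removed": removed, "changed": changed}


if __name__ == "__main__":
    # Hypothetical entries; identifiers and annotations are invented for illustration.
    snapshot_t1 = {
        "ENTRY_A": "kinase",
        "ENTRY_B": "unknown function",
        "ENTRY_C": "transporter",
    }
    snapshot_t2 = {
        "ENTRY_A": "kinase",
        "ENTRY_B": "hydrolase",
        "ENTRY_D": "ligase",
    }
    print(diff_snapshots(snapshot_t1, snapshot_t2))
    # {'added': ['ENTRY_D'], 'removed': ['ENTRY_C'], 'changed': ['ENTRY_B']}
```

An empty diff suggests the reference data can be ruled out as the cause of a changed result. A non-empty diff lists the entries to check first.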

Page 25: What is reproducibility goble-clean

Repeat harder than Reproduce?

Repeating the experiment or the set up?

• When the environment is active

Page 26: What is reproducibility goble-clean

“Datascope” Entropy -> Preservation
“Replication / Reproducibility Window”

Form?

Function?

Experiment
Methods: techniques, algorithms, spec. of the steps, models
Materials: datasets, parameters, algorithm seeds

Setup
Instruments: codes, services, scripts, underlying libraries, workflows, ref datasets
Laboratory: sw and hw infrastructure, systems software, integrative platforms, computational environment

Page 27: What is reproducibility goble-clean

How? Preserve by Reporting, Reproduce by Reading
Provenance Traces, Notebooks, Rich Metadata

Archived Record

Experiment: Methods, Materials
Setup: Instruments, Laboratory

Page 28: What is reproducibility goble-clean

How? Preserve by Reporting, Reproduce by Reading
Provenance Traces, Notebooks, Rich Metadata

Archived Record
standards, common metadata

Provenance
Workflows, Scripts
ELNs

Experiment: Methods, Materials
Setup: Instruments, Laboratory

Page 29: What is reproducibility goble-clean

How? Preserve by Maintaining, Repairing, VMs
Reproduce by Running, Emulating, Reconstructing

Active Instrument

Experiment: Methods, Materials
Setup: Instruments, Laboratory

Page 30: What is reproducibility goble-clean

How? Preserve by Maintaining, Repairing, VMs
Reproduce by Running, Emulating, Reconstructing

Active Instrument, Byte level

Experiment: Methods, Materials
Setup: Instruments, Laboratory

Page 31: What is reproducibility goble-clean

Levels of Computational Reproducibility

Coverage: how much of an experiment is reproducible
Original Experiment
Similar Experiment
Different Experiment
Portability

Depth: how much of an experiment is available
Binaries + Data
Source Code / Workflow + Data
Binaries + Data + Dependencies
Source Code / Workflow + Data + Dependencies
Virtual Machine: Binaries + Data + Dependencies
Virtual Machine: Source Code / Workflow + Data + Dependencies
Figures + Data

[Freire, 2014]

Minimum: data and source code available under terms that permit inspection and execution.
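
The depth levels and the stated minimum can be written as a small checklist. The Python sketch below maps each depth level listed on the slide to the artifacts it requires, and reports which levels a given release satisfies. It is a minimal sketch: the artifact labels (`"source"`, `"vm"` and so on) and the function names are assumptions made for this example, and the dictionary order carries no ranking.

```python
# Depth levels as listed on the slide, each mapped to the artifacts it requires.
# The artifact labels are hypothetical names chosen for this sketch.
DEPTH_LEVELS = {
    "Binaries + Data": {"binaries", "data"},
    "Source Code / Workflow + Data": {"source", "data"},
    "Binaries + Data + Dependencies": {"binaries", "data", "dependencies"},
    "Source Code / Workflow + Data + Dependencies": {"source", "data", "dependencies"},
    "Virtual Machine: Binaries + Data + Dependencies": {
        "vm", "binaries", "data", "dependencies",
    },
    "Virtual Machine: Source Code / Workflow + Data + Dependencies": {
        "vm", "source", "data", "dependencies",
    },
    "Figures + Data": {"figures", "data"},
}


def satisfied_levels(available):
    """Return the names of all depth levels whose required artifacts are available."""
    have = set(available)
    return [name for name, needed in DEPTH_LEVELS.items() if needed <= have]


def meets_minimum(available, licence_permits_inspection_and_execution):
    """Minimum from the slide: data and source code, under terms permitting inspection and execution."""
    return bool(
        {"data", "source"} <= set(available)
        and licence_permits_inspection_and_execution
    )


if __name__ == "__main__":
    release = {"source", "data", "dependencies"}
    print(satisfied_levels(release))
    print(meets_minimum(release, licence_permits_inspection_and_execution=True))
```

Keeping the licence flag as a separate argument reflects the wording of the minimum: having the files is not enough if the terms forbid running or reading them.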

Page 32: What is reproducibility goble-clean

Repeatable Environments

*https://2016-oslo-repeatability.readthedocs.org/en/latest/overview-and-agenda.html

[C. Titus Brown*]

Metadata Objects : Reproducible Reporting, Exchange

Checklist Provenance

Tracking Versioning

Dependencies

container
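
Tracking, versioning and dependencies can be partly automated by capturing the computational environment at run time. The sketch below uses only the Python standard library to snapshot the interpreter, the platform and the installed package versions, and to compare two such snapshots. It is a minimal sketch under stated assumptions: the function names and record layout are hypothetical, and it covers Python packages only, not system libraries or hardware, which a container or virtual machine would capture.

```python
import json
import platform
import sys
from importlib import metadata


def capture_environment():
    """Snapshot the interpreter, platform and installed Python package versions."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that report no name
            packages[name] = dist.metadata.get("Version")
    return {
        "python": sys.version.split()[0],
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "packages": dict(sorted(packages.items())),
    }


def changed_dependencies(before, after):
    """Return {package: (old_version, new_version)} for every package that differs."""
    names = set(before["packages"]) | set(after["packages"])
    return {
        n: (before["packages"].get(n), after["packages"].get(n))
        for n in sorted(names)
        if before["packages"].get(n) != after["packages"].get(n)
    }


if __name__ == "__main__":
    env = capture_environment()
    print(json.dumps({k: v for k, v in env.items() if k != "packages"}, indent=2))
    print(len(env["packages"]), "packages recorded")
```

Storing the captured dictionary alongside each run makes it possible to ask later, with `changed_dependencies`, whether a differing result coincides with a differing environment.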

Page 33: What is reproducibility goble-clean

provenance
dependencies
steps, features
transparency
portability
robustness
preservation
access description
available
standards
common APIs
licensing
identifiers
standards, common metadata
change variation sensitivity
versioning
packaging

Session 1

Session 3
real-time big data

Session 2
computational models and simulations

Session 5
Architecture
Infrastructure

So, What is Reproducibility? Being FAIR

Page 34: What is reproducibility goble-clean

So, What is Reproducibility? Being FAIR

Session 4

Citation and Credit

Page 35: What is reproducibility goble-clean

What is Reproducibility?
Why, When, Where, Who for, Who by, How

Special thanks to
• C Titus Brown
• Juliana Freire
• David De Roure
• Stian Soiland-Reyes
• Barend Mons
• Tim Clark
• Daniel Garijo
• Wf4Ever and Research Object teams
• Dagstuhl Seminar 16041
• Force11 http://www.force11.org