What is Reproducibility? The R* Brouhaha
Professor Carole Goble
The University of Manchester, UK
Software Sustainability Institute UK
Alan Turing Institute Symposium: Reproducibility, Sustainability and Preservation, 6-7 April 2016, Oxford, UK
“When I use a word," Humpty Dumpty said in rather a scornful tone, "it means just what I choose it to mean - neither more nor less.”
Carroll, Through the Looking Glass
re-compute
replicate
rerun
repeat
re-examine
repurpose
recreate
reuse
restore
reconstruct
review
regenerate
revise
recycle
redo
robustness tolerance
verification compliance validation assurance
remix
Reproducibility of Reproducibility Research
Even in Computer Science
http://www.dagstuhl.de/16041
24-29 January 2016, Dagstuhl Seminar 16041
Reproducibility of Data-Oriented Experiments in e-Science
Computational Workflow
10 Simple Rules for Reproducible Computational Research
1. For Every Result, Keep Track of How It Was Produced
2. Avoid Manual Data Manipulation Steps
3. Archive the Exact Versions of All External Programs Used
4. Version Control All Custom Scripts
5. Record All Intermediate Results, When Possible in Standardized Formats
6. For Analyses That Include Randomness, Note Underlying Random Seeds
7. Always Store Raw Data behind Plots
8. Generate Hierarchical Analysis Output, Allowing Layers of Increasing Detail to Be Inspected
9. Connect Textual Statements to Underlying Results
10. Provide Public Access to Scripts, Runs, and Results
Sandve GK, Nekrutenko A, Taylor J, Hovig E (2013) Ten Simple Rules for Reproducible Computational Research. PLoS Comput Biol 9(10): e1003285. doi:10.1371/journal.pcbi.1003285
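A minimal sketch of how rules 1, 3 and 6 might look inside a script, assuming a toy bootstrap analysis. The names run_analysis and run_with_manifest, the manifest fields and the seed value are illustrative and are not taken from Sandve et al.

```python
import hashlib
import json
import platform
import random
import sys


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def run_analysis(values, seed):
    """Toy stochastic analysis: mean of a seeded bootstrap resample."""
    rng = random.Random(seed)  # local generator, so the seed alone fixes the draw
    resample = [rng.choice(values) for _ in values]
    return sum(resample) / len(resample)


def run_with_manifest(values, seed):
    """Run the analysis and return (result, manifest of how it was produced)."""
    result = run_analysis(values, seed)
    manifest = {
        "seed": seed,                      # rule 6: note the random seed
        "python": sys.version.split()[0],  # rule 3: record the interpreter version
        "platform": platform.platform(),
        # rule 1: tie the result to the exact input it was computed from
        "input_sha256": sha256_of(json.dumps(values).encode("utf-8")),
        "result": result,
    }
    return result, manifest


if __name__ == "__main__":
    data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]  # made-up input values
    _, manifest = run_with_manifest(data, seed=20160406)
    print(json.dumps(manifest, indent=2, sort_keys=True))
```

Writing the manifest next to every output file is one way to make "record everything" a by-product of running the script rather than a separate manual step.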
Record Everything
Automate Everything
Scientific publications have two goals: (i) announce a result and (ii) convince readers that the result is correct.
Papers in experimental science should describe the results and provide a clear enough protocol to allow successful repetition and extension.
Papers in computational science should describe the results and provide the complete software development environment, data and set of instructions which generated the figures.
Virtual Witnessing*
*Leviathan and the Air-Pump: Hobbes, Boyle, and the Experimental Life (1985) Shapin and Schaffer.
Why? Jill Mesirov
David Donoho
Datasets, Data collections
Standard operating procedures
Software, algorithms
Configurations, Tools and apps, services
Codes, code libraries
Workflows, scripts
System software
Infrastructure
Compilers, hardware
Analogy: The Lab
data science | data-driven science
1. Philosophy
2. Practice
Witnessing “Datascopes”
Input Data
Software
Output Data
Config
Parameters
Methods: techniques, algorithms, spec. of the steps, models
Materials: datasets, parameters, algorithm seeds
Experiment
Instruments: codes, services, scripts, underlying libraries, workflows, ref resources
Laboratory: sw and hw infrastructure, systems software, integrative platforms, computational environment
Setup
Model Driven Science: can I rerun my model?
Model Sweeps: what are the sensitivities?
Why?
“Micro” Reproducibility
“Macro” Reproducibility
When?
Repeat, Replicate, Robust
Why the differences?
Reproduce
[C Titus Brown]
https://2016-oslo-repeatability.readthedocs.org/en/latest/repeatability-discussion.html
Computational analyses
Productivity
Track differences
Computational analyses?
Repeatability: Sameness. Same result. 1 Lab, 1 experiment
Reproducibility: Similarity. Similar result. > 1 Lab, > 1 experiment
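The sameness versus similarity distinction above can be sketched as two different acceptance checks. This is only an illustration: the function names, the numeric values and the 5% tolerance are assumptions, not thresholds proposed in the talk.

```python
import math


def is_repeated(original, rerun):
    """Repeatability as sameness: the rerun must give the identical value."""
    return original == rerun


def is_reproduced(original, independent, rel_tol=0.05):
    """Reproducibility as similarity: agreement within a relative tolerance.

    rel_tol is an arbitrary illustrative threshold, not a recommended value.
    """
    return math.isclose(original, independent, rel_tol=rel_tol)


if __name__ == "__main__":
    lab_a_first = 0.8123  # hypothetical result, same lab, first run
    lab_a_rerun = 0.8123  # hypothetical result, same lab, rerun
    lab_b = 0.8011        # hypothetical result from an independent lab
    print("repeated in lab A:", is_repeated(lab_a_first, lab_a_rerun))
    print("identical in lab B:", is_repeated(lab_a_first, lab_b))
    print("reproduced in lab B:", is_reproduced(lab_a_first, lab_b))
```

What counts as "similar" is a scientific judgement; the tolerance here only marks where that judgement has to be written down.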
“an experiment is reproducible until another laboratory tries to repeat it.”
Alexander Kohn
reviewers want additional work
statistician wants more runs
analysis needs to be repeated
post-doc leaves, student arrives
new data, revised data
updated versions of algorithms/codes
sample was contaminated
Measuring Information Gain from Reproducibility
Dimensions: Research goal, Method/Alg., Implementation/Code, Platform/Exec Env, Data Parameters, Input data, Actors
Information Gain: Consistency, Robustness/Sensitivity, Generality, Portability/Adoption 1, Portability/Adoption 2, Independent validation, Repurposability
Legend: No change | Change | Don’t care
https://linkingresearch.wordpress.com/2016/02/21/dagstuhl-seminar-report-reproducibility-of-data-oriented-experiments-in-e-scienc/
http://www.dagstuhl.de/16041
Taxonomy of actions towards improving reproducibility in Computer Science.
https://linkingresearch.wordpress.com/2016/02/21/dagstuhl-seminar-report-reproducibility-of-data-oriented-experiments-in-e-scienc/
http://www.dagstuhl.de/16041
Practice
Methods: techniques, algorithms, spec. of the steps, models
Materials: datasets, parameters, algorithm seeds
Experiment
Instruments: codes, services, scripts, underlying libraries, workflows, ref datasets
Laboratory: sw and hw infrastructure, systems software, integrative platforms, computational environment
Input Data
Software
Output Data
Config
Parameters
“Datascope” Entropy -> Preservation
“Replication / Reproducibility Window”
Setup
Change: Science, methods, datasets
Questions don’t change, answers do.
Materials unavailable: one-offs, streams, stochastics, sensitivities, licensing, scale, non-portable data
Change: Instruments break, labs decay
Active ref datasets and services
Platforms & resources unavailable: supercomputer scale, non-portable software
“Datascope” Entropy -> Preservation
“Replication / Reproducibility Window”
Archived vs Active Environment
Isolated vs Open Distributed Ecosystem
T1 T2
Evolving Reference Knowledge Bases, e.g. UniProt
Repeat harder than Reproduce?
Repeating the experiment or the set up?
• When the environment is active
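One way to notice that an active reference resource has moved between T1 and T2 is to pin a fingerprint of what the original run saw. A minimal sketch, assuming the reference data can be read as a list of text records; the record contents and function names are made up for illustration.

```python
import hashlib


def fingerprint(records):
    """Order-independent SHA-256 fingerprint of a collection of text records."""
    digest = hashlib.sha256()
    for record in sorted(records):
        digest.update(record.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def environment_changed(pinned_fingerprint, current_records):
    """True if the reference data no longer matches what the original run saw."""
    return fingerprint(current_records) != pinned_fingerprint


if __name__ == "__main__":
    # Hypothetical snapshots of a reference knowledge base at two points in time.
    snapshot_t1 = ["entry-1\tannotation A", "entry-2\tannotation B"]
    snapshot_t2 = snapshot_t1 + ["entry-3\tannotation C"]

    pinned = fingerprint(snapshot_t1)  # stored alongside the original results
    print("changed at T1:", environment_changed(pinned, snapshot_t1))
    print("changed at T2:", environment_changed(pinned, snapshot_t2))
```

A changed fingerprint does not say the new answer is wrong; it says a rerun is no longer a repeat of the original setup.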
“Datascope” Entropy -> Preservation
“Replication / Reproducibility Window”
Form?
Function?
Methods: techniques, algorithms, spec. of the steps, models
Materials: datasets, parameters, algorithm seeds
Experiment
Instruments: codes, services, scripts, underlying libraries, workflows, ref datasets
Laboratory: sw and hw infrastructure, systems software, integrative platforms, computational environment
Setup
How? Preserve by Reporting, Reproduce by Reading
Provenance Traces, Notebooks, Rich Metadata
Archived Record
Methods
Materials
Experiment
Instruments
Laboratory
Setup
How? Preserve by Reporting, Reproduce by Reading
Provenance Traces, Notebooks, Rich Metadata
Archived Record
standards, common metadata
Provenance
Workflows, Scripts
ELNs
Methods
Materials
Experiment
Instruments
Laboratory
Setup
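"Preserve by reporting" can be sketched as a script that writes its own provenance trace while it runs. The decorator below is a hypothetical illustration, not the format of any particular provenance standard or workflow system; the step names and the in-memory TRACE list are assumptions.

```python
import functools
import json
import time

TRACE = []  # in-memory provenance trace; a real system would persist this


def traced(step):
    """Decorator that appends one provenance entry per call of a workflow step."""
    @functools.wraps(step)
    def wrapper(*args, **kwargs):
        started = time.time()
        output = step(*args, **kwargs)
        TRACE.append({
            "step": step.__name__,
            "inputs": {"args": list(args), "kwargs": kwargs},
            "output": output,
            "started": started,
        })
        return output
    return wrapper


@traced
def normalise(values):
    """Scale values so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


@traced
def top(values, n=1):
    """Return the n largest values."""
    return sorted(values, reverse=True)[:n]


if __name__ == "__main__":
    top(normalise([1, 3, 4]), n=2)
    # The trace is the archived record a reader can inspect without rerunning.
    print(json.dumps(TRACE, indent=2))
```

The trace supports reproducing by reading: each entry states which step ran, on what inputs, and what it returned.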
How? Preserve by Maintaining, Repairing, VMs
Reproduce by Running, Emulating, Reconstructing
Active Instrument
Methods
Materials
Experiment
Instruments
Laboratory
Setup
How? Preserve by Maintaining, Repairing, VMs
Reproduce by Running, Emulating, Reconstructing
Active Instrument, Byte level
Methods
Materials
Experiment
Instruments
Laboratory
Setup
Levels of Computational Reproducibility
Coverage: how much of an experiment is reproducible
Original Experiment
Similar Experiment
Different Experiment
Portability
Depth: how much of an experiment is available
Binaries + Data
Source Code / Workflow + Data
Binaries + Data + Dependencies
Source Code / Workflow + Data + Dependencies
Virtual Machine: Binaries + Data + Dependencies
Virtual Machine: Source Code / Workflow + Data + Dependencies
Figures + Data
[Freire, 2014]
Minimum: data and source code available under terms that permit inspection and execution.
Repeatable Environments
*https://2016-oslo-repeatability.readthedocs.org/en/latest/overview-and-agenda.html
[C. Titus Brown*]
Metadata Objects: Reproducible Reporting, Exchange
Checklist Provenance
Tracking Versioning
Dependencies
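Dependency and version tracking can be partly automated from inside the running interpreter. A minimal sketch using only the Python standard library (Python 3.8 or later); the function names and the record layout are illustrative assumptions, not a metadata standard.

```python
import importlib.metadata
import platform
import sys


def installed_packages():
    """Return {name: version} for every distribution visible to this interpreter."""
    packages = {}
    for dist in importlib.metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip installs whose metadata has no readable name
            packages[name] = dist.version
    return packages


def environment_record():
    """Describe the interpreter, platform and installed packages of this run."""
    return {
        "python": sys.version.split()[0],
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "packages": installed_packages(),
    }


def as_requirements(packages):
    """Render the package map as pinned 'name==version' lines, sorted by name."""
    return "\n".join(f"{name}=={version}" for name, version in sorted(packages.items()))


if __name__ == "__main__":
    record = environment_record()
    print(record["python"], record["platform"])
    print(as_requirements(record["packages"]))
```

This captures Python-level dependencies only; system libraries, compilers and hardware from the "Laboratory" layer need a container or virtual machine to pin.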
container
provenance
dependencies
steps, features
transparency
portability
robustness
preservation
access
description
available
standards
common APIs
licensing
identifiers
standards, common metadata
change variation sensitivity
versioning
packaging
Session 1
Session 3
real-time big data
Session 2
computational models and simulations
Session 5
Architecture
Infrastructure
So, What is Reproducibility? Being FAIR
Session 4
Citation and Credit
What is Reproducibility?
Why, When, Where, Who for, Who by, How
Special thanks to
• C Titus Brown
• Juliana Freire
• David De Roure
• Stian Soiland-Reyes
• Barend Mons
• Tim Clark
• Daniel Garijo
• Wf4Ever and Research Object teams
• Dagstuhl Seminar 16041
• Force11 http://www.force11.org