TRANSCRIPT
Reproducibility and Scientific
Research
Professor Carole Goble CBE FREng FBCS, The University of Manchester, UK
Open Data Manchester, 27th January 2015
[Image credit: icanhascheezburger.com]
why, what, where, when, who, how
Scientific publications have at least two goals: (i) to announce a result and (ii) to convince readers that the result is correct … papers in experimental [and computational science] should describe the results and provide a clear enough protocol [or algorithm] to allow successful repetition and extension
Jill Mesirov, Accessible Reproducible Research, Science 22 January 2010: Vol. 327 no. 5964, pp. 415-416, DOI: 10.1126/science.1179653
Virtual Witnessing / Minute Taking
design: cherry-picking data, random seed reporting, non-independent bias, poor positive and negative controls, dodgy normalisation, arbitrary cut-offs, premature data triage, un-validated materials, improper statistical analysis, poor statistical power, stopping when you "get to the right answer", software misconfigurations, misapplied black-box software
John P. A. Ioannidis, Why Most Published Research Findings Are False, 30 August 2005, DOI: 10.1371/journal.pmed.0020124
reporting: incomplete reporting of software configurations, parameters & resource versions, missed steps, missing data, vague methods, missing software
Joppa et al., Troubling Trends in Scientific Software Use, Science 340, May 2013
Empirical / Statistical / Computational
V. Stodden, IMS Bulletin (2013)
Transparency / Availability Gap
1. Ioannidis et al., 2009. Repeatability of published microarray gene expression analyses. Nature Genetics 41: 14
2. Science publishing: The trouble with retractions http://www.nature.com/news/2011/111005/full/478026a.html
3. Bjorn Brembs: Open Access and the looming crisis in science https://theconversation.com/open-access-and-the-looming-crisis-in-science-14950
Out of 18 microarray papers, results from 10 could not be reproduced
Broken software, broken science
• Geoffrey Chang, Scripps Research Institute
• Homemade data-analysis program inherited from another lab
• Flipped two columns of data, inverting the electron-density map used to derive protein structure
• Retracted 3 Science papers and 2 papers in other journals
• One paper cited 364 times
[Figure: The structures of MsbA (purple) and Sav1866 (green) overlap little (left) until MsbA is inverted (right).]
Miller, A Scientist's Nightmare: Software Problem Leads to Five Retractions, Science 22 December 2006: vol. 314 no. 5807, pp. 1856-1857
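The failure mode behind the retractions is easy to reproduce in miniature. The sketch below is purely illustrative (the data, names, and arithmetic are invented, not Chang's actual program): swapping two columns of paired measurements flips the sign of every derived difference, the same class of bug that mirrored the electron-density map.

```python
import numpy as np

# Each row holds a pair of measurements; the quantity of interest
# is the difference between the two columns.
data = np.array([
    [105.0,  98.0],
    [ 87.0,  93.0],
    [110.0, 104.0],
])

correct = data[:, 0] - data[:, 1]      # intended: first minus second column

# The inherited program effectively read the two columns in the wrong order:
flipped = data[:, [1, 0]]              # columns swapped
buggy = flipped[:, 0] - flipped[:, 1]  # now second minus first column

# Every derived difference silently changes sign.
assert np.allclose(buggy, -correct)
```

A black-box script inherited from another lab makes this kind of sign inversion invisible until an independent group re-derives the result, which is exactly what happened here.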
http://www.software.ac.uk/blog/2014-12-04-its-impossible-conduct-research-without-software-say-7-out-10-uk-researchers
“An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment, [the complete data] and the complete set of instructions which generated the figures.” David Donoho, “Wavelab and Reproducible Research,” 1995
Morin et al Shining Light into Black Boxes Science 13 April 2012: 336(6078) 159-160
Ince et al The case for open computer programs, Nature 482, 2012
algorithms, configurations, tools and apps, codes, workflows, scripts, code libraries, third-party services, system software infrastructure, compilers, hardware
Self-contained codes??
WHY? 12+3 reasons research goes “wrong”
1. Pressure to publish
2. Impact factor mania
3. Tainted resources
4. Bad maths
5. Sins of omission
6. Science is messy
7. Broken peer review
8. Some scientists don't share
9. Research never reported
10. Poor training -> sloppiness
11. Honest error
12. Fraud
13. Disorganisation & time pressures
14. Cost to prepare and curate materials
15. Inherently "unreplicable" (one-off data, specialist kit, stochastic)
https://www.sciencenews.org/article/12-reasons-research-goes-wrong (adapted)
• replication hostility
• resource intensive
• no funding, time, recognition, place to publish
• the complete environment?
It's HARD to Prepare and Independently Test
[Norman Morrison]
re-compute, replicate, rerun, repeat, re-examine, repurpose, recreate, reuse, restore, reconstruct, review, regenerate, revise, recycle
conceptual replication: "show A is true by doing B rather than doing A again"
verify but not falsify [Yong, Nature 485, 2012]
regenerate figure
redo
WHAT is reproducibility? This is a heated topic of debate.
robustness tolerance
verification compliance
validation assurance
WHEN?*
• Can I repeat my method? DEFEND: publish article. Same experiment, set up, lab.
• Can I replicate your method? CERTIFY: submit article (and move on…). Same experiment, set up, independent lab (a window before decay sets in…).
• Can I reproduce my results using your method, or your results using my method? COMPARE. Variations on experiment, set up, lab.
• Can I reuse your results / method in my research? TRANSFER. Different experiment.
* Adapted from Mesirov, J., Accessible Reproducible Research, Science 327(5964), 415-416 (2010)
WHO? scientific ego-system & access: trust, reciprocity, and competition
blame, scooping, no credit / credit drift, misinterpretation, scrutiny, trolling, cost of preparation, support distraction, dependents on old news, loss of dowry, loss of special sauce
hugging, flirting, voyeurism, cautionary creeping
Tenopir, et al. Data Sharing by Scientists: Practices and Perceptions. PLoS ONE 6(6) 2012
Borgman The conundrum of sharing research data, JASIST 2012
John P. A. Ioannidis How to Make More Published Research True, October 21, 2014 DOI: 10.1371/journal.pmed.1001747
Sandve GK, Nekrutenko A, Taylor J, Hovig E (2013) Ten Simple Rules for Reproducible Computational Research. PLoS Comput Biol 9(10): e1003285. doi:10.1371/journal.pcbi.1003285
HOW?
[Adapted Freire, 2013]
transparency: dependencies, steps, provenance
portability
robustness, tolerance
preservation: packaging, versioning
access: available, standards, common APIs, licence
description: intelligible, standards, common metadata
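The checklist above (provenance, versioning, explicit parameters) can be made concrete in a few lines. This is a hedged sketch, not a tool from the talk: the file name, field names, and toy analysis are all invented, and it records the software version, platform, fixed random seed, and every parameter alongside the result, the details the "incomplete reporting" slide says usually go missing.

```python
import json
import platform
import random
import sys

def run_analysis(values, seed=42, cutoff=0.05):
    """Toy analysis: count values below an explicitly reported cut-off."""
    random.seed(seed)                      # fixed seed, so reruns match
    return sum(1 for v in values if v < cutoff)

params = {"seed": 42, "cutoff": 0.05}      # every tunable, no hidden defaults
result = run_analysis([0.01, 0.2, 0.04], **params)

# Write the provenance record next to the result it explains.
provenance = {
    "python_version": sys.version,
    "platform": platform.platform(),
    "parameters": params,
    "result": result,
}
with open("provenance.json", "w") as f:
    json.dump(provenance, f, indent=2)
```

Anyone rerunning the analysis later can compare their own environment and parameters against the saved record instead of guessing at them from a methods paragraph.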
HOW?
sustained sites: Findable, Accessible, Intelligible, Reproducible
Summary
• Replicable Science is hard work and poorly rewarded
• Reproducible Science => Transparent Science but ideally needs to be born that way
• Collective responsibility
• Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Alan Williams, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza, Hylke Koers, Norman Morrison, Ian Fore, Jill Mesirov, Robert Stevens, Steve Pettifer
http://www.researchobject.org
http://www.wf4ever-project.org
http://www.fair-dom.org
http://www.software.ac.uk
Further Reading
• https://www.sciencenews.org/article/redoing-scientific-research-best-way-find-truth
• Drummond C, Replicability is not Reproducibility: Nor is it Good Science, online
• Peng RD, Reproducible Research in Computational Science Science 2 Dec 2011: 1226-1227.