
Reproducibility and Scientific Research

Professor Carole Goble CBE FREng FBCS, The University of Manchester, UK

[email protected]

Open Data Manchester, 27th January 2015

icanhascheezburger.com

why, what, where, when, who, how

Scientific publications have at least two goals: (i) to announce a result and (ii) to convince readers that the result is correct… papers in experimental [and computational science] should describe the results and provide a clear enough protocol [or algorithm] to allow successful repetition and extension

Jill Mesirov, Accessible Reproducible Research, Science 22 January 2010: Vol. 327 no. 5964, pp. 415-416, DOI: 10.1126/science.1179653

Virtual Witnessing / Minute Taking

[Pettifer, Attwood]

http://getutopia.com

Why smart parents often tend to have smart kids

“an experiment is reproducible until another laboratory tries to repeat it.”

Alexander Kohn

Design: cherry-picking data, unreported random seeds, non-independent bias, poor positive and negative controls, dodgy normalisation, arbitrary cut-offs, premature data triage, un-validated materials, improper statistical analysis, poor statistical power, stopping when you “get to the right answer”, software misconfigurations, misapplied black-box software

John P. A. Ioannidis, Why Most Published Research Findings Are False, PLoS Medicine, 30 August 2005, DOI: 10.1371/journal.pmed.0020124

Reporting: incomplete reporting of software configurations, parameters & resource versions, missed steps, missing data, vague methods, missing software

Joppa et al., Troubling Trends in Scientific Software Use, Science 340, May 2013
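Part of this reporting gap is purely mechanical and can be closed with a few lines of code. A minimal sketch (Python; the file name run_manifest.json and the parameter values are hypothetical, not from the slides) that writes the interpreter version, installed package versions, and analysis parameters next to the results, so the software configuration travels with the paper:

```python
import json
import platform
import sys
from importlib import metadata  # stdlib in Python 3.8+

# Hypothetical analysis parameters -- in a real study these would be
# the actual cut-offs, seeds and resource versions used.
params = {"p_value_cutoff": 0.05, "normalisation": "quantile", "random_seed": 42}

manifest = {
    "python": sys.version,
    "platform": platform.platform(),
    "parameters": params,
    # Record the version of every installed distribution so the
    # software configuration is part of the published record.
    "packages": {d.metadata["Name"]: d.version for d in metadata.distributions()},
}

with open("run_manifest.json", "w") as fh:
    json.dump(manifest, fh, indent=2)
```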

Empirical / Statistical / Computational

V. Stodden, IMS Bulletin (2013)

Transparency / Availability Gap

1. Ioannidis et al., 2009. Repeatability of published microarray gene expression analyses. Nature Genetics 41: 14

2. Science publishing: The trouble with retractions http://www.nature.com/news/2011/111005/full/478026a.html

3. Bjorn Brembs: Open Access and the looming crisis in science https://theconversation.com/open-access-and-the-looming-crisis-in-science-14950

Out of 18 microarray papers, results from 10 could not be reproduced

Researcher survey, 1202 respondents (PARSE.insight 2010)

Sustainability

WHERE?

[Hylke Koers]

Broken software, broken science

• Geoffrey Chang, The Scripps Research Institute

• Homemade data-analysis program inherited from another lab

• Flipped two columns of data, inverting the electron-density map used to derive a protein structure

• Retracted three Science papers and two papers in other journals

• One retracted paper had been cited 364 times

[Figure: The structures of MsbA (purple) and Sav1866 (green) overlap little (left) until MsbA is inverted (right).]

Miller, A Scientist's Nightmare: Software Problem Leads to Five Retractions, Science 22 December 2006: Vol. 314 no. 5807, pp. 1856-1857
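The faulty program itself was never published, so the following is only a hypothetical reconstruction of the bug class, not Chang's code: a loader that silently exchanges two columns, plus the kind of cheap reference-point check that would have flagged it.

```python
import numpy as np

# Illustrative data only: each row is a point, columns are (x, y, value).
data = np.array([[1.0, 2.0, 0.7],
                 [3.0, 4.0, -0.2]])

def buggy_load(rows):
    # The bug class: two columns exchanged during import, so every
    # structure derived downstream is built from inverted coordinates.
    return rows[:, [1, 0, 2]]

loaded = buggy_load(data)

# A cheap guard: validate a handful of points against an independently
# known reference before deriving anything from the data.
reference = (1.0, 2.0, 0.7)  # hypothetical trusted value for row 0
if tuple(loaded[0]) != reference:
    print("loaded data disagrees with reference point: check column order")
```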

http://www.software.ac.uk/blog/2014-12-04-its-impossible-conduct-research-without-software-say-7-out-10-uk-researchers

“An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment, [the complete data] and the complete set of instructions which generated the figures.” David Donoho, “Wavelab and Reproducible Research,” 1995

Morin et al., Shining Light into Black Boxes, Science 13 April 2012: 336(6078), 159-160

Ince et al., The case for open computer programs, Nature 482, 2012

algorithms, configurations, tools and apps, codes, workflows, scripts, code libraries, third-party services, system software infrastructure, compilers, hardware

Self-contained codes??

WHY? 12+3 reasons research goes “wrong”

1. Pressure to publish
2. Impact factor mania
3. Tainted resources
4. Bad maths
5. Sins of omission
6. Science is messy
7. Broken peer review
8. Some scientists don't share
9. Research never reported
10. Poor training -> sloppiness
11. Honest error
12. Fraud
13. Disorganisation & time pressures
14. Cost to prepare and curate materials
15. Inherently “unreplicable” (one-off data, specialist kit, stochastic)

https://www.sciencenews.org/article/12-reasons-research-goes-wrong (adapted)

• replication hostility
• resource intensive
• no funding, time, recognition, place to publish
• the complete environment?

It's HARD to Prepare and Independently Test

[Norman Morrison]

Value People. Data. Method. Software.

re-compute, replicate, rerun, repeat, re-examine, repurpose, recreate, reuse, restore, reconstruct, review, regenerate, revise, recycle

conceptual replication: “show A is true by doing B rather than doing A again”; verify but not falsify [Yong, Nature 485, 2012]

regenerate figure

redo

WHAT is reproducibility? This is a heated topic of debate.

robustness tolerance

verification compliance

validation assurance

WHEN?

• Can I repeat my method? Same experiment, set up, lab. Publish article. DEFEND.

• Can I replicate your method? Same experiment, set up, independent lab. Submit article (and move on…); a window before decay sets in. CERTIFY.

• Can I reproduce my results using your method, or your results using my method? Variations on experiment, set up, lab. COMPARE.

• Can I reuse your results / method in my research? Different experiment. TRANSFER.

* Adapted from Mesirov, J., Accessible Reproducible Research, Science 327(5964), 415-416 (2010)

WHO? Scientific ego-system & access: trust, reciprocity, and competition

blame, scooping, no credit / credit drift, misinterpretation, scrutiny, trolling, cost of preparation, support distraction, dependents on old news, loss of dowry, loss of special sauce

hugging, flirting, voyeurism, cautionary creeping

Tenopir et al., Data Sharing by Scientists: Practices and Perceptions, PLoS ONE 6(6), 2012

Borgman, The conundrum of sharing research data, JASIST, 2012

John P. A. Ioannidis, How to Make More Published Research True, 21 October 2014, DOI: 10.1371/journal.pmed.1001747

Sandve GK, Nekrutenko A, Taylor J, Hovig E (2013) Ten Simple Rules for Reproducible Computational Research. PLoS Comput Biol 9(10): e1003285. doi:10.1371/journal.pcbi.1003285
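Several of Sandve et al.'s rules (record every seed, script every step) cost only a few lines in practice. A minimal illustration in Python, assuming NumPy and hypothetical file names; this shows the spirit of the rules, not code from the paper:

```python
import random

import numpy as np

SEED = 20150127  # hypothetical; chosen once and recorded with the output

# Seed every source of randomness the analysis touches.
random.seed(SEED)
np.random.seed(SEED)

# Stand-in for a stochastic analysis step.
sample = np.random.normal(size=1000)

# Write the seed next to the result, so the exact run can be repeated.
with open("result.txt", "w") as fh:
    fh.write(f"seed={SEED}\nmean={sample.mean():.6f}\n")
```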

HOW?

[Adapted from Freire, 2013]

• transparency: dependencies, steps, provenance, portability

• robustness: tolerance

• preservation: packaging, versioning

• access: available, standards, common APIs, licence

• description: intelligible, standards, common metadata
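One concrete way to act on “preservation: packaging” and “transparency: provenance” is to ship code, data, and a checksum manifest as a single bundle, in the spirit of a Research Object. A hedged sketch (Python; the file names are hypothetical, and this is not the Research Object specification itself):

```python
import hashlib
import json
import zipfile
from pathlib import Path

# Stand-in files so the sketch runs end to end; in practice these are
# the real analysis script and data (names here are hypothetical).
Path("analysis.py").write_text("print('analysis')\n")
Path("input_data.csv").write_text("x,y\n1,2\n")
files = [Path("analysis.py"), Path("input_data.csv")]

def sha256(path: Path) -> str:
    # Content hash lets a reader verify the packaged files are exactly
    # the ones the published results came from.
    return hashlib.sha256(path.read_bytes()).hexdigest()

manifest = {"files": {p.name: sha256(p) for p in files}}

with zipfile.ZipFile("research_bundle.zip", "w") as zf:
    for p in files:
        zf.write(p)
    zf.writestr("manifest.json", json.dumps(manifest, indent=2))
```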

HOW?

sustained sites: Findable, Accessible, Intelligible, Reproducible

http://software-carpentry.org/

http://datacarpentry.org/

http://www.nature.com/sdata/

ELNs, Automation, Checklists, eLabs

Gathering scattered research components

Summary

• Replicable Science is hard work and poorly rewarded

• Reproducible Science => Transparent Science, but research ideally needs to be born that way

• Collective responsibility

Barend Mons, Sean Bechhofer, Philip Bourne, Matthew Gamble, Raul Palma, Jun Zhao, Alan Williams, Stian Soiland-Reyes, Paul Groth, Tim Clark, Juliana Freire, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza, Hylke Koers, Norman Morrison, Ian Fore, Jill Mesirov, Robert Stevens, Steve Pettifer

http://www.researchobject.org

http://www.wf4ever-project.org

http://www.fair-dom.org

http://www.software.ac.uk

Further Reading

• https://www.sciencenews.org/article/redoing-scientific-research-best-way-find-truth

• Drummond C, Replicability is not Reproducibility: Nor is it Good Science, online

• Peng RD, Reproducible Research in Computational Science, Science 2 Dec 2011: 1226-1227