Rob Davidson at the G3 Workshop: Open Source - Tools for Reproducibility
DESCRIPTION
Rob Davidson at the G3 (Great GigaScience & Galaxy) Workshop: Open Source - Tools for Reproducibility. University of Melbourne, 19th September 2014
TRANSCRIPT
Tools for:
Open-Source, Open-Data
Rob L Davidson
about.me/rob.davidson
www.slideshare.net/RobertDavidson6/g3-talk-rld2
The problem
reproducibility.cs.arizona.edu
• 515 papers (429 conference, 86 journal)
• <30% reproducible
The Cause
• Stodden 2010: 638 registrants at NIPS
  – 30% share code
  – 20% share data
http://web.stanford.edu/~vcs/papers/SMPRCS2010.pdf
Publishers must provide:
• Hosting
• Curating
• Citations for everything: data, tools + workflows
Tools for Reproducibility
• Data: GigaDB
• Images: OMERO
• Workflows
  – Galaxy
  – Executable Docs
  – VMs
GigaDB
github.com/gigascience/gigadb-cogini
Hosting all data
Hosting all research objects
Impact for research objects
• Host
• Curate
• Share
• Cite (DOI)
Even more accessible, transparent data?
Hosting image data with OMERO
Re-producing Images
Image LIMS
• Keeps metadata with the image
  – Means the image can be found later!
  – Image can be understood
• Also some processing options
http://www.openmicroscopy.org/site/products/omero
Accessible, transparent images
• Embed in web
• Full resolution
• View without special software
• Adjust contrast, etc.
• Link all images to the publication!
No cherry picking!
Cyber-Centipedes! Phenotyping
Accessible Cyber-Centipede images
OMERO: providing access to imaging data
View, filter and measure raw images via direct links from the journal article.
See all image data, not just cherry picked examples.
Download and reprocess.
OMERO: Adding value
The alternative...
...look but don't touch
Workflows: 1. Galaxy
galaxyproject.org
galaxy.cbiit.cuhk.edu.hk
Implement workflows in a community-accepted format
http://galaxyproject.org
Over 45,000 users of the main Galaxy server
Over 1,000 papers citing Galaxy use
Over 55 Galaxy servers deployed
Open source
[Screenshot of the Galaxy interface: tool list, tool parameterisation, results panel. Copyright NBAF-B 2013]
Implement workflows in an intuitive format
Visualising Workflows
Birmingham Metabo-Galaxy Workflow
Birmingham Metabo-Galaxy
• Tools wrapped in Python and XML
• User sees a web form (easy!)
• Data stored centrally (secure!)
• Work done centrally (easy to update)
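In Galaxy, a tool is typically an ordinary command-line program (here Python) plus an XML file that declares its inputs, outputs and command line. Galaxy builds the web form from that XML. Below is a minimal sketch of the Python side. It assumes a hypothetical tool, `count_lines.py`, that takes `--input` and `--output` paths. The tool and its arguments are illustrative only and are not taken from the Birmingham Metabo-Galaxy tools.

```python
"""count_lines.py: a hypothetical Galaxy-style tool script (illustrative only).

The tool's XML wrapper (not shown) would map the web-form fields onto
the --input and --output arguments defined below.
"""
import argparse


def count_lines(in_path: str, out_path: str) -> int:
    """Count non-empty lines in in_path and write the count as a one-line tabular file."""
    with open(in_path, encoding="utf-8") as handle:
        n = sum(1 for line in handle if line.strip())
    with open(out_path, "w", encoding="utf-8") as out:
        out.write("non_empty_lines\t{}\n".format(n))
    return n


def main(argv=None) -> int:
    """Parse command-line arguments (as Galaxy would pass them) and run the tool."""
    parser = argparse.ArgumentParser(description="Count non-empty lines in a text file.")
    parser.add_argument("--input", required=True, help="input text file (a Galaxy dataset)")
    parser.add_argument("--output", required=True, help="output tabular file")
    args = parser.parse_args(argv)
    count_lines(args.input, args.output)
    return 0

# As a standalone script, finish with:
#     if __name__ == "__main__": raise SystemExit(main())
```

Keeping the logic in a plain script means the same code can be tested outside Galaxy. The XML wrapper then only has to describe how form fields become command-line arguments.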
Hosting Workflows
1) Test data
2) Software files
3) Instructions
+ Galaxy implementation
Can we reproduce results? SOAPdenovo2 S. aureus pipeline
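One simple check on whether a re-run of a pipeline such as SOAPdenovo2 reproduces the published results is to compare checksums of the regenerated output files against checksums recorded from the original run. Below is a minimal sketch. It assumes the reference checksums are available as a dictionary mapping file name to SHA-256 hex digest; that layout is hypothetical. Byte-identical output is a strict test. Tools that embed timestamps or produce non-deterministic ordering can give equivalent results with different checksums, and such cases need a content-level comparison instead.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_outputs(output_dir: Path, expected: dict) -> dict:
    """Compare files in output_dir against expected {file name: sha256 hex digest}.

    Returns {file name: "match" | "differs" | "missing"} for every expected file.
    """
    report = {}
    for name, expected_hash in expected.items():
        path = output_dir / name
        if not path.is_file():
            report[name] = "missing"
        elif sha256_of(path) == expected_hash:
            report[name] = "match"
        else:
            report[name] = "differs"
    return report
```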
Galaxy
• Most accessible
• Easy to share (Galaxy Tool Shed)
• Quite a bit of work
• Doesn't include publication explanations
Workflows: 2. Executable Docs
Open lab books, dynamic documents
• Facilitate reuse and sharing with tools like Knitr, Sweave and IPython Notebook
Sweave
• Working towards executable papers…
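An executable document keeps the explanation, the code and (once run) the results together in one file. For IPython/Jupyter notebooks, that file is JSON in the nbformat version 4 layout. Below is a minimal sketch that builds such a notebook with only the standard library. The cell contents and the `write_notebook` helper are made up for illustration. In practice you would normally use the `nbformat` package or the notebook interface itself.

```python
import json


def markdown_cell(text: str) -> dict:
    """A narrative (Markdown) cell: the 'paper' part of the document."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(code: str) -> dict:
    """An unexecuted code cell: no outputs yet and no execution count."""
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": code,
    }


def make_notebook(cells: list) -> dict:
    """Wrap cells in a minimal nbformat 4.4 document (4.4 does not require cell ids)."""
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


def write_notebook(nb: dict, path: str) -> None:
    """Hypothetical helper: save the notebook, e.g. write_notebook(nb, "analysis.ipynb")."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(nb, handle, indent=1)


# Explanation and code side by side; the data values are illustrative.
nb = make_notebook([
    markdown_cell("## Figure 1\nMean read length per sample (illustrative data)."),
    code_cell("lengths = [100, 150, 125]\nprint(sum(lengths) / len(lengths))"),
])
```

Opening the saved file in Jupyter and running all cells would fill in the outputs. A reader then gets the text, the code and the results in one place, as in the Knitr testimonials.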
Some testimonials for Knitr
Authors (Wolfgang Huber): “I do all my projects in Knitr. Having the textual explanation, the associated code and the results all in one place really increases productivity, and helps explaining my analyses to colleagues, or even just to my future self.”
Reviewers (Christophe Pouzat): “It took me a couple of hours to get the data, the few custom developed routines, the “vignette” and to REPRODUCE EXACTLY the analysis presented in the manuscript. With few more hours, I was able to modify the authors’ code to change their Fig. 4. In addition to making the presented research trustworthy, the reproducible research paradigm definitely makes the reviewer’s job much more fun!”
Executable docs:
• Completely reproduce the paper!
• May require some coding skills
Workflow accessibility: VMs
Why VMs?
• OS settings
• Dependencies
  – Versions
  – e.g. Python!
• Data + code linked
• Download or run in the cloud
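A VM captures the whole system, including OS settings and dependency versions. Where a full VM is not available, a much lighter and much weaker complement is to record the interpreter, OS and package versions alongside the results. A reader then at least knows which versions the analysis ran with. Below is a minimal sketch that uses only the Python standard library (Python 3.8+ for `importlib.metadata`). The `environment.json` file name is an assumption. Unlike a VM, this only records versions; it cannot re-create the environment.

```python
import json
import platform
import sys
from importlib import metadata


def environment_snapshot() -> dict:
    """Collect OS, Python and installed-package versions into a JSON-serialisable dict."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs whose metadata lacks a name
            packages[name] = dist.version
    return {
        "os": platform.platform(),
        "machine": platform.machine(),
        "python_version": platform.python_version(),
        "python_executable": sys.executable,
        # Sort case-insensitively so snapshots from different runs diff cleanly.
        "packages": dict(sorted(packages.items(), key=lambda kv: kv[0].lower())),
    }


def write_snapshot(path: str = "environment.json") -> None:
    """Write the snapshot as JSON next to the analysis outputs (file name is an assumption)."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(environment_snapshot(), handle, indent=2)
```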
VMs in GigaDB
VMs:
• Can host Galaxy
• Can hold Knitr code
• Provide a 'snapshot' of a working system
Summary
• Share data in GigaDB
• Share all images in GigaDB
  – View images via OMERO
• Share code in GigaDB!
• Share pipelines using:
  – Executable docs!
  – Galaxy!
  – VMs!
Give us data, papers & pipelines*
Improve reproducibility!
Contact us:
* APCs currently generously covered by BGI until 2015
www.gigasciencejournal.com
Ruibang Luo (BGI/HKU), Shaoguang Liang (BGI-SZ), Tin-Lap Lee (CUHK), Qiong Luo (HKUST), Senghong Wang (HKUST), Yan Zhou (HKUST)
Thanks to:
@gigascience
facebook.com/GigaScience
blogs.biomedcentral.com/gigablog/
Peter Li, Huayan Gao, Chris Hunter, Jesse Si Zhe, Nicole Nogoy, Laurie Goodman, Amye Kenall (BMC)
Marco Roos (LUMC), Mark Thompson (LUMC), Jun Zhao (Lancaster), Susanna Sansone (Oxford), Philippe Rocca-Serra (Oxford), Alejandra Gonzalez-Beltran (Oxford)
www.gigadb.org
galaxy.cbiit.cuhk.edu.hk
CBIIT
Funding from:
Our collaborators:
Team:
Case study: