open science 2014
DESCRIPTION
Code as a Research Product: Open Source for Open Science. Given at the NIAID Bioinformatics Festival 2014.
TRANSCRIPT
![Page 1: Open science 2014](https://reader031.vdocument.in/reader031/viewer/2022020306/554e74adb4c90545698b4c3a/html5/thumbnails/1.jpg)
Code as a Research Product
Open Source for Open Science
Dan Gezelter @gezelter
OpenScience.org (also: The University of Notre Dame)
![Page 2: Open science 2014](https://reader031.vdocument.in/reader031/viewer/2022020306/554e74adb4c90545698b4c3a/html5/thumbnails/2.jpg)
Suppose your colleague sends you an email that says, “I’ve found something amazing. I don’t have time to tell you exactly what it is, or how I found it, but here’s proof that I discovered it:”
smaismrmilmepoetaleumibunenugttauiras
![Page 3: Open science 2014](https://reader031.vdocument.in/reader031/viewer/2022020306/554e74adb4c90545698b4c3a/html5/thumbnails/3.jpg)
On the 25th of July in 1610, Galileo discovered that Saturn was apparently situated between two smaller companions that always moved together. Wanting to establish his priority of discovery, he sent to Kepler (and others) the following anagram, which he informed them was a coded description of his latest discovery:
smaismrmilmepoetaleumibunenugttauiras
![Page 4: Open science 2014](https://reader031.vdocument.in/reader031/viewer/2022020306/554e74adb4c90545698b4c3a/html5/thumbnails/4.jpg)
Altissimum planetam tergeminum observavi
“I have observed the highest of the planets [Saturn] three-formed”
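The claim that the Latin sentence is a rearrangement of the cipher can be checked mechanically. The following is a minimal sketch, assuming (as in Latin spelling of the period) that u and v count as the same letter; the function name is illustrative only.

```python
from collections import Counter


def latin_letters(text: str) -> Counter:
    """Count letters, ignoring case, spaces and punctuation, and folding v into u."""
    folded = text.lower().replace("v", "u")
    return Counter(ch for ch in folded if ch.isalpha())


cipher = "smaismrmilmepoetaleumibunenugttauiras"
solution = "Altissimum planetam tergeminum observavi"

# True only if both strings use exactly the same multiset of letters.
print(latin_letters(cipher) == latin_letters(solution))
```

The check confirms priority after the fact, but the cipher itself tells a reader nothing about the observation or how it was made.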
![Page 5: Open science 2014](https://reader031.vdocument.in/reader031/viewer/2022020306/554e74adb4c90545698b4c3a/html5/thumbnails/5.jpg)
Sadly, this kind of scientific communication was common at the time. Newton, Huygens, Hooke, and Leonardo all used similar devices to hide their discoveries and methods from each other.
![Page 6: Open science 2014](https://reader031.vdocument.in/reader031/viewer/2022020306/554e74adb4c90545698b4c3a/html5/thumbnails/6.jpg)
In 1665, the Philosophical Transactions (one of the earliest scientific journals) was founded by Henry Oldenburg.
![Page 7: Open science 2014](https://reader031.vdocument.in/reader031/viewer/2022020306/554e74adb4c90545698b4c3a/html5/thumbnails/7.jpg)
The Royal Society is collaborating with JSTOR to digitize, preserve, and extend access to Philosophical Transactions (1665-1678).
www.jstor.org®
![Page 8: Open science 2014](https://reader031.vdocument.in/reader031/viewer/2022020306/554e74adb4c90545698b4c3a/html5/thumbnails/8.jpg)
![Page 9: Open science 2014](https://reader031.vdocument.in/reader031/viewer/2022020306/554e74adb4c90545698b4c3a/html5/thumbnails/9.jpg)
Although the importance of reproducibility is as old as scientific inquiry, the importance of sharing scientific methodology was adopted slowly.
![Page 10: Open science 2014](https://reader031.vdocument.in/reader031/viewer/2022020306/554e74adb4c90545698b4c3a/html5/thumbnails/10.jpg)
Over the next 200 years, publishing and sharing methodology became commonplace.
In August of 1867, the chemist William Crookes wrote an obituary for his mentor, Michael Faraday. He recounted Faraday’s advice to his students:
“The secret,” said he, “is comprised in three words — Work, Finish, Publish.”
![Page 11: Open science 2014](https://reader031.vdocument.in/reader031/viewer/2022020306/554e74adb4c90545698b4c3a/html5/thumbnails/11.jpg)
It must be confessed that young chemists of the present day follow this advice, carefully omitting the second word.
![Page 12: Open science 2014](https://reader031.vdocument.in/reader031/viewer/2022020306/554e74adb4c90545698b4c3a/html5/thumbnails/12.jpg)
Surely science has continued to evolve since 1867…
![Page 13: Open science 2014](https://reader031.vdocument.in/reader031/viewer/2022020306/554e74adb4c90545698b4c3a/html5/thumbnails/13.jpg)
Today, thousands of scientific papers report on computations that cannot be reproduced without access to secret software.
The inner workings of this secret software are hidden from skeptics and other researchers.
If you try to reproduce the capabilities of the secret software in another code, the entity that owns this software bans the university where you work.
![Page 14: Open science 2014](https://reader031.vdocument.in/reader031/viewer/2022020306/554e74adb4c90545698b4c3a/html5/thumbnails/14.jpg)
Think I’m exaggerating? BannedByGaussian.org
![Page 15: Open science 2014](https://reader031.vdocument.in/reader031/viewer/2022020306/554e74adb4c90545698b4c3a/html5/thumbnails/15.jpg)
Science has been Open since 1665. We just need to remind ourselves of this fact every few years…
![Page 16: Open science 2014](https://reader031.vdocument.in/reader031/viewer/2022020306/554e74adb4c90545698b4c3a/html5/thumbnails/16.jpg)
What is Open Science?
• Open Source
• Open Notebook
• Open Data
• Open Metadata
• Open Access
![Page 17: Open science 2014](https://reader031.vdocument.in/reader031/viewer/2022020306/554e74adb4c90545698b4c3a/html5/thumbnails/17.jpg)
Transparency in experimental methodology, observation, and collection of data.
![Page 18: Open science 2014](https://reader031.vdocument.in/reader031/viewer/2022020306/554e74adb4c90545698b4c3a/html5/thumbnails/18.jpg)
Public availability and re-use of scientific data.
![Page 19: Open science 2014](https://reader031.vdocument.in/reader031/viewer/2022020306/554e74adb4c90545698b4c3a/html5/thumbnails/19.jpg)
Public accessibility and transparency of scientific communication.
![Page 20: Open science 2014](https://reader031.vdocument.in/reader031/viewer/2022020306/554e74adb4c90545698b4c3a/html5/thumbnails/20.jpg)
What is Open Science?
Open Science is the idea that scientific knowledge of all kinds should be shared publicly as early as is practical in the discovery process.
![Page 21: Open science 2014](https://reader031.vdocument.in/reader031/viewer/2022020306/554e74adb4c90545698b4c3a/html5/thumbnails/21.jpg)
Reproducibility
Reproducibility of experiments is one of the foundations of science.
We expect universality from the results of empirical tests. Independent scientists should be able to subject theories to similar tests in different locations, on different equipment, and at different times, and get similar answers.
![Page 22: Open science 2014](https://reader031.vdocument.in/reader031/viewer/2022020306/554e74adb4c90545698b4c3a/html5/thumbnails/22.jpg)
Reproducible Computational Science
• For simple models and small data sets, calculations are reproducible in principle and in practice.
• As simulations become more complex and data sets become larger, calculations that are reproducible in principle are no longer reproducible in practice without access to the code, data, and meta-data.
• Reproducibility now requires public access to code, data, and meta-data.
![Page 23: Open science 2014](https://reader031.vdocument.in/reader031/viewer/2022020306/554e74adb4c90545698b4c3a/html5/thumbnails/23.jpg)
Reproducible Computational Science
Reports of numerical experiments should include:
1. All source code needed to reproduce the calculation
2. All input data used to perform the calculation
3. All meta-data required to allow other codes to use the input data
These are equivalent to the methodology section of an experimental paper. This standard requires Open Source, Open Data, and Open Meta-data for reproducible computational science.
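As one way to picture the three items above travelling together, here is a minimal sketch that bundles a code version, checksums of the input files, and free-form meta-data into a single record. The function names, the `rev-1972` label, and the file name are hypothetical; nothing here is prescribed by the talk.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def build_manifest(code_version: str, inputs: list[Path], metadata: dict) -> dict:
    """Bundle code version, input-data checksums and meta-data into one record."""
    return {
        "code_version": code_version,
        "inputs": {p.name: sha256_of(p) for p in inputs},
        "metadata": metadata,
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "input.dat"          # hypothetical input file
        data.write_text("0.0 0.0 0.0\n")
        manifest = build_manifest("rev-1972", [data], {"units": "angstrom"})
        print(json.dumps(manifest, indent=2))
```

A record like this, deposited next to the paper, lets a reader confirm they are running the same code on the same inputs before comparing results.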
![Page 24: Open science 2014](https://reader031.vdocument.in/reader031/viewer/2022020306/554e74adb4c90545698b4c3a/html5/thumbnails/24.jpg)
Reproducible Research Standard
1. Release media components (text, figures) under CC BY.
2. Release code components under MIT license or similar.
3. Attribution license on selection and arrangement of data.
4. Release data under CC0.
V. Stodden, “Enabling reproducible research: Licensing for scientific innovation,” International Journal of Communications Law & Policy 13, 1 (2009).
![Page 25: Open science 2014](https://reader031.vdocument.in/reader031/viewer/2022020306/554e74adb4c90545698b4c3a/html5/thumbnails/25.jpg)
Why aren’t all scientific programs open source?
![Page 26: Open science 2014](https://reader031.vdocument.in/reader031/viewer/2022020306/554e74adb4c90545698b4c3a/html5/thumbnails/26.jpg)
Two Open Source Science Codes

| | Jmol | OpenMD |
|---|---|---|
| Started | 1998 | 2004 |
| Purpose | Molecular Visualization | Molecular Dynamics |
| Languages | Java | C++, Python |
| Developers | 38 | 17 (graduate students) |
| Lead Developers | 7 | 1 |
| Code base | 472,956 lines | 92,308 lines |
| Person-Years | 125 | 23 |
| Estimated Development Costs | $6,848,949 | $629,389 |
| Explicitly-funded Costs | $0 | $0 |
| Downloads | Over 831,656 at SourceForge alone (possibly millions more) | 5,472 |
| External Citations | 221 | 21 |
| Citations to lead developers | 157 | 21 |
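The person-year and cost figures are model estimates rather than payroll records (the Sources slide credits ohloh.net). A rough sketch of how such numbers can arise, assuming the basic COCOMO "organic" model, in which effort in person-months is 2.4 × KLOC^1.05. The slides do not state which model or salary was used, so both the model choice and the `salary_per_year` parameter are assumptions.

```python
def cocomo_person_years(lines_of_code: int) -> float:
    """Basic COCOMO, organic mode: person-months = 2.4 * KLOC**1.05."""
    kloc = lines_of_code / 1000.0
    person_months = 2.4 * kloc ** 1.05
    return person_months / 12.0


def estimated_cost(lines_of_code: int, salary_per_year: float) -> float:
    """Cost = estimated person-years times an assumed average annual salary."""
    return cocomo_person_years(lines_of_code) * salary_per_year


for name, loc in [("Jmol", 472_956), ("OpenMD", 92_308)]:
    print(f"{name}: about {cocomo_person_years(loc):.0f} person-years")
```

Under these assumptions the estimates come out close to the 125 and 23 person-years quoted for the two codes, which illustrates that the dollar figures depend heavily on the salary assumed per person-year.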
![Page 27: Open science 2014](https://reader031.vdocument.in/reader031/viewer/2022020306/554e74adb4c90545698b4c3a/html5/thumbnails/27.jpg)
Comparative History
[Diagram: two development histories. Labels on the first: Post-doc, Pharma researcher, Graphics guru, Informatics Post-doc, Graduate Student, Academic. Labels on the second: Grad Student 1, Grad Student 2, Grad Student 3, Advisor, Group Code, Other Groups, Code Re-use.]
![Page 28: Open science 2014](https://reader031.vdocument.in/reader031/viewer/2022020306/554e74adb4c90545698b4c3a/html5/thumbnails/28.jpg)
Jmol is a useful tool
• Filled a void created by the death of a closed-source tool.
• Developed by a series of project leads and their geographically-distributed teams. The lead developers hand off the code when they become too busy.
• Application focus changed dramatically over 16 years.
• External users of the code tend to run the application rather than re-use algorithms.
• Jmol became the standard tool for embedding chemical structures in web pages:
  • RCSB Protein Data Bank (PDB)
  • Inorganic Crystal Structure Database
  • Viewer for Folding @ Home projects
  • Can be directly included in Sakai, Moodle, and WebAssign sites
![Page 29: Open science 2014](https://reader031.vdocument.in/reader031/viewer/2022020306/554e74adb4c90545698b4c3a/html5/thumbnails/29.jpg)
Jmol is part of other scientific tools
• Bioclipse (integrated environment for biomolecule investigation)
• CaGe
• ChemPad (3D models calculated on-the-fly from a formula sketched by hand on a tablet PC)
• iBabel (a GUI for Open Babel)
• Janocchio (calculates NMR coupling constants and NOEs)
• Molecular Workbench
• PFAAT (Protein Family Alignment Annotation Tool)
• ProteinGlimpse
• Spice
• STING Millennium
• STRAP
• Taverna
![Page 30: Open science 2014](https://reader031.vdocument.in/reader031/viewer/2022020306/554e74adb4c90545698b4c3a/html5/thumbnails/30.jpg)
Jmol helps disseminate data
Jmol provides structure visualization for:
• ACS Chemical Biology
• Biochemical Journal
• Chemical Reviews (ACS)
• Crystallography Journals Online (IUCr)
• Molecular BioSystems (Royal Soc. Chem.)
• Nature Chemical Biology
• Nature Structural & Molecular Biology
• Inorganic Chemistry (ACS)
• JACS
• Journal of Chemical Education
• Journal of Molecular Biology
• Journal of Natural Products
• Organic Letters
![Page 31: Open science 2014](https://reader031.vdocument.in/reader031/viewer/2022020306/554e74adb4c90545698b4c3a/html5/thumbnails/31.jpg)
![Page 32: Open science 2014](https://reader031.vdocument.in/reader031/viewer/2022020306/554e74adb4c90545698b4c3a/html5/thumbnails/32.jpg)
OpenMD
• Merged student codes that carried out similar tasks.
• Development was done within one research group and was piggy-backed on other funded projects.
• A journal article outlines the code’s capabilities, and attribution is requested in the license.
• Application development preserved group memory.
• External users of the code tend to re-use algorithms rather than run the application.
![Page 33: Open science 2014](https://reader031.vdocument.in/reader031/viewer/2022020306/554e74adb4c90545698b4c3a/html5/thumbnails/33.jpg)
OpenMD
• Tight coupling of data & meta-data
• Code versioning information stored in generated data
• Reproduce simulations easily, but reproducibility is not the same as replicability.
• In parallel architectures, replicability may not be possible.
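One familiar reason a parallel run may not replicate bit for bit (an assumption here; the slide does not name a cause) is that floating-point addition is not associative, and a parallel reduction can combine partial sums in a different order from run to run. A minimal sketch with made-up values:

```python
import math


def naive_sum(xs):
    """Add left to right, the way a simple serial loop would."""
    total = 0.0
    for x in xs:
        total += x
    return total


# Same four numbers, two different summation orders.
values = [1e16, 1.0, -1e16, 1.0]

print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False: grouping changes the result
print(naive_sum(values))                        # 1.0
print(naive_sum(sorted(values)))                # 0.0
print(math.fsum(values))                        # 2.0, the correctly rounded sum
```

Two runs that differ only in the order of such additions can both be valid reproductions of the same simulation without being bitwise replicas of each other.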
<OpenMD version=2>
  <MetaData>
molecule{
  name = "D";
  atom[0]{
    type = "D";
    position(0.0, 0.0, 0.0);
    orientation(0.0, 0.0, 0.0);
  }
}
component{
  type = "D";
  nMol = 3456;
}
ensemble = NVE;
forceField = "Multipole";
cutoffMethod = "shifted_force";
electrostaticScreeningMethod = "damped";
cutoffRadius = 12.0;
dampingAlpha = 0.14;
dt = 1.0;
runTime = 1e3;
sampleTime = 10.0;
statusTime = 1.0;
seed = 8675309;
## Last run using OpenMD Version: 2.1 Revision: 1972
  </MetaData>
  <Snapshot>
…
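Because the version line and the seed sit inside the data file itself, a reader can recover the provenance of a run from its output alone. The sketch below pulls those fields out of an abbreviated copy of the sample with regular expressions. It is a text scrape for illustration, not OpenMD's own parser, and the `provenance` function name is hypothetical.

```python
import re

# Abbreviated copy of the sample shown on the slide.
SAMPLE = """\
<OpenMD version=2>
  <MetaData>
ensemble = NVE;
seed = 8675309;
## Last run using OpenMD Version: 2.1 Revision: 1972
  </MetaData>
"""


def provenance(text: str) -> dict:
    """Extract the recorded code version, revision and random seed, if present."""
    run = re.search(r"Last run using OpenMD Version:\s*(\S+)\s+Revision:\s*(\d+)", text)
    seed = re.search(r"\bseed\s*=\s*(\d+)\s*;", text)
    return {
        "version": run.group(1) if run else None,
        "revision": int(run.group(2)) if run else None,
        "seed": int(seed.group(1)) if seed else None,
    }


print(provenance(SAMPLE))
```

Knowing the exact revision and seed is what makes it practical to rerun the same calculation later or to explain why a newer version gives a different answer.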
![Page 34: Open science 2014](https://reader031.vdocument.in/reader031/viewer/2022020306/554e74adb4c90545698b4c3a/html5/thumbnails/34.jpg)
Why aren’t all scientific programs open source?
![Page 35: Open science 2014](https://reader031.vdocument.in/reader031/viewer/2022020306/554e74adb4c90545698b4c3a/html5/thumbnails/35.jpg)
Every discussion in science ends up in a discussion on tenure and grants.
![Page 36: Open science 2014](https://reader031.vdocument.in/reader031/viewer/2022020306/554e74adb4c90545698b4c3a/html5/thumbnails/36.jpg)
Once a year, academic scientists do this:
6 – College of Science 2013 Accomplishments – Department of Chemistry and Biochemistry
My graduate students have done excellent research that is well respected in my community, and after their degrees were awarded, they have all gone on to positions in which they can contribute to science and to society in meaningful ways. We are deeply engaged in the university’s goal to become a pre-eminent research university. This past year, our work in the OpenScience movement was recognized nationally and internationally at the White House Champions of Change event. That recognition contributed directly to communication with the external constituents of the university (Goal 5).
Generate Personal “Citation Report”
1. Go to ISI Web of Knowledge: http://www.webofknowledge.com
2. Select the Web of Science tab at the top
3. In 2nd box type in:
  • Last name and first initial (no commas). Use all variations, i.e. Doe J OR Doe JR
  • Make sure that Author appears in the small box on the right
4. Under Current Limits at the bottom of the page, select:
  • All Years (normally the default)
  • Click on Search above
5. Under Refine Results in the column on the left side of the page, you may be able to perform refinements that will help to limit the list to only your citations.
6. Click on Refine button
7. Click on Create Citation Report (graph icon/upper right)
8. After generating the citation report, remove any citations that are not yours by checking the left-hand box on that citation
9. Rerun Citation Report
10. Copy and paste both green plots into the space below (highlight both or right-click and copy each one).
11. Please also copy and paste your citation metrics (h-index, citations etc.), which appear to the right of the plots.
You are done! (Don’t forget to Log Out)
* If you need help in generating these citation plots, please contact Thurston Miller [email protected] - He will be happy to assist you! *
Copy and paste Citation Report bar graphs (2) into the space below (please submit as PC viewable graphs, which QuickTime often are not). Please also copy and paste your citation metrics in this space.
![Page 37: Open science 2014](https://reader031.vdocument.in/reader031/viewer/2022020306/554e74adb4c90545698b4c3a/html5/thumbnails/37.jpg)
Recognition & Attribution
• Scientists stay alive professionally by publishing
  • Paper count
  • Citation count
  • h-index
• Time spent on open science projects reduces publication rates
• Scientific software tools are often not cited
• Even if they were cited, how would that citation get tied to a researcher?
• How can a scientist show her institution the value of her project?
Attribution metrics should (but don’t) take into account:
1. Effort to maintain a useful resource
2. Importance to the scientific community
3. Externalities beyond the scientific community
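For readers unfamiliar with the h-index named above, here is a minimal sketch of the standard definition (the largest h such that h papers have at least h citations each). The citation counts are made up for illustration.

```python
def h_index(citations: list[int]) -> int:
    """Largest h such that h papers have at least h citations each."""
    h = 0
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


print(h_index([10, 8, 5, 4, 3]))  # 4
```

Only per-paper citation counts enter the calculation, so downloads, maintenance effort, and uncited use of a software tool leave the number unchanged.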
![Page 38: Open science 2014](https://reader031.vdocument.in/reader031/viewer/2022020306/554e74adb4c90545698b4c3a/html5/thumbnails/38.jpg)
Recognition & Attribution
Until recently, there was no way to measure open products of research (outside of traditional publications) with a simple metric that can be used by institutions.
This is starting to change: ImpactStory, fidgit DOI lookups...
What we need is institutional recognition of alternative metrics.
![Page 39: Open science 2014](https://reader031.vdocument.in/reader031/viewer/2022020306/554e74adb4c90545698b4c3a/html5/thumbnails/39.jpg)
![Page 40: Open science 2014](https://reader031.vdocument.in/reader031/viewer/2022020306/554e74adb4c90545698b4c3a/html5/thumbnails/40.jpg)
And a pony
![Page 41: Open science 2014](https://reader031.vdocument.in/reader031/viewer/2022020306/554e74adb4c90545698b4c3a/html5/thumbnails/41.jpg)
There is no drive to make these changes in the academic world.
There’s no drive for this in the for-profit or society journals. The new PLOS One sharing policy is a refreshing change.
The funding agencies may be our best hope for recognizing code & data as primary research products.
![Page 42: Open science 2014](https://reader031.vdocument.in/reader031/viewer/2022020306/554e74adb4c90545698b4c3a/html5/thumbnails/42.jpg)
Sustainability
Academic scientists know almost nothing about good coding practices:
• Source version control systems (cvs, svn, git, Hg)
• Agile (or any other) development models
• Design patterns
• Object-oriented languages
• Strong typing
• Public source repositories (SourceForge, GitHub)
• Differences among open source licenses
• Modern build & testing systems
• Unit testing
• Bug & issue tracking
• Designing for usability and usability testing
• UI design
• Error handling
• Introductory user documentation
Because they aren’t forced into good practices, scientists often create code that is impossible to maintain effectively. This does not lead to sustainable open science.
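Two of the practices in the list, unit testing and error handling, cost very little to adopt. A minimal sketch using Python's standard `unittest` module; the `kinetic_energy` function is a made-up stand-in for any small piece of scientific code.

```python
import unittest


def kinetic_energy(mass: float, speed: float) -> float:
    """Return 0.5 * m * v**2, rejecting unphysical input instead of silently continuing."""
    if mass < 0:
        raise ValueError("mass must be non-negative")
    return 0.5 * mass * speed ** 2


class KineticEnergyTest(unittest.TestCase):
    def test_known_value(self):
        # 0.5 * 2.0 * 3.0**2 = 9.0
        self.assertAlmostEqual(kinetic_energy(2.0, 3.0), 9.0)

    def test_rejects_negative_mass(self):
        with self.assertRaises(ValueError):
            kinetic_energy(-1.0, 1.0)
```

Saved as a file, the tests can be run with `python -m unittest`, and a later change that breaks either behavior is caught before it reaches a publication.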
![Page 43: Open science 2014](https://reader031.vdocument.in/reader031/viewer/2022020306/554e74adb4c90545698b4c3a/html5/thumbnails/43.jpg)
Sustainability
Why not employ professional programmers to do scientific coding?
![Page 44: Open science 2014](https://reader031.vdocument.in/reader031/viewer/2022020306/554e74adb4c90545698b4c3a/html5/thumbnails/44.jpg)
Computer scientists often know little about the domain sciences
• It is significantly faster for me to train a computational chemist in good coding practices than it is to train even an accomplished programmer in the various disciplines we use.
![Page 45: Open science 2014](https://reader031.vdocument.in/reader031/viewer/2022020306/554e74adb4c90545698b4c3a/html5/thumbnails/45.jpg)
Sustainability
Resources are also necessary for sustainable Open Science.
NIH, DOE & DARPA fund specific kinds of science. There is little room for projects that enrich the overall scientific enterprise but don’t constitute novel research themselves. Tools are rarely funded.
Funding agencies should require delivery of primary research products:
• code in a public repository
• data in a public repository
• depositions as a part of the reporting structure for funded grants
![Page 46: Open science 2014](https://reader031.vdocument.in/reader031/viewer/2022020306/554e74adb4c90545698b4c3a/html5/thumbnails/46.jpg)
Outlook
Open source is essential for reproducible, open science.
There are no easy solutions to problems of Recognition, Attribution, and Sustainability.
That doesn’t mean we get to step away from open source. To do so would be to go back to 1611:
Haec immatura a me jam frustra leguntur oy
![Page 47: Open science 2014](https://reader031.vdocument.in/reader031/viewer/2022020306/554e74adb4c90545698b4c3a/html5/thumbnails/47.jpg)
Cynthiae figuras aemulatur mater amorum
“The mother of love imitates the shapes of Cynthia”
(Venus imitates the phases of the moon)
![Page 48: Open science 2014](https://reader031.vdocument.in/reader031/viewer/2022020306/554e74adb4c90545698b4c3a/html5/thumbnails/48.jpg)
Acknowledgments
The Alfred P. Sloan Foundation: startup funding for the Open Science Project
Brian Glanz & the Open Science Federation: supporters and friends of Open Science
Michael Nielsen, author of Reinventing Discovery: for making us aware of the Galileo & Faraday stories
Victoria Stodden, Columbia University: developer of the Reproducible Research Standard
The National Science Foundation: OpenMD was indirectly supported under grant CHE-0848243
![Page 49: Open science 2014](https://reader031.vdocument.in/reader031/viewer/2022020306/554e74adb4c90545698b4c3a/html5/thumbnails/49.jpg)
Sources
• E. A. Partridge and H. C. Whitaker, “Galileo’s Work on Saturn’s Rings - A Historical Correction,” Popular Astronomy 3, 408-414 (1896).
• Henry Oldenburg, “The Introduction,” Philosophical Transactions 1, 1-3 (1665) doi:10.1098/rstl.1665.0002
• William Crookes, “Faraday,” The Chemical News XVI(404), 110-111 (1867)
• C. Hempel, Philosophy of Natural Science 49 (1966).
• The distinction between verifiable in principle and verifiable in practice was originally made in: A. J. Ayer, Language, Truth and Logic, (New York: Dover, 1946) p. 32.
• E. Sober Philosophy of Biology (Boulder: Westview Press, 2000), pp. 50-51.
• J. Lett, Science, Reason and Anthropology, The Principles of Rational Inquiry (Oxford: Rowman & Littlefield, 1997), p. 47
• The Yale Law School Roundtable on Data and Code Sharing, “Reproducible Research: Addressing the Need for Data and Code Sharing in Computational Science,” Computing in Science & Engineering 12(5), 8-12 (2010) doi: 10.1109/MCSE.2010.113
• V. Stodden, “The Legal Framework for Reproducible Scientific Research: Licensing and Copyright,” Computing in Science & Engineering 11(1), 35-40 (2009) doi: 10.1109/MCSE.2009.19
• V. Stodden, “Enabling reproducible research: Licensing for scientific innovation,” International Journal of Communications Law & Policy 13, 1 (2009).
• Jmol is available at jmol.sf.net
• OpenMD is available at openmd.org
• Source code analysis and cost estimates were done at ohloh.net , reference counts are from webofknowledge.com