2014-10-10-sbc361-reproducible research

Upload: yannick-wurm

Post on 02-Jul-2015

Category:

Science


DESCRIPTION

SBC361 Reproducible research & Sustainable Software Best practices

TRANSCRIPT

  • 1. Reproducible Research & Sustainable Software. SBC361. @yannick__ http://yannick.poulet.org

  • 2. Programming in R?

  • 3. Biodiversity Research Fieldwork Worldwide. Find out more at: BR 3.01, Bancroft Road Building. Wednesday 19th November, 1pm. Queen Mary University

  • 4. This changes everything. 454, Illumina, Solid... Any lab can sequence anything!

  • 5. We need great tools.

  • 6. Reproducible Research & Sustainable Software: 1. Why care? 2. Key concepts 3. Approaches, Technologies & Resources

  • 7. Reproducible Research & Sustainable Software: Avoid costly mistakes. Be faster: stand on the shoulders of giants. Increase impact / visibility.

  • 8. Reproducible Research & Sustainable Software: 1. Why care? 2. Key concepts 3. Approaches, Technologies & Resources

  • 9. Best Practices for Scientific Computing. arXiv:1210.0530v3 [cs.MS] 29 Nov 2012

    Greg Wilson, D.A. Aruliah, C. Titus Brown, Neil P. Chue Hong, Matt Davis, Richard T. Guy, Steven H.D. Haddock, Katy Huff, Ian M. Mitchell, Mark D. Plumbley, Ben Waugh, Ethan P. White, Paul Wilson

    Software Carpentry; University of Ontario Institute of Technology; [...] State University; Software Sustainability Institute; Space Telescope [...]; University of Toronto; Monterey Bay Aquarium Research Institute; University of Wisconsin; University of British Columbia; Queen Mary University of London; University College London; [...] University; University of Wisconsin

    Scientists spend an increasing amount of time building and using software. However, most scientists are never taught how to do this efficiently. As a result, many are unaware of tools and practices that would allow them to write more reliable and maintainable code with less effort. We describe a set of best practices for scientific software development that have solid foundations in research and experience, and that improve scientists' productivity and the reliability of their software.

    Software is as important to modern scientific research as telescopes and test tubes. From groups that work exclusively on computational problems, to traditional laboratory and field scientists, more and more of the daily operation of science revolves around computers. This includes the development of new algorithms, managing and analyzing the large amounts of data that are generated in single research projects, and combining disparate datasets to assess synthetic problems.

    Scientists typically develop their own software for these purposes because doing so requires substantial domain-specific [...] and open source software development [61, [...] studies of scientific computing [4, 31, [...] development in general (summarized in [...] practices will guarantee efficient, error-free [...] but used in concert they will reduce [...] errors in scientific software, make it easier [...] the authors of the software time and effort [...] focusing on the underlying scientific questions.

    1. Write programs for people, not computers. Scientists writing software need to write [...] correctly and can be easily read and [...] programmers (especially the author's future [...] cannot be easily read and understood it is [...] to know that it is actually doing what it is [...] be productive, software developers must therefore [...] aspects of human cognition into account: human working memory is limited, human [...]

    1. Write programs for people, not computers.
    2. Automate repetitive tasks.
    3. Use the computer to record history.
    4. Make incremental changes.
    5. Use version control.
    6. Don't repeat yourself (or others).
    7. Plan for mistakes.
    8. Optimize software only after it works correctly.
    9. Document the design and purpose of code rather than its mechanics.
    10. Conduct code reviews.

  • 10. Reproducible Research & Sustainable Software: 1. Why care? 2. Key concepts 3. Approaches, Technologies & Resources

  • 11. Coding for people: Indent your code! [...] understand and improve your code in 6 [...] approximate [...] characters (Damian Conway) http://github.com/

  • 12. Code for people: Use a style guide. For R: http://r-pkgs.had.co.nz/style.html

  • 13. R style guide extract

  • 14. R style guide extract. Line length: Strive to limit your code to 80 characters per line. This fits comfortably on a printed page with a reasonably sized font. If you find yourself running out of room, this is a good indication that you should encapsulate some of the work in a separate function. ant_measurements