![Page 1: You Got Your Engineering in my Data Science - Addressing the Reproducibility Crisis with Software Engineering](https://reader031.vdocument.in/reader031/viewer/2022030303/587b252c1a28ab736c8b76e1/html5/thumbnails/1.jpg)
YOU GOT YOUR ENGINEERING IN MY DATA SCIENCE
ADDRESSING THE REPRODUCIBILITY CRISIS WITH SOFTWARE ENGINEERING
1
WE SEE PATTERNS
2
SCIENCE USED TO BE A SOLO OPERATION…
3
THE OVERALL HIGGS ANALYSIS WAS PERFORMED BY A TEAM OF MORE THAN 600 PHYSICISTS.
“Who Really Found the Higgs Boson” - Neal Hartman, Nautilus, Issue 18
…BUT NOW IT’S NOT
4
DATA SCIENCE IMPROVES EVERYTHING
5-1
5-2
5-3
Clinical recommendations discouraging the use of CYP2D6 gene testing to guide tamoxifen therapy in breast cancer patients are based on studies with flawed methodology and should be reconsidered, according to the results of a Mayo Clinic study published in the Journal of the National Cancer Institute.
Joe Dangor, Mayo Clinic News Network, December 9, 2014
5-4
SEARCHING FOR PATTERNS
6
7
8
PROBLEMS WITH ANALYSIS TOOLS
FALSE POSITIVES IN FMRI RESEARCH
▸ After crunching the numbers, “we think that around 3,000 studies could be affected,” says Dr Eklund. But without revisiting each and every study, it is impossible to know which those 3,000 are.
9-2
PROBLEMS WITH PROCESS
PSYCHOLOGICAL RESEARCH
▸ “Estimating the reproducibility of psychological science”
▸ Brian Nosek, Science, August 2015
▸ 270 co-authors tried to reproduce 100 studies
▸ 36% could be reproduced
10-5
PROBLEMS WITH PROCESS
PSYCHOLOGICAL RESEARCH
“Nosek said there were three possible reasons for his results: that the original effect could have been false positive, that the replication was a false negative, or that both the original and replication results are accurate but that each experiment’s methodology differed in significant ways.”
- Colleen Flaherty, Inside Higher Ed, August 2015
11
PROBLEMS WITH DATA
11% OF STUDIES REPRODUCIBLE
12-2
PROBLEMS WITH DATA
“For results that could not be reproduced, however, data were not routinely analyzed by investigators blinded to the experimental versus control groups. Investigators frequently presented the results of one experiment, such as a single Western-blot analysis. They sometimes said they presented specific experiments that supported their underlying hypothesis, but that were not reflective of the entire data set. There are no guidelines that require all data sets to be reported in a paper; often, original data are removed during the peer review and publication process.”
- C. Glenn Begley
13
IT CAN BE PROVEN THAT MOST CLAIMED RESEARCH FINDINGS ARE FALSE.
- John Ioannidis
14
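The claim above can be made concrete with the positive predictive value (PPV) formula from the cited PLoS Medicine essay, in its simplest form with no bias term: PPV = (1 - β)R / (R - βR + α), where R is the pre-study odds that a tested relationship is true, α the false positive rate, and 1 - β the power. A minimal sketch; the function name `ppv` and the example numbers are illustrative assumptions, not figures from the talk:

```python
def ppv(prior_odds: float, alpha: float = 0.05, power: float = 0.8) -> float:
    """Probability that a claimed (statistically significant) finding is true.

    prior_odds: R, the ratio of true to false relationships among those tested.
    alpha: Type I error rate. power: 1 - beta.
    """
    beta = 1.0 - power
    return (power * prior_odds) / (prior_odds - beta * prior_odds + alpha)


# Illustrative scenarios (assumed values):
well_powered = ppv(prior_odds=1.0, power=0.8)   # 1:1 odds, good power
exploratory = ppv(prior_odds=0.1, power=0.2)    # 1:10 odds, low power
```

Under these assumptions a finding is more likely true than false only when (1 - β)R exceeds α, which is why low-powered searches across many unlikely hypotheses tend to produce mostly false claims.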
THE REPRODUCIBILITY CRISIS
15
16
IT WORKS ON MY MACHINE
Every Single Software Developer Ever
REPRODUCIBILITY IN SOFTWARE ENGINEERING
17
VERSION YOUR CODE AND DATA
VERSION CONTROL
18
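Code fits naturally in a version control system; large data files often do not. One common workaround is to commit a small manifest of content hashes next to the code, so anyone rerunning the analysis can confirm they have the same bytes. A minimal sketch, assuming the data sits in a local directory; `build_manifest` and `verify_manifest` are hypothetical helper names, not tools named in the talk:

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict:
    """Map each file under data_dir (POSIX-style relative path) to its hash."""
    return {
        p.relative_to(data_dir).as_posix(): file_sha256(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }


def verify_manifest(data_dir: Path, manifest: dict) -> list:
    """Return relative paths that were added, removed, or changed."""
    current = build_manifest(data_dir)
    return sorted(
        name
        for name in set(manifest) | set(current)
        if manifest.get(name) != current.get(name)
    )
```

The manifest is a plain dictionary, so it can be written out as JSON and committed with the analysis code; an empty list from `verify_manifest` means the data matches what was versioned.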
USE A BUILD SCRIPT
19
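A build script, in the data science sense, runs every step from raw data to final numbers in one fixed order, so nothing depends on what someone remembered to click or rerun. A minimal sketch, assuming a three-step pipeline; the step names and the seeded synthetic data are placeholders, not the speaker's pipeline:

```python
import random
import statistics


def load_data(seed: int, n: int = 100) -> list:
    """Stand-in for reading raw data: seeded synthetic samples from N(0, 1)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def clean(values: list) -> list:
    """Drop values more than 3 units from zero (an assumed outlier rule)."""
    return [v for v in values if abs(v) <= 3.0]


def analyze(values: list) -> dict:
    """Summarize the cleaned data."""
    return {
        "n": len(values),
        "mean": statistics.fmean(values),
        "stdev": statistics.stdev(values),
    }


def build(seed: int = 42) -> dict:
    """Run every step in order; the seed makes the run repeatable."""
    return analyze(clean(load_data(seed)))


if __name__ == "__main__":
    print(build())
```

Because the seed is an explicit parameter rather than hidden state, two people running `build()` with the same seed should get identical summaries.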
REVIEW YOUR CODE
20
21
DEFINE STANDARD FORMATS
22
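A standard format is only useful if records are checked against it. A minimal sketch of that check using only the standard library; the schema, field names, and the choice of ISO 8601 dates are assumptions for illustration, not a format proposed in the talk:

```python
from datetime import date

# Hypothetical schema: field name -> (expected type, required?)
SCHEMA = {
    "subject_id": (str, True),
    "visit_date": (str, True),   # ISO 8601 date, e.g. "2016-10-01"
    "dose_mg": (float, False),
}


def validate_record(record: dict) -> list:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for field, (expected_type, required) in SCHEMA.items():
        if field not in record:
            if required:
                problems.append(f"missing required field: {field}")
            continue
        if not isinstance(record[field], expected_type):
            problems.append(f"{field}: expected {expected_type.__name__}")
    for field in record:
        if field not in SCHEMA:
            problems.append(f"unexpected field: {field}")
    if isinstance(record.get("visit_date"), str):
        try:
            date.fromisoformat(record["visit_date"])
        except ValueError:
            problems.append("visit_date: not an ISO 8601 date")
    return problems
```

Returning a list of problems instead of raising on the first one lets a data provider fix everything in a single pass.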
FUZZING
23
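Fuzzing means feeding a program large amounts of random or malformed input and watching for failures it was never designed to handle. A minimal sketch using only the standard library; `parse_measurement` is a hypothetical parser with a deliberate bug, and `fuzz` is an illustrative harness, not a real fuzzing tool:

```python
import random
import string


def parse_measurement(text: str) -> float:
    """Hypothetical parser under test: '12.5 mg' -> 12.5.

    Deliberate bug: empty or whitespace-only input raises IndexError
    instead of the documented ValueError.
    """
    parts = text.split()
    return float(parts[0])


def fuzz(target, runs: int = 1000, seed: int = 0) -> list:
    """Call target on random strings; collect inputs that raise anything
    other than ValueError, which is treated as the documented rejection."""
    rng = random.Random(seed)
    alphabet = string.digits + string.ascii_letters + " .-+\t"
    findings = []
    for _ in range(runs):
        length = rng.randint(0, 12)
        text = "".join(rng.choice(alphabet) for _ in range(length))
        try:
            target(text)
        except ValueError:
            pass  # expected way to reject bad input
        except Exception as exc:  # anything else is a finding
            findings.append((text, type(exc).__name__))
    return findings
```

Calling `fuzz(parse_measurement)` should surface the blank-input cases. The fixed seed keeps the harness itself reproducible, so a finding can be replayed by anyone.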
USE IT. RELEASE IT.
OPEN SOURCE
24
TAKE ADVANTAGE OF MODERN TECHNOLOGY
25
CREATING INTERACTIVE PUBLICATIONS
“Truly Interactive Science Publishing was shown to have enough educational value that readers were willing to invest in the needed set–up and learning phases. Problems encountered in network and computer speed can now be minimized by running the ISP software in a cloud computing environment which will minimize the dependence on local computer and network speeds. The social aspects of data sharing and the enlarged review process may be the hardest obstacles to overcome.”
- Dr. Michael Ackerman
26
PUTTING IT ALL TOGETHER
BEST PRACTICES FOR SOFTWARE ENGINEERING AND DATA SCIENCE
▸ Version
▸ Provide a build script
▸ Review
▸ Run automated positive and negative tests
▸ Stick to standards
▸ Use open source when you can
▸ Open source when you can
▸ Take advantage of technology
27-9
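For the "positive and negative tests" item in the list above: a positive test checks that valid input gives the right answer, and a negative test checks that invalid input is rejected instead of silently producing a number. A minimal sketch with the standard `unittest` module; `response_rate` is a hypothetical analysis helper invented for this example:

```python
import math
import unittest


def response_rate(responders: int, total: int) -> float:
    """Hypothetical analysis helper: fraction of subjects who responded."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= responders <= total:
        raise ValueError("responders must be between 0 and total")
    return responders / total


class ResponseRateTests(unittest.TestCase):
    def test_positive_typical_input(self):
        # Positive test: valid input yields the expected value.
        self.assertTrue(math.isclose(response_rate(36, 100), 0.36))

    def test_negative_zero_total(self):
        # Negative test: an empty cohort must be rejected.
        with self.assertRaises(ValueError):
            response_rate(0, 0)

    def test_negative_more_responders_than_subjects(self):
        # Negative test: impossible counts must be rejected.
        with self.assertRaises(ValueError):
            response_rate(5, 3)
```

Saved in a file, these can be run with `python -m unittest` on every change, for example from the build script.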
THERE IS NO SILVER BULLET
28
THANKS TO
▸ Andrew Schechtman-Rook
▸ Jacqueline Kazil
▸ Jeanie Drury
29
Image and Content Credits:
2. http://www.telescope.com/assets/images/starcharts/2016-10-starchart_col.png
3. https://xkcd.com/1584/
4. http://nautil.us/issue/18/genius/who-really-found-the-higgs-boson
5. https://news.virginia.edu/content/capital-one-cio-talks-big-data-innovation-ahead-tonight-s-information-session, http://newsnetwork.mayoclinic.org/discussion/mayo-clinic-genotyping-errors-plague-cyp2d6-testing-for-tamoxifen-therapy/, https://www.google.com/patents/US8615473, https://www.bloomberg.com/news/articles/2016-09-20/microsoft-develops-ai-to-help-cancer-doctors-find-the-right-treatments
6. By Lokilech - Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=1804667
7. http://news.stanford.edu/news/2012/september/austen-reading-fmri-090712.html
8. http://www.popsci.com/science/article/2010-05/hollywood-science-how-your-brain-reacts-horror-movies
9. http://www.economist.com/news/science-and-technology/21702166-two-studies-one-neuroscience-and-one-palaeoclimatology-cast-doubt
11. https://www.insidehighered.com/news/2015/08/28/landmark-study-suggests-most-psychology-studies-dont-yield-reproducible-results
12. http://www.nature.com/nature/journal/v483/n7391/full/483531a.html
14. http://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.0020124
15. http://xkcd.com/1574/
16. https://www.flickr.com/photos/vannispen/4608436679
18. https://xkcd.com/1597/
20. https://xkcd.com/1695/
21. http://hyperboleandahalf.blogspot.com/2010/06/this-is-why-ill-never-be-adult.html
22. https://xkcd.com/927/
23. https://www.flickr.com/photos/lamenta3/4349576638
24. https://www.flickr.com/photos/jalbertbowdenii/5682524083
25. http://quod.lib.umich.edu/j/jep/3336451.0018.201?view=text;rgn=main
28. https://www.flickr.com/photos/eschipul/4160817135
32