Let's Free Statistics of Irreproducible Research!

09/14/09 Patrick Wessa, Ed van Stee 1 Let's Free Statistics of Irreproducible Research! Patrick Wessa – talk @ Aston University, UK – 2009/09/14



Let's Free Statistics of Irreproducible Research!
Patrick Wessa – talk @ Aston University, UK – 2009/09/14


Acknowledgments

● Funding (we accept money):
  ● K.U.Leuven Association, OOF 2007/13
  ● Donations from private companies

● Contributors: Bart Baesens, Eric Bloemen, Eddy Borghers, Christophe Croux, Claude Doom, Dirk Janssens, Christine Lourdon, Koen Milis, Stephan Poelmans, Riko van Dijk, Guido Van Rompuy, Ed van Stee, Larry Weldon, Patrick Wessa

● Project website: www.freestatistics.org


A long time ago...


My frustration

● Teaching Time Series Analysis
● Exam question:

Compute (1-B) Y[t] if you know that

Y[t] = {5, 8, 2, 3, 7, 1, 4}
BY[t] = Y[t-1]

● Result
  ● Less than 8% of students got it right.
  ● More than 90% of students could prove Wold's decomposition theorem!
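
As a worked check of the exam question: B is the backshift operator, so (1-B) Y[t] = Y[t] - Y[t-1], the first difference of the series. A minimal Python sketch follows; the function name `first_difference` is illustrative and not part of the talk. The first observation has no predecessor, so the result is one element shorter than the input.

```python
def first_difference(series):
    """Apply (1 - B) to a series, where B Y[t] = Y[t-1]."""
    # Pair each observation with its predecessor and subtract.
    return [current - previous for previous, current in zip(series, series[1:])]


y = [5, 8, 2, 3, 7, 1, 4]
print(first_difference(y))  # [3, -6, 1, 4, -6, 3]
```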


Conclusion?

● I am an extremely bad educator.

● I shouldn't have asked that silly question: Students can only reproduce theories; they are not required to understand them!

● ...

● Or maybe there is something wrong with our approach towards statistics education?


A new approach is needed

● Within the pedagogical paradigm of (social) constructivism:
  ● Interaction & collaboration (peer review)
  ● Experimentation
  ● Responsibility (social control)

=> learning & computing technology

=> we need to Free Statistics of irreproducible research

=> www.FreeStatistics.org


Reproducible Research and the Compendium

Computing


Some References

● J. Buckheit and D. L. Donoho. Wavelab and reproducible research. In A. Antoniadis, editor, Wavelets and Statistics. Springer-Verlag, 1995.

● Peter J. Green. Diversities of gifts, but the same spirit. The Statistician, pages 423–438, 2003.

● T. R. Golub, D. K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J. P. Mesirov, H. Coller, M.L. Loh, J. R. Downing, M. A. Caligiuri, C. D. Bloomfield, and E. S. Lander. Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science, 286:531–537, 1999.

● David L. Donoho, Xiaoming Huo, BeamLab and Reproducible Research, International Journal of Wavelets, Multiresolution and Information Processing, 2004

● Roger D. Peng, Francesca Dominici, and Scott L. Zeger, Reproducible Epidemiologic Research, American Journal of Epidemiology, 2006

● R. Gentleman, Reproducible Research: A Bioinformatics Case Study, Bioconductor

● R. Gentleman, Applying Reproducible Research in Scientific Discovery, BioSilico, 2005

● Jan de Leeuw, Reproducible Research: the Bottom Line, 2001, online


● Roger Koenker, Achim Zeileis, Reproducible Econometric Research (A Critical Review of the State of the Art), Department of Statistics and Mathematics Wirtschaftsuniversität Wien, Research Report Series, Report 60, November 2007

● Robert Gentleman, Duncan Temple Lang, Statistical Analyses and Reproducible Research, http://www.bepress.com/bioconductor/paper2

● Schwab, M., Karrenbach, N. and Claerbout, J. Making scientific computations reproducible, Computing in Science & Engineering, 2 (6), pp. 61-67, 2000.

● Robert Gentleman, Some Perspectives on Statistical Computing, online

● Leisch, F., “Sweave and beyond: Computations on text documents”, Proceedings of the 3rd International Workshop on Distributed Statistical Computing, 2003, Vienna, Austria, ISSN 1609-395


Claerbout's principle*

● An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and that complete set of instructions that generated the figures.

*Source: Jan de Leeuw


Sweave package

● Excellent solution (in general)

● Somewhat impractical for education because the student:
  ● is required to DIE (compatibility, security concerns)
  ● must have a working knowledge of LaTeX and R
  ● must recreate a working compendium (for each submission)

● Not designed with educational research in mind: there is no way to monitor/measure the actual learning activities


Compendium

• Original definition:

An electronic collection of Text, Data and Software that allows the reader to reproduce the research that is presented in the document


Compendium

[Diagram: a compendium made up of Text, Software, and Data; labels on the diagram: LaTeX, R code, and archive formats (tar, zip, rar, ...)]


Compendium redefined

• New definition:

A document with (open-access) references to (remotely) archived Computations (including Data, Meta-data, and Software) that allow us to reproduce, and reuse the underlying analysis

• Complete separation of:
  – text and computing
  – computational result and computing infrastructure

=> the compendium platform is a tool for collaboration, dissemination, and monitoring.
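
The new definition can be pictured as a small data structure: the document holds only references, while data, meta-data, and software live in a remote archive. The sketch below is an illustration under that reading, not the platform's actual design; all class, field, and function names (`ArchivedComputation`, `Compendium`, `resolve`, `reuse`, `parent_id`) are hypothetical.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List, Optional


@dataclass
class ArchivedComputation:
    """A remotely archived computation: data, meta-data, and software."""
    computation_id: str
    software: str                      # e.g. the source of an R module
    data: List[float]
    meta: Dict[str, str] = field(default_factory=dict)
    parent_id: Optional[str] = None    # set when derived from another computation


@dataclass
class Compendium:
    """A document whose text only references archived computations."""
    text: str
    references: List[str] = field(default_factory=list)


def resolve(compendium: Compendium, archive: Dict[str, ArchivedComputation]):
    """Reproduce: look up every referenced computation in the archive."""
    return [archive[ref] for ref in compendium.references]


def reuse(original: ArchivedComputation, new_id: str, new_data: List[float]):
    """Reuse: derive a new computation that runs the same software on new data."""
    return replace(original, computation_id=new_id, data=new_data,
                   parent_id=original.computation_id)


archive = {
    "c1": ArchivedComputation("c1", "mean(x)", [5, 8, 2], {"author": "student A"}),
}
archive["c2"] = reuse(archive["c1"], "c2", [3, 7, 1, 4])
doc = Compendium("We computed the mean (c1) and repeated it on new data (c2).",
                 ["c1", "c2"])
print([c.parent_id for c in resolve(doc, archive)])  # [None, 'c1']
```

Because the text stores identifiers rather than copies, the same archived computation can be cited from many documents, and each reuse keeps a link back to its parent.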


Computations Database

[Diagram: Text with references (Ref.) to R Modules; further labels: Meta Information, Software, Data]


Compendium Dynamics

[Diagram labels: Meta Information, Software, Data, Text, Ref., R Module 1, Changed/New R Module]


Learning System or Educational Laboratory?

[Diagram components: R Framework, Compendium Platform, Compendium, Blog, Query Engine, Search Engine, Process Measurements, (Virtual) Learning Environment; other labels: Reproduce & Reuse, Reference, Create/Maintain, Usage; sites shown: www.wessa.net, www.freestatistics.org, www.moodle.org]


Screenshots


Setting up the course


Computations are “blogged” (not archived)


Archive of Computational R Objects


Objects on the web


Weekly assignments


Snapshot of “Blogged” Computation

Reproduce or Reuse at wessa.net

Cite the computation as follows


Social Interaction, Collaboration, Networking, ...


Feedback (Peer Review)

Submitting Peer Review (feedback) is a good learning activity – not a good grading procedure


Educational Research


Lectures

● 13 weeks (semester)

● Week 1: Introduction (explanation) + workshop assignment

● Week 2-12: Workshops + Peer Assessments

● Week 13: Final Exam (multiple choice)

● Grades received from Peers do NOT count => there is no penalty for making mistakes!!

● The quality of feedback messages is graded by the educator

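[Timeline diagram: rows for weeks (Week 1, Week 2, Week 3, Week 4, ...), lectures (L1, L2, L3, L4, L5, ...), workshops (WS1, WS2, WS3, WS4, WS5, ...), and peer reviews (Rev 1, Rev 2, Rev 3, Rev 4, ...), ending with the Exam]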


Data

● 37108 computations (2 years)
● 20799 feedback messages (2nd year)
● 16045 parent-child relationships
● 758 students (2 years)
● 7899 unique impact relationships (after elimination of anonymous computations). This is 1.38% of all possible relationships (573806).
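
The figure of 573806 possible relationships matches the number of ordered pairs of distinct students, 758 × 757. The slide does not state this formula, so the sketch below treats it as an assumption that happens to reproduce both reported numbers.

```python
students = 758
impact_relationships = 7899

# Assumption: every ordered pair of distinct students counts as one
# possible relationship, i.e. n * (n - 1).
possible = students * (students - 1)
share = impact_relationships / possible

print(possible)         # 573806
print(f"{share:.2%}")   # 1.38%
```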


Learning Experiences

● Overwhelming evidence that students had a positive learning experience (at the end of the semester).

● This is surprising because:
  ● heavy workload
  ● Reproducible Computing involves critical thinking which is disliked by students

● All questions have a positive AM - some are even close to the maximum score.


Perceived Usability

● The web-based software was highly rated by students.

● The only exception is question 9: “The website gives error messages that clearly tell me how to fix problems.”
  ● Reproducible Computing offers quicker and better problem-solving
  ● Communication between Developer and User


Comparative Advantage

● AM ratings > 0.5

● Q20: Overall, the website was helpful in learning statistics

● Q21: Learning Statistics with this website is more effective than with a traditional handbook

● Q22: I intend to use this website when I need to apply statistics in the future

● Q27: To learn statistics, this website is better than the statistical courses I have had so far


Reported vs. Actual


Relationships

● Strong & robust relationships:
  ● TX = f(RC, Feedback | Prior Knowledge, Education, Gender, ...)
  ● Satisfaction = f(RC, Usability, Feedback, Critical Thinking)
  ● Usability = f(Satisfaction about educator)

● Weak or unstable relationships:
  ● TX = f(Satisfaction, Usability, Attitudes)

● Strong and unexpected bias in reported activity (when compared to actual measurements)


Permission granted - (C)opyright Journal of Technology, Learning, and Assessment


Social Networks

● Great interest in the pedagogical paradigm of social constructivism (science education). But... no process information is available!

● Social interaction analysis is difficult (forums). Manual coding is required!

● Dennen (2008): “Discussion is a required component of many Web-based classes, but do we really know its value or contribution to learning? Students may be graded for participation, and number and length of posts may be counted by those evaluating or researching online classes, but all too often the assessment and analysis methods that we use fail to provide us with data that indicate learning took place through participation in online discussion.”


Sociogram

Propagation of Ideas through Reproducible Computing


Fraud detection


Problem: separate threads of discussion

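[Timeline diagram: rows for weeks (Week 1, Week 2, Week 3, Week 4, ...), lectures (L1, L2, L3, L4, L5, ...), workshops (WS1, WS2, WS3, WS4, WS5, ...), and peer reviews (Rev 1, Rev 2, Rev 3, Rev 4, ...), ending with the Exam]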


Connected threads of discussion

[Diagram: Computation 1 through Computation 6 shown as linked nodes]


Pass/Fail prediction

[Decision tree diagram; node labels: active, non-active, drop-out, female, male, prep. progr., bachelor]


Business Applications


[Plot: Expected Return (P50); par3 = lambda_1, par4 = lambda_2]


[Plot: VaR 99% (P01); par3 = lambda_1, par4 = lambda_2]


tKMACD

[Plot panels: Morgan Stanley; Standard & Poors]


[Plot: Expected Return (P50); par1 = delta_1, par2 = delta_2]


[Plot: VaR 99% (P01); par1 = delta_1, par2 = delta_2]


Alexander's Filter Rule

[Plot panels: Morgan Stanley; Standard & Poors]