09/14/09 Patrick Wessa, Ed van Stee 1
Let's Free Statistics of Irreproducible Research!
Patrick Wessa – talk @ Aston University, UK – 2009/09/14
Acknowledgments
● Funding (we accept money):
  ● K.U.Leuven Association, OOF 2007/13
  ● Donations from private companies
● Contributors: Bart Baesens, Eric Bloemen, Eddy Borghers, Christophe Croux, Claude Doom, Dirk Janssens, Christine Lourdon, Koen Milis, Stephan Poelmans, Riko van Dijk, Guido Van Rompuy, Ed van Stee, Larry Weldon, Patrick Wessa
● Project website: www.freestatistics.org
A long time ago...
My frustration
● Teaching Time Series Analysis
● Exam question:
  Compute (1-B) Y[t] if you know that
  Y[t] = {5, 8, 2, 3, 7, 1, 4}
  B Y[t] = Y[t-1]
● Result
  ● Less than 8% of students got it right.
  ● More than 90% of students could prove Wold's decomposition theorem!
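The exam question comes down to a first difference: since B Y[t] = Y[t-1], (1-B) Y[t] = Y[t] - Y[t-1]. A minimal Python sketch of that calculation follows; the function name `backshift_difference` is an illustrative choice, not something from the talk.

```python
def backshift_difference(y):
    """Apply (1 - B) to a series: result[t] = y[t] - y[t-1]."""
    # Pair every observation with its predecessor and subtract.
    return [current - previous for previous, current in zip(y, y[1:])]


Y = [5, 8, 2, 3, 7, 1, 4]
diff = backshift_difference(Y)
print(diff)  # [3, -6, 1, 4, -6, 3]
```

The first observation has no predecessor, so the differenced series has six values instead of seven.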
Conclusion?
● I am an extremely bad educator.
● I shouldn't have asked that silly question: students can only reproduce theories – they are not required to understand them!
● ...
● Or maybe there is something wrong with our approach towards statistics education?
A new approach is needed
● Within the pedagogical paradigm of (social) constructivism:
  ● Interaction & collaboration (peer review)
  ● Experimentation
  ● Responsibility (social control)
=> learning & computing technology
=> we need to Free Statistics of irreproducible research
=> www.FreeStatistics.org
Reproducible Research and the Compendium
Computing
Some References
● J. Buckheit and D. L. Donoho. Wavelab and reproducible research. In A. Antoniadis, editor, Wavelets and Statistics. Springer-Verlag, 1995.
● Peter J. Green. Diversities of gifts, but the same spirit. The Statistician, pages 423–438, 2003.
● T. R. Golub, D. K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J. P. Mesirov, H. Coller, M.L. Loh, J. R. Downing, M. A. Caligiuri, C. D. Bloomfield, and E. S. Lander. Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science, 286:531–537, 1999.
● David L. Donoho, Xiaoming Huo, BeamLab and Reproducible Research, International Journal of Wavelets, Multiresolution and Information Processing, 2004
● Roger D. Peng, Francesca Dominici, and Scott L. Zeger, Reproducible Epidemiologic Research, American Journal of Epidemiology, 2006
● R. Gentleman, Reproducible Research: A Bioinformatics Case Study, Bioconductor
● R. Gentleman, Applying Reproducible Research in Scientific Discovery, BioSilico, 2005
● Jan de Leeuw, Reproducible Research: the Bottom Line, 2001, online
Some References
● Roger Koenker, Achim Zeileis, Reproducible Econometric Research (A Critical Review of the State of the Art), Department of Statistics and Mathematics Wirtschaftsuniversität Wien, Research Report Series, Report 60, November 2007
● Robert Gentleman, Duncan Temple Lang, Statistical Analyses and Reproducible Research, http://www.bepress.com/bioconductor/paper2
● Schwab, M., Karrenbach, N. and Claerbout, J. Making scientific computations reproducible, Computing in Science & Engineering, 2 (6), pp. 61-67, 2000.
● Robert Gentleman, Some Perspectives on Statistical Computing, online
● Leisch, F., “Sweave and beyond: Computations on text documents”, Proceedings of the 3rd International Workshop on Distributed Statistical Computing, 2003, Vienna, Austria, ISSN 1609-395
Claerbout's principle*
● An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions that generated the figures.
*Source: Jan de Leeuw
Sweave package
● Excellent solution (in general)
● Somewhat impractical for education because the student:
  ● is required to DIE (compatibility, security concerns)
  ● must have a working knowledge of LaTeX and R
  ● must recreate a working compendium (for each submission)
● Not designed with educational research in mind: there is no way to monitor/measure the actual learning activities
Compendium
• Original definition:
An electronic collection of Text, Data and Software that allows the reader to reproduce the research that is presented in the document
Compendium
[Figure: a Compendium combines Text, Software, and Data (LaTeX, R code), bundled as an archive (tar, zip, rar, ...).]
Compendium redefined
• New definition:
A document with (open-access) references to (remotely) archived Computations (including Data, Meta-data, and Software) that allow us to reproduce and reuse the underlying analysis
• Complete separation of:
  – text and computing
  – computational result and computing infrastructure
=> the compendium platform is a tool for collaboration, dissemination, and monitoring.
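The redefined compendium can be pictured as a small data structure: the text only holds references, while each archived computation keeps its own data, meta-data, and software. The Python sketch below illustrates that separation under stated assumptions. The class names, the `MODULES` registry, the ids `c1` and `c2`, and the sample meta-data are all hypothetical and do not describe the actual platform or its R modules.

```python
from dataclasses import dataclass, field
from statistics import mean

# Hypothetical registry standing in for remotely hosted software modules.
MODULES = {"mean": mean}


@dataclass
class Computation:
    software: str                              # name of the module that was run
    data: list                                 # input data stored with the computation
    meta: dict = field(default_factory=dict)   # meta-data, e.g. author or parent

    def run(self):
        # Rerun the archived software on the archived data.
        return MODULES[self.software](self.data)


@dataclass
class Compendium:
    text: str                                        # the document itself
    references: list = field(default_factory=list)   # ids of archived computations


# The archive lives apart from the text; the text only refers to it by id.
archive = {"c1": Computation("mean", [5, 8, 2, 3, 7, 1, 4], {"author": "student A"})}
doc = Compendium("The average of the series is reported in [c1].", ["c1"])

# Reproduce: rerun every referenced computation exactly as archived.
reproduced = {ref: archive[ref].run() for ref in doc.references}

# Reuse: same software, new data, archived as a new computation with a parent link.
archive["c2"] = Computation("mean", [1, 2, 3], {"parent": "c1"})
reused = archive["c2"].run()
```

Because the document stores only ids, the text can be edited without touching the computations, and a reused computation records where it came from.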
Computations Database
[Figure: diagram with the labels Text, Ref. (six references), R Module (six modules), Meta Information, Software, and Data.]
Compendium Dynamics
[Figure: diagram with the labels Text, Ref., Meta Information, Software, Data, R Module 1, and Changed/New R Module.]
Learning System or Educational Laboratory?
[Figure: architecture diagram. Components: R Framework, Compendium Platform, Compendium, Blog, Query Engine, Search Engine, (Virtual) Learning Environment. Arrow labels: Reproduce & Reuse, Reference, Create/Maintain, Process Measurements, Usage. Sites shown: www.wessa.net, www.freestatistics.org, www.moodle.org.]
Screenshots
Setting up the course
Computations are “blogged” (not archived)
Archive of Computational R Objects
Objects on the web
Weekly assignments
Snapshot of “Blogged” Computation
Reproduce or Reuse at wessa.net
Cite the computation as follows
Social Interaction, Collaboration, Networking, ...
Feedback (Peer Review)
Submitting Peer Review (feedback) is a good learning activity – not a good grading procedure
Educational Research
Lectures
● 13 weeks (semester)
● Week 1: Introduction (explanation) + workshop assignment
● Week 2-12: Workshops + Peer Assessments
● Week 13: Final Exam (multiple choice)
● Grades received from Peers do NOT count => there is no penalty for making mistakes!!
● The quality of feedback messages is graded by the educator
[Figure: course timeline running from Week 1, Week 2, Week 3, Week 4, ... to the Exam, with rows for lectures (L1 to L5), workshops (WS1 to WS5), and reviews (Rev 1 to Rev 4, ...).]
Data
● 37108 computations (2 years)
● 20799 feedback messages (2nd year)
● 16045 parent-child relationships
● 758 students (2 years)
● 7899 unique impact relationships (after elimination of anonymous computations)
  This is 1.38% of all possible relationships (573806).
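The 1.38% figure can be checked with a few lines of arithmetic. The sketch below assumes that "all possible relationships" means ordered pairs of distinct students; the slide does not say so, but that reading reproduces the reported 573806 exactly.

```python
students = 758
observed = 7899  # unique impact relationships reported on the slide

# Assumption: every ordered pair of distinct students is a possible relationship.
possible = students * (students - 1)
share = 100 * observed / possible

print(possible)         # 573806
print(round(share, 2))  # 1.38
```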
Learning Experiences
● Overwhelming evidence that students had a positive learning experience (at the end of the semester).
● This is surprising because:
  ● heavy workload
  ● Reproducible Computing involves critical thinking which is disliked by students
● All questions have a positive AM - some are even close to the maximum score.
Perceived Usability
● The web-based software was highly rated by students.
● The only exception is question 9: “The website gives error messages that clearly tell me how to fix problems.”
  ● Reproducible Computing offers quicker and better problem-solving
  ● Communication between Developer and User
Comparative Advantage
● AM ratings > 0.5
● Q20: Overall, the website was helpful in learning statistics
● Q21: Learning Statistics with this website is more effective than with a traditional handbook
● Q22: I intend to use this website when I need to apply statistics in the future
● Q27: To learn statistics, this website is better than the statistical courses I have had so far
Reported vs. Actual
Relationships
● Strong & robust relationships:
  ● TX = f(RC, Feedback | Prior Knowledge, Education, Gender, ...)
  ● Satisfaction = f(RC, Usability, Feedback, Critical Thinking)
  ● Usability = f(Satisfaction about educator)
● Weak or unstable relationships:
  ● TX = f(Satisfaction, Usability, Attitudes)
● Strong and unexpected bias in reported activity (when compared to actual measurements)
Permission granted - (C)opyright Journal of Technology, Learning, and Assessment
Social Networks
● Great interest in the pedagogical paradigm of social constructivism (science education)
  But... no process information is available!
● Social interaction analysis is difficult (forums)
  Manual coding is required!
● Dennen (2008):“Discussion is a required component of many Web-based classes, but do we really know its value or contribution to learning? Students may be graded for participation, and number and length of posts may be counted by those evaluating or researching online classes, but all too often the assessment and analysis methods that we use fail to provide us with data that indicate learning took place through participation in online discussion.”
Sociogram
Propagation of Ideas through Reproducible Computing
Fraud detection
Problem: separate threads of discussion
[Figure: course timeline running from Week 1, Week 2, Week 3, Week 4, ... to the Exam, with rows for lectures (L1 to L5), workshops (WS1 to WS5), and reviews (Rev 1 to Rev 4, ...).]
Connected threads of discussion
[Figure: Computation 1 to Computation 6 linked into connected threads of discussion.]
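One way to read the idea of connected threads: whenever a computation is reproduced or reused, the parent-child link joins discussions that would otherwise stay separate. The Python sketch below groups computations into threads by following such links in either direction. The link list is invented for illustration and is not taken from the course data.

```python
from collections import defaultdict

# Hypothetical parent -> child links between computations 1..6.
links = [(1, 2), (1, 3), (3, 4), (5, 6)]


def threads(links):
    """Group computations into threads connected by parent-child links."""
    neighbours = defaultdict(set)
    for parent, child in links:
        neighbours[parent].add(child)
        neighbours[child].add(parent)

    seen, result = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Depth-first walk collecting everything reachable from `start`.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        result.append(sorted(component))
    return result


found = threads(links)               # two separate threads
merged = threads(links + [(4, 5)])   # one extra reuse joins them
```

With the invented links, computations 1 to 4 form one thread and 5 and 6 another; a single additional link from 4 to 5 merges them into one.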
Pass/Fail prediction
[Figure: pass/fail prediction diagram with the labels active / non-active, drop-out, female / male, and prep. progr. / bachelor.]
Business Applications
tKMACD
[Figure: panels for Morgan Stanley and Standard & Poors showing Expected Return (P50) and VaR 99% (P01) as functions of par3 = lambda_1 and par4 = lambda_2.]
Alexander's Filterrule
[Figure: panels for Morgan Stanley and Standard & Poors showing Expected Return (P50) and VaR 99% (P01) as functions of par1 = delta_1 and par2 = delta_2.]