let's free statistics of irreproducible research! · let's free statistics of...

Post on 21-Jun-2020

0 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

09/14/09 Patrick Wessa, Ed van Stee 1

Let's Free Statistics of Irreproducible Research!Patrick Wessa – talk @ Aston University, UK – 2009/09/14

Acknowledgments

● Funding (we accept money):● K.U.Leuven Association, OOF 2007/13● Donations from private companies

● Contributors: Bart Baesens, Eric Bloemen, Eddy Borghers, Christophe Croux, Claude Doom, Dirk Janssens, Christine Lourdon, Koen Milis, Stephan Poelmans, Riko van Dijk, Guido Van Rompuy, Ed van Stee, Larry Weldon, Patrick Wessa

● Project website: www.freestatistics.org

A long time ago...

My frustration

● Teaching Time Series Analysis● Exam question:

Compute (1-B) Y[t] if you know that

Y[t] = {5, 8, 2, 3, 7, 1, 4}BY[t] = Y[t-1]

My frustration

● Teaching Time Series Analysis● Exam question:

Compute (1-B) Y[t] if you know that

Y[t] = {5, 8, 2, 3, 7, 1, 4}BY[t] = Y[t-1]

● Result● Less than 8% of students got it right.● More than 90% of students could prove Wold's

decomposition theorem!

Conclusion?

Conclusion?

● I am an extremely bad educator.

Conclusion?

● I am an extremely bad educator.

● I shouldn't have asked that silly question:Students can only reproduce theories – they are no required to understand them!

● ...

Conclusion?

● I am an extremely bad educator.

● I shouldn't have asked that silly question:Students can only reproduce theories – they are no required to understand them!

● ...

● Or maybe there is something wrong with our approach towards statistics education?

A new approach is needed

● Within the pedagogical paradigm of (social) constructivism:● Interaction & collaboration (peer review)● Experimentation● Responsibility (social control)

=> learning & computing technology

=> we need to Free Statistics of irreproducible research

=> www.FreeStatistics.org

Reproducible Research and the Compendium

Computing

Some References● J. Buckheit and D. L. Donoho. Wavelab and reproducible research. In A.

Antoniadis, editor, Wavelets and Statistics. Springer-Verlag, 1995.

● Peter J. Green. Diversities of gifts, but the same spirit. The Statistician, pages 423–438, 2003.

● T. R. Golub, D. K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J. P. Mesirov, H. Coller, M.L. Loh, J. R. Downing, M. A. Caligiuri, C. D. Bloomfield, and E. S. Lander. Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science, 286:531–537, 1999.

● David L. Donoho, Xiaoming Huo, BeamLab and Reproducible Research, International Journal of Wavelets, Multiresolution and Information Processing, 2004

● Roger D. Peng, Francesca Dominici, and Scott L. Zeger, Reproducible Epidemiologic Research, American Journal of Epidemiology, 2006

● R. Gentleman, Reproducible Research: A Bioinformatics Case Study, Bioconductor

● R. Gentleman, Applying Reproducible Research in Scientific Discovery, BioSilico, 2005

● Jan de Leeuw, Reproducible Research: the Bottom Line, 2001, online

Some References

● Roger Koenker, Achim Zeileis, Reproducible Econometric Research (A Critical Review of the State of the Art), Department of Statistics and Mathematics Wirtschaftsuniversität Wien, Research Report Series, Report 60, November 2007

● Robert Gentleman, Duncan Temple Lang, Statistical Analyses and Reproducible Research, http://www.bepress.com/bioconductor/paper2

● Schwab, M., Karrenbach, N. and Claerbout, J. Making scientific computations reproducible, Computing in Science & Engineering, 2 (6), pp. 61-67, 2000.

● Robert Gentleman, Some Perspectives on Statistical Computing, online

● Leisch, F., “Sweave and beyond: Computations on text documents”, Proceedings of the 3rd International Workshop on Distributed Statistical Computing, 2003, Vienna, Austria, ISSN 1609-395

Claerbout's principle*

● An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and that complete set of instructions that generated the figures.

*Source: Jan de Leeuw

Sweave package

● Excellent solution (in general)● Somewhat impractical for education because

the student:● is required to DIE

(compatibility, security concerns)● must have a working knowledge of LaTeX and R● must recreate a working compendium (for each

submission)

● Not designed with educational research in mind: there is no way to monitor/measure the actual learning activities

Compendium

• Original definition:

An electronic collection of Text, Data and Software that allows the reader to reproduce the research that is presented in the document

Compendium

Text

Software

Data

Text

Software

Data

Software

Tar, zip, rar, ...

LaTeX R code

Compendium redefined

• New definition:

A document with (open-access) references to (remotely) archived Computations (including Data, Meta-data, and Software) that allow us to reproduce, and reuse the underlying analysis

• Complete separation of:– text and computing– computational result and computing infrastructure

=> the compendium platform is a tool for collaboration, dissemination, and monitoring.

Computations Database

Meta Information

Software

Data

Text

Ref.

Ref.

Ref.

Ref.

Ref.

Ref.

R Module

R Module

R Module

R Module

R Module

R Module

Compendium Dynamics

Meta Information

Software

DataText

Ref.

R Module 1

R Module 1

Changed/New R Module

Learning System orEducational Laboratory?

R Framework

CompendiumPlatformCompendium

Blog

Reproduce & Reuse

ReferenceCreate/Maintain

QueryEngine

ProcessMeasurements

(Virtual)Learning

Environment

Usage

Usage

SearchEngine

www.wessa.net

www.freestatistics.org

www.moodle.org

Screenshots

Setting up the course

Computations are “blogged” (not archived)

09/14/09 Vannes 2008 27

Archive of Computational R Objects

09/14/09 Vannes 2008 28

Objects on the web

Weekly assignments

Snapshot of “Blogged” Computation

Reproduce or Reuse at wessa.net

Cite the computation as follows

Social Interaction, Collaboration, Networking, ...

Feedback (Peer Review)

Submitting Peer Review (feedback) is a good learning activity – not a good grading procedure

Educational Research

Lectures● 13 weeks (semester)

● Week 1: Introduction (explanation) + workshop assignment

● Week 2-12: Workshops + Peer Assessments

● Week 13: Final Exam (multiple choice)

● Grades received from Peers do NOT count => there is no penalty for making mistakes!!

● The quality of feedback messages is graded by the educator

Week 1 Week 2 Week 3 Week 4 ...

ExamL1 L2 L3 L4 L5

WS1 WS2 WS3 WS4 WS5

Rev 1 Rev 2 Rev 3 Rev 4 ...

...

Data

● 37108 computations (2 years)● 20799 feedback messages (2nd year)● 16045 parent-child relationships● 758 students (2 years)● 7899 unique impact relationships (after

elimination of anonymous computations)This is 1.38% of all possible relationships (573806).

Learning Experiences

● Overwhelming evidence that students had a positive learning experience (at the end of the semester).

● This is surprising because:● heavy workload● Reproducible Computing involves critical

thinking which is disliked by students

● All questions have a positive AM - some are even close to the maximum score.

Perceived Usability

● The web-based software was highly rated by students.

● The only exception is question 9: “The website gives error messages that clearly tell me how to fix problems.” ● Reproducible Computing offers quicker and

better problem-solving● Communication between Developer and User

Comparative Advantage

● AM ratings > 0.5

● Q20: Overall, the website was helpful in learning statistics

● Q21: Learning Statistics with this website is more effective than with a traditional handbook

● Q22: I intend to use this website when I need to apply statistics in the future

● Q27: To learn statistics, this website is better than the statistical courses I have had so far

Reported vs. Actual

Relationships

● Strong & robust relationships:● TX = f(RC, Feedback | Prior Knowledge, Education, Gender, ...)

● Satisfaction = f(RC, Usability, Feedback, Critical Thinking)

● Usability = f(Satisfaction about educator)

● Weak or unstable relationships:● TX = f(Satisfaction, Usability, Attitudes)

● Strong and unexpected bias in reported activity (when compared to actual measurements)

Permission granted - (C)opyright Journal of Technology, Learning, and Assessment

Social Networks

● Great interest in the pedagogical paradigm of social constructivism (science education)But... no process information is available!

● Social interaction analysis is difficult (forums)Manual coding is required!

● Dennen (2008):“Discussion is a required component of many Web-based classes, but do we really know its value or contribution to learning? Students may be graded for participation, and number and length of posts may be counted by those evaluating or researching online classes, but all too often the assessment and analysis methods that we use fail to provide us with data that indicate learning took place through participation in online discussion.”

SociogramPropagation of Ideas through Reproducible Computing

Fraud detection

Problem: separate threads of discussion

Week 1 Week 2 Week 3 Week 4 ...

ExamL1 L2 L3 L4 L5

WS1 WS2 WS3 WS4 WS5

Rev 1 Rev 2 Rev 3 Rev 4 ...

...

CAL'09 - Brighton - Patrick Wessa

Computation 3

Computation 1

Connected threads of discussion

Computation 2

Computation 4

Computation 5

Computation 6

Pass/Fail predictionactivenon-active

drop-out

female male

prep.progr. prep.progr.bachelor bachelor

Business Applications

par3 = lambda_1par4 = lambda_2

Expected Return (P50)

par3 = lambda_1par4 = lambda_2

VaR 99% (P01)

tKMACD

Morgan Stanley Standard & Poors

Expected Return (P50)

par1 = delta_1par2 = delta_2

par1 = delta_1par2 = delta_2

VaR 99% (P01)

Alexander’s Filterrule

Morgan Stanley Standard & Poors

top related