
    Michael Stuart, Trinity College, Dublin


    Mathematical Thinking versus Statistical Thinking in Statistical Teaching

    MSOR Connections, Feb 2006, Vol 6 No 1

    Acknowledgements

    The author thanks AWF Edwards for details of the reference to Fisher (1934).

    Many authors have sought to simplify the mathematics involved in the teaching of statistics. One manifestation of this is the efforts by many textbook authors to simplify mathematical formulae. Others have gone further and have queried the extent to which mathematics is needed to convey essential statistical ideas. In particular, some authors have queried the value of introducing statistical ideas at an elementary level through the medium of formal probability. In this article, the focus is on simplifying the statistics rather than the mathematics. First, the introduction of statistical significance through statistical control charts is described; formal probability is not required. Next, re-expressions of standard formulae to facilitate their statistical interpretation are presented. As an extension of this idea, the replacement of standard ANOVA in regression output by analysis of s, the residual standard deviation, is proposed. The article concludes with discussion of the pros and cons of mathematical and statistical thinking.

    Simplifying Statistical Significance

    There are at least three sources of difficulty for students arising from teaching statistical inference through probability. In the first place, formal probability theory poses substantial difficulties for students of elementary statistics. Secondly, the abstract notion of repeated sampling from a population that underlies the theory of the sampling distribution is a major source of difficulty. Finally, the fact that the basis for classical inference is probabilistic but the result is not is a source of major confusion.

    The basic idea of Shewhart, who introduced the control chart in 1924, was to make the distinction between what he referred to as "chance causes" and "assignable causes" of variation, with the Normal distribution serving as a model for chance variation, leading to the 3σ limits conventional in control charts. His logic was that a point outside the 3σ limits is so improbable, assuming that the process is in control, that the occurrence of such a point makes that assumption implausible. The idea is entirely parallel to that of statistical significance, with the conventional limits set at 2σ (or 1.96σ if a show of spurious accuracy is preferred).
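As an informal illustration of this logic, the following minimal Python sketch computes 3σ limits from a baseline run and flags later points that fall outside them. The data values and function names are hypothetical, and σ is estimated here by the ordinary sample standard deviation purely for simplicity; practical control charts often estimate it in other ways, for example from moving ranges.

```python
import statistics

def control_limits(baseline, k=3.0):
    """Return (lower, centre, upper) limits at k standard deviations."""
    centre = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)  # simplifying assumption: sample SD
    return centre - k * sigma, centre, centre + k * sigma

def out_of_control(points, lower, upper):
    """Indices of points lying outside the control limits."""
    return [i for i, x in enumerate(points) if x < lower or x > upper]

# Hypothetical baseline run, assumed to reflect chance causes only.
baseline = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7, 10.0, 10.1, 9.9]
lower, centre, upper = control_limits(baseline)

# Hypothetical later observations from the same process.
new_points = [10.0, 10.2, 11.5, 9.9]
signals = out_of_control(new_points, lower, upper)
print(f"limits: ({lower:.2f}, {upper:.2f}); signals at positions {signals}")
```

A flagged point plays the role of a "statistically significant" result: it is hard to reconcile with chance variation alone, so an assignable cause is sought.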

    This approach overcomes the difficulties arising from teaching statistical inference through probability. In the first place, the idea of variation due to chance causes is more convincingly introduced to students through the process view of data, via line charts (that become control charts when control limits are superimposed), than through the histogram view. There is considerable evidence that beginning students readily appreciate the haphazard character of data when viewed as process data while they have difficulty with the distribution idea embodied in the histogram view. Further, the regular pattern displayed in typical histograms of Normal data is at odds with the idea that the underlying variation is haphazard.

    Secondly, the notion of repeated sampling is natural in a process context and lends itself to operational interpretation in terms of concrete frequency distributions rather than the more abstract probability distributions. Thus, the concept of the sampling distribution can be given an operational introduction through visualising continuing process monitoring, rather than the theoretical but practically implausible notion of repeated sampling of a population used in conventional introductions.


    Related ideas become more accessible when introduced in this way. For example, the notion of significance level may be identified with the control chart false alarm rate, which has a simple operational interpretation in terms of an ongoing process with built-in repeated sampling, rather than the more obscure probability of rejecting the null hypothesis when true. Further, the idea can be elaborated in terms of the costs of more or less frequent false alarms, thus providing a plausible approach to selecting a significance level.
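The operational reading of the false alarm rate can be sketched by simulating an in-control process and counting how often points fall outside the limits. This is only an illustration: the function name, seed and run length are arbitrary choices, and chance variation is assumed to follow the Normal model mentioned earlier.

```python
import random

def false_alarm_rate(k, n_points=100_000, seed=1):
    """Fraction of in-control points falling outside the +/- k sigma limits."""
    rng = random.Random(seed)
    # Standardised process: mean 0, standard deviation 1, chance causes only.
    alarms = sum(1 for _ in range(n_points) if abs(rng.gauss(0.0, 1.0)) > k)
    return alarms / n_points

rate_2_sigma = false_alarm_rate(2.0)
rate_3_sigma = false_alarm_rate(3.0)
print(f"false alarm rate with 2-sigma limits: {rate_2_sigma:.4f}")
print(f"false alarm rate with 3-sigma limits: {rate_3_sigma:.4f}")
```

Under the Normal model the long-run rates are roughly 4.6% for 2σ limits and 0.27% for 3σ limits, so wider limits trade fewer false alarms for slower detection of real changes.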

    The approach to significance testing outlined here is extensively implemented in Stuart (2003) [4].

    Note that, although the formal link with probability-based reasoning is avoided in the approach advocated here, we may be regarded as implicitly applying probability-based reasoning when applying the informal notion of chance causes of variation. Elaboration to formal probability-based inference may be left to a later stage of students' development. If formal probability has been introduced separately by that stage, there can be considerable pedagogical advantage to demonstrating that the ideas underlying statistical significance can be expressed in terms of probability.

    Simplifying Formulae

    Simplification of mathematical formulae is frequently seen as desirable from the student's point of view. In some cases, however, there is a hidden cost if the mathematical simplification hides the statistical origins of the formulae. For example, the formula for the standard error of an estimated effect from a balanced $2^k$ experiment with N experimental units is often written, following mathematical simplification, as

    $$\sqrt{\frac{4\sigma^2}{N}}$$

    Re-expressing this as

    $$\sqrt{\frac{2\sigma^2}{N/2}}$$

    makes it more readily recognisable as the standard error of a difference between two sample means, each based on a sample of size N/2. As a second example, the standard formula for the standard error of prediction of a regression response variable, Y, given a new value, $X_{new}$, of the explanatory variable, is typically expressed as

    $$\sigma\sqrt{1 + \frac{1}{n} + \frac{(X_{new} - \bar{X})^2}{\sum (X_i - \bar{X})^2}}$$

    The task of unpacking the meaning of this formula is made unnecessarily difficult by mathematical simplification. This may be seen by comparison with the alternative version

    $$\sigma\sqrt{1 + \frac{1}{n} + \frac{1}{n}\left(\frac{X_{new} - \bar{X}}{s_X}\right)^2}$$

    in which it is easily seen that prediction error increases with the deviation of $X_{new}$ from $\bar{X}$, relative to the spread of the Xs as measured by the usual measure, and also that the increase in prediction error from this source is proportional to 1/n.
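A quick numerical check that the two versions of the prediction standard error agree is sketched below. The function names and data are hypothetical. Note one assumption: for the agreement to be exact, the spread of the Xs is computed here with divisor n; with the divisor n − 1 the two versions agree only approximately.

```python
import math

def se_pred_standard(sigma, xs, x_new):
    """Mathematically simplified form, using the sum of squared deviations."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sigma * math.sqrt(1 + 1 / n + (x_new - x_bar) ** 2 / sxx)

def se_pred_statistical(sigma, xs, x_new):
    """Statistically simplified form, using the standardised deviation."""
    n = len(xs)
    x_bar = sum(xs) / n
    # Assumption: spread of the Xs measured with divisor n.
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / n)
    z = (x_new - x_bar) / s_x  # deviation of x_new relative to the spread
    return sigma * math.sqrt(1 + 1 / n + z ** 2 / n)

xs = [2.0, 4.0, 5.0, 7.0, 9.0, 12.0]  # hypothetical explanatory values
se_a = se_pred_standard(1.5, xs, 15.0)
se_b = se_pred_statistical(1.5, xs, 15.0)
print(se_a, se_b)
```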

    A slight re-arrangement of the statistically simplified form,

    $$\sigma\sqrt{1 + \frac{1}{n} + \frac{1}{n s_X^2}(X_{new} - \bar{X})^2}$$

    facilitates identifying the sources of the variation involved; each element of the formula may be identified with the standard errors of corresponding elements of the standard prediction formula re-expressed as

    $$\hat{Y}_{new} = \bar{Y} + \hat{\beta}(X_{new} - \bar{X})$$
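One way to read this identification is sketched below: the squared standard error is split into three variance components. The labels are an interpretation rather than the author's wording: the leading 1 is taken to reflect the chance variation of the new observation itself, 1/n the sampling variation of the mean of Y, and the last term the sampling variation of the estimated slope scaled by the squared deviation of the new X. The data and names are hypothetical, and divisor n is again assumed for the spread of the Xs.

```python
import math

def prediction_variance_components(sigma, xs, x_new):
    """Split the squared standard error of prediction into three sources."""
    n = len(xs)
    x_bar = sum(xs) / n
    s_x2 = sum((x - x_bar) ** 2 for x in xs) / n  # assumption: divisor n
    return {
        "new observation": sigma ** 2,
        "mean of Y": sigma ** 2 / n,
        "slope": sigma ** 2 * (x_new - x_bar) ** 2 / (n * s_x2),
    }

components = prediction_variance_components(2.0, [1, 2, 3, 4, 5], 6)
total_variance = sum(components.values())
se_prediction = math.sqrt(total_variance)
for source, variance in components.items():
    print(f"{source}: {variance:.2f}")
print(f"standard error of prediction: {se_prediction:.3f}")
```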

    In both the examples discussed in this section, the re-expression of the standard error formulas facilitates their explanation in statistical terms, in contrast with the mathematically simpler versions, which obscure the statistical interpretation.
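For the first example, the agreement between the simplified formula and the standard error of a difference between two means can be checked numerically. This is a minimal sketch with hypothetical values of σ and N.

```python
import math

def se_effect_simplified(sigma, n_units):
    """Mathematically simplified form: sqrt(4 sigma^2 / N)."""
    return math.sqrt(4 * sigma ** 2 / n_units)

def se_difference_of_means(sigma, n1, n2):
    """Standard error of a difference between two independent sample means."""
    return math.sqrt(sigma ** 2 / n1 + sigma ** 2 / n2)

sigma, n_units = 2.5, 16  # hypothetical values, e.g. a 2^4 experiment
se_simplified = se_effect_simplified(sigma, n_units)
se_two_means = se_difference_of_means(sigma, n_units // 2, n_units // 2)
print(se_simplified, se_two_means)
```

Each effect in a balanced two-level experiment compares the mean of the N/2 runs at the high level with the mean of the N/2 runs at the low level, which is why the second function gives the same value.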

    The Analysis of S

    Virtually every statistical computing package includes an analysis of variance table with regression output, even simple linear regression. The analysis of variance table was devised by R.A. Fisher as a template for calculating the F ratio; Fisher himself (Fisher 1934, p 52) [1] described the analysis of variance table as "a convenient method of arranging the arithmetic". However, the effort required to come to terms with the elements of the table, including the sums of squares and the F ratio, is considerable. Yet, with simple linear regression, the key entries in the table, the residual mean square and the F ratio, are already effectively presented as the residual standard deviation, s, and the t ratio for slope, respectively. To make the case stronger, it is noted that a general regression F test may be re-expressed in terms of a much simpler and more readily interpretable ratio of s values and, furthermore, that $R^2$ and Adjusted $R^2$ may similarly be re-expressed. Full details may be found in Stuart (2005) [5].
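These relationships can be checked for simple linear regression with a short sketch. The re-expressions used here, F = (n − 1)(s_Y/s)² − (n − 2) and Adjusted R² = 1 − (s/s_Y)², where s_Y is the standard deviation of Y, follow from the usual definitions; they are not necessarily the exact forms given in Stuart (2005). The data and function name are hypothetical.

```python
import math

def simple_regression_summary(xs, ys):
    """Least squares fit plus the ANOVA quantities and their s-based versions."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    total_ss = sum((y - y_bar) ** 2 for y in ys)
    resid_ss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(resid_ss / (n - 2))    # residual standard deviation
    s_y = math.sqrt(total_ss / (n - 1))  # standard deviation of Y
    r2 = 1 - resid_ss / total_ss
    return {
        "s": s,
        "t_slope": slope / (s / math.sqrt(sxx)),
        "F_anova": (total_ss - resid_ss) / (resid_ss / (n - 2)),
        "F_from_s": (n - 1) * (s_y / s) ** 2 - (n - 2),
        "adj_r2_usual": 1 - (1 - r2) * (n - 1) / (n - 2),
        "adj_r2_from_s": 1 - (s / s_y) ** 2,
    }

xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
summary = simple_regression_summary(xs, ys)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

In this setting the ANOVA table adds nothing that s and the t ratio for slope do not already convey: the F ratio is the square of the t ratio, and the residual mean square is s².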


    Fisher's description of the analysis of variance table, quoted above, places the table firmly in the area of mathematical simplification. This is not to suggest that the analysis of variance table does not have value in other contexts. Indeed, with more complicated model structures, the table assists with logical interpretation along with arranging the arithmetic, as Fisher (1934, p 52) [1] points out. However, with the relatively simple decomposition of variation associated with linear regression, the benefits of the table are far outweighed by the burden of mathematical interpretation it places on students of elementary statistics.

    More important than these technical considerations, however, is another reason why more attention should be paid to the value of s; it gives an indication of whether the regression equation is of any use in practice, since 2s is a rough guide to prediction error.
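The practical reading of 2s can be illustrated with simulated data: after fitting a line, most observations lie within 2s of their fitted values. This is a sketch under stated assumptions; the true line, the error standard deviation of 1.5, the seed and the sample size are arbitrary choices, and errors are assumed Normal.

```python
import math
import random

rng = random.Random(2006)
# Hypothetical process: straight line plus Normal chance variation.
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [3.0 + 2.0 * x + rng.gauss(0.0, 1.5) for x in xs]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
intercept = y_bar - slope * x_bar

residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
s = math.sqrt(sum(r * r for r in residuals) / (n - 2))

# Proportion of observations within 2s of the fitted line.
within_2s = sum(1 for r in residuals if abs(r) <= 2 * s) / n
print(f"s = {s:.2f}; predictions are typically within about {2 * s:.2f}")
print(f"proportion of residuals within 2s: {within_2s:.2f}")
```

If 2s is large relative to the precision needed in the application, the fitted equation is of little practical use, however "significant" the slope.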

    Discussion

    The relatively recent emergence of statistical thinking as a well-defined conceptual framework is allowing a complete re-think of traditional approaches to the teaching of statistics. The focus among proponents of statistical thinking on its merits in the application of statistics in a problem solving environment has given the subject of statistics a much needed motivational boost. Such a focus makes the subject more palatable and attractive to students, particularly those for whom statistics is not their primary interest. The recent text of Hoerl and Snee (2002) [2] provides a radically new approach to the introduction of statistics to students of business, based on this view. Thus, the first three chapters of their text are concerned with explaining what is meant by statistical thinking and how it interacts with and facilitates modern approaches to business improvement, all within the context of a broad understanding of business processes.

    At another level, Wild and Pfannkuch (1999) [6] have provided an outline of a theory of applied statistics in which they identify four dimensions of a framework for statistical thinking: the investigative cycle, types of thinking, the interrogative cycle and personal dispositions. To illustrate their analysis of the first dimension, they use an approach to statistical problem solving developed by McKay and Oldford (1994) [3] that involves five basic steps: problem formulation, solution plan, data collection, data analysis and conclusion. Several other authors have proposed approaches to statistical problem solving along similar lines; see Stuart (2005) [5]. They all involve a broad approach to statistical problem solving that requires the statistician to become involved in the context of the problem at all stages. Wild and Pfannkuch (1999) [6] discuss in considerable detail how this involvement takes place, effectively describing how the other three dimensions of their framework interact with the first, the Investigative Cycle.

    Mathematical thinking, when applied to statistics teaching (and statistical research), is largely confined to the data analysis phase of Wild and Pfannkuch's [6] first dimension; the other three dimensions scarcely interact with this narrow aspect of the first dimension. There are two consequences of this limitation on mathematical thinking. First, the mathematical solution tends to be regarded as an end in itself rather than a step in approaching a problem in context. Secondly, it leads to a tendency to use data as a vehicle for illustrating mathematical methods rather than as evidence with which to address substantive problems. When this kind of thinking is brought to bear on statistical teaching, the focus is very much on the mathematical aspects of statistics. Hence the view that simplifying the mathematics is the same as simplifying the statistics. The illustrations presented in this article demonstrate the opposite.

    Mathematical thinking also assumes that probability theory is fundamental to statistics. The view taken in this article is that variation is the fundamental statistical concept and that probability serves as a model for statistical variation. When it comes to statistical inference, the variation involved is in the ongoing behaviour of statistical methods; the process view of this behaviour was highlighted above. Probability also serves as a model for this variation. However, probability theory is not essential in coming to an understanding of elementary methods of statistical inference. Assessments of their behaviour that have direct operational interpretations are likely to be more accessible to students of elementary statistics.

    Finally, the use of the analysis of variance table in linear regression output and the assessment of regression fit through $R^2$ may reflect mathematical appreciation of the elegance of the sum of squares decomposition underlying them, particularly when viewed in geometrical terms. This involves regarding the values of the variables involved as coordinates of vectors whose squared lengths are represented by the sums of squares of their coordinates. With this representation, the sum of squares decomposition is simply an application of Pythagoras' theorem. This and the related linear space theory are very attractive mathematically, while variations on this theme form the basis for a very large part of mathematical statistical theory. The attraction of such elegant and pervasive mathematics needs to be tempered, however, by the fact that the most elegant development of the theory is the so-called "coordinate free" approach. As the vector coordinates constitute the data, it follows that this approach to the theory is also "data free".
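For readers who wish to see the geometrical view in concrete terms, the sketch below treats the centred responses, the centred fitted values and the residuals as vectors and checks that the last two are at right angles, so that the squared lengths add as in Pythagoras' theorem. The data and helper names are hypothetical.

```python
import math

def dot(u, v):
    """Inner product of two vectors given as equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
slope = dot([x - x_bar for x in xs], [y - y_bar for y in ys]) / dot(
    [x - x_bar for x in xs], [x - x_bar for x in xs]
)

total = [y - y_bar for y in ys]               # centred responses
fitted = [slope * (x - x_bar) for x in xs]    # centred fitted values
resid = [t - f for t, f in zip(total, fitted)]

total_ss = dot(total, total)
regression_ss = dot(fitted, fitted)
residual_ss = dot(resid, resid)
print(f"fitted . resid = {dot(fitted, resid):.2e}")
print(f"total SS = {total_ss:.4f}, regression SS + residual SS = "
      f"{regression_ss + residual_ss:.4f}")
```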

    References

    [1] Fisher, R.A. (1934), Discussion of "Statistics in agricultural research" by J. Wishart, Journal of the Royal Statistical Society, Supplement, 1, 26-61.

    [2] Hoerl, R.W. and Snee, R.D. (2002), Statistical Thinking: Improving Business Performance, Pacific Grove, CA: Duxbury Press/Thomson Learning.

    [3] McKay, R. and Oldford, W. (1994), An Introduction to Empirical Problem Solving: Course Notes for Stat 231, University of Waterloo, Canada.

    [4] Stuart, M. (2003), An Introduction to Statistical Analysis for Business and Industry - A Problem Solving Approach, London: Hodder Arnold.

    [5] Stuart, M. (2005), Mathematical Thinking versus Statistical Thinking; Redressing the Balance in Statistical Teaching, http://www.tcd.ie/Statistics/postgraduate/0507.pdf

    [6] Wild, C.J. and Pfannkuch, M. (1999), Statistical thinking in empirical enquiry (with discussion), International Statistical Review, 67, 3, 223-265.
