the effects of violations of assumptions underlying the · 2017-10-24 · psychological bulletin...

16
PSYCHOLOGICAL BULLETIN Vol. 37, No. 1,1960 THE EFFECTS OF VIOLATIONS OF ASSUMPTIONS UNDERLYING THE t TEST 1 C. ALAN BONEAU Duke University As psychologists who perform in a research capacity are well aware, psychological data too frequently have an exasperating tendency to manifest themselves in a form which violates one or more of the assump- tions underlying the usual statistical tests of significance. Faced with the problem of analyzing such data, the researcher usually attempts to trans- form them in such a way that the assumptions are tenable, or he may look elsewhere for a statistical test. The latter alternative has become popular because of the proliferation of the so-called nonparametric or distribution-free methods. These techniques quite generally, however, couple their freedom from restricting assumptions with a disdain for much of the information contained within the data. For example, by classifying scores into groups above and below the median one ignores the fact that there are intracategory differences between the individual scores. As a result, tests which make no assump- tions about the distribution from which one is sampling will tend not to reject the null hypothesis when it is actually false as often as will those tests which do make assumptions. This lack of power of the nonpara- 1 This project was undertaken while the author was a Public Health Service Research Fellow of the National Institute of Mental Health at Duke University. The computa- tions involved in this study were performed in the Duke University Digital Computing Lab- oratory which is supported in part by Na- tional Science Foundation Grant G-6694. The author wishes to express his appreciation to Thomas M. Gallie, Director of the Labora- tory, for his cooperation and assistance. metric tests is a decided handicap when, as is frequently the case in psychological research, a modicum of reinforcement in the form of an occasional significant result is re- quired to maintain the research re- sponse. Confronted with this discouraging prospect and a perhaps equally dis- couraging one of laboriously trans- forming data, performing related tests, and then perhaps having diffi- culty in interpreting results, the re- searcher is often tempted simply to ignore such considerations and go ahead and run a t test or analysis of variance. In most cases, he is de- terred by the feeling that such a procedure will not solve the problem. If a significant result is forthcoming, is it due to differences between means, or is it due to the violation of assump- tions? The latter possibility is usually sufficient to preclude the use of the t or F test. It might be suspected that one could finesse the whole problem of untenable assumptions by better planning of the experiment or by a more judicious choice of variables, but this may not always be the case. Let us examine the assumptions more closely. It will be recalled that both the t test and the closely related F test of analysis of variance are pre- dicated on sampling from a normal distribution. A second assumption required by the derivations is that the variances of the distributions from which the samples have been taken is the same (assumption of homogeneity of variance). Thirdly, it is necessary that scor.es used in 49

Upload: others

Post on 17-Mar-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: THE EFFECTS OF VIOLATIONS OF ASSUMPTIONS UNDERLYING THE · 2017-10-24 · PSYCHOLOGICAL BULLETIN Vol. 37, No. 1,1960 THE EFFECTS OF VIOLATIONS OF ASSUMPTIONS UNDERLYING THE t TEST1

PSYCHOLOGICAL BULLETINVol. 37, No. 1,1960

THE EFFECTS OF VIOLATIONS OF ASSUMPTIONSUNDERLYING THE t TEST1

C. ALAN BONEAUDuke University

As psychologists who perform ina research capacity are well aware,psychological data too frequentlyhave an exasperating tendency tomanifest themselves in a form whichviolates one or more of the assump-tions underlying the usual statisticaltests of significance. Faced with theproblem of analyzing such data, theresearcher usually attempts to trans-form them in such a way that theassumptions are tenable, or he maylook elsewhere for a statistical test.The latter alternative has becomepopular because of the proliferationof the so-called nonparametric ordistribution-free methods. Thesetechniques quite generally, however,couple their freedom from restrictingassumptions with a disdain for muchof the information contained withinthe data. For example, by classifyingscores into groups above and belowthe median one ignores the fact thatthere are intracategory differencesbetween the individual scores. As aresult, tests which make no assump-tions about the distribution fromwhich one is sampling will tend notto reject the null hypothesis when itis actually false as often as will thosetests which do make assumptions.This lack of power of the nonpara-

1 This project was undertaken while theauthor was a Public Health Service ResearchFellow of the National Institute of MentalHealth at Duke University. The computa-tions involved in this study were performed inthe Duke University Digital Computing Lab-oratory which is supported in part by Na-tional Science Foundation Grant G-6694.The author wishes to express his appreciationto Thomas M. Gallie, Director of the Labora-tory, for his cooperation and assistance.

metric tests is a decided handicapwhen, as is frequently the case inpsychological research, a modicumof reinforcement in the form of anoccasional significant result is re-quired to maintain the research re-sponse.

Confronted with this discouragingprospect and a perhaps equally dis-couraging one of laboriously trans-forming data, performing relatedtests, and then perhaps having diffi-culty in interpreting results, the re-searcher is often tempted simply toignore such considerations and goahead and run a t test or analysis ofvariance. In most cases, he is de-terred by the feeling that such aprocedure will not solve the problem.If a significant result is forthcoming,is it due to differences between means,or is it due to the violation of assump-tions? The latter possibility is usuallysufficient to preclude the use of thet or F test.

It might be suspected that onecould finesse the whole problem ofuntenable assumptions by betterplanning of the experiment or by amore judicious choice of variables,but this may not always be the case.Let us examine the assumptions moreclosely. It will be recalled that boththe t test and the closely related Ftest of analysis of variance are pre-dicated on sampling from a normaldistribution. A second assumptionrequired by the derivations is thatthe variances of the distributionsfrom which the samples have beentaken is the same (assumption ofhomogeneity of variance). Thirdly,it is necessary that scor.es used in

49

Page 2: THE EFFECTS OF VIOLATIONS OF ASSUMPTIONS UNDERLYING THE · 2017-10-24 · PSYCHOLOGICAL BULLETIN Vol. 37, No. 1,1960 THE EFFECTS OF VIOLATIONS OF ASSUMPTIONS UNDERLYING THE t TEST1

so C. ALAN BONEAU

the test exhibit independent errors.The third assumption is usually notrestrictive since the researcher canreadily conduct most psychologicalresearch so that this requirement issatisfied. The first two assumptionsdepend for their reasonableness inpart upon the vagaries inherent inempirical data and the chance shapeof the sampling distribution. Certainsituations also arise frequently whichtend to produce results having in-trinsic non-normality or heterogene-ity of variance. For example, earlyin a paired-associate learning task,before much learning has taken place,the modal number of responses for agroup will be close to zero and anydeviations will be in an upward direc-tion. The distribution of responseswill be skewed and will have a smallvariance. With a medium numberof trials, scores will tend to be spreadover the whole possible range with amode at the center, a more nearlynormal distribution than before, butwith greater variance. When thetask has been learned by most of thegroup, the distribution will be skeweddownward and with smaller variance.In this particular case, one wouldprobably more closely approximatenormality and homogeneity in thedata by using some other measure,perhaps number of trials for mastery.In many situations this option maynot be present.

There is, however, evidence thatthe ordinary t and F tests are nearlyimmune to violation of assumptionsor can easily be made so if precau-tions are taken (Pearson, 1931; Bart-lett, 1935; Welch, 1937; Daniels,1938; Quensel, 1947; Gayen, 1950a,1950b; David & Johnson, 1951; Hor-snell, 1953; Box, 1954a, 1954b; Box& Anderson, 1955). Journeymanpsychologists have been apprised ofthis possibility by Lindquist (1953)who summarizes the results of a

study by Norton (1951). Norton'stechnique was to obtain samples ofFs by means of a random samplingprocedure from distributions havingthe same mean but which violatedthe assumptions of normality andhomogeneity of variance in prede-termined fashions. As a measure ofthe effect of the violations, Nortondetermined the obtained percentageof sample Fs which exceeded the the-oretical 5% and 1% values from theF tables for various conditions. Ifthe null hypothesis is true, and if theassumptions are met, the theoreticalvalues are F values which would beexceeded by chance exactly 5% or1% of the time. The discrepancy be-tween these expected percentages andthe obtained percentages is one use-ful measure of the effects of the viola-tions.

Norton's results may be summa-rized briefly as follows: (a) When thesamples all came from the samepopulation, the shape of the distribu-tion had very little effect on the per-centage of F ratios exceeding thetheoretical limits. For example, forthe 5% level, the percentages ex-ceeding the theoretical limits were7.83% for a leptokurtic populationas one extreme discrepancy and 4.76%for an extremely skewed distributionas another, (b) For sampling frompopulations having the same shapebut different variances, or havingdifferent shapes but the same vari-ance, there was little effect on theempirical percentage exceeding the-oretical limits, the average being be-tween 6.5% and 7.0%. (c) For sam-pling from populations with differentshapes and heterogeneous variances,a serious discrepancy between theo-retical and obtained percentages oc-curred in some instances. On thebasis of these results, Lindquist (1953p. 86) concluded that "unless theheterogeneity of either form or vari-

Page 3: THE EFFECTS OF VIOLATIONS OF ASSUMPTIONS UNDERLYING THE · 2017-10-24 · PSYCHOLOGICAL BULLETIN Vol. 37, No. 1,1960 THE EFFECTS OF VIOLATIONS OF ASSUMPTIONS UNDERLYING THE t TEST1

VIOLATIONS OF ASSUMPTIONS UNDERLYING THE t TEST 51

ance is so extreme as to be readilyapparent upon inspection of the data,the effect upon the F distribution willprobably be negligible."

This conclusion has apparently hadsurprisingly little effect upon thestatistical habits of research workers(or perhaps editors) as is evidentfrom the increasing reliance upon theless powerful nonparametric tech-niques in published reports. Thepurpose of this paper is to expoundfurther the invulnerability of the ttest and its next of kin the F test toordinary onslaughts stemming fromviolation of the assumptions of nor-mality and homogeneity. In part,this will be done by reporting resultsof a study conducted by the authordealing with the effect on the t testof violation of assumptions. In addi-tion, supporting evidence from amathematical framework will be usedto bolster the argument.

To temper any imputed dogmatismin the foregoing, it should be empha-sized that there are certain restric-tions which preclude an automaticutilization of the t and F tests with-out regard for assumptions even whenthese tests are otherwise applicable.It is apparent, for example, that theviolation of the homogeneity of vari-ance assumption is drastically dis-turbing to the distribution of t's andFa if the sample sizes are not the samefor all groups, a possibility which wasnot considered in the Norton study.It also seems clear that in cases ofextreme violations, one must have asample size large enough to allowthe statistical effects of averagingto come into play. The need for suchconsiderations will be made apparentin the ensuing discussion. There isabundant evidence, however, thatboth the t and the F tests are muchless affected by extreme violationsof the assumptions than has beengenerally realized.

A SAMPLING EXPERIMENTAt this point we will concern our-

selves with the statement of theresults of a random sampling study.The procedure is one of computing alarge number of t values, each basedupon samples drawn at random fromdistributions having specified char-acteristics, and constructing a fre-quency distribution of the obtainedt's. The present study was performedon the IBM 650 Electronic Com-puter programmed to perform thenecessary operations which can besummarized as follows: (a) the gener-ation of a random number, (&) thetransformation of the random num-ber into a random deviate from theappropriate distribution, (c) the suc-cessive accumulation of the sums andsums of squares of the random devi-ates until the appropriate sample sizeis reached, (d) the computation of at based upon the sums and sums ofsquares of the two samples, (e) thesorting of the t's on the basis of sizeand sign and the construction of afrequency distribution based uponthe sorting operation. The completesequence of operations was performedinternally, the end result, the fre-quency distribution of 1000 t's, beingpunched out on IBM cards.

Comments on many of the aboveoperations are relevant and will bemade according to their order above.

(a) The random numbers con-sisted of 10 digits, the middle 10digits of the product of the previouslygenerated random number and of oneof a sequence of 10 permutations ofthe 10 digits (0, 1, 2, • • • , 9) placedas multipliers in the machine. Tostart the process it was necessary toplace in the machine a 10 digit ran-dom number selected from a table ofrandom numbers. The randomnessof numbers generated in such afashion was checked by sorting 5000of them into 50 categories on the

Page 4: THE EFFECTS OF VIOLATIONS OF ASSUMPTIONS UNDERLYING THE · 2017-10-24 · PSYCHOLOGICAL BULLETIN Vol. 37, No. 1,1960 THE EFFECTS OF VIOLATIONS OF ASSUMPTIONS UNDERLYING THE t TEST1

52 C. ALAN BONEAU

basis of the first 2 digits. A x2 wascomputed to determine the fit of theobtained distribution to a theoreticalone consisting of 100 scores in eachof the SO categories. The obtained x2

of 47.83 is extremely close to the49.332 value, which is the theoreticalmedian of the x2 distribution withSO degrees of freedom.

(6) In order to obtain the randomdeviates (the individual randomscores from the appropriate popula-tion), the random numbers obtainedin the above fashion were consideredto be numbers between 0 and 1 andinterpreted as the cumulative prob-ability for a particular score from theprescribed population. From a tableentered in the machine, a randomdeviate having that probability wasselected. This is identical with theprocedure one uses in entering theordinary z table to determine thescore below which, say, 97.5% ofthe scores in the distribution lie. Theobtained value, 1.96, is the deviatecorresponding to that cumulativepercentage. The distribution of such

in the computer and were so arrangedthat the mean of each distributionwas 0 and the variance 1. To verifythese values, population means andvariances based on samples of 5000deviates from each of the three popu-lations were estimated by the usualformulas. The results were for thenormal distribution a sample meanof .0024 and a variance of 1.0118,for the exponential a mean of .0128and a variance of 1.0475, and for therectangular a mean of —.0115 and avariance of .9812. All of these resultscould quite easily have arisen fromrandom sampling from distributionshaving the assumed characteristics.To change the size of the variance ofthe population, all deviates weremultiplied when necessary by a con-stant, in this case, the number 2. Theresulting distribution has a mean of0 and variance of 4. The only vari-ances used in this study were 1 and 4.

(c) The sample sizes selected were5 and 15.

(d) The formula used for the com-putation of / was the following:

/=-2 - Ni M i2+SX2

2 -

Ni+Nt-2

/ I 1 \•(ft-1-*)

deviates from a normal populationobtained by using a large randomsample of probabilities is normallydistributed. Similar tables can beconstructed for other populations.The populations selected for thisstudy were the normal, the expo-nential (J-shaped with a skew to theright) having a density function ofy = e~x, and the rectangular or uni-form distribution. These distribu-tions represent extremes of skewnessand flatness to compare with thenormal. The tables of deviates cor-responding to each of the selected dis-tributions were contained internally

where MI and Mi are the means ofthe first and second samples and N\and Nt are the respective samplesizes. This expression, or an equiva-lent statement of it, is found in anystatistics book and is undoubtedlyemployed in a preponderance of theresearch in which a / test involvingnonrelated means is used. As pointedout in most statistic texts, this testis not appropriate when variancesare different. Tests are availablewhich are more or less legitimate un-der these conditions, but a certainamount of approximation is involvedin them. It was felt, however, that

Page 5: THE EFFECTS OF VIOLATIONS OF ASSUMPTIONS UNDERLYING THE · 2017-10-24 · PSYCHOLOGICAL BULLETIN Vol. 37, No. 1,1960 THE EFFECTS OF VIOLATIONS OF ASSUMPTIONS UNDERLYING THE t TEST1

VIOLATIONS OF ASSUMPTIONS UNDERLYING THE t TEST S3

the ordinary t test might under someconditions be as good an approxima-tion as the more complex forms of ttests and that a verification of thisnotion was desirable. In addition,the above formula makes use of apooled estimate of variance for theerror term and in this respect is simi-lar to the F test of analysis of vari-ance. Because of this fact, certainresults can be generalized from the tto the F test.

To summarize, random sampleswere drawn from populations whichwere either normal, rectangular, orexponential with means equal to 0and variances of 1 or 4. For severalcombinations of forms and variances,t tests of the significance of the differ-ence between sample means werecomputed using combinations of thesample sizes 5 and IS. For each ofthese combinations, frequency dis-tributions of sample t's were obtainedon the IBM 650 Electronic Com-puter.

RESULTSThe results of the sampling study

will be presented in part as a seriesof frequency distributions in the formof bar graphs of the obtained distri-tribution of t's for a particular condi-dition. Upon these have been super-imposed the theoretical t distributioncurve for the appropriate degrees offreedom. This furnishes a rapid com-parison of the extent to which the em-pirical distribution conforms to thetheoretical.

First we shall consider those combi-nations possible when both of thesamples are from normal distribu-tions but variances and sample sizesmay vary. Next will be consideredthe results of sampling from non-normal distributions, but both sam-ples are from the same type of dis-tribution. Finally we deal with theresults of sampling from two differ-

ent kinds of populations, for example,one sample from the normal distri-bution, and another from the ex-ponential.

Potentially, a very large number ofsuch combinations are possible. Limi-tations of the time available on thecomputer necessitated a paring downto a reasonable number. Althoughthe computer is relatively fast whenoptimally programmed, it neverthe-less required almost an hour, on theaverage, to complete a frequencydistribution of 1000 t's. The combi-nations presented here are thosewhich seemed most important at thetime the study was made.

As a measure of the effect of viola-tion of assumptions, the percentage ofobtained t's which exceed the theoret-ical values delineating the middle95% of the* distribution is used. For8, 18, and 28 df which arise in thepresent study, the correspondingvalues are respectively ±2.262,±2.101, and ±2.048. If the as-sumptions are met, and if the nullhypothesis of equality of means istrue, 5% of the obtained t's shouldfall outside these limits. The differ-ence between this nominal value andthe actual value obtained by sam-pling should be a useful measure of thedegree to which violation of as-sumptions changes the distributionof t scores. There is, of course, a ran-dom quality to the obtained percent-age of t's falling outside the theoreticallimits. Hence, the obtained valueshould be looked upon as an approxi-mation to the true value whichshould lie nearby.

In the figures and in the text, thevarious combinations of population,variance, and sample size will berepresented symbolically in the fol-lowing form: £(0, 1)5--ZV(0, 4)15.Here the letters E, N, and R refer tothe population from which the sam-ple was drawn, E for exponential-

Page 6: THE EFFECTS OF VIOLATIONS OF ASSUMPTIONS UNDERLYING THE · 2017-10-24 · PSYCHOLOGICAL BULLETIN Vol. 37, No. 1,1960 THE EFFECTS OF VIOLATIONS OF ASSUMPTIONS UNDERLYING THE t TEST1

54 C. ALAN BONEAU

FIG. 1. Empirical distribution of t's fromN(Q, 1)5-N(0, 1)5 and theoretical distribu-tion with 8 df.

N for normal, and R for rectangular.The first number in the parenthesis isthe mean of the population distribu-tion, in all cases zero, while thesecond number is the variance. Thenumber following the parenthesis isthe sample size for that particularsample. In the example above, thefirst sample is of Size 5 from an ex-ponential distribution having a vari-ance of 1. The second sample is froma normal distribution with varianceof 4 and the sample size is 15.

Sampling from Normal DistributionsIn order to justify the random

sampling approach utilized in thisstudy, and partly to confirm thefaith placed in the tabled values ofthe mathematical statisticians, theinitial comparisons are between thetheoretical distributions and the ob-tained distributions with assumptionsinviolate. Figures 1 and 2 exhibitthe empirical distributions of t's whenboth samples are taken from the samenormal distribution with zero meanand unit variance—designated N(0,1). In Fig. 1 both samples are ofSize 5, while both are 15 in Fig. 2.The theoretical curves, one for 8 df,the other for 28, represent quite wellthe obtained distributions. Ordi-nates approximately two units fromthe mean of the theoretical distribu-tions mark off the respective 5% lim-

its for rejecting the null hypothesis.In Fig. I,2 5.3% of the obtained t'sfall outside these bounds, while inFig. 2 only 4.0% of the sample t'sare in excess. Since in both casesthe expected value is exactly 5%, wemust attribute the discrepancy torandom sampling fluctuations. Thesize of these discrepancies should beuseful measures in evaluating thediscrepancies which will be encoun-tered under other conditions of sam-pling. For examples of 2000 t's a dis-crepancy as large as 1% from thenominal 5% value evidently occursfrequently, and for this reason shouldnot be considered as evidence to re-ject the theoretical distribution as anapproximation to the empirical one.

As an initial departure from thesimplest cases just presented, Fig. 3compares theoretical and empiricaldistributions when samples are takenfrom the same N(0, 1) population,but the first sample size is 5, the sec-ond is 15—that is, N(0,1)5-^(0,1)15.While this in no sense is a violationof the assumptions of the t test, it isinteresting to note that again sam-pling fluctuations have produced anempirical distribution with 4.0% ofthe t's falling outside the nominal 5%limits.

1 The numbers in the tails of some of thefigures report the number of obtained t's fall-ing outside the boundaries.

FIG. 2. Empirical distribution of t's fromJV(0, 1)15-JV(0, 1)15 and theoretical distribu-tion with 28 df.

Page 7: THE EFFECTS OF VIOLATIONS OF ASSUMPTIONS UNDERLYING THE · 2017-10-24 · PSYCHOLOGICAL BULLETIN Vol. 37, No. 1,1960 THE EFFECTS OF VIOLATIONS OF ASSUMPTIONS UNDERLYING THE t TEST1

VIOLATIONS OF ASSUMPTIONS UNDERLYING THE t TEST 55

The violation of the assumption ofhomogeneity of variance has effects asdepicted in Fig. 4. Here the ob-tained distribution is based upontwo samples of Size 5, one fromJV(0, 1) and the other from N(Q, 4).The fit is again seen to be close be-tween theoretical and empirical dis-tributions, and 6.4% of the obtainedt's exceed the theoretical 5% limits.By increasing the sample size to 15,a distribution results (not shown here)for which only 4.9% of the *'s falloutside the nominal limits. It wouldseem that increasing the sample sizeproduces a distribution which con-forms rather closely to the t distri-bution. As will be seen later, this is aquite general result based uponmathematical considerations, the im-plications of which are important tothe argument. For the moment it isevident that differences in varianceat least in the ratio of 1 to 4 do notseriously affect the accuracy of prob-ability statements made on the basisof the t test.

This last conclusion is true only solong as the size of both samples is thesame. If the variances are different,with the present set of conditionsthere are two combinations of vari-ance and sample size possible. Inone case the first sample may be ofSize S and drawn from the popula-tion with the smaller variance, while

ISO

! 120-

!'eo

«0

-4 3 3 ^ 5 1 t T iVALUE OF t

FIG. 4. Empirical distribution of t'a fromN(Q, l)5-N(0, 4)5 and theoretical distribu-tion with 8 df.

the second sample of Size 15 isdrawn from the population havingthe larger variance—N(Q, 1)5-^V(0,4)15. In the second case the smallsample size is coupled with the largervariance, the larger sample size withthe smaller variance—A^O, 4)5-^(0,1)15. The respective results of suchsampling are presented in Fig. 5 and6. The empirical distributions areclearly not approximated by the tdistribution. For the distributionof Fig. 5, only 1% of the obtainedt's exceed the nominal 5% values,while in Fig. 6, 16% of the t's falloutside those limits.

There are good mathematical rea-sons why a difference in sample sizeshould produce such decided dis-crepancies when the variances are un-equal. Recall thatS(Z-M)2/(^-l)is an estimate of the variance of the

FIG. 3. Empirical distribution of f s from FIG. S. Empirical distribution of t's fromN(0, 1)5-N(Q, 1)15 and theoretical distribu- N(0, 1)5-N(0, 4)15 and theoretical distribu-tion with 18 df. ' tion with 18 df.

Page 8: THE EFFECTS OF VIOLATIONS OF ASSUMPTIONS UNDERLYING THE · 2017-10-24 · PSYCHOLOGICAL BULLETIN Vol. 37, No. 1,1960 THE EFFECTS OF VIOLATIONS OF ASSUMPTIONS UNDERLYING THE t TEST1

56 C. ALAN BONEAU

160-

120-

- 1 0 1

VALUE OFt

FIG. 6. Empirical distribution of t'a from(JV(0, 4)S-JV(0, 1)15 and theoretical distribu-tion with 18 df.

population from which the sample isdrawn. Hence, S^—M)2 will in thelong run be equal to (N— l)or2. Theformula used in this study for com-puting t makes use of this fact and,in addition, under the assumptionthat the variances of the populationsfrom which the two samples aredrawn are equal, pools the sum of thesquared deviations from the respec-tive sample means to get a betterestimate. That is S(/fi —.Mi)*+2(Xt — .M-j)2 is an estimate of

=<r2 (homogeneity of variance), thenthe sums estimate (Ni+Nt—2)ff2.Hence,

[1]Nt+Ni-2

is an estimate of <r2. If ff^^a^ theestimating procedure is patently il-legitimate, the resulting value de-pending in a large measure upon thecombination of sample size and vari-ance used. For example, the caseW(0,1)5-^(0,4) 15 has M = 5, W2 = 15,<n2 = l, era2 = 4, and #1+^2-2 = 18.With these values, Formula 1 hasan expected value of [(4-l)-f-(14-4)]/18 = 3.33. Using the appropriatevalues for the other situation, N(Q,4)5-N(0, 1)15, the result of formula1 is [(4-4) + (14-l)]/18 = 1.67. Thismeans that on the average, the de-

nominator for the t test will be largerfor the first case than for the second.If the sample differences betweenmeans were of the same magnitudefor the two cases, obviously more"significant" t's would emerge whenthe denominator is smaller. It sohappens that when this latter condi-tion exists, the variance of the numer-ator also tends to be greater thanin the other condition, a fact whichaccentuates the differences betweenthe two empirical distributions.

Welch (1937) has shown mathe-matically that in the case of samplesizes of 5 and 15, a state which pre-vails here, the percentage of t's ex-ceeding the nominal 5% value variesas a function of the ratio of the twopopulation variances and can be aslow as 0% and as high as 31.3%.If Ni=N<i there is never much bias,except perhaps in the case in whichthe sample sizes are both 2. ForNi = Nt = lQ, the expected value ofthe percentage of t's exceeding thenominal 5% limits varies between 5%and 6.5% regardless of the differencebetween the variances. For largersample sizes, the discrepancy tendsto be even less.

Since the pooling procedure for esti-mating the population variance isused in ordinary analysis of variancetechniques, it would seem that thecombination of unequal variances andunequal sample sizes might playhavoc with F test probability state-ments. That is, a combination oflarge variance and large sample sizeshould tend to make the F test moreconservative than the nominal valuewould lead one to expect, and, aswith the t test, small variance andlarge sample size should produce ahigher percentage of "significant"Fs than expected. These conclusionsare based upon a very simple exten-sion to more than two samples of theexplanation for the behavior of the

Page 9: THE EFFECTS OF VIOLATIONS OF ASSUMPTIONS UNDERLYING THE · 2017-10-24 · PSYCHOLOGICAL BULLETIN Vol. 37, No. 1,1960 THE EFFECTS OF VIOLATIONS OF ASSUMPTIONS UNDERLYING THE t TEST1

VIOLATIONS OF ASSUMPTIONS UNDERLYING THE t TEST 57

t test probabilities with unequal sam-ple sizes.

A more sophisticated mathematicalhandling of the problem by Box(19S4a) reaches much the same con-clusions for the simple-randomizedanalysis of variance. In a table in hisarticle are given exact (i.e., mathe-matically determined) probabilitiesof exceeding the 5% point when vari-ances are unequal. In this case,sampling is assumed to be from nor-mal distributions. If the samplesizes are the same, the probabilitygiven for equal sample sizes rangefrom 5.55% to 7.42%, for severalcombinations of variances, and num-bers of samples. If, when variancesare different, the samples are of differ-ent sizes, large discrepancies fromthe nominal values result. Combininglarge sample and large variance less-ens the probability of obtaining a"significant" result to much less than5%, just as we have seen for the ttest. In a subsequent article, Box(1954b) presents some results fromtwo-way analysis of variance. Sincethese designs generally have equalcell frequencies the results are nottoo far from expected. His figuresall run within 2% of the 5% valueexpected if all assumptions were met.

It would seem then that both em-pirically and mathematically therecan be demonstrated only a minoreffect on the validity of probabilitystatements caused by heterogeneityof variance, provided the sizes of thesamples are the same. This appliesto the F as well as the t test. If how-ever, the sample sizes are different,major errors in interpretation may re-sult if normal curve thinking is used.

Sampling from Identical Non-NormalDistributions: (Equal Variances)

Let us now proceed to violate theother main assumption, that of nor-mality of distribution from which

sampling takes place. At this timewe will consider the t distributionsarising when both samples are takenfrom the same non-normal distribu-tion. The distributions shown here,and all subsequent ones, are basedupon only 1000 t's, and hence willexhibit somewhat more column tocolumn fluctuation than the preced-ing distributions.

Figure 7 compares the theoreticalt distribution and the empirical dis-tribution obtained from two samplesof Size 5 from the exponential distri-bution—E(0, 1)5-£(0, 1)5. The fitis fairly close, but the proportion ofcases in the tails seems less for theempirical distribution than for thetheoretical. By count, 3.1% of theobtained t's exceed the nominal 5%values—that is, the test in this caseseems slightly conservative. If bothsample sizes are raised to 15 (distri-bution not shown here), the corre-sponding percentage of obtained t's is4.0%. While this is probably not anappreciably better fit than for sam-ples of Size 5, we shall see later thatthere are theoretical reasons to sus-pect that increasing the sample sizeshould better the approximation ofthe empirical curve by the theoreticalno matter what the parent popula-tion may be.

If both samples are of Size 5 fromthe same rectangular distribution—•R(0, 1)5-J?(0, 1)5—the result is as

80-

6 60.

aS40-|

20-

-I 0VALUE OF t

FIG. 7. Empirical distribution of t's from£(0, 1)S-JE(0, 1)5 and theoretical distribu-tion with 8 df.

Page 10: THE EFFECTS OF VIOLATIONS OF ASSUMPTIONS UNDERLYING THE · 2017-10-24 · PSYCHOLOGICAL BULLETIN Vol. 37, No. 1,1960 THE EFFECTS OF VIOLATIONS OF ASSUMPTIONS UNDERLYING THE t TEST1

58 C. ALAN BONEAU

depicted in Fig. 8. The fit of theoreti-cal curve to empirical data here is asgood as any thus far observed. Thepercentage of obtained t's exceedingthe 5% values is 5.1% in this particu-lar case. For the case in which thesample sizes are both 15 (not shownhere), the fit is equally good, with5.0% of the cases falling outside of thenominal 5% bounds.

Sampling from Non-Normal Distribu-tions: (Unequal Variances)

We may assume that if the vari-ances are unequal, and at the sametime the sample sizes are different,the resulting distributions from non-normal populations will be affectedin the same way as the distributionsderived from normal populations,and for the same reasons. These caseswill not be considered.

If sampling is in sizes of 5 fromtwo exponential distributions, onewith a variance of 1, and the other of4, a skewed distribution of obtainedt's emerges (not shown here). Weshall discover that a skewed distri-bution of t's generally arises whenthe sampling is from distributionswhich are different in degree of skew-ness or asymmetry. (For an explana-tion, see discussion of E(0, 1)5-N(Q, 1)5 below.) Apparently, theeffect of increasing the variance ofthe exponential distribution as in thepresent case—E(0, 1)5-JS(0, 4)5—isto make the negative sample meansarising from the distribution withlarger variance even more negativethan those from the distribution withsmaller variance. In terms of per-centage exceeding the nominal 5%limits for this case, the value is 8.3%,of which 7.6% comes from the skewedtail. This combination of variancesand distribution was not tested withlarger samples, but we shall see whencomparing exponential and normaldistributions that an increase in the

FIG. 8. Empirical distribution of t's fromR(0, 1)5-R(0, 1)5 and theoretical distribu-tion with 8 df.

sample size decreases the skew of theobtained t distribution there. Theo-retically, this decrease should occurin almost all cases, including the pres-ent one.

The result is much less compli-cated if, while variances are differ-ent, the sampling is from symmetricalrectangular distributions—^?(0, 1)-R(0, 4)5. For this small sample situ-ation, (not illustrated), there occursa distribution of obtained t's having7.1% of the values exceeding thenominal 5% points. This is roughlythe same magnitude as the corre-sponding discrepancy from normaldistributions. For the normal, it willbe recalled that an increase of thesample sizes to 15 decreased the ob-tained percentage to 4.9%. Thereis no reason to believe that increasingthe size of the rectangular sampleswould not have the same effect.However, time did not permit the de-terminaiton of this distribution.

Sampling from Two Different Distri-butions

By drawing the first sample froma distribution having one shape,and by drawing the second from adistribution having another shape(other than shape differences arisingfrom heterogeneity of variance), yetanother way has been found to doviolence to the integrity of the as-

Page 11: THE EFFECTS OF VIOLATIONS OF ASSUMPTIONS UNDERLYING THE · 2017-10-24 · PSYCHOLOGICAL BULLETIN Vol. 37, No. 1,1960 THE EFFECTS OF VIOLATIONS OF ASSUMPTIONS UNDERLYING THE t TEST1

VIOLATIONS OF ASSUMPTIONS UNDERLYING THE t TEST 59

sumptions underlying the t test. Per-haps the least violent of these hap-penings is that in which at least oneof the populations is normal.

When one sample is from the ex-ponential'distribution and the otherfrom the normal, the interesting re-sult shown in Fig. 9 occurs. This isthe small sample case—£(0, 1)5-N(0,1)S. It will be recalled thatfor skewed distributions the meanand median are at different points.In the exponential distributions, forexample, the mean is at the 63rd cen-tile. If samples from the exponentialdistribution are small, there will be atendency for the sample mean to beless than the population mean; obvi-ously sinee- nearly two thirds of thescores are below that mean. Sincethe population mean of the presentdistributions is 0, the result will be apreponderance of negative samplemeans for small samples. If the othersample is taken from a symmetricaldistribution, which would tend toproduce as many positive as negativesample means, the resulting distribu-tion of obtained t's would not balanceabout its zero point, an imbalance ex-acerbated by small samples. In Fig.9, 7.1% of the obtained cases fall out-side the 5% limits, with most, 5.6%,lying in the skewed tail. The effect ofincreasing the sample size to 15 is tonormalize the distribution consider-ably; the resulting curve, Fig, 10, is

80'

60-

-4 -3 -I -I 0

VALUE or t

FIG. 9. Empirical distribution of t'a from£(0, 1)S-JV(0, 1)5 and theoretical distribu-tion with 8 df.

FIG. 10.; Empirical distribution of t's fromE(0, 1)15-N(0, 1)15 and theoretical distribu-tion with 28 df.

fairly well approximated by the / dis-tribution. One of the tails, however,does contain a disproportionate shareof the cases, 4.2% to 0.9% for theother tail, or a total of 5.1% fallingoutside the nominal 5% limits. Nev-ertheless, the degree to which thetheoretical and empirical distribu-tions coincide under these conditionsis striking. It seems likely that ifboth samples were each of Size 25,the resulting sample distribution oft's would be virtually indistinguish-able from the t distribution for 48 df,or the next best thing, the normalcurve itself. To test this hypothesis,an additional empirical t distributionbased on sample sizes of 25 from thesesame exponential and normal popula-tions was obtained (not shown here).The results nicely confirm the pre-sumption. Comparison with theusual 5% values reveals 4.6% ofthe emjpirical t's surpassing them.Whereas with the smaller samples theratio of t's in the skewed tail to thosein the other tail is roughly 80:20, thecorresponding ratio for the largersample case is 59:41. Clearly, the in-crease in sample sizes has tended tonormalize the distribution of t's.

For these conditions, involvingrather drastic violation of the mathe-matical assumptions of the test, thet test has been observed to fare wellwith an adequate sample size. Such

Page 12: THE EFFECTS OF VIOLATIONS OF ASSUMPTIONS UNDERLYING THE · 2017-10-24 · PSYCHOLOGICAL BULLETIN Vol. 37, No. 1,1960 THE EFFECTS OF VIOLATIONS OF ASSUMPTIONS UNDERLYING THE t TEST1

60 C. ALAN BONEAU

a state of affairs is to be expectedtheoretically. By invoking a fewtheorems of mathematical statisticsit can be shown that if one samplesfrom any two populations for whichthe Central Limit Theorem holds,(almost any population that a psy-chologist might be confronted with),no matter what the variances may be,the use of equal sample sizes insuresthat the resulting distribution of t'swill approach normality as a limit.It would appear from the present re-sults that the approach to normalityis rather rapid, since samples of sizesof 15 are generally sufficient to undomost of the damage inflicted by vio-lation of assumptions. Only in ex-treme cases, such as the last whichinvolves distributions differing inskew, would it seem that slightlylarger sizes are prescribed. Thus itwould appear that the t test is func-tionally a distribution-free test, pro-viding the sample sizes are sufficient-ly large (say, 30, for extreme viola-tions) and equal.

The distributions arising whensampling is from the normal and therectangular distributions—N(0, 1)5-R(Q, 1)5 and N(0, 1)15-£(0, 1)15—would further tend to substantiatethis claim. The respective percent-ages exceeding the 5% nominalvalues are 5.6% and 4.6% fromthe empirical distributions for thesecases, the distributions of t's beingsymmetrical and close to the theo-retical (not shown).

The only other combination exam-ined in the sampling study is the un-interesting case of exponential andrectangular distributions. This dis-tribution (not shown) is again skewedwith the effect of increase of samplesize from 5 to 15 to cut down theskew and to decrease the percentageof cases falling outside the theoretical5% values from 6.4% to 5.6%. Forthose cases falling outside the nomi-

nal 5% values, the ratio is 79:21 forthe smaller samples. This is changedto 69:31 for the sample size of 15.Here again it would seem that largersample sizes would be required to in-sure the validity of probability state-ments utilizing the t distribution asa model.

The results of the total study aresummarized in Table 1 which givesfor each combination of population,variance, and sample size (a) the per-centage of obtained t'a falling outsidethe nominal 5% probability limits ofthe ordinary I distribution, and (&)the percentage of obtained t's fallingoutside the 1% limits. The combina-tions are represented symbolically asbefore. The table is divided into twoparts, the first part presenting infor-mation on the empirical distributionswhich are intrinsically symmetrical.The second part is based upon the in-trinsically nonsymmetripal distribu-tions, additional information in thissection of the table being the percent-age of obtained t's falling in the largerof the tails. The percentage for thesmaller tail may be obtained by sub-traction of the percentage in the larg-er tail from the total.

Certain implications of the tableshould be discussed. In the Nortonstudy, more severe distortions some-times occurred with significance lev-els of 1% and .1% than appearedwith the 5% level. The inclusion inTable 1 of the percentages of ob-tained t's falling outside the nominal1% values makes possible the com-parison of the 1% and 5% results.The 1% values seem to be approxi-mately what would be expected con-sidering that sampling fluctuationsare occurring. It was not felt feasibleto determine the results for the .1%level since with only 1000 or 2000cases the number of obtained t's fall-ing outside the prescribed limits wasnegligible in most cases. It is pos-

Page 13: THE EFFECTS OF VIOLATIONS OF ASSUMPTIONS UNDERLYING THE · 2017-10-24 · PSYCHOLOGICAL BULLETIN Vol. 37, No. 1,1960 THE EFFECTS OF VIOLATIONS OF ASSUMPTIONS UNDERLYING THE t TEST1

VIOLATIONS OF ASSUMPTIONS UNDERLYING THE t TEST 61

sible, however, that the distortions inthe apparent level of significance aremore drastic for the smaller a values.

All the results and discussionhave been limited thus far to thetwo-tailed t test. With notable ex-ceptions, the conclusions we havereached can be applied directly to theone-tailed t test as well. The ex-ceptions involve those distributionswhich are intrinsically asymmetric(see Table 1). In these distributionsa preponderance of the obtained t'sfall in one tail. Depending upon theparticular tail involved in the one-tailed test the use of t should producetoo many or too few significant re-

TABLE 1

OBTAINED PERCENTAGES OF CASES FALLINGOUTSIDE THE APPROPRIATE TABLED tVALUES FOR THE 5% AND 1% LEVEL

OF SIGNIFICANCE

SymmetricDistributions

Obtained Percentage at

5% level 1% level

N(0, 1)5-7V(0, 1)5N(Q, 1)15- N(Q, 1)15N(0, l)S-N(0, 1)15#(0,1)5-JV(0,4)5JV(0,1)15-N (0,4)15N(0, 1)5-JV(0,4)15JV(0, 4)5- JV(0, 1)15£(0, 1)5-3(0, 1)5£(0, 1)15-£(0, 1)15£(0, 1)5-J?(0, 1)5£(0, 1)15-J?(0, 1)15R(0, 1)5-R(0, 4)5N(0, l)5-jR(0, 1)5^(0,1)15-^(0,1)15

5.34.04.06.44.91.0

16.03.14.05.15.07.15.65.6

0.90.80.61.81.10.16.00.30.41.01.51.91.01.1

AsymmetricDistributions

Obtained Percentage at

5% level 1% level

Larg- Larg-Total er Total er

Tail TailE(Q, 1)5-JV(0,1)5 7.1 5.6 1.9 1.9E(0,1)15-JV(0,1)15 5.1 4.2 1.4 1.2£(0,1)25-N(Q, 1)25 4.6 2.7 1.3 1.1£(0,1)5-7?(0,1)5 6.4 5.0 3.3 2.5£(0,1)1S-J?(0,1)15 5.6 3.9 1.6 1.2£(0,1)5-£(0,4)5 8.3 7.6 1.7 1.7

suits when sampling is from a combi-nation of populations from which anasymmetric t distribution is expected.It seems impossible to make anysimple statements about the behav-ior of the tails in the general case ofasymmetric t distribution except tosay that such distributions are ex-pected whenever the skew of the twoparent populations is different. Theexperimenter must determine foreach particular instance the directionof skew of the expected distributionand act accordingly. Table 1 givesfor the intrinsically asymmetric dis-tributions the total percentage of ob-tained t's falling outside the theo-retical 5% and 1% limits and the per-centage in the larger tail. Fromthese values can be assessed the ap-proximate magnitude of the bias in-curred when a one-tailed test is usedin specific situations.

DISCUSSION AND CONCLUSIONSHaving violated a number of as-

sumptions underlying the t test, andfinding that, by and large, such vio-lations produce a minimal effect onthe distribution of f s , we must con-clude that the t test is a remarkably

fro6wf£}test in the technical sense ofthe word. This term was introducedby Box (1953) to characterize sta-tistical tests which are only inconse-quentially affected by a violation ofthe underlying assumptions. Everystatistical test is in part a test of theassumptions upon which it is based.For example, the null hypothesis of aparticular test may be concernedwith sample means. If, however, theassumptions underlying the test arenot met, the result may be "signifi-cant" even though the populationmeans are the same. If the statisti-cal test is relatively insensitive toviolations of the assumptions otherthan the null hypothesis, and, hence,if probability statements refer pri-

Page 14: THE EFFECTS OF VIOLATIONS OF ASSUMPTIONS UNDERLYING THE · 2017-10-24 · PSYCHOLOGICAL BULLETIN Vol. 37, No. 1,1960 THE EFFECTS OF VIOLATIONS OF ASSUMPTIONS UNDERLYING THE t TEST1

62 C. ALAN BONEA U

marily to the null hypotheses, it issaid to be robust. The t and jF testsapparently possess this" quality To ahigh degree.

In this particular context, an im-portant example of a test -lackingrobustness is Bartlett's test forhomogjmeity of variance (Bartlett,1937). Trox" (1953) has shown thatthis test is extremely sensitive tonon-normality and will under someconditions be prone to yield "signifi-cant" results even if variances areequal. For example, Box tables anumber of exact probabilities of ex-ceeding the 5% normal theory sig-nificance level in the Bartlett testfor various levels of Xs, the kurtosisparameter, for different quantities ofvariances being compared. As an ex-treme case, if Xa = 2 (i.e., a peakeddistribution) with 30 variances beingtested, the probability of rejectingthe hypothesis at the nominal .05level is actually .849. I fX 2 = -1 (i.e.,a flat distribution), the probability is.00001. Note that in both thesecases, all variances are actuallyequal. Box, realizing that in the caseof equal sample sizes the analysis ofvariance is affected surprisingly littleby heterogeneous variance and non-normality, concludes that the use ofthe nonrobust Bartlett test to "makethe preliminary test on variances israther like putting out to sea in arowing boat to find out whether con-ditions are sufficiently calm for anocean liner to leave port!" Appar-ently, as reported in this same arti-cle, other commonly used tests forevaluating homogeneity are subjectto the same weakness.

We may conclude that for a largenumber of different situations con-fronting the researcher, the use ofthe ordinary t test and its associatedtable will result in probability state-ments which are accurate to a highdegree, even though the assumptions

of homogeneity of variance and nor-mality of the underlying distribu-tions are untenable. This large num-ber of situations has the followinggeneral characteristics: (a) the twosample sizes are equal or nearly so,(b) the assumed underlying popula-tion distributions are of the sameshape or nearly so. (If the distribu-tions are skewed they should havenearly the same variance.) If theseconditions are met, then no matterwhat the variance differences maybe, samples of as small as five willproduce results for which the trueprobability of rejecting the null hy-pothesis at the .05 level will morethan likely be within .03 of that level.If the sample size is as large as 15,the true probabilities are quite likelywithin .01 of the nominal value.That is to say, the percentage oftimes the null hypothesis will be re-jected when it is actually true willtend to be between 4% and 6% whenthe nominal value is 5%.

If the sample sizes are unequal,one is in no difficulty provided thevariances are compensatingly equal.A combination of unequal samplesizes and unequal variances, how-ever, automatically produces inaccu-rate probability statements whichcan be quite different from the nomi-nal values. One must in this case re-sort to different testing procedures,such as those by Cochran and Cox(1950), Satterthwaite (1946), andWelch (1947). The Welch procedureis interesting since it has been ex-tended by Welch (1951) to cover thesimple randomized analysis of vari-ance which suffers the same defect asthe t test when confronted with bothunequal variance and unequal sam-ple sizes. The Fisher-Behrens pro-cedure suggested by many psycholog-ically oriented statistical textbookshas had its validity questioned (Bart-lett, 1936) and, hence, is ignored by

Page 15: THE EFFECTS OF VIOLATIONS OF ASSUMPTIONS UNDERLYING THE · 2017-10-24 · PSYCHOLOGICAL BULLETIN Vol. 37, No. 1,1960 THE EFFECTS OF VIOLATIONS OF ASSUMPTIONS UNDERLYING THE t TEST1

VIOLATIONS OF ASSUMPTIONS UNDERLYING THE t TEST 63

some statisticians (e.g., Anderson &Bancroft, 1952, p. 82).

If the two underlying populationsare not the same shape, there seemsto be little difficulty if the distribu-tions are both symmetrical. If theydiffer in skew, however, the distribu-tion of obtained t's has a tendency it-self to be skewed, having a greaterpercentage of obtained t's falling out-side of one limit than the other. Thismay tend to bias probability state-ments. Increasing the sample sizehas the effect of removing the skew,and, due to the Central Limit Theo-rem and others, the normal distribu-tion is approached by this maneuver.By the time the sample sizes reach25 or 30, the approach should be closeenough that one can, in effect, ignorethe effects of violations of assump-tions except for extremes. Since thisis so, the t test is seen to be func-tionally nonparametric or distribu-tion-free. It also retains its power insome situations (David & Johnson,1951). There is, unfortunately, noguarantee that the t and F tests areuniformly most powerful tests. It ispossible, even probable, that certain

of the distribution-free methods aremore powerful than the t and F testswhen sampling is from some unspeci-fied distributions or combination ofdistributions. At present, little canbe said to clarify the situation. Muchmore research in this area needs to bedone.

Since the / and F tests of analysisof variance are intimately related, itcan be shown that many of the state-ments referring to the t test can begeneralized quite readily to the Ftest. In particular, the necessity forequal sample sizes, if variances areunequal, is important for the samereasons in the F test of analysis ofvariance as in the t test. A numberof the cited articles have demon-strated both mathematically and bymeans of sampling studies that mostof the statements we have made doapply to the F test. It is suggestedthat psychological researchers feelfree to utilize these powerful tech-niques where applicable in a wid-er variety of situations, the pres-ent emphasis on the nonparametricmethods notwithstanding.

REFERENCES

ANDERSON, R. L., & BANCROFT, T. A. Sta-tistical theory in research. New York:McGraw-Hill, 1952.

BARTLETT, M. S. The effect of non-normalityon the /-distribution. Proc. Camb. Phil.Soc., 193S, 31, 223-231.

BARTLETT, M. S. The information availablein small samples. Proc, Camb. Phil. Soc.,1936, 32, S60-S66.

BARTLETT, M. S. Properties of sufficiencyand statistical tests. Proc. Roy. Soc. (Lon-don), 1937, 160, 268-282.

Box, G. E. P. Non-normality and tests onvariances. Biometrika, 1953, 40, 318-335.

Box, G. E. P. Some theorems on quadraticforms applied in the study of analysis ofvariance problems, I. Effect of inequalityof variance in the one-way classification.Ann. of math. Statist., 1954,25, 290-302.(a)

Box, G. E. P. Some theorems on quadratic

forms applied in the study of analysis ofvariance problems, II. Effects of inequalityof variance and of correlation between er-rors in the two-way classification. Ann. ofmath. Statist., 1954, 25, 484-498. (b)

Box, G. E. P., & ANDERSEN, S. L. Permuta-tion theory in the derivation of robust cri-teria and the study of departures from as-sumption. J. Roy. Statist. Soc. (Series B),1955, 17, 1-34.

COCHRAN, W. G., & Cox, G. M. Experimentaldesigns. New York: Wiley, 1950.

DANIELS, H. E. The effect of departures fromideal conditions other than non-normalityon the t and z tests of significance. Proc.Camb, Phil. Soc., 1938, 34, 321-328.

DAVID, F. N., & JOHNSON, N. L. The effectof non-normality on the power function ofthe /•'-test in the analysis of variance. Bio-metrika, 1951, 38, 43-57.

GAYEN, A. K. The distribution of the vari-

Page 16: THE EFFECTS OF VIOLATIONS OF ASSUMPTIONS UNDERLYING THE · 2017-10-24 · PSYCHOLOGICAL BULLETIN Vol. 37, No. 1,1960 THE EFFECTS OF VIOLATIONS OF ASSUMPTIONS UNDERLYING THE t TEST1

64 C. ALAN BONEAU

ance ratio in random samples of any sizedrawn from non-normal universes. Bio-metrika, 1950, 37, 236-255. (a)

GAYEN, A. K. Significance of difference be-tween the means of two non-normal sam-ples. Biometrika, 1950, 37, 399-408. (b)

HORSNELL, G. The effect of unequal groupvariances on the F-test for the homogeneityof group means. Biometrika, 1953, 40, 128-136.

LINDQUIST, E. F. Design and analysis of ex-periments in psychology and education. Bos-ton: Houghton Mifflin, 1953.

NORTON, D. W. An empirical investigation ofsome effects of non-normality and hetero-geneity on the F-distribution. Unpublisheddoctoral dissertation, State Univer. ofIowa, 1952.

PEARSON, E, S. The analysis of variance inthe case of non-normal variation. Bio-metriha, 1931, 23, 114-133.

QUENSEL, C. E. The validity of the Z-cri-terion when the variates are taken fromdifferent normal populations. Skand. Aktu-arietids, 1947, 30, 44-55.

SATTERTHWAITE, F. E. An approximate dis-tribution of estimates of variance compo-nents. Biomet. Butt., 1946, 2, 110-114.

WELCH, B. L. The significance of the differ-ence between two means when the popula-tion variances are unequal. Biometrika,1937, 29, 350-362.

WELCH, B. L. The generalization of Stu-dent's problem when several differentpopulation variances are involved. Bio-metrika, 1947, 34, 28-35.

WELCH, B. L. On the comparison of severalmean values: An alternative approach.Biometrika, 1951, 38, 330-336.

(January 28,1959)