Unequal Variance and ANOVA
©2005 Dr. B. C. Paul
ANOVA Assumptions
ANOVA assumes the populations sampled in each class are normally distributed
Also assumes that the variance of those distributions is the same
How can we check the homogeneity of variance assumption?
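One common check is Levene's test, which the story below uses through SPSS. The sketch here computes only the Levene W statistic, in plain Python, using absolute deviations from each group mean. The function name `levene_w` and the two small data lists are hypothetical illustrations, not Quincy's data. W is then compared with an F distribution on (k - 1, N - k) degrees of freedom, where k is the number of groups and N the total sample size; a large W argues against equal variances.

```python
from statistics import mean

def levene_w(groups):
    """Levene's W statistic using absolute deviations from each group mean."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Z values: absolute deviation of each observation from its own group mean
    z = [[abs(y - mean(g)) for y in g] for g in groups]
    z_bar_i = [mean(zi) for zi in z]                  # mean deviation per group
    z_bar = sum(sum(zi) for zi in z) / n_total        # overall mean deviation
    between = sum(len(zi) * (zb - z_bar) ** 2 for zi, zb in zip(z, z_bar_i))
    within = sum((v - zb) ** 2 for zi, zb in zip(z, z_bar_i) for v in zi)
    return (n_total - k) / (k - 1) * between / within

# Hypothetical rejection rates, invented for illustration only
tight = [10, 11, 9, 10, 12, 8]   # small spread
wide = [10, 20, 2, 15, 5, 8]     # same mean, much larger spread
w = levene_w([tight, wide])
print(round(w, 2))  # 6.67
```

Two groups with identical spread give W = 0, whatever their means are, because the test looks only at the deviations.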
Our Story
Quincy’s company has been outsourcing American jobs and wishes to know whether the number of rejected widgets differs by where his factories are located and what shift the workers are on. Quincy gathers data on the rejection rate at his three factories using 20 weekly rates each taken at random from the record.
Quincy’s data set
Quincy’s favorite plant is in Yucatan, Mexico (where he is forced to spend much of his time on visits at the beach). He labels this plant #1.
His day shift he calls 1.
His night shift he calls 2.
Quincy’s second plant is in Peoria, Illinois. The plant has sentimental value to him since he used to go there with his grandfather, but the workers there are uppity and want things like fair wages and safe working conditions. He labels this plant #2.
Quincy’s third plant is in Malaysia. Quincy does not wish to discuss what he does during visits to this plant.
Quincy Enters His Data Set
Data sheet columns: rejection rate per 1000 widgets; day or night shift; which plant.
Quincy Scratches His Head
What kind of test should he do?
There are too many plants and shifts here to try to do flocks of T tests.
Then he remembers ANOVA.
His variables shift and plant are really categories. The numerical values are arbitrary and not even ordered.
Only his # of rejects is ordered.
With two classes to divide by (shift and plant), Quincy decides to do a two-way ANOVA.
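For readers who want to see what the two-way ANOVA computes, here is a minimal sketch in plain Python. It assumes a balanced design (the same number of observations in every plant and shift cell). The function name `two_way_anova` and the data are hypothetical, not Quincy's records. It splits the total variation into plant, shift, plant-by-shift interaction, and error sums of squares, then forms the F ratios against the error mean square.

```python
from statistics import mean

def two_way_anova(cells):
    """Balanced two-way ANOVA. cells maps (plant, shift) -> list of observations."""
    plants = sorted({p for p, _ in cells})
    shifts = sorted({s for _, s in cells})
    n = len(cells[(plants[0], shifts[0])])      # replicates per cell (assumed equal)
    everything = [y for obs in cells.values() for y in obs]
    grand = mean(everything)
    plant_mean = {p: mean([y for s in shifts for y in cells[(p, s)]]) for p in plants}
    shift_mean = {s: mean([y for p in plants for y in cells[(p, s)]]) for s in shifts}
    cell_mean = {key: mean(obs) for key, obs in cells.items()}

    ss = {
        "plant": n * len(shifts) * sum((plant_mean[p] - grand) ** 2 for p in plants),
        "shift": n * len(plants) * sum((shift_mean[s] - grand) ** 2 for s in shifts),
        "plant*shift": n * sum(
            (cell_mean[(p, s)] - plant_mean[p] - shift_mean[s] + grand) ** 2
            for p in plants for s in shifts),
        "error": sum((y - cell_mean[key]) ** 2
                     for key, obs in cells.items() for y in obs),
    }
    df = {
        "plant": len(plants) - 1,
        "shift": len(shifts) - 1,
        "plant*shift": (len(plants) - 1) * (len(shifts) - 1),
        "error": len(everything) - len(plants) * len(shifts),
    }
    ms_error = ss["error"] / df["error"]
    f_ratio = {k: (ss[k] / df[k]) / ms_error
               for k in ("plant", "shift", "plant*shift")}
    return ss, df, f_ratio

# Hypothetical rejection rates per 1000 widgets: (plant, shift) -> observations
cells = {
    (1, 1): [30, 32, 34], (1, 2): [40, 44, 42],
    (2, 1): [10, 12, 11], (2, 2): [12, 11, 13],
    (3, 1): [22, 24, 26], (3, 2): [41, 39, 43],
}
ss, df, f_ratio = two_way_anova(cells)
for effect in ("plant", "shift", "plant*shift"):
    print(effect, round(ss[effect], 2), df[effect], round(f_ratio[effect], 2))
```

Each F ratio is referred to an F distribution with that effect's degrees of freedom in the numerator and the error degrees of freedom in the denominator. The shared error mean square in the denominator is exactly where the equal-variance assumption enters.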
Reaching for his trusty SPSS program Quincy Begins
Quincy clicks Analyze to pull down the menu.
He highlights General Linear Model to bring up the pop-out menu.
He then highlights and clicks Multivariate.
Quincy picks # of rejects for his variable and shift and plant as his fixed factors.
This Time Quincy Looks at Options
Quincy wants to see his means displayed.
But most important, he wants to use Levene’s test to check that the variance is the same for all the plants and shifts.
Quincy clicks continue and OK and the program is off to the races
It starts with the killjoy special of the day. Levene’s test says there is no way the variance is the same for all the categories.
Then it gives a result it has already called into question.
The most powerful effect was which plant. (Oh nuts, outsourcing does seem to change reject rate.)
Shift was the next most important.
But the response to shift work also varies by where the plant is located.
Importance of the controlling variables
About 71% of the variation can be traced to which plant and which shift is working.
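Assuming the 71% figure is the R squared that SPSS reports with the ANOVA table (the transcript does not say which statistic it is), it is the share of the total sum of squares accounted for by the model terms for plant, shift, and their interaction:

```latex
R^2 = \frac{SS_{\text{model}}}{SS_{\text{total}}}
    = 1 - \frac{SS_{\text{error}}}{SS_{\text{total}}}
```

Here the total is the corrected total sum of squares, so an R squared of about 0.71 would leave roughly 29% of the variation in the error term.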
Shift Effects
The 95% confidence intervals for the mean of day shift and night shift do not come close to overlapping.
The values suggest night shift messes up more than day shift.
This Really Sucks
For Quincy’s foreign plants the 95% confidence intervals overlap, so Quincy may not be able to be certain the foreign plants are different.
However, his U.S. plant has a distinctly lower rejection rate than the foreign plants.
Quincy’s outsourcing has impacted rejection rate.
Moving on to Plant and Shift Interactions.
The U.S. plant on either day shift or night shift has much lower rejection than any of the foreign plants.
We also cannot be sure that there is an effect of shift in the U.S. operations.
Continued Inspection
On night shift the screw-up rate is similar for Malaysia and Mexico.
On day shift, however, Malaysia shows better performance than Mexico.
Usefulness of Means Displays
Plots of the individual classes with confidence intervals give you an at-a-glance view of which specific differences are driving the test results.
Bad news is those confidence intervals used a common standard error, which Levene’s test says is wrong.
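To see why a common standard error matters, the sketch below compares the pooled standard error (one value built from the error mean square and shared by every cell) with standard errors computed separately for each cell. The cell names and numbers are hypothetical, chosen so that one cell is much noisier than the other.

```python
from math import sqrt
from statistics import variance

# Hypothetical rejection rates for two cells with very different spread
cells = {
    "US day": [10, 11, 9, 10, 12, 8],
    "Mexico day": [30, 45, 22, 38, 25, 20],
}
n = 6  # observations per cell (equal in this sketch)

# Pooled error mean square: with equal n it is the average of the cell variances
mse = sum(variance(obs) for obs in cells.values()) / len(cells)
pooled_se = sqrt(mse / n)

# Standard error of each cell mean using only that cell's own variance
separate_se = {name: sqrt(variance(obs) / n) for name, obs in cells.items()}

print("pooled:", round(pooled_se, 3))
for name, se in separate_se.items():
    print(name, round(se, 3))
```

With unequal variances the pooled value lands between the two: it makes the interval for the quiet cell too wide and the interval for the noisy cell too narrow.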
What Happens When Assumptions Fail?
We have a very clear-cut result from our data, but unfortunately its legitimacy has been called into question.
The more good information we have, the closer we can peg our answers.
We saw that adding samples improved the certainty of our conclusions.
We saw with the Behrens-Fisher T test that when we lost homogeneity of variance we were less sure of our values than before.
We know we have lost something here also.
General models and methods don’t help us know exactly how much and don’t tell us what to do about it.
We have been warned of an error and we know the direction that the error will push our answers.
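The loss of certainty in the two-sample case can be made concrete. The sketch below uses Welch's approximation, one common approach to the Behrens-Fisher problem; the version used earlier in this course may differ. The function name `welch_t` and the data are hypothetical. The point to watch is the degrees of freedom: with unequal variances they drop below the pooled value of n1 + n2 - 2, which widens the confidence interval.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(x, y):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    vx = variance(x) / len(x)   # squared standard error of the mean of x
    vy = variance(y) / len(y)   # squared standard error of the mean of y
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df

# Hypothetical rejection rates with clearly unequal spread
us = [10, 11, 9, 10, 12, 8]
mexico = [30, 45, 22, 38, 25, 20]
t, df = welch_t(us, mexico)
pooled_df = len(us) + len(mexico) - 2
print(round(t, 2), round(df, 2), pooled_df)
```

Fewer effective degrees of freedom mean a larger critical t value, so the same difference in means is held to a stricter standard.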
So Are the U.S. Plants Better Or Not?
With the high certainty levels here and fairly good sample size we probably can still be sure about our U.S. plants.
If things were close we simply would not know for sure with the analysis and methods we have so far.
What about the homogeneous error variance in regression?
In many regression analyses the largest source of poorly distributed error variance is lack of fit of the model.
We already know how to bring in other variables and look for non-linear effects.
If variance still changes, it is often that larger numbers have greater variance.
Some sort of data normalization can help this.
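The slide does not say which normalization it has in mind. A log transform is one common choice when the spread grows in proportion to the size of the values, and the sketch below assumes that case. The data are artificial: three groups built by scaling the same set of hypothetical factors, so the raw standard deviation grows with the mean while the standard deviation of the logs stays the same.

```python
from math import log
from statistics import stdev

# Artificial data: the same relative scatter applied at three different scales
factors = [0.8, 0.9, 1.0, 1.1, 1.25]
groups = {base: [base * f for f in factors] for base in (10, 100, 1000)}

raw_sd = {base: stdev(values) for base, values in groups.items()}
log_sd = {base: stdev([log(v) for v in values]) for base, values in groups.items()}

for base in groups:
    print(base, round(raw_sd[base], 3), round(log_sd[base], 4))
```

Real data will not line up this neatly, so it is worth re-checking the variance (for example with Levene's test or a residual plot) after any transform.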