
Unequal Variance and ANOVA

©2005 Dr. B. C. Paul

ANOVA Assumptions

ANOVA assumes the populations sampled in each class are normally distributed.

It also assumes that the variance of those distributions is the same.

How can we check the homogeneity-of-variance assumption?

Our Story

Quincy’s company has been outsourcing American jobs, and he wishes to know whether the location of his factories and the shift the workers are on make a difference in the number of rejected widgets. Quincy gathers data on the rejection rate at his three factories, using 20 weekly rates, each taken at random from the records.

Quincy’s data set

Quincy’s favorite plant is in Yucatan, Mexico (where he is forced to spend much of his time on visits at the beach). He labels this plant #1. His day shift he calls 1, and his night shift he calls 2.

Quincy’s second plant is in Peoria, Illinois. The plant has sentimental value to him since he used to go there with his grandfather, but the workers there are uppity and want things like fair wages and safe working conditions. He labels this plant #2.

Quincy’s third plant is in Malaysia. Quincy does not wish to discuss what he does during visits to this plant.

Quincy Enters His Data Set

His columns are the rejection rate per 1000 widgets, day or night shift, and which plant.

Quincy Scratches His Head

What kind of test should he do? There are too many plants and shifts here to try to do flocks of t tests. Then he remembers ANOVA.
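To see why flocks of t tests are a problem, here is a rough sketch. With 3 plants and 2 shifts there are 6 plant-shift cells, so comparing every pair takes 15 t tests. If we assume (as a simplification) that the tests are independent and each uses α = 0.05, the chance of at least one false alarm is 1 − 0.95^15, or roughly 54%. The function name `familywise_error` is hypothetical, chosen for illustration.

```python
from math import comb

def familywise_error(n_groups: int, alpha: float = 0.05) -> tuple[int, float]:
    """Return (number of pairwise tests, chance of at least one false positive).

    Assumes the pairwise tests are independent, which is only an
    approximation: tests that share a group are correlated.
    """
    n_tests = comb(n_groups, 2)          # every pair of groups
    p_any = 1 - (1 - alpha) ** n_tests   # complement of "no false alarms at all"
    return n_tests, p_any

# 3 plants x 2 shifts = 6 plant-shift cells
n_tests, p_any = familywise_error(3 * 2)
print(n_tests, round(p_any, 3))
```

ANOVA sidesteps this by asking one overall question per effect with a single F test.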

His variables shift and plant are really categories. The numerical values are arbitrary and not even ordered.

Only his # of rejects is ordered. With two classes to divide by (shift and plant), Quincy decides to do a two-way ANOVA.
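Under the hood, a two-way ANOVA splits the total variation in # of rejects into a piece for each factor, a piece for their interaction, and leftover error. Below is a minimal, dependency-free sketch for a balanced design (the same number of observations in every plant-shift cell); that is an assumption of the sketch, since SPSS also handles unbalanced data. Factor A stands for plant and factor B for shift, and the numbers are made up for illustration, not Quincy’s data. P-values would need the F distribution (for example `scipy.stats.f.sf`), which is left out here.

```python
from statistics import mean

def two_way_anova(cells: dict[tuple[str, str], list[float]]) -> dict[str, tuple[float, int]]:
    """Sums of squares and degrees of freedom for a balanced two-way ANOVA.

    `cells` maps (level of factor A, level of factor B) -> observations.
    Assumes every A x B combination is present and every cell has the
    same number of observations (a balanced design).
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))  # observations per cell
    if any(len(v) != n for v in cells.values()):
        raise ValueError("this sketch only handles balanced designs")

    all_obs = [x for v in cells.values() for x in v]
    grand = mean(all_obs)
    cell_mean = {k: mean(v) for k, v in cells.items()}
    # marginal means: pool every observation at a given level of A (or of B)
    a_mean = {a: mean([x for (ai, _), v in cells.items() if ai == a for x in v])
              for a in a_levels}
    b_mean = {b: mean([x for (_, bi), v in cells.items() if bi == b for x in v])
              for b in b_levels}

    p, q = len(a_levels), len(b_levels)
    ss_a = n * q * sum((a_mean[a] - grand) ** 2 for a in a_levels)
    ss_b = n * p * sum((b_mean[b] - grand) ** 2 for b in b_levels)
    # interaction: what the cell means do beyond the two main effects
    ss_ab = n * sum((cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
                    for a in a_levels for b in b_levels)
    ss_err = sum((x - cell_mean[k]) ** 2 for k, v in cells.items() for x in v)
    ss_tot = sum((x - grand) ** 2 for x in all_obs)

    return {
        "A": (ss_a, p - 1),
        "B": (ss_b, q - 1),
        "AxB": (ss_ab, (p - 1) * (q - 1)),
        "Error": (ss_err, p * q * (n - 1)),
        "Total": (ss_tot, p * q * n - 1),
    }

# Made-up numbers for illustration only (NOT Quincy's data): A = plant, B = shift
example = {
    ("1", "day"): [30, 34, 32], ("1", "night"): [41, 45, 43],
    ("2", "day"): [9, 13, 11],  ("2", "night"): [12, 16, 14],
    ("3", "day"): [22, 26, 24], ("3", "night"): [40, 44, 42],
}

table = two_way_anova(example)
ms_err = table["Error"][0] / table["Error"][1]  # pooled error mean square
for source in ("A", "B", "AxB"):
    ss, df = table[source]
    print(f"{source:>5}: SS = {ss:8.2f}  df = {df}  F = {(ss / df) / ms_err:7.2f}")
```

In a balanced design the four pieces (A, B, AxB, Error) add up exactly to the total sum of squares, which is what makes statements like "this much of the variation is due to plant" possible.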

Reaching for his trusty SPSS program, Quincy begins.

Quincy clicks Analyze to pull down the menu.

He highlights General Linear Model to bring up the pop-out menu.

He then highlights and clicks Multivariate.

Quincy picks # of rejects as his dependent variable and shift and plant as his fixed factors.

This Time Quincy Looks at Options

Quincy wants to see his means displayed.

But most important, he wants to use Levene’s test to check that the variance is the same for all the plants and shifts.

Quincy clicks Continue and OK, and the program is off to the races.

It starts with the killjoy special of the day: Levene’s test says there is no way the variance is the same for all the categories.
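Levene’s test itself is simple: take each observation’s absolute distance from its own group mean, then run an ordinary one-way ANOVA on those distances. If some groups are much more spread out, their average distances differ and the F ratio (Levene’s W) comes out large. This is a sketch of the classic mean-centered version, which we assume matches the option Quincy ticked (the Brown-Forsythe variant centers on the median instead); it is not SPSS’s code, and the two groups are made up.

```python
from statistics import mean

def levene_w(groups: list[list[float]]) -> tuple[float, int, int]:
    """Mean-centered Levene statistic W with its (df1, df2).

    W is a one-way ANOVA F ratio computed on the absolute deviations
    of each observation from its own group mean.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    z = [[abs(x - mean(g)) for x in g] for g in groups]   # absolute deviations
    z_bar_i = [mean(zi) for zi in z]                       # mean deviation per group
    z_bar = mean([v for zi in z for v in zi])              # overall mean deviation
    between = sum(len(zi) * (zb - z_bar) ** 2 for zi, zb in zip(z, z_bar_i))
    within = sum((v - zb) ** 2 for zi, zb in zip(z, z_bar_i) for v in zi)
    df1, df2 = k - 1, n_total - k
    return (between / df1) / (within / df2), df1, df2

# Made-up groups: one tight, one widely scattered
tight = [10, 12, 11, 13, 9]
wide = [10, 20, 5, 25, 15]
w, df1, df2 = levene_w([tight, wide])
print(f"W = {w:.3f} with df = ({df1}, {df2})")
```

W is then compared against an F distribution with (df1, df2) degrees of freedom; a large W, and so a small p-value, is the "no way the variance is the same" verdict.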

Then it gives a result it has already called into question.

The most powerful effect was which plant. (Oh, nuts! Outsourcing does seem to change the reject rate.)

Shift was the next most important.

But the response to shift work also varies by where the plant is located; that is, there is a plant-by-shift interaction.

Importance of the controlling variables

About 71% of the variation can be traced to which plant and which shift is working.
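As a rough sketch of where a figure like 71% comes from: SPSS prints $R^2$ (and an adjusted $R^2$) beneath the ANOVA table, and the slide’s number is presumably one of these. For a balanced design the model sum of squares splits into the plant, shift, and interaction pieces:

```latex
R^2 = \frac{SS_{\text{plant}} + SS_{\text{shift}} + SS_{\text{plant}\times\text{shift}}}{SS_{\text{total}}}
    = 1 - \frac{SS_{\text{error}}}{SS_{\text{total}}} \approx 0.71
```

With unequal cell sizes, the Type III sums of squares SPSS reports by default need not add up exactly, but $R^2 = 1 - SS_{\text{error}}/SS_{\text{total}}$ (using the corrected total) still holds.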

Shift Effects

The 95% confidence intervals for the mean of day shift and night shift do not come close to overlapping. The values suggest night shift messes up more than day shift.
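The intervals around each mean are, in the usual GLM setup, built from the pooled error mean square of the whole model. As a sketch (assuming the standard formula, which the slides do not show), for group $j$ with $n_j$ observations and error degrees of freedom $df_E$:

```latex
\bar{y}_j \pm t_{0.975,\; df_E}\,\sqrt{\frac{MS_E}{n_j}},
\qquad MS_E = \frac{SS_{\text{error}}}{df_E}
```

Because $MS_E$ is one number shared by every group, each interval’s width depends only on $n_j$, not on how spread out that particular group actually is.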

This Really Sucks

For Quincy’s foreign plants, the 95% confidence intervals overlap, so Quincy may not be able to be certain the foreign plants are different from each other. However, his U.S. plant has a distinctly lower rejection rate than the foreign plants.

Quincy’s outsourcing has affected the rejection rate.

Moving on to Plant and Shift Interactions.

The U.S. plant, on either day shift or night shift, has a much lower rejection rate than any of the foreign plants.

We also cannot be sure that there is an effect of shift in the U.S. operations.

Continued Inspection

On night shift, the screw-up rate is similar for Malaysia and Mexico.

On day shift, however, Malaysia shows better performance than Mexico.

Usefulness of Means Displays

Plots of the individual classes with confidence intervals give an at-a-glance view of which specific differences are driving the test results.

The bad news is that those confidence intervals used a common (pooled) standard error, and Levene’s test showed that assumption was wrong.
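To see what a common standard error does when variances are unequal, the sketch below compares, for two made-up groups, the standard error of each mean computed from the pooled variance with the one computed from the group’s own variance. The function name is hypothetical and the data are illustrative only.

```python
from math import sqrt
from statistics import variance

def pooled_vs_separate_se(groups: list[list[float]]) -> list[tuple[float, float]]:
    """For each group mean, return (pooled SE, the group's own SE).

    The pooled SE uses one error mean square for every group, which is
    only appropriate when the group variances really are equal.
    """
    df_err = sum(len(g) - 1 for g in groups)
    ms_err = sum((len(g) - 1) * variance(g) for g in groups) / df_err  # pooled variance
    return [(sqrt(ms_err / len(g)), sqrt(variance(g) / len(g))) for g in groups]

# Made-up groups: one tight, one widely scattered
tight = [10, 12, 11, 13, 9]
wide = [10, 20, 5, 25, 15]
ses = pooled_vs_separate_se([tight, wide])
for name, (pooled, own) in zip(("tight", "wide"), ses):
    print(f"{name}: pooled SE = {pooled:.3f}, own SE = {own:.3f}")
```

Here the pooled SE (about 2.55 for both) overstates the uncertainty of the tight group (about 0.71) and understates that of the scattered one (about 3.54), so intervals drawn with it are too wide for some classes and too narrow for others.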

What Happens When Assumptions Fail?

We have a very clear-cut result from our data, but unfortunately its legitimacy has been called into question.

The more good information we have, the closer we can peg our answers. We saw that adding samples improved the certainty of our conclusions.

We saw with the Behrens-Fisher t test that when we lost homogeneity of variance, we were less sure of our values than before.

We know we have lost something here also. General models and methods don’t help us know exactly how much, and they don’t tell us what to do about it. We have been warned of an error, and we know the direction in which the error will push our answers.
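For two groups, the usual practical answer to the Behrens-Fisher problem is Welch’s approximation: use each group’s own variance and reduce the degrees of freedom with the Welch-Satterthwaite formula. We assume this is close to the method meant by the earlier Behrens-Fisher t test; the sketch below uses made-up data and a hypothetical function name.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(x: list[float], y: list[float]) -> tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    vx = variance(x) / len(x)   # squared standard error of mean(x)
    vy = variance(y) / len(y)   # squared standard error of mean(y)
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df

# Made-up groups: one tight, one widely scattered
tight = [10, 12, 11, 13, 9]
wide = [10, 20, 5, 25, 15]
t, df = welch_t(tight, wide)
print(f"t = {t:.3f}, Welch df = {df:.2f} (pooled test would use {len(tight) + len(wide) - 2})")
```

With equal group sizes, as here, the t statistic matches the pooled one (both standard errors come out to √13); what changes is the degrees of freedom, which drop from 8 to about 4.3. The critical value grows and intervals widen, which is the "less sure of our values" effect.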

So Is the U.S. Plant Better or Not?

With the high certainty levels here and a fairly good sample size, we can probably still be sure about our U.S. plant.

If things were close, we simply would not know for sure with the analysis and methods we have so far.

What About Homogeneous Error Variance in Regression?

In many regression analyses, the largest source of unevenly distributed error variance is lack of fit of the model. We already know how to bring in other variables and look for nonlinear effects.

If the variance still changes, it is often because larger numbers have greater variance. Some sort of data normalization can help with this.
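When the spread grows in proportion to the level (standard deviation roughly proportional to the mean), a log transform is one common normalization: it turns multiplicative scatter into additive scatter with the same spread in every group. The sketch uses made-up numbers built so that each group is the same noise pattern scaled up; it is not Quincy’s data. For count-like data such as rejects per 1000 widgets, a square-root transform is another common choice.

```python
from math import log
from statistics import stdev

# Hypothetical multiplicative noise pattern, reused for every group
noise = [0.8, 0.9, 1.0, 1.1, 1.25]
group_means = [5.0, 20.0, 80.0]  # made-up group levels

groups = [[m * e for e in noise] for m in group_means]
raw_sd = [stdev(g) for g in groups]                     # grows with the group level
log_sd = [stdev([log(x) for x in g]) for g in groups]   # log(m*e) = log(m) + log(e)

print("raw SDs:", [round(s, 3) for s in raw_sd])
print("log SDs:", [round(s, 4) for s in log_sd])
```

On the raw scale the standard deviations grow fourfold from group to group; after taking logs they are identical, so an equal-variance analysis becomes reasonable on the transformed scale.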
