analysis of variance - university of texas at dallasthe purpose of analysis of variance is to...

Analysis of Variance

OPRE 6301

Introduction . . .

The purpose of analysis of variance is to compare

two or more populations of interval data. Specifically, we

are interested in determining whether differences exist

between the population means.

Although we are interested in a comparison of means, the

method works by analyzing the sample variance. The

intuitive reason for this is that by studying the “sources”

of variation in the sample data, we can decide whether any

observed differences among multiple sample means can

be attributed to chance, or whether they are indicative of

actual differences among the means of the corresponding

populations.

Analysis of variance is one of the most widely used meth-

ods in statistics.

1

One-Way Analysis of Variance . . .

Consider the following scenarios:

— We wish to decide on the basis of sample data whether

there is really a difference in the effectiveness of three

methods of teaching computer programming.

— We wish to compare the average yield per acre of

wheat when different types of fertilizers are used.

— We wish to find out whether differences exist between

the mean travel time from home to work/school along

three different routes.

— We wish to find out whether differences exist between

average starting salary for MBA graduates from dif-

ferent schools.

In all of these examples, we are trying to find out the

effect of a single “factor” on a population; this explains

the description “one-way.”

2

Example: Advertising Strategy . . .

We will illustrate the method using the following concrete

example.

An apple juice manufacturer is planning to develop a

new product — a liquid concentrate.

The marketing manager has to decide how to market the

new product. Three strategies are considered:

• Emphasize convenience of using the product.

• Emphasize the quality of the product.

• Emphasize the product’s low price.

An experiment was conducted as follows:

• An advertising campaign was launched in three

cities.

• In each city, only one of the three characteristics of

the new product (convenience, quality, and price)

was emphasized.

3

Weekly sales were recorded for twenty weeks following

the beginning of the campaigns (see Xm15-01.xls).C o n v n c e Q u a l i t y P r i c e5 2 9 8 0 4 6 7 26 5 8 6 3 0 5 3 17 9 3 7 7 4 4 4 35 1 4 7 1 7 5 9 66 6 3 6 7 9 6 0 27 1 9 6 0 4 5 0 27 1 1 6 2 0 6 5 96 0 6 6 9 7 6 8 94 6 1 7 0 6 6 7 55 2 9 6 1 5 5 1 24 9 8 4 9 2 6 9 16 6 3 7 1 9 7 3 36 0 4 7 8 7 6 9 84 9 5 6 9 9 7 7 64 8 5 5 7 2 5 6 15 5 7 5 2 3 5 7 23 5 3 5 8 4 4 6 95 5 7 6 3 4 5 8 15 4 2 5 8 0 6 7 96 1 4 6 2 4 5 3 2

4

Research Hypothesis

We wish to answer the following question: Do differences

in sales exist between the test markets (i.e., advertising

strategies)? In other words, our research hypothesis is

that at least two of the three mean sales differ.

Therefore,

H0 : µ1 = µ2 = µ3

H1 : At least two of the µjs differ

We now need to develop an appropriate statistic to test

the above pair of hypotheses.

5

Notation

Let

k = total number of populations (k = 3 in this example)

nj = sample size for the jth population (nj = 20 for all

1 ≤ j ≤ k in this example)

n =∑k

j=1 nj , the total number of observations

Xij = the ith observation from the jth population, where

1 ≤ j ≤ k and for given j, 1 ≤ i ≤ nj.

X̄j = the sample mean of all observations from the jth

population, for 1 ≤ j ≤ k; formally,

X̄j =1

nj

nj∑

i=1

Xij .

sj = the standard deviation of all observations from the

jth population

¯̄X = the grand mean of all observations; formally,

¯̄X =1

n

k∑

j=1

nj∑

i=1

Xij =1

n

k∑

j=1

njX̄j .

6

Thus,

X 1 1x 2 1...X n 1 , 111x nX 1 2x 2 2...X n 2 , 222xn

X 1 kx 2 k...X n k , kkkx nS a m p l e s i z eS a m p l e m e a n

F i r s t o b s e r v a t i o n ,f i r s t s a m p l eS e c o n d o b s e r v a t i o n ,s e c o n d s a m p l e

Note that in general, the njs do not have to be the same.

7

Generic Terminology

In the context of this example,

Response Variable: weekly sales, denoted by X above

Responses: actual sales values, which are the observed

values of the Xijs

Experimental Unit: weeks in the three cities when

we recored sales figures

Factor: the criterion by which we classify the popula-

tions — advertising strategy; another possible factor

is advertising medium (TV, newspaper, or internet)

Factor Levels: possible treatments for a given fac-

tor — convenience, quality, or price

8

Test Statistic

We will consider two measures of variability. The idea is

depicted below . . .

2 02 53 0

17

T r e a t m e n t 1 T r e a t m e n t 2 T r e a t m e n t 31 01 2

1 99

T r e a t m e n t 1 T r e a t m e n t 2 T r e a t m e n t 3

2 01 61 51 41 11 09 1 0x 1 =1 5x 2 = 2 0x 3 =

1 0x 1 =1 5x 2 =

2 0x 3 =

T h e s a m p l e m e a n s a r e t h e s a m e a s b e f o r e ,b u t t h e l a r g e r w i t h i n w s a m p l e v a r i a b i l i t ym a k e s i t h a r d e r t o d r a w a c o n c l u s i o na b o u t t h e p o p u l a t i o n m e a n s .A s m a l l v a r i a b i l i t y w i t h i nt h e s a m p l e s m a k e s i t e a s i e rt o d r a w a c o n c l u s i o n a b o u t t h ep o p u l a t i o n m e a n s .

9

Formally, we assume that the Xijs are independent and

normally distributed with means µj and a common vari-

ance σ2. That is,

X̄ij = µj + ǫij , (1)

where the ǫijs, the errors, are normally distributed with

mean 0 and variance σ2.

Now, if the null hypothesis is true, say with all µjs equal

to µ, then the key idea is that we can estimate σ2 in two

ways . . .

10

Method 1: For each j, the nj observations X1j, X2j,

. . . , Xnj,j can be viewed as a sample of size nj from a

normal population with mean µ and variance σ2. It

follows that

1

σ2

nj∑

i=1

(Xij − X̄j)2

has a χ2 distribution with nj − 1 degrees of freedom

(see equation (3) in notes for Chapter 12).

Furthermore, summing the above over j shows that

1

σ2

k∑

j=1

nj∑

i=1

(Xij − X̄j)2

has a χ2 distribution with n − k degrees of freedom

(recall that∑k

j=1 nj = n).

Since E(χ2) = ν, we see that σ2 can be estimated by

MSE ≡1

n − k

k∑

j=1

nj∑

i=1

(Xij − X̄j)2 , (2)

where MSE is for mean square for errors (cf. the

white groupings in the previous figure).

11

Method 2: Recall that for each j, the sample mean X̄j

also is a normally distributed variable with mean µ

and variance σ2/nj. Since we have one mean for

each j, i.e., for each treatment, it is intuitive that

one should be able to estimate σ2 by looking at the

variability of the treatment means around the grand

mean. Indeed, it can be shown that

1

σ2

k∑

j=1

nj(X̄j −¯̄X)2

(think of this as weighting the squared deviation (X̄j−¯̄X)2 by nj, the number of observations for treatment

j) has a χ2 distribution with k−1 degrees of freedom.

Again, since E(χ2) = ν, we see that σ2 can also be

estimated by

MST ≡1

k − 1

k∑

j=1

nj(X̄j −¯̄X)2 , (3)

where MST is for mean square for treatments

(cf. the horizontal pink lines in the previous figure).

12

Since both (2) and (3) are unbiased point estimates for

σ2, we expect the two estimates to be close. This suggests

that we consider the statistic

F ≡MST

MSE. (4)

If the null hypothesis is true, we expect F to be close to

1. If, on the other hand, the null hypothesis is not true,

we expect F to be pulled up by a greater MST (again,

refer to the previous figure). Thus, the F statistic is a

measure of relative variability.

How big an F value is considered sufficient for us to reject

H0? The answer depends on the sampling distribution of

the F statistic, which turns out to be the so-called (natu-

rally) F distribution. (A discussion of the F distribution

can be found on pp. 270–274 of the text.)

13

The F distribution is characterized by two parameters ν1

and ν2. In our setting, these parameters correspond to

the degrees of freedom for MST and MSE, respectively,

and hence ν1 = k − 1 and ν2 = n − k.

The F critical value for a given right -tail probability α

can be found by the Excel function FINV(α, ν1, ν2).

In our example, we have ν1 = 3−1 = 2 and ν2 = 60−3 =

57. For α = 0.05, this yields the critical value

FINV(0.05, 2, 57) = 3.16 .

We are now ready to test H0 . . .

14

The F Test

The MST and the MSE are usually computed via

MST =SST

k − 1

and

MSE =SSE

n − k,

respectively, where

SST ≡k

∑

j=1

nj(X̄j −¯̄X)2, (5)

which is called the sum of squares for treatment,

and

SSE ≡k

∑

j=1

nj∑

i=1

(Xij − X̄j)2, (6)

which is called the sum of squares for errors.

15

For our example, . . .

SST:

2k 1j jj 321 == 2 0 ( 5 7 7 . 5 5 � 6 1 3 . 0 7 ) 2 ++ 2 0 ( 6 5 3 . 0 0 � 6 1 3 . 0 7 ) 2 ++ 2 0 ( 6 0 8 . 6 5 � 6 1 3 . 0 7 ) 2= 5 7 , 5 1 2 . 2 3T h e g r a n d m e a n i s c a l c u l a t e d b y

k21 kk2211SSE:

= j= 1 1 2 2 2 2 3 3 2

16

It follows that

F =SST/(k − 1)

SSE/(n − k)

=57512.23/2

509983.50/57= 3.23 .

Since 3.23 exceeds 3.16, the F critical value at α = 0.05,

there is sufficient evidence to reject H0 in favor of H1

and we conclude that at least one mean sales under these

three strategies differ.

As usual, we can also determine the right -side p-value

associated F = 3.23 via

FDIST(3.23, 2, 57) = 0.0469 ,

which is seen to be lower than 0.05.

¯ 0 . 0 200 . 0 20 . 0 40 . 0 60 . 0 80 . 10 1 2 3 4p V a l u e = P ( F > 3 . 2 3 ) = . 0 4 6 9

17

The Data Analysis tool “Anova: Single Factor ” can be

used to carry out F tests. This greatly simplifies our task.

The output for our example is shown below.

G r o u p s C o u n t S u m A v e r a g e V a r i a n c eS o u r c e o f V a r i a t i o n S S d f M S F P ã v a l u e F c r i t

Note that the last row of the output gives the total of

SST and SSE, defined as:

SS ≡k

∑

j=1

nj∑

i=1

(Xij −¯̄X)2. (7)

18

It can be shown that

SS = SST + SSE (8)

and that SS/σ2 follows the χ2 distribution with n − 1

degrees of freedom. The decomposition of SS into the

sum of SST and SSE in (8) is helpful in visualizing how

the F statistic is related to the two “sources” of varia-

tion, between groups and within groups (groups =

treatments). We will encounter a similar decomposition

in regression analysis.

19

ANOVA: Two Factors . . .

As noted earlier, advertising medium (TV, newspaper, or

internet) could be considered as another factor that has

an effect on sales. Visually, this means . . .

F a c t o r AL e v e l 1L e v e l 2 L e v e l 1F a c t o r B L e v e l 3

T w o ÷ w a y A N O V AT w o f a c t o r s

L e v e l 2

O n e ÷ w a y A N O V AS i n g l e f a c t o rT r e a t m e n t 3 ( l e v e l 1 )

R e s p o n s eR e s p o n s e

T r e a t m e n t 1 ( l e v e l 3 )T r e a t m e n t 2 ( l e v e l 2 )

The Data Analysis tool “Anova: Two-Factor . . . ” can be

used for this. The idea is the same, and we leave out the

details.

20

Testing for Differences . . .

After having rejected H0 in our example, a natural ques-

tion arises: Which mean(s) is (are) different and further-

more, what is the rank order? A casual approach of course

is to directly look at the X̄js, but we should rely on a more

rigorous statistical test.

Recall that for two populations having a common un-

known variance, the confidence interval for the difference

between the population means µ1 − µ2 is given by

(X̄1 − X̄2) ± tα/2

√

s2p

(

1

n1+

1

n2

)

, (9)

where tα/2 has n1 + n2 − 2 degrees of freedom and

s2p =

(n1 − 1)s21 + (n2 − 1)s2

2

n1 + n2 − 2. (10)

In fact, the F test with k = 2 is equivalent to the t test;

that is, F = t2.

21

The idea then is to apply (9) and (10) to all pairs of

sample means for treatments. However, we will make

two revisions to (9) and (10) . . .

Revision 1 (Fisher):

In this revision, we replace s2p in (10) by MSE. The rea-

soning is that MSE is more accurate, as it is based on

data from all (not just two) treatment groups.

Formally, this results in the following decision rule: Re-

ject µi = µj (for any pair of i and j) if |X̄i − X̄j| >

LSDij, where

LSDij = tα/2

√

MSE

(

1

ni+

1

nj

)

, (11)

with tα/2 now having n − k degrees of freedom. The

letters LSD stand for “Least Significant Difference.”

22

Revision 2 (Bonferroni):

Let αE (“E” for experiment-wide) be the probability

of committing at least one Type I error in all c =(

k2

)

pairwise tests. Then, under the assumption that

the tests are only “mildly dependent,” we have the

approximation:

αE ≈ 1 − (1 − α)c. (12)

Observe that as k grows, the above probability grows

rapidly. It follows that Fisher’s procedure could easily

result in an unacceptably high αE.

To remedy this problem, we will rely on the following

(Bonferroni) bound: αE ≤ c α (the probability of a

union is bounded above by the sum of the probabilities

for all individual events). This bound implies that if

we wish to ensure that the probability of committing

a Type I error is no greater than αE, then we should

set α for the pairwise tests to αE/c.

Conclusion: tα/2 in (11) should be replaced by tαE/(2c).

23

Summary

After having rejected H0, the following procedure can be

used to conduct pairwise tests of differences in treatment

means:

— Compute c, which is given by k(k − 1)/2.

— Set α = αE/c, where αE is the targeted probability

for making at least one Type I error in all pairwise

tests.

— For a pair of i and j, conclude that µi and µj differ if

∣

∣X̄i − X̄j

∣

∣ > tα/2

√

MSE

(

1

ni+

1

nj

)

, (13)

where the degrees of freedom of tα/2 is n − k.

24

Example: Advertising Strategy — Continued

We failed to accept H0. Which pair(s) of i and j has

(have) different means?

Analysis: For k = 3, we have c = 3 ·2/2 = 3. To achieve

(say) αE = 0.05, we let α = 0.05/3 = 0.0167. From

Excel, we obtain

TINV(0.0167, 60 − 3) = 2.467 .

Hence,

α i j3 5.4 46 5.6 0 80.6 5 3xx 1 0.3 16 5.6 0 85 5.5 7 7xx 4 5.7 50.6 5 35 5.5 7 7xx32 31 21

We conclude that there is a significant difference between

µ1 and µ2.

25

analysis of variance - university of texas at dallasthe purpose of analysis of variance is to...

Documents