analysis of variance - university of texas at dallasthe purpose of analysis of variance is to...
TRANSCRIPT
Analysis of Variance
OPRE 6301
Introduction . . .
The purpose of analysis of variance is to compare
two or more populations of interval data. Specifically, we
are interested in determining whether differences exist
between the population means.
Although we are interested in a comparison of means, the
method works by analyzing the sample variance. The
intuitive reason for this is that by studying the “sources”
of variation in the sample data, we can decide whether any
observed differences among multiple sample means can
be attributed to chance, or whether they are indicative of
actual differences among the means of the corresponding
populations.
Analysis of variance is one of the most widely used meth-
ods in statistics.
1
One-Way Analysis of Variance . . .
Consider the following scenarios:
— We wish to decide on the basis of sample data whether
there is really a difference in the effectiveness of three
methods of teaching computer programming.
— We wish to compare the average yield per acre of
wheat when different types of fertilizers are used.
— We wish to find out whether differences exist between
the mean travel time from home to work/school along
three different routes.
— We wish to find out whether differences exist between
average starting salary for MBA graduates from dif-
ferent schools.
In all of these examples, we are trying to find out the
effect of a single “factor” on a population; this explains
the description “one-way.”
2
Example: Advertising Strategy . . .
We will illustrate the method using the following concrete
example.
An apple juice manufacturer is planning to develop a
new product — a liquid concentrate.
The marketing manager has to decide how to market the
new product. Three strategies are considered:
• Emphasize convenience of using the product.
• Emphasize the quality of the product.
• Emphasize the product’s low price.
An experiment was conducted as follows:
• An advertising campaign was launched in three
cities.
• In each city, only one of the three characteristics of
the new product (convenience, quality, and price)
was emphasized.
3
Weekly sales were recorded for twenty weeks following
the beginning of the campaigns (see Xm15-01.xls).C o n v n c e Q u a l i t y P r i c e5 2 9 8 0 4 6 7 26 5 8 6 3 0 5 3 17 9 3 7 7 4 4 4 35 1 4 7 1 7 5 9 66 6 3 6 7 9 6 0 27 1 9 6 0 4 5 0 27 1 1 6 2 0 6 5 96 0 6 6 9 7 6 8 94 6 1 7 0 6 6 7 55 2 9 6 1 5 5 1 24 9 8 4 9 2 6 9 16 6 3 7 1 9 7 3 36 0 4 7 8 7 6 9 84 9 5 6 9 9 7 7 64 8 5 5 7 2 5 6 15 5 7 5 2 3 5 7 23 5 3 5 8 4 4 6 95 5 7 6 3 4 5 8 15 4 2 5 8 0 6 7 96 1 4 6 2 4 5 3 2
4
Research Hypothesis
We wish to answer the following question: Do differences
in sales exist between the test markets (i.e., advertising
strategies)? In other words, our research hypothesis is
that at least two of the three mean sales differ.
Therefore,
H0 : µ1 = µ2 = µ3
H1 : At least two of the µjs differ
We now need to develop an appropriate statistic to test
the above pair of hypotheses.
5
Notation
Let
k = total number of populations (k = 3 in this example)
nj = sample size for the jth population (nj = 20 for all
1 ≤ j ≤ k in this example)
n =∑k
j=1 nj , the total number of observations
Xij = the ith observation from the jth population, where
1 ≤ j ≤ k and for given j, 1 ≤ i ≤ nj.
X̄j = the sample mean of all observations from the jth
population, for 1 ≤ j ≤ k; formally,
X̄j =1
nj
nj∑
i=1
Xij .
sj = the standard deviation of all observations from the
jth population
¯̄X = the grand mean of all observations; formally,
¯̄X =1
n
k∑
j=1
nj∑
i=1
Xij =1
n
k∑
j=1
njX̄j .
6
Thus,
X 1 1x 2 1...X n 1 , 111x nX 1 2x 2 2...X n 2 , 222xn
X 1 kx 2 k...X n k , kkkx nS a m p l e s i z eS a m p l e m e a n
F i r s t o b s e r v a t i o n ,f i r s t s a m p l eS e c o n d o b s e r v a t i o n ,s e c o n d s a m p l e
Note that in general, the njs do not have to be the same.
7
Generic Terminology
In the context of this example,
Response Variable: weekly sales, denoted by X above
Responses: actual sales values, which are the observed
values of the Xijs
Experimental Unit: weeks in the three cities when
we recored sales figures
Factor: the criterion by which we classify the popula-
tions — advertising strategy; another possible factor
is advertising medium (TV, newspaper, or internet)
Factor Levels: possible treatments for a given fac-
tor — convenience, quality, or price
8
Test Statistic
We will consider two measures of variability. The idea is
depicted below . . .
2 02 53 0
17
T r e a t m e n t 1 T r e a t m e n t 2 T r e a t m e n t 31 01 2
1 99
T r e a t m e n t 1 T r e a t m e n t 2 T r e a t m e n t 3
2 01 61 51 41 11 09 1 0x 1 =1 5x 2 = 2 0x 3 =
1 0x 1 =1 5x 2 =
2 0x 3 =
T h e s a m p l e m e a n s a r e t h e s a m e a s b e f o r e ,b u t t h e l a r g e r w i t h i n w s a m p l e v a r i a b i l i t ym a k e s i t h a r d e r t o d r a w a c o n c l u s i o na b o u t t h e p o p u l a t i o n m e a n s .A s m a l l v a r i a b i l i t y w i t h i nt h e s a m p l e s m a k e s i t e a s i e rt o d r a w a c o n c l u s i o n a b o u t t h ep o p u l a t i o n m e a n s .
9
Formally, we assume that the Xijs are independent and
normally distributed with means µj and a common vari-
ance σ2. That is,
X̄ij = µj + ǫij , (1)
where the ǫijs, the errors, are normally distributed with
mean 0 and variance σ2.
Now, if the null hypothesis is true, say with all µjs equal
to µ, then the key idea is that we can estimate σ2 in two
ways . . .
10
Method 1: For each j, the nj observations X1j, X2j,
. . . , Xnj,j can be viewed as a sample of size nj from a
normal population with mean µ and variance σ2. It
follows that
1
σ2
nj∑
i=1
(Xij − X̄j)2
has a χ2 distribution with nj − 1 degrees of freedom
(see equation (3) in notes for Chapter 12).
Furthermore, summing the above over j shows that
1
σ2
k∑
j=1
nj∑
i=1
(Xij − X̄j)2
has a χ2 distribution with n − k degrees of freedom
(recall that∑k
j=1 nj = n).
Since E(χ2) = ν, we see that σ2 can be estimated by
MSE ≡1
n − k
k∑
j=1
nj∑
i=1
(Xij − X̄j)2 , (2)
where MSE is for mean square for errors (cf. the
white groupings in the previous figure).
11
Method 2: Recall that for each j, the sample mean X̄j
also is a normally distributed variable with mean µ
and variance σ2/nj. Since we have one mean for
each j, i.e., for each treatment, it is intuitive that
one should be able to estimate σ2 by looking at the
variability of the treatment means around the grand
mean. Indeed, it can be shown that
1
σ2
k∑
j=1
nj(X̄j −¯̄X)2
(think of this as weighting the squared deviation (X̄j−¯̄X)2 by nj, the number of observations for treatment
j) has a χ2 distribution with k−1 degrees of freedom.
Again, since E(χ2) = ν, we see that σ2 can also be
estimated by
MST ≡1
k − 1
k∑
j=1
nj(X̄j −¯̄X)2 , (3)
where MST is for mean square for treatments
(cf. the horizontal pink lines in the previous figure).
12
Since both (2) and (3) are unbiased point estimates for
σ2, we expect the two estimates to be close. This suggests
that we consider the statistic
F ≡MST
MSE. (4)
If the null hypothesis is true, we expect F to be close to
1. If, on the other hand, the null hypothesis is not true,
we expect F to be pulled up by a greater MST (again,
refer to the previous figure). Thus, the F statistic is a
measure of relative variability.
How big an F value is considered sufficient for us to reject
H0? The answer depends on the sampling distribution of
the F statistic, which turns out to be the so-called (natu-
rally) F distribution. (A discussion of the F distribution
can be found on pp. 270–274 of the text.)
13
The F distribution is characterized by two parameters ν1
and ν2. In our setting, these parameters correspond to
the degrees of freedom for MST and MSE, respectively,
and hence ν1 = k − 1 and ν2 = n − k.
The F critical value for a given right -tail probability α
can be found by the Excel function FINV(α, ν1, ν2).
In our example, we have ν1 = 3−1 = 2 and ν2 = 60−3 =
57. For α = 0.05, this yields the critical value
FINV(0.05, 2, 57) = 3.16 .
We are now ready to test H0 . . .
14
The F Test
The MST and the MSE are usually computed via
MST =SST
k − 1
and
MSE =SSE
n − k,
respectively, where
SST ≡k
∑
j=1
nj(X̄j −¯̄X)2, (5)
which is called the sum of squares for treatment,
and
SSE ≡k
∑
j=1
nj∑
i=1
(Xij − X̄j)2, (6)
which is called the sum of squares for errors.
15
For our example, . . .
SST:
2k 1j jj 321 == 2 0 ( 5 7 7 . 5 5 � 6 1 3 . 0 7 ) 2 ++ 2 0 ( 6 5 3 . 0 0 � 6 1 3 . 0 7 ) 2 ++ 2 0 ( 6 0 8 . 6 5 � 6 1 3 . 0 7 ) 2= 5 7 , 5 1 2 . 2 3T h e g r a n d m e a n i s c a l c u l a t e d b y
k21 kk2211SSE:
= j= 1 1 2 2 2 2 3 3 2
16
It follows that
F =SST/(k − 1)
SSE/(n − k)
=57512.23/2
509983.50/57= 3.23 .
Since 3.23 exceeds 3.16, the F critical value at α = 0.05,
there is sufficient evidence to reject H0 in favor of H1
and we conclude that at least one mean sales under these
three strategies differ.
As usual, we can also determine the right -side p-value
associated F = 3.23 via
FDIST(3.23, 2, 57) = 0.0469 ,
which is seen to be lower than 0.05.
¯ 0 . 0 200 . 0 20 . 0 40 . 0 60 . 0 80 . 10 1 2 3 4p V a l u e = P ( F > 3 . 2 3 ) = . 0 4 6 9
17
The Data Analysis tool “Anova: Single Factor ” can be
used to carry out F tests. This greatly simplifies our task.
The output for our example is shown below.
G r o u p s C o u n t S u m A v e r a g e V a r i a n c eS o u r c e o f V a r i a t i o n S S d f M S F P ã v a l u e F c r i t
Note that the last row of the output gives the total of
SST and SSE, defined as:
SS ≡k
∑
j=1
nj∑
i=1
(Xij −¯̄X)2. (7)
18
It can be shown that
SS = SST + SSE (8)
and that SS/σ2 follows the χ2 distribution with n − 1
degrees of freedom. The decomposition of SS into the
sum of SST and SSE in (8) is helpful in visualizing how
the F statistic is related to the two “sources” of varia-
tion, between groups and within groups (groups =
treatments). We will encounter a similar decomposition
in regression analysis.
19
ANOVA: Two Factors . . .
As noted earlier, advertising medium (TV, newspaper, or
internet) could be considered as another factor that has
an effect on sales. Visually, this means . . .
F a c t o r AL e v e l 1L e v e l 2 L e v e l 1F a c t o r B L e v e l 3
T w o ÷ w a y A N O V AT w o f a c t o r s
L e v e l 2
O n e ÷ w a y A N O V AS i n g l e f a c t o rT r e a t m e n t 3 ( l e v e l 1 )
R e s p o n s eR e s p o n s e
T r e a t m e n t 1 ( l e v e l 3 )T r e a t m e n t 2 ( l e v e l 2 )
The Data Analysis tool “Anova: Two-Factor . . . ” can be
used for this. The idea is the same, and we leave out the
details.
20
Testing for Differences . . .
After having rejected H0 in our example, a natural ques-
tion arises: Which mean(s) is (are) different and further-
more, what is the rank order? A casual approach of course
is to directly look at the X̄js, but we should rely on a more
rigorous statistical test.
Recall that for two populations having a common un-
known variance, the confidence interval for the difference
between the population means µ1 − µ2 is given by
(X̄1 − X̄2) ± tα/2
√
s2p
(
1
n1+
1
n2
)
, (9)
where tα/2 has n1 + n2 − 2 degrees of freedom and
s2p =
(n1 − 1)s21 + (n2 − 1)s2
2
n1 + n2 − 2. (10)
In fact, the F test with k = 2 is equivalent to the t test;
that is, F = t2.
21
The idea then is to apply (9) and (10) to all pairs of
sample means for treatments. However, we will make
two revisions to (9) and (10) . . .
Revision 1 (Fisher):
In this revision, we replace s2p in (10) by MSE. The rea-
soning is that MSE is more accurate, as it is based on
data from all (not just two) treatment groups.
Formally, this results in the following decision rule: Re-
ject µi = µj (for any pair of i and j) if |X̄i − X̄j| >
LSDij, where
LSDij = tα/2
√
MSE
(
1
ni+
1
nj
)
, (11)
with tα/2 now having n − k degrees of freedom. The
letters LSD stand for “Least Significant Difference.”
22
Revision 2 (Bonferroni):
Let αE (“E” for experiment-wide) be the probability
of committing at least one Type I error in all c =(
k2
)
pairwise tests. Then, under the assumption that
the tests are only “mildly dependent,” we have the
approximation:
αE ≈ 1 − (1 − α)c. (12)
Observe that as k grows, the above probability grows
rapidly. It follows that Fisher’s procedure could easily
result in an unacceptably high αE.
To remedy this problem, we will rely on the following
(Bonferroni) bound: αE ≤ c α (the probability of a
union is bounded above by the sum of the probabilities
for all individual events). This bound implies that if
we wish to ensure that the probability of committing
a Type I error is no greater than αE, then we should
set α for the pairwise tests to αE/c.
Conclusion: tα/2 in (11) should be replaced by tαE/(2c).
23
Summary
After having rejected H0, the following procedure can be
used to conduct pairwise tests of differences in treatment
means:
— Compute c, which is given by k(k − 1)/2.
— Set α = αE/c, where αE is the targeted probability
for making at least one Type I error in all pairwise
tests.
— For a pair of i and j, conclude that µi and µj differ if
∣
∣X̄i − X̄j
∣
∣ > tα/2
√
MSE
(
1
ni+
1
nj
)
, (13)
where the degrees of freedom of tα/2 is n − k.
24
Example: Advertising Strategy — Continued
We failed to accept H0. Which pair(s) of i and j has
(have) different means?
Analysis: For k = 3, we have c = 3 ·2/2 = 3. To achieve
(say) αE = 0.05, we let α = 0.05/3 = 0.0167. From
Excel, we obtain
TINV(0.0167, 60 − 3) = 2.467 .
Hence,
α i j3 5.4 46 5.6 0 80.6 5 3xx 1 0.3 16 5.6 0 85 5.5 7 7xx 4 5.7 50.6 5 35 5.5 7 7xx32 31 21
We conclude that there is a significant difference between
µ1 and µ2.
25