1
ANALYSIS OF VARIANCE (ANOVA)
Heibatollah Baghi, and Mastee Badii
2
Purpose of ANOVA
• Use one-way Analysis of Variance to test whether the mean of a variable (the dependent variable) differs among three or more groups
– For example, compare whether systolic blood pressure differs among a control group and two treatment groups
3
Purpose of ANOVA
• One-way ANOVA compares three or more groups defined by a single factor.
– For example, you might compare a control group with a drug treatment group and a drug treatment plus antagonist group. Or you might compare a control group with five different treatments.
• Some experiments involve more than one factor. These data need to be analyzed by two-way ANOVA or Factorial ANOVA.
– For example, you might compare the effects of three different drugs administered at two times. There are two factors in that experiment: Drug treatment and time.
4
Why not do repeated t-tests?
• Rather than using one-way ANOVA, you might be tempted to use a series of t-tests, comparing two groups each time. Don’t do it.
• Repeated t-tests increase the chance of a type I error (the multiple comparison problem)
• If you are comparing 5 groups, you will need 10 comparisons of means
• When the null hypothesis is true, the probability that at least 1 of the 10 observed significance levels is less than 0.05 is about 0.29
5
Why not do repeated t-tests?
• With 10 means (45 comparisons), the probability of finding at least one significant difference is about 0.63
• In other words, when the level of significance is .05, there is a 1 in 20 chance that a single t-test will yield a significant result even when the null hypothesis is true.
• The more t-tests you run, the higher the probability of at least one false positive
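The growth in the number of pairwise comparisons can be sketched in a few lines of Python. This is an illustration only, not part of the original slides: the function names are hypothetical, and the error-rate formula assumes the tests are independent. Pairwise t-tests on shared groups are not independent, so this simple formula gives larger values than the 0.29 and 0.63 quoted on the slides.

```python
from math import comb


def n_pairwise(k: int) -> int:
    """Number of distinct pairs of group means among k groups."""
    return comb(k, 2)


def familywise_error_independent(m: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive in m tests,
    ASSUMING the m tests are independent (a simplification)."""
    return 1.0 - (1.0 - alpha) ** m


for k in (5, 10):
    m = n_pairwise(k)
    rate = familywise_error_independent(m)
    print(f"{k} groups -> {m} comparisons, independent-test bound {rate:.2f}")
```

The comparison counts (10 and 45) match the slides exactly; the error rates should be read only as a rough upper guide.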
6
What Does ANOVA Do?
• ANOVA involves the partitioning of variance of the dependent variable into different components:
– A. Between Group Variability
– B. Within Group Variability
• More specifically, the Analysis of Variance is a method for partitioning the Total Sum of Squares into two additive and independent parts.
7
Definition of Total Sum of Squares or Variance
$$\sum_{j=1}^{p}\sum_{i=1}^{n}(X_{ij}-\bar{X}_{..})^{2} = n\sum_{j=1}^{p}(\bar{X}_{.j}-\bar{X}_{..})^{2} + \sum_{j=1}^{p}\sum_{i=1}^{n}(X_{ij}-\bar{X}_{.j})^{2} \qquad (i = 1, \ldots, n;\ j = 1, \ldots, p)$$
Case   Group 1   Group 2   …   Group p
1      X11       X21       …   Xp1
2      X12       X22       …   Xp2
3      X13       X23       …   Xp3
…      …         …         …   …
n      X1n       X2n       …   Xpn
The left-hand side is the total sum of squares: the squared differences of the observations from the grand average $\bar{X}_{..}$, summed across all n times p observations.
8
Definition of Between Sum of Squares
$$SS_{Between} = n\sum_{j=1}^{p}(\bar{X}_{.j}-\bar{X}_{..})^{2}$$
Case   Group 1   Group 2   …   Group p
1      X11       X21       …   Xp1
2      X12       X22       …   Xp2
3      X13       X23       …   Xp3
…      …         …         …   …
n      X1n       X2n       …   Xpn
Here $\bar{X}_{.j}$ is the average of group j and $\bar{X}_{..}$ is the grand average.
The sum of squared differences of the group means from the grand mean is SSB.
9
Definition of Within Sum of Squares
$$SS_{Within} = \sum_{j=1}^{p}\sum_{i=1}^{n}(X_{ij}-\bar{X}_{.j})^{2}$$
Case   Group 1   Group 2   …   Group p
1      X11       X21       …   Xp1
2      X12       X22       …   Xp2
3      X13       X23       …   Xp3
…      …         …         …   …
n      X1n       X2n       …   Xpn
Here $X_{ij}$ are the observations and $\bar{X}_{.j}$ is the group mean.
The within sum of squares is the sum of squared differences of the observations from their group means.
10
Partitioning of Variance into Different Components
$$\underbrace{\sum_{j=1}^{p}\sum_{i=1}^{n}(X_{ij}-\bar{X}_{..})^{2}}_{\text{Total sum of squares}} = \underbrace{n\sum_{j=1}^{p}(\bar{X}_{.j}-\bar{X}_{..})^{2}}_{\text{Between groups sum of squares}} + \underbrace{\sum_{j=1}^{p}\sum_{i=1}^{n}(X_{ij}-\bar{X}_{.j})^{2}}_{\text{Within groups sum of squares}}$$
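Why the total sum of squares splits into exactly two additive parts can be seen from a short derivation. It is sketched here as a supplement to the slide, writing $X_{ij}$ for an observation, $\bar{X}_{.j}$ for the mean of group j, $\bar{X}_{..}$ for the grand mean, and assuming n cases in each of the p groups:

```latex
\begin{aligned}
\sum_{j=1}^{p}\sum_{i=1}^{n}(X_{ij}-\bar{X}_{..})^{2}
&= \sum_{j=1}^{p}\sum_{i=1}^{n}\bigl[(X_{ij}-\bar{X}_{.j}) + (\bar{X}_{.j}-\bar{X}_{..})\bigr]^{2} \\
&= \sum_{j=1}^{p}\sum_{i=1}^{n}(X_{ij}-\bar{X}_{.j})^{2}
 + n\sum_{j=1}^{p}(\bar{X}_{.j}-\bar{X}_{..})^{2}
 + 2\sum_{j=1}^{p}(\bar{X}_{.j}-\bar{X}_{..})\sum_{i=1}^{n}(X_{ij}-\bar{X}_{.j})
\end{aligned}
```

The last term is zero because deviations from a group's own mean sum to zero, $\sum_{i=1}^{n}(X_{ij}-\bar{X}_{.j}) = 0$ for every j, which leaves the within and between sums of squares.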
11
Test Statistic in ANOVA
Test statistic for ANOVA is based on between & within groups SS
$$SS_{Between} = n\sum_{j=1}^{p}(\bar{X}_{.j}-\bar{X}_{..})^{2}, \qquad SS_{Within} = \sum_{j=1}^{p}\sum_{i=1}^{n}(X_{ij}-\bar{X}_{.j})^{2}$$
12
Test Statistic in ANOVA
• F = Between group variability / Within group variability
– The source of within group variability is individual differences.
– The source of between group variability is the effect of the independent or grouping variables.
– Within group variability is sampling error across the cases
– Between group variability is the effect of the independent groups or variables
13
Steps in Test of Hypothesis
1. Determine the appropriate test
2. Establish the level of significance: α
3. Determine whether to use a one tail or two tail test
4. Calculate the test statistic
5. Determine the degrees of freedom
6. Compare computed test statistic against a tabled/critical value
Same as Before
14
1. Determine the Appropriate Test
• Independent random samples have been taken from each population
• The dependent variable is normally distributed in each population (ANOVA is robust with regard to this assumption)
• Population variances are equal (ANOVA is robust with regard to this assumption)
• Subjects in each group have been independently sampled
15
2. Establish Level of Significance
• α is a predetermined value
• The conventional values are:
– α = .05
– α = .01
– α = .001
16
3. Use a Two Tailed Test
• Ho: µ1 = µ2 = µ3 = µ4
Where
– µ1 = population mean for group 1
– µ2 = population mean for group 2
– µ3 = population mean for group 3
– µ4 = population mean for group 4
• H1: not Ho
17
3. Use a Two Tailed Test
• Ha: not Ho
• The alternative hypothesis does not specify whether µ1 ≠ µ2 or µ2 ≠ µ3 or µ1 ≠ µ3
18
4. Calculating Test Statistics
• F = (SSb / dfB) / (SSw / dfw)
– SSb = sum of squares between
– dfB = degrees of freedom between
– SSw = sum of squares within
– dfw = degrees of freedom within
19
4. Calculating Test Statistics
• By dividing the sum of the squared deviations by the degrees of freedom, we are essentially computing an “average” (or mean) amount of variation
• The specific name for the numerator of the F statistic is the mean square between (the average amount of between-group variation)
• The specific name for the denominator of the F statistic is the mean square within (the average amount of within-group variation)
20
5. Determine Degrees of Freedom
• Degrees of freedom between
– dfB = k – 1
– k = number of groups
• Degrees of freedom within
– dfw = N – k
– N = total number of subjects in the study
21
6. Compare the Computed Test Statistic Against a Tabled Value
• α = .05
• If Fc > Fα, reject H0
• If Fc ≤ Fα, cannot reject H0
22
Example
• Suppose we had patients with myocardial infarction in the following groups:
– Group 1: A music therapy group
– Group 2: A relaxation therapy group
– Group 3: A control group
• 15 patients are randomly assigned to the 3 groups, and their stress levels are then measured to determine whether the interventions were effective in minimizing stress.
23
Example
• Dependent Variable
– The stress scores, which range from 0 (no stress) to 10 (extreme stress)
• Independent Variable or Factor
– Treatment conditions (3 levels)
24
Observations
        Group 1   Group 2   Group 3
        0         1         5
        6         4         6
        2         3         10
        4         2         8
        3         0         6
Mean    3         2         7
25
Sum of Squares for Each Group
        Group 1     Group 2     Group 3
        0           1           5
        6           4           6
        2           3           10
        4           2           8
        3           0           6
SS      SS1 = 20    SS2 = 10    SS3 = 16
n       n1 = 5      n2 = 5      n3 = 5
Mean    X̄1 = 3.0    X̄2 = 2.0    X̄3 = 7.0
26
SS Within
$$SS_1 = \sum_{j}(X_{1j}-\bar{X}_1)^2 = (0-3)^2 + (6-3)^2 + (2-3)^2 + (4-3)^2 + (3-3)^2 = 20$$
$$SS_2 = \sum_{j}(X_{2j}-\bar{X}_2)^2 = (1-2)^2 + (4-2)^2 + (3-2)^2 + (2-2)^2 + (0-2)^2 = 10$$
$$SS_3 = \sum_{j}(X_{3j}-\bar{X}_3)^2 = (5-7)^2 + (6-7)^2 + (10-7)^2 + (8-7)^2 + (6-7)^2 = 16$$
$$SS_{Within} = 20 + 10 + 16 = 46$$
27
SS Between
$$SS_{Between} = 5(3-4)^2 + 5(2-4)^2 + 5(7-4)^2 = 70$$
Here 5 is the number of cases in each group; 3, 2, and 7 are the averages of Group 1, Group 2, and Group 3; and 4 is the grand average.
28
Sum of Squares Total
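$$\begin{aligned} SS_{Total} &= (0-4)^2 + (6-4)^2 + (2-4)^2 + (4-4)^2 + (3-4)^2 \\ &\quad + (1-4)^2 + (4-4)^2 + (3-4)^2 + (2-4)^2 + (0-4)^2 \\ &\quad + (5-4)^2 + (6-4)^2 + (10-4)^2 + (8-4)^2 + (6-4)^2 = 116 \end{aligned}$$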
29
Components of Variance
SSTotal = SSBetween + SSWithin
116 = 70 + 46
$$\sum_{j=1}^{p}\sum_{i=1}^{n}(X_{ij}-\bar{X}_{..})^{2} = n\sum_{j=1}^{p}(\bar{X}_{.j}-\bar{X}_{..})^{2} + \sum_{j=1}^{p}\sum_{i=1}^{n}(X_{ij}-\bar{X}_{.j})^{2}$$
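The three sums of squares for the stress example can be checked with a short script. This is a minimal sketch in plain Python; the dictionary keys and variable names are illustrative choices, while the scores are the 15 observations from the example.

```python
# Stress scores from the example (5 patients per group).
groups = {
    "music": [0, 6, 2, 4, 3],
    "relaxation": [1, 4, 3, 2, 0],
    "control": [5, 6, 10, 8, 6],
}


def mean(xs):
    return sum(xs) / len(xs)


all_scores = [x for xs in groups.values() for x in xs]
grand_mean = mean(all_scores)

# Within: squared deviations of observations from their own group mean.
ss_within = sum((x - mean(xs)) ** 2 for xs in groups.values() for x in xs)
# Between: squared deviations of group means from the grand mean,
# weighted by the number of cases in each group.
ss_between = sum(len(xs) * (mean(xs) - grand_mean) ** 2 for xs in groups.values())
# Total: squared deviations of all observations from the grand mean.
ss_total = sum((x - grand_mean) ** 2 for x in all_scores)

print(ss_between, ss_within, ss_total)  # 70.0 46.0 116.0
```

The printed values reproduce 116 = 70 + 46.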
30
Degrees of Freedom
• Df between: dfB = k – 1 = 3 – 1 = 2
• Df within: dfw = N – k = 15 – 3 = 12
31
Test Statistic
MSBetween = 70 / 2 = 35
MSWithin = 46 / 12 = 3.83
Fc = MSBetween / MSWithin
Fc = 35 / 3.833 = 9.13
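The same arithmetic can be written as a small helper. This is a sketch only; the function name `f_statistic` and its argument names are hypothetical, and the inputs are the sums of squares and sample sizes from the example.

```python
def f_statistic(ss_between, ss_within, k, n_total):
    """Return (MS between, MS within, F) for a one-way ANOVA."""
    df_between = k - 1          # number of groups minus 1
    df_within = n_total - k     # total subjects minus number of groups
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within


ms_b, ms_w, f_c = f_statistic(70, 46, k=3, n_total=15)
print(round(ms_b, 2), round(ms_w, 2), round(f_c, 2))  # 35.0 3.83 9.13
```

Keeping MSWithin unrounded (3.8333...) until the final division is what gives 9.13.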
32
Lookup Critical Value
• Fα = 3.88 (critical F for α = .05 with dfB = 2 and dfw = 12)
33
Conclusions
• Fc = 9.13 > Fα = 3.88
• Fc > Fα Therefore Reject H0
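The critical value and the p-value for this example can be computed without a table. The sketch below relies on a closed form that holds only when the numerator degrees of freedom equal 2, as they do here; for other designs a statistics library would be needed. The function names are hypothetical.

```python
def f_upper_tail_df1_2(f, df2):
    """P(F > f) for an F distribution with 2 and df2 degrees of freedom.
    Closed form valid ONLY for numerator df = 2."""
    return (1.0 + 2.0 * f / df2) ** (-df2 / 2.0)


def f_critical_df1_2(alpha, df2):
    """Value f with P(F > f) = alpha, again only for numerator df = 2."""
    return (df2 / 2.0) * (alpha ** (-2.0 / df2) - 1.0)


df_within = 12
f_c = 35 / (46 / 12)                      # computed F from the example
f_alpha = f_critical_df1_2(0.05, df_within)
p_value = f_upper_tail_df1_2(f_c, df_within)

print(f"critical F = {f_alpha:.3f}, p = {p_value:.3f}")
print("reject H0" if f_c > f_alpha else "cannot reject H0")
```

The critical value comes out at about 3.885, which the slide reports as 3.88, and the p-value rounds to .004.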
34
One-way ANOVA Summary
Source     SS     DF    MS      Fc      Fα
--------   ----   ---   -----   -----   -----
Between    70     2     35      9.13    3.88
Within     46     12    3.83
--------   ----   ---   -----   -----   -----
Total      116    14
35
Multiple Comparison Groups
• The F test does not tell which pairs of means are not equal
• Additional analysis is necessary to determine which pairs are not equal
36
Fisher’s LSD Test
• These are the null and alternative hypotheses being tested
– Ho1: µ1 = µ2   Ha1: µ1 ≠ µ2
– Ho2: µ1 = µ3   Ha2: µ1 ≠ µ3
– Ho3: µ2 = µ3   Ha3: µ2 ≠ µ3
37
Fisher’s LSD Test
• Known as the protected t-test
• The LSD is the least difference between means needed for significance
• df = N – k
• Use the following formula, where n is the number of cases in each group:
$$LSD = t_{.05}\sqrt{MS_w\,(2/n)}$$
38
Calculation of LSD
• All pairs of means differing by at least 2.70 points on the stress scale would be significantly different from one another.
$$LSD = 2.18\sqrt{3.83\,(.40)} = 2.70$$
39
Application to Three Samples
Mean 1 – Mean 2 = 1
Mean 3 – Mean 1 = 4
Mean 3 – Mean 2 = 5
Decisions on the null hypotheses (a difference must exceed LSD = 2.70 to be significant):
Ho1: µ1 = µ2 Not Rejected
Ho2: µ1 = µ3 Rejected
Ho3: µ2 = µ3 Rejected
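The LSD threshold and the three pairwise decisions can be reproduced as follows. This is a minimal sketch: the critical t of 2.18 (df = 12) is taken from the slides rather than computed, and the dictionary keys are illustrative labels for Groups 1, 2, and 3.

```python
from itertools import combinations
from math import sqrt

means = {"music": 3.0, "relaxation": 2.0, "control": 7.0}
ms_within = 46 / 12     # mean square within from the ANOVA table
n_per_group = 5
t_crit = 2.18           # tabled t for alpha = .05, df = 12 (value from the slides)

# Least significant difference: t * sqrt(MSw * (2 / n))
lsd = t_crit * sqrt(ms_within * 2 / n_per_group)

decisions = {}
for a, b in combinations(means, 2):
    diff = abs(means[a] - means[b])
    decisions[(a, b)] = diff > lsd   # True means the pair differs significantly
    print(f"{a} vs {b}: |diff| = {diff:.0f}, reject H0: {decisions[(a, b)]}")

print(f"LSD = {lsd:.2f}")
```

Only the music versus relaxation pair falls below the threshold, matching the decisions listed above.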
40
Use of SPSS in ANOVA
41
Data in SPSS Input Format
Stress Score Groups
0 1
6 1
2 1
4 1
3 1
1 2
4 2
3 2
2 2
0 2
5 3
6 3
10 3
8 3
6 3
42
SPSS Output for ANOVA
Descriptives
Stress Levels
Group                N    Mean   Std. Deviation   Std. Error   95% CI Lower Bound   95% CI Upper Bound   Minimum   Maximum
Music Therapy        5    3.00   2.236            1.000        .22                  5.78                 0         6
Relaxation Therapy   5    2.00   1.581            .707         .04                  3.96                 0         4
Control Group        5    7.00   2.000            .894         4.52                 9.48                 5         10
Total                15   4.00   2.878            .743         2.41                 5.59                 0         10
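The N, mean, standard deviation, and standard error columns can be reproduced from the long-format data with the Python standard library. This is a sketch rather than SPSS's own procedure; the `describe` helper is hypothetical, and the confidence bounds are omitted because they require a t quantile.

```python
from math import sqrt
from statistics import mean, stdev

# (stress score, group) rows, as in the SPSS input format.
rows = [
    (0, 1), (6, 1), (2, 1), (4, 1), (3, 1),
    (1, 2), (4, 2), (3, 2), (2, 2), (0, 2),
    (5, 3), (6, 3), (10, 3), (8, 3), (6, 3),
]

# Collect scores by group code.
by_group = {}
for score, group in rows:
    by_group.setdefault(group, []).append(score)


def describe(xs):
    sd = stdev(xs)  # sample standard deviation (n - 1 in the denominator)
    return {"n": len(xs), "mean": mean(xs), "sd": sd, "se": sd / sqrt(len(xs))}


summary = {g: describe(xs) for g, xs in by_group.items()}
summary["total"] = describe([score for score, _ in rows])

for label, s in summary.items():
    print(label, s["n"], f"{s['mean']:.2f}", f"{s['sd']:.3f}", f"{s['se']:.3f}")
```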
43
SPSS Output for ANOVA
Test of Homogeneity of Variances
Stress Levels
Levene Statistic   df1   df2   Sig. level or p-value
.242               2     12    .788
P > .05, therefore, the assumption of homogeneity of variance is met.
ANOVA
Stress Levels
                 Sum of Squares   df   Mean Square   F       Sig. level or p-value
Between Groups   70.000           2    35.000        9.130   .004
Within Groups    46.000           12   3.833
Total            116.000          14
P < .05, therefore, we reject the null hypothesis and continue with the multiple comparison table.
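The Levene statistic in the homogeneity-of-variance output can be reproduced by running a one-way ANOVA on the absolute deviations of each score from its group mean. The sketch below assumes this mean-centred form of Levene's test; the helper names are hypothetical, and the p-value uses a closed form that is valid only because df1 = 2.

```python
groups = [
    [0, 6, 2, 4, 3],     # music therapy
    [1, 4, 3, 2, 0],     # relaxation therapy
    [5, 6, 10, 8, 6],    # control
]


def mean(xs):
    return sum(xs) / len(xs)


def one_way_f(samples):
    """F ratio of a one-way ANOVA on a list of samples."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    grand = mean([x for s in samples for x in s])
    ss_between = sum(len(s) * (mean(s) - grand) ** 2 for s in samples)
    ss_within = sum((x - mean(s)) ** 2 for s in samples for x in s)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Levene (mean-centred): ANOVA on absolute deviations from group means.
abs_dev = [[abs(x - mean(s)) for x in s] for s in groups]
levene_w = one_way_f(abs_dev)

df2 = 12
p_value = (1.0 + 2.0 * levene_w / df2) ** (-df2 / 2.0)  # only valid for df1 = 2

print(f"Levene statistic = {levene_w:.3f}, p = {p_value:.3f}")
```

The same `one_way_f` helper applied to the raw scores gives the F of 9.130 in the ANOVA table.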
44
SPSS Output for ANOVA
Multiple Comparisons
Dependent Variable: Stress Levels
LSD
(I) Groups           (J) Groups           Mean Difference (I-J)   Std. Error   Sig. Level   95% CI Lower   95% CI Upper
Music Therapy        Relaxation Therapy   1.000                   1.238        .435         -1.70          3.70
Music Therapy        Control Group        -4.000(*)               1.238        .007         -6.70          -1.30
Relaxation Therapy   Music Therapy        -1.000                  1.238        .435         -3.70          1.70
Relaxation Therapy   Control Group        -5.000(*)               1.238        .002         -7.70          -2.30
Control Group        Music Therapy        4.000(*)                1.238        .007         1.30           6.70
Control Group        Relaxation Therapy   5.000(*)                1.238        .002         2.30           7.70
* The mean difference is significant at the .05 level.
45
Take home lesson
How to compare means of three or more samples