one way repeated measure anova
DESCRIPTION
Repeated measure ANOVA; how it works, F statistic, assumptions and its pros and cons
TRANSCRIPT
REPEATED MEASURES OF ANOVA 1
Repeated Measures ANOVA
Introduction
The repeated measures ANOVA is a member of the ANOVA family. ANOVA is short for
Analysis Of Variance. All ANOVAs compare one or more mean scores with each other; they
are tests for the difference in mean scores (Statistics Solutions, 2012). Repeated measures
ANOVA is the equivalent of the one-way ANOVA, but for related, not independent groups, and
is the extension of the dependent t-test. A repeated measures ANOVA is also referred to as a
within-subjects ANOVA or ANOVA for correlated samples. The repeated factor is called a
“within” subjects factor because comparisons are made multiple times ("repeated") “within” the
same subject rather than across ("between") different subjects (Vincent & Weir, n.d).
The within-subjects ANOVA is appropriate for repeated measures designs (e.g., pretest-posttest
designs), within-subjects experimental designs, matched designs, or multiple measures (Newsom, 2012).
Like t-tests, repeated measures ANOVA provides the statistical tools to determine whether or not
change has occurred over time. T-tests compare average scores at two different time periods for
a single group of subjects. Repeated measures ANOVA compares the average scores at multiple
time periods for a single group of subjects (Laerd Statistics, 2013).
Examples
A self-esteem measure taken before, after, and at follow-up of a psychological
intervention.
A measure taken over time to measure change, such as a motivation score upon entry to a
new program, 6 months into the program, 1 year into the program, and at the exit of the
program.
A measure repeated across multiple conditions, such as a measure of experimental
condition A, condition B, and condition C.
Several related, comparable measures (e.g., sub-scales of an IQ test).
Since repeated measures are collected on the same subjects, the means of those measures
are dependent. A particular subject’s scores will be more alike than scores collected from
multiple subjects, meaning that there is less variability from measure to measure than is observed
from person to person in simple ANOVA.
Repeated measures ANOVA separates the two sources of variance: measures and
persons. This separation of the sources of variance decreases MSE, the random variation
(sampling error) component, because there are now two sources of known variation (subjects and
measures) instead of just one (subjects) as in simple ANOVA. The variation in scores due to
differences between subjects is separated from variation due to differences from measure to
measure within a subject.
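The separation described above can be written as a partition of the sums of squares. The notation below follows the Example section of this document (SS for sum of squares); the numerical check uses the values computed there.

```latex
\[
SS_{total} = SS_{between} + SS_{within}, \qquad
SS_{within} = SS_{subjects} + SS_{error}
\]
\[
32 = 14 + 18, \qquad 18 = 12 + 6
\]
```

Only the error part of the within variation is used as the denominator of F, which is why the error term is smaller than in simple ANOVA.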
Instead of comparing treatment effects to a group of different subjects, treatment effects
are compared across multiple measures in the same subjects. Each subject provides their own
"control" value for the comparison. Consequently, this type of design is more sensitive to
differences (i.e., requires smaller differences in the dependent variable to reject the null
hypothesis) than are between subjects designs.
Assumptions
1. Dependent variable
The dependent variable should be measured at the interval or ratio level (i.e., it is
continuous). Examples of variables that meet this criterion include revision time
(measured in hours), intelligence (measured using IQ score), exam performance
(measured from 0 to 100), weight (measured in kg), and so forth (Laerd Statistics, 2013).
2. Independent variable
The independent variable should consist of at least two categorical "related groups" or
"matched pairs". "Related groups" indicates that the same subjects are present in every group.
It is possible to have the same subjects in each group because each subject
has been measured on two or more occasions on the same dependent variable.
For example, a researcher might have measured 10 individuals' performance in a spelling
test (the dependent variable) before and after they underwent a new form of computerized
teaching method to improve spelling. He would like to know if the computer training
improved their spelling performance. The first related group consists of the subjects at the
beginning (prior to) the computerized spelling training and the second related group consists
of the same subjects, but now at the end of the computerized training. The repeated measures
ANOVA can also be used to compare different subjects, but this does not happen very often
(Laerd Statistics, 2013).
3. No significant outliers
There should be no significant outliers in the differences between the related groups.
Outliers are simply single data points within the data that do not follow the usual pattern
(e.g., in a study of 100 students' IQ scores, where the mean score was 108 with only a small
variation between students, one student had a score of 156, which is very unusual, and may
even put her in the top 1% of IQ scores globally). The problem with outliers is that they can
have a negative effect on the repeated measures ANOVA, distorting the differences between
the related groups (whether increasing or decreasing the scores on the dependent variable),
which reduces the accuracy of the results. Fortunately, possible outliers are easy to detect
when SPSS is used to run a repeated measures ANOVA.
4. Normally distributed dependent variable
The distributions of the differences in the dependent variable between the two or more
related groups should be approximately normally distributed. The repeated measures
ANOVA only requires approximately normal data because it is quite "robust" to violations of
normality, meaning that the assumption can be violated a little and the test will still provide
valid results. Normality can be checked with the Shapiro-Wilk test, which is easily run in SPSS
(Laerd Statistics, 2013).
5. Sphericity
Sphericity means that the variances of the differences between all combinations of related groups
must be equal. Unfortunately, repeated measures ANOVAs are particularly susceptible to
violating the assumption of sphericity. A violation causes the test to become too liberal (i.e., it
leads to an increase in the Type I error rate; that is, the likelihood of detecting a statistically
significant result when there isn't one), and it can also cause a loss of power that increases the
Type II error rate (Discovering Statistics, n.d.). Mauchly's Test of Sphericity in SPSS tests
whether the data have met or failed this assumption (Laerd Statistics, 2013).
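As a rough illustration of what the sphericity assumption refers to, the following sketch computes the sample variance of the difference scores for every pair of related groups. The scores are hypothetical values made up for this illustration and the function name is an assumption. This is only a descriptive look at the variances, not Mauchly's test.

```python
from itertools import combinations
from statistics import variance

# Hypothetical scores: one row per subject, one column per related group.
scores = [
    [8, 10, 13],
    [6, 9, 9],
    [7, 7, 12],
    [5, 8, 10],
    [9, 12, 11],
]

def difference_variances(rows):
    """Return {(i, j): sample variance of (group i - group j)} for all pairs."""
    k = len(rows[0])
    result = {}
    for i, j in combinations(range(k), 2):
        diffs = [row[i] - row[j] for row in rows]
        result[(i, j)] = variance(diffs)
    return result

# Sphericity holds in the population when these variances are all equal.
for pair, var in difference_variances(scores).items():
    print(pair, round(var, 2))
```

If the printed variances differ widely, that is a hint that the assumption may be in doubt, but a formal decision would still rest on a test such as Mauchly's.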
Use of Repeated Measures of ANOVA
In repeated measures ANOVA, the independent variable has categories called levels or
related groups. Lowry (1999) states that in the correlated-samples ANOVA, the number of
conditions is three or more: A|B|C, A|B|C|D, and so forth. Thus, for k=3:
Subject   A                         B                         C
1         subj1 under condition A   subj1 under condition B   subj1 under condition C
2         subj2 under condition A   subj2 under condition B   subj2 under condition C
3         subj3 under condition A   subj3 under condition B   subj3 under condition C
And so on…
Each row represents one subject measured under each of k conditions.
For instance, where measurements are repeated over time, such as when measuring changes in
blood pressure due to an exercise-training program, the independent variable is time. Each level
(or related group) is a specific time point. Hence, for an exercise-training study with
three time points, each time point is a level of the independent variable.
Where measurements are made under different conditions, the conditions are the levels (or
related groups) of the independent variable (e.g., type of cake is the independent variable with
chocolate, caramel, and lemon cake as the levels of the independent variable). A schematic of a
different-conditions repeated measures design is shown below. It should be noted that often the
levels of the independent variable are not referred to as conditions, but treatments. The
independent variable is more commonly referred to as the within-subjects factor (Laerd Statistics,
2013).
Formula for Repeated Measures ANOVA
The statistic used in repeated measures ANOVA is F, the same statistic as in simple ANOVA,
but now computed as the ratio of the variation between the repeated measures (the
“within”-subjects factor) to the “error” variation (Vincent & Weir, n.d.).
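Written out in symbols, and using the sums of squares and degrees of freedom from the Example section of this document, the ratio takes the following form. MS (mean square) denotes a sum of squares divided by its degrees of freedom.

```latex
\[
F = \frac{MS_{between}}{MS_{error}}
  = \frac{SS_{between} / df_{between}}{SS_{error} / df_{error}},
\qquad df_{between} = k - 1, \quad df_{error} = (N - k) - (n - 1)
\]
```

Here k is the number of related groups, n the number of subjects and N = nk the total number of scores.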
The observed between measures variance is an estimate of the variation between measures
that would be expected in the population under the conditions of the study. The observed error
variance is an estimate of the variation that would be expected to occur as a result of sampling
error alone. If the observed (computed) value for F is significantly higher than the value expected
by sampling variation alone, then the variance between groups is larger than would be expected
by sampling error alone. In other words, at least one mean differs from the others enough to
cause large variation between the measures (Vincent & Weir, n.d.).
The logic of Repeated Measures ANOVA is that any differences that are found between
treatments can be explained by only two factors:
1. Treatment effect.
2. Error or Chance
Large value of F: a lot of the overall variation in scores is due to the experimental
manipulation, rather than to random variation between participants (Sussex Edu, n.d.).
Small value of F: the variation in scores produced by the experimental manipulation is
small, compared to random variation between participants. In both cases, F is the ratio of
systematic variation to random variation, or “error” (Sussex Edu, n.d.).
Understanding One-Way Repeated-Measures ANOVA
In many studies using the one-way repeated-measures design, the levels of a within-
subject factor represent multiple observations on a scale over time or under different conditions.
However, for some studies, levels of a within-subjects factor may represent scores from different
scales, and the focus may be on evaluating differences in means among these scales. In such a
setting the scales must be commensurable for the ANOVA significance tests to be meaningful.
That is, the scales must measure individuals on the same metric, and the difference scores
between scales must be interpretable (Oak edu, n.d).
In some studies, individuals are matched on one or more variables so that individuals
within a set are similar on a matching variable(s), while individuals not in the same set are
dissimilar. The number of individuals within a set is equal to the number of levels of a factor.
The individuals within a set are then observed under various levels of this factor. The matching
process for these designs is likely to produce correlated responses on the dependent variable like
those of repeated-measures designs. Consequently, the data from these studies can be analyzed
as if the factor is a within-subjects factor.
SPSS conducts a standard univariate F test if the within-subjects factor has only two
levels. Three types of tests are conducted if the within-subjects factor has more than two levels:
the standard univariate F test, alternative univariate tests, and multivariate tests. All three types
of tests evaluate the same hypothesis – the population means are equal for all levels of the factor.
The choice of what test to report should be made prior to viewing the results (Oak edu, n.d). The
choice of analysis depends on complex relationships between the degree of sphericity violation
and sample size (Park, Cho & Ki, 2009).
The standard univariate ANOVA F test is not recommended when the within-subjects
factor has more than two levels because one of its assumptions, the sphericity assumption, is
commonly violated, and the ANOVA F test yields inaccurate p values to the extent that this
assumption is violated.
The alternative univariate tests take into account violations of the sphericity
assumption. These tests employ the same calculated F statistic as the standard univariate test, but
its associated p value potentially differs. In determining the p value, an epsilon statistic is
calculated based on the sample data to assess the degree that the sphericity assumption is
violated. The numerator and denominator degrees of freedom of the standard test are multiplied
by epsilon to obtain a corrected set of degrees of freedom for the tabled F value and to determine
its p value (Oak edu, n.d).
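The source does not name a particular epsilon statistic. As a sketch under that caveat, the code below computes the Greenhouse-Geisser estimate, one commonly used choice, from the covariance matrix of the repeated measures and then multiplies both degrees of freedom by it. The scores are hypothetical and the function name is an assumption.

```python
# Hypothetical scores: one row per subject, one column per level.
scores = [
    [8, 10, 13],
    [6, 9, 9],
    [7, 7, 12],
    [5, 8, 10],
    [9, 12, 11],
]

def greenhouse_geisser_epsilon(rows):
    """Greenhouse-Geisser epsilon from an n-subjects by k-levels score table."""
    n, k = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(k)]
    # Sample covariance matrix of the k repeated measures.
    cov = [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in rows) / (n - 1)
            for j in range(k)] for i in range(k)]
    # Double-centre the matrix: subtract row and column means, add the grand mean.
    row_means = [sum(cov[i]) / k for i in range(k)]
    grand = sum(row_means) / k
    centred = [[cov[i][j] - row_means[i] - row_means[j] + grand
                for j in range(k)] for i in range(k)]
    trace = sum(centred[i][i] for i in range(k))
    sum_sq = sum(c * c for r in centred for c in r)
    return trace ** 2 / ((k - 1) * sum_sq)

n, k = len(scores), len(scores[0])
eps = greenhouse_geisser_epsilon(scores)
# Corrected degrees of freedom: both are multiplied by epsilon.
df_numerator = eps * (k - 1)
df_denominator = eps * (k - 1) * (n - 1)
print(round(eps, 3), round(df_numerator, 2), round(df_denominator, 2))
```

Epsilon equals 1 when sphericity holds exactly and cannot fall below 1/(k - 1), so the corrected degrees of freedom are never larger than the uncorrected ones.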
The multivariate test does not require the assumption of sphericity. Difference scores
are computed by comparing scores from different levels of the within-subjects factor. For
example, for a within-subjects factor with three levels, difference scores might be computed
between the first and second level and between the second and third level. The multivariate test
then would evaluate whether the population means for these two sets of difference scores are
simultaneously equal to zero. This test evaluates not only the means associated with these two
sets of difference scores, but also evaluates whether the mean of the difference scores between
the first and third levels of the factor is equal to zero as well as linear combinations of these
difference scores.
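One standard way to test whether two sets of difference scores have population means of zero simultaneously is Hotelling's one-sample T² statistic. The sketch below applies it to a factor with three levels. It is an illustration under that assumption, with hypothetical scores and a hypothetical function name; the source does not state that SPSS computes the multivariate test in exactly this way.

```python
from statistics import mean

# Hypothetical scores: one row per subject, one column per level.
scores = [
    [8, 10, 13],
    [6, 9, 9],
    [7, 7, 12],
    [5, 8, 10],
    [9, 12, 11],
]

def hotelling_f_three_levels(rows):
    """F statistic for H0: both mean difference scores are zero (three levels)."""
    n = len(rows)
    d1 = [row[0] - row[1] for row in rows]  # level 1 minus level 2
    d2 = [row[1] - row[2] for row in rows]  # level 2 minus level 3
    m1, m2 = mean(d1), mean(d2)
    # Sample variances and covariance of the two sets of difference scores.
    s11 = sum((a - m1) ** 2 for a in d1) / (n - 1)
    s22 = sum((b - m2) ** 2 for b in d2) / (n - 1)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(d1, d2)) / (n - 1)
    det = s11 * s22 - s12 ** 2
    # T^2 = n * dbar' S^-1 dbar, with the 2 x 2 inverse written out by hand.
    t2 = n * (s22 * m1 ** 2 - 2 * s12 * m1 * m2 + s11 * m2 ** 2) / det
    p = 2  # number of difference scores
    f = (n - p) / (p * (n - 1)) * t2
    return f, (p, n - p)

f_value, df = hotelling_f_three_levels(scores)
print(round(f_value, 4), df)
```

The returned F would be referred to an F distribution with the returned degrees of freedom. No sphericity assumption enters the calculation, because the full covariance matrix of the difference scores is used.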
The SPSS Repeated-Measures procedure computes the difference scores used in the
analysis for us. However, these difference scores do not become part of our data file and,
therefore, we may or may not be aware that the multivariate test is conducted on these difference
scores. Applied statisticians tend to prefer the multivariate test to the standard or the alternative
univariate test because the multivariate test and follow-up tests have a close conceptual link to
each other (Oak edu, n.d).
If the initial hypothesis that the means are equal is rejected and there are more than two
means, then follow-up tests are conducted to determine which of the means differ significantly
from each other. Although more complex comparisons can be performed, most researchers
choose to conduct pairwise comparisons. These comparisons may be evaluated with SPSS using
the paired-samples t test procedure, and a Bonferroni approach or Holm’s Sequential
Bonferroni procedure can be used to control for Type I error across the multiple pairwise tests.
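The two corrections mentioned above can be sketched as follows. The p values are hypothetical numbers standing in for the results of three paired-samples t tests, and the function names are assumptions made for this illustration.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject each hypothesis whose p value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def holm_bonferroni(p_values, alpha=0.05):
    """Holm's sequential procedure: test p values from smallest to largest."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The smallest p value is compared with alpha / m, the next with
        # alpha / (m - 1), and so on.
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # stop at the first comparison that is not significant
    return reject

# Hypothetical p values for: session 1 vs 2, session 1 vs 3, session 2 vs 3.
p_values = [0.030, 0.004, 0.020]
print("Bonferroni:", bonferroni(p_values))
print("Holm:      ", holm_bonferroni(p_values))
```

With these hypothetical values the plain Bonferroni approach rejects only the second comparison, whereas Holm's procedure rejects all three, which shows why the sequential version is less conservative while still controlling the Type I error rate across the set of tests.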
Hypothesis for Repeated Measures ANOVA
All three types of repeated measures ANOVA tests evaluate the same hypothesis
(Khelifa, n.d.). The repeated measures ANOVA tests for whether there are any differences
between related population means. The null hypothesis (H0) states that the means are equal:
H0: µ1 = µ2 = µ3 = … = µk
Where,
µ = population mean and
k = number of related groups
The alternative hypothesis (HA) states that the related population means are not equal (at
least one mean is different to another mean):
HA: at least two means are significantly different
Example
An experimenter wants to look at the effects of practice on manual dexterity scores. Four
people are randomly sampled and tested at three different times. Does practice change manual
dexterity scores? Test with α = .05.
Person   Session 1   Session 2   Session 3   P
A        3           3           6           12
B        2           2           2           6
C        1           1           4           6
D        2           4           6           12
         T1=8        T2=10       T3=18       G=36
k=3, n=4, N=12, ΣX² = 140
Step 1: State the Hypotheses
H0: µ1 = µ2 = µ3
Ha: At least one treatment mean is different from the others.
Step 2: Determine Fcrit
If the example had been an independent measures ANOVA, we would use df between
and within and find:
Fcrit(2, 9) = 4.26 at α = .05
However, the example was a repeated measures ANOVA so we use df between and error
and find:
Fcrit(2, 6) = 5.14 at α = .05
Step 3: Compute the Statistic
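For a repeated measures ANOVA:
\[ SS_{between} = \sum \frac{T^2}{n} - \frac{G^2}{N} = \frac{8^2}{4} + \frac{10^2}{4} + \frac{18^2}{4} - \frac{36^2}{12} = 14 \]
\[ SS_{within} = \sum X^2 - \sum \frac{T^2}{n} = 140 - \left( \frac{8^2}{4} + \frac{10^2}{4} + \frac{18^2}{4} \right) = 18 \]
\[ SS_{subjects} = \sum \frac{P^2}{k} - \frac{G^2}{N} = \frac{12^2}{3} + \frac{6^2}{3} + \frac{6^2}{3} + \frac{12^2}{3} - \frac{36^2}{12} = 12 \]
\[ SS_{error} = SS_{within} - SS_{subjects} = 18 - 12 = 6 \]
\[ SS_{total} = \sum X^2 - \frac{G^2}{N} = 140 - \frac{36^2}{12} = 32 \]
\[ df_{between} = k - 1 = 2, \quad df_{within} = N - k = 9, \quad df_{subjects} = n - 1 = 3 \]
\[ df_{error} = (N - k) - (n - 1) = 6, \quad df_{total} = N - 1 = 11 \]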
Source     SS   df   MS   Fobt   Fcrit
Between    14   2    7    7.0    5.14
Within     18   9
Subjects   12   3
Error      6    6    1
Total      32   11
Conclusion: Reject H0, because Fobt = 7.0 exceeds Fcrit = 5.14. Practice changes manual
dexterity scores.
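The arithmetic of this example can be checked with a short script. The sketch below recomputes the sums of squares from the raw scores using the same computational formulas, and also reports the F that an independent measures analysis of the same numbers would give. The function and variable names are assumptions chosen to mirror the notation above.

```python
# Scores from the worked example: one row per person, one column per session.
scores = [
    [3, 3, 6],  # person A
    [2, 2, 2],  # person B
    [1, 1, 4],  # person C
    [2, 4, 6],  # person D
]

def repeated_measures_anova(rows):
    """Return sums of squares, degrees of freedom and F for a one-way design."""
    n, k = len(rows), len(rows[0])
    N = n * k
    G = sum(sum(row) for row in rows)                    # grand total
    sum_x2 = sum(x * x for row in rows for x in row)     # sum of squared scores
    T = [sum(row[j] for row in rows) for j in range(k)]  # session totals
    P = [sum(row) for row in rows]                       # person totals
    correction = G ** 2 / N

    ss_between = sum(t * t for t in T) / n - correction
    ss_within = sum_x2 - sum(t * t for t in T) / n
    ss_subjects = sum(p * p for p in P) / k - correction
    ss_error = ss_within - ss_subjects
    ss_total = sum_x2 - correction

    df_between, df_within = k - 1, N - k
    df_error = df_within - (n - 1)
    ms_between = ss_between / df_between
    return {
        "SS_between": ss_between, "SS_within": ss_within,
        "SS_subjects": ss_subjects, "SS_error": ss_error, "SS_total": ss_total,
        "df_between": df_between, "df_within": df_within, "df_error": df_error,
        "F_repeated": ms_between / (ss_error / df_error),
        "F_independent": ms_between / (ss_within / df_within),
    }

table = repeated_measures_anova(scores)
for name, value in table.items():
    print(f"{name}: {value:g}")
# Critical values are the ones quoted in Step 2 for alpha = .05.
print("F_repeated exceeds Fcrit(2, 6) = 5.14:", table["F_repeated"] > 5.14)
print("F_independent exceeds Fcrit(2, 9) = 4.26:", table["F_independent"] > 4.26)
```

Comparing the two F values shows the effect of removing the subject variation from the error term: the numerator is the same in both analyses, and only the denominator changes.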
Alternate test
The Friedman test is a non-parametric statistical test used to detect differences in
treatments across multiple test attempts. The Friedman analysis of variance by ranks is an
alternative to one-way repeated measures ANOVA if the dependent variable is not normally
distributed. When using the Friedman test it is important to use a sample size of at least 12
participants to obtain accurate p values.
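A minimal sketch of the rank-based calculation behind the Friedman test is given below. It assumes that no subject has tied scores (the basic formula used here has no tie correction), uses hypothetical data, and computes only the test statistic, not a p value. The five subjects are far fewer than the sample size recommended above and serve only to keep the arithmetic visible.

```python
# Hypothetical scores: one row per subject, one column per condition.
scores = [
    [8, 10, 13],
    [6, 9, 11],
    [7, 8, 12],
    [5, 9, 10],
    [9, 12, 11],
]

def friedman_statistic(rows):
    """Friedman chi-square statistic, assuming no ties within any subject."""
    n, k = len(rows), len(rows[0])
    rank_sums = [0] * k
    for row in rows:
        if len(set(row)) != k:
            raise ValueError("tied scores within a subject are not handled")
        # Rank the k scores of this subject from 1 (smallest) to k (largest).
        order = sorted(range(k), key=lambda j: row[j])
        for rank, j in enumerate(order, start=1):
            rank_sums[j] += rank
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)

print(round(friedman_statistic(scores), 3))
```

Because each subject's scores are replaced by ranks before anything else is computed, the statistic does not depend on the scores being normally distributed.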
Advantages
A repeated measures design is the method of using the same participants in different
experimental manipulations (Field, 2011). Using the same participants for
all manipulations allows the researcher to exclude the effects of individual differences that
could occur in independent groups (Howitt & Crammer, 2011). Factors such as IQ, ability, age
and other important variables remain the same (Field, 2011). Because the same participants are
used, the design requires fewer participants than other designs, such as independent designs.
The important point is that a small but consistent difference between conditions can be detected
in the face of large overall differences among the subjects, even when the difference between
conditions is very small relative to the differences among subjects. It is because the conditions
can be compared within each of the subjects that the small difference becomes apparent.
Differences between subjects are taken into account and are therefore not error.
Removing variance due to differences between subjects from the error variance greatly
increases the power of significance tests. Therefore, within-subjects designs are almost always
more powerful than between-subject designs. Since power is such an important consideration in
the design of experiments, within-subject designs are generally preferable to between-subject
designs. This design is also very economical as sample members are recruited once for treatment
administration (Choudhury, 2009).
Disadvantages
The use of the same participants leads to difficulties in counteracting order effects
and to a need for additional materials. An effect observed could be due to boredom affecting
concentration and performance, such as reaction time and accuracy, caused by repetition (Pan,
Shell & Schleifer, 1994; Bergh & Vrana, 1998; Dsowen, 2011). Effects could also be due to
practice causing participants’ results to improve because they were given more chance to
practice and become familiar with the task (Collie, Maruff, Darby & McStephen, 2003; Dsowen,
2011). The order effects of an experiment can be reduced by counterbalancing (Field, 2011).
This involves randomly assigning the order of the experimental manipulations participants are
exposed to. For example, half of the participants would be exposed to Control A and then
Control B, and the other half of the participants exposed to Control B and then Control A
(Howitt & Crammer, 2011; Dsowen, 2011). The results collected should then be less affected by
factors such as boredom and practice. Researchers can also provide opportunities to take a break
during the experiment to counteract boredom and loss of concentration (Pan, Shell & Schleifer,
1994; Dsowen, 2011).
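Counterbalancing as described above can be sketched in a few lines. The participant labels, the two condition labels and the function name are hypothetical. The participants are shuffled and then dealt alternately to the two orders so that each order is used equally often.

```python
import random

def counterbalance(participants, orders=(("A", "B"), ("B", "A")), seed=None):
    """Randomly assign participants to condition orders in equal numbers."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    # Deal the shuffled participants round-robin across the possible orders.
    return {person: orders[i % len(orders)] for i, person in enumerate(shuffled)}

participants = [f"P{i}" for i in range(1, 9)]  # eight hypothetical participants
assignment = counterbalance(participants, seed=1)
for person in participants:
    print(person, "->", " then ".join(assignment[person]))
```

With more than two conditions, the `orders` argument could list every permutation of the conditions, although the number of orders then grows quickly.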
If a study were testing how Factor A and Factor B affected memory, the researcher would
require a different list of words for participants to memorize for Factor A and B, whereas in
independent groups the same list could be used for each factor, because each group only sees the
material once (Nilsson, Soil & Sullivan, 1994). In using repeated measures the individual
differences between participants are reduced, but this instead produces problems with
differences between the materials or environments the participants are exposed to. Therefore the
result may be due to these differences in materials rather than the independent variables in
question. The materials must be carefully examined to ensure equal quality in factors such as
difficulty (Riedel, Klaassen, Deutz, Someren & Praag, 1999; Dsowen, 2011).
Conclusion
The advantages and disadvantages of repeated measures must be compared with the benefits
of using independent groups. Careful consideration must be given to which design
would best meet the needs of each study. Problems related to the design must be reduced to have
as little effect on results as possible. No method is without difficulty, and the researcher must
decide which would best produce results for the study under investigation (Dsowen, 2011).