one way repeated measure anova

REPEATED MEASURES OF ANOVA 1

Repeated Measures ANOVA

Introduction

The repeated measures ANOVA is a member of the ANOVA family. ANOVA is short for

Analysis Of Variance. All ANOVAs compare one or more mean scores with each other; they

are tests for the difference in mean scores (Statistics Solutions, 2012). Repeated measures

ANOVA is the equivalent of the one-way ANOVA, but for related, not independent groups, and

is the extension of the dependent t-test. A repeated measures ANOVA is also referred to as a

within-subjects ANOVA or ANOVA for correlated samples. The repeated factor is called a

“within” subjects factor because comparisons are made multiple times ("repeated") “within” the

same subject rather than across ("between") different subjects (Vincent & Weir, n.d).

The within-subjects ANOVA is appropriate for repeated measures designs (e.g., pretest-posttest

designs), within-subjects experimental designs, matched designs, or multiple measures (Newsom, 2012).

Like t-tests, repeated measures ANOVA provides the statistical tools to determine whether or not change has occurred over time. T-tests compare average scores at two different time periods for a single group of subjects. Repeated measures ANOVA compares the average score at multiple time periods for a single group of subjects (Laerd Statistics, 2013). Examples include:

1. A self-esteem measure taken before, after, and at follow-up of a psychological intervention.

2. A measure taken over time to measure change, such as a motivation score upon entry to a new program, 6 months into the program, 1 year into the program, and at the exit of the program.

3. A measure repeated across multiple conditions, such as a measure under experimental condition A, condition B, and condition C.

4. Several related, comparable measures (e.g., sub-scales of an IQ test).

Since repeated measures are collected on the same subjects, the measures are dependent. A particular subject’s scores will be more alike than scores collected from different subjects, meaning that there is less variability from measure to measure within a person than is observed from person to person in simple ANOVA.

Repeated measures ANOVA separates the two sources of variance: measures and

persons. This separation of the sources of variance decreases MSE, the random variation

(sampling error) component, because there are now two sources of known variation (subjects and

measures) instead of just one (subjects) as in simple ANOVA. The variation in scores due to

differences between subjects is separated from variation due to differences from measure to

measure within a subject.
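The separation described above can be written as a partition of sums of squares. The following is a sketch in the notation of the worked example later in this paper (SS for sums of squares); the numbers in the last line are taken from that example.

```latex
\begin{align*}
SS_{\text{total}}  &= SS_{\text{between}} + SS_{\text{within}} \\
SS_{\text{within}} &= SS_{\text{subjects}} + SS_{\text{error}} \\
32 &= 14 + 18, \qquad 18 = 12 + 6
\end{align*}
```

In that example the error mean square is 6/6 = 1, compared with 18/9 = 2 if the variation between subjects were left in the error term.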

Instead of comparing treatment effects to a group of different subjects, treatment effects

are compared across multiple measures in the same subjects. Each subject provides their own

"control" value for the comparison. Consequently, this type of design is more sensitive to

differences (i.e., requires smaller differences in the dependent variable to reject the null

hypothesis) than are between subjects designs.

Assumptions

1. Dependent variable

The dependent variable should be measured at the interval or ratio level (i.e., it is

continuous). Examples of variables that meet this criterion include revision time

(measured in hours), intelligence (measured using IQ score), exam performance

(measured from 0 to 100), weight (measured in kg), and so forth (Laerd Statistics, 2013).


2. Independent variable

The independent variable should consist of at least two categorical "related groups" or "matched pairs". "Related groups" indicates that the same subjects are present in every group. It is possible to have the same subjects in each group because each subject has been measured on two or more occasions on the same dependent variable.

For example, a researcher might have measured 10 individuals' performance in a spelling

test (the dependent variable) before and after they underwent a new form of computerized

teaching method to improve spelling. He would like to know if the computer training

improved their spelling performance. The first related group consists of the subjects at the

beginning (prior to) the computerized spelling training and the second related group consists

of the same subjects, but now at the end of the computerized training. The repeated measures ANOVA can also be used to compare different subjects, provided they have been matched (the "matched pairs" case), but this does not happen very often (Laerd Statistics, 2013).

3. No significant outliers

There should be no significant outliers in the differences between the related groups.

Outliers are simply single data points within the data that do not follow the usual pattern

(e.g., in a study of 100 students' IQ scores, where the mean score was 108 with only a small

variation between students, one student had a score of 156, which is very unusual, and may

even put her in the top 1% of IQ scores globally). The problem with outliers is that they can

have a negative effect on the repeated measures ANOVA, distorting the differences between

the related groups (whether increasing or decreasing the scores on the dependent variable),

which reduces the accuracy of your results. Fortunately, SPSS can easily detect possible outliers when it is used to run a repeated measures ANOVA.


4. Normally distributed dependent variable

The distributions of the differences in the dependent variable between the two or more related groups should be approximately normally distributed. The repeated measures ANOVA requires only approximately normal data because it is quite "robust" to violations of normality, meaning that the assumption can be violated to a small degree and the test will still provide valid results. Normality can be checked with the Shapiro-Wilk test, which is readily available in SPSS (Laerd Statistics, 2013).

5. Sphericity

Sphericity means that the variances of the differences between all combinations of related groups must be equal. Unfortunately, repeated measures ANOVAs are particularly susceptible to violations of the sphericity assumption. A violation causes the test to become too liberal (i.e., it increases the Type I error rate, the likelihood of detecting a statistically significant result when there isn't one) and can also cause a loss of power, which increases the Type II error rate (Discovering Statistics, n.d.). Mauchly's Test of Sphericity in SPSS tests whether the data have met or failed this assumption (Laerd Statistics, 2013).
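To make the definition of sphericity concrete, the sketch below computes the variance of the difference scores for every pair of related groups, using the scores from the worked example later in this paper. It is only a descriptive check and is not Mauchly's test; the function name `difference_variances` is a hypothetical helper introduced for illustration.

```python
from itertools import combinations
from statistics import variance

# Scores from the worked example in this paper:
# one row per subject, one column per related group (session).
scores = [
    [3, 3, 6],
    [2, 2, 2],
    [1, 1, 4],
    [2, 4, 6],
]


def difference_variances(rows):
    """Return {(i, j): sample variance of (group i - group j)} for all pairs."""
    k = len(rows[0])
    result = {}
    for i, j in combinations(range(k), 2):
        diffs = [row[i] - row[j] for row in rows]
        result[(i, j)] = variance(diffs)
    return result


diff_vars = difference_variances(scores)
for (i, j), v in diff_vars.items():
    print(f"groups {i + 1} and {j + 1}: variance of differences = {v:.3f}")
```

Under perfect sphericity the printed variances would all be equal. Here they are 1, 3, and 2, so the sample variances differ, although with only four subjects this says little about the population.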

Use of Repeated Measures ANOVA

In repeated measures ANOVA, the independent variable has categories called levels or

related groups. Lowry (1999) states that in the correlated-samples ANOVA, the number of

conditions is three or more: A|B|C, A|B|C|D, and so forth. Thus, for k=3:

Subject   A                         B                         C
1         subj1 under condition A   subj1 under condition B   subj1 under condition C
2         subj2 under condition A   subj2 under condition B   subj2 under condition C
3         subj3 under condition A   subj3 under condition B   subj3 under condition C
And so on…

Each row represents one subject measured under each of k conditions.

For instance, when measurements are repeated over time, such as when measuring changes in blood pressure due to an exercise-training program, the independent variable is time. Each level (or related group) is a specific time point. Hence, for an exercise-training study with three time points, each time point is a level of the independent variable.

Where measurements are made under different conditions, the conditions are the levels (or related groups) of the independent variable (e.g., type of cake is the independent variable with chocolate, caramel, and lemon cake as the levels of the independent variable). The subject-by-condition layout above is a schematic of such a different-conditions repeated measures design. It should be noted that the levels of the independent variable are often referred to not as conditions but as treatments. The independent variable is more commonly referred to as the within-subjects factor (Laerd Statistics, 2013).

Formula for Repeated Measures ANOVA

The statistic used in repeated measures ANOVA is F, the same statistic as in simple ANOVA, but now computed as the ratio of the variation between measures (the variation “within” the subjects) to the “error” variation (Vincent & Weir, n.d.).
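In symbols, the ratio can be sketched as follows. The notation is borrowed from the worked example later in this paper: k is the number of measures, n the number of subjects, and N = nk the total number of observations.

```latex
F = \frac{MS_{\text{between}}}{MS_{\text{error}}}
  = \frac{SS_{\text{between}} / df_{\text{between}}}{SS_{\text{error}} / df_{\text{error}}},
\qquad
df_{\text{between}} = k - 1, \quad
df_{\text{error}} = (N - k) - (n - 1)
```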


The observed between measures variance is an estimate of the variation between measures

that would be expected in the population under the conditions of the study. The observed error

variance is an estimate of the variation that would be expected to occur as a result of sampling

error alone. If the observed (computed) value for F is significantly higher than the value expected

by sampling variation alone, then the variance between measures is larger than would be expected

by sampling error alone. In other words, at least one mean differs from the others enough to

cause large variation between the measures (Vincent & Weir, n.d.).

The logic of Repeated Measures ANOVA is that any differences that are found between

treatments can be explained by only two factors:

1. Treatment effect.

2. Error or Chance

Large value of F: a lot of the overall variation in scores is due to the experimental manipulation, rather than to random variation between participants (Sussex Edu, n.d.).

Small value of F: the variation in scores produced by the experimental manipulation is small, compared to random variation between participants. In both cases F is the ratio of systematic variation to random variation (“error”) (Sussex Edu, n.d.).

Understanding One-Way Repeated-Measures ANOVA

In many studies using the one-way repeated-measures design, the levels of a within-

subject factor represent multiple observations on a scale over time or under different conditions.

However, for some studies, levels of a within-subjects factor may represent scores from different

scales, and the focus may be on evaluating differences in means among these scales. In such a

setting the scales must be commensurable for the ANOVA significance tests to be meaningful.



That is, the scales must measure individuals on the same metric, and the difference scores

between scales must be interpretable (Oak edu, n.d).

In some studies, individuals are matched on one or more variables so that individuals

within a set are similar on a matching variable(s), while individuals not in the same set are

dissimilar. The number of individuals within a set is equal to the number of levels of a factor.

The individuals within a set are then observed under various levels of this factor. The matching

process for these designs is likely to produce correlated responses on the dependent variable like

those of repeated-measures designs. Consequently, the data from these studies can be analyzed

as if the factor is a within-subjects factor.

SPSS conducts a standard univariate F test if the within-subjects factor has only two

levels. Three types of tests are conducted if the within-subjects factor has more than two levels:

the standard univariate F test, alternative univariate tests, and multivariate tests. All three types

of tests evaluate the same hypothesis – the population means are equal for all levels of the factor.

The choice of what test to report should be made prior to viewing the results (Oak edu, n.d). The

choice of analysis depends on complex relationships between the degree of sphericity violation

and sample size (Park, Cho & Ki, 2009).

The standard univariate ANOVA F test is not recommended when the within-subjects factor has more than two levels because one of its assumptions, the sphericity assumption, is commonly violated, and the ANOVA F test yields inaccurate p values to the extent that this assumption is violated.

The alternative univariate tests take into account violations of the sphericity

assumption. These tests employ the same calculated F statistic as the standard univariate test, but

its associated p value potentially differs. In determining the p value, an epsilon statistic is


calculated based on the sample data to assess the degree to which the sphericity assumption is

violated. The numerator and denominator degrees of freedom of the standard test are multiplied

by epsilon to obtain a corrected set of degrees of freedom for the tabled F value and to determine

its p value (Oak edu, n.d).
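The correction described above can be sketched in code. The source does not say which epsilon estimate is meant, so the example assumes the Greenhouse-Geisser estimate, which is one common choice, and applies it to the scores from the worked example later in this paper. The function names are hypothetical helpers, not SPSS output.

```python
from statistics import mean

# Scores from the worked example: one row per subject, one column per level.
scores = [
    [3, 3, 6],
    [2, 2, 2],
    [1, 1, 4],
    [2, 4, 6],
]


def covariance_matrix(rows):
    """Sample covariance matrix of the columns (levels) of rows."""
    n, k = len(rows), len(rows[0])
    col_means = [mean(row[j] for row in rows) for j in range(k)]
    return [
        [
            sum((row[i] - col_means[i]) * (row[j] - col_means[j]) for row in rows)
            / (n - 1)
            for j in range(k)
        ]
        for i in range(k)
    ]


def greenhouse_geisser_epsilon(rows):
    """Greenhouse-Geisser epsilon (assumed estimate; 1.0 means no violation)."""
    k = len(rows[0])
    s = covariance_matrix(rows)
    row_means = [mean(r) for r in s]
    grand_mean = mean(row_means)
    # Double-centre the covariance matrix.
    centred = [
        [s[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(k)]
        for i in range(k)
    ]
    trace = sum(centred[i][i] for i in range(k))
    sum_of_squares = sum(v * v for r in centred for v in r)
    return trace ** 2 / ((k - 1) * sum_of_squares)


epsilon = greenhouse_geisser_epsilon(scores)
df_between, df_error = 2, 6  # uncorrected df from the worked example
corrected_df = (epsilon * df_between, epsilon * df_error)
print(f"epsilon = {epsilon:.2f}, corrected df = {corrected_df}")
```

For these data epsilon is 0.75, so the degrees of freedom (2, 6) become (1.5, 4.5). The F statistic itself is unchanged; only the reference distribution used for the p value differs.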

The multivariate test does not require the assumption of sphericity. Difference scores

are computed by comparing scores from different levels of the within-subjects factor. For

example, for a within-subjects factor with three levels, difference scores might be computed

between the first and second level and between the second and third level. The multivariate test

then would evaluate whether the population means for these two sets of difference scores are

simultaneously equal to zero. This test evaluates not only the means associated with these two

sets of difference scores, but also evaluates whether the mean of the difference scores between

the first and third levels of the factor is equal to zero as well as linear combinations of these

difference scores.
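The difference scores that feed the multivariate test can be illustrated with a short sketch. It only builds the difference scores and their means for the scores in the worked example later in this paper; it does not carry out the multivariate test itself, and `difference_scores` is a hypothetical helper name.

```python
from statistics import mean

# Scores from the worked example: one row per subject, one column per level.
scores = [
    [3, 3, 6],
    [2, 2, 2],
    [1, 1, 4],
    [2, 4, 6],
]


def difference_scores(rows, i, j):
    """Difference between level i and level j for every subject (0-based)."""
    return [row[i] - row[j] for row in rows]


d12 = difference_scores(scores, 0, 1)  # first minus second level
d23 = difference_scores(scores, 1, 2)  # second minus third level
d13 = difference_scores(scores, 0, 2)  # first minus third level

print("mean of level 1 - level 2:", mean(d12))
print("mean of level 2 - level 3:", mean(d23))

# The first-versus-third difference is the sum of the other two sets, which
# is why a joint test on the two sets also covers this comparison.
d13_from_sum = [a + b for a, b in zip(d12, d23)]
print("level 1 - level 3 recovered from the other two:", d13_from_sum == d13)
```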

The SPSS Repeated-Measures procedure computes the difference scores used in the

analysis for us. However, these difference scores do not become part of our data file and,

therefore, we may or may not be aware that the multivariate test is conducted on these difference

scores. Applied statisticians tend to prefer the multivariate test to the standard or the alternative

univariate test because the multivariate test and follow-up tests have a close conceptual link to

each other (Oak edu, n.d).

If the initial hypothesis that the means are equal is rejected and there are more than two

means, then follow-up tests are conducted to determine which of the means differ significantly from one another. Although more complex comparisons can be performed, most researchers choose to conduct pairwise comparisons. These comparisons may be evaluated with SPSS using the paired-samples t test procedure, and a Bonferroni approach or Holm’s Sequential Bonferroni procedure can be used to control for Type I error across the multiple pairwise tests.
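The two corrections named above can be sketched as follows. The p values are hypothetical numbers chosen for illustration, not results from any study, and the function names are likewise assumptions.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a comparison only if its p value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm_bonferroni(p_values, alpha=0.05):
    """Holm's sequential procedure; returns True where H0 is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The smallest p value is tested at alpha / m, the next at
        # alpha / (m - 1), and so on.
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # stop at the first comparison that is not significant
    return reject


# Hypothetical p values for three pairwise comparisons.
p_values = [0.010, 0.020, 0.040]
print("Bonferroni:", bonferroni(p_values))
print("Holm:      ", holm_bonferroni(p_values))
```

With these illustrative values the Bonferroni approach rejects only the first comparison, while Holm's procedure rejects all three, which shows why the sequential version is the less conservative of the two.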

Hypothesis for Repeated Measures ANOVA

All three types of repeated measures ANOVA tests evaluate the same hypothesis

(Khelifa, n.d.). The repeated measures ANOVA tests for whether there are any differences

between related population means. The null hypothesis (H0) states that the means are equal:

H0: µ1 = µ2 = µ3 = … = µk

Where,

µ = population mean and

k = number of related groups

The alternative hypothesis (HA) states that the related population means are not equal (at

least one mean is different to another mean):

HA: at least two means are significantly different

Example

An experimenter wants to look at the effects of practice on manual dexterity scores. Four

people are randomly sampled and tested at three different times. Does practice change manual

dexterity scores? Test with α = .05.

Person   Session 1   Session 2   Session 3   P
A        3           3           6           12
B        2           2           2           6
C        1           1           4           6
D        2           4           6           12
         T1=8        T2=10       T3=18       G=36

k=3, n=4, N=12, ∑X²=140

Step 1: State the Hypotheses

H0: µ1 = µ2 = µ3

Ha: At least one treatment mean is different from the others.

Step 2: Determine Fcrit

If the example had been an independent measures ANOVA, we would use df between and within and find:

Fcrit(2, 9) = 4.26 at α = .05

However, the example was a repeated measures ANOVA so we use df between and error and find:

Fcrit(2, 6) = 5.14 at α = .05

Step 3: Compute the Statistic

As for a repeated measures ANOVA

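\[ SS_{\text{between}} = \sum \frac{T^2}{n} - \frac{G^2}{N} = \frac{8^2}{4} + \frac{10^2}{4} + \frac{18^2}{4} - \frac{36^2}{12} = 14 \]

\[ SS_{\text{within}} = \sum X^2 - \sum \frac{T^2}{n} = 140 - \left( \frac{8^2}{4} + \frac{10^2}{4} + \frac{18^2}{4} \right) = 18 \]

\[ SS_{\text{subjects}} = \sum \frac{P^2}{k} - \frac{G^2}{N} = \frac{12^2}{3} + \frac{6^2}{3} + \frac{6^2}{3} + \frac{12^2}{3} - \frac{36^2}{12} = 12 \]

\[ SS_{\text{error}} = SS_{\text{within}} - SS_{\text{subjects}} = 18 - 12 = 6 \]

\[ SS_{\text{total}} = \sum X^2 - \frac{G^2}{N} = 140 - \frac{36^2}{12} = 32 \]

\[ df_{\text{between}} = k - 1 = 2, \qquad df_{\text{within}} = N - k = 9, \qquad df_{\text{subjects}} = n - 1 = 3 \]

\[ df_{\text{error}} = (N - k) - (n - 1) = 6, \qquad df_{\text{total}} = N - 1 = 11 \]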


Source     SS   df   MS   Fobt   Fcrit
Between    14   2    7    7.0    5.14
Within     18   9
Subjects   12   3
Error      6    6    1
Total      32   11

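Conclusion: Because Fobt = 7.0 exceeds Fcrit = 5.14, reject H0. The mean manual dexterity scores differ across the three sessions.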
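The hand calculation in Step 3 can be checked with a short script. This is a minimal sketch that follows the computational formulas used in this example; the function name `repeated_measures_anova` and the dictionary keys are illustrative choices, and the critical value 5.14 is copied from Step 2 rather than computed.

```python
# Manual dexterity scores: one entry per person, one value per session.
scores = {
    "A": [3, 3, 6],
    "B": [2, 2, 2],
    "C": [1, 1, 4],
    "D": [2, 4, 6],
}


def repeated_measures_anova(data):
    """One-way repeated measures ANOVA using the formulas of the example."""
    rows = list(data.values())
    n = len(rows)          # number of subjects
    k = len(rows[0])       # number of sessions (treatments)
    big_n = n * k          # total number of scores

    grand_total = sum(sum(r) for r in rows)                       # G
    sum_x_sq = sum(x * x for r in rows for x in r)                # sum of X^2
    treatment_totals = [sum(r[j] for r in rows) for j in range(k)]  # T
    person_totals = [sum(r) for r in rows]                        # P

    correction = grand_total ** 2 / big_n                         # G^2 / N
    sum_t_sq_over_n = sum(t * t for t in treatment_totals) / n

    ss_between = sum_t_sq_over_n - correction
    ss_within = sum_x_sq - sum_t_sq_over_n
    ss_subjects = sum(p * p for p in person_totals) / k - correction
    ss_error = ss_within - ss_subjects
    ss_total = sum_x_sq - correction

    df_between = k - 1
    df_error = (big_n - k) - (n - 1)
    ms_between = ss_between / df_between
    ms_error = ss_error / df_error

    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "ss_subjects": ss_subjects,
        "ss_error": ss_error,
        "ss_total": ss_total,
        "df_between": df_between,
        "df_error": df_error,
        "f": ms_between / ms_error,
    }


result = repeated_measures_anova(scores)
f_critical = 5.14  # Fcrit(2, 6) at alpha = .05, taken from Step 2
exceeds_critical = result["f"] > f_critical
print(result)
print("F exceeds the critical value:", exceeds_critical)
```

The script reproduces the sums of squares in the summary table (14, 18, 12, 6, 32) and an obtained F of 7.0, which is larger than the critical value of 5.14.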

Alternate test

The Friedman test is a non-parametric statistical test used to detect differences in

treatments across multiple test attempts. The Friedman analysis of variance by ranks is an

alternative to one-way repeated measures ANOVA if the dependent variable is not normally

distributed. When using the Friedman test it is important to use a sample size of at least 12

participants to obtain accurate p values.
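The ranking idea behind the Friedman test can be sketched as follows. The example reuses the four-person data from the worked example purely to show the arithmetic; that sample is smaller than the 12 participants recommended above, so no p value is computed. The statistic uses the usual formula without a correction for ties, and the helper names are hypothetical.

```python
# Scores from the worked example: one row per person, one column per session.
scores = [
    [3, 3, 6],
    [2, 2, 2],
    [1, 1, 4],
    [2, 4, 6],
]


def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value is tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for i in order[pos:end + 1]:
            ranks[i] = mean_rank
        pos = end + 1
    return ranks


def friedman_statistic(rows):
    """Friedman chi-square statistic (no tie correction applied)."""
    n, k = len(rows), len(rows[0])
    ranked = [average_ranks(row) for row in rows]  # rank within each person
    rank_sums = [sum(r[j] for r in ranked) for j in range(k)]
    chi_sq = 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)
    return rank_sums, chi_sq


rank_sums, chi_sq = friedman_statistic(scores)
print("rank sums per session:", rank_sums)
print("Friedman statistic:", chi_sq)
```

Scores are ranked within each person, so differences between people drop out, just as the subjects term is removed in the repeated measures ANOVA.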

Advantages

A repeated measures design uses the same participants in different experimental manipulations (Field, 2011). Using the same participants for all manipulations allows the researcher to exclude the effects of individual differences that could occur in independent groups (Howitt & Crammer, 2011). Factors such as IQ, ability, age and other important variables remain the same (Field, 2011). Because the same participants are used, the design requires fewer participants than other designs, such as independent designs.

The important point is that a small but consistent difference between conditions can be detected in the face of large overall differences among the subjects, even when the difference between conditions is very small relative to the differences among subjects. It is because the conditions can be compared within each of the subjects that the small difference becomes apparent. Differences between subjects are taken into account and are therefore not error.

Removing variance due to differences between subjects from the error variance greatly

increases the power of significance tests. Therefore, within-subjects designs are almost always

more powerful than between-subject designs. Since power is such an important consideration in

the design of experiments, within-subject designs are generally preferable to between-subject

designs. This design is also very economical, as sample members are recruited only once for treatment administration (Choudhury, 2009).

Disadvantages

The use of the same participants leads to difficulties in counteracting order effects and to a need for additional materials. An observed effect could be due to boredom caused by repetition, which affects concentration and performance measures such as reaction time and accuracy (Pan, Shell & Schleifer, 1994; Bergh & Vrana, 1998; Dsowen, 2011). Effects could also be due to practice, with participants’ results improving because they were given more chances to practice and become familiar with the task (Collie, Maruff, Darby & McStephen, 2003; Dsowen, 2011). The order effects of an experiment can be reduced by counterbalancing (Field, 2011).

This involves randomly assigning the order of the experimental manipulations participants are

exposed to. For example, half of the participants would be exposed to Control A and then

Control B, and the other half of the participants exposed to Control B and then Control A

(Howitt & Crammer, 2011; Dsowen, 2011). The results collected should then be less affected by

factors such as boredom and practice. Researchers can also provide opportunities to take a break



during the experiment to counteract boredom and loss of concentration (Pan, Shell & Schleifer,

1994; Dsowen, 2011).
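Counterbalancing as described above can be sketched in a few lines. The participant labels, the two orders, and the function name `counterbalance` are hypothetical; the sketch shuffles the participants and then alternates the orders so that each order is used equally often when the number of participants is even.

```python
import random


def counterbalance(participants, orders=(("A", "B"), ("B", "A")), seed=None):
    """Randomly assign each participant one presentation order."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    # Cycle through the orders so that each is used about equally often.
    return {p: orders[i % len(orders)] for i, p in enumerate(shuffled)}


# Eight hypothetical participants; the seed only makes the run repeatable.
participants = [f"P{i}" for i in range(1, 9)]
assignment = counterbalance(participants, seed=1)
for person in participants:
    print(person, "->", " then ".join(assignment[person]))
```

With more than two conditions the `orders` argument could list every permutation, or a Latin square subset of them; that choice is a design decision outside the scope of this sketch.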

If a study were testing how Factor A and Factor B affected memory, the researcher would require a different list of words for participants to memorize for Factor A and Factor B, whereas with independent groups the same list could be used for each factor, because each group only sees the material once (Nilsson, Soil & Sullivan, 1994). Using repeated measures reduces the individual differences between participants, but it instead produces problems with differences between the materials or environments the participants are exposed to. Therefore the result may be due to these differences in materials rather than the independent variables in question. The materials must be carefully examined to ensure equal quality in factors such as difficulty (Riedel, Klaassen, Deutz, Someren & Praag, 1999; Dsowen, 2011).

Conclusion

The advantages and disadvantages of repeated measures must be weighed against the benefits of using independent groups. Each study requires careful consideration of which design would best meet its needs. Problems related to the design must be reduced so that they have as little effect on results as possible. No method is without difficulty, and the researcher must decide which would best produce results for the study under investigation (Dsowen, 2011).
