january 7, 2009 - morning session 1 statistics micro mini multi-factor anova january 5-9, 2008 beth...

January 7, 2009 - morning session

1

Statistics Micro Mini

Multi-factor ANOVA

January 5-9, 2008

Beth Ayers


2

Thursday Sessions

• ANOVA

‒ One-way ANOVA

‒ Two-way ANOVA

‒ ANCOVA

‒ Within subject

‒ Between subject

‒ Repeated measures

‒ MANOVA

‒ etc.


3

What is ANOVA?

• ANalysis Of VAriance

‒ Partitions the observed variance based on explanatory variables

‒ Compare partitions to test significance of explanatory variables


4

Some Terminology

• Between subjects design – each subject participates in one and only one group

• Within subjects design – the same group of subjects serves in more than one treatment

‒ Subject is now a factor

• Mixed design – a study which has both between and within subject factors

• Repeated measures – general term for any study in which multiple measurements are taken on the same subject

‒ Can be either multiple treatments or several measurements over time


5

ANOVA

• Use variances and variance-like quantities to study the equality or non-equality of population means

• So, although it is analysis of variance, we are actually analyzing means, not variances

• There are other methods that analyze the variances between groups


6

ANOVA

• Typical exploratory analysis includes

‒ Tabulation of the number of subjects in each experimental group

‒ Side-by-side box plots

‒ Statistics about each group: at least the mean and standard deviation; can include the 5-number summary and information on skewness

‒ Table of means for each experimental group
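The per-group statistics listed above can be tabulated with the Python standard library. This is a minimal sketch: `summarize` is a hypothetical helper, the group values are made up, and the quartiles come from `statistics.quantiles` with its default method, so they may differ slightly from what other software reports.

```python
from statistics import mean, median, quantiles, stdev


def summarize(groups):
    """Count, mean, SD, and 5-number summary for each experimental group."""
    summary = {}
    for name, ys in groups.items():
        # quantiles(n=4) returns the three quartile cut points (Q1, median, Q3)
        q1, _, q3 = quantiles(ys, n=4)
        summary[name] = {
            "n": len(ys),
            "mean": mean(ys),
            "sd": stdev(ys),  # sample SD, divides by n - 1
            "five_number": (min(ys), q1, median(ys), q3, max(ys)),
        }
    return summary


# Hypothetical scores for two groups
data = {"A": [1, 2, 3, 4, 5], "B": [2, 4, 6, 8]}
for group, stats in summarize(data).items():
    print(group, stats)
```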


7

Notation

• If we have k groups, denote the means of the groups as:

‒ $\mu_1, \mu_2, \ldots, \mu_k$

• Student i in group j has observation

‒ $y_{ij} = \mu_j + \varepsilon_{ij}$

‒ Where the $\varepsilon_{ij}$ are independent, distributed $N(0, \sigma^2)$

‒ Can combine this and say subjects from group j have distribution $N(\mu_j, \sigma^2)$

• With random assignment, the sample mean for any treatment group is representative of the population mean for that group


8

Assumptions

1. The errors $\varepsilon_{ij}$ are normally distributed

2. Across the conditions, the errors have equal spread. Often referred to as equal variances.

‒ Rule of thumb: the assumption is met if the largest variance is less than twice the smallest variance

‒ If the variances are unequal, a correction is needed! This is usually to test at $\alpha/2$.

3. The errors are independent from each other


9

Checking the assumptions

• Use the residuals, which are the estimates of the $\varepsilon_{ij}$

1. Normality: look at a normal probability plot of the residuals

2. Equal variances: look at the residual versus fitted plot

3. Independence: hard to check; often assumed from the study design

• For mild violations of the assumptions, there are options for correction

• When the assumptions are not met – the p-value is simply wrong!!


10

One-way ANOVA

• One-way ANOVA is used when

‒ Only testing the effect of one explanatory variable

‒ Each subject has only one treatment or condition

‒ Thus, a between-subjects design

• Used to test for differences among two or more independent groups

• Gives the same results as the pooled two-sample t-test when the explanatory variable has 2 levels (F = t²)
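For two groups, the one-way ANOVA F-statistic equals the square of the pooled-variance two-sample t-statistic, so both tests give the same p-value. The sketch below checks this numerically on made-up data; `one_way_f` and `pooled_t` are illustrative helpers written for this example, not functions from a statistics package.

```python
import math
from statistics import mean


def one_way_f(groups):
    """F = MS_between / MS_within for a list of groups."""
    all_obs = [y for g in groups for y in g]
    k, n_total = len(groups), len(all_obs)
    grand = mean(all_obs)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((y - mean(g)) ** 2 for y in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def pooled_t(a, b):
    """Two-sample t statistic using a pooled (equal-variance) SD."""
    na, nb = len(a), len(b)
    ss = sum((y - mean(a)) ** 2 for y in a) + sum((y - mean(b)) ** 2 for y in b)
    sp2 = ss / (na + nb - 2)  # pooled variance, same as MS_within
    return (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))


a = [1, 2, 3, 4, 5]      # hypothetical group 1
b = [3, 4, 5, 6, 7, 8]   # hypothetical group 2
print(one_way_f([a, b]), pooled_t(a, b) ** 2)  # the two values agree up to rounding
```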


11

Hypothesis Testing

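• $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$

• $H_1$: the $\mu$'s are not all equal

• Writing the alternative hypothesis as $H_1: \mu_1 \neq \mu_2 \neq \cdots \neq \mu_k$ is wrong! That would claim every mean differs from every other, while rejecting $H_0$ only tells us that at least one mean differs.

• The null hypothesis is called the overall null and is the hypothesis tested by ANOVA

• If the overall null is rejected, more specific hypothesis tests, often referred to as contrasts, are needed to determine which means are different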


12

Terminology

• The sample variance is the sum of the squared deviations from the mean divided by the degrees of freedom:

$$s^2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}$$

• A mean square (MS) is a variance-like quantity calculated as SS/df:

$$MS = \frac{SS}{df}$$


13

One-way ANOVA

• In one-way ANOVA we work with two mean square quantities

‒ MSwithin – the mean square within-groups

‒ MSbetween – the mean square between-groups

$$MS_{between} = \frac{SS_{between}}{df_{between}} \qquad MS_{within} = \frac{SS_{within}}{df_{within}}$$


14

Within vs. Between


15

One-way ANOVA

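• For each individual group we have

$$s_i^2 = \frac{SS_i}{df_i} = \frac{\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2}{n_i - 1}$$

• So the estimate of MSwithin is

$$MS_{within} = \frac{SS_{within}}{df_{within}} = \frac{\sum_{i=1}^{k} SS_i}{\sum_{i=1}^{k} (n_i - 1)} = \frac{\sum_{i=1}^{k} SS_i}{N - k}$$

• And the estimate of MSbetween is

$$MS_{between} = \frac{SS_{between}}{df_{between}} = \frac{\sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2}{k - 1}$$

‒ Where $n_i$ and $\bar{x}_i$ are the size and mean of group i, $\bar{x}$ is the overall mean, and $N = \sum_{i=1}^{k} n_i$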
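The formulas above translate directly into code. This is a minimal sketch on made-up data; `one_way_anova` is a hypothetical helper that returns the pieces of the ANOVA table. It also computes the total SS so you can check that SS_between + SS_within adds up to it.

```python
from statistics import mean


def one_way_anova(groups):
    """Return SS, df, MS, and F for a one-way ANOVA on raw group data."""
    all_obs = [y for g in groups for y in g]
    k, n_total = len(groups), len(all_obs)
    grand_mean = mean(all_obs)
    # SS_within = sum over groups of SS_i, where SS_i = sum_j (x_ij - xbar_i)^2
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    # SS_between = sum over groups of n_i * (xbar_i - xbar)^2
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within  # pooled estimate of sigma^2
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "ss_total": sum((x - grand_mean) ** 2 for x in all_obs),
        "df_between": df_between,
        "df_within": df_within,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "F": ms_between / ms_within,
    }


# Hypothetical data: three groups of three observations each
result = one_way_anova([[4, 5, 6], [6, 7, 8], [9, 10, 11]])
print(result)
```

For these values SS_between = 38 and SS_within = 6, so F = (38/2)/(6/6) = 19.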


16

Mean Squares

• What do these values mean?

• MSwithin is considered a true estimate of $\sigma^2$ that is unaffected by whether the null or alternative hypothesis is true

• MSbetween is considered a good estimate of $\sigma^2$ only when the null hypothesis is true

‒ If the alternative is true, values of MSbetween tend to be inflated

• Thus, we can look at the ratio of the two mean square values to evaluate the null hypothesis


17

Testing the Hypothesis

• The F-test looks at the variation among the group means relative to the variation within the sample

• The F-statistic tends to be larger if the alternative hypothesis is true than if the null hypothesis is true

• Under the null hypothesis, the test statistic F has an F(k-1, N-k) distribution

$$F = \frac{MS_{between}}{MS_{within}} = \frac{SS_{between}/df_{between}}{SS_{within}/df_{within}} = \frac{SS_{between}/(k-1)}{SS_{within}/(N-k)}$$
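Statistical packages report the p-value as the upper-tail area of the F(k-1, N-k) distribution beyond the observed F. The sketch below does this with the Python standard library only. It uses the identity P(F > f) = I_x(d2/2, d1/2) with x = d2/(d2 + d1·f), where I is the regularized incomplete beta function, evaluated with the continued-fraction (modified Lentz) method described in Numerical Recipes. The helper names are hypothetical. Treat this as an illustration; in practice, use a tested routine such as `scipy.stats.f.sf`.

```python
import math

_TINY = 1e-300  # guards against division by zero in Lentz's method


def _beta_cf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > _TINY else _TINY)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > _TINY else _TINY)
        c = 1.0 + aa / c
        c = c if abs(c) > _TINY else _TINY
        h *= d * c
        # odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > _TINY else _TINY)
        c = 1.0 + aa / c
        c = c if abs(c) > _TINY else _TINY
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # evaluate the continued fraction where it converges quickly
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_sf(f, d1, d2):
    """Upper-tail probability P(F > f) for F ~ F(d1, d2)."""
    if f <= 0.0:
        return 1.0
    return reg_inc_beta(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


# Typing-speed example: F = 22.091 with (3, 96) degrees of freedom
print(f_sf(22.091, 3, 96))
```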


18

What does the F ratio tell us?

• F = MSbetween / MSwithin

• The denominator is always an estimate of $\sigma^2$ (under both the null and alternative hypotheses)

• The numerator is either another estimate of $\sigma^2$ (under the null) or is inflated (under the alternative)

• If the null is true, values of F are close to 1

• If the alternative is true, values of F are larger

• How large F must be to count as "large" depends on the degrees of freedom


19

The ANOVA table

• When running an ANOVA, statistical packages will return an ANOVA table summarizing the SS, MS, df, F-statistic, and p-value

| Source | SS | df | MS | F | Sig |
|---|---|---|---|---|---|
| Group (Treatment, between) | SSbetween | dfbetween | MSbetween | MSbetween / MSwithin | p-value |
| Residual (Error, within) | SSwithin | dfwithin | MSwithin | | |
| Total | SSbetween + SSwithin | dfbetween + dfwithin | | | |


20

Example

• Suppose we want to know if typing speed varies across majors

• Use 4 majors – Biology, Business, English, and Mathematics

• H0: typing speed is the same for students of all majors

‒ $H_0: \mu_{Bio} = \mu_{Business} = \mu_{Eng} = \mu_{Math}$

• H1: typing speed varies across the majors

‒ $H_1$: at least one of the means is different


21

Box plots


22

Summary

| Major | n | Mean | Variance |
|---|---|---|---|
| Biology | 25 | 45.3 | 24.7 |
| Business | 25 | 47.6 | 25.4 |
| English | 25 | 55.6 | 38.8 |
| Mathematics | 25 | 45.1 | 20.1 |

• The largest variance is less than twice the smallest variance (38.8 < 2 × 20.1 = 40.2). Use $\alpha$ = 0.05.
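Here is the rule-of-thumb check from the Assumptions slide, applied to the group variances in the table above. `equal_variance_rule_of_thumb` is a hypothetical helper. It encodes only the informal "less than twice" rule; it is not a formal test for equal variances.

```python
def equal_variance_rule_of_thumb(variances):
    """Return (ratio, ok), where ok is True if largest variance < 2 * smallest."""
    largest, smallest = max(variances), min(variances)
    return largest / smallest, largest < 2 * smallest


# Group variances from the typing-speed summary table
variances = {"Biology": 24.7, "Business": 25.4, "English": 38.8, "Mathematics": 20.1}
ratio, ok = equal_variance_rule_of_thumb(variances.values())
print(f"largest/smallest = {ratio:.2f}, rule of thumb met: {ok}")
```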


23

Degrees of Freedom

• How many groups do we have?

• What is the sample size?

• Using these values:

‒ What is dfwithin?

‒ What is dfbetween?


24

Degrees of Freedom

• How many groups do we have?

‒ There are k = 4 groups – Biology, English, Business, and Mathematics

• What is the sample size?

‒ There are N = 100 students

• Using these values,

‒ What is dfbetween? k - 1 = 4 - 1 = 3

‒ What is dfwithin? N - k = 100 - 4 = 96


25

Sample Output

| Source | SS | df | MS | F | Sig |
|---|---|---|---|---|---|
| Group (Treatment, between) | 1807.49 | 3 | 602.50 | 22.091 | 0.000 |
| Residual (Error, within) | 2618.20 | 96 | 27.27 | | |
| Total | 4425.69 | 99 | | | |

• Our estimate of $\sigma^2$ is 27.27

• The numerator MS = 602.5 and appears to be highly inflated
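The sample output can be checked by hand. Starting from k and N, the sketch computes the degrees of freedom, then MS = SS/df from the SS column, and finally their ratio. It uses only the numbers shown in the table and reproduces the reported F-statistic of 22.091.

```python
k, n_total = 4, 100                       # groups and students in the typing example
df_between, df_within = k - 1, n_total - k
ss_between, ss_within = 1807.49, 2618.20  # SS column of the sample output

ms_between = ss_between / df_between      # MS = SS / df
ms_within = ss_within / df_within         # estimate of sigma^2
f_stat = ms_between / ms_within

print(f"df: {df_between}, {df_within}")
print(f"MS between = {ms_between:.2f}, MS within = {ms_within:.2f}, F = {f_stat:.3f}")
```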


26

Results

• F-statistic = 22.1

• P-value: < 0.0005

• Conclusion – the average words per minute differs for at least one of the majors

• To make more specific statements, we need to do further testing


27



28

Further Analysis

• If $H_0$ is rejected, we conclude that not all the $\mu$'s are equal

• Would like to make statements about where there are differences

• Can use planned or unplanned comparisons (or contrasts)

‒ Planned comparisons are interesting comparisons decided on before analysis

‒ Unplanned comparisons occur after seeing the results

‒ Be careful not to go fishing for results


29

Contrasts

• A simple contrast hypothesis compares two population means

‒ $H_0: \mu_1 = \mu_5$

• A complex contrast hypothesis has multiple population means on either side

‒ $H_0: (\mu_1 + \mu_2)/2 = \mu_3$

‒ $H_0: (\mu_1 + \mu_2)/2 = (\mu_3 + \mu_4 + \mu_5)/3$
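Any of these contrasts can be written as a set of coefficients c_i that sum to zero. For example, H0: (μ1 + μ2)/2 = μ3 corresponds to c = (1/2, 1/2, -1). A common test statistic is t = Σ c_i ȳ_i / sqrt(MS_within · Σ c_i²/n_i), compared with a t distribution on N - k degrees of freedom. The sketch applies this to the typing-example means to illustrate the arithmetic only; it is not a contrast the slides planned. MS_within = 2618.20/96 comes from the sample output, and `contrast_test` is a hypothetical helper.

```python
import math


def contrast_test(coeffs, means, sizes, ms_within):
    """Estimate, standard error, and t statistic for a linear contrast."""
    if abs(sum(coeffs)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    estimate = sum(c * m for c, m in zip(coeffs, means))
    se = math.sqrt(ms_within * sum(c * c / n for c, n in zip(coeffs, sizes)))
    return estimate, se, estimate / se


# Order: Biology, Business, English, Mathematics (typing example)
means = [45.3, 47.6, 55.6, 45.1]
sizes = [25, 25, 25, 25]
ms_within = 2618.20 / 96
# H0: (mu_Bio + mu_Business) / 2 = mu_English
est, se, t = contrast_test([0.5, 0.5, -1.0, 0.0], means, sizes, ms_within)
print(f"estimate = {est:.2f}, SE = {se:.3f}, t = {t:.2f} on {100 - 4} df")
```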


30

Planned Comparisons

• Most statistical packages allow you to enter custom planned contrast hypotheses

• The p-values are only valid under strict conditions

‒ The conditions maintain the Type-1 error rate across the whole experiment

• Computer packages assume that you have checked the assumptions of the ANOVA test


31

Conditions for Planned Comparisons

• Contrasts are selected before looking at the residuals; they are planned, not post-hoc

• Must be ignored if the overall null is not rejected!

• Each contrast is based on independent information from other contrasts

• The number of planned comparisons must not be more than the corresponding degrees of freedom (k-1 in one-way ANOVA)
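One common reading of the "independent information" condition is that the contrasts must be mutually orthogonal. With group sizes n_i, contrasts c and d are orthogonal when Σ c_i d_i / n_i = 0. The sketch checks that interpretation together with the sum-to-zero requirement and the limit of k - 1 contrasts. `check_planned_contrasts` is a hypothetical helper, and it uses exact `Fraction` arithmetic to avoid round-off in the zero tests.

```python
from fractions import Fraction
from itertools import combinations


def check_planned_contrasts(contrasts, sizes):
    """Check sum-to-zero, pairwise orthogonality, and at most k - 1 contrasts."""
    k = len(sizes)
    if len(contrasts) > k - 1:
        return False
    if any(sum(Fraction(c) for c in con) != 0 for con in contrasts):
        return False
    return all(
        sum(Fraction(ci) * Fraction(di) / n for ci, di, n in zip(c, d, sizes)) == 0
        for c, d in combinations(contrasts, 2)
    )


sizes = [25, 25, 25, 25]  # equal group sizes, as in the typing example
helmert = [(1, -1, 0, 0), (1, 1, -2, 0), (1, 1, 1, -3)]
print(check_planned_contrasts(helmert, sizes))                         # True
print(check_planned_contrasts([(1, -1, 0, 0), (1, 0, -1, 0)], sizes))  # False
```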


32

Unplanned Comparisons

• What if we notice a possible interesting difference when looking at the results?

• Can do comparisons, but need to adjust the $\alpha$-level to control for Type-1 error

• One common method is to use Tukey’s simultaneous confidence intervals to compare any and all pairs of group population means

‒ This procedure takes multiple comparisons into consideration to preserve the $\alpha$ level


33

Other Options

• Bonferroni correction for the number of comparisons done

• Dunnett’s tests

• Scheffé procedure
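The Bonferroni correction divides the overall α by the number of comparisons m and tests each comparison at α/m. Comparing all pairs of the four majors in the typing example gives m = 6. This is a minimal sketch, and `bonferroni_alpha` is an illustrative helper.

```python
from itertools import combinations


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance level under the Bonferroni correction."""
    return alpha / n_comparisons


majors = ["Biology", "Business", "English", "Mathematics"]
pairs = list(combinations(majors, 2))      # all pairwise comparisons
alpha_each = bonferroni_alpha(0.05, len(pairs))
print(len(pairs), round(alpha_each, 4))    # 6 0.0083
```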


34

Tukey’s Multiple Comparisons for previous example


35

Conclusions

• In the table on the previous page:

‒ 1 = Biology, 2 = Business, 3 = English, 4 = Mathematics

• Biology, Business, and Mathematics are all significantly different from English

• There are no other significant differences


36

Additional sample output

• Below is the same output from a different software package


37

Comparison to Regression

• Sample regression output

‒ Which major is our baseline?


38


• F-statistic = 22.1, p-value < 0.0005

‒ This is the same F-statistic and p-value as the ANOVA on slide 25

• At least one of the explanatory variables is important; this corresponds to rejecting the null: at least one of the means differs


39


• Note that Biology is the baseline and 45.3 is the mean WPM for Biology students

• Note that Business and Mathematics are not significant

• Agrees with the post-hoc comparisons: neither Business nor Mathematics is significantly different from Biology, but English is

• To make further conclusions will need to look at multiple comparisons, such as the previous Tukey intervals
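When the only predictor is one categorical variable coded as indicator (dummy) variables, the least-squares fitted value for each subject is its group mean. So the intercept is the baseline group's mean, and each coefficient is that group's mean minus the baseline mean. The sketch reads the coefficients off the group means in the typing-speed summary table. `dummy_coefficients` is a hypothetical helper, and regression output fitted to the raw data may differ in the last digit because the table means are rounded.

```python
def dummy_coefficients(group_means, baseline):
    """Intercept and indicator coefficients for y ~ group with the given baseline."""
    intercept = group_means[baseline]
    slopes = {g: m - intercept for g, m in group_means.items() if g != baseline}
    return intercept, slopes


means = {"Biology": 45.3, "Business": 47.6, "English": 55.6, "Mathematics": 45.1}
intercept, slopes = dummy_coefficients(means, baseline="Biology")
print(intercept)  # 45.3, the Biology mean
for group, b in slopes.items():
    print(f"{group}: {b:+.1f}")
```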


40

Regression

• The conclusions about the overall null hypothesis will be the same

• In regression can make statements comparing groups to baseline

• To make more conclusive statements will need to do more analysis

• ANOVA followed by either planned or post-hoc comparisons does the same thing and is often easier


41

One-way ANOVA Power

• Two different SAT prep courses charge $1200 for a two-month course. An (unethical) experiment would be to randomize students to one of the two courses or to no course

• What information is needed to calculate power for this one-way ANOVA?

‒ Sample size

‒ Within group variance ($\sigma^2$)

‒ Estimated or minimally interesting outcome means for each group


42

Estimate of $\sigma^2$

• Based on previous years, we know that 95% of the student scores on SATs fall between 900 and 1500

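• For a normal distribution, about 95% of values lie within 2 standard deviations of the mean, so this range spans about $4\sigma$

• $\sigma = (1500 - 900)/4 = 150$

• $\sigma^2 = 150^2 = 22{,}500$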


43

Minimally interesting outcome

• What is the minimal average benefit, in points gained, that would justify the program?

‒ The minimally interesting outcome is based on previous knowledge

• For this example we’ll try several different values


44

sd[treatment]

• Different applets will define things slightly differently. Find an applet you understand.

• For the applet I will show you, they require sd[treatment]. From their definition this is calculated as

$$\text{sd[treatment]} = \sqrt{\frac{\sum_{i=1}^{k} (\mu_i - \bar{\mu})^2}{k - 1}}$$

‒ Where $\mu_i$ is the ith group mean and $\bar{\mu}$ is the average of the group means

‒ k = the number of groups

• Ready to go to power applet
i


45

Calculating the power

• Let $\sigma$ = 150, n = 50, effect = 50 points

‒ Power = 0.3811





46

Calculating the power






47

Moving past One-way ANOVA

• What if we have two categorical explanatory variables?

• What if we have categorical and quantitative explanatory variables?

• What if subjects have more than one treatment?

• What if there is more than one response variable?

• And many other combinations…


48

Two-way ANOVA

• Suppose we now have two categorical explanatory variables

• Is there a significant X1 effect?

• Is there a significant X2 effect?

• Are there significant interaction effects?

• If X1 has k levels and X2 has m levels, then the analysis is often referred to as a “k by m ANOVA” or “k x m ANOVA”


49

Terminology

• If the interaction is significant, the model is called an interaction model

• If the interaction is not significant, the model is called an additive model

• Explanatory variables are often referred to as factors


50

Assumptions

• The assumptions are the same as in One-way ANOVA

1. The errors $\varepsilon_{ij}$ are normally distributed

2. Across the conditions, the errors have equal spread. Often referred to as equal variances.

3. The errors are independent from each other


51

Two-way ANOVA

• Two-way (or multi-way) ANOVA is an appropriate analysis method for a study with a quantitative outcome and two (or more) categorical explanatory variables.

• The assumptions are Normality, equal variance, and independent errors.


52

Results

• Results are again displayed in an ANOVA table

• Will have one line for each term in the model. For a model with two factors, we will have one line for each factor and one line for the interaction. We will also have a line for the error and the total.

• See next page.


53

The ANOVA table

| Source | SS | df | MS | F | Sig |
|---|---|---|---|---|---|
| Factor 1 | | k-1 | | | |
| Factor 2 | | m-1 | | | |
| Interaction | | (k-1)(m-1) | | | |
| Error | | N-km | * | | |
| Total | | N-1 | | | |

• The MS(error), denoted by * in the above table, is the true estimate of $\sigma^2$

• The MS in each row is that row’s SS/df

• The F-statistic is the MS/MS(error)
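The df column can be filled in mechanically. The first four rows always add up to the Total row, since (k-1) + (m-1) + (k-1)(m-1) + (N-km) = N-1. This is a minimal sketch; `two_way_df` is a hypothetical helper, shown for a 4 × 2 design with N = 100.

```python
def two_way_df(k, m, n_total):
    """Degrees of freedom for each row of a k x m ANOVA table with interaction."""
    rows = {
        "Factor 1": k - 1,
        "Factor 2": m - 1,
        "Interaction": (k - 1) * (m - 1),
        "Error": n_total - k * m,
    }
    rows["Total"] = n_total - 1
    return rows


df = two_way_df(4, 2, 100)
print(df)  # {'Factor 1': 3, 'Factor 2': 1, 'Interaction': 3, 'Error': 92, 'Total': 99}
```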


54

Exploratory Analysis

• Table of means

• Interaction or profile plots

‒ An interaction plot is a way to look at outcome means for two factors simultaneously

‒ A plot with parallel lines suggests an additive model

‒ A plot with non-parallel lines suggests an interaction model

‒ Note that an interaction plot should NOT be the deciding factor in whether or not to run an interaction model


55

Example

• Continuing with the previous example, suppose we’d like to add gender as an explanatory variable

• X1: Major – 4 levels

• X2: Gender – 2 levels

• Response: words per minute typed

• We will fit a 4 by 2 ANOVA


56

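Table of Means and Counts

Means:

| Major | Male | Female | Overall |
|---|---|---|---|
| Biology | 45.5 | 45.2 | 45.4 |
| Business | 48.6 | 46.9 | 47.6 |
| English | 55.3 | 55.9 | 55.6 |
| Mathematics | 45.6 | 44.6 | 45.1 |
| Overall | 48.9 | 47.9 | 48.4 |

Counts:

| Major | Male | Female |
|---|---|---|
| Biology | 14 | 11 |
| Business | 10 | 15 |
| English | 14 | 11 |
| Mathematics | 12 | 13 |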

• Note, this table should also include the standard error of each of the means.
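The Overall row and column are weighted averages of the cell means, with the cell counts as weights. The sketch recomputes them from the means and counts tables as a consistency check. `weighted_margin` is a hypothetical helper. Because the cell means are themselves rounded, recomputed margins can differ from the reported ones in the last digit.

```python
majors = ["Biology", "Business", "English", "Mathematics"]
genders = ["Male", "Female"]
cell_means = {
    "Biology": (45.5, 45.2), "Business": (48.6, 46.9),
    "English": (55.3, 55.9), "Mathematics": (45.6, 44.6),
}
counts = {
    "Biology": (14, 11), "Business": (10, 15),
    "English": (14, 11), "Mathematics": (12, 13),
}


def weighted_margin(cells):
    """Count-weighted mean of (mean, count) pairs."""
    cells = list(cells)  # allow iterators such as zip(...)
    total = sum(n for _, n in cells)
    return sum(mu * n for mu, n in cells) / total


row_means = {
    major: weighted_margin(zip(cell_means[major], counts[major])) for major in majors
}
col_means = {
    g: weighted_margin([(cell_means[m][j], counts[m][j]) for m in majors])
    for j, g in enumerate(genders)
}
grand = weighted_margin(
    [(cell_means[m][j], counts[m][j]) for m in majors for j in range(2)]
)
print({name: round(v, 1) for name, v in row_means.items()})
print({name: round(v, 1) for name, v in col_means.items()}, round(grand, 1))
```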


57

Interaction plots


58

Interaction plots

• There are two ways to do an interaction plot. Both are legitimate. Ease of interpretation is the final criterion for choosing between them.

• If one explanatory variable has more levels than the other, interpretation is often easier if the explanatory variable with more levels defines the x-axis

• If one explanatory variable is quantitative but has been categorized and the other is categorical, interpretation is often easier if the categorized quantitative variable defines the x-axis

‒ Example: age, 20-29, 30-39, 40-49, etc.


59

Results

• Typical output:

• The last column contains the p-values

‒ Always check the interaction first!

‒ If the interaction is not significant, rerun without it


60

Results

• Updated results

• Now we can interpret the main effects. We can see that major is significant but that gender is not.


61



62

Notes

• If the interaction is significant, do not check the main effects. The main effects should always be kept if the interaction is significant.

• Note that due to the groups of students, you will see vertical lines in the residual versus predicted plot. This is due to the fact that all students with a particular combination of the factors will have the same predicted value.


63

Example 2

• Using the same variables, let’s look at a different outcome


64

Table of Means Example 2

| Major | Male | Female | Overall |
|---|---|---|---|
| Biology | 37.9 | 45.8 | 41.2 |
| Business | 39.9 | 45.0 | 43.0 |
| English | 45.3 | 60.0 | 51.8 |
| Overall | 41.3 | 49.8 | 51.2 |


65

Typical SPSS Exploratory Analysis


66

Interaction plots Example 2


67

Results Example 2

• Results

• Note that the interaction is significant

‒ In this case both main effects are also significant; however, since the interaction is significant, we would keep them even if they were not


68

Example 2


69

Example 2


70

Example 3

• Again, using the same variables, let’s look at a different outcome


71

Table of Means Example 3

| Major | Male | Female | Overall |
|---|---|---|---|
| Biology | 47.9 | 47.2 | 47.6 |
| Business | 50.2 | 48.1 | 49.0 |
| English | 54.8 | 62.1 | 58.1 |
| Overall | 51.3 | 51.1 | 58.0 |


72

Interaction Plots Example 3


73

Results Example 3

• Results

• In this case, the interaction and major are significant, but gender is not.

• Since the interaction is significant, leave gender in the model.


74

Example 3


75

Example – Ginkgo for Memory

• A study was performed to test the memory effects of the herbal medicine Ginkgo biloba in healthy people. Subjects received a daily dosage (placebo, 120mg, 250mg) for two months. Subjects also received one of two types of mnemonic training. All subjects were given a memory test before the study and again at the end. The response variable is the difference (after – before) in memory test scores. There were 18 subjects randomly assigned to each combination of levels.


76



77



78

SPSS ANOVA output

• Conclusions?


79

ANOVA output

• Conclusions?


80

Estimated Profile Plot


81

Post-hoc Comparisons

• Since there are only two levels of training and there is a significant training effect, we don’t need multiple comparisons for training


82

Residual plot

• No problems


83

Further Analysis

• If there had been an interaction, we could create a table indicating which differences were significant


84

ANCOVA

• Analysis of Covariance

‒ At least one quantitative and one categorical explanatory variable

‒ In general, the main interest is the effects of the categorical variable and the quantitative variable is considered to be a control variable

‒ It is a blending of regression and ANOVA


85

Example

• Suppose that we have two different math tutors and would like to compare performance on the final math test

• We also have time on tutor and would like to use that as another explanatory variable
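ANCOVA can be fitted as a regression with an indicator for the categorical variable and the covariate as a numeric predictor: score = β0 + β1·(tutor B) + β2·(time) + error. The sketch solves the least-squares normal equations using only the standard library. The variable names and the noise-free data are hypothetical, built from known coefficients so the fit can be checked. A real analysis would use a statistics package, which also reports standard errors and the ANCOVA table.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        acc = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - acc) / m[r][r]
    return x


def fit_ancova(tutor_b, time, score):
    """Least-squares fit of score ~ intercept + tutor_b indicator + time."""
    X = [[1.0, float(t), float(h)] for t, h in zip(tutor_b, time)]
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(X, score)) for i in range(p)]
    return solve_linear(xtx, xty)


# Hypothetical data: tutor_b = 1 for tutor B, 0 for tutor A; time in hours
tutor_b = [0, 0, 0, 1, 1, 1]
time = [10, 20, 30, 15, 25, 35]
score = [60 + 5 * t + 0.5 * h for t, h in zip(tutor_b, time)]  # no noise
b0, b_tutor, b_time = fit_ancova(tutor_b, time, score)
print(round(b0, 6), round(b_tutor, 6), round(b_time, 6))
```

To fit the interaction model discussed next, add a tutor_b × time product column to `X`.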


86



87

Compare Regression and ANCOVA

• Regression

• ANCOVA


88

Compare Regression and ANCOVA

• Note that the p-value for the interaction is the same in both models

• The interaction is not significant, so drop it and rerun

89

Compare Regression and ANCOVA

• Regression

• ANCOVA
