January 7, 2009 - morning session
1
Statistics Micro Mini
Multi-factor ANOVA
January 5-9, 2008
Beth Ayers
2
Thursday Sessions
• ANOVA
‒ One-way ANOVA
‒ Two-way ANOVA
‒ ANCOVA
‒ Within-subject
‒ Between-subject
‒ Repeated measures
‒ MANOVA
‒ etc.
3
What is ANOVA?
• ANalysis Of VAriance
‒ Partitions the observed variance based on explanatory variables
‒ Compare partitions to test significance of explanatory variables
4
Some Terminology
• Between subjects design – each subject participates in one and only one group
• Within subjects design – the same group of subjects serves in more than one treatment
‒ Subject is now a factor
• Mixed design – a study which has both between- and within-subject factors
• Repeated measures – general term for any study in which multiple measurements are taken on the same subject
‒ Can be either multiple treatments or several measurements over time
5
ANOVA
• Use variances and variance-like quantities to study the equality or non-equality of population means
• So, although it is analysis of variance we are actually analyzing means, not variances
• There are other methods which analyze the variances between groups
6
ANOVA
• Typical exploratory analysis includes
‒ Tabulation of the number of subjects in each experimental group
‒ Side-by-side box plots
‒ Statistics about each group: at least the mean and standard deviation; can include the 5-number summary and information on skewness
‒ Table of means for each experimental group
7
Notation
• If we have k groups, denote the means of the groups as:
‒ μ1, μ2, . . ., μk
• Student i in group j has observation
‒ yij = μj + εij
‒ Where the εij are independent and distributed N(0, σ²)
‒ Can combine this and say subjects from group j have distribution N(μj, σ²)
• With random assignment, the sample mean for any treatment group is representative of the population mean for that group
8
Assumptions
1. The errors εij are normally distributed
2. Across the conditions, the errors have equal spread. Often referred to as equal variances.
‒ Rule of thumb: the assumption is met if the largest variance is less than twice the smallest variance
‒ If the variances are unequal, a correction is needed!! The usual correction is to use α/2.
3. The errors are independent from each other
9
Checking the assumptions
• Use the residuals, which are the estimates of εij
1. Look at normal probability plot
2. Look at residual versus fitted plot
3. Hard to check, often assumed from study design
• For mild violations of the assumptions, there are options for correction
• When the assumptions are not met – the p-value is simply wrong!!
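As a minimal sketch of the residuals used in these checks, the snippet below subtracts each group's sample mean from its observations. The data values and the names `groups` and `residuals_by_group` are made up for illustration; they are not from the workshop.

```python
from statistics import mean

# Hypothetical observations for three groups (illustrative values only).
groups = {
    "A": [42.0, 47.0, 44.0, 51.0],
    "B": [55.0, 49.0, 58.0, 54.0],
    "C": [46.0, 43.0, 48.0, 47.0],
}

def residuals_by_group(data):
    """Return residuals e_ij = y_ij - (sample mean of the subject's group)."""
    result = {}
    for name, values in data.items():
        group_mean = mean(values)  # fitted value for every subject in the group
        result[name] = [y - group_mean for y in values]
    return result

residuals = residuals_by_group(groups)
for name, res in residuals.items():
    print(name, [round(r, 2) for r in res])
```

These residuals are what would go into the normal probability plot and the residual versus fitted plot.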
10
One-way ANOVA
• One-way ANOVA is used when
‒ Only testing the effect of one explanatory variable
‒ Each subject has only one treatment or condition
‒ Thus a between-subjects design
• Used to test for differences among two or more independent groups
• Gives the same results as the (equal-variance) two-sample t-test if the explanatory variable has 2 levels
11
Hypothesis Testing
• H0: μ1 = μ2 = . . . = μk
• H1: the μ’s are not all equal
• The alternative hypothesis H1: μ1 ≠ … ≠ μk is wrong!
• The null hypothesis is called the overall null and is the hypothesis tested by ANOVA
• If the overall null is rejected, we must do more specific hypothesis testing to determine which means are different; these tests are often referred to as contrasts
12
Terminology
• The sample variance is the sum of the squared deviations from the mean divided by the degrees of freedom
\[ s^2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1} \]
• A mean square (MS) is a variance-like quantity calculated as SS/df
\[ \mathrm{MS} = \frac{\mathrm{SS}}{\mathrm{df}} \]
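To make the SS/df idea concrete, here is a small sketch that computes the sample variance as a sum of squares divided by its degrees of freedom and compares it with the standard library's `statistics.variance`. The data values and the helper name `sample_variance` are hypothetical.

```python
from statistics import mean, variance

def sample_variance(xs):
    """Sum of squared deviations from the mean (SS) divided by df = N - 1."""
    x_bar = mean(xs)
    ss = sum((x - x_bar) ** 2 for x in xs)  # sum of squares
    df = len(xs) - 1                        # degrees of freedom
    return ss / df

data = [4.0, 7.0, 6.0, 3.0, 5.0]  # hypothetical values
print(sample_variance(data), variance(data))
```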
13
One-way ANOVA
• In one-way ANOVA we work with two mean square quantities
‒ MSwithin – the mean square within-groups
‒ MSbetween – the mean square between-groups
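\[ \mathrm{MS}_{\text{between}} = \frac{\mathrm{SS}_{\text{between}}}{\mathrm{df}_{\text{between}}} \]
\[ \mathrm{MS}_{\text{within}} = \frac{\mathrm{SS}_{\text{within}}}{\mathrm{df}_{\text{within}}} \]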
14
Within vs. Between
15
One-way ANOVA
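• For each individual group we have
\[ \frac{\mathrm{SS}_i}{\mathrm{df}_i} = \frac{\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2}{n_i - 1} \]
• So the estimate of MSwithin is
\[ \mathrm{MS}_{\text{within}} = \frac{\mathrm{SS}_{\text{within}}}{\mathrm{df}_{\text{within}}} = \frac{\sum_{i=1}^{k} \mathrm{SS}_i}{\sum_{i=1}^{k} (n_i - 1)} = \frac{\sum_{i=1}^{k} \mathrm{SS}_i}{N - k} \]
• And the estimate of MSbetween is
\[ \mathrm{MS}_{\text{between}} = \frac{\mathrm{SS}_{\text{between}}}{\mathrm{df}_{\text{between}}} = \frac{\sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2}{k - 1} \]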
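The quantities on this slide can be sketched in a few lines of code. The function below computes MSbetween, MSwithin, and their ratio from raw observations; the function name `one_way_anova` and the tiny data set are assumptions made for illustration only.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (MS_between, MS_within, ratio) for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])

    # Between-groups: squared distance of each group mean from the grand mean,
    # weighted by the group size n_i.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups: squared deviations of observations from their own group mean.
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between, ms_within, ms_between / ms_within

# Hypothetical data: three groups of three observations each.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
ms_b, ms_w, ratio = one_way_anova(groups)
print(ms_b, ms_w, ratio)
```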
16
Mean Squares
• What do these values mean?
• MSwithin is considered a true estimate of σ² that is unaffected by whether the null or alternative hypothesis is true
• MSbetween is considered a good estimate of σ² only when the null hypothesis is true
‒ If the alternative is true, values of MSbetween tend to be inflated
• Thus, we can look at the ratio of the two mean square values to evaluate the null hypothesis
17
Testing the Hypothesis
• The F-test looks at the variation among the group means relative to the variation within the sample
• The F-statistic tends to be larger if the alternative hypothesis is true than if the null hypothesis is true
• The test statistic F has an F(k-1, N-k) distribution
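\[ F = \frac{\mathrm{MS}_{\text{between}}}{\mathrm{MS}_{\text{within}}} = \frac{\mathrm{SS}_{\text{between}} / \mathrm{df}_{\text{between}}}{\mathrm{SS}_{\text{within}} / \mathrm{df}_{\text{within}}} = \frac{\mathrm{SS}_{\text{between}} / (k - 1)}{\mathrm{SS}_{\text{within}} / (N - k)} \]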
18
What does the F ratio tell us?
• F = MSbetween / MSwithin
• The denominator is always an estimate of σ² (under both the null and alternative hypotheses)
• The numerator is either another estimate of σ² (under the null) or is inflated (under the alternative)
• If the null is true, values of F are close to 1
• If the alternative is true, values of F are larger
• How large F must be to reject the null depends on the degrees of freedom
19
The ANOVA table
• When running an ANOVA, statistical packages will return an ANOVA table summarizing the SS, MS, df, F-statistic, and p-value
                            SS                    Df                    MS         F                     Sig
Group (Treatment, between)  SSbetween             dfbetween             MSbetween  MSbetween / MSwithin  P-value
Residual (Error, within)    SSwithin              dfwithin              MSwithin
Total                       SSbetween + SSwithin  dfbetween + dfwithin
20
Example
• Suppose we want to know if typing speed varies across majors
• Use 4 majors – Biology, Business, English, and Mathematics
• H0: typing speed is the same for students of all majors
‒ H0: μBio = μBusiness = μEng = μMath
• H1: typing speed varies across the majors
‒ H1: at least one of the means is different
21
Box plots
22
Summary
Major ni Mean Variance
Biology 25 45.3 24.7
Business 25 47.6 25.4
English 25 55.6 38.8
Mathematics 25 45.1 20.1
• The largest variance is less than twice the smallest variance (38.8 < 2 × 20.1 = 40.2). Use α = 0.05.
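The summary table is enough to sketch both the rule-of-thumb check and an approximate F ratio. The code below is an illustration, not the workshop's software output: because the table's means and variances are rounded, the results will differ slightly from values computed on the raw data. Variable names are assumptions.

```python
# Summary statistics from the table above: (n, mean, variance) per major.
summary = {
    "Biology": (25, 45.3, 24.7),
    "Business": (25, 47.6, 25.4),
    "English": (25, 55.6, 38.8),
    "Mathematics": (25, 45.1, 20.1),
}

variances = [v for _, _, v in summary.values()]
# Rule of thumb: largest variance less than twice the smallest.
equal_spread_ok = max(variances) < 2 * min(variances)

k = len(summary)
n_total = sum(n for n, _, _ in summary.values())
grand_mean = sum(n * m for n, m, _ in summary.values()) / n_total

# Each group's SS is (n_i - 1) times its sample variance.
ss_within = sum((n - 1) * v for n, _, v in summary.values())
ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in summary.values())

ms_within = ss_within / (n_total - k)
ms_between = ss_between / (k - 1)
f_approx = ms_between / ms_within
print(equal_spread_ok, round(ms_within, 2), round(ms_between, 2), round(f_approx, 2))
```

With equal group sizes, the within-groups mean square is simply the average of the four group variances.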
23
Degrees of Freedom
• How many groups do we have?
• What is the sample size?
• Using these values:
‒ What is dfwithin?
‒ What is dfbetween?
24
Degrees of Freedom
• How many groups do we have?
‒ There are k = 4 groups – Biology, English, Business, and Mathematics
• What is the sample size?
‒ There are N = 100 students
• Using these values,
‒ What is dfbetween?
‒ k – 1 = 4 – 1 = 3
‒ What is dfwithin?
‒ N – k = 100 – 4 = 96
25
Sample Output
                            SS       Df  MS      F       Sig
Group (Treatment, between)  1807.49  3   602.50  22.091  0.000
Residual (Error, within)    2618.20  96  27.27
Total                       4425.69  99
• Our estimate of σ² is 27.27 (= 2618.20 / 96)
• The numerator MS = 602.5 and appears to be highly inflated
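As a quick arithmetic check, the MS column and the F-statistic can be recomputed from the SS and Df columns of the Sample Output table. This is only a sketch of the SS/df and MS/MS relationships; the variable names are assumptions.

```python
# SS and Df values taken from the Sample Output table.
ss_between, df_between = 1807.49, 3
ss_within, df_within = 2618.20, 96

ms_between = ss_between / df_between   # mean square between groups
ms_within = ss_within / df_within      # mean square within groups (estimate of sigma^2)
f_stat = ms_between / ms_within

# The Total row is the sum of the two rows above it.
ss_total = ss_between + ss_within
df_total = df_between + df_within
print(round(ms_between, 2), round(ms_within, 2), round(f_stat, 3))
print(round(ss_total, 2), df_total)
```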
26
Results
• F-statistic = 22.1
• P-value: <0.0005
• Conclusion – the average words per minute differs for at least one of the majors
• To make stronger statements we need to do further testing
27
Checking the assumptions
28
Further Analysis
• If H0 is rejected, we conclude that not all the μ’s are equal
• Would like to make statements about where there are differences
• Can use planned or unplanned comparisons (or contrasts)
‒ Planned comparisons are interesting comparisons decided on before analysis
‒ Unplanned comparisons occur after seeing the results
‒ Be careful not to go fishing for results
29
Contrasts
• A simple contrast hypothesis compares two population means
‒ H0: μ1 = μ5
• A complex contrast hypothesis has multiple population means on either side
‒ H0: (μ1 + μ2) / 2 = μ3
‒ H0: (μ1 + μ2) / 2 = (μ3 + μ4 + μ5) / 3
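A contrast can be written as a set of coefficients that sum to zero and are applied to the group means. The sketch below estimates a complex contrast of the form (μ1 + μ2) / 2 = μ3 and its standard error, assuming the usual equal-variance formula sqrt(MSwithin × Σ c_i² / n_i). The function name, the means, the group sizes, and the error mean square are all hypothetical inputs, not results from the workshop.

```python
def contrast_estimate(coefs, means, ns, ms_within):
    """Estimate a contrast sum(c_i * mean_i) and its standard error."""
    if abs(sum(coefs)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    estimate = sum(c * m for c, m in zip(coefs, means))
    # Standard error under the equal-variance assumption of ANOVA.
    std_err = (ms_within * sum(c ** 2 / n for c, n in zip(coefs, ns))) ** 0.5
    return estimate, std_err

# Hypothetical inputs: three group means, equal group sizes, an error mean square.
means = [45.0, 48.0, 56.0]
ns = [25, 25, 25]
coefs = [0.5, 0.5, -1.0]   # H0: (mu1 + mu2) / 2 = mu3

est, se = contrast_estimate(coefs, means, ns, ms_within=27.0)
print(est, se, est / se)   # est / se would be compared to a t distribution with df_within
```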
30
Planned Comparisons
• Most statistical packages allow you to enter custom planned contrast hypotheses
• The p-values are only valid under strict conditions
‒ The conditions maintain Type-1 error rate across the whole experiment
• Computer packages assume that you have checked the assumptions of the ANOVA test
31
Conditions for Planned Comparisons
• Contrasts are selected before looking at the results; they are planned, not post-hoc
• Must be ignored if the overall null is not rejected!
• Each contrast is based on information that is independent of the other contrasts
• The number of planned comparisons must not be more than the corresponding degrees of freedom (k-1 in one-way ANOVA)
32
Unplanned Comparisons
• What if we notice a possible interesting difference when looking at the results?
• Can do comparisons but need to adjust the α-level to control for Type-1 error
• One common method is to use Tukey’s simultaneous confidence intervals to compare any and all pairs of group population means
‒ This procedure takes multiple comparisons into consideration to preserve the α level
33
Other Options
• Bonferroni correction for the number of comparisons done
• Dunnett’s test
• Scheffé procedure
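For the Bonferroni option, the idea is to divide α by the number of comparisons made. A minimal sketch, assuming all pairwise comparisons among the four majors of the typing example are of interest and α = 0.05:

```python
from itertools import combinations

majors = ["Biology", "Business", "English", "Mathematics"]
alpha = 0.05

pairs = list(combinations(majors, 2))   # every pairwise comparison
bonferroni_alpha = alpha / len(pairs)   # per-comparison significance level

for a, b in pairs:
    print(f"{a} vs {b}: test at alpha = {bonferroni_alpha:.4f}")
```

With k = 4 groups there are 6 pairs, so each comparison would be tested at roughly 0.0083 rather than 0.05.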
34
Tukey’s Multiple Comparisons for previous example
35
Conclusions
• In the table on the previous page,
‒ 1 = Biology, 2 = Business, 3 = English, 4 = Mathematics
• Biology, Business, and Mathematics are all significantly different from English
• There are no other significant differences
36
Additional sample output
• Below is the same output from a different software package
37
Comparison to Regression
• Sample regression output
‒ Which major is our baseline?
38
Comparison to Regression
• F-statistic = 22.1, p-value < 0.0005
‒ This is the same F-statistic and p-value as the ANOVA on slide 25
• At least one of the explanatory variables is important; this corresponds to the rejection of the null: at least one of the means differs
39
Comparison to Regression
• Note that Biology is the baseline and 45.3 is the mean WPM for Biology students
• Note that Business and Mathematics are not significant
• Agrees with post-hoc comparisons: neither Business nor Mathematics is significantly different from Biology, but English is
• To make further conclusions we will need to look at multiple comparisons, such as the previous Tukey intervals
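The link between the regression coefficients and the group means can be sketched directly. With a single categorical predictor and dummy (baseline) coding, the fitted intercept equals the baseline group's mean and each coefficient equals that group's mean minus the baseline mean. The regression output itself is not reproduced in this transcript, so the values below are derived from the rounded means in the Summary table and may differ slightly from the software output.

```python
# Group means from the Summary table; Biology is the baseline.
group_means = {"Biology": 45.3, "Business": 47.6, "English": 55.6, "Mathematics": 45.1}
baseline = "Biology"

intercept = group_means[baseline]
# Each dummy-variable coefficient is the difference from the baseline mean.
coefficients = {
    major: m - intercept
    for major, m in group_means.items()
    if major != baseline
}

print("Intercept:", intercept)
for major, coef in coefficients.items():
    print(f"{major}: {coef:+.1f}")
```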
40
Regression
• The conclusions about the overall null hypothesis will be the same
• In regression we can make statements comparing groups to baseline
• To make more conclusive statements we will need to do more analysis
• ANOVA with either planned or post-hoc comparisons will do the same thing and is often easier
41
One-way ANOVA Power
• Two different SAT prep courses charge $1200 for a two-month course. An (unethical) experiment would be to randomize students into one of the two courses or no course
• What information is needed to calculate power for this one-way ANOVA?
‒ Sample size
‒ Within group variance (σ²)
‒ Estimated or minimally interesting outcome means for each group
42
Estimate of σ²
• Based on previous years, we know that 95% of the student scores on SATs fall between 900 and 1500
• σ = (1500 − 900)/4 = 150
• σ² = 150² = 22,500
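The divisor of 4 comes from the Normal distribution: about 95% of values fall within 2 standard deviations of the mean, so a 95% range spans roughly 4 standard deviations. A minimal sketch of the arithmetic (variable names are assumptions):

```python
low, high = 900, 1500   # range covering about 95% of SAT scores

# A 95% range is roughly mean - 2*sigma to mean + 2*sigma, i.e. 4 sigma wide.
sigma = (high - low) / 4
sigma_squared = sigma ** 2
print(sigma, sigma_squared)
```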
43
Minimally interesting outcome
• What is the minimum average benefit, in points gained, that would justify the program?
‒ The minimally interesting outcome is based on previous knowledge
• For this example we’ll try several different values
44
sd[treatment]
• Different applets will define things slightly differently. Find an applet you understand.
• For the applet I will show you, they require sd[treatment]. From their definition this is calculated as
\[ \mathrm{sd[treatment]} = \sqrt{\frac{\sum_{i=1}^{k} (\mu_i - \bar{\mu})^2}{k - 1}} \]
‒ Where μi is the ith group mean and \bar{\mu} is the mean of the k group means
‒ k = the number of groups
• Ready to go to power applet
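As a rough sketch of this input, and assuming sd[treatment] is the ordinary sample standard deviation of the k group means (sum of squared deviations divided by k − 1, then square-rooted), it can be computed with `statistics.stdev`. The three outcome means below are hypothetical and are not the values used in the workshop's applet runs.

```python
from statistics import stdev

# Hypothetical outcome means: no course, course A, course B.
group_means = [1200.0, 1250.0, 1250.0]

# statistics.stdev divides by (k - 1), the divisor assumed above.
sd_treatment = stdev(group_means)
print(round(sd_treatment, 2))
```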
45
Calculating the power
• Let σ = 150, n = 50, effect = 50 points
‒ Power = 0.3811
• Let σ = 150, n = 100, effect = 50 points
‒ Power = 0.6772
• Let σ = 150, n = 50, effect = 100 points
‒ Power = 0.9367
• Let σ = 150, n = 50, effect = 25 points
‒ Power = 0.1245
46
Calculating the power
• Let σ = 100, n = 50, effect = 50 points
‒ Power = 0.7276
• Let σ = 100, n = 100, effect = 50 points
‒ Power = 0.9622
• Let σ = 100, n = 50, effect = 100 points
‒ Power = 0.997
• Let σ = 100, n = 50, effect = 25 points
‒ Power = 0.2294
47
Moving past One-way ANOVA
• What if we have two categorical explanatory variables?
• What if we have categorical and quantitative explanatory variables?
• What if subjects have more than one treatment?
• What if there is more than one response variable?
• And many other combinations…
48
Two-way ANOVA
• Suppose we now have two categorical explanatory variables
• Is there a significant X1 effect?
• Is there a significant X2 effect?
• Are there significant interaction effects?
• If X1 has k levels and X2 has m levels, then the analysis is often referred to as a “k by m ANOVA” or “k x m ANOVA”
49
Terminology
• If the interaction is significant, the model is called an interaction model
• If the interaction is not significant, the model is called an additive model
• Explanatory variables are often referred to as factors
50
Assumptions
• The assumptions are the same as in One-way ANOVA
1. The errors εij are normally distributed
2. Across the conditions, the errors have equal spread. Often referred to as equal variances.
3. The errors are independent from each other
51
Two-way ANOVA
• Two-way (or multi-way) ANOVA is an appropriate analysis method for a study with a quantitative outcome and two (or more) categorical explanatory variables.
• The assumptions are Normality, equal variance, and independent errors.
52
Results
• Results are again displayed in an ANOVA table
• Will have one line for each term in the model. For a model with two factors, we will have one line for each factor and one line for the interaction. We will also have a line for the error and the total.
• See next page.
53
The ANOVA table
             SS  df          MS  F  Sig
Factor 1         k-1
Factor 2         m-1
Interaction      (k-1)(m-1)
Error            N-k*m       *
Total            N-1
• The MS(error), denoted by * in the above table, is the true estimate of σ²
• The MS in each row is that row’s SS/df
• The F-statistic is the MS/MS(error)
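The df column of the two-way table follows directly from k, m, and N. The helper below is a small sketch (the function name `two_way_df` and the example design of 4 × 2 with 100 subjects are assumptions); it also shows that the four model and error df add up to the total.

```python
def two_way_df(k, m, n_total):
    """Degrees of freedom for a k x m ANOVA that includes the interaction."""
    df = {
        "Factor 1": k - 1,
        "Factor 2": m - 1,
        "Interaction": (k - 1) * (m - 1),
        "Error": n_total - k * m,
    }
    df["Total"] = n_total - 1
    return df

# Hypothetical design: 4 x 2 with 100 subjects.
df = two_way_df(k=4, m=2, n_total=100)
print(df)
```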
54
Exploratory Analysis
• Table of means
• Interaction or profile plots
‒ An interaction plot is a way to look at outcome means for two factors simultaneously
‒ A plot with parallel lines suggests an additive model
‒ A plot with non-parallel lines suggests an interaction model
‒ Note that an interaction plot should NOT be the deciding factor in whether or not to run an interaction model
55
Example
• Continuing with the previous example, suppose we’d like to add gender as an explanatory variable
• X1: Major – 4 levels
• X2: Gender – 2 levels
• Response: words per minute typed
• We will fit a 4 by 2 ANOVA
56
Table of Means and Counts
             Male  Female  Overall
Biology      45.5  45.2    45.4
Business     48.6  46.9    47.6
English      55.3  55.9    55.6
Mathematics  45.6  44.6    45.1
Overall      48.9  47.9    48.4

             Male  Female
Biology      14    11
Business     10    15
English      14    11
Mathematics  12    13
• Note, this table should also include the standard error of each of the means.
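Because the cell counts are unequal, the "Overall" entries are count-weighted averages of the cell means. The sketch below rebuilds the marginal means from the two tables and also lists the Male minus Female gap within each major, which is what an interaction plot displays visually. Variable and function names are assumptions; similar gaps across majors only suggest parallel lines and are not a formal test of interaction.

```python
cell_means = {
    "Biology": {"Male": 45.5, "Female": 45.2},
    "Business": {"Male": 48.6, "Female": 46.9},
    "English": {"Male": 55.3, "Female": 55.9},
    "Mathematics": {"Male": 45.6, "Female": 44.6},
}
cell_counts = {
    "Biology": {"Male": 14, "Female": 11},
    "Business": {"Male": 10, "Female": 15},
    "English": {"Male": 14, "Female": 11},
    "Mathematics": {"Male": 12, "Female": 13},
}
genders = ("Male", "Female")

def weighted_mean(pairs):
    """Average of (mean, count) pairs, weighted by count."""
    total_n = sum(n for _, n in pairs)
    return sum(m * n for m, n in pairs) / total_n

# Marginal mean for each major (row) and each gender (column).
row_means = {
    major: weighted_mean([(cell_means[major][g], cell_counts[major][g]) for g in genders])
    for major in cell_means
}
col_means = {
    g: weighted_mean([(cell_means[major][g], cell_counts[major][g]) for major in cell_means])
    for g in genders
}
# Gap between genders within each major.
gender_gap = {major: cells["Male"] - cells["Female"] for major, cells in cell_means.items()}

print({k: round(v, 1) for k, v in row_means.items()})
print({k: round(v, 1) for k, v in col_means.items()})
print({k: round(v, 1) for k, v in gender_gap.items()})
```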
57
Interaction plots
58
Interaction plots
• There are two ways to do an interaction plot. Both are legitimate. Ease of interpretation is the final criterion for choosing between them.
• If one explanatory variable has more levels than the other, interpretation is often easier if the explanatory variable with more levels defines the x-axis
• If one explanatory variable is quantitative but has been categorized and the other is categorical, interpretation is often easier if the categorized quantitative variable defines the x-axis.
‒ Example: age, 20-29, 30-39, 40-49, etc.
59
Results
• Typical output:
• The last column contains the p-values
‒ Always check interaction first!
‒ If the interaction is not significant, rerun without it
60
Results
• Updated results
• Now we can interpret the main effects. We can see that major is significant but that gender is not.
61
Checking the assumptions
62
Notes
• If the interaction is significant, do not check the main effects. The main effects should always be kept if the interaction is significant.
• Note that due to the groups of students, you will see vertical lines in the residual versus predicted plot. This is due to the fact that all students with a particular combination of the factors will have the same predicted value.
63
Example 2
• Using the same variables, let’s look at a different outcome
64
Table of Means Example 2
Male Female Overall
Biology 37.9 45.8 41.2
Business 39.9 45.0 43.0
English 45.3 60.0 51.8
Mathematics 41.8 50.0 46.1
Overall 41.3 49.8 51.2
65
Typical SPSS Exploratory Analysis
66
Interaction plots Example 2
67
Results Example 2
• Results
• Note that the interaction is significant
‒ In this case both main effects are also significant; however, since the interaction is significant, we would keep them even if they were not
68
Example 2
69
Example 2
70
Example 3
• Again, using the same variables, let’s look at a different outcome
71
Table of Means Example 3
Male Female Overall
Biology 47.9 47.2 47.6
Business 50.2 48.1 49.0
English 54.8 62.1 58.1
Mathematics 52.0 48.4 50.1
Overall 51.3 51.1 58.0
72
Interaction Plots Example 3
73
Results Example 3
• Results
• In this case, the interaction and major are significant, but gender is not.
• Since the interaction is significant, leave gender in the model.
74
Example 3
75
Example – Ginkgo for Memory
• A study was performed to test the memory effects of the herbal medicine Ginkgo biloba in healthy people. Subjects received a daily dosage (placebo, 120mg, 250mg) for two months. Subjects also received one of two types of mnemonic training. All subjects were given a memory test before the study and again at the end. The response variable is the difference (after – before) in memory test scores. There were 18 subjects randomly assigned to each combination of levels.
76
Exploratory Analysis
77
Exploratory Analysis
78
SPSS ANOVA output
• Conclusions?
79
ANOVA output
• Conclusions?
80
Estimated Profile Plot
81
Post-hoc Comparisons
• Since there are only two levels of training and there is a significant training effect, we don’t need multiple comparisons for training
82
Residual plot
• No problems
83
Further Analysis
• If there had been an interaction, we could create a table indicating which differences were significant
84
ANCOVA
• Analysis of Covariance
‒ At least one quantitative and one categorical explanatory variable
‒ In general, the main interest is the effects of the categorical variable and the quantitative variable is considered to be a control variable
‒ It is a blending of regression and ANOVA
85
Example
• Suppose that we have two different math tutors and would like to compare performance on the final math test
• We also have time on tutor and would like to use that as another explanatory variable
86
Exploratory Analysis
87
Compare Regression and ANCOVA
• Regression
• ANCOVA
88
Compare Regression and ANCOVA
• Note that the p-value for the interaction is the same in both models
• The interaction is not significant; drop it and rerun
89
Compare Regression and ANCOVA
• Regression
• ANCOVA