
Analysis of Variance

Janette Walde

[email protected]

Department of Statistics, University of Innsbruck

Janette Walde Analysis of Variance


Outline I

1 Introduction: Problems; What is Analysis of Variance; Some Terminology

2 ANOVA: Object of Investigation; Exploratory Analysis; Notation; Assumptions

3 One-Way ANOVA: Area of Application; Hypothesis Testing; Example


Outline II

3 One-Way ANOVA (cont.): Post-Hoc Analysis; Power Analysis

4 Two-Way ANOVA: Terminology; Assumptions; Results; Exploratory Analysis; Example

5 Further Extensions

6 Useful R-commands


Problems/Questions

* Do fertilizers have different effects on different kinds of wheat?
* Do females react differently to anti-cancer drugs than males?
* Does water evaporation from soil depend on the kind of vegetation growing there, controlling for climate conditions?
* A new treatment meant to help those with chronic arthritis pain was developed and tested for its long-term effectiveness. Participants in the experiment rated their level of pain on a 0 (no pain) to 9 (extreme pain) scale at three-month intervals. Was the treatment effective?
* Does the exposure of plants to various amounts of CO₂ affect characteristics of the plant?


What is ANOVA?

ANalysis Of VAriance.

Partitions the observed variance according to the explanatory (independent) variables.

Compares these partitions to test the significance of the explanatory variables.


Some Terminology

Between-subjects design - each subject participates in one and only one group.

Within-subjects design - the same group of subjects serves in more than one treatment; subject is now a factor.

Mixed design - a study which has both between- and within-subject factors.

Repeated measures - general term for any study in which multiple measurements are taken on the same subject. These can be either multiple treatments or several measurements over time.


Object of investigation

ANOVA uses variances and variance-like quantities to study the equality or non-equality of population means.

So, although it is called analysis of variance, we are actually analyzing means, not variances.

There are other methods which compare the variances themselves between groups.


Typical exploratory analyses include

Tabulation of the number of subjects in each experimental group.

Side-by-side box plots.

Summary statistics for each group.
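As a minimal sketch of the tabulation and per-group statistics listed above, the Python snippet below counts the subjects and computes the mean, sample variance and standard deviation per group from long-format (group, value) records. The function name `group_summary` and the toy data are hypothetical; box plots would come from a plotting library and are omitted.

```python
from collections import defaultdict
from statistics import mean, stdev, variance


def group_summary(records):
    """Return {group: (n, mean, variance, std. dev.)} for (group, value) pairs."""
    groups = defaultdict(list)
    for group, value in records:
        groups[group].append(value)
    return {
        # statistics.variance/stdev use the sample (n - 1) denominator
        g: (len(v), mean(v), variance(v), stdev(v))
        for g, v in groups.items()
    }


# Hypothetical toy data in long format: one (group, response) pair per subject.
records = [("A", 10.0), ("A", 12.0), ("A", 11.0),
           ("B", 14.0), ("B", 15.0), ("B", 16.0)]
for g, (n, m, var, sd) in sorted(group_summary(records).items()):
    print(f"{g}: n={n}, mean={m:.2f}, var={var:.2f}, sd={sd:.2f}")
```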


Notation

If we have K groups, denote the means of the groups as $\mu_1, \mu_2, \dots, \mu_K$. Subject i in group j has observation $y_{ij}$:

$$y_{ij} = \mu_j + \varepsilon_{ij}$$

where the $\varepsilon_{ij}$ are independent and identically distributed $N(0, \sigma^2)$. Combining the two, subjects from group j have distribution $N(\mu_j, \sigma^2)$.

With random assignment, the sample mean of any treatment group is representative of the population mean for that group.
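The model $y_{ij} = \mu_j + \varepsilon_{ij}$ can be illustrated by simulation. In the hypothetical sketch below (made-up group means and σ, our own function name), independent $N(0, \sigma^2)$ errors are added to each group's mean, so each simulated group's sample mean should land close to its $\mu_j$.

```python
import random
from statistics import mean


def simulate_one_way(mus, sigma, n_per_group, seed=None):
    """Simulate y_ij = mu_j + eps_ij with independent eps_ij ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    return {
        j: [mu + rng.gauss(0.0, sigma) for _ in range(n_per_group)]
        for j, mu in enumerate(mus, start=1)   # groups numbered j = 1..K
    }


# Hypothetical population means for K = 3 groups and a common sigma.
data = simulate_one_way(mus=[10.0, 12.0, 15.0], sigma=2.0, n_per_group=1000, seed=1)
for j, ys in data.items():
    print(j, round(mean(ys), 2))  # each sample mean should be close to its mu_j
```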


Assumptions

1 The errors $\varepsilon_{ij}$ are normally distributed.

2 Across the conditions the errors have equal spread, often referred to as equal variances.

Rule of thumb: the assumption is met if the largest standard deviation is less than twice the smallest standard deviation (equivalently, the largest variance is less than four times the smallest variance). If the variances are unequal, a correction is needed; usually this means testing at α/2.

3 The errors are independent of each other.
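A minimal Python sketch of the spread check. It is written for standard deviations, the form in which this rule of thumb is commonly stated (largest SD below twice the smallest, which is the same as the largest variance below four times the smallest); the function names are hypothetical. The usage line plugs in the five group spreads reported in the summary table of the pea example.

```python
def equal_spread_ok(sds, factor=2.0):
    """Rule of thumb: largest standard deviation < factor * smallest one."""
    if min(sds) <= 0:
        raise ValueError("standard deviations must be positive")
    return max(sds) < factor * min(sds)


def equal_spread_ok_from_variances(variances):
    """Same check starting from variances (factor 2 on SDs = factor 4 on variances)."""
    return equal_spread_ok([v ** 0.5 for v in variances])


print(equal_spread_ok([2.2, 1.8, 1.4, 1.9, 1.6]))  # 2.2 < 2 * 1.4 = 2.8
```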


Checking the assumptions

Use the residuals, which are the estimates of the $\varepsilon_{ij}$:

1 Normality: look at a normal probability (Q-Q) plot of the residuals.

2 Equal spread: look at a plot of residuals versus fitted values.

3 Independence: hard to check; it is often assumed from the study design.

For mild violations of the assumptions there are options for correction.

When the assumptions are NOT met, the p-values cannot be trusted.
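In a one-way layout the fitted value of each observation is its group mean, and the residual is the observation minus that mean. The sketch below (hypothetical function name and toy data) computes the (fitted, residual) pairs that both diagnostic plots are built from; drawing the plots is left to any plotting library.

```python
from statistics import mean


def fitted_and_residuals(groups):
    """For {group: [y, ...]} return a list of (group, fitted, residual) tuples.

    The fitted value is the group mean, so each residual y - mean estimates
    the error term eps_ij of that observation.
    """
    out = []
    for g, ys in groups.items():
        fitted = mean(ys)
        out.extend((g, fitted, y - fitted) for y in ys)
    return out


groups = {"A": [10.0, 12.0, 11.0], "B": [14.0, 15.0, 19.0]}  # hypothetical data
rows = fitted_and_residuals(groups)
for g, fitted, resid in rows:
    print(g, fitted, resid)
```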


Basics

One-way ANOVA is used when
* only the effect of one explanatory variable is tested, and
* each subject has only one treatment or condition, i.e. a between-subjects design.

It is used to test for differences among two or more independent groups.

It gives the same result as a two-sample t-test with pooled variance if the explanatory variable has two levels (then F = t²).


Hypotheses

$H_0: \mu_1 = \mu_2 = \dots = \mu_K$

$H_1$: The µ's are not all equal.

This null hypothesis is called the overall null and is the hypothesis tested by ANOVA.

If the overall null is rejected, more specific hypothesis tests, often referred to as contrasts, are needed to determine which means differ.


Terminology

The sample variance is the sum of the squared deviations from the mean divided by the number of observations minus 1:

$$s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}$$

A mean square (MS) is a variance-like quantity calculated as the sum of the squared deviations (SS) divided by the degrees of freedom (df):

$$MS = \frac{SS}{df}$$
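A minimal Python sketch of the two definitions, with hypothetical helper names and a made-up sample: the sample variance is simply the mean square with df = n − 1, which the standard library's `statistics.variance` confirms.

```python
from statistics import variance


def sum_of_squares(xs):
    """SS: sum of squared deviations from the sample mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)


def mean_square(ss, df):
    """MS = SS / df."""
    return ss / df


xs = [4.0, 7.0, 5.0, 8.0]          # hypothetical sample, mean 6
ss = sum_of_squares(xs)            # 4 + 1 + 1 + 4 = 10
s2 = mean_square(ss, len(xs) - 1)  # sample variance uses df = n - 1
print(ss, s2, variance(xs))
```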


Within versus Between

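In one-way ANOVA we work with two mean-square quantities:
* $MS_{within}$ ... the mean square within groups
* $MS_{between}$ ... the mean square between groups

For each individual group i we have

$$\frac{SS_i}{df_i} = \frac{\sum_{j=1}^{n_i}(x_{ij} - \bar{x}_i)^2}{n_i - 1}$$

So the estimate of $MS_{within}$ is

$$MS_{within} = \frac{SS_{within}}{df_{within}} = \frac{\sum_{i=1}^{K} SS_i}{N - K}$$

And the estimate of $MS_{between}$ is

$$MS_{between} = \frac{SS_{between}}{df_{between}} = \frac{\sum_{i=1}^{K} n_i(\bar{x}_i - \bar{x})^2}{K - 1}$$

where $\bar{x}$ is the grand mean of all N observations.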
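The formulas above translate directly into code. This is a minimal sketch with a hypothetical function name and made-up data (three groups of three observations, with group means 4, 7 and 10 and grand mean 7), returning the between and within SS, df and MS plus their ratio F.

```python
from statistics import mean


def one_way_anova(groups):
    """Return SS, df and MS between and within groups, plus F.

    groups: list of lists, one list of observations x_ij per group i.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = mean([x for g in groups for x in g])        # grand mean x-bar
    means = [mean(g) for g in groups]                   # group means x-bar_i
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between, "df_between": df_between, "ms_between": ms_between,
        "ss_within": ss_within, "df_within": df_within, "ms_within": ms_within,
        "F": ms_between / ms_within,
    }


# Hypothetical data: three groups of three observations each.
res = one_way_anova([[3.0, 4.0, 5.0], [6.0, 7.0, 8.0], [9.0, 10.0, 11.0]])
print(res)
```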


Mean Squares

What do these values mean?

$MS_{within}$ is an unbiased estimate of σ² regardless of whether the null or the alternative hypothesis is true.

$MS_{between}$ is a good estimate of σ² only when the null hypothesis is true. If the alternative is true, values of $MS_{between}$ tend to be inflated.

Thus, we can look at the ratio of the two mean square values to evaluate the null hypothesis.


Testing the Hypothesis

The F-test looks at the variation among the group means relative to the variation within the samples:

$$F = \frac{MS_{between}}{MS_{within}} = \frac{SS_{between}/df_{between}}{SS_{within}/df_{within}} = \frac{SS_{between}/(K-1)}{SS_{within}/(N-K)}$$

The F-statistic tends to be larger if the alternative hypothesis is true than if the null hypothesis is true.

Under $H_0$, the test statistic F has an $F(K-1, N-K)$ distribution.
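To turn an observed F into a p-value we need the upper tail of the $F(K-1, N-K)$ distribution. With SciPy this is `scipy.stats.f.sf(F, K - 1, N - K)`; the dependency-free sketch below instead uses the standard identity $P(F > f) = I_x(d_2/2, d_1/2)$ with $x = d_2/(d_2 + d_1 f)$, evaluating the regularized incomplete beta function $I_x$ with the usual continued-fraction (modified Lentz) method. The function names are our own, so treat this as an illustration rather than a vetted numerical routine.

```python
from math import exp, lgamma, log


def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x)
    front = exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b   # symmetry I_x(a,b) = 1 - I_{1-x}(b,a)


def f_sf(f, d1, d2):
    """Upper-tail probability P(F > f) for F ~ F(d1, d2): the ANOVA p-value."""
    if f <= 0:
        return 1.0
    return reg_inc_beta(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


print(f_sf(82.168, 4, 45))   # p-value for F = 82.168 on 4 and 45 df
```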


What does the F ratio tell us?

$$F = \frac{MS_{between}}{MS_{within}}$$

The denominator is always an estimate of σ² (under both the null and the alternative hypotheses).

The numerator is either another estimate of σ² (under the null) or is inflated (under the alternative).

If the null is true, values of F are close to 1.

If the alternative is true, values of F tend to be larger.

How large F has to be before we reject the null depends on the degrees of freedom.


The ANOVA table

When running an ANOVA, statistical packages return an ANOVA table summarizing the SS, MS, df, F-statistic, and p-value:

Source | SS | df | MS | F | Sig
Group (Treatment, between) | $SS_{between}$ | $df_{between}$ | $MS_{between}$ | $MS_{between}/MS_{within}$ | p-value
Residual (Error, within) | $SS_{within}$ | $df_{within}$ | $MS_{within}$ | |
Total | $SS_{between} + SS_{within}$ | $df_{between} + df_{within}$ | | |


Example

The data come from a plant physiology experiment which investigated the effect of various sugars on the growth of peas. Growth (length) is measured in 'ocular units'. Five groups are analyzed: a control group and four groups varying in the kind of sugar and its amount. Each group has 10 measurements.

Subject | Control group | 2% glucose | 2% fructose | 1% glucose + 1% fructose | 2% saccharose
1 | 71 | 57 | 58 | 58 | 62
2 | 68 | 58 | 61 | 59 | 66
3 | 70 | 60 | 56 | 58 | 65
... | ... | ... | ... | ... | ...


Example

We want to know whether the means of the variable 'length' differ significantly across the groups, i.e. does the supply of sugar influence the growth of the peas?

We use 5 groups: the control group, group 1, group 2, group 3, and group 4.

$H_0$: Growth is independent of the sugar supply, i.e. $\mu_{control} = \mu_{group1} = \mu_{group2} = \mu_{group3} = \mu_{group4}$.

$H_1$: Growth varies across groups, i.e. at least one of the means is different.


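Box plots

[Figure: side-by-side box plots of length by group; visible group labels include "1% fructose", "2% fructose" and "control group"; the length axis runs from about 60 to 70.]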


Summary

The largest standard deviation is less than twice the smallest standard deviation (2.2 < 2 · 1.4 = 2.8). Use α = 0.05.

Groups | $n_i$ | Mean | Std. dev.
Control group | 10 | 70.1 | 2.2
Group 1 | 10 | 64.1 | 1.8
Group 2 | 10 | 58.0 | 1.4
Group 3 | 10 | 58.2 | 1.9
Group 4 | 10 | 59.3 | 1.6
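As an arithmetic check, the between-groups sum of squares of this example can be reproduced from the group sizes and means in the summary table alone, since $SS_{between} = \sum_i n_i(\bar{x}_i - \bar{x})^2$ needs nothing else. A minimal Python sketch (the function name is hypothetical):

```python
def ss_between_from_summary(ns, means):
    """SS_between = sum_i n_i * (mean_i - grand_mean)^2 from group sizes and means."""
    n_total = sum(ns)
    grand = sum(n * m for n, m in zip(ns, means)) / n_total  # weighted grand mean
    return sum(n * (m - grand) ** 2 for n, m in zip(ns, means))


ns = [10, 10, 10, 10, 10]
means = [70.1, 64.1, 58.0, 58.2, 59.3]   # control group, groups 1-4
print(round(ss_between_from_summary(ns, means), 2))  # 1077.32
```

The result agrees with the between-groups SS of 1077.3 in the sample output of this example.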


Degrees of Freedom

How many groups do we have? There are K = 5 groups.

What is the sample size? There are N = 50 peas. Using these values:

What is $df_{between}$? $K - 1 = 5 - 1 = 4$. What is $df_{within}$? $N - K = 50 - 5 = 45$.


Sample Output

Source | SS | df | MS | F | Sig
Group (Treatment, between) | 1077.3 | 4 | 269.330 | 82.168 | 0.000
Residual (Error, within) | 147.5 | 45 | 3.278 | |
Total | 1224.8 | 49 | | |

Our estimate of σ² is approximately 3.3.

The numerator MS = 269.330 appears to be highly inflated.
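The remaining table entries follow from the two sums of squares and the degrees of freedom alone (MS = SS/df, F = $MS_{between}/MS_{within}$, total df = N − 1). A short Python sketch, with a hypothetical function name, rebuilds them; small differences in the last digit come from the rounded SS values in the output.

```python
def anova_table(ss_between, ss_within, k, n):
    """Rebuild one-way ANOVA table entries from the two sums of squares."""
    df_between, df_within = k - 1, n - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "between": (ss_between, df_between, ms_between, ms_between / ms_within),
        "within": (ss_within, df_within, ms_within),
        "total": (ss_between + ss_within, df_between + df_within),  # df = N - 1
    }


table = anova_table(ss_between=1077.3, ss_within=147.5, k=5, n=50)
for row, values in table.items():
    print(row, values)
```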


Results

F-statistic = 82.17.

p-value < 0.05.

Conclusion: the growth differs for at least one of the groups.

To make stronger statements, we need to do further testing.


Checking the assumptions

[Figure: two residual diagnostic plots for the pea example. Left, "Normal Q-Q": standardized residuals against theoretical quantiles. Right, "Residuals vs Fitted": residuals against fitted values (about 58 to 70). Observations 4, 8 and 34 are labelled in both panels.]


Further Analysis

If $H_0$ is rejected, we conclude that not all the µ's are equal.

We would like to make statements about where the differences are. We can use planned or unplanned comparisons (or contrasts).

* Planned comparisons are interesting comparisons decided on before the analysis.

* Unplanned comparisons are made after seeing the results. Be careful not to go fishing for results!


Contrasts

A simple contrast hypothesis compares two population means:

* $H_0: \mu_1 = \mu_5$

A complex contrast hypothesis has multiple population means on either side:

* $H_0: (\mu_1 + \mu_2)/2 = \mu_3$

* $H_0: (\mu_1 + \mu_2)/2 = (\mu_3 + \mu_4 + \mu_5)/3$
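A contrast is estimated by $\hat{L} = \sum_i c_i \bar{x}_i$ with coefficients $c_i$ summing to zero, and is tested with the standard pooled-variance statistic $t = \hat{L} / \sqrt{MS_{within} \sum_i c_i^2 / n_i}$ on $N - K$ degrees of freedom. A minimal Python sketch with a hypothetical function name: the first usage line plugs in the control-group mean 70.1 against the 64.1 group, with $MS_{within}$ = 3.278 and $n_i$ = 10 from the pea example, which gives the t value of about 7.41 shown for that pair in the regression output of the post-hoc section; the second line uses made-up numbers for a complex contrast.

```python
from math import sqrt


def contrast_t(coefs, means, ns, ms_within):
    """Estimate L = sum c_i * mean_i and its t statistic (df = N - K)."""
    if abs(sum(coefs)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    estimate = sum(c * m for c, m in zip(coefs, means))
    se = sqrt(ms_within * sum(c * c / n for c, n in zip(coefs, ns)))
    return estimate, estimate / se


# Simple contrast: control group vs. the group with mean 64.1 (coefficients +1, -1).
est, t = contrast_t([1, -1], [70.1, 64.1], [10, 10], ms_within=3.278)
print(round(est, 2), round(t, 2))  # 6.0 7.41

# Complex contrast (mu_1 + mu_2)/2 - mu_3 with hypothetical means and sizes.
est2, t2 = contrast_t([0.5, 0.5, -1.0], [10.0, 12.0, 14.0], [5, 5, 5], ms_within=2.0)
```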


Post-Hoc Analysis

Coefficients:
                   Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)            64.1      0.5725  111.961   < 2e-16
"1% glu, 2% sac"       -6.1      0.8097   -7.534  1.66e-09
"2% fructose"          -5.9      0.8097   -7.287  3.83e-09
"2% glucose"           -4.8      0.8097   -5.928  3.99e-07
"control group"         6.0      0.8097    7.410  2.52e-09

Residual standard error: 1.81 on 45 degrees of freedom
Multiple R-squared: 0.8796, Adjusted R-squared: 0.8689
F-statistic: 82.17 on 4 and 45 DF, p-value: < 2.2e-16


Post-Hoc Analysis, cont.

What if we notice a possibly interesting difference when looking at the results?

We can do comparisons, but need to adjust the α-level to control the Type I error.

Bonferroni correction for the number of comparisons done:

$$\alpha^* = \frac{\alpha}{\text{number of comparisons}}$$

The Bonferroni-Holm correction is a step-down refinement of this rule that is less conservative.
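Both corrections are easiest to apply as adjusted p-values, which are then compared with the unadjusted α. The sketch below (hypothetical function names and p-values) implements Bonferroni, min(1, m·p), and the Holm step-down rule: sort the p-values, multiply the r-th smallest by (m − r + 1), and enforce monotonicity with a running maximum, the same rule as R's `p.adjust(method = "holm")`.

```python
def bonferroni(pvals):
    """Bonferroni-adjusted p-values: min(1, m * p)."""
    m = len(pvals)
    return [min(1.0, m * p) for p in pvals]


def holm(pvals):
    """Holm step-down adjusted p-values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # the (rank + 1)-th smallest p-value is multiplied by m - rank
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


p = [0.01, 0.04, 0.03]   # hypothetical raw p-values
print(bonferroni(p))     # approximately [0.03, 0.12, 0.09]
print(holm(p))           # approximately [0.03, 0.06, 0.06]
```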


Other Options

One common method is to use Tukey's simultaneous confidence intervals for any and all pairs of group population means. This procedure takes the multiple comparisons into account to preserve the α-level.

Dunnett's test (each group compared with a control group).

Scheffé procedure (all possible contrasts).


Bonferroni-Holm correction for previous example

Pairwise comparisons using t tests with pooled SD

data: length and group_name

                 "1% fru"  "1% glu, 2% sac"  "2% fru"  "2% glu"
"1% glu, 2% sac" 1.2e-08   -                 -         -
"2% fructose"    1.9e-08   0.81              -         -
"2% glucose"     1.6e-06   0.35              0.36      -
"control group"  1.5e-08   < 2e-16           < 2e-16   2.4e-16

P value adjustment method: holm


Comparison to Regression Analysis

The conclusions about the overall null hypothesis will be the same.

In regression we can make statements comparing groups to a baseline group.

To make more conclusive statements we need to do more analysis.

ANOVA with either planned or post-hoc comparisons does the same and is often easier.
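The regression link can be made concrete: with dummy (treatment) coding, the least-squares intercept equals the baseline group's mean and each group coefficient equals that group's mean minus the baseline mean. That is how the regression output in the post-hoc section reads (intercept 64.1, "control group" +6.0, i.e. a control mean of 70.1). A minimal sketch that computes these coefficients directly from group means, with a hypothetical function name and toy data:

```python
from statistics import mean


def treatment_coefficients(groups, baseline):
    """Coefficients of y ~ group under treatment (dummy) coding.

    Intercept = baseline group mean; every other coefficient = group mean - baseline mean.
    """
    base = mean(groups[baseline])
    coefs = {"(Intercept)": base}
    for g in sorted(groups):
        if g != baseline:
            coefs[g] = mean(groups[g]) - base
    return coefs


groups = {"a": [5.0, 6.0, 7.0], "b": [8.0, 9.0, 10.0], "c": [4.0, 4.0, 4.0]}  # toy data
print(treatment_coefficients(groups, baseline="a"))
# {'(Intercept)': 6.0, 'b': 3.0, 'c': -2.0}
```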


One-way ANOVA Power

Two different TOEFL prep courses charge $1200 for a two-month course. An (unethical) experiment would be to randomize students into one of the two courses or into taking no course. What information is needed to calculate the power of this one-way ANOVA?

* Sample size
* Within-group variance (σ²)
* Estimated or minimally interesting outcome means for each group


Estimate of σ²

Based on previous years, we know that 95% of the student scores on the TOEFL fall between 900 and 1500, i.e. roughly within ±2σ of the mean:

$$\sigma \approx (1500 - 900)/4 = 150$$

$$\sigma^2 = 150^2 = 22500$$


Minimally interesting outcome

What is the minimal average benefit, in points gained, that would justify the program? The minimally interesting outcome is based on previous knowledge.

For this example we will try several different values.


Computing the Power

Different applets define things slightly differently (http://www.epibiostat.ucsf.edu/biostat/sampsize.html).

For the applet I used ('nQuery'), the required input is 'sd[treatment]'. From their definition this is calculated as:

$$sd[treatment] = \sqrt{\frac{\sum_{i=1}^{K}(\mu_i - \mu)^2}{K}}$$

$\mu_i$ ... mean of group i

$\mu$ ... mean of the group means $\mu_i$

K ... number of groups

Now we are ready to go to the power applet.
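A small Python helper for the applet input, assuming sd[treatment] is the square root of the averaged squared deviations of the group means from their overall mean (dividing by K, not K − 1). The function name and the group means in the usage line are hypothetical; how the slides' "effect = 50 points" was translated into group means is not stated, so this does not try to reproduce the power values that follow.

```python
from math import sqrt


def sd_treatment(group_means):
    """sqrt( sum_i (mu_i - mu)^2 / K ), with mu the mean of the group means."""
    k = len(group_means)
    mu = sum(group_means) / k
    return sqrt(sum((m - mu) ** 2 for m in group_means) / k)


# Hypothetical scenario: no course, course 1, course 2 (the latter 50 points better).
print(round(sd_treatment([1000.0, 1000.0, 1050.0]), 2))
```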


Computing the Power, Cont. I

Let σ = 150, n = 50, effect = 50 points: Power = 38%

Let σ = 150, n = 100, effect = 50 points: Power = 68%

Let σ = 150, n = 50, effect = 100 points: Power = 94%

Let σ = 150, n = 50, effect = 25 points: Power = 12%

Let σ = 100, n = 50, effect = 50 points: Power = 73%


Computing the Power, Cont. II

Let σ = 100, n = 100, effect = 50 points: Power = 96%

Let σ = 100, n = 50, effect = 100 points: Power = 99%

Let σ = 100, n = 50, effect = 25 points: Power = 23%
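The applet computes power analytically. As a cross-check under stated assumptions, power can also be estimated by simulation: generate data under the assumed group means and σ, run the F-test, and count rejections. The sketch below is hypothetical throughout (function names, group means, number of simulations). It takes the critical value from simulated null data rather than from an F table, so only the standard library is needed, and it treats `n` as the size of each group, which the slides do not specify. It is not meant to reproduce the percentages above.

```python
import random


def _mean(xs):
    return sum(xs) / len(xs)


def f_stat(groups):
    """One-way ANOVA F statistic, MS_between / MS_within, for a list of samples."""
    all_y = [y for g in groups for y in g]
    grand = _mean(all_y)
    k, n_total = len(groups), len(all_y)
    means = [_mean(g) for g in groups]
    ss_b = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_w = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return (ss_b / (k - 1)) / (ss_w / (n_total - k))


def simulated_power(mus, sigma, n, alpha=0.05, n_sim=2000, seed=0):
    """Monte Carlo power of the one-way F-test with n observations per group."""
    rng = random.Random(seed)

    def draw(centers):
        # One simulated data set: n normal observations around each group mean.
        return [[mu + rng.gauss(0.0, sigma) for _ in range(n)] for mu in centers]

    # Critical value: empirical (1 - alpha) quantile of F when all means are equal.
    null_f = sorted(f_stat(draw([0.0] * len(mus))) for _ in range(n_sim))
    crit = null_f[min(n_sim - 1, int((1 - alpha) * n_sim))]
    # Power: share of data sets under the assumed means whose F exceeds it.
    rejections = sum(f_stat(draw(mus)) > crit for _ in range(n_sim))
    return rejections / n_sim


# Hypothetical scenario: no course vs. two courses, one of them 50 points better.
print(simulated_power([1000.0, 1000.0, 1050.0], sigma=150.0, n=50))
```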


Moving past One-way ANOVA

What if we have two categorical explanatory variables?

What if we have both categorical and quantitative explanatory variables?

What if subjects have more than one treatment?

What if there is more than one response variable?

And many other combinations...


Two-way ANOVA

Two-way (or multi-way) ANOVA is an appropriate analysis method for a study with a quantitative outcome and two (or more) categorical explanatory variables.

Suppose we now have two categorical explanatory variables:

Is there a significant X1 effect?

Is there a significant X2 effect?

Are there significant interaction effects?

If X1 has k levels and X2 has m levels, the analysis is often referred to as a 'k by m ANOVA' or 'k × m ANOVA'.


Terminology

If the interaction is significant, the model is called an interaction model.

If the interaction is not significant, the model is called an additive model.

Explanatory variables are often referred to as factors.


Assumptions

The assumptions are the same as in One-way ANOVA:
1 The errors εij are normally distributed.
2 Across the conditions (here, the k · m factor-level combinations), the errors have equal spread. This is often referred to as equal variances.
3 The errors are independent of each other.
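A minimal sketch of numerical companions to the first two assumptions, on simulated, hypothetical data (factors X1 and X2 with 8 replicates per cell). The Shapiro-Wilk test is a swapped-in formal check that these slides do not use; they rely on residual plots instead.

```r
# Simulated, hypothetical data: 3 x 2 design, 8 replicates per cell (N = 48)
set.seed(1)
d <- expand.grid(X1 = factor(c("a1", "a2", "a3")),
                 X2 = factor(c("b1", "b2")),
                 obs = 1:8)
d$y <- 50 + rnorm(nrow(d), sd = 8)  # normal errors, common variance

fit <- aov(y ~ X1 * X2, data = d)
res <- residuals(fit)

# 1 Normality: Shapiro-Wilk test on the residuals
normality <- shapiro.test(res)
normality$p.value

# 2 Equal spread: residual standard deviation in each of the k * m cells
cell_sd <- tapply(res, interaction(d$X1, d$X2), sd)
round(cell_sd, 2)

# 3 Independence is not checked here; it follows from the study design
```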


Results

Results are again displayed in an ANOVA table.

The table has one line for each term in the model. For a model with two factors, there is one line for each factor and one line for the interaction, plus a line for the error and one for the total.

Its general layout is as follows.


The ANOVA table

Source        SS   df               MS   F   Sig.
Factor 1           k − 1
Factor 2           m − 1
Interaction        (k − 1)(m − 1)
Error              N − k · m        ?
Total              N − 1

The MS(error), denoted by ? in the above table, is the (unbiased) estimate of the error variance σ².

The MS in each row is that row's SS/df.

The F-statistic in each row is that row's MS divided by MS(error); Sig. is the corresponding p-value.
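As a worked check of these rules, the sketch below recomputes the MS, F and Sig. columns from the SS and df values printed in the drug × gender output later in this section (Sum Sq values as rounded there, so the last digits may differ slightly).

```r
# SS and df as reported in the drug x gender output of this section
ss  <- c(drug = 2412.50, gender = 42.19, interaction = 1362.50, error = 3065.62)
dof <- c(drug = 2,       gender = 1,     interaction = 2,       error = 42)

ms     <- ss / dof                         # MS = SS / df in every row
f_stat <- ms[1:3] / ms[["error"]]          # F = MS / MS(error)
p_val  <- pf(f_stat, dof[1:3], dof[["error"]], lower.tail = FALSE)  # Sig.

round(ms, 2)
round(f_stat, 4)
signif(p_val, 4)
```

The error df, 42, is N − k · m = 48 − 3 · 2 for the 48 patients in the example.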


Exploratory Analysis

Table of means
Interaction or profile plots
  * An interaction plot is a way to look at outcome means for two factors simultaneously.
  * A plot with parallel lines suggests an additive model.
  * A plot with non-parallel lines suggests an interaction model.
  * Note that an interaction plot should NOT be the deciding factor in whether or not to run an interaction model; that decision rests on the test of the interaction term in the ANOVA table.
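"Parallel lines" can also be read off numerically: if the profiles are parallel, the vertical gap between them is the same at every level of the x-axis factor. A minimal descriptive sketch, using the cell means reported for the drug × gender example in this section:

```r
# Cell means of the drug x gender example in this section
means <- matrix(c(66.875, 60.0,
                  66.875, 62.5,
                  40.625, 57.5),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("cisplatin", "vinblastine", "5-fluorouracil"),
                                c("male", "female")))

# Parallel profiles would mean the same male - female gap for every drug
gap <- means[, "male"] - means[, "female"]
gap
```

The gaps (6.875, 4.375 and −16.875) are far from constant, which is in line with the significant interaction in the ANOVA output; as stated above, this is a visual cue, not the test.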


Example

Do anti-cancer drugs have different effects on males and females? Three different drugs are given to patients with cancer, and the diameter of the tumor is measured.

X1: Kind of drug - 3 levels

X2: Gender - 2 levels

Response: Tumor diameter

We will fit a 3 by 2 ANOVA.


Table of means and counts

Mean tumor diameter:

                  Male     Female   Overall
Cisplatin         66.875   60.0     63.4375
Vinblastine       66.875   62.5     64.6875
5-fluorouracil    40.625   57.5     49.0625
Overall           58.125   60.0     59.0625

Note: this table should also include the standard error of each of the means.

Counts:

                  Male   Female
Cisplatin         8      8
Vinblastine       8      8
5-fluorouracil    8      8
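Because the design is balanced (8 patients in every cell, as the counts show), the drug, gender and interaction sums of squares can be recovered from the cell means alone. The sketch below does this using the balanced-design formulas; the residual SS cannot be obtained this way, since it needs the individual observations.

```r
# Cell means (rows: drug, columns: gender) and the common cell size
means <- matrix(c(66.875, 60.0,
                  66.875, 62.5,
                  40.625, 57.5), nrow = 3, byrow = TRUE)
n <- 8                                   # patients per cell (balanced design)
k <- nrow(means); m <- ncol(means)

grand   <- mean(means)                   # grand mean, 59.0625
row_eff <- rowMeans(means) - grand       # drug effects
col_eff <- colMeans(means) - grand       # gender effects
inter   <- means - outer(rowMeans(means), colMeans(means), "+") + grand

ss_drug        <- n * m * sum(row_eff^2)  # each drug has m * n patients
ss_gender      <- n * k * sum(col_eff^2)  # each gender has k * n patients
ss_interaction <- n * sum(inter^2)
c(ss_drug, ss_gender, ss_interaction)
```

The three values (2412.5, 42.1875 and 1362.5) agree with the Sum Sq column of the ANOVA output for this example.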


Interaction plots

[Figure: two interaction plots of the mean tumor diameter (y-axis "mean of diameter", about 40 to 65). Left panel: drug on the x-axis ("5-fluorouracil", "cisplatin", "vinblastine"), one line per gender ("male", "female"). Right panel: gender on the x-axis ("female", "male"), one line per drug ("cisplatin", "vinblastine", "5-fluorouracil").]


Interaction plots

There are two ways to draw an interaction plot, depending on which factor is placed on the x-axis. Both are legitimate; ease of interpretation is the deciding criterion.

If one explanatory variable has more levels than the other, interpretation is often easier if the explanatory variable with more levels defines the x-axis.

If one explanatory variable is quantitative but has been categorized and the other is categorical, interpretation is often easier if the categorized quantitative variable defines the x-axis. Example: age, 20 − 29, 30 − 39, 40 − 49, etc.
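Both orientations can be drawn with base R's interaction.plot(). The raw patient data are not reproduced here, so this sketch feeds it the six cell means of the example; with one value per cell, the default fun = mean simply returns that value, so the profiles match those of the raw data.

```r
# Cell means of the example (the raw patient data are not reproduced here)
means  <- c(66.875, 66.875, 40.625, 60.0, 62.5, 57.5)
drug   <- factor(rep(c("cisplatin", "vinblastine", "5-fluorouracil"), times = 2))
gender <- factor(rep(c("male", "female"), each = 3))

op <- par(mfrow = c(1, 2))
# Drug (3 levels) on the x-axis, one line per gender
interaction.plot(drug, gender, means, ylab = "mean of diameter")
# Gender (2 levels) on the x-axis, one line per drug
interaction.plot(gender, drug, means, ylab = "mean of diameter")
par(op)
```

Following the rule above, the left panel (the factor with more levels on the x-axis) is usually the easier one to read.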


Results

Output:

                                    Df   Sum Sq  Mean Sq  F value     Pr(>F)
as.factor(drug)                      2  2412.50  1206.25  16.5260  5.077e-06
as.factor(gender)                    1    42.19    42.19   0.5780  0.4513514
as.factor(drug):as.factor(gender)    2  1362.50   681.25   9.3333  0.0004429
Residuals                           42  3065.62    72.99

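The last column contains the p-values.

* Always check the interaction first!

* If the interaction is not significant, rerun the model without it. Here the interaction is significant (p = 0.0004429), so the interaction model is kept.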
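The "check the interaction first" rule can be written as a short R sketch. The data are simulated and hypothetical (factors X1 and X2, with a main effect of X1 only), and the 0.05 level is an assumption, since the slides do not fix a significance level.

```r
set.seed(42)  # simulated, hypothetical data: no true interaction built in
d <- expand.grid(X1 = factor(c("a1", "a2", "a3")),
                 X2 = factor(c("b1", "b2")),
                 obs = 1:8)
d$y <- 50 + c(a1 = 0, a2 = 5, a3 = -5)[as.character(d$X1)] +
  rnorm(nrow(d), sd = 8)

fit_int <- aov(y ~ X1 * X2, data = d)
tab <- summary(fit_int)[[1]]            # the ANOVA table as a data frame
p_interaction <- tab[["Pr(>F)"]][3]     # row 3 is the X1:X2 term

# Keep the interaction model only if the interaction is significant
alpha <- 0.05                           # assumed significance level
final_fit <- if (p_interaction < alpha) fit_int else aov(y ~ X1 + X2, data = d)
summary(final_fit)
```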


Checking the assumptions

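[Figure: residual diagnostics for the drug × gender model. Left panel: Normal Q-Q plot, standardized residuals against theoretical quantiles (−2 to 2). Right panel: Residuals vs Fitted, residuals (about −20 to 20) against fitted values (about 40 to 65). Observations 9, 25 and 27 are labeled in both panels.]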


Notes

The main effects should always be kept if the interaction is significant.

Note that because the patients fall into groups (one per combination of drug and gender), you will see vertical lines in the residuals versus fitted plot. This is because all patients with a particular combination of the factors have the same predicted value.
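The vertical lines come from the fact that, in the interaction model, each fitted value equals its cell mean. A minimal sketch on simulated, hypothetical data (3 × 2 design, 8 replicates per cell) confirms that the 48 fitted values take only k · m = 6 distinct values.

```r
set.seed(7)  # simulated, hypothetical data
d <- expand.grid(X1 = factor(c("a1", "a2", "a3")),
                 X2 = factor(c("b1", "b2")),
                 obs = 1:8)
d$y <- 50 + rnorm(nrow(d), sd = 8)

fit <- aov(y ~ X1 * X2, data = d)

# Number of distinct fitted values (rounded to ignore floating-point noise)
length(unique(round(fitted(fit), 6)))

# Each fitted value is the mean of its X1 x X2 cell
all.equal(unname(fitted(fit)), unname(ave(d$y, d$X1, d$X2)))
```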


Post-hoc Comparisons

Tukey HSD (Tukey Honestly Significant Differences) tests give post-hoc comparisons for each term in the model, including the interaction. Specific terms can be selected with the which argument of TukeyHSD().

                                                  diff       lwr      upr   p adj
"cisplatin":female-"5-fluorouracil":female       2.500   -10.252   15.252   0.991
"vinblastine":female-"5-fluorouracil":female     5.000    -7.752   17.752   0.848
"5-fluorouracil":male-"5-fluorouracil":female  -16.875   -29.627   -4.123   0.004
...
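Every interval above has the same half-width (about 12.752), because all cells have 8 patients. The sketch below recomputes that width from the example's residual line, using the studentized range quantile qtukey() that Tukey's method is based on.

```r
# Values from the drug x gender example: 6 cells, 8 patients each,
# residual df = 42, MS(error) = SS(error) / df = 3065.62 / 42
n_cells  <- 6
n_per    <- 8
df_error <- 42
ms_error <- 3065.62 / df_error

# Studentized range quantile for 95% family-wise confidence
q_crit <- qtukey(0.95, nmeans = n_cells, df = df_error)

# Half-width of each Tukey interval for a difference of two cell means
half_width <- q_crit / sqrt(2) * sqrt(ms_error * (1 / n_per + 1 / n_per))
half_width

# Interval for cisplatin:female minus 5-fluorouracil:female (diff = 2.5)
c(2.5 - half_width, 2.5 + half_width)
```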


Extensions

Analysis of Covariance (ANCOVA)
  * At least one quantitative and one categorical explanatory variable are included in the model.
  * In general, the main interest is the effect of the categorical variable, and the quantitative variable is considered to be a control variable (covariate).
  * It is a blending of regression and ANOVA.

Multivariate designs (more than one response variable): MANOVA/MANCOVA
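The ANCOVA idea listed above can be sketched in R as follows. The data, the group labels and the covariate age are simulated and hypothetical, and the model assumes one common slope for age in all groups. The "blending" shows in that aov() and lm() fit the same model.

```r
set.seed(3)  # simulated, hypothetical data for illustration only
n <- 60
d <- data.frame(
  group = factor(rep(c("g1", "g2", "g3"), each = n / 3)),  # categorical factor
  age   = runif(n, 20, 70)                                 # quantitative covariate
)
d$y <- 10 + 0.3 * d$age + c(g1 = 0, g2 = 4, g3 = 8)[as.character(d$group)] +
  rnorm(n, sd = 3)

# Covariate entered first: the group effect is assessed after adjusting
# for age (sequential sums of squares)
fit_ancova <- aov(y ~ age + group, data = d)
summary(fit_ancova)

# Same model as a regression: one common slope, group-specific intercepts
coef(lm(y ~ age + group, data = d))
```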


R-commands I

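boxplot(length ~ group_name, data = peas.data)   # boxplots of length per group

peas.aov <- aov(length ~ group_name, data = peas.data)   # one-way ANOVA

# pairwise t-tests with Holm-adjusted p-values
with(peas.data, pairwise.t.test(length, group_name, p.adjust.method = "holm"))

# the next two assume the cancer data columns are available (e.g. attached)
tapply(diameter, interaction(drug_name, gender_name), mean)   # cell means

interaction.plot(as.factor(drug_name), as.factor(gender_name), diameter)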


R-commands II

cancer.aovfit1 <- aov(diameter ~ as.factor(drug_name) * as.factor(gender_name))
summary(cancer.aovfit1)
plot(cancer.aovfit1, which = 2)   # normal Q-Q plot (which = 1: residuals vs fitted)
TukeyHSD(cancer.aovfit1)

# same model written out: A * B expands to A + B + A:B
cancer.aovfit2 <- aov(diameter ~ as.factor(drug_name) + as.factor(gender_name) +
                        as.factor(drug_name):as.factor(gender_name))
summary(cancer.aovfit2)
