nathaniel e. helwig - umn statisticsusers.stat.umn.edu/~helwig/notes/aov2-notes.pdf ·...

70
Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor of Psychology and Statistics University of Minnesota (Twin Cities) Updated 04-Jan-2017 Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 1

Upload: others

Post on 27-Jul-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Factorial and Unbalanced Analysis of Variance

Nathaniel E. Helwig

Assistant Professor of Psychology and StatisticsUniversity of Minnesota (Twin Cities)

Updated 04-Jan-2017

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 1

Page 2: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Copyright

Copyright © 2017 by Nathaniel E. Helwig

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 2

Page 3: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Outline of Notes

1) Balanced Two-Way ANOVA:Model Form & AssumptionsLeast-Squares EstimationBasic InferenceHypertension Example (pt 1)Multiple ComparisonsHypertension Example (pt 2)

2) Balanced Three-Way ANOVA:Model Form & EstimationHypertension Example (pt 3)

3) Unbalanced ANOVA Models:Overview of problemTypes of sums-of-squaresHypertension Example (pt 4)

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 3

Page 4: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Balanced Two-Way ANOVA

Balanced Two-Way ANOVA

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 4

Page 5: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Balanced Two-Way ANOVA Model Form & Assumptions

Two-Way ANOVA Model (cell means form)

The Two-Way Analysis of Variance (ANOVA) model has the form

yijk = µjk + eijk

for i ∈ {1, . . . ,njk}, j ∈ {1, . . . ,a}, and k ∈ {1, . . . ,b} whereyijk ∈ R is real-valued response for i-th subject in factor cell (j , k)

µjk ∈ R is real-valued population mean for factor cell (j , k)

eijkiid∼ N(0, σ2) is a Gaussian error term

njk is number of subjects in cell (j , k) and n =∑a

j=1∑b

k=1 njk(note: njk = n∗∀j , k in balanced two-way ANOVA)a and b are number of levels for first and second factors

Implies that yijkind∼ N(µjk , σ

2).

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 5

Page 6: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Balanced Two-Way ANOVA Model Form & Assumptions

Two-Way ANOVA Model (effect coding)

Using effect coding, the mean for factor cell (j , k) has the form

µjk = µ+ αj + βk + (αβ)jk

for j ∈ {1, . . . ,a} and k ∈ {1, . . . ,b} whereµ is overall population meanαj is main effect of first factor such that

∑aj=1 αj = 0

βk is main effect of second factor such that∑b

k=1 βk = 0(αβ)jk is interaction effect such that

∑aj=1(αβ)jk = 0 ∀k and∑b

k=1(αβ)jk = 0 ∀j

Set (αβ)jk = 0 ∀j , k to fit an additive model.

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 6

Page 7: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Balanced Two-Way ANOVA Model Form & Assumptions

Two-Way ANOVA Model (matrix form)In matrix form, the two-way ANOVA model is y = Xb + e where

X =

1 x11 · · · x1(a−1) z11 · · · z1(b−1) x11z11 · · · x1(a−1)z11 · · · x11z1(b−1) · · · x1(a−1)z1(b−1)1 x21 · · · x2(a−1) z21 · · · z2(b−1) x21z21 · · · x2(a−1)z21 · · · x21z2(b−1) · · · x2(a−1)z2(b−1)1 x31 · · · x3(a−1) z31 · · · z3(b−1) x31z31 · · · x3(a−1)z31 · · · x31z3(b−1) · · · x3(a−1)z3(b−1)

.

.

....

. . ....

.

.

.. . .

.

.

....

. . ....

. . ....

. . ....

1 xn1 · · · xn(a−1) zn1 · · · zn(b−1) xn1zn1 · · · xn(a−1)zn1 · · · xn1zn(b−1) · · · xn(a−1)zn(b−1)

b =

(µ α1 · · · αa−1 β1 · · · βb−1 (αβ)11 · · · (αβ)(a−1)1 · · · (αβ)1(b−1) · · · (αβ)(a−1)(b−1)

)′where X has 1 + (a− 1) + (b − 1) + (a− 1)(b − 1) = ab columns

xij =

1 if i-th observation is in j-th level of first factor−1 if i-th observation is in a-th level of first factor

0 otherwise

zik =

1 if i-th observation is in k -th level of second factor−1 if i-th observation is in b-th level of second factor

0 otherwisei ∈ {1, . . . ,n} and additional subscripts on y and e are dropped

Implies that y ∼ N(Xb, σ2In).Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 7

Page 8: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Balanced Two-Way ANOVA Model Form & Assumptions

Two-Way ANOVA Model (assumptions)

The fundamental assumptions of the two-way ANOVA model are:1 xij , zik and yi are observed random variables (known constants)

2 eiiid∼ N(0, σ2) is an unobserved random variable

3 µjk are unknown constants

4 (yi |xij , zik )ind∼ N(µjk , σ

2)note: homogeneity of variance

Interpretation of µjk depends on model formAdditive: µjk = µ+ αj + βk

Interaction: µjk = µ+ αj + βk + (αβ)jk

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 8

Page 9: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Balanced Two-Way ANOVA Least-Squares Estimation

Ordinary Least-Squares

We want to find the effect estimates (i.e., µ, αj , βk , and (αβ)jk terms)that minimize the ordinary least squares criterion

SSE =b∑

k=1

a∑j=1

njk∑i=1

(yijk − µ− αj − βk − (αβ)jk )2

If njk = n∗∀j , k the least-squares estimates have the form

µ =1

abn∗∑b

k=1

∑aj=1

∑n∗i=1 yijk = y··

αj =

(1

bn∗∑b

k=1

∑n∗i=1 yijk

)− µ = yj· − y··

βk =

(1

an∗∑a

j=1

∑n∗i=1 yijk

)− µ = y·k − y··

(αβ)jk =

(1n∗∑n∗

i=1 yijk

)− µ− αj − βk = yjk − yj· − y·k + y··

which implies that yijk = yjk for all (i , j , k). Proof

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 9

Page 10: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Balanced Two-Way ANOVA Least-Squares Estimation

Fitted Values and Residuals

Form of fitted values depends on fit model:Additive: µjk = yj· + y·k − y··Interaction: µjk = yjk

Residuals have the form

eijk = yijk − µjk

where form of µjk depends on fit model (additive versus interaction).

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 10

Page 11: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Balanced Two-Way ANOVA Basic Inference

ANOVA Sums-of-Squares

In balanced two-way ANOVA model with interaction:SST =

∑bk=1

∑aj=1∑n∗

i=1(yijk − y··)2 df = abn∗ − 1

SSR = n∗∑b

k=1∑a

j=1(yjk − y··)2 df = ab − 1

SSE =∑b

k=1∑a

j=1∑n∗

i=1(yijk − yjk )2 df = abn∗ − ab

In balanced two-way ANOVA model with no interaction:SST =

∑bk=1

∑aj=1∑n∗

i=1(yijk − y··)2 df = abn∗ − 1

SSR = n∗∑b

k=1∑a

j=1([yj· + y·k − y··]− y··)2 df = a + b − 2

SSE =∑b

k=1∑a

j=1∑n∗

i=1(yijk − [yj· + y·k − y··])2

df = abn∗ − (a + b − 1)

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 11

Page 12: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Balanced Two-Way ANOVA Basic Inference

Partitioning the Variance

From MLR notes we know that SST = SSR + SSE .

If njk = n∗∀j , k can partition SSR = SSA + SSB + SSAB whereSSA = bn∗

∑aj=1 α

2j df = a− 1

SSB = an∗∑b

k=1 β2k df = b − 1

SSAB = n∗∑b

k=1∑a

j=1(αβ)2jk df = (a− 1)(b − 1)

Implies that SSR = SSA + SSB for additive model (if njk = n∗∀j , k ).

Proof

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 12

Page 13: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Balanced Two-Way ANOVA Basic Inference

Extended ANOVA Table and F Tests

We typically organize the SS information into an ANOVA table:Source SS df MS F p-valueSSR n∗

∑bk=1

∑aj=1(yjk − y··)2 ab − 1 MSR F∗ p∗

SSA bn∗∑a

j=1(yj· − y··)2 a− 1 MSA F∗a p∗aSSB an∗

∑bk=1(y·k − y··)2 b − 1 MSB F∗b p∗b

SSAB n∗∑b

k=1∑a

j=1(yjk − yj· − y·k + y··)2 (a− 1)(b − 1) MSAB F∗ab p∗abSSE

∑bk=1

∑aj=1∑n∗

i=1(yijk − yjk )2 ab(n∗ − 1) MSESST

∑bk=1

∑aj=1∑n∗

i=1(yijk − y··)2 abn∗ − 1

MSR = SSRab−1 , MSA = SSA

a−1 , MSB = SSBb−1 , MSAB = SSAB

(a−1)(b−1), MSE = SSE

ab(n∗−1),

F∗ = MSRMSE ∼ Fab−1,ab(n∗−1) and p∗ = P(Fab−1,ab(n∗−1) > F∗),

F∗a = MSAMSE ∼ Fa−1,ab(n∗−1) and p∗a = P(Fa−1,ab(n∗−1) > F∗a ),

F∗b = MSBMSE ∼ Fb−1,ab(n∗−1) and p∗b = P(Fb−1,ab(n∗−1) > F∗b ),

F∗ab = MSABMSE ∼ F(a−1)(b−1),ab(n∗−1) and p∗ab = P(F(a−1)(b−1),ab(n∗−1) > F∗ab),

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 13

Page 14: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Balanced Two-Way ANOVA Basic Inference

ANOVA Table F Tests

F ∗ and p∗-value are testing H0 : αj = βk = (αβ)jk = 0 ∀j , k versusH1 : (∃j , k ∈ {1, . . . ,a} × {1, . . . ,b})(αj = βk = (αβ)jk = 0 is false)

Equivalent to H0 : µjk = µ ∀j , k versus H1 : not all µjk are equal

F ∗a statistic and p∗a-value are testing H0 : αj = 0 ∀j versusH1 : (∃j ∈ {1, . . . ,a})(αj 6= 0)

Testing main effect of first factor

F ∗b statistic and p∗b-value are testing H0 : βk = 0 ∀k versusH1 : (∃k ∈ {1, . . . ,b})(βk 6= 0)

Testing main effect of second factor

F ∗ab statistic and p∗ab-value are testing H0 : (αβ)jk = 0 ∀j , k versusH1 : (∃j , k ∈ {1, . . . ,a} × {1, . . . ,b})((αβ)jk 6= 0)

Testing interaction effectNathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 14

Page 15: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Balanced Two-Way ANOVA Hypertension Example (Part 1)

Hypertension Example: Data Description

Hypertension example from Maxwell & Delany (2003).

Total of n = 72 subjects participate in hypertension experiment.Factor A: drug type (a = 3 levels: X, Y, Z)Factor B: diet type (b = 2 levels: yes, no)

Randomly assign njk = 12 subjects to each treatment cell:Note there are (ab) = (3)(2) = 6 treatment cellsObservations are independent within and between cells

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 15

Page 16: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Balanced Two-Way ANOVA Hypertension Example (Part 1)

Hypertension Example: Descriptive Statistics

Sum of blood pressure for each treatment cell (∑12

i=1 yijk ):Diet

Drug No (k = 1) Yes (k = 2) TotalX (j = 1) 2136 2052 4188Y (j = 2) 2424 2154 4578Z (j = 3) 2388 2130 4518

Total 6948 6336 13284

Sum-of-squares of blood pressure for each treatment cell (∑12

i=1 y2ijk ):

DietDrug No (k = 1) Yes (k = 2) Total

X (j = 1) 382368 352518 734886Y (j = 2) 491008 388898 879906Z (j = 3) 478238 380462 858700

Total 1351614 1121878 2473492Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 16

Page 17: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Balanced Two-Way ANOVA Hypertension Example (Part 1)

Hypertension Example: OLS Estimation (by hand)

Least-squares estimates are cell means: µjk = yjk and

µ =1

abn∗

∑bk=1

∑aj=1∑n∗

i=1 yijk = y·· = 1328472 = 184.5

α1 =

(1

bn∗

∑bk=1

∑n∗i=1 yi1k

)− µ = y1· − y·· =

418824− 184.5 = −10

α2 =

(1

bn∗

∑bk=1

∑n∗i=1 yi2k

)− µ = y2· − y·· =

457824− 184.5 = 6.25

α3 =

(1

bn∗

∑bk=1

∑n∗i=1 yi3k

)− µ = y3· − y·· =

451824− 184.5 = 3.75

β1 =

(1

an∗

∑aj=1∑n∗

i=1 yij1

)− µ = y·1 − y·· =

694836− 184.5 = 8.5

β2 =

(1

an∗

∑aj=1∑n∗

i=1 yij2

)− µ = y·2 − y·· =

633636− 184.5 = −8.5

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 17

Page 18: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Balanced Two-Way ANOVA Hypertension Example (Part 1)

Hypertension Example: OLS Estimation (by hand)

Continuing from the previous slide. . .

(αβ)11 = y11 − y1· − y·1 + y·· =2136

12− 4188

24− 6948

36+ 184.5 = −5

(αβ)12 = y12 − y1· − y·2 + y·· =2052

12− 4188

24− 6336

36+ 184.5 = 5

(αβ)21 = y21 − y2· − y·1 + y·· =2424

12− 4578

24− 6948

36+ 184.5 = 2.75

(αβ)22 = y22 − y2· − y·2 + y·· =2154

12− 4578

24− 6336

36+ 184.5 = −2.75

(αβ)31 = y31 − y3· − y·1 + y·· =2388

12− 4518

24− 6948

36+ 184.5 = 2.25

(αβ)32 = y32 − y3· − y·2 + y·· =2130

12− 4518

24− 6336

36+ 184.5 = −2.25

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 18

Page 19: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Balanced Two-Way ANOVA Hypertension Example (Part 1)

Hypertension Example: Enter Data (in R)

> bp = scan("/Users/Nate/Desktop/hypertension.dat")Read 72 items> diet = factor(rep(rep(c("no","yes"),each=6),6))> drug = factor(rep(rep(c("X","Y","Z"),each=12),2))> biof = factor(rep(c("present","absent"),each=36))> hyper = data.frame(bp=bp, diet=diet, drug=drug, biof=biof)> hyper[1:20,]

bp diet drug biof1 170 no X present2 175 no X present3 165 no X present4 180 no X present5 160 no X present6 158 no X present7 161 yes X present8 173 yes X present9 157 yes X present10 152 yes X present11 181 yes X present12 190 yes X present13 186 no Y present14 194 no Y present15 201 no Y present16 215 no Y present17 219 no Y present18 209 no Y present19 164 yes Y present20 166 yes Y present

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 19

Page 20: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Balanced Two-Way ANOVA Hypertension Example (Part 1)

Hypertension Example: OLS Estimation (in R)Effect coding for drug and diet:> contrasts(hyper$drug) <- contr.sum(3)> contrasts(hyper$drug)

[,1] [,2]X 1 0Y 0 1Z -1 -1> contrasts(hyper$diet) <- contr.sum(2)> contrasts(hyper$diet)

[,1]no 1yes -1> mymod = lm(bp ~ drug * diet, data=hyper)> summary(mymod) # I deleted some output

Coefficients:Estimate Std. Error t value Pr(>|t|)

(Intercept) 184.500 1.642 112.355 < 2e-16 ***drug1 -10.000 2.322 -4.306 5.64e-05 ***drug2 6.250 2.322 2.691 0.00901 **diet1 8.500 1.642 5.176 2.30e-06 ***drug1:diet1 -5.000 2.322 -2.153 0.03498 *drug2:diet1 2.750 2.322 1.184 0.24059---Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 13.93 on 66 degrees of freedomMultiple R-squared: 0.4329, Adjusted R-squared: 0.3899F-statistic: 10.07 on 5 and 66 DF, p-value: 3.385e-07

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 20

Page 21: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Balanced Two-Way ANOVA Hypertension Example (Part 1)

Hypertension Example: Sums-of-Squares (by hand 1)

Defining n =∑b

k=1∑a

j=1 njk , the relevant sums-of-squares are

SST =∑b

k=1∑a

j=1∑njk

i=1(yijk − y··)2 =∑b

k=1∑a

j=1∑njk

i=1 y2ijk −

1n

(∑bk=1

∑aj=1∑njk

i=1 yijk

)2

= 2473492−172

(13284)2 = 22594

SSE =∑b

k=1∑a

j=1∑njk

i=1(yijk − yjk )2 =∑b

k=1∑a

j=1∑njk

i=1 y2ijk −

∑bk=1

∑aj=1

(∑njki=1 yijk

)2

njk

= 2473492−[21362 + 20522 + 24242 + 21542 + 23882 + 21302

]/12 = 12814

SSR = SST − SSE = 22594− 12814 = 9780

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 21

Page 22: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Balanced Two-Way ANOVA Hypertension Example (Part 1)

Hypertension Example: Sums-of-Squares (by hand 2)The sums-of-squares for the main and interaction effects are given by

SSA = bn∗∑a

j=1(yj· − y··)2 = bn∗∑a

j=1 α2j

= (2)(12)[(−10)2 + 6.252 + 3.752

]= 3675

SSB = an∗∑b

k=1(y·k − y··)2 = an∗∑b

k=1 β2k

= (3)(12)[(−8.5)2 + 8.52

]= 5202

SSAB = n∗∑b

k=1∑a

j=1(yjk − yj· − y·k + y··)2 = n∗∑b

k=1∑a

j=1(αβ)2jk

= 12[(−5)2 + 52 + 2.752 + (−2.75)2 + 2.252 + (−2.25)2

]= 903

and since njk = n∗ = 12 ∀j , k , we have

SSR = SSA + SSB + SSAB9780 = 3675 + 5202 + 903

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 22

Page 23: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Balanced Two-Way ANOVA Hypertension Example (Part 1)

Hypertension Example: ANOVA Table (by hand)

Putting things together in ANOVA table:Source SS df MS F p-valueSSR 9780 5 1956.0 10.07 < .0001

SSA 3675 2 1837.5 9.46 0.0002SSB 5202 1 5202.0 26.79 < .0001SSAB 903 2 451.5 2.33 0.1057

SSE 12814 66 194.2SST 22594 71

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 23

Page 24: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Balanced Two-Way ANOVA Hypertension Example (Part 1)

Hypertension Example: ANOVA Table (in R)

> anova(mymod)Analysis of Variance Table

Response: bpDf Sum Sq Mean Sq F value Pr(>F)

drug 2 3675 1837.5 9.4643 0.0002433 ***diet 1 5202 5202.0 26.7935 2.305e-06 ***drug:diet 2 903 451.5 2.3255 0.1056925Residuals 66 12814 194.2---Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 24

Page 25: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Balanced Two-Way ANOVA Multiple Comparisons

Multiple Comparisons Overview

Still have multiple comparison problem:Overall test is not very informativeCan examine effect estimates for group differencesNeed follow-up tests to examine linear combinations of means

Still can use the same tools as before:BonferroniTukey (Tukey-Kramer)Scheffé

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 25

Page 26: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Balanced Two-Way ANOVA Multiple Comparisons

Two-Way ANOVA Linear Combinations

Assuming interaction model, we now have

L =∑b

k=1∑a

j=1 cjk yjk and V (L) = σ2∑bk=1

∑aj=1 c2

jk/njk

where cjk are the coefficients and σ2 is the MSE.

Assuming the additive model, we have

La =∑a

j=1 cj yj· and V (La) = σ2∑aj=1 c2

j /nj·

Lb =∑b

k=1 ck y·k and V (Lb) = σ2∑bk=1 c2

k/n·k

where cj and ck are main effect coefficients, σ2 is the MSE, andnj· =

∑bk=1 njk and n·k =

∑aj=1 njk are the marginal sample sizes.

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 26

Page 27: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Balanced Two-Way ANOVA Multiple Comparisons

Two-Way Multiple Comparisons in Practice

For interaction model, you follow-up on µjk = yjk

Bonferroni for any f tests (independent or not)Tukey (Tukey-Kramer) for all pairwise comparisonsScheffé for all possible contrasts

For additive model, you follow-up on µj = yj· and µk = y·kBonferroni for any f tests (independent or not)Tukey (Tukey-Kramer) for all pairwise comparisonsScheffé for all possible contrasts

For additive model, Tukey and Scheffé control FWER for each maineffect family separately.

Use Bonferroni in combination with Tukey/Scheffé to controlFWER for both families simultaneously

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 27

Page 28: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Balanced Two-Way ANOVA Hypertension Example (Part 2)

Hypertension Example: Interaction (by hand)

All ab(ab − 1)/2 = 15 possible pairwise comparisons of µjk :

L = yjk − yj ′k ′ and V (L) = 194.2(2/12) = 32.36667

and we know that√

2(L)√V (L)∼ qab,abn∗−ab, so 100(1− α)% CI is given by

L± 1√2

q(α)ab,abn∗−ab

√V (L)

where q(α)ab,abn∗−ab is critical value from studentized range.

For example, 95% CI for µ21 − µ11 is given by:

(µ21 − µ11)± 1√2

q(.05)6,66

√V (L)(

242412− 2136

12

)± 1√

2(4.150851)

√32.36667 = [7.303829; 40.69617]

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 28

Page 29: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Balanced Two-Way ANOVA Hypertension Example (Part 2)

Hypertension Example: Interaction (in R)

All ab(ab − 1)/2 = 15 possible pairwise comparisons of µjk :> mymod = aov(bp ~ drug * diet, data=hyper)> TukeyHSD(mymod, "drug:diet")

Tukey multiple comparisons of means95% family-wise confidence level

Fit: aov(formula = bp ~ drug * diet, data = hyper)

$‘drug:diet‘diff lwr upr p adj

Y:no-X:no 24.0 7.303829 40.696171 0.0010415Z:no-X:no 21.0 4.303829 37.696171 0.0058124X:yes-X:no -7.0 -23.696171 9.696171 0.8203137Y:yes-X:no 1.5 -15.196171 18.196171 0.9998189Z:yes-X:no -0.5 -17.196171 16.196171 0.9999992Z:no-Y:no -3.0 -19.696171 13.696171 0.9948741X:yes-Y:no -31.0 -47.696171 -14.303829 0.0000117Y:yes-Y:no -22.5 -39.196171 -5.803829 0.0025081Z:yes-Y:no -24.5 -41.196171 -7.803829 0.0007710X:yes-Z:no -28.0 -44.696171 -11.303829 0.0000856Y:yes-Z:no -19.5 -36.196171 -2.803829 0.0128988Z:yes-Z:no -21.5 -38.196171 -4.803829 0.0044123Y:yes-X:yes 8.5 -8.196171 25.196171 0.6690751Z:yes-X:yes 6.5 -10.196171 23.196171 0.8616371Z:yes-Y:yes -2.0 -18.696171 14.696171 0.9992610

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 29

Page 30: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Balanced Two-Way ANOVA Hypertension Example (Part 2)

Hypertension Example: Additive (by hand A part 1)

All a(a− 1)/2 = 3 possible pairwise comparisons of µj :

Y− X : La1 =4578

24− 4188

24= 16.25

Z− X : La2 =4518

24− 4188

24= 13.75

Z− Y : La3 =4518

24− 4578

24= −2.5

and the variance is given by

V (Laj ) = σ2∑aj=1 c2

j /nj· = (201.7206)(2/24) = 16.81005

where σ2 = SSE+SSABabn∗−(a+b−1) = 12814+903

68 = 201.7206

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 30

Page 31: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Balanced Two-Way ANOVA Hypertension Example (Part 2)

Hypertension Example: Additive (by hand A part 2)

Note√

2(Laj )√V (Laj )

∼ qa,abn∗−(a+b−1), so 100(1− α)% CI is given by

Laj ±1√2

q(α)a,abn∗−(a+b−1)

√V (Laj )

where q(α)a,abn∗−(a+b−1) is critical value from studentized range.

The 95% CI for all three pairwise comparisons is given by

La1 ±1√2

q(.05)3,68

√V (La1 ) = 16.25± 1√

2(3.388576)

√16.81005 = [6.426037; 26.07396]

La2 ±1√2

q(.05)3,68

√V (La2 ) = 13.75± 1√

2(3.388576)

√16.81005 = [3.926037; 23.57396]

La3 ±1√2

q(.05)3,68

√V (La3 ) = −2.5± 1√

2(3.388576)

√16.81005 = [−12.32396; 7.323963]

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 31

Page 32: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Balanced Two-Way ANOVA Hypertension Example (Part 2)

Hypertension Example: Additive (by hand B part 1)

All b(b − 1)/2 = 1 possible pairwise comparison of µk :

yes− no : Lb =6336

36− 6948

36= −17

and the variance is given by

V (Lb) = σ2∑bk=1 c2

k/n·k = (201.7206)(2/36) = 11.2067

where σ2 = SSE+SSABabn∗−(a+b−1) = 12814+903

68 = 201.7206

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 32

Page 33: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Balanced Two-Way ANOVA Hypertension Example (Part 2)

Hypertension Example: Additive (by hand B part 2)

Note√

2(Lb)√V (Lb)

∼ qb,abn∗−(a+b−1), so 100(1− α)% CI is given by

Lb ±1√2

q(α)b,abn∗−(a+b−1)

√V (Lb)

where q(α)b,abn∗−(a+b−1) is critical value from studentized range.

The 95% CI for pairwise comparison is given by

Lb ±1√2

q(.05)2,68

√V (Lb) = −17± 1√

2(2.822019)

√11.2067

= [−23.68011; −10.31989]

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 33

Page 34: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Balanced Two-Way ANOVA Hypertension Example (Part 2)

Hypertension Example: Additive (in R)All a(a− 1)/2 = 3 possible pairwise comparisons of µj :> mymod = aov(bp ~ drug + diet, data=hyper)> TukeyHSD(mymod, "drug")

Tukey multiple comparisons of means95% family-wise confidence level

Fit: aov(formula = bp ~ drug + diet, data = hyper)

$drugdiff lwr upr p adj

Y-X 16.25 6.426037 26.073963 0.0005220Z-X 13.75 3.926037 23.573963 0.0036941Z-Y -2.50 -12.323963 7.323963 0.8152941

All b(b − 1)/2 = 1 possible pairwise comparison of µk :> mymod = aov(bp ~ drug + diet, data=hyper)> TukeyHSD(mymod,"diet")

Tukey multiple comparisons of means95% family-wise confidence level

Fit: aov(formula = bp ~ drug + diet, data = hyper)

$dietdiff lwr upr p adj

yes-no -17 -23.68011 -10.31989 3.2e-06

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 34

Page 35: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Balanced Three-Way ANOVA

Balanced Three-Way ANOVA

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 35

Page 36: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Balanced Three-Way ANOVA Model Form and Estimation

Three-Way ANOVA Model (cell means form)

The Three-Way Analysis of Variance (ANOVA) model has the form

yijkl = µjkl + eijkl

for i ∈ {1, . . . ,njkl}, j ∈ {1, . . . ,a}, k ∈ {1, . . . ,b}, l ∈ {1, . . . , c}, whereyijkl ∈ R is response for i-th subject in factor cell (j , k , l)µjkl ∈ R is population mean for factor cell (j , k , l)

eijkliid∼ N(0, σ2) is a Gaussian error term

njkl is number of subjects in cell (j , k , l)(note: njkl = n∗∀j , k , l in balanced three-way ANOVA)(a,b, c) is number of factor levels for Factors (A,B,C)

Implies that yijklind∼ N(µjkl , σ

2).

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 36

Page 37: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Balanced Three-Way ANOVA Model Form and Estimation

OLS Estimation (cell means form)

Similar to balanced two-way ANOVA, we want to minimize

c∑l=1

b∑k=1

a∑j=1

njkl∑i=1

(yijkl − µjkl)2

which is equivalent to minimizing∑njkl

i=1(yijkl − µjkl)2 for all j , k , l

Taking the derivative of SSEjkl =∑njkl

i=1(yijkl − µjkl)2, we see that

dSSEjkl

dµjkl= −2

njkl∑i=1

yijkl + 2njklµjkl

and setting to zero and solving gives µjkl = 1njkl

∑njkli=1 yijkl = yjkl

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 37

Page 38: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Balanced Three-Way ANOVA Model Form and Estimation

Three-Way ANOVA Model (effect coding)

The three-way ANOVA with all interactions assumes that

µjkl = µ+ αj + βk + γl + (αβ)jk + (αγ)jl + (βγ)kl + (αβγ)jkl

for j ∈ {1, . . . ,a}, k ∈ {1, . . . ,b}, and l ∈ {1, . . . , c} whereµ is overall population mean

αj is main effect of first factor such that∑a

j=1 αj = 0

βk is main effect of second factor such that∑b

k=1 βk = 0

γl is main effect of third factor such that∑c

l=1 γl = 0

(αβ)jk is A ∗ B interaction effect such that∑a

j=1(αβ)jk = 0 ∀k and∑b

k=1(αβ)jk = 0 ∀j

(αγ)jl is A ∗ C interaction effect such that∑a

j=1(αγ)jl = 0 ∀l and∑c

l=1(αγ)jl = 0 ∀j

(βγ)kl is B ∗ C interaction effect such that∑b

k=1(βγ)kl = 0 ∀l and∑c

l=1(βγ)kl = 0 ∀k(αβγ)jkl is A ∗ B ∗ C interaction effect such that

∑aj=1(αβγ)jkl = 0 ∀k , l and∑b

k=1(αβγ)jkl = 0 ∀j, l and∑c

l=1(αβγ)jkl = 0 ∀j, k

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 38

Page 39: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Balanced Three-Way ANOVA Model Form and Estimation

OLS Estimation (effect coding)

The OLS estimates of the various effects are given by

µ = y···αj = yj·· − y···βk = y·k · − y···γl = y··l − y···

(αβ)jk = yjk · − yj·· − y·k · + y···(αγ)jl = yj·l − yj·· − y··l + y···(βγ)kl = y·kl − y·k · − y··l + y···

( ˆαβγ)jkl = yjkl − [µ+ αj + βk + γl + (αβ)jk + (αγ)jl + (βγ)kl ]

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 39

Page 40: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Balanced Three-Way ANOVA Model Form and Estimation

Fitted Values and Residuals

Form of fitted values depends on fit model:Additive: µjkl = µ+ αj + βk + γl

All 2-way Int: µjkl = µ+ αj + βk + γl + (αβ)jk + (αγ)jl + (βγ)kl

3-way Int: µjkl = yjkl

Residuals have the form

eijkl = yijkl − µjkl

where form of µjkl depends on fit model (additive versus interaction).

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 40

Page 41: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Balanced Three-Way ANOVA Basic Inference

ANOVA Sums-of-Squares

Defining N = abcn∗, the three-way ANOVA sums-of-squares are:SST =

∑cl=1∑b

k=1∑a

j=1∑n∗

i=1(yijkl − y···)2 df = N − 1SSR = n∗

∑cl=1∑b

k=1∑a

j=1(yjkl − y···)2 df = abc − 1SSE =

∑cl=1∑b

k=1∑a

j=1∑n∗

i=1(yijkl − yjkl)2 df = N − abc

SSR = SSA + SSB + SSC + SSAB + SSAC + SSBC + SSABCSSA = bcn∗

∑aj=1 α

2j df = a− 1

SSB = acn∗∑b

k=1 β2k df = b − 1

SSC = abn∗∑c

l=1 γ2l df = c − 1

SSAB = cn∗∑b

k=1∑a

j=1(αβ)2jk df = (a− 1)(b − 1)

SSAC = bn∗∑c

l=1∑a

j=1(αγ)2jl df = (a− 1)(c − 1)

SSBC = an∗∑c

l=1∑b

k=1(βγ)2kl df = (b − 1)(c − 1)

SSABC = n∗∑c

l=1∑b

k=1∑a

j=1( ˆαβγ)2jkl

df = (a− 1)(b − 1)(c − 1)

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 41

Page 42: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Balanced Three-Way ANOVA Basic Inference

Memory Example: Data Description (revisited)

Hypertension example from Maxwell & Delany (2003).

Total of N = 72 subjects participate in hypertension experiment.Factor A: drug type (a = 3 levels: X, Y, Z)Factor B: diet type (b = 2 levels: yes, no)Factor C: biof type (c = 2 levels: present, absent)

Randomly assign njkl = 6 subjects to each treatment cell:Note there are (abc) = (3)(2)(2) = 12 treatment cellsObservations are independent within and between cells

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 42

Page 43: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Balanced Three-Way ANOVA Basic Inference

Hypertension Example: Look at Data

> bp = scan("/Users/Nate/Desktop/hypertension.dat")Read 72 items> diet = factor(rep(rep(c("no","yes"),each=6),6))> drug = factor(rep(rep(c("X","Y","Z"),each=12),2))> biof = factor(rep(c("present","absent"),each=36))> hyper = data.frame(bp=bp, diet=diet, drug=drug, biof=biof)> contrasts(hyper$drug) <- contr.sum(3)> contrasts(hyper$diet) <- contr.sum(2)> contrasts(hyper$biof) <- contr.sum(2)> hyper[1:15,]

bp diet drug biof1 170 no X present2 175 no X present3 165 no X present4 180 no X present5 160 no X present6 158 no X present7 161 yes X present8 173 yes X present9 157 yes X present10 152 yes X present11 181 yes X present12 190 yes X present13 186 no Y present14 194 no Y present15 201 no Y present

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 43

Page 44: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Balanced Three-Way ANOVA Basic Inference

Hypertension Example: All Interactions

> mymod = lm(bp ~ drug * diet * biof, data=hyper)> anova(mymod)Analysis of Variance Table

Response: bpDf Sum Sq Mean Sq F value Pr(>F)

drug 2 3675 1837.5 11.7287 5.019e-05 ***diet 1 5202 5202.0 33.2043 3.053e-07 ***biof 1 2048 2048.0 13.0723 0.0006151 ***drug:diet 2 903 451.5 2.8819 0.0638153 .drug:biof 2 259 129.5 0.8266 0.4424565diet:biof 1 32 32.0 0.2043 0.6529374drug:diet:biof 2 1075 537.5 3.4309 0.0388342 *Residuals 60 9400 156.7---Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 44

Page 45: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Balanced Three-Way ANOVA Basic Inference

Hypertension Example: All 2-Way Interactions

> mymod = lm(bp ~ drug * diet + drug * biof + diet * biof, data=hyper)> anova(mymod)Analysis of Variance Table

Response: bpDf Sum Sq Mean Sq F value Pr(>F)

drug 2 3675 1837.5 10.8759 8.940e-05 ***diet 1 5202 5202.0 30.7899 6.345e-07 ***biof 1 2048 2048.0 12.1218 0.000919 ***drug:diet 2 903 451.5 2.6724 0.077043 .drug:biof 2 259 129.5 0.7665 0.468992diet:biof 1 32 32.0 0.1894 0.664925Residuals 62 10475 169.0---Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 45

Page 46: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Balanced Three-Way ANOVA Basic Inference

Hypertension Example: Additive Model

> mymod = lm(bp ~ drug + diet + biof, data=hyper)> anova(mymod)Analysis of Variance Table

Response: bpDf Sum Sq Mean Sq F value Pr(>F)

drug 2 3675 1837.5 10.550 0.0001039 ***diet 1 5202 5202.0 29.868 7.346e-07 ***biof 1 2048 2048.0 11.759 0.0010403 **Residuals 67 11669 174.2---Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 46

Page 47: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Balanced Three-Way ANOVA Basic Inference

Hypertension Example: Interaction Plot

If you choose the three-way interaction model, you could visualize theinteraction using an interaction plot.

Biofeedback Absent

Drug

Mea

n B

P

Diet NoDiet Yes

X Y Z

170

180

190

200

210

Biofeedback Present

Drug

Mea

n B

P

Diet NoDiet Yes

X Y Z

170

180

190

200

210

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 47

Page 48: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Balanced Three-Way ANOVA Basic Inference

Hypertension Example: Interaction Plot (R code)

yhat=tapply(hyper$bp,list(hyper$drug,hyper$diet,hyper$biof),mean)par(mfrow=c(1,2))mytitles=c("Biofeedback Absent","Biofeedback Present")for(k in 1:2){plot(1:3,yhat[,1,k],ylim=c(165,215),xlab="Drug",

ylab="Mean BP",main=mytitles[k],axes=FALSE,type="l")lines(1:3,yhat[,2,k],lty=2)legend("topleft",c("Diet No","Diet Yes"),lty=1:2,bty="n")axis(1,at=1:3,labels=c("X","Y","Z"))axis(2)

}

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 48

Page 49: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Balanced Three-Way ANOVA Basic Inference

Hypertension Example: Multiple Comparisons

Assuming we chose the additive model, we would perform follow-uptests on the marginal means.

Factor A: µaj = µ+ αj = yj··

Factor B: µbk = µ+ βk = y·k ·Factor C: µcl = µ+ γl = y··l

If we chose three-way interaction model, we would perform follow-uptests on the individual cell means.

µjkl = µ+ αj + βk + γl + (αβ)jk + (αγ)jl + (βγ)kl + ( ˆαβγ)jkl

= yjkl

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 49

Page 50: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Balanced Three-Way ANOVA Basic Inference

Hypertension Example: Multiple Comparisons

> mymod = aov(bp ~ drug + diet + biof, data=hyper)

> TukeyHSD(mymod, "drug") # I deleted some outputTukey multiple comparisons of means

95% family-wise confidence level

$drugdiff lwr upr p adj

Y-X 16.25 7.118642 25.381358 0.0001874Z-X 13.75 4.618642 22.881358 0.0016810Z-Y -2.50 -11.631358 6.631358 0.7894946

> TukeyHSD(mymod, "diet") # I deleted some outputTukey multiple comparisons of means

95% family-wise confidence level

$dietdiff lwr upr p adj

yes-no -17 -23.20877 -10.79123 7e-07

> TukeyHSD(mymod, "biof") # I deleted some outputTukey multiple comparisons of means

95% family-wise confidence level

$biofdiff lwr upr p adj

present-absent -10.66667 -16.87544 -4.457897 0.0010403

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 50

Page 51: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Unbalanced ANOVA Models

Unbalanced ANOVA Models

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 51

Page 52: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Unbalanced ANOVA Models Overview of Problem

Unbalanced ANOVA: Model Form

Unbalanced ANOVA has same model form as balanced, but unequalsample sizes in each cell.

1-way: nj 6= nj ′ for some j , j ′

2-way: njk 6= nj ′k ′ for some (jk), (j ′k ′)3-way: njkl 6= nj ′k ′l ′ for some (jkl), (j ′k ′l ′)

In the previous slides, we assumed njk = n∗∀j , k (two-way ANOVA) ornjkl = n∗∀j , k , l (three-way ANOVA), which made life easy.

Effects were orthogonal in balanced designParameter estimates had simple relation to cell/marginal means

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 52

Page 53: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Unbalanced ANOVA Models Overview of Problem

Unbalanced ANOVA: Implications

Main consequence for two-way (and higher-way) unbalanced design:Non-orthogonal SS (e.g., SSR 6= SSA + SSB + SSAB)Design is less efficient (larger variances of parameter estimates)

Unbalanced design also affects our estimation and follow-up testsParameter estimates require matrix inversion: b = (X′X)−1X′yNeed to do follow-up tests on least-squares means

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 53

Page 54: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Unbalanced ANOVA Models Overview of Problem

Unbalanced ANOVA: Testing Effects

Because of non-orthogonality, cannot test effects using F = MS?MSE .

Instead we use the General Linear Model (GLM) F test statistic:

F =SSER − SSEF

dfR − dfF÷ SSEF

dfF∼ F(dfR−dfF ,dfF )

whereSSER is sum-of-squares error for reduced modelSSEF is sum-of-squares error for full modeldfR is error degrees of freedom for reduced modeldfF is error degrees of freedom for full model

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 54

Page 55: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Unbalanced ANOVA Models Overview of Problem

Unbalanced ANOVA: Testing Example

Consider two-way ANOVA and all 7 possible models

yijk = µ+ αj + βk + (αβ)jk + eijk (1)yijk = µ+ αj + βk + eijk (2)yijk = µ+ αj + (αβ)jk + eijk (3)yijk = µ+ βk + (αβ)jk + eijk (4)yijk = µ+ αj + eijk (5)yijk = µ+ βk + eijk (6)yijk = µ+ eijk (7)

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 55

Page 56: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Unbalanced ANOVA Models Overview of Problem

Unbalanced ANOVA: Testing Example (continued)

To test effect, use F test comparing full and reduced models.

To test each effect there are multiple choices we could use for full andreduced models:

A: F=1 and R=4 or F=2 and R=6 or F=5 and R=7B: F=1 and R=3 or F=2 and R=5 or F=6 and R=7AB: F=1 and R=2 or F=3 and R=5 or F=4 and R=6

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 56

Page 57: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Unbalanced ANOVA Models Types of Sums-of-Squares

Types of Sum-of-SquaresType I SS

Amount of additional variation explained by the model when a term is added to the model(aka sequential sum-of-squares).In two-way ANOVA, type I SS would compare:(a) Main Effect A: F=5 and R=7(b) Main Effect B: F=2 and R=5(c) Interaction Effect: F=1 and R=2

Type II SSAmount of variation a term adds to the model when all other terms are included exceptterms that “contain” the effect being tested (e.g., (αβ)jk contains αj and βk ).In two-way ANOVA, type II SS would compare:(a) Main Effect A: F=2 and R=6(b) Main Effect B: F=2 and R=5(c) Interaction Effect: F=1 and R=2

Type III SSAmount of variation a term adds to the model when all other terms are included, which issometimes called partial sum-of-squares.In two-way ANOVA, type III SS would compare:(a) Main Effect A: F=1 and R=4(b) Main Effect B: F=1 and R=3(c) Interaction Effect: F=1 and R=2

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 57

Page 58: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Unbalanced ANOVA Models Types of Sums-of-Squares

Types of Sum-of-Squares (in R)

When fitting multi-way ANOVAs, anova function gives Type I SS.

Order matters in unbalanced design!bp = drug + diet produces different Type I SS tests thanbp = diet + drug if design is unbalanced

Use Anova function in car package for Type II and Type III SS.Function performs Type II SS tests by defaultUse type=3 option for Type III SS tests

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 58

Page 59: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Unbalanced ANOVA Models Hypertension Example (Part 4)

Hypertension Example: Type I

> mymod = lm(bp ~ drug * diet * biof, data=hyper[1:71,])> anova(mymod)Analysis of Variance Table

Response: bpDf Sum Sq Mean Sq F value Pr(>F)

drug 2 3733.6 1866.8 11.7306 5.138e-05 ***diet 1 5113.3 5113.3 32.1311 4.558e-07 ***biof 1 2087.2 2087.2 13.1154 0.0006101 ***drug:diet 2 879.5 439.8 2.7633 0.0712569 .drug:biof 2 280.5 140.3 0.8813 0.4196123diet:biof 1 24.2 24.2 0.1522 0.6978384drug:diet:biof 2 1055.8 527.9 3.3172 0.0431275 *Residuals 59 9389.2 159.1---Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 59

Page 60: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Unbalanced ANOVA Models Hypertension Example (Part 4)

Hypertension Example: Type II

> library(car)> Anova(mymod, type=2)Anova Table (Type II tests)

Response: bpSum Sq Df F value Pr(>F)

drug 3704.1 2 11.6378 5.491e-05 ***diet 4975.9 1 31.2676 6.085e-07 ***biof 2061.8 1 12.9561 0.0006541 ***drug:diet 872.5 2 2.7413 0.0727049 .drug:biof 277.7 2 0.8726 0.4231893diet:biof 24.2 1 0.1522 0.6978384drug:diet:biof 1055.8 2 3.3172 0.0431275 *Residuals 9389.2 59---Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 60

Page 61: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Unbalanced ANOVA Models Hypertension Example (Part 4)

Hypertension Example: Type III

> library(car)> Anova(mymod, type=3)Anova Table (Type III tests)

Response: bpSum Sq Df F value Pr(>F)

(Intercept) 2412026 1 15156.7271 < 2.2e-16 ***drug 3685 2 11.5784 5.730e-05 ***diet 5057 1 31.7754 5.132e-07 ***biof 2052 1 12.8967 0.0006713 ***drug:diet 882 2 2.7705 0.0707910 .drug:biof 268 2 0.8434 0.4353639diet:biof 27 1 0.1692 0.6822873drug:diet:biof 1056 2 3.3172 0.0431275 *Residuals 9389 59---Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 61

Page 62: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Appendix

Appendix

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 62

Page 63: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Appendix

Ordinary Least-Squares (proof for µ)

Expanding the first summation produces

SSE =b∑

k=1

a∑j=1

[n∗∑i=1

y2ijk − 2(µ+ αj + βk + (αβ)jk )

n∗∑i=1

yijk + n∗(µ+ αj + βk + (αβ)jk )2

]

Taking the derivative with respect to µ we have

dSSEdµ

=∑b

k=1∑a

j=1[−2∑n∗

i=1 yijk + 2n∗µ+ 2n∗(αj + βk + (αβ)jk )]

= −2(∑b

k=1∑a

j=1∑n∗

i=1 yijk

)+ 2abn∗µ

and setting to zero and solving for µ givesµ = 1

abn∗

∑bk=1

∑aj=1∑n∗

i=1 yijk = y··

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 63

Page 64: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Appendix

Ordinary Least-Squares (proof for αj)

Taking the derivative with respect to αj we have

dSSEdαj

=∑b

k=1[−2∑n∗

i=1 yijk + 2n∗αj + 2n∗(µ+ βk + (αβ)jk )]

= −2(∑b

k=1∑n∗

i=1 yijk

)+ 2bn∗αj + 2bn∗µ

and setting to zero, using µ for µ, and solving for αj givesαj = 1

bn∗ (∑b

k=1∑n∗

i=1 yijk )− µ = yj· − y··

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 64

Page 65: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Appendix

Ordinary Least-Squares (proof for βk )

Taking the derivative with respect to βk we have

dSSEdβk

=∑a

j=1[−2∑n∗

i=1 yijk + 2n∗βk + 2n∗(µ+ αj + (αβ)jk )]

= −2(∑a

j=1∑n∗

i=1 yijk

)+ 2an∗βk + 2an∗µ

and setting to zero, using µ for µ, and solving for βk givesβk = 1

an∗ (∑a

j=1∑n∗

i=1 yijk )− µ = y·k − y··

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 65

Page 66: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Appendix

Ordinary Least-Squares (proof for (αβ)jk )

Taking the derivative with respect to (αβ)jk we have

dSSEd(αβ)jk

= −2n∗∑

i=1

yijk + 2n∗(αβ)jk + 2n∗(µ+ αj + βk )

and setting to zero, using (µ, αj , βk ) for (µ, αj , βk ), and solving for(αβ)jk gives (αβ)jk = 1

n∗ (∑n∗

i=1 yijk )− µ− αj − βk = yjk − yj· − y·k + y··

Return

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 66

Page 67: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Appendix

Partitioning the Variance (proof part 1)To prove SSR = SSA + SSB + SSAB when njk = n∗∀j , k , note that

yijk − y·· = (yijk − yjk ) + (yjk − [yj· + y·k − y··]) + (yj· − y··) + (y·k − y··)

Now if we square both sides we have(yijk − y··)2 = (yijk − yjk )2 + (yjk − [yj· + y·k − y··])2 + (yj· − y··)2 + (y·k − y··)2

+ 2(yijk − yjk ) {(yjk − [yj· + y·k − y··]) + (yj· − y··) + (y·k − y··)}+ 2(yjk − [yj· + y·k − y··]) [(yj· − y··) + (y·k − y··)]

+ 2(yj· − y··)(y·k − y··)

Now if we apply the triple summation we have SST

SST =b∑

k=1

a∑j=1

njk∑i=1

(yijk − y··)2

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 67

Page 68: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Appendix

Partitioning the Variance (proof part 2)First, note that we have

SSE =∑b

k=1

∑aj=1

∑njki=1(yijk − yjk )2

SSAB =∑b

k=1

∑aj=1

∑njki=1(yjk − [yj· + y·k − y··])2

SSA =∑b

k=1

∑aj=1

∑njki=1(yj· − y··)2

SSB =∑b

k=1

∑aj=1

∑njki=1(y·k − y··)2

so we need to prove that the crossproduct terms are orthogonal.

To prove that the first crossproduct term sums to zero, define(αβ)jk = (yjk − [yj· + y·k − y··]) + (yj· − y··) + (y·k − y··) and note that∑b

k=1∑a

j=1∑njk

i=1 2(yijk − yjk )(αβ)jk = 2∑b

k=1∑a

j=1(αβ)jk∑njk

i=1(yijk − yjk )

= 2∑b

k=1∑a

j=1(αβ)jk (0) = 0

because we are summing mean-centered variable.Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 68

Page 69: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Appendix

Partitioning the Variance (proof part 3)

To prove that the second crossproduct term sums to zero, note that(αβ)jk = (yjk − [yj· + y·k − y··]), αj = (yj· − y··), and βk = (y·k − y··), so∑b

k=1∑a

j=1∑njk

i=1 2(αβ)jk (αj + βk ) = 2∑b

k=1∑a

j=1 njk (αβ)jk (αj + βk )

Now assuming that njk = n∗∀j , k∑bk=1

∑aj=1 njk (αβ)jk αj = n∗

∑aj=1 αj

(∑bk=1(αβ)jk

)= n∗

∑aj=1 αj(0) = 0∑b

k=1∑a

j=1 njk (αβ)jk βk = n∗∑b

k=1 βk

(∑aj=1(αβ)jk

)= n∗

∑bk=1 βk (0) = 0

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 69

Page 70: Nathaniel E. Helwig - UMN Statisticsusers.stat.umn.edu/~helwig/notes/aov2-Notes.pdf · 2017-01-06 · Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor

Appendix

Partitioning the Variance (proof part 4)

To prove that the third crossproduct term sums to zero, note that∑bk=1

∑aj=1∑njk

i=1 2(yj· − y··)(y·k − y··) = 2∑b

k=1∑a

j=1 njk αj βk

and if njk = n∗∀j , k we have that

2∑b

k=1∑a

j=1 njk αj βk = 2n∗∑b

k=1∑a

j=1 αj βk

= 2n∗∑b

k=1 βk

(∑aj=1 αj

)= 2n∗

∑bk=1 βk (0) = 0

which completes the proof.

Return

Nathaniel E. Helwig (U of Minnesota) Factorial & Unbalanced Analysis of Variance Updated 04-Jan-2017 : Slide 70