EPSE 592: Design & Analysis of Experiments

Ed Kroc

University of British Columbia

ed.kroc@ubc.ca

September 25, 2019

Ed Kroc (UBC) EPSE 592 September 25, 2019 1 / 50

Last Time

Hypothesis tests, test statistics, and p-values

Z-test

t-tests (independent samples and paired)

F-tests (testing equality of variances)

Today

Type I and type II errors

Multiple testing and adjustments for inflated type I errors

P-value interpretations (orders of magnitude rule)

One-way ANOVA (testing mean differences for more than 2 groups)

Example: three experimental groups of interest

Suppose we are interested in studying how amount of higher education correlates with self-reported anxiety levels. We have a survey designed to measure anxiety and give it to 18 people at UBC: 6 who have obtained Bachelor’s degrees, 6 who have obtained Master’s degrees, and 6 who have obtained PhDs (chosen how?).

Bachelor’s   Master’s   PhD
6.2          6.2        6.9
5.8          6.9        9.0
6.0          6.2        7.7
5.9          7.7        9.1
6.6          6.8        8.3
6.2          7.9        8.0

Table: Self-reported anxiety levels, 10 point scale. 18 respondents.

Could perform 3 independent-samples t-tests to test the 3 null hypotheses:

$H_{0,1}: \mu_B = \mu_M \implies$ p-value $< 0.05$
$H_{0,2}: \mu_M = \mu_P \implies$ p-value $< 0.05$
$H_{0,3}: \mu_B = \mu_P \implies$ p-value $\ll 0.05$

But what about inflated Type I error?
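The three pairwise comparisons can be sketched in plain Python. This is a minimal illustration, assuming the pooled-variance (equal variances) form of the independent-samples t-test; the slides do not say which variant was used. The helper name `pooled_t` is hypothetical, and 2.228 is the standard two-sided critical value of the t-distribution with 10 degrees of freedom at the 0.05 level.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance

# Anxiety scores from the table above (10 point scale, 6 respondents per group).
groups = {
    "Bachelor's": [6.2, 5.8, 6.0, 5.9, 6.6, 6.2],
    "Master's": [6.2, 6.9, 6.2, 7.7, 6.8, 7.9],
    "PhD": [6.9, 9.0, 7.7, 9.1, 8.3, 8.0],
}


def pooled_t(a, b):
    """Independent-samples t-statistic using a pooled variance estimate (assumed variant)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))


# One t-statistic per pair of groups: three tests in total.
t_stats = {
    (g1, g2): pooled_t(groups[g1], groups[g2])
    for g1, g2 in combinations(groups, 2)
}

# Two-sided critical value for alpha = 0.05 with 6 + 6 - 2 = 10 degrees of freedom.
T_CRIT = 2.228

for (g1, g2), t in t_stats.items():
    print(f"{g1} vs {g2}: t = {t:.3f}, exceeds critical value: {abs(t) > T_CRIT}")
```

Under these assumptions all three statistics exceed the critical value in absolute size, which agrees with the three unadjusted p-values below 0.05 reported above.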

Type I and Type II Errors

Recall: when p-value small, conclude data inconsistent with $H_0$.

Recall: when p-value large, conclude data consistent with $H_0$.

Whenever we make a decision about a hypothesis based on a p-value, we have a chance of making an error.

                              Given H0 true                       Given H0 false
Data inconsistent with H0     Type I error (false positive)       Correct decision (true positive)
Data consistent with H0       Correct decision (true negative)    Type II error (false negative)

Type I and Type II Errors

Traditionally, we set a predetermined significance level, $\alpha$, such that

$$\Pr(\text{Type I error}) = \Pr(\text{p-value} < \alpha \mid H_0 \text{ true}) = \alpha.$$

Then $\alpha$, sample size, variability, and choice of test determine

$$\Pr(\text{Type II error}) = \Pr(\text{p-value} > \alpha \mid H_0 \text{ false}) = \beta.$$

The confidence level, or specificity, of a test is defined as

$$\Pr(\text{p-value} > \alpha \mid H_0 \text{ true}) = 1 - \alpha.$$

The power, or sensitivity, of a test is defined as

$$\Pr(\text{p-value} < \alpha \mid H_0 \text{ false}) = 1 - \beta.$$

Type I and Type II Errors

In practice, $\alpha = 0.05$ is a common choice.

Note: all of $1 - \alpha$, $\beta$, and $1 - \beta$ are determined once $\alpha$ has been fixed, the data have been collected, and the choice of analysis made.

Good studies will strive to have $1 - \beta \geq 0.80$. Most studies will have much lower power.

                                                    Given H0 true    Given H0 false
Pr(data inconsistent with H0 | $\cdots$)            $\alpha$         $1 - \beta$
Pr(data consistent with H0 | $\cdots$)              $1 - \alpha$     $\beta$
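The two columns of this table can be illustrated by simulation. The sketch below is only an illustration under assumed settings that are not from the slides: a two-sided one-sample Z-test with known standard deviation 1, samples of size 20, and a true mean of 0.5 when the null hypothesis is false. The function names, the seed, and the number of simulated studies are all hypothetical choices.

```python
import random
from math import sqrt
from statistics import NormalDist, mean


def z_test_p_value(sample, mu0, sigma):
    """Two-sided p-value for a one-sample Z-test with known sigma."""
    z = (mean(sample) - mu0) / (sigma / sqrt(len(sample)))
    return 2 * (1 - NormalDist().cdf(abs(z)))


def rejection_rate(true_mean, alpha=0.05, n=20, n_sims=4000, seed=592):
    """Fraction of simulated studies with p-value < alpha when testing H0: mean = 0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        if z_test_p_value(sample, mu0=0.0, sigma=1.0) < alpha:
            rejections += 1
    return rejections / n_sims


type_i_rate = rejection_rate(true_mean=0.0)  # H0 true: estimates alpha
power = rejection_rate(true_mean=0.5)        # H0 false: estimates 1 - beta

print(f"estimated Type I error rate: {type_i_rate:.3f}")
print(f"estimated power when the true mean is 0.5: {power:.3f}")
```

With the null hypothesis true, the rejection rate should land near the chosen $\alpha$; with it false, the rejection rate estimates the power, which here depends on the assumed effect size and sample size.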

Multiple Testing

Each time we conduct a statistical test of hypothesis, we have a chance of committing a Type I or Type II error.

The choice of $\alpha$ controls our chance of Type I error for a single test.

Thus, if our study requires more than one test, each one has a chance of error, and we should be concerned with the family-wise error rate: the probability of committing at least one Type I error.

Multiple Testing: example

Suppose we test two hypotheses that are independent of each other:
$H_{0,1}$: mean iron concentration in blood equal between 2 groups
$H_{0,2}$: mean anxiety levels equal between same 2 groups

Suppose we set

$$\alpha = \Pr(\text{test 1 significant} \mid H_{0,1} \text{ true}) = \Pr(\text{test 2 significant} \mid H_{0,2} \text{ true}).$$

Rules of probability then tell us:

$$\begin{aligned}
\Pr(\text{test 1 or 2 significant} \mid H_{0,1} \text{ and } H_{0,2} \text{ true})
&= \Pr(T_1 \text{ sig.} \mid H_{0,1}) + \Pr(T_2 \text{ sig.} \mid H_{0,2}) - \Pr(T_1 \text{ and } T_2 \text{ sig.} \mid H_{0,1}, H_{0,2}) \\
&= \alpha + \alpha - \alpha \cdot \alpha \\
&= 2\alpha - \alpha^2 \\
&> \alpha, \quad \text{since } 0 < \alpha < 1.
\end{aligned}$$

Therefore, family-wise error rate > individual error rate.
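To put numbers on this, the short sketch below evaluates the family-wise error rate for several independent tests. It uses the complement form $1 - (1 - \alpha)^m$, which for $m = 2$ reduces to the $2\alpha - \alpha^2$ derived above. Extending to $m$ tests is an addition to the slide's two-test case, and it assumes all tests are independent and all null hypotheses are true. The function name is hypothetical.

```python
def family_wise_error_rate(alpha, n_tests):
    """Pr(at least one Type I error) over n_tests independent tests with all nulls true."""
    return 1 - (1 - alpha) ** n_tests


alpha = 0.05
for m in (1, 2, 3, 10):
    print(f"{m} test(s): family-wise error rate = {family_wise_error_rate(alpha, m):.4f}")
```

With $\alpha = 0.05$, two tests give $2(0.05) - 0.05^2 = 0.0975$, nearly double the individual error rate.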

Adjustments for Multiple Tests

Practically, this means the more hypotheses we test, the less confident we can be that our “significant” results are actually significant.

However, there are many ways to correct for this inflation of Type I error due to multiple testing:

Bonferroni adjustment (most common, most conservative)
Šidák and Holm adjustments
Tukey adjustment
Scheffé adjustment
Benjamini-Hochberg adjustment
...and many others

Adjustments for Multiple Tests

Bonferroni adjustment says:
Set an original $\alpha$ rate of Type I error.
Take this $\alpha$ and divide by the total number of tests, $n$, you will perform: $\alpha' := \alpha / n$.
This new $\alpha'$ level is what you should use in each test to determine if the p-value is “significant” or not.

The Bonferroni procedure guarantees that the chance of making any Type I errors in any tests is no bigger than the original $\alpha$ level.

That is, Bonferroni ensures family-wise Type I error rate is no bigger than $\alpha$.

Bonferroni is very conservative: always works, but if tests are not independent, can be a massive overcorrection.
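The procedure is mechanical enough to sketch in a few lines. The function name `bonferroni` and the three p-values below are hypothetical, chosen only to illustrate the rule; they are not results from the slides.

```python
def bonferroni(p_values, alpha=0.05):
    """Return the adjusted level alpha' = alpha / n and a significance decision per test."""
    n = len(p_values)
    adjusted_alpha = alpha / n
    return adjusted_alpha, [p < adjusted_alpha for p in p_values]


# Hypothetical p-values, for illustration only.
p_values = [0.022, 0.012, 0.0002]
adjusted_alpha, significant = bonferroni(p_values)

print(f"adjusted level: {adjusted_alpha:.4f}")
for p, decision in zip(p_values, significant):
    print(f"p = {p}: significant after adjustment: {decision}")
```

Each p-value is compared against $\alpha / n$ rather than $\alpha$, so a result that clears 0.05 on its own can fail the adjusted threshold.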

Adjustments for Multiple Tests: example

Recall our data on self-reported anxiety levels:

Bachelor’s   Master’s   PhD
6.2          6.2        6.9
5.8          6.9        9.0
6.0          6.2        7.7
5.9          7.7        9.1
6.6          6.8        8.3
6.2          7.9        8.0

We performed three t-tests of hypotheses to compare if the pairwise means of these three groups were different.

Using the Bonferroni correction, we would find

$$\alpha' = 0.05 / 3 \approx 0.017.$$

Comparing our p-values to the adjusted significance level yields:
$H_{0,1}: \mu_B = \mu_M \implies$ p-value $> 0.017$ (not significant)
$H_{0,2}: \mu_M = \mu_P \implies$ p-value $> 0.017$ (not significant)
$H_{0,3}: \mu_B = \mu_P \implies$ p-value $< 0.017$

Two issues here:
(1) Bonferroni too conservative (hypotheses not independent); means we lose power to detect effects.
(2) There is no meaningful difference between a p-value of, say, 0.022 and 0.012. Yet here, the former is not “significant” while the latter is “significant”.

How to fix these issues?
(1) Choose a better test of hypotheses: ANOVA
(2) Discourage the enforcement of arbitrary thresholds; apply the orders of magnitude rule: p-values that differ by less than one order of magnitude are practically indistinguishable as measures of evidence.
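One possible way to make the orders of magnitude rule concrete is to compare p-values on a base-10 logarithmic scale. This is only a sketch of one reading of the rule (a difference of less than a factor of 10); the function name and the second comparison value are hypothetical.

```python
from math import log10


def same_order_of_magnitude(p1, p2):
    """True when two p-values differ by less than a factor of 10."""
    return abs(log10(p1) - log10(p2)) < 1


# The two p-values mentioned in the example: practically indistinguishable.
print(same_order_of_magnitude(0.022, 0.012))

# A hypothetical, much smaller p-value: more than one order of magnitude apart.
print(same_order_of_magnitude(0.022, 0.0002))
```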

The Analysis of Variance (ANOVA) Paradigm

The general ANOVA methodology can be described as follows:
Rather than testing if each pair of $m$ groups exhibit an average difference, test only the null hypothesis

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_m$$

Then, if the data are inconsistent with $H_0$, we can start to test individual pairs (or contrasts) for average differences, making proper adjustments for inflated Type I errors along the way.

ANOVA procedure is more efficient than Bonferroni and other adjustments.

ANOVA is a direct generalization of a t-test to a comparison of more than two groups.

The Analysis of Variance (ANOVA) Paradigm

Most importantly:
The ANOVA procedure can be generalized to account for a variety of secondary effects (confounding variables).

ANOVA gives us a framework to study interaction effects; i.e. how one explanatory variable can moderate the effect of another explanatory variable on the response of interest.

ANOVA procedure is flexible enough to account for a large variety of experimental designs (e.g. repeated measures, nested designs, random effects, etc.)

We will explore all of these and more in the coming weeks.

Data types

An ANOVA model posits a linear relationship between categorical explanatory variables (factors) and a continuous response of interest.

Nominal data: categorical, no ordering
E.g. sex, preferred electoral candidate

Ordinal data: categorical, with ordering
E.g. rankings (Likert responses, maybe), severity of disease

Count data: ordering with equal distances
E.g. age*, number of occurrences

Continuous data: ordered continuum
E.g. time, space, height, weight, age*

Choice of model and analysis will depend on data type.
Note: Always ignore Stevens’s levels of measurement: nominal, ordinal, interval, ratio - these are irrelevant in practice and in theory.

The One-way, Fixed Effects ANOVA Model

The one-way (one-factor), fixed effects ANOVA model:

$$Y = \mu + \tau_X + \varepsilon$$

$Y$ is the continuous response of interest.
$X$ is the categorical variable, with observations in all categories, used to explain variation in $Y$.
A fixed effects model is one where the explanatory variable(s) $X$ have their values fixed by the experimenter, and/or are exhausted by the experimental design.
$\mu$ is the grand mean; i.e. the average of all $Y$ values.
$\tau_X$ is the average treatment effect of $X$ on $Y$; i.e. the average of all $Y - \mu$ values for each fixed value of $X$.
$\varepsilon$ is the leftover error; i.e. the variation in $Y$ unexplained by $\mu$ and $\tau_X$.

The One-way, Fixed Effects ANOVA Model: example

The one-way ANOVA model for our anxiety ($Y$) vs. education ($X$) data:

$$Y_{\text{anx}} = \mu + \tau_{\text{edu}} + \varepsilon$$

Levels of $X$ were fixed by experimental design; thus, $\tau_{\text{edu}}$ is a fixed effect that, here, can assume three values.
$Y$ is a random variable, so $\varepsilon$ is too.
Note: $\tau_X \neq \mu_X$

$\mu_X$ = average of all $Y$ values for each fixed $X$ value
$\tau_X$ = average of all $Y - \mu$ values for each fixed $X$ value

Thus, testing the hypothesis

$$H_0: \mu_B = \mu_M = \mu_P$$

is equivalent to testing the hypothesis

$$H_0: \tau_B = \tau_M = \tau_P = 0$$

The One-way, Fixed Effects ANOVA Model

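Understanding the treatment effect encoded by $\tau_X$:

[Figure: diagram labelling the grand mean $\mu$, the treatment effects $\tau_1$, $\tau_2$, $\tau_3$, and the group means $\mu_1$, $\mu_2$, $\mu_3$.]

In general, $\tau_X$ = average of all $Y - \mu$ values for each fixed $X$ value

Expressed another way, $\tau_X = \mu_X - \mu$

So, if all treatments have the same effect, then they all equal the grand mean $\mu$ and $\tau_X = 0$ for all fixed values of $X$.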
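As a rough numerical illustration, the sample analogue of $\tau_X = \mu_X - \mu$ can be computed for the anxiety vs. education data from the earlier example. The values printed are sample estimates, not the unobserved population quantities, and the variable names are hypothetical.

```python
from statistics import mean

# Anxiety scores by education level, from the earlier example.
groups = {
    "Bachelor's": [6.2, 5.8, 6.0, 5.9, 6.6, 6.2],
    "Master's": [6.2, 6.9, 6.2, 7.7, 6.8, 7.9],
    "PhD": [6.9, 9.0, 7.7, 9.1, 8.3, 8.0],
}

# Sample grand mean: average of all 18 responses.
all_scores = [y for scores in groups.values() for y in scores]
grand_mean = mean(all_scores)

# Sample group means and estimated treatment effects (group mean minus grand mean).
group_means = {name: mean(scores) for name, scores in groups.items()}
effects = {name: m - grand_mean for name, m in group_means.items()}

print(f"grand mean: {grand_mean:.2f}")
for name in groups:
    print(f"{name}: group mean = {group_means[name]:.2f}, estimated effect = {effects[name]:+.2f}")
```

Because the three groups are the same size, the estimated effects sum to zero: the negative effects for the Bachelor's and Master's groups offset the positive effect for the PhD group.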

The One-way, Fixed Effects ANOVA Model

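Understanding the individual error encoded by $\varepsilon$: suppose we have data points on $Y$ (continuous response) and $X$, a categorical variable with 3 levels. Suppose observations $Y_1$, $Y_2$, and $Y_3$ belong to group $X_1$.

[Figure: diagram labelling the grand mean $\mu$, the group means $\mu_1$, $\mu_2$, $\mu_3$, the observations $Y_1$, $Y_2$, $Y_3$, and their errors $\varepsilon_1$, $\varepsilon_2$, $\varepsilon_3$.]

In general, $\varepsilon$ can be different for every observation/individual; it is the difference between the observed response $Y$ and the group mean $\mu_X$

Explicitly, $\varepsilon = Y - \mu_X$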

The One-way, Fixed Effects ANOVA Model

The one-way (one-factor), fixed effects ANOVA model:

$$Y = \mu + \tau_X + \varepsilon$$

Using the previous two slides, this model can be rewritten as:

$$Y - \mu = (\mu_X - \mu) + (Y - \mu_X)$$

In practice, we do not observe $\mu$ or $\mu_X$, but we do observe the sample grand mean and sample group means.

Can use these sample statistics to estimate the above equation and then test the hypothesis that $H_0: \mu_X = \mu$ for all fixed values of $X$.

Partitioning the ANOVA model into variance components

We have observations on a response $Y$ and an explanatory factor variable $X$ with $K$ distinct factor levels.

For example, if $X$ is the education level from previous example, then $K = 3$.

Total sample size $= N$.
For example, in the anxiety vs. education example, $N = 18$.

Sample size within each factor level of $X$ is $n_j$ for $1 \leq j \leq K$. Therefore,

$$\sum_{j=1}^{K} n_j = N.$$

For example, in the anxiety vs. education example, $n_j = 6$ for all $1 \leq j \leq 3$.

Partitioning the ANOVA model into variance components

NOTATION: $Y_{ij}$ denotes the response of experimental unit $i$ within factor level $j$.

NOTATION:

$$\bar{Y}_{\cdot j} = \frac{1}{n_j} \sum_{i=1}^{n_j} Y_{ij}$$

is the sample mean of the responses that all share the same factor level $j$.

NOTATION:

$$\bar{Y}_{\cdot\cdot} = \frac{1}{N} \sum_{j=1}^{K} \sum_{i=1}^{n_j} Y_{ij}$$

is the sample mean of all responses.

Example: education levels vs. anxiety

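Bachelor’s ($j = 1$)    Master’s ($j = 2$)    PhD ($j = 3$)
$Y_{1,1} = 6.2$         $Y_{1,2} = 6.2$       $Y_{1,3} = 6.9$
$Y_{2,1} = 5.8$         $Y_{2,2} = 6.9$       $Y_{2,3} = 9.0$
$Y_{3,1} = 6.0$         $Y_{3,2} = 6.2$       $Y_{3,3} = 7.7$
$Y_{4,1} = 5.9$         $Y_{4,2} = 7.7$       $Y_{4,3} = 9.1$
$Y_{5,1} = 6.6$         $Y_{5,2} = 6.8$       $Y_{5,3} = 8.3$
$Y_{6,1} = 6.2$         $Y_{6,2} = 7.9$       $Y_{6,3} = 8.0$

$\bar{Y}_{\cdot 1} = 6.12$   $\bar{Y}_{\cdot 2} = 6.95$   $\bar{Y}_{\cdot 3} = 8.17$

$\bar{Y}_{\cdot\cdot} = 7.08$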

Partitioning the ANOVA model into variance components

Our goal is to partition the observed variation in our response $Y$ into two distinct pieces:

(1) variation explained by the different factor levels (treatments)
(2) leftover (residual) variation

Recall: our ANOVA model can be written as:

$$Y - \mu = (\mu_X - \mu) + (Y - \mu_X) \quad \text{(theoretical model)}$$

Since we do not observe $\mu_X$ or $\mu$, we replace them by their sample estimates $\bar{Y}_{\cdot j}$ and $\bar{Y}_{\cdot\cdot}$

Also, replace the generic $Y$ by our observed $Y_{ij}$ values:

$$Y_{ij} - \bar{Y}_{\cdot\cdot} = (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot}) + (Y_{ij} - \bar{Y}_{\cdot j}) \quad \text{(sample estimate of model)}$$
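As a quick check of the sample decomposition, take one observation from the anxiety vs. education data: the PhD respondent with $Y_{4,3} = 9.1$. The PhD scores sum to 49.0, so $\bar{Y}_{\cdot 3} = 49.0/6 \approx 8.17$, and all 18 scores sum to 127.4, so $\bar{Y}_{\cdot\cdot} = 127.4/18 \approx 7.08$. With values rounded to two decimals, the worked arithmetic is:

```latex
\[
\underbrace{9.1 - 7.08}_{Y_{4,3} - \bar{Y}_{\cdot\cdot} \,\approx\, 2.02}
=
\underbrace{(8.17 - 7.08)}_{\bar{Y}_{\cdot 3} - \bar{Y}_{\cdot\cdot} \,\approx\, 1.09}
+
\underbrace{(9.1 - 8.17)}_{Y_{4,3} - \bar{Y}_{\cdot 3} \,\approx\, 0.93}
\]
```

So roughly 1.09 of this respondent's distance from the grand mean is attributed to belonging to the PhD group, and the remaining 0.93 is individual residual.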

Partitioning the ANOVA model into variance components

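Now we square both sides of the equation:

$$\begin{aligned}
(Y_{ij} - \bar{Y}_{\cdot\cdot})^2
&= \left[ (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot}) + (Y_{ij} - \bar{Y}_{\cdot j}) \right]^2 \\
&= (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot})^2 + (Y_{ij} - \bar{Y}_{\cdot j})^2 + 2 (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot})(Y_{ij} - \bar{Y}_{\cdot j})
\end{aligned}$$

Now sum over all observations:

$$\begin{aligned}
\sum_{j=1}^{K} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2
&= \sum_{j=1}^{K} \sum_{i=1}^{n_j} (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot})^2 + \sum_{j=1}^{K} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_{\cdot j})^2 \\
&\quad + 2 \sum_{j=1}^{K} \sum_{i=1}^{n_j} (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot})(Y_{ij} - \bar{Y}_{\cdot j})
\end{aligned}$$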

Partitioning the ANOVA model into variance components

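Examine the last term in the equation:

$$2 \sum_{j=1}^{K} \sum_{i=1}^{n_j} (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot})(Y_{ij} - \bar{Y}_{\cdot j}) = 2 \sum_{j=1}^{K} (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot}) \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_{\cdot j})$$

Now, we can simplify the last factor as follows:

$$\begin{aligned}
\sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_{\cdot j})
&= \sum_{i=1}^{n_j} Y_{ij} - \sum_{i=1}^{n_j} \bar{Y}_{\cdot j} \\
&= \frac{n_j}{n_j} \sum_{i=1}^{n_j} Y_{ij} - \bar{Y}_{\cdot j} \sum_{i=1}^{n_j} 1 \\
&= n_j \bar{Y}_{\cdot j} - n_j \bar{Y}_{\cdot j} \\
&= 0
\end{aligned}$$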
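The same cancellation can be checked numerically. The sketch below uses the anxiety vs. education data from the earlier example and confirms, up to floating-point rounding, that the deviations from each group mean sum to zero and that the cross-term therefore vanishes. Variable names are hypothetical.

```python
from statistics import mean

# Anxiety scores by education level, from the earlier example.
groups = {
    "Bachelor's": [6.2, 5.8, 6.0, 5.9, 6.6, 6.2],
    "Master's": [6.2, 6.9, 6.2, 7.7, 6.8, 7.9],
    "PhD": [6.9, 9.0, 7.7, 9.1, 8.3, 8.0],
}

grand_mean = mean([y for scores in groups.values() for y in scores])

# Inner sum: deviations of each observation from its own group mean.
deviation_sums = {
    name: sum(y - mean(scores) for y in scores)
    for name, scores in groups.items()
}

# Full cross-term: 2 * sum over groups of (group mean - grand mean) * inner sum.
cross_term = 2 * sum(
    (mean(scores) - grand_mean) * deviation_sums[name]
    for name, scores in groups.items()
)

for name, total in deviation_sums.items():
    print(f"{name}: sum of within-group deviations = {total:.1e}")
print(f"cross-term = {cross_term:.1e}")
```

The printed values are zero up to rounding error on the order of machine precision.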

Partitioning the ANOVA model into variance components

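Therefore, the entire cross-term disappears:

$$\begin{aligned}
\sum_{j=1}^{K} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2
&= \sum_{j=1}^{K} \sum_{i=1}^{n_j} (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot})^2 + \sum_{j=1}^{K} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_{\cdot j})^2 \\
&\quad + 2 \sum_{j=1}^{K} \sum_{i=1}^{n_j} (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot})(Y_{ij} - \bar{Y}_{\cdot j})
\end{aligned}$$

$$\sum_{j=1}^{K} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2 = \sum_{j=1}^{K} \sum_{i=1}^{n_j} (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot})^2 + \sum_{j=1}^{K} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_{\cdot j})^2 + 0$$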

Partitioning the ANOVA model into variance components

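This final equation is the fundamental equation of analysis of variance.

$$\sum_{j=1}^{K} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2 = \sum_{j=1}^{K} \sum_{i=1}^{n_j} (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot})^2 + \sum_{j=1}^{K} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_{\cdot j})^2$$

This equation says that the total variation in the response variable is equal to the variation in the average response between treatments plus the variation of the responses within each treatment. Each piece is a sum of squares; i.e. a sample variance up to a constant multiple.

This is typically written as a sum of squares (SS) equation:

$$SS_{\text{total}} = SS_{\text{treatment}} + SS_{\text{error}}$$

Or:

$$SS_{\text{total}} = SS_{\text{between}} + SS_{\text{within}}$$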
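The partition can be verified directly on the anxiety vs. education data from the earlier example. This is a minimal sketch with hypothetical variable names; the between-group sum uses the fact that the squared term $(\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot})^2$ is repeated $n_j$ times within group $j$.

```python
from statistics import mean

# Anxiety scores by education level (Bachelor's, Master's, PhD).
groups = [
    [6.2, 5.8, 6.0, 5.9, 6.6, 6.2],
    [6.2, 6.9, 6.2, 7.7, 6.8, 7.9],
    [6.9, 9.0, 7.7, 9.1, 8.3, 8.0],
]

all_values = [y for g in groups for y in g]
grand_mean = mean(all_values)

# Term 1: squared deviations of every observation from the grand mean.
ss_total = sum((y - grand_mean) ** 2 for y in all_values)

# Term 2: squared deviations of group means from the grand mean, once per observation.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)

# Term 3: squared deviations of every observation from its own group mean.
ss_within = sum((y - mean(g)) ** 2 for g in groups for y in g)

print(f"SS_total   = {ss_total:.3f}")
print(f"SS_between = {ss_between:.3f}")
print(f"SS_within  = {ss_within:.3f}")
print(f"SS_between + SS_within = {ss_between + ss_within:.3f}")
```

For these data roughly two thirds of the total sum of squares sits in the between-group term.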

Notice how each term in the fundamental equation is a sum of squared differences from the grand (terms 1 and 2) or treatment (term 3) means. This is exactly how we always measure variability, up to a constant multiple.

Notice: the variance in the response is partitioned into variability explained by the average treatment effect (term 2) plus variability leftover (term 3).

Examples to clarify the math: Ex. 1

Suppose we have these sample data on $Y$ over a categorical variable $X$ with 3 factor levels:

$X = 1$          $X = 2$           $X = 3$
$Y_{1,1} = 1$    $Y_{1,2} = -1$    $Y_{1,3} = 5$
$Y_{2,1} = 1$    $Y_{2,2} = -1$    $Y_{2,3} = 5$
$Y_{3,1} = 1$    $Y_{3,2} = -1$    $Y_{3,3} = 5$

Then:

$$\bar{Y}_{\cdot 1} = 1, \quad \bar{Y}_{\cdot 2} = -1, \quad \bar{Y}_{\cdot 3} = 5$$

And $\bar{Y}_{\cdot\cdot} \approx 1.67$.

Now plug into the fundamental equation of ANOVA:

Examples to clarify the math: Ex. 1

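Fundamental equation of ANOVA:

$$\sum_{j=1}^{K} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2 = \sum_{j=1}^{K} \sum_{i=1}^{n_j} (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot})^2 + \sum_{j=1}^{K} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_{\cdot j})^2$$

Notice that the last term equals zero!

$$\begin{aligned}
\sum_{j=1}^{K} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_{\cdot j})^2
&= (1 - 1)^2 + (1 - 1)^2 + (1 - 1)^2 \\
&\quad + (-1 + 1)^2 + (-1 + 1)^2 + (-1 + 1)^2 \\
&\quad + (5 - 5)^2 + (5 - 5)^2 + (5 - 5)^2 = 0
\end{aligned}$$

So, as expected, all the variability in the response is explained by the different treatment/factor levels of $X$.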

Examples to clarify the math: Ex. 2

Now, suppose we have these sample data instead on $Y$ over a categorical variable $X$ with 3 factor levels:

$X = 1$             $X = 2$             $X = 3$
$Y_{1,1} = -1.1$    $Y_{1,2} = -4.2$    $Y_{1,3} = 0.5$
$Y_{2,1} = 0.5$     $Y_{2,2} = -0.1$    $Y_{2,3} = 0.6$
$Y_{3,1} = 2.4$     $Y_{3,2} = 6.1$     $Y_{3,3} = 0.7$

Then:

$$\bar{Y}_{\cdot 1} = 0.6, \quad \bar{Y}_{\cdot 2} = 0.6, \quad \bar{Y}_{\cdot 3} = 0.6$$

And $\bar{Y}_{\cdot\cdot} = 0.6$.

Now plug into the fundamental equation of ANOVA:

Examples to clarify the math: Ex. 2

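Fundamental equation of ANOVA:

$$\sum_{j=1}^{K} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2 = \sum_{j=1}^{K} \sum_{i=1}^{n_j} (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot})^2 + \sum_{j=1}^{K} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_{\cdot j})^2$$

Notice that the second term equals zero now!

$$\begin{aligned}
\sum_{j=1}^{K} \sum_{i=1}^{n_j} (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot})^2
&= (0.6 - 0.6)^2 + (0.6 - 0.6)^2 + (0.6 - 0.6)^2 \\
&\quad + (0.6 - 0.6)^2 + (0.6 - 0.6)^2 + (0.6 - 0.6)^2 \\
&\quad + (0.6 - 0.6)^2 + (0.6 - 0.6)^2 + (0.6 - 0.6)^2 = 0
\end{aligned}$$

So, as expected, all the variability in the response is explained by the variation within each treatment/factor level of $X$.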
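Both toy examples can be run through one small helper to see the two extremes side by side. This is a sketch only; the function name `sums_of_squares` is hypothetical, and it simply evaluates the three terms of the fundamental equation.

```python
from statistics import mean


def sums_of_squares(groups):
    """Return (SS_total, SS_between, SS_within) for a list of groups of responses."""
    all_values = [y for g in groups for y in g]
    grand = mean(all_values)
    ss_total = sum((y - grand) ** 2 for y in all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((y - mean(g)) ** 2 for g in groups for y in g)
    return ss_total, ss_between, ss_within


# Ex. 1: no variation inside any group, so everything is between-group variation.
example_1 = [[1, 1, 1], [-1, -1, -1], [5, 5, 5]]

# Ex. 2: identical group means, so everything is within-group variation.
example_2 = [[-1.1, 0.5, 2.4], [-4.2, -0.1, 6.1], [0.5, 0.6, 0.7]]

for label, data in (("Ex. 1", example_1), ("Ex. 2", example_2)):
    total, between, within = sums_of_squares(data)
    print(f"{label}: SS_total = {total:.2f}, SS_between = {between:.2f}, SS_within = {within:.2f}")
```

In Ex. 1 the within term is zero and the between term equals the total; in Ex. 2 the roles are reversed, up to floating-point rounding.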

Mean sum of squares

The fundamental equation of analysis of variance:

$$\sum_{j=1}^{K} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2 = \sum_{j=1}^{K} \sum_{i=1}^{n_j} (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot})^2 + \sum_{j=1}^{K} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_{\cdot j})^2$$

Notice how each term is a sum of squared differences from the grand (terms 1 and 2) or treatment (term 3) means. This is exactly how we always measure variability, up to a constant multiple.

Recall: to define the sample variance, we had to divide by a constant:

$$S^2 = \frac{1}{N - 1} \sum_{\ell=1}^{N} (Y_\ell - \bar{Y})^2$$

The same applies for the ANOVA equation:

Mean sum of squares

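An unbiased estimator of the total variance is the total mean square:

$$MS_{\text{total}} = \frac{1}{N - 1} \sum_{j=1}^{K} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2$$

An unbiased estimator of the between treatment variance is the treatment mean square:

$$MS_{\text{treatment}} = \frac{1}{K - 1} \sum_{j=1}^{K} \sum_{i=1}^{n_j} (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot})^2$$

An unbiased estimator of the within treatment variance is the error mean square:

$$MS_{\text{error}} = \frac{1}{N - K} \sum_{j=1}^{K} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_{\cdot j})^2$$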
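For the anxiety vs. education data, the three mean squares follow from the sums of squares by dividing by $N - 1 = 17$, $K - 1 = 2$, and $N - K = 15$. The sketch below uses hypothetical variable names and also checks that the total mean square coincides with the ordinary sample variance of all 18 responses.

```python
from statistics import mean, variance

# Anxiety scores by education level (Bachelor's, Master's, PhD).
groups = [
    [6.2, 5.8, 6.0, 5.9, 6.6, 6.2],
    [6.2, 6.9, 6.2, 7.7, 6.8, 7.9],
    [6.9, 9.0, 7.7, 9.1, 8.3, 8.0],
]

all_values = [y for g in groups for y in g]
n_total, k = len(all_values), len(groups)
grand_mean = mean(all_values)

ss_total = sum((y - grand_mean) ** 2 for y in all_values)
ss_treatment = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
ss_error = sum((y - mean(g)) ** 2 for g in groups for y in g)

# Each mean square is a sum of squares divided by its degrees of freedom.
ms_total = ss_total / (n_total - 1)
ms_treatment = ss_treatment / (k - 1)
ms_error = ss_error / (n_total - k)

print(f"MS_total     = {ms_total:.3f} (sample variance of all responses: {variance(all_values):.3f})")
print(f"MS_treatment = {ms_treatment:.3f}")
print(f"MS_error     = {ms_error:.3f}")
```

Note that, unlike the sums of squares, the mean squares do not add up: only the sums of squares and the degrees of freedom partition.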

Testing the ANOVA null hypothesis

Recall that the null hypothesis that a one-way (fixed factor) ANOVA model is designed to test is:

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_K,$$

where $\mu_j$ is the mean response over the $j$th category of the explanatory factor $X$, $1 \leq j \leq K$.

Under this null hypothesis, we have that:

$$\frac{MS_{\text{treatment}}}{MS_{\text{error}}} \sim F(K - 1, N - K)$$

That is, the ratio of the between and within treatment sample variance estimators derived from the ANOVA model follows an F-distribution with $K - 1$ and $N - K$ degrees of freedom under the null hypothesis.

Thus, we can use this ratio as a test statistic and calculate p-values!
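A minimal sketch of the test for the anxiety vs. education data follows. The function names are hypothetical. The p-value step relies on a special case: when the numerator degrees of freedom equal 2 (as here, with $K = 3$), the upper tail of the F-distribution has the closed form $(1 + 2f/d)^{-d/2}$ for $d$ error degrees of freedom. For any other number of groups a statistical library or F table would be needed instead.

```python
from statistics import mean


def one_way_f(groups):
    """Return (F, df_treatment, df_error) for a one-way fixed effects ANOVA."""
    all_values = [y for g in groups for y in g]
    grand = mean(all_values)
    k, n_total = len(groups), len(all_values)
    ss_treatment = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_error = sum((y - mean(g)) ** 2 for g in groups for y in g)
    f_stat = (ss_treatment / (k - 1)) / (ss_error / (n_total - k))
    return f_stat, k - 1, n_total - k


def f_upper_tail_two_numerator_df(f_stat, df_error):
    """Pr(F > f_stat) for F(2, df_error); valid only when the numerator df is 2."""
    return (1 + 2 * f_stat / df_error) ** (-df_error / 2)


# Anxiety scores by education level (Bachelor's, Master's, PhD).
groups = [
    [6.2, 5.8, 6.0, 5.9, 6.6, 6.2],
    [6.2, 6.9, 6.2, 7.7, 6.8, 7.9],
    [6.9, 9.0, 7.7, 9.1, 8.3, 8.0],
]

f_stat, df_treatment, df_error = one_way_f(groups)
p_value = f_upper_tail_two_numerator_df(f_stat, df_error)

print(f"F({df_treatment}, {df_error}) = {f_stat:.2f}, p-value = {p_value:.5f}")
```

Under these assumptions the F-statistic is large and the p-value is well below 0.001.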

Example: education vs. anxiety

Bachelor’s   Master’s   PhD
6.2          6.2        6.9
5.8          6.9        9.0
6.0          6.2        7.7
5.9          7.7        9.1
6.6          6.8        8.3
6.2          7.9        8.0

The “treatments” (i.e. factor levels) here are different levels of education attained.

An ANOVA will decompose the total variance in the data into one piece that captures variability within each factor level, and one piece that captures variability of the average response between all factor levels.


Example: education vs. anxiety

In Jamovi:

Enter data as two columns (anxiety, education) just as you would for an independent samples t-test in Jamovi.

Click “Analyses” tab, then “ANOVA”, then “ANOVA” again.

Assign “Anxiety” to dependent variable.

Assign “Education” to fixed factor(s).

Sums of squares, degrees of freedom, mean squares, F-statistic, and p-value will appear in output.

Click on appropriate boxes to produce assumption checks, pre-planned contrasts, or post-hoc comparisons.


Example: education vs. anxiety

Very small p-value (equivalent to very large F-statistic) means data are inconsistent with the null hypothesis.

Thus, we seem to have evidence that at least one factor level of education produces different average anxiety responses than the other levels of education.


Example: education vs. anxiety

Notice: mean squares equal sum of squares divided by appropriate degrees of freedom.

Notice: F-statistic is quotient of MS(treatment) and MS(error).

TERMINOLOGY: we usually refer to observed errors as residuals.
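
The two "Notice" points and the residual terminology can be checked by hand on the education vs. anxiety data. The sketch below is not Jamovi's implementation; it rebuilds an ANOVA-style table from the raw numbers, with residuals defined as each response minus its own group mean. The p-value line uses the closed-form F tail that applies only because there are K = 3 factor levels (numerator df of 2); the variable names and table layout are illustrative.

```python
# Education vs. anxiety data from the example.
data = {
    "Bachelor's": [6.2, 5.8, 6.0, 5.9, 6.6, 6.2],
    "Master's": [6.2, 6.9, 6.2, 7.7, 6.8, 7.9],
    "PhD": [6.9, 9.0, 7.7, 9.1, 8.3, 8.0],
}

all_values = [y for ys in data.values() for y in ys]
n_total, k = len(all_values), len(data)
grand_mean = sum(all_values) / n_total
group_means = {level: sum(ys) / len(ys) for level, ys in data.items()}

# Residuals: observed errors, i.e. response minus its group mean.
residuals = [y - group_means[level] for level, ys in data.items() for y in ys]

ss_treatment = sum(
    len(ys) * (group_means[level] - grand_mean) ** 2 for level, ys in data.items()
)
ss_error = sum(r ** 2 for r in residuals)

# Mean squares are sums of squares divided by their degrees of freedom.
df_treatment, df_error = k - 1, n_total - k
ms_treatment = ss_treatment / df_treatment
ms_error = ss_error / df_error

# The F-statistic is the quotient of the two mean squares.
f_stat = ms_treatment / ms_error
# Closed-form upper tail of F(2, df_error); valid only because K = 3 here.
p_value = (1 + 2 * f_stat / df_error) ** (-df_error / 2)

print(f"{'Source':<10}{'SS':>8}{'df':>4}{'MS':>8}{'F':>8}{'p':>10}")
print(f"{'Education':<10}{ss_treatment:>8.3f}{df_treatment:>4}{ms_treatment:>8.3f}{f_stat:>8.2f}{p_value:>10.5f}")
print(f"{'Residuals':<10}{ss_error:>8.3f}{df_error:>4}{ms_error:>8.3f}")
```

Under these assumptions the F-statistic is roughly 14.8 on (2, 15) degrees of freedom with a p-value near 0.0003, in line with the "very small p-value" reading of the example.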


Post hoc, pairwise treatment comparisons

In the previous example, we found strong evidence that the data are inconsistent with

$$H_0 : \tau_1 = \tau_2 = \tau_3 = 0$$

But which treatments actually exhibit significant differences?

Post hoc comparisons:

Testing for pairwise (or other composite) differences after you have observed a significant F-statistic in the ANOVA.

This is different from pre-planning a specific pairwise (or other composite) comparison before you run your ANOVA.

Statistical procedure: run a bunch of t-tests on the groups you want to compare.

But what about the multiple testing issue (i.e. inflated Type I error)?...
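
To get a feel for the size of the multiple testing issue, the sketch below counts the pairwise comparisons among K groups and computes the chance of at least one Type I error across them. Two assumptions are baked in and should be read as such: a per-test level of 0.05 (hypothetical), and independence between the tests. Pairwise t-tests that share groups are not independent, so the number is only a rough guide.

```python
from math import comb


def familywise_error_rate(alpha, m):
    """Chance of at least one false rejection across m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m


k = 3                       # number of treatment groups, as in the education example
m = comb(k, 2)              # number of distinct pairwise comparisons
fwer = familywise_error_rate(0.05, m)  # assumes independent tests at alpha = 0.05
print(m, fwer)
```

With three groups there are three pairwise comparisons, and under these assumptions the chance of at least one false rejection is about 0.14 rather than 0.05.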


Post hoc, pairwise treatment comparisons

There are many different techniques that have been developed to conduct post hoc comparisons while adjusting for Type I error inflation:

Bonferroni (bad: never use for post hoc comparisons of an ANOVA)

Tukey’s honest significant differences

Scheffe’s correction

...and many others

All these methods examine different types of t-tests and/or impose different p-value adjustments.

The output of all these methods is interpreted the same: exactly how you would interpret a standard t-test.


Post hoc, pairwise treatment comparisons

In Jamovi, click on “Post Hoc Tests” tab;

Identify the variable/factor you want to make comparisons on;

Select which kind of adjustment you want to make.


Post hoc, pairwise treatment comparisons

Scheffe’s is most common: widely applicable, but conservative.

Tukey’s is more powerful than Scheffe’s when groups are close to balanced; i.e. when the number of responses in each treatment group is about the same.

weak evidence of a Master’s vs. PhD effect

no evidence of a Bachelor’s vs. Master’s effect

strong evidence of a Bachelor’s vs. PhD effect
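
The three conclusions above can be roughly reproduced by hand. The sketch below applies Scheffe's adjustment to each pairwise difference in the education vs. anxiety data: the squared t-statistic (using the pooled MS(error)) is divided by K - 1 and referred to an F(K - 1, N - K) distribution. This is an illustration, not Jamovi's code, and the original output may have used a different adjustment, so exact p-values may differ. The p-value formula is the closed-form F tail that holds only because K = 3; the helper name `scheffe_p` is hypothetical.

```python
from itertools import combinations

# Education vs. anxiety data from the example.
data = {
    "Bachelor's": [6.2, 5.8, 6.0, 5.9, 6.6, 6.2],
    "Master's": [6.2, 6.9, 6.2, 7.7, 6.8, 7.9],
    "PhD": [6.9, 9.0, 7.7, 9.1, 8.3, 8.0],
}

n_total = sum(len(ys) for ys in data.values())
k = len(data)
group_means = {level: sum(ys) / len(ys) for level, ys in data.items()}
ss_error = sum((y - group_means[level]) ** 2 for level, ys in data.items() for y in ys)
df_error = n_total - k
ms_error = ss_error / df_error


def scheffe_p(level_a, level_b):
    """Scheffe-adjusted p-value for one pairwise difference (K = 3 only)."""
    n_a, n_b = len(data[level_a]), len(data[level_b])
    diff = group_means[level_a] - group_means[level_b]
    t_squared = diff ** 2 / (ms_error * (1 / n_a + 1 / n_b))
    f_scheffe = t_squared / (k - 1)
    # Closed-form upper tail of F(2, df_error); relies on k - 1 == 2.
    return (1 + 2 * f_scheffe / df_error) ** (-df_error / 2)


adjusted = {pair: scheffe_p(*pair) for pair in combinations(data, 2)}
for (a, b), p in adjusted.items():
    print(f"{a} vs. {b}: adjusted p = {p:.4f}")
```

Under these assumptions the adjusted p-values are roughly 0.12 (Bachelor’s vs. Master’s), 0.02 (Master’s vs. PhD) and 0.0003 (Bachelor’s vs. PhD), which matches the ordering of no, weak and strong evidence stated above.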


Post hoc vs. pre-planned comparisons

Post hoc comparisons are experimentally (and so statistically) different from pre-planned comparisons. Why?

A post hoc comparison is made because you first observe a significant ANOVA statistic; i.e. your decision to make a follow-up comparison depends on the outcome of your first statistical test (the ANOVA).

A pre-planned comparison is decided upon in the experimental design phase, before you collect any data or observe any statistics; thus, it is not dependent on the outcome of any other statistics (e.g. an ANOVA).

Post hoc comparisons are more susceptible to inflated Type I errors.

Pre-planned comparisons will provide stronger evidence, since you are not just “looking for any significant effects”.

In most applied research, all pairwise comparisons are post hoc

In deeper, more controlled experiments (e.g. Phase II and III clinical trials), pre-planned comparisons are required
