1 Topic 8 – One-Way ANOVA Single Factor Analysis of Variance Reading: 17.1, 17.2, & 17.5 Skim: 12.3, 17.3, 17.4

Upload: franklin-blair

Post on 22-Dec-2015

Topic 8 – One-Way ANOVA

Single Factor Analysis of Variance

Reading: 17.1, 17.2, & 17.5

Skim: 12.3, 17.3, 17.4

Overview

Categorical Variables (Factors)

Fixed vs. Random Effects

Review: Two-sample T-test

ANOVA as a generalization of the two-sample T-test

Cell-Means and Factor-Effects ANOVA Models (same model, different form)

Terminology: Factors & Levels

The term factor is generally used to refer to a categorical predictor variable.

Blood Type

Gender

Drug Treatment

Other Examples?

The term levels is used to refer to the specific categories for a factor.

A / B / AB / O (could also consider +/-)

Male / Female

Factors: Fixed or Random?

A factor is fixed if the levels under consideration are the only ones of interest.

The levels of the factor are selected by a non-random process AND are the only levels of interest.

For the time being, all factors that we will consider will be fixed.

Examples?

Factors: Fixed or Random? (2)

A factor is random if the levels under consideration may be regarded as a sample from a larger population.

Not all levels of interest are included in the study – only a random sample.

We want inferences to be applicable to the entire (larger) population of levels.

Examples?

Analysis is a little more complicated; we’ll save this topic for near the end of the course.

Example: Random or Fixed?

To study the effect of diet on cattle, an experimenter randomly (and equally) allocates 50 cows to 5 diets (a control and 4 experimental diets). After 1 year, the cows are butchered and the amount of good meat (in pounds) is measured.

Response = ______________

Cow = _______ Factor

Diet = _______ Factor

Notation

In general, we label our factors A, B, C, etc.

Factor A has levels i = 1, 2, 3, ..., a

Factor B has levels j = 1, 2, 3, ..., b

Factor C has levels k = 1, 2, 3, ..., c

More on notation later; remember for now we are considering single factor ANOVA, so we will have only a “Factor A”.

Comparing Groups

Suppose I want to compare heights between men and women. How would I do this?

Notation for Two-Sample Settings

Suppose an SRS (simple random sample) of size n1 is selected from the 1st population, and another SRS of size n2 is selected from the 2nd population.

Population   Sample size   Sample mean   Sample standard deviation
1            $n_1$         $\bar{y}_1$   $s_1$
2            $n_2$         $\bar{y}_2$   $s_2$

Estimating Differences

A natural estimator of the difference $\mu_1 - \mu_2$ is the difference between the sample means: $\bar{y}_1 - \bar{y}_2$

If we assume that both populations are normally distributed (or CLT applies) then both sample means and their difference will be normally distributed as well.

Because we are estimating standard deviations, a confidence interval for the difference in means uses the T-distribution.

CI for Difference

If variances are unknown, then a 95% confidence interval for difference in means is given by

$$(\bar{y}_1 - \bar{y}_2) \pm t_{crit} \sqrt{s^2_{pooled}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

The critical value is $t_{crit} = t_{0.975,\,df}$. The degrees of freedom is $n_1 + n_2 - 2$.

Test for Difference = 0

Can also be viewed as a hypothesis test

Test statistic for testing whether the difference is zero:

$$T = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{s^2_{pooled}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

Compare to critical value used in CI.

Conclusions

If the test statistic is of larger magnitude (ignore sign) than the critical value, we reject the hypothesis

There is a significant difference between the two groups

The same conclusion results if the CI doesn’t contain zero.

If the statistic is smaller (CI does contain zero), we fail to reject the hypothesis

Fail to show a difference between the two groups
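The pooled two-sample procedure above can be sketched in a few lines of Python. This is only an illustration: the height data are made up, the pooled variance is taken as the usual df-weighted average of the two sample variances (the slides do not spell that formula out), and the critical value 2.306 is the tabled $t_{0.975,\,8}$ for this particular sample size.

```python
import math
from statistics import mean, variance

def pooled_t(sample1, sample2):
    """Return (difference, pooled variance, standard error, T, df)."""
    n1, n2 = len(sample1), len(sample2)
    diff = mean(sample1) - mean(sample2)
    df = n1 + n2 - 2
    # Pooled variance: df-weighted average of the two sample variances.
    s2_pooled = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / df
    se = math.sqrt(s2_pooled * (1 / n1 + 1 / n2))
    return diff, s2_pooled, se, diff / se, df

# Hypothetical heights in inches (invented for illustration only).
men = [70.0, 72.0, 68.0, 71.0, 69.0]
women = [64.0, 66.0, 63.0, 65.0, 67.0]

diff, s2_pooled, se, t_stat, df = pooled_t(men, women)

# t_{0.975, 8} from a standard t table; look up the value for your own df.
T_CRIT = 2.306
ci = (diff - T_CRIT * se, diff + T_CRIT * se)
reject = abs(t_stat) > T_CRIT

print(f"difference = {diff:.3f}, T = {t_stat:.3f}, df = {df}")
print(f"95% CI: ({ci[0]:.3f}, {ci[1]:.3f}); reject H0: {reject}")
```

The test and the interval agree by construction: |T| exceeds the critical value exactly when the interval excludes zero.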

Comparison of Several Groups

Suppose instead of two groups, we have “a” groups that we wish to compare (where a > 2). Note: In Chapter 17, textbook defines the number of groups as “k”. Remember this is just a letter, and the letter we use really has nothing to do with anything in particular. So I’m using a to correspond (consistently) to Factor A.

Multiple treatment model

With a groups (treatments), we could do $\frac{1}{2}a(a-1)$ two-sample t-tests. But...

This does not test the equality of all means at once, i.e. $H_0: \mu_1 = \mu_2 = \dots = \mu_a$.

Multiple tests means we have greater chance of making Type I errors (a Bonferroni correction can get expensive because of the large number of tests).

We usually expect variances to be the same across groups, but it isn’t clear how we should estimate variance with more than two samples.
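To get a feel for how quickly the number of pairwise tests grows, here is a small Python sketch. The familywise error figure assumes the tests are independent, which pairwise t-tests sharing groups are not, so treat it as a rough illustration rather than an exact rate. The function names are hypothetical.

```python
def pairwise_test_count(a):
    """Number of two-sample comparisons among a groups: a(a-1)/2."""
    return a * (a - 1) // 2

def familywise_error_if_independent(m, alpha=0.05):
    """Chance of at least one Type I error in m tests, ASSUMING independence."""
    return 1 - (1 - alpha) ** m

for a in (3, 4, 5, 10):
    m = pairwise_test_count(a)
    fwe = familywise_error_if_independent(m)
    # Bonferroni runs each test at alpha / m to hold the familywise rate at alpha.
    print(f"a={a:2d}  tests={m:2d}  approx familywise error={fwe:.3f}  "
          f"Bonferroni per-test alpha={0.05 / m:.4f}")
```

With 10 groups the per-test level under Bonferroni is already very small, which is the sense in which the correction "gets expensive".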

Multiple treatment model (2)

Analysis of Variance (ANOVA) models provide a more efficient way to compare multiple groups. For example, in a single factor ANOVA,

The Model (or ANOVA) F-test will test the equality of all group means at the same time.

There are methods of doing pairwise comparisons that are much more efficient than Bonferroni.

All observations (from all groups) are used to estimate the overall variance (by MSE).

Three Ways to View ANOVA

Views observations in terms of their group means: the cell means model.

Views observations as the sum of an overall mean and a deviation from that mean related to the particular group to which the observation belongs: the factor effects model.

As regression, using indicator variables.

ANOVA Model

Cell Means Model

ANOVA

ANOVA is generally viewed as an extension of the T-test, used for comparisons of three or more population means.

These populations are denoted by the levels of our factor.

Only one variable, but has 3+ levels or groups

Hence we call the means of these levels factor level means or simply cell means.

Cell Means Model

Basic ANOVA Model is:

$$Y_{ij} = \mu_i + \varepsilon_{ij}$$ where $\varepsilon_{ij} \sim N(0, \sigma^2)$

Notation:

“i” subscript indicates the level of the factor: $i = 1, 2, 3, \dots, a$

“j” subscript indicates observation number within the group: $j = 1, 2, 3, \dots, n_i$

Cell Sizes

For the time being, we will assume that all the cell sizes are the same: $n_i = n$ for all $i$

The total sample size will be denoted $N = \sum_{i=1}^{a} n_i = an$ (when cell sizes are all $n$)

Assumptions for fixed effects

Random samples have been selected for each level of the factor.

All observations are independent.

Response variable is normally distributed for each population (level) and the population variances are the same.

Hence, independence, normality and constant variance

What happened to linearity?

Robustness

ANOVA procedures are generally robust to minor departures from the assumptions (i.e. minor deviations from the assumptions will not affect the performance of the procedure).

For major departures, transformations of the response variable [e.g. Log(Y)] may help.

Transforming the factor (i.e. the predictor) in ANOVA doesn’t help because it’s categorical.

Components of Variation

Variation between groups gets “explained” by allowing the groups to have different means.

We know this as SSM, SSR, or now SSA!

Variation within groups is unexplained.

We know this as SSE (it stays the same).

The ratio F = MSM / MSE forms the basis for testing the hypothesis that all group means are the same. (or F = MSA / MSE)

Variation: Between vs. Within

A convenient way to view the SS

SSA is called the “between” SS because it represents variation between the different groups. It is determined by the squared differences between group means and the grand (overall) mean.

SSE is called the “within” SS because it represents variation within groups. It is determined by the squared differences of observations from their group means.

Quick Comment on Notation

DOT indicates “sum”

BAR indicates “average” or “divide by cell/sample size”

$\bar{Y}_{\cdot\cdot}$ is the mean for all observations

$\bar{Y}_{i\cdot}$ is the mean for the observations in Level i of Factor A.

Pictorial Representation

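[Figure: observations plotted for GROUP 1, GROUP 2, and GROUP 3, marking a group mean $\bar{Y}_{i\cdot}$, the grand mean $\bar{Y}_{\cdot\cdot}$, and the three deviations $Y_{ij} - \bar{Y}_{i\cdot}$, $\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot}$, and $Y_{ij} - \bar{Y}_{\cdot\cdot}$.]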

SS Breakdown (Algebraic)

Break down difference between observation and grand mean into two parts:

$$\underbrace{(Y_{ij} - \bar{Y}_{\cdot\cdot})}_{\text{Total deviation}} = \underbrace{(\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})}_{\text{BETWEEN GROUPS}} + \underbrace{(Y_{ij} - \bar{Y}_{i\cdot})}_{\text{WITHIN GROUPS}}$$

BETWEEN GROUPS: deviation of the estimated factor level mean around the grand mean.

WITHIN GROUPS: deviation around the estimated factor level mean.

Components of Variation (2)

Of course the individual components would sum to zero, so we must square them. It turns out that all cross-product terms cancel, and we have:

$$\underbrace{\sum_{i,j} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2}_{SST} = \underbrace{\sum_{i,j} (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2}_{SSA \text{ (BETWEEN GROUPS)}} + \underbrace{\sum_{i,j} (Y_{ij} - \bar{Y}_{i\cdot})^2}_{SSE \text{ (WITHIN GROUPS)}}$$
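For those curious why the cross-product terms cancel, here is a short sketch of the algebra, using the same dot/bar notation as the slides. The key fact is that deviations of observations from their own group mean sum to zero within each group.

```latex
\begin{align*}
\sum_{i}\sum_{j} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2
  &= \sum_{i}\sum_{j} \left[ (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot}) + (Y_{ij} - \bar{Y}_{i\cdot}) \right]^2 \\
  &= \sum_{i}\sum_{j} (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2
   + \sum_{i}\sum_{j} (Y_{ij} - \bar{Y}_{i\cdot})^2
   + 2 \sum_{i} (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})
       \underbrace{\sum_{j} (Y_{ij} - \bar{Y}_{i\cdot})}_{=\,0 \text{ for every } i} \\
  &= SSA + SSE
\end{align*}
```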

ANOVA Table

Source     SS    df      MS    F
Factor A   SSA   a - 1   MSA   MSA / MSE
Error      SSE   N - a   MSE
Total      SST   N - 1

Model F Test (Cell Means)

Null Hypothesis

$H_0: \mu_1 = \mu_2 = \dots = \mu_a$

Alternative Hypothesis

$H_a$: There exists some pair of population means not equal.

Conclusion

If we reject the null hypothesis, we have shown differences between groups (levels)

Remember it does not tell us which groups are different. Only that at least one group is different from at least one other group!

If we fail to reject the null hypothesis, we have failed to show any significant differences with the ANOVA F test

Unfortunately sometimes if we look a little closer (we’ll do this later) we still might find some differences!

Calculations: A Brief Look

We’ll consider these for only a balanced design (cell sizes all the same n).

The purpose in doing this is not that you memorize formulas, but that you further your conceptual understanding of the sums of squares.

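SS Calculations (Balanced)

$$SSA = \sum_{i=1}^{a}\sum_{j=1}^{n} (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2 = n \sum_{i=1}^{a} (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2$$

$$SSE = \sum_{i=1}^{a}\sum_{j=1}^{n} (Y_{ij} - \bar{Y}_{i\cdot})^2$$

$$SST = \sum_{i=1}^{a}\sum_{j=1}^{n} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2$$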

Blood Type Example (1)

Suppose we have 3 observations of a certain response variable for each blood type

Want to construct the ANOVA table

A    B    O    AB
28   32   21   32
27   34   22   32
28   35   25   34

Blood Type Example (2)

We can compute the sample means using SAS:

proc means;
  class type;
  output out=means mean=YBAR;
proc print;
run;

Obs   type   _TYPE_   _FREQ_   YBAR
1            0        12       29.1667
2     A      1        3        27.6667
3     AB     1        3        32.6667
4     B      1        3        33.6667
5     O      1        3        22.6667

Blood Type Example (3)

SSA (Between)

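$$SSA = n \sum_{i=1}^{a} (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2$$

$$= 3\left[ (\bar{Y}_{A\cdot} - \bar{Y}_{\cdot\cdot})^2 + (\bar{Y}_{AB\cdot} - \bar{Y}_{\cdot\cdot})^2 + (\bar{Y}_{B\cdot} - \bar{Y}_{\cdot\cdot})^2 + (\bar{Y}_{O\cdot} - \bar{Y}_{\cdot\cdot})^2 \right]$$

$$= 3\left[ (-1.5)^2 + (3.5)^2 + (4.5)^2 + (-6.5)^2 \right] = 231$$

At this point, we have a choice – to calculate SSE or SST.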

Blood Type Example (4)

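$$SSE = \sum_{i=1}^{a}\sum_{j=1}^{n} (Y_{ij} - \bar{Y}_{i\cdot})^2$$

$$= (Y_{A1} - \bar{Y}_{A\cdot})^2 + (Y_{A2} - \bar{Y}_{A\cdot})^2 + \dots + (Y_{AB3} - \bar{Y}_{AB\cdot})^2$$

$$= (28 - 27.667)^2 + (27 - 27.667)^2 + \dots + (34 - 32.667)^2 = 16.67$$

$$SST = SSA + SSE = 231 + 16.67 = 247.67$$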

Blood Type Example (5)

DF: 4 – 1 = 3 for Factor A

DF: N – 1 = 11 for Total

DF: 11 – 3 = 8 for Error

Mean Squares:

$MSA = 231 / 3 = 77$

$MSE = 16.67 / 8 = 2.08$

Blood Type Example (6)

ANOVA Table

Source    SS       df   MS      F
Between   231.00   3    77.00   36.96
Within    16.67    8    2.083
Total     247.67   11

F-test is significant, and so we conclude that there is some difference among the means (we just don’t know exactly which means are different).
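As a check on the hand calculations, the same table can be rebuilt in a few lines of Python. This is a sketch using only the standard library and the blood type data from the example; the function name and dictionary layout are my own choices, not part of the course materials.

```python
from statistics import mean

# Blood type data from the example (3 observations per type).
blood = {
    "A": [28, 27, 28],
    "B": [32, 34, 35],
    "O": [21, 22, 25],
    "AB": [32, 32, 34],
}

def one_way_anova(groups):
    """Return the one-way ANOVA table entries for a dict of level -> observations."""
    all_obs = [y for ys in groups.values() for y in ys]
    grand_mean = mean(all_obs)
    a, total_n = len(groups), len(all_obs)
    # Between: squared deviations of group means from the grand mean, weighted by cell size.
    ssa = sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in groups.values())
    # Within: squared deviations of observations from their own group mean.
    sse = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)
    # Total: squared deviations of observations from the grand mean.
    sst = sum((y - grand_mean) ** 2 for y in all_obs)
    msa, mse = ssa / (a - 1), sse / (total_n - a)
    return {"SSA": ssa, "SSE": sse, "SST": sst,
            "dfA": a - 1, "dfE": total_n - a,
            "MSA": msa, "MSE": mse, "F": msa / mse}

table = one_way_anova(blood)
for name, value in table.items():
    print(f"{name:4s} {value:10.4f}")
```

Carrying full precision gives F = 36.96, the value SAS reports; dividing by an MSE rounded to 2.084 gives a slightly smaller figure.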

SAS Coding

Will use PROC GLM with an important addition: CLASS statement

CLASS statement identifies categorical variables for SAS

Note that failure to use CLASS statement for categorical variable will result in:

SYNTAX ERROR if character variable

INAPPROPRIATE ANALYSIS if class levels are numeric

Blood Type Example (SAS)

proc glm data=bloodtype;
  class type;
  model resp=type;
  output out=diag p=pred r=resid;

                Sum of
Source   DF     Squares       Mean Square   F Value   Pr > F
Model    3      231.0000000   77.0000000    36.96     <.0001
Error    8      16.6666667    2.0833333
Total    11     247.6666667

R-Square   Coeff Var   Root MSE   resp Mean
0.932705   4.948717    1.443376   29.16667

Residual Diagnostics

Very similar to what we did in regression

Normality plot is the same – keep in mind that most of the tests in ANOVA are robust to minor violations of normality (thanks to the CLT).

In constant variance plot, still may see megaphone shape in RESID vs. PRED if non-constant variance is a problem.

In plots against the factor levels (commonly used), would simply see differing vertical spreads (not megaphone, because generally the labels on the horizontal axis are not “ordered”)

Blood Type (QQ Plot)

Blood Type (Residual Plot)

Model Estimates

In SAS, using /solution as an option in the MODEL statement of PROC GLM, we can get the parameter estimates for our model.

Unfortunately these are not the cell means!

                                 Standard
Parameter   Estimate             Error        t Value   Pr > |t|
Intercept   22.66666667 B        0.83333333   27.20     <.0001
type A       5.00000000 B        1.17851130    4.24     0.0028
type AB     10.00000000 B        1.17851130    8.49     <.0001
type B      11.00000000 B        1.17851130    9.33     <.0001
type O       0.00000000 B        .             .        .

NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve the normal equations. Terms whose estimates are followed by the letter 'B' are not uniquely estimable.

Cell or Group Means

$\bar{Y}_{A\cdot} = 22.67 + 5 = 27.67$

$\bar{Y}_{B\cdot} = 22.67 + 11 = 33.67$

$\bar{Y}_{AB\cdot} = 22.67 + 10 = 32.67$

$\bar{Y}_{O\cdot} = 22.67 + 0 = 22.67$

To get each cell mean $\bar{Y}_{i\cdot}$, just add the intercept to each parameter estimate.

Model Estimates

The reason for this is that there are infinitely many ways to write down the model for ANOVA.

SAS tells us this by flagging ALL estimates with a “B” (not uniquely estimable). So what is SAS actually doing?

                                 Standard
Parameter   Estimate             Error        t Value   Pr > |t|
Intercept   22.66666667 B        0.83333333   27.20     <.0001
type A       5.00000000 B        1.17851130    4.24     0.0028
type AB     10.00000000 B        1.17851130    8.49     <.0001
type B      11.00000000 B        1.17851130    9.33     <.0001
type O       0.00000000 B        .             .        .

NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve the normal equations. Terms whose estimates are followed by the letter 'B' are not uniquely estimable.

ANOVA Model

Factor Effects Model

(Another convenient view)

A simple example

Three groups: $\mu_1 = 30$, $\mu_2 = 35$, $\mu_3 = 37$

Grand Mean: $\mu = 34$

$\alpha_1 = 30 - 34 = -4$

$\alpha_2 = 35 - 34 = 1$

$\alpha_3 = 37 - 34 = 3$

Factor Effects Model

As an alternative to viewing each observation as a deviation from the cell mean, we may consider observations as deviations from the grand (or overall) mean.

Part of that deviation is explained by the cell (or group). We call that part $\alpha_i$, or the factor level effects.

We essentially break $\mu_i$ from the cell-means model into two pieces: $\mu_i = \mu + \alpha_i$

Factor Effects Model

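$$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}, \qquad i = 1, 2, \dots, a; \quad j = 1, 2, \dots, n_i$$

$\mu$ is the grand (or overall) mean.

$\alpha_i$ is the ith treatment effect (difference between group mean and $\mu$).

$\varepsilon_{ij} \sim N(0, \sigma^2)$ is the error component.

$\mu_i = \mu + \alpha_i$ is the ith treatment mean.

Restriction $\sum_i \alpha_i = 0$ is made.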

Why the Restriction?

Note that estimating $\mu, \alpha_1, \alpha_2, \dots, \alpha_a$ would require one more estimate than in the cell means model ($\mu_1, \mu_2, \dots, \mu_a$).

So for the models to be identical, we must add a constraint.

Convenient: $\sum_i \alpha_i = 0$ makes $\mu$ the grand (or overall) mean.

What exactly does SAS do?

Restriction made by SAS

Last level (alphabetically!!!) is set to ZERO: $\alpha_O = 0$.

This means the intercept (estimate for $\mu$) will represent the mean for the “last” group.

So they are not exactly the factor effects, but can we recover factor effects from this?

                                 Standard
Parameter   Estimate             Error        t Value   Pr > |t|
Intercept   22.66666667 B        0.83333333   27.20     <.0001
type A       5.00000000 B        1.17851130    4.24     0.0028
type AB     10.00000000 B        1.17851130    8.49     <.0001
type B      11.00000000 B        1.17851130    9.33     <.0001
type O       0.00000000 B        .             .        .

NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve the normal equations. Terms whose estimates are followed by the letter 'B' are not uniquely estimable.

Estimating Factor Effects

We previously calculated the cell means (this is the first step):

$\bar{Y}_{A\cdot} = 22.67 + 5 = 27.67$

$\bar{Y}_{B\cdot} = 22.67 + 11 = 33.67$

$\bar{Y}_{AB\cdot} = 22.67 + 10 = 32.67$

$\bar{Y}_{O\cdot} = 22.67 + 0 = 22.67$

Estimating Factor Effects (2)

The overall mean will be the weighted average of the group means (in this case, it’s a straightforward average since the cell sizes are identical):

$$\bar{Y}_{\cdot\cdot} = \frac{3(27.67) + 3(33.67) + 3(32.67) + 3(22.67)}{12} = 29.167$$

Estimating Factor Effects (3)

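The factor effects are the differences between the group and overall means:

$\hat{\alpha}_A = 27.67 - 29.17 = -1.5$

$\hat{\alpha}_B = 33.67 - 29.17 = 4.5$

$\hat{\alpha}_{AB} = 32.67 - 29.17 = 3.5$

$\hat{\alpha}_O = 22.67 - 29.17 = -6.5$

Note: The sum of these is always ZERO when the cell sizes are equal (with unequal cell sizes, it is the sum weighted by $n_i$ that is zero).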
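The three steps (cell means, overall mean, factor effects) can be sketched in Python starting from the numbers in the PROC GLM /solution output. The variable names are hypothetical; only the estimates and cell sizes come from the example.

```python
# Estimates as printed by PROC GLM /solution in the blood type example.
intercept = 22.66666667
sas_estimates = {"A": 5.0, "AB": 10.0, "B": 11.0, "O": 0.0}
cell_sizes = {"A": 3, "AB": 3, "B": 3, "O": 3}

# Step 1: cell mean = intercept + the level's estimate (last level is set to 0).
cell_means = {lvl: intercept + est for lvl, est in sas_estimates.items()}

# Step 2: overall mean = average of cell means weighted by cell size.
total_n = sum(cell_sizes.values())
grand_mean = sum(cell_sizes[lvl] * cell_means[lvl] for lvl in cell_means) / total_n

# Step 3: factor effect = cell mean minus overall mean.
effects = {lvl: cell_means[lvl] - grand_mean for lvl in cell_means}

for lvl in cell_means:
    print(f"{lvl:2s}  mean={cell_means[lvl]:.3f}  effect={effects[lvl]:+.3f}")
print(f"grand mean={grand_mean:.3f}, sum of effects={sum(effects.values()):.6f}")
```

Because the design is balanced, the effects sum to zero (up to floating-point rounding).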

Estimates / Tests

Alphas are estimated by $\hat{\alpha}_i = \bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot}$

For the model F test: Testing the hypothesis that all the means are the same is equivalent to testing

$H_0: \alpha_1 = \alpha_2 = \dots = \alpha_a = 0$

against the alternative

$H_a: \alpha_i \neq 0$ for some $i$

ANOVA as REGRESSION

We’ll look at this only briefly, as in practice we don’t generally view ANOVA in this way. But SAS does! So part of the context here is to help us understand (eventually) how ANOVA models work in SAS.

Dummy Variables

When we view ANOVA as a regression model, we do so using dummy variables.

We’ve already seen such a variable and even used it in some examples where we had only two possible categories:

Smoking Status (Yes = 1, No = 0)

Gender (Male = 1, Female = 0)

What is a Dummy Variable?

The most important thing about dummy variables is that the numeric value has no meaning beyond defining the category.

We could, for example, take (No = 1, Yes = 0) or (Female = 1, Male = 0) on the previous slide.

Additionally, we could use (Yes = 1, No = -1) without changing the flavor of the results. (the meaning of your parameter estimates would change, but the final interpretations would remain the same)

Extension to Many Groups

If my categorical factor has a levels, then I will need a – 1 dummy variables to represent the factor.

Example: Blood Type (A, B, AB, O)

X1 = 1 if blood type = A; else X1 = 0

X2 = 1 if blood type = B; else X2 = 0

X3 = 1 if blood type = AB; else X3 = 0
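The coding above can be written as a tiny Python function. This is just a sketch of the idea; the function name is hypothetical, and type O is taken as the base level because it is the one without its own indicator.

```python
LEVELS = ["A", "B", "AB", "O"]
BASE = "O"  # the level with no indicator of its own

def dummy_code(blood_type):
    """Return (X1, X2, X3) = indicators for A, B, AB; type O maps to all zeros."""
    if blood_type not in LEVELS:
        raise ValueError(f"unknown blood type: {blood_type!r}")
    return tuple(1 if blood_type == lvl else 0 for lvl in LEVELS if lvl != BASE)

for bt in LEVELS:
    print(bt, dummy_code(bt))
```

Four levels need only three indicators: a row of all zeros already identifies the base level.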

Degrees of Freedom

Recall our ANOVA model used a – 1 DF in the model (one fewer than the number of levels for the factor). Why?

Because of these indicator variables. It takes a – 1 indicator variables to encompass our categorical variable. That’s a – 1 slope estimates, and hence a – 1 DF.

In general, any categorical variable in your model will cost DF equal to the number of levels minus one.

Extension to Many Groups (2)

My “Regression” Model will be

$$Y = \beta_0 + \beta_1 X_{1(A)} + \beta_2 X_{2(B)} + \beta_3 X_{3(AB)} + \varepsilon$$

What do the parameters represent?

What is being tested with the overall model F test?

Blood Type Example

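Model: $Y = \beta_0 + \beta_1 X_{1(A)} + \beta_2 X_{2(B)} + \beta_3 X_{3(AB)} + \varepsilon$

$\beta_0$ is the true mean for blood type O.

$\beta_0 + \beta_1$ is the true mean for type A.

$\beta_0 + \beta_2$ is the true mean for type B.

$\beta_0 + \beta_3$ is the true mean for type AB.

And here are some fairly natural estimates:

$b_0 = \bar{Y}_{O\cdot}$

$b_1 = \bar{Y}_{A\cdot} - \bar{Y}_{O\cdot}$

$b_2 = \bar{Y}_{B\cdot} - \bar{Y}_{O\cdot}$

$b_3 = \bar{Y}_{AB\cdot} - \bar{Y}_{O\cdot}$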

Blood Type Example (2)

Standard errors for these estimates are also fairly intuitive since in general the standard error for a mean is of the form

$$SEM = \sigma / \sqrt{n}$$

For example,

$$SE(b_0) = \sqrt{MSE / n_O}$$

$$SE(b_1) = \sqrt{MSE / n_O + MSE / n_A}$$
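These standard errors can be checked against the PROC GLM /solution output with a short Python sketch. The SSE, error DF, and cell means are taken from the blood type example; the variable names are my own.

```python
import math

# From the ANOVA table in the example: SSE = 16.6666667 on 8 error DF.
MSE = 16.6666667 / 8
n = 3  # observations per blood type (balanced design)

cell_means = {"A": 27.66666667, "AB": 32.66666667, "B": 33.66666667, "O": 22.66666667}

# Intercept: the mean of the base level (type O).
b0 = cell_means["O"]
se_b0 = math.sqrt(MSE / n)

# Each slope: difference between a level's mean and the base mean,
# so its variance is the sum of the two means' variances.
se_slope = math.sqrt(MSE / n + MSE / n)

print(f"Intercept  b={b0:.4f}  SE={se_b0:.8f}  t={b0 / se_b0:.2f}")
for lvl in ("A", "AB", "B"):
    b = cell_means[lvl] - cell_means["O"]
    print(f"type {lvl:2s}    b={b:.4f}  SE={se_slope:.8f}  t={b / se_slope:.2f}")
```

Each t value here tests whether that level's mean equals the mean for type O, which is why a comparison such as AB versus A needs a separate contrast.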

Blood Type Example (3)

How do we test hypotheses?

H0: All means the same

H0: Mean for Type AB = Mean for Type O

H0: Mean for Type AB = Mean for Type A

Summary

One level of our factor gets represented by the intercept. The slope estimates compare all other levels to that “base” level.

We can compare any set of levels that we want using a general linear test

This is exactly what SAS does for any ANOVA! But the output in SAS will be in a different form to make the interpretations easier.

CLG Activity

Questions?

Upcoming in Topic 9...

Pairwise Comparisons (Sec. 17.7-17.8)

Randomized Blocks (Chapter 18)