

4-1

Lecture 4: ANOVA Table

STAT 512

Spring 2011

Background Reading

KNNL: 2.6-2.7


4-2

Topic Overview

Working-Hotelling Confidence Band

Inference Example using SAS

ANOVA Table


4-3

Working-Hotelling Confidence Band (1)

This gives a confidence limit for the whole line at once, in contrast to the confidence interval for just one $\hat{Y}_h$ at a time.

The regression line $b_0 + b_1 X_h$ describes $E\{Y_h\}$ for a given $X_h$.

We already have a 95% CI for $E\{Y_h\}$, estimated by $\hat{Y}_h$, pertaining to a specific $X_h$.


4-4

Working-Hotelling Confidence Band (2)

We want a 95% confidence band for all $X_h$ at once: a confidence limit for the whole line, in contrast to the confidence interval for just one $\hat{Y}_h$ at a time.

The confidence limits are given by $\hat{Y}_h \pm W\, s\{\hat{Y}_h\}$, where $W^2 = 2\, F(1-\alpha;\ 2,\ n-2)$. Since we are covering all values of $X_h$ at once, the band is wider at each $X_h$ than the CI for an individual $X_h$.
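As a quick check of the multiplier for the muscle mass example below ($n = 60$, $\alpha = 0.05$; the values match the SAS computation on slide 4-22):

$W^2 = 2\, F(0.95;\ 2,\ 58) \approx 2(3.16) = 6.31$, so $W = \sqrt{6.31} \approx 2.51$.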


4-5

Working-Hotelling Confidence Band (3)

We are used to constructing CIs with $t$'s, not $W$'s. Can we fake it?

We can find a new, smaller alpha for $t_c$ that would give the same results: an "effective alpha" that takes into account that you are estimating the entire line.

We find $W$ for our desired true $\alpha$, and then find the effective $\alpha_t$ to use with $t_c$ that gives $W(\alpha) = t_c(\alpha_t)$.
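Concretely, for $W \approx 2.51$ with $n - 2 = 58$ degrees of freedom (the numbers from the SAS computation on slide 4-22):

$\alpha_t = 2\,(1 - P(t_{58} \le 2.51)) \approx 0.0148$,

so a $t$-based band built with $\alpha_t \approx 0.015$ (rounded conservatively to 0.01) reproduces the Working-Hotelling multiplier.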


4-6

SAS Example

(musclemass.sas)

(Problem 1.27 in KNNL)

Muscle mass is expected to decrease with age. The study explores this relationship in women (n = 60).

15 women were randomly selected from each of four age groups: 40-49, 50-59, 60-69, 70-79.

We will analyze this data set assuming that

the simple linear regression model applies.


4-7

Read in the Data

For textbook files, the easiest way is to simply open the data as a text file (or through the website) and paste it into SAS using "datalines".

DATA muscle;

input mmass age;

datalines;

106 43

106 41

.....

;
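Alternatively, a minimal sketch that reads the data from a saved copy of the textbook file; the file name CH01PR27.txt is an assumption, so use whatever path you saved the data under:

DATA muscle;
  /* read from an external text file instead of pasting datalines */
  infile 'CH01PR27.txt';   /* assumed local file name for the Problem 1.27 data */
  input mmass age;
RUN;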


4-8

Produce a Scatter Plot

goptions ftitle=centb ftext=swissb htitle=3

htext=1.5 ctitle=blue ctext=black;

symbol1 v=dot c=blue ;

axis1 label=('Age (Years)');

axis2 label=(angle=90 'Muscle Mass');

PROC GPLOT data=muscle;

plot mmass*age /haxis=axis1 vaxis=axis2;

title 'Muscle Mass vs Age in women';

RUN; QUIT;

4-9

[Figure: scatter plot of muscle mass vs. age in women]

4-10

Examining Scatter Plots

Form – linear looks mostly reasonable

Direction – muscle mass seems to decrease as age increases

Strength – there is quite a bit of scatter, so the relationship is likely weak to moderate


4-11

Regression Model Goals

Estimate the difference in mean muscle mass for women differing in age by 1 year.

Produce CIs and PIs for women aged 50, 60, and 70.

Plot a 95% confidence band for the regression line.


4-12

Preliminaries

DATA slime;

age = 50; mmass = .; output;

age = 60; mmass = .; output;

age = 70; mmass = .; output;

DATA muscle; set muscle slime;

PROC PRINT; RUN;

This adds observations (with missing mmass) to the data set so that we can easily get predictions for ages 50, 60, and 70.


4-13

PROC REG

PROC REG data=muscle outest=params outseb;
  model mmass = age / clb clm cli;   /* clb: CIs for coefficients; clm: CIs for mean response; cli: prediction intervals */
  output out=mean_resp p=predicted
         stdp=SE_mean lclm=LCL_mean uclm=UCL_mean;   /* mean-response limits */
  output out=predict p=predicted
         stdi=SE_pred lcl=LCL_pred ucl=UCL_pred;     /* prediction (new-observation) limits */
  id age;
PROC PRINT data=params;
PROC PRINT data=mean_resp; where mmass=.;
PROC PRINT data=predict;   where mmass=.;
RUN;


4-14

Output (1)

Analysis of Variance

Source     DF   Sum of Squares   Mean Square   F Value   Pr > F
Model       1            11627         11627    174.06   <.0001
Error      58             3875          66.8
Total      59            15502

Root MSE   8.17318     R-Square   0.7501


4-15

Output (2)

Parameter Estimates

Variable    DF   Estimate   Std Error   t Value   Pr > |t|
Intercept    1     156.35     5.51226     28.36     <.0001
age          1      -1.19     0.09020    -13.19     <.0001

Variable    DF   95% Confidence Limits
Intercept    1   145.31257   167.38056
age          1    -1.37054    -1.00945


4-16

Interpretation

In women, muscle mass decreases by an average of 1.19 units per year.

A 95% CI for the amount of this decrease is (1.01, 1.37). In other words, the 95% CI for $\beta_1$ is $(-1.37, -1.01)$.

Note: 95% represents the probability that, for any given repetition of the experiment, the confidence interval will actually cover the true value.


4-17

Output (3)

Obs   age   predict   SE_mean   LCL_mean   UCL_mean
 61    50   96.8468   1.38715    94.0701    99.6235
 62    60   84.9468   1.05515    82.8347    87.0590
 63    70   73.0469   1.38911    70.2663    75.8275

Obs   age   predict   LCL_pred   UCL_pred   SE_pred
 61    50   96.8468    80.2524    113.441   8.29005
 62    60   84.9468    68.4507    101.443   8.24101
 63    70   73.0469    56.4519     89.642   8.29038


4-18

Interpretation

Prediction intervals are pretty wide, indicating that there is a large amount of variation. The estimated standard deviation (Root MSE) was 8.2.

We wouldn't be able to predict the muscle mass of a single subject very well, but we would be able to predict the average muscle mass for multiple subjects well (the SEs associated with the mean response are fairly small).
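As a check of the first prediction interval, using the fitted value and standard error at age 50 from the output above:

$96.8468 \pm t(0.975;\ 58)\,(8.29005) \approx 96.85 \pm (2.00)(8.29) = (80.25,\ 113.44)$,

which matches LCL_pred and UCL_pred for observation 61.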


4-19

Regression Plots

symbol1 v=dot c=blue;                /* raw data points */
symbol2 v=none i=rlclm95 c=green;    /* regression line with 95% confidence limits for the mean */
symbol3 v=none i=rlcli95 c=red;      /* regression line with 95% prediction limits for individuals */
PROC GPLOT data=muscle;
  plot mmass*(age age age) / haxis=axis1 vaxis=axis2 overlay;
  title2 'Confidence and Prediction Bands';
RUN; QUIT;

4-20

[Figure: scatter plot with the fitted line, 95% confidence band, and 95% prediction band overlaid]

4-21

Working-Hotelling Adjustment

The previous confidence bands were unadjusted.

To produce the W-H confidence band for the regression line, first use the F-distribution to compute W = 2.51.

For a t-distribution with 58 degrees of freedom, this corresponds to an effective alpha of about 0.01. So use 0.99 instead of 0.95 to get the adjusted confidence band.


4-22

Compute Effective Alpha

data a1;
  n = 60; alpha = 0.05; dfn = 2; dfd = n-2;
  w2 = 2*finv(1-alpha, dfn, dfd);    /* W^2 = 2 F(1-alpha; 2, n-2) */
  w = sqrt(w2);
  alphat = 2*(1-probt(w, dfd));      /* effective alpha for the t-multiplier */
  tstar = tinv(1-alphat/2, dfd);     /* check: tstar equals w */
  output;
PROC PRINT data=a1; RUN;

n    alpha   dfn   dfd   w2     w      alphat   tstar
60   0.05    2     58    6.31   2.51   0.0148   2.51234

Use 0.01 (more conservative) as the effective alpha.
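A minimal sketch of producing the adjusted band in GPLOT, assuming the same muscle data set and axis1/axis2 definitions as before: swap the 95% interpolation for 99% limits, per the effective alpha above.

symbol1 v=dot c=blue;               /* raw data points */
symbol2 v=none i=rlclm99 c=green;   /* 99% (adjusted) confidence band for the mean */
PROC GPLOT data=muscle;
  plot mmass*(age age) / haxis=axis1 vaxis=axis2 overlay;
  title2 'Working-Hotelling Adjusted Confidence Band';
RUN; QUIT;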

4-23

[Figure: scatter plot with the Working-Hotelling adjusted (99%) confidence band]

4-24

ANOVA Table

Organize the variation arithmetically.

The total (or corrected total) sum of squares is

$SS_{TOT} = SS_Y = \sum_i (Y_i - \bar{Y})^2$

Think of this as the total possible variation that might be explained by the model. The percentage of $SS_{TOT}$ that we actually explain is the coefficient of determination $R^2$.
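For the muscle mass example, from the ANOVA output: $R^2 = SS_R / SS_{TOT} = 11627/15502 \approx 0.75$, matching the R-Square value SAS reports.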


4-25

Partitioning SSTOT

Two sources: MODEL (variation explained by the regression) and ERROR (unexplained or residual variation).

$Y_i - \bar{Y} = (\hat{Y}_i - \bar{Y}) + (Y_i - \hat{Y}_i)$

$\sum_i (Y_i - \bar{Y})^2 = \sum_i (\hat{Y}_i - \bar{Y})^2 + \sum_i (Y_i - \hat{Y}_i)^2$

i.e. $SS_{TOT} = SS_R + SS_E$ (the cross terms cancel: see page 65).
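In the muscle mass example, the ANOVA output partitions as $15502 = 11627 + 3875$, i.e. $SS_{TOT} = SS_R + SS_E$.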

4-26

[Figure]

4-27

Total Sum of Squares

Ignore X while predicting Y: the best predictor is $\bar{Y}$. $SS_{TOT}$ is the sum of squared deviations from this predictor:

$SS_{TOT} = SS_Y = \sum_i (Y_i - \bar{Y})^2$

Degrees of freedom is $n - 1$, since we estimate $\bar{Y}$.

Mean square: $MS_{TOT} = SS_{TOT}/(n-1)$ is the usual estimate of the variance when there is no predictor term involved.
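For the muscle mass data, $MS_{TOT} = 15502/59 \approx 263$, which is just the sample variance of muscle mass ignoring age.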


4-28

Model Sum of Squares

Variation explained by the regression model:

$SS_R = \sum_i (\hat{Y}_i - \bar{Y})^2$

Degrees of freedom is 1, since we estimate the slope parameter (the intercept parameter is already taken care of in the estimation of $\bar{Y}$).

Mean square: $MS_R = SS_R / df_R$


4-29

Error Sum of Squares

Unexplained variation:

$SS_E = \sum_i (Y_i - \hat{Y}_i)^2$

Degrees of freedom is $n - 2$ (the difference between the total and model degrees of freedom).

Mean square error: $MS_E = SS_E / df_E$.

This is the best estimate of the variance of Y once we condition on the explanatory variable(s).
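In the example, $MS_E = 3875/58 \approx 66.8$, and the Root MSE is $\sqrt{66.8} \approx 8.17$, matching the SAS output.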


4-30

ANOVA Table

Source               df      SS                                        MS
Regression (Model)   1       $SS_R = \sum_i (\hat{Y}_i - \bar{Y})^2$   $SS_R / df_R$
Error                n - 2   $SS_E = \sum_i (Y_i - \hat{Y}_i)^2$       $SS_E / df_E$
Total                n - 1   $SS_{TO} = \sum_i (Y_i - \bar{Y})^2$      $SS_{TO} / df_T$


4-31

Expected Mean Squares

Mean squares are random variables since the Y's are random variables. We can compute:

$E\{MSR\} = \sigma^2 + \beta_1^2 \sum_i (X_i - \bar{X})^2 = \sigma^2 + \beta_1^2 SS_X$

$E\{MSE\} = \sigma^2$

When $H_0: \beta_1 = 0$ is true, $E\{MSR\}$ and $E\{MSE\}$ are identical, and in particular their ratio is 1.


4-32

F-test

Under the null, $F = MSR/MSE$ has an F distribution with 1 and $n - 2$ degrees of freedom.

When $H_0: \beta_1 = 0$ is false, MSR tends to be larger, so we would want to reject the null when F is large.

Generally, reject if F is bigger than the critical value (or, in practice, when the p-value is less than the significance level).
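In the muscle mass example, $F = MSR/MSE = 11627/66.8 \approx 174.06$, far above the critical value $F(0.95;\ 1,\ 58) \approx 4.0$, and the reported p-value is $<.0001$, so we reject $H_0: \beta_1 = 0$.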


4-33

ANOVA Table with Test

Source   df      SS    MS    F         P
Model    1       SSM   MSM   MSM/MSE   .xxx
Error    n - 2   SSE   MSE
Total    n - 1

("Model" is used here because this is what you see in SAS.)


4-34

Example (Muscle Mass)

Analysis of Variance

Source     DF   Sum of Squares   Mean Square   F Value   Pr > F
Model       1            11627         11627    174.06   <.0001
Error      58             3875          66.8
Total      59            15502

Root MSE   8.17318     R-Square   0.7501


4-35

Upcoming in Lecture 5...

General Linear Test (Section 2.8)

Coefficient of Determination / Correlation (Section 2.9)

Assessing Validity of Model Assumptions (Chapter 3)