sta305 week 31 assessing model adequacy a number of assumptions were made about the model, and these...

32
STA305 week 3 1 Assessing Model Adequacy A number of assumptions were made about the model, and these need to be verified in order to use the model for inferences. In cases where some assumptions are violated, there are difficulties \ in interpreting the results of the ANOVA. The assumptions for the one-factor model and for most of the models we will discuss in the future, consists of an assumption about the form of the model as well as assumptions about the errors.

Upload: roland-parker

Post on 16-Dec-2015

245 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: STA305 week 31 Assessing Model Adequacy A number of assumptions were made about the model, and these need to be verified in order to use the model for

STA305 week 3 1

Assessing Model Adequacy

• A number of assumptions were made about the model, and these

need to be verified in order to use the model for inferences.

• In cases where some assumptions are violated, there are difficulties \

in interpreting the results of the ANOVA.

• The assumptions for the one-factor model and for most of the

models we will discuss in the future, consists of an assumption

about the form of the model as well as assumptions about the errors.

Page 2: STA305 week 31 Assessing Model Adequacy A number of assumptions were made about the model, and these need to be verified in order to use the model for

STA305 week 3 2

Assumptions for the One-Factor Model

• Model Form: the model assumes that the mean response within

each treatment group is of the form E(Yij) = μ + τi.

• Independence: model assumes that the errors, εij, are independent of each other.

• Homoscedasticity: the model assumes that the, εij have a common variance.

• Normality: the model assumes that the, εij have a normal distribution with mean 0 and constant variance σ2.

• Note, the model might be adequate for most observations, but outliers (unusual observations) might have large impact on parameter estimates and hypothesis tests. Therefore, besides checking the model assumptions we should also check for potential outliers.

Page 3: STA305 week 31 Assessing Model Adequacy A number of assumptions were made about the model, and these need to be verified in order to use the model for

STA305 week 3 3

Residuals

• Most of the assumptions that need to be validated involve some aspect of the errors, εij.

• Notice that εij = Yij − μ − τi.

• These residuals, εij, can be estimated by the observed residuals as follows: .

• We often use standardized residuals, zij , either instead of, or alongside the residuals .

• Standardized residuals, zij, are given by:

• The standardized residuals, zij, have mean 0 and variance 1, which makes it useful and easier for detecting outlier.

ije

iijiijij YYYYe ˆˆˆ

1/

ˆ

nSS

ez

E

ijij

ije

Page 4: STA305 week 31 Assessing Model Adequacy A number of assumptions were made about the model, and these need to be verified in order to use the model for

STA305 week 3 4

Checking Model Form

• To assess whether the model form was correctly specified, we plot the standardized residuals against the treatments (factor levels).

• If the model is adequate, residuals should be centered around 0 for each treatment group.

• Residuals that are not centered at 0, or other show any other non-random patterns could indicate lack of fit.

• In slide 5, the graph on the left, shows a residual plot for a case where model form is correct.

• The graph on the right shows a residual plot for a case where model has not been correctly specified.

Page 5: STA305 week 31 Assessing Model Adequacy A number of assumptions were made about the model, and these need to be verified in order to use the model for

STA305 week 3 5

Page 6: STA305 week 31 Assessing Model Adequacy A number of assumptions were made about the model, and these need to be verified in order to use the model for

STA305 week 3 6

Checking for Constant Variance

• The plot of standardized residuals versus treatments discussed above can also be used to determine whether the variance is constant across treatments.

• The spread of the standardized residuals should be similar within each treatment group.

• For the data in both Figures on slide 5 the variance appears to be constant.

• The graph on the next slide (slide 7), is for data in which the variance is not likely to be constant.

• We often use a rule-of-thumb to determine whether this constant variance assumption is valid on not.

• The rule-of-thumb is that the ratio of the largest treatment standard deviation to the smallest should be less then or equal to 3, i.e, ...

Page 7: STA305 week 31 Assessing Model Adequacy A number of assumptions were made about the model, and these need to be verified in order to use the model for

STA305 week 3 7

Page 8: STA305 week 31 Assessing Model Adequacy A number of assumptions were made about the model, and these need to be verified in order to use the model for

STA305 week 3 8

Another Aspect of Constant Variance

• Sometimes the variance is non-constant but the differences are due to size of observation rather than treatment.

• The residuals might be larger (or smaller) for larger values of response.

• A plot of residuals, , versus fitted values, , is useful for detecting non-constant variance.

ije ijY

Page 9: STA305 week 31 Assessing Model Adequacy A number of assumptions were made about the model, and these need to be verified in order to use the model for

STA305 week 3 9

Variance-Stabilizing Transformations

• When unequal variances are detected (heteroscedasticity), we might be able to transform the data so that the assumption of common variances holds.

• For example, in the graph on slide 7, it looks like the standard

deviation is linearly related to the treatment level, that is σi = iσ.

• In this case, should have constant variances.

• Generally, we must find some function h so that and all model assumptions are satisfied.

iYY ijij '

***ijiijYh

Page 10: STA305 week 31 Assessing Model Adequacy A number of assumptions were made about the model, and these need to be verified in order to use the model for

STA305 week 3 10

Checking for Outliers

• Outliers are observations that are unusually large or unusually small.

• Outliers can be spotted on plot of standardized residuals versus

treatment.

• The previous graphs do not appear to have any outliers.

• The graph below (slide 11) contains several outliers.

Page 11: STA305 week 31 Assessing Model Adequacy A number of assumptions were made about the model, and these need to be verified in order to use the model for

STA305 week 3 11

Page 12: STA305 week 31 Assessing Model Adequacy A number of assumptions were made about the model, and these need to be verified in order to use the model for

STA305 week 3 12

Checking for Error Term Dependence

• Recall, the model requires that the εij be statistically independent for all i ≠ j.

• Dependence sometimes appears in experimental units that were tested close together in time and/or space.

• To examine whether this has happened, plot residuals in time/space order.

• If independence assumption is satisfied there should be no pattern in the residuals.

• The left graph on slide 13 is a residual plot with no order; while the right graph shows a case where there might be dependence.

Page 13: STA305 week 31 Assessing Model Adequacy A number of assumptions were made about the model, and these need to be verified in order to use the model for

STA305 week 3 13

Page 14: STA305 week 31 Assessing Model Adequacy A number of assumptions were made about the model, and these need to be verified in order to use the model for

STA305 week 3 14

Checking for Normality

• If the model assumption about the normality of the residuals holds,

then a Q-Q plot (or normal probability plot) of standardized

residuals should be approximately straight line.

• We can also plot a histogram or stem-and-leaf plot of standardized

residual to check the normality assumption.

Page 15: STA305 week 31 Assessing Model Adequacy A number of assumptions were made about the model, and these need to be verified in order to use the model for

STA305 week 3 15

Using SAS to Generate Plots

• An example will be used to demonstrate how to generate residual plots using SAS.

• An experiment was conducted in order to compare effects of auditory versus visual signals on speed of response of human subjects.

• Computer was used to present a stimulus.

• Reaction time required by subject to press a key was recorded.

• Subjects were given either visual or auditory signal that stimulus was coming.

• Time between cue and stimulus was either 5, 10, or 15 seconds.

Page 16: STA305 week 31 Assessing Model Adequacy A number of assumptions were made about the model, and these need to be verified in order to use the model for

STA305 week 3 16

• Thus, there were 6 treatments in total:

Page 17: STA305 week 31 Assessing Model Adequacy A number of assumptions were made about the model, and these need to be verified in order to use the model for

STA305 week 3 17

• The Data: In addition to recording subject response times, the order

of testing was recorded as well. Response times are in seconds and

order of testing is in brackets.

Page 18: STA305 week 31 Assessing Model Adequacy A number of assumptions were made about the model, and these need to be verified in order to use the model for

STA305 week 3 18

Creating SAS Dataset

• To create a SAS dataset in the usual manner use the following code:

data response ;input treatment response order ;cards ;1 0.204 91 0.170 10....6 0.258 4;run ;

Page 19: STA305 week 31 Assessing Model Adequacy A number of assumptions were made about the model, and these need to be verified in order to use the model for

STA305 week 3 19

Fit Model and Obtain Residuals

• Residuals are calculated by PROC GLM and can be output to a SAS dataset:

PROC GLM DATA = response ;CLASS treatment ;MODEL response = treatment ;OUTPUT OUT = resid RESIDUAL = e P=fitted ;run ;

• The OUTPUT statement creates new dataset, called resid,

containing the original data and a new variable, e (residuals)

Page 20: STA305 week 31 Assessing Model Adequacy A number of assumptions were made about the model, and these need to be verified in order to use the model for

STA305 week 3 20

Getting the Standardized Residuals

• Use PROC STANDARD to standardize residuals obtained in previous step:

proc standard data = residout=stdresid (rename=(e=z)) std=1.0;var e ;run ;

• This creates a new dataset called stdresid, and the standardized residuals are in a variable called z.

Page 21: STA305 week 31 Assessing Model Adequacy A number of assumptions were made about the model, and these need to be verified in order to use the model for

STA305 week 3 21

Plot of Residuals versus Treatment

• Several model assumptions can be checked by plotting standardized residuals versus treatment.

• The following code can be used to obtain this plot.

proc gplot data = stdresid ;plot z * treatment / vref = 0 ;run ;quit ;

• The resulting plot is given on the next slide.

Page 22: STA305 week 31 Assessing Model Adequacy A number of assumptions were made about the model, and these need to be verified in order to use the model for

STA305 week 3 22

Page 23: STA305 week 31 Assessing Model Adequacy A number of assumptions were made about the model, and these need to be verified in order to use the model for

STA305 week 3 23

Results

• Based on the plot in the previous slide the following observations can be made:

There are no outliers. The residuals are centered around 0 for each treatment. The range of standardized residuals is similar for each treatment.

Page 24: STA305 week 31 Assessing Model Adequacy A number of assumptions were made about the model, and these need to be verified in order to use the model for

STA305 week 3 24

Checking for Constant Variance

• To check for constant variance across treatment groups, use PROC MEANS to get standard deviations:

PROC MEANS data = stdresid ;CLASS treatment ;VAR z ;RUN ;

• The output is as follows: N

treatment Obs Std Dev1 3 1.21399772 3 0.72830903 3 1.46109304 3 0.77076985 3 1.67985376 3 0.9721052

Page 25: STA305 week 31 Assessing Model Adequacy A number of assumptions were made about the model, and these need to be verified in order to use the model for

STA305 week 3 25

• Note, we can use the rule of thumb for verifying constant variance as follows:

• The ratio is less than 3, so constant variance assumption is reasonable.

• Further, the plot doesn’t suggest unequal variances so it’s OK to assume variance is constant across treatments.

3.273.0

68.1

min

max s

s

Page 26: STA305 week 31 Assessing Model Adequacy A number of assumptions were made about the model, and these need to be verified in order to use the model for

STA305 week 3 26

Plot Residuals versus Fitted Values

• A second check for constant variance involves plotting standardized residuals versus fitted values.

• Fitted values were obtained from PROC GLM and are contained in the dataset stdresid.

• Use the following code to obtain graph.

proc gplot data = stdresid ;plot z * fitted / vref = 0 ;run ;quit ;

• The plot of residuals versus fitted values is given in the next slide.

• It doesn’t appear that the residuals increase/decrease with fitted value, therefore, it is OK to assume constant variance.

Page 27: STA305 week 31 Assessing Model Adequacy A number of assumptions were made about the model, and these need to be verified in order to use the model for

STA305 week 3 27

Page 28: STA305 week 31 Assessing Model Adequacy A number of assumptions were made about the model, and these need to be verified in order to use the model for

STA305 week 3 28

Plot Residuals versus Test Order

• To check for possible dependence of error terms, plot standardized

residuals versus order in which observations were obtained. Use the

following code:

proc gplot data = stdresid ;plot z * order / vref = 0 ;run ;quit ;

• The corresponding plot is given in the next slide. It does not suggest

any dependence due to testing order.

Page 29: STA305 week 31 Assessing Model Adequacy A number of assumptions were made about the model, and these need to be verified in order to use the model for

STA305 week 3 29

Page 30: STA305 week 31 Assessing Model Adequacy A number of assumptions were made about the model, and these need to be verified in order to use the model for

STA305 week 3 30

Normal Probability Plot

• Normal probability plots can be obtained from PROC UNIVARIATE in SAS:

PROC UNIVARIATE DATA = stdresid plots ;var z ;probplot/normal (mu=0 sigma=1) square;RUN ;

• The plot generated by these statements is given in the next slide.

Since plotted points fall close to straight line, it appears that the

assumption of normality is reasonable.

Page 31: STA305 week 31 Assessing Model Adequacy A number of assumptions were made about the model, and these need to be verified in order to use the model for

STA305 week 3 31

Page 32: STA305 week 31 Assessing Model Adequacy A number of assumptions were made about the model, and these need to be verified in order to use the model for

STA305 week 3 32

Concluding Remarks

• None of the assumptions was violated in this example. Hence, it is OK to fit model and proceed with inference.

• If outliers had been found, we would want to fit model with & without outliers to compare the inferences.

• If the variance was not constant, we would need to transform the data before going ahead with inference.

• If the data is not normal, we might be able to transform it so that it is normal.

• Also, if data is not normal, we could use nonparametric methods for inference (not yet discussed in course).