Worked Example Using R

Upload: matthew-maximilian-lawrence

Post on 14-Dec-2015



> plot(y~x)


> plot(epsilon1 ~ x)

This is a plot of residuals against the explanatory variable, x.


> plot(epsilon1 ~ yhat)

This is a plot of residuals against the fitted values, yhat.


Both graphs show the same thing: the residuals are scattered randomly about zero, with no systematic pattern.

Note: Since the fitted equation is approximately y = x, the fitted values are close to the x values, so both graphs are extremely similar in this case.
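The slides do not show how y, x, epsilon1 and yhat were created. A minimal sketch, assuming simulated data scattered about the line y = x and a least squares fit (the seed, sample size, noise level and the name model1 are illustrative choices, not taken from the slides):

```r
# Simulated data (an assumption; the original data are not given)
set.seed(1)
x <- runif(50, 0, 10)
y <- x + rnorm(50, sd = 1)

model1   <- lm(y ~ x)          # least squares straight-line fit
epsilon1 <- resid(model1)      # residuals: observed minus fitted
yhat     <- fitted(model1)     # fitted values

plot(y ~ x)                    # raw data
abline(model1)                 # add the fitted line
plot(epsilon1 ~ x)             # residuals against the explanatory variable
plot(epsilon1 ~ yhat)          # residuals against the fitted values
```

Because yhat is a linear function of x, the last two plots show the same point pattern and differ only in the scale of the horizontal axis.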


Model Diagnostics: Residuals and Influence


Consider again the problem of fitting the model $y_i = f(x_i) + \varepsilon_i$, $i = 1, \dots, n$.

Assume again a single continuous response variable y.

The explanatory variable x may be either a single variable, or a vector of variables. How do we assess the quality of a given fit f?


While summary statistics are helpful, they are not sufficient. Good diagnostics are typically based on case analysis, i.e. an examination of each observation in turn in relation to the fitting procedure. This leads to an examination of residuals and influence.


Residuals

The residuals should be thought of as what is left of the values of the response variable after the fit has been subtracted.

Ideally they should show no further dependence (especially no further location dependence) on x.


In general this should be investigated graphically by plotting residuals against the explanatory variable(s) x.

For linear models, we frequently compromise by plotting residuals against fitted values.
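With several explanatory variables there is one residual plot per variable, whereas the fitted values give a single summary plot. A rough sketch on simulated data (the names x1, x2 and fit, and the coefficients used to generate y, are hypothetical):

```r
# Simulated data with two explanatory variables (illustrative only)
set.seed(2)
n  <- 100
x1 <- runif(n)
x2 <- runif(n)
y  <- 1 + 2 * x1 - 3 * x2 + rnorm(n, sd = 0.5)

fit <- lm(y ~ x1 + x2)

par(mfrow = c(1, 3))
plot(x1, resid(fit), ylab = "Residuals")            # against first variable
plot(x2, resid(fit), ylab = "Residuals")            # against second variable
plot(fitted(fit), resid(fit),
     xlab = "Fitted values", ylab = "Residuals")    # single summary plot
par(mfrow = c(1, 1))
```

For a least squares fit with an intercept, the residuals are uncorrelated with the fitted values by construction, so anything visible in the third plot has to be curvature, changing spread or isolated points rather than a straight-line trend.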


In particular the residuals provide information about:

* whether the best relation has been fitted
* the relative merits of different fits
* mild, but non-random, departures from the hypothesised fit
* the magnitude of the residual variation
* the identification of outliers
* possible further dependence on x, other than through location, of the conditional distribution of y given x, in particular heterogeneity of spread of the residuals.
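The last point, heterogeneity of spread, can be illustrated with a small simulation. This is only a sketch: the data are generated so that the noise standard deviation grows with x, and every name and number below is an illustrative assumption.

```r
# Simulated data whose noise standard deviation grows with x
set.seed(3)
x <- 1:100
y <- 2 + 0.5 * x + rnorm(100, sd = 0.1 * x)

fit <- lm(y ~ x)
res <- resid(fit)

plot(x, res, ylab = "Residuals")   # expect a funnel widening to the right
abline(h = 0, lty = 2)

# Crude numerical check: residual spread in each half of the x range
sd_low  <- sd(res[x <= 50])
sd_high <- sd(res[x >  50])
c(low = sd_low, high = sd_high)
```

The location of the residuals shows no trend in x, but their spread does, which is exactly the kind of dependence a plot reveals and a single summary statistic hides.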


Example: Anscombe's Artificial Data

The R data frame anscombe is made available by

> data(anscombe)

This contains 4 artificial datasets, each of 11 observations of a continuous response variable y and a continuous explanatory variable x. The data are now plotted along with the result of fitting the least squares linear model to the corresponding dataset.
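The plots themselves are not reproduced in the transcript. One way they could be drawn, as a sketch that assumes the standard column names x1 to x4 and y1 to y4 of the built-in anscombe data frame (the axis limits are illustrative choices):

```r
data(anscombe)

par(mfrow = c(2, 2))
for (i in 1:4) {
  xi <- anscombe[[paste0("x", i)]]
  yi <- anscombe[[paste0("y", i)]]
  plot(xi, yi, xlab = paste0("x", i), ylab = paste0("y", i),
       xlim = c(3, 19), ylim = c(3, 13))   # common axes for all four panels
  abline(lm(yi ~ xi))                      # least squares line for dataset i
}
par(mfrow = c(1, 1))
```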


All the usual summary statistics related to the classical analyses of the fitted models are identical, to about two decimal places, across the 4 datasets. This includes the coefficients $\hat{a}$ and $\hat{b}$ and their standard errors and confidence intervals, together with the residual standard errors and correlation coefficients.
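The agreement of the summary statistics can be checked directly. A minimal sketch, in which the names fits and stats are hypothetical and a and b denote the intercept and slope:

```r
data(anscombe)

# Fit a straight line to each of the four datasets
fits <- lapply(1:4, function(i) {
  lm(anscombe[[paste0("y", i)]] ~ anscombe[[paste0("x", i)]])
})

# One row per dataset: intercept, slope, residual standard error, R-squared
stats <- t(sapply(fits, function(m) {
  s <- summary(m)
  c(a = unname(coef(m)[1]), b = unname(coef(m)[2]),
    sigma = s$sigma, r.squared = s$r.squared)
}))
round(stats, 2)
```

The four rows agree after rounding, with an intercept near 3 and a slope near 0.5 in every case.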


Consideration of the residuals shows that very different judgements should be made about the appropriateness of the fitted model to each of the 4 cases.

A full discussion is given by Weisberg (1985, pp. 107-108).
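The residual plots behind this judgement can be sketched as follows. The layout and plot titles are illustrative choices; only the anscombe data frame comes from the source.

```r
data(anscombe)

par(mfrow = c(2, 2))
for (i in 1:4) {
  xi  <- anscombe[[paste0("x", i)]]
  yi  <- anscombe[[paste0("y", i)]]
  fit <- lm(yi ~ xi)
  plot(fitted(fit), resid(fit), main = paste("Dataset", i),
       xlab = "Fitted values", ylab = "Residuals")
  abline(h = 0, lty = 2)       # reference line at zero
}
par(mfrow = c(1, 1))
```

Typically one would expect a patternless scatter for the first dataset, a clear curve for the second, a single large residual for the third, and for the fourth a column of points at one fitted value plus one isolated point.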


Influence

Influence measures the extent to which a fit is affected by individual observations.

A possible formal definition is the following: the influence of any observation is a measure of the difference between the fit and the fit which would be obtained if that observation were omitted.
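This definition can be applied literally by refitting the model once per observation. The sketch below uses Anscombe's third dataset and measures the difference between two fits by the sum of squared changes in the fitted values; that choice of measure, and the names full and loo_shift, are assumptions made for illustration.

```r
data(anscombe)
x3 <- anscombe$x3
y3 <- anscombe$y3
full <- lm(y3 ~ x3)            # fit to all 11 observations

# For each i: refit without observation i, then measure how far the
# fitted values at all 11 x-values move.
loo_shift <- sapply(seq_along(x3), function(i) {
  fit_i  <- lm(y ~ x, data = data.frame(x = x3[-i], y = y3[-i]))
  pred_i <- predict(fit_i, newdata = data.frame(x = x3))
  sum((fitted(full) - pred_i)^2)
})
round(loo_shift, 3)
which.max(loo_shift)           # index of the most influential observation
```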


Obviously observations with large influences require more careful checking.

Especially for linear models, influence is often measured by Cook's distance.


Cook's Distance Formula
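One standard form of Cook's distance for a linear model with p fitted parameters is sketched below. The notation is assumed here rather than taken from the slide: $\hat{y}_{j(i)}$ is the fitted value for observation j when observation i is omitted, $s^2$ is the residual mean square, $h_{ii}$ is the leverage of observation i, and $r_i = e_i / (s\sqrt{1 - h_{ii}})$ is its standardized residual.

```latex
D_i = \frac{\sum_{j=1}^{n} \left( \hat{y}_j - \hat{y}_{j(i)} \right)^2}{p \, s^2}
    = \frac{r_i^2}{p} \cdot \frac{h_{ii}}{1 - h_{ii}}
```

The first expression matches the definition of influence as the difference between the fit with and without observation i; the second shows that the distance is driven by both the size of the residual and the leverage.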


As a rule of thumb, observations for which $D_i > 1$ make a noticeable difference to the parameter estimates, and should be examined carefully for the appropriateness of their use in fitting the model.

An observation with a large residual typically also has a large influence. However, an observation with an unusual value of its explanatory variable(s) can pull a fit towards it and have a large influence despite a small residual.
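The second case can be seen in a small made-up example: ten points close to the line y = x, plus one point far out in x that lies well below that line. All values and names below are hypothetical.

```r
# Hypothetical data: ten points close to y = x, plus one point far out in x
x <- c(1:10, 30)
y <- c(1.1, 1.9, 3.2, 3.8, 5.1, 6.0, 6.9, 8.1, 9.0, 9.9, 20)

fit <- lm(y ~ x)

# Residual, leverage (hat value) and Cook's distance for each observation
diagnostics <- data.frame(
  x        = x,
  residual = round(resid(fit), 2),
  leverage = round(hatvalues(fit), 2),
  cooks_d  = round(cooks.distance(fit), 2)
)
diagnostics
```

In this sketch the last observation does not have the largest residual, because it has pulled the line towards itself, yet its leverage and Cook's distance are by far the largest.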


Example: Anscombe's third data set.

> plot(model3)

produces a sequence of diagnostic plots. The last of these shows that observation 3 has an unusually large value of Cook's distance, $D_3 = 1.39$.
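The same value can be read off numerically. In this sketch model3 is assumed to be the straight-line fit of y3 on x3 from the anscombe data frame, since the transcript does not show its definition:

```r
data(anscombe)
model3 <- lm(y3 ~ x3, data = anscombe)   # assumed definition of model3

D <- cooks.distance(model3)    # one value per observation
round(D, 2)
which(D > 1)                   # observations above the rule-of-thumb threshold

plot(model3, which = 4)        # Cook's distance plotted against observation number
```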


We now refit the data omitting this observation.

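> x5 = x3[-3]            # explanatory values without observation 3
> y5 = y3[-3]            # responses without observation 3
> model5 = lm(y5 ~ x5)   # refit by least squares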
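To see what the refit changes, the two models can be compared side by side. This is a sketch that assumes x3 and y3 are the corresponding columns of anscombe and that model3 is the fit to all 11 observations:

```r
data(anscombe)
x3 <- anscombe$x3
y3 <- anscombe$y3

model3 <- lm(y3 ~ x3)          # fit to all 11 observations (assumed definition)

x5 <- x3[-3]                   # drop observation 3
y5 <- y3[-3]
model5 <- lm(y5 ~ x5)

# Coefficients and R-squared before and after omitting observation 3
rbind(model3 = coef(model3), model5 = coef(model5))
c(model3 = summary(model3)$r.squared, model5 = summary(model5)$r.squared)
max(cooks.distance(model5))    # check for any remaining influential points
```

In this sketch the slope falls from about 0.50 to about 0.35, and the remaining ten points lie almost exactly on the refitted line.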
