1 Date Name, department Lecture 7 Model Checking for Linear Mixed Models for Longitudinal Data Ziad Taib Biostatistics, AZ MV, CTH May 2009



Outline of lecture 7

1. Introduction to model checking

2. Model checking for the linear model

3. Model checking for the linear mixed models for longitudinal data


1. Introduction to model checking

The process of statistical analysis might take the form

[Diagram: Model class, Data → Select → Some models → Summarize → Conclusions → Stop]


In the above process, however, even after a careful selection of the model class, the data themselves may indicate that the particular model is unsuitable. It therefore seems reasonable to add model checking to the original process. The new process of statistical analysis is

[Diagram: the process above (Model class, Data → Select → Some models → Summarize → Conclusions → Stop) with an added Model checking step]


The inadequacies revealed by model checking can take two forms, and detecting each of them is part of the technique of model checking.

1. The detection of systematic discrepancies. For example, the data as a whole may show some systematic departure from the fitted model. Informal checking using residuals is an example of this type.

2. The detection of isolated discrepancies. A few data values may be discrepant from the rest. Such values can be detected using measures of leverage or measures of influence.


2. The linear model

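Model checking for linear models mainly uses the following statistics:

The fitted values: $\hat{Y}_{n\times 1} = X_{n\times p}\,\hat{\beta}_{p\times 1}$

The mean residual sum of squares: $s^2 = \dfrac{(Y - X\hat{\beta})^{T}(Y - X\hat{\beta})}{n - p}$

The residuals: $e = Y - \hat{Y}$

Section 3 extends these ideas to the linear mixed model $Y_i = X_i\beta + Z_i b_i + \varepsilon_i,\ i = 1, \dots, n$.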
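As a minimal sketch, assuming NumPy and a full-rank design matrix, the three statistics above can be computed as follows. The function name `linear_model_diagnostics` and the simulated data are hypothetical, for illustration only.

```python
import numpy as np

def linear_model_diagnostics(X, Y):
    """Return beta_hat, fitted values, residuals and s^2 for Y = X beta + error."""
    n, p = X.shape
    beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)  # least-squares estimate
    fitted = X @ beta_hat                               # Y_hat = X beta_hat
    resid = Y - fitted                                  # e = Y - Y_hat
    s2 = resid @ resid / (n - p)                        # mean residual sum of squares
    return beta_hat, fitted, resid, s2

# Hypothetical simulated data: a straight line plus normal noise.
rng = np.random.default_rng(0)
x = np.arange(10.0)
X = np.column_stack([np.ones_like(x), x])
Y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=x.size)

beta_hat, fitted, resid, s2 = linear_model_diagnostics(X, Y)
print(beta_hat, s2)
```

Plotting `resid` against `fitted` gives the residual-versus-mean check discussed below.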


Residual checking

Plot the residuals against the mean.

[Figure: residuals versus mean]


Checks for Isolated Departures from the Model

In the case of the standard linear model, Cook's distance can be used to assess the influence of observation i by comparing the parameter estimate from all the data with the estimate obtained without the contribution of the i-th observation:

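$C_i = \dfrac{(\hat{\beta} - \hat{\beta}_{(i)})^{T} X^{T}X\,(\hat{\beta} - \hat{\beta}_{(i)})}{p\,s^2} = \dfrac{(\hat{Y} - \hat{Y}_{(i)})^{T}(\hat{Y} - \hat{Y}_{(i)})}{p\,s^2}, \quad i = 1, \dots, n,$

where $\hat{\beta}_{(i)}$ is the estimate computed without observation $i$ and $\hat{Y}_{(i)} = X\hat{\beta}_{(i)}$.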
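A minimal NumPy sketch of Cook's distance, computed two ways: by literally deleting each observation and refitting, as in the definition above, and by the standard closed form $C_i = e_i^2 h_{ii} / \{p\,s^2 (1 - h_{ii})^2\}$, where $h_{ii}$ is the i-th diagonal element of the hat matrix $X(X^TX)^{-1}X^T$. The function names and the planted outlier are hypothetical.

```python
import numpy as np

def cooks_distance_refit(X, Y):
    """Cook's distance by deleting each observation and refitting."""
    n, p = X.shape
    beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta_hat
    s2 = resid @ resid / (n - p)
    XtX = X.T @ X
    C = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                      # drop observation i
        beta_i, *_ = np.linalg.lstsq(X[keep], Y[keep], rcond=None)
        d = beta_hat - beta_i
        C[i] = d @ XtX @ d / (p * s2)
    return C

def cooks_distance_closed_form(X, Y):
    """Same quantity from the leverages h_ii, without refitting."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)             # hat matrix
    h = np.diag(H)
    resid = Y - H @ Y
    s2 = resid @ resid / (n - p)
    return resid**2 * h / (p * s2 * (1.0 - h) ** 2)

# Hypothetical data: the last point has high leverage and a shifted response.
rng = np.random.default_rng(1)
x = np.append(np.arange(10.0), 20.0)
X = np.column_stack([np.ones_like(x), x])
Y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=x.size)
Y[-1] += 15.0

C_refit = cooks_distance_refit(X, Y)
C_closed = cooks_distance_closed_form(X, Y)
print(np.round(C_closed, 3))
```

The closed form avoids n refits, which matters for large data sets; the refit version mirrors the definition directly.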


3. Model checking in linear mixed models


3.1 Model selection: likelihood

When choosing between different models, we want to be able to decide which model fits our data best. If the models compared are nested, we can perform a likelihood ratio test, whose test statistic has an approximate $\chi^2$ distribution:

$-2\left(\log L_1 - \log L_2\right) \sim \chi^2_{DF}$

where $L_1$ and $L_2$ are the maximized likelihoods of the first (nested, smaller) and the second (larger) model, respectively, and DF, the degrees of freedom, is the difference in the number of parameters between the two models.
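A minimal pure-Python sketch of the test, assuming the two maximized log-likelihoods are already available (the values below are hypothetical) and that DF is a positive integer. Instead of a statistics library, the $\chi^2$ tail probability uses the standard recurrence $Q(x; k+2) = Q(x; k) + (x/2)^{k/2} e^{-x/2} / \Gamma(k/2 + 1)$, started from the 1- or 2-degree-of-freedom case.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi^2_df > x) for integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        q, k = math.exp(-x / 2), 2               # chi^2 with 2 df
    else:
        q, k = math.erfc(math.sqrt(x / 2)), 1    # chi^2 with 1 df
    while k < df:
        # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1)
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q

def likelihood_ratio_test(loglik_reduced, loglik_full, df):
    """LRT statistic -2(log L1 - log L2) and its approximate chi^2 p-value."""
    stat = -2.0 * (loglik_reduced - loglik_full)
    return stat, chi2_sf(stat, df)

# Hypothetical log-likelihoods: model 1 is nested in model 2, which has 2 extra parameters.
stat, p_value = likelihood_ratio_test(loglik_reduced=-100.0, loglik_full=-98.0, df=2)
print(f"LRT statistic = {stat:.2f}, p = {p_value:.4f}")
```

A large p-value means the extra parameters of the larger model do not improve the fit significantly, so the simpler model can be kept.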


If the two models compared are not nested but contain the same number of parameters, they can be compared directly through their log-likelihoods: the model with the largest likelihood wins.

If the two models are not nested and contain different numbers of parameters, the likelihood cannot be used directly. Such models can still be compared with some of the methods described below. In general, the larger the likelihood, the better the model fits the data, and we use this when comparing models.

Since we want models that are as simple as possible, we also have to consider the number of parameters in the structures. A model with many parameters usually fits the data better than a model with fewer parameters, so a better fit alone does not justify a more complex model.


3.2 Model selection: Information criteria

It is possible to compute so-called information criteria. There are different ways to do this; here we show two of them, Akaike's information criterion (AIC) and the Bayesian information criterion (BIC). The idea behind both is to penalize models with many parameters. We present the information criteria as they are computed in SAS. The AIC value is computed as follows, where q is the number of parameters in the covariance structure:

$\mathrm{AIC} = -2\log L + 2q$

Formulated this way, a smaller value of AIC indicates a better model.

The BIC value is computed using the following formula, where q is the number of parameters in the covariance structure and n is the number of effective observations, that is, the number of individuals:

$\mathrm{BIC} = -2\log L + q\log(n)$

As with AIC, a smaller value of BIC is better than a larger one.
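The two formulas are easy to sketch. The helper name `aic_bic`, the log-likelihoods and the parameter counts below are hypothetical; n is taken as the number of subjects, as stated above.

```python
import math

def aic_bic(loglik, q, n_subjects):
    """Smaller-is-better AIC and BIC: -2 log L plus a penalty in q."""
    aic = -2.0 * loglik + 2.0 * q
    bic = -2.0 * loglik + q * math.log(n_subjects)
    return aic, bic

# Hypothetical fits of two covariance structures to data on 32 subjects:
# name -> (maximized log-likelihood, number of covariance parameters q)
candidates = {"simple": (-50.0, 2), "richer": (-47.0, 4)}
scores = {name: aic_bic(ll, q, 32) for name, (ll, q) in candidates.items()}
best_by_aic = min(scores, key=lambda name: scores[name][0])
best_by_bic = min(scores, key=lambda name: scores[name][1])
print(scores, best_by_aic, best_by_bic)
```

With these made-up numbers AIC prefers the richer structure, while BIC, whose penalty per parameter ($\log 32 \approx 3.47$) exceeds AIC's 2, prefers the simpler one.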


Model fit

It is possible to define a goodness-of-fit measure similar to $R^2$, the coefficient of determination often used for linear models. It is called the concordance correlation coefficient (CCC). Unlike AIC or BIC, the CCC does not compare the model at hand with other models, so it does not require other models to be fitted.
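The slides do not show the CCC formula for mixed models. As an assumption, the sketch below uses one common form, based on Lin's concordance correlation between observed responses $y$ and predicted values $\hat{y}$: $r_c = 1 - \sum (y - \hat{y})^2 / \{\sum (y - \bar{y})^2 + \sum (\hat{y} - \bar{\hat{y}})^2 + N(\bar{y} - \bar{\hat{y}})^2\}$. It equals 1 only when the predictions coincide with the observations. The function name and data are hypothetical.

```python
def concordance_correlation(observed, predicted):
    """Concordance correlation between observed and predicted values (Lin's form)."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    mean_y = sum(observed) / n
    mean_p = sum(predicted) / n
    s_yy = sum((y - mean_y) ** 2 for y in observed)
    s_pp = sum((p - mean_p) ** 2 for p in predicted)
    s_yp = sum((y - mean_y) * (p - mean_p) for y, p in zip(observed, predicted))
    # Algebraically equal to 1 - SSE / (s_yy + s_pp + n * (mean_y - mean_p)**2).
    return 2.0 * s_yp / (s_yy + s_pp + n * (mean_y - mean_p) ** 2)

observed = [1.0, 2.0, 3.0, 4.0]
print(concordance_correlation(observed, observed))                      # perfect agreement
print(concordance_correlation(observed, [y + 1.0 for y in observed]))   # predictions shifted by 1
```

Unlike an ordinary correlation, the CCC penalizes the constant shift in the second call, because agreement, not just linear association, is required.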

For simple linear regression we have


3.3 Residuals for linear mixed models

In model selection, we accept the model with the best likelihood value relative to its number of parameters, but we still do not know whether the chosen model is a good model, or even whether the normality assumption we have made is realistic. To check this, we can look at two types of plots for our data, normal plots and residual plots, to check:

1. normality of the residuals and the random effects

2. whether the residuals seem to have a constant variance

3. outliers
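A normal plot compares ordered residuals (or predicted random effects) with standard normal quantiles; points close to a straight line support normality. A minimal pure-Python sketch, assuming plotting positions $(k - 0.5)/n$ (one common choice among several) and hypothetical residual values:

```python
from statistics import NormalDist

def normal_plot_points(values):
    """Return (normal quantile, standardized ordered value) pairs for a normal plot."""
    ordered = sorted(values)
    n = len(ordered)
    mean = sum(ordered) / n
    sd = (sum((v - mean) ** 2 for v in ordered) / (n - 1)) ** 0.5
    z = NormalDist()  # standard normal distribution
    # Plotting position (k - 0.5) / n for the k-th smallest value, k = 1..n.
    return [(z.inv_cdf((k - 0.5) / n), (v - mean) / sd)
            for k, v in enumerate(ordered, start=1)]

residuals = [0.3, -1.2, 0.8, 0.1, -0.4, 2.9, -0.6, 0.2]  # hypothetical values
for q, r in normal_plot_points(residuals):
    print(f"{q:6.2f} {r:6.2f}")
```

A point far from the line at either end, such as the value 2.9 here, is a candidate outlier.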


The predicted values and residuals can be computed in many different ways. Some of them are described in what follows.

Recall that the general linear mixed model is of the form:

$Y_i = X_i\beta + Z_i b_i + \varepsilon_i, \quad i = 1, \dots, n$

Assume that we have ML estimates $\hat{\beta}$ of the fixed parameters and empirical Bayes (EB) predictions $\hat{b}_i$ of the random parameters.


We can “estimate” the residuals according to the following three methods:

Each type of residual is useful to evaluate some of the assumptions of the model.


Marginal residuals, $Y_i - X_i\hat{\beta}$: can be used to assess the linearity of the response with respect to the explanatory variables. Random scatter around zero is a sign of linearity.

Conditional residuals, $Y_i - X_i\hat{\beta} - Z_i\hat{b}_i$: plots of these against Y can be used to assess homogeneity of the variances as well as normality.

EBLUP, $\hat{b}_i$: plots of $\hat{b}_i$ against subject indices can be used to find outliers. Normal plots of the elements of $\hat{b}_i$ can be used to assess normality and check for outliers.
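The three kinds of residuals can be sketched with NumPy for a random-intercept model. For simplicity the sketch assumes that the covariance parameters ($D$ for the random effects and $\sigma^2$ for the within-subject errors, so that $V_i = Z_i D Z_i^T + \sigma^2 I$) are known rather than estimated by ML or REML; it then computes the generalized least squares $\hat{\beta}$ and the EB predictions $\hat{b}_i = D Z_i^T V_i^{-1}(Y_i - X_i\hat{\beta})$. The function name and the simulated data are hypothetical.

```python
import numpy as np

def lmm_residuals(X_list, Z_list, Y_list, D, sigma2):
    """GLS beta_hat, EB predictions b_hat_i, marginal and conditional residuals.

    Assumes the covariance parameters D and sigma2 are known.
    """
    V_inv = [np.linalg.inv(Z @ D @ Z.T + sigma2 * np.eye(len(Y)))
             for Z, Y in zip(Z_list, Y_list)]
    XtVX = sum(X.T @ Vi @ X for X, Vi in zip(X_list, V_inv))
    XtVY = sum(X.T @ Vi @ Y for X, Vi, Y in zip(X_list, V_inv, Y_list))
    beta_hat = np.linalg.solve(XtVX, XtVY)
    b_hat, marginal, conditional = [], [], []
    for X, Z, Y, Vi in zip(X_list, Z_list, Y_list, V_inv):
        r = Y - X @ beta_hat              # marginal residual
        b = D @ Z.T @ Vi @ r              # EB prediction (EBLUP) of b_i
        b_hat.append(b)
        marginal.append(r)
        conditional.append(r - Z @ b)     # conditional residual
    return beta_hat, b_hat, marginal, conditional

# Hypothetical data: 8 subjects, 4 occasions each, random intercept.
rng = np.random.default_rng(2)
t = np.arange(4.0)
X_list = [np.column_stack([np.ones_like(t), t]) for _ in range(8)]
Z_list = [np.ones((4, 1)) for _ in range(8)]
D, sigma2 = np.array([[1.0]]), 0.25
Y_list = [X @ np.array([2.0, 0.5]) + Z @ rng.normal(0.0, 1.0, size=1)
          + rng.normal(0.0, 0.5, size=4) for X, Z in zip(X_list, Z_list)]

beta_hat, b_hat, marginal, conditional = lmm_residuals(X_list, Z_list, Y_list, D, sigma2)
print(beta_hat, [round(float(b[0]), 2) for b in b_hat])
```

For a random intercept with variance d and $n_i$ observations, the EB prediction reduces to $n_i d / (\sigma^2 + n_i d)$ times the mean marginal residual of subject i, which makes the shrinkage toward zero explicit.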


3.4 An example

To illustrate the above procedures, we analyze data from a study conducted at the School of Dentistry of the University of São Paulo, Brazil, designed to compare a low-cost toothbrush (monoblock) with a conventional toothbrush with respect to maintaining the capacity to remove bacterial plaque under daily use. The data in the table are bacterial plaque indices obtained from 32 children aged 4 to 6 before and after tooth brushing in four evaluation sessions.


Following Singer et al. (2004), who analyzed a different data set from the same study, we considered fitting models of the form


Three possible models

(3.1)

(3.2)

(3.3)

where i denotes the subject, d the session and j the type of toothbrush; "pre" and "post" refer to measurements before and after brushing.


The model reduction procedure can be based on likelihood ratio tests (LRT) and on AIC and BIC.

The LRT p-values corresponding to the reduction of (3.1) to (3.2) and of (3.2) to (3.3) were 0.3420 and 0.1623, respectively.

The AIC and BIC for the three models are:

Model   AIC     BIC
(3.1)   95.0    68.6
(3.2)   102.8   86.7
(3.3)   105.6   92.1

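Based on these results, we adopt (3.3) to illustrate the use of the proposed diagnostic procedures. Note that each BIC in the table is smaller than the corresponding AIC, which the smaller-is-better formulas of Section 3.2 cannot produce for n = 32 subjects; the values therefore appear to follow the larger-is-better convention (log-likelihood minus a penalty), under which (3.3), with the largest AIC and BIC, is preferred, in agreement with the non-significant LRT p-values.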

[Figure 2: marginal residuals versus the logarithm of the pretreatment bacterial plaque index]

To check the linearity of the effects, we plot the marginal residuals against the logarithm of the pretreatment bacterial plaque index in Figure 2. The figure supports the regression model for the transformed response (the log of the bacterial plaque index).


The figure suggests that something is wrong with observations #12.2 and #29.4.




Any Questions?