
Mixed Models


WILEY SERIES IN PROBABILITY AND STATISTICS

Established by WALTER A. SHEWHART and SAMUEL S. WILKS

Editors: David J. Balding, Noel A. C. Cressie, Garrett M. Fitzmaurice, Harvey Goldstein, Iain M. Johnstone, Geert Molenberghs, David W. Scott, Adrian F. M. Smith, Ruey S. Tsay, Sanford Weisberg
Editors Emeriti: Vic Barnett, J. Stuart Hunter, Joseph B. Kadane, Jozef L. Teugels

A complete list of the titles in this series appears at the end of this volume.


Mixed Models

Theory and Applications with R

Second Edition

EUGENE DEMIDENKO

Dartmouth College

WILEY


Copyright © 2013 by John Wiley & Sons, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data:

Demidenko, Eugene, 1948–
Mixed models : theory and applications with R / Eugene Demidenko. — Second [edition].
p. cm. — (Wiley series in probability and statistics ; 893)
Includes bibliographical references and index.
ISBN 978-1-118-09157-9 (hardback)
1. Analysis of variance. I. Title.
QA279.D457 2013
519.538—dc23
2013001306

Printed in the United States of America.

10 9 8 7 6 5 4 3 2 1


To my family


Contents

Preface xvii

Preface to the Second Edition xix

R Software and Functions xx

Data Sets xxii

Open Problems in Mixed Models xxiii

1 Introduction: Why Mixed Models? 1

1.1 Mixed effects for clustered data 2
1.2 ANOVA, variance components, and the mixed model 4
1.3 Other special cases of the mixed effects model 6
1.4 Compromise between Bayesian and frequentist approaches 7
1.5 Penalized likelihood and mixed effects 9
1.6 Healthy Akaike information criterion 11
1.7 Penalized smoothing 13
1.8 Penalized polynomial fitting 16
1.9 Restraining parameters, or what to eat 18
1.10 Ill-posed problems, Tikhonov regularization, and mixed effects 20
1.11 Computerized tomography and linear image reconstruction 23
1.12 GLMM for PET 26
1.13 Maple leaf shape analysis 29
1.14 DNA Western blot analysis 31
1.15 Where does the wind blow? 33
1.16 Software and books 36
1.17 Summary points 37

2 MLE for the LME Model 41
2.1 Example: weight versus height 42
2.1.1 The first R script 43
2.2 The model and log-likelihood functions 45
2.2.1 The model 45
2.2.2 Log-likelihood functions 48
2.2.3 Dimension-reduction formulas 49
2.2.4 Profile log-likelihood functions 53
2.2.5 Dimension-reduction GLS estimate 55
2.2.6 Restricted maximum likelihood 56
2.2.7 Weight versus height (continued) 59
2.3 Balanced random-coefficient model 60
2.4 LME model with random intercepts 64
2.4.1 Balanced random-intercept model 67
2.4.2 How random effect affects the variance of MLE 71
2.5 Criterion for MLE existence 72
2.6 Criterion for the positive definiteness of matrix D 74
2.6.1 Example of an invalid LME model 75
2.7 Pre-estimation bounds for variance parameters 77
2.8 Maximization algorithms 79
2.9 Derivatives of the log-likelihood function 81
2.10 Newton-Raphson algorithm 82
2.11 Fisher scoring algorithm 85
2.11.1 Simplified FS algorithm 86
2.11.2 Empirical FS algorithm 86
2.11.3 Variance-profile FS algorithm 87
2.12 EM algorithm 88
2.12.1 Fixed-point algorithm 92
2.13 Starting point 93
2.13.1 FS starting point 93
2.13.2 FP starting point 94
2.14 Algorithms for restricted MLE 95
2.14.1 Fisher scoring algorithm 95
2.14.2 EM algorithm 96
2.15 Optimization on nonnegative definite matrices 96
2.15.1 How often can one hit the boundary? 97
2.15.2 Allow matrix D to be not nonnegative definite 98
2.15.3 Force matrix D to stay nonnegative definite 103
2.15.4 Matrix D reparameterization 104
2.15.5 Criteria for convergence 105
2.16 lmeFS and lme in R 107
2.17 Appendix: proof of the existence of MLE 111
2.18 Summary points 114

3 Statistical Properties of the LME Model 117
3.1 Introduction 117
3.2 Identifiability of the LME model 117
3.2.1 Linear regression with random coefficients 119
3.3 Information matrix for variance parameters 120
3.3.1 Efficiency of variance parameters for balanced data 129
3.4 Profile-likelihood confidence intervals 131
3.5 Statistical testing of the presence of random effects 133
3.6 Statistical properties of MLE 137
3.6.1 Small-sample properties 137
3.6.2 Large-sample properties 140
3.6.3 ML and RML are asymptotically equivalent 144
3.7 Estimation of random effects 145
3.7.1 Implementation in R 148
3.8 Hypothesis and membership testing 151
3.8.1 Membership test 152
3.9 Ignoring random effects 154
3.10 MINQUE for variance parameters 157
3.10.1 Example: linear regression 158
3.10.2 MINQUE for σ2 160
3.10.3 MINQUE for D* 162
3.10.4 Linear model with random intercepts 165
3.10.5 MINQUE for the balanced model 165
3.10.6 lmevarMINQUE function 166
3.11 Method of moments 166
3.11.1 lmevarMM function 171
3.12 Variance least squares estimator 171
3.12.1 Unbiased VLS estimator 173
3.12.2 Linear model with random intercepts 174
3.12.3 Balanced design 174
3.12.4 VLS as the first iteration of ML 175
3.12.5 lmevarUVLS function 175
3.13 Projection on B+ space 176
3.14 Comparison of the variance parameter estimation 176
3.14.1 lmesim function 179
3.15 Asymptotically efficient estimation for β 180
3.16 Summary points 181

4 Growth Curve Model and Generalizations 185
4.1 Linear growth curve model 185
4.1.1 Known matrix D 187
4.1.2 Maximum likelihood estimation 189
4.1.3 Method of moments for variance parameters 192
4.1.4 Two-stage estimation 196
4.1.5 Special growth curve models 196
4.1.6 Unbiasedness and efficient estimation for β 200
4.2 General linear growth curve model 201
4.2.1 Example: Calcium supplementation for bone gain 202
4.2.2 Variance parameters are known 204
4.2.3 Balanced model 207
4.2.4 Likelihood-based estimation 208
4.2.5 MM estimator for variance parameters 213
4.2.6 Two-stage estimator and asymptotic properties 214
4.2.7 Analysis of misspecification 215
4.3 Linear model with linear covariance structure 219
4.3.1 Method of maximum likelihood 220
4.3.2 Variance least squares 222
4.3.3 Statistical properties 223
4.3.4 LME model for longitudinal autocorrelated data 224
4.3.5 Multidimensional LME model 229
4.4 Robust linear mixed effects model 233
4.4.1 Robust estimation of the location parameter with estimated σ and c 235
4.4.2 Robust linear regression with estimated threshold 238
4.4.3 Robust LME model 239
4.4.4 Alternative robust functions 239
4.4.5 Robust random effect model 240
4.5 Appendix: derivation of the MM estimator 241
4.6 Summary points 242

5 Meta-analysis Model 245
5.1 Simple meta-analysis model 246
5.1.1 Estimation of random effects 248
5.1.2 Maximum likelihood estimation 248
5.1.3 Quadratic unbiased estimation for σ2 253
5.1.4 Statistical inference 260
5.1.5 Robust/median meta-analysis 266
5.1.6 Random effect coefficient of determination 271
5.2 Meta-analysis model with covariates 273
5.2.1 Maximum likelihood estimation 274
5.2.2 Quadratic unbiased estimation for σ2 277
5.2.3 Hypothesis testing 278
5.3 Multivariate meta-analysis model 278
5.3.1 The model 280
5.3.2 Maximum likelihood estimation 283
5.3.3 Quadratic estimation of the heterogeneity matrix 285
5.3.4 Test for homogeneity 288
5.4 Summary points 289

6 Nonlinear Marginal Model 291
6.1 Fixed matrix of random effects 292
6.1.1 Log-likelihood function 293
6.1.2 nls function in R 295
6.1.3 Computational issues of nonlinear least squares 296
6.1.4 Distribution-free estimation 297
6.1.5 Testing for the presence of random effects 298
6.1.6 Asymptotic properties 298
6.1.7 Example: log-Gompertz growth curve 299
6.2 Varied matrix of random effects 305
6.2.1 Maximum likelihood estimation 305
6.2.2 Distribution-free variance parameter estimation 308
6.2.3 GEE and iteratively reweighted least squares 309
6.2.4 Example: logistic curve with random asymptote 310
6.3 Three types of nonlinear marginal models 316
6.3.1 Type I nonlinear marginal model 317
6.3.2 Type II nonlinear marginal model 319
6.3.3 Type III nonlinear marginal model 319
6.3.4 Asymptotic properties under distribution misspecification 320
6.4 Total generalized estimating equations approach 321
6.4.1 Robust feature of total GEE 323
6.4.2 Expected Newton-Raphson algorithm for total GEE 323
6.4.3 Total GEE for the mixed effects model 324
6.4.4 Total GEE for the LME model 324
6.4.5 Example (continued): log-Gompertz curve 325
6.4.6 Photodynamic tumor therapy 326
6.5 Summary points 328

7 Generalized Linear Mixed Models 331
7.1 Regression models for binary data 332
7.1.1 Approximate relationship between logit and probit 336
7.1.2 Computation of the logistic-normal integral 338
7.1.3 Gauss-Hermite numerical quadrature for multidimensional integrals in R 350
7.1.4 Log-likelihood and its numerical properties 352
7.1.5 Unit step algorithm 353
7.2 Binary model with subject-specific intercept 355
7.2.1 Consequences of ignoring a random effect 357
7.2.2 ML logistic regression with a fixed subject-specific intercept 358
7.2.3 Conditional logistic regression 359
7.3 Logistic regression with random intercept 362
7.3.1 Maximum likelihood 362
7.3.2 Fixed sample likelihood approximation 368
7.3.3 Quadratic approximation 371
7.3.4 Laplace approximation to the likelihood 371
7.3.5 VARLINK estimation 374
7.3.6 Beta-binomial model 376
7.3.7 Statistical test of homogeneity 378
7.3.8 Asymptotic properties 381
7.4 Probit model with random intercept 382
7.4.1 Laplace and PQL approximations 382
7.4.2 VARLINK estimation 383
7.4.3 Heckman method for the probit model 383
7.4.4 Generalized estimating equations approach 384
7.4.5 Implementation in R 386
7.5 Poisson model with random intercept 386
7.5.1 Poisson regression for count data 387
7.5.2 Clustered count data 388
7.5.3 Fixed intercepts 389
7.5.4 Conditional Poisson regression 390
7.5.5 Negative binomial regression 391
7.5.6 Normally distributed intercepts 394
7.5.7 Exact GEE for any distribution 396
7.5.8 Exact GEE for balanced count data 397
7.5.9 Heckman method for the Poisson model 398
7.5.10 Tests for overdispersion 399
7.5.11 Implementation in R 400
7.6 Random intercept model: overview 401
7.7 Mixed models with multiple random effects 402
7.7.1 Multivariate Laplace approximation 403
7.7.2 Logistic regression 403
7.7.3 Probit regression 407
7.7.4 Poisson regression 408
7.7.5 Homogeneity tests 410
7.8 GLMM and simulation methods 412
7.8.1 General form of GLMM via the exponential family 412
7.8.2 Monte Carlo for ML 413
7.8.3 Fixed sample likelihood approach 413
7.9 GEE for clustered marginal GLM 416
7.9.1 Variance least squares 418
7.9.2 Limitations of the GEE approach 420
7.9.3 Marginal or conditional model? 422
7.9.4 Implementation in R 423
7.10 Criteria for MLE existence for a binary model 424
7.11 Summary points 429

8 Nonlinear Mixed Effects Model 433
8.1 Introduction 433
8.2 The model 434
8.3 Example: height of girls and boys 437
8.4 Maximum likelihood estimation 439
8.5 Two-stage estimator 442
8.5.1 Maximum likelihood estimation 445
8.5.2 Method of moments 445
8.5.3 Disadvantage of two-stage estimation 446
8.5.4 Further discussion 446
8.5.5 Two-stage method in the presence of a common parameter 447
8.6 First-order approximation 448
8.6.1 GEE and MLE 448
8.6.2 Method of moments and VLS 449
8.7 Lindstrom-Bates estimator 450
8.7.1 What if matrix D is not positive definite? 452
8.7.2 Relation to the two-stage estimator 452
8.7.3 Computational aspects of penalized least squares 453
8.7.4 Implementation in R: the function nlme 454
8.8 Likelihood approximations 456
8.8.1 Linear approximation of the likelihood at zero 456
8.8.2 Laplace and PQL approximations 457
8.9 One-parameter exponential model 459
8.9.1 Maximum likelihood estimator 459
8.9.2 First-order approximation 460
8.9.3 Two-stage estimator 461
8.9.4 Lindstrom-Bates estimator 463
8.10 Asymptotic equivalence of the TS and LB estimators 466
8.11 Bias-corrected two-stage estimator 468
8.12 Distribution misspecification 470
8.13 Partially nonlinear marginal mixed model 473
8.14 Fixed sample likelihood approach 474
8.14.1 Example: one-parameter exponential model 475
8.15 Estimation of random effects and hypothesis testing 476
8.15.1 Estimation of the random effects 476
8.15.2 Hypothesis testing for the NLME model 477
8.16 Example (continued) 478
8.17 Practical recommendations 480
8.18 Appendix: Proof of theorem on equivalence 481
8.19 Summary points 484

9 Diagnostics and Influence Analysis 487
9.1 Introduction 487
9.2 Influence analysis for linear regression 488
9.3 The idea of infinitesimal influence 491
9.3.1 Data influence 491
9.3.2 Model influence 492
9.4 Linear regression model 493
9.4.1 Influence of the dependent variable 494
9.4.2 Influence of the continuous explanatory variable 495
9.4.3 Influence of the binary explanatory variable 497
9.4.4 Influence on the predicted value 497
9.4.5 Case or group deletion 498
9.4.6 R code 500
9.4.7 Influence on regression characteristics 501
9.4.8 Example 1: Women's body fat 503
9.4.9 Example 2: gypsy moth study 507
9.5 Nonlinear regression model 510
9.5.1 Influence of the dependent variable on the LSE 510
9.5.2 Influence of the explanatory variable on the LSE 510
9.5.3 Influence on the predicted value 511
9.5.4 Influence of case deletion 511
9.5.5 Example 3: logistic growth curve model 512
9.6 Logistic regression for binary outcome 515
9.6.1 Influence of the covariate on the MLE 516
9.6.2 Influence on the predicted probability 516
9.6.3 Influence of the case deletion on the MLE 517
9.6.4 Sensitivity to misclassification 517
9.6.5 Example: Finney data 522
9.7 Influence of correlation structure 524
9.8 Influence of measurement error 525
9.9 Influence analysis for the LME model 528
9.9.1 Example: Weight versus height 532
9.10 Appendix: MLE derivative with respect to σ2 534
9.11 Summary points 535

10 Tumor Regrowth Curves 539
10.1 Survival curves 541
10.2 Double-exponential regrowth curve 543
10.2.1 Time to regrowth, TR 546
10.2.2 Time to reach specific tumor volume, T* 547
10.2.3 Doubling time, TD 547
10.2.4 Statistical model for regrowth 548
10.2.5 Variance estimation for tumor regrowth outcomes 549
10.2.6 Starting values 550
10.2.7 Example: chemotherapy treatment comparison 551
10.3 Exponential growth with fixed regrowth time 557
10.3.1 Statistical hypothesis testing 558
10.3.2 Synergistic or supra-additive effect 558
10.3.3 Example: combination of treatments 559
10.4 General regrowth curve 563
10.5 Double-exponential transient regrowth curve 564
10.5.1 Example: treatment of cellular spheroids 570
10.6 Gompertz transient regrowth curve 571
10.6.1 Example: tumor treated in mice 572
10.7 Summary points 574

11 Statistical Analysis of Shape 577
11.1 Introduction 577
11.2 Statistical analysis of random triangles 579
11.3 Face recognition 582
11.4 Scale-irrelevant shape model 583
11.4.1 Random effects scale-irrelevant shape model 585
11.4.2 Scale-irrelevant shape model on the log scale 586
11.4.3 Fixed or random size? 587
11.5 Gorilla vertebrae analysis 587
11.6 Procrustes estimation of the mean shape 589
11.6.1 Polygon estimation 592
11.6.2 Generalized Procrustes model 592
11.6.3 Random effects shape model 593
11.6.4 Random or fixed (Procrustes) effects model? 594
11.6.5 Maple leaf analysis 594
11.7 Fourier descriptor analysis 596
11.7.1 Analysis of a star shape 596
11.7.2 Random Fourier descriptor analysis 602
11.7.3 Potato project 604
11.8 Summary points 605

12 Statistical Image Analysis 607
12.1 Introduction 607
12.1.1 What is a digital image? 608
12.1.2 Image arithmetic 609
12.1.3 Ensemble and repeated measurements 609
12.1.4 Image and spatial statistics 610
12.1.5 Structured and unstructured images 610
12.2 Testing for uniform lighting 610
12.2.1 Estimating light direction and position 612
12.3 Kolmogorov-Smirnov image comparison 614
12.3.1 Kolmogorov-Smirnov test for image comparison 614
12.3.2 Example: histological analysis of cancer treatment 615
12.4 Multinomial statistical model for images 618
12.4.1 Multinomial image comparison 620
12.5 Image entropy 621
12.5.1 Reduction of a gray image to binary 623
12.5.2 Entropy of a gray image and histogram equalization 623
12.6 Ensemble of unstructured images 625
12.6.1 Fixed-shift model 626
12.6.2 Random-shift model 628
12.6.3 Mixed model for gray images 631
12.6.4 Two-stage estimation 633
12.6.5 Schizophrenia MRI analysis 635
12.7 Image alignment and registration 638
12.7.1 Affine image registration 641
12.7.2 Weighted sum of squares 642
12.7.3 Nonlinear transformations 643
12.7.4 Random registration 643
12.7.5 Linear image interpolation 644
12.7.6 Computational aspects 645
12.7.7 Derivative-free algorithm for image registration 646
12.7.8 Example: clock alignment 647
12.8 Ensemble of structured images 650
12.8.1 Fixed affine transformations 650
12.8.2 Random affine transformations 651
12.9 Modeling spatial correlation 652
12.9.1 Toeplitz correlation structure 654
12.9.2 Simultaneous estimation of variance and transform parameters 656
12.10 Summary points 658

13 Appendix: Useful Facts and Formulas 661
13.1 Basic facts of asymptotic theory 661
13.1.1 Central Limit Theorem 661
13.1.2 Generalized Slutsky theorem 662
13.1.3 Pseudo-maximum likelihood 664
13.1.4 Estimating equations approach and the sandwich formula 665
13.1.5 Generalized estimating equations approach 667
13.2 Some formulas of matrix algebra 668
13.2.1 Some matrix identities 668
13.2.2 Formulas for generalized matrix inverse 668
13.2.3 Vec and vech functions; duplication matrix 669
13.2.4 Matrix differentiation 670
13.3 Basic facts of optimization theory 672
13.3.1 Criteria for unimodality 673
13.3.2 Criteria for global optimum 674
13.3.3 Criteria for minimum existence 674
13.3.4 Optimization algorithms in statistics 675
13.3.5 Necessary condition for optimization and criteria for convergence 678

References 681

Index 711


Preface


Technological advances change the world, and statistics is no exception. The cornerstone of classical statistics is the notion of sample. Today, data are richer: We may have repeated measurements with thousands of clusters; data may come in the form of shapes or images. This book is about statistical analysis of data that constitute a sample of samples. In the first ten chapters we discuss statistical models when data come in traditional form as a sequence of numbers. Chapter 11 deals with a sample (ensemble) of shapes, and in Chapter 12 we discuss how to analyze an ensemble of images.

We take a statistical model-based approach to analyzing data; the method of analysis then follows from the model. Although the method sometimes comes first, the model-based approach has obvious advantages: Assumptions are clearly formulated, and the properties of several methods can be studied and compared. For example, least squares is a method of fitting, but its pros and cons can be fully understood only when a statistical model is put forward to describe how the observations are obtained. Then least squares is deduced, for example, from maximum likelihood.

Statistical treatment is carried out under a unifying mixed effects approach. This approach proves fruitful not only for analyzing complex clustered data (a sample of samples) but also as a statistical model for penalization and as common ground for the Bayesian and frequentist camps.

Use of the mixed modeling technique in shape and image analysis is exciting and promising. Much work remains to reveal the full power of this statistical approach to these nontraditional statistical data.

The book is divided into three parts. The first eight chapters cover the theory of mixed models: the linear mixed effects (LME) model, the generalized linear mixed model (GLMM), and the nonlinear mixed effects (NLME) model. In Chapter 9 we discuss methods of model diagnostics and influence analysis. The final three chapters are devoted to applications: tumor regrowth, shape, and image analysis. Major results and points of discussion in each chapter are written in lay language and collected in Summary Points sections so that the reader can get a quick chapter overview.

I look forward to hearing from readers and invite them to visit the book web site at

http://www.dartmouth.edu/~eugened

where some additional information with data and images is presented.

I would like to thank the many people I worked with on various projects that have led up to this book. First, I would like to mention my long-term collaboration with Thérèse Stukel and Tor Tosteson and thank them for their support. I am grateful to Harold Swartz and Jack Hoopes for the exposure to biological problems, and to the team led by Keith Paulsen, including Alex Hartov, Paul Meaney, and Brian Pogue, all from Dartmouth, who introduced me to the world of image reconstruction. Many thanks to John Baron, Margaret Karagas, and Mark Israel for creating a friendly scientific atmosphere. I am grateful to Ed Vonesh for discussion and his helpful comments.

Finally, thanks to Scientific WorkPlace, a WYSIWYG version of the LaTeX typesetting system (http://www.mackichan.com)—it is hard to imagine writing this book without this software.

Eugene Demidenko

Dartmouth College
Hanover, New Hampshire
January 2004


Preface to the Second Edition

Time has proved that the mixed model is an indispensable tool in studying multilevel and clustered data. The mixed model has become one of the mainstreams of modern statistics, on both the theoretical and practical fronts. Several books on the topic have been published since the first edition; see Section 1.16 for a comprehensive list. Most of these books target applications of mixed models and illustrate the examples with popular statistical software packages, such as SAS and R. This book has a distinct theoretical and research flavor. It is intended to explain what is "under the hood" of the mixed model methodology. In particular, it may be used for educational purposes by graduate and Ph.D. students in statistics.

Two major additions have been made in the second edition:

• Each section ends with a set of problems that should be important for an active understanding of the material. There are two types of problems: unmarked problems are regular problems, and problems marked with an asterisk are more difficult and broader in scope. Usually, they involve an analytical derivation with further empirical confirmation through simulations. In many cases, I deliberately left the solution plan open so that students, together with their instructors, could use their own interpretation and address questions to different depths. Some problems could be used for graduate or even Ph.D. research.

• Most parts of the theoretical material and the methods of estimation are accompanied by the respective R code. While the first edition used S-Plus/S+, the second edition switches to the R language. The data sets and R code can be downloaded from the author's web site,

www.dartmouth.edu/~eugened

It is suggested that they be saved on the hard drive in the directory

C:\MixedModels\

with a subdirectory that corresponds to the chapter in the book. All the codes can be distributed and modified freely.

The theory of mixed models has several important unsolved problems. I hope that the list that follows will stimulate research in this direction.

I would like to hear comments and suggestions from readers, including interesting solutions to the problems, and of course typos, which can be e-mailed to me at eugened@dartmouth.edu.

Eugene Demidenko

Hanover, New Hampshire
January 2013


R Software and Functions

lme — ML estimation of linear mixed model (Ch. 2, p. 44)
ginverse.sym — Generalized inverse of symmetric matrix (Ch. 2, p. 51)
GLSest — GLS beta-estimate for LME (Ch. 2, p. 55)
lmeFS — FS ML estimation of linear mixed model (Ch. 2, p. 107)
dupp — Duplication matrix Vn (Ch. 3, p. 123)
familyl — Family-specific weight-height relationship (Ch. 3, p. 149)
lmeD — Simulations with lme (Ch. 3, p. 149)
lmevarMINQUE — MINQUE for matrix D (Ch. 3, p. 166)
lmevarMM — Method of Moments (MM) for matrix D (Ch. 3, p. 171)
lmevarUVLS — Unbiased VLS for matrix D (Ch. 3, p. 175)
lmesim — Simulations with ML, MINQUE, MM, UVLS (Ch. 3, p. 179)
calcium — Bone density in girls and boys (Ch. 4, p. 202)
PRdistance — Analysis of dental growth for girls and boys (Ch. 4, p. 227)
metaMLFS — MLE for meta-analysis model (Ch. 5, p. 252)
ups2 — Upper confidence limit for σ2 (Ch. 5, p. 262)
RobustMedianML — Robust estimation of meta-analysis model (Ch. 5, p. 269)
nlsMM — Simulations with Michaelis-Menten model (Ch. 6, p. 295)
logG — Log-Gompertz curve with nlme and xyplot (Ch. 6, p. 301)
ortree — Trunk circumference of trees fitting with nlme (Ch. 6, p. 312)
phototurn — Photodynamic tumor growth fitted with nlme (Ch. 6, p. 326)
SSlogprob — Logistic-normal integral with integrate (Ch. 7, p. 338)
gauher — Nodes and weights in GH quadrature (Ch. 7, p. 347)
LNGHint — Comparison of GH with integrate (Ch. 7, p. 347)
twoint — Example of double integration with gauher (Ch. 7, p. 350)
threeint — Example of 3D integration with gauher (Ch. 7, p. 351)
logric — Conditional logistic regression (Ch. 7, p. 361)
logMLEl — ML for logistic regression using integrate (Ch. 7, p. 368)
logMLEgh — ML for logistic regression using gauher (Ch. 7, p. 368)
logFS — Fixed sample likelihood for logistic regression (Ch. 7, p. 370)
logFSL — ML gauher fixed sample likelihood (Ch. 7, p. 370)
glmmPQL — PQL for GLMM from library MASS (Ch. 7, p. 374)
logVARLINKl — VARLINK for logistic regression (Ch. 7, p. 376)
logsim — Simulations with logistic regression (Ch. 7, p. 374)
poissfix — Poisson regression with fixed intercepts (Ch. 7, p. 400)
poissGEEl — GEE Poisson regression with VLS (Ch. 7, p. 400)
poissGEE — GEE Poisson regression with Newton iteration (Ch. 7, p. 400)
poissHeck — Heckman method for Poisson regression (Ch. 7, p. 400)
poissMLE — ML for Poisson regression using integrate (Ch. 7, p. 400)
gee — GEE for GLMM using gee package (Ch. 7, p. 423)
heightlog — nls for height of girls and boys using QLogist (Ch. 8, p. 438)
heightlog2S — Two-stage method with QLogist (Ch. 8, p. 443)
nlme — Example of nlme from package nlme (Ch. 8, p. 455)
INT.ch08 — Relative asymptotic bias using gauher (Ch. 8, p. 465)
onexpML — MLE of one-parameter exponential model (Ch. 8, p. 465)
onexpSIM — Simulations with one-parameter exp. model (Ch. 8, p. 476)
onexpFSL — FSL method for one-parameter exp. model (Ch. 8, p. 476)
nlmeFSL — FSL method for a general NLME model (Ch. 8, p. 475)
callnlmeFSL — Call to nlmeFSL (Ch. 8, p. 465)
heightlog.nlme — QLogist with different random effects models (Ch. 8, p. 455)
wbf — I-influence for women's body fat (Ch. 9, p. 501)
nlsnif — NLS influence for radioactive data (Ch. 9, p. 512)
finney — I-influence for logistic regression (Ch. 9, p. 522)
alckid — glm for alckid.dat (Ch. 9, p. 528)
coloncancer — lme for coloncancer.dat (Ch. 9, p. 533)
decE — Doubling time and SE for tumor growth (Ch. 10, p. 548)
untrlme — lme for untreated tumor growth (Ch. 10, p. 553)
trnlme — nlme for treated tumor regrowth (Ch. 10, p. 555)
R.growth — Four groups of treated mice (Ch. 10, p. 563)
rstGT — Gompertz curve with Gompertz.dat data (Ch. 10, p. 573)
randtr — Plots random triangles (Ch. 11, p. 581)
shapeh — Plots random polygons (Ch. 11, p. 591)
maple — Opens and plots maple (x, y) shape files (Ch. 11, p. 594)
potato — Opens and plots six potato images (Ch. 11, p. 604)
carpet — Analyzes carpet image data (Ch. 12, p. 612)
KSimage — Kolmogorov-Smirnov image comparison (Ch. 12, p. 616)
lena — Plots Lena canon image (Ch. 12, p. 623)
histgr — Histogram equalization (Ch. 12, p. 625)
hypoxiaRAT — Hypoxia BOLD MRI rat brain (Ch. 12, p. 629)
schiz — Normal and schizophrenia patients MRI (Ch. 12, p. 636)
bioimage — Plots 28 cancer cell histology images (Ch. 12, p. 638)
clockFIG — Clock images alignment (Ch. 12, p. 648)
clockROT — Clock image rotation (Ch. 12, p. 650)


Data Sets

Family.txt — Family height and weight data (Ch. 2, p. 44)
Calcium.txt — Bone density in girls and boys (Ch. 4, p. 202)
PRdistance.txt — Dental growth data for girls and boys (Ch. 4, p. 227)
BerkeyMeta.txt — Efficacy of tuberculosis vaccine (Ch. 5, p. 269)
TUMspher.txt — Tumor volume of spheroids (Ch. 6, p. 301)
trunktree.txt — Trunk circumference of orange trees (Ch. 6, p. 310)
phototumdat.csv — Photodynamic tumor therapy (Ch. 6, p. 326)
psdat.r — Number of visits to doctor (Ch. 7, p. 401)
height.dat — Height of girls and boys (Ch. 8, p. 437)
WomenBF.dat — Women's body fat (Ch. 9, p. 500)
coloncancer.dat — Colon cancer patients' treatment cost (Ch. 9, p. 509)
NLSNIF.dat — Radioactivity counts in rat heart (Ch. 9, p. 512)
Finney.dat — Finney data on vasoconstriction (Ch. 9, p. 522)
alckid.dat — Underage alcohol consumption (Ch. 9, p. 528)
DEregrowth.dat — Chemotherapy treatment comparison (Ch. 10, p. 551)
tumdat.csv — Four-group mice tumor regrowth (Ch. 10, p. 563)
Gompertz.dat — Gompertz transient regrowth curve (Ch. 10, p. 551)
maple*.xy — Maple leaf (x, y) coordinates (Ch. 11, p. 594)
pot*.pgm — Six-potato image data (Ch. 11, p. 604)
carpetc.pgm — Carpet light source location (Ch. 12, p. 612)
grpll.pgm — Histology images of cancer cells (Ch. 12, p. 615)
EnglishLetters.txt — Frequency of English letters (Ch. 12, p. 624)
Hypoxia\Group\*.pgm — Hypoxia BOLD MRI rat brain data (Ch. 12, p. 629)
schiz\case\*.pgm — Normal and schizophrenia patients (Ch. 12, p. 636)
\grp\*.jpg — Cancer cell histology images (Ch. 12, p. 638)
clockl.pgm — Clock image alignment (Ch. 12, p. 648)
\bark\bark*.pgm — Tree bark images (Ch. 12, p. 658)

* several files exist


Open Problems in Mixed Models

• Determining how to deal with a not positive definite covariance matrix of random effects, D, during maximum likelihood estimation algorithms. Several strategies are discussed in Section 2.15. For example, in our own R function lmeFS, we allow matrix D to be any symmetric matrix (no restriction). As studied in Section 2.16, if matrix D becomes not positive definite during iterations, function lme of library nlme in R returns an error. Function lmer of the lme4 library does not fail and returns a singular nonnegative definite matrix. The question remains how a not nonnegative definite matrix D can be projected onto the space of nonnegative definite matrices, B+ (a small R sketch of one such projection is given after this list). In particular, shall we benefit from an expensive log-likelihood maximization on the boundary of B+? This question is closely related to testing which random effects variables are statistically significant.

• Testing the variance-covariance matrix of random effects, particularly testing whether a specific random effect is not statistically significant (variance = 0). This question is closely related to the difficulties of the numerical implementation described above. The exact F-test for a linear mixed model, as a generalization of an ANOVA test, is suggested in Section 3.5 and generalized to a nonlinear mixed model in Section 8.15.2. Tests for overdispersion in the framework of random intercepts in logistic and Poisson models are discussed in Sections 7.3.7 and 7.5.10, respectively, and the test for homogeneity in the meta-analysis model is discussed in Section 5.2.3. However, unlike its linear version, most of these tests do not yield the exact/nominal significance level in a small sample, and more work is required to eliminate or reduce this discrepancy. Even the F-test by itself may not be very powerful, and a search for a better test is urgent and practically important as a tool for mixed model criticism. Several recent papers study the alternatives, including those of Giampaoli and Singer (2009) and Li and Zhu (2013).

• Testing which variables belong to fixed effects and which variables belong to random effects. Which variables affect the mean function and which variables affect the variance of the dependent variable is not a trivial matter. Existing methods of hypothesis testing work separately with fixed and random effects. We need tests that identify the fixed and random parts of a mixed model simultaneously. Practically nothing has been done in this direction. Again, in the asymptotic setting, when the number of clusters is large, the information matrix is block diagonal, which implies that the choice of fixed or random effects can be made separately. For small N, this is not true, and therefore simultaneous variable selection is required.

• Development of mixed-model-specific information criteria to address the increasing number of parameters, such as generalizations of AIC/BIC or Mallows' Cp. We have sufficient evidence that these criteria are helpful when adjusting for an increasing number of fixed effects parameters. However, random effects parameters, namely, elements of the random effects covariance matrix, have a different nature and should not be counted in the same way as fixed effects coefficients. In Section 1.6 we suggest some variants of Akaike's criterion treating the mixed model with a large number of parameters as an inverse ill-posed problem, called healthy AIC, but much work remains to be done.

• Variable selection, or more generally, model selection in the framework of mixed models. This topic is closely related to the problems formulated previously. Three types of variable selection schemes are available: (1) fixed effects variable selection assuming that the random effects variables are known, say, random intercepts; (2) random effects variable selection assuming that the fixed effects variables are known; and (3) given a set of variables, deciding which variables go to fixed effects, which go to random effects, and which go nowhere. Only a handful of papers consider the problem, such as a recent paper by Peng and Lu (2012), in an asymptotic setting where selection is much easier. Especially important and difficult is the problem of mixed model selection when the number of potential variables is larger, sometimes much larger, than the number of clusters, as in the case of genetics data. Then, in addition to the difficulty of the variable selection criterion, a computational burden emerges.

• Power computation and sample size determination for mixed models. An important feature of a mixed model is that two sample sizes should be distinguished: the number of clusters, N, and the number of observations per cluster, n. Obviously, the number of clusters is more important because when the number of clusters goes to infinity and the number of observations per cluster is fixed, beta-estimates are consistent, but not otherwise. On the other hand, n plays a role in getting the power desired. From asymptotic considerations, the power of detecting a beta-coefficient δ versus the zero null hypothesis is P = Φ(−z_{1−α/2} + δ/√V(N,n)), where Φ is the cumulative distribution function of the standard normal distribution, α is the size of the test (typically, α = 0.05), z_{1−α/2} = Φ^{−1}(1 − α/2), and V(N,n) is the variance of the beta estimate. In the particular case n = 1, we arrive at the standard formula for sample size determination in the double-sided Wald test, N = (z_P + z_{1−α/2})²V/δ² (Demidenko, 2007b). For example, in the case of a linear balanced model, the variance of the beta-coefficient is a diagonal element of the matrix N^{−1}σ²(X'(I + ZDZ')^{−1}X)^{−1}, which is a function of N and n (a small R sketch of this power computation appears after this list). A similar formula for the variance can be applied to generalized and nonlinear mixed models, but then it would require integration to obtain the Fisher information matrix. We need to extend these computations to the case of unbalanced data, where the distribution of the number of observations per cluster is a part of the statistical design.

• Design of optimal experiments with mixed models. In engineering and industrial settings, fixed and random effects design matrices may be chosen as a part of the experimental design. Although the theory of optimal design of experiments is well studied for linear and nonlinear regression models, not many theoretical developments exist in the mixed model framework. Similar to sample size determination, a mixed model leads to a nontrivial choice between design for fixed and random effects. Only a handful of papers exist on the topic (e.g., Dette et al., 2010), and more research is needed to address this important application of mixed models. The idea of adaptive design seems attractive (Glaholt et al., 2012).

• Statistical hypothesis testing using the noniterative quadratic estimators of σ2 and matrix D discussed in Sections 3.10 to 3.12. The closed-form expression for these estimators makes it possible to study their small-sample properties and to develop new statistical tests. In particular, we need extensive simulations of how these estimates behave for a small number of clusters, N, and for non-Gaussian, possibly skewed, distributions with long tails. The importance of noniterative estimates is explained by the possibility of using them for testing statistical hypotheses on D, which creates an opportunity to test the statistical significance of the random effects.

• Studying the small-sample properties of the beta-estimates and the respective statistical hypothesis tests. A paramount question regarding nonlinear statistical models involves the small-sample properties of estimators. The linear mixed effects model is the simplest nonlinear statistical model in which advances can be achieved. Currently, the t-test is used for the statistical significance of fixed effects coefficients assuming that the covariance matrix of random effects is fixed and known. We can adjust for the fact that an estimate of matrix D is used, which would lead to wider confidence intervals. More research should be done on how the confidence intervals and hypothesis testing can be improved using the profile likelihood; see Section 3.4 as an introduction.

• The Gauss-Markov theorem for the mixed model, or estimated GLS. The Gauss-Markov theorem is the cornerstone of linear models. If the scaled covariance matrix of random effects, D, is known, the generalized least squares estimator of the fixed effects coefficients, β, is BLUE (best linear unbiased estimator) and has a minimum covariance matrix (the estimator is efficient) among all (linear and nonlinear) unbiased estimators with normal observations. In Section 3.6.1 it is shown that the maximum likelihood estimator of the fixed effects is unbiased in a small sample, and it remains unbiased with many other quadratic estimates of D, such as MINQUE, MM, and VLS, discussed in Chapter 3. Thus, the set of unbiased estimators for β is nonempty, and therefore the question as to which is the most efficient unbiased estimator is valid. Several avenues can be taken to tackle this problem. For example, one may seek an estimate of D as a quadratic function of the observations that minimizes the covariance matrix of the fixed effects coefficients or its derivatives at D = 0. A good start may be the simplest random effect model, the meta-analysis model, discussed in Chapter 5.

• Develop better computational algorithms for generalized linear and nonlinear mixed models, including maximum likelihood estimation based on numerical integration. Three types of quantities are computed in traditional (or approximate) log-likelihood maximization: the values of the log-likelihood function, its derivatives (the score equations), and the Hessian (or information) matrix. It should be noted that the score equations are the most important because the MLE is defined as the solution of these equations. The Hessian estimate is less important because any positive definite matrix provides convergence of the maximization algorithm. While the existing methods concentrate on the log-likelihood approximation via integration, we should pay more attention to the score equations. The improved Laplace approximation suggested in Section 7.1.2 can be used to approximate the integral, or the Gauss-Hermite quadrature can be used (a toy example of the logistic-normal integral evaluated by numerical integration appears after this list). To speed up convergence, one can increase the number of nodes as iterations progress.

• Improve computational algorithms by recognizing that the beta-parameters, β, and the Cholesky factor elements, δ, can be combined into a single linear combination, as outlined in Section 8.14. A prototype of the algorithm is implemented in our function nlmeFSL, but more efficient C/FORTRAN code is required to see the full advantage.

• Starting values for linear and nonlinear mixed model estimation algorithms. A good choice of starting values may be crucial for a successful run, especially for generalized and nonlinear mixed models with a large number of random effects or a complicated variance-covariance structure. It seems that the most important is the choice of matrix D. Several recommendations may be explored: (1) a few iterations of the fixed-point algorithm, as discussed in Section 2.13; and (2) noniterative quadratic estimates of matrix D* and σ2, as discussed in Sections 3.10 to 3.12. Less obvious and yet more important is the choice of starting values for nonlinear mixed models. Beta-parameters may be estimated using glm and nls, and matrix D* may be estimated based on the residuals. Definitely, more work is required to study the theoretical properties of these starting values and to test these suggestions via extensive simulations involving 'difficult' data sets.

• Development of a criterion for when the MLE of D is a positive definite matrix. We have developed such a criterion for a linear mixed model in Section 2.6 and for a meta-analysis model in Section 5.1.2. A similar criterion is needed for generalized linear and nonlinear mixed models. This criterion can serve as a preliminary test of the adequacy of the mixed model and the random effects against overspecification.

• Development of an adequate stopping criterion (or criteria) for maximization of the log-likelihood function, especially with generalized linear and nonlinear models. The log-likelihood maximization may be a complex problem, especially when variables are close to collinear or when the nonlinear model has a complicated variance-covariance structure. Proximity between iterations, defined as ||a_s − a_{s−1}|| < ε, where ε is a small number, does not guarantee that a_s is the point of the global maximum. In order to claim that iterations converged to a local maximum, the gradient of the log-likelihood function at a_s must be zero. A question arises: how small is small? For instance, is 10^{−1} or 10^{−8} a small gradient? Interpretable stopping criteria were developed for nonlinear least squares and are discussed in Section 13.3.5 (a sketch of one scale-invariant criterion appears after this list). Similar criteria should be developed for mixed model maximization algorithms.

• Existence of the maximum likelihood estimate (MLE) for generalized linear and nonlinear mixed models. The MLE may not exist; thus, before starting a maximization algorithm one has to be sure that the maximizer exists. You may spend a lot of time on model testing and playing with starting values but eventually fail because the MLE simply does not exist; this is why criteria for MLE existence are important. For a linear mixed effects model, the MLE exists with probability 1, as discussed in Section 2.5. Things become more complicated for generalized and nonlinear mixed models. For example, in the case of binary dependent data, the conditions on data separation must be fulfilled, as presented in Section 7.10. You may generalize the existence criteria developed for nonlinear regression by the author (Demidenko, 1989, 2000, 2008) to the existence of the MLE in the mixed model.

• Uniqueness of the log-likelihood maximum. The log-likelihood function is not a quadratic function even for a linear mixed model because of the presence of variances and covariances. Thus, the possibility exists of converging to a local maximum. As proven by Demidenko (2000), for many nonlinear regressions the probability that the normal equation has two or more distinct solutions is positive. We need criteria by which one can test whether the maximum of the log-likelihood found is global, as suggested by Demidenko (2008). As a conjecture, the log-likelihood function for a linear mixed model is unimodal (local maximum = global maximum). For a generalized linear mixed model, such as the logistic or Poisson model, this question is open. A good start is to investigate the uniqueness of the maximum likelihood estimate for a Poisson model with random intercepts. The uniqueness criteria for a general nonlinear mixed model are even more difficult than those for a nonlinear regression but are not completely intractable. In general, criteria for uniqueness are model-dependent and mathematically challenging.
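The following R sketch illustrates the projection mentioned in the first problem above: a symmetric but indefinite matrix D is projected onto the space of nonnegative definite matrices, B+, by truncating its negative eigenvalues at zero, which gives the nearest nonnegative definite matrix in the Frobenius norm. This is a minimal illustration only, not the strategy implemented in lmeFS or in lme4; the function name projectNND is hypothetical.

  projectNND <- function(D) {
    # symmetrize, eigendecompose, and clip negative eigenvalues at zero
    e <- eigen((D + t(D)) / 2, symmetric = TRUE)
    lam <- pmax(e$values, 0)
    e$vectors %*% diag(lam, nrow = length(lam)) %*% t(e$vectors)
  }

  # Example: an indefinite matrix that could arise during iterations
  D <- matrix(c(1, 1.5, 1.5, 1), 2, 2)   # eigenvalues 2.5 and -0.5
  projectNND(D)                          # nearest nonnegative definite matrix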
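The next sketch is a hedged illustration of the power formula in the problem on power computation and sample size determination: for a balanced model with a random intercept and a single treatment covariate, the variance of the slope estimate is taken as the corresponding diagonal element of N^{−1}σ²(X'(I + ZDZ')^{−1}X)^{−1}, and the power is Φ(−z_{1−α/2} + δ/√V(N,n)). The function name powerLME and the particular design are assumptions made for this example, not code from the book.

  powerLME <- function(N, n, delta, sigma2, d, alpha = 0.05) {
    X <- cbind(1, rep(c(0, 1), length.out = n))  # per-cluster fixed-effects design
    Z <- matrix(1, n, 1)                         # random-intercept design
    Vi <- diag(n) + d * tcrossprod(Z)            # I + Z D Z' (scaled by sigma2)
    covbeta <- (sigma2 / N) * solve(t(X) %*% solve(Vi) %*% X)
    V <- covbeta[2, 2]                           # variance of the slope estimate
    pnorm(-qnorm(1 - alpha / 2) + delta / sqrt(V))
  }

  powerLME(N = 50, n = 4, delta = 0.5, sigma2 = 1, d = 0.5)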
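As a toy example for the problem on better computational algorithms, the one-dimensional logistic-normal integral that underlies the GLMM log-likelihood and its score equations can be evaluated in base R with integrate(); this is only an illustration and is not the book's gauher code. The function name logitnorm is hypothetical.

  logitnorm <- function(a, s2) {
    # E[ 1/(1 + exp(-(a + u))) ] with u ~ N(0, s2)
    integrand <- function(u) plogis(a + u) * dnorm(u, sd = sqrt(s2))
    integrate(integrand, lower = -Inf, upper = Inf)$value
  }

  logitnorm(a = 0.7, s2 = 1.5)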
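Finally, a small sketch related to the stopping-criterion problem: instead of asking whether a raw gradient of 10^{−1} or 10^{−8} is "small," one may monitor a scale-invariant relative gradient and stop when it drops below a tolerance. This is one possible interpretable criterion in the spirit of Section 13.3.5, not a prescription from the book; relgrad is a hypothetical helper.

  relgrad <- function(grad, a, loglik) {
    # maximum relative gradient: each component scaled by the parameter value
    # and by the magnitude of the log-likelihood (guarded away from zero)
    max(abs(grad * a / max(abs(loglik), 1)))
  }

  # e.g., inside a maximization loop: stop when relgrad(g, a, l) < 1e-8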

It should be noted that some literature exists that deals with some of the problems outlined above. We have deliberately not tried to mention all existing publications in these directions because it would require much more space. Therefore, an important part of advancing along the lines of these problems will be a careful review of work already done.
