Review Article
Generalized Estimating Equations in Longitudinal Data Analysis: A Review and Recent Developments

Ming Wang

Division of Biostatistics and Bioinformatics, Department of Public Health Sciences, Penn State College of Medicine, Hershey, PA 17033, USA

Correspondence should be addressed to Ming Wang; mwang@phs.psu.edu

Received 17 March 2014; Revised 29 October 2014; Accepted 16 November 2014; Published 1 December 2014

Academic Editor: Chin-Shang Li

Hindawi Publishing Corporation, Advances in Statistics, Volume 2014, Article ID 303728, 11 pages, http://dx.doi.org/10.1155/2014/303728

Copyright © 2014 Ming Wang. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Generalized Estimating Equation (GEE) is a marginal model popularly applied for longitudinal/clustered data analysis in clinical trials or biomedical studies. We provide a systematic review on GEE including basic concepts as well as several recent developments due to practical challenges in real applications. The topics including the selection of "working" correlation structure, sample size and power calculation, and the issue of informative cluster size are covered because these aspects play important roles in GEE utilization and its statistical inference. A brief summary and discussion of potential research interests regarding GEE are provided in the end.

1. Introduction

Generalized Estimating Equation (GEE) is a general statistical approach to fit a marginal model for longitudinal/clustered data analysis, and it has been popularly applied in clinical trials and biomedical studies [1–3]. One longitudinal data example can be taken from a study of orthodontic measurements on children including 11 girls and 16 boys. The response is the measurement of the distance (in millimeters) from the center of the pituitary to the pterygomaxillary fissure, which is repeatedly measured at ages 8, 10, 12, and 14 years. The primary goal is to investigate whether there exists a significant gender difference in dental growth measures and in the temporal trend as age increases [4]. For such data analysis, it is obvious that the responses from the same individual tend to be "more alike"; thus incorporating within-subject and between-subject variations into model fitting is necessary to improve the efficiency of the estimation and the power [5].

There are several simple methods for repeated data analysis, for example, ANOVA/MANOVA for repeated measures, but their limitation is the incapability of incorporating covariates. Two types of approaches, mixed-effect models and GEE [6, 7], are traditional and widely used in practice now. Of note is that these two methods have different tendencies in model fitting depending on the study objectives. In particular, the mixed-effect model is an individual-level approach that adopts random effects to capture the correlation between the observations of the same subject [7]. On the other hand, GEE is a population-level approach based on a quasilikelihood function and provides the population-averaged estimates of the parameters [8]. In this paper, we focus on the latter to provide a review and recent developments of GEE. As is well known, GEE has several defining features [9–11]. (1) The variance-covariance matrix of responses is treated as nuisance parameters in GEE, and thus this model fitting turns out to be easier than mixed-effect models [12]. In particular, if the overall treatment effect is of primary interest, GEE is preferred. (2) Under mild regularity conditions, the parameter estimates are consistent and asymptotically normally distributed even when the "working" correlation structure of responses is misspecified, and the variance-covariance matrix can be estimated by the robust "sandwich" variance estimator. (3) GEE relaxes the distribution assumption and only requires the correct specification of the marginal mean and variance as well as the link function which connects the covariates of interest and marginal means.

However, several aspects of GEE are still in controversy since Liang and Zeger [6]. Crowder addressed some issues on inconsistent estimation of within-subject correlation
coefficient under a misspecified "working" correlation structure based on asymptotic theory [7]. In addition, the estimation of the correlation coefficients using the moment-based approach is not efficient; thus the correlation matrix may not be a positive definite matrix in certain cases. Also, Liang and Zeger did not incorporate the constraints on the range of correlation, which was restricted by the marginal means, because the estimation of the correlation coefficients was simply based on Pearson residuals [6]. Chaganty and Joe discussed this issue for dependent Bernoulli random variables [13], and later Sabo and Chaganty made further explanation [14]. For example, Sutradhar and Das pointed out that, under misspecification, the correlation coefficient estimates did not converge to the true values [15]. Furthermore, for discrete random vectors, the correlation matrix was usually complicated, and it was not easy to attain multivariate distributions with specified correlation structures. These limitations lead researchers to actively work on this area to develop novel methodologies. Several alternative approaches for estimating the correlation coefficients have been proposed. For example, one method was based on "Gaussian" estimation [16, 17]; the basic idea was to estimate the correlation coefficients based on multivariate normal estimating equations, and its feature was that this estimation can ensure that the estimated correlation matrix is positive-definite. Wang and Carey proposed to estimate the correlation coefficients by differentiating the Cholesky decomposition of the working correlation matrix [18]. Also, Qu and Lindsay (2003) proposed similar Gaussian or quadratic estimating equations [19]. In particular, for binary longitudinal data, the estimation of the correlation coefficients was proposed based on conditional residuals [20–22]. Nevertheless, in this paper, the above issues are not discussed in great depth, and we assume that, under mild regularity conditions, the consistency of the parameter estimates as well as of the within-subject correlation coefficient estimate holds. Thus, three specific topics, namely, model selection, power analysis, and the issue of informative cluster size, are the main focus, and the recent developments are reviewed in the following sections.

2. Method

2.1. Notation and GEE. Suppose that longitudinal/clustered data consists of $K$ subjects/clusters. For subject/cluster $i$ ($i = 1, 2, \ldots, K$), suppose that there are $n_i$ observations and $Y_{ij}$ denotes the $j$th response ($j = 1, \ldots, n_i$), and let $X_{ij}$ denote a $p \times 1$ vector of covariates. Let $Y_i = (Y_{i1}, Y_{i2}, \ldots, Y_{in_i})'$ denote the response vector for the $i$th subject with the mean vector noted by $\mu_i = (\mu_{i1}, \mu_{i2}, \ldots, \mu_{in_i})'$, where $\mu_{ij}$ is the corresponding $j$th mean. The responses are assumed to be independent across subjects/clusters but correlated within each subject/cluster. The marginal model specifies that a relationship between $\mu_{ij}$ and the covariates $X_{ij}$ is written as follows:

$$g(\mu_{ij}) = X_{ij}'\beta, \tag{1}$$

where $g$ is a known link function and $\beta$ is an unknown $p \times 1$ vector of regression coefficients with the true value as $\beta_0$. The conditional variance of $Y_{ij}$ given $X_{ij}$ is specified as $\mathrm{Var}(Y_{ij} \mid X_{ij}) = \nu(\mu_{ij})\phi$, where $\nu$ is a known variance function of $\mu_{ij}$ and $\phi$ is a scale parameter which may need to be estimated. Mostly, $\nu$ and $\phi$ depend on the distributions of outcomes. For instance, if $Y_{ij}$ is continuous, $\nu(\mu_{ij})$ is specified as 1, and $\phi$ represents the error variance; if $Y_{ij}$ is count, $\nu(\mu_{ij}) = \mu_{ij}$, and $\phi$ is equal to 1. Also, the variance-covariance matrix for $Y_i$ is noted by $V_i = \phi A_i^{1/2} R_i(\alpha) A_i^{1/2}$, where $A_i = \mathrm{Diag}\{\nu(\mu_{i1}), \ldots, \nu(\mu_{in_i})\}$ and the so-called "working" correlation structure $R_i(\alpha)$ describes the pattern of measures within subject, which is of size $n_i \times n_i$ and depends on a vector of association parameters denoted by $\alpha$. Table 1 provides a summary of commonly used "working" correlation structures with the moment-based estimates for $\alpha$ (more details in http://www.okstate.edu/sas). Note that the iterative algorithm is applied for estimating $\alpha$ using the Pearson residuals $e_{ij} = (y_{ij} - \mu_{ij})/\sqrt{\nu(\mu_{ij})}$ calculated from the current value of $\beta$. Also, the scale parameter $\phi$ can be estimated by

$$\hat{\phi} = \frac{1}{N - p} \sum_{i=1}^{K} \sum_{j=1}^{n_i} e_{ij}^2, \tag{2}$$

where $N = \sum_{i=1}^{K} n_i$ is the total number of observations and $p$ is the covariates dimensionality.

Based on Liang and Zeger [6], GEE yields an asymptotically consistent $\hat{\beta}$ even when the "working" correlation structure ($R_i(\alpha)$) is misspecified, and the estimate of $\beta$ is obtained by solving the following estimating equation:

$$U(\beta) = \sum_{i=1}^{K} D_i' V_i^{-1} (Y_i - \mu_i) = 0, \tag{3}$$

where $D_i = \partial\mu_i / \partial\beta'$. Under mild regularity conditions, $\hat{\beta}$ is asymptotically normally distributed with a mean $\beta_0$ and a covariance matrix estimated based on the sandwich estimator

$$\hat{V}_{LZ} = \left(\sum_{i=1}^{K} D_i' V_i^{-1} D_i\right)^{-1} \hat{M}_{LZ} \left(\sum_{i=1}^{K} D_i' V_i^{-1} D_i\right)^{-1} \tag{4}$$

with

$$\hat{M}_{LZ} = \sum_{i=1}^{K} D_i' V_i^{-1} \,\mathrm{Cov}(Y_i)\, V_i^{-1} D_i \tag{5}$$

by replacing $\alpha$, $\beta$, and $\phi$ with their consistent estimates, where $\mathrm{Cov}(Y_i) = r_i r_i'$ with $r_i = Y_i - \hat{\mu}_i$ is an estimator of the variance-covariance matrix of $Y_i$ [6, 23]. This "sandwich" estimator is robust in that it is consistent even if the correlation structure ($V_i$) is misspecified. Note that if $V_i$ is correctly specified, then $\hat{V}_{LZ}$ reduces to $\left(\sum_{i=1}^{K} D_i' V_i^{-1} D_i\right)^{-1}$, which is often referred to as the model-based variance estimator [24]. Thus, a Wald $Z$-test can be performed based on the asymptotic normal distribution of the test statistic. Next, we will overview model selection criteria and particularly "working" correlation structure selection criteria with regard to GEE.
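The estimating equation (3), the moment estimates of $\phi$ and $\alpha$, and the sandwich estimator (4)-(5) can be sketched in a few lines of code. The following is a minimal illustration under stated assumptions rather than a reference implementation: it assumes an identity link with $\nu(\mu) = 1$ (so $D_i = X_i$ and $A_i = I$) and an exchangeable working correlation, and the function names `exch_cov` and `fit_gee_exchangeable` are hypothetical, chosen only for this example.

```python
import numpy as np

def exch_cov(n, alpha, phi):
    """Working covariance V_i = phi * R_i(alpha) when nu(mu) = 1 (A_i = I)."""
    return phi * ((1.0 - alpha) * np.eye(n) + alpha * np.ones((n, n)))

def fit_gee_exchangeable(y_list, X_list, n_iter=20):
    """Identity-link GEE with an exchangeable working correlation (sketch).

    y_list[i] is the response vector of cluster i, X_list[i] its n_i x p design.
    Returns (beta, alpha, phi, sandwich_cov, model_based_cov).
    """
    p = X_list[0].shape[1]
    N = sum(len(y) for y in y_list)
    # Start from the independence (ordinary least squares) solution.
    beta = np.linalg.lstsq(np.vstack(X_list), np.concatenate(y_list), rcond=None)[0]
    alpha, phi = 0.0, 1.0
    for _ in range(n_iter):
        resid = [y - X @ beta for y, X in zip(y_list, X_list)]  # Pearson residuals
        phi = sum(r @ r for r in resid) / (N - p)               # Eq. (2)
        cross = sum(r.sum() ** 2 - r @ r for r in resid)        # sum over j != k
        n_pairs = sum(len(r) * (len(r) - 1) for r in resid)     # N' in Table 1
        alpha = cross / ((n_pairs - p) * phi)                   # exchangeable row
        bread, score = np.zeros((p, p)), np.zeros(p)
        for y, X in zip(y_list, X_list):
            V_inv = np.linalg.inv(exch_cov(len(y), alpha, phi))
            bread += X.T @ V_inv @ X                            # D_i' V_i^{-1} D_i
            score += X.T @ V_inv @ (y - X @ beta)               # U(beta), Eq. (3)
        beta = beta + np.linalg.solve(bread, score)             # Fisher scoring step
    # Sandwich estimator, Eqs. (4)-(5), at the final estimates.
    bread, meat = np.zeros((p, p)), np.zeros((p, p))
    for y, X in zip(y_list, X_list):
        V_inv = np.linalg.inv(exch_cov(len(y), alpha, phi))
        r = y - X @ beta
        bread += X.T @ V_inv @ X
        meat += X.T @ V_inv @ np.outer(r, r) @ V_inv @ X
    model_based = np.linalg.inv(bread)
    return beta, alpha, phi, model_based @ meat @ model_based, model_based
```

A Wald $Z$ statistic for the $k$th coefficient would then be `beta[k] / np.sqrt(sandwich_cov[k, k])`. The sketch does not guard against an estimated $\alpha$ outside the range for which the working matrix is positive definite, which is one of the issues raised in the Introduction.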

Table 1: Summary of commonly used "working" correlation structures for GEE.

| Correlation structure | $\mathrm{Corr}(Y_{ij}, Y_{ik})$ | Sample matrix | Estimator |
| --- | --- | --- | --- |
| Independent | $\mathrm{Corr}(Y_{ij}, Y_{ik}) = 1$ if $j = k$; $0$ if $j \neq k$ | $\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$ | N/A |
| Exchangeable | $\mathrm{Corr}(Y_{ij}, Y_{ik}) = 1$ if $j = k$; $\alpha$ if $j \neq k$ | $\begin{pmatrix} 1 & \alpha & \alpha \\ \alpha & 1 & \alpha \\ \alpha & \alpha & 1 \end{pmatrix}$ | $\hat{\alpha} = \frac{1}{(N' - p)\phi} \sum_{i=1}^{K} \sum_{j \neq k} e_{ij} e_{ik}$, with $N' = \sum_{i=1}^{K} n_i (n_i - 1)$ |
| $k$-dependent | $\mathrm{Corr}(Y_{ij}, Y_{i,j+m}) = 1$ if $m = 0$; $\alpha_m$ if $m = 1, 2, \ldots, k$; $0$ if $m > k$ | $\begin{pmatrix} 1 & \alpha_1 & 0 \\ \alpha_1 & 1 & \alpha_1 \\ 0 & \alpha_1 & 1 \end{pmatrix}$ | $\hat{\alpha}_m = \frac{1}{(K_m - p)\phi} \sum_{i=1}^{K} \sum_{j \le n_i - m} e_{ij} e_{i,j+m}$, with $K_m = \sum_{i=1}^{K} (n_i - m)$ |
| Autoregressive AR(1) | $\mathrm{Corr}(Y_{ij}, Y_{i,j+m}) = \alpha^m$, $m = 0, 1, 2, \ldots, n_i - j$ | $\begin{pmatrix} 1 & \alpha & \alpha^2 \\ \alpha & 1 & \alpha \\ \alpha^2 & \alpha & 1 \end{pmatrix}$ | $\hat{\alpha} = \frac{1}{(K_1 - p)\phi} \sum_{i=1}^{K} \sum_{j \le n_i - 1} e_{ij} e_{i,j+1}$, with $K_1 = \sum_{i=1}^{K} (n_i - 1)$ |
| Toeplitz | $\mathrm{Corr}(Y_{ij}, Y_{i,j+m}) = 1$ if $m = 0$; $\alpha_m$ if $m = 1, 2, \ldots, n_i - j$ | $\begin{pmatrix} 1 & \alpha_1 & \alpha_2 \\ \alpha_1 & 1 & \alpha_1 \\ \alpha_2 & \alpha_1 & 1 \end{pmatrix}$ | $\hat{\alpha} = \frac{1}{(N' - p)\phi} \sum_{i=1}^{K} \sum_{j \neq k} e_{ij} e_{ik}$, with $N' = \sum_{i=1}^{K} n_i (n_i - 1)$ |
| Unstructured | $\mathrm{Corr}(Y_{ij}, Y_{ik}) = 1$ if $j = k$; $\alpha_{jk}$ if $j \neq k$ | $\begin{pmatrix} 1 & \alpha_{12} & \alpha_{13} \\ \alpha_{21} & 1 & \alpha_{23} \\ \alpha_{31} & \alpha_{32} & 1 \end{pmatrix}$ | $\hat{\alpha}_{jk} = \frac{1}{(K - p)\phi} \sum_{i=1}^{K} e_{ij} e_{ik}$ |
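As a small illustration of the structures in Table 1, the sketch below builds an $n \times n$ working correlation matrix from its name and association parameter(s). The function name `working_correlation` and its string labels are hypothetical and used only for this example; the unstructured case is omitted because it simply takes a full user-supplied matrix.

```python
def working_correlation(structure, n, alpha=None):
    """Return an n x n working correlation matrix as a list of rows.

    alpha is a float for "exchangeable" and "ar1", a sequence of lag
    correlations (alpha_1, alpha_2, ...) for "k-dependent" and "toeplitz",
    and is ignored for "independent".
    """
    def entry(j, k):
        m = abs(j - k)  # lag between the two measurements
        if m == 0:
            return 1.0
        if structure == "independent":
            return 0.0
        if structure == "exchangeable":
            return float(alpha)
        if structure == "ar1":
            return float(alpha) ** m
        if structure in ("k-dependent", "toeplitz"):
            # Lags beyond the supplied alphas get correlation 0.
            return float(alpha[m - 1]) if m <= len(alpha) else 0.0
        raise ValueError(f"unknown structure: {structure}")

    return [[entry(j, k) for k in range(n)] for j in range(n)]
```

For instance, `working_correlation("ar1", 3, 0.5)` reproduces the AR(1) sample matrix of Table 1 with $\alpha = 0.5$.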

2.2. Model Selection of GEE. In this section, we discuss the model selection criteria available for GEE. Model selection for GEE is important and necessary for several reasons. (1) GEE has gained increasing attention in biomedical studies, which may include a large group of predictors [25–28]. Therefore, variable selection is necessary for determining which predictors are included in the final regression model by identifying the significant ones. (2) One feature of GEE is that the consistency of parameter estimates still holds even when the "working" correlation structure is misspecified. However, correctly specifying the "working" correlation structure can enhance the efficiency of the parameter estimates, in particular when the sample size is not large enough [16, 24, 25, 29]. Therefore, how to select the intrasubject correlation matrix plays a vital role in GEE with improved finite-sample performance. (3) The variance function $\nu(\mu)$ is another potential factor affecting the goodness-of-fit of GEE [25, 30]. A correctly specified variance function can assist in the selection of covariates and an appropriate correlation structure [31, 32]. Different criteria might be needed depending on the goal of model selection [24, 29, 33]; next, we introduce the existing approaches for the selection of the "working" correlation structure, each with its own merits and limitations [34].

According to Rotnitzky and Jewell, the adequacy of the "working" correlation structure can be examined through $\hat{\Gamma} = \left(\sum_{i=1}^{K} D_i' V_i^{-1} D_i\right)^{-1} \hat{M}_{LZ}$, where $\hat{M}_{LZ}$ has been defined in Section 2.1 [35]. The statistic $\mathrm{RJ}(R)$ is defined by

$$\mathrm{RJ}(R) = \sqrt{(1 - \mathrm{RJ1})^2 + (1 - \mathrm{RJ2})^2}, \tag{6}$$

where $\mathrm{RJ1} = \mathrm{trace}(\hat{\Gamma})/p$ and $\mathrm{RJ2} = \mathrm{trace}(\hat{\Gamma}^2)/p$, respectively. If the "working" correlation structure $R$ is correctly specified, RJ1 and RJ2 will be close to 1, leading to $\mathrm{RJ}(R)$ approaching 0. Thus, RJ1, RJ2, and $\mathrm{RJ}(R)$ can all be used for correlation structure selection.
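Given a computed $p \times p$ matrix $\hat{\Gamma}$, the three quantities in (6) reduce to two traces. The following sketch assumes $\hat{\Gamma}$ is supplied as a list of rows; the function name `rj_criteria` is hypothetical.

```python
import math

def rj_criteria(gamma):
    """RJ1, RJ2 and RJ(R) of Eq. (6) from a p x p matrix Gamma (list of rows)."""
    p = len(gamma)
    trace_g = sum(gamma[a][a] for a in range(p))
    # trace(Gamma^2) = sum over a, b of Gamma[a][b] * Gamma[b][a]
    trace_g2 = sum(gamma[a][b] * gamma[b][a] for a in range(p) for b in range(p))
    rj1, rj2 = trace_g / p, trace_g2 / p
    return rj1, rj2, math.sqrt((1.0 - rj1) ** 2 + (1.0 - rj2) ** 2)
```

When $\hat{\Gamma}$ is the identity matrix, which is the ideal case of a correctly specified structure, the function returns RJ1 = RJ2 = 1 and $\mathrm{RJ}(R) = 0$.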

Shults and Chaganty [36] proposed a criterion for selecting the "working" correlation structure based on the minimization of the generalized error sum of squares (ESS), given as follows:

$$\mathrm{ESS}(\alpha, \beta) = \sum_{i=1}^{K} (Y_i - u_i)' V_i^{-1} (Y_i - u_i) = \sum_{i=1}^{K} Z_i'(\beta) R_i^{-1}(\alpha) Z_i(\beta), \tag{7}$$

where $u_i$ denotes the mean vector $\mu_i$ and $Z_i(\beta) = A_i^{-1/2}(Y_i - u_i)$. The criterion is defined by

$$\mathrm{SC} = \frac{\mathrm{ESS}(\alpha, \beta)}{N - p - q}, \tag{8}$$

where $N = \sum_{i=1}^{K} n_i$ is the total number of observations, $p$ is the number of regression parameters, and $q$ is the number of correlation coefficients within the "working" correlation structure. Another criterion extended from SC was proposed by Carey and Wang [37], where the Gaussian pseudolikelihood (GP) is adopted; it is given by

$$\mathrm{GP}(R) = -0.5 \times \sum_{i=1}^{K} \left\{ (Y_i - u_i)' V_i^{-1} (Y_i - u_i) + \log\left(|V_i|\right) \right\}, \tag{9}$$

where a better "working" correlation structure yields a larger GP. In their work, they also showed via simulation that the GP criterion performed better than RJ.
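The criteria (7)-(9) only need the residual vectors $Y_i - u_i$ and the fitted working covariance matrices $V_i$. The sketch below assumes these are already available as NumPy arrays (one per cluster); the function names `ess`, `sc_criterion`, and `gp_criterion` are hypothetical.

```python
import numpy as np

def ess(resid_list, V_list):
    """Generalized error sum of squares, Eq. (7): sum_i r_i' V_i^{-1} r_i."""
    return sum(float(r @ np.linalg.solve(V, r)) for r, V in zip(resid_list, V_list))

def sc_criterion(resid_list, V_list, p, q):
    """Shults-Chaganty criterion, Eq. (8); smaller values are preferred."""
    N = sum(len(r) for r in resid_list)  # total number of observations
    return ess(resid_list, V_list) / (N - p - q)

def gp_criterion(resid_list, V_list):
    """Gaussian pseudolikelihood, Eq. (9); larger values are preferred."""
    # slogdet returns (sign, log|V|); V is assumed positive definite here.
    log_dets = sum(np.linalg.slogdet(V)[1] for V in V_list)
    return -0.5 * (ess(resid_list, V_list) + log_dets)
```

In practice the same residuals would be scored under each candidate structure, and the structure with the smallest SC or the largest GP retained.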

Another criterion was proposed by Pan [38], who modified the Akaike information criterion (AIC) [39] for use with GEE. Because GEE is not likelihood-based, the criterion is called the quasi-likelihood under the independence model criterion (QIC) [40]. The basic idea is to calculate the expected Kullback-Leibler discrepancy using the quasi-likelihood under the independence "working" correlation assumption, due to the lack of a general and tractable quasi-likelihood for the correlated data under any other complex "working" correlation structures. $\mathrm{QIC}(R)$ is defined by

$$\mathrm{QIC}(R) = -2\Psi(\hat{\beta}(R); I) + 2\,\mathrm{trace}(\hat{\Omega}_I \hat{V}_{LZ}), \tag{10}$$

where the quasilikelihood $\Psi(\hat{\beta}(R); I) = \sum_{i=1}^{K} \sum_{j=1}^{n_i} Q(\hat{\beta}(R), \hat{\phi}; \{Y_{ij}, X_{ij}\})$ with $Q(\mu, \phi; y) = \int_{y}^{\mu} \frac{y - t}{\phi V(t)}\, dt$ defined by [12], where $V(\cdot)$ is the variance function (written $\nu(\cdot)$ in Section 2.1); $\hat{\beta}(R)$ and $\hat{\phi}$ are obtained under the hypothesized "working" correlation structure $R$; $\hat{\Omega}_I = \sum_{i=1}^{K} D_i' V_i^{-1} D_i \big|_{\beta = \hat{\beta}, R = I}$; and $\hat{V}_{LZ}$ is defined above with replacement of $\beta$ by $\hat{\beta}(R)$ [38]. Note that in this work Pan ignored the second term in Taylor's expansion of the discrepancy and showed that its influence was not substantial in his simulation set-ups. Later on, Hardin and Hilbe (2003) made a slight modification to $\mathrm{QIC}(R)$ by using $\hat{\beta}(I)$ and $\hat{\phi}(I)$ for more stability, and $\mathrm{QIC}(R)_{HH}$ is given by

$$\mathrm{QIC}(R)_{HH} = -2\Psi(\hat{\beta}(I); I) + 2\,\mathrm{trace}(\hat{\Omega}_I \hat{V}_{LZ}). \tag{11}$$

Note that $\mathrm{QIC}(R)$ and $\mathrm{QIC}(R)_{HH}$ do not perform well in distinguishing the independence and exchangeable "working" correlation structures because, in certain cases, the same regression parameter estimates can be obtained under these two structures. Also, the attractive property of the QIC criterion is that it allows the selection of the covariates and the "working" correlation structure simultaneously [41, 42], but this measure is more sensitive to the mean structure because QIC is dominated by the first term, while the second term plays the role of a penalty. To better select the "working" correlation structure, Hin and Wang proposed the correlation information criterion (CIC), defined by

$$\mathrm{CIC} = \mathrm{trace}(\hat{\Omega}_I \hat{V}_{LZ}). \tag{12}$$

In their work, CIC was shown through simulation studies to outperform QIC when the outcomes were binary [43]. One limitation of this criterion is that it cannot penalize overparameterization; thus it does not perform well when comparing two correlation structures having quite different numbers of correlation parameters.
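The pieces of (10) and (12) can be sketched as follows, assuming the matrices $\hat{\Omega}_I$ and $\hat{V}_{LZ}$ have already been computed and are passed as lists of rows. The function names are hypothetical, and only two variance functions are covered: $V(t) = 1$ ("gaussian") and $V(t) = t$ ("poisson"), for which the integral defining $Q$ has the closed forms used below.

```python
import math

def quasi_loglik(y, mu, phi=1.0, family="gaussian"):
    """Sum of Q(mu, phi; y) = integral from y to mu of (y - t) / (phi V(t)) dt."""
    total = 0.0
    for yi, mi in zip(y, mu):
        if family == "gaussian":      # V(t) = 1
            total += -((yi - mi) ** 2) / 2.0
        elif family == "poisson":     # V(t) = t
            saturated = yi * math.log(yi) - yi if yi > 0 else 0.0
            total += yi * math.log(mi) - mi - saturated
        else:
            raise ValueError(f"unsupported family: {family}")
    return total / phi

def cic(omega_I, V_LZ):
    """Correlation information criterion, Eq. (12): trace(Omega_I V_LZ)."""
    p = len(omega_I)
    return sum(omega_I[a][b] * V_LZ[b][a] for a in range(p) for b in range(p))

def qic(quasi_ll, omega_I, V_LZ):
    """QIC of Eq. (10) from a quasi-likelihood value and the two matrices."""
    return -2.0 * quasi_ll + 2.0 * cic(omega_I, V_LZ)
```

If the robust covariance coincides with the inverse of $\hat{\Omega}_I$, the trace term equals $p$, which makes the analogy between the second term of QIC and the AIC penalty explicit.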

Another attractive criterion is the extended quasilikelihood information criterion (EQIC) proposed by Wang and Hin [25], which uses the extended quasilikelihood (EQL) defined by Nelder and Pregibon based on the deviance function, shown below under the independent correlation structure [44]:

$$Q^{*}(\beta, \phi; I) = -\frac{1}{2\phi} D(\beta; I) - \frac{1}{2} \sum_{i=1}^{K} \sum_{j=1}^{n_i} \log\left(2\pi\phi A(\mu_{ij})\right), \tag{13}$$

where the sum of deviances $D(\beta; I) = \sum_{i=1}^{K} \sum_{j=1}^{n_i} -2\phi\left\{Q(y_{ij}; \mu_{ij}) - Q(y_{ij}; y_{ij})\right\}$, with $Q(\cdot)$ being the quasilikelihood defined as above. Therefore, EQIC is defined by

$$\mathrm{EQIC}(R) = \frac{1}{\phi} D(\beta; I) + \sum_{i=1}^{K} \sum_{j=1}^{n_i} \log\left(2\pi\phi A(\mu_{ij})\right) + 2\,\mathrm{trace}(\hat{\Omega}_I \hat{V}_{LZ}), \tag{14}$$

where some adjustments were applied to $A(\mu)$ by adding a small constant $k$ with the optimal chosen value as $1/6$. The authors indicated that the covariates were first selected based on QIC, and the variance function could be identified as the one minimizing EQIC given the selected covariates; then the "working" correlation structure selection could be achieved based on CIC. In addition, they found that the covariates selection by EQIC given different working variance functions was more consistent than that based on QIC [45].

Besides those criteria mentioned above, Cantoni et al. also discussed the covariate selection for longitudinal data analysis [46]; also, variance function selection was mentioned by Pan and Mackenzie [30] as well as Wang and Lin [47]; in addition, more work on "working" correlation structure selection was addressed by Chaganty and Joe [48], Wang and Lin [47], Gosho et al. [49, 50], Jang [51], Chen [52], and Westgate [53–55], among others. Overall, the model selection of GEE is nontrivial, where the best selection criterion is still being pursued [56], and the recent work by Wang et al. can be followed up as the rule of thumb [45].

2.3. Sample Size and Power of GEE. It is well known that the calculation of sample size and power is necessary and important for planning a clinical trial, and it has been well studied for independent observations [1]. With the wide applications of GEE in clinical trials, this topic for correlated/clustered data has gained more attention than ever [5, 57]. The general method for sample size/power calculation was discussed by Liu and Liang [58], where the generalized score test was utilized to draw statistical inference, and the resulting noncentral chi-square distribution of the test statistic under the alternative hypothesis was derived; however, in some special cases, for example, correlated binary data with a nonexchangeable correlation structure, there was no closed form available along the outline of that formula. Afterwards, Shih provided an alternative formula for sample size/power calculation, which relied on Wald tests using the estimates of regression parameters and robust variance estimators [59]. For example, in a study with one parameter of interest $\beta$, the hypothesis of interest can be formulated as

$$H_0: \beta = 0 \quad \text{versus} \quad H_a: \beta = b \neq 0, \tag{15}$$

where $b$ is the expected value. Thus, based on a two-sided $Z$-test with type I error $\eta$, the power denoted by $\delta$ can be obtained by

$$\delta = 1 - \Phi\left(z_{1-\eta/2} - \frac{b\sqrt{K}}{\sqrt{\nu_R}}\right), \tag{16}$$

where $K$ is the sample size, $z_a = \Phi^{-1}(a)$ denotes the $a$-quantile of the standard normal distribution, and $\nu_R$ is the robust variance estimator corresponding to $\beta$ in the estimate of $K\hat{V}_{LZ}$. Accordingly, the sample size is given by

$$K = \frac{\nu_R \left(z_{1-\eta/2} + z_{\delta}\right)^2}{b^2}. \tag{17}$$
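The power formula (16) and the sample size formula (17) can be evaluated with the standard library alone. In the sketch below the critical value is taken as $\Phi^{-1}(1 - \eta/2)$ and the power quantile as $\Phi^{-1}(\delta)$, $b$ is used through its absolute value, and the sample size is rounded up to the next integer; the function names `gee_power` and `gee_sample_size` are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution

def gee_power(K, b, nu_R, eta=0.05):
    """Power of the two-sided Wald Z-test with K clusters, as in Eq. (16)."""
    z_crit = _STD_NORMAL.inv_cdf(1.0 - eta / 2.0)
    return 1.0 - _STD_NORMAL.cdf(z_crit - abs(b) * math.sqrt(K / nu_R))

def gee_sample_size(b, nu_R, eta=0.05, power=0.80):
    """Number of clusters K from Eq. (17), rounded up to an integer."""
    z_crit = _STD_NORMAL.inv_cdf(1.0 - eta / 2.0)
    z_power = _STD_NORMAL.inv_cdf(power)
    return math.ceil(nu_R * (z_crit + z_power) ** 2 / b ** 2)
```

The two functions are inverse to each other up to rounding: `gee_power(gee_sample_size(b, nu_R), b, nu_R)` is at least the requested power, while one cluster fewer falls just short of it.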

For correlated continuous data, the calculation is straightforward using (16); however, for correlated binary data in particular, more work is needed [60], and Pan provided explicit formulas for $\nu_R$ under various situations as follows [61]:

$$\nu_R = \Omega \left[\frac{1}{\pi p_0 (1 - p_0)} + \frac{1}{(1 - \pi) p_1 (1 - p_1)}\right], \tag{18}$$

where

$$\Omega = \frac{K \sum_{i=1}^{K} 1_{n_i}' R_i^{-1} V_i R_i^{-1} 1_{n_i}}{\left(\sum_{i=1}^{K} 1_{n_i}' R_i^{-1} 1_{n_i}\right)^2},$$

with $\pi$ as the proportion of subjects assigned to the control group and $p_0$ and $p_1$ as the means for the control and case groups [61]. Here $R_i$ is the "working" correlation matrix and $V_i$ the true correlation matrix, with CS denoting the compound symmetry (exchangeable) structure. The detailed calculations of $\Omega$, and hence of $\nu_R$, under several important special cases are given by

$$
\begin{aligned}
&\text{If } R_i = V_i = \mathrm{CS}: && \Omega = \frac{K}{\sum_{i=1}^{K} n_i / \left(1 + (n_i - 1)\alpha\right)}; \\
&\text{If } R_i = I,\ V_i = \mathrm{CS}: && \Omega = \frac{K \sum_{i=1}^{K} n_i \left[1 + (n_i - 1)\alpha\right]}{\left(\sum_{i=1}^{K} n_i\right)^2}; \\
&\text{If } R_i = V_i = \mathrm{AR}(1): && \Omega = \frac{K (1 + \alpha)}{\sum_{i=1}^{K} \left[n_i - (n_i - 2)\alpha\right]}; \\
&\text{If } R_i = I,\ V_i = \mathrm{AR}(1): && \Omega = \frac{K \sum_{i=1}^{K} \left[n_i + 2(n_i - 1)\alpha + 2(n_i - 2)\alpha^2 + \cdots + 2\alpha^{n_i - 1}\right]}{\left(\sum_{i=1}^{K} n_i\right)^2}.
\end{aligned}
\tag{19}
$$
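The four special cases in (19) and the variance formula (18) translate directly into code. The sketch below takes the cluster sizes as a list of integers; all function names are hypothetical and serve only to label the cases.

```python
def omega_matched_cs(sizes, alpha):
    """Omega when R_i = V_i = CS (working structure equals the true one)."""
    return len(sizes) / sum(n / (1 + (n - 1) * alpha) for n in sizes)

def omega_indep_cs(sizes, alpha):
    """Omega when R_i = I but the true structure V_i is CS."""
    return len(sizes) * sum(n * (1 + (n - 1) * alpha) for n in sizes) / sum(sizes) ** 2

def omega_matched_ar1(sizes, alpha):
    """Omega when R_i = V_i = AR(1)."""
    return len(sizes) * (1 + alpha) / sum(n - (n - 2) * alpha for n in sizes)

def omega_indep_ar1(sizes, alpha):
    """Omega when R_i = I but the true structure V_i is AR(1)."""
    # n + 2(n-1)alpha + 2(n-2)alpha^2 + ... + 2 alpha^(n-1) is the sum of all
    # entries of the n x n AR(1) correlation matrix.
    total = sum(n + 2 * sum((n - m) * alpha ** m for m in range(1, n)) for n in sizes)
    return len(sizes) * total / sum(sizes) ** 2

def nu_R_binary(omega, pi, p0, p1):
    """Robust variance nu_R of Eq. (18) for a two-group binary outcome."""
    return omega * (1 / (pi * p0 * (1 - p0)) + 1 / ((1 - pi) * p1 * (1 - p1)))
```

With $\alpha = 0$ all four cases collapse to $K / \sum_i n_i$, and with equal cluster sizes the matched and independence versions of the same true structure coincide, so the choice of working structure matters for $\Omega$ only when cluster sizes vary.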

The formulas in (19) can be directly used in practice and cover most situations encountered in clinical trials [61]. Note that when $R_i = V_i = \mathrm{CS}$, Liu and Liang (1997) provided a different sample size formula from (17) for $n_i = n$, which is

$$K = \frac{\left(z_{1-\eta/2} + z_{\delta}\right)^2 \left[(1 - \pi) p_0 (1 - p_0) + \pi p_1 (1 - p_1)\right] \left[1 + (n - 1)\alpha\right]}{n \pi (1 - \pi) (p_1 - p_0)^2}, \tag{20}$$

with $z_a$ the $a$-quantile of the standard normal distribution and $\delta$ the power. Be aware that the difference is due to the test methods: the Wald $Z$-test used by Pan [61] and the score test applied by Liu and Liang [58]. Note that in some cases the score test may be preferred [62]. Although some other works exist for sample size/power calculation, they focused on alternative approaches rather than GEE [63, 64]; thus we do not discuss them here. For correlated Poisson data, the sample size/power calculation is more challenging due to the occurrence of overdispersion or sparsity, where the negative binomial regression model may be explored [62, 65–67].
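Formula (20) is simple enough to evaluate directly. The sketch below assumes a common cluster size $n$, takes `power` as the target power, and rounds the result up; the function name `liu_liang_clusters` and its default arguments are hypothetical.

```python
import math
from statistics import NormalDist

def liu_liang_clusters(p0, p1, n, alpha, pi=0.5, eta=0.05, power=0.80):
    """Number of clusters K from Eq. (20) for equal cluster size n (rounded up)."""
    z = NormalDist().inv_cdf  # standard normal quantile function
    variance_term = (1 - pi) * p0 * (1 - p0) + pi * p1 * (1 - p1)
    design_effect = 1 + (n - 1) * alpha  # inflation due to within-cluster correlation
    numerator = (z(1 - eta / 2) + z(power)) ** 2 * variance_term * design_effect
    return math.ceil(numerator / (n * pi * (1 - pi) * (p1 - p0) ** 2))
```

The factor $1 + (n - 1)\alpha$ is the familiar design effect: with $n = 1$ or $\alpha = 0$ the expression reduces to a two-group comparison of proportions based on independent observations.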

On the other hand, there are several concerns [68]. First, we here focus on the calculation of the sample size $K$ assuming $n_i$ is known; however, based on the power formula (16), $\nu_R$ depends on $n_i$, and thus increasing $n_i$ can also assist in power improvement but turns out to be less effective than increasing $K$ [69]. Second, the sample size/power calculation may be restricted by the limited number of clusters, for example, in clustered randomized trials (CRTs), where the number of clusters could be relatively small. For example, according to a literature review of published CRTs, the median number of clusters is 21 [70]. In such situations, a power formula adjusted for small samples in GEE is necessary, which has drawn attention from researchers recently [71–75].

2.4. Clustered Data with Informative Cluster Size. The application of GEE in clustered data with informative cluster size is another special topic [76]. Taking the example of a periodontal disease study, the number of teeth for each patient may be related to the overall oral health of the individual; in other words, the worse the oral health is, the fewer teeth there are, and thus the cluster size $n_i$ may influence the distribution of the oral outcomes, which is called informative cluster size [45, 77]. Such issues commonly occur in biomedical studies (e.g., genetic disease studies), and rigorous statistical methods are needed for valid statistical inference [78]. Note that if the maximum cluster size exists and is known, then this can be treated as an (informative) missing data problem, which can be solved via the weighted estimating equations proposed by Robins et al. [79]; however, if the maximum is unknown or not accessible, the method of within-cluster resampling (WCR) proposed by Hoffman et al. can be applied [80]. The basic idea is that, for each of $L$ resampled replicate data sets based on a Monte Carlo method ($L$ is a large number, e.g., 10,000), one observation is randomly extracted from each cluster, and the estimate $\hat{\beta}_l$ with variance estimator $\hat{\Sigma}_l$ can be obtained from a regular score equation, denoted by $S_l(\beta)$, for independent observations (e.g., linear regression for continuous data, logistic regression for binary data, and Poisson regression for count data), $l = 1, 2, \ldots, L$. The details are shown as follows:

$$
\begin{aligned}
S_l(\beta) &= \sum_{i=1}^{K} \sum_{j=1}^{n_i} S_{ij}(\beta_l)\, I[j \in r_l] = 0, \\
\hat{\beta}_{\mathrm{wcr}} &= \frac{1}{L} \sum_{l=1}^{L} \hat{\beta}_l, \\
\hat{\Sigma}_{\mathrm{wcr}} &= \frac{1}{L} \sum_{l=1}^{L} \hat{\Sigma}_l - \frac{1}{L} \sum_{l=1}^{L} (\hat{\beta}_l - \hat{\beta}_{\mathrm{wcr}})(\hat{\beta}_l - \hat{\beta}_{\mathrm{wcr}})^{T},
\end{aligned}
\tag{21}
$$

where $S_{ij}(\beta_l) = X'_{ij} V_{ij}^{-1} (Y_{ij} - X'_{ij}\beta_l)$, with $r_l$ as the set of data indices selected from the $i$th cluster in the $l$th replicate data set. Alternatively, the approach considered by Williamson et al., which adopts weighted estimating equations, performs asymptotically equivalently to WCR while avoiding intensive computing; it is referred to as the cluster-weighted GEE (CWGEE) [81]. The estimating equation is

$$
S(\beta) = \sum_{i=1}^{K} \frac{1}{n_i} \sum_{j=1}^{n_i} S_{ij}(\beta) = 0, \tag{22}
$$

where $S_{ij}$ is defined the same as above; the difference is that the subscript $j$ ranges from 1 to $n_i$ and is not restricted by the index set $r_l$. Note that as $L \to \infty$, $(1/L)\sum_{l=1}^{L} S_l(\beta)$ converges to its expected estimating function and is asymptotically equivalent to $S(\beta)$.
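The relationship between WCR in (21) and CWGEE in (22) can be sketched for the simplest case of a linear marginal model with constant variance, where each replicate fit reduces to ordinary least squares and the cluster-weighted equation reduces to weighted least squares with weights $1/n_i$. The sketch assumes NumPy; the function names (`wls`, `cwgee_linear`, `wcr_linear`, `toy_data`) and the toy data-generating mechanism are hypothetical and are not taken from the cited papers.

```python
import numpy as np


def wls(X, y, w):
    """Solve sum_k w_k * x_k * (y_k - x_k' beta) = 0 for beta."""
    XtW = X.T * w  # weight each observation's contribution
    return np.linalg.solve(XtW @ X, XtW @ y)


def cwgee_linear(X, y, cluster):
    """Cluster-weighted estimate: every observation in cluster i gets weight 1 / n_i."""
    _, inverse, counts = np.unique(cluster, return_inverse=True, return_counts=True)
    return wls(X, y, 1.0 / counts[inverse])


def wcr_linear(X, y, cluster, L=1000, seed=0):
    """Within-cluster resampling: one observation per cluster in each of L replicates."""
    rng = np.random.default_rng(seed)
    groups = [np.flatnonzero(cluster == c) for c in np.unique(cluster)]
    K, p = len(groups), X.shape[1]
    betas = np.empty((L, p))
    covs = np.empty((L, p, p))
    for l in range(L):
        idx = np.array([g[rng.integers(len(g))] for g in groups])
        Xl, yl = X[idx], y[idx]
        beta_l = wls(Xl, yl, np.ones(K))
        resid = yl - Xl @ beta_l
        sigma2 = resid @ resid / (K - p)
        betas[l] = beta_l
        covs[l] = sigma2 * np.linalg.inv(Xl.T @ Xl)  # model-based covariance of replicate l
    beta_wcr = betas.mean(axis=0)
    dev = betas - beta_wcr
    # Average replicate covariance minus the between-replicate covariance, as in (21).
    cov_wcr = covs.mean(axis=0) - dev.T @ dev / L
    return beta_wcr, cov_wcr


def toy_data(K=200, seed=1):
    """Hypothetical data: cluster size grows with a latent cluster effect b."""
    rng = np.random.default_rng(seed)
    cluster, x, y = [], [], []
    for i in range(K):
        b = rng.normal()
        n_i = 1 + int(rng.poisson(np.exp(0.5 * b)))  # informative cluster size
        x_i = rng.uniform(size=n_i)
        y_i = 0.5 + 0.5 * x_i + b + rng.normal(scale=0.5, size=n_i)
        cluster.append(np.full(n_i, i))
        x.append(x_i)
        y.append(y_i)
    x = np.concatenate(x)
    X = np.column_stack([np.ones_like(x), x])
    return X, np.concatenate(y), np.concatenate(cluster)


X, y, cluster = toy_data()
beta_cw = cwgee_linear(X, y, cluster)
beta_wcr, cov_wcr = wcr_linear(X, y, cluster, L=500)
print("CWGEE estimate:", beta_cw)
print("WCR estimate  :", beta_wcr)
```

In this setting the two estimates should be close to each other, while CWGEE needs a single weighted fit instead of $L$ resampled fits. Note that the difference of matrices in the last line of (21) is not guaranteed to be positive definite in finite samples.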

This method was also explored or extended for correlated data with nonignorable cluster size by Benhin et al. and Cong et al. [82, 83]. Furthermore, a more efficient method called modified WCR (MWCR) was proposed by Chiang and Lee, in which a number of subjects equal to the minimum cluster size (assumed to be greater than 1) is randomly sampled from each cluster, and GEE models for balanced data are then applied for estimation by incorporating the intracluster correlation; thus MWCR may be more efficient [84]. However, MWCR is not always satisfactory, and Pavlou et al. identified sufficient conditions on the data structure and the choice of "working" correlation structure that allow consistency of the estimates from MWCR [85]. In addition, Wang et al. extended the above work to clustered longitudinal data, which are collected as repeated measures on subjects arising in clusters with potentially informative cluster size [45]. Examples include health studies of subjects from multiple hospitals or families. Comparing GEE, WCR, and CWGEE through extensive simulation studies and a real data application from a periodontal disease study, the authors recommended CWGEE because its performance is comparable with that of WCR, with well-preserved coverage rates and desirable power properties, and because it does not require intensive Monte Carlo computation, while GEE models led to invalid inference due to biased parameter estimates [45]. In addition, for observed-cluster inference, Seaman et al. discussed methods including weighted and doubly weighted GEE and shared random-effects models for comparison and showed the conditions under which the shared random-effects model describes members with observed outcomes $Y$ [86]. More work can be found in [87–90], among others.

3. Simulation

In this section, we focus on "working" correlation structure selection and compare the performances of the existing criteria through simulation studies. Two types of outcomes are considered: continuous and count responses. The models for data generation are as follows:

$$
\begin{aligned}
u_{ij} &= \beta_0 + \beta_1 \times x_{ij}, \\
\log(u_{ij}) &= \beta_0 + \beta_1 \times x_{ij},
\end{aligned}
\tag{23}
$$

where $\beta_0 = \beta_1 = 0.5$, $i = 1, 2, \ldots, I$ with $I = 50, 100, 200, 500$, and $j = 1, 2, \ldots, J$ with $J = 4, 8$. The covariates $x_{ij}$ are i.i.d. from a standard uniform distribution Unif(0, 1). For each scenario, we generate the data based on the underlying true correlation structures as independent (IND), exchangeable (EXCH), and autoregressive (AR-1) with $\alpha = 0.3, 0.7$. For each scenario, 1000 Monte Carlo data sets are generated, and the estimates of the regression parameters and the within-subject correlation matrix, as well as seven model selection criteria measures, are calculated using the "working" correlation structures of IND, EXCH, and AR-1. The partial simulation results are provided in Tables 2, 3, and 4, where the results of CIC are not shown because they are the same as those of QIC.
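The data-generating step for the continuous outcome in (23) can be sketched as follows. This is a minimal illustration assuming NumPy and unit error variance (the error variance is not stated above); the function names `corr_matrix` and `simulate_normal` are hypothetical. Generating correlated count data requires additional modeling choices and is not covered by this sketch.

```python
import numpy as np


def corr_matrix(J, structure, alpha):
    """True J x J correlation matrix for the IND, EXCH, or AR-1 structure."""
    if structure == "IND":
        return np.eye(J)
    if structure == "EXCH":
        R = np.full((J, J), float(alpha))
        np.fill_diagonal(R, 1.0)
        return R
    if structure == "AR-1":
        t = np.arange(J)
        return float(alpha) ** np.abs(t[:, None] - t[None, :])
    raise ValueError(f"unknown structure: {structure}")


def simulate_normal(I, J, structure, alpha, beta0=0.5, beta1=0.5, seed=0):
    """Simulate I subjects with J correlated normal responses each.
    Assumption: errors have unit variance and correlation matrix R."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=(I, J))   # covariates, i.i.d. Unif(0, 1)
    mean = beta0 + beta1 * x                 # identity-link model in (23)
    R = corr_matrix(J, structure, alpha)
    errors = rng.multivariate_normal(np.zeros(J), R, size=I)
    return x, mean + errors


x, y = simulate_normal(I=50, J=4, structure="AR-1", alpha=0.3)
print(x.shape, y.shape)
print(corr_matrix(4, "AR-1", 0.3).round(3))
```

Each simulated data set would then be fitted under the IND, EXCH, and AR-1 "working" structures, and the structure chosen by each criterion recorded over the Monte Carlo replicates.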

Based on the results, RJ does not perform well for the scenarios with either continuous or binary outcomes, while RJ1 and RJ2 have comparable performances and can select the true underlying correlation structure in most scenarios, with better performance under large sample sizes. QIC is not satisfactory when the true correlation structure is independent but performs well for the scenarios with the true correlation structure as exchangeable or AR-1. On the other hand, SC and GP do not perform well for longitudinal data with normal responses, but their performance is slightly improved for longitudinal data with binary outcomes. The results may vary due to a variety of factors, including the types of "working" correlation structure considered for model fitting, the sample size, and/or the magnitude of the correlation coefficient. For future work, a robust criterion for "working" correlation structure selection in GEE is needed, and more advanced approaches are currently emerging.

4. Future Direction and Discussion

In this paper, we provide a review of several specific topics related to GEE for longitudinal/correlated data, such as model selection with emphasis on the selection of "working" correlation structure, sample size and power calculation, and clustered data analysis with informative cluster size. Simulation studies are conducted to provide numerical comparisons among five types of model selection criteria [91, 92]. Novel methodologies are still needed and are being developed due to the increasing usage and potential


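Table 2: Simulation for longitudinal data with independent correlation matrix. Entries are selection frequencies of each "working" correlation structure.

| n | K | Criterion | Normal: IND | Normal: EXCH | Normal: AR-1 | Binary: IND | Binary: EXCH | Binary: AR-1 |
|---|-----|-----|-----|-----|-----|-----|-----|-----|
| 4 | 50 | QIC | 198 | 393 | 409 | 202 | 374 | 424 |
| 4 | 50 | RJ | 327 | 423 | 250 | 312 | 421 | 267 |
| 4 | 50 | RJ1 | 388 | 322 | 290 | 399 | 316 | 285 |
| 4 | 50 | RJ2 | 384 | 327 | 289 | 388 | 320 | 292 |
| 4 | 50 | SC | 488 | 1 | 512 | 351 | 310 | 339 |
| 4 | 50 | GP | 547 | 0 | 453 | 368 | 306 | 326 |
| 4 | 100 | QIC | 209 | 377 | 414 | 185 | 407 | 408 |
| 4 | 100 | RJ | 338 | 415 | 247 | 340 | 410 | 250 |
| 4 | 100 | RJ1 | 389 | 349 | 262 | 381 | 358 | 261 |
| 4 | 100 | RJ2 | 389 | 353 | 258 | 372 | 357 | 271 |
| 4 | 100 | SC | 482 | 1 | 517 | 352 | 346 | 302 |
| 4 | 100 | GP | 520 | 0 | 480 | 360 | 348 | 292 |
| 8 | 50 | QIC | 200 | 411 | 389 | 203 | 363 | 434 |
| 8 | 50 | RJ | 282 | 497 | 221 | 292 | 476 | 232 |
| 8 | 50 | RJ1 | 402 | 354 | 244 | 386 | 340 | 274 |
| 8 | 50 | RJ2 | 402 | 357 | 241 | 373 | 347 | 280 |
| 8 | 50 | SC | 465 | 1 | 535 | 351 | 325 | 324 |
| 8 | 50 | GP | 558 | 0 | 442 | 382 | 311 | 307 |
| 8 | 100 | QIC | 188 | 393 | 419 | 201 | 398 | 401 |
| 8 | 100 | RJ | 321 | 442 | 237 | 287 | 466 | 247 |
| 8 | 100 | RJ1 | 347 | 385 | 268 | 385 | 367 | 248 |
| 8 | 100 | RJ2 | 347 | 382 | 271 | 377 | 369 | 254 |
| 8 | 100 | SC | 492 | 0 | 508 | 355 | 343 | 302 |
| 8 | 100 | GP | 541 | 0 | 459 | 370 | 341 | 289 |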

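Table 3: Simulation for longitudinal data with exchangeable correlation matrix with $\alpha = 0.3$. Entries are selection frequencies of each "working" correlation structure.

| n | K | Criterion | Normal: IND | Normal: EXCH | Normal: AR-1 | Binary: IND | Binary: EXCH | Binary: AR-1 |
|---|-----|-----|-----|-----|-----|-----|-----|-----|
| 4 | 50 | QIC | 106 | 699 | 195 | 53 | 758 | 189 |
| 4 | 50 | RJ | 419 | 139 | 442 | 869 | 5 | 126 |
| 4 | 50 | RJ1 | 0 | 963 | 37 | 12 | 898 | 90 |
| 4 | 50 | RJ2 | 0 | 959 | 41 | 22 | 876 | 102 |
| 4 | 50 | SC | 0 | 593 | 407 | 282 | 650 | 68 |
| 4 | 50 | GP | 1 | 593 | 406 | 412 | 524 | 64 |
| 4 | 100 | QIC | 31 | 879 | 90 | 7 | 867 | 126 |
| 4 | 100 | RJ | 350 | 88 | 562 | 911 | 2 | 87 |
| 4 | 100 | RJ1 | 0 | 995 | 5 | 2 | 946 | 52 |
| 4 | 100 | RJ2 | 0 | 996 | 4 | 10 | 933 | 57 |
| 4 | 100 | SC | 0 | 598 | 402 | 339 | 635 | 26 |
| 4 | 100 | GP | 0 | 501 | 499 | 445 | 531 | 24 |
| 8 | 50 | QIC | 80 | 828 | 92 | 50 | 876 | 74 |
| 8 | 50 | RJ | 10 | 395 | 595 | 813 | 6 | 181 |
| 8 | 50 | RJ1 | 0 | 1000 | 0 | 0 | 987 | 13 |
| 8 | 50 | RJ2 | 0 | 1000 | 0 | 0 | 966 | 25 |
| 8 | 50 | SC | 0 | 488 | 513 | 302 | 696 | 2 |
| 8 | 50 | GP | 0 | 511 | 489 | 497 | 500 | 3 |
| 8 | 100 | QIC | 17 | 953 | 30 | 8 | 973 | 19 |
| 8 | 100 | RJ | 0 | 408 | 592 | 861 | 0 | 139 |
| 8 | 100 | RJ1 | 0 | 1000 | 0 | 0 | 997 | 3 |
| 8 | 100 | RJ2 | 0 | 1000 | 0 | 0 | 993 | 7 |
| 8 | 100 | SC | 0 | 470 | 530 | 328 | 672 | 0 |
| 8 | 100 | GP | 0 | 526 | 474 | 486 | 514 | 0 |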


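Table 4: Simulation for longitudinal data with AR-1 correlation matrix with $\alpha = 0.3$. Entries are selection frequencies of each "working" correlation structure.

| n | K | Criterion | Normal: IND | Normal: EXCH | Normal: AR-1 | Binary: IND | Binary: EXCH | Binary: AR-1 |
|---|-----|-----|-----|-----|-----|-----|-----|-----|
| 4 | 50 | QIC | 91 | 166 | 743 | 66 | 170 | 764 |
| 4 | 50 | RJ | 712 | 142 | 146 | 925 | 12 | 63 |
| 4 | 50 | RJ1 | 0 | 478 | 522 | 7 | 505 | 488 |
| 4 | 50 | RJ2 | 0 | 466 | 534 | 20 | 499 | 481 |
| 4 | 50 | SC | 0 | 480 | 520 | 220 | 350 | 430 |
| 4 | 50 | GP | 0 | 543 | 457 | 303 | 332 | 365 |
| 4 | 100 | QIC | 25 | 116 | 859 | 7 | 122 | 871 |
| 4 | 100 | RJ | 770 | 95 | 135 | 972 | 4 | 24 |
| 4 | 100 | RJ1 | 0 | 475 | 525 | 1 | 569 | 430 |
| 4 | 100 | RJ2 | 0 | 481 | 519 | 5 | 571 | 424 |
| 4 | 100 | SC | 0 | 491 | 509 | 237 | 371 | 392 |
| 4 | 100 | GP | 0 | 540 | 460 | 290 | 353 | 357 |
| 8 | 50 | QIC | 50 | 88 | 862 | 44 | 77 | 879 |
| 8 | 50 | RJ | 646 | 148 | 206 | 934 | 5 | 61 |
| 8 | 50 | RJ1 | 0 | 445 | 555 | 0 | 535 | 465 |
| 8 | 50 | RJ2 | 0 | 443 | 557 | 10 | 535 | 455 |
| 8 | 50 | SC | 0 | 467 | 533 | 168 | 397 | 435 |
| 8 | 50 | GP | 0 | 549 | 451 | 269 | 406 | 325 |
| 8 | 100 | QIC | 16 | 39 | 945 | 7 | 33 | 960 |
| 8 | 100 | RJ | 648 | 154 | 198 | 972 | 0 | 28 |
| 8 | 100 | RJ1 | 0 | 455 | 545 | 1 | 603 | 396 |
| 8 | 100 | RJ2 | 0 | 455 | 545 | 1 | 609 | 390 |
| 8 | 100 | SC | 0 | 480 | 520 | 177 | 458 | 365 |
| 8 | 100 | GP | 0 | 532 | 468 | 247 | 457 | 296 |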

theoretical constraints of GEE, as well as new challenges emerging from practical applications in clinical trials or biomedical studies.

In addition, current research of interest related to GEE also includes a robust and optimal model selection criterion for GEE under missing at random (MAR) or missing not at random (MNAR) [93, 94]; sample size/power calculation for correlated sparse or overdispersed count data or longitudinal data with small samples [57–60]; GEE with improved performance under situations with informative cluster size and/or MAR and/or small sample size [95–98]; and GEE for high-dimensional longitudinal data [99]. Although GEE has attractive features, flexible application, and easy implementation in software, its application in practice should be cautious, depending on the context of the study design or data structure and the goals of research interest.

Conflict of Interests

The author declares that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The author was supported by a grant from the Penn State CTSI. The project was supported by the National Center for Research Resources and the National Center for Advancing Translational Sciences, National Institutes of Health, through Grant 5 UL1 RR0330184-04. The content is solely the responsibility of the author and does not represent the views of the NIH.

References

[1] Z. Feng, P. Diehr, A. Peterson, and D. McLerran, "Selected statistical issues in group randomized trials," Annual Review of Public Health, vol. 22, pp. 167–187, 2001.

[2] G. Fitzmaurice, N. M. Laird, and J. H. Ware, Applied Longitudinal Data, John Wiley & Sons, 2004.

[3] J. W. Hardin and J. M. Hilbe, Generalized Estimating Equations, Chapman and Hall/CRC Press, Boca Raton, Fla, USA, 2003.

[4] R. F. Potthoff and S. N. Roy, "A generalized multivariate analysis of variance model useful especially for growth curve problems," Biometrika, vol. 51, pp. 313–326, 1964.


[5] L. M. Friedman, C. D. Furberg, and D. L. DeMets, Fundamentals of Clinical Trials, Springer, New York, NY, USA, 3rd edition, 1989.

[6] K. Y. Liang and S. L. Zeger, "Longitudinal data analysis using generalized linear models," Biometrika, vol. 73, pp. 13–22, 1986.

[7] M. Crowder, "On the use of a working correlation matrix in using generalised linear models for repeated measures," Biometrika, vol. 82, no. 2, pp. 407–410, 1995.

[8] R. W. Wedderburn, "Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method," Biometrika, vol. 61, pp. 439–447, 1974.

[9] P. Diggle, P. Heagerty, K. Y. Liang, and S. L. Zeger, Analysis of Longitudinal Data, Oxford University Press, Oxford, UK, 2002.

[10] G. Fitzmaurice, M. Davidian, G. Verbeke, and G. Molenberghs, Longitudinal Data Analysis, Chapman & Hall/CRC Press, 2008.

[11] D. Hedeker and R. D. Gibbons, Analysis of Longitudinal Data, John Wiley & Sons, 2006.

[12] P. McCullagh and J. A. Nelder, Generalized Linear Models, Chapman & Hall, London, UK, 1989.

[13] N. R. Chaganty and H. Joe, "Range of correlation matrices for dependent Bernoulli random variables," Biometrika, vol. 93, no. 1, pp. 197–206, 2006.

[14] R. T. Sabo and N. R. Chaganty, "What can go wrong when ignoring correlation bounds in the use of generalized estimating equations," Statistics in Medicine, vol. 29, no. 24, pp. 2501–2507, 2010.

[15] B. C. Sutradhar and K. Das, "On the efficiency of regression estimators in generalised linear models for longitudinal data," Biometrika, vol. 86, no. 2, pp. 459–465, 1999.

[16] Y.-G. Wang and V. Carey, "Working correlation structure misspecification, estimation and covariate design: implications for generalised estimating equations performance," Biometrika, vol. 90, no. 1, pp. 29–41, 2003.

[17] S. R. Lipsitz, G. Molenberghs, G. M. Fitzmaurice, and J. Ibrahim, "GEE with Gaussian estimation of the correlations when data are incomplete," Biometrics, vol. 56, no. 2, pp. 528–536, 2000.

[18] Y.-G. Wang and V. J. Carey, "Unbiased estimating equations from working correlation models for irregularly timed repeated measures," Journal of the American Statistical Association, vol. 99, no. 467, pp. 845–853, 2004.

[19] A. Qu and B. G. Lindsay, "Building adaptive estimating equations when inverse of covariance estimation is difficult," Journal of the Royal Statistical Society B: Statistical Methodology, vol. 65, no. 1, pp. 127–142, 2003.

[20] S. R. Lipsitz and G. M. Fitzmaurice, "Estimating equations for measures of association between repeated binary responses," Biometrics, vol. 52, no. 3, pp. 903–912, 1996.

[21] Y. Lee and J. A. Nelder, "Conditional and marginal models: another view," Statistical Science, vol. 19, no. 2, pp. 219–238, 2004.

[22] Y. Lee and J. A. Nelder, "Likelihood inference for models with unobservables: another view," Statistical Science, vol. 24, no. 3, pp. 255–269, 2009.

[23] A. Qu, B. G. Lindsay, and B. Li, "Improving generalised estimating equations using quadratic inference functions," Biometrika, vol. 87, no. 4, pp. 823–836, 2000.

[24] G. Kauermann and R. J. Carroll, "A note on the efficiency of sandwich covariance matrix estimation," Journal of the American Statistical Association, vol. 96, no. 456, pp. 1387–1396, 2001.

[25] Y. G. Wang and L. Y. Hin, "Modeling strategies in longitudinal data analysis: covariate, variance function and correlation structure selection," Computational Statistics and Data Analysis, vol. 54, no. 12, pp. 3359–3370, 2010.

[26] W. Pan, "Goodness-of-fit tests for GEE with correlated binary data," Scandinavian Journal of Statistics, vol. 29, no. 1, pp. 101–110, 2002.

[27] A. M. Wood, I. R. White, and P. Royston, "How should variable selection be performed with multiply imputed data?" Statistics in Medicine, vol. 27, no. 17, pp. 3227–3246, 2008.

[28] M. D. Begg and M. K. Parides, "Separation of individual-level and cluster-level covariate effects in regression analysis of correlated data," Statistics in Medicine, vol. 22, no. 16, pp. 2591–2602, 2003.

[29] L. Y. Hin, V. J. Carey, and Y. G. Wang, "Criteria for working-correlation-structure selection in GEE: assessment via simulation," The American Statistician, vol. 61, no. 4, pp. 360–364, 2007.

[30] J. X. Pan and G. Mackenzie, "On modelling mean-covariance structures in longitudinal studies," Biometrika, vol. 90, no. 1, pp. 239–244, 2003.

[31] M. Davidian and R. J. Carroll, "Variance function estimation," Journal of the American Statistical Association, vol. 82, no. 400, pp. 1079–1091, 1987.

[32] M. Pourahmadi, "Joint mean-covariance models with applications to longitudinal data: unconstrained parameterisation," Biometrika, vol. 86, no. 3, pp. 677–690, 1999.

[33] S. Konishi and G. Kitagawa, "Generalised information criteria in model selection," Biometrika, vol. 83, no. 4, pp. 875–890, 1996.

[34] B. Zhang, "Summarizing the goodness of fit of generalized linear models for longitudinal data," Statistics in Medicine, vol. 19, pp. 1265–1275, 2000.

[35] A. Rotnitzky and N. P. Jewell, "Hypothesis testing of regression parameters in semiparametric generalized linear models for cluster correlated data," Biometrika, vol. 77, no. 3, pp. 485–497, 1990.

[36] J. Shults and N. R. Chaganty, "Analysis of serially correlated data using quasi-least squares," Biometrics, vol. 54, no. 4, pp. 1622–1630, 1998.

[37] V. J. Carey and Y.-G. Wang, "Working covariance model selection for generalized estimating equations," Statistics in Medicine, vol. 30, no. 26, pp. 3117–3124, 2011.

[38] W. Pan, "Akaike's information criterion in generalized estimating equations," Biometrics, vol. 57, no. 1, pp. 120–125, 2001.

[39] H. Akaike, "Information theory and an extension of the maximum likelihood principle," in Proceedings of the 2nd International Symposium on Information Theory, vol. 15, pp. 267–281, 1973.

[40] J. A. Nelder and Y. Lee, "Likelihood, quasi-likelihood and pseudolikelihood: some comparisons," Journal of the Royal Statistical Society B, vol. 54, no. 1, pp. 273–284, 1992.

[41] J. Cui, "QIC program and model selection in GEE analyses," The Stata Journal, vol. 7, no. 2, pp. 209–220, 2007.

[42] J. Cui and G. Qian, "Selection of working correlation structure and best model in GEE analyses of longitudinal data," Communications in Statistics: Simulation and Computation, vol. 36, no. 4–6, pp. 987–996, 2007.

[43] L. Y. Hin and Y. G. Wang, "Working-correlation-structure identification in generalized estimating equations," Statistics in Medicine, vol. 28, no. 4, pp. 642–658, 2009.

[44] J. A. Nelder and D. Pregibon, "An extended quasi-likelihood function," Biometrika, vol. 74, no. 2, pp. 221–232, 1987.


[45] M. Wang, M. Kong, and S. Datta, "Inference for marginal linear models for clustered longitudinal data with potentially informative cluster sizes," Statistical Methods in Medical Research, vol. 20, no. 4, pp. 347–367, 2011.

[46] E. Cantoni, J. M. Flemming, and E. Ronchetti, "Variable selection for marginal longitudinal generalized linear models," Biometrics: Journal of the International Biometric Society, vol. 61, no. 2, pp. 507–514, 2005.

[47] Y.-G. Wang and X. Lin, "Effects of variance-function misspecification in analysis of longitudinal data," Biometrics, vol. 61, no. 2, pp. 413–421, 2005.

[48] N. R. Chaganty and H. Joe, "Efficiency of generalized estimating equations for binary responses," Journal of the Royal Statistical Society, Series B: Statistical Methodology, vol. 66, no. 4, pp. 851–860, 2004.

[49] M. Gosho, C. Hamada, and I. Yoshimura, "Criterion for the selection of a working correlation structure in the generalized estimating equation approach for longitudinal balanced data," Communications in Statistics, vol. 40, no. 21, pp. 3839–3856, 2011.

[50] M. Gosho, C. Hamada, and I. Yoshimura, "Selection of working correlation structure in weighted generalized estimating equation method for incomplete longitudinal data," Communications in Statistics, vol. 43, no. 1, pp. 62–81, 2014.

[51] M. J. Jang, Working correlation selection in generalized estimating equations [Dissertation], University of Iowa, 2011.

[52] J. Chen and N. A. Lazar, "Selection of working correlation structure in generalized estimating equations via empirical likelihood," Journal of Computational and Graphical Statistics, vol. 21, no. 1, pp. 18–41, 2012.

[53] P. M. Westgate, "A bias-corrected covariance estimator for improved inference when using an unstructured correlation with quadratic inference functions," Statistics and Probability Letters, vol. 83, no. 6, pp. 1553–1558, 2013.

[54] P. M. Westgate, "Criterion for the simultaneous selection of a working correlation structure and either generalized estimating equations or the quadratic inference function approach," Biometrical Journal, vol. 56, no. 3, pp. 461–476, 2014.

[55] P. M. Westgate, "Improving the correlation structure selection approach for generalized estimating equations and balanced longitudinal data," Statistics in Medicine, vol. 33, no. 13, pp. 2222–2237, 2014.

[56] J. Ye, "On measuring and correcting the effects of data mining and model selection," Journal of the American Statistical Association, vol. 93, no. 441, pp. 120–131, 1998.

[57] J. J. Shuster, Practical Handbook of Sample Size Guidelines for Clinical Trials, CRC Press, Boca Raton, Fla, USA, 1993.

[58] G. Liu and K.-Y. Liang, "Sample size calculations for studies with correlated observations," Biometrics, vol. 53, no. 3, pp. 937–947, 1997.

[59] W. J. Shih, "Sample size and power calculations for periodontal and other studies with clustered samples using the method of generalized estimating equations," Biometrical Journal, vol. 39, no. 8, pp. 899–908, 1997.

[60] S. R. Lipsitz and G. M. Fitzmaurice, "Sample size for repeated measures studies with binary responses," Statistics in Medicine, vol. 13, no. 12, pp. 1233–1239, 1994.

[61] W. Pan, "Sample size and power calculations with correlated binary data," Controlled Clinical Trials, vol. 22, no. 3, pp. 211–227, 2001.

[62] N. Breslow, "Tests of hypotheses in overdispersed Poisson regression and other quasi likelihood models," Journal of the American Statistical Association, vol. 85, pp. 565–571, 1990.

[63] E. W. Lee and N. Dubin, "Estimation and sample size considerations for clustered binary responses," Statistics in Medicine, vol. 13, no. 12, pp. 1241–1252, 1994.

[64] D. J. Sargent, J. A. Sloan, and S. S. Cha, "Sample size and design considerations for phase II clinical trials with correlated observations," Controlled Clinical Trials, vol. 20, no. 3, pp. 242–252, 1999.

[65] C. S. Li, "Semiparametric negative binomial regression models," Communications in Statistics: Simulation and Computation, vol. 39, no. 3, pp. 475–486, 2010.

[66] W. H. Greene, "Accounting for excess zeros and sample selection in Poisson and negative binomial regression models," Tech. Rep., New York University, 1994.

[67] P. Lambert, "Modeling of repeated series of count data measured at unequally spaced times," Applied Statistics, vol. 45, pp. 31–38, 1996.

[68] M. S. Pepe and G. L. Anderson, "A cautionary note on inference for marginal regression models with longitudinal data and general correlated response data," Communications in Statistics, Series B, vol. 23, pp. 939–951, 1994.

[69] M. Wang and Q. Long, "Modified robust variance estimator for generalized estimating equations with improved small-sample performance," Statistics in Medicine, vol. 30, no. 11, pp. 1278–1291, 2011.

[70] M. Taljaard, A. D. McRae, C. Weijer et al., "Inadequate reporting of research ethics review and informed consent in cluster randomised trials: review of random sample of published trials," British Medical Journal, vol. 342, Article ID d2496, 2011.

[71] L. A. Mancl and T. A. DeRouen, "A covariance estimator for GEE with improved small-sample properties," Biometrics, vol. 57, no. 1, pp. 126–134, 2001.

[72] M. P. Fay and B. I. Graubard, "Small-sample adjustments for Wald-type tests using sandwich estimators," Biometrics, vol. 57, no. 4, pp. 1198–1206, 2001.

[73] W. Pan, "On the robust variance estimator in generalised estimating equations," Biometrika, vol. 88, no. 3, pp. 901–906, 2001.

[74] W. Pan and M. M. Wall, "Small-sample adjustments in using the sandwich variance estimator in generalized estimating equations," Statistics in Medicine, vol. 21, no. 10, pp. 1429–1441, 2002.

[75] X. Guo, W. Pan, J. E. Connett, P. J. Hannan, and S. A. French, "Small-sample performance of the robust score test and its modifications in generalized estimating equations," Statistics in Medicine, vol. 24, no. 22, pp. 3479–3495, 2005.

[76] D. M. Farewell, "Marginal analyses of longitudinal data with an informative pattern of observations," Biometrika, vol. 97, no. 1, pp. 65–78, 2010.

[77] J. D. Beck, T. Sharp, G. G. Koch, and S. Offenbacher, "A 5-year study of attachment loss and tooth loss in community-dwelling older adults," Journal of Periodontal Research, vol. 32, no. 6, pp. 516–523, 1997.

[78] S. J. Arbes Jr., H. Agustsdottir, and G. D. Slade, "Environmental tobacco smoke and periodontal disease in the United States," American Journal of Public Health, vol. 91, no. 2, pp. 253–257, 2001.

[79] J. M. Robins, A. Rotnitzky, and L. P. Zhao, "Analysis of semiparametric regression models for repeated outcomes in the presence of missing data," Journal of the American Statistical Association, vol. 90, pp. 106–121, 1995.

[80] E. B. Hoffman, P. K. Sen, and C. R. Weinberg, "Within-cluster resampling," Biometrika, vol. 88, no. 4, pp. 1121–1134, 2001.

[81] J. M. Williamson, S. Datta, and G. A. Satten, "Marginal analyses of clustered data when cluster size is informative," Biometrics, vol. 59, no. 1, pp. 36–42, 2003.

[82] E. Benhin, J. N. Rao, and A. J. Scott, "Mean estimating equation approach to analysing cluster-correlated data with nonignorable cluster sizes," Biometrika, vol. 92, no. 2, pp. 435–450, 2005.

[83] X. J. Cong, G. Yin, and Y. Shen, "Marginal analysis of correlated failure time data with informative cluster sizes," Biometrics, vol. 63, no. 3, pp. 663–672, 2007.

[84] T. C. Chiang and K. Y. Lee, "Efficient estimation methods for informative cluster size data," Statistica Sinica, vol. 80, pp. 121–123, 2008.

[85] M. Pavlou, S. R. Seaman, and A. J. Copas, "An examination of a method for marginal inference when the cluster size is informative," Statistica Sinica, vol. 23, no. 2, pp. 791–801, 2013.

[86] S. R. Seaman, M. Pavlou, and A. J. Copas, "Methods for observed-cluster inference when cluster size is informative: a review and clarifications," Biometrics, vol. 70, no. 2, pp. 449–456, 2014.

[87] Z. Chen, B. Zhang, and P. S. Albert, "A joint modeling approach to data with informative cluster size: robustness to the cluster size model," Statistics in Medicine, vol. 30, no. 15, pp. 1825–1836, 2011.

[88] Y. Huang and B. Leroux, "Informative cluster sizes for subcluster-level covariates and weighted generalized estimating equations," Biometrics, vol. 67, no. 3, pp. 843–851, 2011.

[89] B. F. Kurland, L. L. Johnson, B. L. Egleston, and P. H. Diehr, "Longitudinal data with follow-up truncated by death: match the analysis method to research aims," Statistical Science, vol. 24, no. 2, pp. 211–222, 2009.

[90] J. M. Neuhaus and C. E. McCulloch, "Estimation of covariate effects in generalized linear mixed models with informative cluster sizes," Biometrika, vol. 98, no. 1, pp. 147–162, 2011.

[91] S. R. Lipsitz, G. M. Fitzmaurice, E. J. Orav, and N. M. Laird, "Performance of generalized estimating equations in practical situations," Biometrics, vol. 50, no. 1, pp. 270–278, 1994.

[92] D. B. Hall and T. A. Severini, "Extended generalized estimating equations for clustered data," Journal of the American Statistical Association, vol. 93, no. 444, pp. 1365–1375, 1998.

[93] C.-W. Shen and Y.-H. Chen, "Model selection for generalized estimating equations accommodating dropout missingness," Biometrics, vol. 68, no. 4, pp. 1046–1054, 2012.

[94] C.-W. Shen and Y.-H. Chen, "Model selection of generalized estimating equations with multiply imputed longitudinal data," Biometrical Journal, vol. 55, no. 6, pp. 899–911, 2013.

[95] D. B. Rubin, "Inference and missing data," Biometrika, vol. 63, no. 3, pp. 581–592, 1976.

[96] R. J. Little and D. B. Rubin, Statistical Analysis with Missing Data, Wiley, New York, NY, USA.

[97] P. Diggle, D. Farewell, and R. Henderson, "Analysis of longitudinal data with drop-out: objectives, assumptions and a proposal," Journal of the Royal Statistical Society C, vol. 56, no. 5, pp. 499–550, 2007.

[98] A. J. Copas and S. R. Seaman, "Bias from the use of generalized estimating equations to analyze incomplete longitudinal binary data," Journal of Applied Statistics, vol. 37, no. 6, pp. 911–922, 2010.

[99] L. Wang, J. Zhou, and A. Qu, "Penalized generalized estimating equations for high-dimensional longitudinal data analysis," Biometrics, vol. 68, no. 2, pp. 353–360, 2012.


coefficient under a misspecified "working" correlation structure based on asymptotic theory [7]. In addition, the estimation of the correlation coefficients using the moment-based approach is not efficient; thus the correlation matrix may not be positive definite in certain cases. Also, Liang and Zeger did not incorporate the constraints on the range of correlation, which is restricted by the marginal means, because the estimation of the correlation coefficients was simply based on Pearson residuals [6]. Chaganty and Joe discussed this issue for dependent Bernoulli random variables [13], and later Sabo and Chaganty provided further explanation [14]. For example, Sutradhar and Das pointed out that, under misspecification, the correlation coefficient estimates did not converge to the true values [15]. Furthermore, for discrete random vectors, the correlation matrix is usually complicated, and it is not easy to attain multivariate distributions with specified correlation structures. These limitations have led researchers to actively develop novel methodologies in this area. Several alternative approaches for estimating the correlation coefficients have been proposed. For example, one method was based on "Gaussian" estimation [16, 17]; the basic idea is to estimate the correlation coefficients based on multivariate normal estimating equations, with the feature that this estimation ensures that the estimated correlation matrix is positive definite. Wang and Carey proposed to estimate the correlation coefficients by differentiating the Cholesky decomposition of the working correlation matrix [18]. Also, Qu and Lindsay (2003) proposed similar Gaussian or quadratic estimating equations [19]. In particular, for binary longitudinal data, the estimation of the correlation coefficients was proposed based on conditional residuals [20–22]. Nevertheless, in this paper, the above issues are not discussed in great depth, and we assume that, under mild regularity conditions, the consistency of the parameter estimates as well as of the within-subject correlation coefficient estimate holds. Thus, three specific topics, namely model selection, power analysis, and the issue of informative cluster size, are the main focus, and the recent developments are reviewed in the following sections.
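The constraint that marginal means place on the correlation of binary responses can be made concrete. The sketch below is an illustration based on the Fréchet bounds for the joint probability $P(Y_1 = 1, Y_2 = 1)$ of two Bernoulli variables with means $p_1$ and $p_2$; it is not code from the cited papers, and the function name `bernoulli_corr_bounds` is hypothetical.

```python
import math


def bernoulli_corr_bounds(p1, p2):
    """Range of feasible correlations between Bernoulli(p1) and Bernoulli(p2).
    Uses max(0, p1 + p2 - 1) <= P(Y1 = 1, Y2 = 1) <= min(p1, p2)."""
    sd = math.sqrt(p1 * (1 - p1) * p2 * (1 - p2))
    p11_min = max(0.0, p1 + p2 - 1.0)
    p11_max = min(p1, p2)
    lower = (p11_min - p1 * p2) / sd
    upper = (p11_max - p1 * p2) / sd
    return lower, upper


print(bernoulli_corr_bounds(0.5, 0.5))  # equal means: the full range is attainable
print(bernoulli_corr_bounds(0.1, 0.5))  # unequal means: a much narrower range
```

With marginal means 0.1 and 0.5, for instance, the correlation cannot exceed about 0.33 in absolute value, so a moment-based estimate or a prespecified "working" value outside that range does not correspond to any valid joint distribution.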

2 Method

2.1. Notation and GEE. Suppose that longitudinal/clustered data consist of $K$ subjects/clusters. For subject/cluster $i$ ($i = 1, 2, \ldots, K$), suppose that there are $n_i$ observations, $Y_{ij}$ denotes the $j$th response ($j = 1, \ldots, n_i$), and $X_{ij}$ denotes a $p \times 1$ vector of covariates. Let $Y_i = (Y_{i1}, Y_{i2}, \ldots, Y_{in_i})'$ denote the response vector for the $i$th subject, with the mean vector denoted by $\mu_i = (\mu_{i1}, \mu_{i2}, \ldots, \mu_{in_i})'$, where $\mu_{ij}$ is the corresponding $j$th mean. The responses are assumed to be independent across subjects/clusters but correlated within each subject/cluster. The marginal model specifies the relationship between $\mu_{ij}$ and the covariates $X_{ij}$ as follows:

$$g(\mu_{ij}) = X_{ij}'\beta, \tag{1}$$

where $g$ is a known link function and $\beta$ is an unknown $p \times 1$ vector of regression coefficients with true value $\beta_0$. The conditional variance of $Y_{ij}$ given $X_{ij}$ is specified as $\mathrm{Var}(Y_{ij} \mid X_{ij}) = \nu(\mu_{ij})\phi$, where $\nu$ is a known variance function of $\mu_{ij}$ and $\phi$ is a scale parameter which may need to be estimated. Mostly, $\nu$ and $\phi$ depend on the distribution of the outcomes. For instance, if $Y_{ij}$ is continuous, $\nu(\mu_{ij})$ is specified as 1 and $\phi$ represents the error variance; if $Y_{ij}$ is a count, $\nu(\mu_{ij}) = \mu_{ij}$ and $\phi$ is equal to 1. Also, the variance-covariance matrix of $Y_i$ is denoted by $V_i = \phi A_i^{1/2} R_i(\alpha) A_i^{1/2}$, where $A_i = \mathrm{Diag}\{\nu(\mu_{i1}), \ldots, \nu(\mu_{in_i})\}$ and the so-called “working” correlation structure $R_i(\alpha)$ describes the pattern of measures within a subject; it is of size $n_i \times n_i$ and depends on a vector of association parameters denoted by $\alpha$. Table 1 provides a summary of commonly used “working” correlation structures with the moment-based estimates for $\alpha$ (more details in http://www.okstate.edu/sas). Note that an iterative algorithm is applied for estimating $\alpha$ using the Pearson residuals $e_{ij} = (y_{ij} - \mu_{ij})/\sqrt{\nu(\mu_{ij})}$ calculated from the current value of $\beta$. Also, the scale parameter $\phi$ can be estimated by

$$\hat{\phi} = \frac{1}{N - p} \sum_{i=1}^{K} \sum_{j=1}^{n_i} e_{ij}^2, \tag{2}$$

where $N = \sum_{i=1}^{K} n_i$ is the total number of observations and $p$ is the dimension of the covariates.

Based on Liang and Zeger [6], GEE yields an asymptotically consistent $\hat{\beta}$ even when the “working” correlation structure ($R_i(\alpha)$) is misspecified, and the estimate of $\beta$ is obtained by solving the following estimating equation:

$$U(\beta) = \sum_{i=1}^{K} D_i' V_i^{-1} (Y_i - \mu_i) = 0, \tag{3}$$

where $D_i = \partial\mu_i/\partial\beta'$. Under mild regularity conditions, $\hat{\beta}$ is asymptotically normally distributed with mean $\beta_0$ and a covariance matrix estimated by the sandwich estimator

$$\hat{V}_{LZ} = \left(\sum_{i=1}^{K} D_i' V_i^{-1} D_i\right)^{-1} \hat{M}_{LZ} \left(\sum_{i=1}^{K} D_i' V_i^{-1} D_i\right)^{-1} \tag{4}$$

with

$$\hat{M}_{LZ} = \sum_{i=1}^{K} D_i' V_i^{-1}\, \mathrm{Cov}(Y_i)\, V_i^{-1} D_i, \tag{5}$$

by replacing $\alpha$, $\beta$, and $\phi$ with their consistent estimates, where $\mathrm{Cov}(Y_i) = r_i r_i'$ with $r_i = Y_i - \hat{\mu}_i$ is an estimator of the variance-covariance matrix of $Y_i$ [6, 23]. This “sandwich” estimator is robust in that it is consistent even if the correlation structure ($V_i$) is misspecified. Note that if $V_i$ is correctly specified, then $\hat{V}_{LZ}$ reduces to $(\sum_{i=1}^{K} D_i' V_i^{-1} D_i)^{-1}$, which is often referred to as the model-based variance estimator [24]. Thus, a Wald $Z$-test can be performed based on the asymptotic normal distribution of the test statistic. Next, we will overview model selection criteria, and particularly “working” correlation structure selection criteria, with regard to GEE.
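To make the estimating equation and the two variance estimators concrete, the following is a minimal sketch, not the implementation used in any particular software. It assumes an identity link with $\nu(\mu) = 1$ and working correlation matrices that are fixed in advance, so that the estimating equation has a closed-form root. The function name `gee_identity_link` and the toy data are illustrative assumptions.

```python
import numpy as np

def gee_identity_link(X_list, y_list, R_list):
    """GEE fit for an identity link with nu(mu) = 1 and fixed working correlations R_i."""
    p = X_list[0].shape[1]
    bread = np.zeros((p, p))          # sum_i X_i' R_i^{-1} X_i
    rhs = np.zeros(p)                 # sum_i X_i' R_i^{-1} y_i
    for X, y, R in zip(X_list, y_list, R_list):
        R_inv = np.linalg.inv(R)
        bread += X.T @ R_inv @ X
        rhs += X.T @ R_inv @ y
    # For a linear mean, U(beta) = 0 has this closed-form root (phi cancels).
    beta = np.linalg.solve(bread, rhs)

    # Pearson residuals (nu = 1) and the moment estimate of the scale parameter.
    resid = [y - X @ beta for X, y in zip(X_list, y_list)]
    N = sum(len(r) for r in resid)
    phi = sum(float(r @ r) for r in resid) / (N - p)

    # Middle ("meat") matrix with Cov(Y_i) = r_i r_i'. It is written without phi
    # because the phi factors of V_i = phi * R_i cancel in the full sandwich.
    meat = np.zeros((p, p))
    for X, r, R in zip(X_list, resid, R_list):
        score = X.T @ np.linalg.inv(R) @ r
        meat += np.outer(score, score)

    bread_inv = np.linalg.inv(bread)
    model_based = phi * bread_inv             # (sum_i D_i' V_i^{-1} D_i)^{-1}
    sandwich = bread_inv @ meat @ bread_inv   # robust "sandwich" estimator
    return beta, phi, model_based, sandwich

# Toy data (assumed): exchangeable truth with alpha = 0.3, analysed under
# working independence, so the working structure is deliberately misspecified.
rng = np.random.default_rng(0)
K, n, alpha = 200, 4, 0.3
R_true = (1 - alpha) * np.eye(n) + alpha * np.ones((n, n))
chol = np.linalg.cholesky(R_true)
X_list, y_list = [], []
for _ in range(K):
    X = np.column_stack([np.ones(n), rng.uniform(size=n)])
    X_list.append(X)
    y_list.append(X @ np.array([0.5, 0.5]) + chol @ rng.standard_normal(n))

beta, phi, V_model, V_sandwich = gee_identity_link(X_list, y_list, [np.eye(n)] * K)
print("beta:", beta)
print("model-based SE:", np.sqrt(np.diag(V_model)))
print("sandwich SE:   ", np.sqrt(np.diag(V_sandwich)))
```

With a nonlinear link, the same quantities would be computed inside an iterative (e.g., Fisher scoring) loop, with $D_i$ and $A_i$ re-evaluated at each step.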


Table 1: Summary of commonly used “working” correlation structures for GEE.

| Correlation structure | $\mathrm{Corr}(Y_{ij}, Y_{ik})$ | Sample matrix | Estimator |
|---|---|---|---|
| Independent | $\mathrm{Corr}(Y_{ij}, Y_{ik}) = \begin{cases} 1, & j = k \\ 0, & j \neq k \end{cases}$ | $\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$ | N/A |
| Exchangeable | $\mathrm{Corr}(Y_{ij}, Y_{ik}) = \begin{cases} 1, & j = k \\ \alpha, & j \neq k \end{cases}$ | $\begin{pmatrix} 1 & \alpha & \alpha \\ \alpha & 1 & \alpha \\ \alpha & \alpha & 1 \end{pmatrix}$ | $\hat{\alpha} = \dfrac{1}{(N' - p)\phi} \sum_{i=1}^{K} \sum_{j \neq k} e_{ij} e_{ik}$, $N' = \sum_{i=1}^{K} n_i (n_i - 1)$ |
| $k$-dependent | $\mathrm{Corr}(Y_{ij}, Y_{i,j+m}) = \begin{cases} 1, & m = 0 \\ \alpha_m, & m = 1, 2, \ldots, k \\ 0, & m > k \end{cases}$ | $\begin{pmatrix} 1 & \alpha_1 & 0 \\ \alpha_1 & 1 & \alpha_1 \\ 0 & \alpha_1 & 1 \end{pmatrix}$ | $\hat{\alpha}_m = \dfrac{1}{(K_m - p)\phi} \sum_{i=1}^{K} \sum_{j \le n_i - m} e_{ij} e_{i,j+m}$, $K_m = \sum_{i=1}^{K} (n_i - m)$ |
| Autoregressive AR(1) | $\mathrm{Corr}(Y_{ij}, Y_{i,j+m}) = \alpha^m$, $m = 0, 1, 2, \ldots, n_i - j$ | $\begin{pmatrix} 1 & \alpha & \alpha^2 \\ \alpha & 1 & \alpha \\ \alpha^2 & \alpha & 1 \end{pmatrix}$ | $\hat{\alpha} = \dfrac{1}{(K_1 - p)\phi} \sum_{i=1}^{K} \sum_{j \le n_i - 1} e_{ij} e_{i,j+1}$, $K_1 = \sum_{i=1}^{K} (n_i - 1)$ |
| Toeplitz | $\mathrm{Corr}(Y_{ij}, Y_{i,j+m}) = \begin{cases} 1, & m = 0 \\ \alpha_m, & m = 1, 2, \ldots, n_i - j \end{cases}$ | $\begin{pmatrix} 1 & \alpha_1 & \alpha_2 \\ \alpha_1 & 1 & \alpha_1 \\ \alpha_2 & \alpha_1 & 1 \end{pmatrix}$ | $\hat{\alpha}_m = \dfrac{1}{(K_m - p)\phi} \sum_{i=1}^{K} \sum_{j \le n_i - m} e_{ij} e_{i,j+m}$, $K_m = \sum_{i=1}^{K} (n_i - m)$ |
| Unstructured | $\mathrm{Corr}(Y_{ij}, Y_{ik}) = \begin{cases} 1, & j = k \\ \alpha_{jk}, & j \neq k \end{cases}$ | $\begin{pmatrix} 1 & \alpha_{12} & \alpha_{13} \\ \alpha_{21} & 1 & \alpha_{23} \\ \alpha_{31} & \alpha_{32} & 1 \end{pmatrix}$ | $\hat{\alpha}_{jk} = \dfrac{1}{(K - p)\phi} \sum_{i=1}^{K} e_{ij} e_{ik}$ |
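As an illustration of the moment-based estimators in Table 1, the sketch below builds the exchangeable and AR(1) working correlation matrices and computes the corresponding estimates of $\alpha$ from per-cluster Pearson residuals. The function names are illustrative assumptions, and the residuals, $p$, and $\phi$ are taken as given from a current GEE iteration.

```python
import numpy as np

def exchangeable_corr(n, alpha):
    """n x n exchangeable working correlation matrix."""
    return (1 - alpha) * np.eye(n) + alpha * np.ones((n, n))

def ar1_corr(n, alpha):
    """n x n AR(1) working correlation matrix with entries alpha^|j - k|."""
    idx = np.arange(n)
    return alpha ** np.abs(idx[:, None] - idx[None, :])

def alpha_exchangeable(resid_list, p, phi):
    """Moment estimate for the exchangeable structure (sum over j != k)."""
    # sum_{j != k} e_j e_k = (sum_j e_j)^2 - sum_j e_j^2 within each cluster
    cross = sum(float(e.sum() ** 2 - (e ** 2).sum()) for e in resid_list)
    n_pairs = sum(len(e) * (len(e) - 1) for e in resid_list)   # N'
    return cross / ((n_pairs - p) * phi)

def alpha_ar1(resid_list, p, phi):
    """Moment estimate for AR(1) from lag-1 residual products."""
    lag1 = sum(float(e[:-1] @ e[1:]) for e in resid_list)
    K1 = sum(len(e) - 1 for e in resid_list)
    return lag1 / ((K1 - p) * phi)

# Small usage example with simulated residuals (assumed values).
rng = np.random.default_rng(0)
chol = np.linalg.cholesky(exchangeable_corr(4, 0.3))
residuals = [chol @ rng.standard_normal(4) for _ in range(500)]
print("exchangeable alpha estimate:", alpha_exchangeable(residuals, p=2, phi=1.0))
print("AR(1) alpha estimate:       ", alpha_ar1(residuals, p=2, phi=1.0))
```

In practice these estimates are recomputed after every update of $\beta$, and the resulting matrix should be checked for positive definiteness before it is inverted.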

2.2. Model Selection of GEE. In this section, we discuss the model selection criteria available for GEE. There are several reasons why model selection for GEE models is important and necessary. (1) GEE has gained increasing attention in biomedical studies, which may include a large group of predictors [25–28]. Therefore, variable selection is necessary for determining which predictors are included in the final regression model by identifying the significant ones. (2) One feature of GEE is that the consistency of the parameter estimates still holds even when the “working” correlation structure is misspecified, but correctly specifying the “working” correlation structure can enhance the efficiency of the parameter estimates, in particular when the sample size is not large enough [16, 24, 25, 29]. Therefore, how to select the intrasubject correlation matrix plays a vital role in improving the finite-sample performance of GEE. (3) The variance function $\nu(\mu)$ is another potential factor affecting the goodness-of-fit of GEE [25, 30]. A correctly specified variance function can assist in the selection of covariates and an appropriate correlation structure [31, 32]. Different criteria might be needed depending on the goal of model selection [24, 29, 33]; next, we introduce the existing approaches for selecting the “working” correlation structure, each with its own merits and limitations [34].

According to Rotnitzky and Jewell, the adequacy of the “working” correlation structure can be examined through $\Gamma = (\sum_{i=1}^{K} D_i' V_i^{-1} D_i)^{-1} \hat{M}_{LZ}$, where $\hat{M}_{LZ}$ has been defined in Section 2.1 [35]. The statistic RJ($R$) is defined by

$$\mathrm{RJ}(R) = \sqrt{(1 - \mathrm{RJ1})^2 + (1 - \mathrm{RJ2})^2}, \tag{6}$$

where $\mathrm{RJ1} = \mathrm{trace}(\Gamma)/p$ and $\mathrm{RJ2} = \mathrm{trace}(\Gamma^2)/p$, respectively. If the “working” correlation structure $R$ is correctly specified, RJ1 and RJ2 will be close to 1, leading to RJ($R$) approaching 0. Thus, RJ1, RJ2, and RJ($R$) can all be used for correlation structure selection.
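The three Rotnitzky-Jewell quantities can be computed directly once the two matrices entering $\Gamma$ are available. The sketch below is illustrative only: the function name `rj_criteria` and its arguments are assumptions, with `Q0` standing for $\sum_i D_i' V_i^{-1} D_i$ and `M` for the middle matrix of the sandwich estimator.

```python
import numpy as np

def rj_criteria(Q0, M):
    """Return (RJ1, RJ2, RJ) for Gamma = Q0^{-1} M."""
    p = Q0.shape[0]
    gamma = np.linalg.solve(Q0, M)            # Q0^{-1} M without forming the inverse
    rj1 = np.trace(gamma) / p
    rj2 = np.trace(gamma @ gamma) / p
    rj = np.sqrt((1 - rj1) ** 2 + (1 - rj2) ** 2)
    return rj1, rj2, rj

# If the working structure is correct, M is close to Q0 and Gamma is close to I.
Q0 = np.array([[2.0, 0.5], [0.5, 1.0]])
print(rj_criteria(Q0, Q0))        # RJ1 and RJ2 near 1, RJ near 0
print(rj_criteria(Q0, 2 * Q0))    # inflated middle matrix moves all three away
```

Among candidate working structures, the one with RJ1 and RJ2 closest to 1 (equivalently, the smallest RJ) would be preferred under this criterion.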

Shults and Chaganty [36] proposed a criterion for selecting the “working” correlation structure based on the minimization of the generalized error sum of squares (ESS), given as follows:

$$\mathrm{ESS}(\alpha, \beta) = \sum_{i=1}^{K} (Y_i - \mu_i)' V_i^{-1} (Y_i - \mu_i) = \sum_{i=1}^{K} Z_i'(\beta) R_i^{-1}(\alpha) Z_i(\beta), \tag{7}$$

where $Z_i(\beta) = A_i^{-1/2}(Y_i - \mu_i)$. The criterion is defined by

$$\mathrm{SC} = \frac{\mathrm{ESS}(\alpha, \beta)}{N - p - q}, \tag{8}$$

where $N = \sum_{i=1}^{K} n_i$ is the total number of observations, $p$ is the number of regression parameters, and $q$ is the number of correlation coefficients within the “working” correlation structure. Another criterion extended from SC was proposed by Carey and Wang [37], where the Gaussian pseudolikelihood (GP) is adopted; it is given by

$$\mathrm{GP}(R) = -0.5 \times \sum_{i=1}^{K} \left[ (Y_i - \mu_i)' V_i^{-1} (Y_i - \mu_i) + \log(|V_i|) \right], \tag{9}$$

where a better “working” correlation structure yields a larger GP. In their work, they also showed via simulation that the GP criterion performed better than RJ.

Another criterion was proposed by Pan [38], who modified the Akaike information criterion (AIC) [39] for use with GEE. Because GEE is not likelihood-based, the criterion is called the quasi-likelihood under the independence model criterion (QIC) [40]. The basic idea is to calculate the expected Kullback-Leibler discrepancy using the quasi-likelihood under the independence “working” correlation assumption, due to the lack of a general and tractable quasi-likelihood for correlated data under any other complex “working” correlation structure. QIC($R$) is defined by

$$\mathrm{QIC}(R) = -2\Psi(\hat{\beta}(R); I) + 2\,\mathrm{trace}(\hat{\Omega}_I \hat{V}_{LZ}), \tag{10}$$

where the quasilikelihood $\Psi(\hat{\beta}(R); I) = \sum_{i=1}^{K} \sum_{j=1}^{n_i} Q(\hat{\beta}(R), \hat{\phi}; \{Y_{ij}, X_{ij}\})$ with $Q(\mu, \phi; y) = \int_{y}^{\mu} \frac{y - t}{\phi \nu(t)}\, dt$ defined by [12]; $\hat{\beta}(R)$ and $\hat{\phi}$ are obtained under the hypothesized “working” correlation structure $R$; $\hat{\Omega}_I = \sum_{i=1}^{K} D_i' V_i^{-1} D_i \,|_{\beta = \hat{\beta}(R),\, R = I}$; and $\hat{V}_{LZ}$ is defined above with $\beta$ replaced by $\hat{\beta}(R)$ [38]. Note that, in this work, Pan ignored the second term in the Taylor expansion of the discrepancy and showed that its influence was not substantial in his simulation setups. Later on, Hardin and Hilbe (2003) made a slight modification to QIC($R$) by using $\hat{\beta}(I)$ and $\hat{\phi}(I)$ for more stability, and QIC($R$)$_{\mathrm{HH}}$ is given by

$$\mathrm{QIC}(R)_{\mathrm{HH}} = -2\Psi(\hat{\beta}(I); I) + 2\,\mathrm{trace}(\hat{\Omega}_I \hat{V}_{LZ}). \tag{11}$$

Note that QIC($R$) and QIC($R$)$_{\mathrm{HH}}$ do not perform well in distinguishing the independence and exchangeable “working” correlation structures because, in certain cases, the same regression parameter estimates can be obtained under these two structures. Also, an attractive property of the QIC criterion is that it allows the selection of the covariates and the “working” correlation structure simultaneously [41, 42], but this measure is more sensitive to the mean structure because QIC is dominated by the first term, while the second term plays the role of a penalty. To better select the “working” correlation structure, Hin and Wang proposed the correlation information criterion (CIC), defined by

$$\mathrm{CIC} = \mathrm{trace}(\hat{\Omega}_I \hat{V}_{LZ}). \tag{12}$$

In their work, CIC was shown through simulation studies to outperform QIC when the outcomes were binary [43].

One limitation of this criterion is that it cannot penalize overparameterization; thus, it does not perform well when comparing two correlation structures with quite different numbers of correlation parameters.
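The relationship between QIC and CIC can be sketched in a few lines: CIC is the trace penalty alone, and QIC adds minus twice the independence quasilikelihood. The code below is a hedged illustration, not Pan's or Hin and Wang's software. The helper names are assumptions, the fitted means `mu` are taken as given, and only the Gaussian ($\nu(t) = 1$) and Poisson ($\nu(t) = t$) quasilikelihoods are written out.

```python
import math
import numpy as np

def quasi_gaussian(y, mu, phi=1.0):
    """Integral of (y - t) / phi from y to mu, i.e. -(y - mu)^2 / (2 phi)."""
    return -((y - mu) ** 2) / (2 * phi)

def quasi_poisson(y, mu, phi=1.0):
    """Integral of (y - t) / (phi t) from y to mu; the log term vanishes at y = 0."""
    log_term = y * math.log(mu / y) if y > 0 else 0.0
    return (log_term - (mu - y)) / phi

def cic(omega_I, V_robust):
    """Trace of Omega_I times the robust (sandwich) covariance."""
    return float(np.trace(omega_I @ V_robust))

def qic(y, mu, omega_I, V_robust, quasi, phi=1.0):
    """-2 * (independence quasilikelihood) + 2 * CIC."""
    psi = sum(quasi(yi, mi, phi) for yi, mi in zip(y, mu))
    return -2 * psi + 2 * cic(omega_I, V_robust)

# Usage with made-up inputs: if the robust covariance equals Omega_I^{-1},
# the penalty trace equals p, which mirrors the AIC penalty of 2p.
omega = np.array([[4.0, 1.0], [1.0, 3.0]])
y_obs = [1.2, 0.7, 2.1]
mu_fit = [1.0, 1.0, 2.0]
print("CIC:", cic(omega, np.linalg.inv(omega)))
print("QIC:", qic(y_obs, mu_fit, omega, np.linalg.inv(omega), quasi_gaussian))
```

Because the first term depends only on the fitted means, two working structures that give nearly identical $\hat{\beta}$ differ essentially through the trace term, which is the motivation for using CIC on its own.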

Another attractive criterion is the extended quasilikelihood information criterion (EQIC) proposed by Wang and Hin [25], which uses the extended quasilikelihood (EQL) defined by Nelder and Pregibon based on the deviance function; under the independent correlation structure, it is given by [44]

$$Q^{*}(\beta, \phi; I) = -\frac{1}{2\phi} D(\beta; I) - \frac{1}{2} \sum_{i=1}^{K} \sum_{j=1}^{n_i} \log\left(2\pi\phi A(\mu_{ij})\right), \tag{13}$$

where the sum of deviances $D(\beta; I) = \sum_{i=1}^{K} \sum_{j=1}^{n_i} -2\phi\{Q(y_{ij}; \mu_{ij}) - Q(y_{ij}; y_{ij})\}$, with $Q(\cdot)$ being the quasilikelihood defined as above. Therefore, EQIC is defined by

$$\mathrm{EQIC}(R) = \frac{1}{\phi} D(\beta; I) + \sum_{i=1}^{K} \sum_{j=1}^{n_i} \log\left(2\pi\phi A(\mu_{ij})\right) + 2\,\mathrm{trace}(\hat{\Omega}_I \hat{V}_{LZ}), \tag{14}$$

where some adjustments were applied to $A(\mu)$ by adding a small constant $k$, with the optimal chosen value being 1/6. The authors indicated that the covariates were first selected based on QIC and the variance function could be identified as the one minimizing EQIC given the selected covariates; then, “working” correlation structure selection could be achieved based on CIC. In addition, they found that covariate selection by EQIC under different working variance functions was more consistent than that based on QIC [45].

Besides the criteria mentioned above, Cantoni et al. also discussed covariate selection for longitudinal data analysis [46]; variance function selection was addressed by Pan and Mackenzie [30] as well as Wang and Lin [47]; in addition, more work on “working” correlation structure selection was done by Chaganty and Joe [48], Wang and Lin [47], Gosho et al. [49, 50], Jang [51], Chen [52], and Westgate [53–55], among others. Overall, model selection for GEE is nontrivial, and the best selection criterion is still being pursued [56]; the recent work by Wang et al. can be followed as a rule of thumb [45].

2.3. Sample Size and Power of GEE. It is well known that the calculation of sample size and power is necessary and important for planning a clinical trial, and it has been well studied for independent observations [1]. With the wide application of GEE in clinical trials, this topic has gained more attention than ever for correlated/clustered data [5, 57]. A general method for sample size/power calculation was discussed by Liu and Liang [58], where the generalized score test was utilized to draw statistical inference and the resulting noncentral chi-square distribution of the test statistic under the alternative hypothesis was derived; however, in some special cases, for example, correlated binary data with a nonexchangeable correlation structure, no closed form is available along the outline of that formula. Afterwards, Shih provided an alternative formula for sample size/power calculation, which relies on Wald tests using the estimates of the regression parameters and robust variance estimators [59]. For example, in a study with one parameter of interest $\beta$, the hypothesis of interest can be formulated as

$$H_0: \beta = 0 \quad \text{versus} \quad H_a: \beta = b \neq 0, \tag{15}$$

where $b$ is the expected value. Thus, based on a two-sided $Z$-test with type I error $\eta$, the power, denoted by $\delta$, can be obtained by

$$\delta = 1 - \Phi\left(Z_{1-\eta/2} - \frac{b\sqrt{K}}{\sqrt{\nu_R}}\right), \tag{16}$$

where $Z_a$ denotes the $a$th quantile of the standard normal distribution, $K$ is the sample size, and $\nu_R$ is the robust variance estimator corresponding to $\beta$ in the estimate of $K\hat{V}_{LZ}$. Accordingly, the sample size is given by

$$K = \frac{\nu_R \left(Z_{1-\eta/2} - Z_{1-\delta}\right)^2}{b^2}. \tag{17}$$
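The Wald-based power and sample size formulas are simple enough to evaluate with the Python standard library. The sketch below is an illustration under stated assumptions: quantiles are lower-tail standard normal quantiles (so the quantile at one minus the power is negative when the power exceeds 0.5), the effect $b$ is treated symmetrically through its absolute value, and the function names and numerical inputs are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def gee_power(b, K, nu_R, eta=0.05):
    """Approximate power of the two-sided Wald Z-test for effect size b with K clusters."""
    z_crit = _STD_NORMAL.inv_cdf(1 - eta / 2)
    return 1 - _STD_NORMAL.cdf(z_crit - abs(b) * math.sqrt(K / nu_R))

def gee_sample_size(b, nu_R, eta=0.05, power=0.8):
    """Smallest number of clusters K reaching the target power."""
    z_crit = _STD_NORMAL.inv_cdf(1 - eta / 2)
    z_power = _STD_NORMAL.inv_cdf(1 - power)    # negative when power > 0.5
    return math.ceil(nu_R * (z_crit - z_power) ** 2 / b ** 2)

# Hypothetical planning values: effect b = 0.5 and robust variance nu_R = 4.
K_needed = gee_sample_size(b=0.5, nu_R=4.0)
print("clusters needed:", K_needed)
print("achieved power: ", gee_power(b=0.5, K=K_needed, nu_R=4.0))
```

Rounding $K$ up to the next integer means the achieved power is slightly above the target, while one cluster fewer falls just below it.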

For correlated continuous data, the calculation is straightforward using (16); however, for correlated binary data in particular, more work is needed [60], and Pan provided explicit formulas for $\nu_R$ under various situations as follows [61]:

$$\nu_R = \Omega \left[ \frac{1}{\pi p_0 (1 - p_0)} + \frac{1}{(1 - \pi) p_1 (1 - p_1)} \right], \tag{18}$$

where $\Omega = K \left(\sum_{i=1}^{K} 1_{n_i}' R_i^{-1} V_i R_i^{-1} 1_{n_i}\right) / \left(\sum_{i=1}^{K} 1_{n_i}' R_i^{-1} 1_{n_i}\right)^2$, with $R_i$ the “working” correlation matrix and $V_i$ the true correlation matrix, $\pi$ the proportion of subjects assigned to the control group, and $p_0$ and $p_1$ the means for the control and case groups [61]. The detailed calculations of $\Omega$, and hence $\nu_R$, under several important special cases are given by

$$\begin{aligned}
&\text{If } R_i = V_i = \mathrm{CS}: && \Omega = \frac{K}{\sum_{i=1}^{K} n_i / \left(1 + (n_i - 1)\alpha\right)}; \\
&\text{If } R_i = I,\ V_i = \mathrm{CS}: && \Omega = \frac{K \sum_{i=1}^{K} n_i \left[1 + (n_i - 1)\alpha\right]}{\left(\sum_{i=1}^{K} n_i\right)^2}; \\
&\text{If } R_i = V_i = \mathrm{AR}(1): && \Omega = \frac{K (1 + \alpha)}{\sum_{i=1}^{K} \left[n_i - (n_i - 2)\alpha\right]}; \\
&\text{If } R_i = I,\ V_i = \mathrm{AR}(1): && \Omega = \frac{K \sum_{i=1}^{K} \left[n_i + 2(n_i - 1)\alpha + 2(n_i - 2)\alpha^2 + \cdots + 2\alpha^{n_i - 1}\right]}{\left(\sum_{i=1}^{K} n_i\right)^2}.
\end{aligned} \tag{19}$$

These formulas can be directly used in practice and cover most situations encountered in clinical trials [61]. Note that when $R_i = V_i = \mathrm{CS}$, Liu and Liang (1997) provided a different sample size formula compared with (17) with $n_i = n$, which is

$$K = \frac{\left(Z_{1-\eta/2} - Z_{1-\delta}\right)^2 \left[(1 - \pi) p_0 (1 - p_0) + \pi p_1 (1 - p_1)\right] \left[1 + (n - 1)\alpha\right]}{n \pi (1 - \pi) (p_1 - p_0)^2}. \tag{20}$$

Be aware that the difference is due to the test methods: the Wald $Z$-test used by Pan [61] and the score test applied by Liu and Liang [58]. Note that in some cases the score test may be preferred [62]. Although some other works exist for sample size/power calculation, they focus on alternative approaches rather than GEE [63, 64]; thus, we do not discuss them here. For correlated Poisson data, the sample size/power calculation is more challenging due to the occurrence of overdispersion or sparsity, where the negative binomial regression model may be explored [62, 65–67].
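For correlated binary outcomes with a compound symmetric (CS) true correlation, the Wald-based and score-based sample sizes can be compared numerically. The sketch below is illustrative and rests on assumptions that should be checked against the original references: a logit link is assumed for the Wald version, so that $b$ is the log odds ratio between the two groups; quantiles are lower-tail standard normal quantiles; and all function names and planning values are hypothetical.

```python
import math
from statistics import NormalDist

def omega_cs(n_sizes, alpha, working="CS"):
    """Omega for a CS true correlation under a CS or independence working structure."""
    K = len(n_sizes)
    if working == "CS":     # R_i = V_i = CS
        return K / sum(n / (1 + (n - 1) * alpha) for n in n_sizes)
    if working == "IND":    # R_i = I, V_i = CS
        return K * sum(n * (1 + (n - 1) * alpha) for n in n_sizes) / sum(n_sizes) ** 2
    raise ValueError("working must be 'CS' or 'IND'")

def nu_r_binary(omega, pi, p0, p1):
    """Robust variance factor for a two-group comparison of binary outcomes."""
    return omega * (1 / (pi * p0 * (1 - p0)) + 1 / ((1 - pi) * p1 * (1 - p1)))

def _z_term(eta, power):
    nd = NormalDist()
    return (nd.inv_cdf(1 - eta / 2) - nd.inv_cdf(1 - power)) ** 2

def wald_sample_size(p0, p1, pi, omega, eta=0.05, power=0.8):
    # Assumption: logit link, so the effect b is the log odds ratio.
    b = math.log(p1 / (1 - p1)) - math.log(p0 / (1 - p0))
    return nu_r_binary(omega, pi, p0, p1) * _z_term(eta, power) / b ** 2

def score_sample_size(p0, p1, pi, n, alpha, eta=0.05, power=0.8):
    # Score-test version for a common cluster size n and CS correlation.
    variance = (1 - pi) * p0 * (1 - p0) + pi * p1 * (1 - p1)
    design_effect = 1 + (n - 1) * alpha
    return _z_term(eta, power) * variance * design_effect / (n * pi * (1 - pi) * (p1 - p0) ** 2)

# Hypothetical design: clusters of size 4, alpha = 0.3, equal allocation.
omega = omega_cs([4] * 10, alpha=0.3)
print("Omega:", omega)
print("Wald-based K: ", wald_sample_size(0.3, 0.5, 0.5, omega))
print("score-based K:", score_sample_size(0.3, 0.5, 0.5, n=4, alpha=0.3))
```

With equal cluster sizes, the CS and independence working structures give the same $\Omega$, namely $[1 + (n - 1)\alpha]/n$; the two differ only when cluster sizes vary.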

On the other hand, there are several concerns [68]. First, we focus here on the calculation of the sample size $K$ assuming that $n_i$ is known; however, based on the power formula (16), $\nu_R$ depends on $n_i$, and thus increasing $n_i$ can also improve power, although it turns out to be less effective than increasing $K$ [69]. Second, the sample size/power calculation may be restricted by a limited number of clusters, for example, in cluster randomized trials (CRTs), where the number of clusters can be relatively small. For example, in a literature review of published CRTs, the median number of clusters was 21 [70]. In such situations, a power formula adjusted for small samples in GEE is necessary, and this has drawn attention from researchers recently [71–75].

2.4. Clustered Data with Informative Cluster Size. The application of GEE to clustered data with informative cluster size is another special topic [76]. Taking a periodontal disease study as an example, the number of teeth of each patient may be related to the overall oral health of the individual; in other words, the worse the oral health is, the fewer teeth the patient has, and thus the cluster size $n_i$ may influence the distribution of the oral outcomes. This is called informative cluster size [45, 77]. Such issues commonly occur in biomedical studies (e.g., genetic disease studies), and rigorous statistical methods are needed for valid statistical inference [78]. Note that if the maximum cluster size exists and is known, then this can be treated as an (informative) missing data problem, which can be solved via the weighted estimating equations proposed by Robins et al. [79]; however, if the maximum is unknown or not accessible, the method of within-cluster resampling (WCR) proposed by Hoffman et al. can be applied [80]. The basic idea is that, for each of $L$ resampled replicate data sets generated by a Monte Carlo method ($L$ is a large number, e.g., 10,000), one observation is randomly extracted from each cluster, and $\hat{\beta}_l$ with variance estimator $\hat{\Sigma}_l$ can be obtained from a regular score equation, denoted by $S_l(\beta)$, for independent observations (i.e., linear regression for continuous data, logistic regression for binary data, and Poisson regression for count data), $l = 1, 2, \ldots, L$. The details are as follows:

$$\begin{aligned}
S_l(\beta) &= \sum_{i=1}^{K} \sum_{j=1}^{n_i} S_{ij}(\beta_l)\, I[j \in r_l] = 0, \\
\hat{\beta}_{\mathrm{wcr}} &= \frac{1}{L} \sum_{l=1}^{L} \hat{\beta}_l, \\
\hat{\Sigma}_{\mathrm{wcr}} &= \frac{1}{L} \sum_{l=1}^{L} \hat{\Sigma}_l - \frac{1}{L} \sum_{l=1}^{L} (\hat{\beta}_l - \hat{\beta}_{\mathrm{wcr}})(\hat{\beta}_l - \hat{\beta}_{\mathrm{wcr}})^{T},
\end{aligned} \tag{21}$$

where $S_{ij}(\beta_l) = X_{ij}' V_{ij}^{-1} (Y_{ij} - X_{ij}'\beta_l)$, with $r_l$ denoting the index of the observation selected from the $i$th cluster in the $l$th replicate data set. Alternatively, the approach considered by Williamson et al., which adopts weighted estimating equations, performs asymptotically equivalently to WCR and avoids intensive computing; it is referred to as the cluster-weighted GEE (CWGEE) [81]. The estimating equation is

$$S(\beta) = \sum_{i=1}^{K} \frac{1}{n_i} \sum_{j=1}^{n_i} S_{ij}(\beta) = 0, \tag{22}$$

where $S_{ij}$ is defined as above, but the subscript $j$ ranges from 1 to $n_i$ and is not restricted by the index $r_l$. Note that, as $L \to \infty$, $(1/L) \sum_{l=1}^{L} S_l(\beta)$ converges to its expected estimating function and is asymptotically equivalent to $S(\beta)$.
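The correspondence between WCR and CWGEE can be illustrated for a linear model, where each resampled fit is ordinary least squares and the cluster-weighted estimating equation is a weighted least squares problem with weights $1/n_i$. This is a minimal sketch under those assumptions; the function names, the number of resamples, and the data-generating mechanism (cluster size shifting the mean response) are hypothetical choices made for illustration.

```python
import numpy as np

def cwgee_linear(X, y, cluster):
    """Solve sum_i (1/n_i) sum_j X_ij (y_ij - X_ij' beta) = 0 for an identity link."""
    _, inverse, counts = np.unique(cluster, return_inverse=True, return_counts=True)
    w = 1.0 / counts[inverse]                 # weight 1/n_i for every observation
    XtW = X.T * w
    return np.linalg.solve(XtW @ X, XtW @ y)

def wcr_linear(X, y, cluster, L=300, seed=0):
    """Within-cluster resampling: one observation per cluster, repeated L times."""
    rng = np.random.default_rng(seed)
    rows = [np.flatnonzero(cluster == c) for c in np.unique(cluster)]
    p = X.shape[1]
    betas = np.empty((L, p))
    covs = np.empty((L, p, p))
    for l in range(L):
        pick = np.array([rng.choice(r) for r in rows])
        Xl, yl = X[pick], y[pick]
        XtX_inv = np.linalg.inv(Xl.T @ Xl)
        b = XtX_inv @ Xl.T @ yl               # OLS fit on independent observations
        sigma2 = np.sum((yl - Xl @ b) ** 2) / (len(pick) - p)
        betas[l], covs[l] = b, sigma2 * XtX_inv
    beta_wcr = betas.mean(axis=0)
    centered = betas - beta_wcr
    # Average within-replicate covariance minus the between-replicate covariance.
    cov_wcr = covs.mean(axis=0) - centered.T @ centered / L
    return beta_wcr, cov_wcr

# Toy data with informative cluster size (assumed mechanism).
rng = np.random.default_rng(1)
K = 200
sizes = rng.integers(1, 7, size=K)            # cluster sizes between 1 and 6
cluster = np.repeat(np.arange(K), sizes)
x = rng.uniform(size=cluster.size)
shift = 0.3 * (sizes - sizes.mean())          # larger clusters have higher means
y = 0.5 + 0.5 * x + shift[cluster] + rng.standard_normal(cluster.size)
X = np.column_stack([np.ones_like(x), x])

beta_cw = cwgee_linear(X, y, cluster)
beta_wcr, cov_wcr = wcr_linear(X, y, cluster)
print("CWGEE:", beta_cw)
print("WCR:  ", beta_wcr)
```

The two sets of estimates should be close, while CWGEE needs a single weighted fit instead of $L$ resampled fits. In finite samples the WCR covariance estimate is a difference of two matrices and is not guaranteed to be positive definite.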

This method was also explored or extended for correlated data with nonignorable cluster size by Benhin et al. and Cong et al. [82, 83]. Furthermore, a more efficient method called modified WCR (MWCR) was proposed by Chiang and Lee, where $\min_i n_i > 1$ subjects are randomly sampled from each cluster and GEE models for balanced data are then applied for estimation by incorporating the intracluster correlation; thus, MWCR may be a more efficient way to carry out the analysis [84]. However, MWCR is not always satisfactory, and Pavlou et al. identified sufficient conditions on the data structure and the choice of “working” correlation structure under which the estimates from MWCR are consistent [85]. In addition, Wang et al. extended the above work to clustered longitudinal data, which are collected as repeated measures on subjects arising in clusters with potentially informative cluster size [45]. Examples include health studies of subjects from multiple hospitals or families. Comparing GEE, WCR, and CWGEE through extensive simulation studies and a real data application to a periodontal disease study, the authors recommended CWGEE because its performance was comparable with that of WCR, with well-preserved coverage rates and desirable power properties, without the intensive Monte Carlo computation, whereas GEE models led to invalid inference due to biased parameter estimates [45]. In addition, for observed-cluster inference, Seaman et al. discussed methods including weighted and doubly weighted GEE and shared random-effects models for comparison and showed the conditions under which the shared random-effects model described members with observed outcomes $Y$ [86]. More work can be found in [87–90], among others.

3. Simulation

In this section, we focus on “working” correlation structure selection and compare the performances of the existing criteria through simulation studies. Two types of outcomes are considered: continuous and count responses. The models for data generation are as follows:

$$\mu_{ij} = \beta_0 + \beta_1 \times x_{ij}, \qquad \log(\mu_{ij}) = \beta_0 + \beta_1 \times x_{ij}, \tag{23}$$

where $\beta_0 = \beta_1 = 0.5$, $i = 1, 2, \ldots, I$ with $I = 50, 100, 200, 500$, and $j = 1, 2, \ldots, J$ with $J = 4, 8$. The covariates $x_{ij}$ are i.i.d. from a standard uniform distribution Unif(0, 1). For each scenario, we generate the data based on the underlying true correlation structures: independent (IND), exchangeable (EXCH), and autoregressive (AR-1) with $\alpha = 0.3, 0.7$. For each scenario, 1000 Monte Carlo data sets are generated, and the estimates of the regression parameters, the within-subject correlation matrix, and seven model selection criteria are calculated using the “working” correlation structures IND, EXCH, and AR-1. Partial simulation results are provided in Tables 2, 3, and 4, where the results of CIC are not shown because they are the same as those of QIC.
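The continuous-response part of this design can be reproduced with a short data generator. The sketch below is an assumption about how such data might be simulated, not the code used for the reported tables: it draws unit-variance normal errors with the chosen true correlation through a Cholesky factor, and it does not cover the correlated count outcomes, which require a different construction. The function names are illustrative.

```python
import numpy as np

def true_corr(kind, J, alpha):
    """True within-subject correlation matrix: 'IND', 'EXCH', or 'AR-1'."""
    if kind == "IND":
        return np.eye(J)
    if kind == "EXCH":
        return (1 - alpha) * np.eye(J) + alpha * np.ones((J, J))
    if kind == "AR-1":
        idx = np.arange(J)
        return alpha ** np.abs(idx[:, None] - idx[None, :])
    raise ValueError("kind must be 'IND', 'EXCH', or 'AR-1'")

def simulate_normal(I, J, kind, alpha, beta=(0.5, 0.5), seed=0):
    """Simulate I subjects with J correlated normal responses each."""
    rng = np.random.default_rng(seed)
    chol = np.linalg.cholesky(true_corr(kind, J, alpha))
    x = rng.uniform(size=(I, J))                     # x_ij ~ Unif(0, 1)
    mu = beta[0] + beta[1] * x                       # identity-link mean
    errors = rng.standard_normal((I, J)) @ chol.T    # each row has correlation R
    return x, mu + errors

# One Monte Carlo data set for the scenario I = 50, J = 4, EXCH with alpha = 0.3.
x, y = simulate_normal(I=50, J=4, kind="EXCH", alpha=0.3)
print(x.shape, y.shape)
```

Repeating the call with 1000 different seeds and fitting each data set under the IND, EXCH, and AR-1 working structures would give the selection frequencies that the criteria are compared on.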

Based on the results, RJ does not perform well for the scenarios with either continuous or binary outcomes, while RJ1 and RJ2 have comparable performances and can select the true underlying correlation structure in most scenarios, with better performance under large sample sizes. QIC is not satisfactory when the true correlation structure is independent but performs well when the true correlation structure is exchangeable or AR-1. On the other hand, SC and GP do not perform well for longitudinal data with normal responses, but their performance is slightly improved for longitudinal data with binary outcomes. The results may vary due to a variety of factors, including the types of “working” correlation structure considered for model fitting, the sample size, and/or the magnitude of the correlation coefficient. For future work, a robust criterion for “working” correlation structure selection in GEE is needed, and more advanced approaches are currently emerging.

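Table 2: Simulation for longitudinal data with independent correlation matrix. Entries are selection frequencies of the “working” correlation structure.

| $n$ | $K$ | Criterion | Normal: IND | Normal: EXCH | Normal: AR-1 | Binary: IND | Binary: EXCH | Binary: AR-1 |
|---|---|---|---|---|---|---|---|---|
| 4 | 50 | QIC | 198 | 393 | 409 | 202 | 374 | 424 |
| 4 | 50 | RJ | 327 | 423 | 250 | 312 | 421 | 267 |
| 4 | 50 | RJ1 | 388 | 322 | 290 | 399 | 316 | 285 |
| 4 | 50 | RJ2 | 384 | 327 | 289 | 388 | 320 | 292 |
| 4 | 50 | SC | 488 | 1 | 512 | 351 | 310 | 339 |
| 4 | 50 | GP | 547 | 0 | 453 | 368 | 306 | 326 |
| 4 | 100 | QIC | 209 | 377 | 414 | 185 | 407 | 408 |
| 4 | 100 | RJ | 338 | 415 | 247 | 340 | 410 | 250 |
| 4 | 100 | RJ1 | 389 | 349 | 262 | 381 | 358 | 261 |
| 4 | 100 | RJ2 | 389 | 353 | 258 | 372 | 357 | 271 |
| 4 | 100 | SC | 482 | 1 | 517 | 352 | 346 | 302 |
| 4 | 100 | GP | 520 | 0 | 480 | 360 | 348 | 292 |
| 8 | 50 | QIC | 200 | 411 | 389 | 203 | 363 | 434 |
| 8 | 50 | RJ | 282 | 497 | 221 | 292 | 476 | 232 |
| 8 | 50 | RJ1 | 402 | 354 | 244 | 386 | 340 | 274 |
| 8 | 50 | RJ2 | 402 | 357 | 241 | 373 | 347 | 280 |
| 8 | 50 | SC | 465 | 1 | 535 | 351 | 325 | 324 |
| 8 | 50 | GP | 558 | 0 | 442 | 382 | 311 | 307 |
| 8 | 100 | QIC | 188 | 393 | 419 | 201 | 398 | 401 |
| 8 | 100 | RJ | 321 | 442 | 237 | 287 | 466 | 247 |
| 8 | 100 | RJ1 | 347 | 385 | 268 | 385 | 367 | 248 |
| 8 | 100 | RJ2 | 347 | 382 | 271 | 377 | 369 | 254 |
| 8 | 100 | SC | 492 | 0 | 508 | 355 | 343 | 302 |
| 8 | 100 | GP | 541 | 0 | 459 | 370 | 341 | 289 |

Table 3: Simulation for longitudinal data with exchangeable correlation matrix with $\alpha = 0.3$. Entries are selection frequencies of the “working” correlation structure.

| $n$ | $K$ | Criterion | Normal: IND | Normal: EXCH | Normal: AR-1 | Binary: IND | Binary: EXCH | Binary: AR-1 |
|---|---|---|---|---|---|---|---|---|
| 4 | 50 | QIC | 106 | 699 | 195 | 53 | 758 | 189 |
| 4 | 50 | RJ | 419 | 139 | 442 | 869 | 5 | 126 |
| 4 | 50 | RJ1 | 0 | 963 | 37 | 12 | 898 | 90 |
| 4 | 50 | RJ2 | 0 | 959 | 41 | 22 | 876 | 102 |
| 4 | 50 | SC | 0 | 593 | 407 | 282 | 650 | 68 |
| 4 | 50 | GP | 1 | 593 | 406 | 412 | 524 | 64 |
| 4 | 100 | QIC | 31 | 879 | 90 | 7 | 867 | 126 |
| 4 | 100 | RJ | 350 | 88 | 562 | 911 | 2 | 87 |
| 4 | 100 | RJ1 | 0 | 995 | 5 | 2 | 946 | 52 |
| 4 | 100 | RJ2 | 0 | 996 | 4 | 10 | 933 | 57 |
| 4 | 100 | SC | 0 | 598 | 402 | 339 | 635 | 26 |
| 4 | 100 | GP | 0 | 501 | 499 | 445 | 531 | 24 |
| 8 | 50 | QIC | 80 | 828 | 92 | 50 | 876 | 74 |
| 8 | 50 | RJ | 10 | 395 | 595 | 813 | 6 | 181 |
| 8 | 50 | RJ1 | 0 | 1000 | 0 | 0 | 987 | 13 |
| 8 | 50 | RJ2 | 0 | 1000 | 0 | 0 | 966 | 25 |
| 8 | 50 | SC | 0 | 488 | 513 | 302 | 696 | 2 |
| 8 | 50 | GP | 0 | 511 | 489 | 497 | 500 | 3 |
| 8 | 100 | QIC | 17 | 953 | 30 | 8 | 973 | 19 |
| 8 | 100 | RJ | 0 | 408 | 592 | 861 | 0 | 139 |
| 8 | 100 | RJ1 | 0 | 1000 | 0 | 0 | 997 | 3 |
| 8 | 100 | RJ2 | 0 | 1000 | 0 | 0 | 993 | 7 |
| 8 | 100 | SC | 0 | 470 | 530 | 328 | 672 | 0 |
| 8 | 100 | GP | 0 | 526 | 474 | 486 | 514 | 0 |

Table 4: Simulation for longitudinal data with AR-1 correlation matrix with $\alpha = 0.3$. Entries are selection frequencies of the “working” correlation structure.

| $n$ | $K$ | Criterion | Normal: IND | Normal: EXCH | Normal: AR-1 | Binary: IND | Binary: EXCH | Binary: AR-1 |
|---|---|---|---|---|---|---|---|---|
| 4 | 50 | QIC | 91 | 166 | 743 | 66 | 170 | 764 |
| 4 | 50 | RJ | 712 | 142 | 146 | 925 | 12 | 63 |
| 4 | 50 | RJ1 | 0 | 478 | 522 | 7 | 505 | 488 |
| 4 | 50 | RJ2 | 0 | 466 | 534 | 20 | 499 | 481 |
| 4 | 50 | SC | 0 | 480 | 520 | 220 | 350 | 430 |
| 4 | 50 | GP | 0 | 543 | 457 | 303 | 332 | 365 |
| 4 | 100 | QIC | 25 | 116 | 859 | 7 | 122 | 871 |
| 4 | 100 | RJ | 770 | 95 | 135 | 972 | 4 | 24 |
| 4 | 100 | RJ1 | 0 | 475 | 525 | 1 | 569 | 430 |
| 4 | 100 | RJ2 | 0 | 481 | 519 | 5 | 571 | 424 |
| 4 | 100 | SC | 0 | 491 | 509 | 237 | 371 | 392 |
| 4 | 100 | GP | 0 | 540 | 460 | 290 | 353 | 357 |
| 8 | 50 | QIC | 50 | 88 | 862 | 44 | 77 | 879 |
| 8 | 50 | RJ | 646 | 148 | 206 | 934 | 5 | 61 |
| 8 | 50 | RJ1 | 0 | 445 | 555 | 0 | 535 | 465 |
| 8 | 50 | RJ2 | 0 | 443 | 557 | 10 | 535 | 455 |
| 8 | 50 | SC | 0 | 467 | 533 | 168 | 397 | 435 |
| 8 | 50 | GP | 0 | 549 | 451 | 269 | 406 | 325 |
| 8 | 100 | QIC | 16 | 39 | 945 | 7 | 33 | 960 |
| 8 | 100 | RJ | 648 | 154 | 198 | 972 | 0 | 28 |
| 8 | 100 | RJ1 | 0 | 455 | 545 | 1 | 603 | 396 |
| 8 | 100 | RJ2 | 0 | 455 | 545 | 1 | 609 | 390 |
| 8 | 100 | SC | 0 | 480 | 520 | 177 | 458 | 365 |
| 8 | 100 | GP | 0 | 532 | 468 | 247 | 457 | 296 |

4. Future Direction and Discussion

In this paper, we provide a review of several specific topics related to GEE for longitudinal/correlated data, namely, model selection with emphasis on the selection of the “working” correlation structure, sample size and power calculation, and clustered data analysis with informative cluster size. Simulation studies are conducted to provide numerical comparisons among five types of model selection criteria [91, 92]. Novel methodologies are still needed and are being developed because of the increasing usage and potential theoretical constraints of GEE, as well as new challenges emerging from practical applications in clinical trials and biomedical studies.

In addition, current research of interest related to GEE also includes a robust and optimal model selection criterion for GEE under missing at random (MAR) or missing not at random (MNAR) mechanisms [93, 94]; sample size/power calculation for correlated sparse or overdispersed count data or longitudinal data with small samples [57–60]; GEE with improved performance in situations with informative cluster size and/or MAR and/or small sample size [95–98]; and GEE for high-dimensional longitudinal data [99]. Although GEE has attractive features, flexible application, and easy implementation in software, it should be applied with caution in practice, depending on the context of the study design or data structure and the goals of the research.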

Conflict of Interests

The author declares that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The author was supported by a grant from the Penn State CTSI. The project was supported by the National Center for Research Resources and the National Center for Advancing Translational Sciences, National Institutes of Health, through Grant 5 UL1 RR0330184-04. The content is solely the responsibility of the author and does not represent the views of the NIH.

References

[1] Z Feng P Diehr A Peterson and D McLerran ldquoSelectedstatistical issues in group randomized trialsrdquo Annual Review ofPublic Health vol 22 pp 167ndash187 2001

[2] G Fitzmaurice N M Larid and J H Ware Applied Longitudi-nal Data John Wiley amp Sons 2004

[3] J W Hardin and J M Hilbe Generalized Estimating EquationsChapman and HallCRC Press Boca Raton Fla USA 2003

[4] R F Potthoff and S N Roy ldquoA generalized multivariate analysisof variance model useful especially for growth curve problemsrdquoBiometrika vol 51 pp 313ndash326 1964

Advances in Statistics 9

[5] L. M. Friedman, C. D. Furberg, and D. L. DeMets, Fundamentals of Clinical Trials, Springer, New York, NY, USA, 3rd edition, 1989.

[6] K. Y. Liang and S. L. Zeger, “Longitudinal data analysis using generalized linear models,” Biometrika, vol. 73, pp. 13–22, 1986.

[7] M. Crowder, “On the use of a working correlation matrix in using generalised linear models for repeated measures,” Biometrika, vol. 82, no. 2, pp. 407–410, 1995.

[8] R. W. Wedderburn, “Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method,” Biometrika, vol. 61, pp. 439–447, 1974.

[9] P. Diggle, P. Heagerty, K. Y. Liang, and S. L. Zeger, Analysis of Longitudinal Data, Oxford University Press, Oxford, UK, 2002.

[10] G. Fitzmaurice, M. Davidian, G. Verbeke, and G. Molenberghs, Longitudinal Data Analysis, Chapman & Hall/CRC Press, 2008.

[11] D. Hedeker and R. D. Gibbons, Analysis of Longitudinal Data, John Wiley & Sons, 2006.

[12] P. McCullagh and J. A. Nelder, Generalized Linear Models, Chapman & Hall, London, UK, 1989.

[13] N. R. Chaganty and H. Joe, “Range of correlation matrices for dependent Bernoulli random variables,” Biometrika, vol. 93, no. 1, pp. 197–206, 2006.

[14] R. T. Sabo and N. R. Chaganty, “What can go wrong when ignoring correlation bounds in the use of generalized estimating equations,” Statistics in Medicine, vol. 29, no. 24, pp. 2501–2507, 2010.

[15] B. C. Sutradhar and K. Das, “On the efficiency of regression estimators in generalised linear models for longitudinal data,” Biometrika, vol. 86, no. 2, pp. 459–465, 1999.

[16] Y.-G. Wang and V. Carey, “Working correlation structure misspecification, estimation and covariate design: implications for generalised estimating equations performance,” Biometrika, vol. 90, no. 1, pp. 29–41, 2003.

[17] S. R. Lipsitz, G. Molenberghs, G. M. Fitzmaurice, and J. Ibrahim, “GEE with Gaussian estimation of the correlations when data are incomplete,” Biometrics, vol. 56, no. 2, pp. 528–536, 2000.

[18] Y.-G. Wang and V. J. Carey, “Unbiased estimating equations from working correlation models for irregularly timed repeated measures,” Journal of the American Statistical Association, vol. 99, no. 467, pp. 845–853, 2004.

[19] A. Qu and B. G. Lindsay, “Building adaptive estimating equations when inverse of covariance estimation is difficult,” Journal of the Royal Statistical Society B: Statistical Methodology, vol. 65, no. 1, pp. 127–142, 2003.

[20] S. R. Lipsitz and G. M. Fitzmaurice, “Estimating equations for measures of association between repeated binary responses,” Biometrics, vol. 52, no. 3, pp. 903–912, 1996.

[21] Y. Lee and J. A. Nelder, “Conditional and marginal models: another view,” Statistical Science, vol. 19, no. 2, pp. 219–238, 2004.

[22] Y. Lee and J. A. Nelder, “Likelihood inference for models with unobservables: another view,” Statistical Science, vol. 24, no. 3, pp. 255–269, 2009.

[23] A. Qu, B. G. Lindsay, and B. Li, “Improving generalised estimating equations using quadratic inference functions,” Biometrika, vol. 87, no. 4, pp. 823–836, 2000.

[24] G. Kauermann and R. J. Carroll, “A note on the efficiency of sandwich covariance matrix estimation,” Journal of the American Statistical Association, vol. 96, no. 456, pp. 1387–1396, 2001.

[25] Y. G. Wang and L. Y. Hin, “Modeling strategies in longitudinal data analysis: covariate, variance function and correlation structure selection,” Computational Statistics and Data Analysis, vol. 54, no. 12, pp. 3359–3370, 2010.

[26] W. Pan, “Goodness-of-fit tests for GEE with correlated binary data,” Scandinavian Journal of Statistics, vol. 29, no. 1, pp. 101–110, 2002.

[27] A. M. Wood, I. R. White, and P. Royston, “How should variable selection be performed with multiply imputed data?” Statistics in Medicine, vol. 27, no. 17, pp. 3227–3246, 2008.

[28] M. D. Begg and M. K. Parides, “Separation of individual-level and cluster-level covariate effects in regression analysis of correlated data,” Statistics in Medicine, vol. 22, no. 16, pp. 2591–2602, 2003.

[29] L. Y. Hin, V. J. Carey, and Y. G. Wang, “Criteria for working-correlation-structure selection in GEE: assessment via simulation,” The American Statistician, vol. 61, no. 4, pp. 360–364, 2007.

[30] J. X. Pan and G. Mackenzie, “On modelling mean-covariance structures in longitudinal studies,” Biometrika, vol. 90, no. 1, pp. 239–244, 2003.

[31] M. Davidian and R. J. Carroll, “Variance function estimation,” Journal of the American Statistical Association, vol. 82, no. 400, pp. 1079–1091, 1987.

[32] M. Pourahmadi, “Joint mean-covariance models with applications to longitudinal data: unconstrained parameterisation,” Biometrika, vol. 86, no. 3, pp. 677–690, 1999.

[33] S. Konishi and G. Kitagawa, “Generalised information criteria in model selection,” Biometrika, vol. 83, no. 4, pp. 875–890, 1996.

[34] B. Zhang, “Summarizing the goodness of fit of generalized linear models for longitudinal data,” Statistics in Medicine, vol. 19, pp. 1265–1275, 2000.

[35] A. Rotnitzky and N. P. Jewell, “Hypothesis testing of regression parameters in semiparametric generalized linear models for cluster correlated data,” Biometrika, vol. 77, no. 3, pp. 485–497, 1990.

[36] J. Shults and N. R. Chaganty, “Analysis of serially correlated data using quasi-least squares,” Biometrics, vol. 54, no. 4, pp. 1622–1630, 1998.

[37] V. J. Carey and Y.-G. Wang, “Working covariance model selection for generalized estimating equations,” Statistics in Medicine, vol. 30, no. 26, pp. 3117–3124, 2011.

[38] W. Pan, “Akaike’s information criterion in generalized estimating equations,” Biometrics, vol. 57, no. 1, pp. 120–125, 2001.

[39] H. Akaike, “Information theory and an extension of the maximum likelihood principle,” in Proceedings of the 2nd International Symposium on Information Theory, vol. 15, pp. 267–281, 1973.

[40] J. A. Nelder and Y. Lee, “Likelihood, quasi-likelihood and pseudolikelihood: some comparisons,” Journal of the Royal Statistical Society B, vol. 54, no. 1, pp. 273–284, 1992.

[41] J. Cui, “QIC program and model selection in GEE analyses,” The Stata Journal, vol. 7, no. 2, pp. 209–220, 2007.

[42] J. Cui and G. Qian, “Selection of working correlation structure and best model in GEE analyses of longitudinal data,” Communications in Statistics: Simulation and Computation, vol. 36, no. 4–6, pp. 987–996, 2007.

[43] L. Y. Hin and Y. G. Wang, “Working-correlation-structure identification in generalized estimating equations,” Statistics in Medicine, vol. 28, no. 4, pp. 642–658, 2009.

[44] J. A. Nelder and D. Pregibon, “An extended quasi-likelihood function,” Biometrika, vol. 74, no. 2, pp. 221–232, 1987.

[45] M. Wang, M. Kong, and S. Datta, “Inference for marginal linear models for clustered longitudinal data with potentially informative cluster sizes,” Statistical Methods in Medical Research, vol. 20, no. 4, pp. 347–367, 2011.

[46] E. Cantoni, J. M. Flemming, and E. Ronchetti, “Variable selection for marginal longitudinal generalized linear models,” Biometrics: Journal of the International Biometric Society, vol. 61, no. 2, pp. 507–514, 2005.

[47] Y.-G. Wang and X. Lin, “Effects of variance-function misspecification in analysis of longitudinal data,” Biometrics, vol. 61, no. 2, pp. 413–421, 2005.

[48] N. R. Chaganty and H. Joe, “Efficiency of generalized estimating equations for binary responses,” Journal of the Royal Statistical Society, Series B: Statistical Methodology, vol. 66, no. 4, pp. 851–860, 2004.

[49] M. Gosho, C. Hamada, and I. Yoshimura, “Criterion for the selection of a working correlation structure in the generalized estimating equation approach for longitudinal balanced data,” Communications in Statistics, vol. 40, no. 21, pp. 3839–3856, 2011.

[50] M. Gosho, C. Hamada, and I. Yoshimura, “Selection of working correlation structure in weighted generalized estimating equation method for incomplete longitudinal data,” Communications in Statistics, vol. 43, no. 1, pp. 62–81, 2014.

[51] M. J. Jang, Working correlation selection in generalized estimating equations [Dissertation], University of Iowa, 2011.

[52] J. Chen and N. A. Lazar, “Selection of working correlation structure in generalized estimating equations via empirical likelihood,” Journal of Computational and Graphical Statistics, vol. 21, no. 1, pp. 18–41, 2012.

[53] P. M. Westgate, “A bias-corrected covariance estimator for improved inference when using an unstructured correlation with quadratic inference functions,” Statistics and Probability Letters, vol. 83, no. 6, pp. 1553–1558, 2013.

[54] P. M. Westgate, “Criterion for the simultaneous selection of a working correlation structure and either generalized estimating equations or the quadratic inference function approach,” Biometrical Journal, vol. 56, no. 3, pp. 461–476, 2014.

[55] P. M. Westgate, “Improving the correlation structure selection approach for generalized estimating equations and balanced longitudinal data,” Statistics in Medicine, vol. 33, no. 13, pp. 2222–2237, 2014.

[56] J. Ye, “On measuring and correcting the effects of data mining and model selection,” Journal of the American Statistical Association, vol. 93, no. 441, pp. 120–131, 1998.

[57] J. J. Shuster, Practical Handbook of Sample Size Guidelines for Clinical Trials, CRC Press, Boca Raton, Fla, USA, 1993.

[58] G. Liu and K.-Y. Liang, “Sample size calculations for studies with correlated observations,” Biometrics, vol. 53, no. 3, pp. 937–947, 1997.

[59] W. J. Shih, “Sample size and power calculations for periodontal and other studies with clustered samples using the method of generalized estimating equations,” Biometrical Journal, vol. 39, no. 8, pp. 899–908, 1997.

[60] S. R. Lipsitz and G. M. Fitzmaurice, “Sample size for repeated measures studies with binary responses,” Statistics in Medicine, vol. 13, no. 12, pp. 1233–1239, 1994.

[61] W. Pan, “Sample size and power calculations with correlated binary data,” Controlled Clinical Trials, vol. 22, no. 3, pp. 211–227, 2001.

[62] N. Breslow, “Tests of hypotheses in overdispersed Poisson regression and other quasi likelihood models,” Journal of the American Statistical Association, vol. 85, pp. 565–571, 1990.

[63] E. W. Lee and N. Dubin, “Estimation and sample size considerations for clustered binary responses,” Statistics in Medicine, vol. 13, no. 12, pp. 1241–1252, 1994.

[64] D. J. Sargent, J. A. Sloan, and S. S. Cha, “Sample size and design considerations for phase II clinical trials with correlated observations,” Controlled Clinical Trials, vol. 20, no. 3, pp. 242–252, 1999.

[65] C. S. Li, “Semiparametric negative binomial regression models,” Communications in Statistics: Simulation and Computation, vol. 39, no. 3, pp. 475–486, 2010.

[66] W. H. Greene, “Accounting for excess zeros and sample selection in Poisson and negative binomial regression models,” Tech. Rep., New York University, 1994.

[67] P. Lambert, “Modeling of repeated series of count data measured at unequally spaced times,” Applied Statistics, vol. 45, pp. 31–38, 1996.

[68] M. S. Pepe and G. L. Anderson, “A cautionary note on inference for marginal regression models with longitudinal data and general correlated response data,” Communications in Statistics, Series B, vol. 23, pp. 939–951, 1994.

[69] M. Wang and Q. Long, “Modified robust variance estimator for generalized estimating equations with improved small-sample performance,” Statistics in Medicine, vol. 30, no. 11, pp. 1278–1291, 2011.

[70] M. Taljaard, A. D. McRae, C. Weijer et al., “Inadequate reporting of research ethics review and informed consent in cluster randomised trials: review of random sample of published trials,” British Medical Journal, vol. 342, Article ID d2496, 2011.

[71] L. A. Mancl and T. A. DeRouen, “A covariance estimator for GEE with improved small-sample properties,” Biometrics, vol. 57, no. 1, pp. 126–134, 2001.

[72] M. P. Fay and B. I. Graubard, “Small-sample adjustments for Wald-type tests using sandwich estimators,” Biometrics, vol. 57, no. 4, pp. 1198–1206, 2001.

[73] W. Pan, “On the robust variance estimator in generalised estimating equations,” Biometrika, vol. 88, no. 3, pp. 901–906, 2001.

[74] W. Pan and M. M. Wall, “Small-sample adjustments in using the sandwich variance estimator in generalized estimating equations,” Statistics in Medicine, vol. 21, no. 10, pp. 1429–1441, 2002.

[75] X. Guo, W. Pan, J. E. Connett, P. J. Hannan, and S. A. French, “Small-sample performance of the robust score test and its modifications in generalized estimating equations,” Statistics in Medicine, vol. 24, no. 22, pp. 3479–3495, 2005.

[76] D. M. Farewell, “Marginal analyses of longitudinal data with an informative pattern of observations,” Biometrika, vol. 97, no. 1, pp. 65–78, 2010.

[77] J. D. Beck, T. Sharp, G. G. Koch, and S. Offenbacher, “A 5-year study of attachment loss and tooth loss in community-dwelling older adults,” Journal of Periodontal Research, vol. 32, no. 6, pp. 516–523, 1997.

[78] S. J. Arbes Jr., H. Agustsdottir, and G. D. Slade, “Environmental tobacco smoke and periodontal disease in the United States,” American Journal of Public Health, vol. 91, no. 2, pp. 253–257, 2001.

[79] J. M. Robins, A. Rotnitzky, and L. P. Zhao, “Analysis of semiparametric regression models for repeated outcomes in the presence of missing data,” Journal of the American Statistical Association, vol. 90, pp. 106–121, 1995.

[80] E. B. Hoffman, P. K. Sen, and C. R. Weinberg, “Within-cluster resampling,” Biometrika, vol. 88, no. 4, pp. 1121–1134, 2001.

[81] J. M. Williamson, S. Datta, and G. A. Satten, “Marginal analyses of clustered data when cluster size is informative,” Biometrics, vol. 59, no. 1, pp. 36–42, 2003.

[82] E. Benhin, J. N. Rao, and A. J. Scott, “Mean estimating equation approach to analysing cluster-correlated data with nonignorable cluster sizes,” Biometrika, vol. 92, no. 2, pp. 435–450, 2005.

[83] X. J. Cong, G. Yin, and Y. Shen, “Marginal analysis of correlated failure time data with informative cluster sizes,” Biometrics, vol. 63, no. 3, pp. 663–672, 2007.

[84] T. C. Chiang and K. Y. Lee, “Efficient estimation methods for informative cluster size data,” Statistica Sinica, vol. 80, pp. 121–123, 2008.

[85] M. Pavlou, S. R. Seaman, and A. J. Copas, “An examination of a method for marginal inference when the cluster size is informative,” Statistica Sinica, vol. 23, no. 2, pp. 791–801, 2013.

[86] S. R. Seaman, M. Pavlou, and A. J. Copas, “Methods for observed-cluster inference when cluster size is informative: a review and clarifications,” Biometrics, vol. 70, no. 2, pp. 449–456, 2014.

[87] Z. Chen, B. Zhang, and P. S. Albert, “A joint modeling approach to data with informative cluster size: robustness to the cluster size model,” Statistics in Medicine, vol. 30, no. 15, pp. 1825–1836, 2011.

[88] Y. Huang and B. Leroux, “Informative cluster sizes for subcluster-level covariates and weighted generalized estimating equations,” Biometrics, vol. 67, no. 3, pp. 843–851, 2011.

[89] B. F. Kurland, L. L. Johnson, B. L. Egleston, and P. H. Diehr, “Longitudinal data with follow-up truncated by death: match the analysis method to research aims,” Statistical Science, vol. 24, no. 2, pp. 211–222, 2009.

[90] J. M. Neuhaus and C. E. McCulloch, “Estimation of covariate effects in generalized linear mixed models with informative cluster sizes,” Biometrika, vol. 98, no. 1, pp. 147–162, 2011.

[91] S. R. Lipsitz, G. M. Fitzmaurice, E. J. Orav, and N. M. Laird, “Performance of generalized estimating equations in practical situations,” Biometrics, vol. 50, no. 1, pp. 270–278, 1994.

[92] D. B. Hall and T. A. Severini, “Extended generalized estimating equations for clustered data,” Journal of the American Statistical Association, vol. 93, no. 444, pp. 1365–1375, 1998.

[93] C.-W. Shen and Y.-H. Chen, “Model selection for generalized estimating equations accommodating dropout missingness,” Biometrics, vol. 68, no. 4, pp. 1046–1054, 2012.

[94] C.-W. Shen and Y.-H. Chen, “Model selection of generalized estimating equations with multiply imputed longitudinal data,” Biometrical Journal, vol. 55, no. 6, pp. 899–911, 2013.

[95] D. B. Rubin, “Inference and missing data,” Biometrika, vol. 63, no. 3, pp. 581–592, 1976.

[96] R. J. Little and D. B. Rubin, Statistical Analysis with Missing Data, Wiley, New York, NY, USA.

[97] P. Diggle, D. Farewell, and R. Henderson, “Analysis of longitudinal data with drop-out: objectives, assumptions and a proposal,” Journal of the Royal Statistical Society C, vol. 56, no. 5, pp. 499–550, 2007.

[98] A. J. Copas and S. R. Seaman, “Bias from the use of generalized estimating equations to analyze incomplete longitudinal binary data,” Journal of Applied Statistics, vol. 37, no. 6, pp. 911–922, 2010.

[99] L. Wang, J. Zhou, and A. Qu, “Penalized generalized estimating equations for high-dimensional longitudinal data analysis,” Biometrics, vol. 68, no. 2, pp. 353–360, 2012.

Table 1: Summary of commonly used “working” correlation structures for GEE. For each correlation structure, the entries give $\mathrm{Corr}(Y_{ij}, Y_{ik})$, a sample matrix, and the estimator of the correlation parameter.

Independent.
Correlation: $\mathrm{Corr}(Y_{ij}, Y_{ik}) = \begin{cases} 1, & j = k \\ 0, & j \neq k \end{cases}$
Sample matrix: $\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$
Estimator: N/A

Exchangeable.
Correlation: $\mathrm{Corr}(Y_{ij}, Y_{ik}) = \begin{cases} 1, & j = k \\ \alpha, & j \neq k \end{cases}$
Sample matrix: $\begin{pmatrix} 1 & \alpha & \alpha \\ \alpha & 1 & \alpha \\ \alpha & \alpha & 1 \end{pmatrix}$
Estimator: $\hat{\alpha} = \dfrac{1}{(N' - p)\,\phi} \sum_{i=1}^{K} \sum_{j \neq k} e_{ij} e_{ik}$, with $N' = \sum_{i=1}^{K} n_i (n_i - 1)$

$k$-dependent.
Correlation: $\mathrm{Corr}(Y_{ij}, Y_{i,j+m}) = \begin{cases} 1, & m = 0 \\ \alpha_m, & m = 1, 2, \ldots, k \\ 0, & m > k \end{cases}$
Sample matrix: $\begin{pmatrix} 1 & \alpha_1 & 0 \\ \alpha_1 & 1 & \alpha_1 \\ 0 & \alpha_1 & 1 \end{pmatrix}$
Estimator: $\hat{\alpha}_m = \dfrac{1}{(K_m - p)\,\phi} \sum_{i=1}^{K} \sum_{j \le n_i - m} e_{ij} e_{i,j+m}$, with $K_m = \sum_{i=1}^{K} (n_i - m)$

Autoregressive AR(1).
Correlation: $\mathrm{Corr}(Y_{ij}, Y_{i,j+m}) = \alpha^{m}$, $m = 0, 1, 2, \ldots, n_i - j$
Sample matrix: $\begin{pmatrix} 1 & \alpha & \alpha^2 \\ \alpha & 1 & \alpha \\ \alpha^2 & \alpha & 1 \end{pmatrix}$
Estimator: $\hat{\alpha} = \dfrac{1}{(K_1 - p)\,\phi} \sum_{i=1}^{K} \sum_{j \le n_i - 1} e_{ij} e_{i,j+1}$, with $K_1 = \sum_{i=1}^{K} (n_i - 1)$

Toeplitz.
Correlation: $\mathrm{Corr}(Y_{ij}, Y_{i,j+m}) = \begin{cases} 1, & m = 0 \\ \alpha_m, & m = 1, 2, \ldots, n_i - j \end{cases}$
Sample matrix: $\begin{pmatrix} 1 & \alpha_1 & \alpha_2 \\ \alpha_1 & 1 & \alpha_1 \\ \alpha_2 & \alpha_1 & 1 \end{pmatrix}$
Estimator: $\hat{\alpha} = \dfrac{1}{(N' - p)\,\phi} \sum_{i=1}^{K} \sum_{j \neq k} e_{ij} e_{ik}$, with $N' = \sum_{i=1}^{K} n_i (n_i - 1)$

Unstructured.
Correlation: $\mathrm{Corr}(Y_{ij}, Y_{ik}) = \begin{cases} 1, & j = k \\ \alpha_{jk}, & j \neq k \end{cases}$
Sample matrix: $\begin{pmatrix} 1 & \alpha_{12} & \alpha_{13} \\ \alpha_{21} & 1 & \alpha_{23} \\ \alpha_{31} & \alpha_{32} & 1 \end{pmatrix}$
Estimator: $\hat{\alpha}_{jk} = \dfrac{1}{(K - p)\,\phi} \sum_{i=1}^{K} e_{ij} e_{ik}$

2.2. Model Selection of GEE. In this section, we discuss the model selection criteria available for GEE. Model selection for GEE is important and necessary for several reasons. (1) GEE has gained increasing attention in biomedical studies, which may include a large group of predictors [25–28]. Variable selection is therefore necessary to identify the significant predictors and determine which are included in the final regression model. (2) One known feature of GEE is that the parameter estimates remain consistent even when the “working” correlation structure is misspecified. Correctly specifying the “working” correlation structure can nevertheless enhance the efficiency of the parameter estimates, in particular when the sample size is not large enough [16, 24, 25, 29]. How to select the intrasubject correlation matrix therefore plays a vital role in improving the finite-sample performance of GEE. (3) The variance function $\nu(\mu)$ is another potential factor affecting the goodness-of-fit of GEE [25, 30]. A correctly specified variance function can assist in the selection of covariates and of an appropriate correlation structure [31, 32]. Different criteria might be needed depending on the goal of model selection [24, 29, 33]; next, we introduce the existing approaches for selecting the “working” correlation structure, together with their merits and limitations [34].

According to Rotnitzky and Jewell, the adequacy of the “working” correlation structure can be examined through $\Gamma = \left(\sum_{i=1}^{K} D_i' V_i^{-1} D_i\right)^{-1} \hat{M}_{LZ}$, where $\hat{M}_{LZ}$ has been defined in Section 2.1 [35]. The statistic $\mathrm{RJ}(R)$ is defined by

$$\mathrm{RJ}(R) = \sqrt{(1 - \mathrm{RJ1})^2 + (1 - \mathrm{RJ2})^2}, \tag{6}$$

where $\mathrm{RJ1} = \mathrm{trace}(\Gamma)/p$ and $\mathrm{RJ2} = \mathrm{trace}(\Gamma^2)/p$, respectively. If the “working” correlation structure $R$ is correctly specified, RJ1 and RJ2 will be close to 1, so that $\mathrm{RJ}(R)$ approaches 0. Thus RJ1, RJ2, and $\mathrm{RJ}(R)$ can all be used for correlation structure selection.
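As a minimal sketch of how (6) could be evaluated numerically, the Python snippet below assumes that the two matrices entering $\Gamma$ have already been computed from a fitted GEE and are available as NumPy arrays. The names `rj_criterion`, `bread`, and `meat` are illustrative choices for this sketch and do not come from the original work.

```python
import numpy as np

def rj_criterion(bread, meat):
    """Return (RJ1, RJ2, RJ) for one candidate working correlation.

    bread : (p, p) array holding sum_i D_i' V_i^{-1} D_i
    meat  : (p, p) array multiplying the inverse of `bread` in Gamma
    Both inputs are assumed to be precomputed by the caller.
    """
    p = bread.shape[0]
    gamma = np.linalg.solve(bread, meat)      # equals inv(bread) @ meat
    rj1 = np.trace(gamma) / p
    rj2 = np.trace(gamma @ gamma) / p
    rj = np.sqrt((1.0 - rj1) ** 2 + (1.0 - rj2) ** 2)
    return rj1, rj2, rj
```

When the two matrices coincide, $\Gamma$ is the identity and the function returns RJ1 = RJ2 = 1 and RJ = 0, which is the behaviour described above for a correctly specified structure.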

Shults and Chaganty [36] proposed a criterion for selecting the “working” correlation structure based on minimizing the generalized error sum of squares (ESS), given as follows:

$$\mathrm{ESS}(\alpha, \beta) = \sum_{i=1}^{K} (Y_i - u_i)' V_i^{-1} (Y_i - u_i) = \sum_{i=1}^{K} Z_i'(\beta)\, R_i^{-1}(\alpha)\, Z_i(\beta), \tag{7}$$

where $Z_i(\beta) = A_i^{-1/2}(Y_i - u_i)$. The criterion is defined by

$$\mathrm{SC} = \frac{\mathrm{ESS}(\alpha, \beta)}{N - p - q}, \tag{8}$$

where $N = \sum_{i=1}^{K} n_i$ is the total number of observations, $p$ is the number of regression parameters, and $q$ is the number of correlation coefficients within the “working” correlation structure. An extension of SC was proposed by Carey and Wang [37], in which the Gaussian pseudolikelihood (GP) is adopted; it is given by

$$\mathrm{GP}(R) = -0.5 \times \sum_{i=1}^{K} \left[ (Y_i - u_i)' V_i^{-1} (Y_i - u_i) + \log\left(|V_i|\right) \right], \tag{9}$$

where a better “working” correlation structure yields a larger GP. In their work, they also showed via simulation that the GP criterion performed better than RJ.
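The quantities in (7)–(9) reduce to a few linear-algebra operations per cluster. The sketch below is one possible way to compute them in Python, assuming that the residual vectors $Y_i - u_i$ and the working covariance matrices $V_i$ are already available from a fitted model. The function names `ess`, `sc`, and `gp` are hypothetical helpers introduced only for illustration.

```python
import numpy as np

def ess(residuals, covariances):
    """Generalized error sum of squares: sum_i r_i' V_i^{-1} r_i."""
    return sum(r @ np.linalg.solve(V, r)
               for r, V in zip(residuals, covariances))

def sc(residuals, covariances, p, q):
    """Criterion (8): ESS divided by N - p - q, N = total observations."""
    n_total = sum(len(r) for r in residuals)
    return ess(residuals, covariances) / (n_total - p - q)

def gp(residuals, covariances):
    """Gaussian pseudolikelihood (9); larger values indicate a better fit."""
    total = 0.0
    for r, V in zip(residuals, covariances):
        _, logdet = np.linalg.slogdet(V)      # log|V_i|, numerically stable
        total += r @ np.linalg.solve(V, r) + logdet
    return -0.5 * total
```

Under these assumptions, candidate structures would be compared by refitting the model under each one, with a smaller SC or a larger GP favouring that structure.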

Another criterion was proposed by Pan [38], who modified the Akaike information criterion (AIC) [39] for use with GEE. Because GEE is not likelihood-based, the criterion is called the quasi-likelihood under the independence model criterion (QIC) [40]. The basic idea is to calculate the expected Kullback-Leibler discrepancy using the quasi-likelihood under the independence “working” correlation assumption, owing to the lack of a general and tractable quasi-likelihood for correlated data under any other, more complex “working” correlation structure. $\mathrm{QIC}(R)$ is defined by

$$\mathrm{QIC}(R) = -2\Psi(\hat{\beta}(R); I) + 2\,\mathrm{trace}(\Omega_I \hat{V}_{LZ}), \tag{10}$$

where the quasilikelihood is $\Psi(\hat{\beta}(R); I) = \sum_{i=1}^{K} \sum_{j=1}^{n_i} Q(\hat{\beta}(R), \phi; \{Y_{ij}, X_{ij}\})$, with $Q(\mu, \phi; y) = \int_{y}^{\mu} \frac{y - t}{\phi V(t)}\, dt$ defined by [12]; $\hat{\beta}(R)$ and $\phi$ are obtained under the hypothesized “working” correlation structure $R$; $\Omega_I = \sum_{i=1}^{K} D_i' V_i^{-1} D_i \big|_{\beta = \hat{\beta}(R),\, R = I}$; and $\hat{V}_{LZ}$ is defined above with $\beta$ replaced by $\hat{\beta}(R)$ [38]. Note that in this work Pan ignored the second term in the Taylor expansion of the discrepancy and showed that its influence was not substantial in his simulation set-ups. Later on, Hardin and Hilbe (2003) made a slight modification to $\mathrm{QIC}(R)$ by using $\hat{\beta}(I)$ and $\phi(I)$ for more stability, and $\mathrm{QIC}(R)_{HH}$ is given by

$$\mathrm{QIC}(R)_{HH} = -2\Psi(\hat{\beta}(I); I) + 2\,\mathrm{trace}(\Omega_I \hat{V}_{LZ}). \tag{11}$$

Note that $\mathrm{QIC}(R)$ and $\mathrm{QIC}(R)_{HH}$ do not perform well in distinguishing the independence and exchangeable “working” correlation structures because, in certain cases, the same regression parameter estimates can be obtained under these two structures. An attractive property of the QIC criterion is that it allows the covariates and the “working” correlation structure to be selected simultaneously [41, 42]. However, this measure is more sensitive to the mean structure, because QIC is dominated by the first term, while the second term plays the role of a penalty. To better select the “working” correlation structure, Hin and Wang proposed the correlation information criterion (CIC), defined by

$$\mathrm{CIC} = \mathrm{trace}(\Omega_I \hat{V}_{LZ}). \tag{12}$$

In their work, CIC was shown through simulation studies to outperform QIC when the outcomes were binary [43]. One limitation of this criterion is that it does not penalize overparameterization; thus it does not perform well when comparing two correlation structures with quite different numbers of correlation parameters.
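To make the relationship between (10) and (12) concrete, the following Python sketch computes the penalty term shared by QIC and CIC. It assumes that the derivative matrices $D_i$, the variance-function values on each cluster, the robust covariance estimate, and the quasi-likelihood value have all been obtained elsewhere. All function and argument names are illustrative and are not taken from any particular software package.

```python
import numpy as np

def omega_independence(D_list, var_list, phi=1.0):
    """Omega_I = sum_i D_i' V_i^{-1} D_i with R = I, so V_i = phi * diag(var_i).

    D_list   : list of (n_i, p) derivative matrices D_i
    var_list : list of length-n_i arrays of variance-function values
    """
    p = D_list[0].shape[1]
    omega = np.zeros((p, p))
    for D, v in zip(D_list, var_list):
        weights = 1.0 / (phi * np.asarray(v, dtype=float))
        omega += D.T @ (D * weights[:, None])
    return omega

def cic(omega_I, v_robust):
    """Correlation information criterion: trace(Omega_I @ V_robust)."""
    return float(np.trace(omega_I @ v_robust))

def qic(quasi_lik, omega_I, v_robust):
    """QIC: -2 * quasi-likelihood + 2 * trace(Omega_I @ V_robust)."""
    return -2.0 * quasi_lik + 2.0 * cic(omega_I, v_robust)
```

A useful sanity check under these assumptions: if the robust covariance equals the inverse of `omega_I`, the trace equals $p$, the number of regression parameters, so QIC then carries the same $2p$ penalty as AIC.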

Another attractive criterion is the extended quasilikelihood information criterion (EQIC) proposed by Wang and Hin [25], which uses the extended quasilikelihood (EQL) defined by Nelder and Pregibon based on the deviance function. Under the independent correlation structure, the EQL is given by [44]

$$Q^{*}(\beta, \phi; I) = -\frac{1}{2\phi} D(\beta; I) - \frac{1}{2} \sum_{i=1}^{K} \sum_{j=1}^{n_i} \log\left(2\pi\phi A(\mu_{ij})\right), \tag{13}$$

where the sum of deviances is $D(\beta; I) = \sum_{i=1}^{K} \sum_{j=1}^{n_i} -2\phi \left\{ Q(y_{ij}; \mu_{ij}) - Q(y_{ij}; y_{ij}) \right\}$, with $Q(\cdot)$ being the quasilikelihood defined as above. Therefore, EQIC is defined by

$$\mathrm{EQIC}(R) = \frac{1}{\phi} D(\beta; I) + \sum_{i=1}^{K} \sum_{j=1}^{n_i} \log\left(2\pi\phi A(\mu_{ij})\right) + 2\,\mathrm{trace}(\Omega_I \hat{V}_{LZ}), \tag{14}$$

where some adjustments were applied to $A(\mu)$ by adding a small constant $k$, with the optimal value chosen as $1/6$. The authors indicated that the covariates are first selected based on QIC, and the variance function can then be identified as the one minimizing EQIC given the selected covariates; “working” correlation structure selection can then be achieved based on CIC. In addition, they found that covariate selection by EQIC under different working variance functions was more consistent than that based on QIC [45].

Besides the criteria mentioned above, Cantoni et al. also discussed covariate selection for longitudinal data analysis [46]; variance function selection was addressed by Pan and Mackenzie [30] as well as Wang and Lin [47]; and further work on “working” correlation structure selection was carried out by Chaganty and Joe [48], Wang and Lin [47], Gosho et al. [49, 50], Jang [51], Chen and Lazar [52], and Westgate [53–55], among others. Overall, model selection for GEE is nontrivial, and the best selection criterion is still being pursued [56]; the recent work by Wang et al. can be followed as a rule of thumb [45].

2.3. Sample Size and Power of GEE. It is well known that the calculation of sample size and power is necessary and important for planning a clinical trial, and it has been well studied for independent observations [1]. With the wide application of GEE in clinical trials, this topic has gained more attention than ever for correlated/clustered data [5, 57]. A general method for sample size/power calculation was discussed by Liu and Liang [58], in which the generalized score test was used to draw statistical inference and the resulting noncentral chi-square distribution of the test statistic under the alternative hypothesis was derived. However, in some special cases, for example, correlated binary data with a nonexchangeable correlation structure, no closed form is available along the lines of that formula. Afterwards, Shih provided an alternative formula for sample size/power calculation, which relies on Wald tests using the estimates of the regression parameters and robust variance estimators [59]. For example, in a study with one parameter of interest $\beta$, the hypothesis of interest can be formulated as

$$H_0: \beta = 0 \quad \text{versus} \quad H_a: \beta = b \neq 0, \tag{15}$$

where $b$ is the expected value. Thus, based on a two-sided $Z$-test with type I error $\eta$, the power, denoted by $\delta$, can be obtained by

$$\delta = 1 - \Phi\left(Z_{1-\eta/2} - \frac{b\sqrt{K}}{\sqrt{\nu_R}}\right), \tag{16}$$

where $\Phi$ is the standard normal distribution function, $Z_q = \Phi^{-1}(q)$ denotes its $q$th quantile, $K$ is the sample size, and $\nu_R$ is the robust variance estimator corresponding to $\beta$ in the estimate of $K\hat{V}_{LZ}$. Accordingly, the sample size is given by

$$K = \frac{\nu_R \left(Z_{1-\eta/2} - Z_{1-\delta}\right)^2}{b^2}. \tag{17}$$
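Formulas (16) and (17) are simple enough to check numerically. The sketch below is a minimal illustration using only the Python standard library. It assumes that a value of $\nu_R$ is supplied by the user, that quantiles are lower-tail quantiles $\Phi^{-1}(q)$, and that the effect enters through its absolute value so that a negative $b$ is handled symmetrically. The names `gee_power` and `gee_sample_size` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1

def gee_power(b, K, nu_R, eta=0.05):
    """Approximate power of the two-sided Wald Z-test, as in (16)."""
    z_eta = _STD_NORMAL.inv_cdf(1.0 - eta / 2.0)
    return 1.0 - _STD_NORMAL.cdf(z_eta - abs(b) * sqrt(K) / sqrt(nu_R))

def gee_sample_size(b, nu_R, eta=0.05, delta=0.80):
    """Number of clusters K needed for power delta, as in (17).

    The result is returned unrounded; in practice it would be
    rounded up to the next integer.
    """
    z_eta = _STD_NORMAL.inv_cdf(1.0 - eta / 2.0)
    z_delta = _STD_NORMAL.inv_cdf(1.0 - delta)   # negative when delta > 0.5
    return nu_R * (z_eta - z_delta) ** 2 / b ** 2
```

Plugging the unrounded output of `gee_sample_size` back into `gee_power` with the same inputs recovers the target power, which is a quick way to confirm that the two formulas are mutually consistent.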

For correlated continuous data, the calculation is straightforward using (16); for correlated binary data in particular, however, more work is needed [60], and Pan provided explicit formulas for $\nu_R$ under various situations as follows [61]:

$$\nu_R = \Omega \left[ \frac{1}{\pi p_0 (1 - p_0)} + \frac{1}{(1 - \pi)\, p_1 (1 - p_1)} \right], \tag{18}$$

where $\Omega = K \left( \sum_{i=1}^{K} 1_{n_i}' R_i^{-1} V_i R_i^{-1} 1_{n_i} \right) \Big/ \left( \sum_{i=1}^{K} 1_{n_i}' R_i^{-1} 1_{n_i} \right)^2$, with $\pi$ as the proportion of subjects assigned to the control group and $p_0$ and $p_1$ as the means for the control and case groups [61]. Here $R_i$ is the “working” correlation matrix and $V_i$ plays the role of the true correlation matrix. The detailed calculations of $\nu_R$, through $\Omega$, under several important special cases are given by

$$
\begin{aligned}
&\text{If } R_i = V_i = \mathrm{CS}: && \Omega = \frac{K}{\sum_{i=1}^{K} \left( n_i / \left(1 + (n_i - 1)\alpha\right) \right)}, \\
&\text{If } R_i = I,\ V_i = \mathrm{CS}: && \Omega = \frac{K \sum_{i=1}^{K} n_i \left[1 + (n_i - 1)\alpha\right]}{\left(\sum_{i=1}^{K} n_i\right)^2}, \\
&\text{If } R_i = V_i = \mathrm{AR}(1): && \Omega = \frac{K (1 + \alpha)}{\sum_{i=1}^{K} \left[n_i - (n_i - 2)\alpha\right]}, \\
&\text{If } R_i = I,\ V_i = \mathrm{AR}(1): && \Omega = \frac{K \sum_{i=1}^{K} \left[n_i + 2(n_i - 1)\alpha + 2(n_i - 2)\alpha^2 + \cdots + 2\alpha^{n_i - 1}\right]}{\left(\sum_{i=1}^{K} n_i\right)^2}.
\end{aligned}
\tag{19}
$$

These formulas can be used directly in practice and cover most situations encountered in clinical trials [61]. Note that when $R_i = V_I = \mathrm{CS}$, Liu and Liang (1997) provided a sample size formula that differs from (17); with $n_i = n$, it is

$$K = \frac{(z_{1-\eta/2} + z_{1-\delta})^2\,\left[(1-\pi)\,p_0(1-p_0) + \pi p_1(1-p_1)\right]\left[1 + (n-1)\alpha\right]}{n\pi(1-\pi)(p_1 - p_0)^2}. \tag{20}$$
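As an illustration, formula (20) can be evaluated directly. The sketch below is hedged: the function name is hypothetical, the input values are illustrative, and it assumes that $\eta$ is the two-sided type I error rate and $\delta$ the type II error rate (so that $1 - \delta$ is the power), which is how these symbols are read here.

```python
import math
from statistics import NormalDist


def liu_liang_clusters(p0, p1, pi, n, alpha, eta=0.05, delta=0.20):
    """Number of clusters K from (20), before rounding up.

    Assumptions (not confirmed by the source): eta is the two-sided type I
    error and delta the type II error; alpha is the exchangeable correlation.
    """
    z = NormalDist().inv_cdf
    z_sum_sq = (z(1 - eta / 2) + z(1 - delta)) ** 2
    variance_term = (1 - pi) * p0 * (1 - p0) + pi * p1 * (1 - p1)
    design_effect = 1 + (n - 1) * alpha
    return z_sum_sq * variance_term * design_effect / (n * pi * (1 - pi) * (p1 - p0) ** 2)


if __name__ == "__main__":
    k = liu_liang_clusters(p0=0.5, p1=0.3, pi=0.5, n=4, alpha=0.3)
    print("clusters needed:", math.ceil(k))
```

The factor $1 + (n-1)\alpha$ is the only place the within-cluster correlation enters, so the required number of clusters scales linearly with it.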

Be aware that the difference is due to the test methods: the Wald $Z$-test was used by Pan [61], and the score test was applied by Liu and Liang [58]. Note that in some cases the score test may be preferred [62]. Although other works exist on sample size/power calculation, they focus on alternative approaches rather than GEE [63, 64]; thus we do not discuss them here. For correlated Poisson data, the sample size/power calculation is more challenging because of overdispersion or sparsity, for which a negative binomial regression model may be explored [62, 65–67].

On the other hand, there are several concerns [68]. First, we focus here on the calculation of the sample size $K$ assuming that $n_i$ is known; however, based on the power formula (16), $\nu_R$ depends on $n_i$, and thus increasing $n_i$ can also improve power, although it turns out to be less effective than increasing $K$ [69]. Second, the sample size/power calculation may be constrained by the number of available clusters, for example, in cluster randomized trials (CRTs), where the number of clusters can be relatively small. For instance, in a literature review of published CRTs, the median number of clusters was 21 [70]. In such situations, a power formula adjusted for small samples in GEE is necessary, which has recently drawn attention from researchers [71–75].

2.4. Clustered Data with Informative Cluster Size. The application of GEE to clustered data with informative cluster size is another special topic [76]. Taking a periodontal disease study as an example, the number of teeth for each patient may be related to the overall oral health of the individual; in other words, the worse the oral health, the fewer the teeth, and thus the cluster size $n_i$ may influence the distribution of the oral outcomes, which is called informative cluster size [45, 77]. Such issues commonly occur in biomedical studies (e.g., genetic disease studies), and rigorous statistical methods are needed for valid statistical inference [78]. Note that if the maximum cluster size exists and is known, this can be treated as an (informative) missing data problem, which can be solved via the weighted estimating equations proposed by Robins et al. [79]; however, if the maximum is unknown or not accessible, the method of within-cluster resampling (WCR) proposed by Hoffman et al. can be applied [80]. The basic idea is that, for each of $L$ resampled replicate data sets based on a Monte Carlo method ($L$ is a large number, e.g., 10,000), one observation is randomly extracted from each cluster, where $\hat{\beta}_l$ with variance estimator $\hat{\Sigma}_l$ can be obtained from a regular score equation, denoted by $S_l(\beta)$, for independent observations (e.g., linear regression for continuous data, logistic regression for binary data, and Poisson regression for count data), $l = 1, 2, \ldots, L$. The details are shown as follows:

$$
\begin{aligned}
S_l(\beta) &= \sum_{i=1}^{K} S_{ij}(\beta_l)\, I[j \in r_l] = 0,\\
\hat{\beta}_{\mathrm{wcr}} &= \frac{1}{L}\sum_{l=1}^{L} \hat{\beta}_l,\\
\hat{\Sigma}_{\mathrm{wcr}} &= \frac{1}{L}\sum_{l=1}^{L} \hat{\Sigma}_l - \frac{1}{L}\sum_{l=1}^{L} (\hat{\beta}_l - \hat{\beta}_{\mathrm{wcr}})(\hat{\beta}_l - \hat{\beta}_{\mathrm{wcr}})^{T},
\end{aligned} \tag{21}
$$

where $S_{ij}(\beta_l) = X_{ij}' V_{ij}^{-1}(Y_{ij} - X_{ij}'\beta_l)$, with $r_l$ as the set of data indices selected from the $i$th cluster in the $l$th replicate data set. Alternatively, the approach considered by Williamson et al., which adopts weighted estimating equations, performs asymptotically equivalently to WCR while avoiding intensive computing; it is referred to as the cluster-weighted GEE (CWGEE) [81]. The estimating equation is

$$S(\beta) = \sum_{i=1}^{K} \frac{1}{n_i} \sum_{j=1}^{n_i} S_{ij}(\beta) = 0, \tag{22}$$

where $S_{ij}$ is defined as above, except that the subscript $j$ ranges from 1 to $n_i$ and is not restricted by the index set $r_l$. Note that, as $L \to \infty$, $(1/L)\sum_{l=1}^{L} S_l(\beta)$ converges to its expected estimating function and is asymptotically equivalent to $S(\beta)$.
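The relationship between WCR and CWGEE can be illustrated in the simplest setting, an intercept-only linear model, where the score contribution is $S_{ij}(\beta) = Y_{ij} - \beta$. The following is a minimal sketch under that assumption; the toy data, function names, and the choice $L = 2000$ are hypothetical and are not taken from the cited papers. In this setting the cluster-weighted estimate is the mean of the cluster means, and the WCR estimate approaches it as $L$ grows.

```python
import random
from statistics import mean, variance


def make_clusters(K=30):
    """Deterministic toy data with informative cluster size: the size n_i
    cycles through 1..5 and the cluster mean is 10 - n_i."""
    clusters = []
    for i in range(K):
        n = 1 + i % 5
        clusters.append([(10 - n) + 0.5 * (j - (n - 1) / 2) for j in range(n)])
    return clusters


def naive_mean(clusters):
    # Unweighted independence estimate: pooled mean of all observations.
    return mean(y for c in clusters for y in c)


def cwgee_mean(clusters):
    # Solves sum_i (1/n_i) sum_j (Y_ij - beta) = 0: the mean of cluster means.
    return mean(mean(c) for c in clusters)


def wcr_mean(clusters, L=2000, seed=1):
    """Within-cluster resampling for the intercept-only model."""
    rng = random.Random(seed)
    K = len(clusters)
    betas, sigmas = [], []
    for _ in range(L):
        sample = [rng.choice(c) for c in clusters]  # one observation per cluster
        betas.append(mean(sample))                  # estimate from replicate l
        sigmas.append(variance(sample) / K)         # its variance estimate
    beta_wcr = mean(betas)
    # Average within-replicate variance minus the between-replicate variance.
    var_wcr = mean(sigmas) - sum((b - beta_wcr) ** 2 for b in betas) / L
    return beta_wcr, var_wcr


if __name__ == "__main__":
    clusters = make_clusters()
    print("naive pooled mean:", naive_mean(clusters))
    print("CWGEE estimate:   ", cwgee_mean(clusters))
    print("WCR estimate, variance:", wcr_mean(clusters))
```

Because larger clusters have lower outcomes in the toy data, the pooled mean is pulled downward, while the WCR and cluster-weighted estimates both target the mean for a typical cluster.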

This method was also explored or extended for correlated data with nonignorable cluster size by Benhin et al. and Cong et al. [82, 83]. Furthermore, a modified WCR (MWCR) was proposed by Chiang and Lee, in which a number of subjects equal to the minimum cluster size (assumed to be greater than 1) is randomly sampled from each cluster, and GEE models for balanced data are then applied for estimation by incorporating the intracluster correlation; thus MWCR might be a more efficient way for analysis [84]. However, MWCR is not always satisfactory, and Pavlou et al. identified sufficient conditions on the data structure and the choice of “working” correlation structure under which the estimates from MWCR are consistent [85]. In addition, Wang et al. extended the above work to clustered longitudinal data, which are collected as repeated measures on subjects arising in clusters with potentially informative cluster size [45]. Examples include health studies of subjects from multiple hospitals or families. After comparing GEE, WCR, and CWGEE through extensive simulation studies and a real data application to a periodontal disease study, the authors recommended CWGEE because its performance was comparable to that of WCR, with well-preserved coverage rates and desirable power properties, without intensive Monte Carlo computation, whereas GEE models led to invalid inference due to biased parameter estimates [45]. In addition, for observed-cluster inference, Seaman et al. discussed methods including weighted and doubly weighted GEE and shared random-effects models for comparison and showed the conditions under which the shared random-effects model describes members with observed outcomes $Y$ [86]. More work can be found in [87–90], among others.

3. Simulation

In this section, we focus on “working” correlation structure selection and compare the performance of the existing criteria through simulation studies. Two types of outcomes are considered: continuous and count responses. The models for data generation are as follows:

$$
\begin{aligned}
u_{ij} &= \beta_0 + \beta_1 \times x_{ij},\\
\log(u_{ij}) &= \beta_0 + \beta_1 \times x_{ij},
\end{aligned} \tag{23}
$$

where $\beta_0 = \beta_1 = 0.5$, $i = 1, 2, \ldots, I$ with $I = 50, 100, 200, 500$, and $j = 1, 2, \ldots, J$ with $J = 4, 8$. The covariates $x_{ij}$ are i.i.d. from a standard uniform distribution Unif(0, 1). For each scenario, we generate the data based on the underlying true correlation structures, namely independent (IND), exchangeable (EXCH), and autoregressive (AR-1), with $\alpha = 0.3, 0.7$. For each scenario, 1000 Monte Carlo data sets are generated, and the estimates of the regression parameters and the within-subject correlation matrix, together with seven model selection criteria, are calculated using the “working” correlation structures IND, EXCH, and AR-1. Partial simulation results are provided in Tables 2, 3, and 4; the results for CIC are not shown because they are the same as those for QIC.
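For the continuous outcome, data with the three true correlation structures can be generated along the following lines. This is a minimal sketch rather than the simulation code used for the tables: the function names are hypothetical, the errors are assumed to be normal with unit variance (the variance is not stated above), and the correlated count outcome is not covered.

```python
import math
import random


def simulate_normal(I, J, alpha, structure, beta0=0.5, beta1=0.5, seed=0):
    """Return a list of (x, y) pairs, one per subject, with
    y_ij = beta0 + beta1 * x_ij + e_ij and unit-variance errors (assumed)."""
    rng = random.Random(seed)
    data = []
    for _ in range(I):
        x = [rng.random() for _ in range(J)]  # Unif(0, 1) covariates
        if structure == "IND":
            e = [rng.gauss(0, 1) for _ in range(J)]
        elif structure == "EXCH":
            # A shared subject effect induces correlation alpha between any pair.
            b = rng.gauss(0, 1)
            e = [math.sqrt(alpha) * b + math.sqrt(1 - alpha) * rng.gauss(0, 1)
                 for _ in range(J)]
        elif structure == "AR-1":
            # Stationary AR(1): correlation alpha ** |j - k|.
            e = [rng.gauss(0, 1)]
            for _ in range(J - 1):
                e.append(alpha * e[-1] + math.sqrt(1 - alpha ** 2) * rng.gauss(0, 1))
        else:
            raise ValueError("structure must be 'IND', 'EXCH', or 'AR-1'")
        y = [beta0 + beta1 * xj + ej for xj, ej in zip(x, e)]
        data.append((x, y))
    return data


def residual_correlation(data, j, k, beta0=0.5, beta1=0.5):
    """Empirical correlation between the residuals at occasions j and k."""
    rj = [y[j] - beta0 - beta1 * x[j] for x, y in data]
    rk = [y[k] - beta0 - beta1 * x[k] for x, y in data]
    mj, mk = sum(rj) / len(rj), sum(rk) / len(rk)
    cov = sum((a - mj) * (b - mk) for a, b in zip(rj, rk))
    var_j = sum((a - mj) ** 2 for a in rj)
    var_k = sum((b - mk) ** 2 for b in rk)
    return cov / math.sqrt(var_j * var_k)


if __name__ == "__main__":
    data = simulate_normal(I=500, J=4, alpha=0.3, structure="AR-1", seed=1)
    print("lag-1 residual correlation:", residual_correlation(data, 0, 1))
```

A data set generated this way would then be fitted under each “working” structure, with the selection criteria computed for every fit.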

Based on the results, RJ does not perform well in scenarios with either continuous or binary outcomes, while RJ1 and RJ2 have comparable performance and select the true underlying correlation structure in most scenarios, with better performance under larger sample sizes. QIC is not satisfactory when the true correlation structure is independent but performs well when the true correlation structure is exchangeable or AR-1. On the other hand, SC and GP do not perform well for longitudinal data with normal responses, but their performance is slightly improved for longitudinal data with binary outcomes. The results may vary due to a variety of factors, including the types of “working” correlation structure considered for model fitting, the sample size, and/or the magnitude of the correlation coefficient. For future work, a robust criterion for “working” correlation structure selection in GEE is needed, and more advanced approaches are currently emerging.

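Table 2: Simulation for longitudinal data with independent correlation matrix. Entries are selection frequencies of the “working” correlation structure.

| n | K | Criterion | Normal: IND | Normal: EXCH | Normal: AR-1 | Binary: IND | Binary: EXCH | Binary: AR-1 |
|---|---|---|---|---|---|---|---|---|
| 4 | 50 | QIC | 198 | 393 | 409 | 202 | 374 | 424 |
| 4 | 50 | RJ | 327 | 423 | 250 | 312 | 421 | 267 |
| 4 | 50 | RJ1 | 388 | 322 | 290 | 399 | 316 | 285 |
| 4 | 50 | RJ2 | 384 | 327 | 289 | 388 | 320 | 292 |
| 4 | 50 | SC | 488 | 1 | 512 | 351 | 310 | 339 |
| 4 | 50 | GP | 547 | 0 | 453 | 368 | 306 | 326 |
| 4 | 100 | QIC | 209 | 377 | 414 | 185 | 407 | 408 |
| 4 | 100 | RJ | 338 | 415 | 247 | 340 | 410 | 250 |
| 4 | 100 | RJ1 | 389 | 349 | 262 | 381 | 358 | 261 |
| 4 | 100 | RJ2 | 389 | 353 | 258 | 372 | 357 | 271 |
| 4 | 100 | SC | 482 | 1 | 517 | 352 | 346 | 302 |
| 4 | 100 | GP | 520 | 0 | 480 | 360 | 348 | 292 |
| 8 | 50 | QIC | 200 | 411 | 389 | 203 | 363 | 434 |
| 8 | 50 | RJ | 282 | 497 | 221 | 292 | 476 | 232 |
| 8 | 50 | RJ1 | 402 | 354 | 244 | 386 | 340 | 274 |
| 8 | 50 | RJ2 | 402 | 357 | 241 | 373 | 347 | 280 |
| 8 | 50 | SC | 465 | 1 | 535 | 351 | 325 | 324 |
| 8 | 50 | GP | 558 | 0 | 442 | 382 | 311 | 307 |
| 8 | 100 | QIC | 188 | 393 | 419 | 201 | 398 | 401 |
| 8 | 100 | RJ | 321 | 442 | 237 | 287 | 466 | 247 |
| 8 | 100 | RJ1 | 347 | 385 | 268 | 385 | 367 | 248 |
| 8 | 100 | RJ2 | 347 | 382 | 271 | 377 | 369 | 254 |
| 8 | 100 | SC | 492 | 0 | 508 | 355 | 343 | 302 |
| 8 | 100 | GP | 541 | 0 | 459 | 370 | 341 | 289 |

Table 3: Simulation for longitudinal data with exchangeable correlation matrix with $\alpha = 0.3$. Entries are selection frequencies of the “working” correlation structure.

| n | K | Criterion | Normal: IND | Normal: EXCH | Normal: AR-1 | Binary: IND | Binary: EXCH | Binary: AR-1 |
|---|---|---|---|---|---|---|---|---|
| 4 | 50 | QIC | 106 | 699 | 195 | 53 | 758 | 189 |
| 4 | 50 | RJ | 419 | 139 | 442 | 869 | 5 | 126 |
| 4 | 50 | RJ1 | 0 | 963 | 37 | 12 | 898 | 90 |
| 4 | 50 | RJ2 | 0 | 959 | 41 | 22 | 876 | 102 |
| 4 | 50 | SC | 0 | 593 | 407 | 282 | 650 | 68 |
| 4 | 50 | GP | 1 | 593 | 406 | 412 | 524 | 64 |
| 4 | 100 | QIC | 31 | 879 | 90 | 7 | 867 | 126 |
| 4 | 100 | RJ | 350 | 88 | 562 | 911 | 2 | 87 |
| 4 | 100 | RJ1 | 0 | 995 | 5 | 2 | 946 | 52 |
| 4 | 100 | RJ2 | 0 | 996 | 4 | 10 | 933 | 57 |
| 4 | 100 | SC | 0 | 598 | 402 | 339 | 635 | 26 |
| 4 | 100 | GP | 0 | 501 | 499 | 445 | 531 | 24 |
| 8 | 50 | QIC | 80 | 828 | 92 | 50 | 876 | 74 |
| 8 | 50 | RJ | 10 | 395 | 595 | 813 | 6 | 181 |
| 8 | 50 | RJ1 | 0 | 1000 | 0 | 0 | 987 | 13 |
| 8 | 50 | RJ2 | 0 | 1000 | 0 | 0 | 966 | 25 |
| 8 | 50 | SC | 0 | 488 | 513 | 302 | 696 | 2 |
| 8 | 50 | GP | 0 | 511 | 489 | 497 | 500 | 3 |
| 8 | 100 | QIC | 17 | 953 | 30 | 8 | 973 | 19 |
| 8 | 100 | RJ | 0 | 408 | 592 | 861 | 0 | 139 |
| 8 | 100 | RJ1 | 0 | 1000 | 0 | 0 | 997 | 3 |
| 8 | 100 | RJ2 | 0 | 1000 | 0 | 0 | 993 | 7 |
| 8 | 100 | SC | 0 | 470 | 530 | 328 | 672 | 0 |
| 8 | 100 | GP | 0 | 526 | 474 | 486 | 514 | 0 |

Table 4: Simulation for longitudinal data with AR-1 correlation matrix with $\alpha = 0.3$. Entries are selection frequencies of the “working” correlation structure.

| n | K | Criterion | Normal: IND | Normal: EXCH | Normal: AR-1 | Binary: IND | Binary: EXCH | Binary: AR-1 |
|---|---|---|---|---|---|---|---|---|
| 4 | 50 | QIC | 91 | 166 | 743 | 66 | 170 | 764 |
| 4 | 50 | RJ | 712 | 142 | 146 | 925 | 12 | 63 |
| 4 | 50 | RJ1 | 0 | 478 | 522 | 7 | 505 | 488 |
| 4 | 50 | RJ2 | 0 | 466 | 534 | 20 | 499 | 481 |
| 4 | 50 | SC | 0 | 480 | 520 | 220 | 350 | 430 |
| 4 | 50 | GP | 0 | 543 | 457 | 303 | 332 | 365 |
| 4 | 100 | QIC | 25 | 116 | 859 | 7 | 122 | 871 |
| 4 | 100 | RJ | 770 | 95 | 135 | 972 | 4 | 24 |
| 4 | 100 | RJ1 | 0 | 475 | 525 | 1 | 569 | 430 |
| 4 | 100 | RJ2 | 0 | 481 | 519 | 5 | 571 | 424 |
| 4 | 100 | SC | 0 | 491 | 509 | 237 | 371 | 392 |
| 4 | 100 | GP | 0 | 540 | 460 | 290 | 353 | 357 |
| 8 | 50 | QIC | 50 | 88 | 862 | 44 | 77 | 879 |
| 8 | 50 | RJ | 646 | 148 | 206 | 934 | 5 | 61 |
| 8 | 50 | RJ1 | 0 | 445 | 555 | 0 | 535 | 465 |
| 8 | 50 | RJ2 | 0 | 443 | 557 | 10 | 535 | 455 |
| 8 | 50 | SC | 0 | 467 | 533 | 168 | 397 | 435 |
| 8 | 50 | GP | 0 | 549 | 451 | 269 | 406 | 325 |
| 8 | 100 | QIC | 16 | 39 | 945 | 7 | 33 | 960 |
| 8 | 100 | RJ | 648 | 154 | 198 | 972 | 0 | 28 |
| 8 | 100 | RJ1 | 0 | 455 | 545 | 1 | 603 | 396 |
| 8 | 100 | RJ2 | 0 | 455 | 545 | 1 | 609 | 390 |
| 8 | 100 | SC | 0 | 480 | 520 | 177 | 458 | 365 |
| 8 | 100 | GP | 0 | 532 | 468 | 247 | 457 | 296 |

4. Future Direction and Discussion

In this paper, we provide a review of several specific topics related to GEE for longitudinal/correlated data, such as model selection with emphasis on the selection of the “working” correlation structure, sample size and power calculation, and clustered data analysis with informative cluster size. The simulation studies are conducted to provide numerical comparisons among five types of model selection criteria [91, 92]. Novel methodologies are still needed and are being developed because of the increasing usage and potential theoretical constraints of GEE, as well as new challenges emerging from practical applications in clinical trials and biomedical studies.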

In addition, current research of interest related to GEE also includes a robust and optimal model selection criterion for GEE under missing at random (MAR) or missing not at random (MNAR) mechanisms [93, 94]; sample size/power calculation for correlated sparse or overdispersed count data or longitudinal data with small samples [57–60]; GEE with improved performance in situations with informative cluster size and/or MAR and/or small sample size [95–98]; and GEE for high-dimensional longitudinal data [99]. Although GEE has attractive features, flexible application, and easy implementation in software, it should be applied with caution in practice, depending on the study design or data structure and the goals of the research.

Conflict of Interests

The author declares that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The author was supported by a grant from the Penn State CTSI. The project was supported by the National Center for Research Resources and the National Center for Advancing Translational Sciences, National Institutes of Health, through Grant 5 UL1 RR0330184-04. The content is solely the responsibility of the author and does not represent the views of the NIH.

References

[1] Z. Feng, P. Diehr, A. Peterson, and D. McLerran, “Selected statistical issues in group randomized trials,” Annual Review of Public Health, vol. 22, pp. 167–187, 2001.

[2] G. Fitzmaurice, N. M. Laird, and J. H. Ware, Applied Longitudinal Data, John Wiley & Sons, 2004.

[3] J. W. Hardin and J. M. Hilbe, Generalized Estimating Equations, Chapman and Hall/CRC Press, Boca Raton, Fla, USA, 2003.

[4] R. F. Potthoff and S. N. Roy, “A generalized multivariate analysis of variance model useful especially for growth curve problems,” Biometrika, vol. 51, pp. 313–326, 1964.

[5] L. M. Friedman, C. D. Furberg, and D. L. DeMets, Fundamentals of Clinical Trials, Springer, New York, NY, USA, 3rd edition, 1989.

[6] K. Y. Liang and S. L. Zeger, “A comparison of two bias-corrected covariance estimators for generalized estimating equations,” Biometrika, vol. 73, pp. 13–22, 1986.

[7] M. Crowder, “On the use of a working correlation matrix in using generalised linear models for repeated measures,” Biometrika, vol. 82, no. 2, pp. 407–410, 1995.

[8] R. W. Wedderburn, “Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method,” Biometrika, vol. 61, pp. 439–447, 1974.

[9] P. Diggle, P. Heagerty, K. Y. Liang, and S. L. Zeger, Analysis of Longitudinal Data, Oxford University Press, Oxford, UK, 2002.

[10] G. Fitzmaurice, M. Davidian, G. Verbeke, and G. Molenberghs, Longitudinal Data Analysis, Chapman & Hall/CRC Press, 2008.

[11] D. Hedeker and R. D. Gibbons, Analysis of Longitudinal Data, John Wiley & Sons, 2006.

[12] P. McCullagh and J. A. Nelder, Generalized Linear Models, Chapman & Hall, London, UK, 1989.

[13] N. R. Chaganty and H. Joe, “Range of correlation matrices for dependent Bernoulli random variables,” Biometrika, vol. 93, no. 1, pp. 197–206, 2006.

[14] R. T. Sabo and N. R. Chaganty, “What can go wrong when ignoring correlation bounds in the use of generalized estimating equations,” Statistics in Medicine, vol. 29, no. 24, pp. 2501–2507, 2010.

[15] B. C. Sutradhar and K. Das, “On the efficiency of regression estimators in generalised linear models for longitudinal data,” Biometrika, vol. 86, no. 2, pp. 459–465, 1999.

[16] Y.-G. Wang and V. Carey, “Working correlation structure misspecification, estimation and covariate design: implications for generalised estimating equations performance,” Biometrika, vol. 90, no. 1, pp. 29–41, 2003.

[17] S. R. Lipsitz, G. Molenberghs, G. M. Fitzmaurice, and J. Ibrahim, “GEE with Gaussian estimation of the correlations when data are incomplete,” Biometrics, vol. 56, no. 2, pp. 528–536, 2000.

[18] Y.-G. Wang and V. J. Carey, “Unbiased estimating equations from working correlation models for irregularly timed repeated measures,” Journal of the American Statistical Association, vol. 99, no. 467, pp. 845–853, 2004.

[19] A. Qu and B. G. Lindsay, “Building adaptive estimating equations when inverse of covariance estimation is difficult,” Journal of the Royal Statistical Society B: Statistical Methodology, vol. 65, no. 1, pp. 127–142, 2003.

[20] S. R. Lipsitz and G. M. Fitzmaurice, “Estimating equations for measures of association between repeated binary responses,” Biometrics, vol. 52, no. 3, pp. 903–912, 1996.

[21] Y. Lee and J. A. Nelder, “Conditional and marginal models: another view,” Statistical Science, vol. 19, no. 2, pp. 219–238, 2004.

[22] Y. Lee and J. A. Nelder, “Likelihood inference for models with unobservables: another view,” Statistical Science, vol. 24, no. 3, pp. 255–269, 2009.

[23] A. Qu, B. G. Lindsay, and B. Li, “Improving generalised estimating equations using quadratic inference functions,” Biometrika, vol. 87, no. 4, pp. 823–836, 2000.

[24] G. Kauermann and R. J. Carroll, “A note on the efficiency of sandwich covariance matrix estimation,” Journal of the American Statistical Association, vol. 96, no. 456, pp. 1387–1396, 2001.

[25] Y. G. Wang and L. Y. Hin, “Modeling strategies in longitudinal data analysis: covariate, variance function and correlation structure selection,” Computational Statistics and Data Analysis, vol. 54, no. 12, pp. 3359–3370, 2010.

[26] W. Pan, “Goodness-of-fit tests for GEE with correlated binary data,” Scandinavian Journal of Statistics, vol. 29, no. 1, pp. 101–110, 2002.

[27] A. M. Wood, I. R. White, and P. Royston, “How should variable selection be performed with multiply imputed data?” Statistics in Medicine, vol. 27, no. 17, pp. 3227–3246, 2008.

[28] M. D. Begg and M. K. Parides, “Separation of individual-level and cluster-level covariate effects in regression analysis of correlated data,” Statistics in Medicine, vol. 22, no. 16, pp. 2591–2602, 2003.

[29] L. Y. Hin, V. J. Carey, and Y. G. Wang, “Criteria for working-correlation-structure selection in GEE: assessment via simulation,” The American Statistician, vol. 61, no. 4, pp. 360–364, 2007.

[30] J. X. Pan and G. Mackenzie, “On modelling mean-covariance structures in longitudinal studies,” Biometrika, vol. 90, no. 1, pp. 239–244, 2003.

[31] M. Davidian and R. J. Carroll, “Variance function estimation,” Journal of the American Statistical Association, vol. 82, no. 400, pp. 1079–1091, 1987.

[32] M. Pourahmadi, “Joint mean-covariance models with applications to longitudinal data: unconstrained parameterisation,” Biometrika, vol. 86, no. 3, pp. 677–690, 1999.

[33] S. Konishi and G. Kitagawa, “Generalised information criteria in model selection,” Biometrika, vol. 83, no. 4, pp. 875–890, 1996.

[34] B. Zhang, “Summarizing the goodness of fit of generalized linear models for longitudinal data,” Statistics in Medicine, vol. 19, pp. 1265–1275, 2000.

[35] A. Rotnitzky and N. P. Jewell, “Hypothesis testing of regression parameters in semiparametric generalized linear models for cluster correlated data,” Biometrika, vol. 77, no. 3, pp. 485–497, 1990.

[36] J. Shults and N. R. Chaganty, “Analysis of serially correlated data using quasi-least squares,” Biometrics, vol. 54, no. 4, pp. 1622–1630, 1998.

[37] V. J. Carey and Y.-G. Wang, “Working covariance model selection for generalized estimating equations,” Statistics in Medicine, vol. 30, no. 26, pp. 3117–3124, 2011.

[38] W. Pan, “Akaike’s information criterion in generalized estimating equations,” Biometrics, vol. 57, no. 1, pp. 120–125, 2001.

[39] H. Akaike, “Information theory and an extension of the maximum likelihood principle,” in Proceedings of the 2nd International Symposium on Information Theory, vol. 15, pp. 267–281, 1973.

[40] J. A. Nelder and Y. Lee, “Likelihood, quasi-likelihood and pseudolikelihood: some comparisons,” Journal of the Royal Statistical Society B, vol. 54, no. 1, pp. 273–284, 1992.

[41] J. Cui, “QIC program and model selection in GEE analyses,” The Stata Journal, vol. 7, no. 2, pp. 209–220, 2007.

[42] J. Cui and G. Qian, “Selection of working correlation structure and best model in GEE analyses of longitudinal data,” Communications in Statistics: Simulation and Computation, vol. 36, no. 4–6, pp. 987–996, 2007.

[43] L. Y. Hin and Y. G. Wang, “Working-correlation-structure identification in generalized estimating equations,” Statistics in Medicine, vol. 28, no. 4, pp. 642–658, 2009.

[44] J. A. Nelder and D. Pregibon, “An extended quasi-likelihood function,” Biometrika, vol. 74, no. 2, pp. 221–232, 1987.

[45] M. Wang, M. Kong, and S. Datta, “Inference for marginal linear models for clustered longitudinal data with potentially informative cluster sizes,” Statistical Methods in Medical Research, vol. 20, no. 4, pp. 347–367, 2011.

[46] E. Cantoni, J. M. Flemming, and E. Ronchetti, “Variable selection for marginal longitudinal generalized linear models,” Biometrics: Journal of the International Biometric Society, vol. 61, no. 2, pp. 507–514, 2005.

[47] Y.-G. Wang and X. Lin, “Effects of variance-function misspecification in analysis of longitudinal data,” Biometrics, vol. 61, no. 2, pp. 413–421, 2005.

[48] N. R. Chaganty and H. Joe, “Efficiency of generalized estimating equations for binary responses,” Journal of the Royal Statistical Society, Series B: Statistical Methodology, vol. 66, no. 4, pp. 851–860, 2004.

[49] M. Gosho, C. Hamada, and I. Yoshimura, “Criterion for the selection of a working correlation structure in the generalized estimating equation approach for longitudinal balanced data,” Communications in Statistics, vol. 40, no. 21, pp. 3839–3856, 2011.

[50] M. Gosho, C. Hamada, and I. Yoshimura, “Selection of working correlation structure in weighted generalized estimating equation method for incomplete longitudinal data,” Communications in Statistics, vol. 43, no. 1, pp. 62–81, 2014.

[51] M. J. Jang, Working correlation selection in generalized estimating equations [Dissertation], University of Iowa, 2011.

[52] J. Chen and N. A. Lazar, “Selection of working correlation structure in generalized estimating equations via empirical likelihood,” Journal of Computational and Graphical Statistics, vol. 21, no. 1, pp. 18–41, 2012.

[53] P. M. Westgate, “A bias-corrected covariance estimator for improved inference when using an unstructured correlation with quadratic inference functions,” Statistics and Probability Letters, vol. 83, no. 6, pp. 1553–1558, 2013.

[54] P. M. Westgate, “Criterion for the simultaneous selection of a working correlation structure and either generalized estimating equations or the quadratic inference function approach,” Biometrical Journal, vol. 56, no. 3, pp. 461–476, 2014.

[55] P. M. Westgate, “Improving the correlation structure selection approach for generalized estimating equations and balanced longitudinal data,” Statistics in Medicine, vol. 33, no. 13, pp. 2222–2237, 2014.

[56] J. Ye, “On measuring and correcting the effects of data mining and model selection,” Journal of the American Statistical Association, vol. 93, no. 441, pp. 120–131, 1998.

[57] J. J. Shuster, Practical Handbook of Sample Size Guidelines for Clinical Trials, CRC Press, Boca Raton, Fla, USA, 1993.

[58] G. Liu and K.-Y. Liang, “Sample size calculations for studies with correlated observations,” Biometrics, vol. 53, no. 3, pp. 937–947, 1997.

[59] W. J. Shih, “Sample size and power calculations for periodontal and other studies with clustered samples using the method of generalized estimating equations,” Biometrical Journal, vol. 39, no. 8, pp. 899–908, 1997.

[60] S. R. Lipsitz and G. M. Fitzmaurice, “Sample size for repeated measures studies with binary responses,” Statistics in Medicine, vol. 13, no. 12, pp. 1233–1239, 1994.

[61] W. Pan, “Sample size and power calculations with correlated binary data,” Controlled Clinical Trials, vol. 22, no. 3, pp. 211–227, 2001.

[62] N. Breslow, “Tests of hypotheses in overdispersed Poisson regression and other quasi likelihood models,” Journal of the American Statistical Association, vol. 85, pp. 565–571, 1990.

[63] E. W. Lee and N. Dubin, “Estimation and sample size considerations for clustered binary responses,” Statistics in Medicine, vol. 13, no. 12, pp. 1241–1252, 1994.

[64] D. J. Sargent, J. A. Sloan, and S. S. Cha, “Sample size and design considerations for phase II clinical trials with correlated observations,” Controlled Clinical Trials, vol. 20, no. 3, pp. 242–252, 1999.

[65] C. S. Li, “Semiparametric negative binomial regression models,” Communications in Statistics: Simulation and Computation, vol. 39, no. 3, pp. 475–486, 2010.

[66] W. H. Greene, “Accounting for excess zeros and sample selection in Poisson and negative binomial regression models,” Tech. Rep., New York University, 1994.

[67] P. Lambert, “Modeling of repeated series of count data measured at unequally spaced times,” Applied Statistics, vol. 45, pp. 31–38, 1996.

[68] M. S. Pepe and G. L. Anderson, “A cautionary note on inference for marginal regression models with longitudinal data and general correlated response data,” Communications in Statistics, Series B, vol. 23, pp. 939–951, 1994.

[69] M. Wang and Q. Long, “Modified robust variance estimator for generalized estimating equations with improved small-sample performance,” Statistics in Medicine, vol. 30, no. 11, pp. 1278–1291, 2011.

[70] M. Taljaard, A. D. McRae, C. Weijer et al., “Inadequate reporting of research ethics review and informed consent in cluster randomised trials: review of random sample of published trials,” British Medical Journal, vol. 342, Article ID d2496, 2011.

[71] L. A. Mancl and T. A. DeRouen, “A covariance estimator for GEE with improved small-sample properties,” Biometrics, vol. 57, no. 1, pp. 126–134, 2001.

[72] M. P. Fay and B. I. Graubard, “Small-sample adjustments for Wald-type tests using sandwich estimators,” Biometrics, vol. 57, no. 4, pp. 1198–1206, 2001.

[73] W. Pan, “On the robust variance estimator in generalised estimating equations,” Biometrika, vol. 88, no. 3, pp. 901–906, 2001.

[74] W. Pan and M. M. Wall, “Small-sample adjustments in using the sandwich variance estimator in generalized estimating equations,” Statistics in Medicine, vol. 21, no. 10, pp. 1429–1441, 2002.

[75] X. Guo, W. Pan, J. E. Connett, P. J. Hannan, and S. A. French, “Small-sample performance of the robust score test and its modifications in generalized estimating equations,” Statistics in Medicine, vol. 24, no. 22, pp. 3479–3495, 2005.

[76] D. M. Farewell, “Marginal analyses of longitudinal data with an informative pattern of observations,” Biometrika, vol. 97, no. 1, pp. 65–78, 2010.

[77] J. D. Beck, T. Sharp, G. G. Koch, and S. Offenbacher, “A 5-year study of attachment loss and tooth loss in community-dwelling older adults,” Journal of Periodontal Research, vol. 32, no. 6, pp. 516–523, 1997.

[78] S. J. Arbes Jr., H. Agustsdottir, and G. D. Slade, “Environmental tobacco smoke and periodontal disease in the United States,” American Journal of Public Health, vol. 91, no. 2, pp. 253–257, 2001.

[79] J. M. Robins, A. Rotnitzky, and L. P. Zhao, “Analysis of semiparametric regression models for repeated outcomes in the presence of missing data,” Journal of the American Statistical Association, vol. 90, pp. 106–121, 1995.

[80] E. B. Hoffman, P. K. Sen, and C. R. Weinberg, “Within-cluster resampling,” Biometrika, vol. 88, no. 4, pp. 1121–1134, 2001.

[81] J. M. Williamson, S. Datta, and G. A. Satten, “Marginal analyses of clustered data when cluster size is informative,” Biometrics, vol. 59, no. 1, pp. 36–42, 2003.

[82] E. Benhin, J. N. Rao, and A. J. Scott, “Mean estimating equation approach to analysing cluster-correlated data with nonignorable cluster sizes,” Biometrika, vol. 92, no. 2, pp. 435–450, 2005.

[83] X. J. Cong, G. Yin, and Y. Shen, “Marginal analysis of correlated failure time data with informative cluster sizes,” Biometrics, vol. 63, no. 3, pp. 663–672, 2007.

[84] T. C. Chiang and K. Y. Lee, “Efficient estimation methods for informative cluster size data,” Statistica Sinica, vol. 80, pp. 121–123, 2008.

[85] M. Pavlou, S. R. Seaman, and A. J. Copas, “An examination of a method for marginal inference when the cluster size is informative,” Statistica Sinica, vol. 23, no. 2, pp. 791–801, 2013.

[86] S. R. Seaman, M. Pavlou, and A. J. Copas, “Methods for observed-cluster inference when cluster size is informative: a review and clarifications,” Biometrics, vol. 70, no. 2, pp. 449–456, 2014.

[87] Z. Chen, B. Zhang, and P. S. Albert, “A joint modeling approach to data with informative cluster size: robustness to the cluster size model,” Statistics in Medicine, vol. 30, no. 15, pp. 1825–1836, 2011.

[88] Y. Huang and B. Leroux, “Informative cluster sizes for subcluster-level covariates and weighted generalized estimating equations,” Biometrics, vol. 67, no. 3, pp. 843–851, 2011.

[89] B. F. Kurland, L. L. Johnson, B. L. Egleston, and P. H. Diehr, “Longitudinal data with follow-up truncated by death: match the analysis method to research aims,” Statistical Science, vol. 24, no. 2, pp. 211–222, 2009.

[90] J. M. Neuhaus and C. E. McCulloch, “Estimation of covariate effects in generalized linear mixed models with informative cluster sizes,” Biometrika, vol. 98, no. 1, pp. 147–162, 2011.

[91] S. R. Lipsitz, G. M. Fitzmaurice, E. J. Orav, and N. M. Laird, “Performance of generalized estimating equations in practical situations,” Biometrics, vol. 50, no. 1, pp. 270–278, 1994.

[92] D. B. Hall and T. A. Severini, “Extended generalized estimating equations for clustered data,” Journal of the American Statistical Association, vol. 93, no. 444, pp. 1365–1375, 1998.

[93] C.-W. Shen and Y.-H. Chen, “Model selection for generalized estimating equations accommodating dropout missingness,” Biometrics, vol. 68, no. 4, pp. 1046–1054, 2012.

[94] C.-W. Shen and Y.-H. Chen, “Model selection of generalized estimating equations with multiply imputed longitudinal data,” Biometrical Journal, vol. 55, no. 6, pp. 899–911, 2013.

[95] D. B. Rubin, “Inference and missing data,” Biometrika, vol. 63, no. 3, pp. 581–592, 1976.

[96] R. J. Little and D. B. Rubin, Statistical Analysis with Missing Data, Wiley, New York, NY, USA.

[97] P. Diggle, D. Farewell, and R. Henderson, “Analysis of longitudinal data with drop-out: objectives, assumptions and a proposal,” Journal of the Royal Statistical Society C, vol. 56, no. 5, pp. 499–550, 2007.

[98] A. J. Copas and S. R. Seaman, “Bias from the use of generalized estimating equations to analyze incomplete longitudinal binary data,” Journal of Applied Statistics, vol. 37, no. 6, pp. 911–922, 2010.

[99] L. Wang, J. Zhou, and A. Qu, “Penalized generalized estimating equations for high-dimensional longitudinal data analysis,” Biometrics, vol. 68, no. 2, pp. 353–360, 2012.



where 119873 = sum119870

119894=1119899119894is the total number of observations

119901 is the number of regression parameters and 119902 is thenumber of correlation coefficients within the ldquoworkingrdquocorrelation structure Another extended criterion from SCwas proposed by Carey and Wang [37] where the Gaussianpseudolikelihood (GP) is adopted and it is given by

$$\mathrm{GP}(R) = -0.5 \times \sum_{i=1}^{K} \left[ (Y_i - u_i)' V_i^{-1} (Y_i - u_i) + \log(|V_i|) \right], \tag{9}$$

where a better “working” correlation structure yields a larger GP. In their work, they also showed via simulation that the GP criterion performed better than RJ.
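As a rough illustration of (9), the following sketch evaluates GP from per-cluster residuals $Y_i - u_i$ and covariance matrices $V_i$ that are assumed to have been computed already from a fitted model. The function name and argument layout are hypothetical, not taken from Carey and Wang [37].

```python
import numpy as np

def gaussian_pseudolikelihood(residuals, covariances):
    """Evaluate GP = -0.5 * sum_i [ r_i' V_i^{-1} r_i + log|V_i| ].

    residuals   : list of 1-D arrays, r_i = Y_i - u_i for cluster i
    covariances : list of (n_i x n_i) arrays, V_i for cluster i
    Both inputs are assumed to come from an already fitted GEE model.
    """
    total = 0.0
    for r, V in zip(residuals, covariances):
        r = np.asarray(r, dtype=float)
        V = np.asarray(V, dtype=float)
        sign, logdet = np.linalg.slogdet(V)  # stable log-determinant
        if sign <= 0:
            raise ValueError("each V_i must be positive definite")
        # Quadratic form r' V^{-1} r without forming the inverse explicitly.
        total += r @ np.linalg.solve(V, r) + logdet
    return -0.5 * total
```

Under this sketch, candidate “working” structures would be compared by recomputing the $V_i$ under each structure and preferring the one with the larger returned value.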

Another criterion was proposed by Pan [38], who modified the Akaike information criterion (AIC) [39] for use with GEE. Because GEE is not likelihood-based, the criterion is called the quasi-likelihood under the independence model criterion (QIC) [40]. The basic idea is to calculate the expected Kullback-Leibler discrepancy using the quasi-likelihood under the independence “working” correlation assumption, because no general and tractable quasi-likelihood is available for correlated data under more complex “working” correlation structures. QIC($R$) is defined by

$$\mathrm{QIC}(R) = -2\Psi(\hat{\beta}(R); I) + 2\,\mathrm{trace}(\hat{\Omega}_I \hat{V}_{LZ}), \tag{10}$$

where the quasilikelihood $\Psi(\hat{\beta}(R); I) = \sum_{i=1}^{K}\sum_{j=1}^{n_i} Q(\hat{\beta}(R), \hat{\phi}; Y_{ij}, X_{ij})$, with $Q(\mu, \phi; y) = \int_{y}^{\mu} \frac{y - t}{\phi V(t)}\,dt$ defined by [12]; $\hat{\beta}(R)$ and $\hat{\phi}$ are obtained under the hypothesized “working” correlation structure $R$; $\hat{\Omega}_I = \sum_{i=1}^{K} D_i' V_i^{-1} D_i \big|_{\beta = \hat{\beta}(R),\, R = I}$; and $\hat{V}_{LZ}$ is defined above with $\beta$ replaced by $\hat{\beta}(R)$ [38]. Note that in this work Pan ignored the second term in the Taylor expansion of the discrepancy and showed that its influence was not substantial in his simulation set-ups. Later on, Hardin and Hilbe (2003) made a slight modification to QIC($R$) by using $\hat{\beta}(I)$ and $\hat{\phi}(I)$ for more stability, and QIC($R$)$_{\mathrm{HH}}$ is given by

$$\mathrm{QIC}(R)_{\mathrm{HH}} = -2\Psi(\hat{\beta}(I); I) + 2\,\mathrm{trace}(\hat{\Omega}_I \hat{V}_{LZ}). \tag{11}$$

Note that QIC($R$) and QIC($R$)$_{\mathrm{HH}}$ do not perform well in distinguishing the independence and exchangeable “working” correlation structures because, in certain cases, the same regression parameter estimates can be obtained under these two structures. An attractive property of the QIC criterion is that it allows the covariates and the “working” correlation structure to be selected simultaneously [41, 42]; however, this measure is more sensitive to the mean structure because QIC is dominated by the first term, while the second term plays the role of a penalty. To better select the “working” correlation structure, Hin and Wang proposed the correlation information criterion (CIC), defined by

$$\mathrm{CIC} = \mathrm{trace}(\hat{\Omega}_I \hat{V}_{LZ}). \tag{12}$$

In their work, CIC was shown through simulation studies to outperform QIC when the outcomes were binary [43]. One limitation of this criterion is that it cannot penalize overparameterization; thus it does not perform well when comparing two correlation structures with quite different numbers of correlation parameters.
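The relationship between (10) and (12) can be sketched in a few lines: CIC is the penalty term of QIC without the factor of 2. The sketch below assumes the model-based information matrix under independence, the robust (sandwich) covariance estimate, and the independence quasilikelihood value have already been obtained from a fitted model; the function names are illustrative only.

```python
import numpy as np

def cic(omega_indep, v_robust):
    """CIC = trace(Omega_I @ V_LZ), the penalty term shared with QIC."""
    omega_indep = np.asarray(omega_indep, dtype=float)
    v_robust = np.asarray(v_robust, dtype=float)
    return float(np.trace(omega_indep @ v_robust))

def qic(quasi_loglik_indep, omega_indep, v_robust):
    """QIC = -2 * Psi + 2 * trace(Omega_I @ V_LZ).

    quasi_loglik_indep is the quasilikelihood evaluated under the
    independence working structure, supplied by the caller.
    """
    return -2.0 * quasi_loglik_indep + 2.0 * cic(omega_indep, v_robust)
```

One consequence worth noting: if the robust covariance happens to equal the inverse of the independence information matrix, the trace equals the number of regression parameters $p$, so the penalty then resembles the usual AIC penalty $2p$.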

Another attractive criterion is the extended quasilikelihood information criterion (EQIC) proposed by Wang and Hin [25], which uses the extended quasilikelihood (EQL) defined by Nelder and Pregibon based on the deviance function; under the independent correlation structure, it is given by [44]

$$Q^{*}(\beta, \phi; I) = -\frac{1}{2\phi} D(\beta; I) - \frac{1}{2} \sum_{i=1}^{K} \sum_{j=1}^{n_i} \log\left(2\pi\phi A(\mu_{ij})\right), \tag{13}$$

where the sum of deviances is $D(\beta; I) = \sum_{i=1}^{K}\sum_{j=1}^{n_i} -2\phi\left[Q(y_{ij}; \mu_{ij}) - Q(y_{ij}; y_{ij})\right]$, with $Q(\cdot)$ being the quasilikelihood defined as above. Therefore, EQIC is defined by

$$\mathrm{EQIC}(R) = \frac{1}{\phi} D(\beta; I) + \sum_{i=1}^{K} \sum_{j=1}^{n_i} \log\left(2\pi\phi A(\mu_{ij})\right) + 2\,\mathrm{trace}(\hat{\Omega}_I \hat{V}_{LZ}), \tag{14}$$

where some adjustments were applied to $A(\mu)$ by adding a small constant $k$, with the optimal chosen value being $1/6$. The authors indicated that the covariates were first selected based on QIC, and the variance function could then be identified as the one minimizing EQIC given the selected covariates; “working” correlation structure selection could then be achieved based on CIC. In addition, they found that covariate selection by EQIC under different working variance functions was more consistent than that based on QIC [45].

Besides the criteria mentioned above, Cantoni et al. also discussed covariate selection for longitudinal data analysis [46]; variance function selection was addressed by Pan and Mackenzie [30] as well as Wang and Lin [47]; and more work on “working” correlation structure selection was reported by Chaganty and Joe [48], Wang and Lin [47], Gosho et al. [49, 50], Jang [51], Chen [52], and Westgate [53–55], among others. Overall, model selection for GEE is nontrivial, and the best selection criterion is still being pursued [56]; the recent work by Wang et al. can be followed as a rule of thumb [45].

2.3. Sample Size and Power of GEE. It is well known that the calculation of sample size and power is necessary and important for planning a clinical trial, and it has been well studied for independent observations [1]. With the wide application of GEE in clinical trials, this topic for correlated/clustered data has gained more attention than ever [5, 57]. A general method for sample size/power calculation was discussed by Liu and Liang [58], in which the generalized score test was utilized to draw statistical inference and the resulting noncentral chi-square distribution of the test statistic under the alternative hypothesis was derived; however, in some special cases (e.g., correlated binary data with a nonexchangeable correlation structure), no closed form was available along the outline of that formula. Afterwards, Shih provided an alternative formula for sample size/power calculation, which relied on Wald tests using the estimates of regression parameters and robust variance estimators [59]. For example, in a study with one parameter of interest $\beta$, the hypothesis of interest can be formulated as

$$H_0: \beta = 0 \quad \text{versus} \quad H_a: \beta = b \neq 0, \tag{15}$$

where $b$ is the expected value. Thus, based on a two-sided $Z$-test with type I error $\eta$, the power, denoted by $\delta$, can be obtained by

$$\delta = 1 - \Phi\left(Z_{\eta/2} - \frac{b\sqrt{K}}{\sqrt{\nu_R}}\right), \tag{16}$$

where $K$ is the sample size and $\nu_R$ is the robust variance estimator corresponding to $\beta$ in the estimate of $K\hat{V}_{LZ}$. Accordingly, the sample size is given by

$$K = \frac{\nu_R \left(Z_{\eta/2} - Z_{1-\delta}\right)^2}{b^2}. \tag{17}$$
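A minimal numerical sketch of (16) and (17) follows. It assumes that $Z_{\eta/2}$ denotes the upper $\eta/2$ critical value $\Phi^{-1}(1 - \eta/2)$ and that $Z_{1-\delta} = \Phi^{-1}(1 - \delta)$, which is the reading under which (17) follows from (16); this notational reading, the function names, and the input value of $\nu_R$ are assumptions made for illustration.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1

def gee_power(b, K, nu_R, eta=0.05):
    """Power from (16): 1 - Phi(z_crit - |b| * sqrt(K) / sqrt(nu_R))."""
    z_crit = _STD_NORMAL.inv_cdf(1.0 - eta / 2.0)
    return 1.0 - _STD_NORMAL.cdf(z_crit - abs(b) * math.sqrt(K) / math.sqrt(nu_R))

def gee_sample_size(b, nu_R, power, eta=0.05):
    """Smallest integer K satisfying (17) for a target power.

    Uses Phi^{-1}(1 - power) = -Phi^{-1}(power), so the difference of
    quantiles in (17) becomes a sum of two positive terms for power > 0.5.
    """
    z_crit = _STD_NORMAL.inv_cdf(1.0 - eta / 2.0)
    z_power = _STD_NORMAL.inv_cdf(power)
    return math.ceil(nu_R * (z_crit + z_power) ** 2 / b ** 2)
```

For instance, with a hypothetical effect $b = 0.5$, $\nu_R = 4$, and $\eta = 0.05$, a target power of 0.8 gives $K = 126$ under this sketch, and plugging $K = 126$ back into the power function returns a value just above 0.8.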

For correlated continuous data, the calculation is straightforward using (16); however, for correlated binary data in particular, more work is needed [60], and Pan provided explicit formulas for $\nu_R$ under various situations as follows [61]:

$$\nu_R = \Omega \left[ \frac{1}{\pi p_0 (1 - p_0)} + \frac{1}{(1 - \pi) p_1 (1 - p_1)} \right], \tag{18}$$

where $\Omega = K \left(\sum_{i=1}^{K} 1_{n_i}' R_i^{-1} V_i R_i^{-1} 1_{n_i}\right) \Big/ \left(\sum_{i=1}^{K} 1_{n_i}' R_i^{-1} 1_{n_i}\right)^2$, with $\pi$ as the proportion of subjects assigned to the control group and $p_0$ and $p_1$ as the means for the control and case groups, respectively [61]. The detailed calculations of $\nu_R$ under several important special cases are given by

$$\begin{aligned}
&\text{If } R_i = V_i = \mathrm{CS}: && \Omega = \frac{K}{\sum_{i=1}^{K} n_i / \left[1 + (n_i - 1)\alpha\right]}, \\
&\text{If } R_i = I,\ V_i = \mathrm{CS}: && \Omega = \frac{K \sum_{i=1}^{K} n_i \left[1 + (n_i - 1)\alpha\right]}{\left(\sum_{i=1}^{K} n_i\right)^2}, \\
&\text{If } R_i = V_i = \mathrm{AR}(1): && \Omega = \frac{K (1 + \alpha)}{\sum_{i=1}^{K} \left[n_i - (n_i - 2)\alpha\right]}, \\
&\text{If } R_i = I,\ V_i = \mathrm{AR}(1): && \Omega = \frac{K \sum_{i=1}^{K} \left[n_i + 2(n_i - 1)\alpha + 2(n_i - 2)\alpha^2 + \cdots + 2\alpha^{n_i - 1}\right]}{\left(\sum_{i=1}^{K} n_i\right)^2}.
\end{aligned} \tag{19}$$
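The closed forms in (19) can be checked numerically against the general expression for $\Omega$. The sketch below builds compound-symmetry (CS) and AR(1) matrices explicitly and evaluates both versions; all function names are illustrative, and the cluster sizes and $\alpha$ used in any check are arbitrary choices.

```python
import numpy as np

def cs_matrix(n, alpha):
    """Compound-symmetry correlation matrix of size n."""
    return (1.0 - alpha) * np.eye(n) + alpha * np.ones((n, n))

def ar1_matrix(n, alpha):
    """AR(1) correlation matrix with entries alpha**|j - k|."""
    idx = np.arange(n)
    return alpha ** np.abs(idx[:, None] - idx[None, :])

def omega_general(working, true):
    """Omega = K * sum_i 1'R_i^{-1} V_i R_i^{-1} 1 / (sum_i 1'R_i^{-1} 1)^2."""
    K = len(working)
    num = 0.0
    den = 0.0
    for R, V in zip(working, true):
        ones = np.ones(R.shape[0])
        w = np.linalg.solve(R, ones)  # R^{-1} 1 (R is symmetric)
        num += w @ V @ w
        den += ones @ w
    return K * num / den ** 2

def omega_cs_cs(sizes, alpha):
    return len(sizes) / sum(n / (1.0 + (n - 1) * alpha) for n in sizes)

def omega_ind_cs(sizes, alpha):
    return len(sizes) * sum(n * (1.0 + (n - 1) * alpha) for n in sizes) / sum(sizes) ** 2

def omega_ar1_ar1(sizes, alpha):
    return len(sizes) * (1.0 + alpha) / sum(n - (n - 2) * alpha for n in sizes)

def omega_ind_ar1(sizes, alpha):
    total = sum(n + 2.0 * sum((n - k) * alpha ** k for k in range(1, n)) for n in sizes)
    return len(sizes) * total / sum(sizes) ** 2
```

Agreement between the general and closed-form values for unequal cluster sizes is a quick sanity check before plugging $\Omega$ into (18).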

These formulas can be used directly in practice and cover most situations encountered in clinical trials [61]. Note that when $R_i = V_i = \mathrm{CS}$, Liu and Liang (1997) provided a different sample size formula from (17), with $n_i = n$, which is

$$K = \frac{\left(z_{1-\eta/2} + z_{1-\delta}\right)^2 \left[(1 - \pi) p_0 (1 - p_0) + \pi p_1 (1 - p_1)\right] \left[1 + (n - 1)\alpha\right]}{n \pi (1 - \pi) (p_1 - p_0)^2}. \tag{20}$$

Be aware that the difference is due to the test methods: the Wald $Z$-test used by Pan [61] and the score test applied by Liu and Liang [58]. Note that in some cases the score test may be preferred [62]. Although other works exist for sample size/power calculation, they focus on alternative approaches rather than GEE [63, 64]; thus we do not discuss them here. For correlated Poisson data, the sample size/power calculation is more challenging due to the occurrence of overdispersion or sparsity, for which the negative binomial regression model may be explored [62, 65–67].

On the other hand, there are several concerns [68]. First, we focus here on the calculation of the sample size $K$ assuming $n_i$ is known; however, based on the power formula (16), $\nu_R$ depends on $n_i$, and thus increasing $n_i$ can also help to improve power, although it turns out to be less effective than increasing $K$ [69]. Second, the sample size/power calculation may be restricted by a limited number of clusters, for example, in cluster randomized trials (CRTs), where the number of clusters can be relatively small. For example, according to a literature review of published CRTs, the median number of clusters was 21 [70]. In such situations, a power formula adjusted for small samples in GEE is necessary, which has drawn attention from researchers recently [71–75].

2.4. Clustered Data with Informative Cluster Size. The application of GEE to clustered data with informative cluster size is another special topic [76]. Taking a periodontal disease study as an example, the number of teeth for each patient may be related to the overall oral health of the individual; in other words, the worse the oral health is, the fewer teeth the patient has, and thus the cluster size $n_i$ may influence the distribution of the oral outcomes, which is called informative cluster size [45, 77]. Such issues commonly occur in biomedical studies (e.g., genetic disease studies), and rigorous statistical methods are needed for valid statistical inference [78]. Note that if the maximum cluster size exists and is known, then this can be treated as an (informative) missing data problem, which can be solved via the weighted estimating equations proposed by Robins et al. [79]; however, if the maximum is unknown or not accessible, the method of within-cluster resampling (WCR) proposed by Hoffman et al. can be applied [80]. The basic idea is that, for each of $L$ resampled replicate data sets obtained by a Monte Carlo method ($L$ is a large number, e.g., 10,000), one observation is randomly extracted from each cluster, and $\hat{\beta}_l$ with variance estimator $\hat{\Sigma}_l$ can be obtained from a regular score equation, denoted by $S_l(\beta)$, for independent observations (i.e., linear regression for continuous data, logistic regression for binary data, and Poisson regression for count data), $l = 1, 2, \ldots, L$. The details are shown as follows:

$$\begin{aligned}
S_l(\beta) &= \sum_{i=1}^{K} \sum_{j=1}^{n_i} S_{ij}(\beta_l)\, I[j \in r_l] = 0, \\
\hat{\beta}_{\mathrm{wcr}} &= \frac{1}{L} \sum_{l=1}^{L} \hat{\beta}_l, \\
\hat{\Sigma}_{\mathrm{wcr}} &= \frac{1}{L} \sum_{l=1}^{L} \hat{\Sigma}_l - \frac{1}{L} \sum_{l=1}^{L} (\hat{\beta}_l - \hat{\beta}_{\mathrm{wcr}})(\hat{\beta}_l - \hat{\beta}_{\mathrm{wcr}})^{T},
\end{aligned} \tag{21}$$

where $S_{ij}(\beta_l) = X_{ij}' V_{ij}^{-1} (Y_{ij} - X_{ij}' \beta_l)$, with $r_l$ as the set of data indices selected from the $i$th cluster in the $l$th replicate data set. Alternatively, the approach considered by Williamson et al., which adopts weighted estimating equations, is asymptotically equivalent to WCR and also avoids intensive computing; it is referred to as the cluster-weighted GEE (CWGEE) [81]. The estimating equation is

$$S(\beta) = \sum_{i=1}^{K} \frac{1}{n_i} \sum_{j=1}^{n_i} S_{ij}(\beta) = 0, \tag{22}$$

where $S_{ij}$ is defined as above; the difference is that the subscript $j$ ranges from 1 to $n_i$ and is not restricted by the index set $r_l$. Note that as $L \to \infty$, $(1/L)\sum_{l=1}^{L} S_l(\beta)$ converges to its expected estimating function and is asymptotically equivalent to $S(\beta)$.
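To make the link between WCR and CWGEE concrete, the sketch below implements point estimation for the simplest case only: a linear model with identity link and $V_{ij} = 1$, so that each WCR replicate reduces to ordinary least squares and CWGEE reduces to weighted least squares with weights $1/n_i$. The variance estimator in (21) is omitted, and the function names and defaults are hypothetical.

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def wcr_estimate(X, y, cluster, n_resamples=2000, seed=0):
    """Within-cluster resampling: average of OLS fits, each using one
    randomly chosen observation per cluster."""
    rng = np.random.default_rng(seed)
    cluster = np.asarray(cluster).ravel()
    groups = [np.flatnonzero(cluster == c) for c in np.unique(cluster)]
    betas = np.empty((n_resamples, X.shape[1]))
    for l in range(n_resamples):
        picked = np.array([rng.choice(g) for g in groups])  # one row per cluster
        betas[l] = ols(X[picked], y[picked])
    return betas.mean(axis=0)

def cwgee_estimate(X, y, cluster):
    """Cluster-weighted estimate: solve sum_i (1/n_i) sum_j x_ij (y_ij - x_ij' b) = 0."""
    cluster = np.asarray(cluster).ravel()
    _, inverse, counts = np.unique(cluster, return_inverse=True, return_counts=True)
    w = 1.0 / counts[inverse]          # weight 1/n_i for every row of cluster i
    Xw = X * w[:, None]
    return np.linalg.solve(X.T @ Xw, Xw.T @ y)
```

In an intercept-only model, the cluster-weighted estimate is the mean of the cluster means rather than the mean of all observations, which is what removes the extra influence of large clusters.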

This method was also explored or extended for correlated data with nonignorable cluster size by Benhin et al. and Cong et al. [82, 83]. Furthermore, a modified WCR (MWCR) was proposed by Chiang and Lee, in which as many subjects as the minimum cluster size (with $n_i > 1$) were randomly sampled from each cluster, and GEE models for balanced data were then applied for estimation by incorporating the intracluster correlation; thus MWCR might be a more efficient way for analysis [84]. However, MWCR is not always satisfactory, and Pavlou et al. identified sufficient conditions on the data structure and the choice of “working” correlation structure under which the estimates from MWCR are consistent [85]. In addition, Wang et al. extended the above work to clustered longitudinal data, which are collected as repeated measures on subjects arising in clusters with potentially informative cluster size [45]. Examples include health studies of subjects from multiple hospitals or families. After comparing GEE, WCR, and CWGEE, the authors recommended CWGEE because its performance is comparable with that of WCR, in terms of well-preserved coverage rates and desirable power properties, without intensive Monte Carlo computation, whereas GEE models led to invalid inference due to biased parameter estimates, as shown via extensive simulation studies and a real data application to a periodontal disease study [45]. In addition, for observed-cluster inference, Seaman et al. discussed methods including weighted and doubly weighted GEE and shared random-effects models for comparison, and showed the conditions under which the shared random-effects model described members with observed outcomes $Y$ [86]. More work can be found in [87–90], among others.

3. Simulation

In this section, we focus on “working” correlation structure selection and compare the performances of the existing criteria through simulation studies. Two types of outcomes are considered: continuous and count responses. The models for data generation are as follows:

$$\begin{aligned}
u_{ij} &= \beta_0 + \beta_1 \times x_{ij}, \\
\log(u_{ij}) &= \beta_0 + \beta_1 \times x_{ij},
\end{aligned} \tag{23}$$

where $\beta_0 = \beta_1 = 0.5$, $i = 1, 2, \ldots, I$ with $I = 50, 100, 200, 500$, and $j = 1, 2, \ldots, J$ with $J = 4, 8$. The covariates $x_{ij}$ are i.i.d. from a standard uniform distribution Unif(0, 1). For each scenario, we generate the data based on the underlying true correlation structures: independent (IND), exchangeable (EXCH), and autoregressive (AR-1) with $\alpha = 0.3, 0.7$. For each scenario, 1000 Monte Carlo data sets are generated, for which the estimates of the regression parameters and the within-subject correlation matrix, together with seven model selection criteria, are calculated using the “working” correlation structures IND, EXCH, and AR-1. Partial simulation results are provided in Tables 2, 3, and 4; the results of CIC are not shown because they are the same as those of QIC.
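The data-generating step for the continuous outcome can be sketched as follows. The sketch assumes multivariate normal errors with unit variance, which the text does not state explicitly, and it covers only the first model in (23); generating correlated counts requires additional machinery and is not attempted here. Function names and defaults are illustrative.

```python
import numpy as np

def corr_matrix(J, structure, alpha=0.3):
    """True within-subject correlation matrix: IND, EXCH, or AR-1."""
    if structure == "IND":
        return np.eye(J)
    if structure == "EXCH":
        return (1.0 - alpha) * np.eye(J) + alpha * np.ones((J, J))
    if structure == "AR-1":
        idx = np.arange(J)
        return alpha ** np.abs(idx[:, None] - idx[None, :])
    raise ValueError("structure must be 'IND', 'EXCH', or 'AR-1'")

def simulate_normal(I, J, structure, alpha=0.3, beta0=0.5, beta1=0.5, seed=0):
    """Generate one data set with mean beta0 + beta1 * x_ij and
    correlated normal errors (unit variance is an assumption)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=(I, J))          # x_ij ~ Unif(0, 1)
    R = corr_matrix(J, structure, alpha)
    errors = rng.multivariate_normal(np.zeros(J), R, size=I)
    y = beta0 + beta1 * x + errors
    return x, y
```

Repeating the call with 1000 different seeds per scenario would mirror the Monte Carlo design described above.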

Based on the results, RJ does not perform well for the scenarios with either continuous or binary outcomes, while RJ1 and RJ2 have comparable performances and can select the true underlying correlation structure in most scenarios, with better performance under large sample sizes. QIC is not satisfactory when the true correlation structure is independent but performs well for the scenarios in which the true correlation structure is exchangeable or AR-1. On the other hand, SC and GP do not perform well for longitudinal data with normal responses, but their performance is slightly improved for longitudinal data with binary outcomes. The results may vary due to a variety of factors, including the types of “working” correlation structure considered for model fitting, the sample size, and/or the magnitude of the correlation coefficient. For future work, a robust criterion for “working” correlation structure selection in GEE is needed, and more advanced approaches are currently emerging.

4. Future Direction and Discussion

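In this paper, we provide a review of several specific topics related to GEE for longitudinal/correlated data, such as model selection with emphasis on the selection of the “working” correlation structure, sample size and power calculation, and clustered data analysis with informative cluster size. Simulation studies are conducted to provide numerical comparisons among five types of model selection criteria [91, 92]. Novel methodologies are still needed and are being developed due to the increasing usage and potential theoretical constraints of GEE, as well as new challenges emerging from practical applications in clinical trials or biomedical studies.

Table 2: Simulation for longitudinal data with independent correlation matrix. Entries are selection frequencies of the “working” correlation structure.

| $n$ | $K$ | Criterion | Normal: IND | Normal: EXCH | Normal: AR-1 | Binary: IND | Binary: EXCH | Binary: AR-1 |
|---|---|---|---|---|---|---|---|---|
| 4 | 50 | QIC | 198 | 393 | 409 | 202 | 374 | 424 |
| 4 | 50 | RJ | 327 | 423 | 250 | 312 | 421 | 267 |
| 4 | 50 | RJ1 | 388 | 322 | 290 | 399 | 316 | 285 |
| 4 | 50 | RJ2 | 384 | 327 | 289 | 388 | 320 | 292 |
| 4 | 50 | SC | 488 | 1 | 512 | 351 | 310 | 339 |
| 4 | 50 | GP | 547 | 0 | 453 | 368 | 306 | 326 |
| 4 | 100 | QIC | 209 | 377 | 414 | 185 | 407 | 408 |
| 4 | 100 | RJ | 338 | 415 | 247 | 340 | 410 | 250 |
| 4 | 100 | RJ1 | 389 | 349 | 262 | 381 | 358 | 261 |
| 4 | 100 | RJ2 | 389 | 353 | 258 | 372 | 357 | 271 |
| 4 | 100 | SC | 482 | 1 | 517 | 352 | 346 | 302 |
| 4 | 100 | GP | 520 | 0 | 480 | 360 | 348 | 292 |
| 8 | 50 | QIC | 200 | 411 | 389 | 203 | 363 | 434 |
| 8 | 50 | RJ | 282 | 497 | 221 | 292 | 476 | 232 |
| 8 | 50 | RJ1 | 402 | 354 | 244 | 386 | 340 | 274 |
| 8 | 50 | RJ2 | 402 | 357 | 241 | 373 | 347 | 280 |
| 8 | 50 | SC | 465 | 1 | 535 | 351 | 325 | 324 |
| 8 | 50 | GP | 558 | 0 | 442 | 382 | 311 | 307 |
| 8 | 100 | QIC | 188 | 393 | 419 | 201 | 398 | 401 |
| 8 | 100 | RJ | 321 | 442 | 237 | 287 | 466 | 247 |
| 8 | 100 | RJ1 | 347 | 385 | 268 | 385 | 367 | 248 |
| 8 | 100 | RJ2 | 347 | 382 | 271 | 377 | 369 | 254 |
| 8 | 100 | SC | 492 | 0 | 508 | 355 | 343 | 302 |
| 8 | 100 | GP | 541 | 0 | 459 | 370 | 341 | 289 |

Table 3: Simulation for longitudinal data with exchangeable correlation matrix with $\alpha = 0.3$. Entries are selection frequencies of the “working” correlation structure.

| $n$ | $K$ | Criterion | Normal: IND | Normal: EXCH | Normal: AR-1 | Binary: IND | Binary: EXCH | Binary: AR-1 |
|---|---|---|---|---|---|---|---|---|
| 4 | 50 | QIC | 106 | 699 | 195 | 53 | 758 | 189 |
| 4 | 50 | RJ | 419 | 139 | 442 | 869 | 5 | 126 |
| 4 | 50 | RJ1 | 0 | 963 | 37 | 12 | 898 | 90 |
| 4 | 50 | RJ2 | 0 | 959 | 41 | 22 | 876 | 102 |
| 4 | 50 | SC | 0 | 593 | 407 | 282 | 650 | 68 |
| 4 | 50 | GP | 1 | 593 | 406 | 412 | 524 | 64 |
| 4 | 100 | QIC | 31 | 879 | 90 | 7 | 867 | 126 |
| 4 | 100 | RJ | 350 | 88 | 562 | 911 | 2 | 87 |
| 4 | 100 | RJ1 | 0 | 995 | 5 | 2 | 946 | 52 |
| 4 | 100 | RJ2 | 0 | 996 | 4 | 10 | 933 | 57 |
| 4 | 100 | SC | 0 | 598 | 402 | 339 | 635 | 26 |
| 4 | 100 | GP | 0 | 501 | 499 | 445 | 531 | 24 |
| 8 | 50 | QIC | 80 | 828 | 92 | 50 | 876 | 74 |
| 8 | 50 | RJ | 10 | 395 | 595 | 813 | 6 | 181 |
| 8 | 50 | RJ1 | 0 | 1000 | 0 | 0 | 987 | 13 |
| 8 | 50 | RJ2 | 0 | 1000 | 0 | 0 | 966 | 25 |
| 8 | 50 | SC | 0 | 488 | 513 | 302 | 696 | 2 |
| 8 | 50 | GP | 0 | 511 | 489 | 497 | 500 | 3 |
| 8 | 100 | QIC | 17 | 953 | 30 | 8 | 973 | 19 |
| 8 | 100 | RJ | 0 | 408 | 592 | 861 | 0 | 139 |
| 8 | 100 | RJ1 | 0 | 1000 | 0 | 0 | 997 | 3 |
| 8 | 100 | RJ2 | 0 | 1000 | 0 | 0 | 993 | 7 |
| 8 | 100 | SC | 0 | 470 | 530 | 328 | 672 | 0 |
| 8 | 100 | GP | 0 | 526 | 474 | 486 | 514 | 0 |

Table 4: Simulation for longitudinal data with AR-1 correlation matrix with $\alpha = 0.3$. Entries are selection frequencies of the “working” correlation structure.

| $n$ | $K$ | Criterion | Normal: IND | Normal: EXCH | Normal: AR-1 | Binary: IND | Binary: EXCH | Binary: AR-1 |
|---|---|---|---|---|---|---|---|---|
| 4 | 50 | QIC | 91 | 166 | 743 | 66 | 170 | 764 |
| 4 | 50 | RJ | 712 | 142 | 146 | 925 | 12 | 63 |
| 4 | 50 | RJ1 | 0 | 478 | 522 | 7 | 505 | 488 |
| 4 | 50 | RJ2 | 0 | 466 | 534 | 20 | 499 | 481 |
| 4 | 50 | SC | 0 | 480 | 520 | 220 | 350 | 430 |
| 4 | 50 | GP | 0 | 543 | 457 | 303 | 332 | 365 |
| 4 | 100 | QIC | 25 | 116 | 859 | 7 | 122 | 871 |
| 4 | 100 | RJ | 770 | 95 | 135 | 972 | 4 | 24 |
| 4 | 100 | RJ1 | 0 | 475 | 525 | 1 | 569 | 430 |
| 4 | 100 | RJ2 | 0 | 481 | 519 | 5 | 571 | 424 |
| 4 | 100 | SC | 0 | 491 | 509 | 237 | 371 | 392 |
| 4 | 100 | GP | 0 | 540 | 460 | 290 | 353 | 357 |
| 8 | 50 | QIC | 50 | 88 | 862 | 44 | 77 | 879 |
| 8 | 50 | RJ | 646 | 148 | 206 | 934 | 5 | 61 |
| 8 | 50 | RJ1 | 0 | 445 | 555 | 0 | 535 | 465 |
| 8 | 50 | RJ2 | 0 | 443 | 557 | 10 | 535 | 455 |
| 8 | 50 | SC | 0 | 467 | 533 | 168 | 397 | 435 |
| 8 | 50 | GP | 0 | 549 | 451 | 269 | 406 | 325 |
| 8 | 100 | QIC | 16 | 39 | 945 | 7 | 33 | 960 |
| 8 | 100 | RJ | 648 | 154 | 198 | 972 | 0 | 28 |
| 8 | 100 | RJ1 | 0 | 455 | 545 | 1 | 603 | 396 |
| 8 | 100 | RJ2 | 0 | 455 | 545 | 1 | 609 | 390 |
| 8 | 100 | SC | 0 | 480 | 520 | 177 | 458 | 365 |
| 8 | 100 | GP | 0 | 532 | 468 | 247 | 457 | 296 |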

In addition, current research interests related to GEE also include a robust and optimal model selection criterion for GEE under missing at random (MAR) or missing not at random (MNAR) mechanisms [93, 94]; sample size/power calculation for correlated sparse or overdispersed count data or longitudinal data with small samples [57–60]; GEE with improved performance in situations with informative cluster size and/or MAR and/or small sample size [95–98]; and GEE for high-dimensional longitudinal data [99]. Although GEE has attractive features, flexible application, and easy implementation in software, it should be applied with caution in practice, depending on the study design or data structure and the goals of the research.

Conflict of Interests

The author declares that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The author was supported by a grant from the Penn State CTSI. The project was supported by the National Center for Research Resources and the National Center for Advancing Translational Sciences, National Institutes of Health, through Grant 5 UL1 RR0330184-04. The content is solely the responsibility of the author and does not represent the views of the NIH.

References

[1] Z. Feng, P. Diehr, A. Peterson, and D. McLerran, “Selected statistical issues in group randomized trials,” Annual Review of Public Health, vol. 22, pp. 167–187, 2001.

[2] G. Fitzmaurice, N. M. Laird, and J. H. Ware, Applied Longitudinal Data, John Wiley & Sons, 2004.

[3] J. W. Hardin and J. M. Hilbe, Generalized Estimating Equations, Chapman and Hall/CRC Press, Boca Raton, Fla, USA, 2003.

[4] R. F. Potthoff and S. N. Roy, “A generalized multivariate analysis of variance model useful especially for growth curve problems,” Biometrika, vol. 51, pp. 313–326, 1964.


[5] L. M. Friedman, C. D. Furberg, and D. L. DeMets, Fundamentals of Clinical Trials, Springer, New York, NY, USA, 3rd edition, 1989.

[6] K. Y. Liang and S. L. Zeger, “A comparison of two bias-corrected covariance estimators for generalized estimating equations,” Biometrika, vol. 73, pp. 13–22, 1986.

[7] M. Crowder, “On the use of a working correlation matrix in using generalised linear models for repeated measures,” Biometrika, vol. 82, no. 2, pp. 407–410, 1995.

[8] R. W. Wedderburn, “Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method,” Biometrika, vol. 61, pp. 439–447, 1974.

[9] P. Diggle, P. Heagerty, K. Y. Liang, and S. L. Zeger, Analysis of Longitudinal Data, Oxford University Press, Oxford, UK, 2002.

[10] G. Fitzmaurice, M. Davidian, G. Verbeke, and G. Molenberghs, Longitudinal Data Analysis, Chapman & Hall/CRC Press, 2008.

[11] D. Hedeker and R. D. Gibbons, Analysis of Longitudinal Data, John Wiley & Sons, 2006.

[12] P. McCullagh and J. A. Nelder, Generalized Linear Models, Chapman & Hall, London, UK, 1989.

[13] N. R. Chaganty and H. Joe, “Range of correlation matrices for dependent Bernoulli random variables,” Biometrika, vol. 93, no. 1, pp. 197–206, 2006.

[14] R. T. Sabo and N. R. Chaganty, “What can go wrong when ignoring correlation bounds in the use of generalized estimating equations,” Statistics in Medicine, vol. 29, no. 24, pp. 2501–2507, 2010.

[15] B. C. Sutradhar and K. Das, “On the efficiency of regression estimators in generalised linear models for longitudinal data,” Biometrika, vol. 86, no. 2, pp. 459–465, 1999.

[16] Y.-G. Wang and V. Carey, “Working correlation structure misspecification, estimation and covariate design: implications for generalised estimating equations performance,” Biometrika, vol. 90, no. 1, pp. 29–41, 2003.

[17] S. R. Lipsitz, G. Molenberghs, G. M. Fitzmaurice, and J. Ibrahim, “GEE with Gaussian estimation of the correlations when data are incomplete,” Biometrics, vol. 56, no. 2, pp. 528–536, 2000.

[18] Y.-G. Wang and V. J. Carey, “Unbiased estimating equations from working correlation models for irregularly timed repeated measures,” Journal of the American Statistical Association, vol. 99, no. 467, pp. 845–853, 2004.

[19] A. Qu and B. G. Lindsay, “Building adaptive estimating equations when inverse of covariance estimation is difficult,” Journal of the Royal Statistical Society B: Statistical Methodology, vol. 65, no. 1, pp. 127–142, 2003.

[20] S. R. Lipsitz and G. M. Fitzmaurice, “Estimating equations for measures of association between repeated binary responses,” Biometrics, vol. 52, no. 3, pp. 903–912, 1996.

[21] Y. Lee and J. A. Nelder, “Conditional and marginal models: another view,” Statistical Science, vol. 19, no. 2, pp. 219–238, 2004.

[22] Y. Lee and J. A. Nelder, “Likelihood inference for models with unobservables: another view,” Statistical Science, vol. 24, no. 3, pp. 255–269, 2009.

[23] A. Qu, B. G. Lindsay, and B. Li, “Improving generalised estimating equations using quadratic inference functions,” Biometrika, vol. 87, no. 4, pp. 823–836, 2000.

[24] G. Kauermann and R. J. Carroll, “A note on the efficiency of sandwich covariance matrix estimation,” Journal of the American Statistical Association, vol. 96, no. 456, pp. 1387–1396, 2001.

[25] Y. G. Wang and L. Y. Hin, “Modeling strategies in longitudinal data analysis: covariate, variance function and correlation structure selection,” Computational Statistics and Data Analysis, vol. 54, no. 12, pp. 3359–3370, 2010.

[26] W. Pan, “Goodness-of-fit tests for GEE with correlated binary data,” Scandinavian Journal of Statistics, vol. 29, no. 1, pp. 101–110, 2002.

[27] A. M. Wood, I. R. White, and P. Royston, “How should variable selection be performed with multiply imputed data?” Statistics in Medicine, vol. 27, no. 17, pp. 3227–3246, 2008.

[28] M. D. Begg and M. K. Parides, “Separation of individual-level and cluster-level covariate effects in regression analysis of correlated data,” Statistics in Medicine, vol. 22, no. 16, pp. 2591–2602, 2003.

[29] L. Y. Hin, V. J. Carey, and Y. G. Wang, “Criteria for working-correlation-structure selection in GEE: assessment via simulation,” The American Statistician, vol. 61, no. 4, pp. 360–364, 2007.

[30] J. X. Pan and G. Mackenzie, “On modelling mean-covariance structures in longitudinal studies,” Biometrika, vol. 90, no. 1, pp. 239–244, 2003.

[31] M. Davidian and R. J. Carroll, “Variance function estimation,” Journal of the American Statistical Association, vol. 82, no. 400, pp. 1079–1091, 1987.

[32] M. Pourahmadi, “Joint mean-covariance models with applications to longitudinal data: unconstrained parameterisation,” Biometrika, vol. 86, no. 3, pp. 677–690, 1999.

[33] S. Konishi and G. Kitagawa, “Generalised information criteria in model selection,” Biometrika, vol. 83, no. 4, pp. 875–890, 1996.

[34] B. Zhang, “Summarizing the goodness of fit of generalized linear models for longitudinal data,” Statistics in Medicine, vol. 19, pp. 1265–1275, 2000.

[35] A. Rotnitzky and N. P. Jewell, “Hypothesis testing of regression parameters in semiparametric generalized linear models for cluster correlated data,” Biometrika, vol. 77, no. 3, pp. 485–497, 1990.

[36] J. Shults and N. R. Chaganty, “Analysis of serially correlated data using quasi-least squares,” Biometrics, vol. 54, no. 4, pp. 1622–1630, 1998.

[37] V. J. Carey and Y.-G. Wang, “Working covariance model selection for generalized estimating equations,” Statistics in Medicine, vol. 30, no. 26, pp. 3117–3124, 2011.

[38] W. Pan, “Akaike’s information criterion in generalized estimating equations,” Biometrics, vol. 57, no. 1, pp. 120–125, 2001.

[39] H. Akaike, “Information theory and an extension of the maximum likelihood principle,” in Proceedings of the 2nd International Symposium on Information Theory, vol. 15, pp. 267–281, 1973.

[40] J. A. Nelder and Y. Lee, “Likelihood, quasi-likelihood and pseudolikelihood: some comparisons,” Journal of the Royal Statistical Society B, vol. 54, no. 1, pp. 273–284, 1992.

[41] J. Cui, “QIC program and model selection in GEE analyses,” The Stata Journal, vol. 7, no. 2, pp. 209–220, 2007.

[42] J. Cui and G. Qian, “Selection of working correlation structure and best model in GEE analyses of longitudinal data,” Communications in Statistics - Simulation and Computation, vol. 36, no. 4–6, pp. 987–996, 2007.

[43] L. Y. Hin and Y. G. Wang, “Working-correlation-structure identification in generalized estimating equations,” Statistics in Medicine, vol. 28, no. 4, pp. 642–658, 2009.

[44] J. A. Nelder and D. Pregibon, “An extended quasi-likelihood function,” Biometrika, vol. 74, no. 2, pp. 221–232, 1987.


[45] MWang M Kong and S Datta ldquoInference for marginal linearmodels for clustered longitudinal datawith potentially informa-tive cluster sizesrdquo Statistical Methods in Medical Research vol20 no 4 pp 347ndash367 2011

[46] E Cantoni J M Flemming and E Ronchetti ldquoVariableselection for marginal longitudinal generalized linear modelsrdquoBiometrics Journal of the International Biometric Society vol 61no 2 pp 507ndash514 2005

[47] Y-G Wang and X Lin ldquoEffects of variance-function misspeci-fication in analysis of longitudinal datardquo Biometrics vol 61 no2 pp 413ndash421 2005

[48] N R Chaganty andH Joe ldquoEfficiency of generalized estimatingequations for binary responsesrdquo Journal of the Royal StatisticalSociety Series B Statistical Methodology vol 66 no 4 pp 851ndash860 2004

[49] M Gosho C Hamada and I Yoshimura ldquoCriterion for theselection of a working correlation structure in the generalizedestimating equation approach for longitudinal balanced datardquoCommunications in Statistics vol 40 no 21 pp 3839ndash38562011

[50] M Gosho C Hamada and I Yoshimura ldquoSelection of workingcorrelation structure in weighted generalized estimating equa-tion method for incomplete longitudinal datardquo Communica-tions in Statistics vol 43 no 1 pp 62ndash81 2014

[51] M J JangWorking correlation selection in generalized estimatingequations [Dissertation] University of Iowa 2011

[52] J. Chen and N. A. Lazar, “Selection of working correlation structure in generalized estimating equations via empirical likelihood,” Journal of Computational and Graphical Statistics, vol. 21, no. 1, pp. 18–41, 2012.

[53] P. M. Westgate, “A bias-corrected covariance estimator for improved inference when using an unstructured correlation with quadratic inference functions,” Statistics and Probability Letters, vol. 83, no. 6, pp. 1553–1558, 2013.

[54] P. M. Westgate, “Criterion for the simultaneous selection of a working correlation structure and either generalized estimating equations or the quadratic inference function approach,” Biometrical Journal, vol. 56, no. 3, pp. 461–476, 2014.

[55] P. M. Westgate, “Improving the correlation structure selection approach for generalized estimating equations and balanced longitudinal data,” Statistics in Medicine, vol. 33, no. 13, pp. 2222–2237, 2014.

[56] J. Ye, “On measuring and correcting the effects of data mining and model selection,” Journal of the American Statistical Association, vol. 93, no. 441, pp. 120–131, 1998.

[57] J. J. Shuster, Practical Handbook of Sample Size Guidelines for Clinical Trials, CRC Press, Boca Raton, Fla, USA, 1993.

[58] G. Liu and K.-Y. Liang, “Sample size calculations for studies with correlated observations,” Biometrics, vol. 53, no. 3, pp. 937–947, 1997.

[59] W. J. Shih, “Sample size and power calculations for periodontal and other studies with clustered samples using the method of generalized estimating equations,” Biometrical Journal, vol. 39, no. 8, pp. 899–908, 1997.

[60] S. R. Lipsitz and G. M. Fitzmaurice, “Sample size for repeated measures studies with binary responses,” Statistics in Medicine, vol. 13, no. 12, pp. 1233–1239, 1994.

[61] W. Pan, “Sample size and power calculations with correlated binary data,” Controlled Clinical Trials, vol. 22, no. 3, pp. 211–227, 2001.

[62] N. Breslow, “Tests of hypotheses in overdispersed Poisson regression and other quasi likelihood models,” Journal of the American Statistical Association, vol. 85, pp. 565–571, 1990.

[63] E. W. Lee and N. Dubin, “Estimation and sample size considerations for clustered binary responses,” Statistics in Medicine, vol. 13, no. 12, pp. 1241–1252, 1994.

[64] D. J. Sargent, J. A. Sloan, and S. S. Cha, “Sample size and design considerations for phase II clinical trials with correlated observations,” Controlled Clinical Trials, vol. 20, no. 3, pp. 242–252, 1999.

[65] C. S. Li, “Semiparametric negative binomial regression models,” Communications in Statistics: Simulation and Computation, vol. 39, no. 3, pp. 475–486, 2010.

[66] W. H. Greene, “Accounting for excess zeros and sample selection in Poisson and negative binomial regression models,” Tech. Rep., New York University, 1994.

[67] P. Lambert, “Modeling of repeated series of count data measured at unequally spaced times,” Applied Statistics, vol. 45, pp. 31–38, 1996.

[68] M. S. Pepe and G. L. Anderson, “A cautionary note on inference for marginal regression models with longitudinal data and general correlated response data,” Communications in Statistics Series B, vol. 23, pp. 939–951, 1994.

[69] M. Wang and Q. Long, “Modified robust variance estimator for generalized estimating equations with improved small-sample performance,” Statistics in Medicine, vol. 30, no. 11, pp. 1278–1291, 2011.

[70] M. Taljaard, A. D. McRae, C. Weijer et al., “Inadequate reporting of research ethics review and informed consent in cluster randomised trials: review of random sample of published trials,” British Medical Journal, vol. 342, Article ID d2496, 2011.

[71] L. A. Mancl and T. A. DeRouen, “A covariance estimator for GEE with improved small-sample properties,” Biometrics, vol. 57, no. 1, pp. 126–134, 2001.

[72] M. P. Fay and B. I. Graubard, “Small-sample adjustments for Wald-type tests using sandwich estimators,” Biometrics, vol. 57, no. 4, pp. 1198–1206, 2001.

[73] W. Pan, “On the robust variance estimator in generalised estimating equations,” Biometrika, vol. 88, no. 3, pp. 901–906, 2001.

[74] W. Pan and M. M. Wall, “Small-sample adjustments in using the sandwich variance estimator in generalized estimating equations,” Statistics in Medicine, vol. 21, no. 10, pp. 1429–1441, 2002.

[75] X. Guo, W. Pan, J. E. Connett, P. J. Hannan, and S. A. French, “Small-sample performance of the robust score test and its modifications in generalized estimating equations,” Statistics in Medicine, vol. 24, no. 22, pp. 3479–3495, 2005.

[76] D. M. Farewell, “Marginal analyses of longitudinal data with an informative pattern of observations,” Biometrika, vol. 97, no. 1, pp. 65–78, 2010.

[77] J. D. Beck, T. Sharp, G. G. Koch, and S. Offenbacher, “A 5-year study of attachment loss and tooth loss in community-dwelling older adults,” Journal of Periodontal Research, vol. 32, no. 6, pp. 516–523, 1997.

[78] S. J. Arbes Jr., H. Agustsdottir, and G. D. Slade, “Environmental tobacco smoke and periodontal disease in the United States,” American Journal of Public Health, vol. 91, no. 2, pp. 253–257, 2001.

[79] J. M. Robins, A. Rotnitzky, and L. P. Zhao, “Analysis of semiparametric regression models for repeated outcomes in the presence of missing data,” Journal of the American Statistical Association, vol. 90, pp. 106–121, 1995.

[80] E. B. Hoffman, P. K. Sen, and C. R. Weinberg, “Within-cluster resampling,” Biometrika, vol. 88, no. 4, pp. 1121–1134, 2001.

[81] J. M. Williamson, S. Datta, and G. A. Satten, “Marginal analyses of clustered data when cluster size is informative,” Biometrics, vol. 59, no. 1, pp. 36–42, 2003.

[82] E. Benhin, J. N. Rao, and A. J. Scott, “Mean estimating equation approach to analysing cluster-correlated data with nonignorable cluster sizes,” Biometrika, vol. 92, no. 2, pp. 435–450, 2005.

[83] X. J. Cong, G. Yin, and Y. Shen, “Marginal analysis of correlated failure time data with informative cluster sizes,” Biometrics, vol. 63, no. 3, pp. 663–672, 2007.

[84] T. C. Chiang and K. Y. Lee, “Efficient estimation methods for informative cluster size data,” Statistica Sinica, vol. 80, pp. 121–123, 2008.

[85] M. Pavlou, S. R. Seaman, and A. J. Copas, “An examination of a method for marginal inference when the cluster size is informative,” Statistica Sinica, vol. 23, no. 2, pp. 791–801, 2013.

[86] S. R. Seaman, M. Pavlou, and A. J. Copas, “Methods for observed-cluster inference when cluster size is informative: a review and clarifications,” Biometrics, vol. 70, no. 2, pp. 449–456, 2014.

[87] Z. Chen, B. Zhang, and P. S. Albert, “A joint modeling approach to data with informative cluster size: robustness to the cluster size model,” Statistics in Medicine, vol. 30, no. 15, pp. 1825–1836, 2011.

[88] Y. Huang and B. Leroux, “Informative cluster sizes for subcluster-level covariates and weighted generalized estimating equations,” Biometrics, vol. 67, no. 3, pp. 843–851, 2011.

[89] B. F. Kurland, L. L. Johnson, B. L. Egleston, and P. H. Diehr, “Longitudinal data with follow-up truncated by death: match the analysis method to research aims,” Statistical Science, vol. 24, no. 2, pp. 211–222, 2009.

[90] J. M. Neuhaus and C. E. McCulloch, “Estimation of covariate effects in generalized linear mixed models with informative cluster sizes,” Biometrika, vol. 98, no. 1, pp. 147–162, 2011.

[91] S. R. Lipsitz, G. M. Fitzmaurice, E. J. Orav, and N. M. Laird, “Performance of generalized estimating equations in practical situations,” Biometrics, vol. 50, no. 1, pp. 270–278, 1994.

[92] D. B. Hall and T. A. Severini, “Extended generalized estimating equations for clustered data,” Journal of the American Statistical Association, vol. 93, no. 444, pp. 1365–1375, 1998.

[93] C.-W. Shen and Y.-H. Chen, “Model selection for generalized estimating equations accommodating dropout missingness,” Biometrics, vol. 68, no. 4, pp. 1046–1054, 2012.

[94] C.-W. Shen and Y.-H. Chen, “Model selection of generalized estimating equations with multiply imputed longitudinal data,” Biometrical Journal, vol. 55, no. 6, pp. 899–911, 2013.

[95] D. B. Rubin, “Inference and missing data,” Biometrika, vol. 63, no. 3, pp. 581–592, 1976.

[96] R. J. Little and D. B. Rubin, Statistical Analysis with Missing Data, Wiley, New York, NY, USA.

[97] P. Diggle, D. Farewell, and R. Henderson, “Analysis of longitudinal data with drop-out: objectives, assumptions and a proposal,” Journal of the Royal Statistical Society C, vol. 56, no. 5, pp. 499–550, 2007.

[98] A. J. Copas and S. R. Seaman, “Bias from the use of generalized estimating equations to analyze incomplete longitudinal binary data,” Journal of Applied Statistics, vol. 37, no. 6, pp. 911–922, 2010.

[99] L. Wang, J. Zhou, and A. Qu, “Penalized generalized estimating equations for high-dimensional longitudinal data analysis,” Biometrics, vol. 68, no. 2, pp. 353–360, 2012.


form available along the outline of that formula. Afterwards, Shih provided an alternative formula for sample size/power calculation, which relied on Wald tests using the estimates of regression parameters and robust variance estimators [59]. For example, in a study with one parameter of interest $\beta$, the hypothesis of interest can be formulated as

$$H_0: \beta = 0 \quad \text{versus} \quad H_a: \beta = b \neq 0, \tag{15}$$

where $b$ is the expected value. Thus, based on a two-sided $Z$-test with type I error $\eta$, the power, denoted by $\delta$, can be obtained by

$$\delta = 1 - \Phi\left(Z_{\eta/2} - \frac{b\sqrt{K}}{\sqrt{\nu_R}}\right), \tag{16}$$

where $K$ is the sample size and $\nu_R$ is the robust variance estimator corresponding to $\beta$ in the estimate of $K\hat{V}_{LZ}$. Accordingly, the sample size is given by

$$K = \frac{\nu_R \left(Z_{\eta/2} - Z_{1-\delta}\right)^2}{b^2}. \tag{17}$$
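The power and sample size formulas (16) and (17) can be sketched in a few lines. This is a minimal illustration rather than a reference implementation: the function names are hypothetical, $Z_{\eta/2}$ is assumed to be the upper $\eta/2$ critical value of the standard normal distribution, $Z_{1-\delta}$ is assumed to be its $(1-\delta)$ quantile, and the opposite rejection tail is ignored.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # cdf plays the role of Phi; inv_cdf gives quantiles


def wald_power(b, K, nu_R, eta=0.05):
    """Approximate power from (16): 1 - Phi(Z_{eta/2} - |b| * sqrt(K) / sqrt(nu_R)).

    Assumption: Z_{eta/2} is the upper eta/2 critical value (1.96 for eta = 0.05).
    abs(b) is used so that negative effects are handled symmetrically.
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - eta / 2)
    return 1 - _STD_NORMAL.cdf(z_crit - abs(b) * sqrt(K) / sqrt(nu_R))


def wald_sample_size(b, nu_R, eta=0.05, power=0.80):
    """Number of clusters K from (17), rounded up to the next integer.

    Assumption: Z_{1-delta} is the (1 - delta) quantile of N(0, 1), which is
    negative whenever the target power delta exceeds 0.5.
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - eta / 2)
    z_power = _STD_NORMAL.inv_cdf(1 - power)
    return ceil(nu_R * (z_crit - z_power) ** 2 / b ** 2)
```

For instance, `wald_sample_size(b=0.5, nu_R=4.0)` gives the number of clusters needed for 80% power at a hypothetical effect size of 0.5, and `wald_power` can be used to check the power achieved at that $K$.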

For correlated continuous data, the calculation is straightforward using (16); however, for correlated binary data in particular, more work is needed [60], and Pan provided explicit formulas for $\nu_R$ under various situations as follows [61]:

$$\nu_R = \Omega\left[\frac{1}{\pi p_0 (1 - p_0)} + \frac{1}{(1 - \pi)\, p_1 (1 - p_1)}\right], \tag{18}$$

where $\Omega = K\left(\sum_{i=1}^{K} 1_{n_i}' R_i^{-1} V_i R_i^{-1} 1_{n_i}\right) / \left(\sum_{i=1}^{K} 1_{n_i}' R_i^{-1} 1_{n_i}\right)^2$, with $R_i$ the “working” and $V_i$ the true correlation matrix of cluster $i$, $1_{n_i}$ a vector of $n_i$ ones, $\pi$ the proportion of subjects assigned to the control group, and $p_0$ and $p_1$ the means for the control and case groups [61]. The detailed calculations of $\nu_R$, through $\Omega$, under several important special cases are given by

$$
\begin{aligned}
&\text{If } R_i = V_i = \text{CS}: && \Omega = \frac{K}{\sum_{i=1}^{K} n_i / \left(1 + (n_i - 1)\alpha\right)}, \\
&\text{If } R_i = I,\ V_i = \text{CS}: && \Omega = \frac{K \sum_{i=1}^{K} n_i \left[1 + (n_i - 1)\alpha\right]}{\left(\sum_{i=1}^{K} n_i\right)^2}, \\
&\text{If } R_i = V_i = \text{AR}(1): && \Omega = \frac{K (1 + \alpha)}{\sum_{i=1}^{K} \left[n_i - (n_i - 2)\alpha\right]}, \\
&\text{If } R_i = I,\ V_i = \text{AR}(1): && \Omega = \frac{K \sum_{i=1}^{K} \left[n_i + 2(n_i - 1)\alpha + 2(n_i - 2)\alpha^2 + \cdots + 2\alpha^{n_i - 1}\right]}{\left(\sum_{i=1}^{K} n_i\right)^2}.
\end{aligned}
\tag{19}
$$
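The four special cases in (19), together with (18), translate directly into code. The sketch below is illustrative only: the function names are hypothetical, `sizes` is assumed to be the list of cluster sizes $n_1, \ldots, n_K$, and `alpha` is the correlation parameter of the CS or AR(1) structure.

```python
def omega_cs_matched(sizes, alpha):
    """R_i = V_i = CS: K / sum_i [ n_i / (1 + (n_i - 1) * alpha) ]."""
    K = len(sizes)
    return K / sum(n / (1 + (n - 1) * alpha) for n in sizes)


def omega_ind_working_cs_true(sizes, alpha):
    """R_i = I, V_i = CS: K * sum_i n_i * [1 + (n_i - 1) * alpha] / (sum_i n_i)^2."""
    K = len(sizes)
    return K * sum(n * (1 + (n - 1) * alpha) for n in sizes) / sum(sizes) ** 2


def omega_ar1_matched(sizes, alpha):
    """R_i = V_i = AR(1): K * (1 + alpha) / sum_i [n_i - (n_i - 2) * alpha]."""
    K = len(sizes)
    return K * (1 + alpha) / sum(n - (n - 2) * alpha for n in sizes)


def omega_ind_working_ar1_true(sizes, alpha):
    """R_i = I, V_i = AR(1): the bracket sums all entries of an n_i x n_i AR(1) matrix."""
    K = len(sizes)
    total = sum(
        n + 2 * sum((n - k) * alpha ** k for k in range(1, n)) for n in sizes
    )
    return K * total / sum(sizes) ** 2


def nu_R(omega, pi, p0, p1):
    """Robust variance term from (18) for a binary outcome in two groups."""
    return omega * (1 / (pi * p0 * (1 - p0)) + 1 / ((1 - pi) * p1 * (1 - p1)))
```

With equal cluster sizes, the two CS cases coincide, which gives a quick sanity check on any implementation.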

These formulas can be directly used in practice and cover most situations encountered in clinical trials [61]. Note that when $R_i = V_i = \text{CS}$, Liu and Liang (1997) provided a different sample size formula from (17) with $n_i = n$, which is

$$K = \frac{\left(z_{1-\eta/2} + z_{1-\delta}\right)^2 \left[(1 - \pi)\, p_0 (1 - p_0) + \pi p_1 (1 - p_1)\right] \left[1 + (n - 1)\alpha\right]}{n \pi (1 - \pi) (p_1 - p_0)^2}. \tag{20}$$
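A minimal sketch of (20) follows. The function name and arguments are hypothetical. One point is an explicit assumption: the second $z$ term is taken to be the standard normal quantile at the target power (0.84 for 80% power), which is the usual convention for score-test sample size formulas; the subscript notation in (20) leaves this open to interpretation.

```python
from math import ceil
from statistics import NormalDist


def liu_liang_clusters(p0, p1, pi, n, alpha, eta=0.05, power=0.80):
    """Number of clusters K from (20) for common cluster size n and CS correlation.

    Assumption: the power-related z term is the standard normal quantile
    at `power`; `pi` is the proportion assigned to the control group.
    """
    std_normal = NormalDist()
    z_sum = std_normal.inv_cdf(1 - eta / 2) + std_normal.inv_cdf(power)
    variance_term = (1 - pi) * p0 * (1 - p0) + pi * p1 * (1 - p1)
    design_effect = 1 + (n - 1) * alpha  # inflation due to within-cluster correlation
    K = z_sum ** 2 * variance_term * design_effect / (
        n * pi * (1 - pi) * (p1 - p0) ** 2
    )
    return ceil(K)
```

Setting `n=1` removes the design effect and recovers a sample size for independent binary observations, which is a convenient check.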

Be aware that the difference is due to the test methods: the Wald $Z$-test used by Pan [61] and the score test applied by Liu and Liang [58]. Note that in some cases the score test may be preferred [62]. Although some other works exist for sample size/power calculation, they focused on alternative approaches rather than GEE [63, 64]; thus we do not discuss them here. For correlated Poisson data, the sample size/power calculation is more challenging due to the occurrence of overdispersion or sparsity, where the negative binomial regression model may be explored [62, 65–67].

On the other hand, there are several concerns [68]. First, we here focus on the calculation of the sample size $K$, assuming $n_i$ is known; however, based on the power formula (16), $\nu_R$ depends on $n_i$, and thus increasing $n_i$ can also assist in power improvement but turns out to be less effective than increasing $K$ [69]. Second, the sample size/power calculation may be restricted by the limited number of clusters, for example, in clustered randomized trials (CRTs), where the number of clusters could be relatively small. For example, in a literature review of published CRTs, the median number of clusters was 21 [70]. In such situations, a power formula adjusted for small samples in GEE is necessary, which has drawn attention from researchers recently [71–75].
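To see why enlarging clusters helps less than adding clusters, consider a small worked case. It assumes equal cluster sizes $n_i = n$ and the matched CS case of (19); both simplifications are illustrative assumptions.

```latex
\Omega = \frac{K}{K \cdot n / \left(1 + (n - 1)\alpha\right)}
       = \frac{1 + (n - 1)\alpha}{n}
       = \alpha + \frac{1 - \alpha}{n}
       \;\xrightarrow[\; n \to \infty \;]{}\; \alpha .
```

Since $\nu_R$ in (18) is proportional to $\Omega$, increasing $n$ can reduce $\nu_R$ only down to a floor determined by $\alpha$, whereas the factor $\sqrt{K}$ in (16) grows without bound. With $\alpha = 0.3$, for example, doubling $n$ from 4 to 8 lowers $\Omega$ from 0.475 to 0.3875 (about 18%), while doubling $K$ at fixed $n$ doubles the ratio $K/\nu_R$ that drives the power.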

2.4. Clustered Data with Informative Cluster Size. The application of GEE in clustered data with informative cluster size is another special topic [76]. Taking a periodontal disease study as an example, the number of teeth for each patient may be related to the overall oral health of the individual; in other words, the worse the oral health is, the fewer teeth there are, and thus the cluster size $n_i$ may influence the distribution of the oral outcomes, which is called informative cluster size [45, 77]. Such issues commonly occur in biomedical studies (e.g., genetic disease studies), and rigorous statistical methods are needed for valid statistical inference [78]. Note that if the maximum cluster size exists and is known, then this can be treated as an (informative) missing data problem, which can be solved via the weighted estimating equations proposed by Robins et al. [79]; however, if the maximum is unknown or not accessible, the method of within-cluster resampling (WCR) proposed by Hoffman et al. could be applied [80]. The basic idea is that, for each of $L$ resampled replicate data sets based on a Monte Carlo method ($L$ is a large number, e.g., 10,000), one observation is randomly extracted from each cluster, and $\hat{\beta}_l$ with variance estimator $\hat{\Sigma}_l$ can be obtained from a regular score equation, denoted by $S_l(\beta)$, for independent observations (i.e., linear regression for continuous data, logistic regression for binary data, and Poisson regression for count data), $l = 1, 2, \ldots, L$. The details are shown as follows:

$$
\begin{aligned}
S_l(\beta) &= \sum_{i=1}^{K} S_{ij}(\beta_l)\, I[j \in r_l] = 0, \\
\hat{\beta}_{\text{wcr}} &= \frac{1}{L} \sum_{l=1}^{L} \hat{\beta}_l, \\
\hat{\Sigma}_{\text{wcr}} &= \frac{1}{L} \sum_{l=1}^{L} \hat{\Sigma}_l - \frac{1}{L} \sum_{l=1}^{L} \left(\hat{\beta}_l - \hat{\beta}_{\text{wcr}}\right) \left(\hat{\beta}_l - \hat{\beta}_{\text{wcr}}\right)^T,
\end{aligned}
\tag{21}
$$

where $S_{ij}(\beta_l) = X_{ij}' V_{ij}^{-1} (Y_{ij} - X_{ij}' \beta_l)$, with $r_l$ as the set of data indices selected from the $i$th cluster in the $l$th replicate data set. Alternatively, the approach considered by Williamson et al., which adopts weighted estimating equations, performs asymptotically equivalently to WCR and also avoids intensive computing; it is referred to as the cluster-weighted GEE (CWGEE) [81]. The estimating equation is

$$S(\beta) = \sum_{i=1}^{K} \frac{1}{n_i} \sum_{j=1}^{n_i} S_{ij}(\beta) = 0, \tag{22}$$

where $S_{ij}$ is defined the same as above; the difference is that the subscript $j$ ranges from 1 to $n_i$ and is not restricted by the index set $r_l$, and each cluster is weighted by the inverse of its size. Note that as $L \to \infty$, $(1/L)\sum_{l=1}^{L} S_l(\beta)$ converges to its expected estimating function and is asymptotically equivalent to $S(\beta)$.
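The relationship between WCR and CWGEE can be illustrated with the simplest marginal model, an intercept-only linear model, where the score is $S_{ij}(\beta) = Y_{ij} - \beta$. The sketch below is a toy example under stated assumptions: the data generator, function names, cluster sizes, and the choice $L = 2000$ are all hypothetical, and cluster size is made informative by letting clusters with a low latent level be small.

```python
import random
from statistics import mean, pvariance, variance


def simulate_informative_clusters(K=200, seed=1):
    """Hypothetical data: clusters with a low latent level have 2 observations,
    the others have 8, so cluster size carries information about the outcome."""
    rng = random.Random(seed)
    clusters = []
    for _ in range(K):
        level = rng.gauss(0.0, 1.0)       # cluster-level latent effect
        size = 2 if level < 0 else 8      # size depends on the latent level
        clusters.append([level + rng.gauss(0.0, 1.0) for _ in range(size)])
    return clusters


def naive_mean(clusters):
    """Independence GEE for the mean: every observation gets equal weight."""
    return mean(y for cluster in clusters for y in cluster)


def cwgee_mean(clusters):
    """CWGEE: solves sum_i (1/n_i) sum_j (y_ij - beta) = 0, the mean of cluster means."""
    return mean(mean(cluster) for cluster in clusters)


def wcr_mean(clusters, L=2000, seed=0):
    """WCR as in (21) for the intercept-only model; returns (beta_wcr, sigma_wcr)."""
    rng = random.Random(seed)
    K = len(clusters)
    betas, sigmas = [], []
    for _ in range(L):
        sample = [rng.choice(cluster) for cluster in clusters]  # one obs per cluster
        betas.append(mean(sample))
        sigmas.append(variance(sample) / K)  # variance of a mean of K independent obs
    beta_wcr = mean(betas)
    # Average within-replicate variance minus the between-replicate variance.
    sigma_wcr = mean(sigmas) - pvariance(betas, mu=beta_wcr)
    return beta_wcr, sigma_wcr
```

In this toy setting the WCR and CWGEE estimates nearly coincide, while the pooled mean is pulled toward the large clusters; CWGEE reaches the same target without any resampling.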

This method was also explored or extended for correlated data with nonignorable cluster size by Benhin et al. and Cong et al. [82, 83]. Furthermore, a more efficient method called modified WCR (MWCR) was proposed by Chiang and Lee, in which a number of subjects equal to the minimum cluster size ($n_i > 1$) is randomly sampled from each cluster, and GEE models for balanced data are then applied for estimation, incorporating the intracluster correlation; thus MWCR might be a more efficient way for analysis [84]. However, MWCR is not always satisfactory, and Pavlou et al. identified sufficient conditions on the data structure and the choice of “working” correlation structure under which the estimates from MWCR are consistent [85]. In addition, Wang et al. extended the above work to clustered longitudinal data, which are collected as repeated measures on subjects arising in clusters with potentially informative cluster size [45]. Examples include health studies of subjects from multiple hospitals or families. Comparing GEE, WCR, and CWGEE through extensive simulation studies and a real data application to a periodontal disease study, the authors recommended CWGEE because it performs comparably to WCR, with well-preserved coverage rates and desirable power properties, while avoiding intensive Monte Carlo computation; GEE models, in contrast, led to invalid inference due to biased parameter estimates [45]. In addition, for observed-cluster inference, Seaman et al. discussed methods including weighted and doubly weighted GEE and shared random-effects models for comparison and showed the conditions under which the shared random-effects model described members with observed outcomes $Y$ [86]. More work can be found in [87–90], among others.

3. Simulation

In this section, we focus on “working” correlation structure selection and compare the performances of the existing criteria through simulation studies. Two types of outcomes are considered: continuous and count responses. The models for data generation are as follows:

$$
\begin{aligned}
u_{ij} &= \beta_0 + \beta_1 \times x_{ij}, \\
\log(u_{ij}) &= \beta_0 + \beta_1 \times x_{ij},
\end{aligned}
\tag{23}
$$

where $\beta_0 = \beta_1 = 0.5$, $i = 1, 2, \ldots, I$ with $I = 50, 100, 200, 500$, and $j = 1, 2, \ldots, J$ with $J = 4, 8$. The covariates $x_{ij}$ are i.i.d. from a standard uniform distribution Unif(0, 1). For each scenario, we generate the data based on the underlying true correlation structures as independent (IND), exchangeable (EXCH), and autoregressive (AR-1) with $\alpha = 0.3, 0.7$. For each scenario, 1000 Monte Carlo data sets are generated, and the estimates of the regression parameters and the within-subject correlation matrix, together with seven model selection criteria measures, are calculated using the “working” correlation structures IND, EXCH, and AR-1. The partial simulation results are provided in Tables 2, 3, and 4, where the results of CIC are not shown because they are the same as those of QIC.
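The data-generating step for the continuous outcome in (23) can be sketched as follows. This is an illustration under assumptions not stated in the text: the errors are taken to be normal with unit variance, the function names are hypothetical, and only the identity-link model is covered (generating correlated counts for the log-link model needs additional machinery that is not shown here).

```python
import math
import random


def correlated_normal_errors(J, alpha, structure, rng):
    """Unit-variance normal errors for one subject under IND, EXCH, or AR-1."""
    z = [rng.gauss(0.0, 1.0) for _ in range(J)]
    if structure == "IND":
        return z
    if structure == "EXCH":
        shared = rng.gauss(0.0, 1.0)  # subject-level term induces correlation alpha
        return [math.sqrt(alpha) * shared + math.sqrt(1 - alpha) * e for e in z]
    if structure == "AR-1":
        errors = [z[0]]
        for e in z[1:]:
            # Stationary AR(1) recursion: corr(e_j, e_k) = alpha ** |j - k|.
            errors.append(alpha * errors[-1] + math.sqrt(1 - alpha ** 2) * e)
        return errors
    raise ValueError(f"unknown structure: {structure}")


def generate_continuous_data(I, J, alpha, structure, beta0=0.5, beta1=0.5, seed=0):
    """Return rows (subject, visit, x_ij, y_ij) with mean beta0 + beta1 * x_ij."""
    rng = random.Random(seed)
    rows = []
    for i in range(I):
        errors = correlated_normal_errors(J, alpha, structure, rng)
        for j in range(J):
            x = rng.random()  # Unif(0, 1) covariate
            rows.append((i, j, x, beta0 + beta1 * x + errors[j]))
    return rows
```

A call such as `generate_continuous_data(I=50, J=4, alpha=0.3, structure="EXCH")` produces one Monte Carlo data set for a single scenario; repeating it with different seeds gives the replicates.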

Based on the results, RJ does not perform well for the scenarios with either continuous or binary outcomes, while RJ1 and RJ2 have comparable performances and can select the true underlying correlation structure in most scenarios, with better performance under large sample sizes. QIC is not satisfactory when the true correlation structure is independent but performs well for the scenarios with the true correlation structure as exchangeable or AR-1. On the other hand, SC and GP do not perform well for longitudinal data with normal responses, but the performance is slightly improved for longitudinal data with binary outcomes. The results may vary due to a variety of factors, including the types of “working” correlation structure considered for model fitting, the sample size, and/or the magnitude of the correlation coefficient. For future work, there is a need to find a robust criterion for “working” correlation structure selection in GEE, and more advanced approaches are currently emerging.

Table 2: Simulation for longitudinal data with independent correlation matrix. Entries are selection frequencies of the “working” correlation structure.

                          Normal               Binary
n   K     Criterion    IND  EXCH  AR-1      IND  EXCH  AR-1
4   50    QIC          198   393   409      202   374   424
4   50    RJ           327   423   250      312   421   267
4   50    RJ1          388   322   290      399   316   285
4   50    RJ2          384   327   289      388   320   292
4   50    SC           488     1   512      351   310   339
4   50    GP           547     0   453      368   306   326
4   100   QIC          209   377   414      185   407   408
4   100   RJ           338   415   247      340   410   250
4   100   RJ1          389   349   262      381   358   261
4   100   RJ2          389   353   258      372   357   271
4   100   SC           482     1   517      352   346   302
4   100   GP           520     0   480      360   348   292
8   50    QIC          200   411   389      203   363   434
8   50    RJ           282   497   221      292   476   232
8   50    RJ1          402   354   244      386   340   274
8   50    RJ2          402   357   241      373   347   280
8   50    SC           465     1   535      351   325   324
8   50    GP           558     0   442      382   311   307
8   100   QIC          188   393   419      201   398   401
8   100   RJ           321   442   237      287   466   247
8   100   RJ1          347   385   268      385   367   248
8   100   RJ2          347   382   271      377   369   254
8   100   SC           492     0   508      355   343   302
8   100   GP           541     0   459      370   341   289

Table 3: Simulation for longitudinal data with exchangeable correlation matrix with $\alpha = 0.3$. Entries are selection frequencies of the “working” correlation structure.

                          Normal               Binary
n   K     Criterion    IND  EXCH  AR-1      IND  EXCH  AR-1
4   50    QIC          106   699   195       53   758   189
4   50    RJ           419   139   442      869     5   126
4   50    RJ1            0   963    37       12   898    90
4   50    RJ2            0   959    41       22   876   102
4   50    SC             0   593   407      282   650    68
4   50    GP             1   593   406      412   524    64
4   100   QIC           31   879    90        7   867   126
4   100   RJ           350    88   562      911     2    87
4   100   RJ1            0   995     5        2   946    52
4   100   RJ2            0   996     4       10   933    57
4   100   SC             0   598   402      339   635    26
4   100   GP             0   501   499      445   531    24
8   50    QIC           80   828    92       50   876    74
8   50    RJ            10   395   595      813     6   181
8   50    RJ1            0  1000     0        0   987    13
8   50    RJ2            0  1000     0        0   966    25
8   50    SC             0   488   513      302   696     2
8   50    GP             0   511   489      497   500     3
8   100   QIC           17   953    30        8   973    19
8   100   RJ             0   408   592      861     0   139
8   100   RJ1            0  1000     0        0   997     3
8   100   RJ2            0  1000     0        0   993     7
8   100   SC             0   470   530      328   672     0
8   100   GP             0   526   474      486   514     0

Table 4: Simulation for longitudinal data with AR-1 correlation matrix with $\alpha = 0.3$. Entries are selection frequencies of the “working” correlation structure.

                          Normal               Binary
n   K     Criterion    IND  EXCH  AR-1      IND  EXCH  AR-1
4   50    QIC           91   166   743       66   170   764
4   50    RJ           712   142   146      925    12    63
4   50    RJ1            0   478   522        7   505   488
4   50    RJ2            0   466   534       20   499   481
4   50    SC             0   480   520      220   350   430
4   50    GP             0   543   457      303   332   365
4   100   QIC           25   116   859        7   122   871
4   100   RJ           770    95   135      972     4    24
4   100   RJ1            0   475   525        1   569   430
4   100   RJ2            0   481   519        5   571   424
4   100   SC             0   491   509      237   371   392
4   100   GP             0   540   460      290   353   357
8   50    QIC           50    88   862       44    77   879
8   50    RJ           646   148   206      934     5    61
8   50    RJ1            0   445   555        0   535   465
8   50    RJ2            0   443   557       10   535   455
8   50    SC             0   467   533      168   397   435
8   50    GP             0   549   451      269   406   325
8   100   QIC           16    39   945        7    33   960
8   100   RJ           648   154   198      972     0    28
8   100   RJ1            0   455   545        1   603   396
8   100   RJ2            0   455   545        1   609   390
8   100   SC             0   480   520      177   458   365
8   100   GP             0   532   468      247   457   296

4. Future Direction and Discussion

In this paper, we provide a review of several specific topics related to GEE for longitudinal/correlated data, such as model selection with emphasis on the selection of “working” correlation structure, sample size and power calculation, and clustered data analysis with informative cluster size. The simulation studies are conducted to provide numerical comparisons among five types of model selection criteria [91, 92]. Novel methodologies are still needed and are being developed due to the increasing usage and potential theoretical constraints of GEE, as well as new challenges emerging from practical applications in clinical trials or biomedical studies.

In addition, current research of interest related to GEE also includes a robust and optimal model selection criterion for GEE under missing at random (MAR) or missing not at random (MNAR) [93, 94]; sample size/power calculation for correlated sparse or overdispersed count data or longitudinal data with small samples [57–60]; GEE with improved performance in situations with informative cluster size and/or MAR and/or small sample size [95–98]; and GEE for high-dimensional longitudinal data [99]. Although GEE has attractive features, flexible application, and easy implementation in software, it should be applied in practice with caution, depending on the context of the study design or data structure and the goals of the research.

Conflict of Interests

The author declares that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The author was supported by a grant from the Penn State CTSI. The project was supported by the National Center for Research Resources and the National Center for Advancing Translational Sciences, National Institutes of Health, through Grant 5 UL1 RR0330184-04. The content is solely the responsibility of the author and does not represent the views of the NIH.

References

[1] Z. Feng, P. Diehr, A. Peterson, and D. McLerran, “Selected statistical issues in group randomized trials,” Annual Review of Public Health, vol. 22, pp. 167–187, 2001.

[2] G. Fitzmaurice, N. M. Laird, and J. H. Ware, Applied Longitudinal Data, John Wiley & Sons, 2004.

[3] J. W. Hardin and J. M. Hilbe, Generalized Estimating Equations, Chapman and Hall/CRC Press, Boca Raton, Fla, USA, 2003.

[4] R. F. Potthoff and S. N. Roy, “A generalized multivariate analysis of variance model useful especially for growth curve problems,” Biometrika, vol. 51, pp. 313–326, 1964.

[5] L. M. Friedman, C. D. Furberg, and D. L. DeMets, Fundamentals of Clinical Trials, Springer, New York, NY, USA, 3rd edition, 1989.

[6] K. Y. Liang and S. L. Zeger, “Longitudinal data analysis using generalized linear models,” Biometrika, vol. 73, pp. 13–22, 1986.

[7] M. Crowder, “On the use of a working correlation matrix in using generalised linear models for repeated measures,” Biometrika, vol. 82, no. 2, pp. 407–410, 1995.

[8] R. W. Wedderburn, “Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method,” Biometrika, vol. 61, pp. 439–447, 1974.

[9] P. Diggle, P. Heagerty, K. Y. Liang, and S. L. Zeger, Analysis of Longitudinal Data, Oxford University Press, Oxford, UK, 2002.

[10] G. Fitzmaurice, M. Davidian, G. Verbeke, and G. Molenberghs, Longitudinal Data Analysis, Chapman & Hall/CRC Press, 2008.

[11] D. Hedeker and R. D. Gibbons, Analysis of Longitudinal Data, John Wiley & Sons, 2006.

[12] P. McCullagh and J. A. Nelder, Generalized Linear Models, Chapman & Hall, London, UK, 1989.

[13] N. R. Chaganty and H. Joe, “Range of correlation matrices for dependent Bernoulli random variables,” Biometrika, vol. 93, no. 1, pp. 197–206, 2006.

[14] R. T. Sabo and N. R. Chaganty, “What can go wrong when ignoring correlation bounds in the use of generalized estimating equations,” Statistics in Medicine, vol. 29, no. 24, pp. 2501–2507, 2010.

[15] B. C. Sutradhar and K. Das, “On the efficiency of regression estimators in generalised linear models for longitudinal data,” Biometrika, vol. 86, no. 2, pp. 459–465, 1999.

[16] Y.-G. Wang and V. Carey, “Working correlation structure misspecification, estimation and covariate design: implications for generalised estimating equations performance,” Biometrika, vol. 90, no. 1, pp. 29–41, 2003.

[17] S. R. Lipsitz, G. Molenberghs, G. M. Fitzmaurice, and J. Ibrahim, “GEE with Gaussian estimation of the correlations when data are incomplete,” Biometrics, vol. 56, no. 2, pp. 528–536, 2000.

[18] Y.-G. Wang and V. J. Carey, “Unbiased estimating equations from working correlation models for irregularly timed repeated measures,” Journal of the American Statistical Association, vol. 99, no. 467, pp. 845–853, 2004.

[19] A. Qu and B. G. Lindsay, “Building adaptive estimating equations when inverse of covariance estimation is difficult,” Journal of the Royal Statistical Society B: Statistical Methodology, vol. 65, no. 1, pp. 127–142, 2003.

[20] S. R. Lipsitz and G. M. Fitzmaurice, “Estimating equations for measures of association between repeated binary responses,” Biometrics, vol. 52, no. 3, pp. 903–912, 1996.

[21] Y. Lee and J. A. Nelder, “Conditional and marginal models: another view,” Statistical Science, vol. 19, no. 2, pp. 219–238, 2004.

[22] Y. Lee and J. A. Nelder, “Likelihood inference for models with unobservables: another view,” Statistical Science, vol. 24, no. 3, pp. 255–269, 2009.

[23] A. Qu, B. G. Lindsay, and B. Li, “Improving generalised estimating equations using quadratic inference functions,” Biometrika, vol. 87, no. 4, pp. 823–836, 2000.

[24] G. Kauermann and R. J. Carroll, “A note on the efficiency of sandwich covariance matrix estimation,” Journal of the American Statistical Association, vol. 96, no. 456, pp. 1387–1396, 2001.

[25] Y. G. Wang and L. Y. Hin, “Modeling strategies in longitudinal data analysis: covariate, variance function and correlation structure selection,” Computational Statistics and Data Analysis, vol. 54, no. 12, pp. 3359–3370, 2010.

[26] W. Pan, “Goodness-of-fit tests for GEE with correlated binary data,” Scandinavian Journal of Statistics, vol. 29, no. 1, pp. 101–110, 2002.

[27] A. M. Wood, I. R. White, and P. Royston, “How should variable selection be performed with multiply imputed data?” Statistics in Medicine, vol. 27, no. 17, pp. 3227–3246, 2008.

[28] M. D. Begg and M. K. Parides, “Separation of individual-level and cluster-level covariate effects in regression analysis of correlated data,” Statistics in Medicine, vol. 22, no. 16, pp. 2591–2602, 2003.

[29] L. Y. Hin, V. J. Carey, and Y. G. Wang, “Criteria for working-correlation-structure selection in GEE: assessment via simulation,” The American Statistician, vol. 61, no. 4, pp. 360–364, 2007.

[30] J. X. Pan and G. Mackenzie, “On modelling mean-covariance structures in longitudinal studies,” Biometrika, vol. 90, no. 1, pp. 239–244, 2003.

[31] M. Davidian and R. J. Carroll, “Variance function estimation,” Journal of the American Statistical Association, vol. 82, no. 400, pp. 1079–1091, 1987.

[32] M. Pourahmadi, “Joint mean-covariance models with applications to longitudinal data: unconstrained parameterisation,” Biometrika, vol. 86, no. 3, pp. 677–690, 1999.

[33] S. Konishi and G. Kitagawa, “Generalised information criteria in model selection,” Biometrika, vol. 83, no. 4, pp. 875–890, 1996.

[34] B. Zhang, “Summarizing the goodness of fit of generalized linear models for longitudinal data,” Statistics in Medicine, vol. 19, pp. 1265–1275, 2000.

[35] A. Rotnitzky and N. P. Jewell, “Hypothesis testing of regression parameters in semiparametric generalized linear models for cluster correlated data,” Biometrika, vol. 77, no. 3, pp. 485–497, 1990.

[36] J. Shults and N. R. Chaganty, “Analysis of serially correlated data using quasi-least squares,” Biometrics, vol. 54, no. 4, pp. 1622–1630, 1998.

[37] V. J. Carey and Y.-G. Wang, “Working covariance model selection for generalized estimating equations,” Statistics in Medicine, vol. 30, no. 26, pp. 3117–3124, 2011.

[38] W. Pan, “Akaike’s information criterion in generalized estimating equations,” Biometrics, vol. 57, no. 1, pp. 120–125, 2001.

[39] H. Akaike, “Information theory and an extension of the maximum likelihood principle,” in Proceedings of the 2nd International Symposium on Information Theory, vol. 15, pp. 267–281, 1973.

[40] J. A. Nelder and Y. Lee, “Likelihood, quasi-likelihood and pseudolikelihood: some comparisons,” Journal of the Royal Statistical Society B, vol. 54, no. 1, pp. 273–284, 1992.

[41] J. Cui, “QIC program and model selection in GEE analyses,” The Stata Journal, vol. 7, no. 2, pp. 209–220, 2007.

[42] J. Cui and G. Qian, “Selection of working correlation structure and best model in GEE analyses of longitudinal data,” Communications in Statistics: Simulation and Computation, vol. 36, no. 4–6, pp. 987–996, 2007.

[43] L. Y. Hin and Y. G. Wang, “Working-correlation-structure identification in generalized estimating equations,” Statistics in Medicine, vol. 28, no. 4, pp. 642–658, 2009.

[44] J. A. Nelder and D. Pregibon, “An extended quasi-likelihood function,” Biometrika, vol. 74, no. 2, pp. 221–232, 1987.

[45] M. Wang, M. Kong, and S. Datta, “Inference for marginal linear models for clustered longitudinal data with potentially informative cluster sizes,” Statistical Methods in Medical Research, vol. 20, no. 4, pp. 347–367, 2011.

[46] E. Cantoni, J. M. Flemming, and E. Ronchetti, “Variable selection for marginal longitudinal generalized linear models,” Biometrics: Journal of the International Biometric Society, vol. 61, no. 2, pp. 507–514, 2005.

[47] Y.-G. Wang and X. Lin, “Effects of variance-function misspecification in analysis of longitudinal data,” Biometrics, vol. 61, no. 2, pp. 413–421, 2005.

[48] N. R. Chaganty and H. Joe, “Efficiency of generalized estimating equations for binary responses,” Journal of the Royal Statistical Society Series B: Statistical Methodology, vol. 66, no. 4, pp. 851–860, 2004.

[49] M. Gosho, C. Hamada, and I. Yoshimura, “Criterion for the selection of a working correlation structure in the generalized estimating equation approach for longitudinal balanced data,” Communications in Statistics, vol. 40, no. 21, pp. 3839–3856, 2011.

[50] M. Gosho, C. Hamada, and I. Yoshimura, “Selection of working correlation structure in weighted generalized estimating equation method for incomplete longitudinal data,” Communications in Statistics, vol. 43, no. 1, pp. 62–81, 2014.

[51] M. J. Jang, Working correlation selection in generalized estimating equations [Dissertation], University of Iowa, 2011.

[52] J Chen and N A Lazar ldquoSelection of working correlationstructure in generalized estimating equations via empiricallikelihoodrdquo Journal of Computational and Graphical Statisticsvol 21 no 1 pp 18ndash41 2012

[53] P M Westgate ldquoA bias-corrected covariance estimator forimproved inference when using an unstructured correlationwith quadratic inference functionsrdquo Statistics and ProbabilityLetters vol 83 no 6 pp 1553ndash1558 2013

[54] P M Westgate ldquoCriterion for the simultaneous selection of aworking correlation structure and either generalized estimatingequations or the quadratic inference function approachrdquo Bio-metrical Journal vol 56 no 3 pp 461ndash476 2014

[55] P M Westgate ldquoImproving the correlation structure selectionapproach for generalized estimating equations and balancedlongitudinal datardquo Statistics in Medicine vol 33 no 13 pp2222ndash2237 2014

[56] J Ye ldquoOn measuring and correcting the effects of data miningand model selectionrdquo Journal of the American Statistical Associ-ation vol 93 no 441 pp 120ndash131 1998

[57] J J Shuster Practical Handbook of Sample Size Guidelines forClinical Trials CRC Press Boca Raton Fla USA 1993

[58] G Liu and K-Y Liang ldquoSample size calculations for studieswith correlated observationsrdquo Biometrics vol 53 no 3 pp 937ndash947 1997

[59] W J Shih ldquoSample size and power calculations for periodontaland other studies with clustered samples using the method ofgeneralized estimating equationsrdquo Biometrical Journal vol 39no 8 pp 899ndash908 1997

[60] S R Lipsitz and G M Fitzmaurice ldquoSample size for repeatedmeasures studies with binary responsesrdquo Statistics in Medicinevol 13 no 12 pp 1233ndash1239 1994

[61] W Pan ldquoSample size and power calculations with correlatedbinary datardquoControlled Clinical Trials vol 22 no 3 pp 211ndash2272001

[62] N Breslow ldquoTests of hypotheses in overdispersed Poissonregression and other quasi likelihood modelsrdquo Journal of theAmerican Statistical Association vol 85 pp 565ndash571 1990

[63] E W Lee and N Dubin ldquoEstimation and sample size consider-ations for clustered binary responsesrdquo Statistics inMedicine vol13 no 12 pp 1241ndash1252 1994

[64] D J Sargent J A Sloan and S S Cha ldquoSample size anddesign considerations for phase II clinical trials with correlatedobservationsrdquo Controlled Clinical Trials vol 20 no 3 pp 242ndash252 1999

[65] C S Li ldquoSemiparametric negative binomial regressionmodelsrdquoCommunications in Statistics Simulation and Computation vol39 no 3 pp 475ndash486 2010

[66] WHGreene ldquoAccounting for excess zeros and sample selectionin Poisson and negative binomial regression modelsrdquo TechRep New York University 1994

[67] P Lambert ldquoModeling of repeated series of count data mea-sured at unequally spaced timesrdquo Applied Statistics vol 45 pp31ndash38 1996

[68] M S Pepe andG L Anderson ldquoA cautionary note on in ferencefor marginal regression models with longitudinal data andgeneral correlated response datardquo Communications in StatisticsSeries B vol 23 pp 939ndash951 1994

[69] M Wang and Q Long ldquoModified robust variance estimator forgeneralized estimating equations with improved small-sampleperformancerdquo Statistics in Medicine vol 30 no 11 pp 1278ndash1291 2011

[70] M Taljaard ADMcRae CWeijer et al ldquoInadequate reportingof research ethics review and informed consent in clusterrandomised trials Review of random sample of publishedtrialsrdquo British Medical Journal vol 342 Article ID d2496 2011

[71] L A Mancl and T A DeRouen ldquoA covariance estimator forGEE with improved small-sample propertiesrdquo Biometrics vol57 no 1 pp 126ndash134 2001

[72] M P Fay and B I Graubard ldquoSmall-sample adjustments forWald-type tests using sandwich estimatorsrdquo Biometrics vol 57no 4 pp 1198ndash1206 2001

[73] W Pan ldquoOn the robust variance estimator in generalisedestimating equationsrdquo Biometrika vol 88 no 3 pp 901ndash9062001

[74] W Pan and M M Wall ldquoSmall-sample adjustments in usingthe sandwich variance estimator in generalized estimatingequationsrdquo Statistics in Medicine vol 21 no 10 pp 1429ndash14412002

[75] X Guo W Pan J E Connett P J Hannan and S A FrenchldquoSmall-sample performance of the robust score test and itsmodifications in generalized estimating equationsrdquo Statistics inMedicine vol 24 no 22 pp 3479ndash3495 2005

[76] D M Farewell ldquoMarginal analyses of longitudinal data with aninformative pattern of observationsrdquo Biometrika vol 97 no 1pp 65ndash78 2010

[77] J D Beck T Sharp G G Koch and S Offenbacher ldquoA 5-yearstudy of attachment loss and tooth loss in community-dwellingolder adultsrdquo Journal of Periodontal Research vol 32 no 6 pp516ndash523 1997

[78] S J Arbes Jr H Agustsdottir and G D Slade ldquoEnvironmentaltobacco smoke and periodontal disease in the United StatesrdquoAmerican Journal of Public Health vol 91 no 2 pp 253ndash2572001

[79] J. M. Robins, A. Rotnitzky, and L. P. Zhao, “Analysis of semiparametric regression models for repeated outcomes in the presence of missing data,” Journal of the American Statistical Association, vol. 90, pp. 106–121, 1995.

[80] E. B. Hoffman, P. K. Sen, and C. R. Weinberg, “Within-cluster resampling,” Biometrika, vol. 88, no. 4, pp. 1121–1134, 2001.

[81] J. M. Williamson, S. Datta, and G. A. Satten, “Marginal analyses of clustered data when cluster size is informative,” Biometrics, vol. 59, no. 1, pp. 36–42, 2003.

[82] E. Benhin, J. N. Rao, and A. J. Scott, “Mean estimating equation approach to analysing cluster-correlated data with nonignorable cluster sizes,” Biometrika, vol. 92, no. 2, pp. 435–450, 2005.

[83] X. J. Cong, G. Yin, and Y. Shen, “Marginal analysis of correlated failure time data with informative cluster sizes,” Biometrics, vol. 63, no. 3, pp. 663–672, 2007.

[84] T. C. Chiang and K. Y. Lee, “Efficient estimation methods for informative cluster size data,” Statistica Sinica, vol. 80, pp. 121–123, 2008.

[85] M. Pavlou, S. R. Seaman, and A. J. Copas, “An examination of a method for marginal inference when the cluster size is informative,” Statistica Sinica, vol. 23, no. 2, pp. 791–801, 2013.

[86] S. R. Seaman, M. Pavlou, and A. J. Copas, “Methods for observed-cluster inference when cluster size is informative: a review and clarifications,” Biometrics, vol. 70, no. 2, pp. 449–456, 2014.

[87] Z. Chen, B. Zhang, and P. S. Albert, “A joint modeling approach to data with informative cluster size: robustness to the cluster size model,” Statistics in Medicine, vol. 30, no. 15, pp. 1825–1836, 2011.

[88] Y. Huang and B. Leroux, “Informative cluster sizes for subcluster-level covariates and weighted generalized estimating equations,” Biometrics, vol. 67, no. 3, pp. 843–851, 2011.

[89] B. F. Kurland, L. L. Johnson, B. L. Egleston, and P. H. Diehr, “Longitudinal data with follow-up truncated by death: match the analysis method to research aims,” Statistical Science, vol. 24, no. 2, pp. 211–222, 2009.

[90] J. M. Neuhaus and C. E. McCulloch, “Estimation of covariate effects in generalized linear mixed models with informative cluster sizes,” Biometrika, vol. 98, no. 1, pp. 147–162, 2011.

[91] S. R. Lipsitz, G. M. Fitzmaurice, E. J. Orav, and N. M. Laird, “Performance of generalized estimating equations in practical situations,” Biometrics, vol. 50, no. 1, pp. 270–278, 1994.

[92] D. B. Hall and T. A. Severini, “Extended generalized estimating equations for clustered data,” Journal of the American Statistical Association, vol. 93, no. 444, pp. 1365–1375, 1998.

[93] C.-W. Shen and Y.-H. Chen, “Model selection for generalized estimating equations accommodating dropout missingness,” Biometrics, vol. 68, no. 4, pp. 1046–1054, 2012.

[94] C.-W. Shen and Y.-H. Chen, “Model selection of generalized estimating equations with multiply imputed longitudinal data,” Biometrical Journal, vol. 55, no. 6, pp. 899–911, 2013.

[95] D. B. Rubin, “Inference and missing data,” Biometrika, vol. 63, no. 3, pp. 581–592, 1976.

[96] R. J. Little and D. B. Rubin, Statistical Analysis with Missing Data, Wiley, New York, NY, USA.

[97] P. Diggle, D. Farewell, and R. Henderson, “Analysis of longitudinal data with drop-out: objectives, assumptions and a proposal,” Journal of the Royal Statistical Society C, vol. 56, no. 5, pp. 499–550, 2007.

[98] A. J. Copas and S. R. Seaman, “Bias from the use of generalized estimating equations to analyze incomplete longitudinal binary data,” Journal of Applied Statistics, vol. 37, no. 6, pp. 911–922, 2010.

[99] L. Wang, J. Zhou, and A. Qu, “Penalized generalized estimating equations for high-dimensional longitudinal data analysis,” Biometrics, vol. 68, no. 2, pp. 353–360, 2012.



continuous data, logistic regression for binary data, Poisson regression for count data), $l = 1, 2, \ldots, L$. The details are shown as follows:

$$
\begin{aligned}
S_l(\beta) &= \sum_{i=1}^{K} S_{ij}(\beta_l)\, I[j \in r_l] = 0, \\
\hat{\beta}_{\mathrm{wcr}} &= \frac{1}{L} \sum_{l=1}^{L} \hat{\beta}_l, \\
\hat{\Sigma}_{\mathrm{wcr}} &= \frac{1}{L} \sum_{l=1}^{L} \hat{\Sigma}_l - \frac{1}{L} \sum_{l=1}^{L} \left(\hat{\beta}_l - \hat{\beta}_{\mathrm{wcr}}\right)\left(\hat{\beta}_l - \hat{\beta}_{\mathrm{wcr}}\right)^{T},
\end{aligned}
\tag{21}
$$

where $S_{ij}(\beta_l) = X'_{ij} V_{ij}^{-1} (Y_{ij} - X'_{ij}\beta_l)$, with $r_l$ as the set of data indices selected from the $i$th cluster in the $l$th replicate data. Alternatively, the approach considered by Williamson et al., which adopts weighted estimating equations, performs asymptotically equivalently to WCR while avoiding intensive computing; it is referred to as the cluster-weighted GEE (CWGEE) [81]. The estimating equation is

$$
S(\beta) = \sum_{i=1}^{K} \frac{1}{n_i} \sum_{j=1}^{n_i} S_{ij}(\beta) = 0, \tag{22}
$$

where $S_{ij}$ is defined the same as above, but the subscript $j$ now ranges from 1 to $n_i$ and is not restricted by the index $r_l$. Note that, as $L \to \infty$, $(1/L)\sum_{l=1}^{L} S_l(\beta)$ converges to its expected estimating function, in which each observation of the $i$th cluster enters with its selection probability $1/n_i$, and is therefore asymptotically equivalent to $S(\beta)$.
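The relationship between WCR and CWGEE can be illustrated numerically. The following is only a minimal sketch under simplifying assumptions that are not taken from the paper: a linear marginal model with identity link and $V_{ij} = 1$, so that $S_{ij}(\beta) = X_{ij}(Y_{ij} - X'_{ij}\beta)$; a model-based covariance for each resampled fit; and a hypothetical data generator `make_demo_data` in which cluster size is related to the response. All function names are illustrative.

```python
import numpy as np


def solve_weighted_ls(X, y, w):
    """Solve the estimating equation sum_k w_k * x_k * (y_k - x_k' beta) = 0."""
    XtW = X.T * w                      # p x N, each column scaled by its weight
    return np.linalg.solve(XtW @ X, XtW @ y)


def wcr(X, y, cluster, L=500, seed=0):
    """Within-cluster resampling: one observation per cluster, repeated L times."""
    rng = np.random.default_rng(seed)
    rows_by_cluster = [np.flatnonzero(cluster == c) for c in np.unique(cluster)]
    K, p = len(rows_by_cluster), X.shape[1]
    betas = np.empty((L, p))
    covs = np.empty((L, p, p))
    for l in range(L):
        pick = np.array([rng.choice(rows) for rows in rows_by_cluster])
        Xl, yl = X[pick], y[pick]
        betas[l] = solve_weighted_ls(Xl, yl, np.ones(K))
        resid = yl - Xl @ betas[l]
        # Assumed model-based covariance of beta_l; the K resampled
        # observations come from different clusters, hence are independent.
        covs[l] = (resid @ resid / (K - p)) * np.linalg.inv(Xl.T @ Xl)
    beta_wcr = betas.mean(axis=0)
    dev = betas - beta_wcr
    # Average within-resample covariance minus between-resample covariance.
    cov_wcr = covs.mean(axis=0) - dev.T @ dev / L
    return beta_wcr, cov_wcr


def cwgee(X, y, cluster):
    """Cluster-weighted GEE: weight every observation by 1 / (its cluster size)."""
    _, inverse, counts = np.unique(cluster, return_inverse=True, return_counts=True)
    return solve_weighted_ls(X, y, 1.0 / counts[inverse])


def make_demo_data(K=200, seed=1):
    """Hypothetical data in which larger clusters tend to have larger responses."""
    rng = np.random.default_rng(seed)
    sizes = rng.integers(1, 6, size=K)            # cluster sizes 1..5
    cluster = np.repeat(np.arange(K), sizes)
    x = rng.uniform(0, 1, size=cluster.size)
    b = 0.3 * (sizes - 3)[cluster]                # cluster effect tied to size
    y = 0.5 + 0.5 * x + b + rng.normal(size=cluster.size)
    X = np.column_stack([np.ones_like(x), x])
    return X, y, cluster


X, y, cluster = make_demo_data()
beta_wcr, cov_wcr = wcr(X, y, cluster)
beta_cw = cwgee(X, y, cluster)
print("WCR  :", beta_wcr)
print("CWGEE:", beta_cw)
```

In this sketch the CWGEE estimate comes from a single weighted fit, whereas WCR needs $L$ separate fits; the two point estimates should be close for large $L$, which is the practical argument for CWGEE made in the text.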

This method was also explored or extended for correlated data with nonignorable cluster size by Benhin et al. and Cong et al. [82, 83]. Furthermore, a more efficient method called modified WCR (MWCR) was proposed by Chiang and Lee: with minimum cluster size $n_i > 1$, that number of subjects is randomly sampled from each cluster, and GEE models for balanced data are then applied for estimation, incorporating the intracluster correlation; thus MWCR might be a more efficient way for analysis [84]. But MWCR is not always satisfactory, and Pavlou et al. identified sufficient conditions on the data structure and the choice of “working” correlation structure under which the estimates from MWCR are consistent [85]. In addition, Wang et al. extended the above work to clustered longitudinal data, which are collected as repeated measures on subjects arising in clusters with potentially informative cluster size [45]. Examples include health studies of subjects from multiple hospitals or families. Comparing GEE, WCR, and CWGEE through extensive simulation studies and a real data application to a periodontal disease study, the authors recommended CWGEE because it performed comparably to WCR, with well-preserved coverage rates and desirable power properties, without requiring intensive Monte Carlo computation, whereas GEE models led to invalid inference due to biased parameter estimates [45]. In addition, for observed-cluster inference, Seaman et al. discussed methods including weighted and doubly weighted GEE and shared random-effects models for comparison and showed the conditions under which the shared random-effects model described members with observed outcomes Y [86]. More work can be found in [87–90], among others.

3. Simulation

In this section, we focus on “working” correlation structure selection and compare the performances of the existing criteria through simulation studies. Two types of outcomes are considered: continuous and count responses. The models for data generation are as follows:

$$
\begin{aligned}
u_{ij} &= \beta_0 + \beta_1 \times x_{ij}, \\
\log(u_{ij}) &= \beta_0 + \beta_1 \times x_{ij},
\end{aligned}
\tag{23}
$$

where $\beta_0 = \beta_1 = 0.5$, $i = 1, 2, \ldots, I$ with $I = 50, 100, 200, 500$, and $j = 1, 2, \ldots, J$ with $J = 4, 8$. The covariates $x_{ij}$ are i.i.d. from a standard uniform distribution Unif(0, 1). For each scenario, we generate the data based on the underlying true correlation structures as independent (IND), exchangeable (EXCH), and autoregressive (AR-1) with $\alpha = 0.3, 0.7$. For each scenario, 1000 Monte Carlo data sets are generated, and the estimates of the regression parameters and the within-subject correlation matrix, together with seven model selection criteria measures, are calculated using the “working” correlation structures IND, EXCH, and AR-1. Partial simulation results are provided in Tables 2, 3, and 4; the results of CIC are not shown because they are the same as those of QIC.
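The data-generating step for the continuous outcome can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and unit-variance normal errors are an assumption, since the text does not state the error variance. Generating correlated count responses requires additional machinery and is not attempted here.

```python
import numpy as np


def corr_matrix(structure, J, alpha):
    """True within-subject correlation matrix: IND, EXCH, or AR-1."""
    if structure == "IND":
        return np.eye(J)
    if structure == "EXCH":
        return (1.0 - alpha) * np.eye(J) + alpha * np.ones((J, J))
    if structure == "AR-1":
        idx = np.arange(J)
        return alpha ** np.abs(idx[:, None] - idx[None, :])
    raise ValueError(f"unknown structure: {structure}")


def simulate_continuous(I, J, structure, alpha, beta0=0.5, beta1=0.5, seed=None):
    """Draw one data set with mean beta0 + beta1 * x_ij and correlated errors.

    Assumption: errors are multivariate normal with unit variances.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=(I, J))        # x_ij ~ Unif(0, 1)
    mean = beta0 + beta1 * x
    R = corr_matrix(structure, J, alpha)
    errors = rng.multivariate_normal(np.zeros(J), R, size=I)
    return x, mean + errors


# One replicate for I = 50 subjects, J = 4 repeated measures, AR-1 with alpha = 0.3.
x, y = simulate_continuous(50, 4, "AR-1", 0.3, seed=0)
print(x.shape, y.shape)
```

Repeating the call with different seeds gives the Monte Carlo replicates; each replicate would then be fitted under every candidate “working” structure.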

Based on the results, RJ does not perform well for the scenarios with either continuous or binary outcomes, while RJ1 and RJ2 have comparable performances and can select the true underlying correlation structure in most scenarios, with better performance under large sample size. QIC is not satisfactory when the true correlation structure is independent but performs well when the true correlation structure is exchangeable or AR-1. On the other hand, SC and GP do not perform well for longitudinal data with normal responses, but their performance is slightly improved for longitudinal data with binary outcomes. The results may vary due to a variety of factors, including the types of “working” correlation structure considered for model fitting, the sample size, and/or the magnitude of the correlation coefficient. For future work, a robust criterion for “working” correlation structure selection in GEE is still needed, and more advanced approaches are currently emerging.
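The selection frequencies reported in the tables are tallies, over Monte Carlo replicates, of which “working” structure each criterion picks. A minimal sketch of such a tally is given below; the function name and the `target` parameter are hypothetical, and the rule shown (pick the smallest value, as is done with QIC, or the value closest to a reference value) is an assumption about how a given criterion is applied.

```python
import numpy as np


def selection_frequencies(values, labels=("IND", "EXCH", "AR-1"), target=None):
    """Count how often each working structure is selected.

    values: array of shape (replicates, structures) holding one criterion.
    target: None for smaller-is-better criteria; otherwise the structure
            whose value is closest to `target` is selected.
    """
    values = np.asarray(values, dtype=float)
    score = values if target is None else np.abs(values - target)
    chosen = np.argmin(score, axis=1)             # selected column per replicate
    counts = np.bincount(chosen, minlength=len(labels))
    return dict(zip(labels, counts.tolist()))


# Four hypothetical replicates of a smaller-is-better criterion.
demo = [[10, 8, 9], [5, 7, 6], [3, 2, 4], [9, 4, 1]]
print(selection_frequencies(demo))
```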

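Table 2: Simulation for longitudinal data with independent correlation matrix. Entries are selection frequencies of “working” correlation structure.

| n | K | Criterion | Normal: IND | Normal: EXCH | Normal: AR-1 | Binary: IND | Binary: EXCH | Binary: AR-1 |
|---|---|---|---|---|---|---|---|---|
| 4 | 50 | QIC | 198 | 393 | 409 | 202 | 374 | 424 |
| 4 | 50 | RJ | 327 | 423 | 250 | 312 | 421 | 267 |
| 4 | 50 | RJ1 | 388 | 322 | 290 | 399 | 316 | 285 |
| 4 | 50 | RJ2 | 384 | 327 | 289 | 388 | 320 | 292 |
| 4 | 50 | SC | 488 | 1 | 512 | 351 | 310 | 339 |
| 4 | 50 | GP | 547 | 0 | 453 | 368 | 306 | 326 |
| 4 | 100 | QIC | 209 | 377 | 414 | 185 | 407 | 408 |
| 4 | 100 | RJ | 338 | 415 | 247 | 340 | 410 | 250 |
| 4 | 100 | RJ1 | 389 | 349 | 262 | 381 | 358 | 261 |
| 4 | 100 | RJ2 | 389 | 353 | 258 | 372 | 357 | 271 |
| 4 | 100 | SC | 482 | 1 | 517 | 352 | 346 | 302 |
| 4 | 100 | GP | 520 | 0 | 480 | 360 | 348 | 292 |
| 8 | 50 | QIC | 200 | 411 | 389 | 203 | 363 | 434 |
| 8 | 50 | RJ | 282 | 497 | 221 | 292 | 476 | 232 |
| 8 | 50 | RJ1 | 402 | 354 | 244 | 386 | 340 | 274 |
| 8 | 50 | RJ2 | 402 | 357 | 241 | 373 | 347 | 280 |
| 8 | 50 | SC | 465 | 1 | 535 | 351 | 325 | 324 |
| 8 | 50 | GP | 558 | 0 | 442 | 382 | 311 | 307 |
| 8 | 100 | QIC | 188 | 393 | 419 | 201 | 398 | 401 |
| 8 | 100 | RJ | 321 | 442 | 237 | 287 | 466 | 247 |
| 8 | 100 | RJ1 | 347 | 385 | 268 | 385 | 367 | 248 |
| 8 | 100 | RJ2 | 347 | 382 | 271 | 377 | 369 | 254 |
| 8 | 100 | SC | 492 | 0 | 508 | 355 | 343 | 302 |
| 8 | 100 | GP | 541 | 0 | 459 | 370 | 341 | 289 |

Table 3: Simulation for longitudinal data with exchangeable correlation matrix with $\alpha = 0.3$. Entries are selection frequencies of “working” correlation structure.

| n | K | Criterion | Normal: IND | Normal: EXCH | Normal: AR-1 | Binary: IND | Binary: EXCH | Binary: AR-1 |
|---|---|---|---|---|---|---|---|---|
| 4 | 50 | QIC | 106 | 699 | 195 | 53 | 758 | 189 |
| 4 | 50 | RJ | 419 | 139 | 442 | 869 | 5 | 126 |
| 4 | 50 | RJ1 | 0 | 963 | 37 | 12 | 898 | 90 |
| 4 | 50 | RJ2 | 0 | 959 | 41 | 22 | 876 | 102 |
| 4 | 50 | SC | 0 | 593 | 407 | 282 | 650 | 68 |
| 4 | 50 | GP | 1 | 593 | 406 | 412 | 524 | 64 |
| 4 | 100 | QIC | 31 | 879 | 90 | 7 | 867 | 126 |
| 4 | 100 | RJ | 350 | 88 | 562 | 911 | 2 | 87 |
| 4 | 100 | RJ1 | 0 | 995 | 5 | 2 | 946 | 52 |
| 4 | 100 | RJ2 | 0 | 996 | 4 | 10 | 933 | 57 |
| 4 | 100 | SC | 0 | 598 | 402 | 339 | 635 | 26 |
| 4 | 100 | GP | 0 | 501 | 499 | 445 | 531 | 24 |
| 8 | 50 | QIC | 80 | 828 | 92 | 50 | 876 | 74 |
| 8 | 50 | RJ | 10 | 395 | 595 | 813 | 6 | 181 |
| 8 | 50 | RJ1 | 0 | 1000 | 0 | 0 | 987 | 13 |
| 8 | 50 | RJ2 | 0 | 1000 | 0 | 0 | 966 | 25 |
| 8 | 50 | SC | 0 | 488 | 513 | 302 | 696 | 2 |
| 8 | 50 | GP | 0 | 511 | 489 | 497 | 500 | 3 |
| 8 | 100 | QIC | 17 | 953 | 30 | 8 | 973 | 19 |
| 8 | 100 | RJ | 0 | 408 | 592 | 861 | 0 | 139 |
| 8 | 100 | RJ1 | 0 | 1000 | 0 | 0 | 997 | 3 |
| 8 | 100 | RJ2 | 0 | 1000 | 0 | 0 | 993 | 7 |
| 8 | 100 | SC | 0 | 470 | 530 | 328 | 672 | 0 |
| 8 | 100 | GP | 0 | 526 | 474 | 486 | 514 | 0 |

Table 4: Simulation for longitudinal data with AR-1 correlation matrix with $\alpha = 0.3$. Entries are selection frequencies of “working” correlation structure.

| n | K | Criterion | Normal: IND | Normal: EXCH | Normal: AR-1 | Binary: IND | Binary: EXCH | Binary: AR-1 |
|---|---|---|---|---|---|---|---|---|
| 4 | 50 | QIC | 91 | 166 | 743 | 66 | 170 | 764 |
| 4 | 50 | RJ | 712 | 142 | 146 | 925 | 12 | 63 |
| 4 | 50 | RJ1 | 0 | 478 | 522 | 7 | 505 | 488 |
| 4 | 50 | RJ2 | 0 | 466 | 534 | 20 | 499 | 481 |
| 4 | 50 | SC | 0 | 480 | 520 | 220 | 350 | 430 |
| 4 | 50 | GP | 0 | 543 | 457 | 303 | 332 | 365 |
| 4 | 100 | QIC | 25 | 116 | 859 | 7 | 122 | 871 |
| 4 | 100 | RJ | 770 | 95 | 135 | 972 | 4 | 24 |
| 4 | 100 | RJ1 | 0 | 475 | 525 | 1 | 569 | 430 |
| 4 | 100 | RJ2 | 0 | 481 | 519 | 5 | 571 | 424 |
| 4 | 100 | SC | 0 | 491 | 509 | 237 | 371 | 392 |
| 4 | 100 | GP | 0 | 540 | 460 | 290 | 353 | 357 |
| 8 | 50 | QIC | 50 | 88 | 862 | 44 | 77 | 879 |
| 8 | 50 | RJ | 646 | 148 | 206 | 934 | 5 | 61 |
| 8 | 50 | RJ1 | 0 | 445 | 555 | 0 | 535 | 465 |
| 8 | 50 | RJ2 | 0 | 443 | 557 | 10 | 535 | 455 |
| 8 | 50 | SC | 0 | 467 | 533 | 168 | 397 | 435 |
| 8 | 50 | GP | 0 | 549 | 451 | 269 | 406 | 325 |
| 8 | 100 | QIC | 16 | 39 | 945 | 7 | 33 | 960 |
| 8 | 100 | RJ | 648 | 154 | 198 | 972 | 0 | 28 |
| 8 | 100 | RJ1 | 0 | 455 | 545 | 1 | 603 | 396 |
| 8 | 100 | RJ2 | 0 | 455 | 545 | 1 | 609 | 390 |
| 8 | 100 | SC | 0 | 480 | 520 | 177 | 458 | 365 |
| 8 | 100 | GP | 0 | 532 | 468 | 247 | 457 | 296 |

4. Future Direction and Discussion

In this paper, we provide a review of several specific topics related to GEE for longitudinal/correlated data, such as model selection with emphasis on the selection of “working” correlation structure, sample size and power calculation, and clustered data analysis with informative cluster size. The simulation studies are conducted to provide numerical comparisons among five types of model selection criteria [91, 92]. Novel methodologies are still needed and are being developed, owing to the increasing usage and potential theoretical constraints of GEE as well as new challenges emerging from practical applications in clinical trials or biomedical studies.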

In addition, current research of interest related to GEE also includes a robust and optimal model selection criterion for GEE under missing at random (MAR) or missing not at random (MNAR) [93, 94]; sample size/power calculation for correlated sparse or overdispersed count data or longitudinal data with small sample size [57–60]; GEE with improved performance in situations with informative cluster size and/or MAR and/or small sample size [95–98]; and GEE for high-dimensional longitudinal data [99]. Although GEE has attractive features, flexible application, and easy implementation in software, it should be applied with caution in practice, depending on the context of the study design or data structure and the goals of the research.

Conflict of Interests

The author declares that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The author was supported by a grant from the Penn State CTSI. The project was supported by the National Center for Research Resources and the National Center for Advancing Translational Sciences, National Institutes of Health, through Grant 5 UL1 RR0330184-04. The content is solely the responsibility of the author and does not represent the views of the NIH.

References

[1] Z. Feng, P. Diehr, A. Peterson, and D. McLerran, “Selected statistical issues in group randomized trials,” Annual Review of Public Health, vol. 22, pp. 167–187, 2001.

[2] G. Fitzmaurice, N. M. Laird, and J. H. Ware, Applied Longitudinal Data, John Wiley & Sons, 2004.

[3] J. W. Hardin and J. M. Hilbe, Generalized Estimating Equations, Chapman and Hall/CRC Press, Boca Raton, Fla, USA, 2003.

[4] R. F. Potthoff and S. N. Roy, “A generalized multivariate analysis of variance model useful especially for growth curve problems,” Biometrika, vol. 51, pp. 313–326, 1964.

[5] L. M. Friedman, C. D. Furberg, and D. L. DeMets, Fundamentals of Clinical Trials, Springer, New York, NY, USA, 3rd edition, 1989.

[6] K. Y. Liang and S. L. Zeger, “A comparison of two bias-corrected covariance estimators for generalized estimating equations,” Biometrika, vol. 73, pp. 13–22, 1986.

[7] M. Crowder, “On the use of a working correlation matrix in using generalised linear models for repeated measures,” Biometrika, vol. 82, no. 2, pp. 407–410, 1995.

[8] R. W. Wedderburn, “Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method,” Biometrika, vol. 61, pp. 439–447, 1974.

[9] P. Diggle, P. Heagerty, K. Y. Liang, and S. L. Zeger, Analysis of Longitudinal Data, Oxford University Press, Oxford, UK, 2002.

[10] G. Fitzmaurice, M. Davidian, G. Verbeke, and G. Molenberghs, Longitudinal Data Analysis, Chapman & Hall/CRC Press, 2008.

[11] D. Hedeker and R. D. Gibbons, Analysis of Longitudinal Data, John Wiley & Sons, 2006.

[12] P. McCullagh and J. A. Nelder, Generalized Linear Models, Chapman & Hall, London, UK, 1989.

[13] N. R. Chaganty and H. Joe, “Range of correlation matrices for dependent Bernoulli random variables,” Biometrika, vol. 93, no. 1, pp. 197–206, 2006.

[14] R. T. Sabo and N. R. Chaganty, “What can go wrong when ignoring correlation bounds in the use of generalized estimating equations,” Statistics in Medicine, vol. 29, no. 24, pp. 2501–2507, 2010.

[15] B. C. Sutradhar and K. Das, “On the efficiency of regression estimators in generalised linear models for longitudinal data,” Biometrika, vol. 86, no. 2, pp. 459–465, 1999.

[16] Y.-G. Wang and V. Carey, “Working correlation structure misspecification, estimation and covariate design: implications for generalised estimating equations performance,” Biometrika, vol. 90, no. 1, pp. 29–41, 2003.

[17] S. R. Lipsitz, G. Molenberghs, G. M. Fitzmaurice, and J. Ibrahim, “GEE with Gaussian estimation of the correlations when data are incomplete,” Biometrics, vol. 56, no. 2, pp. 528–536, 2000.

[18] Y.-G. Wang and V. J. Carey, “Unbiased estimating equations from working correlation models for irregularly timed repeated measures,” Journal of the American Statistical Association, vol. 99, no. 467, pp. 845–853, 2004.

[19] A. Qu and B. G. Lindsay, “Building adaptive estimating equations when inverse of covariance estimation is difficult,” Journal of the Royal Statistical Society B: Statistical Methodology, vol. 65, no. 1, pp. 127–142, 2003.

[20] S. R. Lipsitz and G. M. Fitzmaurice, “Estimating equations for measures of association between repeated binary responses,” Biometrics, vol. 52, no. 3, pp. 903–912, 1996.

[21] Y. Lee and J. A. Nelder, “Conditional and marginal models: another view,” Statistical Science, vol. 19, no. 2, pp. 219–238, 2004.

[22] Y. Lee and J. A. Nelder, “Likelihood inference for models with unobservables: another view,” Statistical Science, vol. 24, no. 3, pp. 255–269, 2009.

[23] A. Qu, B. G. Lindsay, and B. Li, “Improving generalised estimating equations using quadratic inference functions,” Biometrika, vol. 87, no. 4, pp. 823–836, 2000.

[24] G. Kauermann and R. J. Carroll, “A note on the efficiency of sandwich covariance matrix estimation,” Journal of the American Statistical Association, vol. 96, no. 456, pp. 1387–1396, 2001.

[25] Y. G. Wang and L. Y. Hin, “Modeling strategies in longitudinal data analysis: covariate, variance function and correlation structure selection,” Computational Statistics and Data Analysis, vol. 54, no. 12, pp. 3359–3370, 2010.

[26] W. Pan, “Goodness-of-fit tests for GEE with correlated binary data,” Scandinavian Journal of Statistics, vol. 29, no. 1, pp. 101–110, 2002.

[27] A. M. Wood, I. R. White, and P. Royston, “How should variable selection be performed with multiply imputed data?” Statistics in Medicine, vol. 27, no. 17, pp. 3227–3246, 2008.

[28] M. D. Begg and M. K. Parides, “Separation of individual-level and cluster-level covariate effects in regression analysis of correlated data,” Statistics in Medicine, vol. 22, no. 16, pp. 2591–2602, 2003.

[29] L. Y. Hin, V. J. Carey, and Y. G. Wang, “Criteria for working-correlation-structure selection in GEE: assessment via simulation,” The American Statistician, vol. 61, no. 4, pp. 360–364, 2007.

[30] J. X. Pan and G. Mackenzie, “On modelling mean-covariance structures in longitudinal studies,” Biometrika, vol. 90, no. 1, pp. 239–244, 2003.

[31] M. Davidian and R. J. Carroll, “Variance function estimation,” Journal of the American Statistical Association, vol. 82, no. 400, pp. 1079–1091, 1987.

[32] M. Pourahmadi, “Joint mean-covariance models with applications to longitudinal data: unconstrained parameterisation,” Biometrika, vol. 86, no. 3, pp. 677–690, 1999.

[33] S. Konishi and G. Kitagawa, “Generalised information criteria in model selection,” Biometrika, vol. 83, no. 4, pp. 875–890, 1996.

[34] B. Zhang, “Summarizing the goodness of fit of generalized linear models for longitudinal data,” Statistics in Medicine, vol. 19, pp. 1265–1275, 2000.

[35] A. Rotnitzky and N. P. Jewell, “Hypothesis testing of regression parameters in semiparametric generalized linear models for cluster correlated data,” Biometrika, vol. 77, no. 3, pp. 485–497, 1990.

[36] J. Shults and N. R. Chaganty, “Analysis of serially correlated data using quasi-least squares,” Biometrics, vol. 54, no. 4, pp. 1622–1630, 1998.


[77] J D Beck T Sharp G G Koch and S Offenbacher ldquoA 5-yearstudy of attachment loss and tooth loss in community-dwellingolder adultsrdquo Journal of Periodontal Research vol 32 no 6 pp516ndash523 1997

[78] S J Arbes Jr H Agustsdottir and G D Slade ldquoEnvironmentaltobacco smoke and periodontal disease in the United StatesrdquoAmerican Journal of Public Health vol 91 no 2 pp 253ndash2572001

[79] J M Robins A Rotnitzky and L P Zhao ldquoAnalysis of semi-parametric regression models for repeated outcomes in the

Advances in Statistics 11

presence of missing datardquo Journal of the American StatisticalAssociation vol 90 pp 106ndash121 1995

[80] E B Hoffman P K Sen and C R Weinberg ldquoWithin-clusterresamplingrdquo Biometrika vol 88 no 4 pp 1121ndash1134 2001

[81] J MWilliamson S Datta and G A Satten ldquoMarginal analysesof clustered data when cluster size is informativerdquo Biometricsvol 59 no 1 pp 36ndash42 2003

[82] E Benhin J N Rao and A J Scott ldquoMean estimating equationapproach to analysing cluster-correlated data with nonignor-able cluster sizesrdquo Biometrika vol 92 no 2 pp 435ndash450 2005

[83] X J Cong G Yin and Y Shen ldquoMarginal analysis of correlatedfailure time data with informative cluster sizesrdquo Biometrics vol63 no 3 pp 663ndash672 2007

[84] T C Chiang and K Y Lee ldquoEfcient estimation methods forinformative cluster size datardquo Statistical Sinica vol 80 pp 121ndash123 2008

[85] M Pavlou S R Seaman and A J Copas ldquoAn examinationof a method for marginal inference when the cluster size isinformativerdquo Statistica Sinica vol 23 no 2 pp 791ndash801 2013

[86] S R Seaman M Pavlou and A J Copas ldquoMethods forobserved-cluster inference when cluster size is informative areview and clarificationsrdquoBiometrics vol 70 no 2 pp 449ndash4562014

[87] Z Chen B Zhang and P S Albert ldquoA joint modeling approachto data with informative cluster size robustness to the clustersize modelrdquo Statistics in Medicine vol 30 no 15 pp 1825ndash18362011

[88] Y Huang and B Leroux ldquoInformative cluster sizes forsubcluster-level covariates and weighted generalized estimatingequationsrdquo Biometrics vol 67 no 3 pp 843ndash851 2011

[89] B F Kurland L L Johnson B L Egleston and P H DiehrldquoLongitudinal data with follow-up truncated by death matchthe analysis method to research aimsrdquo Statistical Science vol24 no 2 pp 211ndash222 2009

[90] J M Neuhaus and C E McCulloch ldquoEstimation of covariateeffects in generalized linear mixed models with informativecluster sizesrdquo Biometrika vol 98 no 1 pp 147ndash162 2011

[91] S R Lipsitz G M Fitzmaurice E J Orav and N M LairdldquoPerformance of generalized estimating equations in practicalsituationsrdquo Biometrics vol 50 no 1 pp 270ndash278 1994

[92] D B Hall and T A Severini ldquoExtended generalized estimatingequations for clustered datardquo Journal of the American StatisticalAssociation vol 93 no 444 pp 1365ndash1375 1998

[93] C-W Shen and Y-H Chen ldquoModel selection for generalizedestimating equations accommodating dropout missingnessrdquoBiometrics vol 68 no 4 pp 1046ndash1054 2012

[94] C-W Shen and Y-H Chen ldquoModel selection of generalizedestimating equations with multiply imputed longitudinal datardquoBiometrical Journal vol 55 no 6 pp 899ndash911 2013

[95] D B Rubin ldquoInference and missing datardquo Biometrika vol 63no 3 pp 581ndash592 1976

[96] R J Little andDB Rubin Statistical Analysis withMissingDataWiley New York NY USA

[97] P Diggle D Farewell and RHenderson ldquoAnalysis of longitudi-nal data with drop-out objectives assumptions and a proposalrdquoJournal of the Royal Statistical Society C vol 56 no 5 pp 499ndash550 2007

[98] A J Copas and S R Seaman ldquoBias from the use of generalizedestimating equations to analyze incomplete longitudinal binarydatardquo Journal of Applied Statistics vol 37 no 6 pp 911ndash9222010

[99] L Wang J Zhou and A Qu ldquoPenalized generalized estimatingequations for high-dimensional longitudinal data analysisrdquoBiometrics vol 68 no 2 pp 353ndash360 2012

Advances in Statistics 7

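Table 2: Simulation for longitudinal data with independent correlation matrix. Entries are selection frequencies of the “working” correlation structure.

| n | K | Criterion | Normal: IND | Normal: EXCH | Normal: AR-1 | Binary: IND | Binary: EXCH | Binary: AR-1 |
|---|---|---|---|---|---|---|---|---|
| 4 | 50 | QIC | 198 | 393 | 409 | 202 | 374 | 424 |
| 4 | 50 | RJ | 327 | 423 | 250 | 312 | 421 | 267 |
| 4 | 50 | RJ1 | 388 | 322 | 290 | 399 | 316 | 285 |
| 4 | 50 | RJ2 | 384 | 327 | 289 | 388 | 320 | 292 |
| 4 | 50 | SC | 488 | 1 | 512 | 351 | 310 | 339 |
| 4 | 50 | GP | 547 | 0 | 453 | 368 | 306 | 326 |
| 4 | 100 | QIC | 209 | 377 | 414 | 185 | 407 | 408 |
| 4 | 100 | RJ | 338 | 415 | 247 | 340 | 410 | 250 |
| 4 | 100 | RJ1 | 389 | 349 | 262 | 381 | 358 | 261 |
| 4 | 100 | RJ2 | 389 | 353 | 258 | 372 | 357 | 271 |
| 4 | 100 | SC | 482 | 1 | 517 | 352 | 346 | 302 |
| 4 | 100 | GP | 520 | 0 | 480 | 360 | 348 | 292 |
| 8 | 50 | QIC | 200 | 411 | 389 | 203 | 363 | 434 |
| 8 | 50 | RJ | 282 | 497 | 221 | 292 | 476 | 232 |
| 8 | 50 | RJ1 | 402 | 354 | 244 | 386 | 340 | 274 |
| 8 | 50 | RJ2 | 402 | 357 | 241 | 373 | 347 | 280 |
| 8 | 50 | SC | 465 | 1 | 535 | 351 | 325 | 324 |
| 8 | 50 | GP | 558 | 0 | 442 | 382 | 311 | 307 |
| 8 | 100 | QIC | 188 | 393 | 419 | 201 | 398 | 401 |
| 8 | 100 | RJ | 321 | 442 | 237 | 287 | 466 | 247 |
| 8 | 100 | RJ1 | 347 | 385 | 268 | 385 | 367 | 248 |
| 8 | 100 | RJ2 | 347 | 382 | 271 | 377 | 369 | 254 |
| 8 | 100 | SC | 492 | 0 | 508 | 355 | 343 | 302 |
| 8 | 100 | GP | 541 | 0 | 459 | 370 | 341 | 289 |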

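Table 3: Simulation for longitudinal data with exchangeable correlation matrix with α = 0.3. Entries are selection frequencies of the “working” correlation structure.

| n | K | Criterion | Normal: IND | Normal: EXCH | Normal: AR-1 | Binary: IND | Binary: EXCH | Binary: AR-1 |
|---|---|---|---|---|---|---|---|---|
| 4 | 50 | QIC | 106 | 699 | 195 | 53 | 758 | 189 |
| 4 | 50 | RJ | 419 | 139 | 442 | 869 | 5 | 126 |
| 4 | 50 | RJ1 | 0 | 963 | 37 | 12 | 898 | 90 |
| 4 | 50 | RJ2 | 0 | 959 | 41 | 22 | 876 | 102 |
| 4 | 50 | SC | 0 | 593 | 407 | 282 | 650 | 68 |
| 4 | 50 | GP | 1 | 593 | 406 | 412 | 524 | 64 |
| 4 | 100 | QIC | 31 | 879 | 90 | 7 | 867 | 126 |
| 4 | 100 | RJ | 350 | 88 | 562 | 911 | 2 | 87 |
| 4 | 100 | RJ1 | 0 | 995 | 5 | 2 | 946 | 52 |
| 4 | 100 | RJ2 | 0 | 996 | 4 | 10 | 933 | 57 |
| 4 | 100 | SC | 0 | 598 | 402 | 339 | 635 | 26 |
| 4 | 100 | GP | 0 | 501 | 499 | 445 | 531 | 24 |
| 8 | 50 | QIC | 80 | 828 | 92 | 50 | 876 | 74 |
| 8 | 50 | RJ | 10 | 395 | 595 | 813 | 6 | 181 |
| 8 | 50 | RJ1 | 0 | 1000 | 0 | 0 | 987 | 13 |
| 8 | 50 | RJ2 | 0 | 1000 | 0 | 0 | 966 | 25 |
| 8 | 50 | SC | 0 | 488 | 513 | 302 | 696 | 2 |
| 8 | 50 | GP | 0 | 511 | 489 | 497 | 500 | 3 |
| 8 | 100 | QIC | 17 | 953 | 30 | 8 | 973 | 19 |
| 8 | 100 | RJ | 0 | 408 | 592 | 861 | 0 | 139 |
| 8 | 100 | RJ1 | 0 | 1000 | 0 | 0 | 997 | 3 |
| 8 | 100 | RJ2 | 0 | 1000 | 0 | 0 | 993 | 7 |
| 8 | 100 | SC | 0 | 470 | 530 | 328 | 672 | 0 |
| 8 | 100 | GP | 0 | 526 | 474 | 486 | 514 | 0 |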

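Table 4: Simulation for longitudinal data with AR-1 correlation matrix with α = 0.3. Entries are selection frequencies of the “working” correlation structure.

| n | K | Criterion | Normal: IND | Normal: EXCH | Normal: AR-1 | Binary: IND | Binary: EXCH | Binary: AR-1 |
|---|---|---|---|---|---|---|---|---|
| 4 | 50 | QIC | 91 | 166 | 743 | 66 | 170 | 764 |
| 4 | 50 | RJ | 712 | 142 | 146 | 925 | 12 | 63 |
| 4 | 50 | RJ1 | 0 | 478 | 522 | 7 | 505 | 488 |
| 4 | 50 | RJ2 | 0 | 466 | 534 | 20 | 499 | 481 |
| 4 | 50 | SC | 0 | 480 | 520 | 220 | 350 | 430 |
| 4 | 50 | GP | 0 | 543 | 457 | 303 | 332 | 365 |
| 4 | 100 | QIC | 25 | 116 | 859 | 7 | 122 | 871 |
| 4 | 100 | RJ | 770 | 95 | 135 | 972 | 4 | 24 |
| 4 | 100 | RJ1 | 0 | 475 | 525 | 1 | 569 | 430 |
| 4 | 100 | RJ2 | 0 | 481 | 519 | 5 | 571 | 424 |
| 4 | 100 | SC | 0 | 491 | 509 | 237 | 371 | 392 |
| 4 | 100 | GP | 0 | 540 | 460 | 290 | 353 | 357 |
| 8 | 50 | QIC | 50 | 88 | 862 | 44 | 77 | 879 |
| 8 | 50 | RJ | 646 | 148 | 206 | 934 | 5 | 61 |
| 8 | 50 | RJ1 | 0 | 445 | 555 | 0 | 535 | 465 |
| 8 | 50 | RJ2 | 0 | 443 | 557 | 10 | 535 | 455 |
| 8 | 50 | SC | 0 | 467 | 533 | 168 | 397 | 435 |
| 8 | 50 | GP | 0 | 549 | 451 | 269 | 406 | 325 |
| 8 | 100 | QIC | 16 | 39 | 945 | 7 | 33 | 960 |
| 8 | 100 | RJ | 648 | 154 | 198 | 972 | 0 | 28 |
| 8 | 100 | RJ1 | 0 | 455 | 545 | 1 | 603 | 396 |
| 8 | 100 | RJ2 | 0 | 455 | 545 | 1 | 609 | 390 |
| 8 | 100 | SC | 0 | 480 | 520 | 177 | 458 | 365 |
| 8 | 100 | GP | 0 | 532 | 468 | 247 | 457 | 296 |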

theoretical constraints of GEE as well as new challenges emerging from practical applications in clinical trials or biomedical studies.

In addition, current research of interest related to GEE also includes a robust and optimal model selection criterion for GEE under missing at random (MAR) or missing not at random (MNAR) [93, 94]; sample size/power calculation for correlated sparse or overdispersed count data, or for longitudinal data with small samples [57–60]; GEE with improved performance in situations with informative cluster size and/or MAR and/or small sample size [95–98]; and GEE for high-dimensional longitudinal data [99]. Although GEE has attractive features, flexible application, and easy implementation in software, it should be applied with caution in practice, depending on the context of the study design, the data structure, and the research goals.

Conflict of Interests

The author declares that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The author was supported by a grant from the Penn State CTSI. The project was supported by the National Center for Research Resources and the National Center for Advancing Translational Sciences, National Institutes of Health, through Grant 5 UL1 RR0330184-04. The content is solely the responsibility of the author and does not represent the views of the NIH.

References

[1] Z. Feng, P. Diehr, A. Peterson, and D. McLerran, “Selected statistical issues in group randomized trials,” Annual Review of Public Health, vol. 22, pp. 167–187, 2001.

[2] G. Fitzmaurice, N. M. Laird, and J. H. Ware, Applied Longitudinal Data, John Wiley & Sons, 2004.

[3] J. W. Hardin and J. M. Hilbe, Generalized Estimating Equations, Chapman and Hall/CRC Press, Boca Raton, Fla, USA, 2003.

[4] R. F. Potthoff and S. N. Roy, “A generalized multivariate analysis of variance model useful especially for growth curve problems,” Biometrika, vol. 51, pp. 313–326, 1964.

[5] L. M. Friedman, C. D. Furberg, and D. L. DeMets, Fundamentals of Clinical Trials, Springer, New York, NY, USA, 3rd edition, 1989.

[6] K. Y. Liang and S. L. Zeger, “A comparison of two bias-corrected covariance estimators for generalized estimating equations,” Biometrika, vol. 73, pp. 13–22, 1986.

[7] M. Crowder, “On the use of a working correlation matrix in using generalised linear models for repeated measures,” Biometrika, vol. 82, no. 2, pp. 407–410, 1995.

[8] R. W. Wedderburn, “Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method,” Biometrika, vol. 61, pp. 439–447, 1974.

[9] P. Diggle, P. Heagerty, K. Y. Liang, and S. L. Zeger, Analysis of Longitudinal Data, Oxford University Press, Oxford, UK, 2002.

[10] G. Fitzmaurice, M. Davidian, G. Verbeke, and G. Molenberghs, Longitudinal Data Analysis, Chapman & Hall/CRC Press, 2008.

[11] D. Hedeker and R. D. Gibbons, Analysis of Longitudinal Data, John Wiley & Sons, 2006.

[12] P. McCullagh and J. A. Nelder, Generalized Linear Models, Chapman & Hall, London, UK, 1989.

[13] N. R. Chaganty and H. Joe, “Range of correlation matrices for dependent Bernoulli random variables,” Biometrika, vol. 93, no. 1, pp. 197–206, 2006.

[14] R. T. Sabo and N. R. Chaganty, “What can go wrong when ignoring correlation bounds in the use of generalized estimating equations,” Statistics in Medicine, vol. 29, no. 24, pp. 2501–2507, 2010.

[15] B. C. Sutradhar and K. Das, “On the efficiency of regression estimators in generalised linear models for longitudinal data,” Biometrika, vol. 86, no. 2, pp. 459–465, 1999.

[16] Y.-G. Wang and V. Carey, “Working correlation structure misspecification, estimation and covariate design: implications for generalised estimating equations performance,” Biometrika, vol. 90, no. 1, pp. 29–41, 2003.

[17] S. R. Lipsitz, G. Molenberghs, G. M. Fitzmaurice, and J. Ibrahim, “GEE with Gaussian estimation of the correlations when data are incomplete,” Biometrics, vol. 56, no. 2, pp. 528–536, 2000.

[18] Y.-G. Wang and V. J. Carey, “Unbiased estimating equations from working correlation models for irregularly timed repeated measures,” Journal of the American Statistical Association, vol. 99, no. 467, pp. 845–853, 2004.

[19] A. Qu and B. G. Lindsay, “Building adaptive estimating equations when inverse of covariance estimation is difficult,” Journal of the Royal Statistical Society B: Statistical Methodology, vol. 65, no. 1, pp. 127–142, 2003.

[20] S. R. Lipsitz and G. M. Fitzmaurice, “Estimating equations for measures of association between repeated binary responses,” Biometrics, vol. 52, no. 3, pp. 903–912, 1996.

[21] Y. Lee and J. A. Nelder, “Conditional and marginal models: another view,” Statistical Science, vol. 19, no. 2, pp. 219–238, 2004.

[22] Y. Lee and J. A. Nelder, “Likelihood inference for models with unobservables: another view,” Statistical Science, vol. 24, no. 3, pp. 255–269, 2009.

[23] A. Qu, B. G. Lindsay, and B. Li, “Improving generalised estimating equations using quadratic inference functions,” Biometrika, vol. 87, no. 4, pp. 823–836, 2000.

[24] G. Kauermann and R. J. Carroll, “A note on the efficiency of sandwich covariance matrix estimation,” Journal of the American Statistical Association, vol. 96, no. 456, pp. 1387–1396, 2001.

[25] Y. G. Wang and L. Y. Hin, “Modeling strategies in longitudinal data analysis: covariate, variance function and correlation structure selection,” Computational Statistics and Data Analysis, vol. 54, no. 12, pp. 3359–3370, 2010.

[26] W. Pan, “Goodness-of-fit tests for GEE with correlated binary data,” Scandinavian Journal of Statistics, vol. 29, no. 1, pp. 101–110, 2002.

[27] A. M. Wood, I. R. White, and P. Royston, “How should variable selection be performed with multiply imputed data?” Statistics in Medicine, vol. 27, no. 17, pp. 3227–3246, 2008.

[28] M. D. Begg and M. K. Parides, “Separation of individual-level and cluster-level covariate effects in regression analysis of correlated data,” Statistics in Medicine, vol. 22, no. 16, pp. 2591–2602, 2003.

[29] L. Y. Hin, V. J. Carey, and Y. G. Wang, “Criteria for working-correlation-structure selection in GEE: assessment via simulation,” The American Statistician, vol. 61, no. 4, pp. 360–364, 2007.

[30] J. X. Pan and G. Mackenzie, “On modelling mean-covariance structures in longitudinal studies,” Biometrika, vol. 90, no. 1, pp. 239–244, 2003.

[31] M. Davidian and R. J. Carroll, “Variance function estimation,” Journal of the American Statistical Association, vol. 82, no. 400, pp. 1079–1091, 1987.

[32] M. Pourahmadi, “Joint mean-covariance models with applications to longitudinal data: unconstrained parameterisation,” Biometrika, vol. 86, no. 3, pp. 677–690, 1999.

[33] S. Konishi and G. Kitagawa, “Generalised information criteria in model selection,” Biometrika, vol. 83, no. 4, pp. 875–890, 1996.

[34] B. Zhang, “Summarizing the goodness of fit of generalized linear models for longitudinal data,” Statistics in Medicine, vol. 19, pp. 1265–1275, 2000.

[35] A. Rotnitzky and N. P. Jewell, “Hypothesis testing of regression parameters in semiparametric generalized linear models for cluster correlated data,” Biometrika, vol. 77, no. 3, pp. 485–497, 1990.

[36] J. Shults and N. R. Chaganty, “Analysis of serially correlated data using quasi-least squares,” Biometrics, vol. 54, no. 4, pp. 1622–1630, 1998.

[37] V. J. Carey and Y.-G. Wang, “Working covariance model selection for generalized estimating equations,” Statistics in Medicine, vol. 30, no. 26, pp. 3117–3124, 2011.

[38] W. Pan, “Akaike’s information criterion in generalized estimating equations,” Biometrics, vol. 57, no. 1, pp. 120–125, 2001.

[39] H. Akaike, “Information theory and an extension of the maximum likelihood principle,” in Proceedings of the 2nd International Symposium on Information Theory, vol. 15, pp. 267–281, 1973.

[40] J. A. Nelder and Y. Lee, “Likelihood, quasi-likelihood and pseudolikelihood: some comparisons,” Journal of the Royal Statistical Society B, vol. 54, no. 1, pp. 273–284, 1992.

[41] J. Cui, “QIC program and model selection in GEE analyses,” The Stata Journal, vol. 7, no. 2, pp. 209–220, 2007.

[42] J. Cui and G. Qian, “Selection of working correlation structure and best model in GEE analyses of longitudinal data,” Communications in Statistics: Simulation and Computation, vol. 36, no. 4–6, pp. 987–996, 2007.

[43] L. Y. Hin and Y. G. Wang, “Working-correlation-structure identification in generalized estimating equations,” Statistics in Medicine, vol. 28, no. 4, pp. 642–658, 2009.

[44] J. A. Nelder and D. Pregibon, “An extended quasi-likelihood function,” Biometrika, vol. 74, no. 2, pp. 221–232, 1987.

[45] M. Wang, M. Kong, and S. Datta, “Inference for marginal linear models for clustered longitudinal data with potentially informative cluster sizes,” Statistical Methods in Medical Research, vol. 20, no. 4, pp. 347–367, 2011.

[46] E. Cantoni, J. M. Flemming, and E. Ronchetti, “Variable selection for marginal longitudinal generalized linear models,” Biometrics: Journal of the International Biometric Society, vol. 61, no. 2, pp. 507–514, 2005.

[47] Y.-G. Wang and X. Lin, “Effects of variance-function misspecification in analysis of longitudinal data,” Biometrics, vol. 61, no. 2, pp. 413–421, 2005.

[48] N. R. Chaganty and H. Joe, “Efficiency of generalized estimating equations for binary responses,” Journal of the Royal Statistical Society, Series B: Statistical Methodology, vol. 66, no. 4, pp. 851–860, 2004.

[49] M. Gosho, C. Hamada, and I. Yoshimura, “Criterion for the selection of a working correlation structure in the generalized estimating equation approach for longitudinal balanced data,” Communications in Statistics, vol. 40, no. 21, pp. 3839–3856, 2011.

[50] M. Gosho, C. Hamada, and I. Yoshimura, “Selection of working correlation structure in weighted generalized estimating equation method for incomplete longitudinal data,” Communications in Statistics, vol. 43, no. 1, pp. 62–81, 2014.

[51] M. J. Jang, Working correlation selection in generalized estimating equations [Dissertation], University of Iowa, 2011.

[52] J. Chen and N. A. Lazar, “Selection of working correlation structure in generalized estimating equations via empirical likelihood,” Journal of Computational and Graphical Statistics, vol. 21, no. 1, pp. 18–41, 2012.

[53] P. M. Westgate, “A bias-corrected covariance estimator for improved inference when using an unstructured correlation with quadratic inference functions,” Statistics and Probability Letters, vol. 83, no. 6, pp. 1553–1558, 2013.

[54] P. M. Westgate, “Criterion for the simultaneous selection of a working correlation structure and either generalized estimating equations or the quadratic inference function approach,” Biometrical Journal, vol. 56, no. 3, pp. 461–476, 2014.

[55] P. M. Westgate, “Improving the correlation structure selection approach for generalized estimating equations and balanced longitudinal data,” Statistics in Medicine, vol. 33, no. 13, pp. 2222–2237, 2014.

[56] J. Ye, “On measuring and correcting the effects of data mining and model selection,” Journal of the American Statistical Association, vol. 93, no. 441, pp. 120–131, 1998.

[57] J. J. Shuster, Practical Handbook of Sample Size Guidelines for Clinical Trials, CRC Press, Boca Raton, Fla, USA, 1993.

[58] G. Liu and K.-Y. Liang, “Sample size calculations for studies with correlated observations,” Biometrics, vol. 53, no. 3, pp. 937–947, 1997.

[59] W. J. Shih, “Sample size and power calculations for periodontal and other studies with clustered samples using the method of generalized estimating equations,” Biometrical Journal, vol. 39, no. 8, pp. 899–908, 1997.

[60] S. R. Lipsitz and G. M. Fitzmaurice, “Sample size for repeated measures studies with binary responses,” Statistics in Medicine, vol. 13, no. 12, pp. 1233–1239, 1994.

[61] W. Pan, “Sample size and power calculations with correlated binary data,” Controlled Clinical Trials, vol. 22, no. 3, pp. 211–227, 2001.

[62] N. Breslow, “Tests of hypotheses in overdispersed Poisson regression and other quasi likelihood models,” Journal of the American Statistical Association, vol. 85, pp. 565–571, 1990.

[63] E. W. Lee and N. Dubin, “Estimation and sample size considerations for clustered binary responses,” Statistics in Medicine, vol. 13, no. 12, pp. 1241–1252, 1994.

[64] D. J. Sargent, J. A. Sloan, and S. S. Cha, “Sample size and design considerations for phase II clinical trials with correlated observations,” Controlled Clinical Trials, vol. 20, no. 3, pp. 242–252, 1999.

[65] C. S. Li, “Semiparametric negative binomial regression models,” Communications in Statistics: Simulation and Computation, vol. 39, no. 3, pp. 475–486, 2010.

[66] W. H. Greene, “Accounting for excess zeros and sample selection in Poisson and negative binomial regression models,” Tech. Rep., New York University, 1994.

[67] P. Lambert, “Modeling of repeated series of count data measured at unequally spaced times,” Applied Statistics, vol. 45, pp. 31–38, 1996.

[68] M. S. Pepe and G. L. Anderson, “A cautionary note on inference for marginal regression models with longitudinal data and general correlated response data,” Communications in Statistics: Series B, vol. 23, pp. 939–951, 1994.

[69] M. Wang and Q. Long, “Modified robust variance estimator for generalized estimating equations with improved small-sample performance,” Statistics in Medicine, vol. 30, no. 11, pp. 1278–1291, 2011.

[70] M. Taljaard, A. D. McRae, C. Weijer et al., “Inadequate reporting of research ethics review and informed consent in cluster randomised trials: review of random sample of published trials,” British Medical Journal, vol. 342, Article ID d2496, 2011.

[71] L. A. Mancl and T. A. DeRouen, “A covariance estimator for GEE with improved small-sample properties,” Biometrics, vol. 57, no. 1, pp. 126–134, 2001.

[72] M. P. Fay and B. I. Graubard, “Small-sample adjustments for Wald-type tests using sandwich estimators,” Biometrics, vol. 57, no. 4, pp. 1198–1206, 2001.

[73] W. Pan, “On the robust variance estimator in generalised estimating equations,” Biometrika, vol. 88, no. 3, pp. 901–906, 2001.

[74] W. Pan and M. M. Wall, “Small-sample adjustments in using the sandwich variance estimator in generalized estimating equations,” Statistics in Medicine, vol. 21, no. 10, pp. 1429–1441, 2002.

[75] X. Guo, W. Pan, J. E. Connett, P. J. Hannan, and S. A. French, “Small-sample performance of the robust score test and its modifications in generalized estimating equations,” Statistics in Medicine, vol. 24, no. 22, pp. 3479–3495, 2005.

[76] D. M. Farewell, “Marginal analyses of longitudinal data with an informative pattern of observations,” Biometrika, vol. 97, no. 1, pp. 65–78, 2010.

[77] J. D. Beck, T. Sharp, G. G. Koch, and S. Offenbacher, “A 5-year study of attachment loss and tooth loss in community-dwelling older adults,” Journal of Periodontal Research, vol. 32, no. 6, pp. 516–523, 1997.

[78] S. J. Arbes Jr., H. Agustsdottir, and G. D. Slade, “Environmental tobacco smoke and periodontal disease in the United States,” American Journal of Public Health, vol. 91, no. 2, pp. 253–257, 2001.

[79] J. M. Robins, A. Rotnitzky, and L. P. Zhao, “Analysis of semiparametric regression models for repeated outcomes in the presence of missing data,” Journal of the American Statistical Association, vol. 90, pp. 106–121, 1995.

[80] E. B. Hoffman, P. K. Sen, and C. R. Weinberg, “Within-cluster resampling,” Biometrika, vol. 88, no. 4, pp. 1121–1134, 2001.

[81] J. M. Williamson, S. Datta, and G. A. Satten, “Marginal analyses of clustered data when cluster size is informative,” Biometrics, vol. 59, no. 1, pp. 36–42, 2003.

[82] E. Benhin, J. N. Rao, and A. J. Scott, “Mean estimating equation approach to analysing cluster-correlated data with nonignorable cluster sizes,” Biometrika, vol. 92, no. 2, pp. 435–450, 2005.

[83] X. J. Cong, G. Yin, and Y. Shen, “Marginal analysis of correlated failure time data with informative cluster sizes,” Biometrics, vol. 63, no. 3, pp. 663–672, 2007.

[84] T. C. Chiang and K. Y. Lee, “Efficient estimation methods for informative cluster size data,” Statistica Sinica, vol. 80, pp. 121–123, 2008.

[85] M. Pavlou, S. R. Seaman, and A. J. Copas, “An examination of a method for marginal inference when the cluster size is informative,” Statistica Sinica, vol. 23, no. 2, pp. 791–801, 2013.

[86] S. R. Seaman, M. Pavlou, and A. J. Copas, “Methods for observed-cluster inference when cluster size is informative: a review and clarifications,” Biometrics, vol. 70, no. 2, pp. 449–456, 2014.

[87] Z. Chen, B. Zhang, and P. S. Albert, “A joint modeling approach to data with informative cluster size: robustness to the cluster size model,” Statistics in Medicine, vol. 30, no. 15, pp. 1825–1836, 2011.

[88] Y. Huang and B. Leroux, “Informative cluster sizes for subcluster-level covariates and weighted generalized estimating equations,” Biometrics, vol. 67, no. 3, pp. 843–851, 2011.

[89] B. F. Kurland, L. L. Johnson, B. L. Egleston, and P. H. Diehr, “Longitudinal data with follow-up truncated by death: match the analysis method to research aims,” Statistical Science, vol. 24, no. 2, pp. 211–222, 2009.

[90] J. M. Neuhaus and C. E. McCulloch, “Estimation of covariate effects in generalized linear mixed models with informative cluster sizes,” Biometrika, vol. 98, no. 1, pp. 147–162, 2011.

[91] S. R. Lipsitz, G. M. Fitzmaurice, E. J. Orav, and N. M. Laird, “Performance of generalized estimating equations in practical situations,” Biometrics, vol. 50, no. 1, pp. 270–278, 1994.

[92] D. B. Hall and T. A. Severini, “Extended generalized estimating equations for clustered data,” Journal of the American Statistical Association, vol. 93, no. 444, pp. 1365–1375, 1998.

[93] C.-W. Shen and Y.-H. Chen, “Model selection for generalized estimating equations accommodating dropout missingness,” Biometrics, vol. 68, no. 4, pp. 1046–1054, 2012.

[94] C.-W. Shen and Y.-H. Chen, “Model selection of generalized estimating equations with multiply imputed longitudinal data,” Biometrical Journal, vol. 55, no. 6, pp. 899–911, 2013.

[95] D. B. Rubin, “Inference and missing data,” Biometrika, vol. 63, no. 3, pp. 581–592, 1976.

[96] R. J. Little and D. B. Rubin, Statistical Analysis with Missing Data, Wiley, New York, NY, USA.

[97] P. Diggle, D. Farewell, and R. Henderson, “Analysis of longitudinal data with drop-out: objectives, assumptions and a proposal,” Journal of the Royal Statistical Society C, vol. 56, no. 5, pp. 499–550, 2007.

[98] A. J. Copas and S. R. Seaman, “Bias from the use of generalized estimating equations to analyze incomplete longitudinal binary data,” Journal of Applied Statistics, vol. 37, no. 6, pp. 911–922, 2010.

[99] L. Wang, J. Zhou, and A. Qu, “Penalized generalized estimating equations for high-dimensional longitudinal data analysis,” Biometrics, vol. 68, no. 2, pp. 353–360, 2012.

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 8: Review Article Generalized Estimating Equations in ...downloads.hindawi.com/archive/2014/303728.pdfrecent developments of GEE. As is well known, GEE has several de ning features [

8 Advances in Statistics

Table 4: Simulation for longitudinal data with AR-1 correlation matrix with α = 0.3.

Selection frequencies of “working” correlation structure

                         Normal                Binary
n   K     Criterion   IND   EXCH  AR-1     IND   EXCH  AR-1
4   50    QIC          91   166   743       66   170   764
          RJ          712   142   146      925    12    63
          RJ1           0   478   522        7   505   488
          RJ2           0   466   534       20   499   481
          SC            0   480   520      220   350   430
          GP            0   543   457      303   332   365
4   100   QIC          25   116   859        7   122   871
          RJ          770    95   135      972     4    24
          RJ1           0   475   525        1   569   430
          RJ2           0   481   519        5   571   424
          SC            0   491   509      237   371   392
          GP            0   540   460      290   353   357
8   50    QIC          50    88   862       44    77   879
          RJ          646   148   206      934     5    61
          RJ1           0   445   555        0   535   465
          RJ2           0   443   557       10   535   455
          SC            0   467   533      168   397   435
          GP            0   549   451      269   406   325
8   100   QIC          16    39   945        7    33   960
          RJ          648   154   198      972     0    28
          RJ1           0   455   545        1   603   396
          RJ2           0   455   545        1   609   390
          SC            0   480   520      177   458   365
          GP            0   532   468      247   457   296
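As a rough illustration of the setting behind Table 4, the sketch below builds an AR-1 correlation matrix with α = 0.3 and draws normally distributed clusters with that within-cluster correlation. It is not the simulation code used for the table: the function names (`ar1_correlation`, `simulate_ar1_clusters`, `lag1_correlation`) are hypothetical, responses are assumed to have mean 0 and variance 1, and only the normal case is covered.

```python
import random


def ar1_correlation(n, alpha):
    """Return the n x n AR-1 correlation matrix with entries alpha**|j - k|."""
    return [[alpha ** abs(j - k) for k in range(n)] for j in range(n)]


def simulate_ar1_clusters(K, n, alpha, rng):
    """Draw K independent clusters of n standard-normal responses whose
    within-cluster correlation is AR-1 with parameter alpha (assumed setup)."""
    innovation_sd = (1.0 - alpha ** 2) ** 0.5
    clusters = []
    for _ in range(K):
        y = [rng.gauss(0.0, 1.0)]
        for _ in range(n - 1):
            # Stationary AR-1 recursion: keeps unit variance at every time point.
            y.append(alpha * y[-1] + innovation_sd * rng.gauss(0.0, 1.0))
        clusters.append(y)
    return clusters


def lag1_correlation(clusters):
    """Moment estimate of the lag-1 correlation, assuming mean 0 and variance 1."""
    products = [y[j] * y[j + 1] for y in clusters for j in range(len(y) - 1)]
    return sum(products) / len(products)


if __name__ == "__main__":
    rng = random.Random(2014)  # arbitrary seed, for reproducibility only
    data = simulate_ar1_clusters(K=100, n=4, alpha=0.3, rng=rng)
    print(ar1_correlation(4, 0.3))
    print(lag1_correlation(data))
```

With K = 100 clusters of size n = 4, the lag-1 estimate should fall near 0.3 but will vary noticeably from seed to seed, which is one reason selection criteria can struggle to separate EXCH from AR-1 in small samples.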

theoretical constraints of GEE as well as new challenges emerging from practical applications in clinical trials or biomedical studies.

In addition, current research of interest related to GEE also includes a robust and optimal model selection criterion of GEE under missing at random (MAR) or missing not at random (MNAR) [93, 94]; sample size/power calculation for correlated sparse or overdispersed count data or longitudinal data with small samples [57–60]; GEE with improved performance under informative cluster size and/or MAR and/or small sample size [95–98]; and GEE for high-dimensional longitudinal data [99]. Although GEE has attractive features, flexible application, and easy implementation in software, it should be applied cautiously in practice, depending on the study design, the data structure, and the research goals.

Conflict of Interests

The author declares that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The author was supported by a grant from the Penn State CTSI. The project was supported by the National Center for Research Resources and the National Center for Advancing Translational Sciences, National Institutes of Health, through Grant 5 UL1 RR0330184-04. The content is solely the responsibility of the author and does not represent the views of the NIH.

References

[1] Z. Feng, P. Diehr, A. Peterson, and D. McLerran, “Selected statistical issues in group randomized trials,” Annual Review of Public Health, vol. 22, pp. 167–187, 2001.

[2] G. Fitzmaurice, N. M. Laird, and J. H. Ware, Applied Longitudinal Analysis, John Wiley & Sons, 2004.

[3] J. W. Hardin and J. M. Hilbe, Generalized Estimating Equations, Chapman and Hall/CRC Press, Boca Raton, Fla, USA, 2003.

[4] R. F. Potthoff and S. N. Roy, “A generalized multivariate analysis of variance model useful especially for growth curve problems,” Biometrika, vol. 51, pp. 313–326, 1964.


[5] L. M. Friedman, C. D. Furberg, and D. L. DeMets, Fundamentals of Clinical Trials, Springer, New York, NY, USA, 3rd edition, 1989.

[6] K. Y. Liang and S. L. Zeger, “Longitudinal data analysis using generalized linear models,” Biometrika, vol. 73, pp. 13–22, 1986.

[7] M. Crowder, “On the use of a working correlation matrix in using generalised linear models for repeated measures,” Biometrika, vol. 82, no. 2, pp. 407–410, 1995.

[8] R. W. Wedderburn, “Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method,” Biometrika, vol. 61, pp. 439–447, 1974.

[9] P. Diggle, P. Heagerty, K. Y. Liang, and S. L. Zeger, Analysis of Longitudinal Data, Oxford University Press, Oxford, UK, 2002.

[10] G. Fitzmaurice, M. Davidian, G. Verbeke, and G. Molenberghs, Longitudinal Data Analysis, Chapman & Hall/CRC Press, 2008.

[11] D. Hedeker and R. D. Gibbons, Analysis of Longitudinal Data, John Wiley & Sons, 2006.

[12] P. McCullagh and J. A. Nelder, Generalized Linear Models, Chapman & Hall, London, UK, 1989.

[13] N. R. Chaganty and H. Joe, “Range of correlation matrices for dependent Bernoulli random variables,” Biometrika, vol. 93, no. 1, pp. 197–206, 2006.

[14] R. T. Sabo and N. R. Chaganty, “What can go wrong when ignoring correlation bounds in the use of generalized estimating equations,” Statistics in Medicine, vol. 29, no. 24, pp. 2501–2507, 2010.

[15] B. C. Sutradhar and K. Das, “On the efficiency of regression estimators in generalised linear models for longitudinal data,” Biometrika, vol. 86, no. 2, pp. 459–465, 1999.

[16] Y.-G. Wang and V. Carey, “Working correlation structure misspecification, estimation and covariate design: implications for generalised estimating equations performance,” Biometrika, vol. 90, no. 1, pp. 29–41, 2003.

[17] S. R. Lipsitz, G. Molenberghs, G. M. Fitzmaurice, and J. Ibrahim, “GEE with Gaussian estimation of the correlations when data are incomplete,” Biometrics, vol. 56, no. 2, pp. 528–536, 2000.

[18] Y.-G. Wang and V. J. Carey, “Unbiased estimating equations from working correlation models for irregularly timed repeated measures,” Journal of the American Statistical Association, vol. 99, no. 467, pp. 845–853, 2004.

[19] A. Qu and B. G. Lindsay, “Building adaptive estimating equations when inverse of covariance estimation is difficult,” Journal of the Royal Statistical Society B: Statistical Methodology, vol. 65, no. 1, pp. 127–142, 2003.

[20] S. R. Lipsitz and G. M. Fitzmaurice, “Estimating equations for measures of association between repeated binary responses,” Biometrics, vol. 52, no. 3, pp. 903–912, 1996.

[21] Y. Lee and J. A. Nelder, “Conditional and marginal models: another view,” Statistical Science, vol. 19, no. 2, pp. 219–238, 2004.

[22] Y. Lee and J. A. Nelder, “Likelihood inference for models with unobservables: another view,” Statistical Science, vol. 24, no. 3, pp. 255–269, 2009.

[23] A. Qu, B. G. Lindsay, and B. Li, “Improving generalised estimating equations using quadratic inference functions,” Biometrika, vol. 87, no. 4, pp. 823–836, 2000.

[24] G. Kauermann and R. J. Carroll, “A note on the efficiency of sandwich covariance matrix estimation,” Journal of the American Statistical Association, vol. 96, no. 456, pp. 1387–1396, 2001.

[25] Y. G. Wang and L. Y. Hin, “Modeling strategies in longitudinal data analysis: covariate, variance function and correlation structure selection,” Computational Statistics and Data Analysis, vol. 54, no. 12, pp. 3359–3370, 2010.

[26] W. Pan, “Goodness-of-fit tests for GEE with correlated binary data,” Scandinavian Journal of Statistics, vol. 29, no. 1, pp. 101–110, 2002.

[27] A. M. Wood, I. R. White, and P. Royston, “How should variable selection be performed with multiply imputed data?” Statistics in Medicine, vol. 27, no. 17, pp. 3227–3246, 2008.

[28] M. D. Begg and M. K. Parides, “Separation of individual-level and cluster-level covariate effects in regression analysis of correlated data,” Statistics in Medicine, vol. 22, no. 16, pp. 2591–2602, 2003.

[29] L. Y. Hin, V. J. Carey, and Y. G. Wang, “Criteria for working-correlation-structure selection in GEE: assessment via simulation,” The American Statistician, vol. 61, no. 4, pp. 360–364, 2007.

[30] J. X. Pan and G. Mackenzie, “On modelling mean-covariance structures in longitudinal studies,” Biometrika, vol. 90, no. 1, pp. 239–244, 2003.

[31] M. Davidian and R. J. Carroll, “Variance function estimation,” Journal of the American Statistical Association, vol. 82, no. 400, pp. 1079–1091, 1987.

[32] M. Pourahmadi, “Joint mean-covariance models with applications to longitudinal data: unconstrained parameterisation,” Biometrika, vol. 86, no. 3, pp. 677–690, 1999.

[33] S. Konishi and G. Kitagawa, “Generalised information criteria in model selection,” Biometrika, vol. 83, no. 4, pp. 875–890, 1996.

[34] B. Zhang, “Summarizing the goodness of fit of generalized linear models for longitudinal data,” Statistics in Medicine, vol. 19, pp. 1265–1275, 2000.

[35] A. Rotnitzky and N. P. Jewell, “Hypothesis testing of regression parameters in semiparametric generalized linear models for cluster correlated data,” Biometrika, vol. 77, no. 3, pp. 485–497, 1990.

[36] J. Shults and N. R. Chaganty, “Analysis of serially correlated data using quasi-least squares,” Biometrics, vol. 54, no. 4, pp. 1622–1630, 1998.

[37] V. J. Carey and Y.-G. Wang, “Working covariance model selection for generalized estimating equations,” Statistics in Medicine, vol. 30, no. 26, pp. 3117–3124, 2011.

[38] W. Pan, “Akaike’s information criterion in generalized estimating equations,” Biometrics, vol. 57, no. 1, pp. 120–125, 2001.

[39] H. Akaike, “Information theory and an extension of the maximum likelihood principle,” in Proceedings of the 2nd International Symposium on Information Theory, vol. 15, pp. 267–281, 1973.

[40] J. A. Nelder and Y. Lee, “Likelihood, quasi-likelihood and pseudolikelihood: some comparisons,” Journal of the Royal Statistical Society B, vol. 54, no. 1, pp. 273–284, 1992.

[41] J. Cui, “QIC program and model selection in GEE analyses,” The Stata Journal, vol. 7, no. 2, pp. 209–220, 2007.

[42] J. Cui and G. Qian, “Selection of working correlation structure and best model in GEE analyses of longitudinal data,” Communications in Statistics: Simulation and Computation, vol. 36, no. 4–6, pp. 987–996, 2007.

[43] L. Y. Hin and Y. G. Wang, “Working-correlation-structure identification in generalized estimating equations,” Statistics in Medicine, vol. 28, no. 4, pp. 642–658, 2009.

[44] J. A. Nelder and D. Pregibon, “An extended quasi-likelihood function,” Biometrika, vol. 74, no. 2, pp. 221–232, 1987.


[45] M. Wang, M. Kong, and S. Datta, “Inference for marginal linear models for clustered longitudinal data with potentially informative cluster sizes,” Statistical Methods in Medical Research, vol. 20, no. 4, pp. 347–367, 2011.

[46] E. Cantoni, J. M. Flemming, and E. Ronchetti, “Variable selection for marginal longitudinal generalized linear models,” Biometrics: Journal of the International Biometric Society, vol. 61, no. 2, pp. 507–514, 2005.

[47] Y.-G. Wang and X. Lin, “Effects of variance-function misspecification in analysis of longitudinal data,” Biometrics, vol. 61, no. 2, pp. 413–421, 2005.

[48] N. R. Chaganty and H. Joe, “Efficiency of generalized estimating equations for binary responses,” Journal of the Royal Statistical Society, Series B: Statistical Methodology, vol. 66, no. 4, pp. 851–860, 2004.

[49] M. Gosho, C. Hamada, and I. Yoshimura, “Criterion for the selection of a working correlation structure in the generalized estimating equation approach for longitudinal balanced data,” Communications in Statistics, vol. 40, no. 21, pp. 3839–3856, 2011.

[50] M. Gosho, C. Hamada, and I. Yoshimura, “Selection of working correlation structure in weighted generalized estimating equation method for incomplete longitudinal data,” Communications in Statistics, vol. 43, no. 1, pp. 62–81, 2014.

[51] M. J. Jang, Working correlation selection in generalized estimating equations [Dissertation], University of Iowa, 2011.

[52] J. Chen and N. A. Lazar, “Selection of working correlation structure in generalized estimating equations via empirical likelihood,” Journal of Computational and Graphical Statistics, vol. 21, no. 1, pp. 18–41, 2012.

[53] P. M. Westgate, “A bias-corrected covariance estimator for improved inference when using an unstructured correlation with quadratic inference functions,” Statistics and Probability Letters, vol. 83, no. 6, pp. 1553–1558, 2013.

[54] P. M. Westgate, “Criterion for the simultaneous selection of a working correlation structure and either generalized estimating equations or the quadratic inference function approach,” Biometrical Journal, vol. 56, no. 3, pp. 461–476, 2014.

[55] P. M. Westgate, “Improving the correlation structure selection approach for generalized estimating equations and balanced longitudinal data,” Statistics in Medicine, vol. 33, no. 13, pp. 2222–2237, 2014.

[56] J. Ye, “On measuring and correcting the effects of data mining and model selection,” Journal of the American Statistical Association, vol. 93, no. 441, pp. 120–131, 1998.

[57] J. J. Shuster, Practical Handbook of Sample Size Guidelines for Clinical Trials, CRC Press, Boca Raton, Fla, USA, 1993.

[58] G. Liu and K.-Y. Liang, “Sample size calculations for studies with correlated observations,” Biometrics, vol. 53, no. 3, pp. 937–947, 1997.

[59] W. J. Shih, “Sample size and power calculations for periodontal and other studies with clustered samples using the method of generalized estimating equations,” Biometrical Journal, vol. 39, no. 8, pp. 899–908, 1997.

[60] S. R. Lipsitz and G. M. Fitzmaurice, “Sample size for repeated measures studies with binary responses,” Statistics in Medicine, vol. 13, no. 12, pp. 1233–1239, 1994.

[61] W. Pan, “Sample size and power calculations with correlated binary data,” Controlled Clinical Trials, vol. 22, no. 3, pp. 211–227, 2001.

[62] N. Breslow, “Tests of hypotheses in overdispersed Poisson regression and other quasi likelihood models,” Journal of the American Statistical Association, vol. 85, pp. 565–571, 1990.

[63] E. W. Lee and N. Dubin, “Estimation and sample size considerations for clustered binary responses,” Statistics in Medicine, vol. 13, no. 12, pp. 1241–1252, 1994.

[64] D. J. Sargent, J. A. Sloan, and S. S. Cha, “Sample size and design considerations for phase II clinical trials with correlated observations,” Controlled Clinical Trials, vol. 20, no. 3, pp. 242–252, 1999.

[65] C. S. Li, “Semiparametric negative binomial regression models,” Communications in Statistics: Simulation and Computation, vol. 39, no. 3, pp. 475–486, 2010.

[66] W. H. Greene, “Accounting for excess zeros and sample selection in Poisson and negative binomial regression models,” Tech. Rep., New York University, 1994.

[67] P. Lambert, “Modeling of repeated series of count data measured at unequally spaced times,” Applied Statistics, vol. 45, pp. 31–38, 1996.

[68] M. S. Pepe and G. L. Anderson, “A cautionary note on inference for marginal regression models with longitudinal data and general correlated response data,” Communications in Statistics, Series B, vol. 23, pp. 939–951, 1994.

[69] M. Wang and Q. Long, “Modified robust variance estimator for generalized estimating equations with improved small-sample performance,” Statistics in Medicine, vol. 30, no. 11, pp. 1278–1291, 2011.

[70] M. Taljaard, A. D. McRae, C. Weijer et al., “Inadequate reporting of research ethics review and informed consent in cluster randomised trials: review of random sample of published trials,” British Medical Journal, vol. 342, Article ID d2496, 2011.

[71] L. A. Mancl and T. A. DeRouen, “A covariance estimator for GEE with improved small-sample properties,” Biometrics, vol. 57, no. 1, pp. 126–134, 2001.

[72] M. P. Fay and B. I. Graubard, “Small-sample adjustments for Wald-type tests using sandwich estimators,” Biometrics, vol. 57, no. 4, pp. 1198–1206, 2001.

[73] W. Pan, “On the robust variance estimator in generalised estimating equations,” Biometrika, vol. 88, no. 3, pp. 901–906, 2001.

[74] W. Pan and M. M. Wall, “Small-sample adjustments in using the sandwich variance estimator in generalized estimating equations,” Statistics in Medicine, vol. 21, no. 10, pp. 1429–1441, 2002.

[75] X. Guo, W. Pan, J. E. Connett, P. J. Hannan, and S. A. French, “Small-sample performance of the robust score test and its modifications in generalized estimating equations,” Statistics in Medicine, vol. 24, no. 22, pp. 3479–3495, 2005.

[76] D. M. Farewell, “Marginal analyses of longitudinal data with an informative pattern of observations,” Biometrika, vol. 97, no. 1, pp. 65–78, 2010.

[77] J. D. Beck, T. Sharp, G. G. Koch, and S. Offenbacher, “A 5-year study of attachment loss and tooth loss in community-dwelling older adults,” Journal of Periodontal Research, vol. 32, no. 6, pp. 516–523, 1997.

[78] S. J. Arbes Jr., H. Agustsdottir, and G. D. Slade, “Environmental tobacco smoke and periodontal disease in the United States,” American Journal of Public Health, vol. 91, no. 2, pp. 253–257, 2001.

[79] J. M. Robins, A. Rotnitzky, and L. P. Zhao, “Analysis of semiparametric regression models for repeated outcomes in the presence of missing data,” Journal of the American Statistical Association, vol. 90, pp. 106–121, 1995.

[80] E. B. Hoffman, P. K. Sen, and C. R. Weinberg, “Within-cluster resampling,” Biometrika, vol. 88, no. 4, pp. 1121–1134, 2001.

[81] J. M. Williamson, S. Datta, and G. A. Satten, “Marginal analyses of clustered data when cluster size is informative,” Biometrics, vol. 59, no. 1, pp. 36–42, 2003.

[82] E. Benhin, J. N. Rao, and A. J. Scott, “Mean estimating equation approach to analysing cluster-correlated data with nonignorable cluster sizes,” Biometrika, vol. 92, no. 2, pp. 435–450, 2005.

[83] X. J. Cong, G. Yin, and Y. Shen, “Marginal analysis of correlated failure time data with informative cluster sizes,” Biometrics, vol. 63, no. 3, pp. 663–672, 2007.

[84] T. C. Chiang and K. Y. Lee, “Efficient estimation methods for informative cluster size data,” Statistica Sinica, vol. 80, pp. 121–123, 2008.

[85] M. Pavlou, S. R. Seaman, and A. J. Copas, “An examination of a method for marginal inference when the cluster size is informative,” Statistica Sinica, vol. 23, no. 2, pp. 791–801, 2013.

[86] S. R. Seaman, M. Pavlou, and A. J. Copas, “Methods for observed-cluster inference when cluster size is informative: a review and clarifications,” Biometrics, vol. 70, no. 2, pp. 449–456, 2014.

[87] Z. Chen, B. Zhang, and P. S. Albert, “A joint modeling approach to data with informative cluster size: robustness to the cluster size model,” Statistics in Medicine, vol. 30, no. 15, pp. 1825–1836, 2011.

[88] Y. Huang and B. Leroux, “Informative cluster sizes for subcluster-level covariates and weighted generalized estimating equations,” Biometrics, vol. 67, no. 3, pp. 843–851, 2011.

[89] B. F. Kurland, L. L. Johnson, B. L. Egleston, and P. H. Diehr, “Longitudinal data with follow-up truncated by death: match the analysis method to research aims,” Statistical Science, vol. 24, no. 2, pp. 211–222, 2009.

[90] J. M. Neuhaus and C. E. McCulloch, “Estimation of covariate effects in generalized linear mixed models with informative cluster sizes,” Biometrika, vol. 98, no. 1, pp. 147–162, 2011.

[91] S. R. Lipsitz, G. M. Fitzmaurice, E. J. Orav, and N. M. Laird, “Performance of generalized estimating equations in practical situations,” Biometrics, vol. 50, no. 1, pp. 270–278, 1994.

[92] D. B. Hall and T. A. Severini, “Extended generalized estimating equations for clustered data,” Journal of the American Statistical Association, vol. 93, no. 444, pp. 1365–1375, 1998.

[93] C.-W. Shen and Y.-H. Chen, “Model selection for generalized estimating equations accommodating dropout missingness,” Biometrics, vol. 68, no. 4, pp. 1046–1054, 2012.

[94] C.-W. Shen and Y.-H. Chen, “Model selection of generalized estimating equations with multiply imputed longitudinal data,” Biometrical Journal, vol. 55, no. 6, pp. 899–911, 2013.

[95] D. B. Rubin, “Inference and missing data,” Biometrika, vol. 63, no. 3, pp. 581–592, 1976.

[96] R. J. Little and D. B. Rubin, Statistical Analysis with Missing Data, Wiley, New York, NY, USA.

[97] P. Diggle, D. Farewell, and R. Henderson, “Analysis of longitudinal data with drop-out: objectives, assumptions and a proposal,” Journal of the Royal Statistical Society C, vol. 56, no. 5, pp. 499–550, 2007.

[98] A. J. Copas and S. R. Seaman, “Bias from the use of generalized estimating equations to analyze incomplete longitudinal binary data,” Journal of Applied Statistics, vol. 37, no. 6, pp. 911–922, 2010.

[99] L. Wang, J. Zhou, and A. Qu, “Penalized generalized estimating equations for high-dimensional longitudinal data analysis,” Biometrics, vol. 68, no. 2, pp. 353–360, 2012.

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 9: Review Article Generalized Estimating Equations in ...downloads.hindawi.com/archive/2014/303728.pdfrecent developments of GEE. As is well known, GEE has several de ning features [

Advances in Statistics 9

[5] LM Friedman CD Furberg andD LDeMets Fundamentalsof Clinical Trials Springer New York NY USA 3nd edition1989

[6] K Y Liang and S L Zeger ldquoA comparison of two bias-correctedcovariance estimators for generalized estimating equationsrdquoBiometrika vol 73 pp 13ndash22 1986

[7] M Crowder ldquoOn the use of a working correlation matrixin using generalised linear models for repeated measuresrdquoBiometrika vol 82 no 2 pp 407ndash410 1995

[8] R W Wedderburn ldquoQuasi-likelihood functions generalizedlinearmodels and the Gauss-Newtonmethodrdquo Biometrika vol61 pp 439ndash447 1974

[9] P Diggle P Heagerty K Y Liang and S L Zeger Analysis ofLongitudinal Data Oxford University Press Oxford UK 2002

[10] G Fitzmaurice M Davidian G Verbeke and G MolenberghsLongitudinal Data Anlaysis Chapman ampHallCRC Press 2008

[11] D Hedeker and R D Gibbons Analysis of Longitudinal DataJohn Wiley amp Sons 2006

[12] P McCullagh and J A Nelder Generalized Linear ModelsChapman amp Hall London UK 1989

[13] N R Chaganty and H Joe ldquoRange of correlation matrices fordependent Bernoulli random variablesrdquo Biometrika vol 93 no1 pp 197ndash206 2006

[14] R T Sabo and N R Chaganty ldquoWhat can go wrong whenignoring correlation bounds in the use of generalized estimatingequationsrdquo Statistics in Medicine vol 29 no 24 pp 2501ndash25072010

[15] B C Sutradhar and K Das ldquoOn the efficiency of regressionestimators in generalised linear models for longitudinal datardquoBiometrika vol 86 no 2 pp 459ndash465 1999

[16] Y-G Wang and V Carey ldquoWorking correlation structuremisspecification estimation and covariate design implicationsfor generalised estimating equations performancerdquo Biometrikavol 90 no 1 pp 29ndash41 2003

[17] S R Lipsitz GMolenberghsGM Fitzmaurice and J IbrahimldquoGEE with Gaussian estimation of the correlations when dataare incompleterdquo Biometrics vol 56 no 2 pp 528ndash536 2000

[18] Y-G Wang and V J Carey ldquoUnbiased estimating equationsfromworking correlationmodels for irregularly timed repeatedmeasuresrdquo Journal of the American Statistical Association vol99 no 467 pp 845ndash853 2004

[19] A Qu and B G Lindsay ldquoBuilding adaptive estimating equa-tions when inverse of covariance estimation is difficultrdquo Journalof the Royal Statistical Society B Statistical Methodology vol 65no 1 pp 127ndash142 2003

[20] S R Lipsitz and G M Fitzmaurice ldquoEstimating equations formeasures of association between repeated binary responsesrdquoBiometrics vol 52 no 3 pp 903ndash912 1996

[21] Y Lee and J A Nelder ldquoConditional and marginal modelsanother viewrdquo Statistical Science vol 19 no 2 pp 219ndash238 2004

[22] Y Lee and J A Nelder ldquoLikelihood inference for models withunobservables another viewrdquo Statistical Science vol 24 no 3pp 255ndash269 2009

[23] A Qu B G Lindsay and B Li ldquoImproving generalised estimat-ing equations using quadratic inference functionsrdquo Biometrikavol 87 no 4 pp 823ndash836 2000

[24] G Kauermann and R J Carroll ldquoA note on the efficiencyof sandwich covariance matrix estimationrdquo Journal of theAmerican Statistical Association vol 96 no 456 pp 1387ndash13962001

[25] Y G Wang and L Y Hin ldquoModeling strategies in longitudinaldata analysis covariate variance function and correlationstructure selectionrdquoComputational Statistics andData Analysisvol 54 no 12 pp 3359ndash3370 2010

[26] W Pan ldquoGoodness-of-fit tests for GEE with correlated binarydatardquo Scandinavian Journal of Statistics vol 29 no 1 pp 101ndash110 2002

[27] A M Wood I R White and P Royston ldquoHow should variableselection be performed with multiply imputed datardquo Statisticsin Medicine vol 27 no 17 pp 3227ndash3246 2008

[28] M D Begg and M K Parides ldquoSeparation of individual-level and cluster-level covariate effects in regression analysis ofcorrelated datardquo Statistics in Medicine vol 22 no 16 pp 2591ndash2602 2003

[29] L Y Hin V J Carey and Y G Wang ldquoCriteria for working-correlation-structure selection in GEE assessment via simula-tionrdquoTheAmerican Statistician vol 61 no 4 pp 360ndash364 2007

[30] J X Pan and G Mackenzie ldquoOn modelling mean-covariancestructures in longitudinal studiesrdquo Biometrika vol 90 no 1 pp239ndash244 2003

[31] M Davidian and R J Carroll ldquoVariance function estimationrdquoJournal of the American Statistical Association vol 82 no 400pp 1079ndash1091 1987

[32] M Pourahmadi ldquoJoint mean-covariance models with appli-cations to longitudinal data unconstrained parameterisationrdquoBiometrika vol 86 no 3 pp 677ndash690 1999

[33] S Konishi and G Kitagawa ldquoGeneralised information criteriainmodel selectionrdquoBiometrika vol 83 no 4 pp 875ndash890 1996

[34] B Zhang ldquoSummarizing the goodness of fit o f generalizedlinear models for longitudinal datardquo Statistics in Medicine vol19 pp 1265ndash1275 2000

[35] A Rotnitzky and N P Jewell ldquoHypothesis testing of regressionparameters in semiparametric generalized linear models forcluster correlated datardquo Biometrika vol 77 no 3 pp 485ndash4971990

[36] J Shults andN R Chaganty ldquoAnalysis of serially correlated datausing quasi-least squaresrdquo Biometrics vol 54 no 4 pp 1622ndash1630 1998

[37] V J Carey and Y-G Wang ldquoWorking covariance modelselection for generalized estimating equationsrdquo Statistics inMedicine vol 30 no 26 pp 3117ndash3124 2011

[38] W Pan ldquoAkaikersquos information criterion in generalized estimat-ing equationsrdquo Biometrics vol 57 no 1 pp 120ndash125 2001

[39] H Akaike ldquoInformation theory and an extension of themaximum likelihood principlerdquo in Proceedings of the 2ndInternational Symposium on Information Theory vol 15 pp267ndash281 1973

[40] J A Nelder and Y Lee ldquoLikelihood quasi-likelihood andpseudolikelihood some comparisonsrdquo Journal of the RoyalStatistical Society B vol 54 no 1 pp 273ndash284 1992

[41] J Cui ldquoQIC program andmodel selection in GEE analysesrdquoTheStata Journal vol 7 no 2 pp 209ndash220 2007

[42] J Cui and G Qian ldquoSelection of working correlation structureand best model in GEE analyses of longitudinal datardquo Commu-nications in StatisticsmdashSimulation and Computation vol 36 no4ndash6 pp 987ndash996 2007

[43] L Y Hin and Y G Wang ldquoWorking-correlation-structureidentification in generalized estimating equationsrdquo Statistics inMedicine vol 28 no 4 pp 642ndash658 2009

[44] J A Nelder and D Pregibon ldquoAn extended quasi-likelihoodfunctionrdquo Biometrika vol 74 no 2 pp 221ndash232 1987

10 Advances in Statistics

[45] MWang M Kong and S Datta ldquoInference for marginal linearmodels for clustered longitudinal datawith potentially informa-tive cluster sizesrdquo Statistical Methods in Medical Research vol20 no 4 pp 347ndash367 2011

[46] E Cantoni J M Flemming and E Ronchetti ldquoVariableselection for marginal longitudinal generalized linear modelsrdquoBiometrics Journal of the International Biometric Society vol 61no 2 pp 507ndash514 2005

[47] Y-G Wang and X Lin ldquoEffects of variance-function misspeci-fication in analysis of longitudinal datardquo Biometrics vol 61 no2 pp 413ndash421 2005

[48] N R Chaganty andH Joe ldquoEfficiency of generalized estimatingequations for binary responsesrdquo Journal of the Royal StatisticalSociety Series B Statistical Methodology vol 66 no 4 pp 851ndash860 2004

[49] M Gosho C Hamada and I Yoshimura ldquoCriterion for theselection of a working correlation structure in the generalizedestimating equation approach for longitudinal balanced datardquoCommunications in Statistics vol 40 no 21 pp 3839ndash38562011

[50] M Gosho C Hamada and I Yoshimura ldquoSelection of workingcorrelation structure in weighted generalized estimating equa-tion method for incomplete longitudinal datardquo Communica-tions in Statistics vol 43 no 1 pp 62ndash81 2014

[51] M J JangWorking correlation selection in generalized estimatingequations [Dissertation] University of Iowa 2011

[52] J Chen and N A Lazar ldquoSelection of working correlationstructure in generalized estimating equations via empiricallikelihoodrdquo Journal of Computational and Graphical Statisticsvol 21 no 1 pp 18ndash41 2012

[53] P M Westgate ldquoA bias-corrected covariance estimator forimproved inference when using an unstructured correlationwith quadratic inference functionsrdquo Statistics and ProbabilityLetters vol 83 no 6 pp 1553ndash1558 2013

[54] P M Westgate ldquoCriterion for the simultaneous selection of aworking correlation structure and either generalized estimatingequations or the quadratic inference function approachrdquo Bio-metrical Journal vol 56 no 3 pp 461ndash476 2014

[55] P M Westgate ldquoImproving the correlation structure selectionapproach for generalized estimating equations and balancedlongitudinal datardquo Statistics in Medicine vol 33 no 13 pp2222ndash2237 2014

[56] J Ye ldquoOn measuring and correcting the effects of data miningand model selectionrdquo Journal of the American Statistical Associ-ation vol 93 no 441 pp 120ndash131 1998

[57] J J Shuster Practical Handbook of Sample Size Guidelines forClinical Trials CRC Press Boca Raton Fla USA 1993

[58] G Liu and K-Y Liang ldquoSample size calculations for studieswith correlated observationsrdquo Biometrics vol 53 no 3 pp 937ndash947 1997

[59] W J Shih ldquoSample size and power calculations for periodontaland other studies with clustered samples using the method ofgeneralized estimating equationsrdquo Biometrical Journal vol 39no 8 pp 899ndash908 1997

[60] S R Lipsitz and G M Fitzmaurice ldquoSample size for repeatedmeasures studies with binary responsesrdquo Statistics in Medicinevol 13 no 12 pp 1233ndash1239 1994

[61] W Pan ldquoSample size and power calculations with correlatedbinary datardquoControlled Clinical Trials vol 22 no 3 pp 211ndash2272001

[62] N Breslow ldquoTests of hypotheses in overdispersed Poissonregression and other quasi likelihood modelsrdquo Journal of theAmerican Statistical Association vol 85 pp 565ndash571 1990

[63] E W Lee and N Dubin ldquoEstimation and sample size consider-ations for clustered binary responsesrdquo Statistics inMedicine vol13 no 12 pp 1241ndash1252 1994

[64] D J Sargent J A Sloan and S S Cha ldquoSample size anddesign considerations for phase II clinical trials with correlatedobservationsrdquo Controlled Clinical Trials vol 20 no 3 pp 242ndash252 1999

[65] C S Li ldquoSemiparametric negative binomial regressionmodelsrdquoCommunications in Statistics Simulation and Computation vol39 no 3 pp 475ndash486 2010

[66] WHGreene ldquoAccounting for excess zeros and sample selectionin Poisson and negative binomial regression modelsrdquo TechRep New York University 1994

[67] P Lambert ldquoModeling of repeated series of count data mea-sured at unequally spaced timesrdquo Applied Statistics vol 45 pp31ndash38 1996

[68] M S Pepe andG L Anderson ldquoA cautionary note on in ferencefor marginal regression models with longitudinal data andgeneral correlated response datardquo Communications in StatisticsSeries B vol 23 pp 939ndash951 1994

[69] M. Wang and Q. Long, “Modified robust variance estimator for generalized estimating equations with improved small-sample performance,” Statistics in Medicine, vol. 30, no. 11, pp. 1278–1291, 2011.

[70] M. Taljaard, A. D. McRae, C. Weijer et al., “Inadequate reporting of research ethics review and informed consent in cluster randomised trials: review of random sample of published trials,” British Medical Journal, vol. 342, Article ID d2496, 2011.

[71] L. A. Mancl and T. A. DeRouen, “A covariance estimator for GEE with improved small-sample properties,” Biometrics, vol. 57, no. 1, pp. 126–134, 2001.

[72] M. P. Fay and B. I. Graubard, “Small-sample adjustments for Wald-type tests using sandwich estimators,” Biometrics, vol. 57, no. 4, pp. 1198–1206, 2001.

[73] W. Pan, “On the robust variance estimator in generalised estimating equations,” Biometrika, vol. 88, no. 3, pp. 901–906, 2001.

[74] W. Pan and M. M. Wall, “Small-sample adjustments in using the sandwich variance estimator in generalized estimating equations,” Statistics in Medicine, vol. 21, no. 10, pp. 1429–1441, 2002.

[75] X. Guo, W. Pan, J. E. Connett, P. J. Hannan, and S. A. French, “Small-sample performance of the robust score test and its modifications in generalized estimating equations,” Statistics in Medicine, vol. 24, no. 22, pp. 3479–3495, 2005.

[76] D. M. Farewell, “Marginal analyses of longitudinal data with an informative pattern of observations,” Biometrika, vol. 97, no. 1, pp. 65–78, 2010.

[77] J. D. Beck, T. Sharp, G. G. Koch, and S. Offenbacher, “A 5-year study of attachment loss and tooth loss in community-dwelling older adults,” Journal of Periodontal Research, vol. 32, no. 6, pp. 516–523, 1997.

[78] S. J. Arbes Jr., H. Agustsdottir, and G. D. Slade, “Environmental tobacco smoke and periodontal disease in the United States,” American Journal of Public Health, vol. 91, no. 2, pp. 253–257, 2001.

[79] J. M. Robins, A. Rotnitzky, and L. P. Zhao, “Analysis of semiparametric regression models for repeated outcomes in the presence of missing data,” Journal of the American Statistical Association, vol. 90, pp. 106–121, 1995.

[80] E. B. Hoffman, P. K. Sen, and C. R. Weinberg, “Within-cluster resampling,” Biometrika, vol. 88, no. 4, pp. 1121–1134, 2001.

[81] J. M. Williamson, S. Datta, and G. A. Satten, “Marginal analyses of clustered data when cluster size is informative,” Biometrics, vol. 59, no. 1, pp. 36–42, 2003.

[82] E. Benhin, J. N. Rao, and A. J. Scott, “Mean estimating equation approach to analysing cluster-correlated data with nonignorable cluster sizes,” Biometrika, vol. 92, no. 2, pp. 435–450, 2005.

[83] X. J. Cong, G. Yin, and Y. Shen, “Marginal analysis of correlated failure time data with informative cluster sizes,” Biometrics, vol. 63, no. 3, pp. 663–672, 2007.

[84] T. C. Chiang and K. Y. Lee, “Efficient estimation methods for informative cluster size data,” Statistica Sinica, vol. 80, pp. 121–123, 2008.

[85] M. Pavlou, S. R. Seaman, and A. J. Copas, “An examination of a method for marginal inference when the cluster size is informative,” Statistica Sinica, vol. 23, no. 2, pp. 791–801, 2013.

[86] S. R. Seaman, M. Pavlou, and A. J. Copas, “Methods for observed-cluster inference when cluster size is informative: a review and clarifications,” Biometrics, vol. 70, no. 2, pp. 449–456, 2014.

[87] Z. Chen, B. Zhang, and P. S. Albert, “A joint modeling approach to data with informative cluster size: robustness to the cluster size model,” Statistics in Medicine, vol. 30, no. 15, pp. 1825–1836, 2011.

[88] Y. Huang and B. Leroux, “Informative cluster sizes for subcluster-level covariates and weighted generalized estimating equations,” Biometrics, vol. 67, no. 3, pp. 843–851, 2011.

[89] B. F. Kurland, L. L. Johnson, B. L. Egleston, and P. H. Diehr, “Longitudinal data with follow-up truncated by death: match the analysis method to research aims,” Statistical Science, vol. 24, no. 2, pp. 211–222, 2009.

[90] J. M. Neuhaus and C. E. McCulloch, “Estimation of covariate effects in generalized linear mixed models with informative cluster sizes,” Biometrika, vol. 98, no. 1, pp. 147–162, 2011.

[91] S. R. Lipsitz, G. M. Fitzmaurice, E. J. Orav, and N. M. Laird, “Performance of generalized estimating equations in practical situations,” Biometrics, vol. 50, no. 1, pp. 270–278, 1994.

[92] D. B. Hall and T. A. Severini, “Extended generalized estimating equations for clustered data,” Journal of the American Statistical Association, vol. 93, no. 444, pp. 1365–1375, 1998.

[93] C.-W. Shen and Y.-H. Chen, “Model selection for generalized estimating equations accommodating dropout missingness,” Biometrics, vol. 68, no. 4, pp. 1046–1054, 2012.

[94] C.-W. Shen and Y.-H. Chen, “Model selection of generalized estimating equations with multiply imputed longitudinal data,” Biometrical Journal, vol. 55, no. 6, pp. 899–911, 2013.

[95] D. B. Rubin, “Inference and missing data,” Biometrika, vol. 63, no. 3, pp. 581–592, 1976.

[96] R. J. Little and D. B. Rubin, Statistical Analysis with Missing Data, Wiley, New York, NY, USA.

[97] P. Diggle, D. Farewell, and R. Henderson, “Analysis of longitudinal data with drop-out: objectives, assumptions and a proposal,” Journal of the Royal Statistical Society C, vol. 56, no. 5, pp. 499–550, 2007.

[98] A. J. Copas and S. R. Seaman, “Bias from the use of generalized estimating equations to analyze incomplete longitudinal binary data,” Journal of Applied Statistics, vol. 37, no. 6, pp. 911–922, 2010.

[99] L. Wang, J. Zhou, and A. Qu, “Penalized generalized estimating equations for high-dimensional longitudinal data analysis,” Biometrics, vol. 68, no. 2, pp. 353–360, 2012.
