TRANSCRIPT
Introduction to the Generalized Estimating Equations and Its Applications in Small Cluster Randomized Trials
Fan Li
BIOSTAT 900 Seminar
November 11, 2016
1 / 24
Overview
Outline
Background
The Generalized Estimating Equations (GEEs)
Improved Small-sample Inference
Take-home Message
How do GEEs work? How can we improve small-sample inference, especially in cluster randomized trial (CRT) applications?
2 / 24
Cluster Randomized Trials (CRTs)
Randomizing clusters of subjects rather than independent subjects (convenience, ethics, contamination, etc.)
Intervention administered at the cluster level
Outcomes are measured at the individual level
Outcomes from subjects within the same cluster exhibit greater correlation than those from different clusters
Intraclass correlation coefficient (ICC) ranges from 0.01 to 0.1
Interest lies in evaluating the intervention effect
Small number of clusters with large cluster sizes
3 / 24
The Stop Colorectal Cancer (STOP CRC) Study
Studies the effect of an intervention to improve cancer screening
26 health clinics (n = 26) are allocated to either usual care or intervention (1-to-1 ratio)
Usual care - provided opportunistic/occasional screening
Intervention - an automated program for mailing testing kits with instructions to patients
The clinics contain variable numbers of patients (m_i):
Min 461 / Median 1426 / Max 3302
Primary outcome - the patient-level binary outcome (y_ij): completion status of the screening test within a year of study initiation
Baseline clinic- and individual-level covariates
Inference – estimand?
4 / 24
The Estimand from Conditional Models
y_i = (y_{i1}, …, y_{i m_i})^T - the collection of outcomes from clinic i
X_i - 'design' matrix (including intercept, treatment variable and baseline covariates) of cluster i
The generalized linear mixed model: g(E(y_i | X_i, b_i)) = X_i β + 1_{m_i} b_i
g(·) - a smooth, invertible link function
β - the regression coefficients (intervention effect)
b_i ~ N(0, σ_b^2) - Gaussian random effects
The estimand, defined by the corresponding component of β, typically has a cluster-specific (conditional) interpretation
y_ij | x_ij, b_i is assumed to follow an exponential family model (likelihood)
5 / 24
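The conditional-versus-marginal distinction can be made concrete with a small numerical sketch (all parameter values are hypothetical, not from STOP CRC): for a logistic random-intercept model, integrating the cluster-specific success probabilities over b_i yields a population-average log-odds ratio that is attenuated toward zero relative to the conditional β.

```python
import numpy as np

# Sketch: logit P(y=1 | x, b) = beta0 + beta1*x + b, b ~ N(0, sigma_b^2).
# The marginal (population-average) log-odds ratio is smaller in magnitude
# than the cluster-specific beta1. All values below are illustrative.
def marginal_prob(eta, sigma_b, n_nodes=50):
    """E_b[expit(eta + b)] via Gauss-Hermite quadrature for b ~ N(0, sigma_b^2)."""
    nodes, weights = np.polynomial.hermite_e.hermegauss(n_nodes)
    # hermegauss integrates against exp(-t^2/2); divide by sqrt(2*pi) to normalize
    p = 1.0 / (1.0 + np.exp(-(eta + sigma_b * nodes)))
    return np.sum(weights * p) / np.sqrt(2 * np.pi)

beta0, beta1, sigma_b = -1.0, 0.8, 1.5          # hypothetical values
p0 = marginal_prob(beta0, sigma_b)               # marginal P(y=1) at x = 0
p1 = marginal_prob(beta0 + beta1, sigma_b)       # marginal P(y=1) at x = 1
beta1_marginal = np.log(p1 / (1 - p1)) - np.log(p0 / (1 - p0))
```

The marginal coefficient is what a GEE with a logit link targets; the conditional β1 is what the GLMM targets.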
The Estimand from Marginal Models
Recall the basics of GLMs
Now let y_ij | x_ij follow an exponential family model with mean μ_ij = E(y_ij | x_ij) and variance ν_ij = h(μ_ij)/φ
h(·) - the mean-variance relationship
φ - dispersion parameter
Let μ_i = E(y_i | X_i) = (μ_{i1}, …, μ_{i m_i})^T
Use g(μ_i) = X_i β, allowing for non-zero covariance among the components of y_i; the model is parameterized by a marginal intervention effect (marginal with respect to what?)
Population-average intervention effect - more straightforward interpretation
To make inferences on β, how do we describe the correlation between components of y_i?
6 / 24
Generalized Estimating Equations (GEEs)
Define R_i(α) as the m_i × m_i "working" correlation matrix of y_i, with α an unknown nuisance parameter; then the "working" covariance is
V_i = var(y_i | X_i) = A_i^{1/2} R_i(α) A_i^{1/2} / φ
A_i = diag(h(μ_{i1}), …, h(μ_{i m_i}))
The GEEs are defined as
Σ_{i=1}^n U_i(β, φ, α) = Σ_{i=1}^n D_i^T V_i^{-1} (y_i − μ_i) = 0,
where D_i = ∂μ_i/∂β^T
Only the first two moments of y_ij are assumed - the quasi-likelihood score equation (Wedderburn, 1974)
The efficient score equation from a semiparametric restricted moment model (Tsiatis, 2006)
Iterative algorithms (Newton-Raphson) for estimation
7 / 24
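A minimal sketch of the estimating equation in its simplest case (identity link, h(μ) = 1, independence working correlation; the simulation settings are illustrative, not from the talk): the GEE then reduces to the OLS normal equations, so its solution matches ordinary least squares.

```python
import numpy as np

# With V_i = I and identity link, sum_i D_i^T V_i^{-1} (y_i - mu_i) = 0
# becomes (sum_i X_i^T X_i) beta = sum_i X_i^T y_i: the OLS normal equations.
rng = np.random.default_rng(0)
n_clusters, m = 20, 5
X = np.column_stack([np.ones(n_clusters * m),
                     rng.normal(size=(n_clusters * m, 2))])  # intercept + 2 covariates
beta_true = np.array([0.5, 1.0, -2.0])
y = X @ beta_true + rng.normal(size=X.shape[0])

XtX = np.zeros((3, 3))
Xty = np.zeros(3)
for i in range(n_clusters):
    Xi = X[i * m:(i + 1) * m]
    yi = y[i * m:(i + 1) * m]
    XtX += Xi.T @ Xi          # D_i^T V_i^{-1} D_i with V_i = I
    Xty += Xi.T @ yi
beta_gee = np.linalg.solve(XtX, Xty)
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
```

With a non-trivial working correlation, only the weighting inside the sum changes; the zero of the equation is found iteratively.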
Dealing with Nuisances
The working variance Vi contains the nuisance parameters α and φ
It is possible to "profile out" these nuisances within the iterative procedure
Moment-based estimators for nuisances:
Given a current estimate β̂, the Pearson residual is r̂_ij = (y_ij − μ̂_ij)/ν̂_ij^{1/2}, which is typically used to estimate
φ̂ = Σ_{i=1}^n Σ_{j=1}^{m_i} r̂_ij^2 / (N − p),
where N = Σ_{i=1}^n m_i and p is the dimension of β
What about α?
8 / 24
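A sketch of the moment estimator for the dispersion (assuming a Gaussian working model with h(μ) = 1, so the Pearson residual is simply y − μ̂, and using the common software convention that φ scales the variance; the simulation settings are invented):

```python
import numpy as np

# Simulate independent Gaussian responses with variance phi_true, fit the
# mean by least squares, and recover phi by sum of squared Pearson
# residuals over N - p.
rng = np.random.default_rng(1)
n_clusters, m, p = 100, 20, 2
N = n_clusters * m
X = np.column_stack([np.ones(N), rng.normal(size=N)])
beta_true, phi_true = np.array([1.0, 0.5]), 2.0
y = X @ beta_true + rng.normal(scale=np.sqrt(phi_true), size=N)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
r = y - X @ beta_hat                 # Pearson residuals (h(mu) = 1)
phi_hat = np.sum(r**2) / (N - p)     # sum_ij r_ij^2 / (N - p)
```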
Dealing with Nuisances - Cont’d
Choices of the working correlation structure
If R_i = I, the independence structure - no nuisances involved
If corr(y_ij, y_ij') = α for j ≠ j', then we end up with the exchangeable correlation. The nuisance can be estimated by
α̂ = φ̂ Σ_{i=1}^n Σ_{j>j'} r̂_ij r̂_ij' / { Σ_{i=1}^n m_i(m_i − 1)/2 − p }
If R_i is assumed unstructured, then (loosely speaking)
R̂_i(α) = (φ̂/n) Σ_{i=1}^n A_i^{-1/2} (y_i − μ_i)(y_i − μ_i)^T A_i^{-1/2}
Other types of correlation structures are available
In CRT applications, the exchangeable structure is often assumed
9 / 24
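The exchangeable moment estimator can be sketched as follows (the data-generating process and the use of fully standardized residuals are my assumptions; here p = 1 since only an intercept is fit):

```python
import numpy as np

# Exchangeable Gaussian clusters: y_ij = mu + b_i + eps_ij, so
# corr(y_ij, y_ij') = alpha = sb2 / (sb2 + se2) within a cluster.
rng = np.random.default_rng(2)
n_clusters, m = 200, 6
sb2, se2 = 1.0, 3.0
alpha_true = sb2 / (sb2 + se2)                     # = 0.25
b = rng.normal(scale=np.sqrt(sb2), size=(n_clusters, 1))
y = 1.0 + b + rng.normal(scale=np.sqrt(se2), size=(n_clusters, m))

mu_hat = y.mean()                                  # intercept-only fit, p = 1
phi_hat = np.sum((y - mu_hat)**2) / (n_clusters * m - 1)
r = (y - mu_hat) / np.sqrt(phi_hat)                # standardized residuals
# Moment estimator: sum residual cross-products over within-cluster pairs j > j'
pair_sums = np.array([np.sum(np.outer(ri, ri)[np.triu_indices(m, 1)]) for ri in r])
alpha_hat = pair_sums.sum() / (n_clusters * m * (m - 1) / 2 - 1)
```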
Modified Newton’s Algorithm
Initialize β̂ → compute φ̂(β̂) and α̂(β̂, φ̂(β̂)) → update β̂ by Newton's method → repeat the last two steps until convergence
Essentially we are solving for β̂ iteratively from
0 = Σ_{i=1}^n U_i(β, α̂(β, φ̂(β))) = Σ_{i=1}^n D_i^T(β) V_i^{-1}(β, α̂(β, φ̂(β))) (y_i − μ_i(β)),
with a working assumption on R_i(α).
Why are these efforts worthwhile?
It turns out that, under mild assumptions, the final solution β̂ is consistent even if R_i(α) is misspecified
10 / 24
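The alternating algorithm can be sketched in the identity-link Gaussian case with an exchangeable working correlation (all settings are illustrative, and the working covariance is taken as φR(α), a common software convention):

```python
import numpy as np

# Alternate: given beta, update (phi, alpha) by moments; given (phi, alpha),
# update beta by a Newton/Fisher-scoring step. Repeat until the step is tiny.
rng = np.random.default_rng(3)
n_clusters, m, p = 50, 8, 2
X = np.column_stack([np.ones(n_clusters * m), rng.normal(size=n_clusters * m)])
beta_true = np.array([1.0, 2.0])
b = np.repeat(rng.normal(scale=1.0, size=n_clusters), m)   # random intercepts
y = X @ beta_true + b + rng.normal(size=n_clusters * m)

Xc = X.reshape(n_clusters, m, p)
yc = y.reshape(n_clusters, m)

beta = np.linalg.lstsq(X, y, rcond=None)[0]     # initialize via OLS
for _ in range(25):
    resid = yc - Xc @ beta
    phi = np.sum(resid**2) / (n_clusters * m - p)           # dispersion
    r = resid / np.sqrt(phi)
    alpha = (sum(np.sum(np.outer(ri, ri)[np.triu_indices(m, 1)]) for ri in r)
             / (n_clusters * m * (m - 1) / 2 - p))           # exchangeable corr
    R = (1 - alpha) * np.eye(m) + alpha * np.ones((m, m))
    Vinv = np.linalg.inv(phi * R)                            # working V_i^{-1}
    A = sum(Xi.T @ Vinv @ Xi for Xi in Xc)
    u = sum(Xi.T @ Vinv @ (yi - Xi @ beta) for Xi, yi in zip(Xc, yc))
    step = np.linalg.solve(A, u)
    beta = beta + step
    if np.max(np.abs(step)) < 1e-10:
        break
```

For the identity link each β-update is an exact GLS solve, so the loop converges in a handful of iterations.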
Asymptotics
(A.1) Sufficient moments of components of X_i and y_i exist
(A.2) φ̂ is root-n consistent given β
(A.3) α̂ is root-n consistent given β and φ
(A.4) |∂α̂(β, φ)/∂φ| = O_p(1)
Under (A.1)-(A.4), β̂ is consistent for the truth β_0 and asymptotically normal with the sandwich covariance matrix
V_sand = (Σ_{i=1}^n D_i^T V_i^{-1} D_i)^{-1} (Σ_{i=1}^n D_i^T V_i^{-1} cov(y_i) V_i^{-1} D_i) (Σ_{i=1}^n D_i^T V_i^{-1} D_i)^{-1}
(A.2)-(A.4) are usually fulfilled by the moment-based estimators for the nuisances
If R_i(α) is correctly specified, V_sand equals the model-based variance (Σ_{i=1}^n D_i^T V_i^{-1} D_i)^{-1}
V_sand does not depend on the nuisances as long as (A.2) and (A.3) hold
Known as the robust or empirical variance estimator
11 / 24
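The plug-in sandwich can be sketched in the simplest setting (identity link, independence working correlation V_i = I, illustrative data), where it coincides with the classic Liang-Zeger cluster-robust covariance:

```python
import numpy as np

# Sandwich: bread^{-1} @ meat @ bread^{-1}, with cov(y_i) replaced by the
# plug-in e_i e_i^T built from cluster residual vectors.
rng = np.random.default_rng(4)
n_clusters, m, p = 60, 5, 2
X = np.column_stack([np.ones(n_clusters * m), rng.normal(size=n_clusters * m)])
b = np.repeat(rng.normal(size=n_clusters), m)
y = X @ np.array([1.0, -1.0]) + b + rng.normal(size=n_clusters * m)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ beta_hat

bread = np.zeros((p, p))
meat = np.zeros((p, p))
for i in range(n_clusters):
    Xi = X[i * m:(i + 1) * m]
    ei = e[i * m:(i + 1) * m]
    bread += Xi.T @ Xi                    # D_i^T V_i^{-1} D_i
    meat += Xi.T @ np.outer(ei, ei) @ Xi  # D_i^T V_i^{-1} e_i e_i^T V_i^{-1} D_i
Binv = np.linalg.inv(bread)
V_sand = Binv @ meat @ Binv               # robust (empirical) covariance of beta_hat
```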
Asymptotics - Cont’d
The proof is centered on the classical theory of unbiased estimating equations (van der Vaart, 1998), which simply uses a Taylor expansion
The key is to realize that β̂ is asymptotically linear, with
√n(β̂ − β_0) = (lim_{n→∞} n^{-1} Σ_{i=1}^n D_i^T V_i^{-1} D_i)^{-1} (1/√n) Σ_{i=1}^n U_i(β_0) + o_p(1)
V_sand then comes from a simple application of the CLT
Plug-in estimator V̂_sand:
replace cov(y_i) by ê_i ê_i^T with ê_i = y_i − μ̂_i
replace β by β̂
Rule of thumb - need at least n = 50 for the asymptotics to work
12 / 24
Small Sample Performance
Recall that STOP CRC only has 26 clinics (n = 26)
The plug-in estimator uses the residuals ê_i to estimate cov(y_i)
These residuals tend to be too small with small n
V̂_sand would therefore be expected to underestimate the covariance of β̂
How to correct for this bias?
13 / 24
Resampling
An immediate solution is resampling
One could possibly use cluster bootstrapping (sampling clinics with replacement) or a delete-s jackknife covariance estimator
Practical questions: computational cost; a sufficient number of bootstrap replicates? the optimal s in jackknifing?
A non-closed-form correction is difficult to translate into practice
14 / 24
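A cluster bootstrap for a CRT-like layout might be sketched as follows (26 clinics mimicking STOP CRC's n, but every other setting is invented): resample whole clinics with replacement and re-estimate the treatment coefficient each time.

```python
import numpy as np

# Cluster (clinic-level) bootstrap: the resampling unit is the clinic, which
# preserves within-clinic correlation in each replicate.
rng = np.random.default_rng(5)
n_clusters, m = 26, 30
arm = rng.integers(0, 2, n_clusters)                 # cluster-level treatment
X = np.column_stack([np.ones(n_clusters * m), np.repeat(arm, m)])
b = np.repeat(rng.normal(scale=0.3, size=n_clusters), m)
y = X @ np.array([0.0, 0.4]) + b + rng.normal(size=n_clusters * m)
Xc, yc = X.reshape(n_clusters, m, 2), y.reshape(n_clusters, m)

def fit(idx):
    """Refit on a resampled set of clusters; return the treatment coefficient."""
    Xs = Xc[idx].reshape(-1, 2)
    ys = yc[idx].reshape(-1)
    return np.linalg.lstsq(Xs, ys, rcond=None)[0][1]

boots = [fit(rng.integers(0, n_clusters, n_clusters)) for _ in range(500)]
se_boot = np.std(boots, ddof=1)                      # bootstrap SE of the effect
```

The questions on the slide remain: how many replicates suffice, and at n = 26 the bootstrap distribution itself can be unstable.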
Deriving Vsand,MD
A bias-corrected covariance estimator was proposed by Mancl and DeRouen (2001)
It reduces the bias of the residual estimator ê_i ê_i^T
Let ê_i = ê_i(β̂) = y_i − μ̂_i; we can write for each i
ê_i ≈ e_i + (∂e_i/∂β^T)(β̂ − β) = e_i − D_i(β̂ − β),
where we recall that D_i = ∂μ_i/∂β^T = −∂e_i/∂β^T
The second moment of ê_i:
E(ê_i ê_i^T) ≈ cov(y_i) − E[e_i(β̂ − β)^T D_i^T] − E[D_i(β̂ − β) e_i^T] + E[D_i(β̂ − β)(β̂ − β)^T D_i^T]
15 / 24
Deriving Vsand,MD
Recall the asymptotic linearity of β̂, so we have
β̂ − β_0 ≈ (Σ_{i=1}^n D_i^T V_i^{-1} D_i)^{-1} Σ_{i=1}^n D_i^T V_i^{-1} e_i
Define H_il = D_i (Σ_{i=1}^n D_i^T V_i^{-1} D_i)^{-1} D_l^T V_l^{-1}
H_ii is the block diagonal element of a projection matrix
H_ii is the leverage of the i-th cluster/clinic
The entries of H_il lie between zero and one, and are usually close to zero
Further, since E(e_i e_l^T) = 0 for i ≠ l, we have
E[D_i(β̂ − β) e_i^T] = H_ii cov(y_i)
E[e_i(β̂ − β)^T D_i^T] = cov(y_i) H_ii^T
E[D_i(β̂ − β)(β̂ − β)^T D_i^T] = Σ_{l=1}^n H_il cov(y_l) H_il^T
16 / 24
Deriving Vsand,MD
Summing up these terms, we get
E(ê_i ê_i^T) ≈ (I_i − H_ii) cov(y_i) (I_i − H_ii)^T + Σ_{l≠i} H_il cov(y_l) H_il^T
≈ (I_i − H_ii) cov(y_i) (I_i − H_ii)^T
I_i is the identity matrix of dimension m_i, and the latter term is assumed small because the H_il's are close to zero
Inverting the relationship suggests the plug-in
cov(y_i) ≈ (I_i − H_ii)^{-1} ê_i ê_i^T (I_i − H_ii^T)^{-1}
Consequently, the MD bias-corrected robust sandwich variance estimator takes the form
V_sand,MD = (Σ_{i=1}^n D_i^T V_i^{-1} D_i)^{-1} (Σ_{i=1}^n D_i^T V_i^{-1} (I_i − H_ii)^{-1} ê_i ê_i^T (I_i − H_ii^T)^{-1} V_i^{-1} D_i) (Σ_{i=1}^n D_i^T V_i^{-1} D_i)^{-1}
It inflates V̂_sand
17 / 24
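V_sand,MD can be sketched in the identity-link, V_i = I case (settings illustrative); the only change from the uncorrected sandwich is the (I − H_ii)^{-1} adjustment of each cluster's residual vector:

```python
import numpy as np

# MD correction: replace e_i e_i^T by (I - H_ii)^{-1} e_i e_i^T (I - H_ii^T)^{-1}
# inside the meat, where H_ii = X_i B^{-1} X_i^T (since V_i = I here).
rng = np.random.default_rng(6)
n_clusters, m, p = 26, 10, 2
X = np.column_stack([np.ones(n_clusters * m), rng.normal(size=n_clusters * m)])
b = np.repeat(rng.normal(size=n_clusters), m)
y = X @ np.array([1.0, 0.5]) + b + rng.normal(size=n_clusters * m)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
e = (y - X @ beta_hat).reshape(n_clusters, m)
Xc = X.reshape(n_clusters, m, p)

bread = sum(Xi.T @ Xi for Xi in Xc)          # sum_i D_i^T V_i^{-1} D_i
Binv = np.linalg.inv(bread)
meat_md = np.zeros((p, p))
for Xi, ei in zip(Xc, e):
    Hii = Xi @ Binv @ Xi.T                   # cluster leverage (V_i = I)
    ei_adj = np.linalg.solve(np.eye(m) - Hii, ei)   # (I - H_ii)^{-1} e_i
    w = Xi.T @ ei_adj
    meat_md += np.outer(w, w)                # each term is rank-1 and PSD
V_md = Binv @ meat_md @ Binv
```

Each cluster contributes a positive semi-definite term, so V_md is a valid covariance matrix by construction.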
Possible Improvement
In theory, the MD bias-corrected robust variance can be improved by incorporating the off-diagonal blocks H_il
Specifically, write the residuals as ê = (ê_1^T, …, ê_n^T)^T and the projection matrix H = D(Σ_{i=1}^n D_i^T V_i^{-1} D_i)^{-1} D^T V^{-1}, where
D = (D_1^T, …, D_n^T)^T
V = block diag(V_1, …, V_n)
y = (y_1^T, …, y_n^T)^T
We can show E(ê ê^T) = (I − H) cov(y) (I − H)^T, which may promise a more accurate correction
Any practical issues?
Numerical problems due to the near singularity of I − H
18 / 24
Heuristics for Vsand,KC
Kauermann and Carroll (2001) proposed an alternative correction by extending the bias-corrected sandwich estimator for linear regression
Under the heteroscedastic regression model Y_i = x_i^T β + e_i with e_i ~ N(0, σ_i^2), suppose the parameter of interest is a scalar θ = c^T β
Write X for the design matrix, H = X(X^T X)^{-1} X^T for the projection matrix, and define a_i = c^T (X^T X)^{-1} x_i
The sandwich estimator
V̂_sand = c^T (X^T X)^{-1} (Σ_i x_i x_i^T ê_i^2) (X^T X)^{-1} c := Σ_i a_i^2 ê_i^2
consistently estimates the variance of the least-squares projection θ̂ = c^T β̂, which under homoscedasticity is σ^2 Σ_i a_i^2 = σ^2 c^T (X^T X)^{-1} c
19 / 24
Heuristics for Vsand,KC
However, under homoscedasticity where σ_i^2 = σ^2, the residuals satisfy
E(ê_i^2) = σ^2 (1 − h_ii),
where h_ii is the leverage of observation i (between 0 and 1)
Indeed,
E(V̂_sand) = σ^2 Σ_i a_i^2 − σ^2 Σ_i a_i^2 h_ii, where the second term is the bias
A simple fix is to replace ê_i in V̂_sand by the leverage-adjusted residuals ẽ_i = (1 − h_ii)^{-1/2} ê_i
20 / 24
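The scalar leverage adjustment corresponds to the HC2-type correction familiar from linear regression; a short sketch (the heteroscedastic errors are invented for illustration):

```python
import numpy as np

# Compare the uncorrected sandwich (e_i^2) with the leverage-adjusted one
# (e_i^2 / (1 - h_ii)). The adjustment can only inflate the variance since
# 1/(1 - h_ii) >= 1 for every observation.
rng = np.random.default_rng(7)
n, p = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n) * (1 + 0.5 * np.abs(X[:, 1]))

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ beta_hat
XtX_inv = np.linalg.inv(X.T @ X)
h = np.einsum('ij,jk,ik->i', X, XtX_inv, X)          # leverages h_ii

meat_hc0 = (X * (e**2)[:, None]).T @ X               # sum_i x_i x_i^T e_i^2
meat_hc2 = (X * (e**2 / (1 - h))[:, None]).T @ X     # leverage-adjusted
V_hc0 = XtX_inv @ meat_hc0 @ XtX_inv
V_hc2 = XtX_inv @ meat_hc2 @ XtX_inv
```

The KC estimator on the next slide applies the same idea cluster by cluster, with the matrix square root (I − H_ii)^{-1/2} in place of (1 − h_ii)^{-1/2}.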
Heuristics for Vsand,KC
Realizing that the bias of the GEE sandwich estimator stems from the bias of the plug-in estimator ê_i ê_i^T for cov(y_i)
A similar fix is to use the cluster-leverage-adjusted residuals ẽ_i = (I_i − H_ii)^{-1/2} ê_i
The resulting KC bias-corrected sandwich estimator is
V_sand,KC = (Σ_{i=1}^n D_i^T V_i^{-1} D_i)^{-1} (Σ_{i=1}^n D_i^T V_i^{-1} (I_i − H_ii)^{-1/2} ê_i ê_i^T (I_i − H_ii^T)^{-1/2} V_i^{-1} D_i) (Σ_{i=1}^n D_i^T V_i^{-1} D_i)^{-1}
It turns out that V_sand,KC removes the first-order bias of V̂_sand if R_i(α) is correct, and even works well under misspecification
The corrected variance lies between V̂_sand and V_sand,MD
21 / 24
Revisit STOP CRC
How do we test the intervention effect for STOP CRC with n = 26? What decisions should we make?
Two additional complications
Large variations in clinic sizes - the coefficient of variation is cv = 0.485
Wald t-test or Wald z-test?
These decisions are evaluated by simulation studies
22 / 24
Revisit STOP CRC
A well-described simulation study (Li and Redden, 2014)
Simulate correlated binary outcomes from a marginal beta-binomial model
The model is parameterized by the ICC, set to the commonly reported values 0.01 and 0.05
Assume 10, 20 and 30 clusters/clinics, with an average cluster size ranging from 10 to 150
Main findings
The Wald t-test with V_sand,KC remains valid as long as cv < 0.6
The Wald z-test with V_sand,MD is only valid when n ≥ 20, while the Wald t-test is conservative when n ≤ 20
When cv > 0.6, a different bias-corrected variance by Fay and Graubard (2001) with the Wald t-test is recommended
For STOP CRC, both Vsand,KC (t-test) and Vsand,MD (z-test) areworth pursuing
23 / 24
Take-home Message
What are GEEs? What is the general rule of thumb for asymptotics?
Intuitions on bias-corrected sandwich estimator and the rule ofthumb for CRT applications
The above simulations dispense with covariate adjustment; would that impact the recommendations?
24 / 24