TRANSCRIPT
Introduction to the Generalized Estimating Equations and Its Applications in Small Cluster Randomized Trials
Fan Li
BIOSTAT 900 Seminar
November 11, 2016
1 / 24
Overview
Outline
Background
The Generalized Estimating Equations (GEEs)
Improved Small-sample Inference
Take-home Message
How do GEEs work? How can we improve small-sample inference, especially in cluster randomized trial (CRT) applications?
2 / 24
Cluster Randomized Trials (CRTs)
Randomizing clusters of subjects rather than independent subjects (convenience, ethics, contamination, etc.)
Intervention administered at the cluster level
Outcomes are measured at the individual level
Outcomes from subjects within the same cluster exhibit greater correlation than those from different clusters
Intraclass correlation coefficient (ICC) ranges from 0.01 to 0.1
Interest lies in evaluating the intervention effect
Small number of clusters with large cluster sizes
3 / 24
The Stop Colorectal Cancer (STOP CRC) Study
Studies the effect of an intervention to improve cancer screening
26 health clinics (n = 26) are allocated to either usual care or intervention (1-to-1 ratio)
Usual care - provided opportunistic/occasional screening
Intervention - an automated program for mailing testing kits with instructions to patients
The clinics contain variable numbers of patients (m_i):
Min 461 / Median 1426 / Max 3302
Primary outcome - the patient-level binary outcome (y_ij): completion status of the screening test within a year of study initiation
Baseline clinic- and individual-level covariates
Inference – estimand?
4 / 24
The Estimand from Conditional Models
y_i = (y_{i1}, …, y_{i m_i})^T - the collection of outcomes from clinic i
X_i - 'design' matrix (including intercept, treatment variable and baseline covariates) of cluster i
The generalized linear mixed model: g(E(y_i | X_i, b_i)) = X_i β + 1_{m_i} b_i
g(·) - a smooth, invertible link function
β - the regression coefficients (intervention effect)
b_i ~ N(0, σ_b^2) - Gaussian random effects
The estimand, defined by the corresponding component of β, typically has a cluster-specific (conditional) interpretation
y_ij | x_ij, b_i is assumed to follow an exponential family model (likelihood)
5 / 24
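The conditional-versus-marginal distinction can be made concrete with a small numerical sketch (all parameter values are hypothetical, not from STOP CRC): for a logistic random-intercept model, integrating the cluster-specific success probabilities over b_i yields a population-average log-odds ratio that is attenuated toward zero relative to the conditional β.

```python
import numpy as np

# Sketch: logit P(y=1 | x, b) = beta0 + beta1*x + b, b ~ N(0, sigma_b^2).
# The marginal (population-average) log-odds ratio is smaller in magnitude
# than the cluster-specific beta1. All values below are illustrative.
def marginal_prob(eta, sigma_b, n_nodes=50):
    """E_b[expit(eta + b)] via Gauss-Hermite quadrature for b ~ N(0, sigma_b^2)."""
    nodes, weights = np.polynomial.hermite_e.hermegauss(n_nodes)
    # hermegauss integrates against exp(-t^2/2); divide by sqrt(2*pi) to normalize
    p = 1.0 / (1.0 + np.exp(-(eta + sigma_b * nodes)))
    return np.sum(weights * p) / np.sqrt(2 * np.pi)

beta0, beta1, sigma_b = -1.0, 0.8, 1.5          # hypothetical values
p0 = marginal_prob(beta0, sigma_b)               # marginal P(y=1) at x = 0
p1 = marginal_prob(beta0 + beta1, sigma_b)       # marginal P(y=1) at x = 1
beta1_marginal = np.log(p1 / (1 - p1)) - np.log(p0 / (1 - p0))
```

The marginal coefficient is what a GEE with a logit link targets; the conditional β1 is what the GLMM targets.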
The Estimand from Marginal Models
Recall the basics of GLMs
Now let y_ij | x_ij follow an exponential family model with mean μ_ij = E(y_ij | x_ij) and variance ν_ij = h(μ_ij)/φ
h(·) - the mean-variance relationship
φ - dispersion parameter
Let μ_i = E(y_i | X_i) = (μ_{i1}, …, μ_{i m_i})^T
Use g(μ_i) = X_i β, allowing for non-zero covariance among the components of y_i; the model is parameterized by a marginal intervention effect (marginal with respect to what?)
Population-average intervention effect - more straightforward interpretation
To make inferences on β, how do we describe the correlation between components of y_i?
6 / 24
Generalized Estimating Equations (GEEs)
Define R_i(α) as the m_i × m_i "working" correlation matrix of y_i, with α an unknown nuisance parameter; then the "working" covariance is
V_i = var(y_i | X_i) = A_i^{1/2} R_i(α) A_i^{1/2} / φ
A_i = diag(h(μ_{i1}), …, h(μ_{i m_i}))
The GEEs are defined as
Σ_{i=1}^n U_i(β, φ, α) = Σ_{i=1}^n D_i^T V_i^{-1} (y_i − μ_i) = 0,
where D_i = ∂μ_i/∂β^T
Only the first two moments of y_ij are assumed - the quasi-likelihood score equation (Wedderburn, 1974)
The efficient score equation from a semiparametric restricted moment model (Tsiatis, 2006)
Iterative algorithms (Newton-Raphson) for estimation
7 / 24
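A minimal sketch of the estimating equation in its simplest case (identity link, h(μ) = 1, independence working correlation; the simulation settings are illustrative, not from the talk): the GEE then reduces to the OLS normal equations, so its solution matches ordinary least squares.

```python
import numpy as np

# With V_i = I and identity link, sum_i D_i^T V_i^{-1} (y_i - mu_i) = 0
# becomes (sum_i X_i^T X_i) beta = sum_i X_i^T y_i: the OLS normal equations.
rng = np.random.default_rng(0)
n_clusters, m = 20, 5
X = np.column_stack([np.ones(n_clusters * m),
                     rng.normal(size=(n_clusters * m, 2))])  # intercept + 2 covariates
beta_true = np.array([0.5, 1.0, -2.0])
y = X @ beta_true + rng.normal(size=X.shape[0])

XtX = np.zeros((3, 3))
Xty = np.zeros(3)
for i in range(n_clusters):
    Xi = X[i * m:(i + 1) * m]
    yi = y[i * m:(i + 1) * m]
    XtX += Xi.T @ Xi          # D_i^T V_i^{-1} D_i with V_i = I
    Xty += Xi.T @ yi
beta_gee = np.linalg.solve(XtX, Xty)
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
```

With a non-trivial working correlation, only the weighting inside the sum changes; the zero of the equation is found iteratively.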
Dealing with Nuisances
The working variance Vi contains the nuisance parameters α and φ
It is possible to "profile out" these nuisances within the iterative procedure
Moment-based estimators for nuisances:
Given a current estimate β̂, the Pearson residual is r̂_ij = (y_ij − μ̂_ij)/ν̂_ij^{1/2}, which is typically used to estimate
φ̂ = Σ_{i=1}^n Σ_{j=1}^{m_i} r̂_ij^2 / (N − p),
where N = Σ_{i=1}^n m_i and p is the dimension of β
What about α?
8 / 24
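A sketch of the moment estimator for the dispersion (assuming a Gaussian working model with h(μ) = 1, so the Pearson residual is simply y − μ̂, and using the common software convention that φ scales the variance; the simulation settings are invented):

```python
import numpy as np

# Simulate independent Gaussian responses with variance phi_true, fit the
# mean by least squares, and recover phi by sum of squared Pearson
# residuals over N - p.
rng = np.random.default_rng(1)
n_clusters, m, p = 100, 20, 2
N = n_clusters * m
X = np.column_stack([np.ones(N), rng.normal(size=N)])
beta_true, phi_true = np.array([1.0, 0.5]), 2.0
y = X @ beta_true + rng.normal(scale=np.sqrt(phi_true), size=N)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
r = y - X @ beta_hat                 # Pearson residuals (h(mu) = 1)
phi_hat = np.sum(r**2) / (N - p)     # sum_ij r_ij^2 / (N - p)
```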
Dealing with Nuisances - Cont’d
Choices of the working correlation structure
If R_i = I, the independence structure - no nuisances involved
If corr(y_ij, y_ij') = α for j ≠ j', then we end up with the exchangeable correlation. The nuisance can be estimated by
α̂ = φ̂ Σ_{i=1}^n Σ_{j>j'} r̂_ij r̂_ij' / { Σ_{i=1}^n m_i(m_i − 1)/2 − p }
If R_i is assumed unstructured, then (loosely speaking)
R̂_i(α) = (φ̂/n) Σ_{i=1}^n A_i^{-1/2} (y_i − μ_i)(y_i − μ_i)^T A_i^{-1/2}
Other types of correlation structures are available
In CRT applications, the exchangeable structure is often assumed
9 / 24
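The exchangeable moment estimator can be sketched as follows (the data-generating process and the use of fully standardized residuals are my assumptions; here p = 1 since only an intercept is fit):

```python
import numpy as np

# Exchangeable Gaussian clusters: y_ij = mu + b_i + eps_ij, so
# corr(y_ij, y_ij') = alpha = sb2 / (sb2 + se2) within a cluster.
rng = np.random.default_rng(2)
n_clusters, m = 200, 6
sb2, se2 = 1.0, 3.0
alpha_true = sb2 / (sb2 + se2)                     # = 0.25
b = rng.normal(scale=np.sqrt(sb2), size=(n_clusters, 1))
y = 1.0 + b + rng.normal(scale=np.sqrt(se2), size=(n_clusters, m))

mu_hat = y.mean()                                  # intercept-only fit, p = 1
phi_hat = np.sum((y - mu_hat)**2) / (n_clusters * m - 1)
r = (y - mu_hat) / np.sqrt(phi_hat)                # standardized residuals
# Moment estimator: sum residual cross-products over within-cluster pairs j > j'
pair_sums = np.array([np.sum(np.outer(ri, ri)[np.triu_indices(m, 1)]) for ri in r])
alpha_hat = pair_sums.sum() / (n_clusters * m * (m - 1) / 2 - 1)
```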
Modified Newton’s Algorithm
Initialize β̂ → compute φ̂(β̂) and α̂(β̂, φ̂(β̂)) → update β̂ by Newton's method → repeat the last two steps until convergence
Essentially we are solving for β̂ iteratively from
0 = Σ_{i=1}^n U_i(β, α̂(β, φ̂(β))) = Σ_{i=1}^n D_i^T(β) V_i^{-1}(β, α̂(β, φ̂(β))) (y_i − μ_i(β)),
with a working assumption on R_i(α).
Why are these efforts worthwhile?
It turns out that, under mild assumptions, the final solution β̂ is consistent even if R_i(α) is misspecified
10 / 24
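The alternating algorithm can be sketched in the identity-link Gaussian case with an exchangeable working correlation (all settings are illustrative, and the working covariance is taken as φR(α), a common software convention):

```python
import numpy as np

# Alternate: given beta, update (phi, alpha) by moments; given (phi, alpha),
# update beta by a Newton/Fisher-scoring step. Repeat until the step is tiny.
rng = np.random.default_rng(3)
n_clusters, m, p = 50, 8, 2
X = np.column_stack([np.ones(n_clusters * m), rng.normal(size=n_clusters * m)])
beta_true = np.array([1.0, 2.0])
b = np.repeat(rng.normal(scale=1.0, size=n_clusters), m)   # random intercepts
y = X @ beta_true + b + rng.normal(size=n_clusters * m)

Xc = X.reshape(n_clusters, m, p)
yc = y.reshape(n_clusters, m)

beta = np.linalg.lstsq(X, y, rcond=None)[0]     # initialize via OLS
for _ in range(25):
    resid = yc - Xc @ beta
    phi = np.sum(resid**2) / (n_clusters * m - p)           # dispersion
    r = resid / np.sqrt(phi)
    alpha = (sum(np.sum(np.outer(ri, ri)[np.triu_indices(m, 1)]) for ri in r)
             / (n_clusters * m * (m - 1) / 2 - p))           # exchangeable corr
    R = (1 - alpha) * np.eye(m) + alpha * np.ones((m, m))
    Vinv = np.linalg.inv(phi * R)                            # working V_i^{-1}
    A = sum(Xi.T @ Vinv @ Xi for Xi in Xc)
    u = sum(Xi.T @ Vinv @ (yi - Xi @ beta) for Xi, yi in zip(Xc, yc))
    step = np.linalg.solve(A, u)
    beta = beta + step
    if np.max(np.abs(step)) < 1e-10:
        break
```

For the identity link each β-update is an exact GLS solve, so the loop converges in a handful of iterations.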
Asymptotics
(A.1) Sufficient moments of components of X_i and y_i exist
(A.2) φ̂ is root-n consistent given β
(A.3) α̂ is root-n consistent given β and φ
(A.4) |∂α̂(β, φ)/∂φ| = O_p(1)
Under (A.1)-(A.4), β̂ is consistent for the truth β_0 and asymptotically normal with the sandwich covariance matrix
V_sand = (Σ_{i=1}^n D_i^T V_i^{-1} D_i)^{-1} (Σ_{i=1}^n D_i^T V_i^{-1} cov(y_i) V_i^{-1} D_i) (Σ_{i=1}^n D_i^T V_i^{-1} D_i)^{-1}
(A.2)-(A.4) are usually fulfilled by the moment-based estimators for the nuisances
If R_i(α) is correctly specified, V_sand equals the model-based variance (Σ_{i=1}^n D_i^T V_i^{-1} D_i)^{-1}
V_sand does not depend on the nuisances as long as (A.2) and (A.3) hold
Known as the robust or empirical variance estimator
11 / 24
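The plug-in sandwich can be sketched in the simplest setting (identity link, independence working correlation V_i = I, illustrative data), where it coincides with the classic Liang-Zeger cluster-robust covariance:

```python
import numpy as np

# Sandwich: bread^{-1} @ meat @ bread^{-1}, with cov(y_i) replaced by the
# plug-in e_i e_i^T built from cluster residual vectors.
rng = np.random.default_rng(4)
n_clusters, m, p = 60, 5, 2
X = np.column_stack([np.ones(n_clusters * m), rng.normal(size=n_clusters * m)])
b = np.repeat(rng.normal(size=n_clusters), m)
y = X @ np.array([1.0, -1.0]) + b + rng.normal(size=n_clusters * m)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ beta_hat

bread = np.zeros((p, p))
meat = np.zeros((p, p))
for i in range(n_clusters):
    Xi = X[i * m:(i + 1) * m]
    ei = e[i * m:(i + 1) * m]
    bread += Xi.T @ Xi                    # D_i^T V_i^{-1} D_i
    meat += Xi.T @ np.outer(ei, ei) @ Xi  # D_i^T V_i^{-1} e_i e_i^T V_i^{-1} D_i
Binv = np.linalg.inv(bread)
V_sand = Binv @ meat @ Binv               # robust (empirical) covariance of beta_hat
```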
Asymptotics - Cont’d
The proof is centered on the classical theory of unbiased estimating equations (van der Vaart, 1998), which simply uses a Taylor expansion
The key is to realize that β̂ is asymptotically linear, with
√n(β̂ − β_0) = (lim_{n→∞} n^{-1} Σ_{i=1}^n D_i^T V_i^{-1} D_i)^{-1} (1/√n) Σ_{i=1}^n U_i(β_0) + o_p(1)
V_sand then comes from a simple application of the CLT
Plug-in estimator V̂_sand:
replace cov(y_i) by ê_i ê_i^T with ê_i = y_i − μ̂_i
replace β by β̂
Rule of thumb - need at least n = 50 for the asymptotics to work
12 / 24
Small Sample Performance
Recall that STOP CRC only has 26 clinics (n = 26)
The plug-in estimator uses the residuals ê_i to estimate cov(y_i)
These residuals tend to be too small with small n
V̂_sand would therefore be expected to underestimate the covariance of β̂
How to correct for this bias?
13 / 24
Resampling
An immediate solution is resampling
One could possibly use cluster bootstrapping (sampling clinics with replacement) or a delete-s jackknife covariance estimator
Practical questions: computational cost; a sufficient number of bootstrap replicates? the optimal s in jackknifing?
A non-closed-form correction is difficult to translate into practice
14 / 24
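A cluster bootstrap for a CRT-like layout might be sketched as follows (26 clinics mimicking STOP CRC's n, but every other setting is invented): resample whole clinics with replacement and re-estimate the treatment coefficient each time.

```python
import numpy as np

# Cluster (clinic-level) bootstrap: the resampling unit is the clinic, which
# preserves within-clinic correlation in each replicate.
rng = np.random.default_rng(5)
n_clusters, m = 26, 30
arm = rng.integers(0, 2, n_clusters)                 # cluster-level treatment
X = np.column_stack([np.ones(n_clusters * m), np.repeat(arm, m)])
b = np.repeat(rng.normal(scale=0.3, size=n_clusters), m)
y = X @ np.array([0.0, 0.4]) + b + rng.normal(size=n_clusters * m)
Xc, yc = X.reshape(n_clusters, m, 2), y.reshape(n_clusters, m)

def fit(idx):
    """Refit on a resampled set of clusters; return the treatment coefficient."""
    Xs = Xc[idx].reshape(-1, 2)
    ys = yc[idx].reshape(-1)
    return np.linalg.lstsq(Xs, ys, rcond=None)[0][1]

boots = [fit(rng.integers(0, n_clusters, n_clusters)) for _ in range(500)]
se_boot = np.std(boots, ddof=1)                      # bootstrap SE of the effect
```

The questions on the slide remain: how many replicates suffice, and at n = 26 the bootstrap distribution itself can be unstable.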
Deriving Vsand,MD
A bias-corrected covariance estimator was proposed by Mancl and DeRouen (2001)
It reduces the bias of the residual estimator ê_i ê_i^T
Let ê_i = ê_i(β̂) = y_i − μ̂_i; we can write for each i
ê_i ≈ e_i + (∂e_i/∂β^T)(β̂ − β) = e_i − D_i(β̂ − β),
where we recall that D_i = ∂μ_i/∂β^T = −∂e_i/∂β^T
The second moment of ê_i:
E(ê_i ê_i^T) ≈ cov(y_i) − E[e_i(β̂ − β)^T D_i^T] − E[D_i(β̂ − β) e_i^T] + E[D_i(β̂ − β)(β̂ − β)^T D_i^T]
15 / 24
Deriving Vsand,MD
Recall the asymptotic linearity of β̂, so we have
β̂ − β_0 ≈ (Σ_{i=1}^n D_i^T V_i^{-1} D_i)^{-1} Σ_{i=1}^n D_i^T V_i^{-1} e_i
Define H_il = D_i (Σ_{i=1}^n D_i^T V_i^{-1} D_i)^{-1} D_l^T V_l^{-1}
H_ii is the block diagonal element of a projection matrix
H_ii is the leverage of the i-th cluster/clinic
The entries of H_il lie between zero and one, and are usually close to zero
Further, since E(e_i e_l^T) = 0 for i ≠ l, we have
E[D_i(β̂ − β) e_i^T] = H_ii cov(y_i)
E[e_i(β̂ − β)^T D_i^T] = cov(y_i) H_ii^T
E[D_i(β̂ − β)(β̂ − β)^T D_i^T] = Σ_{l=1}^n H_il cov(y_l) H_il^T
16 / 24
Deriving Vsand,MD
Summing up these terms, we get
E(ê_i ê_i^T) ≈ (I_i − H_ii) cov(y_i) (I_i − H_ii)^T + Σ_{l≠i} H_il cov(y_l) H_il^T
≈ (I_i − H_ii) cov(y_i) (I_i − H_ii)^T
I_i is the identity matrix of dimension m_i, and the latter term is assumed small because the H_il's are close to zero
Inverting the relationship suggests the plug-in
cov(y_i) ≈ (I_i − H_ii)^{-1} ê_i ê_i^T (I_i − H_ii^T)^{-1}
Consequently, the MD bias-corrected robust sandwich variance estimator takes the form
V_sand,MD = (Σ_{i=1}^n D_i^T V_i^{-1} D_i)^{-1} (Σ_{i=1}^n D_i^T V_i^{-1} (I_i − H_ii)^{-1} ê_i ê_i^T (I_i − H_ii^T)^{-1} V_i^{-1} D_i) (Σ_{i=1}^n D_i^T V_i^{-1} D_i)^{-1}
It inflates V̂_sand
17 / 24
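V_sand,MD can be sketched in the identity-link, V_i = I case (settings illustrative); the only change from the uncorrected sandwich is the (I − H_ii)^{-1} adjustment of each cluster's residual vector:

```python
import numpy as np

# MD correction: replace e_i e_i^T by (I - H_ii)^{-1} e_i e_i^T (I - H_ii^T)^{-1}
# inside the meat, where H_ii = X_i B^{-1} X_i^T (since V_i = I here).
rng = np.random.default_rng(6)
n_clusters, m, p = 26, 10, 2
X = np.column_stack([np.ones(n_clusters * m), rng.normal(size=n_clusters * m)])
b = np.repeat(rng.normal(size=n_clusters), m)
y = X @ np.array([1.0, 0.5]) + b + rng.normal(size=n_clusters * m)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
e = (y - X @ beta_hat).reshape(n_clusters, m)
Xc = X.reshape(n_clusters, m, p)

bread = sum(Xi.T @ Xi for Xi in Xc)          # sum_i D_i^T V_i^{-1} D_i
Binv = np.linalg.inv(bread)
meat_md = np.zeros((p, p))
for Xi, ei in zip(Xc, e):
    Hii = Xi @ Binv @ Xi.T                   # cluster leverage (V_i = I)
    ei_adj = np.linalg.solve(np.eye(m) - Hii, ei)   # (I - H_ii)^{-1} e_i
    w = Xi.T @ ei_adj
    meat_md += np.outer(w, w)                # each term is rank-1 and PSD
V_md = Binv @ meat_md @ Binv
```

Each cluster contributes a positive semi-definite term, so V_md is a valid covariance matrix by construction.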
Possible Improvement
In theory, the MD bias-corrected robust variance can be improved by incorporating the off-diagonal blocks H_il
Specifically, write the residuals as ê = (ê_1^T, …, ê_n^T)^T and the projection matrix H = D(Σ_{i=1}^n D_i^T V_i^{-1} D_i)^{-1} D^T V^{-1}, where
D = (D_1^T, …, D_n^T)^T
V = block diag(V_1, …, V_n)
y = (y_1^T, …, y_n^T)^T
We can show E(ê ê^T) = (I − H) cov(y) (I − H)^T, which may promise a more accurate correction
Any practical issues?
Numerical problems due to the near singularity of I − H
18 / 24
Heuristics for Vsand,KC
Kauermann and Carroll (2001) proposed an alternative correction by extending the bias-corrected sandwich estimator for linear regression
Under the heteroscedastic regression model Y_i = x_i^T β + e_i with e_i ~ N(0, σ_i^2), suppose the parameter of interest is a scalar θ = c^T β
Write X for the design matrix, H = X(X^T X)^{-1} X^T for the projection matrix, and define a_i = c^T (X^T X)^{-1} x_i
The sandwich estimator
V̂_sand = c^T (X^T X)^{-1} (Σ_i x_i x_i^T ê_i^2) (X^T X)^{-1} c := Σ_i a_i^2 ê_i^2
consistently estimates the variance of the least-squares projection θ̂ = c^T β̂, which under homoscedasticity is σ^2 Σ_i a_i^2 = σ^2 c^T (X^T X)^{-1} c
19 / 24
Heuristics for Vsand,KC
However, under homoscedasticity where σ_i^2 = σ^2, the residuals satisfy
E(ê_i^2) = σ^2 (1 − h_ii),
where h_ii is the leverage of observation i (between 0 and 1)
Indeed,
E(V̂_sand) = σ^2 Σ_i a_i^2 − σ^2 Σ_i a_i^2 h_ii, where the second term is the bias
A simple fix is to replace ê_i in V̂_sand by the leverage-adjusted residuals ẽ_i = (1 − h_ii)^{-1/2} ê_i
20 / 24
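The scalar leverage adjustment corresponds to the HC2-type correction familiar from linear regression; a short sketch (the heteroscedastic errors are invented for illustration):

```python
import numpy as np

# Compare the uncorrected sandwich (e_i^2) with the leverage-adjusted one
# (e_i^2 / (1 - h_ii)). The adjustment can only inflate the variance since
# 1/(1 - h_ii) >= 1 for every observation.
rng = np.random.default_rng(7)
n, p = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n) * (1 + 0.5 * np.abs(X[:, 1]))

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ beta_hat
XtX_inv = np.linalg.inv(X.T @ X)
h = np.einsum('ij,jk,ik->i', X, XtX_inv, X)          # leverages h_ii

meat_hc0 = (X * (e**2)[:, None]).T @ X               # sum_i x_i x_i^T e_i^2
meat_hc2 = (X * (e**2 / (1 - h))[:, None]).T @ X     # leverage-adjusted
V_hc0 = XtX_inv @ meat_hc0 @ XtX_inv
V_hc2 = XtX_inv @ meat_hc2 @ XtX_inv
```

The KC estimator on the next slide applies the same idea cluster by cluster, with the matrix square root (I − H_ii)^{-1/2} in place of (1 − h_ii)^{-1/2}.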
Heuristics for Vsand,KC
Realizing that the bias of the GEE sandwich estimator stems from the bias of the plug-in estimator ê_i ê_i^T for cov(y_i)
A similar fix is to use the cluster-leverage-adjusted residuals ẽ_i = (I_i − H_ii)^{-1/2} ê_i
The resulting KC bias-corrected sandwich estimator is
V_sand,KC = (Σ_{i=1}^n D_i^T V_i^{-1} D_i)^{-1} (Σ_{i=1}^n D_i^T V_i^{-1} (I_i − H_ii)^{-1/2} ê_i ê_i^T (I_i − H_ii^T)^{-1/2} V_i^{-1} D_i) (Σ_{i=1}^n D_i^T V_i^{-1} D_i)^{-1}
It turns out that V_sand,KC removes the first-order bias of V̂_sand if R_i(α) is correct, and even works well under misspecification
The corrected variance lies between V̂_sand and V_sand,MD
21 / 24
Revisit STOP CRC
How do we test the intervention effect for STOP CRC with n = 26? What decisions should we make?
Two additional complications
Large variations in clinic sizes - the coefficient of variation is cv = 0.485
Wald t-test or Wald z-test?
These decisions are evaluated by simulation studies
22 / 24
Revisit STOP CRC
A well-described simulation study (Li and Redden, 2014)
Simulate correlated binary outcomes from a marginal beta-binomial model
The model is parameterized by the ICC, set to the commonly reported values 0.01 and 0.05
Assume 10, 20 and 30 clusters/clinics, with an average cluster size ranging from 10 to 150
Main findings
The Wald t-test with V_sand,KC remains valid as long as cv < 0.6
The Wald z-test with V_sand,MD is only valid when n ≥ 20, while the Wald t-test is conservative when n ≤ 20
When cv > 0.6, a different bias-corrected variance by Fay and Graubard (2001) with the Wald t-test is recommended
For STOP CRC, both Vsand,KC (t-test) and Vsand,MD (z-test) areworth pursuing
23 / 24
Take-home Message
What are GEEs? What is the general rule of thumb for asymptotics?
Intuitions on bias-corrected sandwich estimator and the rule ofthumb for CRT applications
The above simulations dispense with covariate adjustment; would that impact the recommendations?
24 / 24