estimands, missing data, and sensitivity analysis …estimands, missing data, and sensitivity...
TRANSCRIPT
Estimands, Missing Data, and Sensitivity Analysis
Geert Molenberghs
[email protected] & [email protected]
Interuniversity Institute for Biostatistics and statistical Bioinformatics (I-BioStat)
Universiteit Hasselt & KU Leuven, Belgium
www.ibiostat.be
Interuniversity Institute for Biostatisticsand statistical Bioinformatics
Kyoto, Japan December 14, 2018
Survey Sampling 101
Survey population: The collection of units (individuals) about which the researcherwants to make quantitative statements.
Sample frame: The set of units (individuals) that has non-zero probability of beingselected.
Sample: The subset of units that have been selected.
Probability sampling: The family of probabilistic (stochastic) methods by which asubset of the units from the sample frame is selected.
Design properties: The entire collection of methodological aspects that leads to theselection of a sample.
Estimands, Missing Data, and Sensitivity Analysis 1
Sample size: The number of units in the sample.
Analysis and inference: The collection of statistical techniques by which populationestimands are estimated.
Examples: estimation of means, averages, totals, linear regression, ANOVA, logisticregression, loglinear models.
Estimand: The true population quantity (e.g., the average body mass index of the USpopulation).
Estimator: A (stochastic) function of the sample data, with the aim to “come close” tothe estimand.
Estimate: A particular realization of the estimator, for the particular sample taken (e.g.,22.37).
Estimands, Missing Data, and Sensitivity Analysis 2
Your M.o.t.R. Clinical Trial
• Setting:
Potential outcomes (T0j, T1j)
Individual treatment effect ∆Tj = T1j − T0j
Expected treatment effect β = E(T1j − T0j)
• No missing data =⇒ 50% of missing data
• Fair to say: Estimand is β = E(T1 − T0) in population
• Randomization: Treatment effect estimable from observed data:
• Estimator: T1 − T0
Estimands, Missing Data, and Sensitivity Analysis 3
• Information coming from:
. data
. design
. assumptions××××××
• Would be different in an epidemiological study
Estimands, Missing Data, and Sensitivity Analysis 4
Surrogate Endpoints Evaluation:Potential Outcomes
Alonso, Van der Elst, Molenberghs (Statistical Modeling 2016)
• Setting:
Potential outcomes (T0j, T1j)
Individual causal effect ∆Tj = T1j − T0j
Expected causal effect β = E(T1j − T0j)
Surrogate Sj
Estimands, Missing Data, and Sensitivity Analysis 5
• Predictive causal association:
ρψ = corr(∆Tj, Sj)
• (Un)identifiability:
ρT0T1not identifiable
• Information coming from:
. data
. design
. assumptions −→ sensitivity
Estimands, Missing Data, and Sensitivity Analysis 6
⇒ Sensitivity analysis for age-related macular degeneration trial:
PCA
ρψ
Perc
enta
ge
0.00 0.01 0.02 0.03 0.04 0.05 0.06
0.0
0.2
0.4
0.6
0.8
Estimands, Missing Data, and Sensitivity Analysis 7
Surrogate Endpoints Evaluation:Full Causal Paradigm
Alonso et al. (Biometrics 2015)
• Setting:
Treatment potential outcomes (T0j, T1j)
Treatment individual causal effect ∆Tj = T1j − T0j
Surrogate potential outcomes (S0j, S1j)
Surrogate individual causal effect ∆Sj = S1j − S0j
• Individual causal association (ICA):
ρ∆ = corr(∆Tj,∆Sj)
Estimands, Missing Data, and Sensitivity Analysis 8
• Joint distribution unidentifiable
• Capture assumptions in causal diagrams → reduced forms of ρ∆
• Information coming from:
. data
. design
. assumptions −→ sensitivity
• Meta-analytic version in multiple trials
Estimands, Missing Data, and Sensitivity Analysis 9
−10 −5 0 5 10
−20
−15
−10
−5
05
10
15
(a) Trial−level surrogacy
Treatment effect on the surrogate endpoint
Treatm
ent effect on the tru
e e
ndpoin
t
−30 −20 −10 0 10 20 30
−30
−20
−10
010
20
30
(b) Individual−level surrogacy
Results for the surrogate endpoint
Results for
the tru
e e
ndpoin
t
(c) MICA
ρM
Percen
tage
−0.2 0.0 0.2 0.4 0.6 0.8 1.0
0.00
0.05
0.10
0.15
Estimands, Missing Data, and Sensitivity Analysis 10
Terms of Enrichment
Enriched data
Coarse data Augmented data
Incomplete data Randomized studies
Censored data Random effects
Joint models Latent classes
Grouped data Latent variables
Non-compliance Mixtures
Estimands, Missing Data, and Sensitivity Analysis 11
Increasing Complexity
• Standard clinical trial: design compensates for what is unobserved
• Surrogacy: augmentation: sensitivity is design-based
• Incomplete data /non-compliance: coarsening:
sensitivity is (non-)observation-based
. (Subjective) choices unavoidable
. Interference of intercurrent events
. Scenarios needed about f (ymi |yoi ,xi, θ)
. Such scenarios should preserve estimand
. Easy and elegant with MI
Estimands, Missing Data, and Sensitivity Analysis 12
The Slovenian Plebiscite
Rubin, Stern, and Vehovar (1995)
• Slovenian Public Opinion (SPO) Survey
• Four weeks prior to decisive plebiscite
• Three questions:
1. Are you in favor of Slovenian independence?
2. Are you in favor of Slovenia’s secession from Yugoslavia?
3. Will you attend the plebiscite?
• Political decision: ABSENCE≡NO
• Primary Estimand: θ: Proportion in favor of independence
Estimands, Missing Data, and Sensitivity Analysis 13
• Slovenian Public Opinion Survey Data:
Independence
Secession Attendance Yes No ∗Yes Yes 1191 8 21
No 8 0 4
∗ 107 3 9
No Yes 158 68 29
No 7 14 3
∗ 18 43 31
∗ Yes 90 2 109
No 1 2 25
∗ 19 8 96
Estimands, Missing Data, and Sensitivity Analysis 14
Slovenian Plebiscite ←→ Slovenian PublicOpinion Survey
θ =0.885
Estimator θ
Pessimistic bound 0.694
Optimistic bound 0.904
Complete cases 0.928 ?Available cases 0.929 ?MAR (2 questions) 0.892
MAR (3 questions) 0.883
MNAR 0.782
Estimands, Missing Data, and Sensitivity Analysis 15
Modeling Frameworks & Missing DataMechanisms
f (yi, ri|Xi,θ,ψ)
Selection Models: f (yi|Xi,θ) f (ri|Xi,yoi ,y
mi ,ψ)
MCAR −→ MAR −→ MNAR
f (ri|Xi,ψ) f (ri|Xi,yoi ,ψ) f (ri|Xi,y
oi ,y
mi ,ψ)
Pattern-mixture Models: f (yi|Xi, ri, θ) f (ri|Xi,ψ)
Shared-parameter Models: f (yi|Xi, bi, θ) f (ri|Xi, bi,ψ)
Estimands, Missing Data, and Sensitivity Analysis 16
Rubin, 1976
• Ignorability: Rubin (Biometrika, 1976): 42 years ago!
• Little and Rubin (1976, 2002)
• Why did it take so long?
Estimands, Missing Data, and Sensitivity Analysis 17
A Vicious Triangle
Industry
↗↙ ↘↖←−Academe Regulatory−→
• Academe: The R2 principle
• Regulatory: Rigid procedures ←→ scientific developments
• Industry: We cannot / do not want to apply new methods
Estimands, Missing Data, and Sensitivity Analysis 18
Terminology & Confusion
• The Ministry of Disinformation:
←− All directions
Other directions −→
•MCAR, MAR, MNAR: “What do the terms mean?”
•MAR, random dropout, informative missingness, ignorable, censoring,. . .
• Dropout from the study, dropout from treatment, lost to follow up,. . .
• “Under MAR patients dropping out and patients not dropping out aresimilar.”
Estimands, Missing Data, and Sensitivity Analysis 19
A Virtuous Triangle
Industry
↗↙ ↘↖←−Academe Regulatory−→
• FDA/Industry Workshops
• DIA/EMA Meetings
• The NAS Experience
Estimands, Missing Data, and Sensitivity Analysis 20
A Virtuous Quadrangle
←−Industry Patients−→
l l
←−Academe Regulatory−→
Estimands, Missing Data, and Sensitivity Analysis 21
The NAS Experience: A WholesomeProduct
• FDA −→ NAS −→ the working group
• Composition
• Encompassing:
. terminology/taxonomy/concepts
. prevention
. treatment
Estimands, Missing Data, and Sensitivity Analysis 22
Taxonomy
•Missingness pattern: complete — monotone — non-monotone
• Dropout pattern: complete — dropout — intermittent
•Model framework: SEM — PMM — SPM
•Missingness mechanism: MCAR — MAR — MNAR
• Ignorability: ignorable — non-ignorable
• Inference paradigm: frequentist — likelihood — Bayes
Estimands, Missing Data, and Sensitivity Analysis 23
The NAS Panel
Name Specialty Affiliation
Rod Little biostat U Michigan
Ralph D’Agostino biostat Boston U
Kay Dickerson epi Johns Hopkins
Scott Emerson biostat U Washington
John Farrar epi U Penn
Constantine Frangakis biostat Johns Hopkins
Joseph Hogan biostat Brown U
Geert Molenberghs biostat U Hasselt & K.U.Leuven
Susan Murphy stat U Michigan
James Neaton biostat U Minnesota
Andrea Rotnitzky stat Buenos Aires & Harvard
Dan Scharfstein biostat Johns Hopkins
Joseph Shih biostat New Jersey SPH
Jay Siegel biostast J&J
Hal Stern stat UC at Irvine
Estimands, Missing Data, and Sensitivity Analysis 24
Every MNAR Model Has Got an MARCounterpart
Molenberghs, Beunckens, Sotto, and Kenward (JRSSB 2008)
Creemers, Hens, Aerts, Molenberghs, Verbeke, and Kenward (2008)
• Fit an MNAR model to a set of incomplete data
• Change the conditional distribution of the unobserved outcomes, given the observedones, to comply with MAR
• Resulting new model has exactly the same fit as the original MNAR model
• The missing data mechanism has changed
• This implies that definitively testing for MAR versus MNAR is not possible
Estimands, Missing Data, and Sensitivity Analysis 25
MAR Counterpart to Pattern-mixtureModels
f (yio,yi
m, ri|θ, ψ) = f (yi
o|ri, θ) f (ri|ψ) f (yi
m|yio, ri, θ)
↓
h(yio,yi
m, ri|θ, ψ) = f (yi
o|ri, θ) f (ri|ψ) f (yi
m|yio, θ,
ψ)
• Starting from PMM is “natural”: clear separation into:
. fully observable components
. entirely unobserved component
Estimands, Missing Data, and Sensitivity Analysis 26
MAR Counterpart to Selection Models
f (yio,yi
m,ri|θ, ψ) = f (yi
o,yim|θ) f (ri|yio,yim,
ψ)
↓
f (yio,yi
m,ri|θ, ψ) = f (yi
o|ri, θ,
ψ) f (ri|θ,
ψ) f (yim|yio,ri,
θ,ψ)
↓
h(yio,yi
m,ri|θ, ψ) = f (yi
o|ri, θ,
ψ) f (ri|θ,
ψ) f (yim|yio,
θ,ψ)
Estimands, Missing Data, and Sensitivity Analysis 27
MAR Counterpart to Shared-parameterModels
f (yio,yi
m, ri|bi) = f (yoi |gi,hi, ji, `i) f (ymi |yoi , gi,hi,ki,mi) f (ri|gi, ji,kini)
↓
h(yio,yi
m, ri|bi) = f (yoi |gi,hi, ji, `i) h(ymi |yoi ,mi) f (ri|gi, ji,kini)
withh(ymi |yoi ,mi) =
∫
gi
∫
hi
∫
kif (ymi |yoi , gi,hi,ki,mi)dgidhidki
Estimands, Missing Data, and Sensitivity Analysis 28
Slovenian Public Opinion Survey: AnMNAR Model Family
Baker, Rosenberger, and DerSimonian (1992)
• Counts: Yr1r2jk
• Questions: j, k = 1, 2
• Non-response: r1, r2 = 0, 1
E(Y11jk) = mjk
E(Y10jk) = mjkβjk
E(Y01jk) = mjkαjk
E(Y00jk) = mjkαjkβjkγjk
. αjk: non-response on independence question
. βjk: non-response on attendance question
. γjk: interaction between both non-response indicators
Estimands, Missing Data, and Sensitivity Analysis 29
SPO Survey: Interval of Ignorance &Counterpart
Model Structure d.f. loglik θ C.I. θMAR
BRD1 (α, β) 6 -2495.29 0.892 [0.878;0.906] 0.8920
BRD2 (α, βj) 7 -2467.43 0.884 [0.869;0.900] 0.8915
BRD3 (αk, β) 7 -2463.10 0.881 [0.866;0.897] 0.8915
BRD4 (α, βk) 7 -2467.43 0.765 [0.674;0.856] 0.8915
BRD5 (αj, β) 7 -2463.10 0.844 [0.806;0.882] 0.8915
BRD6 (αj, βj) 8 -2431.06 0.819 [0.788;0.849] 0.8919
BRD7 (αk, βk) 8 -2431.06 0.764 [0.697;0.832] 0.8919
BRD8 (αj, βk) 8 -2431.06 0.741 [0.657;0.826] 0.8919
BRD9 (αk, βj) 8 -2431.06 0.867 [0.851;0.884] 0.8919
Model 10 (αk, βjk) 9 -2431.06 [0.762;0.893] [0.744;0.907] 0.8919
Model 11 (αjk, βj) 9 -2431.06 [0.766;0.883] [0.715;0.920] 0.8919
Model 12 (αjk, βjk) 10 -2431.06 [0.694;0.904] 0.8919
Estimands, Missing Data, and Sensitivity Analysis 30
Slovenian Public Opinion Survey:Incomplete Data
Observed ≡BRD7 ≡ BRD7(MAR) ≡BRD9 ≡ BRD9(MAR):
1439 78
16 16
159
32144 54 136
BRD1 ≡ BRD1(MAR):1381.6 101.7
24.2 41.4
182.9
8.1179.7 18.3 136.0
BRD2 ≡ BRD2(MAR):1402.2 108.9
15.6 22.3
159.0
32.0181.2 16.8 136.0
Estimands, Missing Data, and Sensitivity Analysis 31
Slovenian Public Opinion Survey:Complete-data Prediction
BRD1 ≡ BRD1(MAR):1381.6 101.7
24.2 41.4
170.4 12.5
3.0 5.1
176.6 13.0
3.1 5.3
121.3 9.0
2.1 3.6
BRD2:1402.2 108.9
15.6 22.3
147.5 11.5
13.2 18.8
179.2 13.9
2.0 2.9
105.0 8.2
9.4 13.4
BRD2(MAR):1402.2 108.9
15.6 22.3
147.7 11.3
13.3 18.7
177.9 12.5
3.3 4.3
121.2 9.3
2.3 3.2
BRD7:1439 78
16 16
3.2 155.8
0.0 32.0
142.4 44.8
1.6 9.2
0.4 112.5
0.0 23.1
BRD9:1439 78
16 16
150.8 8.2
16.0 16.0
142.4 44.8
1.6 9.2
66.8 21.0
7.1 41.1
BRD7(MAR) ≡ BRD9(MAR):1439 78
16 18
148.1 10.9
11.8 20.2
141.5 38.4
2.5 15.6
121.3 9.0
2.1 3.6
Estimands, Missing Data, and Sensitivity Analysis 32
Slovenian Public Opinion Survey:Collapsed (Marginalized) Predictions
BRD1 ≡ BRD1(MAR):1849.9 136.2
32.4 55.4=⇒ θ = 89.2%
BRD2:1833.9 142.5
40.2 57.5=⇒ θ = 88.4%
BRD2(MAR):1849.0 142.0
34.5 48.5=⇒ θ = 89.2%
BRD7:1585.0 391.1
17.6 80.3=⇒ θ = 76.4%
BRD9:1799.7 152.0
40.7 82.3=⇒ θ = 86.7%
BRD7(MAR) ≡ BRD9(MAR):1849.9 136.3
30.4 57.4=⇒ θ = 89.2%
Estimands, Missing Data, and Sensitivity Analysis 33
Modeling Frameworks & Missing DataMechanisms
f (yi, ri|Xi,θ,ψ)
Selection Models: f (yi|Xi,θ) f (ri|Xi,yoi ,y
mi ,ψ)
MCAR −→ MAR −→ MNAR
f (ri|Xi,ψ) f (ri|Xi,yoi ,ψ) f (ri|Xi,y
oi ,y
mi ,ψ)
Pattern-mixture Models: f (yi|Xi, ri, θ) f (ri|Xi,ψ)
Shared-parameter Models: f (yi|Xi, bi, θ) f (ri|Xi, bi,ψ)
Estimands, Missing Data, and Sensitivity Analysis 34
Frameworks and Their Methods: Start
f (yi, ri|Xi,θ,ψ)
Selection Models: f (yi|Xi, θ) f (ri|Xi,yoi ,y
mi ,ψ)
MCAR/simple −→ MAR −→ MNAR
direct likelihood!
direct Bayesian!
multiple imputation (MI)!
IPW ⊃ W-GEE!
d.l. + IPW = double robustness!
Estimands, Missing Data, and Sensitivity Analysis 35
Frameworks and Their Methods: Next
f (yi, ri|Xi,θ,ψ)
Selection Models: f (yi|Xi, θ) f (ri|Xi,yoi ,y
mi ,ψ)
MCAR/simple −→ MAR −→ MNAR
joint model!?
sensitivity analysis!
PMM
MI (MGK, J&J)
local influence
interval ignorance
IPW based
Estimands, Missing Data, and Sensitivity Analysis 36
Multiple Imputation
• Valid under MAR
• An alternative to direct likelihood and WGEE
• Three steps:
1. The missing values are filled in M times =⇒ M complete data sets
2. The M complete data sets are analyzed by using standard procedures
3. The results from the M analyses are combined into a single inference
• Rubin (1987), Rubin and Schenker (1986), Little and Rubin (1987), Verbeke and Molenberghs (2000),
Molenberghs and Verbeke (2005), Molenberghs and Kenward (2007), Ratitch and O’Kelly (2011),
Carpenter and Kenward (2013), Mallinckrodt (2013), van Buuren (2014), . . .
Estimands, Missing Data, and Sensitivity Analysis 37
• Multiple imputation (M = 5 imputations):
Observeddata
.................................................................................................................................................................................................................................................................................................................................................
........................................................................................
........................................................................................
..........................................
........................................
....................................................................................................................................................................................... ........................................
.......................................................................................................................................................................................................................... ........................................
......................................................................................................................................................................................................................................................................................................... ........................................
Imputed 1
Imputed 2
Imputed 3
Imputed 4
Imputed 5
....................................................................................................................................................................................... ........................................
....................................................................................................................................................................................... ........................................
....................................................................................................................................................................................... ........................................
....................................................................................................................................................................................... ........................................
....................................................................................................................................................................................... ........................................
Results 1
Results 2
Results 3
Results 4
Results 5
......................................................................................................................................................................................................................................................................................................... ........................................
.......................................................................................................................................................................................................................... ........................................
....................................................................................................................................................................................... ........................................
.......................................................................................
........................................................................................
...........................................
........................................
.................................................................................................................................................................................................................................................................................................................................................
Finalresults
...........................
..........................
...........................
...........................
..................................................
....................................
....................................................................................
.....................................................................................................................................................................
.........................................................................................................................................................................................................................................................
Imputation CombinationAnalysis
Estimands, Missing Data, and Sensitivity Analysis 38
Use of MI in Practice
• Many analyses of the same incomplete set of data
• A combination of missing outcomes and missing covariates
• As an alternative to WGEE: MI can be combined with classical GEE
• MI in SAS:
Imputation Task: PROC MI
↓Analysis Task: PROC “MYFAVORITE”
↓Inference Task: PROC MIANALYZE
Estimands, Missing Data, and Sensitivity Analysis 39
Generalized Estimating Equations
Liang and Zeger (1986)
S(β) =N∑
i=1[Di]
T [Vi(α)]−1 (yi − µi) = 0
• V i(.) is not the true variance of Y i but only a plausible guess
• the score equations are solved in a standard way
• Asymptotic distribution:√N(β − β) ∼ N (0, I−1
0 I1I−10 )
I0 =N∑
i=1DTi [Vi(α)]−1Di I1 =
N∑
i=1DTi [Vi(α)]−1Var(Y i)[Vi(α)]−1Di
Estimands, Missing Data, and Sensitivity Analysis 40
Weighted GEE
πi =ni∏
`=2(1 − pi`)
π′i =di−1∏
`=2(1− pi`)
· pidi
pi` = P (Di = `|Di ≥ `, Yi ` ,Xi ` )
Ri = 1 if subject i is complete
Ri = 0 if subject i is incomplete
S(β) =N∑
i=1
Ri
πi
∂µi∂β′
V −1i (yi − µi) = 0
S(β) =N∑
i=1
1
π′i
∂µoi∂β′
(V oi )−1(yoi − µoi ) = 0
Estimands, Missing Data, and Sensitivity Analysis 41
The Analgesic Trial
• single-arm trial with 530 patients recruited (491 selected for analysis)
• analgesic treatment for pain caused by chronic nonmalignant disease
• treatment was to be administered for 12 months
• we will focus on Global Satisfaction Assessment (GSA)
• GSA scale goes from 1=very good to 5=very bad
• GSA was rated by each subject 4 times during the trial, at months 3, 6, 9, and 12.
Estimands, Missing Data, and Sensitivity Analysis 42
• Research questions:
. Evolution over time
. Relation with baseline covariates: age, sex, duration of the pain, type of pain,disease progression, Pain Control Assessment (PCA), . . .
. Investigation of dropout
• Frequencies:
GSA Month 3 Month 6 Month 9 Month 12
1 55 14.3% 38 12.6% 40 17.6% 30 13.5%
2 112 29.1% 84 27.8% 67 29.5% 66 29.6%
3 151 39.2% 115 38.1% 76 33.5% 97 43.5%
4 52 13.5% 51 16.9% 33 14.5% 27 12.1%
5 15 3.9% 14 4.6% 11 4.9% 3 1.4%
Tot 385 302 227 223
Estimands, Missing Data, and Sensitivity Analysis 43
• Missingness:
Measurement occasion
Month 3 Month 6 Month 9 Month 12 Number %
Completers
O O O O 163 41.2
Dropouts
O O O M 51 12.91
O O M M 51 12.91
O M M M 63 15.95
Non-monotone missingness
O O M O 30 7.59
O M O O 7 1.77
O M O M 2 0.51
O M M O 18 4.56
M O O O 2 0.51
M O O M 1 0.25
M O M O 1 0.25
M O M M 3 0.76
Estimands, Missing Data, and Sensitivity Analysis 44
Analysis of the Analgesic Trial
• A logistic regression for the dropout indicator:
logit[P (Di = j|Di ≥ j, ·)] = ψ0 + ψ11I(GSAi,j−1 = 1) + ψ12I(GSAi,j−1 = 2)
+ψ13I(GSAi,j−1 = 3) + ψ14I(GSAi,j−1 = 4)
+ψ2PCA0i + ψ3PFi + ψ4GDi
with
. GSAi,j−1 the 5-point outcome at the previous time
. I(·) is an indicator function
. PCA0i is pain control assessment at baseline
. PFi is physical functioning at baseline
. GDi is genetic disorder at baseline are used
Estimands, Missing Data, and Sensitivity Analysis 45
Effect Par. Estimate (s.e.)
Intercept ψ0 -1.80 (0.49)
Previous GSA= 1 ψ11 -1.02 (0.41)
Previous GSA= 2 ψ12 -1.04 (0.38)
Previous GSA= 3 ψ13 -1.34 (0.37)
Previous GSA= 4 ψ14 -0.26 (0.38)
Basel. PCA ψ2 0.25 (0.10)
Phys. func. ψ3 0.009 (0.004)
Genetic disfunc. ψ4 0.59 (0.24)
• There is some evidence for MAR: P (Di = j|Di ≥ j) depends on previous GSA.
• Furthermore: baseline PCA, physical functioning and genetic/congenital disorder.
Estimands, Missing Data, and Sensitivity Analysis 46
• GEE and WGEE:
logit[P (Yij = 1|tj,PCA0i)] = β1 + β2tj + β3t2j + β4PCA0i
Effect Par. GEE WGEE
Intercept β1 2.95 (0.47) 2.17 (0.69)
Time β2 -0.84 (0.33) -0.44 (0.44)
Time2 β3 0.18 (0.07) 0.12 (0.09)
Basel. PCA β4 -0.24 (0.10) -0.16 (0.13)
• A hint of potentially important differences between both
Estimands, Missing Data, and Sensitivity Analysis 47
Make Monotone −→ Weighted GEE
• Multiple imputation to create monotone missingness:
proc mi data=m.gsa4 seed=459864 simple nimpute=10
round=0.1 out=m.gsaimput;
title ’Monotone multiple imputation’;
mcmc impute = monotone;
var pca0 physfct gsa1 gsa2 gsa3 gsa4;
run;
• Standard GEE:
proc gee data=m.gsaw plots=histogram;
title ’Standard GEE for GSA data’;
class patid timecls;
model gsabin = time|time pca0 / dist=bin;
repeated subject=patid / within=timecls corr=un;
run;
Estimands, Missing Data, and Sensitivity Analysis 48
• Weighted GEE, where weights are created at observation level:
ods graphics on;
proc gee data=m.gsaimput03 plots=histogram;
title ’Weighted GEE for GSA Data Based on Multiple
Imputation to Monotonize - OBSLEVEL’;
by _imputation_;
class patid timecls;
model gsabin = time|time pca0 / dist=bin covb;
repeated subject=patid / within=timecls corr=un ecovb;
missmodel prevgsa pca0 physfct / type=obslevel;
ods output GEEEmpPEst=gmparms parminfo=gmpinfo
modelinfo=modelinfo GEERCov=gmcovb;
run;
proc mianalyze parms=gmparms parminfo=gmpinfo covb=gmcovb;
title ’Multiple Imputation Analysis After Weighted GEE for GSA Data’;
modeleffects intercept time time*time pca0;
run;
Estimands, Missing Data, and Sensitivity Analysis 49
• To use weights at subject rather than observation level:
missmodel prevgsa pca0 physfct / type=sublevel;
Effect Par. Est.(s.e.) p-value Est.(s.e.) p-value
Standard GEE
Without MI After MI
Intercept β0 2.90(0.46) 2.87(0.45)
Time β1 -0.81(0.32) 0.0124 -0.83(0.32) 0.0087
Time2 β2 0.17(0.07) 0.0083 0.18(0.06) 0.0058
PCA0 β3 -0.23(0.10) 0.0178 -0.21(0.10) 0.0253
Weighted GEE (after MI)
Observation level Subject level
Intercept β0 2.74(0.46) 2.62(0.60)
Time β1 -0.76(0.33) 0.0231 -0.71(0.40) 0.0747
Time2 β2 0.17(0.07) 0.0155 0.16(0.08) 0.0444
PCA0 β3 -0.19(0.10) 0.0384 -0.21(0.12) 0.0853
Estimands, Missing Data, and Sensitivity Analysis 50
Age-related Macular Degeneration
I T I S A
P L E A S
U R E T O
B E W I T
H Y O U A
T F O R T
C O L L I
N S F O R
G R A Y B
I L L IC SA
Estimands, Missing Data, and Sensitivity Analysis 51
Age-related Macular Degeneration Trial
• Pharmacological Therapy for Macular Degeneration Study Group (1997)
• An occular pressure disease which makes patients progressively lose vision
• 240 patients enrolled in a multi-center trial (190 completers)
• Treatment: Interferon-α (6 million units) versus placebo
• Visits: baseline and follow-up at 4, 12, 24, and 52 weeks
• Continuous outcome: visual acuity: # letters correctly read on a vision chart
• Binary outcome: visual acuity versus baseline ≥ 0 or ≤ 0
Estimands, Missing Data, and Sensitivity Analysis 52
• Missingness:
Measurement occasion
4 wks 12 wks 24 wks 52 wks Number %
Completers
O O O O 188 78.33
Dropouts
O O O M 24 10.00
O O M M 8 3.33
O M M M 6 2.50
M M M M 6 2.50
Non-monotone missingness
O O M O 4 1.67
O M M O 1 0.42
M O O O 2 0.83
M O M M 1 0.42
Estimands, Missing Data, and Sensitivity Analysis 53
Weighted Generalized EstimatingEquations
• Model for the weights:
logit[P (Di = j|Di ≥ j)] = ψ0 + ψ1yi,j−1 + ψ2Ti + ψ31L1i + ψ32L2i + ψ34L3i
+ψ41I(tj = 2) + ψ42I(tj = 3)
with
. yi,j−1 the binary outcome at the previous time ti,j−1 = tj−1 (since time is commonto all subjects)
. Ti = 1 for interferon-α and Ti = 0 for placebo
. Lki = 1 if the patient’s eye lesion is of level k = 1, . . . , 4 (since one dummyvariable is redundant, only three are used)
. I(·) is an indicator function
Estimands, Missing Data, and Sensitivity Analysis 54
• Results for the weights model:
Effect Parameter Estimate (s.e.)
Intercept ψ0 0.14 (0.49)
Previous outcome ψ1 0.04 (0.38)
Treatment ψ2 -0.86 (0.37)
Lesion level 1 ψ31 -1.85 (0.49)
Lesion level 2 ψ32 -1.91 (0.52)
Lesion level 3 ψ33 -2.80 (0.72)
Time 2 ψ41 -1.75 (0.49)
Time 3 ψ42 -1.38 (0.44)
• GEE:logit[P (Yij = 1|Ti, tj)] = βj1 + βj2Ti
Estimands, Missing Data, and Sensitivity Analysis 55
Effect Par. CC LOCF Observed data
Unweighted WGEE
Int.4 β11 -1.01(0.24;0.24) -0.87(0.20;0.21) -0.87(0.21;0.21) -0.98(0.10;0.44)
Int.12 β21 -0.89(0.24;0.24) -0.97(0.21;0.21) -1.01(0.21;0.21) -1.78(0.15;0.38)
Int.24 β31 -1.13(0.25;0.25) -1.05(0.21;0.21) -1.07(0.22;0.22) -1.11(0.15;0.33)
Int.52 β41 -1.64(0.29;0.29) -1.51(0.24;0.24) -1.71(0.29;0.29) -1.72(0.25;0.39)
Tr.4 β12 0.40(0.32;0.32) 0.22(0.28;0.28) 0.22(0.28;0.28) 0.80(0.15;0.67)
Tr.12 β22 0.49(0.31;0.31) 0.55(0.28;0.28) 0.61(0.29;0.29) 1.87(0.19;0.61)
Tr.24 β32 0.48(0.33;0.33) 0.42(0.29;0.29) 0.44(0.30;0.30) 0.73(0.20;0.52)
Tr.52 β42 0.40(0.38;0.38) 0.34(0.32;0.32) 0.44(0.37;0.37) 0.74(0.31;0.52)
Corr. ρ 0.39 0.44 0.39 0.33
Estimands, Missing Data, and Sensitivity Analysis 56
Multiple Imputation
•M = 10 imputations
• GEE:logit[P (Yij = 1|Ti, tj)] = βj1 + βj2Ti
• GLMM:
logit[P (Yij = 1|Ti, tj, bi)] = βj1 + bi + βj2Ti, bi ∼ N (0, τ 2)
• Ti = 0 for placebo and Ti = 1 for interferon-α
• tj (j = 1, . . . , 4) refers to the four follow-up measurements
• Imputation based on the continuous outcome
Estimands, Missing Data, and Sensitivity Analysis 57
Effect Par. GEE GLMM
Int.4 β11 -0.84(0.20) -1.46(0.36)
Int.12 β21 -1.02(0.22) -1.75(0.38)
Int.24 β31 -1.07(0.23) -1.83(0.38)
Int.52 β41 -1.61(0.27) -2.69(0.45)
Trt.4 β12 0.21(0.28) 0.32(0.48)
Trt.12 β22 0.60(0.29) 0.99(0.49)
Trt.24 β32 0.43(0.30) 0.67(0.51)
Trt.52 β42 0.37(0.35) 0.52(0.56)
R.I. s.d. τ 2.20(0.26)
R.I. var. τ 2 4.85(1.13)
Estimands, Missing Data, and Sensitivity Analysis 58
Sensitivity Analysis
• We apply a shift to the treatment group:
proc mi data=m.armd13 seed=486048 simple out=m.armd13as1
nimpute=10 round=0.1;
title ’Shift multiple imputation’;
class treat;
var lesion diff4 diff12 diff24 diff52;
fcs reg;
mnar adjust (diff12 / shift=10 adjustobs=(treat=’2’));
mnar adjust (diff24 / shift=15 adjustobs=(treat=’2’));
mnar adjust (diff52 / shift=20 adjustobs=(treat=’2’));
by treat;
run;
Estimands, Missing Data, and Sensitivity Analysis 59
• Expanded results (all after MI):
GLMM
Effect Par. GEE MAR shift
Int.4 β11 -0.84(0.20) -1.46(0.36) -1.42(0.36)
Int.12 β21 -1.02(0.22) -1.75(0.38) -1.67(0.38)
Int.24 β31 -1.07(0.23) -1.83(0.38) -1.83(0.39)
Int.52 β41 -1.61(0.27) -2.69(0.45) -2.77(0.44)
Trt.4 β12 0.21(0.28) 0.32(0.48) 0.24(0.48)
Trt.12 β22 0.60(0.29) 0.99(0.49) 0.91(0.50)
Trt.24 β32 0.43(0.30) 0.67(0.51) 0.65(0.51)
Trt.52 β42 0.37(0.35) 0.52(0.56) 0.60(0.55)
R.I. s.d. τ 2.20(0.26) 2.20(0.25)
R.I. var. τ 2 4.85(1.13) 4.83(1.08)
Estimands, Missing Data, and Sensitivity Analysis 60
• Shift can be applied to one or a few groups only
• Alternative to shift: adding extra randomness
• Alternative to shift: multiplier
• Use pattern-mixture model identifying restrictions:
. NCMV
. CCMV
Estimands, Missing Data, and Sensitivity Analysis 61
Overview
MCAR/simple CC biased
LOCF inefficient
not simpler than MAR methods
MAR direct likelihood easy to conduct
direct Bayes Gaussian & non-Gaussian
weighted GEE
MI
MNAR variety of methods strong, untestable assumptions
most useful in sensitivity analysis
Estimands, Missing Data, and Sensitivity Analysis 62
“Estimands, Missing Data and Sensitivity Analysis” by Dr. Molenberghs
標本調査理論との関係
代替エンドポイント評価の事例
情報はどこから生成されるか:データ・デザイン・仮定
仮定は検証可能でないため,感度解析の実施が求められる.
エンリッチメント
モデルが含む見地>その見地についてデータが与えることができる情報
エンリチッチメントのクラス
a. Coarsening: 例えば,データは観測されるが,その一部は観測されなかった-欠測値
b. Augmentation: 簡便な解釈のために,モデルが観測されない変数(例えば,ランダム効果)で補足される
1
“Estimands, Missing Data and Sensitivity Analysis” by Dr. Molenberghs
欠測データの一般的な枠組み
National Academy of Science Report 「患者」‐アカデミア-規制-企業
MAR (Missing at Random)モデルとNot Missing at Random (NMAR) どのようなMARモデルにも,データに同じあてはめを提供できるカウン
ターパートとしてのNMARモデルが存在する
データにのみ依存し,MARが成りたっているかどうかを示すことはできない
標準的な解析と感度解析の適用
無視可能な尤度 (Ignorable Likelihood) Bayes流接近法
多重補完
Inverse Probability Weighting (IPW) (重みつき一般化方程式)
2