Impact of the Distributional Assumptions of Random Effects on Model Fitting Xueying (Sherri) Zhang Research Scientist California Department of Health Services CDIC/ Tobacco control section


Post on 27-Mar-2015



Impact of the Distributional Assumptions of Random Effects on Model Fitting

Xueying (Sherri) Zhang

Research Scientist

California Department of Health Services

CDIC/ Tobacco control section


Overview

1. Introduction

2. Data Source and Study Population

3. Methods: Model Description and Simulations

4. Results

5. Discussion and Conclusions


Introduction

Distribution of the random effects

[Figure: distribution of the random effects bi; x-axis: bi, with ticks from -0.31969 to 2.88031; y-axis from 0 to 6.]

Random effect model: Logit [E (Yij=1| bi)] = ß0 + ß1Xij1 + ß2Xij2 + ß3Xij3 + bi


Introduction

Research Questions:

1. What is the impact of the normality assumption for the random effects on the estimates of the fixed effects?

2. When a cluster-level confounder is omitted from the model and the random effects are associated with the covariates in the model, are the estimates from the RE model still correct?


Data Source and Study Population

1. The Regional Rural Injury Study-II (RRIS-II) was a population-based, prospective cohort study designed to investigate the incidence and consequences of agricultural injury in the five-state region of Minnesota, Wisconsin, North Dakota, South Dakota, and Nebraska in 1999.

2. 3,765 households, including 16,538 persons, participated in the study.

3. We modeled the probability of agricultural activity-related injury (Yes/No). Gender, prior injury, and working hours per week on the agricultural operation were chosen as the covariates.

4. The data are clustered binary outcomes: family members work on the same operation, and the behaviors of parents and children are potentially similar.


Model Description

A generalized linear mixed model (GzLMM) with a random intercept is expressed as:

Logit [E (Yij=1| bi)] = ß0 + ß1Xij1+ ß2Xij2 + ß3Xij3+bi

Yij indicates whether the agricultural activity-related injury happened or not in 1999 for the jth person in the ith family

bi is the random effect for the ith family

Xijk indicates gender, age, education, marital status, prior injury, working hours on the farm and the percentage of prior injury within the family (PPI).


Methods

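Random effect model: the marginal likelihood for Bernoulli data is as follows:

$$L(\beta, D \mid Y) = \prod_{i=1}^{m} \int \prod_{j=1}^{n_i} P_{ij}^{\,y_{ij}} \, (1 - P_{ij})^{1 - y_{ij}} \, f(b_i) \, db_i$$

where Pij = E(Yij | bi), with a logit link:

$$\log\left(\frac{P_{ij}}{1 - P_{ij}}\right) = X_{ij}\beta + Z_{ij} b_i, \qquad b_i \sim N(0, \sigma^2)$$

$$f(b_i) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(b_i - 0)^2}{2\sigma^2}}$$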
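The integral over bi in the marginal likelihood has no closed form, so it is usually approximated numerically. The following is a minimal sketch of one cluster's contribution, assuming a random intercept only (Zij = 1) and Gauss-Hermite quadrature. The function name, its arguments, and the choice of quadrature are illustrative assumptions; the slides do not say how the integral was evaluated.

```python
import numpy as np

def cluster_marginal_likelihood(y, eta_fixed, sigma, n_nodes=30):
    """Approximate one cluster's marginal likelihood under b_i ~ N(0, sigma^2).

    y         : 0/1 outcomes of the n_i members of the cluster
    eta_fixed : fixed-effect linear predictors X_ij @ beta (same length as y)
    sigma     : standard deviation of the random intercept
    n_nodes   : number of quadrature nodes (hypothetical default)
    """
    y = np.asarray(y, dtype=float)
    eta_fixed = np.asarray(eta_fixed, dtype=float)
    # Nodes and weights for integrals of the form: integral of exp(-t^2) g(t) dt
    t, w = np.polynomial.hermite.hermgauss(n_nodes)
    # Change of variable b = sqrt(2) * sigma * t maps the N(0, sigma^2) density
    # onto the exp(-t^2) weight, up to the factor 1 / sqrt(pi)
    b = np.sqrt(2.0) * sigma * t
    # p[k, j] = P(Y_ij = 1 | b_i = b[k]) under the logit link
    p = 1.0 / (1.0 + np.exp(-(eta_fixed[None, :] + b[:, None])))
    # Bernoulli likelihood of the whole cluster at each node
    cond_lik = np.prod(p ** y * (1.0 - p) ** (1.0 - y), axis=1)
    return float(np.sum(w * cond_lik) / np.sqrt(np.pi))
```

The full likelihood would be the product of these contributions over the m clusters (in practice, the sum of their logarithms).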


Methods

Conditional model:

Logit [E (Yij=1| bi)] = ß’0 + ß’1Xij1 + ß’2Xij2 + ß’3Xij3 + bi

Given the sufficient statistics for bi, Sij, the conditional likelihood can be expressed as:

$$L = \prod_{i} \frac{\prod_{j} f(y_{ij} \mid x_{i1}, \ldots, x_{i3}, b_i)}{\prod_{j} f(S_{ij} \mid x_{i1}, \ldots, x_{i3}, b_i)}$$
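Conditioning on the sufficient statistic removes bi from the likelihood. The sketch below shows one cluster's contribution, assuming the sufficient statistic is the cluster total of the outcomes (the usual choice for a random intercept with a logit link; the slides do not define it explicitly). The function and argument names are hypothetical.

```python
from itertools import combinations

import numpy as np

def cluster_conditional_likelihood(y, X, beta):
    """P(y_i1, ..., y_in | sum_j y_ij) for one cluster under a logit link.

    y    : 0/1 outcomes of the cluster members
    X    : covariate matrix (one row per member, no intercept column)
    beta : fixed-effect coefficients
    The intercept and b_i cancel between numerator and denominator,
    so neither appears here.
    """
    y = np.asarray(y, dtype=int)
    eta = np.asarray(X, dtype=float) @ np.asarray(beta, dtype=float)
    s = int(y.sum())
    numerator = np.exp(eta[y == 1].sum())
    # Sum over every way of assigning s events among the cluster members
    denominator = sum(
        np.exp(eta[np.array(idx, dtype=int)].sum())
        for idx in combinations(range(len(y)), s)
    )
    return float(numerator / denominator)
```

Clusters in which every member, or no member, has the outcome contribute a factor of 1 and therefore carry no information about the coefficients in this approach.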


Methods

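Marginal model:

The marginal model takes only the fixed effects into account and does not condition on bi. The model is:

Logit [E (Yij=1)] = ß*0 + ß*1Xij1 + ß*2Xij2 + ß*3Xij3

Var(Yij) = φVar(E(Yij))

It estimates β* by solving a quasi-score function:

$$\sum_{i=1}^{m} D_i' \, V_i^{-1} \left( y_i - f(x_i \beta^*) \right) = 0$$

where, in the standard estimating-equation notation, D_i is the derivative of f(x_i β*) with respect to β* and V_i is the working covariance matrix of y_i.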
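To make the quasi-score equation concrete: with a logit link and an independence working correlation, it reduces to the sum over all observations of xij (yij - µij) = 0, which can be solved by Fisher scoring. The sketch below assumes that working correlation (the slides do not state which one was used) and returns point estimates only; robust standard errors are not computed. The function name is hypothetical.

```python
import numpy as np

def fit_marginal_logit(X, y, n_iter=25, tol=1e-10):
    """Solve the quasi-score equation sum_ij x_ij (y_ij - mu_ij) = 0.

    X : design matrix with an intercept column, one row per person
    y : 0/1 outcomes
    Assumes an independence working correlation (an assumption of this sketch).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))          # marginal means
        score = X.T @ (y - mu)                          # quasi-score
        info = X.T @ (X * (mu * (1.0 - mu))[:, None])   # expected information
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```

The dispersion parameter φ does not affect the point estimates; it only rescales the variance of the estimates.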


Simulations---(1)

True model for the first simulation:

Logit [E (Yij | bi)] = ß0 + ß1*gender + ß2*workhour +ß3 *priorinj +bi

Random effects:

nbi = sqrt(σ2/v2)*(bi-µ), nbi ~ (0, σ2), i.e. the rescaled random effects nbi have mean 0 and variance σ2 but keep the shape of the distribution of bi.

σ2: the estimated variance of the random effects from the true model.

v2: the variance of the predicted bi from the true model.

µ: the mean of the predicted bi from the true model.

bi: the predicted random effects from the true model.

Pij= Exp (Xijß+nbi)/(1+ Exp(Xijß+nbi))
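The rescaling and the probability formula above can be sketched as follows. This is an illustration only: the function names are hypothetical, and using the population variance (divisor n) for v2 is an assumption, since the slides do not say which divisor was used.

```python
import numpy as np

def rescale_random_effects(b_pred, sigma2):
    """nb_i = sqrt(sigma2 / v2) * (b_i - mu): mean 0, variance sigma2."""
    b_pred = np.asarray(b_pred, dtype=float)
    mu = b_pred.mean()   # mean of the predicted random effects
    v2 = b_pred.var()    # their variance (divisor n, assumed here)
    return np.sqrt(sigma2 / v2) * (b_pred - mu)

def conditional_probability(eta_fixed, nb):
    """P_ij = exp(X_ij*beta + nb_i) / (1 + exp(X_ij*beta + nb_i))."""
    eta = np.asarray(eta_fixed, dtype=float) + np.asarray(nb, dtype=float)
    return 1.0 / (1.0 + np.exp(-eta))   # algebraically the same expression

# Made-up, right-skewed predicted random effects for five families
b_pred = np.array([-0.3, -0.2, 0.0, 0.4, 2.5])
nb = rescale_random_effects(b_pred, sigma2=0.8)
```

Because the transformation is linear, the rescaled effects stay as skewed as the predicted ones, which is what lets the simulation use a non-normal random-effects distribution.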


Simulations---(1)

1000 different seeds; for each seed, a random number (R) is drawn from U(0,1) for each individual.

If R ≤ Pij, SimY=1; else, SimY=0

Pr (SimY=1) = Pij

Covariates remain the same as in the real data.

A marginal model, an RE model, and a conditional model were fit to each simulated data set.
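The sampling rule can be sketched as below, with one seed per simulated data set. The probabilities are made-up values for three individuals, and the function name is hypothetical; the random number generator used in the original study is not stated.

```python
import numpy as np

def simulate_outcomes(p, seed):
    """Draw SimY = 1 if R <= P_ij, else 0, with R ~ U(0, 1)."""
    rng = np.random.default_rng(seed)
    r = rng.uniform(0.0, 1.0, size=np.shape(p))
    return (r <= np.asarray(p)).astype(int)

# Made-up conditional probabilities for three individuals
p = np.array([0.1, 0.5, 0.9])

# One simulated data set per seed, 1000 seeds in total
sims = np.stack([simulate_outcomes(p, seed) for seed in range(1000)])
```

Each row of `sims` plays the role of one simulated outcome vector; the three models would then be refit to every row while the covariates stay fixed.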


Simulations---(2)

True model for the second simulation:

Logit [E (Yij | bi)] = ß0 + ß1*gender + ß2*workhour + ß3*priorinj + ß4*PPI + bi

PPI: percentage of prior injury within the family.

Random effects:

nbi = sqrt(σ2/v2)*(bi-µ), nbi ~ (0, σ2), i.e. nbi has mean 0 and variance σ2.

σ2: the estimated variance of the random effects from the true model.

v2: the variance of the predicted bi from the true model.

µ: the mean of the predicted bi from the true model.

bi: the predicted random effects from the true model.

Pij = Exp (Xijß+nbi)/(1+ Exp(Xijß+nbi))
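A cluster-level covariate such as PPI can be built from the individual prior-injury indicator. The sketch below assumes PPI is the within-family proportion of members with a prior injury, on a 0 to 1 scale and including the person's own record; the slides do not give the exact definition or scale, and the function name is hypothetical.

```python
import numpy as np

def percentage_prior_injury(family_id, prior_injury):
    """Within-family proportion of members with a prior injury, per person."""
    family_id = np.asarray(family_id)
    prior = np.asarray(prior_injury, dtype=float)
    # inverse[k] is the index of person k's family among the unique family ids
    _, inverse = np.unique(family_id, return_inverse=True)
    family_size = np.bincount(inverse)
    family_sum = np.bincount(inverse, weights=prior)
    # Broadcast the family-level value back to every member
    return (family_sum / family_size)[inverse]

# Made-up example: two families with 2 and 3 members
ppi = percentage_prior_injury([1, 1, 2, 2, 2], [1, 0, 0, 0, 1])
```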


Simulations---(2)

1000 different seeds; for each seed, a random number (R) is drawn from U(0,1) for each individual.

If R ≤ Pij, SimY=1; else, SimY=0

Pr (SimY=1) = Pij

Covariates remain the same as in the real data.

A marginal model, an RE model, and a conditional model were fit to each simulated data set.


Results from the model using real data:

Working hours   Marginal model              RE model                    Conditional model
                Estimate   95% C.I.         Estimate   95% C.I.         Estimate   95% C.I.
0               -2.78      (-3.38, -2.18)   -2.90      (-3.53, -2.26)   -2.50      (-3.28, -1.73)
0-20            -1.24      (-1.55, -0.92)   -1.33      (-1.66, -0.99)   -1.15      (-1.65, -0.66)
21-40           -0.63      (-0.91, -0.34)   -0.69      (-1.00, -0.38)   -0.55      (-1.00, -0.09)
41-60           -0.31      (-0.58, -0.04)   -0.33      (-0.63, -0.04)   -0.13      (-0.58, 0.32)
61-80           -0.20      (-0.46, 0.07)    -0.22      (-0.51, 0.08)    -0.05      (-0.49, 0.40)
81+ *            0.00      (0.00, 0.00)      0.00      (0.00, 0.00)      0.00      (0.00, 0.00)


Results from the model using real data:

[Figure: estimates by farm working hours (0, 0-20, 21-40, 41-60, 61-80, 81+ *) for the Marginal Model, RE Model, and Conditional Model; y-axis: Estimates, from -4.00 to 1.00.]


Results--Simulation (1) of the model without PPI:

The true model for the simulation is the RE model:

Logit [E (Yij | bi)] = ß0 + ß1*gender + ß2*workhour +ß3 *priorinj +bi

1. The average estimates of the RE model are closer to those of the conditional model than to those of the marginal model.

2. The bias for prior injury in the RE model is 0.1130, much larger in magnitude than the biases from the marginal and conditional models (0.0573 and -0.0036).

3. The MSE for prior injury from the RE model is 0.0182, more than twice the MSE from the marginal and conditional models (0.0073 and 0.0089).

Model               Variable        Bt (true value)   Bs (avg estimate)   Bias (Bs-Bt)   MSE mean((Bi-Bt)^2)
Marginal model      Working hours   0.9448            0.9048              -0.0400        0.0073
Marginal model      Gender          0.5716            0.5172              -0.0544        0.0081
Marginal model      Prior injury    1.1263            1.1836               0.0573        0.0073
RE model            Working hours   1.0392            1.0613               0.0221        0.0083
RE model            Gender          0.6287            0.5765              -0.0522        0.0092
RE model            Prior injury    1.2388            1.3518               0.1130        0.0182
Conditional model   Working hours   1.0392            1.0516               0.0124        0.0124
Conditional model   Gender          0.6287            0.6249              -0.0038        0.0083
Conditional model   Prior injury    1.2388            1.2352              -0.0036        0.0089
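The bias and MSE rows follow directly from the formulas in the table labels, Bias = Bs - Bt and MSE = mean((Bi - Bt)^2), where Bi is the estimate from the i-th simulated data set. A minimal sketch with made-up estimates (the function name is hypothetical):

```python
import numpy as np

def bias_and_mse(estimates, true_value):
    """Bias (Bs - Bt) and MSE, mean((Bi - Bt)^2), over simulated data sets."""
    estimates = np.asarray(estimates, dtype=float)
    bias = estimates.mean() - true_value             # Bs - Bt
    mse = np.mean((estimates - true_value) ** 2)     # mean((Bi - Bt)^2)
    return float(bias), float(mse)

# Made-up estimates from three simulated data sets, true value 1.1
bias, mse = bias_and_mse([1.0, 1.2, 1.4], true_value=1.1)
```

With these definitions, the MSE equals the squared bias plus the variance of the estimates across data sets, so it reflects both systematic error and sampling variability.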

Percentage of C.I. coverage, by nominal confidence level (simulation of the model without PPI):

Model               Variable        80% C.I.   85% C.I.   90% C.I.   95% C.I.   99% C.I.
Marginal model      Working hours   0.7770     0.8220     0.8780     0.9400     0.9890
Marginal model      Gender          0.6810     0.7550     0.8270     0.8940     0.9830
Marginal model      Prior injury    0.6950     0.7450     0.7940     0.8880     0.9720
RE model            Working hours   0.8080     0.8690     0.9210     0.9640     0.9850
RE model            Gender          0.7230     0.7910     0.8320     0.9010     0.9900
RE model            Prior injury    0.4830     0.5340     0.6230     0.7420     0.9030
Conditional model   Working hours   0.7990     0.8510     0.9260     0.9710     0.9990
Conditional model   Gender          0.8000     0.8440     0.8880     0.9440     0.9940
Conditional model   Prior injury    0.8620     0.9130     0.9360     0.9720     0.9950
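Coverage is the fraction of simulated data sets whose confidence interval contains the true value. The sketch below assumes Wald-type intervals, estimate ± z × standard error; the slides do not state how the intervals were constructed, and the function name and inputs are hypothetical.

```python
from statistics import NormalDist

import numpy as np

def ci_coverage(estimates, std_errors, true_value, level):
    """Fraction of Wald intervals at the given nominal level covering the true value."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)   # e.g. about 1.96 for level = 0.95
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    lower = est - z * se
    upper = est + z * se
    return float(np.mean((lower <= true_value) & (true_value <= upper)))

# Made-up estimates and standard errors from three simulated data sets
cov95 = ci_coverage([0.0, 1.0, 3.0], [1.0, 1.0, 1.0], true_value=0.0, level=0.95)
cov50 = ci_coverage([0.0, 1.0, 3.0], [1.0, 1.0, 1.0], true_value=0.0, level=0.50)
```

If a model is well calibrated, the coverage at each level should be close to the nominal level; coverage far below it, as for prior injury in the RE model above, signals biased estimates or understated standard errors.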


Results -Simulation of the Model with PPI:

Hypothesis for the incorrect estimates for prior injury in the RE model: an important cluster-level confounder related to prior injury was omitted from the model.

The random effects were significantly associated with prior injury (ß =0.0162, p=0.0021).

After PPI was included in the model, the random effects were independent of prior injury (ß=0.0027, p=0.6101) and PPI (ß=0.0084, p=0.3668).


Results -Simulation of the Model with PPI:

PPI is a confounder for the effect of prior injury because it is significant in the model (ß=0.5205, p<0.001) and also associated with prior injury.

True model for the second simulation:

Logit [E (Yij | bi)] = ß0 + ß1*gender + ß2*workhour +ß3 *priorinj + ß4 *PPI + bi

Model               Variable        Bt (true value)   Bs (avg estimate)   Bias (Bs-Bt)   MSE mean((Bi-Bt)^2)
Marginal model      Working hours   0.9688            0.9145              -0.0543        0.0057
Marginal model      Gender          0.5940            0.5375              -0.0565        0.0049
Marginal model      Prior injury    0.9827            1.0935               0.1109        0.0067
RE model            Working hours   1.0653            1.0698               0.0045        0.0080
RE model            Gender          0.6531            0.6285              -0.0246        0.0072
RE model            Prior injury    1.0805            1.2517               0.1712        0.0088
Conditional model   Working hours   1.0653            1.0429              -0.0224        0.0141
Conditional model   Gender          0.6531            0.6249              -0.0282        0.0077
Conditional model   Prior injury    1.0805            1.2389               0.1584        0.0097


Results -Simulation of the Model with PPI:

The average estimates of the RE model are almost equal to those of the conditional model.

The biases of the estimates for these three models are very similar to each other.

The mean squared errors (MSE) of the estimates of these three models are also similar.

Percentage of C.I. coverage, by nominal confidence level (simulation of the model with PPI):

Model               Variable        80% C.I.   85% C.I.   90% C.I.   95% C.I.   99% C.I.
Marginal model      Working hours   0.8340     0.8950     0.9260     0.9570     0.9970
Marginal model      Gender          0.8280     0.8640     0.9170     0.9750     1.0000
Marginal model      Prior injury    0.8540     0.8770     0.9040     0.9510     0.9990
RE model            Working hours   0.8300     0.8810     0.9200     0.9580     0.9970
RE model            Gender          0.7670     0.8340     0.8890     0.9380     0.9970
RE model            Prior injury    0.8350     0.8810     0.9060     0.9480     0.9970
Conditional model   Working hours   0.8100     0.8450     0.8970     0.9410     0.9920
Conditional model   Gender          0.8100     0.8700     0.9000     0.9450     1.0000
Conditional model   Prior injury    0.8030     0.8740     0.9270     0.9630     0.9900


Results -Simulation of the Model with PPI:

In most cases, the percentage of C.I. coverage is at or above the corresponding nominal confidence level.

For instance, the percentage of 80% C.I. coverage for working hours in the marginal model is 83.4%, higher than 80%.

The percentages of C.I. coverage are greatly improved compared with the model without PPI, especially for prior injury in the RE model.

For instance, in the model without PPI only 48.3% of the 80% confidence intervals for the prior injury estimates from the RE model cover the true value, whereas in Table 7, with PPI included, 83.5% of the 80% confidence intervals cover the true value.


Discussion and Conclusions

The fixed-effect estimates from the RE model are correct even when the random effects do not follow a normal distribution.

The random effects should be independent of the covariates in the model.


Discussion and Conclusions

In this project, after PPI was included, the random effects were independent of all the covariates but still did not follow a normal distribution.

Unknown or unmeasured variables that affect the probability of injury within the family may exist.


Discussion and Conclusions

One limitation of the project is that the true values for the marginal model were not available.

For further study, the impacts of the normality assumption of random effects on the estimates of random effects are of interest.


Thank You!