
Mixed models

with applications to large data sets

Geert Verbeke

L-Biostat: Leuven Biostatistics and statistical Bioinformatics Centre

Katholieke Universiteit Leuven, Belgium

[email protected]

http://perswww.kuleuven.be/geert verbeke

Outline

• Mixed models, multi-level models, or something else ?

• Mixed models in action: The Diabetes Project Leuven

• Mixed models in large data sets

LSD, June 7-8, 2012 1

I will focus on . . .

• Model formulation

• Parameter interpretation

• Misconceptions

• Problems often encountered in practice

• Issues with large data sets


I will NOT talk about . . .

• Estimation methods

• Inferential procedures

• Model selection

• Diagnostics

• . . .


Growth curves

(Goldstein 1979)

Group             Mother's height      Children
Small mothers     < 155 cm             1 → 6
Medium mothers    [155 cm; 164 cm]     7 → 13
Tall mothers      > 164 cm             14 → 20


A multi-level model

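• $Y_{ij}$ is the $j$th measurement for the $i$th child, taken at time $t_j$ (age)

• Level 1:

$$Y_{ij} = \beta_{1i} + \beta_{2i} t_j + \varepsilon_{ij}$$

• Level 2:

$$\beta_{1i} = \beta_1 S_i + \beta_3 M_i + \beta_5 T_i + b_{1i}$$

$$\beta_{2i} = \beta_2 S_i + \beta_4 M_i + \beta_6 T_i + b_{2i}$$

• $S_i$, $M_i$ and $T_i$ indicate whether child $i$ has a small, medium or tall mother

• Assumptions: $\varepsilon_{ij} \sim N(0, \sigma^2_{res})$ and $b_i = (b_{1i}, b_{2i})' \sim N(0, D)$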


Level 1 + Level 2 = Mixed model

$$Y_{ij} = (\beta_1 S_i + \beta_3 M_i + \beta_5 T_i + b_{1i}) + (\beta_2 S_i + \beta_4 M_i + \beta_6 T_i + b_{2i})\, t_j + \varepsilon_{ij}$$

⇓

Linear regression model with fixed and random parameters

⇕

“Mixed” model


The general model

• Mixed / multi-level model:

$$Y_i = X_i \beta + Z_i b_i + \varepsilon_i$$

• Implied marginal model:

$$Y_i \sim N(X_i \beta,\; Z_i D Z_i' + \sigma^2_{res} I)$$

• Fixed effects model systematic trends

• Random effects generate association
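The implied marginal model can be checked by simulation. The sketch below is only an illustration: the design (four measurement times, a random intercept and slope) and all numerical values of `beta`, `D` and `sigma2_res` are hypothetical and do not come from the growth-curve data. It simulates from the hierarchical model and compares the empirical covariance of $Y_i$ with $Z_i D Z_i' + \sigma^2_{res} I$.

```python
import numpy as np

rng = np.random.default_rng(2012)

# Hypothetical setup: random intercept and slope, four measurement times.
t = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(t), t])  # fixed-effects design X_i
Z = X.copy()                               # random-effects design Z_i
beta = np.array([80.0, 5.0])               # illustrative fixed effects
D = np.array([[4.0, 0.5],
              [0.5, 0.25]])                # illustrative covariance of b_i
sigma2_res = 1.0                           # illustrative residual variance


def implied_marginal_cov(Z, D, sigma2_res):
    """Marginal covariance Z D Z' + sigma^2 I implied by the mixed model."""
    return Z @ D @ Z.T + sigma2_res * np.eye(Z.shape[0])


def simulate(n_subjects):
    """Draw Y_i = X beta + Z b_i + eps_i for n_subjects independent subjects."""
    b = rng.multivariate_normal(np.zeros(2), D, size=n_subjects)
    eps = rng.normal(0.0, np.sqrt(sigma2_res), size=(n_subjects, len(t)))
    return X @ beta + b @ Z.T + eps


V = implied_marginal_cov(Z, D, sigma2_res)
Y = simulate(200_000)
V_hat = np.cov(Y, rowvar=False)  # empirical covariance across subjects

print("implied covariance:\n", V)
print("largest absolute deviation:", np.abs(V_hat - V).max())
```

The non-zero off-diagonal elements of `V` are the association generated by the random effects; the fixed effects only enter the mean `X @ beta`.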


Flexibilities & extensions

• Unbalanced data

• More than 2 levels:

E.g., 10 cities
→ In each: 5 schools
→ In each: 2 classes
→ In each: 5 students
→ Each student given a test twice

• Linear →

. generalized linear models (logistic, count, . . . )

. non-linear models (PK/PD data)

. generalized non-linear models


The Diabetes Project Leuven

(Borgermans et al. 2009)

• The impact of offering GPs the assistance of a diabetes care team, consisting of a nurse educator, a dietician, an ophthalmologist and an internal medicine doctor, for the treatment of their diabetes patients

• GPs randomized to one of two programs:

. LIP: Low Intervention Program (group A)

. HIP: High Intervention Program (group R)

• We consider the HIP group only

. 61 GPs with 1577 patients

. Number of patients per GP between 5 and 138, with a median of 47


• Patients were measured twice:

. When the program was initiated (time T0)

. After one year (time T1)

• HbA1c: glycosylated hemoglobin:

. Molecule in red blood cells that attaches to glucose (blood sugar)

. High values reflect more glucose in blood

. Gives a good estimate of how well diabetes has been managed over the last 2 or 3 months

. Non-diabetics have values between 4% and 6%

. HbA1c above 7% means diabetes is poorly controlled, implying higher risk for long-term complications.


A logistic mixed model

• Dichotomized version of HbA1c:

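$$Y = \begin{cases} 1 & \text{if HbA1c} < 7\% \\ 0 & \text{if HbA1c} \geq 7\% \end{cases}$$

• A three-level logistic mixed model:

$$Y_{ijk} \sim \text{Bernoulli}(\pi_{ijk})$$

$$\text{logit}(\pi_{ijk}) = \log\left(\frac{\pi_{ijk}}{1 - \pi_{ijk}}\right) = \beta_0 + \beta_1 t_k + a_i + b_{j(i)},$$

$$a_i \sim N(0, \sigma^2_{GP}), \qquad b_{j(i)} \sim N(0, \sigma^2_{PAT})$$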


Fixed effects

Effect                   Estimate (se)       p-value
Intercept $\beta_0$      0.1662 (0.0796)     0.0410
Time $\beta_1$           0.6240 (0.0812)     < .0001

“Fixed effects model systematic trends”

≠ “Fixed effects model average trends”


Logistic random-intercepts model

$$E[Y_{ijk} \mid a_i, b_{j(i)}] = \pi_{ijk} = \frac{\exp[\beta_0 + \beta_1 t_k + a_i + b_{j(i)}]}{1 + \exp[\beta_0 + \beta_1 t_k + a_i + b_{j(i)}]}$$


Average subject treated by average GP

$$E[Y_{ijk} \mid a_i = 0, b_{j(i)} = 0] = \frac{\exp[\beta_0 + \beta_1 t_k + 0 + 0]}{1 + \exp[\beta_0 + \beta_1 t_k + 0 + 0]}$$


Average evolution

$$E[Y_{ijk}] = E\left[\frac{\exp[\beta_0 + \beta_1 t_k + a_i + b_{j(i)}]}{1 + \exp[\beta_0 + \beta_1 t_k + a_i + b_{j(i)}]}\right]$$


Conclusion

Average evolution ≠ Evolution of the average subject

• Parameters in the mixed model have a subject-specific interpretation, not a population-averaged one.

• Calculation of the marginal population average requires computation of

$$\int\!\!\int \frac{\exp[\beta_0 + \beta_1 t_k + a_i + b_{j(i)}]}{1 + \exp[\beta_0 + \beta_1 t_k + a_i + b_{j(i)}]}\, f(a_i)\, f(b_{j(i)})\, da_i\, db_{j(i)}$$
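The gap between the two quantities can be made concrete numerically. The sketch below plugs in the estimates reported on the "Fixed effects" and "Variance components" slides and assumes, as is usual for this model but not stated explicitly in the slides, that $a_i$ and $b_{j(i)}$ are independent. Their sum is then $N(0, \sigma^2_{GP} + \sigma^2_{PAT})$, so the double integral reduces to a one-dimensional one, approximated here with Gauss-Hermite quadrature. The function names and the coding of time as 0 (T0) and 1 (T1) are choices made for this illustration.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Estimates reported on the "Fixed effects" and "Variance components" slides.
BETA0, BETA1 = 0.1662, 0.6240
SIGMA2_GP, SIGMA2_PAT = 0.1399, 1.1154


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-x))


def average_subject_prob(t, beta0=BETA0, beta1=BETA1):
    """P(Y = 1) for the average patient of the average GP (a_i = b_j(i) = 0)."""
    return float(expit(beta0 + beta1 * t))


def marginal_prob(t, beta0=BETA0, beta1=BETA1,
                  sigma2_gp=SIGMA2_GP, sigma2_pat=SIGMA2_PAT, n_nodes=40):
    """Population-averaged P(Y = 1), integrating over both random effects.

    Assumes a_i and b_j(i) are independent, so a_i + b_j(i) is normal with
    variance sigma2_gp + sigma2_pat.
    """
    nodes, weights = hermegauss(n_nodes)   # weight function exp(-x^2 / 2)
    sd = np.sqrt(sigma2_gp + sigma2_pat)
    values = expit(beta0 + beta1 * t + sd * nodes)
    return float(np.sum(weights * values) / np.sqrt(2.0 * np.pi))


for t in (0, 1):  # assumed coding: 0 = T0, 1 = T1
    print(f"t={t}: average subject {average_subject_prob(t):.4f}, "
          f"population average {marginal_prob(t):.4f}")
```

Under these assumptions the population-averaged probabilities lie closer to 0.5 than those of the average subject, which is why the fixed effects should not be read as describing average trends.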


Variance components

Effect                                        Estimate (se)       p-value
Between-GP variance $\sigma^2_{GP}$           0.1399 (0.0528)     ?
Between-patient variance $\sigma^2_{PAT}$     1.1154 (0.1308)     ?

!!! Tests for variance components not standard !!!


Random effects predictions


Scatterplot of random effects predictions


• For each GP, we observe at most 7 different patient predictions.

• These correspond to the 7 possible response profiles: 0 → 0, 0 → 1, 1 → 1, 0 → ·, 1 → ·, · → 0, and · → 1.

• The negative trends are also a side effect of the discrete nature of the outcomes.

• Two patients, $j_1$ and $j_2$, treated by different GPs, $i_1$ and $i_2$, with the same response profile should get identical predicted probabilities

⇒ $a_{i_1} + b_{j_1(i_1)} = a_{i_2} + b_{j_2(i_2)}$

⇒ $a_i + b_{j(i)}$ is constant

⇒ Observed non-normality not necessarily problematic


The reverse ?

• Simulation of 1000 subjects with 5 measurements each

• Histogram of true random intercepts:


• Histogram of predictions assuming normality:

• The normal “prior” forces the predictions to satisfy normality
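The histograms themselves are not part of this transcript, but the effect can be reproduced with a small simulation. Everything below beyond "1000 subjects with 5 measurements each" is an assumption made for illustration: the true random intercepts are drawn from an equal mixture of N(-2, 1) and N(2, 1), the residual variance is set to 30, the model has no covariates, and the variance components are treated as known rather than estimated. Under a normal prior, the prediction of each random intercept is then a shrunken subject mean.

```python
import numpy as np

rng = np.random.default_rng(1996)

N_SUBJECTS, N_MEAS = 1000, 5  # as on the slide
SIGMA2_RES = 30.0             # assumed residual variance (deliberately large)


def excess_kurtosis(x):
    """Sample excess kurtosis; 0 for a normal, clearly negative if bimodal."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 4) - 3.0)


# Assumed true distribution: equal mixture of N(-2, 1) and N(2, 1).
component = rng.integers(0, 2, size=N_SUBJECTS)
b_true = rng.normal(np.where(component == 1, 2.0, -2.0), 1.0)

# Random-intercepts model without covariates: y_ij = b_i + eps_ij.
y = b_true[:, None] + rng.normal(0.0, np.sqrt(SIGMA2_RES),
                                 size=(N_SUBJECTS, N_MEAS))

# Prediction under a normal prior N(0, d): the subject mean, shrunk to zero.
d = b_true.var()  # random-intercept variance, treated as known here
shrinkage = d / (d + SIGMA2_RES / N_MEAS)
b_pred = shrinkage * y.mean(axis=1)

print("excess kurtosis, true intercepts:", round(excess_kurtosis(b_true), 2))
print("excess kurtosis, predictions:    ", round(excess_kurtosis(b_pred), 2))
```

With these settings the true intercepts are strongly bimodal, while the predictions look much closer to a single normal distribution, so a histogram of predictions says little about the true random-effects distribution.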


Conclusion

(Verbeke & Lesaffre 1996)

The normality assumption for random effects cannot be tested using their predictions

⇓

Model extensions are needed


Large data sets

[Diagram: data matrix with subjects #1, ..., #N as rows and measurements 1, ..., n as columns]

N large, or n large, or both ?


Situations leading to large data sets

• N large: Observational longitudinal data

• n large: Statistical genetics / functional data analysis

• N and n large: Large multivariate longitudinal data


Large N

[Diagram: the N × n data matrix, with braces grouping the rows (subjects) into sub-samples]

⇒ Independent sub-samples


Large n

[Diagram: the N × n data matrix, with braces grouping the columns (measurements) into sub-samples]

⇒ Dependent sub-samples


The general split sample idea

(Molenberghs, Verbeke, & Iddi 2011)

• Split sample in M sub-samples

• Analyse each sub-sample separately

• Combine results in appropriate way

• Inference follows from pseudo likelihood ideas
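The steps above can be sketched for the large-N case, where the sub-samples consist of different subjects and are therefore independent. The example is a simplified illustration under stated assumptions: a random-intercepts model with a single fixed intercept, hypothetical parameter values, and simple averaging of the sub-sample estimates as the combination rule (the slides do not spell out which rule is used). The helper names `simulate`, `fit` and `split_sample_fit` are made up for this sketch.

```python
import numpy as np

rng = np.random.default_rng(2011)


def simulate(n_subjects, n_meas, beta=10.0, sd_b=2.0, sd_res=1.0):
    """y_ij = beta + b_i + eps_ij, with hypothetical parameter values."""
    b = rng.normal(0.0, sd_b, size=(n_subjects, 1))
    return beta + b + rng.normal(0.0, sd_res, size=(n_subjects, n_meas))


def fit(y):
    """Estimate the fixed intercept and the variance of that estimate."""
    subject_means = y.mean(axis=1)  # one summary per independent subject
    estimate = subject_means.mean()
    variance = subject_means.var(ddof=1) / len(subject_means)
    return estimate, variance


def split_sample_fit(y, n_splits):
    """Split subjects into sub-samples, fit each, combine by averaging."""
    parts = np.array_split(y, n_splits, axis=0)  # split by subject (rows)
    fits = [fit(part) for part in parts]
    estimates = np.array([f[0] for f in fits])
    variances = np.array([f[1] for f in fits])
    combined = estimates.mean()
    # Independent sub-samples: variance of a mean of M independent estimates.
    combined_var = variances.sum() / n_splits ** 2
    return combined, combined_var


y = simulate(n_subjects=10_000, n_meas=5)
full_est, full_var = fit(y)
split_est, split_var = split_sample_fit(y, n_splits=10)

print("full sample :", full_est, full_var)
print("split sample:", split_est, split_var)
```

For dependent sub-samples (large n) the variance line no longer applies, because the sub-sample estimates are correlated; that is the case the pseudo-likelihood theory is meant to handle.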


Pseudo likelihood

(Arnold & Strauss 1991)

• (Log-)Likelihood:

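$$\ell(\Theta) = \sum_i \ell(y_i \mid \Theta), \qquad \hat{\Theta} \xrightarrow{d} N(\Theta, I_0^{-1})$$

• Pseudo (log-)likelihood:

$$p\ell(\Theta) = \sum_i \sum_s \delta_s\, \ell(y_i^{(s)} \mid \Theta), \qquad \hat{\Theta} \xrightarrow{d} N(\Theta, I_0^{-1} I_1 I_0^{-1})$$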
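A toy case shows why the sandwich form $I_0^{-1} I_1 I_0^{-1}$ matters. In the sketch below, which rests on assumptions chosen purely for illustration, each subject contributes n correlated measurements (random-intercepts data with hypothetical variances), while the pseudo-likelihood treats every single measurement as its own sub-vector $y_i^{(s)}$ with $\delta_s = 1$, that is, as independent N(μ, s²) with s² known. $I_0$ and $I_1$ are computed as sums over subjects, so that `I1 / I0**2` directly estimates the variance of the estimated mean.

```python
import numpy as np

rng = np.random.default_rng(1991)

N, n = 2000, 5            # subjects, measurements per subject
SD_B, SD_RES = 1.0, 1.0   # hypothetical standard deviations
MU = 3.0                  # hypothetical true mean

# Correlated data: y_ij = mu + b_i + eps_ij.
y = MU + rng.normal(0.0, SD_B, size=(N, 1)) \
       + rng.normal(0.0, SD_RES, size=(N, n))

# Pseudo-likelihood: every y_ij treated as independent N(mu, s2).
s2 = SD_B ** 2 + SD_RES ** 2  # working variance, treated as known
mu_hat = y.mean()             # maximizer of the pseudo-likelihood

# Per-subject score contributions, evaluated at mu_hat.
scores = (y - mu_hat).sum(axis=1) / s2

I0 = N * n / s2               # minus the second derivative, summed over i
I1 = np.sum(scores ** 2)      # sum of squared per-subject scores

naive_var = 1.0 / I0          # would be valid only for a true likelihood
sandwich_var = I1 / I0 ** 2   # I0^{-1} I1 I0^{-1}
true_var = (SD_B ** 2 + SD_RES ** 2 / n) / N  # known from the simulation

print("naive   :", naive_var)
print("sandwich:", sandwich_var)
print("true    :", true_var)
```

The naive variance ignores the within-subject correlation and is too small, whereas the sandwich estimate should be close to the true variance of the mean.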


Example: Multivariate longitudinal data

• Threshold sound pressure levels (dB), on both ears, 11 frequencies: 125 → 8000 Hz

• Observations from 603 males, with up to 15 obs./subject.

× 603


Linear mixed models for hearing data

• Linear mixed model for one outcome:

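$$Y_i(t) = (\beta_1 + \beta_2\,\text{Fage}_i + \beta_3\,\text{Fage}_i^2 + a_i) + (\beta_4 + \beta_5\,\text{Fage}_i + b_i)\, t + \beta_6\,\text{visit1}(t) + \varepsilon_i(t)$$

• Joint model:

$$\begin{aligned}
Y_{1i}(t) &= \mu_1(t) + a_{1i} + b_{1i}\, t + \varepsilon_{1i}(t) \\
Y_{2i}(t) &= \mu_2(t) + a_{2i} + b_{2i}\, t + \varepsilon_{2i}(t) \\
&\;\;\vdots \\
Y_{22,i}(t) &= \mu_{22}(t) + a_{22,i} + b_{22,i}\, t + \varepsilon_{22,i}(t)
\end{aligned}$$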


Joint model

• Distributional assumptions:

$$(a_{1i}, a_{2i}, \ldots, a_{22,i}, b_{1i}, b_{2i}, \ldots, b_{22,i})' \sim N(0, D_{44 \times 44})$$

$$(\varepsilon_{1i}(t), \varepsilon_{2i}(t), \ldots, \varepsilon_{22,i}(t))' \sim N(0, \Sigma_{22 \times 22}), \quad \text{for all } t$$

• Full multivariate joint model

. 44 × 44 covariance matrix for random effects

. 22 × 22 covariance matrix for error components

. 990 + 253 = 1243 covariance parameters

⇒ Computational problems!
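The parameter count follows from the fact that an unstructured symmetric $q \times q$ covariance matrix has $q(q+1)/2$ free elements:

$$\frac{44 \cdot 45}{2} = 990, \qquad \frac{22 \cdot 23}{2} = 253, \qquad 990 + 253 = 1243.$$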


Pairwise approach

(Fieuws & Verbeke 2006)

• Fit all 231 bivariate models using (RE)ML (SAS PROC MIXED):

$$(Y_1, Y_2), (Y_1, Y_3), \ldots, (Y_1, Y_{22}), (Y_2, Y_3), \ldots, (Y_2, Y_{22}), \ldots, (Y_{21}, Y_{22})$$

• Equivalent to maximizing pseudo (log-)likelihood:

$$p\ell(\Theta) = \ell(Y_1, Y_2 \mid \Theta_{1,2}) + \ell(Y_1, Y_3 \mid \Theta_{1,3}) + \ldots + \ell(Y_{21}, Y_{22} \mid \Theta_{21,22})$$

• Inferences follow from pseudo likelihood theory
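The mechanics of the pairwise approach can be sketched in a setting much simpler than the hearing data. The example below is not the analysis from the slides: each "bivariate model" is just a bivariate normal distribution fitted by maximum likelihood (no random effects, no time trend), the data are simulated with a hypothetical covariance matrix, and parameters that are estimated in several pairs are combined by simple averaging, which is the combination rule assumed here.

```python
from itertools import combinations

import numpy as np

rng = np.random.default_rng(2006)


def n_pairs(n_outcomes):
    """Number of bivariate models needed for n_outcomes outcomes."""
    return len(list(combinations(range(n_outcomes), 2)))


def pairwise_covariance(y):
    """Fit every pair of columns separately and average the estimates."""
    n_out = y.shape[1]
    total = np.zeros((n_out, n_out))
    count = np.zeros((n_out, n_out))
    for r, s in combinations(range(n_out), 2):
        # Bivariate ML fit: 2 x 2 covariance of outcomes r and s.
        pair_cov = np.cov(y[:, [r, s]], rowvar=False, bias=True)
        idx = np.ix_([r, s], [r, s])
        total[idx] += pair_cov
        count[idx] += 1  # each variance is estimated in n_out - 1 pairs
    return total / count


# Hypothetical data: 500 subjects, 4 correlated outcomes.
A = rng.normal(size=(4, 4))
Sigma = A @ A.T + np.eye(4)
y = rng.multivariate_normal(np.zeros(4), Sigma, size=500)

Sigma_pairwise = pairwise_covariance(y)
Sigma_full = np.cov(y, rowvar=False, bias=True)  # joint ML fit

print("bivariate models for 22 outcomes:", n_pairs(22))
print("max difference from joint fit:", np.abs(Sigma_pairwise - Sigma_full).max())
```

In this complete-data toy case the pairwise and joint estimates coincide. With mixed models and unbalanced data they generally do not, and standard errors have to come from the pseudo-likelihood sandwich formula.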


Overlapping sub-samples

[Diagram: the N × n data matrix, with two rows of braces grouping the columns (measurements) into overlapping sub-samples]


Hearing data: Joint tests for fixed effects

• Example: Interaction between the linear time effect and age.

• Estimates and standard errors:

$\chi^2_{10} = 90.4$, $p < 0.0001$          $\chi^2_{10} = 110.9$, $p < 0.0001$


Hearing data: Association of evolutions

• Association between underlying random effects: $D_{44 \times 44}$ of interest

• PCA on correlation matrix of random slopes, left side:


Conclusions

• Mixed models provide flexible tools for hierarchical data:

. Unbalanced data

. Multiple levels

. Natural way to incorporate association by modeling variability

. Natural extension of ‘standard models’

. Large data sets can be handled (pseudo-likelihood)

• However:

. Parameter interpretation needs careful reflection

. Inference not always standard

. Model assessment more involved


Thanks !