introduction to longitudinal data analysis · 2012-04-27 · analysis of longitudinal data. oxford...

Introduction to Longitudinal Data Analysis

Fotis Siannis, University of Athens, Department of Mathematics

[email protected]

April 27, 2012

Bibliography

• Weiss, R. (2005). Modeling Longitudinal Data. Springer.

• Diggle, P.J., Heagerty, P., Liang, K.Y. and Zeger, S. (2002). Analysis of Longitudinal Data. Oxford Statistical Science Series.

• Fitzmaurice, G.M., Laird, N.M. and Ware, J.H. (2004). Applied Longitudinal Analysis. Wiley.

• Davis, C. (2002). Statistical Methods for the Analysis of Repeated Measures. Springer.

• Crowder, M.J. and Hand, D.J. (1990). Analysis of Repeated Measures. Chapman & Hall.

Longitudinal Data Analysis 1

Introduction

• We are familiar with the assumptions behind linear regression models, in particular that observations are independent.

• The defining feature of longitudinal studies is that measurements on the same individual are collected repeatedly over time.

• As a result, observations on the same individual will typically be associated.

• Hence, the assumption of independent observations cannot be justified.


• The availability of repeated measurements on the same subjects at several time points offers more information than a cross-sectional study.

• Longitudinal studies allow the study of change over time (within-subject change).

• The primary goals in studies of this kind are to:

  - characterize the change in the response over time
  - investigate the factors that influence it

• Responses can be either univariate or multivariate (here we focus on univariate responses).


[Figure: Reading Ability plotted against Age]


Example: CD4+ Cell Numbers (macs data; Diggle et al.)

The Human Immunodeficiency Virus (HIV) causes AIDS by attacking and reducing the number of CD4+ cells, and hence a person's ability to fight infection.

• An uninfected individual has around 1100 cells per millilitre of blood

• The number of CD4+ cells decreases with time from infection

• The CD4+ cell number can be used to monitor disease progression

We have 2376 values of CD4+ cell number from 369 infected individuals. We plot CD4+ values against time since seroconversion (the time at which HIV becomes detectable). [Multicenter AIDS Cohort Study, MACS (Kaslow et al. 1987)]


[Figure: CD4+ cell numbers against years since seroconversion]


Example: Treatment of Lead Exposed Children (TLC) Trial

(Fitzmaurice et al.)

• The TLC trial was a placebo-controlled, randomized study of succimer (a chelating agent) in children with blood lead levels of 20-44 micrograms/dL.

• The data consist of four repeated measurements of blood lead level, obtained at baseline (week 0), week 1, week 4 and week 6, on 100 children who were randomly assigned to chelation treatment with succimer or to placebo.

    Group      Baseline   Week 1   Week 4   Week 6

    Succimer   26.5       13.5     15.5     20.8
               (5.0)      (7.7)    (7.8)    (9.2)

    Placebo    26.3       24.7     24.1     23.6
               (5.0)      (5.5)    (5.8)    (5.6)

Table 1: Mean blood lead levels in micrograms/dL (sd) from the TLC trial.


[Figures: blood lead level against time (weeks) for the TLC trial, including the mean blood lead level by group (Succimer, Placebo)]


Example: Small Mice Data (Weiss)

• Weights in milligrams of new-born male mice.

• All from mothers from a single strain.

• 14 mice measured every 3 days.

• Measurements from day 2 up to day 20.

• Balanced data set.


R console output, wide format (one row per mouse):

       Group id weight.2 weight.5 weight.8 weight.11 weight.14 weight.17 weight.20
    1      3 22      190      388      621       823      1078      1132      1191
    8      3 23      218      393      568       729       839       852      1004
    15     3 24      141      260      472       662       760       885       878
    22     3 25      211      394      549       700       783       870       925
    29     3 26      209      419      645       850      1001      1026      1069
    36     3 27      193      362      520       530       641       640       751
    43     3 28      201      361      502       530       657       762       888
    50     3 29      202      370      498       650       795       858       910
    57     3 30      190      350      510       666       819       879       929
    64     3 31      219      399      578       699       709       822       953
    71     3 32      225      400      545       690       796       825       836
    78     3 33      224      381      577       756       869       929       999
    85     4 34      187      329      441       525       589       621       796
    92     4 35      278      471      606       770       888      1001      1105


R console output, long format (one row per measurement; first three mice shown):

       Group id weight day
    1      3 22    190   2
    2      3 22    388   5
    3      3 22    621   8
    4      3 22    823  11
    5      3 22   1078  14
    6      3 22   1132  17
    7      3 22   1191  20
    8      3 23    218   2
    9      3 23    393   5
    10     3 23    568   8
    11     3 23    729  11
    12     3 23    839  14
    13     3 23    852  17
    14     3 23   1004  20
    15     3 24    141   2
    16     3 24    260   5
    17     3 24    472   8
    18     3 24    662  11
    19     3 24    760  14
    20     3 24    885  17
    21     3 24    878  20
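The two printouts above hold the same data in wide format (one row per mouse) and in long format (one row per measurement). The conversion between them can be sketched in plain Python as follows; the names `wide_rows`, `days` and `wide_to_long` are illustrative and not part of the original notes, and only the first two mice are used.

```python
# Wide format: one row per mouse (group, id, weights at days 2, 5, ..., 20).
days = [2, 5, 8, 11, 14, 17, 20]
wide_rows = [
    (3, 22, [190, 388, 621, 823, 1078, 1132, 1191]),
    (3, 23, [218, 393, 568, 729, 839, 852, 1004]),
]


def wide_to_long(rows, days):
    """Return one (group, id, weight, day) record per measurement."""
    long_rows = []
    for group, mouse_id, weights in rows:
        for day, weight in zip(days, weights):
            long_rows.append((group, mouse_id, weight, day))
    return long_rows


long_rows = wide_to_long(wide_rows, days)
for record in long_rows[:3]:
    print(record)
```

The long format is usually the more convenient one for unbalanced data, because each subject can contribute a different number of rows.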


[Figures: Weight against Day for the small mice data]


Distinctive Feature

Longitudinal data are clustered. Clusters are formed by the repeated measurements obtained from the same subject/individual at different times/occasions.

• This feature implies that observations within a cluster are correlated, and 'common sense' says that they are positively correlated.

• This correlation is usually not of interest in itself.

• However, it needs to be accounted for in the analysis, because it invalidates the 'common' assumption of independent observations.

• Observations from different subjects are NOT correlated.

• Clustered data can arise in many different ways. Family, school, hospital, and household data are clusters that produce correlated data.


Objectives

Important role in health sciences.

• Investigate heterogeneity among individuals (genetic, social, behavioral).

• Investigate changes in the response over time. This is not possible in cross-sectional studies, where within-subject and between-subject factors that influence change over time cannot be distinguished.

• Relate changes to covariates.

• Make predictions about how specific individuals change over time.


Terminology

• In a longitudinal data (LD) study, the units being studied are referred to as subjects or individuals.

• Individuals are measured at different times or occasions.

• The number of repeated observations and their timing can vary between studies and/or individuals.

  - A study where all individuals have the same number of observations, usually at the same occasions, is called balanced.

  - Otherwise the study is unbalanced (the 'norm' for LD studies).

• Missing data are very common, leading to incomplete data.

• Data can be collected prospectively (advisable) or retrospectively (often poor quality data).


Balanced Studies

• Example: a clinical trial measuring the efficacy of an analgesic agent, taking repeated measures of a self-reported pain scale at baseline and at the end of six 15-min intervals.

• Usually arise when the length of the study is short or when humans are not the subjects of investigation (e.g. rats).

Unbalanced Studies

• Example: arthritis patients who are scheduled to visit the clinic at 6-month intervals either miss a visit or never attend at exactly 6 months (6-12 months).

• Most health-related studies.
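Whether a data set in long format is balanced can be checked mechanically by comparing the measurement schedules of the subjects. The sketch below assumes records of the form (subject id, time, response); the function names and the example records are hypothetical.

```python
from collections import defaultdict


def occasions_by_subject(long_rows):
    """Map each subject id to the sorted list of its measurement times."""
    times = defaultdict(list)
    for subject_id, time, _response in long_rows:
        times[subject_id].append(time)
    return {sid: sorted(t) for sid, t in times.items()}


def is_balanced(long_rows):
    """True when every subject is measured at the same set of occasions."""
    schedules = list(occasions_by_subject(long_rows).values())
    return all(s == schedules[0] for s in schedules)


# Hypothetical records: (subject id, time in months, response).
balanced = [(1, 0, 5.0), (1, 6, 4.1), (2, 0, 6.2), (2, 6, 5.9)]
unbalanced = [(1, 0, 5.0), (1, 6, 4.1), (2, 0, 6.2), (2, 7, 5.9), (2, 13, 5.5)]

print(is_balanced(balanced), is_balanced(unbalanced))
```

This uses the strict definition (same number of observations at the same occasions); a study with a common schedule but some missed visits would be reported as unbalanced by this check.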


Notation

• Let $Y_{ij}$ denote the response of the $i$-th individual ($i = 1, \dots, N$) at the $j$-th occasion ($j = 1, \dots, n$). (This notation is sufficient if measurements are equally spaced.)

• Given that we have $n$ repeated measures for each individual, we can group them in an $n \times 1$ vector

$$Y_i = \begin{pmatrix} Y_{i1} \\ Y_{i2} \\ \vdots \\ Y_{in} \end{pmatrix} \quad \text{or} \quad Y_i = (Y_{i1}, Y_{i2}, \dots, Y_{in})'.$$

• Interest lies in the mean response and how it changes with covariates (treatment group, age, sex, ...)

$$\mu_j = E(Y_{ij}).$$


If we allow the mean response to differ across individuals, then

$$\mu_{ij} = E(Y_{ij}).$$


Data Structures

The general layout is

• N subjects

• from which we get n repeated measures

• at times ti

• Yij response of interest from subject i at occasion j

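• with covariates $x_{ij} = (x_{ij1}, x_{ij2}, \dots, x_{ijp})$. Generally the number of covariates may vary across the repeated measurements

• Missing indicator

$$\delta_{ij} = \begin{cases} 1, & \text{if } Y_{ij} \text{ and } x_{ij} \text{ are observed}, \\ 0, & \text{if missing}. \end{cases}$$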


Layout for the one-sample case

               Occasion
    Subject    1        ...   j        ...   n

    1          y_{11}   ...   y_{1j}   ...   y_{1n}
    ...        ...            ...            ...
    i          y_{i1}   ...   y_{ij}   ...   y_{in}
    ...        ...            ...            ...
    N          y_{N1}   ...   y_{Nj}   ...   y_{Nn}


    Subject   Time point   Missing indicator   Response     Covariates

    1         1            δ_{11}              y_{11}       x_{111} ... x_{11p}
              ...          ...                 ...          ...
              j            δ_{1j}              y_{1j}       x_{1j1} ... x_{1jp}
              ...          ...                 ...          ...
              t_1          δ_{1t_1}            y_{1t_1}     x_{1t_11} ... x_{1t_1p}
    ...
    i         1            δ_{i1}              y_{i1}       x_{i11} ... x_{i1p}
              ...          ...                 ...          ...
              j            δ_{ij}              y_{ij}       x_{ij1} ... x_{ijp}
              ...          ...                 ...          ...
              t_i          δ_{it_i}            y_{it_i}     x_{it_i1} ... x_{it_ip}
    ...
    n         1            δ_{n1}              y_{n1}       x_{n11} ... x_{n1p}
              ...          ...                 ...          ...
              j            δ_{nj}              y_{nj}       x_{nj1} ... x_{njp}
              ...          ...                 ...          ...
              t_n          δ_{nt_n}            y_{nt_n}     x_{nt_n1} ... x_{nt_np}

Table 2: General layout for repeated measurements (in this table n denotes the number of subjects and t_i the number of time points for subject i)


                        Time point
    Group   Subject     1             ...   j             ...   t

    1       1           y_{111}       ...   y_{11j}       ...   y_{11t}
            ...         ...                 ...                 ...
            i           y_{1i1}       ...   y_{1ij}       ...   y_{1it}
            ...         ...                 ...                 ...
            n_1         y_{1n_11}     ...   y_{1n_1j}     ...   y_{1n_1t}
    ...
    h       1           y_{h11}       ...   y_{h1j}       ...   y_{h1t}
            ...         ...                 ...                 ...
            i           y_{hi1}       ...   y_{hij}       ...   y_{hit}
            ...         ...                 ...                 ...
            n_h         y_{hn_h1}     ...   y_{hn_hj}     ...   y_{hn_ht}
    ...
    s       1           y_{s11}       ...   y_{s1j}       ...   y_{s1t}
            ...         ...                 ...                 ...
            i           y_{si1}       ...   y_{sij}       ...   y_{sit}
            ...         ...                 ...                 ...
            n_s         y_{sn_s1}     ...   y_{sn_sj}     ...   y_{sn_st}

Table 3: Layout for the special case of multiple samples


Dependence & Correlation

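Consider a simple LD design that is balanced and complete, with $n$ measurements of the response variable at a common set of occasions on $N$ individuals.

• Expectation: $\mu_{ij} = E(Y_{ij})$.

• Variance: $\sigma_j^2 = E\{[Y_{ij} - E(Y_{ij})]^2\} = E\{(Y_{ij} - \mu_{ij})^2\}$.

• Covariance: $\sigma_{jk} = E\{(Y_{ij} - \mu_{ij})(Y_{ik} - \mu_{ik})\}$.

• Correlation: $\rho_{jk} = \dfrac{E\{(Y_{ij} - \mu_{ij})(Y_{ik} - \mu_{ik})\}}{\sigma_j \sigma_k}$.


• We anticipate observations on the same individual to be positively correlated. Thus

$$\mathrm{Cov}\begin{pmatrix} Y_{i1} \\ Y_{i2} \\ \vdots \\ Y_{in} \end{pmatrix} = \begin{pmatrix} \mathrm{Var}(Y_{i1}) & \mathrm{Cov}(Y_{i1}, Y_{i2}) & \cdots & \mathrm{Cov}(Y_{i1}, Y_{in}) \\ \mathrm{Cov}(Y_{i2}, Y_{i1}) & \mathrm{Var}(Y_{i2}) & \cdots & \mathrm{Cov}(Y_{i2}, Y_{in}) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{Cov}(Y_{in}, Y_{i1}) & \mathrm{Cov}(Y_{in}, Y_{i2}) & \cdots & \mathrm{Var}(Y_{in}) \end{pmatrix} = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1n} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{n1} & \sigma_{n2} & \cdots & \sigma_{nn} \end{pmatrix}$$

where:

• $\mathrm{Cov}(Y_{ij}, Y_{ik}) = \sigma_{jk} = \sigma_{kj} = \mathrm{Cov}(Y_{ik}, Y_{ij})$;

• $\sigma_{kk} = \mathrm{Cov}(Y_{ik}, Y_{ik}) = \mathrm{Var}(Y_{ik}) = \sigma_k^2$.


Hence, the covariance matrix takes the simple form

$$\mathrm{Cov}(Y_i) = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1n} \\ \sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{n1} & \sigma_{n2} & \cdots & \sigma_n^2 \end{pmatrix},$$

and equally we can define the correlation matrix

$$\mathrm{Corr}(Y_i) = \begin{pmatrix} 1 & \rho_{12} & \cdots & \rho_{1n} \\ \rho_{21} & 1 & \cdots & \rho_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{n1} & \rho_{n2} & \cdots & 1 \end{pmatrix},$$

where $\mathrm{Corr}(Y_{ij}, Y_{ik}) = \rho_{jk} = \rho_{kj} = \mathrm{Corr}(Y_{ik}, Y_{ij})$.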
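For balanced and complete data, the entries of the covariance matrix can be estimated by their sample counterparts. The sketch below is one way to do this in plain Python; the divisor N - 1 is an assumption (the notes define only the population quantities), and the data and function name are hypothetical.

```python
def sample_cov_matrix(y):
    """Sample covariance matrix of an N x n table of responses.

    Rows are subjects, columns are occasions; the divisor is N - 1.
    """
    n_subjects, n_occasions = len(y), len(y[0])
    means = [sum(row[j] for row in y) / n_subjects for j in range(n_occasions)]
    cov = [[0.0] * n_occasions for _ in range(n_occasions)]
    for j in range(n_occasions):
        for k in range(n_occasions):
            cross = sum((row[j] - means[j]) * (row[k] - means[k]) for row in y)
            cov[j][k] = cross / (n_subjects - 1)
    return cov


# Hypothetical responses for N = 4 subjects at n = 3 occasions.
y = [
    [1.0, 2.0, 2.5],
    [2.0, 4.0, 3.0],
    [3.0, 6.0, 5.5],
    [4.0, 7.0, 6.0],
]
cov = sample_cov_matrix(y)
for row in cov:
    print([round(value, 3) for value in row])
```

The result is symmetric, with the occasion-specific variances on the diagonal, matching the structure of the matrix above.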


Example: TLC Trial (cont.)

Objective: Investigate whether treatment with succimer reduces blood lead levels over time, relative to any changes observed in the placebo group.

$$H_0: \mu_j(S) = \mu_j(P), \quad \text{for all } j = 1, \dots, 4,$$

where $\mu_j(S)$ and $\mu_j(P)$ are the succimer and placebo mean responses at the $j$th occasion.

Alternatively,

$$H_0: \mu_j(S) - \mu_1(S) = \mu_j(P) - \mu_1(P), \quad \text{for all } j = 2, \dots, 4,$$

which states that the changes in the mean response from baseline are equal in the two treatment groups.

Note: The second version of the null hypothesis concerns only the changes in the means, so it allows for differences at baseline. It is implied by the first null hypothesis, which makes the second less restrictive.


We restrict attention to the placebo group and explore the interdependence of the four measures of blood lead level. First, we look at the time plot

[Figure: mean blood lead level against time (weeks), placebo group]


Second, we can explore the pairwise scatterplots for children in the placebo group

[Figure: pairwise scatterplots of blood lead level at Baseline, Week 1, Week 4 and Week 6, placebo group]


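The estimated covariances are

$$\widehat{\mathrm{Cov}}(Y_i) = \begin{pmatrix} 25.2 & 22.8 & 24.3 & 21.4 \\ 22.8 & 29.8 & 27.0 & 23.4 \\ 24.3 & 27.0 & 33.1 & 28.2 \\ 21.4 & 23.4 & 28.2 & 31.8 \end{pmatrix},$$

and correlations

$$\widehat{\mathrm{Corr}}(Y_i) = \begin{pmatrix} 1 & 0.83 & 0.84 & 0.76 \\ 0.83 & 1 & 0.86 & 0.76 \\ 0.84 & 0.86 & 1 & 0.87 \\ 0.76 & 0.76 & 0.87 & 1 \end{pmatrix}.$$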
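The correlations follow from the covariances by dividing each entry by the product of the two standard deviations, as in the definition of the correlation given earlier. A short check in plain Python, using the placebo-group covariances quoted above (the function name `cov_to_corr` is illustrative):

```python
import math

# Estimated covariance matrix for the placebo group, as quoted in the notes.
cov = [
    [25.2, 22.8, 24.3, 21.4],
    [22.8, 29.8, 27.0, 23.4],
    [24.3, 27.0, 33.1, 28.2],
    [21.4, 23.4, 28.2, 31.8],
]


def cov_to_corr(cov):
    """Divide each covariance by the product of the two standard deviations."""
    sd = [math.sqrt(cov[j][j]) for j in range(len(cov))]
    return [
        [cov[j][k] / (sd[j] * sd[k]) for k in range(len(cov))]
        for j in range(len(cov))
    ]


corr = cov_to_corr(cov)
for row in corr:
    print(" ".join(f"{value:.2f}" for value in row))
```

Rounded to two decimals, the off-diagonal entries agree with the correlation matrix shown above.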


What if we ignore the correlation in the analysis?

A natural estimate of the change in the mean response is

$$\hat{\delta} = \hat{\mu}_2 - \hat{\mu}_1,$$

where $\hat{\mu}_j = \frac{1}{N} \sum_{i=1}^{N} Y_{ij}$. For the treatment group we have $\hat{\mu}_2 - \hat{\mu}_1 = 13.5 - 26.5 = -13.0$.

To obtain an estimate of its standard error we calculate

$$\mathrm{Var}(\hat{\delta}) = \mathrm{Var}\left\{ \frac{1}{N} \sum_{i=1}^{N} (Y_{i2} - Y_{i1}) \right\} = \frac{1}{N} (\sigma_1^2 + \sigma_2^2 - 2\sigma_{12}),$$

which in our case becomes

$$\widehat{\mathrm{Var}}(\hat{\delta}) = \frac{1}{50} (25.2 + 58.9 - 2(15.5)) = 1.06.$$


If we had simply ignored the existing correlation, then

• We would implicitly assume that $\sigma_{12} = 0$, and hence

$$\widehat{\mathrm{Var}}(\hat{\delta}) = \frac{1}{50} (25.2 + 58.9) = 1.68,$$

which is approximately 1.6 times larger.

• This leads to confidence intervals that are too wide and to p-values for the test of $H_0: \delta = 0$ that are too large.

In summary

• the positive correlation between the observations is a good thing: it makes the estimate of within-subject change more precise

• failure to take account of the correlation in the analysis could lead to misleading scientific inferences
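The two variance calculations can be reproduced in a few lines. The sketch below uses the figures quoted in the notes for the succimer group (variances 25.2 and 58.9, covariance 15.5, 50 children); the variable names are illustrative.

```python
import math

n = 50               # children in the succimer group
var_baseline = 25.2  # estimated variance at baseline
var_week1 = 58.9     # estimated variance at week 1
cov_01 = 15.5        # estimated covariance between baseline and week 1

# Variance of the estimated change, with and without the covariance term.
var_correlated = (var_baseline + var_week1 - 2 * cov_01) / n
var_independent = (var_baseline + var_week1) / n

print("accounting for correlation:", round(var_correlated, 2))
print("ignoring correlation:      ", round(var_independent, 2))
print("ratio of variances:        ", round(var_independent / var_correlated, 2))
print("standard errors:           ",
      round(math.sqrt(var_correlated), 2), round(math.sqrt(var_independent), 2))
```

Because the covariance is positive, dropping the term $-2\sigma_{12}$ can only inflate the variance of the estimated change.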


Pros

• Investigate patterns of change

• Subjects serve as their own controls, since the response variable is measured under both control (baseline) and experimental conditions

• Data collected from the same subjects are more reliable

• While we can address the same questions as in a cross-sectional study, in LD analysis we can also separate what are called cohort and age effects


Cons

• Complications in the analysis due to the correlation between observations

• The investigator does not always control the circumstances

  - unbalanced designs
  - missing data (pattern!)


Start: Plots/Graphical Presentation

Initially we discuss how to analyze data that come from one population/group. We intend to explore

• how observations change over time

• what may influence possible changes over time.

Initially assume that we have a balanced study. This is an important and reasonable assumption for some kinds of analysis, which cannot adjust for certain forms of irregularity.

For example, it is very important to have observations at regular time points, so that quantities like the mean response at a specific occasion can be calculated.

Assume we have the data from the TLC study, only from the Placebo group.


1. No matter what we are planning to do in the analysis of LD data, the first step is always to create a scatterplot. For the balanced TLC data we have

[Figure: scatterplot of blood lead level against time (weeks), TLC data]

while for the unbalanced macs data we have

[Figure: scatterplot of CD4+ cell numbers against years since seroconversion, macs data]


2. Once we have explored the scatterplot, we plot the time plot or profile plot. For the TLC data there is little hope of getting something very useful out of it (as is usually the case),

[Figure: time plot of blood lead level against time (weeks), TLC data]

The smallmice plot definitely provides some intuition about the nature of the data

[Figure: time plot of weight against days, smallmice data]

while for the unbalanced macs data we have a seriously messy situation

[Figure: time plot of CD4+ cell numbers against years since seroconversion, macs data]

One way of getting some useful information out of such a plot is to draw the time plots for a small random sample of subjects

[Figure: time plots of blood lead level against time (weeks) for a random sample of subjects, TLC data]

while for the unbalanced macs data we have

[Figure: time plots of CD4+ cell numbers against years since seroconversion for a random sample of subjects, macs data]

However, there is always the danger that the chosen time plots are not representative of the population. A possible 'fix' to this problem is

• Choose a variable
• Observe the time plots for different levels of this variable (if it is a binary or factor variable)
• Observe the time plots for the different quantiles of this variable (if it is a continuous variable)
• This variable could be one of the explanatory variables
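For a continuous variable, the selection step described above can be sketched as follows: order the subjects by the variable, cut them into quantile groups, and draw a few subject ids at random from each group. This is only one possible reading of the procedure; the function name, the grouping rule and the ages are hypothetical.

```python
import random


def sample_ids_by_quantile(covariate_by_id, n_groups=4, per_group=3, seed=0):
    """Split subjects into quantile groups of a covariate and sample ids from each."""
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    ordered = sorted(covariate_by_id, key=covariate_by_id.get)
    size = len(ordered) / n_groups
    groups = [
        ordered[round(g * size):round((g + 1) * size)] for g in range(n_groups)
    ]
    return [sorted(rng.sample(group, min(per_group, len(group)))) for group in groups]


# Hypothetical ages for 12 subjects, keyed by subject id.
ages = dict(zip(range(1, 13), [23, 41, 35, 29, 52, 38, 31, 45, 27, 33, 48, 36]))
chosen = sample_ids_by_quantile(ages, n_groups=4, per_group=2)
print(chosen)
```

The time plots of the chosen subjects would then be drawn in one panel per group, as in a plot by age quantiles.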


Consider the time plots for the macs data by age quantiles

[Figure: time plots of CD4+ cell numbers against years since seroconversion in four panels, (1) to (4), one per age quantile group]

3. One way to explore changes in the response over time is to create boxplots. This is possible only in balanced studies, where the occasions are common to everybody. For the smallmice data

[Figure: boxplots of weight by day (2, 5, 8, 11, 14, 17, 20), smallmice data]


Simple Analysis

The apparent complication arising from repeated measurements can be overcome by summarizing the longitudinal data.

1. Perhaps the simplest univariate summary of LD data is the average of the responses from a single subject

$$\bar{Y}_i = \frac{\sum_{j=1}^{n_i} Y_{ij}}{n_i}.$$

The average $\bar{Y}_i$ is treated as a single response per subject. The analysis is then simplified, and linear regression and ANOVA techniques can easily be used.

Note: This approach is straightforward in balanced studies. A problem arises in unbalanced studies, where not all subjects have the same number of observations. So we could

• average all the available observations per subject and continue
• average all the available observations per subject and perform some weighted analysis
• ignore them or do something else???


2. Another way of analyzing LD data is to summarize each profile by a slope.

• We treat the set of observations for each subject as a 'separate' population
• We regress $Y_{ij}$ against $t_{ij}$, with $n_i$ data points for subject $i$
• Define $Y^*_{ij} = Y_{ij} - \bar{Y}_i$ and $t^*_{ij} = t_{ij} - \bar{t}_i$, where $\bar{t}_i$ is the mean of the observation times for subject $i$. Then the slope can be written

$$\hat{\beta}_i = \frac{\sum_j t^*_{ij} Y^*_{ij}}{\sum_j t^*_{ij} t^*_{ij}}.$$

Then the $N$ slopes are treated as regular data and analyzed using standard techniques. For example, a two-sample t-test or ANOVA can be used to compare the slopes between two or more groups.

3. Many LD studies are designed to be analyzed as a paired analysis. Hence, if we have data of the form 'before treatment' and 'after treatment', the paired t-test could be the appropriate method of analysis.
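The first two summaries (subject mean and subject slope) can be computed directly from the formulas above. A minimal sketch in plain Python, using the first three mice of the small mice data; the function names are illustrative.

```python
def subject_mean(y):
    """Average of the responses of one subject."""
    return sum(y) / len(y)


def subject_slope(t, y):
    """Least-squares slope of y on t for one subject, using centred values."""
    t_bar, y_bar = sum(t) / len(t), sum(y) / len(y)
    numerator = sum((tj - t_bar) * (yj - y_bar) for tj, yj in zip(t, y))
    denominator = sum((tj - t_bar) ** 2 for tj in t)
    return numerator / denominator


days = [2, 5, 8, 11, 14, 17, 20]
mice = {
    22: [190, 388, 621, 823, 1078, 1132, 1191],
    23: [218, 393, 568, 729, 839, 852, 1004],
    24: [141, 260, 472, 662, 760, 885, 878],
}

# One (mean, slope) pair per mouse; these become the "data" for later analysis.
summaries = {
    mouse_id: (subject_mean(w), subject_slope(days, w)) for mouse_id, w in mice.items()
}
for mouse_id, (mean_weight, slope) in summaries.items():
    print(mouse_id, round(mean_weight, 1), round(slope, 2))
```

Each mouse is reduced to a single number per summary, so standard one-observation-per-subject methods apply, at the cost of discarding the within-subject detail.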


Problems with simple analyses

1. Loss of efficiency: This occurs when we do NOT use all the data available to us.

• Omit subjects (NEVER do that)
• Omit observations

2. Bias: Can be introduced at many stages and in many different ways.

• by design
• by subjects who may drop out for reasons related to the study
• by the analyst through mis-analysis

3. Over-simplification: When we simplify the data, ignoring their richness.


Smoothing Techniques

In cases where the occasions of measurement are different, it is helpful to produce a "smoothed" plot of the mean response trend over time, as a summary measure.

• Many smoothing techniques estimate the mean response at any time by considering not only the observations at that particular occasion, but also the neighboring ones.

• That is, the estimated mean is based on observations taken before, at and after the time of interest.

• The mean at time $t$, say, is taken to be a weighted average of the observations in close proximity to, or in the neighborhood of, time $t$.


A: Moving/Running Average

One of the most well-known and simplest approaches is the moving or running average.

• For longitudinal data that are balanced and complete, the moving average at time $t$, say $S_t$, is given by

$$S_t = \frac{1}{N} \sum_{i=1}^{N} \sum_{j=-k}^{k} w_j \, y_{i,t+j}, \qquad t = k+1, \dots, n-k,$$

where

  - $k$ is some positive integer (e.g. $k = 1$ or $k = 2$) and

  - $\sum_{j=-k}^{k} w_j = 1$.

We refer to $2k + 1$ as the order of the moving average. This expression assumes that the $N$ individuals are measured at the same set of occasions.

• With unbalanced and/or incomplete data, a similar expression can be derived.


• The order of the moving average determines a symmetric neighborhood of values used to estimate $S_t$.

• The higher the order of the moving average, the greater the smoothness of the resulting estimate of the mean time trend.

• Hence, the lower the order of the moving average, the greater the roughness of the estimate.

• The $w_j$ are positive weights that add up to 1, usually equal. When they are not equal, they are chosen to decrease symmetrically about a maximum at $j = 0$, that is, $w_j = w_{-j}$ and $w_0 > w_1 > \dots > w_k$. As a result, observations closer to time $t$ have greater weight in the calculation of the mean than those further away.

• Based on this definition, the calculation of the moving average is problematic at the beginning and at the end of the time plot. A solution is to amend the summation to range from $j = \max(-k, 1-t)$ to $j = \min(k, n-t)$ and to divide by the corresponding sum of the included weights.
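Since the sum over subjects and the sum over neighbors can be interchanged, $S_t$ is the same as a weighted average of the occasion means. The sketch below uses this and applies the end-point amendment described in the last bullet; it assumes balanced, complete data, and the function names and example data are illustrative (three mice from the small mice data).

```python
def occasion_means(y):
    """Mean response at each occasion for balanced, complete data (N rows, n columns)."""
    n_subjects = len(y)
    return [sum(row[t] for row in y) / n_subjects for t in range(len(y[0]))]


def moving_average(means, weights):
    """Weighted moving average of order 2k + 1; weights are w_{-k}, ..., w_k.

    Near the ends only the weights that fall inside the series are used,
    and the result is divided by the sum of those weights.
    """
    k = (len(weights) - 1) // 2
    n = len(means)
    smoothed = []
    for t in range(n):
        total, weight_sum = 0.0, 0.0
        for j in range(-k, k + 1):
            if 0 <= t + j < n:
                total += weights[j + k] * means[t + j]
                weight_sum += weights[j + k]
        smoothed.append(total / weight_sum)
    return smoothed


weights_by_mouse = [
    [190, 388, 621, 823, 1078, 1132, 1191],
    [218, 393, 568, 729, 839, 852, 1004],
    [141, 260, 472, 662, 760, 885, 878],
]
means = occasion_means(weights_by_mouse)
smooth = moving_average(means, [1 / 3, 1 / 3, 1 / 3])  # k = 1, equal weights
print([round(m, 1) for m in means])
print([round(s, 1) for s in smooth])
```

Increasing k (a longer weight vector) averages over more occasions and gives a smoother curve, in line with the remarks on the order of the moving average.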


[Figure: SmallMice, weight against day, with the mean per day and the moving average (k = 1)]

[Figure: CD4+ cell numbers against years since seroconversion, with smoothed curves labelled by bandwidth]


Similarly, but more efficiently, we can use the kernel smoother

$$\hat{\mu}(t) = \frac{\sum_{i=1}^{m} w(t, t_i, h)\, y_i}{\sum_{i=1}^{m} w(t, t_i, h)},$$

where $w(t, t_i, h)$ is a weighting function that changes smoothly over time and gives more weight to observations close to time $t$.

A common weight function is the Gaussian (normal) kernel

$$K(u) = \exp(-0.5 u^2).$$

Hence

$$w(t, t_i, h) = K\{(t - t_i)/h\},$$

where $h$ is the bandwidth of the kernel.

In R, the bandwidth argument is described as follows: "The kernels are scaled so that their quartiles (viewed as probability densities) are at +/- 0.25*bandwidth".
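The kernel smoother can be written down directly from the two formulas above. In the following sketch $h$ is the scale of the Gaussian kernel exactly as in $K\{(t - t_i)/h\}$, which is not the same scaling as the R bandwidth quoted above; the observations are hypothetical and the function names are illustrative.

```python
import math


def gaussian_kernel(u):
    """Gaussian kernel K(u) = exp(-0.5 u^2)."""
    return math.exp(-0.5 * u * u)


def kernel_smooth(t, times, values, h):
    """Kernel-weighted mean of `values` at time t with bandwidth h.

    If t is very far from every observation relative to h, all weights can
    underflow to zero; this sketch does not guard against that case.
    """
    weights = [gaussian_kernel((t - ti) / h) for ti in times]
    return sum(w * y for w, y in zip(weights, values)) / sum(weights)


# Hypothetical pooled observations: time in years and a response.
times = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 3.0]
values = [1100, 1050, 1000, 900, 800, 750, 650, 600]
grid = [-2, -1, 0, 1, 2, 3]

for h in (0.5, 2.0):
    print(h, [round(kernel_smooth(t, times, values, h)) for t in grid])
```

A small bandwidth follows the observations closely, while a large one pulls every estimate towards the overall mean, which is the trade-off illustrated by plots that compare bandwidths.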


[Figure: Kernel Smoother (Box), CD4+ cell numbers against years since seroconversion, with bandwidth = 0.5 (default) and bandwidth = 4]


[Figure: Kernel Smoother (Gaussian). CD4+ cell numbers against time (x-axis from −2 to 4), with smoothed curves for Bandwidth=0.5 (default) and Bandwidth=3.]


B: Lowess

One popular method is (robust) LOcally WEighted (polynomial) regreSSion, or lowess.

• The lowess estimate at t is understood by imagining a 'window' centered at t.

• The lowess estimate of the mean at t is determined by fitting a straight line to the data inside the window and obtaining the predicted value at t from the fitted regression line (using the explanatory variable values for that data point).

• The polynomial is fitted using weighted least squares, giving more weight to points near the point whose response is being estimated and less weight to points further away.

• The entire lowess curve is obtained by moving the window of fixed width from left to right and repeating the process at every time.

• The width of the window determines the smoothness: the wider the window, the smoother the curve. This width is called the bandwidth.

• The choice of bandwidth involves the classical trade-off between bias and precision. Excessive smoothing decreases the variance of the estimate at the risk of introducing bias. Insufficient smoothing is unlikely to introduce bias but will produce a variable estimate.

• Many of the details of this method, such as the degree of the polynomial model and the weights, are flexible.
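The window-and-weighted-line idea described above can be sketched as follows. This is only an illustration under several assumptions: a fixed half-width window, tricube weights, a local straight line, and no robustness iterations; the function names and data are hypothetical and this is not the full lowess algorithm.

```python
def tricube(u):
    """Tricube weight: (1 - |u|^3)^3 inside the window, 0 outside."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0

def local_linear(t, times, responses, half_width):
    """Predicted value at t from a weighted straight-line fit inside the window."""
    w = [tricube((ti - t) / half_width) for ti in times]
    sw = sum(w)
    # Weighted means of time and response.
    tbar = sum(wi * ti for wi, ti in zip(w, times)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, responses)) / sw
    # Weighted least squares slope.
    sxx = sum(wi * (ti - tbar) ** 2 for wi, ti in zip(w, times))
    sxy = sum(wi * (ti - tbar) * (yi - ybar)
              for wi, ti, yi in zip(w, times, responses))
    slope = sxy / sxx
    return ybar + slope * (t - tbar)

# Hypothetical data lying exactly on a straight line.
times = [float(d) for d in range(1, 11)]
responses = [2.0 + 3.0 * ti for ti in times]

# Moving the window across all times gives the whole smoothed curve.
curve = [local_linear(ti, times, responses, half_width=3.5) for ti in times]
```

For data that lie exactly on a line the local fits reproduce the line; with noisy data, a larger `half_width` gives a smoother curve.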

References

• Cleveland, W. S. (1979) Robust locally weighted regression and smoothing scatterplots. J. Amer. Statist. Assoc. 74, 829–836.

• Cleveland, W. S. (1981) LOWESS: A program for smoothing scatterplots by robust locally weighted regression. The American Statistician, 35, 54.


[Figure: SmallMice. Weight against Day (days 5 to 20, weights 200 to 1200); legend: lowess, Mean/Day.]


[Figure: lowess curve [ macs (placebo) ]. CD4 against Time (x-axis from −2 to 4).]


* (Robust Regression) *

Major problems in regression are the absence of

• normality (parametric)

• common variance

• independence of the errors


Other problems are

• overly influential data points

• outliers

• inadequate specification of the functional form of the model

• near-linear dependencies amongst the independent variables (collinearity)

• independent variables being subject to errors


Robust regression is a form of regression analysis designed to circumvent some limitations of traditional parametric and non-parametric methods.

• A simple way of obtaining parameter estimates in a regression model that are less sensitive to outliers than the least squares estimates is to use least absolute deviations. Even then, gross outliers can still have a considerable impact on the model.

• Another approach to robust estimation of regression models is to replace the normal distribution with a heavy-tailed distribution. A t-distribution with between 4 and 6 degrees of freedom has been reported to be a good choice in various practical situations.


Revision

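i) Normal Distribution

• Univariate $N(\mu, \sigma^2)$. The probability density function is

\[ \phi(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{ -\frac{(x - \mu)^2}{2\sigma^2} \right\}. \]

• Multivariate $N_p(\mu, \Sigma)$. Let $x = (x_1, x_2, \ldots, x_p)'$ be a p-component random vector having a MVN distribution with mean $\mu = (\mu_1, \mu_2, \ldots, \mu_p)'$ and a $p \times p$ covariance matrix

\[ \Sigma = \begin{pmatrix} \sigma_{11} & \cdots & \sigma_{1p} \\ \vdots & \ddots & \vdots \\ \sigma_{p1} & \cdots & \sigma_{pp} \end{pmatrix}. \]

The pdf has the form

\[ f(x_1, x_2, \ldots, x_p) = (2\pi)^{-p/2} |\Sigma|^{-1/2} \exp\left\{ -0.5\,(x - \mu)' \Sigma^{-1} (x - \mu) \right\}. \]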

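ii) Maximum Likelihood Estimation

• Independent Observations (simple linear regression)
Suppose the data are collected from a series of cross-sectional studies. We have a sample of N individuals at n occasions, and the data are of the form $(Y_{ij}, X_{ij})$ for the ith individual at the jth occasion. The model has the form

\[ Y_{ij} = X_{ij}'\beta + e_{ij}, \]

where $e_{ij} \sim N(0, \sigma^2)$. Hence, with $\mu_{ij} = X_{ij}'\beta$,

\[ f(y_{ij}) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{ -\frac{(y_{ij} - \mu_{ij})^2}{2\sigma^2} \right\}, \]

and the likelihood function takes the form

\[ L = \prod_{i=1}^{N} \prod_{j=1}^{n} f(y_{ij}). \]

The log-likelihood then becomes

\[ l = \log \prod_{i=1}^{N} \prod_{j=1}^{n} f(y_{ij}) = -\frac{nN}{2} \log(2\pi\sigma^2) - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{n} \frac{(y_{ij} - X_{ij}'\beta)^2}{\sigma^2}, \]

while the MLE of $\beta$ (also the OLS estimate) is

\[ \hat{\beta} = \left\{ \sum_{i=1}^{N} \sum_{j=1}^{n} X_{ij} X_{ij}' \right\}^{-1} \sum_{i=1}^{N} \sum_{j=1}^{n} X_{ij}\, y_{ij}. \]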

Note: In this process we have ignored $\sigma^2$.

• Correlated Observations
In this case we have $n_i$ observations for the ith subject. Assume $\Sigma_i$ is known, so that we do not need to estimate it (later we see how it can be estimated). It is assumed that $Y_i = (Y_{i1}, Y_{i2}, \ldots, Y_{in_i})'$ has a $N_{n_i}(\mu_i, \Sigma_i)$ distribution, with $\mu_i = X_i\beta$. Hence, the log-likelihood can be written as

\[ l = -\frac{K}{2} \log(2\pi) - \frac{1}{2} \sum_{i=1}^{N} \log|\Sigma_i| - \frac{1}{2} \sum_{i=1}^{N} (y_i - X_i\beta)' \Sigma_i^{-1} (y_i - X_i\beta), \]

where $K = \sum_{i=1}^{N} n_i$ is the total number of observations.

Then the estimator of $\beta$, known as the GLS estimator, can be expressed as

\[ \hat{\beta} = \left\{ \sum_{i=1}^{N} X_i' \Sigma_i^{-1} X_i \right\}^{-1} \sum_{i=1}^{N} X_i' \Sigma_i^{-1} y_i, \]

and has the following properties:

– It is unbiased:
\[ E(\hat{\beta}) = \beta. \]

– Asymptotically, it has a MVN distribution with mean $\beta$ and
\[ Cov(\hat{\beta}) = \left\{ \sum_{i=1}^{N} X_i' \Sigma_i^{-1} X_i \right\}^{-1}. \]

Note: Similar asymptotic properties hold when $\Sigma_i$ is estimated. However, with small sample sizes, the sampling distribution of $\hat{\beta}$ is adversely influenced by the number of covariance parameters that need to be estimated.
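As a short worked check connecting the two cases, the following derivation shows that the GLS estimator reduces to the OLS estimator when the observations are independent with common variance. It assumes $\Sigma_i = \sigma^2 I_{n_i}$ for every subject, and uses the fact that the jth row of $X_i$ is $X_{ij}'$.

```latex
\begin{aligned}
\hat{\beta}_{GLS}
  &= \Big\{ \sum_{i=1}^{N} X_i' (\sigma^2 I_{n_i})^{-1} X_i \Big\}^{-1}
     \sum_{i=1}^{N} X_i' (\sigma^2 I_{n_i})^{-1} y_i \\
  &= \Big\{ \sigma^{-2} \sum_{i=1}^{N} X_i' X_i \Big\}^{-1}
     \sigma^{-2} \sum_{i=1}^{N} X_i' y_i \\
  &= \Big\{ \sum_{i=1}^{N} \sum_{j=1}^{n_i} X_{ij} X_{ij}' \Big\}^{-1}
     \sum_{i=1}^{N} \sum_{j=1}^{n_i} X_{ij}\, y_{ij}
   = \hat{\beta}_{OLS}.
\end{aligned}
```

The factor $\sigma^2$ cancels, which is why it could be ignored when estimating $\beta$ in the independent case.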


Modelling the Mean: Profile Analysis

• Initially, we impose no structure on the mean response over time.

• Additionally, we impose no structure on the covariance among the repeated measures. This will be dealt with in detail later.

• In order to perform a Profile Analysis, we require balanced data, with the timing of the repeated measures common to all individuals in the study.

• Unbalanced designs due to missing data can be handled.

• This kind of analysis is appealing when there is a single categorical covariate (e.g. treatment group) and when a specific pattern for the differences in the response profiles cannot be specified.


[Figure: Mean blood lead level against Time (weeks, 0 to 6) for the Succimer and Placebo groups.]


Hypotheses:

For simplicity, assume that we have a two-level categorical covariate (two-group design). Any generalization should be straightforward.

Hence, the following questions arise:

• Are the profiles of the groups parallel? In other words, is there a group × time interaction?

• Is there a time effect? (under the assumption that the mean response profiles are parallel)

• Is there a group effect? (under the assumption that the mean response profiles are parallel)


[Figure: three panels, each showing Mean Response against Time (occasions 1 to 4).]


Suppose we have the two-group design, where a new treatment (T) is compared to a standard one (C).

                Measurement Occasion
Group           1            2            ...   n
Treatment       $\mu_1(T)$   $\mu_2(T)$   ...   $\mu_n(T)$
Control         $\mu_1(C)$   $\mu_2(C)$   ...   $\mu_n(C)$
Difference      $\Delta_1$   $\Delta_2$   ...   $\Delta_n$

\[ \Delta_j = \mu_j(T) - \mu_j(C) \]

The null hypothesis is that there is no treatment × time interaction. This means that the difference in the means between the treatment groups is the same over time. Hence:

\[ H_0 : \Delta_1 = \Delta_2 = \ldots = \Delta_n. \]

This provides a test with (n − 1) degrees of freedom.


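In a General Linear Model formulation we have

\[ E(Y_i | X_i) = \mu_i = X_i\beta, \]

where $X_i$ is an appropriate design matrix for the kind of interpretation we want for the $\beta$'s.

Example: If we have n = 3 measurements from two groups, then we require 2 × 3 = 6 parameters for the means. For group A we have

\[ \mu_1(A) = \beta_1, \quad \mu_2(A) = \beta_2, \quad \mu_3(A) = \beta_3, \]

while for group B we have

\[ \mu_1(B) = \beta_4, \quad \mu_2(B) = \beta_5, \quad \mu_3(B) = \beta_6. \]

Hence, the design matrix for group A has the form

\[ X_i = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \end{pmatrix}, \]

while for group B

\[ X_i = \begin{pmatrix} 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix}, \]

where $\beta = (\beta_1, \beta_2, \ldots, \beta_6)'$ is a 6 × 1 vector of regression coefficients. Hence:

\[ \mu(A) = \begin{pmatrix} \mu_1(A) \\ \mu_2(A) \\ \mu_3(A) \end{pmatrix} = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix} \quad \text{and} \quad \mu(B) = \begin{pmatrix} \mu_1(B) \\ \mu_2(B) \\ \mu_3(B) \end{pmatrix} = \begin{pmatrix} \beta_4 \\ \beta_5 \\ \beta_6 \end{pmatrix}. \]

In this parameterization we cannot test $H_0$ by simply setting one of the $\beta$'s equal to zero. The null hypothesis of no treatment × time interaction can be re-expressed as

\[ H_0 : (\beta_1 - \beta_4) = (\beta_2 - \beta_5) = (\beta_3 - \beta_6), \]

and written in matrix form as
\[ H_0 : L\beta = 0, \]

for

\[ L = \begin{pmatrix} 1 & -1 & 0 & -1 & 1 & 0 \\ 1 & 0 & -1 & -1 & 0 & 1 \end{pmatrix}. \]

This expression leads to the following set of equations

\[ \begin{cases} \beta_1 - \beta_2 - \beta_4 + \beta_5 = 0 \\ \beta_1 - \beta_3 - \beta_4 + \beta_6 = 0 \end{cases} \;\Rightarrow\; \begin{cases} \beta_1 - \beta_4 = \beta_2 - \beta_5 \\ \beta_1 - \beta_4 = \beta_3 - \beta_6 \end{cases} \]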
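A small numerical check of the contrast matrix may help. The sketch below evaluates $L\beta$ for two hypothetical vectors of cell means (the numbers are invented for illustration): one with parallel profiles and one where the group difference changes at the second occasion.

```python
# Contrast matrix L from the text (row 1: occasions 1 vs 2, row 2: occasions 1 vs 3).
L = [
    [1, -1, 0, -1, 1, 0],
    [1, 0, -1, -1, 0, 1],
]

def contrast(L, beta):
    """Return the vector L * beta."""
    return [sum(l * b for l, b in zip(row, beta)) for row in L]

# Hypothetical means (beta_1..beta_3 for group A, beta_4..beta_6 for group B).
# Group B is always 2 units below group A: parallel profiles.
beta_parallel = [10.0, 8.0, 7.0, 8.0, 6.0, 5.0]
# The group difference is 5 at the second occasion: interaction.
beta_interaction = [10.0, 8.0, 7.0, 8.0, 3.0, 5.0]

no_interaction = contrast(L, beta_parallel)      # [0.0, 0.0]
with_interaction = contrast(L, beta_interaction) # [-3.0, 0.0]
```

Under parallel profiles both contrasts vanish; any non-zero entry signals a departure from $H_0 : L\beta = 0$.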


A slightly different parameterization uses one group as a reference group (this is the parameterization preferred by many statistical software packages). In this approach the design matrices have the form

\[ X_i = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 0 & 0 \end{pmatrix} \]

for group A, while for group B

\[ X_i = \begin{pmatrix} 1 & 0 & 0 & 1 & 0 & 0 \\ 1 & 1 & 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 1 & 0 & 1 \end{pmatrix}, \]

where in this case the reference group is the first one (group A).

Hence:

\[ \mu(A) = \begin{pmatrix} \mu_1(A) \\ \mu_2(A) \\ \mu_3(A) \end{pmatrix} = \begin{pmatrix} \beta_1 \\ \beta_1 + \beta_2 \\ \beta_1 + \beta_3 \end{pmatrix} \]

and

\[ \mu(B) = \begin{pmatrix} \mu_1(B) \\ \mu_2(B) \\ \mu_3(B) \end{pmatrix} = \begin{pmatrix} \beta_1 + \beta_4 \\ (\beta_1 + \beta_4) + (\beta_2 + \beta_5) \\ (\beta_1 + \beta_4) + (\beta_3 + \beta_6) \end{pmatrix}. \]

As a result, the null hypothesis of no treatment × time interaction now takes the form

\[ H_0 : \beta_5 = \beta_6 = 0, \]

which is a simpler and more straightforward way of testing $H_0$.

Additionally, testing for the main effects (group and time) is straightforward. Under the assumption of no interaction effect, the hypothesis of no time effect can be assessed through

\[ H_0' : \beta_2 = \beta_3 = 0, \]

while the group effect can be assessed through

\[ H_0'' : \beta_4 = 0. \]

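General Case

In a similar way we can parameterize the case where we have G groups to compare over n occasions. In the 'first' parameterization (no reference group) we can introduce G dummy (binary) variables, indicators for each one of the G treatment groups:

\[ Z_{ig} = \begin{cases} 1, & \text{if the ith subject belongs to group } g, \\ 0, & \text{otherwise.} \end{cases} \]

Hence:

\[ \begin{pmatrix} \mu_i(1) \\ \mu_i(2) \\ \vdots \\ \mu_i(G-1) \\ \mu_i(G) \end{pmatrix} = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_{G-1} \\ \beta_G \end{pmatrix}. \]

If, however, we choose to introduce an intercept, say $\beta_1$, then we need G − 1 dummy variables. Hence, if we allow group G to be our reference group, we get

\[ \begin{pmatrix} \mu_i(1) \\ \mu_i(2) \\ \vdots \\ \mu_i(G-1) \\ \mu_i(G) \end{pmatrix} = \begin{pmatrix} \beta_1 + \beta_2 \\ \beta_1 + \beta_3 \\ \vdots \\ \beta_1 + \beta_G \\ \beta_1 \end{pmatrix}. \]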


How it is Done!

• In the Profile Analysis we basically have two covariates, both factors. One, say $Z_1$, represents the treatment group (G ≥ 2), while the second one, say $Z_2$, represents the n occasions at which we have measurements.

• Assume we have G = 2 treatment groups and n = 3 occasions. The 'usual' approach (most statistical software) is to introduce an intercept into the model by default. In this case we need G − 1 = 1 dummy variable for treatment and n − 1 = 2 for occasions.

– For treatment we have (assuming standard treatment is the reference)

\[ Z_{i1} = \begin{cases} 1, & \text{if the ith subject is on the new treatment,} \\ 0, & \text{if the ith subject is on the standard treatment.} \end{cases} \]

– For the occasions (assuming the first one is the reference) we have

\[ Z_{i21} = \begin{cases} 1, & \text{for an observation at the second occasion,} \\ 0, & \text{otherwise,} \end{cases} \]

and

\[ Z_{i22} = \begin{cases} 1, & \text{for an observation at the third occasion,} \\ 0, & \text{otherwise.} \end{cases} \]

• The model takes the form

– with no treatment × time interaction

\[ \mu_i = \beta_1 + \beta_2 Z_{i1} + \beta_3 Z_{i21} + \beta_4 Z_{i22} \]

– with interaction

\[ \mu_i = \beta_1 + \beta_2 Z_{i1} + \beta_3 Z_{i21} + \beta_4 Z_{i22} + \beta_5 Z_{i1} Z_{i21} + \beta_6 Z_{i1} Z_{i22} \]

• For example (model with interaction):

– If the ith subject is on the new treatment, at the second occasion we have

\[ \mu_i = \beta_1 + \beta_2 + \beta_3 + \beta_5. \]

– If the ith subject is on the standard treatment, at the third occasion we have

\[ \mu_i = \beta_1 + \beta_4. \]

– If the ith subject is on the standard treatment, at the first occasion we have

\[ \mu_i = \beta_1. \]

• The design matrices based on our model are

– for patients on the new treatment

\[ X_i = \begin{pmatrix} 1 & 1 & 0 & 0 & 0 & 0 \\ 1 & 1 & 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 1 & 0 & 1 \end{pmatrix}; \]

– for patients on the standard treatment

\[ X_i = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 & 0 \end{pmatrix}. \]

Missing Data

We have mentioned that this type of analysis requires balanced structures. However, missing data can be easily dealt with by constructing the appropriate design matrix.

For example, if a subject attends two of the arranged occasions and misses the third one, then we can simply remove the corresponding row from the design matrix.

Hence, if a patient from group A (previous example) misses the third visit, then the design matrix becomes

\[ X_i = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \end{pmatrix}. \]


Tools & Concepts

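Now we consider how to make inferences about $\beta$. More specifically, this concerns confidence intervals and hypothesis testing.

A. Statistical Inference:
To estimate $\beta$ we use ML to obtain $\hat{\beta}$, with estimated covariance matrix

\[ \widehat{Cov}(\hat{\beta}) = \left\{ \sum_{i=1}^{N} X_i' \hat{\Sigma}_i^{-1} X_i \right\}^{-1}, \]

where $\hat{\Sigma}_i$, the ML estimate of $\Sigma_i$, is used.

1. Confidence Intervals: For every single component $\beta_k$ of $\beta$ we have

\[ \hat{\beta}_k \pm 1.96 \sqrt{\widehat{Var}(\hat{\beta}_k)} \]

for a 95% confidence interval.

Generally, if L is a vector or matrix of known weights, then

\[ L\hat{\beta} \pm 1.96 \sqrt{L\, \widehat{Cov}(\hat{\beta})\, L'} \]

is the corresponding 95% confidence interval for $L\beta$.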

2. Wald Test: Whenever a relationship within or between data items can be expressed as a statistical model with parameters to be estimated from a sample, the Wald test can be used to test the true value of a parameter based on the sample estimate. Hence, for testing the hypothesis

\[ H_0 : \beta_k = 0 \quad \text{against} \quad H_A : \beta_k \neq 0, \]

we calculate the Wald statistic

\[ Z = \frac{\hat{\beta}_k}{\sqrt{\widehat{Var}(\hat{\beta}_k)}}, \]

which can be compared with N(0, 1).

In general, tests can be constructed for linear combinations of the components of $\beta$. Assume that $L\beta$ represents a set of contrasts of interest. The hypothesis testing takes the form

\[ H_0 : L\beta = 0 \quad \text{against} \quad H_A : L\beta \neq 0, \]

and the Wald statistic becomes

\[ Z' = \frac{L\hat{\beta}}{\sqrt{L\, \widehat{Cov}(\hat{\beta})\, L'}}. \]

Now, if L is a single row vector, then $L\, \widehat{Cov}(\hat{\beta})\, L'$ is a scalar and hence we compare $Z'$ to the standard normal distribution.

Furthermore, since $Z' \sim N(0, 1)$ under $H_0$, $Z'^2$ has a $\chi^2$ distribution with 1 degree of freedom (df). As a result, an identical test of the above hypothesis uses the statistic

\[ W^2 = (L\hat{\beta})' \left\{ L\, \widehat{Cov}(\hat{\beta})\, L' \right\}^{-1} (L\hat{\beta}), \]

and compares $W^2$ to $\chi^2_1$.

This formulation generalizes to the case where L has more than one row, allowing the simultaneous testing of a multivariate hypothesis. Hence, if L has r rows, then a simultaneous test of

\[ H_0 : L\beta = 0 \quad \text{against} \quad H_A : L\beta \neq 0 \]

is given by

\[ W^2 = (L\hat{\beta})' \left\{ L\, \widehat{Cov}(\hat{\beta})\, L' \right\}^{-1} (L\hat{\beta}), \]

which follows a $\chi^2$ distribution with r df under $H_0$.

This is often referred to as the multivariate Wald test.
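The single-row case can be sketched numerically. In the example below the estimates, the covariance matrix and the contrast are all hypothetical values chosen for illustration; the sketch only shows that $Z'^2$ and $W^2$ coincide when L has one row, and how a two-sided p-value follows from the standard normal distribution.

```python
import math

def wald_single(L, beta_hat, cov):
    """Z' and W^2 for a single-row contrast L (so L Cov L' is a scalar)."""
    k = len(L)
    est = sum(l * b for l, b in zip(L, beta_hat))              # L beta_hat
    var = sum(L[i] * cov[i][j] * L[j]
              for i in range(k) for j in range(k))             # L Cov L'
    z = est / math.sqrt(var)
    w2 = est * (1.0 / var) * est
    return z, w2

def two_sided_p(z):
    """Two-sided p-value from the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical estimates and covariance matrix, for illustration only.
beta_hat = [2.0, 0.5]
cov = [[0.25, 0.05], [0.05, 0.16]]
L = [1, -1]  # contrast beta_1 - beta_2

z, w2 = wald_single(L, beta_hat, cov)
p = two_sided_p(z)
```

For a contrast matrix with r > 1 rows the scalar division would be replaced by a matrix inverse, and $W^2$ would be referred to a $\chi^2$ distribution with r df.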


3. Likelihood Ratio Test:

• The LRT can be used to compare two models when one model is a special case of (nested in) the other.
• The alternative or full model allows some parameters to vary, whereas the null or reduced model fixes those parameters at known values.
• The LRT statistic is 2 times the difference of the maximized log-likelihoods of the two models. The alternative or full model (the larger model) will always have the larger log-likelihood ($l_{full}$), so that for the null or reduced model $l_{red} \leq l_{full}$. Hence, the test statistic

\[ G^2 = 2(l_{full} - l_{red}) \]

is constructed to answer how much larger $l_{full}$ is than $l_{red}$. The larger $G^2$ is, the stronger the evidence that the smaller (null) model is inadequate.
• We compare $G^2$ to a $\chi^2$ distribution with df equal to the difference between the number of parameters in the two models.

Note 1:
Likelihood-based confidence intervals can be constructed with the use of the profile likelihood. More specifically, for a single component $\beta_k$ of $\beta$, the profile log-likelihood $l_p(\beta_k)$ is obtained by maximizing the log-likelihood over the remaining parameters while keeping $\beta_k$ fixed. Then a 95% CI is constructed by obtaining the values of $\beta_k$ that satisfy

\[ 2\{ l_p(\hat{\beta}_k) - l_p(\beta_k) \} \leq \text{critical value}, \]

where the critical value is the 95th percentile of the $\chi^2_1$ distribution (3.84).

Note 2:
The LRT can be used for covariance parameters. Due to problems with the sampling distribution of variance parameters, the Wald test is not recommended. Even with the LRT there are some problems in comparing nested models for covariance parameters.


B. Restricted (residual) Maximum Likelihood (REML) Estimation:

• Introduced by Patterson & Thompson (1971) as a way of estimating variance components in a GLM.
• In ML estimation the log-likelihood function has the form

\[ l = -\frac{K}{2} \log(2\pi) - \frac{1}{2} \sum_{i=1}^{N} \log|\Sigma_i| - \frac{1}{2} \sum_{i=1}^{N} (y_i - X_i\beta)' \Sigma_i^{-1} (y_i - X_i\beta), \]

where $K = \sum_{i=1}^{N} n_i$ is the total number of observations.
• It is known that the ML estimate of $\Sigma_i$ is biased in small samples.
• To illustrate, consider the case where observations are independent (from cross-sectional studies) with constant variance $\sigma^2$. Estimates of both $\beta$ and $\sigma^2$ come from the maximization of the log-likelihood function

\[ l = -\frac{K}{2} \log(2\pi\sigma^2) - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{n} \frac{(y_{ij} - X_{ij}'\beta)^2}{\sigma^2}. \]

• The MLE of $\sigma^2$ is

\[ \hat{\sigma}^2 = \frac{\sum_{i=1}^{N} \sum_{j=1}^{n} (y_{ij} - X_{ij}'\hat{\beta})^2}{K}, \]

and we know that $\hat{\sigma}^2$ is a biased estimate of $\sigma^2$:

\[ E(\hat{\sigma}^2) = \left( \frac{K - p}{K} \right) \sigma^2, \]

where p is the dimension of $\beta$.

• An unbiased estimate of $\sigma^2$ is

\[ \hat{\sigma}^2_{REML} = \frac{\sum_{i=1}^{N} \sum_{j=1}^{n} (y_{ij} - X_{ij}'\hat{\beta})^2}{K - p}, \]

which is known as the REML estimate.
• In effect, the bias arises from the fact that the ML estimate does not take into account that $\beta$ is also being estimated from the same data.
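The difference between the two divisors can be seen in the simplest possible case. The sketch below assumes an intercept-only model (so p = 1 and $\hat{\beta}$ is the sample mean) and uses a small hypothetical sample; the function name is illustrative.

```python
def variance_estimates(y):
    """ML and REML estimates of sigma^2 for an intercept-only model (p = 1)."""
    K = len(y)
    p = 1
    beta_hat = sum(y) / K                          # OLS/ML estimate of the mean
    rss = sum((yi - beta_hat) ** 2 for yi in y)    # residual sum of squares
    return rss / K, rss / (K - p)

# Hypothetical sample of K = 5 observations.
ml, reml = variance_estimates([1.0, 2.0, 3.0, 4.0, 5.0])
# ml = 10 / 5 = 2.0 and reml = 10 / 4 = 2.5, so ml / reml = (K - p) / K = 0.8.
```

The ratio of the two estimates is exactly the bias factor $(K - p)/K$, which approaches 1 as K grows relative to p.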


As a result, Restricted (residual) Maximum Likelihood Estimation was developed to address this particular problem.

• The main idea is to separate out the part of the data that is used for the estimation of the variance parameters.
• Hence, we need to eliminate $\beta$ from the likelihood, so that only $\Sigma_i$ is left in the likelihood to be estimated.
• One way of doing that is by transforming the data to a set of linear combinations of observations whose distribution does not depend on $\beta$.
• In the case of a GLM with dependent errors, the REML estimator is defined as an MLE based on a linearly transformed set of data

\[ Y^* = AY, \]

such that the distribution of $Y^*$ does not depend on $\beta$.
• For example, the residuals after estimating $\beta$ by OLS can be used to estimate $\Sigma_i$. Hence,

\[ A = I - X(X'X)^{-1}X'. \]

• Then, $Y^*$ has a singular multivariate Gaussian distribution with mean zero, whatever the value of $\beta$.
• The REML estimator of $\Sigma_i$ is less biased than the ML estimator. When N is much larger than p the difference becomes less important.
• The REML estimator is used for $\Sigma_i$, while $\beta$ is estimated by the GLS estimator

\[ \hat{\beta} = \left\{ \sum_{i=1}^{N} X_i' \hat{\Sigma}_i^{-1} X_i \right\}^{-1} \sum_{i=1}^{N} X_i' \hat{\Sigma}_i^{-1} y_i, \]

obtained by plugging in the REML estimate $\hat{\Sigma}_i$ of $\Sigma_i$.
• REML is the default in R (and in many statistical software packages).


Model Selection

Model selection involves the choice of an appropriate model among a set of candidate models.

1. Nested Models: The likelihood ratio (LR) test is used for nested models, meaning that the reduced model is a special case of the full model. In this case the LR test can be seen as a model selection tool, since we can decide whether the additional complexity of the full model is worthwhile or whether the simpler model is equally good at describing the data.

2. Generally: Model selection techniques are useful for screening through many different covariance models. The goal is to choose the 'best' model for use in further analysis. The (log-)likelihood is once again the driving force behind any selection tool. More specifically:

• Criterion-based approaches compare adjusted log-likelihoods, penalized for the number of parameters in the model.
• The penalty increases with the number of parameters. This is because models with many parameters should fit better (higher log-likelihood) than models with fewer parameters.
• The penalty is used to level off this discrepancy.
• The most popular selection criteria are
– The Akaike Information Criterion (AIC). For a given model m the AIC is defined as

\[ AIC(m) = -2\, \text{loglikelihood}(m) + 2 q_m, \]

where $q_m$ is the number of parameters in the model.
– The Bayes Information Criterion (BIC), defined as

\[ BIC(m) = -2\, \text{loglikelihood}(m) + \log(N)\, q_m, \]

where N is the number of observations (sample size).
• For covariance models the REML log-likelihood is used.
• Model selection proceeds similarly for both criteria.
– We fit the models of interest to the data and then rank them according to either their AIC or BIC value.
– The model with the smallest value is selected as best.


Example: TLC Data

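• Reshape the data

library(lme4)  # provides lmer

# Wide to long format: one row per subject and occasion.
# times = c(0, 1, 4, 6) labels the occasions (weeks), matching the
# factor(time) levels 1, 4 and 6 in the output.
tlc.long = reshape(tlc, idvar = "id",
                   varying = c("lead0", "lead1", "lead4", "lead6"),
                   v.names = "lead", times = c(0, 1, 4, 6),
                   direction = "long")

• Model 1:

# Time and group as factors, random intercept per subject.
fm1 = lmer(lead ~ factor(time) + factor(group) + (1 | id), data = tlc.long)

• Model 2:

# Same model without the group effect.
fm2 = lmer(lead ~ factor(time) + (1 | id), data = tlc.long)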


> fm1
Linear mixed-effects model fit by REML
Formula: lead ~ factor(time) + factor(group) + (1 | id)
   Data: tlc.long
  AIC  BIC logLik MLdeviance REMLdeviance
 2576 2600  -1282       2569         2564
Random effects:
 Groups   Name        Variance Std.Dev.
 id       (Intercept) 24.475   4.9472
 Residual             24.417   4.9414
number of obs: 400, groups: id, 100

Fixed effects:
                Estimate Std. Error t value
(Intercept)      23.6173     0.8915  26.493
factor(time)1    -7.3150     0.6988 -10.468
factor(time)4    -6.6140     0.6988  -9.465
factor(time)6    -4.2020     0.6988  -6.013
factor(group)P    5.5775     1.1060   5.043

Correlation of Fixed Effects:
            (Intr) fct()1 fct()4 fct()6
factor(tm)1 -0.392
factor(tm)4 -0.392  0.500
factor(tm)6 -0.392  0.500  0.500
factr(grp)P -0.620  0.000  0.000  0.000


> fm2
Linear mixed-effects model fit by REML
Formula: lead ~ factor(time) + (1 | id)
   Data: tlc.long
  AIC  BIC logLik MLdeviance REMLdeviance
 2599 2619  -1294       2592         2589
Random effects:
 Groups   Name        Variance Std.Dev.
 id       (Intercept) 32.022   5.6588
 Residual             24.417   4.9414
number of obs: 400, groups: id, 100

Fixed effects:
              Estimate Std. Error t value
(Intercept)    26.4060     0.7513   35.15
factor(time)1  -7.3150     0.6988  -10.47
factor(time)4  -6.6140     0.6988   -9.46
factor(time)6  -4.2020     0.6988   -6.01

Correlation of Fixed Effects:
            (Intr) fct()1 fct()4
factor(tm)1 -0.465
factor(tm)4 -0.465  0.500
factor(tm)6 -0.465  0.500  0.500


> anova(fm1, fm2)
Data: tlc.long
Models:
fm2: lead ~ factor(time) + (1 | id)
fm1: lead ~ factor(time) + factor(group) + (1 | id)
    Df    AIC    BIC  logLik  Chisq Chi Df Pr(>Chisq)
fm2  5 2602.4 2622.4 -1296.2
fm1  6 2581.4 2605.3 -1284.7 23.069      1  1.563e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
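The numbers in the anova table can be reproduced from the definitions of the LRT, AIC and BIC. The Python sketch below is only a check on the arithmetic: it takes the rounded log-likelihoods, the Df column (used as $q_m$) and N = 400 observations from the output above; the function names are illustrative. The log-likelihoods in the table correspond to the ML deviances (2569 and 2592) rather than the REML ones.

```python
import math

def aic(loglik, q):
    """AIC = -2 loglik + 2 q."""
    return -2.0 * loglik + 2.0 * q

def bic(loglik, q, n_obs):
    """BIC = -2 loglik + log(N) q."""
    return -2.0 * loglik + math.log(n_obs) * q

def chi2_1df_pvalue(x):
    """Upper tail probability of a chi-square distribution with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))

n_obs = 400
ll_red, q_red = -1296.2, 5     # fm2 (reduced model)
ll_full, q_full = -1284.7, 6   # fm1 (full model)

# LRT statistic from the rounded log-likelihoods (about 23; the table,
# which uses unrounded values, reports 23.069).
g2 = 2.0 * (ll_full - ll_red)
# p-value on 1 df, using the Chisq value printed in the table.
p_value = chi2_1df_pvalue(23.069)

aic_full, aic_red = aic(ll_full, q_full), aic(ll_red, q_red)
bic_full, bic_red = bic(ll_full, q_full, n_obs), bic(ll_red, q_red, n_obs)
```

Both criteria and the LRT point the same way here: the model including the group effect (fm1) has the smaller AIC and BIC, and the reduced model is rejected.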


Modelling the mean: Parametric Curves

• As the number of occasions increases and the number of irregularly timed observations increases, Profile Analysis becomes less and less appealing.

• Furthermore, it is reasonable in many circumstances to expect that the mean response is likely to change smoothly (monotonically) over time, at least for the duration of the study.

• Fitting parsimonious models for the mean response leads to statistical tests with greater power than the Profile Analysis (narrower range of alternative hypotheses).

• This, however, is true only if the assumed structure for the mean is 'correct'.


Linear Trends over time

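Assume the model

\[ E(Y_{ij}) = \beta_1 + \beta_2 Time_{ij} + \beta_3 Group_i + \beta_4 Time_{ij} \times Group_i, \]

• where
\[ Group_i = \begin{cases} 1, & \text{new treatment,} \\ 0, & \text{otherwise.} \end{cases} \]

• $Time_{ij}$ has two indices to allow for mistimed observations.

• Hence:

– for the control group we have: $E(Y_{ij}) = \beta_1 + \beta_2 Time_{ij}$.

– for the experimental treatment group we have: $E(Y_{ij}) = (\beta_1 + \beta_3) + (\beta_2 + \beta_4) Time_{ij}$.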


[Figure: Mean Response against Time (0 to 10) for the Control and Treatment groups, linear trends.]


Quadratic Trends over time

Assume the model

\[ E(Y_{ij}) = \beta_1 + \beta_2 Time_{ij} + \beta_3 Time_{ij}^2 + \beta_4 Group_i + \beta_5 Time_{ij} \times Group_i + \beta_6 Time_{ij}^2 \times Group_i. \]

• Changes in the mean response are no longer constant. The rate of change now depends on time (earlier/later).

• Hence:

– for the control group we have:

\[ E(Y_{ij}) = \beta_1 + \beta_2 Time_{ij} + \beta_3 Time_{ij}^2. \]

– for the experimental treatment group we have:

\[ E(Y_{ij}) = (\beta_1 + \beta_4) + (\beta_2 + \beta_5) Time_{ij} + (\beta_3 + \beta_6) Time_{ij}^2. \]


[Figure: Mean Response against Time (0 to 10) for the Control and Treatment groups, quadratic trends.]


• There is a natural hierarchy in higher order models. In the quadratic model

\[ E(Y_{ij}) = \beta_1 + \beta_2 Time_{ij} + \beta_3 Time_{ij}^2, \]

we first test the quadratic trend ($\beta_3 = 0$) before we move on to the linear term ($\beta_2 = 0$).

• It is very important to see how variables enter the model. Centering variables at their mean value offers a simple interpretation of the intercept. Additionally, collinearity problems are avoided. For example, consider $Time_j$. If $Time_j \in \{0, 1, 2, \ldots, 10\}$, then the correlation between $Time_j$ and $Time_j^2$ is 0.96. However, if we center Time by subtracting its mean value 5, then the correlation goes down to zero.
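The two correlations quoted above are easy to verify. The following sketch computes the Pearson correlation between time and time squared, before and after centering at 5; the helper function is written out here only to keep the example self-contained.

```python
import math

def correlation(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

time = list(range(11))                        # 0, 1, ..., 10
raw = correlation(time, [t ** 2 for t in time])

centered = [t - 5 for t in time]              # -5, ..., 5
after = correlation(centered, [c ** 2 for c in centered])
```

After centering, the linear term is symmetric about zero while the squared term is an even function of it, which is why the correlation vanishes.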


Time-Varying Covariates

• So far we have discussed cases where covariates remain unchanged over time.

• A common case is where at the first visit all subjects are in the same state (say untreated) and any intervention is given from the second visit onwards (TLC data).

• Furthermore, in many trials patients tend to switch treatments for various reasons, usually side effects or even personal choice (when the treatment cannot be disclosed). As a result we need to allow for this change over the duration of the study (cross-over trials).

• In the case of a time-varying treatment indicator, the treatment variable, say $G_i$, will not be constant over time. The vector

\[ G_i = (0, 0, 1, 1, 1)' \]

indicates that patient i was on placebo for the first two occasions and then switched to active treatment. Extensions to cases with more than two treatment groups are possible with the inclusion of the right number of indicator (dummy) variables.

• In exactly the same way we model a continuous time-varying covariate, where at each occasion the value of the covariate at that occasion is included in the model.


Other Approaches: Splines

There are cases where longitudinal trends in the mean response cannot be characterized by first and second degree polynomials in time. Additionally, there are cases where non-linear trends cannot be well approximated by polynomials in time of any order. This can happen when the mean response increases or decreases rapidly for some duration and then continues more slowly. A class of models called splines is then used to describe these complicated curves.

A. Step Function:

• The simplest spline model for the population mean is a sequence of flat steps.
• In this approach, each step approximates the mean response over a small interval of time. The result is a step function that approximates the smooth curve of the mean response.
• The step function parameterization is quite straightforward. Suppose that we have observations at 9 time points (occasions) from t = 1 to t = 9 and a step function with three steps, with jumps at 2.5 and 5.5. Then

time    V1       V2
1       1 0 0    1 0 0
2       1 0 0    1 0 0
3       0 1 0    1 1 0
4       0 1 0    1 1 0
5       0 1 0    1 1 0
6       0 0 1    1 1 1
7       0 0 1    1 1 1
8       0 0 1    1 1 1
9       0 0 1    1 1 1

• In the V1 parameterization the parameters represent the mean response in the intervals (1, 2.5), (2.5, 5.5) and (5.5, 9). In the V2 parameterization, parameter $\beta_1$ represents the mean response in the first interval (1, 2.5), $\beta_2$ represents the difference in the mean response between the second and first intervals, and $\beta_3$ the difference between the means in the third and second intervals.

Note A: This parameterization is similar to the one for the unstructured mean, the only difference being that a single parameter is the mean for multiple time points.

B. Bent Line (piecewise linear):

• Another approach, slightly more complicated, is to assume continuous functions, linear on intervals of time, with the slope allowed to change from one interval to the next.
• As a result, connected line segments approximate the continuous curve of the mean response.
• The bent line requires two parameters for the first interval and one additional parameter, for the change in slope, for every additional interval.
• Hence, a model with two break points at $t^*_1$ and $t^*_2$ can be written as

\[ E(Y_{ij}) = \beta_1 + \beta_2 Time_{ij} + \beta_3 (Time_{ij} - t^*_1)_+ + \beta_4 (Time_{ij} - t^*_2)_+, \]

where $(x)_+$ is equal to x when x > 0 and zero otherwise.

• With break points at 2.5 and 5.5, the covariate matrix then takes the form

time    Bent Line
1       1  1  0    0
2       1  2  0    0
3       1  3  0.5  0
4       1  4  1.5  0
5       1  5  2.5  0
6       1  6  3.5  0.5
7       1  7  4.5  1.5
8       1  8  5.5  2.5
9       1  9  6.5  3.5

Parameter $\beta_1$ is the intercept, $\beta_2$ is the slope up to time 2.5, $\beta_2 + \beta_3$ is the slope between 2.5 and 5.5, and finally $\beta_2 + \beta_3 + \beta_4$ is the slope after 5.5.
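Both covariate matrices can be generated from the break points. The sketch below assumes break points at 2.5 and 5.5 and occasions t = 1, ..., 9, as in the example; the function names are hypothetical.

```python
def step_row(t, knots):
    """Indicator row (V1 parameterization): one column per interval."""
    interval = sum(1 for k in knots if t > k)   # index of the interval containing t
    return [1 if i == interval else 0 for i in range(len(knots) + 1)]

def bent_line_row(t, knots):
    """Row (1, t, (t - knot_1)+, (t - knot_2)+, ...) for the bent line model."""
    return [1, t] + [max(t - k, 0.0) for k in knots]

knots = [2.5, 5.5]
times = range(1, 10)

V1 = [step_row(t, knots) for t in times]
bent = [bent_line_row(t, knots) for t in times]
# For example, bent[5] is the row for t = 6: [1, 6, 3.5, 0.5].
```

Adding a break point adds one column to either matrix, which matches the count of one extra parameter per additional interval.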

Note B: The break points at t = 2:5 and t = 5:5 are formally called knots. For thestep function, the number of parameters we require are 1 plus the number of knots. For


the bent line model the number of parameters required are 2 plus the number of knots.Note C: For the bent line model we often require fewer knots than the step function. Asa result, in practice, the total number of parameters required for the bent line model arefewer than the step function.

C. Higher Order Polynomial Splines:

• Spline models can become even more complicated by using piece-wise quadratic or cubic models.

• Two parameters characterize spline models:
  - the order of the piece-wise polynomial on each interval
  - the number of knots

• If a spline is of $k$th order, then for each knot there is a covariate that allows the coefficient of the $k$th-order term $t_{ij}^k$ to change. For example, in the step function there is a jump at each knot, and in the bent line model there is a change in slope at each knot. The cubic spline model has an intercept, a slope, a quadratic and a cubic term. At each knot, say $t_{0k}$, the cubic spline model has a covariate of the form $(t_{ij} - t_{0k})^3_+$. This allows the coefficient of time cubed to change at each knot.

Note D: Generally, the number of parameters is equal to the degree of the polynomial plus the number of knots plus 1.
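The parameter count in Note D can be checked by building the spline covariates explicitly. This is a sketch in base R under the assumption of a truncated power basis; the function name `spline_basis` is hypothetical.

```r
# Hypothetical helper: truncated power basis of a given degree.
# Columns: 1, t, ..., t^degree, then one truncated term per knot.
spline_basis <- function(t, knots, degree) {
  poly_part <- outer(t, 0:degree, `^`)
  knot_part <- sapply(knots, function(k) {
    # degree 0 is the step function: a jump (indicator) at each knot
    if (degree == 0) as.numeric(t > k) else pmax(t - k, 0)^degree
  })
  cbind(poly_part, knot_part)
}

time  <- 1:9
knots <- c(2.5, 5.5)

n_step  <- ncol(spline_basis(time, knots, degree = 0))  # 0 + 2 + 1
n_bent  <- ncol(spline_basis(time, knots, degree = 1))  # 1 + 2 + 1
n_cubic <- ncol(spline_basis(time, knots, degree = 3))  # 3 + 2 + 1
print(c(step = n_step, bent = n_bent, cubic = n_cubic))
```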


Modelling the Covariance

• Although the covariance between observations is not of primary interest, accounting for the covariance among repeated measures usually increases the precision with which the parameters are estimated.

• Furthermore, when we have missing data, the 'correct' specification of the covariance structure is often a requirement for valid estimates of the regression parameters.

• There are two aspects that require modelling: the mean and the covariance structure. Although they appear to be independent, they are interdependent, because the covariance between any pair of residuals $\{Y_{ij} - \mu_{ij}(\beta)\}$ and $\{Y_{ik} - \mu_{ik}(\beta)\}$ depends on the model for the mean. As a result, a model for the covariance should be chosen on the basis of some model for the mean.


A. Unstructured

\[ Cov(Y_i) = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1n} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{n1} & \sigma_{n2} & \cdots & \sigma_{nn} \end{pmatrix} \]

• The above structure is reasonable when the number of occasions is relatively small and all individuals are measured at the same set of occasions.

• Formal requirements:
  - symmetric
  - positive definite

• Advantage: no structure is imposed on the covariance matrix.

• Drawback 1: many parameters to estimate. We have to estimate $n(n+1)/2$ parameters, a number that grows rapidly with $n$. As a result, the estimation process can be unstable.

• Drawback 2: problematic when we have mistimed observations.


B. Compound Symmetry

\[ Cov(Y_i) = \sigma^2 \begin{pmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \cdots & \rho \\ \vdots & \vdots & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{pmatrix} \]

• Variance is assumed constant, $\sigma^2$, and $Corr(Y_{ij}, Y_{ik}) = \rho$.

• Advantage: only two parameters to estimate.

• Drawback 1: makes the strong assumption that the correlation between any pair of observations is the same, regardless of the time interval between measurements. This is rather unappealing for most longitudinal data, since correlation is expected to decay with time.

• Drawback 2: the assumption of constant variance is also unrealistic. We have seen that variance increases with time.
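To make the contrast between the unstructured and compound symmetry patterns concrete, the sketch below builds a compound symmetry matrix in base R and counts the parameters an unstructured matrix would need. The function names and the numerical values of $\sigma^2$ and $\rho$ are illustrative assumptions.

```r
# Hypothetical helper: compound symmetry covariance for n occasions
cs_cov <- function(n, sigma2, rho) {
  R <- matrix(rho, n, n)  # common correlation everywhere ...
  diag(R) <- 1            # ... except ones on the diagonal
  sigma2 * R
}

# Hypothetical helper: number of free parameters of an unstructured n x n covariance
n_unstructured <- function(n) n * (n + 1) / 2

V <- cs_cov(4, sigma2 = 44, rho = 0.6)  # made-up values
print(V)
print(eigen(V, symmetric = TRUE)$values)  # all positive: V is positive definite

# Compound symmetry always needs 2 parameters; unstructured grows quadratically
print(sapply(c(4, 10, 20), n_unstructured))
```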


C. Toeplitz

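\[ Cov(Y_i) = \sigma^2 \begin{pmatrix} 1 & \rho_1 & \rho_2 & \cdots & \rho_{n-1} \\ \rho_1 & 1 & \rho_1 & \cdots & \rho_{n-2} \\ \rho_2 & \rho_1 & 1 & \cdots & \rho_{n-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho_{n-1} & \rho_{n-2} & \rho_{n-3} & \cdots & 1 \end{pmatrix} \]

• Assumes that any pair of responses equally separated in time have the same correlation.

• Variance is constant, $\sigma^2$, and $Corr(Y_{ij}, Y_{i,j+k}) = \rho_k$.

• Appropriate only when measurements are made at (approximately) equal intervals of time.

• There are $n$ parameters to be estimated.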


D. Autoregressive

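\[ Cov(Y_i) = \sigma^2 \begin{pmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{n-1} \\ \rho & 1 & \rho & \cdots & \rho^{n-2} \\ \rho^2 & \rho & 1 & \cdots & \rho^{n-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho^{n-1} & \rho^{n-2} & \rho^{n-3} & \cdots & 1 \end{pmatrix} \]

• A special case of the Toeplitz covariance structure.

• Variance is constant, $\sigma^2$, and $Corr(Y_{ij}, Y_{i,j+k}) = \rho^k$.

• Advantage: only two parameters to estimate.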
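The relationship between the two patterns can be seen by constructing both with the base R function `toeplitz()`. This is a small sketch; the correlation values are made up for illustration.

```r
n   <- 5
rho <- 0.7   # illustrative value

# General Toeplitz correlation: one free correlation per lag (made-up values)
R_toep <- toeplitz(c(1, 0.6, 0.5, 0.3, 0.1))

# Autoregressive correlation: the lag-k correlation is forced to equal rho^k
R_ar1 <- toeplitz(rho^(0:(n - 1)))

print(R_toep)
print(round(R_ar1, 3))
```

The Toeplitz matrix uses $n - 1$ correlations plus one variance, whereas the autoregressive matrix needs a single correlation plus one variance.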


E. Banded

\[ Cov(Y_i) = \sigma^2 \begin{pmatrix} 1 & \rho_1 & 0 & \cdots & 0 \\ \rho_1 & 1 & \rho_1 & \cdots & 0 \\ 0 & \rho_1 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{pmatrix} \]

• Makes the assumption that the correlation is zero beyond some point.

• The above is a banded Toeplitz covariance pattern with a band size of 2.

• Variance is constant, $\sigma^2$, and $Corr(Y_{ij}, Y_{i,j+k}) = 0$ for $k \geq 2$.

• Disadvantage: makes a very strong assumption about how quickly the correlation decays.


F. Exponential

• When measurement occasions are not equally spaced, we can generalize the autoregressive pattern by assuming

\[ Corr(Y_{ij}, Y_{ik}) = \rho^{|t_{ij} - t_{ik}|}, \]

for $\rho > 0$.

• Thus, correlation decreases exponentially with the time separation between measurements.

• Called exponential because

\[ Corr(Y_{ij}, Y_{ik}) = \rho^{|t_{ij} - t_{ik}|} = \exp\{-\theta |t_{ij} - t_{ik}|\}, \]

where $\theta = -\log(\rho)$.

• Invariant under linear transformations of the time scale.
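A short numerical check of the exponential pattern, in base R. The measurement times and the value of $\rho$ are illustrative assumptions; the last part reads "invariant under linear transformations" as meaning that rescaling time only changes the value of $\rho$, not the form of the matrix.

```r
times <- c(0, 1, 4, 6)   # unequally spaced occasions (illustrative)
rho   <- 0.8             # illustrative value

D     <- abs(outer(times, times, "-"))  # |t_ij - t_ik| for every pair
R_exp <- rho^D
theta <- -log(rho)

# The two forms of the correlation agree
print(all.equal(R_exp, exp(-theta * D)))

# Linear change of time scale t -> a * t + b (for example years to months)
a <- 12; b <- 3
D_new   <- abs(outer(a * times + b, a * times + b, "-"))
rho_new <- rho^(1 / a)   # same matrix, reparameterized correlation
print(all.equal(R_exp, rho_new^D_new))
```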


Example: TLC Data

glsTLC:

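\[ lead_{ij} = \beta_0 + \beta_1 group_i + \beta_2 time_j + \beta_3 (group_i \times time_j) \]

with the covariance matrix having the compound symmetry form

\[ Cov(Y_i) = \sigma^2 \begin{pmatrix} 1 & \rho & \rho & \rho \\ \rho & 1 & \rho & \rho \\ \rho & \rho & 1 & \rho \\ \rho & \rho & \rho & 1 \end{pmatrix} \]

>gls(lead ~ factor(group)*factor(time),data=tlc.long,correlation=corCompSymm(form=~1|id))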


> glsTLC=gls(lead~factor(group)*factor(time),data=tlc.long,correlation=corCompSymm(form=~1|id))
> summary(glsTLC)
Generalized least squares fit by REML
  Model: lead ~ factor(group) * factor(time)
  Data: tlc.long
       AIC      BIC    logLik
  2480.621 2520.334 -1230.311

Correlation Structure: Compound symmetry
 Formula: ~1 | id
 Parameter estimate(s):
      Rho
0.5954401

Coefficients:
                               Value Std.Error    t-value p-value
(Intercept)                   26.540 0.9370175  28.323911  0.0000
factor(group)P                -0.268 1.3251428  -0.202242  0.8398
factor(time)1                -13.018 0.8428574 -15.445080  0.0000
factor(time)4                -11.026 0.8428574 -13.081691  0.0000
factor(time)6                 -5.778 0.8428574  -6.855252  0.0000
factor(group)P:factor(time)1  11.406 1.1919804   9.568950  0.0000
factor(group)P:factor(time)4   8.824 1.1919804   7.402807  0.0000
factor(group)P:factor(time)6   3.152 1.1919804   2.644339  0.0085

 Correlation:
                             (Intr) fct()P fct()1 fct()4 fct()6 f()P:()1 f()P:()4
factor(group)P               -0.707
factor(time)1                -0.450  0.318
factor(time)4                -0.450  0.318  0.500
factor(time)6                -0.450  0.318  0.500  0.500
factor(group)P:factor(time)1  0.318 -0.450 -0.707 -0.354 -0.354
factor(group)P:factor(time)4  0.318 -0.450 -0.354 -0.707 -0.354  0.500
factor(group)P:factor(time)6  0.318 -0.450 -0.354 -0.354 -0.707  0.500    0.500

Standardized residuals:
       Min         Q1        Med         Q3        Max
-2.5147478 -0.6973588 -0.1498706  0.5542799  6.5106944

Residual standard error: 6.625714
Degrees of freedom: 400 total; 392 residual


glsTLC2:



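\[ Cov(Y_i) = \sigma^2 \begin{pmatrix} s_1 & \rho & \rho & \rho \\ \rho & s_2 & \rho & \rho \\ \rho & \rho & s_3 & \rho \\ \rho & \rho & \rho & s_4 \end{pmatrix} \]

(Schematic: $s_j$ is the variance multiplier at occasion $j$. With `varIdent` the off-diagonal entries are $\rho\sqrt{s_j s_k}$, so that $\rho$ remains the common correlation.)

>gls(lead ~ factor(group)*factor(time),data=tlc.long,correlation=corCompSymm(form=~1|id),
     weights=varIdent(form=~1|time))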


> # Compound Symmetry with different variances at different occasions
> glsTLC2=gls(lead~factor(group)*factor(time),data=tlc.long,correlation=corCompSymm(form=~1|id),weight=varIdent(form=~1|time))
> summary(glsTLC2)
Generalized least squares fit by REML
  Model: lead ~ factor(group) * factor(time)
  Data: tlc.long
       AIC      BIC    logLik
  2459.960 2511.587 -1216.980

Correlation Structure: Compound symmetry
 Formula: ~1 | id
 Parameter estimate(s):
      Rho
0.6102797
Variance function:
 Structure: Different standard deviations per stratum
 Formula: ~1 | time
 Parameter estimates:
       0        1        4        6
1.000000 1.279651 1.323192 1.519196






glsTLC3:


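with the covariance matrix having the general symmetric form with a common variance

\[ Cov(Y_i) = \begin{pmatrix} \sigma^2 & \sigma_{12} & \sigma_{13} & \sigma_{14} \\ \sigma_{21} & \sigma^2 & \sigma_{23} & \sigma_{24} \\ \sigma_{31} & \sigma_{32} & \sigma^2 & \sigma_{34} \\ \sigma_{41} & \sigma_{42} & \sigma_{43} & \sigma^2 \end{pmatrix} \]

>gls(lead ~ factor(group)*factor(time),data=tlc.long,correlation=corSymm(form=~1|id))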


> summary(glsTLC3)
Generalized least squares fit by REML
  Model: lead ~ factor(group) * factor(time)
  Data: tlc.long
       AIC      BIC    logLik
  2471.632 2531.201 -1220.816

Correlation Structure: General
 Formula: ~1 | id
 Parameter estimate(s):
 Correlation:
  1     2     3
2 0.596
3 0.582 0.769
4 0.536 0.552 0.551






glsTLC4:



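\[ Cov(Y_i) = \begin{pmatrix} \sigma^2_1 & \sigma_{12} & \sigma_{13} & \sigma_{14} \\ \sigma_{21} & \sigma^2_2 & \sigma_{23} & \sigma_{24} \\ \sigma_{31} & \sigma_{32} & \sigma^2_3 & \sigma_{34} \\ \sigma_{41} & \sigma_{42} & \sigma_{43} & \sigma^2_4 \end{pmatrix} \]

>gls(lead ~ factor(group)*factor(time),data=tlc.long,correlation=corSymm(form=~1|id),
     weights=varIdent(form=~1|time))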



> summary(glsTLC4)
Generalized least squares fit by REML
  Model: lead ~ factor(group) * factor(time)
  Data: tlc.long
       AIC      BIC    logLik
  2452.076 2523.559 -1208.038

Correlation Structure: General
 Formula: ~1 | id
 Parameter estimate(s):
 Correlation:
  1     2     3
2 0.571
3 0.570 0.775
4 0.577 0.582 0.581
Variance function:
 Structure: Different standard deviations per stratum
 Formula: ~1 | time
 Parameter estimates:
       0        1        4        6
1.000000 1.325887 1.370453 1.524826






glsTLC5:

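\[ lead_{ij} = \beta_0 + \beta_1 group_i + \beta_2 time_j \]

\[ Cov(Y_i) = \begin{pmatrix} \sigma^2_1 & \sigma_{12} & \sigma_{13} & \sigma_{14} \\ \sigma_{21} & \sigma^2_2 & \sigma_{23} & \sigma_{24} \\ \sigma_{31} & \sigma_{32} & \sigma^2_3 & \sigma_{34} \\ \sigma_{41} & \sigma_{42} & \sigma_{43} & \sigma^2_4 \end{pmatrix} \]

>gls(lead ~ factor(group)+factor(time),data=tlc.long,correlation=corSymm(form=~1|id),
     weights=varIdent(form=~1|time))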



> summary(glsTLC5)
Generalized least squares fit by REML
  Model: lead ~ factor(group) + factor(time)
  Data: tlc.long
       AIC      BIC    logLik
  2525.171 2584.854 -1247.585

Correlation Structure: General
 Formula: ~1 | id
 Parameter estimate(s):
 Correlation:
  1     2     3
2 0.334
3 0.407 0.822
4 0.551 0.512 0.550
Variance function:
 Structure: Different standard deviations per stratum
 Formula: ~1 | time
 Parameter estimates:
       0        1        4        6
1.000000 1.567427 1.478107 1.484946

Coefficients:
                   Value Std.Error  t-value p-value
(Intercept)    25.399921 0.7104368 35.75254  0.0000
factor(group)P  2.012157 0.9786873  2.05598  0.0404
factor(time)1  -7.315000 0.7993320 -9.15139  0.0000
factor(time)4  -6.614000 0.7247826 -9.12549  0.0000
factor(time)6  -4.202000 0.6448471 -6.51627  0.0000

 Correlation:
               (Intr) fct()P fct()1 fct()4
factor(group)P -0.689
factor(time)1  -0.222  0.000
factor(time)4  -0.205  0.000  0.814
factor(time)6  -0.105  0.000  0.437  0.446




> anova(glsTLC,glsTLC2)
        Model df      AIC      BIC    logLik   Test  L.Ratio p-value
glsTLC      1 10 2480.621 2520.334 -1230.311
glsTLC2     2 13 2459.960 2511.587 -1216.980 1 vs 2 26.66058  <.0001
>
> anova(glsTLC,glsTLC3)
        Model df      AIC      BIC    logLik   Test  L.Ratio p-value
glsTLC      1 10 2480.621 2520.334 -1230.311
glsTLC3     2 15 2471.632 2531.200 -1220.816 1 vs 2 18.98944  0.0019
>
> anova(glsTLC,glsTLC4)
        Model df      AIC      BIC    logLik   Test  L.Ratio p-value
glsTLC      1 10 2480.621 2520.334 -1230.311
glsTLC4     2 18 2452.076 2523.559 -1208.038 1 vs 2 44.54507  <.0001
>
> anova(glsTLC2,glsTLC3)
        Model df      AIC      BIC    logLik   Test  L.Ratio p-value
glsTLC2     1 13 2459.960 2511.587 -1216.980
glsTLC3     2 15 2471.632 2531.200 -1220.816 1 vs 2 7.671143  0.0216
>
> anova(glsTLC2,glsTLC4)
        Model df      AIC      BIC    logLik   Test  L.Ratio p-value
glsTLC2     1 13 2459.960 2511.587 -1216.980
glsTLC4     2 18 2452.076 2523.559 -1208.038 1 vs 2 17.88450  0.0031
>
> anova(glsTLC3,glsTLC4)
        Model df      AIC      BIC    logLik   Test  L.Ratio p-value
glsTLC3     1 15 2471.632 2531.200 -1220.816
glsTLC4     2 18 2452.076 2523.559 -1208.038 1 vs 2 25.55564  <.0001
>
> anova(glsTLC4,glsTLC5)
        Model df      AIC      BIC    logLik   Test  L.Ratio p-value
glsTLC4     1 18 2452.076 2523.559 -1208.038
glsTLC5     2 15 2525.171 2584.854 -1247.585 1 vs 2 79.09486  <.0001
Warning message:
In anova.lme(object = glsTLC4, glsTLC5) :
  Fitted objects with different fixed effects. REML comparisons are not meaningful.
>
> anova(update(glsTLC4,method='ML'),update(glsTLC5, method='ML'))
                               Model df      AIC      BIC    logLik   Test  L.Ratio p-value
update(glsTLC4, method = "ML")     1 18 2461.368 2533.214 -1212.684
update(glsTLC5, method = "ML")     2 15 2529.555 2589.427 -1249.778 1 vs 2 74.18778  <.0001
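The likelihood ratio statistics reported by `anova()` can be reproduced by hand from the log-likelihoods and degrees of freedom in the output. The sketch below uses only base R; the helper name `lrt` is hypothetical, and the inputs are the rounded values printed above, so the results agree with the console output only up to rounding.

```r
# Hypothetical helper: likelihood ratio test between two nested fits
lrt <- function(logLik_small, logLik_big, df_small, df_big) {
  stat <- 2 * (logLik_big - logLik_small)
  df   <- df_big - df_small
  c(L.Ratio = stat, df = df,
    p.value = pchisq(stat, df = df, lower.tail = FALSE))
}

# glsTLC (compound symmetry) versus glsTLC2 (plus occasion-specific variances)
res_1_2 <- lrt(-1230.311, -1216.980, df_small = 10, df_big = 13)
print(res_1_2)

# glsTLC2 versus glsTLC4 (unstructured correlation, occasion-specific variances)
res_2_4 <- lrt(-1216.980, -1208.038, df_small = 13, df_big = 18)
print(res_2_4)
```

Both comparisons keep the same fixed effects, so the REML log-likelihoods are comparable; the warning for `glsTLC4` versus `glsTLC5` arises because their fixed effects differ, which is why those two fits are refitted by ML before being compared.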


Random Effects

• We need to understand (at least qualitatively) the likely sources of random variation.

• One possible source is random effects: units are sampled at random from a population, and various aspects of their behavior may show stochastic variation between units.

• We introduce the linear random effects model, where

  - the response is assumed to be a linear function of explanatory variables, with regression coefficients that vary from one individual to the next

  - variability reflects natural heterogeneity due to unmeasured factors


Example: Children birth weight and growth rate.

• A random effects model is a reasonable description if the set of coefficients from a population of children can be thought of as a sample from a distribution.

• Given the actual coefficients for a child, the linear random effects model assumes that repeated observations for that child are independent.

• Correlation arises because we cannot observe the underlying growth curve, that is, the regression coefficients; we only have imperfect measurements of weight on each infant.

• So the model takes the form

\[ E(Y_{ij} \mid U_i) = (\beta_0 + U_i) + \beta_1 (time)_{ij} \]

• Typically, a parametric model such as the Gaussian with mean 0 and unknown variance $\sigma^2$ is used for $U_i$.


Linear Mixed Models

• The usual linear model

\[ y = X\beta + e, \]

where

  - $y = (y_1, \ldots, y_n)'$ is an $n \times 1$ vector of independent observations
  - $\beta$ is a $p \times 1$ vector of unknown parameters
  - $X$ is an $n \times p$ design (model) matrix
  - $e = (e_1, \ldots, e_n)'$ is an $n \times 1$ vector of independent errors

• The linear mixed model (general)

\[ Y_i = X_i\beta + Z_i b_i + e_i, \]

where

  - $Y_i$, $\beta$ and $e_i$ are as before, with
    * $E(e_i) = 0_n$
    * $Var(e_i) = W$
  - $Z_i$ is a given $n \times q$ matrix (the columns of $Z_i$ are a subset of the columns of $X_i$)
  - $b_i$ is an unobservable random vector of dimension $q \times 1$, following (theoretically) any multivariate distribution with
    * $E(b_i) = 0_q$
    * $Var(b_i) = B$
    In practice $b_i$ is assumed to follow a multivariate normal distribution.
  - In addition, the vectors $b_i$ and $e_i$ are assumed uncorrelated.
  - $E(Y_i) = X_i\beta$
  - $Var(Y_i) = Var(X_i\beta + Z_i b_i + e_i) = Z_i B Z_i' + W$.


Random Intercept Model

Consider the model

\[ Y_{ij} = X_{ij}'\beta + b_i + e_{ij} = (\beta_1 + b_i) + X_{ij2}\beta_2 + \cdots + X_{ijp}\beta_p + e_{ij} \]

• Each subject's profile is shifted up or down by a constant amount across occasions, so the subject profiles are parallel.

• Observations $Y_{ij}$ vary around a different value for each subject. These values are the intercepts of the lines each subject's responses vary around, where $b_i$ represents the deviation of subject $i$'s intercept from the population intercept ($\beta_1$).

• The set of intercepts is a sample from the population of intercepts.

• This implies that there is between-subject variability (equivalent to within-subject correlation).


[Figure: Response plotted against Time (1 to 5).]

• Furthermore, the variance of $Y_{ij}$ takes the form

\[ Var(Y_{ij}) = Var(X_{ij}'\beta + b_i + e_{ij}) = Var(b_i) + Var(e_{ij}) = \sigma^2_b + \sigma^2 \]

and the covariance between any pair of observations on the same subject is

\[ Cov(Y_{ij}, Y_{ik}) = Cov(X_{ij}'\beta + b_i + e_{ij},\; X_{ik}'\beta + b_i + e_{ik}) = Cov(b_i, b_i) = \sigma^2_b. \]


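The covariance matrix then becomes

\[ Cov(Y_i) = \begin{pmatrix} \sigma^2_b + \sigma^2 & \sigma^2_b & \sigma^2_b & \cdots & \sigma^2_b \\ \sigma^2_b & \sigma^2_b + \sigma^2 & \sigma^2_b & \cdots & \sigma^2_b \\ \sigma^2_b & \sigma^2_b & \sigma^2_b + \sigma^2 & \cdots & \sigma^2_b \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \sigma^2_b & \sigma^2_b & \sigma^2_b & \cdots & \sigma^2_b + \sigma^2 \end{pmatrix}, \]

and the correlation between two observations becomes

\[ \rho = Corr(Y_{ij}, Y_{ik}) = \frac{\sigma^2_b}{\sigma^2_b + \sigma^2}. \]

• The presence of the random effect induces correlation among repeated measurements. This is also known as the intra-class correlation.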


Note: In statistics, the intraclass correlation is a descriptive statistic that can be used when quantitative measurements are made on units that are organized into groups. It describes how strongly units in the same group resemble each other. While it is viewed as a type of correlation, unlike most other correlation measures it operates on data structured as groups, rather than data structured as paired observations.
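The covariance matrix implied by the random intercept model is a compound symmetry matrix whose common correlation is the intra-class correlation. A base R sketch with made-up variance components:

```r
sigma2_b <- 4   # between-subject (random intercept) variance, illustrative
sigma2   <- 1   # within-subject (residual) variance, illustrative
n        <- 4   # number of occasions

# Cov(Y_i) = sigma2_b * J + sigma2 * I, with J the matrix of ones
V   <- sigma2_b * matrix(1, n, n) + sigma2 * diag(n)
R   <- cov2cor(V)                     # implied correlation matrix
icc <- sigma2_b / (sigma2_b + sigma2)

print(V)
print(R)
print(icc)
```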

• The model

\[ E(Y_{ij} \mid b_i) = X_{ij}'\beta + b_i \]

is referred to as the conditional or subject-specific mean model.

• The model

\[ E(Y_{ij}) = X_{ij}'\beta \]

is referred to as the marginal or population-averaged mean model.


[Figure: Response plotted against Time (1 to 5).]


Example: Orthodont Data [included in nlme package]

• A set of measurements of the distance from the pituitary gland to the pterygomaxillary fissure, taken every 2 years.

• Measurements taken from 8 till 14 years of age.

• We have 27 children: 16 males - 11 females

• Data collected from x-rays.


[Figure: Orthodont data, one panel per subject (M01 to M16, F01 to F11). x-axis: Age (yr); y-axis: Distance from pituitary to pterygomaxillary fissure (mm).]


> levels(Orthodont$Sex)
[1] "Male"   "Female"
> OrthoFem=Orthodont[Orthodont$Sex=="Female",]
> lmF=lmList(distance ~ age, data=OrthoFem)
> coef(lmF)
    (Intercept)   age
F10       13.55 0.450
F09       18.10 0.275
F06       17.00 0.375
F01       17.25 0.375
F05       19.60 0.275
F07       16.95 0.550
F02       14.20 0.800
F08       21.45 0.175
F03       14.40 0.850
F04       19.65 0.475
F11       18.95 0.675


> intervals(lmF)
, , (Intercept)

       lower  est.    upper
F10 10.07138 13.55 17.02862
F09 14.62138 18.10 21.57862
F06 13.52138 17.00 20.47862
F01 13.77138 17.25 20.72862
F05 16.12138 19.60 23.07862
F07 13.47138 16.95 20.42862
F02 10.72138 14.20 17.67862
F08 17.97138 21.45 24.92862
F03 10.92138 14.40 17.87862
F04 16.17138 19.65 23.12862
F11 15.47138 18.95 22.42862

, , age

          lower  est.     upper
F10  0.14009962 0.450 0.7599004
F09 -0.03490038 0.275 0.5849004
F06  0.06509962 0.375 0.6849004
F01  0.06509962 0.375 0.6849004
F05 -0.03490038 0.275 0.5849004
F07  0.24009962 0.550 0.8599004
F02  0.49009962 0.800 1.1099004
F08 -0.13490038 0.175 0.4849004
F03  0.54009962 0.850 1.1599004
F04  0.16509962 0.475 0.7849004
F11  0.36509962 0.675 0.9849004


> lmF2=update(lmF,distance~I(age-11))
> intervals(lmF2)
, , (Intercept)

       lower   est.    upper
F10 17.80704 18.500 19.19296
F09 20.43204 21.125 21.81796
F06 20.43204 21.125 21.81796
F01 20.68204 21.375 22.06796
F05 21.93204 22.625 23.31796
F07 22.30704 23.000 23.69296
F02 22.30704 23.000 23.69296
F08 22.68204 23.375 24.06796
F03 23.05704 23.750 24.44296
F04 24.18204 24.875 25.56796
F11 25.68204 26.375 27.06796

, , I(age - 11)

          lower  est.     upper
F10  0.14009962 0.450 0.7599004
F09 -0.03490038 0.275 0.5849004
F06  0.06509962 0.375 0.6849004
F01  0.06509962 0.375 0.6849004
F05 -0.03490038 0.275 0.5849004
F07  0.24009962 0.550 0.8599004
F02  0.49009962 0.800 1.1099004
F08 -0.13490038 0.175 0.4849004
F03  0.54009962 0.850 1.1599004
F04  0.16509962 0.475 0.7849004
F11  0.36509962 0.675 0.9849004


> lmeF=lme(distance~age,data=OrthoFem,random=~1) # Using REML
> summary(lmeF)
Linear mixed-effects model fit by REML
 Data: OrthoFem
       AIC     BIC    logLik
  149.2183 156.169 -70.60916

Random effects:
 Formula: ~1 | Subject
        (Intercept)  Residual
StdDev:     2.06847 0.7800331

Fixed effects: distance ~ age
                Value Std.Error DF   t-value p-value
(Intercept) 17.372727 0.8587419 32 20.230440       0
age          0.479545 0.0525898 32  9.118598       0
 Correlation:
    (Intr)
age -0.674

Standardized Within-Group Residuals:
       Min         Q1        Med         Q3        Max
-2.2736479 -0.7090164  0.1728237  0.4122128  1.6325181

Number of Observations: 44
Number of Groups: 11
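The intra-class correlation $\rho = \sigma^2_b / (\sigma^2_b + \sigma^2)$ can be computed directly from the two standard deviations in the "Random effects" block of this output. A base R sketch; the variable names are illustrative.

```r
# Standard deviations copied from the summary(lmeF) output above
sd_intercept <- 2.06847     # random intercept (between-subject) SD
sd_residual  <- 0.7800331   # residual (within-subject) SD

sigma2_b <- sd_intercept^2
sigma2   <- sd_residual^2

# Estimated intra-class correlation among repeated measures on the same girl
icc <- sigma2_b / (sigma2_b + sigma2)
print(round(icc, 4))
```

A value this close to 1 indicates that most of the variability in distance among the girls, after accounting for age, is between subjects rather than within subjects.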


> lmeF0=lme(distance~I(age-11),data=OrthoFem,random=~1)
> summary(lmeF0)
Linear mixed-effects model fit by REML
 Data: OrthoFem
       AIC     BIC    logLik
  149.2183 156.169 -70.60916


Fixed effects: distance ~ I(age - 11)
                Value Std.Error DF t-value p-value
(Intercept) 22.647727 0.6346568 32 35.6850       0
I(age - 11)  0.479545 0.0525898 32  9.1186       0
 Correlation:
            (Intr)
I(age - 11) 0
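Centering age at 11 changes only the meaning of the intercept: it becomes the mean distance at age 11 rather than at age 0, while the slope and the fit are unchanged. A quick arithmetic check in base R, using the rounded estimates printed in the two summaries:

```r
# Fixed effects from summary(lmeF): distance ~ age
intercept_raw <- 17.372727
slope         <- 0.479545

# Fixed intercept from summary(lmeF0): distance ~ I(age - 11)
intercept_centered <- 22.647727

# Mean distance at age 11 implied by the uncentered model
mean_at_11 <- intercept_raw + 11 * slope
print(c(implied = mean_at_11, reported = intercept_centered))
```

Centering also removes the correlation between the intercept and slope estimates here (from -0.674 to 0), because 11 is the mean of the measurement ages 8, 10, 12 and 14.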




Random Intercept and Slope Model

Consider the model

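\[ Y_{ij} = (\beta_1 + b_{1i}) + (\beta_2 + b_{2i}) t_{ij} + e_{ij}. \]

• Each subject varies with respect to
  (i) the baseline level (the response when $t_{ij} = 0$) and
  (ii) the rate of change of the response over time.

• In this particular case we have $q = p = 2$ and

\[ X_i = Z_i = \begin{pmatrix} 1 & t_{i1} \\ 1 & t_{i2} \\ \vdots & \vdots \\ 1 & t_{in_i} \end{pmatrix}. \]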


• Additionally, consider the variance

\[ Var(Y_{ij}) = Var(X_{ij}'\beta + Z_{ij}'b_i + e_{ij}) = Var(Z_{ij}'b_i + e_{ij}) = Var(b_{1i} + b_{2i}t_{ij} + e_{ij}) \]
\[ = Var(b_{1i}) + 2 t_{ij} Cov(b_{1i}, b_{2i}) + t_{ij}^2 Var(b_{2i}) + Var(e_{ij}), \]

and the covariance among the repeated observations of the same subject becomes

\[ Cov(Y_{ij}, Y_{ik}) = Var(b_{1i}) + (t_{ij} + t_{ik}) Cov(b_{1i}, b_{2i}) + t_{ij} t_{ik} Var(b_{2i}). \]

• Hence, the covariance matrix can be expressed as a function of time.
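The element-wise formulas above agree with the matrix expression $Z_i B Z_i' + \sigma^2 I$. The sketch below checks this numerically in base R; the measurement times and all variance components are made-up values for illustration.

```r
t_i <- c(8, 10, 12, 14) - 11        # illustrative (centered) measurement times
Z   <- cbind(1, t_i)                # random intercept and random slope

# Illustrative covariance matrix B of (b_1i, b_2i) and residual variance
B      <- matrix(c(4.0, 0.2,
                   0.2, 0.05), nrow = 2, byrow = TRUE)
sigma2 <- 1.7

# Marginal covariance in matrix form
V <- Z %*% B %*% t(Z) + sigma2 * diag(length(t_i))

# The same quantities from the element-wise formulas
var_j  <- B[1, 1] + 2 * t_i * B[1, 2] + t_i^2 * B[2, 2] + sigma2
cov_12 <- B[1, 1] + (t_i[1] + t_i[2]) * B[1, 2] + t_i[1] * t_i[2] * B[2, 2]

print(round(V, 3))
print(all.equal(diag(V), var_j))
print(all.equal(V[1, 2], cov_12))
```

Both the variances and the covariances change with the measurement times, which is what distinguishes this structure from compound symmetry.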


Covariance Structure

In the linear mixed model

\[ Y_i = X_i\beta + Z_i b_i + e_i, \]

the matrix $W_i = Cov(e_i)$ describes the covariance between the repeated observations when focusing on the conditional mean response profile of a specific individual. In other words, it is the covariance of the $i$th individual's deviations from the response profile

\[ E(Y_i \mid b_i) = X_i\beta + Z_i b_i. \]

• The usual assumption is $W_i = \sigma^2 I_{n_i}$. This is referred to as the conditional independence assumption.

• The conditional covariance becomes

\[ Cov(Y_i \mid b_i) = Cov(e_i) = W_i. \]

• The marginal covariance then takes the form

\[ Cov(Y_i) = Z_i B Z_i' + W_i. \]

• $Cov(Y_i)$ allows for between-subject ($B$) and within-subject ($W_i$) sources of variation.

• Because $Cov(Y_i)$ is a function of the times of measurement (when time is in $Z_i$), in principle each subject may have its own measurement times.

• The comparison of random effects models for the covariance is based on the likelihood ratio test (REML). A test of two nested models, one with $q$ and another with $q + 1$ correlated random effects, leads to a chi-square test on $q + 1$ df (1 for the variance and $q$ for the covariances). However, caution is needed when the null hypothesis is on the boundary of the parameter space.
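One common way to handle the boundary problem, stated here as an assumption that goes beyond the original notes, is to refer the likelihood ratio statistic to a 50:50 mixture of chi-square distributions with $q$ and $q + 1$ df instead of a single chi-square on $q + 1$ df. The base R sketch below compares the two p-values for a hypothetical statistic.

```r
# Hypothetical likelihood ratio statistic for q = 1 versus q + 1 = 2 random effects
lrt_stat <- 3.2
q        <- 1

# Naive reference distribution: chi-square on q + 1 df
p_naive <- pchisq(lrt_stat, df = q + 1, lower.tail = FALSE)

# Assumed boundary-corrected reference: equal mixture of chi-square(q) and chi-square(q + 1)
p_mixture <- 0.5 * pchisq(lrt_stat, df = q,     lower.tail = FALSE) +
             0.5 * pchisq(lrt_stat, df = q + 1, lower.tail = FALSE)

print(c(naive = p_naive, mixture = p_mixture))
```

The naive p-value is never smaller than the mixture p-value, so ignoring the boundary makes the test conservative and favours the simpler covariance model.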


Some Characteristics

• There is no need for balanced data.

• The covariances are functions of time. As a result, if time is included in $Z_i$, each patient can have their own sequence of measurement times. This property makes these models suitable for the analysis of real-life longitudinal data.

• The number of covariance parameters that need to be estimated remains unchanged regardless of the number of measurements.

• The random effects covariance structure allows the variances and covariances to change (increase or decrease) as a function of measurement times, without introducing restrictive structures as the covariance pattern models do.


Prediction

• In the analysis of longitudinal data the interest in the fixed effects $\beta$ is obvious. The interpretation of the parameters is clear and associated with the mean response over time and changes in covariates.

• In many cases, however, subject-specific trajectories are of interest.

• Under the linear mixed-effects model, patient-specific response trajectories can be predicted/estimated.

• This is possible by obtaining predictions of the subject-specific effects $b_i$ (random effects), or of

\[ X_i\beta + Z_i b_i. \]


• Generally, the problem of predicting a random variable, and hence the patient-specific response trajectory, is that of predicting its conditional mean given the available data.

• There are two pieces of information that contribute to the estimation/prediction of $b_i$.

  - The first is the statement that
    \[ b_i \sim N(0, B) \]
    (the prior of $b_i$).
  - The second is the likelihood of the data $Y_i$, which says that
    \[ Y_i \mid b_i \sim N(X_i\beta + Z_i b_i,\; W_i). \]


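• We combine the information by multiplying the two densities (giving the joint density) and, after some algebra, we get

\[ E(b_i \mid Y_i) = B Z_i' \Sigma_i^{-1} (Y_i - X_i\beta), \]

where $\Sigma_i = Cov(Y_i) = Z_i B Z_i' + W_i$. This is known as the BLUP (best linear unbiased predictor).

• The predictor of $b_i$ depends on $B$ (and on $\beta$ and $W_i$). Hence, when the unknown quantities are replaced by their REML estimates, we have

\[ \hat{b}_i = \hat{B} Z_i' \hat{\Sigma}_i^{-1} (Y_i - X_i\hat{\beta}), \]

also known as the empirical BLUP (or empirical Bayes estimate).

• Given $\hat{b}_i$ we obtain

\[ \hat{Y}_i = X_i\hat{\beta} + Z_i\hat{b}_i. \]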


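• As a result we have

\[ \hat{Y}_i = X_i\hat{\beta} + Z_i\hat{b}_i \]
\[ = X_i\hat{\beta} + Z_i\hat{B}Z_i'\hat{\Sigma}_i^{-1}(Y_i - X_i\hat{\beta}) \]
\[ = (I_{n_i} - Z_i\hat{B}Z_i'\hat{\Sigma}_i^{-1}) X_i\hat{\beta} + Z_i\hat{B}Z_i'\hat{\Sigma}_i^{-1} Y_i \]
\[ = (\hat{W}_i\hat{\Sigma}_i^{-1}) X_i\hat{\beta} + (I_{n_i} - \hat{W}_i\hat{\Sigma}_i^{-1}) Y_i, \]

where the last step uses

\[ \hat{\Sigma}_i\hat{\Sigma}_i^{-1} = I_{n_i} = (Z_i\hat{B}Z_i' + \hat{W}_i)\hat{\Sigma}_i^{-1} = Z_i\hat{B}Z_i'\hat{\Sigma}_i^{-1} + \hat{W}_i\hat{\Sigma}_i^{-1}. \]

This expression shows that $\hat{Y}_i$ is a weighted mean of $X_i\hat{\beta}$, the population-averaged mean response profile, and $Y_i$, the $i$th patient's observed response profile.

• As a result, the predicted response profile is pulled (shrinks) towards the population-averaged mean response profile.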


• The amount of shrinkage depends on $W_i$ and $\Sigma_i$.

• If $W_i$ is "large", then the within-subject variability is greater than the between-subject variability, and hence more weight is given to the population-averaged mean response profile $X_i\beta$.

• The opposite holds when $W_i$ is "small".
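For a random intercept model the shrinkage can be written in closed form, which makes the role of the two variance components explicit. The base R sketch below uses made-up data and variance components, treats $\beta$, $B$ and $W_i$ as known, and checks the matrix formula for the BLUP against the closed form $\frac{n\sigma^2_b}{n\sigma^2_b + \sigma^2}$ times the subject's mean residual.

```r
n        <- 4
sigma2_b <- 4          # between-subject variance (illustrative)
sigma2   <- 2          # within-subject variance (illustrative)

Z     <- matrix(1, n, 1)            # random intercept only
B     <- matrix(sigma2_b)           # 1 x 1 covariance of b_i
W     <- sigma2 * diag(n)           # conditional independence
Sigma <- Z %*% B %*% t(Z) + W       # marginal covariance of Y_i

Xbeta <- rep(24, n)                 # population-averaged mean profile (made up)
y     <- c(26, 27, 25, 28)          # observed profile of one subject (made up)

# BLUP of the random intercept via the matrix formula
b_hat <- B %*% t(Z) %*% solve(Sigma) %*% (y - Xbeta)

# Closed form for the random intercept model
shrink  <- n * sigma2_b / (n * sigma2_b + sigma2)
b_hat_2 <- shrink * mean(y - Xbeta)

# Predicted profile, both as X beta + Z b and as a weighted mean
y_hat   <- Xbeta + Z %*% b_hat
w_pop   <- W %*% solve(Sigma)
y_hat_2 <- w_pop %*% Xbeta + (diag(n) - w_pop) %*% y

print(c(matrix_form = as.numeric(b_hat), closed_form = b_hat_2))
print(cbind(y_hat, y_hat_2))
```

Increasing `sigma2` relative to `sigma2_b` moves `shrink` towards 0, so the prediction moves towards the population mean; decreasing it moves the prediction towards the subject's own average.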


Example: Orthodont (cont.)

>lmeOrth1=lme(distance ~ I(age-11),data=Orthodont,random=~1)

>lmeOrth1ml=update(lmeOrth1,method='ML')

>lmeOrth2=lme(distance ~ I(age-11),data=Orthodont)

>lmeOrth2ml=update(lmeOrth2,method='ML')

>lmeOrth3=update(lmeOrth2,fixed=distance ~ Sex*I(age-11))


> summary(lmeOrth1)
Linear mixed-effects model fit by REML
 Data: Orthodont
       AIC      BIC    logLik
  455.0025 465.6563 -223.5013



Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max
-3.66453932 -0.53507984 -0.01289591  0.48742859  3.72178465



> OrthRE1ml=random.effects(lmeOrth1ml)
> OrthRE1ml
    (Intercept)
M16  -0.9152788
M05  -0.9152788
M02  -0.5798146
M11  -0.3561719
M07  -0.2443505
M08  -0.1325291
M03   0.2029351
M12   0.2029351
M13   0.2029351
M14   0.7620421
M09   0.9856849
M15   1.6566133
M06   2.1038989
M04   2.3275416
M01   3.3339342
M10   4.8994337
F10  -4.9408491
F09  -2.5925998
F06  -2.5925998
F01  -2.3689570
F05  -1.2507430
F07  -0.9152788
F02  -0.9152788
F08  -0.5798146
F03  -0.2443505
F04   0.7620421
F11   2.1038989


> coef(lmeOrth1) #subject specific coefficients (random intercept only)
    (Intercept) I(age - 11)
M16    23.10517   0.6601852
M05    23.10517   0.6601852
M02    23.44163   0.6601852
M11    23.66593   0.6601852
M07    23.77808   0.6601852
M08    23.89023   0.6601852
M03    24.22668   0.6601852
M12    24.22668   0.6601852
M13    24.22668   0.6601852
M14    24.78744   0.6601852
M09    25.01174   0.6601852
M15    25.68464   0.6601852
M06    26.13325   0.6601852
M04    26.35755   0.6601852
M01    27.36691   0.6601852
M10    28.93702   0.6601852
F10    19.06774   0.6601852
F09    21.42291   0.6601852
F06    21.42291   0.6601852
F01    21.64721   0.6601852
F05    22.76872   0.6601852
F07    23.10517   0.6601852
F02    23.10517   0.6601852
F08    23.44163   0.6601852
F03    23.77808   0.6601852
F04    24.78744   0.6601852
F11    26.13325   0.6601852


> summary(lmeOrth1ml)
Linear mixed-effects model fit by maximum likelihood
 Data: Orthodont
       AIC      BIC    logLik
  451.3895 462.1181 -221.6948



Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max
-3.68695130 -0.53862941 -0.01232442  0.49100161  3.74701483





> coef(lmeOrth1ml) #subject specific coefficients (random intercept only)
    (Intercept) I(age - 11)
M16    23.10787   0.6601852
M05    23.10787   0.6601852
M02    23.44333   0.6601852
M11    23.66698   0.6601852
M07    23.77880   0.6601852
M08    23.89062   0.6601852
M03    24.22608   0.6601852
M12    24.22608   0.6601852
M13    24.22608   0.6601852
M14    24.78519   0.6601852
M09    25.00883   0.6601852
M15    25.67976   0.6601852
M06    26.12705   0.6601852
M04    26.35069   0.6601852
M01    27.35708   0.6601852
M10    28.92258   0.6601852
F10    19.08230   0.6601852
F09    21.43055   0.6601852
F06    21.43055   0.6601852
F01    21.65419   0.6601852
F05    22.77241   0.6601852
F07    23.10787   0.6601852
F02    23.10787   0.6601852
F08    23.44333   0.6601852
F03    23.77880   0.6601852
F04    24.78519   0.6601852
F11    26.12705   0.6601852


>plot(compareFits(coef(lmeOrth1),coef(lmeOrth1ml)))

[Figure: dot plot comparing coef(lmeOrth1) and coef(lmeOrth1ml), one row per subject (M16, ..., F11). Panels: (Intercept), I(age − 11).]


>plot(augPred(lmeOrth1),aspect="xy",grid=T)

[Figure: augmented predictions from lmeOrth1, one panel per subject (M01 to M16, F01 to F11). x-axis: Age (yr); y-axis: Distance from pituitary to pterygomaxillary fissure (mm).]


Random effects:
 Formula: ~I(age - 11) | Subject
 Structure: General positive-definite
            StdDev    Corr
(Intercept) 2.1343327 (Intr)
I(age - 11) 0.2264275 0.503
Residual    1.3100394

Fixed effects: distance ~ I(age - 11)
                Value Std.Error DF  t-value p-value
(Intercept) 24.023148 0.4296608 80 55.91189       0
I(age - 11)  0.660185 0.0712532 80  9.26534       0
 Correlation:
            (Intr)
I(age - 11) 0.294




>plot(compareFits(ranef(lmeOrth2),ranef(lmeOrth2ml)),mark=c(0,0))


[Figure: dot plot comparing ranef(lmeOrth2) and ranef(lmeOrth2ml), one row per subject. Panels: (Intercept), I(age − 11).]


Random effects:
 Formula: ~Sex + I(age - 11) + Sex:I(age - 11) | Subject
 Structure: General positive-definite
                      StdDev    Corr
(Intercept)           1.7178454 (Intr) SexFml I(-11)
SexFemale             1.6956351 -0.307
I(age - 11)           0.2937695 -0.009 -0.146
SexFemale:I(age - 11) 0.3160597  0.168  0.290 -0.964
Residual              1.2551778

Fixed effects: distance ~ Sex + I(age - 11) + Sex:I(age - 11)
                          Value Std.Error DF  t-value p-value
(Intercept)           24.968750 0.4572240 79 54.60945  0.0000
SexFemale             -2.321023 0.7823126 25 -2.96687  0.0065
I(age - 11)            0.784375 0.1015733 79  7.72226  0.0000
SexFemale:I(age - 11) -0.304830 0.1346293 79 -2.26421  0.0263
 Correlation:
                      (Intr) SexFml I(-11)
SexFemale             -0.584
I(age - 11)           -0.006  0.004
SexFemale:I(age - 11)  0.005  0.144 -0.754




> OrthRE3=random.effects(lmeOrth3)
> OrthRE3
    (Intercept)   SexFemale  I(age - 11) SexFemale:I(age - 11)
M16 -1.73612668  0.63199885 -0.121203414          0.0748642681
M05 -1.73713471  0.49796730  0.035630448         -0.0876368146
M02 -1.40604191  0.43103963 -0.003830025         -0.0370896958
M11 -1.18396932  0.56512991 -0.239248823          0.2132764937
M07 -1.07528511  0.31943477  0.008987456         -0.0407096045
M08 -0.96357680  0.47583428 -0.213277852          0.1928075453
M03 -0.63399603  0.20785928 -0.017487532         -0.0003969599
M12 -0.63483606  0.09616632  0.113207353         -0.1358145288
M13 -0.63802816 -0.32826691  0.609847916         -0.6504012907
M14 -0.08183867  0.14099033 -0.135532941          0.1380152657
M09  0.13720981 -0.12701403  0.099549847         -0.0991217929
M15  0.79838740 -0.39490093  0.177462762         -0.1605286380
M06  1.24102052 -0.32776769 -0.058124041          0.0964521169
M04  1.46326110 -0.17133882 -0.319681817          0.3739018202
M01  2.45317943 -0.81889368  0.084716304         -0.0161270990
M10  3.99777519 -1.19823860 -0.021015641          0.1385089141
F10 -1.91258504 -1.84210386  0.071770763         -0.2293495874
F09 -0.72087067 -0.69430276  0.027050737         -0.0864435068
F06 -0.71120815 -0.68499782  0.026688309         -0.0852850411
F01 -0.59610113 -0.57413261  0.022368854         -0.0714818606
F05 -0.03022851 -0.02911148  0.001134008         -0.0036244236
F07  0.16900395  0.16277491 -0.006341852          0.0202661280
F02  0.19316023  0.18603726 -0.007247922          0.0231622923
F08  0.30543005  0.29417922 -0.011461928          0.0366266523
F03  0.54331257  0.52328536 -0.020387500          0.0651510668
F04  1.02505976  0.98728531 -0.038465941          0.1229211327
F11  1.73502694  1.67108646 -0.065107527          0.2080571474


>plot(augPred(lmeOrth3),aspect="xy",grid=T)

[Figure: augmented predictions from lmeOrth3, one panel per subject (M01 to M16, F01 to F11). x-axis: Age (yr); y-axis: Distance from pituitary to pterygomaxillary fissure (mm).]


> newOrth=data.frame(Subject=rep(c("M11","F03"),c(3,3)),Sex=rep(c("Male","Female"),c(3,3)),age=rep(16:18,2) )
> newOrth
  Subject    Sex age
1     M11   Male  16
2     M11   Male  17
3     M11   Male  18
4     F03 Female  16
5     F03 Female  17
6     F03 Female  18
> predict(lmeOrth3,newdata=newOrth,level=0:1)
  Subject predict.fixed predict.Subject
1     M11      28.89063        26.51041
2     M11      29.67500        27.05554
3     M11      30.45938        27.60066
4     F03      25.04545        26.33587
5     F03      25.52500        26.86018
6     F03      26.00455        27.38449
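The two prediction levels can be reproduced by hand from the fixed effects and the predicted random effects of `lmeOrth3` printed earlier. The base R sketch below uses the rounded values from those outputs; the helper `design_row` and the vector names are illustrative.

```r
# Fixed effects of lmeOrth3 (from the summary output)
fixef3 <- c(int = 24.968750, female = -2.321023,
            age = 0.784375, female_age = -0.304830)

# Predicted random effects for two subjects (from OrthRE3)
ranef_M11 <- c(int = -1.18396932, female = 0.56512991,
               age = -0.239248823, female_age = 0.2132764937)
ranef_F03 <- c(int = 0.54331257, female = 0.52328536,
               age = -0.020387500, female_age = 0.0651510668)

# Hypothetical helper: design row for Sex * I(age - 11)
design_row <- function(female, age) c(1, female, age - 11, female * (age - 11))

# Level 0: population prediction, X beta
pop_M11 <- sum(design_row(0, 16) * fixef3)
pop_F03 <- sum(design_row(1, 16) * fixef3)

# Level 1: subject-specific prediction, X beta + Z b (here Z uses the same columns as X)
sub_M11 <- sum(design_row(0, 16) * (fixef3 + ranef_M11))
sub_F03 <- sum(design_row(1, 16) * (fixef3 + ranef_F03))

print(round(c(pop_M11, sub_M11, pop_F03, sub_F03), 5))
```

For the male subject the `SexFemale` columns are zero, so only his random intercept and random slope contribute to the subject-specific prediction.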


>lmListOrth=lmList(distance ~ I(age-11), data=Orthodont)

>compFOrth=compareFits(coef(lmListOrth),coef(lmeOrth2))


[Figure: dot plot comparing coef(lmListOrth) and coef(lmeOrth2), one row per subject. Panels: (Intercept), I(age − 11).]


>plot(comparePred(lmListOrth,lmeOrth2,length.out=2),layout=c(9,3))

[Figure: predictions from lmListOrth and lmeOrth2 overlaid on the observed data, one panel per subject (M16, M05, ..., F11); x-axis: Age (yr); y-axis: Distance from pituitary to pterygomaxillary fissure (mm).]


Examining a Fitted Model

There are two basic assumptions that need to be assessed:

1. the within-group errors are independent and identically normally distributed with mean zero and variance $\sigma^2$ (since $W_i = \sigma^2 I$), and they are independent of the random effects

2. the random effects are normally distributed with mean zero and covariance matrix $B$ (not depending on the group) and are independent for different groups.


Assessing assumptions on the within-group error

• The primary quantities used to assess the adequacy of the first assumption are the within-group residuals, defined as the difference between the observed and the within-group fitted value.

• The plot method of the lme class is the primary tool for obtaining diagnostics for the first assumption.
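
The definition of the within-group residuals can be checked by hand. The following is a minimal sketch, assuming that lmeOrth2 has a random intercept and a random slope for I(age - 11) within Subject; the object names fit and res_manual are illustrative and not part of the original analysis.

```r
library(nlme)  # provides lme() and the Orthodont data

# Assumed specification of lmeOrth2: random intercept and slope per Subject
fit <- lme(distance ~ I(age - 11), data = Orthodont,
           random = ~ I(age - 11) | Subject)

# Within-group residual = observed response - within-group (level 1) fitted value
res_manual <- Orthodont$distance - fitted(fit, level = 1)

# Compare with the residuals returned by resid() at the Subject level
agree <- isTRUE(all.equal(as.numeric(res_manual),
                          as.numeric(resid(fit, level = 1))))
print(agree)
```

These are the raw residuals; the diagnostic plots that follow request standardized (Pearson) residuals through type='p'.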


Example: Orthodont (cont.)

• Initially we consider the box plot of the residuals, by group.

• We add a vertical line at zero so we can assess whether the residuals

- are centered at zero
- have constant variance across groups
- are independent of the group level


>plot(lmeOrth2,Subject ~ resid(.),abline=0)

[Figure: box plots of the residuals by subject with a vertical reference line at zero; x-axis: Residuals (mm); y-axis: Subject.]


>plot(lmeOrth2,resid(.,type='p') ~ fitted(.)|Sex,id=0.05,adj=-0.3)

[Figure: standardized residuals against fitted values (mm) for lmeOrth2, panels Male and Female; labelled points: M09 (twice) and M13.]

> lmeOrth5=lme(distance~I(age-11),data=Orthodont,weights=varIdent(form=~1|Sex))
> summary(lmeOrth5)
Linear mixed-effects model fit by REML
 Data: Orthodont
       AIC      BIC    logLik
  435.6466 454.2907 -210.8233

Random effects:
 Formula: ~I(age - 11) | Subject
 Structure: General positive-definite
            StdDev    Corr
(Intercept) 2.1590091 (Intr)
I(age - 11) 0.1980627 0.617
Residual    1.6452598

Variance function:
 Structure: Different standard deviations per stratum
 Formula: ~1 | Sex
 Parameter estimates:
     Male    Female
1.0000000 0.4040981
Fixed effects: distance ~ I(age - 11)
               Value Std.Error DF  t-value p-value
(Intercept) 23.97377 0.4341697 80 55.21752       0
I(age - 11)  0.60686 0.0594260 80 10.21203       0
 Correlation:
            (Intr)
I(age - 11) 0.391




> anova(lmeOrth2,lmeOrth5)
         Model df      AIC      BIC    logLik   Test  L.Ratio p-value
lmeOrth2     1  6 454.6367 470.6173 -221.3183
lmeOrth5     2  7 435.6466 454.2907 -210.8233 1 vs 2 20.99004  <.0001
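
The likelihood ratio test reported by anova() can be reproduced from the printed log-likelihoods. This is a small sketch using only the numbers shown in the output above; the variable names are illustrative. Comparing REML fits in this way is reasonable here because both models have the same fixed effects.

```r
# Log-likelihoods (REML) and parameter counts as printed by anova()
logLik_2 <- -221.3183; df_2 <- 6   # lmeOrth2: common residual variance
logLik_5 <- -210.8233; df_5 <- 7   # lmeOrth5: residual variance differs by Sex

# Likelihood ratio statistic and its chi-square reference distribution
lr_stat <- 2 * (logLik_5 - logLik_2)
p_value <- pchisq(lr_stat, df = df_5 - df_2, lower.tail = FALSE)

print(round(lr_stat, 2))
print(p_value < 1e-4)
```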


>plot(lmeOrth5,resid(.,type='p') ~ fitted(.)|Sex,id=0.05,adj=-0.3)

[Figure: standardized residuals against fitted values (mm) for lmeOrth5, panels Male and Female; labelled points: M09 (twice) and M13.]


>plot(lmeOrth5,distance ~ fitted(.),id=0.05,adj=-0.3)

[Figure: observed distance from pituitary to pterygomaxillary fissure (mm) against fitted values (mm) for lmeOrth5; labelled points: M09 (twice) and M13.]


>qqnorm(lmeOrth5, ~ resid(.)|Sex)

[Figure: normal Q-Q plots of the residuals (mm) for lmeOrth5, panels Male and Female; y-axis: Quantiles of standard normal.]


Assessing assumptions on the random effects

• The ranef method is used to obtain the estimated BLUPs of the random effects for lme objects.

• Two types of diagnostic plots will be used to assess the second assumption:

- qqnorm: normal plot
- pairs: scatter plot


>qqnorm(lmeOrth2, ~ ranef(.),id=0.10,cex=0.7)

[Figure: normal Q-Q plots of the estimated random effects for lmeOrth2, panels (Intercept) and I(age − 11); labelled points: M10 and F10 in the intercept panel, M13 in the slope panel.]


>pairs(lmeOrth2, ~ ranef(.)|Sex,id= ~ Subject=='M13',adj=-0.3)

[Figure: scatter plot of the estimated random effects for lmeOrth2, I(age − 11) against (Intercept), panels Male and Female; M13 is labelled in the Male panel.]


>qqnorm(lmeOrth5, ~ ranef(.),id=0.10,cex=0.7)

[Figure: normal Q-Q plots of the estimated random effects for lmeOrth5, panels (Intercept) and I(age − 11); labelled points: M10 and F10 in the intercept panel.]


Revision: Generalized Linear Models

• So far we have discussed methods for analyzing continuous data

• When the response is discrete (e.g. binary, count), linear models are no longer appropriate

• Instead, we use Generalized Linear Models (GLM)

• Extensions of GLMs will be considered for the analysis of longitudinal data


Features:

1. We have a response variable $Y_i$ for the $i$th subject, $i = 1, \ldots, N$, with an associated $p \times 1$ vector of covariates

$$X_i = (X_{i1}, \ldots, X_{ip})'.$$

2. Distributional assumption:

In linear models, the distribution of the response variable is assumed normal. The GLM extends this by assuming that the distribution of the response variable belongs to the exponential family of distributions

$$f(y_i; \theta_i, \phi) = \exp\left[\{y_i\theta_i - a(\theta_i)\}/\phi + b(y_i, \phi)\right].$$

The specific functions $a(\cdot)$ and $b(\cdot)$ distinguish one member of the family from another. The parameter $\theta_i$ is called the location parameter and $\phi$ is the dispersion parameter.

It can be shown that

$$Var(Y_i) = \phi\, v(\mu_i),$$

where $v(\mu_i)$ is the variance function, a known function of the mean $\mu_i$, and $\phi > 0$. Members of this family include the Normal, Bernoulli and Poisson distributions.

3. Systematic component:

In a GLM, the mean is a function of the linear predictor $\eta_i$,

$$\eta_i = \beta_1 X_{i1} + \beta_2 X_{i2} + \ldots + \beta_p X_{ip},$$

where usually $X_{i1} = 1$. Note: in this context, linear means that $\eta_i$ is linear in the regression parameters $\beta$ but not necessarily in the covariates.

4. Link function:

The final step is to relate the mean $\mu_i$ to the linear predictor $\eta_i$. This is done by introducing the link function $g(\cdot)$,

$$g(\mu_i) = \eta_i = \beta_1 X_{i1} + \beta_2 X_{i2} + \ldots + \beta_p X_{ip}.$$

The link function is a known function, e.g. $\log(\mu_i)$, that transforms the mean so that it changes linearly with changes in the covariates.

Distribution    v(μ)        Link function
Normal          1           Identity: μ = η
Bernoulli       μ(1 − μ)    Logit: log(μ/(1 − μ)) = η
Poisson         μ           Log: log(μ) = η
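
The entries of this table can be inspected directly in R, since family objects carry the variance and link functions. The sketch below uses only base R; the variable names are illustrative.

```r
# Family objects bundle the variance function v(mu) and the link g(mu)
fam_binom <- binomial(link = "logit")
fam_pois  <- poisson(link = "log")
fam_norm  <- gaussian(link = "identity")

mu <- 0.3
v_binom   <- fam_binom$variance(mu)        # mu * (1 - mu)
eta_binom <- fam_binom$linkfun(mu)         # log(mu / (1 - mu))
mu_back   <- fam_binom$linkinv(eta_binom)  # inverse link recovers mu

v_pois <- fam_pois$variance(2.5)           # v(mu) = mu
v_norm <- fam_norm$variance(2.5)           # v(mu) = 1

print(c(v_binom = v_binom, v_pois = v_pois, v_norm = v_norm))
```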


Logistic Regression: Binary outcomes

• Response $Y_i$ is a binary outcome with $P(Y_i = 1) = \mu_i$

• The mean is related to the covariates through

$$\text{logit}(\mu_i) = \log\left(\frac{\mu_i}{1 - \mu_i}\right) = \beta_1 + \beta_2 X_i$$

• Responses are Bernoulli variables with

$$Var(Y_i) = \mu_i(1 - \mu_i)$$

• It can also be expressed as

$$\mu_i = \frac{\exp(\beta_1 + \beta_2 X_i)}{1 + \exp(\beta_1 + \beta_2 X_i)}$$


Log-Linear Model for Count data

• Response $Y_i$ is a count that is assumed to follow a Poisson distribution

$$P(Y_i = y_i) = \frac{e^{-\mu_i}\mu_i^{y_i}}{y_i!}.$$

• The mean is related to the covariates through

$$\log(\mu_i) = \beta_1 + \beta_2 X_i.$$

• If the rate of occurrence is of interest, we get

$$\log(\mu_i/T_i) = \beta_1 + \beta_2 X_i \Rightarrow \log(\mu_i) = \log(T_i) + \beta_1 + \beta_2 X_i,$$

where $T_i$ is the relevant time period. The term $\log(T_i)$ is known as an offset, and enters the model with a fixed coefficient equal to 1.

• Responses are Poisson variables with

$$Var(Y_i) = v(\mu_i) = \mu_i.$$
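
The role of the offset can be illustrated on simulated data. This is a sketch under stated assumptions: the data, the exposure times and the true coefficients (0.5 and 0.3) are invented for illustration.

```r
set.seed(42)
n <- 200
x <- rnorm(n)
exposure <- runif(n, 1, 10)   # hypothetical observation periods T_i

# Counts whose mean is proportional to the exposure time
y <- rpois(n, lambda = exposure * exp(0.5 + 0.3 * x))

# log(T_i) enters as an offset: its coefficient is fixed at 1, not estimated
fit <- glm(y ~ x + offset(log(exposure)), family = poisson)
print(coef(fit))
```

Only the intercept and the slope are estimated, so coef(fit) has two entries; they describe the log rate per unit of time.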


Classes of model for dependent non-normal data

In the current settings with longitudinal data, two classes of models are widely used

1. marginal or population average (PA) models

2. subject-specific (SS) models


i. Marginal or Population Average Models (PA)

• Consider the logistic model

$$\text{logit}(E[Y_{ij}]) = X'_{ij}\beta_1$$

$$E[Y_{ij}] = \frac{\exp(X'_{ij}\beta_1)}{1 + \exp(X'_{ij}\beta_1)}$$

• Also called population-average model
• Models the mean at each time
• Changes represent changes at the average level, not within-subject change
• Does not induce any within-subject dependence


ii. Subject-Specific Models (SS)

• Consider the logistic model

$$\text{logit}(E[Y_{ij}|b_i]) = X'_{ij}\beta_2 + b_i$$

$$E[Y_{ij}|b_i] = \frac{\exp(X'_{ij}\beta_2 + b_i)}{1 + \exp(X'_{ij}\beta_2 + b_i)}$$

• $b_i$ is the effect associated with subject $i$
• repeated measurements are assumed independent conditional on $b_i$
• Taking the expectation with respect to $b_i$ induces correlation among repeated measures and defines the marginal expectation

$$E[Y_{ij}] = E_S\{E[Y_{ij}|b_i]\} = E_S\left\{\frac{\exp(X'_{ij}\beta_2 + b_i)}{1 + \exp(X'_{ij}\beta_2 + b_i)}\right\}$$


To summarize:

• with dependent data, when we move away from the normal linear model, we no longer have a unified modeling framework
• different approaches are defined for different distributions
• as a result, parameters represent different things in different models
• extra care is needed about the scale


Comparison between PA and SS models

• This is a matter of scale

- e.g. in logistic regression for the SS model, the linear predictor

$$\text{logit}(E[Y_{ij}|b_i]) = X'_{ij}\beta_2 + b_i$$

operates on the logit scale

- but marginalizing involves averaging on the probability scale

$$E[Y_{ij}] = E_S\{E[Y_{ij}|b_i]\} = E_S\left\{\frac{\exp(X'_{ij}\beta_2 + b_i)}{1 + \exp(X'_{ij}\beta_2 + b_i)}\right\} \neq \frac{\exp(X'_{ij}\beta_2 + E[b_i])}{1 + \exp(X'_{ij}\beta_2 + E[b_i])} = \frac{\exp(X'_{ij}\beta_2)}{1 + \exp(X'_{ij}\beta_2)}$$

• The final expression is the probability for a subject with zero subject effect and is not the same thing as the average probability over the subjects
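
The inequality can be seen numerically. The sketch below assumes a normal random intercept with standard deviation 2 and a linear predictor equal to 1; both values are arbitrary choices for illustration.

```r
eta <- 1        # assumed value of the fixed part X'beta_2
sigma_b <- 2    # assumed standard deviation of b_i ~ N(0, sigma_b^2)

# Probability for a subject with b_i = 0 (plug-in on the logit scale)
conditional_at_zero <- plogis(eta)

# Population-average probability: average plogis(eta + b) over the b_i
marginal <- integrate(function(b) plogis(eta + b) * dnorm(b, sd = sigma_b),
                      lower = -Inf, upper = Inf)$value

print(c(conditional_at_zero = conditional_at_zero, marginal = marginal))
```

The averaged probability is pulled towards 0.5 relative to the probability at $b_i = 0$, which is why population-average coefficients are typically smaller in absolute value than subject-specific ones.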


Marginal Models: Generalized Estimating Equations (GEE)

• marginal models are primarily used to provide inferences about the population means

• GEEs provide an extension to the GLMs to longitudinal data

• no distributional assumption for the response variable is required

• only the speci�cation of a regression model for the mean is required

• the response variable can be continuous, binary or count

• furthermore, as a regression model, it easily handles unbalanced data


Notation:

• the notation is similar to what we have already introduced

• the response variable

$$Y_i = (Y_{i1}, Y_{i2}, \ldots, Y_{in_i})'$$

doesn't have to be continuous any more

• $n_i$ is the number of observations for subject $i$

• associated with each $Y_{ij}$ is a vector of covariates

$$X_{ij} = (X_{ij1}, X_{ij2}, \ldots, X_{ijp})'$$

where $i = 1, 2, \ldots, N$ and $j = 1, 2, \ldots, n_i$.

• Two types of covariates are included among $X_{ij}$

1. between-subject covariates, which are covariates that do not change over time (gender, treatment, etc)

2. within-subject covariates, which are those that change over time (time since baseline, current status, etc)


Note: Since marginal models are primarily concerned with population means, marginal models for longitudinal data model separately the mean response and the within-subject association between the repeated responses.

• the former is of interest

• the latter is treated as a nuisance


A marginal model has the following three-part specification

1. The mean structure is

$$g(\mu_{ij}) = \eta_{ij} = X'_{ij}\beta,$$

where the conditional mean $\mu_{ij} = E[Y_{ij}|X_{ij}]$ depends on the linear predictor $\eta_{ij}$ through the link function $g(\cdot)$.

2. The variance is assumed to have the form

$$Var(Y_{ij}) = \phi\, v(\mu_{ij}),$$

where $v(\mu_{ij})$ is a known function of the mean and $\phi$ is a scale parameter that may be known or may need to be estimated. The scale parameter could be different for different occasions (balanced data) or could depend on time.

3. The within-subject association among the repeated responses, given $X_{ij}$, is a function of a separate set of parameters, say $\alpha$, that could also depend on the means. These could be the pairwise correlations or log-odds ratios, depending on the type of the data.

Furthermore:

1. in marginal models, the mean response and the within-subject association are modeled separately

2. the avoidance of distributional assumptions for $Y_{ij}$ leads to a method of estimation known as Generalized Estimating Equations (GEE)


e.g. Marginal Model for Continuous Response

• The mean of $Y_{ij}$ is related to the covariates through the identity link

$$\mu_{ij} = \eta_{ij} = X'_{ij}\beta.$$

• The variance has the form

$$Var(Y_{ij}) = \phi\, v(\mu_{ij}) = \phi,$$

where $v(\mu_{ij}) = 1$ and $\phi$ is to be estimated.

• The within-subject association among repeated measures can be modeled using any of the ways of modeling the covariance structure already discussed (autoregressive, unstructured, etc). For example, we can assume a first-order autoregressive correlation structure

$$Corr(Y_{ij}, Y_{ik}) = a^{|k-j|},$$

where $0 \leq a \leq 1$.

• The linear model already discussed can be seen as a special case of the marginal model

• The marginal model provides a broad class of models for continuous data, largely based on the choice of the link function


e.g. Marginal Model for Binary Response

• Responses are considered Bernoulli variables

• The mean of $Y_{ij}$ is related to the covariates through the logit link

$$\log\left(\frac{\mu_{ij}}{1 - \mu_{ij}}\right) = \eta_{ij} = X'_{ij}\beta.$$

• The variance has the form

$$Var(Y_{ij}) = \mu_{ij}(1 - \mu_{ij}),$$

where $\phi = 1$.

• The within-subject association among repeated measures can be modeled using an unstructured pairwise log-odds ratio pattern (or any other available pattern)

$$\log OR(Y_{ij}, Y_{ik}) = a_{jk},$$

where

$$OR(Y_{ij}, Y_{ik}) = \frac{P(Y_{ij} = 1, Y_{ik} = 1)\,P(Y_{ij} = 0, Y_{ik} = 0)}{P(Y_{ij} = 1, Y_{ik} = 0)\,P(Y_{ij} = 0, Y_{ik} = 1)}.$$


e.g. Marginal Model for Counts

• Responses are assumed to follow a Poisson distribution

• The mean of $Y_{ij}$ is related to the covariates through the log link function

$$\log(\mu_{ij}) = \eta_{ij} = X'_{ij}\beta.$$

• The variance has the form

$$Var(Y_{ij}) = \phi\mu_{ij},$$

where $\phi$ does not depend on time and has to be estimated.

• The within-subject association among repeated measures can be modeled using an unstructured pairwise correlation pattern (or any other available pattern)

$$Corr(Y_{ij}, Y_{ik}) = a_{jk}.$$

Here a balanced design has been assumed.

• In the model specification, the Poisson variance is multiplied by a parameter $\phi$. Hence, the variance is inflated when $\phi > 1$. It is very common for count data to have variability greater than the variance predicted by the Poisson distribution; this is called overdispersion.
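
A common moment estimate of $\phi$ is the Pearson chi-square statistic divided by the residual degrees of freedom. The sketch below applies it to simulated overdispersed counts; the negative binomial data and all parameter values are assumptions made for illustration.

```r
set.seed(123)
n <- 500
x <- rnorm(n)
mu <- exp(1.5 + 0.2 * x)

# Negative binomial counts: variance mu + mu^2 / size exceeds the Poisson variance
y <- rnbinom(n, size = 1, mu = mu)

fit <- glm(y ~ x, family = poisson)

# Pearson residuals (y - mu_hat) / sqrt(mu_hat); phi_hat = sum of squares / df
pearson <- residuals(fit, type = "pearson")
phi_hat <- sum(pearson^2) / df.residual(fit)
print(phi_hat)
```

A value of phi_hat well above 1 signals overdispersion relative to the Poisson model.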


Estimation

• The GEE approach is based on estimating equations

• The idea is to extend the usual likelihood equations for a GLM by incorporating the covariance matrix of the responses

• Assume the following marginal model

1. $g(\mu_{ij}) = \eta_{ij} = X'_{ij}\beta$.

2. $Var(Y_{ij}) = \phi\, v(\mu_{ij})$, where $v(\mu_{ij})$ is a known function of the mean and $\phi$ can be different for each occasion (balanced data) or depend on time.

3. The pairwise within-subject association is assumed to be a function of the means $\mu_{ij}$ and a set of association parameters $\alpha$, such that

$$V_i = A_i^{1/2}\, Corr(Y_i)\, A_i^{1/2},$$

where $A_i$ is a diagonal matrix with elements $Var(Y_{ij}) = \phi\, v(\mu_{ij})$ along the diagonal and $Corr(Y_i)$ is a correlation matrix, a function of $\alpha$. We tend to call $V_i$ a working covariance matrix, to distinguish it from the true underlying covariance matrix.

• The GLS estimator of $\beta$ is

$$\hat{\beta} = \left\{\sum_{i=1}^{N} X'_i\Sigma_i^{-1}X_i\right\}^{-1}\sum_{i=1}^{N} X'_i\Sigma_i^{-1}y_i,$$

obtained by solving

$$\sum_{i=1}^{N} X'_i\Sigma_i^{-1}(y_i - \mu_i) = 0$$

as part of the minimization of

$$\sum_{i=1}^{N}(y_i - X_i\beta)'\Sigma_i^{-1}(y_i - X_i\beta).$$

• The GEE estimator of $\beta$ is obtained from the minimization of

$$\sum_{i=1}^{N}(y_i - \mu_i(\beta))'V_i^{-1}(y_i - \mu_i(\beta))$$

with respect to $\beta$, where $V_i$ is assumed known (ignoring its dependence on $\beta$) and $\mu_i$ is the vector of $\mu_{ij} = g^{-1}(X'_{ij}\beta)$. This results in the generalized estimating equations

$$\sum_{i=1}^{N} D'_iV_i^{-1}(y_i - \mu_i) = 0,$$

where $V_i$ is the working covariance matrix and $D_i = \partial\mu_i/\partial\beta$.


Iterative estimation procedure

The GEE have no closed-form solution.

Step 1: Given current (initial) estimates of $\alpha$ and $\phi$, $V_i$ is estimated and an estimate of $\beta$ is obtained from

$$\sum_{i=1}^{N} D'_iV_i^{-1}(y_i - \mu_i) = 0.$$

Step 2: Given the current estimate of $\beta$, estimates of $\alpha$ and $\phi$ can be obtained from the standardized residuals

$$e_{ij} = \frac{Y_{ij} - \hat{\mu}_{ij}}{\sqrt{v(\hat{\mu}_{ij})}}.$$

Notes:

1. We iterate between the above steps until convergence.

2. Initial values for $\beta$ can be obtained from fitting a GLM assuming independent observations

3. The algorithm is simple
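
Step 2 can be sketched for an exchangeable working correlation. The moment estimators below are one common choice and are not necessarily the ones used by a particular GEE implementation; the simulated Poisson data with a shared subject effect, and all parameter values, are assumptions for illustration.

```r
set.seed(2)
N <- 200; n <- 4                        # subjects and occasions (balanced)
id <- rep(seq_len(N), each = n)
x <- rnorm(N * n)
b <- rep(rnorm(N, sd = 0.5), each = n)  # shared subject effect induces correlation
y <- rpois(N * n, exp(0.5 + 0.3 * x + b))

# Current estimate of beta: here the GLM fit under independence (initial values)
fit <- glm(y ~ x, family = poisson)
mu <- fitted(fit)
p <- length(coef(fit))

# Standardized residuals e_ij = (y_ij - mu_ij) / sqrt(v(mu_ij)), with v(mu) = mu
e <- (y - mu) / sqrt(mu)

# Scale parameter: average squared standardized residual
phi_hat <- sum(e^2) / (N * n - p)

# Exchangeable correlation: average cross-product over within-subject pairs j < k
pair_sum <- sum(tapply(e, id, function(r) (sum(r)^2 - sum(r^2)) / 2))
n_pairs <- N * n * (n - 1) / 2
alpha_hat <- pair_sum / (phi_hat * (n_pairs - p))

print(c(phi_hat = phi_hat, alpha_hat = alpha_hat))
```

In a full GEE fit these estimates would be fed back into $V_i$ for Step 1, and the two steps repeated until the estimates stabilize.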


Properties:

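1. The estimate $\hat{\beta}$ is a consistent estimate of $\beta$ (large sample property). This is true irrespective of the choice of $V_i$. Hence, all we need is that the model for the mean is correctly specified.

2. In large samples, $\hat{\beta}$ has a multivariate normal distribution with mean $\beta$ and

$$Cov(\hat{\beta}) = B^{-1}MB^{-1},$$

where

$$B = \sum_{i=1}^{N} D'_iV_i^{-1}D_i, \qquad M = \sum_{i=1}^{N} D'_iV_i^{-1}\,Cov(Y_i)\,V_i^{-1}D_i.$$

These matrices can be estimated by substituting $\alpha$, $\phi$ and $\beta$ by their estimates and replacing $Cov(Y_i) = \Sigma_i$ by

$$(Y_i - \hat{\mu}_i)(Y_i - \hat{\mu}_i)'.$$

3. Hence

$$\widehat{Cov}(\hat{\beta}) = \left(\sum_{i=1}^{N} D'_iV_i^{-1}D_i\right)^{-1}\left\{\sum_{i=1}^{N} D'_iV_i^{-1}(Y_i - \hat{\mu}_i)(Y_i - \hat{\mu}_i)'V_i^{-1}D_i\right\}\left(\sum_{i=1}^{N} D'_iV_i^{-1}D_i\right)^{-1}.$$

This is the so-called sandwich estimator.

4. Finally, if the covariance is modeled correctly, $V_i = \Sigma_i$ and

$$Cov(\hat{\beta}) = B^{-1}.$$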
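
The sandwich estimator can be computed by hand in the simplest case. The sketch below assumes a logistic marginal model with an independence working correlation and $\phi = 1$, so that $\hat{\beta}$ coincides with the ordinary GLM estimate; the simulated data and variable names are illustrative.

```r
set.seed(1)
N <- 100; n <- 4                        # subjects and occasions
id <- rep(seq_len(N), each = n)
b <- rep(rnorm(N), each = n)            # shared subject effect induces correlation
x <- rnorm(N * n)
y <- rbinom(N * n, 1, plogis(-0.5 + 0.8 * x + b))

# Independence working correlation: the GEE solution equals the GLM fit
fit <- glm(y ~ x, family = binomial)
X <- model.matrix(fit)
mu <- fitted(fit)
v <- mu * (1 - mu)

# Logit link: D_i = diag(v_i) X_i and V_i = diag(v_i), so D_i' V_i^{-1} = X_i'
B <- crossprod(X * v, X)                # sum_i D_i' V_i^{-1} D_i
U <- rowsum(X * (y - mu), group = id)   # row i holds X_i' (y_i - mu_i)
M <- crossprod(U)                       # sum_i of outer products of the rows of U

naive  <- solve(B)                      # model-based covariance B^{-1}
robust <- solve(B) %*% M %*% solve(B)   # sandwich covariance B^{-1} M B^{-1}

print(cbind(naive_se = sqrt(diag(naive)), robust_se = sqrt(diag(robust))))
```

The naive standard errors match those of the GLM, while the robust ones account for the within-subject correlation ignored by the working model.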


Pros:

1. In many cases the GEE estimator $\hat{\beta}$ is almost as precise as the MLE.

2. The GEE estimator is a consistent estimate of $\beta$ even when the within-subject associations are misspecified.

3. In this case, valid* estimates of the standard errors can be obtained from the sandwich estimator

* Reliance on the sandwich estimator is not appealing when the number of subjects is not very large compared to the number of repeated observations, or when the design is unbalanced. In these cases, it is preferable to use the model-based covariance

$$Cov(\hat{\beta}) = B^{-1},$$

which provides valid estimates when the working covariance matrix is a good approximation of the true covariance $\Sigma_i$.


Example: Respiratory Data (Binary)

• In each of two centers patients were randomized to active treatment or placebo

• During treatment, the respiratory status (poor or good) was determined at each of fourmonthly visits

• There were 111 patients (54 vs 57)

• The question of interest is to assess whether the treatment is effective and to estimate its effect


resp_glm = glm(status ~ centre + treatment + sex + baseline + age, data = resp, family = "binomial")


> summary(resp_glm)

Call:
glm(formula = status ~ centre + treatment + sex + baseline + age,
    family = "binomial", data = resp)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.3146  -0.8551   0.4336   0.8953   1.9246

Coefficients:
                    Estimate Std. Error z value Pr(>|z|)
(Intercept)        -0.900171   0.337653  -2.666  0.00768 **
centre2             0.671601   0.239567   2.803  0.00506 **
treatmenttreatment  1.299216   0.236841   5.486 4.12e-08 ***
sexmale             0.119244   0.294671   0.405  0.68572
baselinegood        1.882029   0.241290   7.800 6.20e-15 ***
age                -0.018166   0.008864  -2.049  0.04043 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 608.93  on 443  degrees of freedom
Residual deviance: 483.22  on 438  degrees of freedom
AIC: 495.22

Number of Fisher Scoring iterations: 4


resp_gee1 = gee(nstat ~ centre + treatment + sex + baseline + age, data = resp, family = "binomial", id = subject, corstr = "independence", scale.fix = TRUE, scale.value = 1)


> summary(resp_gee1)

 GEE:  GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
 gee S-function, version 4.13 modified 98/01/27 (1998)

Model:
 Link:                      Logit
 Variance to Mean Relation: Binomial
 Correlation Structure:     Independent

Call:
gee(formula = nstat ~ centre + treatment + sex + baseline + age,
    id = subject, data = resp, family = "binomial", corstr = "independence",
    scale.fix = TRUE, scale.value = 1)

Summary of Residuals:
        Min          1Q      Median          3Q         Max
-0.93134415 -0.30623174  0.08973552  0.33018952  0.84307712

Coefficients:
                      Estimate  Naive S.E.   Naive z Robust S.E.
(Intercept)        -0.90017133 0.337653052 -2.665965  0.46032700
centre2             0.67160098 0.239566599  2.803400  0.35681913
treatmenttreatment  1.29921589 0.236841017  5.485603  0.35077797
sexmale             0.11924365 0.294671045  0.404667  0.44320235
baselinegood        1.88202860 0.241290221  7.799854  0.35005152
age                -0.01816588 0.008864403 -2.049306  0.01300426
                     Robust z
(Intercept)        -1.9555041
centre2             1.8821889
treatmenttreatment  3.7038127
sexmale             0.2690501
baselinegood        5.3764332
age                -1.3969169

Estimated Scale Parameter:  1
Number of Iterations:  1

Working Correlation
     [,1] [,2] [,3] [,4]
[1,]    1    0    0    0
[2,]    0    1    0    0
[3,]    0    0    1    0
[4,]    0    0    0    1


resp_gee2 = gee(nstat ~ centre + treatment + sex + baseline + age, data = resp, family = "binomial", id = subject, corstr = "exchangeable", scale.fix = TRUE, scale.value = 1)


> summary(resp_gee2)

Model:
 Link:                      Logit
 Variance to Mean Relation: Binomial
 Correlation Structure:     Exchangeable

Call:
gee(formula = nstat ~ centre + treatment + sex + baseline + age,
    id = subject, data = resp, family = "binomial", corstr = "exchangeable",
    scale.fix = TRUE, scale.value = 1)

Summary of Residuals:
        Min          1Q      Median          3Q         Max
-0.93134415 -0.30623174  0.08973552  0.33018952  0.84307712

Coefficients:
                      Estimate Naive S.E.    Naive z Robust S.E.
(Intercept)        -0.90017133 0.47846344 -1.8813796  0.46032700
centre2             0.67160098 0.33947230  1.9783676  0.35681913
treatmenttreatment  1.29921589 0.33561008  3.8712064  0.35077797
sexmale             0.11924365 0.41755678  0.2855747  0.44320235
baselinegood        1.88202860 0.34191472  5.5043802  0.35005152
age                -0.01816588 0.01256110 -1.4462014  0.01300426
                     Robust z
(Intercept)        -1.9555041
centre2             1.8821889
treatmenttreatment  3.7038127
sexmale             0.2690501
baselinegood        5.3764332
age                -1.3969169

Working Correlation
          [,1]      [,2]      [,3]      [,4]
[1,] 1.0000000 0.3359883 0.3359883 0.3359883
[2,] 0.3359883 1.0000000 0.3359883 0.3359883
[3,] 0.3359883 0.3359883 1.0000000 0.3359883
[4,] 0.3359883 0.3359883 0.3359883 1.0000000


> # Confidence Interval for estimated treatment effect [logOR scale]
> se <- summary(resp_gee2)$coefficients["treatmenttreatment","Robust S.E."]
> coef(resp_gee2)["treatmenttreatment"] + c(-1, 1) * se * qnorm(0.975)
[1] 0.6117037 1.9867281
>
> # Confidence Interval for estimated treatment effect [OR scale]
> exp(coef(resp_gee2)["treatmenttreatment"] + c(-1, 1) * se * qnorm(0.975))
[1] 1.843570 7.291637


Example: Epilepsy Data (Counts)

• 59 patients with epilepsy were randomized to receive either "Progabide" or "Placebo".

• The numbers of seizures observed in each of four 2-week periods were recorded, along with the baseline seizure count for the 8 weeks prior to randomization

• The question of interest is whether taking the anti-epileptic drug reduces the number of seizures compared to placebo


> data("epilepsy", package = "HSAUR")
> itp <- interaction(epilepsy$treatment, epilepsy$period)
> tapply(epilepsy$seizure.rate, itp, mean)
  placebo.1 Progabide.1   placebo.2 Progabide.2   placebo.3 Progabide.3   placebo.4 Progabide.4
   9.357143    8.580645    8.285714    8.419355    8.785714    8.129032    7.964286    6.709677
> tapply(epilepsy$seizure.rate, itp, var)
  placebo.1 Progabide.1   placebo.2 Progabide.2   placebo.3 Progabide.3   placebo.4 Progabide.4
  102.75661   332.71828    66.65608   140.65161   215.28571   193.04946    58.18386   126.87957


[Figure: number of seizures by period (1 to 4), panels Placebo and Progabide.]

[Figure: log number of seizures by period (1 to 4), panels Placebo and Progabide.]

fm <- seizure.rate ~ base + age + treatment + offset(per)

epilepsy_glm <- glm(fm, data = epilepsy, family = "poisson")


> summary(epilepsy_glm)

Call:
glm(formula = fm, family = "poisson", data = epilepsy)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-4.4360  -1.4034  -0.5029   0.4842  12.3223

Coefficients:
                     Estimate Std. Error z value Pr(>|z|)
(Intercept)        -0.1306156  0.1356191  -0.963  0.33549
base                0.0226517  0.0005093  44.476  < 2e-16 ***
age                 0.0227401  0.0040240   5.651 1.59e-08 ***
treatmentProgabide -0.1527009  0.0478051  -3.194  0.00140 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 2521.75  on 235  degrees of freedom
Residual deviance:  958.46  on 232  degrees of freedom
AIC: 1732.5

Number of Fisher Scoring iterations: 5


epilepsy_gee1 <- gee(fm, data = epilepsy, family = "poisson", id = subject, corstr = "independence", scale.fix = TRUE, scale.value = 1)


> summary(epilepsy_gee1)

Model:
 Link:                      Logarithm
 Variance to Mean Relation: Poisson
 Correlation Structure:     Independent

Call:
gee(formula = fm, id = subject, data = epilepsy, family = "poisson",
    corstr = "independence", scale.fix = TRUE, scale.value = 1)

Summary of Residuals:
       Min         1Q     Median         3Q        Max
-4.9195387  0.1808059  1.7073405  4.8850644 69.9658560

Coefficients:
                      Estimate   Naive S.E.    Naive z Robust S.E.   Robust z
(Intercept)        -0.13061561 0.1356191185 -0.9631062 0.365148155 -0.3577058
base                0.02265174 0.0005093011 44.4761250 0.001235664 18.3316325
age                 0.02274013 0.0040239970  5.6511312 0.011580405  1.9636736
treatmentProgabide -0.15270095 0.0478051054 -3.1942393 0.171108915 -0.8924196

Working Correlation
     [,1] [,2] [,3] [,4]
[1,]    1    0    0    0
[2,]    0    1    0    0
[3,]    0    0    1    0
[4,]    0    0    0    1



corstr = "exchangeable", scale.fix = TRUE,scale.value = 1)


Model:
 Link:                      Logarithm
 Variance to Mean Relation: Poisson
 Correlation Structure:     Exchangeable

Call:
gee(formula = fm, id = subject, data = epilepsy, family = "poisson",
    corstr = "exchangeable", scale.fix = TRUE, scale.value = 1)







corstr = "exchangeable", scale.fix = FALSE,scale.value = 1)


Model:
 Link:                      Logarithm
 Variance to Mean Relation: Poisson
 Correlation Structure:     Exchangeable

Call:
gee(formula = fm, id = subject, data = epilepsy, family = "poisson",
    corstr = "exchangeable", scale.fix = FALSE, scale.value = 1)

Estimated Scale Parameter:  5.089608
Number of Iterations:  1



Generalized Linear Mixed Effects Models

• GLMs can be extended, with the inclusion of random parameters, to allow variation between subjects

• Random effects follow a multivariate normal distribution

• Conditional on the random effects, responses are independent and follow a distribution that belongs to the exponential family.


Model Specification:

• The distribution of $Y_{ij}$, conditional on the random effects, belongs to the exponential family of distributions.

• Its variance is

$$Var(Y_{ij}|b_i) = \phi\, v(E[Y_{ij}|b_i])$$

• Given $b_i$, the $Y_{ij}$ are independent of one another

• In matrix notation, the linear predictor can be written

$$\eta_{ij} = X'_{ij}\beta + Z'_{ij}b_i,$$

and for some known link function $g(\cdot)$

$$g(E[Y_{ij}|b_i]) = \eta_{ij} = X'_{ij}\beta + Z'_{ij}b_i.$$

• Random effects, in theory, can follow any multivariate distribution. In practice, they are assumed to follow a multivariate normal distribution with mean zero and covariance matrix $G$.


GLMM for Continuous Response:

• Responses $Y_{ij}$ are independent, conditional on $b_i$, and normally distributed

• The variance has the form

$$Var(Y_{ij}|b_i) = \sigma^2,$$

where $\phi = \sigma^2$ and $v(\mu) = 1$.

• The linear predictor is

$$\eta_{ij} = X'_{ij}\beta + Z'_{ij}b_i,$$

where $X'_{ij} = Z'_{ij} = (1, t_{ij})$ (illustration). Then

$$E(Y_{ij}|b_i) = \eta_{ij} = X'_{ij}\beta + Z'_{ij}b_i = (\beta_1 + b_{1i}) + (\beta_2 + b_{2i})t_{ij}.$$

• Although the link is the identity function, more options are available

• Random effects have a bivariate normal distribution with a $2 \times 2$ covariance matrix $G$


GLMM for Binary Response:

• Responses $Y_{ij}$ are independent, conditional on $b_i$, Bernoulli variables

• The variance has the form

$$Var(Y_{ij}|b_i) = E(Y_{ij}|b_i)(1 - E(Y_{ij}|b_i)).$$

This means that $\phi = 1$.

• The linear predictor is given by

$$\eta_{ij} = X'_{ij}\beta + Z'_{ij}b_i = X'_{ij}\beta + b_i,$$

where $Z'_{ij} = 1$ for all $i, j$ (illustration). Then

$$\log\left[\frac{P(Y_{ij} = 1|b_i)}{P(Y_{ij} = 0|b_i)}\right] = \eta_{ij} = X'_{ij}\beta + b_i$$

• $b_i \sim N(0, \sigma^2)$.

• This is a random intercept model, equivalent to the compound symmetry model.


GLMM for Counts:

• Responses $Y_{ij}$ are independent, conditional on $b_i$, and follow a Poisson distribution

• The variance has the form

$$Var(Y_{ij}|b_i) = E(Y_{ij}|b_i).$$

This means that $\phi = 1$.

• The linear predictor is given by

$$\eta_{ij} = X'_{ij}\beta + Z'_{ij}b_i,$$

where $Z'_{ij} = (1, t_{ij})$ for all $i, j$ (illustration). Then

$$\log E(Y_{ij}|b_i) = \eta_{ij} = X'_{ij}\beta + Z'_{ij}b_i$$

• Random effects follow a bivariate normal distribution with zero mean and a $2 \times 2$ covariance matrix


Parameter Interpretation

• Parameters in the linear predictor are now interpreted in terms of conditional probabilities, given the subject (random) effects

• Regression parameters $\beta$ in a GLMM have a different interpretation than in marginal models

• In a GLMM, $\beta$ has a subject-specific interpretation

• Specifically, $\beta$ represents the impact of covariates on changes in an individual's transformed mean response

• Consider the example with the logistic regression model

$$\log\left[\frac{P(Y_{ij} = 1|b_i)}{P(Y_{ij} = 0|b_i)}\right] = X'_{ij}\beta + b_i,$$

where $b_i \sim N(0, g_{11})$. Furthermore, suppose covariate $X_{ijk}$ takes some value $x$, leading to the log-odds

$$\log\left[\frac{P(Y_{ij} = 1|b_i, X_{ij1}, \ldots, X_{ijk} = x, \ldots, X_{ijp})}{P(Y_{ij} = 0|b_i, X_{ij1}, \ldots, X_{ijk} = x, \ldots, X_{ijp})}\right] = \beta_1X_{ij1} + \beta_2X_{ij2} + \ldots + \beta_kx + \ldots + \beta_pX_{ijp} + b_i.$$

Additionally, if $X_{ijk} = x + 1$, then the log-odds takes the form

$$\log\left[\frac{P(Y_{ij} = 1|b_i, X_{ij1}, \ldots, X_{ijk} = x + 1, \ldots, X_{ijp})}{P(Y_{ij} = 0|b_i, X_{ij1}, \ldots, X_{ijk} = x + 1, \ldots, X_{ijp})}\right] = \beta_1X_{ij1} + \beta_2X_{ij2} + \ldots + \beta_k(x + 1) + \ldots + \beta_pX_{ijp} + b_i,$$

and hence $\beta_k$ measures the change in the log-odds resulting from a unit change in covariate $X_{ijk}$ while the remaining covariates are held fixed. In terms of interpretation:

- If the covariate $X_{ijk}$ varies within an individual (subject-specific, time-varying), then

$$\log\left[\frac{P(Y_{ij'} = 1|b_i, X_{ij'1}, \ldots, X_{ij'k} = x + 1, \ldots, X_{ij'p})}{P(Y_{ij'} = 0|b_i, X_{ij'1}, \ldots, X_{ij'k} = x + 1, \ldots, X_{ij'p})}\right] - \log\left[\frac{P(Y_{ij} = 1|b_i, X_{ij1}, \ldots, X_{ijk} = x, \ldots, X_{ijp})}{P(Y_{ij} = 0|b_i, X_{ij1}, \ldots, X_{ijk} = x, \ldots, X_{ijp})}\right] = \beta_k,$$

where the interpretation is straightforward, since all other covariates as well as the random effect are the same and hence cancel. Hence,

$$\log\left[\frac{P(Y_{ij'} = 1|b_i, \ldots)/P(Y_{ij'} = 0|b_i, \ldots)}{P(Y_{ij} = 1|b_i, \ldots)/P(Y_{ij} = 0|b_i, \ldots)}\right] = \log OR = \beta_k \Rightarrow OR = \exp(\beta_k)$$

is the within-subject OR.

- If the covariate $X_{ijk}$ is time invariant (between-subject), like treatment group, the interpretation becomes complicated. In this case

$$\log\left[\frac{P(Y_{ij} = 1|b_i, X_{ij1}, \ldots, X_{ijk} = 1, \ldots, X_{ijp})}{P(Y_{ij} = 0|b_i, X_{ij1}, \ldots, X_{ijk} = 1, \ldots, X_{ijp})}\right] - \log\left[\frac{P(Y_{i'j} = 1|b_{i'}, X_{i'j1}, \ldots, X_{i'jk} = 0, \ldots, X_{i'jp})}{P(Y_{i'j} = 0|b_{i'}, X_{i'j1}, \ldots, X_{i'jk} = 0, \ldots, X_{i'jp})}\right] = \beta_k + (b_i - b_{i'}),$$

and as a result the change in log-odds is confounded by $b_i - b_{i'}$. It is misleading to give this change a subject-specific interpretation. It is seen as a model-based extrapolation (no data available) and could be sensitive to various assumptions concerning the random effects.


Estimation and Inference

• The distribution of the random effects as well as the distribution of the responses are known

• As a result, the joint distribution of random effects and responses is fully specified

$$f(Y_i, b_i) = f(Y_i|b_i)f(b_i),$$

where

$$f(Y_i|b_i) = f(Y_{i1}|b_i)\, f(Y_{i2}|b_i) \cdots f(Y_{in_i}|b_i)$$

under the conditional independence assumption.

• Then, the likelihood function takes the form

$$L(\beta, \phi, G) = \prod_{i=1}^{N}\int f(Y_i|b_i)f(b_i)\,db_i,$$

where the random effects are integrated out of the likelihood, obtaining in that way a marginal likelihood averaged over the $b_i$.

• In general, there is no way to write the likelihood in closed form

• As a result, numerical integration techniques are required

Prediction of bi

• Given the MLEs of $\beta$, $\phi$ and $G$, $b_i$ can be predicted as

$$\hat{b}_i = E(b_i|Y_i; \hat{\beta}, \hat{\phi}, \hat{G})$$

• This is the empirical Bayes estimate, or BLUP, used before

• Numerical integration techniques are also required
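
The conditional expectation is a ratio of two integrals and can be sketched with the same tools. In this illustration the parameter values are simply assumed rather than estimated, and the helper name predict_b is hypothetical.

```r
# Posterior mean of b given y in a random-intercept logistic model:
# E(b | y) = integral of b f(y|b) f(b) db / integral of f(y|b) f(b) db
predict_b <- function(y, eta_fixed, sigma_b) {
  joint <- function(b) {
    vapply(b, function(bi) {
      prod(dbinom(y, size = 1, prob = plogis(eta_fixed + bi)))
    }, numeric(1)) * dnorm(b, sd = sigma_b)
  }
  num <- integrate(function(b) b * joint(b), lower = -Inf, upper = Inf)$value
  den <- integrate(joint, lower = -Inf, upper = Inf)$value
  num / den
}

eta_fixed <- rep(0, 4)   # assumed fixed part: probability 0.5 when b = 0
b_hi <- predict_b(c(1, 1, 1, 1), eta_fixed, sigma_b = 1)  # all successes
b_lo <- predict_b(c(0, 0, 0, 0), eta_fixed, sigma_b = 1)  # all failures
print(c(b_hi = b_hi, b_lo = b_lo))
```

A subject with only successes gets a positive predicted effect and a subject with only failures a negative one; both are shrunk towards zero by the normal prior on $b_i$.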


The lmer function (R: lme4 package)


lmer (lme4) R Documentation

Fit (Generalized) Linear Mixed-Effects Models

Description

Fit a linear or generalized linear mixed-effects model with nested or crossed grouping factors for the random effects.

Usage

lmer(formula, data, family, method, control, start, subset, weights, na.action, offset, contrasts, model, ...)

lmer2(formula, data, family, method, control, start, subset, weights, na.action, offset, contrasts, model, ...)

Arguments

formula a two-sided linear formula object describing the fixed-effects part of the model, with the response on the left of a ~ operator and the terms, separated by + operators, on the right. The vertical bar character "|" separates an expression for a model matrix and a grouping factor.

data an optional data frame containing the variables named in formula. By default the variables are taken from the environment from which lmer is called.

family a GLM family, see glm. If family is missing then a linear mixed model is fit; otherwise a generalized linear mixed model is fit.

method a character string. For a linear mixed model the default is "REML" indicating that the model should be fit by maximizing the restricted log-likelihood. The alternative is "ML" indicating that the log-likelihood should be maximized. (This method is sometimes called "full" maximum likelihood.) For a generalized linear mixed model the criterion is always the log-likelihood but this criterion does not have a closed form expression and must be approximated. The default approximation is "PQL" or penalized quasi-likelihood. Alternatives are "Laplace" or "AGQ" indicating the Laplacian and adaptive Gaussian quadrature approximations respectively. The "PQL" method is fastest but least accurate. The "Laplace" method is intermediate in speed and accuracy. The "AGQ" method is the most accurate but can be considerably slower than the others.

control a list of control parameters. See below for details.

start a list of relative precision matrices for the random effects. This has the same form as the slot "Omega" in a fitted model. Only the upper triangle of these symmetric matrices should be stored.

subset, weights, na.action, offset, contrasts

further model specification arguments as in lm; see there for details.

model logical indicating if the model component should be returned (in slot frame).

... potentially further arguments for methods. Currently none are used.



Example: Respiratory Data


resp_lmer1 = lmer(status ~ centre + treatment + sex + baseline + age +
                  (1 | subject), data = resp, family = "binomial")


> summary(resp_lmer1)
Generalized linear mixed model fit using Laplace
Formula: status ~ centre + treatment + sex + baseline + age + (1 | subject)
   Data: resp
 Family: binomial(logit link)
 AIC   BIC logLik deviance
 443 471.7 -214.5      429
Random effects:
 Groups  Name        Variance Std.Dev.
 subject (Intercept) 3.8402   1.9596
number of obs: 444, groups: subject, 111

Estimated scale (compare to 1 ) 0.7770601

Fixed effects:
                   Estimate Std. Error z value Pr(>|z|)
(Intercept)        -1.64382    0.75668  -2.172   0.0298 *
centre2             1.04635    0.53075   1.971   0.0487 *
treatmenttreatment  2.16087    0.51652   4.183 2.87e-05 ***
sexmale             0.20740    0.65969   0.314   0.7532
baselinegood        3.07037    0.52499   5.848 4.96e-09 ***
age                -0.02549    0.01994  -1.278   0.2012
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
            (Intr) centr2 trtmnt sexmal bslngd
centre2     -0.054
trtmnttrtmn -0.407  0.018
sexmale     -0.008 -0.151  0.222
baselinegod -0.347 -0.236  0.206  0.101
age         -0.753 -0.226 -0.015 -0.255  0.069
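The fixed-effect estimates are on the log-odds scale, conditional on the random intercept. As a short sketch, the R lines below convert the treatment estimate and standard error reported in the summary of resp_lmer1 into an odds ratio with a 95% Wald interval. The use of a Wald interval is an assumption made here for illustration, and because treatment is a between-subject covariate the resulting subject-specific odds ratio carries the interpretive caveats discussed earlier.

```r
# Estimate and standard error for treatment, copied from the summary of resp_lmer1
est_treat <- 2.16087
se_treat  <- 0.51652

# Conditional (subject-specific) odds ratio and 95% Wald interval
or_treat <- exp(est_treat)
ci_treat <- exp(est_treat + c(-1, 1) * qnorm(0.975) * se_treat)

print(round(c(OR = or_treat, lower = ci_treat[1], upper = ci_treat[2]), 2))
```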


resp_lmer2 = lmer(status ~ centre + treatment + sex + baseline + age +
                  (age | subject), data = resp, family = "binomial")


> summary(resp_lmer2)
Generalized linear mixed model fit using Laplace
Formula: status ~ centre + treatment + sex + baseline + age + (age | subject)
   Data: resp
 Family: binomial(logit link)
   AIC   BIC logLik deviance
 445.8 482.7 -213.9    427.8
Random effects:
 Groups  Name        Variance Std.Dev. Corr
 subject (Intercept) 1.964799 1.401713
         age         0.001584 0.039799 0.003
number of obs: 444, groups: subject, 111

Estimated scale (compare to 1 ) 0.7859826

Fixed effects:
                   Estimate Std. Error z value Pr(>|z|)
(Intercept)        -1.29487    0.72534  -1.785   0.0742 .
centre2             0.99755    0.50953   1.958   0.0503 .
treatmenttreatment  2.01372    0.50179   4.013 5.99e-05 ***
sexmale             0.24017    0.68883   0.349   0.7273
baselinegood        2.97704    0.51023   5.835 5.39e-09 ***
age                -0.03354    0.02107  -1.592   0.1114
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
            (Intr) centr2 trtmnt sexmal bslngd
centre2     -0.084
trtmnttrtmn -0.396  0.013
sexmale      0.053 -0.130  0.215
baselinegod -0.337 -0.226  0.217  0.076
age         -0.753 -0.173 -0.038 -0.316  0.042
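The calls in this example use the 2008 lmer interface documented in the help page shown earlier. In more recent lme4 releases, generalized linear mixed models are fitted with glmer(), and the approximation is chosen through the nAGQ argument (1 for Laplace, larger values for adaptive Gauss-Hermite quadrature with a single scalar random effect) rather than through method. The sketch below illustrates that interface on simulated stand-in data; the data frame `sim` and its parameter values are hypothetical and are not the respiratory data.

```r
# Sketch only: simulated stand-in data, not the respiratory data from the slides.
if (requireNamespace("lme4", quietly = TRUE)) {
  set.seed(1)
  n_subj <- 100
  n_obs  <- 4
  subject_id <- rep(seq_len(n_subj), each = n_obs)

  # Between-subject treatment indicator and normal random intercepts
  treat_subj <- rbinom(n_subj, size = 1, prob = 0.5)
  b_subj     <- rnorm(n_subj, mean = 0, sd = 1.5)

  sim <- data.frame(
    subject   = factor(subject_id),
    treatment = treat_subj[subject_id]
  )
  sim$status <- rbinom(nrow(sim), size = 1,
                       prob = plogis(-0.5 + 1.0 * sim$treatment + b_subj[subject_id]))

  # Laplace approximation (nAGQ = 1, the glmer default)
  fit_laplace <- lme4::glmer(status ~ treatment + (1 | subject),
                             data = sim, family = binomial, nAGQ = 1)

  # Adaptive Gauss-Hermite quadrature with 10 points
  fit_agq <- lme4::glmer(status ~ treatment + (1 | subject),
                         data = sim, family = binomial, nAGQ = 10)

  print(lme4::fixef(fit_laplace))
  print(lme4::fixef(fit_agq))
}
```

Comparing the two sets of estimates gives a rough check on whether the Laplace approximation is adequate for the data at hand.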

