Inference Methods for the Conditional Logistic Regression Model with Longitudinal Data

Arising from Animal Habitat Selection Studies

Thierry Duchesne∗¹

(Thierry.Duchesne@mat.ulaval.ca)

with Radu Craiu∗∗, Daniel Fortin∗∗∗, Sophie Baillargeon∗

∗Département de mathématiques et de statistique, Université Laval
∗∗Department of Statistics, University of Toronto
∗∗∗Département de biologie, Université Laval

Department of Statistics Seminar, University of Manitoba

October 28, 2010

¹Research funded by NSERC.

Outline Introduction Conditional logistic regression GEE Mixed model Conclusion Ref’s

Outline

1 Introduction: Research objectives, Sampling designs, Data available, Methodological objectives

2 Conditional logistic regression: Model and notation, Justification of conditional logistic regression

3 Population-averaged inference: Method, Example of application

4 Subject-specific inference: Method, Example of application

5 Conclusion

6 References


Research objectives

Objectives of our research

Ecological objectives: For biologists, it is important to understand the links between various attributes of a landscape and how animals select their habitat (or move within their home range).

Statistical objectives:
What are the “appropriate” sampling designs?
What are the possible statistical models?
How do we make inference on the model parameters?


Sampling designs

Possible study designs

Unmatched used vs unused (or available) designs: Useful to determine which landscape attributes predict whether a location is likely to be used or not over a specified time frame (e.g., trees with nests vs trees without nests).

Usually analyzed with logistic regression ($Y_i = 1$ if location $i$ is used, $Y_i = 0$ otherwise). To be used with care since, in some contexts, available ≠ unused.

If the sampling unit is the animal (with many used locations per animal), then within-animal correlation must be taken into consideration ⇒ GEE (population-averaged) or mixed models (subject-specific) are used. Again, care must be exercised w.r.t. the available/unused locations.


Sampling designs

Possible study designs

Matched designs: For each location used (or step traveled) by an animal, a fixed number of unused locations that could have been visited by the same animal at the same time are sampled.

The dataset comprises several such matched strata for each animal.

Does not allow inference on the absolute probability of use of a precise location, but does allow inference on the probability of choosing a given location among a set of locations when the location attributes are given.


Sampling designs

Matched design

E.g., each location is matched with 10 locations picked at random among those that could have been used at the same time.

Step selection functions: Fortin et al. 2005, Ecology 86(5): 1320–1330.


Data available

Part I: Data on the available locations

We have a detailed GIS database of Prince Albert National Park.


Data available

Part II: Animal location data

For each of $K$ animals (female bison), GPS collars give their precise location at a large number of equally spaced time steps.


Methodological objectives

Our precise statistical problems

In some cases, we can get more than one $Y = 1$ in a stratum: e.g., a pair of animals traveling together.

How do we make inferences on the preferences of the animals for given landscape attributes under such a sampling design?

We will see that this can be done if we can come up with a “longitudinal” version of conditional logistic regression.


Model and notation

Notation

Animals: $c = 1, 2, \ldots, K$; strata: $j = 1, 2, \ldots, S_c$; locations: $i = 1, 2, \ldots, n$.

Response variable: $y^{(c)}_{ji} = 1$ if animal $c$ was at location $i$ in the $j$-th stratum, 0 otherwise.

Covariates: values of the landscape attributes at location $i$ in stratum $j$ of animal $c$: $x^{(c)}_{ji} = (x^{(c)}_{ji1}, \ldots, x^{(c)}_{jip})^\top$.

Sampling design: by design, it is known before sampling that $\sum_{i=1}^{n} y^{(c)}_{ji} = m$ for all $j, c$.


Model and notation

“Prospective” model

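If we sampled locations without knowing the values of the $y^{(c)}_{ji}$ in advance (i.e., a prospective study), we could link the landscape attributes $x^{(c)}_{ji}$ with $y^{(c)}_{ji}$ using logistic regression-type models.

E.g., given i.i.d. $N(0, \Sigma)$ vectors of animal-level random effects, say $b_c$, and the covariates, it is assumed that the $y^{(c)}_{ji}$ are independent with

$$\Pr\left[y^{(c)}_{ji} = 1 \,\middle|\, b_c, x^{(c)}_{ji}\right] = \frac{\exp\left(\beta^\top x^{(c)}_{ji} + b_c^\top z^{(c)}_{ji}\right)}{1 + \exp\left(\beta^\top x^{(c)}_{ji} + b_c^\top z^{(c)}_{ji}\right)},$$

where $z^{(c)}_{ji}$ holds the covariates whose coefficients vary between animals.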


Model and notation

Resource selection function

The exponential of the linear predictor is sometimes called the resource selection function (RSF). Maps of its value can help to assess animal preferences.
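As a toy illustration, the sketch below evaluates the RSF $w(x) = \exp(\beta^\top x)$ on a small raster, together with the prospective probability it comes from. Everything here is hypothetical: the 3 × 4 grid, the two covariates per cell, the coefficients, and the function names `rsf` and `prospective_prob`. Random effects are left out, so this is a population-level map.

```python
import numpy as np

def rsf(beta, X):
    """Resource selection function w(x) = exp(beta^T x).

    beta: (p,) coefficients; X: (..., p) covariates, one row per location.
    """
    return np.exp(X @ beta)

def prospective_prob(beta, X, b=None, Z=None):
    """Prospective model: Pr[y = 1 | b, x] = exp(eta) / (1 + exp(eta)),
    with eta = beta^T x + b^T z (the random-effect term is optional)."""
    eta = X @ beta
    if b is not None:
        eta = eta + Z @ b
    return 1.0 / (1.0 + np.exp(-eta))

# Hypothetical 3 x 4 raster with p = 2 landscape covariates per cell
rng = np.random.default_rng(1)
X_map = rng.normal(size=(3, 4, 2))
beta = np.array([0.8, -0.5])      # hypothetical coefficients

w_map = rsf(beta, X_map)          # one RSF value per cell
w_rel = w_map / w_map.sum()       # rescaled so the map sums to 1
print(np.round(w_rel, 3))
```

The map is rescaled because, under the matched designs used here, any intercept cancels out of the conditional likelihood, so the RSF is only identified up to a multiplicative constant and only ratios of its values carry information.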


Justification of conditional logistic regression

“Retrospective” model

When location $i$ in stratum $j$ of animal $c$ is sampled on the basis of its $y^{(c)}_{ji}$ value, how can we make inference about $\beta$ (and possibly $\Sigma$) in the prospective model?

Using arguments based on conditional likelihood (e.g., Hosmer & Lemeshow 2000), on discrete choice theory (e.g., Manly et al. 2002, Train 2003) or on movement kernels (e.g., Forester et al. 2009), we find that a good way to deal with the retrospective design is conditional logistic regression.


Justification of conditional logistic regression

Conditional likelihood

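If we suppose that $b_c^\top z^{(c)}_{ji} = b_c$ (i.e., a random intercept only) in the prospective model, then we get that

$$\Pr\left[y^{(c)}_{ji},\ i = 1, \ldots, n \,\middle|\, b_c,\ \sum_{i=1}^{n} y^{(c)}_{ji} = m,\ x^{(c)}_{ji},\ i = 1, \ldots, n\right] = \frac{\exp\left(\sum_{i=1}^{n} \beta^\top x^{(c)}_{ji} y^{(c)}_{ji}\right)}{\sum_{\ell=1}^{\binom{n}{m}} \exp\left(\sum_{i=1}^{n} \beta^\top x^{(c)}_{ji} v_{\ell i}\right)},$$

where the sum in the denominator is over all vectors $v_\ell = (v_{\ell 1}, \ldots, v_{\ell n})$ of zeros and ones whose elements sum to $m$. Note that $b_c$ cancels out: this probability does not depend on the random intercept.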
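The conditional probability above can be checked by brute force on a single stratum. In this sketch (hypothetical function name, made-up covariates, $n = 5$, $m = 2$), the denominator enumerates the $\binom{n}{m}$ index sets directly, which is only practical for small strata. The last lines confirm numerically that a random intercept adds the same constant to every linear predictor and therefore cancels.

```python
from itertools import combinations
import numpy as np

def clogit_stratum_prob(beta, X, y):
    """Pr[y | sum(y) = m, x] in one stratum, enumerating all C(n, m)
    0/1 vectors v_l with m ones (brute force, small n only)."""
    X = np.asarray(X, dtype=float)              # (n, p) covariates
    y = np.asarray(y)
    n, m = len(y), int(y.sum())
    eta = X @ beta                              # beta^T x_i for each location
    numerator = np.exp(eta @ y)                 # exp(sum_i beta^T x_i y_i)
    denominator = sum(np.exp(eta[list(idx)].sum())
                      for idx in combinations(range(n), m))
    return numerator / denominator

# Hypothetical stratum: n = 5 locations, m = 2 used, p = 2 covariates
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))
y = np.array([1, 0, 1, 0, 0])
beta = np.array([0.7, -0.3])
prob = clogit_stratum_prob(beta, X, y)

# The probabilities of all C(5, 2) = 10 admissible configurations sum to 1
total = 0.0
for idx in combinations(range(5), 2):
    v = np.zeros(5, dtype=int)
    v[list(idx)] = 1
    total += clogit_stratum_prob(beta, X, v)

# A random intercept b_c shifts every linear predictor by the same amount
# (here 2.5, via a column of ones), so it cancels from the probability
prob_shifted = clogit_stratum_prob(np.append(beta, 2.5),
                                   np.column_stack([X, np.ones(5)]), y)
print(prob, total, prob_shifted)
```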


Justification of conditional logistic regression

Exponential movement kernels (Forester et al 2009)

Suppose the animal is at location $a$ at time step $t$.

All locations in the set $D_a$ are reachable by the animal until time step $t + 1$.

Assume that the density of movement from a point $a$ to a point $b$ in a homogeneous baseline landscape over one time step is given by $\phi(d_{ab})$, where $d_{ab}$ is the distance between $a$ and $b$.

Suppose that habitat characteristics have a log-linear effect on the movement kernel. Then

$$f(b \mid a, x_s, s \in D_a) = \frac{\phi(d_{ab}) \exp(\beta^\top x_b)}{\int_{s \in D_a} \phi(d_{as}) \exp(\beta^\top x_s)\, ds}.$$


Justification of conditional logistic regression

Exponential movement kernels (Forester et al 2009)

The integral in the denominator can be replaced by an approximating sum. Forester et al. (2009) show that if a sample $S_a$ comprising $b$ and $n - 1$ other locations in $D_a$ is “appropriately” sampled,

$$f(b \mid a, x_s, s \in D_a) = \frac{\phi(d_{ab}) \exp(\beta^\top x_b)}{\int_{s \in D_a} \phi(d_{as}) \exp(\beta^\top x_s)\, ds} \approx \frac{\exp(\beta^\top x_b)}{\sum_{\ell \in S_a} \exp(\beta^\top x_\ell)},$$

which is the conditional logistic regression probability when $m = 1$ and the location with $y = 1$ is $b$.
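One way to see why the sum can replace the integral, assuming (our reading of “appropriately”, not a quote from Forester et al. 2009) that the $n - 1$ other locations are drawn with probability proportional to $\phi(d_{as})$: the average of $\exp(\beta^\top x_\ell)$ over such draws estimates the $\phi$-weighted average of $\exp(\beta^\top x_s)$ over $D_a$, which is the denominator up to a factor free of $\beta$, and $\phi(d_{ab})$ in the numerator does not involve $\beta$ either. The sketch below checks this numerically on a hypothetical discretized $D_a$ with an exponential kernel $\phi(d) = \exp(-d/\lambda)$; the grid, kernel scale, covariates and coefficients are all made up.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical discretized D_a: 10 m grid cells within 500 m of a = (0, 0)
g = np.arange(-500.0, 501.0, 10.0)
xx, yy = np.meshgrid(g, g)
d = np.hypot(xx, yy).ravel()
d = d[d <= 500.0]                        # distances d_as for cells s in D_a
n_cells = d.size

x = rng.normal(size=(n_cells, 2))        # hypothetical habitat covariates x_s
beta = np.array([0.5, -0.25])
lam = 100.0                              # hypothetical kernel scale (m)
phi = np.exp(-d / lam)                   # exponential movement kernel phi(d_as)
w = np.exp(x @ beta)                     # exp(beta^T x_s)

# phi-weighted average of exp(beta^T x_s) over D_a: the denominator
# integral divided by a normalizing constant that does not involve beta
exact = np.sum(phi * w) / np.sum(phi)

# Monte Carlo version: control cells drawn with probability proportional to phi
q = phi / phi.sum()
draws = rng.choice(n_cells, size=200_000, p=q)
approx = w[draws].mean()

rel_error = abs(approx - exact) / exact
print(exact, approx, rel_error)
```

With only 10 controls per stratum, as in the bison examples, each sum is a much noisier approximation; the large number of draws here only serves to make the check visible.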


Method

Data and assumptions

Now back to the general problem:

$K$ animals, $S_c$ strata observed for animal $c$, $m$ “cases” (locations with $y = 1$) and $n - m$ “controls” (locations with $y = 0$) in each stratum.

We want to make population-averaged inference about $\beta$ in the prospective model.

It is assumed that the data can be partitioned into uncorrelated clusters (data from different animals are uncorrelated, or clusters of observations on the same animal are taken several time units apart).


Method

Craiu et al (2008)

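We showed that the likelihood score function of the retrospective model can be rewritten as

$$U(\beta) = \sum_{c=1}^{K} \sum_{j=1}^{S_c} \sum_{i=2}^{n} x^{(c)*}_{ji} \left\{ y^{(c)}_{ji} - \frac{\sum_{\ell=1}^{\binom{n}{m}} v_{\ell i} \exp\left(\sum_{h=2}^{n} \beta^\top x^{(c)*}_{jh} v_{\ell h}\right)}{\sum_{\ell=1}^{\binom{n}{m}} \exp\left(\sum_{h=2}^{n} \beta^\top x^{(c)*}_{jh} v_{\ell h}\right)} \right\} = \sum_{c=1}^{K} D^{(c)\top} \left(V^{(c)}_{\mathrm{Indep}}\right)^{-1} \left\{ Y^{(c)} - \mu(\beta) \right\},$$

where $x^{(c)*}_{ji} = x^{(c)}_{ji} - x^{(c)}_{j1}$, $Y^{(c)}$ is the vector of all responses of cluster $c$ but without the $y^{(c)}_{j1}$'s, and $\mu(\beta) = E_{\mathrm{Retro}}[Y^{(c)} \mid X^{(c)}]$.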
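To make the centered form concrete, the sketch below codes the score contribution of a single stratum as written in the display above, with $x^* = x - x_1$ and sums over $i, h \geq 2$. It then checks the result against a finite-difference derivative of the log conditional likelihood. The function names and data are hypothetical, and enumerating the $\binom{n}{m}$ vectors is only practical for small strata.

```python
from itertools import combinations
import numpy as np

def loglik_stratum(beta, X, y):
    """Log conditional probability of y (with sum(y) = m) in one stratum."""
    n, m = len(y), int(y.sum())
    eta = X @ beta
    terms = np.array([eta[list(idx)].sum() for idx in combinations(range(n), m)])
    top = terms.max()
    return eta @ y - (top + np.log(np.exp(terms - top).sum()))

def score_stratum(beta, X, y):
    """Score of one stratum in the centered form: x*_i = x_i - x_1,
    with sums over i, h = 2..n only."""
    n, m = len(y), int(y.sum())
    Xs = X - X[0]                                   # x*_i; the first row becomes 0
    combos = list(combinations(range(n), m))
    V = np.zeros((len(combos), n))
    for k, idx in enumerate(combos):
        V[k, list(idx)] = 1.0                       # the 0/1 vectors v_l
    lin = V[:, 1:] @ (Xs[1:] @ beta)                # sum_{h>=2} beta^T x*_h v_lh
    wts = np.exp(lin - lin.max())
    wts /= wts.sum()                                # conditional probability of each v_l
    mu = wts @ V[:, 1:]                             # E_Retro[y_i], i = 2..n
    return Xs[1:].T @ (y[1:] - mu)

# Hypothetical stratum: n = 6, m = 2, p = 3
rng = np.random.default_rng(3)
X = rng.normal(size=(6, 3))
y = np.array([0, 1, 0, 0, 1, 0])
beta = np.array([0.4, -0.2, 0.1])

U = score_stratum(beta, X, y)
eps = 1e-6
num = np.array([(loglik_stratum(beta + eps * e, X, y)
                 - loglik_stratum(beta - eps * e, X, y)) / (2 * eps)
                for e in np.eye(3)])
print(U, num)
```

The centering works because $\sum_i (y_i - \mu_i) = m - m = 0$ within every stratum, so the $x_1$ terms drop out of both the score and the choice probabilities.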


Method

Advantages

With the robust (sandwich) estimate of $\mathrm{Var}(\hat\beta)$, inferences about $\beta$ are valid no matter what the correlation structure within clusters is ... as long as data are uncorrelated between clusters.

$U(\beta)$ is the partial likelihood score for the Cox model for discrete data ⇒ PROC PHREG or coxph() can be used to apply the method.

Simulations have shown that inferences are good in finite samples:


Method

Simulation results, Craiu et al (2008, Table 1)
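For the $m = 1$ case, the GEE recipe above can be sketched without coxph() or PROC PHREG. The sketch below hand-codes a Newton-Raphson fit of the conditional logistic model under the independence working assumption, then forms the sandwich $A^{-1} B A^{-1}$, where $A$ is the observed information and $B$ sums the outer products of cluster-level score totals. The function names, the simulated data and the animal-level deviations `b` (added only to create within-cluster correlation) are hypothetical. With such deviations, the fit targets a population-averaged $\beta$, which is close to, but not exactly, `beta_true`.

```python
import numpy as np

def _score_info(beta, X, case):
    """Per-stratum scores and total observed information for m = 1."""
    N = X.shape[0]
    eta = X @ beta                                     # (N, n) linear predictors
    w = np.exp(eta - eta.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                  # choice probabilities per stratum
    xbar = np.einsum('sn,snp->sp', w, X)               # model-expected covariates
    scores = X[np.arange(N), case] - xbar              # (N, p) score of each stratum
    dx = X - xbar[:, None, :]
    info = np.einsum('sn,snp,snq->pq', w, dx, dx)      # sum of within-stratum covariances
    return scores, info

def fit_clogit_m1(X, case, cluster, max_iter=50, tol=1e-10):
    """Newton-Raphson fit under the independence working assumption,
    with model-based and cluster-robust (sandwich) variances."""
    beta = np.zeros(X.shape[2])
    for _ in range(max_iter):
        scores, info = _score_info(beta, X, case)
        step = np.linalg.solve(info, scores.sum(axis=0))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    scores, info = _score_info(beta, X, case)
    A_inv = np.linalg.inv(info)                        # "bread"
    U_c = np.array([scores[cluster == c].sum(axis=0)   # cluster-level score totals
                    for c in np.unique(cluster)])
    B = U_c.T @ U_c                                    # "meat"
    return beta, A_inv, A_inv @ B @ A_inv

# Hypothetical simulation: K clusters of S strata, n = 11 locations, m = 1
rng = np.random.default_rng(7)
K, S, n, p = 60, 40, 11, 2
beta_true = np.array([1.0, -0.5])
X = rng.normal(size=(K * S, n, p))
cluster = np.repeat(np.arange(K), S)
b = rng.normal(scale=0.3, size=(K, p))                 # induces within-cluster correlation
eta = np.einsum('snp,sp->sn', X, beta_true + b[cluster])
prob = np.exp(eta - eta.max(axis=1, keepdims=True))
prob /= prob.sum(axis=1, keepdims=True)
case = np.array([rng.choice(n, p=row) for row in prob])

beta_hat, var_model, var_robust = fit_clogit_m1(X, case, cluster)
print(beta_hat, np.sqrt(np.diag(var_model)), np.sqrt(np.diag(var_robust)))
```

Comparing the model-based and robust standard errors printed at the end gives a quick sense of how much the within-cluster correlation matters.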


Method

Disadvantages

Inference on the parameters of the working correlation matrix is not possible ⇒ we must use the independence working assumption.

Though better than AIC, the QIC(I) model selection criterion did not perform very well in simulations:


Method

Simulation results


Example of application

Application to female bison in Prince Albert

8 female bison with 14 clusters of 48 locations each, and 1 female with 9 clusters, all followed between 2 Sept. 2005 and 2 Dec. 2005.

Each observed location was matched to 10 locations picked at random in a 300 m buffer (so $K = 8 \times 14 + 1 \times 9 = 121$, $S = 48$, $m = 1$, $n = 11$).

$x$: 6 dummy variables to quantify the seven-level habitat class categorical variable (deciduous stands = baseline level).
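A minimal sketch of drawing the 10 matched controls at random within a 300 m buffer. We assume here, since the slides do not say, that the buffer is a disc centered on a given point in projected metric coordinates and that "at random" means uniformly over its area; the function name and coordinates are hypothetical.

```python
import numpy as np

def sample_controls(center, n_controls=10, radius=300.0, rng=None):
    """Draw control locations uniformly over a disc of the given radius (m).

    Taking r = radius * sqrt(U) (rather than radius * U) makes the points
    uniform over the disc's area instead of clustering near the center.
    """
    rng = np.random.default_rng() if rng is None else rng
    r = radius * np.sqrt(rng.uniform(size=n_controls))
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n_controls)
    offsets = np.column_stack([r * np.cos(theta), r * np.sin(theta)])
    return np.asarray(center, dtype=float) + offsets

rng = np.random.default_rng(2005)
used = np.array([512_300.0, 5_950_100.0])   # hypothetical projected coordinates (m)
controls = sample_controls(used, rng=rng)
stratum = np.vstack([used, controls])        # n = 11 locations: 1 case + 10 controls
```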


Example of application

Model fit


Method

Conditional inference

Sometimes, subject-specific inferences are required. Can we estimate $\beta$ and $\Sigma$ from the mixed-effects prospective model under the retrospective sampling design?

This has already been done in some special cases:
Family studies of genetic diseases (special case $S = 1$)
Mixed multinomial logit discrete choice model (special case $m = 1$)


Method

Likelihood for the general case

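Craiu et al. (2011) obtain the following likelihood in the general case:

$$L(\beta, \Sigma) = \prod_{c=1}^{K} \frac{\exp\left(\sum_{s,i} y^{(c)}_{si} \beta^\top x^{(c)}_{si}\right) \int \exp\left(\sum_{s,i} y^{(c)}_{si} b^\top z^{(c)}_{si}\right) d^{(c)}(\beta, b)\, dF(b; \Sigma)}{\int d^{(c)}(\beta, b) \prod_{s} \sum_{\ell \in L^{(c)}_s} \exp\left\{\sum_i v^{(c)}_{\ell s i} \left(\beta^\top x^{(c)}_{si} + b^\top z^{(c)}_{si}\right)\right\} dF(b; \Sigma)},$$

where $d^{(c)}(\beta, b) = \prod_s \prod_i \left\{1 + \exp\left(\beta^\top x^{(c)}_{si} + b^\top z^{(c)}_{si}\right)\right\}^{-1}$ and $L^{(c)}_s$ indexes the 0/1 vectors $v^{(c)}_{\ell s}$ whose elements sum to $m$.

How do you maximize this thing?!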


Method

Maximization of the likelihood

Family studies (Pfeiffer et al. 2001): evaluate the integrals by the Monte Carlo method, then maximize using a hybrid of Newton-type methods for $\beta$ and grid search for the elements of $\Sigma$.

Mixed multinomial logit (Bhat 2001): quasi-Monte Carlo evaluation of the integrals, Newton-type methods to maximize.

Craiu et al. (2011), first attempt: quasi-Monte Carlo evaluation of the integrals, Newton-type methods to maximize.

With small $K$ and large $S$, these methods are painfully slow and unstable!
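To show what evaluating these integrals by Monte Carlo involves, the sketch below estimates one cluster's log contribution to $L(\beta, \Sigma)$ with plain Monte Carlo draws $b = L_\Sigma e$, $e \sim N(0, I)$, where $L_\Sigma$ is a square-root factor of $\Sigma$. It works on the log scale because $d^{(c)}(\beta, b)$ is a product over every location of the cluster and underflows quickly when $S$ is large. This is a generic sketch with our own (hypothetical) function names, not the quasi-Monte Carlo schemes of Bhat (2001) or Craiu et al. (2011). As a check, both $\Sigma = 0$ and a random intercept alone must reproduce the ordinary conditional logistic log-likelihood.

```python
from itertools import combinations
import numpy as np

def logsumexp(a, axis=None):
    """Numerically stable log(sum(exp(a))) along an axis."""
    top = np.max(a, axis=axis, keepdims=True)
    out = top + np.log(np.sum(np.exp(a - top), axis=axis, keepdims=True))
    return np.squeeze(out, axis=axis)

def mc_loglik_cluster(beta, L_sigma, X, Z, Y, n_draws=2000, rng=None):
    """Plain Monte Carlo estimate of log L_c(beta, Sigma) for one cluster.

    X: (S, n, p) and Z: (S, n, q) covariates; Y: (S, n) 0/1 responses with
    every row summing to the same m; L_sigma: (q, q), Sigma = L_sigma L_sigma^T.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = Y.shape[1]
    m = int(Y[0].sum())
    combos = np.array(list(combinations(range(n), m)))         # (C, m) index sets
    b = rng.standard_normal((n_draws, Z.shape[2])) @ L_sigma.T  # (R, q) draws of b
    fixed = X @ beta                                            # (S, n) beta^T x
    rand = np.einsum('snq,rq->rsn', Z, b)                       # (R, S, n) b^T z
    eta = fixed[None] + rand
    log_d = -np.logaddexp(0.0, eta).sum(axis=(1, 2))            # log d(beta, b), (R,)
    # Numerator: exp(sum y beta^T x) * mean_r[exp(sum y b_r^T z) d(beta, b_r)]
    log_num = ((Y * fixed).sum()
               + logsumexp((Y[None] * rand).sum(axis=(1, 2)) + log_d) - np.log(n_draws))
    # Denominator: mean_r[d(beta, b_r) * prod_s sum_l exp(sum_i v_lsi eta_rsi)]
    per_stratum = logsumexp(eta[:, :, combos].sum(axis=3), axis=2)   # (R, S)
    log_den = logsumexp(log_d + per_stratum.sum(axis=1)) - np.log(n_draws)
    return log_num - log_den

def clogit_loglik(beta, X, Y):
    """Fixed-effects conditional logistic log-likelihood, for comparison."""
    n = Y.shape[1]
    total = 0.0
    for Xs, ys in zip(X, Y):
        eta = Xs @ beta
        terms = np.array([eta[list(c)].sum()
                          for c in combinations(range(n), int(ys.sum()))])
        total += eta @ ys - logsumexp(terms)
    return total

# Hypothetical cluster: S = 8 strata, n = 5 locations, m = 2 cases, p = 2
rng = np.random.default_rng(11)
S, n, p = 8, 5, 2
X = rng.normal(size=(S, n, p))
Y = np.zeros((S, n))
Y[:, :2] = 1.0
beta = np.array([0.6, -0.4])

ll_clogit = clogit_loglik(beta, X, Y)
ll_zero = mc_loglik_cluster(beta, np.zeros((p, p)), X, X, Y, rng=rng)    # Sigma = 0
ll_slopes = mc_loglik_cluster(beta, 0.5 * np.eye(p), X, X, Y, rng=rng)   # random slopes, Z = X
ll_intercept = mc_loglik_cluster(beta, np.array([[1.0]]), X,
                                 np.ones((S, n, 1)), Y, rng=rng)         # random intercept only
print(ll_clogit, ll_zero, ll_intercept, ll_slopes)
```

Even this toy version needs $R \times S \times \binom{n}{m}$ terms per likelihood evaluation, and an optimizer must repeat it many times, which illustrates why direct maximization becomes slow for long follow-up.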


Method

Two-step algorithm, Craiu et al (2011)

Inspired by earlier work for GLMMs, we derived a two-step method that is numerically fast and stable and that yields estimators of $\beta$ and $\Sigma$ with good properties:

Step 1: Separately for each cluster $c$, use traditional maximum likelihood for independent data (e.g., coxph()) to get $\hat\beta_c$ and an estimate $R_c$ of its variance $\mathrm{Var}(\hat\beta_c)$.

Step 2: Since the clusters are independent, so are the $\hat\beta_c$; since the clusters are large, $\hat\beta_c \approx N(\beta + b_c, R_c)$ given $b_c$. Thus we can use linear mixed model theory and REML estimation to combine these estimates and obtain estimates of $\beta$ and $\Sigma$.
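Step 1 can be sketched as below for the $m = 1$ case, with a hand-written Newton-Raphson fit standing in for coxph() (an assumption made to keep the example self-contained). The simulated clusters, each with its own hypothetical coefficient vector $\beta + b_c$, are invented for illustration, as is the function name.

```python
import numpy as np

def clogit_m1_mle(X, case, max_iter=50, tol=1e-10):
    """Step 1 for one cluster (m = 1): Newton-Raphson ML fit of conditional
    logistic regression. Returns beta_hat_c and R_c, the inverse observed
    information evaluated at the last Newton iterate."""
    N, n, p = X.shape
    beta = np.zeros(p)
    for _ in range(max_iter):
        eta = X @ beta
        w = np.exp(eta - eta.max(axis=1, keepdims=True))
        w /= w.sum(axis=1, keepdims=True)                # choice probabilities
        xbar = np.einsum('sn,snp->sp', w, X)
        dx = X - xbar[:, None, :]
        info = np.einsum('sn,snp,snq->pq', w, dx, dx)    # observed information
        score = (X[np.arange(N), case] - xbar).sum(axis=0)
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta, np.linalg.inv(info)

# Hypothetical data: K large clusters, each with its own coefficients beta + b_c
rng = np.random.default_rng(5)
K, S, n, p = 30, 200, 11, 2
beta = np.array([1.0, -0.5])
beta_hat, R, beta_c_true = [], [], []
for c in range(K):
    beta_c = beta + rng.normal(scale=0.4, size=p)
    X = rng.normal(size=(S, n, p))
    eta = X @ beta_c
    prob = np.exp(eta - eta.max(axis=1, keepdims=True))
    prob /= prob.sum(axis=1, keepdims=True)
    case = np.array([rng.choice(n, p=row) for row in prob])
    b_hat, R_c = clogit_m1_mle(X, case)          # Step 1, cluster by cluster
    beta_hat.append(b_hat)
    R.append(R_c)
    beta_c_true.append(beta_c)
beta_hat, R, beta_c_true = map(np.array, (beta_hat, R, beta_c_true))
```

The resulting pairs $(\hat\beta_c, R_c)$ are the inputs that Step 2 combines.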


Method

Second step: REML with EM

Easy to implement and to program ... but difficult to explain due to extremely heavy notation! But in a nutshell:

Stack the estimates $\hat\beta_1, \ldots, \hat\beta_K$ in a vector $U$ and their variance estimates in a block diagonal matrix $R = \mathrm{diag}(R_1, \ldots, R_K)$.

Stack the vectors of random effects $b_1, \ldots, b_K$ in a vector $\phi$ and their variances in a block diagonal matrix $\tilde\Sigma = \mathrm{diag}(\Sigma, \ldots, \Sigma)$.

Define $W_1 = 1_K \otimes I_p$ and $W_2 = I_{Kp}$.

Consider the linear mixed model

$$U = W_1 \beta + W_2 \phi + \varepsilon,$$

where $\varepsilon \sim N(0, R)$, $\phi \sim N(0, \tilde\Sigma)$ and $\phi \perp \varepsilon$.
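The construction can be written out directly. The sketch below (hypothetical function names, made-up inputs) stacks Step-1 output into $U$, $W_1$, $W_2$, $R$ and $\tilde\Sigma$, forms the marginal covariance $V = W_2 \tilde\Sigma W_2^\top + R$ implied by the model, and computes the generalized least squares estimate of $\beta$ for a fixed $\Sigma$.

```python
import numpy as np

def build_lmm(beta_hats, R_blocks, Sigma):
    """Stack Step-1 output into U = W1 beta + W2 phi + eps.

    beta_hats: (K, p); R_blocks: (K, p, p); Sigma: (p, p).
    Returns U, W1, W2 and the marginal covariance V = W2 Sigma_tilde W2^T + R.
    """
    K, p = beta_hats.shape
    U = beta_hats.reshape(K * p)                       # stacked estimates
    W1 = np.kron(np.ones((K, 1)), np.eye(p))           # 1_K (x) I_p
    W2 = np.eye(K * p)                                 # I_{Kp}
    R = np.zeros((K * p, K * p))
    for c in range(K):
        R[c*p:(c+1)*p, c*p:(c+1)*p] = R_blocks[c]       # diag(R_1, ..., R_K)
    Sigma_tilde = np.kron(np.eye(K), Sigma)            # diag(Sigma, ..., Sigma)
    V = W2 @ Sigma_tilde @ W2.T + R
    return U, W1, W2, V

def gls_beta(U, W1, V):
    """Generalized least squares estimate of beta for a given Sigma."""
    Vinv_W1 = np.linalg.solve(V, W1)                   # V^{-1} W1
    return np.linalg.solve(W1.T @ Vinv_W1, Vinv_W1.T @ U)

rng = np.random.default_rng(8)
K, p = 5, 2
beta_hats = rng.normal(loc=[1.0, -0.5], scale=0.3, size=(K, p))
R_blocks = np.array([0.02 * np.eye(p)] * K)            # hypothetical equal variances
Sigma = np.array([[0.09, 0.01], [0.01, 0.04]])         # hypothetical Sigma
U, W1, W2, V = build_lmm(beta_hats, R_blocks, Sigma)
beta_gls = gls_beta(U, W1, V)
# With identical R_c every cluster gets the same weight, so GLS reduces to the mean
print(beta_gls, beta_hats.mean(axis=0))
```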


Method

Second step: REML with EM

$\beta$ and $\Sigma$ in this linear mixed model can be estimated by maximum likelihood (ML) or by restricted maximum likelihood (REML).

We first tried ML, but the variances were underestimated and $\beta$ was biased.

We used the EM algorithm (both E and M steps in closed form for a few specifications of the structure of $\Sigma$) to implement REML ⇒ numerically quick and stable, with estimators that are quite good in terms of bias, and even in terms of efficiency.

An R package (TwoStepClogit) implementing this method should be available on CRAN in the Spring!
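For readers who want to see E and M steps in closed form, here is a minimal sketch of one standard EM scheme for REML with an unstructured $\Sigma$ in the linear mixed model above, treating the $R_c$ as known. It is our own illustration, not the TwoStepClogit code, and the algorithm of Craiu et al. (2011) may differ in its details. The E-step uses the moments of the random effects given $U$ with $\beta$ integrated out, $E[\phi \mid U] = \tilde\Sigma P U$ and $\mathrm{Cov}[\phi \mid U] = \tilde\Sigma - \tilde\Sigma P \tilde\Sigma$ with $P = V^{-1} - V^{-1} W_1 (W_1^\top V^{-1} W_1)^{-1} W_1^\top V^{-1}$; the M-step averages the implied second moments of the $b_c$.

```python
import numpy as np

def reml_em(beta_hats, R_blocks, max_iter=5000, tol=1e-10):
    """REML estimates of beta and an unstructured Sigma in the model
    U = W1 beta + W2 phi + eps with W2 = I, treating the R_c as known."""
    K, p = beta_hats.shape
    U = beta_hats.reshape(K * p)
    W1 = np.kron(np.ones((K, 1)), np.eye(p))               # 1_K (x) I_p
    R = np.zeros((K * p, K * p))
    for c in range(K):
        R[c*p:(c+1)*p, c*p:(c+1)*p] = R_blocks[c]           # diag(R_1, ..., R_K)

    def gls_and_projection(Sigma):
        St = np.kron(np.eye(K), Sigma)                      # Sigma_tilde
        Vinv = np.linalg.inv(St + R)                        # V = St + R since W2 = I
        A = W1.T @ Vinv @ W1
        beta = np.linalg.solve(A, W1.T @ Vinv @ U)          # GLS estimate of beta
        P = Vinv - Vinv @ W1 @ np.linalg.solve(A, W1.T @ Vinv)
        return St, P, beta

    Sigma = np.atleast_2d(np.cov(beta_hats, rowvar=False))  # crude starting value
    for _ in range(max_iter):
        St, P, _ = gls_and_projection(Sigma)
        # E-step (beta integrated out): E[phi | U] and Cov[phi | U]
        phi_mean = (St @ P @ U).reshape(K, p)
        phi_cov = St - St @ P @ St
        # M-step: average second moment of the b_c
        new = sum(np.outer(phi_mean[c], phi_mean[c])
                  + phi_cov[c*p:(c+1)*p, c*p:(c+1)*p] for c in range(K)) / K
        converged = np.max(np.abs(new - Sigma)) < tol
        Sigma = new
        if converged:
            break
    _, _, beta = gls_and_projection(Sigma)
    return beta, Sigma

# Balanced p = 1 check: REML gives beta_hat = mean and Sigma_hat = s^2 - r (if positive)
rng = np.random.default_rng(10)
u = rng.normal(loc=2.0, scale=1.0, size=(10, 1))
beta1, Sigma1 = reml_em(u, np.full((10, 1, 1), 0.1))
print(beta1, Sigma1, np.var(u, ddof=1) - 0.1)

# Hypothetical p = 2 example with cluster-specific R_c
K, p = 40, 2
Sigma_true = np.array([[0.16, 0.04], [0.04, 0.09]])
b = rng.multivariate_normal(np.zeros(p), Sigma_true, size=K)
R_blocks = np.array([np.diag(rng.uniform(0.01, 0.05, size=p)) for _ in range(K)])
eps = np.array([rng.multivariate_normal(np.zeros(p), Rc) for Rc in R_blocks])
beta_hats = np.array([1.0, -0.5]) + b + eps
beta2, Sigma2 = reml_em(beta_hats, R_blocks)
print(beta2, Sigma2)
```

In the balanced $p = 1$ case the REML answer is known in closed form ($\hat\sigma^2 = s^2 - r$ when positive, with $s^2$ the sample variance of the $\hat\beta_c$), which gives a quick check of the iterations. Each M-step averages positive semidefinite matrices, so the $\Sigma$ iterates stay positive semidefinite.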


Method

Simulation results, Craiu et al (2011, Fig. 1)


Example of application

Application to female bison in Prince Albert

20 pairs of female bison followed between 15 Nov. and 15 April (2005, 2006, 2007).

Each pair of observed locations was matched to 10 locations picked at random in a 700 m buffer (so $K = 20$, $m = 2$, $n = 12$, and $S$ varied between 21 and 349).

$x$: dummy variables to quantify habitat class, as well as an above-ground vegetation biomass index (in kg/m²).


Example of application

Model fit


Future research

How should the “controls” be sampled?

Within-cluster correlation: How can working correlations be estimated in GEE? How can autocorrelation among observations belonging to the same cluster be included in the prospective (then retrospective) model?

Between-cluster correlation: How can we include between-animal (or between-pair) correlation in such models?

Model validation: relatively easy to do informally with K-fold cross-validation type approaches ... but how can a formal goodness-of-fit test be done?


References

Bhat, C. (2001). Quasi-random maximum simulated likelihood estimation of the mixed multinomial logit model, Transport. Res. Part B, 35, 677–93.

Craiu, R. V., Duchesne, T., Fortin, D. (2008). Inference methods for the conditional logistic regression model with longitudinal data, Biometrical J., 50, 97–109.

Craiu, R. V., Duchesne, T., Fortin, D., Baillargeon, S. (2011). Conditional logistic regression with longitudinal follow up and individual-level random coefficients: A stable and efficient two-step estimation method, J. of Comput. & Graph. Statist., to appear.

Forester, J. D., Im, H. K., Rathouz, P. J. (2009). Accounting for animal movement in estimation of resource selection functions: sampling and data analysis, Ecology, 90, 3554–65.

Pfeiffer, R. M., Gail, M. H., Pee, D. (2001). Inference for covariates that accounts for ascertainment and random genetic effects in family studies, Biometrika, 88, 933–48.

Train, K. (2003). Discrete choice methods with simulation, New York: Cambridge University Press.
