
Chapter 1

NON-HIERARCHICAL MULTILEVEL MODELS

Jon Rasbash and William Browne

1. INTRODUCTION

In the models discussed in this book so far we have assumed that the structures of the populations from which the data have been drawn are hierarchical. This assumption is sometimes not justified. In this chapter two main types of non-hierarchical model are considered. First, we consider cross-classified models; the notion of cross-classification is probably reasonably familiar to most readers. Second, we consider multiple membership models, where lower level units are influenced by more than one higher level unit from the same classification. For example, some pupils may attend more than one school. We also consider situations that contain a mixture of hierarchical, crossed and multiple membership relationships.

2. CROSS-CLASSIFIED MODELS

This section is divided into three parts. In the first part we look at situations that can give rise to a two way cross-classification, introduce some diagrams to describe the population structure, and discuss notation for constructing a statistical model. In the second part we discuss some of the possible methods for estimating cross-classified models and give an example analysis of an educational data set. In the third part we describe some more complex cross-classified structures and give an example analysis of a medical data set.

2.1 TWO WAY CROSS-CLASSIFICATIONS: A BASIC MODEL

Suppose we have data on a large number of patients attending many hospitals, that we also know the neighbourhood in which each patient lives, and that we regard patient, neighbourhood and hospital all as important


Table 1.1 Patients cross-classified by hospital and neighbourhood.

| | Neighbourhood 1 | Neighbourhood 2 | Neighbourhood 3 |
|---|---|---|---|
| Hospital 1 | XX | X | |
| Hospital 2 | X | X | |
| Hospital 3 | | XX | X |
| Hospital 4 | | X | XXX |

Table 1.2 Patients nested within hospitals within neighbourhoods.

Neighbourhood 1 Neighbourhood 2 Neighbourhood 3

Hospital 1 XXX

Hospital 2 XX

Hospital 3 XXX

Hospital 4 XXXX

sources of variation for the patient level outcome measure we wish to study. Typically, hospitals will draw patients from many different neighbourhoods and the inhabitants of a neighbourhood will go to many hospitals. No pure hierarchy can be found, and patients are said to be contained within a cross-classification of hospitals by neighbourhoods. This can be represented schematically, for the case of twelve patients contained within a cross-classification of three neighbourhoods by four hospitals, as in table 1.1.

In this example we have patients at level 1, and neighbourhood and hospital are cross-classified at level 2. The characteristic pattern of a cross-classification is shown: some rows contain multiple entries and some columns contain multiple entries. In a nested relationship, if the row classification is nested within the column classification then all the entries across a row will fall under a single column, and vice versa if the column classification is nested within the row classification. For example, if hospitals are nested within neighbourhoods we might observe the pattern in table 1.2.
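The row and column patterns just described can be checked mechanically. The following is a minimal Python sketch, not part of the original text: the function name `is_nested` and the nested layout at the end are hypothetical, and the crossed data are one (hospital, neighbourhood) pair per patient read off the cells of table 1.1.

```python
from collections import defaultdict

def is_nested(lower, upper):
    """Return True if every unit of `lower` occurs under exactly one unit of `upper`."""
    parents = defaultdict(set)
    for lo, up in zip(lower, upper):
        parents[lo].add(up)
    return all(len(p) == 1 for p in parents.values())

# One (hospital, neighbourhood) pair per patient, read off the cells of table 1.1.
crossed = [(1, 1), (1, 1), (1, 2), (2, 1), (2, 2), (3, 2),
           (3, 2), (3, 3), (4, 2), (4, 3), (4, 3), (4, 3)]
hosp = [h for h, _ in crossed]
nbhd = [n for _, n in crossed]

hospitals_in_neighbourhoods = is_nested(hosp, nbhd)   # hospital 1 spans two neighbourhoods
neighbourhoods_in_hospitals = is_nested(nbhd, hosp)   # neighbourhood 1 spans two hospitals

# A hypothetical nested layout (an assumption, not taken from table 1.2):
# each hospital draws all of its patients from a single neighbourhood.
nested_hosp = [1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
nested_nbhd = [1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3]
nested_ok = is_nested(nested_hosp, nested_nbhd)

print(hospitals_in_neighbourhoods, neighbourhoods_in_hospitals, nested_ok)
```

When neither classification is nested within the other, as for table 1.1, the two classifications are crossed.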


Many studies follow this simple two-way crossed structure. Here are a few examples:

Education: students cross-classified by primary school and secondary school.

Health: patients cross-classified by general practice and hospital.

Survey data: individuals cross-classified by interviewer and area of residence.

2.1.1 Diagrams for representing the relationship between classifications. We find two types of diagrams useful in expressing the nature of relationships between classifications. Firstly, unit diagrams, where we draw every unit (patient, hospital and neighbourhood, in the case of our first example) and connect each lowest level unit (patient) to its parent units (hospital, neighbourhood). Such a representation of the data in table 1.1 is shown in figure 1.1.

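[Unit diagram: hospitals H1 to H4 (top row), patients P1 to P12 (middle row), neighbourhoods N1 to N3 (bottom row); each patient is joined by a line to its hospital and to its neighbourhood.]

Figure 1.1 Diagrams for crossed structure given in table 1.1.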

Note that we have two hierarchies present, patients within hospitals and patients within neighbourhoods. We have organised the topology of the diagram such that patients are nested within hospitals. However, when we come to add neighbourhoods to the diagram we see that the connecting lines cross, indicating that we have a cross-classification. Drawing the hierarchical structure shown in table 1.2 gives the representation shown in figure 1.2.

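[Unit diagram: hospitals in the order H1, H4, H2, H3 (top row), patients P1 to P12 (middle row), neighbourhoods N1 to N3 (bottom row); each patient is joined by a line to its hospital and to its neighbourhood.]

Figure 1.2 Diagrams for completely nested structure given in table 1.2.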

Clearly, drawing such diagrams with every unit included is not practical for large data sets, as there will be far too many nodes to fit into a reasonable area. They can, however, be used in schematic form to convey the structure of the relationship between classifications. When we have four or five random classifications present (as commonly occurs with social data) even schematic forms of these diagrams can become hard to read. There is a more minimal diagram, the classification diagram, which has one node for each classification. Nodes connected by an arrow indicate a nested relationship, nodes connected by a double arrow indicate a multiple-membership relationship (examples are given later) and unconnected nodes indicate a crossed relationship. Thus the crossed structure in figure 1.1 and the completely nested structure of figure 1.2 are drawn as shown in figure 1.3.

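[Classification diagrams. (i) Crossed structure: Patient is joined by an arrow to Hospital and by an arrow to Neighbourhood, with no connection between Hospital and Neighbourhood. (ii) Nested structure: Patient is joined by an arrow to Hospital, which is joined by an arrow to Neighbourhood.]

Figure 1.3 Classification diagrams for nesting and crossing.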

2.1.2 Some notation for constructing a statistical model. The matrix notation used in this book for describing hierarchical models, that is,

$$y_j = X_j\beta + Z_j u_j + e_j$$

does not readily extend to the case of cross-classifications. This is because this notation assumes a unique hierarchy where we write down the generic equation for the jth level two unit. In a simple cross-classification we have two sets of level two units, for example hospitals and neighbourhoods, so which classification is j indexing?

We can extend the basic scalar notation to handle cross-classified structures. Assume we have patients nested within a cross-classification of neighbourhoods by hospital, that is the case illustrated in figure 1.3(i). Suppose we want to estimate a simple variance components model giving estimates of the mean response and patient, hospital and neighbourhood level variation. In this case we can write the model in scalar notation as

$$y_{i(j_1,j_2)} = \beta_0 + u_{j_1} + u_{j_2} + e_{i(j_1,j_2)}$$

where $\beta_0$ estimates the mean response, $j_1$ indexes the neighbourhood classification, $j_2$ indexes the hospital classification, $u_{j_1}$ is the random effect for neighbourhood $j_1$, $u_{j_2}$ is the random effect for hospital $j_2$, $y_{i(j_1,j_2)}$ is the response for the ith patient from the cell in the cross-classification defined by neighbourhood $j_1$ and hospital $j_2$, and finally $e_{i(j_1,j_2)}$ is the patient level residual for the ith patient from the cell in the cross-classification defined by neighbourhood $j_1$ and hospital $j_2$.

Table 1.3 Indexing table for neighbourhoods and hospitals for patients given in figure 1.1

| i | nbhd(i) | hosp(i) |
|---|---|---|
| 1 | 1 | 1 |
| 2 | 2 | 1 |
| 3 | 1 | 1 |
| 4 | 2 | 2 |
| 5 | 1 | 2 |
| 6 | 2 | 3 |
| 7 | 2 | 3 |
| 8 | 3 | 3 |
| 9 | 3 | 4 |
| 10 | 2 | 4 |
| 11 | 3 | 4 |
| 12 | 3 | 4 |

Details of how this notation extends to represent more complex models and patterns of cross-classifications are given in Rasbash and Browne, 2001. One problem with this notation is that as we fit models with more classifications and more complex patterns of crossing, the subscript notation that describes the patterns becomes very cumbersome and difficult to read. We therefore prefer an alternative notation introduced in Browne et al., 2000.

We can write the same model as

$$y_i = \beta_0 + u^{(2)}_{nbhd(i)} + u^{(3)}_{hosp(i)} + e_i$$

where i indexes the observation level, in this case patients, and nbhd(i) and hosp(i) are functions that return the unit number of the neighbourhood and hospital, respectively, that patient i belongs to. Thus for the data structure drawn in figure 1.1 the values of nbhd(i) and hosp(i) are given in table 1.3.

Therefore the model for patient 3 would be

$$y_3 = \beta_0 + u^{(2)}_{1} + u^{(3)}_{1} + e_3$$

and for patient 5 would be

$$y_5 = \beta_0 + u^{(2)}_{1} + u^{(3)}_{2} + e_5$$

We number the classifications from 2 upwards as we use classification number 1 to represent the identity classification that applies to the observation level (like level 1 in a hierarchical model). This classification simply returns the row numbers in the data matrix. As can be seen, random effects require bracketed superscripting with their classification number to avoid ambiguity.

This simplified notation has the advantage that the subscripting notation does not increase in complexity as we add more classifications. This simplification is achieved because the notation makes no attempt to describe the patterns of crossing and nesting present. This is useful information and we therefore advocate the use of this notation in conjunction with the classification diagrams, as shown in figure 1.3, which display these patterns explicitly.
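The classification functions can be read as simple lookups into table 1.3. The sketch below is illustrative only: the helper names `model_terms` and `linear_predictor`, and all numerical values for the intercept and the random effects, are hypothetical and are not estimates from any data set.

```python
# Table 1.3 as lookup tables; index 0 is unused so that patient i is entry i.
NBHD = [None, 1, 2, 1, 2, 1, 2, 2, 3, 3, 2, 3, 3]
HOSP = [None, 1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4]

def nbhd(i):
    """Unit number of the neighbourhood (classification 2) of patient i."""
    return NBHD[i]

def hosp(i):
    """Unit number of the hospital (classification 3) of patient i."""
    return HOSP[i]

def model_terms(i):
    """Spell out the right-hand side of the variance components model for patient i."""
    return f"y{i} = b0 + u(2)_{nbhd(i)} + u(3)_{hosp(i)} + e{i}"

def linear_predictor(i, b0, u2, u3):
    """Intercept plus the neighbourhood and hospital effects that apply to patient i."""
    return b0 + u2[nbhd(i)] + u3[hosp(i)]

# Hypothetical values, chosen only to make the lookup visible.
b0 = 10.0
u2 = {1: 0.5, 2: -0.2, 3: -0.3}           # neighbourhood effects
u3 = {1: 1.0, 2: 0.0, 3: -0.4, 4: -0.6}   # hospital effects

print(model_terms(3))
print(model_terms(5))
print(linear_predictor(3, b0, u2, u3), linear_predictor(5, b0, u2, u3))
```

Patients 3 and 5 share a neighbourhood effect but pick up different hospital effects, which is exactly what the two written-out equations express.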

2.2 ESTIMATION ALGORITHMS

We will describe three estimation algorithms for fitting cross-classification models in detail and mention other alternatives. Each of these three methods has advantages and disadvantages in terms of speed, memory usage and bias, and these will be discussed later. All three methods have been implemented in versions of the MLwiN software package (Rasbash et al., 2000) and all results in this chapter are produced by this package.

2.2.1 An IGLS algorithm for estimating cross-classified models. The iterative generalized least squares (IGLS) estimates for a multilevel model are those estimates which simultaneously satisfy both of the following equations:

$$\hat{\beta} = (X^T V^{-1} X)^{-1} (X^T V^{-1} y)$$

$$\hat{\theta} = (Z^{*T} (V^{*})^{-1} Z^{*})^{-1} Z^{*T} (V^{*})^{-1} y^{*}$$

where $\hat{\beta}$ are the estimated fixed coefficients and $\hat{\theta}$ is a vector containing the estimated variances and covariances of the sets of random effects in the model. $V = \mathrm{Cov}(y \mid X, \beta)$ and an estimate of $V$ is constructed from the elements of $\hat{\theta}$. $y^{*}$ is the vector of elements of $(y - X\beta)(y - X\beta)^T$ and therefore has length $n^2$ (n is the length of the data set). $V^{*}$ is the covariance matrix of $y^{*}$ and $Z^{*}$ is the design matrix linking $y^{*}$ to $V$ in the regression of $y^{*}$ on $Z^{*}$. $V^{*}$ has the form $V^{*} = V \otimes V$. See Goldstein, 1986 for more details. Some of these matrices are massive; for example, $(V^{*})^{-1}$ has dimension $n^2$ by $n^2$, making a direct software implementation of these estimating equations extremely resource intensive both in terms of CPU time and memory consumed. However, in hierarchical models $V$ and $V^{*}$ have a block diagonal structure which can be exploited by customised algorithms (see Goldstein and Rasbash, 1996) which allow efficient computation.

The problem presented by cross-classified models is that $V$ (and therefore $V^{*}$) no longer has the block diagonal structure which the efficient algorithm requires.
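To make the size of these matrices concrete, here is a minimal NumPy sketch of a single, deliberately naive IGLS update for a variance components model with two crossed classifications. It is an illustration under stated assumptions and not the MLwiN implementation: the function names `indicator` and `igls_step`, the simulated response and the starting values are all hypothetical, and the weight matrix is formed densely using $(V \otimes V)^{-1} = V^{-1} \otimes V^{-1}$.

```python
import numpy as np

def indicator(ids):
    """n x q dummy-variable matrix for a classification coded 1..q."""
    ids = np.asarray(ids)
    return (ids[:, None] == np.arange(1, ids.max() + 1)[None, :]).astype(float)

def igls_step(y, X, components, theta):
    """One dense IGLS update; components[k] is the n x n matrix multiplying theta[k] in V."""
    V = sum(t * C for t, C in zip(theta, components))
    Vinv = np.linalg.inv(V)
    # Fixed part: generalized least squares given the current V.
    beta = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
    # Random part: regress y* = vec(r r^T) on Z* with weights (V kron V)^{-1}.
    r = y - X @ beta
    y_star = np.outer(r, r).reshape(-1)                             # length n^2
    Z_star = np.column_stack([C.reshape(-1) for C in components])  # n^2 x K
    W = np.kron(Vinv, Vinv)                                         # n^2 x n^2
    theta_new = np.linalg.solve(Z_star.T @ W @ Z_star, Z_star.T @ W @ y_star)
    return beta, theta_new, W.shape

# Twelve patients crossed by three neighbourhoods and four hospitals.
rng = np.random.default_rng(0)
nbhd = [1, 2, 1, 2, 1, 2, 2, 3, 3, 2, 3, 3]
hosp = [1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
Zn, Zh = indicator(nbhd), indicator(hosp)
n = len(nbhd)
components = [Zn @ Zn.T, Zh @ Zh.T, np.eye(n)]   # neighbourhood, hospital, patient
y = 5 + Zn @ rng.normal(size=3) + Zh @ rng.normal(size=4) + rng.normal(size=n)
X = np.ones((n, 1))

beta, theta, w_shape = igls_step(y, X, components, theta=[1.0, 1.0, 1.0])
print(w_shape)   # the weight matrix already has n^2 rows and columns
```

Even with only twelve observations the weight matrix has $12^2 = 144$ rows and columns, which is why practical implementations avoid forming it and rely on block diagonal structure instead.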

2.2.2 Structure of V for cross-classified models. Let us take a look at the structure of $V$, the covariance matrix of $y$, for cross-classified models and see how we can adapt the standard IGLS algorithm to handle cross-classifications.

The basic two level cross-classified model (with hospitals + neighbourhoods) can be written as:

$$y_i = X_i\beta + u^{(2)}_{hosp(i)} + u^{(3)}_{nbhd(i)} + e_i$$

$$u^{(2)}_{hosp(i)} \sim N(0, \sigma^2_{u(2)}), \quad u^{(3)}_{nbhd(i)} \sim N(0, \sigma^2_{u(3)}), \quad e_i \sim N(0, \sigma^2_e)$$

The variance of our response is now

$$\mathrm{var}(y_i) = \mathrm{var}(u^{(2)}_{hosp(i)} + u^{(3)}_{nbhd(i)} + e_i) = \sigma^2_{u(2)} + \sigma^2_{u(3)} + \sigma^2_e.$$

The covariance between individuals a and b is

$$\mathrm{cov}(y_a, y_b) = \mathrm{cov}(u^{(2)}_{hosp(a)} + u^{(3)}_{nbhd(a)} + e_a,\; u^{(2)}_{hosp(b)} + u^{(3)}_{nbhd(b)} + e_b)$$

which simplifies to $\sigma^2_{u(2)}$ for two individuals from the same hospital but different neighbourhoods, $\sigma^2_{u(3)}$ for two individuals from the same neighbourhood but different hospitals, $\sigma^2_{u(2)} + \sigma^2_{u(3)}$ for two individuals from the same neighbourhood and the same hospital, and zero for two individuals who are from both different neighbourhoods and different hospitals.

Take a toy example of five patients in two hospitals and introduce a cross-classification with two neighbourhoods, as shown in table 1.4. This generates a 5 by 5 covariance matrix for the responses of the five patients with the following structure:

Table 1.4 Indexing table for hospitals and neighbourhoods for 5 patients

| i | hosp(i) | nbhd(i) |
|---|---|---|
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 1 | 1 |
| 4 | 2 | 2 |
| 5 | 2 | 1 |

$$V = \begin{pmatrix}
h+n+p & h & h+n & 0 & n \\
h & h+n+p & h & n & 0 \\
h+n & h & h+n+p & 0 & n \\
0 & n & 0 & h+n+p & h \\
n & 0 & n & h & h+n+p
\end{pmatrix}$$

where $h = \sigma^2_{u(2)}$ is the hospital variance, $n = \sigma^2_{u(3)}$ is the neighbourhood variance and $p = \sigma^2_e$.

Here the data are sorted by patient within hospital, which allows us to split the covariance matrix into two components: a component for patients within hospitals which has a block diagonal structure ($P$) and a component for neighbourhoods which is not block diagonal ($Q$):

$$V = P + Q$$

where

$$P = \begin{pmatrix}
h+p & h & h & 0 & 0 \\
h & h+p & h & 0 & 0 \\
h & h & h+p & 0 & 0 \\
0 & 0 & 0 & h+p & h \\
0 & 0 & 0 & h & h+p
\end{pmatrix}$$

and

$$Q = \begin{pmatrix}
n & 0 & n & 0 & n \\
0 & n & 0 & n & 0 \\
n & 0 & n & 0 & n \\
0 & n & 0 & n & 0 \\
n & 0 & n & 0 & n
\end{pmatrix}$$

Splitting the structure of $V$ into a hierarchical, block diagonal part that the IGLS algorithm can handle in an efficient way and a non-hierarchical, non-block diagonal part forms the basis of a relatively efficient algorithm for handling cross-classified models.

If we take the dummy variable indicator matrix of neighbourhoods ($Z$), then we have $Q = ZZ^T n$:

$$Z = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 0 \\ 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad
ZZ^T n = \begin{pmatrix}
1 & 0 & 1 & 0 & 1 \\
0 & 1 & 0 & 1 & 0 \\
1 & 0 & 1 & 0 & 1 \\
0 & 1 & 0 & 1 & 0 \\
1 & 0 & 1 & 0 & 1
\end{pmatrix} n$$
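The decomposition for the five-patient toy example can be verified numerically. The NumPy sketch below builds the block diagonal hospital part and the neighbourhood part from the indexing in table 1.4; the variable names and the numerical values given to the hospital variance `h`, the neighbourhood variance `n` and the patient variance `p` are hypothetical.

```python
import numpy as np

# Table 1.4: hospital and neighbourhood of each of the five patients.
hosp = np.array([1, 1, 1, 2, 2])
nbhd = np.array([1, 2, 1, 2, 1])

def indicator(ids):
    """Dummy-variable indicator matrix for a classification coded 1..q."""
    return (ids[:, None] == np.arange(1, ids.max() + 1)[None, :]).astype(float)

h, n, p = 2.0, 3.0, 5.0   # hypothetical hospital, neighbourhood and patient variances

Zh = indicator(hosp)      # 5 x 2, hospital dummies
Z = indicator(nbhd)       # 5 x 2, neighbourhood dummies

P = h * Zh @ Zh.T + p * np.eye(5)   # block diagonal because data are sorted by hospital
Q = n * Z @ Z.T                     # not block diagonal
V = P + Q

print(V)
```

Patients 1 and 3 share both a hospital and a neighbourhood, so their covariance is `h + n`; patients 1 and 4 share neither, so their covariance is zero.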

We can define a `pseudo-unit' that spans the entire data set (in our toy example, all five points) and declare this pseudo-unit to be level three in the model, removing the neighbourhood level from the model. We can now form the three level hierarchical model

$$y_i = \beta_0 + u^{(2)}_{hosp(i)} + u^{(3)}_{punit(i),1} Z_{1i} + u^{(3)}_{punit(i),2} Z_{2i} + e_i$$

$$\begin{bmatrix} u^{(3)}_{punit(i),1} \\ u^{(3)}_{punit(i),2} \end{bmatrix} \sim N(0, \Omega_{u(3)}), \qquad
\Omega_{u(3)} = \begin{bmatrix} \sigma^2_{u(3),1} & 0 \\ 0 & \sigma^2_{u(3),2} \end{bmatrix}$$

$$u^{(2)}_{hosp(i)} \sim N(0, \sigma^2_{u(2)}), \qquad e_i \sim N(0, \sigma^2_e)$$

Here the level structure is patients within hospitals within the pseudo-unit level. $Z_{1i}$ and $Z_{2i}$ are the ith elements of columns 1 and 2 of $Z$. $\sigma^2_{u(3),1}$ and $\sigma^2_{u(3),2}$ are both estimates of the between neighbourhood variation, and we therefore constrain them to be equal. Thus we can use the standard IGLS hierarchical algorithm to define and estimate the correct covariance structure for a cross-classified model. If we had 200 hospitals and 100 neighbourhoods, we would have to form 100 dummy variables for the neighbourhoods, allow them all to have variances at level 3 and constrain the variances to be equal. Details of this algorithm are given in Rasbash and Goldstein, 1994 and Bull et al., 1999, and it will be referred to as the RG algorithm in later sections.

2.2.3 MCMC. The MCMC estimation methods (see Chapter 3 of this book for a fuller description) aim to generate samples from the joint posterior distribution of all unknown parameters. They then use these samples to calculate point and interval estimates for each individual parameter. The Gibbs sampler algorithm produces samples from the joint posterior by generating in turn from the conditional posterior distributions of groups of unknown parameters. In chapter 3 the Gibbs sampling algorithm for a Normally distributed response hierarchical model is given.

As we have seen in the notation section, we can describe our model as a set of additive terms, one for the fixed part of the model and one for each of the random classifications. The MCMC algorithm works on each of these terms separately and consequently the algorithm for a cross-classified model is no more complicated than for a hierarchical model. For illustration we present the steps for the following cross-classified model, based on the variance components hospitals by neighbourhoods model, and refer the interested reader to Browne et al., 2000 for more general algorithms. Note that if the response is dichotomous or a count then, as in chapter 3, we can use the Metropolis-Gibbs hybrid method discussed there.

The basic two level cross-classified model (with hospitals + neighbourhoods) can be written as:

$$y_i = X_i\beta + u^{(2)}_{hosp(i)} + u^{(3)}_{nbhd(i)} + e_i$$

$$u^{(2)}_{hosp(i)} \sim N(0, \sigma^2_{u(2)}), \quad u^{(3)}_{nbhd(i)} \sim N(0, \sigma^2_{u(3)}), \quad e_i \sim N(0, \sigma^2_e)$$

We can split our unknown parameters into 6 distinct sets: the fixed effects, $\beta$, the hospital random effects, $u^{(2)}_{hosp(i)}$, the neighbourhood random effects, $u^{(3)}_{nbhd(i)}$, the hospital variance, $\sigma^2_{u(2)}$, the neighbourhood variance, $\sigma^2_{u(3)}$, and the residual variance, $\sigma^2_e$.

We then need to generate random draws from the conditional distribution of each of these six groups of unknowns. MCMC algorithms are generally used in a Bayesian context and consequently we need to define prior distributions for our unknown parameters. For generality we will use a multivariate Normal prior for the fixed effects, $\beta \sim N_{p_f}(\mu_p, S_p)$, and scaled inverse (SI) $\chi^2$ priors for the three variances: for the hospital variance $\sigma^2_{u(2)} \sim SI\chi^2(\nu_2, s^2_2)$, for the neighbourhood variance $\sigma^2_{u(3)} \sim SI\chi^2(\nu_3, s^2_3)$ and for the residual variance $\sigma^2_e \sim SI\chi^2(\nu_e, s^2_e)$. The steps are then as follows:

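In step 1 of the algorithm the conditional posterior distribution in the Gibbs update for the fixed effects parameter vector $\beta$ is multivariate Normal with dimension $p_f$ (the number of fixed effects):

$$p(\beta \mid y, u^{(2)}, u^{(3)}, \sigma^2_{u(2)}, \sigma^2_{u(3)}, \sigma^2_e) \sim N_{p_f}(\hat{\beta}, \hat{D}),$$

where

$$\hat{D} = \left[ \frac{\sum_{i=1}^{N} X_i^T X_i}{\sigma^2_e} + S_p^{-1} \right]^{-1}$$

and

$$\hat{\beta} = \hat{D} \left[ \frac{\sum_{i} X_i^T d_i}{\sigma^2_e} + S_p^{-1} \mu_p \right],$$

where

$$d_i = y_i - u^{(2)}_{hosp(i)} - u^{(3)}_{nbhd(i)}.$$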

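In step 2 we update the hospital residuals, $u^{(2)}_k$, using Gibbs sampling with a univariate Normal full conditional distribution:

$$p(u^{(2)}_k \mid y, \beta, u^{(3)}, \sigma^2_{u(2)}, \sigma^2_{u(3)}, \sigma^2_e) \sim N(\hat{u}^{(2)}_k, \hat{D}^{(2)}_k),$$

where

$$\hat{D}^{(2)}_k = \left[ \frac{n^{(2)}_k}{\sigma^2_e} + \frac{1}{\sigma^2_{u(2)}} \right]^{-1}$$

and

$$\hat{u}^{(2)}_k = \hat{D}^{(2)}_k \left[ \sum_{i:\,hosp(i)=k} \frac{d^{(2)}_i}{\sigma^2_e} \right],$$

where

$$d^{(2)}_i = y_i - X_i\beta - u^{(3)}_{nbhd(i)}.$$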

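In step 3 we update the neighbourhood residuals, $u^{(3)}_k$, using Gibbs sampling with a univariate Normal full conditional distribution:

$$p(u^{(3)}_k \mid y, \beta, u^{(2)}, \sigma^2_{u(2)}, \sigma^2_{u(3)}, \sigma^2_e) \sim N(\hat{u}^{(3)}_k, \hat{D}^{(3)}_k),$$

where

$$\hat{D}^{(3)}_k = \left[ \frac{n^{(3)}_k}{\sigma^2_e} + \frac{1}{\sigma^2_{u(3)}} \right]^{-1}$$

and

$$\hat{u}^{(3)}_k = \hat{D}^{(3)}_k \left[ \sum_{i:\,nbhd(i)=k} \frac{d^{(3)}_i}{\sigma^2_e} \right],$$

where

$$d^{(3)}_i = y_i - X_i\beta - u^{(2)}_{hosp(i)}.$$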

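Note that in the above two steps $n^{(c)}_k$ refers to the number of individuals in the kth unit of classification c.

In step 4 we update the hospital variance $\sigma^2_{u(2)}$ using Gibbs sampling and a Gamma full conditional distribution for $1/\sigma^2_{u(2)}$:

$$p(1/\sigma^2_{u(2)} \mid y, \beta, u^{(2)}, u^{(3)}, \sigma^2_{u(3)}, \sigma^2_e) \sim \mathrm{Gamma}\left[ \frac{n_2 + \nu_2}{2},\; \frac{1}{2}\left( \sum_{j=1}^{n_2} (u^{(2)}_j)^2 + \nu_2 s^2_2 \right) \right],$$

where $n_2$ is the number of hospitals.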

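In step 5 we update the neighbourhood variance $\sigma^2_{u(3)}$ using Gibbs sampling and a Gamma full conditional distribution for $1/\sigma^2_{u(3)}$:

$$p(1/\sigma^2_{u(3)} \mid y, \beta, u^{(2)}, u^{(3)}, \sigma^2_{u(2)}, \sigma^2_e) \sim \mathrm{Gamma}\left[ \frac{n_3 + \nu_3}{2},\; \frac{1}{2}\left( \sum_{j=1}^{n_3} (u^{(3)}_j)^2 + \nu_3 s^2_3 \right) \right],$$

where $n_3$ is the number of neighbourhoods.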

In step 6 we update the observation level variance $\sigma^2_e$ using Gibbs sampling and a Gamma full conditional distribution for $1/\sigma^2_e$:

$$p(1/\sigma^2_e \mid y, \beta, u^{(2)}, u^{(3)}, \sigma^2_{u(2)}, \sigma^2_{u(3)}) \sim \mathrm{Gamma}\left[ \frac{N + \nu_e}{2},\; \frac{1}{2}\left( \sum_{i} e_i^2 + \nu_e s^2_e \right) \right].$$

The above 6 steps are repeatedly sampled from in sequence to produce correlated chains of parameter estimates, from which point and interval estimates can be created as in chapter 3.
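The six updates translate fairly directly into code. What follows is a minimal NumPy sketch under stated assumptions, not the MLwiN implementation: the function name `gibbs_crossed`, the 0-based unit coding, the prior settings (a Normal prior on the fixed effects with mean zero and precision `prior_prec`, flat when that is zero, and the same scaled inverse chi-squared prior with parameters `nu` and `s2` for every variance) and the simulated data are all choices made for illustration. The second Gamma parameter is treated as a rate, so NumPy's scale argument is its reciprocal.

```python
import numpy as np

def gibbs_crossed(y, X, hosp, nbhd, n_iter=500, seed=0,
                  nu=0.001, s2=1.0, prior_prec=0.0):
    """Gibbs sampler for y_i = X_i beta + u2[hosp(i)] + u3[nbhd(i)] + e_i.

    hosp and nbhd hold 0-based unit indices. Returns one row per iteration:
    (beta, hospital variance, neighbourhood variance, residual variance).
    """
    rng = np.random.default_rng(seed)
    N, pf = X.shape
    n2, n3 = hosp.max() + 1, nbhd.max() + 1
    cnt2 = np.bincount(hosp, minlength=n2)   # patients per hospital
    cnt3 = np.bincount(nbhd, minlength=n3)   # patients per neighbourhood
    beta, u2, u3 = np.zeros(pf), np.zeros(n2), np.zeros(n3)
    s2_u2 = s2_u3 = s2_e = 1.0
    XtX = X.T @ X
    chain = np.empty((n_iter, pf + 3))
    for t in range(n_iter):
        # Step 1: fixed effects, multivariate Normal (prior mean zero).
        D = np.linalg.inv(XtX / s2_e + prior_prec * np.eye(pf))
        d = y - u2[hosp] - u3[nbhd]
        beta = rng.multivariate_normal(D @ (X.T @ d) / s2_e, D)
        # Step 2: hospital residuals, independent univariate Normals.
        D2 = 1.0 / (cnt2 / s2_e + 1.0 / s2_u2)
        d2 = y - X @ beta - u3[nbhd]
        u2 = rng.normal(D2 * np.bincount(hosp, weights=d2, minlength=n2) / s2_e,
                        np.sqrt(D2))
        # Step 3: neighbourhood residuals.
        D3 = 1.0 / (cnt3 / s2_e + 1.0 / s2_u3)
        d3 = y - X @ beta - u2[hosp]
        u3 = rng.normal(D3 * np.bincount(nbhd, weights=d3, minlength=n3) / s2_e,
                        np.sqrt(D3))
        # Steps 4 to 6: draw each precision from its Gamma conditional, then invert.
        s2_u2 = 1.0 / rng.gamma((n2 + nu) / 2, 2.0 / (u2 @ u2 + nu * s2))
        s2_u3 = 1.0 / rng.gamma((n3 + nu) / 2, 2.0 / (u3 @ u3 + nu * s2))
        e = y - X @ beta - u2[hosp] - u3[nbhd]
        s2_e = 1.0 / rng.gamma((N + nu) / 2, 2.0 / (e @ e + nu * s2))
        chain[t] = np.concatenate([beta, [s2_u2, s2_u3, s2_e]])
    return chain

# Simulated data with hypothetical sizes and variances (residual variance 4).
rng = np.random.default_rng(1)
N, n_hosp, n_nbhd = 600, 20, 15
hosp = rng.integers(0, n_hosp, size=N)
nbhd = rng.integers(0, n_nbhd, size=N)
X = np.ones((N, 1))
y = (5.0 + rng.normal(0, 1.0, n_hosp)[hosp]
     + rng.normal(0, 0.7, n_nbhd)[nbhd] + rng.normal(0, 2.0, N))

chain = gibbs_crossed(y, X, hosp, nbhd, n_iter=500)
posterior_means = chain[100:].mean(axis=0)   # discard a short burn-in
print(posterior_means)
```

Note that the code for the two sets of residuals is identical apart from the classification used, which reflects the point made earlier that crossing adds no complexity to the sampler. A run this short is only a smoke test; real analyses would use far longer chains and convergence checks.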

2.2.4 AIP method. The Alternating Imputation Prediction (AIP) method is a data augmentation algorithm for estimating cross-classified models with large numbers of random effects. Comprehensive details of this algorithm are given in Clayton and Rasbash, 1999. We now give an overview.

Data augmentation has been reviewed by Schafer, 1997. Tanner and Wong, 1987 introduced the idea of data augmentation as a stochastic version of the EM algorithm for maximum likelihood estimation in problems involving missing data. Corresponding to the E and M steps of the EM algorithm, the algorithm of Tanner and Wong has

I(mputation) step - impute missing data by sampling the distribution of the missing data conditional upon the observed data and current values of the model parameters.

P(osterior) step - sample parameter values from the complete data posterior distribution; these will be used for the next I-step.

In the context of random effect models, the random effects play the role of missing data. If the observed data are denoted by $y$, the random effects by $u$ and the model parameters by $\theta$, and if we denote the probability distribution of $y$ conditional on $X$ as $p(y \mid X)$, then the algorithm is specified (at step t) by

I step - Draw a sample $u^{(t)}$ from $p(u \mid y, \theta = \theta^{(t-1)})$

P step - Draw a sample $\theta^{(t)}$ from $p(\theta \mid y, u = u^{(t)})$

Repeated application of these two steps delivers a stochastic chain with equilibrium distribution $p(u, \theta \mid y)$ in a similar way to the MCMC algorithm. Now let us look at how we can adapt this method to fit a crossed random effects model when the only estimating engine we have at our disposal is one optimized for fitting nested random effects.

An n-way cross-classified model can be broken down into n sub-models, each of which is a 2 level hierarchical model. For example, patients nested within a cross-classification of neighbourhood by hospital can be broken down into a patient within hospital sub-model and a patient within neighbourhood sub-model.

Take the simple model

$$y_i = X_i\beta + u^{(2)}_{nbhd(i)} + u^{(3)}_{hosp(i)} + e_i$$

where neighbourhood and hospital are cross-classified. This cross-classified model can be partitioned into two hierarchical sub-models: patients within neighbourhoods (model N) and patients within hospitals (model H). An informal description of the AIP algorithm is:

1. Start by fitting model N using an estimation procedure for 2 level models.

2. Sample the model parameters from an approximation to their joint posterior distribution. That is, sample the fixed effects, the neighbourhood level variance and the patient level variance; denote these samples by $\beta_{[0,2]}$, $\sigma^2_{u[0,2]}$ and $\sigma^2_{e[0,2]}$ respectively. Here [0,2] labels a term as belonging to AIP iteration 0, for classification number 2, that is neighbourhood. This is the P-step for the neighbourhood classification.

3. Next sample a set of neighbourhood level random effects ($o_{[0,2]}$) from $p(u_{[0,2]} \mid y, \beta_{[0,2]}, \sigma^2_{u[0,2]}, \sigma^2_{e[0,2]})$. This is the I-step for the neighbourhood classification.

4. Offset $o_{[0,2]}$ from $y$, that is form $y^{*} = y - o_{[0,2]}$, re-sort the data according to hospitals and fit model H using the new offset response $y^{*}$.

5. Next sample $\beta_{[0,3]}$, $\sigma^2_{u[0,3]}$ and $\sigma^2_{e[0,3]}$ from this second model, H. This is the P-step for the hospital classification.

6. Sample a set of hospital level random effects ($o_{[0,3]}$) from $p(u_{[0,3]} \mid y, \beta_{[0,3]}, \sigma^2_{u[0,3]}, \sigma^2_{e[0,3]})$. This is the I-step for the hospital classification.

This completes one iteration of the AIP algorithm, which is an Imputation-Posterior algorithm that Alternates between the neighbourhood and hospital classifications. We proceed by forming $y^{*} = y - o_{[0,3]}$, that is offsetting the sampled hospital residuals from $y$, and using that as a response in step 1. After T iterations the procedure delivers the following two chains, which can be used for inference:

$$\{\beta_{[0,2]}, \sigma^2_{u[0,2]}, \sigma^2_{e[0,2]}\},\ \{\beta_{[1,2]}, \sigma^2_{u[1,2]}, \sigma^2_{e[1,2]}\},\ \ldots,\ \{\beta_{[T,2]}, \sigma^2_{u[T,2]}, \sigma^2_{e[T,2]}\}$$

$$\{\beta_{[0,3]}, \sigma^2_{u[0,3]}, \sigma^2_{e[0,3]}\},\ \{\beta_{[1,3]}, \sigma^2_{u[1,3]}, \sigma^2_{e[1,3]}\},\ \ldots,\ \{\beta_{[T,3]}, \sigma^2_{u[T,3]}, \sigma^2_{e[T,3]}\}$$

Note that we get two sets of estimates for both the fixed effects and the level 1 variance with the AIP algorithm, and these should be approximately equal.

2.2.5 Other Methods. Raudenbush, 1993 considers an empirical Bayes approach to fitting cross-classified models based on the EM algorithm. He considers the specific case of two classifications where one of the classifications has many units whilst the other has far fewer, and shows two educational examples to illustrate the method.

Two other recent approaches that can be used for fitting cross-classified models, in particular with non-Normal responses, are Gauss-Hermite quadrature within PQL estimation (Pan and Thompson, 2000) and the HGLM model framework as described in Lee and Nelder, 2000. Neither of these approaches has been designed with speed of estimation in mind, and so they are currently not feasible for the size of some of the problems that are considered in practice.

2.2.6 Comparison of estimation methods. The RG method, when it works, is generally fairly quick to converge where all, or all but one, of the crossed classifications have small numbers of units. When there are multiple crossed classifications with large numbers of units the speed of the RG algorithm deteriorates and memory usage is greatly increased, often exhausting the available memory. The AIP method does not have these memory problems but will be slower for structures that are almost hierarchical. Although the AIP method works reasonably well, if the response is a binary variable and quasi-likelihood methods need to be used then it, like the RG method, is still affected by the bias that is inherent in quasi-likelihood methods for binary response multilevel models (see Goldstein and Rasbash, 1996). The MCMC methods have no bias problems, although there are still issues about which prior distributions to use for the variance parameters. Like the AIP method, they do not have any memory problems. They are, however, generally a lot slower computationally, as they estimate the whole distribution and not simply the mode, although as the structure of the data becomes more complex the speed difference is reduced.

2.2.7 An example analysis of a two way cross-classification: primary schools crossed with secondary schools. We will here consider applying the RG method using the IGLS algorithm, the MCMC method based on Gibbs sampling (Browne et al., 2000) and the AIP method to an educational example from Fife in Scotland. Here we have as a response the exam results of 3,435 children at age 16. We know for each child both the primary school and the secondary school that they attended, and we are interested in partitioning the variance between these two sources and individual pupil level variation. The classification diagram is shown in figure 1.4. There are 148 primary schools that feed into 19 secondary schools in the dataset. Of the 148 primary schools, 59 are nested within a single secondary school, whilst another 62 have at most 3 pupils that do not go to the main secondary school, so we have an almost nested structure. This structure is particularly suited to the RG algorithm.

We will fit the following model to the dataset

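[Classification diagram: Pupil is joined by an arrow to Primary School and by an arrow to Secondary School, with no connection between Primary School and Secondary School.]

Figure 1.4 Classification diagram for the Fife educational example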

Table 1.5 Point estimates for the Fife educational dataset.

| Parameter | IGLS | MCMC | AIP |
|---|---|---|---|
| Mean achievement ($\beta_0$) | 5.50 (0.17) | 5.50 (0.18) | 5.51 (0.19) |
| Secondary school variance ($\sigma^2_{u(2)}$) | 0.35 (0.16) | 0.41 (0.21) | 0.34 (0.15) |
| Primary school variance ($\sigma^2_{u(3)}$) | 1.12 (0.20) | 1.15 (0.213) | 1.11 (0.20) |
| Individual level variance ($\sigma^2_e$) | 8.10 (0.20) | 8.12 (0.20) | 8.11 (0.20) |

$$y_i = \beta_0 + u^{(2)}_{SEC(i)} + u^{(3)}_{PRIM(i)} + e_i$$

$$u^{(2)}_{SEC(i)} \sim N(0, \sigma^2_{u(2)}), \quad u^{(3)}_{PRIM(i)} \sim N(0, \sigma^2_{u(3)}), \quad e_i \sim N(0, \sigma^2_e).$$

The results are shown in table 1.5.

From table 1.5 we can see that in this example there is more variation between primary schools than between secondary schools. The MCMC estimates replicate the IGLS estimates, with slightly larger higher level variances: the MCMC estimates are posterior means whereas the IGLS estimates are modes, and the posterior distributions of the variances are skewed. The AIP method gives very similar results to the IGLS method. A further discussion of these results is given in Goldstein, 1995.
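Since the aim of the analysis is to partition the variance, it can help to express the three IGLS variance estimates from table 1.5 as shares of their total. The short sketch below is plain arithmetic on the published point estimates; the variable names are hypothetical and the shares ignore the uncertainty in the estimates.

```python
# IGLS variance estimates from table 1.5.
variances = {"secondary school": 0.35, "primary school": 1.12, "pupil": 8.10}

total = sum(variances.values())
shares = {name: v / total for name, v in variances.items()}

for name, share in shares.items():
    print(f"{name}: {share:.3f}")
```

On these estimates roughly 4% of the total variance lies between secondary schools, about 12% between primary schools and about 85% between pupils.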

2.3 MODELS FOR MORE COMPLEX POPULATION STRUCTURES

In this section we will consider expanding the simple two way cross-classified structure to accommodate more classifications and more complex structures.

2.3.1 Example scenarios. Let us take the situation described in the classification diagram drawn in figure 1.3(i), where patients lie within a cross-classification of hospitals by neighbourhoods. We may have information on the doctor that treated each patient, and doctors may be nested within hospitals. The classification diagram for this structure is shown in figure 1.5.

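[Classification diagram: Patient is joined by an arrow to Doctor, which is joined by an arrow to Hospital; Patient is also joined by an arrow to Neighbourhood.]

Figure 1.5 Classification diagram for two crossed hierarchies (patients within doctors within hospitals)*(patients within neighbourhoods).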

A variance components model for this structure is written as

$$y_i = \beta_0 + u^{(2)}_{nbhd(i)} + u^{(3)}_{hosp(i)} + u^{(4)}_{doct(i)} + e_i$$

If doctors work across hospitals and are therefore not nested within hospitals, we then have a three way cross-classification, which is drawn in figure 1.6.

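[Classification diagram: Patient is joined by separate arrows to Hospital, Neighbourhood and Doctor, with no connections among those three classifications.]

Figure 1.6 Classification diagram for three crossed hierarchies (patients within hospitals)*(patients within doctors)*(patients within neighbourhoods).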

Note that the variance components model for the structure in figure 1.6 is also described by the same equation. This is a reflection of the fact that the model notation for describing the random effects simply lists the classifications that are sources of variation for the response we are modelling. In the variance components model we only have an intercept term, which varies across all four classifications present. Suppose we had another explanatory variable, $x_1$, and we wished to allow its coefficient to vary across the doctor classification; we would write this model as

$$y_i = \beta_0 + u^{(2)}_{nbhd(i)} + u^{(3)}_{hosp(i)} + u^{(4)}_{doct(i),0} + \beta_1 x_{1i} + u^{(4)}_{doct(i),1} x_{1i} + e_i$$

or alternatively we can express the model as:

$$y_i = \beta_{0i} + \beta_{1i} x_{1i} + e_i$$

$$\beta_{0i} = \beta_0 + u^{(2)}_{nbhd(i)} + u^{(3)}_{hosp(i)} + u^{(4)}_{doct(i),0}$$

$$\beta_{1i} = \beta_1 + u^{(4)}_{doct(i),1}$$

It may be that the scenario described in figure 1.6 is further complicated because hospitals, doctors and neighbourhoods are all nested within regions. In this case the classification diagram becomes as in figure 1.7.

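[Classification diagram: Patient is joined by separate arrows to Hospital, Neighbourhood and Doctor; each of Hospital, Neighbourhood and Doctor is joined by an arrow to Region.]

Figure 1.7 Classification diagram for three crossed hierarchies nested within a higher level classification.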

Extending the last model to incorporate a simple random effect for the region classification we have

$$y_i = \beta_{0i} + \beta_{1i} x_{1i} + e_i$$

$$\beta_{0i} = \beta_0 + u^{(2)}_{nbhd(i)} + u^{(3)}_{hosp(i)} + u^{(4)}_{doct(i),0} + u^{(5)}_{reg(i)}$$

$$\beta_{1i} = \beta_1 + u^{(4)}_{doct(i),1}$$

These few example scenarios indicate how the classification diagrams and simplified notation can extend to describe patterns of crossings of arbitrary complexity.
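One way to see that the two ways of writing the random coefficient model for doctors agree is to simulate from both. The NumPy sketch below does this for the model without the region effect; every size, fixed effect and standard deviation is hypothetical, and the doctor intercept and slope effects are drawn independently, which is an assumption of the sketch since the text does not specify their covariance.

```python
import numpy as np

rng = np.random.default_rng(42)
N, n_nbhd, n_hosp, n_doct = 200, 5, 4, 8   # hypothetical sizes

# Classification functions stored as integer arrays (0-based unit indices).
nbhd = rng.integers(0, n_nbhd, N)
hosp = rng.integers(0, n_hosp, N)
doct = rng.integers(0, n_doct, N)
x1 = rng.normal(size=N)

beta0, beta1 = 1.0, 0.5                       # hypothetical fixed effects
u2 = rng.normal(0, 0.3, n_nbhd)               # neighbourhood intercept effects
u3 = rng.normal(0, 0.4, n_hosp)               # hospital intercept effects
u4 = rng.normal(0, [0.5, 0.2], (n_doct, 2))   # doctor intercept and slope effects
e = rng.normal(0, 1.0, N)

# Second form of the model: patient-specific intercept and slope.
beta0_i = beta0 + u2[nbhd] + u3[hosp] + u4[doct, 0]
beta1_i = beta1 + u4[doct, 1]
y = beta0_i + beta1_i * x1 + e

# First form: all terms written out in a single equation.
y_single = (beta0 + u2[nbhd] + u3[hosp] + u4[doct, 0]
            + beta1 * x1 + u4[doct, 1] * x1 + e)

print(np.allclose(y, y_single))
```

Nothing in the simulation records whether doctors are nested within hospitals or crossed with them; as the text notes, that information lives in the data structure and the classification diagram, not in the model equation.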

2.3.2 An example analysis of a complex cross-classified structure: Artificial Insemination data. We consider a data set concerning artificial insemination by donor. A detailed description of this data set and the substantive research questions addressed by modelling it within a cross-classified framework are given in Ecochard and Clayton, 1998. The data were re-analysed in Clayton and Rasbash, 1999 as an example case study demonstrating the properties of the AIP algorithm for estimating cross-classified models.

The data consist of 1901 women who were inseminated by sperm donations from 279 donors. Each donor made multiple donations; there were 1328 donations in all. A single donation is used for multiple inseminations. Each woman receives a series of monthly inseminations, 1 insemination per ovulatory cycle. The data contain 12100 cycles within the 1901 women.

There are two crossed hierarchies, a hierarchy for donors and a hierarchy for women. Level 1 corresponds to measures made at each ovulatory cycle. The response we analyse is the binary variable indicating whether conception occurs in a given cycle. The hierarchy for women is cycles within women. The hierarchy for donors is cycles within donations within donors. Within a series of cycles a woman may receive sperm from multiple donors/donations. The classification diagram for this structure is given in figure 1.8. The model fitted to the data is

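[Classification diagram: Cycle is joined by an arrow to Donation, which is joined by an arrow to Donor; Cycle is also joined by an arrow to Woman.]

Figure 1.8 Classification diagram for the artificial insemination example model.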

Table 1.6 Results for the artificial insemination example.

| Parameter | MCMC | AIP |
|---|---|---|
| Intercept ($\beta_0$) | -3.92 (0.21) | -3.90 (0.21) |
| Azoospermia ($\beta_1$) | 0.21 (0.09) | 0.22 (0.10) |
| Semen quality ($\beta_2$) | 0.18 (0.03) | 0.18 (0.03) |
| Woman's age > 35 ($\beta_3$) | -0.29 (0.12) | -0.27 (0.12) |
| Sperm count ($\beta_4$) | 0.002 (0.001) | 0.002 (0.001) |
| Sperm motility ($\beta_5$) | 0.0002 (0.0001) | 0.0002 (0.0001) |
| Insemination too early ($\beta_6$) | -0.69 (0.17) | -0.67 (0.17) |
| Insemination too late ($\beta_7$) | -0.27 (0.09) | -0.25 (0.09) |
| Donor variance ($\sigma^2_{u(4)}$) | 0.11 (0.06) | 0.10 (0.06) |
| Donation variance ($\sigma^2_{u(3)}$) | 0.36 (0.074) | 0.34 (0.065) |
| Women variance ($\sigma^2_{u(2)}$) | 1.02 (0.15) | 1.01 (0.11) |

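\[
\begin{aligned}
y_i &\sim \mathrm{Bernoulli}(\pi_i) \\
\mathrm{logit}(\pi_i) &= \beta_0 + \mathrm{azoo}_i\,\beta_1 + \mathrm{semenq}_i\,\beta_2 + \mathrm{age{>}35}_i\,\beta_3 + \mathrm{spermcount}_i\,\beta_4 + \mathrm{spermmot}_i\,\beta_5 \\
&\quad + \mathrm{iearly}_i\,\beta_6 + \mathrm{ilate}_i\,\beta_7 + u^{(2)}_{\mathrm{woman}(i)} + u^{(3)}_{\mathrm{donation}(i)} + u^{(4)}_{\mathrm{donor}(i)} \\
u^{(2)}_{\mathrm{woman}(i)} &\sim N(0, \sigma^2_{u(2)}), \quad u^{(3)}_{\mathrm{donation}(i)} \sim N(0, \sigma^2_{u(3)}), \quad u^{(4)}_{\mathrm{donor}(i)} \sim N(0, \sigma^2_{u(4)})
\end{aligned}
\tag{1.1}
\]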

Note that azoospermia (azoo) is a dichotomous variable indicating whether the fecundability of the woman is impaired (0 impaired, 1 not impaired). The results of fitting this model from the MCMC and AIP estimation procedures are given in table 1.6. This model could not be fitted using the RG algorithm. This is because if the data are sorted according to women then we need to fit 279 dummy variables for donors and 1328 dummy variables for donations. Alternatively, if we sort the data according to donations within donors we have to fit 1901 dummy variables for women. Either way, the size of these data matrices causes problems of insufficient memory. Even if these memory problems can be worked around, the numerical instability of the constraining procedure, which attempts to constrain over a thousand separately estimated variances to be equal, causes the adapted IGLS algorithm to fail to converge.

After inclusion of covariates there is considerably more variation in the probability of a successful insemination attributable to the women hierarchy than the donor hierarchy. Both the AIP and MCMC methods give similar estimates for all parameters. The fixed effect estimates show that the probability of conception is increased with azoospermia and increased sperm quality, count and motility but decreased with the age of the woman and with inseminations that are too early or late.
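To give a feel for the size of the fixed effects on the probability scale, the following minimal sketch applies the inverse logit to a few of the MCMC estimates in table 1.6. It assumes that all other covariates are 0 and that all random effects are set to 0, so the figures describe one illustrative cycle, not a population average; the function and variable names are ours.

```python
import math

def inv_logit(x: float) -> float:
    """Map a value on the logit scale to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# MCMC fixed effect estimates copied from table 1.6.
BETA_INTERCEPT = -3.92
BETA_AZOO = 0.21
BETA_TOO_EARLY = -0.69

# Reference cycle: every covariate 0 and every random effect 0.
p_reference = inv_logit(BETA_INTERCEPT)
# Same cycle with azoo = 1.
p_azoo = inv_logit(BETA_INTERCEPT + BETA_AZOO)
# Cycle with azoo = 1 and the insemination made too early.
p_azoo_early = inv_logit(BETA_INTERCEPT + BETA_AZOO + BETA_TOO_EARLY)

print(f"reference cycle:        {p_reference:.4f}")
print(f"azoo = 1:               {p_azoo:.4f}")
print(f"azoo = 1 and too early: {p_azoo_early:.4f}")
```

The reference probability comes out at roughly 2% per cycle, which makes clear that the effects in table 1.6 act on a small baseline.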

3. MULTIPLE MEMBERSHIP MODELS

As we have seen from the previous section, allowing classifications to be crossed gives rise to a large family of additional model structures that can be estimated. The other main restriction of the basic multilevel model is the need for observations to belong to a unique classification unit, i.e. every pupil belongs to a particular class, every patient is treated at a particular hospital. Often, however, over time a patient may be treated at several hospitals and, depending on the response of interest, all of these hospitals may have influence. In this section we will firstly introduce the idea of multiple membership and give some example scenarios where it may occur. We will then discuss the possible estimation procedures that can be used to fit multiple membership models and finish the chapter with a simulated example from the field of education.

3.1 A BASIC STRUCTURE FOR TWO LEVEL MULTIPLE MEMBERSHIPS

Suppose we have data on a large number of patients that attend their local hospital and during the course of their hospital stay they are treated by several nurses, and we regard the nurses as an important factor in the patient's outcome of interest. Now typically each patient will be seen by more than one nurse during their stay (although some will only see 1) but there are many nurses and so we will treat nurses as a random classification rather than as fixed effects. To illustrate this, table 1.7 shows the nurses seen by the first 4 patients.

We can consider this structure in a unit diagram as shown in figure 1.9. Here each line in the diagram corresponds to a tick mark in the table. Again, as our dataset gets larger such unit diagrams become impractical as there will be too many nodes and so we will resort to using the classification diagrams introduced earlier for cross-classified models. If we wish to include multiple membership classifications in such diagrams we use the convention of a double arrow to represent multiple membership. This will lead to the classification diagram shown in figure 1.10 for the above patients and nurses example.

Table 1.7 Table of patients that are seen by multiple nurses.

             Nurse 1    Nurse 2    Nurse 3
Patient 1    ✓                     ✓
Patient 2    ✓
Patient 3               ✓          ✓
Patient 4    ✓          ✓

[Diagram nodes: patients P1, P2, P3, P4 and nurses N1, N2, N3; a line joins each patient to each nurse seen]

Figure 1.9 Unit Diagram for multiple membership patients within nurses example.

3.1.1 Example scenarios. Many studies have a multiple membership structure; here are a few examples :

Education : pupils change school/class over the course of their education and each school/class has an effect on their education.

Health : patients are seen by several doctors and nurses during the course of their treatment.

Survey data : over their lifetime individuals move household and each household has a bearing on their lifestyle, health, salary etc.

3.1.2 Constructing a statistical model. Returning to our example of patients being seen by multiple nurses, we have patient 1's response being affected by nurse 1 and nurse 3 while patient 2 is only affected by nurse 1. As we are treating nurse as a random classification we would like each patient's response to have equal effect on the nurse classification variance so we generally weight the random effects to sum to 1. For example let's assume patient 1 has been treated by nurse 1 for 2 days and nurse 3 for 1 day. Then we may give nurse 1 a weight of 2/3 and nurse 3 a weight of 1/3. Often we do not have information on the amount of time patients are seen by each nurse and so we commonly allocate equal weights (in this case 1/2) to each nurse.

[Diagram nodes: Patient, Nurse]

Figure 1.10 Classification Diagram for multiple membership patients within nurses example.

We can then write down a general two level multiple-membership model as

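\[
\begin{aligned}
y_i &= X_i\beta + \sum_{j \in \mathrm{nurse}(i)} w^{(2)}_{i,j}\, u^{(2)}_j + e_i \\
u^{(2)}_j &\sim N(0, \sigma^2_{u(2)}), \quad e_i \sim N(0, \sigma^2_e)
\end{aligned}
\]

Here $\mathrm{nurse}(i)$ is the set of nurses seen by patient $i$ and $w^{(2)}_{i,j}$ is the weight given to nurse $j$ for patient $i$. We assume that $\sum_{j \in \mathrm{nurse}(i)} w^{(2)}_{i,j} = 1 \;\forall i$.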

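If we wish to write out this model for the first four patients from the example we get

\[
\begin{aligned}
y_1 &= X_1\beta + \tfrac{1}{2} u^{(2)}_1 + \tfrac{1}{2} u^{(2)}_3 + e_1 \\
y_2 &= X_2\beta + u^{(2)}_1 + e_2 \\
y_3 &= X_3\beta + \tfrac{1}{2} u^{(2)}_2 + \tfrac{1}{2} u^{(2)}_3 + e_3 \\
y_4 &= X_4\beta + \tfrac{1}{2} u^{(2)}_1 + \tfrac{1}{2} u^{(2)}_2 + e_4
\end{aligned}
\]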
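The four equations above can be collected into a single weight matrix. The sketch below builds that matrix for the patients of table 1.7 with equal weights and shows the weighted sum of nurse effects for each patient. The nurse effect values in `u` are hypothetical numbers chosen only for illustration.

```python
import numpy as np

# Nurses seen by each of the first four patients (table 1.7), 1-based labels.
nurses_seen = {1: [1, 3], 2: [1], 3: [2, 3], 4: [1, 2]}
n_patients, n_nurses = 4, 3

# W[i, j] holds the weight of nurse j for patient i; equal weights are
# used within each patient, so every row sums to 1.
W = np.zeros((n_patients, n_nurses))
for patient, nurses in nurses_seen.items():
    for nurse in nurses:
        W[patient - 1, nurse - 1] = 1.0 / len(nurses)

# Hypothetical nurse effects, used only to show the weighted sum.
u = np.array([0.4, -0.2, 0.1])
nurse_contribution = W @ u

print(W)
print(nurse_contribution)
```

If time spent with each nurse were known, the equal weights could be replaced row by row with proportions such as 2/3 and 1/3, as long as each row still sums to 1.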

3.2 ESTIMATION ALGORITHMS

There are two main algorithms for multiple membership models: an adaptation of the Rasbash and Goldstein, 1994 algorithm described earlier and the MCMC method. The AIP method has not been extended to cater for multiple membership models.

3.2.1 An IGLS algorithm for multiple membership models. Earlier we described how to fit a cross-classified model by absorbing one of the cross-classifications into a set of dummy variables (the RG method). A slight modification is required to allow this technique to be used to fit multiple membership models. First let's consider a two level hierarchical model for patients within nurses:

\[
\begin{aligned}
y_i &= \beta_0 + u^{(2)}_{\mathrm{nurse}(i)} + e_i \\
u^{(2)}_{\mathrm{nurse}(i)} &\sim N(0, \sigma^2_{u(2)}), \quad e_i \sim N(0, \sigma^2_e).
\end{aligned}
\]

We can reparameterise this simple two level model as

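\[
y_i = \beta_0 + z_{i,1} u^{(2)}_{\mathrm{nurse}(i),1} + z_{i,2} u^{(2)}_{\mathrm{nurse}(i),2} + z_{i,3} u^{(2)}_{\mathrm{nurse}(i),3} + \ldots + z_{i,J} u^{(2)}_{\mathrm{nurse}(i),J} + e_i
\]

\[
\begin{bmatrix}
u^{(2)}_{\mathrm{nurse}(i),1} \\
u^{(2)}_{\mathrm{nurse}(i),2} \\
u^{(2)}_{\mathrm{nurse}(i),3} \\
\vdots \\
u^{(2)}_{\mathrm{nurse}(i),J}
\end{bmatrix}
\sim N(0, \Omega_{u(2)}), \quad
\Omega_{u(2)} =
\begin{bmatrix}
\sigma^2_{u(2),1} & 0 & 0 & \cdots & 0 \\
0 & \sigma^2_{u(2),2} & 0 & \cdots & 0 \\
0 & 0 & \sigma^2_{u(2),3} & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \sigma^2_{u(2),J}
\end{bmatrix}, \quad
e_i \sim N(0, \sigma^2_e)
\]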

where $z_{i,j}$ is a dummy variable which is 1 if patient $i$ is seen by nurse $j$, 0 otherwise, and $J$ is the total number of nurses. Also we add the constraint $\sigma^2_{u(2),1} = \sigma^2_{u(2),2} = \ldots = \sigma^2_{u(2),J}$. Now these two models will deliver the same estimates, however the second formulation will take much longer to compute. The advantage of the second model formulation is that it is straightforward to extend it to the multiple membership case. Suppose patients are not nested within a single nurse but are multiple members of nurses with membership probabilities $\pi_{i,j}$, which play the role of the weights $w^{(2)}_{i,j}$ introduced earlier. We can simply replace $z_{i,j}$ with $\pi_{i,j}$ in the second formulation and estimation can proceed in an identical fashion but will now deliver estimates for the multiple membership model.
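The effect of swapping the 0/1 dummies for membership probabilities can be seen in the covariance matrix of the responses that the constrained formulation implies. The sketch below is illustrative only and is not the RG or IGLS implementation: it uses the standard result that, with independent nurse effects of common variance, the responses have covariance equal to the nurse variance times the design matrix times its transpose, plus the residual variance on the diagonal. The variance values and the nested allocation in `Z` are hypothetical.

```python
import numpy as np

sigma2_u, sigma2_e = 0.5, 1.0  # hypothetical variance values

# Hierarchical case: each patient is seen by exactly one nurse (0/1 dummies).
Z = np.array([[1, 0, 0],
              [1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]], dtype=float)

# Multiple membership case: rows hold membership probabilities summing to 1
# (the four patients of table 1.7 with equal weights).
P = np.array([[0.5, 0.0, 0.5],
              [1.0, 0.0, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.5, 0.0]])

def marginal_cov(design, sigma2_u, sigma2_e):
    """Covariance of y implied by independent nurse effects of equal variance."""
    n = design.shape[0]
    return sigma2_u * (design @ design.T) + sigma2_e * np.eye(n)

V_nested = marginal_cov(Z, sigma2_u, sigma2_e)
V_mm = marginal_cov(P, sigma2_u, sigma2_e)

print(V_nested)
print(V_mm)
```

In the nested case two patients covary only if they share their single nurse. In the multiple membership case patients covary whenever their nurse sets overlap, and a patient whose weight is spread over two nurses has a smaller total variance than one seen by a single nurse.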

3.2.2 MCMC. Once again we will use a Gibbs sampling algorithm that relies on updating groups of parameters in turn from their conditional posterior distributions. For illustration we present the steps for the following simple multiple membership model based on the variance components model for patients within nurses described earlier. We once again refer the interested reader to Browne et al., 2000 for more general algorithms and note that if the response is dichotomous or a count then, as in chapter 3, we can use the Metropolis-Gibbs hybrid method discussed there.

The basic two level multiple membership model (patients within nurses) can be written as :

\[
\begin{aligned}
y_i &= X_i\beta + \sum_{j \in \mathrm{nurse}(i)} w^{(2)}_{i,j}\, u^{(2)}_j + e_i \\
u^{(2)}_j &\sim N(0, \sigma^2_{u(2)}), \quad e_i \sim N(0, \sigma^2_e)
\end{aligned}
\]

We can split our unknown parameters into 4 distinct sets : the fixed effects, $\beta$, the nurse random effects, $u^{(2)}_j$, the nurse level variance, $\sigma^2_{u(2)}$, and the patient level residual variance, $\sigma^2_e$.

We then need to generate random draws from the conditional distribution of each of these four groups of unknowns. We will define prior distributions for our unknown parameters as follows: for generality we will use a multivariate Normal prior for the fixed effects, $\beta \sim N_{p_f}(\mu_p, S_p)$, and scaled inverse $\chi^2$ priors for the two variances. For the nurse level variance $\sigma^2_{u(2)} \sim SI\chi^2(\nu_2, s^2_2)$, and for the patient level variance $\sigma^2_e \sim SI\chi^2(\nu_e, s^2_e)$.

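The steps are then as follows:

In step 1 of the algorithm the conditional posterior distribution in the Gibbs update for the fixed effects parameter vector $\beta$ is multivariate normal with dimension $p_f$ (the number of fixed effects) :

\[
p(\beta \mid y, u^{(2)}, \sigma^2_{u(2)}, \sigma^2_e) \sim N_{p_f}(\hat{\beta}, \hat{D}), \quad \text{where}
\]

\[
\hat{D} = \left[ \frac{\sum_{i=1}^{N} X_i^T X_i}{\sigma^2_e} + S_p^{-1} \right]^{-1}
\quad \text{and} \quad
\hat{\beta} = \hat{D} \left[ \frac{\sum_{i} X_i^T d_i}{\sigma^2_e} + S_p^{-1} \mu_p \right],
\]

where

\[
d_i = y_i - \sum_{j \in \mathrm{nurse}(i)} w^{(2)}_{i,j}\, u^{(2)}_j .
\]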

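In step 2 we update the nurse residuals, $u^{(2)}_k$, using Gibbs sampling with a univariate Normal full conditional distribution :

\[
p(u^{(2)}_k \mid y, \beta, \sigma^2_{u(2)}, \sigma^2_e) \sim N(\hat{u}^{(2)}_k, \hat{D}^{(2)}_k), \quad \text{where}
\]

\[
\hat{D}^{(2)}_k = \left[ \frac{\sum_{i,\, k \in \mathrm{nurse}(i)} (w^{(2)}_{i,k})^2}{\sigma^2_e} + \frac{1}{\sigma^2_{u(2)}} \right]^{-1}
\quad \text{and} \quad
\hat{u}^{(2)}_k = \hat{D}^{(2)}_k \left[ \frac{\sum_{i,\, k \in \mathrm{nurse}(i)} w^{(2)}_{i,k}\, d^{(2)}_{i,k}}{\sigma^2_e} \right],
\]

where

\[
d^{(2)}_{i,k} = y_i - X_i\beta - \sum_{j \in \mathrm{nurse}(i),\, j \neq k} w^{(2)}_{i,j}\, u^{(2)}_j .
\]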

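In step 3 we update the nurse level variance $\sigma^2_{u(2)}$ using Gibbs sampling and a Gamma full conditional distribution for $1/\sigma^2_{u(2)}$ :

\[
p(1/\sigma^2_{u(2)} \mid y, \beta, u^{(2)}, \sigma^2_e) \sim \mathrm{Gamma}\left[ \frac{n_2 + \nu_2}{2},\; \frac{1}{2}\left( \sum_{j=1}^{n_2} (u^{(2)}_j)^2 + \nu_2 s^2_2 \right) \right],
\]

where $n_2$ is the number of nurses.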

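In step 4 we update the patient level variance $\sigma^2_e$ using Gibbs sampling and a Gamma full conditional distribution for $1/\sigma^2_e$ :

\[
p(1/\sigma^2_e \mid y, \beta, u^{(2)}, \sigma^2_{u(2)}) \sim \mathrm{Gamma}\left[ \frac{N + \nu_e}{2},\; \frac{1}{2}\left( \sum_{i} e_i^2 + \nu_e s^2_e \right) \right],
\]

where $e_i = y_i - X_i\beta - \sum_{j \in \mathrm{nurse}(i)} w^{(2)}_{i,j}\, u^{(2)}_j$ is the current patient level residual.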

The above 4 steps are repeatedly sampled from in sequence to produce correlated chains of parameter estimates from which point and interval estimates can be created as in chapter 3.
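The four steps can be sketched in a few lines of NumPy. This is a minimal illustration for a Normal response, not the MLwiN implementation. It assumes a flat prior on the fixed effects (so the prior precision term in step 1 drops out) and hypothetical weak prior settings `nu_u`, `s2_u`, `nu_e`, `s2_e` for the two variances. The function name and the simulated data are ours.

```python
import numpy as np

def gibbs_multiple_membership(y, X, W, n_iter=1500, seed=0,
                              nu_u=0.002, s2_u=1.0, nu_e=0.002, s2_e=1.0):
    """Gibbs sampler for y = X beta + W u + e with a flat prior on beta.

    W[i, j] is the weight of higher level unit j for observation i.
    Returns a dict of sampled beta, sigma2_u and sigma2_e values.
    """
    rng = np.random.default_rng(seed)
    n, J = W.shape
    u = np.zeros(J)
    sigma2_u, sigma2_e = 1.0, 1.0
    XtX_inv = np.linalg.inv(X.T @ X)
    draws = {"beta": [], "sigma2_u": [], "sigma2_e": []}
    for _ in range(n_iter):
        # Step 1: fixed effects given the current random effects.
        d = y - W @ u
        beta = rng.multivariate_normal(XtX_inv @ (X.T @ d), sigma2_e * XtX_inv)
        # Step 2: each higher level residual in turn.
        resid = y - X @ beta - W @ u
        for k in range(J):
            w_k = W[:, k]
            d_k = resid + w_k * u[k]  # residual without unit k's own term
            D_k = 1.0 / (w_k @ w_k / sigma2_e + 1.0 / sigma2_u)
            u[k] = rng.normal(D_k * (w_k @ d_k) / sigma2_e, np.sqrt(D_k))
            resid = d_k - w_k * u[k]
        # Step 3: higher level variance via a Gamma draw for its inverse
        # (NumPy's gamma takes a scale, which is 1 / rate).
        sigma2_u = 1.0 / rng.gamma((J + nu_u) / 2.0,
                                   2.0 / (u @ u + nu_u * s2_u))
        # Step 4: level 1 variance from the current residuals.
        sigma2_e = 1.0 / rng.gamma((n + nu_e) / 2.0,
                                   2.0 / (resid @ resid + nu_e * s2_e))
        draws["beta"].append(beta)
        draws["sigma2_u"].append(sigma2_u)
        draws["sigma2_e"].append(sigma2_e)
    return {name: np.array(values) for name, values in draws.items()}

# Simulated data: 400 patients, 20 nurses, each patient seen by 1 or 2 nurses.
sim = np.random.default_rng(1)
n_obs, n_units = 400, 20
X = np.column_stack([np.ones(n_obs), sim.normal(size=n_obs)])
W = np.zeros((n_obs, n_units))
for i in range(n_obs):
    members = sim.choice(n_units, size=sim.integers(1, 3), replace=False)
    W[i, members] = 1.0 / len(members)
u_true = sim.normal(0.0, np.sqrt(0.5), n_units)
y = X @ np.array([1.0, 0.5]) + W @ u_true + sim.normal(0.0, 1.0, n_obs)

out = gibbs_multiple_membership(y, X, W)
burn_in = 500
print("beta means:", out["beta"][burn_in:].mean(axis=0))
print("sigma2_u mean:", out["sigma2_u"][burn_in:].mean())
print("sigma2_e mean:", out["sigma2_e"][burn_in:].mean())
```

Summing over all rows in step 2 is equivalent to summing over the patients seen by nurse k, because every other row has weight 0 in column k.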

3.2.3 Comparison of estimation methods. As in the comparison for cross-classified models there are benefits for both methods. The RG method is fairly quick but the number of level 2 units determines the size of some of the matrices involved and the number of constraints that the method has to apply. These dependencies lead to numerical instability or memory exhaustion in situations with more than a few hundred level 2 units. The MCMC methods, although again computationally slower, do not suffer from these memory problems.

3.2.4 An example analysis of a two level multiple membership model : Children moving school. We consider a simulated data example based on the problem in education of adjusting for the fact that pupils move school during the course of their studies. We will consider a study with 4059 students from 65 schools taken from Rasbash et al., 2000. The actual data in the study have each child belonging to 1 school but we will assume that over their education 10% of children moved school, so for a randomly chosen 10% of the children we will choose a second school at random. We will assume that information about when the move occurred is unavailable and so for these children we will allocate equal weights of 0.5 to each school. Browne et al., 2000 considered this as the basis for a simulation experiment by generating 1000 datasets with this structure to show the bias and coverage properties of the MCMC method. We will instead consider the true response on our modified structure. We have as a response the pupil's total (normalised) exam score in all GCSE exams taken at age 16 and as a predictor the pupil's (standardised) score in a reading test taken at age 11. As we are interested in progress from age 11-16 it makes sense to consider the effect of all schools attended in this period.

We will consider the following model

\[
\begin{aligned}
\mathrm{normexam}_i &= \beta_0 + \beta_1\, \mathrm{standlrt}_i + \sum_{j \in \mathrm{school}(i)} w^{(2)}_{i,j}\, u^{(2)}_j + e_i \\
u^{(2)}_j &\sim N(0, \sigma^2_{u(2)}), \quad e_i \sim N(0, \sigma^2_e)
\end{aligned}
\]

We fit this model using both the RG and MCMC methods and the results can be seen in table 1.8.

Table 1.8 Results for the multiple membership schools example.

Parameter                                RG RIGLS estimates    MCMC Estimates
intercept ($\beta_0$)                    0.002 (0.040)         0.003 (0.040)
LRT effect ($\beta_1$)                   0.565 (0.012)         0.565 (0.013)
School variance ($\sigma^2_{u(2)}$)      0.093 (0.018)         0.096 (0.020)
Pupil variance ($\sigma^2_e$)            0.570 (0.013)         0.571 (0.013)

From the table we can see that both methods give similar results. If we compare the results here with the results in Rasbash et al., 2000 we see only slight changes to the estimates, with the level 2 variance slightly decreased and the level 1 variance slightly increased. However in cases where there are greater amounts of multiple membership the variance estimates can be altered if this multiple membership is ignored; for example if we randomly assigned every pupil to a second school the variances change to 0.088 and 0.609 at levels 2 and 1 respectively.
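The construction of the modified membership structure can be sketched as follows. The original school allocation in `first_school` is generated at random here because the study records are not reproduced in this chapter, and we assume that the second school always differs from the first; both are assumptions of the sketch.

```python
import numpy as np

rng = np.random.default_rng(42)
n_pupils, n_schools = 4059, 65

# Hypothetical original allocation: one school per pupil.
first_school = rng.integers(0, n_schools, size=n_pupils)

# Every pupil starts with weight 1 on their own school.
W = np.zeros((n_pupils, n_schools))
W[np.arange(n_pupils), first_school] = 1.0

# Choose 10% of pupils as movers and give each a different second school,
# with equal weights of 0.5 because the timing of the move is unknown.
n_movers = round(0.10 * n_pupils)
movers = rng.choice(n_pupils, size=n_movers, replace=False)
for i in movers:
    others = np.delete(np.arange(n_schools), first_school[i])
    second_school = rng.choice(others)
    W[i, first_school[i]] = 0.5
    W[i, second_school] = 0.5

print("movers:", n_movers)
print("pupils with two schools:", int((np.count_nonzero(W, axis=1) == 2).sum()))
```

Setting `n_movers` to `n_pupils` gives the more extreme structure mentioned above, in which every pupil is assigned a second school.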

4. COMBINING MULTIPLE MEMBERSHIP AND CROSS-CLASSIFIED STRUCTURES IN A SINGLE MODEL

Consider two of our earlier examples in the field of education, firstly pupils in a crossing of primary schools and secondary schools and secondly pupils who are moving from school to school. We could assume that these two structures occur simultaneously and we will then end up with a model structure that contains both a multiple membership classification (secondary schools) and a second classification (primary schools) that is crossed with the first. This scenario can be represented by a classification diagram as in figure 1.11. Browne et al., 2000 refer to models that contain both multiple memberships and cross classifications as multiple membership multiple classification (MMMC) models.

[Diagram nodes: Pupil, P. School, S. School]

Figure 1.11 Classification Diagram for the neighbours/schools multiple membership model.

4.1 EXAMPLE SCENARIOS

Many studies have both cross-classified and multiple membership classifications in their structure; a few examples are the following :

Education : pupils can be affected by the crossing of the neighbourhood they live in and the school they attend. They could also change class over their period of education and so this multiple membership class classification will be crossed with the neighbourhood classification.

Health : patients are seen by several doctors during their treatment and may visit several hospitals. Doctors who are specialists may move from hospital to hospital and so are crossed with the hospitals.

Survey Data : individuals will belong to many households over the course of their lives and will reside in several properties. An entire household may move to a new property so households can be crossed with properties and all the households/properties can have an effect on the individual. See Goldstein et al., 2000 for more details.

Spatial Data : individuals will belong to a particular area but will also be affected by multiple neighbouring areas.

4.2 CONSTRUCTING A STATISTICAL MODEL

If we return to our example of pupils attending multiple secondary schools but coming from one primary school we need to combine the multiple membership and cross classified model structures into one model. As we are treating the secondary schools as a random classification we would like each pupil to have an equal effect on the secondary school classification so we will use weights that add to 1 when a pupil attends more than one secondary school. We will let second(i) be the list of secondary schools that child i has attended.

We can then write down a general two classification MMMC model as

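\[
\begin{aligned}
y_i &= X_i\beta + \sum_{j \in \mathrm{second}(i)} w^{(2)}_{i,j}\, u^{(2)}_j + u^{(3)}_{\mathrm{prim}(i)} + e_i \\
u^{(2)}_j &\sim N(0, \sigma^2_{u(2)}), \quad u^{(3)}_{\mathrm{prim}(i)} \sim N(0, \sigma^2_{u(3)}), \quad e_i \sim N(0, \sigma^2_e).
\end{aligned}
\]

Here $w^{(2)}_{i,j}$ is the weight given to secondary school $j$ for pupil $i$ and $\mathrm{prim}(i)$ is the primary school of pupil $i$. We assume that $\sum_{j \in \mathrm{second}(i)} w^{(2)}_{i,j} = 1 \;\forall i$. Both the RG algorithm and the MCMC method can be used to fit these models that combine both multiple membership and cross classification.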

4.3 AN EXAMPLE ANALYSIS : DANISH POULTRY FARMING

Rasbash and Browne, 2001 consider an example from veterinary epidemiology concerning the outbreaks of salmonella typhimurium in flocks of chickens in poultry farms in Denmark between 1995 and 1997. The response of interest is whether salmonella typhimurium is present in a flock and in the data collected 6.3% of flocks had the disease. At the observation level, each observation represents a flock of chickens. For each flock the response variable is whether or not there was an instance of salmonella in that flock. The basic data have a simple hierarchical structure as each flock is kept in a house on a farm until slaughter. As flocks live for a short time before they are slaughtered several flocks will stay in the same house each year. The hierarchy is as follows: 10,127 child flocks within 725 houses on 304 farms.

Each flock is created from a mixture of parent flocks (up to 6) of which there are 200 in Denmark and so we have a crossing between the child flock hierarchy and the multiple membership parent flock classification. The classification diagram can be seen in figure 1.12. We also know the exact makeup of each child flock (in terms of parent flocks) and so can use these as weights for each of the parent flocks. We are interested in assessing how much of the variability in salmonella incidence can be attributed to houses, farms and parent flocks.

There are also 4 hatcheries in which all the eggs from the parent flocks are hatched. We will therefore fit a variance components model that allows for different average rates of salmonella for each year with hatchery included in the fixed part as follows :

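\[
\begin{aligned}
\mathrm{salmonella}_i &\sim \mathrm{Bernoulli}(\pi_i) \\
\mathrm{logit}(\pi_i) &= \beta_0 + \mathrm{Y96}_i\,\beta_1 + \mathrm{Y97}_i\,\beta_2 + \mathrm{hatch2}_i\,\beta_3 + \mathrm{hatch3}_i\,\beta_4 + \mathrm{hatch4}_i\,\beta_5 \\
&\quad + u^{(2)}_{\mathrm{House}(i)} + u^{(3)}_{\mathrm{Farm}(i)} + \sum_{j \in \mathrm{P.flock}(i)} w^{(4)}_{i,j}\, u^{(4)}_j \\
u^{(2)}_{\mathrm{House}(i)} &\sim N(0, \sigma^2_{u(2)}), \quad u^{(3)}_{\mathrm{Farm}(i)} \sim N(0, \sigma^2_{u(3)}), \quad u^{(4)}_j \sim N(0, \sigma^2_{u(4)})
\end{aligned}
\tag{1.2}
\]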

[Diagram nodes: Child Flock, House, Farm, Parent Flock]

Figure 1.12 Classification diagram for the Danish poultry model.

The results of fitting model 1.2 using both the Rasbash and Goldstein method with 1st order MQL estimation and the MCMC method can be seen in table 1.9. The quasi-likelihood methods are numerically rather unstable and we could not get either 2nd order MQL or PQL to fit this model.

We can see here that there are large effects for the year the chickens were born, suggesting that salmonella was more prevalent in 1995 than the other years. The hatchery effects were also large, suggesting chickens produced in hatcheries 1 and 3 had a larger incidence of salmonella. There is a large variability for the parent flock effects and for the farm effects, which are of similar magnitude. There is less variability between houses within farms.

4.3.1 Method comparison. The MCMC results were run for 50,000 iterations after a burn-in of 20,000 (this took just under 2 hours on a 733MHz PC) as we used arbitrary starting values and so the chain took a while to converge. From table 1.9 we can see reasonable agreement between the two methods, although the fixed effects in MQL are, with the exception of the hatchery 3 effect, all smaller in magnitude, as is the farm level variance. This behaviour was shown in simulations on a nested 3 level binary response data structure in Rodriguez and Goldman, 1995 with the improvements of the MCMC method shown in Browne and Draper, 2000 and so this suggests that the MCMC results should be more accurate.

Table 1.9 Results for the Danish poultry example.

Parameter                                   1st MQL            MCMC Estimates
intercept ($\beta_0$)                       -1.862 (0.184)     -2.322 (0.213)
1996 effect ($\beta_1$)                     -1.004 (0.138)     -1.239 (0.162)
1997 effect ($\beta_2$)                     -0.852 (0.159)     -1.165 (0.187)
Hatchery 2 effect ($\beta_3$)               -1.458 (0.222)     -1.733 (0.255)
Hatchery 3 effect ($\beta_4$)               -0.250 (0.209)     -0.211 (0.252)
Hatchery 4 effect ($\beta_5$)               -1.007 (0.353)     -1.062 (0.388)
Parent flock variance ($\sigma^2_{u(4)}$)   0.892 (0.184)      0.895 (0.179)
Farm variance ($\sigma^2_{u(3)}$)           0.639 (0.121)      0.927 (0.197)
House variance ($\sigma^2_{u(2)}$)          0.206 (0.096)      0.208 (0.108)

4.4 COMPLEX RANDOM EFFECTS

Model 1.2 is essentially another variance components model but we could fit a model that has complex variation at one of the higher classifications. To illustrate this we will modify the farm level variance to account for different variability between years at the farm level, that is we replace the simple farm level random effects, $u^{(3)}_{\mathrm{Farm}(i)}$, with 3 sets of effects, one for each year. Our expanded model is then as follows :

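\[
\begin{aligned}
\mathrm{salmonella}_i &\sim \mathrm{Bernoulli}(\pi_i) \\
\mathrm{logit}(\pi_i) &= \beta_0 + \mathrm{Y96}_i\,\beta_1 + \mathrm{Y97}_i\,\beta_2 + \mathrm{hatch2}_i\,\beta_3 + \mathrm{hatch3}_i\,\beta_4 + \mathrm{hatch4}_i\,\beta_5 \\
&\quad + u^{(2)}_{\mathrm{House}(i)} + \mathrm{Y95}_i\, u^{(3)}_{\mathrm{Farm}(i),1} + \mathrm{Y96}_i\, u^{(3)}_{\mathrm{Farm}(i),2} + \mathrm{Y97}_i\, u^{(3)}_{\mathrm{Farm}(i),3} + \sum_{j \in \mathrm{P.flock}(i)} w^{(4)}_{i,j}\, u^{(4)}_j \\
u^{(2)}_{\mathrm{House}(i)} &\sim N(0, \sigma^2_{u(2)}), \quad u^{(3)}_{\mathrm{Farm}(i)} \sim N_3(0, \Omega_{u(3)}), \quad u^{(4)}_j \sim N(0, \sigma^2_{u(4)})
\end{aligned}
\tag{1.3}
\]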

Table 1.10 Estimates for the parameters in model 1.3.

Parameter                                       MCMC Estimates
intercept ($\beta_0$)                           -2.544 (0.240)
1996 effect ($\beta_1$)                         -1.149 (0.256)
1997 effect ($\beta_2$)                         -1.003 (0.293)
Hatchery 2 effect ($\beta_3$)                   -1.788 (0.265)
Hatchery 3 effect ($\beta_4$)                   -0.143 (0.252)
Hatchery 4 effect ($\beta_5$)                   -1.065 (0.383)
Parent flock variance ($\sigma^2_{u(4)}$)       0.878 (0.180)
Farm year95 variance ($\Omega_{u(3)}[1,1]$)     1.416 (0.341)
Farm 95/96 covariance ($\Omega_{u(3)}[1,2]$)    0.514 (0.262)
Farm 95/97 covariance ($\Omega_{u(3)}[1,3]$)    0.415 (0.226)
Farm year96 variance ($\Omega_{u(3)}[2,2]$)     1.239 (0.463)
Farm 96/97 covariance ($\Omega_{u(3)}[2,3]$)    0.750 (0.321)
Farm year97 variance ($\Omega_{u(3)}[3,3]$)     1.017 (0.482)
House variance ($\sigma^2_{u(2)}$)              0.271 (0.119)

The parameter estimates for this extended model are given in table 1.10. We see that the fixed effects estimates are fairly similar to model 1.2. It is interesting to see that all the covariances in the farm level variance matrix are positive. This suggests that after adjusting for other factors, if a farm has an incidence of salmonella in 1995 then it is more likely to have an incidence again in 1996 and in 1997. In fact the corresponding correlation estimates are 0.39, 0.35 and 0.67 respectively, showing that in particular there is a strong correlation between salmonella infection in farms in 1996 and 1997. The numerical instabilities of the quasi-likelihood methods mean that comparative estimates could not be calculated for this model.
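The quoted correlations can be recovered from the farm level variance matrix in table 1.10 by dividing each covariance by the product of the two standard deviations. The sketch below plugs in the reported point estimates; a correlation computed this way need not equal the posterior mean of the correlation itself, so it is only a check on the reported figures.

```python
import numpy as np

# Estimates of the farm level covariance matrix from table 1.10,
# with rows and columns ordered 1995, 1996, 1997.
omega = np.array([[1.416, 0.514, 0.415],
                  [0.514, 1.239, 0.750],
                  [0.415, 0.750, 1.017]])

# Correlation = covariance / (sd_row * sd_column).
sd = np.sqrt(np.diag(omega))
corr = omega / np.outer(sd, sd)

print(np.round(corr, 2))
```

The off-diagonal entries for 95/96, 95/97 and 96/97 agree with the values given in the text after rounding to two decimal places.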

5. CONSEQUENCES OF IGNORING NON-HIERARCHICAL STRUCTURES

Analysing only hierarchical components of populations which have additional non-nested structures has two potentially negative consequences. Firstly, the model is under-specified because there are sources of variation that have not been included in the model. This under-specification can lead to an underestimation of the standard errors of the parameters and therefore to incorrect inferences. Secondly, the variance components obtained from the simple hierarchical model, or sets of separate hierarchical models, cannot be trusted. They may change substantially if the additional non-nested structures are included in a single model. For example, we may wish to know about the relative importance of general practices and hospitals on the variation in some patient level outcome. If patients are cross-classified by hospital and general practice, we need to fit the full cross-classified model including patients, general practices and hospitals in order to address this question. Looking at two separate hierarchical analyses, one of patients within hospitals, the other of patients within general practices, is not sufficient.

A numerical example of this is shown in table 1.11 which shows results for three models fitted using the RG method to the educational attainment data from Fife in Scotland, where pupils are contained within a cross-classification of primary schools by secondary schools. Model I fits pupils within primary schools and ignores secondary school, model II fits pupils within secondary schools and ignores primary school and model III fits the cross-classification. The response is an attainment score at age 16, the explanatory variable vrq is a verbal reasoning measure taken at age 11. When one side of the cross-classification is ignored, the released variance is split between the classification left in the model and the pupil level variance, inflating both estimates. This has the most drastic effect when the primary school hierarchy is ignored; in this case (model II) the inflated estimate of the between secondary school variance is 2.5 times its standard error as opposed to 0.5 times its standard error in the full model.

Table 1.11 Effects of ignoring a cross-classified structure.

Parameter                    Model I         Model II        Model III
intercept                    5.97 (0.07)     6.02 (0.07)     5.98 (0.07)
VRQ effect                   0.16 (0.003)    0.16 (0.003)    0.16 (0.003)
primary school variance      0.28 (0.06)                     0.27 (0.06)
secondary school variance                    0.05 (0.02)     0.01 (0.02)
pupil variance               4.25 (0.10)     4.48 (0.11)     4.25 (0.10)
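The comparison of the secondary school variance with its standard error can be checked directly from the values in table 1.11. The short sketch below simply divides each estimate by its standard error; the dictionary labels are ours.

```python
# Secondary school variance estimates and standard errors from table 1.11.
secondary_variance = {
    "Model II (primary school ignored)": (0.05, 0.02),
    "Model III (full cross-classification)": (0.01, 0.02),
}

# Ratio of each estimate to its standard error.
ratios = {name: est / se for name, (est, se) in secondary_variance.items()}

for name, ratio in ratios.items():
    print(f"{name}: estimate / s.e. = {ratio:.1f}")
```

The same table shows the pupil variance rising from 4.25 to 4.48 when primary school is left out, which is the other half of the released variance.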

References

Browne, W. J. and Draper, D. (2000). A comparison of Bayesian and likelihood methods for fitting multilevel models. Submitted.

Browne, W. J., Goldstein, H., and Rasbash, J. (2000). Fitting complex model structures to large datasets: a Monte Carlo Markov chain (MCMC) algorithm to fit multiple membership multiple classification models. Submitted.

Bull, J. M., Riley, G. D., Rasbash, J., and Goldstein, H. (1999). Parallel Implementation of a Multilevel Modelling Package. Computational Statistics and Data Analysis, 31:457-474.

Clayton, D. G. and Rasbash, J. (1999). Estimation in large crossed random-effects models by data augmentation. Journal of the Royal Statistical Society, Series A, 162:425-436.

Ecochard, R. and Clayton, D. G. (1998). Multilevel modelling of conception in artificial insemination by donor. Statistics in Medicine, 17:1137-1156.

Goldstein, H. (1986). Multilevel mixed linear model analysis using iterative generalised least squares. Biometrika, 73:43-56.

Goldstein, H. (1995). Multilevel Statistical Models. Edward Arnold, London, 2nd edition.

Goldstein, H. and Rasbash, J. (1996). Improved Approximations for Multilevel Models with Binary Responses. Journal of the Royal Statistical Society, Series A, 159:505-513.

Goldstein, H., Rasbash, J., Browne, W. J., Woodhouse, G., and Poulain, M. (2000). Multilevel modelling in the study of dynamic household structures. European Journal of Population, pages -.

Lee, Y. and Nelder, J. (2000). Hierarchical Generalized linear models: a synthesis of generalized linear models, random effects models, and structured dispersion. Technical report, Department of Mathematics, Imperial College, London.

Pan, J. X. and Thompson, R. (2000). Generalized linear mixed models: An improved estimating procedure. In Bethlehem, J. G. and van der Heijden, P. G. M., editors, COMPSTAT: Proceedings in Computational Statistics, 2000, pages 373-378. Physica-Verlag.

Rasbash, J. and Browne, W. J. (2001). Non-hierarchical multilevel models. In Leyland, A. and Goldstein, H., editors, Multilevel modelling of health statistics. Wiley.

Rasbash, J., Browne, W. J., Goldstein, H., Yang, M., Plewis, I., Healy, M., Woodhouse, G., Draper, D., Langford, I., and Lewis, T. (2000). A User's Guide to MLwiN. Institute of Education, London, 2.1 edition.

Rasbash, J. and Goldstein, H. (1994). Efficient analysis of mixed hierarchical and crossed random structures using a multilevel model. Journal of Behavioural Statistics, 19:337-350.

Raudenbush, S. W. (1993). A crossed random effects model for unbalanced data with applications in cross-sectional and longitudinal research. Journal of Educational Statistics, 18:321-350.

Rodriguez, G. and Goldman, N. (1995). An Assessment of Estimation Procedures for Multilevel Models with Binary Responses. Journal of the Royal Statistical Society, Series A, 158:73-89.

Schafer, J. (1997). Analysis of Incomplete Multivariate Data. Chapman & Hall, London.

Tanner, M. and Wong, W. (1987). The calculation of posterior distributions by data augmentation (with discussion). Journal of the American Statistical Association, 82:528-550.
