Predicting Random Effects from a Finite Population of Unequal Size Clusters based

on Two-Stage Sampling

Edward J. Stanek III

Department of Public Health

401 Arnold House

University of Massachusetts

715 North Pleasant Street

Amherst, MA 01003-9304 USA

[email protected]

Julio M. Singer

Departamento de Estatística

Universidade de São Paulo

São Paulo, Brazil

[email protected]

C07ed13.doc 11/29/2007 12:37 PM i


ABSTRACT

Prediction of random effects is an important problem with expanding applications.

In the simplest context, the problem corresponds to prediction of the latent value (the

mean) of a realized cluster selected via two-stage sampling. Best linear unbiased

predictors developed from mixed models are widely used, but their development requires

distributional assumptions or an infinite population framework. When the number and

size of clusters are finite, super population models have been used to predict the

contribution of the unobserved units to a realized cluster mean. Recently, predictors

developed from a two-stage sampling model have been shown to out-perform the model-

based predictors. However, the random permutation model underlying these predictors is

limited to settings where cluster sizes and sample sizes per cluster are equal.

We present a new two-stage sampling model that can be used when cluster sizes

and sample sizes per cluster differ. The model expands the set of random variables from

the two-stage sampling model to accommodate the different size clusters. The resulting

best linear unbiased predictor outperforms any of the competing predictors, even when

clusters and sample sizes per cluster are equal. The reduction in expected mean squared

error relative to predictors developed under mixed model or super population model

assumptions is likely due to the specificity of the expanded random permutation model to

the problem. This suggests faithfully capturing the stochastic aspects of a problem is

more important than simplifying assumptions in developing optimal solutions. Many

other problems may be amenable to improved solutions based on extension of this

approach.

Contact Address:
Edward J. Stanek III
Department of Public Health
401 Arnold House
University of Massachusetts at Amherst
Amherst, Ma. 01002
Phone: 413-545-3812
Fax: 413-545-1645
Email: [email protected]

KEYWORDS: superpopulation, best linear unbiased predictor, random permutation, optimal estimation, design-based inference, mixed models.

ACKNOWLEDGEMENT. This work was developed with the support of the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP), Brazil and the National Institutes of Health (NIH-PHS-R01-HD36848, R01-HL071828-02), USA.


1. INTRODUCTION

There is often interest in the mean unit response of a cluster where response is

observed for only a subset of the cluster’s units. In its simplest form, this problem is

often addressed in the context of a mixed model, and corresponds to predicting the mean

of a realized cluster in a two stage simple random sample. There are many applications

where such prediction is of interest. Henderson (1984) was interested in predicting a

cow’s milk production, and studied lactation records of dairy cows. In the current U.S.

educational environment, high school standardized test scores are of interest, with

schools having high stakes in the results due to their use in resource allocation. The

strength of a family’s emotional support is of interest in studies of coping with

subsequent illness in aging (Yorgason et al. 2006). A patient’s biomarker level may be

important in quantifying risk of disease or subsequent intervention.

Although applications vary widely, each problem can be defined using a common

clustered population framework, where the target parameter can be defined as the average

expected response of units in a realized cluster. In the examples, a cluster may

correspond to a cow, a school, a family, or a patient, while the units may correspond to

lactation periods, students, family members, or time periods, respectively. Ideally, the

number and size of clusters is known, since the practical relevance and interpretation of

the statistical inference is enhanced by a clear problem definition.

It is rarely possible to observe the expected response on all units in each cluster.

Instead, response is observed on a subset of units for a subset of clusters. An important

question is how to use such information to guess the average response of units in a

specified cluster. To answer this question, the observed data are linked to the population

via an assumed probability model, objective criteria are defined that can be used to

evaluate competing guesses, and the best guess is then determined.

Ideally, the problem defined in the population should be identical to the problem

answered via the stochastic model. The typical use of mixed models to guess the mean

response for a cluster is an example where the problem answered via the probability

model is a slightly different problem. In a mixed model, how best to estimate a

parameter corresponding to a cluster mean is translated into the question of how best to

predict a realized random effect. The cluster identified in the population is represented as

a random variable in the stochastic model. This translation changes the problem, since

the identity of the cluster is lost in the usual representation of the stochastic model. The

difference is subtle, since the subscript that identifies a cluster in the population is usually

replaced by a subscript representing the position of a selected cluster in the sample

(Cassel, Sarndal, and Wretman 1977). For example, the first cluster in a population

listing is not the same as the first sample cluster, which we refer to as the first primary

sampling unit, or PSU (as in Stanek and Singer, 2004). A similar substitution occurs for

the subscript identifying units in a cluster, and sample units for a sample cluster, which

we refer to as secondary sampling units, or SSUs. The differences between a cluster and

a PSU are often a source of confusion in interpreting predictors of realized random

effects.

Model based approaches to predict a realized random effect are characterized by a

gap between the problem stated in terms of identifiable clusters and units in the finite

population and the random effects in a mixed model that do not identify cluster or units.

Since multiple probability models can be posited, different ‘optimal’ solutions can be

claimed for the same basic problem. Each solution suffers from the limitation that the

underlying model cannot be connected back to the identifiable finite population. Thus,

while each solution is optimal for its theoretical framework, none of the frameworks

match reality, and it is unclear which optimal solution is best.

There are other difficulties with model-based approaches that occur when cluster sizes

differ in the population. A problem occurs when trying to use a model to represent

sample SSUs in a PSU. Such a representation is important for the model to include

characteristics of units (within cluster variables) that may be related to response. The

problem occurs since vectors representing sample SSUs for a sample PSU are of different

size (corresponding to different numbers of sampled units). Interpretation of response for

an element in a sample vector then depends on the ordering of the PSUs, since vectors

representing different sample PSUs have different length. The mixed model assumption

that PSUs are selected at random conflicts with the conditioning needed to interpret

response for a SSU. In part to sidestep this difficulty, model based approaches may

assume the number of SSUs is infinitely large (as in the mixed model approaches of

Goldberger 1962; Henderson 1984; McLean, Sanders and Stroup 1991; Robinson 1991;

and McCulloch and Searle 2001), or condition on cluster size (as in the superpopulation

model based approaches of Scott and Smith 1969; Ghosh and Lahiri 1987; Bolfarine and

Zacks 1992; Valliant, Dorfman and Royall, 2000). Such assumptions widen the gap

between the problem defined in the finite population, and the solution to the problem

defined in the model. Mixed model methods are widely accepted and used in spite of

these issues, but the foundation for such methods is not completely secure.

We present a new expanded two stage random permutation (RP) model that

overcomes some of the limitations of previous probability models. A key feature of the

expanded model is simultaneously retaining the cluster identity and the PSU position in

the model, while distinguishing for a PSU the relevant contribution of sample SSUs, and

non-sampled SSUs to a target random variable such as a cluster mean. The model

extends the expanded model for simple random sampling of Stanek, Singer, and Lencina

(2004) to the two-stage RP model used by Stanek and Singer (2004), while allowing for

possible unequal size clusters. We develop the best linear unbiased predictor (BLUP) of

a PSU mean using this model, and show that the predictor has smaller mean squared error

than other predictors, including the usual mixed model predictor.

Beginning with an explicit representation of random variables underlying a two

stage RP of a finite population, we show that when predicting a PSU mean via a linear

unbiased predictor, the expanded model cannot be further reduced without loss of

information. Since the expanded model retains this information, the best linear unbiased

predictor (developed in the same manner as in Stanek and Singer (2004)) has smaller

expected MSE than any previously developed predictor (including commonly used mixed

model predictors and super population model predictors). We

characterize the extent of the reduction in the theoretical expected mean squared error,

and illustrate results for empirical predictors via simulations relative to mixed model and

super population model predictors. We conclude by highlighting model features that

have consequences for extending this work, and related work that offers promising

possibilities for future improvement.

2. AN EXPANDED RP MODEL FOR A FINITE CLUSTERED POPULATION

We first define notation and terminology for a clustered finite population. Let a finite population be defined by a listing of $M_s$ units labeled $t = 1, \ldots, M_s$ in each of $N$ clusters, labeled $s = 1, \ldots, N$, where the non-stochastic response for unit $t$ in cluster $s$ is given by $y_{st}$. We assume that the response for a unit can be observed without error, and corresponds to the expected value for the unit. The finite population parameters corresponding to the mean and variance of cluster $s$, $s = 1, \ldots, N$, are defined by

$$\mu_s = \frac{1}{M_s}\sum_{t=1}^{M_s} y_{st} \quad \text{and} \quad \left(\frac{M_s - 1}{M_s}\right)\sigma_s^2 = \frac{1}{M_s}\sum_{t=1}^{M_s}\left(y_{st} - \mu_s\right)^2$$

(where we use the survey sampling definition of the parameter $\sigma_s^2$). Similarly, the population mean, and the variance between cluster means are defined as

$$\mu = \frac{1}{N}\sum_{s=1}^{N}\mu_s \quad \text{and} \quad \left(\frac{N - 1}{N}\right)\sigma^2 = \frac{1}{N}\sum_{s=1}^{N}\left(\mu_s - \mu\right)^2 .$$

Using these parameters, we represent the potentially observable response for unit $t$ in cluster $s$ as

$$y_{st} = \mu + \beta_s + \varepsilon_{st}$$

where $\beta_s = (\mu_s - \mu)$ is the deviation of the mean for cluster $s$ from the overall mean, and $\varepsilon_{st} = (y_{st} - \mu_s)$ is the deviation of unit $t$’s response (in cluster $s$) from the mean for cluster $s$. Defining $\mathbf{y} = (\mathbf{y}_1' \;\; \mathbf{y}_2' \;\; \cdots \;\; \mathbf{y}_N')'$ where $\mathbf{y}_s = (y_{s1} \;\; y_{s2} \;\; \cdots \;\; y_{sM_s})'$, the model can be summarized as

$$\mathbf{y} = \mathbf{X}\mu + \mathbf{Z}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$$

where $\mathbf{X} = \mathbf{1}_M$, $\mathbf{Z} = \bigoplus_{s=1}^{N}\mathbf{1}_{M_s}$ is $M \times N$, $M = \sum_{s=1}^{N} M_s$, and $\boldsymbol{\beta} = (\beta_1 \;\; \beta_2 \;\; \cdots \;\; \beta_N)'$. Here, $\mathbf{1}_a$ is an $a \times 1$ column vector of ones, $\bigoplus_{s=1}^{N}\mathbf{A}_s$ denotes a block diagonal matrix with blocks given by $\mathbf{A}_s$ (Graybill 1983), and $\boldsymbol{\varepsilon}$ is defined similarly to $\mathbf{y}$. None of the terms in the model are random variables.
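The notation above can be checked numerically on a small population. The following is a minimal sketch, assuming a hypothetical three-cluster population (the response values are invented for illustration and are not from the paper). It computes the cluster and population parameters with the survey sampling divisors and confirms that the deterministic decomposition of the response vector into overall mean, cluster deviations, and unit deviations reproduces the data.

```python
import numpy as np

# Hypothetical finite population: N = 3 clusters of unequal size.
y = [np.array([4.0, 6.0]), np.array([1.0, 3.0]), np.array([7.0, 8.0, 12.0])]

N = len(y)
M_s = np.array([len(ys) for ys in y])      # cluster sizes M_s
M = M_s.sum()                              # total number of units

mu_s = np.array([ys.mean() for ys in y])   # cluster means
# Survey sampling definition: divisor M_s - 1 (and N - 1 between clusters).
sigma2_s = np.array([ys.var(ddof=1) for ys in y])
mu = mu_s.mean()                           # unweighted mean of cluster means
sigma2 = mu_s.var(ddof=1)

beta = mu_s - mu                           # cluster deviations
eps = np.concatenate([ys - m for ys, m in zip(y, mu_s)])  # unit deviations

# X is a vector of ones; Z is block diagonal with blocks 1_{M_s}.
X = np.ones(M)
Z = np.zeros((M, N))
row = 0
for s, size in enumerate(M_s):
    Z[row:row + size, s] = 1.0
    row += size

y_vec = np.concatenate(y)
reconstructed = X * mu + Z @ beta + eps
print(mu_s, sigma2_s, mu, sigma2)
print(np.allclose(reconstructed, y_vec))
```

Nothing in this sketch is random, which mirrors the remark that none of the terms in the population model are random variables.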

2.1. Random Variables and The Two Stage Random Permutation Model

We explicitly define a vector of random variables that represents a two stage RP of the population. Assuming that each realization of the permutation is equally likely, with probability $\left(N!\prod_{s=1}^{N} M_s!\right)^{-1}$, the random variables formally represent two-stage sampling (Cochran 1977). We assume that the sample clusters are in the first $n$ positions in a permutation of clusters. Similarly, we assume that the sample units in cluster $s$ correspond to the units in the first $m_s$ positions in a permutation of the cluster’s units. Note that these definitions represent random variables as a sequence as opposed to the more usual set notation.
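The sampling mechanism described here can be sketched in a few lines. The example below is an illustration only, assuming a hypothetical population, hypothetical sample sizes, and an arbitrary seed: it draws one two stage random permutation, reads off the sample as the first $n$ clusters and the first $m_s$ units in each sampled cluster, and counts the equally likely permutations, $N!\prod_s M_s!$.

```python
import math
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

# Hypothetical population: cluster label -> unit responses.
y = {1: [4.0, 6.0], 2: [1.0, 3.0], 3: [7.0, 8.0, 12.0]}
n = 2                      # number of sample clusters (PSUs)
m = {1: 1, 2: 1, 3: 2}     # units sampled from cluster s if it is selected

def two_stage_permutation(y, rng):
    """Permute clusters, then independently permute units within each cluster."""
    labels = list(y)
    cluster_order = [labels[k] for k in rng.permutation(len(labels))]
    return [(s, [y[s][k] for k in rng.permutation(len(y[s]))])
            for s in cluster_order]

perm = two_stage_permutation(y, rng)

# The sample is the first n PSUs and the first m_s SSUs within each of them.
sample = [(s, units[:m[s]]) for s, units in perm[:n]]

# Number of equally likely two stage permutations: N! * prod_s M_s!
n_permutations = math.factorial(len(y)) * math.prod(
    math.factorial(len(units)) for units in y.values())

print(perm)
print(sample)
print(n_permutations)
```

Representing the permutation as an ordered list, rather than a set, matches the sequence representation noted in the text.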

When all clusters are of equal size such that $M_s = M$ for all $s = 1, \ldots, N$, Stanek and Singer (2004) defined indicator random variables to explicitly represent a random vector of dimension $NM \times 1$ corresponding to a two-stage permutation of the population. We follow a similar strategy when cluster sizes differ, but note two differences. First, to retain the identity of clusters and PSUs in the vector representing a RP, we expand the number of random variables from $\sum_{s=1}^{N} M_s$ to $N\sum_{s=1}^{N} M_s$. Second, we include a set of fixed known weights associated with the SSUs in a cluster. Different target parameters (such as the cluster mean, total, or a cluster regression parameter) can be formed easily by changing the weights.

The expanded model uses a larger set of random variables than the random

variables used in Stanek and Singer’s model, or in super-population model approaches.

These random variables are fewer than the random variables that would result from a

further expansion that would retain the identity of units and SSUs, and even fewer than

the very general representation of the model used by Godambe (1955). We attribute the

better efficiency of predictors (to be shown) under the expanded model in part to the

larger number of random variables used to represent the basic problem. Also notice that

the two-stage RP model faithfully represents a two-stage cluster sampling design, so that

the methods are design based. We define the weighted expanded random variables,

and illustrate the need for the expansion next.

Sample indicator random variables relate the response for unit $t$ in cluster $s$, $y_{st}$, to the response for a unit in position $j$ of a cluster in position $i$ in a two stage permutation of clusters and units. We define $U_{jt}^{(s)}$ as an indicator random variable that takes on a value of one when SSU $j$ in cluster $s$ is unit $t$, and zero otherwise, and use it to represent the random variable corresponding to SSU $j$ in cluster $s$ given by $Y_{sj} = \sum_{t=1}^{M_s} U_{jt}^{(s)} y_{st}$. Let $w_{sj}$ denote a fixed non-stochastic weight for $s = 1, \ldots, N$, $j = 1, \ldots, M_s$, and define the weighted response as $Y_{wsj} = w_{sj} Y_{sj}$. For example, $\sum_{j=1}^{M_s} Y_{wsj}$ corresponds to a cluster total when $w_{sj} = 1$ for all $j = 1, \ldots, M_s$, or a cluster mean when $w_{sj} = \frac{1}{M_s}$ for all $j = 1, \ldots, M_s$. Notice that we can define $Y_{wsj} = w_{sj}\,\mathbf{y}_s'\mathbf{U}_j^{(s)}$ where $\mathbf{U}_j^{(s)} = \left(U_{j1}^{(s)} \;\; U_{j2}^{(s)} \;\; \cdots \;\; U_{jM_s}^{(s)}\right)'$, and $\mathbf{Y}_{ws} = \left(Y_{ws1} \;\; Y_{ws2} \;\; \cdots \;\; Y_{wsM_s}\right)'$. The vector $\mathbf{Y}_{ws}$ represents a permutation of weighted response for SSUs in cluster $s$.

A permutation of clusters is defined using the indicator random variables $U_{is}$ for $i = 1, \ldots, N$ and $s = 1, \ldots, N$, that take on a value of one when PSU $i$ is cluster $s$, and a value of zero otherwise. If all clusters were equal in size, we could represent a permutation of SSUs for PSU $i$ by $\sum_{s=1}^{N} U_{is}\mathbf{Y}_{ws}$. When cluster sizes differ, elements in this expression are defined only when $j \le \min(M_s, s = 1, \ldots, N)$. Mixed model and superpopulation model representations of two stage cluster sampling do not formally account for this range restriction in linking the random variables to the finite population.

We directly account for different size clusters, and avoid the requirement of range constraints for subscripts by expanding the number of random variables used to represent a permutation of clusters and units. For PSU $i$, the expanded random variables are defined by the $M \times 1$ vector

$$\mathbf{Y}_{wi} = \left(\left(U_{is}\mathbf{Y}_{ws}'\right)\right)' = \left(U_{i1}\mathbf{Y}_{w1}' \;\; U_{i2}\mathbf{Y}_{w2}' \;\; \cdots \;\; U_{iN}\mathbf{Y}_{wN}'\right)',$$

with a two stage random permutation of the population represented by the $NM \times 1$ vector, $\mathbf{Y}_w = \left(\left(\mathbf{Y}_{wi}'\right)\right)' = \left(\mathbf{Y}_{w1}' \;\; \mathbf{Y}_{w2}' \;\; \cdots \;\; \mathbf{Y}_{wN}'\right)'$. An element of $\mathbf{Y}_w$ is given by $U_{is} Y_{wsj}$.

A simple example illustrates the notation. Suppose the population consists of three clusters ($N = 3$), where the first two clusters have two units ($M_1 = M_2 = 2$), the third cluster has three units ($M_3 = 3$), $M = 7$, and $w_{sj} = 1$ for all $s = 1, \ldots, N$, $j = 1, \ldots, M_s$. We represent a permutation of units for cluster $s$ by $\mathbf{Y}_{ws}$, $s = 1, \ldots, 3$. Suppose the first permutation of clusters results in clusters $s = 1$, $s = 2$, and $s = 3$ in positions $i = 1, \ldots, 3$ respectively, while a second permutation results in clusters $s = 3$, $s = 2$, and $s = 1$ in positions $i = 1, \ldots, 3$ respectively. The representation of the random variables realized by the first permutation of PSUs is the random vector $\left(\mathbf{Y}_{w1}' \;\; \mathbf{Y}_{w2}' \;\; \mathbf{Y}_{w3}'\right)'$, and by the second permutation of PSUs is the random vector $\left(\mathbf{Y}_{w3}' \;\; \mathbf{Y}_{w2}' \;\; \mathbf{Y}_{w1}'\right)'$. Although both vectors are of dimension $M \times 1$, the third SSU in the first permutation is in PSU $i = 2$, while the third SSU in the second permutation is in PSU $i = 1$. The position of a SSU in the permuted population is not sufficient to retain the identity of the PSU for the SSU. In contrast, using the expanded random variable representation, the random variables realized by the first permutation of PSUs are represented by

$$\left(\mathbf{Y}_{w1}' \;\; \mathbf{0}_2' \;\; \mathbf{0}_3' \;\; \mathbf{0}_2' \;\; \mathbf{Y}_{w2}' \;\; \mathbf{0}_3' \;\; \mathbf{0}_2' \;\; \mathbf{0}_2' \;\; \mathbf{Y}_{w3}'\right)',$$

while those realized by the second permutation are represented by

$$\left(\mathbf{0}_2' \;\; \mathbf{0}_2' \;\; \mathbf{Y}_{w3}' \;\; \mathbf{0}_2' \;\; \mathbf{Y}_{w2}' \;\; \mathbf{0}_3' \;\; \mathbf{Y}_{w1}' \;\; \mathbf{0}_2' \;\; \mathbf{0}_3'\right)',$$

where $\mathbf{0}_a$ is an $a \times 1$ vector with all elements equal to zero. This notation preserves the identity of the PSU for each SSU, avoiding the ambiguity that arises in mixed models and superpopulation models.
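The three-cluster example can be reproduced directly. The sketch below is illustrative only: the numeric values are hypothetical labels (10 times the cluster number plus the SSU position) chosen so that each element can be traced, and the helper names `permuted`, `expanded`, and `psu_of` are not from the paper. It shows that the third element of the ordinary permuted vector belongs to different PSUs under the two permutations, while in the expanded vector a position always maps to the same PSU.

```python
import numpy as np

# Cluster sizes from the example: M_1 = M_2 = 2, M_3 = 3 (zero-based labels below).
# Values encode (cluster, SSU position) as 10*s + j, purely for readability.
Y_w = [np.array([11.0, 12.0]),
       np.array([21.0, 22.0]),
       np.array([31.0, 32.0, 33.0])]
N = len(Y_w)
M = sum(len(v) for v in Y_w)   # total number of units, 7

def permuted(order):
    """Ordinary representation: concatenate clusters in PSU order (length M)."""
    return np.concatenate([Y_w[s] for s in order])

def expanded(order):
    """Expanded representation: one block per (PSU i, cluster s) pair (length N*M).

    The block holds cluster s's responses if PSU i is cluster s, zeros otherwise.
    """
    blocks = []
    for i in range(N):
        for s in range(N):
            blocks.append(Y_w[s] if order[i] == s else np.zeros(len(Y_w[s])))
    return np.concatenate(blocks)

def psu_of(position):
    """PSU (1-based) owning a zero-based position in the expanded vector."""
    return position // M + 1

first = [0, 1, 2]    # clusters 1, 2, 3 in PSU positions 1, 2, 3
second = [2, 1, 0]   # clusters 3, 2, 1 in PSU positions 1, 2, 3

print(permuted(first)[2], permuted(second)[2])   # third SSU: cluster 2 vs cluster 3
print(expanded(first))
print(expanded(second))
print([psu_of(p) for p in np.flatnonzero(expanded(second))])
```

The zero blocks are the price of the expansion: the vector grows from 7 to 21 elements, but the PSU of every element is recoverable from its position alone.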

2.2 A Mixed Effect Model for the Expanded Random Variables

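We represent a mixed model for the expanded RP model, indexing expectation with respect to permutations of clusters with the subscript $\xi_1$ and expectation with respect to permutations of units in a cluster with the subscript $\xi_2$. For PSU $i$, we express

$$\mathbf{Y}_{wi} = E_{\xi_1\xi_2}\left(\mathbf{Y}_{wi}\right) + \left[E_{\xi_2|\xi_1}\left(\mathbf{Y}_{wi}\right) - E_{\xi_1\xi_2}\left(\mathbf{Y}_{wi}\right)\right] + \mathbf{E}_{wi}$$

where $E_{\xi_1\xi_2}\left(\mathbf{Y}_{wi}\right) = \frac{1}{N}\left(\bigoplus_{s=1}^{N}\mathbf{w}_s\right)\boldsymbol{\mu}$, $E_{\xi_2|\xi_1}\left(\mathbf{Y}_{wi}\right) = \left(\bigoplus_{s=1}^{N}\mu_s\mathbf{w}_s\right)\mathbf{U}_i$, $\mathbf{E}_{wi} = \mathbf{Y}_{wi} - E_{\xi_2|\xi_1}\left(\mathbf{Y}_{wi}\right)$, $\mathbf{w}_s = \left(\left(w_{sj}\right)\right) = \left(w_{s1} \;\; w_{s2} \;\; \cdots \;\; w_{sM_s}\right)'$, $\boldsymbol{\mu} = \left(\left(\mu_s\right)\right) = \left(\mu_1 \;\; \mu_2 \;\; \cdots \;\; \mu_N\right)'$ and $\mathbf{U}_i = \left(\left(U_{is}\right)\right) = \left(U_{i1} \;\; U_{i2} \;\; \cdots \;\; U_{iN}\right)'$. The fixed effects are given by $\boldsymbol{\mu}$, the vector of cluster means. The random effects, $E_{\xi_2|\xi_1}\left(\mathbf{Y}_{wi}\right) - E_{\xi_1\xi_2}\left(\mathbf{Y}_{wi}\right)$, are defined as the deviation from the fixed effects of the expected response conditional on a realized PSU. In the RP model of Stanek and Singer (2004), the random effect for the mean of PSU $i$ was defined explicitly as $\sum_{s=1}^{N} U_{is}\beta_s$, with the random variables $U_{is}$ explicitly linking the clusters to PSU $i$. In the expanded RP model, random effects are defined for SSU $j$ in PSU $i$ as $\mu_s w_{sj}\left(U_{is} - E_{\xi_1}\left(U_{is}\right)\right)$. For both models, the expected value of the random effect (with respect to $\xi_1$) is zero, but this result arises from quite different circumstances. The deviation of response from the expected response within a PSU is represented by $\mathbf{E}_{wi}$. We combine these expressions arriving at the expanded RP mixed model given by

$$\mathbf{Y}_w = \left[\mathbf{1}_N \otimes \frac{1}{N}\left(\bigoplus_{s=1}^{N}\mathbf{w}_s\right)\right]\boldsymbol{\mu} + \left[\mathbf{I}_N \otimes \left(\bigoplus_{s=1}^{N}\mu_s\mathbf{w}_s\right)\right]\operatorname{vec}\left(\mathbf{U} - E_{\xi_1}\left(\mathbf{U}\right)\right) + \mathbf{E} \qquad (1)$$

where $\mathbf{U} = \left(\mathbf{U}_1 \;\; \mathbf{U}_2 \;\; \cdots \;\; \mathbf{U}_N\right)$. The variance of random effects is given by

$$\operatorname{var}_{\xi_1\xi_2}\left(\left[\mathbf{I}_N \otimes \left(\bigoplus_{s=1}^{N}\mu_s\mathbf{w}_s\right)\right]\operatorname{vec}\left(\mathbf{U} - E_{\xi_1}\left(\mathbf{U}\right)\right)\right) = \frac{1}{N-1}\,\mathbf{P}_N \otimes \left[\left(\bigoplus_{s=1}^{N}\mu_s\mathbf{w}_s\right)\mathbf{P}_N\left(\bigoplus_{s=1}^{N}\mu_s\mathbf{w}_s'\right)\right]$$

while

$$\operatorname{var}_{\xi_1\xi_2}\left(\mathbf{E}\right) = \mathbf{I}_N \otimes \bigoplus_{s=1}^{N}\left[\frac{\sigma_s^2}{N}\left(\bigoplus_{j=1}^{M_s} w_{sj}\right)\mathbf{P}_{M_s}\left(\bigoplus_{j=1}^{M_s} w_{sj}\right)\right],$$

where $\mathbf{P}_a = \mathbf{I}_a - \frac{1}{a}\mathbf{J}_a$ and $\mathbf{J}_a$ denotes an $a \times a$ matrix with all elements equal to one.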
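The variance of the random effects rests on the covariance structure of the cluster permutation indicators $U_{is}$. As a hedged numerical check (not part of the original development), the sketch below enumerates every permutation of $N = 3$ clusters and confirms the standard result that the covariance matrix of the stacked indicators equals $\frac{1}{N-1}\mathbf{P}_N \otimes \mathbf{P}_N$ with $\mathbf{P}_N = \mathbf{I}_N - \frac{1}{N}\mathbf{J}_N$. The array layout (row $i$ holds $\mathbf{U}_i'$, flattened row by row) is a choice made for this example.

```python
import itertools
import numpy as np

N = 3
P = np.eye(N) - np.ones((N, N)) / N   # P_N = I_N - J_N / N

# Enumerate all N! equally likely cluster permutations.
vecs = []
for perm in itertools.permutations(range(N)):
    U = np.zeros((N, N))
    U[np.arange(N), perm] = 1.0       # U[i, s] = 1 when PSU i is cluster s
    vecs.append(U.flatten())          # stacks U_1, U_2, ..., U_N
vecs = np.array(vecs)

mean = vecs.mean(axis=0)              # E(U_is) = 1 / N for every i, s
centered = vecs - mean
cov = centered.T @ centered / len(vecs)   # exact covariance over the design

target = np.kron(P, P) / (N - 1)
print(np.allclose(mean, 1.0 / N))
print(np.allclose(cov, target))
```

Pre- and post-multiplying this covariance by the fixed weight matrices is what produces the Kronecker form of the random effects variance.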

2.3. Defining Random Variables of Interest

Model (1) is an expanded version of a mixed model that retains the identity of clusters, while accounting for a two stage RP. Our interest is in predicting a linear combination of these random variables defined by $T = \mathbf{g}'\mathbf{Y}_w$, where $\mathbf{g}$ is non-stochastic. Although many linear combinations are possible, we limit the discussion to linear combinations given by $T = \mathbf{g}'\mathbf{Y}_w$ where $\mathbf{g}' = \mathbf{c}' \otimes \mathbf{1}_M'$ and $\mathbf{c}$ is an $N \times 1$ vector of constants. In particular, we focus on the setting where $\mathbf{c} = \mathbf{e}_i$ is an $N \times 1$ vector with all elements equal to zero, except for element $i$ which has the value of one. Of principal interest is the setting where $i \le n$, such that the cluster of interest is realized in the sample. When $w_{sj} = \frac{1}{M_s}$ for all $s = 1, \ldots, N$, $j = 1, \ldots, M_s$, the target random variable, $T = \sum_{s=1}^{N} U_{is}\left(\sum_{j=1}^{M_s} w_{sj} Y_{sj}\right)$, is the mean of PSU $i$; when $w_{sj} = 1$ for all $s = 1, \ldots, N$, $j = 1, \ldots, M_s$, the target random variable, $T = \sum_{s=1}^{N} U_{is}\left(\sum_{j=1}^{M_s} w_{sj} Y_{sj}\right)$, is the total of PSU $i$.
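To make the target concrete, the following sketch evaluates $T = \mathbf{g}'\mathbf{Y}_w$ for one realized permutation. It assumes a hypothetical population and a hypothetical realized permutation, takes $\mathbf{g}$ to be $\mathbf{e}_i$ Kronecker a vector of ones whose length is the total number of units, and uses a helper `expanded_weighted` that is not from the paper. With weights $1/M_s$ the result is the mean of the cluster in PSU position $i$; with unit weights it is that cluster's total.

```python
import numpy as np

# Hypothetical finite population (zero-based cluster labels).
y = [np.array([4.0, 6.0]), np.array([1.0, 3.0]), np.array([7.0, 8.0, 12.0])]
N = len(y)
M = sum(len(ys) for ys in y)

def expanded_weighted(order, unit_orders, weights):
    """Realized expanded vector Y_w for one two stage permutation.

    order[i] is the cluster in PSU position i; unit_orders[s] permutes the units
    of cluster s; weights[s] is a common weight w_s for all SSUs of cluster s.
    """
    blocks = []
    for i in range(N):
        for s in range(N):
            permuted_units = weights[s] * y[s][unit_orders[s]]
            blocks.append(permuted_units if order[i] == s
                          else np.zeros(len(y[s])))
    return np.concatenate(blocks)

# One hypothetical realization: PSU 1 is cluster 3, PSU 2 is cluster 1, PSU 3 is cluster 2.
order = [2, 0, 1]
unit_orders = [np.array([1, 0]), np.array([0, 1]), np.array([2, 0, 1])]

c = np.array([1.0, 0.0, 0.0])          # c = e_1: target the first PSU
g = np.kron(c, np.ones(M))             # g' = c' (x) 1'

mean_weights = [1.0 / len(ys) for ys in y]
total_weights = [1.0 for _ in y]

T_mean = g @ expanded_weighted(order, unit_orders, mean_weights)
T_total = g @ expanded_weighted(order, unit_orders, total_weights)
print(T_mean, T_total)
```

Changing only the weights switches the target between a PSU mean and a PSU total, which is the flexibility the weighted formulation is meant to provide.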

3. PREDICTING A PSU MEAN USING AN EXPANDED RP MODEL

The basic strategy for developing a predictor under a model-based approach is given in many places (Scott and Smith 1969; Royall 1976; Bolfarine and Zacks 1992; Valliant et al. 2000), and was applied in a design-based framework to balanced two stage cluster sampling by Stanek and Singer (2004). Stanek and Singer showed that the predictor of a realized PSU mean from the RP model outperforms the mixed model and superpopulation model predictors. We develop similar results using the expanded RP model, and illustrate that the resulting predictor of a realized PSU mean has even lower expected MSE than the predictor given by Stanek and Singer (2004).

We assume that the elements in the sample portion of $\mathbf{Y}_w$ will be observed, and express $T$ as the sum of two parts, one which is a function of the sample, and the other which is a function of the remaining random variables. Then, requiring the predictor to be a linear function of the sample random variables and to be unbiased, coefficients are evaluated that minimize the expected value of the MSE given by $\operatorname{var}_{\xi_1\xi_2}\left(\hat{T} - T\right)$. While in theory an optimal predictor can be expressed following this prediction recipe, in practice the high dimensionality of the vectors from the expansion of random variables may result in singularities that prevent a unique solution (as in Stanek, Singer, and Lencina (2004)). In part for this reason, we project the random variables into a lower dimensional space that retains the necessary information for an optimal solution, simplifying the problem. The projection differs from the projection used when all clusters have equal size.

3.1. Partial Collapsing of the Expanded RP Random Variables

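Rao and Bellhouse (1978) (see Theorem 1.1) provide a way of determining whether the optimal linear unbiased predictor of a target random variable, $T = \mathbf{g}'\mathbf{Y}_w$, can be obtained as the optimal linear unbiased predictor of $T_p = \mathbf{g}_p'\mathbf{Y}_{wp}$ based on $\mathbf{Y}_{wp} = \mathbf{C}'\mathbf{Y}_w$, a vector of random variables that spans a lower dimensional space. We apply this theorem when

$$\mathbf{C}' = \begin{pmatrix} \bigoplus_{i=1}^{N}\bigoplus_{s=1}^{N}\left(\mathbf{1}_{m_s}' \;\; \mathbf{0}_{M_s - m_s}'\right) \\ \bigoplus_{i=1}^{N}\bigoplus_{s=1}^{N}\left(\mathbf{0}_{m_s}' \;\; \mathbf{1}_{M_s - m_s}'\right) \end{pmatrix} \quad \text{and} \quad \mathbf{g}_p' = \left[\mathbf{g}'\mathbf{C}\left(\mathbf{C}'\mathbf{C}\right)^{-1}\right],$$

where $m_s$ represents the number of sample units in cluster $s$. The additional subscript $p$ denotes the partial collapsing of the expanded random variables. The collapsing sums the SSUs for the sample and remainder in each cluster for each PSU, reducing the number of random variables from $NM$ to $2N^2$. Since $\mathbf{Y}_w = \left[\mathbf{C}\left(\mathbf{C}'\mathbf{C}\right)^{-1}\right]\mathbf{Y}_{wp} + \mathbf{P}_C\mathbf{Y}_w$, where $\mathbf{P}_C = \mathbf{I}_{NM} - \mathbf{C}\left(\mathbf{C}'\mathbf{C}\right)^{-1}\mathbf{C}'$, we can express $\mathbf{g}'\mathbf{Y}_w = \left[\mathbf{g}'\mathbf{C}\left(\mathbf{C}'\mathbf{C}\right)^{-1}\right]\mathbf{Y}_{wp} + \mathbf{g}'\mathbf{P}_C\mathbf{Y}_w$. Notice that $\mathbf{g}_p' = \left[\mathbf{1}_2' \otimes \mathbf{c}' \otimes \mathbf{1}_N'\right]$, $\mathbf{g}'\mathbf{P}_C\mathbf{Y}_w = 0$ and $T_p = \mathbf{g}_p'\mathbf{Y}_{wp}$. Defining $\hat{T}$ as the optimal linear unbiased predictor of $T$ based on $\mathbf{Y}_{wp}$, and $\hat{B}$ as a linear unbiased predictor of $\mathbf{g}'\left(\mathbf{P}_C\mathbf{Y}_w\right) = 0$, $\hat{T}$ will be optimal for $T = \mathbf{g}'\mathbf{Y}_w$ if and only if $E_{\xi_1\xi_2}\left[\left(\hat{T} - T\right)\hat{B}\right] = 0$. We determine conditions under which $E_{\xi_1\xi_2}\left[\left(\hat{T} - T\right)\hat{B}\right] = 0$ by using the unbiased constraints of the predictors, and expressing $E_{\xi_1\xi_2}\left[\left(\hat{T} - T\right)\hat{B}\right]$ as a function of $E_{\xi_1\xi_2}\left(\mathbf{Y}_w\mathbf{Y}_w'\right)$. Simplifying terms, we find that when $w_{sj} = w_s$ for all $j = 1, \ldots, M_s$, $E_{\xi_1\xi_2}\left[\left(\hat{T} - T\right)\hat{B}\right] = 0$ (see c06ed54.doc for details). This means that we can obtain the optimal predictor using the partially collapsed random variables as long as within each cluster, the weights are equal for SSUs.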
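The algebra of the partial collapsing can be checked numerically. The sketch below is an illustration under stated assumptions: cluster sizes and sample sizes are hypothetical, the collapsing matrix is built as two stacked block diagonal matrices (one summing the sample SSUs and one summing the remaining SSUs for every PSU and cluster pair), and the helper `block_rows` is not from the paper. It confirms that the coefficient vector for a PSU target lies in the column space of the collapsing matrix, so the part of the target outside the collapsed variables is identically zero.

```python
import numpy as np

M_s = [2, 2, 3]        # hypothetical cluster sizes
m_s = [1, 1, 2]        # hypothetical sample sizes (each m_s < M_s)
N = len(M_s)
M = sum(M_s)

def block_rows(rows):
    """Block diagonal matrix whose diagonal blocks are the given row vectors."""
    out = np.zeros((len(rows), sum(len(r) for r in rows)))
    col = 0
    for k, r in enumerate(rows):
        out[k, col:col + len(r)] = r
        col += len(r)
    return out

# Rows that sum the sample SSUs and the remaining SSUs of each cluster.
B_I = block_rows([np.r_[np.ones(m), np.zeros(Ms - m)] for Ms, m in zip(M_s, m_s)])
B_II = block_rows([np.r_[np.zeros(m), np.ones(Ms - m)] for Ms, m in zip(M_s, m_s)])

# C' stacks the sample part and the remainder part, repeated for each PSU.
C_t = np.vstack([np.kron(np.eye(N), B_I), np.kron(np.eye(N), B_II)])
C = C_t.T                                   # (N*M) x (2*N^2)

CtC_inv = np.linalg.inv(C_t @ C)
P_C = np.eye(N * M) - C @ CtC_inv @ C_t     # projector onto the orthogonal complement

c = np.array([1.0, 0.0, 0.0])               # target the first PSU
g = np.kron(c, np.ones(M))
g_p = CtC_inv @ C_t @ g                     # collapsed coefficients

print(C.shape)
print(np.allclose(g @ P_C, 0.0))
print(np.allclose(g_p, np.kron(np.ones(2), np.kron(c, np.ones(N)))))
```

The check only concerns the coefficient vector; whether the optimal predictor is preserved additionally depends on the equal-weights condition stated in the text.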

We assume that $w_{sj} = w_s$ for all $j = 1, \ldots, M_s$ in subsequent developments, and develop the best linear unbiased predictor of $T_p = \mathbf{g}_p'\mathbf{Y}_{wp}$ based on $\mathbf{Y}_{wp}$. The vector $\mathbf{Y}_{wp}$ contains $2N^2$ random variables. The first $N^2$ random variables are of the form $U_{is} w_s m_s \bar{Y}_{sI}$, while the remaining $N^2$ random variables are of the form $U_{is} w_s \left(M_s - m_s\right)\bar{Y}_{sII}$, where $\bar{Y}_{sI} = \frac{1}{m_s}\sum_{j=1}^{m_s} Y_{sj}$ and $\bar{Y}_{sII} = \frac{1}{M_s - m_s}\sum_{j=m_s+1}^{M_s} Y_{sj}$.

It is natural to consider whether or not it is sufficient to predict $T_p = \mathbf{g}_p'\mathbf{Y}_{wp}$, where $\mathbf{g}_p' = \left[\mathbf{1}_2' \otimes \mathbf{c}' \otimes \mathbf{1}_N'\right]$, using $\mathbf{Y}_{wp}^* = \mathbf{C}^{*\prime}\mathbf{Y}_{wp}$ where $\mathbf{C}^{*\prime} = \mathbf{I}_{2N} \otimes \mathbf{1}_N'$. Note that $T = \mathbf{g}^{*\prime}\mathbf{Y}_{wp}^*$ where $\mathbf{g}^{*\prime} = \left[\mathbf{g}_p'\mathbf{C}^*\left(\mathbf{C}^{*\prime}\mathbf{C}^*\right)^{-1}\right]$. Note that $\mathbf{C}^*\left(\mathbf{C}^{*\prime}\mathbf{C}^*\right)^{-1} = \mathbf{I}_{2N} \otimes \frac{\mathbf{1}_N}{N}$ and $\mathbf{g}^{*\prime} = \mathbf{1}_2' \otimes \mathbf{c}'$.

We refer to the $2N$ random variables in $\mathbf{Y}_{wp}^*$ as the collapsed random variables. This set of random variables is similar to those used by Stanek and Singer (2004) for a population with equal size clusters and equal size samples per cluster. When there is no response error, the collapsed random variables can be used to develop the same predictor of a linear combination of PSU means as that obtained by Stanek and Singer (2004).

Our goal is to see whether or not we lose information in doing this collapsing by applying the Rao-Bellhouse theorem. First, we find that an unbiased predictor of $T$ using $\mathbf{Y}_{wp}^*$ can be obtained only if sampling of clusters is with probability proportional to size. We proceed in a similar manner, further collapsing the $2N^2$ random variables to a set of $2N$ random variables, with one variable for the sample SSUs and one for the remaining SSUs in each PSU. This can be accomplished by ... (Would it be possible to collapse these random variables further?)

3.2. Predicting a Parameter for a PSU Using Collapsing Expanded RP Random

Variables

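We partition $\mathbf{Y}_{wp}$ into the first $nN$ random variables corresponding to the sample, $\mathbf{Y}_{wI}$, and the remaining random variables, $\mathbf{Y}_{wII}$, to predict $T = \mathbf{g}_I'\mathbf{Y}_{wI} + \mathbf{g}_{II}'\mathbf{Y}_{wII}$, where $\mathbf{g}_I' = \mathbf{c}_I' \otimes \mathbf{1}_N'$ and $\mathbf{g}_{II}' = \left(\mathbf{c}_{II}' \otimes \mathbf{1}_N' \;\;\; \mathbf{c}' \otimes \mathbf{1}_N'\right)$, and $\mathbf{c} = \left(\mathbf{c}_I' \;\; \mathbf{c}_{II}'\right)'$, where $\mathbf{c}_I$ is an $n \times 1$ vector of constants. Explicitly, the partitioned RP model is defined by

$$\begin{pmatrix}\mathbf{Y}_{wI} \\ \mathbf{Y}_{wII}\end{pmatrix} = \begin{pmatrix}\mathbf{X}_I \\ \mathbf{X}_{II}\end{pmatrix}\boldsymbol{\mu} + \left[E_{\xi_2}\begin{pmatrix}\mathbf{Y}_{wI} \\ \mathbf{Y}_{wII}\end{pmatrix} - E_{\xi_1\xi_2}\begin{pmatrix}\mathbf{Y}_{wI} \\ \mathbf{Y}_{wII}\end{pmatrix}\right] + \begin{pmatrix}\mathbf{E}_{wI} \\ \mathbf{E}_{wII}\end{pmatrix}$$

where

$$\mathbf{X}_I = \mathbf{1}_n \otimes \left[\frac{1}{N}\left(\bigoplus_{s=1}^{N} w_s m_s\right)\right] \quad \text{and} \quad \mathbf{X}_{II} = \begin{pmatrix}\mathbf{1}_{N-n} \otimes \frac{1}{N}\left(\bigoplus_{s=1}^{N} w_s m_s\right) \\ \mathbf{1}_N \otimes \frac{1}{N}\left(\bigoplus_{s=1}^{N} w_s\left(M_s - m_s\right)\right)\end{pmatrix},$$

random effects are given by

$$\left[E_{\xi_2}\begin{pmatrix}\mathbf{Y}_{wI} \\ \mathbf{Y}_{wII}\end{pmatrix} - E_{\xi_1\xi_2}\begin{pmatrix}\mathbf{Y}_{wI} \\ \mathbf{Y}_{wII}\end{pmatrix}\right] = \begin{pmatrix}\left[\mathbf{I}_n \otimes \left(\bigoplus_{s=1}^{N} f_s d_s\right)\right]\operatorname{vec}\left(\mathbf{U}_I - E_{\xi_1}\left(\mathbf{U}_I\right)\right) \\ \left[\mathbf{I}_{N-n} \otimes \left(\bigoplus_{s=1}^{N} f_s d_s\right)\right]\operatorname{vec}\left(\mathbf{U}_{II} - E_{\xi_1}\left(\mathbf{U}_{II}\right)\right) \\ \left[\mathbf{I}_N \otimes \left(\bigoplus_{s=1}^{N}\left(1 - f_s\right) d_s\right)\right]\operatorname{vec}\left(\mathbf{U} - E_{\xi_1}\left(\mathbf{U}\right)\right)\end{pmatrix},$$

where $f_s = \frac{m_s}{M_s}$, $d_s = M_s w_s \mu_s$,

$$\begin{pmatrix}\mathbf{E}_{wI} \\ \mathbf{E}_{wII}\end{pmatrix} = \begin{pmatrix}\mathbf{Y}_{wI} \\ \mathbf{Y}_{wII}\end{pmatrix} - E_{\xi_2}\begin{pmatrix}\mathbf{Y}_{wI} \\ \mathbf{Y}_{wII}\end{pmatrix},$$

and $\mathbf{U} = \left(\mathbf{U}_I \;\; \mathbf{U}_{II}\right)$, with $\mathbf{U}_I = \left(\left(\mathbf{U}_i\right)\right) = \left(\mathbf{U}_1 \;\; \mathbf{U}_2 \;\; \cdots \;\; \mathbf{U}_n\right)$ and $\mathbf{U}_{II} = \left(\left(\mathbf{U}_i\right)\right) = \left(\mathbf{U}_{n+1} \;\; \mathbf{U}_{n+2} \;\; \cdots \;\; \mathbf{U}_N\right)$. We partition the variance in a similar manner, representing it by

$$\operatorname{var}_{\xi_1\xi_2}\begin{pmatrix}\mathbf{Y}_{wI} \\ \mathbf{Y}_{wII}\end{pmatrix} = \begin{pmatrix}\mathbf{V}_I & \mathbf{V}_{I,II} \\ \mathbf{V}_{I,II}' & \mathbf{V}_{II}\end{pmatrix}$$

where, writing $\mathbf{J}_{a \times b}$ and $\mathbf{0}_{a \times b}$ for $a \times b$ matrices of ones and zeros,

$$\mathbf{V}_I = \frac{1}{N-1}\left(\mathbf{I}_n - \frac{1}{N}\mathbf{J}_n\right) \otimes \left[\left(\bigoplus_{s=1}^{N} f_s d_s\right)\mathbf{P}_N\left(\bigoplus_{s=1}^{N} f_s d_s\right)\right] + \mathbf{I}_n \otimes \left(\bigoplus_{s=1}^{N} f_s^2 v_{se}^{*2}\right),$$

$$\mathbf{V}_{I,II} = \begin{pmatrix} -\frac{1}{N(N-1)}\mathbf{J}_{n \times (N-n)} \otimes \left[\left(\bigoplus_{s=1}^{N} f_s d_s\right)\mathbf{P}_N\left(\bigoplus_{s=1}^{N} f_s d_s\right)\right] & \quad \frac{1}{N-1}\left[\left(\mathbf{I}_n \;\; \mathbf{0}_{n \times (N-n)}\right) - \frac{1}{N}\mathbf{J}_{n \times N}\right] \otimes \left[\left(\bigoplus_{s=1}^{N} f_s d_s\right)\mathbf{P}_N\left(\bigoplus_{s=1}^{N}\left(1 - f_s\right) d_s\right)\right] - \left(\mathbf{I}_n \;\; \mathbf{0}_{n \times (N-n)}\right) \otimes \left(\bigoplus_{s=1}^{N} f_s^2 v_{se}^{*2}\right)\end{pmatrix},$$

$$\mathbf{V}_{II} = \begin{pmatrix}
\frac{1}{N-1}\left(\mathbf{I}_{N-n} - \frac{1}{N}\mathbf{J}_{N-n}\right) \otimes \left[\left(\bigoplus_{s=1}^{N} f_s d_s\right)\mathbf{P}_N\left(\bigoplus_{s=1}^{N} f_s d_s\right)\right] + \mathbf{I}_{N-n} \otimes \left(\bigoplus_{s=1}^{N} f_s^2 v_{se}^{*2}\right)
& \frac{1}{N-1}\left[\left(\mathbf{0}_{(N-n) \times n} \;\; \mathbf{I}_{N-n}\right) - \frac{1}{N}\mathbf{J}_{(N-n) \times N}\right] \otimes \left[\left(\bigoplus_{s=1}^{N} f_s d_s\right)\mathbf{P}_N\left(\bigoplus_{s=1}^{N}\left(1 - f_s\right) d_s\right)\right] - \left(\mathbf{0}_{(N-n) \times n} \;\; \mathbf{I}_{N-n}\right) \otimes \left(\bigoplus_{s=1}^{N} f_s^2 v_{se}^{*2}\right) \\
\frac{1}{N-1}\left[\begin{pmatrix}\mathbf{0}_{n \times (N-n)} \\ \mathbf{I}_{N-n}\end{pmatrix} - \frac{1}{N}\mathbf{J}_{N \times (N-n)}\right] \otimes \left[\left(\bigoplus_{s=1}^{N}\left(1 - f_s\right) d_s\right)\mathbf{P}_N\left(\bigoplus_{s=1}^{N} f_s d_s\right)\right] - \begin{pmatrix}\mathbf{0}_{n \times (N-n)} \\ \mathbf{I}_{N-n}\end{pmatrix} \otimes \left(\bigoplus_{s=1}^{N} f_s^2 v_{se}^{*2}\right)
& \frac{1}{N-1}\mathbf{P}_N \otimes \left[\left(\bigoplus_{s=1}^{N}\left(1 - f_s\right) d_s\right)\mathbf{P}_N\left(\bigoplus_{s=1}^{N}\left(1 - f_s\right) d_s\right)\right] + \mathbf{I}_N \otimes \left(\bigoplus_{s=1}^{N} f_s^2 v_{se}^{*2}\right)
\end{pmatrix}$$

and

$$v_{se}^{*2} = \left(\frac{1 - f_s}{f_s}\right)\frac{M_s w_s^2 \sigma_s^2}{N} .$$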

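We develop an expression for the best linear unbiased predictor of $T$ next. The predictor is a linear function of the sample, $\hat{T} = \mathbf{L}'\mathbf{Y}_{wI}$. Since

$$\hat{T} - T = \left(\mathbf{L}' - \mathbf{g}_I' \;\;\; -\mathbf{g}_{II}'\right)\begin{pmatrix}\mathbf{Y}_{wI} \\ \mathbf{Y}_{wII}\end{pmatrix},$$

the unbiased constraint is given by $\left(\mathbf{L}' - \mathbf{g}_I'\right)\mathbf{X}_I - \mathbf{g}_{II}'\mathbf{X}_{II} = \mathbf{0}$. Minimizing $\operatorname{var}_{\xi_1\xi_2}\left(\hat{T} - T\right)$, accounting for the unbiased constraint using Lagrange multipliers, results in the familiar solution,

$$\hat{\mathbf{L}} = \mathbf{g}_I + \mathbf{V}_I^{-1}\left[\mathbf{V}_{I,II} - \mathbf{X}_I\left(\mathbf{X}_I'\mathbf{V}_I^{-1}\mathbf{X}_I\right)^{-1}\mathbf{X}_I'\mathbf{V}_I^{-1}\mathbf{V}_{I,II}\right]\mathbf{g}_{II} + \mathbf{V}_I^{-1}\mathbf{X}_I\left(\mathbf{X}_I'\mathbf{V}_I^{-1}\mathbf{X}_I\right)^{-1}\mathbf{X}_{II}'\mathbf{g}_{II} .$$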

This result simplifies to (see c06ed56.doc, p42 and c07ed01.doc p1)

( )*

1 1

1ˆ N Ns

n I N n N II n Ns ss s

k N N nc cf n f n= =

⎛ ⎞ ⎡ ⎤⎡ ⎤ ⎛ ⎞ −= ⊗ ⊕ + ⊗ ⊕ + ⊗⎜ ⎟ ⎢ ⎥⎜ ⎟⎢ ⎥⎜ ⎟ ⎝ ⎠⎣ ⎦ ⎣ ⎦⎝ ⎠

L P c 1 1 1 1 1

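where

$$k_s^* = k_s - \left(\frac{1 - k_s}{d_s}\right)\frac{\sum_{s^*=1}^{N} k_{s^*} d_{s^*}}{N\left(1 - \bar{k}\right)}, \qquad k_s = \frac{d_s^2}{d_s^2 + \left(N - 1\right) v_{se}^{*2}}, \qquad \bar{k} = \frac{1}{N}\sum_{s=1}^{N} k_s,$$

$$\bar{c}_{II} = \frac{1}{N - n}\sum_{i=n+1}^{N} c_i \quad \text{and} \quad \bar{c} = \frac{1}{N}\sum_{i=1}^{N} c_i .$$

The predictor $\hat{T} = \hat{\mathbf{L}}'\mathbf{Y}_{wI}$ can be expressed as (see page 6 of c07ed01.doc)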

( ) ( ) ( )1 1 1

ˆ ˆ ˆn N N

i i s s s sI II s s s s sIi s s

N N nT c Y Y c I M w Y c I M f w Yn n= = =

−⎛ ⎞ ⎛= − + +⎜ ⎟ ⎜⎝ ⎠ ⎝

∑ ∑ ∑ ⎞⎟⎠

where $\hat{Y}_i = \sum_{s=1}^{N} U_{is} M_s w_s k_s^* \bar{Y}_{sI}$, $\hat{\bar{Y}} = \frac{1}{n}\sum_{i=1}^{n}\hat{Y}_i$, $I_s = \sum_{i=1}^{n} U_{is}$ is an indicator ‘inclusion’ random variable for cluster $s$ in the sample, and $\bar{Y}_{sI} = \frac{1}{m_s}\sum_{j=1}^{m_s} Y_{sj}$.

An expression for the expected mean squared error (EMSE) of the predictor can

be developed directly using expressions for the partitioned variance, and simplifies to (see

c07ed17.doc, p15)

( ) ( ) ( )

( )

1 2

22 2 *2 *2

,1 1

2 2

1

1 1ˆvar 2

2

n N

i I kd kd d s se sei s

N

i I di

NcT T c c k v v

N n N

Ncc Nc ncn

ξ ξ σ σ

σ

= =

=

⎛ ⎞⎛ ⎞ ⎛− = − − + +⎜ ⎟⎜ ⎟ ⎜⎝ ⎠⎝ ⎠ ⎝⎡ ⎤+ + −⎢ ⎥⎣ ⎦

∑ ∑

∑

*2

1

N

s=

⎞⎟⎠

∑

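where $\bar{c}_I = \frac{1}{n}\sum_{i=1}^{n} c_i$, $\mu_d = \frac{1}{N}\sum_{s=1}^{N} d_s$, $\mu_{kd} = \frac{1}{N}\sum_{s=1}^{N} k_s^* d_s$, $\sigma_{kd}^2 = \frac{1}{N-1}\sum_{s=1}^{N}\left(k_s^* d_s - \mu_{kd}\right)^2$, $\sigma_d^2 = \frac{1}{N-1}\sum_{s=1}^{N}\left(d_s - \mu_d\right)^2$ and $\sigma_{kd,d} = \frac{1}{N-1}\sum_{s=1}^{N}\left(k_s^* d_s - \mu_{kd}\right)\left(d_s - \mu_d\right)$.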
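The population moments that enter the EMSE are ordinary means, variances and a covariance with divisor $N-1$, taken over the $N$ clusters. A minimal sketch of their computation follows; the function name `kd_moments` and the input values are hypothetical, and $d_s$ and $k_s^*$ are simply taken as given.

```python
import numpy as np

def kd_moments(k_star, d):
    """Moments of d_s and k_s^* d_s over the N clusters (divisor N - 1)."""
    k_star = np.asarray(k_star, dtype=float)
    d = np.asarray(d, dtype=float)
    kd = k_star * d
    return {
        "mu_d": d.mean(),
        "mu_kd": kd.mean(),
        "var_d": d.var(ddof=1),
        "var_kd": kd.var(ddof=1),
        "cov_kd_d": np.cov(kd, d, ddof=1)[0, 1],
    }

# Illustrative values for N = 3 clusters.
moments = kd_moments(k_star=[0.5, 0.5, 1.0], d=[2.0, 4.0, 6.0])
```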

When predicting a PSU mean, $w_s = \frac{1}{M_s}$, and the predictor simplifies to $\hat{T} = \frac{1}{n}\sum_{s=1}^{N} I_s \bar{Y}_{sI} + \left(\hat{Y}_i - \bar{\hat{Y}}\right)$ if $i \le n$, and to $\hat{T} = \frac{1}{n}\left[\sum_{s=1}^{N} I_s \bar{Y}_{sI}\right]$ when $i > n$. The EMSE of the sample PSU mean predictor (when $i \le n$) simplifies to (see c07ed25.doc, page 3)

( ) ( ) ( )

( ) ( )

1 2

2* *

1 1

2*2

21

1 1 1ˆvar 1 11

1 1 1 1

N N

s s s ss s

Ns

s ss s

nT T k kn N N

n k fnN m

ξ ξ μ μ

σ

= =

=

⎡ ⎤− ⎛ ⎞⎛ ⎞− = − − −⎢ ⎥⎜ ⎟ ⎜ ⎟−⎝ ⎠ ⎝ ⎠⎢ ⎥⎣ ⎦

⎡ ⎤+ + − −⎣ ⎦

∑ ∑

∑,


while the EMSE of a PSU not in the sample is given by

( ) ( )1 2

22

1

1 1ˆvar 1N

ss

s s

nT T fn nN Nξ ξ m

σσ=

+⎛ ⎞− = + −⎜ ⎟⎝ ⎠

∑ .
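Returning to the PSU-mean predictors of this section, the sketch below evaluates both cases under the reading that, with $w_s = 1/M_s$, the predictor is the average of the sampled PSU means plus the deviation $\hat{Y}_i - \bar{\hat{Y}}$ for a PSU in sample position $i \le n$, and the average alone for a PSU not in the sample. The function name `predict_psu_mean`, the 0-based position argument, and the numbers are assumptions for illustration; the constants $k_s^*$ are treated as known.

```python
import numpy as np

def predict_psu_mean(sample_means, k_star, i=None):
    """sample_means[j], k_star[j]: sample mean and constant k_s^* of the PSU
    realized in sample position j (0-based).
    i: position of the target PSU, or None for a PSU not in the sample."""
    ybar = np.asarray(sample_means, dtype=float)
    y_hat = np.asarray(k_star, dtype=float) * ybar   # Yhat_i when w_s = 1/M_s
    overall = ybar.mean()                            # (1/n) sum_s I_s Ybar_sI
    if i is None:
        return overall
    return overall + (y_hat[i] - y_hat.mean())

# Two sampled PSUs with illustrative sample means and constants.
in_sample_first = predict_psu_mean([4.0, 2.0], [0.8, 0.5], i=0)
in_sample_second = predict_psu_mean([4.0, 2.0], [0.8, 0.5], i=1)
not_in_sample = predict_psu_mean([4.0, 2.0], [0.8, 0.5])
```

Because the deviations $\hat{Y}_i - \bar{\hat{Y}}$ sum to zero over the sample positions, the in-sample predictions average to the same value as the prediction for a PSU not in the sample.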

4. COMPARISON OF EMSE

5. EMPIRICAL PREDICTOR SIMULATION

6. APPLICATION

7. DISCUSSION

OLD STUFF

4. APPLICATION

_________________________________________________________________________________________

Table 2. Predictors of latent values of a realized PSU Mean based on Different models

Predictor Target Sample SSUs Remaining SSUs

_________________________________________________________________________________________

iY Simple Mean | 1s isUμ = *i ic Y + ( )*1 i ic Y−

iMMP Mixed Model iP ( )ˆ ˆi ik Yμ μ+ −

iSSP Scott&Smith Model iP i if Y + ( ) ( )* * *ˆ ˆ1 i i if k Yμ μ⎡ ⎤− + −⎣ ⎦

*iP RP (PPS) iP ifY + ( ) ( )*1 if Y k Y Y⎡ ⎤− + −⎣ ⎦


iP• RP (General) iP•i ic Y + ( ) ( )1 i ic Y k Y Y•⎡ ⎤− + −⎣ ⎦

_________________________________________________________________________________________

_________________________________________________________________________________________


REFERENCES

Bolfarine, H., and Zacks, S. (1992), Prediction Theory for Finite Populations, New York:

Springer-Verlag.

Cassel, C.M., Sarndal, C.E., and Wretman, J.H. (1977), Foundations of Inference in

Survey Sampling, New York: Wiley.

Cochran, W. G. (1977), Sampling Techniques (3rd ed.), New York: Wiley.

Deville, J.C., and Sarndal, C.E. (1992), “Calibration Estimators in Survey Sampling,”

Journal of the American Statistical Association, 87, 376-382.

Diggle, P. L., Heagerty, P., Liang, K. Y. and Zeger, S. (2002), Analysis of Longitudinal

Data, Oxford University Press.

Ghosh, M. and Lahiri, P. (1987), “Robust Empirical Bayes Estimation of Means from

Stratified Samples,” Journal of the American Statistical Association, 82, 1153-1162.

Goldberger, A. S. (1962), “Best Linear Unbiased Prediction in the Generalized Linear

Regression Model,” Journal of the American Statistical Association, 57, 369-375.


Graybill, F. A. (1983), Matrices with applications in statistics, Belmont, California:

Wadsworth International.

Henderson, C.R. (1984), Applications of Linear Models in Animal Breeding, Guelph,

Canada: University of Guelph.

Henderson, C. R., Kempthorne, O., Searle, S. R. and von Krosigk, C. M. (1959), “The

Estimation of Environmental and Genetic Trends from Records Subject to Culling,”

Biometrics, 15, 192-218.

Hinkelmann, K., and Kempthorne, O. (1994), Design and Analysis of Experiments, Vol. 1,

Introduction to Experimental Designs, New York: Wiley.

Li, W. (2003), “Use of random Permutation Model in rate Standardization and

Calibration,” unpublished doctoral thesis, University of Massachusetts, Massachusetts.

McCulloch, C. E. and Searle, S. R. (2001), Generalized, Linear, and Mixed Models, New

York: John Wiley and Sons.

McLean, R. A., Sanders, W. L. and Stroup, W. W. (1991), “A Unified Approach to

Mixed Linear Models,” The American Statistician, 45(1), 54-64.


Ockene, I. S., Hebert, J. R., Ockene, J. K., Saperia, G. M., Nicolosi, R., Merriam, P.A.

and Hurley, T. G. (1999), “Effect of Physician-delivered Nutrition Counseling Training

and an Office-support Program on Saturated Fat Intake, Weight, and Serum Lipid

Measurements in a Hyperlipidemic Population: Worcester Area Trial for Counseling in

Hyperlipidemia (WATCH),” Archives of Internal Medicine, Apr 12, 159(7), 725-731.

Rao, J.N.K. (2003), Small Area Estimation, New York: Wiley.

Robinson, G. K. (1991), “That BLUP is a Good Thing: the Estimation of Random

Effects,” Statistical Science, 6(1), 15-51.

Royall, R. M. (1976), “The Linear Least-squares Prediction Approach to Two-stage

Sampling,” Journal of the American Statistical Association, 71, 657-664.

Sarndal, C-E, Swensson, B., and Wretman, J. (1992), Model Assisted Survey Sampling,

New York: Springer-Verlag.

Scott, A. and T. M. F. Smith (1969), “Estimation in Multi-stage Surveys,” Journal of the

American Statistical Association, 64(327), 830-840.


Searle, S. R., Casella, G. and McCulloch, C. E. (1992), Variance Components, New

York: Wiley.

Stanek, E. J. III and Singer, J. M. (2004), “Predicting Random Effects from Finite

Population Clustered Samples with Response Error,” Journal of the American Statistical

Association, 99, 119-130.

Valliant, R., Dorfman, A. H. and Royall, R. M. (2000), Finite Population Sampling and

Inference, New York: Wiley.

Verbeke, G., and Molenberghs, G. (2000), Linear Mixed Models for Longitudinal Data,

New York: Springer-Verlag.
