Bayesian Models with Measurement Error that Partially Account for Identifiable Subjects
Edward J. Stanek III, Parimal Mukhopadhyay, Viviana B. Lencina, and Luz Mery González
Department of Public Health, University of Massachusetts at Amherst, USA
Indian Statistical Institute, Kolkata, India
Facultad de Ciencias Economicas, Universidad Nacional de Tucumán, CONICET, Argentina
Departamento de Estadística, Universidad Nacional de Colombia, Bogotá, Colombia
ABSTRACT
In a mixed model, latent values associated with subjects are typically random, while the subject-specific
measurement error variances are not considered to be random. To understand this paradox, we
consider prediction via Bayesian models similar to those proposed by Ericson (1969), but that account
for identifiable subjects. Defining the data as response for a subset of identifiable subjects from a finite
population, we note that under an exchangeable prior for the population latent values, the posterior
distribution of the subjects’ latent values in the data is itself exchangeable. We expand this
development to settings where response is measured with error on
subjects in the data set, and develop the expected value and variance of the posterior distribution of the
subjects’ latent values in the data, revealing that the measurement error variance component in the
posterior distribution includes the average measurement error variance instead of subject-specific
measurement error variances. Based on these results, we specify a new prior distribution that leads to a
posterior distribution of the subjects’ latent values in the data where the subjects’ identities are retained
for measurement error, but not for the corresponding latent values. This class of models allows flexible
specification of fixed and random effects, and highlights the distinction between potentially observable
points and artificial points in the prior distribution. The results clarify the relationship between physical
populations and measurements, and stochastic models that may partially connect to this physical reality.
The results have important implications in applications where there is interest in estimating population
parameters, domain means, and latent values for realized random effects.
ACKNOWLEDGEMENT
This work was developed at a joint meeting of the authors in July 2010 in the Department of Public
Health at the University of Massachusetts, Amherst, USA and a follow-up meeting in September 2010
in the Departamento de Economia, Universidad Nacional de Tucumán, Tucumán, Argentina.
Appreciation is given to the helpful comments of Julio Singer, Wenjun Li, Michael Lavine, Shrikant
Bangdiwala, Silvina San Martino, and Mirta Santana on early drafts of this manuscript. Previous
meetings over the past five years of the Finite Population Mixed Model Research group (including
many of these investigators) were supported by the National Institutes of Health (NIH-PHS-R01-
HD36848, R01-HL071828-02), USA, Conselho Nacional de Desenvolvimento Científico e Tecnológico
(CNPq) and Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP), Brazil.
Keywords: Finite population, heteroskedasticity, superpopulation, unbiasedness, inference, mixed
models, finite population mixed model.
1. INTRODUCTION
Estimating a subject’s latent value in a population based on response for a subset of subjects is a
common problem. When subjects are measured with error, subjects’ labels have an intriguing role. If
the subjects’ identities are known for the subset and used in defining a stochastic model for each
subject’s responses, the response error model has fixed subject effects and a subject’s latent value can
be estimated directly from the response error model. Ignoring the subject labels, a mixed model can be
specified for response that sums the latent value and measurement error for subjects in a randomly
selected set of subjects, and used to predict a realized subject’s latent value. Although the latent value
of a realized subject in the set can be estimated (or predicted) with each model, the best linear unbiased
predictor (BLUP) from the mixed model is more accurate.
The subjects’ labels are not completely suppressed in mixed models when measurement error
variances differ between subjects. In such settings, although subject labels are ignored (so that subject
effects are random), response error is identified with realized subjects. This partial use of labeling
results in weighted least squares estimates of the mean in a mixed model, and shrinkage constants for
BLUPs of realized subjects’ latent values that depend on the realized subjects’ response error variances.
The better accuracy of these estimators relative to estimators that result from a finite population mixed
model (FPMM) proposed by Stanek and Singer (2004) provides the motivation for this investigation.
We seek to more clearly understand the manner in which labels are partially used in mixed models
when measurement error variances are heterogeneous across subjects.
One approach that enables a subject’s identity to be known is to expand the number of random
variables in a finite population framework that represents the underlying problem, as in Stanek and
Singer (2008). Models using this approach can include effects identified with subjects, but the
approach does not lead to practical estimators. The approach is based on representing the data using an
expanded number of random variables relative to finite population models, but fewer random variables
than would be needed for Godambe’s (1955) general sampling framework, where a different probability
may be associated with each permutation of the labeled data (Zhang, 2010). A second approach to
partially account for a subject’s identity is to use superpopulation models (Godambe, 1966). Although
superpopulation models may be developed where subjects’ latent values are random and measurement
error is fixed, as in Scott and Smith (1969), they do so by introducing points into the modeling space
that cannot be observed (Stanek and Singer, 2011). We explore a third approach to partially account for
a subject’s identity that uses Basu’s (1969) sufficient statistics in Godambe’s framework in a Bayesian
context.
Our investigation focuses on the role of labels in a simple Bayesian model similar to Ericson
(1969, 1988), proceeding in the following manner. We first introduce notation and the basic ideas when
there is no measurement error and where our interest is in the target parameter corresponding to the
population mean. Subsequently, we apply this framework to a setting where a subject’s response
includes measurement error with heterogeneous variances, and develop a predictor of a realized
subject’s latent value from the posterior distribution. Random variables for this distribution can be
defined identically to those in a FPMM, assuming the subjects in the data set are a finite population.
Subject-specific measurement error variances are not identifiable. Finally, we introduce a new prior distribution that
results in a posterior distribution where the measurement error variance is identifiable, but random
effects only match the first and second moments of a set of exchangeable random variables whose
realization is a vector of latent values for subjects in the data set. Response for a subject in this
posterior distribution can be represented by a mixed model where the latent values for the subjects in
the data are exchangeable, but the measurement error variance is identified with a realized subject.
2. THE PRIOR AND POSTERIOR DISTRIBUTIONS OF THE POPULATION MEAN
WITH IDENTIFIABLE SUBJECTS AND NO MEASUREMENT ERROR
Assume our interest is in the mean response in a population of subjects, but that the observed
data corresponds to a subset of labeled subjects’ responses (equal to the subjects’ latent values). We
define the population as where represents a subject’s label (such as a
name, assumed to be unique for each subject), and represents the subject’s response (equal to the
subject’s latent value) which is a non-stochastic parameter. The population mean is the simple average
latent value in the population, which we represent by , where is
the set of subjects in the population. It is not necessary that the labels be completely known, nor that
be exactly specified for such a population to be conceptually defined. However, we require the
population to be defined in space and time, so that these quantities (while not known) are at least
potentially knowable, and is defined for . For illustration, our interest may be in the average
number of hypoglycemic episodes in the past year that occurred for Medicare enrollees with a diabetes
diagnosis in Massachusetts on 7/1/2010. Conceptually, a list could be formed of all enrollees to define
the finite population, where for each enrollee, the actual number of hypoglycemic episodes is recorded.
It is not necessary to list the enrollees and record each response to have a clear interpretation of .
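For concreteness, the setup can be summarized in one notation; the symbols below are illustrative assumptions (not necessarily those used in the original display equations):

U = \{ (s, y_s) : s = 1, \ldots, N \}, \qquad \bar{y}_N = \frac{1}{N} \sum_{s=1}^{N} y_s ,

where each label s is unique, each latent value y_s is a fixed (non-stochastic) parameter, and \bar{y}_N is the population mean of interest.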
Although we don’t know the population mean, we assume that from previous studies and/or
experience, we can guess it. The previous studies most likely were conducted in different locations and
times, and have different strengths and limitations. Associated with each guess, we assign a prior
probability that reflects our subjective measure of belief that the guess is the actual value of the
parameter for the population. These values, , and their associated prior probabilities, ,
, define the prior distribution, where . We assume (i.e. believe) that the actual mean is one of
the possible values specified in the prior. Let the prior parameter be defined for
with subjects . In order for , we require not
only that , and , but also that for each , must be identical in and .
Different interpretations can be given to the population underlying the prior parameter and its
associated prior probability, .
One concept of the population underlying the prior parameters is that the subjects are the same
for all , such that , but that the latent value, , for a given subject is uncertain. The
populations and are distinguished by having and differ for at least one .
An example occurs when the populations are defined by associating the latent values in any of
possible permutations of latent values with a given listing of subjects in . For this set of prior
populations if all responses are distinct, although and for all , for only
one will equal . It is for this population that we wish to know . A variation on this concept
is to define the populations by associating the latent values in any of possible permutations of
any of a set of latent values where , resulting in . The subjects may
correspond conceptually to the subjects in a superpopulation. The prior parameter, , will be the same
for all permutations of a set, but may differ for different sets. When is very large, the prior
distribution of could be approximated by a continuous parametric distribution.
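As a small illustration of this construction, the Python sketch below (with hypothetical labels and latent values, not values from the manuscript) enumerates the populations obtained by permuting a fixed set of latent values over a fixed list of subjects. Every permutation yields the same mean, so the prior parameter is constant within a set of latent values and can only differ across sets:

from itertools import permutations

subjects = ["Lily", "Rose", "Violet"]        # hypothetical subject labels
latent_values = (6.0, 9.0, 12.0)             # hypothetical latent values

# Each permutation of the latent values over the fixed list of subjects defines one
# prior population; all of them share the same population mean.
populations = [dict(zip(subjects, perm)) for perm in permutations(latent_values)]
means = {sum(pop.values()) / len(pop) for pop in populations}
print(len(populations), means)               # 6 populations, one common mean (9.0)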
A different concept of the prior can be illustrated via an example. Suppose we want to know the
average annual Medicare cost per enrollee in 2010 in the United States, with enrollees defined as the
enrollees on July 1, 2010. From past research, we can specify (or guess) values of for each of the
years 2006-2009, knowing that the population of Medicare enrollees differs somewhat between the
years. Conceptually, assume that we are able to form a list of all enrollees over the time period 2006-
2009, and associate with each enrollee their average annual cost for each year they are enrolled in
Medicare. Let us augment this list with the new 2010 enrollees in Medicare to form a superpopulation
of subjects, . For a subject enrolled for four years in Medicare, there will
be four pairs that may differ as a result of different latent values for the four years.
By definition, the enrollees in 2010 are a subset of the subjects, but without knowing ,
we don’t know exactly what subset corresponds to the 2010 enrollees. Suppose we construct all
possible subsets of subjects from the superpopulation, and for each subject in the subset, choose
one pair, defining and . The 2010 enrollees correspond to one of these subsets of
subjects, but we don’t know which one. We may limit the possible latent values for a subject to those
observed in previous years, or expand the set of latent values that are possible for a subject similar to
the previous example. Conceptually, the entire set of populations and their corresponding parameters,
, can be summarized in a distribution. Although the actual enrollees in 2010, , are not known, for
many , . In one of these populations, we assume . The prior distribution reflects our
uncertainty over for which , is equal to .
Often the prior distribution may be defined by and their associated prior probabilities, for
with only an implicit understanding of . Additional specificity of the prior population is
not needed, and thus suppressed. The population could conceivably be identical to any of the
populations in the prior. Suppose now that we observe data on a
subset of subjects, , where is the latent value for the subject labeled . The
basic idea underlying Bayesian inference is to use these data to reduce the uncertainty associated with
the prior distribution. The uncertainty is reduced since once we know the subjects in the data, only
prior populations that include these subjects, i.e., where , are possible. The challenge in
updating the prior distribution is identifying where , such that for each , .
2.1. Identifying Subjects in an Exchangeable Distribution for Population
In order to have a clear strategy for linking subjects in the data to subjects in the prior
distribution, we expand the representation of the prior so as to be able to identify subjects. We do so by
assuming that response for the subjects in each population is a realization of a vector of
exchangeable random variables, and define notation that enables subjects to be identified for each point
in such an exchangeable distribution. Suppressing the subscript to simplify notation, for each
, we define , as a vector of exchangeable random variables similar to
Ericson (1969) such that the joint probability density of response ,
associated with each permutation, , of the subjects in is identical for all . Unlike
Ericson, we define points in the prior distribution for each so that both the subject’s parameter and
the subject’s label can be identified. To do so, we introduce notation to keep track of permutations of
subject labels used to construct different listings, and notation for permutations of subject’s response
used to define possible points in . In each case, we maintain the connection between the subject’s
label and response evident in . First, we define different possible listings.
A population listing links each subject’s label to a position, , in an ordered array .
Let us define an initial listing by placing the subject labels into a vector , where the
subject’s label in position is for . We define the corresponding parameter vector by
, where is response for the subject with label . Different listings are defined by
, where , , is an permutation matrix with elements in row of column
that take a value of one when is in position in listing , and zero otherwise. Each row and
column of total to one. We define , an identity matrix, so that . By knowing
(and hence ), we know which subject is in position in , and hence know the subject
associated with response in position in .
The vector of random variables is defined for listing with points
for and . The matrix is an permutation matrix
with elements in row of column that take a value of one when the subject in listing is in
position in permutation , and zero otherwise. Rows and columns of total to one. By knowing
(and hence ), we know which subject from listing corresponds to the response in position of
. The labels for subjects in are given by . We associate an indicator random
variable, , with such that where for all , and denotes
expectation over . With these definitions, . Associated with each point is a
vector of subject labels, , and a probability . The additional indices and determine
, and along with the definition of , determine the order of the subjects in for population
. We also define, where .
2.2. The Prior Distribution with Exchangeable Subjects
The vector is defined for each , , which we now identify using the subscript
, such that where . For each , , we assume the vector
of random variables are exchangeable implying that the joint probability density is identical
for all , and represent a vector of random variables with this joint probability density by .
Defining as an indicator random variable for population such that , including the
distribution of the populations, we define . The product is a permutation matrix
(which we index by ), and it is possible to identify different pairs, that will result in
the same permutation matrix. When the product is identical for different pairs of and , i.e.
, the same point, will be realized via the random variables. As long as the probability
associated with the point, is the same in , for each , is exchangeable. Note
that these probabilities may be different for different . As a special case, when for all
and , then for population , the prior distribution of is exchangeable. In this setting,
and where is a vector of ones, , is an
identity matrix, , and . Using to denote expectation with respect to
, where and where and
. Since possible values of are defined by realizations of , and
.
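The moment structure just described can be checked numerically. The following Python sketch (hypothetical latent values) forms the vector of exchangeable random variables by treating all permutations of one population as equally likely points; every coordinate then has the same mean (the population mean), and the covariance matrix has a common diagonal element and a common off-diagonal element:

import numpy as np
from itertools import permutations

y = np.array([6.0, 9.0, 12.0])                   # hypothetical latent values for one population
points = np.array(list(permutations(y)))         # all orderings, taken as equally likely

print(points.mean(axis=0))                       # every coordinate has the population mean
print(np.round(np.cov(points.T, bias=True), 3))  # common diagonal, common off-diagonal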
2.3. The Data
We define the data next. Suppose response is observed on a set of labeled subjects from the
population given by where is the subject’s label, and is the subject’s
response. We list the subject labels in the data, in a vector , where
the subject’s label in position is for . We define the corresponding response vector
by , where is response for the subject with label . The subjects may be placed in
different orders defined by , , where is an permutation matrix with
elements in row of column that take a value of one when is in position in , and zero
otherwise. Each row and column of total to one. We define , an identity matrix, so that
. The vector where for is response for the subject whose label
is in row of . Using this notation, we define the data equivalently as ,
indexing the vector of subject labels and response by the different possible orders of subjects. For the
data, the average response is , and .
2.4. The Posterior Distribution
The posterior distribution is the joint distribution of the data and the prior distribution, given the
data. For to be consistent with the data, we require that , defining when ,
and otherwise. Notice that even when , as long as , only a subset of points in the
prior distribution of population will be consistent with the data. By such consistency, we require the
set of subjects observed in the data to match the set of subjects defined by points for the random
variables in the prior distribution that are associated with the data. These points will have positive
probability in the posterior distribution. We re-formulate the notation to be able to clearly define such
points.
For population where , without loss of generality, we assume that the first
positions in correspond to the data. Each point, with positive prior probability has
associated with it a vector of labels, , which we partition so that is
an vector representing subject labels that could potentially correspond to subjects in . Recall
that the initial listing of subjects in was arbitrarily defined by placing the subject labels into a vector
. Without loss of generality, we define the initial listing (corresponding to ) as
where , implying that the subject in each position of the first positions of
is the same subject as the subject in the corresponding position in the data given by .
Replacing by , . We partition the response vector in for listing in a
similar manner, such that where , and note that since , .
Recall that response for the data corresponds to any of the set of vectors where
specifies different ordering of subjects, . In addition, the subject labels for points
in the prior distribution of population are given by with response
. Let us partition such that is the first rows of . Using these expressions,
the subject labels for the point in the prior when can be expressed as
, implying that where with
of dimension . This vector of subject labels will match an ordering of the subject labels for
the data, , when is equal to for some . This identifies the subjects for
the partially observed points in the prior with subjects in the data. Only points where for
some can have positive probability in the posterior distribution. Assume now that
for some , say so that . The response for the vector of subjects
in the prior is given by , and equal to the response for the corresponding subjects in the
data given by . This implies that when , response for the prior listing for is given
by . We define the average response for the subjects in as
and .
We express the requirement that for some in a simpler manner by
changing notation. When , for some . Similarly,
for , where is an permutation matrix.
Let us define indicator variables, , and , for each point
where if , and zero otherwise. A point is in the posterior
distribution if , implying that for some and
. Given and , this will occur at most for one value of and , which we
indicate by and , respectively. Similarly, given and where , for each
listing , there will be one value of where . We represent response for such a point by
, noting that is equal to when . Given and listing , let
define an indicator for the point in the posterior. This same point will occur in each listing, and
since the distribution is exchangeable for each , it will have an identical prior probability given by
for all , and identical posterior probability (given ) equal to
. Using this notation, when we represent the posterior random variables for
by , where is an indicator random variable for the point
, and .
In many situations, , implying that for each permutation of the data, , a
permutation of the remaining subjects has the same probability of occurring for all . In
such a setting, we define as an indicator random variable that has a value of one when the order of
the first subjects is , and zero otherwise, and define as an indicator random variable that has
a value of one when the order of the remaining subjects is , and zero otherwise, such that
and . Using these indicator random variables, we define
and , so that where . The matrix
is an random permutation matrix where and , while
is a random permutation matrix where and . We define
the random vector whose realization corresponds to permutations of subject labels in the data
set.
Assuming all permutations are equally likely, for and , so
that . In this setting when , represents the
probability that the first subjects in correspond to permutation of and the remaining
subjects correspond to permutation of . Assuming , this results in
and . Note that the points in the posterior distribution are defined similarly for each
where , and that and .
In order for points in the posterior distribution to have positive probability, we require that
. Let define the set of in the prior that are consistent with the data, and
be an indicator random variable that is equal to one with probability , and zero
otherwise. Taking account of the prior probabilities for , we define the posterior random variables
which simplifies to where and .
We now consider the posterior distribution of . Recall that the prior distribution of is
defined by and their associated prior probabilities, , . The posterior
distribution of , given the data, , is defined by and their associated posterior
probabilities, where . Notice that we can express where the
first term in the expression, i.e. , is the total response for the subjects in the data which is known
for all . This total was not known for in the prior. The posterior distribution
differs from the prior in that it includes only prior parameters given by , with
replaced by .
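In assumed notation, with n the number of subjects in the data, \bar{y} their average response, \bar{y}_{r,g} the average latent value of the remaining N - n subjects in prior population g, and G_s the set of prior populations consistent with the data, the update just described can be written as

P(\theta_g \mid \text{data}) = \frac{ p_g \, \mathbf{1}\{ g \in G_s \} }{ \sum_{g' \in G_s} p_{g'} }, \qquad \theta_g = \frac{n}{N} \, \bar{y} + \frac{N - n}{N} \, \bar{y}_{r,g},

so that, conditional on the data, only \bar{y}_{r,g} remains uncertain.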
We briefly discuss the implications of these results before giving a simple example. Of special
interest is understanding the relationship between the prior distribution and the posterior distribution.
Let us focus on the difference that occurs in our understanding of a prior parameter after accounting
for the data. Assuming that , after accounting for the data, we represent
. The portion of that depends on is , the average response for
subjects that we did not observe in the data. This observation matches intuition, since if we observe
response on some subjects in the population and are interested in the average for the entire population,
then the part of the problem that remains unknown is a function of the remaining unobserved subject’s
response. Suppose that the population is very large relative to the data, such that . In such a
setting, , and the difference between the prior distribution and the posterior distribution
corresponds simply to altering the prior probabilities to exclude where , i.e., prior parameters
that are not consistent with the data. While this result is intuitive, it provides little weight to support
Bayesian inference in this context as a useful approach. We discuss a simple example that illustrates
these ideas next, and subsequently discuss an extension of these results to settings with measurement
error, where a Bayesian approach can provide valuable new inference.
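A short Python sketch makes the updating step concrete. The labels, latent values, and population membership below are hypothetical, while the prior probabilities match those used in the example of Section 2.5; a prior population contributes to the posterior only if it contains every subject observed in the data with exactly the observed latent value:

# Sketch of the updating step with hypothetical prior populations and data.
priors = [
    ({"Lily": 9, "Rose": 6, "Violet": 12}, 0.1),
    ({"Lily": 9, "Rose": 6, "Daisy": 15}, 0.2),
    ({"Lily": 7, "Rose": 6, "Violet": 12}, 0.3),
    ({"Iris": 8, "Rose": 6, "Daisy": 15}, 0.4),
]
data = {"Lily": 9, "Rose": 6}                # observed subjects and their latent values

# A prior population is consistent with the data only if it contains every observed
# subject with exactly the observed latent value.
consistent = [(pop, p) for pop, p in priors
              if all(pop.get(s) == y for s, y in data.items())]
total = sum(p for _, p in consistent)
posterior = [(sum(pop.values()) / len(pop), p / total) for pop, p in consistent]
print(posterior)                             # [(9.0, 0.333...), (10.0, 0.666...)]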
2.5. A Simple Example
We illustrate the posterior random variables with a simple example where the data corresponds
to response for , where . These subjects are members of
a population of subjects, and we wish to know the population mean,
, where . Although the population is well defined, we don’t actually
know the name of the third subject in the population.
Let us define a superpopulation by where and define the
subjects in the superpopulation by . We assume that the subjects in the population
are a subset of subjects in the superpopulation such that ; the subject’s label, , is
assumed to be unique for each subject in the superpopulation; and the subject’s response, , is a non-
stochastic parameter. We also assume that for and , . This
means that the response for a subject in the data is the same response that would occur for the subject if
they were in any prior population, . We define the prior distribution of as with their
associated prior probabilities, , , for with subjects
, and assume that the parameters correspond to the average
response for different possible subsets of , as illustrated in Table 1. The prior probabilities that we
associate with each are subjective. For illustration, we assume , ,
and .
-insert Table 1 here-
The prior distribution may be reasonable in a setting where the finite population is constantly changing,
and we are interested in the mean response at a particular point in time. The probabilities associated
with the parameters in the prior are assigned subjectively, and this subjectivity is evident in the
probabilities associated with parameters in the posterior distribution.
We expand the prior distribution for each assuming exchangeability of response for the
subjects, illustrating the points for in Table 2. Each row in Table 2 corresponds to a different
listing, while the columns are represented by the vectors partitioned into an vector
that represents the subspace of that is potentially observed
(corresponding to the data (above the dashed line) with values, , where Initial corresponds to the
first letter of the subject’s name), and an vector that represents the ortho-
complement of corresponding to sub-space for the remaining subjects, with values . The
permutation matrices for each column, , are listed at the top. The points above the dashed line, i.e.
, represent possible values of the prior random variables that could correspond to the data.
The value for a subject in the data is represented by . We indicate a coordinate in
where the subject and values match by , and coordinates where the subject and value is
not in the data by . Coordinates in that are not potentially observed in the prior, but where
the subject was included in the data are indicated by . Only points where all coordinates
can be represented by either or have positive probability in the posterior
distribution. This results in 12 points in the posterior distribution in Table 2 when , a similar 12
points in the posterior distribution when (with replaced by ), and no points included in the
posterior distribution when or .
-Insert Table 2 here-
The points in the posterior distribution are summarized in Table 3. We note that for each listing,
each point occurs once so that normalizing the probabilities, the posterior distribution of can be
summarized in Table 4.
-insert Table 3 and 4 here-
For illustration, suppose that response for subjects in the superpopulation is given by ,
, , and so that parameters in the prior distribution of are given by ,
, , and with prior probabilities 0.1, 0.2, 0.3, and 0.4, respectively. Given the data
on Lily of and Rose of , parameters in the posterior distribution of are given by
and with probabilities 0.33 and 0.67, respectively.
Notice that the same points for occur for each with positive probability in the posterior
distribution. As a result, the prior probabilities do not alter the posterior distribution of , which
depends only on the assumption of exchangeability, such that and .
The expected value and variance of is different for the prior and the posterior. In the prior, the
expected value is , where ; in the posterior, the expected value of is . The
expected value and variance in the posterior depends on the prior probabilities where
where and , while
where and .
In the posterior, and are independent.
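Continuing with hypothetical values of the prior parameters (chosen only to make the arithmetic concrete; the prior and posterior probabilities are those of the example), the prior and posterior moments of the population mean parameter can be computed directly from the two distributions:

# Prior and posterior moments of the population mean parameter (parameter values assumed).
prior = [(9.0, 0.1), (10.0, 0.2), (8.0, 0.3), (11.0, 0.4)]    # (theta_g, p_g)
posterior = [(9.0, 1/3), (10.0, 2/3)]                          # only g = 1, 2 survive the data

def moments(dist):
    mean = sum(t * p for t, p in dist)
    return mean, sum(p * (t - mean) ** 2 for t, p in dist)

print(moments(prior))       # prior mean and variance of the parameter
print(moments(posterior))   # posterior mean is about 9.67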
3. THE PRIOR AND POSTERIOR DISTRIBUTION WHEN IDENTIFIABLE SUBJECTS
ARE MEASURED WITH ERROR
We discuss a setting where subjects are measured with error, such that an observed response is a
subject’s latent value plus measurement error. Koop (1974) discussed a general sample survey
framework with measurement error similar to that used here, but did not discuss a Bayesian approach
for inference. Our interest may be in the average latent value, , in a population, or in the latent value
for a subject in the data. We discuss inference for these two target parameters based on the posterior
distribution next.
3.1. The Data when Subjects are Measured with Error
Suppose response is observed on occasion for a set of labeled subjects given by
where is a subject’s label and is a subject’s response on occasion .
The set of subject labels and are defined in Section 2.3. We define the vector of latent values
corresponding to by and the response vector by where the latent value and
response for is given by and , respectively.
We assume that for on occasion , we observe . The latent value, , and
measurement error, , are not directly observable. Assume that for , is the realization of a
random variable representing measurement error, where and , with
indicating expectation with respect to measurement error. We also assume that for any
two measures of the same subject, and for measures on any two subjects. Although in general,
measures may be made on a subject, we assume for simplicity that for all . With these
assumptions, we represent response for as the realization of the random variable . We
define the corresponding vector of random variables by where , ,
and . The set of realized responses of the random variables , , is observed.
Subjects may be placed in different orders defined by , , where we define
so that . The vector corresponds to latent values for the subjects in order
, and corresponds to a vector of random variables whose realization is response for the
subjects in order . Using this notation, we define the latent values for the subjects in the data
equivalently as and the random variables whose realization is response for the
subjects equivalently as , indexing the vector of subject labels, latent values,
and random variables for response by the different possible orders of subjects. For the data, the average
response is , the average latent value is , and .
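In assumed notation, with a single measure per subject, the measurement error structure just described can be written as

Y_s = y_s + W_s, \qquad E_R(W_s) = 0, \qquad \mathrm{var}_R(W_s) = \sigma_s^2, \qquad \mathrm{cov}_R(W_s, W_{s'}) = 0 \ \text{for } s \neq s',

where E_R denotes expectation with respect to measurement error, y_s is the subject’s fixed latent value, and the error variance \sigma_s^2 may differ between subjects.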
3.2. The Prior Distribution.
We motivate the prior distribution similarly to the setting where there is no measurement error.
Although the average latent value, , for population is not known, we assume that from previous
studies and/or experience, we can guess it. Associated with each guess we assign a prior probability
, , where . We believe that the actual population mean, , is one of the possible
values specified in the prior. We define the vector of latent values corresponding to in Section 2.1
by and a corresponding vector of measurement error variances by . The
prior parameter is defined for with subjects for
. The prior distribution is defined as in Section 2.2, as where
and , with corresponding measurement error variances given by
. When , we define , denoting and , with the
corresponding partitioned vectors represented by and .
3.3. The Posterior Distribution
We proceed similar to Section 2.4 to define the posterior distribution as points in the prior
distribution that are consistent with the data. By consistency, we require . A point is in the
posterior distribution if , and for some and . For
, the latent values for these points define the latent values for the posterior random
variables given by , where . When there is measurement error, the
latent values in are not observed. Instead, we observe a realization of response
for . Replacing by , we represent the posterior random variables by
where and is response for the subject in row of .
Assuming all permutations are equally likely, the expected value simplifies to
and such that , and
where . Notice that
and are independent. The posterior distribution of does not depend on the prior distribution, ,
while the posterior distribution of is marginal over measurement error.
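The feature that the measurement error component of this posterior involves the average error variance, rather than a subject-specific one, can be illustrated numerically. In the Python sketch below (hypothetical latent values and error standard deviations), subjects are assigned to positions by a random permutation and each carries its own error variance; the marginal variance in every position is then the permutation variance of the latent values plus the average of the error variances:

import numpy as np

rng = np.random.default_rng(0)
latent = np.array([6.0, 9.0])      # hypothetical latent values (Rose, Lily)
sigma  = np.array([1.0, 2.0])      # hypothetical subject-specific error standard deviations

draws = []
for _ in range(100_000):
    perm = rng.permutation(2)                        # exchangeable assignment of subjects to positions
    draws.append(latent[perm] + rng.normal(0.0, sigma[perm]))
draws = np.asarray(draws)

# Every position has the same marginal variance: the permutation variance of the latent
# values plus the AVERAGE of the error variances, not a subject-specific error variance.
print(np.round(draws.var(axis=0), 2))
print(round(latent.var() + (sigma ** 2).mean(), 2))  # 2.25 + 2.5 = 4.75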
3.4. An Example with Measurement Error
Consider an application where there is interest in the population mean, , for the listing . A
subjective Bayesian may assign latent values to subjects in a superpopulation of given by
, , , and so that parameters in the prior distribution of the average latent value in
a population of subjects are given by , , , and with prior
probabilities 0.1, 0.2, 0.3, and 0.4, respectively. Suppose the subjects are ,
where we assume that and , so that . These assumptions imply that
and that .
We represent response for the points in the posterior distribution in terms of (instead of the
realized response, ), and represent including measurement error in .
Assuming the prior distribution is the same as the prior in section 2.5, the posterior distribution can be
summarized in terms of four random vectors in Table 5.
-insert Table 5 here-
Since and the latent value for is known (i.e., and equal to when or
when , the posterior distribution of is with probability 1/3 and with probability
2/3. If the posterior mean is used as an estimate of , then the estimate is 9.67.
Alternatively, suppose we wish to estimate the latent value for a subject in the data, say Lily.
Apart from the assumption of exchangeability, the prior distribution is not relevant since in the
posterior, , where are random subject effects, and
. Lily cannot be identified in the posterior distribution. However, we can express
the model for the random variable in position in by
,
and use finite population mixed model methods (Stanek and Singer, 2004; Stanek and Singer, 2011) to
predict the latent value for the subject in row of given by the best linear unbiased predictor
(BLUP), which simplifies to where and . The quantity
is the latent value for the subject in row of , where is the random
effect associated with the subject in row of . When Lily is in row , we refer to Lily as the
realized subject, and as the realized random effect. If we substitute Lily’s response, for
and for in , we obtain the BLUP of the realized subject’s latent value.
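A schematic sketch of this kind of predictor is given below. The responses, variance components, and the simple form of the shrinkage constant are all assumptions made for illustration; the FPMM BLUP of Stanek and Singer (2004) includes finite population terms omitted here. The key feature is that a single shrinkage constant, built from the average measurement error variance, is applied to every subject’s deviation from the sample mean:

# Schematic shrinkage predictor of a realized subject's latent value (assumed form; the
# FPMM BLUP of Stanek and Singer (2004) includes finite population terms omitted here).
responses = {"Rose": 6.5, "Lily": 9.5}     # hypothetical observed responses
sigma2 = {"Rose": 1.0, "Lily": 4.0}        # hypothetical measurement error variances

ybar = sum(responses.values()) / len(responses)
s2_latent = 2.25                                   # assumed latent value variance component
sigma2_bar = sum(sigma2.values()) / len(sigma2)    # average measurement error variance

k = s2_latent / (s2_latent + sigma2_bar)           # one common shrinkage constant
blup_lily = ybar + k * (responses["Lily"] - ybar)
print(round(blup_lily, 3))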
4. PRIOR DISTRIBUTIONS THAT PARTIALLY IDENTIFY SUBJECTS
The results in Section 3 illustrate that the sampling distribution underlying a FPMM for the
subjects in the data set matches the posterior distribution conditional on the subjects in the data set in a
Bayesian analysis with an exchangeable prior distribution for subjects and measurement error,
regardless of the prior probabilities. Conditional on the data set, the prior probabilities are retained only
in the distribution of the remaining subjects not included in the data set. Subjects are not identified in the
posterior distribution; the measurement error variance for any random variable, for in the
posterior distribution is constant, and equal to the average measurement error variance for subjects in
the data set given by . We develop a partially identifiable Bayesian model by altering the prior
distribution so that the measurement error variance in the posterior distribution can be identified with a
labeled subject. Of interest is the latent value for a labeled subject in the data where response is
observed with measurement error and the measurement error variance for the subject is known.
We motivate this new prior distribution based on the results in Section 3, where the response for
a subject in the posterior is the realization of a random variable for in .
The subject is not identifiable, and we cannot identify in which position the subject will occur. In the
prior distribution, there will be one or more random variables given by
such that for some realization of , ,
and . In the posterior distribution, we represent these random variables by .
In the data, both the subject’s label and response are known. Let be an permutation
matrix such that lists the subjects in the data in a specific order. This order is arbitrary, and can
be specified subjectively. We refer to this ordering as the realized order. We would like to associate
with the response for each subject in the realized order their measurement error variance. For this to
occur, the measurement error must be equal to for all .
We propose a prior distribution that will maintain the correspondence of the measurement error
variance for each subject in the realized order in the posterior distribution. In order to keep the
expression for the measurement error to equal for all in the posterior distribution, let
us define , and represent each of the random variables in the prior that would correspond to
in the posterior distribution by random variables in a new prior distribution given by
. Since each of these prior random variables is the posterior distribution, and
, we can represent the random variables in the posterior distribution by . We
call the prior that results in this posterior a partially identifiable exchangeable prior.
For a given , we express the partially identifiable exchangeable prior by defining
response for points indexed by and similar to , using the same marginal (over response
error) prior probabilities, , for these points. We only change the definition of the points in the
prior distribution that have positive probability in the posterior distribution, replacing
by . We represent random variables for the posterior distribution as
. Assuming all permutations are equally likely,
, with expected value , and
where and
. The posterior distribution is conditional on the data, with
measurement error assigned according to subjects listed in the posterior distribution in order . In
this way, the measurement error is identified with a subject, but the subject’s latent value is not
identifiable.
The posterior distribution of for the partially identifiable Bayesian model has an expected
value and variance similar to a commonly used mixed model. Let us list subjects in the data set in the
order and represent their response by corresponding to the realization of
, where is a random variable representing response for the subject with the label
in row of . The posterior distribution is the distribution of a vector of random variables,
, where the subject associated with position in the vector has a label in position
in . The expected value is given by , and the variance is
, where and represents the measurement error
variance for the subject in position of . Consider any set of constants, where
and , and define a corresponding vector . Also, let us define
so that and . With these definitions, we can represent
for where . The expected value and variance of this model is similar to that
for a FPMM used to represent the posterior distribution in Section 3, but with a different variance
structure. The BLUE estimator of in this model is the weighted least squares estimator,
where and . The BLUP of the latent value for subject in
position is . To obtain an estimate of or of the latent value for given
by , we replace by the realization of in or .
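A Python sketch of the corresponding calculations is given below under the usual mixed model forms, with weights proportional to the reciprocal of the sum of the latent value variance and each subject’s error variance, and a shrinkage constant specific to each subject. The numerical inputs are hypothetical, and the manuscript’s exact expressions may include finite population corrections omitted here:

import numpy as np

# Sketch of the WLS mean and subject-specific shrinkage predictors under the usual mixed
# model forms; inputs are hypothetical and finite population corrections are omitted.
y      = np.array([6.5, 9.5])      # hypothetical responses, listed in the realized order
sigma2 = np.array([1.0, 4.0])      # known subject-specific measurement error variances
tau2   = 2.25                      # assumed latent value variance component

w = 1.0 / (tau2 + sigma2)                      # weights for the weighted least squares mean
mu_wls = np.sum(w * y) / np.sum(w)

k = tau2 / (tau2 + sigma2)                     # shrinkage constant specific to each subject
blup = mu_wls + k * (y - mu_wls)               # predictor of each subject's latent value
print(round(mu_wls, 3), np.round(blup, 3))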
It is valuable to compare the possible points in the posterior distribution to the points that can
actually be observed with a measurement error model for subjects in the data. We do so via a simple
example where we assume with , defined such that and
. We assume that Rose and Lily have latent values given by 6 and 9, respectively such that
. This implies that and . We also assume that the measurement
error variance is known, and given by , while . Finally, we assume that
measurement error deviations can take on two possible equally likely values (i.e. for Rose, ; for
Lily, ). With these assumptions, the response for could be , ,
or with equal probability.
Let so that represents the order of subjects in the partially
labeled posterior distribution. The random variables in of the partially identifiable posterior
distribution correspond to where and .
Also, and . When , the only values of that can satisfy
and corresponds to . As a result, possible points in the
partially labeled posterior distribution correspond to , , , , ,
, , . In the posterior distribution, each of these points is equally likely. Only
the first four points in this distribution match possible realizations of the data, distinguishing the
partially identifiable posterior distribution from reality. By including the additional artificial points in
the posterior distribution, the average accuracy of the estimator of each latent value (over ) is
best.
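The Python sketch below reproduces the structure of this example with hypothetical error magnitudes (±1 for Rose and ±2 for Lily). The latent values are exchanged across the two positions while the two-point error distributions stay attached to the positions of the realized order, giving eight equally likely points, of which only the four with unswapped latent values could actually be observed:

from itertools import product

# Sketch of the partially labeled posterior for two subjects (hypothetical error magnitudes).
latent = {"Rose": 6.0, "Lily": 9.0}                  # latent values from the example
err = {"Rose": (-1.0, 1.0), "Lily": (-2.0, 2.0)}     # assumed two-point error distributions

# Points that could actually be observed: each subject keeps its own latent value and error.
data_points = {(latent["Rose"] + e1, latent["Lily"] + e2)
               for e1, e2 in product(err["Rose"], err["Lily"])}

# Partially labeled posterior: latent values are exchangeable over the two positions, but
# the error distributions stay attached to the positions of the realized order (Rose, Lily).
points = [(a + e1, b + e2)
          for a, b in ((6.0, 9.0), (9.0, 6.0))
          for e1, e2 in product(err["Rose"], err["Lily"])]

observable = [p for p in points if p in data_points]
print(len(points), len(observable))                  # 8 equally likely points, 4 observable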
5. DISCUSSION
The development has practical implications for interpretation of the posterior distribution in a
Bayesian analysis in such simple settings. First, since the posterior distribution can be divided into two
independent distributions, with one including the subjects in the data set, if so that ,
apart from limiting the prior populations to those where , the mean of the posterior
distribution will minimally update the mean of the prior distribution. Interpretation of this mean is
closely related to interpretation of the mean of the prior distribution. If all possible sets of
subjects from a superpopulation are assigned equal prior probabilities in a prior, the mean of the
posterior distribution cannot be interpreted as the simple mean latent value in the superpopulation since
subjects in the data set will have different weights, clarifying discussion by Graubard and Korn (2002).
The framework permits simultaneous discussion of inference from finite population, superpopulation,
and Bayesian approaches. Second, in the simple Bayesian model with heterogeneous measurement
error, subjects’ latent values and measurement errors are not identifiable in the posterior distribution.
The BLUE of the mean latent value for subjects in the data set using the posterior is the simple mean
response, not the weighted least squares mean. Similarly, the BLUP for a realized subject’s latent value
is the FPMM BLUP similar to Stanek and Singer (2004), not a BLUP with shrinkage constants that
vary with the realized subjects.
Finally, we introduce a new prior distribution that allows measurement error to be identified
with a subject in the posterior distribution of the data set, but not with the subject’s latent value. The
BLUE of the mean latent value for subjects in the data set based on such a posterior is the weighted
least squares mean, and the BLUP of a realized subject’s latent value has a shrinkage constant specific
for the realized subject. Simulation studies similar to those of San Martino, Singer, and Stanek (2008)
in a finite population show that the BLUP of a realized subject’s latent value that allows the shrinkage
constants to vary along with the realized subject’s variance can be more accurate. More importantly,
the explicit tracking of subject labels for points in the posterior in Section 2 makes it possible to develop
flexible prior distributions so as to retain a subject’s identity for specified terms (such as measurement
error) in the posterior distribution for subjects in the data set. This flexibility enables models to be
specified that partially condition on some subject-specific factors, without conditioning fully on a
subject. Predictors from such models, such as the weighted least squares mean, may result in more
accurate inference. The Bayesian models that partially account for identifiable subjects appear to be a
step in this direction. We expect that further study of these models may lead to a better understanding
of how to creatively expand reality to better understand it.
References
Basu, D. (1969). “Role of the sufficiency and likelihood principles in sample survey theory,” Sankhyā,
Series A, 31:441-454.
Ericson, W.A. (1969). “Subjective Bayesian models in sampling finite populations,” Journal of the
Royal Statistical Society B 31:195-234.
Ericson, W.A. (1988). Bayesian inference in finite populations. In P.R. Krishnaiah and C.R. Rao (eds),
Handbook of statistics, Vol. 6. Elsevier Science Publishers, Amsterdam, pp. 213-246.
Godambe, V.P. (1955). “A unified theory of sampling from finite populations,” Journal of the Royal
Statistical Society, Series B, 17, 268-278.
Godambe, V.P. (1966). “Bayes and empirical Bayes estimation in sampling finite populations,”
Technical report No 41, Department of Statistics, The Johns Hopkins University, 1966.
Graubard, B.I. and Korn, E.L. (2002). “Inference for superpopulation parameters using sample
surveys,” Statistical Science 17(1):73-96.
Koop, J.C. (1974). “Notes for a unified theory of estimation for sample surveys taking into account
response errors,” Metrika 21:19-39.
San Martino, S., Singer, J.M., and Stanek, E.J. III (2008). “Performance of balanced two-stage empirical
predictors of realized cluster latent values from finite populations: a simulation study,” Computational
Statistics and Data Analysis 52:2199-2217.
Scott, A. and Smith, T.M.F. (1969). “Estimation in multi-stage surveys,” Journal of the American
Statistical Association 64(327):830-840.
Stanek, E.J. III and Singer, J.M. (2004). “Predicting random effects from finite population clustered
samples with response error,” Journal of the American Statistical Association 99(468):1119-1130.
Stanek, E.J. III and Singer, J.M. (2008). “Predicting random effects with an expanded finite population
mixed model,” Journal of Statistical Planning and Inference 138(10):2991-3004.
Stanek, E.J. III and Singer, J.M. (2011, in press). “Sampling, WLS and Mixed Models,” Statistics in
Biopharmaceutical Research.
Zhang, R. (2010). “Developing best linear unbiased estimator in finite population accounting for
measurement error due to interviewer,” Doctoral dissertation, University of Massachusetts Amherst,
Electronic Doctoral Dissertations, Paper AAI3427614.
Table 1. The Prior Distribution of with parameters equal to the average response of subjects from subsets of subjects in a superpopulation with .
Probability Mean Subjects Population
Table 2. Possible Points in the Prior Distribution for population with when Data is on
(as indicated by response )
Listing
Table 3. Possible Points in the Posterior Distribution with when Data is on
and the Prior Probability of the Point for a Given Listing.
Table 4. The Posterior Distribution with when Data is on , along with Posterior Points corresponding to an Exchangeable Prior
Posterior Parameter
Posterior Probability
Points in Posterior Distribution
Table 5. The Posterior Distribution with when Data is on , along with Posterior Points corresponding to an Exchangeable Prior with Measurement Error
Posterior Parameter
Posterior Probability
Points in Posterior Distribution