Bayesian Models with Measurement Error that Partially Account for Identifiable Subjects
Edward J. Stanek III, Parimal Mukhopadhyay, Viviana B. Lencina, and Luz Mery González
Department of Public Health, University of Massachusetts at Amherst, USA
Indian Statistical Institute, Kolkata, India
Facultad de Ciencias Economicas, Universidad Nacional de Tucumán, CONICET, Argentina
Departamento de Estadística, Universidad Nacional de Colombia, Bogotá, Colombia
ABSTRACT
In a mixed model, latent values associated with subjects are typically random, while the subject-specific
measurement error variances are not considered to be random. To understand this paradox, we
consider prediction via Bayesian models similar to those proposed by Ericson (1969), but that account
for identifiable subjects. Defining the data as response for a subset of identifiable subjects from a finite
population, we note that under an exchangeable prior for the population latent values, the posterior
distribution of the subjects’ latent values in the data is itself exchangeable. We expand this
development to settings where response is measured with error on
subjects in the data set, and develop the expected value and variance of the posterior distribution of the
subjects’ latent values in the data, revealing that the measurement error variance component in the
posterior distribution includes the average measurement error variance instead of subject-specific
measurement error variances. Based on these results, we specify a new prior distribution that leads to a
posterior distribution of the subjects’ latent values in the data where the subjects’ identities are retained
for measurement error, but not for the corresponding latent values. This class of models allows flexible
specification of fixed and random effects, and highlights the distinction between potentially observable
points and artificial points in the prior distribution. The results clarify the relationship between physical
populations and measurements, and stochastic models that may partially connect to this physical reality.
The results have important implications in applications where there is interest in estimating population
parameters, domain means, and latent values for realized random effects.
ACKNOWLEDGEMENT
This work was developed at a joint meeting of the authors in July 2010 in the Department of Public
Health at the University of Massachusetts, Amherst, USA and a follow-up meeting in September 2010
in the Departamento de Economia, Universidad Nacional de Tucumán, Tucumán, Argentina.
Appreciation is given to the helpful comments of Julio Singer, Wenjun Li, Michael Lavine, Shrikant
Bangdiwala, Silvina San Martino, and Mirta Santana on early drafts of this manuscript. Previous
meetings over the past five years of the Finite Population Mixed Model Research group (including
many of these investigators) were supported by the National Institutes of Health (NIH-PHS-R01-
HD36848, R01-HL071828-02), USA, Conselho Nacional de Desenvolvimento Científico e Tecnológico
(CNPq) and Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP), Brazil.
Keywords: Finite population, heteroskedasticity, superpopulation, unbiasedness, inference, mixed
models, finite population mixed model.
1. INTRODUCTION
Estimating a subject’s latent value in a population based on response for a subset of subjects is a
common problem. When subjects are measured with error, subjects’ labels have an intriguing role. If
the subjects’ identities are known for the subset and used in defining a stochastic model for each
subject’s responses, the response error model has fixed subject effects and a subject’s latent value can
be estimated directly from the response error model. Ignoring the subject labels, a mixed model can be
specified for response that sums the latent value and measurement error for subjects in a randomly
selected set of subjects, and used to predict a realized subject’s latent value. Although the latent value
of a realized subject in the set can be estimated (or predicted) with each model, the best linear unbiased
predictor (BLUP) from the mixed model is more accurate.
The subjects’ labels are not completely suppressed in mixed models when measurement error
variances differ between subjects. In such settings, although subject labels are ignored (so that subject
effects are random), response error is identified with realized subjects. This partial use of labeling
results in weighted least squares estimates of the mean in a mixed model, and shrinkage constants for
BLUPs of realized subjects’ latent values that depend on the realized subjects’ response error variances.
The better accuracy of these estimators relative to estimators that result from a finite population mixed
model (FPMM) proposed by Stanek and Singer (2004) provides the motivation for this investigation.
We seek to more clearly understand the manner in which labels are partially used in mixed models
when measurement error variances are heterogeneous across subjects.
One approach that enables a subject’s identity to be known is to expand the number of random
variables in a finite population framework that represents the underlying problem, as in Stanek and
Singer (2008). Models using this approach can include effects identified with subjects, but the
approach does not lead to practical estimators. The approach is based on representing the data using an
expanded number of random variables relative to finite population models, but fewer random variables
than would be needed for Godambe’s (1955) general sampling framework, where a different probability
may be associated with each permutation of the labeled data (Zhang, 2010). A second approach to
partially account for a subject’s identity is to use superpopulation models (Godambe, 1966). Although
superpopulation models may be developed where subjects’ latent values are random and measurement
error is fixed, as in Scott and Smith (1969), they do so by introducing points into the modeling space
that cannot be observed (Stanek and Singer, 2011). We explore a third approach to partially account for
a subject’s identity that uses Basu’s (1969) sufficient statistics in Godambe’s framework in a Bayesian
context.
Our investigation focuses on the role of labels in a simple Bayesian model similar to Ericson
(1969, 1988), proceeding in the following manner. We first introduce notation and the basic ideas when
there is no measurement error and where our interest is in the target parameter corresponding to the
population mean. Subsequently, we apply this framework to a setting where a subject’s response
includes measurement error with heterogeneous variances, and develop a predictor of a realized
subject’s latent value from the posterior distribution. Random variables for this distribution can be
defined identically to those in a FPMM, assuming the subjects in the data set are a finite population.
Subject-specific measurement error variances are not identifiable. Finally, we introduce a new prior distribution that
results in a posterior distribution where the measurement error variance is identifiable, but random
effects only match the first and second moments of a set of exchangeable random variables whose
realization is a vector of latent values for subjects in the data set. Response for a subject in this
posterior distribution can be represented by a mixed model where the latent values for the subjects in
the data are exchangeable, but the measurement error variance is identified with a realized subject.
2. THE PRIOR AND POSTERIOR DISTRIBUTIONS OF THE POPULATION MEAN
WITH IDENTIFIABLE SUBJECTS AND NO MEASUREMENT ERROR
Assume our interest is in the mean response in a population of subjects, but that the observed
data corresponds to a subset of labeled subjects’ responses (equal to the subjects’ latent values). We
define the population as where represents a subject’s label (such as a
name, assumed to be unique for each subject), and represents the subject’s response (equal to the
subject’s latent value) which is a non-stochastic parameter. The population mean is the simple average
latent value in the population, which we represent by , where is
the set of subjects in the population. It is not necessary that the labels be completely known, nor that
be exactly specified for such a population to be conceptually defined. However, we require the
population to be defined in space and time, so that these quantities (while not known) are at least
potentially knowable, and is defined for . For illustration, our interest may be in the average
number of hypoglycemic episodes in the past year that occurred for Medicare enrollees with a diabetes
diagnosis in Massachusetts on 7/1/2010. Conceptually, a list could be formed of all enrollees to define
the finite population, where for each enrollee, the actual number of hypoglycemic episodes is recorded.
It is not necessary to list the enrollees and record each response to have a clear interpretation of .
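For concreteness, the setup can be summarized in one notation; the symbols below are illustrative assumptions (not necessarily those used in the original display equations):

U = \{ (s, y_s) : s = 1, \ldots, N \}, \qquad \bar{y}_N = \frac{1}{N} \sum_{s=1}^{N} y_s ,

where each label s is unique, each latent value y_s is a fixed (non-stochastic) parameter, and \bar{y}_N is the population mean of interest.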
Although we don’t know the population mean, we assume that from previous studies and/or
experience, we can guess it. The previous studies most likely were conducted in different locations and
times, and have different strengths and limitations. Associated with each guess, we assign a prior
probability that reflects our subjective measure of belief that the guess is the actual value of the
parameter for the population. These values, , and their associated prior probabilities, ,
, define the prior distribution, where . We assume (i.e. believe) that the actual mean is one of
the possible values specified in the prior. Let the prior parameter be defined for
with subjects . In order for , we require not
only that , and , but also that for each , must be identical in and .
Different interpretations can be given to the population underlying the prior parameter and its
associated prior probability, .
One concept of the population underlying the prior parameters is that the subjects are the same
for all , such that , but that the latent value, , for a given subject is uncertain. The
populations and are distinguished by having and differ for at least one .
An example occurs when the populations are defined by associating the latent values in any of
possible permutations of latent values with a given listing of subjects in . For this set of prior
populations if all responses are distinct, although and for all , for only
one will equal . It is for this population that we wish to know . A variation on this concept
is to define the populations by associating the latent values in any of possible permutations of
any of a set of latent values where , resulting in . The subjects may
correspond conceptually to the subjects in a superpopulation. The prior parameter, , will be the same
for all permutations of a set, but may differ for different sets. When is very large, the prior
distribution of could be approximated by a continuous parametric distribution.
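As a small illustration of this construction, the Python sketch below (with hypothetical labels and latent values, not values from the manuscript) enumerates the populations obtained by permuting a fixed set of latent values over a fixed list of subjects. Every permutation yields the same mean, so the prior parameter is constant within a set of latent values and can only differ across sets:

from itertools import permutations

subjects = ["Lily", "Rose", "Violet"]        # hypothetical subject labels
latent_values = (6.0, 9.0, 12.0)             # hypothetical latent values

# Each permutation of the latent values over the fixed list of subjects defines one
# prior population; all of them share the same population mean.
populations = [dict(zip(subjects, perm)) for perm in permutations(latent_values)]
means = {sum(pop.values()) / len(pop) for pop in populations}
print(len(populations), means)               # 6 populations, one common mean (9.0)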
A different concept of the prior can be illustrated via an example. Suppose we want to know the
average annual Medicare cost per enrollee in 2010 in the United States, with enrollees defined as the
enrollees on July 1, 2010. From past research, we can specify (or guess) values of for each of the
years 2006-2009, knowing that the population of Medicare enrollees differs somewhat between the
years. Conceptually, assume that we are able to form a list of all enrollees over the time period 2006-
2009, and associate with each enrollee their average annual cost for each year they are enrolled in
Medicare. Let us augment this list with the new 2010 enrollees in Medicare to form a superpopulation
of subjects, . For a subject enrolled for four years in Medicare, there will
be four pairs that may differ as a result of different latent values for the four years.
By definition, the enrollees in 2010 are a subset of the subjects, but without knowing ,
we don’t know exactly what subset corresponds to the 2010 enrollees. Suppose we construct all
possible subsets of subjects from the superpopulation, and for each subject in the subset, choose
one pair, defining and . The 2010 enrollees correspond to one of these subsets of
subjects, but we don’t know which one. We may limit the possible latent values for a subject to those
observed in previous years, or expand the set of latent values that are possible for a subject similar to
the previous example. Conceptually, the entire set of populations and their corresponding parameters,
, can be summarized in a distribution. Although the actual enrollees in 2010, , are not known, for
many , . In one of these populations, we assume . The prior distribution reflects our
uncertainty over for which , is equal to .
Often the prior distribution may be defined by and their associated prior probabilities, for
with only an implicit understanding of . Additional specificity of the prior population is
not needed, and thus suppressed. The population could conceivably be identical to any of the
populations in the prior. Suppose now that we observe data on a
subset of subjects, , where is the latent value for the subject labeled . The
basic idea underlying Bayesian inference is to use these data to reduce the uncertainty associated with
the prior distribution. The uncertainty is reduced since once we know the subjects in the data, only
prior populations that include these subjects, i.e., where , are possible. The challenge in
updating the prior distribution is identifying where , such that for each , .
2.1. Identifying Subjects in an Exchangeable Distribution for Population
In order to have a clear strategy for linking subjects in the data to subjects in the prior
distribution, we expand the representation of the prior so as to be able to identify subjects. We do so by
assuming that response for the subjects in each population is a realization of a vector of
exchangeable random variables, and define notation that enables subjects to be identified for each point
in such an exchangeable distribution. Suppressing the subscript to simplify notation, for each
, we define , as a vector of exchangeable random variables similar to
Ericson (1969) such that the joint probability density of response ,
associated with each permutation, , of the subjects in is identical for all . Unlike
Ericson, we define points in the prior distribution for each so that both the subject’s parameter and
the subject’s label can be identified. To do so, we introduce notation to keep track of permutations of
subject labels used to construct different listings, and notation for permutations of subject’s response
used to define possible points in . In each case, we maintain the connection between the subject’s
label and response evident in . First, we define different possible listings.
A population listing links each subject’s label to a position, , in an ordered array .
Let us define an initial listing by placing the subject labels into a vector , where the
subject’s label in position is for . We define the corresponding parameter vector by
, where is response for the subject with label . Different listings are defined by
, where , , is an permutation matrix with elements in row of column
that take a value of one when is in position in listing , and zero otherwise. Each row and
column of total to one. We define , an identity matrix, so that . By knowing
(and hence ), we know which subject is in position in , and hence know the subject
associated with response in position in .
The vector of random variables is defined for listing with points
for and . The matrix is an permutation matrix
with elements in row of column that take a value of one when the subject in listing is in
position in permutation , and zero otherwise. Rows and columns of total to one. By knowing
(and hence ), we know which subject from listing corresponds to the response in position of
. The labels for subjects in are given by . We associate an indicator random
variable, , with such that where for all , and denotes
expectation over . With these definitions, . Associated with each point is a
vector of subject labels, , and a probability . The additional indices and determine
, and along with the definition of , determine the order of the subjects in for population
. We also define, where .
2.2. The Prior Distribution with Exchangeable Subjects
The vector is defined for each , , which we now identify using the subscript
, such that where . For each , , we assume the vector
of random variables are exchangeable implying that the joint probability density is identical
for all , and represent a vector of random variables with this joint probability density by .
Defining as an indicator random variable for population such that , including the
distribution of the populations, we define . The product is a permutation matrix
(which we index by ), and it is possible to identify different pairs, that will result in
the same permutation matrix. When the product is identical for different pairs of and , i.e.
, the same point, will be realized via the random variables. As long as the probability
associated with the point, is the same in , for each , is exchangeable. Note
that these probabilities may be different for different . As a special case, when for all
and , then for population , the prior distribution of is exchangeable. In this setting,
and where is a vector of ones, , is an
identity matrix, , and . Using to denote expectation with respect to
, where and where and
. Since possible values of are defined by realizations of , and
.
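The moment structure just described can be checked numerically. The following Python sketch (hypothetical latent values) forms the vector of exchangeable random variables by treating all permutations of one population as equally likely points; every coordinate then has the same mean (the population mean), and the covariance matrix has a common diagonal element and a common off-diagonal element:

import numpy as np
from itertools import permutations

y = np.array([6.0, 9.0, 12.0])                   # hypothetical latent values for one population
points = np.array(list(permutations(y)))         # all orderings, taken as equally likely

print(points.mean(axis=0))                       # every coordinate has the population mean
print(np.round(np.cov(points.T, bias=True), 3))  # common diagonal, common off-diagonal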
2.3. The Data
We define the data next. Suppose response is observed on a set of labeled subjects from the
population given by where is the subject’s label, and is the subject’s
response. We list the subject labels in the data, in a vector , where
the subject’s label in position is for . We define the corresponding response vector
by , where is response for the subject with label . The subjects may be placed in
different orders defined by , , where is an permutation matrix with
elements in row of column that take a value of one when is in position in , and zero
otherwise. Each row and column of total to one. We define , an identity matrix, so that
. The vector where for is response for the subject whose label
is in row of . Using this notation, we define the data equivalently as ,
indexing the vector of subject labels and response by the different possible orders of subjects. For the
data, the average response is , and .
2.4. The Posterior Distribution
The posterior distribution is the joint distribution of the data and the prior distribution, given the
data. For to be consistent with the data, we require that , defining when ,
and otherwise. Notice that even when , as long as , only a subset of points in the
prior distribution of population will be consistent with the data. By such consistency, we require the
set of subjects observed in the data to match the set of subjects defined by points for the random
variables in the prior distribution that are associated with the data. These points will have positive
probability in the posterior distribution. We re-formulate the notation to be able to clearly define such
points.
For population where , without loss of generality, we assume that the first
positions in correspond to the data. Each point, with positive prior probability has
associated with it a vector of labels, , which we partition so that is
an vector representing subject labels that could potentially correspond to subjects in . Recall
that the initial listing of subjects in was arbitrarily defined by placing the subject labels into a vector
. Without loss of generality, we define the initial listing (corresponding to ) as
where , implying that the subject in each position of the first positions of
is the same subject as the subject in the corresponding position in the data given by .
Replacing by , . We partition the response vector in for listing in a
similar manner, such that where , and note that since , .
Recall that response for the data corresponds to any of the set of vectors where
specifies different ordering of subjects, . In addition, the subject labels for points
in the prior distribution of population are given by with response
. Let us partition such that is the first rows of . Using these expressions,
the subject labels for the point in the prior when can be expressed as
, implying that where with
of dimension . This vector of subject labels will match an ordering of the subject labels for
the data, , when is equal to for some . This identifies the subjects for
the partially observed points in the prior with subjects in the data. Only points where for
some can have positive probability in the posterior distribution. Assume now that
for some , say so that . The response for the vector of subjects
in the prior is given by , and equal to the response for the corresponding subjects in the
data given by . This implies that when , response for the prior listing for is given
by . We define the average response for the subjects in as
and .
We express the requirement that for some in a simpler manner by
changing notation. When , for some . Similarly,
for , where is an permutation matrix.
Let us define indicator variables, , and , for each point
where if , and zero otherwise. A point is in the posterior
distribution if , implying that for some and
. Given and , this will occur at most for one value of and , which we
indicate by and , respectively. Similarly, given and where , for each
listing , there will be one value of where . We represent response for such a point by
, noting that is equal to when . Given and listing , let
define an indicator for the point in the posterior. This same point will occur in each listing, and
since the distribution is exchangeable for each , it will have an identical prior probability given by
for all , and identical posterior probability (given ) equal to
. Using this notation, when we represent the posterior random variables for
by , where is an indicator random variable for the point
, and .
In many situations, , implying that for each permutation of the data, , a
permutation of the remaining subjects has the same probability of occurring for all . In
such a setting, we define as an indicator random variable that has a value of one when the order of
the first subjects is , and zero otherwise, and define as an indicator random variable that has
a value of one when the order of the remaining subjects is , and zero otherwise, such that
and . Using these indicator random variables, we define
and , so that where . The matrix
is an random permutation matrix where and , while
is a random permutation matrix where and . We define
the random vector whose realization corresponds to permutations of subject labels in the data
set.
Assuming all permutations are equally likely, for and , so
that . In this setting when , represents the
probability that the first subjects in correspond to permutation of and the remaining
subjects correspond to permutation of . Assuming , this results in
and . Note that the points in the posterior distribution are defined similarly for each
where , and that and .
In order for points in the posterior distribution to have positive probability, we require that
. Let define the set of in the prior that are consistent with the data, and
be an indicator random variable that is equal to one with probability , and zero
otherwise. Taking account of the prior probabilities for , we define the posterior random variables
which simplifies to where and .
We now consider the posterior distribution of . Recall that the prior distribution of is
defined by and their associated prior probabilities, , . The posterior
distribution of , given the data, , is defined by and their associated posterior
probabilities, where . Notice that we can express where the
first term in the expression, i.e. , is the total response for the subjects in the data which is known
for all . This total was not known for in the prior. The posterior distribution
differs from the prior in that it includes only prior parameters given by , with
replaced by .
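In assumed notation, with n the number of subjects in the data, \bar{y} their average response, \bar{y}_{r,g} the average latent value of the remaining N - n subjects in prior population g, and G_s the set of prior populations consistent with the data, the update just described can be written as

P(\theta_g \mid \text{data}) = \frac{ p_g \, \mathbf{1}\{ g \in G_s \} }{ \sum_{g' \in G_s} p_{g'} }, \qquad \theta_g = \frac{n}{N} \, \bar{y} + \frac{N - n}{N} \, \bar{y}_{r,g},

so that, conditional on the data, only \bar{y}_{r,g} remains uncertain.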
We briefly discuss the implications of these results before giving a simple example. Of special
interest is understanding the relationship between the prior distribution and the posterior distribution.
Let us focus on the difference that occurs in our understanding of a prior parameter after accounting
for the data. Assuming that , after accounting for the data, we represent
. The portion of that depends on is , the average response for
subjects that we did not observe in the data. This observation matches intuition, since if we observe
response on some subjects in the population and are interested in the average for the entire population,
then the part of the problem that remains unknown is a function of the remaining unobserved subject’s
response. Suppose that the population is very large relative to the data, such that . In such a
setting, , and the difference between the prior distribution and the posterior distribution
corresponds simply to altering the prior probabilities to exclude where , i.e., prior parameters
that are not consistent with the data. While this result is intuitive, it provides little weight to support
Bayesian inference in this context as a useful approach. We discuss a simple example that illustrates
these ideas next, and subsequently discuss an extension of these results to settings with measurement
error, where a Bayesian approach can provide valuable new inference.
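A short Python sketch makes the updating step concrete. The labels, latent values, and population membership below are hypothetical, while the prior probabilities match those used in the example of Section 2.5; a prior population contributes to the posterior only if it contains every subject observed in the data with exactly the observed latent value:

# Sketch of the updating step with hypothetical prior populations and data.
priors = [
    ({"Lily": 9, "Rose": 6, "Violet": 12}, 0.1),
    ({"Lily": 9, "Rose": 6, "Daisy": 15}, 0.2),
    ({"Lily": 7, "Rose": 6, "Violet": 12}, 0.3),
    ({"Iris": 8, "Rose": 6, "Daisy": 15}, 0.4),
]
data = {"Lily": 9, "Rose": 6}                # observed subjects and their latent values

# A prior population is consistent with the data only if it contains every observed
# subject with exactly the observed latent value.
consistent = [(pop, p) for pop, p in priors
              if all(pop.get(s) == y for s, y in data.items())]
total = sum(p for _, p in consistent)
posterior = [(sum(pop.values()) / len(pop), p / total) for pop, p in consistent]
print(posterior)                             # [(9.0, 0.333...), (10.0, 0.666...)]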
2.5. A Simple Example
We illustrate the posterior random variables with a simple example where the data corresponds
to response for , where . These subjects are members of
a population of subjects, and we wish to know the population mean,
, where . Although the population is well defined, we don’t actually
know the name of the third subject in the population.
Let us define a superpopulation by where and define the
subjects in the superpopulation by . We assume that the subjects in the population
are a subset of subjects in the superpopulation such that ; the subject’s label, , is
assumed to be unique for each subject in the superpopulation; and the subject’s response, , is a non-
stochastic parameter. We also assume that for and , . This
means that the response for a subject in the data is the same response that would occur for the subject if
they were in any prior population, . We define the prior distribution of as with their
associated prior probabilities, , , for with subjects
, and assume that the parameters correspond to the average
response for different possible subsets of , as illustrated in Table 1. The prior probabilities that we
associate with each are subjective. For illustration, we assume , ,
and .
-insert Table 1 here-
The prior distribution may be reasonable in a setting where the finite population is constantly changing,
and we are interested in the mean response at a particular point in time. The probabilities associated
with the parameters in the prior are assigned subjectively, and this subjectivity is evident in the
probabilities associated with parameters in the posterior distribution.
We expand the prior distribution for each assuming exchangeability of response for the
subjects, illustrating the points for in Table 2. Each row in Table 2 corresponds to a different
listing, while the columns are represented by the vectors partitioned into an vector
that represents the subspace of that is potentially observed
(corresponding to the data (above the dashed line) with values, , where Initial corresponds to the
first letter of the subject’s name), and an vector that represents the ortho-
complement of corresponding to sub-space for the remaining subjects, with values . The
permutation matrices for each column, , are listed at the top. The points above the dashed line, i.e.
, represent possible values of the prior random variables that could correspond to the data.
The value for a subject in the data is represented by . We indicate a coordinate in
where the subject and values match by , and coordinates where the subject and value is
not in the data by . Coordinates in that are not potentially observed in the prior, but where
the subject was included in the data are indicated by . Only points where all coordinates
can be represented by either or have positive probability in the posterior
distribution. This results in 12 points in the posterior distribution in Table 2 when , a similar 12
points in the posterior distribution when (with replaced by ), and no points included in the
posterior distribution when or .
-Insert Table 2 here-
The points in the posterior distribution are summarized in Table 3. We note that for each listing,
each point occurs once so that normalizing the probabilities, the posterior distribution of can be
summarized in Table 4.
-insert Table 3 and 4 here-
For illustration, suppose that response for subjects in the superpopulation is given by ,
, , and so that parameters in the prior distribution of are given by ,
, , and with prior probabilities 0.1, 0.2, 0.3, and 0.4, respectively. Given the data
on Lily of and Rose of , parameters in the posterior distribution of are given by
and with probabilities 0.33 and 0.67, respectively.
Notice that the same points for occur for each with positive probability in the posterior
distribution. As a result, the prior probabilities do not alter the posterior distribution of , which
depends only on the assumption of exchangeability, such that and .
The expected value and variance of is different for the prior and the posterior. In the prior, the
expected value is , where ; in the posterior, the expected value of is . The
expected value and variance in the posterior depends on the prior probabilities where
where and , while
where and .
In the posterior, and are independent.
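Continuing with hypothetical values of the prior parameters (chosen only to make the arithmetic concrete; the prior and posterior probabilities are those of the example), the prior and posterior moments of the population mean parameter can be computed directly from the two distributions:

# Prior and posterior moments of the population mean parameter (parameter values assumed).
prior = [(9.0, 0.1), (10.0, 0.2), (8.0, 0.3), (11.0, 0.4)]    # (theta_g, p_g)
posterior = [(9.0, 1/3), (10.0, 2/3)]                          # only g = 1, 2 survive the data

def moments(dist):
    mean = sum(t * p for t, p in dist)
    return mean, sum(p * (t - mean) ** 2 for t, p in dist)

print(moments(prior))       # prior mean and variance of the parameter
print(moments(posterior))   # posterior mean is about 9.67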
3. THE PRIOR AND POSTERIOR DISTRIBUTION WHEN IDENTIFIABLE SUBJECTS
ARE MEASURED WITH ERROR
We discuss a setting where subjects are measured with error, such that an observed response is a
subject’s latent value plus measurement error. Koop (1974) discussed a general sample survey
framework with measurement error similar to that used here, but did not discuss a Bayesian approach
for inference. Our interest may be in the average latent value, , in a population, or in the latent value
for a subject in the data. We discuss inference for these two target parameters based on the posterior
distribution next.
3.1. The Data when Subjects are Measured with Error
Suppose response is observed on occasion for a set of labeled subjects given by
where is a subject’s label and is a subject’s response on occasion .
The set of subject labels and are defined in Section 2.3. We define the vector of latent values
corresponding to by and the response vector by where the latent value and
response for is given by and , respectively.
We assume that for on occasion , we observe . The latent value, , and
measurement error, , are not directly observable. Assume that for , is the realization of a
random variable representing measurement error, where and , with
indicating expectation with respect to measurement error. We also assume that for any
two measures of the same subject, and for measures on any two subjects. Although in general,
measures may be made on a subject, we assume for simplicity that for all . With these
assumptions, we represent response for as the realization of the random variable . We
define the corresponding vector of random variables by where , ,
and . The set of realized responses of the random variables , , is observed.
Subjects may be placed in different orders defined by , , where we define
so that . The vector corresponds to latent values for the subjects in order
, and corresponds to a vector of random variables whose realization is response for the
subjects in order . Using this notation, we define the latent values for the subjects in the data
equivalently as and the random variables whose realization is response for the
subjects equivalently as , indexing the vector of subject labels, latent values,
and random variables for response by the different possible orders of subjects. For the data, the average
response is , the average latent value is , and .
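In assumed notation, with a single measure per subject, the measurement error structure just described can be written as

Y_s = y_s + W_s, \qquad E_R(W_s) = 0, \qquad \mathrm{var}_R(W_s) = \sigma_s^2, \qquad \mathrm{cov}_R(W_s, W_{s'}) = 0 \ \text{for } s \neq s',

where E_R denotes expectation with respect to measurement error, y_s is the subject’s fixed latent value, and the error variance \sigma_s^2 may differ between subjects.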
3.2. The Prior Distribution.
We motivate the prior distribution similarly to the setting where there is no measurement error.
Although the average latent value, , for population is not known, we assume that from previous
studies and/or experience, we can guess it. Associated with each guess we assign a prior probability
, , where . We believe that the actual population mean, , is one of the possible
values specified in the prior. We define the vector of latent values corresponding to in Section 2.1
by and a corresponding vector of measurement error variances by . The
prior parameter is defined for with subjects for
. The prior distribution is defined as in Section 2.2, as where
and , with corresponding measurement error variances given by
. When , we define , denoting and , with the
corresponding partitioned vectors represented by and .
3.3. The Posterior Distribution
We proceed similar to Section 2.4 to define the posterior distribution as points in the prior
distribution that are consistent with the data. By consistency, we require . A point is in the
posterior distribution if , and for some and . For
, the latent values for these points define the latent values for the posterior random
variables given by , where . When there is measurement error, the
latent values in are not observed. Instead, we observe a realization of response
for . Replacing by , we represent the posterior random variables by
where and is response for the subject in row of .
Assuming all permutations are equally likely, the expected value simplifies to
and such that , and
where . Notice that
and are independent. The posterior distribution of does not depend on the prior distribution, ,
while the posterior distribution of is marginal over measurement error.
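The feature that the measurement error component of this posterior involves the average error variance, rather than a subject-specific one, can be illustrated numerically. In the Python sketch below (hypothetical latent values and error standard deviations), subjects are assigned to positions by a random permutation and each carries its own error variance; the marginal variance in every position is then the permutation variance of the latent values plus the average of the error variances:

import numpy as np

rng = np.random.default_rng(0)
latent = np.array([6.0, 9.0])      # hypothetical latent values (Rose, Lily)
sigma  = np.array([1.0, 2.0])      # hypothetical subject-specific error standard deviations

draws = []
for _ in range(100_000):
    perm = rng.permutation(2)                        # exchangeable assignment of subjects to positions
    draws.append(latent[perm] + rng.normal(0.0, sigma[perm]))
draws = np.asarray(draws)

# Every position has the same marginal variance: the permutation variance of the latent
# values plus the AVERAGE of the error variances, not a subject-specific error variance.
print(np.round(draws.var(axis=0), 2))
print(round(latent.var() + (sigma ** 2).mean(), 2))  # 2.25 + 2.5 = 4.75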
3.4. An Example with Measurement Error
Consider an application where there is interest in the population mean, , for the listing . A
subjective Bayesian may assign latent values to subjects in a superpopulation of given by
, , , and so that parameters in the prior distribution of the average latent value in
a population of subjects are given by , , , and with prior
probabilities 0.1, 0.2, 0.3, and 0.4, respectively. Suppose the subjects are ,
where we assume that and , so that . These assumptions imply that
and that .
We represent response for the points in the posterior distribution in terms of (instead of the
realized response, ), and represent including measurement error in .
Assuming the prior distribution is the same as the prior in section 2.5, the posterior distribution can be
summarized in terms of four random vectors in Table 5.
-insert Table 5 here-
Since and the latent value for is known (i.e., and equal to when or
when , the posterior distribution of is with probability 1/3 and with probability
2/3. If the posterior mean is used as an estimate of , then the estimate is 9.67.
Alternatively, suppose we wish to estimate the latent value for a subject in the data, say Lily.
Apart from the assumption of exchangeability, the prior distribution is not relevant since in the
posterior, , where are random subject effects, and
. Lily cannot be identified in the posterior distribution. However, we can express
the model for the random variable in position in by
,
and use finite population mixed model methods (Stanek and Singer, 2004; Stanek and Singer, 2011) to
predict the latent value for the subject in row of given by the best linear unbiased predictor
(BLUP), which simplifies to where and . The quantity
is the latent value for the subject in row of , where is the random
effect associated with the subject in row of . When Lily is in row , we refer to Lily as the
realized subject, and as the realized random effect. If we substitute Lily’s response, for
and for in , we obtain the BLUP of the realized subject’s latent value.
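A schematic sketch of this kind of predictor is given below. The responses, variance components, and the simple form of the shrinkage constant are all assumptions made for illustration; the FPMM BLUP of Stanek and Singer (2004) includes finite population terms omitted here. The key feature is that a single shrinkage constant, built from the average measurement error variance, is applied to every subject’s deviation from the sample mean:

# Schematic shrinkage predictor of a realized subject's latent value (assumed form; the
# FPMM BLUP of Stanek and Singer (2004) includes finite population terms omitted here).
responses = {"Rose": 6.5, "Lily": 9.5}     # hypothetical observed responses
sigma2 = {"Rose": 1.0, "Lily": 4.0}        # hypothetical measurement error variances

ybar = sum(responses.values()) / len(responses)
s2_latent = 2.25                                   # assumed latent value variance component
sigma2_bar = sum(sigma2.values()) / len(sigma2)    # average measurement error variance

k = s2_latent / (s2_latent + sigma2_bar)           # one common shrinkage constant
blup_lily = ybar + k * (responses["Lily"] - ybar)
print(round(blup_lily, 3))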
4. PRIOR DISTRIBUTIONS THAT PARTIALLY IDENTIFY SUBJECTS
The results in Section 3 illustrate that the sampling distribution underlying a FPMM for the
subjects in the data set matches the posterior distribution conditional on the subjects in the data set in a
Bayesian analysis with an exchangeable prior distribution for subjects and measurement error,
regardless of the prior probabilities. Conditional on the data set, the prior probabilities are retained only
in the distribution of the remaining subjects not included in the data set. Subjects are not identified in the
posterior distribution; the measurement error variance for any random variable, for in the
posterior distribution is constant, and equal to the average measurement error variance for subjects in
the data set given by . We develop a partially identifiable Bayesian model by altering the prior
distribution so that the measurement error variance in the posterior distribution can be identified with a
labeled subject. Of interest is the latent value for a labeled subject in the data where response is
observed with measurement error and the measurement error variance for the subject is known.
We motivate this new prior distribution based on the results in Section 3, where the response for
a subject in the posterior is the realization of a random variable for in .
The subject is not identifiable, and we cannot identify in which position the subject will occur. In the
prior distribution, there will be one or more random variables given by
such that for some realization of , ,
and . In the posterior distribution, we represent these random variables by .
In the data, both the subject’s label and response are known. Let be an permutation
matrix such that lists the subjects in the data in a specific order. This order is arbitrary, and can
be specified subjectively. We refer to this ordering as the realized order. We would like to associate
with the response for each subject in the realized order their measurement error variance. For this to
occur, the measurement error must be equal to for all .
We propose a prior distribution that will maintain the correspondence of the measurement error
variance for each subject in the realized order in the posterior distribution. In order to keep the
expression for the measurement error to equal for all in the posterior distribution, let
us define , and represent each of the random variables in the prior that would correspond to
in the posterior distribution by random variables in a new prior distribution given by
. Since each of these prior random variables is the posterior distribution, and
, we can represent the random variables in the posterior distribution by . We
call the prior that results in this posterior a partially identifiable exchangeable prior.
For a given , we express the partially identifiable exchangeable prior by defining
response for points indexed by and similar to , using the same marginal (over response
error) prior probabilities, , for these points. We only change the definition of the points in the
prior distribution that have positive probability in the posterior distribution, replacing
by . We represent random variables for the posterior distribution as
. Assuming all permutations are equally likely,
, with expected value , and
where and
. The posterior distribution is conditional on the data, with
measurement error assigned according to subjects listed in the posterior distribution in order . In
this way, the measurement error is identified with a subject, but the subject’s latent value is not
identifiable.
The posterior distribution of for the partially identifiable Bayesian model has an expected
value and variance similar to a commonly used mixed model. Let us list subjects in the data set in the
order and represent their response by corresponding to the realization of
, where is a random variable representing response for the subject with the label
in row of . The posterior distribution is the distribution of a vector of random variables,
, where the subject associated with position in the vector has a label in position
in . The expected value is given by , and the variance is
, where and represents the measurement error
variance for the subject in position of . Consider any set of constants, where
and , and define a corresponding vector . Also, let us define
so that and . With these definitions, we can represent
for where . The expected value and variance of this model is similar to that
for a FPMM used to represent the posterior distribution in Section 3, but with a different variance
structure. The BLUE estimator of in this model is the weighted least squares estimator,
where and . The BLUP of the latent value for subject in
position is . To obtain an estimate of or of the latent value for given
by , we replace by the realization of in or .
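A Python sketch of the corresponding calculations is given below under the usual mixed model forms, with weights proportional to the reciprocal of the sum of the latent value variance and each subject’s error variance, and a shrinkage constant specific to each subject. The numerical inputs are hypothetical, and the manuscript’s exact expressions may include finite population corrections omitted here:

import numpy as np

# Sketch of the WLS mean and subject-specific shrinkage predictors under the usual mixed
# model forms; inputs are hypothetical and finite population corrections are omitted.
y      = np.array([6.5, 9.5])      # hypothetical responses, listed in the realized order
sigma2 = np.array([1.0, 4.0])      # known subject-specific measurement error variances
tau2   = 2.25                      # assumed latent value variance component

w = 1.0 / (tau2 + sigma2)                      # weights for the weighted least squares mean
mu_wls = np.sum(w * y) / np.sum(w)

k = tau2 / (tau2 + sigma2)                     # shrinkage constant specific to each subject
blup = mu_wls + k * (y - mu_wls)               # predictor of each subject's latent value
print(round(mu_wls, 3), np.round(blup, 3))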
It is valuable to compare the possible points in the posterior distribution to the points that can
actually be observed with a measurement error model for subjects in the data. We do so via a simple
example where we assume with , defined such that and
. We assume that Rose and Lily have latent values given by 6 and 9, respectively such that
. This implies that and . We also assume that the measurement
error variance is known, and given by , while . Finally, we assume that
measurement error deviations can take on two possible equally likely values (i.e. for Rose, ; for
Lily, ). With these assumptions, the response for could be , ,
or with equal probability.
Let so that represents the order of subjects in the partially
labeled posterior distribution. The random variables in of the partially identifiable posterior
distribution correspond to where and .
Also, and . When , the only values of that can satisfy
and corresponds to . As a result, possible points in the
partially labeled posterior distribution correspond to , , , , ,
, , . In the posterior distribution, each of these points is equally likely. Only
the first four points in this distribution match possible realizations of the data, distinguishing the
partially identifiable posterior distribution from reality. By including the additional artificial points in
the posterior distribution, the average accuracy of the estimator of each latent value (over ) is
best.
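The Python sketch below reproduces the structure of this example with hypothetical error magnitudes (±1 for Rose and ±2 for Lily). The latent values are exchanged across the two positions while the two-point error distributions stay attached to the positions of the realized order, giving eight equally likely points, of which only the four with unswapped latent values could actually be observed:

from itertools import product

# Sketch of the partially labeled posterior for two subjects (hypothetical error magnitudes).
latent = {"Rose": 6.0, "Lily": 9.0}                  # latent values from the example
err = {"Rose": (-1.0, 1.0), "Lily": (-2.0, 2.0)}     # assumed two-point error distributions

# Points that could actually be observed: each subject keeps its own latent value and error.
data_points = {(latent["Rose"] + e1, latent["Lily"] + e2)
               for e1, e2 in product(err["Rose"], err["Lily"])}

# Partially labeled posterior: latent values are exchangeable over the two positions, but
# the error distributions stay attached to the positions of the realized order (Rose, Lily).
points = [(a + e1, b + e2)
          for a, b in ((6.0, 9.0), (9.0, 6.0))
          for e1, e2 in product(err["Rose"], err["Lily"])]

observable = [p for p in points if p in data_points]
print(len(points), len(observable))                  # 8 equally likely points, 4 observable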
5. DISCUSSION
The development has practical implications for interpretation of the posterior distribution in a
Bayesian analysis in such simple settings. First, since the posterior distribution can be divided into two
independent distributions, with one including the subjects in the data set, if so that ,
apart from limiting the prior populations to those where , the mean of the posterior
distribution will minimally update the mean of the prior distribution. Interpretation of this mean is
closely related to interpretation of the mean of the prior distribution. If all possible sets of
subjects from a superpopulation are assigned equal prior probabilities in a prior, the mean of the
posterior distribution cannot be interpreted as the simple mean latent value in the superpopulation since
subjects in the data set will have different weights, clarifying discussion by Graubard and Korn (2002).
The framework permits simultaneous discussion of inference from finite population, superpopulation,
and Bayesian approaches. Second, in the simple Bayesian model with heterogeneous measurement
error, subjects’ latent values and measurement errors are not identifiable in the posterior distribution.
The BLUE of the mean latent value for subjects in the data set using the posterior is the simple mean
response, not the weighted least squares mean. Similarly, the BLUP for a realized subject’s latent value
is the FPMM BLUP similar to Stanek and Singer (2004), not a BLUP with shrinkage constants that
vary with the realized subjects.
Finally, we introduce a new prior distribution that allows measurement error to be identified
with a subject in the posterior distribution of the data set, but not with the subject’s latent value. The
BLUE of the mean latent value for subjects in the data set based on such a posterior is the weighted
least squares mean, and the BLUP of a realized subject’s latent value has a shrinkage constant specific
for the realized subject. Simulation studies similar to those of San Martino, Singer, and Stanek (2008)
in a finite population show that the BLUP of a realized subject’s latent value that allows the shrinkage
constants to vary along with the realized subject’s variance can be more accurate. More importantly,
the explicit tracking of subject labels for points in the posterior in Section 2 makes it possible to develop
flexible prior distributions so as to retain a subject’s identity for specified terms (such as measurement
error) in the posterior distribution for subjects in the data set. This flexibility enables models to be
specified that partially condition on some subject-specific factors, without conditioning fully on a
subject. Predictors from such models, such as the weighted least squares mean, may result in more
accurate inference. The Bayesian models that partially account for identifiable subjects appear to be a
step in this direction. We expect that further study of these models may lead to a better understanding
of how to creatively expand reality to better understand it.
References
Basu, D. (1969). “Role of the sufficiency and likelihood principles in sample survey theory,” Sankhyā,
Series A, 31:441-454.
Ericson, W.A. (1969). “Subjective Bayesian models in sampling finite populations,” Journal of the
Royal Statistical Society B 31:195-234.
Ericson, W.A. (1988). Bayesian inference in finite populations. In P.R. Krishnaiah and C.R. Rao (eds),
Handbook of statistics, Vol. 6. Elsevier Science Publishers, Amsterdam, pp. 213-246.
Godambe, V.P. (1955). “A unified theory of sampling from finite populations,” Journal of the Royal
Statistical Society, Series B, 17, 268-278.
Godambe, V.P. (1966). “Bayes and empirical Bayes estimation in sampling finite populations,”
Technical report No 41, Department of Statistics, The Johns Hopkins University, 1966.
Graubard, B.I. and Korn, E.L. (2002). “Inference for superpopulation parameters using sample
surveys,” Statistical Science 17(1):73-96.
Koop, J.C. (1974). “Notes for a unified theory of estimation for sample surveys taking into account
response errors,” Metrika 21:19-39.
San Martino, S., Singer, J.M., and Stanek, E.J. III (2008). “Performance of balanced two-stage empirical
predictors of realized cluster latent values from finite populations: a simulation study,” Computational
Statistics and Data Analysis 52:2199-2217.
Scott, A. and Smith, T.M.F. (1969). “Estimation in multi-stage surveys,” Journal of the American
Statistical Association 64(327):830-840.
Stanek, E.J. III and Singer, J.M. (2004). “Predicting random effects from finite population clustered
samples with response error,” Journal of the American Statistical Association 99(468):1119-1130.
Stanek, E.J. III and Singer, J.M. (2008). “Predicting random effects with an expanded finite population
mixed model,” Journal of Statistical Planning and Inference 138(10):2991-3004.
Stanek, E.J. III and Singer, J.M. (2011, in press). “Sampling, WLS and Mixed Models,” Statistics in
Biopharmaceutical Research.
Zhang, R. (2010). “Developing best linear unbiased estimator in finite population accounting for
measurement error due to interviewer,” Doctoral dissertation, University of Massachusetts Amherst,
Electronic Doctoral Dissertations, Paper AAI3427614.
Table 1. The Prior Distribution of with parameters equal to the average response of subjects from subsets of subjects in a superpopulation with .
Probability Mean Subjects Population
Table 2. Possible Points in the Prior Distribution for population with when Data is on
(as indicated by response )
Listing
Table 3. Possible Points in the Posterior Distribution with when Data is on
and the Prior Probability of the Point for a Given Listing.
Table 4. The Posterior Distribution with when Data is on , along with Posterior Points corresponding to an Exchangeable Prior
Posterior Parameter
Posterior Probability
Points in Posterior Distribution
Table 5. The Posterior Distribution with when Data is on , along with Posterior Points corresponding to an Exchangeable Prior with Measurement Error
Posterior Parameter
Posterior Probability
Points in Posterior Distribution