8/2/2019 Coverage of FB 3-30-09 - V2 - DHJ
DRAFT DO NOT CITE OR QUOTE
Coverage of the Foreign-Born Population in Censuses and Surveys: What Do We Think,
What Do We Know, and What Can We Prove?
By
Dean H. Judson1
Abstract
It is widely assumed that the foreign-born population in general, and the unauthorized
foreign-born population in particular, are not captured in surveys and the decennial Census
as completely as the native-born population. Many different estimates have been proffered, all
with significant limitations. This paper attempts to summarize research on this coverage
question. We first describe the methods used to correct net coverage error in censuses and
surveys, and discuss the effect of net coverage error on censuses, surveys, and derived
products. Then we describe the history of various coverage assertions or assumptions used
in the literature (what do we think?). We then assess their relative merit (what do we
know?). Finally, we attempt to assess what is provable about coverage
of the foreign-born (what can we prove?). We conclude by suggesting a research agenda
that would address this coverage question in a statistically-principled way.
Keywords: foreign-born coverage, coverage measurement, coverage correction,
information integration, the coverage question
1 This report was completed while the author was Senior Statistician for the Office of Immigration
Statistics, and is released to inform interested parties of ongoing research and to encourage discussion of
work in progress. The views expressed on statistical and methodological issues are those of the author and
not necessarily those of the Office of Immigration Statistics or Department of Homeland Security.
Introduction: the Coverage Question
Nonsampling Error
Net Coverage Error
Why is Net Coverage Error Important?
Correcting for Net Coverage Error in Censuses and Surveys
The Census: Count Imputation and Post-Enumeration Survey Coverage Correction Factors
The Effect of Net Coverage Error on the Census
Surveys: Weights and Population Controls
The Effect of Net Coverage Error on Survey Estimates
CPS
ACS
Other Surveys
Adjustments for Potential Net Coverage Error
What has been Asserted about Coverage of the Foreign Born?
What is Provable About Coverage of the Foreign Born?
Ethnographic Studies Suggest that the Foreign Born Avoid Detection
The Foreign Born Tend to Respond Later to Surveys
States with Higher Levels of Foreign Born, Particularly Unauthorized or Recent Arrivals, Tend to Have Lower Coverage Ratios
The Foreign Born, Particularly Recent Arrivals, Tend to Live in Areas with Higher Hard-to-Count Scores
Demographic Characteristics of Individual Foreign Born Persons, Particularly Recent Arrivals, Correlate With Hard to Count Indices
What Can be Done to Measure Coverage of the Foreign Born?
Coverage Measurement Alternatives: Summary
Post Enumeration Surveys/Dual System Estimation
Demographic Benchmarking
Demographic Analysis
Direct enquiries
One-Way Record Linking
Synthetic Estimation
Research Proposals
References
Tables and Figures
Introduction: the Coverage Question
When conducting a census or survey, it is important that the operations of the census or
survey properly represent the target population of interest. In general, the relationship
between the enumerated/sampled group and the target population is referred to as the
coverage of that population. In the absence of proper coverage, there is coverage error,
and the survey estimates or census enumeration are biased with respect to the population.
This fact is especially important for derived estimates, such as estimates of the unauthorized
population, which are difficult to produce in any case because of data gaps.
The foreign-born population of the United States is estimated to consist of about 38 million
people, 12.6% of the total population of about 302,000,000 as of 2007 (U.S. Census
Bureau, American FactFinder, 2008). It is widely assumed that, for various reasons, the
foreign-born population is particularly likely to be subject to coverage error in surveys and
the census (Marcelli and Ong, 2002; Camarota, 2006). If true, this would have wide-ranging
implications for the census itself, for ongoing surveys, and for estimates derived from
them (including estimates of the naturalized, legal permanent resident, refugee/asylee and
unauthorized subpopulations). We will refer to this question as "the coverage question,"
and we will repeatedly use the phrase "potential undercoverage" to emphasize that the
undercoverage remains potential in the absence of hard proof.2
The purpose of this paper is to comprehensively address the coverage question, and
attempt to define an agenda to convert assumptions about foreign-born undercoverage into
proof about its existence and magnitude. We will do so in the following steps:
First, we will address the sources of error in surveys and the census;
Second, we will review the known effects of coverage error on surveys and the
census, focusing on the foreign born where possible;
Third, we will summarize attempts or recommendations on how to measure coverage;
Fourth, we will assess which measurements of coverage are most viable; and
Finally, we will suggest a research agenda to definitively attack the coverage question.
In what follows, we will be careful to distinguish between what an individual researcher
asserts to be true (what do we think?), what the community of researchers generally agrees
on (what do we know?), and concrete findings in this area (what can we prove?).3
2 This is not a new concern; cf. Siegel, 1976:15: "This report has tried to develop the view that it is not a
practical goal to estimate directly the number of illegal aliens [sic] ... Thus, the effort to estimate the number
of illegal aliens becomes principally an effort to measure the coverage of the total population by nativity"
(italics added).
3 For the reader's interest, this three-part partition of knowledge is derived from the excellent movie
"And the Band Played On," based upon the book of the same name about the AIDS epidemic (Shilts, 1987).
Coverage error refers to a number of errors in addition to any sampling variability or other
nonsampling error. To more fully describe the term coverage error, we will begin with a
general discussion of sampling variability and nonsampling error in surveys and the census.
Sampling Variability
The sampling error of a sample survey can be measured in several ways. The first measure
that is usually desired is the variance of the sample estimate. This is the average, over all
possible samples, of the squared deviations of the estimates from their expected value. An
estimate of the variance can be obtained from the sample survey data themselves. If there
are nonsampling errors or the sample is biased, then the deviations are taken around the
true value of the statistic and the measure is called the mean square error. Typically, the
variance is denoted by $\sigma^2$ and the mean square error by MSE. Of these two measures, the
MSE is more general, as illustrated in the formula for MSE. Suppose that $p$ is the value
being estimated, and $\hat{p}$ is the estimator of $p$; then the MSE of $\hat{p}$ is given by:
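$$
\begin{aligned}
\mathrm{MSE}(\hat{p}) &= E(\hat{p}-p)^2 = E\left[\hat{p}-E(\hat{p})+E(\hat{p})-p\right]^2 \\
&= E\left[\hat{p}-E(\hat{p})\right]^2 + \left[E(\hat{p})-p\right]^2 \\
&= \operatorname{var}(\hat{p}) + \left[\operatorname{bias}(\hat{p})\right]^2
\end{aligned}
\tag{1}
$$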
If $\hat{p}$ is unbiased, then the MSE is just the variance itself. In the presence of coverage error
(or other kinds of nonsampling error), $\hat{p}$ is not unbiased, and its squared bias contributes to the
mean squared error.4
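As a quick numerical check of equation (1), the Python sketch below enumerates every possible sample from a small hypothetical population and compares the directly computed MSE with the variance plus the squared bias. The population values, the sample size of two, and the 0.9 scaling factor (standing in for an uncorrected 10% undercoverage) are illustrative assumptions, not data from this paper.

```python
from itertools import combinations
from statistics import fmean

# Hypothetical population; p is the population mean we want to estimate.
population = [2.0, 4.0, 6.0, 8.0, 10.0]
p = fmean(population)


def estimator(sample):
    # Hypothetical biased estimator: the sample mean scaled by 0.9,
    # mimicking a 10% undercoverage that the estimator does not correct.
    return 0.9 * fmean(sample)


# All possible samples of size 2 drawn without replacement.
estimates = [estimator(s) for s in combinations(population, 2)]

expected = fmean(estimates)                                # E(p_hat)
variance = fmean((e - expected) ** 2 for e in estimates)   # var(p_hat)
bias = expected - p                                        # bias(p_hat)
mse = fmean((e - p) ** 2 for e in estimates)               # E(p_hat - p)^2

print(f"var = {variance:.4f}, bias = {bias:.4f}, MSE = {mse:.4f}")
print(f"var + bias^2 = {variance + bias ** 2:.4f}")
```

Both printed lines agree, and the bias term is nonzero only because of the assumed undercoverage factor.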
Nonsampling Error
In addition to the error of the estimates caused by sampling variability, there is another
component of the total error in demographic data. Nonsampling error characterizes all
surveys, whether sampling is used or not, including 100% surveys known as censuses.
This component arises from mistakes made in the process of eliciting, recording, and
processing the response of an individual unit in the surveyed population. Every operation
in a census or sample survey, and every factor within an operation, may contribute to
nonsampling error. Lessler and Kalsbeek (1992:9) classify survey errors into four types:
4 Lohr (1999:256-258) develops a simple model illustrating the biasing effect of nonresponse. Letting $N_M$
be the number of nonrespondents in a survey, $N$ the total number sampled in the survey, $\bar{p}_{UR}$ the mean
response for respondents, $\bar{p}_{UM}$ the mean response for nonrespondents, and $\bar{p}_U$ the mean response for the
population as a whole, the bias induced by nonresponse is approximately $\frac{N_M}{N}\left(\bar{p}_{UR} - \bar{p}_{UM}\right)$. This bias
is small only if $N_M/N$ is small (there is little nonresponse) or $\bar{p}_{UR} - \bar{p}_{UM}$ is small (responders aren't
much different from nonresponders).
Frame errors: problems with developing lists of units to be sampled, duplication or
omission of units, and the like;
Sampling errors: error or variance associated with sampling itself;
Nonresponse errors: errors associated with the failure to include a unit in the survey
that should have been included, either through omission by the surveyor or
nonresponse or refusal by the respondent; and
Measurement errors: errors associated with data collection operations, including the
contribution of respondent, interviewer, and questionnaire to error.
Factors that are common to all of these operations, including training procedures and
supervision, and technical-staff intervention associated with each operation, may also be
sources of nonsampling error. Typically, those operations that are conducted in the office
after collection of the data in the field are more amenable to control than are the field
operations. It is also generally possible, through study of the office operations themselves,
to measure errors introduced (by the operations) into the data that were collected in the
field. Because nonsampling error made by the respondent, in interaction with the
interviewer and the questionnaire, is more serious and less amenable to measurement than
errors arising from other operations, nonsampling error is often called response error. A
typical example of response error made by respondents is the tendency of persons in many
countries to report their ages in years ending in zero and five (Ewbank, 1981). Such
response error often requires special detection and smoothing methods; such methods are
described in Arriaga, Johnson, and Jamison (1994), and Judson and Popoff (2004).
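One widely used detection measure for heaping on ages ending in zero and five is Whipple's index, which compares the counts reported at ages 25, 30, ..., 60 with one-fifth of the total reported at ages 23 through 62. It is offered here only as an illustration of the kind of detection method referred to above, not as the specific method of the cited works; the age counts below are hypothetical. An index of 100 indicates no preference for terminal digits 0 and 5, and 500 indicates that every reported age in the range ends in 0 or 5.

```python
from __future__ import annotations


def whipples_index(counts_by_age: dict[int, float]) -> float:
    """Whipple's index of heaping on terminal digits 0 and 5 (ages 23-62)."""
    # Ages 25, 30, ..., 60: the eight ages ending in 0 or 5 inside the window.
    heaped = sum(counts_by_age.get(age, 0.0) for age in range(25, 61, 5))
    # All reported ages 23 through 62 inclusive (40 single years).
    window = sum(counts_by_age.get(age, 0.0) for age in range(23, 63))
    if window == 0:
        raise ValueError("no population reported at ages 23-62")
    # Without heaping, the 8 heaped ages hold 8/40 = 1/5 of the window.
    return 100.0 * heaped / (window / 5.0)


# Hypothetical single-year age counts.
no_heaping = {age: 1000.0 for age in range(90)}
with_heaping = {age: (1600.0 if age % 5 == 0 else 850.0) for age in range(90)}

print(whipples_index(no_heaping))    # 100.0
print(whipples_index(with_heaping))  # 160.0
```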
Net Coverage Error
Coverage error can take two forms: overcoverage and undercoverage. The former implies
that a sampled unit occurs in the sample more than once. The latter implies that a sampled
unit occurs in the sample less than once (that is, it is missed). Typically, the difference
between overcoverage and undercoverage is referred to as net coverage error. Net coverage
error can be positive, negative, or zero; it can be zero either because there is no error or
because overcoverage and undercoverage offset each other.
Furthermore, net coverage error can vary across specific geographic, economic or
demographic subgroups. That is, one group could have a positive net coverage error and
another group could have a negative net coverage error; thus one group is overcovered and
the other undercovered. This state of affairs is known as differential coverage error.
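The definitions above can be made concrete with a small accounting sketch. The two groups and their counts of true members, duplicated or erroneous (overcovered) units, and missed (undercovered) units are hypothetical; net coverage error is computed as overcoverage minus undercoverage, so positive values indicate net overcoverage.

```python
from dataclasses import dataclass


@dataclass
class Group:
    name: str
    true_size: int     # hypothetical true population of the group
    overcovered: int   # duplicates and erroneous inclusions
    undercovered: int  # members who were missed

    @property
    def net_error(self) -> int:
        # Positive = net overcoverage; negative = net undercoverage.
        return self.overcovered - self.undercovered

    @property
    def enumerated(self) -> int:
        return self.true_size + self.net_error


groups = [Group("group A", 1000, 60, 20), Group("group B", 1000, 10, 50)]
for g in groups:
    print(g.name, "net error:", g.net_error, "enumerated:", g.enumerated)

# The errors offset in the total even though coverage differs by group.
total_net = sum(g.net_error for g in groups)
print("total net error:", total_net)
```

Here the total is enumerated exactly (net error 0), yet group A is overcovered by 40 and group B undercovered by 40: differential coverage error that a check on the total alone would not reveal.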
At this point it is useful to discuss generally the causes of over- and undercoverage, before
entering into the specific issues associated with the foreign born. Several authors (Fein,
1990; de la Puente, 1993; Anderson and Fienberg, 1999; Judson and Popoff, 2004)
enumerate sources of coverage error, which we shall summarize.
A source of overcoverage found in Census 2000 was erroneous enumeration and duplication
in the Master Address File frame (Jones, 2003). In the Census 2000 context, operations that
created the Master Address File created some units that were later found to be erroneous
enumerations, or created multiple records for the same address. (For example, the same
physical location might have multiple ways to write the address, and the file would contain
all of them.) Comparing demographic housing benchmarks with the 1999 Decennial
Master Address File, the control file for subsequent operations, implied a 5.3%
overcoverage of addresses in that file. This led to overenumeration when the address
received multiple forms or in-person followups (Fay, 2001).
Similarly, overcoverage can be caused by accidental person duplication. For example, a
census form might enumerate a student living away from home at college, while the
student's parental household also enumerates that same student. As another example,
persons in assisted-living situations might be enumerated both at a group quarters facility
and at a household. The Census Bureau estimates a lower-bound figure of 5.8 million such
person duplicates in Census 2000 (Mule and Fenstermaker, 2003).
By contrast, undercoverage comes from many sources. The Census Bureau lists many of
them: high mobility; rentership; language barriers; neighborhood resistance; irregular
housing; non-standard living arrangements; loose attachment to a particular household;
and concerns regarding confidentiality (de la Puente, 1993; Darga, 2000; Camarota,
2006). To this list might be added an active desire for concealment (Tourangeau, et al.,
1997; Ellis, 1995; Valentine and Valentine, 1971).
Why is Net Coverage Error Important?
Estimates produced from a census or survey are typically not derived directly from the raw
survey or census responses, because even with extensive operational quality control and
intensive effort, the raw data collection does not fully represent the population.
As Shapiro and Kostanich (1988:443) state: "we believe there is not general awareness
of the deleterious effects of response error, and it is rarely estimated. In poorly designed
and conducted household surveys, there can be many serious problems. In even the best
household surveys, however, undercoverage and response error tend to be high and, in our
opinion, are the two most important problems in the sample survey field" (italics added).
When faced with this situation, survey and census organizations apply a variety of
processing steps and adjustments that attempt to make the data represent the population of
interest. (These steps are different for a 100% enumeration and a sample enumeration, so
we will address them separately.) Typically, the goal is to represent the population as a
whole, not necessarily any particular subgroup. Therefore, the question that we will address
is: do these adjustments correct for potential undercoverage of the foreign-born population,
and if they do not, how much undercoverage remains?
Correcting for Net Coverage Error in Censuses and Surveys
In this section we will describe attempts to correct for net coverage error in the census and
in surveys. The approaches are slightly different due to their different data collection
context (100% enumeration versus sample selection). One area which we will not consider
is the area of item imputation, where item nonresponse (failure to collect one or more data
elements from a survey/census form) requires item imputation. We will, however, consider
unit imputation caused by unit nonresponse (where entire sample/enumeration units are
missed).
The Census: Count Imputation and Post-Enumeration Survey Coverage Correction Factors
In the census context, extensive efforts are made to execute a complete enumeration: these
include advertising campaigns, school presentations, state and local partnership programs,
careful organization, quality control checks, multiple language questionnaires, querying
neighbors, and multiple nonresponse followups, to name a few.
However, despite all efforts, for a small fraction of housing units (about 1.4% of housing
units in Census 2000; Zajac, 2003), nothing is known about the composition of the housing
unit, even from a neighbor. In this case imputation is used.5 Imputation typically takes two
forms: "status imputation" and "count imputation." Status imputation determines
whether the nonresponding unit should be considered occupied or vacant; if it is occupied,
count imputation determines how many persons should have been enumerated there but
were not. In Census 2000, both kinds of imputation were executed using the hot deck
technique, which typically chooses a nearest neighbor to make the imputation.6 Item
imputation is then used to fill out the characteristics of the imputed people.
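The logic of status and count imputation by nearest-neighbor hot deck can be sketched as follows. This is a simplified illustration under stated assumptions, not the Census Bureau's production algorithm: each unresolved unit takes its occupancy status from the nearest unit (by a hypothetical geographic sort position) whose status is known, and, if it is occupied but lacks a count, takes its person count from the nearest occupied unit with a known count. Donors are always drawn from the original data, so imputed values are never reused.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class HousingUnit:
    position: int             # hypothetical geographic sort order
    occupied: Optional[bool]  # None = status unknown
    persons: Optional[int]    # None = count unknown


def nearest(units: list[HousingUnit], target: HousingUnit,
            is_donor: Callable[[HousingUnit], bool]) -> HousingUnit:
    """Closest eligible donor by sort position (ties go to the earlier unit)."""
    donors = [u for u in units if u is not target and is_donor(u)]
    if not donors:
        raise ValueError("no eligible donor")
    return min(donors, key=lambda u: abs(u.position - target.position))


def hot_deck_impute(units: list[HousingUnit]) -> list[HousingUnit]:
    result = []
    for u in units:
        status = u.occupied
        if status is None:
            # Status imputation: should the unit be treated as occupied or vacant?
            status = nearest(units, u, lambda d: d.occupied is not None).occupied
        if not status:
            result.append(HousingUnit(u.position, False, 0))
        elif u.persons is not None:
            result.append(u)
        else:
            # Count imputation: how many persons should have been enumerated?
            donor = nearest(units, u, lambda d: d.occupied is True and d.persons is not None)
            result.append(HousingUnit(u.position, True, donor.persons))
    return result


block = [
    HousingUnit(1, True, 3),
    HousingUnit(2, None, None),
    HousingUnit(4, False, 0),
    HousingUnit(5, None, None),
    HousingUnit(9, True, 2),
]
for unit in hot_deck_impute(block):
    print(unit)
```

In this toy block the unit at position 2 borrows "occupied, 3 persons" from position 1, while the unit at position 5 borrows "vacant" from position 4.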
For Census 2000, an ambitious coverage correction program was developed, building on
the earlier Post-enumeration survey (PES) and Post-enumeration program (PEP) used in
past censuses. This coverage program, the Accuracy and Coverage Evaluation (A.C.E.),
was a reenumeration of sampled census blocks, from address frame development to
household enumeration with specially-designated interviewers. The sample size of the
A.C.E. was large enough that it had the potential to be used to adjust the census to correct
for net coverage error.
The statistical theory behind the A.C.E. operation was not, as some suppose, to get the
enumeration right in the A.C.E. and then use it to correct the census. Rather, it used dual
system estimation theory, which assumes that the two enumerations are independent (rather
than one being superior to the other). This independence assumption is critical to
understanding why it is difficult to 1) incorporate nativity questions into the analysis in
2010 or 2) use other data sources (such as recent Lawful Permanent Resident Admissions)
to estimate coverage. We will deal with these technical matters later in this paper.
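The role of the independence assumption can be illustrated with the simplest dual system (Lincoln-Petersen) estimator, $\hat{N} = n_1 n_2 / m$, where $n_1$ and $n_2$ are the counts in the two enumerations and $m$ is the number matched in both. The sketch below is a textbook simplification, not the A.C.E. estimator itself (which also handled erroneous enumerations and used post-strata), and it works with expected counts for hypothetical groups. When capture probabilities are the same for everyone, the estimator recovers the true total; when some people are hard to count in both enumerations, the enumerations are no longer independent in the pooled population and the estimator falls short.

```python
def dual_system_estimate(n_census: float, n_pes: float, n_matched: float) -> float:
    """Lincoln-Petersen dual system estimate of the true population size."""
    return n_census * n_pes / n_matched


def expected_counts(groups):
    """Expected census, survey, and matched counts.

    Each group is (true_size, p_census, p_survey); within a group the two
    enumerations are assumed to be independent.
    """
    n1 = sum(size * p1 for size, p1, _ in groups)
    n2 = sum(size * p2 for size, _, p2 in groups)
    m = sum(size * p1 * p2 for size, p1, p2 in groups)
    return n1, n2, m


# Hypothetical: 1,000 people with identical capture probabilities.
homogeneous = [(1000, 0.90, 0.80)]
# Hypothetical: the same 1,000 people, but half are hard to count in both enumerations.
heterogeneous = [(500, 0.95, 0.90), (500, 0.60, 0.50)]

for label, groups in [("homogeneous", homogeneous), ("heterogeneous", heterogeneous)]:
    print(label, round(dual_system_estimate(*expected_counts(groups)), 1))
```

The homogeneous case returns the true 1,000; the heterogeneous case returns roughly 939, an underestimate produced entirely by the failure of independence at the pooled level.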
Using the independence assumption, an estimate of the true enumeration is constructed for
840 (in 1990) or 416 (in 2000, after collapsing) predefined post-strata or estimation
domains. (For example, American Indians living on reservations is one estimation domain;
5 Technically, in Zajac, 2003, this is called "substitution" to distinguish it from "assignment" and
"allocation," two forms of item imputation.
6 Note that these imputations were the subject of a lawsuit against the Census Bureau by the state of Utah,
with Utah claiming that hot deck imputation was a form of prohibited sampling for nonresponse, and the
Census Bureau claiming otherwise. The Supreme Court decided in the Census Bureau's favor by a 5-4
margin. See Cantwell, Hogan, and Styles, 2004, for a summary of Utah v. Evans.
Hispanic renters is another.) From these domains, coverage correction factors were to be
derived, and used in non-sampled blocks to adjust the census (see U.S. Census Bureau,
2004, for a detailed design document). The Census Bureau did not commit to adjusting the
census; rather, staff conducted a series of evaluations and the Census Bureau was to
recommend whether to adjust or not.
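In outline, a coverage correction factor for a post-stratum is the ratio of its dual system estimate to its census count, and a non-sampled block is adjusted by applying each post-stratum's factor to the block's census count in that post-stratum and summing. The post-stratum labels and all numbers below are hypothetical, and the sketch omits the many details of the actual design.

```python
from __future__ import annotations


def coverage_correction_factors(dse: dict[str, float],
                                census: dict[str, float]) -> dict[str, float]:
    """Ratio of dual system estimate to census count for each post-stratum."""
    return {stratum: dse[stratum] / census[stratum] for stratum in dse}


def adjust_block(block_census: dict[str, float], ccf: dict[str, float]) -> float:
    """Apply post-stratum factors to one block's census counts and sum."""
    return sum(count * ccf[stratum] for stratum, count in block_census.items())


# Hypothetical post-stratum totals.
dse = {"owners": 10_500.0, "renters": 5_300.0}
census = {"owners": 10_000.0, "renters": 5_000.0}
ccf = coverage_correction_factors(dse, census)

# Hypothetical non-sampled block: 40 owners and 20 renters were enumerated.
block = {"owners": 40.0, "renters": 20.0}
print(ccf, round(adjust_block(block, ccf), 1))
```

The factors here are 1.05 and 1.06, so the block's 60 enumerated persons would be adjusted to about 63.2.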
The history of A.C.E. is documented elsewhere, both by the Census Bureau and by others
(Anderson and Fienberg, 1999; U.S. Census Bureau, 2004). In sum, the problems of
duplication, of respondents' interpretation of residence rules, and of nonindependence
between the original enumeration and the A.C.E. enumeration militated against using the
A.C.E. to adjust the decennial results. Three decisions, in sequence, were made after several
months of evaluation: 1) A.C.E. could not be used for congressional apportionment; 2) A.C.E.
could not be used for congressional redistricting; and 3) A.C.E. should not be used for
adjusting the postcensal population estimates base (Mulry, 2006). The Census Bureau has
not made public any plans for considering adjustment in 2010.
The Effect of Net Coverage Error on the Census
The central purpose of the decennial census is to provide the Constitutional basis for the
apportionment of representatives amongst states. A second purpose is to provide small
area data for the purpose of congressional redistricting (Government Accountability Office,
1998a).
It is the third purpose of a census that is most germane to this paper: to serve as a
population base for making postcensal population estimates at many levels of geography
(national, state, county, and subcounty). These postcensal population estimates are used
for federal funds distribution, but we will not focus on these distributive effects. In this
paper, their most important use is for survey population controls, as described in the
following section. To the extent that the census itself has differential net coverage error,
that error will be propagated forward into postcensal estimates and projections.
An illustration of these impacts can be seen in Robinson, et al. (2003), which compared the
Demographic Analysis (DA) method with unadjusted decennial enumerations. Figure
one, taken from that paper, exhibits net coverage error7 by 5-year age group and two race
categories.
-- Insert figure one about here --
As can be seen in this figure, comparing decennial census results with the independent DA
system of coverage measurement reveals important patterns by race, sex and age. Ages 0-4
and 5-9 tend to exhibit net undercoverage for both race groups. Outside of these ages,
7 The term used at the time was "net census undercount," with a negative net undercount implying an
overcount. We prefer the term "net coverage error" in this context. It is important to note
that the race detail that DA can provide is limited to black, white, and all other races. Typically the latter
two are combined into a "nonblack" category. This is a limitation of the data sources used to construct the
DA estimate. From our point of view (our interest in coverage of the foreign born), this is a significant
limitation.
nonblack females exhibit slight net overcoverage, while black males exhibit substantial net
undercoverage from ages 20-64. Black females and nonblack males also exhibit slight
undercoverage for approximately ages 30-59. Similar patterns were detected in the 1990
census as well (Robinson, et al., 2003).
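The quantity compared in such analyses is a net coverage error rate: the difference between the demographic analysis estimate and the census count, expressed relative to the DA estimate. Under the "net census undercount" convention described in footnote 7, a positive value is a net undercount and a negative value a net overcount. The cell labels and counts below are hypothetical and are not taken from Robinson, et al. (2003).

```python
def net_undercount_rate(da_estimate: float, census_count: float) -> float:
    """(DA - census) / DA; positive = net undercount, negative = net overcount."""
    return (da_estimate - census_count) / da_estimate


# Hypothetical (DA estimate, census count) pairs for two age-sex-race cells.
cells = {
    "black males 30-34": (1_000_000.0, 920_000.0),
    "nonblack females 30-34": (1_000_000.0, 1_005_000.0),
}
for cell, (da, census) in cells.items():
    print(f"{cell}: {100 * net_undercount_rate(da, census):+.1f}%")
```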
To the extent that these coverage errors remain in an unadjusted census, they propagate
forward into postcensal estimates, and into surveys that use those estimates for
poststratification controls.
Surveys: Weights and Population Controls
In the survey environment, there are two primary components: adjustments based on
survey design, and adjustments based on population controls derived from the previous
census. Coverage correction typically takes the form of a series of weights, each of
which attempts to correct for one or another form of net coverage error. For this section,
we will take a typical example based on the American Community Survey, although
similar techniques are used with other surveys such as the Current Population Survey,
Annual Social and Economic Supplement (ASEC).
Weights are applied in phases8:
Phase one: In phase one, the sampling design is taken into account. This weight accounts
for the combined probability of selection of the final sampled unit. Since a sample, by
construction, does not cover the entire population of interest, this is the first attempt at
correction: to correct for unequal sampling probability.
Phase two: In phase two, any nonresponse followup design is taken into account. For the
American Community Survey, approximately one in three households that respond to
neither the mailing nor the telephone followup are sent to personal interview followup. By
design, then, about two-thirds of the nonresponders are not covered by the followup. This
weight accounts for that by upweighting the households that receive personal interviews.
Phase three: After phases one and two, there remains a fraction of households that
are noninterviews. (Some have used the phrase "hard core nonresponders" or similar
language.) Little is known about these households except their geographical location and
some limited information interviewers can obtain by proxy. Phase three weights, or
noninterview weights, are typically used to adjust the sampled and responding units to
match the geography and information available on the nonresponders. At this point the
combination of weights is purely survey-oriented.
Phase four: After phase three, poststratification occurs. In this phase, independent
housing unit estimates (typically constructed by demographers) are used as control totals;
that is, the estimated survey housing unit values are controlled to the independent
controls. The purpose is to remove the nonresponse bias component of the mean squared error.
Phase five: Finally, another round of poststratification occurs. In this phase, independent
population estimates (again, demographically constructed) are used as control totals for
person records; the estimated survey person characteristics (some combination of age,
race, sex, and Hispanic origin) are controlled to the independent controls. As with
housing unit controls, the purpose is to remove the nonresponse bias component of the
mean squared error.
8 These phases are necessarily a summary and approximate the steps without being exact specifications.
Typically the weights are combined multiplicatively to generate a final adjustment weight.
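The phases, combined multiplicatively as footnote 8 notes, can be sketched with a toy sample. Everything here is a simplifying assumption rather than the ACS weighting specification: the two weighting cells, the selection probabilities, the person control totals, and the factor of 3 for a one-in-three personal interview subsample. Phase four (housing-unit controls) is omitted, and phase five uses the geographic cells rather than age, race, sex, and Hispanic-origin groups.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass

# Assumed factor for a one-in-three personal-interview subsample (phase two).
PI_SUBSAMPLE_FACTOR = 3.0


@dataclass
class Unit:
    cell: str            # hypothetical weighting / poststratification cell
    prob: float          # probability of selection (phase one)
    pi_subsampled: bool  # resolved through the personal-interview subsample
    interviewed: bool    # False = noninterview after all followup
    persons: int         # persons reported (0 for noninterviews)


def final_weights(units: list[Unit], controls: dict[str, float]) -> list[float]:
    # Phase one: base weight is the inverse of the selection probability.
    w = [1.0 / u.prob for u in units]
    # Phase two: upweight units in the personal-interview subsample.
    w = [wi * PI_SUBSAMPLE_FACTOR if u.pi_subsampled else wi for wi, u in zip(w, units)]
    # Phase three: move noninterview weight onto interviews in the same cell.
    total, interviewed = defaultdict(float), defaultdict(float)
    for wi, u in zip(w, units):
        total[u.cell] += wi
        if u.interviewed:
            interviewed[u.cell] += wi
    w = [wi * total[u.cell] / interviewed[u.cell] if u.interviewed else 0.0
         for wi, u in zip(w, units)]
    # Phase five (simplified): ratio-adjust weighted persons to independent controls.
    estimate = defaultdict(float)
    for wi, u in zip(w, units):
        estimate[u.cell] += wi * u.persons
    return [wi * controls[u.cell] / estimate[u.cell] for wi, u in zip(w, units)]


sample = [
    Unit("north", 0.01, False, True, 3),
    Unit("north", 0.01, True, True, 2),
    Unit("north", 0.01, True, False, 0),
    Unit("south", 0.02, False, True, 4),
    Unit("south", 0.02, True, False, 0),
]
person_controls = {"north": 1890.0, "south": 760.0}

weights = final_weights(sample, person_controls)
print([round(wt, 1) for wt in weights])
```

After the final phase the weighted person totals in each cell match the controls exactly, whether or not the interviewed households resemble the ones that were missed.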
Lohr (1999:272) presents this caution about this sequence of adjustment weights:
"The models for weighting adjustments for nonresponse are strong: in each
weighting cell, the respondents and nonrespondents are assumed to be
similar. These models never exactly describe the true state of affairs, and you
should always consider their plausibility and implications. It is an unfortunate
tendency of many survey practitioners to treat the weighting adjustment as a
complete remedy and to then act as though there were no nonresponse" (italics
added).
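Lohr's caution can be quantified with the approximation in footnote 4. Within a single weighting cell, a nonresponse adjustment multiplies every respondent's weight by the same factor, so the adjusted cell mean is simply the respondent mean; if nonrespondents differ, the bias remains. The numbers below are hypothetical (for example, the proportion foreign born among respondents and nonrespondents in one cell). In this population-level setting the footnote's expression holds exactly.

```python
from __future__ import annotations


def weighting_class_bias(n_resp: int, mean_resp: float,
                         n_nonresp: int, mean_nonresp: float) -> tuple[float, float]:
    """Return (actual bias, footnote-4 expression) for one weighting cell."""
    n = n_resp + n_nonresp
    true_mean = (n_resp * mean_resp + n_nonresp * mean_nonresp) / n
    # A common adjustment factor for all respondents leaves their mean unchanged.
    adjusted_mean = mean_resp
    actual = adjusted_mean - true_mean
    formula = (n_nonresp / n) * (mean_resp - mean_nonresp)
    return actual, formula


# Hypothetical cell: 800 respondents (10% foreign born), 200 nonrespondents (25%).
actual, formula = weighting_class_bias(800, 0.10, 200, 0.25)
print(f"actual bias {actual:+.3f}, footnote-4 expression {formula:+.3f}")
```

Even after weighting, the cell understates the foreign-born share by 3 percentage points (10% estimated versus 13% true).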
The Effect of Net Coverage Error on Survey Estimates
The effect of net coverage error in survey estimates can be summarized by the coverage
ratio (Shapiro and Kostanich, 1988). A coverage ratio compares the survey estimate of the
number of people who have a particular characteristic with an independent estimate of the
same number based on updated decennial census figures. For example, a coverage ratio of
.95 for males aged 50 to 59 indicates that the survey estimate of the number of people in
this subpopulation is 95% of the updated census population estimate. Occasionally the
coverage ratio exceeds 1.0, indicating overcoverage of a particular category.
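A small worked example with hypothetical numbers: the coverage ratio is the survey estimate divided by the independent population estimate, its complement is the implied net undercoverage, and its reciprocal is the factor by which a ratio-type population control (as in phase five above) would inflate the survey estimate.

```python
def coverage_ratio(survey_estimate: float, independent_estimate: float) -> float:
    """Survey estimate relative to the updated-census (independent) estimate."""
    return survey_estimate / independent_estimate


# Hypothetical figures for males aged 50 to 59.
ratio = coverage_ratio(19_000_000, 20_000_000)
print(f"coverage ratio {ratio:.2f}")
print(f"implied net undercoverage {1 - ratio:.0%}")
print(f"control adjustment factor {1 / ratio:.4f}")
```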
CPS
Coverage ratios for the Current Population Survey in the 2001-2004 period are summarized at
http://www.bls.census.gov/cps/basic/perfmeas/coverage.htm. As noted on that page,
average coverage ratios are typically about .90; coverage ratios for males are typically
lower than for females, and this is particularly prominent for black males in that survey.
Coverage ratios for Hispanic males and males of other races are also low, in the .80 range.
The coverage ratios also appear to have a slight downward trend over the period.
ACS
The American Community Survey maintains a data quality web page that summarizes
coverage ratios.9 Some evidence of coverage differentials has also been presented at
statistical meetings (e.g., Bruce, Navarro and Ahmed, 2007). At the national level,
unadjusted ACS estimates exhibit coverage ratios of between .94 and .97 relative to total
population estimates through the 2000-2006 period. Undercoverage approaches 8% for
males and is about 4% for females. Hispanic coverage ratios range from .897 to .964, and
non-Hispanic white ratios from .949 to .971.
Other Surveys
We only briefly mention other surveys. Three surveys that provide useful information on
the foreign-born population are the Survey of Income and Program Participation (SIPP),
the New Immigrant Survey (NIS), and the National Agricultural Workers Survey (NAWS).
9 Because ACS samples for personal interview followup, the coverage ratio is necessarily defined slightly
differently than the analogous CPS coverage ratio. Coverage ratios can be found at:
http://www.census.gov/acs/www/acs-php/quality_measures_coverage_2006.php.
The SIPP, by virtue of its longitudinal design, has an additional coverage dimension, apart
from the coverage ratios found in other surveys. This additional dimension is sample
attrition (the loss of formerly-responding households and people). Attrition, of course,causes later respondent groups to differ from earlier respondent groups, and requires
attrition weights to correct for differential attrition across groups.
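One common way to build such weights is a weighting-class adjustment: each retained respondent's base weight is multiplied by the inverse of the weighted retention rate in that respondent's group, so that each group's weighted total is preserved. The sketch below assumes that simple form; SIPP's production attrition weighting is more elaborate and is not reproduced here, and the group labels and weights are hypothetical.

```python
from collections import defaultdict


def attrition_adjusted_weights(base_weights, groups, retained):
    """Weighting-class attrition adjustment (illustrative).

    base_weights: initial survey weights
    groups: adjustment-class label for each case (e.g., a nativity group)
    retained: True if the case still responds in the later wave
    Retained cases are inflated by (group total weight / group retained weight);
    attriters receive weight 0, so each group's weighted total is unchanged.
    """
    total = defaultdict(float)
    kept = defaultdict(float)
    for w, g, r in zip(base_weights, groups, retained):
        total[g] += w
        if r:
            kept[g] += w
    return [w * total[g] / kept[g] if r else 0.0
            for w, g, r in zip(base_weights, groups, retained)]


if __name__ == "__main__":
    weights = [1.0, 1.0, 1.0, 1.0, 2.0]
    groups = ["native", "native", "foreign", "foreign", "foreign"]
    retained = [True, True, True, False, False]
    print(attrition_adjusted_weights(weights, groups, retained))
```

Because the foreign-born group in this toy example loses three quarters of its weight, its one remaining case carries a weight of 4.0, which is exactly how differential attrition gets pushed onto fewer, heavier cases.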
Adjustments for Potential Net Coverage Error
In a number of publications (e.g., Passel, Van Hook, and Bean, 2004; Passel and Suro, 2005; Passel, 2005, 2006, 2007; Passel and Cohn, 2008), the Pew Hispanic Center has
published estimates of the size and characteristics of the unauthorized foreign-born
population. This method uses the March Current Population Survey (CPS) as the base for the estimate.
Undercoverage factors are derived from the Accuracy and Coverage Evaluation (Passel and Cohn, 2008: 13): a 2.0% undercount rate for legal resident immigrants (2.6% for legal
resident immigrants who entered after 1980). Passel and Cohn cite Marcelli and Ong (2002) to justify a 12.5% undercount for unauthorized immigrants in the March CPS.
Passel, Van Hook and Bean (2004) use a lower undercount of 9.1%.
It is important in this context to point out that the resulting estimate not only relies on these
assumptions but is also sensitive to them. Using the Passel, Van Hook and Bean (2004) estimate of the 2000 unauthorized population as a base, after some algebraic
simplification, the resulting approximate residual estimation formula is (in millions):
,1
11361346120
)-u(
)-u(.-.edunauthoriz
edunauthoriz
legal= where,
unauthorized is the final estimate of the number unauthorized;
u_legal is the assumed undercoverage rate for legal resident immigrants; and
u_unauthorized is the assumed undercoverage rate for unauthorized immigrants.
If we fix the parameter u_legal to match that of Passel, Van Hook and Bean, at 1.6%, the
estimating equation simplifies to:
,1
9.1246120
)-u(
-.edunauthoriz
edunauthoriz
= .
Thus, the unauthorized population is inflated by a factor of 1/(1 - u_unauthorized).
Table one illustrates the impact of this inflator on the final estimate (we vary the parameter from one percent to 20 percent).
-- Table one about here --
As can be seen, the possible unauthorized estimates range from 7.68 million (the
lower bound) to 9.5 million (the upper bound). As the center column shows, the
inflation factor grows at an increasing rate; that is, as the assumed rate grows, each further
increment has a larger impact on the final number.
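The inflator 1/(1 - u_unauthorized) can be tabulated directly. In the sketch below, the 7.6 million base is an assumption made only for illustration: it is the value that, divided by (1 - 0.01) and (1 - 0.20), reproduces the 7.68 and 9.5 million bounds quoted above; it is not taken from the Passel, Van Hook and Bean formula. The printed step sizes show the effect of each additional percentage point growing as the assumed rate rises.

```python
def inflation_factor(u: float) -> float:
    """Multiplier 1 / (1 - u) applied for an assumed net undercoverage rate u."""
    if not 0.0 <= u < 1.0:
        raise ValueError("u must lie in [0, 1)")
    return 1.0 / (1.0 - u)


def inflated_estimate(base: float, u: float) -> float:
    """Uninflated base (millions) multiplied by the undercoverage inflator."""
    return base * inflation_factor(u)


if __name__ == "__main__":
    BASE = 7.6  # millions; illustrative assumption, see lead-in
    previous = None
    for pct in range(1, 21):
        est = inflated_estimate(BASE, pct / 100)
        step = "" if previous is None else f"  (+{est - previous:.3f})"
        print(f"u = {pct:2d}%  factor = {inflation_factor(pct / 100):.4f}  estimate = {est:.2f}{step}")
        previous = est
```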
The Office of Immigration Statistics (OIS) is legally mandated to produce an estimate of
the stock of the unauthorized population residing in the United States (Immigration and
Nationality Act, Section 103(d).) The method is described (cf. Hoefer, Rytina, and Baker, 2008) as a residual method, and is similar to the Pew Hispanic method described above:
A population base of the total foreign-born population is constructed from
American Community Survey population estimates;
This base is then inflated for net undercoverage;
From this base is subtracted estimates of deaths, emigrants, legal permanent
residents, refugees and asylees, and resident nonimmigrants; and
The residual is an estimate of the unauthorized.
In this method, two undercoverage factors are applied: a 2.5% net undercoverage rate for
the legal permanent resident (LPR), refugee and asylee population as a whole, and a 10% net undercoverage rate for the nonimmigrants10 and the unauthorized. It is important to
note that applying these two assumed net undercoverage rates means that the
resulting estimate is no longer consistent with published census estimates from the ACS. The unauthorized net coverage rate was based on previous DHS/INS unauthorized
estimates, which cited Marcelli and Ong (2002). The authors note that the resulting
estimate is sensitive to the assumed net undercoverage rate.
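The residual steps listed above, with status-specific net undercoverage, can be written as a short function. This is a minimal sketch and not the OIS production formula: it assumes the survey captures a fraction (1 - u) of each status group, subtracts the expected survey count of the legally resident and nonimmigrant populations from the unadjusted foreign-born count, and inflates the remainder for the assumed unauthorized undercoverage. The 2.5% and 10% defaults follow the rates quoted above; the class and field names are hypothetical, and every population figure in the example is invented.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ResidualInputs:
    """Inputs in millions (hypothetical structure; example values are invented)."""
    survey_foreign_born: float    # unadjusted survey count of the foreign born
    legal_admitted: float         # LPRs, refugees and asylees from administrative data
    legal_deaths: float
    legal_emigrants: float
    nonimmigrants: float          # resident nonimmigrants
    u_legal: float = 0.025        # net undercoverage, legal residents
    u_non: float = 0.10           # net undercoverage, nonimmigrants
    u_unauthorized: float = 0.10  # net undercoverage, unauthorized


def residual_unauthorized(x: ResidualInputs) -> float:
    legal_present = x.legal_admitted - x.legal_deaths - x.legal_emigrants
    # Expected survey counts of the legal groups, given their assumed undercoverage.
    legal_in_survey = legal_present * (1.0 - x.u_legal)
    non_in_survey = x.nonimmigrants * (1.0 - x.u_non)
    residual_in_survey = x.survey_foreign_born - legal_in_survey - non_in_survey
    # The residual is itself undercounted, so inflate it.
    return residual_in_survey / (1.0 - x.u_unauthorized)


if __name__ == "__main__":
    base = ResidualInputs(30.0, 20.0, 1.0, 1.0, 2.0)
    for u in (0.05, 0.10, 0.15, 0.20):
        est = residual_unauthorized(replace(base, u_unauthorized=u))
        print(f"u_unauthorized = {u:.2f}: {est:.2f} million")
```

Varying one rate at a time with `dataclasses.replace`, as in the loop, is the same one-factor-at-a-time style of sensitivity check used in the tables.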
Like the Pew Hispanic method, with some simplifying assumptions we can perform our
own sensitivity analysis. Ignoring some demographic particulars, and using notation similar to that above to illustrate the similarities, the 2007 formula is approximately equal to (in millions):
( )[ ] ,1
117.112.2018.28
)-u()-u(-emig)()-u-(edunauthoriz
edunauthoriz
nonlegal =
where,
u_legal is the undercount rate for the legal resident population;
emig is the effective emigration rate11;
u_non is the undercount rate for nonimmigrants; and
u_unauthorized is the undercount rate for the residual unauthorized population in the numerator.
Table two compares the relative sensitivity of the resulting estimate to each of the assumed
rates. The shaded center line represents the assumed rates used in 2007 (again noting that
this formula is an approximation):
10 Nonimmigrants are sometimes referred to as legal temporary migrants; they include visa holders who
are not legally approved for long-term immigration/residence.
11 This is an approximation to the internal formula.
-- Table two about here --
The table reveals that, over ranges that are commonly used, the formula is particularly
sensitive to the assumed net undercoverage of the unauthorized foreign born. It is also
notably sensitive to assumed emigration rates, and less so to the coverage assumptions associated with nonimmigrants and with the legally-resident foreign-born population.
We conclude with an examination of labor force estimates. During the period of economic prosperity of the 1990s, an anomaly appeared between the Current Population Survey's
estimate of the labor force and the Current Employment Statistics (CES) survey of establishment-reported jobs. In essence, the two series, normally running almost in parallel, began to
diverge, with the CES reporting greater growth in jobs than the CPS was reporting in the labor force (Juhn and Potter, 1999). Juhn and Potter analyzed the differences as of
1999, considering three hypotheses: that the surveys treat multiple jobholding differently;
that the payroll survey (CES) is upwardly adjusted by benchmarking; and that there was an
undercount of the working-age population in the calculation of the household survey estimates. Their conclusion is notable:
We find that the third explanation, an underestimated working-age population,
best accounts for the recent rise in the employment gap. Since the household survey
calculates the level of employment by combining survey data with a census-based estimate of the U.S. working-age population, an undercount of that population will
produce low employment numbers. Evidence suggests that the census has in fact
historically underestimated this population. Significantly, the undercount appears to
be highest among groups whose employment status is very sensitive to business cycle fluctuations. We contend that the steady expansion of the economy in the
1990s has enabled these cyclical workers to find employment. Their numbers, only
partly captured in the census (and, by extension, in the household survey), have in recent years helped to boost the job count in the payroll survey, widening the gap
between the surveys' employment estimates (Juhn and Potter, 1999:1).
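The mechanism Juhn and Potter describe is simple arithmetic: the household survey's employment level is a survey-based ratio scaled up to a population control, so persons missing from the control never enter the level. The sketch below uses hypothetical numbers and assumes, for illustration only, that the missed persons have their own (higher) employment rate, as the quoted passage suggests for cyclical workers during an expansion.

```python
def household_employment_level(emp_pop_ratio: float, population_control: float) -> float:
    """Employment level = survey employment-population ratio x population control."""
    return emp_pop_ratio * population_control


def employment_shortfall(missed_population: float, missed_emp_ratio: float) -> float:
    """Employed persons absent from the level because they are absent from the control."""
    return missed_population * missed_emp_ratio


if __name__ == "__main__":
    control = 200.0      # hypothetical working-age population control, millions
    survey_ratio = 0.63  # hypothetical employment-population ratio
    missed = 3.0         # hypothetical persons missing from the control, millions
    missed_ratio = 0.75  # hypothetical employment rate of the missed group
    published = household_employment_level(survey_ratio, control)
    shortfall = employment_shortfall(missed, missed_ratio)
    print(f"published level {published:.2f}; including missed group {published + shortfall:.2f}")
```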
The recently arrived (within 10 years) and very recently arrived (within 5 years) foreign born are
more likely than older foreign born or native born persons to fall into the working-age
population (Mosisa, 2002). To the extent that the foreign born, and the unauthorized foreign born in
particular, are in this group, their potential undercoverage would systematically lead to underestimates of the size of the labor force, particularly during expansions.
The CES/CPS differential in the 1990s, and the subsequent (in 2000) discovery of
additional, previously undetected net immigration (see, e.g., Robinson, et al., 2002: 22, or
Nardone, et al., 2003:12),12 is consistent with this interpretation, and suggests the
sensitivity of these results to potential undercoverage of the foreign born.
An analysis of the effect of CPS/CES differences on unemployment rates and other
quantities of interest was presented by Schweitzer and Ransom (1999). They calculate the
effect on late 1990s unemployment rates based on assuming that the CPS employment levels followed those of the CES. Their conclusions are presented in their Figure 1: either
unemployment rates must have been much smaller than reported in the CPS during that
period, or labor force participation rates must have risen very quickly, or something was wrong with the population controls. Again, this is consistent with, but does not prove, the
subsequent discovery of additional net immigration, and suggests the sensitivity of these
results to potential undercoverage of the foreign born.
What has been Asserted about Coverage of the Foreign Born?
Marcelli and Ong (2002) provide results of a study of foreign-born Mexicans in Los
Angeles County; these results have been widely cited and used far outside their original context.13 In this study, Marcelli and Ong developed a direct survey enquiry: respondents
in households were asked directly, "Was [this person enumerated in the household]
included in the 2000 questionnaire sent to the Census Department?" Later in the survey, respondents' legal statuses were also ascertained. They obtained a gross undercoverage
rate of 10.6% for unauthorized respondents, 8.3% for legal immigrants, 4.5% for U.S.
citizens, and 7.1% for temporary visitors.
In contrast to direct enquiries, Van Hook and Bean (1998) used a demographic analysis
approach, generating an expected population size (based on vital statistics) and
comparing it to the obtained population size in the 1990 census. Their results suggest net
undercount rates that range from 15-25 percent for unauthorized Mexican immigrants.
In the context of evaluating coverage in Census 2000, Deardorff and Blumerman (in Robinson, 2001: Appendix A) develop several net undercoverage assumptions by migrant
legal status, and examine their impact on final demographic analysis estimates. In part, this
was an attempt to explain the then-discrepancy between early A.C.E. coverage results and
demographic analysis coverage results. For purposes of their scenarios, these assumptions ranged from 1-2% for legal migrants, 7-35% for legal temporary migrants, and 10-15% for
12 Robinson, et al.: "For the 2000 DA, we were particularly concerned about the reliability of the
immigration components and conducted a sensitivity analysis in response. This analysis led to the
incorporation of an alternative set of DA estimates to allow for the possible understatement of immigration (specifically, undocumented immigration) in the initial DA components of growth." Nardone,
et al.: "Additional studies carried out by the Census Bureau Population Division as part of the estimates evaluation indicated that the estimates of unauthorized migrants that were used
in the 1990 based intercensal estimates were too low. The evaluations indicated that the
residual foreign born population increased by about 5 million during the 1990 to 2000
decade rather than the 2.25 million (10 x 225,000) assumed in the 1990 based estimates."
13 For the record, Marcelli (personal communication) considers applying these results to geographical
contexts outside of Los Angeles County or to other foreign-born populations residing in the United States
to be questionable at best.
the unauthorized. They note (p. A-12) that the coverage assumptions do not explain the
different total populations calculated by DA and the A.C.E.
Passel, Van Hook, and Bean (2004), Passel and Suro (2005), Passel (2005, 2006, 2007), and
Passel and Cohn (2008) developed a series of estimates of the foreign born population by
legal status, including characteristics. These estimates are widely cited in the mainstream media. They assume a net undercoverage rate of 2% for legal resident
immigrants (2.6% for those entering after 1980), based on census studies of net coverage
error. For the unauthorized, however, a net undercoverage rate of 12.5% is assumed. In Passel and Cohn (2008), Marcelli and Ong's 2002 work is cited.
A further source of assertions about coverage of the foreign born is the Office of
Immigration Statistics' estimates of the unauthorized population. As noted above, two undercoverage factors are applied: a 2.5% net undercoverage rate for the LPR, refugee
and asylee population as a whole, and a 10% net undercoverage rate for the nonimmigrants
and the unauthorized. It is important in this context to quote the justification for these
rates: "This was the same rate used in previous DHS estimates" (Department of Homeland Security, 2007). Of course, the previous estimates cited other previous
estimates, and eventually this assumption can be traced to Office of Policy and Planning estimates developed by Robert Warren (2003), who cited Marcelli and Ong (2002).
Other ranges have been applied to earlier censuses.14 Unofficial estimates by Census staff put the undercount of illegal immigrants at about 33 percent in the 1980 census
(Government Accountability Office, 1998b; Fernandez and Robinson, 1994); Passel (1986)
suggested a range between 33 and 50 percent. For the 1990 census, various analyses put the
figure at roughly 20-30 percent (Woodrow, 1991; Van Hook and Bean, 1997; Woodrow-Lafield, 1995). GAO (1998) cites these and other sources in attesting to the difficulty of
assessing these various assertions.
For completeness, it is important to note the A.C.E. results from Census 2000, and what the
non-adjustment decision of the Secretary of the Commerce Department (coinciding with
the Census Bureau's recommendation [ESCAP II]) implies. Because overall net coverage error was close to zero, and because it could not be determined that A.C.E.-based
adjustments would improve distributional accuracy, no adjustment was performed. This
implies that the net coverage error of the foreign-born population is treated as, for all practical
purposes, zero. Implicitly, then, all population estimates built on the Census 2000 base assume by construction that the foreign-born population is adequately covered.
What is Provable About Coverage of the Foreign Born?
While much is assumed or asserted to be true about coverage of the foreign born, these claims are difficult to prove. In this section we survey some empirical results from the literature, and provide some results of our own. These results suggest that net
14 It is widely believed (e.g., Anderson and Feinberg, 1999) that Census-taking has improved steadily
throughout the 20th century (duplication in Census 2000 possibly being a symptom of an overly aggressive
attempt to eliminate undercoverage). Thus, estimates of undercoverage from earlier censuses may not
apply to Census 2000.
undercoverage of the foreign-born population is higher than for the rest of the population.
In the following sections, we will use the term "recent arrival" to refer to foreign-born
persons whose year of entry is within 10 years of the current survey period, and the term "very recent arrival" to refer to those whose year of entry is within five years of the current
survey period.15
Ethnographic Studies Suggest that the Foreign Born Avoid Detection
In the early 1990s, several ethnographic studies were performed, working with 29
(necessarily local and not statistically generalizable) sample areas containing some foreign-
born groups. De la Puente (1993: 4) summarizes many of these results. De la Puente notes
that complex households were common in sample areas with recent immigrants, especially Hispanic immigrants. These ad hoc households protect the identity of members and thus
contribute to within-household undercoverage.16 Complex households (often containing
more members than allowed by law or by building management) combine with fear of disclosure to create avoidance.
The Foreign Born Tend to Respond Later to Surveys
Work performed by Camarota and Capizzano (2004) mixed ethnographic with quantitative
analyses of operational ACS data. While the ethnographic results were broadly
comparable with those in the previous section, the quantitative data reveal that, in the study areas, the foreign born are more likely to be captured later in the survey process. Figures two
and three, taken from Camarota and Capizzano (2004), illustrate this:
-- Insert figures two and three about here --
As can be seen in these figures, foreign-born respondents disproportionately
respond in the later, telephone-assisted (CATI) or personal interview (CAPI) phases of the ACS. Furthermore, foreign-born respondents from certain countries of birth (including
some countries that are commonly held to be major sources of unauthorized immigration)
tend to be captured later in the operational process. Complete household nonresponse represents the ultimate late response, and it is widely asserted by survey
experts (with evidence given by, e.g., Treat and Stackhouse, 2002) that late responders, or
households captured in nonresponse follow-up, are systematically different from early responders.
States with Higher Levels of Foreign Born, Particularly Unauthorized or Recent Arrivals, Tend to Have Lower Coverage Ratios
Figure four plots coverage ratios in the 2006 American Community Survey against the percent of the total population that is recent-arrival foreign born, for states. Overlaid
15 Obviously, the choice of years is somewhat arbitrary. Wilson (2008) uses a 10-year period to indicate
"recent"; Clark and Patel (2004) use a 5-year period to indicate "recent". The results described below are
robust to either definition.
16 One household site in this study indicated that of the thirteen persons living in this household, only six
were enumerated in the 1990 census.
onto this plot is a simple linear regression and 95% confidence interval. The regression
line is analytically weighted to reflect the total population size of the state.
-- Insert figure four about here --
As can be seen, states with more recently arrived foreign born systematically tend to have lower total coverage ratios.
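A population-weighted fit of the kind shown in Figure four can be reproduced with weighted least squares. The sketch below computes only the point estimates (intercept and slope), which are the same whether the weights are treated as analytic or frequency weights; the confidence band in the figure would additionally require a variance estimate, which is omitted here. The state values are hypothetical.

```python
def weighted_linear_fit(x, y, w):
    """Weighted least-squares intercept and slope for y = a + b * x."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    if sxx == 0:
        raise ValueError("x has no weighted variance")
    slope = sxy / sxx
    return ybar - slope * xbar, slope


if __name__ == "__main__":
    # Hypothetical states: percent recent-arrival foreign born, coverage ratio, population (millions).
    pct_recent = [1.0, 2.5, 4.0, 6.0, 8.5]
    coverage = [0.975, 0.968, 0.955, 0.950, 0.935]
    population = [1.2, 5.8, 12.4, 19.0, 36.5]
    a, b = weighted_linear_fit(pct_recent, coverage, population)
    print(f"intercept {a:.4f}, slope {b:.5f} per percentage point")
```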
It is important in this context to mention the ecological fallacy (Robinson, 1950). These are state-level data, and a correlation between recently arrived
foreign born and lower coverage ratios does not necessarily imply individual-level net
undercoverage. The demonstrated relationship suggests, but does not prove, such an
individual-level relationship.
The Foreign Born, Particularly Recent Arrivals, Tend to Live in Areas with Higher Hard-to-Count Scores
Our final test uses the Planning Database (Robinson and Bruce, 2007; see also Bruce and Robinson, 2003; and Bruce, Robinson, and Sanders, 2001), a tool designed by the Census Bureau for planning and targeting areas that are harder to count than other areas. This
database, using Census 2000 short form and long form data, contains tabulations of data on
variables relevant to net undercoverage. Further, for 65,184 census tracts, a "hard to
count" score is calculated. This score ranges from 0 (representing the easiest to count tracts, with none or very few indicators) to a theoretical maximum of 132 (representing
tracts with all indicators of net undercoverage).
Again we wish to be careful not to commit the ecological fallacy (Robinson, 1950). We
emphasize that these are tract-level data, and while we will show that there is a correlation
between recently arrived foreign-born persons and other hard to count indicators, we recognize that definitive proof requires individual-level data. To emphasize this point, we
will continue to use the phrase "areas" to refer to tracts.
The hard to count score is a sum of weighted scores. The following description is from Robinson and Bruce (2007: 7): a total of 12 variables that were correlated with
nonresponse rates in 1990 and 2000 are used to derive the HTC score.
The set of algorithms used to determine HTC scores is as follows:
(1) each individual variable is sorted across geographic areas from high to low (e.g., sort
tracts from highest percent poverty to lowest percent poverty),
(2) scores (0 to 11) are assigned to each variable for each tract (e.g., values of 11 are given to tracts with the highest poverty rates of over 44.3 percent and values of 0 are given to
tracts below the national median poverty rate of 9.9 percent in 2000),
(3) the scores assigned to each of the 12 variables for a tract are summed to form a
composite HTC score for the tract.
The final HTC score is the sum of the ratings of the following components:
1) Percent vacant housing units;
2) Percent housing units that are not single-family structure;
3) Percent renter-occupied housing units;
4) Percent crowded occupied units;
5) Percent of families not in a husband/wife configuration;
6) Percent of occupied housing units with no phone service;
7) Percent of persons 25 and older not a high-school graduate or more;
8) Percent of persons below the poverty level;
9) Percent of persons receiving public assistance;
10) Percent of persons 16 and older unemployed;
11) Percent of households that are linguistically isolated;
12) Percent of occupied housing units where the owner moved in within 1999-2000.
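The three-step scoring procedure described above can be sketched as follows. The source specifies only the end points (0 for tracts below the national median, 11 for the most extreme tracts); the rule used here for intermediate scores, splitting above-median tracts into roughly equal-count groups scored 1 through 11 by rank, is an assumption for illustration and not the Census Bureau's actual cut points. With 12 components the maximum possible score is 12 x 11 = 132, matching the range given earlier.

```python
from statistics import median


def component_scores(values):
    """Score one HTC component (0-11) across tracts (illustrative rule).

    Assumption: tracts at or below the median score 0; tracts above it are
    ranked from lowest to highest and split into roughly equal-count groups
    scored 1 through 11. Ties are broken by tract order.
    """
    med = median(values)
    above = sorted((v, i) for i, v in enumerate(values) if v > med)
    scores = [0] * len(values)
    n = len(above)
    for rank, (_, tract) in enumerate(above):
        # ceil((rank + 1) * 11 / n): the highest-ranked tract always scores 11
        scores[tract] = ((rank + 1) * 11 + n - 1) // n
    return scores


def htc_scores(components):
    """Sum component scores per tract; components is a list of per-variable value lists."""
    per_component = [component_scores(values) for values in components]
    return [sum(tract_scores) for tract_scores in zip(*per_component)]


if __name__ == "__main__":
    poverty = [5.0, 9.0, 12.0, 20.0, 45.0]   # hypothetical tract percentages
    renters = [10.0, 30.0, 55.0, 70.0, 90.0]
    print(htc_scores([poverty, renters]))
```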
We have demonstrated above that the foreign born, and particularly recent-entry foreign born, tend to have characteristics that correlate with the items that enter into the hard to
count score. Do they also tend to live in areas that have high HTC scores? Figure five
presents a simple graphical analysis, with a linear regression (weighted by population size
in each tract) fit overlaid onto the point cloud. Each point is a census tract.
-- Figure five about here --
As can be seen, there appears to be a correlation between the percent foreign born who have
entered within the last ten years and the hard to count scores. And, paradoxically, when a linear regression is fit, for tracts with high levels of recent-entry foreign born, beginning at
about 80% of total population, the linear fit begins to make predictions that are
impossibly high (that is, above 132).
This simple graphical finding suggests that there is residual variance to be explained that is
not captured in the components included in the score itself. To test this hypothesis, we run
a regression of HTC score on the twelve components that enter into that score, with the addition of one variable: percent of total population that is foreign born and has a year of
entry within the last ten years. The result of this regression is presented in table three.
-- Insert table three about here --
As can be seen, and not surprisingly, the components of the HTC score tend to predict that
score. Additionally, holding all other components of the score constant, every one-percentage-point increase in the share of recent-entry foreign born in a census tract is associated with an
expected .475 point increase in the HTC score.
Demographic Characteristics of Individual Foreign Born Persons, Particularly Recent Arrivals, Correlate With Hard to Count Indices
In previous sections, we have dealt with aggregate data at the state and tract levels. First,
we have shown that coverage ratios tend to decline as the percentage of the population who are recent arrival foreign born increases. Second, we have shown that hard to count scores
for tracts tend to increase as the percentage of the population who are recent arrival foreign
born increases, and that this relationship remains even when the components of the score
itself are used to predict it.
In each of the preceding sections we have noted the potential for the ecological fallacy. In
this section we use microdata from the 2007 American Community Survey Public Use
Microdata Sample to examine whether the components that correlate with net undercoverage are concentrated among individuals who are foreign born, in particular
recent arrivals. We use the characteristics of individuals and households identified by the
Census Bureau's planning database (Robinson and Bruce, 2007) as predictive of low return rates and hard to count characteristics.
Our analysis is simply framed. We have partitioned the population into four groups:
native born (including those born abroad of American parents); older foreign born (foreign-born persons whose year of entry is more than 10 years before the survey date, i.e.,
before 1997); recent foreign born (foreign-born persons whose year of entry is 6-10
years before the survey date, 1997-2001); and very-recent foreign born (foreign-born
persons whose year of entry is within five years of the survey date, 2002 or later). We proceeded by comparing whether these four groups differ on each of the individual hard to
count characteristics. Recall that the characteristics identified in the hard to count score are: vacant housing units, housing units that are not a single-family structure, renter-occupied housing units, crowded occupied units, families not in a husband/wife
configuration, occupied housing units with no phone service, persons over 25 who have not graduated from high school, persons below the poverty level, persons receiving public
assistance, persons aged 16 and over who are unemployed17, households that are
linguistically isolated, and occupied housing units where the owner moved in within
1999-2000. With the exception of vacancy (which does not apply), we look at each individually, rather than in a multivariate context, performing simple difference of
proportions (percentages) tests (Agresti, 1990) to assess statistical significance. In each
case our null hypotheses are stated simply:
H0(1): The percentage of persons who exhibit the hard to count characteristic does not
differ between native born and non-recent foreign born.
H0(2): The percentage of persons who exhibit the hard to count characteristic does not
differ between native born and recently arrived foreign born.
H0(3): The percentage of persons who exhibit the hard to count characteristic does not
differ between native born and very recently arrived foreign born.
Our alternative hypotheses are the negations of these null hypotheses. We use 90%
two-tailed confidence intervals; thus, if the confidence intervals of the two comparison
groups do not overlap, we reject the null hypothesis; otherwise, we retain it. Table four contains these tests.
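The decision rule above can be sketched with normal-approximation (Wald) intervals. This is a simplified illustration that treats the data as a simple random sample; standard errors from ACS PUMS would normally account for the survey design (for example, through replicate weights), which this sketch does not do. For comparison it also computes a direct pooled two-sample z test: requiring non-overlapping 90% intervals is more conservative than testing the difference itself at the 10% level. The counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(successes: int, n: int, level: float = 0.90):
    """Normal-approximation confidence interval for a proportion."""
    p = successes / n
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


def intervals_overlap(a, b) -> bool:
    """True if the two (low, high) intervals share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


def two_proportion_z(x1: int, n1: int, x2: int, n2: int):
    """Pooled two-sample z statistic and two-sided p-value."""
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    return z, 2 * (1 - NormalDist().cdf(abs(z)))


if __name__ == "__main__":
    native = wald_interval(3_100, 10_000)      # hypothetical: 31% renters
    very_recent = wald_interval(1_380, 2_000)  # hypothetical: 69% renters
    reject = not intervals_overlap(native, very_recent)
    print(native, very_recent, "reject H0" if reject else "retain H0")
    print(two_proportion_z(3_100, 10_000, 1_380, 2_000))
```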
17 Note that persons who are out of the labor force are not considered unemployed in this definition;
thus this percentage does not correspond to the traditional unemployment rate definition.
-- Insert table four about here --
We summarize the conclusions from this table:
1. The foreign born, particularly recent arrivals, are more likely to live in non-single-unit houses;
2. The foreign born, particularly recent arrivals, are more likely to be renters rather than owners;
3. The recently arrived foreign born are more likely to live in crowded housing units;
4. The recently arrived foreign born are more likely to live in non-husband/wife families;
5. The recently arrived foreign born are more likely to live in houses without a telephone;
6. The foreign born (age 25+) are more likely to not have graduated from High School;
7. The recently arrived foreign born are more likely to live in households below the poverty line; the non-recently-arrived foreign born are less likely to live in households below the poverty line;
8. The non-recently-arrived foreign born are more likely to be receiving public assistance; the recent foreign born are less likely to be receiving public assistance;
9. The recently arrived foreign born are more likely to be unemployed;
10. The foreign born, particularly recent arrivals, are more likely to live in linguistically isolated households;
11. The recently arrived foreign born are more likely to be recent movers; the non-recently-arrived foreign born are less likely to be recent movers.
In sum, of the eleven characteristics that are considered to make a household hard to
count, the recently-arrived foreign born are more likely to exhibit ten of them, and less
likely to exhibit one (public assistance receipt). The non-recently-arrived foreign born are more likely to exhibit six of these characteristics; for three characteristics the native born and non-recently-arrived foreign born are statistically indistinguishable; and for two characteristics
the non-recently-arrived foreign born are less likely to exhibit the characteristic.
What Can be Done to Measure Coverage of the Foreign Born?
Despite the stated importance of this topic to many agencies (Government Accountability
Office, 1998:57-58), there have been few attempts to measure net undercoverage of the foreign born. This section details the alternatives, with appropriate caveats.
Coverage Measurement Alternatives: Summary
Coverage measurement is a difficult topic to summarize, but we shall attempt a brief
summary here. Table five describes the coverage measurement approaches and their
implied coverage ratios. One can imagine defining coverage as "doing it better," that is, determining what enumeration should have occurred. In this case the coverage ratio is the
enumeration divided by the truth (as determined by the better system). Dual system
estimation is based on "doing it independently." In this case the coverage ratio is the
enumeration divided by the independence model. The remaining methods, Reverse record
check, Megalist/Benchmark, and Demographic accounting, each assume that the traced sample, the megalist or benchmark list, or the demographic summation, respectively, determines
the coverage ratio (Popoff and Judson 2004: 634 list these five alternatives; Judson, 2006
discusses coverage ratios.)
-- Insert table five about here --
Post Enumeration Surveys/Dual System Estimation
A standard for evaluating coverage in a census is the dual system estimation (DSE) method using a post enumeration survey. (See Popoff and Judson, 2004:633-637 for a
summary.)
The DSE method has a long history; see, e.g., Chandrasekaran and Deming, 1949; Marks,
Seltzer and Krotki, 1974; Wolter, 1986; Hogan, 1992, 1993 and 2000, for theory and examples of the method in practice. The 1950 Census was the first to use a post
enumeration program, and the method has been used in subsequent censuses with increasing technical sophistication. The DSE method is a microdata approach (focusing on
individual responses) rather than an aggregate approach (focusing on demographic
aggregates; Judson, 2006).
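In its simplest textbook form, dual system estimation is the capture-recapture (Lincoln-Petersen) estimator: if the census counts n1 people in a domain, an independent post enumeration survey counts n2, and m people are matched in both, the independence assumption gives an estimated total of n1 x n2 / m, and the coverage ratio is n1 divided by that estimate (equivalently, m / n2). The sketch below uses hypothetical counts and ignores the erroneous-enumeration, missing-data, and post-stratification adjustments that a production estimate such as the A.C.E. requires. Computing it for a nativity domain would require classifying every census and survey record by nativity.

```python
def dual_system_estimate(census_count: int, pes_count: int, matched: int) -> float:
    """Lincoln-Petersen estimate of the true total under independence."""
    if matched <= 0 or matched > min(census_count, pes_count):
        raise ValueError("matched must be positive and no larger than either count")
    return census_count * pes_count / matched


def dse_coverage_ratio(census_count: int, pes_count: int, matched: int) -> float:
    """Census count divided by the dual system estimate (equals matched / pes_count)."""
    return census_count / dual_system_estimate(census_count, pes_count, matched)


if __name__ == "__main__":
    # Hypothetical domain counts.
    n_census, n_pes, n_match = 9_000, 1_000, 880
    print(f"DSE: {dual_system_estimate(n_census, n_pes, n_match):,.0f}")
    print(f"coverage ratio: {dse_coverage_ratio(n_census, n_pes, n_match):.3f}")
```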
The 2000 Accuracy and Coverage Evaluation was based on the census short form.
Because of this, it was not possible to create a specific estimation domain for the foreign born. (As the nativity question is not asked on the short form, it is not possible to assign
individual respondents to that domain so as to construct coverage estimates of that
domain.) It has been asserted (e.g., Hogan, 2008) that technical and policy issues make it impossible to use dual system estimation to construct a coverage estimate specifically for
the foreign born.
The operative phrase here is, of course, "dual system estimation." Assuming that a dual system estimate is the gold standard, and noting that no legislative mandate exists to
specifically construct an estimate of coverage for the foreign born18, the question is moot:
without asking nativity on both the 2010 census and the CCM survey, it is not possible to classify individuals sufficiently to construct a full dual system estimate.
What is not spoken of is the possibility of some otherkind of estimate of coverage.19 It is
to these other kinds of estimates that we now turn.
18 Raising the question, of course: would there be any support for such a legislative mandate, that is, to specifically construct a dual system estimate, and a corresponding coverage correction factor, using nativity status as an estimation domain?
19 Suppose that we stipulate, for the sake of argument, that dual system estimation is the gold standard. Even so, that stipulation raises a further question: if we cannot generate a gold standard estimate, does that mean we should not attempt to produce some estimate, recognizing its caveats?
Demographic Benchmarking
A method that we will describe as demographic benchmarking has been presented in Pitkin and Park (2005) and proposed by Camarota (2006). For an appropriate target population, the demographic benchmarking method uses the highest-quality demographic data available to construct an estimate of that population in various aggregate quantities (e.g., age groups). These benchmarks are compared to census or survey results, and where the census or survey results differ from the demographic benchmarks, the difference is interpreted as net coverage error. The demographic benchmarking approach stands or falls on one particular assumption: that the benchmark is of sufficiently high quality to serve this role.
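The benchmarking comparison amounts to a group-by-group difference. A minimal sketch follows, with invented age groups and counts; it treats the benchmark as error-free, which is exactly the assumption on which the approach stands or falls.

```python
def benchmark_net_coverage(census_counts: dict, benchmark_counts: dict) -> dict:
    """Net coverage error rate by group, treating the benchmark as truth.

    Negative values mean the census or survey falls short of the benchmark
    (net undercoverage) for that group.
    """
    rates = {}
    for group, bench in benchmark_counts.items():
        if bench <= 0:
            raise ValueError(f"benchmark for {group!r} must be positive")
        rates[group] = (census_counts[group] - bench) / bench
    return rates


# Hypothetical benchmark (e.g., built from vital statistics) and census counts.
benchmark = {"0-4": 2000, "5-9": 1900, "10-14": 1850}
census = {"0-4": 1900, "5-9": 1881, "10-14": 1850}

for group, rate in benchmark_net_coverage(census, benchmark).items():
    print(group, round(rate, 3))
# 0-4 -0.05
# 5-9 -0.01
# 10-14 0.0
```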
What benchmarks have been used successfully? For the benchmark to be of sufficiently high quality, it must be measured with little or no error. Of the demographic statistics currently available, only two fully qualify: vital statistics on births and vital statistics on deaths. (In the United States, Medicare enrollment data are a candidate, with relatively minor correction for historical underenrollment; Robinson et al., 2002.)
Demographic Analysis
An extension of the demographic benchmarking approach is what we shall call demographic analysis (which we alluded to earlier). What distinguishes demographic analysis from benchmarking is that it attempts to construct a complete estimate of the population of interest, rather than of particular segments of it. Like benchmarking, it is fundamentally an aggregate approach rather than a microdata approach. The use of demographic analysis to assess net coverage resembles the benchmarking approach in the following ways: for an appropriate target population, it uses the highest-quality demographic data available to construct an estimate of that population in various aggregate quantities (e.g., age groups). For less well-measured groups (e.g., immigrants, emigrants, unauthorized persons), demographically plausible models are constructed. The resulting demographic estimate is then compared to census or survey results, and the difference is interpreted as net coverage error.
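The core of demographic analysis is the demographic balancing equation, P1 = P0 + B - D + I - E. The sketch below is a simplified, single-period version with hypothetical numbers; actual demographic analysis works cohort by cohort. The comments mark which components are assumed to come from strong registration data and which from weaker models, and the loop shows how the implied net coverage error moves as the assumed unauthorized migration term changes.

```python
from dataclasses import dataclass


@dataclass
class Components:
    births: float              # registered births: strong data
    deaths: float              # registered deaths: strong data
    legal_immigration: float   # administrative records: moderate quality
    emigration: float          # modeled: weakly measured
    unauthorized_net: float    # modeled: weakly measured


def demographic_estimate(base_population: float, c: Components) -> float:
    """Balancing equation P1 = P0 + B - D + I - E, with immigration I split
    into legal and (net) unauthorized components."""
    return (base_population + c.births - c.deaths
            + c.legal_immigration + c.unauthorized_net - c.emigration)


# Hypothetical inputs, for illustration only.
base = 1000.0
census_count = 1020.0
for unauthorized in (5.0, 15.0, 25.0):
    comps = Components(births=50.0, deaths=30.0, legal_immigration=20.0,
                       emigration=10.0, unauthorized_net=unauthorized)
    estimate = demographic_estimate(base, comps)
    error = (census_count - estimate) / estimate  # negative = net undercount
    print(unauthorized, estimate, round(error, 4))
```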
The challenge of the demographic analysis approach is that it makes more assumptions than the benchmarking approach. The benchmarking approach can rely on the relative strength of its underlying data sources: vital statistics, housing, school enrollment, or employment data. Demographic analysis, in contrast, must rely both on these data sources and on weaker assumptions about the components of migration.
Direct enquiries
Marcelli and Ong (2002), after reviewing demographic analysis and dual system approaches (with an abbreviated discussion of synthetic estimation, to which we shall turn subsequently), propose direct enquiry as a method of estimating census undercoverage. This approach is microdata oriented, in that respondents in households are asked directly: "Was [this person enumerated in the household] included in the 2000 questionnaire sent to the Census Department?" Thus, for an appropriate target population, a list of persons in that target population is constructed, adjusting for survey nonresponse. Within that target population, a direct enquiry as to whether each person was captured in census records is
performed; where a respondent indicates that a person was not reported on the census list, the report is interpreted as a census coverage error, and the gross undercoverage rate is calculated from these data.
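The gross undercoverage rate in a direct enquiry is a weighted share. The sketch below assumes each listed person carries a survey weight that has already been adjusted for nonresponse, and that the respondent's answer is taken at face value; it does nothing about response bias. The record structure and values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ListedPerson:
    weight: float             # survey weight, assumed already nonresponse-adjusted
    reported_in_census: bool  # respondent's answer to the direct enquiry


def gross_undercoverage_rate(people: list[ListedPerson]) -> float:
    """Weighted share of listed persons reported as NOT included in the census.

    Answers are taken at face value; no correction for misreporting is made.
    """
    total = sum(p.weight for p in people)
    if total <= 0:
        raise ValueError("total weight must be positive")
    missed = sum(p.weight for p in people if not p.reported_in_census)
    return missed / total


# Hypothetical household roster from a direct-enquiry survey.
roster = [
    ListedPerson(weight=1.5, reported_in_census=True),
    ListedPerson(weight=1.5, reported_in_census=True),
    ListedPerson(weight=1.0, reported_in_census=True),
    ListedPerson(weight=1.0, reported_in_census=False),
]
print(gross_undercoverage_rate(roster))  # 0.2
```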
Obviously, the question about reporting to the census has the potential to be sensitive, presumably more so for foreign-born persons of unauthorized or ambiguous legal status. Because the question is sensitive, it is easy to imagine that some form of social desirability bias will affect the respondent's answer.
Marcelli and Ong, while recognizing this criticism, defend the approach as follows: they work directly with the local population of interest; they use interviewers who are as non-threatening as possible; and they ask questions in the vernacular, with native speakers. All of these measures are designed to reduce response error due to respondent fear or resistance.
One-Way Record Linking
A microdata approach for estimating coverage error was tested by Heer and Passel (1987) in the Los Angeles metropolitan area. We will refer to this method as one-way record linking. For an appropriate target population, a list of persons in that target population, with appropriate individually identifying information (e.g., full name, full date of birth, geographic locators), is constructed. This list is linked with the decennial enumeration list; where a record fails to link to the census list, the nonlink is interpreted as a census coverage error, and the gross undercoverage rate is calculated from these data.
The key to this approach is the assumption that the list for the target population is complete, and therefore that any difference between the list and the census enumeration must be gross census undercoverage. A weakness of the approach is its assumption that neither list contains gross overcoverage. Privacy concerns about this use of records might arise as well.
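A one-way linkage reduces to asking, for each record on the target list, whether a matching census record exists. The sketch below uses exact matching on a normalized key (name, full date of birth, and a geographic locator); that key and the helper names are hypothetical simplifications, since an actual study would more likely use probabilistic record linkage. The resulting nonlink rate can be read as gross undercoverage only under the assumptions just described, and only if the linkage itself makes no errors.

```python
def link_key(name: str, dob: str, locator: str) -> tuple[str, str, str]:
    """Hypothetical exact-match key: case- and whitespace-normalized name,
    full date of birth (ISO string), and an upper-cased geographic locator."""
    return (" ".join(name.lower().split()), dob.strip(), locator.strip().upper())


def nonlink_rate(target_list, census_list) -> float:
    """Share of target-list records with no matching census record."""
    if not target_list:
        raise ValueError("target list is empty")
    census_keys = {link_key(*rec) for rec in census_list}
    nonlinks = sum(1 for rec in target_list if link_key(*rec) not in census_keys)
    return nonlinks / len(target_list)


# Hypothetical records: (name, date of birth, locator).
target = [
    ("Ana Lopez", "1970-02-01", "t101"),
    ("Ben Okafor", "1982-11-30", "t102"),
    ("Chen Wei", "1990-05-17", "t101"),
    ("Dana Ruiz", "1965-08-09", "t103"),
]
census = [
    ("ANA  LOPEZ", "1970-02-01", "T101"),  # links after normalization
    ("Ben Okafor", "1982-11-30", "T102"),
    ("Chen Wei", "1990-05-17", "T101"),
]
print(nonlink_rate(target, census))  # 0.25
```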
Synthetic Estimation
Synthetic methods for estimating coverage combine information from a coverage evaluation survey with demographic characteristics of the population of interest.20 The synthetic approach also has a long history (e.g., Gonzales, 1978) and makes use of both direct and indirect information. In fact, the intended application of the dual system estimator in the 2000 A.C.E. would itself have used synthetic estimation methods for nonsampled census areas.
20 This is a specific application of synthetic estimation in general, in which survey estimates of some specific characteristic are combined with demographic characteristics to construct the final estimand.
In the intended A.C.E. 2000 application, nonsampled census blocks would have been treated as an estimation domain to which coverage correction factors would have been applied synthetically. (A description of the application of the synthetic method can be found in Hogan, 2000, or Judson and Popoff, 2004.) The innovation proposed here is to treat the foreign-born population, or some subset of it (such as recent arrivals), as an estimation domain, and an estimate of the unauthorized foreign-born population as a separate estimation domain, in place of nonsampled census blocks, and to apply the dual system coverage estimates to them. This would require tabulating the foreign-born population enumerated in Census 2000, allocating the foreign-born respondents to the appropriate A.C.E. Revision II post-strata, determining the proportion of the foreign-born population that falls into each post-stratum, constructing the coverage factors from the A.C.E. Revision II post-strata, and calculating a synthetic estimate of the net coverage factor for the foreign-born population as a whole.
The strength of this synthetic approach is that it would use the best available statistically designed coverage data, rather than rely on demographic assumptions or on the assumption that one or more lists constitute a benchmark. A weakness is that it makes the implicit assumption that the foreign born have the same coverage factors as the population as a whole within a post-stratum (the synthetic assumption). If it is assumed that the foreign-born population has, at best, net coverage error equal to that of the population as a whole, then this method would generate an upper bound on the net coverage error rate.
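The synthetic calculation described above can be sketched in a few lines. Assume one already has foreign-born census counts by post-stratum and a coverage correction factor for each post-stratum (defined here, as an assumption, as the post-stratum's dual system estimate divided by its census count). The synthetic estimate applies each factor to the foreign-born count in that post-stratum and sums. The post-stratum labels and all numbers are hypothetical, not A.C.E. Revision II values.

```python
def synthetic_estimate(domain_counts: dict, correction_factors: dict) -> tuple[float, float]:
    """Synthetic estimate for a domain (e.g., the foreign born).

    domain_counts:      census count of the domain in each post-stratum
    correction_factors: post-stratum coverage correction factor
                        (assumed here to be DSE / census count)

    Returns (estimated true domain total, implied net coverage factor).
    Relies on the synthetic assumption: within a post-stratum, the domain
    has the same coverage as everyone else in that post-stratum.
    """
    enumerated = sum(domain_counts.values())
    estimated = sum(n * correction_factors[h] for h, n in domain_counts.items())
    return estimated, estimated / enumerated


# Hypothetical foreign-born counts and post-stratum correction factors.
fb_counts = {"renter_18_29": 400.0, "renter_30_49": 350.0, "owner_30_49": 250.0}
factors = {"renter_18_29": 1.05, "renter_30_49": 1.03, "owner_30_49": 1.00}

total, factor = synthetic_estimate(fb_counts, factors)
print(round(total, 1), round(factor, 4))  # 1030.5 1.0305
```

Under the assumption in the text that the foreign born are covered no better than others in the same post-stratum, the true correction factor for the foreign born would be at least as large as this synthetic value.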
Research Proposals
We have argued in this paper that assessing the potential undercoverage of the foreign-born population is an important task for the statistical community. We have presented ethnographic, survey, and demographic evidence for such undercoverage. We have presented new findings that suggest, but do not prove, that the foreign-born population might have differential undercoverage both in the census and in ongoing surveys. We have shown that population estimates of public policy significance are highly sensitive to the assumed undercoverage rate.
Given these findings, it is natural to conclude that the statistical community has a responsibility to find a way to estimate that number: the potential net undercoverage of the foreign born. We now summarize a sequence of research tasks to approach that number, beginning with the easiest to implement and proceeding to more difficult approaches.
The first method on our list is demographic benchmark studies. Pitkin and Park (2005) demonstrated that birth registry data, combined with reasonable demographic assumptions, can be used to construct a benchmark population estimate from which an estimate of coverage could be derived. Because Pitkin and Park's method is based on birth registry data, it does not suffer the weaknesses of the broader demographic analysis approach: it requires fewer difficult-to-maintain demographic assumptions. This approach would provide a net coverage error estimate.
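One way to picture a birth-registry benchmark is a simple cohort-survival calculation: registered births for a cohort, multiplied by the probability of surviving to the census age, plus an assumed net migration term, give the expected number of cohort members at census time. This is a generic sketch, not a reproduction of Pitkin and Park's (2005) procedure; the survival probability, migration term, and counts are hypothetical.

```python
def birth_cohort_benchmark(births: float, survival_prob: float,
                           net_migration: float = 0.0) -> float:
    """Expected cohort size at census time from registered births.

    survival_prob: assumed probability of surviving from birth to the census age
    net_migration: assumed net migration of the cohort since birth
    """
    if not 0.0 < survival_prob <= 1.0:
        raise ValueError("survival probability must be in (0, 1]")
    return births * survival_prob + net_migration


# Hypothetical cohort: 10,000 registered births, 99 percent survival,
# no net migration, and 9,405 cohort members counted in the census.
benchmark = birth_cohort_benchmark(births=10_000, survival_prob=0.99)
census_count = 9_405
print(round(benchmark, 1), round(census_count / benchmark, 3))  # 9900.0 0.95
```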
The second method on our list is synthetic estimation. Assuming that no congressional mandate for a coverage estimate by nativity is promulgated, it is possible to cross-classify the foreign born on characteristics evaluated in the A.C.E. in 2000 (or those to be evaluated in Census Coverage Measurement in 2010). Using such a cross-classification and published coverage correction factors, deriving a synthetic estimate is a straightforward calculation. This approach would provide a net coverage error estimate.
The third method on our list is developing a direct survey. Marcelli and Ong (2002) have demonstrated that direct enquiry, combined with demographic analysis, can begin to make headway in understanding the potential gross undercoverage of the foreign born. Following Marcelli's arguments, it would appear that such a survey would best be fielded by a trusted, non-governmental entity and designed from the bottom up to allay respondents' privacy concerns. This approach, as with one-way record linkage, would not provide information on gross overcoverage.
The fourth method on our list is a one-way record linkage study. We have described above the technical limitations of one-way record linkage, noting in particular that the assessment of coverage is biased by the presence of record linkage error. However, it appears to us that results from such a study could be adjusted for the presence of such error (e.g., Judson, 2007: 497), yielding at least some direct information about coverage of the foreign born. While this approach would provide information on gross undercoverage, it would not provide information on gross overcoverage.
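One standard way to adjust an observed nonlink rate for linkage error is a misclassification correction. Suppose f is the probability that a truly enumerated person fails to link (a false nonmatch) and g is the probability that a person truly missed by the census links anyway (a false match). The observed nonlink rate is then r = u(1 - g) + (1 - u)f, where u is the true gross undercoverage rate, which solves to u = (r - f)/(1 - f - g). Whether this is the adjustment Judson (2007: 497) has in mind is an assumption; f and g would themselves have to be estimated (for example, from a clerically reviewed subsample), and the values below are hypothetical.

```python
def corrected_undercoverage(observed_nonlink: float,
                            false_nonmatch: float,
                            false_match: float) -> float:
    """Invert r = u(1 - g) + (1 - u) f for the true gross undercoverage rate u.

    observed_nonlink: r, share of target-list records that failed to link
    false_nonmatch:   f, P(no link | person was enumerated)
    false_match:      g, P(link | person was missed)
    """
    denom = 1.0 - false_nonmatch - false_match
    if denom <= 0:
        raise ValueError("linkage error rates too large to correct (f + g >= 1)")
    u = (observed_nonlink - false_nonmatch) / denom
    # Sampling noise can push the estimate outside [0, 1]; clip it.
    return min(max(u, 0.0), 1.0)


# Hypothetical: 14.3 percent of records fail to link, with f = 0.05 and g = 0.02.
print(round(corrected_undercoverage(0.143, 0.05, 0.02), 4))  # 0.1
```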
The fifth method involves testing the feasibility of a post-enumeration survey in the context of the American Community Survey. In accord with Hogan (2008), Census 2010 will not have a nativity question, so a dual system estimate using nativity as an estimation domain is not possible. However, the American Community Survey does include a nativity question. With the development of appropriate statistical theory to account for the ACS's complex sample design, a post-ACS survey, designed along lines similar to the existing Census Coverage Measurement system, would provide coverage measurements by nativity (and, presumably, by other relevant characteristics).
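The reason a nativity item is needed in both systems becomes clear if dual system estimation is run separately within each nativity domain: every census and coverage-survey person must be assigned to a domain before matched and unmatched cases can be counted there. The sketch below is a hypothetical illustration with invented counts; it ignores the ACS's complex sample design, which, as noted above, would require further statistical theory.

```python
def dual_system_estimate(census_correct: float, pes_total: float, matched: float) -> float:
    """Lincoln-Petersen estimate within one domain (independence assumed)."""
    if matched <= 0:
        raise ValueError("at least one match is required in each domain")
    return census_correct * pes_total / matched


# Hypothetical counts by nativity domain. Each domain needs nativity recorded
# in BOTH systems, so that persons can be classified before matching.
domains = {
    # domain: (census count, correct enumerations, coverage-survey total, matched)
    "native_born": (920, 900, 100, 90),
    "foreign_born": (165, 160, 40, 32),
}

for name, (count, correct, survey_total, matched) in domains.items():
    n_hat = dual_system_estimate(correct, survey_total, matched)
    print(name, n_hat, round(count / n_hat, 3))
# native_born 1000.0 0.92
# foreign_born 200.0 0.825
```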
Our sixth and final method involves testing nativity questions in a CCM framework for post-2010 purposes. The sensitivity of nativity questions in a CCM framework might bias the dual system estimator by inducing correlation bias among the foreign-born population. While this is a reasonable concern, it warrants empirical testing. If, in fact, the impact of a nativity question is negligible, and a congressional mandate for such estimation were in place, then there would be no reason not to apply the gold standard coverage measurement technique to develop a statistically principled estimate of the net coverage of the foreign born.
References
Agresti, A. 1990. Categorical Data Analysis. New York: Wiley.
Anderson, M. J., and Fienberg, S. E. 1999. Who Counts? The Politics of Census-Taking in Contemporary America. New York, NY: Russell Sage Foundation.
Arriaga, E.E., Johnson, P.D., and Jamison, E. 1994. Population Analysis with Microcomputers, Vol. 1 and 2. Washington, D.C.: U.S. Department of Commerce, U.S. Census Bureau, International Programs Center.
Bill, W. 2002. A.C.E. Revision II: Calculating Aggregate Data Defined, Correct
Enumeration, and Census Inclusion Rates (For Groups that Involve Aggregation Across Post-Strata). Online: http://www.census.gov/dmd/www/pdf/pp-40r.pdf.
Bruce, A., and Robinson, J. G. 2003. The Planning Database: Its Development and Use as an Effective Targeting Tool in Census 2000. Paper presented at the Annual Meetings of
the Southern Demographic Association, Arlington, VA, October 24, 2003.
Bruce, A., Robinson, J. G., and Sanders, M. V. 2001. Hard-to-Count Scores and Broad Demographic Groups Associated with Patterns of Response Rates in Census 2000.
Proceedings of the Social Statistics Section, American Statistical Association.
Camarota, S. and Capizzano, J. 2004. Assessing the Quality of Data Collected on the
Foreign Born: An Evaluation of the American Community Survey (ACS). Online:
http://www.sabresystems.com/whitepapers/CIS_whitepaper.pdf.
Camarota, S. 2006. Assessing the Quality of Data Collected on the Foreign Born:
An Evaluation of the American Community Survey (ACS). Paper presented at the U.S.
Census Bureau Conference, Immigration Statistics: Methodology and Data Quality. Alexandria, VA, February 13-14, 2006.
Cantwell, P.J., Hogan, H., and Styles, K.M. 2004. The Use of Statistical Methods in the U.S. Census: Utah v. Evans. The American Statistician, 58: 203-212.
Chandrasekar, C., and Deming, W.E. 1949. On a Method of Estimating Birth and Death Rates and the Extent of Registration. Journal of the American Statistical Association, 44:
101-115.
Clark, W.A.V., and Patel, S. 2004. Residential Choices of the Newly Arrived Foreign-Born: Spatial Patterns and the Implications for Assimilation. California Center for
Population Research On-Line Working Paper Series, CCPR-026-04, February 2004.
Darga, K. 2000. Fixing the Census Until it Breaks: An Assessment of the Undercount
Adjustment Puzzle. Lansing, MI: Michigan Information Center.
De la Puente, M. 1993. Why Are People Missed Or Erroneously Included By The
Census: A Summary Of Findings From Ethnographic Coverage Reports. Research Conference on Undercounted Ethnic Populations. Richmond, VA: U.S. Census Bureau.
Deardorff, K., and Blumerman, L. 2001. Appendix A: Estimates of the Foreign-Born Population by Migrant Status: 2000. In Robinson, J.G. 2001. ESCAP II: Demographic
Analysis Results. Executive Steering Committee for A.C.E. Policy II, Report No. 1,
October 13, 2001. Online: http://www.census.gov/dmd/www/pdf/Report1.PDF.
Ellis, Y. 1995. Examination of Census Omission and Erroneous Enumeration Based on
1990 Ethnographic Studies of Census Coverage. Pp. 515-520 in Proceedings of the American Statistical Association (Survey Research Methods Section). Alexandria, VA: American Statistical Association.
Ewbank, D.C. 1981. Age Misreporting and Age-Selective Underenumeration: Sources,
Patterns, and Consequences for Demographic Analysis. Washington, D.C.: National Academy Press.
Fay, R. E. 2001. The 2000 Housing Unit Duplication Operations and Their Effect on The
Accuracy Of The Population Count. Paper presented at the Annual Meeting of the
American Statistical Association, Atlanta, Georgia, August 5-9, 2001.
Fein, D. J. 1990. Racial and ethnic differences in U.S. census omission rates.
Demography, 27:285-302.
Fernandez, E.W., and Robinson, J. G. 1994. Illustrative Ranges of the Distribution of Undocumented Immigrants by State. Technical Working Paper No. 8, October 1994.
Online: http://www.census.gov/population/www/documentation/twps0008/twps0008.html.
Government Accountability Office 1998a. Decennial Census: Overview of Historical
Census Issues. GAO/GGD-98-103. Washington D.C.: U.S. Government Accountability Office.
Government Accountability Office 1998b. Immigration Statistics: Information Gaps,
Quality Issues Limit Utility of Federal Data to Policymakers. GAO/GGD-98-164. Washington D.C.: U.S. Government Accountability Office.
Hoefer, M., Rytina, N., and Baker, B. 2008. Estimates of the Unauthorized Immigrant Population Residing in the United States: January 2007. Online:
http://www.dhs.gov/xlibrary/assets/statistics/publications/ois_ill_pe_2007.pdf.
Hogan, H. 1992. The 1990 Post-Enumeration Survey: An Overview. The American Statistician, 46: 261-269.
Hogan, H. 1993. The 1990 Post-Enumeration Survey: Operations and Results. Journal of the American Statistical Association, 88: 1047-1060.
Hogan, H. 2000. Accuracy and Coverage Evaluation: Theory and Application. Paper
presented at the 2000 Joint Statistical Meetings, Indianapolis, Indiana, August 2-5, 2000.
Hogan, H. 2008. Letter to Ms. Judith Droitcour, Assistant Director, Applied Research and
Methods, U.S. Government Accountability Office. Dated February 26, 2008.
Jones, J. 2003. Housing Unit Duplication in Census 2000. Census Bureau Evaluation
O.10. Washington, DC: U.S. Census Bureau. Online:
http://www.census.gov/pred/www/rpts/O.10.PDF.
Judson, D.H. 2006. Demographic Coverage Measurement: Can Information Integration
Theory Help? Paper presented at the 2006 Joint Statistical Meetings, Seattle, WA, August
6-10, 2006.
Judson, D.H. 2007. Information integration for constructing social statistics: history,
theory and ideas towards a research programme. Journal of the Royal Statisti