8/2/2019 Coverage of FB 3-30-09 - V2 - DHJ


DRAFT DO NOT CITE OR QUOTE

Coverage of the Foreign-Born Population in Censuses and Surveys: What Do We Think, What Do We Know, and What Can We Prove?

By

Dean H. Judson1

Abstract

It is widely assumed that the foreign-born population in general, and the unauthorized

foreign-born population in particular, are not captured in surveys and the decennial Census

as well as the native-born population. Many different estimates have been proffered, all

with significant limitations. This paper attempts to summarize research on this coverage

question. We first describe the methods used to correct net coverage error in censuses and

surveys, and discuss the effect of net coverage error on censuses, surveys, and derived

products. Then we describe the history of various coverage assertions or assumptions used

in the literature (what do we think?). We then assess their relative merit (what do we

know?). Finally, we attempt to make an assessment as to what is provable about coverage

of the foreign-born (what can we prove?). We conclude by suggesting a research agenda

that would address this coverage question in a statistically-principled way.

Keywords: foreign-born coverage, coverage measurement, coverage correction,

information integration, the coverage question

1 This report was completed while the author was Senior Statistician for the Office of Immigration

Statistics, and is released to inform interested parties of ongoing research and to encourage discussion of

work in progress. The views expressed on statistical and methodological issues are those of the author and

not necessarily those of the Office of Immigration Statistics or Department of Homeland Security.



Introduction: the Coverage Question
Nonsampling Error
Net Coverage Error
Why is Net Coverage Error Important?
Correcting for Net Coverage Error in Censuses and Surveys
The Census: Count Imputation and Post-Enumeration Survey Coverage Correction Factors
The Effect of Net Coverage Error on the Census
Surveys: Weights and Population Controls
The Effect of Net Coverage Error on Survey Estimates
CPS
ACS
Other Surveys
Adjustments for Potential Net Coverage Error
What has been Asserted about Coverage of the Foreign Born?
What is Provable About Coverage of the Foreign Born?
Ethnographic Studies Suggest that the Foreign Born Avoid Detection
The Foreign Born Tend to Respond Later to Surveys
States with Higher Levels of Foreign Born, Particularly Unauthorized or Recent Arrivals, Tend to Have Lower Coverage Ratios
The Foreign Born, Particularly Recent Arrivals, Tend to Live in Areas with Higher Hard-to-Count Scores
Demographic Characteristics of Individual Foreign Born Persons, Particularly Recent Arrivals, Correlate With Hard to Count Indices
What Can be Done to Measure Coverage of the Foreign Born?
Coverage Measurement Alternatives: Summary
Post Enumeration Surveys/Dual System Estimation
Demographic Benchmarking
Demographic Analysis
Direct enquiries
One-Way Record Linking
Synthetic Estimation
Research Proposals
References
Tables and Figures





Introduction: the Coverage Question

When conducting a census or survey, it is important that the operations of the census or survey properly represent the target population of interest. In general, the relationship between the enumerated/sampled group and the target population is referred to as the coverage of that population. In the absence of proper coverage, there is coverage error, and the survey estimates or census enumeration are biased with respect to the population. This fact is especially important for derived estimates, such as the unauthorized population, which are difficult in any case due to data gaps.

The foreign-born population of the United States is estimated to consist of about 38 million people, 12.6% of the total population of about 302,000,000 as of 2007 (U.S. Census Bureau, American FactFinder, 2008). It is widely assumed that, for various reasons, the foreign-born population is particularly likely to be subject to coverage error (Marcelli and Ong, 2002; Camarota, 2006) in surveys and the census. If true, this would have wide-ranging implications for the census itself, ongoing surveys, and for estimates derived from them (including estimates of the naturalized, legal permanent resident, refugee/asylee and unauthorized subpopulations). We will refer to this question as "the coverage question," and we will repeatedly use the phrase "potential undercoverage" to emphasize the potential in the absence of hard proof.2

The purpose of this paper is to comprehensively address the coverage question, and

attempt to define an agenda to convert assumptions about foreign-born undercoverage into

proof about its existence and magnitude. We will do so in the following steps:

First, we will address the sources of error in surveys and the census;

Second, we will review the known effects of coverage error on surveys and the

census, focusing on the foreign born where possible;

Third, we will summarize attempts or recommendations on how to try to measure coverage;

Fourth, we will assess which measurements of coverage are most viable; and

Finally, we will suggest a research agenda to definitively attack the coverage question.

In what follows, we will be most careful to distinguish our knowledge in this area between what an individual researcher asserts to be true (what do we think?), what the community of researchers generally agrees on (what do we know?), and concrete findings in this area (what can we prove?).3

2 This is not a new concern; cf. Siegel, 1976:15: "This report has tried to develop the view that it is not a practical goal to estimate directly the number of illegal aliens [sic] ... Thus, the effort to estimate the number of illegal aliens becomes principally an effort to measure the coverage of the total population by nativity" (italics added).

3 For the reader's interest, this three-part partition of knowledge is derived from the excellent movie, "And the Band Played On," based upon the book about the AIDS epidemic of the same name (Shilts, 1987).





Coverage error refers to a number of errors in addition to any sampling variability or other

nonsampling error. To more fully describe the term coverage error, we will begin with a

general discussion of sampling variability and nonsampling error in surveys and the census.

Sampling Variability

The sampling error of a sample survey can be measured in several ways. The first measure that is usually desired is the variance of the sample estimate. This is the average, over all possible samples, of the squared deviations of the estimates from their expected value. An estimate of the variance can be obtained from the sample survey data themselves. If there are nonsampling errors or the sample is biased, then the deviations are taken around the true value of the statistic and the measure is called the mean square error. Typically, the variance is denoted by $\sigma^2$ and the mean square error by MSE. Of these two measures, the MSE is more general, as illustrated in the formula for MSE. Suppose that $p$ is the value being estimated, and $\hat{p}$ is the estimator of $p$; then the MSE of $\hat{p}$ is given by:

\begin{align}
\mathrm{MSE}(\hat{p}) &= E\left[(\hat{p} - p)^2\right] = E\left[\left(\hat{p} - E(\hat{p}) + E(\hat{p}) - p\right)^2\right] \\
&= E\left[\left(\hat{p} - E(\hat{p})\right)^2\right] + \left(E(\hat{p}) - p\right)^2 \\
&= \operatorname{var}(\hat{p}) + \left(\operatorname{bias}(\hat{p})\right)^2 \tag{1}
\end{align}

If $\hat{p}$ is unbiased, then the MSE is just the variance itself. In the presence of coverage error (or other kinds of nonsampling error), $\hat{p}$ is not unbiased, and its bias contributes to mean squared error.4
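The decomposition in equation (1) can be checked numerically. The following is a minimal sketch, not part of the original analysis: it assumes a hypothetical sampling frame whose proportion is shifted by `coverage_bias` relative to the true value, draws repeated samples, and compares the simulated MSE with the sum of the variance and the squared bias. The function names, the shift of -0.01, and the sample sizes are all illustrative assumptions.

```python
import random
import statistics


def simulate_estimates(p_true, coverage_bias, n, reps, seed=0):
    """Draw `reps` samples of size `n` and return the estimated proportions.

    `coverage_bias` is a hypothetical shift: the frame actually sampled has
    proportion p_true + coverage_bias rather than p_true.
    """
    rng = random.Random(seed)
    p_frame = p_true + coverage_bias
    estimates = []
    for _ in range(reps):
        hits = sum(1 for _ in range(n) if rng.random() < p_frame)
        estimates.append(hits / n)
    return estimates


def mse_decomposition(estimates, p_true):
    """Return (mse, variance, bias) over the replicated estimates."""
    mean_est = statistics.fmean(estimates)
    variance = statistics.fmean([(e - mean_est) ** 2 for e in estimates])
    bias = mean_est - p_true
    mse = statistics.fmean([(e - p_true) ** 2 for e in estimates])
    return mse, variance, bias


estimates = simulate_estimates(p_true=0.126, coverage_bias=-0.01, n=500, reps=2000)
mse, variance, bias = mse_decomposition(estimates, p_true=0.126)
print(f"MSE={mse:.6f}  var={variance:.6f}  bias^2={bias ** 2:.6f}")
```

With the deviations taken around the simulated mean, the identity holds up to floating-point error; the point of the sketch is only that a frame that misses part of the population adds a bias term that a larger sample does not remove.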

Nonsampling Error

In addition to the error of the estimates caused by sampling variability, there is another component of the total error in demographic data. Nonsampling error characterizes all surveys, whether sampling is used or not, including 100% surveys known as censuses. This component arises from mistakes made in the process of eliciting, recording, and processing the response of an individual unit in the surveyed population. Every operation in a census or sample survey, and every factor within an operation, may contribute to nonsampling error. Lessler and Kalsbeek (1992:9) classify survey errors into four types:

4 Lohr (1999:256-258) develops a simple model illustrating the biasing effect of nonresponse. Letting $N_M$ be the number of nonrespondents in a survey, $N$ the total number sampled in the survey, $\bar{p}_{RU}$ the mean response for respondents, $\bar{p}_{MU}$ the mean response for nonrespondents, and $\bar{p}_{U}$ the mean response for the population as a whole, the bias induced by nonresponse is approximately $\frac{N_M}{N}\left(\bar{p}_{RU} - \bar{p}_{MU}\right)$. This bias is small only if $\frac{N_M}{N}$ is small (there is little nonresponse) or $\left(\bar{p}_{RU} - \bar{p}_{MU}\right)$ is small (responders aren't much different from nonresponders).





Frame errors: problems with developing lists of units to be sampled, duplication or omission of units, and the like;

Sampling errors: error or variance associated with sampling itself;

Nonresponse errors: errors associated with the failure to include a unit in the survey that should have been included, either through omission by the surveyor or nonresponse or refusal by the respondent; and

Measurement errors: errors associated with data collection operations, including the contribution of respondent, interviewer, and questionnaire to error.

Factors that are common to all of these operations, including training procedures and supervision, and technical-staff intervention associated with each operation, may also be sources of nonsampling error. Typically, those operations that are conducted in the office after collection of the data in the field are more amenable to control than are the field operations. It is also generally possible through study of the office operations themselves to measure errors introduced (by the operations) into the data that were collected in the field. Because nonsampling error made by the respondent, in interaction with the interviewer and the questionnaire, is more serious and less amenable to measurement than errors arising from other operations, nonsampling error is often called response error. A typical example of response error made by respondents is the tendency of persons in many countries to report their ages in years ending in zero and five (Ewbank, 1981). Often such response error requires special detection and smoothing methods; such methods are described in Arriaga, Johnson, and Jamison (1994), and Judson and Popoff (2004).

Net Coverage Error

Coverage error can take two forms: overcoverage and undercoverage. The former implies that a sampled unit occurs in the sample more than once. The latter implies that a sampled unit occurs in the sample less than once (that is, a unit that should be included is missed). Typically, the difference between overcoverage and undercoverage is referred to as net coverage error. Net coverage error can be positive, negative, exactly zero or sum to zero (in this latter case errors offset each other).

Furthermore, net coverage error can vary across specific geographic, economic or demographic subgroups. That is, one group could have a positive net coverage error and another group could have negative net coverage error; thus one group is overcovered and the other undercovered. This state of affairs is known as differential coverage error.
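A small worked example may help fix the distinction between net and differential coverage error. The sketch below uses invented counts for two hypothetical subgroups (they are not drawn from any census or survey): each group's net coverage error is its overcoverage minus its undercoverage, and the two groups offset each other exactly in the total.

```python
def net_coverage_error(erroneous_inclusions, omissions):
    """Net coverage error in persons: overcoverage minus undercoverage."""
    return erroneous_inclusions - omissions


# Hypothetical counts for two subgroups (illustrative only, not real data).
groups = {
    "group_a": {"erroneous_inclusions": 120, "omissions": 80},
    "group_b": {"erroneous_inclusions": 40, "omissions": 80},
}

net_by_group = {name: net_coverage_error(**counts) for name, counts in groups.items()}
total_net = sum(net_by_group.values())

print(net_by_group)  # group_a is overcovered, group_b is undercovered
print(total_net)     # the errors offset in the total
```

Here the total net coverage error is zero even though neither group is covered correctly, which is why a coverage measure for the population as a whole says little about any particular subgroup.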

At this point it is useful to discuss generally the causes of over- and undercoverage, before entering into the specific issues associated with the foreign born. Several authors (Fein, 1990; de la Puente, 1993; Anderson and Fienberg, 1999; Judson and Popoff, 2004) enumerate sources of coverage error, which we shall summarize.

A source of overcoverage found in Census 2000 was Master Address File frame erroneous enumeration and duplication (Jones, 2003). In the Census 2000 context, operations that created the Master Address File created some units that were later found to be erroneous enumerations, or created multiple records for the same address. (For example, the same physical location might have multiple ways to write the address, and the file would contain all of them.) Comparing demographic housing benchmarks with the 1999 Decennial Master Address File, the control file for subsequent operations, implied a 5.3% overcoverage of addresses in that file. This led to overenumeration when the address received multiple forms or in-person followups (Fay, 2001).

Similarly, overcoverage can be caused by accidental person duplication. For example, a census form might enumerate a student living away from home at college, while the student's parental household also enumerates that same student. As another example, persons in assisted-living situations might be enumerated at a group quarter and at a household. The Census Bureau estimates a lower-bound figure of 5.8 million such person duplicates in Census 2000 (Mule and Fenstermaker, 2003).

By contrast, undercoverage comes from many sources. The Census Bureau enumerates many of them: High mobility; Rentership; Language barriers; Neighborhood resistance; Irregular housing; Non-standard living arrangements; Loose attachment to a particular household; Concerns regarding confidentiality (de la Puente, 1993; Darga, 2000; Camarota, 2006). To this list might be added: Active desire for concealment (Tourangeau, et al., 1997; Ellis, 1995; Valentine and Valentine, 1971).

Why is Net Coverage Error Important?

When estimates are produced by a census or survey, typically these estimates are not derived directly from the raw survey or census responses. This means that, even with extensive operational quality control and intensive effort, the data collection does not yet represent the population.

As Shapiro and Kostanich (1988:443) state: "we believe there is not general awareness of the deleterious effects of response error, and it is rarely estimated. In poorly designed and conducted household surveys, there can be many serious problems. In even the best household surveys, however, undercoverage and response error tend to be high and, in our opinion, are the two most important problems in the sample survey field." (Italics added.)

When faced with this situation, there are a variety of processing steps and adjustments that can be applied to the data to attempt to make it represent the population of interest. (These steps are different for a 100% enumeration and a sample enumeration, so we will address them separately.) Typically, the goal is to represent the population as a whole, not necessarily any particular subgroup. Therefore, the question that we will address is: do these adjustments correct for potential undercoverage of the foreign-born population, and if they do not, how much undercoverage remains?

Correcting for Net Coverage Error in Censuses and Surveys

In this section we will describe attempts to correct for net coverage error in the census and in surveys. The approaches are slightly different due to their different data collection context (100% enumeration versus sample selection). One area which we will not consider is the area of item imputation, where item nonresponse (failure to collect one or more data elements from a survey/census form) requires item imputation. We will, however, consider unit imputation caused by unit nonresponse (where entire sample/enumeration units are missed).

The Census: Count Imputation and Post-Enumeration Survey Coverage Correction Factors

In the census context, extensive efforts are made to execute a complete enumeration: these include advertising campaigns, school presentations, state and local partnership programs, careful organization, quality control checks, multiple language questionnaires, querying neighbors, and multiple nonresponse followups, to name a few.

However, despite all efforts, for a small fraction of housing units (about 1.4% of housing units in Census 2000; Zajac, 2003), nothing is known about the composition of the housing unit, even from a neighbor. In this case imputation is used.5 Imputation typically takes two forms: status imputation and count imputation. Status imputation determines whether the nonresponding unit should be considered occupied or vacant; if occupied, count imputation determines how many persons should have been enumerated there, but were not. In Census 2000, both kinds of imputation were executed using the hot deck technique, which typically chooses a nearest neighbor to make the imputation.6 Item imputation is then used to fill out the characteristics of the imputed people.
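To make the nearest-neighbor idea concrete, here is a deliberately simplified sketch of hot deck count imputation. It is not the Census 2000 procedure: it assumes the units are already sorted so that adjacent entries are geographic neighbors, it uses only the nearest preceding responding unit as the donor, and it treats a donor count of zero as a vacant unit so that status and count are imputed in one step. The function name and the example records are hypothetical.

```python
def hot_deck_count_imputation(units):
    """Impute missing person counts from the nearest preceding respondent.

    `units` is a list of dicts assumed to be sorted so that adjacent
    entries are geographic neighbors; `persons` is None when nothing is
    known about the unit. A donor count of 0 stands for a vacant unit, so
    status and count are imputed together in this simplified sketch.
    """
    result = []
    last_donor_count = None
    for unit in units:
        record = dict(unit, imputed=False)
        if record["persons"] is None:
            if last_donor_count is not None:
                record["persons"] = last_donor_count
                record["imputed"] = True
        else:
            last_donor_count = record["persons"]
        result.append(record)
    return result


# Hypothetical housing units, listed in geographic order.
block = [
    {"unit_id": "A", "persons": 3},
    {"unit_id": "B", "persons": None},  # nonresponding unit
    {"unit_id": "C", "persons": 0},     # vacant unit
    {"unit_id": "D", "persons": None},  # nonresponding unit
]
imputed_block = hot_deck_count_imputation(block)
for record in imputed_block:
    print(record["unit_id"], record["persons"], record["imputed"])
```

The sketch makes the key assumption visible: a nonresponding unit is given the count of a nearby responding unit, so any systematic difference between responding and nonresponding households carries through to the imputed count.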

For Census 2000, an ambitious coverage correction program was developed, building on the earlier Post-enumeration survey (PES) and Post-enumeration program (PEP) used in past censuses. This coverage program, the Accuracy and Coverage Evaluation (A.C.E.), was a reenumeration of sampled census blocks, from address frame development to household enumeration with specially-designated interviewers. The sample size of the A.C.E. was large enough that it had potential to be used to adjust the census to correct for net coverage error.

The statistical theory behind the A.C.E. operation was not, as some suppose, to get the enumeration right in the A.C.E. and then use it to correct the census. Rather, it used dual system estimation theory, which assumes that the two enumerations are independent (rather than one being superior to the other). This independence assumption is critical to understanding why it is difficult to 1) incorporate nativity questions into the analysis in 2010 or 2) use other data sources (such as recent Lawful Permanent Resident Admissions) to estimate coverage. We will deal with these technical matters later in this paper.

Using the independence assumption, an estimate of the true enumeration is constructed, for 840 (in 1990) or 416 (in 2000, after collapsing) predefined post-strata or estimation

domains. (For example, American Indians living on reservations is one estimation domain;

5 Technically, in Zajac, 2003, this is called substitution to distinguish it from assignment and allocation, two forms of item imputation.

6 Note that these imputations were the subject of a lawsuit against the Census Bureau by the state of Utah, with Utah claiming that hot deck imputation was a form of prohibited sampling for nonresponse, and the Census Bureau claiming otherwise. The Supreme Court decided in the Census Bureau's favor by a 5-4 margin. See Cantwell, Hogan, and Styles, 2004, for a summary of Utah v. Evans.





Hispanic renters is another.) From these domains, coverage correction factors were to be

derived, and used in non-sampled blocks to adjust the census (see U.S. Census Bureau,

2004, for a detailed design document). The Census Bureau did not commit to adjusting the census; rather, staff conducted a series of evaluations and the Census Bureau was to

recommend whether to adjust or not.
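The role of the independence assumption can be seen in the simplest form of dual system estimation, the Lincoln-Petersen estimator. The sketch below is a textbook version under stated assumptions, not the A.C.E. estimator itself (which also handled erroneous enumerations, movers, and other complications): within one hypothetical post-stratum, the population estimate is the product of the two counts divided by the number of persons matched in both, and the coverage correction factor is that estimate divided by the census count. All counts are invented for illustration.

```python
def dual_system_estimate(census_count, pes_count, matched):
    """Lincoln-Petersen estimator: N_hat = n1 * n2 / m.

    Assumes the census and the post-enumeration survey capture people
    independently within the post-stratum.
    """
    if matched <= 0:
        raise ValueError("need at least one matched person")
    return census_count * pes_count / matched


def coverage_correction_factor(census_count, pes_count, matched):
    """Ratio of the dual system estimate to the census count."""
    return dual_system_estimate(census_count, pes_count, matched) / census_count


# Hypothetical post-stratum: 900 counted in the census, 100 in the
# re-enumeration sample, 90 of whom were also found in the census.
n_hat = dual_system_estimate(census_count=900, pes_count=100, matched=90)
ccf = coverage_correction_factor(census_count=900, pes_count=100, matched=90)
print(n_hat, round(ccf, 4))
```

If people who are missed by the census are also more likely to be missed by the second enumeration, the match count is too high relative to independence and the estimate is too low, which is one reason nonindependence matters so much for hard-to-count groups.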

The history of A.C.E. is documented elsewhere, both by the Census Bureau and by others (Anderson and Fienberg, 1999; U.S. Census Bureau, 2004). In sum, the problems of duplication, respondents interpreting residence rules, and nonindependence between the original enumeration and the A.C.E. enumeration militated against using the A.C.E. to adjust the decennial results. Three decisions, in sequence, were made after several months of evaluation: 1) A.C.E. could not be used for congressional apportionment; 2) A.C.E. could not be used for congressional redistricting; and 3) A.C.E. should not be used for adjusting the postcensal population estimates base (Mulry, 2006). The Census Bureau has not made public any plans for considering adjustment in 2010.

The Effect of Net Coverage Error on the Census

The central purpose of the decennial census is to provide the Constitutional basis for the apportionment of representatives amongst states. A second purpose is to provide small area data for the purpose of congressional redistricting (Government Accountability Office, 1998a).

It is the third purpose of a census that is most germane to this paper. This third purpose is to serve as a population base for making postcensal population estimates at many levels of geography: national, state, county, and subcounty. These postcensal population estimates are used for federal funds distribution, but we will not focus on these distributive effects. In this paper, their most important use is for survey population controls, as described in the next section. To the extent that the census itself has differential net coverage error, that error will be propagated forward into postcensal estimates and projections.

An illustration of these impacts can be seen in Robinson, et al. (2003), which compared the Demographic Analysis (DA) method with unadjusted decennial enumerations. Figure one, taken from that paper, exhibits net coverage error7 by 5-year age group and two race categories.

-- Insert figure one about here --

As can be seen in this figure, comparing decennial census results with the independent DA system of coverage measurement reveals important patterns by race, sex and age. Ages 0-4

and 5-9 tend to exhibit net undercoverage, for all race groups. Outside of these ages,

7 The term used at the time was net census undercount, with a negative net undercount implying an overcount. We prefer the term net coverage error in this context. It is important to note in this context that the race detail that DA can provide is limited to black, white, and all other races. Typically the latter two are combined into a nonblack category. This is a limitation of the data sources used to construct the DA estimate. From our point of view (our interest in coverage of the foreign born), this is a significant limitation.





nonblack females exhibit slight net overcoverage, while black males exhibit substantial net

undercoverage from ages 20-64. Black females and nonblack males also exhibit slight

undercoverage for approximately ages 30-59. Similar patterns were detected in the 1990 census, as well (Robinson, et al., 2003).

To the extent that these coverage errors remain in an unadjusted census, they propagate forward into postcensal estimates, and surveys that use them for poststratification controls.

Surveys: Weights and Population Controls

In the survey environment, there are two primary components: adjustments based on survey design, and adjustments based on population controls derived from the previous census. Coverage correction typically takes the form of a series of weights, each of which attempts to correct for one or another form of net coverage error. For this section, we will take a typical example based on the American Community Survey, although similar techniques are used with other surveys such as the Current Population Survey, Annual Social and Economic Supplement (ASEC).

Weights are applied in phases8:

Phase one: In phase one, the sampling design is taken into account. This weight accounts for the combined probability of selection of the final sampled unit. Since a sample, by construction, does not cover the entire population of interest, this is the first attempt at correction: to correct for unequal sampling probability.

Phase two: In phase two, any nonresponse followup design is taken into account. For the American Community Survey, approximately one in three households who do not respond to the mailing and the telephone followup are sent to personal interview followup. By design, then, about two-thirds of the nonresponders are not covered. This weight accounts for that and upweights those households who receive personal interviews.

Phase three: After phases one and two, there is a fraction of households that are noninterviews. (Some have used the phrase hard core nonresponders or similar language.) Little is known about these households except their geographical location and some limited information interviewers can obtain by proxy. Phase three weights, or noninterview weights, are typically used to adjust the sampled and responding units to match the geography and information available on the nonresponders. At this point the combination of weights is purely survey oriented.

Phase four: After phase three, poststratification occurs. In this phase, independent housing unit estimates (typically constructed by demographers) are used as control totals; that is, the estimated survey housing unit values are controlled to the independent controls. The purpose is to remove the nonresponse bias from the mean squared error.

Phase five: Finally, another round of poststratification occurs. In this phase, independent population estimates (again, demographically constructed) are used as control totals for person records; the estimated survey person characteristics (some combination of age, race, sex, and Hispanic origin) are controlled to the independent controls. As with housing unit controls, the purpose is to remove nonresponse bias from the mean squared error.
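The five phases can be sketched as a product of adjustment factors. The code below is a schematic illustration only, not the American Community Survey weighting specification: the function names, the selection probability, and every factor value are hypothetical, and the poststratification step is reduced to a single ratio of an independent control total to the weighted survey total for one cell.

```python
def poststratification_factor(control_total, weighted_survey_total):
    """Ratio that scales weighted survey totals to an independent control."""
    return control_total / weighted_survey_total


def final_weight(selection_prob, followup_rate, sent_to_followup,
                 noninterview_factor, housing_factor, population_factor):
    """Multiply the phase adjustments into one final weight."""
    base_weight = 1.0 / selection_prob                                  # phase one
    followup_weight = 1.0 / followup_rate if sent_to_followup else 1.0  # phase two
    return (base_weight * followup_weight * noninterview_factor        # phase three
            * housing_factor * population_factor)                      # phases four and five


# Hypothetical values for one person record in a followup household.
population_factor = poststratification_factor(control_total=1100.0,
                                               weighted_survey_total=1000.0)
weight = final_weight(selection_prob=1 / 40, followup_rate=1 / 3,
                      sent_to_followup=True, noninterview_factor=1.05,
                      housing_factor=1.02, population_factor=population_factor)
print(round(weight, 3))
```

Note that the population factor is computed within cells defined by age, race, sex, and Hispanic origin. Nativity is not among the controls described here, so nothing in this product directly corrects for differential coverage of the foreign born within a cell.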

8 These phases are necessarily a summary and approximate the steps without being exact specifications.

Typically the weights are combined multiplicatively to generate a final adjustment weight.





Lohr (1999: 272) presents this caution about this sequence of adjustment weights:

The models for weighting adjustments for nonresponse are strong: in each weighting cell, the respondents and nonrespondents are assumed to be similar. These models never exactly describe the true state of affairs, and you should always consider their plausibility and implications. It is an unfortunate tendency of many survey practitioners to treat the weighting adjustment as a complete remedy and to then act as though there were no nonresponse (Italics added).

The Effect of Net Coverage Error on Survey Estimates

The effect of net coverage error in survey estimates can be summarized by the coverage ratio (Shapiro and Kostanich, 1988). A coverage ratio compares the estimate from the sample of the number of people who have a particular characteristic to the same estimate from updated decennial census figures. For example, a coverage ratio of .95 for males aged 50 to 59 indicates that the survey estimate of the number of people in this subpopulation is 95% of the updated census population estimate. Occasionally, the coverage ratio exceeds 1.0, indicating overcoverage of a particular category.
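As a minimal illustration of the definition, the coverage ratio is simply the survey estimate for a group divided by the independent (updated census) estimate for the same group. The figures below are invented to reproduce the .95 example and do not come from any survey.

```python
def coverage_ratio(survey_estimate, control_estimate):
    """Survey estimate divided by the independent population estimate."""
    return survey_estimate / control_estimate


# Hypothetical subgroup: the survey (before population controls) estimates
# 19.0 million people where the updated census estimate is 20.0 million.
ratio = coverage_ratio(survey_estimate=19_000_000, control_estimate=20_000_000)
undercoverage_pct = (1 - ratio) * 100
print(f"coverage ratio = {ratio:.2f}, undercoverage = {undercoverage_pct:.1f}%")
```

Because the denominator is itself derived from the census, a coverage ratio measures survey coverage relative to the census base, not relative to the true population.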

CPS

Coverage ratios for the Current Population Survey in the 2001-2004 period are summarized at http://www.bls.census.gov/cps/basic/perfmeas/coverage.htm. As noted on that page, average coverage ratios are typically about .90, coverage ratios for males are typically lower than for females, and this is particularly prominent for black males in that survey. Coverage ratios for Hispanic males and males of other races are also low, in the .80 range. It also appears that the coverage ratios have a slight downward trend over the period.

ACS

The American Community Survey maintains a data quality web page, which summarizes coverage ratios.9 Some evidence of coverage differentials has also been presented at statistical meetings (e.g., Bruce, Navarro and Ahmed, 2007). At the national level, unadjusted ACS estimates exhibit coverage ratios of between .94 and .97 relative to total population estimates through the 2000-2006 period. Males approach 8% undercoverage, females about 4%. Hispanic coverage ratios range from .897 to .964, non-Hispanic white from .949 to .971.

Other Surveys

We only briefly mention other surveys. Three surveys that provide useful information on the foreign-born population are the Survey of Income and Program Participation (SIPP), the New Immigrant Survey (NIS), and the National Agricultural Workers Survey (NAWS).

9 Because ACS samples for personal interview followup, the coverage ratio is necessarily defined slightly

differently than the analogous CPS coverage ratio. Coverage ratios can be found at:

http://www.census.gov/acs/www/acs-php/quality_measures_coverage_2006.php.



The SIPP, by virtue of its longitudinal design, has an additional coverage dimension, apart

from the coverage ratios found in other surveys. This additional dimension is sample

attrition (the loss of formerly-responding households and people). Attrition, of course, causes later respondent groups to differ from earlier respondent groups, and requires

attrition weights to correct for differential attrition across groups.

Adjustments for Potential Net Coverage Error

In a number of publications (e.g., Passel, Van Hook, and Bean, 2004; Passel and Suro, 2005; Passel, 2005, 2006, 2007; Passel and Cohn, 2008), the Pew Hispanic Center has published estimates of the size and characteristics of the unauthorized foreign-born population. Its method uses the March Current Population Survey (CPS) as the base for the estimate.

Undercoverage factors are derived from the Accuracy and Coverage Evaluation (Passel and Cohn, 2008: 13): a 2.0% undercount rate for legal resident immigrants (2.6% for legal resident immigrants who have entered after 1980). Passel and Cohn cite Marcelli and Ong (2002) to justify a 12.5% undercount for unauthorized immigrants in the March CPS. Passel, Van Hook and Bean (2004) use a lower, 9.1% undercount.

It is important to point out that the resulting estimate not only uses these

assumptions but is also sensitive to them. Using the Passel, Van Hook and Bean (2004) estimates of the 2000 unauthorized as a base, after some algebraic

simplification, the resulting approximate residual estimation formula is (in millions):

$$
\text{unauthorized} = \frac{20.461 - 13.136\,(1 - u_{legal})}{1 - u_{unauthorized}},
$$

where,

unauthorized is the final estimate of the number unauthorized;

$u_{legal}$ is the assumed undercoverage rate for legal resident immigrants; and

$u_{unauthorized}$ is the assumed undercoverage rate for unauthorized immigrants.

If we fix the parameter $u_{legal}$ to match that of Passel, Van Hook and Bean, at 1.6%, the

estimating equation simplifies to:

$$
\text{unauthorized} = \frac{20.461 - 12.9}{1 - u_{unauthorized}}.
$$

Thus, the unauthorized population is inflated by $\frac{1}{1 - u_{unauthorized}}$.

Table one illustrates the impact of this inflator on the final estimate (we vary the parameter from one percent to 20 percent).

-- Table one about here --





As can be seen, the possible unauthorized estimates range from 7.68 million (the

lower bound) to 9.5 million (the upper bound). As can be seen from the center column, the inflation factor increases at an increasing rate; that is, as the assumed rate grows, its impact on the

final number also grows.
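
The sensitivity just described can be reproduced with a few lines of Python. This is a minimal sketch, not the Pew Hispanic Center's procedure: the base of 7.6 million is an assumption inferred from the two endpoints quoted above (7.68 million at a one percent rate and 9.5 million at a 20 percent rate), and the names `BASE_MILLIONS` and `inflated_estimate` are introduced only for illustration.

```python
BASE_MILLIONS = 7.6  # assumption: inferred from the 7.68 and 9.5 million endpoints


def inflated_estimate(u_unauthorized, base=BASE_MILLIONS):
    """Divide the residual base by (1 - assumed undercoverage rate)."""
    if not 0 <= u_unauthorized < 1:
        raise ValueError("undercoverage rate must be in [0, 1)")
    return base / (1.0 - u_unauthorized)


# Vary the assumed undercoverage rate from one percent to 20 percent.
rates = [r / 100 for r in range(1, 21)]
estimates = [inflated_estimate(u) for u in rates]

# Increase in the estimate for each additional percentage point of undercoverage.
steps = [later - earlier for earlier, later in zip(estimates, estimates[1:])]

for u, est in zip(rates, estimates):
    print(f"{u:5.0%}  inflator={1 / (1 - u):.4f}  estimate={est:.2f} million")
```

Each successive entry in `steps` is larger than the one before it, which is the sense in which the inflation factor grows at an increasing rate.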

The Office of Immigration Statistics (OIS) is legally mandated to produce an estimate of

the stock of the unauthorized population residing in the United States (Immigration and

Nationality Act, Section 103(d)). The method is described (cf. Hoefer, Rytina, and Baker, 2008) as a residual method, and is similar to the Pew Hispanic method described above:

A population base of the total foreign-born population is constructed from

American Community Survey population estimates;

This base is then inflated for net undercoverage;

From this base are subtracted estimates of deaths, emigrants, legal permanent

residents, refugees and asylees, and resident nonimmigrants; and

The residual is an estimate of the unauthorized.

In this method, two undercoverage factors are applied: a 2.5% net undercoverage rate for

the legal permanent resident (LPR), refugee and asylee population as a whole, and a 10% net undercoverage rate for the nonimmigrants10 and the unauthorized. It is important to

note that the application of these two assumed net undercoverage rates implies that the

resulting estimate is no longer consistent with published census estimates from the ACS. The unauthorized net coverage rate was based on previous DHS/INS unauthorized

estimates, which cited Marcelli and Ong (2002). The authors note that the resulting

estimate is sensitive to the assumed net undercoverage rate.

Like the Pew Hispanic method, with some simplifying assumptions we can perform our

own sensitivity analysis. Ignoring some demographic particulars, and using similar notation as above to illustrate the similarities, the 2007 formula is approximately equal to (in millions):

$$
\text{unauthorized} = \frac{28.18 - 20.12\,(1 - u_{legal})(1 - emig) - 1.17\,(1 - u_{non})}{1 - u_{unauthorized}},
$$

where,

$u_{legal}$ is the undercount rate for the legal resident population;

$emig$ is the effective emigration rate11;

$u_{non}$ is the undercount rate for nonimmigrants; and

$u_{unauthorized}$ is the undercount rate for the residual unauthorized population in the numerator.

Table two compares the relative sensitivity of the resulting estimate to each of the assumed

rates. The shaded center line represents the assumed rates used in 2007 (again noting that

this formula is an approximation):

10 Nonimmigrants are sometimes referred to as legal temporary migrants; they include visa holders who

are not legally approved for long-term immigration/residence.

11 This is an approximation to the internal formula.





-- Table two about here --

The table reveals that, over ranges that are commonly used, the formula is particularly

sensitive to the assumed net undercoverage of the unauthorized foreign born. It is also

notably sensitive to assumed emigration rates, and less so to the coverage assumptions associated with nonimmigrants and with the legally-resident foreign-born population.

We conclude with an examination of labor force estimates. During the period of economic prosperity of the 1990s, an anomaly appeared between the Current Population Survey's

estimate of the labor force and the Current Employment Statistics (CES) survey of

establishment-reported jobs. In essence, the two series, normally running almost in parallel, began to

diverge, with the CES reporting greater growth in jobs than the CPS was reporting growth in the labor force (Juhn and Potter, 1999). Juhn and Potter analyzed the differences as of

1999, considering three hypotheses: that the surveys treat multiple jobholding differently;

that the payroll survey (CES) is upwardly adjusted by benchmarking; and that there was an

undercount of the working-age population in the calculation of the household survey estimates. Their conclusion is notable:

"We find that the third explanation - an underestimated working-age population -

best accounts for the recent rise in the employment gap. Since the household survey

calculates the level of employment by combining survey data with a census-based estimate of the U.S. working-age population, an undercount of that population will

produce low employment numbers. Evidence suggests that the census has in fact

historically underestimated this population. Significantly, the undercount appears to

be highest among groups whose employment status is very sensitive to business cycle fluctuations. We contend that the steady expansion of the economy in the

1990s has enabled these cyclical workers to find employment. Their numbers, only

partly captured in the census - and, by extension, in the household survey - have in recent years helped to boost the job count in the payroll survey, widening the gap

between the surveys' employment estimates" (Juhn and Potter, 1999:1).

The recently- (within 10 years) and very-recently- (within 5 years) arrived foreign born are

more likely than older foreign born or native born persons to fall into the working age

population (Mosisa, 2002). To the extent that the foreign born are among this group, and

the unauthorized foreign born in particular, then the potential undercoverage would systematically underestimate the size of the labor force, particularly during expansions.

The CES/CPS differential in the 1990s, and the subsequent (in 2000) discovery of

additional, as-yet-undetected net immigration (see, e.g., Robinson, et al., 2002: 22, or





Nardone, et al., 2003:12),12 is consistent with this interpretation, and suggests the

sensitivity of these results to potential undercoverage of the foreign born.

An analysis of the effect of CPS/CES differences on unemployment rates and other

quantities of interest was presented by Schweitzer and Ransom (1999). They calculate the

effect on late 1990s unemployment rates based on assuming that the CPS employment levels followed those of the CES. Their conclusions are presented in their Figure 1: either

unemployment rates must have been much smaller than reported in the CPS during that

period, or labor force participation rates must have risen very quickly, or something was wrong with the population controls. Again, this is consistent with, but does not prove, the

subsequent discovery of additional net immigration, and suggests the sensitivity of these

results to potential undercoverage of the foreign born.

What has been Asserted about Coverage of the Foreign Born?

Marcelli and Ong (2002) provide results of a study of foreign-born Mexicans in Los

Angeles County; these results have been widely cited and used far outside their original context.13 In this study, Marcelli and Ong developed a direct survey enquiry: respondents

in households were asked directly, "was [this person enumerated in the household]

included in the 2000 questionnaire sent to the Census Department?" Later in the survey, respondents' legal statuses were also ascertained. They obtained a gross undercoverage

rate of 10.6% for unauthorized respondents, 8.3% for legal immigrants, 4.5% for U.S.

citizens, and 7.1% for temporary visitors.

In contrast to direct enquiries, Van Hook and Bean (1998) used a demographic analysis

approach, generating an expected population size (based on vital statistics) and

comparing it to the obtained population size in the 1990 census. Their results suggest net

undercount rates that range from 15-25 percent for unauthorized Mexican immigrants.

In the context of evaluating coverage in Census 2000, Deardorff and Blumerman (in Robinson, 2001: Appendix A) develop several net undercoverage assumptions by migrant

legal status, and examine their impact on final demographic analysis estimates. In part, this

was an attempt to explain the then-discrepancy between early A.C.E. coverage results and

demographic analysis coverage results. For purposes of their scenarios, these assumptions ranged from 1-2% for legal migrants, 7-35% for legal temporary migrants, and 10-15% for

12 Robinson, et al.: "For the 2000 DA, we were particularly concerned about the reliability of the

immigration components and conducted a sensitivity analysis in response. This analysis led to the

incorporation of an alternative set of DA estimates to allow for the possible understatement of immigration (specifically, undocumented immigration) in the initial DA components of growth." Nardone,

et al.: "Additional studies carried out by the Census Bureau Population Division as part of the estimates evaluation indicated that the estimates of unauthorized migrants that were used

in the 1990 based intercensal estimates were too low. The evaluations indicated that the

residual foreign born population increased by about 5 million during the 1990 to 2000

decade rather than the 2.25 million (10 x 225,000) assumed in the 1990 based estimates."

13 For the record, Marcelli (personal communication) considers applying these results to geographical

contexts outside of Los Angeles County or to other foreign-born populations residing in the United States

to be questionable at best.





the unauthorized. They note (p. A-12) that the coverage assumptions do not explain the

different total populations calculated by DA and the A.C.E.

Passel, Van Hook, and Bean (2004), Passel and Suro (2005), Passel (2005, 2006, 2007), and

Passel and Cohn (2008) developed a series of estimates of the foreign born population by

legal status, including characteristics. These estimates are widely cited in the mainstream media. These estimates assume a net undercoverage rate of 2% for legal resident

immigrants (2.6% for those entering after 1980), based on census studies of net coverage

error. For the unauthorized, however, a net undercoverage rate of 12.5% is assumed. In Passel and Cohn (2008), Marcelli and Ong's 2002 work is cited.

A further source of assertions about coverage of the foreign born is the Office of

Immigration Statistics' estimates of the unauthorized population. As noted above, two undercoverage factors are applied: a 2.5% net undercoverage rate for the LPR, refugee

and asylee population as a whole, and a 10% net undercoverage rate for the nonimmigrants

and the unauthorized. It is important in this context to quote the justification for these

rates: "This was the same rate used in previous DHS estimates" (Department of Homeland Security, 2007). Of course, the previous estimates cited other previous

estimates, and eventually this assumption can be traced to Office of Policy and Planning estimates developed by Robert Warren (2003), who cited Marcelli and Ong (2002).

Other ranges have been applied to earlier censuses.14 Unofficial estimates by Census staff put the undercount of illegal immigrants at about 33 percent in the 1980 census

(Government Accountability Office, 1998b; Fernandez and Robinson, 1994); Passel (1986)

suggested a range between 33 and 50 percent. For the 1990 census, various analyses put the

figure at roughly 20-30 percent (Woodrow, 1991; Van Hook and Bean, 1997; Woodrow-Lafield, 1995). GAO (1998) cites these and other sources in attesting to the difficulty of

assessing these various assertions.

For completeness, it is important to note the A.C.E. results from Census 2000, and what the

non-adjustment decision of the Secretary of the Commerce Department (coinciding with

the Census Bureau's recommendation [ESCAP II]) implies. Because overall net coverage error was close to zero, and because it could not be determined that A.C.E.-based

adjustments would improve distributional accuracy, no adjustment was performed. This

implies that the net coverage error of the foreign-born population is, for all practical

purposes, zero. Implicitly, then, subsequent to Census 2000 all population estimates built on the census base are by assumption covering the foreign-born population adequately.

What is Provable About Coverage of the Foreign Born?

While much is assumed or asserted to be true about coverage of the foreign born, these claims are difficult to prove. In this section we will survey some empirical results from the literature, and provide some results of our own. These results suggest that net

14 It is widely believed (e.g., Anderson and Fienberg, 1999) that Census-taking has improved steadily

throughout the 20th century (duplication in Census 2000 possibly being a symptom of an overly aggressive

attempt to eliminate undercoverage). Thus, estimates of undercoverage from earlier censuses may not

apply to Census 2000.





undercoverage of the foreign-born population is higher than for the rest of the population.

In the following sections, we will use the term "recent arrival" to refer to foreign-born

persons whose year of entry is within 10 years of the current survey period, and the term "very recent arrival" to refer to those whose year of entry is within five years of the current

survey period.15

Ethnographic Studies Suggest that the Foreign Born Avoid Detection

In the early 1990s, several ethnographic studies were performed, working with 29

(necessarily local and not statistically generalizable) sample areas containing some foreign-

born groups. De la Puente (1993: 4) summarizes many of these results. De la Puente notes

that complex households were common in sample areas with recent immigrants, especially Hispanic immigrants. These ad hoc households protect the identity of members and thus

contribute to within-household undercoverage.16 Complex households (often containing

more members than allowed by law or by building management) combine with fear of disclosure to create avoidance.

The Foreign Born Tend to Respond Later to Surveys

Work performed by Camarota and Capizzano (2004) mixed ethnographic with quantitative

analyses of operational ACS data. While the ethnographic results were broadly

comparable with the previous section, the quantitative data reveal that, in the study areas, the foreign born are more likely to be captured later in the survey process. Figures two

and three, taken from Camarota and Capizzano (2004), illustrate this:

-- Insert figures two and three about here --

As can be seen in these figures, foreign-born respondents are disproportionately

responding in the later, telephone assisted (CATI) or personal interview (CAPI) phases of the ACS. Furthermore, certain countries of birth of foreign-born respondents (including

some countries that are commonly held to be major sources of unauthorized immigration)

tend to be captured later in the operational process. Obviously, complete household non-response represents the ultimate late responder, and it is widely asserted by survey

experts (with evidence given by, e.g., Treat and Stackhouse, 2002) that late responders or

households captured in nonresponse follow-up are systematically different from early responders.

States with Higher Levels of Foreign Born, Particularly Unauthorized or Recent Arrivals, Tend to Have Lower Coverage Ratios

Figure four plots coverage ratios in the 2006 American Community Survey against the percent of the total population that is a recent arrival foreign born, for states. Overlaid

15 Obviously, the choice of years is somewhat arbitrary. Wilson (2008) uses a 10-year period to indicate

recent; Clark and Patel (2004) use a 5-year period to indicate recent. The results described below are

robust to either definition.

16 One household site in this study indicated that of the thirteen persons living in this household, only six

were enumerated in the 1990 census.





onto this plot is a simple linear regression and 95% confidence interval. The regression

line is analytically weighted to reflect the total population size of the state.

-- Insert figure four about here --

As can be seen, states with more recently arrived foreign born systematically tend to have lower total coverage ratios.

It is important in this context to mention the ecological fallacy (Robinson, 1950). These are state level data, and demonstrating a correlation between recently arrived

foreign born and lower coverage ratios does not necessarily imply individual-level net

undercoverage. The demonstrated relationship suggests, but does not prove, such an

individual-level relationship.
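
A population-weighted straight-line fit of the kind overlaid on figure four can be sketched as follows. This is only an illustration: the five state records are invented, the closed-form weighted least squares shown here yields the same point estimates as an analytically weighted regression but omits the confidence band, and the function name `weighted_linear_fit` is hypothetical.

```python
def weighted_linear_fit(x, y, w):
    """Fit y = intercept + slope * x by minimizing sum(w * residual**2)."""
    total = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Invented state-level records (not ACS data):
pct_recent = [1.0, 2.5, 4.0, 6.0, 8.0]           # percent recent-arrival foreign born
coverage = [0.975, 0.968, 0.955, 0.948, 0.930]   # total coverage ratio
population = [1.2, 5.8, 9.9, 19.3, 36.5]         # state population in millions (weights)

intercept, slope = weighted_linear_fit(pct_recent, coverage, population)
print(f"coverage ratio = {intercept:.4f} + ({slope:.5f}) * percent recent arrivals")
```

A negative slope in such a fit describes states, not persons, which is exactly why the ecological-fallacy caution above applies.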

The Foreign Born, Particularly Recent Arrivals, Tend to Live in Areas with Higher Hard-to-Count Scores

Our final test uses the Planning Database (Robinson and Bruce, 2007; see also Bruce and Robinson, 2003; and Bruce, Robinson, and Sanders, 2001), a tool designed by the Census Bureau for planning and targeting areas that are harder to count than other areas. This

database, using Census 2000 short form and long form data, contains tabulations of data on

variables relevant to net undercoverage. Further, for 65,184 census tracts, a "hard to

count" (HTC) score is calculated. This score ranges from 0 (representing the easiest to count tracts, with none or very few indicators) to a theoretical maximum of 132 (representing

tracts with all indicators of net undercoverage).

Again we wish to be careful not to commit the ecological fallacy (Robinson, 1950). We

emphasize that these are tract-level data, and while we will show that there is a correlation

between recently arrived foreign-born persons and other hard to count indicators, we recognize that definitive proof requires individual level data. To emphasize this point, we

will continue to use the word "areas" to refer to tracts.

The hard to count score is a sum of weighted scores. The following description is from Robinson and Bruce (2007: 7): a total of 12 variables that were correlated with

nonresponse rates in 1990 and 2000 are used to derive the HTC score.

The set of algorithms used to determine HTC scores is as follows:

(1) each individual variable is sorted across geographic areas from high to low (e.g., sort

tracts from highest percent poverty to lowest percent poverty),

(2) scores (0 to 11) are assigned to each variable for each tract (e.g., values of 11 are given to tracts with the highest poverty rates of over 44.3 percent and values of 0 are given to

tracts below the national median poverty rate of 9.9 percent in 2000),

(3) the scores assigned to each of the 12 variables for a tract are summed to form a

composite HTC score for the tract.

The final HTC score is the sum of the ratings of the following components:

1) Percent vacant housing units;

2) Percent housing units that are not single-family structure;

3) Percent renter-occupied housing units;

4) Percent crowded occupied units;

5) Percent of families not in a husband/wife configuration;

6) Percent of occupied housing units with no phone service;

7) Percent of persons 25 and older not a high-school graduate or more;

8) Percent of persons below the poverty level;

9) Percent of persons receiving public assistance;

10) Percent of persons 16 and older unemployed;

11) Percent of households that are linguistically isolated;

12) Percent of occupied housing units where the owner moved in within 1999-2000.
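
The scoring steps above can be sketched in Python. This is a rough illustration under stated assumptions, not the Census Bureau's implementation: only the two poverty cutoffs (9.9 and 44.3 percent) come from the text, the cutoffs for the other eleven variables are placeholders, and the treatment of intermediate values (ten equal-width bins scored 1 to 10) is an assumption, since the description quoted above does not specify it.

```python
MAX_SCORE = 11  # highest score a single variable can contribute


def variable_score(value, median, top_cutoff):
    """Score one tract on one variable, from 0 up to MAX_SCORE."""
    if value <= median:
        return 0
    if value > top_cutoff:
        return MAX_SCORE
    # Assumption: values between the median and the top cutoff fall into
    # equal-width bins scored 1..10; the source does not give these bins.
    width = (top_cutoff - median) / (MAX_SCORE - 1)
    return min(MAX_SCORE - 1, 1 + int((value - median) / width))


def htc_score(tract, cutoffs):
    """Sum the per-variable scores to obtain the composite HTC score."""
    return sum(variable_score(tract[name], *cutoffs[name]) for name in cutoffs)


# Only the poverty cutoffs come from the text; the other eleven are placeholders.
cutoffs = {"pct_poverty": (9.9, 44.3)}
cutoffs.update({f"var_{i}": (10.0, 50.0) for i in range(2, 13)})

easy_tract = {name: 0.0 for name in cutoffs}    # below every median
hard_tract = {name: 100.0 for name in cutoffs}  # above every top cutoff
print(htc_score(easy_tract, cutoffs), htc_score(hard_tract, cutoffs))
```

With twelve variables each capped at 11, the hardest possible tract scores 12 x 11 = 132, which matches the theoretical maximum stated earlier.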

We have demonstrated above that the foreign born, and particularly recent entry foreign born, tend to have characteristics that correlate with the items that enter into the hard to

count score. Do they also tend to live in areas that have high HTC scores? Figure five

presents a simple graphical analysis, with a linear regression (weighted by population size

in each tract) fit overlaid onto the point cloud. Each point is a census tract.

-- Figure five about here --

As can be seen, there appears to be a correlation between percent foreign born who have

entered within the last ten years and the hard to count scores. And, paradoxically, when a linear regression is fit, for tracts where there are high levels of foreign born, beginning

at about 80% of total population, the linear fit begins to make predictions that are

impossibly high (that is, above 132).

This simple graphical finding suggests that there is residual variance to be explained that is

not captured in the components included in the score itself. To test this hypothesis, we run

a regression of HTC score on the twelve components that enter into that score, with the addition of one variable: percent of total population that is foreign born and has a year of

entry within the last ten years. The result of this regression is presented in table three.

-- Insert table three about here --

As can be seen, and not surprisingly, the components of the HTC score tend to predict that

score. Additionally, conditional on all other components of the score being held constant, every one percentage point increase in recent entry foreign born in a census tract is associated with an

expected .475 point increase in the HTC score.

Demographic Characteristics of Individual Foreign Born Persons, Particularly Recent Arrivals, Correlate With Hard to Count Indices

In previous sections, we have dealt with aggregate data at the state and tract levels. First,

we have shown that coverage ratios tend to decline as the percentage of the population who are recent arrival foreign born increases. Second, we have shown that hard to count scores

for tracts tend to increase as the percentage of the population who are recent arrival foreign





born increases, and that this relationship remains even when the components of the score

itself are used to predict it.

In each of the preceding sections we have noted the potential for the ecological fallacy. In

this section we use microdata from the 2007 American Community Survey Public Use

Microdata Sample to examine whether the components that correlate with net undercoverage are concentrated among individuals who are foreign born, in particular

recent arrivals. We use the characteristics of individuals and households identified by the

Census Bureau's planning database (Robinson and Bruce, 2007) as predictive of low return rates and hard to count characteristics.

Our analysis is simply framed. We have partitioned the population into four groups:

native born (including those born abroad of American parents); older foreign born (those foreign-born persons whose year of entry is more than 10 years before the survey date, i.e., before

1997); the recent foreign born (those foreign-born people whose year of entry is within 6-10

years of the survey date, 1997-2001); and the very-recent foreign born (those foreign-born

people whose year of entry is within five years of the survey date, 2002 or later). We proceeded by comparing whether these four groups differ on each of the individual hard to

count characteristics. Recall that the characteristics identified in the hard to count score are: vacant housing units, housing units that are not a single-family structure, renter-occupied

housing units, crowded occupied units, families not in a husband/wife

configuration, occupied housing units with no phone service, persons over 25 who have not graduated from high school, persons below the poverty level, persons receiving public

assistance, persons aged 16 and over who are unemployed17, households that are

linguistically isolated, and occupied housing units where the owner moved in within

1999-2000. With the exception of vacancy (which does not apply), we look at each individually, rather than in a multivariate context, performing simple difference of

proportions (percentages) tests (Agresti, 1990) to assess statistical significance. In each

case our null hypotheses are stated simply:

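H0(1): The percentage of persons who exhibit the hard to count characteristic does not

differ between native born and non-recent foreign born.

H0(2): The percentage of persons who exhibit the hard to count characteristic does not

differ between native born and recently arrived foreign born.

H0(3): The percentage of persons who exhibit the hard to count characteristic does not

differ between native born and very recently arrived foreign born.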

Our alternative hypotheses are the negations of these null hypotheses. We use the 90%

two-tailed confidence interval; thus, if the confidence intervals of the two comparison

groups do not overlap, we will reject the null hypothesis; otherwise, we retain it. Table four contains these tests.
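
The decision rule just described can be sketched as follows. The sketch assumes simple random sampling and a normal approximation to the proportion; estimates from the ACS microdata would instead need design-based standard errors. The counts are invented, and the function names are introduced only for illustration.

```python
from math import sqrt

Z_90 = 1.645  # two-tailed 90% critical value of the standard normal


def proportion_ci(successes, n, z=Z_90):
    """Normal-approximation confidence interval for a proportion."""
    p = successes / n
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


def intervals_overlap(a, b):
    """True when the two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def reject_null(successes_1, n_1, successes_2, n_2):
    """Reject the null of equal percentages when the two intervals do not overlap."""
    ci_1 = proportion_ci(successes_1, n_1)
    ci_2 = proportion_ci(successes_2, n_2)
    return not intervals_overlap(ci_1, ci_2)


# Invented counts: persons in renter-occupied units, out of 1,000 sampled in each group.
print(reject_null(300, 1000, 600, 1000))  # native born vs. very recent arrivals
print(reject_null(300, 1000, 310, 1000))  # two groups with nearly equal percentages
```

Requiring non-overlapping intervals is a stricter criterion than testing the difference of the two proportions directly, so this rule errs on the side of retaining the null hypothesis.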

17 Note that persons who are out of the labor force are not considered unemployed in this definition;

thus this percentage does not correspond to the traditional unemployment rate definition.





-- Insert table four about here --

We summarize the conclusions from this table:

1. The foreign born, particularly recent arrivals, are more likely to live in non-single-unit houses;

2. The foreign born, particularly recent arrivals, are more likely to be renters rather than owners;

3. The recently arrived foreign born are more likely to live in crowded housing units;

4. The recently arrived foreign born are more likely to live in non-husband/wife families;

5. The recently arrived foreign born are more likely to live in houses without a telephone;

6. The foreign born (age 25+) are more likely to not have graduated from high school;

7. The recently arrived foreign born are more likely to live in households below the poverty line; the non-recently-arrived foreign born are less likely to live in households below the poverty line;

8. The non-recently-arrived foreign born are more likely to be receiving public assistance; the recent foreign born are less likely to be receiving public assistance;

9. The recently arrived foreign born are more likely to be unemployed;

10. The foreign born, particularly recent arrivals, are more likely to live in linguistically isolated households;

11. The recently arrived foreign born are more likely to be recent movers; the non-recently-arrived foreign born are less likely to be recent movers.

In sum, of the eleven characteristics that are considered to make a household hard to

count, the recently-arrived foreign born are more likely to exhibit ten of them, and less

likely to exhibit one (public assistance receipt). The non-recently-arrived foreign born exhibit six of these characteristics; for three characteristics the native born and

non-recently-arrived foreign born are statistically indistinguishable; and for two characteristics

the non-recently-arrived foreign born are less likely to exhibit the characteristic.

What Can be Done to Measure Coverage of the Foreign Born?

Despite the stated importance of this topic to many agencies (Government Accountability

Office, 1998:57-58), there have been few attempts to measure net undercoverage of the foreign born. This section will detail alternatives, with appropriate caveats stated.

Coverage Measurement Alternatives: Summary

Coverage measurement is a difficult topic to summarize, but we shall attempt a brief

summary here. Table five describes the coverage measurement approaches and their

implied coverage ratios. One can imagine defining coverage as "doing it better", that is, determining what enumeration should have occurred. In this case the coverage ratio is the

enumeration divided by the "truth" (as determined by the better system). Dual system





estimation is based on doing it independently. In this case the coverage ratio is the

enumeration divided by the independence model. The remaining methods, Reverse record

check, Megalist/Benchmark, and Demographic accounting, each assume that the traced sample, the megalist or benchmark list, or the demographic summation, determine

the coverage ratio (Popoff and Judson 2004: 634 list these five alternatives; Judson, 2006

discusses coverage ratios.)

-- Insert table five about here --

Post Enumeration Surveys/Dual System Estimation

A standard for evaluating coverage in a census is the Dual system estimation (DSE) method using a post enumeration survey. (See Popoff and Judson, 2004:633-637 for a

summary).

The DSE method has a long history; see, e.g., Chandrasekaran and Deming, 1949; Marks,

Seltzer and Krotki, 1974; Wolter, 1986; Hogan, 1992, 1993 and 2000, for theory and examples of the method in practice. The 1950 Census was the first to use a post

enumeration program, and the method has been used subsequently (with increasing technical sophistication). The DSE method is a microdata approach (focusing on

individual responses) rather than an aggregate approach (focusing on demographic

aggregates; Judson, 2006).
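
In its simplest textbook form, the dual system estimate multiplies the two independent counts and divides by the number of persons matched in both. The sketch below shows that basic calculation with invented counts. It assumes the two systems are independent and leaves out the post-stratification, erroneous-enumeration, and missing-data adjustments used in an actual post enumeration program; the function names are hypothetical.

```python
def dual_system_estimate(census_count, survey_count, matched_count):
    """Basic two-list estimate of the true population size."""
    if matched_count <= 0:
        raise ValueError("at least one matched person is required")
    return census_count * survey_count / matched_count


def coverage_ratio(census_count, estimated_true):
    """Enumeration divided by the independence-model estimate."""
    return census_count / estimated_true


# Invented counts for one estimation domain:
census_count = 9000   # persons enumerated in the census
survey_count = 1000   # persons found by the independent post enumeration survey
matched_count = 920   # survey persons who were also found in the census

n_hat = dual_system_estimate(census_count, survey_count, matched_count)
ratio = coverage_ratio(census_count, n_hat)
print(f"estimated population = {n_hat:.0f}, coverage ratio = {ratio:.3f}")
```

Under this simple model the coverage ratio equals the share of survey persons matched to the census (920 of 1,000 in the invented example), which is why classifying both the census record and the survey record into the same domain is essential.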

The 2000 Accuracy and Coverage Evaluation was based on the census short form.

Because of this, it was not possible to create a specific estimation domain for the foreign born. (As the nativity question is not asked on the short form, it is not possible to assign

individual respondents to that domain so as to construct coverage estimates of that

domain.) It has been asserted (e.g., Hogan, 2008) that technical and policy issues make it impossible to use dual system estimation to construct a coverage estimate specifically for

the foreign born.

The operative phrase here is, of course, "dual-system estimation". Assuming that a dual system estimate is the gold standard, and noting that no legislative mandate exists to

specifically construct an estimate of coverage for the foreign born18, the question is moot:

without asking nativity on both the 2010 census and the CCM survey, it is not possible to classify individuals sufficiently to construct a full dual system estimate.

What is not spoken of is the possibility of some other kind of estimate of coverage.19 It is

to these other kinds of estimates that we now turn.

18 Raising the question, of course: would there be any support for such a legislative mandate, that is, to

specifically construct a dual system estimate, and corresponding coverage correction factor, using nativity

status as an estimation domain?

19 Suppose that we stipulate, for the sake of argument, that dual system is the gold standard. Even so, that

stipulation raises the further question: if we cannot generate a gold standard estimate, does that mean we

should not attempt to produce some estimate, recognizing its caveats?





Demographic Benchmarking

A method that we will describe as demographic benchmarking has been presented in

Pitkin and Park (2005), and proposed by Camarota (2006). The demographic benchmarking method develops, for an appropriate target population, the highest-quality

demographic data available to construct an estimate of the target population, in various

aggregate quantities (e.g., age groups). These benchmarks are compared to census or survey results, and where the census or survey results differ from the demographic

benchmarks, the difference is interpreted as net coverage error. The demographic

benchmarking approach stands or falls on one particular assumption: that the benchmark

is of sufficiently high quality to serve this role.
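
The comparison step can be sketched as a per-group calculation. The sketch expresses net coverage error as the shortfall of the survey count relative to the benchmark, which is one common convention and an assumption here; the age groups and counts are invented and the function name is hypothetical.

```python
def net_coverage_error(benchmark, survey):
    """Per-group net undercoverage rate: (benchmark - survey) / benchmark.

    Positive values indicate net undercoverage, negative values net overcoverage.
    """
    return {g: (benchmark[g] - survey[g]) / benchmark[g] for g in benchmark}


# Invented counts (thousands) by age group:
benchmark_counts = {"0-4": 1000, "5-9": 1200}   # e.g., built from vital statistics
survey_counts = {"0-4": 950, "5-9": 1188}       # census or survey estimate

errors = net_coverage_error(benchmark_counts, survey_counts)
for group, rate in errors.items():
    print(f"ages {group}: net undercoverage = {rate:.1%}")
```

Any error in the benchmark itself passes straight through to these rates, which is the sense in which the approach stands or falls on benchmark quality.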

What benchmarks have been used successfully? For the benchmark to be of sufficiently

high quality, it must be measured with little or no error. Of the demographic statistics available currently, only two fully qualify: vital statistics on births, and vital statistics on

deaths. (In the United States, data on Medicare enrollments are a candidate, with relatively

minor correction for historical underenrollments; Robinson, et al, 2002.)
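
The benchmarking comparison described above can be illustrated with a minimal sketch. The age groups and counts below are invented for illustration, and the function name and the sign convention (a positive rate means a net undercount relative to the benchmark) are our assumptions, not taken from Pitkin and Park (2005) or Camarota (2006).

```python
def net_coverage_error(benchmark, census):
    """Compare benchmark counts with census (or survey) counts by group.

    benchmark, census: dicts mapping a group label to a count.
    Returns (rates, overall), where each rate is
    (benchmark - census) / benchmark; a positive value is read as a
    net undercount, under the assumption that the benchmark is correct.
    """
    rates = {g: (benchmark[g] - census[g]) / benchmark[g] for g in benchmark}
    total_benchmark = sum(benchmark.values())
    total_census = sum(census[g] for g in benchmark)
    overall = (total_benchmark - total_census) / total_benchmark
    return rates, overall


# Hypothetical counts, invented for illustration only.
benchmark = {"0-4": 1000, "5-9": 2000}
census = {"0-4": 950, "5-9": 1960}
rates, overall = net_coverage_error(benchmark, census)
```

The whole calculation inherits the assumption stated in the text: any error in the benchmark is attributed to the census or survey.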

Demographic Analysis

An extension of the demographic benchmarking approach is what we shall call demographic analysis (which we alluded to earlier). What distinguishes demographic

analysis from benchmarking is that it attempts to construct a complete estimate of the

population of interest, rather than particular segments of it. Like benchmarking, it is fundamentally an aggregate approach rather than a microdata approach. The use of

demographic analysis to assess net coverage is similar to the benchmarking approach in the

following ways: For an appropriate target population, it uses the highest-quality demographic data available to construct an estimate of the target population, in various

aggregate quantities (e.g., age groups). For less-well-known groups (e.g., immigrants;

emigrants; unauthorized persons), demographically plausible models are constructed. The resulting demographic estimate is then compared to census or survey results, and the difference is interpreted as net coverage error.

The challenge of the demographic analysis approach is that it makes more assumptions than the benchmarking approach. The benchmarking approach can rely on the relative

strength of its underlying data sources: vital statistics, housing, school enrollment, or

employment data. Demographic analysis, in contrast, must rely on the former and, in addition, on weaker assumptions about components of migration.

Direct Enquiries

Marcelli and Ong (2002), after reviewing demographic analysis and dual-system

approaches (with an abbreviated discussion of synthetic estimation, to which we shall turn

subsequently), propose a direct enquiry as a method of estimating census undercoverage. This approach is microdata oriented, in that respondents in households are asked directly,

"Was [this person enumerated in the household] included in the 2000 questionnaire sent to

the Census Department?" Thus, for an appropriate target population, a list of persons in that target population is constructed, adjusting for survey nonresponse. Within that target

population, a direct enquiry as to whether the person was captured in census records is





performed, and where a respondent indicates that they were not reported on the census list,

the report is interpreted as a census coverage error, and the gross undercoverage rate is calculated

from these data.
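
As a minimal sketch of the calculation just described, the following computes a weighted gross undercoverage rate from direct-enquiry responses. The weights stand in for a nonresponse adjustment and are hypothetical, as are the function name and the sample values; this is not Marcelli and Ong's (2002) estimator.

```python
def gross_undercoverage_rate(responses):
    """Weighted share of persons reported as not included in the census.

    responses: list of (weight, included_in_census) pairs, where weight
    is a hypothetical nonresponse-adjusted survey weight and
    included_in_census is the respondent's own report (True or False).
    """
    total_weight = sum(w for w, _ in responses)
    missed_weight = sum(w for w, included in responses if not included)
    return missed_weight / total_weight


# Invented responses for illustration only.
sample = [(1.0, True), (1.5, False), (2.0, True), (0.5, False)]
rate = gross_undercoverage_rate(sample)
```

Because the input is a self-report, any misreporting by respondents passes directly into the estimated rate.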

Obviously, the question about reporting to the census has the potential to be sensitive,

presumably more so for those foreign-born persons of unauthorized or ambiguous legal status. Because the question is sensitive, it is easy to imagine that some version of social

desirability bias will play a role in the respondent's answer.

Marcelli and Ong, while recognizing the above criticism, defend the approach as follows:

They work directly with the local population of interest; they use interviewers who are as

non-threatening as possible; and they ask questions in the vernacular, with native speakers.

All of these approaches are designed to reduce response error due to respondent fear or resistance.

One-Way Record Linking

A microdata approach for estimating coverage error was tested by Heer and Passel (1987)

in the Los Angeles metropolitan area. This method will be referred to as one-way record linking. For an appropriate target population, a list of persons in that target population,

with appropriate individually identifying information (e.g., full name, full date of birth,

geographic locators) is constructed. This list is linked with the decennial enumeration list, and where a nonlink occurs with the census list, the nonlink is interpreted as a census coverage

error, and the gross undercoverage rate is calculated from these data.

The key to this approach is the assumption that the list for the target population is a

complete list, and that therefore any difference between the list and the census enumeration

must be gross census undercoverage. A weakness of the approach is the assumption that neither list contains gross overcoverage. Privacy concerns about this use of records might

arise, as well.
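
The linking step can be sketched as follows, under strong simplifying assumptions: records link only on an exact match of a normalized key, and the field names (name, dob, tract) and all records are invented for illustration. Heer and Passel (1987) did not necessarily link records this way.

```python
def link_key(record):
    """Build a normalized linking key from hypothetical identifying fields."""
    name = " ".join(record["name"].lower().split())
    return (name, record["dob"], record["tract"])


def one_way_nonlink_rate(target_list, census_list):
    """Link each target-list record to the census list by exact key match.

    Returns the share of target records with no census link, together
    with the unlinked records themselves.
    """
    census_keys = {link_key(r) for r in census_list}
    nonlinks = [r for r in target_list if link_key(r) not in census_keys]
    return len(nonlinks) / len(target_list), nonlinks


# Invented records for illustration only.
target_list = [
    {"name": "Ana  Lopez", "dob": "1970-01-02", "tract": "1001"},
    {"name": "Ben Cruz", "dob": "1982-05-17", "tract": "1001"},
    {"name": "Chi Tran", "dob": "1990-11-30", "tract": "1002"},
    {"name": "Dee Park", "dob": "1965-07-04", "tract": "1003"},
]
census_list = [
    {"name": "ana lopez", "dob": "1970-01-02", "tract": "1001"},
    {"name": "Ben Cruz", "dob": "1982-05-17", "tract": "1001"},
    {"name": "Dee Park", "dob": "1965-07-04", "tract": "1003"},
]
rate, nonlinks = one_way_nonlink_rate(target_list, census_list)
```

The linkage runs in one direction only: census records absent from the target list are never examined, which is why the method is silent on gross overcoverage.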

Synthetic Estimation

Synthetic methods for estimating coverage involve combining information from a

coverage evaluation survey with demographic characteristics of the population of interest.20

The synthetic approach also has a long history (e.g., Gonzales, 1978) and makes use of

direct and indirect information. In fact, the intended application (in A.C.E. in 2000) of the

dual system estimator itself would have used synthetic estimation methods for nonsampled census areas.

In the intended A.C.E. 2000 application, nonsampled census blocks would have been

treated as an estimation domain to which coverage correction factors would have been

20 This is a specific application of synthetic estimation in general, in which survey estimates of some

specific characteristic are combined with demographic characteristics to construct the final estimand.





applied synthetically. (A description of the application of the synthetic method can be

found in Hogan, 2000, or Judson and Popoff, 2004.) The innovation proposed here is to

treat the foreign-born population, or some subset of the foreign-born population (such as recent arrivals), as an estimation domain, and an estimate of the unauthorized foreign-born

population as a separate estimation domain, rather than nonsampled census blocks, and

apply the dual system coverage estimates to them. This would require tabulating the foreign-born population enumerated in Census 2000, allocating the foreign-born

respondents to appropriate A.C.E. Revision II post-strata, determining the proportion of

the foreign-born population that falls into each stratum, constructing the coverage factors from A.C.E. Revision II post-strata, and calculating a synthetic estimate of the net coverage

factors for the foreign-born population as a whole.

The strength of this synthetic approach is that it would use the best available statistically-designed coverage data, rather than rely on demographic assumptions or the assumption

that one or more lists serve as a benchmark. A weakness is that it makes the implicit assumption

that the foreign born have the same coverage factors as the population as a whole within a

post-stratum (the synthetic assumption). If it is assumed that the foreign-born population has at best net coverage error equal to the population as a whole, then this method would

generate an upper bound net coverage error rate.
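
The synthetic calculation described above reduces to a weighted average, sketched below. The post-stratum labels, foreign-born counts, and coverage correction factors are invented for illustration and are not A.C.E. Revision II values; the function name is likewise hypothetical.

```python
def synthetic_coverage_factor(fb_counts, ccf):
    """Synthetic coverage correction factor for the foreign-born domain.

    fb_counts: foreign-born census counts by post-stratum.
    ccf: coverage correction factor by post-stratum, estimated for the
         population as a whole.
    Applies each post-stratum factor to the foreign-born count in that
    post-stratum (the synthetic assumption) and divides the corrected
    total by the enumerated total.
    """
    enumerated = sum(fb_counts.values())
    corrected = sum(n * ccf[s] for s, n in fb_counts.items())
    return corrected / enumerated


# Hypothetical inputs, invented for illustration only.
fb_counts = {"A": 600, "B": 300, "C": 100}
ccf = {"A": 1.00, "B": 1.02, "C": 1.05}
factor = synthetic_coverage_factor(fb_counts, ccf)
implied_net_undercount = 1 - 1 / factor
```

With these inputs the corrected total is 1,011 against 1,000 enumerated, a factor of 1.011. The result differs from the overall population's factor only because the foreign born are distributed differently across post-strata.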

Research Proposals

We have argued in this paper that assessing the potential undercoverage of the foreign-born population is an important task for the statistical community. We have presented

ethnographic, survey, and demographic evidence for such undercoverage. We have

presented new findings that suggest, but do not prove, that the foreign-born population

might have differential undercoverage both in the census and in ongoing surveys. We have shown that population estimates of public policy significance are highly sensitive to an

assumed rate.

Given these findings, it is natural to conclude that the statistical community has a

responsibility to find a way to estimate that number, the potential net undercoverage of the

foreign born. We shall now summarize a sequence of research tasks to approach that number, beginning with the easiest-to-implement and proceeding into more difficult

approaches.

The first method on our list is demographic benchmark studies. Pitkin and Park (2005)

demonstrated that birth registry data, combined with reasonable demographic assumptions, can be used to construct a benchmark population estimate from which an estimate of coverage could

be derived. Because Pitkin and Park's method is based on birth registry data, it does not suffer the weaknesses of the larger demographic analysis approach: it requires fewer

difficult-to-maintain demographic assumptions. This approach would provide a net

coverage error estimate.





The second method on our list is synthetic estimation. Assuming no congressional

mandate for a coverage estimate by nativity is promulgated, it is possible to cross-classify

the foreign-born on characteristics evaluated in the A.C.E. in 2000 (or those to be evaluated in Census Coverage Measurement in 2010). Using such cross-classification and published

coverage correction factors, it is a mathematical exercise to derive a synthetic estimate.

This approach would provide a net coverage error estimate.

The third method on our list is that of developing a direct survey. Marcelli and Ong (2002)

have demonstrated that direct enquiry, combined with demographic analysis, can begin to make headway in understanding the potential gross undercoverage of the foreign born.

Following Marcelli's arguments, it would appear that such a survey would best be fielded

by a trusted, non-governmental entity, and designed from bottom to top to allay

respondents' privacy concerns. This approach, as with one-way record linkage, would not provide information on gross overcoverage.

The fourth method on our list is a one-way record linkage study. We have described above

the technical limitations of a one-way record linkage, noting in particular that the assessment of coverage is biased by the presence of record linkage error. However, it

appears to us that results from such a study could be adjusted for the presence of such error (e.g., Judson, 2007: 497), yielding, at least, some direct information about coverage of

the foreign born. While this approach would provide information on gross undercoverage,

it would not provide information on gross overcoverage.
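
To illustrate what an adjustment for linkage error could look like, the sketch below applies a generic misclassification correction. It is not the method of Judson (2007). It assumes that the only linkage error is false nonlinks (enumerated persons who fail to link), that no false links occur, and that the false-nonmatch rate is known; the function name and the numbers are hypothetical.

```python
def adjust_nonlink_rate(observed_nonlink_rate, false_nonmatch_rate):
    """Correct an observed nonlink rate for false nonlinks.

    Assumed model: observed = u + (1 - u) * f, where u is the true gross
    undercoverage rate and f is the probability that an enumerated
    person fails to link. Solving for u gives (observed - f) / (1 - f).
    """
    if not 0 <= false_nonmatch_rate < 1:
        raise ValueError("false_nonmatch_rate must be in [0, 1)")
    adjusted = (observed_nonlink_rate - false_nonmatch_rate) / (1 - false_nonmatch_rate)
    # Sampling noise can push the estimate below zero; truncate at zero.
    return max(0.0, adjusted)


# Hypothetical rates, invented for illustration only.
adjusted = adjust_nonlink_rate(0.28, 0.10)
```

Under this model an observed nonlink rate of 28 percent with a 10 percent false-nonmatch rate implies a gross undercoverage rate of about 20 percent. Without the correction, every failed link is counted as a missed person.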

The fifth method involves testing the feasibility of a Post-enumeration Survey in the

context of the American Community Survey. According to Hogan (2008), Census 2010

will not have a nativity question; thus a dual system estimate using nativity as an estimation domain is not possible. However, the American Community Survey does include a nativity

question. With the development of appropriate statistical theory to account for the ACS's

complex sample design, a post-ACS survey, designed along similar lines as the existing Census Coverage Measurement system, would provide coverage measurements by nativity

(and presumably, other relevant characteristics).

Our sixth and final method involves testing nativity questions in a CCM framework for

post-2010 purposes. The sensitivity of nativity questions in a CCM framework might bias

the dual system estimator by inducing correlation bias amongst the foreign-born

population. While this is a reasonable concern, it warrants empirical testing. If, in fact, the impact of a nativity question is negligible, and a congressional mandate for such estimation

were in place, then there would be no reason not to apply the gold standard coverage

measurement technique to developing a statistically-principled estimate for the net coverage of the foreign born.





References

Agresti, A. 1990. Categorical Data Analysis. New York: Wiley.

Anderson, M. J. and Fienberg, S. E. 1999. Who Counts? The Politics of Census-Taking in Contemporary America. New York, NY: Russell Sage Foundation.

Arriaga, E.E., Johnson, P.D., and Jamison, E. 1994. Population Analysis with

Microcomputers, Vol. 1 and 2. Washington, D.C.: U.S. Department of Commerce, U.S.

Census Bureau, International Programs Center.

Bill, W. 2002. A.C.E. Revision II: Calculating Aggregate Data Defined, Correct

Enumeration, and Census Inclusion Rates (For Groups that Involve Aggregation Across Post-Strata). Online: http://www.census.gov/dmd/www/pdf/pp-40r.pdf.

Bruce, A., and Robinson, J. G. 2003. "The Planning Database: Its Development and Use as an Effective Targeting Tool in Census 2000," paper presented at the Annual Meetings of

the Southern Demographic Association, Arlington, VA, October 24, 2003.

Bruce, A., Robinson, J. G., and Sanders, M. V. 2001. "Hard-to-Count Scores and Broad Demographic Groups Associated with Patterns of Response Rates in Census 2000,"

Proceedings of the Social Statistics Section, American Statistical Association.

Camarota, S. and Capizzano, J. 2004. Assessing the Quality of Data Collected on the

Foreign Born: An Evaluation of the American Community Survey (ACS). Online:

http://www.sabresystems.com/whitepapers/CIS_whitepaper.pdf.

Camarota, S. 2006. Assessing the Quality of Data Collected on the Foreign Born:

An Evaluation of the American Community Survey (ACS). Paper presented at the U.S.

Census Bureau Conference, Immigration Statistics: Methodology and Data Quality. Alexandria, VA: February 13-14, 2006.

Cantwell, P.J., Hogan, H., and Styles, K.M. 2004. The Use of Statistical Methods in the U.S. Census: Utah v. Evans. The American Statistician, 58: 203-212.

Chandrasekar, C., and Deming, W.E. 1949. On a Method of Estimating Birth and Death Rates and the Extent of Registration. Journal of the American Statistical Association, 44:

101-115.

Clark, W.A.V. and Patel, S. 2004. Residential Choices of the Newly Arrived Foreign Born: Spatial Patterns and the Implications for Assimilation. California Center for

Population Research On-Line Working Paper Series, CCPR-026-04, February 2004.

Darga, K. 2000. Fixing the Census Until it Breaks: An Assessment of the Undercount

Adjustment Puzzle. Lansing, MI: Michigan Information Center.





De la Puente, M. 1993. Why Are People Missed Or Erroneously Included By The

Census: A Summary Of Findings From Ethnographic Coverage Reports. Research Conference on Undercounted Ethnic Populations. Richmond, VA: U.S. Census Bureau.

Deardorff, K. and Blumerman, L. 2001. Appendix A: Estimates of the Foreign-Born Population by Migrant Status: 2000. In Robinson, J.G. 2001. ESCAP II: Demographic

Analysis Results. Executive Steering Committee for A.C.E. Policy II, Report No. 1,

October 13, 2001. Online: http://www.census.gov/dmd/www/pdf/Report1.PDF.

Ellis, Y. 1995. Examination of Census Omission and Erroneous Enumeration Based on

1990 Ethnographic Studies of Census Coverage. Pp. 515-520 in Proceedings of the

American Statistical Association (Survey Research Methods Section). Alexandria, VA: American Statistical Association.

Ewbank, D.C. 1981. Age Misreporting and Age-Selective Underenumeration: Sources,

Patterns, and Consequences for Demographic Analysis. Washington, D.C.: National Academy Press.

Fay, R. E. 2001. The 2000 Housing Unit Duplication Operations and Their Effect on The

Accuracy Of The Population Count. Paper presented at the Annual Meeting of the

American Statistical Association, Atlanta, Georgia, August 5-9, 2001.

Fein, D. J. 1990. Racial and ethnic differences in U.S. census omission rates.

Demography, 27:285-302.

Fernandez, E.W., and Robinson, J. G. 1994. "Illustrative Ranges of the Distribution of

Undocumented Immigrants by State," Technical Working Paper No. 8. October 1994.

Online: http://www.census.gov/population/www/documentation/twps0008/twps0008.html.

Government Accountability Office 1998a. Decennial Census: Overview of Historical

Census Issues. GAO/GGD-98-103. Washington D.C.: U.S. Government Accountability Office.

Government Accountability Office 1998b. Immigration Statistics: Information Gaps,

Quality Issues Limit Utility of Federal Data to Policymakers. GAO/GGD-98-164. Washington D.C.: U.S. Government Accountability Office.

Hoefer, M., Rytina, N., and Baker, B. 2008. Estimates of the Unauthorized Immigrant Population Residing in the United States: January 2007. Online:

http://www.dhs.gov/xlibrary/assets/statistics/publications/ois_ill_pe_2007.pdf.

Hogan, H. 1992. "The 1990 Post-Enumeration Survey: An Overview." The American

Statistician, 46: 261-269.





Hogan, H. 1993. "The 1990 Post-Enumeration Survey: Operations and Results." Journal

of the American Statistical Association, 88:1047-1060.

Hogan, H. 2000. Accuracy and Coverage Evaluation: Theory and Application. Paper

presented at the 2000 Joint Statistical Meetings, Indianapolis, Indiana, August 2-5, 2000.

Hogan, H. 2008. Letter to Ms. Judith Droitcour, Assistant Director, Applied Research and

Methods, U.S. Government Accountability Office. Dated February 26, 2008.

Jones, J. 2003. Housing Unit Duplication in Census 2000. Census Bureau Evaluation

O.10. Washington, DC: U.S. Census Bureau. Online:

http://www.census.gov/pred/www/rpts/O.10.PDF.

Judson, D.H. 2006. Demographic Coverage Measurement: Can Information Integration

Theory Help? Paper presented at the 2006 Joint Statistical Meetings, Seattle, WA, August

6-10, 2006.

Judson, D.H. 2007. Information integration for constructing social statistics: history,

theory and ideas towards a research programme. Journal of the Royal Statisti
