
  • 8/2/2019 Coverage of FB 3-30-09 - V2 - DHJ


    DRAFT DO NOT CITE OR QUOTE

    Coverage of the Foreign-Born Population in Censuses and Surveys: What Do We Think, What Do We Know, and What Can We Prove?

    By Dean H. Judson [1]

    Abstract

    It is widely assumed that surveys and the decennial census capture the foreign-born population in general, and the unauthorized foreign-born population in particular, less completely than the native-born population. Many different estimates of this undercoverage have been proffered, and all have significant limitations. This paper summarizes research on this coverage question. We first describe the methods used to correct net coverage error in censuses and surveys, and discuss the effect of net coverage error on censuses, surveys, and derived products. We then describe the history of various coverage assertions or assumptions used in the literature (what do we think?) and assess their relative merit (what do we know?). Finally, we attempt to assess what is provable about coverage of the foreign born (what can we prove?). We conclude by suggesting a research agenda that would address this coverage question in a statistically principled way.

    Keywords: foreign-born coverage, coverage measurement, coverage correction, information integration, the coverage question

    [1] This report was completed while the author was Senior Statistician for the Office of Immigration Statistics. It is released to inform interested parties of ongoing research and to encourage discussion of work in progress. The views expressed on statistical and methodological issues are those of the author and not necessarily those of the Office of Immigration Statistics or the Department of Homeland Security.


    Introduction: the Coverage Question
    Nonsampling Error
    Net Coverage Error
    Why is Net Coverage Error Important?
    Correcting for Net Coverage Error in Censuses and Surveys
    The Census: Count Imputation and Post-Enumeration Survey Coverage Correction Factors
    The Effect of Net Coverage Error on the Census
    Surveys: Weights and Population Controls
    The Effect of Net Coverage Error on Survey Estimates
    CPS
    ACS
    Other Surveys
    Adjustments for Potential Net Coverage Error
    What has been Asserted about Coverage of the Foreign Born?
    What is Provable About Coverage of the Foreign Born?
    Ethnographic Studies Suggest that the Foreign Born Avoid Detection
    The Foreign Born Tend to Respond Later to Surveys
    States with Higher Levels of Foreign Born, Particularly Unauthorized or Recent Arrivals, Tend to Have Lower Coverage Ratios
    The Foreign Born, Particularly Recent Arrivals, Tend to Live in Areas with Higher Hard-to-Count Scores
    Demographic Characteristics of Individual Foreign Born Persons, Particularly Recent Arrivals, Correlate With Hard to Count Indices
    What Can be Done to Measure Coverage of the Foreign Born?
    Coverage Measurement Alternatives: Summary
    Post Enumeration Surveys/Dual System Estimation
    Demographic Benchmarking
    Demographic Analysis
    Direct enquiries
    One-Way Record Linking
    Synthetic Estimation
    Research Proposals
    References
    Tables and Figures


    Introduction: the Coverage Question

    When conducting a census or survey, it is important that its operations properly represent the target population of interest. In general, the relationship between the enumerated or sampled group and the target population is referred to as the coverage of that population. Without proper coverage there is coverage error, and the survey estimates or census enumeration are biased with respect to the population. This fact is especially important for derived estimates, such as estimates of the unauthorized population, which are already difficult because of data gaps.

    As of 2007, the foreign-born population of the United States was estimated at about 38 million people, or 12.6% of a total population of about 302 million (U.S. Census Bureau, American FactFinder, 2008). It is widely assumed that, for various reasons, the foreign-born population is particularly likely to be subject to coverage error in surveys and the census (Marcelli and Ong, 2002; Camarota, 2006). If true, this would have wide-ranging implications for the census itself, for ongoing surveys, and for estimates derived from them. These include estimates of the naturalized, legal permanent resident, refugee/asylee, and unauthorized subpopulations. We will refer to this question as "the coverage question." We will repeatedly use the phrase "potential undercoverage" to emphasize that undercoverage remains only a possibility in the absence of hard proof.[2]

    The purpose of this paper is to address the coverage question comprehensively. It also attempts to define an agenda for converting assumptions about foreign-born undercoverage into proof of its existence and magnitude. We will do so in the following steps:

    First, we will address the sources of error in surveys and the census;
    Second, we will review the known effects of coverage error on surveys and the census, focusing on the foreign born where possible;
    Third, we will summarize attempts and recommendations for measuring coverage;
    Fourth, we will assess which measurements of coverage are most viable; and
    Finally, we will suggest a research agenda to attack the coverage question definitively.

    In what follows, we will distinguish carefully among three kinds of knowledge in this area. The first is what an individual researcher asserts to be true (what do we think?). The second is what the community of researchers generally agrees on (what do we know?). The third is concrete findings (what can we prove?).[3]

    [2] This is not a new concern; cf. Siegel, 1976:15: "This report has tried to develop the view that it is not a practical goal to estimate directly the number of illegal aliens [sic] ... Thus, the effort to estimate the number of illegal aliens becomes principally an effort to measure the coverage of the total population by nativity" (italics added).

    [3] For the reader's interest, this three-part partition of knowledge is derived from the excellent movie And the Band Played On, which is based on the book of the same name about the AIDS epidemic (Shilts, 1987).


    Coverage error is one component of the total error in survey and census data. The others are sampling variability and other forms of nonsampling error. To describe coverage error more fully, we begin with a general discussion of sampling variability and nonsampling error in surveys and the census.

    Sampling Variability

    The sampling error of a sample survey can be measured in several ways. The first measure usually desired is the variance of the sample estimate. This is the average, over all possible samples, of the squared deviations of the estimates from their expected value. An estimate of the variance can be obtained from the sample survey data themselves. If there are nonsampling errors or the sample is biased, the deviations are taken around the true value of the statistic, and the measure is called the mean square error. Typically, the variance is denoted by σ² and the mean square error by MSE. Of these two measures, the MSE is more general, as its formula shows. Suppose that p is the value being estimated and p̂ is the estimator of p. Then the MSE of p̂ is given by:

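    $$
    \begin{aligned}
    \mathrm{MSE}(\hat{p}) &= E(\hat{p} - p)^2 = E\big[\hat{p} - E(\hat{p}) + E(\hat{p}) - p\big]^2 \\
                          &= E\big[\hat{p} - E(\hat{p})\big]^2 + \big[E(\hat{p}) - p\big]^2 \\
                          &= \operatorname{var}(\hat{p}) + \operatorname{bias}^2(\hat{p})
    \end{aligned}
    \tag{1}
    $$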

    If p̂ is unbiased, the MSE is simply the variance. In the presence of coverage error (or other kinds of nonsampling error), p̂ is biased, and the squared bias adds to the mean squared error.[4]
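    The decomposition in equation (1) can be checked numerically. The Python sketch below is a hypothetical illustration and is not taken from any cited source. It models undercoverage as a frame that omits some population units, and it enumerates every simple random sample drawn from that frame. It then computes the exact MSE, variance, and bias of the sample mean using rational arithmetic. The population values and the `mse_decomposition` helper are invented for illustration.

```python
from fractions import Fraction
from itertools import combinations
from statistics import mean


def mse_decomposition(population, covered, n):
    """Exact MSE, variance, and bias of the sample mean when only the
    covered units can be sampled (a toy model of undercoverage).

    Every simple random sample of size n from the covered units is
    enumerated, so the expectations are exact rather than simulated.
    """
    p = mean(Fraction(y) for y in population)            # true target value
    frame = [Fraction(y) for y, c in zip(population, covered) if c]
    estimates = [mean(s) for s in combinations(frame, n)]  # every value of p-hat
    e_phat = mean(estimates)                             # E(p-hat)
    var = mean((e - e_phat) ** 2 for e in estimates)     # var(p-hat)
    bias = e_phat - p                                    # E(p-hat) - p
    mse = mean((e - p) ** 2 for e in estimates)          # E(p-hat - p)^2
    return mse, var, bias


# Hypothetical population: the frame misses the two largest values.
population = [2, 4, 4, 5, 7, 9, 10, 15]
covered = [True, True, True, True, True, True, False, False]

mse, var, bias = mse_decomposition(population, covered, n=3)
print(f"MSE = {float(mse):.4f}, var = {float(var):.4f}, bias^2 = {float(bias ** 2):.4f}")
```

    The omitted units have the largest values, so the estimator is biased downward, with a bias of -11/6. The MSE exceeds the variance by exactly the squared bias. Drawing larger samples from the same frame would shrink the variance but leave that term unchanged.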

    Nonsampling Error

    Besides the error caused by sampling variability, demographic data contain another component of total error. Nonsampling error affects all surveys, whether or not sampling is used, including the 100% surveys known as censuses. This component arises from mistakes made in eliciting, recording, and processing the response of an individual unit in the surveyed population. Every operation in a census or sample survey, and every factor within an operation, may contribute to nonsampling error. Lessler and Kalsbeek (1992:9) classify survey errors into four types:

    Frame errors: problems with developing lists of units to be sampled, such as duplication or omission of units;
    Sampling errors: error or variance associated with sampling itself;
    Nonresponse errors: errors arising when a unit that should have been included in the survey is not, either through omission by the surveyor or through nonresponse or refusal by the respondent; and
    Measurement errors: errors associated with data collection operations, including the contributions of the respondent, interviewer, and questionnaire.

    [4] Lohr (1999:256-258) develops a simple model illustrating the biasing effect of nonresponse. Let $N_M$ be the number of nonrespondents in a survey, $N$ the total number sampled, $\bar{p}_{RU}$ the mean response for respondents, $\bar{p}_{MU}$ the mean response for nonrespondents, and $\bar{p}_U$ the mean response for the population as a whole. The bias induced by nonresponse is then approximately $\frac{N_M}{N}\left(\bar{p}_{RU} - \bar{p}_{MU}\right)$. This bias is small only if $N_M/N$ is small (there is little nonresponse) or $\bar{p}_{RU} - \bar{p}_{MU}$ is small (respondents are not much different from nonrespondents).
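    Nonresponse errors, the third type above, can be illustrated with the approximation from Lohr in footnote 4. The sketch below is a hypothetical Python example with invented group sizes and means. At the level of the respondent and nonrespondent groups, the relationship is an exact identity. Lohr's "approximately" reflects using a sample respondent mean in place of the respondent-group mean.

```python
from fractions import Fraction


def nonresponse_bias(n_resp, mean_resp, n_nonresp, mean_nonresp):
    """Return the bias of the respondent mean, computed two ways.

    direct:  mean_resp minus the mean of respondents and nonrespondents combined
    formula: (N_M / N) * (mean_resp - mean_nonresp), as in footnote 4
    """
    n_total = n_resp + n_nonresp
    mean_all = (n_resp * mean_resp + n_nonresp * mean_nonresp) / n_total
    direct = mean_resp - mean_all
    formula = Fraction(n_nonresp, n_total) * (mean_resp - mean_nonresp)
    return direct, formula


# Invented example: 800 respondents, 10% of whom have some characteristic,
# and 200 nonrespondents, 25% of whom have it.
direct, formula = nonresponse_bias(800, Fraction(1, 10), 200, Fraction(1, 4))
print(direct, formula)  # -3/100 -3/100
```

    In this example, 20% nonresponse combined with a 15-point gap between respondents and nonrespondents produces a 3-point downward bias. Halving either factor halves the bias.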

    Factors common to all of these operations may also be sources of nonsampling error. These include training procedures, supervision, and the technical-staff intervention associated with each operation. Operations conducted in the office after the data are collected are typically easier to control than field operations. It is also generally possible, by studying the office operations themselves, to measure the errors those operations introduce into the field data. Nonsampling error made by the respondent, in interaction with the interviewer and the questionnaire, is more serious and harder to measure than errors arising from other operations. For this reason, nonsampling error is often called response error. A typical example of response error is the tendency of persons in many countries to report ages ending in zero and five (Ewbank, 1981). Such response error often requires special detection and smoothing methods, which are described in Arriaga, Johnson, and Jamison (1994) and in Judson and Popoff (2004).
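    Whipple's index is one common way to detect the age heaping described above. It is offered here as a standard demographic example and is not necessarily the method used in the works cited. The minimal Python sketch below assumes that single-year age counts are available as a dictionary. The `whipple_index` name and both age distributions are invented for illustration.

```python
def whipple_index(counts_by_age):
    """Whipple's index of age heaping on terminal digits 0 and 5.

    counts_by_age maps a single year of age to a reported count. The index
    compares the counts at ages 25, 30, ..., 60 with one fifth of the total
    at ages 23-62. A value of 100 means no heaping; 500 means everyone in
    that age range reported an age ending in 0 or 5.
    """
    window = range(23, 63)  # ages 23 through 62 inclusive
    total = sum(counts_by_age.get(age, 0) for age in window)
    heaped = sum(counts_by_age.get(age, 0) for age in window if age % 5 == 0)
    return 100 * heaped / (total / 5)


# Invented age distributions (not real data).
smooth = {age: 100 for age in range(100)}
heaped = {age: (300 if age % 5 == 0 else 50) for age in range(100)}
print(whipple_index(smooth))  # 100.0
print(whipple_index(heaped))  # 300.0
```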

    Net Coverage Error

    Coverage error takes two forms: overcoverage and undercoverage. Overcoverage means that a unit appears in the sample or enumeration more than once, or appears when it should not appear at all. Undercoverage means that a unit that should appear is missed. The difference between overcoverage and undercoverage is called net coverage error. Net coverage error can be positive, negative, or zero. A net error of zero can conceal substantial overcoverage and undercoverage that offset each other. Net coverage error can also vary across geographic, economic, or demographic subgroups. One group could have a positive net coverage error and another a negative one, so that the first group is overcovered and the second undercovered. This situation is known as differential coverage error.
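    The distinction between net and gross error can be shown with a few hypothetical subgroup tallies. So can the way differential coverage error hides behind a net error of zero. In this Python sketch, the `GroupCoverage` class and all counts are invented for illustration. The sign convention follows the text: net coverage error equals overcoverage minus undercoverage.

```python
from dataclasses import dataclass


@dataclass
class GroupCoverage:
    """Invented coverage tallies for one subgroup."""
    name: str
    overcount: int   # duplicates and erroneous enumerations
    undercount: int  # persons who should have been counted but were missed

    @property
    def net_error(self) -> int:
        # Net coverage error = overcoverage - undercoverage
        # (positive: net overcoverage; negative: net undercoverage)
        return self.overcount - self.undercount

    @property
    def gross_error(self) -> int:
        # Total coverage mistakes, regardless of direction
        return self.overcount + self.undercount


groups = [
    GroupCoverage("group A", overcount=50, undercount=50),
    GroupCoverage("group B", overcount=80, undercount=20),
    GroupCoverage("group C", overcount=10, undercount=70),
]
for g in groups:
    print(f"{g.name}: net {g.net_error:+d}, gross {g.gross_error}")

total_net = sum(g.net_error for g in groups)
total_gross = sum(g.gross_error for g in groups)
print(f"all groups: net {total_net:+d}, gross {total_gross}")
```

    Across all three groups the net error is zero, even though 280 persons were counted in error. Groups B and C have offsetting differential errors of +60 and -60.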

    Before turning to the issues specific to the foreign born, it is useful to discuss the general causes of overcoverage and undercoverage. Several authors have enumerated sources of coverage error (Fein, 1990; de la Puente, 1993; Anderson and Fienberg, 1999; Judson and Popoff, 2004), and we summarize them here.

    One source of overcoverage in Census 2000 was erroneous enumeration and duplication in the Master Address File frame (Jones, 2003). The operations that built the Master Address File created some units that were later found to be erroneous enumerations. They also created multiple records for the same address. (For example, the same physical location might be written in several ways, and the file would contain all of them.) A comparison of demographic housing benchmarks with the 1999 Decennial Master Address File, the control file for subsequent operations, implied a 5.3% overcoverage of addresses in that file. This led to overenumeration when an address received multiple forms or in-person followups (Fay, 2001).
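    A toy normalization step can illustrate how one address ends up as several records. This Python sketch is hypothetical. The abbreviation table, the list of unit designators, and the `normalize` and `find_duplicate_records` helpers are assumptions made for illustration. They do not reflect the Census Bureau's actual Master Address File procedures.

```python
import re
from collections import defaultdict

# Hypothetical standardization rules; real address files use far richer ones.
ABBREVIATIONS = {"north": "n", "south": "s", "street": "st", "avenue": "ave"}
UNIT_WORDS = {"apt", "apartment", "unit"}


def normalize(address: str) -> str:
    """Reduce an address string to a crude comparison key."""
    text = address.lower().replace("#", " ")
    tokens = re.findall(r"[a-z0-9]+", text)  # drops punctuation such as '.' and ','
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens if t not in UNIT_WORDS)


def find_duplicate_records(addresses):
    """Group address records that normalize to the same key."""
    groups = defaultdict(list)
    for address in addresses:
        groups[normalize(address)].append(address)
    return {key: recs for key, recs in groups.items() if len(recs) > 1}


address_file = [
    "12 North Main Street, Apt 4",
    "12 N. Main St #4",
    "12 N Main St Unit 4",
    "14 N Main St Apt 4",
]
duplicates = find_duplicate_records(address_file)
distinct_units = len({normalize(a) for a in address_file})
print(duplicates)
print(f"{len(address_file)} records, {distinct_units} distinct units")
```

    Suppose each of the three records for the same unit received a questionnaire or a followup visit. The household could then be enumerated more than once. Overly aggressive normalization carries the opposite risk: it could merge genuinely distinct units.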

    Overcoverage can also be caused by accidental person duplication. For example, a census form might enumerate a student living away from home at college, while the student's parental household also enumerates that same student. Similarly, persons in assisted-living situations might be enumerated both at a group quarters facility and at a household. The Census Bureau estimates a lower bound of 5.8 million such person duplicates in Census 2000 (Mule and Fenstermaker, 2003).

    Undercoverage, by contrast, comes from many sources, and the Census Bureau lists many of them (de la Puente, 1993; Darga, 2000; Camarota, 2006):

    high mobility;
    rentership;
    language barriers;
    neighborhood resistance;
    irregular housing;
    nonstandard living arrangements;
    loose attachment to a particular household; and
    concerns regarding confidentiality.

    To this list might be added an active desire for concealment (Tourangeau et al., 1997; Ellis, 1995; Valentine and Valentine, 1971).

    Why is Net Coverage Error Important?

    Estimates from a census or survey are typically not derived directly from the raw responses. Even with extensive operational quality control and intensive effort, the data as collected do not by themselves represent the population.

    As Shapiro and Kostanich (1988:443) state: "we believe there is not general awareness of the deleterious effects of response error, and it is rarely estimated. In poorly designed and conducted household surveys, there can be many serious problems. In even the best household surveys, however, undercoverage and response error tend to be high and, in our opinion, are the two most important problems in the sample survey field" (italics added).

    To deal with this, census and survey organizations apply a variety of processing steps and adjustments so that the data represent the population of interest. (These steps differ for a 100% enumeration and a sample, so we address them separately.) Typically, the goal is to represent the population as a whole, not necessarily any particular subgroup. The question we will therefore address is this: do these adjustments correct for potential undercoverage of the foreign-born population, and if not, how much undercoverage remains?

    Correcting for Net Coverage Error in Censuses and Surveys

    In this section we describe attempts to correct for net coverage error in the census and in surveys. The approaches differ slightly because the data collection contexts differ (100% enumeration versus sample selection). We will not consider item imputation, which is needed when item nonresponse occurs, that is, when one or more data elements are missing from a survey or census form. We will, however, consider unit imputation caused by unit nonresponse, where entire sample or enumeration units are missed.

    The Census: Count Imputation and Post-Enumeration Survey Coverage Correction Factors

    In the census context, extensive efforts are made to achieve a complete enumeration. To name a few, these include advertising campaigns, school presentations, state and local partnership programs, careful organization, quality control checks, questionnaires in multiple languages, querying neighbors, and multiple nonresponse followups.

    Despite all these efforts, nothing is known about the composition of a small fraction of housing units, not even from a neighbor. In Census 2000 this fraction was about 1.4% of housing units (Zajac, 2003). In this case imputation is used.[5] Imputation typically takes two forms: "status imputation" and "count imputation." Status imputation determines whether the nonresponding unit should be considered occupied or vacant. If it is occupied, count imputation determines how many persons should have been enumerated there but were not. In Census 2000, both kinds of imputation used the hot deck technique, which typically chooses a nearest neighbor as the donor.[6] Item imputation is then used to fill in the characteristics of the imputed people.
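    The Python sketch below illustrates the general idea of a sequential nearest-neighbor hot deck for status and count imputation. It is a simplified, hypothetical illustration, not the Census 2000 production algorithm. The helper names, the data, and the tie-breaking rule that prefers the preceding unit are all assumptions.

```python
def nearest_donor(units, i, is_eligible):
    """Index of the unit closest to position i (in list order) that is eligible.

    Ties are broken in favor of the preceding unit.
    """
    for dist in range(1, len(units)):
        for j in (i - dist, i + dist):
            if 0 <= j < len(units) and is_eligible(units[j]):
                return j
    raise ValueError("no eligible donor")


def hot_deck_impute(units):
    """Status and count imputation with a nearest-neighbor hot deck.

    units: dicts in geographic order with 'status' ('occupied', 'vacant',
    or None if unknown) and 'count' (persons, or None if unknown).
    Donors are drawn only from reported values, never from imputed ones.
    """
    reported = [dict(u) for u in units]
    result = [dict(u) for u in units]
    for i, unit in enumerate(result):
        if unit["status"] is None:  # status imputation
            d = nearest_donor(reported, i, lambda u: u["status"] is not None)
            unit["status"] = reported[d]["status"]
        if unit["status"] == "vacant":
            unit["count"] = 0
        elif unit["count"] is None:  # count imputation
            d = nearest_donor(
                reported, i, lambda u: u["status"] == "occupied" and u["count"] is not None
            )
            unit["count"] = reported[d]["count"]
    return result


# Hypothetical block of five housing units in address-list order.
block = [
    {"status": "occupied", "count": 3},
    {"status": None, "count": None},        # nothing known, even from neighbors
    {"status": "vacant", "count": 0},
    {"status": "occupied", "count": None},  # occupied, but size unknown
    {"status": "occupied", "count": 5},
]
imputed = hot_deck_impute(block)
print([(u["status"], u["count"]) for u in imputed])
```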

    For Census 2000, an ambitious coverage correction program was developed. It built on the Post-Enumeration Survey (PES) and the Post-Enumeration Program (PEP) used in earlier censuses. This program, the Accuracy and Coverage Evaluation (A.C.E.), reenumerated a sample of census blocks, from address frame development through household enumeration by specially designated interviewers. The A.C.E. sample was large enough that it could potentially have been used to adjust the census for net coverage error.

    The statistical theory behind the A.C.E. was not, as some suppose, to get the enumeration right in the A.C.E. and then use it to correct the census. It instead used dual system estimation theory. This theory assumes that the two enumerations are independent, not that one is superior to the other. The independence assumption is critical to understanding two difficulties: incorporating nativity questions into the analysis in 2010, and using other data sources (such as recent Lawful Permanent Resident admissions) to estimate coverage. We deal with these technical matters later in this paper.
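    The simplest form of dual system estimation shows why the independence assumption matters. This is the Lincoln-Petersen, or capture-recapture, estimator. The production A.C.E. estimator was more elaborate: within each post-stratum, it also adjusted for erroneous enumerations and other factors. The hypothetical Python sketch below shows only the core ratio. It uses invented counts to show what happens when the two enumerations miss the same people.

```python
def dual_system_estimate(n_census, n_survey, n_matched):
    """Lincoln-Petersen (capture-recapture) dual system estimate.

    n_census:  persons correctly enumerated by the census in a domain
    n_survey:  persons found by the independent coverage survey
    n_matched: persons found by both
    If the two enumerations are independent, n_matched / n_survey estimates
    the census capture rate, so the total is n_census * n_survey / n_matched.
    """
    if n_matched <= 0:
        raise ValueError("dual system estimation needs at least one match")
    return n_census * n_survey / n_matched


# Hypothetical domain with 1,000 true residents; each operation counts 900.
independent = dual_system_estimate(900, 900, 810)  # 810 = 900 * 900 / 1000
correlated = dual_system_estimate(900, 900, 880)   # both miss the same people
print(f"independent misses: {independent:.1f}")
print(f"correlated misses:  {correlated:.1f}")
```

    Suppose both enumerations tend to miss the same hard-to-count people. The match rate then overstates the census capture rate, and the estimate falls short of the true total: about 920 here instead of 1,000. Correlated misses of this kind, often called correlation bias, lead dual system estimates to understate the true population.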

    Under the independence assumption, an estimate of the true count is constructed for each of a set of predefined post-strata, or estimation domains. There were 840 such domains in 1990 and 416 in 2000, after collapsing. (For example, American Indians living on reservations form one estimation domain; Hispanic renters form another.) Coverage correction factors were to be derived from these domains and used in non-sampled blocks to adjust the census (see U.S. Census Bureau, 2004, for a detailed design document). The Census Bureau did not commit to adjusting the census. Instead, staff conducted a series of evaluations, and on that basis the Census Bureau was to recommend whether or not to adjust.

    [5] Technically, Zajac (2003) calls this "substitution," to distinguish it from "assignment" and "allocation," two forms of item imputation.

    [6] These imputations were the subject of a lawsuit against the Census Bureau by the state of Utah. Utah claimed that hot deck imputation was a form of prohibited sampling for nonresponse; the Census Bureau claimed otherwise. The Supreme Court decided in the Census Bureau's favor by a 5-4 margin. See Cantwell, Hogan, and Styles (2004) for a summary of Utah v. Evans.
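    The following hypothetical Python sketch shows how post-stratum results might have been carried to non-sampled blocks. It takes a coverage correction factor to be the ratio of the dual system estimate to the census count for a post-stratum. A block's adjusted count is then the sum, over its post-strata, of the census count times that factor. The post-strata, counts, and helper names are invented. The result is left unrounded and ignores how fractional persons would be handled in practice.

```python
# Hypothetical post-stratum totals from the coverage survey sample.
dse_by_stratum = {"owners": 10_100.0, "renters": 5_300.0}
census_by_stratum = {"owners": 10_000, "renters": 5_000}


def coverage_correction_factors(dse, census):
    """CCF for each post-stratum: dual system estimate / census count."""
    return {stratum: dse[stratum] / census[stratum] for stratum in dse}


def adjust_block(block_counts, ccf):
    """Apply post-stratum CCFs to a non-sampled block's census counts."""
    return sum(count * ccf[stratum] for stratum, count in block_counts.items())


ccf = coverage_correction_factors(dse_by_stratum, census_by_stratum)
block = {"owners": 40, "renters": 20}  # census counts in one non-sampled block
adjusted = adjust_block(block, ccf)
print(ccf)
print(f"census count {sum(block.values())}, adjusted {adjusted:.1f}")
```

    Renters carry a larger factor than owners here. Two blocks with the same census count would therefore receive different adjustments depending on their composition. This is how the design was meant to carry differential coverage correction down to small areas.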

    The history of the A.C.E. is documented elsewhere, both by the Census Bureau and by others (Anderson and Fienberg, 1999; U.S. Census Bureau, 2004). In sum, three problems militated against using the A.C.E. to adjust the decennial results: duplication, respondents' interpretation of residence rules, and nonindependence between the original enumeration and the A.C.E. enumeration. After several months of evaluation, three decisions were made in sequence (Mulry, 2006):

    1) the A.C.E. could not be used for congressional apportionment;
    2) the A.C.E. could not be used for congressional redistricting; and
    3) the A.C.E. should not be used to adjust the postcensal population estimates base.

    The Census Bureau has not made public any plans to consider adjustment in 2010.

    The Effect of Net Coverage Error on the Census

    The central purpose of the decennial census is to provide the constitutional basis for apportioning representatives among the states. A second purpose is to provide small-area data for congressional redistricting (Government Accountability Office, 1998a).

    The third purpose of a census is the most germane to this paper. The census serves as a population base for making postcensal population estimates at many levels of geography: national, state, county, and subcounty. These postcensal estimates are used to distribute federal funds, but we will not focus on these distributive effects. For this paper, their most important use is as survey population controls, described below. To the extent that the census itself has differential net coverage error, that error will be propagated forward into postcensal estimates and projections.
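    The demographic balancing equation shows why a base-year error persists. Under this equation, each year's estimate equals the previous estimate plus births, minus deaths, plus net international migration. For simplicity, this hypothetical Python sketch assumes that the components of change are measured without error, so the only error lies in the census base. All figures are invented.

```python
def postcensal_series(base, components):
    """Carry a census base forward with the demographic balancing equation:
    P(t+1) = P(t) + births - deaths + net international migration."""
    series = [base]
    for births, deaths, net_migration in components:
        series.append(series[-1] + births - deaths + net_migration)
    return series


# Invented figures: the true base is 1,000,000 but the census misses 20,000.
components = [(14_000, 8_000, 3_000)] * 5  # assumed to be measured without error
true_series = postcensal_series(1_000_000, components)
census_series = postcensal_series(980_000, components)
errors = [c - t for c, t in zip(census_series, true_series)]
print(errors)  # [-20000, -20000, -20000, -20000, -20000, -20000]
```

    Under this assumption, the 20,000-person shortfall in the base carries forward unchanged into every postcensal year. Any survey controlled to these estimates inherits it.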

    Robinson et al. (2003) illustrate these impacts by comparing the Demographic Analysis (DA) method with unadjusted decennial enumerations. Figure one, taken from that paper, shows net coverage error[7] by 5-year age group, sex, and two race categories.

    -- Insert figure one about here --

    As the figure shows, comparing decennial census results with the independent DA system of coverage measurement reveals important patterns by race, sex, and age. Ages 0-4 and 5-9 tend to show net undercoverage for all race groups. Outside these ages, nonblack females show slight net overcoverage, while black males show substantial net undercoverage at ages 20-64. Black females and nonblack males also show slight undercoverage at approximately ages 30-59. Similar patterns were detected in the 1990 census (Robinson et al., 2003).

    [7] The term used at the time was "net census undercount," with a negative net undercount implying an overcount. We prefer the term "net coverage error" in this context. The race detail that DA can provide is limited to black, white, and all other races, and the latter two are typically combined into a "nonblack" category. This limitation comes from the data sources used to construct the DA estimate. Given our interest in coverage of the foreign born, it is a significant one.

    To the extent that these coverage errors remain in an unadjusted census, they propagate forward into postcensal estimates. They also propagate into surveys that use those estimates as poststratification controls.

    Surveys: Weights and Population Controls

    In the survey environment, adjustments fall into two primary groups: those based on survey design, and those based on population controls derived from the previous census. Coverage correction typically takes the form of a series of weights, each of which attempts to correct for one form of net coverage error. This section uses a typical example based on the American Community Survey. Similar techniques are used in other surveys, such as the Current Population Survey Annual Social and Economic Supplement (ASEC).

    Weights are applied in phases:[8]

    Phase one: The sampling design is taken into account. This weight reflects the combined probability of selection of the final sampled unit. A sample, by construction, does not cover the entire population of interest, so this is the first correction: it corrects for unequal sampling probabilities.

    Phase two: Any nonresponse followup design is taken into account. In the American Community Survey, about one in three households that respond to neither the mailing nor the telephone followup is sent to personal interview followup. By design, then, about two-thirds of the nonresponders are not followed up in person. This weight accounts for that by upweighting the households that receive personal interviews.

    Phase three: After phases one and two, some households remain noninterviews. (Some have called these "hard-core nonresponders" or used similar language.) Little is known about these households beyond their geographic location and some limited information that interviewers can obtain by proxy. Phase three weights, or noninterview weights, typically adjust the sampled and responding units to match the geography and information available on the nonresponders. Up to this point, the weights are purely survey-oriented.

    Phase four: After phase three, poststratification occurs. In this phase, independent housing unit estimates, typically constructed by demographers, are used as control totals. That is, the estimated survey housing unit values are controlled to the independent controls. The purpose is to remove nonresponse bias from the mean squared error.

    Phase five: Finally, another round of poststratification occurs. In this phase, independent population estimates, again constructed demographically, are used as control totals for person records. The estimated survey person characteristics (some combination of age, race, sex, and Hispanic origin) are controlled to the independent controls. As with the housing unit controls, the purpose is to remove nonresponse bias from the mean squared error.

    [8] These phases are necessarily a summary; they approximate the steps without being exact specifications. Typically the weights are combined multiplicatively to produce a final adjustment weight.
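    The phased, multiplicative weighting described above can be sketched in Python. This is a simplified hypothetical illustration, not the ACS or CPS specification. Records are treated as single persons, and the cells, groups, selection probabilities, and controls are all invented. A single ratio-adjustment function stands in for both poststratification phases; phase four would apply the same step to housing-unit totals. The sketch assumes that every cell and control group contains at least one interviewed record.

```python
from collections import defaultdict

# Hypothetical person records; every value is invented for illustration.
records = [
    {"p_select": 0.01, "mode": "mail", "interviewed": True, "cell": "A", "group": "M"},
    {"p_select": 0.01, "mode": "mail", "interviewed": True, "cell": "A", "group": "F"},
    {"p_select": 0.01, "mode": "personal", "interviewed": True, "cell": "A", "group": "M"},
    {"p_select": 0.01, "mode": "personal", "interviewed": False, "cell": "A", "group": None},
    {"p_select": 0.02, "mode": "mail", "interviewed": True, "cell": "B", "group": "F"},
    {"p_select": 0.02, "mode": "mail", "interviewed": True, "cell": "B", "group": "M"},
]


def phase1_base_weight(recs):
    # Inverse of the (combined) probability of selection
    for r in recs:
        r["w"] = 1.0 / r["p_select"]


def phase2_followup(recs, followup_factor=3):
    # One in three nonrespondents goes to personal interview, so each such
    # case also stands in for two nonrespondents who were not followed up
    for r in recs:
        if r["mode"] == "personal":
            r["w"] *= followup_factor


def phase3_noninterview(recs):
    # Within each cell, shift the weight of noninterviews onto interviews
    total, interviewed = defaultdict(float), defaultdict(float)
    for r in recs:
        total[r["cell"]] += r["w"]
        if r["interviewed"]:
            interviewed[r["cell"]] += r["w"]
    for r in recs:
        r["w"] = r["w"] * total[r["cell"]] / interviewed[r["cell"]] if r["interviewed"] else 0.0


def poststratify(recs, controls):
    # Ratio-adjust interviewed records so weighted totals equal the controls;
    # returns the pre-adjustment weighted totals by group
    estimate = defaultdict(float)
    for r in recs:
        if r["interviewed"]:
            estimate[r["group"]] += r["w"]
    for r in recs:
        if r["interviewed"]:
            r["w"] *= controls[r["group"]] / estimate[r["group"]]
    return dict(estimate)


phase1_base_weight(records)
phase2_followup(records)
phase3_noninterview(records)
before = poststratify(records, controls={"M": 800.0, "F": 300.0})

after = defaultdict(float)
for r in records:
    if r["interviewed"]:
        after[r["group"]] += r["w"]
print("before controls:", before)
print("after controls: ", dict(after))
```

    Before poststratification, the weighted totals are 690 males and 210 females, against controls of 800 and 300. The final phase closes those gaps entirely. It would do so whether or not the respondents in each group resemble the people who were missed, which is the concern Lohr raises about weighting adjustments.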


    Lohr (1999:272) offers this caution about this sequence of adjustment weights:

        "The models for weighting adjustments for nonresponse are strong: in each weighting cell, the respondents and nonrespondents are assumed to be similar. These models never exactly describe the true state of affairs, and you should always consider their plausibility and implications. It is an unfortunate tendency of many survey practitioners to treat the weighting adjustment as a complete remedy and to then act as though there were no nonresponse" (italics added).

    The Effect of Net Coverage Error on Survey Estimates

    The effect of net coverage error on survey estimates can be summarized by the coverage ratio (Shapiro and Kostanich, 1988). A coverage ratio compares two figures: the survey estimate of the number of people with a particular characteristic, before the final adjustment to population controls, and the corresponding independent population estimate based on updated decennial census figures. For example, a coverage ratio of .95 for males aged 50 to 59 means that the survey estimate for this subpopulation is 95% of the updated census-based population estimate. Occasionally the coverage ratio exceeds 1.0, indicating overcoverage of a particular category.
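    A coverage ratio is a simple quotient. It is sketched here in Python with invented totals, chosen so that the first subgroup reproduces the .95 example in the text. The `coverage_ratios` helper is hypothetical.

```python
def coverage_ratios(survey_estimates, independent_estimates):
    """Coverage ratio by subgroup: survey estimate (before adjustment to
    population controls) divided by the independent population estimate."""
    return {
        group: survey_estimates[group] / independent_estimates[group]
        for group in independent_estimates
    }


# Invented totals, in thousands of persons.
survey = {"males 50-59": 19_000, "females 50-59": 20_300}
independent = {"males 50-59": 20_000, "females 50-59": 20_500}

for group, ratio in coverage_ratios(survey, independent).items():
    print(f"{group}: coverage ratio {ratio:.3f}, implied undercoverage {1 - ratio:.1%}")
```

    The reciprocal of a coverage ratio is the factor by which the final population-control adjustment must inflate that subgroup's weights. A ratio of .95, for instance, implies an upweight of about 1.053.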

    CPS

    Coverage ratios for the Current Population Survey in the 2001-2004 period are summarized at http://www.bls.census.gov/cps/basic/perfmeas/coverage.htm. As that page notes, average coverage ratios are typically about .90, and coverage ratios for males are typically lower than for females. The gap is particularly prominent for black males in that survey. Coverage ratios for Hispanic males and males of other races are also low, in the .80 range. The coverage ratios also appear to trend slightly downward over the period.

    ACS

    The American Community Survey maintains a data quality web page that summarizes coverage ratios.9 Some evidence of coverage differentials has also been presented at statistical meetings (e.g., Bruce, Navarro and Ahmed, 2007). At the national level, unadjusted ACS estimates exhibit coverage ratios of between .94 and .97 relative to total population estimates throughout the 2000-2006 period. Undercoverage approaches 8% for males and is about 4% for females. Hispanic coverage ratios range from .897 to .964, and non-Hispanic white coverage ratios from .949 to .971.

    Other Surveys

    We only briefly mention other surveys. Three surveys that provide useful information on the foreign-born population are the Survey of Income and Program Participation (SIPP), the New Immigrant Survey (NIS), and the National Agricultural Workers Survey (NAWS).

    9 Because the ACS samples nonresponding households for personal-interview follow-up, the coverage ratio is necessarily defined slightly differently than the analogous CPS coverage ratio. Coverage ratios can be found at:

    http://www.census.gov/acs/www/acs-php/quality_measures_coverage_2006.php.


    The SIPP, by virtue of its longitudinal design, has an additional coverage dimension beyond the coverage ratios found in other surveys: sample attrition (the loss of formerly responding households and people). Attrition causes later respondent groups to differ from earlier respondent groups, and requires attrition weights to correct for differential attrition across groups.
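    A common way to build such weights is a weighting-class adjustment: within each cell, the prior-wave weight of everyone is redistributed over those who are still responding. The sketch below is a generic version of that idea, not the SIPP's actual procedure; the cell labels and weights are hypothetical, and it rests on exactly the assumption Lohr warns about, namely that attriters resemble retained cases within each cell.

```python
from collections import defaultdict


def attrition_adjust(people):
    """Weighting-class attrition adjustment (generic sketch).

    people: list of dicts with keys 'cell', 'weight' (prior-wave weight) and
    'retained' (True if the person still responds in the later wave).
    Returns {list index: adjusted weight} for retained people only.
    """
    total = defaultdict(float)   # prior-wave weight of everyone in the cell
    kept = defaultdict(float)    # prior-wave weight of those still responding
    for p in people:
        total[p["cell"]] += p["weight"]
        if p["retained"]:
            kept[p["cell"]] += p["weight"]

    adjusted = {}
    for i, p in enumerate(people):
        if p["retained"]:        # kept[cell] > 0 whenever this branch runs
            adjusted[i] = p["weight"] * total[p["cell"]] / kept[p["cell"]]
    return adjusted


# Hypothetical cells: one of three people in cell A attrits; nobody in B does.
people = [
    {"cell": "A", "weight": 100.0, "retained": True},
    {"cell": "A", "weight": 100.0, "retained": True},
    {"cell": "A", "weight": 100.0, "retained": False},
    {"cell": "B", "weight": 200.0, "retained": True},
    {"cell": "B", "weight": 200.0, "retained": True},
]
print(attrition_adjust(people))   # {0: 150.0, 1: 150.0, 3: 200.0, 4: 200.0}
```

    Each cell's weighted total is preserved, so differential attrition across cells is corrected, but any difference between attriters and stayers inside a cell is not.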

    Adjustments for Potential Net Coverage Error

    In a number of publications (e.g., Passel, Van Hook, and Bean, 2004; Passel and Suro, 2005; Passel, 2005, 2006, 2007; Passel and Cohn, 2008), the Pew Hispanic Center has published estimates of the size and characteristics of the unauthorized foreign-born population. The method uses the March Current Population Survey (CPS) as the base for the estimate.

    Undercoverage factors are derived from the Accuracy and Coverage Evaluation (Passel and Cohn, 2008: 13): a 2.0% undercount rate for legal resident immigrants (2.6% for legal resident immigrants who entered after 1980). Passel and Cohn cite Marcelli and Ong (2002) to justify a 12.5% undercount for unauthorized immigrants in the March CPS. Passel, Van Hook and Bean (2004) use a lower undercount, 9.1%.

    It is important in this context to point out that the resulting estimate not only uses these assumptions but is sensitive to them. Using the Passel, Van Hook and Bean (2004) estimates of the 2000 unauthorized population as a base, after some algebraic simplification, the resulting approximate residual estimation formula is (in millions):

    $$\text{unauthorized} = \left[20.46 - 13.136\,(1 - u_{legal})\right]\frac{1}{1 - u_{unauthorized}},$$

    where,

    unauthorized is the final estimate of the number unauthorized;

    $u_{legal}$ is the assumed undercoverage rate for legal resident immigrants; and

    $u_{unauthorized}$ is the assumed undercoverage rate for unauthorized immigrants.

    If we fix the parameter $u_{legal}$ to match that of Passel, Van Hook and Bean, at 1.6%, the estimating equation simplifies to:

    $$\text{unauthorized} = (20.46 - 12.9)\,\frac{1}{1 - u_{unauthorized}}.$$

    Thus, the unauthorized population is inflated by a factor of $1/(1 - u_{unauthorized})$.

    Table one illustrates the impact of this inflator on the final estimate (we vary the parameter from one percent to 20 percent).

    -- Table one about here --


    As can be seen, the possible unauthorized estimates range from 7.68 million (the lower bound) to 9.5 million (the upper bound). As the center column shows, the inflation factor grows at an increasing rate; that is, as the assumed rate grows, its marginal impact on the final number also grows.
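    The accelerating effect follows from the convexity of $1/(1-u)$. The sketch below sweeps the assumed rate from 1% to 20% and checks that each one-point increase adds more to the factor than the previous one. The residual of 20.46 - 12.9 million is taken from the simplified approximation in the text; the function names are illustrative, and this is not the Pew Hispanic Center's production code.

```python
def inflator(u: float) -> float:
    """Inflation factor 1 / (1 - u) for an assumed net undercoverage rate u."""
    if not 0.0 <= u < 1.0:
        raise ValueError("u must lie in [0, 1)")
    return 1.0 / (1.0 - u)


def inflated_estimate(residual_millions: float, u: float) -> float:
    """Residual count observed in the survey, inflated for undercoverage."""
    return residual_millions * inflator(u)


# Residual (millions) from the simplified approximation in the text.
RESIDUAL = 20.46 - 12.9

rates = [pct / 100 for pct in range(1, 21)]     # 1%, 2%, ..., 20%
factors = [inflator(u) for u in rates]
steps = [b - a for a, b in zip(factors, factors[1:])]

for u, f in zip(rates[::5], factors[::5]):
    print(f"u = {u:.0%}  factor = {f:.3f}  estimate = {inflated_estimate(RESIDUAL, u):.2f}")

# Convexity: each one-point rise in u adds more than the previous one did.
print(all(later > earlier for earlier, later in zip(steps, steps[1:])))   # True
```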

    The Office of Immigration Statistics (OIS) is legally mandated to produce an estimate of the stock of the unauthorized population residing in the United States (Immigration and Nationality Act, Section 103(d)). The method is described (cf. Hoefer, Rytina, and Baker, 2008) as a residual method, and is similar to the Pew Hispanic method described above:

    1. A population base of the total foreign-born population is constructed from American Community Survey population estimates;

    2. This base is then inflated for net undercoverage;

    3. Estimates of deaths, emigrants, legal permanent residents, refugees and asylees, and resident nonimmigrants are subtracted from this base; and

    4. The residual is an estimate of the unauthorized.

    In this method, two undercoverage factors are applied: a 2.5% net undercoverage rate for the legal permanent resident (LPR), refugee and asylee population as a whole, and a 10% net undercoverage rate for the nonimmigrants10 and the unauthorized. It is important to note that applying these two assumed net undercoverage rates means that the resulting estimate is no longer consistent with published census estimates from the ACS. The unauthorized net coverage rate was based on previous DHS/INS unauthorized estimates, which cited Marcelli and Ong (2002). The authors note that the resulting estimate is sensitive to the assumed net undercoverage rate.

    As with the Pew Hispanic method, with some simplifying assumptions we can perform our own sensitivity analysis. Ignoring some demographic particulars, and using notation similar to that above to illustrate the similarities, the 2007 formula is approximately equal to (in millions):

    ( )[ ] ,1

    117.112.2018.28

    )-u()-u(-emig)()-u-(edunauthoriz

    edunauthoriz

    nonlegal =

    where,

    $u_{legal}$ is the undercount rate for the legal resident population;

    $emig$ is the effective emigration rate11;

    $u_{non}$ is the undercount rate for nonimmigrants; and

    $u_{unauthorized}$ is the undercount rate for the residual unauthorized population in the numerator.

    Table two compares the relative sensitivity of the resulting estimate to each of the assumed

    rates. The shaded center line represents the assumed rates used in 2007 (again noting that

    this formula is an approximation):

    10 Nonimmigrants are sometimes referred to as legal temporary migrants; they include visa holders who are not legally approved for long-term immigration or residence.

    11 This is an approximation to the internal formula.


    -- Table two about here --

    The table reveals that, over commonly used ranges, the formula is particularly sensitive to the assumed net undercoverage of the unauthorized foreign born. It is also notably sensitive to assumed emigration rates, and less so to the coverage assumptions associated with nonimmigrants and with the legally resident foreign-born population.
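    The same one-at-a-time sensitivity exercise can be sketched for a generic residual calculation. This is NOT the OIS production formula: it assumes that the survey captures a fraction (1 - u) of each group and that the administratively derived legal population is reduced by the emigration rate before its expected survey count is subtracted. Only the 2.5% and 10% rates come from the text; the population inputs, the baseline emigration rate, and the ranges are hypothetical, so the resulting spreads should not be read as reproducing Table two.

```python
def residual_estimate(acs_foreign_born, legal, nonimmigrants,
                      u_legal, u_non, u_unauth, emig):
    """Generic residual-method sketch (millions); not the OIS formula."""
    legal_in_survey = legal * (1 - emig) * (1 - u_legal)
    nonimm_in_survey = nonimmigrants * (1 - u_non)
    unauth_in_survey = acs_foreign_born - legal_in_survey - nonimm_in_survey
    return unauth_in_survey / (1 - u_unauth)


# Hypothetical inputs in millions.
INPUTS = dict(acs_foreign_born=30.0, legal=20.0, nonimmigrants=1.5)
BASE = dict(u_legal=0.025, u_non=0.10, u_unauth=0.10, emig=0.10)   # emig is hypothetical
RANGES = dict(u_legal=(0.0, 0.05), u_non=(0.05, 0.15),
              u_unauth=(0.05, 0.20), emig=(0.05, 0.15))            # hypothetical ranges

print(f"baseline: {residual_estimate(**INPUTS, **BASE):.2f}")
for name, (low, high) in RANGES.items():
    est_low = residual_estimate(**INPUTS, **{**BASE, name: low})
    est_high = residual_estimate(**INPUTS, **{**BASE, name: high})
    print(f"{name:>8}: {est_low:.2f} to {est_high:.2f}  (spread {est_high - est_low:.2f})")
```

    The algebra makes the directions unambiguous: raising $u_{unauthorized}$ enlarges the inflator, while raising $emig$, $u_{legal}$ or $u_{non}$ shrinks the legal and nonimmigrant counts expected in the survey and so enlarges the residual.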

    We conclude with an examination of labor force estimates. During the economic prosperity of the 1990s, an anomaly appeared between the Current Population Survey's estimate of the labor force and the Current Employment Statistics (CES) survey of establishment-reported jobs. In essence, the two series, normally running almost in parallel, began to diverge, with the CES reporting greater growth in jobs than the CPS was reporting in the labor force (Juhn and Potter, 1999). Juhn and Potter analyzed the differences as of 1999, considering three hypotheses: that the surveys treat multiple jobholding differently; that the payroll survey (CES) is upwardly adjusted by benchmarking; and that there was an undercount of the working-age population in the calculation of the household survey estimates. Their conclusion is notable:

    We find that the third explanation, an underestimated working-age population, best accounts for the recent rise in the employment gap. Since the household survey calculates the level of employment by combining survey data with a census-based estimate of the U.S. working-age population, an undercount of that population will produce low employment numbers. Evidence suggests that the census has in fact historically underestimated this population. Significantly, the undercount appears to be highest among groups whose employment status is very sensitive to business-cycle fluctuations. We contend that the steady expansion of the economy in the 1990s has enabled these cyclical workers to find employment. Their numbers, only partly captured in the census (and, by extension, in the household survey), have in recent years helped to boost the job count in the payroll survey, widening the gap between the surveys' employment estimates (Juhn and Potter, 1999: 1).

    The recently arrived (within 10 years) and very recently arrived (within 5 years) foreign born are more likely than earlier-arrived foreign born or native born persons to fall into the working-age population (Mosisa, 2002). To the extent that the foreign born, and the unauthorized foreign born in particular, are among this group, potential undercoverage would systematically underestimate the size of the labor force, particularly during expansions.
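    The mechanism Juhn and Potter describe can be illustrated with a toy calculation. The sketch assumes that the missed group is absent from both the population control and the survey's employment rate, and that each worker holds one job; all numbers are hypothetical. The household-survey level stays fixed while jobs held by missed workers, which the payroll survey does count, grow as their employment rate rises during an expansion.

```python
def household_employment(emp_rate: float, population_control: float) -> float:
    """Household-survey level: sample employment rate times the population control."""
    return emp_rate * population_control


def payroll_jobs(emp_rate: float, population_control: float,
                 missed_pop: float, missed_emp_rate: float) -> float:
    """Establishment job count, including jobs held by people the control misses."""
    return emp_rate * population_control + missed_emp_rate * missed_pop


# Hypothetical values (millions): a 200M control, 3M working-age people it misses,
# and a missed group whose employment rate climbs as the expansion proceeds.
CONTROL, MISSED, EMP_RATE = 200.0, 3.0, 0.64
for year, missed_rate in enumerate([0.50, 0.60, 0.70, 0.75], start=1):
    gap = (payroll_jobs(EMP_RATE, CONTROL, MISSED, missed_rate)
           - household_employment(EMP_RATE, CONTROL))
    print(f"year {year}: payroll minus household = {gap:.2f} million")
```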

    The CES/CPS differential in the 1990s, and the subsequent (in 2000) discovery of additional, previously undetected net immigration (see, e.g., Robinson, et al., 2002: 22, or Nardone, et al., 2003: 12),12 is consistent with this interpretation, and suggests the sensitivity of these results to potential undercoverage of the foreign born.

    An analysis of the effect of CPS/CES differences on unemployment rates and other quantities of interest was presented by Schweitzer and Ransom (1999). They calculate the effect on late-1990s unemployment rates of assuming that CPS employment levels followed those of the CES. Their conclusions are presented in their Figure 1: either unemployment rates must have been much lower than reported in the CPS during that period, or labor force participation rates must have risen very quickly, or something was wrong with the population controls. Again, this is consistent with, but does not prove, the subsequent discovery of additional net immigration, and suggests the sensitivity of these results to potential undercoverage of the foreign born.

    What has been Asserted about Coverage of the Foreign Born?

    Marcelli and Ong (2002) provide results of a study of foreign-born Mexicans in Los Angeles County; these results have been widely cited and used far outside their original context.13 In this study, Marcelli and Ong developed a direct survey enquiry: respondents in households were asked directly, "Was [this person enumerated in the household] included in the 2000 questionnaire sent to the Census Department?" Later in the survey, respondents' legal statuses were also ascertained. They obtained a gross undercoverage rate of 10.6% for unauthorized respondents, 8.3% for legal immigrants, 4.5% for U.S. citizens, and 7.1% for temporary visitors.

    In contrast to direct enquiries, Van Hook and Bean (1998) used a demographic analysis

    approach, generating an expected population size (based on vital statistics) and

    comparing it to the obtained population size in the 1990 census. Their results suggest net

    undercount rates that range from 15-25 percent for unauthorized Mexican immigrants.

    In the context of evaluating coverage in Census 2000, Deardorff and Blumerman (in Robinson, 2001: Appendix A) develop several net undercoverage assumptions by migrant legal status, and examine their impact on final demographic analysis estimates. In part, this was an attempt to explain the then-existing discrepancy between early A.C.E. coverage results and demographic analysis coverage results. For purposes of their scenarios, these assumptions ranged from 1-2% for legal migrants, 7-35% for legal temporary migrants, and 10-15% for

    12 Robinson, et al.: "For the 2000 DA, we were particularly concerned about the reliability of the immigration components and conducted a sensitivity analysis in response. This analysis led to the incorporation of an alternative set of DA estimates to allow for the possible understatement of immigration (specifically, undocumented immigration) in the initial DA components of growth." Nardone, et al.: "Additional studies carried out by the Census Bureau Population Division as part of the estimates evaluation indicated that the estimates of unauthorized migrants that were used in the 1990 based intercensal estimates were too low. The evaluations indicated that the residual foreign born population increased by about 5 million during the 1990 to 2000 decade rather than the 2.25 million (10 x 225,000) assumed in the 1990 based estimates."

    13 For the record, Marcelli (personal communication) considers applying these results to geographical contexts outside of Los Angeles County or to other foreign-born populations residing in the United States to be questionable at best.


    the unauthorized. They note (p. A-12) that the coverage assumptions do not explain the

    different total populations calculated by DA and the A.C.E.

    Passel, Van Hook, and Bean (2004), Passel and Suro (2005), Passel (2005, 2006, 2007), and Passel and Cohn (2008) developed a series of estimates of the foreign-born population by legal status, including characteristics. These estimates are widely cited in the mainstream media. They assume a net undercoverage rate of 2% for legal resident immigrants (2.6% for those entering after 1980), based on census studies of net coverage error. For the unauthorized, however, a net undercoverage rate of 12.5% is assumed; Passel and Cohn (2008) cite Marcelli and Ong's 2002 work.

    A further source of assertions about coverage of the foreign born is the Office of Immigration Statistics' estimates of the unauthorized population. As noted above, two undercoverage factors are applied: a 2.5% net undercoverage rate for the LPR, refugee and asylee population as a whole, and a 10% net undercoverage rate for the nonimmigrants and the unauthorized. It is important in this context to quote the justification for these rates: "This was the same rate used in previous DHS estimates" (Department of Homeland Security, 2007). Of course, the previous estimates cited other previous estimates, and eventually this assumption can be traced to Office of Policy and Planning estimates developed by Robert Warren (2003), who cited Marcelli and Ong (2002).

    Other ranges have been applied to earlier censuses.14 Unofficial estimates by Census staff put the undercount of illegal immigrants at about 33 percent in the 1980 census (Government Accountability Office, 1998b; Fernandez and Robinson, 1994); Passel (1986) suggested a range between 33 and 50 percent. For the 1990 census, various analyses put the figure at roughly 20-30 percent (Woodrow, 1991; Van Hook and Bean, 1997; Woodrow-Lafield, 1995). GAO (1998) cites these and other sources in attesting to the difficulty of assessing these various assertions.

    For completeness, it is important to note the A.C.E. results from Census 2000, and what the non-adjustment decision of the Secretary of the Commerce Department (coinciding with the Census Bureau's recommendation [ESCAP II]) implies. Because overall net coverage error was close to zero, and because it could not be determined that A.C.E.-based adjustments would improve distributional accuracy, no adjustment was performed. This implies that the net coverage error of the foreign-born population is, for all practical purposes, treated as zero. Implicitly, then, all population estimates built on the Census 2000 base assume that the foreign-born population is adequately covered.

    What is Provable About Coverage of the Foreign Born?

    While much is assumed or asserted to be true about coverage of the foreign born, these claims are difficult to prove. In this section we survey some empirical results from the literature, and provide some results of our own. These results suggest that net

    14 It is widely believed (e.g., Anderson and Feinberg, 1999) that Census-taking has improved steadily

    throughout the 20th century (duplication in Census 2000 possibly being a symptom of an overly aggressive

    attempt to eliminate undercoverage). Thus, estimates of undercoverage from earlier censuses may not

    apply to Census 2000.


    undercoverage of the foreign-born population is higher than for the rest of the population.

    In the following sections, we use the term "recent arrival" to refer to foreign-born persons whose year of entry is within 10 years of the current survey period, and the term "very recent arrival" to refer to those whose year of entry is within five years of the current survey period.15

    Ethnographic Studies Suggest that the Foreign Born Avoid Detection

    In the early 1990s, several ethnographic studies were performed in 29 (necessarily local and not statistically generalizable) sample areas containing some foreign-born groups. De la Puente (1993: 4) summarizes many of these results. De la Puente notes that complex households were common in sample areas with recent immigrants, especially Hispanic immigrants. These ad hoc households protect the identity of members and thus contribute to within-household undercoverage.16 Complex households (often containing more members than allowed by law or by building management) combine with fear of disclosure to create avoidance.

    The Foreign Born Tend to Respond Later to Surveys

    Work performed by Camarota and Capizzano (2004) mixed ethnographic with quantitative analyses of operational ACS data. While the ethnographic results were broadly comparable with those in the previous section, the quantitative data reveal that, in the study areas, the foreign born are more likely to be captured later in the survey process. Figures two and three, taken from Camarota and Capizzano (2004), illustrate this:

    -- Insert figures two and three about here --

    As can be seen in these figures, foreign-born respondents disproportionately respond in the later telephone (CATI) or personal-interview (CAPI) phases of the ACS. Furthermore, respondents born in certain countries (including some countries commonly held to be major sources of unauthorized immigration) tend to be captured later in the operational process. Complete household nonresponse obviously represents the ultimate late response, and it is widely asserted by survey experts (with evidence given by, e.g., Treat and Stackhouse, 2002) that late responders, or households captured in nonresponse follow-up, are systematically different from early responders.

    States with Higher Levels of Foreign Born, Particularly Unauthorized or Recent Arrivals, Tend to Have Lower Coverage Ratios

    Figure four plots state coverage ratios in the 2006 American Community Survey against the percentage of each state's total population that is recently arrived foreign born. Overlaid

    15 Obviously, the choice of years is somewhat arbitrary. Wilson (2008) uses a 10-year period to indicate "recent"; Clark and Patel (2004) use a 5-year period. The results described below are robust to either definition.

    16 One household described in this study had thirteen persons living in it, of whom only six were enumerated in the 1990 census.


    onto this plot is a simple linear regression and 95% confidence interval. The regression

    line is analytically weighted to reflect the total population size of the state.

    -- Insert figure four about here --

    As can be seen, states with more recently arrived foreign born systematically tend to have lower total coverage ratios.

    It is important in this context to mention the ecological fallacy (Robinson, 1950). These are state-level data, and a correlation between recently arrived foreign born and lower coverage ratios does not necessarily imply individual-level net undercoverage. The demonstrated relationship suggests, but does not prove, such an individual-level relationship.
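    The population-weighted regression line described for figure four can be sketched as weighted least squares for a single predictor; analytic weights give the same point estimates. The sketch below uses only the standard library, does not reproduce the confidence band, and runs on synthetic "states" constructed to lie exactly on a known line, so it illustrates the computation rather than the paper's data.

```python
def weighted_linear_fit(x, y, w):
    """Weighted least squares fit of y = a + b * x; returns (a, b).

    Each observation's squared residual is weighted by w (here, state population).
    """
    total_w = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    s_xy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    s_xx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


# Synthetic illustration only: five "states" whose coverage ratios fall exactly
# on the line 0.97 - 0.005 * (percent recent arrivals).
pct_recent = [1.0, 2.0, 4.0, 6.0, 8.0]
coverage = [0.97 - 0.005 * p for p in pct_recent]
population = [0.6, 5.0, 12.0, 20.0, 36.0]       # millions, hypothetical
intercept, slope = weighted_linear_fit(pct_recent, coverage, population)
print(f"intercept = {intercept:.3f}, slope = {slope:.4f}")   # intercept = 0.970, slope = -0.0050
```

    Weighting by population keeps small states from dominating the fitted line, but it does nothing to address the ecological fallacy noted above.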

    The Foreign Born, Particularly Recent Arrivals, Tend to Live in Areas with Higher Hard-to-Count Scores

    Our final test uses the Planning Database (Robinson and Bruce, 2007; see also Bruce and Robinson, 2003; and Bruce, Robinson, and Sanders, 2001), a tool designed by the Census Bureau for planning and targeting areas that are harder to count than other areas. This database, using Census 2000 short form and long form data, contains tabulations of variables relevant to net undercoverage. Further, for 65,184 census tracts, a "hard to count" (HTC) score is calculated. This score ranges from 0 (representing the easiest-to-count tracts, with none or very few indicators) to a theoretical maximum of 132 (representing tracts with all indicators of net undercoverage).

    Again we wish to be careful not to commit the ecological fallacy (Robinson, 1950). We emphasize that these are tract-level data, and while we will show that there is a correlation between recently arrived foreign-born persons and other hard-to-count indicators, we recognize that definitive proof requires individual-level data. To emphasize this point, we will continue to use the word "areas" to refer to tracts.

    The hard to count score is a sum of weighted scores. The following description is from Robinson and Bruce (2007: 7): a total of 12 variables that were correlated with nonresponse rates in 1990 and 2000 are used to derive the HTC score.

    The set of algorithms used to determine HTC scores is as follows:

    (1) each individual variable is sorted across geographic areas from high to low (e.g., sort tracts from highest percent poverty to lowest percent poverty),

    (2) scores (0 to 11) are assigned to each variable for each tract (e.g., values of 11 are given to tracts with the highest poverty rates of over 44.3 percent and values of 0 are given to tracts below the national median poverty rate of 9.9 percent in 2000),

    (3) the scores assigned to each of the 12 variables for a tract are summed to form a composite HTC score for the tract.

    The final HTC score is the sum of the ratings of the following components:

    1) Percent vacant housing units;


    2) Percent housing units that are not single-family structure;

    3) Percent renter-occupied housing units;

    4) Percent crowded occupied units;

    5) Percent of families not in a husband/wife configuration;

    6) Percent of occupied housing units with no phone service;

    7) Percent of persons 25 and older not a high-school graduate or more;

    8) Percent of persons below the poverty level;

    9) Percent of persons receiving public assistance;

    10) Percent of persons 16 and older unemployed;

    11) Percent of households that are linguistically isolated;

    12) Percent of occupied housing units where the owner moved in within 1999-2000.
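    The three-step scoring algorithm can be sketched as follows. Only two anchors come from the description above: tracts below the national median receive 0, and the highest tracts receive 11. The cut points in between are an ASSUMPTION of this sketch (above-median tracts are split into 11 roughly equal-count groups), so it will not reproduce the Planning Database's actual thresholds, such as the 44.3 percent poverty cutoff.

```python
from bisect import bisect_left
from statistics import median


def component_scores(values):
    """Score one HTC variable from 0 to 11 for every tract.

    ASSUMPTION: tracts at or below the median get 0, and tracts above the
    median are split into 11 roughly equal-count groups scored 1 to 11.
    """
    med = median(values)
    above = sorted(v for v in values if v > med)    # step (1): sort the tracts
    scores = []
    for v in values:                                # step (2): assign 0 to 11
        if v <= med:
            scores.append(0)
        else:
            rank = bisect_left(above, v)            # 0 = smallest above-median value
            scores.append(1 + (rank * 11) // len(above))
    return scores


def htc_scores(variables):
    """Step (3): sum component scores; each value list is in the same tract order."""
    per_variable = [component_scores(vals) for vals in variables.values()]
    return [sum(tract) for tract in zip(*per_variable)]


# Hypothetical data: one variable observed in 23 tracts (median = 11).
print(component_scores(list(range(23))))   # twelve 0s, then 1 through 11

# Twelve identical hypothetical variables: the highest tract reaches 12 * 11 = 132.
twelve = {f"var{k}": list(range(23)) for k in range(1, 13)}
print(max(htc_scores(twelve)))             # 132
```

    With twelve components each capped at 11, the theoretical maximum of 132 falls out directly from the summation in step (3).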

    We have demonstrated above that the foreign born, and particularly recent-entry foreign born, tend to have characteristics that correlate with the items that enter into the hard to count score. Do they also tend to live in areas that have high HTC scores? Figure five presents a simple graphical analysis, with a linear regression fit (weighted by population size in each tract) overlaid onto the point cloud. Each point is a census tract.

    -- Figure five about here --

    As can be seen, there appears to be a correlation between the percentage of foreign born who entered within the last ten years and the hard to count scores. And, paradoxically, when a linear regression is fit, for tracts with high levels of foreign born, beginning at about 80% of total population, the linear fit begins to make predictions that are impossibly high (that is, above 132).

    This simple graphical finding suggests that there is residual variance to be explained that is not captured by the components included in the score itself. To test this hypothesis, we run a regression of HTC score on the twelve components that enter into that score, with the addition of one variable: the percentage of the total population that is foreign born and has a year of entry within the last ten years. The result of this regression is presented in table three.

    -- Insert table three about here --

    As can be seen, and not surprisingly, the components of the HTC score tend to predict that score. Additionally, holding all other components of the score constant, every one-percentage-point increase in recent-entry foreign born in a census tract is associated with an expected .475-point increase in the HTC score.
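    What "holding all other components constant" means can be shown with a two-regressor version of the table three model. In the sketch below, a single hypothetical component stands in for the twelve, and the data are synthetic, built so that the true coefficient on recent-entry foreign born is 0.5 (an arbitrary value, not the paper's estimate). Because the two regressors are correlated, the conditional coefficient differs sharply from the slope of a simple one-variable regression.

```python
def ols_two_regressors(x1, x2, y):
    """OLS of y on an intercept, x1 and x2; returns (b0, b1, b2).

    Solves the normal equations in centered cross-products; b2 is the change
    in y per unit of x2 holding x1 constant.
    """
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(b * b for b in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * c for a, c in zip(c1, cy))
    s2y = sum(b * c for b, c in zip(c2, cy))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("x1 and x2 are perfectly collinear")
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2


# Synthetic tracts: one stand-in HTC component and percent recent-entry foreign born.
component = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
recent_fb = [2.0, 1.0, 4.0, 3.0, 6.0, 8.0]
htc = [5 + 2 * c + 0.5 * r for c, r in zip(component, recent_fb)]

b0, b1, b2 = ols_two_regressors(component, recent_fb, htc)

m_r, m_h = sum(recent_fb) / 6, sum(htc) / 6
simple = (sum((r - m_r) * (h - m_h) for r, h in zip(recent_fb, htc))
          / sum((r - m_r) ** 2 for r in recent_fb))
print(f"conditional: {b2:.3f}, unconditional: {simple:.3f}")   # conditional: 0.500, unconditional: 1.794
```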

    Demographic Characteristics of Individual Foreign Born Persons, Particularly Recent Arrivals, Correlate With Hard to Count Indices

    In previous sections, we have dealt with aggregate data at the state and tract levels. First, we have shown that coverage ratios tend to decline as the percentage of the population who are recently arrived foreign born increases. Second, we have shown that hard to count scores for tracts tend to increase as the percentage of the population who are recently arrived foreign born increases, and that this relationship remains even when the components of the score itself are used to predict it.

    In each of the preceding sections we have noted the potential for the ecological fallacy. In this section we use microdata from the 2007 American Community Survey Public Use Microdata Sample to examine whether the components that correlate with net undercoverage are concentrated among individuals who are foreign born, in particular recent arrivals. We use the characteristics of individuals and households identified by the Census Bureau's planning database (Robinson and Bruce, 2007) as predictive of low return rates and hard to count characteristics.

    Our analysis is simply framed. We have partitioned the population into four groups: the native born (including those born abroad of American parents); the older foreign born (foreign-born persons whose year of entry is more than 10 years before the survey date, i.e., before 1997); the recent foreign born (foreign-born persons whose year of entry is within 6-10 years of the survey date, 1997-2001); and the very recent foreign born (foreign-born persons whose year of entry is within five years of the survey date, 2002 or later). We proceeded by comparing whether these four groups differ on each of the individual hard to count characteristics. Recall that the characteristics identified in the hard to count score are: vacant housing units, housing units that are not a single-family structure, renter-occupied housing units, crowded occupied units, families not in a husband/wife configuration, occupied housing units with no phone service, persons over 25 who have not graduated from high school, persons below the poverty level, persons receiving public assistance, persons aged 16 and over who are unemployed,17 households that are linguistically isolated, and occupied housing units where the owner moved in within 1999-2000. With the exception of vacancy (which does not apply to persons), we look at each characteristic individually, rather than in a multivariate context, performing simple difference of proportions (percentages) tests (Agresti, 1990) to assess statistical significance. In each case our null hypotheses are stated simply:

    H0(1): The percentage of persons who exhibit the hard to count characteristic does not

    differ between native born and non-recent foreign born.

    H0(2): The percentage of persons who exhibit the hard to count characteristic does not

    differ between native born and recently arrived foreign born.

    H0(3): The percentage of persons who exhibit the hard to count characteristic does not

    differ between native born and very recently arrived foreign born.

    Our alternative hypotheses are the negations of these null hypotheses. We use 90% two-tailed confidence intervals: if the confidence intervals of the two comparison groups do not overlap, we reject the null hypothesis; otherwise, we retain it. Table four contains these tests.
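    The decision rule can be sketched for a single characteristic. The sketch ASSUMES simple random sampling and uses Wald intervals; it does not reproduce the replicate-weight variance estimation normally used with ACS microdata, and the counts are hypothetical. A direct difference-of-proportions z statistic is included for comparison.

```python
from math import sqrt
from statistics import NormalDist

Z90 = NormalDist().inv_cdf(0.95)   # two-tailed 90% critical value, about 1.645


def wald_interval(successes: int, n: int, z: float = Z90):
    """Simple-random-sampling (Wald) confidence interval for a proportion."""
    p = successes / n
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


def reject_by_nonoverlap(s1: int, n1: int, s2: int, n2: int) -> bool:
    """Reject H0 (equal percentages) when the two 90% intervals do not overlap."""
    lo1, hi1 = wald_interval(s1, n1)
    lo2, hi2 = wald_interval(s2, n2)
    return hi1 < lo2 or hi2 < lo1


def difference_z(s1: int, n1: int, s2: int, n2: int) -> float:
    """Unpooled z statistic for the difference of two proportions."""
    p1, p2 = s1 / n1, s2 / n2
    return (p1 - p2) / sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


# Hypothetical counts of persons exhibiting one hard-to-count characteristic.
print(reject_by_nonoverlap(300, 1000, 500, 1000))   # True: 30% vs 50%
print(reject_by_nonoverlap(300, 1000, 310, 1000))   # False: 30% vs 31%
print(round(difference_z(300, 1000, 500, 1000), 2))
```

    Requiring non-overlap is a more conservative rule than comparing the absolute z statistic with 1.645: every difference it rejects would also be rejected by the direct test, but not vice versa.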

    17 Note that persons who are out of the labor force are not considered unemployed in this definition;

    thus this percentage does not correspond to the traditional unemployment rate definition.


    -- Insert table four about here --

    We summarize the conclusions from this table:

    1. The foreign born, particularly recent arrivals, are more likely to live in non-single-unit houses;

    2. The foreign born, particularly recent arrivals, are more likely to be renters rather than owners;

    3. The recently arrived foreign born are more likely to live in crowded housing units;

    4. The recently arrived foreign born are more likely to live in non-husband/wife families;

    5. The recently arrived foreign born are more likely to live in houses without a telephone;

    6. The foreign born (age 25+) are more likely not to have graduated from high school;

    7. The recently arrived foreign born are more likely to live in households below the poverty line; the non-recently-arrived foreign born are less likely to live in households below the poverty line;

    8. The non-recently-arrived foreign born are more likely to be receiving public assistance; the recently arrived foreign born are less likely to be receiving public assistance;

    9. The recently arrived foreign born are more likely to be unemployed;

    10. The foreign born, particularly recent arrivals, are more likely to live in linguistically isolated households;

    11. The recently arrived foreign born are more likely to be recent movers; the non-recently-arrived foreign born are less likely to be recent movers.

    In sum, of the eleven characteristics that are considered to make a household hard to count, the recently arrived foreign born are more likely to exhibit ten and less likely to exhibit one (public assistance receipt). The non-recently-arrived foreign born are more likely to exhibit six of these characteristics; for three characteristics the native born and the non-recently-arrived foreign born are statistically indistinguishable; and for two characteristics the non-recently-arrived foreign born are less likely to exhibit the characteristic.

    What Can Be Done to Measure Coverage of the Foreign Born?

    Despite the stated importance of this topic to many agencies (Government Accountability Office, 1998:57-58), there have been few attempts to measure net undercoverage of the foreign born. This section details the alternatives, with appropriate caveats.

    Coverage Measurement Alternatives: Summary

    Coverage measurement is a difficult topic to summarize, but we shall attempt a brief summary here. Table one describes the coverage measurement approaches and their implied coverage ratios. One can imagine defining coverage as "doing it better," that is, determining what enumeration should have occurred. In this case the coverage ratio is the enumeration divided by the truth (as determined by the better system). Dual system estimation is based on "doing it independently." In this case the coverage ratio is the enumeration divided by the population estimate implied by the independence model. The remaining methods (reverse record check, megalist/benchmark, and demographic accounting) each assume that, respectively, the traced sample, the megalist or benchmark list, or the demographic summation determines the coverage ratio (Popoff and Judson, 2004: 634 list these five alternatives; Judson, 2006 discusses coverage ratios).
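    Whatever the method, the coverage ratio has the same arithmetic form: the enumeration divided by a reference total that the method treats as truth. A minimal sketch, assuming a sign convention in which a positive net coverage error rate means net undercount; the function names and figures are hypothetical.

```python
def coverage_ratio(enumerated: float, reference: float) -> float:
    """Enumeration divided by the reference total: the "truth" from a better
    system, the independence-model (dual system) estimate, a traced sample,
    a benchmark list, or a demographic summation."""
    return enumerated / reference


def net_coverage_error_rate(enumerated: float, reference: float) -> float:
    """Share of the reference total missing from the enumeration.

    Sign convention assumed here: positive = net undercount,
    negative = net overcount."""
    return (reference - enumerated) / reference


# Hypothetical figures, for illustration only
print(coverage_ratio(9_600, 10_000))           # 0.96
print(net_coverage_error_rate(9_600, 10_000))  # 0.04
```

    Under this framing, the five methods differ only in how the reference total is produced and in which assumptions make it trustworthy.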

    -- Insert table five about here --

    Post Enumeration Surveys/Dual System Estimation

    A standard method for evaluating census coverage is dual system estimation (DSE) using a post-enumeration survey (see Popoff and Judson, 2004:633-637 for a summary).

    The DSE method has a long history; see, e.g., Chandrasekar and Deming, 1949; Marks, Seltzer and Krotki, 1974; Wolter, 1986; and Hogan, 1992, 1993 and 2000, for theory and examples of the method in practice. The 1950 Census was the first to use a post-enumeration program, and the method has been used subsequently with increasing technical sophistication. DSE is a microdata approach (focusing on individual responses) rather than an aggregate approach (focusing on demographic aggregates; Judson, 2006).
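    The core of DSE is the two-list capture-recapture estimator: the population is estimated as the census count times the survey count, divided by the number of persons found in both. The sketch below is the textbook form under strong simplifying assumptions (independent captures, no erroneous enumerations, perfect matching). It is not the production A.C.E. or CCM estimator, which also adjusts for erroneous enumerations and imputations within post-strata. The counts are hypothetical.

```python
def dual_system_estimate(census_count: int, pes_count: int, matched: int) -> float:
    """Basic two-list capture-recapture estimator, N = (census * PES) / matched.

    Simplifying assumptions: the census and the post-enumeration survey (PES)
    capture people independently, every census record is a correct
    enumeration, and matching between the two lists is error-free."""
    if matched <= 0:
        raise ValueError("need at least one person matched between the lists")
    return census_count * pes_count / matched


# Hypothetical post-stratum: 900 counted in the census, 800 in the PES,
# 740 of the PES persons matched to a census record.
n_hat = dual_system_estimate(900, 800, 740)
census_coverage = 900 / n_hat  # equals the PES match rate, 740 / 800 = 0.925
print(round(n_hat, 1), round(census_coverage, 3))
```

    Under independence, the census coverage ratio reduces to the share of survey persons matched to the census. This is why the method needs individual-level classification into an estimation domain, such as nativity, on both lists.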

    The 2000 Accuracy and Coverage Evaluation (A.C.E.) was based on the census short form. Because the nativity question is not asked on the short form, individual respondents could not be assigned to a foreign-born estimation domain, so it was not possible to construct coverage estimates for that domain. It has been asserted (e.g., Hogan, 2008) that technical and policy issues make it impossible to use dual system estimation to construct a coverage estimate specifically for the foreign born.

    The operative phrase here is, of course, dual system estimation. Assuming that a dual system estimate is the gold standard, and noting that no legislative mandate exists to construct an estimate of coverage specifically for the foreign born18, the question is moot: without asking nativity on both the 2010 census and the CCM survey, it is not possible to classify individuals sufficiently to construct a full dual system estimate.

    What is not spoken of is the possibility of some other kind of estimate of coverage.19 It is to these other kinds of estimates that we now turn.

    18 Raising the question, of course: would there be any support for such a legislative mandate, that is, to specifically construct a dual system estimate, and corresponding coverage correction factor, using nativity status as an estimation domain?

    19 Suppose that we stipulate, for the sake of argument, that dual system is the gold standard. Even so, that stipulation raises the further question: if we cannot generate a gold standard estimate, does that mean we should not attempt to produce some estimate, recognizing its caveats?


    Demographic Benchmarking

    A method that we describe as demographic benchmarking has been presented in Pitkin and Park (2005) and proposed by Camarota (2006). For an appropriate target population, the demographic benchmarking method uses the highest-quality demographic data available to construct an estimate of that population in various aggregate quantities (e.g., age groups). These benchmarks are compared to census or survey results, and where the census or survey results differ from the demographic benchmarks, the difference is interpreted as net coverage error. The demographic benchmarking approach stands or falls on one particular assumption: that the benchmark is of sufficiently high quality to serve this role.

    What benchmarks have been used successfully? For the benchmark to be of sufficiently high quality, it must be measured with little or no error. Of the demographic statistics currently available, only two fully qualify: vital statistics on births and vital statistics on deaths. (In the United States, Medicare enrollment data are a candidate, with relatively minor correction for historical underenrollment; Robinson et al., 2002.)
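    The benchmark comparison amounts to a group-by-group difference between the benchmark and the census or survey total, read as net coverage error. A minimal sketch, assuming the benchmark is exact and using hypothetical age-group totals. Pitkin and Park's actual construction from birth registrations is not reproduced here.

```python
from typing import Dict


def benchmark_coverage_error(benchmark: Dict[str, float],
                             survey: Dict[str, float]) -> Dict[str, float]:
    """Net coverage error by aggregate group, treating the benchmark as truth.

    Positive values mean the census or survey falls short of the benchmark
    (net undercoverage); negative values mean it exceeds it."""
    return {g: (benchmark[g] - survey[g]) / benchmark[g] for g in benchmark}


# Hypothetical age-group totals; a real benchmark would be built from
# registered births (and deaths) for the relevant cohorts.
benchmark_totals = {"age 0-4": 20_000, "age 5-9": 19_500}
survey_totals = {"age 0-4": 19_000, "age 5-9": 19_700}
print(benchmark_coverage_error(benchmark_totals, survey_totals))
```

    Any error in the benchmark passes one-for-one into the coverage estimate. This is why the approach depends entirely on benchmark quality.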

    Demographic Analysis

    An extension of the demographic benchmarking approach is what we shall call demographic analysis (which we alluded to earlier). What distinguishes demographic analysis from benchmarking is that it attempts to construct a complete estimate of the population of interest, rather than of particular segments of it. Like benchmarking, it is fundamentally an aggregate approach rather than a microdata approach. The use of demographic analysis to assess net coverage resembles the benchmarking approach in the following ways. For an appropriate target population, it uses the highest-quality demographic data available to construct an estimate of the target population in various aggregate quantities (e.g., age groups). For less-well-known groups (e.g., immigrants, emigrants, and unauthorized persons), demographically plausible models are constructed. The resulting demographic estimate is then compared to census or survey results, and the difference is interpreted as net coverage error.

    The challenge of the demographic analysis approach is that it makes more assumptions than the benchmarking approach. The benchmarking approach can rely on the relative strength of its underlying data sources: vital statistics, housing, school enrollment, or employment data. Demographic analysis, in contrast, must rely both on such data and on weaker assumptions about the components of migration.
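    The sensitivity to migration assumptions can be illustrated with the demographic balancing equation, in which the population equals the base population plus births, minus deaths, plus immigrants, minus emigrants. This is a minimal sketch with hypothetical inputs, not the Census Bureau's demographic analysis system. Births and deaths stand in for registered vital statistics, and the immigration and emigration values stand in for modeled components.

```python
def demographic_estimate(base_pop: float, births: float, deaths: float,
                         immigrants: float, emigrants: float) -> float:
    """Demographic balancing equation: P1 = P0 + B - D + I - E."""
    return base_pop + births - deaths + immigrants - emigrants


def net_coverage_error_rate(estimate: float, enumerated: float) -> float:
    """Positive = the enumeration falls short of the demographic estimate."""
    return (estimate - enumerated) / estimate


# Hypothetical inputs. Births and deaths stand in for registered vital
# statistics; immigration and emigration stand in for modeled components.
census_count = 1_000_000
results = {}
for emigration in (20_000, 40_000):  # two equally hypothetical emigration assumptions
    p1 = demographic_estimate(950_000, 60_000, 30_000, 70_000, emigration)
    results[emigration] = net_coverage_error_rate(p1, census_count)
print(results)  # the implied coverage error shifts with the migration assumption
```

    Here, changing only the emigration assumption moves the implied net undercount from roughly 2.9 percent to roughly 1 percent, even though the vital statistics inputs are unchanged.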

    Direct enquiries

    Marcelli and Ong (2002), after reviewing demographic analysis and dual-system approaches (with an abbreviated discussion of synthetic estimation, to which we shall turn subsequently), propose a direct enquiry as a method of estimating census undercoverage. This approach is microdata oriented, in that respondents in households are asked directly: "Was [this person enumerated in the household] included in the 2000 questionnaire sent to the Census Department?" Thus, for an appropriate target population, a list of persons in that target population is constructed, adjusting for survey nonresponse. Within that target population, a direct enquiry is made as to whether each person was captured in census records. Where a respondent indicates that a person was not reported on the census list, the report is interpreted as a census coverage error, and the gross undercoverage rate is calculated from these data.
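    The tabulation step can be sketched as a weighted proportion. The sketch assumes each person carries a survey weight that already reflects the nonresponse adjustment, and it takes every answer at face value. The field names and the example data are hypothetical, not Marcelli and Ong's.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EnquiryResponse:
    weight: float              # hypothetical survey weight, already adjusted for nonresponse
    reported_in_census: bool   # answer to "was this person included in the census form?"


def gross_undercoverage_rate(responses: List[EnquiryResponse]) -> float:
    """Weighted share of the target population reported as missing from the census.

    Every "no" is taken at face value, so any misreporting (for example,
    social desirability bias) passes straight into the estimate."""
    total = sum(r.weight for r in responses)
    missed = sum(r.weight for r in responses if not r.reported_in_census)
    return missed / total


sample = [EnquiryResponse(1.5, True), EnquiryResponse(2.0, False), EnquiryResponse(1.5, True)]
print(gross_undercoverage_rate(sample))  # 2.0 / 5.0 = 0.4
```

    Because only omissions are counted, the result is a gross undercoverage rate. Nothing in the calculation detects erroneous inclusions.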

    Obviously, the question about reporting to the census has the potential to be sensitive, presumably more so for foreign-born persons of unauthorized or ambiguous legal status. Because the question is sensitive, it is easy to imagine that some form of social desirability bias will affect respondents' answers.

    Marcelli and Ong, while recognizing the above criticism, defend the approach as follows: they work directly with the local population of interest; they use interviewers who are as non-threatening as possible; and they ask questions in the vernacular, with native speakers. All of these measures are designed to reduce response error due to respondent fear or resistance.

    One-Way Record Linking

    A microdata approach for estimating coverage error was tested by Heer and Passel (1987) in the Los Angeles metropolitan area. We refer to this method as one-way record linking. For an appropriate target population, a list of persons in that target population is constructed, with appropriate individually identifying information (e.g., full name, full date of birth, geographic locators). This list is linked with the decennial enumeration list. Where a list record fails to link to the census list, the nonlink is interpreted as a census coverage error, and the gross undercoverage rate is calculated from these data.

    The key to this approach is the assumption that the list for the target population is complete, and that therefore any difference between the list and the census enumeration must be gross census undercoverage. A weakness of the approach is the assumption that neither list contains gross overcoverage. Privacy concerns about this use of records might arise as well.
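    The linkage logic can be sketched with deterministic exact matching on a normalized key. This is a deliberate simplification. Real studies typically use probabilistic linkage (e.g., Fellegi-Sunter), and the three-field record layout here is a hypothetical stand-in for whatever identifiers a target list actually carries.

```python
from typing import List, Tuple

Record = Tuple[str, str, str]  # (full name, date of birth, geographic locator): hypothetical layout


def link_key(record: Record) -> Record:
    """Light normalization before exact matching (case and spacing only)."""
    name, dob, geo = record
    return (" ".join(name.lower().split()), dob.strip(), geo.strip().upper())


def one_way_nonlink_rate(target_list: List[Record], census_list: List[Record]) -> float:
    """Share of target-list records with no exact link in the census list.

    Each nonlink is read as a census omission, so this treats the target list
    as complete and error-free; false nonlinks inflate the rate."""
    census_keys = {link_key(r) for r in census_list}
    nonlinks = sum(1 for r in target_list if link_key(r) not in census_keys)
    return nonlinks / len(target_list)


target = [("Ana  Lopez", "1980-02-01", "ca-037"), ("Li Wei", "1975-07-15", "CA-037")]
census = [("ana lopez", "1980-02-01", "CA-037")]
print(one_way_nonlink_rate(target, census))  # 0.5
```

    The link runs in one direction only: census records absent from the target list are never examined. As a result, the method yields gross undercoverage and says nothing about overcoverage.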

    Synthetic Estimation

    Synthetic methods for estimating coverage combine information from a coverage evaluation survey with demographic characteristics of the population of interest.20 The synthetic approach also has a long history (e.g., Gonzales, 1978) and makes use of both direct and indirect information. In fact, the intended application of the dual system estimator in the 2000 A.C.E. would itself have used synthetic estimation methods for non-sampled census areas.

    In the intended A.C.E. 2000 application, nonsampled census blocks would have been treated as an estimation domain to which coverage correction factors would have been applied synthetically. (A description of the application of the synthetic method can be found in Hogan, 2000, or Judson and Popoff, 2004.) The innovation proposed here is to treat the foreign-born population, or some subset of it (such as recent arrivals), as one estimation domain, and an estimate of the unauthorized foreign-born population as a separate estimation domain, in place of nonsampled census blocks, and to apply the dual system coverage estimates to them. This would require tabulating the foreign-born population enumerated in Census 2000, allocating foreign-born respondents to the appropriate A.C.E. Revision II post-strata, determining the proportion of the foreign-born population that falls into each post-stratum, constructing the coverage factors from the A.C.E. Revision II post-strata, and calculating a synthetic estimate of the net coverage factor for the foreign-born population as a whole.

    20 This is a specific application of synthetic estimation in general, in which survey estimates of some specific characteristic are combined with demographic characteristics to construct the final estimand.

    The strength of this synthetic approach is that it would use the best available statistically designed coverage data, rather than rely on demographic assumptions or on the assumption that one or more lists are benchmarks. A weakness is that it makes the implicit assumption that, within a post-stratum, the foreign born have the same coverage factors as the population as a whole (the synthetic assumption). If it is assumed that the foreign-born population has, at best, net coverage error equal to that of the population as a whole, then this method would generate an upper bound net coverage error rate.
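    The final calculation in the proposed procedure is a share-weighted average of post-stratum coverage correction factors. A minimal sketch follows. The post-stratum labels and factor values are hypothetical, not A.C.E. Revision II figures, and a correction factor is taken to mean the dual system estimate divided by the census count for the whole post-stratum.

```python
from typing import Dict


def synthetic_coverage(fb_counts: Dict[str, float],
                       correction_factors: Dict[str, float]) -> Dict[str, float]:
    """Synthetic coverage estimate for a domain such as the foreign born.

    fb_counts: foreign-born persons enumerated in each post-stratum.
    correction_factors: post-stratum coverage correction factor (dual system
    estimate divided by census count), estimated for everyone in the stratum.
    The synthetic assumption is that the foreign born share that factor."""
    enumerated = sum(fb_counts.values())
    # Weight each stratum's factor by the foreign-born share falling in it.
    factor = sum((n / enumerated) * correction_factors[h] for h, n in fb_counts.items())
    estimate = enumerated * factor
    return {
        "enumerated": enumerated,
        "synthetic_estimate": estimate,
        "net_correction_factor": factor,
        "net_undercount_rate": (estimate - enumerated) / estimate,
    }


# Hypothetical post-strata and factors, not A.C.E. Revision II values
counts = {"renter, large metro": 600, "owner, large metro": 400}
factors = {"renter, large metro": 1.05, "owner, large metro": 0.99}
print(synthetic_coverage(counts, factors))
```

    The foreign born enter the calculation only through their distribution across post-strata. Any coverage difference between foreign-born and other persons within a stratum is invisible to the estimate, which is exactly the synthetic assumption described above.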

    Research Proposals

    We have argued in this paper that assessing the potential undercoverage of the foreign-born population is an important task for the statistical community. We have presented ethnographic, survey, and demographic evidence for such undercoverage. We have presented new findings that suggest, but do not prove, that the foreign-born population might have differential undercoverage both in the census and in ongoing surveys. We have shown that population estimates of public policy significance are highly sensitive to the assumed undercoverage rate.

    Given these findings, it is natural to conclude that the statistical community has a responsibility to find a way to estimate that number: the potential net undercoverage of the foreign born. We now summarize a sequence of research tasks to approach that number, beginning with the easiest to implement and proceeding to more difficult approaches.

    The first method on our list is demographic benchmark studies. Pitkin and Park (2005) demonstrated that birth registry data, combined with reasonable demographic assumptions, can be used to construct a benchmark population estimate from which an estimate of coverage could be derived. Because Pitkin and Park's method is based on birth registry data, it does not suffer the weaknesses of the broader demographic analysis approach; it requires fewer difficult-to-maintain demographic assumptions. This approach would provide a net coverage error estimate.


    The second method on our list is synthetic estimation. Even if no congressional mandate for a coverage estimate by nativity is promulgated, it is possible to cross-classify the foreign born on the characteristics evaluated in the A.C.E. in 2000 (or those to be evaluated in Census Coverage Measurement in 2010). Using such a cross-classification and published coverage correction factors, deriving a synthetic estimate is a mathematical exercise. This approach would provide a net coverage error estimate.

    The third method on our list is developing a direct survey. Marcelli and Ong (2002) have demonstrated that direct enquiry, combined with demographic analysis, can begin to make headway in understanding the potential gross undercoverage of the foreign born. Following Marcelli's arguments, it would appear that such a survey would best be fielded by a trusted, non-governmental entity, and designed from the bottom up to allay respondents' privacy concerns. This approach, as with one-way record linkage, would not provide information on gross overcoverage.

    The fourth method on our list is a one-way record linkage study. We have described above the technical limitations of one-way record linkage, noting in particular that the assessment of coverage is biased by the presence of record linkage error. However, it appears to us that results from such a study could be adjusted for the presence of such error (e.g., Judson, 2007: 497), yielding at least some direct information about coverage of the foreign born. While this approach would provide information on gross undercoverage, it would not provide information on gross overcoverage.

    The fifth method involves testing the feasibility of a post-enumeration survey in the context of the American Community Survey. Consistent with Hogan (2008), Census 2010 will not have a nativity question; thus a dual system estimate using nativity as an estimation domain is not possible. However, the American Community Survey does include a nativity question. With the development of appropriate statistical theory to account for the ACS's complex sample design, a post-ACS survey, designed along lines similar to the existing Census Coverage Measurement system, would provide coverage measurements by nativity (and, presumably, by other relevant characteristics).

    Our sixth and final method involves testing nativity questions in a CCM framework for post-2010 purposes. The sensitivity of nativity questions in a CCM framework might bias the dual system estimator by inducing correlation bias among the foreign-born population. While this is a reasonable concern, it warrants empirical testing. If, in fact, the impact of a nativity question is negligible, and a congressional mandate for such estimation were in place, then there would be no reason not to apply the gold standard coverage measurement technique to develop a statistically principled estimate of the net coverage of the foreign born.


    References

    Agresti, A. 1990. Categorical Data Analysis. New York: Wiley.

    Anderson, M. J. and Fienberg, S. E. 1999. Who Counts? The Politics of Census-Taking in Contemporary America. New York, NY: Russell Sage Foundation.

    Arriaga, E.E., P.D. Johnson, and E. Jamison. 1994. Population Analysis with Microcomputers, Vol. 1 and 2. Washington, D.C.: U.S. Department of Commerce, U.S. Census Bureau, International Programs Center.

    Bill, W. 2002. A.C.E. Revision II: Calculating Aggregate Data Defined, Correct Enumeration, and Census Inclusion Rates (For Groups that Involve Aggregation Across Post-Strata). Online: http://www.census.gov/dmd/www/pdf/pp-40r.pdf.

    Bruce, A., and Robinson, J. G. 2003. "The Planning Database: Its Development and Use as an Effective Targeting Tool in Census 2000," paper presented at the Annual Meetings of the Southern Demographic Association, Arlington, VA, October 24, 2003.

    Bruce, A., Robinson, J. G., and Sanders, M. V. 2001. "Hard-to-Count Scores and Broad Demographic Groups Associated with Patterns of Response Rates in Census 2000," Proceedings of the Social Statistics Section, American Statistical Association.

    Camarota, S. and Capizzano, J. 2004. Assessing the Quality of Data Collected on the Foreign Born: An Evaluation of the American Community Survey (ACS). Online: http://www.sabresystems.com/whitepapers/CIS_whitepaper.pdf.

    Camarota, S. 2006. Assessing the Quality of Data Collected on the Foreign Born: An Evaluation of the American Community Survey (ACS). Paper presented at the U.S. Census Bureau Conference, Immigration Statistics: Methodology and Data Quality. Alexandria, VA: February 13-14, 2006.

    Cantwell, P.J., Hogan, H., and Styles, K.M. 2004. The Use of Statistical Methods in the U.S. Census: Utah v. Evans. The American Statistician, 58: 203-212.

    Chandrasekar, C., and Deming, W.E. 1949. On a Method of Estimating Birth and Death Rates and the Extent of Registration. Journal of the American Statistical Association, 44: 101-115.

    Clark, W.A.V. and Patel, S. 2004. Residential Choices of the Newly Arrived Foreign Born: Spatial Patterns and the Implications for Assimilation. California Center for Population Research On-Line Working Paper Series, CCPR-026-04, February 2004.

    Darga, K. 2000. Fixing the Census Until it Breaks: An Assessment of the Undercount Adjustment Puzzle. Lansing, MI: Michigan Information Center.


    De la Puente, M. 1993. Why Are People Missed Or Erroneously Included By The Census: A Summary Of Findings From Ethnographic Coverage Reports. Research Conference on Undercounted Ethnic Populations. Richmond, VA: U.S. Census Bureau.

    Deardorff, K. and Blumerman, L. 2001. Appendix A: Estimates of the Foreign-Born Population by Migrant Status: 2000. In Robinson, J.G. 2001. ESCAP II: Demographic Analysis Results. Executive Steering Committee for A.C.E. Policy II, Report No. 1, October 13, 2001. Online: http://www.census.gov/dmd/www/pdf/Report1.PDF.

    Ellis, Y. 1995. Examination of Census Omission and Erroneous Enumeration Based on 1990 Ethnographic Studies of Census Coverage. Pp. 515-520 in Proceedings of the American Statistical Association (Survey Research Methods Section). Alexandria, VA: American Statistical Association.

    Ewbank, D.C. 1981. Age Misreporting and Age-Selective Underenumeration: Sources, Patterns, and Consequences for Demographic Analysis. Washington, D.C.: National Academy Press.

    Fay, R. E. 2001. The 2000 Housing Unit Duplication Operations and Their Effect on The Accuracy Of The Population Count. Paper presented at the Annual Meeting of the American Statistical Association, Atlanta, Georgia, August 5-9, 2001.

    Fein, D. J. 1990. Racial and ethnic differences in U.S. census omission rates. Demography, 27: 285-302.

    Fernandez, E.W., and Robinson, J. G. 1994. "Illustrative Ranges of the Distribution of Undocumented Immigrants by State," Technical Working Paper No. 8. October 1994. Online: http://www.census.gov/population/www/documentation/twps0008/twps0008.html.

    Government Accountability Office 1998a. Decennial Census: Overview of Historical Census Issues. GAO/GGD-98-103. Washington D.C.: U.S. Government Accountability Office.

    Government Accountability Office 1998b. Immigration Statistics: Information Gaps, Quality Issues Limit Utility of Federal Data to Policymakers. GAO/GGD-98-164. Washington D.C.: U.S. Government Accountability Office.

    Hoefer, M., Rytina, N., and Baker, B. 2008. Estimates of the Unauthorized Immigrant Population Residing in the United States: January 2007. Online: http://www.dhs.gov/xlibrary/assets/statistics/publications/ois_ill_pe_2007.pdf.

    Hogan, H. 1992. "The 1990 Post-Enumeration Survey: An Overview." The American Statistician, 46: 261-269.


    Hogan, H. 1993. "The 1990 Post-Enumeration Survey: Operations and Results." Journal of the American Statistical Association, 88: 1047-1060.

    Hogan, H. 2000. Accuracy and Coverage Evaluation: Theory and Application. Paper presented at the 2000 Joint Statistical Meetings, Indianapolis, Indiana, August 2-5, 2000.

    Hogan, H. 2008. Letter to Ms. Judith Droitcour, Assistant Director, Applied Research and Methods, U.S. Government Accountability Office. Dated February 26, 2008.

    Jones, J. 2003. Housing Unit Duplication in Census 2000. Census Bureau Evaluation O.10. Washington, DC: U.S. Census Bureau. Online: http://www.census.gov/pred/www/rpts/O.10.PDF.

    Judson, D.H. 2006. Demographic Coverage Measurement: Can Information Integration Theory Help? Paper presented at the 2006 Joint Statistical Meetings, Seattle, WA, August 6-10, 2006.

    Judson, D.H. 2007. Information integration for constructing social statistics: history, theory and ideas towards a research programme. Journal of the Royal Statisti