appendix iii. limitations of the data · appendix iii limitations of the data ... selection and...

25
Appendix III Limitations of the Data Introduction—The data presented in this Statistical Abstract came from many sources. The sources include not only fed- eral statistical bureaus and other organi- zations that collect and issue statistics as their principal activity, but also govern- mental administrative and regulatory agencies, private research bodies, trade associations, insurance companies, health associations, and private organizations such as the National Education Associa- tion and philanthropic foundations. Con- sequently, the data vary considerably as to reference periods, definitions of terms and, for ongoing series, the number and frequency of time periods for which data are available. The statistics presented were obtained and tabulated by various means. Some statistics are based on complete enumera- tions or censuses, while others are based on samples. Some information is ex- tracted from records kept for administra- tive or regulatory purposes (school enroll- ment, hospital records, securities registration, financial accounts, social security records, income tax returns, etc.), while other information is obtained ex- plicitly for statistical purposes through interviews or by mail. The estimation pro- cedures used vary from highly sophisti- cated scientific techniques, to crude ‘‘in- formed guesses.’’ Each set of data relates to a group of indi- viduals or units of interest referred to as the target universe or target population, or simply as the universe or population. Prior to data collection the target universe should be clearly defined. For example, if data are to be collected for the universe of households in the United States, it is necessary to define a ‘‘household.’’ The target universe may not be completely tractable. Cost and other considerations may restrict data collection to a survey universe based on some available list, such list may be it of date. This list is called a survey frame or sampling frame. The data in many tables are based on data obtained for all population units, a census, or on data obtained for only a portion, or sample, of the population units. When the data presented are based on a sample, the sample is usually a sci- entifically selected probability sample. This is a sample selected from a list or sampling frame in such a way that every possible sample has a known chance of selection and usually each unit selected can be assigned a number, greater than zero and less than or equal to one, repre- senting its likelihood or probability of se- lection. For large-scale sample surveys, the prob- ability sample of units is often selected as a multistage sample. The first stage of a multistage sample is the selection of a probability sample of large groups of population members, referred to as pri- mary sampling units (PSUs). For example, in a national multistage household sample, PSUs are often counties or groups of counties. The second stage of a multi- stage sample is the selection, within each PSU selected at the first stage, of smaller groups of population units, referred to as secondary sampling units. In subsequent stages of selection, smaller and smaller nested groups are chosen until the ulti- mate sample of population units is ob- tained. To qualify a multistage sample as a probability sample, all stages of sam- pling must be carried out using probabil- ity sampling methods. Prior to selection at each stage of a multi- stage (or a single stage) sample, a list of the sampling units or sampling frame for that stage must be obtained. For ex- ample, for the first stage of selection of a national household sample, a list of the counties and county groups that form the PSUs must be obtained. For the final stage of selection, lists of households, and sometimes persons within the house- holds, have to be compiled in the field. For surveys of economic entities and for the economic censuses the Census Bureau Appendix III 941 U.S. Census Bureau, Statistical Abstract of the United States: 2003

Upload: others

Post on 22-Mar-2020

10 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Appendix III. Limitations of the Data · Appendix III Limitations of the Data ... selection and usually each unit selected can be assigned a number, greater than zero and less than

Appendix III

Limitations of the Data

Introduction—The data presented in thisStatistical Abstract came from manysources. The sources include not only fed-eral statistical bureaus and other organi-zations that collect and issue statistics astheir principal activity, but also govern-mental administrative and regulatoryagencies, private research bodies, tradeassociations, insurance companies, healthassociations, and private organizationssuch as the National Education Associa-tion and philanthropic foundations. Con-sequently, the data vary considerably asto reference periods, definitions of termsand, for ongoing series, the number andfrequency of time periods for which dataare available.

The statistics presented were obtainedand tabulated by various means. Somestatistics are based on complete enumera-tions or censuses, while others are basedon samples. Some information is ex-tracted from records kept for administra-tive or regulatory purposes (school enroll-ment, hospital records, securitiesregistration, financial accounts, socialsecurity records, income tax returns, etc.),while other information is obtained ex-plicitly for statistical purposes throughinterviews or by mail. The estimation pro-cedures used vary from highly sophisti-cated scientific techniques, to crude ‘‘in-formed guesses.’’

Each set of data relates to a group of indi-viduals or units of interest referred to asthe target universe or target population,or simply as the universe or population.Prior to data collection the target universeshould be clearly defined. For example, ifdata are to be collected for the universeof households in the United States, it isnecessary to define a ‘‘household.’’ Thetarget universe may not be completelytractable. Cost and other considerationsmay restrict data collection to a surveyuniverse based on some available list,such list may be it of date. This list iscalled a survey frame or sampling frame.

The data in many tables are based ondata obtained for all population units, acensus, or on data obtained for only aportion, or sample, of the populationunits. When the data presented are basedon a sample, the sample is usually a sci-entifically selected probability sample.This is a sample selected from a list orsampling frame in such a way that everypossible sample has a known chance ofselection and usually each unit selectedcan be assigned a number, greater thanzero and less than or equal to one, repre-senting its likelihood or probability of se-lection.

For large-scale sample surveys, the prob-ability sample of units is often selected asa multistage sample. The first stage of amultistage sample is the selection of aprobability sample of large groups ofpopulation members, referred to as pri-mary sampling units (PSUs). For example,in a national multistage householdsample, PSUs are often counties or groupsof counties. The second stage of a multi-stage sample is the selection, within eachPSU selected at the first stage, of smallergroups of population units, referred to assecondary sampling units. In subsequentstages of selection, smaller and smallernested groups are chosen until the ulti-mate sample of population units is ob-tained. To qualify a multistage sample asa probability sample, all stages of sam-pling must be carried out using probabil-ity sampling methods.

Prior to selection at each stage of a multi-stage (or a single stage) sample, a list ofthe sampling units or sampling frame forthat stage must be obtained. For ex-ample, for the first stage of selection of anational household sample, a list of thecounties and county groups that form thePSUs must be obtained. For the final stageof selection, lists of households, andsometimes persons within the house-holds, have to be compiled in the field.For surveys of economic entities and forthe economic censuses the Census Bureau

Appendix III 941

U.S. Census Bureau, Statistical Abstract of the United States: 2003

Page 2: Appendix III. Limitations of the Data · Appendix III Limitations of the Data ... selection and usually each unit selected can be assigned a number, greater than zero and less than

generally uses a frame constructed fromthe Bureau’s Business Register. The Busi-ness Register contains all establishmentswith payroll in the United States, includ-ing small single establishment firms aswell as large multiestablishment firms.

Wherever the quantities in a table refer toan entire universe, but are constructedfrom data collected in a sample survey,the table quantities are referred to assample estimates. In constructing asample estimate, an attempt is made tocome as close as is feasible to the corre-sponding universe quantity that would beobtained from a complete census of theuniverse. Estimates based on a samplewill, however, generally differ from thehypothetical census figures. Two classifi-cations of errors are associated with esti-mates based on sample surveys: (1) sam-pling error the error arising from the useof a sample, rather than a census, to esti-mate population quantities and (2) non-sampling error those errors arising fromnonsampling sources. As discussed be-low, the magnitude of the sampling errorfor an estimate can usually be estimatedfrom the sample data. However, the mag-nitude of the nonsampling error for anestimate can rarely be estimated. Conse-quently, actual error in an estimate ex-ceeds the error that can be estimated.

The particular sample used in a survey isonly one of a large number of possiblesamples of the same size which couldhave been selected using the same sam-pling procedure. Estimates derived fromthe different samples would, in general,differ from each other. The standard error(SE) is a measure of the variation amongthe estimates derived from all possiblesamples. The standard error is the mostcommonly used measure of the samplingerror of an estimate. Valid estimates ofthe standard errors of survey estimatescan usually be calculated from the datacollected in a probability sample. For con-venience, the standard error is sometimesexpressed as a percent of the estimateand is called the relative standard error orcoefficient of variation (CV). For example,an estimate of 200 units with an esti-mated standard error of 10 units has anestimated CV of 5 percent.

A sample estimate and an estimate of itsstandard error or CV can be used to con-struct interval estimates that have a pre-scribed confidence that the interval in-cludes the average of the estimatesderived from all possible samples with aknown probability. To illustrate, if all pos-sible samples were selected under essen-tially the same general conditions, andusing the same sample design, and if anestimate and its estimated standard errorwere calculated from each sample, then:(1) Approximately 68 percent of the inter-vals from one standard error below theestimate to one standard error above theestimate would include the average esti-mate derived from all possible samples;(2) approximately 90 percent of the inter-vals from 1.6 standard errors below theestimate to 1.6 standard errors above theestimate would include the average esti-mate derived from all possible samples;and (3) approximately 95 percent of theintervals from two standard errors belowthe estimate to two standard errors abovethe estimate would include the averageestimate derived from all possiblesamples.

Thus, for a particular sample, one can saywith the appropriate level of confidence(e.g., 90 percent or 95 percent) that theaverage of all possible samples is in-cluded in the constructed interval. Ex-ample of a confidence interval: An esti-mate is 200 units with a standard error of10 units. An approximately 90-percentconfidence interval (plus or minus 1.6standard errors) is from 184 to 216.

All surveys and censuses are subject tononsampling errors. Nonsampling errorsare of two kinds random and nonrandom.Random nonsampling errors arise be-cause of the varying interpretation ofquestions (by respondents or interview-ers) and varying actions of coders, key-ers, and other processors. Some random-ness is also introduced when respondentsmust estimate. Nonrandom nonsamplingerrors result from total nonresponse (nousable data obtained for a sampled unit),partial or item nonresponse (only a por-tion of a response may be usable), inabil-ity or unwillingness on the part of respon-dents to provide correct information,difficulty interpreting questions, mistakes

942 Appendix III

U.S. Census Bureau, Statistical Abstract of the United States: 2003

Page 3: Appendix III. Limitations of the Data · Appendix III Limitations of the Data ... selection and usually each unit selected can be assigned a number, greater than zero and less than

in recording or keying data, errors of col-lection or processing, and coverage prob-lems (overcoverage and undercoverageof the target universe). Random nonre-sponse errors usually, but not always,result in an understatement of samplingerrors and thus an overstatement of theprecision of survey estimates. Estimatingthe magnitude of nonsampling errorswould require special experiments oraccess to independent data and, conse-quently, the magnitudes are seldom avail-able.

Nearly all types of nonsampling errorsthat affect surveys also occur in completecensuses. Since surveys can be conductedon a smaller scale than censuses, non-sampling errors can presumably be con-trolled more tightly. Relatively more fundsand effort can perhaps be expended to-ward eliciting responses, detecting andcorrecting response error, and reducingprocessing errors. As a result, survey re-sults can sometimes be more accuratethan census results.

To compensate for suspected nonrandomerrors, adjustments of the sample esti-mates are often made. For example,adjustments are frequently made fornonresponse, both total and partial. Ad-justments made for either type of nonre-sponse are often referred to as imputa-tions. Imputation for total nonresponseis usually made by substituting for thequestionnaire responses of the nonre-spondents the ‘‘average’’ questionnaireresponses of the respondents. These im-putations usually are made separatelywithin various groups of sample mem-bers, formed by attempting to place re-spondents and nonrespondents togetherthat have ‘‘similar’’ design or ancillarycharacteristics. Imputation for item nonre-sponse is usually made by substitutingfor a missing item the response to thatitem of a respondent having characteris-tics that are ‘‘similar’’ to those of the non-respondent.

For an estimate calculated from a samplesurvey, the total error in the estimate iscomposed of the sampling error, whichcan usually be estimated from thesample, and the nonsampling error, whichusually cannot be estimated from the

sample. The total error present in a popu-lation quantity obtained from a completecensus is composed of only nonsamplingerrors. Ideally, estimates of the total errorassociated with data given in the Statisti-cal Abstract tables should be given. How-ever, because of the unavailability of esti-mates of nonsampling errors, onlyestimates of the levels of sampling errors,in terms of estimated standard errors orcoefficients of variation, are available. Toobtain estimates of the estimated stand-ard errors from the sample of interest,obtain a copy of the referenced report,which appears at the end of each table.

Source of Additional Material: The FederalCommittee on Statistical Methodology(FCSM) is an interagency committee dedi-cated to improving the quality of federalstatistics. <http://fcsm.ssd.census.gov/>

Principal data bases—Beginning beloware brief descriptions of 46 of the samplesurveys and censuses that provide a sub-stantial portion of the data contained inthis Abstract.

U.S. DEPARTMENT OFAGRICULTURE, NationalAgriculture Statistics Service

Basic Area Frame SampleUniverse, Frequency, and Types of Data:

June agricultural survey collects data onplanted acreage and livestock invento-ries. The survey also serves to measurelist incompleteness and is subsampledfor multiple frame surveys.

Type of Data Collection Operation: Strati-fied probability sample of about 11,000land area units of about 1 sq. mile(range from 0.1 sq. mile in cities to sev-eral sq. miles in open grazing areas).Sample includes 42,000 parcels of agri-cultural land. About 20 percent of thesample replaced annually.

Data Collection and Imputation Proce-dures: Data collection is by personalenumeration. Imputation is based onenumerator observation or datareported by respondents having similaragricultural characteristics.

Estimates of Sampling Error: EstimatedCVs range from 1 percent to 2 percentfor regional estimates to 3 percent to 6percent for state estimates of majorcrop acres and livestock inventories.

Appendix III 943

U.S. Census Bureau, Statistical Abstract of the United States: 2003

Page 4: Appendix III. Limitations of the Data · Appendix III Limitations of the Data ... selection and usually each unit selected can be assigned a number, greater than zero and less than

Other (nonsampling) Errors: Minimizedthrough rigid quality controls on the col-lection process and careful review of allreported data.

Sources of Additional Material: U.S.Department of Agriculture, NationalAgricultural Statistics Service: The FactFinders of Agriculture, September 1994.

Multiple Frame Surveys

Universe, Frequency, and Types of Data:Surveys of U.S. farm operators to obtaindata on major livestock inventories,selected crop acreage and production,grain stocks, and farm labor characteris-tics; farm economic data and chemicaluse data.

Type of Data Collection Operation: Pri-mary frame is obtained from general orspecial purpose lists, supplemented by aprobability sample of land areas used toestimate for list incompleteness.

Data Collection and Imputation Proce-dures: Mail, telephone, or personal inter-views used for initial data collection.Mail nonrespondent followup by phoneand personal interviews. Imputationbased on average of respondents.

Estimates of Sampling Error: EstimatedCV for number of hired farm workers isabout 3 percent. Estimated CVs rangefrom 1 percent to 2 percent for regionalestimates to 3 percent to 6 percent forstate estimates of livestock inventoriesand crop acreage.

Other (nonsampling) Errors: In addition toabove, replicated sampling proceduresused to monitor effects of changes insurvey procedures.

Sources of Additional Material: U.S.Department of Agriculture, NationalAgricultural Statistics Service: The FactFinders of Agriculture, September 1994.

Objective Yield Surveys

Universe, Frequency, and Types of Data:Surveys for data on corn, cotton, pota-toes, soybeans, and wheat to forecastand estimate yields.

Type of Data Collection Operation: Ran-dom location of plots in probabilitysample. Corn, cotton, soybeans, springwheat, and durum wheat selected inJune from Basic Area Frame Sample (see

above). Winter wheat and potatoesselected from March and June multipleframe surveys, respectively.

Data Collection and Imputation Proce-dures: Enumerators count and measureplant characteristics in sample fields.Production measured from plots at har-vest. Harvest loss measured from postharvest gleanings.

Estimates of Sampling Error: CVs fornational estimates of production areabout 2-3 percent.

Other (nonsampling) Errors: In addition toabove, replicated sampling proceduresused to monitor effects of changes insurvey procedures.

Sources of Additional Material: U.S.Department of Agriculture, NationalAgricultural Statistics Service: The FactFinders of Agriculture, September 1994.

U.S. CENSUS BUREAU

County Business Patterns

Universe, Frequency, and Types of Data:Annual tabulation of basic data itemsextracted from the Business Register, afile of all known single and multiestab-lishment companies maintained andupdated by the Census Bureau. Datainclude number of establishments, num-ber of employees, first quarter andannual payrolls, and number of estab-lishments by employment size class.Data are excluded for self-employed per-sons, domestic service workers, railroademployees, agricultural productionworkers, and most government employ-ees.

Type of Data Collection Operation: Theannual Company Organization Surveyprovides individual establishment datafor multi establishment companies. Datafor single establishment companies areobtained from various Census Bureauprograms, such as the Annual Survey ofManufactures and Current Business Sur-veys, as well as from administrativerecords of the Internal Revenue Serviceand the Social Security Administration.

Estimates of Sampling Error: Not appli-cable.

Other (nonsampling) Error: Responserates of greater than 85 percent for the2000 Company Organization Survey.

944 Appendix III

U.S. Census Bureau, Statistical Abstract of the United States: 2003

Page 5: Appendix III. Limitations of the Data · Appendix III Limitations of the Data ... selection and usually each unit selected can be assigned a number, greater than zero and less than

1997 Economic Census—Manufacturing Sector

Universe, Frequency, and Types of Data:Conducted every 5 years to obtain infor-mation on labor, materials, capital inputand output characteristics, plant loca-tion, and legal form of organization forall plants in the United States with oneor more paid employees. Universe was36,000 manufacturing establishments in1997.

Type of Data Collection Operation: Com-plete enumeration of data itemsobtained from 200,000 firms. Adminis-trative records from Internal RevenueService (IRS) and Social Security Admin-istration (SSA) are used for 166,000smaller single-location firms, whichwere determined by various cutoffsbased on size and industry.

Data Collection and Imputation Proce-dures: Four mail and telephone follow-ups for larger nonrespondents. Data forsmall single-location firms (generallythose with fewer than 10 employees)not mailed census questionnaires wereestimated from administrative recordsof IRS and SSA. Data for nonrespondentswere imputed from related responses oradministrative records from IRS and SSA.Approximately 9 percent of total valueof shipments was represented by fullyimputed records in 1997.

Estimates of Sampling Error: Not appli-cable.

Other (nonsampling) Errors: Based onevaluation studies, estimates of non-sampling errors for 1972 were about1.3 percent for estimated total payroll;2 percent for total employment; and 1percent for value of shipments. Esti-mates for later years are not available.

Sources of Additional Material: U.S. Cen-sus Bureau, 1997 Economic Census -Manufacturing Sector, Industry Series,Geographic Area Series, Subject Series,and Summary Series.

Foreign Trade—ExportStatistics

Universe, Frequency, and Types of Data:The export declarations collected byU.S. Bureau of Customs and Border Pro-tection are processed each month toobtain data on the movement of U.S.

merchandise exports to foreign coun-tries. Data obtained include value, quan-tity, and shipping weight of exports bycommodity, country of destination, dis-trict of exportation, and mode of trans-portation.

Type of Data Collection Operation: Ship-pers Export Declarations (paper andelectronic) are generally required to befiled for the exportation of merchandisevalued over $2,500. U.S. Bureau of Cus-toms and Boarder Protection officialscollect and transmit the documents tothe Census Bureau on a flow basis fordata compilation. Data for shipmentsvalued under $2,501 are estimated,based on established percentages ofindividual country totals.

Data Collection and Imputation Proce-dures: Statistical copies of ShippersExport Declarations are received on adaily basis from ports throughout thecountry and subjected to a monthly pro-cessing cycle. They are fully processedto the extent they reflect items valuedover $2,500. Estimates for shipmentsvalued at $2,500 or less are made,based on established percentages ofindividual country totals.

Estimates of Sampling Error: Not appli-cable.

Other (nonsampling) Errors: The goodsdata are a complete enumeration ofdocuments collected by the U.S. Bureauof Customs and Boarder Protection andare not subject to sampling errors; butthey are subject to several types of non-sampling errors. Quality assurance pro-cedures are performed at every stage ofcollection, processing and tabulation;however the data are still subject to sev-eral types of nonsampling errors. Themost significant of these include report-ing errors, undocumented shipments,timeliness, data capture errors, anderrors in the estimation of low-valuedtransactions.

Sources of Additional Material: U.S.Census Bureau, U.S. InternationalTrade in Goods and Services, FT 900,U.S. Imports of Merchandise, andU.S. Exports of Merchandise.<http://www.census.gov/foreign-trade/guide/sec2.html>

Appendix III 945

U.S. Census Bureau, Statistical Abstract of the United States: 2003

Page 6: Appendix III. Limitations of the Data · Appendix III Limitations of the Data ... selection and usually each unit selected can be assigned a number, greater than zero and less than

Foreign Trade—ImportStatistics

Universe, Frequency, and Types of Data:The import entry documents collectedby U.S. Bureau of Customs and BoarderProtection are processed each month toobtain data on the movement of mer-chandise imported into the UnitedStates. Data obtained include value,quantity, and shipping weight by com-modity, country of origin, district ofentry, and mode of transportation.

Type of Data Collection Operation: Importentry documents, either paper or elec-tronic, are required to be filed for theimportation of goods into the UnitedStates valued over $2,000 or for articleswhich must be reported on formalentries. U.S. Bureau of Customs andBoarder Protection officials collect andtransmit statistical copies of the docu-ments to the Census Bureau on a flowbasis for data compilation. Estimates forshipments valued under $2,001 and notreported on formal entries are based onestimated established percentages forindividual country totals.

Data Collection and Imputation Proce-dures: Statistical copies of import entrydocuments, received on a daily basisfrom ports of entry throughout thecountry, are subjected to a monthly pro-cessing cycle. They are fully processedto the extent they reflect items valued at$2,001 and over or items which must bereported on formal entries.

Estimates of Sampling Error: Not appli-cable.

Other (nonsampling) Errors: The goodsdata are a complete enumeration ofdocuments collected by the U.S. Bureauof Customs and Boarder Protection andare not subject to sampling errors, butthey are subject to several types of non-sampling errors. Quality assurance pro-cedures are performed at every stage ofcollection, processing and tabulation;however the data are still subject to sev-eral types of nonsampling errors. Themost significant of these include report-ing errors, undocumented shipments,timeliness, data capture errors, anderrors in the estimation of low-valuedtransactions.

Sources of Additional Material: U.S.Census Bureau, U.S. InternationalTrade in Goods and Services, FT 900,U.S. Imports of Merchandise, andU.S. Exports of Merchandise.<http://www.census.gov/foreigntrade/guide/sec2.html>

Census of Governments

Universe, Frequency, and Types of Data:Survey of all governmental units in theUnited States conducted every 5 yearsto obtain data on government revenue,expenditures, debt, assets, employmentand employee retirement systems, prop-erty values, public school systems, andnumber, size, and structure of govern-ments.

Type of Data Collection Operation: Com-plete census. List of units derivedthrough classification of governmentunits recently authorized in each stateand identification, counting, and classifi-cation of existing local governments andpublic school systems.

Data Collection and Imputation Proce-dures: Data collected through field andoffice compilation of financial data fromofficial records and reports for statesand large local governments; mail can-vass of selected data items, like statetax revenue and employee retirementsystems; and collection of local govern-ment statistics through central collec-tion arrangements with stategovernments.

Estimates of Sampling Error: Not appli-cable.

Other (nonsampling) Errors: Some non-sampling errors may arise due to pos-sible inaccuracies in classification,response, and processing.

Sources of Additional Material: Publica-tions <http://www.census.gov/prod/www/abs/govern.html>: U.S. CensusBureau, Public Employment in 1992, GE92, No. 1, Governmental Finances in1991-1992, GF 92, No. 5, and Census ofGovernments, 1997 and 2002, variousreports. Web site references: Census ofGovernments <http://www.census.gov/govs/www/cog2002.html> and<http://www.census.gov/govs/www/cog.html>. Employment—state and

946 Appendix III

U.S. Census Bureau, Statistical Abstract of the United States: 2003

Page 7: Appendix III. Limitations of the Data · Appendix III Limitations of the Data ... selection and usually each unit selected can be assigned a number, greater than zero and less than

local site: <http://www.census.gov/govs/www/apes.html>. Finance—stateand local site: <http://www.census.gov/govs/www/financegen.html>.

Annual Surveys of State andLocal Government

Universe, Frequency, and Types of Data:Sample survey conducted annually toobtain data on revenue, expenditure,debt, and employment of state and localgovernments. Universe is all govern-mental units in the United States (about87,500).

Type of Data Collection Operation: Samplesurvey includes all state governments,county governments with 100,000+population, municipalities with 75,000+population, townships with 50,000+population, all school districts with10,000+ enrollment in March 2000, andother governments meeting certain cri-teria; probability sample for remainingunits.

Data Collection and Imputation Proce-dures: Field and office compilation ofdata from official records and reportsfor states and large local governments;central collection of local governmentalfinancial data through cooperativeagreements with a number of state gov-ernments; mail canvass of other unitswith mail and telephone follow-ups ofnonrespondents. Data for nonresponsesare imputed from previous year data orobtained from secondary sources, ifavailable.

Estimates of Sampling Error: State andlocal government totals are generallysubject to sampling variability of lessthan 3 percent.

Other (nonsampling) Errors: Nonresponserate is less than 15 percent for localgovernments. Other possible errors mayresult from undetected inaccuracies inclassification, response, and processing.

Sources of Additional Material: Publica-tions <http://www.census.gov/prod/www/abs/govern.html>: U.S. CensusBureau, Public Employment in 1992, GE92, No. 1, Governmental Finances in1991-1992, GF 92, No. 5, and Census ofGovernments, 1997 and 2002, variousreports. Web site references: Census of

Governments <http://www.census.gov/gov/govs/www/cog202.html> and<http://www.census.gov/govs/www/cog.html>. Employment—state andlocal site: <http://www.census.gov/govs/www/apes.html>. Finance—stateand local site: <http://www.census.gov/govs/www/financegen.html> and<http://www.census.gov/govs/www/statetechdoc.html>.

American Housing Survey

Universe, Frequency, and Types of Data:Conducted nationally in the fall in oddnumbered years to obtain data on theapproximately 116 million occupied orvacant housing units in the UnitedStates (group quarters are excluded).Data include characteristics of occupiedhousing units, vacant units, new hous-ing and mobile home units, financialcharacteristics, recent mover house-holds, housing and neighborhoodquality indicators, and energy character-istics.

Type of Data Collection Operation: Thenational sample was a multistage prob-ability sample with about 53,000 unitseligible for interview in 2001. Sampleunits, selected within 394 PSUs, weresurveyed over a 4-month period.

Data Collection and Imputation Proce-dures: For 2001, the survey was con-ducted by personal interviews. Theinterviewers obtained the informationfrom the occupants or, if the unit wasvacant, from informed persons such aslandlords, rental agents, or knowledge-able neighbors.

Estimates of Sampling Error: For thenational sample, illustrations of the SEof the estimates are provided in theAppendix D of the 2001 report. As anexample, the estimated CV is about 0.2percent for the estimated percentage ofowner occupied units with two persons.

Other (nonsampling) Errors: Responserate was about 93 percent. Nonsamplingerrors may result from incorrect orincomplete responses, errors in codingand recording, and processing errors.For the 2001 national sample, approxi-mately 1.9 percent of the total housinginventory was not adequately repre-sented by the AHS sample.

Appendix III 947

U.S. Census Bureau, Statistical Abstract of the United States: 2003

Page 8: Appendix III. Limitations of the Data · Appendix III Limitations of the Data ... selection and usually each unit selected can be assigned a number, greater than zero and less than

Sources of Additional Material: U.S. Cen-sus Bureau, Current Housing Reports,Series H150 and H-170, American Hous-ing Survey. <http://www.census.gov/hhes/www/ahs.html>.

Monthly Survey ofConstruction

Universe, Frequency, and Types of Data:Survey conducted monthly of newly con-structed housing units (excludingmobile homes). Data are collected onthe start, completion, and sale of hous-ing. (Annual figures are aggregates ofmonthly estimates.)

Type of Data Collection Operation: Prob-ability sample of housing units obtainedfrom building permits selected from19,000 places. For nonpermit places,multistage probability sample of newhousing units selected in 169 PSUs. Inthose areas, all roads are canvassed inselected enumeration districts.

Data Collection and Imputation Proce-dures: Data are obtained by telephoneinquiry and field visit.

Estimates of Sampling Error: EstimatedCV of 3 percent to 4 percent for esti-mates of national totals, but may behigher than 20 percent for estimatedtotals of more detailed characteristics,such as housing units in multiunit struc-tures.

Other (nonsampling) Errors: Responserate is over 90 percent for most items.Nonsampling errors are attributed todefinitional problems, differences ininterpretation of questions, incorrectreporting, inability to obtain informationabout all cases in the sample, and pro-cessing errors.

Sources of Additional Material: All dataare available on the Internet at<http://www.census.gov/const/www/newsresconstindex.html>. Furtherdocumentation of the survey is alsoavailable at that site.

Value of Construction Put inPlace

Universe, Frequency, and Types of Data:Survey conducted monthly on totalvalue of all construction put in place in

the current month, both public and pri-vate projects. Construction valuesinclude costs of materials and labor,contractors profits, overhead costs, costof architectural and engineering work,and miscellaneous project costs.(Annual figures are aggregates ofmonthly estimates.)

Type of Data Collection Operation: Variesby type of activity: Total cost of privateone-family houses started each month isdistributed into value put in place usingfixed patterns of monthly constructionprogress; using a multistage probabilitysample, data for private multifamilyhousing are obtained by mail from own-ers of multiunit projects. Data for resi-dential additions and alterations areobtained in a quarterly survey measur-ing expenditures; monthly estimates areinterpolated from quarterly data. Esti-mates of value of private nonresidentialconstruction, and state and local gov-ernment construction are obtained bymail from owners (or agents) for a prob-ability sample of projects. Estimates offarm nonresidential construction expen-ditures are based on U.S. Department ofAgriculture annual estimates of con-struction; public utility estimates areobtained from reports submitted to fed-eral regulatory agencies and from pri-vate utility companies; estimates offederal construction are based onmonthly data supplied by federal agen-cies.

Data Collection and Imputation Proce-dures: See ‘‘Type of Data Collection’’Operation. Imputation accounts forapproximately 25 percent of estimatedvalue of construction each month.

Estimates of Sampling Error: CV estimatesfor private nonresidential constructionrange from 3 percent for estimatedvalue of industrial buildings to 9 percentfor religious buildings. CV is approxi-mately 2 percent for total new privatenonresidential buildings.

Other (nonsampling) Errors: For directlymeasured data series based on samples,some nonsampling errors may arisefrom processing errors, imputations,and misunderstanding of questions.Indirect data series are dependent onthe validity of the underlying assump-tions and procedures.

948 Appendix III

U.S. Census Bureau, Statistical Abstract of the United States: 2003

Page 9: Appendix III. Limitations of the Data · Appendix III Limitations of the Data ... selection and usually each unit selected can be assigned a number, greater than zero and less than

Sources of Additional Material: U.S. Cen-sus Bureau, Construction Reports, SeriesC30, Value of Construction Put in Place.

Annual Survey ofManufactures

Universe, Frequency, and Types of Data:The Annual Survey of Manufactures(ASM) is conducted annually, except foryears ending in 2 and 7 for all manufac-turing establishments having one ormore paid employees. The purpose ofthe ASM is to provide key intercensalmeasures of manufacturing activity,products, and location for the publicand private sectors. The ASM providesstatistics on employment, payroll,worker hours, payroll supplements, costof materials, value added by manufac-turing, capital expenditures, inventories,and energy consumption. It also pro-vides estimates of value of shipmentsfor 1,800 classes of manufactured prod-ucts.

Type of Data Collection Operation: TheASM includes approximately 55,000establishments selected from the censusuniverse of 366,000 manufacturingestablishments. Some 25,000 largeestablishments are selected with cer-tainty, and some 30,000 other establish-ments are selected with probabilityproportional to a composite measure ofestablishment size. The survey isupdated from two sources; Internal Rev-enue Service administrative records areused to include new single-unit manu-facturers and the Company OrganizationSurvey identifies new establishments ofmultiunit forms.

Data Collection and Imputation Proce-dures: Survey is conducted by mail withphone and mail follow-ups of nonre-spondents. Imputation (for all nonre-sponse items) is based on previous yearreports, or for new establishments insurvey, on industry averages.

Estimates of Sampling Error: Estimatedstandard errors for number of employ-ees, new expenditure, and for valueadded totals are given in annual publica-tions. For U.S. level industry statistics,most estimated standard errors are 2percent or less, but vary considerablyfor detailed characteristics.

Other (nonsampling) Errors: Responserate is about 85 percent. Nonsamplingerrors include those due to collection,reporting, and transcription errors,many of which are corrected throughcomputer and clerical checks.

Sources of Additional Material: U.S. Cen-sus Bureau, Annual Survey of Manufac-tures, and Technical Paper 24.<http://www.census.gov/econ/www/mancen.html>.

Census of Population

Universe, Frequency, and Types of Data:Complete count of U.S. population con-ducted every 10 years since 1790. Dataobtained on number and characteristicsof people in the U.S.

Type of Data Collection Operation: In1980, 1990, and 2000 complete censusfor some items age, sex, race, and rela-tionship to householder. In 1980,approximately 19 percent of the hous-ing units were included in the sample; in1990 and 2000, approximately 17 per-cent.

Data Collection and Imputation Proce-dures: In 1980, 1990, and 2000, mailquestionnaires were used extensivelywith personal interviews in the remain-der. Extensive telephone and personalfollowup for nonrespondents was donein the censuses. Imputations were madefor missing characteristics.

Estimates of Sampling Error: Samplingerrors for data are estimated for allitems collected by sample and vary bycharacteristic and geographic area. TheCVs for national and state estimates aregenerally very small.

Other (nonsampling) Errors: Since 1950,evaluation programs have been con-ducted to provide information on themagnitude of some sources of nonsam-pling errors such as response bias andundercoverage in each census. Resultsfrom the evaluation program for the1990 census indicate that the estimatednet under coverage amounted to about1.5 percent of the total resident popula-tion. For Census 2000, the evaluationprogram indicates a net overcount of0.5 percent of the resident population.

Appendix III 949

U.S. Census Bureau, Statistical Abstract of the United States: 2003

Page 10: Appendix III. Limitations of the Data · Appendix III Limitations of the Data ... selection and usually each unit selected can be assigned a number, greater than zero and less than

Sources of Additional Material: U.S. Cen-sus Bureau, The Coverage of Populationin the 1980 Census, PHC80-E4; ContentReinterview Study: Accuracy of Data forSelected Population and Housing Char-acteristics as Measured by Reinterview,PHC80-E2; 1980 Census of Population,Vol. 1, (PC80-1), Appendixes B, C, andD; 1990 Census of Population and Hous-ing, Content Reinterview Study, CPH-E-1,1990 Census of Population and Housing,Effectiveness of Quality Assurance, CPH-E-2, 1990 Census of Population andHousing, Programs to Improve Cover-age, CPH-E-3. For 2000 census see<http://www.census.gov/pred/www>.

Current Population Survey(CPS)

Universe, Frequency, and Types of Data:Nationwide monthly sample survey ofcivilian noninstitutional population, 15years old or over, to obtain data onemployment, unemployment, and anumber of other characteristics.

Type of Data Collection Operation: Multi-stage probability sample of about50,000 households in 754 PSUs in 1996expanded to about 60,000 householdsin July 2001. Oversampling in somestates and the largest MSAs to improvereliability for those areas of employmentdata on annual average basis. A con-tinual sample rotation system is used.Households are in sample 4 months, outfor 8 months, and in for 4 more. Month-to-month overlap is 75 percent; year-to-year overlap is 50 percent.

Data Collection and Imputation Proce-dures: For first and fifth months that ahousehold is in sample, personal inter-views; other months, approximately, 85percent of the data collected by phone.Imputation is done for both item andtotal nonresponse. Adjustment for totalnonresponse is done by a predefinedcluster of units, by MSA size and resi-dence; for item nonresponse imputationvaries by subject matter.

Estimates of Sampling Error: EstimatedCVs on national annual averages forlabor force, total employment, andnonagricultural employment, 0.2 per-cent; for total unemployment and agri-cultural employment, 1.0 percent to 2.5percent. The estimated CVs for family

income and poverty rate for all personsin 1986 are 0.5 percent and 1.5 percent,respectively. CVs for subnational areas,such as states, would be larger andwould vary by area.

Other (nonsampling) Errors: Estimates ofresponse bias on unemployment are notavailable, but estimates of unemploy-ment are usually 5 percent to 9 percentlower than estimates from reinterviews.Six to 7.0 percent of sample householdsunavailable for interviews.

Sources of Additional Material: U.S. Cen-sus Bureau and Bureau of Labor Statis-tics, Current Population Survey; Designand Methodology, (Tech. Paper 63),available on Internet <http://www.census.gov/prod/2002pubs/tp63rv.pdf>; Source and Accuracy of Estimatesfor Poverty in the United States, avail-able on Internet <http://www.census.gov/hhes/poverty/poverty02/pov02src.pdf> and Bureau of Labor Sta-tistics, Employment and Earnings,monthly, Explanatory Notes and Esti-mates of Error, Household Data and BLSHandbook of Methods, Chapter 1, avail-able on the Internet at <http://www.bls.gov/opub/hom/homch1a.htm>.

Surveys of Minority- andWomen-Owned BusinessEnterprises (SMOBE/SWOBE)

Universe, Frequency, and Types of Data:The surveys provide basic economicdata on businesses owned by Blacks,Hispanics, Asians, Pacific Islanders,Alaska Natives, American Indians, andWomen. All firms operating during1997, except those classified as agricul-tural, are represented. The lists of allfirms (or sample frames) are compiledfrom a combination of business taxreturns and data collected on other eco-nomic census reports. The publisheddata include the number of firms, grossreceipts, number of paid employees,and annual payroll. The data are pre-sented by geographic area, industry,size of firm, and legal form of organiza-tion of firm.

Type of Data Collection Operation: Thesurveys are based on a stratifiedprobability sample of approximately2.5 million firms from a universe ofapproximately 20.8 million firms. There

950 Appendix III

U.S. Census Bureau, Statistical Abstract of the United States: 2003

Page 11: Appendix III. Limitations of the Data · Appendix III Limitations of the Data ... selection and usually each unit selected can be assigned a number, greater than zero and less than

were approximately 5.3 million firmswith paid employees and 15.5 millionfirms with no paid employees. The dataare based on the entire firm rather thanon individual locations of a firm.

Data Collection and Imputation Proce-dures: Data were collected through amailout/mailback operation. Compensa-tion for missing data is addressedthrough reweighting, edit correction,and standard statistical imputationmethods.

Estimates of Sampling Error: Variability inthe estimates is due to the sample selec-tion and estimation for items collectedby SMOBE/SWOBE. CVs are applicable toonly published cells in which samplecases are tabulated. The CVs for numberof firms and receipts at the nationallevel range from 1 to 4 percent.

Other (nonsampling) Error: Nonsamplingerrors are attributed to many sources:inability to obtain information for allcases in the universe, adjustments tothe weights of respondents to compen-sate for nonrespondents, imputation formissing data, data errors and biases,mistakes in recording or keying data,errors in collection or processing, andcoverage problems.

Sources of Additional Materials: U.S. Cen-sus Bureau, Guide to the 1997 EconomicCensus and Related Statistics.

1997 Economic Census (Geo-graphic Area Series and SubjectSeries Reports) (for NAICS sec-tors 22, 42, 4445, 48-49, and51-81)

Universe, Frequency, and Types of Data:Conducted every 5 years to obtain dataon number of establishments, numberof employees, total payroll size, totalsales, and other industry specific statis-tics. In 1997, the universe was allemployer and nonemployer establish-ments primarily engaged in wholesale,retail, utilities, finance & insurance, realestate, transportation & warehousing,and other service industries.

Type of Data Collection Operation: Alllarge employer firms were surveyed (i.e.all employer firms above the payroll sizecutoff established to separate large from

small employers) plus a 5 percent to 25percent sample of the small employerfirms. Firms with no employees werenot required to file a census return.

Data Collection and Imputation Proce-dures: Mail questionnaires were usedwith both mail and telephone follow-upsfor nonrespondents. Data for nonre-spondents and for small employer firmsnot mailed a questionnaire wereobtained from administrative records ofthe IRS and Social Security Administra-tion or imputed. Nonemployer data wereobtained exclusively from IRS 1997income tax returns.

Estimates of Sampling Error: Not appli-cable for basic data such as sales, rev-enue, payroll, etc.

Other (nonsampling) Errors: Trade arealevel unit response rates in 1997 rangedfrom 85 percent to 99 percent. Itemresponse rates ranged from 60 percentto 90 percent with lower rates for themore detailed questions. Nonsamplingerrors may occur during the collection,reporting, and keying of data, andindustry misclassification.

Sources of Additional Material: U.S. Cen-sus Bureau, 1997 Economic Census: Geo-graphic Area Series and Subject SeriesReports (by NAICS sector), Appendix C,and <www.census.gov/con97.html>.

Service Annual Survey

Universe, Frequency, and Types of Data:The U.S. Census Bureau conducts theService Annual Survey to providenational estimates of revenues,expenses, and e-commerce revenues fortaxable and tax-exempt firms classifiedin selected service industries. Estimatesare summarized by industry classifica-tion based on the 1997 North AmericanIndustry Classification System (NAICS).Industries covered by the Service AnnualSurvey include all or part of the follow-ing NAICS sectors: Transportation andWarehousing (NAICS 48-49); Information(NAICS 51); Finance and Insurance(NAICS 52); Real Estate and Rental andLeasing (NAICS 53); Professional, Scien-tific, and Technical Services (NAICS 54);Administrative and Support and WasteManagement and Remediation Services

Appendix III 951

U.S. Census Bureau, Statistical Abstract of the United States: 2003

Page 12: Appendix III. Limitations of the Data · Appendix III Limitations of the Data ... selection and usually each unit selected can be assigned a number, greater than zero and less than

(NAICS 56); Health Care and Social Assis-tance (NAICS 62); Arts, Entertainment,and Recreation (NAICS 71); and OtherServices, except Public Administration(NAICS 81). Data items collected includetotal revenue, revenue from e-commercetransactions; and for selected industries,revenue from detailed service products,total expenses, and expenses by majortype, revenue from exported services,and inventories. Questionnaires aremailed in January and request annualdata for the prior year. Estimates arepublished approximately 12 monthsafter the initial survey mailing.

Type of Data Collection Operation: TheService Annual Survey estimates aredeveloped using data from a probabilitysample and administrative records. Serv-ice Annual Survey questionnaires aremailed to a probability sample that isperiodically reselected from a universeof firms located in the United States andhaving paid employees. The sampleincludes firms of all sizes and coversboth taxable firms and firms exemptfrom federal income taxes. Updates tothe sample are made on a quarterlybasis to account for new businesses.Firms without paid employees or non-employers are included in the estimatesthrough imputation and/or administra-tive records data provided by other fed-eral agencies. Links to additionalinformation about confidentiality protec-tion, sampling error, nonsampling error,sample design, definitions, and copiesof the questionnaires may be found onthe Internet at <www.census.gov/econ/www/servmenu.html>.

Estimates of Sampling Error: Coefficientsof variation for the 2001 Service AnnualSurvey estimates range from 0.7 percentto 2.1 percent for total revenue esti-mates computed at the NAICS sector(2-digit NAICS code) level. Samplingerrors for more detailed industries areshown in the corresponding publica-tions. Links to additional informationregarding sampling error may be foundat: <www.census.gov/svsd/www/cv.html>.

Other (nonsampling) Errors: Data areimputed for unit nonresponse, item non-response, and for reported data that

fails edits. The percent of imputed datafor total revenue for the 2001 ServiceAnnual Survey is approximately 12 per-cent.

Sources of Additional Material: U.S. Cen-sus Bureau, Current Business Reports,Service Annual Survey, Census BureauWeb site: <http://www.census.gov/econ/www/servmenu.html>.

Wholesale Trade Survey

Universe, Frequency, and Types of Data:Provides monthly estimates of whole-sale sales and end of month inventories.

Type of Data Collection Operation: Prob-ability sample of all firms from a listframe and additionally, for retail andservice an area frame. The list frame isthe Bureaus Standard Statistical Estab-lishment List (SSEL) updated quarterlyfor recent birth Employer Identification(EI) Numbers issued by the Internal Rev-enue Service and assigned a kind ofbusiness code by the Social SecurityAdministration. The largest firms areincluded monthly.

Data Collection and Imputation Proce-dures: Data are collected by mail ques-tionnaire with telephone followups fornonrespondents. Imputation made foreach nonresponse item and each itemfailing edit checks.

Estimates of Sampling Error: For the 2001monthly surveys median CVs are about0.6 percent for estimated total retailsales, 1.3 for wholesale sales, 1.6 forwholesale inventories. For dollar volumeof receipts, CVs from the Service AnnualSurvey vary by kind of business andrange between 1.5 percent to 15.0 per-cent. Sampling errors are shown inmonthly publications.

Other (nonsampling) Errors: Imputationrates are about 18 percent to 23 percentfor monthly retail sales, 30 percent forwholesale sales, about 32 percent formonthly wholesale inventories, and 14percent for the Service Annual Survey.

Sources of Additional Material: U.S. Cen-sus Bureau, Current Business Reports,Monthly Retail Trade, Monthly WholesaleTrade, and Service Annual Survey.

952 Appendix III

U.S. Census Bureau, Statistical Abstract of the United States: 2003

Page 13: Appendix III. Limitations of the Data · Appendix III Limitations of the Data ... selection and usually each unit selected can be assigned a number, greater than zero and less than

Monthly Retail Trade andFood Service Survey

Universe, Frequency, and Types of Data:Provides monthly estimates of retail andfood service sales by kind of businessand end of month inventories of retailstores.

Type of Data Collection Operation: Prob-ability sample of all firms from a listframe. The list frame is the BureausStandard Statistical Establishment List(SSEL) updated quarterly for recent birthEmployer Identification (EI) Numbersissued by the Internal Revenue Serviceand assigned a kind of business code bythe Social Security Administration. Thelargest firms are included monthly; asample of others is included everymonth also.

Data Collection and Imputation Proce-dures: Data are collected by mail ques-tionnaire with telephone followups fornonrespondents. Imputation made foreach nonresponse item and each itemfailing edit checks.

Estimates of Sampling Error: For the 2003monthly surveys, CVs are about 0.5 per-cent for estimated total retail sales and0.99 percent for estimated total retailinventories. Sampling errors are shownin monthly publications.

Other (nonsampling) Errors: Imputationrates are about 20 percent for monthlyretail and food service sales, and 28 per-cent for monthly retail inventories.

Sources of Additional Material: U.S. Cen-sus Bureau, Current Business Reports,Monthly Retail Trade.

Nonemployer Statistics

Universe, Frequency, and Types of Data:Nonemployer statistics are an annualtabulation of economic data by industryfor active businesses without paidemployees that are subject to federalincome tax. Data showing the numberof establishments and receipts by indus-try are available for the U.S., states,counties, and metropolitan areas. Mosttypes of businesses covered by the Cen-sus Bureaus economic statistics pro-grams are included in the nonemployer

statistics. Tax-exempt and agricultural-production businesses are excludedfrom nonemployer statistics.

Type of Data Collection Operation: Theuniverse of nonemployer establishmentsis created annually as a byproduct of theCensus Bureaus Business Register pro-cessing for employer establishments. Ifa business is active but without paidemployees, then it becomes part of thepotential nonemployer universe. Indus-try classification and receipts are avail-able for each potential nonemployerbusiness. These data are obtained pri-marily from the annual business incometax returns of the Internal Revenue Serv-ice (IRS). The potential nonemployer uni-verse undergoes a series of complexprocessing, editing, and analyticalreview procedures at the Census Bureauto distinguish nonemployers fromemployers, and to correct and completedata items used in creating the datatables.

Estimates of Sampling Error: Not appli-cable.

Other (nonsampling) Errors: The data aresubject to nonsampling errors, such aserrors of self-classification by industryon tax forms, as well as errors ofresponse, keying, nonreporting, andcoverage.

Sources of Additional Material: U. S. Cen-sus Bureau, Nonemployer Statistics:2000 (Introduction; Coverage and Meth-odology). See also <http://www.census.gov/epcd/nonemployer/view/cov&meth.htm>.

U.S. DEPARTMENT OFEDUCATION, National Centerfor Education Statistics

Higher Education GeneralInformation Survey (HEGIS),Degrees and Other FormalAwards Conferred. Beginning1986, Integrated Postsecond-ary Education Data Survey(IPEDS)

Universe, Frequency, and Types of Data:Annual survey of all institutions andbranches listed in the Education Direc-tory, Colleges and Universities to obtain

Appendix III 953

U.S. Census Bureau, Statistical Abstract of the United States: 2003

Page 14: Appendix III. Limitations of the Data · Appendix III Limitations of the Data ... selection and usually each unit selected can be assigned a number, greater than zero and less than

data on earned degrees and other for-mal awards, conferred by field of study,level of degree, sex, and by racial/ethniccharacteristics (every other year prior to1989, then annually).

Type of Data Collection Operation: Com-plete census.

Data Collection and Imputation Proce-dures: Survey package is usually mailedin the spring with surveys due at vary-ing dates in the summer and fall; mailand phone followup procedures for non-respondents. Missing data are imputedby using data of similar institutions.

Estimates of Sampling Error: Not appli-cable.

Other (nonsampling) Errors: For 2000-2001, approximately 92.3 percentresponse rate for degree-granting insti-tutions.

Sources of Additional Material:<http://www.nces.ed.gov/ipeds/>

Higher Education GeneralInformation Survey (HEGIS),Fall Enrollment in Institu-tions of Higher Education;beginning 1986, IntegratedPostsecondary EducationData Survey (IPEDS), Comple-tions Fall Enrollment

Universe, Frequency, and Types of Data:Annual survey of all institutions andbranches listed in the Directory toobtain data on total enrollment by sex,level of enrollment, type of program,racial/ethnic characteristics (every otheryear prior to 1989, then annually) andattendance status of student, and onfirst-time students.

Type of Data Collection Operation: Com-plete census.

Data Collection and Imputation Proce-dures: Survey package is usually mailedin the spring with surveys due at vary-ing dates in the summer and fall; mailand phone followup procedures for non-respondents. Missing data are imputedby using data of similar institutions.

Estimates of Sampling Error: Not appli-cable.

Other (nonsampling) Errors: For degree-granting institutions approximately 97.2percent response rate in fall 2000.

Sources of Additional Material: U.S.Department of Education, National Cen-ter for Education Statistics, Fall Enroll-ment in Higher Education, annual.<http://www.nces.ed.gov/ipeds/>

Higher Education GeneralInformation Survey (HEGIS),Financial Statistics of Insti-tutions of Higher Education;beginning 1986, IntegratedPostsecondary EducationData Survey (IPEDS), Finance

Universe, Frequency, and Types of Data:Annual survey of all institutions andbranches listed in the Education Direc-tory, Colleges and Universities to obtaindata on financial status and operations,including current funds revenues, cur-rent funds expenditures, and physicalplant assets.

Type of Data Collection Operation: Com-plete census.

Data Collection and Imputation Proce-dures: Survey package is usually mailedin the spring with surveys due at vary-ing dates in the summer and fall; mailand phone followup procedures for non-respondents. Missing data are imputedby using data of similar institutions.

Estimates of Sampling Error: Not appli-cable.

Other (nonsampling) Errors: For 2000,96.7 percent for degree-granting institu-tions.

Sources of Additional Material: U.S.Department of Education, National Cen-ter for Education Statistics, FinancialStatistics. <http://www.neces.ed.gov/ipeds/>

National PostsecondaryStudent Aid Study

Universe, Frequency, and Types of Data:NPSAS is a comprehensive nationwidestudy conducted every 3 to 4 years byNCES. It was first administered in 1986-87. The purpose of the study is to pro-duce reliable national estimates of how

954 Appendix III

U.S. Census Bureau, Statistical Abstract of the United States: 2003

Page 15: Appendix III. Limitations of the Data · Appendix III Limitations of the Data ... selection and usually each unit selected can be assigned a number, greater than zero and less than

students and their families pay for post-secondary education. Information hasbeen gathered on more than 55,000 stu-dents in each study cycle.

Type of Data Collection Operation: Thedesign for the NPSAS sample involvesselecting a nationally representativesample of postsecondary educationinstitutions and students within thoseinstitutions. NPSAS data come from mul-tiple sources, including institutionalrecords, government databases, andstudent telephone interviews.

Data Collection and Imputation Proce-dures: NPSAS:2000 involved a multi-stage effort to collect informationrelated to student aid. An initial NPSAS:2000 data collection stage collectedelectronic student aid report informationdirectly from the U.S. Department ofEducation Central Processing System forfederal financial aid applications. Thesecond stage involved abstracting infor-mation from the students records at theschool from which he/she was sampled,using a computer-assisted data entrysystem. In the third stage, interviewswere conducted with sampled students,primarily using a computer-assisted tele-phone interviewing (CATI) procedure.Computer-assisted personal interview-ing procedures, using field interviewers,were also used for the first time on aNPSAS study, to help reduce the level ofnonresponse to CATI. Over the course ofdata collection, some data wereobtained from the Department of Educa-tions National Student Loan Data System(NSLDS), the ACT and the EducationalTesting Service. The additional datasources provided a way to check or con-firm information obtained from studentrecords or the interview and includeother data. After the editing process(which included logical imputations), theremaining missing values for 23 analy-sis variables were imputed statistically.Most of the variables were imputedusing a weighted hot deck procedure.

Estimates of Sampling Error: Estimatesvary by characteristic and variable.

Other (nonsampling) Errors: There isnearly complete coverage of the institu-tions in the target population. Studentcoverage, however, is dependent upon

enrollment lists provided by the institu-tions. For the 1999-2000 NPSAS, theoverall weighted study response ratewas 89 percent. There is also error dueto unit and item nonresponse, as well asmeasurement error. Possible sources ofmeasurement error include: incorrectreporting, variation in question deliveryor interpretation, and mistakes in dataentry.

Sources of Additional Material: U.S.Department of Education, National Cen-ter for Education Statistics. NationalPostsecondary Student Aid Study, 1999-2000 (NPSAS: 2000), MethodologyReport, NCES 2002152, by John A. Ric-cobono, Melissa B. Cominole, Peter H.Siegel, Timothy J. Gabel, Michael W.Link, and Lutz K. Berkner. Projectofficer, Andrew G. Malizio. Washington,DC: 2001. U.S. Department of Education.National Center for Education Statistics.NCES Handbook of Survey Methods,NCES 2003603, by Lori Thurgood, Eliza-beth Walter, George Carter, Susan Henn,Gary Huang, Daniel Nooter, Wray Smith,R. William Cash, and Sameena Salvucci.Project Officers, Marilyn Seastrom, TaiPhan, and Michael Cohen. Washington,DC: 2003.

National Household EducationSurveys Program

Universe, Frequency, and Types of Data:The National Household EducationSurveys Program (NHES) is a system oftelephone surveys of the noninstitution-alized civilian population of the UnitedStates. Surveys in NHES have varyinguniverses of interest depending on theparticular survey. Specific topics cov-ered by each survey are at the NHESWeb site <http://neces.ed .gov/nhes>. Alist of the surveys fielded as part ofNHES, each universe, and the years theywere fielded is provided below. (1) AdultEducation Interviews were conductedwith a representative sample of civilian,noninstitutionalized persons age 16 andolder who were not enrolled in grade 12or below (1991, 1995, 1999, 2001,2003). (2) Before- and After-School Pro-grams and Activities Interviews wereconducted with parents of a representa-tive sample of students in grades K8(1999, 2001). (3) Civic Involvement

Appendix III 955

U.S. Census Bureau, Statistical Abstract of the United States: 2003

Page 16: Appendix III. Limitations of the Data · Appendix III Limitations of the Data ... selection and usually each unit selected can be assigned a number, greater than zero and less than

Interviews were conducted with a repre-sentative sample of parents, youth, andadults(1996, 1999). (4) Early ChildhoodProgram Participation Interviews wereconducted with parents of a representa-tive sample of children from birththrough grade 3, with the specific agegroups varying by survey year (1991,1995, 1999, 2001). (5) Household andLibrary Use Interviews were conductedwith a representative sample of U.S.households (1996). (6) Parent and Fam-ily Involvement in Education Interviewswere conducted with parents of a repre-sentative sample of children age threethrough grade 12 or in grades K-12depending on the survey year.(1996,1999, 2003). (7) School Readiness Inter-views were conducted with parents of arepresentative sample of 3- to 7-year-oldchildren (1993, 1999). (8) School Safetyand Discipline Interviews were con-ducted with a representative sample ofstudents in grades 612, their parents,and the parents of a representativesample of students in grades 3-5 (1993).

Type of Data Collection Operation: NHESuses telephone interviews to collectdata.

Data Collection and Imputation Proce-dures: Telephone numbers are selectedusing random digit-dialing techniques.Approximately 45,000 to 64,000 house-holds are contacted in order to identifypersons eligible for the surveys. Dataare collected using computer-assistedtelephone interviewing (CATI) proce-dures. Missing data are imputed usinghot-deck imputation procedures.

Estimates of Sampling Error: Unweightedsample sizes range between 2,500 and21,000. The average root design effectsof the surveys in NHES range from 1.1to 4.5.

Other (nonsampling) Errors: Because ofunit nonresponse and because thesamples are drawn from householdswith telephone instead of all house-holds, nonresponse and/or coveragebias may exist for some estimates. How-ever, both sources of potential bias areadjusted for in the weighting process.Analyses of both potential sources ofbias in the NHES collections have beenstudied and no significant bias has beendetected.

Sources of Additional Material: Please seethe NHES Web site at <http://nces.ed.gov/nhes>.

U.S. ENERGY INFORMATIONADMINISTRATION

Residential EnergyConsumption Survey

Universe, Frequency, and Types of Data:Quadriennial survey of households andtheir fuel suppliers. Data are obtainedon energy-related household character-istics, housing unit characteristics, useof fuels, and energy consumption andexpenditures by fuel type.

Type of Data Collection Operation: Prob-ability sample in 116 PSUs. The 1997survey resulted in 5,900 completedinterviews. The 2001 survey resultedin 5,318 completed interviews. Forresponding units, fuel consumptionand expenditure data obtained fromfuel suppliers to those households.

Data Collection and Imputation Proce-dures: Personal interviews. Extensivefollowup of nonrespondents includingmail questionnaires for some house-holds. Adjustments for nonrespondentswere made in weighting for respon-dents. Most item nonresponses wereimputed.

Estimates of Sampling Error: EstimatedCVs for household averages (1997 sur-vey): for consumption, 1.3 percent; forexpenditures, 1.0 percent; for variousfuels, values ranged from 2.0 percent;for electricity to 7.0 percent for LPG.

Other (nonsampling) Errors: Householdresponse rate of 81.0 percent for 1997survey and 76.7 for 2001 survey. Non-consumption data were mostly imputedfor mail respondents (2.5 percent of eli-gible units in 1997 and 3.9 percent in2001). Usable responses from fuel sup-pliers for various fuels ranged from 80.7percent for electricity to 56.6 percentfor fuel oil for the 1997 survey.

Sources of Additional Material: U.S.Energy Information Administration, ALook at Residential Energy Consumptionin 1997. The Web page for the Residen-tial Energy Consumption Survey is at:<http://www.eia.doe.gov/emeu/recs>.

956 Appendix III

U.S. Census Bureau, Statistical Abstract of the United States: 2003

Page 17: Appendix III. Limitations of the Data · Appendix III Limitations of the Data ... selection and usually each unit selected can be assigned a number, greater than zero and less than

U.S. NATIONAL CENTER FORHEALTH STATISTICS (NCHS)

National Vital StatisticsSystem

Universe, Frequency, and Types of Data:Annual data on births and deaths in theUnited States.

Type of Data Collection Operation: Mortal-ity data based on complete file of deathrecords, except 1972, based on 50 per-cent sample. Natality statistics 1951-71,based on 50 percent sample of birth cer-tificates, except a 20 percent to 50 per-cent in 1967, received by NCHS.Beginning 1972, data from some statesreceived through Vital Statistics Coop-erative Program (VSCP) and complete fileused; data from other states based on50 percent sample. Beginning 1986, allreporting areas participated in the VSCP.

Data Collection and Imputation Proce-dures: Reports based on records fromregistration offices of all states, Districtof Columbia, New York City, PuertoRico, Virgin Islands, Guam, AmericanSamoa, and Northern Marianas.

Estimates of Sampling Error: For recentyears, CVs for births are small due tolarge portion of total file in sample(except for very small estimated totals).

Other (nonsampling) Errors: Data onbirths and deaths believed to be at least99 percent complete.

Sources of Additional Material: U.S.National Center for Health Statistics,Vital Statistics of the United States, Vol. Iand Vol. II, annual, and National VitalStatistics Report. NCHS Web site at<http://www.cdc.gov/nchs/nvss.htm>.

National Health InterviewSurvey (NHIS)

Universe, Frequency, and Types of Data:Continuous data collection covering thecivilian noninstitutional population toobtain information on personal anddemographic characteristics, illnesses,injuries, impairments, and other healthtopics.

Type of Data Collection Operation: Multi-stage probability sample of 49,000households (in 198 PSUs) from 1985 to

1994; 43,000 households (358 designPSUs) from 1995 on, selected in groupsof about four adjacent households.

Data Collection and Imputation Proce-dures: Some missing data items (e.g.,race, ethnicity) are imputed using a hotdeck imputation value. Unit nonre-sponse is compensated for by an adjust-ment to the survey weights.

Estimates of Sampling Error: Estimates ofstandard error (SE): For 1999 medicallyattended injury episodes rates in thepast 12 months by falling for: females37.66 (1.94), and males 29.69 (1.82) per1,000 population; for 1999 injury epi-sodes rates during the past 12 monthsinside the home - 20.38 (1.08) per 1,000population.

Other (nonsampling) Errors: The responserate was 93.8 percent in 1996; in 1999,the total household response rate was87.6 percent, with the final familyresponse rate of 86.1 percent, and thefinal sample adult response rate of 69.6percent; in 2000, the total householdresponse rate was 88.9 percent, withthe final family response rate of 87.3percent, and the final sample adultresponse rate was 72.1 percent for theNHIS. 2001 household final responserate was 88.9 percent, with thefamily/person final response rate of87.6 percent and the final sample adultresponse rate was 73.8 percent. (Note:The NHIS sample redesign was con-ducted in 1995, and the NHIS question-naire was redesigned in 1997.)

Sources of Additional Material: U.S.National Center for Health Statistics,Summary Health Statistics for U.S. Chil-dren: National Health Interview Survey,1999, Vital and Health Statistics, Series10 #203; U.S. National Center for HealthStatistics, Summary Health Statistics forthe U.S. Population: National HealthInterview Survey, 1999, Vital and HealthStatistics, Series 10 #211; U.S. NationalCenter for Health Statistics, SummaryHealth Statistics for U.S. Adults: NationalHealth Interview Survey, 1999, Vital andHealth Statistics, Series 10 #212; U.S.National Center for Health Statistics,Design and Estimation for the NationalHealth Interview Survey, 1995-2004,Vital and Health Statistics, Series 2

Appendix III 957

U.S. Census Bureau, Statistical Abstract of the United States: 2003

Page 18: Appendix III. Limitations of the Data · Appendix III Limitations of the Data ... selection and usually each unit selected can be assigned a number, greater than zero and less than

#130; U.S. National Center for HealthStatistics, Summary Health StatisticsTechnical Report: National Health Inter-view Survey, 1997-2003, Vital and Sta-tistics, Series 2 #134 (in preparation).

U.S. BUREAU OF JUSTICESTATISTICS (BJS)

National Crime VictimizationSurvey

Universe, Frequency, and Types of Data:Monthly survey of individuals andhouseholds in the United States toobtain data on criminal victimization ofthose units for compilation of annualestimates.

Type of Data Collection Operation:National probability sample survey ofabout 50,000 interviewed households in376 PSUs selected from a list ofaddresses from the 1980 census,supplemented by new construction per-mits and an area sample where permitsare not required.

Data Collection and Imputation Proce-dures: Interviews are conducted every 6months for 3 years for each householdin the sample; 8,300 households areinterviewed monthly. Personal inter-views are used in the first interview; theintervening interviews are conducted bytelephone whenever possible.

Estimates of Sampling Error: CVs aver-aged over the period 1998-2001 are:3.7 percent for personal crimes(includes all crimes of violence pluspurse snatching crimes), 3.8 percent forcrimes of violence; 12.1 percent for esti-mate of rape/sexual assault counts; 7.9percent for robbery counts; 4.1 percentfor assault counts; 11.2 percent forpurse snatching (it refers to pursesnatching and pocket picking); 2.5 per-cent for property crimes; 3.8 percent forburglary counts; 2.7 percent for theft (ofproperty); and 5.2 percent for motorvehicle theft counts.

Other (nonsampling) Errors: Respondentrecall errors which may include report-ing incidents for other than the refer-ence period, interviewer coding and

processing errors, and possible mis-taken reporting or classifying of events.Adjustment is made for a householdnoninterview rate of about 7 percentand for a within-household noninterviewrate of 10 percent.

Sources of Additional Material: U.S.Bureau of Justice Statistics, Criminal Vic-timization in the United States, annual.

U.S. FEDERAL BUREAU OFINVESTIGATION

Uniform Crime Reporting(UCR) Program

Universe, Frequency, and Types of Data:Monthly reports on the number of crimi-nal offenses that become known to lawenforcement agencies. Data are col-lected on crimes cleared by arrest, byage, sex, and race of offender, and onassaults on law enforcement officers.

Type of Data Collection Operation: Crimestatistics are based on reports of crimedata submitted either directly to the FBIby contributing law enforcement agen-cies or through cooperating state UCRprograms.

Data Collection and Imputation Proce-dures: States with UCR programs collectdata directly from individual lawenforcement agencies and forwardreports, prepared in accordance withUCR standards, to the FBI. Accuracy andconsistency edits are performed by theFBI.

Estimates of Sampling Error: Not appli-cable.

Other (nonsampling) Errors: Coverage of90 percent of the population (92 percentin MSAs, 79 percent in ‘‘other cities’’,and 79 percent in rural areas) by UCRprogram, though varying number ofagencies report.

Sources of Additional Material: U.S. Fed-eral Bureau of Investigation, Crime inthe United States, annual, Hate CrimeStatistics, annual, Law EnforcementOfficers Killed & Assaulted, annual,<http://www.fbi.gov/ucr.htm>.

958 Appendix III

U.S. Census Bureau, Statistical Abstract of the United States: 2003

Page 19: Appendix III. Limitations of the Data · Appendix III Limitations of the Data ... selection and usually each unit selected can be assigned a number, greater than zero and less than

U.S. BUREAU OF LABORSTATISTICS

U.S. Census Bureau, CurrentPopulation Survey (CPS)

Universe, Frequency, and Types of Data:Nationwide monthly sample survey ofcivilian noninstitutional population, 15years old or over, to obtain data onemployment, unemployment, and anumber of other characteristics.

Type of Data Collection Operation: Multi-stage probability sample of about50,000 households in 754 PSUs in 1996,expanded to about 60,000 householdsin July 2001. Oversampling in somestates and the largest MSAs to improvereliability for those areas of employmentdata on annual average basis. A con-tinual sample rotation system is used.Households are in sample 4 months, outfor 8 months, and in for 4 more. Month-to-month overlap is 75 percent; year-to-year overlap is 50 percent.

Data Collection and Imputation Proce-dures: For first and fifth months that ahousehold is in sample, personal inter-views; other months, approximately, 85percent of the data collected by phone.Imputation is done for both item andtotal nonresponse. Adjustment for totalnonresponse is done by a predefinedcluster of units, by MSA size and resi-dence; for item nonresponse imputationvaries by subject matter.

Estimates of Sampling Error: EstimatedCVs on national annual averages forlabor force, total employment, andnonagricultural employment, 0.2 per-cent; for total unemployment and agri-cultural employment, 1.0 percent to 2.5percent. The estimated CVs for familyincome and poverty rate for all personsin 1986 are 0.5 percent and 1.5 percent,respectively. CVs for subnational areas,such as states, would be larger andwould vary by area.

Other (nonsampling) Errors: Estimates ofresponse bias on unemployment are notavailable, but estimates of unemploy-ment are usually 5 percent to 9 percentlower than estimates from reinterviews.About 7.5 percent of sample householdsunavailable for interviews.

Sources of Additional Material: U.S. Cen-sus Bureau and Bureau of Labor Statis-tics, Current Population Survey; Designand Methodology, (Tech. Paper 63),available on Internet <http://www.census.gov/prod/2002pubs/tp63rv.pdf>; Source and Accuracy of Estimatesfor Poverty in the United States, avail-able on Internet <http://www.census.gov/hhes/poverty/poverty02/pov02src.pdf> and Bureau of Labor Sta-tistics, Employment and Earnings,monthly, Explanatory Notes and Esti-mates of Error, Household Data and BLSHandbook of Methods, Chapter 1, avail-able on the Internet at <http://www.bls.gov/opub/hom/homch1a.htm>.

Consumer Price Index (CPI)Universe, Frequency, and Types of Data:

Monthly survey of price changes of alltypes of consumer goods and servicespurchased by urban wage earners andclerical workers prior to 1978, andurban consumers thereafter. Bothindexes continue to be published.

Type of Data Collection Operation: Priorto 1978, and since 1998, sample of vari-ous consumer items in 87 urban areas;from 1978-1997, in 85 PSUs, exceptfrom January 1987 through March 1988,when 91 areas were sampled.

Data Collection and Imputation Proce-dures: Prices of consumer items areobtained from about 50,000 housingunits, and 23,000 other reporters in 87areas. Prices of food, fuel, and a fewother items are obtained monthly; pricesof most other commodities and servicesare collected every month in the threelargest geographic areas and everyother month in others.

Estimates of Sampling Error: Estimates ofstandard errors are available.

Other (nonsampling) Errors: Errors resultfrom inaccurate reporting, difficulties indefining concepts and their operationalimplementation, and introduction ofproduct quality changes and new prod-ucts.

Sources of Additional Material: U.S.Bureau of Labor Statistics, Internet site<http://stats.bls.gov/cpi> and BLSHandbook of Methods, Chapter 17, Bulle-tin 2490.

Appendix III 959

U.S. Census Bureau, Statistical Abstract of the United States: 2003

Page 20: Appendix III. Limitations of the Data · Appendix III Limitations of the Data ... selection and usually each unit selected can be assigned a number, greater than zero and less than

Producer Price Index (PPI)

Universe, Frequency, and Types of Data:Monthly survey of producing companiesto determine price changes of all com-modities produced in the United Statesfor sale in commercial transactions.Data on agriculture, forestry, fishing,manufacturing, mining, gas, electricity,public utilities, and a few services.

Type of Data Collection Operation: Prob-ability sample of approximately 30,000establishments that result in about100,000 price quotations per month.

Data Collection and Imputation Proce-dures: Data are collected by mail andfacsimile. If transaction prices are notsupplied, list prices are used. Someprices are obtained from trade publica-tions, organized exchanges, and govern-ment agencies. To calculate index, pricechanges are multiplied by their relativeweights taken from 1997 shipment val-ues from the census of manufactures.

Estimates of Sampling Error: Not appli-cable.

Other (nonsampling) Errors: Not availableat present.

Sources of Additional Material: U.S.Bureau of Labor Statistics, BLS Handbookof Methods, Chapter 14, Bulletin 2490.U.S. Bureau of Labor Statistics Internetsites <http://stats.bls.gov/ppi>.

Current Employment Statistic(CES) Program

Universe, Frequency, and Types of Data:Monthly survey drawn from a samplingframe of over 8 million UnemploymentInsurance tax accounts in order obtaindata by industry on employment, hours,and earnings.

Type of Data Collection Operation: In2003, the CES sample included about160,000 businesses and governmentagencies, which represent approxi-mately 400,000 individual worksites.

Data Collection and Imputation Proce-dures: Each month, the state agenciescooperating with BLS, as well as BLSData Collection Centers, collect datathrough various automated collectionmodes and mail. BLS-Washington staff

prepares national estimates of employ-ment, hours, and earnings while statesuse the data to develop state and areaestimates.

Estimates of Sampling Errors: The relativestandard error for total nonfarmemployment is 0.2 percent.

Other (nonsampling) Errors): Estimates ofemployment adjusted annually to reflectcomplete universe. Average adjustmentis 0.3 percent over the last decade,ranging from zero to 0.7 percent.

Sources of Additional: U.S. Bureau ofLabor Statistics, Employment and Earn-ings, monthly, Explanatory Notes andEstimates of Errors, Tables 2-A through2-F.

National CompensationSurvey

Universe, Frequency, and Types of Data:Nationwide sample survey of establish-ments of all employment size classes,stratified by geographic area, in privateindustry and state and local govern-ment. Data collected include wages andsalaries, and employer costs ofemployee benefits. Data producedinclude percent changes in the cost ofemployment cited in the EmploymentCost Index (ECI) and costs per hourworked for individual benefits cited inthe Employer Costs for Employee Com-pensation (ECEC). The survey providesdata by ownership (Private industry andstate and local government), industrysector, major industry divisions, majoroccupational groups, bargaining status,metropolitan area status, and censusregion. ECEC also provides data byestablishment size class.

Type of Data Collection Operation: Prob-ability proportionate to size sample ofestablishments. The sample is replacedon a continual basis. Establishments arein the survey for approximately 5 years,with some establishments replaced eachquarter.

Data Collection and Imputation Proce-dures: For the initial visit, data are pri-marily collected in a personal visit to the

960 Appendix III

U.S. Census Bureau, Statistical Abstract of the United States: 2003

Page 21: Appendix III. Limitations of the Data · Appendix III Limitations of the Data ... selection and usually each unit selected can be assigned a number, greater than zero and less than

establishment. Quarterly updates areobtained primarily by mail, fax, and tele-phone. Imputation is done for individualbenefits.

Estimates of Sampling Error: Becausestandard errors vary from quarter toquarter, the ECI uses a 5-year movingaverage of standard errors to evaluatepublished series. These standard errorsare available at <http://www.bls.gov/ncs/ect/home.htm>.

Other (nonsampling) Errors: Nonsamplingerrors have a number of potentialsources. The primary sources are (1)survey nonresponse and (2) data collec-tion and processing errors. Nonsam-pling errors are not measured.Procedures have been implemented forreducing nonsampling errors, primarilythrough quality assurance programs.These programs include the use of datacollection reinterviews, observed inter-views, computer edits of the data, andsystematic professional review of thereports on which the data are recorded.The programs also serve as a trainingdevice to provide feedback to the fieldeconomists, or data collectors, onerrors. And, they provide information onthe sources of error which can be rem-edied by improved collection instruc-tions or computer processing edits.Extensive training of field economists isalso conducted to maintain high stan-dards in data collection.

Sources of Additional Material: Bureau ofLabor Statistics, BLS Handbook of Meth-ods, Chapter 8 (Bulletin 2490) and<http://www.bls.gov/ncs>.

Consumer Expenditure Survey(CES)

Universe, Frequency and Types of Data:Consists of two continuous compo-nents: a quarterly interview survey anda weekly diary or recordkeeping survey.They are nationwide surveys that collectdata on consumer expenditures,income, characteristics, and assets andliabilities. Samples are national probabil-ity samples of households that arerepresentative of the civilian noninstitu-tional population. The surveys havebeen ongoing since 1980.

Type of Data Collection Operation: TheInterview Survey is a panel rotation sur-vey. Each panel is interviewed for fivequarters and then dropped from the sur-vey. About 7,500 consumer units areinterviewed each quarter. The Diary Sur-vey sample is new each year and con-sists of about 7,500 consumer units.Data are collected on an ongoing basisin 105 PSUs since 1996.

Data Collection and Imputation Proce-dures: For the Interview Survey, data arecollected by personal interview witheach consumer unit interviewed onceper quarter for five consecutive quar-ters. Designed to collect informationthat respondents can recall for 3 monthsor longer, such as large or recurringexpenditures. For the Diary Survey,respondents record all their expendi-tures in a self-reporting diary for twoconsecutive 1-week periods. Designedto pick up items difficult to recall over along period, such as detailed foodexpenditures. Missing or invalidattributes or expenditures are imputed.Income, assets, and liabilities are notimputed. The U.S. Census Bureau col-lects the data for the Bureau of LaborStatistics.

Estimates of Sampling Error: Standarderror tables are available since 2000.

Other (nonsampling) Errors: Includesincorrect information given by respon-dents, data processing errors, inter-viewer errors, and so on. They occurregardless of whether data are collectedfrom a sample or from the entire popu-lation.

Sources of Additional Material: Bureauof Labor Statistics, Internet site<http://www.bls.gov/cex> and BLSHandbook of Methods, Chapter 16,Bulletin 2490.

BOARD OF GOVERNORS OF THEFEDERAL RESERVE SYSTEM

Survey of Consumer Finances

Universe, Frequency, and Types of Data:Periodic sample survey of families. Inthis survey a given household is dividedinto a primary economic unit and othereconomic units. The primary economicunity, which may be a single individual,

Appendix III 961

U.S. Census Bureau, Statistical Abstract of the United States: 2003

Page 22: Appendix III. Limitations of the Data · Appendix III Limitations of the Data ... selection and usually each unit selected can be assigned a number, greater than zero and less than

is generally chosen as the unit that con-tains the person who either holds thetitle to the home or is the first personlisted on the lease. The primary unit isused as the reference family. The surveycollects detailed data on the composi-tion of family balance sheets, the termsof loans, and relationships with financialinstitutions. It also gathered informationon the employment history and pensionrights of the survey respondent and thespouse or partner of the respondent.

Type of Data Collection Operation: Thesurvey employs a two-part strategy forsampling families. Some families wereselected by standard multistage areaprobability sampling methods applied toall 50 states. The remaining families inthe survey were selected using statisti-cal records derived from tax returns,under the strict rules governing confi-dentiality and the rights of potentialrespondents to refuse participation.

Data Collection and Imputation Proce-dures: NORC at the University of Chi-cago has collected data for the surveysince 1992. Since 1995, the survey hasused computer-assisted personal inter-viewing. Adjustments for nonresponseare made through multiple imputationof unanswered questions and throughweighting adjustments based on dataused in the sample design for familiesthat refused participation.

Estimates of Sampling Error: Because ofthe complex design of the survey, theestimation of potential sampling errorsis not straightforward. A replicate-basedprocedure is available.

Other (nonsampling) Errors: The surveyaims to complete 4,500 interviews, withabout two-thirds of that number deriv-ing from the area-probability sample.The response rate is typically about 70percent for the area-probability sampleand about 35 percent over all strata inthe tax-data sample. Proper training andmonitoring of interviewers, carefuldesign of questionnaires, and system-atic editing of the resulting data wereused to control inaccurate surveyresponses.

Sources of Additional Material: Board ofGovernors of the Federal Reserve Sys-tem, Recent Changes in U.S. Family

Finances: Evidence from the 1998 and2001 Survey of Consumer Finances, Fed-eral Reserve Bulletin, January 2003.

U.S. INTERNAL REVENUE SERVICE

Statistics of Income,Individual Income TaxReturns

Universe, Frequency, and Types of Data:Annual study of unaudited individualincome tax returns, Forms 1040, 1040A,and 1040EZ, filed by U.S. citizens andresidents. Data provided on variousfinancial characteristics by size ofadjusted gross income, marital status,and by taxable and nontaxable returns.Data by state, based on 100 percent file,also include returns from 1040NR, filedby nonresident aliens plus certain selfemployment tax returns.

Type of Data Collection Operation: Annual2000 stratified probability sample ofapproximately 196,000 returns brokeninto sample strata based on the larger oftotal income or total loss amounts aswell as the size of business plus farmreceipts. Sampling rates for samplestrata varied from 0.05 percent to 100percent.

Data Collection and Imputation Proce-dures: Computer selection of sample oftax return records. Data adjusted duringediting for incorrect, missing, or incon-sistent entries to ensure consistencywith other entries on return.

Estimates of Sampling Error: EstimatedCVs for tax year 2000: Adjusted grossincome less deficit 0.11 percent; salariesand wages 0.21 percent; and taxexempt interest received 1.65 percent.(State data not subject to samplingerror.)

Other (nonsampling) Errors: Processingerrors and errors arising from the use oftolerance checks for the data.

Sources of Additional Material: U.S.Internal Revenue Service, Statisticsof Income, Individual Income TaxReturns, annual. For background infor-mation for individual tax statistics:<http://www.irs.gov/taxstats/article/0,,id=96571,00.html>

962 Appendix III

U.S. Census Bureau, Statistical Abstract of the United States: 2003

Page 23: Appendix III. Limitations of the Data · Appendix III Limitations of the Data ... selection and usually each unit selected can be assigned a number, greater than zero and less than

Statistics of Income, SoleProprietorship Returns andStatistics of Income Bulletin

Universe, Frequency, and Types of Data:Annual study of unaudited income taxreturns of nonfarm sole proprietorships,Form 1040 with business schedules.Data provided on various financial char-acteristics by industry.

Type of Data Collection Operation: Strati-fied probability sample of approximately56,000 sole proprietorships for tax year2000. The sample is classified based onpresence or absence of certain businessschedules; the larger of total income orloss; and size of business plus farmreceipts. Sampling rates vary from0.05 percent to 100 percent.

Data Collection and Imputation Proce-dures: Computer selection of sample oftax return records. Data adjusted duringediting for incorrect, missing, or incon-sistent entries to ensure consistencywith other entries on return.

Estimates of Sampling Error: EstimatedCVs for tax year 2000 are available. Forsole proprietorships, business receipts,0.69 percent; net income, (less loss),0.94 percent; depreciation 1.40 percent.

Other (nonsampling) Errors: Processingerrors and errors arising from the use oftolerance checks for the data.

Sources of Additional Material: U.S. Inter-nal Revenue Service, Statistics ofIncome, Sole Proprietorship Returns (foryears through 1980) and Statistics ofIncome Bulletin, Vol. 22, No. 1 (summer2002).

Statistics of Income, Partner-ship Returns and Statisticsof Income Bulletin

Universe, Frequency, and Types of Data:Annual study of unaudited income taxreturns of partnerships, Form 1065.Data provided on various financial char-acteristics by industry.

Type of Data Collection Operation: Strati-fied probability sample of approximately36,000 partnership returns from a popu-lation of 2.2 million filed during calen-dar year 2000. The sample is classifiedbased on combinations of gross

receipts, net income or loss, and totalassets, and on industry. Sampling ratesvary from 0.09 percent to 100 percent.

Data Collection and Imputation Proce-dures: Computer selection of sample oftax return records. Data are adjustedduring editing for incorrect, missing, orinconsistent entries to ensure consis-tency with other entries on return. Datanot available due to regulations are notimputed.

Estimates of Sampling Error: EstimatedCVs for tax year 2000 (latest available):For number of partnerships, 0.3 percent;business receipts, 0.4 percent; netincome, 0.5 percent; net loss, 1.9 per-cent.

Other (nonsampling) Errors: Processingerrors and errors arising from the use oftolerance checks for the data.

Sources of Additional Material: U.S. Inter-nal Revenue Service, Statistics ofIncome, Partnership Returns and Statis-tics of Income Bulletin, Vol. 22, No. 2(fall 2002).

Corporation Income TaxReturns

Universe, Frequency, and Types of Data:Annual study of unaudited corporationincome tax returns, Forms 1120,1120-A, 1120-F, 1120-L, 1120-PC, 1120-REIT, 1120-RIC, and 1120S, filed by cor-porations or businesses legally definedas corporations. Data provided on vari-ous financial characteristics by industryand size of total assets, and businessreceipts.

Type of Data Collection Operation: Strati-fied probability sample of approximately145,500 returns for tax year 2000, allo-cated to sample classes which are basedon type of return, size of total assets,size of net income or deficit, andselected business activity. Samplingrates for sample classes varied from0.25 percent to 100 percent.

Data Collection and Imputation Proce-dures: Computer selection of sample oftax return records. Data adjusted duringediting for incorrect, missing, or incon-sistent entries to ensure consistencywith other entries on return and to com-ply with statistical definitions.

Appendix III 963

U.S. Census Bureau, Statistical Abstract of the United States: 2003

Page 24: Appendix III. Limitations of the Data · Appendix III Limitations of the Data ... selection and usually each unit selected can be assigned a number, greater than zero and less than

Estimates of Sampling Error: EstimatedCVs for tax year 2000: Returns withassets over $250 million are self-representing. For other returns groupedby assets, CVs ranged from 0.04 percentto 2.88 percent; for amount of netincome CV is 0.14 percent.

Other (nonsampling) Errors: Nonsamplingerrors include coverage errors, process-ing errors, and response errors.

Sources of Additional Material: U.S. Inter-nal Revenue Service, Statistics ofIncome, Corporation Income TaxReturns, annual. For background infor-mation for corporation tax statistics:<http://www.irs.gov/taxstats/article/0,,id=96246,00.html>.

U.S. SOCIAL SECURITYADMINISTRATION

Benefit DataUniverse, Frequency, and Types of Data:

All persons receiving monthly benefitsunder Title II of Social Security Act. Dataon number and amount of benefits paidby type and state.

Type of Data Collection Operation: Databased on administrative records. Databased on 100 percent files, as well as10 percent and 1 percent sample files.

Data Collection and Imputation Proce-dures: Records used consist of actionspursuant to applications dated by subse-quent post-entitlement actions.

Estimates of Sampling Error: Varies bysize of estimate and sample file size.

Other (nonsampling) Errors: Processingerrors, which are believed to be small.

Sources of Additional Material: U.S. SocialSecurity Administration, Annual Statisti-cal Supplement to the Social SecurityBulletin.

Supplemental Security Income(SSI) Program

Universe, Frequency, and Types of Data:All eligible aged, blind, or disabled per-sons receiving SSI benefit paymentsunder SSI program. Data include numberof persons receiving federally adminis-tered SSI, amounts paid, and stateadministered supplementation.

Type of Data Collection Operation: Databased on administrative records.

Data Collection and Imputation Proce-dures: Data adjusted to reflect returnedchecks and overpayment refunds. Forfederally administered payments, actualadjusted amounts are used.

Estimates of Sampling Error: Not appli-cable.

Other (nonsampling) Errors: Processingerrors, which are believed to be small.

Sources of Additional Material: U.S. SocialSecurity Administration, Annual Statisti-cal Supplement to the Social SecurityBulletin.

National Highway Traffic SafetyAdministration (NHTSA)

Fatality Analysis ReportingSystem (FARS)

Universe, Frequency, and Types of Data:Census of all motor vehicle trafficcrashes involving at least one personkilled as a result of the crash. The crashmust be reported to the state and thedeath of the involved person must bewithin thirty days of the crash date.These fatal crashes occur throughoutthe United States (includes the Districtof Columbia), Puerto Rico, VirginIslands, and American Pacific Territories.

Type of Data Collection Operation: Eachstate provides an analyst(s) whoextracts data from the official docu-ments and enters it into a standardizeddatabase.

Data Collection and Imputation Proce-dures: Detailed data describing the char-acteristics of the fatal crash, the vehiclesand persons involved are obtained frompolice crash reports, driver and vehicleregistration records, autopsy reports,highway department, etc. Computerizededit checks monitor that accuracy andcompleteness of the data. The FARSincorporates a sophisticated mathemati-cal multiple imputation model to imputemissing blood alcohol concentration(BAC) in the database for drivers andpedestrians only.

964 Appendix III

U.S. Census Bureau, Statistical Abstract of the United States: 2003

Page 25: Appendix III. Limitations of the Data · Appendix III Limitations of the Data ... selection and usually each unit selected can be assigned a number, greater than zero and less than

Estimates of Sampling Error: Since this iscensus data, there are no samplingerrors.

Other (nonsampling) Errors: Data on thefatal motor vehicle traffic crashes ismore than 97 percent complete.

Sources of Additional Material: The FARSCoding and Validation Manual, ANSID16.1, Manual on Classification of MotorVehicle Traffic Accidents (sixth edition).

Appendix III 965

U.S. Census Bureau, Statistical Abstract of the United States: 2003