
  • STATISTICAL POLICY

    WORKING PAPER 31

    Measuring and Reporting Sources of Error in Surveys

    Statistical Policy Office Office of Information and Regulatory Affairs

    Office of Management and Budget

    July 2001

The Federal Committee on Statistical Methodology

April 2001

MEMBERS

Virginia de Wolf, Acting Chair (Committee on National Statistics)
Susan W. Ahmed (Consumer Product Safety Commission)
Wendy L. Alvey, Secretary (U.S. Bureau of the Census)
Lynda Carlson (National Science Foundation)
Cynthia Z.F. Clark (U.S. Bureau of the Census)
Steven B. Cohen (Agency for Healthcare Research and Quality)
Lawrence H. Cox (National Center for Health Statistics)
Cathryn Dippo (U.S. Bureau of Labor Statistics)
Zahava D. Doering (Smithsonian Institution)
Robert E. Fay (U.S. Bureau of the Census)
Ronald Fecso (National Science Foundation)
Gerald Gates (U.S. Bureau of the Census)
Barry Graubard (National Cancer Institute)
William Iwig (National Agricultural Statistics Service)
Daniel Kasprzyk (National Center for Education Statistics)
Nancy J. Kirkendall (Energy Information Administration)
Charles P. Pautler, Jr. (Internal Revenue Service)
Susan Schechter (U.S. Office of Management and Budget)
Rolf R. Schmitt (Federal Highway Administration)
Monroe G. Sirken (National Center for Health Statistics)
Nancy L. Spruill (U.S. Department of Defense)
Clyde Tucker (U.S. Bureau of Labor Statistics)
Alan R. Tupek (U.S. Bureau of the Census)
Denton R. Vaughan (U.S. Bureau of the Census)
G. David Williamson (Centers for Disease Control and Prevention)
Alvan O. Zarate (National Center for Health Statistics)

CONSULTANT

Robert Groves (Joint Program in Survey Methodology)

  • STATISTICAL POLICY

    WORKING PAPER 31

    Measuring and Reporting Sources of Error in Surveys

    Prepared by the Subcommittee on Measuring and Reporting the Quality of Survey Data

    Federal Committee on Statistical Methodology

    Statistical Policy Office Office of Information and Regulatory Affairs

    Office of Management and Budget

    June 2001


    Members of the Subcommittee on Measuring and Reporting the Quality of Survey Data

    Daniel Kasprzyk, Chair National Center for Education Statistics (Education)

    Dale Atkinson National Agricultural Statistics Services (Agriculture)

    Judith Conn Centers for Disease Control and Prevention (Health and Human Services)

    Ram Chakrabarty1 Bureau of the Census (Commerce)

    Charles Darby Agency for Healthcare Research and Quality

    Lee Giesbrecht2 Bureau of Transportation Statistics (Transportation)

    Brian Harris-Kojetin3 Bureau of Labor Statistics (Labor)

    Howard Hogan Bureau of the Census (Commerce)

    Nancy Kirkendall Energy Information Administration (Energy)

    Marilyn McMillen National Center for Education Statistics (Education)

    Renee Miller Energy Information Administration (Energy)

    Chris Moriarity National Center for Health Statistics (Health and Human Services)

    Dennis Schwanz Bureau of the Census (Commerce)

    Carolyn Shettle4 National Science Foundation

    W. Karl Sieber National Institute for Occupational Safety and Health (Health and Human Services)

    Antoinette Ware-Martin Energy Information Administration (Energy)

    John Wolken Federal Reserve Board

    Graham Kalton, Senior Advisor Joint Program in Survey Methodology and WESTAT

1 Deceased
2 Now with Abt Associates, Inc.
3 Now with Arbitron, Inc.
4 Now with Institute for Survey Research, Temple University

  • Acknowledgments

    In 1996, Maria Gonzales, Chair of the Federal Committee on Statistical Methodology (FCSM), formed the Subcommittee on Measuring and Reporting the Quality of Survey Data. Members of the subcommittee were drawn from 12 federal agencies whose missions include the collection, production, dissemination, and reporting of statistical information.

Maria’s energy and enthusiasm provided the early stimuli for the subcommittee, and her untimely death was a loss to the subcommittee. A little over one year after Maria’s death, the subcommittee renewed its commitment to the project. Nancy Kirkendall, then Chair of the FCSM, and Dan Kasprzyk, Chair of the FCSM Subcommittee on Measuring and Reporting the Quality of Survey Data, in consultation with Graham Kalton, identified broad areas for the subcommittee to consider. The members of the subcommittee, drawing on their own experience as well as the culture and norms of their respective agencies, took those broad ideas and developed the approach found in this working paper.

    The working paper is the result of several activities. First, many discussions and meetings of the subcommittee occurred over the course of several years. These meetings provided the opportunity to establish common ground and shared interests among the members of the subcommittee. Second, members of the subcommittee organized sessions related to the topic of “measuring and reporting the quality of survey data” at the 1996 and 1998 Federal Committee on Statistical Methodology (FCSM)/Council on Professional Associations for Federal Statistics (COPAFS) conferences. The papers presented at these conferences are available in the conference proceedings (OMB Statistical Policy Working Paper 26: Seminar on Statistical Methodology in the Public Service and OMB Statistical Policy Working Paper 28: Seminar on Interagency Coordination and Cooperation). Third, the 1997 Statistics Canada Symposium, “New Directions in Surveys and Censuses,” provided an opportunity to present a paper on the general approach taken by the subcommittee and obtain comments from the statistical community. Fourth, the subcommittee conducted three studies to develop an understanding and appreciation of the reporting practices of Federal statistical agencies. Finally, the subcommittee organized an invited session for the 1999 International Statistical Institute meetings in Helsinki, Finland, at which the international statistics community commented on the results of the three studies.

    These activities helped guide the development and are the basis of this working paper. All subcommittee members participated in spirited, informative, and productive discussions over the course of several years. Video conferencing through the auspices of the National Center for Health Statistics allowed full participation of subcommittee members not located in the Washington, DC metropolitan area.


  • Each chapter in this report was drafted by one or more members of the subcommittee as follows:

    Chapter Author

    1..................... Daniel Kasprzyk

    2..................... Daniel Kasprzyk

    3..................... Chris Moriarity

    4..................... Marilyn McMillen, Brian Harris-Kojetin, Renee Miller, and Antoinette Ware-Martin

5..................... Howard Hogan and Karl Sieber

    6..................... Dennis Schwanz, Charles Darby, and Judith Conn

    7..................... Dale Atkinson and John Wolken

    8..................... Pat Dean Brick, Lee Giesbrecht, and Carolyn Shettle

    Jim Knaub (Energy Information Administration) and the late Ram Chakrabarty (U.S. Bureau of the Census) contributed to the early discussions of the subcommittee. Lee Giesbrecht, Carolyn Shettle, and Brian Harris-Kojetin were active subcommittee members, drafting major parts of individual chapters, until each left government service for the private sector in 2000, 1999, and 1998, respectively. Pat Dean Brick, a consultant now with Research Triangle Institute, provided substantial editorial assistance on the entire manuscript. Daniel Kasprzyk was responsible for the final edit and review of the manuscript.

The subcommittee appreciates the thoughtful and helpful comments of the reviewers of the working paper: Cynthia Clark, Suzanne Dorinski, Carma Hogue, Susan Schechter, Clyde Tucker, and Al Tupek. Throughout the subcommittee’s existence, Graham Kalton provided timely and practical advice. His experience, knowledge, and good judgment helped clarify issues for the subcommittee; his commentary on interim products and his dedication to producing outstanding work are appreciated. Nancy Kirkendall, as chair of the FCSM, encouraged and cajoled the subcommittee in its efforts. She was an active subcommittee member and made substantial content and editorial contributions to all chapters. Finally, the subcommittee thanks Carol Rohr and Susan Baldridge of Pinkerton Computer Consultants, Inc. for their excellent effort in editing and desktop publishing the manuscript.


  • Summary

In 1996, the FCSM established a subcommittee to review the measurement and reporting of data quality in federal data collection programs. Many issues revolve around these two broad topics, not the least of which is what is meant by “quality.” Different data users have different goals and, consequently, different ideas of what constitutes “quality.” If defining quality is difficult, so is reporting it. Reporting “quality” depends on the needs of data users and the kind of product (analytic report, technical report, or data set, for example) made available to the user. The FCSM subcommittee, whose membership represents the experiences of 12 statistical agencies, took the approach of studying “data quality” in terms of the measurement and reporting of various error sources that affect data quality: sampling error, nonresponse error, coverage error, measurement error, and processing error.

    The subcommittee developed an approach to studying these error sources by trying to answer four questions:

• What measurement methods are used by federal data collection programs to assess sources of error in data collection programs?

• To what extent do federal data collection programs report information on sources of error to the user community?

• How does reporting about error sources vary across different types of publications and dissemination media?

• What information on sources of error should federal data collection programs provide and how should they provide it?

This report represents the subcommittee’s efforts at addressing these questions. To understand current reporting practices, the subcommittee conducted three studies. Two studies focused on specific kinds of reports—the short-format report and the analytic report. The third study reviewed the extent to which information about error sources was available on the Internet. The studies’ results were surprising because information about sources of survey error was not as well reported as the subcommittee had expected. Chapter 1 discusses data quality and policies and guidelines with respect to reporting on the quality of data. Chapter 2 describes the studies and their results and provides recommendations on the kind of information that ought to be reported in short-format reports, analytic reports, and technical reports. Chapters 3–7 describe the measurement of sources of error in surveys: sampling error, nonresponse error, coverage error, measurement error, and processing error. Each chapter discusses the results of the subcommittee’s studies that relate to the particular source of error, provides specific recommendations for reporting error sources in an analytic report, and identifies additional topics to report in the technical report format. Chapter 8 discusses the measurement and reporting of total survey error.



Contents

Acknowledgments ..... iii
Summary ..... v
List of Tables ..... xi

Chapter 1 Measuring and Reporting Data Quality in Federal Data Collection Programs ..... 1-1
  1.1 Introduction ..... 1-1
  1.2 Overview of the Report ..... 1-1
  1.3 Data Quality ..... 1-2
  1.4 Data Quality Policies and Guidelines ..... 1-3
    1.4.1 The Principle of Openness ..... 1-3
    1.4.2 Statistical Standards and Guidelines ..... 1-4
    1.4.3 Discussion ..... 1-5
  1.5 Measuring Sources of Error ..... 1-5
  References ..... 1-8

Chapter 2 Reporting Sources of Error: Studies and Recommendations ..... 2-1
  2.1 Reporting Formats and Reporting Sources of Error ..... 2-1
  2.2 The Short-Format Report ..... 2-2
    2.2.1 The Short-Format Report Study ..... 2-2
    2.2.2 Short-Format Reports: Discussion and Recommendations ..... 2-3
  2.3 The Analytic Report ..... 2-4
    2.3.1 The Analytic Report Study ..... 2-4
    2.3.2 The Analytic Report: Discussion and Recommendations ..... 2-5
  2.4 The Internet ..... 2-10
    2.4.1 The Internet Study ..... 2-10
    2.4.2 The Internet Study: Discussion and Recommendations ..... 2-11
  2.5 General Observations ..... 2-12
  References ..... 2-13

Chapter 3 Sampling Error ..... 3-1
  3.1 Introduction ..... 3-1
  3.2 Measuring Sampling Error ..... 3-1
    3.2.1 Variance Estimation Methods ..... 3-2
    3.2.2 Computer Software for Variance Estimation ..... 3-3
    3.2.3 Sampling Error Estimates from Public Use Files ..... 3-3
  3.3 Approaches to Reporting Sampling Error ..... 3-4
    3.3.1 Direct Estimates of Sampling Errors ..... 3-6
    3.3.2 Indirect Estimates of Sampling Errors ..... 3-6
  3.4 Reporting Sampling Error in Federal Surveys ..... 3-7
  References ..... 3-10

Chapter 4 Nonresponse Error ..... 4-1
  4.1 Introduction ..... 4-1
  4.2 Unit Nonresponse ..... 4-2
    4.2.1 Computing and Reporting Response/Nonresponse Rates ..... 4-3
    4.2.2 Unweighted Response Rates ..... 4-3
    4.2.3 Weighted Response Rates ..... 4-6
    4.2.4 Using Weighted versus Unweighted Response Rates ..... 4-7
    4.2.5 Other Response Rates ..... 4-8
    4.2.6 Measuring and Reporting Nonresponse Bias ..... 4-9
  4.3 Item Nonresponse ..... 4-11
    4.3.1 Causes of Item Nonresponse ..... 4-11
    4.3.2 Computing and Reporting Item Response/Nonresponse Rates ..... 4-12
  4.4 Compensating for Nonresponse ..... 4-12
    4.4.1 Weighting Procedures ..... 4-13
    4.4.2 Imputation Procedures ..... 4-14
  4.5 Methods and Procedures to Minimize Unit and Item Nonresponse ..... 4-15
  4.6 Quality Indicators for Nonresponse ..... 4-16
  4.7 Reporting Nonresponse in Federal Surveys ..... 4-17
  References ..... 4-20

Chapter 5 Coverage Error ..... 5-1
  5.1 Introduction ..... 5-1
  5.2 Measuring Coverage Error ..... 5-3
    5.2.1 Comparisons to Independent Sources ..... 5-3
    5.2.2 Case-by-Case Matching and Dual System Estimation ..... 5-5
    5.2.3 Other Approaches to Coverage Measurement ..... 5-7
  5.3 Assessing the Effect of Coverage Errors on Survey Estimates ..... 5-7
  5.4 Correcting for Coverage Error ..... 5-10
    5.4.1 Dual Frame Approach ..... 5-10
    5.4.2 Poststratification ..... 5-11
  5.5 Reporting Coverage Error in Federal Surveys ..... 5-12
  References ..... 5-14

Chapter 6 Measurement Error ..... 6-1
  6.1 Introduction ..... 6-1
  6.2 Sources of Measurement Error ..... 6-2
    6.2.1 Questionnaire Effects ..... 6-2
    6.2.2 Data Collection Mode Effects ..... 6-5
    6.2.3 Interviewer Effects ..... 6-9
    6.2.4 Respondent Effects ..... 6-11
  6.3 Approaches to Quantify Measurement Error ..... 6-13
    6.3.1 Randomized Experiments ..... 6-13
    6.3.2 Cognitive Research Methods ..... 6-13
    6.3.3 Reinterview Studies ..... 6-15
    6.3.4 Behavior Coding ..... 6-18
    6.3.5 Interviewer Variance Studies ..... 6-19
    6.3.6 Record Check Studies ..... 6-20
  6.4 Reporting Measurement Error in Federal Surveys ..... 6-24
  References ..... 6-27

Chapter 7 Processing Error ..... 7-1
  7.1 Introduction ..... 7-1
  7.2 Measuring Processing Error ..... 7-1
    7.2.1 Data Entry Errors ..... 7-1
    7.2.2 Pre-Edit Coding Errors ..... 7-4
    7.2.3 Editing Errors ..... 7-6
    7.2.4 Imputation Errors ..... 7-8
  7.3 Reporting Processing Error in Federal Surveys ..... 7-9
  References ..... 7-11

Chapter 8 Total Survey Error ..... 8-1
  8.1 Introduction ..... 8-1
  8.2 Measuring Total Survey Error ..... 8-3
    8.2.1 Comparisons to Independent Sources ..... 8-3
    8.2.2 Quality Profiles ..... 8-5
    8.2.3 Error Models ..... 8-6
  8.3 Reporting Total Survey Error ..... 8-8
  References ..... 8-11



List of Tables

Table 2.1.—Evaluation criteria for the sources of error ..... 2-6

    Table 2.2.—Evaluation criteria for background survey information ...........................................2-7

    Table 2.3.—Background survey information...............................................................................2-8

    Table 2.4.—Limitations of the data..............................................................................................2-9

    Table 4.1.—Example of reporting multiple response rates for random digit-dialing (RDD) surveys: 1996 National Household Education Survey .............................................4-5

    Table 4.2.—Hierarchical response rates: Response rates for the Schools and Staffing Surveys, public schools 1993–94............................................................................................4-6

    Table 5.1—Current Population Survey coverage ratios...............................................................5-4

    Table 6.1.—Rates of interviewer falsification, by survey..........................................................6-16

    Table 6.2.—Summary of the reinterview reliability for the 1998 and 1991 administrator and teacher surveys .......................................................................................................6-17

    Table 6.3.—Cases sampled from police records by whether crime was reported in the survey “within past 12 months” by type of crime..............................................................6-21

    Table 6.4.—Accuracy of teachers’ self-reports on the year they earned their academic degrees....................................................................................................................6-22

    Table 6.5.—SIPP record check: Percent net bias in estimates of program participation ...........6-23

    Table 6.6.—Percentage of respondents not reporting the motor vehicle accident, by number of months between accident and the interview ......................................................6-24

    Table 7.1.—1984 Residential Energy Consumption Survey (RECS): Changes to the household and billing files, by reason........................................................................................7-7



  • Chapter 1

    Measuring and Reporting Data Quality in Federal Data Collection Programs

1.1 Introduction

The United States statistical system includes over 70 statistical agencies spread among 10 separate departments and 8 independent agencies. Within this decentralized statistical system the Office of Management and Budget (OMB) plays an important leadership role in the coordination of statistical work across the federal government. For example, the Federal Committee on Statistical Methodology (FCSM), a committee of the OMB, has played a leadership role in discussions of the methodology of federal surveys (Gonzalez 1995; Bailar 1997) for almost 25 years.

    In 1996, the FCSM established a subcommittee to review the measurement and reporting of data quality in federal data collection programs. The issues contained within this broad mandate are complex. Measuring the quality of survey data takes on different meanings depending on the constituency. Different data users have different goals and, consequently, different ideas of what constitutes quality. Similarly, the reporting of “quality” can be implemented quite differently depending on the type of data product produced. The FCSM subcommittee, whose membership represents the experiences of twelve statistical agencies, developed an approach to examining this topic by trying to provide answers to four questions:

• What measurement methods are used by federal agencies to assess sources of error in data collection programs?

• To what extent do federal data collection programs report information on sources of error to the user community?

• How does reporting about error sources vary across different types of publications and dissemination media?

• What information on sources of error should federal data collection programs provide and how should they provide it?

    This report represents the subcommittee’s efforts at addressing these questions. In general, the subcommittee took the approach of studying data quality in terms of the measurement and reporting of various error sources that affect data quality: sampling error, nonresponse error, coverage error, measurement error, and processing error (see, for example Kish 1965). The result of this analysis forms the core of this report.

1.2 Overview of the Report

This report provides a general discussion of sources of error in data collection programs. Chapter 2 describes studies undertaken by the subcommittee to understand current statistical agency practices on the reporting of those sources of error, and it gives general recommendations on the types of information about those error sources that ought to be available in short-format reports, analytic reports, or on the Internet. Each of chapters 3–7 addresses a specific source of error from two directions: first, the measurement of the source of error, with a brief discussion of the methods used to measure it; and second, the nature and extent of reporting on that source of error in analytic applications and in more comprehensive survey design and methodological reports. Examples of practice, both in measuring the error source and in how it is reported, are given where appropriate. The final chapter, chapter 8, discusses total survey error, the ways in which it is measured and reported by federal statistical agencies, and several recommendations concerning the reporting of total survey error.

1.3 Data Quality

A rich literature exists and continues to grow on the topic of survey data quality (Lyberg et al. 1997; Collins and Sykes 1999) and its management in national statistical agencies (Brackstone 1999). Definitions of the concept proliferate, but cluster around the idea that the characteristics of the product under development meet or exceed the stated or implied needs of the user. Arondel and Depoutot (1998) suggest in their review that statistical organizations should break down quality into components or characteristics organized around several key concepts: accuracy, relevance, timeliness, and accessibility. See also Statistics Canada (1992) and Statistics Sweden (Andersson, Lindstrom, and Lyberg 1997).

    Accuracy is an important and visible aspect of quality that has been of concern to statisticians and survey methodologists for many years. It relates to the closeness between estimated and true (unknown) values. For many, accuracy means the measurement and reporting of estimates of sampling error for sample survey programs, but, in fact, the concept is much broader, taking in nonsampling error as well. Nonsampling error includes coverage error, measurement error, nonresponse error, and processing error. These sources of error will be discussed below; however, it is important to recognize that the accuracy of any estimate is affected by both sampling and nonsampling error.
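One standard way to formalize this relationship, shown here for illustration rather than taken from the report, is the mean squared error of an estimator, which combines variance (driven largely by sampling error) with squared bias (often dominated by nonsampling error):

$$\mathrm{MSE}(\hat{\theta}) \;=\; E\big[(\hat{\theta}-\theta)^{2}\big] \;=\; \mathrm{Var}(\hat{\theta}) + \big[\mathrm{Bias}(\hat{\theta})\big]^{2}$$

where $\theta$ is the true population value and $\hat{\theta}$ is the survey estimate.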

Relevance refers to the idea that the data collection program measures concepts that are meaningful and useful to data users. Does the concept implemented in the data collection program fit the intended use? For example, concepts first measured in a continuous sample survey program 20 years ago may be inapplicable in current society; that is, they may no longer be relevant to data users. Determining the relevance of concepts and definitions is a difficult and time-consuming process requiring the expertise of data collectors, data providers, data users, agency researchers, and expert panels.

Timeliness can refer to several concepts. First, it refers to the length of the data collection’s production time—the time from data collection until the first availability of a product. Fast release times are without exception looked upon favorably by end users. Second, timeliness can also refer to the frequency of the data collection. Timely data are current data. Timeliness can be difficult to characterize since the characteristics of the data collection can affect the availability of data. For example, a new sample survey may require more time prior to implementation than the revision of an existing survey. Data from continuous recurring surveys should be available sooner than periodic or one-time surveys, but ultimately timeliness is assessed by user needs and expectations.

    Accessibility, as a characteristic of data quality, refers to the ability of data users to obtain the products of the data collection program. Data products have their most value—are most accessible—when they are easily available to end-users and in the forms and formats desired. Data products are of several types—individual microdata in user-friendly formats on different media, statistical tabulations on key survey variables, and analytic and descriptive analysis reports. Accessibility also implies the data products include adequate documentation and discussion to allow proper interpretation of the survey results. Accessibility can also be described in terms of the efforts data producers make to provide “hands-on” technical assistance in using and interpreting the data products through consultation, training classes, etc.

    Arondel and Depoutot (1998) suggest three other characteristics of data quality: comparability of statistics, coherence, and completeness. Comparability of statistics refers to the ability to make reliable comparisons over time; coherence refers to the ability of the statistical data program to maintain common definitions, classifications, and methodological standards when data originate from several sources; and completeness is the ability of the statistical data collection to provide statistics for all domains identified by the user community.

Survey data quality is a concept with many dimensions, each linked with the others. In the abstract, all dimensions of data quality are very important, but in practice, it is usually not possible to place high importance on all dimensions. Thus, with fixed financial resources, an emphasis on one dimension will result in a decrease in emphasis on another. More emphasis on accuracy can lead to less emphasis on timeliness and accessibility; or an emphasis on timeliness may result in the early or preliminary release of data of significantly lower accuracy. Each dimension is important to an end user, but each user may differ in identifying the most important priorities for a data collection program.

    The subcommittee chose to limit its coverage to the accuracy dimension—a dimension that has a history of measurement and reporting. The subcommittee focused on reviewing statistical indicators used to describe different aspects of survey accuracy in relation to various error sources, how indicators may be measured, and whether and how they are presented to data users.

    1.4 Data Quality Policies and Guidelines

1.4.1 The Principle of Openness

The subcommittee’s focus on accuracy is rooted in two long-standing general principles of government statistical systems. The first, and a critical feature of federal statistical agencies, is a stated policy of “openness” concerning the reporting of data quality to users. The U.S. Office of Management and Budget (1978) states that:

    “To help guard against misunderstanding and misuse of data, full information should be available to users about sources, definitions, and methods used in collecting and compiling statistics, and their limitations.”


  • Indeed, the principle is characteristic of national statistical systems; for example, Statistics Canada (1992) has articulated a policy on informing users about data quality and methodology:

    “Statistics Canada, as a professional agency in charge of producing official statistics, has the responsibility to inform users of concepts and methodology used in collecting and processing its data, the quality of the data it produces, and other features of the data that may affect their use and interpretation.”

    New Zealand (Statistics New Zealand 1998) has developed protocols to guide the production and release of official statistics. The protocols are based on ten principles—one of which is to

    “Be open about methods used and documentation of methods and quality measures should be easily available to users to allow them to determine fit for use.”

    The policy of openness in providing full descriptions of data, methods, assumptions, and sources of error is one that is universally accepted by United States statistical agencies as well as the statistical agencies of other countries. As a policy and as a characteristic of a government statistical system, it is noncontroversial. There is general agreement that data users need information on the sources and methods used to compile the data. They also need to understand the nature and sources of error in sample surveys. Correct interpretation or re-analysis of data relies on the availability of such information. As Citro (1997) points out, data users must have the opportunity to review data methods and data limitations; they cannot heed these limitations if they do not have the opportunity to study and review them. However, as will be discussed below, implementation of the policy can vary in many ways.

1.4.2 Statistical Standards and Guidelines

The second important feature of the United States statistical system is the availability of standards or guidelines for survey processes. The U.S. Office of Management and Budget (1978) provides general guidelines for reporting survey information. Individual agencies have the flexibility to develop the general guidelines into more specific guidelines that help the agency codify its own professional standards. The specific guidelines help to promote consistency among studies, and promote the documentation of methods and principles used in collection, analysis, and dissemination. These standards and guidelines generally include prescriptions for the dissemination of information about data quality to users. Such guidelines are not uniformly available across all agencies, but a few good examples exist, such as the standards developed by the U.S. Department of Energy (1992) and the U.S. Department of Education (1992). The standards and guidelines have the effect of helping staff improve the quality and uniformity of data collection, analysis, and dissemination activities.

Other agencies take a slightly different approach to the development of standards and guidelines, focusing more on the establishment of policies and procedures for reviewing reports, determining sampling errors, and testing hypotheses (Sirken et al. 1974) or presenting information concerning sampling and nonsampling error (Gonzalez et al. 1975; updated by Bailar 1987).


  • The interest in establishing standards and guidelines is also found in the statistical systems of other countries. For example, the United Kingdom Government Statistical Service (1997) has developed guidelines that focus specifically on the reporting of data quality. These guidelines are in the form of a checklist of questions related to individual areas of the survey process. Statistics Canada (1998) has developed a report that provides “good practices” in the form of principles and guidelines for the individual steps of a survey. Documenting data quality in relation to various error sources is not a new concern. Thirty-seven years ago, the United Nations (1964) presented recommendations on the topics to be documented when preparing sample survey reports, including information on many sources of error. The United Nations recommendations include the provision of detailed information about how the survey was conducted.

    Information about survey procedures provides users with valuable insights into data quality. As an example, in a face-to-face household survey, information about interviewer training and recruitment, the extent of checking on the conduct of interviews, the number of visits required, and the refusal conversion procedures is useful in assessing the quality of the conduct of the survey. Therefore, it is important for data producers to report information about the background and history of the data collection program, its scope and objectives, sample design and sample size, data collection procedures, time period of data collection, the response mode, the designated respondents, as well as processing and estimation procedures. For repeated surveys, it is important for users to be aware of changes in design, content, and implementation procedures since they may affect comparisons over time.

1.4.3 Discussion

“Openness” and “measurement and reporting guidelines”—the two principles discussed above—provide the impetus for the work of the subcommittee. The principle of “openness” is important for the United States statistical system because it helps others to reproduce results and question findings. The principle allows the system and the agencies in it to be held accountable and to maintain impartiality in the presentation and reporting of official statistics. The second principle provides the policies and guidelines to implement the policy of openness.

    The measurement and reporting of error sources is important for everyone who uses statistical data. For the analyst, this information helps data analyses through an awareness of the limitations of the data. It helps the methodologist understand current data collection procedures, methods, and data limitations, and it motivates the development of better methods for future data collections. For the statistical agency, the implementation of effective measurement and reporting is an integral part of the good practices expected of a statistical agency.

1.5 Measuring Sources of Error

The subcommittee organized its work in terms of the error sources that affect accuracy. It reviewed methods for measuring error sources and reviewed indicators for describing information on data quality. The subcommittee identified five sources of error: sampling error, nonresponse error, coverage error, measurement error, and processing error.

Sampling error is probably the best-known source of survey error and refers to the variability that occurs by chance because a sample rather than an entire population was surveyed. The reporting of sampling error for survey estimates should be important to all statistical agencies. For any survey based on a probability sample, data from the survey can be used to estimate the standard errors of survey estimates. Nowadays, the standard errors for most estimates can be readily computed using software that takes into account the survey’s complex sample design. The challenge that occurs with the computation of standard errors is a result of the multi-purpose nature of many federal surveys. Surveys produce many complex statistics and the task of computing and reporting standard errors for all the survey estimates and for differences between estimates is an extremely large one.
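To make the replicate-weight approach that many public-use files support concrete, here is a minimal Python sketch; it is not drawn from the report, and the data, the number of replicates, and the jackknife scaling factor are invented for illustration. Production work would follow the survey's documented replication method or use specialized survey software.

```python
import numpy as np

def weighted_mean(y, w):
    # Weighted estimate of a mean using the supplied analysis weights.
    return np.sum(w * y) / np.sum(w)

def replicate_se(y, full_weights, replicate_weights, scale):
    # Replication-based standard error: re-estimate with each replicate
    # weight set and measure the spread around the full-sample estimate.
    theta_full = weighted_mean(y, full_weights)
    theta_reps = np.array([weighted_mean(y, w_r) for w_r in replicate_weights])
    return theta_full, np.sqrt(scale * np.sum((theta_reps - theta_full) ** 2))

rng = np.random.default_rng(0)
y = np.array([10.0, 12.0, 9.0, 15.0, 11.0])   # analysis variable (invented)
w = np.array([1.0, 1.2, 0.8, 1.1, 0.9])       # full-sample weights (invented)
replicates = np.tile(w, (4, 1)) * rng.uniform(0.8, 1.2, size=(4, 5))

# scale = (R - 1) / R is one common jackknife convention; the correct factor
# depends on how the replicates were formed.
est, se = replicate_se(y, w, replicates, scale=3 / 4)
print(f"estimate = {est:.2f}, standard error = {se:.2f}")
```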

    Nonresponse error is a highly visible and well-known source of nonsampling error. It is an error of nonobservation reflecting an unsuccessful attempt to obtain the desired information from an eligible unit. Nonresponse reduces sample size, results in increased variance, and introduces a potential for bias in the survey estimates. Nonresponse rates are frequently reported and are often viewed as a proxy for the quality of a survey. Nonresponse rates may be calculated differently for different purposes (Lessler and Kalsbeek 1992; Gonzalez, Kasprzyk, and Scheuren 1994; Council of American Survey Research Organizations 1982; American Association for Public Opinion Research 2000) and they are often miscalculated. The complexities of the survey design often make calculation and communication of response rates confusing and potentially problematic. While reporting nonresponse rates is important, nonresponse rates alone provide no indication of nonresponse bias. Special studies are necessary.
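As a hedged illustration of why the choice of definition matters, the sketch below computes simple unweighted and weighted unit response rates; the counts and weights are invented, and real calculations must follow a documented definition (for example, the treatment of cases of unknown eligibility differs across the AAPOR and CASRO definitions cited above).

```python
# Illustrative only: two common unit response rate calculations.
def unweighted_response_rate(completed_cases, eligible_cases):
    # Simple ratio of completed interviews to eligible sample cases.
    return completed_cases / eligible_cases

def weighted_response_rate(respondent_weights, eligible_weights):
    # Base-weighted version: sums of design weights rather than raw counts.
    return sum(respondent_weights) / sum(eligible_weights)

print(unweighted_response_rate(820, 1000))                  # 0.82
print(weighted_response_rate([1.25] * 820, [1.25] * 1000))  # 0.82 here; differs when weights vary
```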

Coverage error is the error associated with the failure to include some population units in the frame used for sample selection (undercoverage) and the error associated with the failure to identify units represented on the frame more than once (overcoverage). The source of coverage error is the sampling frame itself. It is important, therefore, that information about the quality of the sampling frame and its completeness for the target population be known. Measurement methods for coverage error rely on methods external to the survey operations; for example, comparing survey estimates to independent sources or implementing a case-by-case matching of two lists.
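A hypothetical sketch of the first of these methods follows: weighted survey totals are compared with independent control totals to form coverage ratios. The group labels and numbers are invented for illustration.

```python
# Coverage ratio: weighted survey estimate for a group divided by an
# independent control total for the same group (all values invented).
independent_totals = {"males 20-29": 18_500_000, "females 20-29": 18_200_000}
survey_estimates   = {"males 20-29": 16_500_000, "females 20-29": 17_100_000}

for group, control in independent_totals.items():
    ratio = survey_estimates[group] / control
    print(f"{group}: coverage ratio = {ratio:.2f}")  # ratios below 1 suggest undercoverage
```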

    Measurement error is characterized as the difference between the observed value of a variable and the true, but unobserved, value of that variable. Measurement error comes from four primary sources in survey data collection: the questionnaire, as the official presentation or request for information; the data collection method, as the way in which the request for information is made; the interviewer, as the deliverer of the questions; and the respondent, as the recipient of the request for information. These sources comprise the entirety of data collection, and each source can introduce error into the measurement process. For example, measurement error may occur in respondents’ answers to survey questions, including misunderstanding the meaning of the question, failing to recall the information accurately, and failing to construct the response correctly (e.g., by summing the components of an amount incorrectly). Measurement errors are difficult to quantify, usually requiring special, expensive studies. Reinterview programs, record check studies, behavior coding, cognitive testing, and randomized experiments are a few of the approaches used to quantify measurement error.
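For example, a commonly reported reinterview statistic is the gross difference rate, the share of cases whose original and reinterview answers to the same item disagree; the toy data below are invented simply to show the calculation.

```python
# Gross difference rate from a reinterview study (invented responses).
def gross_difference_rate(original, reinterview):
    disagreements = sum(o != r for o, r in zip(original, reinterview))
    return disagreements / len(original)

original_answers    = ["yes", "no", "yes", "yes", "no", "yes"]
reinterview_answers = ["yes", "no", "no",  "yes", "no", "yes"]
print(gross_difference_rate(original_answers, reinterview_answers))  # 0.1667 (1 of 6 cases differs)
```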

Processing error occurs after the survey data are collected, during the processes that convert reported data to published estimates and consistent machine-readable information. Each processing step, from data collection to the publication of the final survey results, can generate errors in the data or in the published statistics. These errors range from a simple recording error (that is, a transcribing or transmission error) to more complex errors arising from a poorly specified edit or imputation model. They tend not to be well reported or well documented, and are seldom treated in the survey research literature. Processing errors include data entry, coding, editing, and imputation errors. Imputation errors are included under processing error because many agencies treat failed edits as missing and impute values for them. Error rates are determined through quality control samples; however, in recent years authors have advocated continuous quality management practices (Morganstein and Marker 1997; Linacre and Trewin 1989).
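As a simple, hypothetical illustration of the quality control approach, the sketch below estimates a keying error rate from a verification sample and attaches a binomial standard error; the sample size and error count are invented.

```python
import math

def error_rate_with_se(errors_found, records_verified):
    # Proportion of verified records found to be in error, with a
    # simple binomial standard error for that proportion.
    p = errors_found / records_verified
    se = math.sqrt(p * (1 - p) / records_verified)
    return p, se

rate, se = error_rate_with_se(errors_found=12, records_verified=2000)
print(f"estimated data entry error rate: {rate:.3%} (standard error {se:.3%})")
```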

    The classification of error sources in surveys described above provides a framework for users of statistical data to develop an understanding of the nature of the data they analyze. An understanding of the limitations of data can assist an analyst in developing methods to compensate for the known shortcomings of their data. Of course, the errors from various sources are not of the same size or of the same importance. Later chapters will describe measurement techniques for determining the magnitude of the sources of error.


References

American Association for Public Opinion Research. 2000. Standard Definitions: Final Dispositions of Case Codes and Outcome Rates for Surveys. Ann Arbor, MI: AAPOR.

    Andersson, C., Lindstrom, H., and Lyberg, L. 1997. “Quality Declaration at Statistics Sweden.” Seminar on Statistical Methodology in the Public Service. Washington, DC: U.S. Office of Management and Budget (Statistical Policy Working Paper 26). 131–144.

Arondel, P. and Depoutot, R. May 1998. “Overview of Quality Issues when Dealing with Socio-economic Products in an International Environment.” Paper prepared for presentation at the XXXth ASU Meeting.

    Bailar, B. 1997. “The Federal Committee on Statistical Methodology.” Proceedings of the Section on Government Statistics and Section on Social Statistics. Alexandria, VA: American Statistical Association. 137–140.

    Bailar, B. June 2, 1987. “Policy on Standards and Review of Census Bureau Publications.” U.S. Bureau of the Census memorandum.

    Brackstone, G. 1999. “Managing Data Quality in a Statistical Agency.” Survey Methodology. 25(2): 139–149.

    Citro, C. 1997. “Discussion.” Seminar on Statistical Methodology in the Public Service. Washington, DC: U.S. Office of Management and Budget (Statistical Policy Working Paper 26). 43–51.

    Collins, M. and Sykes, W. 1999. “Extending the Definition of Survey Quality.” Journal of Official Statistics. 15(1): 57–66.

    Council of American Survey Research Organizations. 1982. On the Definitions of Response Rates. Port Jefferson, NY.

    Gonzalez, M.E. 1995. “Committee Origins and Functions: How and Why the Federal Committee on Statistical Methodology Began and What it Does.” Proceedings of the Section on Government Statistics. Alexandria, VA: American Statistical Association. 262–267.

    Gonzalez, M.E., Kasprzyk, D., and Scheuren, F. 1994. “Nonresponse in Federal Surveys: An Exploratory Study.” Amstat News 208. Alexandria, VA: American Statistical Association.

    Gonzalez, M.E., Ogus, J.L., Shapiro, G., and Tepping, B.J. 1975. “Standards for Discussion and Presentation of Errors in Survey and Census Data.” Journal of the American Statistical Association. 70(351)(II): 5–23.

    Kish, L. 1965. Survey Sampling. New York: John Wiley & Sons.

    Lessler, J.T. and Kalsbeek, W.D. 1992. Nonsampling Error in Surveys. New York: John Wiley & Sons.


  • Linacre, S. and Trewin, D. 1989. “Evaluation of Errors and Appropriate Resource Allocation in Economic Collections.” Proceedings of the Annual Research Conference, Washington, DC: U.S. Bureau of the Census. 197–209.

    Lyberg, L., Biemer, P., Collins, M., deLeeuw E., Dippo, C., Schwarz, N. and Trewin, D. (eds.) 1997. Survey Measurement and Process Quality. New York: John Wiley & Sons.

    Morganstein, D. and Marker, D. 1997. “Continuous Quality Improvement in Statistical Agencies.” In L. Lyberg, P. Biemer, M. Collins, E. deLeeuw, C. Dippo, N. Schwarz, and D. Trewin (eds.), Survey Measurement and Process Quality. New York: John Wiley & Sons. 475– 500.

    Sirken, M. G., Shimizu, B.I., French, D.K., and Brock, D.B. 1974. Manual on Standards and Procedures for Reviewing Statistical Reports. Washington, DC: National Center for Health Statistics.

    Statistics Canada. 1998. Statistics Canada Quality Guidelines. Ottawa, Canada.

    Statistics Canada. 1992. Policy on Informing Users of Data Quality and Methodology. Ottawa, Canada.

    Statistics New Zealand. August 1998. Protocols for Official Statistics. Statistics New Zealand.

    United Kingdom Government Statistical Service. 1997. Statistical Quality Checklist. London: U.K. Office for National Statistics.

    United Nations. 1964. Recommendations for the Preparation of Sample Survey Reports (Provisional Issue). Statistical Papers, Series C, ST/STAT/SER.C/1/Rev.2. New York: United Nations.

    U.S. Department of Education. 1992. NCES Statistical Standards. Washington, DC: National Center for Education Statistics (NCES 92–021r).

    U.S. Department of Energy. 1992. The Energy Information Administration Standards Manual. Washington DC: Energy Information Administration.

    U.S. Office of Management and Budget. 1978. Statistical Policy Handbook. Washington, DC.



  • Chapter 2

    Reporting Sources of Error: Studies and Recommendations

2.1 Reporting Formats and Reporting Sources of Error

Reporting formats for presenting information to users vary considerably not only across statistical agencies but also within a single statistical agency. Individual data programs have a variety of constituencies and user groups, each with diverse representation ranging from sophisticated data analysts with graduate degrees to reporters and the general public. These user groups are served by different types of data products. Individual-level data sets provide survey responses in a format that can be accessed by data analysts. Data sets are often packaged on CD-ROMs with software and instructions on how to create analytic subfiles in the formats used by statistical software. Other CD-ROM products may allow the tabulation of a large number (but not all) of survey variables. Lately, microdata are being made available on the Internet in the form of downloadable files and through online statistical tools. In contrast, diskettes and even 9-track tape files are still released to the general public.

    Print products vary widely in their sophistication and complexity of analysis. Simple categorizations of print reports are difficult, but several broad types of reports can be distinguished. Press releases of one or two pages and short-format reports intended to make complex statistical data available to the general public have become popular during the last decade. The U.S. Bureau of the Census’ Census Briefs, the National Science Foundation’s Issue Briefs, and the National Center for Education Statistics’ Issue Brief and Statistics in Brief series provide recent examples of short print report products aimed at a broad audience.

    Descriptive analyses featuring tabular presentations are often released by statistical agencies. These vary in length, number, and complexity of the tabular information presented, but rarely provide complex statistical analyses. Typically, these reports describe results in narrative form, display both simple and complicated data tables, illustrate aspects of the data through graphical representation, or use combinations of these reporting formats.

    Complex substantive data analyses, using techniques such as regression analysis and categorical data analysis, are often made available in what may be characterized as analytical reports. While these reports may also present descriptive data, their focus is answering specific questions and testing specific hypotheses. These reports and those mentioned above often rely on a single data set for the analysis, although this need not be the case. Compendium reports provide data from a large number of data sets, usually in the form of data tables and charts on a large and diverse set of topic areas.

    Methodological/technical reports are released to describe special studies, and provide results of research on statistical and survey methods. These are reports released to provide specific information on a particular source of error or survey procedure, either to quantify the magnitude of the error source or to document a particular problem. Other technical reports do not focus on a specific methodological study, but rather provide substantial background information about data collection, processing, and estimation procedures. These reports are often characterized as “design and methodology” reports or user guides for the data collection program.


  • Although there is considerable recognition of the importance of reporting information about the survey design and the nature and magnitude of survey error sources, there is a lack of consensus about how much detail should be provided in the different reporting formats. Generally speaking, most survey methodologists and program managers agree that basic information about the survey’s purpose, key variables, sample design, data collection methods, and estimation procedures ought to be available in descriptive and analytical reports. Additionally, there is a consensus that the sources of error described above, sampling error, nonresponse error, coverage error, measurement error, and processing error, should be described and accounted for when reporting results. There is less of a consensus and, perhaps no consensus, on the reporting of an error source when it is not a major source of error; for example, a discussion of coverage error in a survey where “state” is the sampling unit may not be necessary.

    While there is general agreement that this information should be reported, there is no clear answer as to how much information to provide in the various reporting formats. A long, reasonably detailed discussion of error sources at the end of a lengthy complicated analytic report may seem reasonable in that context, but is obviously inappropriate for reports that may be only 2–10 pages long. Striking a reasonable reporting balance is the issue, because there is a strong belief that some information on the data source and error sources should be reported regardless of the length of the report. Obviously, details reported ought to depend on the nature of the report and its intended use.

    An understanding of current practices in reporting the quality of survey data is largely dependent on anecdotal evidence and the experiences of individuals working within agencies and survey programs. The subcommittee addressed this limited understanding by conducting three studies aimed at characterizing current agency practices in reporting sources of error. The results of these studies (Kasprzyk et al. 1999) provided a framework for a discussion of issues and recommendations. The studies dealt with three reporting formats: the short-format report, the analytic report, and the use of the Internet. In the sections that follow, the subcommittee’s studies of the extent to which sources of error are reported in each of the three formats are described, and the results of the studies and the subcommittee’s recommendations are presented. The recommendations concern the kinds of information on sources of error that should be included in each of these three reporting formats.

    2.2 The Short-Format Report

    2.2.1 The Short-Format Report Study

    Short reports, directed to specific audiences such as policymakers and the public, typically focus on a very narrow topic, issue, or statistic. McMillen and Brady (1999) reviewed 454 publications of 10 pages or less to examine their treatment of information about survey design, data quality, and measurement. The publications are products of the 12 statistical agencies that comprise the Interagency Council on Statistical Policy and are available over the Internet. The publications were released during the 1990s, almost exclusively as on-line reports, and the majority were published in the mid-1990s or later. The reports reviewed from each agency were as follows: 91 from the Bureau of Labor Statistics; 79 from the Economic Research Service; 64 from the National Center for Education Statistics; 38 from the U.S. Bureau of the Census; 34 from the National Science Foundation; 28 from the National Agricultural Statistics Service; 24 from the Bureau of Justice Statistics; 22 from the Bureau of Economic Analysis; 22 from the Internal Revenue Service; 13 from the National Center for Health Statistics; 8 from the Energy Information Administration; and 3 from the Bureau of Transportation Statistics.

    The study found considerable variation in the amount of documentation included across this reporting format. Since there is little consensus over the amount of detail to be included in short reports, the authors limited the review to whether or not specific error sources or specific elements of documentation about survey design were mentioned. Virtually all of the short reports included some information on how to learn more about the data reported; this information ranged from a name, phone number, e-mail address, or web site address to citations of printed reports.

    Approximately two-thirds (69 percent) of the 454 reports included either a reference to a technical report or some mention of study design, data quality, or survey error. Close to one-half (47 percent) included some information describing the purpose of the survey or analysis, the key variables, the survey frame, and/or key aspects of the sample design. Only 20 percent included the sample size and 10 percent described the mode of data collection. Only a very small fraction (2 percent) mentioned estimation and/or weighting.

    About one-fifth (22 percent) mentioned sampling error. In most cases, this was no more than a mention, although occasionally statistical significance testing and the significance level were noted. Only a handful of reports included information on the size of the sampling error. Nonresponse error is the most visible and well-known source of nonsampling error and certainly the most recognizable indicator of data quality. Despite this, only 13 percent of the short reports included any reference to response rates, to nonresponse as a potential source of error, or to imputation. Only 3 percent reported unit nonresponse rates and there was virtually no reporting of item nonresponse rates. Coverage rates, or coverage as a potential source of error, were mentioned in only 10 percent of the reports reviewed. The difficulties associated with measurement were reported in 22 percent of the reports reviewed. Processing errors as a potential source of survey error were cited in 16 percent of the reports.

    Results of the study are not surprising. The types of reports studied are short and oriented to a specific topic. The principal goal of the publication is to convey important policy-relevant results with a minimum of text. Discussion of sources of error in this report format is not viewed as critical. However, the disparity between stated policy and implemented policy concerning the reporting of sources of error is obvious.

    2.2.2 Short-Format Reports: Discussion and Recommendations

    The short-format report limits the amount of information that can be presented. Nevertheless, the subcommittee felt the essential principle of reporting information on the nature and magnitude of error must continue to be addressed. The subcommittee recommends that:

    All short-format reports provide basic information about the data set used in the analysis, known sources of error, related methodological reports, and a contact for further information.

    The information presented must, of necessity, be brief, yet it must contain enough salient information that the reader can appreciate the limitations of the methodology and data. Thus, the report should include the name and year of the data collection program the analyses are based on and whether the data are based on a probability sample or census. It should also state that the data reported are subject to sampling error (if a sample survey) and nonsampling error. The total in-scope sample size and the overall unit response rate should be reported. Reports having statements describing findings should state whether statistical significance testing was used and reference the significance level. It should include a statement that sampling errors for estimates in the reports are available on request. When only a few estimates are displayed, presenting confidence intervals associated with the estimates may be appropriate. Estimates dependent on survey variables with high item nonresponse rates or having particularly difficult measurement properties should be identified. A reference to a source report that includes more detailed information about data collection and data quality should be cited along with the name of a contact person who can provide additional information or answer questions.

    The information in the recommendation can be conveyed in a short paragraph at the conclusion of a short-format report. The subcommittee recommends that agencies adopt a reporting format that can be repeated with only minor modifications across their short-format reports. One example might look like this:

    Estimates in this report are based on a national probability sample of <SAMPLE SIZE> drawn from the <SAMPLING FRAME>. All estimates are subject to sampling error, as well as nonsampling error, such as measurement error, nonresponse error, data processing error, and coverage error. Quality control and editing procedures are used to reduce errors made by respondents, coders, and interviewers. Statistical adjustments have been made for unit nonresponse and questionnaire item nonresponse. All differences reported are statistically significant at the 0.05 level. The response rate for the survey was xx.x percent. Sampling errors for the estimates in this report are available from <AGENCY OR CONTACT OFFICE>. Detailed information concerning the data collection (including procedures taken to test the wording of questions), methodology, and data quality is available in <TECHNICAL OR METHODOLOGY REPORT>. For more information about the data and the analysis contact <NAME, TELEPHONE NUMBER, AND E-MAIL ADDRESS>.
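
    Because this paragraph changes little from one report to the next, an agency could maintain it as a single fill-in template and generate the report-specific version automatically. The sketch below is illustrative only: the template wording, field names, and example values are hypothetical and would be replaced by an agency's own boilerplate and survey-specific figures.

```python
# Illustrative sketch: a reusable data-quality paragraph for short-format reports.
# The template text, field names, and example values are hypothetical.
DATA_QUALITY_TEMPLATE = (
    "Estimates in this report are based on a national probability sample of "
    "{sample_size:,} {units} drawn from the {sampling_frame}. All estimates are "
    "subject to sampling error, as well as nonsampling error. All differences "
    "reported are statistically significant at the {alpha} level. The response "
    "rate for the survey was {response_rate:.1f} percent. Sampling errors for the "
    "estimates in this report are available from {agency}. Detailed information on "
    "the data collection, methodology, and data quality is available in "
    "{methodology_report}. For more information contact {contact}."
)

def data_quality_paragraph(**fields) -> str:
    """Fill the standard paragraph with report-specific values."""
    return DATA_QUALITY_TEMPLATE.format(**fields)

if __name__ == "__main__":
    # Hypothetical values shown only to demonstrate the template.
    print(data_quality_paragraph(
        sample_size=45000,
        units="households",
        sampling_frame="agency's census-based sampling frame",
        alpha=0.05,
        response_rate=91.3,
        agency="the sponsoring agency",
        methodology_report="the survey's design and methodology report",
        contact="the survey staff (name, telephone, e-mail)",
    ))
```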

    2.3 The Analytic Report

    2.3.1 The Analytic Report Study

    A second study conducted by the FCSM subcommittee focused on a review of “analytic publications”—publications resulting from a primary summarization of a one-time survey or an ongoing series of surveys. Analytic publications may use a variety of formats, with results described in narrative form, displayed in tables, shown in graphical format, or a combination of these. Atkinson, Schwanz, and Sieber (1999) conducted a review of 49 analytic publications produced by 17 agencies. The review included publications from major statistical agencies, as well as some from smaller agencies conducting surveys. The selected publications were a convenience sample, but an effort was made to cover as many of the major statistical agencies as time would allow.

    The review considered both the completeness of background information on survey design and procedures and the reporting of error sources. Evaluation criteria were established for the kinds of survey information that ought to be included in analytic reports; the fifty-one review criteria identified are listed in tables 2.1 and 2.2. A value of “1” or “0” was assigned to each criterion, depending on whether or not the publication contained the qualifying information. To facilitate, standardize, and document the review of each publication, hierarchical levels of increasing detail about each of the major categories of error and background survey information were also established. The criteria for sources of error consisted of a three-level hierarchy, ranging from level 1, where a particular error source was merely mentioned, through level 3, where detailed information about the error source was provided (table 2.1). Levels 2 and 3 generally involved some quantification of the error source.
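
    As a minimal sketch of how such a binary review might be tallied across publications, the snippet below records each publication's review as 0/1 indicators and computes the percentage of reports meeting each criterion. The criterion names and example reviews are hypothetical illustrations, not the subcommittee's actual coding scheme.

```python
# Illustrative sketch of tallying 0/1 review criteria across publications.
# Criterion names and the example reviews are hypothetical.
from typing import Dict, List

CRITERIA = [
    "sampling_error_mentioned",      # level 1 criterion
    "sampling_errors_presented",     # level 2 criterion
    "unit_nonresponse_mentioned",    # level 1 criterion
    "overall_response_rate_given",   # level 1 criterion
]

def percent_meeting(reviews: List[Dict[str, int]]) -> Dict[str, float]:
    """Percentage of reviewed publications meeting each criterion."""
    n = len(reviews)
    return {
        criterion: 100.0 * sum(review.get(criterion, 0) for review in reviews) / n
        for criterion in CRITERIA
    }

if __name__ == "__main__":
    example_reviews = [
        {"sampling_error_mentioned": 1, "sampling_errors_presented": 1,
         "unit_nonresponse_mentioned": 1, "overall_response_rate_given": 0},
        {"sampling_error_mentioned": 1, "sampling_errors_presented": 0,
         "unit_nonresponse_mentioned": 0, "overall_response_rate_given": 0},
    ]
    for criterion, pct in percent_meeting(example_reviews).items():
        print(f"{criterion}: {pct:.0f} percent")
```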

    Sampling error was the most frequently documented error source, being mentioned in 92 percent of the reports. Among the analytic reports reviewed, 75 percent presented sampling errors, 75 percent gave a definition and interpretation, and 45 percent specified the method used in calculating sampling errors. Somewhat surprisingly, only 71 percent mentioned unit nonresponse, 59 percent reported an overall response rate, and 20 percent reported response rates for subgroups. Only one-half (49 percent) mentioned item nonresponse and only 22 percent reported any item response rates. Nearly all reports included a definition of the universe (94 percent) and identified and described the frame (84 percent), but only one-half (49 percent) specifically mentioned coverage error as a potential source of nonsampling error, and only 16 percent provided an estimated coverage rate. Two-thirds of the reports mentioned measurement error and one-half included a description and definition. Specific studies to quantify this error were mentioned in only 18 percent of the reports. The majority of the reports (78 percent) mentioned processing as an error source, but very few included any detail about this error source (about 4 percent reported coding error rates and 6 percent reported edit failure rates).

    A second set of criteria was defined to measure the extent to which contextual survey information is included in publications. This information explains the survey procedures and helps the reader understand the survey results and their limitations. Two levels of increasing detail were defined for each of the four categories of background survey information (table 2.2).

    The study indicated that survey background information was reported reasonably well—the general features of the sample design were reported about 92 percent of the time, data collection methods about 88 percent of the time, and a brief description of the estimation techniques about 82 percent of the time. The review of error sources revealed variation across agencies. Only 59 percent of the reports included at least some mention of each of the five error sources.

    The results of this study are not comforting. Although the evaluation criteria are subjective and the small convenience sample has obvious limitations, the fact remains that a considerable discrepancy exists between stated principles of practice and their implementation when it comes to the nature and extent of reporting sources of survey error.

    2.3.2 The Analytic Report: Discussion and Recommendations

    Analytic reports, as we have defined them for this study, include a wide variety of report series and types of analysis. The most important characteristic of this kind of report is that it provides fairly detailed analyses and/or summaries of data from either one-time or continuing surveys. The reports themselves are longer than the reports described in section 2.2 and provide more opportunity for data providers and analysts to describe the sources and limitations of the data. The subcommittee’s recommendations take advantage of this fact while recognizing that the purpose of the report is to present statistical information.

    Table 2.1.—Evaluation criteria for the sources of error

    Coverage error
      Level 1: Coverage error is specifically mentioned as a source of nonsampling error
      Level 2: Overall coverage rate is provided; Universe is defined; Frame is identified and described
      Level 3: Coverage rates for subpopulations are given; Poststratification procedures and possible effects are described

    Nonresponse error
      Level 1: Unit nonresponse is specifically mentioned; Item nonresponse is specifically mentioned; Overall response rate is given
      Level 2: Item response rates are given; Weighted and unweighted unit response rates at each interview level are given; Numerator and denominator for unit and item response rates are defined
      Level 3: Subgroup response rates are given; Effect of nonresponse adjustment procedure is mentioned; Imputation method is described; Effect of item nonresponse is mentioned; Results of special nonresponse studies are described

    Processing error
      Level 1: Processing errors are specifically mentioned
      Level 2: Data keying error rates are given; Coding error rates are given; Edit failure rates are summarized; References are given to processing error studies and documentation
      Level 3: Coder variance studies or other processing error studies are given

    Measurement error
      Level 1: Measurement error is mentioned as a source of nonsampling error
      Level 2: Specific sources of measurement error are described and defined
      Level 3: Reinterview, record check, or split-sample measurement error studies are mentioned and/or summarized, with references to larger reports

    Sampling error
      Level 1: Sampling error is mentioned as a source of error; Definition and interpretation of sampling error is included; Significance level of statements is given
      Level 2: Sampling errors are presented; Confidence intervals are defined and method for calculating intervals is described; Sampling errors and calculations for different types of estimates (e.g., levels, percent, ratios, means, and medians) are described
      Level 3: Method used for calculating sampling error is mentioned, with reference to a more detailed description; Generalized model(s) and assumptions are described
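
    As an illustration of the “generalized model(s)” referred to under level 3 for sampling error, many agencies publish generalized variance function (GVF) parameters in place of item-by-item variance tables. A commonly used form, shown here only as an example and not as a model endorsed by the subcommittee, expresses the relative variance of an estimated total \hat{X} as

        \frac{\widehat{\mathrm{Var}}(\hat{X})}{\hat{X}^{2}} = a + \frac{b}{\hat{X}},
        \qquad\text{equivalently}\qquad
        \widehat{\mathrm{Var}}(\hat{X}) = a\hat{X}^{2} + b\hat{X},

    where the parameters a and b are estimated by fitting the model to direct variance estimates for a broad set of survey items and are reported together with instructions for their use.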


    Table 2.2.—Evaluation criteria for background survey information

    Comparison to other data sources
      Level 1: General statement about comparability of the survey data over time is included; General statement about comparability with other data sources is included; Survey changes that affect comparisons are briefly described
      Level 2: Survey changes that affect comparisons of the survey data over time are described in detail; Tables, charts, or figures showing comparisons of the survey data over time are included; Tables, charts, or figures showing comparisons with other data sources are included

    Sample design
      Level 1: General features of the sample design (e.g., sample size, number of PSUs, and oversampled populations) are briefly described
      Level 2: Sample design methodologies (e.g., PSU stratification variables and methodology, within-PSU stratification and sampling methodology, and oversampling methodology) are described in detail, with references to more detailed documentation

    Data collection methods
      Level 1: Data collection methods used (e.g., mail, telephone, personal visit) are briefly described
      Level 2: Data collection methods are described in more detail; Data collection steps taken to reduce nonresponse, undercoverage, or response variance/bias are described

    Estimation
      Level 1: Estimation methods are described briefly
      Level 2: Methods used for calculating each adjustment factor are described in some detail; Variables used to define cells in each step are mentioned; Cell collapsing criteria used in each step are mentioned

    The length limitations found in the short-format reports do not apply here and, consequently, an opportunity exists for a fuller treatment of descriptions of the survey, methodology, and data limitations. On the other hand, the fuller treatment of methodology in an analytic report cannot be so lengthy and detailed that this information overshadows the statistical information presented in the report.

    Analytic reports are usually intended for a broad, multidisciplinary audience. The reports usually provide a technical notes or methodology appendix containing information about the data sources and their limitations. A critical aspect of the recommendations is the understanding that information presented in a technical or methodology appendix must provide the essentials or key aspects of the survey background and the major sources of error in the survey. The details of the data collection operations and procedures, studies about the error sources, and detailed analyses of the effects of statistical and procedural decisions belong in individual technical reports, comprehensive design and methodology reports, or quality profiles. The technical appendix does not need to be lengthy (5–10 pages), but it should provide quantitative information to inform the reader, as well as citations to secondary sources that provide more detailed information or analyses.


    The subcommittee recommends that studies reporting analyses of statistical data should present three types of information:

    •  Background description of the data collection programs used in the analysis (table 2.3 lists key information),

    •  Description of each major source of error, the magnitude of the error source (if available), and any known limitations of the data (table 2.4 lists essential information), and

    •  Access to the questionnaire or questionnaire items used in the analysis, through the report or through electronic means, or upon request.

    The information described in the recommendation above helps readers/users of the report to better understand the report’s findings. The subcommittee appreciates that a substantial amount of material is typically available on these topics in the data collection specifications. The difficult task for the data producer is to synthesize the available material into a short technical appendix.

    Table 2.3.—Background survey information

    Survey objectives

    Survey content

    Changes in content, procedures, and design from previous rounds

    Survey preparations/pretests

    Sample design
      Target population defined
      Sampling frame identified and described
      Stratification variables
      Sample size

    Data collection
      Schedule (when collected/number of follow-ups/time in field)
      Mode (percent of each type)
      Respondent (identified/percent self/percent proxy)
      Reference period identified
      Interview length

    Data processing
      Identification of procedures used to minimize processing errors
      Editing operations
      Coding operations
      Imputation methods

    Estimation
      Description of procedure (stages of estimation)
      Source and use of independent controls

    Key variables/concerns defined
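
    As a hedged illustration of what a description of the “stages of estimation” might cover, many surveys construct the final analysis weight as a product of adjustment factors. The decomposition below is a generic sketch, not a description of any particular agency’s procedure:

        w_{i}^{\text{final}} = w_{i}^{\text{base}} \times f_{i}^{\text{NR}} \times f_{i}^{\text{PS}},

    where w_{i}^{\text{base}} is the inverse of unit i’s selection probability, f_{i}^{\text{NR}} is a nonresponse adjustment computed within weighting cells, and f_{i}^{\text{PS}} is a poststratification or ratio adjustment that calibrates weighted totals to independent population controls. A background section need only name the stages and the sources of the controls; the detailed factors belong in the design and methodology report.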


    Table 2.4.—Limitations of the data

    Sampling error
      Described
      Interpreted
      Calculation method stated
      Presentation of sampling error
        Sampling error tables
        Generalized variance model parameters
        Design effects
      Description of how to calculate sampling error

    Nonsampling error
      Description
      Sources identified

    Nonresponse error
      Definition of total unit response
        Numerator and denominator specified
        Assumptions for the calculation of the response rate stated (RDD, for example)
        Special situations clarified (defining longitudinal response rates, for example)
      Unit response rates (unweighted and weighted) at each level reported
      Overall response rate reported
      Special nonresponse studies cited if low unit response rates occur
      Item response rates summarized

    Coverage error
      Coverage error defined
      Target population defined
      Study population defined
      Sampling frame identified (name and year)
      Coverage rates (population/subpopulation rates) provided

    Measurement error (summarize results and refer to technical reports)
      Measurement error defined and described
      Special studies identified (reinterview studies/record check studies)
      Technical reports/memoranda referenced

    Processing error
      Processing error described
      Data entry (keying/scanning) error rates
      Edit failure rates (summarized)
      Coding error rates

    Comparison to other data sources
      Identification/description of independent sources for comparisons
      Tables/charts/figures comparing estimates
      Limitations of comparison tables described

    References about the data collection program, survey and sample design, error sources, and special studies

    The second type of information that ought to be included in a technical appendix concerns the accuracy of the estimates presented in the report. All statistical agencies address this issue in some fashion; however, the information is presented inconsistently. Basic statistical data to inform users of the quality of the data collection operations are often not reported, and substantial gaps exist in the reporting of quantitative information. Three aspects of the estimates should be addressed in the technical appendix: sampling error, nonsampling error, and comparisons with other data sources.

    Sampling error should be defined and presented, and access to sampling errors of the survey estimates should be provided. Nonsampling error should be described and, since in most cases it is difficult to quantify, statistical indicators that serve as proxies for its actual measurement should be presented. A short discussion of each error type, along with any available data about its extent, should be provided for the data user. Presenting statistical information about the error source, whether direct or proxy, is important. Table 2.4 identifies some important topics that ought to be included in a discussion of sources of error. The list is lengthy, but a detailed treatment of each topic is not what is being advocated. Finally, if comparable data are available, information about the comparisons should be provided and detailed analyses referenced.
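
    As examples of the kinds of quantitative summaries such an appendix might present, a few standard conventions are sketched below; they are illustrations only, and agencies differ in the details (for instance, in how cases of unknown eligibility enter a response rate). An approximate 95 percent confidence interval and the design effect for an estimate \hat{\theta} are commonly reported as

        \hat{\theta} \pm 1.96\,\widehat{\mathrm{se}}(\hat{\theta}),
        \qquad
        \mathrm{deff}(\hat{\theta}) = \frac{\mathrm{Var}_{\text{design}}(\hat{\theta})}{\mathrm{Var}_{\text{srs}}(\hat{\theta})},

    while simple proxy indicators for nonsampling error include

        \text{unit response rate} = \frac{\text{completed interviews}}{\text{eligible sample units}},
        \qquad
        \text{coverage ratio} = \frac{\text{weighted survey estimate of the population}}{\text{independent population control}}.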

    The third piece of information that should be made available in the appendix is the questionnaire itself, the questionnaire items used in the analysis, or at a minimum access to questionnaires, perhaps electronically or upon request. The availability of the questionnaire allows the reader to understand the context of the question asked of the respondent.

    2.4 The Internet

    2.4.1 The Internet Study

    The Internet has become the principal medium for the dissemination of data products for most federal statistical agencies. The third study (Giesbrecht et al. 1999) reviewed guidelines and practices for reporting error sources over the Internet. Some federal agencies have written standards for Web sites, but these generally focus on Web site design, layout, and administrative access. A few agencies, such as the U.S. Bureau of the Census, have begun the process of developing standards for providing information about data quality over the Internet (U.S. Bureau of the Census 1997). This draft report gives details of data quality information that ought to be provided to the user, but does not require or suggest the use of Internet features for making information more accessible. Generally, standards documents related to Internet practices reiterate standards for printed documents (for example, United Nations Economic and Social Council 1998).

    The study reviewed the accessibility of data quality documentation on current Internet sites of 14 federal agencies with survey data collections. Online data documentation was available for most of the sites visited (78 percent). For about one-half the sites, offline documentation was referenced as well. Most agencies seem to upload their printed reports and documentation in the form of simple text or Adobe Acrobat portable document format (PDF) files. In addition, one-half the sites offered technical support online and an additional 29 percent included lists of telephone contacts on their web sites. The study also noted a few best practices found on the visited Web sites, such as the availability of pop-up windows providing definitions of column and row headings in tables, links to send e-mail messages to technical specialists, links to “survey methodology” and “survey design” documentation, explicit directions to users about errors and comparability issues, links from one agency’s home page to another, and common access points to statistical information.


    The study found that current Internet standards for data quality information echo the standards for printed reports and statistical tables. Explicit guidelines on how the advantages of the Internet medium should be employed to make data quality information more accessible do not seem to exist. The development of metadata standards (Dippo 1997) as an integral part of the survey measurement process, however, may facilitate the creative use of the Internet.

    2.4.2 The Internet Study: Discussion and Recommendations

    The use of the Internet for reporting statistical information is growing and evolving so fast that recommendations seem inappropriate, since they become out of date very quickly. The potential of this new medium has not been fully developed, and statistical agencies, while providing much information on the Internet, have only begun to explore that potential. In general, agencies report electronically what is reported on paper, often in the form of PDF files that are no more interactive than the paper report. Thus, the limitations on reporting information about the data collection program and sources of error are the limitations of the printed report itself.

    Large gaps exist between the potential of the medium and its current implementation. The key issue is how to organize and display statistical information and its corresponding documentation in a way that can be understood and easily accessed by the user community. Thus, it is important for statistical agencies to continue developing online design features to improve service to data users, such as frames, audio/video, hyperlinks to relevant documents (such as design, estimation, and technical documentation) or to parts of the same document, pop-up windows (for, among other applications, providing definitions of terms in tables or the sampling error of an estimate in a table), online data analysis tools, user forums, and e-mail technical support links.

    The subcommittee recommends:

    Agencies should systematically and regularly review, improve access to, and update reports and data products available on the Internet, particularly reports about the quality of the data; the amount of information about data quality should be no less than that contained in printed reports; linkage features available on the Internet, such as hypertext links, should be used to improve access to information about the data collection program and its sources of error. Information displayed on the Internet should incorporate good design principles and “best practices” for displaying data and graphics on the web.

    Agency practices will dictate whether the Internet reporting function is decentralized or not. Either way, financial and staff resources should be allocated to developing new applications to improve online access to information about the quality of data in reports and products on the Internet.

    Predicting future development is difficult; however, as printing and traditional dissemination costs continue to increase and Internet access in households continues to grow, it may become increasingly common to find that information is available only through the Internet. Internet dissemination ought to spur the development of new ways to present statistical information and new ways to inform data users about the quality of the statistical information. At this time, based on our review of Internet sites, the paper report model is almost universal: the Internet product is developed after the paper product is completed. This suggests to us that the potential of the Internet to present and display information has not been addressed from the point of view of a dissemination plan based solely on the Internet. Otherwise, the use of video, audio, frames, and hyperlinks would be more obvious. Consequently, we suggest that data, reports, and press releases available only through the Internet be developed to take maximum advantage of the new medium.

    2.5 General Observations

    Specific recommendations about reporting the nature and extent of sources of error in data collection programs are highly dependent on the form of the report. Reporting limitations are apparent in light of the different formats for releasing statistical information. The fundamental issue is the identification of critical information about the data collection program and the principal sources of error likely to affect an analyst’s appreciation of the results. The development of general recommendations reported in this chapter highlights the practical difficulties of reporting about data collection programs and their error sources when reporting formats place limitations on the amount of information reported. The remainder of the report will no longer address the short-format report, but will focus on the analytic or substantive report.

    The analytic report in its many variations provides broad coverage of a substantial number of analyses and topics. The length and format of these reports provide the survey methodologist and survey statistician an opportunity to inform data users about the quality of data. Recommendations concerning analytic reports provide the minimum amount of information the subcommittee thought ought to be available. In the course of specifying minimum requirements, the subcommittee recognized that much more information about the survey program and its sources of error should be available to the data user. Thus, each chapter has two sets of recommendations: 1) the minimum reporting requirements about the survey