1
Improving Data Quality
COURSE DESCRIPTION
TITLE: Improving Data Quality in Census/Surveys
DURATION: 3 weeks (two weeks in Washington, DC, and one week in Jeffersonville, IN, at the US Census Bureau National Processing Center (NPC))
PREREQUISITES: None
TRAINEE MATERIALS: Participant Manual
PERFORMANCE OBJECTIVES: Upon completion of this course, you will be able to:
• Plan and budget quality assurance programs
• Apply the principles of quality assurance to tasks, processes, and products
• Identify areas of potential error in mapping, questionnaire development, staff training, data collection, coding, data processing, and data analysis operations
• Develop a quality assurance program
INSTRUCTOR: Rebecca Sauer, International Programs Center, US Census Bureau
COURSE DATES: June 23 – July 11, 2003
Introduction to Data Quality - Course Outline
Washington, DC
1. Managing the process, an introduction to quality assurance
   A. Purpose and activities of a quality assurance program
   B. Principles of Total Quality Management (TQM)
2. Involving data users in the process
   A. Producing relevant data
   B. Creating more data users
   C. Building a reputation
3. Planning a quality assurance program
   A. Work Breakdown Structure (WBS)
   B. Resource requirements
   C. Scheduling
   D. Determining costs for quality assurance programs
4. Tools for budgeting and planning assurance programs
5. Quality control considerations for census/survey operations
   A. Reducing Non-Response
   B. Development of manuals
   C. Mapping operations
   D. Staff training
   E. Field operations
   F. Acceptance Sampling
   G. Questionnaire development
   H. Coding operations
   I. Data processing operations
   J. Data analysis
   K. Data delivery
4
What is Quality Data?
5
Does Data Quality Matter?
Policy and program decisions
Trends
Modification of survey tools
6
Data Quality:
The degree of excellence or accuracy that the factual information collected in a survey or census must have in order to meet the users’ needs for decision-making purposes.
7
Quality Assurance Program
A good quality assurance program is an effective tool used to fine-tune the products and processes of a census or survey to prevent data errors before they happen, saving time and money.
8
Goals of Data Quality:
Relevance
Accuracy
Timeliness
Accessibility
Interpretability
Coherence
9
RELEVANCE
Relevance is the degree to which the data meet the users’ needs. To meet those needs, subject-matter specialists of the statistical organization must meet with the users to define:
Items to be measured
Concepts and definitions
Analytical plans
Tabulation plans
10
ACCURACY
The objective of a survey or census is to obtain
estimates of the true (unknown) value of a
population or economic parameter. For these
estimates to have any worth they must be
close to the true value. Therefore, it is of
utmost importance to establish accuracy as a
primary goal for data production.
11
TIMELINESS
Timeliness refers to the length of time
between data availability and the event
it describes. Timely information is
valuable because it can still be acted
upon. Timeliness is usually a trade-off
with accuracy.
12
ACCESSIBILITY
The accessibility of statistical information refers to the ease with which it can be obtained from the national statistical office. This includes the ease with which the existence of information can be ascertained, as well as the suitability of the form or medium through which the information can be accessed.
The cost of the information may also be an aspect of accessibility for some users.
13
INTERPRETABILITY
The interpretability of statistical information reflects the availability of the supplementary information and metadata necessary to interpret and utilize it appropriately.
This information normally covers the underlying concepts, variables, and classifications used, the methodology of collection, and indications of the accuracy of the statistical information.
14
COHERENCE
The coherence of statistical information reflects the degree to which it can be successfully brought together with other statistical information within a broad analytical framework over time.
The use of standard concepts, classifications, and target populations promotes coherence, as does the use of common methodology across surveys.
15
Responsibility of the Statistical Organization:
Produce timely, coherent data that satisfy users’ needs and are accessible and easily understood, while insisting on the greatest possible accuracy.
Relevance
Timeliness
Accuracy
Coherence
Accessibility
Interpretability
16
Benefits of High Quality Data:
Increased use of data
Increased visibility and prestige for the statistical office
A culture of data use and demand
17
Quality Assurance Program:
Major components:
A Training Program
A Quality Control Program
An Evaluation Program
18
Purposes of Quality Control:
To control the product:
Census products are the results of any work that is produced by one group of persons that will be used by another group of persons later in the census.
To control census products, we need a definition of what is acceptable for each product, decision rules that determine whether a product is accepted or rejected, and appropriate actions to take based on the result of that decision.
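A decision rule of this kind can be sketched in a few lines of Python. This is only an illustration: it assumes the product is a batch of work, that "acceptable" is defined as a maximum error rate in the inspected items, and that each decision maps to one follow-up action. The function name `decide_batch`, the 5% threshold, and the action wording are hypothetical and do not come from this course.

```python
def decide_batch(errors_found: int, items_inspected: int,
                 max_error_rate: float = 0.05) -> str:
    """Apply a decision rule: accept the batch if its error rate is acceptable.

    max_error_rate is the (assumed) definition of acceptable for this product.
    """
    if items_inspected <= 0:
        raise ValueError("items_inspected must be positive")
    error_rate = errors_found / items_inspected
    return "accept" if error_rate <= max_error_rate else "reject"


# Hypothetical actions to take based on the result of the decision.
ACTIONS = {
    "accept": "release the batch to the next operation",
    "reject": "re-verify the whole batch and review the clerk's training",
}

decision = decide_batch(errors_found=8, items_inspected=100)
print(decision, "->", ACTIONS[decision])
```

In practice each census product would have its own definition of acceptable, so the threshold would be set per operation rather than fixed in the code.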
19
Purposes of Quality Control:
To control the process
Control the methods used to monitor the operation
Control the steps that determine when an employee needs to be retrained or released
[Diagram: Manage the Process]
PLAN the products: User meetings; Design and Development
COLLECT the data: Data Collection
DELIVER the products: Post-collection processing; Analysis; Dissemination
Also shown: Documentation; Customer Service; Quality Control
21
Anatomy of a Survey/Census: 5 Phases
Contract Negotiation
Design and Development
Data Collection
Post-Collection Processing
Analysis and Dissemination
Each phase has its own:
• Objective
• Key tasks
• Deliverables
• Documentation
22
Contract Negotiation
Objective: Identify the sponsor’s needs and outline the survey(s) to meet those needs.
Key Tasks:
• Understand the requirements
• Generate the contract
• Negotiate to final decision
• Gain necessary government approval/clearance
Deliverables:
• Approved contract
• Rough schedules and timelines
• Rough questionnaire outline
Documentation:
• Project description
• List of data products expected
• Contract
23
Design and Development
Objective: Develop survey tools to meet the objectives, given time and cost parameters.
Key Tasks:
• Finalize schedule
• Sampling
• Create input files (listing)
• Develop/revise data capture systems
• Develop and test the questionnaire
• Develop training and interviewing materials
• Conduct field pre-test
• Test systems
Deliverables:
• Sample
• Approved data collection/capture modes
• Input files (master list)
• Training/interview materials
• Analysis plan
Documentation:
• Baseline schedule
• Final specifications
• Sampling plan
• Training materials
• Instrument documentation
24
Data Collection
Objective: Gather raw data in a timely and cost-effective manner.
Key Tasks:
• Conduct training
• Field the survey
• Collect the data from the field
• Monitoring and problem solving
Deliverables:
• Status of each case
• Raw data for each case
Documentation:
• Tracking report of field problems
• Progress/status reports
25
Post-Collection Processing
Objective: Generate accurate and organized final microdata.
Key Tasks:
• Data capture
• Data receipt (reformatting)
• Preliminary review
• Clean the data
• Imputation
• Weighting
• Generation of preliminary tables
• Monitoring and problem solving
Deliverables:
• Approved internal data file (microdata)
• Crosstabulations and/or work tables
Documentation:
• Data dictionary
• All processing specifications (coding, editing, imputation, weighting, etc.)
• Problem tracking and progress reports
26
Analysis and Dissemination
Objective: Translate data into useful information that meets objectives, and distribute it to the appropriate audience.
Key Tasks:
• Send data directly to sponsor (if applicable)
• Create public use file
• Table/publication generation
• Compile/produce final documentation
• Evaluation and debrief
Deliverables:
• Tables for publication
• Reports
• Public use file
• Press Releases
Documentation:
• Lessons learned
• Procedural History
• Reports/Publications/Press Releases
• Public use file
• Disclosure request
27
Activities of a Quality Assurance Program
Measurement of Quality Characteristics
Comparison to Pre-determined Standards
Corrective Actions
28
Quality Control Inspections
Types:
• Qualitative or Attribute Inspections
Examination of a characteristic of interest to determine whether a certain property is present or absent.
• Quantitative or Variable Inspections
Measurement of the characteristic of interest on a continuous scale.
Methods:
• Sample Inspections
• 100% Verification
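The two inspection types and the two methods above can be illustrated with a short Python sketch. All names here (`attribute_inspection`, `variable_inspection`, `select_for_inspection`) and the example values are hypothetical; the sketch assumes that an attribute inspection checks a code against a set of valid codes, that a variable inspection compares a measurement with a target and a tolerance, and that a sample inspection draws a simple random sample.

```python
import random


def attribute_inspection(value, valid_codes) -> bool:
    """Qualitative check: the property is either present (True) or absent (False)."""
    return value in valid_codes


def variable_inspection(measured: float, target: float, tolerance: float):
    """Quantitative check: measure the deviation on a continuous scale."""
    deviation = abs(measured - target)
    return deviation, deviation <= tolerance


def select_for_inspection(items, fraction: float = 1.0, seed=None) -> list:
    """Choose items to inspect: fraction=1.0 is 100% verification,
    anything smaller is a sample inspection (simple random sample)."""
    items = list(items)
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if fraction == 1.0:
        return items
    sample_size = min(len(items), max(1, round(len(items) * fraction)))
    return random.Random(seed).sample(items, sample_size)


# Usage with made-up data: inspect 10% of 100 questionnaires.
to_inspect = select_for_inspection(range(100), fraction=0.1, seed=1)
print(len(to_inspect))                         # 10 items selected
print(attribute_inspection("M", {"M", "F"}))   # True: a valid code is present
print(variable_inspection(10.5, 10.0, 1.0))    # (0.5, True)
```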
29
Verification Methods
Dependent Verification:
• Production clerk
• Verifier
Verifier sees production clerk’s work
PROBLEM: In dependent verification, the verifier may agree more often than they should since they see the production clerk’s work.
30
2-Way Independent Verification:
Two-way match
• Production clerk
• Verifier
• Matcher
Agreements between production clerk and verifier are accepted as correct
Disagreements are reviewed by a matcher
PROBLEM: Independent verification is more costly since there are three clerks involved in the process: the production clerk, the verifier, and the matcher. However, independent verification is more accurate since the verifier is not influenced by the production clerk’s work.
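The two-way match can be sketched as a small Python function. This is a sketch under assumptions: the work items are codes that can be compared for equality, and the matcher is modeled as a function that receives both codes and returns the one to keep. The names `two_way_match` and `matcher` are hypothetical.

```python
def two_way_match(clerk_code, verifier_code, matcher):
    """Two-way independent verification for a single item.

    The verifier codes the item without seeing the clerk's work.
    Agreements are accepted as correct; disagreements go to the matcher.
    """
    if clerk_code == verifier_code:
        return clerk_code
    return matcher(clerk_code, verifier_code)


# Usage with a stand-in matcher that, for illustration only, keeps the verifier's code.
def keep_verifier(clerk_code, verifier_code):
    return verifier_code

print(two_way_match("0412", "0412", keep_verifier))  # agreement: "0412"
print(two_way_match("0412", "0421", keep_verifier))  # disagreement: matcher decides "0421"
```

In a real operation the matcher is a third clerk who reviews the source document, which is where the extra cost comes from.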
31
3-Way Independent Verification:
Three-way match
• Production clerk
• Two independent verifiers
• Matcher
Agreements of all three (clerk and verifiers) are accepted as correct
If two out of three agree, an error is charged
If all three disagree, no error is charged and the matcher decides the correct answer
PROBLEM: Three-way verification is more costly, but more accurate.
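The three-way rules can be sketched in Python as follows. The slide states only that an error is charged when two out of three agree; this sketch additionally assumes that the majority answer is accepted and that the error is charged to the one who disagreed. The names `three_way_match` and `matcher` are hypothetical.

```python
from collections import Counter


def three_way_match(clerk, verifier_1, verifier_2, matcher):
    """Return (accepted_code, who_is_charged) for a single item.

    who_is_charged is None when no error is charged.
    """
    codes = {"clerk": clerk, "verifier_1": verifier_1, "verifier_2": verifier_2}
    top_code, top_count = Counter(codes.values()).most_common(1)[0]

    if top_count == 3:
        # All three agree: the code is accepted and no error is charged.
        return top_code, None
    if top_count == 2:
        # Two agree. Assumption: the majority code is accepted and the
        # error is charged to the dissenting person.
        dissenter = next(who for who, code in codes.items() if code != top_code)
        return top_code, dissenter
    # All three disagree: no error is charged and the matcher decides.
    return matcher(clerk, verifier_1, verifier_2), None


# Usage with a stand-in matcher that, for illustration only, keeps the clerk's code.
def keep_clerk(clerk, verifier_1, verifier_2):
    return clerk

print(three_way_match("A", "A", "A", keep_clerk))  # ('A', None)
print(three_way_match("A", "B", "A", keep_clerk))  # ('A', 'verifier_1')
print(three_way_match("A", "B", "C", keep_clerk))  # ('A', None)
```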