Generalized Census Processing System at the National Agricultural Statistics Service



Thomas Jacob, Carol House
United States Department of Agriculture, National Agricultural Statistics Service

International Conference on Establishment Surveys III
Montreal • June 18-21, 2007

Presentation Outline

• Census of Agriculture Overview
• 2002 Census Processing System
• Reasons for redesign
• Redesign initiatives
• Dashboard for continuous monitoring
• Can the system be more generalized?
• Acknowledgements
• Questions

Census of Agriculture Overview

• In 1997 the Census of Agriculture was transferred from the U.S. Bureau of the Census to NASS
• 2002: 3 million report forms mailed out
• 400+ system users in Headquarters and Field Offices
• Over 1,500 variables
• Over 110 published tables per state and for the US
• Volume, volume, volume

2002 Census Processing System

• NASS contracted the National Processing Center (NPC) for
  - Mail out, check in, capturing images, and capturing data (OMR + ICR)
• SAS-based system for edit, imputation, and analysis using Sybase and Redbrick databases
  - Edit specifications captured using decision logic tables (DLT)
  - Micro-level and macro-level analysis
  - Automated edit using DLT
  - Tried to implement Fellegi-Holt (FH) methodology and DLT as a two-tier edit
  - Goal of 80% of data not touched by analysts

OMR = Optical Mark Recognition; ICR = Intelligent Character Recognition
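To make the decision logic table idea concrete, here is a minimal Python sketch of a table-driven edit. It is an illustration under stated assumptions, not the NASS specification: the field names (`planted_acres`, `harvested_acres`), the two conditions, and the action names are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, float]
Condition = Callable[[Record], bool]

# Hypothetical conditions, evaluated once per record (columns of the table).
CONDITIONS: List[Condition] = [
    lambda r: r["harvested_acres"] > r["planted_acres"],  # C1: harvested exceeds planted
    lambda r: r["planted_acres"] == 0,                     # C2: nothing reported as planted
]

# Each rule pairs an expected outcome per condition with an action name.
# None means "don't care" for that condition. Rules are checked top to bottom.
RULES: List[Tuple[Tuple[Optional[bool], ...], str]] = [
    ((True, True), "refer_to_analyst"),
    ((True, False), "set_harvested_to_planted"),
    ((False, None), "accept"),
]


def apply_dlt(record: Record) -> str:
    """Return the action of the first rule whose pattern matches the record."""
    outcomes = tuple(cond(record) for cond in CONDITIONS)
    for pattern, action in RULES:
        if all(p is None or p == o for p, o in zip(pattern, outcomes)):
            return action
    # No rule matched: fall back to manual review.
    return "refer_to_analyst"
```

Because the rules live in data rather than in branching code, an edit specification can be changed by editing the table alone, which is the property that makes automated edits from a DLT practical.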

What Worked Well

• Completed Census on schedule
• Questionnaire imaging
• Analysis: macro and micro tools
• Percentage of records touched
• Disclosure routines worked well but independently

Reasons for Redesign

• Increase system speed
  - Edit and imputation was extremely slow (could only edit 75 records at a time)
  - Issues with loads between databases
  - Slow communication lines
  - Database design was inefficient
  - Nearest-neighbor imputation using sequential search
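The cost of nearest-neighbor imputation with a sequential search can be seen in a small Python sketch. This is only an illustration: the Euclidean distance, the function names, and the variable names (`total_acres`, `cattle`, `corn_acres`) are assumptions, not details from the 2002 system.

```python
import math
from typing import Dict, List, Optional, Sequence

Record = Dict[str, Optional[float]]


def nearest_donor(recipient: Record, donors: List[Record],
                  match_vars: Sequence[str]) -> Record:
    """Scan every donor and return the closest one on the matching variables."""
    if not donors:
        raise ValueError("donor pool is empty")
    best, best_dist = donors[0], math.inf
    for donor in donors:  # sequential search: one full pass per recipient
        dist = math.sqrt(sum((recipient[v] - donor[v]) ** 2 for v in match_vars))
        if dist < best_dist:
            best, best_dist = donor, dist
    return best


def impute(recipient: Record, donors: List[Record],
           match_vars: Sequence[str], target: str) -> Record:
    """Copy the target item from the nearest donor into a new record."""
    donor = nearest_donor(recipient, donors, match_vars)
    filled = dict(recipient)
    filled[target] = donor[target]
    return filled
```

Each recipient triggers a full pass over the donor pool, so the work grows with the number of recipients times the number of donors. At census volumes that product is large, which is consistent with the speed concern listed above.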

Reasons for Redesign

• Increase effectiveness and quality of process
  - Minimize data capture errors
  - Time-consuming analysis
  - Inadequate dashboard for identifying influential records
  - Need for true interactive edit (IE)
  - Disclosure routine in old FORTRAN code

[Diagram: census processing system components. Labels: Paper Forms, SCAN, Images, KFI, CATI, Web, Raw Data, Sybase/OLTP, Replication Server, Redbrick/OLAP, Batch Edit, DLT Edit, Edit/Imputation/IE, PRD, Donor Pool, Data Review, Interactive Edit, Analysis, Disclosure/Tabulation]


Redesign Initiatives

• Multiple modes of data collection (CATI, Web, KFI, …), but use the same module for loading data
• Key from Image (KFI) instead of scanning (OCR & OMR)
• Create an indicator denoting that additional information appeared on the report form (respondent notes, remarks, altered stubs)
• Create images for respondents who responded through CATI or Web
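One way to read "same module for loading data" is that every collection mode produces records in a common shape, and a single loader handles them all. The Python sketch below illustrates that reading only; the `RawRecord` fields, the item code `K46`, and the `load_raw` function are hypothetical and do not describe the actual NASS loader.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List

VALID_MODES = {"CATI", "WEB", "KFI"}


@dataclass
class RawRecord:
    respondent_id: str
    mode: str                # collection mode: "CATI", "WEB", or "KFI"
    items: Dict[str, str]    # item code -> value as reported or keyed
    has_notes: bool = False  # indicator: respondent notes, remarks, altered stubs


def load_raw(records: Iterable[RawRecord], store: List[dict]) -> int:
    """Append records from any mode to one raw-data store; return the count loaded."""
    loaded = 0
    for rec in records:
        if rec.mode not in VALID_MODES:
            raise ValueError(f"unknown collection mode: {rec.mode}")
        store.append({
            "respondent_id": rec.respondent_id,
            "mode": rec.mode,
            "items": dict(rec.items),
            "has_notes": rec.has_notes,
        })
        loaded += 1
    return loaded
```

Keeping the mode as a field on the record, rather than as a separate code path, means downstream edit and imputation steps see one raw-data layout regardless of how the response arrived.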


Redesign Initiatives

• Batch edit in Unix, IE on PC (local), using the same code and the same donors
• True interactive edit (IE)
• Dual screens for data review and image comparisons
• Improve donor search strategies: scalable using daemons & SAS/SHARE
• More use of Previously Reported Data (PRD)
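The slide attributes the improved donor search to daemons and SAS/SHARE and gives no algorithmic detail. As a purely illustrative alternative, the Python sketch below shows one generic way to avoid a full sequential scan: group the donor pool by a key and keep each group sorted by a size variable, so a lookup inspects at most two candidates. The `DonorIndex` class and the `state` and `acres` keys are hypothetical.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Optional


class DonorIndex:
    """Donor pool grouped by one key and sorted by one size variable."""

    def __init__(self, donors: List[dict], group_key: str, size_key: str):
        self.size_key = size_key
        self._groups: Dict[object, List[dict]] = defaultdict(list)
        for donor in donors:
            self._groups[donor[group_key]].append(donor)
        self._sizes: Dict[object, List[float]] = {}
        for group, members in self._groups.items():
            members.sort(key=lambda d: d[size_key])
            self._sizes[group] = [d[size_key] for d in members]

    def nearest(self, group: object, size: float) -> Optional[dict]:
        """Return the donor in `group` whose size is closest, or None if no donors."""
        sizes = self._sizes.get(group)
        if not sizes:
            return None
        i = bisect.bisect_left(sizes, size)
        # Only the neighbors on either side of the insertion point can be closest.
        candidates = self._groups[group][max(i - 1, 0): i + 1]
        return min(candidates, key=lambda d: abs(d[self.size_key] - size))
```

The index is built once and then serves many lookups in logarithmic time each. A single sorted variable is a simplification; matching on several variables at once would need a different structure.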


Redesign Initiatives

• Creating new data models for both transactional (OLTP) and analytic (OLAP) databases
• Editing is in the OLTP environment; analysis is in the OLAP environment
• Introduce a replication server: moves and synchronizes data between OLTP and OLAP
• Perform more server-side processing using SAS/CONNECT to reduce interactive response times

OLTP = Online Transaction Processing; OLAP = Online Analytical Processing
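The division of labor between the editing store and the analysis store can be sketched in a few lines of Python. This is a toy model under stated assumptions: the `OltpStore`, `OlapStore`, and `replicate` names are hypothetical, and the change-log approach shown is a generic illustration, not a description of how the replication server product works internally.

```python
from typing import Dict, List, Tuple


class OltpStore:
    """Editing side: every update is also appended to an ordered change log."""

    def __init__(self) -> None:
        self.rows: Dict[str, dict] = {}
        self.log: List[Tuple[int, str, dict]] = []  # (sequence, key, value)

    def update(self, key: str, value: dict) -> None:
        self.rows[key] = value
        self.log.append((len(self.log) + 1, key, dict(value)))


class OlapStore:
    """Analysis side: a copy that remembers the last change it applied."""

    def __init__(self) -> None:
        self.rows: Dict[str, dict] = {}
        self.last_applied = 0


def replicate(source: OltpStore, target: OlapStore) -> int:
    """Apply changes the target has not yet seen; return how many were applied."""
    applied = 0
    # Sequence numbers start at 1, so log[last_applied:] is exactly the unseen tail.
    for seq, key, value in source.log[target.last_applied:]:
        target.rows[key] = dict(value)
        target.last_applied = seq
        applied += 1
    return applied
```

Analysts querying the analytic copy never contend with edits in flight, and the copy catches up incrementally instead of being reloaded in bulk.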

Redesign Initiatives

• Disclosure module converted to SAS/BASE
• The system is more metadata driven
• Provide quality control grids to monitor the editing effects on the data
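A quality control grid of the kind mentioned above might compare each item before and after editing. The Python sketch below is one possible layout, offered as an assumption: the `edit_effect_grid` function, the three statistics per item, and the item names are hypothetical, and the records are assumed to be in the same order in both lists.

```python
from typing import Dict, List, Optional, Sequence

Record = Dict[str, Optional[float]]


def edit_effect_grid(raw: List[Record], edited: List[Record],
                     items: Sequence[str]) -> Dict[str, dict]:
    """Summarize, per item, how editing changed totals and how many records changed."""
    grid = {}
    for item in items:
        # Missing values count as zero in the totals.
        raw_total = sum(r.get(item) or 0 for r in raw)
        edited_total = sum(r.get(item) or 0 for r in edited)
        changed = sum(1 for r, e in zip(raw, edited) if r.get(item) != e.get(item))
        pct = None if raw_total == 0 else 100.0 * (edited_total - raw_total) / raw_total
        grid[item] = {
            "raw_total": raw_total,
            "edited_total": edited_total,
            "records_changed": changed,
            "pct_change": pct,
        }
    return grid
```

An item whose total moves sharply while few records changed points to a small number of influential edits, which is the kind of signal a reviewer would want to see early.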

Dashboard for Continuous Monitoring

• Implementing a Quality Control module to track four major areas in a proactive mode
  - Administrative: Management Information System (MIS) reports to track weekly progress
  - Data: Monitor what the system is doing to the data. Tables, maps, graphs, outlier grids. Independent check of record-level inconsistencies
  - Elapsed Times: Track how long key processes are taking to run
  - System Stability: Track key indicators that can impact performance of databases, UNIX machines, SAS, etc.
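The elapsed-times area of the dashboard can be illustrated with a small Python sketch that records how long a named process takes and flags the slow ones. The `track` and `slow_processes` helpers, the process names, and the threshold are hypothetical; the production system was SAS based.

```python
import time
from contextlib import contextmanager
from typing import Iterator, List, Tuple

ElapsedLog = List[Tuple[str, float]]


@contextmanager
def track(process_name: str, log: ElapsedLog) -> Iterator[None]:
    """Time the enclosed block and append (name, seconds) to the log."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Record the time even if the process raised an error.
        log.append((process_name, time.perf_counter() - start))


def slow_processes(log: ElapsedLog, threshold_seconds: float) -> List[str]:
    """Return the names of processes that ran longer than the threshold."""
    return [name for name, seconds in log if seconds > threshold_seconds]
```

A typical use would wrap each key step, for example `with track("batch_edit", log): ...`, and review `slow_processes(log, threshold)` at the end of each run.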

Can the system be more generalized?

• Wanted to have one system for surveys and censuses
• Metadata can handle both
• The imputation module can handle different types of imputation
• A few surveys are using the system
• Survey analysts are reluctant to use DLT for survey edits
• FH methodology sent back to research for further evaluation

Acknowledgements

We want to thank each and every member of the 2007 Census Team for their tireless efforts to make the redesign initiatives a reality.

Questions?