
The Application for Statistical Processing at SURS

Andreja Smukavec, SURS

Rudi Seljak, SURS

UNECE Statistical Data Confidentiality Work Session

Helsinki, 5 – 7 October 2015

Old system

• Stove-pipe oriented production
  – Ad hoc solutions were developed for each particular survey
• Survey methodologists' striving for improvement was crucial
  – "Our data are not confidential"
• Process metadata were not organized
  – Difficulties arise when a survey methodologist resigns

Renovation

• An internal project started in 2012
  – IT, General Methodology and subject-matter specialists
  – Build a global solution appropriate for most of the surveys
  – A solution that covers most parts of statistical production:
    • Data validation
    • Data editing and imputation
    • Aggregation and standard error estimation
    • Statistical disclosure control for tabular data
    • Tabulation

Renewed system

• Generalised metadata-driven application
  – Database of process metadata
    • MS Access -> ORACLE
    • For each survey instance
  – General SAS code
  – GUI for process metadata
  – Different microdata environments allowed, just some basic rules for the structure of microdata databases
    • Ad hoc SAS program for preparation of microdata
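The idea of a metadata-driven application can be illustrated with a minimal sketch. It is written in Python rather than SAS, and the rule table, variable names and operators are hypothetical; the slides do not show how the real process metadata are structured. The point is only that one general routine reads the rules as data, so a new survey needs new metadata rather than new code.

```python
import operator

# Comparison operators that a metadata row may refer to by name.
OPS = {
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
    "==": operator.eq, "!=": operator.ne,
}

# Hypothetical process metadata for one survey instance.
# In the real system these rows would live in the metadata database.
RULES = [
    {"rule_id": "V1", "variable": "turnover", "op": ">=", "value": 0},
    {"rule_id": "V2", "variable": "employees", "op": ">", "value": 0},
]


def validate(records, rules):
    """Apply every rule to every record.

    Returns a list of (record index, rule_id) pairs, one per failed check.
    """
    failures = []
    for i, rec in enumerate(records):
        for rule in rules:
            check = OPS[rule["op"]]
            if not check(rec[rule["variable"]], rule["value"]):
                failures.append((i, rule["rule_id"]))
    return failures


if __name__ == "__main__":
    microdata = [
        {"turnover": 100, "employees": 5},
        {"turnover": -1, "employees": 0},
    ]
    print(validate(microdata, RULES))
```

Changing a validation rule then means editing a metadata row, not the general program.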

Schematic presentation of the renewed system

[Diagram. Components shown: different microdata databases; ad hoc programs; general SAS program; database of process metadata; metadata repository; data on tables and variables; application for management; different kinds of output.]

Tabular data protection

1. Calculation of primary sensitivity for seven types of statistics: number, total, share, ratio, average…
   – Threshold, p%-rule, (n,k)-dominance rule
   – "Holding rule" + sampling weights
   – Zeroes unsafe
2. Secondary suppression applied in case of sensitive statistics (number and total)
   – SAS-Tool (Excel file with metadata, Tau Argus, SAS macros)
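The three primary sensitivity rules named above can be sketched for a single table cell as follows. This is an illustration using the usual textbook definitions, not the SURS implementation: the parameter values (`min_count=3`, `p=10`, `n=1`, `k=85`) are assumptions, contributions are assumed non-negative, and the holding rule and sampling weights are left out.

```python
def threshold_sensitive(contributions, min_count=3):
    """Threshold rule: sensitive if fewer than min_count units contribute."""
    return len(contributions) < min_count


def p_percent_sensitive(contributions, p=10.0):
    """p%-rule: sensitive if the cell total minus the two largest
    contributions is less than p% of the largest contribution, i.e. the
    second-largest contributor could estimate the largest within p%."""
    xs = sorted(contributions, reverse=True)
    if not xs:
        return False
    remainder = sum(xs[2:])  # total without the two largest contributions
    return remainder < (p / 100.0) * xs[0]


def dominance_sensitive(contributions, n=1, k=85.0):
    """(n,k)-dominance rule: sensitive if the n largest contributions
    exceed k% of the cell total."""
    xs = sorted(contributions, reverse=True)
    total = sum(xs)
    if total <= 0:
        return False
    return sum(xs[:n]) > (k / 100.0) * total


if __name__ == "__main__":
    cell = [100.0, 50.0, 4.0]  # made-up contributions to one cell
    print({
        "threshold": threshold_sensitive(cell),
        "p_percent": p_percent_sensitive(cell),
        "dominance": dominance_sensitive(cell),
    })
```

A cell would typically be flagged as primary sensitive if any of the rules selected for the survey fires.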

Tabular data protection

• Results for each survey instance saved in the database with statistics (ORACLE)
  – Statuses for lower precision
  – Confidentiality flags for the type of primary and secondary suppression
• 3 types of tabulation (codelists)
  – Excel format (the most user-friendly)
  – plain text format (.tab, .hrc) for Tau-Argus
  – plain text format (.csv) for PX-Edit (SURS's publication tool)
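As an illustration of the plain text codelist output, the sketch below writes a hierarchy in the style of a Tau-Argus `.hrc` file, where each code is prefixed by a leading string repeated once per hierarchy level. The leading string `@`, the helper name `to_hrc` and the example hierarchy (codes 11 under 1, and 21 to 24 under 2) are assumptions made for this example, not taken from the SURS application.

```python
def to_hrc(codes_with_level, lead="@"):
    """Render (code, level) pairs as hierarchy-file text.

    Level 0 codes are written as-is; each deeper level adds one
    occurrence of the leading string in front of the code.
    """
    lines = [lead * level + code for code, level in codes_with_level]
    return "\n".join(lines) + "\n"


# Hypothetical two-level codelist, listed parent first.
HIERARCHY = [
    ("1", 0), ("11", 1),
    ("2", 0), ("21", 1), ("22", 1), ("23", 1), ("24", 1),
]

if __name__ == "__main__":
    print(to_hrc(HIERARCHY), end="")
```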

Tabulation & Tabular Data Protection

[Diagram. Components shown: different microdata databases; ad hoc program; general SAS program (shown twice); database of process metadata (shown twice); calculation of statistics; database with statistics; tabulation; tabular protection; output tables.]

Parameters for SDC in MetaSOP

Tabulation in MetaSOP

Processing in MetaSOP

Example of 3-dimensional table

After aggregation

              Dim_3
CC_SI  Dim_2  TOT           F          O
TOT    TOT    1209943548    1.09E+09   1.23E+08
       1      37700934.42   35625442   2075493
       11     47110694.48   46417660   693034.1
       2      733763444.2   6.62E+08   71456295
       21     517712620.1   4.8E+08    37489998
       22     161044502.5   1.1E+08    50837088
       23     37903335.85   37783060   120275.8
       24     343495995.1   2.86E+08   57438583
11     TOT    59283130.99   56199883   3083248
       1      64428657.15   62453677   1974980
       11     21989840.69   21609892   379948.2
       2      69502173.33   67377101   2125073
       21     13959568.67   13959569   -
       22     338148.7639   338148.8   z
       23     7911125.122   7911125    -
       24     27886089.54   26016025   1870064
12     TOT    215349659.2   2.04E+08   11792968
       1      5993635.356   5993635    -
       11     2035728.954   2035729    -
       2      55635358.28   54430511   1204847
       21     146242216.3   1.43E+08   2783876
       22     4164502.417   3872003    292499.2
       23     38774447.75   34931862   3842585
       24     42332750.72   37447112   4885639
21     TOT    176972728     1.76E+08   1323998
       1      2248602.352   2248602    z
       11     166013.5624   166013.6   z
       2      372993785.9   3.69E+08   4134769
       21     418831917.8   4.08E+08   10337323
       22     29411096.08   29411096   z
       23     56581.5975    56581.6    z
       24     88244091.34   86483431   1760660

After use of SAS-Tool

              Dim_3
CC_SI  Dim_2  TOT           F          O
TOT    TOT    1209943548    1.09E+09   1.23E+08
       1      37700934.42   35625442   2075493
       11     47110694.48   46417660   693034.1
       2      733763444.2   6.62E+08   71456295
       21     517712620.1   4.8E+08    37489998
       22     161044502.5   1.1E+08    50837088
       23     37903335.85   37783060   120275.8
       24     343495995.1   2.86E+08   57438583
11     TOT    59283130.99   56199883   3083248
       1      64428657.15   z          z
       11     21989840.69   z          z
       2      69502173.33   z          z
       21     13959568.67   13959569   -
       22     338148.763    z          z
       23     7911125.122   7911125    -
       24     27886089.54   z          z
12     TOT    215349659.2   2.04E+08   11792968
       1      5993635.356   5993635    -
       11     2035728.954   2035729    -
       2      55635358.28   54430511   1204847
       21     146242216.3   1.43E+08   2783876
       22     4164502.417   z          z
       23     38774447.75   z          z
       24     42332750.72   z          z
21     TOT    176972728     1.76E+08   1323998
       1      z             z          z
       11     z             z          z
       2      z             z          z
       21     418831917.8   4.08E+08   10337323
       22     29411096.08   z          z
       23     z             z          z
       24     88244091.34   z          z
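Why primary suppression alone is not enough can be checked with simple arithmetic on one row of the example. The sketch below assumes that `z` marks a suppressed cell and that TOT is the sum of F and O; the function name is hypothetical. After aggregation, the row CC_SI = 11, Dim_2 = 22 publishes TOT and F but hides O, so O can be recovered by subtraction. After the SAS-Tool run, F is hidden as well and the subtraction no longer works.

```python
SUPPRESSED = "z"


def recoverable_by_subtraction(total, parts):
    """Return the value of the single suppressed part if it can be derived
    as total minus the published parts, otherwise None."""
    if total == SUPPRESSED:
        return None
    hidden = [p for p in parts if p == SUPPRESSED]
    if len(hidden) != 1:
        return None  # nothing hidden, or too many unknowns to solve for
    known = sum(p for p in parts if p != SUPPRESSED)
    return total - known


if __name__ == "__main__":
    # Row CC_SI = 11, Dim_2 = 22 after aggregation: only O is suppressed.
    print(recoverable_by_subtraction(338148.7639, [338148.8, SUPPRESSED]))
    # Same row after the SAS-Tool: F and O are both suppressed.
    print(recoverable_by_subtraction(338148.763, [SUPPRESSED, SUPPRESSED]))
```

In the first call the result is close to zero (F is shown rounded in the table), which already tells an intruder that O is negligible. This only checks a single row; secondary suppression in practice also has to protect against combinations across rows and dimensions.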

New organization

• Old system:
  – Every survey had its own programmer and its own general methodologist
• Renewed system:
  – General methodologist and IT expert ("support team") help the subject-matter specialist to
    • insert and edit the process metadata (except for SDC) into the application
    • run particular parts of the statistical process

Advantages

• The subject-matter personnel's skills improve (higher quality of data)
• The process metadata can be changed easily and the procedure can be repeated in a short time (flexibility)
• The rules for data processing are gathered in one place (transparency)

Drawbacks

• High risk of syntax errors when metadata expressions are inserted
• Subject-matter personnel have to learn some new skills (SAS expressions)
• An error during execution can cause problems if the support team is busy or not available

Challenges for the future

• Introduce the application successfully into production
  – Subject-matter specialists adjusting to the changes
  – Building a qualified support team
• Adding new functionalities
  – Indices
  – Secondary suppression for other types of statistics
  – GUI instead of the Excel file for the SAS-Tool

Thank you for your attention.