
Sampling Error Estimation – SORS practice

Rudi Seljak, Petra Blažič
Statistical Office of the Republic of Slovenia

Content of the presentation

• Introduction to the problem
• Application for sampling error estimation – basic principles
• Short description of the application
• Discussion

Introduction to the problem

• In sampling surveys, the standard error is still the most informative “accuracy” indicator.

• Producers of official statistics are obliged to provide at least some information on the level of accuracy together with the disseminated statistics.

• Two main challenges:
 – How to estimate the standard error correctly and in a timely manner for the whole body of disseminated results
 – How to present these errors clearly and understandably to a wide range of different users

Standard error estimation at SORS

• The (not so distant) past:
 – calculation of the sampling error was highly “survey dependent” → each survey had its own system
 – direct estimates only for the key statistics and key domains → models for the other statistics and (sub)domains
 – results with a lower degree of precision were flagged, with the coefficient of variation as the “exclusive” criterion

• Significant revision of the system a few years ago:
 – general rules were set up for sampling error estimation
 – new rules were set up for dissemination and presentation
 – a special SAS application was built, incorporating all of the above rules

Application – general principles

• The application calculates standard errors for seven types of statistics.

• The application can be used for most statistics produced at SORS, with a few exceptions:
 – EU-SILC (Laeken) indicators (separate SAS macro)
 – Indices (separate SAS macro)

• The application performs aggregation and standard error calculation and, where needed, marks results with special signs.

Application – general principles cont’d

• The application “merges” the processes of aggregation, sampling error estimation and tabulation into one fully automated process.

• It is designed as a metadata-driven (MDD) system → the parameters for a specific survey are provided outside the core program code.

• The application uses the following software:
 – The core part of the application (processing) is built in the SAS environment, using the PROC SURVEYMEANS “facilities”
 – The metadata are (for now) stored in an Access database
 – Outputs are provided as Excel tables

Application – technical description

Hypothetical example

• Stratified one-stage sample: survey on internet usage in enterprises

• Input variables:
 – Emp: number of employees
 – Turn: turnover
 – Wpage: does the enterprise have a webpage (yes/no)
 – Nace2: Nace 2-digit group
 – Nace3: Nace 3-digit group
 – SizeC: size class

• Output statistics:
 – STAT01: proportion of enterprises with a webpage
 – STAT02: total turnover in enterprises with a webpage
 – STAT03: turnover per employee in enterprises with a webpage

• Results to be disseminated for the following domains:
 – Nace 2-digit group
 – Nace 2-digit group * size class

• Strata:
 – Nace 3-digit group * size class
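The slides do not spell out the estimators. Assuming simple random sampling without replacement within each stratum h (the usual reading of a stratified one-stage design), the textbook estimators behind the three statistic types of the example are given below, with f_h = n_h/N_h the sampling rate, s_{yh}^2 the within-stratum sample variance and a ratio linearized in the usual Taylor way:

```latex
\begin{aligned}
\hat{Y} &= \sum_{h=1}^{H} \sum_{i \in s_h} w_h\, y_{hi}, \qquad w_h = \frac{N_h}{n_h} = \frac{1}{f_h},\\
\hat{V}(\hat{Y}) &= \sum_{h=1}^{H} N_h^{2}\,(1 - f_h)\,\frac{s_{yh}^{2}}{n_h},\\
\hat{R} &= \frac{\hat{Y}}{\hat{X}}, \qquad z_{hi} = \frac{y_{hi} - \hat{R}\,x_{hi}}{\hat{X}}, \qquad \hat{V}(\hat{R}) \approx \hat{V}(\hat{Z}),\\
\widehat{\mathrm{CV}}(\hat{\theta}) &= 100 \cdot \frac{\sqrt{\hat{V}(\hat{\theta})}}{\hat{\theta}}.
\end{aligned}
```

Here \hat{V}(\hat{Z}) is the total-variance formula applied to z instead of y. A total uses \hat{Y} directly, a ratio takes y from the numerator and x from the denominator variable, and a proportion is the ratio with y the 0/1 dummy and x_{hi} = 1. A stratum with f_h = 1 (take-all) contributes no variance.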

Metadata tables – description of the statistics

Table  | Stat_code | Stat_desc                                           | Type | Dummy   | Variable | Variable_en | Variable_den
Table1 | STAT01    | Proportion of enterprises with a webpage            | 02   | Dummy01 |          |             |
Table1 | STAT02    | Total turnover in enterprises with a webpage        | 03   |         | Var02    |             |
Table1 | STAT03    | Turnover per employee in enterprises with a webpage | 05   |         |          | Var02       | Var03

• Type: type of statistic (02 – proportion, 03 – total, 05 – ratio)
• Dummy: name of the dummy variable (values 0, 1) needed to calculate the proportion
• Variable: name of the variable required to calculate the total
• Variable_en: name of the numerator variable, required to calculate the ratio
• Variable_den: name of the denominator variable, required to calculate the ratio
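A minimal Python sketch of how one row of this table could drive the choice of estimator, assuming stratified simple random sampling without replacement and Taylor linearization for ratios and proportions. StatDef, strat_total and estimate are hypothetical names, and each unit is assumed to be a dict that already carries its stratum key and sampling rate; according to the slides, the real application relies on PROC SURVEYMEANS for this step.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class StatDef:
    """One row of a (hypothetical) statistics-description metadata table."""
    stat_code: str
    stat_type: str                      # "02" proportion, "03" total, "05" ratio
    dummy: Optional[str] = None         # 0/1 variable for a proportion
    variable: Optional[str] = None      # variable for a total
    variable_en: Optional[str] = None   # numerator variable of a ratio
    variable_den: Optional[str] = None  # denominator variable of a ratio


def strat_total(units, values):
    """Estimated total and its variance under stratified SRS without replacement.

    Each unit is a dict with a 'stratum' key and the stratum sampling rate 'rate'.
    """
    by_stratum = defaultdict(list)
    for unit, y in zip(units, values):
        by_stratum[unit["stratum"]].append((unit["rate"], y))
    total, var = 0.0, 0.0
    for rows in by_stratum.values():
        f = rows[0][0]
        n = len(rows)
        wy = [y / f for _, y in rows]          # weighted values, weight = 1 / rate
        total += sum(wy)
        if n > 1:                               # variance needs at least two units
            mean = sum(wy) / n
            var += n * (1 - f) / (n - 1) * sum((v - mean) ** 2 for v in wy)
    return total, var


def estimate(stat, units):
    """Return (estimate, standard error) for the statistic described by `stat`."""
    if stat.stat_type == "03":
        total, var = strat_total(units, [u[stat.variable] for u in units])
        return total, math.sqrt(var)
    if stat.stat_type == "02":      # proportion = ratio of dummy total to unit count
        num = [u[stat.dummy] for u in units]
        den = [1.0] * len(units)
    elif stat.stat_type == "05":
        num = [u[stat.variable_en] for u in units]
        den = [u[stat.variable_den] for u in units]
    else:
        raise ValueError(f"unsupported statistic type {stat.stat_type!r}")
    y_hat, _ = strat_total(units, num)
    x_hat, _ = strat_total(units, den)
    ratio = y_hat / x_hat
    z = [(y - ratio * x) / x_hat for y, x in zip(num, den)]  # linearized variable
    _, var = strat_total(units, z)
    return ratio, math.sqrt(var)
```

Keeping the estimator choice in a lookup on Type, rather than in survey-specific code, is what lets a new statistic be added as a metadata row instead of as a new program.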

Metadata tables – derived variables

Table  | Var_name | Condition      | Value
Table1 | Dummy01  | If Wpage='yes' | 1
Table1 | Dummy01  | If Wpage='no'  | 0
Table1 | Var02    | If Wpage='yes' | Turn
Table1 | Var02    | If Wpage='no'  | 0
Table1 | Var03    | If Wpage='yes' | Emp
Table1 | Var03    | If Wpage='no'  | 0

• Var_name: name of the derived variable needed
• Condition: condition that determines the units to which the rule is applied
• Value: value of the derived variable
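The rule table can be read as a list of simple equality conditions. The Python sketch below applies the example rules to a single record. It assumes that conditions always have the form If <variable>='<value>' and that a Value is either a number or the name of an input variable; this covers the rows above but may not cover everything the real application accepts. RULES and derive are hypothetical names.

```python
import re

# Rows of the derived-variables table above: (Var_name, Condition, Value).
RULES = [
    ("Dummy01", "If Wpage='yes'", "1"),
    ("Dummy01", "If Wpage='no'", "0"),
    ("Var02", "If Wpage='yes'", "Turn"),
    ("Var02", "If Wpage='no'", "0"),
    ("Var03", "If Wpage='yes'", "Emp"),
    ("Var03", "If Wpage='no'", "0"),
]

# Assumption: only equality conditions of the form  If <var>='<value>'
CONDITION = re.compile(r"If\s+(\w+)\s*=\s*'([^']*)'")


def derive(record, rules=RULES):
    """Return a copy of `record` extended with the derived variables."""
    out = dict(record)
    for var_name, condition, value in rules:
        match = CONDITION.fullmatch(condition.strip())
        if match is None:
            raise ValueError(f"unsupported condition: {condition!r}")
        cond_var, cond_value = match.groups()
        if record.get(cond_var) != cond_value:
            continue  # rule does not apply to this unit
        # A Value is either a numeric constant or the name of an input variable.
        try:
            out[var_name] = float(value)
        except ValueError:
            out[var_name] = record[value]
    return out


print(derive({"Emp": 12, "Turn": 3400.0, "Wpage": "yes"}))
```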

Metadata tables – domains

Table  | Domain_code | Dom_var1 | Dom_var2 | … | Dom_var10
Table1 | Dom1        | Nace2    |          |   |
Table1 | Dom2        | Nace2    | SizeC    |   |

• Dom_var1 … Dom_var10: list of the variables that define the dimensions of the domain
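Each domain definition lists grouping variables, and every distinct combination of their values is one domain cell. A common way to estimate for a domain, sketched below under that assumption, is to set the analysis variable to zero outside the cell while keeping all sample units, so that the variance reflects the full stratified design. DOMAINS, cell_key, domain_cells and domain_variable are hypothetical names.

```python
# Hypothetical in-memory copy of the domains metadata table above.
DOMAINS = {
    "Dom1": ["Nace2"],
    "Dom2": ["Nace2", "SizeC"],
}


def cell_key(unit, dom_vars):
    """Values of the domain variables for one unit, e.g. ("26", 2)."""
    return tuple(unit[v] for v in dom_vars)


def domain_cells(units, dom_vars):
    """All distinct domain cells present in the sample, in sorted order."""
    return sorted({cell_key(u, dom_vars) for u in units})


def domain_variable(units, dom_vars, cell, var):
    """Analysis variable restricted to one cell: its value inside the cell, 0 outside.

    The list keeps one entry per sample unit, so strata without units in the
    cell still enter the variance calculation.
    """
    return [u[var] if cell_key(u, dom_vars) == cell else 0.0 for u in units]
```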

Metadata tables – sample design information

Information on the sample design (strata, PSU):

Table  | Strata | PSU
Table1 | Nace3  |
Table2 | SizeC  |

Information on the sampling rate by stratum cell:

Table  | Nace3 | SizeC | _rate_
Table1 | 26.2  | 1     | 1
Table1 | 26.2  | 2     | 0.3
Table1 | 26.2  | 3     | 0.01
…
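Assuming simple random sampling without replacement within strata, the sampling rate of a stratum cell gives both the design weight (1/rate) and the finite population correction (1 − rate). A rate of 1 marks a take-all cell, whose units have weight 1 and contribute no sampling variance. The lookup below is a hypothetical sketch keyed by (Nace3, SizeC), using only the rows shown above; RATES and design_info are illustrative names.

```python
# Hypothetical in-memory copy of the sampling-rate table; the rows are the
# ones shown above, keyed by the stratum cell (Nace3, SizeC).
RATES = {
    ("26.2", 1): 1.0,
    ("26.2", 2): 0.3,
    ("26.2", 3): 0.01,
}


def design_info(nace3, size_c, rates=RATES):
    """Return (stratum key, design weight, finite population correction).

    Assumes simple random sampling without replacement within each stratum,
    so weight = 1 / rate and fpc = 1 - rate.
    """
    key = (nace3, size_c)
    rate = rates[key]
    return key, 1.0 / rate, 1.0 - rate


for size_c in (1, 2, 3):
    print(design_info("26.2", size_c))
```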

Metadata tables – other information

• Type of criterion used to mark statistics with lower precision

• Limits for marking statistics with lower precision

• Formats of the results in the final tables (decimals, percentages,…)

• Form and content of the output tables
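A sketch of a CV-based marking rule, reading M as “published but marked as less precise” and N as “not disseminated”. Both limits are hypothetical placeholders, chosen only so that the example rows of the raw output (CV 4.32, 15.39, 35.23) receive no sign, M and N respectively; the slides do not give the actual limits, which the application reads from the metadata together with the type of criterion. precision_flag is an illustrative name.

```python
# Hypothetical limits (CV in %); the real values come from the metadata tables.
LIMIT_M = 10.0  # CV at or above this: value published but marked "M"
LIMIT_N = 30.0  # CV at or above this: value not published, shown as "N"


def precision_flag(cv, limit_m=LIMIT_M, limit_n=LIMIT_N):
    """Return "" (no sign), "M" or "N" for a statistic with the given CV."""
    if cv >= limit_n:
        return "N"
    if cv >= limit_m:
        return "M"
    return ""


for cv in (4.32, 15.39, 35.23):
    print(cv, repr(precision_flag(cv)))
```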

Output – “raw results”

• Each row of the table gives the information on one aggregate.

Dom1  | Dom_val1 | Dom2  | Dom_val2 | … | Stat_code | Value      | No. of units | SE        | CV    | Stat_diss
Nace2 | 26.2     |       |          |   | Stat01    | 55.423     | 229          | 2.394     | 4.32  | 55.4
Nace2 | 26.3     |       |          |   | Stat02    | 757801.234 | 102          | 116625.56 | 15.39 | 116625_M
Nace2 | 33.3     | SizeC | 3        |   | Stat03    | 852.273    | 25           | 300.256   | 35.23 | N

• Dom1 … Dom_val2: specification of domains
• Stat_code: identification of statistics
• Value, No. of units, SE, CV: information on estimated statistics
• Stat_diss: value to be disseminated
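The CV column in the rows above is consistent with CV = 100 · SE / Value, expressed in percent. The short Python check below recomputes it from the Value and SE columns; ROWS and cv_percent are hypothetical names.

```python
# (Stat_code, Value, SE, reported CV) taken from the three example rows above.
ROWS = [
    ("Stat01", 55.423, 2.394, 4.32),
    ("Stat02", 757801.234, 116625.56, 15.39),
    ("Stat03", 852.273, 300.256, 35.23),
]


def cv_percent(value, se):
    """Coefficient of variation in percent: 100 * SE / |estimate|."""
    return 100.0 * se / abs(value)


for code, value, se, reported in ROWS:
    print(code, round(cv_percent(value, se), 2), reported)
```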

Output – formatted tables

Rows: Nace 2-digit groups

Proportion of enterprises with a webpage | Total turnover in enterprises with a webpage | Turnover per employee in enterprises with a webpage
32.4                                     | 124675                                        | 340
56.5                                     | 85738 M                                       | N
45.5 M                                   | N                                             | 578
…
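The formatted cells appear to follow a simple pattern: a marked value is shown with its sign after it, and N replaces the value entirely. The hypothetical table_cell function below reproduces that pattern; the number of decimals is an assumed parameter standing in for the formats kept in the metadata.

```python
def table_cell(value, flag, decimals=0):
    """Text shown in a formatted output table.

    flag: "" (no sign), "M" (lower precision) or "N" (not disseminated).
    """
    if flag == "N":
        return "N"  # the value itself is suppressed
    text = f"{value:.{decimals}f}"
    return f"{text} {flag}" if flag else text


print(table_cell(85738.4, "M"), "|", table_cell(45.53, "M", 1), "|", table_cell(578.2, ""))
```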

Conclusions

• The application is an important contribution to the modernization of statistical processes.

• It can be managed entirely by subject-matter personnel → significant rationalization of survey execution.

• Planned improvements:
 – Development of user interfaces for metadata management
 – Transfer of the metadata database to the ORACLE environment
 – Extension of the application's functionality to estimate the sampling error for indices
