Sampling Error Estimation – SORS practice
Rudi Seljak, Petra Blažič
Statistical Office of the Republic of Slovenia
Content of the presentation
• Introduction to the problem
• Application for sampling error estimation – basic principles
• Short description of the application
• Discussion
Introduction to the problem
• In sample surveys, the standard error remains the most informative “accuracy” indicator.
• Producers of official statistics are obliged to provide at least some information on the level of accuracy together with the disseminated statistics.
• Two main challenges:
– How to estimate the standard error correctly and in a timely manner for the full volume of disseminated results
– How to present these errors to a wide range of different users in a clear and understandable way
Standard error estimation at SORS
• (Not so distant) past:
– Calculation of the sampling error was quite “survey dependent” → each survey had its own system
– Direct estimates only for the key statistics and the key domains → models for the other statistics and (sub)domains
– Results with a lower degree of precision were marked, and the coefficient of variation was the only criterion used
• Significant revision of the system a few years ago:
– General rules were set up for sampling error estimation
– New rules were set up for dissemination and presentation
– A dedicated SAS application was built that incorporates all of the above rules
Application – general principles
• The application enables calculation of the standard error for seven types of statistics.
• The application can be used for most of the statistics produced at SORS, with a few exceptions:
– EU-SILC (Laeken) indicators (separate SAS macro)
– Indices (separate SAS macro)
• The application performs aggregation, standard error calculation and, where needed, marking of results with special signs.
Application – general principles cont’d
• The application “merges” the processes of aggregation, sampling error estimation and tabulation into one fully automated process.
• It is designed as a metadata-driven (MDD) system → the parameters for a specific survey are provided outside the core computer code
• The application uses the following software:
– The core part of the application (processing) is built in the SAS environment, using PROC SURVEYMEANS
– The metadata are (for now) stored in an Access database
– Outputs are provided in the form of Excel tables
Application – technical description
Hypothetical example
• Stratified one-stage sample: survey on internet usage in enterprises.
• Input variables:
– Emp … Number of employees
– Turn … Turnover
– Wpage … Does the enterprise have a webpage (yes/no)
– Nace2 … Nace 2-digit group
– Nace3 … Nace 3-digit group
– SizeC … Size class
• Output statistics:
– STAT01 … Proportion of enterprises with a webpage
– STAT02 … Total turnover in enterprises with a webpage
– STAT03 … Turnover per employee in enterprises with a webpage
• Dissemination needed for the following domains:
– Nace 2-digit group
– Nace 2-digit group * Size class
• Strata:
– Nace 3-digit group * Size class
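To illustrate what a stratified one-stage design implies for a total such as STAT02, the following Python sketch applies the textbook estimator for simple random sampling without replacement within strata. It is not the SORS application, which is built in SAS around PROC SURVEYMEANS; the function name stratified_total and the input layout are assumptions made for this example only.

```python
import math
from statistics import mean, variance


def stratified_total(strata):
    """Estimate a population total and its standard error.

    strata: list of (N_h, sample_values) pairs, where N_h is the number of
    units in stratum h and sample_values are the observed values.
    Assumes simple random sampling without replacement within each stratum.
    """
    total = 0.0
    var = 0.0
    for N_h, y in strata:
        n_h = len(y)
        total += N_h * mean(y)
        # Fully enumerated strata (n_h == N_h) add no sampling variance.
        # Strata with a single sampled unit are skipped here, which is a
        # simplification: their variance cannot be estimated from one value.
        if 1 < n_h < N_h:
            var += N_h ** 2 * (1 - n_h / N_h) * variance(y) / n_h
    return total, math.sqrt(var)


# Hypothetical data: one sampled stratum and one fully enumerated stratum.
total, se = stratified_total([(10, [1, 2, 3]), (4, [5, 5, 5, 5])])
```

The fully enumerated stratum contributes to the total but not to the standard error, which mirrors the role of the sampling rate 1 in the sample design metadata.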
Metadata tables – description of the statistics
Table   Stat_code  Stat_desc                                            Type  Dummy    Variable  Variable_en  Variable_den
Table1  STAT01     Proportion of enterprises with a webpage             02    Dummy01
Table1  STAT02     Total turnover in enterprises with a webpage         03             Var02
Table1  STAT03     Turnover per employee in enterprises with a webpage  05                       Var02        Var03
Column descriptions:
– Type: type of statistic (02 - Proportion, 03 - Total, 05 - Ratio)
– Dummy: name of the dummy variable (values 0, 1) needed for the calculation of the proportion
– Variable: name of the variable required for the calculation of the total
– Variable_en: name of the variable in the numerator, required for the calculation of the ratio
– Variable_den: name of the variable in the denominator, required for the calculation of the ratio
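The way the Type code selects the calculation can be sketched in Python as follows. Only the type codes (02, 03, 05) and the column names come from the metadata table; the function names and the "weight" field on each unit are hypothetical, and the proportion is returned as a fraction rather than a percentage.

```python
def weighted_sum(units, name):
    """Sum of weight * value over all units for the given variable."""
    return sum(u["weight"] * u[name] for u in units)


def point_estimate(meta, units):
    """Compute the point estimate described by one metadata row."""
    kind = meta["Type"]
    if kind == "02":  # proportion: weighted share of units with dummy = 1
        return weighted_sum(units, meta["Dummy"]) / sum(u["weight"] for u in units)
    if kind == "03":  # total: weighted sum of the variable
        return weighted_sum(units, meta["Variable"])
    if kind == "05":  # ratio: weighted numerator total over denominator total
        return (weighted_sum(units, meta["Variable_en"])
                / weighted_sum(units, meta["Variable_den"]))
    raise ValueError(f"unsupported statistic type: {kind}")


# Metadata rows mirroring the table above (descriptions omitted).
STATS = [
    {"Stat_code": "STAT01", "Type": "02", "Dummy": "Dummy01"},
    {"Stat_code": "STAT02", "Type": "03", "Variable": "Var02"},
    {"Stat_code": "STAT03", "Type": "05", "Variable_en": "Var02", "Variable_den": "Var03"},
]
```

Adding a new statistic then means adding a metadata row, not changing the code, which is the point of the metadata-driven design.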
Metadata tables – derived variables
Table   Var_name  Condition       Value
Table1  Dummy01   If Wpage='yes'  1
Table1  Dummy01   If Wpage='no'   0
Table1  Var02     If Wpage='yes'  Turn
Table1  Var02     If Wpage='no'   0
Table1  Var03     If Wpage='yes'  Emp
Table1  Var03     If Wpage='no'   0
Column descriptions:
– Var_name: name of the derived variable
– Condition: condition that determines the units to which the rule applies
– Value: value assigned to the derived variable
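A minimal Python sketch of how such rules could be applied to one unit is given below. The representation of each condition as a (column, expected value) pair and the function name derive are assumptions for illustration; in the metadata the conditions are stored as text such as If Wpage='yes'.

```python
# Each rule: (derived variable, (condition column, expected value), value).
# A string value names an input variable to copy; a number is used as is.
RULES = [
    ("Dummy01", ("Wpage", "yes"), 1),
    ("Dummy01", ("Wpage", "no"), 0),
    ("Var02", ("Wpage", "yes"), "Turn"),
    ("Var02", ("Wpage", "no"), 0),
    ("Var03", ("Wpage", "yes"), "Emp"),
    ("Var03", ("Wpage", "no"), 0),
]


def derive(unit, rules=RULES):
    """Return a copy of the unit with the derived variables added."""
    out = dict(unit)
    for name, (column, expected), value in rules:
        if unit[column] == expected:
            out[name] = unit[value] if isinstance(value, str) else value
    return out


with_page = derive({"Wpage": "yes", "Turn": 500, "Emp": 20})
without_page = derive({"Wpage": "no", "Turn": 300, "Emp": 12})
```

Setting Var02 and Var03 to 0 for enterprises without a webpage restricts the total and the ratio to enterprises with a webpage while keeping all sampled units in the calculation.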
Metadata tables – domains
Table   Domain_code  Dom_var1  Dom_var2  …  Dom_var10
Table1  Dom1         Nace2
Table1  Dom2         Nace2     SizeC
Dom_var1 … Dom_var10: list of the variables that define the dimensions of the domain.
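The domain definitions can be read as grouping keys. The Python sketch below splits units into domain cells by the listed variables; the dictionary layout and the function name split_by_domain are hypothetical and only illustrate the idea.

```python
from collections import defaultdict

# Domain definitions mirroring the table above.
DOMAINS = {
    "Dom1": ["Nace2"],
    "Dom2": ["Nace2", "SizeC"],
}


def split_by_domain(units, dom_vars):
    """Group units into cells keyed by the values of the domain variables."""
    cells = defaultdict(list)
    for unit in units:
        key = tuple(unit[v] for v in dom_vars)
        cells[key].append(unit)
    return dict(cells)


# Hypothetical units.
units = [
    {"Nace2": "26", "SizeC": 1},
    {"Nace2": "26", "SizeC": 2},
    {"Nace2": "33", "SizeC": 1},
]
by_nace2 = split_by_domain(units, DOMAINS["Dom1"])
by_nace2_size = split_by_domain(units, DOMAINS["Dom2"])
```

Each resulting cell corresponds to one aggregate for which a statistic and its standard error would be produced.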
Metadata tables – sample design information
Information on the sample design (strata, PSU):
Table   Strata  PSU
Table1  Nace3
Table2  SizeC
Information on the sampling rate by strata cells:
Table   Nace3  SizeC  _rate_
Table1  26.2   1      1
Table1  26.2   2      0.3
Table1  26.2   3      0.01
…
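The sampling rates feed into the variance calculation through the finite population correction, and under equal-probability sampling within strata with full response they also imply a design weight of 1/rate. The Python sketch below shows both quantities for the rates listed above; the function names are hypothetical, and treating the weight as the inverse of the rate is an assumption of this example, not a statement about how the application derives its weights.

```python
# Sampling rates by stratum cell (Nace3, SizeC), taken from the table above.
RATES = {
    ("26.2", 1): 1.0,
    ("26.2", 2): 0.3,
    ("26.2", 3): 0.01,
}


def design_weight(nace3, size_class, rates=RATES):
    """Inverse sampling rate: number of population units one sampled unit represents."""
    return 1.0 / rates[(nace3, size_class)]


def finite_population_correction(nace3, size_class, rates=RATES):
    """Factor (1 - rate) that scales the stratum variance; 0 for a full census."""
    return 1.0 - rates[(nace3, size_class)]
```

For example, a unit in cell (26.2, 3) represents about 100 enterprises, while cell (26.2, 1) is fully enumerated and contributes no sampling variance.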
Metadata tables – other information
• Type of criterion used for marking statistics with lower precision
• Limits for marking statistics with lower precision
• Formats of the results in the final tables (decimals, percentages, …)
• Form and content of the output tables
Output – “raw results”
• Each row of the table gives the information on one aggregate.
Dom1   Dom_val1  Dom2   Dom_val2  …  Stat_code  Value       No. of units  SE         CV     Stat_diss
Nace2  26.2                          Stat01     55.423      229           2.394      4.32   55.4
Nace2  26.3                          Stat02     757801.234  102           116625.56  15.39  116625_M
Nace2  33.3      SizeC  3            Stat03     852.273     25            300.256    35.23  N
Column groups:
– Dom1, Dom_val1, Dom2, Dom_val2, …: specification of domains
– Stat_code: identification of statistics
– Value, No. of units, SE, CV: information on estimated statistics
– Stat_diss: value to be disseminated
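The CV column and the signs in Stat_diss can be reproduced with a few lines of Python. The limits of 10 and 30 percent used below are assumptions chosen to be consistent with the three example rows; in the application the actual limits are read from the metadata. The function names are hypothetical.

```python
def cv_percent(value, se):
    """Coefficient of variation in percent: standard error relative to the estimate."""
    return 100.0 * se / abs(value)


def precision_sign(value, se, limit_m=10.0, limit_n=30.0):
    """Return '' (no sign), 'M' (less precise) or 'N' (too imprecise to publish).

    limit_m and limit_n are assumed thresholds, not values given in the source.
    """
    cv = cv_percent(value, se)
    if cv > limit_n:
        return "N"
    if cv > limit_m:
        return "M"
    return ""


# The three example rows from the raw results: (Value, SE).
rows = [(55.423, 2.394), (757801.234, 116625.56), (852.273, 300.256)]
signs = [precision_sign(value, se) for value, se in rows]
```

With these assumed limits the first row is disseminated without a sign, the second is marked M and the third is replaced by N.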
Output – formatted tables
Nace 2-digit groups  Proportion of enterprises with a webpage  Total turnover in enterprises with a webpage  Turnover per employee in enterprises with a webpage
                     32.4                                      124675                                        340
                     56.5                                      85738 M                                       N
                     45.5 M                                    N                                             578
…
Conclusions
• The application is an important contribution to the modernization of statistical processes.
• It can be operated by subject-matter staff alone → significant rationalization of survey execution.
• Planned improvements:
– Development of user interfaces for metadata management
– Transfer of the metadata database into the ORACLE environment
– Extension of the application with the possibility to estimate the sampling error for indices