1 Editing Administrative Data and Combined Data Sources Introduction

Upload: rosalind-carter

Post on 18-Dec-2015

Page 1: 1 Editing Administrative Data and Combined Data Sources Introduction

1

Editing Administrative Data and Combined Data Sources

Introduction

Sub-topic: Use of administrative data for business surveys and economic data

The papers focus on methods for pre-processing, editing and imputation that yield high-quality administrative data, which can then support survey data and be incorporated into statistical data.

Administrative data is used as a direct statistical source in business surveys and economic censuses by replacing survey data for smaller units, thus reducing costs and response burden.

Administrative data supports processing of survey data through error localization, imputation models, selective editing techniques and setting thresholds.

• Use of administrative data for business surveys and economic data

Relevant papers for sub-topic:

WP2 - Use of Administrative Data in Statistics Canada’s Annual Survey of Manufactures, Canada

WP4 - Use and Editing of Administrative Data in the Business Indicators Unit, New Zealand

WP5 - Detecting Outliers in Price Quotes for the Canadian Consumer Price Index, Canada

WP6 - Imputation of External Trade Data in Denmark, Denmark

WP9 - The Use of Administrative Data in the Annual Survey of Retail, Wholesale and Services, United States

Sub-topic: Combining Data Sources

Combining multiple administrative data sources may replace the need to carry out some surveys. Target variables can be obtained by direct replacement or modeled using administrative data as covariates.

Administrative data supports other statistical processes related to edit and imputation and enhances the dimensions of quality with respect to accuracy, coherence, consistency and completeness.

Relevant papers for sub-topic:

WP7 - Evaluation of Editing and Imputation Supported by Administrative Records, Israel

WP8 - Editing and Imputation for the Creation of a Linked Micro File from Base Registers and Other Administrative Data, Norway

Sub-topic: Other Processes Supporting Edit and Imputation

Statistical data are becoming more dependent on combining administrative data with survey data. There is a need to expand processes beyond conventional and traditional methods of data collection.

High quality, unambiguous metadata about the administrative data must be fully integrated with the survey data and communicated through every step of the processing operation and especially to end users.

Relevant paper for sub-topic:

WP3 - Conceptual Modeling of Administrative Register Information and XML - Taxation Metadata as an Example, Finland

Editing Administrative Data and Combined Data Sources

Enjoy the Presentations!

Editing Administrative Data and Combined Data Sources

Summary of Papers

• Use of administrative data for business surveys and economic data

All the papers focus on the use of administrative data for enhancing and improving economic statistical data:

Reduction of costs and response burden;

Improving edit and imputation processes by using administrative data for error localization and imputation models;

Setting thresholds and benchmarks for selective editing techniques.

Examples were shown on the use of tax data and trade data, with an emphasis on the need for direct pre-processing and edit and imputation procedures to define the timely and accurate target variables that are needed for survey processing.
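As an illustration of how a threshold can drive selective editing, the following is a minimal sketch and not the method of any of the papers. It assumes a hypothetical score equal to the weighted absolute deviation of a reported value from an expected value (for example, taken from tax data or a previous period), relative to the estimated total. The names `Unit`, `selective_editing_scores` and `units_to_review` are invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    unit_id: str
    weight: float    # survey design weight (assumed known)
    reported: float  # value reported by the unit
    expected: float  # anticipated value, e.g. from administrative data


def selective_editing_scores(units):
    """Score each unit by its weighted deviation relative to the estimated total."""
    total = sum(u.weight * u.expected for u in units)
    if total == 0:
        raise ValueError("estimated total must be non-zero")
    return {
        u.unit_id: u.weight * abs(u.reported - u.expected) / abs(total)
        for u in units
    }


def units_to_review(units, threshold):
    """Return ids of units whose score exceeds the threshold, largest score first."""
    scores = selective_editing_scores(units)
    flagged = [uid for uid, score in scores.items() if score > threshold]
    return sorted(flagged, key=lambda uid: scores[uid], reverse=True)
```

Only the flagged units would be sent to manual review; the rest would pass through automatic editing. In practice the threshold trades review cost against the expected effect on the final estimates.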

• Use of administrative data for business surveys and economic data

Other main points:

Quality assessment presented in the papers:

Indicators for evaluating definitions, consistency, correlation and distributions between survey variables and administrative data,

Assessment of edit and imputation procedures on final point estimates and their variance.

The importance of understanding the needs of users to produce “fit for use” data through selective editing techniques compared to “perfect” data through full editing.

Outlier detection as a form of selective editing technique that takes into account the skewed distributions of economic data.
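One common way to handle skewness, shown here only as a sketch and not as the method used in the papers, is to work on the log scale and use robust statistics (median and median absolute deviation) instead of the mean and standard deviation. The function name `log_mad_outliers` and the cutoff of 3.5 are assumptions made for this example.

```python
import math
import statistics


def log_mad_outliers(values, cutoff=3.5):
    """Return indices of values whose robust z-score on the log scale exceeds cutoff."""
    if any(v <= 0 for v in values):
        raise ValueError("values must be strictly positive for the log transform")
    logs = [math.log(v) for v in values]
    med = statistics.median(logs)
    mad = statistics.median([abs(x - med) for x in logs])
    if mad == 0:
        # No spread around the median: no basis for flagging anything.
        return []
    # 1.4826 rescales the MAD to match the standard deviation under normality.
    scale = 1.4826 * mad
    return [i for i, x in enumerate(logs) if abs(x - med) / scale > cutoff]
```

For example, among the price quotes `[100, 120, 90, 110, 105, 95, 10000]` only the last value is flagged, because the median and MAD are barely affected by the extreme quote itself.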

• Use of administrative data for business surveys and economic data

Other main points:

Imputation models in the papers included the use of historical data, ratio imputation and nearest neighbor donor imputation, as well as imputation on both a micro and macro level.
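Two of the methods named above can be sketched in a few lines. This is a simplified illustration under assumed inputs, not the implementation of any paper: `respondents` is a hypothetical list of `(x, y)` pairs, where `x` is an auxiliary value available for every unit (for example, turnover from tax data) and `y` is the target variable observed only for respondents.

```python
def ratio_impute(aux_value, respondents):
    """Ratio imputation: y = x * (sum of y / sum of x) over the respondents."""
    sum_x = sum(x for x, _ in respondents)
    sum_y = sum(y for _, y in respondents)
    if sum_x == 0:
        raise ValueError("auxiliary total must be non-zero")
    return aux_value * sum_y / sum_x


def nearest_neighbor_donor(aux_value, respondents):
    """Donor imputation: copy y from the respondent whose x is closest to aux_value."""
    if not respondents:
        raise ValueError("need at least one donor")
    _, donor_y = min(respondents, key=lambda pair: abs(pair[0] - aux_value))
    return donor_y
```

Ratio imputation produces a smooth, model-based value, whereas donor imputation always returns a value actually observed for another unit. In practice both would typically be applied within imputation classes such as industry or size group.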

One example was the imputation for statistical units reporting bi-monthly or half-yearly on tax data to obtain timely monthly data.
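A simple way to picture this, offered as an assumption rather than the approach taken in the paper, is to split each bi-monthly total using the month-to-month pattern of units that do report monthly. The function name `split_bimonthly` and the input layout are hypothetical.

```python
def split_bimonthly(bimonthly_total, monthly_reporters):
    """Split a two-month total into two monthly values.

    monthly_reporters is a list of (month1, month2) values from units
    that report monthly; their aggregate pattern defines the split.
    """
    month1 = sum(a for a, _ in monthly_reporters)
    month2 = sum(b for _, b in monthly_reporters)
    if month1 + month2 == 0:
        raise ValueError("monthly reporters must have a non-zero total")
    first = bimonthly_total * month1 / (month1 + month2)
    # The second month takes the remainder so the two values add up to the total.
    return first, bimonthly_total - first
```

The sketch assumes that monthly reporters are representative of the units being imputed, which would need checking, for example by restricting the reporters to the same industry.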

All papers emphasize the importance of quality and error checks on the final outputs based on the combined administrative and survey data sources.

• Use of administrative data for business surveys and economic data

Specific topics from the papers:

At Statistics Canada, an Economic Census was developed based on high-quality administrative data, the Business Register and survey data.

At Statistics New Zealand and the Census Bureau a comprehensive program is being carried out to incorporate more administrative data into the survey processes of economic data by replacing survey data of smaller units and moving towards selective editing techniques.

• Use of administrative data for business surveys and economic data

Specific topics from the papers:

Both Statistics New Zealand and Statistics Denmark discuss edit and imputation processes specifically for trade statistics where administrative data is fundamental to the imputation of missing and erroneous data.

Statistics Canada presents different methods for outlier detection as a special case of selective editing techniques.

Both Statistics Canada and the Census Bureau assess the quality of outputs based on administrative data by the impact on the efficiency of the final point estimates.

• Combining Data Sources

The emphasis of the papers is on linking multiple high quality administrative data sources to model and impute target variables for social surveys.

The more sources linked together the higher the risk of errors through conflicting values of variables. Each data source must be assessed for its completeness and accuracy to avoid introducing new errors into the statistical data.

Administrative data improves the quality of statistical data through error localization, imputation models, outlier detection, and selective editing techniques. It also reduces the need for edit and imputation.

Boundaries between edit and imputation are constantly moving due to the use of multiple sources of data. Administrative data support the detection and correction of errors. They also provide a source of data as a reference file for imputation.
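The risk of conflicting values can be made concrete with a small sketch. It assumes, purely for illustration, that each source is a mapping from a common unit identifier to a numeric value of the same variable, with `None` marking a missing value. The function name `link_sources` and the `tolerance` parameter are invented for this example.

```python
def link_sources(sources, tolerance=0.0):
    """Link records by unit id and report units with conflicting values.

    sources maps a source name to a dict of unit id -> value (None if missing).
    Returns (linked, conflicts), where conflicts holds the units whose
    observed values differ by more than the tolerance.
    """
    linked = {}
    for name, records in sources.items():
        for unit_id, value in records.items():
            if value is not None:
                linked.setdefault(unit_id, {})[name] = value
    conflicts = {
        unit_id: values
        for unit_id, values in linked.items()
        if max(values.values()) - min(values.values()) > tolerance
    }
    return linked, conflicts
```

Units that appear in `conflicts` need a rule for choosing between sources, which is where an assessment of each source's completeness and accuracy comes in. Units observed in only one source have no conflict but also no confirmation.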

• Combining Data Sources

Other main points:

Administrative data supports both the error detection and error correction processes:

by supplementing survey data and allowing for better model specification for imputation either by adding covariates or by actually replacing missing or erroneous data;

for use as a reference file to confirm erroneous values of variables and reasons for failed edit checks;

for quality assurance to identify errors resulting from either the data collection phase or the data processing phase.

Prior knowledge and understanding of the data in a multi-source data collection is essential for the selection and integration of the data sources.

• Combining Data Sources

Specific topics from the papers:

At Statistics Norway multiple administrative data sources are linked to obtain employment characteristics. The electronic data capture has a large impact on the development of integrated and coherent statistical systems.

Papers demonstrate methods for identifying units, timeliness of the variables, definitions and classifications in order to merge multiple administrative sources and develop imputation models for target variables not present in the data sources.

CBS Israel has wide experience working with multiple sources of administrative data and using them at both the editing stage and the imputation stage, as well as in support of other statistical processing.

• Other Processes Supporting Edit and Imputation

Many survey processes are based on traditional methods of collecting survey data. With more use of multiple data sources, statistical processing has to encompass all of the statistical data, both survey and administrative data.

The edit and imputation processes and their validation provide important metadata, which later supply users with key explanations of movements in the series.

Other statistical processes supported by the edit and imputation processes are record linkage, coding and the imputation of new variables, as well as quality assessment of the final outputs.

• Other Processes Supporting Edit and Imputation

Other main points:

The need to understand and interpret register data through a uniform reference frame and in a standard format is vital to both producers and users of the statistical data.

Quality dimensions are enhanced by the use of administrative data with respect to coherence, consistency, comparability, completeness and accuracy.

Imputation for new variables is supported by administrative data by providing better models, more covariates and definitions of weighting classes or the direct replacement by administrative data.

• Other Processes Supporting Edit and Imputation

Specific topics from the paper:

Owners of administrative registers often do not hold information about the data in electronic format. The challenge for NSIs is to translate this information about the data into structured metadata.

Statistics Finland uses the Common Structure of Statistical Information (CSOSI) method, and gives an example of the system when applied to personal taxation data and to the administrative information describing it.

When registers are used in the survey process, producers of statistics must ensure that users gain a good understanding of the content to ensure that they make accurate interpretations.

Editing Administrative Data and Combined Data Sources

Points for Discussion

• Use of administrative data for business surveys and economic data

Points for discussion:

How can differences in definitions, classifications and timeliness of variables in administrative data be reconciled with survey data without introducing new bias into the data?

Can we automatically assume that administrative data has higher quality than survey data? How should thresholds be set below which administrative data should not be used at all?

Can administrative data directly replace survey data?

Quality measures in the papers focused on the efficiency of point estimates. Are there other quality measures that measure the impact of using administrative data in survey processes, in particular at a micro level?

• Use of administrative data for business surveys and economic data

Points for discussion:

How can edit rules be managed and updated to take into account dynamic and constantly changing administrative sources?

Selective editing thresholds described in the papers were determined by budget constraints. Can we incorporate historical data, external knowledge and the influence on the final estimates into the setting of thresholds?

Selective editing techniques for administrative data target larger statistical units; however, smaller units are typically the ones used for replacing survey data. Is there a way to efficiently edit smaller units through selective editing techniques?

• Use of administrative data for business surveys and economic data

Points for discussion:

Outlier detection methodology is proposed as a selective editing technique but it does not necessarily target the most influential units. Can the methodologies be combined and how should thresholds be determined?

Can selective editing techniques be carried out for multi-variate editing? How can we measure the impact of multiple influential variables and set thresholds in this framework?

How can better imputation models, which make more use of historical data and multiple data sources, be developed for administrative data as opposed to survey data? For example, can units reporting monthly be used to impute units reporting bi-monthly or half-yearly?

• Combining Data Sources

Points for discussion:

Can we develop a mechanism to influence the methods of data collection from suppliers of administrative data in terms of content and format to ensure more generic pre-processing and edit and imputation processes?

The integration of multiple data sources can result in introducing new errors. How should the quality of the variables be assessed in a multi-source data collection, in particular when having to choose between values for the same variable?

The quality of administrative data can vary widely. When we consider combining data sources, should they all be of a similar quality?

• Other Processes Supporting Edit and Imputation

Points for discussion:

Is there a mechanism by which we can influence the suppliers of administrative registers to collect and maintain metadata in a machine readable format?

How can we best integrate content information about administrative registers into the metadata describing the overall statistical processing operation, in particular with different formats of data?

Editing Administrative Data and Combined Data Sources

Conclusions and Future Research

Underlying theme in all of the papers:

The use of administrative data for survey processing and in particular for supporting efficient edit and imputation processes based on error localization techniques, imputation modeling and selective editing techniques, increases the quality of the statistical data and reduces response burden and costs.

There is a clear need for standardization/harmonization of definitions and concepts to facilitate the use of multiple sources of administrative data within the survey process.

Future Research:

The development of generic modules for editing and imputation of administrative data is particularly challenging since data collection methods and formats vary greatly depending on the source.

More research needs to go into the development of common portals and electronic data collection which will have a direct effect on methods used for editing and imputation.

Better modeling techniques, edit and imputation processes and quality indicators are needed to assess and correct administrative data prior to its use in statistical processing as well as to increase the quality of the final product.

Future Research:

Further development of a time series methodology approach for error localization and imputation of administrative data which usually have rich historical data.

Administrative data is diverse and may include both numerical and categorical data. The edit and imputation modules have to be able to handle both types of data.

Better methods for setting selective editing thresholds for administrative data based on the influence of the variable as well as the development of a multi-variate framework.

Editing Administrative Data and Combined Data Sources

Thank you for your attention!

Natalie and Heather