Editing the Integrated Census in Israel
Prepared by Eva Rotenberg,
Central Bureau of Statistics, Israel (1)
(1) I would like to thank Ari Paltiel who edited this paper
The paper describes the editing and imputation procedures which were used for the demographic variables in the integrated 2008 census in Israel.
Background
The Israeli census is an "Integrated Census"
It combines administrative data for 100% of the population with data obtained from a large sample survey (approximately 17% of the households in the country).
The main administrative data source for the Improved Administrative File is the National Population Register (NPR)
All register records are identified by a unique “personal identity number” (PIN), which can be used for matching records
The NPR contains personal records for all citizens and permanent residents of Israel and includes demographic and residential information
The area survey of the census serves two main purposes:
1. The survey results provide parameters for calculating a weight that represents the probability that a person actually resides in the Statistical Area where he/she is registered in the NPR (a Statistical Area is a group of adjacent buildings/blocks with an average of 5,000 inhabitants)
http://www.cbs.gov.il/mifkad/integ_census.pdf
2. Collecting socio-economic information such as labor force characteristics, household typology, education, housing, ownership of durable goods and disability
Patterns of Demographic Data in the NPR
The demographic variables which were edited and imputed are:
‘year of birth’, ‘sex’, ‘marital status’, ‘year of immigration’, ‘country of birth’, and ‘parent’s country of birth’
Edit checks, which were implemented with Canceis software, include:
standard consistency checks between related variables, such as ages of parents and children, marital status and age, year of immigration and year of birth, etc.
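As an illustration only, the kinds of edit checks listed above can be sketched as simple rules over a person record. This is not Canceis syntax; the field names, the `Person` class, and the numeric thresholds (`MIN_PARENT_AGE_GAP`, `MIN_MARRIAGE_AGE`) are assumptions made for this sketch and do not come from the census specification.

```python
from dataclasses import dataclass
from typing import Optional

CENSUS_YEAR = 2008
MIN_PARENT_AGE_GAP = 12  # hypothetical threshold, not from the source
MIN_MARRIAGE_AGE = 15    # hypothetical threshold, not from the source


@dataclass
class Person:
    year_of_birth: Optional[int]
    marital_status: Optional[str] = None
    year_of_immigration: Optional[int] = None
    parent_year_of_birth: Optional[int] = None


def failed_edits(p: Person) -> list[str]:
    """Return the names of the edit checks that the record fails."""
    failures: list[str] = []
    if p.year_of_birth is None:
        # A missing value is an imputation problem, not an edit failure.
        return failures
    age = CENSUS_YEAR - p.year_of_birth
    # Ages of parents and children: a parent must be sufficiently older.
    if (p.parent_year_of_birth is not None
            and p.year_of_birth - p.parent_year_of_birth < MIN_PARENT_AGE_GAP):
        failures.append("parent_child_age")
    # Marital status and age: very young persons cannot be ever-married.
    if p.marital_status not in (None, "never married") and age < MIN_MARRIAGE_AGE:
        failures.append("marital_status_age")
    # Year of immigration cannot precede year of birth.
    if (p.year_of_immigration is not None
            and p.year_of_immigration < p.year_of_birth):
        failures.append("immigration_before_birth")
    return failures


inconsistent = Person(2000, "married", 1995, 1995)
consistent = Person(1970, "married", 1990, 1940)
print(failed_edits(inconsistent))
print(failed_edits(consistent))
```

Records that fail at least one rule would then be passed on to the imputation stages.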
The missing values of ‘country of birth’ and ‘parent’s country of birth’ are concentrated in older persons' records in the NPR
The missing values of ‘year of immigration’ were dispersed among younger persons born abroad
The choice of imputation methods is dictated by such special population patterns and by relationships between variables such as country of birth, parents' country of birth, year of immigration, and year of birth
Methodology of Editing and Imputation
Cold deck imputation
Deterministic imputation
Statistical imputation
NIM (Nearest-neighbor Imputation Methodology) using Canceis software (Canadian Census Edit & Imputation System).
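The idea behind nearest-neighbor donor imputation can be sketched in a few lines. This is a deliberately simplified illustration and not the algorithm implemented in Canceis: the distance function (a count of mismatching fields), the field names, and the helper names `distance` and `impute_from_nearest` are all assumptions made for this example.

```python
from typing import Optional

Record = dict[str, Optional[object]]


def distance(recipient: Record, donor: Record, match_fields: list[str]) -> int:
    """Count matching fields on which the donor differs from the recipient.

    Fields that are missing in the recipient are ignored.
    """
    return sum(
        1
        for f in match_fields
        if recipient.get(f) is not None and recipient.get(f) != donor.get(f)
    )


def impute_from_nearest(recipient: Record, donors: list[Record],
                        target: str, match_fields: list[str]) -> Record:
    """Copy `target` from the closest donor that has a value for it."""
    candidates = [d for d in donors if d.get(target) is not None]
    if not candidates:
        return dict(recipient)  # no usable donor: leave the record unchanged
    best = min(candidates, key=lambda d: distance(recipient, d, match_fields))
    imputed = dict(recipient)
    imputed[target] = best[target]
    return imputed


donors = [
    {"year_of_birth": 1950, "country_of_birth": "A", "year_of_immigration": 1960},
    {"year_of_birth": 1980, "country_of_birth": "B", "year_of_immigration": 1991},
]
recipient = {"year_of_birth": 1980, "country_of_birth": "B", "year_of_immigration": None}
result = impute_from_nearest(recipient, donors, "year_of_immigration",
                             ["year_of_birth", "country_of_birth"])
print(result["year_of_immigration"])
```

A production system would additionally require that the imputed record passes all edit checks and would change as few fields as possible; neither constraint is modelled here.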
Cold-deck imputation: imputation from external data sources, namely the census area sample survey and previous (traditional) censuses
The imputation process is based on failed edits in the administrative source
When a discrepancy is found in edited items between valid records of the administrative source and valid records of the census area survey, we prefer the administrative source as the more reliable source of data for most variables.
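The source-preference rule above can be sketched as a small function. This is an assumption-laden illustration: `None` stands for a value that is missing or has failed an edit, the function name `reconcile` and the status labels are invented for this example, and the rule is applied uniformly even though the text says it holds for most, not all, variables.

```python
from typing import Optional, Tuple


def reconcile(admin: Optional[str], survey: Optional[str]) -> Tuple[Optional[str], str]:
    """Pick a value from the administrative source or the census survey.

    None stands for a value that is missing or failed an edit check.
    """
    if admin is not None and survey is not None:
        # Both sources are valid: the administrative value is kept.
        status = "agree" if admin == survey else "discrepancy_admin_kept"
        return admin, status
    if admin is not None:
        return admin, "admin_only"
    if survey is not None:
        # Cold-deck step: fill the gap from the external source.
        return survey, "cold_deck_from_survey"
    return None, "unresolved"


print(reconcile("Morocco", "France"))
print(reconcile(None, "France"))
print(reconcile(None, None))
```

Records that remain `unresolved` would move on to the later imputation stages.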
Choice of methodology
The imputation sequence progresses by degree of accuracy from ‘strong’ to ‘weak’ imputation
Once the cold-deck imputation stage is exhausted, we weigh the possibilities of different imputation methods
NIM does not apply to all cases for which imputation is needed, either because there are more certain possibilities or because the data source does not meet the preconditions for hot-deck imputation
For these cases we used other imputation methods: deterministic imputation, statistical imputation
The Process of Editing and Imputation
Strong deterministic imputation
Completion from the Census sample survey
Matching with previous censuses
Weak deterministic imputation
Statistical imputation
Nearest-neighbor Imputation Methodology
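The ordering of stages from ‘strong’ to ‘weak’ can be pictured as a cascade in which each record is handled by the first stage able to supply a value, with a tally kept per stage. The sketch below is only a schematic of that control flow: the stage rules are placeholder lambdas, and the field names (`survey_value`, `prev_census_value`) and the fallback constant are hypothetical, not the rules used in the census.

```python
from collections import Counter
from typing import Callable, Optional

Record = dict
Stage = tuple[str, Callable[[Record], Optional[int]]]


def run_cascade(records: list[Record], stages: list[Stage], target: str) -> Counter:
    """Impute `target` in place, trying stages in order; count uses per stage."""
    tally: Counter = Counter()
    for rec in records:
        if rec.get(target) is not None:
            continue  # value already present, nothing to impute
        for name, rule in stages:
            value = rule(rec)
            if value is not None:
                rec[target] = value
                tally[name] += 1
                break  # stop at the first (strongest) stage that succeeds
    return tally


# Placeholder stages, ordered from more to less reliable (hypothetical rules).
stages: list[Stage] = [
    ("census_survey", lambda r: r.get("survey_value")),
    ("previous_census", lambda r: r.get("prev_census_value")),
    ("fallback", lambda r: 1980),  # stands in for the weaker final stages
]

records = [
    {"year_of_immigration": 1990},
    {"year_of_immigration": None, "survey_value": 1985},
    {"year_of_immigration": None, "prev_census_value": 1972},
    {"year_of_immigration": None},
]
tally = run_cascade(records, stages, "year_of_immigration")
print(dict(tally))
```

A per-stage tally of this kind is what a breakdown of imputed individuals by stage would summarise.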
Results
The relative proportions of imputations at each stage of the process were determined by the data patterns of different demographic variables in the NPR and the ‘tailoring’ of the combination of imputation methods
‘Year of immigration’
Imputation stage                    Individuals   Percent
Strong Deterministic Imputation          180426       8.8
Imputation by Census Survey                5431       0.3
Imputation by Previous Censuses           16569       0.8
Weak Deterministic Imputation              1842       0.1
Statistical Imputation (means)            15957       0.8
NIM                                        6914       0.3
TOTAL                                    227139      11.0
‘Country of birth’
Imputation stage                    Individuals   Percent
Imputation by Census Survey                  82       0.0
Imputation by Previous Censuses             565       0.0
NIM                                        2790       0.0
TOTAL                                      3437       0.0
‘Father’s country of birth’
Imputation stage                    Individuals   Percent
Imputation by Census Survey               53216       1.3
Imputation by Previous Censuses          304882       7.5
Weak Deterministic Imputation              8668       0.2
NIM                                       31581       0.8
TOTAL                                    398347       9.8
Summary
In this paper we have shown how the methodology of the Integrated Census in Israel, which combines an administrative source with a field survey, dictated the choice of imputation methods
The imputation process as a whole and the relative proportions of imputations at each stage of the process were determined by the data patterns of different demographic variables in the NPR and the ‘tailoring’ of the combination of imputation methods
Thank you