Estimation of preliminary unemployment rates by
means of multiple imputation
UN/ECE-Work Session on Data Editing
Vienna, April 2008
Thomas Burg, Statistics Austria
www.statistik.at
S T A T I S T I K A U S T R I AApril 2008 2
Outline
Description of the problem
Methods of Estimation
Preliminary estimation using MI
Results
Timeliness of results
Today, policy makers want to receive results as early as possible
Challenging for official statistics
Final results only after field work is completed
Can I get figures earlier?
Austrian Labor Force Survey
Survey performed quarterly, based on a rotating sample of households. Every quarter one fifth of the sample is exchanged.
Data collection is distributed over the 13 weeks of a quarter, and respondents are questioned about their labour status with reference to the week before.
Most important figures: Unemployment rates
Situation during field work
End of quarter
Estimation on the data available on the first day after the quarter ends.
The Problem
Is it possible to estimate preliminary unemployment rates on the basis of the data already received?
Available data: ~70%
Missing records: ~30%
Unemployment figures
Missing Records
For missing records, not everything is missing:
Rotating sample: basic socio-demographic information (age, sex, etc.)
Information from the sampling frame: assumed household size, residence, ...
Estimation Methods
Weighting on the basis of available data: raking procedure involving marginal distributions of the Austrian population
Imputing labour status for records still to come: an assumption about the set of expected records is necessary
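A raking procedure of the kind named here can be sketched as iterative proportional fitting: record weights are rescaled in turn until the weighted totals match each known population margin. This is only a minimal sketch, not the procedure used in the survey; the function name `rake`, the record layout, the two margins (sex and age group) and all totals are illustrative assumptions.

```python
def rake(records, margins, n_iter=50, tol=1e-9):
    """Iterative proportional fitting of record weights to known margins.

    records: list of dicts with a "weight" and one key per margin variable
    margins: dict mapping variable name -> {category: population total}
    Assumes every category in the margins occurs in at least one record.
    """
    weights = [r["weight"] for r in records]
    for _ in range(n_iter):
        max_change = 0.0
        for var, targets in margins.items():
            # current weighted total per category of this variable
            totals = {cat: 0.0 for cat in targets}
            for r, w in zip(records, weights):
                totals[r[var]] += w
            # rescale weights so the weighted totals hit the targets
            for i, r in enumerate(records):
                factor = targets[r[var]] / totals[r[var]]
                max_change = max(max_change, abs(factor - 1.0))
                weights[i] *= factor
        if max_change < tol:  # all margins already matched
            break
    return weights


# Hypothetical toy data: four respondents, two margins.
sample = [
    {"sex": "m", "age": "young", "weight": 1.0},
    {"sex": "m", "age": "old", "weight": 1.0},
    {"sex": "f", "age": "young", "weight": 1.0},
    {"sex": "f", "age": "old", "weight": 1.0},
]
population_margins = {
    "sex": {"m": 60.0, "f": 40.0},
    "age": {"young": 30.0, "old": 70.0},
}
raked = rake(sample, population_margins)
```

The margins must sum to the same population total for the iteration to settle; in this toy case both sum to 100.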
Imputing labour status
Available data: ~70%
Missing records: ~30%
To impute values on a record I definitely need records on which I can impute!
Information from prior rotations and from the sampling frame
Multiple imputation
Not very common in official statistics:
There, one prefers authentic databases with stored values
Multiple imputation rather focuses on concrete estimation problems
=> Here I have a concrete estimation problem!
Multiple imputation – single imputation step: Analysis (I)
Labour status:
4 possible values (1=’employed’, 2=’unemployed’, 3=’not relevant for employment’, 4=’military person’).
Analysis of distributional differences of labour status between known and expected records, based on poststratification including sex, age groups and citizenship
Multiple imputation – single imputation step: Analysis (II)
Results also depended on the quarter.
Even after incorporating the quarter, the figures were not satisfactory
=> There must be an additional factor
=> Including the weight of a person delivered the desired result.
Multiple imputation – single imputation step
Single imputation for a record not received:
Identify stratum s
Get distribution Y for labour status in stratum s
Correct Y by the estimated distribution differences C
Generate a random number x and assign the imputed value for labour status according to Y+C
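The single imputation step for a record not received might look roughly like the following. It is a sketch under stated assumptions rather than the original implementation: the function name `impute_status`, the tuple used as stratum key, the clipping and renormalising of Y+C, and all probabilities are hypothetical.

```python
import random

# 1=employed, 2=unemployed, 3=not relevant for employment, 4=military person
STATUSES = (1, 2, 3, 4)


def impute_status(stratum, dist_known, correction, rng):
    """Draw one labour status for a record not yet received."""
    y = dist_known[stratum]          # distribution Y in stratum s
    c = correction.get(stratum, {})  # estimated distribution differences C
    # Corrected probabilities Y + C; clipping at zero and renormalising
    # are assumptions of this sketch, not taken from the slides.
    p = [max(y.get(k, 0.0) + c.get(k, 0.0), 0.0) for k in STATUSES]
    total = sum(p)
    p = [v / total for v in p]
    x = rng.random()                 # random number x in [0, 1)
    cum = 0.0
    chosen = None
    for status, prob in zip(STATUSES, p):
        if prob <= 0.0:
            continue                 # never assign a zero-probability status
        chosen = status
        cum += prob
        if x < cum:
            break
    return chosen


# Hypothetical stratum and numbers, for illustration only.
stratum = ("female", "25-49", "Austrian")
dist_known = {stratum: {1: 0.72, 2: 0.04, 3: 0.24}}
correction = {stratum: {1: -0.01, 2: 0.01}}
rng = random.Random(2008)
draws = [impute_status(stratum, dist_known, correction, rng) for _ in range(1000)]
```

The random number x is compared against the cumulative corrected distribution, so each status is assigned with probability proportional to its entry in Y+C.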
Multiple imputation
Multiple imputation smooths out the variability of estimators
Estimation of the differences in distribution between known and expected records of the specified quarter, on the basis of quarters already processed.
Weighting of the dataset based on certain socio-demographic assumptions concerning the records still to come.
Computation of the distribution of labour status of the already known records.
25 single imputations of the item labour status according to the algorithm above, with the unemployment rate calculated from imputed and non-imputed values for every single imputation.
Final estimation of the unemployment rate by the mean value over all imputation runs.
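The final two steps (25 imputation runs, then the mean over all runs) can be sketched as below. Everything here is illustrative: the names `unemployment_rate`, `draw_status` and `mi_estimate`, the toy records, and the probabilities are assumptions. In particular, computing the rate as weighted unemployed over weighted employed plus unemployed, leaving statuses 3 and 4 out of the denominator, is an assumption of this sketch and is not stated in the slides.

```python
import random


def unemployment_rate(records):
    """records: list of (status, weight); returns the rate in percent.

    Assumption: the rate is unemployed / (employed + unemployed), weighted.
    """
    employed = sum(w for s, w in records if s == 1)
    unemployed = sum(w for s, w in records if s == 2)
    return 100.0 * unemployed / (employed + unemployed)


def draw_status(probs, rng):
    """Stand-in for the single imputation step: draw a status from probs."""
    statuses = sorted(probs)
    return rng.choices(statuses, weights=[probs[s] for s in statuses], k=1)[0]


def mi_estimate(known, expected, m=25, seed=0):
    """Impute the expected records m times and average the resulting rates."""
    rng = random.Random(seed)
    rates = []
    for _ in range(m):
        imputed = [(draw_status(probs, rng), w) for probs, w in expected]
        rates.append(unemployment_rate(known + imputed))
    return sum(rates) / m, rates


# Hypothetical toy data: 70 known records, 30 records still to come.
known = [(1, 100.0)] * 45 + [(2, 100.0)] * 3 + [(3, 100.0)] * 22
expected = [({1: 0.63, 2: 0.05, 3: 0.32}, 100.0)] * 30
estimate, rates = mi_estimate(known, expected)
```

Each run yields a slightly different rate because the imputed statuses are random; averaging over the runs is what smooths out that variability.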
Results (I)
Results for the MI-Estimation of preliminary figures compared to the real data
Results (II)
Unemployment rate (%)
Quarter: q1_2005 q2_2005 q3_2005 q4_2005 q1_2006 q2_2006 q3_2006 q4_2006
Real data: 5.2 5.2 5.0 5.1 5.5 4.7 4.3 4.5
Grossing up: 5.2 5.2 5.3 5.0 5.3 4.6 4.3 4.4
MI: 5.3 5.2 5.4 5.1 5.4 4.7 4.2 (only seven values are recoverable)
Comparison of estimation of unemployment rate – MI, Grossing up and Real data
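As a small worked check on the chart values, the deviation of the grossing-up estimates from the real data can be computed per quarter. The variable names are illustrative, and the MI series is left out because only seven of its eight values are recoverable, so its assignment to quarters would be a guess.

```python
quarters = ["q1_2005", "q2_2005", "q3_2005", "q4_2005",
            "q1_2006", "q2_2006", "q3_2006", "q4_2006"]
real = [5.2, 5.2, 5.0, 5.1, 5.5, 4.7, 4.3, 4.5]
grossing_up = [5.2, 5.2, 5.3, 5.0, 5.3, 4.6, 4.3, 4.4]

# absolute deviation per quarter, in percentage points
abs_dev = [round(abs(g - r), 1) for g, r in zip(grossing_up, real)]
mean_abs_dev = sum(abs_dev) / len(abs_dev)
max_abs_dev = max(abs_dev)
worst_quarter = quarters[abs_dev.index(max_abs_dev)]

print(f"mean absolute deviation: {mean_abs_dev:.2f}")
print(f"largest deviation: {max_abs_dev:.1f} in {worst_quarter}")
```

For the grossing-up series this gives a mean absolute deviation of 0.1 percentage points, with the largest gap of 0.3 in q3_2005.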
Conclusions – Critical remarks
Multiple imputation is a possible estimation strategy for preliminary figures
Problematic assumptions concerning expected records
The time series available so far are still very short