Weighting sample surveys with Bascula Harm Jan Boonstra Statistics Netherlands

Post on 30-Jan-2016


Page 1: Weighting sample surveys with Bascula

Weighting sample surveys with Bascula

Harm Jan Boonstra, Statistics Netherlands

Page 2: Weighting sample surveys with Bascula

Outline

• General overview
  – Calibration/weighting
  – Estimation and variance estimation

• Demonstration with example data from the Dutch Labour Force Survey (LFS)

• Other applications at Statistics Netherlands

Page 3: Weighting sample surveys with Bascula

Bascula

• Part of Blaise (current version 4.7), a general system for computer-assisted survey processing developed at Statistics Netherlands

• History: predecessor LINWEIGHT developed by Jelke Bethlehem in the 1980s

Page 4: Weighting sample surveys with Bascula

Main features

• Calibration: computation of weights using auxiliary information encoded in a weighting model

• Estimation of (sub)population totals, means, proportions and ratios

• Variance estimation: Taylor linearisation and balanced repeated replication (BRR) for several sampling designs

Page 5: Weighting sample surveys with Bascula

Weighting

• Reduction of MSE
  – Reduction of (non-response) bias
  – Reduction of sampling variance

• Calibration to auxiliary totals for consistency with known population totals

• A single set of weights
  – Easy tabulation
  – Mutual consistency between estimated tables

Page 6: Weighting sample surveys with Bascula

‘Small sample’ problems

• Full consistency with register data or data from related surveys usually cannot be achieved (overfitting). Not all information can be used at the same time.

• Weighting can be ineffective for (small) domain estimates

For sufficiently large samples weighting is an effective and convenient way to improve estimates!

Page 7: Weighting sample surveys with Bascula

Weighting/calibration methods in Bascula

Based on the general regression (GREG) estimator:

• Poststratification, e.g. Region x AgeClass

• Ratio estimator, e.g. AgeClass x Income

• Linear weighting, e.g. Region + AgeClass x Income

Based on Iterative Proportional Fitting (IPF):

• Multiplicative weighting, e.g. Region + AgeClass
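As an illustration of multiplicative weighting for a model such as Region + AgeClass, the following is a minimal sketch of iterative proportional fitting on unit-level weights. It is not Bascula's implementation; the function name `rake`, the toy sample and the population margins are all hypothetical.

```python
def rake(rows, design_weights, margins, max_iter=200, tol=1e-10):
    """Iterative proportional fitting on unit-level weights.

    rows: list of dicts giving each respondent's category per variable.
    margins: {variable: {category: known population total}}.
    """
    weights = list(design_weights)
    for _ in range(max_iter):
        max_change = 0.0
        for var, targets in margins.items():
            # Current weighted total per category of this variable.
            current = {cat: 0.0 for cat in targets}
            for row, w in zip(rows, weights):
                current[row[var]] += w
            # Rescale every unit so that this margin is reproduced exactly.
            for k, row in enumerate(rows):
                factor = targets[row[var]] / current[row[var]]
                weights[k] *= factor
                max_change = max(max_change, abs(factor - 1.0))
        if max_change < tol:
            break
    return weights


# Hypothetical toy sample and population margins (not from the slides).
sample = [
    {"Region": "North", "AgeClass": "young"},
    {"Region": "North", "AgeClass": "old"},
    {"Region": "South", "AgeClass": "young"},
    {"Region": "South", "AgeClass": "old"},
    {"Region": "South", "AgeClass": "old"},
]
population = {
    "Region": {"North": 40.0, "South": 60.0},
    "AgeClass": {"young": 45.0, "old": 55.0},
}
raked = rake(sample, [20.0] * 5, population)
```

Each pass multiplies the weights by one correction factor per category, which is why the resulting weights stay positive, in contrast to linear weighting.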

Page 8: Weighting sample surveys with Bascula

Further weighting options

• Bounding of weights for linear weighting, Huang and Fuller algorithm

• Consistent linear weighting, e.g. for equal weights within households, Lemaître and Dufour

Page 9: Weighting sample surveys with Bascula

Estimation of totals

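• Based on the calibration weights:

$$\hat{Y}_{cal} = \sum_{i \in s} w_i y_i$$

• General regression estimator:

$$\hat{Y}_{regr} = \hat{Y}_{HT} + \hat{B}^t (X - \hat{X}_{HT}) = \sum_{i \in s} w_i y_i$$

with weights $w_i = d_i g_i$, where

$$g_i = 1 + \frac{x_i^t}{v_i} \Big( \sum_{j \in s} \frac{d_j x_j x_j^t}{v_j} \Big)^{-1} (X - \hat{X}_{HT})$$

(the symbol of the per-unit divisor is not legible in the source; $v_i$ stands in for it here)

• Also ratios of totals, means, proportions, subclasses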
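The regression weights on this slide can be illustrated with a small pure-Python sketch. It assumes the per-unit divisor in the g-weight formula equals 1 for every unit, and it is not Bascula's own code; the helper names `solve` and `greg_weights` and the toy data are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a z = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def greg_weights(x, d, totals):
    """Return w_i = d_i * g_i, g_i = 1 + x_i^t T^{-1} (X - X_HT), T = sum_j d_j x_j x_j^t."""
    p = len(totals)
    x_ht = [sum(di * xi[k] for xi, di in zip(x, d)) for k in range(p)]
    t = [[sum(di * xi[a] * xi[b] for xi, di in zip(x, d)) for b in range(p)]
         for a in range(p)]
    lam = solve(t, [totals[k] - x_ht[k] for k in range(p)])
    return [di * (1.0 + sum(xi[k] * lam[k] for k in range(p)))
            for xi, di in zip(x, d)]


# Hypothetical data: an intercept plus one continuous auxiliary variable.
x = [[1.0, 2.0], [1.0, 3.0], [1.0, 5.0], [1.0, 7.0]]
y = [4.0, 5.0, 9.0, 12.0]
d = [25.0] * 4                 # design weights
totals = [100.0, 450.0]        # known population totals X
w = greg_weights(x, d, totals)
y_cal = sum(wi * yi for wi, yi in zip(w, y))   # calibration estimate of the y total
```

The weights reproduce the known totals exactly, and the weighted sum of `y` coincides with the regression form of the estimator, which is the point of using a single set of weights.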

Page 10: Weighting sample surveys with Bascula

Variance estimation

• Direct/Taylor method (HT and GREG only)

• Balanced Repeated Replication (BRR)

Sampling designs supported:

• Stratified two-stage element or cluster design with simple random sampling without replacement in both stages

• Stratified multistage cluster designs with replacement in the first stage and unequal probabilities

Page 11: Weighting sample surveys with Bascula

Taylor variance

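• Taylor linearisation:

$$\hat{Y}_{regr} = \hat{Y}_{HT} + \hat{B}^t (X - \hat{X}_{HT})$$

$$v(\hat{Y}_{regr}) = \mathrm{var}\Big( \sum_{i \in s} \frac{y_i - \hat{B}^t x_i}{\pi_i} \Big) = \mathrm{var}\Big( \sum_{i \in s} \frac{e_i}{\pi_i} \Big)$$

• Modified variance estimator (default in Bascula):

$$v(\hat{Y}_{regr}) = \mathrm{var}\Big( \sum_{i \in s} w_i e_i \Big) = \mathrm{var}\Big( \sum_{i \in s} \frac{g_i e_i}{\pi_i} \Big)$$

Here $e_i = y_i - \hat{B}^t x_i$ are the regression residuals and $\pi_i = 1/d_i$ are the inclusion probabilities.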
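To make the difference between the two residual-based estimators concrete, here is a minimal sketch for the simplest possible case: simple random sampling without replacement and a single auxiliary variable without intercept. Both simplifications are assumptions made for the example, and the function name `greg_taylor_variance` and the data are hypothetical.

```python
from statistics import variance


def greg_taylor_variance(y, x, x_total, pop_size, modified=True):
    """Linearisation variance of the GREG total under SRSWOR, one auxiliary x.

    modified=True uses the residuals multiplied by g_i, otherwise the plain residuals.
    """
    n = len(y)
    d = pop_size / n                                   # design weight under SRSWOR
    x_ht = d * sum(x)
    t = d * sum(xi * xi for xi in x)
    b = d * sum(xi * yi for xi, yi in zip(x, y)) / t   # regression coefficient
    e = [yi - b * xi for xi, yi in zip(x, y)]          # residuals e_i
    g = [1.0 + xi * (x_total - x_ht) / t for xi in x]  # g-weights
    z = [gi * ei for gi, ei in zip(g, e)] if modified else e
    # Standard SRSWOR variance estimator of a total, applied to z.
    return pop_size ** 2 * (1.0 - n / pop_size) * variance(z) / n


# Hypothetical sample of n = 5 from a population of N = 50 with known X = 130.
y = [3.0, 5.0, 4.0, 8.0, 6.0]
x = [1.0, 2.0, 2.0, 4.0, 3.0]
v_plain = greg_taylor_variance(y, x, 130.0, 50, modified=False)
v_modified = greg_taylor_variance(y, x, 130.0, 50, modified=True)
```

The two versions agree whenever the sample already reproduces the known total, because all g-weights are then equal to 1.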

Page 12: Weighting sample surveys with Bascula

BRR variance

• R balanced half samples (partially balanced if R < #strata)

• Fay factor

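• Grouped BRR (more than 2 PSUs per stratum allowed)
  – Artificial strata
  – Repeated grouping

$$v_{BRR}(\hat{Y}_{regr}) = \frac{1}{R (1 - \rho)^2} \sum_{r=1}^{R} \big( \hat{Y}_{regr,\rho}^{(r)} - \hat{Y}_{regr} \big)^2$$

where $\hat{Y}_{regr,\rho}^{(r)}$ is the estimate from the $r$-th half sample and $\rho$ is the Fay factor.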
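A minimal sketch of BRR with a Fay factor for the textbook case of exactly two PSUs per stratum. It uses the standard form of Fay's method, in which the selected half gets factor 2 - rho, the other half gets rho, and the sum of squared deviations is divided by R(1 - rho)^2. The names `hadamard` and `brr_variance`, the Sylvester construction of the half samples and the toy data are assumptions, not Bascula's internals.

```python
def hadamard(order):
    """Sylvester construction of a Hadamard matrix; order must be a power of two."""
    h = [[1]]
    while len(h) < order:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def brr_variance(psu_totals, estimator, fay=0.0):
    """psu_totals: one pair (t_h1, t_h2) of weighted PSU totals per stratum.

    estimator maps such a list of pairs to an estimate.
    """
    n_strata = len(psu_totals)
    order = 1
    while order <= n_strata:        # R > #strata: the constant first column is skipped
        order *= 2
    h = hadamard(order)
    full = estimator(psu_totals)
    sum_sq = 0.0
    for r in range(order):
        replicate = []
        for s, (a, b) in enumerate(psu_totals):
            # The selected half gets factor 2 - fay, the other half gets fay.
            fa, fb = (2.0 - fay, fay) if h[r][s + 1] > 0 else (fay, 2.0 - fay)
            replicate.append((fa * a, fb * b))
        sum_sq += (estimator(replicate) - full) ** 2
    return sum_sq / (order * (1.0 - fay) ** 2)


def ht_total(pairs):
    """Estimated total: the sum of all weighted PSU totals."""
    return sum(a + b for a, b in pairs)


# Hypothetical weighted PSU totals for three strata.
psu_totals = [(10.0, 14.0), (7.0, 4.0), (20.0, 25.0)]
v_brr = brr_variance(psu_totals, ht_total)
v_fay = brr_variance(psu_totals, ht_total, fay=0.5)
```

For a linear estimator such as this total, a fully balanced set of half samples reproduces the usual two-PSU-per-stratum variance estimator, the sum of squared within-stratum differences, whatever the Fay factor. For a GREG estimate the calibration would have to be repeated within each replicate, which this sketch does not do.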

Page 13: Weighting sample surveys with Bascula

Input

• Sample data file: ASCII (fixed column or separated), Blaise, or other OLE DB compatible format

• Blaise meta information; the Blaise Textfile Wizard helps in making a data model for ASCII files

• Tables of population totals

• Selection of weighting scheme and other parameters that influence the weighting

• Some additional input required for estimation and variance estimation: target tables and sampling design details

Page 14: Weighting sample surveys with Bascula

Data integrity checks

• Consistency of set of population tables

• Sample counts per cell do not exceed population counts

• Enough sample observations for each cell in weighting model

• Inclusion weights/sampling fractions compatible with sampling design specified
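Checks of this kind are straightforward to express in code. The sketch below covers the first three items for a single classification variable; the function names, the messages and the example tables are hypothetical and do not reflect how Bascula reports problems.

```python
from collections import Counter


def check_population_tables(tables, tol=1e-9):
    """All population tables should imply the same overall population size."""
    grand_totals = {name: sum(cells.values()) for name, cells in tables.items()}
    reference = next(iter(grand_totals.values()))
    return [name for name, total in grand_totals.items()
            if abs(total - reference) > tol]


def check_sample_cells(sample_cells, population, min_obs=1):
    """Compare sample counts per cell with the population counts."""
    counts = Counter(sample_cells)
    problems = []
    for cell in counts:
        if cell not in population:
            problems.append((cell, "not in population table"))
    for cell, pop_count in population.items():
        n = counts[cell]          # a Counter returns 0 for unseen cells
        if n > pop_count:
            problems.append((cell, "sample count exceeds population count"))
        if n < min_obs:
            problems.append((cell, "too few sample observations"))
    return problems


# Hypothetical inputs with deliberate errors.
tables = {
    "Region": {"North": 40, "South": 60},
    "AgeClass": {"young": 45, "old": 56},     # sums to 101, not 100
}
inconsistent = check_population_tables(tables)
problems = check_sample_cells(
    ["North", "North", "South"],
    {"North": 1, "South": 60, "East": 10},
)
```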

Page 15: Weighting sample surveys with Bascula

Output

• Set of final and correction weights (written to the sample file and to a separate weights file)

• Optionally: fitted values $\hat{y}_i = \hat{B}^t x_i$

• Tables of estimates (including estimates of standard errors) in export file; format compatible with population data file

Page 16: Weighting sample surveys with Bascula

Example: Dutch Labour Force Survey

• Rotating panel design with five waves; CAPI in first wave, CATI in subsequent waves

• CATI data are first calibrated to the initial CAPI panel on the most important target variable (employment in several categories) to reduce panel attrition bias

• The weighted CATI data are then combined with the CAPI data and jointly calibrated to the population totals of the weighting scheme

Region44 x Age4 x Sex2 + Age21 x Sex2 + Age5 x MarStat2 + Sex2 x Age5 x Ethnicity8 + CWI3

Page 17: Weighting sample surveys with Bascula

Dacseis software evaluation report on Bascula:

‘Bascula is a part of Blaise (an integrated system for survey processing), and it might not be reasonable to purchase Blaise only for the use of Bascula. When having Blaise available, Bascula provides an advanced weighting tool (linear or multiplicative weighting) with abilities for proper variance estimation based on Taylor’s linearisation. When the basic order of the weight and estimate calculations of Bascula is understood, the operations can be carried out quite easily.’

Page 18: Weighting sample surveys with Bascula

Usage

• menu-based interactive version

• from Blaise’s script language Manipula

• from most modern programming languages, e.g. VB, VBA, Delphi, C++, C#

• from other software able to act as automation client, e.g. S-Plus

Page 19: Weighting sample surveys with Bascula

Automation

Bascula component (dll) can be used to automate weighting/estimation processes

For recurring weighting/estimation processes, batch processing, integration into production systems

Build custom tools utilizing Bascula’s functionality

Page 20: Weighting sample surveys with Bascula

Tools that use Bascula component

• Tool that integrates imputation/outlier detection and handling/weighting for the Production Statistics

• Tool for analysing results of experiments

• Tool for repeated weighting

• Simple simulation tools
  – Variance estimation (Dacseis)
  – GREG as input for small area estimators

Page 21: Weighting sample surveys with Bascula

Repeated weighting

• Practical sequential approach to make tables of estimates consistent between data sources

• Two-step procedure
  1. Start with GREG estimates
  2. Adjust these estimates such that they are consistent with register totals (not used in the weighting scheme of GREG) and possibly with previously estimated marginal tables from a combination of surveys.

Page 22: Weighting sample surveys with Bascula

Software tool

Source: Systemdocumentation VRD, V.Snijders

[Diagram of the software tool. Labelled components: Micro database, Rectangular datasets, Bascula, "Estimation 15", VRD, Metadatabase and StatBase. Labelled flows: "Dataset, weighting model, population totals", "Estimates" and "Export".]

Page 23: Weighting sample surveys with Bascula

Use of Bascula at Statistics Netherlands

• Labour Force Survey

• Repeated weighting for the Social Statistical Database

• Survey on Household Incomes

• Budget Survey

• Survey on Living Conditions

• Production Statistics

and more

Page 24: Weighting sample surveys with Bascula

Survey on Household Incomes

• Calibration on both person totals and household totals, both obtained from municipal registrations

• Consistent linear weighting: Region29 x Age8 x Sex2 + Region29 x HouseholdType9 x OneHH

OneHH is an auxiliary variable that sums to one over each household
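One common way to obtain equal weights within households, in the spirit of the Lemaître and Dufour approach, is to replace each person's auxiliary vector by its household mean before computing linear weights. The sketch below does this for a two-category person variable. It is an illustration under that assumption, not Bascula's code, and the function name `consistent_weights` and the data are hypothetical.

```python
def consistent_weights(households, d, totals):
    """Linear weights that are equal for all persons in a household.

    households: one list per household of person-level x-vectors of length 2.
    d: one design weight per household. Returns one weight per household.
    """
    sizes = [len(h) for h in households]
    # Household mean of the person-level auxiliary vectors.
    means = [[sum(p[k] for p in h) / len(h) for k in range(2)] for h in households]
    # Each of the n_h persons contributes d_h * mean_h * mean_h^t.
    t = [[sum(dh * n * m[a] * m[b] for dh, n, m in zip(d, sizes, means))
          for b in range(2)] for a in range(2)]
    x_ht = [sum(dh * n * m[k] for dh, n, m in zip(d, sizes, means)) for k in range(2)]
    r = [totals[k] - x_ht[k] for k in range(2)]
    # Solve the 2 x 2 system t * lam = r with Cramer's rule.
    det = t[0][0] * t[1][1] - t[0][1] * t[1][0]
    lam = [(r[0] * t[1][1] - t[0][1] * r[1]) / det,
           (t[0][0] * r[1] - t[1][0] * r[0]) / det]
    return [dh * (1.0 + m[0] * lam[0] + m[1] * lam[1]) for dh, m in zip(d, means)]


# Hypothetical sample: x = [male, female] indicator per person, four households.
households = [
    [[1, 0], [0, 1]],
    [[1, 0]],
    [[0, 1], [0, 1], [1, 0]],
    [[0, 1]],
]
weights = consistent_weights(households, [10.0] * 4, totals=[33.0, 42.0])
males = sum(w * sum(p[0] for p in h) for w, h in zip(weights, households))
females = sum(w * sum(p[1] for p in h) for w, h in zip(weights, households))
```

Because every person in a household carries the same averaged auxiliary vector, the g-weight is the same for all household members, while the person-level totals are still reproduced.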

Page 25: Weighting sample surveys with Bascula

Production Statistics

• Continuous auxiliary variables available from Tax Office; categorical variables from Business Register

• Weighting scheme: Activity x SizeClass x Source x Tax + Activity x SizeClass x Source

• Variable Source indicates whether tax info can be matched to surveyed businesses

Page 26: Weighting sample surveys with Bascula

Finally,

• Further development has not had high priority in the last three years, but that may change

• Possible extensions: variance structure, Newton-Raphson for exponential method, two-phase regression estimator, synthetic estimation for subpopulations, small area estimation?