weighting and imputation for core social housing statistics julia bowman & niall goulding

Post on 17-Dec-2015

Weighting and Imputation for CORE Social Housing Statistics

Julia Bowman & Niall Goulding

What CORE is

• COntinuous REcording of Social Housing Lettings

• Census – hybrid of interview and administrative data

• Household level data collected

• Private Registered Providers and Local Authorities

• Collected from all housing providers in England since 2004

• Many types of information are collected, not just the number of lettings…

Lettings log

2012/13 Headline stats

Context – 378,700 lettings

Household characteristics – 91% UK nationals, 22% in work, 3% under 18

Most common reason given for why the household left their last settled home - overcrowding

Average weekly rent - £79.58 / £104.52

Length of time vacant – 32 days

Staying within local authority – 90%

378,700 lettings; overcrowding; £79.58 per week; 32 days vacant; 90% remain in LA

Complementary data sets

Local Authority Housing Statistics (LAHS)

English Housing Survey (EHS)

Users

Interests around household characteristics

• And media interest…

QIF bid

• Two problems we sought to resolve…

• Placed bid to the UKSA’s Quality Improvement Fund (QIF)

• Work carried out by the ONS Methodology Advisory Board

Problem 1: LA missing records

• Lettings volume varies greatly by local authority

• Local Authority Housing Statistics (LAHS): nearly complete lettings data at LA level

• CORE: lettings data at household level

Problem 1: LA missing records

• Some LAs do not provide logs for every letting in CORE

• Introduces bias into demographic statistics

• Lettings grossed to LAHS counts on urban/rural classification

• Does not account for demographics of population

Solution 1: Improved Weighting

• Geographic approach maintained

• ONS area classifications (OACs) are used to replace urban/rural classifications.

• Areas grouped on many factors using a cluster methodology

Solution 1: Improved Weighting

• What is our best estimate for lettings per ONS cluster area?

• The higher of the LAHS and CORE figures for each LA

• If neither is available, we use an imputed LAHS figure

• Sum these to get total lettings per ONS cluster area

Solution 1: Improved Weighting

Highest of LAHS, CORE, imputed LAHS for each LA

Sum lettings per ONS cluster area group

Compare to reported CORE figure per area group

Ratio of best estimate to CORE figure = weight
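The four steps above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the production code: the dictionary keys (`cluster`, `core`, `lahs`, `lahs_imputed`) and the example counts are hypothetical, and the sketch assumes an imputed LAHS figure exists whenever an LA has neither a LAHS return nor any CORE logs.

```python
from collections import defaultdict


def cluster_weights(authorities):
    """Return {cluster: weight} from per-LA lettings counts.

    Each authority is a dict with hypothetical keys:
      'cluster'      - ONS area classification group of the LA
      'core'         - number of CORE logs received (0 if none)
      'lahs'         - lettings reported in LAHS, or None if not reported
      'lahs_imputed' - fallback estimate, used only when both are absent
    """
    best_total = defaultdict(float)
    core_total = defaultdict(float)
    for la in authorities:
        core = la["core"]
        lahs = la["lahs"]
        if lahs is None and core == 0:
            # Neither source is available: fall back to the imputed figure.
            estimate = la["lahs_imputed"]
        else:
            # Best estimate is the higher of the two reported figures.
            estimate = max(core, lahs or 0)
        best_total[la["cluster"]] += estimate
        core_total[la["cluster"]] += core
    # Weight = best estimate / CORE count, per cluster.
    # Clusters with no CORE logs at all are skipped (weight undefined).
    return {
        c: best_total[c] / core_total[c]
        for c in best_total
        if core_total[c] > 0
    }


# Made-up counts for four LAs in two cluster groups.
authorities = [
    {"cluster": 1, "core": 100, "lahs": 120, "lahs_imputed": None},
    {"cluster": 1, "core": 80, "lahs": None, "lahs_imputed": None},
    {"cluster": 2, "core": 0, "lahs": None, "lahs_imputed": 50},
    {"cluster": 2, "core": 150, "lahs": 200, "lahs_imputed": None},
]
weights = cluster_weights(authorities)
```

Every CORE log in a cluster group would then carry that group's weight, so weighted totals reproduce the best-estimate lettings count for the group.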

Problem 2: Record level missing data

• Both LAs and PRPs submit logs with missing household characteristics

• Age, sex, ethnicity, nationality and economic status

• This can happen because:

  - the tenant refuses to provide the information

  - some LAs do not interview

  - admin data constraints

  - IT constraints

Solution 2: Imputation

• So how do we account for this?

• Donor imputation: Neighbour Imputation Method

• Canadian Census Edit and Imputation System – CanCEIS (Canadian Census 2001, UK Census 2011)

• Efficient, free license, variety of record editing rules

Solution 2: Imputation

Raw data comes to DCLG (SPSS)

Data reformatted for CanCEIS (ASCII)

CanCEIS finds incomplete and donor records

CanCEIS matches records on:

• Household characteristics that are available (age, sex, ethnicity, nationality, economic status)

• Area classification, provider type (LA/PRP), previous tenure, size of property, asylum seeker, refugee status (and client type)

Record randomly picked from pool of donors

Imputed output data set

Age  Sex  Nationality  Area  Asylum
45   M    UK           6     N
35   M    EEA          2     N
27   F    MISSING      4     N

Age  Sex  Nationality  Area  Asylum
45   1    1            6     0
35   1    2            2     0
27   2    -10          4     0
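The recoding from labelled values to numeric codes shown in the two tables can be sketched as follows. The code maps are read off the example rows only (M=1, F=2, UK=1, EEA=2, N=0, MISSING=-10); the full CORE code lists are not given in the slides, so any label outside these maps deliberately raises an error rather than being guessed.

```python
MISSING = -10  # code used for a missing value in the example rows

# Code maps inferred from the slide example; not the full CORE code lists.
SEX = {"M": 1, "F": 2, "MISSING": MISSING}
NATIONALITY = {"UK": 1, "EEA": 2, "MISSING": MISSING}
ASYLUM = {"N": 0, "MISSING": MISSING}


def encode(row):
    """Turn (age, sex, nationality, area, asylum) labels into numeric codes."""
    age, sex, nationality, area, asylum = row
    return [age, SEX[sex], NATIONALITY[nationality], area, ASYLUM[asylum]]


labelled = [
    (45, "M", "UK", 6, "N"),
    (35, "M", "EEA", 2, "N"),
    (27, "F", "MISSING", 4, "N"),
]
coded = [encode(row) for row in labelled]
```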

Age  Sex  Nationality  Area  Asylum
27   2    -10          4     0      (before imputation)
27   2    2            4     0      (after imputation: -10 replaced by 2)
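The donor step can be illustrated with a much simplified nearest-neighbour sketch. This is not CanCEIS itself: the distance function (a count of mismatched categories plus a scaled age difference) and the field names are assumptions made for the example, and real edit rules are ignored. The pool of donors is every complete record at the minimum distance, and one is picked at random.

```python
import random

MISSING = -10
FIELDS = ("age", "sex", "nationality", "area", "asylum")


def distance(recipient, donor):
    """Assumed distance: category mismatches plus age difference / 100."""
    total = 0.0
    for field in FIELDS:
        if recipient[field] == MISSING:
            continue  # cannot match on a value the recipient lacks
        if field == "age":
            total += abs(recipient[field] - donor[field]) / 100
        else:
            total += recipient[field] != donor[field]
    return total


def impute(records, rng):
    """Fill MISSING fields from a randomly chosen nearest complete record."""
    donors = [r for r in records if MISSING not in r.values()]
    if not donors:
        raise ValueError("no complete records available as donors")
    result = []
    for record in records:
        if MISSING not in record.values():
            result.append(dict(record))
            continue
        scored = [(distance(record, d), d) for d in donors]
        best = min(score for score, _ in scored)
        pool = [d for score, d in scored if score == best]
        donor = rng.choice(pool)  # random pick from the pool of donors
        result.append(
            {f: donor[f] if record[f] == MISSING else record[f] for f in FIELDS}
        )
    return result


# The three coded records from the example tables.
records = [
    dict(zip(FIELDS, (45, 1, 1, 6, 0))),
    dict(zip(FIELDS, (35, 1, 2, 2, 0))),
    dict(zip(FIELDS, (27, 2, MISSING, 4, 0))),
]
imputed = impute(records, random.Random(0))
```

With these three records the 35-year-old is the closer donor on age, so the missing nationality is filled with code 2, matching the example.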

The complete process

Raw data comes to DCLG

Weighting: weights assigned

Imputation: complete records

Final data set

Results

• What happens when we weight and impute?

Original reported data

                 PRP       LA        Total %
UK               113,071   69,256    91.8%
A10              4,258     2,547     3.4%
Other EEA        1,286     936       1.1%
Other            3,537     3,710     3.6%
Missing          4,324     17,131    9.7%
Total lettings   220,056

Weighted and imputed data

                 PRP       LA        Total %
UK               116,944   96,410    91.4%
A10              4,427     3,569     3.4%
Other EEA        1,347     1,369     1.2%
Other            3,758     5,510     4.0%
Total lettings   233,334

Imputed data

                 PRP       LA        Total %
UK               116,944   84,439    91.5%
A10              4,427     3,118     3.4%
Other EEA        1,347     1,204     1.2%
Other            3,758     4,819     3.9%
Total lettings   220,056
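As a check on how the percentage column is built, the short calculation below recomputes it for the original reported data. It suggests, as an inference from the arithmetic rather than something stated in the slides, that the nationality percentages are shares of lettings with a known nationality, while the Missing percentage is a share of all lettings.

```python
# (PRP, LA) counts from the original reported data table.
original = {
    "UK": (113_071, 69_256),
    "A10": (4_258, 2_547),
    "Other EEA": (1_286, 936),
    "Other": (3_537, 3_710),
    "Missing": (4_324, 17_131),
}

total = sum(prp + la for prp, la in original.values())
missing = sum(original["Missing"])
known = total - missing

# Shares of lettings with a known nationality, in percent.
shares = {
    group: round(100 * (prp + la) / known, 1)
    for group, (prp, la) in original.items()
    if group != "Missing"
}
# Share of all lettings with nationality missing, in percent.
missing_share = round(100 * missing / total, 1)
```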

Testing

• But what further tests can we do?

• Remove logs from a complete data set and then test weighting against the complete version

• Deleting data and then imputing it to check error rate

• Finding other unaccounted biases needing weighting

• Any other thoughts?
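The delete-and-impute test could be set up along these lines. It is a sketch under stated assumptions: `modal_impute` is a deliberately crude stand-in imputer used only to make the example run, and the function names, field names and data are all hypothetical. Any imputation routine with the same call shape could be passed in instead.

```python
import random
from collections import Counter

MISSING = -10


def modal_impute(records, field):
    """Stand-in imputer: fill MISSING with the most common known value."""
    known = [r[field] for r in records if r[field] != MISSING]
    mode = Counter(known).most_common(1)[0][0]
    return [
        dict(r, **{field: mode}) if r[field] == MISSING else dict(r)
        for r in records
    ]


def imputation_error_rate(complete, field, fraction, impute_fn, rng):
    """Blank `field` in a random share of complete records, impute it back,
    and return the proportion of blanked values that were imputed wrongly."""
    n_blank = max(1, round(len(complete) * fraction))
    blanked = set(rng.sample(range(len(complete)), n_blank))
    damaged = [
        dict(r, **{field: MISSING}) if i in blanked else dict(r)
        for i, r in enumerate(complete)
    ]
    imputed = impute_fn(damaged, field)
    wrong = sum(imputed[i][field] != complete[i][field] for i in blanked)
    return wrong / n_blank


# Made-up complete data set: 8 records with nationality 1, 2 with nationality 2.
complete = [{"age": 20 + i, "nationality": 1 if i < 8 else 2} for i in range(10)]
rate = imputation_error_rate(
    complete, "nationality", 0.3, modal_impute, random.Random(1)
)
```

Repeating the run with different random seeds would give a distribution of error rates rather than a single figure.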

Future work

• CORE is now National Statistics – improvements pending

• Use areas from 2011 census data

• Affordable rent weighting and imputation

• Improve data quality and volume from LAs – 2013/14 is the first year in which all LAs will participate

• Ongoing disclosure control investigations

• Make CORE data more easily available via Open Data Communities

Thank you. Questions and comments please!
