Philip Clarke and Denise Silva

Development of Small Area Estimation at ONS


Outline

1. Small Area Estimation Problem

2. History and current provision

3. Developments in progress

4. Wider research

5. Consultancy service


1. Small Area Estimation Problem

• “Official statistics provide an indispensable element in the information system of a democratic society” (Fundamental Principles of Official Statistics, UNSD)

• Sample surveys are used to provide estimates of target parameters at population (or national) level and also for subpopulations or domains of study

• However, implementation in a small area context is challenging


Small Area Estimation Problem

• In small areas/domains, sample sizes are usually not large enough to provide reliable estimates using classical design-based methods.

• The small area estimation problem refers to SMALL SAMPLE SIZES (or no sample at all) in the domain or area of interest.


2. History

• Small area estimation in the UK began as a research project in the late 1990s.

• In response to calls for locally focussed information in many different areas:
– Environmental
– Business
– Social, e.g. health, housing, deprivation, unemployment.

• Also calls for more general domain estimation:
– e.g. cross classifications by age/sex, occupation.

• Initial experimental studies on mental health estimation for DoH.


Developing alternative methodology

• Purpose:
– To enable production of reliable estimates of characteristics of interest for small areas or domains based on very small or no sample.
– To assess the quality (precision) of estimates.

• Several years of research and development (since 1995):
– Partnership work with universities and Statistics Finland
– The EURAREA project: research programme funded by Eurostat to ‘enhance techniques to meet European needs’ (2001-2004)


Basis of Approach: Relax the Survey Restriction

• ‘Borrow strength’ by removing the isolation of depending solely on the survey and solely on respondents in a given area.
– Widen the class of respondents for a given area by pooling together similar areas.
– Widen the class of respondents by taking past period respondents into account.
– Take advantage of other related data sources which are not sample survey based. These are known as auxiliary data, e.g. administrative data or census data which are available for all areas/domains.


Model based estimation

• All approaches detailed are based on an implicit or explicit model.

• Using auxiliary data together with survey data from all areas is the approach currently adopted in the UK.
– Borrows strength nationally.
– Uses an explicit statistical model to represent the relationship between the survey variable of interest and auxiliary data:
  the dependent variable is the survey variable of interest;
  the independent variables are certain auxiliary data variables, known as covariates;
  the model is fitted using sample data and assumed to apply generally;
  the model is then used to obtain area/domain estimates.


Outline of a model structure

• Suppose the variable of interest, Y, in an area j is linearly related to a single covariate X

• A possible model structure is given by:

$$\bar{Y}_j = \alpha + \beta X_j$$

where $\bar{Y}_j$ is the mean of Y in area j

• This is a deterministic structure, so we need to add some random variability


• Obtain

$$\bar{Y}_j = \alpha + \beta X_j + u_j, \qquad u_j \sim N(0, \sigma_u^2)$$

• $u_j$ represents random area differences from the deterministic value.

• $\sigma_u^2$ represents variability between areas.


Model fitting

• Fit the model using direct survey estimates $y_j$ for each area.

• This introduces additional sampling variability:

$$y_j = \alpha + \beta X_j + u_j + e_j$$

• Unit level sampling variability

$$e_{ij} \sim N(0, \sigma_e^2)$$

giving rise to additional area level sampling variability

$$e_j \sim N(0, \sigma_e^2 / n_j)$$
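To make the two sources of variability concrete, the following is a minimal simulation sketch of the area-level model described on this slide (a linear term in the covariate, a random area effect with variance sigma_u^2, and an area-level sampling error with variance sigma_e^2 / n_j). The function names, the reading of n_j as the sample size in area j, and all parameter values a caller might pass are illustrative assumptions, not part of the ONS methodology.

```python
import random


def simulate_direct_estimates(x, n, alpha, beta, sigma_u, sigma_e, seed=0):
    """Simulate one direct survey estimate per area under the area-level model.

    x: covariate value per area; n: assumed sample size per area.
    """
    rng = random.Random(seed)
    estimates = []
    for x_j, n_j in zip(x, n):
        u_j = rng.gauss(0.0, sigma_u)  # random area difference, variance sigma_u^2
        e_j = rng.gauss(0.0, sigma_e / n_j ** 0.5)  # sampling error, variance sigma_e^2 / n_j
        estimates.append(alpha + beta * x_j + u_j + e_j)
    return estimates


def area_sampling_variance(sigma_e, n_j):
    """Variance of the area-level sampling error for a sample of n_j units."""
    return sigma_e ** 2 / n_j
```

The second function shows why small areas are the problem case: the sampling variance shrinks only as the area sample size grows, so areas with few sampled units have noisy direct estimates.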


Estimating from the model

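• Once the model is fitted, estimate for area j by using parameter estimates:

$$\hat{y}_j = \hat{\alpha} + \hat{\beta} X_j + \hat{u}_j$$

• Estimate of mean squared error given by

$$\widehat{MSE}(\hat{y}_j) = \hat{\sigma}_u^2 + \widehat{Var}(\hat{\alpha}) + X_j^2\,\widehat{Var}(\hat{\beta}) + 2 X_j\,\widehat{Cov}(\hat{\alpha}, \hat{\beta})$$

• Modelling success measured by obtaining estimates with high precision based on low mean squared errors.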
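As a worked illustration, the sketch below evaluates the area estimate and the estimated mean squared error from a set of fitted quantities. It assumes the MSE is the sum of the estimated between-area variance, the variance of the intercept estimate, X_j squared times the variance of the slope estimate, and twice X_j times their covariance, which is how the slide's formula reads after reconstruction. Function and argument names are hypothetical.

```python
def area_estimate(alpha_hat, beta_hat, x_j, u_hat_j=0.0):
    """Model-based estimate for area j: intercept + slope * covariate + area term."""
    return alpha_hat + beta_hat * x_j + u_hat_j


def mse_estimate(sigma_u2_hat, var_alpha, var_beta, cov_alpha_beta, x_j):
    """Estimated MSE: between-area variance plus the variance of the fitted line at x_j."""
    line_variance = var_alpha + x_j ** 2 * var_beta + 2 * x_j * cov_alpha_beta
    return sigma_u2_hat + line_variance
```

For example, with an estimated between-area variance of 0.25, intercept variance 0.04, slope variance 0.01, covariance -0.005 and X_j = 2, the estimated MSE is 0.25 + 0.04 + 0.04 - 0.02 = 0.31.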


Current provision

• SAEP – a generic methodology for application to variables from household based surveys.
– Mean household income based on the Family Resources Survey, published as Experimental Statistics for wards in 1998/99 and 2001/02 and for middle layer super output areas in 2004/05.

• Specialised methodology for labour market estimation of unemployment from the Labour Force Survey.
– Unemployment levels and rates routinely published quarterly as National Statistics for Local Authority Districts in Great Britain.


SAEP methodology and income estimation

SAEP methodology is:
• derived from the outlined model-based approach,

BUT it:
• is based on a unit (household)/area multilevel model;
• borrows strength across areas using multivariate area level auxiliary data (covariates);
• can model a transformation of the variable of interest if required;
• is adapted for estimating at ward/middle layer super output area (MSOA) level from customary ONS clustered design household sample surveys.


Application to income estimation: response variable

• Income value for each household sampled in the Family Resources Survey (FRS).
– ~ 3,300 MSOAs in England and Wales with sample in 2004/05,
– ~ 21,500 total responding households.

• But not a simple random sample.
– Clustered design with primary sampling units as postcode sectors,
– ~ 1,500 sampled postcode sectors.


Coping with design clustering

• Samples are random samples of postcode sectors;
– So random terms are around postcode sectors, indexed by j

• Estimation is required for geographically distinct wards or middle layer super output areas;
– So covariates are for these areas, indexed by d
– For estimation, covariates must be known for all areas, not just sampled areas.


SAEP model and estimator structure for income estimation

• Multilevel structure gives rise to a unit level random term $e_{ij}$ replacing the area level sampling variability term

• Logarithmic transformation of income taken because of positive skewness of income distribution

• Model:

$$\log(y_{id}) = X_d^T \beta + u_j + e_{ij}$$


SAEP model fitting procedure

• Create a dataset containing:
– Variable of interest from individual household responses to survey.
– Values of a large number of administrative and census variables for the particular household's area of residence which we believe could impact on the variable of interest, e.g. census variables, DWP social benefit claimant rates, council tax band proportions.


SAEP model fitting procedure (cont.)

• Starting with a null model, fit covariates in a stepwise manner in order of significance using specialised multilevel software, e.g. MLwiN or SAS PROC MIXED.

• In this way select a set of significant covariates and fit an accepted model.

• Use diagnostic techniques to check the model against its assumptions, e.g. randomness of residuals, unbiasedness of predictions.


Estimator and mean squared error

• Estimator on log income scale: a synthetic estimator is used, omitting the random area terms:

$$\widehat{\log y_d} = X_d^T \hat{\beta}$$

• Mean squared error

$$X_d^T\,\widehat{Var}(\hat{\beta})\,X_d + \hat{\sigma}_u^2$$


Converting to raw income scale

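• Need to make allowance for

mean(log) ≠ log(mean)

• Area estimate

$$\hat{y}_d = \exp\left(X_d^T \hat{\beta} + \frac{\hat{\sigma}_u^2 + \hat{\sigma}_e^2}{2}\right)$$

• Confidence interval

$$\exp\left(X_d^T \hat{\beta} + \frac{\hat{\sigma}_u^2 + \hat{\sigma}_e^2}{2} \pm 1.96\left[X_d^T\,\widehat{Var}(\hat{\beta})\,X_d + \hat{\sigma}_u^2\right]^{1/2}\right)$$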
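The back-transformation can be sketched as follows. This is an illustrative reading of the slide, assuming the area estimate is the exponential of the log-scale prediction plus half the sum of the two estimated variance components, and that the interval applies plus or minus 1.96 times the square root of the log-scale mean squared error inside the exponential. The function names and the scalar argument `xb` (standing in for the fitted linear predictor for area d) are hypothetical.

```python
import math


def income_estimate(xb, sigma_u2, sigma_e2):
    """Raw-scale estimate: exp(log-scale prediction + half the total variance)."""
    return math.exp(xb + (sigma_u2 + sigma_e2) / 2)


def income_interval(xb, sigma_u2, sigma_e2, mse_log, z=1.96):
    """Approximate 95% interval, built on the log scale and then exponentiated."""
    centre = xb + (sigma_u2 + sigma_e2) / 2
    half_width = z * math.sqrt(mse_log)
    return math.exp(centre - half_width), math.exp(centre + half_width)
```

Because the interval is symmetric on the log scale, it is asymmetric around the estimate on the raw income scale, with the upper limit further from the estimate than the lower one.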


Actual model for ward estimation of income in 2004/05

Fitted model for $\widehat{\log(\mathrm{income}_d)}$. Coefficient magnitudes are as shown on the slide; the signs between terms and the further terms marked "..." are not legible in this transcript.

Intercept: 6.01
$x_{phrpman,d}$: 0.76
$x_{lnphrpecac,d}$: 0.18
$x_{lnphhtype1,d}$: 0.13
$x_{engegh,d}$: 0.58
$x_{pcgeo,d}$: 0.72
...

phrpman = proportion of household reference persons aged 16-74 who are in professional or managerial occupations.
lnphrpecac = logit of proportion of household reference persons aged 16-74 who are economically active.
lnphhtype1 = logit of proportion of one person households.
engegh = proportion of council tax band G&H dwellings for England.
pcgeo = proportion of people aged 60 and over claiming pension credit (guarantee element only).


Income estimation outputs

• Estimates obtained of sufficient precision for publication and acceptable to user community.

• Accredited as Experimental Statistics

• Placed on Neighbourhood Statistics website together with user guides and technical documentation.


Estimation of unemployment at local authority level

BACKGROUND

• Unemployment is a key indicator and is used for policy making and resource allocation

• Official UK measure of unemployment follows the International Labour Organisation (ILO) definition

• ILO unemployment is estimated via the Labour Force Survey (national level)

• Small (local) sample sizes in the LFS for some areas


Features of Labour Force Survey

• A rotating panel survey
– Roughly 60,000 households surveyed each quarter
– Each household remains in sample for 5 quarters (waves 1 to 5) then drops out

• Waves 1 and 5 respondents for the last four quarters are used to obtain an annual ‘local labour force survey’ dataset of about 90,000 independent households.

• Unclustered survey design, giving a sample in each LAD.


Features of unemployment modelling

• Unclustered LFS design means
– direct estimates available for each LAD
– availability of estimated random area terms in LAD estimation

• However
– low precision of direct survey estimates due to small sample sizes
– need for better precision model-based estimates

• Availability of a highly correlated covariate
– number of claimants of unemployment benefit/job seekers allowance
– eliminates need for model fitting to a range of possible covariates on each occasion.


The small area estimation model

A LOGISTIC multilevel model by local authority (d) and six age/sex classes (i). It models the probability $p_{id}$ that an individual in age/sex class i of local authority d is unemployed.

Response variable: proportion of unemployed individuals in LFS in each age/sex class of each local authority (logit transformed).

Covariate data
• Benefit data: the logit of the claimant proportion of job seekers allowance in each age/sex class within each local authority and also over all age/sex classes combined;
• The age/sex class: male/female for age groups (16 to 24; 25 to 49; 50 and over)
• Geographical region: the 12 government office regions (GOR)
• ONS area classification: 7 categories under the National Statistics Area Classification for Local Authorities


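• The model used to link $p_{id}$ with the auxiliary data is a Binomial linear mixed model with a logistic link function

$$\mathrm{logit}(p_{id}) = \ln\left(\frac{p_{id}}{1 - p_{id}}\right) = \mathbf{x}_{id}^T \boldsymbol{\beta} + u_d, \qquad u_d \sim N(0, \sigma_u^2)$$

where $u_d$ is the area random effect.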


Estimator from model

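• The model-based estimator of proportion unemployed in each age/sex group of each LAD is then given, after fitting the model, by:

$$\hat{p}_{id} = \mathrm{antilogit}(\mathbf{x}_{id}^T \hat{\boldsymbol{\beta}} + \hat{u}_d) = \frac{\exp(\mathbf{x}_{id}^T \hat{\boldsymbol{\beta}} + \hat{u}_d)}{1 + \exp(\mathbf{x}_{id}^T \hat{\boldsymbol{\beta}} + \hat{u}_d)}$$

• Note the use of the term $\hat{u}_d$ in the estimator as it is now available for each LAD.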
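The estimator on this slide is the inverse of the logit link applied to the fitted linear predictor plus the estimated area effect. A minimal sketch, with hypothetical function names and plain Python lists standing in for the covariate and coefficient vectors:

```python
import math


def antilogit(eta):
    """Inverse logit, exp(eta) / (1 + exp(eta)), written to avoid overflow."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def logit(p):
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def unemployment_proportion(x, beta_hat, u_hat_d):
    """Estimated proportion: antilogit of linear predictor plus area effect."""
    eta = sum(x_k * b_k for x_k, b_k in zip(x, beta_hat)) + u_hat_d
    return antilogit(eta)
```

Setting `u_hat_d` to zero gives a purely synthetic prediction; a nonzero estimated area effect shifts the proportion for that LAD up or down on the logit scale.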


Model-based estimate for Unemployment

• Model has estimated a proportion for each age/sex group

• This is converted into an estimate of unemployment level at each LAD by:
– multiplying each proportion estimate by the LFS estimate of population unsampled
– adding those sampled and found unemployed
– summing the age/sex group estimates

Final estimator for unemployment level for area d is:

$$\hat{Y}_d = \sum_{i=1}^{6} \hat{Y}_{id} = \sum_{i=1}^{6}\left[y_{sid} + (\hat{N}_{id} - n_{id})\,\hat{p}_{id}\right]$$

where the sum is over the 6 age-sex groups.
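The three steps above can be sketched in a few lines. The dictionary keys (`y_s` for the sampled unemployed count, `N_hat` for the estimated population, `n` for the sample size, `p_hat` for the model-based proportion) are hypothetical names chosen for this illustration, and the figures in the usage example are invented.

```python
def unemployment_level(groups):
    """Sum over age/sex groups of sampled unemployed plus predicted unsampled unemployed."""
    total = 0.0
    for g in groups:
        unsampled = g["N_hat"] - g["n"]  # estimated population not in the sample
        total += g["y_s"] + unsampled * g["p_hat"]
    return total


# Illustrative (made-up) figures for two of the six groups in one LAD.
example_groups = [
    {"y_s": 2, "N_hat": 1000, "n": 100, "p_hat": 0.10},
    {"y_s": 1, "N_hat": 500, "n": 50, "p_hat": 0.02},
]
example_total = unemployment_level(example_groups)  # (2 + 90) + (1 + 9) = 102
```

The model only has to predict for the unsampled part of each group; the sampled unemployed enter the total as observed counts.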


LAD Estimation of unemployment rate

• The estimate of unemployment rate is obtained using the model-based estimate of unemployment level and the direct estimate of employment:

$$\hat{r}_d = \frac{\hat{Y}_d}{\hat{Y}_d + \hat{E}_d}$$

where $\hat{Y}_d$ is the model-based estimate of unemployment and $\hat{E}_d$ is the direct survey estimate of employment.


Precision of Estimates

• The mean squared error (MSE) for the unemployment level estimates in LAD d is given by several components

$$MSE(\hat{Y}_d) = G_1 + G_2 + G_3 + G_4 + G_5$$

• G1 and G2 come from the uncertainty in estimating the coefficients $\boldsymbol{\beta}$ and u in the model

• G3 arises because we have estimated the variance of u, $\sigma_u^2$

• G4 is necessary because the model estimates actual values rather than means

• G5 is the additional variance component due to the estimation of population size $(\hat{N}_d)$ in each LAD


Unemployment estimates publication

• The standard errors of the model-based estimates were found to be smaller than the corresponding direct standard errors in each LAD.

• Model-based estimates have been accredited as National Statistics and are now published quarterly in Labour Market Statistics releases.

(http://www.statistics.gov.uk/StatBase/Product.asp?vlnk=14160)


3. Developments in progress

Labour Market area

– Consistent estimation of all three labour market states: employed, not economically active, unemployed

– Currently, Local Authority labour market estimates are:

• Model-based estimates for unemployment

• Direct survey estimates for economic inactivity and employment figures

• Now developing a multivariate model to estimate concurrently the number of unemployed, employed and economically inactive people by local authority


Compositional data

• The proportions of individuals classified in each category are bounded between 0 and 1 and subject to a unity-sum constraint.

• A multinomial logistic model relating the labour market probabilities to auxiliary data for all categories is therefore defined with only 2 equations.


Multinomial Logistic Model

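$$\mathrm{mlogit}(p_{id1}) = \ln\left(\frac{p_{id1}}{p_{id3}}\right) = \mathbf{x}_{id}^T \boldsymbol{\beta}_1 + u_{d1}$$

$$\mathrm{mlogit}(p_{id2}) = \ln\left(\frac{p_{id2}}{p_{id3}}\right) = \mathbf{x}_{id}^T \boldsymbol{\beta}_2 + u_{d2}$$

Then:

$$p_{id1} = \frac{\exp(\mathbf{x}_{id}^T \boldsymbol{\beta}_1 + u_{d1})}{1 + \sum_{j=1}^{2} \exp(\mathbf{x}_{id}^T \boldsymbol{\beta}_j + u_{dj})}$$

$$p_{id2} = \frac{\exp(\mathbf{x}_{id}^T \boldsymbol{\beta}_2 + u_{d2})}{1 + \sum_{j=1}^{2} \exp(\mathbf{x}_{id}^T \boldsymbol{\beta}_j + u_{dj})}$$

$$p_{id3} = \frac{1}{1 + \sum_{j=1}^{2} \exp(\mathbf{x}_{id}^T \boldsymbol{\beta}_j + u_{dj})}$$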
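To show how two equations determine all three probabilities, the sketch below inverts a pair of log-ratios against a reference category. The arguments `eta1` and `eta2` stand for the two fitted linear predictors (covariate term plus area effect); the function name is hypothetical, and which labour market state serves as the reference category 3 is not stated in the slides.

```python
import math


def labour_market_probabilities(eta1, eta2):
    """Invert two log-ratios against reference category 3 into three probabilities."""
    denom = 1.0 + math.exp(eta1) + math.exp(eta2)
    p1 = math.exp(eta1) / denom
    p2 = math.exp(eta2) / denom
    p3 = 1.0 / denom  # reference category
    return p1, p2, p3
```

By construction the three values lie between 0 and 1 and sum to one, so the unity-sum constraint on the compositional data is satisfied automatically.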


The Model

• Relates the probabilities of labour market states to the following predictors:
– age/sex group, geographical region and ONS area classification
– benefit data: claimant proportions (JSA) and incapacity benefit

• Other variables will be tested (e.g. income support)


Developments in progress (cont.)

Labour Market area
– Unemployment estimation at Parliamentary constituency level

• Non-nested geography but with certain matching areas

• Issue here is to ensure consistency with local authority estimates at comparable areas

• Model developed and estimates likely to become available in the coming year


Developments in progress (cont.)

Income estimation
– Estimation at local authority level

• Clustered survey design entails a modification of the SAEP framework to cater for it

• Currently in development

– Estimation of poverty: proportion of households below threshold

• Currently being developed for MSOA/local authority level


4. Wider research activities

In conjunction with academic partners

– Estimation of change over time
Current work is confined to single point-in-time estimation, but users would like an indication of progress over time, particularly in relation to funding

– Estimation of poverty using M-quantile modelling
Research using FRS data by Nikos Tzavidis

– Models incorporating spatial relationships
Preliminary investigation of spatial relationship in unemployment model in conjunction with Ayoub Saei at Southampton University
Link with work at Imperial College by Nicky Best and Virgilio Gomez-Rubio


5. Methodology Consultancy Service

ONS is currently establishing a methodology consultancy service

– To undertake and support statistical work by other government departments and public sector organisations.

– Resource for assessment/quality improvement

– Currently working with the Health and Safety Executive on small area estimation of the incidence of work-related illness at local authority level.


References

• Small Area Estimation Project Report. Model-Based Small Area Estimation Series No.2, ONS, January 2003

• Developments in small area estimation in UK with focus in current research. Clarke, P., Mcgrath K., Chandra, H., Tzavidis, N. (2007). IASS Satellite Meeting on Small Area Estimation, Pisa.

• Model Based Estimates of Income for Middle Layer Super Output Areas 2004/05 Technical Report, ONS, September 2007
http://neighbourgood.statistics.gov.uk/HTMLDocs/images/Technical Report 2004_05 v2 - Final_tcm97-53513.pdf
http://neighbourhood.statistics.gov.uk/dissemination/MetadataDownloadPDF.do?downloadId=21704

• Development of improved estimation methods for local area unemployment levels and rates. Labour Market Trends, vol. 111, no 1
www.statistics.gov.uk/cci/article.asp?id=372

• Summary publication accompanying the publication of the 2003 unemployment estimates, November 2004
http://www.statistics.gov.uk/downloads/theme_labour/ALALFS/AnnexA.pdf