Local demographic rates from the 2001 census When is a proportion a most difficult thing to estimate? Ludi Simpson and Matt Bowen, Centre for Census and Survey Research, March 16 th 2004


Page 1

Local demographic rates from the 2001 census

When is a proportion a most difficult thing to estimate?

Ludi Simpson and Matt Bowen, Centre for Census and Survey Research,

March 16th 2004

Page 2

Rochdale: 43 African older women, 28 economically active = 65%

• National = 60%
• Census: (almost) all women
  – but only one year's women
  – is it robust enough to use in a forecast?

• Alternatives:
  – Use the local rate
  – Collapse categories: use Black
  – Use the national rate, as more robust
  – Weight local and national

• Bradford top-up rule: weight with national women up to 100
  – (43 × 65% + 57 × 60%) / 100 = 62%
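The top-up rule can be written as a small function. This is a minimal sketch under stated assumptions: the function name is hypothetical, and using the local rate unchanged for areas that already have 100 or more women is an assumption about how the rule treats large areas. The slide rounds 28/43 to 65% (giving 62.15); the code uses the exact count (giving 62.2); both round to 62%.

```python
def top_up_rate(local_active, local_n, national_rate, target_n=100):
    """Bradford-style 'top-up' rate (a sketch; details are assumptions).

    The local population is topped up with (target_n - local_n) hypothetical
    women who follow the national rate, and the rate of the combined group is
    returned. Areas that already have target_n or more women keep their
    local rate unchanged (assumption).
    """
    if local_n <= 0:
        return national_rate  # no local information at all
    local_rate = local_active / local_n
    if local_n >= target_n:
        return local_rate
    borrowed = target_n - local_n  # women 'borrowed' at the national rate
    return (local_n * local_rate + borrowed * national_rate) / target_n


# Rochdale: 28 of 43 African older women economically active; national 60%.
rate = top_up_rate(28, 43, 0.60)
print(f"{rate:.1%}")  # 62.2%
```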

Page 3

Rochdale: 43 African older women, 28 economically active = 65%

• National = 60%

Is 65% a Rochdale feature, or is it ‘sampling’ error?

How best to estimate each local rate?

• Does it matter?
  – Many rates to measure, most by age and sex:
    • Children/women ratios: general fertility rate
    • Migration rates, in and out
    • Economic activity rate
    • Household headship rate
  – With smaller samples, often wildly volatile
  – Require comparison between areas
  – Require best estimates of underlying or long-term rates
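One way to ask whether Rochdale's 65% is a local feature or ‘sampling’ error is to compare it with the national 60% using a binomial standard error. This is a minimal sketch, assuming the 43 women can be treated as a binomial sample with the national rate (the census counts nearly everyone, so ‘sampling’ error here stands for chance variation, such as from one year to the next); the function name is hypothetical.

```python
import math


def z_score(active, n, national_rate):
    """z-score of a local proportion against the national rate.

    Uses the binomial standard error sqrt(p(1 - p) / n) evaluated at the
    national rate p, i.e. it asks how unusual the local count would be if
    the local women behaved like women nationally.
    """
    local_rate = active / n
    se = math.sqrt(national_rate * (1 - national_rate) / n)
    return (local_rate - national_rate) / se


z = z_score(28, 43, 0.60)
print(f"z = {z:.2f}")  # z = 0.68, well inside +/-1.96
```

On this assumption, a five-point gap on 43 women is well within chance variation.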

Page 4

Ethnic group demography

• ONS: National projections with ethnic group dimension

• Home Office: Community Cohesion

• Department for Work and Pensions: raise economic activity; neighbourhood effects

• Local authorities: housing access

• Methods: small area estimation

Page 5

Economic activity of women aged 25-74, Census 2001

[Figure: two panels, "Employment status, men 25-74, England and Wales 2001" and "Employment status, women 25-74, England and Wales 2001". Bars on a 0% to 100% scale for each ethnic group: Bangladeshi, Pakistani, Other, Other Asian, African / White, African, Caribbean / White, Other mixed, Chinese, Indian, Asian / White, Other White, Other Black, Irish, ALL PEOPLE, White Briton, Caribbean. Black: workers; dark grey: home/family; light grey: other.]

Source: Census 2001 Table ST108, excluding retired

Page 6

Extraction from SASPAC or CSV files into SPSS

Matrix commands to reshape the data, giving ethnic group, population and economic activity variables
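The presentation reshapes the extracted tables with SPSS matrix commands. The same wide-to-long reshape can be sketched with Python's standard library, purely as an illustration: the column naming scheme (`<group>_pop`, `<group>_active`), the area names and the counts are all made up, and real census table layouts differ.

```python
import csv
import io

# Made-up wide extract: one row per area, one column per (group, variable).
# Column names, area names and counts are hypothetical.
WIDE_CSV = """area,Pakistani_pop,Pakistani_active,Indian_pop,Indian_active
Area A,200,50,120,70
Area B,30,12,15,9
"""


def wide_to_long(text):
    """Return one dict per (area, ethnic group) with population and
    economically active counts."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        area = record.pop("area")
        by_group = {}
        for column, value in record.items():
            # "Pakistani_pop" -> ("Pakistani", "pop"); rsplit keeps
            # underscores inside group names such as "Other_White".
            group, variable = column.rsplit("_", 1)
            by_group.setdefault(group, {})[variable] = int(value)
        for group, counts in by_group.items():
            rows.append({"area": area, "ethnic_group": group,
                         "population": counts["pop"],
                         "active": counts["active"]})
    return rows


for row in wide_to_long(WIDE_CSV):
    print(row)
```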

Page 7

355 local authorities with Pakistani women aged 25-74

[Figure: Pakistani women aged 25-74, economic activity, Census 2001. Direct estimates. x-axis: LAD population (denominator); y-axis: estimated economic activity, 0.0 to 1.0. Oldham is labelled.]

Page 8

National p = 0.249. 95% confidence interval for an LAD with population n:

$$p \pm 1.96\sqrt{\frac{p(1-p)}{n-1}}$$

[Figure: Pakistani women aged 25-74, economic activity, Census 2001. Direct estimates with funnel plot: 95% confidence intervals around the national mean 0.249. x-axis: LAD population (denominator); y-axis: estimated economic activity, 0.0 to 1.0.]
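The funnel limits can be computed directly for any denominator. A minimal sketch, using the standard error with denominator n − 1 as on the slide and a 1.96 multiplier for 95% limits; clipping the limits to [0, 1] and the helper names are additions for illustration.

```python
import math


def funnel_limits(n, p=0.249, z=1.96):
    """Lower and upper 95% funnel limits around the national proportion p
    for an area with n women (n >= 2), using sqrt(p(1 - p)/(n - 1)) as the
    standard error and clipping the limits to [0, 1]."""
    half_width = z * math.sqrt(p * (1 - p) / (n - 1))
    return max(0.0, p - half_width), min(1.0, p + half_width)


def outside_funnel(active, n, p=0.249):
    """True if the direct estimate active/n falls outside the funnel."""
    lower, upper = funnel_limits(n, p)
    rate = active / n
    return rate < lower or rate > upper


for n in (3, 10, 100, 1000, 4420):
    lower, upper = funnel_limits(n)
    print(f"n={n:5d}  limits {lower:.3f} to {upper:.3f}")
```

The limits narrow roughly with 1/√n, which is why only the larger LADs can show a rate that is clearly different from the national one.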

Page 9

Economic activity, related to number of women

[Figure: Pakistani women aged 25-74, economic activity, Census 2001. Direct estimates with funnel plot: 95% confidence intervals around the national mean 0.249. x-axis: LAD population (denominator); y-axis: estimated economic activity, 0.0 to 1.0. Annotated values: P = 0.238, P = 0.444, P = 0.502.]

Page 10

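Local area heterogeneity → believe the area proportion
Small samples → weight towards the national mean

• Nic Longford, shrinkage estimators, JRSS(A) 1999
  – No distributional assumptions
  – Uses q_l, the area sample as a proportion of the whole
  – The combination of area and national proportions that minimises the mean square error of the estimate from the true local mean is

$$\tilde p_l = \frac{\{\sigma_b^2 + v(\hat p) - q_l\,v(\hat p_l)\}\,\hat p_l + (1 - q_l)\,v(\hat p_l)\,\hat p}{\sigma_b^2 + (1 - 2q_l)\,v(\hat p_l) + v(\hat p)}$$

where p̂_l is the direct estimate for area l, p̂ the national proportion, v(·) a sampling variance and σ_b² the between-area variance of the true local proportions.

• Usually q_l and v(p̂) are negligible, giving

$$\tilde p_l \approx \frac{\sigma_b^2\,\hat p_l + v(\hat p_l)\,\hat p}{\sigma_b^2 + v(\hat p_l)}$$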
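The MSE-minimising combination of area and national proportions described on this slide can be sketched as a composite estimator (1 − b)·p̂_l + b·p̂. Two assumptions are labelled in the code: the local sampling variance is approximated by p̂(1 − p̂)/n_l using the national proportion, and σ_b² is supplied by the caller (0.01 below is an illustrative value, not an estimate from the census). The function name is hypothetical. With q_l = 0 and v(p̂) = 0 (the defaults) the weight reduces to v(p̂_l)/(σ_b² + v(p̂_l)).

```python
def longford_shrinkage(p_local, n_local, p_nat, sigma_b2, q_local=0.0, v_nat=0.0):
    """MSE-minimising combination (1 - b) * p_local + b * p_nat, with
        b = (1 - q) v_l / (sigma_b2 + (1 - 2q) v_l + v_nat)
    where q is the area's share of the national sample, v_l the sampling
    variance of p_local and v_nat that of p_nat.

    Assumption: v_l is approximated by p_nat(1 - p_nat)/n_local, which stays
    positive even when p_local is exactly 0 or 1.
    """
    v_local = p_nat * (1 - p_nat) / n_local
    b = (1 - q_local) * v_local / (sigma_b2 + (1 - 2 * q_local) * v_local + v_nat)
    return (1 - b) * p_local + b * p_nat


# Rochdale figures from the earlier slides; sigma_b2 = 0.01 is illustrative only.
est = longford_shrinkage(28 / 43, 43, 0.60, sigma_b2=0.01)
print(round(est, 3))  # 0.633
```

With σ_b² = 0 the estimate collapses to the national proportion; as n_l grows, b tends to zero and the direct estimate is kept.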

Page 11

[Figure: Pakistani women aged 25-74, economic activity, Census 2001. Direct and non-parametric shrunken estimates (Nic Longford, JRSSA 1999). x-axis: LAD population (denominator); y-axis: estimated economic activity, 0.0 to 1.0.]

Page 12

Predict local means and shrink to those

• Combine two approaches to small area estimation:
  – Model the similarity of each locality to others, to find its predicted mean
    • Only as good as the model
  – Rely on the direct local estimate when n is large

• Software limitations
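Combining the two approaches means shrinking each direct estimate towards its own model-predicted mean rather than the national mean, with the direct estimate dominating when n is large. A minimal sketch on the proportion scale, as an illustrative assumption: the weight on the direct estimate is w = σ²/(σ² + v) with v = p_pred(1 − p_pred)/n. The presentation itself fits multilevel logistic models, where shrinkage happens on the logit scale; the function name and σ² = 0.005 are hypothetical.

```python
def shrink_to_prediction(p_direct, n, p_predicted, sigma2):
    """Shrink a direct area proportion towards a model-predicted proportion.

    Weight on the direct estimate: w = sigma2 / (sigma2 + v), where sigma2 is
    the residual between-area variance and v = p_predicted(1 - p_predicted)/n
    approximates the sampling variance (both on the proportion scale).
    Returns (shrunken estimate, w).
    """
    v = p_predicted * (1 - p_predicted) / n
    w = sigma2 / (sigma2 + v)
    return w * p_direct + (1 - w) * p_predicted, w


# Hypothetical area: direct estimate 0.5, model prediction 0.3, sigma2 = 0.005.
for n in (5, 50, 500, 5000):
    estimate, weight = shrink_to_prediction(0.5, n, 0.3, sigma2=0.005)
    print(f"n={n:5d}  weight on direct={weight:.2f}  estimate={estimate:.3f}")
```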

Page 13

Predictors of area economic activity

[Figure: four scatter plots, each titled "Pakistani women aged 25-74, 2001 Census, 355 LADs": (1) % UK born against Pakistani as percent of all women 25-74; (2) economic activity against mean age of Pakistani women 25-74; (3) economic activity against Pakistani as percent of all women 25-74; (4) economic activity against percent born in UK (Pakistani people, all ages).]

Page 14

Modelling options for a rate p

• p as a continuous variable, normal errors
  – No equivalent of shrinkage

• Numerator as a count, denominator as an offset, Poisson errors

• p as an aggregate 0-1 variable, binomial errors: logistic
  – Not SPSS
  – MLwiN
  – Stata
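To illustrate the third option, here is a minimal sketch of a single-level logistic regression for aggregated binomial data (r_i economically active out of n_i women in area i) with one predictor, fitted by Newton-Raphson. It is not the multilevel model fitted in MLwiN: there is no random area effect and no extra-binomial parameter, and the function names and counts are hypothetical.

```python
import math


def expit(eta):
    """Inverse logit, written to avoid overflow for large |eta|."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def fit_binomial_logit(x, r, n, max_iter=50, tol=1e-10):
    """Fit logit(pi_i) = b0 + b1 * x_i to grouped binomial data, where r_i of
    n_i women in area i are economically active, by Newton-Raphson.
    Needs at least two distinct x values."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0            # score vector
        h00 = h01 = h11 = 0.0    # information matrix X'WX
        for xi, ri, ni in zip(x, r, n):
            pi = expit(b0 + b1 * xi)
            resid = ri - ni * pi
            w = ni * pi * (1.0 - pi)
            g0 += resid
            g1 += resid * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 system (X'WX) delta = score explicitly.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Made-up counts for four areas with predictor values 0 to 3.
b0, b1 = fit_binomial_logit([0, 1, 2, 3], [12, 20, 31, 40], [60, 60, 60, 60])
print(round(b0, 3), round(b1, 3))
```

Because the fit works on the logit scale, the fitted proportions expit(b0 + b1·x) always lie between 0 and 1.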

Page 15

Logistic transformation of proportion to avoid predicted values outside [0,1]

$$\operatorname{logit}(p) = \log_e\left(\frac{p}{1-p}\right)$$

[Figure: logit(p) plotted against p for p from 0 to 1; y-axis from -2.5 to 2.5.]
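A small sketch of why the transformation helps: a straight line in p can leave [0, 1], while a straight line on the logit scale, transformed back, cannot. The starting value 0.249 is the national proportion from earlier slides; the slope of 0.1 per unit of a predictor x is hypothetical, and a slope on the logit scale is not the same quantity as one on the proportion scale, so only the range of the predictions is being compared.

```python
import math


def logit(p):
    """log_e(p / (1 - p)) for 0 < p < 1."""
    return math.log(p / (1 - p))


def inv_logit(eta):
    """Back-transform from the logit scale into (0, 1)."""
    return 1 / (1 + math.exp(-eta))


p0, slope = 0.249, 0.1  # starting proportion and hypothetical slope
for x in (0, 5, 10):
    straight_line_p = p0 + slope * x              # can pass 1
    via_logit = inv_logit(logit(p0) + slope * x)  # stays inside (0, 1)
    print(f"x={x:2d}  linear in p: {straight_line_p:.3f}  linear in logit: {via_logit:.3f}")
```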

Page 16

Snijders and Bosker, 1999

• “Estimation procedures for these [binary outcome multilevel] models are still in a state of active development”, with different and disputed levels of bias, mean square error and stability, dependent on starting values, data, complexity of the model and small group sizes

Page 17

Multilevel logistic model
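The equation on this slide did not survive extraction. A standard two-level logistic model consistent with the parameters reported on the next slide (three area-level predictors, a between-LAD variance and an extra-binomial parameter) might be written as below; the notation is an assumption and may differ from the authors' specification.

$$
\begin{aligned}
E(r_l \mid \pi_l) &= n_l \pi_l, \qquad \operatorname{var}(r_l \mid \pi_l) = \phi\, n_l \pi_l (1 - \pi_l),\\
\operatorname{logit}(\pi_l) &= \beta_0 + \beta_1 x_{1l} + \beta_2 x_{2l} + \beta_3 x_{3l} + u_l, \qquad u_l \sim N(0, \sigma_u^2),
\end{aligned}
$$

where r_l of the n_l women in LAD l are economically active; x_1l, x_2l and x_3l are the group's percentage of the population aged 25-74, its mean age and its UK-born percentage; σ_u² is the between-LAD variance; and φ is the extra-binomial parameter (φ = 1 for pure binomial variation).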

Page 18

MLwiN parameters

| Ethnic group | Ethnic group % of pop 25-74 | Mean age | UK-born % | Variance (LADs) | Extra-binomial |
|---|---|---|---|---|---|
| White Briton | 0.006 | -0.078 | -0.072 | 0.020 | 1.0 |
| Irish | -0.044 | -0.099 | -0.010 | 0.015 | 1.2 |
| Other White | -0.016 | -0.054 | -0.009 | 0.024 | 1.2 |
| Caribbean / White | -0.365 | -0.046 | -0.014 | 0.007ns | 1.7 |
| African / White | -0.772 | -0.025 | 0.000ns | 0.029* | 1.8 |
| Asian / White | -0.492 | -0.030 | -0.010 | 0.039 | 1.7 |
| Other mixed | -0.341 | -0.020 | -0.003ns | 0.013ns | 1.9 |
| Indian | -0.025 | -0.015ns | -0.003 | 0.107 | 1.5 |
| Pakistani | -0.181 | 0.008ns | -0.011 | 0.208 | 1.7 |
| Bangladeshi | -0.054 | -0.010ns | -0.021 | 0.235 | 2.7 |
| Other Asian | -0.069 | 0.034 | -0.014 | 0.132 | 2.5 |
| Caribbean | -0.024 | -0.079 | 0.007* | 0.033 | 2.4 |
| African | -0.036 | -0.048 | 0.006ns | 0.139 | 2.1 |
| Other Black | -0.012ns | -0.035 | -0.001ns | 0.021ns | 2.4 |
| Chinese | -0.212 | -0.038 | -0.003ns | 0.041 | 0.7 |
| Other | -0.108 | 0.008ns | -0.009* | 0.093 | 1.1 |

Page 19

LAD predicted economic activity from fixed model, 3 predictors

[Figure: Pakistani women aged 25-74, economic activity, Census 2001. Estimates from model with 3 explanatory variables, no random effect. x-axis: LAD population (denominator); y-axis: estimated economic activity, 0.0 to 1.0. Series: direct; intercept from fixed model.]

Page 20

Shrinkage from observed proportion towards modelled proportion

[Figure: Pakistani women aged 25-74, economic activity, Census 2001. Estimates from models with 3 explanatory variables and random intercept. x-axis: LAD population (denominator); y-axis: estimated economic activity, 0.0 to 1.0. Series: direct; fixed intercept from mixed model; random intercept (PQL2, RIGLS).]

Page 21

[Figure: Pakistani women aged 25-74, economic activity, Census 2001. Estimates from models with 3 explanatory variables and random intercept. x-axis: LAD population (denominator); y-axis: estimated economic activity, 0.0 to 1.0. Series: direct; random intercept (PQL2, RIGLS); fixed intercept from fixed model; fixed intercept from mixed model.]

Page 22

Page 23

Shrinkage with random intercept only, NOT equal to shrinkage to the overall mean

[Figure: Pakistani women aged 25-74, economic activity, Census 2001. Estimates from models with intercept only. x-axis: LAD population (denominator); y-axis: estimated economic activity, 0.0 to 1.0. Series: direct; random intercept.]

Fixed intercept = mean = 0.249

β0 with random intercept = 0.385
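Why can the random-intercept centre (0.385) sit so far from the overall mean (0.249)? A deliberately simplified illustration, not the PQL2/RIGLS estimation in MLwiN: an unweighted mean of area logits stands in for β0. In a real random-intercept model each area is weighted roughly by 1/(σ_u² + v_l), which gives small areas far more weight than their share of the population, and the averaging happens on the nonlinear logit scale. The area counts below are made up.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def inv_logit(eta):
    return 1 / (1 + math.exp(-eta))


# Hypothetical areas as (economically active, women): one large area with a
# low rate and three small areas with higher rates.
areas = [(884, 4420), (6, 10), (5, 10), (4, 10)]

# Pooled proportion: every woman counts equally, so the large area dominates.
pooled = sum(a for a, _ in areas) / sum(n for _, n in areas)

# Unweighted mean of area logits: every area counts equally.
mean_logit = sum(logit(a / n) for a, n in areas) / len(areas)

print(f"pooled proportion:            {pooled:.3f}")
print(f"inverse logit of mean logit:  {inv_logit(mean_logit):.3f}")
```

When small areas tend to have higher rates, as in this made-up case, the area-level centre lies above the pooled proportion.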

Page 24

Conclusions

• Shrinkage using variance components models is suitable for small area estimation
  – especially where the area populations vary greatly

• It makes little difference to proportions based on populations over 100

• Modelling allows borrowing strength from similar areas
  – The model must fit well

• The software for mixed models does not shrink to the overall mean (or to the fixed-effects prediction in the case of continuous predictors)
  – Software and documentation under development