Local demographic rates from the 2001 census
When is a proportion a most difficult thing to estimate?
Ludi Simpson and Matt Bowen, Centre for Census and Survey Research,
March 16th 2004
Rochdale: 43 African older women, 28 economically active = 65%
• National = 60%
• Census – (almost) all women
– but one year's women
– is it robust enough to use in a forecast?
• Alternatives:
– Use local rate
– Collapse categories – use Black
– Use national rate as more robust
– Weight local and national
• Bradford top up rule: weight with national women to 100
• (43*65% + 57*60%) / 100 = 62%
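The top-up arithmetic can be sketched in a few lines of Python. The function name `top_up_rate` and its parameters are hypothetical. The behaviour for areas that already have 100 or more women (use the local rate unchanged) is an assumption read into "weight with national women to 100", not something the slide states.

```python
def top_up_rate(local_n, local_rate, national_rate, target_n=100):
    """Weight a local rate with national 'top-up' cases until target_n is reached.

    Hypothetical helper: the name and the treatment of large areas are
    assumptions for illustration only.
    """
    if local_n >= target_n:
        # Assumed: no top-up is needed once the area has target_n cases.
        return local_rate
    top_up_n = target_n - local_n
    return (local_n * local_rate + top_up_n * national_rate) / target_n


# Rochdale example from the slide: 43 women at 65%, national rate 60%.
rate = top_up_rate(43, 0.65, 0.60)
print(f"{rate:.0%}")  # prints 62%
```

The smaller the local population, the more the result is pulled towards the national rate.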
Rochdale: 43 African older women, 28 economically active = 65%
• National = 60%
• Is 65% a Rochdale feature, or is it 'sampling' error?
How best to estimate each local rate?
• Does it matter?
– Many rates to measure, most by age and sex
  • Children/Women ratios: general fertility rate
  • Migration rate, in and out
  • Economic activity rate
  • Household headship rate
– With smaller samples, often wildly volatile
– Require comparison between areas
– Require best estimates of underlying or long term rates
Ethnic group demography
• ONS: National projections with ethnic group dimension
• Home Office: Community Cohesion
• Dept of Work and Pensions: raise economic activity; neighbourhood effects
• Local authorities: housing access
• Methods: small area estimation
Economic activity of women aged 25-74, Census 2001
[Figure: stacked bar charts by ethnic group on a 0% to 100% axis, titled "Employment status, men 25-74, England and Wales 2001" and "Employment status, women 25-74, England and Wales 2001". Groups shown: Bangladeshi, Pakistani, Other, Other Asian, African / White, African, Caribbean / White, Other mixed, Chinese, Indian, Asian / White, Other White, Other Black, Irish, ALL PEOPLE, White Briton, Caribbean. Black: workers; dark grey: home/family; light grey: other.]
Source: Census 2001 Table ST108, excluding retired
Extraction from SASPAC or csv files to SPSS
Matrix commands to reshape giving ethnic group, population, economic activity variables
355 local authorities with Pakistani women aged 25-74
Pakistani women aged 25-74, economic activity, Census 2001: direct estimates
[Figure: estimated economic activity (0.0 to 1.0) for each LAD, ordered by LAD population (denominator) from 3 to 4420. Oldham is labelled.]
National p = 0.249
95% confidence interval for a LAD of population n = p ± 1.96 * sqrt(p*(1-p)/(n-1))
Pakistani women aged 25-74, economic activity, Census 2001: direct estimates with funnel plot (95% confidence intervals around national mean 0.249)
[Figure: funnel plot of estimated economic activity (0.0 to 1.0) against LAD population (denominator) from 3 to 4420, with 95% limits around the national mean.]
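The funnel limits can be sketched as below. This assumes a normal approximation with a multiplier of 1.96 for a 95% interval (the multiplier is an assumption, not taken from the slide), the slide's n-1 divisor, and clipping to [0, 1]. The function name `funnel_limits` is hypothetical.

```python
import math


def funnel_limits(p, n, z=1.96):
    """Approximate limits around a national proportion p for an area of size n.

    Assumptions: normal approximation, z = 1.96 for a 95% interval,
    divisor n - 1 as on the slide, limits clipped to [0, 1].
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    half_width = z * math.sqrt(p * (1 - p) / (n - 1))
    return max(0.0, p - half_width), min(1.0, p + half_width)


NATIONAL_P = 0.249  # national economic activity rate quoted on the slide

for n in (10, 100, 1000):
    lower, upper = funnel_limits(NATIONAL_P, n)
    print(n, round(lower, 3), round(upper, 3))
```

The limits narrow as n grows, which gives the plot its funnel shape: a small LAD can sit far from 0.249 without that being evidence of a real local difference.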
Economic activity, related to number of women
Pakistani women aged 25-74, economic activity, Census 2001: direct estimates with funnel plot (95% confidence intervals around national mean 0.249)
[Figure: funnel plot of estimated economic activity (0.0 to 1.0) against LAD population (denominator). Annotations on the plot: P = 0.238, P = 0.444, P = 0.502.]
• Local area heterogeneity → believe area proportion
• Small samples → weight to national mean
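• Nic Longford, shrinkage estimators, JRSS(A) 1999
– No distributional assumptions
– The area sample as a proportion of the whole: $q_l$
– The combination of area ($\hat{p}_l$) and national ($\hat{p}$) proportions that minimises Mean Square Error of estimate from true local mean, given the between-area variance $\sigma_b^2$ and the sampling variance $v(\cdot)$:
$$\frac{\left[\sigma_b^2 + v(\hat{p}) - q_l\,v(\hat{p}_l)\right]\hat{p}_l + v(\hat{p}_l)\,(1 - q_l)\,\hat{p}}{\sigma_b^2 + v(\hat{p}_l)\,(1 - 2q_l) + v(\hat{p})}$$
• Usually
$$\frac{\sigma_b^2\,\hat{p}_l + v(\hat{p}_l)\,\hat{p}}{\sigma_b^2 + v(\hat{p}_l)}$$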
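A minimal sketch of the composite (shrinkage) estimator follows. It weights the local and national proportions by the between-area variance and the sampling variances. All function and parameter names are hypothetical. The example's between-area variance (0.01) is an invented illustration value, and using the national proportion inside the binomial sampling variance is an assumption made to avoid a zero variance when a small area has a proportion of 0 or 1.

```python
def shrink_full(p_local, p_nat, sigma2_b, v_local, v_nat, q_l):
    """Composite of local and national proportions (full form).

    sigma2_b: between-area variance; v_local, v_nat: sampling variances of
    the local and national estimates; q_l: area share of the whole sample.
    """
    numerator = ((sigma2_b + v_nat - q_l * v_local) * p_local
                 + v_local * (1 - q_l) * p_nat)
    denominator = sigma2_b + v_local * (1 - 2 * q_l) + v_nat
    return numerator / denominator


def shrink_simple(p_local, p_nat, sigma2_b, v_local):
    """Usual simplification: q_l and the national sampling variance ignored."""
    return (sigma2_b * p_local + v_local * p_nat) / (sigma2_b + v_local)


def binomial_variance(p, n):
    """Sampling variance of a proportion p estimated from n cases."""
    return p * (1 - p) / n


# Illustration only: sigma2_b = 0.01 is a made-up value.
v_local = binomial_variance(0.60, 43)  # assumed: national p used in the variance
print(round(shrink_simple(0.65, 0.60, 0.01, v_local), 3))
```

When the local sampling variance is large relative to the between-area variance (small n), the estimate moves towards the national proportion. When it is small (large n), the local proportion is left almost unchanged.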
Pakistani women aged 25-74, economic activity, Census 2001: direct and non-parametric shrunken estimates (Nic Longford, JRSSA 1999)
[Figure: estimated economic activity (0.0 to 1.0) for each LAD, ordered by LAD population (denominator) from 3 to 4420.]
Predict local means and shrink to those
• Combine two approaches to small area estimation
– Model similarity of locality to others, to find their predicted mean
  • Only as good as the model
– Rely on direct local estimate when n is large
• Software limitations
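One way to picture "rely on direct local estimate when n is large" is the weight a shrinkage estimator gives the direct estimate as the area population grows. The sketch below is an illustration under stated assumptions, not the authors' procedure. It works on the proportion scale, uses a binomial sampling variance around the model prediction, and treats `sigma2_resid` (the between-area variance left after modelling) as a known, hypothetical input.

```python
def direct_weight(n, predicted_p, sigma2_resid):
    """Weight on the direct estimate when shrinking towards a model prediction.

    Assumes 0 < predicted_p < 1 and sigma2_resid > 0 (hypothetical inputs).
    """
    if not 0.0 < predicted_p < 1.0 or sigma2_resid <= 0.0 or n < 1:
        raise ValueError("need 0 < predicted_p < 1, sigma2_resid > 0, n >= 1")
    sampling_var = predicted_p * (1 - predicted_p) / n
    return sigma2_resid / (sigma2_resid + sampling_var)


def shrink_to_prediction(direct_p, predicted_p, n, sigma2_resid):
    """Blend the direct estimate with the area's model-predicted proportion."""
    w = direct_weight(n, predicted_p, sigma2_resid)
    return w * direct_p + (1 - w) * predicted_p


# Made-up inputs: predicted proportion 0.25, residual variance 0.01.
for n in (5, 50, 500, 5000):
    print(n, round(direct_weight(n, 0.25, 0.01), 3))
```

With these illustrative inputs the weight on the direct estimate rises steadily with n, so large areas keep essentially their own observed proportion.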
Predictors of area economic activity
[Figure: four scatter plots, each titled "Pakistani women aged 25-74, 2001 Census, 355 LADs":
– % UK born (20 to 80) against Pakistani as percent of all women 25-74 (0 to 15)
– Economic activity (0 to 1) against mean age of Pakistani women 25-74 (25 to 50)
– Economic activity (0 to 1) against Pakistani as percent of all women 25-74 (0 to 20)
– Economic activity (0 to 1) against percent born in UK, Pakistani people all ages (0 to 100)]
Modelling options for a rate p
• P as continuous variable, normal errors
– No equivalent of shrinkage
• Numerator as count, denominator as offset, Poisson errors
• P as an aggregate 0-1 variable, Binomial errors: logistic
– Not SPSS
– MLwiN
– STATA
Logistic transformation of proportion to avoid predicted values outside [0,1]
logit(p) = log_e(p/(1-p))
[Figure: logit(p), from -2.5 to 2.5, plotted against p from 0 to 1.]
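The transformation and its inverse can be written directly. This is a small sketch with hypothetical function names. The inverse is what maps a prediction made on the logit scale back to a proportion that always lies between 0 and 1.

```python
import math


def logit(p):
    """Natural-log odds of a proportion p, defined for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Map a value on the logit scale back to a proportion in [0, 1]."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x use the equivalent form to avoid overflow in exp().
    e = math.exp(x)
    return e / (1.0 + e)


# Round trip for the national rate quoted earlier in the talk.
x = logit(0.249)
print(round(x, 3), round(inv_logit(x), 3))
```

However extreme a linear predictor becomes, `inv_logit` cannot return a value outside [0, 1], which is the point of modelling on this scale.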
Snijders and Bosker, 1999
• “Estimation procedures for these [binary outcome multilevel] models are still in a state of active development”, with different and disputed levels of bias, mean square error and stability, dependent on starting values, data, complexity of the model and small group sizes
Multilevel logistic model
MLwiN parameters    eth % of pop  mean age 25-74  UK-born %  variance (LADs)  extra-binomial
White Briton        0.006         -0.078          -0.072     0.020            1.0
Irish               -0.044        -0.099          -0.010     0.015            1.2
Other White         -0.016        -0.054          -0.009     0.024            1.2
Caribbean / White   -0.365        -0.046          -0.014     0.007ns          1.7
African / White     -0.772        -0.025          0.000ns    0.029*           1.8
Asian / White       -0.492        -0.030          -0.010     0.039            1.7
Other mixed         -0.341        -0.020          -0.003ns   0.013ns          1.9
Indian              -0.025        -0.015ns        -0.003     0.107            1.5
Pakistani           -0.181        0.008ns         -0.011     0.208            1.7
Bangladeshi         -0.054        -0.010ns        -0.021     0.235            2.7
Other Asian         -0.069        0.034           -0.014     0.132            2.5
Caribbean           -0.024        -0.079          0.007*     0.033            2.4
African             -0.036        -0.048          0.006ns    0.139            2.1
Other Black         -0.012ns      -0.035          -0.001ns   0.021ns          2.4
Chinese             -0.212        -0.038          -0.003ns   0.041            0.7
Other               -0.108        0.008ns         -0.009*    0.093            1.1
LAD predicted economic activity from fixed model, 3 predictors
Pakistani women aged 25-74, economic activity, Census 2001: estimates from model with 3 explanatory variables, no random effect
[Figure: estimated economic activity (0.0 to 1.0) for each LAD, ordered by LAD population (denominator) from 3 to 3288. Series: direct; intercept from fixed model.]
Shrinkage from observed proportion towards modelled proportion
Pakistani women aged 25-74, economic activity, Census 2001: estimates from models with 3 explanatory variables and random intercept
[Figure: estimated economic activity (0.0 to 1.0) for each LAD, ordered by LAD population (denominator) from 3 to 4977. Series: direct; fixed intercept from mixed model; random intercept PQL2 RIGLS.]
Pakistani women aged 25-74, economic activity, Census 2001: estimates from models with 3 explanatory variables and random intercept
[Figure: estimated economic activity (0.0 to 1.0) for each LAD, ordered by LAD population (denominator) from 3 to 4420. Series: direct; random intercept PQL2 RIGLS; fixed intercept from fixed model; fixed intercept from mixed model.]
Shrinkage with random intercept only, NOT equal to shrinkage to the overall mean
Pakistani women aged 25-74, economic activity, Census 2001: estimates from models with intercept only
[Figure: estimated economic activity (0.0 to 1.0) for each LAD, ordered by LAD population (denominator) from 3 to 4977. Series: direct; random intercept.]
Fixed intercept=mean=0.249
Beta0 with random intercept=0.385
Conclusions
• Shrinkage using variance components models is suitable for small area estimation
– especially where the area populations vary greatly
• It makes little difference to proportions based on populations over 100
• Modelling allows borrowing strength from similar areas
– The model must fit well
• The software for mixed models does not shrink to the overall mean (or fixed effects prediction in the case of continuous predictors)
– Software and documentation under development