model-based estimation of poverty indicators for small
TRANSCRIPT
Model-based Estimation of Poverty Indicators for Small Areas: Overview J. N. K. Rao Carleton University, Ottawa, Canada Isabel Molina Universidad Carlos III de Madrid, Spain Paper presented at The First Asian ISI Satellite Meeting on Small Area Estimation (SAE), September 1-4, 2013, Bangkok, Thailand
SMALL AREA ESTIMATION POVERTY INDICATORS EB METHOD HB METHOD APPLICATION CONCLUSIONS
NOTATION
• U finite population of size N.
• Population partitioned into D subsets U1, . . . ,UD of sizesN1, . . . ,ND , called domains or areas.
• Variable of interest Y .
• Ydj value of Y for unit j from domain d .
• Target: to estimate domain parameters.
δd = h(Yd1, . . . ,YdNd), d = 1, . . . ,D.
• We want to use data from a sample S ⊂ U of size n drawnfrom the whole population.
• Sd = S ∩ Ud sub-sample from domain d of size nd = |Sd |.• Problem: nd too small for some domains.
3
SMALL AREA ESTIMATION POVERTY INDICATORS EB METHOD HB METHOD APPLICATION CONCLUSIONS
DIRECT ESTIMATORS
• Direct estimator: Estimator that uses only the sample datafrom the corresponding domain.
• Small area/domain: subset of the population that is targetof inference and for which the direct estimator does not haveenough precision.
• Indirect estimator: Borrows strength from other areas.
4
SMALL AREA ESTIMATION POVERTY INDICATORS EB METHOD HB METHOD APPLICATION CONCLUSIONS
NESTED-ERROR REGRESSION MODEL
• Model: xdj auxiliary variables at unit level,
Ydj = x′djβ + ud + edj , udiid∼ N(0, σ2u), edj
iid∼ N(0, σ2e ).
• EBLUP of Yd : Predict non-sample valuesYdj = x′dj βWLS + ud ,
ˆY EBLUPd =
1
Nd
∑j∈Sd
Ydj +∑
j∈Ud−Sd
Ydj
, d = 1, . . . ,D.
X Battese, Harter & Fuller (1988), JASA 5
SAE POVERTY INDICATORS EB ELL SIMULATIONS EXTENSIONS APPLICATION CONCLUSIONS
SOME POVERTY AND INCOME INEQUALITYINDICATORS
• FGT poverty indicators
• Quintile share
• Gini coefficient
• Sen index
• Theil index
• Generalized entropy
• Fuzzy monetary index
X Neri, Ballini & Betti (2005), Stat. in Transition 13
SMALL AREA ESTIMATION POVERTY INDICATORS EB METHOD HB METHOD APPLICATION CONCLUSIONS
FGT POVERTY INDICATORS
• Edj welfare measure for indiv. j in domain d : for instance,equivalised annual net income.
• z = poverty line.
• FGT family of poverty indicators for domain d :
Fαd =1
Nd
Nd∑j=1
(z − Edj
z
)αI (Edj < z), α = 0, 1, 2.
When α = 0⇒ Poverty incidence
When α = 1⇒ Poverty gap
X Foster, Greer & Thornbecke (1984), Econometrica 6
SMALL AREA ESTIMATION POVERTY INDICATORS EB METHOD HB METHOD APPLICATION CONCLUSIONS
FGT POVERTY INDICATORS
• Complex non-linear quantities (non continuous): Even ifFGT poverty indicators are also means
Fαd =1
Nd
Nd∑j=1
Fαdj , Fαdj =
(z − Edj
z
)αI (Edj < z),
we cannot assume normality for the Fαdj .
7
SMALL AREA ESTIMATION POVERTY INDICATORS EB METHOD HB METHOD APPLICATION CONCLUSIONS
SMALL AREA ESTIMATION
• Due to the relative nature of the mentioned poverty line,poverty has usually low frequency: Large sample size isneeded.
X In Spain, poverty line for 2006: 6557 euros, approx. 20 %population under the line.
• Survey on Income and Living Conditions (EU-SILC) haslimited sample size.
X In the Spanish SILC 2006, n = 34,389 out ofN = 43,162,384 (8 out 10,000).
8
SMALL AREA ESTIMATION POVERTY INDICATORS EB METHOD HB METHOD APPLICATION CONCLUSIONS
SAMPLE SIZES OF PROVINCES BY GENDER
• Direct estimators for Spanish provinces are not very precise.
• We want estimates by Gender:Small areas: D = 52 provinces for each gender.
• CVs of direct and EB estimators of poverty incidences forselected provinces for each gender:
Province Gender nd Obs. Poor CV Dir. CV EB CV HB
Soria F 17 6 51.87 16.56 19.82Tarragona M 129 18 24.44 14.88 12.35Cordoba F 230 73 13.05 6.24 6.93Badajoz M 472 175 8.38 3.48 4.24
Barcelona F 1483 191 9.38 6.51 4.52
9
SMALL AREA ESTIMATION POVERTY INDICATORS EB METHOD HB METHOD APPLICATION CONCLUSIONS
EB METHOD FOR POVERTY ESTIMATION
• Assumption: there exists a transformation Ydj = T (Edj) ofthe welfare variables Edj which follows a normal distribution(i.e., the nested error model with normal errors ud and edj).
• FGT poverty indicator as a function of transformed variables:
Fαd =1
Nd
Nd∑j=1
{z − T−1(Ydj)
z
}αI{
T−1(Ydj) < z}
= hα(yd),
where yd = (Yd1, . . . ,YdNd)′ = (y′ds , y
′dr )′.
• EB estimator of Fαd :
FEBαd = Eydr [Fαd |yds ] .
• MSE by parametric bootstrap for finite populations(X Gonzalez-Manteiga, Lombardıa, Molina, Morales andSantamarıa, 2008, J.Stat.Comp.Simul.).
10
SMALL AREA ESTIMATION POVERTY INDICATORS EB METHOD HB METHOD APPLICATION CONCLUSIONS
MONTE CARLO APPROXIMATION
(a) Generate L non-sample vectors y(`)dr , ` = 1, . . . , L from the
(estimated) conditional distribution of ydr |yds .
(b) Attach the sample elements to form a population vector
y(`)d = (yds , y
(`)dr ), ` = 1, . . . , L.
(c) Calculate the poverty measure with each population vector
F(`)αd = hα(y
(`)d ), ` = 1, . . . , L. Then take the average over the
L Monte Carlo generations:
FEBαd = Eydr [Fαd |yds ] ∼=
1
L
L∑`=1
F(`)αd .
11
SAE POVERTY INDICATORS EB ELL SIMULATIONS MODIFICATIONS EXTENSIONS CONCLUSIONS
MSE ESTIMATION
• Construct bootstrap populations {Y ∗(b)dj , b = 1, . . . ,B} from
Y ∗dj = x′dj β + u∗d + e∗dj ; j = 1, . . . ,Nd , d = 1, . . . ,D.
u∗diid∼ N(0, σ2u); e∗dj
iid∼ N(0, σ2e ).
• Calculate bootstrap population parameters F ∗αd(b)
• From each bootstrap population, take the sample with thesame indexes S as in the initial sample and calculate EBsFEB∗αd (b) using bootstrap sample data y∗s and known xdj .
mse∗(FEBαd ) =
1
B
B∑b=1
{FEB∗αd (b)− F ∗αd(b)}2
20
SAE POVERTY INDICATORS EB ELL SIMULATIONS MODIFICATIONS EXTENSIONS CONCLUSIONS
WORLD BANK (WB) / ELL METHOD
• Elbers et al. (2003) also used nested error model ontransformed variables Ydj , using clusters as d .
• For comparability we take cluster as small area.
• Generate A bootstrap populations {Y ∗dj(a), a = 1, . . . ,A}
• Calculate F ∗αd(a), a = 1, . . . ,A. Then ELL estimator is:
F(ELL)αd =
1
A
A∑a=1
F ∗αd(a) = F ∗αd(·)
21
SAE POVERTY INDICATORS EB ELL SIMULATIONS MODIFICATIONS EXTENSIONS CONCLUSIONS
WORLD BANK (WB) / ELL METHOD
• MSE estimator:
mse(FELLαd ) =
1
A
A∑a=1
{F ∗αd(a)− F ∗αd(·)}2
• If the mean Yd is the parameter of interest, then
ˆY(ELL)d ' Xd β
• ˆY(ELL)d is a regression synthetic estimator.
• For non-sampled areas, FELLαd is essentially equivalent to FEB
αd .
• But MSE estimators are different for ELL and EB.
22
SMALL AREA ESTIMATION POVERTY INDICATORS EB METHOD HB METHOD APPLICATION CONCLUSIONS
POVERTY INCIDENCE• Bias negligible for all three estimators (EB, direct and ELL).• EB much more efficient than ELL and direct estimators.• ELL even less efficient than direct estimators!
Bias ( %)
0 20 40 60 80
−0.
3−
0.2
−0.
10.
00.
10.
20.
3
Area
Bia
s po
vert
y in
cide
nce
(x10
0)
EB Sample ELL
MSE (×104)
0 20 40 60 80
1020
3040
5060
70
Area
MS
E p
over
ty in
cide
nce
(x10
000)
EB Sample ELL
Figure 1. Bias (left) and MSE (right) of EB, direct and ELL estimators
of poverty incidences F0d for each area d . 12
SMALL AREA ESTIMATION POVERTY INDICATORS EB METHOD HB METHOD APPLICATION CONCLUSIONS
POVERTY GAP
• Same conclusions as for poverty incidence.
Bias ( %)
0 20 40 60 80
−0.
10−
0.05
0.00
0.05
0.10
Area
Bia
s po
vert
y ga
p (x
100)
EB Direct ELL
MSE (×104)
0 20 40 60 80
12
34
Area
MS
E p
over
ty g
ap (
x100
00)
EBDirectELL
Figure 2. Bias (left) and MSE (right) of EB, direct and ELL estimators
of poverty gaps F1d for each area d . 13
SAE POVERTY INDICATORS EB ELL SIMULATIONS MODIFICATIONS EXTENSIONS CONCLUSIONS
CENSUS EB METHOD• When sample data cannot be linked with census auxiliary data,in steps (a) and (b) of EB method generate a full census from
yd = µd |ds+vd1Nd+εd , µd |ds = Xd β+σ2u1Nd
1′nd V−1ds (yds−Xds β).
• Practically the same as original EB method.
a) Mean (×100)
0 20 40 60 80
3.3
3.4
3.5
3.6
3.7
3.8
Area
Pov
erty
inci
denc
e (x
100)
EB Census EB
b) MSE (×104)
0 20 40 60 80
9.5
10.0
10.5
11.0
11.5
12.0
Area
MS
E p
over
ty in
cide
nce
(x10
000)
EB Census EB
Figure 4. a) Mean and b) MSE of EB and Census EB estimators of
poverty gaps F1d for each area d . 29
0 20 40 60 80
12.4
12.6
12.8
13.0
13.2
13.4
13.6
Area
MS
E E
LL e
stim
ator
of m
eans
●●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●●
●
●●
●
●●
●●
●●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
● True MSEBootstrap MSEELL MSE
SAE POVERTY INDICATORS EB ELL SIMULATIONS MODIFICATIONS EXTENSIONS CONCLUSIONS
BOOTSTRAP MSE
• The bootstrap MSE tracks true MSE.
a) MSE of poverty incidence
0 20 40 60 80
10.0
10.2
10.4
10.6
10.8
Area
MS
E P
over
ty in
cide
nce
(x10
000)
●
●●
●●
●
●
●
●
●
●
●●
●●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●●●
●
●●
●
●
● True MSEBootstrap MSE
b) MSE of poverty gap
0 20 40 60 80
0.78
0.80
0.82
0.84
0.86
Area
MS
E P
over
ty g
ap (
x100
00)
●
●●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●●
●
●●
●
●
●
●
●
●
●●●
●
●●
●
●
●
●
●
●
●
●
●●
●●
●
●
●●
●
●
●
●
●●
●●
●
●●
●●●
●
● True MSEBootstrap MSE
Figure 3. True MSEs and bootstrap estimators (×104) of EB estimators
with B = 500 for each area d . 28
SMALL AREA ESTIMATION POVERTY INDICATORS EB METHOD HB METHOD APPLICATION CONCLUSIONS
HIERARCHICAL BAYES METHOD
• Reparameterized nested-error model:
ydi |ud ,β, σ2 ind∼ N(x′diβ + ud , σ
2)
ud |ρ, σ2ind∼ N
(0,
ρ
1− ρσ2), i = 1, . . . ,Nd , d = 1, . . . ,D.
• ρ ∼ U(0, 1) shrinkage or reference prior under a constantmean model, with good frequentist properties.
• π(σ2) ∝ 1/σ2 Jeffreys objective or reference prior.
• π(β, σ2, ρ) ∝ 1/σ2 noninformative prior.
X Rao, Nandram & Molina, Work in progress 14
SMALL AREA ESTIMATION POVERTY INDICATORS EB METHOD HB METHOD APPLICATION CONCLUSIONS
HIERARCHICAL BAYES METHOD
• Proper posterior density (provided X full column rank):
π(u,β, σ2, ρ|ys) = π1(u|β, σ2, ρ, ys)π2(β|σ2, ρ, ys)π3(σ2|ρ, ys)π4(ρ|ys)
• ud |β, σ2, ρ, ysind∼ Normal.
• β|σ2, ρ, ys ∼ Normal.
• σ−2|ρ, ys ∼ Gamma.
• π4(ρ|ys) not simple but ρ-values from it can be generatedusing a grid method.
X Rao, Nandram & Molina, Work in progress 15
SMALL AREA ESTIMATION POVERTY INDICATORS EB METHOD HB METHOD APPLICATION CONCLUSIONS
HIERARCHICAL BAYES METHOD
• θ = (u′,β, σ2, ρ)′ vector of parameters.
• Distribution of out-of-sample values given parameters:
Ydi |θind∼ N(x′diβ + ud , σ
2), i ∈ rd , d = 1, . . . ,D.
• Hierarchical Bayes estimator of Fαd = hα(yd):
FHBαd = Eydr (Fαd |ys),
with
f (ydr |ys) =
∫ ∏i∈rd
f (Ydi |θ)π(θ|ys)dθ.
X Rao, Nandram & Molina, Work in progress 16
SMALL AREA ESTIMATION POVERTY INDICATORS EB METHOD HB METHOD APPLICATION CONCLUSIONS
APPLICATION WITH SILC DATA
• We assume the nested error model for the log-equivalizedannual net income (Ydj = T (Edj) = log Edj).
• We fit the nested error model separately for each gender withprovinces as areas (D = 52).
• We take as explanatory variables, the indicators of 5 agegroups, of having Spanish nationality, of 3 education levelsand of labor force status (unemployed, employed or inactive).
17
SMALL AREA ESTIMATION POVERTY INDICATORS EB METHOD HB METHOD APPLICATION CONCLUSIONS
HIERARCHICAL BAYES METHOD• HB estimates practically the same as EB ones.
• The same result in simulations under the frequential setup(frequential validity).
Poverty incidence (×100)
EB estimate poverty incidence (x100)
HB
estim
ate
pove
rty in
cide
nce
(x10
0)
10
15
20
25
30
35
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
10 15 20 25 30 35
gender
● M
F
b) Poverty gap (×100)
EB estimate poverty gap (x100)
HB
estim
ate
pove
rty g
ap (x
100)
4
6
8
10
12
14
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
4 6 8 10 12 14
gender
● M
F
Figure 3 HB estimates of poverty incidence F0d (left) and of poverty gap
F1d (right) against EB estimates for each province d . 18
SMALL AREA ESTIMATION POVERTY INDICATORS EB METHOD HB METHOD APPLICATION CONCLUSIONS
HIERARCHICAL BAYES METHODPoverty incidence
Area sorted by increasing sample size
Est
ima
ted
pove
rty
inci
de
nce
(x1
00
)
10
20
30
40
●
●
●
●
●
● ●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
● ●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
10 20 30 40 50
gender
● M
F
Figure 4 HB estimates of poverty incidences with HPD intervals bygender, for each province d . Provinces sorted by increasing sample size.
19
SMALL AREA ESTIMATION POVERTY INDICATORS EB METHOD HB METHOD APPLICATION CONCLUSIONS
HIERARCHICAL BAYES METHOD
Poverty gap
Area sorted by increasing sample size
Est
ima
ted
pove
rty
ga
p (
x10
0)
5
10
15
●
●
●
●
●
● ●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
● ●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
10 20 30 40 50
gender
● M
F
Figure 5 HB estimates of poverty gaps with HPD intervals by gender, for
each province d . Provinces sorted by increasing sample size. 20
SMALL AREA ESTIMATION POVERTY INDICATORS EB METHOD HB METHOD APPLICATION CONCLUSIONS
HIERARCHICAL BAYES METHODPoverty incidence
Area sorted by increasing sample size
Est
ima
ted
Va
r a
nd
bo
ots
tra
p M
SE
, P
ov.
inc.
5
10
15
20
25
30
Male
●
● ●●
●
●
●
●●
●●
●● ●
●● ●
●
●●
●● ●
● ●
● ●●
● ●●
●● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
10 20 30 40 50
Female
●
●
●
●
●
●●
●●
●● ●
●●
●● ●
● ●● ●
●
●
●
●●
● ●
● ● ●● ● ● ●
● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●
10 20 30 40 50
Uncertainty
● Bootstrap MSE EB
Post. Variance HB
Figure 6 Posterior variance of HB estimators and bootstrap MSE of EB
estimators of poverty incidence for each province for Males (left) and
Females (right). Provinces sorted by increasing sample size. 21
SMALL AREA ESTIMATION POVERTY INDICATORS EB METHOD HB METHOD APPLICATION CONCLUSIONS
RESULTSPoverty incidence ( %): Men
under 1515 − 2020 − 2525 − 30over 30
Poverty incidence ( %): Women
under 1515 − 2020 − 2525 − 30over 30
Pov.inc.≥ 30 %, Men: Almerıa, Granada, Cordoba, Badajoz, Avila,Salamanca, Zamora, Cuenca.
Women: also Jaen, Albacete, Ciudad Real, Palencia, Soria. 22
SMALL AREA ESTIMATION POVERTY INDICATORS EB METHOD HB METHOD APPLICATION CONCLUSIONS
RESULTS
Poverty gap ( %): Men
under 55 − 7.57.5 − 1010 − 12.5over 12.5
Poverty gap ( %): Women
under 55 − 7.57.5 − 1010 − 12.5over 12.5
Pov.gap ≥ 12.5 %, Men: Almerıa, Badajoz, Zamora, Cuenca.
Women: Granada, Amerıa, Badajoz, Avila, Cuenca. 23
M-quantile SAE estimation: Giusti et al. (2012) • Model: M-quantile of order qof the conditional
distribution of y given x, )(),( qxqxQ ψψ β′= . • )(ˆ qψβ : Estimator of )(qψβ for specified q. Solve
),(ˆ qxQy djdj ′= ψ to get djq for dsj∈ and take their mean
dθ . • Predictor of djy for drj∈ is )ˆ(ˆˆ qdjdj xy θβψ′= .
• M-quantile estimator of dY for small dn and relatively large dN is approximately given by
)ˆ(ˆ)( dddd
MQd xXyy θβψ′−+≈
• Note that MQ estimator looks similar to sample regression
estimator of Fuller which is a component of the EBLUP under nested error model. Hence MQ can be considerably less efficient than EBLUP under nested error model with significant area effects
• Similar comments apply to the MQ estimator of the
poverty measure dFα considered by Giusti et al. (2012).
SAE POVERTY INDICATORS EB ELL SIMULATIONS MODIFICATIONS EXTENSIONS CONCLUSIONS
SKEW-NORMAL EB
• Nester error model with edj skew normal
udiid∼ N(0, σ2u), edj
iid∼ SN(0, σ2e , λe)
θ = (β′, σ2u, σ2e , λe)′
λe = 0 corresponds to Normal
• As in the Normal case, EB estimator can be computed bygenerating only univariate normal variables, conditionallygiven a half-normal variable T = t.
• SN-EB was computed assuming θ is known.
31
SAE POVERTY INDICATORS EB ELL SIMULATIONS MODIFICATIONS EXTENSIONS CONCLUSIONS
SKEW-NORMAL EB SIMULATION
• EB biased under significant skewness (λ > 1) unlike SN EB.
a) Bias of SN-EB estimator
0 20 40 60 80
0.0
0.2
0.4
0.6
0.8
1.0
Area
Bia
s:
Po
ve
rty G
ap
(x1
00
)
SN(0.5,10) SN(0.5,5) SN(0.5,3) SN(0.5,2) SN(0.5,1)
b) Bias of EB estimator
0 20 40 60 80
0.0
0.2
0.4
0.6
0.8
1.0
Area
Bia
s:
Po
ve
rty G
ap
(x1
00
)
SN(0.5,10) SN(0.5,5) SN(0.5,3) SN(0.5,2) SN(0.5,1)
Figure 6. Bias of a) SN-EB estimator and b) EB estimator under skew
normal distributions for error term for λ = 1, 2, 3, 5, 10.
X Diallo & Rao, Work in progress 32
SAE POVERTY INDICATORS EB ELL SIMULATIONS MODIFICATIONS EXTENSIONS CONCLUSIONS
SKEW-NORMAL EB SIMULATION
• RMSE = MSE(EB)/MSE(SN-EB)
• SN-EB significantly more efficient than EB when λ > 1.
0 20 40 60 80
1.0
1.5
2.0
2.5
Area
RM
SE
= M
SE
(MR
)/M
SE
(SN
-MR
): P
ove
rty G
ap SN(0.5,10) SN(0.5,5) SN(0.5,3) SN(0.5,2) SN(0.5,1) SN(0.5,0)
Figure 7. RMSE for skewness parameter λ = 1, 2, 3, 5, 10. 33
SAE POVERTY INDICATORS EB ELL SIMULATIONS MODIFICATIONS EXTENSIONS CONCLUSIONS
CONCLUSIONS
• We studied EB and HB estimation of complex small areaparameters.
• Method applicable to unit level data.
• EB method assumes normality for some transformation of thevariable of interest. EB work extended to skew normaldistributions.
• It requires the knowledge of all population values of theauxiliary variables.
• It requires computational effort because large number ofpopulations are generated. Fast EB method available.
38
SAE POVERTY INDICATORS EB ELL SIMULATIONS MODIFICATIONS EXTENSIONS CONCLUSIONS
CONCLUSIONS
• Original EB method, unlike ELL method, requires linkingsample with census data for the auxiliary variables. CensusEB method avoids the linking and is practically the same asoriginal EB.
• Both EB and ELL methods assume that the sample isnon-informative, that is, the model for the population holdsgood for the sample. Under informative sampling, probablyboth methods are biased. Currently an extension of EBmethod accounting for informative sampling is being studied.
39