model-based estimation of poverty indicators for small

Model-based Estimation of Poverty Indicators for Small Areas: Overview J. N. K. Rao Carleton University, Ottawa, Canada Isabel Molina Universidad Carlos III de Madrid, Spain Paper presented at The First Asian ISI Satellite Meeting on Small Area Estimation (SAE), September 1-4, 2013, Bangkok, Thailand

SMALL AREA ESTIMATION POVERTY INDICATORS EB METHOD HB METHOD APPLICATION CONCLUSIONS

NOTATION

• U finite population of size N.

• Population partitioned into D subsets U1, . . . ,UD of sizesN1, . . . ,ND , called domains or areas.

• Variable of interest Y .

• Ydj value of Y for unit j from domain d .

• Target: to estimate domain parameters.

δd = h(Yd1, . . . ,YdNd), d = 1, . . . ,D.

• We want to use data from a sample S ⊂ U of size n drawnfrom the whole population.

• Sd = S ∩ Ud sub-sample from domain d of size nd = |Sd |.• Problem: nd too small for some domains.

3


DIRECT ESTIMATORS

• Direct estimator: Estimator that uses only the sample datafrom the corresponding domain.

• Small area/domain: subset of the population that is targetof inference and for which the direct estimator does not haveenough precision.

• Indirect estimator: Borrows strength from other areas.

4


NESTED-ERROR REGRESSION MODEL

• Model: xdj auxiliary variables at unit level,

Ydj = x′djβ + ud + edj , udiid∼ N(0, σ2u), edj

iid∼ N(0, σ2e ).

• EBLUP of Yd : Predict non-sample valuesYdj = x′dj βWLS + ud ,

ˆY EBLUPd =

1

Nd

∑j∈Sd

Ydj +∑

j∈Ud−Sd

Ydj

, d = 1, . . . ,D.

X Battese, Harter & Fuller (1988), JASA 5

SAE POVERTY INDICATORS EB ELL SIMULATIONS EXTENSIONS APPLICATION CONCLUSIONS

SOME POVERTY AND INCOME INEQUALITYINDICATORS

• FGT poverty indicators

• Quintile share

• Gini coefficient

• Sen index

• Theil index

• Generalized entropy

• Fuzzy monetary index

X Neri, Ballini & Betti (2005), Stat. in Transition 13


FGT POVERTY INDICATORS

• Edj welfare measure for indiv. j in domain d : for instance,equivalised annual net income.

• z = poverty line.

• FGT family of poverty indicators for domain d :

Fαd =1

Nd

Nd∑j=1

(z − Edj

z

)αI (Edj < z), α = 0, 1, 2.

When α = 0⇒ Poverty incidence

When α = 1⇒ Poverty gap

X Foster, Greer & Thornbecke (1984), Econometrica 6


FGT POVERTY INDICATORS

• Complex non-linear quantities (non continuous): Even ifFGT poverty indicators are also means

Fαd =1

Nd

Nd∑j=1

Fαdj , Fαdj =

(z − Edj

z

)αI (Edj < z),

we cannot assume normality for the Fαdj .

7


SMALL AREA ESTIMATION

• Due to the relative nature of the mentioned poverty line,poverty has usually low frequency: Large sample size isneeded.

X In Spain, poverty line for 2006: 6557 euros, approx. 20 %population under the line.

• Survey on Income and Living Conditions (EU-SILC) haslimited sample size.

X In the Spanish SILC 2006, n = 34,389 out ofN = 43,162,384 (8 out 10,000).

8


SAMPLE SIZES OF PROVINCES BY GENDER

• Direct estimators for Spanish provinces are not very precise.

• We want estimates by Gender:Small areas: D = 52 provinces for each gender.

• CVs of direct and EB estimators of poverty incidences forselected provinces for each gender:

Province Gender nd Obs. Poor CV Dir. CV EB CV HB

Soria F 17 6 51.87 16.56 19.82Tarragona M 129 18 24.44 14.88 12.35Cordoba F 230 73 13.05 6.24 6.93Badajoz M 472 175 8.38 3.48 4.24

Barcelona F 1483 191 9.38 6.51 4.52

9


EB METHOD FOR POVERTY ESTIMATION

• Assumption: there exists a transformation Ydj = T (Edj) ofthe welfare variables Edj which follows a normal distribution(i.e., the nested error model with normal errors ud and edj).

• FGT poverty indicator as a function of transformed variables:

Fαd =1

Nd

Nd∑j=1

{z − T−1(Ydj)

z

}αI{

T−1(Ydj) < z}

= hα(yd),

where yd = (Yd1, . . . ,YdNd)′ = (y′ds , y

′dr )′.

• EB estimator of Fαd :

FEBαd = Eydr [Fαd |yds ] .

• MSE by parametric bootstrap for finite populations(X Gonzalez-Manteiga, Lombardıa, Molina, Morales andSantamarıa, 2008, J.Stat.Comp.Simul.).

10


MONTE CARLO APPROXIMATION

(a) Generate L non-sample vectors y(`)dr , ` = 1, . . . , L from the

(estimated) conditional distribution of ydr |yds .

(b) Attach the sample elements to form a population vector

y(`)d = (yds , y

(`)dr ), ` = 1, . . . , L.

(c) Calculate the poverty measure with each population vector

F(`)αd = hα(y

(`)d ), ` = 1, . . . , L. Then take the average over the

L Monte Carlo generations:

FEBαd = Eydr [Fαd |yds ] ∼=

1

L

L∑`=1

F(`)αd .

11

SAE POVERTY INDICATORS EB ELL SIMULATIONS MODIFICATIONS EXTENSIONS CONCLUSIONS

MSE ESTIMATION

• Construct bootstrap populations {Y ∗(b)dj , b = 1, . . . ,B} from

Y ∗dj = x′dj β + u∗d + e∗dj ; j = 1, . . . ,Nd , d = 1, . . . ,D.

u∗diid∼ N(0, σ2u); e∗dj

iid∼ N(0, σ2e ).

• Calculate bootstrap population parameters F ∗αd(b)

• From each bootstrap population, take the sample with thesame indexes S as in the initial sample and calculate EBsFEB∗αd (b) using bootstrap sample data y∗s and known xdj .

mse∗(FEBαd ) =

1

B

B∑b=1

{FEB∗αd (b)− F ∗αd(b)}2

20


WORLD BANK (WB) / ELL METHOD

• Elbers et al. (2003) also used nested error model ontransformed variables Ydj , using clusters as d .

• For comparability we take cluster as small area.

• Generate A bootstrap populations {Y ∗dj(a), a = 1, . . . ,A}

• Calculate F ∗αd(a), a = 1, . . . ,A. Then ELL estimator is:

F(ELL)αd =

1

A

A∑a=1

F ∗αd(a) = F ∗αd(·)

21


WORLD BANK (WB) / ELL METHOD

• MSE estimator:

mse(FELLαd ) =

1

A

A∑a=1

{F ∗αd(a)− F ∗αd(·)}2

• If the mean Yd is the parameter of interest, then

ˆY(ELL)d ' Xd β

• ˆY(ELL)d is a regression synthetic estimator.

• For non-sampled areas, FELLαd is essentially equivalent to FEB

αd .

• But MSE estimators are different for ELL and EB.

22


POVERTY INCIDENCE• Bias negligible for all three estimators (EB, direct and ELL).• EB much more efficient than ELL and direct estimators.• ELL even less efficient than direct estimators!

Bias ( %)

0 20 40 60 80

−0.

3−

0.2

−0.

10.

00.

10.

20.

3

Area

Bia

s po

vert

y in

cide

nce

(x10

0)

EB Sample ELL

MSE (×104)

0 20 40 60 80

1020

3040

5060

70

Area

MS

E p

over

ty in

cide

nce

(x10

000)

EB Sample ELL

Figure 1. Bias (left) and MSE (right) of EB, direct and ELL estimators

of poverty incidences F0d for each area d . 12


POVERTY GAP

• Same conclusions as for poverty incidence.

Bias ( %)

0 20 40 60 80

−0.

10−

0.05

0.00

0.05

0.10

Area

Bia

s po

vert

y ga

p (x

100)

EB Direct ELL

MSE (×104)

0 20 40 60 80

12

34

Area

MS

E p

over

ty g

ap (

x100

00)

EBDirectELL

Figure 2. Bias (left) and MSE (right) of EB, direct and ELL estimators

of poverty gaps F1d for each area d . 13


CENSUS EB METHOD• When sample data cannot be linked with census auxiliary data,in steps (a) and (b) of EB method generate a full census from

yd = µd |ds+vd1Nd+εd , µd |ds = Xd β+σ2u1Nd

1′nd V−1ds (yds−Xds β).

• Practically the same as original EB method.

a) Mean (×100)

0 20 40 60 80

3.3

3.4

3.5

3.6

3.7

3.8

Area

Pov

erty

inci

denc

e (x

100)

EB Census EB

b) MSE (×104)

0 20 40 60 80

9.5

10.0

10.5

11.0

11.5

12.0

Area

MS

E p

over

ty in

cide

nce

(x10

000)

EB Census EB

Figure 4. a) Mean and b) MSE of EB and Census EB estimators of

poverty gaps F1d for each area d . 29

0 20 40 60 80

12.4

12.6

12.8

13.0

13.2

13.4

13.6

Area

MS

E E

LL e

stim

ator

of m

eans

●●

●

●

●

●

●●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●●

●●

●

●●

●

●●

●●

●●

●

●

●

●

●

●

●

●

●

●●

●●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

● True MSEBootstrap MSEELL MSE


BOOTSTRAP MSE

• The bootstrap MSE tracks true MSE.

a) MSE of poverty incidence

0 20 40 60 80

10.0

10.2

10.4

10.6

10.8

Area

MS

E P

over

ty in

cide

nce

(x10

000)

●

●●

●●

●

●

●

●

●

●

●●

●●

●

●

●

●

●●

●●

●

●

●

●

●

●

●

●

●

●●

●

●●

●

●●

●

●

●

●

●●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●●

●●●

●

●●

●

●

● True MSEBootstrap MSE

b) MSE of poverty gap

0 20 40 60 80

0.78

0.80

0.82

0.84

0.86

Area

MS

E P

over

ty g

ap (

x100

00)

●

●●

●

●

●

●

●

●

●

●

●

●

●●●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●●●

●

●●

●

●

●

●

●

●

●●●

●

●●

●

●

●

●

●

●

●

●

●●

●●

●

●

●●

●

●

●

●

●●

●●

●

●●

●●●

●

● True MSEBootstrap MSE

Figure 3. True MSEs and bootstrap estimators (×104) of EB estimators

with B = 500 for each area d . 28


HIERARCHICAL BAYES METHOD

• Reparameterized nested-error model:

ydi |ud ,β, σ2 ind∼ N(x′diβ + ud , σ

2)

ud |ρ, σ2ind∼ N

(0,

ρ

1− ρσ2), i = 1, . . . ,Nd , d = 1, . . . ,D.

• ρ ∼ U(0, 1) shrinkage or reference prior under a constantmean model, with good frequentist properties.

• π(σ2) ∝ 1/σ2 Jeffreys objective or reference prior.

• π(β, σ2, ρ) ∝ 1/σ2 noninformative prior.

X Rao, Nandram & Molina, Work in progress 14



• θ = (u′,β, σ2, ρ)′ vector of parameters.

• Distribution of out-of-sample values given parameters:

Ydi |θind∼ N(x′diβ + ud , σ

2), i ∈ rd , d = 1, . . . ,D.

• Hierarchical Bayes estimator of Fαd = hα(yd):

FHBαd = Eydr (Fαd |ys),

with

f (ydr |ys) =

∫ ∏i∈rd

f (Ydi |θ)π(θ|ys)dθ.



APPLICATION WITH SILC DATA

• We assume the nested error model for the log-equivalizedannual net income (Ydj = T (Edj) = log Edj).

• We fit the nested error model separately for each gender withprovinces as areas (D = 52).

• We take as explanatory variables, the indicators of 5 agegroups, of having Spanish nationality, of 3 education levelsand of labor force status (unemployed, employed or inactive).

17


HIERARCHICAL BAYES METHOD• HB estimates practically the same as EB ones.

• The same result in simulations under the frequential setup(frequential validity).

Poverty incidence (×100)

EB estimate poverty incidence (x100)

HB

estim

ate

pove

rty in

cide

nce

(x10

0)

10

15

20

25

30

35

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

● ●

●

●

●

●

●

●

●

●

●

●

●

10 15 20 25 30 35

gender

● M

F

b) Poverty gap (×100)

EB estimate poverty gap (x100)

HB

estim

ate

pove

rty g

ap (x

100)

4

6

8

10

12

14

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

4 6 8 10 12 14

gender

● M

F

Figure 3 HB estimates of poverty incidence F0d (left) and of poverty gap

F1d (right) against EB estimates for each province d . 18


HIERARCHICAL BAYES METHODPoverty incidence

Area sorted by increasing sample size

Est

ima

ted

pove

rty

inci

de

nce

(x1

00

)

10

20

30

40

●

●

●

●

●

● ●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

● ●

●

●

●

●

●

●

●

●

●

● ●

●

●

●

●

●

●

●

●

●

●

●

●

●

10 20 30 40 50

gender

● M

F

Figure 4 HB estimates of poverty incidences with HPD intervals bygender, for each province d . Provinces sorted by increasing sample size.

19



Poverty gap


Est

ima

ted

pove

rty

ga

p (

x10

0)

5

10

15

●

●

●

●

●

● ●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

● ●

● ●

●

●

●

●

●

●

●

●

●

● ●

●

●

●

●

●

●

●

●

●

●

●

●

●

10 20 30 40 50

gender

● M

F

Figure 5 HB estimates of poverty gaps with HPD intervals by gender, for

each province d . Provinces sorted by increasing sample size. 20


HIERARCHICAL BAYES METHODPoverty incidence


Est

ima

ted

Va

r a

nd

bo

ots

tra

p M

SE

, P

ov.

inc.

5

10

15

20

25

30

Male

●

● ●●

●

●

●

●●

●●

●● ●

●● ●

●

●●

●● ●

● ●

● ●●

● ●●

●● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

10 20 30 40 50

Female

●

●

●

●

●

●●

●●

●● ●

●●

●● ●

● ●● ●

●

●

●

●●

● ●

● ● ●● ● ● ●

● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●

10 20 30 40 50

Uncertainty

● Bootstrap MSE EB

Post. Variance HB

Figure 6 Posterior variance of HB estimators and bootstrap MSE of EB

estimators of poverty incidence for each province for Males (left) and

Females (right). Provinces sorted by increasing sample size. 21


RESULTSPoverty incidence ( %): Men

under 1515 − 2020 − 2525 − 30over 30

Poverty incidence ( %): Women

under 1515 − 2020 − 2525 − 30over 30

Pov.inc.≥ 30 %, Men: Almerıa, Granada, Cordoba, Badajoz, Avila,Salamanca, Zamora, Cuenca.

Women: also Jaen, Albacete, Ciudad Real, Palencia, Soria. 22


RESULTS

Poverty gap ( %): Men

under 55 − 7.57.5 − 1010 − 12.5over 12.5

Poverty gap ( %): Women

under 55 − 7.57.5 − 1010 − 12.5over 12.5

Pov.gap ≥ 12.5 %, Men: Almerıa, Badajoz, Zamora, Cuenca.

Women: Granada, Amerıa, Badajoz, Avila, Cuenca. 23

M-quantile SAE estimation: Giusti et al. (2012) • Model: M-quantile of order qof the conditional

distribution of y given x, )(),( qxqxQ ψψ β′= . • )(ˆ qψβ : Estimator of )(qψβ for specified q. Solve

),(ˆ qxQy djdj ′= ψ to get djq for dsj∈ and take their mean

dθ . • Predictor of djy for drj∈ is )ˆ(ˆˆ qdjdj xy θβψ′= .

• M-quantile estimator of dY for small dn and relatively large dN is approximately given by

)ˆ(ˆ)( dddd

MQd xXyy θβψ′−+≈

• Note that MQ estimator looks similar to sample regression

estimator of Fuller which is a component of the EBLUP under nested error model. Hence MQ can be considerably less efficient than EBLUP under nested error model with significant area effects

• Similar comments apply to the MQ estimator of the

poverty measure dFα considered by Giusti et al. (2012).


SKEW-NORMAL EB

• Nester error model with edj skew normal

udiid∼ N(0, σ2u), edj

iid∼ SN(0, σ2e , λe)

θ = (β′, σ2u, σ2e , λe)′

λe = 0 corresponds to Normal

• As in the Normal case, EB estimator can be computed bygenerating only univariate normal variables, conditionallygiven a half-normal variable T = t.

• SN-EB was computed assuming θ is known.

31


SKEW-NORMAL EB SIMULATION

• EB biased under significant skewness (λ > 1) unlike SN EB.

a) Bias of SN-EB estimator

0 20 40 60 80

0.0

0.2

0.4

0.6

0.8

1.0

Area

Bia

s:

Po

ve

rty G

ap

(x1

00

)

SN(0.5,10) SN(0.5,5) SN(0.5,3) SN(0.5,2) SN(0.5,1)

b) Bias of EB estimator

0 20 40 60 80

0.0

0.2

0.4

0.6

0.8

1.0

Area

Bia

s:

Po

ve

rty G

ap

(x1

00

)

SN(0.5,10) SN(0.5,5) SN(0.5,3) SN(0.5,2) SN(0.5,1)

Figure 6. Bias of a) SN-EB estimator and b) EB estimator under skew

normal distributions for error term for λ = 1, 2, 3, 5, 10.

X Diallo & Rao, Work in progress 32


SKEW-NORMAL EB SIMULATION

• RMSE = MSE(EB)/MSE(SN-EB)

• SN-EB significantly more efficient than EB when λ > 1.

0 20 40 60 80

1.0

1.5

2.0

2.5

Area

RM

SE

= M

SE

(MR

)/M

SE

(SN

-MR

): P

ove

rty G

ap SN(0.5,10) SN(0.5,5) SN(0.5,3) SN(0.5,2) SN(0.5,1) SN(0.5,0)

Figure 7. RMSE for skewness parameter λ = 1, 2, 3, 5, 10. 33


CONCLUSIONS

• We studied EB and HB estimation of complex small areaparameters.

• Method applicable to unit level data.

• EB method assumes normality for some transformation of thevariable of interest. EB work extended to skew normaldistributions.

• It requires the knowledge of all population values of theauxiliary variables.

• It requires computational effort because large number ofpopulations are generated. Fast EB method available.

38


CONCLUSIONS

• Original EB method, unlike ELL method, requires linkingsample with census data for the auxiliary variables. CensusEB method avoids the linking and is practically the same asoriginal EB.

• Both EB and ELL methods assume that the sample isnon-informative, that is, the model for the population holdsgood for the sample. Under informative sampling, probablyboth methods are biased. Currently an extension of EBmethod accounting for informative sampling is being studied.

39

model-based estimation of poverty indicators for small

Documents