The estimation of unknown multiway distributions
Posted on 05-Jan-2016
Paul Williamson, University of Liverpool, UK
To IPF or to Reweight? That is the question…
1. The need for IPF/Reweighting
Survey distribution [of age by sex]
…for local area?
Sampling Error: Conditional Probability
SAR district: Leeds (2nd largest in UK)
Economic position      Female count   Total count   % female   95% CI as % of SAR estimate
Employee full-time         1525           4146         36.8          4.0
On a Govt scheme             31             77         40.3         27.2
Unemployed                  168            573         29.3         12.7
Retired                    1267           2116         59.9          3.5
Total                      5545          10485         52.9          1.8
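The final column is consistent with a normal-approximation (Wald) 95% interval for a binomial proportion, expressed relative to the estimate: 1.96 · √(p(1 − p)/n) / p, with p the female share and n the total count. The slide does not say which interval was used; this sketch simply checks that assumption against the figures in the table above (the function name is hypothetical).

```python
import math


def relative_ci_half_width(count, total, z=1.96):
    """Half-width of an approximate 95% CI for count/total, as a fraction of
    the estimated proportion itself (normal approximation to the binomial)."""
    p = count / total
    se = math.sqrt(p * (1 - p) / total)  # standard error of the proportion
    return z * se / p


# Leeds SAR figures: (female count, total count) per economic position
rows = {
    "Employee full-time": (1525, 4146),
    "On a Govt scheme": (31, 77),
    "Unemployed": (168, 573),
    "Retired": (1267, 2116),
    "Total": (5545, 10485),
}
for name, (female, total) in rows.items():
    pct = 100 * female / total
    rel = 100 * relative_ci_half_width(female, total)
    print(f"{name:20s} {pct:5.1f}% female, 95% CI = ±{rel:.1f}% of estimate")
```

Under this assumption the small "On a Govt scheme" group (n = 77) carries an interval of about ±27% of its estimate, against under ±2% for the district total.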
Does this exaggerate the problem?
• 2% sample
• Minimally multivariate
• Not based on minorities (e.g. unemployed ethnic minority)
• Min. geog. threshold: 120k
Need to make the most of all available data…
Survey distribution [of age by sex], with local age distribution and local sex distribution:
        Male   Female
Young     1      2
Old       2      3
2. IPF v. Reweighting
Survey table with local margins (sex: 50 / 50; age: 20 / 80):
              Male   Female
               50      50
Young   20      1       2
Old     80      2       3

IPF estimate:
              Male   Female
               50      50
Young   20    8.9    11.1
Old     80   41.1    38.9
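The IPF result above can be reproduced by alternately scaling the rows and columns of the survey table until both sets of local margins are met. A minimal pure-Python sketch for a two-way table; the function name, tolerance and stopping rule are illustrative choices, not taken from the slides.

```python
def ipf(seed, row_targets, col_targets, tol=1e-9, max_iter=1000):
    """Iterative proportional fitting of a 2-D table to row and column margins."""
    table = [[float(x) for x in row] for row in seed]  # work on a float copy
    for _ in range(max_iter):
        # Scale each row so that it sums to its row target
        for i, target in enumerate(row_targets):
            s = sum(table[i])
            table[i] = [x * target / s for x in table[i]]
        # Scale each column so that it sums to its column target
        for j, target in enumerate(col_targets):
            s = sum(row[j] for row in table)
            for row in table:
                row[j] *= target / s
        # The column step disturbs the row sums; stop once they fit as well
        if all(abs(sum(row) - t) < tol for row, t in zip(table, row_targets)):
            break
    return table


seed = [[1, 2],   # Young: Male, Female
        [2, 3]]   # Old:   Male, Female
fitted = ipf(seed, row_targets=[20, 80], col_targets=[50, 50])
for label, row in zip(["Young", "Old"], fitted):
    print(label, [round(x, 1) for x in row])
```

Because every step multiplies whole rows or columns, the fitted table keeps the survey table's odds ratio, (1 × 3)/(2 × 2) = 0.75; this is how IPF carries the survey's age-sex association into the local estimate.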
Sample records:
Weight  ID      Age    Sex
        1–5     Young  Male
        6–10    Young  Female
        11–15   Old    Male
        16–20   Old    Female
Weight  ID      Age    Sex
1       1–5     Young  Male
1       6–10    Young  Female
        11–15   Old    Male
        16–20   Old    Female
TARGET:
           Male   Female
             5      5
Young  2   0.89   1.11
Old    8   4.11   3.89

ESTIMATE (weights on IDs 1–10):
           Male   Female
             5      5
Young  2     5      5
Old    8     0      0

ESTIMATE (weights on IDs 1, 6, 11–14, 16–19):
           Male   Female
             5      5
Young  2     1      1
Old    8     4      4
Weight  ID      Age    Sex
1       1       Young  Male
        2–5     Young  Male
1       6       Young  Female
        7–10    Young  Female
1       11–14   Old    Male
        15      Old    Male
1       16–19   Old    Female
        20      Old    Female
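The reweighting slides can be mimicked in a few lines: choose 10 of the 20 sample records (weights 0 or 1) so that their age x sex table is as close as possible, in total absolute error (TAE), to the target. A minimal sketch, assuming TAE as the objective and a deterministic steepest-descent search that tries every one-in, one-out swap; combinatorial optimisation as usually implemented relies on random swaps (often with simulated annealing), which this does not reproduce. All names are hypothetical.

```python
# Toy sample from the slides: 20 records, five of each age-sex type.
# Index i here corresponds to slide ID i + 1 (Python counts from 0).
RECORDS = ([("Young", "Male")] * 5 + [("Young", "Female")] * 5
           + [("Old", "Male")] * 5 + [("Old", "Female")] * 5)

# Target cell counts for 10 households (the IPF estimate scaled to a total of 10)
TARGET = {("Young", "Male"): 0.89, ("Young", "Female"): 1.11,
          ("Old", "Male"): 4.11, ("Old", "Female"): 3.89}


def tabulate(selected):
    """Age x sex counts for the selected records (each has weight 1)."""
    counts = {cell: 0 for cell in TARGET}
    for i in selected:
        counts[RECORDS[i]] += 1
    return counts


def tae(selected):
    """Total absolute error between the selection's table and the target."""
    counts = tabulate(selected)
    return sum(abs(counts[cell] - TARGET[cell]) for cell in TARGET)


def combinatorial_optimise(initial):
    """Steepest descent: make the single swap (one selected record out, one
    unselected record in) that most reduces TAE; stop when no swap helps."""
    selected = set(initial)
    current = tae(selected)
    while True:
        best_swap, best_err = None, current
        unselected = set(range(len(RECORDS))) - selected
        for out_id in sorted(selected):
            for in_id in sorted(unselected):
                err = tae((selected - {out_id}) | {in_id})
                if err < best_err - 1e-12:
                    best_swap, best_err = (out_id, in_id), err
        if best_swap is None:
            return selected, current
        out_id, in_id = best_swap
        selected = (selected - {out_id}) | {in_id}
        current = best_err


# Start, as on the slides, with IDs 1-10 (all the young records) selected
final, err = combinatorial_optimise(range(10))
print(sorted(i + 1 for i in final), tabulate(final), round(err, 2))
# cell counts 1, 1, 4, 4 and TAE 0.44
```

The final cell counts match the slides' second estimate, although the particular records chosen can differ, since records of the same type are interchangeable.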
Comparison for margin-constrained tables
Target: age x sex x tenure x economic position (64 counts) at district level (17 districts)
Figure: % NFC (17-district average). Series: 2% SAR, CO, IPFN, IPFU. Values shown: 32, 18, 22, 37.
Simpson & Tranmer (2005)
Target: Car ownership (2) x Tenure (3) (6 counts; 3 %s) for residents at ward level
Average error (RMSE):
Source of relationship               9363 wards         816 wards
                                     6 counts  3 %s     6 counts  3 %s
None
  1. Independent margins             381       0.209    348       0.189
2% SAR
  2. England & Wales                 69        0.158    60        0.059
  3. Direct SAR area sample          62        0.110    62        0.059
  3a. Multilevel model               61        0.109    61        0.057
1% SAR, 26 ward types
  4. Direct ward type sample         57        0.093    --        --
  4a. Multilevel model               58        0.093    --        --
Combinatorial Optimisation
  5. Direct estimate                 --        --       42        0.047
  6. As constraint on IPF            --        --       32        0.045
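Row 1 of this table ("Independent margins") estimates each ward's car ownership x tenure table from its two margins alone, i.e. assuming no association between the variables. The sketch shows that baseline and a per-table RMSE for one hypothetical ward. How the table averages RMSE across wards, and how the "3 %s" are defined, is not stated, so only the per-table count RMSE is shown; all names and figures are illustrative.

```python
import math


def independence_estimate(row_totals, col_totals):
    """Cell estimates assuming independence: n_ij = row_i * col_j / N."""
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def rmse(observed, estimated):
    """Root mean squared error over all cells of one table."""
    diffs = [o - e for orow, erow in zip(observed, estimated)
             for o, e in zip(orow, erow)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Hypothetical ward: car ownership (rows) x tenure (3 columns)
observed = [[40, 10, 30],   # households with a car
            [10, 30, 20]]   # households without a car
rows = [sum(r) for r in observed]          # [80, 60]
cols = [sum(c) for c in zip(*observed)]    # [50, 40, 50]
estimate = independence_estimate(rows, cols)
print(round(rmse(observed, estimate), 2))
```

The independence estimate always reproduces both margins exactly; its error comes entirely from ignoring the within-ward association, which is what rows 2 to 6 try to borrow from other sources.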
3. Housing affordability
A household is in ‘unaffordable’ housing if:
• household income is in the bottom 40% of national equivalised gross household income, and
• rent/mortgage >= 30% of gross household income
Estimated by reweighting the HES to fit 74 SLA constraints [cell counts] for each of 953 SLAs (SLA ≈ ward)
GREGWT estimates produced by NATSEM (University of Canberra) [GREGWT ≈ GENLOG]
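The two-part definition of unaffordable housing can be written as a simple test applied to each reweighted survey household. A minimal sketch, assuming equivalised income is already computed (the equivalence scale is not given), that the bottom-40% cut-off is a nearest-rank percentile of national incomes, and that an area's share is the weighted proportion of households meeting both conditions; the class and function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Household:
    equivalised_income: float  # equivalised gross household income (scale assumed)
    gross_income: float        # gross household income
    housing_cost: float        # rent or mortgage, same period as gross_income


def bottom_40_cutoff(equivalised_incomes):
    """Nearest-rank 40th percentile of national equivalised incomes."""
    ranked = sorted(equivalised_incomes)
    k = math.ceil(0.4 * len(ranked))
    return ranked[k - 1]


def is_unaffordable(hh, cutoff):
    """Both conditions: bottom 40% of income AND housing cost >= 30% of income."""
    return (hh.equivalised_income <= cutoff
            and hh.housing_cost >= 0.30 * hh.gross_income)


def weighted_share_unaffordable(households, weights, cutoff):
    """Weighted share of an area's households in unaffordable housing."""
    hit = sum(w for hh, w in zip(households, weights) if is_unaffordable(hh, cutoff))
    return hit / sum(weights)


national = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]  # hypothetical
cutoff = bottom_40_cutoff(national)  # 400 under the nearest-rank rule
area = [Household(300, 300, 120), Household(300, 300, 60), Household(800, 800, 400)]
print(weighted_share_unaffordable(area, [2, 1, 1], cutoff))  # 0.5
```

Only the first household meets both conditions; the third pays half its income in housing costs but is excluded by the income test.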
Measures of Fit
Cells:
• AE; APE; Z-score; Zm-score; NFC
Table(s):
• TAE; TAPE; ΣZ²; RSSZ; NFT
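Several of the cell and table measures listed above can be computed directly from an estimated and a target table. This sketch covers AE, TAE, Z-scores, ΣZ² and NFC only. It assumes a common form of the Z-score, Z = (r − p) / √(p(1 − p)/N), with p the target cell proportion, r the estimated cell proportion and N the target total, and treats a cell as non-fitting when |Z| > 1.96; the Zm-score, APE/TAPE, RSSZ and NFT are not reproduced. The example uses the toy target and two estimates from the IPF v. Reweighting slides.

```python
import math


def fit_measures(estimated, target, z_crit=1.96):
    """Cell- and table-level fit of an estimated table against target counts."""
    n = sum(target)
    n_est = sum(estimated)
    ae = [abs(e - t) for e, t in zip(estimated, target)]  # absolute error per cell
    z = []
    for e, t in zip(estimated, target):
        p = t / n        # target proportion in this cell
        r = e / n_est    # estimated proportion in this cell
        if p in (0.0, 1.0):
            # The Z-score is undefined here; the Zm-score exists for this case
            raise ValueError("Z-score undefined for p = 0 or p = 1")
        z.append((r - p) / math.sqrt(p * (1 - p) / n))
    return {
        "AE": ae,
        "TAE": sum(ae),
        "Z": z,
        "SumZ2": sum(v * v for v in z),
        "NFC": sum(1 for v in z if abs(v) > z_crit),  # non-fitting cells
    }


target = [0.89, 1.11, 4.11, 3.89]   # Young M, Young F, Old M, Old F (toy target)
for est in ([5, 5, 0, 0], [1, 1, 4, 4]):
    m = fit_measures(est, target)
    print(est, "TAE =", round(m["TAE"], 2), "NFC =", m["NFC"])
```

Under these assumptions the all-young selection fails in every cell (NFC = 4), while the final selection fits every cell (NFC = 0).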
All SLAs in ACT:
Measure      GREGWT      CO
OTAE         8479.6    113.2
OTAE/HH       518.4      0.3
OTAPE           1.6      0.5
ORSumZ2     55578.4      0.9
ONFT            2.8      0.1
ONFC           14.2      1.9
MaxWeight      18.8      8.0

SLAs with OTAE/HH > 1:
Measure      GREGWT      CO
OTAE        64052.9    100.3
OTAE/HH      3961.5      1.8
OTAPE           9.3      2.0
ORSumZ2    424467.7      3.1
ONFT            8.4      1.0
ONFC           71.9      7.9
MaxWeight     100.3      9.5

SLAs with OTAE/HH <= 0.1:
Measure      GREGWT      CO
OTAE           70.2    100.6
OTAE/HH         0.0      0.1
OTAPE           0.3      0.2
ORSumZ2        33.8      0.4
ONFT            1.6      0.0
ONFC            4.5      0.8
MaxWeight       6.2      7.2
Figure: Households in unaffordable housing, NSW (convergent) SLAs. Model estimates (y-axis) plotted against ABS estimates (x-axis):
• Percentage (axes 0–25%): GREGWT R² = 0.4877; CO R² = 0.7051
• Counts (axes 0–10,000): GREGWT R² = 0.9853; CO R² = 0.9881
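The R² values in these panels are presumably those of a least-squares trend line fitted to each scatter. Assuming so, R² equals the squared Pearson correlation between the ABS and model estimates, which this short sketch computes (the function name and data are hypothetical).

```python
def r_squared(x, y):
    """Squared Pearson correlation: the R² of a least-squares line with intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Hypothetical ABS vs model estimates for four areas
abs_est = [1, 2, 3, 4]
model_est = [2, 4, 6, 8]
print(r_squared(abs_est, model_est))  # 1.0: the points lie exactly on a line
```

R² rewards any linear relationship, so on its own it says nothing about bias: in the example every model value is double the ABS value, yet R² is 1.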
Figure: Households in unaffordable housing, ACT (convergent) SLAs. Model estimates (y-axis) plotted against ABS estimates (x-axis):
• Percentage (axes 0–20%): GREGWT R² = 0.3738; CO R² = 0.1162
• Counts (axes 0–600): GREGWT R² = 0.8401; CO R² = 0.8279
GREGWT weights, all convergent SLAs:
State   n     max wt   wt=0   wt (>0 to 1)   wt (>1 to 10)   wt (>10 to 50)
ACT     90    24       56%    44%            1%              0%
NSW     194   688      49%    36%            14%             1%
VIC     195   353      44%    41%            15%             0%
QLD     444   120      49%    47%            4%              0%

CO weights, all SLAs:
State   n     max wt   wt=0   wt (>0 to 1)   wt (>1 to 10)   wt (>10 to 50)
ACT     107   34       95%    3%             2%              0%
NSW     199   648      79%    7%             13%             1%
VIC     200   242      78%    8%             13%             0%
QLD     454   87       90%    5%             5%              0%
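The banded weight summaries in these tables can be produced by counting weights in each band. A minimal sketch, assuming each percentage is the share of weights falling in the band; a "wt (>50)" band is added here, as an assumption, so that large weights such as the maxima shown are still counted. Names and data are hypothetical.

```python
def weight_profile(weights):
    """Maximum weight and the % of weights falling in each band."""
    bands = {
        "wt=0": lambda w: w == 0,
        "wt (>0 to 1)": lambda w: 0 < w <= 1,
        "wt (>1 to 10)": lambda w: 1 < w <= 10,
        "wt (>10 to 50)": lambda w: 10 < w <= 50,
        "wt (>50)": lambda w: w > 50,  # assumed extra band, not a column above
    }
    n = len(weights)
    shares = {name: round(100 * sum(1 for w in weights if in_band(w)) / n)
              for name, in_band in bands.items()}
    return max(weights), shares


print(weight_profile([0, 0, 0.5, 2, 20, 60]))  # hypothetical weights
```

Negative weights, which unbounded regression-based reweighting can in principle produce, fall into no band in this sketch.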
Solution: CO with non-integer increments?
Cause: Patterns of weight distribution?
4. Conclusion
(a) Accuracy of estimates
(b) Unanswered questions
Constraining Census Tabulations
Census tabulation                                   Cells constrained per table
Age / Sex / Marital status                          84
Household composition / Tenure                      77
Resident status / Sex                               6
Household size / Number of rooms / Tenure           196
Long-term illness / Age / Sex                       14
Dependants                                          7
Socio-economic group of household head / Tenure    100
Age / Sex / Marital status of household head        28
Sex / Marital status / Economic position            56
Age / Sex / Economic position                       180
Ethnic group of household head / Tenure             16
Sex / Economic position / Ethnic group              24
Household composition / Car ownership               33
Occupation / Age / Sex                              20
Total                                               814