


Page 1: The estimation of unknown multiway distributions


Paul Williamson, University of Liverpool, UK

To IPF or to Reweight? That is the question…

Page 2

1. The need for IPF/Reweighting

Survey distribution [of age by sex]

…for local area?

Page 3

Sampling Error: Conditional Probability

SAR District: Leeds (2nd largest in UK)

Economic position     Female (count)   Total (count)   % female   95% CI as % of SAR estimate
Employee full-time    1525             4146            36.8       4.0
On a Govt scheme      31               77              40.3       27.2
Unemployed            168              573             29.3       12.7
Retired               1267             2116            59.9       3.5
Total                 5545             10485           52.9       1.8

Exaggerating the problem?
• 2% sample
• Minimally multivariate
• Not based on minorities (e.g. unemployed ethnic minority)
• Minimum geographical threshold: 120k
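The last column of the Leeds table appears to be consistent with the normal approximation to the binomial: the 95% CI half-width of a sample proportion p from n records is 1.96·√(p(1 − p)/n), here expressed relative to p. The sketch below assumes that is how the column was derived (it reproduces the printed values); the function name `female_share_with_ci` is hypothetical.

```python
import math

# (female count, total count) for each economic position, from the Leeds SAR table.
LEEDS_SAR = {
    "Employee full-time": (1525, 4146),
    "On a Govt scheme": (31, 77),
    "Unemployed": (168, 573),
    "Retired": (1267, 2116),
    "Total": (5545, 10485),
}


def female_share_with_ci(female: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Return (% female, 95% CI half-width as % of the estimate).

    Assumes the normal approximation: half-width = z * sqrt(p * (1 - p) / n).
    """
    p = female / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return 100 * p, 100 * half_width / p


for position, (female, total) in LEEDS_SAR.items():
    pct, rel_ci = female_share_with_ci(female, total)
    print(f"{position:<20} {pct:5.1f} {rel_ci:5.1f}")
```

The relative error grows as the conditioning group shrinks: 77 records on a Government scheme give a CI of about ±27% of the estimate, against about ±4% for 4146 full-time employees.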

Page 4

Need to make the most of all available data…

Survey distribution [of age by sex]

Local age distribution

Local sex distribution

Page 5

2. IPF v. Reweighting

Survey (seed) distribution:
                 Male   Female
Young              1       2
Old                2       3

Seed with local margins:
                 Male   Female
                  50      50
Young    20        1       2
Old      80        2       3

IPF estimate:
                 Male   Female
                  50      50
Young    20       8.9    11.1
Old      80      41.1    38.9
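A minimal sketch of two-way IPF, assuming the seed and margins on this slide: rows and columns are rescaled alternately until both sets of margins fit. IPF keeps the seed's odds ratio (1·3)/(2·2) = 0.75 while matching the new margins, which is how the 1/2/2/3 seed becomes 8.9/11.1/41.1/38.9. The function name `ipf_2d` and its tolerance settings are illustrative choices, not taken from the presentation.

```python
def ipf_2d(seed, row_targets, col_targets, tol=1e-9, max_iter=1000):
    """Iterative proportional fitting for a two-way table."""
    table = [[float(cell) for cell in row] for row in seed]
    for _ in range(max_iter):
        # Row step: scale each row so it sums to its target.
        for i, target in enumerate(row_targets):
            row_sum = sum(table[i])
            table[i] = [cell * target / row_sum for cell in table[i]]
        # Column step: scale each column so it sums to its target.
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in table)
            for row in table:
                row[j] *= target / col_sum
        # Columns fit exactly after the column step; stop once rows fit too.
        if all(abs(sum(row) - t) < tol for row, t in zip(table, row_targets)):
            break
    return table


seed = [[1, 2], [2, 3]]  # survey: rows Young/Old, columns Male/Female
fitted = ipf_2d(seed, row_targets=[20, 80], col_targets=[50, 50])
for label, row in zip(["Young", "Old"], fitted):
    print(label, [round(cell, 1) for cell in row])
# Young [8.9, 11.1]
# Old [41.1, 38.9]
```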

Page 6

2. IPF v. Reweighting

ID   Age     Sex      Weight (initial)   Weight (final)
1    Young   Male     1                  1
2    Young   Male     1                  0
3    Young   Male     1                  0
4    Young   Male     1                  0
5    Young   Male     1                  0
6    Young   Female   1                  1
7    Young   Female   1                  0
8    Young   Female   1                  0
9    Young   Female   1                  0
10   Young   Female   1                  0
11   Old     Male     0                  1
12   Old     Male     0                  1
13   Old     Male     0                  1
14   Old     Male     0                  1
15   Old     Male     0                  0
16   Old     Female   0                  1
17   Old     Female   0                  1
18   Old     Female   0                  1
19   Old     Female   0                  1
20   Old     Female   0                  0

TARGET:
                 Male   Female
                   5       5
Young     2      0.89    1.11
Old       8      4.11    3.89

ESTIMATE (initial weights):
                 Male   Female
                   5       5
Young     2        5       5
Old       8        0       0

ESTIMATE (final weights):
                 Male   Female
                   5       5
Young     2        1       1
Old       8        4       4
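Reweighting by combinatorial optimisation (CO) can be sketched as a swap search over integer weights of 0 or 1: starting from the initial selection (records 1 to 10), repeatedly replace one selected record with an unselected one whenever that lowers the total absolute error against the known margins. This is a minimal steepest-descent version under that assumption; the selection rule, stopping criterion and names (`margin_tae`, `combinatorial_optimisation`) are illustrative and not necessarily those used in the presentation.

```python
from itertools import product

# The 20 survey records: IDs 1-5 Young Male, 6-10 Young Female,
# 11-15 Old Male, 16-20 Old Female.
CATEGORIES = [cat for cat in product(["Young", "Old"], ["Male", "Female"]) for _ in range(5)]
RECORDS = dict(zip(range(1, 21), CATEGORIES))

# Known local constraints (margins) from the TARGET table.
AGE_TARGET = {"Young": 2, "Old": 8}
SEX_TARGET = {"Male": 5, "Female": 5}


def margin_tae(selected):
    """Total absolute error of the selected records against both margins."""
    ages = [RECORDS[r][0] for r in selected]
    sexes = [RECORDS[r][1] for r in selected]
    return (sum(abs(ages.count(a) - t) for a, t in AGE_TARGET.items())
            + sum(abs(sexes.count(s) - t) for s, t in SEX_TARGET.items()))


def combinatorial_optimisation(start):
    """Apply the best single swap (selected out, unselected in) until none improves the fit."""
    selected = set(start)
    while True:
        best_score, best_set = margin_tae(selected), None
        for out_id in sorted(selected):
            for in_id in sorted(set(RECORDS) - selected):
                trial = (selected - {out_id}) | {in_id}
                score = margin_tae(trial)
                if score < best_score:
                    best_score, best_set = score, trial
        if best_set is None:  # no swap improves the fit: stop
            return selected
        selected = best_set


initial = set(range(1, 11))  # initial weights: records 1-10 selected
final = combinatorial_optimisation(initial)
print("initial margin TAE:", margin_tae(initial))  # 16
print("final margin TAE:", margin_tae(final))      # 0
print("selected IDs:", sorted(final))
```

Because only the margins are constrained, several 10-record selections fit them exactly; the one returned here need not be the slide's final selection (records 1, 6, 11 to 14 and 16 to 19), which fits the same margins.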

Page 7

Comparison for margin-constrained tables

Target: age x sex x tenure x economic position (64 counts) at district level (17 districts)

Method    % NFC (17-district average)
2% SAR    32
CO        18
IPFN      22
IPFU      37

Page 8

Simpson & Tranmer (2005)

Target: Car ownership (2) x Tenure (3) (6 counts; 3 %s) for residents at ward level

                                        Average error (RMSE)
                                        9363 wards            816 wards
Source of relationship                  6 counts   3 %s       6 counts   3 %s
None
  1. Independent margins                381        0.209      348        0.189
2% SAR
  2. England & Wales                    69         0.158      60         0.059
  3. Direct SAR area sample             62         0.110      62         0.059
  3a. Multilevel model                  61         0.109      61         0.057
1% SAR, 26 ward types
  4. Direct ward type sample            57         0.093      --         --
  4a. Multilevel model                  58         0.093      --         --
Combinatorial Optimisation
  5. Direct estimate                    --         --         42         0.047
  6. As constraint on IPF               --         --         32         0.045
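Row 1 ("Independent margins") uses no source of relationship at all: each cell is simply row total × column total / grand total. A minimal sketch of that estimate and of a cell-level RMSE follows. For illustration it reuses the age × sex margins and IPF estimate from the earlier IPF example rather than the car ownership × tenure data, which are not reproduced here; how Simpson & Tranmer averaged errors across the 9363 and 816 wards is not stated on the slide, so `rmse` here simply averages over the cells of one table.

```python
import math


def independence_estimate(row_totals, col_totals):
    """Cell estimate with no relationship data: row total x column total / grand total."""
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]


def rmse(estimate, observed):
    """Root mean squared error over all cells of two equally shaped tables."""
    sq = [(e - o) ** 2 for e_row, o_row in zip(estimate, observed) for e, o in zip(e_row, o_row)]
    return math.sqrt(sum(sq) / len(sq))


indep = independence_estimate([20, 80], [50, 50])
print(indep)  # [[10.0, 10.0], [40.0, 40.0]]
print(round(rmse(indep, [[8.9, 11.1], [41.1, 38.9]]), 2))  # 1.1
```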

Page 9

3. Housing affordability

Household in ‘unaffordable’ housing if:
• household income in bottom 40% of national equivalised gross household income
• rent/mortgage >= 30% of gross household income

Estimated by reweighting HES to fit 74 SLA constraints [cell counts] for each of 953 SLAs (SLA~Ward)

GREGWT estimates produced by NATSEM (University of Canberra) [GREGWT≈GENLOG]
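The two conditions can be sketched as a per-household flag. This assumes both conditions must hold, that equivalised income has already been computed (the equivalence scale is not given on the slide), and that "bottom 40%" means strictly below the 40th percentile; boundary handling and the function names are assumptions.

```python
import statistics


def bottom_40_threshold(national_equivalised_incomes):
    """40th percentile of national equivalised gross household income."""
    # quantiles(n=5) returns the 20th, 40th, 60th and 80th percentile cut points.
    return statistics.quantiles(national_equivalised_incomes, n=5)[1]


def in_unaffordable_housing(equivalised_income, gross_income, housing_cost, income_threshold):
    """Assumed rule: low income AND rent/mortgage at least 30% of gross income."""
    low_income = equivalised_income < income_threshold  # strict '<' is an assumption
    cost_burdened = housing_cost >= 0.30 * gross_income
    return low_income and cost_burdened


# Hypothetical household: equivalised income 300, gross income 500, housing cost 200.
print(in_unaffordable_housing(300, 500, 200, income_threshold=400))  # True
```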

Page 10

Measures of Fit

Cells:
• AE; APE; Z-score; Zm-score; NFC

Table(s):
• TAE; TAPE; ΣZ²; RSSZ; NFT
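Three of these measures can be sketched on the reweighting example (target 0.89/1.11/4.11/3.89 against the initial 5/5/0/0 and final 1/1/4/4 estimates). The Z-score form used here, Z = (r − p)/√(p(1 − p)/N) with r and p the estimated and target cell proportions and N the table total, is an assumption about the definition intended; NFC counts cells with |Z| > 1.96. The sketch does not handle target cells with p = 0 or 1.

```python
import math


def tae(estimate, target):
    """Total absolute error: sum of |estimate - target| over cells."""
    return sum(abs(e - t) for e, t in zip(estimate, target))


def z_scores(estimate, target):
    """Assumed form: Z = (r - p) / sqrt(p(1 - p) / N), equal table totals N."""
    n = sum(target)
    scores = []
    for e, t in zip(estimate, target):
        r, p = e / n, t / n
        scores.append((r - p) / math.sqrt(p * (1 - p) / n))
    return scores


def nfc(estimate, target, critical=1.96):
    """Number of non-fitting cells: |Z| above the 5% critical value."""
    return sum(abs(z) > critical for z in z_scores(estimate, target))


# Cells in the order Young Male, Young Female, Old Male, Old Female.
target = [0.89, 1.11, 4.11, 3.89]
initial = [5, 5, 0, 0]
final_est = [1, 1, 4, 4]
for name, est in [("initial", initial), ("final", final_est)]:
    print(name, round(tae(est, target), 2), nfc(est, target))
# initial 16.0 4
# final 0.44 0
```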

Page 11

All SLAs in ACT
Measure       GREGWT        CO
OTAE          8479.6     113.2
OTAE/HH        518.4       0.3
OTAPE            1.6       0.5
ORSumZ2      55578.4       0.9
ONFT             2.8       0.1
ONFC            14.2       1.9
MaxWeight       18.8       8.0

SLAs with OTAE/HH > 1
Measure       GREGWT        CO
OTAE         64052.9     100.3
OTAE/HH       3961.5       1.8
OTAPE            9.3       2.0
ORSumZ2     424467.7       3.1
ONFT             8.4       1.0
ONFC            71.9       7.9
MaxWeight      100.3       9.5

SLAs with OTAE/HH <= 0.1
Measure       GREGWT        CO
OTAE            70.2     100.6
OTAE/HH          0.0       0.1
OTAPE            0.3       0.2
ORSumZ2         33.8       0.4
ONFT             1.6       0.0
ONFC             4.5       0.8
MaxWeight        6.2       7.2

Page 12

Households in unaffordable housing, NSW (convergent) SLAs [scatter plots: GREGWT or CO estimates against ABS estimates]
• Percentage (axes 0% to 25%): GREGWT R² = 0.4877; CO R² = 0.7051
• Counts (axes 0 to 10000): GREGWT R² = 0.9853; CO R² = 0.9881

Page 13

Households in unaffordable housing, ACT (convergent) SLAs [scatter plots: GREGWT or CO estimates against ABS estimates]
• Percentage (axes 0% to 20%): GREGWT R² = 0.3738; CO R² = 0.1162
• Counts (axes 0 to 600): GREGWT R² = 0.8401; CO R² = 0.8279

Page 14

GREGWT weights, all convergent SLAs
State   n     max wt   wt=0   wt (>0 to 1)   wt (>1 to 10)   wt (>10 to 50)
ACT      90     24     56%       44%             1%              0%
NSW     194    688     49%       36%            14%              1%
VIC     195    353     44%       41%            15%              0%
QLD     444    120     49%       47%             4%              0%

CO weights, all SLAs
State   n     max wt   wt=0   wt (>0 to 1)   wt (>1 to 10)   wt (>10 to 50)
ACT     107     34     95%        3%             2%              0%
NSW     199    648     79%        7%            13%              1%
VIC     200    242     78%        8%            13%              0%
QLD     454     87     90%        5%             5%              0%

Solution: CO with non-integer increments?

Cause: Patterns of weight distribution?
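The weight bands in these tables can be sketched for a single set of weights. How the slide aggregated across SLAs (n appears to be the number of SLAs) is not stated, so this only summarises one area's weights; the band edges follow the column headings, with each upper bound taken as inclusive (an assumption). The usage example takes the final 0/1 weights from the reweighting slide.

```python
def weight_profile(weights):
    """Max weight and % of weights in each band used in the table."""
    n = len(weights)
    bands = {
        "wt=0": sum(w == 0 for w in weights),
        "wt (>0 to 1)": sum(0 < w <= 1 for w in weights),
        "wt (>1 to 10)": sum(1 < w <= 10 for w in weights),
        "wt (>10 to 50)": sum(10 < w <= 50 for w in weights),
    }
    return max(weights), {band: round(100 * count / n) for band, count in bands.items()}


# Final weights for records 1-20 from the reweighting slide.
final_weights = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0]
print(weight_profile(final_weights))
```

Integer CO weights can only be 0, 1, 2, …, which helps explain the large wt=0 shares for CO; allowing non-integer increments would relax that restriction.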

Page 15

4. Conclusion

(a) Accuracy of estimates
(b) Unanswered questions

Page 16

Constraining Census Tabulations

Tabulation                                        Cells constrained per table
Age / Sex / Marital status                        84
Household composition / Tenure                   77
Resident status / Sex                             6
Household size / Number of rooms / Tenure        196
Long-term illness / Age / Sex                     14
Dependants                                        7
Socio-economic group of household head / Tenure  100
Age / Sex / Marital status of household head      28
Sex / Marital status / Economic position          56
Age / Sex / Economic position                     180
Ethnic group of household head / Tenure           16
Sex / Economic position / Ethnic group            24
Household composition / Car ownership             33
Occupation / Age / Sex                            20
Total                                             814