j. kylov_gielfeldt - collecting the household data as a sub-sample

Collecting the household data as a sub-sample. Rome, May 2014. Jonas Kylov Gielfeldt

Upload: istituto-nazionale-di-statistica

Post on 22-Jan-2018


Page 1: J. Kylov_Gielfeldt - Collecting the household data as a sub-sample

Collecting the household data as a sub-sample.

Rome May 2014

Jonas Kylov Gielfeldt

Page 2: J. Kylov_Gielfeldt - Collecting the household data as a sub-sample

The broader frame: why are we collecting household data?

• Are households the “natural” unit for collecting LFS variables?

• Not very often! (jobless households and…)

• Is the household unit better for data collection?

• Sometimes yes! With CAPI mode the household is a very sensible unit, but not with CAWI or CATI mode.

Page 3: J. Kylov_Gielfeldt - Collecting the household data as a sub-sample

The economy of it all…

• All NSIs work in an environment where resources are scarce(r).

• Is it justifiable to use a lot of resources on collecting household data if

a) there is no gain in terms of collection mode

b) there is no strict substantive reason for collecting the variables on households instead of on individuals?

Page 4: J. Kylov_Gielfeldt - Collecting the household data as a sub-sample

The Danish case


• In Denmark we collect the core-LFS through CATI mode; this mode is better suited to individuals as the unit.

• We are obliged to collect the household data; this is done in a combination of CAWI/CATI.

• Since collecting on households does not fit our collection mode, and we do not see a substantive reason for collecting LFS variables on households, we use a sub-sample to minimize costs.

Page 5: J. Kylov_Gielfeldt - Collecting the household data as a sub-sample

The core-sample and the sub-sample

• The core-LFS gross sample: 40.000 persons per quarter

• The number of respondents per quarter: 22.000 persons

• The gross sub-sample: 11.000 persons (not including core-LFS respondents)

• The number of respondents: 6.000 persons
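The response rates implied by these figures can be worked out directly. The sketch below uses only the four numbers quoted on the slide; they look rounded, so the resulting rates should be read as approximate.

```python
# Figures quoted on the slide (persons per quarter, probably rounded).
core_gross, core_respondents = 40_000, 22_000
sub_gross, sub_respondents = 11_000, 6_000


def response_rate(respondents: int, gross_sample: int) -> float:
    """Share of the gross sample that responded, in percent."""
    return 100 * respondents / gross_sample


core_rate = response_rate(core_respondents, core_gross)
sub_rate = response_rate(sub_respondents, sub_gross)

print(f"Core-LFS response rate:   {core_rate:.1f}%")
print(f"Sub-sample response rate: {sub_rate:.1f}%")
```

Both samples end up with a response rate of roughly 55%.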

Page 6: J. Kylov_Gielfeldt - Collecting the household data as a sub-sample

Why use a sub-sample


• If the NSI primarily uses CATI, collecting the whole household through this mode will increase costs significantly.

• Collecting households as the core-sample quadruples the costs! The alternative is to reduce the sample size, risking increased bias and a cluster effect (household members are often alike).

Costs saved by sub-sampling

6000 respondents quadrupled                                   Euro       DKR (at 7,45 per Euro)
Number of respondents                                         24.000
Current average price for ca. 6000 respondents on HH          29.600     220.520
Price quadrupled                                              118.400    882.080
Difference (saved costs)                                      88.800     661.560
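The arithmetic behind the cost table can be reproduced in a few lines. This is a sketch based only on the figures in the table; the factor of four and the exchange rate of 7,45 are taken from the slide, and the per-respondent cost is a derived figure that the slide itself does not state.

```python
# Inputs taken from the cost table on the slide.
EUR_TO_DKK = 7.45          # exchange rate used on the slide
respondents = 6_000        # approximate number of household respondents
current_cost_eur = 29_600  # current average price for those respondents
factor = 4                 # "quadrupled" scenario

scaled_respondents = respondents * factor
scaled_cost_eur = current_cost_eur * factor
saved_eur = scaled_cost_eur - current_cost_eur

# Derived figure, not stated on the slide.
cost_per_respondent_eur = current_cost_eur / respondents


def to_dkk(amount_eur: float) -> int:
    """Convert Euro to Danish kroner, rounded to whole kroner."""
    return round(amount_eur * EUR_TO_DKK)


print(f"Respondents if quadrupled: {scaled_respondents}")
print(f"Current cost:    {current_cost_eur} EUR = {to_dkk(current_cost_eur)} DKK")
print(f"Quadrupled cost: {scaled_cost_eur} EUR = {to_dkk(scaled_cost_eur)} DKK")
print(f"Saved:           {saved_eur} EUR = {to_dkk(saved_eur)} DKK")
print(f"Cost per respondent: {cost_per_respondent_eur:.2f} EUR")
```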

Page 7: J. Kylov_Gielfeldt - Collecting the household data as a sub-sample

Different sample sizes mean different weighting models


• The weighting model of the core-LFS

Variables                                  Groupings   Note
age11                                      11 grp
sex                                        2 grp       Information is crossed
region                                     5 grp
age6                                       6 grp       Information is crossed
education                                  3 grp
socio-economic status                      8 grp
number of children in the household        4 grp
citizenship                                4 grp
registered as unemployed                   12 grp
gross income                               4 grp
moved                                      2 grp

Page 8: J. Kylov_Gielfeldt - Collecting the household data as a sub-sample

On the core-LFS weighting model

• Non-response in the Danish LFS is quite high, but many high-quality registers are available.

• These registers are used as auxiliary information in a rather complex weighting model.

• The weighting model is optimized for the number of individuals in the population and is especially designed to control bias on, e.g., labour market status, education etc.
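To make the idea of register-based auxiliary information concrete, here is a minimal sketch of raking (iterative proportional fitting), one common way to adjust weights so that weighted sample totals match known register totals. The slides do not say which estimator Statistics Denmark uses, so the choice of raking, the function names `rake` and `weighted_margin`, and all numbers below are illustrative assumptions.

```python
from collections import defaultdict


def rake(respondents, margins, iterations=100):
    """Iterative proportional fitting on categorical margins.

    respondents: list of dicts mapping variable name -> category
    margins: {variable: {category: known population total}}
    Returns one weight per respondent.
    """
    weights = [1.0] * len(respondents)
    for _ in range(iterations):
        for variable, targets in margins.items():
            # Current weighted total in each category of this variable.
            weighted_totals = defaultdict(float)
            for person, weight in zip(respondents, weights):
                weighted_totals[person[variable]] += weight
            # Scale weights so this margin matches the register total.
            weights = [
                weight * targets[person[variable]] / weighted_totals[person[variable]]
                for person, weight in zip(respondents, weights)
            ]
    return weights


def weighted_margin(respondents, weights, variable):
    """Weighted total per category of one variable."""
    totals = defaultdict(float)
    for person, weight in zip(respondents, weights):
        totals[person[variable]] += weight
    return dict(totals)


# Hypothetical respondents: the higher educated are overrepresented.
respondents = (
    [{"sex": "F", "education": "high"}] * 35
    + [{"sex": "F", "education": "low"}] * 15
    + [{"sex": "M", "education": "high"}] * 25
    + [{"sex": "M", "education": "low"}] * 25
)

# Hypothetical register totals for a population of 1000 persons.
register_margins = {
    "sex": {"F": 500, "M": 500},
    "education": {"high": 400, "low": 600},
}

weights = rake(respondents, register_margins)
sex_totals = weighted_margin(respondents, weights, "sex")
education_totals = weighted_margin(respondents, weights, "education")
print(sex_totals)
print(education_totals)
```

After raking, the weighted totals reproduce both register margins, even though the unweighted sample contains 60% higher educated against 40% in the hypothetical register.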

Page 9: J. Kylov_Gielfeldt - Collecting the household data as a sub-sample

The weighting model of the household-sample


Variables                                        Groupings   Note
age                                              3 grp
sex                                              2 grp       Information is crossed
family type                                      6 grp
size of household                                4 grp
a person from the household has moved            2 grp
only Danes in household or mixed household       2 grp
average age of the household                     3 grp
gross household income                           4 grp

Page 10: J. Kylov_Gielfeldt - Collecting the household data as a sub-sample

On the household weighting model

• This weighting model is optimized for both the number of individuals in the population and the total number of households.

• This means that new variables must be added as auxiliary information (family type, size of household etc.).

• At the same time, the smaller sample size limits the amount of auxiliary information that can be used.
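One way to see why a smaller sample limits the auxiliary information is to count the cells created by crossing variables and divide the respondents among them. The sketch below assumes that age11, sex and region are crossed in the core model and that age, sex and family type are crossed in the household model; the slides mark crossed information but do not spell out exactly which variables are involved. It also assumes respondents are spread evenly over cells, which real data never are.

```python
from math import prod


def average_cell_size(respondents, group_sizes):
    """Number of crossed cells and average respondents per cell."""
    cells = prod(group_sizes)
    return cells, respondents / cells


# Core-LFS: ca. 22.000 respondents; age11 x sex x region (assumed crossing).
core_cells, core_avg = average_cell_size(22_000, [11, 2, 5])

# Household sample: ca. 6.000 respondents; age x sex x family type (assumed crossing).
hh_cells, hh_avg = average_cell_size(6_000, [3, 2, 6])

# Hypothetical: also crossing education (3 groups) into the household model.
hh_edu_cells, hh_edu_avg = average_cell_size(6_000, [3, 2, 6, 3])

print(f"Core model:           {core_cells} cells, {core_avg:.1f} respondents per cell")
print(f"Household model:      {hh_cells} cells, {hh_avg:.1f} respondents per cell")
print(f"Household + education: {hh_edu_cells} cells, {hh_edu_avg:.1f} respondents per cell")
```

Under these assumptions, every extra crossed variable multiplies the number of cells, so the average cell in the household sample thins out quickly.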

Page 11: J. Kylov_Gielfeldt - Collecting the household data as a sub-sample

Differences in estimates: the example of education

• Education is added as auxiliary information in the core-LFS model but not in the household model.

• This leads to differences in estimates.

Highest level of education completed (25-64 years), %

                                    2011       2011     2012       2012     2013       2013
                                    Core-LFS   HH-LFS   Core-LFS   HH-LFS   Core-LFS   HH-LFS
At most lower secondary level       23,1       19,4     22,1       18,3     21,7       19,1
Upper secondary level               43,2       41,6     43,1       41,0     42,8       40,6
Third level                         33,7       39,0     34,8       40,6     35,4       40,2

                                                                 2011       2011     2012       2012     2013       2013
                                                                 Core-LFS   HH-LFS   Core-LFS   HH-LFS   Core-LFS   HH-LFS
Min. ISCED3c long / upper secondary level (20-24 years), %       70,0       74,9     72,0       74,9     71,8       76,1
Early leavers from education and training (18-24 years), %       9,7        7,9      9,1        8,0      8,1        6,9
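The size of the gap between the two sets of estimates is easier to see as differences in percentage points. The sketch below simply subtracts the core-LFS figure from the household-LFS figure for each indicator and year, using the values from the tables above (written with decimal points, as Python requires).

```python
# (core-LFS, HH-LFS) estimates in percent, copied from the tables above.
estimates = {
    "At most lower secondary level": {2011: (23.1, 19.4), 2012: (22.1, 18.3), 2013: (21.7, 19.1)},
    "Upper secondary level": {2011: (43.2, 41.6), 2012: (43.1, 41.0), 2013: (42.8, 40.6)},
    "Third level": {2011: (33.7, 39.0), 2012: (34.8, 40.6), 2013: (35.4, 40.2)},
    "Min. ISCED3c long / upper secondary (20-24)": {2011: (70.0, 74.9), 2012: (72.0, 74.9), 2013: (71.8, 76.1)},
    "Early leavers (18-24)": {2011: (9.7, 7.9), 2012: (9.1, 8.0), 2013: (8.1, 6.9)},
}

# HH-LFS minus core-LFS, in percentage points.
differences = {
    indicator: {year: round(hh - core, 1) for year, (core, hh) in by_year.items()}
    for indicator, by_year in estimates.items()
}

for indicator, by_year in differences.items():
    formatted = ", ".join(f"{year}: {diff:+.1f}" for year, diff in by_year.items())
    print(f"{indicator}: {formatted}")
```

In every year the household-LFS gives a higher share for third-level education (about 5 to 6 percentage points) and a lower share of early leavers than the core-LFS.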

Page 12: J. Kylov_Gielfeldt - Collecting the household data as a sub-sample

The auxiliary information on education

• The difference between the core-LFS and the household-LFS shows that the auxiliary information helps deal with the overrepresentation of the higher educated.

• But it is not possible to use this information in the household model, since it would make the model too complex.

• The household model does not handle the bias at all.
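The mechanism can be illustrated with a small post-stratification example. All numbers below are made up for illustration and are not Danish data: a hypothetical register gives the true education distribution, the higher educated are overrepresented among respondents, and employment differs by education level.

```python
# Hypothetical register totals by education level (not real data).
population = {"low": 250_000, "medium": 430_000, "high": 320_000}

# Hypothetical respondents: the higher educated respond more often.
respondents = {"low": 150, "medium": 400, "high": 450}

# Hypothetical number of employed respondents in each group.
employed = {"low": 75, "medium": 300, "high": 405}

# Post-stratification weight: persons in the register per respondent.
weights = {group: population[group] / respondents[group] for group in population}

n = sum(respondents.values())
total_population = sum(population.values())

# Share of higher educated, without and with education in the weighting.
unweighted_high_share = respondents["high"] / n
weighted_high_share = weights["high"] * respondents["high"] / total_population

# Employment rate, without and with education in the weighting.
unweighted_employment = sum(employed.values()) / n
weighted_employment = (
    sum(weights[group] * employed[group] for group in population) / total_population
)

print(f"Higher educated: unweighted {unweighted_high_share:.1%}, weighted {weighted_high_share:.1%}")
print(f"Employment rate: unweighted {unweighted_employment:.1%}, weighted {weighted_employment:.1%}")
```

In this toy example, leaving education out of the weighting overstates both the share of higher educated and the employment rate, which is the kind of bias the household model cannot correct.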