nonresponse bias in studies of residential mobility

Nonresponse bias in studies of residential mobility Elizabeth Washbrook, Paul Clarke and Fiona Steele University of Bristol Research Methods Festival, 3 July 2012

Page 1: Nonresponse  bias in studies of residential mobility

Nonresponse bias in studies of residential mobility

Elizabeth Washbrook, Paul Clarke and Fiona Steele

University of Bristol

Research Methods Festival, 3 July 2012

Page 2: Nonresponse  bias in studies of residential mobility

The problem of panel nonresponse

• Household survey panel data permit social scientists to analyse a wide range of issues that cannot be addressed with cross-sectional data
• But the value of panel data is potentially undermined by nonresponse (dropout or intermittent missingness)
– Smaller sample sizes reduce the efficiency of estimates
– More seriously, selective nonresponse can lead to biased estimates: those who remain in the sample become untypical of the population as a whole

Page 3: Nonresponse  bias in studies of residential mobility

Residential mobility application

• The study of residential mobility/migration is at the core of studies of demography and the life course: how do different groups change their housing or location in response to changing circumstances?
• Nonresponse issues are rarely considered in the substantive literature on mobility, yet there are reasons to think they might be even more of a problem here than in other applications.
• Moving house (the outcome of interest) is often cited as a key reason why people drop out of panel surveys → movers who remain are not typical of movers as a whole
– PSID 1968-1989 had a 51% attrition rate. Fitzgerald et al. (JHR 1998) provide data showing at least 20% of attritors were lost following a move

Page 4: Nonresponse  bias in studies of residential mobility

A standard model for mobility

$$Y_{it}^{*} = \boldsymbol{\beta}'\mathbf{X}_{it-1} + U_i + \varepsilon_{it}, \qquad Y_{it} = I(Y_{it}^{*} > 0)$$

$$U_i \sim N(0, \sigma^2), \qquad \varepsilon_{it} \sim N(0, 1)$$

• $Y_{it}^{*}$ is the unobserved latent propensity of individual i to move in the interval [t-1, t). Observed mobility status, $Y_{it}$, depends on whether this propensity is greater or less than zero.
• $\mathbf{X}_{it-1}$ is a vector of fully observed covariates measured at t-1 (prior to any potential move), including an intercept.
• $\boldsymbol{\beta}$ is the coefficient vector of interest.
• $U_i$ is an individual random effect.

Similar models have been used in numerous studies of residential mobility and migration (e.g. Boheim & Taylor, 2002; Ioannides & Kan, 1996; Ermisch, 1999; Clark & Huang, 2003; Rabe & Taylor, 2010)
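As a rough illustration of the data-generating process this model describes, the sketch below simulates a small panel from the random-effects probit. The single binary covariate, the coefficient values and the function name `simulate_mobility` are assumptions made for the example; none of them come from the slides.

```python
import random

def simulate_mobility(n_individuals, n_waves, beta, sigma, seed=0):
    """Simulate Y_it = I(beta'X_{it-1} + U_i + eps_it > 0).

    X holds an intercept and one hypothetical binary covariate
    (think "private renter at t-1"); all values are illustrative.
    """
    rng = random.Random(seed)
    rows = []
    for i in range(n_individuals):
        u_i = rng.gauss(0.0, sigma)          # individual random effect U_i
        for t in range(2, n_waves + 1):      # first outcome is observed at t = 2
            x = [1.0, float(rng.random() < 0.3)]
            latent = sum(b * xk for b, xk in zip(beta, x)) + u_i + rng.gauss(0.0, 1.0)
            rows.append((i, t, x, int(latent > 0)))
    return rows

rows = simulate_mobility(2000, 6, beta=[-1.5, 0.9], sigma=0.3)
move_rate = sum(r[3] for r in rows) / len(rows)
print(f"simulated annual mobility rate: {move_rate:.3f}")
```

Because the random effect is drawn once per individual, moves are positively correlated within a person across waves, which is the feature the term $U_i$ is meant to capture.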

Page 5: Nonresponse  bias in studies of residential mobility

Modelling response

Define $R_{it} = 1$ if $Y_{it}$ is observed and $R_{it} = 0$ otherwise.

Estimates of $\boldsymbol{\beta}$ based on the sample where $R_{it} = 1$ will be biased unless the data are 'missing at random' (MAR), that is, unless

$$\Pr(R_{it} = 1 \mid Y_{it}, \mathbf{X}_{it-1}) = \Pr(R_{it} = 1 \mid \mathbf{X}_{it-1}).$$

A vast literature has explored what happens when MAR doesn't hold, i.e. when response is nonignorable. Broadly there are two approaches.

• Specify and estimate a model for the missing data mechanism simultaneously with the outcome of interest (e.g. Hausman and Wise, 1979; Diggle and Kenward, 1994)
– Has to rely on untestable assumptions about functional form and/or the validity of exclusion restrictions on the DGP (instrumental variables)
• Assess the sensitivity of estimates to small departures from the MAR assumption (e.g. Copas and Li, 1997)
– Avoids modelling the unknown missing data mechanism but will not be valid if the nonignorability of response is extreme

Page 6: Nonresponse  bias in studies of residential mobility

The direct dependence (DD) model

If moving directly affects whether an individual continues to participate in the panel then MAR is automatically violated. We can express this via a 'direct dependence' (DD) model for the response propensity:

$$R_{it}^{*} = \boldsymbol{\Psi}'\mathbf{X}_{it-1} + \gamma Y_{it} + \nu_{it}, \qquad R_{it} = I(R_{it}^{*} > 0), \qquad \nu_{it} \sim N(0, 1)$$

The term $\gamma Y_{it}$ captures the idea that the response propensity of an individual with given $\mathbf{X}_{it-1}$ and $\nu_{it}$ will differ by the amount $\gamma$ if they move, relative to the case in which they do not move. In the mobility example we expect that $\gamma < 0$. Previous studies implicitly assume that $\gamma = 0$ so that MAR is satisfied.
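The consequence of $\gamma < 0$ can be seen in a small simulation: when movers are less likely to respond, the mobility rate among respondents understates the rate in the full sample. The sketch below is a cross-sectional simplification with no covariates or random effect; the move probability and response intercept are hypothetical, and $\gamma$ is set to the DD estimate reported later in the deck purely for illustration.

```python
import random

def simulate_dd(n, p_move, psi0, gamma, seed=1):
    """Draw moves, then response R = I(psi0 + gamma*Y + nu > 0), nu ~ N(0,1)."""
    rng = random.Random(seed)
    movers_all = movers_observed = n_observed = 0
    for _ in range(n):
        y = int(rng.random() < p_move)                       # true mobility status
        r = int(psi0 + gamma * y + rng.gauss(0.0, 1.0) > 0)  # response depends on y
        movers_all += y
        if r:
            n_observed += 1
            movers_observed += y
    return movers_all / n, movers_observed / n_observed, n_observed / n

# Hypothetical inputs: 15% true mobility, 95% response among non-movers.
true_rate, observed_rate, response_rate = simulate_dd(
    200_000, p_move=0.15, psi0=1.645, gamma=-1.465
)
print(f"true {true_rate:.3f}, observed {observed_rate:.3f}, response {response_rate:.3f}")
```

Setting `gamma=0.0` in the same call makes the two rates agree up to sampling noise, which is the MAR case that previous studies implicitly assume.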

Page 7: Nonresponse  bias in studies of residential mobility

An alternative response model

A common alternative model of response is the bivariate probit (BP) model. This sets $\gamma$ equal to zero by definition, but allows for correlated errors in the mobility and response equations.

A simple cross-sectional version of this model for a continuous outcome variable is the basis for the well-known two-step Heckman selection estimator.

$$R_{it}^{*} = \boldsymbol{\Psi}'\mathbf{X}_{it-1} + \nu_{it}, \qquad R_{it} = I(R_{it}^{*} > 0), \qquad (\varepsilon_{it}, \nu_{it}) \sim \text{bivnorm}(0, 0, 1, 1, \rho)$$

In the BP model non-ignorability is essentially an omitted variables problem. If all relevant factors can be observed and controlled, $\rho$ could be driven to zero and MAR would be satisfied. This is not so in the DD model. The causal effect of a move on response will lead to biases even with no omitted variables.
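To make the contrast concrete, the sketch below simulates the BP mechanism in a stripped-down form: response does not depend on the move itself, only on an error that is correlated with the mobility error. The two index values and the function name `simulate_bp` are assumptions for illustration; the correlation is set to the BP estimate reported later in the deck.

```python
import random
from math import sqrt

def simulate_bp(n, mobility_index, response_index, rho, seed=2):
    """Y = I(mobility_index + eps > 0), R = I(response_index + nu > 0),
    with (eps, nu) standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    movers_all = movers_observed = n_observed = 0
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        # Build nu so that corr(eps, nu) = rho and var(nu) = 1.
        nu = rho * eps + sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        y = int(mobility_index + eps > 0)
        r = int(response_index + nu > 0)
        movers_all += y
        if r:
            n_observed += 1
            movers_observed += y
    return movers_all / n, movers_observed / n_observed

true_rate, observed_rate = simulate_bp(200_000, -1.3, 1.6, rho=-0.419)
print(f"true {true_rate:.3f}, observed among respondents {observed_rate:.3f}")
```

With `rho=0.0` the observed and true rates coincide up to sampling noise, mirroring the point that in the BP model the bias disappears once the shared unobservables are accounted for.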

Page 8: Nonresponse  bias in studies of residential mobility

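Maximum likelihood estimation

The sample likelihood contribution for a given individual in a given year is $\mathcal{L}_{it}$. It takes one of three possible forms depending on the observed $Y_{it}$ and $R_{it}$:

| Likelihood contribution | Response | Mobility | Interpretation [t-1, t) |
|---|---|---|---|
| $\mathcal{L}^{A}_{it}$ | $R_{it} = 0$ | Unobserved | Dropout |
| $\mathcal{L}^{B}_{it}$ | $R_{it} = 1$ | $Y_{it} = 0$ | Remained in panel; no move |
| $\mathcal{L}^{C}_{it}$ | $R_{it} = 1$ | $Y_{it} = 1$ | Remained in panel; moved |

$$\mathcal{L}_{it} = (1 - R_{it})\,\mathcal{L}^{A}_{it} + R_{it}(1 - Y_{it})\,\mathcal{L}^{B}_{it} + R_{it} Y_{it}\,\mathcal{L}^{C}_{it}$$

The log likelihood for individual i is then obtained by integrating over the $N(0, \sigma^2)$ density of the random effect:

$$\log \mathcal{L}_{i} = \log \int_{-\infty}^{\infty} \frac{1}{\sigma}\,\phi\!\left(\frac{U_i}{\sigma}\right) \left\{ \prod_{t=2}^{T_i} \mathcal{L}_{it} \right\} dU_i$$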

Page 9: Nonresponse  bias in studies of residential mobility

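Maximum likelihood estimation

$$\mathcal{L}_{it} = (1 - R_{it})\,\mathcal{L}^{A}_{it} + R_{it}(1 - Y_{it})\,\mathcal{L}^{B}_{it} + R_{it} Y_{it}\,\mathcal{L}^{C}_{it}$$

DD likelihood

$$\mathcal{L}^{A}_{it} = \Phi(\boldsymbol{\beta}'\mathbf{X}_{it-1} + U_i)\,\Phi(-[\boldsymbol{\Psi}'\mathbf{X}_{it-1} + \gamma]) + \Phi(-[\boldsymbol{\beta}'\mathbf{X}_{it-1} + U_i])\,\Phi(-\boldsymbol{\Psi}'\mathbf{X}_{it-1})$$

$$\mathcal{L}^{B}_{it} = \Phi(-[\boldsymbol{\beta}'\mathbf{X}_{it-1} + U_i])\,\Phi(\boldsymbol{\Psi}'\mathbf{X}_{it-1})$$

$$\mathcal{L}^{C}_{it} = \Phi(\boldsymbol{\beta}'\mathbf{X}_{it-1} + U_i)\,\Phi(\boldsymbol{\Psi}'\mathbf{X}_{it-1} + \gamma)$$

BP likelihood

$$\mathcal{L}^{A}_{it} = \Phi(-\boldsymbol{\Psi}'\mathbf{X}_{it-1})$$

$$\mathcal{L}^{B}_{it} = \Phi_2(-[\boldsymbol{\beta}'\mathbf{X}_{it-1} + U_i],\ \boldsymbol{\Psi}'\mathbf{X}_{it-1},\ -\rho)$$

$$\mathcal{L}^{C}_{it} = \Phi_2(\boldsymbol{\beta}'\mathbf{X}_{it-1} + U_i,\ \boldsymbol{\Psi}'\mathbf{X}_{it-1},\ \rho)$$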
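A minimal sketch of how the DD contributions and the integral over the random effect might be coded is given below. It is not the authors' implementation. It takes the linear indices $\boldsymbol{\beta}'\mathbf{X}_{it-1}$ and $\boldsymbol{\Psi}'\mathbf{X}_{it-1}$ as precomputed numbers (`xb`, `xpsi`), writes the dropout probability for a mover as $1 - \Phi(\boldsymbol{\Psi}'\mathbf{X}_{it-1} + \gamma)$ as implied by the DD response equation, and replaces the integral with a simple Riemann sum on a grid. The function names and the grid settings are assumptions.

```python
from math import erf, exp, log, pi, sqrt

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def phi(z):
    """Standard normal density."""
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)

def dd_contribution(r, y, xb, xpsi, u, gamma):
    """L_it under the DD model for one person-year, given random effect u."""
    p_move = Phi(xb + u)
    if r == 0:   # dropout: mobility unobserved, so sum over both move outcomes
        return p_move * Phi(-(xpsi + gamma)) + (1.0 - p_move) * Phi(-xpsi)
    if y == 1:   # responded and moved
        return p_move * Phi(xpsi + gamma)
    return (1.0 - p_move) * Phi(xpsi)   # responded, did not move

def dd_loglik_individual(history, sigma, gamma, n_grid=201, width=6.0):
    """log of the integral over U_i ~ N(0, sigma^2) of the product of L_it.

    history: list of (r, y, xb, xpsi); y is ignored when r == 0.
    Uses an equally spaced grid on [-width*sigma, width*sigma]; sigma > 0.
    """
    step = 2.0 * width * sigma / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        u = -width * sigma + k * step
        weight = phi(u / sigma) / sigma * step
        product = 1.0
        for r, y, xb, xpsi in history:
            product *= dd_contribution(r, y, xb, xpsi, u, gamma)
        total += weight * product
    return log(total)

# Hypothetical three-wave history: stay, move, then drop out.
history = [(1, 0, -1.0, 1.5), (1, 1, -1.0, 1.5), (0, None, -1.0, 1.5)]
print(dd_loglik_individual(history, sigma=0.3, gamma=-1.465))
```

A useful sanity check is that the three contributions for any person-year sum to one, since dropout, "responded and stayed" and "responded and moved" exhaust the possible outcomes. In practice Gauss-Hermite quadrature would usually be preferred to a fixed grid; the grid is used here only to keep the example dependency-free.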

Page 10: Nonresponse  bias in studies of residential mobility

Exclusion restrictions

The models set out rely heavily on the untestable assumption that the error terms are normally distributed. Likelihood-based estimates that rely solely on functional form for identification are well known to be sensitive to failures of the distributional assumptions.

However, the imposition of exclusion restrictions (the inclusion of instrumental variables) can dramatically improve the stability and robustness of the model.

The inclusion of a response instrument – something that predicts response but not the outcome of interest – is common practice in the standard continuous-variable two-equation Heckman selection model.

The statistics literature has also explored the role of an outcome instrument – something that predicts the outcome but not response – and has shown that this weakens the modelling assumptions necessary for identification.

Page 11: Nonresponse  bias in studies of residential mobility

Residential mobility in the BHPS

• The BHPS is a representative sample of 5,500 households in 1991, interviewed annually (18 waves of data on over 10,000 individuals).
• Sample of men aged 20-59, living in England or Wales in year t-1, from Waves 6-18 (1996-2008)
– Full-time students and retirees excluded
– Focus on men avoids the 'double-counting' problem in which sample individuals move together as a couple
• 4,724 individuals contributing 33,347 person-year observations (mean 7.1)

Page 12: Nonresponse  bias in studies of residential mobility

Residential mobility in the BHPS

• Outcome = 1 if individual moved to a different residence within the same region between t-1 and t (longer distance moves coded 0)
– The majority of moves are local (85% in this sample)
– Motivations for short- and long-distance moves tend to be quite different: long-distance moves are more job-related while short-distance moves are more housing-related
• Outcome observed for 94.5% of observations, among which the mobility rate is 9.6%.
• 38% of sample individuals are known to have moved at least once, 16% more than once.
• 36% drop out of the panel at least once, 6% re-enter at a later wave

Page 13: Nonresponse  bias in studies of residential mobility

Exclusion restrictions

Outcome instrument
– Log average sale price of properties in region of residence over 12 months prior to t-1, deflated by RPI. From Land Registry data (only available for England and Wales from 1995 onwards).
– Expect that high house prices will deter mobility, but will have no independent effect on response, conditional on year and region fixed effects.

Response instrument
– Sample membership status. Original 1991 sample adult (OSM; omitted), 65%; ECHP joiner in 1997, 4%; Celtic booster sample joiner in 1999, 14%; parent of an OSM's child, 9%; original 1991 sample child, 8%. TSMs dropped.
– Survey-related variables are often used as instruments in this context (e.g. Cappellari and Jenkins 2008). The rationale is that stronger survey attachment will have been fostered among OSMs than among later joiners or those involved only because of family ties.

Page 14: Nonresponse  bias in studies of residential mobility

Results I. Nonignorability and IV parameters

| | DD model: Coef | DD model: SE | BP model: Coef | BP model: SE |
|---|---|---|---|---|
| Mobility instrument | | | | |
| Log region house price | -0.257 | 0.144 (p<0.10) | -0.285 | 0.146 (p<0.10) |
| Response instrument | | | | |
| OSM adult at Wave 1 | 0 [ref] | | 0 [ref] | |
| ECHP joiner | -1.159 | 0.079 | -0.984 | 0.047 |
| Celtic joiner | -0.194 | 0.064 | -0.163 | 0.054 |
| PSM joiner | -0.069 | 0.055 | -0.075 | 0.046 |
| Child joiner | -0.163 | 0.059 | -0.143 | 0.049 |
| Joint p-value | p<0.01 | | p<0.01 | |
| Random effect SE (σμ) | 0.116 | | 0.105 | |
| Nonignorability parameter (γ) | -1.465 | 0.198 | - | |
| Error correlation (ρ) | - | | -0.419 | 0.129 |
| Log likelihood | -15138.5 | | -15148.4 | |
| Total obs | 33347 | | 33347 | |
| Uncensored obs | 31511 | | 31511 | |

Value of γ implies moving reduces the expected response probability from 0.95 to 0.55.
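A back-of-envelope version of this calculation can be done by shifting the probit index: if a non-mover responds with probability 0.95, the same individual's response probability after a move is $\Phi(\Phi^{-1}(0.95) + \gamma)$. The sketch below evaluates this at a single index value, which is an assumption; the slide's figure of 0.55 presumably averages over the covariate distribution in the sample, so the two numbers need not match exactly.

```python
from statistics import NormalDist

std_normal = NormalDist()            # standard normal, mean 0 and sd 1
gamma = -1.465                       # DD nonignorability estimate from the table
baseline_prob = 0.95                 # response probability if the individual does not move

baseline_index = std_normal.inv_cdf(baseline_prob)        # Psi'X implied by 0.95
prob_if_moved = std_normal.cdf(baseline_index + gamma)    # shift the index by gamma
print(round(prob_if_moved, 2))       # prints 0.57
```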

Page 15: Nonresponse  bias in studies of residential mobility

Results II. Covariates of interest

| | MAR (RE probit): Coef | SE | | DD model: Coef | SE | | BP model: Coef | SE | |
|---|---|---|---|---|---|---|---|---|---|
| Unemployed (ref = employed) | 0.105 | 0.049 | * | 0.160 | 0.048 | ** | 0.138 | 0.050 | ** |
| Inactive (ref = employed) | 0.066 | 0.053 | | 0.100 | 0.052 | + | 0.084 | 0.053 | |
| Employed partner | 0.020 | 0.034 | | 0.022 | 0.033 | | 0.022 | 0.033 | |
| Educ: O-level (ref = none) | 0.031 | 0.036 | | 0.021 | 0.035 | | 0.028 | 0.035 | |
| Educ: A-level (ref = none) | 0.057 | 0.035 | + | 0.022 | 0.035 | | 0.042 | 0.035 | |
| Educ: degree (ref = none) | 0.075 | 0.043 | + | 0.021 | 0.043 | | 0.055 | 0.043 | |
| Single (ref = married) | 0.170 | 0.049 | ** | 0.237 | 0.048 | ** | 0.196 | 0.049 | ** |
| Cohabiting (ref = married) | 0.138 | 0.035 | ** | 0.154 | 0.034 | ** | 0.143 | 0.034 | ** |
| Private renter (ref = owner) | 0.884 | 0.037 | ** | 0.865 | 0.039 | ** | 0.883 | 0.037 | ** |
| Social renter (ref = owner) | 0.265 | 0.043 | ** | 0.305 | 0.042 | ** | 0.283 | 0.042 | ** |
| Lives with parents (ref = owner) | 0.110 | 0.050 | * | 0.118 | 0.049 | * | 0.113 | 0.049 | * |
| Household income (log) | 0.014 | 0.021 | | 0.009 | 0.021 | | 0.012 | 0.021 | |
| Rooms per person | -0.100 | 0.017 | ** | -0.096 | 0.017 | ** | -0.099 | 0.017 | ** |
| Age 30-39 (ref = 20-29) | -0.178 | 0.032 | ** | -0.188 | 0.031 | ** | -0.189 | 0.032 | ** |
| Age 40-49 (ref = 20-29) | -0.462 | 0.040 | ** | -0.478 | 0.040 | ** | -0.476 | 0.040 | ** |
| Age 50-59 (ref = 20-29) | -0.639 | 0.047 | ** | -0.652 | 0.047 | ** | -0.649 | 0.046 | ** |

** p<.01, * p<.05, + p<.1. Models also control for year, region, children of different ages and log regional house prices.

Page 16: Nonresponse  bias in studies of residential mobility

Results III. Response equation

| | DD model: Coef | SE | | BP model: Coef | SE | |
|---|---|---|---|---|---|---|
| Unemployed (ref = employed) | -0.235 | 0.056 | ** | -0.265 | 0.047 | ** |
| Inactive (ref = employed) | -0.136 | 0.055 | * | -0.148 | 0.047 | ** |
| Employed partner | -0.009 | 0.041 | | -0.008 | 0.035 | |
| Educ: O-level (ref = none) | 0.031 | 0.038 | | 0.018 | 0.033 | |
| Educ: A-level (ref = none) | 0.143 | 0.039 | ** | 0.107 | 0.033 | ** |
| Educ: degree (ref = none) | 0.216 | 0.050 | ** | 0.171 | 0.042 | ** |
| Single (ref = married) | -0.218 | 0.058 | ** | -0.275 | 0.049 | ** |
| Cohabiting (ref = married) | -0.028 | 0.047 | | -0.101 | 0.038 | ** |
| Private renter (ref = owner) | 0.285 | 0.078 | ** | -0.158 | 0.052 | ** |
| Social renter (ref = owner) | 0.015 | 0.052 | | -0.105 | 0.041 | * |
| Lives with parents (ref = owner) | 0.027 | 0.062 | | -0.005 | 0.052 | |
| Household income (log) | 0.019 | 0.023 | | 0.009 | 0.020 | |
| Rooms per person | -0.026 | 0.021 | | 0.014 | 0.017 | |
| Age 30-39 (ref = 20-29) | 0.020 | 0.048 | | 0.125 | 0.039 | ** |
| Age 40-49 (ref = 20-29) | -0.029 | 0.066 | | 0.202 | 0.045 | ** |
| Age 50-59 (ref = 20-29) | -0.082 | 0.079 | | 0.213 | 0.049 | ** |

** p<.01, * p<.05, + p<.1. Models also control for year, region, children of different ages and sample membership status.

Page 17: Nonresponse  bias in studies of residential mobility

Conclusions

• Estimates of some predictors of moving house in the BHPS differ depending on whether or not attrition bias is accounted for in the analysis
– The positive effect of unemployment is markedly larger than suggested by MAR estimates
– The positive effect of economic inactivity (p<.1) is insignificant in the MAR estimates
– Higher qualifications are no longer significantly associated with greater mobility when non-response is accounted for
• The direction of the changes implies that effects are underestimated for covariates negatively associated with response and overestimated for those positively associated with response

Page 18: Nonresponse  bias in studies of residential mobility

Conclusions

• Both the DD and BP models reject ignorability of non-response. Corrections made by the two models are in the same direction, but larger in the DD model. The log likelihood suggests the DD model is a slightly better fit.

• Next steps: simulation studies to explore the effect of including exclusion restrictions of varying strengths when the error distribution is mis-specified

• The potentially causal nature of the relationship between mobility and nonresponse implies that it is particularly important to consider the issue in studies of mobility, and provides an a priori reason for favouring a DD-type response mechanism.

• There are other examples where the DD model may be more appropriate, e.g. studies modelling poor health as the outcome