SC968 Panel data methods for sociologists: Lecture 3
Fixed and random effects models continued
Overview
• Review
  • Between- and within-individual variation
  • Types of variables: time-invariant, time-varying and trend
  • Individual heterogeneity
  • Within and between estimators
• The implementation of fixed and random effects models in STATA
• Statistical properties of fixed and random effects models
• Choosing between fixed and random effects: the Hausman test
• Estimating coefficients on time-invariant variables in FE
• Thinking about specification
Between- and within-individual variation
If you have a sample with repeated observations on the same individuals, there are two sources of variance within the sample:
• The fact that individuals are systematically different from one another (between-individual variation)
  • Joe lives in Colchester, Jane lives in Wivenhoe
• The fact that individuals’ behaviour varies between observations (within-individual variation)
  • Joe moves from Colchester to Wivenhoe
How to think about two sources of variation in panel data...
Between variation
How does an individual vary, on average, from the sample mean?
$$B = \sum_i \sum_j (\bar{x}_i - \bar{x})^2$$
Within variation
How does an individual vary at any particular time point from his individual mean?
$$W = \sum_i \sum_j (x_{ij} - \bar{x}_i)^2$$
Person   W1   W2   W3   W4   W5   Mean
Jane     20   20   20   20   20     20
Joe      15    5    6   20    4     10
Average income for sample: £10 per year
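The two sums of squares can be checked by hand on the Jane and Joe table. The sketch below is in Python rather than STATA and is only an illustration: it assumes the grand mean is taken over these two people alone (which gives 15, not the sample-wide figure quoted on the slide), and the variable names are made up for the example.

```python
# Incomes for the two people in the table, waves W1..W5.
incomes = {
    "Jane": [20, 20, 20, 20, 20],
    "Joe": [15, 5, 6, 20, 4],
}


def mean(values):
    return sum(values) / len(values)


# Assumption: the grand mean uses only these two people.
all_values = [x for xs in incomes.values() for x in xs]
grand_mean = mean(all_values)
person_means = {name: mean(xs) for name, xs in incomes.items()}

# B: each person's mean against the grand mean, counted once per observation.
between_ss = sum(
    (person_means[name] - grand_mean) ** 2
    for name, xs in incomes.items()
    for _ in xs
)

# W: each observation against that person's own mean.
within_ss = sum(
    (x - person_means[name]) ** 2
    for name, xs in incomes.items()
    for x in xs
)

# Total variation around the grand mean splits exactly into B + W.
total_ss = sum((x - grand_mean) ** 2 for x in all_values)

print(person_means)
print(between_ss, within_ss, total_ss)
```

Jane contributes nothing to W because her income never changes; all of the within variation comes from Joe.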
xtsum in STATA
Similar to ordinary “sum” command

. xtset pid wave
       panel variable:  pid (unbalanced)
        time variable:  wave, 1 to 15, but with gaps
                delta:  1 unit

. xtsum female partner age ue_sick LIKERT wave if nwaves == 15

Variable         |      Mean   Std. Dev.        Min        Max |  Observations
-----------------+---------------------------------------------+-----------------
female   overall |  .5397574   .4984321          0          1 |  N     =   16324
         between |             .4989059          0          1 |  n     =    1237
         within  |                    0   .5397574   .5397574 |  T-bar = 13.1964
partner  overall |  .6892954   .4627963          0          1 |  N     =   16292
         between |             .4217842          0          1 |  n     =    1234
         within  |              .243531   -.244038   1.622629 |  T-bar = 13.2026
age      overall |  40.03349   19.74332          0         98 |  N     =   19410
         between |             19.27238        6.4   90.93333 |  n     =    1294
         within  |              4.31763   31.30015   54.30015 |  T     =      15
ue_sick  overall |  .0672924   .2505353          0          1 |  N     =   16302
         between |             .1738938          0          1 |  n     =    1237
         within  |             .1852756   -.866041   1.000626 |  T-bar = 13.1787
LIKERT   overall |  11.26167   5.344825          0         36 |  N     =   15661
         between |             3.609665          0   29.69231 |  n     =    1225
         within  |             4.030974  -6.738331   35.12834 |  T-bar = 12.7845
wave     overall |         8   4.320605          1         15 |  N     =   19410
         between |                    0          8          8 |  n     =    1294
         within  |             4.320605          1         15 |  T     =      15
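Notes on the xtsum output:
• female: all variation is “between” (the within standard deviation is 0)
• wave: all variation is “within”, because this is a balanced sample
• if nwaves == 15: we have chosen a balanced sample
• partner: most variation is “between”, because it’s fairly rare to switch between having and not having a partner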
More on xtsum….
• N: observations with non-missing values of the variable
• T-bar: average number of time-points per individual
• n: number of individuals
• “within” row: min & max refer to individual deviations from own averages, with the global average added back in
• “between” row: min & max refer to the individual means, x̄_i
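To see where the min and max columns come from, here is a small Python sketch of the three xtsum rows, again using the Jane and Joe incomes. It is an illustration under stated assumptions, not STATA's code: `xtsum_rows` is a hypothetical helper, and the within values are built as x_it − x̄_i + x̄, as the notes above describe.

```python
def mean(values):
    return sum(values) / len(values)


def xtsum_rows(panel):
    """panel maps person -> list of observed values (lengths may differ)."""
    all_values = [x for xs in panel.values() for x in xs]
    grand_mean = mean(all_values)
    person_means = [mean(xs) for xs in panel.values()]
    # Within values: deviation from own mean, with the grand mean added back.
    within = [x - mean(xs) + grand_mean for xs in panel.values() for x in xs]
    return {
        "N": len(all_values),                      # person-wave observations
        "n": len(panel),                           # individuals
        "T_bar": len(all_values) / len(panel),     # average waves per person
        "overall": (min(all_values), max(all_values)),
        "between": (min(person_means), max(person_means)),
        "within": (min(within), max(within)),
    }


panel = {"Jane": [20, 20, 20, 20, 20], "Joe": [15, 5, 6, 20, 4]}
rows = xtsum_rows(panel)
print(rows)

# T-bar reported for ue_sick in the output is simply N / n.
t_bar_ue_sick = 16302 / 1237
print(round(t_bar_ue_sick, 4))
```

Because the grand mean is added back, the within minimum and maximum can lie outside the observed range of the variable. That is why a 0/1 variable such as ue_sick can show a within minimum of -.866.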
Types of variable
• Those which vary between individuals but hardly ever over time
  • Sex
  • Ethnicity
  • Parents’ social class when you were 14
  • The type of primary school you attended (once you’ve become an adult)
• Those which vary over time, but not between individuals
  • The retail price index
  • National unemployment rates
  • Age, in a cohort study
• Those which vary both over time and between individuals
  • Income
  • Health
  • Psychological wellbeing
  • Number of children you have
  • Marital status
• Trend variables: vary between individuals and over time, but in highly predictable ways
  • Age
  • Year
Within and between estimators
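$$y_{it} = \alpha + x_{it}\beta + u_i + \varepsilon_{it}$$

Mean of all observations for person i:

$$\bar{y}_i = \alpha + \bar{x}_i\beta + u_i + \bar{\varepsilon}_i$$

Subtracting:

$$(y_{it} - \bar{y}_i) = (x_{it} - \bar{x}_i)\beta + (\varepsilon_{it} - \bar{\varepsilon}_i)$$

And finally, the random effects estimator is a weighted average of the within and between estimators:

$$(y_{it} - \theta\bar{y}_i) = (1-\theta)\alpha + (x_{it} - \theta\bar{x}_i)\beta + \{(1-\theta)u_i + (\varepsilon_{it} - \theta\bar{\varepsilon}_i)\}$$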
• u_i: individual-specific, fixed over time
• ε_it: varies over time, usual assumptions apply (mean zero, homoscedastic, uncorrelated with x or u or itself)
• The equation in person means is the “between” estimator
• The equation in deviations from person means is the “within” estimator, i.e. “fixed effects”
• θ sets the relative weight given to within- and between-group variation (θ = 0 gives pooled OLS, θ = 1 gives the within estimator), and is derived from the variances of u_i and ε_it
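A tiny made-up panel shows how far the within and between estimators can diverge when u_i is correlated with x. This is a Python sketch with invented numbers, not output from the course data: both people have a true slope of 1, but person B has both higher x and a higher individual effect.

```python
def mean(values):
    return sum(values) / len(values)


def slope(xs, ys):
    """OLS slope of y on a single x (with an intercept)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical data: y_it = 1 * x_it + u_i, with u_A = 0 and u_B = 6.
panel = {
    "A": {"x": [1, 2, 3], "y": [1, 2, 3]},
    "B": {"x": [4, 5, 6], "y": [10, 11, 12]},
}

# Within estimator: regress deviations from each person's own means.
x_dev, y_dev = [], []
for obs in panel.values():
    mx, my = mean(obs["x"]), mean(obs["y"])
    x_dev += [x - mx for x in obs["x"]]
    y_dev += [y - my for y in obs["y"]]
beta_within = slope(x_dev, y_dev)

# Between estimator: regress person means on person means.
x_bar = [mean(obs["x"]) for obs in panel.values()]
y_bar = [mean(obs["y"]) for obs in panel.values()]
beta_between = slope(x_bar, y_bar)

# Pooled OLS: ignore the panel structure altogether.
x_all = [x for obs in panel.values() for x in obs["x"]]
y_all = [y for obs in panel.values() for y in obs["y"]]
beta_pooled = slope(x_all, y_all)

print(beta_within, beta_between, beta_pooled)
```

The within slope recovers the true value of 1 because u_i is differenced out. The between slope (3) absorbs the correlation between u_i and x̄_i, and the pooled slope falls between the two.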
Individual heterogeneity: one reason to use fixed effects
• A very simple concept: people are different!
• In social science, when we talk about heterogeneity, we are really talking about unobservable (or unobserved) heterogeneity.
  • Observed heterogeneity: differences in education levels, or parental background, or anything else that we can measure and control for in regressions
  • Unobserved heterogeneity: anything which is fundamentally unmeasurable, or which is rather poorly measured, or which does not happen to be measured in the particular data set we are using.
• Time-invariant heterogeneity
  • Height (among adults)
  • Innate intelligence
  • Antenatal care of mother
• Time-variant kinds of heterogeneity
  • Social network size
  • Beauty
  • Weight
Unobserved heterogeneity
Extend the OLS equation we used in Week 1, breaking the error term down into two components: one representing the time invariant, unobservable characteristics of the person, and the other representing genuine “error”.
In cross-sectional analysis, there is no way of distinguishing between the two. But in panel data analysis, we have repeated observations, and this allows us to distinguish between them.

$$y_i = \alpha + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \dots + \beta_K x_{Ki} + u_i + \varepsilon_i$$
Fixed effects (within estimator)
• Allows us to “net out” time-invariant unobserved characteristics
• Ignores between-group variation, so it’s an inefficient estimator
• However, few assumptions are required, so FE is generally consistent and unbiased
• Disadvantage: can’t estimate the effects of any time-invariant variables
• Also called
  • Least squares dummy variable (LSDV) model
  • Analysis of covariance (CV) model

$$y_{it} = \alpha + x_{it}\beta + u_i + \varepsilon_{it}$$

$$(y_{it} - \bar{y}_i) = (x_{it} - \bar{x}_i)\beta + (\varepsilon_{it} - \bar{\varepsilon}_i)$$
Between estimator
• Not much used
  • Except to calculate the θ parameter for random effects, but STATA does this, not you!
• It’s inefficient compared to random effects
  • It doesn’t use as much information as is available in the data (only uses means)
• Assumption required: that u_i is uncorrelated with x_i
  • Easy to see why: if they were correlated, how could one decide how much of the variation in y to attribute to the x’s (via the betas) as opposed to the correlation?
• Can’t estimate effects of variables where the mean is invariant over individuals
  • Age in a cohort study
  • Macro-level variables

$$y_{it} = \alpha + x_{it}\beta + u_i + \varepsilon_{it}$$

$$\bar{y}_i = \alpha + \bar{x}_i\beta + u_i + \bar{\varepsilon}_i$$
Random effects estimator
• Weighted average of within and between models
• Assumption required: that u_i is uncorrelated with x_i
  • Rather heroic assumption: think of examples
  • Will see a test for this later
• Uses both within- and between-group variation, so makes best use of the data and is efficient
• But unless the assumption holds that u_i is uncorrelated with x_i, it is inconsistent
• AKA one-way error components model, variance component model, GLS estimator (STATA also allows ML random effects)

$$y_{it} = \alpha + x_{it}\beta + u_i + \varepsilon_{it}$$

$$(y_{it} - \theta\bar{y}_i) = (1-\theta)\alpha + (x_{it} - \theta\bar{x}_i)\beta + \{(1-\theta)u_i + (\varepsilon_{it} - \theta\bar{\varepsilon}_i)\}$$
Consistency versus efficiency
[Figure: two estimators plotted against the “true” value of the betas; one is labelled “inconsistent but efficient”, the other “consistent but inefficient”.] Random effects clearly does worse here…
[Figure: random effects and fixed effects plotted against the “true” value of the betas.] … But arguably, random effects do a better job of getting close to the “true” coefficient here.
Testing between FE and RE
Hausman test
• Hypothesis H0: u_i is uncorrelated with x_i
• Hypothesis H1: u_i is correlated with x_i
• Fixed effects is consistent under both H0 and H1
• Random effects is efficient, and consistent under H0 (but inconsistent under H1)
. quietly xtreg LIKERT female ue_sick partner age age2 badh, fe
. estimates store fixed
. quietly xtreg LIKERT female ue_sick partner age age2 badh, re
. hausman fixed .

             |      ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |     fixed          .          Difference          S.E.
-------------+----------------------------------------------------------------
     ue_sick |   1.951485     2.045302       -.0938175        .0572845
     partner |   -.298668    -.1947691       -.1038989        .0677693
         age |   .1141748     .1058038         .008371        .0157531
        age2 |  -.0011833    -.0011062       -.0000771        .0001624
   badhealth |   1.230831     1.433115       -.2022848        .0187202
------------------------------------------------------------------------------
        b = consistent under Ho and Ha; obtained from xtreg
        B = inconsistent under Ha, efficient under Ho; obtained from xtreg

Test:  Ho:  difference in coefficients not systematic

        chi2(5) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                =   123.96
      Prob>chi2 =   0.0000
Random effects rejected (inconsistent) in favour of fixed effects (consistent but inefficient)
Example from last week
Sex does not appear
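The joint statistic reported by hausman needs the full covariance matrices of both models, which the slide does not show. As a rough illustration only, the Python sketch below applies the same idea to one coefficient at a time, using the Difference and S.E. columns from the output above. The function names are hypothetical, and the result is not the chi2(5) = 123.96 that STATA reports.

```python
import math


def hausman_single(b_fe, b_re, se_diff):
    """One-coefficient version: squared difference over its variance."""
    return ((b_fe - b_re) / se_diff) ** 2


def chi2_1df_pvalue(stat):
    """Upper-tail probability of a chi-squared variable with 1 df."""
    return math.erfc(math.sqrt(stat / 2.0))


# badhealth row: (b) fixed, (B) random, sqrt(diag(V_b - V_B))
stat_badhealth = hausman_single(1.230831, 1.433115, 0.0187202)
p_badhealth = chi2_1df_pvalue(stat_badhealth)

# age row, for contrast: the two models barely differ on this coefficient
stat_age = hausman_single(0.1141748, 0.1058038, 0.0157531)
p_age = chi2_1df_pvalue(stat_age)

print(round(stat_badhealth, 2), p_badhealth)
print(round(stat_age, 2), p_age)
```

On this rough reading, the rejection of random effects is driven largely by the badhealth coefficient, which differs sharply between the two models, while the age coefficient does not differ significantly.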
HOWEVER
Big disciplinary divide
• Economists swear by the Hausman test and rarely report random effects
• Other disciplines (eg psychology) consider other factors such as explanatory power.
Estimating FE in STATA
. xtreg LIKERT female ue_sick partner age age2 badh, fe

Fixed-effects (within) regression               Number of obs      =     24204
Group variable: pid                             Number of groups   =      3317

R-sq:  within  = 0.0501                         Obs per group: min =         1
       between = 0.1906                                        avg =       7.3
       overall = 0.1285                                        max =        14

                                                F(5,20882)         =    220.44
corr(u_i, Xb)  = 0.1561                         Prob > F           =    0.0000

------------------------------------------------------------------------------
      LIKERT |      Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |  (dropped)
     ue_sick |   1.951485   .1394164    14.00   0.000     1.678218   2.224752
     partner |   -.298668    .118635    -2.52   0.012    -.5312018  -.0661342
         age |   .1141748   .0214403     5.33   0.000     .0721501   .1561994
        age2 |  -.0011833   .0002209    -5.36   0.000    -.0016163  -.0007503
   badhealth |   1.230831   .0428556    28.72   0.000      1.14683   1.314831
       _cons |   6.252975   .4932977    12.68   0.000     5.286073   7.219877
-------------+----------------------------------------------------------------
     sigma_u |  3.9934565
     sigma_e |  4.0525618
         rho |  .49265449   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(3316, 20882) =     4.56         Prob > F = 0.0000
“u” and “e” are the two parts of the error term
Peaks at age 48
“R-square-like” statistic
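The “peaks at age 48” note follows from the quadratic in age. With a positive coefficient on age and a negative coefficient on age2, the fitted LIKERT score reaches its maximum where the derivative with respect to age is zero. Using the fixed-effects coefficients reported above:

```latex
\frac{\partial\,\widehat{\text{LIKERT}}}{\partial\,\text{age}}
  = \hat\beta_{age} + 2\,\hat\beta_{age2}\,\text{age} = 0
\quad\Longrightarrow\quad
\text{age}^{*} = -\frac{\hat\beta_{age}}{2\,\hat\beta_{age2}}
  = \frac{0.1141748}{2 \times 0.0011833} \approx 48.2
```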
Between regression:
. xtreg LIKERT female ue_sick partner age age2 badh, be

Between regression (regression on group means)  Number of obs      =     24204
Group variable: pid                             Number of groups   =      3317

R-sq:  within  = 0.0480                         Obs per group: min =         1
       between = 0.2322                                        avg =       7.3
       overall = 0.1482                                        max =        14

                                                F(6,3310)          =    166.80
sd(u_i + avg(e_i.))=  3.833357                  Prob > F           =    0.0000

------------------------------------------------------------------------------
      LIKERT |      Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   1.476659   .1350226    10.94   0.000     1.211923   1.741395
     ue_sick |   2.038192    .312191     6.53   0.000     1.426085   2.650299
     partner |  -.0101941   .1777423    -0.06   0.954      -.35869   .3383019
         age |   .0827335   .0219026     3.78   0.000     .0397895   .1256775
        age2 |  -.0009489   .0002263    -4.19   0.000    -.0013927  -.0005052
   badhealth |   2.275832   .0926521    24.56   0.000     2.094171   2.457493
       _cons |   3.953941   .4430909     8.92   0.000     3.085181   4.822701
------------------------------------------------------------------------------
Not much used, but useful to compare coefficients with fixed effects
Coefficient on “partner” was negative and significant in FE model.
In FE, the “partner” coeff really measures the events of gaining or losing a partner
Random effects regression
. xtreg LIKERT female ue_sick partner age age2 badh, re theta

Random-effects GLS regression                   Number of obs      =     24204
Group variable: pid                             Number of groups   =      3317

R-sq:  within  = 0.0500                         Obs per group: min =         1
       between = 0.2239                                        avg =       7.3
       overall = 0.1471                                        max =        14

Random effects u_i ~ Gaussian                   Wald chi2(6)       =   2013.32
corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000

------------------- theta --------------------
  min      5%       median        95%      max
0.1986   0.1986     0.5482     0.6629   0.6629

------------------------------------------------------------------------------
      LIKERT |      Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   1.493431   .1259931    11.85   0.000     1.246489   1.740373
     ue_sick |   2.045302   .1271039    16.09   0.000     1.796183   2.294422
     partner |  -.1947691   .0973734    -2.00   0.045    -.3856175  -.0039207
         age |   .1058038    .014544     7.27   0.000     .0772981   .1343094
        age2 |  -.0011062   .0001498    -7.39   0.000    -.0013998  -.0008126
   badhealth |   1.433115   .0385506    37.17   0.000     1.357558   1.508673
       _cons |   5.181864   .3137662    16.52   0.000     4.566894   5.796835
-------------+----------------------------------------------------------------
     sigma_u |  3.0248563
     sigma_e |  4.0525618
         rho |   .3577895   (fraction of variance due to u_i)
------------------------------------------------------------------------------
Option “theta” gives a summary of weights
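The theta summary can be reproduced from sigma_u and sigma_e. The Python sketch below assumes the usual GLS weight for an unbalanced panel, θ_i = 1 − sqrt(σ_e² / (T_i σ_u² + σ_e²)), where T_i is the number of waves observed for person i. The formula is not given on the slide, so treat it as an assumption that happens to match the reported figures.

```python
import math


def theta(t_i, sigma_u, sigma_e):
    """Random-effects weight for a person observed in t_i waves."""
    return 1 - math.sqrt(sigma_e ** 2 / (t_i * sigma_u ** 2 + sigma_e ** 2))


# sigma_u and sigma_e as reported in the random effects output.
SIGMA_U = 3.0248563
SIGMA_E = 4.0525618

theta_min = theta(1, SIGMA_U, SIGMA_E)     # people observed once
theta_mid = theta(7, SIGMA_U, SIGMA_E)     # people observed seven times
theta_max = theta(14, SIGMA_U, SIGMA_E)    # people observed in all 14 waves

print(round(theta_min, 4), round(theta_mid, 4), round(theta_max, 4))
```

People with more waves get a larger θ, so their data are treated more like fixed effects. With a single observation there is little within information, and the estimator leans towards pooled OLS.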
And what about OLS?
• OLS simply treats within- and between-group variation as the same
• Pools data across waves
. reg LIKERT female ue_sick partner age age2 badh

      Source |       SS       df       MS              Number of obs =   24204
-------------+------------------------------           F(  6, 24197) =  706.54
       Model |  103583.505     6  17263.9175           Prob > F      =  0.0000
    Residual |  591239.694 24197  24.4344214           R-squared     =  0.1491
-------------+------------------------------           Adj R-squared =  0.1489
       Total |  694823.199 24203  28.7081436           Root MSE      =  4.9431

------------------------------------------------------------------------------
      LIKERT |      Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   1.409466   .0640651    22.00   0.000     1.283895   1.535038
     ue_sick |   2.031815   .1240757    16.38   0.000     1.788619   2.275011
     partner |  -.0751296   .0769271    -0.98   0.329    -.2259116   .0756524
         age |   .0983746   .0103316     9.52   0.000      .078124   .1186252
        age2 |  -.0010613   .0001049   -10.12   0.000     -.001267  -.0008557
   badhealth |   1.841796   .0357165    51.57   0.000     1.771789   1.911802
       _cons |   4.450393   .2212733    20.11   0.000     4.016684   4.884102
------------------------------------------------------------------------------
Comparing models
• Compare coefficients between models
  • Reasonably similar: differences in “partner” and “badhealth” coeffs
• R-squareds are similar
• Within and between estimators maximise within and between R-squared respectively.

              FE          RE          BE          OLS
Female        -           1.49 ***    1.47 ***    1.41 ***
Ue_sick       1.95 ***    2.04 ***    2.03 ***    2.03 ***
Partner      -0.30 **    -0.19 **    -0.01       -0.08
Age           0.11 ***    0.11 ***    0.08 ***    0.10 ***
Age-2        -0.00 ***   -0.00 ***   -0.00 ***   -0.00 ***
Badhealth     1.23 ***    1.43 ***    2.28 ***    1.84 ***
Cons          6.25 ***    5.18 ***    3.95 ***    4.45 ***
Within R2     0.050       0.050       0.048       -
Between R2    0.191       0.224       0.232       -
Overall R2    0.129       0.147       0.148       0.149
Test whether pooling data is valid
$$y_{it} = \alpha + x_{it}\beta + u_i + \varepsilon_{it}$$
If the ui do not vary between individuals, they can be treated as part of α and OLS is fine.
Breusch-Pagan Lagrange multiplier test
• H0: variance of u_i = 0
• H1: variance of u_i not equal to zero
• If H0 is not rejected, you can pool the data and use OLS
• Post-estimation test after random effects
. quietly xtreg LIKERT female ue_sick partner age age2 badh, re
. xttest0

Breusch and Pagan Lagrangian multiplier test for random effects

        LIKERT[pid,t] = Xb + u[pid] + e[pid,t]

        Estimated results:
                         |       Var     sd = sqrt(Var)
                ---------+-----------------------------
                  LIKERT |   28.70814       5.357998
                       e |   16.42326       4.052562
                       u |   9.149756       3.024856

        Test:   Var(u) = 0
                              chi2(1) = 10816.48
                          Prob > chi2 =     0.0000
Thinking about the within and between estimators…..
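Between:

$$\bar{y}_i = \alpha + \bar{x}_i\beta + u_i + \bar{\varepsilon}_i$$

Within:

$$(y_{it} - \bar{y}_i) = (x_{it} - \bar{x}_i)\beta + (\varepsilon_{it} - \bar{\varepsilon}_i)$$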
Both between and FE models written with the same coefficient vector β, but no reason why they should be the same.
Between: βj measures the difference in y associated with a one-unit difference in the average value of variable xj between individuals – essentially a cross-sectional concept
Within: βj measures the difference associated with a one-unit increase in variable xj at individual level – essentially a longitudinal concept
Random effects, as a weighted average of the two, constrains both βs to be the same.
Excellent article at http://www.stata.com/support/faqs/stat/xt.html And lots more at http://www.stata.com/support/faqs/stat/#models
Examples
Example 1:
• Consider estimating a wage equation, and including a set of regional dummies, with S-E the omitted group.
• Wages in (eg) the N-W are lower, so the estimated between coefficient on N-W will be negative.
• However, in the within regression, we observe the effects of people moving to the N-W. Presumably they wouldn’t move without a reasonable incentive.
• So, the estimated within coefficient may even be positive, or at least, it’s likely to be a lot less negative.

Example 2:
• Estimate the relationship between family income and children’s educational outcomes.
• The between-group estimates measure how well the children of richer families do, relative to the children of poorer families. We know this estimate is likely to be large and significant.
• The within-group estimates measure how children’s outcomes change as their own family’s income changes. This coefficient may well be much smaller.
FE and time-invariant variables
Reformulating the regression equation to distinguish between time-varying and time-invariant variables:
$$y_{it} = \alpha + x_{it}\beta + z_i\gamma + u_i + \varepsilon_{it}$$

• x_it: time-varying variables, eg income, health
• z_i: time-invariant variables, eg sex, race
• u_i: individual-specific fixed effect
• ε_it: residual
Inconveniently, fixed effects washes out the z’s, so does not produce estimates of γ.
But there is a way! Requires the z variable to be uncorrelated with u’s
Coefficients on time-invariant variables
• Run FE in the normal way
• Use estimates to predict the residuals
• Use the between estimator to regress the residuals on the time-invariant variables
• Done!
• Only use this if RE is rejected: otherwise, RE provides best estimates of all coefficients
Going back to the previous example:
. quietly xtreg LIKERT female ue_sick partner age age2 badh, fe
. predict FE_RESID, ue
(13352 missing values generated)
. xtreg FE_RESID female, be

Between regression (regression on group means)  Number of obs      =     24204
Group variable: pid                             Number of groups   =      3317

R-sq:  within  = 0.0000                         Obs per group: min =         1
       between = 0.0400                                        avg =       7.3
       overall = 0.0212                                        max =        14

                                                F(1,3315)          =    138.24
sd(u_i + avg(e_i.))=  3.913298                  Prob > F           =    0.0000

------------------------------------------------------------------------------
    FE_RESID |      Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   1.599518   .1360426    11.76   0.000     1.332782   1.866254
       _cons |  -.7288892   .0984186    -7.41   0.000    -.9218564  -.5359219
------------------------------------------------------------------------------
From previous slide…
Our estimate of 1.60 for the coefficient on “female” is slightly higher than, but definitely in the same ball-park as, those produced by the other methods.
              FE          RE          BE          OLS
Female        -           1.49 ***    1.47 ***    1.41 ***
Improving specification
Recall our problem with the “partner” coefficient:
• OLS and between estimates show no significant relationship between partnership status and LIKERT scores
• FE and RE show a significant negative relationship.
• FE estimates the coefficient on deviation from mean, which is likely to reflect moving in together (which makes you temporarily happy) and splitting up (which makes you temporarily sad).
• Investigate this by including variables to capture these events
              FE          RE          BE          OLS
Partner      -0.30 **    -0.19 **    -0.01       -0.08
Generate variables reflecting changes
. sort pid wave
. gen get_pnr = (partner == 1 & partner[_n-1] == 0) if pid == pid[_n-1] & wave == wave[_n-1] + 1
(5078 missing values generated)
. gen lose_pnr = (partner == 0 & partner[_n-1] == 1) if pid == pid[_n-1] & wave == wave[_n-1] + 1
(5078 missing values generated)
Note: we will lose some observations
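The logic of the two gen commands, and the reason observations are lost, may be easier to follow in a small Python sketch. `add_transitions` and the toy rows are hypothetical. The rule mirrors the STATA condition: a transition is only defined when the previous row belongs to the same person and is exactly one wave earlier.

```python
def add_transitions(rows):
    """Return rows sorted by (pid, wave) with get_pnr / lose_pnr added.

    Both indicators are None (missing) when there is no observation for
    the same person in the immediately preceding wave.
    """
    ordered = sorted(rows, key=lambda r: (r["pid"], r["wave"]))
    result = []
    prev = None
    for row in ordered:
        new_row = dict(row)
        consecutive = (
            prev is not None
            and prev["pid"] == row["pid"]
            and row["wave"] == prev["wave"] + 1
        )
        if consecutive:
            new_row["get_pnr"] = int(row["partner"] == 1 and prev["partner"] == 0)
            new_row["lose_pnr"] = int(row["partner"] == 0 and prev["partner"] == 1)
        else:
            new_row["get_pnr"] = None
            new_row["lose_pnr"] = None
        result.append(new_row)
        prev = row
    return result


# Toy data: person 1 gains a partner in wave 2; person 2 skips wave 2.
toy = [
    {"pid": 1, "wave": 1, "partner": 0},
    {"pid": 1, "wave": 2, "partner": 1},
    {"pid": 1, "wave": 3, "partner": 1},
    {"pid": 2, "wave": 1, "partner": 1},
    {"pid": 2, "wave": 3, "partner": 0},
]
flagged = add_transitions(toy)
for row in flagged:
    print(row)
```

Every person's first wave, and any wave that follows a gap, ends up missing on both indicators. Those rows drop out of the regressions that follow, which is why the number of observations falls.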
Fixed effects
. xtreg LIKERT partner get_pnr lose_pnr female ue_sick age age2 badh, fe

Fixed-effects (within) regression               Number of obs      =     21264
Group variable: pid                             Number of groups   =      2764

R-sq:  within  = 0.0574                         Obs per group: min =         1
       between = 0.1839                                        avg =       7.7
       overall = 0.1333                                        max =        13

                                                F(7,18493)         =    160.80
corr(u_i, Xb)  = 0.1460                         Prob > F           =    0.0000

------------------------------------------------------------------------------
      LIKERT |      Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
     partner |   .3186429    .143112     2.23   0.026     .0381301   .5991557
     get_pnr |  -.0793952   .2116739    -0.38   0.708    -.4942956   .3355053
    lose_pnr |    2.64016   .2371252    11.13   0.000     2.175372   3.104947
      female |  (dropped)
     ue_sick |   1.894659   .1530311    12.38   0.000     1.594704   2.194614
         age |   .0734274   .0240822     3.05   0.002     .0262241   .1206308
        age2 |  -.0008799   .0002464    -3.57   0.000    -.0013629  -.0003969
   badhealth |   1.284593    .045967    27.95   0.000     1.194494   1.374693
       _cons |   6.796602   .5570247    12.20   0.000     5.704782   7.888422
-------------+----------------------------------------------------------------
     sigma_u |  3.7857335
     sigma_e |   4.030519
         rho |  .46871319   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(2763, 18493) =     4.83         Prob > F = 0.0000
.
Coeff on having a partner now slightly positive; getting a partner is insignificant; losing a partner is now large and positive
Random effects
. xtreg LIKERT partner get_pnr lose_pnr female ue_sick age age2 badh, re

Random-effects GLS regression                   Number of obs      =     21264
Group variable: pid                             Number of groups   =      2764

R-sq:  within  = 0.0571                         Obs per group: min =         1
       between = 0.2213                                        avg =       7.7
       overall = 0.1545                                        max =        13

Random effects u_i ~ Gaussian                   Wald chi2(8)       =   1922.41
corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
      LIKERT |      Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
     partner |    .281375    .113251     2.48   0.013     .0594072   .5033428
     get_pnr |  -.0897335    .204547    -0.44   0.661    -.4906382   .3111713
    lose_pnr |    2.76626   .2284331    12.11   0.000     2.318539    3.21398
      female |   1.450748   .1324675    10.95   0.000     1.191116   1.710379
     ue_sick |   1.892352   .1388821    13.63   0.000     1.620148   2.164556
         age |   .0719139   .0159222     4.52   0.000     .0407069   .1031209
        age2 |  -.0007748   .0001621    -4.78   0.000    -.0010926   -.000457
   badhealth |   1.470353   .0414036    35.51   0.000     1.389203   1.551502
       _cons |   5.457217   .3436851    15.88   0.000     4.783606   6.130827
-------------+----------------------------------------------------------------
     sigma_u |  2.9604042
     sigma_e |   4.030519
         rho |   .3504325   (fraction of variance due to u_i)
------------------------------------------------------------------------------
similar
Proportion of total residual variance attributable to the u’s - c.f. random slopes models later
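The rho line in the output is the share of residual variance due to the individual effects, σ_u² / (σ_u² + σ_e²). A short Python check against the sigma_u and sigma_e values reported for the models with the partnership-change variables (the function name is just for illustration):

```python
def rho(sigma_u, sigma_e):
    """Fraction of residual variance attributable to the individual effects."""
    return sigma_u ** 2 / (sigma_u ** 2 + sigma_e ** 2)


# Random effects model: sigma_u = 2.9604042, sigma_e = 4.030519
rho_re = rho(2.9604042, 4.030519)

# Fixed effects model: sigma_u = 3.7857335, sigma_e = 4.030519
rho_fe = rho(3.7857335, 4.030519)

print(round(rho_re, 4), round(rho_fe, 4))
```

Both values agree with the rho reported by xtreg (.3504 for random effects, .4687 for fixed effects).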
Collating the coefficients:
With the partnership-change variables:
                FE          RE          BE          OLS
Partner         0.32 **     0.28 **     0.29        0.17 **
Get partner    -0.07       -0.09       -2.85 **    -0.10
Lose partner    2.64 ***    2.77 ***    7.17 ***    3.19 ***

Previous specification:
                FE          RE          BE          OLS
Partner        -0.30 **    -0.19 **    -0.01       -0.08
Hausman test again
Have we cleaned up the specification sufficiently that the Hausman test will now fail to reject random effects?
No! Although the chi-squared statistic is smaller now (at 116.04), than previously (at 123.96)
. quietly xtreg LIKERT partner get_pnr lose_pnr female ue_sick age age2 badh, fe
. estimates store fixed
. quietly xtreg LIKERT partner get_pnr lose_pnr female ue_sick age age2 badh, re
. hausman fixed .

             |      ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |     fixed          .          Difference          S.E.
-------------+----------------------------------------------------------------
     partner |   .3186429      .281375        .0372679        .0874944
     get_pnr |  -.0793952    -.0897335        .0103383        .0544645
    lose_pnr |    2.64016      2.76626       -.1260999        .0636136
     ue_sick |   1.894659     1.892352        .0023072        .0642673
         age |   .0734274     .0719139        .0015135        .0180675
        age2 |  -.0008799    -.0007748       -.0001051        .0001855
   badhealth |   1.284593     1.470353       -.1857594        .0199676
------------------------------------------------------------------------------
        b = consistent under Ho and Ha; obtained from xtreg
        B = inconsistent under Ha, efficient under Ho; obtained from xtreg

Test:  Ho:  difference in coefficients not systematic

        chi2(7) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                =   116.04
      Prob>chi2 =   0.0000
Thinking about time
Under FE, including “wave” or “year” as a continuous variable is not very useful, since it is treated as the deviation from the individual’s mean.
We may not want to treat time as a linear trend (for example, if we are looking for a cut point related to social policy)
Also, wave is very much correlated with individuals’ ages
• Can do FE or RE including time periods as dummies
• May be referred to as “two-way fixed effects”
Generate each dummy variable separately, or….
local i = 1
while `i' <= 15 {
    gen byte W`i' = (wave == `i')
    local i = `i' + 1
}
Time variables insignificant here (as we would expect)
. xtreg LIKERT partner get_pnr lose_pnr female ue_sick age age2 badh W*, fe

Fixed-effects (within) regression               Number of obs      =     21264
Group variable: pid                             Number of groups   =      2764

R-sq:  within  = 0.0580                         Obs per group: min =         1
       between = 0.1811                                        avg =       7.7
       overall = 0.1323                                        max =        13

                                                F(19,18481)        =     59.92
corr(u_i, Xb)  = 0.1423                         Prob > F           =    0.0000

------------------------------------------------------------------------------
      LIKERT |      Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
     partner |   .3193454   .1431496     2.23   0.026      .038759   .5999317
     get_pnr |   -.072553   .2117186    -0.34   0.732     -.487541   .3424349
    lose_pnr |   2.648729   .2372293    11.17   0.000     2.183737    3.11372
      female |  (dropped)
     ue_sick |   1.894834   .1531005    12.38   0.000     1.594743   2.194925
         age |    .071427   .1200867     0.59   0.552    -.1639541   .3068081
        age2 |  -.0008821   .0002464    -3.58   0.000    -.0013651  -.0003991
   badhealth |   1.282999   .0460178    27.88   0.000       1.1928   1.373199
          W2 |  -.0140737   1.540443    -0.01   0.993    -3.033485   3.005338
          W3 |  -.0554759   1.422781    -0.04   0.969    -2.844257   2.733306
          W4 |   .1273198   1.303812     0.10   0.922    -2.428272   2.682911
          W5 |  -.0761569   1.185396    -0.06   0.949    -2.399643   2.247329
          W6 |   .0865111    1.07344     0.08   0.936     -2.01753   2.190553
          W7 |  -.0104289   .9562925    -0.01   0.991    -1.884851   1.863993
          W8 |  -.1120629   .8400402    -0.13   0.894    -1.758619   1.534493
          W9 |  (dropped)
         W10 |   .2739767   .6086295     0.45   0.653    -.9189933   1.466947
         W11 |   .0881723   .4963143     0.18   0.859    -.8846495   1.060994
         W12 |  -.0358824    .385874    -0.09   0.926    -.7922312   .7204663
         W13 |  -.0671728    .279283    -0.24   0.810    -.6145932   .4802477
         W14 |   .0610156   .1898793     0.32   0.748    -.3111654   .4331966
         W15 |  (dropped)
       _cons |   6.873039   6.064719     1.13   0.257     -5.01437   18.76045
-------------+----------------------------------------------------------------
     sigma_u |  3.7904487
     sigma_e |  4.0304244
         rho |  .46934486   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(2763, 18481) =     4.83         Prob > F = 0.0000
Extending panel data models to discrete dependent variables
Panel data extensions to logit and probit models
Recap from Week 1:
• These models cover discrete (categorical) outcomes, eg psychological morbidity; whether one has a job. Think of other examples.
• Outcome variable is always 0 or 1. Estimate:

$$\Pr(Y = 1) = F(X, \beta)$$

$$\Pr(Y = 0) = 1 - F(X, \beta)$$

• OLS (linear probability model) would set F(X,β) = X’β + ε
• Inappropriate because:
  • Heteroscedasticity: the outcome variable is always 0 or 1, so ε only takes the value -x’β or 1-x’β
  • More seriously, one cannot constrain estimated probabilities to lie between 0 and 1.
Extension of logit and probit to panel data:
• We won’t do the maths!
• But essentially, STATA maximises a likelihood function derived from the panel data specification
• Both random effects and fixed effects
First, generate the categorical variable indicating psychological morbidity
. gen byte PM = (hlghq2 > 2) if hlghq2 >= 0 & hlghq2 != .
Fixed effects estimates – xtlogit (clogit)
. xtlogit PM partner get_pnr lose_pnr female ue_sick age age2 badh, fe
note: multiple positive outcomes within groups encountered.
note: 1221 groups (6462 obs) dropped because of all positive or
      all negative outcomes.
note: female omitted because of no within-group variance.

Iteration 0:   log likelihood = -5844.5165
Iteration 1:   log likelihood = -5829.2179
Iteration 2:   log likelihood = -5829.2122
Iteration 3:   log likelihood = -5829.2122

Conditional fixed-effects logistic regression   Number of obs      =     14802
Group variable: pid                             Number of groups   =      1543

                                                Obs per group: min =         2
                                                               avg =       9.6
                                                               max =        13

                                                LR chi2(7)         =    517.04
Log likelihood  = -5829.2122                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
          PM |      Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
     partner |   .0960128   .0917139     1.05   0.295    -.0837432   .2757688
     get_pnr |   .0368568     .13587     0.27   0.786    -.2294436   .3031572
    lose_pnr |   1.231475   .1469964     8.38   0.000     .9433672   1.519583
     ue_sick |   .7533968   .0970111     7.77   0.000     .5632586   .9435351
         age |    -.03383   .0162808    -2.08   0.038    -.0657398  -.0019203
        age2 |   .0000894   .0001715     0.52   0.602    -.0002468   .0004256
   badhealth |   .5386858   .0298361    18.05   0.000     .4802081   .5971636
------------------------------------------------------------------------------
Losing a partner, being unemployed or sick, and being in bad health are associated with psychological morbidity
Negative in age throughout the human life span
Is losing a partner necessarily causing the psychological morbidity?
Adding some more variables:
. xtlogit PM partner get_pnr lose_pnr female ue_sick age age2 badh nch02, fe
note: multiple positive outcomes within groups encountered.
note: 1221 groups (6462 obs) dropped because of all positive or
      all negative outcomes.
note: female omitted because of no within-group variance.

Iteration 0:   log likelihood = -5839.5118
Iteration 1:   log likelihood = -5824.2036
Iteration 2:   log likelihood = -5824.1975
Iteration 3:   log likelihood = -5824.1975

Conditional fixed-effects logistic regression   Number of obs      =     14802
Group variable: pid                             Number of groups   =      1543

                                                Obs per group: min =         2
                                                               avg =       9.6
                                                               max =        13

                                                LR chi2(8)         =    527.07
Log likelihood  = -5824.1975                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
          PM |      Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
     partner |   .0470255   .0931317     0.50   0.614    -.1355092   .2295603
     get_pnr |   .0679186   .1363361     0.50   0.618    -.1992952   .3351324
    lose_pnr |   1.217756   .1472094     8.27   0.000     .9292311   1.506282
     ue_sick |    .749727   .0970536     7.72   0.000     .5595054   .9399487
         age |  -.0295734   .0163456    -1.81   0.070    -.0616102   .0024635
        age2 |   .0000582   .0001719     0.34   0.735    -.0002787   .0003951
   badhealth |    .537545   .0298374    18.02   0.000     .4790647   .5960253
       nch02 |    .249448   .0785737     3.17   0.001     .0954464   .4034497
------------------------------------------------------------------------------
We know that women sometimes suffer from post-natal depression. Try total number of children, and children aged 0-2
Total number of children is insignificant, but children 0-2 is significant.
Next step???
Yes, we should separate men and women
sort female
by female: xtlogit PM partner get_pnr lose_pnr female ue_sick age age2 badh nch02, fe

Men
------------------------------------------------------------------------------
          PM |      Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
     partner |  -.0262595    .151735    -0.17   0.863    -.3236547   .2711357
     get_pnr |   .2042066   .2165868     0.94   0.346    -.2202957   .6287089
    lose_pnr |   1.335693   .2314295     5.77   0.000     .8820997   1.789287
     ue_sick |   .9009421   .1397474     6.45   0.000     .6270421   1.174842
         age |   .0141781   .0265837     0.53   0.594     -.037925   .0662812
        age2 |  -.0004864   .0002804    -1.73   0.083    -.0010359   .0000632
   badhealth |   .5628403    .047939    11.74   0.000     .4688817    .656799
       nch02 |   .0458965   .1268808     0.36   0.718    -.2027854   .2945784
------------------------------------------------------------------------------

Women
------------------------------------------------------------------------------
          PM |      Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
     partner |   .0930161   .1181743     0.79   0.431    -.1386013   .3246336
     get_pnr |  -.0122303   .1751243    -0.07   0.944    -.3554676   .3310069
    lose_pnr |    1.13012   .1901842     5.94   0.000     .7573657   1.502874
     ue_sick |   .6032882   .1357316     4.44   0.000     .3372591   .8693174
         age |  -.0570441   .0208069    -2.74   0.006    -.0978248  -.0162633
        age2 |   .0004039   .0002185     1.85   0.065    -.0000245   .0008322
   badhealth |   .5222259   .0382135    13.67   0.000     .4473288    .597123
       nch02 |   .3840788   .1011092     3.80   0.000     .1859084   .5822493
------------------------------------------------------------------------------
Relationship between PM and young children is confined to women Any other gender differences?
Back to random effects
Random-effects logistic regression              Number of obs      =     21264
Group variable: pid                             Number of groups   =      2764

Random effects u_i ~ Gaussian                   Obs per group: min =         1
                                                               avg =       7.7
                                                               max =        13

                                                Wald chi2(9)       =    959.52
Log likelihood  = -10377.058                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
          PM |      Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
     partner |   .0565392   .0695474     0.81   0.416    -.0797712   .1928496
     get_pnr |    .032454   .1320281     0.25   0.806    -.2263163   .2912244
    lose_pnr |   1.309734   .1389371     9.43   0.000     1.037422   1.582046
      female |    .686486   .0712769     9.63   0.000     .5467859   .8261862
     ue_sick |   .7131287   .0839162     8.50   0.000     .5486559   .8776015
         age |   -.013065   .0094552    -1.38   0.167    -.0315968   .0054667
        age2 |   .0000337   .0000961     0.35   0.726    -.0001546   .0002221
   badhealth |   .6613526   .0261188    25.32   0.000     .6101607   .7125446
       nch02 |   .2653162   .0743185     3.57   0.000     .1196546   .4109779
       _cons |  -2.871645   .2033651   -14.12   0.000    -3.270233  -2.473057
-------------+----------------------------------------------------------------
    /lnsig2u |   .6496376   .0571876                       .537552   .7617232
-------------+----------------------------------------------------------------
     sigma_u |    1.38378   .0395675                      1.308362   1.463545
         rho |   .3679062    .013299                      .3422473   .3943355
------------------------------------------------------------------------------
Likelihood-ratio test of rho=0: chibar2(01) =  2038.50 Prob >= chibar2 = 0.000
Estimates are VERY similar to FE
Testing between FE and RE
quietly xtlogit PM partner get_pnr lose_pnr female ue_sick age age2 badh nch02, fe
estimates store fixed
quietly xtlogit PM partner get_pnr lose_pnr female ue_sick age age2 badh nch02, re
hausman fixed .

             |      ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |     fixed          .          Difference          S.E.
-------------+----------------------------------------------------------------
     partner |   .0470255     .0565392       -.0095137        .0619409
     get_pnr |   .0679186      .032454        .0354646        .0340015
    lose_pnr |   1.217756     1.309734       -.0919776        .0486529
     ue_sick |    .749727     .7131287        .0365983        .0487594
         age |  -.0295734     -.013065       -.0165083        .0133334
        age2 |   .0000582     .0000337        .0000245        .0001425
   badhealth |    .537545     .6613526       -.1238076         .014425
       nch02 |    .249448     .2653162       -.0158682        .0255066
------------------------------------------------------------------------------
        b = consistent under Ho and Ha; obtained from xtlogit
        B = inconsistent under Ha, efficient under Ho; obtained from xtlogit

Test:  Ho:  difference in coefficients not systematic

        chi2(8) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                =   149.76
      Prob>chi2 =   0.0000
Random effects is rejected again.
Random effects probit
No fixed effects command available, as there does not exist a sufficient statistic allowing the fixed effects to be conditioned out of the likelihood.
Random-effects probit regression                Number of obs      =     21264
Group variable: pid                             Number of groups   =      2764

Random effects u_i ~ Gaussian                   Obs per group: min =         1
                                                               avg =       7.7
                                                               max =        13

                                                Wald chi2(9)       =    995.53
Log likelihood  = -10370.501                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
          PM |      Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
     partner |   .0334017   .0399311     0.84   0.403    -.0448618   .1116651
     get_pnr |   .0183513   .0757428     0.24   0.809    -.1301019   .1668045
    lose_pnr |   .7646656   .0800772     9.55   0.000     .6077173    .921614
      female |   .3924276   .0407552     9.63   0.000     .3125488   .4723063
     ue_sick |   .4189777    .048681     8.61   0.000     .3235648   .5143906
         age |  -.0077306   .0054309    -1.42   0.155     -.018375   .0029138
        age2 |   .0000201   .0000552     0.36   0.715     -.000088   .0001283
   badhealth |   .3825895   .0149317    25.62   0.000     .3533239   .4118551
       nch02 |   .1530239   .0431233     3.55   0.000     .0685039    .237544
       _cons |  -1.657895   .1165019   -14.23   0.000    -1.886235  -1.429556
-------------+----------------------------------------------------------------
    /lnsig2u |  -.4475525   .0552927                     -.5559243  -.3391807
-------------+----------------------------------------------------------------
     sigma_u |    .799494   .0221031                      .7573255   .8440105
         rho |   .3899428   .0131534                       .364491   .4160085
------------------------------------------------------------------------------
Likelihood-ratio test of rho=0: chibar2(01) =  2056.20 Prob >= chibar2 = 0.000
Why aren’t the sets of coefficients more similar?
                Logit         Probit
Partner          0.057         0.033
Get partner      0.032         0.018
Lose partner     1.310 ***     0.765 ***
Female           0.686 ***     0.392 ***
UE/ sick         0.713 ***     0.419 ***
Age             -0.013        -0.007
Age-squared      0.000         0.000
Bad health       0.661 ***     0.383 ***
Kids 0-2         0.265 ***     0.153 ***
Cons            -2.871 ***    -1.658 ***
Remember the conversion scale from Week 1…
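The Week 1 conversion scale is not reproduced in these slides, so the Python sketch below only checks how the two sets of estimates relate. It takes the significant coefficients from the table above and computes the logit-to-probit ratio for each. The dictionary name and layout are just for illustration.

```python
# (logit, probit) coefficients for the significant variables in the table.
coefficients = {
    "Lose partner": (1.310, 0.765),
    "Female": (0.686, 0.392),
    "UE/sick": (0.713, 0.419),
    "Bad health": (0.661, 0.383),
    "Kids 0-2": (0.265, 0.153),
    "Cons": (-2.871, -1.658),
}

ratios = {name: logit / probit for name, (logit, probit) in coefficients.items()}

for name, ratio in ratios.items():
    print(f"{name:13s} {ratio:.2f}")
```

The ratios are nearly constant, at a little over 1.7. The two models are telling much the same story on different scales, because the logistic and standard normal distributions have different variances (π²/3 and 1 respectively).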