Panel Data Regression
Vid Adrison
Outline
• Structure of Data
• Structure of error in panel data
• Strict exogeneity assumption
• Estimation techniques in panel data under strict exogeneity assumption
• Estimation technique when strict exogeneity assumption is violated
Structure of Data
• Time Series: Single individual, many time observations. Ex: West Java rice production, 1980-2007
• Cross Section: Many individuals, single time observation. Ex: Indonesian rice production by province, in 2007
• Panel/Longitudinal: Many individuals, multiple time observations. Ex: Indonesian rice production by province, 1980-2008
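The three data structures can be sketched as slices of one long-format table. This is only an illustration: the province selection, the years, and the production figures below are made-up placeholders, not real statistics.

```python
# Hypothetical long-format panel: one record per (province, year).
# The production figures are made-up placeholders, not real statistics.
panel = [
    {"province": "West Java", "year": 2006, "rice": 9.4},
    {"province": "West Java", "year": 2007, "rice": 9.9},
    {"province": "East Java", "year": 2006, "rice": 9.3},
    {"province": "East Java", "year": 2007, "rice": 9.4},
    {"province": "Bali", "year": 2006, "rice": 0.8},
    {"province": "Bali", "year": 2007, "rice": 0.8},
]

# Time series: one individual, many periods.
time_series = [r for r in panel if r["province"] == "West Java"]

# Cross section: many individuals, one period.
cross_section = [r for r in panel if r["year"] == 2007]

# Panel: many individuals and multiple periods.
n_individuals = len({r["province"] for r in panel})
n_periods = len({r["year"] for r in panel})
print(n_individuals, n_periods, len(time_series), len(cross_section))
```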
Error structure in panel data
• Ci is usually called: unobserved component, latent variable, unobserved heterogeneity, individual effect, or individual heterogeneity
• Ci is assumed to be constant over time and to vary across individuals. For instance:
– Ability in a wage equation: ability is unobserved by the econometrician, but definitely affects an individual's wage
• The analysis of panel data centers on the assumption about Ci (i.e., whether or not Ci is correlated with the explanatory variables)
– If Ci is correlated with one or more explanatory variables, then Fixed Effect is the appropriate technique
– If Ci is uncorrelated with every explanatory variable, then Random Effect is the appropriate technique
$$Y_{it} = X_{it}\beta + c_i + u_{it}$$
Strict Exogeneity Assumption
• Strict Exogeneity Assumption
– If stated in terms of the unobserved effect:
$$E(Y_{it} \mid X_{i1}, X_{i2}, \ldots, X_{iT}, c_i) = E(Y_{it} \mid X_{it}, c_i) = X_{it}\beta + c_i$$
• Once Xit and Ci are controlled for, the explanatory variables from other periods have no further effect on the expected value of Yit
– If stated in terms of the idiosyncratic error:
$$E(u_{it} \mid X_{i1}, X_{i2}, \ldots, X_{iT}, c_i) = 0$$
which implies
$$E(X_{is}' u_{it}) = 0, \quad s, t = 1, \ldots, T$$
• This assumption is much stronger than contemporaneous exogeneity, because it rules out correlation between the error in any period and the covariates in every period
• Standard Fixed Effect and Random Effect regressions are only valid when the strict exogeneity assumption is satisfied
Strict Exogeneity Assumption
• Examples of violations of the Strict Exogeneity Assumption
– Example 1:
$$\log(\text{wage}_{it}) = X_{it}\beta + \delta\,\text{training}_{it} + c_i + u_{it}$$
• If an individual's decision to participate in training is influenced by past shocks to his/her wage, or if the administrator chooses individuals with low Uit to participate in training in t+1, then the strict exogeneity assumption might not be satisfied
Strict Exogeneity Assumption
– Example 2:
$$\log(\text{wage}_{it}) = X_{it}\beta + \rho\,\text{wage}_{i,t-1} + c_i + u_{it}$$
• In this model, an individual's wage depends on his/her wage in the past. Recall that the fundamental assumption in panel data is E(Xis'Uit) = 0 for s, t = 1, …, T. A shock to wage at time t affects the wage at time t, and that wage then appears as the lagged-wage regressor at time t+1, so E(Xis'Uit) is not zero for s = t+1. Thus, any model with a lagged dependent variable violates the strict exogeneity assumption, and standard Random Effect or standard Fixed Effect estimation is not appropriate
Estimation Techniques under Strict Exogeneity Assumption
• A: FIXED EFFECT: if unobserved heterogeneity (Ci) is arbitrarily correlated with observed characteristics
– Example:
• A firm's decision to evade taxes depends on unobservable characteristics (e.g., the manager's preference to cheat). However, the decision may also be related to observed characteristics, such as asset size and cash flow (big firms have higher incentives to evade taxes, and are more able to pay fines if the evasion is detected)
• Two estimators use different parts of the variation in the data; the Fixed Effect estimator is the Within Estimator
– Between Estimator
• Estimates the parameters using cross-sectional information
• Runs the regression on the over-time average of each individual
• Answers: what is the expected difference in Y between Mary and Joe if they differ in X by one unit?
– Within Estimator
• Estimates the parameters using the time series information of each individual
• Calculated by regressing the deviation of each variable from its over-time average, which gets rid of the time-constant unobservable:
$$Y_{it} - \bar{Y}_i = (X_{it} - \bar{X}_i)\beta + (u_{it} - \bar{u}_i)$$
• Answers: what is the expected change in Joe's Y if his X increases by one unit?
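The difference between the Between and Within estimators can be seen on a toy panel. The sketch below is only an illustration under stated assumptions: the three individuals, their values, and the true slope of 2 are hypothetical, and Ci is built to rise with each individual's average X so that it is correlated with the regressor.

```python
# Toy balanced panel (hypothetical numbers): y_it = 2 * x_it + c_i with no noise,
# where c_i rises with the individual's average x, so c_i is correlated with x.
x = {"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0], "C": [7.0, 8.0, 9.0]}
c = {"A": 5.0, "B": 20.0, "C": 35.0}
y = {i: [2.0 * v + c[i] for v in x[i]] for i in x}


def mean(values):
    return sum(values) / len(values)


def ols_slope(xs, ys):
    """Slope from a simple regression of ys on xs (with intercept)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx


# Pooled OLS: stack all observations and ignore c_i.
pooled = ols_slope([v for i in x for v in x[i]], [v for i in x for v in y[i]])

# Between estimator: regress individual averages on individual averages.
between = ols_slope([mean(x[i]) for i in x], [mean(y[i]) for i in x])

# Within estimator: subtract each individual's over-time average first.
x_dm = [v - mean(x[i]) for i in x for v in x[i]]
y_dm = [v - mean(y[i]) for i in x for v in y[i]]
within = ols_slope(x_dm, y_dm)

print(pooled, between, within)
```

In this constructed example only the within slope recovers the true coefficient of 2, because demeaning removes Ci; the pooled and between slopes also pick up the correlation between Ci and X.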
Estimation Techniques under the Strict Exogeneity Assumption
• If a variable is constant over time for every individual, its parameter estimate cannot be obtained under Fixed Effect.
• Example: We want to see what factors drive the economic growth of a city. In the specification, we include a dummy variable to indicate the location of the city, i.e., whether it is located near the sea. Since the value of the location dummy is constant over time, its deviation from the over-time average is zero, just like that of the unobserved heterogeneity. Thus, we cannot distinguish the effect of a time-constant observable from that of the time-constant unobservable
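A minimal sketch of why the location dummy drops out: after the within transformation every value is zero, so there is no variation left to estimate a coefficient from. The city names and dummy values are hypothetical.

```python
# Hypothetical cities observed for three years; coastal is a time-constant dummy.
coastal = {"city_1": [1, 1, 1], "city_2": [0, 0, 0], "city_3": [1, 1, 1]}


def demean(values):
    """Within transformation: subtract the over-time average."""
    avg = sum(values) / len(values)
    return [v - avg for v in values]


coastal_within = {city: demean(vals) for city, vals in coastal.items()}

# True when the transformed regressor has no variation left.
no_variation = all(v == 0 for vals in coastal_within.values() for v in vals)
print(coastal_within, no_variation)
```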
Estimation Techniques under the Strict Exogeneity Assumption
• B. RANDOM EFFECT: if unobserved heterogeneity (Ci) is uncorrelated with the explanatory variables
• Assumes that the constants (unobserved heterogeneity) are randomly distributed across cross-sectional units
• Appropriate if we believe that the sampled cross-sectional units were drawn from a large population
• The estimation is conducted by FGLS
• The Random Effect parameter estimate is a weighted average of the Between and Within Estimators
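$$\hat{\beta}_{RE} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}Y = \left(\sum_{i=1}^{n} X_i'\Omega^{-1}X_i\right)^{-1}\left(\sum_{i=1}^{n} X_i'\Omega^{-1}Y_i\right)$$
Where
$$\Omega^{-1/2} = \frac{1}{\sigma_u}\left(I_T - \frac{\theta}{T}\, i_T i_T'\right), \qquad \theta = 1 - \frac{\sigma_u}{\sqrt{T\sigma_c^2 + \sigma_u^2}}$$
Here $\sigma_u^2$ is the variance of the idiosyncratic error $u_{it}$, $\sigma_c^2$ is the variance of $c_i$, and $i_T$ is a $T \times 1$ vector of ones. In the stacked form, $\Omega^{-1}$ stands for the block-diagonal matrix that repeats the $T \times T$ matrix $\Omega^{-1}$ for each individual.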
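The weighted-average property can be illustrated with the usual quasi-demeaning form of Random Effect, in which a fraction theta of each individual's average is subtracted before running OLS. The sketch below assumes the textbook formula theta = 1 - sigma_idiosyncratic / sqrt(T * sigma_individual^2 + sigma_idiosyncratic^2) and takes the variance components from the Stata random-effects output under Regression Results (Stata labels the individual effect sigma_u and the idiosyncratic error sigma_e). The toy panel is hypothetical and deliberately makes Ci correlated with X so that the estimators differ.

```python
import math

# Variance components as reported in the Stata random-effects output.
sd_individual = 8.1923983     # Stata sigma_u (the slides' c_i)
sd_idiosyncratic = 3.612922   # Stata sigma_e (the slides' u_it)
T = 3                         # observations per group

theta = 1 - sd_idiosyncratic / math.sqrt(T * sd_individual**2 + sd_idiosyncratic**2)
rho = sd_individual**2 / (sd_individual**2 + sd_idiosyncratic**2)

# Hypothetical toy panel (T = 3): y_it = 2 * x_it + c_i, c_i correlated with x.
x = {"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0], "C": [7.0, 8.0, 9.0]}
c = {"A": 5.0, "B": 20.0, "C": 35.0}
y = {i: [2.0 * v + c[i] for v in x[i]] for i in x}


def mean(values):
    return sum(values) / len(values)


def quasi_demeaned_slope(th):
    """OLS slope after subtracting th times each individual's average."""
    xs = [v - th * mean(x[i]) for i in x for v in x[i]]
    ys = [v - th * mean(y[i]) for i in x for v in y[i]]
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx


slope_pooled = quasi_demeaned_slope(0.0)   # theta = 0 gives pooled OLS
slope_within = quasi_demeaned_slope(1.0)   # theta = 1 gives the within estimator
slope_re = quasi_demeaned_slope(theta)     # in between for 0 < theta < 1
print(round(theta, 4), round(rho, 4), slope_pooled, slope_within, slope_re)
```

The computed rho reproduces the "fraction of variance due to u_i" line of the output, which is a quick check that the two variance components were read correctly.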
Estimation Techniques under Strict Exogeneity Assumption
• If we do not have a time-constant variable, which method is appropriate?
• Use the Hausman test, which tests whether there is a systematic difference between the two specifications
• For instance: Specification RE uses Random Effect, and Specification FE uses Fixed Effect
– Ho: There is no systematic difference between specifications RE and FE
– Ha: There is a systematic difference between specifications RE and FE
» Specification FE is consistent under both Ho and Ha
» Specification RE is inconsistent under Ha, but efficient under Ho
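For reference, the test statistic behind this comparison can be written as follows. This is the standard form of the Hausman statistic, matching the expression printed in the Stata output, (b-B)'[(V_b-V_B)^(-1)](b-B); the symbol k for the number of coefficients compared is notation introduced here.

```latex
H = (\hat{\beta}_{FE} - \hat{\beta}_{RE})'
    \left[\widehat{\mathrm{Var}}(\hat{\beta}_{FE}) - \widehat{\mathrm{Var}}(\hat{\beta}_{RE})\right]^{-1}
    (\hat{\beta}_{FE} - \hat{\beta}_{RE})
    \;\sim\; \chi^2_k \quad \text{under } H_0
```

A large value of H means the FE and RE coefficients differ by more than sampling error would explain, which counts as evidence against Specification RE.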
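Estimation Technique when Strict Exogeneity Assumption is Violated
• General steps:
– Use a transformation to eliminate the unobserved heterogeneity
– Choose an instrument for the endogenous variables in the transformed equation
– Estimate using pooled 2SLS
$$Y_{it} = X_{it}\beta + \rho Y_{i,t-1} + c_i + u_{it}$$
$$Y_{it} - Y_{i,t-1} = (X_{it} - X_{i,t-1})\beta + \rho(Y_{i,t-1} - Y_{i,t-2}) + (c_i - c_i) + (u_{it} - u_{i,t-1})$$
$$Y_{it} - Y_{i,t-1} = (X_{it} - X_{i,t-1})\beta + \rho(Y_{i,t-1} - Y_{i,t-2}) + (u_{it} - u_{i,t-1})$$
$$\Delta Y_{it} = \Delta X_{it}\beta + \rho\,\Delta Y_{i,t-1} + \Delta u_{it}$$
Use $Y_{i,t-2}$ as an instrument for $\Delta Y_{i,t-1}$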
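The steps above can be sketched with a small simulation. Everything in it is an assumption made for illustration: the model has no X regressors, the true coefficient on the lagged dependent variable is 0.5, the errors are standard normal, and the sample sizes are arbitrary. With a single endogenous regressor and a single instrument, pooled 2SLS reduces to the simple IV ratio used here.

```python
import random

random.seed(0)
RHO, N, T, BURN = 0.5, 5000, 8, 20   # hypothetical simulation settings

num_iv = den_iv = num_ols = den_ols = 0.0
for _ in range(N):
    c_i = random.gauss(0.0, 1.0)          # unobserved heterogeneity
    y_prev = c_i / (1.0 - RHO)            # start at the individual's long-run mean
    path = []
    for t in range(BURN + T):
        y_now = RHO * y_prev + c_i + random.gauss(0.0, 1.0)
        if t >= BURN:                     # keep only the last T periods
            path.append(y_now)
        y_prev = y_now
    for t in range(2, T):
        dy = path[t] - path[t - 1]           # first difference of Y at t
        dy_lag = path[t - 1] - path[t - 2]   # endogenous regressor
        z = path[t - 2]                      # instrument: Y two periods back
        num_iv += z * dy
        den_iv += z * dy_lag
        num_ols += dy_lag * dy
        den_ols += dy_lag ** 2

rho_iv = num_iv / den_iv          # first-difference IV estimate
rho_fd_ols = num_ols / den_ols    # first-difference OLS, ignores the endogeneity
print(round(rho_iv, 3), round(rho_fd_ols, 3))
```

First differencing removes Ci but leaves the lagged difference correlated with the differenced error, so OLS on the differenced equation is biased downward here, while the instrumented estimate should land near the true value.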
Regression Results
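. reg mrdrte exec unem

      Source |       SS       df       MS              Number of obs =     153
-------------+------------------------------           F(  2,   150) =    4.98
       Model |  799.796283     2  399.898141           Prob > F      =  0.0081
    Residual |  12045.5418   150  80.3036122           R-squared     =  0.0623
-------------+------------------------------           Adj R-squared =  0.0498
       Total |  12845.3381   152  84.5088034           Root MSE      =  8.9612

------------------------------------------------------------------------------
      mrdrte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        exec |   .1650227   .1938679     0.85   0.396    -.2180419    .5480872
        unem |   1.258905   .4373612     2.88   0.005      .394721     2.12309
       _cons |    .348119    2.68724     0.13   0.897    -4.961612     5.65785
------------------------------------------------------------------------------

. xtreg mrdrte exec unem

Random-effects GLS regression                   Number of obs      =       153
Group variable: id                              Number of groups   =        51

R-sq:  within  = 0.0015                         Obs per group: min =         3
       between = 0.0732                                        avg =       3.0
       overall = 0.0433                                        max =         3

Random effects u_i ~ Gaussian                   Wald chi2(2)       =      0.90
corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.6369

------------------------------------------------------------------------------
      mrdrte |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        exec |  -.0351956   .1619968    -0.22   0.828    -.3527036    .2823124
        unem |   .2560543   .2708762     0.95   0.345    -.2748532    .7869619
       _cons |   6.584371   2.001338     3.29   0.001     2.661819    10.50692
-------------+----------------------------------------------------------------
     sigma_u |  8.1923983
     sigma_e |   3.612922
         rho |  .83717807   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. xttest0

Breusch and Pagan Lagrangian multiplier test for random effects

        mrdrte[id,t] = Xb + u[id] + e[id,t]

        Estimated results:
                         |       Var     sd = sqrt(Var)
                ---------+-----------------------------
                  mrdrte |    84.5088       9.192867
                       e |   13.05321       3.612922
                       u |   67.11539       8.192398

        Test:   Var(u) = 0
                              chi2(1) =    98.47
                          Prob > chi2 =     0.0000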
Specification Test
What method to choose?
– Depends on the existence of unobserved heterogeneity
– If the unobserved heterogeneity is significant, then the choice depends on whether or not it is correlated with the observed characteristics
• We can employ the Hausman test, which tests RE against FE. In the Hausman test, a statistically significant difference is interpreted as evidence against the Random Effect specification
• However, there are two caveats:
1. Correlation between the observed characteristics and the idiosyncratic error causes both RE and FE to be inconsistent
2. The Hausman test is conducted under two assumptions: the unobserved characteristic is uncorrelated with the observed characteristics, and it is normally distributed. If it is not normally distributed, the Hausman test has no systematic power against this condition.
Specification Test
– Under the null, OLS, FE and RE are all consistent. If the null is rejected, RE is inconsistent.
– However, there are cases where the difference between the RE and FE coefficients is small but statistically significant.
– On the other hand, there are cases where the RE and FE coefficients differ greatly, but we cannot reject the null due to large standard errors.
– In these circumstances, a typical response is to choose the RE specification. However, this comes at the cost of an increased risk of Type II error (failing to reject the null when it is false)
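. xtreg mrdrte exec unem, fe

Fixed-effects (within) regression               Number of obs      =       153
Group variable: id                              Number of groups   =        51

R-sq:  within  = 0.0047                         Obs per group: min =         3
       between = 0.0007                                        avg =       3.0
       overall = 0.0002                                        max =         3

                                                F(2,100)           =      0.24
corr(u_i, Xb)  = -0.0635                        Prob > F           =    0.7909

------------------------------------------------------------------------------
      mrdrte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        exec |  -.1140743   .1800836    -0.63   0.528    -.4713551    .2432065
        unem |    .095914   .2800721     0.34   0.733    -.4597411    .6515692
       _cons |   7.637844   1.684436     4.53   0.000     4.295971    10.97972
-------------+----------------------------------------------------------------
     sigma_u |   8.788124
     sigma_e |   3.612922
         rho |  .85542114   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(50, 100) =    16.46             Prob > F = 0.0000

. est store fe

. xtreg mrdrte exec unem

Random-effects GLS regression                   Number of obs      =       153
Group variable: id                              Number of groups   =        51

R-sq:  within  = 0.0015                         Obs per group: min =         3
       between = 0.0732                                        avg =       3.0
       overall = 0.0433                                        max =         3

Random effects u_i ~ Gaussian                   Wald chi2(2)       =      0.90
corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.6369

------------------------------------------------------------------------------
      mrdrte |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        exec |  -.0351956   .1619968    -0.22   0.828    -.3527036    .2823124
        unem |   .2560543   .2708762     0.95   0.345    -.2748532    .7869619
       _cons |   6.584371   2.001338     3.29   0.001     2.661819    10.50692
-------------+----------------------------------------------------------------
     sigma_u |  8.1923983
     sigma_e |   3.612922
         rho |  .83717807   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. est store re

                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |       fe           re         Difference          S.E.
-------------+----------------------------------------------------------------
        exec |   -.1140743    -.0351956       -.0788787        .0786584
        unem |     .095914     .2560543       -.1601403        .0711792
------------------------------------------------------------------------------
                           b = consistent under Ho and Ha; obtained from xtreg
            B = inconsistent under Ha, efficient under Ho; obtained from xtreg

    Test:  Ho:  difference in coefficients not systematic

                  chi2(2) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                          =        6.79
                Prob>chi2 =      0.0336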
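Parts of the Hausman table can be checked by hand from the FE and RE outputs. The sketch below copies the coefficients and standard errors from those outputs, recomputes the Difference and S.E. columns, and converts the reported chi2(2) statistic into a p-value. The full statistic itself is not recomputed, because it needs the covariance between the two coefficients, which the output does not report.

```python
import math

# Coefficients and standard errors copied from the FE and RE outputs.
b_fe = {"exec": -0.1140743, "unem": 0.095914}
b_re = {"exec": -0.0351956, "unem": 0.2560543}
se_fe = {"exec": 0.1800836, "unem": 0.2800721}
se_re = {"exec": 0.1619968, "unem": 0.2708762}

# Difference column (b - B) and its S.E., sqrt(diag(V_b - V_B)).
diff = {k: b_fe[k] - b_re[k] for k in b_fe}
se_diff = {k: math.sqrt(se_fe[k] ** 2 - se_re[k] ** 2) for k in se_fe}

# For 2 degrees of freedom the chi-square upper-tail probability is exp(-x / 2).
chi2_stat = 6.79                      # as reported, rounded to two decimals
p_value = math.exp(-chi2_stat / 2)
reject_re_at_5pct = p_value < 0.05
print(diff, se_diff, p_value, reject_re_at_5pct)
```

The p-value agrees with the reported 0.0336 up to the rounding of the statistic, so at the 5% level the test rejects Ho and points toward the FE specification for this data set.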
Mundlak’s Approach
• Even with the Hausman test available, choosing between the FE and RE specifications poses a dilemma.
– FE is robust to correlation between the unobserved heterogeneity and the explanatory variables. However, we cannot use time-invariant regressors.
– RE, on the other hand, can use time-invariant regressors, but the assumption of zero correlation between the unobserved heterogeneity and the explanatory variables is unlikely to hold
• Mundlak (1978) proposes a modification of RE that at least partially overcomes this deficit.
– The trick is to include additional variables, the time averages of the time-varying variables, in the regression
$$y_{it} = X_{it}\beta + \bar{X}_i\gamma + c_i + u_{it}$$
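A minimal sketch of the idea on a toy balanced panel. The data are hypothetical, and for simplicity the augmented equation is estimated by pooled OLS rather than by Random Effect GLS. Adding the individual average of X as an extra regressor makes the coefficient on X coincide with the within estimate, while the coefficient on the average absorbs the correlation between Ci and X.

```python
# Hypothetical toy panel: y_it = 2 * x_it + c_i, with c_i correlated with x.
x = {"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0], "C": [7.0, 8.0, 9.0]}
c = {"A": 5.0, "B": 20.0, "C": 35.0}
y = {i: [2.0 * v + c[i] for v in x[i]] for i in x}


def mean(values):
    return sum(values) / len(values)


def centered(values):
    m = mean(values)
    return [v - m for v in values]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


# Stack observations: x_it, the individual's time average of x, and y_it.
xs = [v for i in x for v in x[i]]
xbars = [mean(x[i]) for i in x for _ in x[i]]
ys = [v for i in x for v in y[i]]

# Centering every variable absorbs the intercept; then solve the 2x2 normal equations.
xd, bd, yd = centered(xs), centered(xbars), centered(ys)
s_xx, s_xb, s_bb = dot(xd, xd), dot(xd, bd), dot(bd, bd)
s_xy, s_by = dot(xd, yd), dot(bd, yd)
det = s_xx * s_bb - s_xb ** 2
beta = (s_xy * s_bb - s_xb * s_by) / det     # coefficient on x_it
gamma = (s_xx * s_by - s_xb * s_xy) / det    # coefficient on the time average

# Within estimator for comparison.
x_dm = [v - mean(x[i]) for i in x for v in x[i]]
y_dm = [v - mean(y[i]) for i in x for v in y[i]]
within = dot(x_dm, y_dm) / dot(x_dm, x_dm)

print(beta, gamma, within)
```

Because the time average is itself a regressor, time-invariant variables could be added to this equation without being wiped out, which is the practical advantage over the within transformation.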