panel regression

Panel Data Regression

Vid Adrison

Outline

• Structure of Data

• Structure of error in panel data

• Strict exogeneity assumption

• Estimation techniques in panel data under strict exogeneity assumption

• Estimation technique when strict exogeneity assumption is violated

Structure of Data

• Time Series: Single individual, many time observations. Ex: West Java rice production, from 1980 to 2007

• Cross Section: Many individuals, single time observation. Ex: Indonesian Rice production by provinces, in 2007

• Panel/Longitudinal: Many individuals, and multiple time observations. Ex: Indonesian rice production by provinces, from 1980-2008

Error structure in panel data

• Ci is usually called: unobserved component, latent variable, unobserved heterogeneity, individual effect, individual heterogeneity

• Ci is assumed to be constant over time and to vary across individuals. For instance:

– Ability in a wage equation: ability is unobserved by the econometrician, but definitely affects an individual's wage

• The analysis of panel data is centered on the assumption about Ci (i.e., whether or not Ci is correlated with the explanatory variables)

– If Ci is correlated with one or more explanatory variables, then Fixed Effect is the appropriate technique

– If Ci is uncorrelated with every explanatory variable, then Random Effect is the appropriate technique

$$Y_{it} = X_{it}\beta + c_i + u_{it}$$

Strict Exogeneity Assumption

• Strict Exogeneity Assumption

– If stated in terms of the unobserved effect

$$E(Y_{it} \mid x_{i1}, x_{i2}, \ldots, x_{iT}, c_i) = E(Y_{it} \mid x_{it}, c_i) = X_{it}\beta + c_i$$

• Once Xit and Ci are controlled for, no other variable affects the expected value of Yit; in particular, Xis from other periods (s ≠ t) has no partial effect on Yit

– If stated in terms of the idiosyncratic error

$$E(u_{it} \mid x_{i1}, x_{i2}, \ldots, x_{iT}, c_i) = 0$$

$$E(x_{is}' u_{it}) = 0, \quad s, t = 1, \ldots, T$$

• This assumption is much stronger than contemporaneous exogeneity, because it rules out correlation between the error in period t and the covariates of every period, not only those of period t

• Standard Fixed Effect and Random Effect regressions are only valid when the strict exogeneity assumption is satisfied

Strict Exogeneity Assumption

• Examples of violations of the Strict Exogeneity Assumption

– Example 1:

$$\log(wage_{it}) = X_{it}\beta + \delta\, training_{it} + c_i + u_{it}$$

• If an individual's decision to participate in training is influenced by past shocks to his/her wage, or if the administrator chooses individuals with low Uit to participate in training in t+1, then the strict exogeneity assumption might not be satisfied


– Example 2:

$$\log(wage_{it}) = X_{it}\beta + \gamma\, wage_{i,t-1} + c_i + u_{it}$$

• In this model, an individual's wage depends on his/her wage in the past. Recall that the fundamental assumption in panel data is E(Xis'Uit) = 0 for s, t = 1, …, T. A shock to wage at time t (Uit) affects the wage at time t, and that wage is the lagged regressor at time t+1. Since the lagged wage is included as an explanatory variable, E(Xis'Uit) is not zero for s = t+1. Thus, no model with a lagged dependent variable satisfies the strict exogeneity assumption, and standard Random Effect or standard Fixed Effect estimation is not appropriate

Estimation Techniques under Strict Exogeneity Assumption

• A: FIXED EFFECT: if the unobserved heterogeneity (Ci) is arbitrarily correlated with observed characteristics

– Example:

• A firm's decision to evade taxes depends on unobservable characteristics (e.g., the manager's preference to cheat). However, the decision may also be related to some observed characteristics, such as asset size and cash flow (big firms have higher incentives to evade taxes, and are more able to pay fines if the evasion is detected)

• There are two estimation techniques under Fixed Effect

– Between Estimator

• Estimates the parameters using cross-sectional information

• Runs the regression on each individual's over-time averages

• What is the expected difference in Y between Mary and Joe if they differ in X by one unit?

– Within Estimator

• Estimates the parameters using the time-series information of each individual

• Calculated by regressing each variable's deviation from its over-time average, which gets rid of the time-constant unobservable

• What is the expected change in Joe's Y if his X increases by one unit?

$$Y_{it} - \bar{Y}_i = (X_{it} - \bar{X}_i)\beta + u_{it} - \bar{u}_i$$
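The difference between the pooled, between and within slopes can be illustrated with a small sketch. The toy panel below is hypothetical (three individuals, three periods, no idiosyncratic noise), and the individual effect is deliberately built to be correlated with X, so only the within transformation recovers the assumed true slope of 2.

```python
# Hypothetical toy panel: y_it = 2 * x_it + c_i, with c_i rising in x (assumed data).
panel = {
    "A": {"x": [1.0, 2.0, 3.0], "c": 10.0},
    "B": {"x": [2.0, 3.0, 4.0], "c": 20.0},
    "C": {"x": [3.0, 4.0, 5.0], "c": 30.0},
}
BETA = 2.0  # assumed true slope


def slope(xs, ys):
    """OLS slope of y on x, with an intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


x_all, y_all = [], []        # raw observations (pooled OLS)
x_dev, y_dev = [], []        # deviations from individual means (within)
x_bar, y_bar = [], []        # individual means (between)
for unit in panel.values():
    xs = unit["x"]
    ys = [BETA * x + unit["c"] for x in xs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    x_all += xs
    y_all += ys
    x_dev += [x - mean_x for x in xs]
    y_dev += [y - mean_y for y in ys]
    x_bar.append(mean_x)
    y_bar.append(mean_y)

pooled = slope(x_all, y_all)     # 7.0: contaminated by c_i
between = slope(x_bar, y_bar)    # 12.0: contaminated by c_i
within = slope(x_dev, y_dev)     # 2.0: c_i is removed by demeaning
print(pooled, between, within)
```

Because c_i is constant within each individual, it drops out of the demeaned data, which is why only the within slope equals the assumed value.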

Estimation Techniques under the Strict Exogeneity Assumption


• If a variable is constant over time for every individual, its parameter estimate cannot be obtained with the within estimator.

• Example: We want to see what factors cause the economic growth of a city. In the specification, we include a dummy variable to indicate the location of the city, i.e., whether it is located near the sea. Since the value of the location dummy is constant over time, its deviation from the over-time average is zero, just like that of the unobserved heterogeneity. Thus, we cannot distinguish the effect of the time-constant observable from the effect of the time-constant unobservable

Estimation Techniques under the Strict Exogeneity Assumption


• B. RANDOM EFFECT: if the unobserved heterogeneity (Ci) is uncorrelated with the explanatory variables

• We assume that the individual-specific constants (unobserved heterogeneity) are randomly distributed across cross-sectional units

• It would be appropriate if we believe that the sampled cross-sectional units were drawn from a large population

• The estimation is conducted by FGLS

• The parameter value of Random Effect is a weighted average of the Between and Within Estimators

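$$\hat{\beta}_{RE} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}Y = \left(\sum_{i=1}^{n} X_i'\Omega^{-1}X_i\right)^{-1}\left(\sum_{i=1}^{n} X_i'\Omega^{-1}Y_i\right)$$

Where

$$\Omega^{-1/2} = \frac{1}{\sigma_u}\left[I_T - \frac{\theta}{T}\, i_T i_T'\right], \qquad \theta = 1 - \frac{\sigma_u}{\sqrt{\sigma_u^2 + T\sigma_c^2}}$$

Here $\sigma_u^2$ is the variance of the idiosyncratic error $u_{it}$, $\sigma_c^2$ is the variance of $c_i$, and $i_T$ is a $T \times 1$ vector of ones.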
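As a rough numerical illustration, the GLS weight can be computed from the variance components reported in the random-effects xtreg output of this deck. The sketch assumes the standard balanced-panel formula θ = 1 − σ(idiosyncratic) / sqrt(σ²(idiosyncratic) + T·σ²(individual)). Note the naming: Stata's sigma_u is the standard deviation of the individual effect (Ci in these slides) and sigma_e is that of the idiosyncratic error (Uit in these slides).

```python
import math

# Values copied from the random-effects xtreg output in this deck.
sd_individual = 8.1923983     # Stata sigma_u: s.d. of the individual effect (c_i here)
sd_idiosyncratic = 3.612922   # Stata sigma_e: s.d. of the idiosyncratic error (u_it here)
T = 3                         # observations per group, as reported

var_c = sd_individual ** 2
var_u = sd_idiosyncratic ** 2

# Share of each individual mean that the GLS transformation subtracts
# (assumed standard formula for a balanced panel).
theta = 1.0 - sd_idiosyncratic / math.sqrt(var_u + T * var_c)

# Fraction of the error variance due to the individual effect.
rho = var_c / (var_c + var_u)

print(round(theta, 4), round(rho, 4))
```

With θ = 0 the transformation leaves the data unchanged (pooled OLS), and with θ = 1 it becomes full demeaning (the within estimator); a value of roughly 0.75 here puts the Random Effect estimate between the two.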

Estimation Techniques under Strict Exogeneity Assumption

• If we do not have a time-constant variable, which method is appropriate?

• Use the Hausman test, which basically tests whether there is a systematic difference between the two specifications

• For instance: Specification RE uses Random Effect, and Specification FE uses Fixed Effect

– Ho: There is no systematic difference between specifications RE and FE

– Ha: There is a systematic difference between specifications RE and FE

» Specification FE is consistent under both Ho and Ha

» Specification RE is inconsistent under Ha, but efficient under Ho

Estimation Technique when Strict Exogeneity Assumption is Violated

• General steps:

– Use a transformation to eliminate the unobserved heterogeneity

– Choose an instrument for the endogenous variables in the transformed equation

– Estimate using pooled 2SLS

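$$Y_{it} = X_{it}\beta + \gamma Y_{i,t-1} + c_i + u_{it}$$

$$Y_{it} - Y_{i,t-1} = (X_{it} - X_{i,t-1})\beta + \gamma (Y_{i,t-1} - Y_{i,t-2}) + (c_i - c_i) + (u_{it} - u_{i,t-1})$$

$$Y_{it} - Y_{i,t-1} = (X_{it} - X_{i,t-1})\beta + \gamma (Y_{i,t-1} - Y_{i,t-2}) + (u_{it} - u_{i,t-1})$$

$$\Delta Y_{it} = \Delta X_{it}\beta + \gamma\, \Delta Y_{i,t-1} + \Delta u_{it}$$

Use $Y_{i,t-2}$ as an instrument for $\Delta Y_{i,t-1}$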
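The three steps can be sketched on simulated data. The example below is only an illustration under stated assumptions: a hypothetical dynamic model with no X regressors, an assumed true lag coefficient of 0.5, first differencing as the transformation, and the twice-lagged level as the single instrument. It compares the instrumented estimate with OLS on the differenced equation.

```python
import random

random.seed(0)
GAMMA = 0.5                 # assumed true coefficient on the lagged dependent variable
N, T, BURN = 5000, 8, 50    # individuals, periods kept, burn-in periods (all assumed)

num_iv = den_iv = 0.0
num_ols = den_ols = 0.0
for _ in range(N):
    c = random.gauss(0.0, 0.5)          # unobserved heterogeneity c_i
    y = c / (1.0 - GAMMA)               # start at the individual's long-run mean
    path = []
    for t in range(BURN + T):
        y = GAMMA * y + c + random.gauss(0.0, 1.0)   # y_it = gamma*y_it-1 + c_i + u_it
        if t >= BURN:
            path.append(y)
    for t in range(2, T):
        dy = path[t] - path[t - 1]          # first difference removes c_i
        dy_lag = path[t - 1] - path[t - 2]  # endogenous regressor in differences
        z = path[t - 2]                     # instrument: level lagged twice
        num_iv += z * dy
        den_iv += z * dy_lag
        num_ols += dy_lag * dy
        den_ols += dy_lag * dy_lag

gamma_iv = num_iv / den_iv      # just-identified IV (equals 2SLS with one instrument)
gamma_ols = num_ols / den_ols   # OLS on differences, biased because dy_lag contains u_it-1
print(round(gamma_iv, 3), round(gamma_ols, 3))
```

In this setup the differenced regressor is correlated with the differenced error through u at t−1, so OLS on the differenced equation is biased downward, while the instrumented estimate should land close to the assumed 0.5.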

Regression Results

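. reg mrdrte exec unem

      Source |       SS       df       MS              Number of obs =     153
-------------+------------------------------           F(  2,   150) =    4.98
       Model |  799.796283     2  399.898141           Prob > F      =  0.0081
    Residual |  12045.5418   150  80.3036122           R-squared     =  0.0623
-------------+------------------------------           Adj R-squared =  0.0498
       Total |  12845.3381   152  84.5088034           Root MSE      =  8.9612

------------------------------------------------------------------------------
      mrdrte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        exec |   .1650227   .1938679     0.85   0.396    -.2180419    .5480872
        unem |   1.258905   .4373612     2.88   0.005      .394721     2.12309
       _cons |    .348119    2.68724     0.13   0.897    -4.961612     5.65785
------------------------------------------------------------------------------

. xtreg mrdrte exec unem

Random-effects GLS regression                   Number of obs      =       153
Group variable: id                              Number of groups   =        51

R-sq:  within  = 0.0015                         Obs per group: min =         3
       between = 0.0732                                        avg =       3.0
       overall = 0.0433                                        max =         3

Random effects u_i ~ Gaussian                   Wald chi2(2)       =      0.90
corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.6369

------------------------------------------------------------------------------
      mrdrte |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        exec |  -.0351956   .1619968    -0.22   0.828    -.3527036    .2823124
        unem |   .2560543   .2708762     0.95   0.345    -.2748532    .7869619
       _cons |   6.584371   2.001338     3.29   0.001     2.661819    10.50692
-------------+----------------------------------------------------------------
     sigma_u |  8.1923983
     sigma_e |   3.612922
         rho |  .83717807   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. xttest0

Breusch and Pagan Lagrangian multiplier test for random effects

        mrdrte[id,t] = Xb + u[id] + e[id,t]

        Estimated results:
                         |       Var     sd = sqrt(Var)
                ---------+-----------------------------
                  mrdrte |    84.5088       9.192867
                       e |   13.05321       3.612922
                       u |   67.11539       8.192398

        Test:   Var(u) = 0
                              chi2(1) =    98.47
                          Prob > chi2 =     0.0000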

Specification Test

What method to choose?

– Depends on the existence of unobserved heterogeneity

– If the unobserved heterogeneity is significant, then it depends on whether or not it is correlated with observed characteristics

• We can employ the Hausman test, which basically tests RE against FE. In the Hausman test, a statistically significant difference is interpreted as evidence against the random effects specification

• However, there are two caveats:

1. Correlation between observed characteristics and the idiosyncratic error causes both RE and FE to be inconsistent

2. The Hausman test is conducted under two assumptions: the unobserved characteristic is uncorrelated with observed characteristics, and it is normally distributed. If it is not normally distributed, the Hausman test has no systematic power against this condition.

Specification Test

– Under the null, OLS, FE and RE are all consistent. If the null is rejected, RE is inconsistent.

– However, there are cases where the difference between the RE and FE coefficients is small, but statistically significant.

– On the other hand, there are cases where the RE and FE coefficients differ greatly, but we cannot reject the null due to large standard errors.

– In these circumstances, a typical response is to choose the RE specification. However, this comes at the cost of an increased risk of Type II error (failing to reject the null when it is false)

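. xtreg mrdrte exec unem, fe

Fixed-effects (within) regression               Number of obs      =       153
Group variable: id                              Number of groups   =        51

                                                F(2,100)           =      0.24
corr(u_i, Xb)  = -0.0635                        Prob > F           =    0.7909

------------------------------------------------------------------------------
      mrdrte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        exec |  -.1140743   .1800836    -0.63   0.528    -.4713551    .2432065
        unem |    .095914   .2800721     0.34   0.733    -.4597411    .6515692
       _cons |   7.637844   1.684436     4.53   0.000     4.295971    10.97972
-------------+----------------------------------------------------------------
     sigma_u |   8.788124
     sigma_e |   3.612922
         rho |  .85542114   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(50, 100) =    16.46             Prob > F = 0.0000

. est store fe

. xtreg mrdrte exec unem

Random-effects GLS regression                   Number of obs      =       153
Group variable: id                              Number of groups   =        51

Random effects u_i ~ Gaussian                   Wald chi2(2)       =      0.90
corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.6369

------------------------------------------------------------------------------
      mrdrte |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        exec |  -.0351956   .1619968    -0.22   0.828    -.3527036    .2823124
        unem |   .2560543   .2708762     0.95   0.345    -.2748532    .7869619
       _cons |   6.584371   2.001338     3.29   0.001     2.661819    10.50692
-------------+----------------------------------------------------------------
     sigma_u |  8.1923983
     sigma_e |   3.612922
         rho |  .83717807   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. est store re

                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |       fe           re         Difference          S.E.
-------------+----------------------------------------------------------------
        exec |   -.1140743    -.0351956       -.0788787        .0786584
        unem |     .095914     .2560543       -.1601403        .0711792
------------------------------------------------------------------------------
                           b = consistent under Ho and Ha; obtained from xtreg
            B = inconsistent under Ha, efficient under Ho; obtained from xtreg

    Test:  Ho:  difference in coefficients not systematic

                  chi2(2) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                          =        6.79
                Prob>chi2 =      0.0336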
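The quadratic form in the Hausman output can be sketched by hand. The coefficients and standard errors below are copied from the output above; the helper names `hausman_2x2` and `chi2_sf_2df` are hypothetical. The output reports only the square roots of the diagonal of V_b − V_B, so the sketch assumes zero off-diagonal covariances, which is an assumption and not what Stata uses. For that reason it is not expected to reproduce the reported 6.79 exactly.

```python
import math


def hausman_2x2(b, B, v_diff):
    """Quadratic form (b-B)' V^{-1} (b-B) for two coefficients, V = V_b - V_B (2x2)."""
    d0, d1 = b[0] - B[0], b[1] - B[1]
    (a, c), (c2, e) = v_diff
    det = a * e - c * c2
    # Inverse of a 2x2 matrix is (1/det) * [[e, -c], [-c2, a]].
    return (d0 * (e * d0 - c * d1) + d1 * (-c2 * d0 + a * d1)) / det


def chi2_sf_2df(x):
    """Upper-tail probability of a chi-square with exactly 2 degrees of freedom."""
    return math.exp(-x / 2.0)


b_fe = (-0.1140743, 0.095914)     # exec, unem from the FE output
B_re = (-0.0351956, 0.2560543)    # exec, unem from the RE output
se_diff = (0.0786584, 0.0711792)  # sqrt(diag(V_b - V_B)) as reported

# ASSUMPTION: off-diagonal elements set to zero because they are not reported.
v_diag_only = ((se_diff[0] ** 2, 0.0), (0.0, se_diff[1] ** 2))

stat_diag = hausman_2x2(b_fe, B_re, v_diag_only)
p_diag = chi2_sf_2df(stat_diag)
p_reported = chi2_sf_2df(6.79)    # p-value implied by the reported chi2(2) = 6.79

print(round(stat_diag, 3), round(p_diag, 4), round(p_reported, 4))
```

The diagonal-only statistic comes out near 6.07 rather than 6.79, and the gap reflects the omitted covariance term. The p-value implied by 6.79 with 2 degrees of freedom is about 0.0335, in line with the reported 0.0336 once the rounding of the statistic is allowed for.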

Mundlak’s Approach

• Even with the Hausman test available, choosing between the FE and RE specifications poses a dilemma.

– FE is robust to correlation between the unobserved heterogeneity and the explanatory variables. However, we cannot use time-invariant regressors.

– RE, on the other hand, can use time-invariant regressors, but its assumption of zero correlation between the unobserved heterogeneity and the explanatory variables is unlikely to hold

• Mundlak (1978) proposes a modification of RE that would at least partially overcome this deficiency.

– The trick is to include additional variables, the time averages of the time-varying variables, in the regression

$$y_{it} = X_{it}\beta + \bar{X}_i\pi + c_i + u_{it}$$
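A minimal sketch of the idea, using a hypothetical toy panel with one time-varying regressor: adding the individual time average of x to a pooled regression yields a coefficient on x that coincides with the within (Fixed Effect) slope. The sketch uses pooled OLS rather than FGLS for simplicity, and the helper names `solve` and `ols` are illustrative.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def ols(rows, y):
    """OLS coefficients from the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical toy panel: (x values, y values) for each individual.
panel = [
    ([1.0, 2.0, 4.0], [3.0, 4.5, 9.0]),
    ([2.0, 5.0, 6.0], [9.0, 14.0, 17.5]),
    ([3.0, 4.0, 8.0], [14.0, 17.0, 24.0]),
]

rows, y_all = [], []
sxy = sxx = 0.0
for xs, ys in panel:
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    for x, y in zip(xs, ys):
        rows.append([1.0, x, x_bar])   # constant, x_it, time average of x
        y_all.append(y)
        sxy += (x - x_bar) * (y - y_bar)
        sxx += (x - x_bar) ** 2

mundlak = ols(rows, y_all)   # [intercept, coefficient on x_it, coefficient on x-bar_i]
within = sxy / sxx           # within (Fixed Effect) slope
print(round(mundlak[1], 6), round(within, 6))
```

The coefficient on the time average picks up the part of the individual effect that is correlated with x. A time-invariant regressor could be added as a further column, which is what the within estimator cannot accommodate.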
