Journal of Econometrics 18 (1982) 5-46. North-Holland Publishing Company
MULTIVARIATE REGRESSION MODELS
FOR PANEL DATA
Gary CHAMBERLAIN*
University of Wisconsin, Madison, WI 53706, USA
National Bureau of Economic Research, Cambridge, MA 02138, USA
The paper examines the relationship between heterogeneity bias and strict exogeneity in a distributed lag regression of y on x. The relationship is very strong when x is continuous, weaker when x is discrete, and non-existent as the order of the distributed lag becomes infinite. The individual specific random variables introduce nonlinearity and heteroskedasticity; so the paper provides an appropriate framework for the estimation of multivariate linear predictors. Restrictions are imposed using a minimum distance estimator. It is generally more efficient than the conventional estimators such as quasi-maximum likelihood. There are computationally simple generalizations of two- and three-stage least squares that achieve this efficiency gain. Some of these ideas are illustrated using the sample of Young Men in the National Longitudinal Survey. The paper reports regressions on the leads and lags of variables measuring union coverage, SMSA, and region. The results indicate that the leads and lags could have been generated just by a random intercept. This gives some support for analysis of covariance type estimates; these estimates indicate a substantial heterogeneity bias in the union, SMSA, and region coefficients.
1. Introduction
Suppose that we have a sample of individuals (or firms) followed over time: (x_{it}, y_{it}), where there are t = 1, ..., T periods and i = 1, ..., N individuals. Consider the following distributed lag specification:

E(y_{it} | x_{i1}, ..., x_{iT}, b_{i0}, ..., b_{iJ}, c_i) = Σ_{j=0}^{J} b_{ij} x_{i,t−j} + c_i,    t = J+1, ..., T.
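A small simulation may help fix ideas. The following sketch is not part of the paper; the data generating process, sample sizes, and distributions are all illustrative assumptions. It generates a panel from the specification above with J = 1 and a random intercept c_i correlated with x, and shows that a cross-section least squares regression does not recover the β_j:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, J = 5000, 6, 1
beta = np.array([1.0, 0.5])            # population parameters beta_j = E(b_ij)

# individual-specific slopes and intercepts (the heterogeneity)
b = beta + 0.3 * rng.standard_normal((N, J + 1))
c = rng.standard_normal(N)

# regressor correlated with c_i, so least squares will be biased
x = 0.8 * c[:, None] + rng.standard_normal((N, T))

# y_it = sum_j b_ij x_{i,t-j} + c_i + noise,  t = J+1, ..., T
y = np.full((N, T), np.nan)
for t in range(J, T):
    y[:, t] = sum(b[:, j] * x[:, t - j] for j in range(J + 1)) \
              + c + rng.standard_normal(N)

# cross-section least squares of y_T on (x_T, x_{T-1}) does not recover beta
X = np.column_stack([np.ones(N), x[:, T - 1], x[:, T - 2]])
coef = np.linalg.lstsq(X, y[:, T - 1], rcond=None)[0]
print("OLS slopes:", coef[1:], "vs beta:", beta)   # biased upward via c
```

The upward bias appears because E(c | x) loads positively on each regressor here, which is exactly the heterogeneity bias discussed next.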
The coefficients b_{ij} and c_i are allowed to vary across individuals but are constant over time. The population parameters of interest are β_j = E(b_{ij}), j = 0, ..., J. If the b_{ij} or c_i are correlated with x, then a least squares regression
*I am grateful to Arthur Goldberger, Zvi Griliches, Donald Hester, George Jakubson, Ariel
Pakes, and Burton Singer for comments and helpful discussions. Financial support was provided
by the National Science Foundation (Grants No. SOC-7925959 and No. SES-8016383) and by
funds granted to the Institute for Research on Poverty at the University of Wisconsin, Madison,
by the Department of Health, Education, and Welfare pursuant to the provisions of the
Economic Opportunity Act of 1964.
0165-7410/82/0000-0000/$02.75 © 1982 North-Holland
of y_t on x_t, ..., x_{t−J} will not provide a consistent estimator of the β_j (as N → ∞). We shall refer to this inconsistency as a heterogeneity bias.
In section 2, on identification, we consider first the case J = 0 and b_{ij} = β_j. We argue that the presence of heterogeneity bias will be signalled by a full set of lags and leads in the least squares regression of y_t on x_1, ..., x_T. Furthermore, if we let y = (y_1, ..., y_T), x = (x_1, ..., x_T) and consider the multivariate linear predictor

E*(y | x) = Π_0 + Π_1 x,

then the T × T matrix Π_1 should have a distinctive pattern: the off-diagonal elements within the same column are all equal. In that case

E*(y_t − y_{t−1} | x) = β(x_t − x_{t−1}),

so there is just a contemporaneous relationship when we transform to first differences. I think that a test for such restrictions should accompany analysis of covariance type estimation.
There is an analogous question when J is finite and the b_j are random as well as c. Does E(y_t | x_1, ..., x_T) = E(y_t | x_t, ..., x_{t−J}) imply that there is no heterogeneity bias? We find that the answer is yes if x has a continuous distribution but not if x is discrete.
New issues arise as the order (J) of the distributed lag becomes infinite. We consider this problem in the context of a stationary stochastic process; c and the b_j are (shift) invariant random variables. There are invariant random variables with non-zero variance if and only if the process is not ergodic. We pose the following question: if

E*(y_t | ..., x_{t−1}, x_t, x_{t+1}, ...) = E*(y_t | x_t, x_{t−1}, ...),

so that y does not cause x according to the Sims (1972) definition, is it then true that there is no heterogeneity bias? The answer is no, because if d is an invariant random variable, then

E*(d | ..., x_{t−1}, x_t, x_{t+1}, ...) = E*(d | x_t, x_{t−1}, ...).
Section 3 of the paper considers the estimation of multivariate linear predictors. There is a sample r_i′ = (x_i′, y_i′), i = 1, ..., N, where x_i′ = (x_{i1}, ..., x_{iK}) and y_i′ = (y_{i1}, ..., y_{iM}). We assume that r_i is independent and identically distributed (i.i.d.) according to some distribution with finite fourth moments. We do not assume that the regression function E(y_i | x_i) is linear; for although E(y_i | x_i, c_i) may be linear, there is generally no reason to insist that E(c_i | x_i) is linear. Furthermore, we allow the conditional variance V(y_i | x_i) to be an arbitrary function of x_i; the heteroskedasticity could, for example, be due to random coefficients. Let w_i be the vector formed from the squares and cross-products of the elements of r_i; let Π be the matrix of linear predictor coefficients:
E*(y_i | x_i) = Πx_i, where Π = E(y_i x_i′)[E(x_i x_i′)]^{−1}. Then w_i is i.i.d. and Π is a function of E(w_i). So the problem is to make inferences about differentiable functions of a population mean, under random sampling.
This is straightforward and the results have a variety of novel implications. Let Π̂ be the least squares estimator; let π̂ and π be the vectors formed from the columns of Π̂ and Π. Then √N(π̂ − π) →d N(0, Ω) as N → ∞. The formula for Ω is not the standard one, since we are not assuming homoskedastic, linear regression.
We impose restrictions by using a minimum distance estimator: find the matrix satisfying the restrictions that is closest to Π̂ in the norm provided by Ω̂^{−1}, where Ω̂ is a consistent (as N → ∞) estimator of Ω. This leads to some surprising results. For example, consider a univariate linear predictor: E*(y_i | x_{i1}, x_{i2}) = π_0 + π_1 x_{i1} + π_2 x_{i2}. We can impose the restriction that π_2 = 0 by using a least squares regression of y on x_1 to estimate π_1; however, this is asymptotically less efficient, in general, than our minimum distance estimator. The conventional estimator is a minimum distance estimator, but it is using a different norm.
A related result is that two-stage least squares is not, in general, an efficient procedure for combining instrumental variables; three-stage least squares is also using the wrong norm. We provide more efficient estimators for the linear simultaneous equations model by applying our minimum distance procedure to the reduced form, thereby generalizing Malinvaud's (1970) minimum distance estimator. Suppose that the only restrictions are that certain structural coefficients are zero (and the normalization rule). We provide a generalization of three-stage least squares that has the same limiting distribution as our minimum distance estimator. There is a corresponding generalization of two-stage least squares.
We also consider the maximum likelihood estimator based on assuming that r_i has a multivariate normal distribution with mean μ and covariance matrix Σ. Then the slope coefficients in Π are functions of Σ and, more generally, we can consider estimating arbitrary functions of Σ subject to restrictions. When the normality assumption does not hold, we refer to the estimator as a quasi-maximum likelihood estimator. The quasi-maximum likelihood estimator has the same limiting distribution as a certain minimum distance estimator; but in general that minimum distance estimator is not using the optimal norm. Hence our estimator is generally more efficient than the quasi-maximum likelihood estimator.
Section 4 of the paper presents an empirical example that illustrates some of the results. It is based on the panel of Young Men in the National Longitudinal Survey (Parnes); y_t is the logarithm of the individual's hourly wage, and x_t includes variables to indicate whether or not the individual's wage is set by collective bargaining; whether or not he lives in an SMSA; and whether or not he lives in the South. We present unrestricted least
squares regressions of y_t on x_1, ..., x_T. There are significant leads and lags; if they are generated just by a random intercept (c), then Π should have a distinctive form. There is some evidence in favor of this, and hence some justification for analysis of covariance estimation. In this example, the leads and lags could be interpreted as due just to c, with E(y_t | x_1, ..., x_T, c) = βx_t + c.
2. Identification
Suppose that a farmer is producing a product with a Cobb-Douglas technology:

y_t = βx_t + c + u_t,    0 < β < 1.
With more than one observation per farm, however, we can consider the least squares regression of y_t on x = (x_1, ..., x_T). The population counterpart is

E*(y_t | x) = βx_t + E*(c | x) + E*(u_t | x).

Assume that V(x) is non-singular. Then

E*(c | x) = ψ + λ′x,    λ = V^{−1}(x) cov(x, c).
Even if E*(u_t | x) = 0, there will generally be a full set of lags and leads if V(c) ≠ 0. For example, if cov(x_t, c) = cov(x_1, c), t = 1, ..., T, then λ is proportional to the row sums of V^{−1}(x), and all of the elements of λ will typically be non-zero. I think that it is generally true that E*(c | x) depends on all of the x_t's if it depends on any of them. So the presence of heterogeneity bias will be signalled by a full set of lags and leads. Also, if E*(u | x) = 0, then the wide-sense multivariate regression will have a distinctive pattern:

Π_1 = cov(y, x) V^{−1}(x) = βI_T + 1λ′,

where 1 is a T × 1 vector of ones. The off-diagonal elements within the same column of Π_1 are all equal.
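The pattern Π_1 = βI_T + 1λ′ can be checked numerically from population moments. The following sketch is illustrative only; the choices T = 4, β = 1, and x_t = 0.8c + e_t (with c and the e_t standard normal and independent) are assumptions made for the example:

```python
import numpy as np

T, beta = 4, 1.0
a = 0.8                                   # x_t = a*c + e_t, V(c) = 1
Vx = a * a * np.ones((T, T)) + np.eye(T)  # V(x)
cxc = a * np.ones(T)                      # cov(x_t, c), same for all t

# cov(y, x): y_t = beta*x_t + c + u_t with E*(u_t | x) = 0
Cyx = beta * Vx + np.outer(np.ones(T), cxc)
Pi1 = Cyx @ np.linalg.inv(Vx)

lam = np.linalg.inv(Vx) @ cxc             # lambda = V^{-1}(x) cov(x, c)
assert np.allclose(Pi1, beta * np.eye(T) + np.outer(np.ones(T), lam))

# off-diagonal elements within each column are all equal:
for j in range(T):
    off = np.delete(Pi1[:, j], j)
    assert np.allclose(off, off[0])
print("Pi1 = beta*I + 1*lambda' verified; lambda =", lam)
```

Every element of λ is non-zero here, so the single heterogeneity term c produces a full set of lags and leads, exactly as claimed.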
A common solution to the bias problem is some form of analysis of covariance.² For example, we can form the farm specific means (ȳ = Σ_{t=1}^T y_t/T, x̄ = Σ_{t=1}^T x_t/T) and the deviations around them (ỹ_t = y_t − ȳ, x̃_t = x_t − x̄), and then run a pooled least squares regression of ỹ on x̃. This is equivalent to first running the least squares regression of ỹ_t on x̃_t for each of the T cross-section samples, and then forming a weighted average of the T slope coefficients. The population counterpart of the t-th least squares regression is

E*(y_t − ȳ | x_t − x̄) = β(x_t − x̄) + E*(u_t − ū | x_t − x̄).

So the least squares regression of ỹ_t on x̃_t provides a consistent (as N → ∞) estimator of β only if E*(u_t − ū | x_t − x̄) = 0. I would not expect this condition to hold unless

E*(u_t − u_{t−1} | x_2 − x_1, ..., x_T − x_{T−1}) = 0,    t = 2, ..., T,
²This analysis of covariance estimator was used by Mundlak (1961). Related estimators have been discussed by Balestra and Nerlove (1966), Wallace and Hussain (1969), Amemiya (1971), Maddala (1971), and Mundlak (1978). Analysis of covariance in nonlinear models is discussed in Chamberlain (1980).
so that x is strictly exogenous when we transform the model to first differences.³ The strict exogeneity restriction is testable, since it implies that

E*(y_t − y_{t−1} | x_2 − x_1, ..., x_T − x_{T−1}) = E*(y_t − y_{t−1} | x_t − x_{t−1});

hence there are exclusion restrictions on the linear predictors.
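A simulated panel can illustrate why analysis of covariance works when u_t is strictly exogenous. The sketch below is not from the paper; the data generating process (random intercept correlated with x, i.i.d. u_t) is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, beta = 20000, 5, 1.0

c = rng.standard_normal(N)
x = 0.8 * c[:, None] + rng.standard_normal((N, T))       # x correlated with c
y = beta * x + c[:, None] + rng.standard_normal((N, T))  # u_t strictly exogenous

# pooled least squares of y on x: heterogeneity bias
b_pool = np.sum(x * y) / np.sum(x * x)   # no intercept needed: all means zero

# analysis of covariance: deviations from farm-specific means
xd = x - x.mean(axis=1, keepdims=True)
yd = y - y.mean(axis=1, keepdims=True)
b_within = np.sum(xd * yd) / np.sum(xd * xd)

print(b_pool, b_within)   # b_pool is biased upward; b_within is close to beta
```

Taking deviations from farm-specific means removes c, so the within slope is consistent here, while the pooled slope picks up cov(x, c)/V(x).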
A stronger condition is that

E*(u_t | x_1, ..., x_T) = 0,    t = 1, ..., T.

This implies that Π_1 has the form βI_T + 1λ′. These restrictions on Π_1 are testable; we can summarize them by saying that x is strictly exogenous conditional on c. The restrictions would fail to hold in the production function example if u_t is partly predictable from its past, so that E[exp(u_t) | u_{t−1}, u_{t−2}, ...] depends on u_{t−1}, u_{t−2}, ...
Now suppose that the technology varies across the farms, so that

y_t = bx_t + c + u_t,

where b is a random variable that is constant over time. We shall refer to b and c as invariant random variables. Our discussion of E*(c | x) indicated that it depends on all of the x_t's if it depends on any of them. I would expect this to be true of E(c | x) as well. This general characteristic of invariant random variables is formulated in the following condition:
Condition (C). Let x* = (x_{t_1}, ..., x_{t_K}), where {t_1, ..., t_K} is some proper subset of {1, ..., T}. Let d be an invariant random variable. Then E(d | x) = E(d | x*) implies that E(d | x) = E(d).
Suppose that the parameter of interest is β = E(b). If b or c is correlated with x, then a least squares regression of y_t on x_t will not provide a consistent estimator of β. We have argued that such a heterogeneity bias will be signalled by a full set of lags and leads when we regress y_t on (x_1, ..., x_T). Under what conditions can we infer that there is no bias if we observe only a contemporaneous relationship? Proposition 1 provides some guidance; it can be extended easily to the case of a finite distributed lag.

Condition (R). Prob(x_n = x_{n−1}) = 0 for some integer n with 2 ≤ n ≤ T.
Proposition 1. Suppose that

E(y_t | x, b, c) = bx_t + c,    t = 1, ..., T.

³The strict exogeneity terminology is based on Sims (1972, 1974).
If conditions (C) and (R) hold and if T ≥ 3, then

E(y_t | x) = E(y_t | x_t),    t = 1, ..., T,

implies that

E(y_t | x) = βx_t + γ,

where β = E(b) = E(b | x) and γ = E(c) = E(c | x).

Proof. The following equalities hold with probability one:

E(b | x) = [E(y_n | x) − E(y_{n−1} | x)]/(x_n − x_{n−1}) = [E(y_n | x_n) − E(y_{n−1} | x_{n−1})]/(x_n − x_{n−1}).

So E(b | x) = E(b | x_n, x_{n−1}), and if T ≥ 3, then (C) implies that E(b | x) = E(b), and

E(c | x) = E(y_t | x) − E(b | x)x_t = E(y_t | x_t) − βx_t;

hence E(c | x) = E(c | x_t) and so E(c | x) = E(c). Q.E.D.
This analysis can be applied to linear transformations of the process. If we find that E(y_t | x) has a full set of lags and leads, then we can ask if that is just due to E(c | x) ≠ E(c). Let Δy_t = y_t − y_{t−1}, Δx_t = x_t − x_{t−1}, and Δx = (Δx_2, ..., Δx_T). Under the assumptions of the proposition, if

E(Δy_t | Δx) = E(Δy_t | Δx_t),

then

E(Δy_t | Δx) = βΔx_t.
Note that it is possible to find E(Δy_t | Δx) = E(Δy_t | Δx_t) even though E(b | x) ≠ E(b). For example, consider the stationary case in which cov(x_t, b) = cov(x_1, b); then E*(b | Δx) = E(b), and so E(b | Δx) = E(b) if the regression function of b on Δx is linear. Then we might find that E(y_t | x) has a full set of lags and leads even though E(Δy_t | Δx) does not.
The condition that prob(x_n = x_{n−1}) = 0 is necessary. For consider the following counter-example:

E(b | x) = β_1 if x_1 = ... = x_T,
E(b | x) = β_2 if not

(β_1 ≠ β_2). Then the argument of Proposition 1 recovers β_2,
but β_2 ≠ E(b) unless prob(x_1 = ... = x_T) = 0. So there is an important distinction here between continuous and discrete distributions for x. If x_t only takes on a finite set of values, then there will generally be positive probability that x_1 = ... = x_T, although this probability may become negligible for large T.
The following proposition provides some additional insight into this distinction; it is based on a condition that is slightly weaker than (R):

Condition (R′). Prob(x_1 = x_2 = ... = x_T) = 0.
Proposition 2. Suppose that

E(y_t | x, b, c) = bx_t + c,    t = 1, ..., T,

where T ≥ 2. Assume that condition (R′) holds and define

b̂ = Σ_{t=1}^T (y_t − ȳ)(x_t − x̄) / Σ_{t=1}^T (x_t − x̄)².

Then E(b̂) = E(b) if E(|b̂|) < ∞.⁴
Proof. The following equalities hold with probability one:

E(b̂ | x, b, c) = Σ_{t=1}^T b(x_t − x̄)² / Σ_{t=1}^T (x_t − x̄)² = b;

so if E(|b̂|) < ∞,

E(b̂) = E[E(b̂ | x, b, c)] = E(b). Q.E.D.
Suppose that (y_{i1}, ..., y_{iT}, x_{i1}, ..., x_{iT}), i = 1, ..., N, is a random sample from the distribution of (y, x). Define

b̂_i = Σ_{t=1}^T (y_{it} − ȳ_i)(x_{it} − x̄_i) / Σ_{t=1}^T (x_{it} − x̄_i)².

Then if the assumptions of Proposition 2 are satisfied, Σ_{i=1}^N b̂_i/N converges almost surely (a.s.) to E(b) as N → ∞. It is important that b̂_i is an unbiased estimator of E(b), since we are actually taking the unweighted mean of a
⁴The assumption that E(|b̂|) < ∞ is not innocuous. For example, suppose that V(c) = V(b) = 0 and (x_t, y_t) is independent and identically distributed (t = 1, ..., T) according to a bivariate normal distribution. Then b̂ = b + {V(y_t | x_t)/[(T − 1)V(x_t)]}^{1/2} w, where w has Student's t-distribution with T − 1 degrees of freedom. Hence E(|b̂|) < ∞ only if T ≥ 3.
large number of these estimators. The lack of bias requires that x be strictly exogenous conditional on b, c. It would not be sufficient to assume that E(y_t | x_t, b, c) = bx_t + c. For example, if x_t = y_{t−1}, then our estimator would not converge to E(b), due to the small-T bias in least squares estimates of an autoregressive process.
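The unit-by-unit construction of Proposition 2 can be simulated. The sketch below is illustrative only; the distributions of b_i, c_i, and x_it (with x correlated with both invariants, and strictly exogenous noise) are assumptions chosen for the example:

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 50000, 4
Eb = 1.0
b = Eb + 0.5 * rng.standard_normal(N)        # b_i, with E(b) = 1
c = rng.standard_normal(N)
# x correlated with both b_i and c_i; u_t strictly exogenous given (b, c)
x = 0.6 * b[:, None] + 0.6 * c[:, None] + rng.standard_normal((N, T))
y = b[:, None] * x + c[:, None] + rng.standard_normal((N, T))

xd = x - x.mean(axis=1, keepdims=True)
yd = y - y.mean(axis=1, keepdims=True)
b_hat = np.sum(xd * yd, axis=1) / np.sum(xd * xd, axis=1)   # slope per unit

print(b_hat.mean())   # close to E(b) = 1, despite corr(b, x) and corr(c, x)
```

Each b̂_i is unbiased for b_i, so the unweighted mean recovers E(b); T = 4 keeps E(|b̂|) finite, in line with the footnote's T ≥ 3 requirement for the normal case.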
Let D_i = 0 if x_{i1} = ... = x_{iT}, D_i = 1 if not. We can compute b̂_i only for the group with D_i = 1. The sample mean of b̂_i for that group converges a.s. to E(b | D = 1), but we have no information on E(b | D = 0). So unless prob(D = 0) = 0, any value for E(b) is consistent with a given value for E(b | D = 1).⁵
If x_t has a continuous distribution, then the assumption that the regression function is linear (E(y_t | x_t, b, c) = bx_t + c) is very restrictive; the implication of this assumption (combined with strict exogeneity) is that we can obtain an unbiased estimator for b, and hence a consistent (as N → ∞) estimator for E(b). If x_t is a binary variable, then the assumption of linear regression is not restrictive at all; but there are fewer implications, since there is positive probability that b̂ is not defined for finite T.
The following extension of Proposition 1 to the case of a finite distributed lag is straightforward:⁶

Proposition 1′. Suppose that

E(y_t | x, b_0, ..., b_J, c) = Σ_{j=0}^J b_j x_{t−j} + c,    t = J+1, ..., T.

If condition (C) holds, if

prob(det A_n = 0) = 0, where A_n is the (J+2) × (J+2) matrix whose rows are (1, x_m, x_{m−1}, ..., x_{m−J}) for m = n, n−1, ..., n−J−1,

for some integer n with 2J + 2 ≤ n ≤ T, and if T ≥ 2J + 3, then

E(y_t | x) = E(y_t | x_t, ..., x_{t−J}),    t = J+1, ..., T,
⁵A solution could be based on Mundlak's (1978a) proposal that E(b | x) = ψ_0 + ψ_1 Σ_{t=1}^T x_t. However, even if we assume that the regression function is linear in x_1, ..., x_T, it may be difficult to justify the restriction that only Σ_t x_t matters, unless T is large and we have stationarity: cov(b, x_t) = cov(b, x_1) and V(x) band diagonal. (See Proposition 4 and the discussion preceding it.) Furthermore, if cov(b, x_t) = cov(b, x_1), then E(b | x_2 − x_1, ..., x_T − x_{T−1}) = E(b) (if the regression function is linear), and so there is no heterogeneity bias once we transform to first differences.
⁶We shall not discuss the problems that arise from truncating the lag distribution when T < J + 1. These problems are discussed in Griliches and Pakes (1980). By working with linear transformations of the process, it is fairly straightforward to extend our analysis to general rational distributed lag schemes.
implies that

E(y_t | x) = Σ_{j=0}^J β_j x_{t−j} + γ,

where β_j = E(b_j) = E(b_j | x) and γ = E(c) = E(c | x), j = 0, ..., J.
The extension of Proposition 2 is also straightforward. There are new issues,
however, in the infinite lag case, which we shall take up next.
Large number of lags. Suppose that

E(y_t | σ(x), c) = Σ_{j=0}^∞ β_j x_{t−j} + c,

where σ(x) is the information set (σ-field) generated by {..., x_{−1}, x_0, x_1, ...}, and Σ_{j=0}^J β_j x_{t−j} converges in mean square as J → ∞. Consider a regression version of the Sims (1972) condition for x to be strictly exogenous (y does not cause x):

E(y_t | σ(x)) = E(y_t | x_t, x_{t−1}, ...).

Does this condition imply that E(c | σ(x)) = E(c), so that there is no heterogeneity bias?
We shall consider this question in the context of a (strictly) stationary stochastic process. Since c does not change over time, it is an invariant random variable. The following proposition is proved in appendix A:

Proposition 3. If d is an invariant random variable with E(|d|) < ∞, then

E(d | σ(x)) = E(d | x_t, x_{t−1}, ...),

where t is any integer.

It follows that

E(y_t | σ(x)) = E(c | x_t, x_{t−1}, ...) + Σ_{j=0}^∞ β_j x_{t−j} = E(y_t | x_t, x_{t−1}, ...).
So we cannot rule out heterogeneity bias just because y does not cause x. If
a large number of lags have been included, then a small number of leads
provide little additional information on c.
We can gain some insight into this result by considering the linear predictor of an invariant random variable. Let

E*(c | x_1, ..., x_T) = ψ_T + λ_T′ x_T,

where λ_T′ = (λ_{T1}, ..., λ_{TT}) and x_T′ = (x_1, ..., x_T). Stationarity implies that λ_T = τV^{−1}(x_T)1, where τ = cov(x_1, c) and 1 is a T × 1 vector of ones. Since V(x_T) is a band-diagonal matrix, 1 is approximately an eigenvector of V(x_T) for large T; hence λ_T′x_T is approximately proportional to Σ_{t=1}^T x_t.
For example, if
X, = px, _ i + u,, where v, is serially uncorrelated, then
&-x,=~
(1 PI i xt+P(x, +x,)
/cu+P) vxln
i=l
1
Now in this example, L&K, does not approach a limit as T--+Lx unless
z = cov (x,, c) =O. In fact cov (xi, c) is zero here, since there is a non-trivial
linear predictor only if cj=O x,_ j/J converges to a non-degenerate random
variable as J-rco.
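The AR(1) weights can be verified by direct computation. This is a numeric check, not part of the paper; T, ρ, and τ below are arbitrary, and V(x_1) is normalized to one:

```python
import numpy as np

T, rho, tau = 8, 0.6, 0.3          # tau = cov(x_t, c); V(x_1) = 1
# AR(1) autocovariance matrix: V[t, s] = rho^|t - s|
V = rho ** np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
lam = tau * np.linalg.solve(V, np.ones(T))   # lambda_T = tau * V^{-1} 1

# closed form: interior weights tau(1-rho)/(1+rho), endpoints tau/(1+rho),
# i.e. lambda_T' x = tau[(1-rho) sum x_t + rho(x_1 + x_T)] / (1+rho)
w = np.full(T, tau * (1 - rho) / (1 + rho))
w[0] = w[-1] = tau / (1 + rho)
assert np.allclose(lam, w)
print(lam)
```

The interior weights are all equal, so apart from the two endpoints the predictor is proportional to Σ_t x_t, matching the approximate-eigenvector argument above.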
The general case is covered by the following proposition:

Proposition 4. If d is an invariant random variable and E(d²) < ∞, E(x_t²) < ∞, then

E*(d | ..., x_{−1}, x_0, x_1, ...) = ψ + λx̄,

where x̄ is the limit in mean square of Σ_{j=0}^J x_{t−j}/J as J → ∞, t is any integer,

λ = cov(d, x̄)/V(x̄) if V(x̄) ≠ 0,    λ = 0 if V(x̄) = 0,    ψ = E(d) − λE(x̄).

(See appendix A for proof.)
The existence of the x̄ limit, both in mean square and almost surely, is the main result of ergodic theory and will be discussed further below. It is clear that x̄ is an invariant random variable. If V(x̄) ≠ 0, then the x process has a (non-degenerate) invariant component, and conditioning on the x's gives a
non-trivial linear predictor if x̄ is correlated with c. However, if V(x̄) = 0, then cov(c, x_t) = 0 for all t, and the linear prediction of c is not improved by conditioning on the x's.
It follows from Proposition 4 that

E*(y_t | ..., x_{t−1}, x_t, x_{t+1}, ...) = E*(y_t | x_t, x_{t−1}, ...) = ψ + Σ_{j=0}^J (β_j + λ/J)x_{t−j} + r(J),

where r(J) converges in mean square to zero as J → ∞. So y does not cause x according to Sims' definition; but this does not imply that c is uncorrelated with the x's. If we include a large number of lags, then the bias in any one coefficient is a negligible λ/J, but the bias in the sum of the lag coefficients tends to λ as J → ∞. If we include K leads, then the sum of their coefficients is approximately Kλ/J, which is close to zero when J is much larger than K. If the β_j are zero for j > J*, then the lag coefficients beyond that point will be close to zero but their sum will be close to λ.
Under the stationarity assumption, there are non-degenerate invariant random variables if and only if the process is not ergodic. The basic result here is the (pointwise) ergodic theorem: Let g be a random variable on (Ω, F, P) with E(|g|) < ∞, and let g_t(ω) = g(S^t ω), where S is the shift transformation (see appendix A); then the following limit exists a.s.:

ḡ = lim_{J→∞} Σ_{j=0}^J g_{t−j}/J.

The limit ḡ is an invariant random variable; it is the expectation of g_t conditional on 𝒥, where 𝒥 is the information set (σ-field) generated by all of the invariant random variables. If V(ḡ) ≠ 0 for some g, then the process is not ergodic. In the ergodic case, all of the invariant random variables have degenerate distributions.
Suppose that

E(y_t | σ(x), 𝒥) = bx_t + c,

and let

b̂_T = Σ_{t=1}^T (y_t − ȳ_T)(x_t − x̄_T) / Σ_{t=1}^T (x_t − x̄_T)².

Recall condition (R′): prob(x_1 = ... = x_T) = 0. I want to examine the
significance of condition (R′) as T → ∞ in the stationary case. Note that Σ_{t=1}^T (x_t − x̄_T)²/T converges a.s. to V(x_1 | 𝒥). So a limiting version of condition (R′) is

prob[V(x_1 | 𝒥) = 0] = 0.

If this condition holds, then

lim_{T→∞} b̂_T = [E(x_1y_1 | 𝒥) − E(x_1 | 𝒥)E(y_1 | 𝒥)] / [E(x_1² | 𝒥) − [E(x_1 | 𝒥)]²]  a.s.,

and b is observable as T → ∞. But if there is positive probability that V(x_1 | 𝒥) = 0, then the identification problem is more difficult. There is no information on b for the "stayers"; so in order to obtain E(b), even as T → ∞, we have to make untestable assumptions about the unobservable part of the b distribution.
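With a binary regressor the stayer group can be far from negligible. The following sketch is illustrative only (the Bernoulli design and the distribution of b are assumptions; b is drawn independently of x here, so the movers' mean happens to equal E(b)):

```python
import numpy as np

rng = np.random.default_rng(5)
N, T, p = 100000, 4, 0.3                   # binary x_it = 1 with prob p, i.i.d.

x = (rng.random((N, T)) < p).astype(float)
b = 1.0 + 0.5 * rng.standard_normal(N)     # E(b) = 1
y = b[:, None] * x + rng.standard_normal((N, T))

movers = x.std(axis=1) > 0                 # D_i = 1: x_it varies over t
frac_stayers = 1.0 - movers.mean()
# prob(D = 0) = p^T + (1 - p)^T: about 25% of units are stayers here
print(frac_stayers, p ** T + (1 - p) ** T)

xd = x[movers] - x[movers].mean(axis=1, keepdims=True)
yd = y[movers] - y[movers].mean(axis=1, keepdims=True)
b_hat = np.sum(xd * yd, axis=1) / np.sum(xd * xd, axis=1)
print(b_hat.mean())                        # estimates E(b | D = 1)
```

b̂_i is simply undefined for a quarter of the sample, so without further assumptions the data identify E(b | D = 1) only, as the text argues.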
3. Estimation
Consider a sample r_i′ = (x_i′, y_i′), i = 1, ..., N, where x_i′ = (x_{i1}, ..., x_{iK}), y_i′ = (y_{i1}, ..., y_{iM}). We shall assume that r_i is independent and identically distributed (i.i.d.) according to some multivariate distribution with finite fourth moments and E(x_i x_i′) non-singular. Consider the minimum mean square error linear predictors,⁷

E*(y_{im} | x_i) = π_m′ x_i,    m = 1, ..., M,

which we can write as

E*(y_i | x_i) = Πx_i    with    Π = E(y_i x_i′)[E(x_i x_i′)]^{−1}.

We want to estimate Π subject to restrictions and to test those restrictions. For example, we may want to test whether a submatrix of Π has the form βI + 1λ′. I think that analysis of covariance estimation should be accompanied by such a test.

We shall not assume that the regression function E(y_i | x_i) is linear. For although E(y_i | x_i, c_i) may be linear (indeed, we hope that it is), there is generally

⁷This agrees with the definition in section 2 if x_i includes a constant.
no reason to insist that E(c_i | x_i) is linear. So we shall present a theory of inference for linear predictors. Furthermore, even if the regression function is linear, there may be heteroskedasticity, due to random coefficients, for example.⁸ So we shall allow V(y_i | x_i) to be an arbitrary function of x_i.
3.1. The estimation of linear predictors
Let w_i be the vector formed from the distinct elements of r_i r_i′ that have non-zero variance. Since r_i′ = (x_i′, y_i′) is i.i.d., it follows that w_i is i.i.d. This simple observation is the key to our results. Since Π is a function of E(w_i), our problem is to make inferences about a function of a population mean, under random sampling.

Let μ = E(w_i) and let π be the vector formed from the columns of Π [π = vec(Π)]. Then π is a function of μ: π = h(μ). Let w̄ = Σ_{i=1}^N w_i/N; then π̂ = h(w̄) is the least squares estimator:

π̂ = vec(Π̂),    Π̂ = (Σ_{i=1}^N y_i x_i′)(Σ_{i=1}^N x_i x_i′)^{−1}.
By the strong law of large numbers, w̄ converges almost surely to μ⁰ as N → ∞ (w̄ →a.s. μ⁰), where μ⁰ is the true value of μ. Let π⁰ = h(μ⁰). Since h(μ) is continuous at μ = μ⁰, we have π̂ →a.s. π⁰. The central limit theorem implies that

√N(w̄ − μ⁰) →d N(0, V(w_i)).

Since h(μ) is differentiable at μ = μ⁰, the δ-method gives⁹

√N(π̂ − π⁰) →d N(0, Ω),

where

Ω = [∂h(μ⁰)/∂μ′] V(w_i) [∂h(μ⁰)/∂μ′]′.

We have derived the limiting distribution of the least squares estimator.
This approach was used by Cramér (1946) to obtain limiting normal

⁸Anderson (1969, 1970), Swamy (1970, 1974), Hsiao (1975), and Mundlak (1978a) discuss estimators that incorporate the particular form of heteroskedasticity that is generated by random coefficients.
⁹See Billingsley (1979, example 29.1, p. 340) or Rao (1973, p. 388).
distributions for sample correlation and regression coefficients (p. 367); he
presents an explicit formula for the variance of the limiting distribution of a
sample correlation coefficient (p. 359). Kendall and Stuart (1961, p. 293) and
Goldberger (1974) present the formula for the variance of the limiting
distribution of a simple regression coefficient.
Evaluating the partial derivatives in the formula for Ω is tedious. That calculation can be simplified, since π̂ has a ratio form. In the case of simple regression with a zero intercept, we have π = E(y_i x_i)/E(x_i²) and

√N(π̂ − π⁰) = [Σ_{i=1}^N (y_i − π⁰x_i)x_i/√N] / [Σ_{i=1}^N x_i²/N].

Since Σ_{i=1}^N x_i²/N →a.s. E(x_i²), we obtain the same limiting distribution by working with

Σ_{i=1}^N (y_i − π⁰x_i)x_i / [√N E(x_i²)].

The definition of π gives E[(y_i − πx_i)x_i] = 0, and so the central limit theorem implies that

√N(π̂ − π⁰) →d N(0, E[(y_i − π⁰x_i)²x_i²]/[E(x_i²)]²).
This approach was used by White (1980) to obtain the limiting distribution for univariate regression coefficients.¹⁰ In appendix B (Proposition 5) we follow White's approach to obtain

√N(π̂ − π⁰) →d N(0, Ω),

where

Ω = E[(y_i − Π⁰x_i)(y_i − Π⁰x_i)′ ⊗ Φ_x^{−1} x_i x_i′ Φ_x^{−1}],    (1)

Φ_x = E(x_i x_i′).
A consistent estimator of Ω is readily available from the corresponding sample moments:

Ω̂ = (1/N) Σ_{i=1}^N [(y_i − Π̂x_i)(y_i − Π̂x_i)′ ⊗ S_x^{−1} x_i x_i′ S_x^{−1}],    (2)

where

S_x = Σ_{i=1}^N x_i x_i′/N.
¹⁰Also see White (1980a, b).
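Eq. (2) is straightforward to compute. The sketch below is illustrative (the data generating process and dimensions are assumptions); it stacks the coefficients equation by equation, so the Kronecker blocks of Ω̂ line up with the rows of Π̂:

```python
import numpy as np

rng = np.random.default_rng(3)
N, K, M = 4000, 2, 2
x = rng.standard_normal((N, K))
# conditional variance depends on x: heteroskedasticity
e = rng.standard_normal((N, M)) * (0.5 + x[:, :1] ** 2)
Pi0 = np.array([[1.0, 0.0],
                [0.5, 2.0]])                     # true Pi (M x K)
y = x @ Pi0.T + e

Sx = x.T @ x / N                                  # S_x
Pi_hat = (y.T @ x / N) @ np.linalg.inv(Sx)        # least squares Pi_hat

Sxi = np.linalg.inv(Sx)
r = y - x @ Pi_hat.T                              # residuals y_i - Pi_hat x_i
Omega = np.zeros((M * K, M * K))
for i in range(N):
    A = Sxi @ np.outer(x[i], x[i]) @ Sxi
    Omega += np.kron(np.outer(r[i], r[i]), A)     # summand of eq. (2)
Omega /= N

se = np.sqrt(np.diag(Omega) / N)   # heteroskedasticity-robust standard errors
print(se.reshape(M, K))
```

No assumption of linear regression or homoskedasticity is used anywhere in the computation, which is exactly the point of eq. (2).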
If E(y_i | x_i) = Πx_i, so that the regression function is linear, then

Ω = E[V(y_i | x_i) ⊗ Φ_x^{−1} x_i x_i′ Φ_x^{−1}].

If V(y_i | x_i) is uncorrelated with x_i x_i′, then

Ω = E[V(y_i | x_i)] ⊗ Φ_x^{−1}.

If the conditional variance is homoskedastic, so that V(y_i | x_i) = Σ does not depend on x_i, then

Ω = Σ ⊗ Φ_x^{−1}.
3.2. Imposing restrictions: The minimum distance estimator
Since Π is a function of E(w_i), restrictions on Π imply restrictions on E(w_i). Let the dimension of μ = E(w_i) be q.¹¹ We shall specify the restrictions by the condition that μ depends only on a p × 1 vector θ of unknown parameters: μ = g(θ), where g is a known function and p ≤ q. The domain of θ is 𝒯, a subset of p-dimensional Euclidean space (R^p) that contains the true value θ⁰. So the restrictions imply that μ⁰ = g(θ⁰) is confined to a certain subset of R^q.
We can impose the restrictions by using a minimum distance estimator: choose θ̂ to

min_{θ ∈ 𝒯} [w̄ − g(θ)]′ A_N [w̄ − g(θ)],

where A_N →a.s. Ψ and Ψ is positive definite. This minimization problem is equivalent to the following one: choose θ̂ to

min_{θ ∈ 𝒯} Σ_{i=1}^N [w_i − g(θ)]′ A_N [w_i − g(θ)],

since the two objective functions differ by a term that does not depend on θ.
The properties of θ̂ are developed, for example, in Malinvaud (1970, ch. 9). Since g does not depend on any exogenous variables, the derivation of these properties can be simplified considerably, as in Chiang (1956) and Ferguson (1958).

For completeness, we shall state a set of regularity conditions and the properties that they imply:

¹¹If there is one element in r_i r_i′ with zero variance, then q = [(K + M)(K + M + 1)/2] − 1.
Assumption 1. a_N →a.s. g(θ⁰); 𝒯 is a compact subset of R^p that contains θ⁰; g is continuous on 𝒯, and g(θ) = g(θ⁰) for θ ∈ 𝒯 implies that θ = θ⁰; A_N →a.s. Ψ, where Ψ is positive definite.

Assumption 2. √N[a_N − g(θ⁰)] →d N(0, Δ); 𝒯 contains a neighborhood of θ⁰ in which g has continuous second partial derivatives; rank(G) = p, where G = ∂g(θ⁰)/∂θ′.

Choose θ̂_N to

min_{θ ∈ 𝒯} [a_N − g(θ)]′ A_N [a_N − g(θ)].
Proposition 6. If Assumption 1 is satisfied, then θ̂_N →a.s. θ⁰.

Proposition 7. If Assumptions 1 and 2 are satisfied, then √N(θ̂_N − θ⁰) →d N(0, Λ), where

Λ = (G′ΨG)^{−1} G′ΨΔΨG (G′ΨG)^{−1}.

If Δ is positive definite, then Λ − (G′Δ^{−1}G)^{−1} is positive semi-definite; hence an optimal choice for Ψ is Δ^{−1}.
Proposition 8. If Assumptions 1 and 2 are satisfied, if Δ is a q × q positive definite matrix, and if A_N →a.s. Δ^{−1}, then

N[a_N − g(θ̂_N)]′ A_N [a_N − g(θ̂_N)] →d χ²(q − p).

(This is extended to the case of nested restrictions in Proposition 8′, appendix B.)¹²
Suppose that the restrictions involve only Π. We specify the restrictions by the condition that π = f(δ), where δ is s × 1 and the domain of δ is 𝒯_δ, a subset of R^s that includes the true value δ⁰. Consider the following estimator of δ⁰: choose δ̂ to

min_{δ ∈ 𝒯_δ} [π̂ − f(δ)]′ Ω̂^{−1} [π̂ − f(δ)],
¹²Since the proofs are simple, we shall keep the paper self-contained and include them in appendix B. The proofs are based on Chiang (1956), Ferguson (1958), and Malinvaud (1970, ch. 9).
where Ω̂ is given in eq. (2) and we assume that Ω in eq. (1) is positive definite. If 𝒯_δ and f satisfy Assumptions 1 and 2, then δ̂ →a.s. δ⁰,

√N(δ̂ − δ⁰) →d N(0, (F′Ω^{−1}F)^{−1}),

and

N[π̂ − f(δ̂)]′ Ω̂^{−1} [π̂ − f(δ̂)] →d χ²(MK − s),

where

F = ∂f(δ⁰)/∂δ′.
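For linear restrictions π = Fδ (F known), the minimum distance problem has a closed form. The sketch below is illustrative only; the numbers in the toy check are made up:

```python
import numpy as np

# minimum distance with a linear restriction pi = F delta:
# delta_hat = (F' W F)^{-1} F' W pi_hat, with W = Omega_hat^{-1}
def min_dist(pi_hat, Omega_hat, F, N):
    W = np.linalg.inv(Omega_hat)
    A = F.T @ W
    delta = np.linalg.solve(A @ F, A @ pi_hat)
    resid = pi_hat - F @ delta
    stat = N * resid @ W @ resid          # -> chi2(q - s) under the restriction
    avar = np.linalg.inv(F.T @ W @ F)     # N * Var(delta_hat) -> (F' W F)^{-1}
    return delta, stat, avar

# toy check: restrict pi = (delta, delta)' and combine a noisy pi_hat
N = 1000
Omega = np.array([[2.0, 0.5], [0.5, 1.0]])
pi_hat = np.array([1.02, 0.97])
F = np.array([[1.0], [1.0]])
delta, stat, avar = min_dist(pi_hat, Omega, F, N)
print(delta, stat)
```

With the optimal norm Ω̂^{−1}, the estimator is a precision-weighted combination of the components of π̂, and the minimized distance (times N) is the test statistic of Proposition 8.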
We can also estimate δ⁰ by applying the minimum distance procedure to w̄ instead of to π̂. Suppose that the components of w_i are arranged so that w_i′ = (w_{i1}′, w_{i2}′), where w_{i1} contains the components of x_i y_i′. Partition μ = E(w_i) conformably: μ′ = (μ_1′, μ_2′). Set θ′ = (θ_1′, θ_2′) = (δ′, μ_2′). Assume that V(w_i) is positive definite. Now choose θ̂ to

min_θ [w̄ − g(θ)]′ A_N [w̄ − g(θ)],    A_N →a.s. [V(w_i)]^{−1},

where g_2(δ, μ_2) = μ_2 and g_1(δ, μ_2) is the value of μ_1 implied by π = f(δ) and μ_2. Then θ̂_1 gives an estimator of δ⁰; it has the same limiting distribution as the estimator δ̂ that we obtained by applying the minimum distance procedure to π̂. (See Proposition 9, appendix B.)
This framework leads to some surprising results on efficient estimation. For a simple example, we shall use a univariate linear predictor model,
$$E^*(y_i \mid x_{i1}, x_{i2}) = \pi_0 + \pi_1 x_{i1} + \pi_2 x_{i2}.$$
Consider imposing the restriction $\pi_2 = 0$. Then the conventional estimator of $\pi_1$ is $b_{yx_1}$, the slope coefficient in the least squares regression of $y$ on $x_1$. We shall show that this estimator is generally less efficient than the minimum distance estimator if the regression function is nonlinear or if there is heteroskedasticity.

Let $\hat\pi_1, \hat\pi_2$ be the slope coefficients in the least squares multiple regression of $y$ on $x_1, x_2$. The minimum distance estimator of $\pi_1$ under the restriction $\pi_2 = 0$ can be obtained as $\hat\delta = \hat\pi_1 + \tau\hat\pi_2$, where $\tau$ is chosen to minimize the
(estimated) variance of the limiting distribution of $\hat\delta$; this gives $\tau = -\hat\omega_{12}/\hat\omega_{22}$, where $\hat\omega_{jk}$ is the estimated covariance between $\hat\pi_j$ and $\hat\pi_k$ in their limiting distribution. Since $\hat\pi_1 = b_{yx_1} - \hat\pi_2 b_{x_2x_1}$, we have
$$\hat\delta = b_{yx_1} - (b_{x_2x_1} + \hat\omega_{12}/\hat\omega_{22})\hat\pi_2.$$
If $E(y_i \mid x_{i1}, x_{i2})$ is linear and if $V(y_i \mid x_{i1}, x_{i2}) = \sigma^2$, then $\omega_{12}/\omega_{22} = -\mathrm{cov}(x_{i1}, x_{i2})/V(x_{i1})$ and $\hat\delta = b_{yx_1}$. But in general $\hat\delta \neq b_{yx_1}$, and $\hat\delta$ is more efficient than $b_{yx_1}$. The source of the efficiency gain is that the limiting distribution of $\hat\pi_2$ has a zero mean (if $\pi_2 = 0$), and so we can reduce variance without introducing any bias if $\hat\pi_2$ is correlated with $b_{yx_1}$. Under the assumptions of linear regression and homoskedasticity, $b_{yx_1}$ and $\hat\pi_2$ are uncorrelated; but this need not be true in the more general framework that we are using.
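A small Monte Carlo illustrates this efficiency gain. This sketch is mine, not the paper's (NumPy assumed); the design makes the error variance depend on $x_1 x_2$, which makes $\hat\pi_1$ and $\hat\pi_2$ correlated in the limit, so $\hat\delta = \hat\pi_1 + \tau\hat\pi_2$ has smaller sampling variance than $b_{yx_1}$.

```python
import numpy as np

rng = np.random.default_rng(1)

def one_draw(n=2000):
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)
    # pi2 = 0 holds in the population, but the error variance depends on x1*x2
    y = 1.0 + 0.5 * x1 + (1.0 + x1 * x2) * rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1, x2])
    pi_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    u = y - X @ pi_hat
    XtX_inv = np.linalg.inv(X.T @ X)
    V = XtX_inv @ (X.T * u**2) @ X @ XtX_inv   # robust covariance of pi-hat
    tau = -V[1, 2] / V[2, 2]
    b_yx1 = np.cov(x1, y)[0, 1] / x1.var()     # conventional estimator of pi1
    delta_hat = pi_hat[1] + tau * pi_hat[2]    # minimum distance estimator
    return b_yx1, delta_hat

draws = np.array([one_draw() for _ in range(400)])
var_conventional, var_md = draws.var(axis=0)
```

Across the replications both estimators center on $\pi_1 = 0.5$, but the minimum distance combination has the smaller variance.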
3.3. Simultaneous equations: A generalization of two- and three-stage least
squares
Given the discussion on imposing restrictions, it is not surprising that two-stage least squares is not, in general, an efficient procedure for combining instrumental variables. I shall demonstrate this with a simple example. Assume that $(y_i, z_i, x_{i1}, x_{i2})$ is i.i.d. according to some distribution with finite fourth moments, and that
$$y_i = \delta z_i + u_i,$$
where
$$E(u_i x_{i1}) = E(u_i x_{i2}) = 0.$$
Assume also that $E(z_i x_{i1}) \neq 0$, $E(z_i x_{i2}) \neq 0$. Then there are two instrumental variable estimators that both converge a.s. to $\delta$:
$$\hat\delta_j = \sum_{i=1}^N y_i x_{ij} \Big/ \sum_{i=1}^N z_i x_{ij}, \qquad j = 1, 2;$$
$$\sqrt{N}\left[\begin{pmatrix}\hat\delta_1\\ \hat\delta_2\end{pmatrix} - \begin{pmatrix}\delta\\ \delta\end{pmatrix}\right] \xrightarrow{d} N(0, \Lambda),$$
where the $j,k$ element of $\Lambda$ is
$$\lambda_{jk} = \frac{E[(y_i - \delta z_i)^2 x_{ij} x_{ik}]}{E(z_i x_{ij})\, E(z_i x_{ik})}, \qquad j, k = 1, 2.$$
The two-stage least squares estimator combines $\hat\delta_1$ and $\hat\delta_2$ by forming $\hat z_i = \hat\pi_1 x_{i1} + \hat\pi_2 x_{i2}$, based on the least squares regression of $z$ on $x_1, x_2$ (assume that $E[(x_{i1}, x_{i2})'(x_{i1}, x_{i2})]$ is non-singular):
$$\hat\delta_{2SLS} = \sum_{i=1}^N y_i \hat z_i \Big/ \sum_{i=1}^N z_i \hat z_i = \left(\hat\pi_1 \sum_{i=1}^N y_i x_{i1} + \hat\pi_2 \sum_{i=1}^N y_i x_{i2}\right) \Big/ \left(\hat\pi_1 \sum_{i=1}^N z_i x_{i1} + \hat\pi_2 \sum_{i=1}^N z_i x_{i2}\right).$$
Since $\hat\pi_j \xrightarrow{a.s.} \pi_j$, $\sqrt{N}(\hat\delta_{2SLS} - \delta)$ has the same limiting distribution as
$$\sqrt{N}[\alpha(\hat\delta_1 - \delta) + (1 - \alpha)(\hat\delta_2 - \delta)], \qquad \alpha = \frac{\pi_1 E(z_i x_{i1})}{\pi_1 E(z_i x_{i1}) + \pi_2 E(z_i x_{i2})}.$$
This suggests finding the $\tau$ that minimizes the variance of the limiting distribution of $\sqrt{N}[\tau(\hat\delta_1 - \delta) + (1 - \tau)(\hat\delta_2 - \delta)]$. The answer leads to the minimum distance estimator: choose $\hat\delta$ to
$$\min_\delta\ \sum_{j,k=1}^2 (\hat\delta_j - \delta)\,\hat\lambda^{jk}\,(\hat\delta_k - \delta);$$
this gives
$$\hat\delta = \tau\hat\delta_1 + (1 - \tau)\hat\delta_2, \qquad \tau = (\lambda^{11} + \lambda^{12})/(\lambda^{11} + 2\lambda^{12} + \lambda^{22}),$$
where $\lambda^{jk}$ is the $j,k$ element of $\Lambda^{-1}$. The estimator obtained by using a consistent estimator of $\Lambda$ has the same limiting distribution.
In general $\tau \neq \alpha$, since $\tau$ is a function of fourth moments and $\alpha$ is not. Suppose, for example, that $z_i = x_{i2}$. Then $\alpha = 0$, but $\tau \neq 0$ unless
$$E\left[(y_i - \delta z_i)^2\, x_{i2}\left(x_{i1} - \frac{E(x_{i1}x_{i2})}{E(x_{i2}^2)}\, x_{i2}\right)\right] = 0.$$
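The optimal combination of the two instrumental variable estimators can be sketched as follows. This is my own illustration (NumPy assumed; the data-generating design and names are hypothetical): $\Lambda$ is estimated from sample fourth moments at a preliminary estimate, and $\tau$ is formed from $\Lambda^{-1}$ as in the text.

```python
import numpy as np

def combine_iv(y, z, x1, x2):
    """Minimum distance combination of the two IV estimators
    delta_j = sum(y x_j) / sum(z x_j), weighting by Lambda-inverse."""
    d1 = np.sum(y * x1) / np.sum(z * x1)
    d2 = np.sum(y * x2) / np.sum(z * x2)
    u = y - d1 * z                       # residuals at a preliminary estimate
    m1, m2 = np.mean(z * x1), np.mean(z * x2)
    lam = np.array([[np.mean(u**2 * x1 * x1) / (m1 * m1),
                     np.mean(u**2 * x1 * x2) / (m1 * m2)],
                    [np.mean(u**2 * x2 * x1) / (m2 * m1),
                     np.mean(u**2 * x2 * x2) / (m2 * m2)]])
    li = np.linalg.inv(lam)
    tau = (li[0, 0] + li[0, 1]) / li.sum()
    return tau * d1 + (1 - tau) * d2

rng = np.random.default_rng(2)
n = 4000
x1, x2 = rng.normal(size=(2, n))
z = x1 + x2 + rng.normal(size=n)
y = 2.0 * z + (1.0 + x1**2) * rng.normal(size=n)   # delta = 2, heteroskedastic
delta_hat = combine_iv(y, z, x1, x2)
```

Because the weights come from estimated fourth moments, $\tau$ here generally differs from the two-stage least squares weight $\alpha$.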
If we add another equation, then we can consider the conventional three-
stage least squares estimator. Its limiting distribution is derived in appendix
B (Proposition 5); however, viewed as a minimum distance estimator, it is
using the wrong norm in general.
Consider the standard simultaneous equations model:
$$y_i = \Pi x_i + u_i, \qquad E(u_i x_i') = 0,$$
$$\Gamma y_i + B x_i = v_i,$$
where $\Gamma\Pi + B = 0$ and $\Gamma u_i = v_i$. We are continuing to assume that $y_i$ is $M \times 1$, $x_i$ is $K \times 1$, $r_i' = (x_i', y_i')$ is i.i.d. according to a distribution with finite fourth moments $(i = 1, \ldots, N)$, and that $E(x_i x_i')$ is non-singular. There are restrictions on $\Gamma$ and $B$: $m(\Gamma, B) = 0$, where $m$ is a known function. Assume that the implied restrictions on $\Pi$ can be specified by the condition that $\pi = \mathrm{vec}(\Pi') = f(\delta)$, where the domain of $\delta$ is $\Gamma_\delta$, a subset of $R^s$ that includes the true value $\delta_0$ $(s \le MK)$. Assume that $\Gamma_\delta$ and $f$ satisfy Assumptions 1 and 2; these properties could be derived from regularity conditions on $m$, as in Malinvaud (1970, prop. 2, p. 670).
Choose $\hat\delta$ to
$$\min_{\delta \in \Gamma_\delta}\ [\hat\pi - f(\delta)]' \hat\Omega^{-1} [\hat\pi - f(\delta)],$$
where $\hat\Omega$ is given by eq. (2) and we assume that $\Omega$ in eq. (1) is positive definite. Let $F = \partial f(\delta_0)/\partial\delta'$. Then we have $\sqrt{N}(\hat\delta - \delta_0) \xrightarrow{d} N(0, \Lambda)$, where $\Lambda = (F'\Omega^{-1}F)^{-1}$. This generalizes Malinvaud's minimum distance estimator (p. 676); it reduces to his estimator if $u_i^p u_i^{p\prime}$ is uncorrelated with $x_i x_i'$, so that $\Omega = E(u_i^p u_i^{p\prime}) \otimes [E(x_i x_i')]^{-1}$ $(u_i^p = y_i - \Pi_0 x_i)$.

Now suppose that the only restrictions on $\Gamma$ and $B$ are that certain coefficients are zero, together with the normalization restrictions that the coefficient of $y_{im}$ in the $m$th structural equation is one. Then we can give an explicit formula for $\Lambda$. Write the $m$th structural equation as
$$y_{im} = \delta_m' z_{im} + v_{im},$$
where the components of $z_{im}$ are the variables in $y_i$ and $x_i$ that appear in the $m$th equation with unknown coefficients. Let there be $M$ structural equations and assume that the true value $\Gamma_0$ is non-singular. Let $\delta' = (\delta_1', \ldots, \delta_M')$ be $s \times 1$, and let $\Gamma(\delta)$ and $B(\delta)$ be parametric representations of $\Gamma$ and $B$ that satisfy the zero restrictions and the normalization rule. We can choose a compact set $\Gamma_\delta \subset R^s$ containing a neighborhood of the true value $\delta_0$, such that $\Gamma(\delta)$ is non-singular for $\delta \in \Gamma_\delta$. Then $\pi = f(\delta)$, where $f(\delta) = \mathrm{vec}\{-[\Gamma^{-1}(\delta)B(\delta)]'\}$. Assume that $f(\delta) = \pi_0$ implies that $\delta = \delta_0$, so that the structural parameters are identified. Then $\Gamma_\delta$ and $f$ satisfy Assumptions 1 and 2, and $\sqrt{N}(\hat\delta - \delta_0)$
$\xrightarrow{d} N(0, \Lambda)$. The formula for $\partial\pi/\partial\delta'$ is given in Rothenberg (1973, p. 69):
$$\partial\pi/\partial\delta' = -(\Gamma_0^{-1} \otimes I_K)(I_M \otimes \Phi_x^{-1})\Phi_{zx}',$$
where $\Phi_{zx}$ is block-diagonal: $\Phi_{zx} = \mathrm{diag}\{E(z_{i1}x_i'), \ldots, E(z_{iM}x_i')\}$, and $\Phi_x = E(x_ix_i')$. So we have
$$\Lambda = \{\Phi_{zx}[E(v_i^p v_i^{p\prime} \otimes x_i x_i')]^{-1}\Phi_{zx}'\}^{-1},$$
where $v_i^p = \Gamma_0 y_i + B_0 x_i$. If $v_i^p v_i^{p\prime}$ is uncorrelated with $x_i x_i'$, then this reduces to
$$\Lambda = \{\Phi_{zx}[E(v_i^p v_i^{p\prime}) \otimes \Phi_x]^{-1}\Phi_{zx}'\}^{-1},$$
which is the conventional asymptotic covariance matrix for three-stage least squares [Zellner and Theil (1962)].
I shall present a generalization of three-stage least squares that has the same limiting distribution as the generalized minimum distance estimator. Let $\beta = \mathrm{vec}(B')$ and note that $\pi = -(\Gamma^{-1} \otimes I_K)\beta$. Then we have
$$[\hat\pi + (\Gamma^{-1} \otimes I_K)\beta]'\,\Omega^{-1}\,[\hat\pi + (\Gamma^{-1} \otimes I_K)\beta] = [(\Gamma \otimes I_K)\hat\pi + \beta]'\,\Theta^{-1}\,[(\Gamma \otimes I_K)\hat\pi + \beta],$$
where
$$\Theta = (I_M \otimes \Phi_x^{-1})\,E(v_i^p v_i^{p\prime} \otimes x_i x_i')\,(I_M \otimes \Phi_x^{-1}).$$
Let $S_{zx}$ be the following block-diagonal matrix:
$$S_{zx} = \mathrm{diag}\left\{\frac{1}{N}\sum_{i=1}^N z_{i1}x_i',\ \ldots,\ \frac{1}{N}\sum_{i=1}^N z_{iM}x_i'\right\},$$
and let
$$\hat v_i = \hat\Gamma y_i + \hat B x_i,$$
where $\hat\Gamma \xrightarrow{a.s.} \Gamma_0$ and $\hat B \xrightarrow{a.s.} B_0$.
Now replace $\Theta$ by
$$\hat\Theta = (I_M \otimes S_x^{-1})\left[\frac{1}{N}\sum_{i=1}^N \hat v_i \hat v_i' \otimes x_i x_i'\right](I_M \otimes S_x^{-1}),$$
where $S_x = N^{-1}\sum_i x_ix_i'$, and note that
$$(I_M \otimes S_x)[(\Gamma \otimes I_K)\hat\pi + \beta] = s_{xy} - S_{zx}'\delta,$$
where $s_{xy}$ stacks the vectors $N^{-1}\sum_i x_iy_{im}$ $(m = 1, \ldots, M)$. Then we have the following distance function:
$$[s_{xy} - S_{zx}'\delta]'\,\hat\Theta^{-1}\,[s_{xy} - S_{zx}'\delta].$$
This corresponds to Basmann's (1965) interpretation of three-stage least squares.^{13}

Minimizing with respect to $\delta$ gives
$$\hat\delta_{G3} = (S_{zx}\hat\Theta^{-1}S_{zx}')^{-1}(S_{zx}\hat\Theta^{-1}s_{xy}).$$
The limiting distribution of this estimator is derived in appendix B (Proposition 5). We record it as:

Proposition 10. $\sqrt{N}(\hat\delta_{G3} - \delta_0) \xrightarrow{d} N(0, \Lambda)$, where $\Lambda = \{\Phi_{zx}[E(v_i^p v_i^{p\prime} \otimes x_i x_i')]^{-1}\Phi_{zx}'\}^{-1}$. This generalized three-stage least squares estimator is asymptotically efficient within the class of minimum distance estimators.
Finally, we shall consider the generalization of two-stage least squares. Suppose that
$$y_{i1} = \delta_1' z_{i1} + u_{i1},$$
where $E(x_i u_{i1}) = 0$, $z_{i1}$ is $s_1 \times 1$, and $\mathrm{rank}[E(x_i z_{i1}')] = s_1$. We complete the system by setting
$$y_{im} = \pi_m' x_i + u_{im},$$
where $E(x_i u_{im}) = 0$ $(m = 2, \ldots, M)$. So $z_{im} = x_i$ $(m = 2, \ldots, M)$. Let $\delta' = (\delta_1', \pi_2', \ldots, \pi_M')$ and apply the minimum distance procedure to obtain $\hat\delta$; since we are ignoring any restrictions on $\pi_m$ $(m = 2, \ldots, M)$, $\hat\delta_1$ is a limited information minimum distance estimator.

^{13}See Rothenberg (1973, p. 82). A more general derivation of this distance function can be obtained by following Hansen (1982). Also see White (1982).
We have $\sqrt{N}(\hat\delta_1 - \delta_1^0) \xrightarrow{d} N(0, \Lambda_{11})$, and evaluating the partitioned inverse gives
$$\Lambda_{11} = \{E(z_{i1}x_i')\,[E(u_{i1}^2\, x_i x_i')]^{-1}\,E(x_i z_{i1}')\}^{-1}, \tag{4}$$
where $u_{i1} = y_{i1} - \delta_1^{0\prime} z_{i1}$.

We can obtain the same limiting distribution by using the following generalization of two-stage least squares: let
$$S_{z_1x} = \frac{1}{N}\sum_{i=1}^N z_{i1}x_i', \qquad s_{xy_1} = \frac{1}{N}\sum_{i=1}^N x_i y_{i1}, \qquad \tilde P = \frac{1}{N}\sum_{i=1}^N \tilde u_{i1}^2\, x_i x_i',$$
where $\tilde u_{i1} = y_{i1} - \tilde\delta_1' z_{i1}$ and $\tilde\delta_1 \xrightarrow{a.s.} \delta_1^0$ (for example, $\tilde\delta_1$ could be an instrumental variable estimator); then
$$\hat\delta_{1,G2} = (S_{z_1x}\tilde P^{-1}S_{z_1x}')^{-1}(S_{z_1x}\tilde P^{-1}s_{xy_1}).$$
This is the estimator of $\delta_1$ that we obtain by applying generalized three-stage least squares to the completed system, with no restrictions on $\pi_m$ $(m = 2, \ldots, M)$. The limiting distribution of this estimator is derived in appendix B (Proposition 5):

Proposition 11. $\sqrt{N}(\hat\delta_{1,G2} - \delta_1^0) \xrightarrow{d} N(0, \Lambda_{11})$, where $\Lambda_{11}$ is given in eq. (4). This generalized two-stage least squares estimator is asymptotically efficient in the class of limited information minimum distance estimators.
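The generalized two-stage least squares formula can be sketched numerically. This is my own illustration (NumPy assumed; the design and variable names are hypothetical): a preliminary instrumental variable estimate supplies the residuals used to form the weight matrix.

```python
import numpy as np

def generalized_2sls(y1, Z1, X, delta_prelim):
    """delta-hat = (S_zx P~^-1 S_zx')^-1 S_zx P~^-1 s_xy1, with
    P~ = (1/N) sum of u~_i1^2 x_i x_i' at a preliminary estimate."""
    n = len(y1)
    u = y1 - Z1 @ delta_prelim
    P = (X.T * u**2) @ X / n             # heteroskedasticity-consistent weight
    Szx = Z1.T @ X / n                   # s1 x K sample cross moments
    sxy = X.T @ y1 / n
    Pinv = np.linalg.inv(P)
    return np.linalg.solve(Szx @ Pinv @ Szx.T, Szx @ Pinv @ sxy)

rng = np.random.default_rng(3)
n, K = 5000, 3
X = rng.normal(size=(n, K))                            # instruments
z = X @ np.array([1.0, 1.0, 0.5]) + rng.normal(size=n)
y1 = 1.5 * z + (1.0 + X[:, 0]**2) * rng.normal(size=n)
Z1 = z[:, None]                                        # s1 = 1 right-hand variable
delta_prelim = np.array([np.sum(y1 * X[:, 0]) / np.sum(z * X[:, 0])])
delta_g2 = generalized_2sls(y1, Z1, X, delta_prelim)
```

Under homoskedastic errors this collapses to conventional two-stage least squares; with the heteroskedasticity built in here, the weighting differs.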
3.4. Asymptotic efficiency: A comparison with the quasi-maximum likelihood estimator

Assume that $r_i$ is i.i.d. $(i = 1, \ldots, N)$ from a distribution with $E(r_i) = \mu$, $V(r_i) = \Sigma$, where $\Sigma$ is a $J \times J$ positive definite matrix; the fourth moments are finite. Suppose that we wish to estimate functions of $\Sigma$ subject to restrictions. Let $\sigma = \mathrm{vec}(\Sigma)$ and express the restrictions by the condition that $\sigma = g(\theta)$, where $g$ is a function into $R^q$ with a domain $\Gamma \subset R^p$ that contains the true value $\theta_0$ $(q = J^2;\ p \le J(J+1)/2)$. Let
$$S = \frac{1}{N}\sum_{i=1}^N (r_i - \bar r)(r_i - \bar r)'.$$
If the distribution of $r_i$ is multivariate normal, then the log-likelihood function is, apart from a constant,
$$L(\theta) = -\frac{N}{2}\left\{\ln\det\Sigma(\theta) + \mathrm{tr}[\Sigma^{-1}(\theta)S]\right\}.$$
If there are no restrictions on $\mu$, then the maximum likelihood estimator of $\theta$ is a solution to the following problem: choose $\hat\theta$ to
$$\min_\theta\ \ln\det\Sigma(\theta) + \mathrm{tr}[\Sigma^{-1}(\theta)S].$$
We shall derive the properties of this estimator when the distribution of $r_i$ is not necessarily normal; in that case we shall refer to the estimator as a quasi-maximum likelihood estimator $(\hat\theta_{QML})$.^{14}

MaCurdy (1979) considered a version of this problem and showed that, under suitable regularity conditions, $\sqrt{N}(\hat\theta_{QML} - \theta_0)$ has a limiting normal distribution; the covariance matrix, however, is not given by the standard information matrix formula. We would like to compare this distribution with the distribution of the minimum distance estimator.
This comparison can be readily made by using Theorem 1 in Ferguson (1958). In our notation, Ferguson considers the following problem: choose $\hat\theta$ to solve
$$W(S, \theta)[s - g(\theta)] = 0,$$
where $s = \mathrm{vec}(S)$. He derives the limiting distribution of $\sqrt{N}(\hat\theta - \theta_0)$ under regularity conditions on the functions $W$ and $g$. These regularity conditions are particularly simple in our problem, since $W$ does not depend on $S$. We can state them as follows:

Assumption 3. $\Theta_0 \subset R^p$ is an open set containing $\theta_0$; $g$ is a continuous, one-to-one mapping of $\Theta_0$ into $R^q$ with a continuous inverse; $g$ has continuous second partial derivatives in $\Theta_0$; $\mathrm{rank}[\partial g(\theta)/\partial\theta'] = p$ for $\theta \in \Theta_0$; $\Sigma(\theta)$ is non-singular for $\theta \in \Theta_0$.

In addition, we shall need $S \xrightarrow{a.s.} g(\theta_0)$ and the central limit theorem result that $\sqrt{N}[s - g(\theta_0)] \xrightarrow{d} N(0, \Delta)$, where $\Delta = V[(r_i - \mu) \otimes (r_i - \mu)]$.

Then Ferguson's theorem implies that the likelihood equations almost surely have a unique solution within $\Theta_0$ for sufficiently large N, and

^{14}The "quasi-maximum likelihood" terminology was used by the Cowles Commission; see Malinvaud (1970, p. 678).
$\sqrt{N}(\hat\theta_{QML} - \theta_0) \xrightarrow{d} N(0, \Lambda)$, where
$$\Lambda = (G'\Psi G)^{-1}G'\Psi\Delta\Psi G(G'\Psi G)^{-1},$$
$G = \partial g(\theta_0)/\partial\theta'$, and $\Psi = (\Sigma_0 \otimes \Sigma_0)^{-1}$. It will be convenient to rewrite this, imposing the symmetry restrictions on $\Sigma$. Let $\sigma^*$ be the $J(J+1)/2 \times 1$ vector formed by stacking the columns of the lower triangle of $\Sigma$. We can define a $J^2 \times [J(J+1)/2]$ matrix $T$ such that $\sigma = T\sigma^*$. The elements in each row of $T$ are all 0 except for a single element which is one; $T$ has full column rank. Let
$$s = Ts^*, \qquad g(\theta) = Tg^*(\theta), \qquad G^* = \partial g^*(\theta_0)/\partial\theta', \qquad \Psi^* = T'\Psi T;$$
then $\sqrt{N}[s^* - g^*(\theta_0)] \xrightarrow{d} N(0, \Delta^*)$, where $\Delta^*$ is the covariance matrix of the vector formed from the columns of the lower triangle of $(r_i - \mu)(r_i - \mu)'$. Now we can set
$$\Lambda = (G^{*\prime}\Psi^*G^*)^{-1}(G^{*\prime}\Psi^*\Delta^*\Psi^*G^*)(G^{*\prime}\Psi^*G^*)^{-1}.$$
Consider the following minimum distance estimator: choose $\hat\theta_{MD}$ to
$$\min_{\theta \in \Gamma}\ [s^* - g^*(\theta)]' A_N [s^* - g^*(\theta)],$$
where $\Gamma$ is a compact subset of $\Theta_0$ that contains a neighborhood of $\theta_0$ and $A_N \xrightarrow{a.s.} \Psi^*$. Then the following result is implied by Proposition 7.

Proposition 12. If Assumption 3 is satisfied, then $\sqrt{N}(\hat\theta_{QML} - \theta_0)$ has the same limiting distribution as $\sqrt{N}(\hat\theta_{MD} - \theta_0)$.

If $\Delta^*$ is non-singular, an optimal minimum distance estimator has $A_N \xrightarrow{a.s.} \zeta\Delta^{*-1}$, where $\zeta$ is an arbitrary positive real number. If the distribution of $r_i$ is normal, then $\Delta^{*-1} = \frac{1}{2}\Psi^*$; but in general $\Delta^{*-1}$ is not proportional to $\Psi^*$, since $\Delta^*$ depends on fourth moments and $\Psi^*$ is a function of second moments. So in general $\hat\theta_{QML}$ is less efficient than the optimal minimum distance estimator that uses
$$A_N = \left[\frac{1}{N}\sum_{i=1}^N (s_i^* - \bar s^*)(s_i^* - \bar s^*)'\right]^{-1}, \tag{5}$$
where $s_i^*$ is the vector formed from the lower triangle of $(r_i - \bar r)(r_i - \bar r)'$.

More generally, we can consider the class of consistent estimators that are continuously differentiable functions of $s^*$: $\hat\theta = h(s^*)$. Chiang (1956) shows that the minimum distance estimator based on $\Delta^{*-1}$ has the minimal asymptotic covariance matrix within this class. The minimum distance estimator based on $A_N$ in (5) attains this lower bound.
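Here is a sketch of the optimal minimum distance estimator for a covariance structure, using the weight matrix in eq. (5). It is my own illustration (NumPy assumed), for the simple restriction $\Sigma = \theta I$ with heavy-tailed, non-normal data, where the estimator and the quasi-maximum likelihood estimator differ.

```python
import numpy as np

def lower_half(S):
    """Stack the lower triangle (including the diagonal) of a symmetric matrix."""
    return S[np.tril_indices(S.shape[0])]

rng = np.random.default_rng(4)
N, J = 4000, 3
# heavy-tailed data with V(r_i) = 2 * I  (t(5) scaled so the variance is 2)
r = rng.standard_t(5, size=(N, J)) * np.sqrt(1.2)
d = r - r.mean(axis=0)
s_i = np.array([lower_half(np.outer(di, di)) for di in d])
s_bar = s_i.mean(axis=0)
A_N = np.linalg.inv(np.cov(s_i.T))       # fourth-moment weight, eq. (5)
h = lower_half(np.eye(J))                # restriction: sigma* = theta * h
# g*(theta) = theta * h is linear, so the minimizer has a closed form
theta_hat = (h @ A_N @ s_bar) / (h @ A_N @ h)
```

With normal data the weight would be proportional to $\Psi^*$ and nothing would be gained; with the $t(5)$ data above, the fourth-moment weight downweights the noisier sample moments.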
4. An empirical example
We shall present an empirical example that illustrates some of the
preceding results. The data come from the panel of Young Men in the
National Longitudinal Survey (Parnes). The sample consists of 1454 young
men who were not enrolled in school in 1969, 1970, or 1971, and who had
complete data on the variables listed in table 1. Table 2a presents an
unrestricted least squares regression of the logarithm of wage in 1969 on the
union, SMSA, and region variables for all three years. The regression also
includes a constant, schooling, experience, experience squared, and race. This
regression is repeated using the 1970 wage and the 1971 wage.
Table 1
Characteristics of National Longitudinal Survey Young Men, not enrolled in school in 1969, 1970, 1971; N = 1454.
Variable    Mean    Standard deviation
LW1         5.64    0.423
LW2         5.74    0.426
LW3         5.82    0.437
U1          0.336
U2          0.362
U3          0.364
U1U2        0.270
U1U3        0.262
U2U3        0.303
U1U2U3      0.243
SMSA1       0.697
SMSA2       0.627
SMSA3       0.622
RNS1        0.409
RNS2        0.404
RNS3        0.410
S           11.7    2.64
EXP69       5.11    3.71
EXP69^2     39.8    46.6
RACE        0.264
LW1, LW2, LW3: logarithm of hourly earnings (in cents) on the current or last job in 1969, 1970, 1971. U1, U2, U3: 1 if wages on current or last job set by collective bargaining, 0 if not, in 1969, 1970, 1971. SMSA1, SMSA2, SMSA3: 1 if respondent in SMSA, 0 if not, in 1969, 1970, 1971. RNS1, RNS2, RNS3: 1 if respondent in South, 0 if not, in 1969, 1970, 1971. S: years of schooling completed. EXP69: (age in 1969 - S - 6). RACE: 1 if respondent black, 0 if not.
[Tables 2a and 2b: unrestricted least squares regressions of LW1, LW2, LW3 on the leads and lags of the union, SMSA, and region variables (table 2b adds a complete set of union interactions). The coefficient estimates and standard errors are not recoverable from this copy.]
In section 2 we discussed the implications of a random intercept (c) and a random slope (b). If the leads and lags are due just to c, then the submatrices of $\Pi$ corresponding to the union, SMSA, or region coefficients should have the form $\beta I + \iota\lambda'$. Consider, for example, the $3 \times 3$ submatrix of union coefficients: the off-diagonal elements in each column should be equal to each other. So we compare 0.048 to 0.046, 0.042 to 0.041, and -0.009 to 0.010; not bad.
In table 2b we add a complete set of union interactions, so that, for the union variables at least, we have a general regression function. Now the submatrix of union coefficients is $3 \times 7$. If it equals $\beta(I_3, 0) + \iota\lambda'$, then in the first three columns the off-diagonal elements within a column should be equal; in the last four columns all elements within a column should be equal.

I first imposed the restrictions on the SMSA and region coefficients, using the minimum distance estimator. $\Omega$ is estimated using the formula in eq. (2), section 3.1, and $A_N = \hat\Omega^{-1}$. The minimum distance statistic (Proposition 8) is 6.82, which is not a surprising value from a $\chi^2(10)$ distribution. If we impose the restrictions on the union coefficients as well, then the 21 coefficients in table 2b are replaced by 8: one $\beta$ and seven $\lambda$'s. This gives an increase in the minimum distance statistic (Proposition 8', appendix B) of $19.36 - 6.82 = 12.54$, which is not a surprising value from a $\chi^2(13)$ distribution. So there is no evidence here against the hypothesis that all the lags and leads are generated by c.
Consider a transformation of the model in which the dependent variables are LW1, LW2 - LW1, and LW3 - LW2. Start with a multivariate regression on all of the lags and leads (and union interactions); then impose the restriction that U, SMSA, and RNS appear in the LW2 - LW1 and LW3 - LW2 equations only as contemporaneous changes $(E(y_t - y_{t-1} \mid x_1, x_2, x_3) = \beta(x_t - x_{t-1}))$. This is equivalent to the restriction that c generates all of the lags and leads, and we have seen that it is supported by the data. I also considered imposing all of the restrictions with the single exception of allowing separate coefficients for entering and leaving union coverage in the wage change equations. The estimates (standard errors) are 0.097 (0.019) and -0.119 (0.022). The standard error on the sum of the coefficients is 0.024, so again there is no evidence against the simple model with $E(y_t \mid x_1, x_2, x_3, c) = \beta x_t + c$.^{15}
However, since the $x_t$'s are binary variables, condition (R) in Proposition 1

^{15}Using May-May CPS matches for 1977-1978, Mellow (1981) reports coefficients (standard errors) of 0.087 (0.018) and -0.069 (0.020) for entering and leaving union membership in a wage change regression. The sample consists of 6,602 males employed as non-agricultural wage and salary workers in both years. He also reports results for 2,177 males and females whose age was at most 25. Here the coefficients on entering and leaving union membership are quite different: 0.198 (0.031) and -0.035 (0.041); it would be useful to reconcile these numbers with our results for young men. Also see Stafford and Duncan (1980).
does not hold. For example, the union coefficients provide some evidence that $E(b \mid x_1, x_2, x_3)$ is constant for the individuals who experience a change in union coverage [i.e., $E(b \mid x_1, x_2, x_3) = \bar\beta$ if $x_1 + x_2 + x_3 \neq 0$ or 3]; but there is no direct evidence on $E(b \mid x_1, x_2, x_3)$ for the people who are always covered or never covered. Furthermore, our alternative hypothesis has no structure. It might be fruitful, for example, to examine the changes in union coverage jointly with changes in employer.
Table 3a exhibits the estimates that result from imposing the restrictions using the optimal minimum distance estimator.^{16} We also give the conventional generalized least squares estimates. They are minimum distance estimates in which the weighting matrix $(A_N)$ is the inverse of
$$\left(\frac{1}{N}\sum_{i=1}^N \hat u_i \hat u_i'\right) \otimes S_x^{-1},$$
the form that $\hat\Omega$ takes under homoskedastic linear regression. We give the conventional standard errors based on $(F'\hat\Omega^{-1}F)^{-1}$ and the standard errors calculated according to Proposition 7, which do not require an assumption of homoskedastic linear regression. These standard errors are larger than the conventional ones, by about 30%. The estimated gain in efficiency from using the appropriate metric is not very large; the standard errors calculated according to Proposition 7 are about 10% larger when we use conventional GLS instead of the optimal minimum distance estimator.

Table 3a also presents the estimated $\lambda$'s. Consider, for example, an individual who was covered by collective bargaining in 1969. The linear predictor of c increases by 0.089 if he is also covered in 1970, and it increases by an additional 0.036 if he is covered in all three years. The predicted c for someone who is always covered is higher by 0.102 than for someone who is never covered.
Table 3b presents estimates under the constraint that $\lambda = 0$. The increment in the distance statistic is $89.08 - 19.36 = 69.72$, which is a surprisingly large value to come from a $\chi^2(13)$ distribution. If we constrain only the union $\lambda$'s to be zero, then the increment is $57.06 - 19.36 = 37.7$, which is surprisingly large coming from a $\chi^2(7)$ distribution. So there is strong evidence for heterogeneity bias.
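As a quick sanity check on the quoted statistics (assuming SciPy's chi-square survival function; the numbers are copied from the text above):

```python
from scipy.stats import chi2

# restrictions on SMSA and region coefficients: 6.82 on 10 d.o.f.
p_smsa_region = chi2.sf(6.82, 10)
# additional union restrictions: 19.36 - 6.82 = 12.54 on 13 d.o.f.
p_union = chi2.sf(19.36 - 6.82, 13)
# lambda = 0: increment 89.08 - 19.36 = 69.72 on 13 d.o.f.
p_lambda_zero = chi2.sf(89.08 - 19.36, 13)
```

The first two tail probabilities are comfortably large ("not surprising"), while the last is vanishingly small, matching the conclusions drawn in the text.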
The union coefficient declines from 0.157 to 0.107 when we relax the $\lambda = 0$ restriction. The least squares estimates for the separate cross-sections, with

^{16}We did not find much evidence for nonstationarity in the slope coefficients. If we allow the union $\beta$ to vary over the three years, we get 0.105, 0.103, 0.114. The distance statistic declines to 18.51, giving $19.36 - 18.51 = 0.85$; this is not a surprising value from a $\chi^2(2)$ distribution. If we also free up $\beta$ for SMSA and RNS, then the decline in the distance statistic is $18.51 - 13.44 = 5.07$, which is not a surprising value from a $\chi^2(4)$ distribution.
[Table 3a: restricted estimates from the optimal minimum distance estimator and from conventional GLS, with both the conventional standard errors and the standard errors based on Proposition 7, together with the estimated λ's. The individual entries are not recoverable from this copy.]
Table 3b
Restricted estimates under the constraint that λ = 0.

Coefficients (and standard errors) of:

        U          SMSA       RNS
δ̂      0.157      0.120      -0.150
       (0.012)    (0.013)    (0.016)

χ²(36) = 89.08

See footnote to table 3a.
no leads or lags, give union coefficients of 0.195, 0.189, and 0.191 in 1969, 1970, and 1971.^{17} So the decline in the union coefficient, when we allow for heterogeneity bias, is 32% or 44%, depending on which biased estimate (0.16 or 0.19) one uses. The SMSA and region coefficients also decline in absolute value. The least squares estimates for the separate cross-sections give an average SMSA coefficient of 0.147 and an average region coefficient of -0.131. So the decline in the SMSA coefficient is either 53% or 62%, and the decline in absolute value of the region coefficient is either 45% or 37%.
5. Conclusion
We have examined the relationship between heterogeneity bias and strict
exogeneity in distributed lag regressions of y on x. The relationship is very
strong when x is continuous, weaker when x is discrete, and non-existent as
the order of the distributed lag becomes infinite.
The individual specific random variables introduce nonlinearity and
heteroskedasticity. So we have provided an appropriate framework for the
estimation of multivariate linear predictors. We showed that the optimal
minimum distance estimator is more efficient, in general, than the
conventional estimators such as quasi-maximum likelihood. We provided
computationally simple generalizations of two- and three-stage least squares
that achieve this efficiency gain.
^{17}Using the NLS Young Men in 1969 (N = 1362), Griliches (1976) reports a union membership coefficient of 0.203. Using the NLS Young Men in a pooled regression for 1966-1971 and 1973 (N = 470), Brown (1980) reports a coefficient of 0.130 on a variable measuring the probability of union coverage. (The union coverage question was asked only in 1969, 1970, and 1971; so this variable is imputed for the other four years.) The coefficient declines to 0.081 when individual intercepts are included in the regression. His regressions also include a large number of occupation- and industry-specific job characteristics.
Some of these ideas were illustrated using the sample of Young Men in the
National Longitudinal Survey. We examined regressions of wages on the
leads and lags in union coverage, SMSA, and region. The results indicate
that the leads and lags could have been generated just by a random
intercept. This gives some support for analysis of covariance type estimates;
these estimates indicate a substantial heterogeneity bias in the union, SMSA,
and region coefficients.
Appendix A
Let $\Omega$ be a set of points where $\omega \in \Omega$ is a doubly infinite sequence of vectors of real numbers: $\omega = \{\ldots, \omega_{-1}, \omega_0, \omega_1, \ldots\} = \{\omega_t, t \in I\}$, where $\omega_t \in R^q$ and $I$ is the set of all integers. Let $z_t(\omega) = \omega_t$ be the $t$th coordinate function. Let $\mathcal{F}$ be the $\sigma$-field generated by sets of the form
$$A = \{\omega: z_t(\omega) \in B_0, \ldots, z_{t+k}(\omega) \in B_k\},$$
where $t, k \in I$ and the $B$'s are $q$-dimensional Borel sets. Let $P$ be a probability measure defined on $\mathcal{F}$ such that $\{z_t, t \in I\}$ is a (strictly) stationary stochastic process on the probability space $(\Omega, \mathcal{F}, P)$.

The shift transformation $S$ is defined by $z_t(S\omega) = z_{t+1}(\omega)$. It is an invertible, measure-preserving transformation. A random variable $d$ defined on $(\Omega, \mathcal{F}, P)$ is invariant if $d(S\omega) = d(\omega)$ except on a set with probability measure zero (almost surely, or a.s.). A set $A \in \mathcal{F}$ is invariant if its indicator function is an invariant random variable.

We shall use $E(d \mid \mathcal{G})_\omega$ to denote the conditional expectation of the random variable $d$ with respect to the $\sigma$-field $\mathcal{G}$, evaluated at $\omega$. Let $x_t$ be a component of $z_t$, let $\sigma(x)$ denote the $\sigma$-field generated by $\{\ldots, x_{-1}, x_0, x_1, \ldots\}$, and let $E(d \mid x_t, x_{t-1}, \ldots)$ denote the expectation of $d$ conditional on the $\sigma$-field generated by $x_t, x_{t-1}, \ldots$.
Proposition 3. If $d$ is an invariant random variable with $E(|d|) < \infty$, then
$$E(d \mid \sigma(x)) = E(d \mid x_t, x_{t-1}, \ldots) \quad a.s.,$$
where $t$ is any integer.

Proof. First we shall show that $E(d \mid \sigma(x))$ is an invariant random variable. Let $f(\omega) = d(S\omega)$. A change of variable argument shows that
$$E(d \mid \sigma(x))_{S\omega} = E(f \mid S^{-1}\sigma(x))_\omega \quad a.s.$$
[See Billingsley (1965, example 10.3, p. 109).] Since $d$ is an invariant
random variable, we have $d(S\omega) = d(\omega)$ a.s.; also $S^{-1}\sigma(x) = \sigma(x)$. Hence
$$E(d \mid \sigma(x))_{S\omega} = E(d \mid \sigma(x))_\omega \quad a.s.$$
Let $\sigma(x_t, x_{t-1}, \ldots)$ denote the $\sigma$-field generated by $(x_t, x_{t-1}, \ldots)$, and let
$$\mathcal{T} = \bigcap_{t=-\infty}^{\infty} \sigma(x_t, x_{t-1}, \ldots)$$
be the left tail $\sigma$-field generated by the $x$ process. Since $E(d \mid \sigma(x))$ is an invariant random variable, there is a version of $E(d \mid \sigma(x))$ that is measurable $\mathcal{T}$. [See Rozanov (1967, lemma 6.1, p. 162).] Hence $E(d \mid \sigma(x)) = E(d \mid \mathcal{T})$ a.s., and so
$$E(d \mid \sigma(x)) = E(d \mid \sigma(x_t, x_{t-1}, \ldots)) \quad a.s. \qquad Q.E.D.$$
Let $d$ be an invariant random variable and assume that $E(d^2) < \infty$, $E(x_t^2) < \infty$. Consider the Hilbert space of random variables generated by the linear manifold spanned by the variables $\{d, \ldots, x_{-1}, x_0, x_1, \ldots\}$, closed with respect to convergence in mean square. We also include a constant (1) in the space. The inner product is $(a, b) = E(ab)$. Then the linear predictor $E^*(d \mid \ldots, x_{-1}, x_0, x_1, \ldots)$ is defined as the projection of $d$ on the closed linear subspace generated by $\{1, \ldots, x_{-1}, x_0, x_1, \ldots\}$.
Proposition 4. If $d$ is an invariant random variable and $E(d^2) < \infty$, $E(x_t^2) < \infty$, then
$$E^*(d \mid \ldots, x_{-1}, x_0, x_1, \ldots) = \psi + \lambda\bar x,$$
where $\bar x$ is the limit in mean square of $\sum_{j=1}^J x_{t-j}/J$ as $J \to \infty$, $t$ is any integer, and
$$\lambda = \mathrm{cov}(d, \bar x)/V(\bar x) \quad \text{if } V(\bar x) > 0, \qquad \lambda = 0 \quad \text{if } V(\bar x) = 0, \qquad \psi = E(d) - \lambda E(\bar x).$$
Proof. The existence of the limit is implied by the mean ergodic theorem [Billingsley (1965, theorem 2.1, p. 21)]. Since $d$ is an invariant random variable, we have $\mathrm{cov}(d, x_t) = \mathrm{cov}(d, x_1)$ for all $t$. Let $\bar x_J = \sum_{j=1}^J x_{t-j}/J$. Then
$$\mathrm{cov}(d, \bar x_J) = \mathrm{cov}(d, x_1),$$
and so $\mathrm{cov}(d, \bar x) = \lim_{J\to\infty} \mathrm{cov}(d, \bar x_J) = \mathrm{cov}(d, x_1)$. Since $\bar x$ is an invariant random variable, we have $\mathrm{cov}(\bar x, x_t) = \mathrm{cov}(\bar x, x_1)$, and so
$$V(\bar x) = \lim_{J\to\infty} \mathrm{cov}(\bar x, \bar x_J) = \mathrm{cov}(\bar x, x_1).$$
Hence
$$\mathrm{cov}(d - \psi - \lambda\bar x,\ x_t) = \mathrm{cov}(d, x_1) - \lambda\,\mathrm{cov}(\bar x, x_1) = \mathrm{cov}(d, \bar x) - \lambda V(\bar x) = 0, \qquad t \in I.$$
Since we also have $E(d - \psi - \lambda\bar x) = 0$, the proof is complete. Q.E.D.
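Proposition 4 can be checked by simulation. This is my own sketch (NumPy assumed): for $x_t = c + \varepsilon_t$ with $d = c$ invariant, the time average $\bar x$ recovers $c$, and the projection coefficient $\lambda = \mathrm{cov}(d, \bar x)/V(\bar x)$ approaches $V(c)/V(c) = 1$ as the number of periods grows.

```python
import numpy as np

rng = np.random.default_rng(5)
n_draws, T = 20000, 200
c = rng.normal(size=(n_draws, 1))            # invariant random variable d = c
x = c + rng.normal(size=(n_draws, T))        # stationary process x_t = c + noise
x_bar = x.mean(axis=1)                       # finite-T analogue of the ergodic mean
lam = np.cov(c[:, 0], x_bar)[0, 1] / x_bar.var()
psi = c.mean() - lam * x_bar.mean()
pred = psi + lam * x_bar                     # E*(d | ..., x_{-1}, x_0, x_1, ...)
```

With T = 200 periods, $V(\bar x) \approx V(c) + 1/T$, so the estimated $\lambda$ is just below one.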
Appendix B
Let r: = (x;,yi ), i = 1,. . ., N, where xi = (xii,. . ., xix) and yi = (y,r, . . ., yiM). Write
the mth structural equation as
Yim = S:, i r n U im ,
m=l,...,M,
where the components of zi, are the variables in yi and xi that appear in the
mth equation with unknown coefficients. Let S,, be the following block-
diagonal matrix:
and
Let 0: = (I$~, . ., &), where u& = yim - 6,ozi, and ~5: is the true value of 6,; let
Gz, = E&J Let 6 =(S;, . . ., 6b) be s x 1, and set
s^=
S,,
D -
Sz,) -
(S,,
D
s,,,).
Proposition 5. Assume that (1) $r_i$ is i.i.d. according to some distribution with finite fourth moments; (2) $E[x_i(y_{im} - \delta_m^{0\prime} z_{im})] = 0$ $(m = 1, \ldots, M)$; (3) $\mathrm{rank}(\Phi_{zx}) = s$; and (4) $D \xrightarrow{a.s.} \Psi$ as $N \to \infty$, where $\Psi$ is a positive definite matrix. Then $\sqrt{N}(\hat\delta - \delta^0) \xrightarrow{d} N(0, \Lambda)$, where
$$\Lambda = (\Phi_{zx}\Psi^{-1}\Phi_{zx}')^{-1}\Phi_{zx}\Psi^{-1}E(u_iu_i' \otimes x_ix_i')\Psi^{-1}\Phi_{zx}'(\Phi_{zx}\Psi^{-1}\Phi_{zx}')^{-1}.$$

Proof. $\sqrt{N}(\hat\delta - \delta^0) = (S_{zx}D^{-1}S_{zx}')^{-1}S_{zx}D^{-1}N^{-1/2}\sum_{i=1}^N (u_i \otimes x_i)$. By the strong law of large numbers, $S_{zx} \xrightarrow{a.s.} \Phi_{zx}$; $\Phi_{zx}\Psi^{-1}\Phi_{zx}'$ is an $s \times s$ positive definite matrix since $\mathrm{rank}(\Phi_{zx}) = s$. So we obtain the same limiting distribution by considering
$$(\Phi_{zx}\Psi^{-1}\Phi_{zx}')^{-1}\Phi_{zx}\Psi^{-1}N^{-1/2}\sum_{i=1}^N (u_i \otimes x_i).$$
Note that $u_i \otimes x_i$ is i.i.d. with $E(u_i \otimes x_i) = 0$, $V(u_i \otimes x_i) = E(u_iu_i' \otimes x_ix_i')$. Then applying the central limit theorem gives $\sqrt{N}(\hat\delta - \delta^0) \xrightarrow{d} N(0, \Lambda)$. Q.E.D.
This result includes as special cases a number of the commonly used estimators. If $z_{im} = x_i$ $(m = 1, \ldots, M)$ and $D = I$, then $\hat\delta$ is the least squares estimator and $\Lambda$ reduces to the formula for $\Omega$ given in eq. (1) of section 3.1. If $\Psi = E(u_iu_i') \otimes E(x_ix_i')$, then $\Lambda$ is the asymptotic covariance matrix for the three-stage least squares estimator. If $\Psi = E(u_iu_i' \otimes x_ix_i')$, then $\Lambda$ is the asymptotic covariance matrix for the generalized three-stage least squares estimator [eq. (3), section 3.3]. If
$$\Psi = \mathrm{diag}\{E(u_{i1}^2)E(x_ix_i'),\ \ldots,\ E(u_{iM}^2)E(x_ix_i')\},$$
then we have the asymptotic covariance matrix for two-stage least squares. If
$$\Psi = \mathrm{diag}\{E(u_{i1}^2\, x_ix_i'),\ \ldots,\ E(u_{iM}^2\, x_ix_i')\},$$
we have the asymptotic covariance matrix for generalized two-stage least squares. [$\Lambda_{11}$ is given in eq. (4), section 3.3.]
Next we shall derive the properties of the minimum distance estimator. Let $D_N(\theta) = [a_N - g(\theta)]'A_N[a_N - g(\theta)]$ and choose $\hat\theta$ to
$$\min_{\theta \in \Gamma}\ D_N(\theta).$$
Assumptions 1 and 2 are stated in section 3.2.
Proposition 6. If Assumption 1 is satisfied, then $\hat\theta \xrightarrow{a.s.} \theta^0$.

Proof. Let $D^*(\theta) = [g(\theta^0) - g(\theta)]' \Psi [g(\theta^0) - g(\theta)]$. $D_N$ converges a.s. uniformly to $D^*$ on $T$. Let $B$ be a neighborhood of $\theta^0$ and set $\Gamma = T - B$. Then
$$\min_{\theta \in \Gamma} D_N(\theta) \xrightarrow{a.s.} \min_{\theta \in \Gamma} D^*(\theta) = \delta > 0.$$
Since $\delta > 0$ and $D_N(\hat\theta) \xrightarrow{a.s.} 0$, it must be that $\hat\theta \in B$ a.s. for $N$ sufficiently large. Since $B$ is an arbitrary neighborhood of $\theta^0$, we have shown that $\hat\theta \xrightarrow{a.s.} \theta^0$. Q.E.D.
Proposition 7. If Assumptions 1 and 2 are satisfied, then $\sqrt{N}(\hat\theta - \theta^0) \xrightarrow{d} N(0, \Lambda)$, where
$$\Lambda = (G' \Psi G)^{-1} G' \Psi \Delta \Psi G (G' \Psi G)^{-1}, \qquad G = \partial g(\theta^0)/\partial\theta'.$$
If $\Delta$ is positive definite, then $\Lambda - (G' \Delta^{-1} G)^{-1}$ is positive semi-definite; hence an optimal choice for $\Psi$ is $\Delta^{-1}$.
Proof. Let
$$s_N(\theta) = \partial D_N(\theta)/\partial\theta = -2\,[\partial g'(\theta)/\partial\theta]\, A_N [a_N - g(\theta)].$$
Since $\hat\theta \xrightarrow{a.s.} \theta^0$, for $N$ sufficiently large we a.s. have $\hat\theta$ interior to $T$ and $s_N(\hat\theta) = 0$. The mean value theorem implies that
$$s_N(\hat\theta) = s_N(\theta^0) + [\partial s_N(\theta^*)/\partial\theta'](\hat\theta - \theta^0) \quad \text{a.s.},$$
for sufficiently large $N$, where $\theta^*$ is on the line segment connecting $\hat\theta$ and $\theta^0$. [There is a different $\theta^*$ for each row of $\partial s_N(\theta^*)/\partial\theta'$; the measurability of $\theta^*$ follows from lemmas 2 and 3 of Jennrich (1969).] Since $\theta^* \xrightarrow{a.s.} \theta^0$, direct evaluation shows that
$$\partial s_N(\theta^*)/\partial\theta' \xrightarrow{a.s.} 2\, G' \Psi G,$$
which is non-singular. Hence
$$\sqrt{N}(\hat\theta - \theta^0) = -[\partial s_N(\theta^*)/\partial\theta']^{-1} \sqrt{N}\, s_N(\theta^0) \quad \text{a.s.},$$
for sufficiently large $N$. We obtain the same limiting distribution by considering
$$(G' \Psi G)^{-1} G' \Psi \sqrt{N}\,[a_N - g(\theta^0)].$$
Hence $\sqrt{N}(\hat\theta - \theta^0) \xrightarrow{d} N(0, \Lambda)$.

To find an optimal $\Psi$, note that there is a non-singular matrix $C$ such that $\Delta = CC'$. Let $\tilde G = C^{-1}G$ and $B = (G'\Psi G)^{-1} G'\Psi C$. Then we have
$$\Lambda - (G'\Delta^{-1}G)^{-1} = BB' - (\tilde G'\tilde G)^{-1} = [B - (\tilde G'\tilde G)^{-1}\tilde G'][B - (\tilde G'\tilde G)^{-1}\tilde G']',$$
which is positive semi-definite. Q.E.D.
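When $g$ is linear, $g(\theta) = G\theta$, the minimum distance problem has a closed form, and the optimality claim in Proposition 7 can be checked numerically. The sketch below (names and dimensions are illustrative assumptions) computes the sandwich covariance $(G'\Psi G)^{-1}G'\Psi\Delta\Psi G(G'\Psi G)^{-1}$ for an arbitrary weight and for the optimal weight $\Psi = \Delta^{-1}$.

```python
import numpy as np

def md_avar(G, Psi, Delta):
    """Asymptotic covariance of the minimum distance estimator with
    weight Psi: (G'Psi G)^-1 G'Psi Delta Psi G (G'Psi G)^-1."""
    GPG_inv = np.linalg.inv(G.T @ Psi @ G)
    return GPG_inv @ G.T @ Psi @ Delta @ Psi @ G @ GPG_inv

rng = np.random.default_rng(0)
q, p = 6, 3                          # hypothetical dimensions
G = rng.normal(size=(q, p))
B = rng.normal(size=(q, q))
Delta = B @ B.T + np.eye(q)          # a positive definite Delta

# Optimal weight Psi = Delta^-1 versus an arbitrary weight Psi = I.
optimal = md_avar(G, np.linalg.inv(Delta), Delta)
other = md_avar(G, np.eye(q), Delta)
```

With $\Psi = \Delta^{-1}$ the sandwich collapses to $(G'\Delta^{-1}G)^{-1}$, and for any other weight the difference of covariance matrices is positive semi-definite, as the proposition asserts.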
Proposition 8. If Assumptions 1 and 2 are satisfied, if $\Delta$ is positive definite, and if $A_N \xrightarrow{a.s.} \Delta^{-1}$, then
$$N[a_N - g(\hat\theta)]' A_N [a_N - g(\hat\theta)] \xrightarrow{d} \chi^2(q - p).$$
Proof. For sufficiently large $N$ we have
$$\sqrt{N}[g(\hat\theta) - g(\theta^0)] = G_N \sqrt{N}(\hat\theta - \theta^0) \quad \text{a.s.},$$
where $G_N \xrightarrow{a.s.} G$. From the proof of Proposition 7, we have
$$\sqrt{N}(\hat\theta - \theta^0) = R_N \sqrt{N}[a_N - g(\theta^0)] \quad \text{a.s.},$$
where $R_N \xrightarrow{a.s.} R = (G'\Delta^{-1}G)^{-1}G'\Delta^{-1}$. Hence
$$\sqrt{N}[a_N - g(\hat\theta)] = \sqrt{N}[a_N - g(\theta^0)] - \sqrt{N}[g(\hat\theta) - g(\theta^0)] \xrightarrow{d} QCu,$$
where $Q = I_q - GR$, $C$ is a non-singular matrix such that $CC' = \Delta$, and $u \sim N(0, I_q)$;
$$d_N = N[a_N - g(\hat\theta)]' A_N [a_N - g(\hat\theta)] \xrightarrow{d} u'C'Q'\Delta^{-1}QCu.$$
Let $\tilde G = C^{-1}G$ and $M_G = I_q - \tilde G(\tilde G'\tilde G)^{-1}\tilde G'$; then $M_G$ is a symmetric idempotent matrix with rank $q - p$ and
$$C'Q'\Delta^{-1}QC = M_G' M_G = M_G.$$
Hence $d_N \xrightarrow{d} u'M_G u \sim \chi^2(q - p)$. Q.E.D.
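The key identity in this proof, $d_N = u'M_G u$, holds exactly (not just in the limit) when $g$ is linear and $A_N = \Delta^{-1}$: with $a_N = G\theta^0 + Cu/\sqrt{N}$, the minimized distance equals the quadratic form in the residual projection. A numerical sketch under those assumptions (all names illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
q, p, N = 5, 2, 400
G = rng.normal(size=(q, p))
B = rng.normal(size=(q, q))
Delta = B @ B.T + np.eye(q)
C = np.linalg.cholesky(Delta)        # CC' = Delta
A = np.linalg.inv(Delta)             # weight A_N = Delta^-1

theta0 = rng.normal(size=p)
u = rng.normal(size=q)               # u ~ N(0, I_q)
a_N = G @ theta0 + C @ u / np.sqrt(N)

# Minimum distance fit in the linear case: theta_hat = (G'AG)^-1 G'A a_N.
theta_hat = np.linalg.solve(G.T @ A @ G, G.T @ A @ a_N)
e = a_N - G @ theta_hat
d_N = N * e @ A @ e                  # the statistic of Proposition 8

# The projection form u' M_G u from the proof.
Gt = np.linalg.solve(C, G)           # G-tilde = C^-1 G
M_G = np.eye(q) - Gt @ np.linalg.solve(Gt.T @ Gt, Gt.T)
```

Since $M_G$ is idempotent with trace $q - p$, $u'M_G u$ is $\chi^2(q-p)$ when $u$ is standard normal, which is where the degrees of freedom in the proposition come from.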
Now consider imposing additional restrictions, which are expressed by the condition that $\theta = f(\alpha)$, where $\alpha$ is $s \times 1$ $(s \le p)$. The domain of $\alpha$ is $T_\alpha$, a subset of $R^s$ that contains the true value $\alpha^0$. So $\theta = f(\alpha)$ is confined to a certain subset of $R^p$.

Assumption 2'. $T_\alpha$ is a compact subset of $R^s$ that contains $\alpha^0$; $f$ is a continuous mapping from $T_\alpha$ into $T$; $f(\alpha) = \theta^0$ for $\alpha \in T_\alpha$ implies $\alpha = \alpha^0$; $T_\alpha$ contains a neighborhood of $\alpha^0$ in which $f$ has continuous second partial derivatives; $\mathrm{rank}(F) = s$, where $F = \partial f(\alpha^0)/\partial\alpha'$.

Let $h(\alpha) = g[f(\alpha)]$. Choose $\hat\alpha$ to
$$\min_{\alpha \in T_\alpha} [a_N - h(\alpha)]' A_N [a_N - h(\alpha)].$$
Proposition 9. If Assumptions 1, 2, and 2' are satisfied, if $\Delta$ is positive definite, and if $A_N \xrightarrow{a.s.} \Delta^{-1}$, then $d_1 - d_2 \xrightarrow{d} \chi^2(p - s)$, where
$$d_1 = N[a_N - h(\hat\alpha)]' A_N [a_N - h(\hat\alpha)], \qquad d_2 = N[a_N - g(\hat\theta)]' A_N [a_N - g(\hat\theta)].$$
Furthermore, $d_1 - d_2$ is independent of $d_2$ in their limiting joint distribution.

Proof. The assumptions on $f$ and $T_\alpha$ imply that $h$ and $T_\alpha$ satisfy Assumptions 1 and 2. By following the proof of Proposition 8, we can show that the vector $(d_1, d_2)$ converges in distribution to $(d_1^*, d_2^*)$, where
$$d_1^* = u'M_H u, \qquad d_2^* = u'M_G u,$$
$u \sim N(0, I_q)$, $C$ is a non-singular matrix such that $CC' = \Delta$, $\tilde H = C^{-1}H$ with $H = GF$, $\tilde G = C^{-1}G$, and
$$M_H = I_q - \tilde H(\tilde H'\tilde H)^{-1}\tilde H', \qquad M_G = I_q - \tilde G(\tilde G'\tilde G)^{-1}\tilde G'.$$
Since $\tilde H$ is in the column space of $\tilde G$, we have $M_H M_G = M_G M_H = M_G$; so $M_H - M_G$ is a symmetric idempotent matrix with rank $p - s$. Hence
$$d_1 - d_2 \xrightarrow{d} u'(M_H - M_G)u \sim \chi^2(p - s).$$
Since
$$\mathrm{cov}[(M_H - M_G)u, M_G u] = (M_H - M_G)M_G = 0,$$
we see that $d_1^* - d_2^*$ is independent of $d_2^*$. Q.E.D.
In section 3.2 we considered applying the minimum distance procedure both to $\hat\pi$ and to $\hat w$. We want to show that if the restrictions involve only $\Pi$, then the two procedures give estimators of $\Pi$ with the same limiting distribution. First consider the effect of a one-to-one transformation of the moments: let $l(\rho)$ be a function from $R^q$ into $R^q$ and let $L = \partial l(\rho^0)/\partial\rho'$, where $\rho^0 = g(\theta^0)$. Let $h(\theta) = l[g(\theta)]$. Choose $\hat\theta$ to
$$\min_{\theta \in T} [l(a_N) - h(\theta)]' A_N [l(a_N) - h(\theta)].$$
Proposition 9a. Assume that (1) Assumptions 1 and 2 are satisfied for $g$ and $T$; (2) $l$ is one-to-one and continuous on the range of $g(\theta)$ for $\theta \in T$; $l$ has continuous second partial derivatives in a neighborhood of $g(\theta^0)$; $L$ is non-singular; (3) $\Delta$ is positive definite and $A_N \xrightarrow{a.s.} (L\Delta L')^{-1}$. Then $\sqrt{N}(\hat\theta - \theta^0) \xrightarrow{d} N(0, \Lambda)$, where $\Lambda = (G'\Delta^{-1}G)^{-1}$.

Proof. By the $\delta$-method,
$$\sqrt{N}[l(a_N) - h(\theta^0)] \xrightarrow{d} N(0, L\Delta L').$$
Hence $\sqrt{N}(\hat\theta - \theta^0) \xrightarrow{d} N(0, \Lambda)$, where $\Lambda = [H'(L\Delta L')^{-1}H]^{-1}$ and $H = \partial h(\theta^0)/\partial\theta'$. Since $H = LG$ and $L$ is non-singular, we have $\Lambda = (G'\Delta^{-1}G)^{-1}$. Q.E.D.
Finally, consider augmenting $a_N$ to a $k \times 1$ vector $c_N$: $c_N' = (a_N', b_N')$, $k \ge q$. (For example, we can augment $\hat\pi$ by adding $\hat w$.) Assume that $c_N$ …
where $\Delta_{st}$ is the $(s, t)$ submatrix of $\Delta$ $(s, t = 1, 2)$. Then the concentrated distance function is
$$[a_N - g(\theta)]' A_N [a_N - g(\theta)].$$
Q.E.D.
So the addition of unrestricted moments does not affect the minimum
distance estimator.
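The concentration step rests on the partitioned-inverse identity: with the joint weight $A = \Delta^{-1}$, minimizing the quadratic form over the unrestricted block leaves the Schur complement $A_{11} - A_{12}A_{22}^{-1}A_{21}$ as the effective weight, and this equals $\Delta_{11}^{-1}$, the weight used by the $a_N$-only procedure. A numerical sketch (hypothetical dimensions):

```python
import numpy as np

rng = np.random.default_rng(3)
q, r = 4, 3                          # q restricted moments, r unrestricted
B = rng.normal(size=(q + r, q + r))
Delta = B @ B.T + np.eye(q + r)      # joint covariance of (a_N', b_N')'
A = np.linalg.inv(Delta)             # optimal joint weight

# Concentrating the free block out of the quadratic form:
# min over e2 of (e1', e2') A (e1', e2')' = e1' (A11 - A12 A22^-1 A21) e1.
A11, A12 = A[:q, :q], A[:q, q:]
A21, A22 = A[q:, :q], A[q:, q:]
concentrated = A11 - A12 @ np.linalg.solve(A22, A21)
```

The Schur complement of $A_{22}$ in $\Delta^{-1}$ is exactly $\Delta_{11}^{-1}$, so the augmented and unaugmented objectives coincide, which is the statement in the text.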
References
Anderson, T.W., 1969, Statistical inference for covariance matrices with linear structure, in: P.R.
Krishnaiah, ed., Proceedings of the second international symposium on multivariate analysis
(Academic Press, New York).
Anderson, T.W., 1970, Estimation of covariance matrices which are linear combinations or whose inverses are linear combinations of given matrices, in: Essays in probability and statistics (University of North Carolina Press, Chapel Hill, NC).
Amemiya, T., 1971, The estimation of variances in a variance-components model, International Economic Review 12, 1-13.
Balestra, P. and M. Nerlove, 1966, Pooling cross section and time series data in the estimation of a dynamic model: The demand for natural gas, Econometrica 34, 585-612.
Basmann, R.L., 1965, On the application of the identifiability test statistic and its exact finite
sample distribution function in predictive testing of explanatory economic models,
Unpublished manuscript.
Billingsley, P., 1965, Ergodic theory and information (Wiley, New York).
Billingsley, P., 1979, Probability and measure (Wiley, New York).
Brown, C., 1980, Equalizing differences in the labor market, Quarterly Journal of Economics 94,
113-134.
Chamberlain, G., 1980, Analysis of covariance with qualitative data, Review of Economic
Studies 47, 225-238.
Chiang, C.L., 1956, On regular best asymptotically normal estimates, Annals of Mathematical
Statistics 27, 336-351.
Cramer, H., 1946, Mathematical methods of statistics (Princeton University Press, Princeton,
NJ).
Ferguson, T.S., 1958, A method of generating best asymptotically normal estimates with application to the estimation of bacterial densities, Annals of Mathematical Statistics 29, 1046-1062.
Goldberger, A.S., 1974, Asymptotics of the sample regression slope, Unpublished lecture note no. 12.
Griliches, Z., 1976, Wages of very young men, Journal of Political Economy 84, S69-S85.
Griliches, Z. and A. Pakes, 1980, The estimation of distributed lags in short panels, National
Bureau of Economic Research technical paper no. 4.
Hansen, L.P., 1982, Large sample properties of generalized method of moments estimators,
Econometrica 50, forthcoming.
Hsiao, C., 1975, Some estimation methods for a random coefficient model, Econometrica 43, 305-325.
Jennrich, R.I., 1969, Asymptotic properties of non-linear least squares estimators, The Annals of Mathematical Statistics 40, 633-643.
Kendall, M.G. and A. Stuart, 1961, The advanced theory of statistics, Vol. 2 (Griffin, London).
MaCurdy, T.E., 1979, Multiple time series models applied to panel data: Specification of a dynamic model of labor supply, Unpublished manuscript.
Maddala, G.S., 1971, The use of variance components models in pooling cross section and time
series data, Econometrica 39, 341-358.
Malinvaud, E., 1970, Statistical methods of econometrics (North-Holland, Amsterdam).
Mellow, W., 1981, Unionism and wages: A longitudinal analysis, Review of Economics and
Statistics 63, 43-52.
Mundlak, Y., 1961, Empirical production function free of management bias, Journal of Farm Economics 43, 44-56.
Mundlak, Y., 1963, Estimation of production and behavioral functions from a combination of
time series and cross section data, in: C. Christ et al., eds., Measurement in economics
(Stanford University Press, Stanford, CA).
Mundlak, Y., 1978, On the pooling of time series and cross section data, Econometrica 46, 69-85.
Mundlak, Y., 1978a, Models with variable coefficients: Integration and extension, Annales de l'INSEE 30-31, 483-509.
Rao, C.R., 1973, Linear statistical inference and its applications (Wiley, New York).
Rothenberg, T.J., 1973, Efficient estimation with a priori information (Yale University Press,
New Haven, CT).
Rozanov, Y.A., 1967, Stationary random processes (Holden-Day, San Francisco, CA).
Sims, C.A., 1972, Money, income, and causality, American Economic Review 62, 540-552.
Sims, C.A., 1974, Distributed lags, in: M.D. Intriligator and D.A. Kendrick, eds., Frontiers of quantitative economics, Vol. II (North-Holland, Amsterdam).
Swamy, P.A.V.B., 1970, Efficient inference in a random coefficient regression model, Econometrica 38, 311-323.