Journal of Econometrics 18 (1982) 5-46. North-Holland Publishing Company
MULTIVARIATE REGRESSION MODELS
FOR PANEL DATA
Gary CHAMBERLAIN*
University of Wisconsin, Madison, WI 53706, USA
National Bureau of Economic Research, Cambridge, MA 02138, USA
The paper examines the relationship between heterogeneity bias and strict exogeneity in a distributed lag regression of y on x. The relationship is very strong when x is continuous, weaker when x is discrete, and non-existent as the order of the distributed lag becomes infinite. The individual specific random variables introduce nonlinearity and heteroskedasticity; so the paper provides an appropriate framework for the estimation of multivariate linear predictors. Restrictions are imposed using a minimum distance estimator. It is generally more efficient than the conventional estimators such as quasi-maximum likelihood. There are computationally simple generalizations of two- and three-stage least squares that achieve this efficiency gain. Some of these ideas are illustrated using the sample of Young Men in the National Longitudinal Survey. The paper reports regressions on the leads and lags of variables measuring union coverage, SMSA, and region. The results indicate that the leads and lags could have been generated just by a random intercept. This gives some support for analysis of covariance type estimates; these estimates indicate a substantial heterogeneity bias in the union, SMSA, and region coefficients.
1. Introduction
Suppose that we have a sample of individuals (or firms) followed over time: (x_{it}, y_{it}), where there are t = 1, ..., T periods and i = 1, ..., N individuals. Consider the following distributed lag specification:

E(y_{it} | x_{i1}, ..., x_{iT}, b_{i0}, ..., b_{iJ}, c_i) = Σ_{j=0}^{J} b_{ij} x_{i,t−j} + c_i,    t = J+1, ..., T.
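A small simulation may help fix ideas. The following sketch is not part of the paper; the data generating process, sample sizes, and distributions are all illustrative assumptions. It generates a panel from the specification above with J = 1 and a random intercept c_i correlated with x, and shows that a cross-section least squares regression does not recover the β_j:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, J = 5000, 6, 1
beta = np.array([1.0, 0.5])            # population parameters beta_j = E(b_ij)

# individual-specific slopes and intercepts (the heterogeneity)
b = beta + 0.3 * rng.standard_normal((N, J + 1))
c = rng.standard_normal(N)

# regressor correlated with c_i, so least squares will be biased
x = 0.8 * c[:, None] + rng.standard_normal((N, T))

# y_it = sum_j b_ij x_{i,t-j} + c_i + noise,  t = J+1, ..., T
y = np.full((N, T), np.nan)
for t in range(J, T):
    y[:, t] = sum(b[:, j] * x[:, t - j] for j in range(J + 1)) \
              + c + rng.standard_normal(N)

# cross-section least squares of y_T on (x_T, x_{T-1}) does not recover beta
X = np.column_stack([np.ones(N), x[:, T - 1], x[:, T - 2]])
coef = np.linalg.lstsq(X, y[:, T - 1], rcond=None)[0]
print("OLS slopes:", coef[1:], "vs beta:", beta)   # biased upward via c
```

The upward bias appears because E(c | x) loads positively on each regressor here, which is exactly the heterogeneity bias discussed next.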
The coefficients b_{ij} and c_i are allowed to vary across individuals but are constant over time. The population parameters of interest are β_j = E(b_{ij}), j = 0, ..., J. If the b_{ij} or c_i are correlated with x, then a least squares regression
*I am grateful to Arthur Goldberger, Zvi Griliches, Donald Hester, George Jakubson, Ariel
Pakes, and Burton Singer for comments and helpful discussions. Financial support was provided
by the National Science Foundation (Grants No. SOC-7925959 and No. SES-8016383) and by
funds granted to the Institute for Research on Poverty at the University of Wisconsin, Madison,
by the Department of Health, Education, and Welfare pursuant to the provisions of the
Economic Opportunity Act of 1964.
0165-7410/82/0000-0000/$02.75 © 1982 North-Holland
of y_t on x_t, ..., x_{t−J} will not provide a consistent estimator of the β_j (as N → ∞). We shall refer to this inconsistency as a heterogeneity bias.
In section 2, on identification, we consider first the case J = 0 and b_{ij} = β_j. We argue that the presence of heterogeneity bias will be signalled by a full set of lags and leads in the least squares regression of y_t on x_1, ..., x_T. Furthermore, if we let y = (y_1, ..., y_T), x = (x_1, ..., x_T) and consider the multivariate linear predictor

E*(y | x) = Π_0 + Π_1 x,

then the T × T matrix Π_1 should have a distinctive pattern: the off-diagonal elements within the same column are all equal. In that case

E*(y_t − y_{t−1} | x) = β(x_t − x_{t−1}),

so there is just a contemporaneous relationship when we transform to first differences. I think that a test for such restrictions should accompany analysis of covariance type estimation.
There is an analogous question when J is finite and the b_j are random as well as c. Does E(y_t | x_1, ..., x_T) = E(y_t | x_t, ..., x_{t−J}) imply that there is no heterogeneity bias? We find that the answer is yes if x has a continuous distribution but not if x is discrete.
New issues arise as the order (J) of the distributed lag becomes infinite. We consider this problem in the context of a stationary stochastic process; c and the b_j are (shift) invariant random variables. There are invariant random variables with non-zero variance if and only if the process is not ergodic. We pose the following question: if

E*(y_t | ..., x_{t−1}, x_t, x_{t+1}, ...) = E*(y_t | x_t, x_{t−1}, ...),

so that y does not cause x according to the Sims (1972) definition, is it then true that there is no heterogeneity bias? The answer is no, because if d is an invariant random variable, then

E*(d | ..., x_{t−1}, x_t, x_{t+1}, ...) = E*(d | x_t, x_{t−1}, ...).
Section 3 of the paper considers the estimation of multivariate linear predictors. There is a sample r_i′ = (x_i′, y_i′), i = 1, ..., N, where x_i′ = (x_{i1}, ..., x_{iK}) and y_i′ = (y_{i1}, ..., y_{iM}). We assume that r_i is independent and identically distributed (i.i.d.) according to some distribution with finite fourth moments. We do not assume that the regression function E(y_i | x_i) is linear; for although E(y_i | x_i, c_i) may be linear, there is generally no reason to insist that E(c_i | x_i) is linear. Furthermore, we allow the conditional variance V(y_i | x_i) to be an arbitrary function of x_i; the heteroskedasticity could, for example, be due to random coefficients. Let w_i be the vector formed from the squares and cross-products of the elements of r_i; let Π be the matrix of linear predictor coefficients:
E*(y_i | x_i) = Πx_i, where Π = E(y_i x_i′)[E(x_i x_i′)]^{−1}. Then w_i is i.i.d. and Π is a function of E(w_i). So the problem is to make inferences about differentiable functions of a population mean, under random sampling.
This is straightforward and the results have a variety of novel implications. Let Π̂ be the least squares estimator; let π̂ and π be the vectors formed from the columns of Π̂ and Π. Then √N(π̂ − π) →d N(0, Ω) as N → ∞. The formula for Ω is not the standard one, since we are not assuming homoskedastic, linear regression.
We impose restrictions by using a minimum distance estimator: find the matrix satisfying the restrictions that is closest to Π̂ in the norm provided by Ω̂^{−1}, where Ω̂ is a consistent (as N → ∞) estimator of Ω. This leads to some surprising results. For example, consider a univariate linear predictor: E*(y_i | x_{i1}, x_{i2}) = π_0 + π_1 x_{i1} + π_2 x_{i2}. We can impose the restriction that π_2 = 0 by using a least squares regression of y on x_1 to estimate π_1; however, this is asymptotically less efficient, in general, than our minimum distance estimator. The conventional estimator is a minimum distance estimator, but it is using a different norm.
A related result is that two-stage least squares is not, in general, an efficient procedure for combining instrumental variables; three-stage least squares is also using the wrong norm. We provide more efficient estimators for the linear simultaneous equations model by applying our minimum distance procedure to the reduced form, thereby generalizing Malinvaud's (1970) minimum distance estimator. Suppose that the only restrictions are that certain structural coefficients are zero (and the normalization rule). We provide a generalization of three-stage least squares that has the same limiting distribution as our minimum distance estimator. There is a corresponding generalization of two-stage least squares.
We also consider the maximum likelihood estimator based on assuming that r_i has a multivariate normal distribution with mean μ and covariance matrix Σ. Then the slope coefficients in Π are functions of Σ and, more generally, we can consider estimating arbitrary functions of Σ subject to restrictions. When the normality assumption does not hold, we refer to the estimator as a quasi-maximum likelihood estimator. The quasi-maximum likelihood estimator has the same limiting distribution as a certain minimum distance estimator; but in general that minimum distance estimator is not using the optimal norm. Hence our estimator is generally more efficient than the quasi-maximum likelihood estimator.
Section 4 of the paper presents an empirical example that illustrates some of the results. It is based on the panel of Young Men in the National Longitudinal Survey (Parnes); y_t is the logarithm of the individual's hourly wage, and x_t includes variables to indicate whether or not the individual's wage is set by collective bargaining; whether or not he lives in an SMSA; and whether or not he lives in the South. We present unrestricted least
squares regressions of y_t on x_1, ..., x_T. There are significant leads and lags; if they are generated just by a random intercept (c), then Π should have a distinctive form. There is some evidence in favor of this, and hence some justification for analysis of covariance estimation. In this example, the leads and lags could be interpreted as due just to c, with E(y_t | x_1, ..., x_T, c) = βx_t + c.
2. Identification
Suppose that a farmer is producing a product with a Cobb-Douglas technology:

y_t = βx_t + c + u_t,    0 < β < 1.
With more than one observation per farm, however, we can consider the least squares regression of y_t on x = (x_1, ..., x_T). The population counterpart is

E*(y_t | x) = βx_t + E*(c | x) + E*(u_t | x).

Assume that V(x) is non-singular. Then

E*(c | x) = ψ + λ′x,    λ = V^{−1}(x) cov(x, c).
Even if E*(u_t | x) = 0, there will generally be a full set of lags and leads if V(c) ≠ 0. For example, if cov(x_t, c) = cov(x_1, c), t = 1, ..., T, then λ is proportional to the row sums of V^{−1}(x), and all of the elements of λ will typically be non-zero. I think that it is generally true that E*(c | x) depends on all of the x_t's if it depends on any of them. So the presence of heterogeneity bias will be signalled by a full set of lags and leads. Also, if E*(u | x) = 0, then the wide-sense multivariate regression will have a distinctive pattern:

Π_1 = cov(y, x) V^{−1}(x) = βI_T + 1λ′,

where 1 is a T × 1 vector of ones. The off-diagonal elements within the same column of Π_1 are all equal.
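The pattern Π_1 = βI_T + 1λ′ can be checked numerically from population moments. The following sketch is illustrative only; the choices T = 4, β = 1, and x_t = 0.8c + e_t (with c and the e_t standard normal and independent) are assumptions made for the example:

```python
import numpy as np

T, beta = 4, 1.0
a = 0.8                                   # x_t = a*c + e_t, V(c) = 1
Vx = a * a * np.ones((T, T)) + np.eye(T)  # V(x)
cxc = a * np.ones(T)                      # cov(x_t, c), same for all t

# cov(y, x): y_t = beta*x_t + c + u_t with E*(u_t | x) = 0
Cyx = beta * Vx + np.outer(np.ones(T), cxc)
Pi1 = Cyx @ np.linalg.inv(Vx)

lam = np.linalg.inv(Vx) @ cxc             # lambda = V^{-1}(x) cov(x, c)
assert np.allclose(Pi1, beta * np.eye(T) + np.outer(np.ones(T), lam))

# off-diagonal elements within each column are all equal:
for j in range(T):
    off = np.delete(Pi1[:, j], j)
    assert np.allclose(off, off[0])
print("Pi1 = beta*I + 1*lambda' verified; lambda =", lam)
```

Every element of λ is non-zero here, so the single heterogeneity term c produces a full set of lags and leads, exactly as claimed.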
A common solution to the bias problem is some form of analysis of covariance.² For example, we can form the farm specific means (ȳ = Σ_{t=1}^T y_t/T, x̄ = Σ_{t=1}^T x_t/T) and the deviations around them (ỹ_t = y_t − ȳ, x̃_t = x_t − x̄), and then run a pooled least squares regression of ỹ on x̃. This is equivalent to first running the least squares regression of ỹ_t on x̃_t for each of the T cross-section samples, and then forming a weighted average of the T slope coefficients. The population counterpart of the t-th least squares regression is

E*(y_t − ȳ | x_t − x̄) = β(x_t − x̄) + E*(u_t − ū | x_t − x̄).

So the least squares regression of ỹ_t on x̃_t provides a consistent (as N → ∞) estimator of β only if E*(u_t − ū | x_t − x̄) = 0. I would not expect this condition to hold unless

E*(u_t − u_{t−1} | x_2 − x_1, ..., x_T − x_{T−1}) = 0,    t = 2, ..., T,
²This analysis of covariance estimator was used by Mundlak (1961). Related estimators have been discussed by Balestra and Nerlove (1966), Wallace and Hussain (1969), Amemiya (1971), Maddala (1971), and Mundlak (1978). Analysis of covariance in nonlinear models is discussed in Chamberlain (1980).
so that x is strictly exogenous when we transform the model to first differences.³ The strict exogeneity restriction is testable, since it implies that

E*(y_t − y_{t−1} | x_2 − x_1, ..., x_T − x_{T−1}) = E*(y_t − y_{t−1} | x_t − x_{t−1});

hence there are exclusion restrictions on the linear predictors.
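A simulated panel can illustrate why analysis of covariance works when u_t is strictly exogenous. The sketch below is not from the paper; the data generating process (random intercept correlated with x, i.i.d. u_t) is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, beta = 20000, 5, 1.0

c = rng.standard_normal(N)
x = 0.8 * c[:, None] + rng.standard_normal((N, T))       # x correlated with c
y = beta * x + c[:, None] + rng.standard_normal((N, T))  # u_t strictly exogenous

# pooled least squares of y on x: heterogeneity bias
b_pool = np.sum(x * y) / np.sum(x * x)   # no intercept needed: all means zero

# analysis of covariance: deviations from farm-specific means
xd = x - x.mean(axis=1, keepdims=True)
yd = y - y.mean(axis=1, keepdims=True)
b_within = np.sum(xd * yd) / np.sum(xd * xd)

print(b_pool, b_within)   # b_pool is biased upward; b_within is close to beta
```

Taking deviations from farm-specific means removes c, so the within slope is consistent here, while the pooled slope picks up cov(x, c)/V(x).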
A stronger condition is that

E*(u_t | x_1, ..., x_T) = 0,    t = 1, ..., T.

This implies that Π_1 has the form βI_T + 1λ′. These restrictions on Π_1 are testable; we can summarize them by saying that x is strictly exogenous conditional on c. The restrictions would fail to hold in the production function example if u_t is partly predictable from its past, so that E[exp(u_t) | u_{t−1}, u_{t−2}, ...] depends on u_{t−1}, u_{t−2}, ...
Now suppose that the technology varies across the farms, so that

y_t = bx_t + c + u_t,

where b is a random variable that is constant over time. We shall refer to b and c as invariant random variables. Our discussion of E*(c | x) indicated that it depends on all of the x_t's if it depends on any of them. I would expect this to be true of E(c | x) as well. This general characteristic of invariant random variables is formulated in the following condition:
Condition (C). Let x* = (x_{t_1}, ..., x_{t_K}), where {t_1, ..., t_K} is some proper subset of {1, ..., T}. Let d be an invariant random variable. Then E(d | x) = E(d | x*) implies that E(d | x) = E(d).
Suppose that the parameter of interest is β = E(b). If b or c is correlated with x, then a least squares regression of y_t on x_t will not provide a consistent estimator of β. We have argued that such a heterogeneity bias will be signalled by a full set of lags and leads when we regress y_t on (x_1, ..., x_T). Under what conditions can we infer that there is no bias if we observe only a contemporaneous relationship? Proposition 1 provides some guidance; it can be extended easily to the case of a finite distributed lag.

Condition (R). Prob(x_n = x_{n−1}) = 0 for some integer n with 2 ≤ n ≤ T.
Proposition 1. Suppose that

E(y_t | x, b, c) = bx_t + c,    t = 1, ..., T.

³The strict exogeneity terminology is based on Sims (1972, 1974).
If conditions (C) and (R) hold and if T ≥ 3, then

E(y_t | x) = E(y_t | x_t),    t = 1, ..., T,

implies that

E(y_t | x) = βx_t + γ,

where β = E(b) = E(b | x) and γ = E(c) = E(c | x).

Proof. The following equalities hold with probability one:

E(b | x) = [E(y_n | x) − E(y_{n−1} | x)]/(x_n − x_{n−1}) = [E(y_n | x_n) − E(y_{n−1} | x_{n−1})]/(x_n − x_{n−1}).

So E(b | x) = E(b | x_n, x_{n−1}), and if T ≥ 3, then (C) implies that E(b | x) = E(b), and

E(c | x) = E(y_t | x) − E(b | x)x_t = E(y_t | x_t) − βx_t;

hence E(c | x) = E(c | x_t) and so E(c | x) = E(c). Q.E.D.
This analysis can be applied to linear transformations of the process. If we find that E(y_t | x) has a full set of lags and leads, then we can ask if that is just due to E(c | x) ≠ E(c). Let Δy_t = y_t − y_{t−1}, Δx_t = x_t − x_{t−1}, and Δx = (Δx_2, ..., Δx_T). Under the assumptions of the proposition, if

E(Δy_t | Δx) = E(Δy_t | Δx_t),

then

E(Δy_t | Δx) = βΔx_t.
Note that it is possible to find E(Δy_t | Δx) = E(Δy_t | Δx_t) even though E(b | x) ≠ E(b). For example, consider the stationary case in which cov(x_t, b) = cov(x_1, b); then E*(b | Δx) = E(b), and so E(b | Δx) = E(b) if the regression function of b on Δx is linear. Then we might find that E(y_t | x) has a full set of lags and leads even though E(Δy_t | Δx) does not.
The condition that prob(x_n = x_{n−1}) = 0 is necessary. For consider the following counter-example:

E(b | x) = β_1 if x_1 = ... = x_T,
E(b | x) = β_2 if not

(β_1 ≠ β_2). Then the argument of Proposition 1 recovers β_2,
but β_2 ≠ E(b) unless prob(x_1 = ... = x_T) = 0. So there is an important distinction here between continuous and discrete distributions for x. If x_t only takes on a finite set of values, then there will generally be positive probability that x_1 = ... = x_T, although this probability may become negligible for large T.
The following proposition provides some additional insight into this distinction; it is based on a condition that is slightly weaker than (R):

Condition (R′). Prob(x_1 = x_2 = ... = x_T) = 0.
Proposition 2. Suppose that

E(y_t | x, b, c) = bx_t + c,    t = 1, ..., T,

where T ≥ 2. Assume that condition (R′) holds and define

b̂ = Σ_{t=1}^T (y_t − ȳ)(x_t − x̄) / Σ_{t=1}^T (x_t − x̄)².

Then E(b̂) = E(b) if E(|b̂|) < ∞.⁴
Proof. The following equalities hold with probability one:

E(b̂ | x, b, c) = Σ_{t=1}^T b(x_t − x̄)² / Σ_{t=1}^T (x_t − x̄)² = b;

so if E(|b̂|) < ∞,

E(b̂) = E[E(b̂ | x, b, c)] = E(b). Q.E.D.
Suppose that (y_{i1}, ..., y_{iT}, x_{i1}, ..., x_{iT}), i = 1, ..., N, is a random sample from the distribution of (y, x). Define

b̂_i = Σ_{t=1}^T (y_{it} − ȳ_i)(x_{it} − x̄_i) / Σ_{t=1}^T (x_{it} − x̄_i)².

Then if the assumptions of Proposition 2 are satisfied, Σ_{i=1}^N b̂_i/N converges almost surely (a.s.) to E(b) as N → ∞. It is important that b̂_i is an unbiased estimator of E(b), since we are actually taking the unweighted mean of a
⁴The assumption that E(|b̂|) < ∞ is not innocuous. For example, suppose that V(c) = V(b) = 0 and (x_t, y_t) is independent and identically distributed (t = 1, ..., T) according to a bivariate normal distribution. Then b̂ = b + {V(y_t | x_t)/[(T − 1)V(x_t)]}^{1/2} w, where w has Student's t-distribution with T − 1 degrees of freedom. Hence E(|b̂|) < ∞ only if T ≥ 3.
large number of these estimators. The lack of bias requires that x be strictly exogenous conditional on b, c. It would not be sufficient to assume that E(y_t | x_t, b, c) = bx_t + c. For example, if x_t = y_{t−1}, then our estimator would not converge to E(b), due to the small-T bias in least squares estimates of an autoregressive process.
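The unit-by-unit construction of Proposition 2 can be simulated. The sketch below is illustrative only; the distributions of b_i, c_i, and x_it (with x correlated with both invariants, and strictly exogenous noise) are assumptions chosen for the example:

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 50000, 4
Eb = 1.0
b = Eb + 0.5 * rng.standard_normal(N)        # b_i, with E(b) = 1
c = rng.standard_normal(N)
# x correlated with both b_i and c_i; u_t strictly exogenous given (b, c)
x = 0.6 * b[:, None] + 0.6 * c[:, None] + rng.standard_normal((N, T))
y = b[:, None] * x + c[:, None] + rng.standard_normal((N, T))

xd = x - x.mean(axis=1, keepdims=True)
yd = y - y.mean(axis=1, keepdims=True)
b_hat = np.sum(xd * yd, axis=1) / np.sum(xd * xd, axis=1)   # slope per unit

print(b_hat.mean())   # close to E(b) = 1, despite corr(b, x) and corr(c, x)
```

Each b̂_i is unbiased for b_i, so the unweighted mean recovers E(b); T = 4 keeps E(|b̂|) finite, in line with the footnote's T ≥ 3 requirement for the normal case.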
Let D_i = 0 if x_{i1} = ... = x_{iT}, D_i = 1 if not. We can compute b̂_i only for the group with D_i = 1. The sample mean of b̂_i for that group converges a.s. to E(b | D = 1), but we have no information on E(b | D = 0). So unless prob(D = 0) = 0, any value for E(b) is consistent with a given value for E(b | D = 1).⁵
If x_t has a continuous distribution, then the assumption that the regression function is linear (E(y_t | x_t, b, c) = bx_t + c) is very restrictive; the implication of this assumption (combined with strict exogeneity) is that we can obtain an unbiased estimator for b, and hence a consistent (as N → ∞) estimator for E(b). If x_t is a binary variable, then the assumption of linear regression is not restrictive at all; but there are fewer implications, since there is positive probability that b̂ is not defined for finite T.
The following extension of Proposition 1 to the case of a finite distributed lag is straightforward:⁶

Proposition 1′. Suppose that

E(y_t | x, b_0, ..., b_J, c) = Σ_{j=0}^J b_j x_{t−j} + c,    t = J+1, ..., T.

If condition (C) holds, if

prob(det A_n = 0) = 0, where A_n is the (J+2) × (J+2) matrix whose rows are (1, x_m, x_{m−1}, ..., x_{m−J}) for m = n, n−1, ..., n−J−1,

for some integer n with 2J + 2 ≤ n ≤ T, and if T ≥ 2J + 3, then

E(y_t | x) = E(y_t | x_t, ..., x_{t−J}),    t = J+1, ..., T,
⁵A solution could be based on Mundlak's (1978a) proposal that E(b | x) = ψ_0 + ψ_1 Σ_{t=1}^T x_t. However, even if we assume that the regression function is linear in x_1, ..., x_T, it may be difficult to justify the restriction that only Σ_t x_t matters, unless T is large and we have stationarity: cov(b, x_t) = cov(b, x_1) and V(x) band diagonal. (See Proposition 4 and the discussion preceding it.) Furthermore, if cov(b, x_t) = cov(b, x_1), then E(b | x_2 − x_1, ..., x_T − x_{T−1}) = E(b) (if the regression function is linear), and so there is no heterogeneity bias once we transform to first differences.
⁶We shall not discuss the problems that arise from truncating the lag distribution when T < J + 1. These problems are discussed in Griliches and Pakes (1980). By working with linear transformations of the process, it is fairly straightforward to extend our analysis to general rational distributed lag schemes.
implies that

E(y_t | x) = Σ_{j=0}^J β_j x_{t−j} + γ,

where β_j = E(b_j) = E(b_j | x) and γ = E(c) = E(c | x), j = 0, ..., J.
The extension of Proposition 2 is also straightforward. There are new issues,
however, in the infinite lag case, which we shall take up next.
Large number of lags. Suppose that

E(y_t | σ(x), c) = Σ_{j=0}^∞ β_j x_{t−j} + c,

where σ(x) is the information set (σ-field) generated by {..., x_{−1}, x_0, x_1, ...}, and Σ_{j=0}^J β_j x_{t−j} converges in mean square as J → ∞. Consider a regression version of the Sims (1972) condition for x to be strictly exogenous (y does not cause x):

E(y_t | σ(x)) = E(y_t | x_t, x_{t−1}, ...).

Does this condition imply that E(c | σ(x)) = E(c), so that there is no heterogeneity bias?
We shall consider this question in the context of a (strictly) stationary stochastic process. Since c does not change over time, it is an invariant random variable. The following proposition is proved in appendix A:

Proposition 3. If d is an invariant random variable with E(|d|) < ∞, then

E(d | σ(x)) = E(d | x_t, x_{t−1}, ...),

where t is any integer.

It follows that

E(y_t | σ(x)) = E(c | x_t, x_{t−1}, ...) + Σ_{j=0}^∞ β_j x_{t−j} = E(y_t | x_t, x_{t−1}, ...).
So we cannot rule out heterogeneity bias just because y does not cause x. If
a large number of lags have been included, then a small number of leads
provide little additional information on c.
We can gain some insight into this result by considering the linear predictor of an invariant random variable. Let

E*(c | x_1, ..., x_T) = ψ_T + λ_T′ x_T,

where λ_T′ = (λ_{T1}, ..., λ_{TT}) and x_T′ = (x_1, ..., x_T). Stationarity implies that λ_T = τV^{−1}(x_T)1, where τ = cov(x_1, c) and 1 is a T × 1 vector of ones. Since V(x_T) is a band-diagonal matrix, 1 is approximately an eigenvector of V(x_T) for large T; hence λ_T′x_T is approximately proportional to Σ_{t=1}^T x_t.
For example, if
X, = px, _ i + u,, where v, is serially uncorrelated, then
&-x,=~
(1 PI i xt+P(x, +x,)
/cu+P) vxln
i=l
1
Now in this example, L&K, does not approach a limit as T--+Lx unless
z = cov (x,, c) =O. In fact cov (xi, c) is zero here, since there is a non-trivial
linear predictor only if cj=O x,_ j/J converges to a non-degenerate random
variable as J-rco.
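The AR(1) weights can be verified by direct computation. This is a numeric check, not part of the paper; T, ρ, and τ below are arbitrary, and V(x_1) is normalized to one:

```python
import numpy as np

T, rho, tau = 8, 0.6, 0.3          # tau = cov(x_t, c); V(x_1) = 1
# AR(1) autocovariance matrix: V[t, s] = rho^|t - s|
V = rho ** np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
lam = tau * np.linalg.solve(V, np.ones(T))   # lambda_T = tau * V^{-1} 1

# closed form: interior weights tau(1-rho)/(1+rho), endpoints tau/(1+rho),
# i.e. lambda_T' x = tau[(1-rho) sum x_t + rho(x_1 + x_T)] / (1+rho)
w = np.full(T, tau * (1 - rho) / (1 + rho))
w[0] = w[-1] = tau / (1 + rho)
assert np.allclose(lam, w)
print(lam)
```

The interior weights are all equal, so apart from the two endpoints the predictor is proportional to Σ_t x_t, matching the approximate-eigenvector argument above.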
The general case is covered by the following proposition:

Proposition 4. If d is an invariant random variable and E(d²) < ∞, E(x_t²) < ∞, then

E*(d | ..., x_{−1}, x_0, x_1, ...) = ψ + λx̄,

where x̄ is the limit in mean square of Σ_{j=0}^J x_{t−j}/J as J → ∞, t is any integer,

λ = cov(d, x̄)/V(x̄) if V(x̄) ≠ 0,    λ = 0 if V(x̄) = 0,    ψ = E(d) − λE(x̄).

(See appendix A for proof.)
The existence of the x̄ limit, both in mean square and almost surely, is the main result of ergodic theory and will be discussed further below. It is clear that x̄ is an invariant random variable. If V(x̄) ≠ 0, then the x process has a (non-degenerate) invariant component, and conditioning on the x's gives a
non-trivial linear predictor if x̄ is correlated with c. However, if V(x̄) = 0, then cov(c, x_t) = 0 for all t, and the linear prediction of c is not improved by conditioning on the x's.
It follows from Proposition 4 that

E*(y_t | ..., x_{t−1}, x_t, x_{t+1}, ...) = E*(y_t | x_t, x_{t−1}, ...) = ψ + Σ_{j=0}^J (β_j + λ/J)x_{t−j} + r(J),

where r(J) converges in mean square to zero as J → ∞. So y does not cause x according to Sims' definition; but this does not imply that c is uncorrelated with the x's. If we include a large number of lags, then the bias in any one coefficient is a negligible λ/J, but the bias in the sum of the lag coefficients tends to λ as J → ∞. If we include K leads, then the sum of their coefficients is approximately Kλ/J, which is close to zero when J is much larger than K. If the β_j are zero for j > J*, then the lag coefficients beyond that point will be close to zero but their sum will be close to λ.
Under the stationarity assumption, there are non-degenerate invariant random variables if and only if the process is not ergodic. The basic result here is the (pointwise) ergodic theorem: Let g be a random variable on (Ω, F, P) with E(|g|) < ∞, and let g_t(ω) = g(S^t ω), where S is the shift transformation (see appendix A); then the following limit exists a.s.:

ḡ = lim_{J→∞} Σ_{j=0}^J g_{t−j}/J.

The limit ḡ is an invariant random variable; it is the expectation of g_t conditional on 𝒥, where 𝒥 is the information set (σ-field) generated by all of the invariant random variables. If V(ḡ) ≠ 0 for some g, then the process is not ergodic. In the ergodic case, all of the invariant random variables have degenerate distributions.
Suppose that

E(y_t | σ(x), 𝒥) = bx_t + c,

and let

b̂_T = Σ_{t=1}^T (y_t − ȳ_T)(x_t − x̄_T) / Σ_{t=1}^T (x_t − x̄_T)².

Recall condition (R′): prob(x_1 = ... = x_T) = 0. I want to examine the
significance of condition (R′) as T → ∞ in the stationary case. Note that Σ_{t=1}^T (x_t − x̄_T)²/T converges a.s. to V(x_1 | 𝒥). So a limiting version of condition (R′) is

prob[V(x_1 | 𝒥) = 0] = 0.

If this condition holds, then

lim_{T→∞} b̂_T = [E(x_1y_1 | 𝒥) − E(x_1 | 𝒥)E(y_1 | 𝒥)] / [E(x_1² | 𝒥) − [E(x_1 | 𝒥)]²]  a.s.,

and b is observable as T → ∞. But if there is positive probability that V(x_1 | 𝒥) = 0, then the identification problem is more difficult. There is no information on b for the "stayers"; so in order to obtain E(b), even as T → ∞, we have to make untestable assumptions about the unobservable part of the b distribution.
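With a binary regressor the stayer group can be far from negligible. The following sketch is illustrative only (the Bernoulli design and the distribution of b are assumptions; b is drawn independently of x here, so the movers' mean happens to equal E(b)):

```python
import numpy as np

rng = np.random.default_rng(5)
N, T, p = 100000, 4, 0.3                   # binary x_it = 1 with prob p, i.i.d.

x = (rng.random((N, T)) < p).astype(float)
b = 1.0 + 0.5 * rng.standard_normal(N)     # E(b) = 1
y = b[:, None] * x + rng.standard_normal((N, T))

movers = x.std(axis=1) > 0                 # D_i = 1: x_it varies over t
frac_stayers = 1.0 - movers.mean()
# prob(D = 0) = p^T + (1 - p)^T: about 25% of units are stayers here
print(frac_stayers, p ** T + (1 - p) ** T)

xd = x[movers] - x[movers].mean(axis=1, keepdims=True)
yd = y[movers] - y[movers].mean(axis=1, keepdims=True)
b_hat = np.sum(xd * yd, axis=1) / np.sum(xd * xd, axis=1)
print(b_hat.mean())                        # estimates E(b | D = 1)
```

b̂_i is simply undefined for a quarter of the sample, so without further assumptions the data identify E(b | D = 1) only, as the text argues.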
3. Estimation
Consider a sample r_i′ = (x_i′, y_i′), i = 1, ..., N, where x_i′ = (x_{i1}, ..., x_{iK}), y_i′ = (y_{i1}, ..., y_{iM}). We shall assume that r_i is independent and identically distributed (i.i.d.) according to some multivariate distribution with finite fourth moments and E(x_i x_i′) non-singular. Consider the minimum mean square error linear predictors,⁷

E*(y_{im} | x_i) = π_m′ x_i,    m = 1, ..., M,

which we can write as

E*(y_i | x_i) = Πx_i    with    Π = E(y_i x_i′)[E(x_i x_i′)]^{−1}.

We want to estimate Π subject to restrictions and to test those restrictions. For example, we may want to test whether a submatrix of Π has the form βI + 1λ′. I think that analysis of covariance estimation should be accompanied by such a test.

We shall not assume that the regression function E(y_i | x_i) is linear. For although E(y_i | x_i, c_i) may be linear (indeed, we hope that it is), there is generally

⁷This agrees with the definition in section 2 if x_i includes a constant.
no reason to insist that E(c_i | x_i) is linear. So we shall present a theory of inference for linear predictors. Furthermore, even if the regression function is linear, there may be heteroskedasticity, due to random coefficients, for example.⁸ So we shall allow V(y_i | x_i) to be an arbitrary function of x_i.
3.1. The estimation of linear predictors
Let w_i be the vector formed from the distinct elements of r_i r_i′ that have non-zero variance. Since r_i′ = (x_i′, y_i′) is i.i.d., it follows that w_i is i.i.d. This simple observation is the key to our results. Since Π is a function of E(w_i), our problem is to make inferences about a function of a population mean, under random sampling.

Let μ = E(w_i) and let π be the vector formed from the columns of Π [π = vec(Π)]. Then π is a function of μ: π = h(μ). Let w̄ = Σ_{i=1}^N w_i/N; then π̂ = h(w̄) is the least squares estimator:

π̂ = vec(Π̂),    Π̂ = (Σ_{i=1}^N y_i x_i′)(Σ_{i=1}^N x_i x_i′)^{−1}.
By the strong law of large numbers, w̄ converges almost surely to μ⁰ as N → ∞ (w̄ →a.s. μ⁰), where μ⁰ is the true value of μ. Let π⁰ = h(μ⁰). Since h(μ) is continuous at μ = μ⁰, we have π̂ →a.s. π⁰. The central limit theorem implies that

√N(w̄ − μ⁰) →d N(0, V(w_i)).

Since h(μ) is differentiable at μ = μ⁰, the δ-method gives⁹

√N(π̂ − π⁰) →d N(0, Ω),

where

Ω = [∂h(μ⁰)/∂μ′] V(w_i) [∂h(μ⁰)/∂μ′]′.

We have derived the limiting distribution of the least squares estimator.
This approach was used by Cramér (1946) to obtain limiting normal

⁸Anderson (1969, 1970), Swamy (1970, 1974), Hsiao (1975), and Mundlak (1978a) discuss estimators that incorporate the particular form of heteroskedasticity that is generated by random coefficients.
⁹See Billingsley (1979, example 29.1, p. 340) or Rao (1973, p. 388).
distributions for sample correlation and regression coefficients (p. 367); he
presents an explicit formula for the variance of the limiting distribution of a
sample correlation coefficient (p. 359). Kendall and Stuart (1961, p. 293) and
Goldberger (1974) present the formula for the variance of the limiting
distribution of a simple regression coefficient.
Evaluating the partial derivatives in the formula for Ω is tedious. That calculation can be simplified, since π̂ has a ratio form. In the case of simple regression with a zero intercept, we have π = E(y_i x_i)/E(x_i²) and

√N(π̂ − π⁰) = [Σ_{i=1}^N (y_i − π⁰x_i)x_i/√N] / [Σ_{i=1}^N x_i²/N].

Since Σ_{i=1}^N x_i²/N →a.s. E(x_i²), we obtain the same limiting distribution by working with

Σ_{i=1}^N (y_i − π⁰x_i)x_i / [√N E(x_i²)].

The definition of π gives E[(y_i − πx_i)x_i] = 0, and so the central limit theorem implies that

√N(π̂ − π⁰) →d N(0, E[(y_i − π⁰x_i)²x_i²]/[E(x_i²)]²).
This approach was used by White (1980) to obtain the limiting distribution for univariate regression coefficients.¹⁰ In appendix B (Proposition 5) we follow White's approach to obtain

√N(π̂ − π⁰) →d N(0, Ω),

where

Ω = E[(y_i − Π⁰x_i)(y_i − Π⁰x_i)′ ⊗ Φ_x^{−1} x_i x_i′ Φ_x^{−1}],    (1)

Φ_x = E(x_i x_i′).
A consistent estimator of Ω is readily available from the corresponding sample moments:

Ω̂ = (1/N) Σ_{i=1}^N [(y_i − Π̂x_i)(y_i − Π̂x_i)′ ⊗ S_x^{−1} x_i x_i′ S_x^{−1}],    (2)

where

S_x = Σ_{i=1}^N x_i x_i′/N.
¹⁰Also see White (1980a, b).
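Eq. (2) is straightforward to compute. The sketch below is illustrative (the data generating process and dimensions are assumptions); it stacks the coefficients equation by equation, so the Kronecker blocks of Ω̂ line up with the rows of Π̂:

```python
import numpy as np

rng = np.random.default_rng(3)
N, K, M = 4000, 2, 2
x = rng.standard_normal((N, K))
# conditional variance depends on x: heteroskedasticity
e = rng.standard_normal((N, M)) * (0.5 + x[:, :1] ** 2)
Pi0 = np.array([[1.0, 0.0],
                [0.5, 2.0]])                     # true Pi (M x K)
y = x @ Pi0.T + e

Sx = x.T @ x / N                                  # S_x
Pi_hat = (y.T @ x / N) @ np.linalg.inv(Sx)        # least squares Pi_hat

Sxi = np.linalg.inv(Sx)
r = y - x @ Pi_hat.T                              # residuals y_i - Pi_hat x_i
Omega = np.zeros((M * K, M * K))
for i in range(N):
    A = Sxi @ np.outer(x[i], x[i]) @ Sxi
    Omega += np.kron(np.outer(r[i], r[i]), A)     # summand of eq. (2)
Omega /= N

se = np.sqrt(np.diag(Omega) / N)   # heteroskedasticity-robust standard errors
print(se.reshape(M, K))
```

No assumption of linear regression or homoskedasticity is used anywhere in the computation, which is exactly the point of eq. (2).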
If E(y_i | x_i) = Πx_i, so that the regression function is linear, then

Ω = E[V(y_i | x_i) ⊗ Φ_x^{−1} x_i x_i′ Φ_x^{−1}].

If V(y_i | x_i) is uncorrelated with x_i x_i′, then

Ω = E[V(y_i | x_i)] ⊗ Φ_x^{−1}.

If the conditional variance is homoskedastic, so that V(y_i | x_i) = Σ does not depend on x_i, then

Ω = Σ ⊗ Φ_x^{−1}.
3.2. Imposing restrictions: The minimum distance estimator
Since Π is a function of E(w_i), restrictions on Π imply restrictions on E(w_i). Let the dimension of μ = E(w_i) be q.¹¹ We shall specify the restrictions by the condition that μ depends only on a p × 1 vector θ of unknown parameters: μ = g(θ), where g is a known function and p ≤ q. The domain of θ is 𝒯, a subset of p-dimensional Euclidean space (R^p) that contains the true value θ⁰. So the restrictions imply that μ⁰ = g(θ⁰) is confined to a certain subset of R^q.
We can impose the restrictions by using a minimum distance estimator: choose θ̂ to

min_{θ ∈ 𝒯} [w̄ − g(θ)]′ A_N [w̄ − g(θ)],

where A_N →a.s. Ψ and Ψ is positive definite. This minimization problem is equivalent to the following one: choose θ̂ to

min_{θ ∈ 𝒯} Σ_{i=1}^N [w_i − g(θ)]′ A_N [w_i − g(θ)],

since the two objective functions differ by a term that does not depend on θ.
The properties of θ̂ are developed, for example, in Malinvaud (1970, ch. 9). Since g does not depend on any exogenous variables, the derivation of these properties can be simplified considerably, as in Chiang (1956) and Ferguson (1958).

For completeness, we shall state a set of regularity conditions and the properties that they imply:

¹¹If there is one element in r_i r_i′ with zero variance, then q = [(K + M)(K + M + 1)/2] − 1.
Assumption 1. a_N →a.s. g(θ⁰); 𝒯 is a compact subset of R^p that contains θ⁰; g is continuous on 𝒯, and g(θ) = g(θ⁰) for θ ∈ 𝒯 implies that θ = θ⁰; A_N →a.s. Ψ, where Ψ is positive definite.

Assumption 2. √N[a_N − g(θ⁰)] →d N(0, Δ); 𝒯 contains a neighborhood of θ⁰ in which g has continuous second partial derivatives; rank(G) = p, where G = ∂g(θ⁰)/∂θ′.

Choose θ̂_N to

min_{θ ∈ 𝒯} [a_N − g(θ)]′ A_N [a_N − g(θ)].
Proposition 6. If Assumption 1 is satisfied, then θ̂_N →a.s. θ⁰.

Proposition 7. If Assumptions 1 and 2 are satisfied, then √N(θ̂_N − θ⁰) →d N(0, Λ), where

Λ = (G′ΨG)^{−1} G′ΨΔΨG (G′ΨG)^{−1}.

If Δ is positive definite, then Λ − (G′Δ^{−1}G)^{−1} is positive semi-definite; hence an optimal choice for Ψ is Δ^{−1}.
Proposition 8. If Assumptions 1 and 2 are satisfied, if Δ is a q × q positive definite matrix, and if A_N →a.s. Δ^{−1}, then

N[a_N − g(θ̂_N)]′ A_N [a_N − g(θ̂_N)] →d χ²(q − p).

(This is extended to the case of nested restrictions in Proposition 8′, appendix B.)¹²
Suppose that the restrictions involve only Π. We specify the restrictions by the condition that π = f(δ), where δ is s × 1 and the domain of δ is 𝒯_δ, a subset of R^s that includes the true value δ⁰. Consider the following estimator of δ⁰: choose δ̂ to

min_{δ ∈ 𝒯_δ} [π̂ − f(δ)]′ Ω̂^{−1} [π̂ − f(δ)],
¹²Since the proofs are simple, we shall keep the paper self-contained and include them in appendix B. The proofs are based on Chiang (1956), Ferguson (1958), and Malinvaud (1970, ch. 9).
where Ω̂ is given in eq. (2) and we assume that Ω in eq. (1) is positive definite. If 𝒯_δ and f satisfy Assumptions 1 and 2, then δ̂ →a.s. δ⁰,

√N(δ̂ − δ⁰) →d N(0, (F′Ω^{−1}F)^{−1}),

and

N[π̂ − f(δ̂)]′ Ω̂^{−1} [π̂ − f(δ̂)] →d χ²(MK − s),

where

F = ∂f(δ⁰)/∂δ′.
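For linear restrictions π = Fδ (F known), the minimum distance problem has a closed form. The sketch below is illustrative only; the numbers in the toy check are made up:

```python
import numpy as np

# minimum distance with a linear restriction pi = F delta:
# delta_hat = (F' W F)^{-1} F' W pi_hat, with W = Omega_hat^{-1}
def min_dist(pi_hat, Omega_hat, F, N):
    W = np.linalg.inv(Omega_hat)
    A = F.T @ W
    delta = np.linalg.solve(A @ F, A @ pi_hat)
    resid = pi_hat - F @ delta
    stat = N * resid @ W @ resid          # -> chi2(q - s) under the restriction
    avar = np.linalg.inv(F.T @ W @ F)     # N * Var(delta_hat) -> (F' W F)^{-1}
    return delta, stat, avar

# toy check: restrict pi = (delta, delta)' and combine a noisy pi_hat
N = 1000
Omega = np.array([[2.0, 0.5], [0.5, 1.0]])
pi_hat = np.array([1.02, 0.97])
F = np.array([[1.0], [1.0]])
delta, stat, avar = min_dist(pi_hat, Omega, F, N)
print(delta, stat)
```

With the optimal norm Ω̂^{−1}, the estimator is a precision-weighted combination of the components of π̂, and the minimized distance (times N) is the test statistic of Proposition 8.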
We can also estimate δ⁰ by applying the minimum distance procedure to w̄ instead of to π̂. Suppose that the components of w_i are arranged so that w_i′ = (w_{i1}′, w_{i2}′), where w_{i1} contains the components of x_i y_i′. Partition μ = E(w_i) conformably: μ′ = (μ_1′, μ_2′). Set θ′ = (θ_1′, θ_2′) = (δ′, μ_2′). Assume that V(w_i) is positive definite. Now choose θ̂ to

min_θ [w̄ − g(θ)]′ A_N [w̄ − g(θ)],    A_N →a.s. [V(w_i)]^{−1},

where g_2(δ, μ_2) = μ_2 and g_1(δ, μ_2) is the value of μ_1 implied by π = f(δ) and μ_2. Then θ̂_1 gives an estimator of δ⁰; it has the same limiting distribution as the estimator δ̂ that we obtained by applying the minimum distance procedure to π̂. (See Proposition 9, appendix B.)
This framework leads to some surprising results on efficient estimation. For a simple example, we shall use a univariate linear predictor model,
$$E^*(y_i \mid x_{i1}, x_{i2}) = \pi_0 + \pi_1 x_{i1} + \pi_2 x_{i2}.$$
Consider imposing the restriction $\pi_2 = 0$. Then the conventional estimator of $\pi_1$ is $b_{yx_1}$, the slope coefficient in the least squares regression of $y$ on $x_1$. We shall show that this estimator is generally less efficient than the minimum distance estimator if the regression function is nonlinear or if there is heteroskedasticity.

Let $\hat\pi_1, \hat\pi_2$ be the slope coefficients in the least squares multiple regression of $y$ on $x_1, x_2$. The minimum distance estimator of $\pi_1$ under the restriction $\pi_2 = 0$ can be obtained as $\hat\delta = \hat\pi_1 + \tau\hat\pi_2$, where $\tau$ is chosen to minimize the
(estimated) variance of the limiting distribution of $\hat\delta$; this gives $\tau = -\hat\omega_{12}/\hat\omega_{22}$, where $\hat\omega_{jk}$ is the estimated covariance between $\hat\pi_j$ and $\hat\pi_k$ in their limiting distribution. Since $\hat\pi_1 = b_{yx_1} - \hat\pi_2 b_{x_2x_1}$, we have
$$\hat\delta = b_{yx_1} - (b_{x_2x_1} + \hat\omega_{12}/\hat\omega_{22})\hat\pi_2.$$
If $E(y_i \mid x_{i1}, x_{i2})$ is linear and if $V(y_i \mid x_{i1}, x_{i2}) = \sigma^2$, then $\omega_{12}/\omega_{22} = -\mathrm{cov}(x_{i1}, x_{i2})/V(x_{i1})$ and $\hat\delta = b_{yx_1}$. But in general $\hat\delta \neq b_{yx_1}$, and $\hat\delta$ is more efficient than $b_{yx_1}$. The source of the efficiency gain is that the limiting distribution of $\hat\pi_2$ has a zero mean (if $\pi_2 = 0$), and so we can reduce variance without introducing any bias if $\hat\pi_2$ is correlated with $b_{yx_1}$. Under the assumptions of linear regression and homoskedasticity, $b_{yx_1}$ and $\hat\pi_2$ are uncorrelated; but this need not be true in the more general framework that we are using.
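A small Monte Carlo illustrates this efficiency gain. This sketch is mine, not the paper's (NumPy assumed); the design makes the error variance depend on $x_1 x_2$, which makes $\hat\pi_1$ and $\hat\pi_2$ correlated in the limit, so $\hat\delta = \hat\pi_1 + \tau\hat\pi_2$ has smaller sampling variance than $b_{yx_1}$.

```python
import numpy as np

rng = np.random.default_rng(1)

def one_draw(n=2000):
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)
    # pi2 = 0 holds in the population, but the error variance depends on x1*x2
    y = 1.0 + 0.5 * x1 + (1.0 + x1 * x2) * rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1, x2])
    pi_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    u = y - X @ pi_hat
    XtX_inv = np.linalg.inv(X.T @ X)
    V = XtX_inv @ (X.T * u**2) @ X @ XtX_inv   # robust covariance of pi-hat
    tau = -V[1, 2] / V[2, 2]
    b_yx1 = np.cov(x1, y)[0, 1] / x1.var()     # conventional estimator of pi1
    delta_hat = pi_hat[1] + tau * pi_hat[2]    # minimum distance estimator
    return b_yx1, delta_hat

draws = np.array([one_draw() for _ in range(400)])
var_conventional, var_md = draws.var(axis=0)
```

Across the replications both estimators center on $\pi_1 = 0.5$, but the minimum distance combination has the smaller variance.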
3.3. Simultaneous equations: A generalization of two- and three-stage least
squares
Given the discussion on imposing restrictions, it is not surprising that two-stage least squares is not, in general, an efficient procedure for combining instrumental variables. I shall demonstrate this with a simple example. Assume that $(y_i, z_i, x_{i1}, x_{i2})$ is i.i.d. according to some distribution with finite fourth moments, and that
$$y_i = \delta z_i + u_i,$$
where
$$E(u_i x_{i1}) = E(u_i x_{i2}) = 0.$$
Assume also that $E(z_i x_{i1}) \neq 0$, $E(z_i x_{i2}) \neq 0$. Then there are two instrumental variable estimators that both converge a.s. to $\delta$:
$$\hat\delta_j = \sum_{i=1}^N y_i x_{ij} \Big/ \sum_{i=1}^N z_i x_{ij}, \qquad j = 1, 2;$$
$$\sqrt{N}\left[\begin{pmatrix}\hat\delta_1\\ \hat\delta_2\end{pmatrix} - \begin{pmatrix}\delta\\ \delta\end{pmatrix}\right] \xrightarrow{d} N(0, \Lambda),$$
where the $j,k$ element of $\Lambda$ is
$$\lambda_{jk} = \frac{E[(y_i - \delta z_i)^2 x_{ij} x_{ik}]}{E(z_i x_{ij})\, E(z_i x_{ik})}, \qquad j, k = 1, 2.$$
The two-stage least squares estimator combines $\hat\delta_1$ and $\hat\delta_2$ by forming $\hat z_i = \hat\pi_1 x_{i1} + \hat\pi_2 x_{i2}$, based on the least squares regression of $z$ on $x_1, x_2$ (assume that $E[(x_{i1}, x_{i2})'(x_{i1}, x_{i2})]$ is non-singular):
$$\hat\delta_{2SLS} = \sum_{i=1}^N y_i \hat z_i \Big/ \sum_{i=1}^N z_i \hat z_i = \left(\hat\pi_1 \sum_{i=1}^N y_i x_{i1} + \hat\pi_2 \sum_{i=1}^N y_i x_{i2}\right) \Big/ \left(\hat\pi_1 \sum_{i=1}^N z_i x_{i1} + \hat\pi_2 \sum_{i=1}^N z_i x_{i2}\right).$$
Since $\hat\pi_j \xrightarrow{a.s.} \pi_j$, $\sqrt{N}(\hat\delta_{2SLS} - \delta)$ has the same limiting distribution as
$$\sqrt{N}[\alpha(\hat\delta_1 - \delta) + (1 - \alpha)(\hat\delta_2 - \delta)], \qquad \alpha = \frac{\pi_1 E(z_i x_{i1})}{\pi_1 E(z_i x_{i1}) + \pi_2 E(z_i x_{i2})}.$$
This suggests finding the $\tau$ that minimizes the variance of the limiting distribution of $\sqrt{N}[\tau(\hat\delta_1 - \delta) + (1 - \tau)(\hat\delta_2 - \delta)]$. The answer leads to the minimum distance estimator: choose $\hat\delta$ to
$$\min_\delta\ \sum_{j,k=1}^2 (\hat\delta_j - \delta)\,\hat\lambda^{jk}\,(\hat\delta_k - \delta);$$
this gives
$$\hat\delta = \tau\hat\delta_1 + (1 - \tau)\hat\delta_2, \qquad \tau = (\lambda^{11} + \lambda^{12})/(\lambda^{11} + 2\lambda^{12} + \lambda^{22}),$$
where $\lambda^{jk}$ is the $j,k$ element of $\Lambda^{-1}$. The estimator obtained by using a consistent estimator of $\Lambda$ has the same limiting distribution.
In general $\tau \neq \alpha$, since $\tau$ is a function of fourth moments and $\alpha$ is not. Suppose, for example, that $z_i = x_{i2}$. Then $\alpha = 0$, but $\tau \neq 0$ unless
$$E\left[(y_i - \delta z_i)^2\, x_{i2}\left(x_{i1} - \frac{E(x_{i1}x_{i2})}{E(x_{i2}^2)}\, x_{i2}\right)\right] = 0.$$
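The optimal combination of the two instrumental variable estimators can be sketched as follows. This is my own illustration (NumPy assumed; the data-generating design and names are hypothetical): $\Lambda$ is estimated from sample fourth moments at a preliminary estimate, and $\tau$ is formed from $\Lambda^{-1}$ as in the text.

```python
import numpy as np

def combine_iv(y, z, x1, x2):
    """Minimum distance combination of the two IV estimators
    delta_j = sum(y x_j) / sum(z x_j), weighting by Lambda-inverse."""
    d1 = np.sum(y * x1) / np.sum(z * x1)
    d2 = np.sum(y * x2) / np.sum(z * x2)
    u = y - d1 * z                       # residuals at a preliminary estimate
    m1, m2 = np.mean(z * x1), np.mean(z * x2)
    lam = np.array([[np.mean(u**2 * x1 * x1) / (m1 * m1),
                     np.mean(u**2 * x1 * x2) / (m1 * m2)],
                    [np.mean(u**2 * x2 * x1) / (m2 * m1),
                     np.mean(u**2 * x2 * x2) / (m2 * m2)]])
    li = np.linalg.inv(lam)
    tau = (li[0, 0] + li[0, 1]) / li.sum()
    return tau * d1 + (1 - tau) * d2

rng = np.random.default_rng(2)
n = 4000
x1, x2 = rng.normal(size=(2, n))
z = x1 + x2 + rng.normal(size=n)
y = 2.0 * z + (1.0 + x1**2) * rng.normal(size=n)   # delta = 2, heteroskedastic
delta_hat = combine_iv(y, z, x1, x2)
```

Because the weights come from estimated fourth moments, $\tau$ here generally differs from the two-stage least squares weight $\alpha$.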
If we add another equation, then we can consider the conventional three-
stage least squares estimator. Its limiting distribution is derived in appendix
B (Proposition 5); however, viewed as a minimum distance estimator, it is
using the wrong norm in general.
Consider the standard simultaneous equations model:
$$y_i = \Pi x_i + u_i, \qquad E(u_i x_i') = 0,$$
$$\Gamma y_i + B x_i = v_i,$$
where $\Gamma\Pi + B = 0$ and $\Gamma u_i = v_i$. We are continuing to assume that $y_i$ is $M \times 1$, $x_i$ is $K \times 1$, $r_i' = (x_i', y_i')$ is i.i.d. according to a distribution with finite fourth moments $(i = 1, \ldots, N)$, and that $E(x_i x_i')$ is non-singular. There are restrictions on $\Gamma$ and $B$: $m(\Gamma, B) = 0$, where $m$ is a known function. Assume that the implied restrictions on $\Pi$ can be specified by the condition that $\pi = \mathrm{vec}(\Pi') = f(\delta)$, where the domain of $\delta$ is $\Gamma_\delta$, a subset of $R^s$ that includes the true value $\delta_0$ $(s \le MK)$. Assume that $\Gamma_\delta$ and $f$ satisfy Assumptions 1 and 2; these properties could be derived from regularity conditions on $m$, as in Malinvaud (1970, prop. 2, p. 670).
Choose $\hat\delta$ to
$$\min_{\delta \in \Gamma_\delta}\ [\hat\pi - f(\delta)]' \hat\Omega^{-1} [\hat\pi - f(\delta)],$$
where $\hat\Omega$ is given by eq. (2) and we assume that $\Omega$ in eq. (1) is positive definite. Let $F = \partial f(\delta_0)/\partial\delta'$. Then we have $\sqrt{N}(\hat\delta - \delta_0) \xrightarrow{d} N(0, \Lambda)$, where $\Lambda = (F'\Omega^{-1}F)^{-1}$. This generalizes Malinvaud's minimum distance estimator (p. 676); it reduces to his estimator if $u_i^p u_i^{p\prime}$ is uncorrelated with $x_i x_i'$, so that $\Omega = E(u_i^p u_i^{p\prime}) \otimes [E(x_i x_i')]^{-1}$ $(u_i^p = y_i - \Pi_0 x_i)$.

Now suppose that the only restrictions on $\Gamma$ and $B$ are that certain coefficients are zero, together with the normalization restrictions that the coefficient of $y_{im}$ in the $m$th structural equation is one. Then we can give an explicit formula for $\Lambda$. Write the $m$th structural equation as
$$y_{im} = \delta_m' z_{im} + v_{im},$$
where the components of $z_{im}$ are the variables in $y_i$ and $x_i$ that appear in the $m$th equation with unknown coefficients. Let there be $M$ structural equations and assume that the true value $\Gamma_0$ is non-singular. Let $\delta' = (\delta_1', \ldots, \delta_M')$ be $s \times 1$, and let $\Gamma(\delta)$ and $B(\delta)$ be parametric representations of $\Gamma$ and $B$ that satisfy the zero restrictions and the normalization rule. We can choose a compact set $\Gamma_\delta \subset R^s$ containing a neighborhood of the true value $\delta_0$, such that $\Gamma(\delta)$ is non-singular for $\delta \in \Gamma_\delta$. Then $\pi = f(\delta)$, where $f(\delta) = \mathrm{vec}\{-[\Gamma^{-1}(\delta)B(\delta)]'\}$. Assume that $f(\delta) = \pi_0$ implies that $\delta = \delta_0$, so that the structural parameters are identified. Then $\Gamma_\delta$ and $f$ satisfy Assumptions 1 and 2, and $\sqrt{N}(\hat\delta - \delta_0)$
$\xrightarrow{d} N(0, \Lambda)$. The formula for $\partial\pi/\partial\delta'$ is given in Rothenberg (1973, p. 69):
$$\partial\pi/\partial\delta' = -(\Gamma_0^{-1} \otimes I_K)(I_M \otimes \Phi_x^{-1})\Phi_{zx}',$$
where $\Phi_{zx}$ is block-diagonal: $\Phi_{zx} = \mathrm{diag}\{E(z_{i1}x_i'), \ldots, E(z_{iM}x_i')\}$, and $\Phi_x = E(x_ix_i')$. So we have
$$\Lambda = \{\Phi_{zx}[E(v_i^p v_i^{p\prime} \otimes x_i x_i')]^{-1}\Phi_{zx}'\}^{-1},$$
where $v_i^p = \Gamma_0 y_i + B_0 x_i$. If $v_i^p v_i^{p\prime}$ is uncorrelated with $x_i x_i'$, then this reduces to
$$\Lambda = \{\Phi_{zx}[E(v_i^p v_i^{p\prime}) \otimes \Phi_x]^{-1}\Phi_{zx}'\}^{-1},$$
which is the conventional asymptotic covariance matrix for three-stage least squares [Zellner and Theil (1962)].
I shall present a generalization of three-stage least squares that has the same limiting distribution as the generalized minimum distance estimator. Let $\beta = \mathrm{vec}(B')$ and note that $\pi = -(\Gamma^{-1} \otimes I_K)\beta$. Then we have
$$[\hat\pi + (\Gamma^{-1} \otimes I_K)\beta]'\,\Omega^{-1}\,[\hat\pi + (\Gamma^{-1} \otimes I_K)\beta] = [(\Gamma \otimes I_K)\hat\pi + \beta]'\,\Theta^{-1}\,[(\Gamma \otimes I_K)\hat\pi + \beta],$$
where
$$\Theta = (I_M \otimes \Phi_x^{-1})\,E(v_i^p v_i^{p\prime} \otimes x_i x_i')\,(I_M \otimes \Phi_x^{-1}).$$
Let $S_{zx}$ be the following block-diagonal matrix:
$$S_{zx} = \mathrm{diag}\left\{\frac{1}{N}\sum_{i=1}^N z_{i1}x_i',\ \ldots,\ \frac{1}{N}\sum_{i=1}^N z_{iM}x_i'\right\},$$
and let
$$\hat v_i = \hat\Gamma y_i + \hat B x_i,$$
where $\hat\Gamma \xrightarrow{a.s.} \Gamma_0$ and $\hat B \xrightarrow{a.s.} B_0$.
Now replace $\Theta$ by
$$\hat\Theta = (I_M \otimes S_x^{-1})\left[\frac{1}{N}\sum_{i=1}^N \hat v_i \hat v_i' \otimes x_i x_i'\right](I_M \otimes S_x^{-1}),$$
where $S_x = N^{-1}\sum_i x_ix_i'$, and note that
$$(I_M \otimes S_x)[(\Gamma \otimes I_K)\hat\pi + \beta] = s_{xy} - S_{zx}'\delta,$$
where $s_{xy}$ stacks the vectors $N^{-1}\sum_i x_iy_{im}$ $(m = 1, \ldots, M)$. Then we have the following distance function:
$$[s_{xy} - S_{zx}'\delta]'\,\hat\Theta^{-1}\,[s_{xy} - S_{zx}'\delta].$$
This corresponds to Basmann's (1965) interpretation of three-stage least squares.^{13}

Minimizing with respect to $\delta$ gives
$$\hat\delta_{G3} = (S_{zx}\hat\Theta^{-1}S_{zx}')^{-1}(S_{zx}\hat\Theta^{-1}s_{xy}).$$
The limiting distribution of this estimator is derived in appendix B (Proposition 5). We record it as:

Proposition 10. $\sqrt{N}(\hat\delta_{G3} - \delta_0) \xrightarrow{d} N(0, \Lambda)$, where $\Lambda = \{\Phi_{zx}[E(v_i^p v_i^{p\prime} \otimes x_i x_i')]^{-1}\Phi_{zx}'\}^{-1}$. This generalized three-stage least squares estimator is asymptotically efficient within the class of minimum distance estimators.
Finally, we shall consider the generalization of two-stage least squares. Suppose that
$$y_{i1} = \delta_1' z_{i1} + u_{i1},$$
where $E(x_i u_{i1}) = 0$, $z_{i1}$ is $s_1 \times 1$, and $\mathrm{rank}[E(x_i z_{i1}')] = s_1$. We complete the system by setting
$$y_{im} = \pi_m' x_i + u_{im},$$
where $E(x_i u_{im}) = 0$ $(m = 2, \ldots, M)$. So $z_{im} = x_i$ $(m = 2, \ldots, M)$. Let $\delta' = (\delta_1', \pi_2', \ldots, \pi_M')$ and apply the minimum distance procedure to obtain $\hat\delta$; since we are ignoring any restrictions on $\pi_m$ $(m = 2, \ldots, M)$, $\hat\delta_1$ is a limited information minimum distance estimator.

^{13}See Rothenberg (1973, p. 82). A more general derivation of this distance function can be obtained by following Hansen (1982). Also see White (1982).
We have $\sqrt{N}(\hat\delta_1 - \delta_1^0) \xrightarrow{d} N(0, \Lambda_{11})$, and evaluating the partitioned inverse gives
$$\Lambda_{11} = \{E(z_{i1}x_i')\,[E(u_{i1}^2\, x_i x_i')]^{-1}\,E(x_i z_{i1}')\}^{-1}, \tag{4}$$
where $u_{i1} = y_{i1} - \delta_1^{0\prime} z_{i1}$.

We can obtain the same limiting distribution by using the following generalization of two-stage least squares: let
$$S_{z_1x} = \frac{1}{N}\sum_{i=1}^N z_{i1}x_i', \qquad s_{xy_1} = \frac{1}{N}\sum_{i=1}^N x_i y_{i1}, \qquad \tilde P = \frac{1}{N}\sum_{i=1}^N \tilde u_{i1}^2\, x_i x_i',$$
where $\tilde u_{i1} = y_{i1} - \tilde\delta_1' z_{i1}$ and $\tilde\delta_1 \xrightarrow{a.s.} \delta_1^0$ (for example, $\tilde\delta_1$ could be an instrumental variable estimator); then
$$\hat\delta_{1,G2} = (S_{z_1x}\tilde P^{-1}S_{z_1x}')^{-1}(S_{z_1x}\tilde P^{-1}s_{xy_1}).$$
This is the estimator of $\delta_1$ that we obtain by applying generalized three-stage least squares to the completed system, with no restrictions on $\pi_m$ $(m = 2, \ldots, M)$. The limiting distribution of this estimator is derived in appendix B (Proposition 5):

Proposition 11. $\sqrt{N}(\hat\delta_{1,G2} - \delta_1^0) \xrightarrow{d} N(0, \Lambda_{11})$, where $\Lambda_{11}$ is given in eq. (4). This generalized two-stage least squares estimator is asymptotically efficient in the class of limited information minimum distance estimators.
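The generalized two-stage least squares formula can be sketched numerically. This is my own illustration (NumPy assumed; the design and variable names are hypothetical): a preliminary instrumental variable estimate supplies the residuals used to form the weight matrix.

```python
import numpy as np

def generalized_2sls(y1, Z1, X, delta_prelim):
    """delta-hat = (S_zx P~^-1 S_zx')^-1 S_zx P~^-1 s_xy1, with
    P~ = (1/N) sum of u~_i1^2 x_i x_i' at a preliminary estimate."""
    n = len(y1)
    u = y1 - Z1 @ delta_prelim
    P = (X.T * u**2) @ X / n             # heteroskedasticity-consistent weight
    Szx = Z1.T @ X / n                   # s1 x K sample cross moments
    sxy = X.T @ y1 / n
    Pinv = np.linalg.inv(P)
    return np.linalg.solve(Szx @ Pinv @ Szx.T, Szx @ Pinv @ sxy)

rng = np.random.default_rng(3)
n, K = 5000, 3
X = rng.normal(size=(n, K))                            # instruments
z = X @ np.array([1.0, 1.0, 0.5]) + rng.normal(size=n)
y1 = 1.5 * z + (1.0 + X[:, 0]**2) * rng.normal(size=n)
Z1 = z[:, None]                                        # s1 = 1 right-hand variable
delta_prelim = np.array([np.sum(y1 * X[:, 0]) / np.sum(z * X[:, 0])])
delta_g2 = generalized_2sls(y1, Z1, X, delta_prelim)
```

Under homoskedastic errors this collapses to conventional two-stage least squares; with the heteroskedasticity built in here, the weighting differs.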
3.4. Asymptotic efficiency: A comparison with the quasi-maximum likelihood estimator

Assume that $r_i$ is i.i.d. $(i = 1, \ldots, N)$ from a distribution with $E(r_i) = \mu$, $V(r_i) = \Sigma$, where $\Sigma$ is a $J \times J$ positive definite matrix; the fourth moments are finite. Suppose that we wish to estimate functions of $\Sigma$ subject to restrictions. Let $\sigma = \mathrm{vec}(\Sigma)$ and express the restrictions by the condition that $\sigma = g(\theta)$, where $g$ is a function into $R^q$ with a domain $\Gamma \subset R^p$ that contains the true value $\theta_0$ $(q = J^2;\ p \le J(J+1)/2)$. Let
$$S = \frac{1}{N}\sum_{i=1}^N (r_i - \bar r)(r_i - \bar r)'.$$
If the distribution of $r_i$ is multivariate normal, then the log-likelihood function is, apart from a constant,
$$L(\theta) = -\frac{N}{2}\left\{\ln\det\Sigma(\theta) + \mathrm{tr}[\Sigma^{-1}(\theta)S]\right\}.$$
If there are no restrictions on $\mu$, then the maximum likelihood estimator of $\theta$ is a solution to the following problem: choose $\hat\theta$ to
$$\min_\theta\ \ln\det\Sigma(\theta) + \mathrm{tr}[\Sigma^{-1}(\theta)S].$$
We shall derive the properties of this estimator when the distribution of $r_i$ is not necessarily normal; in that case we shall refer to the estimator as a quasi-maximum likelihood estimator $(\hat\theta_{QML})$.^{14}

MaCurdy (1979) considered a version of this problem and showed that, under suitable regularity conditions, $\sqrt{N}(\hat\theta_{QML} - \theta_0)$ has a limiting normal distribution; the covariance matrix, however, is not given by the standard information matrix formula. We would like to compare this distribution with the distribution of the minimum distance estimator.
This comparison can be readily made by using Theorem 1 in Ferguson (1958). In our notation, Ferguson considers the following problem: choose $\hat\theta$ to solve
$$W(S, \theta)[s - g(\theta)] = 0,$$
where $s = \mathrm{vec}(S)$. He derives the limiting distribution of $\sqrt{N}(\hat\theta - \theta_0)$ under regularity conditions on the functions $W$ and $g$. These regularity conditions are particularly simple in our problem, since $W$ does not depend on $S$. We can state them as follows:

Assumption 3. $\Theta_0 \subset R^p$ is an open set containing $\theta_0$; $g$ is a continuous, one-to-one mapping of $\Theta_0$ into $R^q$ with a continuous inverse; $g$ has continuous second partial derivatives in $\Theta_0$; $\mathrm{rank}[\partial g(\theta)/\partial\theta'] = p$ for $\theta \in \Theta_0$; $\Sigma(\theta)$ is non-singular for $\theta \in \Theta_0$.

In addition, we shall need $S \xrightarrow{a.s.} g(\theta_0)$ and the central limit theorem result that $\sqrt{N}[s - g(\theta_0)] \xrightarrow{d} N(0, \Delta)$, where $\Delta = V[(r_i - \mu) \otimes (r_i - \mu)]$.

Then Ferguson's theorem implies that the likelihood equations almost surely have a unique solution within $\Theta_0$ for sufficiently large N, and

^{14}The "quasi-maximum likelihood" terminology was used by the Cowles Commission; see Malinvaud (1970, p. 678).
$\sqrt{N}(\hat\theta_{QML} - \theta_0) \xrightarrow{d} N(0, \Lambda)$, where
$$\Lambda = (G'\Psi G)^{-1}G'\Psi\Delta\Psi G(G'\Psi G)^{-1},$$
$G = \partial g(\theta_0)/\partial\theta'$, and $\Psi = (\Sigma_0 \otimes \Sigma_0)^{-1}$. It will be convenient to rewrite this, imposing the symmetry restrictions on $\Sigma$. Let $\sigma^*$ be the $J(J+1)/2 \times 1$ vector formed by stacking the columns of the lower triangle of $\Sigma$. We can define a $J^2 \times [J(J+1)/2]$ matrix $T$ such that $\sigma = T\sigma^*$. The elements in each row of $T$ are all 0 except for a single element which is one; $T$ has full column rank. Let
$$s = Ts^*, \qquad g(\theta) = Tg^*(\theta), \qquad G^* = \partial g^*(\theta_0)/\partial\theta', \qquad \Psi^* = T'\Psi T;$$
then $\sqrt{N}[s^* - g^*(\theta_0)] \xrightarrow{d} N(0, \Delta^*)$, where $\Delta^*$ is the covariance matrix of the vector formed from the columns of the lower triangle of $(r_i - \mu)(r_i - \mu)'$. Now we can set
$$\Lambda = (G^{*\prime}\Psi^*G^*)^{-1}(G^{*\prime}\Psi^*\Delta^*\Psi^*G^*)(G^{*\prime}\Psi^*G^*)^{-1}.$$
Consider the following minimum distance estimator: choose $\hat\theta_{MD}$ to
$$\min_{\theta \in \Gamma}\ [s^* - g^*(\theta)]' A_N [s^* - g^*(\theta)],$$
where $\Gamma$ is a compact subset of $\Theta_0$ that contains a neighborhood of $\theta_0$ and $A_N \xrightarrow{a.s.} \Psi^*$. Then the following result is implied by Proposition 7.

Proposition 12. If Assumption 3 is satisfied, then $\sqrt{N}(\hat\theta_{QML} - \theta_0)$ has the same limiting distribution as $\sqrt{N}(\hat\theta_{MD} - \theta_0)$.

If $\Delta^*$ is non-singular, an optimal minimum distance estimator has $A_N \xrightarrow{a.s.} \zeta\Delta^{*-1}$, where $\zeta$ is an arbitrary positive real number. If the distribution of $r_i$ is normal, then $\Delta^{*-1} = \frac{1}{2}\Psi^*$; but in general $\Delta^{*-1}$ is not proportional to $\Psi^*$, since $\Delta^*$ depends on fourth moments and $\Psi^*$ is a function of second moments. So in general $\hat\theta_{QML}$ is less efficient than the optimal minimum distance estimator that uses
$$A_N = \left[\frac{1}{N}\sum_{i=1}^N (s_i^* - \bar s^*)(s_i^* - \bar s^*)'\right]^{-1}, \tag{5}$$
where $s_i^*$ is the vector formed from the lower triangle of $(r_i - \bar r)(r_i - \bar r)'$.

More generally, we can consider the class of consistent estimators that are continuously differentiable functions of $s^*$: $\hat\theta = h(s^*)$. Chiang (1956) shows that the minimum distance estimator based on $\Delta^{*-1}$ has the minimal asymptotic covariance matrix within this class. The minimum distance estimator based on $A_N$ in (5) attains this lower bound.
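Here is a sketch of the optimal minimum distance estimator for a covariance structure, using the weight matrix in eq. (5). It is my own illustration (NumPy assumed), for the simple restriction $\Sigma = \theta I$ with heavy-tailed, non-normal data, where the estimator and the quasi-maximum likelihood estimator differ.

```python
import numpy as np

def lower_half(S):
    """Stack the lower triangle (including the diagonal) of a symmetric matrix."""
    return S[np.tril_indices(S.shape[0])]

rng = np.random.default_rng(4)
N, J = 4000, 3
# heavy-tailed data with V(r_i) = 2 * I  (t(5) scaled so the variance is 2)
r = rng.standard_t(5, size=(N, J)) * np.sqrt(1.2)
d = r - r.mean(axis=0)
s_i = np.array([lower_half(np.outer(di, di)) for di in d])
s_bar = s_i.mean(axis=0)
A_N = np.linalg.inv(np.cov(s_i.T))       # fourth-moment weight, eq. (5)
h = lower_half(np.eye(J))                # restriction: sigma* = theta * h
# g*(theta) = theta * h is linear, so the minimizer has a closed form
theta_hat = (h @ A_N @ s_bar) / (h @ A_N @ h)
```

With normal data the weight would be proportional to $\Psi^*$ and nothing would be gained; with the $t(5)$ data above, the fourth-moment weight downweights the noisier sample moments.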
4. An empirical example
We shall present an empirical example that illustrates some of the
preceding results. The data come from the panel of Young Men in the
National Longitudinal Survey (Parnes). The sample consists of 1454 young
men who were not enrolled in school in 1969, 1970, or 1971, and who had
complete data on the variables listed in table 1. Table 2a presents an
unrestricted least squares regression of the logarithm of wage in 1969 on the
union, SMSA, and region variables for all three years. The regression also
includes a constant, schooling, experience, experience squared, and race. This
regression is repeated using the 1970 wage and the 1971 wage.
Table 1
Characteristics of National Longitudinal Survey Young Men, not enrolled in school in 1969, 1970, 1971; N = 1454.
Variable    Mean    Standard deviation
LW1         5.64    0.423
LW2         5.74    0.426
LW3         5.82    0.437
U1          0.336
U2          0.362
U3          0.364
U1U2        0.270
U1U3        0.262
U2U3        0.303
U1U2U3      0.243
SMSA1       0.697
SMSA2       0.627
SMSA3       0.622
RNS1        0.409
RNS2        0.404
RNS3        0.410
S           11.7    2.64
EXP69       5.11    3.71
EXP69^2     39.8    46.6
RACE        0.264
LW1, LW2, LW3: logarithm of hourly earnings (in cents) on the current or last job in 1969, 1970, 1971. U1, U2, U3: 1 if wages on current or last job set by collective bargaining, 0 if not, in 1969, 1970, 1971. SMSA1, SMSA2, SMSA3: 1 if respondent in SMSA, 0 if not, in 1969, 1970, 1971. RNS1, RNS2, RNS3: 1 if respondent in South, 0 if not, in 1969, 1970, 1971. S: years of schooling completed. EXP69: (age in 1969 - S - 6). RACE: 1 if respondent black, 0 if not.
[Tables 2a and 2b: unrestricted least squares regressions of LW1, LW2, LW3 on the leads and lags of the union, SMSA, and region variables (table 2b adds a complete set of union interactions). The coefficient estimates and standard errors are not recoverable from this copy.]
In section 2 we discussed the implications of a random intercept (c) and a random slope (b). If the leads and lags are due just to c, then the submatrices of $\Pi$ corresponding to the union, SMSA, or region coefficients should have the form $\beta I + \iota\lambda'$. Consider, for example, the $3 \times 3$ submatrix of union coefficients: the off-diagonal elements in each column should be equal to each other. So we compare 0.048 to 0.046, 0.042 to 0.041, and -0.009 to 0.010; not bad.
In table 2b we add a complete set of union interactions, so that, for the union variables at least, we have a general regression function. Now the submatrix of union coefficients is $3 \times 7$. If it equals $\beta(I_3, 0) + \iota\lambda'$, then in the first three columns the off-diagonal elements within a column should be equal; in the last four columns all elements within a column should be equal.

I first imposed the restrictions on the SMSA and region coefficients, using the minimum distance estimator. $\Omega$ is estimated using the formula in eq. (2), section 3.1, and $A_N = \hat\Omega^{-1}$. The minimum distance statistic (Proposition 8) is 6.82, which is not a surprising value from a $\chi^2(10)$ distribution. If we impose the restrictions on the union coefficients as well, then the 21 coefficients in table 2b are replaced by 8: one $\beta$ and seven $\lambda$'s. This gives an increase in the minimum distance statistic (Proposition 8', appendix B) of $19.36 - 6.82 = 12.54$, which is not a surprising value from a $\chi^2(13)$ distribution. So there is no evidence here against the hypothesis that all the lags and leads are generated by c.
Consider a transformation of the model in which the dependent variables are LW1, LW2 - LW1, and LW3 - LW2. Start with a multivariate regression on all of the lags and leads (and union interactions); then impose the restriction that U, SMSA, and RNS appear in the LW2 - LW1 and LW3 - LW2 equations only as contemporaneous changes $(E(y_t - y_{t-1} \mid x_1, x_2, x_3) = \beta(x_t - x_{t-1}))$. This is equivalent to the restriction that c generates all of the lags and leads, and we have seen that it is supported by the data. I also considered imposing all of the restrictions with the single exception of allowing separate coefficients for entering and leaving union coverage in the wage change equations. The estimates (standard errors) are 0.097 (0.019) and -0.119 (0.022). The standard error on the sum of the coefficients is 0.024, so again there is no evidence against the simple model with $E(y_t \mid x_1, x_2, x_3, c) = \beta x_t + c$.^{15}
However, since the $x_t$'s are binary variables, condition (R) in Proposition 1

^{15}Using May-May CPS matches for 1977-1978, Mellow (1981) reports coefficients (standard errors) of 0.087 (0.018) and -0.069 (0.020) for entering and leaving union membership in a wage change regression. The sample consists of 6,602 males employed as non-agricultural wage and salary workers in both years. He also reports results for 2,177 males and females whose age was at most 25. Here the coefficients on entering and leaving union membership are quite different: 0.198 (0.031) and -0.035 (0.041); it would be useful to reconcile these numbers with our results for young men. Also see Stafford and Duncan (1980).
does not hold. For example, the union coefficients provide some evidence that $E(b \mid x_1, x_2, x_3)$ is constant for the individuals who experience a change in union coverage [i.e., $E(b \mid x_1, x_2, x_3) = \bar\beta$ if $x_1 + x_2 + x_3 \neq 0$ or 3]; but there is no direct evidence on $E(b \mid x_1, x_2, x_3)$ for the people who are always covered or never covered. Furthermore, our alternative hypothesis has no structure. It might be fruitful, for example, to examine the changes in union coverage jointly with changes in employer.
Table 3a exhibits the estimates that result from imposing the restrictions using the optimal minimum distance estimator.^{16} We also give the conventional generalized least squares estimates. They are minimum distance estimates in which the weighting matrix $(A_N)$ is the inverse of
$$\left(\frac{1}{N}\sum_{i=1}^N \hat u_i \hat u_i'\right) \otimes S_x^{-1},$$
the form that $\hat\Omega$ takes under homoskedastic linear regression. We give the conventional standard errors based on $(F'\hat\Omega^{-1}F)^{-1}$ and the standard errors calculated according to Proposition 7, which do not require an assumption of homoskedastic linear regression. These standard errors are larger than the conventional ones, by about 30%. The estimated gain in efficiency from using the appropriate metric is not very large; the standard errors calculated according to Proposition 7 are about 10% larger when we use conventional GLS instead of the optimal minimum distance estimator.

Table 3a also presents the estimated $\lambda$'s. Consider, for example, an individual who was covered by collective bargaining in 1969. The linear predictor of c increases by 0.089 if he is also covered in 1970, and it increases by an additional 0.036 if he is covered in all three years. The predicted c for someone who is always covered is higher by 0.102 than for someone who is never covered.
Table 3b presents estimates under the constraint that $\lambda = 0$. The increment in the distance statistic is $89.08 - 19.36 = 69.72$, which is a surprisingly large value to come from a $\chi^2(13)$ distribution. If we constrain only the union $\lambda$'s to be zero, then the increment is $57.06 - 19.36 = 37.7$, which is surprisingly large coming from a $\chi^2(7)$ distribution. So there is strong evidence for heterogeneity bias.
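As a quick sanity check on the quoted statistics (assuming SciPy's chi-square survival function; the numbers are copied from the text above):

```python
from scipy.stats import chi2

# restrictions on SMSA and region coefficients: 6.82 on 10 d.o.f.
p_smsa_region = chi2.sf(6.82, 10)
# additional union restrictions: 19.36 - 6.82 = 12.54 on 13 d.o.f.
p_union = chi2.sf(19.36 - 6.82, 13)
# lambda = 0: increment 89.08 - 19.36 = 69.72 on 13 d.o.f.
p_lambda_zero = chi2.sf(89.08 - 19.36, 13)
```

The first two tail probabilities are comfortably large ("not surprising"), while the last is vanishingly small, matching the conclusions drawn in the text.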
The union coefficient declines from 0.157 to 0.107 when we relax the $\lambda = 0$ restriction. The least squares estimates for the separate cross-sections, with

^{16}We did not find much evidence for nonstationarity in the slope coefficients. If we allow the union $\beta$ to vary over the three years, we get 0.105, 0.103, 0.114. The distance statistic declines to 18.51, giving $19.36 - 18.51 = 0.85$; this is not a surprising value from a $\chi^2(2)$ distribution. If we also free up $\beta$ for SMSA and RNS, then the decline in the distance statistic is $18.51 - 13.44 = 5.07$, which is not a surprising value from a $\chi^2(4)$ distribution.
[Table 3a: restricted estimates from the optimal minimum distance estimator and from conventional GLS, with both the conventional standard errors and the standard errors based on Proposition 7, together with the estimated λ's. The individual entries are not recoverable from this copy.]
Table 3b
Restricted estimates under the constraint that λ = 0.

Coefficients (and standard errors) of:

        U          SMSA       RNS
δ̂      0.157      0.120      -0.150
       (0.012)    (0.013)    (0.016)

χ²(36) = 89.08

See footnote to table 3a.
no leads or lags, give union coefficients of 0.195, 0.189, and 0.191 in 1969, 1970, and 1971.^{17} So the decline in the union coefficient, when we allow for heterogeneity bias, is 32% or 44%, depending on which biased estimate (0.16 or 0.19) one uses. The SMSA and region coefficients also decline in absolute value. The least squares estimates for the separate cross-sections give an average SMSA coefficient of 0.147 and an average region coefficient of -0.131. So the decline in the SMSA coefficient is either 53% or 62%, and the decline in absolute value of the region coefficient is either 45% or 37%.
5. Conclusion
We have examined the relationship between heterogeneity bias and strict
exogeneity in distributed lag regressions of y on x. The relationship is very
strong when x is continuous, weaker when x is discrete, and non-existent as
the order of the distributed lag becomes infinite.
The individual specific random variables introduce nonlinearity and
heteroskedasticity. So we have provided an appropriate framework for the
estimation of multivariate linear predictors. We showed that the optimal
minimum distance estimator is more efficient, in general, than the
conventional estimators such as quasi-maximum likelihood. We provided
computationally simple generalizations of two- and three-stage least squares
that achieve this efficiency gain.
^{17}Using the NLS Young Men in 1969 (N = 1362), Griliches (1976) reports a union membership coefficient of 0.203. Using the NLS Young Men in a pooled regression for 1966-1971 and 1973 (N = 470), Brown (1980) reports a coefficient of 0.130 on a variable measuring the probability of union coverage. (The union coverage question was asked only in 1969, 1970, and 1971; so this variable is imputed for the other four years.) The coefficient declines to 0.081 when individual intercepts are included in the regression. His regressions also include a large number of occupation- and industry-specific job characteristics.
Some of these ideas were illustrated using the sample of Young Men in the
National Longitudinal Survey. We examined regressions of wages on the
leads and lags in union coverage, SMSA, and region. The results indicate
that the leads and lags could have been generated just by a random
intercept. This gives some support for analysis of covariance type estimates;
these estimates indicate a substantial heterogeneity bias in the union, SMSA,
and region coefficients.
Appendix A
Let $\Omega$ be a set of points where $\omega \in \Omega$ is a doubly infinite sequence of vectors of real numbers: $\omega = \{\ldots, \omega_{-1}, \omega_0, \omega_1, \ldots\} = \{\omega_t, t \in I\}$, where $\omega_t \in R^q$ and $I$ is the set of all integers. Let $z_t(\omega) = \omega_t$ be the $t$th coordinate function. Let $\mathcal{F}$ be the $\sigma$-field generated by sets of the form
$$A = \{\omega: z_t(\omega) \in B_0, \ldots, z_{t+k}(\omega) \in B_k\},$$
where $t, k \in I$ and the $B$'s are $q$-dimensional Borel sets. Let $P$ be a probability measure defined on $\mathcal{F}$ such that $\{z_t, t \in I\}$ is a (strictly) stationary stochastic process on the probability space $(\Omega, \mathcal{F}, P)$.

The shift transformation $S$ is defined by $z_t(S\omega) = z_{t+1}(\omega)$. It is an invertible, measure-preserving transformation. A random variable $d$ defined on $(\Omega, \mathcal{F}, P)$ is invariant if $d(S\omega) = d(\omega)$ except on a set with probability measure zero (almost surely, or a.s.). A set $A \in \mathcal{F}$ is invariant if its indicator function is an invariant random variable.

We shall use $E(d \mid \mathcal{G})_\omega$ to denote the conditional expectation of the random variable $d$ with respect to the $\sigma$-field $\mathcal{G}$, evaluated at $\omega$. Let $x_t$ be a component of $z_t$, let $\sigma(x)$ denote the $\sigma$-field generated by $\{\ldots, x_{-1}, x_0, x_1, \ldots\}$, and let $E(d \mid x_t, x_{t-1}, \ldots)$ denote the expectation of $d$ conditional on the $\sigma$-field generated by $x_t, x_{t-1}, \ldots$.
Proposition 3. If $d$ is an invariant random variable with $E(|d|) < \infty$, then
$$E(d \mid \sigma(x)) = E(d \mid x_t, x_{t-1}, \ldots) \quad a.s.,$$
where $t$ is any integer.

Proof. First we shall show that $E(d \mid \sigma(x))$ is an invariant random variable. Let $f(\omega) = d(S\omega)$. A change of variable argument shows that
$$E(d \mid \sigma(x))_{S\omega} = E(f \mid S^{-1}\sigma(x))_\omega \quad a.s.$$
[See Billingsley (1965, example 10.3, p. 109).] Since $d$ is an invariant
random variable, we have $d(S\omega) = d(\omega)$ a.s.; also $S^{-1}\sigma(x) = \sigma(x)$. Hence
$$E(d \mid \sigma(x))_{S\omega} = E(d \mid \sigma(x))_\omega \quad a.s.$$
Let $\sigma(x_t, x_{t-1}, \ldots)$ denote the $\sigma$-field generated by $(x_t, x_{t-1}, \ldots)$, and let
$$\mathcal{T} = \bigcap_{t=-\infty}^{\infty} \sigma(x_t, x_{t-1}, \ldots)$$
be the left tail $\sigma$-field generated by the $x$ process. Since $E(d \mid \sigma(x))$ is an invariant random variable, there is a version of $E(d \mid \sigma(x))$ that is measurable $\mathcal{T}$. [See Rozanov (1967, lemma 6.1, p. 162).] Hence $E(d \mid \sigma(x)) = E(d \mid \mathcal{T})$ a.s., and so
$$E(d \mid \sigma(x)) = E(d \mid \sigma(x_t, x_{t-1}, \ldots)) \quad a.s. \qquad Q.E.D.$$
Let $d$ be an invariant random variable and assume that $E(d^2) < \infty$, $E(x_t^2) < \infty$. Consider the Hilbert space of random variables generated by the linear manifold spanned by the variables $\{d, \ldots, x_{-1}, x_0, x_1, \ldots\}$, closed with respect to convergence in mean square. We also include a constant (1) in the space. The inner product is $(a, b) = E(ab)$. Then the linear predictor $E^*(d \mid \ldots, x_{-1}, x_0, x_1, \ldots)$ is defined as the projection of $d$ on the closed linear subspace generated by $\{1, \ldots, x_{-1}, x_0, x_1, \ldots\}$.
Proposition 4. If $d$ is an invariant random variable and $E(d^2) < \infty$, $E(x_t^2) < \infty$, then
$$E^*(d \mid \ldots, x_{-1}, x_0, x_1, \ldots) = \psi + \lambda\bar x,$$
where $\bar x$ is the limit in mean square of $\sum_{j=1}^J x_{t-j}/J$ as $J \to \infty$, $t$ is any integer, and
$$\lambda = \mathrm{cov}(d, \bar x)/V(\bar x) \quad \text{if } V(\bar x) > 0, \qquad \lambda = 0 \quad \text{if } V(\bar x) = 0, \qquad \psi = E(d) - \lambda E(\bar x).$$
Proof. The existence of the limit is implied by the mean ergodic theorem [Billingsley (1965, theorem 2.1, p. 21)]. Since $d$ is an invariant random variable, we have $\mathrm{cov}(d, x_t) = \mathrm{cov}(d, x_1)$ for all $t$. Let $\bar x_J = \sum_{j=1}^J x_{t-j}/J$. Then
$$\mathrm{cov}(d, \bar x_J) = \mathrm{cov}(d, x_1),$$
and so $\mathrm{cov}(d, \bar x) = \lim_{J\to\infty} \mathrm{cov}(d, \bar x_J) = \mathrm{cov}(d, x_1)$. Since $\bar x$ is an invariant random variable, we have $\mathrm{cov}(\bar x, x_t) = \mathrm{cov}(\bar x, x_1)$, and so
$$V(\bar x) = \lim_{J\to\infty} \mathrm{cov}(\bar x, \bar x_J) = \mathrm{cov}(\bar x, x_1).$$
Hence
$$\mathrm{cov}(d - \psi - \lambda\bar x,\ x_t) = \mathrm{cov}(d, x_1) - \lambda\,\mathrm{cov}(\bar x, x_1) = \mathrm{cov}(d, \bar x) - \lambda V(\bar x) = 0, \qquad t \in I.$$
Since we also have $E(d - \psi - \lambda\bar x) = 0$, the proof is complete. Q.E.D.
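Proposition 4 can be checked by simulation. This is my own sketch (NumPy assumed): for $x_t = c + \varepsilon_t$ with $d = c$ invariant, the time average $\bar x$ recovers $c$, and the projection coefficient $\lambda = \mathrm{cov}(d, \bar x)/V(\bar x)$ approaches $V(c)/V(c) = 1$ as the number of periods grows.

```python
import numpy as np

rng = np.random.default_rng(5)
n_draws, T = 20000, 200
c = rng.normal(size=(n_draws, 1))            # invariant random variable d = c
x = c + rng.normal(size=(n_draws, T))        # stationary process x_t = c + noise
x_bar = x.mean(axis=1)                       # finite-T analogue of the ergodic mean
lam = np.cov(c[:, 0], x_bar)[0, 1] / x_bar.var()
psi = c.mean() - lam * x_bar.mean()
pred = psi + lam * x_bar                     # E*(d | ..., x_{-1}, x_0, x_1, ...)
```

With T = 200 periods, $V(\bar x) \approx V(c) + 1/T$, so the estimated $\lambda$ is just below one.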
Appendix B
Let r: = (x;,yi ), i = 1,. . ., N, where xi = (xii,. . ., xix) and yi = (y,r, . . ., yiM). Write
the mth structural equation as
Yim = S:, i r n U im ,
m=l,...,M,
where the components of zi, are the variables in yi and xi that appear in the
mth equation with unknown coefficients. Let S,, be the following block-
diagonal matrix:
and
Let 0: = (I$~, . ., &), where u& = yim - 6,ozi, and ~5: is the true value of 6,; let
Gz, = E&J Let 6 =(S;, . . ., 6b) be s x 1, and set
s^=
S,,
D -
Sz,) -
(S,,
D
s,,,).
Proposition 5. Assume that (1) $r_i$ is i.i.d. according to some distribution with finite fourth moments; (2) $E[x_i(y_{im} - \delta_m^{0\prime} z_{im})] = 0$ $(m = 1, \ldots, M)$; (3) $\mathrm{rank}(\Phi_{zx}) = s$; and (4) $D \xrightarrow{a.s.} \Psi$ as $N \to \infty$, where $\Psi$ is a positive definite matrix. Then $\sqrt{N}(\hat\delta - \delta^0) \xrightarrow{d} N(0, \Lambda)$, where
$$\Lambda = (\Phi_{zx}\Psi^{-1}\Phi_{zx}')^{-1}\Phi_{zx}\Psi^{-1}E(u_iu_i' \otimes x_ix_i')\Psi^{-1}\Phi_{zx}'(\Phi_{zx}\Psi^{-1}\Phi_{zx}')^{-1}.$$

Proof. $\sqrt{N}(\hat\delta - \delta^0) = (S_{zx}D^{-1}S_{zx}')^{-1}S_{zx}D^{-1}N^{-1/2}\sum_{i=1}^N (u_i \otimes x_i)$. By the strong law of large numbers, $S_{zx} \xrightarrow{a.s.} \Phi_{zx}$; $\Phi_{zx}\Psi^{-1}\Phi_{zx}'$ is an $s \times s$ positive definite matrix since $\mathrm{rank}(\Phi_{zx}) = s$. So we obtain the same limiting distribution by considering
$$(\Phi_{zx}\Psi^{-1}\Phi_{zx}')^{-1}\Phi_{zx}\Psi^{-1}N^{-1/2}\sum_{i=1}^N (u_i \otimes x_i).$$
Note that $u_i \otimes x_i$ is i.i.d. with $E(u_i \otimes x_i) = 0$, $V(u_i \otimes x_i) = E(u_iu_i' \otimes x_ix_i')$. Then applying the central limit theorem gives $\sqrt{N}(\hat\delta - \delta^0) \xrightarrow{d} N(0, \Lambda)$. Q.E.D.
This result includes as special cases a number of the commonly used estimators. If $z_{im} = x_i$ $(m = 1, \ldots, M)$ and $D = I$, then $\hat\delta$ is the least squares estimator and $\Lambda$ reduces to the formula for $\Omega$ given in eq. (1) of section 3.1. If $\Psi = E(u_iu_i') \otimes E(x_ix_i')$, then $\Lambda$ is the asymptotic covariance matrix for the three-stage least squares estimator. If $\Psi = E(u_iu_i' \otimes x_ix_i')$, then $\Lambda$ is the asymptotic covariance matrix for the generalized three-stage least squares estimator [eq. (3), section 3.3]. If
$$\Psi = \mathrm{diag}\{E(u_{i1}^2)E(x_ix_i'),\ \ldots,\ E(u_{iM}^2)E(x_ix_i')\},$$
then we have the asymptotic covariance matrix for two-stage least squares. If
$$\Psi = \mathrm{diag}\{E(u_{i1}^2\, x_ix_i'),\ \ldots,\ E(u_{iM}^2\, x_ix_i')\},$$
we have the asymptotic covariance matrix for generalized two-stage least squares. [$\Lambda_{11}$ is given in eq. (4), section 3.3.]
Next we shall derive the properties of the minimum distance estimator. Let $D_N(\theta) = [a_N - g(\theta)]'A_N[a_N - g(\theta)]$ and choose $\hat\theta$ to
$$\min_{\theta \in \Gamma}\ D_N(\theta).$$
Assumptions 1 and 2 are stated in section 3.2.
Proposition 6. If Assumption 1 is satisfied, then $\hat\theta \xrightarrow{a.s.} \theta^0$.

Proof. Let $D^*(\theta) = [g(\theta^0) - g(\theta)]' \Psi [g(\theta^0) - g(\theta)]$. $D_N$ converges a.s. uniformly to $D^*$ on $T$. Let $B$ be a neighborhood of $\theta^0$ and set $\Gamma = T - B$. Then
$$\min_{\theta \in \Gamma} D_N(\theta) \xrightarrow{a.s.} \min_{\theta \in \Gamma} D^*(\theta) = \delta > 0.$$
Since $\delta > 0$ and $D_N(\hat\theta) \xrightarrow{a.s.} 0$, it must be that $\hat\theta \in B$ a.s. for $N$ sufficiently large. Since $B$ is an arbitrary neighborhood of $\theta^0$, we have shown that $\hat\theta \xrightarrow{a.s.} \theta^0$. Q.E.D.
Proposition 7. If Assumptions 1 and 2 are satisfied, then $\sqrt{N}(\hat\theta - \theta^0) \xrightarrow{d} N(0, \Lambda)$, where
$$\Lambda = (G' \Psi G)^{-1} G' \Psi \Delta \Psi G (G' \Psi G)^{-1}, \qquad G = \partial g(\theta^0)/\partial\theta'.$$
If $\Delta$ is positive definite, then $\Lambda - (G' \Delta^{-1} G)^{-1}$ is positive semi-definite; hence an optimal choice for $\Psi$ is $\Delta^{-1}$.
Proof. Let
$$s_N(\theta) = \partial D_N(\theta)/\partial\theta = -2\,[\partial g'(\theta)/\partial\theta]\, A_N [a_N - g(\theta)].$$
Since $\hat\theta \xrightarrow{a.s.} \theta^0$, for $N$ sufficiently large we a.s. have $\hat\theta$ interior to $T$ and $s_N(\hat\theta) = 0$. The mean value theorem implies that
$$s_N(\hat\theta) = s_N(\theta^0) + [\partial s_N(\theta^*)/\partial\theta'](\hat\theta - \theta^0) \quad \text{a.s.},$$
for sufficiently large $N$, where $\theta^*$ is on the line segment connecting $\hat\theta$ and $\theta^0$. [There is a different $\theta^*$ for each row of $\partial s_N(\theta^*)/\partial\theta'$; the measurability of $\theta^*$ follows from lemmas 2 and 3 of Jennrich (1969).] Since $\theta^* \xrightarrow{a.s.} \theta^0$, direct evaluation shows that
$$\partial s_N(\theta^*)/\partial\theta' \xrightarrow{a.s.} 2\, G' \Psi G,$$
which is non-singular. Hence
$$\sqrt{N}(\hat\theta - \theta^0) = -[\partial s_N(\theta^*)/\partial\theta']^{-1} \sqrt{N}\, s_N(\theta^0) \quad \text{a.s.},$$
for sufficiently large $N$. We obtain the same limiting distribution by considering
$$(G' \Psi G)^{-1} G' \Psi \sqrt{N}\,[a_N - g(\theta^0)].$$
Hence $\sqrt{N}(\hat\theta - \theta^0) \xrightarrow{d} N(0, \Lambda)$.

To find an optimal $\Psi$, note that there is a non-singular matrix $C$ such that $\Delta = CC'$. Let $\tilde G = C^{-1}G$ and $B = (G'\Psi G)^{-1} G'\Psi C$. Then we have
$$\Lambda - (G'\Delta^{-1}G)^{-1} = BB' - (\tilde G'\tilde G)^{-1} = [B - (\tilde G'\tilde G)^{-1}\tilde G'][B - (\tilde G'\tilde G)^{-1}\tilde G']',$$
which is positive semi-definite. Q.E.D.
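When $g$ is linear, $g(\theta) = G\theta$, the minimum distance problem has a closed form, and the optimality claim in Proposition 7 can be checked numerically. The sketch below (names and dimensions are illustrative assumptions) computes the sandwich covariance $(G'\Psi G)^{-1}G'\Psi\Delta\Psi G(G'\Psi G)^{-1}$ for an arbitrary weight and for the optimal weight $\Psi = \Delta^{-1}$.

```python
import numpy as np

def md_avar(G, Psi, Delta):
    """Asymptotic covariance of the minimum distance estimator with
    weight Psi: (G'Psi G)^-1 G'Psi Delta Psi G (G'Psi G)^-1."""
    GPG_inv = np.linalg.inv(G.T @ Psi @ G)
    return GPG_inv @ G.T @ Psi @ Delta @ Psi @ G @ GPG_inv

rng = np.random.default_rng(0)
q, p = 6, 3                          # hypothetical dimensions
G = rng.normal(size=(q, p))
B = rng.normal(size=(q, q))
Delta = B @ B.T + np.eye(q)          # a positive definite Delta

# Optimal weight Psi = Delta^-1 versus an arbitrary weight Psi = I.
optimal = md_avar(G, np.linalg.inv(Delta), Delta)
other = md_avar(G, np.eye(q), Delta)
```

With $\Psi = \Delta^{-1}$ the sandwich collapses to $(G'\Delta^{-1}G)^{-1}$, and for any other weight the difference of covariance matrices is positive semi-definite, as the proposition asserts.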
Proposition 8. If Assumptions 1 and 2 are satisfied, if $\Delta$ is positive definite, and if $A_N \xrightarrow{a.s.} \Delta^{-1}$, then
$$N[a_N - g(\hat\theta)]' A_N [a_N - g(\hat\theta)] \xrightarrow{d} \chi^2(q - p).$$
Proof. For sufficiently large $N$ we have
$$\sqrt{N}[g(\hat\theta) - g(\theta^0)] = G_N \sqrt{N}(\hat\theta - \theta^0) \quad \text{a.s.},$$
where $G_N \xrightarrow{a.s.} G$. From the proof of Proposition 7, we have
$$\sqrt{N}(\hat\theta - \theta^0) = R_N \sqrt{N}[a_N - g(\theta^0)] \quad \text{a.s.},$$
where $R_N \xrightarrow{a.s.} R = (G'\Delta^{-1}G)^{-1}G'\Delta^{-1}$. Hence
$$\sqrt{N}[a_N - g(\hat\theta)] = \sqrt{N}[a_N - g(\theta^0)] - \sqrt{N}[g(\hat\theta) - g(\theta^0)] \xrightarrow{d} QCu,$$
where $Q = I_q - GR$, $C$ is a non-singular matrix such that $CC' = \Delta$, and $u \sim N(0, I_q)$;
$$d_N = N[a_N - g(\hat\theta)]' A_N [a_N - g(\hat\theta)] \xrightarrow{d} u'C'Q'\Delta^{-1}QCu.$$
Let $\tilde G = C^{-1}G$ and $M_G = I_q - \tilde G(\tilde G'\tilde G)^{-1}\tilde G'$; then $M_G$ is a symmetric idempotent matrix with rank $q - p$ and
$$C'Q'\Delta^{-1}QC = M_G' M_G = M_G.$$
Hence $d_N \xrightarrow{d} u'M_G u \sim \chi^2(q - p)$. Q.E.D.
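The key identity in this proof, $d_N = u'M_G u$, holds exactly (not just in the limit) when $g$ is linear and $A_N = \Delta^{-1}$: with $a_N = G\theta^0 + Cu/\sqrt{N}$, the minimized distance equals the quadratic form in the residual projection. A numerical sketch under those assumptions (all names illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
q, p, N = 5, 2, 400
G = rng.normal(size=(q, p))
B = rng.normal(size=(q, q))
Delta = B @ B.T + np.eye(q)
C = np.linalg.cholesky(Delta)        # CC' = Delta
A = np.linalg.inv(Delta)             # weight A_N = Delta^-1

theta0 = rng.normal(size=p)
u = rng.normal(size=q)               # u ~ N(0, I_q)
a_N = G @ theta0 + C @ u / np.sqrt(N)

# Minimum distance fit in the linear case: theta_hat = (G'AG)^-1 G'A a_N.
theta_hat = np.linalg.solve(G.T @ A @ G, G.T @ A @ a_N)
e = a_N - G @ theta_hat
d_N = N * e @ A @ e                  # the statistic of Proposition 8

# The projection form u' M_G u from the proof.
Gt = np.linalg.solve(C, G)           # G-tilde = C^-1 G
M_G = np.eye(q) - Gt @ np.linalg.solve(Gt.T @ Gt, Gt.T)
```

Since $M_G$ is idempotent with trace $q - p$, $u'M_G u$ is $\chi^2(q-p)$ when $u$ is standard normal, which is where the degrees of freedom in the proposition come from.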
Now consider imposing additional restrictions, which are expressed by the condition that $\theta = f(\alpha)$, where $\alpha$ is $s \times 1$ $(s \le p)$. The domain of $\alpha$ is $T_\alpha$, a subset of $R^s$ that contains the true value $\alpha^0$. So $\theta = f(\alpha)$ is confined to a certain subset of $R^p$.

Assumption 2'. $T_\alpha$ is a compact subset of $R^s$ that contains $\alpha^0$; $f$ is a continuous mapping from $T_\alpha$ into $T$; $f(\alpha) = \theta^0$ for $\alpha \in T_\alpha$ implies $\alpha = \alpha^0$; $T_\alpha$ contains a neighborhood of $\alpha^0$ in which $f$ has continuous second partial derivatives; $\mathrm{rank}(F) = s$, where $F = \partial f(\alpha^0)/\partial\alpha'$.

Let $h(\alpha) = g[f(\alpha)]$. Choose $\hat\alpha$ to
$$\min_{\alpha \in T_\alpha} [a_N - h(\alpha)]' A_N [a_N - h(\alpha)].$$
Proposition 9. If Assumptions 1, 2, and 2' are satisfied, if $\Delta$ is positive definite, and if $A_N \xrightarrow{a.s.} \Delta^{-1}$, then $d_1 - d_2 \xrightarrow{d} \chi^2(p - s)$, where
$$d_1 = N[a_N - h(\hat\alpha)]' A_N [a_N - h(\hat\alpha)], \qquad d_2 = N[a_N - g(\hat\theta)]' A_N [a_N - g(\hat\theta)].$$
Furthermore, $d_1 - d_2$ is independent of $d_2$ in their limiting joint distribution.

Proof. The assumptions on $f$ and $T_\alpha$ imply that $h$ and $T_\alpha$ satisfy Assumptions 1 and 2. By following the proof of Proposition 8, we can show that the vector $(d_1, d_2)$ converges in distribution to $(d_1^*, d_2^*)$, where
$$d_1^* = u'M_H u, \qquad d_2^* = u'M_G u,$$
$u \sim N(0, I_q)$, $C$ is a non-singular matrix such that $CC' = \Delta$, $\tilde H = C^{-1}H$ with $H = GF$, $\tilde G = C^{-1}G$, and
$$M_H = I_q - \tilde H(\tilde H'\tilde H)^{-1}\tilde H', \qquad M_G = I_q - \tilde G(\tilde G'\tilde G)^{-1}\tilde G'.$$
Since $\tilde H$ is in the column space of $\tilde G$, we have $M_H M_G = M_G M_H = M_G$; so $M_H - M_G$ is a symmetric idempotent matrix with rank $p - s$. Hence
$$d_1 - d_2 \xrightarrow{d} u'(M_H - M_G)u \sim \chi^2(p - s).$$
Since
$$\mathrm{cov}[(M_H - M_G)u, M_G u] = (M_H - M_G)M_G = 0,$$
we see that $d_1^* - d_2^*$ is independent of $d_2^*$. Q.E.D.
In section 3.2 we considered applying the minimum distance procedure both to $\hat\pi$ and to $\hat w$. We want to show that if the restrictions involve only $\Pi$, then the two procedures give estimators of $\Pi$ with the same limiting distribution. First consider the effect of a one-to-one transformation of the moments: let $l(\rho)$ be a function from $R^q$ into $R^q$ and let $L = \partial l(\rho^0)/\partial\rho'$, where $\rho^0 = g(\theta^0)$. Let $h(\theta) = l[g(\theta)]$. Choose $\hat\theta$ to
$$\min_{\theta \in T} [l(a_N) - h(\theta)]' A_N [l(a_N) - h(\theta)].$$
Proposition 9a. Assume that (1) Assumptions 1 and 2 are satisfied for $g$ and $T$; (2) $l$ is one-to-one and continuous on the range of $g(\theta)$ for $\theta \in T$; $l$ has continuous second partial derivatives in a neighborhood of $g(\theta^0)$; $L$ is non-singular; (3) $\Delta$ is positive definite and $A_N \xrightarrow{a.s.} (L\Delta L')^{-1}$. Then $\sqrt{N}(\hat\theta - \theta^0) \xrightarrow{d} N(0, \Lambda)$, where $\Lambda = (G'\Delta^{-1}G)^{-1}$.

Proof. By the $\delta$-method,
$$\sqrt{N}[l(a_N) - h(\theta^0)] \xrightarrow{d} N(0, L\Delta L').$$
Hence $\sqrt{N}(\hat\theta - \theta^0) \xrightarrow{d} N(0, \Lambda)$, where $\Lambda = [H'(L\Delta L')^{-1}H]^{-1}$ and $H = \partial h(\theta^0)/\partial\theta'$. Since $H = LG$ and $L$ is non-singular, we have $\Lambda = (G'\Delta^{-1}G)^{-1}$. Q.E.D.
Finally, consider augmenting $a_N$ to a $k \times 1$ vector $c_N$: $c_N' = (a_N', b_N')$, $k \ge q$. (For example, we can augment $\hat\pi$ by adding $\hat w$.) Assume that $c_N$ …
where $\Delta_{st}$ is the $(s, t)$ submatrix of $\Delta$ $(s, t = 1, 2)$. Then the concentrated distance function is
$$[a_N - g(\theta)]' A_N [a_N - g(\theta)].$$
Q.E.D.
So the addition of unrestricted moments does not affect the minimum
distance estimator.
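The concentration step rests on the partitioned-inverse identity: with the joint weight $A = \Delta^{-1}$, minimizing the quadratic form over the unrestricted block leaves the Schur complement $A_{11} - A_{12}A_{22}^{-1}A_{21}$ as the effective weight, and this equals $\Delta_{11}^{-1}$, the weight used by the $a_N$-only procedure. A numerical sketch (hypothetical dimensions):

```python
import numpy as np

rng = np.random.default_rng(3)
q, r = 4, 3                          # q restricted moments, r unrestricted
B = rng.normal(size=(q + r, q + r))
Delta = B @ B.T + np.eye(q + r)      # joint covariance of (a_N', b_N')'
A = np.linalg.inv(Delta)             # optimal joint weight

# Concentrating the free block out of the quadratic form:
# min over e2 of (e1', e2') A (e1', e2')' = e1' (A11 - A12 A22^-1 A21) e1.
A11, A12 = A[:q, :q], A[:q, q:]
A21, A22 = A[q:, :q], A[q:, q:]
concentrated = A11 - A12 @ np.linalg.solve(A22, A21)
```

The Schur complement of $A_{22}$ in $\Delta^{-1}$ is exactly $\Delta_{11}^{-1}$, so the augmented and unaugmented objectives coincide, which is the statement in the text.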
References
Anderson, T.W., 1969, Statistical inference for covariance matrices with linear structure, in: P.R.
Krishnaiah, ed., Proceedings of the second international symposium on multivariate analysis
(Academic Press, New York).
Anderson, T.W., 1970, Estimation of covariance matrices which are linear combinations or whose inverses are linear combinations of given matrices, in: Essays in probability and statistics (University of North Carolina Press, Chapel Hill, NC).
Amemiya, T., 1971, The estimation of variances in a variance-components model, International Economic Review 12, 1-13.
Balestra, P. and M. Nerlove, 1966, Pooling cross section and time series data in the estimation of a dynamic model: The demand for natural gas, Econometrica 34, 585-612.
Basmann, R.L., 1965, On the application of the identifiability test statistic and its exact finite
sample distribution function in predictive testing of explanatory economic models,
Unpublished manuscript.
Billingsley, P., 1965, Ergodic theory and information (Wiley, New York).
Billingsley, P., 1979, Probability and measure (Wiley, New York).
Brown, C., 1980, Equalizing differences in the labor market, Quarterly Journal of Economics 94,
113-134.
Chamberlain, G., 1980, Analysis of covariance with qualitative data, Review of Economic
Studies 47, 225-238.
Chiang, C.L., 1956, On regular best asymptotically normal estimates, Annals of Mathematical
Statistics 27, 336-351.
Cramer, H., 1946, Mathematical methods of statistics (Princeton University Press, Princeton,
NJ).
Ferguson, T.S., 1958, A method of generating best asymptotically normal estimates with application to the estimation of bacterial densities, Annals of Mathematical Statistics 29, 1046-1062.
Goldberger, A.S., 1974, Asymptotics of the sample regression slope, Unpublished lecture note no. 12.
Griliches, Z., 1976, Wages of very young men, Journal of Political Economy 84, S69-S85.
Griliches, Z. and A. Pakes, 1980, The estimation of distributed lags in short panels, National
Bureau of Economic Research technical paper no. 4.
Hansen, L.P., 1982, Large sample properties of generalized method of moments estimators,
Econometrica 50, forthcoming.
Hsiao, C., 1975, Some estimation methods for a random coefficient model, Econometrica 43, 305-325.
Jennrich, R.I., 1969, Asymptotic properties of non-linear least squares estimators, The Annals of Mathematical Statistics 40, 633-643.
Kendall, M.G. and A. Stuart, 1961, The advanced theory of statistics, Vol. 2 (Griffin, London).
MaCurdy, T.E., 1979, Multiple time series models applied to panel data: Specification of a dynamic model of labor supply, Unpublished manuscript.
Maddala, G.S., 1971, The use of variance components models in pooling cross section and time
series data, Econometrica 39, 341-358.
Malinvaud, E., 1970, Statistical methods of econometrics (North-Holland, Amsterdam).
Mellow, W., 1981, Unionism and wages: A longitudinal analysis, Review of Economics and
Statistics 63, 43-52.
Mundlak, Y., 1961, Empirical production function free of management bias, Journal of Farm Economics 43, 44-56.
Mundlak, Y., 1963, Estimation of production and behavioral functions from a combination of
time series and cross section data, in: C. Christ et al., eds., Measurement in economics
(Stanford University Press, Stanford, CA).
Mundlak, Y., 1978, On the pooling of time series and cross section data, Econometrica 46, 69-85.
Mundlak, Y., 1978a, Models with variable coefficients: Integration and extension, Annales de l'INSEE 30-31, 483-509.
Rao, C.R., 1973, Linear statistical inference and its applications (Wiley, New York).
Rothenberg, T.J., 1973, Efficient estimation with a priori information (Yale University Press,
New Haven, CT).
Rozanov, Y.A., 1967, Stationary random processes (Holden-Day, San Francisco, CA).
Sims, C.A., 1972, Money, income, and causality, American Economic Review 62, 540-552.
Sims, C.A., 1974, Distributed lags, in: M.D. Intriligator and D.A. Kendrick, eds., Frontiers of quantitative economics, Vol. II (North-Holland, Amsterdam).
Swamy, P.A.V.B., 1970, Efficient inference in a random coefficient regression model, Econometrica 38, 311-323.