Handbook of Statistics, Vol. 1: Analysis of Variance


  • Preface

    The field of statistics is growing at a rapid pace and the rate of publication of books and papers on applied and theoretical aspects of statistics has been increasing steadily. The last decade has also witnessed the emergence of several new statistics journals to keep pace with the increase in research activity in statistics. With the advance of computer technology and the easy accessibility of statistical packages, more and more scientists in many disciplines have been using statistical techniques in data analysis. Statistics seems to be playing the role of a common denominator among all the sciences, besides having a profound influence on such matters as public policy. There is thus a great need for comprehensive, self-contained reference books to disseminate information on various aspects of statistical methodology and applications. The series Handbook of Statistics was started in an attempt to fulfill this need. Each volume in the series is devoted to a particular topic in statistics. The material in these volumes is essentially expository in nature, and proofs of the results are, in general, omitted. The series is addressed to the entire community of statisticians and scientists in various disciplines who use statistical methodology in their work. At the same time, special emphasis is placed on applications-oriented techniques, with applied statisticians in mind as the primary audience. It is believed that every scientist interested in statistics will benefit from browsing through these volumes.

    The first volume of the series is devoted to the area of analysis of variance (ANOVA). The field of ANOVA was developed by R. A. Fisher and others and has emerged as a very important branch of statistics. An attempt has been made to cover most of the useful techniques in univariate and multivariate ANOVA in this volume. Certain other aspects of ANOVA not covered here due to limitations of space are planned for inclusion in subsequent volumes, since various branches of statistics are interlinked.

    It is quite fitting that this volume is dedicated to the memory of the late H. Scheffé, who made numerous important contributions to the field of ANOVA. Scheffé's book The Analysis of Variance has had a significant impact on the field, and his test for multiple comparisons of means of normal populations has been widely used.

    I wish to thank Professors S. Das Gupta, N. L. Johnson, C. G. Khatri, K. V. Mardia and N. H. Timm for serving as members of the editorial board of this volume. Thanks are also due to the contributors to this volume and to North-Holland Publishing Company for their excellent cooperation. Professors R. D. Bock, K. C. Chanda, S. Geisser, R. Gnanadesikan, S. J. Haberman, J. C. Lee, G. S. Mudholkar, M. D. Perlman, J. N. K. Rao, P. S. S. Rao and C. R. Rao were kind enough to review various chapters in this volume. I wish to express my appreciation to my distinguished colleague, Professor C. R. Rao, for his encouragement and inspiration.

    P. R. Krishnaiah

  • Contributors

    T. A. Bancroft, Iowa State University, Ames (Ch. 13)
    V. P. Bhapkar, University of Kentucky, Lexington (Ch. 11)
    R. D. Bock, University of Chicago, Chicago (Ch. 23)
    D. Brandt, University of Chicago, Chicago (Ch. 23)
    D. R. Brillinger, University of California, Berkeley (Ch. 8)
    H. Bunke, Akademie der Wissenschaften der D.D.R., Berlin (Ch. 18)
    S. Das Gupta, University of Minnesota, Minneapolis (Ch. 6)
    D. A. S. Fraser, University of Toronto, Toronto (Ch. 12)
    S. Geisser, University of Minnesota, Minneapolis (Ch. 3)
    R. Gnanadesikan, Bell Telephone Laboratories, Murray Hill (Ch. 5)
    C.-P. Han, Iowa State University, Ames (Ch. 13)
    H. L. Harter, Wright-Patterson Air Force Base, Ohio (Ch. 19)
    P. K. Ito, Nanzan University, Nagoya (Ch. 7)
    A. J. Izenman, Colorado State University, Fort Collins (Ch. 17)
    G. Kaskey, Univac, Blue Bell (Ch. 10)
    C. G. Khatri, Gujarat University, Ahmedabad (Ch. 14)
    J. Kleffe, Akademie der Wissenschaften der D.D.R., Berlin (Ch. 1)
    B. Kolman, Drexel University, Philadelphia (Ch. 10)
    P. R. Krishnaiah, University of Pittsburgh, Pittsburgh (Chs. 10, 16, 21, 24, 25)
    J. C. Lee, Wright State University, Dayton (Ch. 16)
    K. V. Mardia, University of Leeds, Leeds (Ch. 9)
    S. K. Mitra, Indian Statistical Institute, New Delhi (Ch. 15)
    G. S. Mudholkar, University of Rochester, Rochester (Ch. 21)
    S. J. Press, University of California, Riverside (Ch. 4)
    C. R. Rao, University of Pittsburgh, Pittsburgh (Ch. 1)
    A. R. Sampson, University of Pittsburgh, Pittsburgh (Ch. 20)
    P. K. Sen, University of North Carolina, Chapel Hill (Ch. 22)
    L. Steinberg, Temple University, Philadelphia (Ch. 10)
    P. Subbaiah, Oakland University, Rochester (Ch. 21)
    N. H. Timm, University of Pittsburgh, Pittsburgh (Ch. 2)
    M. Yochmowitz, Brooks Air Force Base, Texas (Ch. 25)


  • P. R. Krishnaiah, ed., Handbook of Statistics, Vol. 1, ©North-Holland Publishing Company (1980) 1-40

    Estimation of Variance Components

    C. Radhakrishna Rao* and Jürgen Kleffe

    1. Introduction

    The usual mixed linear model discussed in the literature on variance components is

    Y = Xβ + U₁φ₁ + ⋯ + Uₚφₚ + ε    (1.1)

    where X, U₁, …, Uₚ are known matrices, β is a fixed unknown vector parameter and φ₁, …, φₚ, ε are unobservable random variables (r.v.'s) such that

    E(φᵢ) = 0,  E(φᵢφⱼ') = 0 (i ≠ j),  E(εφᵢ') = 0,
    E(εε') = σ₀²Iₙ,  E(φᵢφᵢ') = σᵢ²I_{mᵢ}.    (1.2)

    The unknown parameters σ₀², σ₁², …, σₚ² are called variance components. Some of the early uses of such models are due to Yates and Zacopanay (1935) and Cochran (1939) in survey sampling, Yates (1940) and Rao (1947, 1956) in combining intra- and interblock information in design of experiments, Fairfield Smith (1936), Henderson (1950), Panse (1946) and Rao (1953) in the construction of selection indices in genetics, and Brownlee (1953) in industrial applications. A systematic study of the estimation of variance components was undertaken by Henderson (1953), who proposed three methods of estimation.

    The general approach in all these papers was to obtain p + 1 quadratic functions of Y, say Y'QᵢY, which are invariant for translation of Y by Xα where α is arbitrary, and solve the equations

    Y'QᵢY = E(Y'QᵢY) = a_{i0}σ₀² + a_{i1}σ₁² + ⋯ + a_{ip}σₚ²,  i = 0, 1, …, p.    (1.3)

    *The work of this author is sponsored by the Air Force Office of Scientific Research, Air Force Systems Command under Contract F49620-79-C-0161. Reproduction in whole or in part is permitted for any purpose of the United States Government.


    The method of choosing the quadratic forms was intuitive in nature (see Henderson, 1953) and did not depend on any stated criteria of estimation. The entries in the ANOVA table giving the sums of squares due to different effects were considered good choices of the quadratic forms in general. The ANOVA technique provides good estimators in what are called balanced designs (see Anderson, 1975; Anderson and Crump, 1967) but, as shown by Seely (1975), such estimators may be inefficient in more general linear models. For a general discussion of Henderson's methods and their advantages (computational simplicity) and limitations (lack of uniqueness, inapplicability and inefficiency in special cases) the reader is referred to papers by Searle (1968, 1971), Seely (1975), Olsen et al. (1976) and Harville (1977, p. 335).
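    To make (1.3) concrete, the following minimal sketch (not from the chapter; the one-way layout, the simulated numbers and the particular choice of Q₀, Q₁ are illustrative assumptions) equates two translation-invariant quadratic forms to their expectations and solves the resulting linear system for the variance components:

```python
import numpy as np

# One-way random model y_ij = mu + phi_i + e_ij with p groups and r
# replicates; sigma_1^2 = Var(phi_i), sigma_0^2 = Var(e_ij).
rng = np.random.default_rng(0)
p, r = 5, 4
n = p * r
U = np.kron(np.eye(p), np.ones((r, 1)))      # random-effect design
y = 1.0 + U @ rng.normal(0.0, 2.0, p) + rng.normal(0.0, 1.0, n)

J = np.ones((n, n)) / n
Q0 = np.eye(n) - U @ U.T / r                 # within-group quadratic
Q1 = U @ U.T / r - J                         # between-group quadratic

# Since Q X = 0 for X = 1, E(y'Q y) = tr(Q V_0) sigma_0^2 + tr(Q V_1) sigma_1^2
# with V_0 = I and V_1 = U U', which is exactly the system (1.3).
V0, V1 = np.eye(n), U @ U.T
A = np.array([[np.trace(Q @ V0), np.trace(Q @ V1)] for Q in (Q0, Q1)])
b = np.array([y @ Q0 @ y, y @ Q1 @ y])
print(np.linalg.solve(A, b))                 # estimates of (sigma_0^2, sigma_1^2)
```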

    A completely different approach is the ML (maximum likelihood) method initiated by Hartley and Rao (1967). They considered the likelihood of the unknown parameters β, σ₀², …, σₚ² based on the observed Y and obtained the likelihood equations by computing the derivatives of the likelihood with respect to the parameters. Patterson and Thompson (1975) considered the marginal likelihood based on the maximal invariant of Y, i.e., only on B'Y where B = X⊥ (a matrix orthogonal to X), and obtained what are called marginal maximum likelihood (MML) equations. Harville (1977) has given a review of the ML and MML methods and the computational algorithms associated with them.

    ML estimators, though consistent, may be heavily biased in small samples, so that some caution is needed when they are used as estimates of individual parameters for taking decisions or when they are used in place of the true values to obtain an efficient estimate of β. The problem is not acute if the exact distribution of the ML estimators is known, since in that case appropriate adjustments can be made in the individual estimators before using them. The general large sample properties associated with ML estimators are misleading in the absence of studies on the orders of sample sizes for which these properties hold in particular cases. The bias in MML estimators may not be large even in small samples. As observed earlier, the MML estimator is, by construction, a function of B'Y, the maximal invariant of Y. It turns out that even the full ML estimator is a function of B'Y although the likelihood is based on Y. There are important practical cases where reduction of Y to B'Y results in non-identifiability of individual parameters, in which case neither ML nor MML is applicable. The details are given in Section 5.

    Rao (1970, 1971a, b, 1972, 1973) proposed a general method called MINQE (minimum norm quadratic estimation), the scope of which has been extended to cover a variety of situations by Focke and Dewess (1972), Kleffe (1975, 1976, 1977a, b, 1978, 1979), J.N.K. Rao (1973), Fuller


    and J.N.K. Rao (1978), P.S.R.S. Rao and Chaubey (1978), P.S.R.S. Rao (1977), Pukelsheim (1977, 1978a), Sinha and Wieand (1977) and Rao (1979). The method is applicable to a general linear model

    Y = Xβ + ε,  E(εε') = θ₁V₁ + ⋯ + θₚVₚ    (1.4)

    where no structure need be imposed on ε and no restrictions are placed on the θᵢ or Vᵢ. (In the model (1.1), θᵢ ≥ 0 and the Vᵢ are non-negative definite.)

    In the MINQE theory, we define what is called a natural estimator of a linear function f'θ of θ in terms of the unobservable r.v. ε in (1.4), say ε'Nε. Then the estimator Y'AY in terms of the observable r.v. Y is obtained by minimizing the norm of the difference between the quadratic forms ε'Nε and Y'AY = (Xβ + ε)'A(Xβ + ε). The universality of the MINQE method as described in Rao (1979) and in this article arises from the following observations:

    (a) It offers a wide scope in the choice of the norm depending on the nature of the model and prior information available.

    (b) One or more restrictions such as invariance, unbiasedness and non-negative definiteness can be placed on Y'AY depending on the desired properties of the estimators.

    (c) The method is applicable in situations where ML and MML fail.

    (d) There is an automatic provision for incorporating available prior information on the unknown parameters β and θ.

    (e) Further, ML and MML estimators can be exhibited as iterated versions of suitably chosen MINQE's.

    (f) The MINQE equation provides a natural numerical algorithm for computing the ML or MML estimator.

    (g) For a suitable choice of the norm, the MINQ estimators provide minimum variance estimators of θ when Y is normally distributed.

    It has been mentioned by some reviewers of the MINQE theory that the

    computations needed for obtaining the MINQ estimators are somewhat heavy. It is true that the closed form expressions given for MINQE's contain inverses of large order matrices, but they can be computed in a simple way in special cases that arise in practice. The computations in such cases are of the same order of magnitude as obtaining sums of squares in the ANOVA table appropriate for the linear model. It is certainly not true that the computation of MLE or MMLE is simpler than that of MINQE. Both may have the same order of complexity in the general case.

    Recently, simple numerical techniques for computing MINQE's have been developed by Ahrens (1978), Swallow and Searle (1978) and Ahrens et al. (1979) for the unbalanced random ANOVA model and by Kleffe (1980) for several unbalanced two-way classification models. Similar results for simple regression models with heteroscedastic variances have been given by Rao (1970) and Kleffe and Zöllner (1978). Infante (1978) investigated the calculation of MINQE's for the random coefficient regression model.

    2. Models of variance and covariance components

    2.1. General model

    There is a large variety of models of variance and covariance components used in research work in the biological and behavioral sciences. They can all be considered in a unified framework under a general Gauss-Markoff (GM) model

    Y = Xβ + ε    (2.1.1)

    where Y is an n-vector random variable, X is an n×m matrix, β is an m-vector parameter and ε is an n-vector random variable. The models differ mainly in the structure imposed on ε. The most general formulation is

    E(ε) = 0,    (2.1.2)

    D(ε) = θ₁V₁ + ⋯ + θₚVₚ = V(θ) = V_θ    (2.1.3)

    where D stands for the dispersion (variance-covariance) matrix, θ' = (θ₁, …, θₚ) is an unknown vector parameter and V₁, …, Vₚ are known symmetric matrices. We let β ∈ Rᵐ and θ ∈ Θ (an open set) ⊂ Rᵖ such that V(θ) ≥ 0 (i.e., nonnegative definite) for each θ ∈ Θ. In the representation (2.1.3) we have not imposed any restriction such as θᵢ ≥ 0 or Vᵢ nonnegative definite.

    It may be noted that an arbitrary n×n dispersion matrix Σ = (θᵢⱼ) can be written in the form (2.1.3),

    Σ = Σ_{i≤j} θᵢⱼVᵢⱼ,    (2.1.4)

    involving a maximum of p = n(n+1)/2 unknown parameters θᵢⱼ and known matrices Vᵢⱼ, but in models of practical interest p has a relatively small value compared to n.

    2.2. Variance components

    A special case of the variance components model arises when ε has the structure

    ε = U₁φ₁ + ⋯ + Uₚφₚ    (2.2.1)

    where Uᵢ is a given n×mᵢ matrix and φᵢ is an mᵢ-vector r.v. such that

    E(φᵢ) = 0;  E(φᵢφⱼ') = 0, i ≠ j;  E(φᵢφᵢ') = σᵢ²I_{mᵢ}.    (2.2.2)

    In such a case

    V(θ) = θ₁V₁ + ⋯ + θₚVₚ    (2.2.3)

    where Vᵢ = UᵢUᵢ' ≥ 0 and θᵢ = σᵢ² > 0. Most of the models discussed in the literature are of the type (2.2.1), leading to (2.2.3).
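    As a concrete illustration, the short sketch below (an assumed toy layout with two crossed random factors, not an example from the chapter) assembles V(θ) = θ₁V₁ + ⋯ + θₚVₚ with Vᵢ = UᵢUᵢ':

```python
import numpy as np

# Two crossed random factors with a and b levels; U1, U2 map each of
# the n = a*b observations to its factor level.
a, b = 3, 4
n = a * b
U1 = np.kron(np.eye(a), np.ones((b, 1)))
U2 = np.kron(np.ones((a, 1)), np.eye(b))

def V(theta, Us):
    """V(theta) = sum_i theta_i U_i U_i', each term n x n and n.n.d."""
    return sum(t * (U @ U.T) for t, U in zip(theta, Us))

# Add sigma_0^2 I for the residual term, as in (1.1)/(2.2.3).
Sigma = V([2.0, 0.5], [U1, U2]) + 1.0 * np.eye(n)
print(np.all(np.linalg.eigvalsh(Sigma) > 0))   # Sigma is p.d. here
```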

    The complete GM model when ε has the structure (2.2.1) is

    Y = Xβ + U₁φ₁ + ⋯ + Uₚφₚ,

    E(φᵢ) = 0;  E(φᵢφⱼ') = 0, i ≠ j;  E(φᵢφᵢ') = σᵢ²I_{mᵢ}.    (2.2.4)

    The associated statistical problems are:

    (a) Estimation of β,

    (b) Estimation of σᵢ², i = 1, …, p,

    (c) Prediction of φᵢ, i = 1, …, p.    (2.2.5)

    The last problem arises in the construction of selection indices in genetics, and some early papers on the subject providing a satisfactory solution are due to Fairfield Smith (1936) and Panse (1946), based on an idea suggested by Fisher. See also Henderson (1950) for similar developments. A theoretical justification of the method employed by these authors and associated tests of significance are given in Rao (1953).

    A particular case of the model (2.2.4) is where it can be broken down into a number of submodels

    Y₁ = X₁β + ε₁, …, Yₚ = Xₚβ + εₚ    (2.2.6)

    where Yᵢ is an nᵢ-vector variable and

    E(εᵢ) = 0,  E(εᵢεᵢ') = θᵢI_{nᵢ},  E(εᵢεⱼ') = 0, i ≠ j.    (2.2.7)

    Note that the β parameters are the same in all submodels, and in some situations the design matrices X₁, …, Xₚ may also be the same. The model (2.2.6) with the covariance structure (2.2.7) is usually referred to as one with "heteroscedastic variances", and the problem of estimating β as that of estimating a "common mean" (see P.S.R.S. Rao et al., 1979; and J.N.K. Rao and Subrahmaniam, 1971).


    2.3. Variance and covariance components

    We assume the same structure (2.2.1) for ε but with a more general covariance structure for the φᵢ's:

    E(φᵢ) = 0,  E(φᵢφᵢ') = Λᵢ, i = 1, …, k,

    E(φᵢφᵢ') = σᵢ²I_{mᵢ}, i = k+1, …, p,

    E(φᵢφⱼ') = 0, i ≠ j,    (2.3.1)

    leading to

    V(θ) = U₁Λ₁U₁' + ⋯ + UₖΛₖUₖ' + σ²_{k+1}U_{k+1}U_{k+1}' + ⋯ + σₚ²UₚUₚ'    (2.3.2)

    where Λᵢ ≥ 0. In some practical problems the Λᵢ are all the same and there is only one σ², in which case (2.3.2) becomes

    V(θ) = U₁ΛU₁' + ⋯ + UₖΛUₖ' + σ²I.    (2.3.3)

    2.4. Random regression coefficients

    This is a special case of the variance and covariance components model considered in Section 2.3 where ε has the structure

    ε = Xφ₁ + φ₂,  E(φ₁φ₁') = Λ,  E(φ₂φ₂') = σ²I,    (2.4.1)

    the compounding matrix for φ₁ being the same as that for β, leading to the GM model

    Y = Xβ + Xφ₁ + φ₂,

    D(ε) = XΛX' + σ²I.    (2.4.2)

    In general, we have repeated observations on the model (2.4.2) with different X's,

    Yᵢ = Xᵢβ + Xᵢφ₁ᵢ + φ₂ᵢ,  i = 1, …, t,    (2.4.3)

    leading to the model

    Y = Xβ + ε  with  X' = (X₁' : ⋯ : Xₜ')    (2.4.4)

    and

    D(ε) = diag(X₁ΛX₁' + σ²I, …, XₜΛXₜ' + σ²I),    (2.4.5)


    all the off-diagonal blocks being null matrices. A discussion of such models is contained in Fisk (1967), Infante (1978), Rao (1965, 1967), Swamy (1971) and Spjøtvoll (1977). In some cases Λ is known to be of diagonal form (see Hildreth and Houck, 1968).
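    A minimal sketch of assembling the block-diagonal dispersion matrix (2.4.5) follows (the particular Xᵢ, Λ and σ² below are placeholder values, not from the text):

```python
import numpy as np
from scipy.linalg import block_diag

# t independent regressions sharing beta; each block contributes
# X_i Lam X_i' + sigma^2 I to D(eps), with all off-diagonal blocks null.
rng = np.random.default_rng(1)
t, n_i, m = 3, 5, 2
Xs = [rng.normal(size=(n_i, m)) for _ in range(t)]
Lam = np.array([[1.0, 0.3],
                [0.3, 0.5]])     # Lambda = D(phi_1), an m x m n.n.d. matrix
sigma2 = 0.25

D = block_diag(*[X @ Lam @ X.T + sigma2 * np.eye(n_i) for X in Xs])
print(D.shape)                   # (t*n_i, t*n_i)
```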

    2.5. Intraclass correlation model

    We shall illustrate an intraclass correlation model with special reference to two-way classified data with repeated observations in each cell,

    Yᵢⱼₖ,  i = 1, …, p;  j = 1, …, q;  k = 1, …, r.    (2.5.1)

    We write

    Yᵢⱼₖ = μᵢⱼₖ + εᵢⱼₖ    (2.5.2)

    where the μᵢⱼₖ are fixed parameters with a specified structure, and

    E(εᵢⱼₖ) = 0,  E(εᵢⱼₖ²) = σ²,
    E(εᵢⱼᵣεᵢⱼₛ) = σ²ρ₁, r ≠ s,
    E(εᵢⱼᵣεᵢₖₛ) = σ²ρ₂, j ≠ k, r ≠ s,
    E(εᵢⱼᵣεₜₖₛ) = σ²ρ₃, i ≠ t, j ≠ k, r ≠ s.    (2.5.3)

    The dispersion matrix of (Yᵢⱼₖ) can be exhibited in the form (2.1.3) with the four parameters σ², σ²ρ₁, σ²ρ₂, σ²ρ₃. A model of the type (2.5.2) is given in Rao (1973, p. 258).

    2.6. Multivariate model

    A k-variate linear model is of the form

    (Y₁ : ⋯ : Yₖ) = X(β₁ : ⋯ : βₖ) + (ε₁ : ⋯ : εₖ).    (2.6.1)

    Denoting Y' = (Y₁', …, Yₖ'), β' = (β₁', …, βₖ'), ε' = (ε₁', …, εₖ'), the multivariate model may be written as a univariate model

    Y = (I ⊗ X)β + ε,  E(εε') = Σᵢ₌₁ᵖ (Θᵢ ⊗ Vᵢ)    (2.6.2)


    where the Θᵢ are (k×k) matrices of variance and covariance components σᵣₛ⁽ⁱ⁾, r, s = 1, …, k. In the multivariate regression model p = 1, in which case

    E(εε') = (Θ ⊗ V).    (2.6.3)

    We may specify structures for ε analogous to (2.2.1) in the univariate case:

    εᵢ = U₁φ₁ᵢ + ⋯ + Uₚφₚᵢ,  i = 1, …, k,

    E(φᵣᵢφᵣⱼ') = σᵢⱼ⁽ʳ⁾I,  E(φᵣᵢφₕⱼ') = 0, r ≠ h.    (2.6.4)

    For special choices of the Uᵢ, we obtain multivariate one-way, two-way, etc., mixed models.

    Models of the type (2.6.2) have been considered by Krishnaiah and Lee (1974). They discuss methods of estimating the covariance matrices Θᵢ and testing the hypothesis that a covariance matrix has the structure (2.6.2).

    3. Estimability

    3.1. Unbiasedness

    Let us consider the univariate GM model (2.1.1) with the covariance structure (2.1.3)

    Y = Xβ + ε,  D(ε) = θ₁V₁ + ⋯ + θₚVₚ,    (3.1.1)

    and find the conditions under which linear functions f'θ can be estimated by functions of Y subject to some constraints. The classes of estimators considered are as follows:

    𝒜 = {Y'AY: A symmetric},    (3.1.2)

    𝒰_f = {g(Y): E[g(Y)] = f'θ  ∀β ∈ Rᵐ, θ ∈ Θ},    (3.1.3)

    ℐ = {g(Y): g(Y) = g(Y + Xα)  ∀α}.    (3.1.4)

    We use the following notations:

    (i) S(B) represents the linear manifold generated by the columns of B.

    (ii) P = X(X'X)⁻X' is the projection operator onto S(X), and M = I − P.

    (iii) P_T = X(X'T⁻¹X)⁻X'T⁻¹.


    Theorem 3.1.1 provides conditions for unbiased estimability.

    THEOREM 3.1.1. Let the linear model be as in (3.1.1). Then:

    (i) The estimator Y'AY is unbiased for γ = f'θ iff

    X'AX = 0,  tr AVᵢ = fᵢ,  i = 1, …, p.    (3.1.5)

    (ii) There exists an unbiased estimator in 𝒰_f ∩ 𝒜 iff f ∈ S(H),

    H = (hᵢⱼ),  hᵢⱼ = tr(VᵢVⱼ − PVᵢPVⱼ).    (3.1.6)

    (iii) If Y has a multivariate normal distribution, then 𝒰_f is not empty iff 𝒰_f ∩ 𝒜 is not empty.

    The results (i) and (ii) are discussed in Seely (1970), Rao (1970, 1971a, b), Focke and Dewess (1972) and Kleffe and Pincus (1974a, b), and (iii) in Pincus (1974).

    NOTE 1: Result (ii) holds if in (3.1.6) we choose

    hᵢⱼ = tr(Vᵢ(I − P)Vⱼ).    (3.1.7)

    NOTE 2: In the special case VᵢVⱼ = 0 for i ≠ j, θᵢ, the ith individual parameter, is unbiasedly estimable iff MVᵢ ≠ 0, where M = I − P.

    LEMMA 3.1.1. The linear space F of all unbiasedly estimable linear functions of θ is

    F = {E_θ(Y'AY): A ∈ sp(V₁ − PV₁P, …, Vₚ − PVₚP)}    (3.1.8)

    where sp(A₁, …, Aₚ) is the set of all linear combinations of A₁, …, Aₚ.

    Let us consider the multivariate model (2.6.2) written in vector form:

    Y = (I ⊗ X)β + ε,

    E(εε') = (Θ₁ ⊗ V₁) + ⋯ + (Θₚ ⊗ Vₚ)    (3.1.9)

    where the Θᵢ are k×k matrices of variance-covariance components.

    LEMMA 3.1.2. The parametric function γ = Σ fᵢ tr(CΘᵢ) is unbiasedly estimable from the model (3.1.9) iff f'θ is so from the univariate model (3.1.1).


    LEMMA 3.1.3. The class F of unbiasedly estimable linear functions of the elements of Θᵢ, i = 1, …, p, in (3.1.9) is

    F = {γ = Σ tr(CᵢΘᵢ): the Cᵢ are such that Hb = 0 ⟹ Σ bᵢCᵢ = 0}    (3.1.10)

    where H is as defined in (3.1.6) or (3.1.7).

    3.2. Invariance

    An estimator is said to be invariant for translation of the parameter β in the linear model (3.1.1) if it belongs to the class (3.1.4). Theorem 3.2.1 provides the conditions under which estimators belonging to the class 𝒰_f ∩ ℐ exist.

    THEOREM 3.2.1. Let the linear model be as in (3.1.1). Then:

    (i) The estimator Y'AY ∈ 𝒰_f ∩ ℐ iff

    AX = 0,  tr AVᵢ = fᵢ,  i = 1, …, p.    (3.2.1)

    (ii) There exists an unbiased estimator in ℐ ∩ 𝒜 iff f ∈ S(H_M) where

    H_M = (hᵢⱼ),  hᵢⱼ = tr(MVᵢMVⱼ),  M = I − P.    (3.2.2)

    (iii) Under the assumption of normality of Y, the result (3.2.2) can be extended to the class ℐ.

    NOTE: In (3.2.2), we can choose

    hᵢⱼ = tr(BB'VᵢBB'Vⱼ)    (3.2.3)

    where B is any choice of X⊥, i.e., B is a matrix of maximum rank such that B'X = 0.

    LEMMA 3.2.1. The linear space of all invariantly unbiasedly estimable linear functions of θ is

    F_I = {E_θ(Y'MAMY): A ∈ sp(V₁ − PV₁P, …, Vₚ − PVₚP)}.    (3.2.4)

    LEMMA 3.2.2. If f'θ is invariantly unbiasedly estimable from the model (3.1.1), then so is γ = Σ fᵢ tr(CΘᵢ) from the model (3.1.9).


    LEMMA 3.2.3. All invariantly unbiasedly estimable linear functions of the elements of Θᵢ, i = 1, …, p, in the model (3.1.9) belong to the set

    F_I = {γ = Σ tr(CᵢΘᵢ): the Cᵢ are such that H_M b = 0 ⟹ Σ bᵢCᵢ = 0}.    (3.2.5)

    For proofs of Lemmas 3.2.2 and 3.2.3, see Kleffe (1979).

    NOTE: We can estimate any member of the class (3.2.5) by functions of the form

    Σ tr(Cᵢ Y'AᵢY)

    where A₁, …, Aₚ are matrices arising in invariant quadratic unbiased estimation in the univariate model (3.1.1).

    3.3. Examples

    Consider the model with four observations

    Y₁ = β₁ + ε₁,  Y₂ = β₁ + ε₂,  Y₃ = β₂ + ε₃,  Y₄ = β₂ + ε₄

    where the εᵢ are all uncorrelated and V(ε₁) = V(ε₃) = σ₁² and V(ε₂) = V(ε₄) = σ₂². The matrices X, V₁, V₂ are easily seen to be

    X = [1 0; 1 0; 0 1; 0 1],  V₁ = diag(1, 0, 1, 0),  V₂ = diag(0, 1, 0, 1).

    The matrices H and H_M of Theorems 3.1.1 and 3.2.1 are

    H = ½[3 −1; −1 3],  H_M = ½[1 1; 1 1].

    Since H is of full rank, applying (3.1.6) we find that σ₁² and σ₂² are individually unbiasedly estimable. But H_M is of rank one and the unit vectors do not belong to the space S(H_M). Then (3.2.2) shows that σ₁² and σ₂² are not individually unbiasedly estimable by invariant quadratic forms.
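    The following short check (added here; not part of the original text) reproduces H and H_M numerically from the definitions (3.1.6) and (3.2.2):

```python
import numpy as np

X = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
V1 = np.diag([1.0, 0.0, 1.0, 0.0])
V2 = np.diag([0.0, 1.0, 0.0, 1.0])
P = X @ np.linalg.pinv(X.T @ X) @ X.T      # projector onto S(X)
M = np.eye(4) - P

Vs = [V1, V2]
H = np.array([[np.trace(Vi @ Vj - P @ Vi @ P @ Vj) for Vj in Vs] for Vi in Vs])
HM = np.array([[np.trace(M @ Vi @ M @ Vj) for Vj in Vs] for Vi in Vs])
print(H)                                   # 0.5 * [[3, -1], [-1, 3]]
print(HM)                                  # 0.5 * [[1, 1], [1, 1]]
print(np.linalg.matrix_rank(H), np.linalg.matrix_rank(HM))   # 2, 1
```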

    Consider the model Y = Xβ + Xφ + ε where β is a fixed vector parameter and φ is a vector of random effects such that E(φ) = 0, E(φφ') = σ₁²I_m, E(φε') = 0, E(εε') = σ₂²Iₙ. Let Y'AY be an unbiased estimate of σ₁². Then we must have

    X'AX = 0,  tr AXX' = 1,  tr A = 0,

    which is not consistent, since X'AX = 0 implies tr AXX' = tr X'AX = 0. Hence unbiased estimators of σ₁² do not exist.

    4. Minimum variance unbiased estimation (normal case)

    4.0. Notations

    In Section 3, we obtained conditions for unbiased estimability of f'θ in the linear model

    Y = Xβ + ε,  D(ε) = θ₁V₁ + ⋯ + θₚVₚ = V_θ,    (4.0.1)

    restricting the class of estimators to quadratic functions of Y. In this section we do not put any restriction on the class of estimators but assume that

    Y ~ Nₙ(Xβ, V_θ),  β ∈ Rᵐ, θ ∈ Θ,    (4.0.2)

    i.e., n-variate normal, and V_θ is p.d. for θ ∈ Θ. The condition that V_θ is p.d. is assumed to simplify the presentation of results, and is satisfied in many practical situations.

    First, we derive the locally minimum variance unbiased estimator (LMVUE) of f'θ at a chosen point (β₀, θ₀) in Rᵐ × Θ. If the estimator is independent of (β₀, θ₀) then we have a UMVUE (uniformly minimum variance unbiased estimator). UMVUE's do not exist except in simple cases. In the general case we suggest the use of the LMVUE with a suitable choice of (β₀, θ₀) based on previous experience or a priori considerations. We also indicate an iterative method which starts with an initial value (β₀, θ₀), gets an improved set (β₁, θ₁), and provides in the limit the IMVUE (iterated MVUE).

    LMVUE's are obtained in the class of quadratic estimators by LaMotte (1973) under the assumption of normality and by Rao (1971a, b) in the general case. Such estimators were designated by Rao as MIVQUE (minimum variance quadratic unbiased estimators). Kleffe and Pincus (1974a, b) and Kleffe (1977a, b) extended the class of estimators to quadratic forms in (Y − Xβ₀) and found that under the normality assumption, MIVQUE is LMVUE in the whole class of unbiased estimators.


    4.1. Locally minimum variance unbiased estimation

    DEFINITION 4.1.1. An estimator γ̂ is called the LMVUE of its expected value at (β₀, θ₀) ∈ Rᵐ × Θ iff

    V(γ̂ | β₀, θ₀) ≤ V(γ̃ | β₀, θ₀)    (4.1.1)

    for all γ̃ such that

    E(γ̃) = E(γ̂)  ∀(β, θ) ∈ Rᵐ × Θ.    (4.1.2)

    We use the following notations:

    V_θ = θ₁V₁ + ⋯ + θₚVₚ,

    A_{iθ} = V_θ⁻¹(Vᵢ − P_θVᵢP_θ')V_θ⁻¹,  P_θ = X(X'V_θ⁻¹X)⁻X'V_θ⁻¹,

    K_θ = (tr A_{iθ}Vⱼ),

    k_{β,θ} = [(Y − Xβ)'A_{1θ}(Y − Xβ), …, (Y − Xβ)'A_{pθ}(Y − Xβ)]'.    (4.1.3)

    Let (β₀, θ₀) be an a priori value of (β, θ). Then applying the result (3.1.6) of Theorem 3.1.1 we find that f'θ is unbiasedly estimable iff

    f ∈ S(K_θ₀).    (4.1.4)

    Theorem 4.1.1 provides an explicit expression for the LMVUE.

    THEOREM 4.1.1. Let f satisfy the condition (4.1.4) and K_θ, k_{β,θ} be as defined in (4.1.3). Then the LMVUE of f'θ at (β₀, θ₀) is

    γ̂ = λ'k_{β₀,θ₀} = Σ λᵢ(Y − Xβ₀)'A_{iθ₀}(Y − Xβ₀)    (4.1.5)

    where λ is any solution of K_θ₀λ = f.

    Theorem 4.1.1 is established by showing that

    cov(g(Y), k_{β₀,θ₀}) = 0

    for all g(Y) such that E[g(Y) | β, θ] = 0 ∀β ∈ Rᵐ, θ ∈ Θ, and using the theorem on minimum variance estimation given in C.R. Rao (1973, p. 317).


    NOTE 1: For any λ, λ'k_{β₀,θ₀} is the LMVUE of its expected value, which is a linear function of θ. Thus (4.1.5) characterizes all LMVUE's of linear functions of θ at (β₀, θ₀).

    NOTE 2: The variance of γ̂ as defined in (4.1.5) is

    V(γ̂ | β, θ) = 4(β − β₀)'X'A_θ₀V_θA_θ₀X(β − β₀) + 2 tr(A_θ₀V_θA_θ₀V_θ)    (4.1.6)

    where A_θ₀ = Σ λᵢA_{iθ₀}. The variance at (β₀, θ₀) is

    V(γ̂ | β₀, θ₀) = 2λ'K_θ₀λ = 2f'K_θ₀⁻f    (4.1.7)

    where K_θ₀⁻ is any g-inverse of K_θ₀.

    NOTE 3: The BLUE (best linear unbiased estimator) of Xβ at θ₀ is

    Xβ̂ = P_θ₀Y.    (4.1.8)

    Substituting β̂ for β₀ in (4.1.5) we have

    γ̂₁ = λ'k_{β̂,θ₀} = Y'(MV_θ₀M)⁺(Σ λᵢVᵢ)(MV_θ₀M)⁺Y    (4.1.9)

    where M = I − X(X'X)⁻X', and C⁺ is the Moore-Penrose inverse of C (see Rao and Mitra, 1972). The statistic γ̂₁, which is independent of the a priori value of β, is an alternative estimator of f'θ, but it may not be unbiased for f'θ.

    NOTE 4: Theorem 4.1.1 can be stated in a different form as follows. If f'θ is unbiasedly estimable, then its LMVUE at (β₀, θ₀) is f'θ̂ where θ̂ is any solution of the consistent equation

    K_θ₀θ = k_{β₀,θ₀}.    (4.1.10)

    NOTE 5: (Estimation of β and θ.) Let θ (i.e., each component of θ) be estimable, in which case K_θ₀ is nonsingular and the solution of (4.1.10) is θ̂₁ = K_θ₀⁻¹k_{β₀,θ₀}. Let β̂₁ be a solution of Xβ = P_θ₀Y. We may use θ̂₁, β̂₁, the LMVUE of θ, β, as initial values and obtain second stage estimates θ̂₂ and β̂₂ of θ and β as solutions of

    K_{θ̂₁}θ = k_{β̂₁,θ̂₁},  Xβ = P_{θ̂₁}Y.    (4.1.11)


    The process may be repeated, and if the solutions converge they satisfy the equations

    K_θθ = k_{β,θ},  Xβ = P_θY.    (4.1.12)

    The solution (β̂, θ̂) of (4.1.12) may be called the IMVUE (iterated minimum variance unbiased estimator) of (β, θ). The IMVUE is not necessarily unbiased.
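    A minimal numerical sketch of this fixed-point iteration follows (an assumption-laden illustration: it presumes each component of θ is estimable, K_θ nonsingular at every step, and convergence, none of which is guaranteed in general):

```python
import numpy as np

def imvue(Y, X, Vs, theta0, iters=100, tol=1e-10):
    """Iterate (4.1.11)-(4.1.12): refresh beta and theta at the current
    theta until the pair satisfies K_theta theta = k_{beta,theta} and
    X beta = P_theta Y."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(iters):
        V = sum(t * Vi for t, Vi in zip(theta, Vs))
        Vinv = np.linalg.inv(V)
        XtVX = np.linalg.pinv(X.T @ Vinv @ X)
        beta = XtVX @ X.T @ Vinv @ Y            # GLS estimate, X beta = P Y
        P = X @ XtVX @ X.T @ Vinv               # P_theta
        R = Y - X @ beta
        As = [Vinv @ (Vi - P @ Vi @ P.T) @ Vinv for Vi in Vs]   # A_{i,theta}
        K = np.array([[np.trace(Ai @ Vj) for Vj in Vs] for Ai in As])
        k = np.array([R @ Ai @ R for Ai in As])
        new = np.linalg.solve(K, k)
        if np.max(np.abs(new - theta)) < tol:
            return beta, new
        theta = new
    return beta, theta
```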

    4.2. Invariant estimation

    Let us restrict the class of estimators to invariant unbiased (IU) estimators, i.e., estimators g(Y) such that

    g(Y + Xβ) = g(Y)  ∀β,  E[g(Y) | β, θ] = f'θ,    (4.2.1)

    and find the locally minimum variance invariant unbiased estimator (LMVIUE).

    Let

    M = I − P,  P = X(X'X)⁻X',

    H_UI(θ) = (tr[(MV_θM)⁺Vᵢ(MV_θM)⁺Vⱼ]),

    h_I(Y, θ) = [Y'(MV_θM)⁺V₁(MV_θM)⁺Y, …, Y'(MV_θM)⁺Vₚ(MV_θM)⁺Y]'

              = [Y'V_θ⁻¹(I − P_θ)V₁(I − P_θ)'V_θ⁻¹Y, …, Y'V_θ⁻¹(I − P_θ)Vₚ(I − P_θ)'V_θ⁻¹Y]'.    (4.2.2)

    THEOREM 4.2.1. (i) f'θ is invariantly unbiasedly estimable iff

    f ∈ S(H_UI(θ))    (4.2.3)

    for any choice of θ such that V_θ is nonsingular.

    (ii) The LMVIUE of f'θ at θ₀ is

    γ̂ = λ'h_I(Y, θ₀)    (4.2.4)

    where λ is any solution of [H_UI(θ₀)]λ = f.


    The results of Theorem 4.2.1 are obtained by transforming the model Y = Xβ + ε to a model involving the maximal invariant of Y,

    Ỹ = B'Y = B'ε = ε̃,    (4.2.5)

    where B = X⊥, which is independent of β, and applying Theorem 4.1.1.

    NOTE 1: Theorem 4.2.1 can be stated in a different form as follows. If f'θ is invariantly unbiasedly estimable, then its LMVIUE at θ₀ is f'θ̂ where θ̂ is a solution of

    [H_UI(θ₀)]θ = h_I(Y, θ₀)    (4.2.6)

    where H_UI(θ) and h_I(Y, θ) are defined in (4.2.2).

    NOTE 2: If θ admits invariant unbiased estimation, then as in Note 5 following Theorem 4.1.1 we may obtain the IMVIUE of (β, θ) as the solution of

    Xβ = P_θY,  [H_UI(θ)]θ = h_I(Y, θ).    (4.2.7)

    5. Minimum norm quadratic estimation (MINQE-theory)

    5.0. MINQE-principle

    In Section 4 we assumed a normal distribution for the random vector Y in the linear model and obtained the LMVUE of linear functions of variance components without imposing any restriction on the estimating function. However, we found that the estimators were all quadratic. In the present section we shall not make any distributional assumptions but confine our attention to the class of quadratic estimators and lay down some principles for deriving optimum estimators.

    Natural estimator: Consider a general random effects linear model

    Y = Xβ + U₁φ₁ + ⋯ + Uₚφₚ = Xβ + Uφ,

    E(φᵢ) = 0,  E(φᵢφᵢ') = θᵢI_{mᵢ},  E(φᵢφⱼ') = 0, i ≠ j,    (5.0.1)

    so that

    D(Y) = θ₁U₁U₁' + ⋯ + θₚUₚUₚ' = θ₁V₁ + ⋯ + θₚVₚ.


    It is convenient for later developments to write the error term in the form

    Uφ = U_*φ_* = U_{1*}φ_{1*} + ⋯ + U_{p*}φ_{p*}    (5.0.2)

    where U_{i*} = √αᵢ Uᵢ, φ_{i*} = φᵢ/√αᵢ, and αᵢ is an a priori value of θᵢ, so that the φ_{i*} are comparable in some sense. A natural estimator of θᵢ when φᵢ is known is θ̂ᵢ = αᵢφ_{i*}'φ_{i*}/mᵢ, and that of f'θ is

    f'θ̂ = φ_*'Nφ_*    (5.0.3)

    with a suitable choice of the matrix N. Suppose that the detailed structure of φ as in (5.0.1) is not specified but it is given that

    E(φφ') = θ₁F₁ + ⋯ + θₚFₚ,    (5.0.4)

    so that

    D(Y) = θ₁UF₁U' + ⋯ + θₚUFₚU' = θ₁V₁ + ⋯ + θₚVₚ.

    It is not clear how a natural estimator of f'θ can be defined in terms of φ in such a case. However, using prior values α₁, …, αₚ of θ₁, …, θₚ we may write

    Uφ = (UF_α^{1/2})(F_α^{−1/2}φ) = U_*φ_*    (5.0.5)

    where F_α = α₁F₁ + ⋯ + αₚFₚ, and define an estimator of f'θ as

    γ̂ = Σ μᵢφ_*'(F_α^{−1/2}FᵢF_α^{−1/2})φ_* = φ_*'Nφ_*  (say)    (5.0.6)

    where the μᵢ are chosen to make γ̂ unbiased for f'θ, i.e., μ₁, …, μₚ is a solution of the equations

    (tr FᵢF_α⁻¹F₁F_α⁻¹)μ₁ + ⋯ + (tr FᵢF_α⁻¹FₚF_α⁻¹)μₚ = fᵢ,  i = 1, …, p.

    A more general definition of a natural estimator in terms of ε, when the model is Y = Xβ + ε without specifying any structure for ε, is given in Section 5.4.

    MINQE-theory: Consider the general model (5.0.5) and a quadratic estimator γ̂ = Y'AY of f'θ. Now

    Y'AY = (φ_*', β') [U_*'AU_*, U_*'AX; X'AU_*, X'AX] (φ_*', β')'    (5.0.7)


    while the natural estimator is φ_*'Nφ_* as defined in (5.0.6). The difference between Y'AY and φ_*'Nφ_* is

    (φ_*', β') [U_*'AU_* − N, U_*'AX; X'AU_*, X'AX] (φ_*', β')'.    (5.0.8)

    The minimum norm quadratic estimator (MINQE) is the one obtained by minimizing an appropriately chosen norm of the matrix of the quadratic form in (5.0.8),

    [D₁₁, D₁₂; D₂₁, D₂₂] = [U_*'AU_* − N, U_*'AX; X'AU_*, X'AX].    (5.0.9)

    We shall consider mainly two kinds of norms, one a simple Euclidean norm

    tr D₁₁D₁₁ + 2 tr D₁₂D₂₁ + tr D₂₂D₂₂    (5.0.10)

    and another a weighted Euclidean norm

    tr D₁₁WD₁₁W + 2 tr D₁₂KD₂₁W + tr D₂₂KD₂₂K    (5.0.11)

    where W and K are n.n.d. matrices. The norm (5.0.11) gives different weights to φ_* and β in the quadratic form (5.0.8).

    We impose other restrictions on A (and indicate the MINQE so obtained by adding a symbol in brackets) such that Y'AY

    (a) is unbiased: MINQE(U),

    (b) is invariant for translation in β: MINQE(I),

    (c) satisfies both (a) and (b): MINQE(U, I),

    (d) is unbiased non-negative definite: MINQE(U, NND),

    (e) is invariant non-negative definite: MINQE(I, NND), etc.

    The properties of the estimator strongly depend on the norm chosen and the restrictions imposed. We also obtain a series of IMINQE's (iterated MINQE's) by repeatedly solving the MINQE equations, using the solutions at any stage as prior values for transforming the model as indicated below equation (5.0.5).


    5.1. MINQE(U, I)

    We consider the class of invariant unbiased quadratic estimators, i.e., of the form Y'AY where A belongs to the class

    𝒞_{fI} = {A: AX = 0, tr AVᵢ = fᵢ, i = 1, …, p}    (5.1.1)

    where X and the Vᵢ are as defined for the general model (5.0.5). We use the following notations and assumptions:

    T = (V_α + XX') > 0,  V_α = α₁V₁ + ⋯ + αₚVₚ,

    P_T = X(X'T⁻¹X)⁻X'T⁻¹,  M_T = (I − P_T),

    where α is a prior value of θ.

    THEOREM 5.1.1. If 𝒞_{fI} is not empty, then under the Euclidean norm (5.0.10), the MINQE(U, I) of f'θ is

    γ̂ = Σ λᵢY'AᵢY,  Aᵢ = T⁻¹M_TVᵢM_T'T⁻¹    (5.1.2)

    where λ = (λ₁, …, λₚ)' is any solution of

    [H_UI(α)]λ = f    (5.1.3)

    where H_UI(α) is the matrix (tr AᵢVⱼ).

    PROOF. Under the conditions (5.1.1), the square of the Euclidean norm in (5.0.10) becomes

    ‖U_*'AU_* − N‖² = tr(U_*'AU_*U_*'AU_*) − 2 tr(NU_*'AU_*) + tr NN.    (5.1.4)

    But N = Σ μᵢF_α^{−1/2}FᵢF_α^{−1/2} and tr(NU_*'AU_*) = Σ μᵢfᵢ under the constraints in (5.1.1), so that we need minimize only the expression

    tr AV_αAV_α = tr ATAT  for A ∈ 𝒞_{fI}.    (5.1.5)

    It is easy to show that (5.1.5) is minimized at A = A_* ∈ 𝒞_{fI} such that

    tr DTA_*T = 0  ∀D ∈ 𝒟    (5.1.6)

    where 𝒟 = {M_T'EM_T: tr(EM_TVᵢM_T') = 0, i = 1, …, p}.


    Then (5.1.6) ⟹ tr(EM_TTA_*TM_T') = 0 whenever tr(EM_TVᵢM_T') = 0, i = 1, …, p, which ⟹ TA_*T = Σ λᵢM_TVᵢM_T', which gives the solution (5.1.2). The equation for λ is obtained by expressing the condition of unbiasedness. Note that [H_UI(α)]λ = f is consistent iff 𝒞_{fI} is not empty. Also the solution (5.1.2) is independent of N.

    NOTE 1: An alternative expression for γ̂ given in (5.1.2) is

    γ̂ = Σ λᵢY'AᵢY,  Aᵢ = (MV_αM)⁺Vᵢ(MV_αM)⁺,    (5.1.7)

    where M = I − XX⁺. Note that H_UI(α) of (5.1.3) can be written as

    H_UI(α) = (tr (MV_αM)⁺Vᵢ(MV_αM)⁺Vⱼ).

    NOTE 2: When V_α is nonsingular, T can be replaced by V_α in Theorem 5.1.1. Then

    γ̂ = Σ λᵢY'AᵢY,  Aᵢ = V_α⁻¹M_{V_α}VᵢM_{V_α}'V_α⁻¹,    (5.1.8)

    in which case

    H_UI(α) = (tr M_{V_α}'V_α⁻¹VᵢM_{V_α}V_α⁻¹Vⱼ).

    NOTE 3: If Y is normally distributed, MINQE(U, I) is the LMVIUE of f'θ at values of θ where Σ θᵢVᵢ is proportional to V_α (see Theorem 4.1.1).

    NOTE 4: If in (5.1.4) we use the weighted Euclidean norm (5.0.11),

    ‖U_*'AU_* − N‖² = tr((U_*'AU_* − N)W(U_*'AU_* − N)W)    (5.1.9)

    where W is p.d., the solution may not be independent of N. The expression (5.1.9) can be written as

    tr AGAG − 2 tr AH + tr NWNW    (5.1.10)

    where G = (U_*WU_*' + XX') and H = U_*WNWU_*'. If G is nonsingular, then the minimum of (5.1.10) is attained at A equal to

    A_* = G⁻¹(Σ λᵢM_GVᵢM_G' + M_GHM_G')G⁻¹
        = Σ λᵢ(MGM)⁺Vᵢ(MGM)⁺ + (MGM)⁺H(MGM)⁺    (5.1.11)

    where the λᵢ are determined from the equations

    tr A_*Vᵢ = fᵢ,  i = 1, …, p.


    NOTE 5: It is seen from (5.1.2) that the estimate of f'θ can be written in the form f'θ̂ where θ̂ is a solution of

    [H_UI(α)]θ = h_I(Y, α)    (5.1.12)

    where the ith element of h_I(Y, α) is

    Y'AᵢY = Y'T⁻¹M_TVᵢM_T'T⁻¹Y    (5.1.13)

    and H_UI(α) is as defined in (5.1.3). If each component of θ admits invariant unbiased estimation, then H_UI(α) is non-singular and the MINQE(U, I) of θ is

    θ̂ = [H_UI(α)]⁻¹h_I(Y, α).    (5.1.14)

    NOTE 6: The computation of the MINQE(U, I) of θ involves the use of α, an a priori value of θ. If we have no prior information on θ, there are two possibilities. We may take α as a vector with all its elements unity. An alternative is to choose some α, compute (5.1.14), consider the result (say θ̂₁) as an a priori value of θ and repeat the computation of (5.1.14). The second round value, say θ̂₂, is an appropriate estimate of θ, which may be better than θ̂₁ if the initial choice α is very different from θ̂₁.

    We may repeat the process and obtain θ̂₃ choosing θ̂₂ as an a priori value, and so on. The limiting value, which satisfies the equation

    [H_UI(θ)]θ = h_I(Y, θ),    (5.1.15)

    is the IMINQE(U, I), the iterated MINQE(U, I), which is the same as the IMVIUE defined in (4.2.7). It is shown in Section 6 that eq. (5.1.15) is the marginal maximum likelihood (MML) equation considered by Patterson and Thompson (1975).

    5.2. MINQE(U)

    We drop invariance and consider only unbiasedness, as in problems such as those mentioned by Focke and Dewess (1972) where invariant estimators do not exist. In such problems it is advisable to use an a priori value β₀ of β, change Y to Y − Xβ₀ and β to (β − β₀), and work with the transformed model, in addition to the transformation indicated in (5.0.5). For unbiased estimators Y'AY of f'θ the matrix A belongs to

    𝒞_{fU} = {A: X'AX = 0, tr AVᵢ = fᵢ, i = 1, …, p}    (5.2.1)

    where X and the Vᵢ are as in the general model (5.0.5).


    THEOREM 5.2.1. Let T = V_α + XX' be p.d. If 𝒞_{fU} is not empty then the MINQE(U) under the Euclidean norm (5.0.10) is

    γ̂ = Σ λᵢY'AᵢY,  Aᵢ = T⁻¹(Vᵢ − P_TVᵢP_T')T⁻¹    (5.2.2)

    where λ = (λ₁, …, λₚ)' is any solution of

    [H_U(α)]λ = f    (5.2.3)

    where H_U(α) is the matrix (tr AᵢVⱼ).

    PROOF. Under (5.0.10) we have to minimize

    ‖U_*'AU_* − N‖² + 2‖U_*'AX‖²    (5.2.4)

    which, using (5.2.1), reduces to

    tr AV_αAV_α + 2 tr AV_αAXX' = tr ATAT,  T = V_α + XX'.    (5.2.5)

    The expression (5.2.5) attains a minimum at A = A_* iff

    tr DTA_*T = 0  ∀D ∈ 𝒟.    (5.2.6)

    Observing that D ∈ 𝒟 ⟹ D = E − P_TEP_T' for some E with tr EVᵢ = 0, i = 1, …, p, and following the arguments of Theorem 5.1.1, the expression for A_* is obtained as in (5.2.2).

    NOTE 1: We shall consider a few alternatives to the simple Euclidean norm. Focke and Dewess (1972) give different weights to the two terms in (5.2.4) as in (5.0.11). Choosing W = I and K = r²I, (5.2.5) becomes

    tr AV_αAV_α + 2r² tr AV_αAXX' = tr[A(V_α + r²XX')A(V_α + r²XX')].    (5.2.7)

    The constant r² determines the relative weights attached to β and φ. The solution obtained by minimizing (5.2.7) is called the r-MINQE(U), which is the same as (5.2.2) with T replaced by (V_α + r²XX').

    NOTE 2: The iterated estimates of β and MINQE(U) of θ are solutions of the equations

    X'V_θ⁻¹Xβ = X'V_θ⁻¹Y,  [H_U(θ)]θ = h_U(Y, θ)    (5.2.8)

    where

    h_U(Y, θ) = (Y'A₁Y, …, Y'AₚY)',    (5.2.9)

    and H_U(θ) and the Aᵢ are as defined in Theorem 5.2.1. The solution of (5.2.8) is represented by IMINQE(U).

    5.3. ∞-MINQE(U)

    In (5.2.7) we defined the r-MINQE(U), which uses a weighted Euclidean norm to provide differential weights to β and φ, and also suggested a translation in Y using a prior value of β. Actually we may consider a transformation which changes

    Y → Y − Xβ₀,  β → r⁻¹K^{−1/2}β

    where β₀ and r²K correspond to the a priori mean and dispersion of β. Then the Euclidean norm of (5.0.10) becomes

    tr A(V_α + r²XKX')A(V_α + r²XKX') = tr ATAT + 2(r² − 1) tr ATAXKX'    (5.3.1)

    where T = V_α + XKX'. Let us denote the optimal solution in such a case by A_r and define A₀ = lim A_r as r → ∞. If A₀ exists, we call the corresponding estimator Y'A₀Y the ∞-MINQE(U). The following theorem due to Focke and Dewess (1972) establishes the existence of the ∞-MINQE(U).

    THEOREM 5.3.1. Let 𝒱 be the set of linear combinations of V₁, …, Vₚ. Then:

    (i) ∞-MINQE(U) exists iff 𝒞_{fU} is not empty.

    (ii) A₀ is the unique matrix which minimizes tr ATAT in the class

    𝒢 = {A: A minimizes tr ATAXKX' subject to A ∈ 𝒞_{fU}}.    (5.3.2)

    Theorem 5.3.1 characterizes the ∞-MINQE(U) but does not provide a method of calculating it. Theorem 5.3.2, due to Kleffe (1977b), gives the necessary formula.

    THEOREM 5.3.2. Let 𝒞_{fU} be not empty and

    B = (tr (MV_αM)⁺Vᵢ(XKX')⁺Vⱼ).    (5.3.3)

    The ∞-MINQE(U) of f'θ is Y'A_*Y where

    A_* = (XKX')_*V_a(MV_αM)⁺ + (MV_αM)⁺V_a(XKX')_* + (MV_αM)⁺V_b(MV_αM)⁺,    (5.3.4)

    V_a = Σ aᵢVᵢ,  V_b = Σ bᵢVᵢ,

    (XKX')_* = T^{−1/2}(T^{−1/2}XKX'T^{−1/2})⁺T^{−1/2},

    and a = (a₁, …, aₚ)' and b = (b₁, …, bₚ)' satisfy the equations

    Qb + 2Ba = f,  Qa = 0,

    where

    Q = (tr (MV_αM)⁺Vᵢ(MV_αM)⁺Vⱼ) = H_UI(α).    (5.3.5)

    NOTE 1: It is interesting to note that the ∞-MINQE(U) is the same if, instead of the sequence r²K, we consider (Λ + r²K) for any Λ ≥ 0 (see Kleffe, 1977b).

    NOTE 2: The ∞-MINQE(U) coincides with the MINQE(U, I) if the latter exists (see Kleffe, 1979).

    5.4. MINQE without unbiasedness

    Let us consider the linear model

    Y = Xβ + ε,  E(εε') = θ₁V₁ + ⋯ + θₚVₚ = V_θ    (5.4.1)

    where V_θ is p.d. for each θ ∈ Θ. Choosing a prior value α of θ, (5.4.1) can be written as

    Y = Xβ + V_α^{1/2}ε_*    (5.4.2)

    where ε_* = V_α^{−1/2}ε and V_α = α₁V₁ + ⋯ + αₚVₚ. Using the definition (5.0.6) with ε_* as φ_*, a natural estimator of f'θ is

    γ̂ = ε_*'(Σ λᵢV_α^{−1/2}VᵢV_α^{−1/2})ε_* = ε_*'Nε_*    (5.4.3)

    where λ = (λ₁, …, λₚ)' is chosen such that ε_*'Nε_* is unbiased for f'θ, i.e., λ satisfies the equation [H(α)]λ = f where

    H(α) = (tr V_{i*}V_{j*}) = (tr V_α⁻¹VᵢV_α⁻¹Vⱼ).    (5.4.4)

    It is seen that (5.4.3) is the LMVUE of f'θ at θ = α when ε is normally distributed. The MINQE of f'θ is Y'AY where A is chosen to minimize

    ‖ [V_α^{1/2}AV_α^{1/2} − N, V_α^{1/2}AX; X'AV_α^{1/2}, X'AX] ‖.    (5.4.5)

    In Sections 5.1-5.3 we imposed the condition of unbiasedness on Y'AY. We withdraw this condition but consider some alternative restrictions on the symmetric matrix A, as defined by the following classes:

    𝒞 = {A},    (5.4.6)

    𝒞_{PU} = {A: X'AX = 0},    (5.4.7)

    𝒞_I = {A: AX = 0}.    (5.4.8)

    It is seen that when A ∈ 𝒞_{PU}, the bias in the estimator Y'AY is independent of the location parameter β, so the estimator is partially unbiased (PU). The MINQE's obtained subject to the restrictions (5.4.6)-(5.4.8) are represented by MINQE, MINQE(PU) and MINQE(I) respectively. The following general results are reported in Rao (1979).

    (i)

    (ii)

    (iii)

    = V~-I ( I - P~)W( I - P,,) V~ -1

    where M= I - X (X 'X) -X ' .

    PROOF. Under Euclidean norm, the square of (5.4.5) is

    tr( 1/2 V~ "A V2/2 - N) 2 + 2 tr(X'A V~AX) + tr( X 'AX) 2.

    MINQE: A. = ( V~ + XX') -1 W( V~ + XX')-1, (5.4.9)

    MINQE(PU): A. = ( V~ + XX') -1( W- P~ WP,)( V, + XX' ) - ' ,

    P: = X( X ' V ,X) -X ' V d- 1, (5.4.10)

    MINQE( I ) :A . = ( MV, M ) + W( MV, M ) +

    (5.4.11)

    (5.4.12)


    Without any restriction on A, the minimum of (5.4.12) is attained at A_* iff

    tr((V_α^{1/2}A_*V_α^{1/2} − N)V_α^{1/2}BV_α^{1/2}) + 2 tr(X'A_*V_αBX) + tr(X'A_*XX'BX) = 0    (5.4.13)

    for any symmetric matrix B. Then A_* satisfies the equation

    V_α^{1/2}(V_α^{1/2}A_*V_α^{1/2} − N)V_α^{1/2} + XX'A_*V_α + V_αA_*XX' + XX'A_*XX' = 0

    or

    (V_α + XX')A_*(V_α + XX') = V_α^{1/2}NV_α^{1/2} = Σ λᵢVᵢ = W,

    A_* = (V_α + XX')⁻¹W(V_α + XX')⁻¹,

    which is the matrix given in (5.4.9).

    If A is subject to the restriction X'AX = 0, then (5.4.13) must hold when B is replaced by B − P_α'BP_α, where P_α is defined in (5.4.10). Then, arguing as above and noting that P_αV_α = V_αP_α', the equation for A_* is

    (V_α + XX')A_*(V_α + XX') = Σ λᵢ(Vᵢ − P_αVᵢP_α')

    or

    A_* = (V_α + XX')⁻¹(W − P_αWP_α')(V_α + XX')⁻¹,

    which is the matrix given in (5.4.10).

    If A is subject to the condition AX = 0, then (5.4.13) must hold when B is replaced by MBM where M = I − P. Then A_* satisfies the equation

    (MV_αM)A_*(MV_αM) = MWM

    or

    A_* = (MV_αM)⁺W(MV_αM)⁺ = V_α⁻¹(I − P_α)W(I − P_α)'V_α⁻¹,

    which is the matrix given in (5.4.11).
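    As a computational sketch, the MINQE(I) in (5.4.11) can be evaluated directly (this helper is illustrative; it assumes V_α is p.d. and that [H(α)]λ = f is solvable):

```python
import numpy as np

def minqe_i(Y, X, Vs, alpha, f):
    """MINQE(I) of f'theta per (5.4.11): solve H(a) lam = f with
    H(a) = (tr Va^-1 Vi Va^-1 Vj), set W = sum_i lam_i V_i, and
    return Y' (M Va M)^+ W (M Va M)^+ Y with M = I - X X^+."""
    n = len(Y)
    Va = sum(a * Vi for a, Vi in zip(alpha, Vs))
    Vinv = np.linalg.inv(Va)
    H = np.array([[np.trace(Vinv @ Vi @ Vinv @ Vj) for Vj in Vs] for Vi in Vs])
    lam = np.linalg.solve(H, np.asarray(f, dtype=float))
    W = sum(l * Vi for l, Vi in zip(lam, Vs))
    M = np.eye(n) - X @ np.linalg.pinv(X)
    G = np.linalg.pinv(M @ Va @ M)
    return Y @ (G @ W @ G) @ Y
```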

    NOTE 1: The MINQE in (5.4.9) and the MINQE(I) in (5.4.11) are automatically non-negative when the natural estimator is non-negative, while the MINQE(PU) may not be.

    NOTE 2: The MINQE(I) of f'θ given in (5.4.11) can be written as f'θ̂ where θ̂ is a solution of

    [H(α)]θ = h_I(Y, α)    (5.4.14)


    where H(α) is as defined in (5.4.4) and the ith element of h_I(Y, α) is

    Y'V_α⁻¹(I − P_α)Vᵢ(I − P_α)'V_α⁻¹Y.    (5.4.15)

    The eq. (5.4.14) is consistent. If θ is identifiable, then H(α) is non-singular, in which case θ̂ = [H(α)]⁻¹h_I(Y, α). This form of the solution enables us to obtain the IMINQE(I), i.e., the iterated MINQE(I), by writing θ̂₁ = [H(α)]⁻¹h_I(Y, α) and obtaining a second stage estimate θ̂₂ with α replaced by θ̂₁. The limiting solution, if the process converges, satisfies the equation

    [H(θ)]θ = h_I(Y, θ),    (5.4.16)

    which is shown to be the maximum likelihood equation in Section 6.

    The estimators (5.4.9)-(5.4.11) depend on the choice of the natural estimator (5.4.3), unlike the unbiased MINQE's considered in Sections 5.1-5.3. The condition of unbiasedness eliminated the terms which depended on the natural estimator in the norm to be minimized and provided estimators free of the choice of the natural estimator, although the concept of a natural estimator was useful in formulating the MINQE principle.

    In choosing the natural estimator (5.4.3) we did not consider any structure for the error term in the linear model (5.4.1). Now suppose that ε = Uφ where E(φφ') = θ₁F₁ + ⋯ + θₚFₚ as considered in (5.0.1), and we choose the natural estimator as in (5.0.3),

    φ_*'N₁φ_* = φ_*'(Σ μᵢF_α^{−1/2}FᵢF_α^{−1/2})φ_*    (5.4.17)

    where φ_* = F_α^{−1/2}φ and μ' = (μ₁, …, μₚ) satisfies the equation

    [tr(FᵢF_α⁻¹FⱼF_α⁻¹)]μ = f.    (5.4.18)

    In such a case the norm to be minimized is

    ‖ [U_*'AU_* − N₁, U_*'AX; X'AU_*, X'AX] ‖    (5.4.19)

    where U_* = UF_α^{1/2}. The expressions for the MINQE's obtained by minimizing (5.4.19) are the same as those given in (5.4.9)-(5.4.11) except that W = Σ μᵢVᵢ instead of Σ λᵢVᵢ. It may be noted that λ satisfies the equation [H(α)]λ = f where H(α) is as defined in (5.4.4), and λ may not be equal to μ, which is a solution of (5.4.18). In some problems, like the estimation of heteroscedastic variances considered by P.S.R.S. Rao and Chaubey (1978), λ = μ. The properties of estimators based on λ and μ need investigation.

    5.5. MINQE(NND): non-negative definite estimators

    In the general variance components model, we admitted the possibility of some of the parameters being negative. But there are cases, such as the random effects model, where the variance components are non-negative and it may be desirable to have non-negative estimators for them. The estimators considered so far, except some of those in Section 5.4, can assume negative values although the parametric function is non-negative. In this section we explore the possibility of obtaining unbiased quadratic estimators γ̂ = Y'AY with A ≥ 0 of parametric functions f'θ which are non-negative for θ ∈ Θ in a general model. A MINQE in this class is denoted by MINQE(U, NND), where NND stands for non-negative definiteness of the quadratic estimator.

    The following lemma characterizes the nature of the matrix A if γ̂ has to be unbiased and non-negative (see Pukelsheim, 1977, for proofs of various results in this section).

    LEMMA 5.5.1. A non-negative and unbiased quadratic estimator Y'AY satisfies the invariance condition, i.e., AX = 0.

    PROOF. Unbiasedness ⟹ X'AX = 0 ⟹ AX = 0, since A ≥ 0.

    In view of Lemma 5.5.1 we need only consider the class of matrices

    𝒞_{fD} = {A: A ≥ 0, AX = 0, tr AVᵢ = fᵢ, i = 1, …, p}.    (5.5.1)

    Further, because of invariance we can work with the transformed model

    t = Z'Y = Z'ε,  E(t) = 0,  E(tt') = θ₁B₁ + ⋯ + θₚBₚ    (5.5.2)

    where Z = X⊥ (with full rank, say s) and Bᵢ = Z'VᵢZ, i = 1, …, p. We need consider quadratic estimators γ̂ = t'Ct where C belongs to the class

    𝒞_D = {C: C ≥ 0, tr CBᵢ = fᵢ}.    (5.5.3)

    LEMMA 5.5.2. 𝒞_{fD} is not empty iff

    f ∈ convex span{q(b): b ∈ Rⁿ}    (5.5.4)

    where q(b) = (b'MV₁Mb, …, b'MVₚMb)'.


    NOTE: In terms of the model (5.5.2), the condition (5.5.4) is

    f ∈ convex span{q̃(b): b ∈ Rˢ}    (5.5.5)

    where q̃(b) = (b'B₁b, …, b'Bₚb)'. The conditions (5.5.4) and (5.5.5) are rather complicated, but simple results can be obtained if we assume V₁, …, Vₚ to be n.n.d.

    THEOREM 5.5.1. Let Vᵢ ≥ 0, i = 1, …, p, V = Σ Vᵢ, V₍ⱼ₎ = V − Vⱼ, and let the Bᵢ be as defined in (5.5.2). There exists an n.n.d. quadratic unbiased estimator of θⱼ iff

    S(MVⱼM) ⊄ S(MV₍ⱼ₎M).    (5.5.6)

    If S(MVⱼM) ⊂ S(MV₍ⱼ₎M), then θⱼ is not non-negatively estimable. Further, if Vⱼ > 0, then θᵢ, i ≠ j, is not non-negatively estimable.

    However, let us assume that 𝒞_{fD} is not empty for a given f and estimate f'θ by the MINQE principle. For this purpose we have to minimize

    ‖A‖² = tr AV_αAV_α  for A ∈ 𝒞_{fD}.    (5.5.9)

    This appears to be a difficult problem in the general case. Of course, if the MINQE(U, I) turns out to be a non-negative estimator in any given situation, it is automatically the MINQE(U, NND). It may also be noted that if sp{MV₁M, …, MVₚM} is a quadratic subspace with respect to (MVM)⁺, then the MINQE(U, I) of f'θ is n.n.d. iff 𝒞_{fD} is not empty.

    Since 𝒞_{fD} is a convex set, we proceed as follows to solve the problem (5.5.9). The minimum is attained at A_* iff

    tr BV_αA_*V_α ≥ tr A_*V_αA_*V_α  ∀B ∈ 𝒞_{fD}    (5.5.10)

    or, writing B = A_* + D, the condition (5.5.10) becomes

    tr DV_αA_*V_α ≥ 0  ∀D ∈ 𝒟,    (5.5.11)

    𝒟 = {D: DX = 0, A_* + D ≥ 0, tr DVᵢ = 0, i = 1, …, p}.    (5.5.12)

    A general solution for (5.5.11) cannot be explicitly written down, but the formula is useful in examining whether any guessed solution for A_* provides a MINQE(U, NND). We shall consider some special cases.

    THEOREM 5.5.2. Let Vᵢ ≥ 0, i = 1, …, p, and let θⱼ be estimable, i.e., let the condition (5.5.6) be satisfied. Then the MINQE(U, NND) of θⱼ is

    θ̂ⱼ = (1/R(Aⱼ)) Y'AⱼY,  Aⱼ = [(I − G)Vⱼ(I − G)]⁺,    (5.5.13)

    where R(Aⱼ) denotes the rank of Aⱼ and G is the projection operator onto the space generated by the columns of (X, V₁, …, Vⱼ₋₁, Vⱼ₊₁, …, Vₚ).

    An alternative approach to the problem (5.5.9), based on standard methods of convex programming, is provided by Pukelsheim (1977, 1978a, b).

    We define the functional

    g(B) = min_{A ∈ 𝒞_{fI}} (‖A‖² − (A, B))    (5.5.14)

    where 𝒞_{fI} is the class defined in (5.1.1), ‖A‖² = tr AV_αAV_α and (A, B) = tr AV_αBV_α with V_α > 0, and call the problem

    sup_{B ≥ 0} g(B)    (5.5.15)

    the dual optimization problem.


    LEMMA 5.5.3. Let A_* ∈ 𝒞_{fD} and B_* ≥ 0 be such that

    ‖A_*‖² = g(B_*).    (5.5.16)

    Then:

    (i) A_* and B_* are optimal solutions of (5.5.9) and (5.5.15).

    (ii) (A_*, B_*) = 0.    (5.5.17)

    NOTE: g(B) is bounded above since

    ‖A_*‖² ≥ g(B)  for all B ≥ 0.    (5.5.18)

    For obtaining a satisfactory solution to the problem (5.5.9) we need an explicit expression for g(B). We obtain this in terms of Ā, where Y'ĀY is the MINQE(U, I) of f'θ. Let us note that any matrix B (= B') can be decomposed in terms of symmetric matrices as

    B = B̄ + (B − B̄)

    such that B̄ ∈ 𝒞_I and (B̄, B − B̄) = 0. The matrix B̄ is simply the projection of B onto the subspace 𝒞_I in the space of symmetric matrices with the inner product (·, ·) as defined in (5.5.14). We note that, by construction, Ā is such that

    (Ā, B − B̄) = 0  for any given B.    (5.5.19)

    THEOREM 5.5.3. Let Y'ĀY be the MINQE(U, I) of f'θ and let 𝒞_{fD} be not empty. Then:

    (i) g(B) = ‖Ā‖² − (Ā, B) − ¼‖B̄‖²,    (5.5.20)

    (ii) B_* ≥ 0 is optimal (i.e., maximizes g(B)) iff

    Ā + ½B̄_* ≥ 0,  (Ā + ½B̄_*, B_*) = 0,    (5.5.21)

    (iii) A_* = Ā + ½B̄_*    (5.5.22)

    is a solution to (5.5.9), i.e., provides the MINQE(U, NND) of f'θ, and (A_*, B_*) = 0.


    The results of Theorem 5.5.3 do not provide a computational technique for obtaining A_*. Pukelsheim (1978a, b, c) proposed an iterative scheme which seems to work well in many problems.

    6. Maximum likelihood estimation

    6.1. The general model

    We consider the general GM model

$Y = X\beta + \varepsilon, \quad E(\varepsilon\varepsilon') = \theta_1V_1 + \cdots + \theta_pV_p = V_\theta$ (6.1.1)

and discuss the maximum likelihood estimation of $\theta$ under the assumption

$Y \sim N(X\beta, V_\theta), \quad \beta \in R^m, \quad \theta \in \Theta$. (6.1.2)

We assume that $V_\theta$ is p.d. for $\theta \in \Theta$. Harville (1977) has given a review of the ML estimation of $\theta$, describing the contributions made by Anderson (1973), Hartley and Rao (1967), Henderson (1977), Patterson and Thompson (1975), Miller (1977, 1979) and others. We discuss these methods and make some additional comments.

The log likelihood of the unknown parameters $(\beta, \theta)$ is proportional to

$l(\beta, \theta, Y) = -\log|V_\theta| - (Y - X\beta)'V_\theta^{-1}(Y - X\beta)$. (6.1.3)

The proper ML estimator of $(\beta, \theta)$ is a value $(\hat\beta, \hat\theta)$ such that

$l(\hat\beta, \hat\theta, Y) = \sup_{\beta,\ \theta \in \Theta} l(\beta, \theta, Y)$. (6.1.4)

    Such an estimator does not exist in the important case considered by Focke and Dewess (1972). In the simple version of their problem there are two random variables

$Y_1 = \mu + e_1, \quad Y_2 = \mu + e_2,$

$E(e_1^2) = \sigma_1^2, \quad E(e_2^2) = \sigma_2^2, \quad E(e_1e_2) = 0.$ (6.1.5)

The likelihood based on $Y_1$ and $Y_2$ is

$-\log\sigma_1 - \log\sigma_2 - \frac{(Y_1 - \mu)^2}{2\sigma_1^2} - \frac{(Y_2 - \mu)^2}{2\sigma_2^2}$ (6.1.6)

  • Estimation of variance components 33

which can be made arbitrarily large by choosing $\mu = Y_1$ and letting $\sigma_1 \to 0$, so that no proper MLE exists. The ML equations obtained by equating the derivatives of (6.1.6) to zero are

$\sigma_1^2 = (Y_1 - \mu)^2, \quad \sigma_2^2 = (Y_2 - \mu)^2, \quad \mu\left(\frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2}\right) = \frac{Y_1}{\sigma_1^2} + \frac{Y_2}{\sigma_2^2}$ (6.1.7)

which imply $\sigma_1 = \sigma_2$. Thus the ML approach fails to provide acceptable estimators. However, in the example (6.1.5), all the parameters are identifiable and MINQE(U) of $\sigma_1^2$ and $\sigma_2^2$ exist. A similar problem arises in estimating $\sigma_\gamma^2$ and $\sigma_e^2$ in the model $Y = X\beta + X\gamma + e$ where $E(\gamma\gamma') = \sigma_\gamma^2 I_m$, $E(ee') = \sigma_e^2 I_n$ and $E(\gamma e') = 0$.

    It is well-known that ML estimators of variance components are heavily biased in general and in some situations considered by Neyman and Scott (1948), they are not even consistent. In such cases, the use of ML estimators for drawing inferences on individual parameters may lead to gross errors, unless the exact distribution of the ML estimators is known. These drawbacks and the computational difficulties involved in obtaining the ML estimators place some limitations on the use of the ML method in practical problems.

    6.2. Maximum likelihood equations

For $\theta \in \Theta$ such that $V_\theta > 0$ (i.e., p.d.), the log likelihood of $(\beta, \theta)$ is

$l(\beta, \theta, Y) = -\log|V_\theta| - (Y - X\beta)'V_\theta^{-1}(Y - X\beta)$. (6.2.1)

Taking derivatives of (6.2.1) w.r.t. $\beta$ and $\theta_i$ and equating them to zero, we get the ML equations

$X'V_\theta^{-1}X\beta = X'V_\theta^{-1}Y$, (6.2.2)

$\operatorname{tr} V_\theta^{-1}V_i = (Y - X\beta)'V_\theta^{-1}V_iV_\theta^{-1}(Y - X\beta), \quad i = 1, \ldots, p.$ (6.2.3)

Substituting for $\beta$ in (6.2.3) from (6.2.2), the equations become

$X\beta = P_\theta Y, \quad P_\theta = X(X'V_\theta^{-1}X)^-X'V_\theta^{-1},$ (6.2.4)

$[H(\theta)]\theta = h_I(Y, \theta)$ (6.2.5)

where $H(\theta) = (\operatorname{tr} V_\theta^{-1}V_iV_\theta^{-1}V_j)$ is the matrix defined in (5.4.4) and the $i$th element of $h_I(Y, \theta)$ is

$Y'(I - P_\theta)'V_\theta^{-1}V_iV_\theta^{-1}(I - P_\theta)Y$ (6.2.6)

    which is the same as the expression defined in (5.4.15). We make a few comments on the eqs. (6.2.4) and (6.2.5).

(i) The ML equation (6.2.5) is the same as that for IMINQE(I) given in (5.4.15).

(ii) The original likelihood eq. (6.2.3) is unbiased, while eq. (6.2.5), which provides a direct estimate of $\theta$, is not, in the sense that

$E[h_I(Y, \theta)] \ne [H(\theta)]\theta$. (6.2.7)

An alternative to the eq. (6.2.5) is the one obtained by equating $h_I(Y, \theta)$ to its expectation

$h_I(Y, \theta) = E[h_I(Y, \theta)] = [H_{UI}(\theta)]\theta$ (6.2.8)

    which is the marginal ML (MML) equation suggested by Patterson and Thompson (1975).

(iii) There may be no solution to (6.2.5) in the admissible set $\Theta$ to which $\theta$ belongs. This may happen when the supremum of the likelihood is attained at a boundary point of $\Theta$.

(iv) It is interesting to note that the ML estimate of $\theta$ is invariant under translation of $Y$ by $X\alpha$ for any $\alpha$, i.e., the MLE is a function of the maximal invariant $B'Y$ of $Y$, where $B = X^\perp$.

Suppose $\theta$ in the model (6.1.1) is identifiable on the basis of the distribution of $Y$ in the sense:

$\theta_1V_1 + \cdots + \theta_pV_p = \theta_1^*V_1 + \cdots + \theta_p^*V_p \Rightarrow \theta_i - \theta_i^* = 0$ for all $i$,

i.e., the $V_i$ are linearly independent (see Bunke and Bunke, 1974). But it may happen, as in the case of the example of Focke and Dewess (1972), that $\theta$ is no longer identifiable when we consider only the distribution of $B'Y$, the maximal invariant of $Y$. Such a situation arises when the $B'V_iB$ are linearly dependent while the $V_i$ are not. In such cases the ML method is not applicable, while the MINQE(U) developed in Section 5.2 can be used. Thus, the invariance property of the MLE limits the scope of application of the ML method.

(v) Computational algorithms: The eq. (6.2.5) for the estimation of $\theta$ is, in general, very complicated and no closed-form solution is possible. One has to adopt iterative procedures. Harville (1977) has reviewed some of the existing methods.

  • Estimation of variance components 35

(a) If $\theta_k$ is the $k$th approximation to the solution of (6.2.5), then the $(k+1)$th approximation is

$\theta_{k+1} = [H(\theta_k)]^{-1}h_I(Y, \theta_k)$ (6.2.9)

as suggested for IMINQE(I), provided $\theta$ is identifiable; otherwise, the $H$ matrix in (6.2.5) is not invertible. Iterative procedures of the type (6.2.9) are mentioned by Anderson (1973), Harville (1977), LaMotte (1973) and Rao (1972) in different contexts. However, it is not known whether the procedure (6.2.9) converges and provides a solution at which the supremum of the likelihood is attained.

(b) Hartley and Rao (1967), Henderson (1977) and Harville (1977) proposed algorithms suitable for the special case when one of the $V_i$ is an identity matrix (or at least non-singular). An extension of their method to the general case is to obtain the $(k+1)$th approximation of the $i$th component of $\theta$ as

$\theta_{i,k+1} = \theta_{i,k}\,\frac{Y'(I - P_{\theta_k})'V_{\theta_k}^{-1}V_iV_{\theta_k}^{-1}(I - P_{\theta_k})Y}{\operatorname{tr} V_{\theta_k}^{-1}V_i}, \quad i = 1, \ldots, p.$ (6.2.10)

In the special case when the $V_i$ are non-negative definite and the initial $\theta_i$ are chosen as non-negative, the successive approximations of $\theta_i$ using the algorithm (6.2.10) stay non-negative. This may be a "good property" of the algorithm, but it is not clear what happens when the likelihood eq. (6.2.5) does not have a solution in the admissible region.

(c) Hemmerle and Hartley (1973) and Goodnight and Hemmerle (1978) developed the method of the W-transformation for solving the ML equations. Miller (1979) has given a different approach. Possibilities of using the variable-metric algorithms of Davidon-Fletcher-Powell described by Powell (1970) are mentioned by Harville (1977). As it stands, further research is necessary for finding a satisfactory method of solving the eq. (6.2.5) and ensuring that the solution provides a maximum of the likelihood.

6.3. Marginal maximum likelihood equation

    As observed earlier the ML eq. (6.2.5) is not unbiased, in the sense

$E[h_I(Y, \theta)] \ne [H(\theta)]\theta$. (6.3.1)

  • 36 C. Radhakrishna Rao and Jiirgen Kleffe

    If we replace the eq. (6.2.5) by

$h_I(Y, \theta) = E[h_I(Y, \theta)] = [H_{UI}(\theta)]\theta$, (6.3.2)

we obtain the IMINQE(U,I) defined in (5.1.14), which is the same as the IMVIUE defined in (4.2.7).

The eq. (6.3.2) is obtained by Patterson and Thompson (1975) by maximizing the likelihood of $\theta$ based on $T'Y$, where $T$ is any choice of $X^\perp$, so that $T'Y$ is the maximal invariant of $Y$. Now

$l(\theta, T'Y) = -\log|T'V_\theta T| - Y'T(T'V_\theta T)^{-1}T'Y$. (6.3.3)

Differentiating (6.3.3) w.r.t. $\theta_i$, we obtain the MML (marginal ML) equation

$\operatorname{tr}(T(T'V_\theta T)^{-1}T'V_i) = Y'T(T'V_\theta T)^{-1}T'V_iT(T'V_\theta T)^{-1}T'Y, \quad i = 1, \ldots, p.$ (6.3.4)

    Using the identity (C.R. Rao, 1973, p.77)

$T(T'V_\theta T)^{-1}T' = V_\theta^{-1} - V_\theta^{-1}X(X'V_\theta^{-1}X)^-X'V_\theta^{-1} = V_\theta^{-1}(I - P_\theta)$ (6.3.5)

    eq. (6.3.4) becomes

$\operatorname{tr}(V_\theta^{-1}(I - P_\theta)V_i) = Y'V_\theta^{-1}(I - P_\theta)V_i(I - P_\theta)'V_\theta^{-1}Y, \quad i = 1, \ldots, p$ (6.3.6)

which is independent of the choice of $T = X^\perp$ used in the construction of the maximal invariant of $Y$. It is easy to see that (6.3.6) can be written as

$[H_{UI}(\theta)]\theta = h_I(Y, \theta)$ (6.3.7)

which is eq. (6.3.2).

(i) Both ML and MML estimates depend on the maximal invariant $T'Y$ of $Y$. Neither method is applicable when $\theta$ is not identifiable on the basis of $T'Y$.

    (ii) The bias in MMLE may not be as heavy as in MLE and MMLE may be more useful as a point estimator.

(iii) The solution of (6.3.7) may not lie in the admissible set of $\theta$, as in the case of the ML equation.

  • Estimation of variance components 37

(iv) If $\theta_k$ is the $k$th approximation, then the $(k+1)$th approximation can be obtained as

$\theta_{k+1} = [H_{UI}(\theta_k)]^{-1}h_I(Y, \theta_k).$ (6.3.8)

    It is not known whether the process converges and yields a solution which maximizes the marginal likelihood.

(v) Another algorithm for the MMLE, similar to (6.2.10), is to compute the $(k+1)$th approximation to the $i$th component of $\theta$ as

$\theta_{i,k+1} = \theta_{i,k}\,\frac{Y'V_{\theta_k}^{-1}(I - P_{\theta_k})V_i(I - P_{\theta_k})'V_{\theta_k}^{-1}Y}{\operatorname{tr} V_{\theta_k}^{-1}(I - P_{\theta_k})V_i}.$ (6.3.9)

It is seen that both ML and MML estimators can be obtained as iterated MINQE's, the MLE being the IMINQE(I) defined in (5.4.16) and the MMLE being the IMINQE(U,I) defined in (5.1.14). There are other iterated MINQE's which can be used in cases where the ML and MML methods are not applicable.

It has been remarked by various authors that MINQE involves heavy computations, requiring the inversion of large matrices, and this argument is put forward against the use of MINQE. These authors overlook the fact that the inversion of a large matrix can often be reduced to the inversion of matrices of smaller order. For instance, if $V_\theta$ is of the form $(I + UDU')$, then it is well known that

$V_\theta^{-1} = I - U(U'U + D^{-1})^{-1}U'$ (6.3.10)

which can be used to compute $V_\theta^{-1}$ if the matrix $(U'U + D^{-1})$ is of comparatively smaller order than $V_\theta$. It may be noted that the computational complexity is of the same order for MINQE as for MLE and MMLE.

    References

Ahrens, H. (1978). MINQUE and ANOVA estimator for one way classification – a risk comparison. Biometrical J. 20, 535-556.

Ahrens, H., Kleffe, J. and Tensler, R. (1979). Mean squared error comparisons for MINQUE, ANOVA and two alternative estimators under the balanced one way random model. Tech. Rep. P-19/79, Akademie der Wissenschaften der DDR.

Anderson, R. L. (1975). Designs and estimators for variance components. In: J. N. Srivastava, ed., A Survey of Statistical Design and Linear Models. pp. 1-29.

Anderson, R. L. and Crump, P. P. (1967). Comparisons of designs and estimation procedures for estimating parameters in a two stage nested process. Technometrics 9, 499-516.

Anderson, T. W. (1973). Asymptotically efficient estimation of covariance matrices with linear structure. Ann. Statist. 1, 135-141.

Brownlee, K. A. (1953). Industrial Experimentation. Chemical Publishing Co.

Bunke, H. and Bunke, O. (1974). Identifiability and estimability. Math. Operationsforsch. Statist. 5, 223-233.

Cochran, W. G. (1939). The use of the analysis of variance in enumeration by sampling. J. Am. Statist. Assoc. 34, 492-510.

Fairfield Smith, H. (1936). A discriminant function for plant selection. Ann. Eugenics (London) 7, 240-260.

Fisk, P. R. (1967). Models of the second kind in regression analysis. J. Roy. Statist. Soc. B 29, 235-244.

Focke, J. and Dewess, G. (1972). Über die Schätzmethode MINQUE von C. R. Rao und ihre Verallgemeinerung. Math. Operationsforsch. Statist. 3, 129-143.

Fuller, W. A. and Rao, J. N. K. (1978). Estimation for a linear regression model with unknown diagonal covariance matrix. Ann. Statist. 6, 1149-1158.

Goodnight, J. H. and Hemmerle, W. J. (1978). A simplified algorithm for the W-transformation in variance component estimation. SAS Tech. Rep. R-104, Raleigh, NC.

Hartley, H. O. and Rao, J. N. K. (1967). Maximum likelihood estimation for the mixed analysis of variance model. Biometrika 54, 93-108.

Harville, D. A. (1977). Maximum likelihood approaches to variance component estimation and to related problems. J. Am. Statist. Assoc. 72, 320-340.

Hemmerle, W. J. and Hartley, H. O. (1973). Computing maximum likelihood estimates for the mixed AOV model using the W-transformation. Technometrics 15, 819-831.

Henderson, C. R. (1950). Estimation of genetic parameters (Abstract). Ann. Math. Statist. 21, 309-310.

Henderson, C. R. (1953). Estimation of variance and covariance components. Biometrics 9, 226-252.

Henderson, C. R. (1977). Prediction of future records. In: Proc. Int. Conf. on Quantitative Genetics. pp. 616-638.

Hildreth, C. and Houck, J. P. (1968). Some estimators for a linear model with random coefficients. J. Am. Statist. Assoc. 63, 584-595.

Infante, A. (1978). Die MINQUE-Schätzung bei Verlaufskurvenmodellen mit zufälligen Regressionskoeffizienten. Thesis, Dortmund (FRG).

Kleffe, J. (1975). Quadratische Bayes-Schätzungen für lineare Parameter der Kovarianzmatrix im gemischten linearen Modell. Dissertation, Humboldt Univ., Berlin.

Kleffe, J. (1976). Best quadratic unbiased estimators for variance components in mixed linear models. Sankhya B 38, 179-186.

Kleffe, J. (1977a). Invariant methods for estimating variance components in mixed linear models. Math. Operationsforsch. Statist. 8, 233-250.

Kleffe, J. (1977b). A note on ∞-MINQUE in variance covariance components models. Math. Operationsforsch. Statist. 8, 337-343.

Kleffe, J. (1978). Simultaneous estimation of expectation and covariance matrix in linear models. Math. Oper. Statist. Ser. Statist. 9, 443-478.

Kleffe, J. (1979). C. R. Rao's MINQUE for replicated and multivariate observations. Tech. Rep., ZIMM der AdW der DDR, Berlin.

Kleffe, J. (1980). C. R. Rao's MINQUE under four two way ANOVA models. Biometrical J. 21, in press.

Kleffe, J. and Pincus, R. (1974a). Bayes and best quadratic unbiased estimators for parameters of the covariance matrix in a normal linear model. Math. Operationsforsch. Statist. 5, 47-67.

Kleffe, J. and Pincus, R. (1974b). Bayes and best quadratic unbiased estimators for variance components and heteroscedastic variances in linear models. Math. Operationsforsch. Statist. 5, 147-159.

Kleffe, J. and Zöllner, I. (1978). On quadratic estimation of heteroscedastic variances. Math. Oper. Statist. Ser. Statist. 9, 27-44.

Krishnaiah, P. R. and Lee, Jack C. (1974). On covariance structures. Sankhya 38A, 357-371.

LaMotte, L. R. (1973). Quadratic estimation of variance components. Biometrics 29, 311-330.

Miller, J. J. (1977). Asymptotic properties of maximum likelihood estimates in the mixed model of analysis of variance. Ann. Statist. 5, 746-762.

Miller, J. J. (1979). Maximum likelihood estimation of variance components – a Monte Carlo study. J. Statist. Comput. Simulation 8, 175-190.

Neyman, J. and Scott, E. (1948). Consistent estimates based on partially consistent observations. Econometrica 16, 1-32.

Olsen, A., Seely, J. and Birkes, D. (1976). Invariant quadratic unbiased estimation for two variance components. Ann. Statist. 4, 878-890.

Panse, V. G. (1946). An application of the discriminant function for selection in poultry. J. Genetics (London) 47, 242-253.

Patterson, H. D. and Thompson, R. (1975). Maximum likelihood estimation of components of variance. In: Proc. of 8th International Biometric Conference. pp. 197-207.

Pincus, R. (1974). Estimability of parameters of the covariance matrix and variance components. Math. Operationsforsch. Statist. 5, 245-248.

Powell, M. J. D. (1970). A survey of numerical methods for unconstrained optimization. SIAM Rev. 12, 79-97.

Pukelsheim, F. (1977). Linear models and convex programs: Unbiased non-negative estimation in variance component models. Tech. Rep. 104, Stanford University.

Pukelsheim, F. (1978a). Examples for unbiased non-negative estimation in variance component models. Tech. Rep. 113, Stanford University.

Pukelsheim, F. (1978b). On the geometry of unbiased non-negative definite quadratic estimation in variance component models. In: Proc. VI-th International Conference on Math. Statist., Poland.

Pukelsheim, F. (1978c). On the existence of unbiased non-negative estimates of variance components. Tech. Rep., Inst. Math. Stat., Univ. of Freiburg.

Rao, C. R. (1947). General methods of analysis for incomplete block designs. J. Am. Statist. Assoc. 42, 541-561.

Rao, C. R. (1953). Discriminant function for genetic differentiation and selection. Sankhya 12, 229-246.

Rao, C. R. (1956). On the recovery of interblock information in varietal trials. Sankhya 17, 105-114.

Rao, C. R. (1965). The theory of least squares when the parameters are stochastic and its application to the analysis of growth curves. Biometrika 52, 447-458.

Rao, C. R. (1967). Least squares theory using an estimated dispersion matrix and its application to measurement of signals. In: Proc. Fifth Berkeley Symposium, Vol. 1. pp. 355-372.

Rao, C. R. (1970). Estimation of heteroscedastic variances in linear models. J. Am. Statist. Assoc. 65, 161-172.

Rao, C. R. (1971a). Estimation of variance and covariance components – MINQUE theory. J. Multivariate Anal. 1, 257-275.

Rao, C. R. (1971b). Minimum variance quadratic unbiased estimation of variance components. J. Multivariate Anal. 1, 445-456.

Rao, C. R. (1972). Estimation of variance and covariance components in linear models. J. Am. Statist. Assoc. 67, 112-115.

Rao, C. R. (1973). Linear Statistical Inference and Its Applications. Second Edition. John Wiley, New York.

Rao, C. R. (1979). Estimation of variance components – MINQE theory and its relation to ML and MML estimation. Sankhya (in press).

Rao, C. R. and Mitra, S. K. (1972). Generalized Inverse of Matrices and Its Applications. John Wiley, New York.

Rao, J. N. K. (1973). On the estimation of heteroscedastic variances. Biometrics 29, 11-24.

Rao, J. N. K. and Subrahmaniam, K. (1971). Combining independent estimators and estimation in linear regression with unequal variances. Biometrics 27, 971-990.

Rao, P. S. R. S. (1977). Theory of the MINQUE – a review. Sankhya B, 201-210.

Rao, P. S. R. S. and Chaubey, Y. P. (1978). Three modifications of the principle of the MINQUE. Commun. Statist. A7, 767-778.

Rao, P. S. R. S., Kaplan, J. and Cochran, W. G. (1979). Estimators for the one-way random effects model with unequal error variances. Tech. Rep.

Searle, S. R. (1968). Another look at Henderson's methods of estimating variance components. Biometrics 24, 749-788.

Searle, S. R. (1971). Topics in variance component estimation. Biometrics 27, 1-76.

Seely, J. (1970). Linear spaces and unbiased estimation – application to mixed linear model. Ann. Math. Statist. 42, 710-721.

Seely, J. (1975). An example of inadmissible analysis of variance estimator for a variance component. Biometrika 62, 689-690.

Sinha, B. K. and Wieand, H. S. (1977). MINQUE's of variance and covariance components of certain covariance structures. Tech. Rep. 28/77, Indian Statistical Institute.

Spjøtvoll, E. (1977). Random coefficients regression models. A review. Math. Oper. Statist. Ser. Statist. 8, 69-93.

Swallow, W. H. and Searle, S. R. (1978). Minimum variance quadratic unbiased estimation of variance components. Technometrics 20, 265-272.

Swamy, P. A. V. B. (1971). Statistical Inference in Random Coefficient Regression Models. Springer-Verlag, Berlin.

Yates, F. (1940). The recovery of inter-block information in balanced incomplete block designs. Ann. Eugenics (London) 10, 317-325.

Yates, F. and Zacopanay, I. (1935). The estimation of the efficiency of sampling with special reference to sampling for yield in cereal experiments. J. Agric. Sci. 25, 545-577.

P. R. Krishnaiah, ed., Handbook of Statistics, Vol. 1 © North-Holland Publishing Company (1980) 41-87

    Multivariate Analysis of Variance of Repeated Measurements

Neil H. Timm

    1. Introduction

The analysis of variance of multiple observations on subjects or units over several treatment conditions or periods of time is commonly referred to in the statistical and behavioral science literature as the repeated measures situation or repeated measures analysis. Standard textbook discussions of repeated measurement designs employing mixed-model univariate analysis of variance procedures are included in Cox (1958), Federer (1955), Finney (1960), John (1971), Kempthorne (1952), Kirk (1968), Lindquist (1953), Myers (1966), Quenouille (1953) and Winer (1971), to name a few. Recently, Federer and Balaam (1972) published an extensive bibliography of repeated measurement designs and their analysis through 1967, and Hedayat and Afsarinejad (1975) discussed the construction of many of the designs. Coverage of the analysis of variance of repeated measures designs by the above authors has been limited to standard situations employing univariate techniques. The analysis of repeated measurements is discussed from a multivariate analysis of variance point of view in this chapter.

    2. The general linear model

The generalization of the analysis of variance procedure to analyze repeated measurement designs utilizing the multivariate analysis of variance approach employs the multivariate general linear model and the testing of linear hypotheses using $p$-dimensional vector observations. From a multivariate point of view, $n$ independent $p$-dimensional repeated measurements are regarded as $p$-variate normal variates $Y_i$, $i = 1, 2, \ldots, n$, with a common unknown variance-covariance matrix $\Sigma$ and expectations

$E(Y_i) = x_{i1}\beta_1 + x_{i2}\beta_2 + \cdots + x_{iq}\beta_q, \quad i = 1, 2, \ldots, n,$ (2.1)



where the $x_{ij}$'s are known constants and the $\beta_j$'s are unknown $p$-component parameter vectors. Letting the $p \times q$ matrix $B' = (\beta_1\ \beta_2\ \cdots\ \beta_q)$, the $p \times n$ matrix $Y' = (Y_1\ Y_2\ \cdots\ Y_n)$ and the $n \times q$ matrix $X = [x_{ij}]$, expression (2.1) is written as

    E(Y) = XB. (2.2)

Since each row vector $Y_i'$ of $Y$ is sampled from a $p$-variate normal population with variance-covariance matrix $\Sigma$, we may write the variance of the matrix $Y$ as

$V(Y) = I_n \otimes \Sigma$ (2.3)

where the symbol $\otimes$ represents the direct or Kronecker product of two matrices. The combination of the formulas (2.2) and (2.3) is referred to as the multivariate Gauss-Markoff setup.

    To estimate the unknown parameter vectors in the matrix B, the normal equations

$X'XB = X'Y$ (2.4)

are solved. Letting $\hat{B}$ be a solution to the normal equations, the least squares estimator of an estimable parametric vector function

$\psi' = c'B = c_1\beta_1' + c_2\beta_2' + \cdots + c_q\beta_q',$ (2.5)

    for known c i, is

$\hat\psi' = c'\hat{B} = c_1\hat\beta_1' + c_2\hat\beta_2' + \cdots + c_q\hat\beta_q'.$ (2.6)

To estimate the unknown elements $\sigma_{ij}$ of the matrix $\Sigma$, the sum of squares and cross products (SSP) matrix due to error is computed. This matrix is obtained by evaluating

$S_e = Y'Y - Y'X\hat{B}$ (2.7)

where $\hat{B}$ is any solution to the normal equations. Letting the rank of the design matrix be $r \le q$, the degrees of freedom due to error is $n - r = \nu_e$, and $(1/\nu_e)S_e$ results in an unbiased estimator of $\Sigma$.

To test the hypothesis $H_0$ that $\psi' = c'B$ has a specified value $\psi_0'$, we proceed by using Hotelling's generalized $T^2$ statistic (see Hotelling, 1931 and Bowker, 1960) and Theorem 2.1 (Rao, 1973, p. 541).

THEOREM 2.1. Let $S$ have a central Wishart distribution with $k$ degrees of freedom, represented by $S \sim W_p(k, \Sigma)$, and let $d$ be normally distributed with mean $\delta$ and variance-covariance matrix $c^{-1}\Sigma$, with constant $c$ greater than zero, represented by $d \sim N_p(\delta, c^{-1}\Sigma)$, such that $S$ and $d$ are independent. Hotelling's generalized $T^2$ statistic is defined by

$T^2 = ckd'S^{-1}d$,

and

$\frac{(k - p + 1)}{pk}\,T^2 \sim F(p,\ k - p + 1,\ \tau^2),$

which is a noncentral $F$ distribution with noncentrality parameter $\tau^2 = c\,\delta'\Sigma^{-1}\delta$.

Since $S_e$ has a central Wishart distribution, $S_e \sim W_p(\nu_e, \Sigma)$, and $\hat\psi \sim N_p(\psi,\ c'(X'X)^-c\,\Sigma)$, independent of $S_e$, where $(X'X)^-$ is a generalized inverse of $X'X$, and $c'(X'X)^-c > 0$ if $\psi$ is estimable,

$F = \frac{(\nu_e - p + 1)}{p}\cdot\frac{(\hat\psi - \psi_0)'S_e^{-1}(\hat\psi - \psi_0)}{c'(X'X)^-c}$ (2.8)

has a noncentral $F$ distribution, and the null hypothesis $H_0: \psi = \psi_0$ is rejected at the significance level $\alpha$ if $F > F^\alpha(p, \nu_e - p + 1)$. Alternatively, since

$1 + \frac{T^2}{\nu_e} = \frac{|S_e + S_h|}{|S_e|},$

where $S_h = (c'(X'X)^-c)^{-1}(\hat\psi - \psi_0)(\hat\psi - \psi_0)' \sim W_p(1, \Sigma)$ under the null hypothesis, the ratio

$B = \frac{|S_e|}{|S_e + S_h|} \sim B\left(\frac{\nu_e - p + 1}{2},\ \frac{p}{2}\right)$ (2.9)

has a central beta distribution when $H_0$ is true, so that rejecting for large values of $F$ is equivalent to rejecting $H_0$ for small values of $B$.

To test the hypothesis $H_0$ that $\nu_h$ independent estimable functions have a specified value $\Gamma$, the null hypothesis $H_0$ is written as

$H_0: CBA = \Gamma$ (2.10)

where the $\nu_h \times q$ matrix $C$ is of rank $\nu_h \le r$, the $p \times u$ matrix $A$ is of rank $u \le p$, and the $\nu_h \times u$ matrix $\Gamma$ is specified. The hypothesis SSP matrix is

$S_h = (C\hat{B}A - \Gamma)'[C(X'X)^-C']^{-1}(C\hat{B}A - \Gamma).$ (2.11)


    Furthermore,

$S_e = A'Y'[I - X(X'X)^-X']YA,$ (2.12)

and $S_h$ are independently distributed; $S_e \sim W_u(n - r, A'\Sigma A)$ and $S_h \sim W_u(\nu_h, A'\Sigma A, \cdot)$. Departure from the null hypothesis may be detected by comparing the matrices $S_e$ and $S_h$.

Having computed the matrices $S_e$ and $S_h$ for the null hypothesis (2.10), several procedures have been recommended for testing that the hypothesis $H_0$ is true. All of the procedures proposed are dependent on the roots of one of the following determinantal equations:

(a) $|S_h - \lambda S_e| = 0$, (b) $|S_e - \nu(S_e + S_h)| = 0$, (c) $|S_h - \theta(S_h + S_e)| = 0$ (2.13)

with roots ordered from largest to smallest for $i = 1, 2, \ldots, s = \min(\nu_h, u)$. Wilks (1932) proposed testing $H_0$ using

$\Lambda = \frac{|S_e|}{|S_e + S_h|} = \prod_{i=1}^{s}\frac{1}{1 + \lambda_i} = \prod_{i=1}^{s}\nu_i = \prod_{i=1}^{s}(1 - \theta_i)$ (2.14)

and to reject $H_0$ if $\Lambda \le \Lambda^\alpha(s, m, n)$, where $m = (|\nu_h - u| - 1)/2$, $n = (\nu_e - u - 1)/2$ and $s = \min(\nu_h, u)$. Pillai (1960) approximated the distribution of the following trace criterion proposed by Bartlett (1939) and Nanda (1950):

    " k Xi v='rr[ sh(s~ + se) '] = ,Z e, = 1 +a, "= i=1

    - - -= i (1-~-v,) (2.17) i~ l

and to reject the hypothesis if $V > V^\alpha(s, m, n)$. Tables for each of the criteria are collected in Timm (1975). For a review of the literature on the distribution of $\Lambda$, $T_0^2$, $\Theta$, and $V$, the reader is referred to Krishnaiah (1978). In general no one multivariate criterion is uniformly best; we have elected to use Wilks' $\Lambda$-criterion to illustrate the analysis of repeated measurement designs from a multivariate analysis of variance point of view. When $s = 1$, all criteria are equivalent.

    Several alternative criteria have been proposed by authors to test the null hypothesis represented in (2.10). Of particular importance is the step-down procedure proposed by J. Roy (1958) and the finite intersection tests developed by Krishnaiah (1965). In addition, tests based on the ratio of roots are discussed in the paper by Krishnaiah and Waikar (1971).

Following the test of a multivariate hypothesis of the form $H_0: CBA = \Gamma$, simultaneous confidence intervals for the parametric estimable functions $\psi = c'Ba$, for vectors $c$ in the row space of $C$ and arbitrary vectors $a$, may be obtained for each of the multivariate test criteria. Evaluating the expression

$\hat\sigma_\psi^2 = \left(a'\frac{1}{\nu_e}S_e\,a\right)c'(X'X)^-c,$

the intervals take the form $\hat\psi - c_0\hat\sigma_\psi \le \psi \le \hat\psi + c_0\hat\sigma_\psi$, with $c_0$ depending on the criterion. For the


Bartlett-Nanda-Pillai criterion:

$c_0^2 = \frac{\nu_e V^\alpha}{1 - V^\alpha}.$ (2.19)

The critical values $\Lambda^\alpha$, $T_0^{2\,\alpha}$, $\Theta^\alpha$ and $V^\alpha$ correspond to those procured in testing the multivariate hypothesis $H_0: CBA = \Gamma$ at the significance level $\alpha$.

    3. One-sample repeated measurement design

    Suppose a random sample of n subjects are measured (in the same metric scale with the same origin and unit) at p treatment levels so that the general organization of the data may be represented as in Table 3.1.

The data in Table 3.1 may be analyzed as a special application of the multivariate general linear model. The $p$ repeated measures for the $i$th subject are regarded as a $p$-variate vector observation

$Y_i = \mu + e_i, \quad i = 1, 2, \ldots, n,$ (3.1)

where $\mu$ is a $p \times 1$ vector of treatment means and $e_i$ is a $p \times 1$ vector of random errors. Furthermore, we assume that $e_i \sim IN_p(0, \Sigma)$ so that $E(Y_i) = \mu$.

The usual hypothesis of interest for the design is that the treatment means $\mu_1, \mu_2, \ldots, \mu_p$, the elements of the vector $\mu$, are equal:

$H_0: \mu_1 = \mu_2 = \cdots = \mu_p.$ (3.2)

Representing $H_0$ as $CBA = \Gamma$, the matrices $C$, $B$, $A$ and $\Gamma$ take the following form:

$C = [1] \quad (1 \times 1), \qquad B = [\mu_1, \mu_2, \ldots, \mu_p] \quad (1 \times p),$

$A = \begin{pmatrix} I_{p-1} \\ -\mathbf{1}' \end{pmatrix} \quad (p \times (p-1)), \qquad \Gamma = [0] \quad (1 \times (p-1)),$

where $\mathbf{1}$ denotes a vector of unities. With the $n \times p$ matrix $Y = [y_{ij}]$ and the $n \times 1$ design matrix $X = \mathbf{1}$,

expressions for $S_e$ and $S_h$ are readily obtained using (2.11) and (2.12) with $\hat{B} = (X'X)^{-1}X'Y$. If $H_0$ is true,

$\Lambda = \frac{|S_e|}{|S_e + S_h|} \sim U(p-1,\ 1,\ n-1)$


Table 3.1
Data for a one-group repeated measurement design

Subjects    T_1     T_2     ...     T_p
S_1         y_11    y_12    ...     y_1p
S_2         y_21    y_22    ...     y_2p
...         ...     ...     ...     ...
S_n         y_n1    y_n2    ...     y_np

and $H_0$ is rejected if $F > F^\alpha(p-1, n-p+1)$, where

$F = \frac{(n-p+1)}{(p-1)(n-1)}\,T^2 = \frac{(n-p+1)}{(p-1)}\,\frac{(1-\Lambda)}{\Lambda},$

since when $s = 1$, $T^2/\nu_e = (1 - \Lambda)/\Lambda$.

EXAMPLE 3.1. Using the data in Table 3.2, the mean reaction times of subjects to five probe words are investigated (Timm, 1975, p. 233).

Table 3.2
Sample data: one-group analysis

Subjects    Probe-word positions
            1    2    3    4    5
 1          51   36   50   35   42
 2          27   20   26   17   27
 3          37   22   41   37   30
 4          42   36   32   34   27
 5          27   18   33   14   29
 6          43   32   43   35   40
 7          41   22   36   25   38
 8          38   21   31   20   16
 9          36   23   27   25   28
10          26   31   31   32   36
11          29   20   25   26   25


    Calculations show that

$\Lambda = \frac{|S_e|}{|S_e + S_h|} = 0.2482$

or $T^2 = 30.29$ and

$\frac{(n-p+1)}{(p-1)(n-1)}\,T^2 = \frac{(7)(30.29)}{(4)(10)} = 5.30.$

The hypothesis is rejected at the $\alpha = 0.05$ level if $F > F^{0.05}(4, 7) = 4.12$; since $5.30 > 4.12$, the hypothesis of equal means is rejected.


Suppose now that the variance-covariance matrix of the repeated measures has the uniform covariance structure

$\Sigma = \sigma_s^2\mathbf{1}\mathbf{1}' + \sigma^2 I.$ (3.5)

Such a decomposition yields the univariate mixed model

$y_{ij} = \mu + s_i + \beta_j + e_{ij}$ (3.6)

where the subjects are a random sample from a population in which $s_i \sim IN(0, \sigma_s^2)$, jointly independent of the errors $e_{ij}$, and $e_{ij} \sim IN(0, \sigma^2)$. The parameters in (3.6) have the interpretation: $\mu$ is an unknown constant, $s_i$ is a random component associated with subject $i$, $\beta_j$ is a fixed treatment effect, and $e_{ij}$ is a random error component. A test of the null hypothesis

$H_0: \beta_1 = \beta_2 = \cdots = \beta_p$

    is provided by the ratio

$F = \frac{SSH/(p-1)}{SSE/[(n-1)(p-1)]} \sim F(\nu_h^*, \nu_e^*).$

The degrees of freedom for the $F$ ratio are obtained from the multivariate test by the formulas $\nu_h^* = R(A)\nu_h = (p-1)\cdot 1 = (p-1)$ and $\nu_e^* = R(A)\nu_e = (p-1)(n-1)$, where $R(A)$ denotes the rank of $A$. Furthermore, selecting $A$ so that $A'A = I$, $SSH = \operatorname{tr}(S_h)$ and $SSE = \operatorname{tr}(S_e)$. Thus, the univariate mixed-model analysis is merely a special case of the more general multivariate analysis.

If the variance-covariance matrix $\Sigma$ has the structure given in (3.5), then the mean square ratio for testing the equality of the fixed treatment effects has an exact $F$-distribution. As shown by Bock (1963), a necessary and sufficient condition for an exact $F$-test is that the transformation of the error matrix by an orthogonal contrast matrix results in the scalar matrix $\sigma^2 I$. Box (1950) and Lee, Krishnaiah and Chang (1976) developed procedures to test for uniform covariance structure. Bock (1975) and Huynh and Feldt (1970) review the general structure case. Whenever the condition for the exact $F$-distribution is satisfied, the univariate model should be used to analyze repeated measurement data.

When the variance-covariance matrix $\Sigma$ is arbitrary, Greenhouse and Geisser (1959) and Huynh and Feldt (1976) proposed a conservative $F$-test procedure for testing for the equality of treatment effects using the univariate mixed-model analysis. However, as discussed by Geisser (1979), such a procedure is to be avoided when an exact multivariate procedure exists.


4. The I-sample repeated measurement design

    Letting

$Y_{ij} = (y_{ij1}, y_{ij2}, \ldots, y_{ijp})' \sim IN_p(\mu_i, \Sigma),$

the $p$-variate observation of the $j$th subject within the $i$th group is represented as

$Y_{ij} = \mu_i + e_{ij}, \quad i = 1, 2, \ldots, I;\ j = 1, 2, \ldots, N_i,$ (4.1)

and $N = \sum_{i=1}^{I}N_i$, where $\mu_i = (\mu_{i1}, \mu_{i2}, \ldots, \mu_{ip})'$ is a $p \times 1$ vector of means and $e_{ij}$ is a vector of random errors