
Linköping Studies in Science and Technology. Dissertations. No. 1255

Studies in Estimation of Patterned Covariance Matrices

Martin Ohlson

[email protected]

Mathematical Statistics
Department of Mathematics

Linköping University
SE–581 83 Linköping

Sweden

ISBN 978-91-7393-622-4 ISSN 0345-7524

Copyright © 2009 Martin Ohlson

Printed by LiU-Tryck, Linköping, Sweden 2009


Ágætis byrjun...


Abstract

Ohlson, M. (2009). Studies in Estimation of Patterned Covariance Matrices. Doctoral dissertation. ISBN 978-91-7393-622-4. ISSN 0345-7524.

Many testing, estimation and confidence interval procedures discussed in the multivariate statistical literature are based on the assumption that the observation vectors are independent and normally distributed. The main reason for this is that sets of multivariate observations often are, at least approximately, normally distributed. Normally distributed data can be modeled entirely in terms of their means and variances/covariances. Estimating the mean and the covariance matrix is therefore a problem of great interest in statistics and it is of great significance to consider the correct statistical model. The estimator for the covariance matrix is important since inference on the mean parameters strongly depends on the estimated covariance matrix and the dispersion matrix for the estimator of the mean is a function of it.

In this thesis the problem of estimating parameters for a matrix normal distribution with different patterned covariance matrices, i.e., different statistical models, is studied.

A p-dimensional random vector is considered for a banded covariance structure reflecting m-dependence. A simple non-iterative estimation procedure is suggested which gives an explicit, unbiased and consistent estimator of the mean and an explicit and consistent estimator of the covariance matrix for arbitrary p and m.

Estimation of parameters in the classical Growth Curve model when the covariance matrix has some specific linear structure is considered. In our examples maximum likelihood estimators cannot be obtained explicitly and must rely on numerical optimization algorithms. Therefore explicit estimators are obtained as alternatives to the maximum likelihood estimators. From a discussion about residuals, a simple non-iterative estimation procedure is suggested which gives explicit and consistent estimators of both the mean and the linearly structured covariance matrix.

This thesis also deals with the problem of estimating the Kronecker product structure. The sample observation matrix is assumed to follow a matrix normal distribution with a separable covariance matrix; in other words, it can be written as a Kronecker product of two positive definite matrices. The proposed estimators are used to derive a likelihood ratio test for spatial independence. Two cases are considered: when the temporal covariance is known and when it is unknown. When the temporal covariance is known, the maximum likelihood estimates are computed and the asymptotic null distribution is given. In the case when the temporal covariance is unknown the maximum likelihood estimates of the parameters are found by an iterative alternating algorithm and the null distribution for the likelihood ratio statistic is discussed.


Popular Science Summary (Populärvetenskaplig sammanfattning)

In many estimation problems, testing procedures and confidence interval computations discussed in the multivariate statistical literature, it is assumed that the observed vectors or matrices are independent and normally distributed. The main reason for this is that the observations usually have these properties, at least approximately. Normally distributed data can be modeled solely through their mean structure and variances/covariances. Estimating the mean and the covariance matrix well is therefore a problem of great interest, and at the same time it is important to assume a correct statistical model. The estimation of the covariance matrix is also important since conclusions about the mean depend on the estimated covariance matrix.

In this thesis the problem of estimating the parameters, that is, the mean and the covariance matrix, of a matrix normal distribution is discussed when the covariance matrix has different patterns, that is, under different statistical models.

Several different structures are considered. First, a p-dimensional random vector is discussed which is assumed to have a covariance matrix with banded structure, corresponding to m-dependence. A simple algorithm is proposed which gives an explicit, unbiased and consistent estimator of the mean and an explicit and consistent estimator of the covariance matrix for arbitrary dimension p and bandwidth m.

Estimation of the parameters in the classical Growth Curve model when the covariance matrix has a linear structure is a problem of great interest. In many examples maximum likelihood estimates cannot be obtained explicitly and must therefore be computed with some numerical optimization algorithm. We compute explicit estimators as a good alternative to the maximum likelihood estimators. From a discussion of the residuals, a simple algorithm is given which results in unbiased and consistent estimators of both the mean and the linearly structured covariance matrix.

The thesis also treats the problem of estimating the Kronecker product structure. The observations are assumed to follow a matrix normal distribution with a separable covariance matrix, that is, one that can be written as a Kronecker product of two positive definite matrices. These matrices can be interpreted as the spatial and the temporal covariance. The main goal is to compute the likelihood ratio for testing spatial independence. First the temporal covariance is assumed to be known and the maximum likelihood estimates are computed. It is shown that the distribution of the likelihood ratio equals that of the case with independent observations. When the temporal covariance is unknown, an iterative algorithm for finding the maximum likelihood estimates is then derived and the distribution of the likelihood ratio is discussed.

When various tests on covariance matrices are carried out, quadratic forms arise. In this thesis a generalization of the distribution of the quadratic form of matrices is derived. It is shown that the distribution of a quadratic form is the same as the distribution of a weighted sum of non-central Wishart distributed matrices.


Acknowledgments

First of all I would like to thank my supervisor Professor Timo Koski for giving me the opportunity to work on the problems discussed in this thesis. It has been very interesting and I am grateful for all the freedom in my work.

I am also very grateful to my assistant supervisor Professor Dietrich von Rosen for all the ideas and all the discussions we have had. Thank you for showing me the "multivariate world" and for all the inspiration.

During my time as a PhD student I have been visiting some people around the world. My deepest thanks go to Professor Muni S. Srivastava for my time at the Department of Statistics, University of Toronto. During my time in Canada I was invited to give a seminar at the Department of Mathematics & Statistics, University of Maryland, Baltimore County. Thank you Professor Bimal Sinha for that opportunity.

I would also like to thank my present and former colleagues at the Department of Mathematics. In particular I wish to thank all the PhD students at the department and my colleagues at the Division of Mathematical Statistics, especially Dr. Eva Enqvist. Thanks for the many interesting discussions and ideas.

For the LaTeX layout of this thesis I am grateful to Dr. Gustaf Hendeby. The LaTeX template has been very convenient and easy to work with. Thank you a lot.

I am very grateful to my dear friend Dr. Thomas Schön for the support and for proofreading this thesis. We have had a lot of interesting discussions over the years and the coffee breaks with "nice" coffee have been valuable.

Finally, I would like to thank my family for all your support. You have always believed in me and encouraged me. Especially you, Lotta! You are very special in my life; you are my everything!

Linköping, April 16, 2009

Martin Ohlson


Contents

1 Introduction 1
  1.1 Background 2
  1.2 Outline 3
    1.2.1 Outline of Part I 3
    1.2.2 Outline of Part II 3
  1.3 Other Publications 5
  1.4 Contributions 5

I Estimation of Patterned Covariance Matrices 7

2 Estimation of the Covariance Matrix for a Multivariate Normal Distribution 9
  2.1 Multivariate Distributions 9
    2.1.1 Matrix Normal Distribution 9
    2.1.2 Wishart and Non-central Wishart Distribution 10
    2.1.3 Results on Quadratic Forms 11
  2.2 Estimation of the Parameters 18
    2.2.1 Maximum Likelihood Estimators 18
    2.2.2 Patterned Covariance Matrix 20
    2.2.3 Estimating the Kronecker Product Covariance 26
    2.2.4 Estimating the Kronecker Product Covariance (One Observation Matrix) 28
    2.2.5 The Distribution of a Special Sample Covariance Matrix 33

3 Estimation of the Covariance Matrix for a Growth Curve Model 37
  3.1 The Growth Curve Model 37
  3.2 Estimation of the Parameters 39
    3.2.1 Maximum Likelihood Estimators 39
    3.2.2 Growth Curve Model with Patterned Covariance Matrix 41

4 Concluding Remarks 45
  4.1 Conclusion 45
  4.2 Future Research 46

Bibliography 49

Due to copyright restrictions the articles are not included in the electronic version.

II Papers 57

A On Distributions of Matrix Quadratic Forms 59
  1 Introduction 62
  2 Distribution of Multivariate Quadratic Forms 63
  3 Complex Matrix Quadratic Forms 71
  4 A Special Sample Covariance Matrix 73
  References 74

B Explicit Estimators under m-Dependence for a Multivariate Normal Distribution 77
  1 Introduction 80
  2 Definitions and Notation 81
  3 Explicit Estimator of a Banded Covariance Matrix 82
  4 Simulation 90
  References 91

C The Likelihood Ratio Statistic for Testing Spatial Independence using a Separable Covariance Matrix 95
  1 Introduction 98
  2 Known Dependency Structure 99
  3 Unknown Dependency Structure 103
    3.1 Ψ has AR(1) Structure 104
    3.2 Ψ has Intraclass Structure 108
  References 110

D Explicit Estimators of Parameters in the Growth Curve Model with Linearly Structured Covariance Matrices 113
  1 Introduction 116
  2 Main Idea 117
  3 Maximum Likelihood Estimators 120
  4 Growth Curve Model with a Linearly Structured Covariance Matrix 120
  5 Properties of the Proposed Estimators 124
  6 Examples 127
  References 131

1 Introduction

This thesis is concerned with the problem of estimating patterned covariance matrices for different kinds of statistical models. Patterned covariance matrices can arise from a variety of contexts and can be both linear and non-linear. One example of a linearly structured covariance matrix arises in the theory of graphical modeling. In graphical modeling, a bi-directed graph represents marginal independences among random variables that are identified with the vertices of the graph. Gaussian graphical models of this kind are called covariance graph models and can be used for estimation and testing.

If two vertices are not joined by an edge, then the two associated random variables are assumed to be marginally independent. For example, the graph in Figure 1.1 represents the covariance structure for a random vector x = (x1, x2, x3, x4)′.

Figure 1.1: Covariance graph for a random vector x = (x1, x2, x3, x4)′.

Covariance graph models will generate a patterned covariance matrix with some of the covariances equal to zero, i.e., if the vertices i and j in the graph are not connected by a bi-directed edge i ↔ j, then σij = 0. In Figure 1.1 the graph imposes σ12 = σ14 = σ23 = 0 and the covariance matrix for x is given by

\[
\Sigma =
\begin{pmatrix}
\sigma_{11} & 0 & \sigma_{13} & 0 \\
0 & \sigma_{22} & 0 & \sigma_{24} \\
\sigma_{13} & 0 & \sigma_{33} & \sigma_{34} \\
0 & \sigma_{24} & \sigma_{34} & \sigma_{44}
\end{pmatrix}.
\]
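As a small, self-contained illustration (my addition, not from the thesis; all numeric values are hypothetical and chosen only to keep Σ positive definite), the zero pattern dictated by a covariance graph can be written down directly from its edge set. The code examples in this chapter use Python with numpy.

```python
import numpy as np

# Bi-directed edges i <-> j of the covariance graph in Figure 1.1 (1-based).
edges = [(1, 3), (2, 4), (3, 4)]

# sigma_ij = 0 unless i = j or {i, j} is an edge of the graph.
Sigma = np.diag([2.0, 2.0, 2.0, 2.0])          # hypothetical variances
for (i, j), v in zip(edges, [0.5, 0.7, 0.6]):  # hypothetical covariances
    Sigma[i - 1, j - 1] = Sigma[j - 1, i - 1] = v

assert np.all(np.linalg.eigvalsh(Sigma) > 0)   # Sigma is positive definite
print(Sigma)
```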

The covariance graph can also represent more advanced models if some constraints are added. Another linearly structured covariance matrix is the Toeplitz matrix; a special case is a Toeplitz matrix with zeros and different variances. The covariance graph for this special Toeplitz structure is given in Figure 1.2. This structure can for example represent that the covariances depend on the distance between equally spaced locations or time points. The covariance matrix is then given by

\[
\Sigma =
\begin{pmatrix}
\sigma_1^2 & \rho & 0 & 0 \\
\rho & \sigma_2^2 & \rho & 0 \\
0 & \rho & \sigma_3^2 & \rho \\
0 & 0 & \rho & \sigma_4^2
\end{pmatrix}.
\]

In this thesis, primarily linearly structured covariance matrices are considered.

Figure 1.2: Covariance graph for a kind of Toeplitz covariance matrix with zeros.

We know that inference on the mean parameters strongly depends on the estimated covariance matrix and that the dispersion matrix for the estimator of the mean is a function of the covariance matrix. Hence, when testing the mean parameters the estimator of the covariance matrix is very important and it is of great significance to have good estimators for the correct model.

1.1 Background

One of the first to discuss a patterned covariance matrix was Wilks (1946), who considered the uniform (also known as intraclass) covariance structure when dealing with measurements on k equivalent psychological tests. The uniform structure is a linear covariance structure with equal diagonal elements and equal off-diagonal elements. This structure is also of interest since it arises as the marginal distribution in a random-effects model. Votaw (1948) extended the uniform model to the compound symmetry structure, a covariance structure similar to the uniform structure, but with blocks each having uniform structure. The intraclass model can also be generalized to the Toeplitz or circular Toeplitz structure, which is discussed by Olkin and Press (1969), Olkin (1973) and later Nahtman (2006). These models are all special cases of the invariant normal models considered by Andersson (1975), and a proper review of the results related to the invariant normal models can be found in Perlman (1987).

More recently, linearly structured covariance matrices have been discussed in the framework of graphical models; see, e.g., Drton and Richardson (2004) and Chaudhuri et al. (2007), where an iterative algorithm for the maximum likelihood estimators was given. Originally, many estimators of the covariance matrix were obtained from non-iterative least squares methods. With increasing computational power, iterative methods such as maximum likelihood and restricted maximum likelihood, among others, were introduced to estimate the patterned covariance matrix. Nowadays, the data sets and the dimensions of the models are very large and non-iterative methods are preferable again. In this thesis we mainly consider explicit estimators for patterned covariance matrices in the multivariate linear model, but an iterative estimator for the Kronecker product covariance structure is also considered.

The Growth Curve model introduced by Potthoff and Roy (1964) has been extensively studied over the years. The mean structure for the Growth Curve model is bilinear instead of linear as in the ordinary multivariate linear model. Potthoff and Roy (1964) originally derived a class of weighted estimators for the parameter matrix which are a function of an arbitrary positive definite matrix. Khatri (1966a) extended this result and showed that the maximum likelihood estimator is also a weighted estimator.

Patterned covariance matrices for the Growth Curve model have been discussed in the literature; e.g., the intraclass covariance structure has been considered by Khatri (1973), Arnold (1981) and Lee (1988). Since the mean structure is bilinear, we will have a decomposition of the space generated by the design matrices into tensor spaces instead of linear spaces as in the linear model case. This decomposition makes maximum likelihood estimation of a patterned covariance matrix more complicated. In this thesis we use the decomposition of tensor spaces and derive explicit estimators of linearly structured covariance matrices.

1.2 Outline

This thesis consists of two parts and the outline is as follows.

1.2.1 Outline of Part I

In Part I the background and theory are given. Chapter 2 starts with definitions and some results for the multivariate distributions that are used, i.e., the matrix normal and Wishart distributions. A review of univariate and matrix quadratic forms is also presented. The second part of Chapter 2 discusses estimation of the mean and covariance matrix in a multivariate linear model. The estimators discussed are maximum likelihood for the non-patterned case and several methods for various structures.

In Chapter 3 we consider the Growth Curve model. The maximum likelihood estimators are given for the ordinary non-patterned case and various structures for the covariance matrix are discussed.

Part I ends with a conclusion and some pointers for future work in Chapter 4.

1.2.2 Outline of Part II

Part II consists of four papers. Below follows a short summary for each of the papers.

Paper A: On Distributions of Matrix Quadratic Forms

Ohlson, M. and Koski, T. (2009b). On distributions of matrix quadratic forms. Submitted to Communications in Statistics - Theory and Methods.

A characterization of the distribution of the multivariate quadratic form XAX′, where X is a p × n normally distributed matrix and A is an n × n symmetric real matrix, is presented. We show that the distribution of the quadratic form is the same as the distribution of a weighted sum of non-central Wishart distributed matrices. This is applied to derive the distribution of the sample covariance between the rows of X when the expectation is the same for every column and is estimated with the regular mean.

Paper B: Explicit Estimators under m-Dependence for a Multivariate Normal Distribution

Ohlson, M., Andrushchenko, Z., and von Rosen, D. (2009). Explicit estimators under m-dependence for a multivariate normal distribution. Accepted for publication in Annals of the Institute of Statistical Mathematics.

The problem of estimating the parameters of a p-dimensional multivariate normal random vector is considered for a banded covariance structure reflecting m-dependence. A simple non-iterative estimation procedure is suggested which gives an explicit, unbiased and consistent estimator of the mean and an explicit and consistent estimator of the covariance matrix for arbitrary p and m.

Paper C: The Likelihood Ratio Statistic for Testing Spatial Independence using a Separable Covariance Matrix

Ohlson, M. and Koski, T. (2009a). The likelihood ratio statistic for testing spatial independence using a separable covariance matrix. Technical Report LiTH-MAT-R-2009-06, Department of Mathematics, Linköping University.

This paper deals with the problem of testing spatial independence for dependent observations. The sample observation matrix is assumed to follow a matrix normal distribution with a separable covariance matrix; in other words, it can be written as a Kronecker product of two positive definite matrices. Two cases are considered: when the temporal covariance is known and when it is unknown. When the temporal covariance is known, the maximum likelihood estimates are computed and the asymptotic null distribution is given. In the case when the temporal covariance is unknown the maximum likelihood estimates of the parameters are found by an iterative alternating algorithm and the null distribution for the likelihood ratio statistic is discussed.

Paper D: Explicit Estimators of Parameters in the Growth Curve Model with Linearly Structured Covariance Matrices

Ohlson, M. and von Rosen, D. (2009). Explicit estimators of parameters in the Growth Curve model with linearly structured covariance matrices. Submitted to Journal of Multivariate Analysis.

Estimation of parameters in the classical Growth Curve model when the covariance matrix has some specific linear structure is considered. In our examples maximum likelihood estimators cannot be obtained explicitly and must rely on optimization algorithms. Therefore explicit estimators are obtained as alternatives to the maximum likelihood estimators. From a discussion about residuals, a simple non-iterative estimation procedure is suggested which gives explicit and consistent estimators of both the mean and the linearly structured covariance matrix.

1.3 Other Publications

Other publications from conferences are listed below.

• Distribution of Quadratic Forms at MatTriad 2007, 22-24 March 2007, Bedlewo, Poland.

• More on Distribution of Quadratic Forms at The 8th Tartu Conference on Multivariate Statistics and The 6th Conference on Multivariate Distributions with Fixed Marginals, 26-29 June 2007, Tartu, Estonia.

• Explicit Estimators under m-Dependence for a Multivariate Normal Distribution at LinStat 2008, 21-25 April 2008, Bedlewo, Poland.

• The Likelihood Ratio Statistic for Testing Spatial Independence using a Separable Covariance Matrix at Swedish Society for Medical Statistics, Spring conference, 27 March 2009, Uppsala, Sweden.

1.4 Contributions

The main contributions of the thesis are as follows.

• In Paper A a characterization of the distribution of a matrix quadratic form is derived. This characterization is similar to the one for the univariate case and can be used to prove several properties of the matrix quadratic form.

• Estimation of a banded covariance matrix, i.e., a matrix with zeros outside an arbitrarily large band, is considered in Paper B. A non-iterative estimation procedure is suggested which gives explicit, unbiased and consistent estimators.

• In Paper C an iterative alternating algorithm for the maximum likelihood estimators is derived for the case when we have a Kronecker product covariance structure and only one observation matrix. The cases with intraclass and autoregressive structure of order one are handled.

• Linearly structured covariance matrices for the Growth Curve model are discussed in Paper D. Using the residuals, a simple non-iterative estimation procedure is suggested which gives explicit and consistent estimators of both the mean and the linearly structured covariance matrix.

Part I

Estimation of Patterned Covariance Matrices

2 Estimation of the Covariance Matrix for a Multivariate Normal Distribution

Many testing, estimation and confidence interval procedures discussed in the multivariate statistical literature are based on the assumption that the observation vectors are independent and normally distributed (Anderson, 2003, Srivastava and Khatri, 1979, Muirhead, 1982). There are two main reasons for this. Firstly, sets of multivariate observations are often, at least approximately, normally distributed. Secondly, the multivariate normal distribution is mathematically tractable. Normally distributed data can be modeled entirely in terms of their means and variances/covariances. Estimating the mean and the covariance matrix are therefore problems of great interest in statistics, as well as in many related, more applied areas.

2.1 Multivariate Distributions

2.1.1 Matrix Normal Distribution

Suppose that X : p × n is a random matrix. Let the expectation of X be E(X) = M : p × n and the covariance matrix be cov(X) = Ω : (pn) × (pn).

Definition 2.1. A positive definite matrix A is said to be separable if it can be written as a Kronecker product of two positive definite matrices B and C,

\[
A = B \otimes C.
\]

Here ⊗ is the Kronecker product, see Kollo and von Rosen (2005). Suppose that the covariance matrix Ω is separable, i.e., Ω = Ψ ⊗ Σ, where Ψ : n × n and Σ : p × p are two positive definite covariance matrices. Assume that X is matrix normal distributed, denoted by X ∼ Np,n(M, Σ, Ψ), which is equivalent to

\[
\operatorname{vec} X \sim N_{pn}(\operatorname{vec} M, \Psi \otimes \Sigma),
\]


where vec(·) is the vectorization operator (Kollo and von Rosen, 2005). The covariance matrix Σ can be interpreted as the covariance between the rows of X and Ψ can be interpreted as the covariance between the columns of X. Since Σ and Ψ are positive definite, written as Σ > 0 and Ψ > 0, the density function of X is

\[
f(X) = (2\pi)^{-pn/2} |\Sigma|^{-n/2} |\Psi|^{-p/2} \operatorname{etr}\left\{-\tfrac{1}{2}\Sigma^{-1}(X - M)\Psi^{-1}(X - M)'\right\}, \tag{2.1}
\]

which is the same density function as for vec X, since we can write

\[
|\Psi \otimes \Sigma| = |\Psi|^{p}|\Sigma|^{n}, \quad \text{and} \quad \operatorname{vec}'X \,(\Psi \otimes \Sigma)^{-1}\operatorname{vec} X = \operatorname{tr}\Sigma^{-1}X\Psi^{-1}X',
\]

where etr(A) = exp(tr(A)) and |Σ| is the determinant of the matrix Σ.

One of the most important properties of the matrix normal distribution is that it is invariant under bilinear transformations.

Theorem 2.1
Let X ∼ Np,n(M, Σ, Ψ). For any matrices B : q × p and C : m × n,

\[
BXC' \sim N_{q,m}(BMC', B\Sigma B', C\Psi C').
\]

2.1.2 Wishart and Non-central Wishart Distribution

The Wishart distribution is the multivariate generalization of the χ² distribution. The matrix W : p × p is said to be Wishart distributed if and only if W = YY′ for some matrix Y ∼ Np,n(M, Σ, I), where Σ is positive definite (Kollo and von Rosen, 2005). If the mean is zero, M = 0, the Wishart distribution is said to be central, denoted by W ∼ Wp(n, Σ). Otherwise, if M ≠ 0, the Wishart distribution is non-central, W ∼ Wp(n, Σ, Ω), where Ω = MM′.

The density function of W ∼ Wp(n, Σ) exists if n ≥ p and is given by

\[
f_W(W) = \left(2^{pn/2}\,\Gamma_p\!\left(\tfrac{n}{2}\right)|\Sigma|^{n/2}\right)^{-1} |W|^{(n-p-1)/2} \exp\left\{-\tfrac{1}{2}\operatorname{tr}\Sigma^{-1}W\right\},
\]

for W > 0, where the multivariate gamma function Γp(n/2) is given by

\[
\Gamma_p\!\left(\tfrac{n}{2}\right) = \pi^{p(p-1)/4} \prod_{i=1}^{p} \Gamma\!\left(\tfrac{1}{2}(n + 1 - i)\right). \tag{2.2}
\]

If W ∼ Wp(n, Σ, Ω), then the characteristic function of W can be found in Muirhead (1982) and is given by

\[
\varphi_W(T) = |I - i\Gamma\Sigma|^{-n/2} \operatorname{etr}\left(-\tfrac{1}{2}\Sigma^{-1}\Omega\right) \operatorname{etr}\left(\tfrac{1}{2}\Sigma^{-1}\Omega\,(I - i\Gamma\Sigma)^{-1}\right),
\]

where T = (tij), Γ = (γij) = ((1 + δij)tij), δij is the Kronecker delta and tij = tji. Hence, for the central Wishart distribution it is nothing more than

\[
\varphi_W(T) = |I - i\Gamma\Sigma|^{-n/2}.
\]

It is known that if a Wishart distributed matrix W is transformed as BWB′ for some matrix B : q × p, we obtain a new Wishart distributed matrix (Kollo and von Rosen, 2005, Muirhead, 1982).

Theorem 2.2
Let W ∼ Wp(n, Σ, Ω) and let B : q × p be a real matrix. Then

\[
BWB' \sim W_q(n, B\Sigma B', B\Omega B').
\]

It is also easy to see from the definition of a Wishart distribution that the sum of independent Wishart distributed variables is again Wishart distributed.

Theorem 2.3
Let the random matrix W1 ∼ Wp(n, Σ, Ω1) be independent of the matrix W2 ∼ Wp(m, Σ, Ω2). Then

\[
W_1 + W_2 \sim W_p(n + m, \Sigma, \Omega_1 + \Omega_2).
\]

2.1.3 Results on Quadratic Forms

In this section we will study the distribution of a multivariate quadratic form Q = XAX′, where X : p × n is a random matrix and A : n × n is a real symmetric matrix. The distribution of Q has been studied by many authors, see for example Khatri (1962), Hogg (1963), Khatri (1966b), Hayakawa (1966), Shah (1970), Rao and Mitra (1971), Hayakawa (1972) and more recently Gupta and Nagar (2000), Vaish and Chaganty (2004), Kollo and von Rosen (2005) and the references therein. Wong and Wang (1993), Mathew and Nordström (1997), Masaro and Wong (2003) and Hu (2008) discussed Wishartness of the quadratic form when the covariance matrix is non-separable, i.e., when we cannot write it as a Kronecker product.

Univariate Quadratic Forms

The quadratic form Q = XAX′ is the multivariate generalization of the univariate quadratic form

\[
q = x'Ax,
\]

where x has a multivariate normal distribution x ∼ Np(µ, Σ); e.g., see Rao (1973), Graybill (1976), Srivastava and Khatri (1979), Muirhead (1982). We will start by considering the univariate case when the variables are independent. Later on we consider the dependent case with a covariance matrix Σ. The results are given below for the sake of completeness. There is a lot of literature on this topic and the results quoted below can be found in, for example, Graybill (1976), Srivastava and Khatri (1979) and Muirhead (1982). In the subsequent section these results are recapitulated for the multivariate case too. An application of the distribution of a specific quadratic form is the sample variance s².

Example 2.1
Let

\[
C = I_n - \tfrac{1}{n}\mathbf{1}\mathbf{1}',
\]

where 1 = (1, . . . , 1)′ : n × 1 and In : n × n is the identity matrix. Then we have

\[
q = x'Cx = x'\left(I - \tfrac{1}{n}\mathbf{1}\mathbf{1}'\right)\left(I - \tfrac{1}{n}\mathbf{1}\mathbf{1}'\right)x = (x - \bar{x}\mathbf{1})'(x - \bar{x}\mathbf{1}) = \sum_{i=1}^{n}(x_i - \bar{x})^2 = (n - 1)s^2,
\]

where the second equality follows from the fact that C = C², i.e., C is idempotent. Furthermore, x̄ = (1/n)1′x and the distribution of q = x′Cx is the same as the distribution of (n − 1)s².

In the simplest case we suppose that the variables are standard normal, x ∼ Np(0, I). The distribution of q is then central χ² under some restrictions on A.

Theorem 2.4
Suppose that x ∼ Np(0, I) and let q = x′Ax. Then the distribution of q is central χ², i.e., q ∼ χ²_k, if and only if A is a symmetric idempotent matrix and rank(A) = k.

Again, if we assume the variables to be independent, but with nonzero means, the distribution of q is non-central χ², under the same restrictions on A as in Theorem 2.4. The non-centrality parameter will depend on A and µ.

Theorem 2.5
Suppose that x ∼ Np(µ, I) and let q = x′Ax. Then the distribution of q is non-central χ², i.e., q ∼ χ²_k(δ), where δ = ½µ′Aµ, if and only if A is an idempotent matrix and rank(A) = k.

Assume now that the variables in x are dependent, i.e., let x ∼ Np(µ, Σ), where Σ > 0. Since Σ is positive definite, Σ can be decomposed as Σ = Γ′Γ, where Γ : p × p and rank(Γ) = p. Using this decomposition the following theorem from Graybill (1976) is easy to prove.

Theorem 2.6
Suppose that x ∼ Np(µ, Σ), where rank(Σ) = p, and let q = x′Ax. Then the distribution of q is non-central χ², i.e., q ∼ χ²_k(δ), where δ = ½µ′Aµ, if and only if any of the following three conditions is satisfied:

• AΣ is an idempotent matrix of rank k.

• ΣA is an idempotent matrix of rank k.

• Σ is a generalized inverse of A and rank(A) = k.

Here Σ is a generalized inverse of A, denoted Σ = A⁻, if and only if AΣA = A. A special case of Theorem 2.6 is the following corollary (Graybill, 1976), where the covariance matrix is supposed to be a diagonal matrix and the mean is the same for every variable.

Corollary 2.1
Suppose that x ∼ Np(µ, D), where D is a diagonal matrix of rank p. Then q = x′Ax ∼ χ²_{p−1}(δ), where δ = ½µ′Aµ, if

\[
A = D^{-1} - (\mathbf{1}'D^{-1}\mathbf{1})^{-1}D^{-1}\mathbf{1}\mathbf{1}'D^{-1}.
\]

Also δ = 0 if µ = µ1 for any scalar µ.

Corollary 2.1 gives the distribution of the ordinary sample variance.

Example 2.2
Let x1, . . . , xn be a random sample with xi ∼ N(µ, σ²) for i = 1, . . . , n. Furthermore, let the variables be independent. We can now write

\[
x = (x_1, \ldots, x_n)' \sim N_n(\mu\mathbf{1}, \sigma^2 I)
\]

and using Corollary 2.1 we have D = σ²I and

\[
A = D^{-1} - (\mathbf{1}'D^{-1}\mathbf{1})^{-1}D^{-1}\mathbf{1}\mathbf{1}'D^{-1} = \sigma^{-2}(I - n^{-1}\mathbf{1}\mathbf{1}').
\]

Using this we see that the quadratic form x′Ax has the following χ² distribution:

\[
x'Ax = \frac{x'(I - n^{-1}\mathbf{1}\mathbf{1}')x}{\sigma^2} = \frac{(x - \bar{x}\mathbf{1})'(x - \bar{x}\mathbf{1})}{\sigma^2} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{\sigma^2} = \frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1},
\]

and this is as we expected.

The next theorem is the most general one concerning the distribution of the quadratic form. It states that every quadratic form x′Ax has the same distribution as a weighted sum of independent non-central χ² variables; see for example Baldessari (1967), Tan (1977), Graybill (1976) for more details.

Theorem 2.7
Suppose that x ∼ Np(µ, Σ), where rank(Σ) = p. The random variable q = x′Ax has the same distribution as the random variable w = Σ_{i=1}^{p} d_i w_i, where the d_i are the latent roots of the matrix AΣ and the w_i are independent non-central χ² random variables, each with one degree of freedom.

Theorem 2.7 and the fact that a sum of independent non-central χ² variables is again non-central χ² distributed show that all the theorems above are special cases.
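Theorem 2.7 can be illustrated by Monte Carlo in the central case µ = 0 (a sketch of my own; the matrices and sample sizes are arbitrary). The simulated quadratic form q = x′Ax and the weighted sum of independent χ²₁ variables, with weights d_i equal to the latent roots of AΣ, should agree in distribution:

```python
import numpy as np

rng = np.random.default_rng(2)
p, reps = 4, 200_000

A = rng.standard_normal((p, p)); A = (A + A.T) / 2   # symmetric A
G = rng.standard_normal((p, p)); Sigma = G @ G.T + p * np.eye(p)

d = np.linalg.eigvals(A @ Sigma).real                # latent roots of A*Sigma

x = rng.multivariate_normal(np.zeros(p), Sigma, size=reps)
q = np.einsum("ij,jk,ik->i", x, A, x)                # x'Ax for each draw

w = rng.chisquare(df=1, size=(reps, p)) @ d          # sum_i d_i * chi2_1

# Compare a few quantiles of the two samples; they should nearly coincide.
print(np.quantile(q, [0.1, 0.5, 0.9]))
print(np.quantile(w, [0.1, 0.5, 0.9]))
```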

One of the first to consider independence between quadratic forms was Cochran (1934).

Theorem 2.8 (Cochran's Theorem)
Given x ∼ Np(0, I), suppose that x′x is decomposed into k quadratic forms q_i = x′B_i x, i = 1, 2, . . . , k, where rank(B_i) = r_i and B_i ≥ 0. Then any one of the following conditions implies the other two:

• Σ_{i=1}^{k} r_i = p,

• each q_i ∼ χ²_{r_i},

• all the q_i are mutually independent.

Many authors have generalized Cochran's theorem, see for example Chipman and Rao (1964), Styan (1970), Tan (1977) and the references therein.

Multivariate Quadratic Forms

Several authors, see for example Khatri (1962), Shah (1970), Vaish and Chaganty (2004), have investigated the conditions under which the quadratic form Q = XAX′, where X ∼ Np,n(M, Σ, Ψ), has a Wishart distribution. Rao (1973) answered the question with the relation

\[
XAX' \sim W_p \iff l'XAX'l \sim \chi^2,
\]

for any fixed vector l. Hence, the theory of univariate quadratic forms can be applied to the multivariate case. If M = 0 and Ψ = I, we have a multivariate version of Theorem 2.4 (see for example Rao (1973), Gupta and Nagar (2000) for the details).

Theorem 2.9
Let Y ∼ Np,n(M, Σ, I) and let A : n × n be a symmetric real matrix. Then Q = YAY′ is Wishart distributed if and only if A is an idempotent matrix.

Example 2.3
Let Y ∼ Np,n(µ1′, Σ, I), with µ = (µ1, . . . , µp)′, i.e., Y consists of n independent p-vectors, Y = (y1, . . . , yn). The sample mean vector ȳ and sample covariance matrix S are defined by

\[
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i = \frac{1}{n}Y\mathbf{1}, \qquad S = \frac{1}{n}(Y - \bar{y}\mathbf{1}')(Y - \bar{y}\mathbf{1}')' = \frac{1}{n}YCY',
\]

where C is the centering matrix, i.e., C = I − (1/n)11′. We know that C is idempotent and rank(C) = n − 1. Using Theorem 2.9 we have

\[
nS \sim W_p(n - 1, \Sigma),
\]

since MCM′ = µ1′C1µ′ = 0.

In the non-central case, when M ≠ 0 and Ψ ≠ I, Q is non-central Wishart if and only if AΨ is an idempotent matrix.

Corollary 2.2
Let X ∼ Np,n(M, Σ, Ψ) and let A : n × n be a symmetric real matrix. Then

\[
Q = XAX' \sim W_p(r, \Sigma, MAM'),
\]

where r = rank(AΨ), if and only if AΨ is an idempotent matrix.

The next theorem, from Kollo and von Rosen (2005) (Theorem 2.2.4), gives necessary and sufficient conditions for two quadratic forms to be independent.

Theorem 2.10
Let X ∼ Np,n(M, Σ, Ψ), Y ∼ Np,n(0, Σ, Ψ) and let A : n × n and B : n × n be non-random matrices. Then

(i) YAY′ is independent of YBY′ if and only if

\[
\Psi A \Psi B' \Psi = 0, \quad \Psi A' \Psi B \Psi = 0, \quad \Psi A \Psi B \Psi = 0, \quad \Psi A' \Psi B' \Psi = 0;
\]

(ii) YAY′ is independent of YB if and only if

\[
B'\Psi A'\Psi = 0, \quad B'\Psi A\Psi = 0.
\]

Theorem 2.10 can be used to show the independence between the sample mean and the sample covariance.

Example 2.4
Let Y ∼ Np,n(µ1′, Σ, I), with µ = (µ1, . . . , µp)′. Using Theorem 2.10(ii) we see that ȳ and S given in Example 2.3 are independent.

Khatri (1962) extended Cochran's theorem (Theorem 2.8) to the multivariate case by discussing conditions for Wishartness and independence of second degree polynomials. Several others have also generalized Cochran's theorem to the multivariate case, see for example Rao and Mitra (1971), Khatri (1980), Vaish and Chaganty (2004), Tian and Styan (2005).

Khatri (1966b) derived the density of Q = XAX′, when X ∼ Np,n(0, Σ, Ψ), i.e., for the central case M = 0, as

\[
2^{-pn/2}\left(\Gamma_p\!\left(\tfrac{n}{2}\right)\right)^{-1} |A\Psi|^{-p/2} |\Sigma|^{-n/2} |Q|^{(n-p-1)/2} \exp\left\{-\tfrac{1}{2}q^{-1}\operatorname{tr}\Sigma^{-1}Q\right\} \, {}_0F_0\!\left(T, \tfrac{1}{2}q^{-1}\Sigma^{-1}Q\right), \tag{2.3}
\]

where q > 0 is an arbitrary constant and T = I_n − qA^{-1/2}Ψ^{-1}A^{-1/2}. The density function involves the hypergeometric function of matrix argument ₀F₀ and is relatively cumbersome to handle. The hypergeometric function can be expanded in terms of zonal polynomials Cκ(·) (see Muirhead (1982) for details about the zonal polynomials) as

\[
{}_0F_0(R, S) = \sum_{k=0}^{\infty} \sum_{\kappa} \frac{C_\kappa(R)\,C_\kappa(S)}{C_\kappa(I)\,k!}, \tag{2.4}
\]

which is slowly convergent, and the expansion in terms of Laguerre polynomials may be preferable for computational purposes.

The probability density function (2.3) is written as the product of a Wishart density function and a generalized hypergeometric function. This form is not always convenient for studying properties of Q. For M = 0 and Ψ = I_n, both Hayakawa (1966) and Shah (1970) derived the probability density function of Q. Using the moment generating function of Q, Shah (1970) expressed the density function of Q in terms of Laguerre polynomials with matrix argument. Hayakawa (1966) gave the probability density function of Q when M = 0 and Ψ = I_n as

\[
2^{-pn/2}\left(\Gamma_p\!\left(\tfrac{n}{2}\right)\right)^{-1} |A|^{-p/2} |Q|^{(n-p-1)/2} \, {}_0F_0\!\left(A^{-1}, -\tfrac{1}{2}\Sigma^{-1}Q\right). \tag{2.5}
\]

Hayakawa (1966) also showed that any quadratic form Q can be decomposed into a linear combination of independent central Wishart or pseudo-Wishart matrices with coefficients equal to the eigenvalues of A.

The Laplace transform was used in Khatri (1977) to generalize the results of Shah (1970) to the non-central case. When A = I, Khatri (1977) also obtained a similar representation of the non-central Wishart density in terms of generalized Laguerre polynomials with matrix argument.

In the non-central case, when X ∼ Np,n(M, Σ, Ψ), Gupta and Nagar (2000) gave the non-central density of Q = XAX′ in terms of generalized Hayakawa polynomials, which are expectations of certain zonal polynomials. Gupta and Nagar (2000) also computed the moment generating function and used it to prove Wishartness and independence of quadratic forms.

Ohlson and Koski (2009b) characterized the distribution of Q using the characteristic function. The characteristic function of Q = YAY′ is given by the following theorem due to Ohlson and Koski (2009b).

Theorem 2.11
Let Y ∼ Np,n(M, Σ, I); then the characteristic function of Q = YAY′ is

\[
\varphi_Q(T) = \prod_{j=1}^{r} |I_p - i\lambda_j\Gamma\Sigma|^{-1/2} \operatorname{etr}\left(-\tfrac{1}{2}\Sigma^{-1}\Omega_j\right) \operatorname{etr}\left(\tfrac{1}{2}\Sigma^{-1}\Omega_j (I_p - i\lambda_j\Gamma\Sigma)^{-1}\right),
\]

where T = (tij), Γ = (γij) = ((1 + δij)tij), tij = tji and δij is the Kronecker delta. The non-centrality parameters are Ωj = mjm′j, where mj = M∆j. The vectors ∆j and the values λj are the latent vectors and latent roots of A, respectively.

Ohlson and Koski (2009b) showed that the distribution of Q coincides with the distribution of a weighted sum of non-central Wishart distributed matrices, similarly to the case M = 0 and Ψ = I treated by Hayakawa (1966). A multivariate version of Theorem 2.7 was given by Ohlson and Koski (2009b) for the matrix quadratic form Q = YAY′.

Theorem 2.12
Suppose Y ∼ Np,n(M, Σ, I) and let Q be the quadratic form Q = YAY′, where A : n × n is any real matrix. Then the distribution of Q is the same as that of W = Σ_j λ_j W_j, where the λ_j are the nonzero latent roots of A and the W_j are independent non-central Wishart matrices, i.e.,

\[
W_j \sim W_p(1, \Sigma, m_j m_j'),
\]

where m_j = M∆_j and the ∆_j are the corresponding latent vectors.

Since this class of distributions is rather common, Ohlson and Koski (2009b) defined the distribution and gave several properties.

Definition 2.2. Assume Y ∼ Np,n(M, Σ, I). Define the distribution of the multivariate quadratic form Q = YAY′ to be Qp(A, M, Σ).

If we transform a Wishart distributed matrix W as BWB′, we obtain a new Wishart distributed matrix. In the same way we can transform our quadratic form.

Theorem 2.13
Let Q ∼ Qp(A, M, Σ) and let B : q × p be a real matrix. Then

\[
BQB' \sim Q_q(A, BM, B\Sigma B').
\]

More theorems and properties for the distribution of Q = YAY′ can be found in Ohlson and Koski (2009b).

The standard results about distributions of quadratic forms, such as Theorem 2.9, follow from Theorem 2.12 and the fact that idempotent matrices only have latent roots equal to zero or one (Horn and Johnson, 1990). The expectation of the quadratic form YAY′ is another property that is easy to derive using Theorem 2.12.

Theorem 2.14
Let Q = YAY′ ∼ Qp(A, M, Σ). Then

\[
E(YAY') = \operatorname{tr}(A)\,\Sigma + MAM'.
\]

Now suppose that X ∼ Np,n(M, Σ, Ψ), i.e., the columns are dependent as well.

Corollary 2.3
The distribution of Q = XAX′ is the same as that of

\[
W = \sum_j \lambda_j W_j,
\]

where the λ_j are the nonzero latent roots of Ψ^{1/2}AΨ^{1/2} and the W_j are independent non-central Wishart matrices, i.e.,

\[
W_j \sim W_p(1, \Sigma, m_j m_j'),
\]

where m_j = MΨ^{−1/2}∆_j and the ∆_j are the corresponding latent vectors.

Hence, we see that the distribution of Q is Qp(Ψ^{1/2}AΨ^{1/2}, MΨ^{−1/2}, Σ).

Remark 2.1. The latent roots are the same for Ψ^{1/2}AΨ^{1/2} and AΨ.

2.2 Estimation of the Parameters

Originally, many estimators of the covariance matrix were obtained from non-iterative least squares methods such as the ANOVA and MINQUE approaches, for example when estimating variance components. When computer resources became more powerful, iterative methods such as maximum likelihood, restricted maximum likelihood and generalized estimating equations, among others, were introduced. However, nowadays one is interested in applying covariance structures, including variance components models, to very large data sets. These are found for example in QTL analysis in genetics, or in time series with densely sampled observations in meteorology or in EEG/EKG studies in medicine.

2.2.1 Maximum Likelihood Estimators

Let x ∼ Np(µ, Σ), where µ = (µ1, . . . , µp)′ and Σ > 0. Furthermore, let xi, i = 1, . . . , n, be an independent random sample on x. The observation matrix can then be written as

\[
X = (x_1, \ldots, x_n) \sim N_{p,n}(\mu\mathbf{1}', \Sigma, I).
\]

From (2.1) the likelihood function is given by

\[
L(\mu, \Sigma) = (2\pi)^{-pn/2} |\Sigma|^{-n/2} \operatorname{etr}\left\{-\tfrac{1}{2}\Sigma^{-1}(X - \mu\mathbf{1}')(X - \mu\mathbf{1}')'\right\}. \tag{2.6}
\]

Since

\[
(X - \mu\mathbf{1}')(X - \mu\mathbf{1}')' = S + n(\bar{x} - \mu)(\bar{x} - \mu)',
\]

where

\[
\bar{x} = \frac{1}{n}X\mathbf{1}
\]

and

\[
S = X\left(I - \mathbf{1}(\mathbf{1}'\mathbf{1})^{-1}\mathbf{1}'\right)X' = X\left(I - n^{-1}\mathbf{1}\mathbf{1}'\right)X', \tag{2.7}
\]

the likelihood function (2.6) can be written as

\[
L(\mu, \Sigma) = (2\pi)^{-pn/2} |\Sigma|^{-n/2} \operatorname{etr}\left\{-\tfrac{1}{2}\Sigma^{-1}\left(S + n(\bar{x} - \mu)(\bar{x} - \mu)'\right)\right\}. \tag{2.8}
\]

Hence, from the Fisher-Neyman factorization theorem (Schervish (1995), page 89), (x̄, S) is a sufficient statistic for (µ, Σ). The maximum likelihood estimators can be established through a series of inequalities:

\[
L(\mu, \Sigma) = (2\pi)^{-pn/2} |\Sigma|^{-n/2} \operatorname{etr}\left\{-\tfrac{1}{2}\Sigma^{-1}\left(S + n(\bar{x} - \mu)(\bar{x} - \mu)'\right)\right\} \tag{2.9}
\]
\[
\leq (2\pi)^{-pn/2} |\Sigma|^{-n/2} \operatorname{etr}\left\{-\tfrac{1}{2}\Sigma^{-1}S\right\} \tag{2.10}
\]
\[
\leq (2\pi)^{-pn/2} \left|\tfrac{1}{n}S\right|^{-n/2} \exp(-pn/2), \tag{2.11}
\]

where the first inequality holds since (x̄ − µ)′Σ⁻¹(x̄ − µ) ≥ 0 and the second inequality follows from

\[
|\Sigma|^{-n/2} \operatorname{etr}\left\{-\tfrac{1}{2}\Sigma^{-1}S\right\} \leq \left|\tfrac{1}{n}S\right|^{-n/2} \exp(-pn/2),
\]

given in Srivastava and Khatri (1979), page 25. Equality in (2.9)-(2.11) holds if and only if µ = x̄ and Σ = (1/n)S. Hence, the maximum likelihood estimators of µ and Σ are

\[
\hat{\mu}_{ML} = \frac{1}{n}X\mathbf{1} \tag{2.12}
\]

and

\[
\hat{\Sigma}_{ML} = \frac{1}{n}X\left(I - \mathbf{1}(\mathbf{1}'\mathbf{1})^{-1}\mathbf{1}'\right)X'. \tag{2.13}
\]
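In code, the closed-form estimators (2.12) and (2.13) are one matrix product each; the sketch below (mine, on simulated data) checks them against numpy's built-in mean and biased covariance:

```python
import numpy as np

rng = np.random.default_rng(4)
p, n = 3, 50
mu = np.array([1.0, -2.0, 0.5])
G = rng.standard_normal((p, p)); Sigma = G @ G.T + p * np.eye(p)

X = rng.multivariate_normal(mu, Sigma, size=n).T   # p x n observation matrix
one = np.ones((n, 1))

mu_ml = (X @ one / n).ravel()                      # (2.12)
C = np.eye(n) - one @ one.T / n
Sigma_ml = X @ C @ X.T / n                         # (2.13)

assert np.allclose(mu_ml, X.mean(axis=1))
assert np.allclose(Sigma_ml, np.cov(X, ddof=0))
```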

Since (I − 1(1′1)⁻¹1′) is a projection onto the orthogonal complement of the column space generated by the vector 1, it can be seen using Theorem 2.10(ii) that µ̂_ML and Σ̂_ML are independent, with distributions given by the following theorem (Anderson (2003), pages 77 and 255).

Theorem 2.15
If X ∼ Np,n(µ1′, Σ, I) then the maximum likelihood estimators given in (2.12) and (2.13) are independently distributed as

\[
\hat{\mu}_{ML} \sim N_p\!\left(\mu, \tfrac{1}{n}\Sigma\right)
\]

and

\[
n\hat{\Sigma}_{ML} \sim W_p(n - 1, \Sigma).
\]


2.2.2 Patterned Covariance Matrix

Patterned covariance matrices arise in a variety of contexts and have been studied by many authors. In a seminal paper, Wilks (1946) considered patterned structures when dealing with measurements on k equivalent psychological tests. This led to a covariance matrix with equal diagonal elements and equal off-diagonal elements, i.e., a covariance matrix given by

\[
\Sigma = \sigma^2\left((1 - \rho)I + \rho\mathbf{1}\mathbf{1}'\right) : p \times p, \tag{2.14}
\]

with −1/(p − 1) < ρ < 1. This structure is called the uniform, complete symmetry or intraclass covariance structure. Wilks (1946) developed statistical test criteria for testing equality of means, equality of variances and equality of covariances. The structure implies that both the inverse and the determinant have closed form expressions (Muirhead (1982), page 114) and the maximum likelihood estimators are given by

\[
\hat{\sigma}^2 = \frac{\operatorname{tr} S}{p(n - 1)}
\]

and

\[
\hat{\rho} = \frac{\mathbf{1}'S\mathbf{1} - \operatorname{tr} S}{(p - 1)\operatorname{tr} S},
\]

where S is given in (2.7).
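The explicit intraclass estimators above transcribe directly into code; this is my sketch on data simulated from (2.14), with S computed as in (2.7):

```python
import numpy as np

rng = np.random.default_rng(5)
p, n, sigma2, rho = 4, 200, 2.0, 0.3
Sigma = sigma2 * ((1 - rho) * np.eye(p) + rho * np.ones((p, p)))  # (2.14)

X = rng.multivariate_normal(np.zeros(p), Sigma, size=n).T
C = np.eye(n) - np.ones((n, n)) / n
S = X @ C @ X.T                                    # S as in (2.7)

one = np.ones(p)
sigma2_hat = np.trace(S) / (p * (n - 1))
rho_hat = (one @ S @ one - np.trace(S)) / ((p - 1) * np.trace(S))
print(sigma2_hat, rho_hat)                         # close to 2.0 and 0.3
```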

Votaw (1948) extended the intraclass model to a model with blocks called compound symmetry, of type I and type II. The compound symmetry covariance matrices are, for the p = 4 case, given as

\[
\Sigma_I =
\begin{pmatrix}
\alpha & \beta & \beta & \beta \\
\beta & \gamma & \delta & \delta \\
\beta & \delta & \gamma & \delta \\
\beta & \delta & \delta & \gamma
\end{pmatrix}
\quad \text{and} \quad
\Sigma_{II} =
\begin{pmatrix}
\alpha & \beta & \kappa & \sigma \\
\beta & \alpha & \sigma & \kappa \\
\kappa & \sigma & \gamma & \delta \\
\sigma & \kappa & \delta & \gamma
\end{pmatrix}. \tag{2.15}
\]

Votaw (1948) considered different psychometric and medical research problems where compound symmetry is applicable. In Szatrowski (1982) block compound symmetry was discussed and the models were applied to the analysis of an educational testing problem. In a series of papers, Szatrowski (1985) discussed how to obtain maximum likelihood estimators for the elements of a class of patterned covariance matrices in the presence of missing data.

Another type of block structure arises when the multivariate complex normal distribution is considered. The multivariate complex normal distribution can be defined as in Srivastava and Khatri (1979).

Definition 2.3. Let z = x + iy with mean θ and covariance matrix Q = Σ1 + iΣ2. Then z ∼ CNp(θ, Q) if and only if

\[
\begin{pmatrix} x \\ y \end{pmatrix} \sim N_{2p}\!\left(\begin{pmatrix} \theta_1 \\ \theta_2 \end{pmatrix}, \Sigma_C\right),
\]

where

\[
\Sigma_C = \frac{1}{2}\begin{pmatrix} \Sigma_1 & -\Sigma_2 \\ \Sigma_2 & \Sigma_1 \end{pmatrix}, \qquad \theta = \theta_1 + i\theta_2.
\]

Goodman (1963) was one of the first to study the covariance matrix of the multivariate complex normal distribution, which for example arises in spectral analysis of multiple time series. A direct extension is to study quaternions, e.g., see Andersson (1975) and Andersson et al. (1983). The covariance matrix in the quaternion case is a 4p × 4p matrix and is structured as

\[
\Sigma_Q =
\begin{pmatrix}
\Sigma_1 & \Sigma_2 & \Sigma_3 & \Sigma_4 \\
-\Sigma_2 & \Sigma_1 & -\Sigma_4 & \Sigma_3 \\
-\Sigma_3 & \Sigma_4 & \Sigma_1 & -\Sigma_2 \\
-\Sigma_4 & -\Sigma_3 & \Sigma_2 & \Sigma_1
\end{pmatrix}.
\]

A circular stationary model, where variables are thought of as being equally spaced around a circle, was considered by Olkin and Press (1969). The covariance between two variables in the circular stationary model depends on the distance between the variables, and the covariance matrices for p = 4 and p = 5 have the structures

\[
\Sigma_{CS} = \sigma_0^2
\begin{pmatrix}
1 & \rho_1 & \rho_2 & \rho_1 \\
\rho_1 & 1 & \rho_1 & \rho_2 \\
\rho_2 & \rho_1 & 1 & \rho_1 \\
\rho_1 & \rho_2 & \rho_1 & 1
\end{pmatrix}
\quad \text{and} \quad
\Sigma_{CS} = \sigma_0^2
\begin{pmatrix}
1 & \rho_1 & \rho_2 & \rho_2 & \rho_1 \\
\rho_1 & 1 & \rho_1 & \rho_2 & \rho_2 \\
\rho_2 & \rho_1 & 1 & \rho_1 & \rho_2 \\
\rho_2 & \rho_2 & \rho_1 & 1 & \rho_1 \\
\rho_1 & \rho_2 & \rho_2 & \rho_1 & 1
\end{pmatrix}.
\]
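For general p, a circular stationary covariance matrix of this kind can be generated from circular distances between positions; the helper below is an illustration of my own, not a construction from the thesis:

```python
import numpy as np

def circular_stationary(sigma2_0, rhos):
    """Circular stationary covariance matrix for p = 2*len(rhos) + 1
    positions; rhos[d-1] is the correlation at circular distance d."""
    p = 2 * len(rhos) + 1
    Sigma = np.eye(p)
    for i in range(p):
        for j in range(p):
            d = min(abs(i - j), p - abs(i - j))    # circular distance
            if d > 0:
                Sigma[i, j] = rhos[d - 1]
    return sigma2_0 * Sigma

print(circular_stationary(1.0, [0.4, 0.2]))        # the p = 5 pattern above
```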

Olkin and Press (1969) considered three symmetries, namely circular, intraclass and spherical, and derived likelihood ratio tests and their asymptotic distributions under the hypothesis and the alternative. Olkin (1973) generalized the circular stationary model to a multivariate version in which each element is a vector; the covariance matrix can then be written as a block circular matrix.

The covariance symmetries investigated by, for example, Wilks (1946), Votaw (1948) and Olkin and Press (1969) are all special cases of the invariant normal models considered by Andersson (1975). The invariant normal models include not only all models specified by symmetries of the covariance matrix, but also the linear models for the mean. The symmetry model defined by a group G is the family of covariance matrices given by

\[
S_G^+ = \left\{\Sigma \mid \Sigma > 0,\; G\Sigma G' = \Sigma \text{ for all } G \in \mathcal{G}\right\},
\]

i.e., if x is a random vector with cov(x) = Σ such that Σ ∈ S⁺_G, then there is a symmetry restriction on the covariance matrix, namely that cov(x) = cov(Gx) for all G ∈ G. Perlman (1987) summarized and discussed the group symmetry covariance models suggested by Andersson (1975). Furthermore, several examples of different symmetries, maximum likelihood estimators and likelihood ratio tests are given in Perlman (1987). The following example by Perlman (1987) shows the connection between a certain symmetry and a group representation.

Example 2.5
Let the random vector x be partitioned as x = (x1′, . . . , xk′)′ and assume that cov(x) = Σ has circular block symmetry, i.e.,

\[
\operatorname{cov}(x) = \operatorname{cov}(P^r x), \quad r = 1, \ldots, k - 1, \tag{2.16}
\]

where

\[
P =
\begin{pmatrix}
0 & I_q & 0 & \cdots & 0 \\
0 & 0 & I_q & & \vdots \\
\vdots & & & \ddots & I_q \\
I_q & 0 & 0 & \cdots & 0
\end{pmatrix} : (qk) \times (qk).
\]

Let G1 be the cyclic group of order k given by

\[
\mathcal{G}_1 = \left\{I, P, \ldots, P^{k-1}\right\}.
\]

For k = 4 one can verify that S⁺_{G1} consists of all positive definite matrices of the form

\[
\Sigma_{CBS} =
\begin{pmatrix}
A & B & C & B' \\
B' & A & B & C \\
C & B' & A & B \\
B & C & B' & A
\end{pmatrix}, \quad A = A', \; C = C',
\]

where A, B, C : q × q, and it follows that the circular block symmetry condition (2.16) is equivalent to Σ ∈ S⁺_{G1}.

In Jensen (1988) a class of covariance models larger than the class of invariant normal models was obtained. Jensen (1988) considered structures which are linear in both the covariance and the inverse covariance and proved that these can be parameterized by Jordan algebras. The structures which are linear in the covariance are given by

\[
\Sigma = \sum_{i=1}^{m} \sigma_i G_i, \tag{2.17}
\]

where the G_i, i = 1, . . . , m, are known symmetric and linearly independent matrices, and the σ_i, i = 1, . . . , m, are unknown real parameters such that Σ is positive definite. The structure (2.17) was first discussed by Anderson (1969, 1970, 1973), where the likelihood equations, an iterative method for solving these equations and the asymptotic distribution of the estimates were given.
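A common non-iterative way to fit a structure that is linear in the covariance, as in (2.17), is to project a sample covariance matrix onto span{G_1, . . . , G_m} by least squares. The generic sketch below is my addition (it is not Anderson's likelihood-based method) and uses the intraclass structure (2.14) as a two-matrix example:

```python
import numpy as np

def ls_fit_linear_structure(S, Gs):
    """Least squares fit of S to Sigma = sum_i sigma_i G_i, with Gs a list
    of known symmetric, linearly independent matrices G_i."""
    D = np.column_stack([G.ravel() for G in Gs])   # p^2 x m design matrix
    coef, *_ = np.linalg.lstsq(D, S.ravel(), rcond=None)
    return coef

# Intraclass structure: Sigma = sigma_1 * I + sigma_2 * (11' - I).
p = 4
G1, G2 = np.eye(p), np.ones((p, p)) - np.eye(p)
S = 2.0 * ((1 - 0.3) * np.eye(p) + 0.3 * np.ones((p, p)))  # exact (2.14)
print(ls_fit_linear_structure(S, [G1, G2]))        # ~ [2.0, 0.6]
```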

Permutation invariant covariance matrices were considered in Nahtman (2006), where it was proven that permutation invariance implies a specific structure for the covariance matrix. Nahtman and von Rosen (2005) showed that shift invariance implies Toeplitz covariance matrices and that marginal shift invariance gives block Toeplitz covariance matrices.

There exist many papers on Toeplitz covariance matrices, e.g., see Burg et al. (1982), Fuhrmann and Barton (1990), Marin and Dhorne (2002) and Christensen (2007). To have a Toeplitz structure means that certain invariance conditions are fulfilled, e.g., equality of variances and covariances. Toeplitz covariance matrices are all banded matrices. Banded covariance matrices and their inverses frequently arise in biological and economic time series, and in time series in engineering, for example in signal processing applications, including autoregressive or moving average image modeling and covariances of Gauss-Markov random processes (Woods, 1972, Moura and Balram, 1992), or in numerical approximations of partial differential equations based on finite differences. Banded matrices are also used to model the correlation of cyclostationary processes in periodic time series (Chakraborty, 1998).

One type of Toeplitz matrix is the covariance matrix arising from the theory of time series. Durbin (1959) suggested an efficient estimation procedure for the parameters in a Gaussian moving average time series of order one. Godolphin and De Gooijer (1982) computed the exact maximum likelihood estimators of the parameters of a moving average process. The autoregressive moving average process has been discussed by many authors. Anderson (1975) gave the likelihood equations for the parameters, which are non-linear in most cases, and proposed some iterative solutions. Anderson (1977) derived Newton-Raphson procedures for the same likelihood equations. For more references on time series analysis, see for example Box and Jenkins (1970), Hannan (1970), Anderson (1971). See Figure 2.2 for the connections between different covariance structures.

Several authors have considered other structures. For example, Jöreskog (1981) considered linear structural relations (LISREL) models and Lauritzen (1996) more sophisticated structures within the framework of graphical models. Browne (1977) reviews patterned correlation matrices arising from multiple psychological measurements.

Linearly Structured Covariance Matrices with Zeros

Covariance matrices with zeros have been considered by several authors, see for exampleGrzebyk et al. (2004), Drton and Richardson (2004), Mao et al. (2004), Chaudhuri et al.(2007).

An new algorithm, callediterative conditional fitting, for the maximum likelihoodestimators in the multivariate normal case is derived in Drton and Richardson (2004),Chaudhuri et al. (2007). The iterative conditional fitting algorithm is based on covariancegraph models and is an iterative algorithm for deriving the maximum likelihood estima-tors. Suppose we have a random vectorx = (x1, x2, x3, x4) with the covariance matrix

Σ =

σ11 0 σ13 00 σ22 0 σ24

σ13 0 σ33 σ34

0 σ24 σ34 σ44

. (2.18)

A covariance graph is a graph with a vertex for every random variable in the random vec-tor x. The vertices in the graph are connected by a bi-directed edgei↔ j unlessσij = 0.Hence, the covariance graph corresponding to the covariance matrix (2.18) is given inFigure 2.1. The iterative conditional fitting algorithm developed in Drton and Richardson(2004), Chaudhuri et al. (2007) is given as follows. First fix the marginal distribution for


Figure 2.1: Covariance graph for the covariance matrix (2.18); the nonzero covariances $\sigma_{13}$, $\sigma_{34}$ and $\sigma_{24}$ give the edges $x_1 \leftrightarrow x_3 \leftrightarrow x_4 \leftrightarrow x_2$.

Then estimate, by maximum likelihood, the conditional distribution of variable $i$ given all the other variables under the constraints implied by the covariance graph model. Last, calculate a new estimate of the joint distribution by multiplying together the fixed marginal and the estimated conditional distributions. Repeat until some convergence is obtained.

Let the index set of the variables be $V = \{1, \ldots, p\}$, let the set of all variables but $i$ be $-i = V \setminus \{i\}$ and let $sp(i) = \{j \mid i \leftrightarrow j\}$, i.e., the set of variables that are dependent on variable $i$. Let also $nsp(i) = V \setminus (sp(i) \cup \{i\})$. Assume that we have $n$ observations from a normal distribution with mean zero and a covariance matrix with some zeros. Let the observation matrix be $X = (x_1, \ldots, x_n) \sim N_{p,n}(0, \Sigma, I_n)$. Furthermore, the distribution of the variables $X_{-i}$ is given by

$$X_{-i} \sim N_{p-1,n}(0, \Sigma_{-i,-i}, I_n),$$

where $X_S$ is the submatrix of $X$ including the rows given by the index set $S$, and $\Sigma_{S,T}$ is the submatrix of $\Sigma$ including the rows and columns given by the index sets $S$ and $T$, respectively.

The conditional distribution of $X_i \mid X_{-i}$ is then

$$X_i \mid X_{-i} \sim N_{1,n}\big(\Sigma_{i,-i}\Sigma_{-i,-i}^{-1}X_{-i},\; \lambda_i I_n\big), \qquad (2.19)$$

where
$$\lambda_i = \sigma_{ii} - \Sigma_{i,-i}\Sigma_{-i,-i}^{-1}\Sigma_{-i,i}.$$

Using the fact that $\sigma_{ij} = 0$ if $j \in nsp(i)$, we have from (2.19)

$$X_i \mid X_{-i} \sim N_{1,n}\Big(\sum_{j \in sp(i)} \sigma_{ij} Z_j^{(i)},\; \lambda_i I_n\Big), \qquad (2.20)$$

where the pseudo-variables $Z^{(i)}_{sp(i)}$ are defined as

$$Z^{(i)}_{sp(i)} = \big(\Sigma_{-i,-i}^{-1}\big)_{sp(i),-i}\, X_{-i}. \qquad (2.21)$$

A more precise formulation of the iterative conditional fitting algorithm is as follows.


Algorithm 2.1 (Iterative conditional fitting algorithm)

1. Set the iteration counter $r = 0$ and choose a starting value $\Sigma^{(0)} \geq 0$, for example the identity matrix.

2. Set $\Sigma^{(r,0)} = \Sigma^{(r)}$ and repeat the following steps for all $i \in V$:

(i) Let $\Sigma^{(r,i)}_{-i,-i} = \Sigma^{(r,i-1)}_{-i,-i}$ and calculate from this submatrix the pseudo-variables $Z^{(i)}_{sp(i)}$ given in (2.21).

(ii) Compute the maximum likelihood estimates

$$\hat{\Sigma}^{(r,i)}_{i,sp(i)} = X_i Z^{(i)'}_{sp(i)}\big(Z^{(i)}_{sp(i)} Z^{(i)'}_{sp(i)}\big)^{-1}, \qquad \hat{\lambda}_i = \frac{1}{n}\big(X_i - \hat{\Sigma}^{(r,i)}_{i,sp(i)} Z^{(i)}_{sp(i)}\big)\big(X_i - \hat{\Sigma}^{(r,i)}_{i,sp(i)} Z^{(i)}_{sp(i)}\big)',$$

for the linear regression given in (2.20).

(iii) Complete $\Sigma^{(r,i)}$ by solving for $\sigma_{ii}$ and thus setting

$$\sigma^{(r,i)}_{ii} = \hat{\lambda}_i + \hat{\Sigma}^{(r,i)}_{i,sp(i)}\Big(\big(\Sigma^{(r,i)}_{-i,-i}\big)^{-1}\Big)_{sp(i),sp(i)}\hat{\Sigma}^{(r,i)}_{sp(i),i}.$$

3. Set $\Sigma^{(r+1)} = \Sigma^{(r,p)}$, increment the counter $r$ to $r + 1$ and repeat step 2 until convergence is obtained.
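The cyclic updates above operate only on the sufficient statistic $S = XX'/n$. As an illustration, a minimal Python sketch of Algorithm 2.1 is given below; the function name icf_covariance and the boolean matrix allowed encoding the covariance graph are hypothetical constructions of this presentation, not code from Drton and Richardson (2004) or Chaudhuri et al. (2007).

```python
import numpy as np

def icf_covariance(S, n, allowed, max_iter=200, tol=1e-10):
    """Iterative conditional fitting (Algorithm 2.1) -- a minimal sketch.

    S       : p x p matrix X X' / n for a mean-zero observation matrix X
    n       : number of observations (columns of X)
    allowed : boolean p x p matrix; allowed[i, j] is True iff sigma_ij may be
              nonzero (True on the diagonal); this encodes the covariance graph
    """
    p = S.shape[0]
    Sigma = np.eye(p)                               # starting value, step 1
    for _ in range(max_iter):
        Sigma_old = Sigma.copy()
        for i in range(p):                          # step 2, cycle over vertices
            mi = [j for j in range(p) if j != i]    # the index set -i
            sp = [t for t, j in enumerate(mi) if allowed[i, j]]  # sp(i) within -i
            if sp:
                Kinv = np.linalg.inv(Sigma[np.ix_(mi, mi)])
                B = Kinv[sp, :]                     # rows sp(i) of (Sigma_{-i,-i})^{-1}
                # second moments of the pseudo-variables Z^{(i)}_{sp(i)} = B X_{-i}
                ZZ = B @ S[np.ix_(mi, mi)] @ B.T
                XZ = S[i, mi] @ B.T
                beta = np.linalg.solve(ZZ, XZ)      # MLE of Sigma_{i,sp(i)}, step 2(ii)
                lam = S[i, i] - 2 * XZ @ beta + beta @ ZZ @ beta  # lambda_i hat
                var_add = beta @ Kinv[np.ix_(sp, sp)] @ beta      # step 2(iii)
            else:
                beta, lam, var_add = np.zeros(0), S[i, i], 0.0
            row = np.zeros(p - 1)
            row[sp] = beta                          # zeros stay at nsp(i)
            Sigma[i, mi] = row
            Sigma[mi, i] = row
            Sigma[i, i] = lam + var_add
        if np.abs(Sigma - Sigma_old).max() < tol:   # step 3
            break
    return Sigma
```

For the covariance matrix (2.18), for instance, allowed would be True on the diagonal and at the symmetric positions (1, 3), (2, 4) and (3, 4).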

Ohlson et al. (2009) studied banded matrices with unequal elements except that certain covariances are zero. This banded structure is a special case of the structure considered by Chaudhuri et al. (2007), see Figure 2.2. The basic idea is that widely separated observations in time or space often appear to be uncorrelated. Therefore, it is reasonable to work with a banded covariance structure where all covariances more than $m$ steps apart equal zero, a so called $m$-dependent structure. Let $\Sigma^{(m)}_{(k)} : k \times k$ be an $m$-dependent banded covariance matrix and partition $\Sigma^{(m)}_{(k)}$ as

$$\Sigma^{(m)}_{(k)} = \begin{pmatrix} \Sigma^{(m)}_{(k-1)} & \sigma_{1k} \\ \sigma_{k1}' & \sigma_{kk} \end{pmatrix}, \qquad (2.22)$$

where

$$\sigma_{k1}' = (0, \ldots, 0, \sigma_{k,k-m}, \ldots, \sigma_{k,k-1}).$$

The procedure suggested by Ohlson et al. (2009) to estimate a banded covariance matrix $\Sigma^{(m)}_{(k)}$ is given by the following algorithm.


Algorithm 2.2 (Explicit estimators for a banded covariance matrix)

Let $X \sim N_{p,n}(\mu 1_n', \Sigma^{(m)}_{(p)}, I_n)$, with arbitrary integer $m$. The estimators of $\mu$ and $\Sigma^{(m)}_{(p)}$ are given by the following two steps.

(i) Use the maximum likelihood estimators for $\mu_1, \ldots, \mu_{m+1}$ and $\Sigma^{(m)}_{(m+1)}$.

(ii) Calculate the following estimators for $k = m+2, \ldots, p$ in increasing order, where for each $k$ let $i = k-m, \ldots, k-1$:

$$\hat{\mu}_k = \frac{1}{n}\, x_k' 1_n, \qquad (2.23)$$

$$\hat{\sigma}_{ki} = \hat{\beta}_{ki}\,\frac{|\hat{\Sigma}_{(k-1)}|}{|\hat{\Sigma}_{(k-2)}|}, \qquad (2.24)$$

$$\hat{\sigma}_{kk} = \frac{1}{n}\, x_k'\big(I_n - \hat{X}_{k-1}(\hat{X}_{k-1}'\hat{X}_{k-1})^{-1}\hat{X}_{k-1}'\big) x_k + \hat{\sigma}_{k1}'\hat{\Sigma}^{-1}_{(k-1)}\hat{\sigma}_{1k}, \qquad (2.25)$$

where

$$\hat{\sigma}_{k1}' = (0, \ldots, 0, \hat{\sigma}_{k,k-m}, \ldots, \hat{\sigma}_{k,k-1}),$$

$$\hat{\beta}_k = \big(\hat{\beta}_{k0}, \hat{\beta}_{k,k-m}, \ldots, \hat{\beta}_{k,k-1}\big)' = (\hat{X}_{k-1}'\hat{X}_{k-1})^{-1}\hat{X}_{k-1}' x_k, \qquad (2.26)$$

$$\hat{X}_{k-1} = (1_n, \hat{x}_{k-1,k-m}, \ldots, \hat{x}_{k-1,k-1})$$

and

$$\hat{x}_{k-1,i} = \sum_{j=1}^{k-1}(-1)^{i+j}\,\frac{|\hat{M}^{ji}_{(k-1)}|}{|\hat{\Sigma}_{(k-2)}|}\, x_j,$$

where $\hat{M}^{ji}_{(k-1)}$ is the matrix obtained when the $j$th row and $i$th column have been removed from $\hat{\Sigma}_{(k-1)}$.
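To make the recursion concrete, the following Python sketch implements steps (i) and (ii) under the stated assumptions; the function name and the use of ordinary least squares for (2.26) are our own choices, not code from Ohlson et al. (2009). It uses the identity $\hat{x}_{k-1,i} = \big(|\hat{\Sigma}_{(k-1)}|/|\hat{\Sigma}_{(k-2)}|\big)\big(\hat{\Sigma}_{(k-1)}^{-1}(x_1, \ldots, x_{k-1})'\big)_i$, which follows from the cofactor expression for the adjugate.

```python
import numpy as np

def banded_estimators(X, m):
    """Explicit estimators for an m-dependent banded covariance matrix
    (a sketch of Algorithm 2.2, not the authors' own code).

    X : p x n data matrix whose columns are the observations.
    Returns (mu_hat, Sigma_hat) with Sigma_hat banded of bandwidth m.
    """
    p, n = X.shape
    mu = X.mean(axis=1)                          # (2.23); also the MLE in step (i)
    Sigma = np.zeros((p, p))
    k0 = m + 1                                   # size of the unconstrained block
    Xc = X - mu[:, None]
    Sigma[:k0, :k0] = Xc[:k0] @ Xc[:k0].T / n    # step (i): MLE of Sigma_(m+1)
    for k in range(k0, p):                       # 0-based index of the new variable
        Sk1 = Sigma[:k, :k]                      # Sigma_(k-1) in the thesis notation
        Sk2 = Sigma[:k - 1, :k - 1]              # Sigma_(k-2)
        c = np.linalg.det(Sk1) / np.linalg.det(Sk2)
        idx = list(range(k - m, k))              # the m nearest previous variables
        # pseudo-variables xhat_{k-1,i} = c * (Sigma_(k-1)^{-1} (x_1,...,x_{k-1})')_i
        Z = c * np.linalg.solve(Sk1, X[:k])[idx, :]
        D = np.vstack([np.ones(n), Z])           # design matrix (1_n, xhat, ...)
        beta, *_ = np.linalg.lstsq(D.T, X[k], rcond=None)   # (2.26)
        Sigma[k, idx] = beta[1:] * c             # (2.24)
        Sigma[idx, k] = Sigma[k, idx]
        resid = X[k] - D.T @ beta                # (I - P)x_k, the LS residual
        s1k = Sigma[k, :k]
        Sigma[k, k] = resid @ resid / n + s1k @ np.linalg.solve(Sk1, s1k)  # (2.25)
    return mu, Sigma
```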

The estimators in Algorithm 2.2 are fairly natural, but they are ad hoc estimators, and Ohlson et al. (2009) motivated them with the following theorem.

Theorem 2.16
The estimator $\hat{\mu} = (\hat{\mu}_1, \ldots, \hat{\mu}_p)'$ given in Algorithm 2.2 is unbiased and consistent, and the estimator $\hat{\Sigma}^{(m)}_{(p)} = (\hat{\sigma}_{ij})$ is consistent.

2.2.3 Estimating the Kronecker Product Covariance

If the covariance matrix for a matrix normal distribution is separable, i.e., if it can be written as a Kronecker product between two matrices, the model belongs to the curved exponential family. Thus, under the Kronecker product structure the parameter space is of lower dimension and the model has other statistical properties, i.e., one has to be careful since the estimation and testing can be more complicated.


Figure 2.2: Different covariance structures ($\Sigma_{WZ}$ = with zeros, $\Sigma^{(m)}$ = banded, $\Sigma_T$ = Toeplitz, $\Sigma_{CT}$ = circular Toeplitz, $\Sigma_{AR}$ = autoregressive and $\Sigma_{IC}$ = intraclass).

Kronecker product structures in covariance matrices have recently been studied by Dutilleul (1999), Naik and Rao (2001), Lu and Zimmerman (2005), Roy and Khattree (2005a), Mitchell et al. (2005, 2006) and Srivastava et al. (2007).

Let $X_1, \ldots, X_N$ be a random sample from the matrix normal distribution, $X_i \sim N_{p,n}(M, \Sigma, \Psi)$ for $i = 1, \ldots, N$. From (2.1), the logarithm of the likelihood function for the sample $X_1, \ldots, X_N$ can, ignoring the normalizing constant, be written as

$$\ln L(M, \Sigma, \Psi) = -\frac{nN}{2}\ln|\Sigma| - \frac{pN}{2}\ln|\Psi| - \frac{1}{2}\sum_{i=1}^{N}\mathrm{tr}\big\{\Sigma^{-1}(X_i - M)\Psi^{-1}(X_i - M)'\big\}. \qquad (2.27)$$

The likelihood equations for the maximum likelihood estimators $\hat{M}$, $\hat{\Sigma}$, $\hat{\Psi}$ are given by Dutilleul (1999) as

$$\hat{M} = \frac{1}{N}\sum_{i=1}^{N} X_i = \bar{X}, \qquad (2.28)$$

$$\hat{\Sigma} = \frac{1}{nN}\sum_{i=1}^{N}\big(X_i - \bar{X}\big)\hat{\Psi}^{-1}\big(X_i - \bar{X}\big)' \qquad (2.29)$$

and

$$\hat{\Psi} = \frac{1}{pN}\sum_{i=1}^{N}\big(X_i - \bar{X}\big)'\hat{\Sigma}^{-1}\big(X_i - \bar{X}\big). \qquad (2.30)$$

There is no explicit solution to (2.29) and (2.30). Dutilleul (1999) derived an estimator for $\Psi \otimes \Sigma$ using the flip-flop algorithm given in Algorithm 2.3. Srivastava et al. (2007) pointed out that the Gaussian model with a separable covariance matrix, i.e., $X \sim N_{p,n}(M, \Sigma, \Psi)$, belongs to the curved exponential family and that the convergence and uniqueness have to be carefully considered.

Algorithm 2.3 (The flip-flop algorithm)

1. Choose a starting value for $\hat{\Psi} = \hat{\Psi}^{(0)}$.

2. Estimate $\hat{\Sigma}^{(r)}$ from (2.29) with $\hat{\Psi} = \hat{\Psi}^{(r-1)}$.

3. Estimate $\hat{\Psi}^{(r)}$ from (2.30) with $\hat{\Sigma} = \hat{\Sigma}^{(r)}$.

4. Repeat steps 2 and 3 until some convergence criterion is fulfilled.
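A minimal Python sketch of Algorithm 2.3 is given below. The function name is hypothetical, and the sketch resolves the scale indeterminacy discussed next by normalizing $\hat{\psi}_{nn} = 1$, one of the two options suggested by Srivastava et al. (2007).

```python
import numpy as np

def flip_flop(Xs, max_iter=100, tol=1e-8):
    """Flip-flop algorithm (Algorithm 2.3) -- a minimal sketch.

    Xs : array of shape (N, p, n), a sample X_1, ..., X_N from N_{p,n}(M, Sigma, Psi).
    Returns (M_hat, Sigma_hat, Psi_hat), with Psi_hat normalized so psi_nn = 1.
    """
    N, p, n = Xs.shape
    M = Xs.mean(axis=0)                                     # (2.28)
    R = Xs - M                                              # centered observations
    Psi = np.eye(n)                                         # starting value
    Sigma = np.eye(p)
    for _ in range(max_iter):
        Psi_old = Psi.copy()
        Pinv = np.linalg.inv(Psi)
        Sigma = sum(Ri @ Pinv @ Ri.T for Ri in R) / (n * N)  # (2.29)
        Sinv = np.linalg.inv(Sigma)
        Psi = sum(Ri.T @ Sinv @ Ri for Ri in R) / (p * N)    # (2.30)
        Psi = Psi / Psi[-1, -1]        # fix the scale; Psi x Sigma is unaffected
        if np.abs(Psi - Psi_old).max() < tol:
            break
    return M, Sigma, Psi
```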

Another problem with the estimation of $\Sigma$ and $\Psi$, respectively, is that not all the parameters are uniquely defined, since for every scalar $c \neq 0$ we have

$$\Psi \otimes \Sigma = c\Psi \otimes \frac{1}{c}\Sigma.$$

Srivastava et al. (2007) gave two ways to obtain unique estimates: either set $\hat{\psi}_{nn} = 1$ or set $\hat{\psi}_{ii} = 1$ for $i = 1, \ldots, n$, respectively, and then use Algorithm 2.3 under these constraints.

2.2.4 Estimating the Kronecker Product Covariance (One Observation Matrix)

In many situations in statistical analysis the assumption of independence between observations is violated. Observations from a spatio-temporal stochastic process can, under some conditions, be described by a matrix normal distribution with a separable covariance matrix. Often, when the stochastic process is spatio-temporal, some structure can also be assumed for one or both of the matrices in the Kronecker product. Shitan and Brockwell (1995) and Ohlson and Koski (2009a) have considered the Kronecker product structure in time series analysis and structures for the covariance matrices.


Spatio-temporal processes typically yield only one observation matrix. Let the random matrix $X : p \times n$ have a matrix normal distribution with a separable covariance matrix, i.e., $X \sim N_{p,n}(M, \Sigma, \Psi)$, where the covariance between the rows is $\Sigma : p \times p$ and the covariance between the columns is $\Psi : n \times n$. We will call $\Sigma$ and $\Psi$ the spatial and temporal covariance matrix, respectively. Furthermore, we will start by assuming that $\Psi$ is known.

Assume that $X = (x_1, x_2, \ldots, x_n)$, where $x_i \sim N_p(\mu, \Sigma)$. Since the temporal covariance is $\Psi$, we have $n$ dependent vectors. The expectation of $X$ is given by $E(X) = \mu 1'$, where $\mu = (\mu_1, \ldots, \mu_p)'$. When $N = 1$ in the log-likelihood function (2.27), it is easy to prove the following theorem.

Theorem 2.17
Let $X \sim N_{p,n}(\mu 1', \Sigma, \Psi)$, where $\Sigma > 0$ and $\Psi$ is known. The maximum likelihood estimators for $\mu$ and $\Sigma$ are given by

$$\hat{\mu} = (1'\Psi^{-1}1)^{-1}X\Psi^{-1}1$$

and

$$n\hat{\Sigma} = XHX',$$

where $H$ is the weighted centralization matrix

$$H = \Psi^{-1} - \Psi^{-1}1\big(1'\Psi^{-1}1\big)^{-1}1'\Psi^{-1}. \qquad (2.31)$$
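Theorem 2.17 is immediate to apply numerically. A small sketch (the function name is our own) assuming $\Psi$ is known:

```python
import numpy as np

def mle_known_psi(X, Psi):
    """MLEs of mu and Sigma for X ~ N_{p,n}(mu 1', Sigma, Psi) with Psi known
    (Theorem 2.17) -- a minimal sketch."""
    p, n = X.shape
    one = np.ones((n, 1))
    Pinv = np.linalg.inv(Psi)
    denom = one.T @ Pinv @ one                    # the scalar 1' Psi^{-1} 1
    mu = X @ Pinv @ one / denom                   # weighted mean
    H = Pinv - Pinv @ one @ one.T @ Pinv / denom  # (2.31)
    Sigma = X @ H @ X.T / n
    return mu.ravel(), Sigma
```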

We know that the ordinary sample covariance matrix $nS = X(I - \frac{1}{n}11')X'$ is Wishart distributed, and we can show that the same is valid for the sample covariance matrix $A = XHX'$. Since $H\Psi$ is idempotent and $\mathrm{rank}(H) = n - 1$, using Corollary 2.2 we have the following corollary.

Corollary 2.4
Let $X \sim N_{p,n}(\mu 1', \Sigma, \Psi)$, where $\Psi$ is known. Then $n\hat{\Sigma} = XHX' \sim W_p(n-1, \Sigma)$.

In most cases, both the spatial and the temporal covariance matrices are unknown. If the two covariance matrices have no structure, it is impossible to estimate all parameters from one sample observation matrix $X$; there are $p(p+1)/2 + n(n+1)/2 + p$ parameters to estimate. In many applications we reduce the number of parameters by assuming that the temporal covariance matrix $\Psi$ has some structure, see for example Chaganty and Naik (2002), Huizenga et al. (2002) and Roy and Khattree (2005a,b).

Let $X \sim N_{p,n}(\mu 1', \Sigma, \Psi)$, where $\Sigma > 0$ and $\Psi > 0$. The following corollary given by Ohlson and Koski (2009a) gives the maximum likelihood estimators when the temporal covariance matrix $\Psi$ can be estimated explicitly.

Corollary 2.5
Let $X \sim N_{p,n}(\mu 1', \Sigma, \Psi)$ and assume that the temporal covariance matrix $\Psi$ can be estimated explicitly. The maximum likelihood estimators for $\mu$ and $\Sigma$ are given by

$$\hat{\mu} = (1'\hat{\Psi}^{-1}1)^{-1}X\hat{\Psi}^{-1}1$$


and

$$n\hat{\Sigma} = X\hat{H}X',$$

where $\hat{H}$ is the estimated weighted centralization matrix, i.e.,

$$\hat{H} = \hat{\Psi}^{-1} - \hat{\Psi}^{-1}1\big(1'\hat{\Psi}^{-1}1\big)^{-1}1'\hat{\Psi}^{-1}. \qquad (2.32)$$

Ohlson and Koski (2009a) presented an iterative algorithm to find the maximum likelihood estimates of the parameters in a matrix normal distribution, similar to the algorithm in Dutilleul (1999). The big difference is that Ohlson and Koski (2009a) only have one sample observation matrix and assumed that the temporal covariance $\Psi$ has an autoregressive or intraclass structure, whereas the spatial covariance matrix is unstructured.

Let $\Psi$ be the covariance matrix from an autoregressive process of order one, i.e., $AR(1)$ (see Brockwell and Davis (2002) and Ljung (1999) for more details). The covariance matrix is then given by

$$\Psi(\theta) = \frac{1}{1-\theta^2}\begin{pmatrix} 1 & \theta & \theta^2 & \cdots & \theta^{n-1} \\ \theta & 1 & \theta & \cdots & \theta^{n-2} \\ \theta^2 & \theta & 1 & \cdots & \theta^{n-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \theta^{n-1} & \theta^{n-2} & \theta^{n-3} & \cdots & 1 \end{pmatrix}.$$

Further, let the covariance matrix $\Sigma > 0$ be unstructured. This model implies that every row in $X$ comes from the same stationary $AR(1)$ time series, but with different variances $\sigma_{ii}$. The model can be written as

$$x_{it} - \mu_i = \theta(x_{i,t-1} - \mu_i) + \varepsilon_{it}, \quad i = 1, \ldots, p, \quad t = 1, \ldots, n,$$

for some expectations $\mu_i$, where $\varepsilon_{it} \sim N(0, \sigma_{ii})$, $|\theta| < 1$ and $\varepsilon_{it}$ is uncorrelated with $x_{is}$ for each $s < t$. The different $AR(1)$ time series are dependent since $\sigma_{ij} \neq 0$.
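To see what this model generates, a short simulation sketch is given below. The function is our own illustration; it assumes the innovation vectors $\varepsilon_t = (\varepsilon_{1t}, \ldots, \varepsilon_{pt})' \sim N_p(0, \Sigma)$, so that the covariance of $X$ is $\Sigma \otimes \Psi(\theta)$.

```python
import numpy as np

def simulate_ar1_rows(mu, Sigma, theta, n, rng=None):
    """Simulate X ~ N_{p,n}(mu 1', Sigma, Psi(theta)) -- a sketch of the model
    x_it - mu_i = theta (x_{i,t-1} - mu_i) + eps_it with eps_t ~ N_p(0, Sigma)."""
    rng = rng or np.random.default_rng()
    mu = np.asarray(mu, dtype=float)
    p = mu.size
    L = np.linalg.cholesky(Sigma)
    X = np.empty((p, n))
    # stationary start: x_0 ~ N_p(0, Sigma / (1 - theta^2))
    x = L @ rng.standard_normal(p) / np.sqrt(1 - theta**2)
    for t in range(n):
        X[:, t] = x
        x = theta * x + L @ rng.standard_normal(p)
    return mu[:, None] + X
```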

A reasonable first estimate of the parameter $\theta$ could be the mean of the Yule-Walker estimates,

$$\hat{\theta}^{(0)} = \frac{1}{p}\sum_{i=1}^{p}\hat{\theta}_i, \qquad (2.33)$$

where $\hat{\theta}_i$ is the Yule-Walker estimate of $\theta$ (see Brockwell and Davis (2002) for the theory around the Yule-Walker estimate) in each time series, i.e.,

$$\hat{\theta}_i = \frac{\sum_{t=1}^{n-1}(x_{it} - \bar{x}_i)(x_{i,t+1} - \bar{x}_i)}{\sum_{t=1}^{n}(x_{it} - \bar{x}_i)^2}, \quad \text{where } \bar{x}_i = \frac{1}{n}\sum_{t=1}^{n} x_{it}.$$

The determinant and the inverse of $\Psi$ can easily be calculated:

$$|\Psi(\theta)| = (1 - \theta^2)^{-1}$$


and

$$\Psi^{-1}(\theta) = I + \theta^2 D_1 - \theta D_2 = \begin{pmatrix} 1 & -\theta & 0 & \cdots & 0 \\ -\theta & 1+\theta^2 & -\theta & \cdots & 0 \\ 0 & -\theta & \ddots & \ddots & \vdots \\ \vdots & & \ddots & 1+\theta^2 & -\theta \\ 0 & 0 & \cdots & -\theta & 1 \end{pmatrix},$$

where $D_1 = \mathrm{diag}(0, 1, \ldots, 1, 0)$ and $D_2$ is a tridiagonal matrix with zeros on the diagonal and ones on the super- and subdiagonals. Without the imposed structure on the temporal covariance, the maximum likelihood estimates do not exist explicitly. Ohlson and Koski (2009a) gave the likelihood equations

$$\hat{\mu} = (1'\Psi^{-1}(\hat{\theta})1)^{-1}X\Psi^{-1}(\hat{\theta})1, \qquad (2.34)$$

$$n\hat{\Sigma} = X\big(\Psi^{-1}(\hat{\theta}) - \Psi^{-1}(\hat{\theta})1\big(1'\Psi^{-1}(\hat{\theta})1\big)^{-1}1'\Psi^{-1}(\hat{\theta})\big)X' = X\hat{H}(\hat{\theta})X' \qquad (2.35)$$

and

$$2\hat{\theta}p + (1-\hat{\theta}^2)\,\mathrm{tr}\big\{\Upsilon(\hat{\theta})(X - \hat{\mu}1')'\hat{\Sigma}^{-1}(X - \hat{\mu}1')\big\} = 0, \qquad (2.36)$$

where $H$ is defined in (2.31) and

$$\Upsilon(\theta) = 2\theta D_1 - D_2 = \begin{pmatrix} 0 & -1 & 0 & \cdots & 0 \\ -1 & 2\theta & -1 & \cdots & 0 \\ 0 & -1 & \ddots & \ddots & \vdots \\ \vdots & & \ddots & 2\theta & -1 \\ 0 & 0 & \cdots & -1 & 0 \end{pmatrix}.$$

Ohlson and Koski (2009a) also showed that the equations (2.34)-(2.36) give the equation

$$2\theta p + n(1-\theta^2)\,\mathrm{tr}\big\{XB\Upsilon(\theta)B'X'\big(XB\Psi^{-1}(\theta)B'X'\big)^{-1}\big\} = 0, \qquad (2.37)$$

where

$$B = I - (1'\Psi^{-1}(\theta)1)^{-1}\Psi^{-1}(\theta)11',$$

which is a polynomial equation of order less than $3p + 2$. Instead of solving the polynomial equation (2.37) exactly, Ohlson and Koski (2009a) showed that the maximum likelihood estimates $\hat{\mu}$, $\hat{\Sigma}$ and $\hat{\theta}$ can be calculated by iteratively solving the three equations (2.34), (2.35) and (2.36) above. The algorithm derived by Ohlson and Koski (2009a) is given in Algorithm 2.4.


Algorithm 2.4 (The algorithm given by Ohlson and Koski (2009a))

1. Obtain an initial estimate $\hat{\theta}^{(0)}$ of $\theta$, using the mean of the Yule-Walker estimates given by (2.33).

2. Compute $\hat{\mu}^{(1)}$ and $\hat{\Sigma}^{(1)}$ from (2.34) and (2.35), using $\hat{\theta}^{(0)}$.

3. Compute the value of $\hat{\theta}^{(k)}$ by solving the cubic equation (2.36) using the estimates $\hat{\mu}^{(k)}$ and $\hat{\Sigma}^{(k)}$. Ensure that $|\hat{\theta}^{(k)}| < 1$ and that the solution is a maximum.

4. Compute $\hat{\mu}^{(k)}$ and $\hat{\Sigma}^{(k)}$ from (2.34) and (2.35), using the estimate $\hat{\theta}^{(k)}$ from the previous step.

5. Repeat steps 3 and 4 until convergence is obtained, i.e., until

$$|\hat{\theta}^{(k)} - \hat{\theta}^{(k-1)}| < \varepsilon \quad \text{and} \quad \mathrm{tr}\big((\hat{\Sigma}^{(k)} - \hat{\Sigma}^{(k-1)})^2\big) < \varepsilon. \qquad (2.38)$$
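A compact Python sketch of Algorithm 2.4 under these assumptions follows. The helper ar1_psi_inv builds $\Psi^{-1}(\theta) = I + \theta^2 D_1 - \theta D_2$ directly, and the cubic coefficients follow from writing $\mathrm{tr}\{\Upsilon(\theta)A\} = 2\theta\,\mathrm{tr}(D_1 A) - \mathrm{tr}(D_2 A)$ with $A = (X - \hat{\mu}1')'\hat{\Sigma}^{-1}(X - \hat{\mu}1')$. The function names are hypothetical, and for brevity the sketch picks the first admissible real root of (2.36) and checks convergence in $\theta$ only, whereas steps 3 and 5 above require the likelihood-maximizing root and the additional condition (2.38) on $\hat{\Sigma}$.

```python
import numpy as np

def ar1_psi_inv(theta, n):
    """Psi^{-1}(theta) = I + theta^2 D1 - theta D2 for the AR(1) structure."""
    D1 = np.diag(np.r_[0.0, np.ones(n - 2), 0.0])
    D2 = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
    return np.eye(n) + theta**2 * D1 - theta * D2

def mle_spatio_temporal(X, max_iter=100, eps=1e-6):
    """Sketch of Algorithm 2.4 for X ~ N_{p,n}(mu 1', Sigma, Psi(theta))."""
    p, n = X.shape
    xbar = X.mean(axis=1, keepdims=True)
    num = ((X[:, :-1] - xbar) * (X[:, 1:] - xbar)).sum(axis=1)
    den = ((X - xbar) ** 2).sum(axis=1)
    theta = (num / den).mean()                       # Yule-Walker start, (2.33)
    one = np.ones((n, 1))
    for _ in range(max_iter):
        Pinv = ar1_psi_inv(theta, n)
        denom = one.T @ Pinv @ one
        mu = X @ Pinv @ one / denom                  # (2.34)
        H = Pinv - Pinv @ one @ one.T @ Pinv / denom
        Sigma = X @ H @ X.T / n                      # (2.35)
        R = X - mu @ one.T
        A = R.T @ np.linalg.solve(Sigma, R)
        a = np.trace(A) - A[0, 0] - A[-1, -1]        # tr(D1 A)
        b = 2 * np.sum(np.diag(A, 1))                # tr(D2 A)
        # (2.36): 2 p theta + (1 - theta^2)(2 a theta - b) = 0, a cubic in theta
        roots = np.roots([-2 * a, b, 2 * p + 2 * a, -b])
        real = roots[np.abs(roots.imag) < 1e-10].real
        cand = real[np.abs(real) < 1]
        theta_new = cand[0] if cand.size else theta
        if abs(theta_new - theta) < eps:
            theta = theta_new
            break
        theta = theta_new
    return mu.ravel(), Sigma, theta
```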

Convergence is checked by verifying that the maximum absolute difference between two successive estimates of $\hat{\theta}^{(k)}$, and the trace of the squared difference between two successive estimates of $\hat{\Sigma}^{(k)}$, are both less than a predetermined number $\varepsilon$ ($\varepsilon$ can be taken to be $10^{-6}$). The condition (2.38) is relevant since we have

$$\mathrm{tr}\big((\hat{\Sigma}^{(k)} - \hat{\Sigma}^{(k-1)})^2\big) = \mathrm{tr}\big((\hat{\Sigma}^{(k)} - \hat{\Sigma}^{(k-1)})'(\hat{\Sigma}^{(k)} - \hat{\Sigma}^{(k-1)})\big) = \sum_i\sum_j\big(\hat{s}^{(k)}_{ij} - \hat{s}^{(k-1)}_{ij}\big)^2 \equiv \|\hat{\Sigma}^{(k)} - \hat{\Sigma}^{(k-1)}\|^2,$$

where $\hat{\Sigma}^{(k)} = \big(\hat{s}^{(k)}_{ij}\big)_{i,j}$ and $\|\cdot\|$ is the $l_2$ norm (Horn and Johnson, 1990). Hence, we

check that the $l_2$ norm of the difference is less than $\varepsilon$. In every run of the algorithm only a cubic equation has to be solved, since given $\hat{\mu}$ and $\hat{\Sigma}$, the last part of equation (2.36),

$$\mathrm{tr}\big\{\Upsilon(\theta)(X - \hat{\mu}1')'\hat{\Sigma}^{-1}(X - \hat{\mu}1')\big\},$$

is a linear function in $\theta$, implying that (2.36) is a cubic equation.

The intraclass covariance structure (2.14) was first considered by Wilks (1946). For the matrix normal distribution and the Kronecker product structure, several authors have discussed the case with an intraclass covariance structure for $\Psi$, e.g., see Roy and Khattree (2005a,b). Ohlson and Koski (2009a) showed that the likelihood function is maximized for

$$\hat{\mu} = \big(1'\Psi^{-1}(\rho)1\big)^{-1}X\Psi^{-1}(\rho)1 = n^{-1}X1 = \bar{x},$$


i.e., the regular mean, and that the resulting likelihood function can be written as

$$L(\hat{\mu}, \Sigma, \rho) = |\Sigma|^{-n/2}\big((1-\rho)^{n-1}(1+(n-1)\rho)\big)^{-p/2}\,\mathrm{etr}\Big\{-\frac{1}{2}\,\frac{1}{1-\rho}\,\Sigma^{-1}XCX'\Big\},$$

where

$$C = I - 1(1'1)^{-1}1'. \qquad (2.39)$$

The likelihood function is decreasing and does not have a maximum for any permissible $\rho$. Hence, $\rho$ is estimated, for all $n$, by the smallest value of $\rho$ that is allowed. For large $n$, and since $-\frac{1}{n-1} < \rho$, we choose $\hat{\rho} = 0$. Hence, in the intraclass model with one sample observation matrix, the best we can do for large $n$ is to estimate $\Psi$ with $\hat{\Psi} = I_n$, i.e., to assume independence. The estimate of the spatial covariance $\Sigma$ is then the ordinary sample covariance matrix given in (2.13).

2.2.5 The Distribution of a Special Sample Covariance Matrix

Assume $X \sim N_{p,n}(\mu 1', \Sigma, \Psi)$, where $\Sigma > 0$ and $\Psi$ is known. The maximum likelihood estimators are given in Theorem 2.17 and the distribution of the sample covariance matrix in Corollary 2.4.

Now suppose that, for some reason, we estimate the expectation $\mu$ with the regular mean $\bar{\mu} = \frac{1}{n}X1 = \bar{x}$, i.e., we use the same estimator as if the observations were independent. This estimator is also an unbiased and consistent estimator of the mean and is intuitively reasonable. There are several reasons for using the simpler estimator. For example, the estimator $\bar{\mu}$ is more robust than $\hat{\mu}_{ML}$ for a large number of observations, i.e., for large $n$. Another reason could be that we only know the centralized observations, $X - \bar{\mu}1'$. However, when we estimate the covariance matrix $\Sigma$, we use the dependent model with $\Psi$. The estimator of $\Sigma$ is then given as

$$n\hat{\Sigma} = (X - \bar{\mu}1')\Psi^{-1}(X - \bar{\mu}1')' = XC\Psi^{-1}CX',$$

where $C$ is given in (2.39). Hence, the distribution of the sample covariance matrix is

$$(X - \bar{\mu}1')\Psi^{-1}(X - \bar{\mu}1')' \sim Q_p\big(\Psi^{1/2}C\Psi^{-1}C\Psi^{1/2},\, M\Psi^{-1/2},\, \Sigma\big), \qquad (2.40)$$

where the distribution $Q_p$ is defined in Definition 2.2. To simplify the distribution (2.40) we need the eigenvalues and eigenvectors of the matrix $\Psi^{1/2}C\Psi^{-1}C\Psi^{1/2}$. Let $\lambda(A)$ denote the eigenvalues of the matrix $A$. We know that the eigenvalues of the matrix $\Psi^{1/2}C\Psi^{-1}C\Psi^{1/2}$ are the same as the eigenvalues of $C\Psi^{-1}C\Psi$ (see Horn and Johnson (1990) for details). Furthermore, the matrix $C\Psi^{-1}C\Psi$ can be written as

$$C\Psi^{-1}C\Psi = I - \frac{1}{n}\big(11' + C\Psi^{-1}11'\Psi\big).$$

The following two lemmas will give the distribution of $n\hat{\Sigma}$. The first lemma gives the eigenvalues of $\Psi^{1/2}C\Psi^{-1}C\Psi^{1/2}$.


Lemma 2.1
The eigenvalues of $\Psi^{1/2}C\Psi^{-1}C\Psi^{1/2}$ are

$$\lambda\big(\Psi^{1/2}C\Psi^{-1}C\Psi^{1/2}\big) = \Big(0,\; 1 - \frac{1}{n}1'\Psi C\Psi^{-1}1,\; 1, \ldots, 1\Big).$$

Proof: The proof is straightforward. Using known properties of the eigenvalues we have

$$\lambda\big(\Psi^{1/2}C\Psi^{-1}C\Psi^{1/2}\big) = \lambda\big(C\Psi^{-1}C\Psi\big) = \lambda\Big(I - \frac{1}{n}\big(11' + C\Psi^{-1}11'\Psi\big)\Big) = 1 - \frac{1}{n}\lambda\big(11' + C\Psi^{-1}11'\Psi\big).$$

Writing the rank-two matrix as a product,

$$11' + C\Psi^{-1}11'\Psi = \begin{pmatrix} 1 & C\Psi^{-1}1 \end{pmatrix}\begin{pmatrix} 1' \\ 1'\Psi \end{pmatrix},$$

its nonzero eigenvalues are those of

$$\begin{pmatrix} 1' \\ 1'\Psi \end{pmatrix}\begin{pmatrix} 1 & C\Psi^{-1}1 \end{pmatrix} = \begin{pmatrix} n & 0 \\ 1'\Psi 1 & 1'\Psi C\Psi^{-1}1 \end{pmatrix},$$

since $1'C\Psi^{-1}1 = 0$. Hence,

$$\lambda\big(11' + C\Psi^{-1}11'\Psi\big) = \big(n,\; 1'\Psi C\Psi^{-1}1,\; 0, \ldots, 0\big),$$

so that

$$\lambda\big(\Psi^{1/2}C\Psi^{-1}C\Psi^{1/2}\big) = \Big(0,\; 1 - \frac{1}{n}1'\Psi C\Psi^{-1}1,\; 1, \ldots, 1\Big),$$

and the proof of the Lemma follows.

Let $\lambda^* = 1 - \frac{1}{n}1'\Psi C\Psi^{-1}1$, which is the eigenvalue of $\Psi^{1/2}C\Psi^{-1}C\Psi^{1/2}$ not equal to zero or one.

Lemma 2.2
Let the vectors $h^*$ and $h_0$ be $h^* = \Psi^{1/2}C\Psi^{-1}1$ and $h_0 = \Psi^{-1/2}1$, respectively. Then $h^*$ is an eigenvector of $\Psi^{1/2}C\Psi^{-1}C\Psi^{1/2}$ with eigenvalue $\lambda^*$ and $h_0$ is an eigenvector of $\Psi^{1/2}C\Psi^{-1}C\Psi^{1/2}$ with eigenvalue $0$, i.e.,

$$\Psi^{1/2}C\Psi^{-1}C\Psi^{1/2}h^* = \lambda^* h^*, \qquad \Psi^{1/2}C\Psi^{-1}C\Psi^{1/2}h_0 = 0.$$

The distribution of $n\hat{\Sigma}$ is given in the following corollary.

Corollary 2.6
Assume $X \sim N_{p,n}(\mu 1', \Sigma, \Psi)$, where $\Sigma > 0$ and $\Psi$ is known. The distribution of $n\hat{\Sigma} = XC\Psi^{-1}CX'$ is the same as the distribution of $W = W_1 + \lambda^* W^*$, where $W_1$ and $W^*$ are independent,

$$W_1 \sim W_p(n-2, \Sigma), \qquad W^* \sim W_p(1, \Sigma)$$

and $\lambda^* = 1 - \frac{1}{n}1'\Psi C\Psi^{-1}1$.

Proof: We have that

$$W_1 \sim W_p\Big(n-2,\, \Sigma,\, \mu 1'\Psi^{-1/2}\Big(\sum_{i=1}^{n-2}\Delta_i\Delta_i'\Big)\Psi^{-1/2}1\mu'\Big),$$

where $\Delta_i$, $i = 1, \ldots, n-2$, are the orthonormal eigenvectors with eigenvalue $1$ and

$$\sum_{i=1}^{n-2}\Delta_i\Delta_i' = I - \big(\Delta_0\Delta_0' + \Delta^*\Delta^{*\prime}\big).$$

The noncentrality parameter is given by

$$\mu 1'\Psi^{-1/2}\big(I - (\Delta_0\Delta_0' + \Delta^*\Delta^{*\prime})\big)\Psi^{-1/2}1\mu' = \mu 1'\Psi^{-1/2}\big(I - (h_0'h_0)^{-1}h_0h_0' - (h^{*\prime}h^*)^{-1}h^*h^{*\prime}\big)\Psi^{-1/2}1\mu'$$

$$= \mu 1'\Psi^{-1}1\mu' - (h_0'h_0)^{-1}\mu 1'\Psi^{-1/2}h_0h_0'\Psi^{-1/2}1\mu' - (h^{*\prime}h^*)^{-1}\mu 1'\Psi^{-1/2}h^*h^{*\prime}\Psi^{-1/2}1\mu' \qquad (2.41)$$

and we have, since $h_0 = \Psi^{-1/2}1$ and $1'\Psi^{-1}1$ are both scalars,

$$(h_0'h_0)^{-1}\mu 1'\Psi^{-1/2}h_0h_0'\Psi^{-1/2}1\mu' = (1'\Psi^{-1}1)^{-1}\mu 1'\Psi^{-1}11'\Psi^{-1}1\mu' = (1'\Psi^{-1}1)\mu\mu'. \qquad (2.42)$$

Furthermore, since $h^* = \Psi^{1/2}C\Psi^{-1}1$ and $1'C = C1 = 0$,

$$\mu 1'\Psi^{-1/2}\Delta^*\Delta^{*\prime}\Psi^{-1/2}1\mu' = (h^{*\prime}h^*)^{-1}\mu 1'\Psi^{-1/2}h^*h^{*\prime}\Psi^{-1/2}1\mu' = (h^{*\prime}h^*)^{-1}\mu 1'C\Psi^{-1}11'\Psi^{-1}C1\mu' = 0. \qquad (2.43)$$

Finally, from (2.41), (2.42) and (2.43), we have

$$\mu 1'\Psi^{-1/2}\big(I - (\Delta_0\Delta_0' + \Delta^*\Delta^{*\prime})\big)\Psi^{-1/2}1\mu' = 0$$

and the Corollary follows.

The expectation of $XC\Psi^{-1}CX'$ can be computed straightforwardly:

$$E\big(XC\Psi^{-1}CX'\big) = \Big(n - 1 - \frac{1}{n}1'\Psi C\Psi^{-1}1\Big)\Sigma.$$

Hence, an unbiased estimator of $\Sigma$ is

$$\hat{\Sigma} = \Big(n - 1 - \frac{1}{n}1'\Psi C\Psi^{-1}1\Big)^{-1} XC\Psi^{-1}CX'.$$
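Numerically the unbiased estimator is easy to form. A minimal sketch (the function name is ours), using that $n - 1 - \frac{1}{n}1'\Psi C\Psi^{-1}1 = n - 2 + \lambda^*$:

```python
import numpy as np

def unbiased_sigma(X, Psi):
    """Unbiased estimator of Sigma when mu is estimated by the regular mean
    (Section 2.2.5) -- a minimal sketch."""
    p, n = X.shape
    one = np.ones((n, 1))
    C = np.eye(n) - one @ one.T / n                        # (2.39)
    Pinv = np.linalg.inv(Psi)
    lam = 1 - (one.T @ Psi @ C @ Pinv @ one).item() / n    # lambda*
    A = X @ C @ Pinv @ C @ X.T                             # n Sigma hat
    return A / (n - 2 + lam)
```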


3 Estimation of the Covariance Matrix for a Growth Curve Model

The Growth Curve model is a generalization of the multivariate analysis of variance model (MANOVA). The Growth Curve model belongs to the curved exponential family and was introduced by Potthoff and Roy (1964). The mean structure for the Growth Curve model is bilinear instead of linear as for the ordinary MANOVA model. For details and references connected to the model see Srivastava and Khatri (1979), Kshirsagar and Smith (1995), Srivastava and von Rosen (1999) or Kollo and von Rosen (2005).

3.1 The Growth Curve Model

In this section we will give the definition of the Growth Curve model and two examples to aid understanding. The Growth Curve model is defined as follows.

Definition 3.1. Let $X : p \times n$ and $B : q \times k$ be the observation and parameter matrices, and let $A : p \times q$ and $C : k \times n$ be the within and between individual design matrices, respectively. Let also $q \leq p$, $\mathrm{rank}(C) + p \leq n$ and $\Sigma : p \times p$ be positive definite. The Growth Curve model is given by

$$X = ABC + \Sigma^{1/2}E, \qquad (3.1)$$

where

$$E \sim N_{p,n}(0, I_p, I_n).$$

One may note that the Growth Curve model defined above is nothing more than the classical MANOVA model if $A = I$, and that the design matrix $C$ is the same as in classical univariate and multivariate linear models. The Growth Curve model (3.1) can be rewritten as

$$\mathrm{vec}\,X = (C' \otimes A)\,\mathrm{vec}\,B + (I \otimes \Sigma^{1/2})\,\mathrm{vec}\,E,$$

which is a special case of the classical multivariate linear model. However, there is no gain in expressing the model in this way since, as we will see, the interesting part is the tensor space generated by $C' \otimes A$ and $I \otimes \Sigma$.

The following examples will show how the Growth Curve model can be used. For more examples see Srivastava and Carter (1983) and Kshirsagar and Smith (1995).

Example 3.1: Potthoff & Roy - Dental Data

Dental measurements on eleven girls and sixteen boys at four different ages (8, 10, 12 and 14) were taken. Each measurement is the distance, in millimeters, from the center of the pituitary to the pteryo-maxillary fissure. Suppose linear growth curves describe the mean growth for both the girls and the boys. Then we may use the Growth Curve model, where the observation, parameter and design matrices are given as follows:

$$X = (x_1, \ldots, x_{27}) : 4 \times 27,$$

where the rows correspond to the ages 8, 10, 12 and 14 and the first eleven columns are the girls, the last sixteen the boys:

age 8:  21, 21, 20.5, 23.5, 21.5, 20, 21.5, 23, 20, 16.5, 24.5 | 26, 21.5, 23, 20, 25.5, 24.5, 22, 24, 23, 27.5, 23, 21.5, 17, 22.5, 23, 22
age 10: 20, 21.5, 24, 24.5, 23, 21, 22.5, 23, 21, 19, 25 | 25, 22.5, 22.5, 23.5, 27.5, 25.5, 22, 21.5, 20.5, 28, 23, 23.5, 24.5, 25.5, 24.5, 21.5
age 12: 21.5, 24, 24.5, 25, 22.5, 21, 23, 23.5, 22, 19, 28 | 29, 23, 24, 22.5, 26.5, 27, 24.5, 24.5, 31, 31, 23.5, 24, 26, 25.5, 26, 23.5
age 14: 23, 25.5, 26, 26.5, 23.5, 22.5, 25, 24, 21.5, 19.5, 28 | 31, 26.5, 27.5, 26, 27, 28.5, 26.5, 25.5, 26, 31.5, 25, 28, 29.5, 26, 30, 25
,

$$B = \begin{pmatrix} b_{01} & b_{02} \\ b_{11} & b_{12} \end{pmatrix}, \quad A = \begin{pmatrix} 1 & 8 \\ 1 & 10 \\ 1 & 12 \\ 1 & 14 \end{pmatrix} \quad \text{and} \quad C = \begin{pmatrix} 1_{11}' & 0_{16}' \\ 0_{11}' & 1_{16}' \end{pmatrix}.$$

Hence, for example, for an individual in the first group, $i = 1, \ldots, 11$, the Growth Curve model gives the mean

$$E(x_i) = \begin{pmatrix} b_{01} + 8b_{11} \\ b_{01} + 10b_{11} \\ b_{01} + 12b_{11} \\ b_{01} + 14b_{11} \end{pmatrix}.$$


Example 3.2

Let there be $k$ groups of individuals, with $n_j$ individuals in the $j$th group. The $k$ different groups have been taught with different learning processes. Every individual has been tested at the same $p$ time points $t_r$, $r = 1, \ldots, p$. If we assume that the testing results of an individual are multivariate normally distributed with a covariance matrix $\Sigma$, and that testing results between different individuals are independent, we can apply the Growth Curve model defined in Definition 3.1. Let the mean for the different groups be a polynomial in time of degree $q - 1$. The mean for group $j$ can then be written as

$$\mu_j = \beta_{1j} + \beta_{2j}t + \cdots + \beta_{qj}t^{q-1}, \quad j = 1, \ldots, k,$$

where $\beta_{ij}$ are unknown parameters. The Growth Curve model is then given by the matrices

$$A = \begin{pmatrix} 1 & t_1 & \cdots & t_1^{q-1} \\ 1 & t_2 & \cdots & t_2^{q-1} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & t_p & \cdots & t_p^{q-1} \end{pmatrix}, \quad B = (\beta_{ij}) \quad \text{and} \quad C = \begin{pmatrix} 1_{n_1}' & 0_{n_2}' & \cdots & 0_{n_k}' \\ 0_{n_1}' & 1_{n_2}' & \cdots & 0_{n_k}' \\ \vdots & \vdots & \ddots & \vdots \\ 0_{n_1}' & 0_{n_2}' & \cdots & 1_{n_k}' \end{pmatrix}.$$
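These design matrices are mechanical to build; a small Python sketch (the helper name is our own):

```python
import numpy as np

def design_matrices(t, ns, q):
    """Design matrices of Example 3.2 -- a sketch.

    t  : the p common time points
    ns : group sizes (n_1, ..., n_k)
    q  : number of polynomial coefficients, i.e., degree q - 1
    """
    A = np.vander(np.asarray(t, dtype=float), N=q, increasing=True)  # 1, t, ..., t^{q-1}
    C = np.zeros((len(ns), sum(ns)))
    start = 0
    for j, nj in enumerate(ns):
        C[j, start:start + nj] = 1.0        # indicator row for group j
        start += nj
    return A, C
```

With t = (8, 10, 12, 14), ns = (11, 16) and q = 2, this reproduces the matrices A and C of Example 3.1.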

3.2 Estimation of the Parameters

In this section we will give the maximum likelihood estimators for a Growth Curve model. We will also consider different structures for the covariance matrix in a Growth Curve model. We will discuss the methods existing in the literature for the estimation problem.

3.2.1 Maximum Likelihood Estimators

When no assumptions about the covariance matrix in the Growth Curve model (3.1) are made, Potthoff and Roy (1964) originally derived a class of weighted estimators for the parameter matrix $B$ as

$$\hat{B} = (A'G^{-1}A)^{-1}A'G^{-1}XC'(CC')^{-1}, \qquad (3.2)$$

where the design matrices $A$ and $C$ are assumed to have full rank. The estimator (3.2) is a function of an arbitrary positive definite matrix $G$. If $G$ is chosen to be the identity matrix, i.e., $G = I$, the weighted estimator (3.2) reduces to the unweighted estimator

$$\hat{B} = (A'A)^{-1}A'XC'(CC')^{-1}. \qquad (3.3)$$

Khatri (1966a) extended the results from Potthoff and Roy (1964) and showed that the maximum likelihood estimator is a weighted estimator as well. The maximum likelihood estimator is a function of the matrix of the sum of squares

$$S = X\big(I - C'(CC')^{-}C\big)X' \qquad (3.4)$$


and the maximum likelihood estimator is given by

$$\hat{B}_{ML} = (A'S^{-1}A)^{-}A'S^{-1}XC'(CC')^{-} + (A')^{o}Z_1 + A'Z_2C^{o\prime}, \qquad (3.5)$$

where $Z_1$ and $Z_2$ are arbitrary matrices and $A^{o}$ is any matrix of full rank which spans the orthogonal complement to $\mathcal{C}(A)$, i.e., $\mathcal{C}(A^{o}) = \mathcal{C}(A)^{\perp}$, where $\mathcal{C}(A)$ stands for the linear space generated by the columns of $A$. If the design matrices $A$ and $C$ have full rank, the estimator is nothing more than

$$\hat{B}_{ML} = (A'S^{-1}A)^{-1}A'S^{-1}XC'(CC')^{-1}. \qquad (3.6)$$

Furthermore, the maximum likelihood estimator of $\Sigma$ is given by

$$n\hat{\Sigma}_{ML} = \big(X - A\hat{B}_{ML}C\big)\big(X - A\hat{B}_{ML}C\big)' = S + \hat{R}_1\hat{R}_1', \qquad (3.7)$$

where

$$\hat{R}_1 = XC'(CC')^{-}C - A\hat{B}C = \big(I - A(A'S^{-1}A)^{-}A'S^{-1}\big)XC'(CC')^{-}C. \qquad (3.8)$$

From (3.5) we see that $A\hat{B}_{ML}C$ is always unique, i.e., the estimator for the covariance matrix $\Sigma$ is also always unique.
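Under the full rank assumptions, (3.4), (3.6) and (3.7) translate into a few lines of Python; the following sketch (function name ours) should, for instance, reproduce the estimates reported in Example 3.3 when applied to the dental data of Example 3.1.

```python
import numpy as np

def growth_curve_mle(X, A, C):
    """MLEs (3.6)-(3.7) in the Growth Curve model X = ABC + Sigma^{1/2}E --
    a minimal sketch assuming A and C have full rank."""
    p, n = X.shape
    # S = X (I - C'(CC')^{-1}C) X', equation (3.4)
    S = X @ (np.eye(n) - C.T @ np.linalg.solve(C @ C.T, C)) @ X.T
    W = np.linalg.solve(S, A)                    # S^{-1} A
    B = np.linalg.solve(A.T @ W, W.T @ X @ C.T) @ np.linalg.inv(C @ C.T)  # (3.6)
    R = X - A @ B @ C
    Sigma = R @ R.T / n                          # (3.7)
    return B, Sigma
```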

The residuals in bilinear models were discussed in von Rosen (1995), Seid Hamid and von Rosen (2006) and Ohlson and von Rosen (2009). The space orthogonal to the space generated by the two design matrices can be decomposed as

$$\big(\mathcal{C}_S(A) \otimes \mathcal{C}(C')\big)^{\perp} = \mathcal{C}_S(A)^{\perp} \otimes \mathcal{C}(C') \boxplus \mathcal{C}_S(A)^{\perp} \otimes \mathcal{C}(C')^{\perp} \boxplus \mathcal{C}_S(A) \otimes \mathcal{C}(C')^{\perp}, \qquad (3.9)$$

where $\boxplus$ denotes the orthogonal direct sum of subspaces and $\mathcal{C}_S(A)$ stands for the linear space generated by the columns of $A$ with an inner product defined with $S$ as $\langle x, y \rangle = x'S^{-1}y$. If there is no subscript, the standard inner product $\langle x, y \rangle = x'y$ is assumed. The decomposition is clearer with a figure, see Figure 3.1.

Figure 3.1: Decomposition of the space generated by the design matrices $A$ and $C$. The projection of $X$ onto $\mathcal{C}_S(A) \otimes \mathcal{C}(C')$ is $A\hat{B}C$, onto $\mathcal{C}_S(A)^{\perp} \otimes \mathcal{C}(C')$ is $R_1$, onto $\mathcal{C}_S(A)^{\perp} \otimes \mathcal{C}(C')^{\perp}$ is $R_2$, and onto $\mathcal{C}_S(A) \otimes \mathcal{C}(C')^{\perp}$ is $R_3$.

The residuals defined by von Rosen (1995) are the projections of $X$ onto the three subspaces in the decomposition (3.9) and are given by $R_1$ in (3.8),

$$R_2 = \big(I - A(A'S^{-1}A)^{-}A'S^{-1}\big)X\big(I - C'(CC')^{-}C\big) \qquad (3.10)$$


and

$$R_3 = A(A'S^{-1}A)^{-}A'S^{-1}X\big(I - C'(CC')^{-}C\big). \qquad (3.11)$$

These residuals are one measure of how well the data fit the estimated Growth Curve model. Since $\hat{R}_2 + \hat{R}_3 = X\big(I - C'(CC')^{-}C\big)$, the maximum likelihood estimator for the covariance matrix can be expressed in the residuals as

$$n\hat{\Sigma}_{ML} = \big(\hat{R}_2 + \hat{R}_3\big)\big(\hat{R}_2 + \hat{R}_3\big)' + \hat{R}_1\hat{R}_1'.$$

Example 3.3

In Example 3.1 the design matrices and observed data were given. The maximum likelihood estimators for the parameter matrix and the covariance matrix are given by

$$\hat{B} = \begin{pmatrix} 17.4254 & 15.8423 \\ 0.4764 & 0.8268 \end{pmatrix} \quad \text{and} \quad \hat{\Sigma} = \begin{pmatrix} 5.1192 & 2.4409 & 3.6105 & 2.5222 \\ 2.4409 & 3.9279 & 2.7175 & 3.0623 \\ 3.6105 & 2.7175 & 5.9798 & 3.8235 \\ 2.5222 & 3.0623 & 3.8235 & 4.6180 \end{pmatrix}.$$

The means for the two different groups, $\hat{\mu}_1$ for the girls and $\hat{\mu}_2$ for the boys, are then given by

$$(\hat{\mu}_1, \hat{\mu}_2) = A\hat{B} = \begin{pmatrix} 21.2363 & 22.4567 \\ 22.1890 & 24.1103 \\ 23.1417 & 25.7639 \\ 24.0945 & 27.4175 \end{pmatrix}.$$

3.2.2 Growth Curve Model with Patterned Covariance Matrix

Assume that the design matrix $A$ is given by $A = Y \otimes I_m$, where $Y$ is a known $p_1 \times q_1$ matrix of full rank. Rao (1967) and Reinsel (1982) have shown that the unweighted estimator (3.3) is also the maximum likelihood estimator if the covariance matrix is given by the structure

$$\Sigma = (I_{p_1} \otimes \Sigma_l) + (Y \otimes I_m)\Sigma_{\lambda}(Y' \otimes I_m) + (W \otimes I_m)\Sigma_r(W' \otimes I_m),$$

where $\Sigma_l : m \times m$ and $\Sigma_{\lambda} : (mq_1) \times (mq_1)$ are positive definite matrices, $\Sigma_r : m(p_1-q_1) \times m(p_1-q_1)$ is a nonnegative definite matrix and $W : p_1 \times (p_1-q_1)$ is a known full rank matrix such that $Y'W = 0$. Furthermore, Chinchilli and Carter (1984) derived the likelihood ratio test for this structured covariance matrix, which can arise in the mixed-model case.

Other patterned covariance matrices have also been discussed in the literature. Khatri (1973) derived the likelihood ratio test for the intraclass covariance structure given by

$$\Sigma = \sigma_1 G + \sigma_2 ww',$$


where $\sigma_1$ and $\sigma_2$ are unknown, $G$ is a known matrix and $w \in \mathcal{C}(A)$ is a given vector in the column space of $A$. Khatri (1973) also derived the likelihood ratio test for sphericity and considered the problem of independence. The uniform covariance structure has been considered in, e.g., Arnold (1981) (pages 209-238) and Lee (1988).

Closely connected to the intraclass covariance structure is the random effects model studied by, for example, Rao (1965, 1975), Reinsel (1982, 1984) and Ware (1985). More recently, random-effects covariance structures have been considered for the mixed MANOVA-GMANOVA models and the extended growth curves, e.g., see Yokoyama (1995, 1996, 1997). The random effects Growth Curve model is given by

$$X = ABC + \Theta Z + E,$$

where $A$, $B$ and $C$ are as in Definition 3.1, $\Theta : p \times r$ is an unknown matrix and $Z$ is a known design matrix. Furthermore, the error matrix follows $E \sim N_{p,n}(0, \Sigma_s, I)$, where

$$\Sigma_s = A_s\Delta_s A_s' + \sigma_s^2 I_p,$$

$A_s$ is the matrix consisting of the $s$ first columns of $A$, $\Delta_s$ is an arbitrary $s \times s$ positive semi-definite matrix and $\sigma_s^2 > 0$.

Linearly structured covariance matrices (Kollo and von Rosen (2005), Definition 1.3.7) have frequently been considered in the MANOVA model, see Section 2.2.2. The fact that the mean structure is bilinear in the Growth Curve model (3.1) gives the decomposition of the space orthogonal to the space generated by the design matrices, given in (3.9), as tensor spaces instead of linear spaces as in the MANOVA case. Ohlson and von Rosen (2009) derived an estimation procedure for a linearly structured covariance matrix $\Sigma^{(P)}$ using the decomposition (3.9). The procedure proposed by Ohlson and von Rosen (2009) gives explicit estimators and is given as follows.

Algorithm 3.1 (Estimating patterned covariance in a Growth Curve model)

1. Let $G$ in (3.2) be the estimated inner product $\hat{\Sigma}^{(P)}_1$. Since $S$ from (3.4) is Wishart distributed as $S \sim W_p\big(n - \mathrm{rank}(C), \Sigma^{(P)}\big)$, the inner product can be estimated using some known procedure (Ohlson and von Rosen (2009) proposed a least squares solution).

2. Condition on the estimated inner product. The total residual variation $S + \hat{R}_1\hat{R}_1'|S$ is then distributed as

$$S \sim W_p\big(n - \mathrm{rank}(C), \Sigma^{(P)}\big), \qquad \hat{R}_1\hat{R}_1'|S \sim W_p\big(\mathrm{rank}(C), P\Sigma^{(P)}P'\big),$$

where

$$P = I - A(A'\hat{\Sigma}_1^{-1}A)^{-}A'\hat{\Sigma}_1^{-1}.$$

From this the estimator is derived using a least squares procedure.
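The least squares step can be made concrete by writing the linear structure as $\Sigma(\theta) = \sum_i \theta_i G_i$ for known symmetric basis matrices $G_i$ and projecting a Wishart matrix onto this span. The sketch below (names ours) illustrates only the inner product estimation of step 1, not the full procedure of Ohlson and von Rosen (2009).

```python
import numpy as np

def ls_structured_cov(S, basis):
    """Least squares fit of a linearly structured covariance
    Sigma(theta) = sum_i theta_i G_i to a matrix S, i.e., the theta
    minimizing ||S - Sigma(theta)||^2 -- a sketch of step 1 above.

    basis : list of known symmetric p x p matrices G_i."""
    G = np.stack([g.ravel() for g in basis])           # vectorized basis, (r, p*p)
    theta = np.linalg.solve(G @ G.T, G @ S.ravel())    # normal equations
    return sum(t * g for t, g in zip(theta, basis))
```

For the Toeplitz-type structure of Example 3.4 below, the basis would contain one indicator matrix per distinct parameter $\sigma_1, \ldots, \sigma_4, \rho_1, \rho_2, \rho_3$.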


Example 3.4

In Example 3.1 the design matrices and observed data were given. Assume that the covariance matrix is a kind of Toeplitz matrix defined as

$$\Sigma_{Toep} = \begin{pmatrix} \sigma_1 & \rho_1 & \rho_2 & \rho_3 \\ \rho_1 & \sigma_2 & \rho_1 & \rho_2 \\ \rho_2 & \rho_1 & \sigma_3 & \rho_1 \\ \rho_3 & \rho_2 & \rho_1 & \sigma_4 \end{pmatrix}.$$

The estimates proposed by Ohlson and von Rosen (2009) are given by

$$\hat{B} = \begin{pmatrix} 17.4647 & 15.6624 \\ 0.4722 & 0.8437 \end{pmatrix}$$

and

$$\hat{\Sigma}_{Toep} = \begin{pmatrix} 5.9978 & 3.1865 & 3.5575 & 2.7129 \\ 3.1865 & 4.4510 & 3.1865 & 3.5575 \\ 3.5575 & 3.1865 & 6.2330 & 3.1865 \\ 2.7129 & 3.5575 & 3.1865 & 4.9523 \end{pmatrix}.$$

Lee (1988) considered not only the intraclass covariance matrix for a Growth Curve model, but also a nonlinear structure, namely the autoregressive covariance structure. Other authors have also considered the autoregressive covariance structure, which is natural for time series and repeated measurements; see for example Hudson (1983), Fujikoshi et al. (1990) and Lee and Hsu (1997).

Lee and Chang (2000) considered the general autoregressive, or banded, covariance structure, which is nothing else than the Toeplitz structure defined by

$$\Sigma = \sigma^2 C, \qquad (3.12)$$

where $C = (c_{ij})$, $c_{ij} = \rho_{|i-j|}$ for $i \neq j$, $c_{ii} = 1$, for $i, j = 1, \ldots, p$, $\sigma^2 > 0$, and the $\rho_{|i-j|}$ are unknown with $-1 < \rho_{|i-j|} < 1$, subject to $C > 0$.

Lee and Chang (2000) derived the likelihood equations and discussed the problem from a Bayesian point of view. The model (3.12) has also been considered by Lee and Geisser (1975) and Jennrich and Schluchter (1986). Lee and Geisser (1975) gave a rough solution and Jennrich and Schluchter (1986) provided an iterative numerical solution based on Newton-Raphson, Fisher scoring and generalized EM algorithms.

The Kronecker product covariance structure has also been considered for the Growth Curve model. Srivastava et al. (2008) derived the likelihood equations and proved the uniqueness of the solution under some relevant conditions.


4 Concluding Remarks

In the first part of this thesis, the theory and the background for the four papers given in the second part are discussed. The idea underpinning Part I is to put the papers in Part II in relation, both to each other and to the theory behind them.

4.1 Conclusion

Many testing, estimation and confidence interval procedures discussed in the multivariate statistical literature are based on the assumption that the observation vectors are independent and normally distributed. The main reason for this is that sets of multivariate observations often are, at least approximately, normally distributed. Normally distributed data can be modeled entirely in terms of their means and variances/covariances. Estimating the mean and the covariance matrix is therefore a problem of great interest in statistics, and it is of great significance to consider the correct statistical model. The estimator for the covariance matrix is important since inference on the mean parameters strongly depends on the estimated covariance matrix, and the dispersion matrix for the estimator of the mean is a function of it.

This thesis considers the problem of estimating patterned covariance matrices for different kinds of statistical models. Primarily, linearly structured covariance matrices are considered, and we think it is important to derive explicit estimators. In many examples the maximum likelihood estimators cannot be obtained explicitly and one must rely on optimization algorithms. In this thesis we therefore derive explicit estimators as alternatives to the maximum likelihood estimators.

In Paper B, we discuss a banded structure for the covariance matrix of a multivariate normal random vector, i.e., a symmetric matrix with zeros outside of a band. We propose a simple non-iterative estimation procedure which gives an explicit, unbiased and consistent estimator of the mean and an explicit and consistent estimator of the covariance matrix.


Estimation of parameters in the classical Growth Curve model when the covariance matrix has some specific linear structure is considered in Paper D. From a discussion about residuals, we propose a non-iterative estimation procedure which also gives explicit and consistent estimators of both the mean and the linearly structured covariance matrix.

The Kronecker product structure is also discussed in this thesis. In Paper C the main task is to compute the likelihood ratio test for testing spatial independence when the observations (columns) are dependent. The sample observation matrix is assumed to follow a matrix normal distribution with a separable covariance matrix, i.e., a Kronecker product structure. The idea with a separable covariance matrix is that it can be described by two covariance matrices, one between the rows of the sample observation matrix (spatial covariance) and one between the columns (temporal covariance). If the temporal covariance matrix is known, it is shown that the test resembles the ordinary case with independent observations. This should not come as a surprise, since we can transform our dependent observations to be independent. Furthermore, if the temporal covariance matrix is unknown but follows an intraclass or autoregressive structure of order one, we develop an alternating algorithm for the maximum likelihood estimates. Hence, the likelihood ratio statistic can be computed, and testing spatial independence with dependent observations is straightforward if the null distribution of the statistic can be calculated.

Moreover, when estimating covariance matrices, matrix quadratic forms arise. The question of when a quadratic form has a central or non-central Wishart distribution is thoroughly investigated in the literature. In Paper A we present a generalization of the distribution of a multivariate quadratic form. Using the characteristic function of the quadratic form, we show that the quadratic form has the same distribution as a weighted sum of non-central Wishart distributed variables.

4.2 Future Research

Originally, many estimators of the covariance matrix were obtained from non-iterative least squares methods. When computer resources became more powerful, iterative methods such as maximum likelihood, restricted maximum likelihood and generalized estimating equations, among others, were introduced. However, nowadays, when data sets are very large and more complex statistical models are considered, non-iterative methods have again become of interest. We believe that the future will perhaps be something in the middle, a theory that combines both explicit and iterative algorithms. Below follows a list of future research areas in this direction.

• The proposed estimator in Paper B is closely connected to the maximum likelihood estimator. More properties and the relation to the maximum likelihood estimator need to be investigated further. Also, a high-dimensional analysis can be the aim of further studies.

• For testing spatial independence in Paper C, the distribution of the likelihood ratio statistic when the temporal covariance matrix is unknown has to be computed. If the temporal covariance is known, the test statistic and the null distribution are well known, but if the temporal covariance is unknown only a simulation study is presented in Paper C.


• More studies can be done on the proposed estimator for a linearly structured covariance matrix in a Growth Curve model, given in Paper D. Several properties of the estimators and the relation between the two design matrices and the estimators can be analyzed further.

• Similarly as in Paper D, an estimator for a linearly structured covariance matrix in an Extended Growth Curve model can be derived using the idea with the different residuals. The goal in Paper D was not just to obtain reasonable explicit estimators, but also to explore some new inferential ideas which can be applied to more general models.


Bibliography

Anderson, T. W. (1969). Statistical inference for covariance matrices with linear structure. In Proceedings of the Second International Symposium on Multivariate Analysis (P. R. Krishnaiah, ed.). Academic Press, New York, pages 55–66.

Anderson, T. W. (1970). Estimation of covariance matrices which are linear combinations or whose inverses are linear combinations of given matrices. In Essays in Probability and Statistics (R. C. Bose, I. M. Chakravati, P. C. Mahalanobis, C. R. Rao, and K. J. C. Smith, eds.). University of North Carolina Press, Chapel Hill, pages 1–24.

Anderson, T. W. (1971). The Statistical Analysis of Time Series. John Wiley & Sons, New York.

Anderson, T. W. (1973). Asymptotically efficient estimation of covariance matrices with linear structure. The Annals of Statistics, 1(1):135–141.

Anderson, T. W. (1975). Maximum likelihood estimation of parameters of autoregressive processes with moving average residuals and other covariance matrices with linear structure. The Annals of Statistics, 3(6):1283–1304.

Anderson, T. W. (1977). Estimation for autoregressive moving average models in the time and frequency domains. The Annals of Statistics, 5(5):842–865.

Anderson, T. W. (2003). An Introduction to Multivariate Statistical Analysis. John Wiley & Sons, New York, 3rd edition.

Andersson, S. A. (1975). Invariant normal models. The Annals of Statistics, 3(1):132–154.

Andersson, S. A., Brøns, H. K., and Jensen, S. T. (1983). Distribution of eigenvalues in multivariate statistical analysis. The Annals of Statistics, 11(2):392–415.


Arnold, S. F. (1981). The Theory of Linear Models and Multivariate Analysis. John Wiley & Sons, New York.

Baldessari, B. (1967). The distribution of a quadratic form of normal random variables. The Annals of Mathematical Statistics, 38(6):1700–1704.

Box, G. E. P. and Jenkins, G. (1970). Time Series Analysis, Forecasting and Control. Holden-Day, San Francisco.

Brockwell, P. J. and Davis, R. A. (2002). Introduction to Time Series and Forecasting. Springer, New York.

Browne, M. W. (1977). The analysis of patterned correlation matrices by generalized least squares. British Journal of Mathematical and Statistical Psychology, 30(1):113–124.

Burg, J. P., Luenberger, D. G., and Wenger, D. L. (1982). Estimation of structured covariance matrices. Proceedings of the IEEE, 70(9):963–974.

Chaganty, N. R. and Naik, D. N. (2002). Analysis of multivariate longitudinal data using quasi-least squares. Journal of Statistical Planning and Inference, 103(1):421–436.

Chakraborty, M. (1998). An efficient algorithm for solving general periodic Toeplitz systems. IEEE Transactions on Signal Processing, 46(3):784–787.

Chaudhuri, S., Drton, M., and Richardson, T. S. (2007). Estimation of a covariance matrix with zeros. Biometrika, 94(1):199–216.

Chinchilli, V. M. and Carter, W. (1984). A likelihood ratio test for a patterned covariance matrix in a multivariate Growth Curve model. Biometrics, 40(1):151–156.

Chipman, J. S. and Rao, M. M. (1964). Projections, generalized inverses, and quadratic forms. Journal of Mathematical Analysis and Applications, 9(1):1–11.

Christensen, L. P. B. (2007). An EM-algorithm for band-Toeplitz covariance matrix estimation. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Honolulu, Hawaii, USA, April 2007.

Cochran, W. G. (1934). The distribution of quadratic forms in a normal system, with applications to the analysis of covariance. Mathematical Proceedings of the Cambridge Philosophical Society, 30(2):178–191.

Drton, M. and Richardson, T. S. (2004). Iterative conditional fitting for estimation of a covariance matrix with zeros. Technical Report no. 469, Department of Statistics, University of Washington.

Durbin, J. (1959). Efficient estimation of parameters in moving-average models. Biometrika, 46(3/4):306–316.

Dutilleul, P. (1999). The MLE algorithm for the matrix normal distribution. Journal of Statistical Computation and Simulation, 64(2):105–123.


Fuhrmann, D. R. and Barton, T. A. (1990). Estimation of block Toeplitz covariance matrices. In 24th IEEE Asilomar Conference on Signals, Systems and Computers, Pacific Grove, California, USA, 2:779–783.

Fujikoshi, Y., Kanda, T., and Tanimura, N. (1990). The Growth Curve model with an autoregressive covariance structure. Annals of the Institute of Statistical Mathematics, 42(3):533–542.

Godolphin, E. J. and De Gooijer, J. G. (1982). On the maximum likelihood estimation of the parameters of a Gaussian moving average process. Biometrika, 69(2):443–451.

Goodman, N. R. (1963). Statistical analysis based on a certain multivariate complex Gaussian distribution (an introduction). The Annals of Mathematical Statistics, 34(1):152–177.

Graybill, F. A. (1976). Theory and Application of the Linear Model. Duxbury Press, North Scituate, Massachusetts.

Grzebyk, M., Wild, P., and Chouanière, D. (2004). On identification of multi-factor models with correlated residuals. Biometrika, 91(1):141–151.

Gupta, A. K. and Nagar, D. K. (2000). Matrix Variate Distributions. Chapman and Hall, Boca Raton, Florida.

Hannan, E. J. (1970). Multiple Time Series. John Wiley & Sons, New York.

Hayakawa, T. (1966). On the distribution of a quadratic form in a multivariate normal sample. Annals of the Institute of Statistical Mathematics, 18(1):191–201.

Hayakawa, T. (1972). On the distribution of the multivariate quadratic form in multivariate normal samples. Annals of the Institute of Statistical Mathematics, 24(1):205–230.

Hogg, R. V. (1963). On the independence of certain Wishart variables. The Annals of Mathematical Statistics, 34(3):935–939.

Horn, R. A. and Johnson, C. R. (1990). Matrix Analysis. Cambridge University Press, Cambridge.

Hu, J. (2008). Wishartness and independence of matrix quadratic forms in a normal random matrix. Journal of Multivariate Analysis, 99(3):555–571.

Hudson, I. L. (1983). Asymptotic tests for Growth Curve models with autoregressive errors. Australian Journal of Statistics, 25(3):413–424.

Huizenga, H. M., de Munck, J. C., Waldorp, L. J., and Grasman, R. (2002). Spatiotemporal EEG/MEG source analysis based on a parametric noise covariance model. IEEE Transactions on Biomedical Engineering, 49(6):533–539.

Jennrich, R. I. and Schluchter, M. D. (1986). Unbalanced repeated-measures models with structured covariance matrices. Biometrics, 42(4):805–820.


Jensen, S. T. (1988). Covariance hypotheses which are linear in both the covariance and the inverse covariance. The Annals of Statistics, 16(1):302–322.

Jöreskog, K. G. (1981). Analysis of covariance structures. With discussion by E. B. Andersen, H. Kiiveri, P. Laake, D. B. Cox and T. Schweder. With a reply by the author. Scandinavian Journal of Statistics, 8(2):65–92.

Khatri, C. G. (1962). Conditions for Wishartness and independence of second degree polynomials in a normal vector. The Annals of Mathematical Statistics, 33(3):1002–1007.

Khatri, C. G. (1966a). A note on a MANOVA model applied to problems in Growth Curve. Annals of the Institute of Statistical Mathematics, 18(1):75–86.

Khatri, C. G. (1966b). On certain distribution problems based on positive definite quadratic functions in normal vectors. The Annals of Mathematical Statistics, 37(2):468–479.

Khatri, C. G. (1973). Testing some covariance structures under a Growth Curve model. Journal of Multivariate Analysis, 3(1):102–116.

Khatri, C. G. (1977). Distribution of a quadratic form in non-central normal vectors using generalised Laguerre polynomials. South African Statistical Journal, 11:167–179.

Khatri, C. G. (1980). Statistical inference for covariance matrices with linear structure. In Handbook of Statistics (P. R. Krishnaiah, ed.). North-Holland, Amsterdam, pages 443–469.

Kollo, T. and von Rosen, D. (2005). Advanced Multivariate Statistics with Matrices. Springer, Dordrecht.

Kshirsagar, A. M. and Smith, W. B. (1995). Growth Curves. Marcel Dekker, New York.

Lauritzen, S. L. (1996). Graphical Models. The Clarendon Press, Oxford University Press, New York.

Lee, J. C. (1988). Prediction and estimation of Growth Curves with special covariance structures. Journal of the American Statistical Association, 83(402):432–440.

Lee, J. C. and Chang, C. H. (2000). Bayesian analysis of a Growth Curve model with a general autoregressive covariance structure. Scandinavian Journal of Statistics, 27(4):703–713.

Lee, J. C. and Geisser, S. (1975). Applications of Growth Curve prediction. Sankhya Ser. A, 37:239–256.

Lee, J. C. and Hsu, Y. L. (1997). Bayesian analysis of Growth Curves with AR(1) dependence. Journal of Statistical Planning and Inference, 64(2):205–229.

Ljung, L. (1999). System Identification, Theory for the User. Prentice Hall, Upper Saddle River, New Jersey, 2nd edition.


Lu, N. and Zimmerman, D. L. (2005). The likelihood ratio test for a separable covariance matrix. Statistics and Probability Letters, 73(4):449–457.

Mao, Y., Kschischang, F. R., and Frey, J. (2004). Convolutional factor graphs as probabilistic models. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence (M. Chickering and J. Halpern, eds.), Banff, Canada. AUAI Press, Arlington, Virginia, pages 374–381.

Marin, J. M. and Dhorne, T. (2002). Linear Toeplitz covariance structure models with optimal estimators of variance components. Linear Algebra and Its Applications, 354(1-3):195–212.

Masaro, J. and Wong, C. S. (2003). Wishart distributions associated with matrix quadratic forms. Journal of Multivariate Analysis, 85(1):1–9.

Mathew, T. and Nordström, K. (1997). Wishart and Chi-square distributions associated with matrix quadratic forms. Journal of Multivariate Analysis, 61(1):129–143.

Mitchell, M. W., Genton, M. G., and Gumpertz, M. L. (2005). Testing for separability of space-time covariances. Environmetrics, 16(8):819–831.

Mitchell, M. W., Genton, M. G., and Gumpertz, M. L. (2006). A likelihood ratio test for separability of covariances. Journal of Multivariate Analysis, 97(5):1025–1043.

Moura, J. M. F. and Balram, N. (1992). Recursive structure of noncausal Gauss-Markov random fields. IEEE Transactions on Information Theory, 38(2):334–354.

Muirhead, R. J. (1982). Aspects of Multivariate Statistical Theory. John Wiley & Sons, New York.

Nahtman, T. (2006). Marginal permutation invariant covariance matrices with applications to linear models. Linear Algebra and Its Applications, 417(1):183–210.

Nahtman, T. and von Rosen, D. (2005). Shift permutation invariance in linear random factor models. Research Report, Centre of Biostochastics, Swedish University of Agricultural Sciences. Report 2005:6.

Naik, D. N. and Rao, S. S. (2001). Analysis of multivariate repeated measures data with a Kronecker product structured covariance matrix. Journal of Applied Statistics, 28(1):91–105.

Ohlson, M., Andrushchenko, Z., and von Rosen, D. (2009). Explicit estimators under m-dependence for a multivariate normal distribution. Accepted for publication in Annals of the Institute of Statistical Mathematics.

Ohlson, M. and Koski, T. (2009a). The likelihood ratio statistic for testing spatial independence using a separable covariance matrix. Technical Report LiTH-MAT-R-2009-06, Department of Mathematics, Linköping University.

Ohlson, M. and Koski, T. (2009b). On distributions of matrix quadratic forms. Submitted to Communications in Statistics - Theory and Methods.


Ohlson, M. and von Rosen, D. (2009). Explicit estimators in the Growth Curve model with a patterned covariance matrix. Submitted to Journal of Multivariate Analysis.

Olkin, I. (1973). Testing and estimation for structures which are circularly symmetric in blocks. In Multivariate Statistical Inference (D. G. Kabe and R. P. Gupta, eds.). North-Holland, Amsterdam, pages 183–195.

Olkin, I. and Press, S. (1969). Testing and estimation for a circular stationary model. The Annals of Mathematical Statistics, 40(4):1358–1373.

Perlman, M. D. (1987). Group symmetry covariance models. Statistical Science, 2:421–425.

Potthoff, R. F. and Roy, S. N. (1964). A generalized multivariate analysis of variance model useful especially for Growth Curve problems. Biometrika, 51(3/4):313–326.

Rao, C. R. (1965). The theory of least squares when the parameters are stochastic and its application to the analysis of Growth Curves. Biometrika, 52(3/4):447–458.

Rao, C. R. (1967). Least squares theory using an estimated dispersion matrix and its application to measurement of signals. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Vol. I (L. M. LeCam and J. Neyman, eds.). University of California Press, Berkeley, pages 355–372.

Rao, C. R. (1973). Linear Statistical Inference and Its Applications. John Wiley & Sons, New York, 2nd edition.

Rao, C. R. (1975). Simultaneous estimation of parameters in different linear models and applications to biometric problems. Biometrics, 31(2):545–554.

Rao, C. R. and Mitra, S. K. (1971). Generalized Inverse of a Matrix and its Applications. John Wiley & Sons, New York.

Reinsel, G. (1982). Multivariate repeated measurement or Growth Curve models with multivariate random-effects covariance structure. Journal of the American Statistical Association, 77(377):190–195.

Reinsel, G. (1984). Estimation and prediction in a multivariate random effects generalized linear model. Journal of the American Statistical Association, 79(386):406–414.

Roy, A. and Khattree, R. (2005a). On implementation of a test for Kronecker product covariance structure for multivariate repeated measures data. Statistical Methodology, 2(4):297–306.

Roy, A. and Khattree, R. (2005b). Testing the hypothesis of a Kronecker product covariance matrix in multivariate repeated measures data. In Proceedings of the Thirtieth Annual SAS Users Group International (SUGI) Conference, Philadelphia, Pennsylvania, pages 1–11.

Schervish, M. J. (1995). Theory of Statistics. Springer, New York.


Seid Hamid, J. and von Rosen, D. (2006). Residuals in the extended Growth Curve model. Scandinavian Journal of Statistics, 33(1):121–138.

Shah, B. K. (1970). Distribution theory of a positive definite quadratic form with matrix argument. The Annals of Mathematical Statistics, 41(2):692–697.

Shitan, M. and Brockwell, P. J. (1995). An asymptotic test for separability of a spatial autoregressive model. Communications in Statistics - Theory and Methods, 24(8):2027–2040.

Srivastava, M. S. and Carter, E. M. (1983). An Introduction to Applied Multivariate Statistics. North Holland, New York.

Srivastava, M. S. and Khatri, C. G. (1979). An Introduction to Multivariate Statistics. North Holland, New York.

Srivastava, M. S., Nahtman, T., and von Rosen, D. (2007). Models with a Kronecker product covariance structure: estimation and testing. Research Report, Centre of Biostochastics, Swedish University of Agricultural Sciences. Report 2007:7.

Srivastava, M. S., Nahtman, T., and von Rosen, D. (2008). Estimation in general multivariate linear models with Kronecker product covariance structure. Research Report, Centre of Biostochastics, Swedish University of Agricultural Sciences. Report 2008:1.

Srivastava, M. S. and von Rosen, D. (1999). Growth Curve model. In Multivariate Analysis, Design of Experiments, and Survey Sampling, Textbooks Monogr. 159, Dekker, New York, pages 547–578.

Styan, G. P. H. (1970). Notes on the distribution of quadratic forms in singular normal variables. Biometrika, 57(3):567–572.

Szatrowski, T. H. (1982). Testing and estimation in the block compound symmetry problem. Journal of Educational Statistics, 7(1):3–18.

Szatrowski, T. H. (1985). Asymptotic distributions in the testing and estimation of the missing-data multivariate normal linear patterned mean and correlation matrix. Linear Algebra and its Applications, 67:215–231.

Tan, W. Y. (1977). On the distribution of quadratic forms in normal random variables. The Canadian Journal of Statistics/La Revue Canadienne de Statistique, 5(2):241–250.

Tian, Y. and Styan, G. P. H. (2005). Cochran's statistical theorem for outer inverses of matrices and matrix quadratic forms. Linear and Multilinear Algebra, 53(5):387–392.

Vaish, A. K. and Chaganty, N. R. (2004). Wishartness and independence of matrix quadratic forms for Kronecker product covariance structures. Linear Algebra and Its Applications, 388:379–388.

von Rosen, D. (1995). Residuals in the Growth Curve model. Annals of the Institute of Statistical Mathematics, 47(1):129–136.


Votaw, D. F. (1948). Testing compound symmetry in a normal multivariate distribution. The Annals of Mathematical Statistics, 19(4):447–473.

Ware, J. H. (1985). Linear models for the analysis of longitudinal studies. The American Statistician, 39(2):95–101.

Wilks, S. S. (1946). Sample criteria for testing equality of means, equality of variances, and equality of covariances in a normal multivariate distribution. The Annals of Mathematical Statistics, 17(3):257–281.

Wong, C. S. and Wang, T. (1993). Multivariate versions of Cochran's theorems II. Journal of Multivariate Analysis, 44(1):146–146.

Woods, J. W. (1972). Two-dimensional discrete Markovian fields. IEEE Transactions on Information Theory, 18(2):232–240.

Yokoyama, T. (1995). Statistical inference on some mixed MANOVA-GMANOVA models with random effects. Hiroshima Mathematical Journal, 25(3):441–474.

Yokoyama, T. (1996). Extended Growth Curve models with random-effects covariance structures. Communications in Statistics - Theory and Methods, 25(3):571–584.

Yokoyama, T. (1997). Tests for a family of random-effects covariance structures in a multivariate Growth Curve model. Journal of Statistical Planning and Inference, 65(2):281–292.