tests for covariance matrices with high ...libdcms.nida.ac.th/thesis6/2012/b177570.pdfabstract title...

133
TESTS FOR COVARIANCE MATRICES WITH HIGH-DIMENSIONAL DATA Saowapha Chaipitak A Dissertation Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy (Statistics) School of Applied Statistics National Institute of Development Administration 2012

Upload: others

Post on 21-Apr-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: TESTS FOR COVARIANCE MATRICES WITH HIGH ...libdcms.nida.ac.th/thesis6/2012/b177570.pdfABSTRACT Title of Dissertation Tests for Covariance Matrices with High-dimensional Data Author

TESTS FOR COVARIANCE MATRICES

WITH HIGH-DIMENSIONAL DATA

Saowapha Chaipitak

A Dissertation Submitted in Partial

Fulfillment of the Requirements for the Degree of

Doctor of Philosophy (Statistics)

School of Applied Statistics

National Institute of Development Administration

2012

Page 2: TESTS FOR COVARIANCE MATRICES WITH HIGH ...libdcms.nida.ac.th/thesis6/2012/b177570.pdfABSTRACT Title of Dissertation Tests for Covariance Matrices with High-dimensional Data Author
Page 3: TESTS FOR COVARIANCE MATRICES WITH HIGH ...libdcms.nida.ac.th/thesis6/2012/b177570.pdfABSTRACT Title of Dissertation Tests for Covariance Matrices with High-dimensional Data Author

ABSTRACT

Title of Dissertation Tests for Covariance Matrices with High-dimensional Data

Author Ms. Saowapha Chaipitak

Degree Doctor of Philosophy (Statistics)

Year 2012

In multivariate statistical analysis, it is a necessity to know the facts regarding

the covariance matrix of the data in hand before applying any further analysis. This

study focuses on testing hypotheses concerning the covariance matrices of

multivariate normal data having the number of variables larger than or equal to the

sample size, called high-dimensional data. The two objectives of this study were: first,

for one sample data, to develop a test statistic for testing the hypothesis for whether

the covariance matrix equals a specified known matrix, called a partially known

matrix, and second, for two independent sample data, to develop a test statistic for

testing the hypothesis of equality of two covariance matrices of the two independent

populations. For the two hypotheses, a classical method such as the likelihood ratio

test is commonly used and is well defined when the sample size is larger than the

number of variables.

The two proposed test statistics 1T (for one sample data) and 2T (for two

sample data) are proposed for a high-dimensional situation. Both test statistics 1T and

2T are asymptotically normally distributed when the number of variables and the

sample size go towards infinity. A simulation study showed that both proposed test

statistics 1T and 2T approximately control the nominal significance level and have

good powers. The convergences to asymptotic normality of the two statistics were not

greatly affected by the change of covariance structures considered in the study

(Unstructured, Compound Symmetry, Heterogeneous Compound Symmetry, Simple,

Toeplitz, and Variance Component).

Page 4: TESTS FOR COVARIANCE MATRICES WITH HIGH ...libdcms.nida.ac.th/thesis6/2012/b177570.pdfABSTRACT Title of Dissertation Tests for Covariance Matrices with High-dimensional Data Author

iv

Furthermore, in the one sample case, the proposed test statistic 1T performed

comparably to the test statistics JU , proposed by Ledoit and Wolf (2002), and 1ST ,

proposed by Srivastava (2005), for large sample sizes and was more powerful than

these two tests for small or moderate sample sizes with a larger or equal number of

variables. In the two sample case, the proposed test statistic 2T is as good as the

competitive tests ,, 2SJ TT and ,SYT proposed by Schott (2007), Srivastava (2007b), and

Srivastava and Yanagihara (2010), respectively, for large sample sizes and is

markedly superior to these competitive tests for small to moderate sample sizes with a

larger or equal number of variables for all of the covariance matrix structures

considered. Finally, two real datasets regarding human gene expression in colon

tissues were also analyzed to illustrate the application of the theoretical results.

Page 5: TESTS FOR COVARIANCE MATRICES WITH HIGH ...libdcms.nida.ac.th/thesis6/2012/b177570.pdfABSTRACT Title of Dissertation Tests for Covariance Matrices with High-dimensional Data Author

ACKNOWLEDGEMENTS

I would like to express my gratitude to everyone who has given me help in

completing this dissertation. In particular, I am indebted to my advisor, Associate

Professor Dr. Samruam Chongcharoen, for always giving invaluable suggestions,

motivation and encouragement. With his guidance and support, I have completed the

dissertation. I also gratefully acknowledge my committee members: Professor Dr.

Prachoom Suwattee; Associate Professor Dr. Vichit Lorchirachoonkul and Assistant

Professor Dr. Winai Bodhisuwan, for contributing both their time and helpful

comments and suggestions.

I am grateful to the Commission on Higher Education of Thailand for financial

support through a grant fund under the Strategic Scholarships Fellowships Frontier

Research Networks. I am also grateful to Dr. John McMorris for his kindness towards

me by editing my English, which has made the manuscript more readable.

I would like to express my deep thanks to my friends for their help, spirit,

patience and co-operation.

Finally, I am greatly indebted to my parents, my elder brother, and my

younger sister, for their greatest love, encouragement, and support throughout my

graduate study.

Saowapha Chaipitak

January 2013

Page 6: TESTS FOR COVARIANCE MATRICES WITH HIGH ...libdcms.nida.ac.th/thesis6/2012/b177570.pdfABSTRACT Title of Dissertation Tests for Covariance Matrices with High-dimensional Data Author

TABLE OF CONTENTS

Page

ABSTRACT iii

ACKNOWLEDGEMENTS v

TABLE OF CONTENTS vi

LIST OF TABLES viii

CHAPTER 1 INTRODUCTION 1

1.1 Background 1

1.2 Objectives of the Study 4

1.3 Scope of the Study 5

1.4 Usefulness of the Study 5

CHAPTER 2 LITERATURE REVIEW 6

2.1 Testing the Hypothesis for a Partially Known Matrix 6

2.2 Testing the Equality of Two Covariance Matrices for 14

Two Independent Populations

CHAPTER 3 THE PROPOSED TESTS 22

3.1 Testing the Hypothesis for a Partially Known Matrix 22

for One High-dimensional Data

3.2 Testing the Equality of Two Covariance Matrices 26

for Two High-dimensional Data

CHAPTER 4 SIMULATION STUDY 32

4.1 Simulation Study for Testing the Hypothesis for a 33

Partially Known Matrix for One High-dimensional

Data

4.2 Simulation Study for Testing the Equality of Two 42

Covariance Matrices for Two High-dimensional Data

4.3 Application 59

Page 7: TESTS FOR COVARIANCE MATRICES WITH HIGH ...libdcms.nida.ac.th/thesis6/2012/b177570.pdfABSTRACT Title of Dissertation Tests for Covariance Matrices with High-dimensional Data Author

vii

CHAPTER 5 CONCLUSIONS, DISCUSSION AND RECOMMENDATIONS 60

FOR FUTURE RESEARCH

5.1 Conclusions 60

5.2 Discussion 62

5.3 Recommendations for Future Research 63

BIBLIOGRAPHY 64

APPENDICES 67

Appendix A Expected Values and Variances of the Estimators 68

Appendix B Proof of Theorem 3.1.2 80

Appendix C FORTRAN Syntax for One High-dimensional Data 88

Appendix D FORTRAN Syntax for Two High-dimensional Data 101

BIOGRAPHY 123

Page 8: TESTS FOR COVARIANCE MATRICES WITH HIGH ...libdcms.nida.ac.th/thesis6/2012/b177570.pdfABSTRACT Title of Dissertation Tests for Covariance Matrices with High-dimensional Data Author

LIST OF TABLES

Tables Page

4.1 Covariance Matrix Structure Definition 32

4.2 Empirical Type I Error Rates of the Test Statistic 1T under 38

the Four Null Hypotheses applied at 05.0=α

4.3 Empirical Powers of Test Statistic 1T under the Four 39

Alternative Hypotheses applied at 05.0=α

4.4 Empirical Type I Error Rates (under IH =Σ:1*0 ) and 40

Empirical Powers (under F=Σ:1*1H ) of 1, SJ TU and 1T

applied at 05.0=α

4.5 Empirical Type I Error Rates (under IH 2:2*0 =Σ ) and 41

Empirical Powers (under DH 2:2*1 =Σ ) of 1, SJ TU and 1T

applied at 05.0=α

4.6 Empirical Type I Error Rates (under 10H ′ ) of ,,, 2 SYSJ TTT 51

and 2T and Empirical Powers (under 11H ′ ) of JT and 2T

applied at 05.0=α

4.7 Empirical Type I Error Rates (under 20H ′ ) of ,,, 2 SYSJ TTT 52

and 2T and Empirical Powers (under 21H ′ ) of ,,2 SYS TT and 2T

applied at 05.0=α

4.8 Empirical Type I Error Rate (under 30H ′ ) of ,,, 2 SYSJ TTT 53

and 2T and Empirical Powers (under 31H ′ ) of JT and 2T

applied at 05.0=α

Page 9: TESTS FOR COVARIANCE MATRICES WITH HIGH ...libdcms.nida.ac.th/thesis6/2012/b177570.pdfABSTRACT Title of Dissertation Tests for Covariance Matrices with High-dimensional Data Author

ix

4.9 Empirical Type I Error Rates (under 40H ′ ) of ,,, 2 SYSJ TTT 54

and 2T and Empirical Powers (under 41H ′ ) of JT and 2T

applied at 05.0=α

4.10 Empirical Type I Error Rates (under 50H ′ ) of ,,, 2 SYSJ TTT 55

and 2T and Empirical Powers (under 51H ′ ) of JT and 2T

applied at 05.0=α

4.11 Empirical Type I Error Rates (under 60H ′ ) of ,,, 2 SYSJ TTT 56

and 2T and Empirical Powers (under 61H ′ ) of ,,, 2 SYSJ TTT

and 2T applied at 05.0=α

4.12 Empirical Type I Error Rates (under 70H ′ ) of ,,, 2 SYSJ TTT 57

and 2T and Empirical Powers (under 71H ′ ) of JT and 2T

applied at 05.0=α

4.13 Empirical Type I Error Rates (under 80H ′ ) of ,,, 2 SYSJ TTT 58

and 2T and Empirical Powers (under 81H ′ ) of JT and 2T

applied at 05.0=α

Page 10: TESTS FOR COVARIANCE MATRICES WITH HIGH ...libdcms.nida.ac.th/thesis6/2012/b177570.pdfABSTRACT Title of Dissertation Tests for Covariance Matrices with High-dimensional Data Author

CHAPTER 1

INTRODUCTION

1.1 Background

In multivariate statistical analysis, the data consist of more than one variable

on a number of observations, .n Before applying further analyses, testing the

hypotheses concerning a population covariance matrix as to whether it is equal to a

particular matrix and whether two population covariance matrices are equal are very

important. Classical techniques for these two hypotheses are based on the likelihood

ratio criterion which is valid if and only if the sample size (or the number of

observations) is greater than its number of variables p (more details are provided in

the next section). In the present, there are many applications in modern science and

economics, e.g. the analysis of DNA microarrays. Here data typically have thousands

of gene expressions whereas these are obtained on a group of observations which

often numbers much less than 100 (Schott, 2007); for examples with data, see Dudoit,

Fridlyand and Speed (2002); and Ibrahim, Chen and Gray (2002). Data having the

number of variables larger than or equal to the sample size, i.e. np ≥ , are called

“high-dimensional data” (Srivastava, 2010; Fujikoshi, Ulyanov and Shimizu, 2010;

Fisher, Sun and Gallagher, 2010).

As aforementioned, the likelihood ratio criterion is not well defined when the

data fall into a high-dimensional situation. Moreover, it is based on the asymptotic

theory which is restricted to the case that the sample size goes towards infinity

whereas the number of variables is fixed, called the “classical approach”. Details are

provided in many multivariate statistics and mathematical statistical texts: see

Anderson (1984); Johnson and Wichern (2002); Casella and Berger (2002); Rao

(1973); Rohatgi (1984); Lehmann (1999); Lehmann and Romano (2005); and others.

Page 11: TESTS FOR COVARIANCE MATRICES WITH HIGH ...libdcms.nida.ac.th/thesis6/2012/b177570.pdfABSTRACT Title of Dissertation Tests for Covariance Matrices with High-dimensional Data Author

2

A better approach to high-dimensional data sets is known as “(n, p)-

asymptotics”, “general asymptotics”, “concentration asymptotics” (Ledoit and Wolf,

2002), or “increasing dimension asymptotics” (Serdobolskii, 1999), where the

asymptotic theory framework of the test statistic is based upon both the sample size

and the number of variables approaching infinity. This approach is a generalization of

the classical technique (Fisher et al., 2010). Some examples of recent work on the

many problems of statistical inference using high-dimensional datasets include Birke

and Dette (2005); Samruam Chongcharoen (2011); Boonyarit Choopradit and

Samruam Chongcharoen (2011a, 2011b); Ledoit and Wolf (2002); Lin and Xiang

(2008); Fisher et al. (2010); Schott (2007); Srivastava (2005); Srivastava (2006);

Srivastava (2007a, 2007b); and Srivastava and Yanagihara (2010).

Two goals of this dissertation were to develop test statistics for the following

two hypotheses in high-dimensional data:

1) A hypothesis of testing for a covariance matrix equal to a specified known

matrix

2) A hypothesis of testing for the equality between two covariance matrices

Each hypothesis is presented in more detail in Sections 1.1.1 and 1.1.2 as follows:

1.1.1 Testing the Hypothesis for a Covariance Matrix Equal to a Specified

Known Matrix for One Population

Let nXX ,...,1 be a random sample drawn from a p-variate normal population

with mean vector μ and covariance matrix ,Σ denoted by ),,(~ ΣμX pj N for

,,...,1 nj = where both μ and Σ are unknown. The hypothesis for testing that the

population covariance matrix is equal to a specified known matrix, which is called

“a partially known matrix” from now on, can be written as

02

0 : Σ=Σ σH against 02

1 : Σ≠Σ σH , (1.1)

where 02 >σ is an unknown scalar and 0Σ a known positive definite matrix. The

likelihood ratio criterion that is very useful for handling problems where the number

of variables, ,p is less than the sample size, n , (Anderson, 1984: 429) is based on the

sample covariance matrix, ,S which is nonsingular. However, in the high-dimensional

Page 12: TESTS FOR COVARIANCE MATRICES WITH HIGH ...libdcms.nida.ac.th/thesis6/2012/b177570.pdfABSTRACT Title of Dissertation Tests for Covariance Matrices with High-dimensional Data Author

3

case, when ,np ≥ the likelihood ratio criterion is not available because the sample

covariance matrix, ,S becomes singular. Hence, in this dissertation, the focus is on

developing a test statistic for testing hypothesis 0H above for high-dimensional data.

The work for this dissertation begins by exploring several tests introduced in

recent years for testing the sphericity of the covariance matrix, i.e. ,2pIσ=Σ where

pI denotes the pp× identity matrix, from high-dimensional data, such as Ledoit and

Wolf (2002); Birke and Dette (2005); Srivastava (2005); Srivastava (2006); and

Fisher et al. (2010). More details are provided in Section 2.1. A test statistic was then

developed based on the consistent estimators of the second moment of the sample

eigenvalues described in Section 3.1. Under the null hypothesis (1.1), the asymptotic

distribution of the proposed test statistic is standard normal as ),( np together

approach infinity. Its performance was assessed and compared to some recent tests

provided in the literature through a simulation study given in Section 4.1.

1.1.2 Testing the Hypothesis of the Equality of Two Covariance Matrices

for Two Independent Populations

The equality of two covariance matrices is essential in multivariate analysis.

For example, in discriminant analysis (see Fujikoshi et al., 2010: 256; and Srivastava,

2002: 252), different discriminant rules are given depending on whether the

population under consideration has equal covariance matrices or not. In classical

analysis, for ,pn > when testing the hypothesis regarding the equality of the two

mean vectors, if the covariance matrices are equal an exact test for testing this

hypothesis exists. If the two covariance matrices are not equal, only an approximate

test is available (Johnson, 1998: 420). For testing the equality of two mean sub-

vectors, this requirement is also addressed (Srivastava, 2002: 125; Gamage and

Mathew, 2008).

Now let ,2,1;,...,1; == knj kjkX be two random samples drawn from two

independent p-variate normal populations ),,( kkpN Σμ where kμ denotes an unknown

mean vector of the thk population and kΣ denotes an unknown positive definite

Page 13: TESTS FOR COVARIANCE MATRICES WITH HIGH ...libdcms.nida.ac.th/thesis6/2012/b177570.pdfABSTRACT Title of Dissertation Tests for Covariance Matrices with High-dimensional Data Author

4

covariance matrix of the thk population. It is desirable to test the equality of these two

covariance matrices, i.e. testing the hypothesis that

Σ=Σ=Σ′ 210 :H against 211 : Σ≠Σ′H , (1.2)

where Σ denotes the common unknown positive definite covariance matrix of the two

populations under 0H ′ when the sample size from each population is less than or

equal to the number of variables, i.e. .2,1; =≤ kpnk

For pnk > , a classical way of dealing with this hypothesis is the likelihood

ratio test (Anderson, 1984) which is the case of the ratio of the determinants of the

two estimates of the covariance matrices and the determinant of the estimate of the

common covariance matrix under the null hypothesis. Barlett (1937) suggested the

modified likelihood ratio test by replacing the sample numbers appearing in the

likelihood test by the number of degrees of freedom of the sample covariance

matrices. However, these tests are valid if and only if the sample size from each

sample is greater than the number of variables.

In high-dimensional datasets, recent works, such as by Schott (2007);

Srivastava (2007b); and Srivastava and Yanagihara (2010) described in Section 2.2,

have been proposed under the general asymptotic theory framework.

At this point, this dissertation proposes a test statistic for testing the equality of

two covariance matrices for high-dimensional data. The proposed test statistic is

based on the consistent estimator of the ratio of the two second moments of the two

sample eigenvalues described in Section 3.2. Under the null hypothesis, as

,),( ∞→np the proposed statistic is asymptotically distributed as standard normal

when .2,1, =≥ knp k Note that the notation ∞→),( np means that both p and n go

towards infinity. This notation is used throughout this dissertation from here on.

Comparisons of the proposed test statistic to the existing tests given in the literature

are demonstrated using simulation provided in Section 4.2.

1.2 Objectives of the Study

The objectives of the dissertation are as follows:

Page 14: TESTS FOR COVARIANCE MATRICES WITH HIGH ...libdcms.nida.ac.th/thesis6/2012/b177570.pdfABSTRACT Title of Dissertation Tests for Covariance Matrices with High-dimensional Data Author

5

1) To propose a test statistic for testing hypothesis 02

0 : Σ=Σ σH , where

02 >σ is an unknown scalar and 0Σ a known positive definite matrix for one sample

with high-dimensional data

2) To propose a test statistic for testing hypothesis Σ=Σ=Σ′ 210 :H for two

independent samples with high-dimensional data

3) To assess the performance of the proposed test statistics by considering

their Type I error rates and powers via a simulation study and comparing them to

those of the existing tests

1.3 Scope of the Study

The proposed test statistics for high-dimensional data were developed under

the following conditions:

1) The data are assumed to be identically and independently distributed as

multivariate normal with the number of variables being greater than or equal to the

sample size )( np ≥

2) The asymptotic distribution of the proposed test statistic was investigated

when ∞→),( np

1.4 Usefulness of the Study

The newly proposed tests could be beneficial for analyzing multivariate data

in statistical situations where the number of variables is larger than or equal to the

sample size, such as DNA microarray analysis, genetics, astronomy data, etc.

Page 15: TESTS FOR COVARIANCE MATRICES WITH HIGH ...libdcms.nida.ac.th/thesis6/2012/b177570.pdfABSTRACT Title of Dissertation Tests for Covariance Matrices with High-dimensional Data Author

CHAPTER 2

LITERATURE REVIEW

This chapter begins with a review of the literature on testing the hypothesis

that the covariance matrix is a partially known matrix using the classical approach,

where the number of variables is less than the sample size, and the high-dimensional

approach, where the number of variables is larger than or equal to the sample size.

Testing the equality of two covariance matrices in the classical and high-dimensional

approaches is reviewed next.

2.1 Testing the Hypothesis for a Partially Known Matrix

As given in Chapter 1, the hypothesis for a partially known matrix is written

as

02

0 : Σ=Σ σH against 02

1 : Σ≠Σ σH .

Let nXX ,...,1 be a random sample drawn from a p-variate normal population

with unknown mean vector μ and unknown positive definite covariance matrix ,Σ

denoted by .,...,1),,(~ njN pj =ΣμX The variables made on a single observation are

regularly collected into a column vector, i.e. ,),,,( 21′= pjjjj xxx LX where j

represents the ,,...,1, njj th = observation from the random sample. The set of

variables on all observations in a sample set make up a matrix of observations, X ,

such that

( )

pnpnnn

p

p

n

n

xxx

xxxxxx

×⎟⎟⎟⎟

⎜⎜⎜⎜

=⎟⎟⎟

⎜⎜⎜

′′

=′=

LMMMM

LL

ML

21

22212

12111

2

1

21

X

XX

XXXX .

Page 16: TESTS FOR COVARIANCE MATRICES WITH HIGH ...libdcms.nida.ac.th/thesis6/2012/b177570.pdfABSTRACT Title of Dissertation Tests for Covariance Matrices with High-dimensional Data Author

7

The −p dimensional population is assumed to have a −×1p mean vector μ and

a −× pp covariance matrixΣ , so that

⎟⎟⎟⎟

⎜⎜⎜⎜

=Σ⎟⎟⎟

⎜⎜⎜

=

pppp

p

p

p σσσ

σσσσσσ

μ

μμ

LMOMM

L

L

M

21

22212

11211

2

1

andμ ,

where .,...,1;,...,1),( njpixE iji ===μ The diagonal elements ,iiσ ])[( 2iijii xE μσ −=

are the variances of the random variables ,,...,1;,...,1, njpixij == and the off-

diagonal elements ,ilσ )])([( lljiijil μxμxE −−=σ are the covariances between the

random variables ijx and ,ljx for .,...,1;,...,2,1 njpli ==≠ The covariance matrix Σ

can be expressed using the matrix notation

⎥⎦⎤

⎢⎣⎡ ′−−= ))(( μXμXEΣ .

The probability density function for random vector jX from the p-variate normal

distribution is defined as

( ) ,)()(21exp2)( 12/12/

⎟⎠⎞

⎜⎝⎛ −Σ′−−Σ= −−− μxμxx jj

pjf π

where Σ denotes the determinant operation on the matrix Σ and 0≠Σ because Σ is

positive definite. The estimates of the mean vector μ and the covariance matrix Σ are

the sample mean vector X and sample covariance matrix S , typically defined as

,11∑=

=n

jjn

XX (2.1)

and

.))((1

11

1)(1∑=

× ′−−−

=−

==n

jjjppij nn

s XXXXAS (2.2)

2.1.1 The Classical Approach

When pn > , from Anderson (1984: 429), the appropriate test for testing the

hypothesis 02

0 : Σ=Σ σH is the likelihood ratio test (LRT), which is given by

Page 17: TESTS FOR COVARIANCE MATRICES WITH HIGH ...libdcms.nida.ac.th/thesis6/2012/b177570.pdfABSTRACT Title of Dissertation Tests for Covariance Matrices with High-dimensional Data Author

8

,1

2

10

10

n

p

trp

L

⎟⎟⎟⎟⎟

⎜⎜⎜⎜⎜

⎟⎟⎠

⎞⎜⎜⎝

⎛Σ

Σ=

S

S (2.3)

where tr denotes the trace notation.

From (2.3), for testing hypothesis pIH 2*0 : σ=Σ , the LRT is given by

( )

2

1

1

/12

1

/11

pn

p

ii

p

i

pi

n

p

lp

l

trp

L⎟⎟⎟⎟

⎜⎜⎜⎜

=

⎟⎟⎟⎟⎟

⎜⎜⎜⎜⎜

⎟⎟⎠

⎞⎜⎜⎝

⎛=

=

=

S

S, (2.4)

where ,,...,1, pili = are the eigenvalues of S and, from Anderson (1984: 432),

)()]()([)()log)1(( 32242

21

−+ +≤−≤+≤=≤−− nOzPzPzPzLnP fff χχωχρ ,

where O denotes Big-oh notation, 2fχ denotes a Chi-squared random variable with

f degrees of freedom, and

,1)1(21

−+= ppf

,)1(6221

2

−++

−=npppρ

222

23

2 )1(288)2362)(2)(1)(2(

ρω

−+++−−+

=np

pppppp .

Following this, the LRT in (2.4) was been shown to have a monotone power

function by Carter and Srivastava (1977).

John (1971) proposed the test statistic under the null hypothesis *0H ,

1])/1[(

)/1()/1(

12

22

−=⎥⎥⎦

⎢⎢⎣

⎡⎟⎟⎠

⎞⎜⎜⎝

⎛−=

SS

SS

trptrpI

trptrp

U . (2.5)

The test statistic U is consistent as ∞→n , while p is fixed. John (1971) showed

that U has the asymptotically locally most powerful invariant test for *0H as ∞→n .

Page 18: TESTS FOR COVARIANCE MATRICES WITH HIGH ...libdcms.nida.ac.th/thesis6/2012/b177570.pdfABSTRACT Title of Dissertation Tests for Covariance Matrices with High-dimensional Data Author

9

2.1.2 The High-dimensional Approach

In this section, the considerable body of work completed on statistical testing

in high-dimensional data is built upon for testing the neighboring hypothesis, where

0Σ is restricted to an identity matrix for testing the hypothesis

pIH 2*0 : σ=Σ (sphericity) against .: 2*

1 pIH σ≠Σ

Various tests have been proposed, such as Ledoit and Wolf (2002); Birke and Dette

(2005); Srivastava (2005); Srivastava (2006); and Fisher et al. (2010).

The pioneering work of Ledoit and Wolf (2002) discussed the validity of

testing John’s U statistic above in a high-dimensional situation. Since the asymptotic

distribution of the U test statistic being studied assumes that ∞→n while p remains

fixed, it treats terms of order np / like terms of order ,/1 n which is inappropriate if

p is greater than .n After this, Ledoit and Wolf studied its consistency and

investigated the asymptotic distribution of the U test statistic using a new asymptotic

theoretical framework, such that ∞→),( np and ,/ cnp → for some finite

concentration c , where ( )+∞∈ ,0c . They showed that this test statistic is still

consistent when ∞→),( np and ).,0(/ +∞∈→ cnp Under the null hypothesis

,*0H they provided a test statistic based on John’s U statistic as

.2

1)1( −−−=

pUnUJ (2.6)

They showed that as ∞→),( np and ),,0(/ +∞∈→ cnp the test statistic JU is

asymptotically distributed as standard normal. It can be seen in their simulation study

that this test could control the Type I error rates under the null hypothesis when the

covariance matrix was set as the identity matrix, i.e. ,2Iσ=Σ with .12 =σ In addition,

the simulated power of the test statistic converged to one under the alternative

hypothesis when the covariance matrix was set as the diagonal matrix half of whose

elements equal 1 and other half 0.50.

Birke and Dette (2005) derived the asymptotic distribution of the test statistic

based on John’s U statistic in (2.5) using a new technique which is more applicable,

including the extreme cases of concentration 0=c and ,∞=c i.e. ].,0[/ ∞∈→ cnp

The test statistic under the null hypothesis *0H is given by

Page 19: TESTS FOR COVARIANCE MATRICES WITH HIGH ...libdcms.nida.ac.th/thesis6/2012/b177570.pdfABSTRACT Title of Dissertation Tests for Covariance Matrices with High-dimensional Data Author

10

2

1)1( −−−=

pUnTB . (2.7)

As ∞→),( np and ],,0[/ ∞∈→ cnp the test statistic BT is asymptotically distributed

as standard normal. Note that the test statistic BT in (2.7) is exactly the same as the

test statistic JU in (2.6).

Ledoit and Wolf (2002) did likewise but only required a more general

condition, whereas Srivastava (2005) imposed the condition that ,10),( ≤<= ζζpOn

and under this condition he proposed a test based on the Cauchy-Schwarz inequality

of the eigenvalues of Σ such that

⎟⎟⎠

⎞⎜⎜⎝

⎛≤⎟⎟

⎞⎜⎜⎝

⎛ ∑∑==

p

i

ri

p

i

ri p

1

22

1λλ ,

where ,1=r and siλ are the thi eigenvalue of Σ . The equality holds if and only if

,...1 λ=λ==λ p for all pi ,...,1= , and a constant .λ

When ,1=r let

2

1

1

2

1

⎟⎟⎠

⎞⎜⎜⎝

⎟⎟⎠

⎞⎜⎜⎝

=

=

=

p

ii

p

iip

λ

λψ .

He observed that 11 =ψ if and only if *0H holds. Thus the hypothesis can be

considered as 1: 1*0 =′ ψH against 1: 1

*1 >′ ψH . Note that

( ) 21

22

2

2

1

1

2

1 )/1()/1(

/

/

hh

trptrp

p

p

p

ii

p

ii

=ΣΣ

=

⎟⎟⎠

⎞⎜⎜⎝

⎟⎟⎠

⎞⎜⎜⎝

=

=

=

λ

λψ ,

where .2,1,)/1( =Σ= mtrph mm

The test statistic based on the first and second moments of the eigenvalues of S under

the null hypothesis *0H is given by

,1ˆˆ

2)1(

21

21

⎥⎥⎦

⎢⎢⎣

⎡−

−=

hhnTS (2.8)

Page 20: TESTS FOR COVARIANCE MATRICES WITH HIGH ...libdcms.nida.ac.th/thesis6/2012/b177570.pdfABSTRACT Title of Dissertation Tests for Covariance Matrices with High-dimensional Data Author

11

where

Strp

h 11 = (2.9)

and

( )( ) ( ) ⎥⎦⎤

⎢⎣⎡

−−

+−−

= 222

2 111

)12)1(ˆ SS tr

ntr

pnnnh . (2.10)

The random variables 1h and 2h are consistent estimators of Σtrp)/1( and

,)/1( 2Σtrp respectively. He showed that, under the null hypothesis ,*0H as

∞→),( np and under the condition ),( ζpOn = 10 ≤< ζ , the test statistic 1ST is

asymptotically distributed as standard normal. The asymptotic distribution of the

statistic under the alternative hypothesis was also given, but the simulation study for

evaluating the Type I error rates and the power of his test statistic 1ST were not reported

in this article.

Srivastava (2006) proposed an adapted version of the likelihood ratio test

when pn > to the case of pn ≤ simply by interchanging n and p. He let

,)1(288

]13)1(6)1(2)[1)(2(2

23

1 −−+−+−+−

=n

nnnnnnc

.2

1

,)1(6

1)1(2

2

1

2

1

−−=

−++−

−=

nng

nnnpm

The test is given by

,log 211 LmQ −=

where

1

1

1

12

11

=

=

⎟⎠

⎞⎜⎝

⎛−

=

∏nn

ii

n

ii

ln

lL ,

where ,1,...,1, −= nili are the positive eigenvalues of S . This test is applicable under

the assumptions 0/ →pn and n is fixed. He provided the following result:

Page 21: TESTS FOR COVARIANCE MATRICES WITH HIGH ...libdcms.nida.ac.th/thesis6/2012/b177570.pdfABSTRACT Title of Dissertation Tests for Covariance Matrices with High-dimensional Data Author

12

).()]()([)()( 31

224

211

21 111

mOzPzPmczPzQP ggg +≥−≥+≥=≥ +− χχχ

It can be found from his simulation study that this test can control the size (Type I

error rate) of the test when n is medium and p is large. However, this test is not

appropriate when n is large even though p is also large. Perhaps the reason is that the

asymptotic distribution of this test statistic is derived under the assumptions 0/ →pn

and n being fixed, while in practice n can vary together with .p

Motivated by the results of Srivastava (2005), and under similar condition to

Ledoit and Wolf (2002) of ,/ cnp → ),0( +∞∈c , Fisher et al. (2010) proposed an

alternative test based on the Cauchy-Schwarz inequality of the eigenvalues of Σ but

took a look at the case where 2=r , i.e.

⎟⎟⎠

⎞⎜⎜⎝

⎛≤⎟⎟

⎞⎜⎜⎝

⎛ ∑∑==

p

ii

p

ii p

1

42

1

2 λλ

Now let

2

1

2

1

4

2

⎟⎟⎠

⎞⎜⎜⎝

⎟⎟⎠

⎞⎜⎜⎝

=

=

=

p

ii

p

iip

λ

λψ

In a similar fashion to Srivastava (2005), he considered that 12 =ψ if and only if

*0H holds. Thus the hypothesis can be considered as 1: 2

**0 =ψH against .1: 2

**1 >ψH

Note that

( ) ,)/1()/1(

/

/

22

422

4

2

1

2

1

4

2

1

2

1

4

2 hh

trptrp

p

pp

p

ii

p

ii

p

ii

p

ii

Σ=

⎟⎟⎠

⎞⎜⎜⎝

⎟⎟⎠

⎞⎜⎜⎝

=

⎟⎟⎠

⎞⎜⎜⎝

⎟⎟⎠

⎞⎜⎜⎝

=

=

=

=

=

λ

λ

λ

λψ

where 4,...,1,)/1( =Σ= mtrph mm

With the constants

,1

4*

−−=n

b

,)2)(1()3(3)1(2

2

2*

+−−−+−

−=nnnnnc

Page 22: TESTS FOR COVARIANCE MATRICES WITH HIGH ...libdcms.nida.ac.th/thesis6/2012/b177570.pdfABSTRACT Title of Dissertation Tests for Covariance Matrices with High-dimensional Data Author

13

,)2)(1(

)15(22

*

+−−+

=nnn

nd

)2()1(15

22*

+−−+

−=nnn

ne ,

and

,)4)(3)(2)(5)(3)(1(

)2()1( 25

−−−++++−−

=nnnnnnn

nnnτ

the test statistic based on the second and fourth moments of the sample eigenvalues

under the null hypothesis is given by

( ) ⎟⎟⎠

⎞⎜⎜⎝

⎛−

++

−= 1ˆ

ˆ

1288

122

*4

2 hh

cc

nTF , (2.11)

where c is estimated by 1−np , 2h is defined as in (2.10), and

[ ]4*22*22*3*4*4 )()()(ˆ SSSSSSS tretrtrdtrctrtrbtr

ph ⋅+⋅+⋅+⋅+=

τ , (2.12)

which is a consistent estimator of .1 4Σtrp

Under the null hypothesis ,*0H as ∞→),( np and ),,0(/ +∞∈→ cnp he

showed that the test statistic FT is asymptotically distributed as standard normal. The

asymptotic distribution of the statistic FT under the alternative hypothesis was also

provided. His simulation study showed that this test statistic performed well under

IH 2*0 : σ=Σ with 12 =σ (i.e. each )1=iλ .

For the next review, a near spherical matrix definition, as defined by Fisher et

al. (2010), is given in the form

⎟⎠⎞⎜

⎝⎛ ′Θ= I0

0B ,

where Θ is an rr × diagonal matrix, for ,pr < with all elements ,1≠iθ I is a

)()( rprp −×− identity matrix and 0 is a −− )( rp vector of zeros. The number r is

chosen to be small so that the near spherical matrix is the identity matrix with the

exception of a few elements. It can be found in Fisher et al. (2010) that under the near

spherical alternative hypothesis, this test statistic FT is more powerful than the test of

Page 23: TESTS FOR COVARIANCE MATRICES WITH HIGH ...libdcms.nida.ac.th/thesis6/2012/b177570.pdfABSTRACT Title of Dissertation Tests for Covariance Matrices with High-dimensional Data Author

14

Srivastava (2005), as defined in (2.8), and is comparable to that of Ledoit and Wolf

(2002), as defined in (2.6).

2.2 Testing the Equality of Two Covariance Matrices for Two Independent

Populations

This section explores several test statistics for testing the hypothesis of equality

of two covariance matrices, defined as

Σ=Σ=Σ′ 210 :H against .: 211 Σ≠Σ′H

Let ,2,1;,...,1, == knj kjkX be random samples drawn from independently

normally distributed populations ),( kkpN Σμ where kμ denotes an unknown mean

vector of the thk population and kΣ denotes an unknown positive definite covariance

matrix of the thk population.

In this section, a set of two samples, one from each population, is used. The

variables made on a single observation from each sample are regularly collected into a

column vector, i.e. ′= ),,,( 21 pjkjkjkjk xxx LX , where j represents the thj observation

from the thk random sample, for .2,1;,...,1 == knj k The set of variables on all

observations in the thk sample set make up a matrix of observations, kX , such that

.2,1,

21

22212

12111

=⎟⎟⎟⎟

⎜⎜⎜⎜

=

×

kxxx

xxxxxx

pnkpnknkn

kpkk

kpkk

k

kkkk

LMMMM

LL

X

The −p dimensional population is assumed to have −×1p mean vectors kμ

and −× pp covariance matrices kΣ so that

,2,1,and

21

22212

11211

2

1

=

⎟⎟⎟⎟⎟

⎜⎜⎜⎜⎜

⎟⎟⎟⎟⎟

⎜⎜⎜⎜⎜

= k

ppkpkpk

pkkk

pkkk

k

pk

k

k

k

σσσ

σσσσσσ

μ

μμ

L

MOMM

L

L

Page 24: TESTS FOR COVARIANCE MATRICES WITH HIGH ...libdcms.nida.ac.th/thesis6/2012/b177570.pdfABSTRACT Title of Dissertation Tests for Covariance Matrices with High-dimensional Data Author

15

where .2,1;,...,1),( === kpixE ijkikμ The diagonal elements ,iikσ

,)( 2ikijkiik xE μσ −= of the thk covariance matrix kΣ are the variances of the random

variable ,2,1;,...,2,1;,...,1, === knjpix kijk and the off-diagonal elements ,ilkσ

where )],)([( lkljkikijkilk μxμxE −−=σ are the covariances between the random

variables ,ijkx and .2,1;,...,1;,...,1, ===≠ knjplix kljk

The thk covariance matrix kΣ can be expressed using the matrix notation,

.2,1,))(( =⎥⎦⎤

⎢⎣⎡ ′−−=Σ kE kkkkk μXμX

The probability density function for random vector jkX from the thk p-variate

normal distribution is defined as

( ) ,2,1;,...,1,)()(21exp2)( 12/12/ ==⎟

⎠⎞

⎜⎝⎛ −Σ′−−Σ= −−− knjf kkjkikjki

pjk μxμxx π

where kΣ denotes the determinant operation on the matrix kΣ and 0≠Σk since kΣ

is a positive definite matrix.

Let

,2,1,11

== ∑=

kn

kn

jjk

kk XX (2.13)

( )( ) ,2,1,1

=′−−=∑=

kin

jkjkkjkk XXXXA (2.14)

,2,1,1

1=

−= kn kk

k AS (2.15)

,2,1,11 == ktr

ph kk S (2.16)

,2,1,)(1

1)1)(2(

)1(ˆ 222

2 =⎭⎬⎫

⎩⎨⎧

−−

+−−

= ktrn

trnnp

nh k

kk

kk

kk SS (2.17)

Suppose there are independent estimates ,, 21 SS the sample covariance

matrices of the covariance matrices 1Σ and ,2Σ respectively, with

,2,1),1,(~)1( =−Σ− knWn kkpkk S i.e. kkn S)1( − having a Wishart distribution with

Page 25: TESTS FOR COVARIANCE MATRICES WITH HIGH ...libdcms.nida.ac.th/thesis6/2012/b177570.pdfABSTRACT Title of Dissertation Tests for Covariance Matrices with High-dimensional Data Author

16

kn degrees of freedom and covariance matrix .kΣ The common covariance matrix Σ

is estimated by the pooled sample covariance matrix

( ) ,1

11

1ˆ21 SAAA ≡+

−=

−=Σ

nn

where .121 −+= nnn Note that ),1,(~)1( −Σ− nWn pS i.e. S)1( −n has a Wishart

distribution with 1−n degrees of freedom and covariance matrix .Σ Therefore,

,11 Str

ph = (2.18)

.)(1

1)1)(2(

)1(ˆ 222

2 ⎭⎬⎫

⎩⎨⎧

−−

+−−

= SS trn

trnnp

nh (2.19)

2.2.1 The Classical Approach

For ,pn > to test Σ=Σ=Σ′ 210 :H against ,: 211 Σ≠Σ′H the likelihood ratio

criterion (see Anderson, 1984: 406; and Srivastava, 2002: 490), is given by

.)()2(

)1()1(2/

22/

1

2/)(21

2/)(21

2/22

2/11

3 21

21

21

21p

nn

nn

nn

nn

nnnn

nn

nnL ⎟

⎟⎠

⎞⎜⎜⎝

⎛ +⎟⎟

⎜⎜

−+

−−=

+

+S

SS

This test is not unbiased unless the degrees of freedom associated with kS , kn , are

changed to 1−kn .

The modified likelihood ratio test suggested by Bartlett (1937) on intuitive

grounds and quoted in Anderson (1984: 406) is based on the statistic

.2/)1(

2/)1(2

2/)1(1

4

21

−−

= n

nn

LS

SS

This modified likelihood ratio test is valid only if knp < , for .2,1=k In particular, if

p is fixed then the asymptotic null distribution of this criterion ,log2 4L− as ∞→kn ,

for ,2,1=k is Chi-squared with 2/)1( +pp degrees of freedom. Pearlman (1980)

showed that the test based on 4L is unbiased.

An alternative test based on the Wald statistic quoted in Schott (2007) is

.)()1(

)1)(1()(

1)1(

2)1( 2

1

2

1

112

2

1

11

⎭⎬⎫

⎩⎨⎧

−−−

−−−−

= ∑∑∑= =

−−

=

−−

k llk

lk

kkk

k trnnn

trnnnW SSSSSSSS

Page 26: TESTS FOR COVARIANCE MATRICES WITH HIGH ...libdcms.nida.ac.th/thesis6/2012/b177570.pdfABSTRACT Title of Dissertation Tests for Covariance Matrices with High-dimensional Data Author

17

This test statistic has the same asymptotic null distribution as 4L and is valid as long

as S is nonsingular, i.e. as long as np < (Schott, 2007).

2.2.2 The High-dimensional Approach

Schott (2007) proposed a test for testing the equality of several covariance

matrices. Since the equality of a two covariance matrix test is considered here, then

the test statistic of Schott (2007) based on a consistent estimator of the square of the

Frobenius norm of ,21 Σ−Σ i.e. ( ) ,221 Σ−Σtr is given by

( ) ,2ˆˆˆ)1(2

)1)(1(212221

2

21⎟⎟⎠

⎞⎜⎜⎝

⎛−+

−−−

= SStrp

hhhnnnTJ (2.20)

where ,2,1,ˆ2 =kh k is as defined in (2.17), and 2h as defined in (2.19). Under the null

hypothesis, he validated that the test statistic JT is distributed as standard normal as

,),,( 21 ∞→nnp and .2,1),,0(/ =∞∈→ kcnp kk

In his simulation study, the asymptotic normal distribution of this test statistic

was evaluated under the null hypothesis of Σ=Σ=Σ′ 210 :H after setting the common

unknown positive definite covariance matrix Σ as two different matrices. First, Σ

was set as the identity matrix, and second, as a block diagonal matrix with each block

matrix given by ,151.05.0 444 ′+I where 41 denotes the 14× vector with each of its

elements equal to 1. Two p-variate normal samples with equal sample sizes were

constructed.

First, the results obtained using the common covariance matrix Σ as the

identity matrix reported in his article showed that the empirical Type I error rates of

the test statistic JT were not close to the nominal significance level when the sample

size was small and they converged to the nominal significance level when p and the

sample size increased. He also found, but without tabulating the results, that the

empirical Type I error rates when the sample sizes were equal, i.e., ,21 nn = were not

substantially different from those when 2/12 nn = .

For the second setting of Σ as mentioned above, the empirical Type I error

rates were generally not close to the nominal level for small values of p and sample

Page 27: TESTS FOR COVARIANCE MATRICES WITH HIGH ...libdcms.nida.ac.th/thesis6/2012/b177570.pdfABSTRACT Title of Dissertation Tests for Covariance Matrices with High-dimensional Data Author

18

size. When p and the sample size increased, the empirical Type I error rates seemed

to improve. However, this test statistic yielded empirical Type I error rates which

were not close to the nominal significance level when p and the sample size were very

close; the empirical Type I error rates were generally much higher than the nominal

significance level. Furthermore, he mentioned that the convergence to a standard

normal distribution of this test statistic is somewhat slower when pI≠Σ , which can

be clearly observed in that article.

To estimate the power of this test, a simulation under the alternative

hypothesis with I=Σ1 while 2Σ had a block-diagonal structure with each block

matrix given by )2,1,1,...,2,1,1,1(diag was carried out. The empirical powers converged

to one as both p and the sample size increased and converged at a slower rate for

small values of .p Moreover, from his results, it was surprising that when sample

sizes were fixed and not large enough the empirical powers decreased

when p increased.

Srivastava (2007a) proposed a test based on the statistic

,ˆ))1(])1{[(ˆ

2

22112

1

hnntrhp

GSS −−

=+

where +1S denotes the Moore-Penrose inverse of .1S He noted that when the null

hypothesis, Σ=Σ=Σ′ 210 :H , is true, ).1,(~)1( −Σ− nWn pS For a fixed 1n and

,2n and under the null hypothesis,

.~lim 2)1)(1( 21 −−∞→ nnp

G χ

It was noted by Srivastava and Yanagihara (2010) that this test did not perform well.

It was quoted in Srivastava and Yanagihara (2010) that Srivastava (2007b)

proposed a test based on a consistent estimator of 22

21 Σ−Σ trtr . The test statistic is

given by

,ˆˆ

ˆˆ22

21

22212

ηη +

−=

hhTS (2.21)

Page 28: TESTS FOR COVARIANCE MATRICES WITH HIGH ...libdcms.nida.ac.th/thesis6/2012/b177570.pdfABSTRACT Title of Dissertation Tests for Covariance Matrices with High-dimensional Data Author

19

where ,2,1,ˆ2 =kh k are as defined in (2.17), and are consistent estimators of

,2,1,1 22 =Σ= ktr

ph kk respectively. The statistic 2ˆkη is a consistent estimator of 2

where

,2,1,)1(2

1)1(

422

4222

2 =⎟⎟⎠

⎞⎜⎜⎝

⎛ −+

−= k

phhn

hn

k

kkη

,2,1,ˆˆ)1(2

1ˆ)1(

4ˆ22

4222

2 =⎟⎟⎠

⎞⎜⎜⎝

⎛ −+

−= k

hphn

hn

k

kkη

2h as defined in (2.19) and

.ˆˆˆˆˆ11ˆ 41

32232

212

211

4

04 ⎟⎟

⎞⎜⎜⎝

⎛−−−−= hnphpchhcphpctr

pch A (2.22)

The constants ,,, 210 ccc and 3c are defined as

],18)1(21)1(6)1)[(1( 230 +−+−+−−= nnnnc

],9)1(6)1(2)[1(2 21 +−+−−= nnnc

],2)1(3)[1(22 +−−= nnc and

].7)1(5)1(2)[1( 23 +−+−−= nnnc

Under the null hypothesis, he showed that the test statistic 2ST is asymptotically

distributed as standard normal as .),( ∞→np

A numerical simulation study was carried out and the results shown in

Srivastava and Yanagihara (2010). In their simulation, they let ),,...,( 1 pdddiagD =

where ),5,1(~,...,...

1 Udddii

p and )2,1( =Δ jj is a pp× matrix whose thba ),( element are

defined by 10/1

)}2(2.0{)1( baba j −+ +×− . The asymptotic normality of this test statistic

was assessed under the null hypothesis, .: 1210 DDH Δ=Σ=Σ′ Two p-variate normal

samples were simulated with equal size. Under this setting, the empirical Type I error

rates of this test statistic were not very close to the nominal significance level when

the sample size and p were not large enough. However, it tended towards the nominal

significance level when p and the sample size increased.

Page 29: TESTS FOR COVARIANCE MATRICES WITH HIGH ...libdcms.nida.ac.th/thesis6/2012/b177570.pdfABSTRACT Title of Dissertation Tests for Covariance Matrices with High-dimensional Data Author

20

It was shown in this article that the empirical Type I error rates of the test

statistic JT , as defined in (2.20) and proposed by Schott (2007), performed quite

badly under this setup. The empirical Type I error rates were substantially greater than

the nominal significance level; they were at least 0.071 for all cases of p and sample

size considered. Moreover, the empirical Type I error rates did not converge to the

nominal significance level as p and the sample size increased. The power of this test

statistic 2ST was measured under the alternative hypothesis, 211 : Σ≠Σ′H , where

DD 11 Δ=Σ while .22 DDΔ=Σ The empirical power of test statistic 2ST tended

towards one more slowly than those of JT as p and the sample size increased.

Srivastava and Yanagihara (2010) proposed an alternative test relying on

a consistent estimator of difference

,)()( 2

2

22

21

21

21 Σ

Σ−

Σ

Σ=−

trtr

trtr

γγ

where .2,1,)( 2

2

=ΣΣ

= ktrtr

k

kkγ Under the null hypothesis ,: 210 Σ=Σ=Σ′H .021 =−γγ

Thus they noted that the hypothesis is equivalent to the following:

0: 210 =−′ γγH against .0: 211 ≠−′ γγH

Consistent estimators of kγ are given by ,ˆkγ where ,2,1,ˆˆ

ˆ2

1

2 == khh

k

kkγ the random

variables ,1kh and ,2,1,ˆ2 =kh k are as defined in (2.16) and (2.17), respectively. The

test statistic is given by

,ˆˆ

ˆˆ22

21

21

ξξ

γγ

+

−=SYT (2.23)

where

,2,1,ˆˆ

ˆˆˆ2

ˆˆ)1(2

ˆˆ

)1(4ˆ

41

451

3261

32

41

22

22 =

⎪⎭

⎪⎬⎫

⎪⎩

⎪⎨⎧

⎟⎟⎠

⎞⎜⎜⎝

⎛+−

−+

−= k

hh

hhh

hh

pn

hh

nk

kkξ

where ,ˆ,ˆ21 hh and 4h are as defined in (2.18), (2.19), and (2.22), respectively, and

Page 30: TESTS FOR COVARIANCE MATRICES WITH HIGH ...libdcms.nida.ac.th/thesis6/2012/b177570.pdfABSTRACT Title of Dissertation Tests for Covariance Matrices with High-dimensional Data Author

21

.ˆ)1(ˆˆ)1(3))1((1]4)1(3)1)[(1(

1ˆ 31

212

323 ⎟⎟

⎞⎜⎜⎝

⎛−−−−−

+−+−−= hpnhhnpnntr

pnnnh S

(2.24) Under the null hypothesis, the test statistic SYT is asymptotically normally distributed

as standard normal as .),( ∞→np

A simulation study was conducted under the null hypothesis,

.: 1210 DDH Δ=Σ=Σ′ The empirical Type I error rates of this test statistic SYT were

poor when the sample size was small and p was not large. The empirical Type I error

rates converged to one as p and the sample size increased. It was noted and can be

observed from Srivastava and Yanagihara (2010) that the convergence of this test

statistic SYT to a standard normal distribution is slower than that of the test statistic

2ST in (2.21), as proposed by Srivastava (2007b). The reason was addressed in

Srivastava and Yanagihara (2010), and is that an estimation of 2kη standardizing 2ST is

easier than that of 2kξ standardizing SYT because 2

kξ depends on more terms than 2kη .

Under the alternative hypothesis, 211 : Σ≠Σ′H , where DD 11 Δ=Σ while DD 22 Δ=Σ ,

the empirical power of this test SYT tended to one faster than those of the two test

statistics JT and 2ST .

Page 31: TESTS FOR COVARIANCE MATRICES WITH HIGH ...libdcms.nida.ac.th/thesis6/2012/b177570.pdfABSTRACT Title of Dissertation Tests for Covariance Matrices with High-dimensional Data Author

CHAPTER 3

THE PROPOSED TESTS

This chapter separately presents the methods of developing the two proposed

tests and their asymptotic distributions in two sections. The proposed test for testing

the hypothesis for a partially known matrix for one high-dimensional data is

introduced first. After that, the proposed test for testing the equality of two covariance

matrices for two high-dimensional data is also explained.

3.1 Testing the Hypothesis for a Partially Known Matrix for One High-

dimensional Data

For testing the hypothesis that

02

0 : Σ=Σ σH against ,: 02

1 Σ≠Σ σH

where 2σ is unknown and 0Σ is a known positive definite matrix, the test statistic is

built by considering a measure of distances between the two matrices, namely the

square of the Frobenius norm,

,)(2)(1)(1 410

221

0221

0 σσσψ +ΣΣ−ΣΣ=−ΣΣ= −−− trp

trp

Itrp

(3.1)

where tr denotes the trace notation. The measurement 0=ψ if and only if the null

hypothesis holds. Thus, testing hypothesis 0:0 =ψH against 0:1 >ψH can be

considered.

The following assumptions are made:

(A1) As ),1[,,),( ∞∈→∞→ ccnpnp

(A2) As ,8,...,1),,0(,, =∞∈→∞→ map mmm αα

Page 32: TESTS FOR COVARIANCE MATRICES WITH HIGH ...libdcms.nida.ac.th/thesis6/2012/b177570.pdfABSTRACT Title of Dissertation Tests for Covariance Matrices with High-dimensional Data Author

23

[ ]( ) .,as0

)1(21

)ˆ(1ˆ

22

1211

∞→→−

=

≤>−

nppn

aaVaraaP

ε

εε

where ∑=

− =ΣΣ=p

i

m

i

imm dp

trp

a1

10 )(1)(1 λ

(see Appendix A, Section A.1, page 68)

The iλ ’s are the eigenvalues of the covariance matrixΣ and id ’s are the eigenvalues

of a given positive definite matrix .0Σ

From (3.1), to find the estimator of the measurement ,ψ 1a and 2a need to be

estimated where their estimators are consistent for large p and n , as presented in the

next theorem.

Theorem 3.1.1 Let

)(1ˆ 101 S−Σ= tr

pa (3.2)

and

( ) ⎥⎦⎤

⎢⎣⎡ Σ

−−Σ

+−−

= −− 210

210

2

2 )(1

1)(1)1)(2(

)1(ˆ SS trn

trpnn

na , (3.3)

then

(i) 1a is an unbiased and consistent estimator of ),(1 101 ΣΣ= −tr

pa and

(ii) 2a is an unbiased and consistent estimator of .)(1 2102 ΣΣ= −tr

pa

Proof As shown in Appendix A (Section A.2, page 69-70), two statistics are

obtained:

),(1ˆ 101 S−Σ= tr

pa and ( ) ⎥⎦

⎤⎢⎣⎡ Σ

−−Σ

+−−

= −− 210

210

2

2 )(1

1)(1)1)(2(

)1(ˆ SS trn

trpnn

na .

(i) Using Lemma A.2 in Appendix A (page 71), it can be shown that

( ) ,)(1ˆ 11

01 atrp

aE =ΣΣ= − (3.4)

so 1a is an unbiased estimator of 1a . Using the variance of 1a in Lemma A.3 in

Appendix A (page 72), and by applying the Chebyshev’s inequality, for any ,0>ε

we obtain

(3.5)

Page 33: TESTS FOR COVARIANCE MATRICES WITH HIGH ...libdcms.nida.ac.th/thesis6/2012/b177570.pdfABSTRACT Title of Dissertation Tests for Covariance Matrices with High-dimensional Data Author

24

[ ]( ) .,as0481

)ˆ(1ˆ

22242

2222

∞→→⎟⎟⎠

⎞⎜⎜⎝

⎛+≈

≤>−

npan

anp

aVaraaP

ε

εε

Thus 1a is a consistent estimator of 1a .

From equations (3.4) and (3.5), it can be concluded that 1a is an unbiased and

consistent estimator of 1a .

(ii) By using Lemma A.2 in Appendix A (page 71), we obtain

( ) ,)(1ˆ 221

02 atrp

aE =ΣΣ= − (3.6)

so 2a is an unbiased estimator of .2a With a similar proof to (i) and using the variance

of 2a in Lemma A.7 in Appendix A (page 75), we get

(3.7)

Thus 2a is a consistent estimator of 2a . Hence, from equations (3.6) and (3.7), it can

be concluded that 2a is an unbiased and consistent estimator of .2a

The proof is completed.

Thus, from Theorem 3.1.1, an unbiased and consistent estimator of ψ in (3.1)

is defined as

.ˆ2ˆˆ 41

22 σσψ +−= aa (3.8)

To find the distribution of ψ , the distributions of 1a and 2a need to be found,

as shown in the next theorem.

Theorem 3.1.2 Under the assumptions (A1) and (A2), as ,),( ∞→np

,484

42

,ˆˆ

2224

3

32

2

12

2

1

⎥⎥⎥⎥

⎢⎢⎢⎢

⎟⎟⎟⎟

⎜⎜⎜⎜

+⎟⎟⎠

⎞⎜⎜⎝

⎛⎯→⎯⎟⎟

⎞⎜⎜⎝

an

anpnp

anpa

npa

aa

Naa D

where yx D⎯→⎯ denotes x converging in distribution to .y

Proof (See the proof in Appendix B (page 80)).

Page 34: TESTS FOR COVARIANCE MATRICES WITH HIGH ...libdcms.nida.ac.th/thesis6/2012/b177570.pdfABSTRACT Title of Dissertation Tests for Covariance Matrices with High-dimensional Data Author

25

The following theorem and corollary provide for an asymptotic distribution of

the estimator ψ under the alternative and null hypotheses by applying the delta

method to a function of two random variables. Next, the lemma taken from Lehmann

and Romano (2005: 436) shows the delta method used to prove the asymptotic

normality of the estimator ψ .

Lemma 3.1.1 (The delta method)

Suppose n21 ,...,, XXX are random vectors in the ℜk Euclidean space and

assume that ),()( Σ⎯→⎯− 0μX kD

nn Nτ , where μ is a constant vector and { }nτ is a

sequence of constants .∞→nτ In addition, presume that (.)g is a function from ℜk

to ℜ which is differentiable at μ with a gradient (vector of first partial derivatives) of

dimension k×1 at μ equal to ),(μg′ then

( )[ ] ( ).)()(,0)( TDnn ggNgg μμμX ′Σ′⎯→⎯−τ

Proof (See proof in Lehmann and Romano (2005: 436)).

Theorem 3.1.3 Under the assumptions (A1) and (A2), as ( ) ,, ∞→np

),,0(ˆ 2βψψ ND⎯→⎯− (3.9)

with .2424 2

243

22

4

22

⎟⎟⎠

⎞⎜⎜⎝

⎛+

+−= a

pnaanan

nσσ

β

Proof Let ,2),( 41

2221 σσ +−= aaaag then the proposed test statistic is

.ˆ2ˆ)ˆ,ˆ(ˆ 41

2221 σσψ +−== aaaag

The first partial derivatives of ),( 21 aag with respect to 1a and 2a are respectively

given by

2

1

2σ−=⎟⎟⎠

⎞⎜⎜⎝

⎛∂∂ag and .1

2

=⎟⎟⎠

⎞⎜⎜⎝

⎛∂∂ag

Thus, by applying the delta method,

),,0(ˆ 2βψψ ND⎯→⎯−

Page 35: TESTS FOR COVARIANCE MATRICES WITH HIGH ...libdcms.nida.ac.th/thesis6/2012/b177570.pdfABSTRACT Title of Dissertation Tests for Covariance Matrices with High-dimensional Data Author

26

where

( )

.2424

12

484

42

12

22

432

24

2

2

2

2243

32

22

⎟⎟⎠

⎞⎜⎜⎝

⎛+

+−=

⎟⎟⎠

⎞⎜⎜⎝

⎛−

⎟⎟⎟⎟

⎜⎜⎜⎜

+−=

ap

naanann

na

npa

npa

npa

npa

σσ

σσβ

The proof is completed.

Corollary 3.1.1 Under the null hypothesis 02

0 : Σ=Σ σH , ,0=ψ and under the

assumptions (A1) and (A2), as ,),( ∞→np

).1,0(ˆ2 41 NnT D⎯→⎯= ψσ

(3.10)

Proof Under ,0H ,, 63

42 σσ == aa and ,8

4 σ=a then .42

82

nσβ = It follows from

the previous theorem, so therefore the proof is completed.

3.2 Testing the Equality of Two Covariance Matrices for Two High-

dimensional Data

In this section, it is desirable to test the hypothesis

Σ=Σ=Σ′ 210 :H against ,: 211 Σ≠Σ′H

where Σ denotes the common unknown covariance matrix of the two populations

when .2,1, =≥ knp k

Recall from Chapter 2 that the common covariance matrix Σ is estimated by

the pooled sample covariance matrix

,)(1

11

1ˆ21 SAAA ≡+

−=

−=Σ

nn

where 121 −+= nnn and .2,1,))((1

=′−−=∑=

kkn

jkjkkjkk XXXXA

Page 36: TESTS FOR COVARIANCE MATRICES WITH HIGH ...libdcms.nida.ac.th/thesis6/2012/b177570.pdfABSTRACT Title of Dissertation Tests for Covariance Matrices with High-dimensional Data Author

27

A test statistic for 0H ′ is proposed in this work based on the fact that if the null

hypothesis 0H ′ holds, i.e. 21 Σ=Σ , then .22

21 Σ=Σ trtr Thus, under the null hypothesis,

we obtain the quantity

.122

2122

21 ==

Σ

Σ=

hh

trtr

b

Therefore, the hypothesis 1:0 =′ bH can be tested against .1:1 ≠′ bH This is a two-

sided test.

The following assumptions are imposed:

(B1) As ),0(,,),( ∞∈→∞→ ccnpnp

(B2) As ),,1[,,),( ∞∈→∞→ kkk

k ccnpnp 2,1=k

(B3) As ),,0(,, ∞∈→∞→ mmmhp αα 16,...,1=m

(B4) As ),,0(,, ∞∈→∞→ lklklkhp αα ,8,...,1;2,1 == lk

where ,1 mm tr

ph Σ= and l

klk trp

h Σ=1

In order to estimate the quantity ,b the following two lemmas extended from

some of the results of Srivastava (2005) for one population to the case of two

populations (which are presented without proof) are obtained.

Lemma 3.2.1 Let ),1,(~)1( −Σ− kkpkk nWn S and ,4,...,1;2,1,1==Σ= lktr

ph l

klk then,

under the assumptions (B2) and (B4), unbiased and consistent estimators of kh2 , as

∞→),( knp , are given by kh2ˆ , as defined in (2.17) in Chapter 2,

i.e.

.2,1,)(1

1)1)(2(

)1(ˆ 222

2 =⎭⎬⎫

⎩⎨⎧

−−

+−−

= ktrn

trnnp

nh k

kk

kk

ik SS

Page 37: TESTS FOR COVARIANCE MATRICES WITH HIGH ...libdcms.nida.ac.th/thesis6/2012/b177570.pdfABSTRACT Title of Dissertation Tests for Covariance Matrices with High-dimensional Data Author

28

Lemma 3.2.2 Let ),1,(~)1( −Σ− kkpkk nWn S ,2,1,ˆ2 =kh k as defined in (2.17) in

Chapter 2, and ,4,...,1;2,1,1==Σ= lktr

ph l

klk then, under the assumptions (B2) and

(B4),

( ),)ˆ(lim 22

),(xx

hhP

k

kk

np k

Φ=⎥⎥⎦

⎢⎢⎣

⎡≤

−∞→ η

where ( )xΦ denotes the cumulative distribution function of a standard normal

random variable and .1

2)1(

4 22

42

⎟⎟⎠

⎞⎜⎜⎝

−+

−=

k

kk

kk n

hh

pnη

Using Lemma 3.2.1, the consistent estimator of b can be estimated by

.)(

11

)(1

1

)1)(2()1()1)(2()1(

ˆˆ

ˆ2

22

22

21

1

21

112

2

222

1

22

21

⎭⎬⎫

⎩⎨⎧

−−

⎭⎬⎫

⎩⎨⎧

−−

+−−+−−

==SS

SS

trn

tr

trn

tr

nnnnnn

hh

b

The following lemma gives the asymptotic distribution of the consistent

estimators 21h and 22h .

Lemma 3.2.3 Let ),1,(~)1( −Σ− kkpkk nWn S ,2,1,ˆ2 =kh k as defined in (2.17), and

,4,...,1;2,1,1==Σ= lktr

ph l

klk then, under the assumptions (B2) and (B4),

.

12

)1(40

01

2)1(

4

ˆ

2

222

422

1

221

411

22

212

22

21

⎟⎟⎟⎟⎟

⎜⎜⎜⎜⎜

⎟⎟⎟⎟⎟

⎜⎜⎜⎜⎜

⎟⎟⎠

⎞⎜⎜⎝

⎛−

+−

⎟⎟⎠

⎞⎜⎜⎝

⎛−

+−

⎟⎟⎠

⎞⎜⎜⎝

⎛⎯→⎯

⎟⎟⎠

⎞⎜⎜⎝

nhh

pn

nhh

pnhh

Nh

h D

Proof Since random samples 2,1,,...,1, == knj kjkX are drawn from two

independent populations and the sample covariance matrices ,2,1, =kkS are

calculated from corresponding independent random samples 1jX and ,2jX then 1S and

2S must be independent of each other. In fact, the statistic 21h is a function of 1S alone,

Page 38: TESTS FOR COVARIANCE MATRICES WITH HIGH ...libdcms.nida.ac.th/thesis6/2012/b177570.pdfABSTRACT Title of Dissertation Tests for Covariance Matrices with High-dimensional Data Author

29

whereas the statistic 22h is also a function of 2S alone. Thus 21h and 22h are also

independent which results in .0)ˆ,ˆ( 2221 =hhCOV

By Lemma 3.2.2, $\hat{h}_{2k},\ k = 1, 2,$ are asymptotically normally distributed with mean $h_{2k}$ and variance $\eta_k^2$, and since the covariance between $\hat{h}_{21}$ and $\hat{h}_{22}$ is zero, the joint asymptotic distribution of the estimators $\hat{h}_{21}$ and $\hat{h}_{22}$ is bivariate normal with the mean vector and covariance matrix given above. The proof is completed.

From Lemma 3.2.3, the statistic $\hat{b}$ is a ratio of two uncorrelated estimators, $\hat{h}_{21}$ and $\hat{h}_{22}$. The delta method, as given in Lemma 3.1.1, ensures that a smooth function of these two random variables is asymptotically normally distributed. The following theorem gives the asymptotic normality of the statistic $\hat{b}$.

Theorem 3.2.1 Let $b$ and $\hat{b}$ be as defined above; then, under the assumptions (B1)-(B4),

$$\hat{b} \xrightarrow{D} N(b, \delta^2),$$

where

$$\delta^2 = \frac{1}{h_{22}^2}\left(\frac{4h_{41}}{p\,n_1} + \frac{2h_{21}^2}{(n_1 - 1)^2}\right) + \frac{h_{21}^2}{h_{22}^4}\left(\frac{4h_{42}}{p\,n_2} + \frac{2h_{22}^2}{(n_2 - 1)^2}\right).$$

Proof. Let $g(h_{21}, h_{22}) = \dfrac{h_{21}}{h_{22}}$; then $\hat{b} = g(\hat{h}_{21}, \hat{h}_{22}) = \dfrac{\hat{h}_{21}}{\hat{h}_{22}}$.

The first partial derivatives of $g(h_{21}, h_{22})$ with respect to $h_{21}$ and $h_{22}$ are, respectively,

$$\frac{\partial g(h_{21}, h_{22})}{\partial h_{21}} = \frac{1}{h_{22}} \qquad \text{and} \qquad \frac{\partial g(h_{21}, h_{22})}{\partial h_{22}} = -\frac{h_{21}}{h_{22}^2}.$$

Thus, by applying the delta method, $\hat{b} \xrightarrow{D} N(b, \delta^2)$ with


$$\delta^2 = \left(\frac{1}{h_{22}},\ -\frac{h_{21}}{h_{22}^2}\right)\begin{pmatrix}\eta_1^2 & 0\\ 0 & \eta_2^2\end{pmatrix}\begin{pmatrix}\dfrac{1}{h_{22}}\\[6pt] -\dfrac{h_{21}}{h_{22}^2}\end{pmatrix} = \frac{\eta_1^2}{h_{22}^2} + \frac{h_{21}^2\,\eta_2^2}{h_{22}^4} = \frac{1}{h_{22}^2}\left(\frac{4h_{41}}{p\,n_1} + \frac{2h_{21}^2}{(n_1 - 1)^2}\right) + \frac{h_{21}^2}{h_{22}^4}\left(\frac{4h_{42}}{p\,n_2} + \frac{2h_{22}^2}{(n_2 - 1)^2}\right),$$

where $\eta_1^2$ and $\eta_2^2$ are the diagonal elements of the covariance matrix in Lemma 3.2.3.

The proof is completed.

Corollary 3.2.1 Let $\hat{b}$ be as defined above. Under $H_0': \Sigma_1 = \Sigma_2 = \Sigma$ and the assumptions (B1)-(B4), it follows that

$$T^* = \frac{\hat{b} - 1}{\left\{\dfrac{4h_4}{p\,h_2^2}\displaystyle\sum_{k=1}^{2}\frac{1}{n_k} + 2\sum_{k=1}^{2}\frac{1}{(n_k - 1)^2}\right\}^{1/2}} \xrightarrow{D} N(0, 1).$$

Proof. Under $H_0'$, $h_{21} = h_{22} = h_2$ and $h_{41} = h_{42} = h_4$. Thus, from Theorem 3.2.1,

$$\delta_{H_0'}^2 = \frac{4h_4}{p\,h_2^2}\sum_{k=1}^{2}\frac{1}{n_k} + 2\sum_{k=1}^{2}\frac{1}{(n_k - 1)^2},$$

and the proof is completed.

In order to use $T^*$ in practice, it is necessary to estimate $\delta^2$, which involves estimates of $h_2$ and $h_4$. The following lemma states a consistent estimator of $h_4$ taken from Fisher et al. (2010); it is also presented without proof.

Lemma 3.2.4 Let $(n - 1)S \sim W_p(\Sigma, n - 1)$ and $h_k = \frac{1}{p}\mathrm{tr}\,\Sigma^k,\ k = 1, \ldots, 16$; then, under the assumptions (B1) and (B3), an unbiased and consistent estimator of $h_4$ as $(p, n) \to \infty$ is given by $\hat{h}_4^*$, which was defined in (2.12) in Chapter 2,

$$\hat{h}_4^* = \frac{\tau}{p}\left[\mathrm{tr}\,S^4 + b^*\,\mathrm{tr}\,S^3\,\mathrm{tr}\,S + c^*(\mathrm{tr}\,S^2)^2 + d^*\,\mathrm{tr}\,S^2(\mathrm{tr}\,S)^2 + e^*(\mathrm{tr}\,S)^4\right],$$

where


$$b^* = -\frac{4}{n - 1}, \qquad c^* = -\frac{2(n - 1)^2 + 3(n - 3)}{(n - 1)(n^2 - n + 2)}, \qquad d^* = \frac{2(5n + 1)}{(n - 1)(n^2 - n + 2)},$$

$$e^* = -\frac{5n + 1}{(n - 1)^2(n^2 - n + 2)},$$

and

$$\tau = \frac{(n - 1)^5(n^2 - n + 2)}{n(n + 1)(n + 3)(n + 5)(n - 2)(n - 3)(n - 4)}.$$
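The estimator of Lemma 3.2.4 is straightforward to evaluate once the traces of powers of $S$ are available. The sketch below (an illustrative Python function, not taken from the thesis, and using the constants in the forms reconstructed above) computes $\hat{h}_4^*$ for a sample covariance matrix $S$ with $n - 1$ degrees of freedom.

```python
import numpy as np

def h4_hat_star(S, n):
    """Estimate h_4 = tr(Sigma^4)/p from a p x p sample covariance matrix S
    with n - 1 degrees of freedom, using the constants b*, c*, d*, e*, tau
    of Lemma 3.2.4 (adapted from Fisher et al., 2010)."""
    p = S.shape[0]
    S2 = S @ S
    t1, t2 = np.trace(S), np.trace(S2)
    t3, t4 = np.trace(S2 @ S), np.trace(S2 @ S2)
    q = (n - 1) * (n ** 2 - n + 2)          # common denominator of c* and d*
    b_star = -4.0 / (n - 1)
    c_star = -(2 * (n - 1) ** 2 + 3 * (n - 3)) / q
    d_star = 2 * (5 * n + 1) / q
    e_star = -(5 * n + 1) / ((n - 1) * q)
    tau = (n - 1) ** 5 * (n ** 2 - n + 2) / (
        n * (n + 1) * (n + 3) * (n + 5) * (n - 2) * (n - 3) * (n - 4))
    return (tau / p) * (t4 + b_star * t3 * t1 + c_star * t2 ** 2
                        + d_star * t2 * t1 ** 2 + e_star * t1 ** 4)
```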

Using Lemmas 3.2.1 and 3.2.4, consistent estimators of $h_2$ and $h_4$ are given by $\hat{h}_2$ and $\hat{h}_4^*$, respectively. Substituting $\hat{h}_2$ and $\hat{h}_4^*$ gives the corresponding consistent estimator of $\delta^2$, namely $\hat{\delta}^2$, as

$$\hat{\delta}^2 = \frac{4\,\hat{h}_4^*}{p\,\hat{h}_2^2}\sum_{k=1}^{2}\frac{1}{n_k} + 2\sum_{k=1}^{2}\frac{1}{(n_k - 1)^2}.$$

Thus a test of $H_0'$ can be based on the statistic

$$T_2 = \frac{\hat{b} - 1}{\hat{\delta}}. \qquad (3.11)$$

In addition, its asymptotic null distribution is standard normal. The proposed test statistic $T_2$ with an $\alpha$ level of significance rejects $H_0'$ if $|T_2| > z_{\alpha/2}$, where $z_{\alpha/2}$ denotes the upper $\alpha/2$ quantile of the standard normal distribution.
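Assembling the pieces of this section, a minimal sketch of the $T_2$ procedure might look as follows. It is written in Python for illustration only, it assumes the helper functions h2_hat and h4_hat_star sketched earlier, and it uses the pooled covariance matrix and the variance estimator $\hat{\delta}^2$ in the forms reconstructed above.

```python
import numpy as np
from scipy.stats import norm

def T2_test(X1, X2, alpha=0.05):
    """Test H0': Sigma1 = Sigma2 using T2 = (b_hat - 1) / delta_hat.
    X1 and X2 are n_k x p data matrices from two independent p-variate
    normal populations with p >= n_k."""
    n1, p = X1.shape
    n2 = X2.shape[0]
    S1 = np.cov(X1, rowvar=False)                  # n1 - 1 degrees of freedom
    S2 = np.cov(X2, rowvar=False)                  # n2 - 1 degrees of freedom
    n = n1 + n2 - 1
    S = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n - 1)  # pooled covariance matrix
    b_hat = h2_hat(S1, n1) / h2_hat(S2, n2)
    h2_pool, h4_pool = h2_hat(S, n), h4_hat_star(S, n)
    delta2 = (4 * h4_pool / (p * h2_pool ** 2)) * (1 / n1 + 1 / n2) \
             + 2 * (1 / (n1 - 1) ** 2 + 1 / (n2 - 1) ** 2)
    T2 = (b_hat - 1) / np.sqrt(delta2)
    reject = abs(T2) > norm.ppf(1 - alpha / 2)     # two-sided level-alpha test
    return T2, reject
```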


CHAPTER 4

SIMULATION STUDY

This section explains the Monte Carlo study, carried out in the Fortran programming language, used to investigate the performance of the two proposed tests $T_1$ and $T_2$, defined in (3.10) and (3.11), respectively. Multivariate normal vectors were generated using the International Mathematics and Statistics Library (IMSL) multivariate normal random number generator (RNMVN) subroutine. The performance of the tests was assessed in two aspects: (1) the empirical Type I error rate (under the null hypothesis) and (2) the empirical power (under the alternative hypothesis). Both the empirical Type I error rates and the empirical powers of the two proposed tests $T_1$ and $T_2$, as well as those of competing tests, were computed under a variety of covariance matrix structures. The types of covariance matrix structure used in this study are given in Table 4.1.

Table 4.1 Covariance Matrix Structure Definition

1. Unstructured (UN): $\Sigma = (\sigma_{ij})_{p \times p}$

2. Compound Symmetry (CS): $\Sigma = \sigma^2 I_p + k\,\mathbf{1}_p\mathbf{1}_p'$, where $\sigma^2 > 0$, $k$ is an appropriate constant, $I_p$ denotes the $p \times p$ identity matrix, and $\mathbf{1}_p$ denotes the $p \times 1$ vector of ones

3. Heterogeneous Compound Symmetry (CSH): $\Sigma = (\sigma_{ij})_{p \times p}$, where $\sigma_{ij} = \sigma_i^2 > 0$ for $i = j$ and $\sigma_{ij} = \sigma_i\sigma_j\rho$ for $i \neq j$, with $\rho$ the correlation parameter satisfying $|\rho| < 1$

4. Simple (SIM): $\Sigma = \sigma^2 I_p$

5. Toeplitz (TOEP): $\Sigma = (\sigma_{ij})_{p \times p} = (\sigma_{|i-j|})_{p \times p}$

6. Variance Component (VC): $\Sigma = \mathrm{diag}(\sigma_{11}, \sigma_{22}, \ldots, \sigma_{pp})$, where $\sigma_{ii} > 0,\ i = 1, \ldots, p$
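For reference, the structures in Table 4.1 can be generated directly; the sketch below builds CS, CSH, TOEP, and VC matrices with NumPy. It is an illustrative aid, not the thesis's Fortran code, and the default parameter values are arbitrary examples.

```python
import numpy as np

def cs(p, sigma2=1.0, k=0.5):
    """Compound symmetry: sigma^2 * I_p + k * 1_p 1_p'."""
    return sigma2 * np.eye(p) + k * np.ones((p, p))

def csh(p, variances, rho):
    """Heterogeneous compound symmetry with given diagonal variances."""
    s = np.sqrt(np.asarray(variances, dtype=float))
    M = rho * np.outer(s, s)
    np.fill_diagonal(M, s ** 2)
    return M

def toeplitz_cov(p, sigma=(1.0, -0.5)):
    """Toeplitz: element (i, j) equals sigma[|i - j|], zero beyond the band."""
    idx = np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    M = np.zeros((p, p))
    mask = idx < len(sigma)
    M[mask] = np.asarray(sigma)[idx[mask]]
    return M

def vc(p, low=1.0, high=2.0, rng=None):
    """Variance component: diagonal matrix with U(low, high) variances."""
    rng = np.random.default_rng() if rng is None else rng
    return np.diag(rng.uniform(low, high, size=p))
```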

4.1 Simulation Study for Testing the Hypothesis for a Partially Known

Matrix for One High-dimensional Data

4.1.1 Simulation Setting

This section deals with a simulation to assess the asymptotic normality of the proposed test statistic $T_1$. Under the null hypothesis $H_0: \Sigma = \sigma^2\Sigma_0$, setting $\sigma^2 = 1$, multivariate normal vectors were generated with 10,000 independent iterations for each $(p, n)$ combination, where $p \in \{10, 20, 40, 80, 160, 320\}$ and $n \in \{10, 20, 40, 80, 160\}$. The test statistic $T_1$ was calculated, and the empirical Type I error rate of the proposed test was obtained by recording the proportion of rejections. The nominal significance level $\alpha$ of interest was fixed at 0.05. The empirical Type I error rates under the null hypothesis and the empirical powers under the corresponding alternative hypothesis were computed for four different hypotheses with different covariance matrix structures:


1) Unstructured Structure (UN)

$$H_0^1: \Sigma = U_0 = (\sigma_{ij})_{p \times p}, \qquad \sigma_{ij} = \begin{cases}1 & \text{for } i = j,\\[2pt] \dfrac{(-1)^{i+j}}{2} & \text{for } i < j,\end{cases}$$

$$H_1^1: \Sigma = U_1 = (\sigma_{ij})_{p \times p}, \qquad \sigma_{ij} = \begin{cases}1 & \text{for } i = j,\\[2pt] \dfrac{(-1)^{i+j}}{4} & \text{for } i < j.\end{cases}$$

2) Compound Symmetry Structure (CS)

$$H_0^2: \Sigma = C_0 = 0.5\,I_p + (0.5)\mathbf{1}_p\mathbf{1}_p'$$

$$H_1^2: \Sigma = C_1 = 0.9\,I_p + (0.1)\mathbf{1}_p\mathbf{1}_p'$$

3) Heterogeneous Compound Symmetry Structure (CSH)

$$H_0^3: \Sigma = M_0, \qquad H_1^3: \Sigma = M_1,$$

where $M_0$ is a CSH matrix with $\sigma_{ii} = \sigma_i^2 \sim U(2, 3)$ and $\sigma_{ij} = \sigma_i\sigma_j\rho,\ i \neq j$, with $\rho = 0.5$, and $M_1$ is a CSH matrix with $\sigma_{ii} = \sigma_i^2 \sim U(3, 4)$ and $\sigma_{ij} = \sigma_i\sigma_j\rho,\ i \neq j$, with $\rho = 0.5$.

4) Toeplitz Structure (TOEP)

$$H_0^4: \Sigma = T_0, \qquad H_1^4: \Sigma = T_1,$$

where $T_0$ is a Toeplitz matrix with elements $\sigma_0 = 1$ and $\sigma_1 = -0.50$ and the rest of the elements equal to zero, and $T_1$ is a Toeplitz matrix with elements $\sigma_0 = 1$ and $\sigma_1 = -0.45$ and the rest of the elements equal to zero.

The results of the empirical Type I error rates are shown in Table 4.2 and

the empirical powers are tabulated in Table 4.3.
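The Monte Carlo procedure described above can be summarized by the following sketch, which estimates an empirical rejection rate as the proportion of rejections over independent replications. The generic test argument stands for any of the statistics considered in this chapter, and the function is an illustrative assumption rather than the Fortran/IMSL program actually used; supplying a null covariance matrix yields the empirical Type I error rate, and supplying an alternative one yields the empirical power.

```python
import numpy as np

def empirical_rejection_rate(test, Sigma, n, reps=10_000, seed=0):
    """Proportion of rejections of `test` over `reps` samples of size n drawn
    from N_p(0, Sigma). `test(X)` must return (statistic, reject_flag) for an
    n x p sample X."""
    rng = np.random.default_rng(seed)
    p = Sigma.shape[0]
    L = np.linalg.cholesky(Sigma)          # rows of X @ L.T are N_p(0, Sigma)
    rejections = 0
    for _ in range(reps):
        X = rng.standard_normal((n, p)) @ L.T
        rejections += bool(test(X)[1])
    return rejections / reps
```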

To compare the performance of the proposed test statistic $T_1$ with the test statistics of Ledoit and Wolf (2002), denoted $U_J$ as in (2.6) in Chapter 2, and Srivastava (2005), denoted $T_{S1}$ as in (2.8) in Chapter 2, attention was restricted to the null hypothesis with a simple structure (SIM), i.e. $H_0^*: \Sigma = \sigma^2 I$ (sphericity). The following two hypotheses were considered under simple and variance component structures:

1) Simple (SIM) and Variance Component Structure (VC) (as in Ledoit and Wolf, 2002)

$$H_0^{*1}: \Sigma = I_p \ \text{(SIM)}, \qquad H_1^{*1}: \Sigma = F \ \text{(VC)},$$

where $F$ is a VC matrix in which half of the diagonal elements are equal to 1 and the other half are equal to 0.05.

2) Variance Component Structure (VC)

$$H_0^{*2}: \Sigma = 2I_p \ \text{(SIM)}, \qquad H_1^{*2}: \Sigma = 2D \ \text{(VC)},$$

where $D = \mathrm{diag}(d_1, \ldots, d_p)$ and $d_i \sim \mathrm{Unif}(0, 1),\ i = 1, 2, \ldots, p.$

4.1.2 Simulation Results

At this point, according to the four null hypotheses described above, the empirical Type I error rates are exhibited in Table 4.2. It can be observed that for all four null hypotheses (under the UN, CS, CSH, and TOEP structures) the proposed test statistic $T_1$ yielded satisfactory empirical Type I error rates with the same pattern. This means that the convergence to asymptotic normality of the proposed test statistic $T_1$ is not greatly affected by a change in the covariance matrix structure of the null hypothesis. As expected, the empirical Type I error rates of the proposed test statistic $T_1$ were reasonably close to the nominal 0.05 significance level and improved as $p$ and $n$ increased.

Table 4.3 shows that the empirical powers of the proposed test statistic 1T

performed under the four covariance matrix structures (UN, CS, CSH, and TOEP)

rapidly converged to one and remained high as p and n increased. It can be seen in


this table that the speed of convergence to normality of the four sets of empirical

powers of the proposed test statistic 1T differed depending on the alternative

covariance matrix setting.

The empirical Type I error rates and empirical powers performed to compare

the proposed test statistic 1T to the two test statistics JU and 1ST as described above

are displayed in Tables 4.4 and 4.5.

Table 4.4 reports the empirical Type I error rates of the proposed test statistic $T_1$ under the null hypothesis with the covariance matrix equal to the identity matrix, i.e. $\Sigma = I$, the same setting as in Ledoit and Wolf (2002). It can be observed from this table that

the empirical Type I error rates of the proposed test statistic 1T are very similar to

those provided in Table 4.2 and it performed well under the four covariance matrix

structures (UN, CS, CSH, and TOEP). This suggests that the proposed test statistic 1T

is also very appropriate for the SIM structure. This table shows that the empirical

Type I error rates of the three test statistics $U_J$, $T_{S1}$, and $T_1$ generally tended to the nominal 0.05 significance level as $p$ and $n$ increased, and tended to 0.05 as $p$ increased for any fixed $n$. Furthermore, the two test statistics $T_{S1}$ and $T_1$ yielded empirical Type I error rates close to 0.05 in the following situations: (1) when $p$ was large, here $p \geq 160$, for any $n$, and (2) when both $p$ and $n$ were at least medium, here $p \geq 40$ and $n \geq 40$, where $p \geq n$. The test statistic $U_J$ gave empirical Type I error rates close to 0.05 in the following situations: (1) when $p$ was very large, here $p \geq 320$, for any $n$, and (2) when both $p$ and $n$ were large, here $p \geq 160$ and $n \geq 160$, where $p \geq n$. This indicates that the proposed test statistic $T_1$ is more useful than the test statistic $U_J$, since $T_1$ can be applied over a wider range of $p$ and $n$.

Turning to the empirical powers of the competing tests $U_J$, $T_{S1}$, and $T_1$ under the alternative hypothesis, where the covariance matrix was set to be an $F$ matrix with half of its diagonal elements equal to 1 and the other half equal to 0.05, Table 4.4

shows that, as expected, the empirical powers of the test statistics converged to one as


p and n increased. Moreover, the empirical powers of the proposed test statistic 1T

were much higher than those from the two tests JU and 1ST when the sample size was

medium, with n around 40.

Table 4.5, which reports empirical Type I error rates when $\Sigma = 2I$ (SIM), shows that all of the competing test statistics $U_J$, $T_{S1}$, and $T_1$ yielded empirical Type I error rates with the same values as those under $\Sigma = I$ provided in Table 4.4, so they need not be discussed again. From this result, it is clear that the unknown scalar $\sigma^2$ did not affect the convergence to asymptotic normality of the test statistics $U_J$, $T_{S1}$, and $T_1$, as should be the case.

As displayed in Table 4.5, the empirical powers of the three test statistics

1, SJ TU and 1T tended towards one as p and n increased. The empirical power of the

proposed test statistic $T_1$ was substantially higher than those of $U_J$ and $T_{S1}$, especially when $n$ was small (20 here) for all values of $p$, where $p \geq n$. The magnitude of this

empirical power difference seemed to decrease when n increased.

Consequently, it can be concluded that the performance of proposed test

statistic 1T was generally outstanding when compared to the test statistics JU and 1ST ,

especially when the sample size was small or medium.


Table 4.2 Empirical Type I Error Rates of the Test Statistic $T_1$ under the Four Null Hypotheses, applied at $\alpha = 0.05$

Empirical Type I Error Rate of $T_1$:
p | n | $H_0^1: \Sigma = U_0$ | $H_0^2: \Sigma = C_0$ | $H_0^3: \Sigma = M_0$ | $H_0^4: \Sigma = T_0$

10 10 0.059 0.058 0.058 0.059 20 10 0.055 0.054 0.054 0.055 20 0.062 0.063 0.063 0.061

40 10 0.055 0.055 0.056 0.055 20 0.056 0.056 0.055 0.055 40 0.055 0.055 0.055 0.056

80 10 0.057 0.056 0.056 0.055 20 0.057 0.057 0.057 0.057 40 0.052 0.052 0.051 0.051 80 0.052 0.052 0.052 0.051

160 10 0.054 0.053 0.054 0.053 20 0.053 0.054 0.054 0.053 40 0.055 0.055 0.055 0.055 80 0.056 0.056 0.055 0.056 160 0.053 0.053 0.053 0.054

320 10 0.052 0.052 0.052 0.052 20 0.051 0.051 0.050 0.050 40 0.052 0.051 0.052 0.051 80 0.050 0.051 0.050 0.050 160 0.051 0.050 0.050 0.050 320 0.053 0.051 0.053 0.053


Table 4.3 Empirical Powers of the Test Statistic $T_1$ under the Four Alternative Hypotheses, applied at $\alpha = 0.05$

Empirical Power of $T_1$:
p | n | $H_1^1: \Sigma = U_1$ | $H_1^2: \Sigma = C_1$ | $H_1^3: \Sigma = M_1$ | $H_1^4: \Sigma = T_1$

10 10 0.174 0.560 0.269 0.4802 20 10 0.224 0.597 0.286 0.9328 20 0.362 0.905 0.447 0.999

40 10 0.265 0.617 0.288 1.000 20 0.443 0.918 0.460 1.000 40 0.772 1.000 0.776 1.000

80 10 0.300 0.624 0.292 1.000 20 0.498 0.918 0.461 1.000 40 0.837 1.000 0.776 1.000 80 0.998 1.000 0.991 1.000

160 10 0.319 0.625 0.289 1.000 20 0.537 0.925 0.467 1.000 40 0.866 1.000 0.778 1.000 80 0.999 1.000 0.993 1.000 160 1.000 1.000 1.000 1.000

320 10 0.342 0.629 0.293 1.000 20 0.554 0.925 0.459 1.000 40 0.891 1.000 0.779 1.000 80 1.000 1.000 0.993 1.000 160 1.000 1.000 1.000 1.000 320 1.000 1.000 1.000 1.000


Table 4.4 Empirical Type I Error Rates (under $H_0^{*1}: \Sigma = I$) and Empirical Powers (under $H_1^{*1}: \Sigma = F$) of $U_J$, $T_{S1}$, and $T_1$, applied at $\alpha = 0.05$

p | n | Empirical Type I Error Rate: $U_J$, $T_{S1}$, $T_1$ | Empirical Power: $U_J$, $T_{S1}$, $T_1$

10 10 0.049 0.048 0.059 0.121 0.118 0.059 20 10 0.050 0.047 0.054 0.130 0.125 0.049 20 0.059 0.057 0.063 0.270 0.265 0.233

40 10 0.054 0.051 0.055 0.136 0.132 0.050 20 0.053 0.056 0.055 0.283 0.288 0.229 40 0.055 0.053 0.056 0.658 0.654 0.891

80 10 0.057 0.053 0.057 0.146 0.141 0.047 20 0.058 0056 0.057 0.291 0.284 0.226 40 0.052 0.050 0.052 0.676 0.671 0.896 80 0.051 0.050 0.052 0.991 0.991 1.000

160 10 0.056 0.054 0.053 0.144 0.139 0.045 20 0.055 0.053 0.053 0.294 0.287 0.228 40 0.056 0.055 0.055 0.673 0.668 0.896 80 0.057 0.055 0.055 0.994 0.994 1.000 160 0.052 0.052 0.053 1.000 1.000 1.000

320 10 0.055 0.052 0.052 0.143 0.138 0.045 20 0.052 0.050 0.051 0.292 0.286 0.226 40 0.054 0.052 0.052 0.684 0.677 0.897 80 0.050 0.050 0.050 1.000 1.000 1.000 160 0.050 0.050 0.051 1.000 1.000 1.000 320 0.053 0.053 0.053 1.000 1.000 1.000


Table 4.5 Empirical Type I Error Rates (under $H_0^{*2}: \Sigma = 2I$) and Empirical Powers (under $H_1^{*2}: \Sigma = 2D$) of $U_J$, $T_{S1}$, and $T_1$, applied at $\alpha = 0.05$

p | n | Empirical Type I Error Rate: $U_J$, $T_{S1}$, $T_1$ | Empirical Power: $U_J$, $T_{S1}$, $T_1$

10 10 0.049 0.048 0.059 0.412 0.405 0.453 20 10 0.050 0.047 0.054 0.445 0.437 0.498 20 0.059 0.057 0.063 0.922 0.918 1.000

40 10 0.054 0.051 0.055 0.368 0.360 0.256 20 0.053 0.056 0.055 0.821 0.816 1.000 40 0.055 0.053 0.056 0.999 0.999 1.000

80 10 0.057 0.053 0.057 0.356 0.348 0.202 20 0.058 0056 0.057 0.789 0.783 0.999 40 0.052 0.050 0.052 0.999 0.999 1.000 80 0.051 0.050 0.052 1.000 1.000 1.000

160 10 0.056 0.054 0.053 0.354 0.346 0.189 20 0.055 0.053 0.053 0.781 0.774 1.000 40 0.056 0.055 0.055 0.999 0.999 1.000 80 0.057 0.055 0.055 1.000 1.000 1.000 160 0.052 0.052 0.053 1.000 1.000 1.000

320 10 0.055 0.052 0.052 0.352 0.343 0.189 20 0.052 0.050 0.051 0.789 0.782 1.000 40 0.054 0.052 0.052 0.999 0.999 1.000 80 0.050 0.050 0.050 1.000 1.000 1.000 160 0.050 0.050 0.051 1.000 1.000 1.000 320 0.053 0.053 0.053 1.000 1.000 1.000


4.2 Simulation Study for Testing the Equality of Two Covariance

Matrices for Two High-dimensional Data

4.2.1 Simulation Setting

In this section, the performance of the proposed test statistic $T_2$ was assessed using a numerical simulation for testing the hypothesis $H_0': \Sigma_1 = \Sigma_2 = \Sigma$ and comparing it to the competing tests $T_J$, $T_{S2}$, and $T_{SY}$. Under the null hypothesis, controlling $p \geq n_k,\ k = 1, 2$, two independent $p$-variate normal samples were simulated with 10,000 independent iterations for each $(p, n_1, n_2)$ combination, where $p \in \{20, 40, 80, 160, 240\}$, $n_1 \in \{20, 40, 80, 160, 240\}$, and $n_2 \in \{20, 40, 80, 160, 240\}$. The nominal significance level was fixed at $\alpha = 0.05$. The four test statistics $T_J$, $T_{S2}$, $T_{SY}$, and $T_2$ were computed and the proportions of rejections of these statistics were recorded. The empirical Type I error rates of the four test statistics were obtained under seven null hypotheses set up with different covariance structures, as shown below. The corresponding empirical powers (under the corresponding alternative hypothesis) were computed for each test whose Type I error rate controlled the nominal significance level, as follows:

1) Unstructured Structure (UN). Two hypotheses in which the common covariance matrix structure is UN were considered:

1.1 $H_0'^1: \Sigma_1 = \Sigma_2 = U_0$

$H_1'^1: \Sigma_1 = U_0$ and $\Sigma_2 = U_1$,

where $U_0$ and $U_1$ in UN are defined as

$$U_0 = (\sigma_{ij})_{p \times p}, \qquad \sigma_{ij} = \begin{cases}1 & \text{for } i = j,\\[2pt] (-1)^{i+j}(0.10)^{\,j-i} & \text{for } i < j,\end{cases}$$

and

$$U_1 = (\sigma_{ij})_{p \times p}, \qquad \sigma_{ij} = \begin{cases}1 & \text{for } i = j,\\[2pt] (-1)^{i+j}(0.05)^{\,j-i} & \text{for } i < j.\end{cases}$$

1.2 $H_0'^2: \Sigma_1 = \Sigma_2 = D\Delta_1 D$

$H_1'^2: \Sigma_1 = D\Delta_1 D$ and $\Sigma_2 = D\Delta_2 D$,

where $D = \mathrm{diag}(d_1, \ldots, d_p)$ with $d_1, \ldots, d_p \sim U(1, 5)$, and $\Delta_j\ (j = 1, 2)$ is a $p \times p$ matrix whose $(a, b)$th element is defined by $(-1)^{a+b}\{0.2 \times (j + 2)\}^{|a-b|^{1/10}}$. Note that this hypothesis follows the idea of Srivastava and Yanagihara (2010) and was previously defined in Chapter 2.

2) Compound Symmetry Structure (CS)

$$H_0'^3: \Sigma_1 = \Sigma_2 = C_0 = 0.99\,I_p + (0.01)\mathbf{1}_p\mathbf{1}_p'$$

$$H_1'^3: \Sigma_1 = C_0 \ \text{and}\ \Sigma_2 = C_1 = 0.95\,I_p + (0.05)\mathbf{1}_p\mathbf{1}_p'.$$

3) Heterogeneous Compound Symmetry Structure (CSH)

$$H_0'^4: \Sigma_1 = \Sigma_2 = M_0, \qquad H_1'^4: \Sigma_1 = M_0 \ \text{and}\ \Sigma_2 = M_1,$$

where $M_0$ is a CSH matrix with $\sigma_{ii} = \sigma_i^2 \sim U(5, 6)$ and $\sigma_{ij} = \sigma_i\sigma_j\rho,\ i \neq j$, with $\rho = 0.5$, and $M_1$ is a CSH matrix with $\sigma_{ii} = \sigma_i^2 \sim U(4, 5)$ and $\sigma_{ij} = \sigma_i\sigma_j\rho,\ i \neq j$, with $\rho = 0.4$.

4) Simple Structure (SIM) and Variance Component Structure (VC) (as in Schott, 2007)

$$H_0'^5: \Sigma_1 = \Sigma_2 = I_p \ \text{(SIM)}$$

$$H_1'^5: \Sigma_1 = I_p \ \text{(SIM)} \ \text{and}\ \Sigma_2 = V_2 \ \text{(VC)},$$

where $V_2 = \mathrm{diag}(1, 1, 1, 2, 1, 1, 1, 2, \ldots, 1, 1, 1, 2)$.

5) Simple Structure (SIM)

$$H_0'^6: \Sigma_1 = \Sigma_2 = 2I_p, \qquad H_1'^6: \Sigma_1 = 2I_p \ \text{and}\ \Sigma_2 = 1.5\,I_p.$$

6) Toeplitz Structure (TOEP)

$$H_0'^7: \Sigma_1 = \Sigma_2 = T_0, \qquad H_1'^7: \Sigma_1 = T_0 \ \text{and}\ \Sigma_2 = T_1,$$

where $T_0$ is a Toeplitz matrix with elements $\sigma_0 = 1$ and $\sigma_1 = -0.50$ and the rest of the elements equal to zero, and $T_1$ is a Toeplitz matrix with elements $\sigma_0 = 1$ and $\sigma_1 = -0.30$ and the rest of the elements equal to zero.

7) Variance Component Structure (VC)

$$H_0'^8: \Sigma_1 = \Sigma_2 = V_0, \qquad H_1'^8: \Sigma_1 = V_0 \ \text{and}\ \Sigma_2 = V_1,$$

where $V_0 = \mathrm{diag}(\sigma_{11}, \sigma_{22}, \ldots, \sigma_{pp})$ with $\sigma_{ii} \sim U(1, 2),\ i = 1, \ldots, p$, and $V_1 = \mathrm{diag}(\sigma_{11}, \sigma_{22}, \ldots, \sigma_{pp})$ with $\sigma_{ii} \sim U(1.5, 2.5),\ i = 1, \ldots, p$.

4.2.2 Simulation Results

4.2.2.1 Empirical Type I Error Rates

In this section, the results are described in the order of the simulation

settings described above.

For the unstructured structure (UN), the empirical Type I error rates of

all four tests obtained by setting the common covariance matrix in two different

matrices under UN are exhibited in Tables 4.6 and 4.7.

Table 4.6 reports that the empirical Type I error rates of the test statistic

JT were much larger than the nominal 0.05 significance level. It is obvious that these

rates could not reach 0.05 for all of the p and sample size cases and they moved

further away from 0.05 when the sample size increased for any $p$. For instance, when $p = 80$, the empirical Type I error rate of $T_J$ is 0.057 (at $n_1 = n_2 = 20$) and increases to 0.061 (at $n_1 = n_2 = 80$). The empirical Type I error rates of the two test

These empirical Type I error rates are conservative (the values of Type I errors are

much smaller than the nominal significance level). The empirical Type I error rates of

the proposed test statistic 2T reasonably tended to 0.05 as p and the sample size

increased. As seen from Table 4.6, it can be concluded that the three test statistics

$T_J$, $T_{S2}$, and $T_{SY}$ did not perform satisfactorily under this common covariance matrix setting, while the proposed test $T_2$ was appropriate if the sample size was large, say $n_1 \geq 160$, $n_2 \geq 160$, and $p \geq n_k,\ k = 1, 2$.


Table 4.7 presents the empirical Type I error rates of the competitive

test statistics obtained by tracking the idea of Srivastava and Yanagihara (2010) as

described in the simulation settings. The empirical Type I error rates of the test

statistics JT and 2T have a similar pattern to those provided in Table 4.6, so they will

not be explained again. It can be observed from Table 4.7 that the empirical Type I

error rates of the test statistics 2ST and SYT from this table are significantly different to

those from Table 4.6. Their empirical Type I error rates converged to 0.05 when p and

the sample size increased. This indicates that the consistency towards asymptotic

normality of the two test statistics 2ST and SYT was greatly affected by a change of the

covariance matrix appearing in the null hypothesis. These two test statistics performed

well under covariance matrices set as in the idea given in Srivastava and Yanagihara

(2010). However, the test statistics 2ST and SYT did not perform well for small sample

sizes and were suitable when the sample sizes were at least medium (at least 40 here)

and $p \geq n_k,\ k = 1, 2$. Furthermore, it can be observed that the convergence to an asymptotic normal distribution of the test statistic $T_{S2}$ was faster than that of $T_{SY}$.

This phenomenon agrees with the statement given in Srivastava and Yanagihara

(2010).

Under the compound symmetry structure (CS), the empirical Type I

error rates are shown in Table 4.8. Both test statistics JT and 2T gave satisfactory

empirical Type I error rates which were quite controlled at 0.05 for all cases of p and

sample size considered. The empirical Type I error rates of the test statistics 2ST and

SYT were not close to 0.05. As seen from the table, empirical Type I error rates of the

two test statistics 2ST and SYT were much larger than 0.05 when p and the sample size

were small. After that, their empirical Type I error rates decreased further away from

0.05 as $p$ and the sample size increased. For example, at $p = n_1 = n_2 = 40$, the empirical Type I error rates of the test statistics $T_{S2}$ and $T_{SY}$ were 0.053 and 0.077, respectively, and they decreased to 0.037 and 0.024, respectively, when $p = n_1 = n_2 = 240$. Moreover, when $p$ was fixed, the empirical Type I error rates of $T_{S2}$ and $T_{SY}$ dropped as the sample size increased. For example, when $p = 160$, the empirical Type I error rates of $T_{S2}$ and $T_{SY}$ were 0.059 and 0.057 (at $n_1 = n_2 = 20$), respectively, and both rates decreased to 0.043 and 0.031, respectively (at $n_1 = n_2 = 160$). Consequently, the test statistics $T_{S2}$ and $T_{SY}$ are not suitable for this covariance structure, while the test statistic $T_J$ and the proposed test statistic $T_2$ are.

Under the heterogeneous compound symmetry structure (CSH), the

empirical Type I error rates are exhibited in Table 4.9. The empirical Type I error

rates of the three test statistics ,, 2SJ TT and SYT were not close to 0.05 whereas that of

the proposed test statistic 2T approximated well to 0.05 as p and the sample size

increased. It can be observed that the empirical Type I error rates of the test statistics

2ST and SYT from this table are lower than those from Table 4.6 for all cases

considered. This indicates that the convergences of the tests 2ST and SYT to a standard

normal distribution were very slow and was not accomplished when the common

covariance matrix was under the CSH structure. The proposed test statistic 2T gave

empirical Type I error rates tending to 0.05 when p and the sample size increased.

Moreover, the empirical Type I error rates of the proposed test 2T were close to 0.05

when the sample size was at least medium, here $n_1 \geq 40$, $n_2 \geq 40$, and $p \geq n_k,\ k = 1, 2$.

Under the simple structure (SIM) with $\sigma^2 = 1$, i.e. $\Sigma_1 = \Sigma_2 = I$, the

empirical Type I error rates are shown in Table 4.10. The empirical Type I error rates

of the test statistics JT and 2T were very close to 0.05 for all cases considered.

Moreover, the empirical Type I error rates of the proposed test statistic 2T were

slightly better than those of the JT test statistic. For the test statistics 2ST and SYT ,

their empirical Type I error rates converged to 0.05 more slowly than those of the test

statistics JT and 2T . However, the empirical Type I error rates of the test statistics ST

and SYT were extremely out of control at 0.05, particularly when the sample size was

small for any .p

Under the SIM structure with $\sigma^2 = 2$, i.e. $\Sigma_1 = \Sigma_2 = 2I$, as displayed in Table 4.11, the empirical Type I error rates of the two test statistics $T_J$ and $T_2$ are the same as those displayed in Table 4.10 (when $\Sigma_1 = \Sigma_2 = I$), while the test statistics $T_{S2}$ and $T_{SY}$ became too conservative. This indicates that the convergence to asymptotic normality of the two test statistics $T_J$ and $T_2$ was not affected by a change of the unknown scalar $\sigma^2$ defined in SIM, from $\sigma^2 = 1$ to $\sigma^2 = 2$, while the consistency of the two test statistics $T_{S2}$ and $T_{SY}$ was very sensitive to this scalar.

Under the Toeplitz structure (TOEP), and the variance component

structure (VC), the empirical Type I error rates are shown in Tables 4.12 and 4.13,

respectively. Since these two tables give results in a similar pattern, it can be

simultaneously summarized that the empirical Type I errors of the test statistics

2ST and SYT were approximately zero while those of the two test statistics JT and

2T approximated the nominal 0.05 significance level as p and the sample size

increased. The test statistic JT yielded empirical Type I error rates which were slightly

better than those of the proposed test statistic 2T under TOEP, but those of 2T were

slightly better than those of JT under the VC structure. Moreover, Tables 4.12 and

4.13 present that the empirical Type I error rates of the proposed test statistic 2T were

close to 0.05 when the sample size was at least medium, i.e. $n_1 \geq 40$ and $n_2 \geq 40$ here, and $p \geq n_k,\ k = 1, 2$.

For the case when the two sample sizes were not equal ($n_1 \neq n_2$), the empirical Type I error rates of the tests $T_J$, $T_{S2}$, $T_{SY}$, and $T_2$ were also obtained for all seven null and alternative hypotheses. The choice $n_2 = 2n_1$ was used in combination with $(p, n_1, n_2)$, where $p \in \{20, 40, 80, 160, 240\}$, $n_1 \in \{20, 40, 80, 160, 240\}$, and $n_2 \in \{40, 80, 160, 240\}$. The results for the Type I error rates are displayed at the bottom of Tables 4.6-4.13. All tables report that the empirical Type I error rates of the four test statistics $T_J$, $T_{S2}$, $T_{SY}$, and $T_2$ are not substantially different from those obtained when $n_1 = n_2$, for every null hypothesis and covariance matrix structure combination.

As describe above, it can be concluded that the proposed test statistic 2T

performed well with all of the covariance matrix structures considered here, while the

test statistic JT is most appropriate under the CS, SIM, TOEP, and VC structures. The


previous statement that the test statistic JT is most appropriate under the SIM

corresponds to the results from Schott (2007). However, the convergence of the test

statistic JT to a standard normal distribution was slower when the common

covariance matrix was not of the SIM structure. In addition, the two test statistics

2ST and SYT were suitable for a certain covariance matrix under the UN structure and

also performed reasonably well under the SIM structure only in the case of 12 =σ , i.e.

when the common covariance matrix is the identity matrix.

4.2.2.2 Empirical Power

In general, it is fair to compare two tests with respect to their powers

only if they are level alpha tests, i.e. for power comparison studies, only a test which

can control the nominal significance level should be considered.

Recall that the proposed test statistic 2T is appropriate under all of the

covariance matrix structures (UN, CS, CSH, SIM, TOEP and VC), the test statistic

JT is appropriate under the CS, SIM, TOEP and VC structures, and the test statistics

$T_{S2}$ and $T_{SY}$ are appropriate under the UN (for a certain null hypothesis) and SIM (when $\sigma^2 = 1$) structures. Subsequently, the empirical powers under the corresponding

alternative hypothesis were computed as described in the simulation settings and are

presented in Tables 4.6 to 4.13.

For the UN structure, Table 4.6 shows that the empirical powers of the

proposed test statistic 2T converged to 1 as p and the sample size increased. Table 4.7

reports the empirical powers of the proposed test statistic 2T compared to the test

statistics 2ST and SYT under certain alternative hypotheses with covariance matrices

DD 11 Δ=Σ and DD 22 Δ=Σ (from Srivastava and Yanagihara (2010)). As expected,

their empirical powers tended to 1 as p and the sample size increased. It can be seen in

this table that the empirical powers of the proposed test statistic 2T are generally

higher than those of the two tests.

For the CS structure, Table 4.8 reports that the empirical powers of the

proposed test statistic 2T and JT test statistic are quite high and rapidly tended to one.


Moreover, the empirical powers of both tests were quite responsive to an increase in

p and sample size. Furthermore, the empirical powers of the proposed test statistic

2T were higher than those of the test statistic JT for all cases considered.

For the CSH structure, only the proposed test statistic 2T approximated

0.05 well. As displayed in Table 4.9, the empirical powers of the proposed test

2T rapidly converged to one when p and the sample size increased.

For the SIM structure, Table 4.10 shows the empirical powers of all four

test statistics $T_J$, $T_{S2}$, $T_{SY}$, and $T_2$ under the alternative hypothesis using $\Sigma_1 = I$ and $\Sigma_2 = \mathrm{diag}(1, 1, 1, 2, \ldots, 1, 1, 1, 2)$. As expected, the empirical powers of these test statistics quickly tended to one as $p$ and the sample size increased. The proposed test statistic $T_2$ generally gave a higher power than the competing tests $T_J$, $T_{S2}$, and $T_{SY}$.

Moreover, the empirical powers of the proposed test statistic 2T were substantially

higher than the competitive tests ,, 2SJ TT and SYT when the sample size was small.

Table 4.11 reports the empirical powers of the test statistics $T_2$ and $T_J$ under the alternative hypothesis with $\Sigma_1 = 2I$ and $\Sigma_2 = 1.5I$ (the SIM structure). The empirical powers of these two test statistics converged to one as $p$ and the sample size increased. The convergence to one of the empirical powers of the proposed test statistic $T_2$ was much faster than that of the test statistic $T_J$, especially when the sample size was small, $n_1 = n_2 \leq 40$ here, for all $p$. For example, when $p = n_1 = n_2 = 20$, the empirical powers of $T_2$ and $T_J$ are 0.757 and 0.117, respectively. Consequently, under the SIM structure, $T_2$ is a reasonable test and is more powerful than the $T_J$ test, particularly in cases with a small sample size.

For the TOEP structure, Table 4.12 shows that the proposed test statistic

2T gave empirical powers approaching one with a much faster rate than the test

statistic JT as p and the sample size increased.

For the VC structure, the empirical powers of the test statistics 2T and

JT are presented in Table 4.13. The two tests yielded empirical powers that converged


to one as p and the sample size increased. The convergence of the empirical powers to

one of the proposed test statistic 2T was much faster than that of the test statistic .JT

The empirical powers of $T_J$, $T_{S2}$, $T_{SY}$, and $T_2$ when the two sample sizes were not equal, i.e. $n_1 \neq n_2$ and, in particular, $n_2 = 2n_1$, were also computed. The results are presented at the bottom of Tables 4.6-4.13. As expected, all four test statistics $T_J$, $T_{S2}$, $T_{SY}$, and $T_2$ yielded empirical powers higher than those obtained for equal sample sizes for any $p$, because the sample sizes were larger. Moreover, the empirical power of $T_2$ remained substantially higher than those of the test statistics $T_J$, $T_{S2}$, and $T_{SY}$, especially for small sample sizes.


Table 4.6 Empirical Type I Error Rates (under $H_0'^1$) of $T_J$, $T_{S2}$, $T_{SY}$, and $T_2$, and Empirical Powers (under $H_1'^1$) of $T_2$, applied at $\alpha = 0.05$

p | $n_1$ | $n_2$ | Empirical Type I Error Rate: $T_J$, $T_{S2}$, $T_{SY}$, $T_2$ | Empirical Power: $T_2$

21 nn = 20 20 20 0.061 0.052 0.048 0.062 0.078 40 20 20 0.055 0.027 0.019 0.058 0.103 40 40 0.090 0.027 0.012 0.053 0.128

80 20 20 0.057 0.006 0.003 0.062 0.167 40 40 0.061 0.013 0.005 0.054 0.241 80 80 0.061 0.025 0.015 0.053 0.428

160 20 20 0.064 0.001 0.000 0.071 0.288 40 40 0.064 0.008 0.003 0.057 0.446 80 80 0.066 0.019 0.013 0.056 0.719 160 160 0.066 0.031 0.024 0.053 0.944

240 20 20 0.063 0.001 0.000 0.082 0.371 40 40 0.066 0.001 0.004 0.067 0.573 80 80 0.068 0.019 0.015 0.056 0.824 160 160 0.069 0.031 0.026 0.055 0.980 240 240 0.089 0.035 0.033 0.051 1.000 12 2nn =

40 20 40 0.058 0.028 0.015 0.056 0.128 80 20 40 0.056 0.014 0.007 0.059 0.220 40 80 0.062 0.023 0.013 0.052 0.346

160 20 40 0.060 0.010 0.005 0.063 0.373 40 80 0.063 0.018 0.013 0.058 0.584 80 160 0.066 0.028 0.022 0.054 0.839

240 20 40 0.067 0.007 0.004 0.063 0.453 40 80 0.074 0.017 0.012 0.060 0.697 80 160 0.065 0.024 0.018 0.051 0.911


Table 4.7 Empirical Type I Error Rates (under $H_0'^2$) of $T_J$, $T_{S2}$, $T_{SY}$, and $T_2$, and Empirical Powers (under $H_1'^2$) of $T_{S2}$, $T_{SY}$, and $T_2$, applied at $\alpha = 0.05$

p | $n_1$ | $n_2$ | Empirical Type I Error Rate: $T_J$, $T_{S2}$, $T_{SY}$, $T_2$ | Empirical Power: $T_{S2}$, $T_{SY}$, $T_2$

$n_1 = n_2$:

20 20 20 0.092 0.081 0.087 0.061 0.427 0.029 0.435 40 20 20 0.082 0.064 0.077 0.057 0.438 0.017 0.525 40 40 0.095 0.057 0.065 0.055 0.655 0.190 0.635

80 20 20 0.086 0.074 0.075 0.060 0.590 0.008 0.518 40 40 0.093 0.042 0.065 0.055 0.699 0.106 0.729 80 80 0.100 0.056 0.041 0.055 0.917 0.502 0.926

160 20 20 0.093 0.071 0.073 0.060 0.415 0.001 0.506 40 40 0.095 0.063 0.059 0.058 0.746 0.307 0.878 80 80 0.097 0.048 0.045 0.054 0.897 0.589 0.901 160 160 0.103 0.051 0.054 0.052 1.000 0.999 1.000

240 20 20 0.095 0.064 0.069 0.069 0.491 0.001 0.515 40 40 0.094 0.047 0.040 0.056 0.804 0.032 0.882 80 80 0.104 0.054 0.046 0.047 0.993 0.708 0.999 160 160 0.105 0.053 0.054 0.053 1.000 1.000 1.000 240 240 0.108 0.050 0.051 0.051 1.000 1.000 1.000 12 2nn =

40 20 40 0.081 0.058 0.069 0.056 0.503 0.018 0.677 80 20 40 0.083 0.071 0.074 0.056 0.600 0.010 0.711 40 80 0.096 0.047 0.060 0.054 0.854 0.299 0.864

160 20 40 0.089 0.058 0.061 0.056 0.509 0.406 0.690 40 80 0.090 0.055 0.057 0.054 0.852 0.731 0.953 80 160 0.101 0.047 0.053 0.047 1.000 0.904 1.000

240 20 40 0.096 0.058 0.061 0.055 0.603 0.028 0.645 40 80 0.096 0.046 0.044 0.046 0.927 0.695 0.985 80 160 0.106 0.051 0.050 0.050 1.000 1.000 1.000


Table 4.8 Empirical Type I Error Rates (under $H_0'^3$) of $T_J$, $T_{S2}$, $T_{SY}$, and $T_2$, and Empirical Powers (under $H_1'^3$) of $T_J$ and $T_2$, applied at $\alpha = 0.05$

p | $n_1$ | $n_2$ | Empirical Type I Error Rate: $T_J$, $T_{S2}$, $T_{SY}$, $T_2$ | Empirical Power: $T_J$, $T_2$

$n_1 = n_2$:

20 20 20 0.057 0.081 0.084 0.056 0.079 0.080 40 20 20 0.055 0.085 0.081 0.052 0.099 0.114 40 40 0.052 0.053 0.077 0.050 0.169 0.145

80 20 20 0.047 0.082 0.081 0.056 0.165 0.210 40 40 0.051 0.052 0.059 0.052 0.324 0.333 80 80 0.055 0.049 0.045 0.053 0.654 0.585

160 20 20 0.048 0.059 0.057 0.052 0.287 0.387 40 40 0.053 0.041 0.040 0.052 0.589 0.670 80 80 0.050 0.040 0.035 0.051 0.918 0.932 160 160 0.051 0.043 0.031 0.052 0.998 0.998

240 20 20 0.048 0.046 0.045 0.050 0.399 0.534 40 40 0.051 0.030 0.027 0.048 0.739 0.836 80 80 0.048 0.030 0.019 0.051 0.975 0.988 160 160 0.052 0.036 0.022 0.052 1.000 1.000 240 240 0.052 0.037 0.024 0.050 1.000 1.000 12 2nn =

40 20 40 0.054 0.064 0.076 0.054 0.135 0.142 80 20 40 0.051 0.059 0.071 0.054 0.228 0.276 40 80 0.053 0.047 0.045 0.049 0.448 0.469

160 20 40 0.047 0.042 0.044 0.046 0.397 0.508 40 80 0.053 0.039 0.033 0.051 0.725 0.809 80 160 0.056 0.044 0.034 0.052 0.967 0.977

240 20 40 0.050 0.036 0.033 0.049 0.528 0.669 40 80 0.054 0.032 0.025 0.052 0.859 0.925 80 160 0.054 0.033 0.025 0.047 0.993 0.998


Table 4.9 Empirical Type I Error Rates (under $H_0'^4$) of $T_J$, $T_{S2}$, $T_{SY}$, and $T_2$, and Empirical Powers (under $H_1'^4$) of $T_2$, applied at $\alpha = 0.05$

p | $n_1$ | $n_2$ | Empirical Type I Error Rate: $T_J$, $T_{S2}$, $T_{SY}$, $T_2$ | Empirical Power: $T_2$

$n_1 = n_2$:

20 20 20 0.101 0.033 0.007 0.058 0.0408 40 20 20 0.098 0.032 0.007 0.058 0.551 40 40 0.098 0.039 0.022 0.053 0.851

80 20 20 0.098 0.031 0.007 0.060 0.579 40 40 0.101 0.040 0.023 0.052 0.862 80 80 0.103 0.043 0.031 0.049 0.989

160 20 20 0.100 0.031 0.008 0.057 0.528 40 40 0.101 0.038 0.021 0.052 0.755 80 80 0.099 0.041 0.031 0.050 0.946 160 160 0.103 0.050 0.044 0.051 0.998

240 20 20 0.098 0.029 0.007 0.065 0.471 40 40 0.099 0.038 0.021 0.058 0.668 80 80 0.099 0.041 0.032 0.052 0.861 160 160 0.100 0.044 0.038 0.052 0.987 240 240 0.108 0.050 0.044 0.052 0.999 12 2nn =

40 20 40 0.095 0.039 0.012 0.057 0.645 80 20 40 0.098 0.035 0.011 0.053 0.664 40 80 0.097 0.040 0.027 0.050 0.938

160 20 40 0.097 0.033 0.011 0.052 0.578 40 80 0.098 0.038 0.025 0.048 0.840 80 160 0.100 0.043 0.038 0.051 0.983

240 20 40 0.096 0.035 0.012 0.059 0.497 40 80 0.098 0.038 0.024 0.051 0.727 80 160 0.099 0.042 0.035 0.049 0.934


Table 4.10 Empirical Type I Error Rates (under $H_0'^5$) of $T_J$, $T_{S2}$, $T_{SY}$, and $T_2$, and Empirical Powers (under $H_1'^5$) of $T_J$, $T_{S2}$, $T_{SY}$, and $T_2$, applied at $\alpha = 0.05$

p | $n_1$ | $n_2$ | Empirical Type I Error Rate: $T_J$, $T_{S2}$, $T_{SY}$, $T_2$ | Empirical Power: $T_J$, $T_{S2}$, $T_{SY}$, $T_2$

$n_1 = n_2$:

20 20 20 0.057 0.082 0.084 0.058 0.243 0.422 0.015 0.23040 20 20 0.054 0.090 0.087 0.050 0.241 0.533 0.009 0.502 40 40 0.054 0.054 0.083 0.052 0.564 0.976 0.051 0.978

80 20 20 0.048 0.093 0.090 0.056 0.223 0.581 0.003 0.708 40 40 0.051 0.061 0.073 0.051 0.573 0.997 0.040 0.998 80 80 0.056 0.054 0.057 0.052 0.969 1.000 0.485 1.000

160 20 20 0.049 0.082 0.075 0.052 0.213 0.538 0.001 0.821 40 40 0.053 0.062 0.071 0.050 0.561 0.999 0.029 1.000 80 80 0.048 0.052 0.055 0.051 0.971 1.000 0.469 1.000 160 160 0.051 0.050 0.050 0.050 1.000 1.000 0.999 1.000

240 20 20 0.046 0.070 0.068 0.052 0.200 0.454 0.001 0.853 40 40 0.052 0.062 0.065 0.049 0.552 1.000 0.021 1.000 80 80 0.048 0.051 0.048 0.048 0.973 1.000 0.456 1.000 160 160 0.052 0.050 0.053 0.050 1.000 1.000 1.000 1.000 240 240 0.052 0.048 0.054 0.048 1.000 1.000 1.000 1.000 12 2nn =

40 20 40 0.054 0.064 0.079 0.053 0.265 0.659 0.004 0.74380 20 40 0.053 0.067 0.081 0.053 0.260 0.784 0.004 0.907 40 80 0.053 0.051 0.056 0.049 0.751 1.000 0.079 1.000

160 20 40 0.049 0.060 0.067 0.047 0.244 0.846 0.002 0.972 40 80 0.052 0.052 0.056 0.050 0.746 1.000 0.072 1.000 80 160 0.055 0.050 0.054 0.051 0.999 1.000 0.786 1.000

240 20 40 0.048 0.064 0.068 0.048 0.228 0.845 0.001 0.983 40 80 0.057 0.054 0.051 0.051 0.751 1.000 0.067 1.000 80 160 0.050 0.048 0.051 0.049 0.999 1.000 0.768 1.000


Table 4.11 Empirical Type I Error Rates (under $H_0'^6$) of $T_J$, $T_{S2}$, $T_{SY}$, and $T_2$, and Empirical Powers (under $H_1'^6$) of $T_J$ and $T_2$, applied at $\alpha = 0.05$

p | $n_1$ | $n_2$ | Empirical Type I Error Rate: $T_J$, $T_{S2}$, $T_{SY}$, $T_2$ | Empirical Power: $T_J$, $T_2$

$n_1 = n_2$:

20 20 20 0.057 0.005 0.000 0.058 0.117 0.757 40 20 20 0.054 0.001 0.000 0.051 0.105 0.870 40 40 0.054 0.003 0.000 0.052 0.205 0.998

80 20 20 0.048 0.000 0.000 0.055 0.096 0.936 40 40 0.051 0.001 0.000 0.051 0.198 1.000 80 80 0.056 0.005 0.000 0.052 0.495 1.000

160 20 20 0.049 0.000 0.000 0.052 0.086 0.964 40 40 0.053 0.001 0.000 0.050 0.185 1.000 80 80 0.048 0.002 0.000 0.051 0.931 1.000 160 160 0.051 0.004 0.000 0.050 1.000 1.000

240 20 20 0.046 0.000 0.000 0.052 0.070 0.974 40 40 0.052 0.000 0.000 0.049 0.177 1.000 80 80 0.048 0.001 0.000 0.048 0.464 1.000 160 160 0.052 0.002 0.000 0.050 0.936 1.000 240 240 0.052 0.004 0.000 0.048 0.999 1.000 12 2nn =

40 20 40 0.055 0.003 0.001 0.052 0.189 0.947 80 20 40 0.053 0.002 0.000 0.053 0.186 0.984 40 80 0.053 0.003 0.001 0.049 0.347 1.000

160 20 40 0.049 0.000 0.000 0.052 0.174 0.994 40 80 0.052 0.001 0.000 0.050 0.342 1.000 80 160 0.055 0.003 0.000 0.051 0.708 1.000

240 20 40 0.048 0.001 0.000 0.048 0.160 0.996 40 80 0.057 0.001 0.000 0.051 0.329 1.000 80 160 0.050 0.002 0.000 0.049 0.714 1.000


Table 4.12 Empirical Type I Error Rates (under $H_0'^7$) of $T_J$, $T_{S2}$, $T_{SY}$, and $T_2$, and Empirical Powers (under $H_1'^7$) of $T_J$ and $T_2$, applied at $\alpha = 0.05$

p | $n_1$ | $n_2$ | Empirical Type I Error Rate: $T_J$, $T_{S2}$, $T_{SY}$, $T_2$ | Empirical Power: $T_J$, $T_2$

$n_1 = n_2$:

20 20 20 0.061 0.016 0.001 0.067 0.100 0.215 40 20 20 0.055 0.008 0.001 0.056 0.093 0.262 40 40 0.056 0.017 0.001 0.053 0.152 0.428

80 20 20 0.049 0.004 0.001 0.060 0.089 0.320 40 40 0.055 0.007 0.001 0.052 0.151 0.589 80 80 0.054 0.015 0.001 0.051 0.328 0.907

160 20 20 0.050 0.001 0.000 0.055 0.090 0.373 40 40 0.052 0.004 0.000 0.054 0.152 0.736 80 80 0.053 0.007 0.000 0.053 0.336 0.987 160 160 0.052 0.013 0.000 0.047 0.772 1.000

240 20 20 0.048 0.001 0.001 0.053 0.084 0.398 40 40 0.050 0.002 0.000 0.053 0.147 0.800 80 80 0.049 0.005 0.001 0.053 0.333 0.996 160 160 0.055 0.011 0.000 0.050 0.766 1.000 240 240 0.051 0.015 0.001 0.044 0.973 1.000 12 2nn =

40 20 40 0.056 0.013 0.001 0.052 0.134 0.312 80 20 40 0.050 0.008 0.001 0.054 0.132 0.330 40 80 0.054 0.011 0.001 0.049 0.228 0.718

160 20 40 0.053 0.003 0.001 0.049 0.133 0.474 40 80 0.055 0.006 0.001 0.052 0.224 0.867 80 160 0.052 0.014 0.001 0.052 0.502 0.998

240 20 40 0.051 0.003 0.001 0.047 0.127 0.508 40 80 0.051 0.003 0.001 0.050 0.231 0.919 80 160 0.055 0.011 0.000 0.050 0.506 1.000


Table 4.13 Empirical Type I Error Rates (under $H_0'^8$) of $T_J$, $T_{S2}$, $T_{SY}$, and $T_2$, and Empirical Powers (under $H_1'^8$) of $T_J$ and $T_2$, applied at $\alpha = 0.05$

p | $n_1$ | $n_2$ | Empirical Type I Error Rate: $T_J$, $T_{S2}$, $T_{SY}$, $T_2$ | Empirical Power: $T_J$, $T_2$

$n_1 = n_2$:

20 20 20 0.059 0.008 0.000 0.059 0.118 0.286 40 20 20 0.054 0.002 0.000 0.052 0.108 0.526 40 40 0.054 0.007 0.000 0.052 0.194 0.981

80 20 20 0.048 0.000 0.000 0.056 0.092 0.717 40 40 0.052 0.002 0.000 0.052 0.184 0.999 80 80 0.055 0.008 0.000 0.054 0.454 1.000

160 20 20 0.049 0.001 0.000 0.052 0.081 0.825 40 40 0.053 0.001 0.000 0.052 0.179 1.000 80 80 0.051 0.003 0.000 0.051 0.450 1.000 160 160 0.050 0.006 0.000 0.049 0.912 1.000

240 20 20 0.047 0.000 0.000 0.049 0.074 0.861 40 40 0.051 0.000 0.000 0.049 0.166 1.000 80 80 0.047 0.000 0.000 0.049 0.430 1.000 160 160 0.054 0.004 0.000 0.048 0.914 1.000 240 240 0.052 0.006 0.000 0.047 0.998 1.000 12 2nn =

40 20 40 0.056 0.007 0.001 0.053 0.084 0.782 80 20 40 0.051 0.002 0.000 0.052 0.073 0.922 40 80 0.055 0.005 0.000 0.052 0.198 1.000

160 20 40 0.051 0.001 0.001 0.045 0.067 0.974 40 80 0.054 0.003 0.000 0.050 0.188 1.000 80 160 0.055 0.005 0.000 0.050 0.594 1.000

240 20 40 0.048 0.002 0.000 0.048 0.060 0.983 40 80 0.055 0.001 0.000 0.051 0.184 1.000 80 160 0.049 0.003 0.000 0.049 0.587 1.000


4.3 Application

Here, the proposed test statistics $T_1$ and $T_2$ were applied to a microarray

dataset collected by Notterman et al. (2001), which is available at http://genomics-

pubs.princeton.edu/oncology/Data/CarcinomaNormal datasetCancerResearch.xls (last

access: October 9, 2012). Two groups of cancerous colon tissues (adenocarcinoma

and adenoma) and their paired normal colon tissues were examined using

oligonucleotide arrays. The expression levels of about 6500 human genes were probed

in 18 colon adenocarcinomas, 4 colon adenomas, and 22 normal colon tissues.

4.3.1 One High-dimensional Sample

From the dataset, 18 colon adenocarcinomas were probed with oligonucleotide

arrays and the expression levels of 6500 human genes were measured on each. For

convenience, attention was restricted to the first 256 measurements for each of the 18 colon adenocarcinomas, so $n = 18$ and $p = 256$. The covariance matrix was examined for sphericity. The data gave observed test statistic values of $U_J = 284.567$, $T_{S1} = 270.582$, and $T_1 = 8.500$, each with p-value $\approx 0$, so the hypothesis of sphericity is rejected at any reasonable significance level.

4.3.2 Two High-dimensional Samples

Two groups of cancerous colon tissues (adenocarcinoma and adenoma) were

examined with oligonucleotide arrays. Attention was restricted to a subset of 100 gene expression levels on colon tissues from 4 adenocarcinomas and 4 adenomas, giving $n_1 = 4$, $n_2 = 4$, and $p = 100$. The covariance matrices of the two groups were examined for equality. The data gave observed test statistic values of $T_J = 0.908$ and $T_2 = -0.636$, whose corresponding p-values were 0.182 and 0.524, respectively, indicating that the hypothesis of equality of the two covariance matrices of

these data was not rejected at any reasonable significance level.


CHAPTER 5

CONCLUSIONS, DISCUSSION AND RECOMMENDATIONS

FOR FUTURE RESEARCH

5.1 Conclusions

In this dissertation, independent $p$-variate normal data are assumed to be high-dimensional, i.e. the number of variables is larger than or equal to the sample size. The two hypotheses of interest are:

1) The hypothesis for testing for a partially known matrix for one high-dimensional data, i.e.

$$H_0: \Sigma = \sigma^2\Sigma_0 \quad\text{against}\quad H_1: \Sigma \neq \sigma^2\Sigma_0. \qquad (1.1)$$

2) The hypothesis for testing the equality of two covariance matrices for two high-dimensional data, i.e.

$$H_0': \Sigma_1 = \Sigma_2 = \Sigma \quad\text{against}\quad H_1': \Sigma_1 \neq \Sigma_2. \qquad (1.2)$$

For these two hypotheses, the test statistics $T_1$ from (1.1) and $T_2$ from (1.2) were proposed. The first test statistic $T_1$ is given by

$$T_1 = \frac{n}{2}\,\frac{\hat{\psi}}{\hat{\sigma}^4},$$

where $\hat{\psi} = \hat{a}_2 - 2\hat{a}_1\hat{\sigma}^2 + \hat{\sigma}^4$. The second test statistic $T_2$ is formulated as

$$T_2 = \frac{\hat{b} - 1}{\hat{\delta}},$$

where

$$\hat{b} = \frac{\hat{h}_{21}}{\hat{h}_{22}} = \frac{\dfrac{(n_1 - 1)^2}{(n_1 - 2)(n_1 + 1)}\left\{\mathrm{tr}\,S_1^2 - \dfrac{1}{n_1 - 1}(\mathrm{tr}\,S_1)^2\right\}}{\dfrac{(n_2 - 1)^2}{(n_2 - 2)(n_2 + 1)}\left\{\mathrm{tr}\,S_2^2 - \dfrac{1}{n_2 - 1}(\mathrm{tr}\,S_2)^2\right\}}$$


and

$$\hat{\delta}^2 = \frac{4\,\hat{h}_4^*}{p\,\hat{h}_2^2}\sum_{k=1}^{2}\frac{1}{n_k} + 2\sum_{k=1}^{2}\frac{1}{(n_k - 1)^2}.$$

Under the first null hypothesis (1.1), the asymptotic distribution of the first proposed test statistic $T_1$ is standard normal as the number of variables and the sample size go towards infinity. Under the second null hypothesis (1.2), the proposed test statistic $T_2$ is asymptotically distributed as standard normal as the number of variables and the sample size go towards infinity.

The properties of the two proposed test statistics 1T and 2T were evaluated on

two aspects: the Type I error rate and the power of the tests via simulation techniques.

With 10,000 independent simulations, independent p-variate normal data were

generated. Empirical Type I error rates of the two test statistics 1T and 2T were

computed and compared to existing test statistics in the literature under the null

hypothesis with a variety of covariance matrix structures (Unstructured (UN),

Compound Symmetry (CS), Heterogeneous compound symmetry (CSH), Simple

(SIM), Toeplitz (TOEP) and Variance Component (VC)). Empirical powers were also

investigated under the corresponding alternative hypothesis with a variety of

covariance matrix structures.

For testing hypothesis (1.1), the results from the simulation study are as

follows. The first proposed test statistic 1T performed well for all covariance matrix

structures considered (UN, CS, CSH and TOEP) and convergence to asymptotic

normality of the proposed test statistic was not affected by a change of covariance

matrix structure. A comparison of the proposed test statistic 1T with the existing tests

JU and 1ST proposed by Ledoit and Wolf (2002); and Srivastava (2005), respectively,

was also carried out under the SIM structure. As seen in Chapter 4, the proposed test

statistic $T_1$ was generally very comparable to the test statistics $U_J$ and $T_{S1}$ when the sample size was large, and the proposed test was outstanding in comparison with the test statistics $U_J$ and $T_{S1}$ for small or medium sample sizes at any $p$, where $p \geq n$.


For testing hypothesis (1.2), the performance of the second proposed test

statistic 2T compared to competitive test statistics ,, 2SJ TT and SYT was assessed under

six covariance matrix structures (UN, CS, CSH, TOEP, SIM and VC). It was found

that the proposed test statistic 2T is an appropriate test for all of the covariance matrix

structures considered here, and the test statistic JT is appropriate for some of the

structures (CS, SIM, TOEP, and VC), whereas the test statistics $T_{S2}$ and $T_{SY}$ performed reasonably only under the UN (only for a certain covariance matrix) and SIM (only in the case of $\sigma^2 = 1$) structures. In addition, for all covariance matrix structures, the proposed test statistic $T_2$ is outstanding when compared to these competing tests, especially for small or medium sample sizes at any $p$, where $p \geq n_k,\ k = 1, 2$.

Finally, the two proposed test statistics $T_1$ and $T_2$ were shown to be applicable to real data on human gene expression in adenoma, adenocarcinoma, and normal colon tissues.

5.2 Discussion

It can be seen in Chapter 4 that the efficiency of the proposed test statistic 2T

seems to be similar to the test statistic JT for some of the covariance matrix structures.

Perhaps the reason is that these two test statistics are constructed from unbiased and consistent estimators of the parameters $\mathrm{tr}\,\Sigma_i^2,\ i = 1, 2$. It is possible that these estimators converge to the parameters at a similar rate, resulting in

coincidences in their empirical Type I error rates for some situations. However, the

power of the proposed test statistic 2T was generally better than that of the test

statistic JT for all situations investigated here.

As seen in Chapter 4, the test statistic SYT did not perform quite as well for

many of the covariance matrix structures. The reason could be that it may not be

correct to assume that the hypothesis $H_0': \Sigma_1 = \Sigma_2$ against $H_1': \Sigma_1 \neq \Sigma_2$ is equivalent to $H_0': \gamma_1 - \gamma_2 = 0$ against $H_1': \gamma_1 - \gamma_2 \neq 0$, because when $H_1': \Sigma_1 \neq \Sigma_2$ is true, $\gamma_1 - \gamma_2$ can still equal zero. To illustrate this, with $p = 3$ and $\Sigma_1 \neq \Sigma_2$, suppose that

$$\Sigma_1 = 2I_3 = \begin{pmatrix}2 & 0 & 0\\ 0 & 2 & 0\\ 0 & 0 & 2\end{pmatrix} \quad\text{and}\quad \Sigma_2 = 20I_3 = \begin{pmatrix}20 & 0 & 0\\ 0 & 20 & 0\\ 0 & 0 & 20\end{pmatrix};$$

then

$$\gamma_1 = \frac{\mathrm{tr}\,\Sigma_1^2}{(\mathrm{tr}\,\Sigma_1)^2} = \frac{12}{36} = \frac{1}{3} \quad\text{and}\quad \gamma_2 = \frac{\mathrm{tr}\,\Sigma_2^2}{(\mathrm{tr}\,\Sigma_2)^2} = \frac{1200}{3600} = \frac{1}{3},$$

leading to $\gamma_1 - \gamma_2 = 0$.

In another example, let

$$\Sigma_1 = \begin{pmatrix}1 & 0 & 0\\ 0 & 2 & 0\\ 0 & 0 & 3\end{pmatrix} \quad\text{and}\quad \Sigma_2 = \begin{pmatrix}20 & 0 & 0\\ 0 & 30 & 0\\ 0 & 0 & 10\end{pmatrix};$$

then

$$\gamma_1 = \frac{\mathrm{tr}\,\Sigma_1^2}{(\mathrm{tr}\,\Sigma_1)^2} = \frac{14}{36} \quad\text{and}\quad \gamma_2 = \frac{\mathrm{tr}\,\Sigma_2^2}{(\mathrm{tr}\,\Sigma_2)^2} = \frac{1400}{3600} = \frac{14}{36}.$$

It is again clear that $\gamma_1 - \gamma_2 = 0$ even though $\Sigma_1 \neq \Sigma_2$, as previously seen.
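The two counterexamples are easy to verify numerically; the short check below (an illustrative Python sketch, not part of the dissertation) computes $\gamma_i = \mathrm{tr}\,\Sigma_i^2/(\mathrm{tr}\,\Sigma_i)^2$ for both pairs of matrices.

```python
import numpy as np

def gamma(Sigma):
    """gamma = tr(Sigma^2) / (tr Sigma)^2."""
    return np.trace(Sigma @ Sigma) / np.trace(Sigma) ** 2

# First example: Sigma1 = 2 I_3 and Sigma2 = 20 I_3 give gamma = 1/3 for both.
print(gamma(2 * np.eye(3)), gamma(20 * np.eye(3)))
# Second example: diag(1, 2, 3) and diag(20, 30, 10) give gamma = 14/36 for both.
print(gamma(np.diag([1.0, 2.0, 3.0])), gamma(np.diag([20.0, 30.0, 10.0])))
```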

5.3 Recommendations for Future Research

Here, possible extensions of this study are recommended as follows:

1) For both the one-sample and two-sample high-dimensional settings, since the proposed test statistics T1 and T2 are based on the Frobenius norm, proposals based on other norms should be possible.

2) An extension of the proposed test statistic T2 from testing two covariance matrices to testing several high-dimensional covariance matrices could be of interest.

3) In this study, normality of the data is presumed, so developing a test statistic that does not require this assumption is worth considering.

4) In this study, the test statistics concern the parameters of a normal distribution, so a distribution-free statistic could also be interesting to study.


BIBLIOGRAPHY

Anderson, T. W. 1984. An Introduction to Multivariate Statistical Analysis.

2nd ed. New York: John Wiley & Sons.

Bartlett, M. S. 1937. Properties of Sufficiency and Statistical Tests. In Proceedings

of the Royal Society of London. Series A. Sir Mark Welland. London:

Royal Society. Pp. 268-282.

Billingsley, P. 1995. Probability and Measure. 3rd ed. New York: John Wiley &

Sons.

Birke, M. and Dette, H. 2005. A Note on Testing the Covariance Matrix for Large

Dimension. Statistics and Probability Letters. 74 (October): 281-289.

Boonyarit Choopradit and Samruam Chongcharoen. 2011a. A Test for One-sample

Repeated Measures Designs: Effect of High-dimensional Data. Journal of

Applied Sciences. 11 (October): 3285-3292.

Boonyarit Choopradit and Samruam Chongcharoen. 2011b. A Test for Two-sample

Repeated Measures Designs: Effect of High-dimensional Data. Journal of

Mathematics and Statistics. 7 (October): 332-342.

Carter, E. M. and Srivastava, M. S. 1977. Monotonicity of the Power Functions of

Modified Likelihood Ratio Criterion for the Homogeneity of Variances and

of the Sphericity Test. Journal of Multivariate Analysis. 7 (March): 229-

233.

Casella, G. and Berger, R. L. 2002. Statistical Inference. California: Duxbury.

Dudoit, S.; Fridlyand, J. and Speed, T. P. 2002. Comparison of Discrimination

Methods for the Classification of Tumors Using Gene Expression Data.

Journal of the American Statistical Association. 97 (December): 77-87.

Fisher, T.; Sun, X. and Gallagher, C. M. 2010. A New Test for Sphericity of the

Covariance Matrix for High Dimensional Data. Journal of Multivariate

Analysis. 101 (November): 2554-2570.


Fujikoshi, Y.; Ulyanov, V. and Shimizu, R. 2010. Multivariate Statistics : High-

Dimensional and Large-Sample Approximations. New Jersey: John

Wiley & Sons.

Gamage, J. and Mathew, T. 2008. Inference on Mean Sub-vectors of Two

Multivariate Normal Populations with Unequal Covariance Matrices.

Statistics and Probability Letters. 78 (March): 420-425.

Ibrahim, J. G.; Chen, M. and Gray, R. J. 2002. Bayesian Models for Gene Expression

with DNA Microarray Data. Journal of the American Statistical

Association. 97 (December): 88-99.

John, S. 1971. Some Optimal Multivariate Tests. Biometrika. 58 (April): 123-127.

Johnson, D. E. 1998. Applied Multivariate Methods for Data Analysts.

California: Duxbury.

Johnson, R. A. and Wichern, D. W. 2002. Applied Multivariate Statistical Analysis.

5th ed. New Jersey: Prentice Hall.

Ledoit, O. and Wolf, M. 2002. Some Hypothesis Tests for the Covariance Matrix

when the Dimension is Large Compared to the Sample Size. The Annals

of Statistics. 30 (August): 1081-1102.

Lehmann, E. L. 1999. Elements of Large-Sample Theory. New York: Springer.

Lehmann, E. L. and Romano, J. P. 2005. Testing Statistical Hypotheses. 3rd ed.

New York: Springer.

Lin, Z. and Xiang, Y. 2008. A Hypothesis Test for Independence of Sets of Variates

in High Dimensions. Statistics and Probability Letters. 78 (December):

2939-2946.

Notterman, D. A.; Alon, U.; Sierk, A. J. and Levine, A. J. 2001. Transcriptional

Gene Expression Profiles of Colorectal Adenoma, Adenocarcinoma, and

Normal Tissue Examined by Oligonucleotide Arrays. Cancer Research.

61 (April): 3124-3130.

Perlman, M. D. 1980. Unbiasedness of the Likelihood Ratio Tests for Equality of

Several Covariance Matrices and Equality of Several Multivariate Normal

Populations. The Annals of Statistics. 8 (March): 247-263.

Rao, C. R. 1973. Linear Statistical Inference and Its Applications. 2nd ed.

New York: John Wiley & Sons.


Rencher, A. R. 2003. Linear Models in Statistics. New York: John Wiley & Sons.

Rohatgi, V. K. 1984. Statistical Inference. New York: John Wiley & Sons.

Samruam Chongcharoen. 2011. Inversion of Covariance Matrix for High

Dimensional Data. Journal of Mathematics and Statistics. 7 (July):

227-229.

Schott, J. R. 2007. A Test for the Equality of Covariance Matrices when the

Dimension is Large Relative to the Sample Sizes. Computational

Statistics and Data Analysis. 51 (August): 6535-6542.

Serdobolskii, V. I. 1999. Theory of Essentially Multivariate Statistical Analysis.

Uspekhi Matematicheskikh Nauk. 54 (2): 85-112.

Srivastava, M. S. 2002. Methods of Multivariate Statistics. New York: John

Wiley & Sons.

Srivastava, M. S. 2005. Some Tests Concerning the Covariance Matrix in High

Dimensional Data. Journal of Japan Statistical Society. 35 (2): 251-272.

Srivastava, M. S. 2006. Some Tests Criteria for the Covariance Matrix with Fewer

Observations than the Dimension. Acta Et Commentationes

Universitatis Tartuensis De Mathematica. 10 (1): 77-93

Srivastava, M. S. 2007a. Multivariate Theory for Analyzing High-dimensional Data.

Journal of Japan Statistical Society. 37 (1): 53-86.

Srivastava, M. S. 2007b. Testing the Equality of Two Covariance Matrices and

Independence of Two Sub-vectors with Fewer Observations than the

Dimension. In International Conference on Advances in

Interdisciplinary Statistics and Combinatorics. Oct: 12-14. Sat Gupta.

North Carolina: Taylor & Francis.

Srivastava, M. S. 2010. Methods of Multivariate Statistics. New York: John Wiley

& Sons.

Srivastava, M. S. and Yanagihara, H. 2010. Testing the Equality of Several

Covariance Matrices with Fewer Observations than the Dimension.

Journal of Multivariate Analysis. 101 (July): 1319-1329.


APPENDICES


APPENDIX A

Expected Values and Variances of the Estimators

For a symmetric positive definite matrix Σ, the spectral decomposition gives Σ = ΓΛΓ′, where Λ = diag(λ1, λ2, ..., λp) with λi the i-th eigenvalue of Σ, and Γ is an orthogonal matrix whose normalized columns γ1, γ2, ..., γp are the corresponding eigenvectors. Similarly, Σ0 can be written as Σ0 = RDR′, where D = diag(d1, d2, ..., dp) with di the i-th eigenvalue of Σ0, and R is an orthogonal matrix whose normalized columns r1, r2, ..., rp are the corresponding eigenvectors (Rencher, 2003: 46).

A.1 Expressions of the Estimators a_m = (1/p) tr[(Σ0^{-1}Σ)^m]

We can write the expressions a_m = (1/p) tr[(Σ0^{-1}Σ)^m], m = 1, ..., 8, in terms of the eigenvalues as follows:

a_m = (1/p) tr[(Σ0^{-1}Σ)^m]
    = (1/p) tr[(RD^{-1}R′ ΓΛΓ′)^m]
    = (1/p) tr[(D^{-1}R′ΓΛΓ′R)^m]
    = (1/p) tr[(D^{-1}Λ)^m]
    = (1/p) ∑_{i=1}^p (λi/di)^m.


A.2 Expressions of the Estimators â1 and â2

Let (n−1)S = YY′ ~ W_p(Σ, n−1), where Y = (y1, y2, ..., y_{n−1}) and each y_j ~ N_p(0, Σ) independently (Anderson, 1984: Section 3.3). In addition, let Z = (Z1, Z2, ..., Z_{n−1}), where the Z_j are independently and identically distributed (iid) N_p(0, I); then we can write Y = Σ^{1/2}Z, where Σ^{1/2}Σ^{1/2} = Σ. Define W′ = Γ′Z = (w1, w2, ..., wp)′, where each w_i is iid N_{n−1}(0, I). Thus υ_ii = w_i′w_i, defined in this way, are iid chi-squared random variables with n−1 degrees of freedom.

We can write (1/p) tr(Σ0^{-1}S) and (1/p) tr[(Σ0^{-1}S)²] in terms of chi-squared random variables as follows:

â1 = (1/p) tr(Σ0^{-1}S)
   = [1/(p(n−1))] tr(RD^{-1}R′ YY′)
   = [1/(p(n−1))] tr(D^{-1}Λ^{1/2}Γ′ZZ′ΓΛ^{1/2})
   = [1/(p(n−1))] tr(D^{-1}Λ W′W)
   = [1/(p(n−1))] ∑_{i=1}^p (λi/di) w_i′w_i
   = [1/(p(n−1))] ∑_{i=1}^p (λi/di) υ_ii.

Similarly,

(1/p) tr[(Σ0^{-1}S)²] = [1/(p(n−1)²)] tr[(D^{-1}Λ W′W)²]
   = [1/(p(n−1)²)] [ ∑_{i=1}^p (λi²/di²) υ_ii² + 2 ∑_{i<j} (λiλj/(di dj)) υ_ij² ],

where υ_ij = w_i′w_j.

Let

â2 = [(n−1)²/((n−2)(n+1))] (1/p) [ tr{(Σ0^{-1}S)²} − (1/(n−1)) {tr(Σ0^{-1}S)}² ].

Since (n−1)²/((n−2)(n+1)) ≈ 1,

â2 ≈ (1/p) [ tr{(Σ0^{-1}S)²} − (1/(n−1)) {tr(Σ0^{-1}S)}² ]
   = [1/(p(n−1)²)] [ ∑_{i=1}^p (λi²/di²) υ_ii² + 2 ∑_{i<j} (λiλj/(di dj)) υ_ij²
       − (1/(n−1)) { ∑_{i=1}^p (λi²/di²) υ_ii² + 2 ∑_{i<j} (λiλj/(di dj)) υ_ii υ_jj } ]
   = b1 + b2,

where

b1 = [(n−2)/(p(n−1)³)] ∑_{i=1}^p (λi²/di²) υ_ii²,   and
b2 = [2/(p(n−1)²)] ∑_{i<j} (λiλj/(di dj)) { υ_ij² − (1/(n−1)) υ_ii υ_jj }.

A.3 Expected Values of the Estimators

The following lemma taken from Srivastava (2005) and Fisher et al. (2010)

gives important results for proving various lemmas in this work.


Lemma A.1. For υ_ii = w_i′w_i and υ_ij = w_i′w_j, for any i ≠ j,

E(υ_ii^r) = (n−1)(n+1)⋯(n+2r−3), r = 1, 2, ...,     Var(υ_ii) = 2(n−1),

Var(υ_ii²) = 8(n−1)(n+1)(n+2),     E[{υ_ii − (n−1)}³] = 8(n−1),

E[{υ_ii − (n−1)}⁴] = 12(n−1)(n+3),

[ ],)()1(272)1)(1(3)]1)(1([ 3442 nOnnnnnE ii +−+−=+−−υ

E(υ_ij²) = n−1,     E(υ_ij⁴) = 3(n−1)(n+1),

E(υ_ii υ_ij²) = (n−1)(n+1),     E(υ_ii² υ_ij²) = (n−1)(n+1)(n+3),

E(υ_ij² υ_ii υ_jj) = (n−1)(n+1)².

Proof. The first 6 results can be found in Srivastava (2005) and the last 5 results can

be found in Fisher et al. (2010).
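As a quick numerical check on the first two results of Lemma A.1 — E(υ_ii) = n−1 and Var(υ_ii) = 2(n−1) — the following minimal sketch in standard Fortran (no IMSL calls; the program and variable names are purely illustrative) simulates υ_ii = w_i′w_i with w_i ~ N_{n−1}(0, I) and compares the empirical moments with these values:

program check_lemma_a1
  implicit none
  integer, parameter :: n = 20, nrep = 200000
  integer :: r, j
  real :: u1, u2, z, v, s1, s2, amean, avar
  real, parameter :: twopi = 6.2831853
  s1 = 0.0
  s2 = 0.0
  do r = 1, nrep
     v = 0.0
     do j = 1, n - 1
        call random_number(u1)
        call random_number(u2)
        z = sqrt(-2.0*log(1.0 - u1))*cos(twopi*u2)   ! Box-Muller N(0,1) variate
        v = v + z*z                                  ! accumulate w_i'w_i
     end do
     s1 = s1 + v
     s2 = s2 + v*v
  end do
  amean = s1/real(nrep)
  avar  = s2/real(nrep) - amean*amean
  print *, 'E(v)   simulated:', amean, '  target n-1    =', real(n - 1)
  print *, 'Var(v) simulated:', avar,  '  target 2(n-1) =', real(2*(n - 1))
end program check_lemma_a1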

A.3.1 Expected Values of â1 and â2

Lemma A.2. For â1 and â2 as defined above,

E(â1) = a1   and   E(â2) = a2.     (A.1)

Proof. Using the results from Lemma A.1,

E(â1) = E[ (1/p) tr(Σ0^{-1}S) ]
      = [1/(p(n−1))] ∑_{i=1}^p (λi/di) E(υ_ii)
      = [1/(p(n−1))] ∑_{i=1}^p (λi/di) (n−1)
      = (1/p) ∑_{i=1}^p (λi/di)
      = (1/p) tr(Σ0^{-1}Σ)
      = a1.

Since E{ υ_ij² − (1/(n−1)) υ_ii υ_jj } = 0, it follows that E(b2) = 0, and then


E(â2) = [(n−1)²/((n−2)(n+1))] (1/p) E[ tr{(Σ0^{-1}S)²} − (1/(n−1)) {tr(Σ0^{-1}S)}² ]
      = [(n−1)²/((n−2)(n+1))] (1/p) [ (n−2)(n+1)/(n−1)² ] ∑_{i=1}^p (λi/di)²
      = (1/p) ∑_{i=1}^p (λi/di)²
      = (1/p) tr[(Σ0^{-1}Σ)²]
      = a2,

using E(υ_ii²) = (n−1)(n+1), E(υ_ij²) = n−1, and E(υ_ii υ_jj) = (n−1)².

A.4 Variances of the Estimators

A.4.1 Variance of â1

Lemma A.3. Var(â1) = 2a2 / (p(n−1)).

Proof. By applying Lemma A.1,

Var(â1) = [1/(p(n−1))]² ∑_{i=1}^p (λi/di)² Var(υ_ii)
        = [1/(p²(n−1)²)] · 2(n−1) ∑_{i=1}^p (λi/di)²
        = 2a2 / (p(n−1)).


A.4.2 Variance of â2

Lemma A.4. Var(b1) = 8(n−2)(n+1)(n+2) a4 / (p(n−1)⁵) ≈ 8a4/(np).

Proof. By applying Lemma A.1 (in particular Var(υ_ii²) = 8(n−1)(n+1)(n+2)), the variance of b1 can be found directly from its definition:

Var(b1) = [(n−2)/(p(n−1)³)]² ∑_{i=1}^p (λi/di)⁴ Var(υ_ii²)
        ≈ (8/(np)) · (1/p) ∑_{i=1}^p (λi/di)⁴
        = 8a4/(np).

Lemma A.5. Var(b2) = [4(n−2)(n+1)/(n−1)⁴] (a2² − a4/p) ≈ (4/n²)(a2² − a4/p).

Proof. By applying Lemma A.1, we compute

Var(υ_ij²) = E(υ_ij⁴) − {E(υ_ij²)}² = 3(n−1)(n+1) − (n−1)²,

Var(υ_ii υ_jj) = E(υ_ii² υ_jj²) − {E(υ_ii υ_jj)}² = (n−1)²(n+1)² − (n−1)⁴ = 4n(n−1)²,

COV(υ_ij², υ_ii υ_jj) = E(υ_ij² υ_ii υ_jj) − E(υ_ij²) E(υ_ii υ_jj) = (n−1)(n+1)² − (n−1)³ = 4n(n−1),

so that Var{ υ_ij² − (1/(n−1)) υ_ii υ_jj } = 2(n−2)(n+1), while terms corresponding to different pairs (i, j) are uncorrelated. Collecting the pairs i < j,

Var(b2) = [2/(p(n−1)²)]² · 2(n−2)(n+1) ∑_{i<j} (λiλj/(di dj))²
        = [4(n−2)(n+1)/(n−1)⁴] (a2² − a4/p)
        ≈ (4/n²)(a2² − a4/p),

as (n, p) → ∞.


Lemma A.6. COV(b1, b2) = 0.

Proof. Since E(b2) = 0,

COV(b1, b2) = E(b1 b2)
            = [2(n−2)/(p²(n−1)⁵)] ∑_{i=1}^p ∑_{j<k} (λi²/di²)(λjλk/(dj dk)) E[ υ_ii² { υ_jk² − (1/(n−1)) υ_jj υ_kk } ]
            = 0.

Note that, using the results in Lemma A.1,

E[ υ_ii² { υ_jk² − (1/(n−1)) υ_jj υ_kk } ] = 0     for i ≠ j ≠ k,
E[ υ_ii² { υ_ik² − (1/(n−1)) υ_ii υ_kk } ] = 0     for i = j ≠ k,
E[ υ_ii² { υ_ji² − (1/(n−1)) υ_jj υ_ii } ] = 0     for j ≠ k = i.

Lemma A.7. Let

â2 ≈ (1/p) [ tr{(Σ0^{-1}S)²} − (1/(n−1)) {tr(Σ0^{-1}S)}² ] = b1 + b2;

then Var(â2) ≈ 8a4/(np) + 4a2²/n² as (n, p) → ∞.


Proof. From Lemmas A.4, A.5, and A.6,

Var(â2) = Var(b1) + Var(b2) + 2 COV(b1, b2)
        ≈ 8a4/(np) + (4/n²)(a2² − a4/p)
        ≈ 8a4/(np) + 4a2²/n²,

as (n, p) → ∞.

A.5 Covariance Terms

Lemma A.8. COV(â1, b1) = 4(n−2)(n+1) a3 / (p(n−1)³) ≈ 4a3/(np) as (n, p) → ∞.

Proof. Using the results from Lemma A.1, and in particular COV(υ_ii, υ_ii²) = E(υ_ii³) − E(υ_ii) E(υ_ii²) = 4(n−1)(n+1),

COV(â1, b1) = E(â1 b1) − E(â1) E(b1)
            = [(n−2)/(p²(n−1)⁴)] ∑_{i=1}^p (λi/di)³ COV(υ_ii, υ_ii²)
            = 4(n−2)(n+1) a3 / (p(n−1)³)
            ≈ 4a3/(np).

Lemma A.9. COV(â1, b2) = 0.

Proof. From the fact that E(b2) = 0,

COV(â1, b2) = E(â1 b2)
            = [2/(p²(n−1)³)] ∑_{i=1}^p ∑_{j<k} (λi/di)(λjλk/(dj dk)) E[ υ_ii { υ_jk² − (1/(n−1)) υ_jj υ_kk } ]
            = 0.

This follows from the note in Lemma A.6.

Lemma A.10. COV(â1, â2) ≈ 4a3/(np).

Proof. Write

E(â1 â2) ≈ E[ { [1/(p(n−1))] ∑_{i=1}^p (λi/di) υ_ii } (b1 + b2) ].

The expectation of the term involving b2 equals zero, so, by applying Lemma A.1,

E(â1 â2) ≈ a1 a2 + (4/(np)) a3.

Using the results from Lemma A.2,

COV(â1, â2) = E(â1 â2) − E(â1) E(â2) ≈ a1 a2 + (4/(np)) a3 − a1 a2 = 4a3/(np).


A.6 Asymptotic Variance and Covariance Terms

The variance and covariance terms are simplified by finding their asymptotic

values under assumptions (A1) and (A2), as (n, p) → ∞.

Var(â1) = 2a2/(p(n−1)) ≈ 2a2/(np),     (A.2)

Var(b1) = 8(n−2)(n+1)(n+2) a4 / (p(n−1)⁵) ≈ 8a4/(np),     (A.3)

Var(b2) = [4(n−2)(n+1)/(n−1)⁴] (a2² − a4/p) ≈ (4/n²)(a2² − a4/p),     (A.4)

Var(â2) ≈ 8a4/(np) + 4a2²/n²,     (A.5)

COV(â1, b1) = 4(n−2)(n+1) a3/(p(n−1)³) ≈ 4a3/(np),     (A.6)

COV(â1, â2) ≈ 4a3/(np).     (A.7)
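Anticipating Appendix B, the leading terms in (A.2), (A.5), and (A.7) assemble into the asymptotic covariance matrix of the pair (â1, â2)′ (a restatement of the joint limit reached at the end of the next appendix):

Cov(â1, â2)′ ≈ [ 2a2/(np)       4a3/(np)
                 4a3/(np)       8a4/(np) + 4a2²/n² ].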


APPENDIX B

Proof of Theorem 3.1.2

To find the distributions of â1 and â2, the Lyapunov-type central limit theorem from Rao (1973: 147) is used, as given in the following theorem.

Theorem B.1. (Central Limit Theorem from Rao (1973: 147))

Let X1, X2, ..., Xn be a sequence of independent p-dimensional random variables such that E(Xi) = 0, and let Σi be the p × p covariance matrix of Xi. Suppose that, as n → ∞,

(1/n) ∑_{i=1}^n Σi → Σ0 ≠ 0

and, for every ε > 0,

(1/n) ∑_{i=1}^n ∫_{||Xi|| > ε√n} ||Xi||² dFi → 0,

where Fi is the distribution function of Xi and ||Xi|| is the Euclidean norm of the vector Xi. Then the random variable (X1 + ... + Xn)/√n converges to a p-variate normal distribution with mean zero and covariance matrix Σ0.

Proof. See the proof in Rao (1973: 147).

Since, from (3.3) in Chapter 3 (page 23),

â2 = [(n−1)²/((n−2)(n+1))] (b1 + b2),

it is necessary to find the distributions of â1, b1, and b2, which will then be standardized as normally distributed. We start by finding the distributions of â1 and b1 (both are functions of the υ_ii) and of b2 (a function of the υ_ij, i ≠ j). Subsequently, the distribution of â2, which is that of a linear function of two normal random variables, is obtained.


First, in order to find the distributions of â1 and b1, with the eigenvalues λi and di as defined in Appendix A (page 67), we let

u_{i1} = (λi/di) { υ_ii − (n−1) } / √(n−1)     and     u_{i2} = (λi²/di²) { υ_ii² − (n−1)(n+1) } / √((n−1)(n+1)(n+2)),

where

E(u_{i1}) = 0,     E(u_{i2}) = 0,

Var(u_{i1}) = (λi²/di²) Var(υ_ii)/(n−1) = 2 λi²/di²,

Var(u_{i2}) = (λi⁴/di⁴) Var(υ_ii²)/((n−1)(n+1)(n+2)) = 8 λi⁴/di⁴.

Since E(u_{i1}) = 0 and E(u_{i2}) = 0,

COV(u_{i1}, u_{i2}) = E(u_{i1} u_{i2}) = (λi³/di³) COV(υ_ii, υ_ii²) / √((n−1)²(n+1)(n+2)) = 4 e_n λi³/di³,

and e_n = √((n+1)/(n+2)) ≈ 1 as n → ∞.


Since the υ_ii are independent, the U_i = (u_{i1}, u_{i2})′ are independently distributed random vectors, for i = 1, ..., p, with E(U_i) = 0 and covariance matrix Ω_{in} given by

Ω_{in} = [ 2 λi²/di²         4 e_n λi³/di³
           4 e_n λi³/di³     8 λi⁴/di⁴ ],     i = 1, ..., p.

For any n, as p → ∞,

(1/p) ∑_{i=1}^p Ω_{in} = [ (2/p) ∑ λi²/di²         (4 e_n/p) ∑ λi³/di³
                           (4 e_n/p) ∑ λi³/di³     (8/p) ∑ λi⁴/di⁴ ]
                       = [ 2a2          4 e_n a3
                           4 e_n a3     8a4 ]   →   Ω_{n0} ≠ 0,

where

Ω_{n0} = [ 2α2          4 e_n α3
           4 e_n α3     8α4 ].

If F_i is the distribution function of U_i, then

(1/p) ∑_{i=1}^p ∫_{U_i′U_i > ε²p} U_i′U_i dF_i ≤ (1/(ε²p²)) ∑_{i=1}^p E(U_i′U_i)² ≤ (2/(ε²p²)) ∑_{i=1}^p { E(u_{i1}⁴) + E(u_{i2}⁴) },

from the C_r inequality in Rao (1973: 149). Since, as p → ∞, and from Lemma B.1,

(2/(ε²p²)) ∑_{i=1}^p E(u_{i1}⁴) = [2/(ε²p²(n−1)²)] ∑_{i=1}^p (λi/di)⁴ E{υ_ii − (n−1)}⁴
                               = [24(n−1)(n+3)/(ε²(n−1)²p²)] ∑_{i=1}^p (λi/di)⁴ → 0,

and, by an analogous derivation, as p → ∞,


(2/(ε²p²)) ∑_{i=1}^p E(u_{i2}⁴) = [2/(ε²p²{(n−1)(n+1)(n+2)}²)] ∑_{i=1}^p (λi/di)⁸ E{υ_ii² − (n−1)(n+1)}⁴ → 0.

Hence (2/(ε²p²)) ∑_{i=1}^p { E(u_{i1}⁴) + E(u_{i2}⁴) } → 0 as p → ∞.

By applying the multivariate central limit theorem in Theorem B.1, as p → ∞, for any n,

(1/√p)(U_1 + ... + U_p)
  = ( [1/√(p(n−1))] ∑_{i=1}^p (λi/di) {υ_ii − (n−1)} ,
      [1/√(p(n−1)(n+1)(n+2))] ∑_{i=1}^p (λi²/di²) {υ_ii² − (n−1)(n+1)} )′
  →D N(0, Ω_{n0}).

Note that, as n → ∞ and e_n ≈ 1,

Ω_{n0} → Ω_0,     where Ω_0 = [ 2α2   4α3
                                4α3   8α4 ].

Thus, it follows that, as (n, p) → ∞,

( [1/√(p(n−1))] ∑_{i=1}^p (λi/di) {υ_ii − (n−1)} ,
  [1/√(p(n−1)(n+1)(n+2))] ∑_{i=1}^p (λi²/di²) {υ_ii² − (n−1)(n+1)} )′   →D   N(0, Ω_0).

Subsequently, under assumption (A2), Ω_0 may be replaced by

Ω = [ 2a2   4a3
      4a3   8a4 ],

and it follows that

( [1/√(p(n−1))] ∑_{i=1}^p (λi/di) {υ_ii − (n−1)} ,
  [1/√(p(n−1)(n+1)(n+2))] ∑_{i=1}^p (λi²/di²) {υ_ii² − (n−1)(n+1)} )′   →D   N(0, Ω).


For the first element in the previous random vector, recall that â1 = [1/(p(n−1))] ∑_{i=1}^p (λi/di) υ_ii and a1 = (1/p) ∑_{i=1}^p (λi/di), so that

[1/√(p(n−1))] ∑_{i=1}^p (λi/di) {υ_ii − (n−1)} = √(p(n−1)) (â1 − a1).

Because, as n → ∞, √(p(n−1)) (â1 − a1) ≈ √(np) (â1 − a1), it follows that √(np) (â1 − a1) →D N(0, 2a2), and, with a simple linear transformation, we have

â1 →D N( a1, 2a2/(np) ).     (B.1)

Recall that b1 = [(n−2)/(p(n−1)³)] ∑_{i=1}^p (λi²/di²) υ_ii² and a2 = (1/p) ∑_{i=1}^p (λi²/di²). Considering the second element, we obtain

[1/√(p(n−1)(n+1)(n+2))] ∑_{i=1}^p (λi²/di²) {υ_ii² − (n−1)(n+1)}
    = √( p/((n−1)(n+1)(n+2)) ) [ (n−1)³ b1/(n−2) − (n−1)(n+1) a2 ].

Since, as n → ∞, this quantity is approximately √(np) (b1 − a2), then √(np) (b1 − a2) →D N(0, 8a4) and, with a simple linear transformation, we obtain the result

b1 →D N( a2, 8a4/(np) ).     (B.2)

To find the distribution of b2, the Lindeberg central limit theorem from Billingsley (1995: 359) is also made use of.

Theorem B.2. (Lindeberg Central Limit Theorem from Billingsley (1995: 359))

Let X1, ..., Xn be a sequence of independent random variables which satisfies i) E(Xi) = 0 and ii) σi² = E(Xi²). Let S_n² = ∑_{i=1}^n σi² > 0 and let P_i be the distribution function of Xi. If

(1/S_n²) ∑_{i=1}^n ∫_{|Xi| ≥ ε S_n} Xi² dP_i → 0     for every ε > 0,

then

M_n = (1/S_n) ∑_{i=1}^n Xi →D N(0, 1).

Proof. See the proof in Billingsley (1995: 359).

Srivastava (2005) produced an important result, which is used in the next proof:

υ_ij/√(n−1) ~ N(0, 1)     as n → ∞,

leading to υ_ij²/(n−1) ~ χ²_1, and these are asymptotically independently distributed for all distinct i and j.
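The following minimal sketch in standard Fortran (no IMSL calls; the program and variable names are purely illustrative) checks this result by simulation: it generates υ_12 = w_1′w_2 from independent w_1, w_2 ~ N_{n−1}(0, I), standardizes by √(n−1), and reports the empirical mean, variance, and upper-5% exceedance rate against the N(0, 1) targets of 0, 1, and 0.05.

program check_vij_normal
  implicit none
  integer, parameter :: n = 50, nrep = 200000
  integer :: r, j, k
  real :: u1, u2, z1, z2, v12, t, s1, s2, amean, avar
  real, parameter :: twopi = 6.2831853
  s1 = 0.0
  s2 = 0.0
  k  = 0
  do r = 1, nrep
     v12 = 0.0
     do j = 1, n - 1
        call random_number(u1)
        call random_number(u2)
        z1 = sqrt(-2.0*log(1.0 - u1))*cos(twopi*u2)   ! Box-Muller pair:
        z2 = sqrt(-2.0*log(1.0 - u1))*sin(twopi*u2)   ! independent N(0,1) variates
        v12 = v12 + z1*z2                             ! accumulate w1'w2
     end do
     t  = v12/sqrt(real(n - 1))
     s1 = s1 + t
     s2 = s2 + t*t
     if (t > 1.645) k = k + 1        ! one-sided 5% normal critical value
  end do
  amean = s1/real(nrep)
  avar  = s2/real(nrep) - amean*amean
  print *, 'mean         (target 0.00):', amean
  print *, 'variance     (target 1.00):', avar
  print *, 'P(T > 1.645) (target 0.05):', real(k)/real(nrep)
end program check_vij_normal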

Note that b2 was defined in (A.4) in Appendix A, and so now let

η_ij = [2/(p(n−1)²)] (λiλj/(di dj)) { υ_ij² − (1/(n−1)) υ_ii υ_jj }.


From Lemma A.1 we have E(η_ij) = 0, and let

S_p² = ∑_{i<j} Var(η_ij)
     = [4/(p²(n−1)⁴)] ∑_{i<j} (λiλj/(di dj))² Var{ υ_ij² − (1/(n−1)) υ_ii υ_jj }
     = Var(b2)
     = [4(n−2)(n+1)/(n−1)⁴] (a2² − a4/p)
     ≈ (4/n²)(a2² − a4/p),

as (n, p) → ∞.

Let

M_p = ∑_{i<j} η_ij = [2/(p(n−1)²)] ∑_{i<j} (λiλj/(di dj)) { υ_ij² − (1/(n−1)) υ_ii υ_jj } = b2.

If P_ij is the distribution function of η_ij, then, for every ε > 0,

(1/S_p²) ∑_{i<j} ∫_{|η_ij| ≥ ε S_p} η_ij² dP_ij ≤ [1/(ε² S_p⁴)] ∑_{i<j} E(η_ij⁴) → 0

as p → ∞, since ∑_{i<j} E(η_ij⁴) is of order 1/(p²n⁴) while S_p⁴ is of order 1/n⁴.


Subsequently, it follows from the Lindeberg central limit theorem in Theorem B.2 that

M_p/S_p = n b2 / ( 2 √(a2² − a4/p) ) →D N(0, 1).

Subsequently, by a linear transformation, we obtain

b2 →D N( 0, (4/n²)(a2² − a4/p) ).     (B.3)

By applying Lemma A.6 in Appendix A, b1 and b2 are asymptotically independent. Note that â2 is a linear function of the two random variables b1 and b2, i.e.,

â2 = [(n−1)²/((n−2)(n+1))] (b1 + b2) ≈ b1 + b2,

as n → ∞. By applying Lemma A.6 and (A.5) in Appendix A, as well as (B.2) and (B.3), we obtain

â2 →D N( a2, 8a4/(np) + 4a2²/n² ).     (B.4)

From (A.7) in Appendix A we have COV(â1, â2) ≈ 4a3/(np), and, together with (B.1) and (B.4), we obtain the joint distribution of â1 and â2 as

(â1, â2)′ →D N( (a1, a2)′ , [ 2a2/(np)       4a3/(np)
                              4a3/(np)       8a4/(np) + 4a2²/n² ] ).

The proof is completed.


APPENDIX C

FORTRAN Syntax for One High-Dimensional Data

INTEGER IRANK,ISEED,J,LDR,LDRSIG,NOUT,LDCOV,LDINCD,IDO,

& IFRQ,INCD(LDINCD,1), IWT, MOPT, NMISS, NOBS, NROW,

& N, P, ITERATION, LDX, COUNT1, COUNT2,COUNT3,CORCASE,

& ALTER_COV_CASE

PARAMETER (K=10000,N=10,P=10,SIG_SQ=1,LDINCD=1)

C SIG_SQ IS SIGMA SQUARE

REAL V0(P,P),V1(P,P),X(N,P),RSIG(P,P),S(P,P),TR_V0,

& V0_INV(P,P),A(P,P),TR_A,B(P,P),TR_B,

& T1_HAT,ABOVE,BELOW,SS(P,P),TR_SS,D,SUMWT,XMEAN(P),

& EVAL0(P),EVAL1(P),UNI(P),UNI1(P),R(P)

EXTERNAL CHFAC,RNMVN,RNSET,UMACH,LINRG,WRRRN,MRRRR,

& ANORIN,RNUN,SSCAL,SADD,EVLRG,WRCRN,CORVC

C SET A KNOWN POSITIVE DEFINITE MATRIX, V0, UNDER H0

CALL UMACH(2,NOUT)

LDRSIG =P

LDR =N

LDCOV =P

C ALL COVARIANCE MATRIX CONSIDERED

DO 9999 CORCASE=1,1

DO I =1,P


DO J =1,P

IF (I.EQ.J)THEN

V0(I,J)=1.0

ELSE

V0(I,J) = 0.0

ENDIF

END DO

END DO

IF(CORCASE.EQ.1) THEN

WRITE(6,*)'MATRIX V0, UN STRUCTURE'

DO I =1,P

DO J =1,P

IF (I.EQ.J)THEN

V0(I,I)=1.0

ELSE

V0(I,J) = ((-1)**(I+J))*(I/(2.0*J))

ENDIF

END DO

END DO

ELSEIF(CORCASE.EQ.2) THEN

WRITE(6,*)'MATRIX V0, CS STRUCTURE'

DO I =1,P

DO J =1,P

IF (I.EQ.J)THEN

V0(I,J)=1.0

ELSE

V0(I,J) = 0.5

ENDIF

END DO

END DO


ELSEIF(CORCASE.EQ.3) THEN

WRITE(6,*)'MATRIX V0, CSH STRUCTURE'

CALL UMACH(2,NOUT)

ISEED=6250

CALL RNSET(ISEED)

ALO=2.0

BHI=3.0

CALL RNUN(P,UNI)

CALL SSCAL(P,BHI-ALO,UNI,1)

CALL SADD(P,ALO,UNI,1)

CALL WRRRN('UNIFORM',1,P,UNI,1,0)

DO I=1,P

V0(I,I)=UNI(I)

END DO

RHO=0.50

DO I=1,P

DO J=1,P

IF(I.EQ.J) THEN

V0(I,I)=UNI(I)

ELSE

V0(I,J)=((UNI(I)*UNI(J))**0.5)*RHO

ENDIF

END DO

END DO

ELSEIF (CORCASE.EQ.4) THEN

WRITE(6,*)'MATRIX V0, TOEP STRUCTURE'

DO I=1,P-1

V0(I,I+1)=-0.5

IF(I+2.LE.P) THEN


V0(I,I+2) = 0.0

ENDIF

END DO

ELSEIF(CORCASE.EQ.5) THEN

WRITE(6,*)'MATRIX V0 CASE 1,SIMPLE STRUCTURE'

DO I =1,P

DO J =1,P

IF (I.EQ.J)THEN

V0(I,J)=2.0

ELSE

V0(I,J) = 0.0

ENDIF

END DO

END DO

END IF

DO I=1,P-1

DO J=I+1,P

V0(J,I)=V0(I,J)

END DO

END DO

C REPORT HEADER

CALL WRRRN('V0',P,P,V0,P,0)

C COMPUTE ALL OF EIGENVALUES OF MATRIX V0

CALL EVLRG(P,V0,P,EVAL0)

CALL WRCRN('EIGEN VALUES OF MATRIX V0',1,P,EVAL0,1,0)

TR_V0 = 0.0


DO I =1,P

TR_V0=V0(I,I)+TR_V0

END DO

C THE INVERSE OF V0, V_INV

CALL LINRG(P,V0,P,V0_INV,P)

CALL WRRRN ('COVARIANCE INVERSE, V0_INV',P,P,V0_INV,P,0)

WRITE(6,*)'TRACE OF V0 =TR_V0=',TR_V0

C SET A GIVEN KNOWN POPULATION COVARIANCE MATRIX, UNDER H1,

C V1

DO 999 ALTER_COV_CASE=6,6

DO I =1,P

DO J =1,P

IF (I.EQ.J)THEN

V1(I,J)=1.0

ELSE

V1(I,J) = 0.0

ENDIF

END DO

END DO

IF(ALTER_COV_CASE.EQ.1) THEN

WRITE(6,*)'MATRIX V1, UN STRUCTURE'

DO I =1,P

DO J =1,P

IF (I.EQ.J)THEN

V1(I,I)=1.0

ELSE

V1(I,J) = ((-1)**(I+J))*(I/(4.0*J))

ENDIF


END DO

END DO

ELSEIF(ALTER_COV_CASE.EQ.2) THEN

WRITE(6,*)'MATRIX V1 CASE 1, CS STRUCTURE '

DO I =1,P

DO J =1,P

IF (I.EQ.J)THEN

V1(I,J)=1.0

ELSE

V1(I,J) = 0.1

ENDIF

END DO

END DO

ELSEIF(ALTER_COV_CASE.EQ.3) THEN

WRITE(6,*)'MATRIX V1, CSH STRUCTURE'

CALL UMACH(2,NOUT)

ISEED=6250

CALL RNSET(ISEED)

ALO=3.0

BHI=4.0

CALL RNUN(P,UNI1)

CALL SSCAL(P,BHI-ALO,UNI1,1)

CALL SADD(P,ALO,UNI1,1)

CALL WRRRN('UNIFORM1',1,P,UNI1,1,0)

DO I=1,P

V1(I,I)=UNI1(I)

END DO

RHO=0.50


DO I=1,P

DO J=1,P

IF(I.EQ.J) THEN

V1(I,I)=UNI1(I)

ELSE

V1(I,J)=((UNI1(I)*UNI1(J))**0.5)*RHO

ENDIF

END DO

END DO

ELSEIF(ALTER_COV_CASE.EQ.4) THEN

WRITE(6,*)'MATRIX V1, TOEP STRUCTURE'

DO I=1,P-1

V1(I,I+1)=-0.45

IF(I+2.LE.P) THEN

V1(I,I+2) = 0.0

ENDIF

END DO

ELSEIF(ALTER_COV_CASE.EQ.5) THEN

WRITE(6,*)'MATRIX V1, VC STRUCTURE'

N_HALF=P/2

WRITE(6,*)'P/2 = ', N_HALF

DO I =1,P

DO J =1,P

IF (I.GT.N_HALF)THEN

V1(I,I)=0.5

ENDIF

END DO

END DO

ELSEIF(ALTER_COV_CASE.EQ.6) THEN


WRITE(6,*)'MATRIX V1,VC STRUCTURE'

CALL UMACH(2,NOUT)

ISEED=6250

CALL RNSET(ISEED)

CALL RNUN(P,R)

CALL WRRRN('UNIFORM VECTOR =',1,P,R,1,0)

DO I =1,P

DO J =1,P

IF (I.EQ.J)THEN

V1(I,J)=2.0*R(I)

ELSE

V1(I,J) = 0.0

ENDIF

END DO

END DO

END IF

DO I=1,P-1

DO J=I+1,P

V1(J,I)=V1(I,J)

END DO

END DO

999 END DO

C REPORT HEADER

CALL WRRRN('V1',P,P,V1,P,0)

C COMPUTE ALL EIGENVALUES OF MATRIX V1

CALL EVLRG(P,V1,P,EVAL1)

CALL WRCRN('EIGEN VALUES OF MATRIX V1',1,P,EVAL1,1,0)


C CONSTRUCT DATA X UNDER V1

C OBTAIN THE CHOLESKY FACTORIZATION

CALL CHFAC (P,V1,P,0.00001,IRANK,RSIG,LDRSIG)

C INITIALIZE SEED OF RANDOM NUMBER GENERATOR

ITERATION=0

C INITIALIZE THE NUMBER OF TIMES THAT THE TEST STATISTIC FALLS IN

C THE CRITICAL REGION

COUNT1 = 0

COUNT2 = 0

COUNT3 = 0

C INITIALIZE SEED OF RANDOM NUMBER GENERATOR, K

ISEED =6250

CALL RNSET(ISEED)

DO 10 ITER = 1,K

ITERATION = ITERATION +1

WRITE(6,*)'ITER',ITERATION

CALL RNMVN(N,P,RSIG,LDRSIG,X,LDR)

CALL WRRRN('NXP MATRIX X',N,P,X,N,0)

C COMPUTE SAMPLE COVARIANCE MATRIX,S

CALL UMACH(2, NOUT)

IDO = 0

NROW = N

LDX = N

IFRQ = 0

IWT = 0

MOPT = 0


ICOPT = 0

CALL CORVC(IDO, NROW, P, X, N, IFRQ, IWT, MOPT, ICOPT,

& XMEAN, S, LDCOV, INCD, LDINCD,NOBS, NMISS, SUMWT)

CALL WRRRN('SAMPLE COV MATRIX, S', P,P, S, LDCOV, 0)

TR_S=0.0

DO I =1,P

DO J=1,P

IF (I.EQ.J) THEN

TR_S=TR_S+S(I,J)

ELSE

ENDIF

END DO

END DO

WRITE(6,*) 'TRACE OF SAMPLE VAR-COV MATRIX= TR_S = ',TR_S

C MULTIPLY POPULATION VARIANCE-COVARIANCE INVERSE

C (COV_INV) WITH SAMPLE COVARIANCE MATRIX (S)

CALL MRRRR (P,P,V0_INV,P,P,P,S,P,P,P,A,P)

CALL WRRRN ('A=V0_INV * S ',P,P,A,P,0)

CALL MRRRR(P,P,A,P,P,P,A,P,P,P,B,P)

CALL WRRRN ('B = (V0_INV * S) ^ 2', P,P,B,P,0)

TR_A=0.0

TR_B = 0.0

DO I =1,P

TR_A=TR_A+A(I,I)

TR_B=TR_B+B(I,I)

END DO


C============================================================

C COMPUTE THE PROPOSED TEST STATISTIC, T1

C============================================================

A1=TR_A/P

D = (N-2.0)*(N+1.0)

E=(N-1)**2/D

A2=E*(TR_B-(TR_A**2)/(N-1))/P

T =A2-2.0*(SIG_SQ**2)*A1+SIG_SQ**2

ABOVE=(N-1)*T

BELOW=2*((2*(SIG_SQ**2)*(N-1)-4*SIG_SQ*(N-1)+2*(N-1))/P + 1)**0.5

T1 = ABOVE/BELOW

WRITE(6,*) 'TEST STATISTIC = T1 = ',T1
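C NOTE: A1 AND A2 ABOVE ARE THE APPENDIX A ESTIMATORS
C A1 = TR(V0_INV*S)/P AND
C A2 = ((N-1)**2/((N-2)*(N+1)))*(TR_B - TR_A**2/(N-1))/P,
C AND T1 IS THE STANDARDIZED STATISTIC COMPARED BELOW WITH THE
C NORMAL CRITICAL VALUE Z(ALPHA).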

CRV=ANORIN(0.95)

WRITE(6,*) 'CRITICAL VALUE =',CRV

IF (T1.GT.CRV) THEN

COUNT1 = COUNT1+1

ENDIF

C============================================================

C COMPUTE SRIVASTAVA (2005)’ S STATISTIC, TS1

C============================================================

CALL MRRRR (P,P,S,P, P,P,S,P, P,P,SS,P)

CALL WRRRN ('S SQUARE = S * S = ',P,P,SS,P,0)

TR_SS = 0.0

DO I =1,P

TR_SS = TR_SS + SS(I,I)

END DO

WRITE(6,*) 'TRACE OF S SQUARE =',TR_SS


A1_HAT=TR_S/P

A2_HAT=(TR_SS-1.0*TR_S**2/(N-1))/P*(N-1)**2/(N-2)/(N+1)

TS1=((N-1)/2.0)*(A2_HAT/(A1_HAT**2)-1)

WRITE(6,*) 'SRIVASTAVA STATISTIC, TS1 = ',TS1

IF (TS1.GT.CRV) THEN

COUNT2 = COUNT2+1

ENDIF

C============================================================

C COMPUTE LEDOIT AND WOLF (2002)’ S STATISTIC, UJ

C============================================================

U=1.0*(TR_SS/P)/(TR_S/P)**2-1

UJ=((N*1.0-1)*U-P-1.0)/2.0

WRITE(6,*)'UJ=',UJ

IF (UJ.GT.CRV) THEN

COUNT3 = COUNT3+1

ENDIF

10 CONTINUE

C============================================================

C COMPUTE PROPORTIONS OF REJECTIONS OF THE TEST STATISTICS

C=========================================================

PROP_T1=1.0*COUNT1/K*1.0

WRITE(6,60) PROP_T1

60 FORMAT(' PROP1_T1 = (#T1 > Z(ALPHA))/ K = ',T35,F7.4)

PROP_UJ=1.0*COUNT3/K*1.0

WRITE(6,70) PROP_UJ


70 FORMAT(' PROP_UJ = (#UJ > Z(ALPHA))/ K = ',T35,F7.4)

PROP_TS1=1.0*COUNT2/K*1.0

WRITE(6,80) PROP_TS1

80 FORMAT(' PROP_TS1 = (#TS1 > Z(ALPHA))/ K = ',T35,F7.4)

9999 CONTINUE

STOP

END


APPENDIX D

FORTRAN Syntax for Two High-Dimensional Data

INTEGER I,IRANK,ISEED,J,LDR1,LDR2,LDRSIG,NOUT, N1,N2,P,

& ITERATION,COUNT1,COUNT2,COUNT3,COUNT4,n,

& LDCOV,LDINCD

PARAMETER (K=10000,N1=20,N2=20,P=20,LDCOV=P,LDINCD=1)

INTEGER ICOPT,IDO,IFRQ,INCD(LDINCD,1),IWT,MOPT,NMISS,NOBS,

& NROW

REAL V1(P,P),V2(P,P),RSIG(P,P),X1(N1,P),S1(P,P),XMEAN1(P), SS1(P,P),

& X2(N2,P),S2(P,P),SS2(P,P),XMEAN2(P), S(P,P),SS(P,P),SSS(P,P),

& SSSS(P,P), A2,A4,TAU,b,d,c_star,e,T2_SQ,M1(P,P),M2(P,P),

& MM1(P,P),MM2(P,P),M(P,P),MM(P,P), MMM(P,P),MMMM(P,P),

& C0,C1,C2,C3,S1S2(P,P),TJ,TJ_SQ, U1(P),U2(P),DIAG(P,P),

& DT(P,P),DID(P,P),EVAL1(P),EVAL2(P)

EXTERNAL CHFAC,RNMVN,RNSET,UMACH,LINRG,WRRRN,MRRRR,

& ANORIN,CHIIN,CORVC,RNUN,SSCAL,SADD,EVLRG,WRCRN

C SET POPULATION COVARIANCE MATRICES

CALL UMACH(2,NOUT)

LDRSIG =P

LDR1 =N1

LDR2 =N2

C ALL COVARIANCE MATRICES CONSIDERED

DO 9999 COV1=8,8

C SET FIRST POPULATION COVARIANCE MATRIX, SIGMA1=V1


DO I =1,P

DO J =1,P

IF (I.EQ.J)THEN

V1(I,J)=1.0

ELSE

V1(I,J) = 0.0

ENDIF

END DO

END DO

IF(COV1.EQ.1) THEN

WRITE(6,*)'MATRIX V1, UN STRUCTURE'

DO I =1,P

DO J =1,P

IF (I.EQ.J)THEN

V1(I,I)=1.0

ELSE

V1(I,J) = ((-1)**(I+J))*(I/(10.0*J))

ENDIF

END DO

END DO

ELSEIF(COV1.EQ.2) THEN

WRITE(6,*)'MATRIX V1, UN STRUCTURE'

CALL UMACH(2,NOUT)

ISEED=6250

CALL RNSET(ISEED)

AAA=1.0

BBB=5.0

CALL RNUN(P,U1)

CALL SSCAL(P,BBB-AAA,U1,1)

CALL SADD(P,AAA,U1,1)


CALL WRRRN('U1',1,P,U1,1,0)

DO I=1,P

DO J=1,P

IF (I.EQ.J)THEN

DIAG(I,I)=U1(I)

ELSE

DIAG(I,J)=0.0

ENDIF

END DO

END DO

CALL WRRRN('DIAG',P,P,DIAG,P,0)

L=0

WW=0.1

DO I=1,P

DO J=1,P

Y=ABS(I-J)

DT(I,J)=(-1)**(I+J)*(0.2*(L+2.0))**(Y**WW)

END DO

END DO

CALL WRRRN('DELTA MATRIX',P,P,DT,P,0)

CALL MRRRR (P,P,DIAG,P,P,P,DT,P,P,P,DID,P)

CALL WRRRN('DIAG MATRIX * DELTA MATRIX',P,P,DID,P,0)

CALL MRRRR (P,P,DID,P,P,P,DIAG,P,P,P,V1,P)

CALL WRRRN('DIAG *DELTA *DIAG = V1',P,P,V1,P,0)

ELSEIF(COV1.EQ.3) THEN

WRITE(6,*)'MATRIX V1, CS STRUCTURE'

DO I =1,P

DO J =1,P


IF (I.EQ.J)THEN

V1(I,J)=1.0

ELSE

V1(I,J) = 0.01

ENDIF

END DO

END DO

ELSEIF(COV1.EQ.4) THEN

WRITE(6,*)'MATRIX V1, CSH STRUCTURE'

CALL UMACH(2,NOUT)

ISEED=6250

CALL RNSET(ISEED)

AAA=5.0

BBB=6.0

CALL RNUN(P,U1)

CALL SSCAL(P,BBB-AAA,U1,1)

CALL SADD(P,AAA,U1,1)

CALL WRRRN('U1',1,P,U1,1,0)

DO I=1,P

V1(I,I)=U1(I)

END DO

RHO=0.50

DO I=1,P

DO J=1,P

IF(I.EQ.J) THEN

V1(I,I)=U1(I)

ELSE

V1(I,J)=((U1(I)*U1(J))**0.5)*RHO

ENDIF


END DO

END DO

ELSEIF(COV1.EQ.5) THEN

WRITE(6,*)'MATRIX V1, SIM STRUCTURE'

DO I =1,P

DO J =1,P

IF (I.EQ.J)THEN

V1(I,J)=1.0

ELSE

V1(I,J) = 0.0

ENDIF

END DO

END DO

ELSEIF(COV1.EQ.6) THEN

WRITE(6,*)'MATRIX V1, SIM STRUCTURE'

DO I =1,P

DO J =1,P

IF (I.EQ.J)THEN

V1(I,J)=2.0

ELSE

V1(I,J) = 0.0

ENDIF

END DO

END DO

ELSEIF(COV1.EQ.7) THEN

WRITE(6,*)'MATRIX V1, TOEP STRUCTURE'

DO I=1,P-1

V1(I,I+1)=-0.5

IF(I+2.LE.P) THEN


V1(I,I+2) = 0.0

ENDIF

END DO

ELSEIF(COV1.EQ.8) THEN

WRITE(6,*)'MATRIX V1, VC STRUCTURE'

CALL UMACH(2,NOUT)

ISEED=6250

CALL RNSET(ISEED)

AAA=1.0

BBB=2.0

CALL RNUN(P,U1)

CALL SSCAL(P,BBB-AAA,U1,1)

CALL SADD(P,AAA,U1,1)

CALL WRRRN('U1',1,P,U1,1,0)

DO I=1,P

DO J=1,P

V1(I,J)=0.0

END DO

END DO

DO I=1,P

V1(I,I)=U1(I)

END DO

END IF

DO I=1,P-1

DO J=I+1,P

V1(J,I)=V1(I,J)

END DO

END DO


C REPORT HEADER

CALL WRRRN('V1',P,P,V1,P,0)

C ALL COVARIANCE MATRICES CONSIDERED

DO 8888 COV2=8,8

C SET SECOND POPULATION COVARIANCE MATRIX SIGMA2=V2

DO I =1,P

DO J =1,P

IF (I.EQ.J)THEN

V2(I,J)=1.0

ELSE

V2(I,J) = 0.0

ENDIF

END DO

END DO

IF(COV2.EQ.1) THEN

WRITE(6,*)'MATRIX V2, UN STRUCTURE'

DO I =1,P

DO J =1,P

IF (I.EQ.J)THEN

V2(I,I)=1.0

ELSE

V2(I,J) = ((-1)**(I+J))*(I/(20.0*J))

ENDIF

END DO

END DO

ELSEIF(COV2.EQ.2) THEN

WRITE(6,*)'MATRIX V2, UN STRUCTURE'

CALL UMACH(2,NOUT)

ISEED=6250


CALL RNSET(ISEED)

AAA=1.0

BBB=5.0

CALL RNUN(P,U1)

CALL SSCAL(P,BBB-AAA,U1,1)

CALL SADD(P,AAA,U1,1)

CALL WRRRN('U1',1,P,U1,1,0)

DO I=1,P

DO J=1,P

IF (I.EQ.J)THEN

DIAG(I,I)=U1(I)

ELSE

DIAG(I,J)=0.0

ENDIF

END DO

END DO

CALL WRRRN('DIAG',P,P,DIAG,P,0)

L=2

WW=0.10

DO I=1,P

DO J=1,P

Y=ABS(I-J)

DT(I,J)=(-1)**(I+J)*(0.2*(L+2.0))**(Y**WW)

END DO

END DO

CALL WRRRN('DELTA MATRIX',P,P,DT,P,0)

CALL MRRRR (P,P,DIAG,P,P,P,DT,P,P,P,DID,P)

CALL WRRRN('DIAG MATRIX * DELTA MATRIX',P,P,DID,P,0)


CALL MRRRR (P,P,DID,P,P,P,DIAG,P,P,P,V2,P)

CALL WRRRN('DIAG *DELTA *DIAG = V2',P,P,V2,P,0)

ELSEIF(COV2.EQ.3) THEN

WRITE(6,*)'MATRIX V2, CS STRUCTURE'

DO I =1,P

DO J =1,P

IF (I.EQ.J)THEN

V2(I,J)=1.0

ELSE

V2(I,J) = 0.05

ENDIF

END DO

END DO

ELSEIF(COV2.EQ.4) THEN

WRITE(6,*)'MATRIX V2, CSH STRUCTURE'

CALL UMACH(2,NOUT)

ISEED=6250

CALL RNSET(ISEED)

AAA=4.0

BBB=5.0

CALL RNUN(P,U1)

CALL SSCAL(P,BBB-AAA,U1,1)

CALL SADD(P,AAA,U1,1)

CALL WRRRN('U1',1,P,U1,1,0)

DO I=1,P

V2(I,I)=U1(I)

END DO

RHO=0.40


DO I=1,P

DO J=1,P

IF(I.EQ.J) THEN

V2(I,I)=U1(I)

ELSE

V2(I,J)=((U1(I)*U1(J))**0.5)*RHO

ENDIF

END DO

END DO

ELSEIF(COV2.EQ.5) THEN

WRITE(6,*)'MATRIX V2, VC STRUCTURE'

DO I =1,P

DO J =1,P

IF (I.EQ.J)THEN

MMOD=MOD(I,4)

ZERO=0

IF (MMOD.EQ.0) THEN

V2(I,I)=2.0

ELSE

ENDIF

ELSE

V2(I,J)=0.0

ENDIF

END DO

END DO

ELSEIF(COV2.EQ.6) THEN

WRITE(6,*)'MATRIX V2, SIM STRUCTURE'

DO I =1,P

DO J =1,P

IF (I.EQ.J)THEN


V2(I,J)=1.5

ELSE

V2(I,J) = 0.0

ENDIF

END DO

END DO

ELSEIF(COV2.EQ.7) THEN

WRITE(6,*)'MATRIX V2, TOEP STRUCTURE'

DO I=1,P-1

V2(I,I+1)=-0.30

IF(I+2.LE.P) THEN

V2(I,I+2) = 0.0

ENDIF

END DO

ELSEIF(COV2.EQ.8) THEN

WRITE(6,*)'MATRIX V2, VC STRUCTURE'

CALL UMACH(2,NOUT)

ISEED=6250

CALL RNSET(ISEED)

AAA=1.5

BBB=2.5

CALL RNUN(P,U2)

CALL SSCAL(P,BBB-AAA,U2,1)

CALL SADD(P,AAA,U2,1)

C CALL WRRRN('U2',1,P,U2,1,0)

DO I=1,P

DO J=1,P

V2(I,J)=0.0

END DO

END DO


DO I=1,P

V2(I,I)=U2(I)

END DO

ENDIF

DO I=1,P-1

DO J=I+1,P

V2(J,I)=V2(I,J)

END DO

END DO

C REPORT HEADER

CALL WRRRN('V2',P,P,V2,P,0)

C============================================================

C COMPUTE EIGENVALUES OF MATRIX V1 AND V2

C============================================================

CALL EVLRG(P,V1,P,EVAL1)

CALL WRCRN('EIGEN VALUES OF MATRIX V1',1,P,EVAL1,1,0)

CALL EVLRG(P,V2,P,EVAL2)

CALL WRCRN('EIGEN VALUES OF MATRIX V2',1,P,EVAL2,1,0)

C INITIALIZE SEED OF RANDOM NUMBER GENERATOR

ITERATION=0

C INITIALIZE THE NUMBER OF TIMES, COUNT, THAT THE TEST STATISTIC FALLS

C IN THE CRITICAL REGION

COUNT1= 0

COUNT2= 0

COUNT3= 0

COUNT4= 0


C INITIALIZE SEED OF RANDOM NUMBER GENERATOR, K

ISEED =6250

CALL RNSET(ISEED)

DO 10 ITER = 1,K

ITERATION = ITERATION +1

WRITE(6,*)'#',ITERATION

C============================================================

C PHASE 1 FOR POPULATION 1

C============================================================

C OBTAIN THE CHOLESKY FACTORIZATION

CALL CHFAC (P,V1,P,0.00001,IRANK,RSIG,LDRSIG)

C CONSTRUCT FIRST SAMPLE DATA, X1, BASED ON V1

CALL RNMVN(N1,P,RSIG,LDRSIG,X1,LDR1)

CALL WRRRN ('(N1)x P MATRIX X1',N1,P,X1,N1,0)

C CALCULATE FIRST SAMPLE COVARIANCE MATRIX, S1

CALL UMACH(2, NOUT)

IDO = 0

NROW = N1

IFRQ = 0

IWT = 0

MOPT = 0

ICOPT = 0

CALL CORVC(IDO, NROW, P, X1, N1, IFRQ, IWT, MOPT, ICOPT,

& XMEAN1, S1, LDCOV, INCD, LDINCD,NOBS, NMISS, SUMWT)

CALL WRRRN('1ST SAMPLE COV MATRIX = S1', P,P, S1, LDCOV, 0)

CALL MRRRR (P,P,S1,P,P,P,S1,P,P,P,SS1,P)


CALL WRRRN ('S1^2 ',P,P,SS1,P,0)

TR_S1=0.0

TR_SS1=0.0

DO I=1,P

TR_S1=TR_S1+S1(I,I)

TR_SS1=TR_SS1+SS1(I,I)

END DO

A21_HAT=(N1-1)**2*(TR_SS1-TR_S1**2/(N1-1))/(N1-2)/(N1+1)/P

C WRITE(6,*)'TRACE OF S1= ',TR_S1

C WRITE(6,*)'TRACE OF S1^2 = ',TR_SS1

C WRITE(6,*)'A21_HAT = ',A21_HAT

C============================================================

C PHASE 2 FOR POPULATION 2

C============================================================

C OBTAIN THE CHOLESKY FACTORIZATION

CALL CHFAC (P,V2,P,0.00001,IRANK,RSIG,LDRSIG)

C CONSTRUCT SECOND SAMPLE DATA X2 BASED ON V2

CALL RNMVN(N2,P,RSIG,LDRSIG,X2,LDR2)

CALL WRRRN ('(N2)x P MATRIX X2',N2,P,X2,N2,0)

C CALCULATE SECOND SAMPLE COVARIANCE MATRIX, S2

CALL UMACH(2,NOUT)

NROW = N2

CALL CORVC(IDO, NROW, P, X2, N2, IFRQ, IWT, MOPT, ICOPT,

& XMEAN2, S2, LDCOV, INCD, LDINCD,NOBS, NMISS, SUMWT)

CALL WRRRN('2ND SAMPLE COV MATRIX = S2', P,P, S2, LDCOV, 0)

CALL MRRRR (P,P,S2,P,P,P,S2,P,P,P,SS2,P)


CALL WRRRN ('S2^2 ',P,P,SS2,P,0)

TR_S2=0.0

TR_SS2=0.0

DO I=1,P

TR_S2=TR_S2+S2(I,I)

TR_SS2=TR_SS2+SS2(I,I)

END DO

A22_HAT=(N2-1)**2*(TR_SS2-TR_S2**2/(N2-1))/(N2-2)/(N2+1)/P

WRITE(6,*)'TRACE OF S2= ',TR_S2

WRITE(6,*)'TRACE OF S2^2 = ',TR_SS2

WRITE(6,*)'A22_HAT = ',A22_HAT

C============================================================

C PHASE 3 COMPUTE THE PROPOSED TEST STATISTIC, T2

C============================================================

C ESTIMATE THE COMMON COVARIANCE MATRIX, UNDER H0:

C SIGMA1=SIGMA2=SIGMA

C THE POOLED COVARIANCE MATRIX, S, IS UNBIASED ESTIMATOR OF

C THE SIGMA MATRIX

DO I =1,P

DO J =1,P

S(I,J)=((N1-1)*S1(I,J)+(N2-1)*S2(I,J))/(N1+N2-2)

END DO

END DO

CALL WRRRN('POOLED COV MATRIX= S',P,P,S,P,0)

CALL MRRRR (P,P,S,P,P,P,S,P,P,P,SS,P)

CALL WRRRN ('S^2 ',P,P,SS,P,0)

CALL MRRRR (P,P,S,P,P,P,SS,P,P,P,SSS,P)


CALL WRRRN ('S^3 ',P,P,SSS,P,0)

CALL MRRRR (P,P,SS,P,P,P,SS,P,P,P,SSSS,P)

CALL WRRRN ('S^4 ',P,P,SSSS,P,0)

TR_S=0.0

TR_SS=0.0

TR_SSS=0.0

TR_SSSS=0.0

DO I=1,P

TR_S=TR_S+S(I,I)

TR_SS=TR_SS+SS(I,I)

TR_SSS=TR_SSS+SSS(I,I)

TR_SSSS=TR_SSSS+SSSS(I,I)

END DO

WRITE(6,*)'TRACE OF POOLED-COV MATRIX = TR_S = ',TR_S

WRITE(6,*)'TRACE OF S^2 = TR_SS = ',TR_SS

WRITE(6,*)'TRACE OF S^3 = TR_SSS = ',TR_SSS

WRITE(6,*)'TRACE OF S^4 = TR_SSSS = ',TR_SSSS

n=(N1-1)+(N2-1)

WRITE(6,*)'SMALL n = ',n

A2=(n**2)*(TR_SS-(TR_S**2)/n)/P/(n-1)/(n+2)

WRITE(6,*)'A2 = ',A2
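C NOTE: A2 ABOVE IS THE CONSISTENT ESTIMATOR OF a2 = TR(SIGMA**2)/P
C FROM THE POOLED MATRIX S WITH n = (N1-1)+(N2-1) DEGREES OF FREEDOM,
C I.E. A2 = n**2*(TR_SS - TR_S**2/n)/(P*(n-1)*(n+2)).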

d1=1.0*(n**5.0)

d2=1.0*(n**2.0+1.0*n+2.0)

d3=1.0*(n+1)

d4=1.0*(n+2)

d5=1.0*(n+4)

d6=1.0*(n+6)


d7=1.0*(n-1)

d8=1.0*(n-2)

d9=1.0*(n-3)

TAU=1.0*(d1*d2)/(d3*d4*d5*d6*d7*d8*d9)

b=-4.0/n

c_star=-1.0*(2.0*n**2+3.0*n-6)/n/(n**2+n+2)

d=2.0*(5.0*n+6)/n/(n**2+n+2)

e=-1.0*(5.0*n+6)/n**2/(n**2+n+2)

A4=1.0*TAU*(TR_SSSS+b*TR_SSS*TR_S+c_star*TR_SS**2+d*TR_SS*

& TR_S**2+e*TR_S**4)/P

DELTA_SQ=8.0*A4*(1.0/(N1-1)+1.0/(N2-1))/(A2**2)/P+4.0*

& (1.0/(N1-1)**2+1.0/(N2-1)**2)

DELTA=1.0*SQRT(DELTA_SQ)

T2=(A21_HAT/A22_HAT-1)/DELTA

T2_SQ=T2**2

C WRITE(6,*)'T2 = ',T2

C============================================================

C PHASE 4 COMPUTE SRIVASTAVA(2007)'S STATISTIC,TS2

C============================================================

DO I =1,P

DO J =1,P

M1(I,J)=(N1-1)*S1(I,J)

M2(I,J)=(N2-1)*S2(I,J)

END DO

END DO

CALL WRRRN('(N1-1)*S1= M1',P,P,M1,P,0)

CALL WRRRN('(N2-1)*S2= M2',P,P,M2,P,0)

CALL MRRRR (P,P,M1,P,P,P,M1,P,P,P,MM1,P)


CALL WRRRN ('M1^2 ',P,P,MM1,P,0)

CALL MRRRR (P,P,M2,P,P,P,M2,P,P,P,MM2,P)

CALL WRRRN ('M2^2 ',P,P,MM2,P,0)

DO I =1,P

DO J =1,P

M(I,J)=M1(I,J)+M2(I,J)

END DO

END DO

CALL WRRRN('M1+M2 = M',P,P,M,P,0)

CALL MRRRR (P,P,M,P,P,P,M,P,P,P,MM,P)

CALL WRRRN ('M^2 ',P,P,MM,P,0)

CALL MRRRR (P,P,MM,P,P,P,MM,P,P,P,MMMM,P)

CALL WRRRN ('M^4 ',P,P,MMMM,P,0)

TR_M1=0.0

TR_M2=0.0

TR_MM1=0.0

TR_MM2=0.0

TR_M=0.0

TR_MM=0.0

TR_MMMM=0.0

DO I=1,P

TR_M1=TR_M1+M1(I,I)

TR_M2=TR_M2+M2(I,I)

TR_MM1=TR_MM1+MM1(I,I)

TR_MM2=TR_MM2+MM2(I,I)


TR_M=TR_M+M(I,I)

TR_MM=TR_MM+MM(I,I)

TR_MMMM=TR_MMMM+MMMM(I,I)

END DO

WRITE(6,*)'TR_M1=',TR_M1, ' TR_M2=',TR_M2

WRITE(6,*)'TR_MM1=',TR_MM1, ' TR_MM2',TR_MM2

WRITE(6,*)'TR_M=',TR_M,' TR_MM=',TR_MM

WRITE(6,*)'TR_MMMM=',TR_MMMM

C0=1.0*n*(n**3+6.0*n**2+21.0*n+18)

C1=2.0*n*(2.0*n**2+6.0*n+9)

C2=2.0*n*(3.0*n+2)

C3=1.0*n*(2.0*n**2+5.0*n+7)

A21S=1.0*(TR_MM1-(TR_M1**2)/(N1-1))/(P*(N1-2)*(N1+1))

A22S=1.0*(TR_MM2-(TR_M2**2)/(N2-1))/(P*(N2-2)*(N2+1))

A1S=TR_M/(n*P)

A2S=(TR_MM-TR_M**2/n)/((n-1)*(n+2)*P)

A4S=1.0*(1.0*TR_MMMM/P-1.0*P*C1*A1S-1.0*(P**2)*C2*(A1S**2)*A2S

& -1.0*P*C3*(A2S**2)-1.0*n*(P**3)*(A1S**4))/C0
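
C A21S AND A22S ESTIMATE tr(SIGMA1**2)/P AND tr(SIGMA2**2)/P FROM
C THE SEPARATE SAMPLES, WHILE A1S, A2S AND A4S ARE POOLED
C ESTIMATORS OF tr(SIGMA)/P, tr(SIGMA**2)/P AND tr(SIGMA**4)/P
C BUILT FROM M AND THE CONSTANTS C0-C3 ABOVE.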

ET1_SQ=4.0*A2S**2*(1.0+2.0*(N1-1)*A4S/P/A2S**2)/

& (N1-1)**2

ET2_SQ=4.0*A2S**2*(1.0+2.0*(N2-1)*A4S/P/A2S**2)/

& (N2-1)**2

PLUS=1.0*(ET1_SQ+ET2_SQ)

TS2_SQ=1.0*(A21S-A22S)**2/PLUS

WRITE(6,*)'TS2_SQ=',TS2_SQ
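
C TS2_SQ IS THE SQUARE OF SRIVASTAVA (2007)'S STATISTIC: THE
C DIFFERENCE A21S - A22S STANDARDIZED BY ITS ESTIMATED VARIANCE
C ET1_SQ + ET2_SQ; IT IS COMPARED WITH A CHI-SQUARED(1) CRITICAL
C VALUE IN PHASE 7.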

C============================================================

C PHASE 5 COMPUTE SRIVASTAVA & YANAGIHARA (2010)'S STATISTIC, TSY

C============================================================

CALL MRRRR (P,P,M,P,P,P,MM,P,P,P,MMM,P)

CALL WRRRN ('M^3 ',P,P,MMM,P,0)


TR_MMM=0.0

DO I=1,P

TR_MMM=TR_MMM+MMM(I,I)

END DO

A11S=1.0*TR_M1/P/(N1-1)

A12S=1.0*TR_M2/P/(N2-1)

A3S=1.0*(TR_MMM/P-3.0*n*(n+1)*P*A2S*A1S-n*P**2

& *A1S**3)/n/(n**2+3*n+4)
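
C A3S ESTIMATES a3 = tr(SIGMA**3)/P FROM tr(M**3); IT APPEARS IN
C THE VARIANCE TERM R OF THE TSY STATISTIC BELOW.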

GAM1=A21S/A11S**2

GAM2=A22S/A12S**2

R=(1.0*(A2S**3/A1S**6-2.0*A2S*A3S/A1S**5+A4S/A1S**4))/P

SI1_SQ=4.0*(A2S**2/A1S**4+2.0*(N1-1)*R)/(N1-1)**2

SI2_SQ=4.0*(A2S**2/A1S**4+2.0*(N2-1)*R)/(N2-1)**2

SUM=1.0*(SI1_SQ+SI2_SQ)

TSY_SQ=1.0*(GAM1-GAM2)**2/SUM

C WRITE(6,*)'TSY_SQ=',TSY_SQ
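
C TSY_SQ IS THE SQUARE OF SRIVASTAVA AND YANAGIHARA (2010)'S
C STATISTIC, WHICH COMPARES THE RATIOS GAM1 = A21S/A11S**2 AND
C GAM2 = A22S/A12S**2 AND STANDARDIZES THEIR DIFFERENCE BY
C SI1_SQ + SI2_SQ; IT IS ALSO REFERRED TO A CHI-SQUARED(1)
C CRITICAL VALUE IN PHASE 7.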

C============================================================

C PHASE 6 COMPUTE SCHOTT (2007)'S STATISTIC, TJ

C============================================================

CALL MRRRR (P,P,S1,P,P,P,S2,P,P,P,S1S2,P)

CALL WRRRN ('S1*S2 = S1S2 ',P,P,S1S2,P,0)

TR_S1S2=0.0

DO I=1,P

TR_S1S2=TR_S1S2+S1S2(I,I)

END DO

WRITE(6,*) 'TR(S1*S2) =',TR_S1S2

TJ=(N1-1)*(N2-1)*(A21_HAT+A22_HAT-2.0*TR_S1S2/P)/2.0/

& (N1+N2-2)/A2


TJ_SQ=TJ**2

WRITE(6,*)'TJ = ',TJ,' TJ_SQ = ',TJ_SQ
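
C TJ IS SCHOTT (2007)'S STATISTIC.  ITS NUMERATOR IS BASED ON
C A21_HAT + A22_HAT - 2*tr(S1*S2)/P, WHOSE POPULATION ANALOGUE
C tr((SIGMA1 - SIGMA2)**2)/P IS ZERO UNDER H0; TJ IS COMPARED
C WITH THE ONE-SIDED NORMAL CRITICAL VALUE CRV IN PHASE 7.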

C============================================================

C PHASE 7 COMPUTE PROPORTIONS OF REJECTIONS

C============================================================

CRV=ANORIN(0.95)

WRITE(6,*) 'CRITICAL VALUE =Z(ALPHA)=',CRV

CRV1=CHIIN(0.95,1.0)

WRITE(6,*)'95th PERCENTILE OF A CHI-SQUARED WITH 1 D.F. =',CRV1
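
C FOR EACH OF THE K REPLICATIONS (LOOP 10), T2**2, TS2**2 AND
C TSY**2 ARE COMPARED WITH THE 95TH PERCENTILE OF CHI-SQUARED(1)
C AND TJ WITH Z(0.95); COUNT1-COUNT4 ACCUMULATE THE REJECTIONS,
C AND ASL1-ASL4 BELOW ARE THE RESULTING PROPORTIONS OF
C REJECTIONS (ATTAINED SIGNIFICANCE LEVELS UNDER H0, EMPIRICAL
C POWERS UNDER THE ALTERNATIVE).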

IF (T2_SQ.GT.CRV1) THEN

COUNT1 =COUNT1+1

ENDIF

IF (TJ.GT.CRV) THEN

COUNT2 =COUNT2+1

ENDIF

IF (TS2_SQ.GT.CRV1) THEN

COUNT3 =COUNT3+1

ENDIF

IF (TSY_SQ.GT.CRV1) THEN

COUNT4 =COUNT4+1

ENDIF

10 CONTINUE

ASL1=1.0*COUNT1/K*1.0

WRITE(6,11) ASL1


11 FORMAT(' ASL1 = (# T2^2 > CHI(ALPHA))/K = ',T35,F7.4)

ASL2=1.0*COUNT2/K*1.0

WRITE(6,12) ASL2

12 FORMAT(' ASL2 = (# TJ > Z(ALPHA))/K = ',T35,F7.4)

ASL3=1.0*COUNT3/K*1.0

WRITE(6,13) ASL3

13 FORMAT(' ASL3 = (# TS2^2 > CHI(ALPHA))/K = ',T35,F7.4)

ASL4=1.0*COUNT4/K*1.0

WRITE(6,14) ASL4

14 FORMAT(' ASL4 = (# TSY^2 > CHI(ALPHA))/K = ',T35,F7.4)

8888 CONTINUE

9999 CONTINUE

STOP

END


BIOGRAPHY

NAME Miss Saowapha Chaipitak

ACADEMIC BACKGROUND B.Sc. (Statistics), Naresuan University,

Thailand, 1999.

M.S. (Applied Statistics), National Institute

of Development Administration, Thailand,

2006.

PRESENT POSITION Lecturer,

Faculty of Science and Technology,

Rajamangala University of Technology

Thanyaburi (RMUTT), Thailand.

EXPERIENCE 1999-2000: General Administrative Officer,

Phitsanulok Municipal Court, Thailand.

2000-present: Lecturer,

Faculty of Science and Technology,

Rajamangala University of Technology

Thanyaburi (RMUTT), Thailand.

2007: Received a scholarship from “the

Commission on Higher Education, Thailand”

for enrolling in the doctoral level program at

the School of Applied Statistics, National

Institute of Development Administration

(NIDA), Thailand.


Publication: Saowapha Chaipitak and Samruam

Chongcharoen, 2013. A Test for Testing the

Equality of Two Covariance Matrices for High-

dimensional Data. Journal of Applied

Sciences. 13 (February): 270-277.