
CANONICAL ANALYSIS OF SEVERAL SETS OF VARIABLES

by

JON ROBERTS KETTENRING

Department of Statistics

University of North Carolina at Chapel Hill

Institute of Statistics Mimeo Series No. 615

APRIL 1969

The research was partially supported by the National Institutes of Health, Institute of General Medical Science, Grant No. GM-12865-05.


TABLE OF CONTENTS

ACKNOWLEDGMENTS

INTRODUCTION AND SUMMARY

I. RELATIONS BETWEEN TWO NONSINGULAR SETS OF VARIABLES
   1.1 Preliminary Remarks
   1.2 Notation
   1.3 Introduction to Canonical Analysis
   1.4 The Basic Theorem of Canonical Analysis
   1.5 Minimal Conditions for Canonical Analysis
   1.6 Principal Component Models for Pairs of Canonical Variates
   1.7 Some Invariants under Transformations of One Set
   1.8 Modified Pairs of Canonical Variates

II. CANONICAL ANALYSIS IN THE PRESENCE OF SINGULARITIES
   2.1 Introduction
   2.2 Singular Sets of Variates
   2.3 The Extension Theorem
   2.4 Canonical Analysis of a Two Way Contingency Table
   2.5 Scoring Qualitative Responses in a One Way Analysis of Variance
   2.6 Canonical Analysis with Linear Restrictions

III. MODELS FOR THE CANONICAL ANALYSIS OF SEVERAL SETS OF VARIABLES
   3.1 Introduction
   3.2 Notation
   3.3 Criteria for the Selection of First Stage Canonical Variates
   3.4 The Sum of Correlations Model
   3.5 The Maximum Variance Model
   3.6 The Sum of Squared Correlations Model
   3.7 The Minimum Variance Model
   3.8 The Generalized Variance Model
   3.9 Higher Stage Canonical Variates

IV. PROCEDURES FOR THE CANONICAL ANALYSIS OF SEVERAL SETS OF VARIABLES
   4.1 Introduction
   4.2 The Sum of Correlations Procedure
   4.3 The Sum of Squared Correlations Procedure
   4.4 The Generalized Variance Procedure
   4.5 The Maximum Variance Procedure
   4.6 The Minimum Variance Procedure

V. PRACTICAL CONSIDERATIONS AND EXAMPLES
   5.1 Introduction
   5.2 Interpretation of the Results
   5.3 Starting Points for Iterative Procedures
   5.4 Example Number One
   5.5 Example Number Two
   5.6 Example Number Three

BIBLIOGRAPHY

APPENDIX


ACKNOWLEDGMENTS

It is a special pleasure to thank my adviser, Professor N. L. Johnson, for his guidance during the course of this research. Through numerous discussions, he has been a vital source of inspiration and encouragement.

Much of the work on Chapters II and III was accomplished at Bell Telephone Laboratories. While there, I received numerous helpful comments and suggestions from Dr. J. D. Carroll and Dr. R. Gnanadesikan.

I also would like to thank the other members of my doctoral committee, Professor I. M. Chakravarti, Professor R. L. Davis, Professor J. E. Grizzle, and Professor P. K. Sen, for their interest in my work.

In addition, I am grateful to Mr. James O. Kitchen for providing extensive advice and assistance on the computer programs which were required.


INTRODUCTION AND SUMMARY

The main purpose of this study is to investigate ways of extending the theory of canonical correlation or canonical analysis to deal with more than two sets of random variables.

The first chapter contains some new approaches to the classical concepts and theory for two nonsingular sets of variables. In addition, a number of results which are required for the proper treatment of more than two sets are also presented. The results in Section 1.5, on the minimal conditions for canonical analysis, and Section 1.6, on principal component models for pairs of canonical variates, are especially important in this connection.

The two set theory is further developed in the second chapter to cover situations where one or both of the sets may be singular. It is shown, in particular, that the analysis can be carried out without eliminating the singularities. The key theorem is given in Section 2.3, and three applications of it follow in the ensuing sections.

Five different techniques for the canonical analysis of several sets of variables are considered in Chapters III and IV. Two of these are due to Horst ([8], [9], and [10]), a third is due to Steel [21], and the remaining two are new.¹

¹ The numbers in square brackets refer to the bibliography.


One important feature of these methods is that they all reduce to the classical procedure when the number of sets is only two. A second important common feature is that each method calls for the selection of (canonical) variables, one from each set, according to a criterion of optimizing some function of their correlation matrix.

It appears that the first attempt to extend the theory of canonical analysis to three or more sets was made by Vinograde [23]. His method, however, does not possess the above mentioned second feature and hence does not fall within the framework of this study.

In Chapter III, models of the general principal component type, extensions of those in Section 1.6, are constructed for each of the five methods. The models are useful in that they help to clarify the types of effects which the methods can detect.

The actual procedures for determining the canonical variables according to the five techniques are developed in Chapter IV. Three of the procedures turn out to be iterative in nature. A favorable property of these procedures is that each must converge. Furthermore, if the starting points are appropriately chosen, the derived variables will necessarily be the desired canonical variables. (Suggestions for starting points are made in Section 5.3.)

Horst's "maximum correlation" method (see [8] or [10]) is one of those which requires an iterative procedure. He, too, proposed an iterative procedure for finding the canonical variables, but his differs from the one recommended here (see Section 4.2). A strong drawback of Horst's procedure is that its convergence properties are not known.


Steel [21], using compound matrices, developed a complicated system of equations which one could solve, at least in theory, to obtain the canonical variables for his method. The equations in fact are difficult to handle; and, consequently, a new approach has been evolved (cf. Section 4.4). The resulting equations can be solved by an iterative procedure which is quite like the one proposed for the maximum correlation technique.

Three examples are given in the last chapter to illustrate the various methods. Some specific practical suggestions are also included.


CHAPTER I

RELATIONS BETWEEN TWO NONSINGULAR SETS OF VARIABLES

1.1 Preliminary Remarks.

The classical problem of identifying and measuring relations between two sets of variables was first tackled by Hotelling in a brief 1935 paper [11]. His definitive study of the problem appeared the next year in [12] and still stands as a key reference in the multivariate statistical literature.

An example taken from [12] should help to pin down the problem which is being considered. In this example, a set of mental test variables are compared with a set of physical measurement variables. "The questions then arise of determining the number and nature of the independent relations of mind and body ... and of extracting from the multiplicity of correlations in the system suitable characterizations of these independent relations."

Hotelling's method is based on an analysis of the covariance matrix of the two sets of variates; and, consequently, the relations which can be found using it are necessarily of a linear type.

This chapter contains a rather thorough investigation of his method as it would be applied to an arbitrary positive definite covariance matrix. The notation needed to proceed further is set down in the next section. The focus in the third section shifts back to a description of the analysis. The next three sections contain technical results associated with Hotelling's technique. The last two sections include additional results which are relevant to some special types of relations between the sets. Much of this chapter has been motivated by and is directed towards problems to be taken up in Chapters III and IV.

1.2 Notation.

Let the two sets of random variables be

$X_1' = ({}_1x_1, {}_1x_2, \ldots, {}_1x_{p_1})$ and $X_2' = ({}_2x_1, {}_2x_2, \ldots, {}_2x_{p_2})$, with $p_1 \le p_2$.

Adjoining these together produces a $(p \times 1)$ vector

$X' = (X_1', X_2')$ with $p = p_1 + p_2$.

The covariance matrix of $X$, assumed positive definite throughout this chapter, is

$\Sigma = ((\sigma_{ij})) = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}$,

where $\Sigma_{11}$ is $(p_1 \times p_1)$ and $\Sigma_{22}$ is $(p_2 \times p_2)$. The rank of $\Sigma_{12}$ is designated by $r$.

It is often convenient to make "preliminary" nonsingular linear transformations of the following type within each set:

$Y_1 = T_1^{-1} X_1$ and $Y_2 = T_2^{-1} X_2$,

where $T_1$ and $T_2$ are matrices such that $T_1 T_1' = \Sigma_{11}$ and $T_2 T_2' = \Sigma_{22}$. Let $D_T = \mathrm{diag}(T_1, T_2)$; then the transformations can be expressed more concisely as

$Y = D_T^{-1} X$, where $Y' = (Y_1', Y_2')$.

($T_1$ and $T_2$ may be taken to be lower triangular, a computationally expedient form, or symmetric, in which case it is convenient to write $\Sigma_{ii}^{1/2}$ for $T_i$.) The covariance matrix of $Y$ is

$R = \begin{bmatrix} I & R_{12} \\ R_{21} & I \end{bmatrix} = D_T^{-1}\,\Sigma\,D_T^{-1\prime}$.

Much of what is to follow will concern

$Z' = (Z_1', Z_2')$, where $Z_i = B_i^* Y_i = B_i X_i$, $i = 1, 2$,

$Z_i' = ({}_iz_1, {}_iz_2, \ldots, {}_iz_{p_i})$, $B_i^{*\prime} = ({}_i\underline{b}_1^*, {}_i\underline{b}_2^*, \ldots, {}_i\underline{b}_{p_i}^*)$, and $B_i' = ({}_i\underline{b}_1, {}_i\underline{b}_2, \ldots, {}_i\underline{b}_{p_i})$,

with $B_i^*$ an orthogonal matrix. Writing $D_{B^*} = \mathrm{diag}(B_1^*, B_2^*)$ and $D_B = \mathrm{diag}(B_1, B_2)$, it follows that

$Z = D_{B^*} Y = D_B X$.

The relation between $D_{B^*}$ and $D_B$ is $D_{B^*} = D_B D_T$.

The covariance matrix of $Z$ is

(1.2.1) $\Phi = \begin{bmatrix} I & \Phi_{12} \\ \Phi_{21} & I \end{bmatrix} = D_B\,\Sigma\,D_B' = D_{B^*}\,R\,D_{B^*}'$.

Relations between the two sets are most commonly studied through the analysis of an appropriately selected $Z$. Special notation will not be introduced to distinguish $\Phi$, or any of the quantities depending on $\Phi$, when $Z$ is the chosen one. Where a distinction is necessary, it should be clear from the context.

Most attention will be given to the pairs of variables

$Z_{(i)}' = ({}_1z_i, {}_2z_i)$, $i = 1, 2, \ldots, p_1$.

The covariance matrix of $Z_{(i)}$ is

$\Phi_{(i)} = \begin{bmatrix} 1 & \phi_i \\ \phi_i & 1 \end{bmatrix}$.

Denote the ordered eigenvalues of $\Phi_{(i)}$ by ${}_1\lambda_i \ge {}_2\lambda_i$ and the corresponding orthonormal eigenvectors by ${}_1\underline{e}_i$ and ${}_2\underline{e}_i$. When discussing the variables $Z_{(i)}$, there are implicitly in the background some $Z_{(1)}, Z_{(2)}, \ldots, Z_{(i-1)}$ such that the covariance structure of the $i$ pairs taken together is compatible with (1.2.1).

Use will also be made of variables ${}_1\tilde{z}_i$ and ${}_2\tilde{z}_i$, $i = 1, 2, \ldots, p_1$, which are general unit variance linear compounds of $X_1$ and $X_2$ respectively.

Additional notation will be introduced as needed. Some special symbols to be used throughout are listed below:

$\|A\|$ for the Euclidean norm of a rectangular matrix $A$;
$|A|$ for the determinant of a square matrix $A$;
$A^-$ for a generalized inverse of a rectangular matrix $A$;
$\mathrm{tr}(A)$ for the trace of a square matrix $A$;
$c_j(A)$ for the $j$-th largest eigenvalue of a matrix $A$ with real eigenvalues;
$V(A)$ for the vector space generated by the columns of the matrix $A$ (i.e., the range space of $A$);
$N(A)$ for the null space of the matrix $A$;
$\dim[V]$ for the dimension of the vector space $V$;
$E(X)$ for the expectation of $X$;
$\mathrm{var}(X)$ for the covariance matrix of $X$;
$\mathrm{corr}(X, Y)$ for the correlation matrix of $X$ and $Y$;
$\mathrm{cov}(X, Y)$ for the covariance matrix of $X$ and $Y$;
$\mathbf{1}$ for a vector of ones;
$\delta_{ij}$ for the Kronecker delta.
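As a minimal computational sketch of the preliminary transformation (assuming NumPy/SciPy and an arbitrary synthetic positive definite $\Sigma$), the matrices $T_1$ and $T_2$ may be obtained by Cholesky factorization, after which the covariance matrix of $Y = D_T^{-1}X$ has identity within-set blocks:

```python
import numpy as np
from scipy.linalg import cholesky

rng = np.random.default_rng(0)
p1, p2 = 2, 3

# An illustrative positive definite covariance matrix Sigma for p = p1 + p2.
A = rng.standard_normal((p1 + p2, p1 + p2))
Sigma = A @ A.T + 5 * np.eye(p1 + p2)
S11, S12, S22 = Sigma[:p1, :p1], Sigma[:p1, p1:], Sigma[p1:, p1:]

# Lower triangular T_i with T_i T_i' = Sigma_ii (one admissible choice of T_i).
T1 = cholesky(S11, lower=True)
T2 = cholesky(S22, lower=True)

# Covariance of Y = diag(T1, T2)^{-1} X has identity diagonal blocks and
# off-diagonal block R12 = T1^{-1} Sigma12 T2^{-1}'.
R12 = np.linalg.solve(T1, S12) @ np.linalg.inv(T2).T
R = np.block([[np.eye(p1), R12], [R12.T, np.eye(p2)]])

DT = np.block([[T1, np.zeros((p1, p2))], [np.zeros((p2, p1)), T2]])
assert np.allclose(R, np.linalg.solve(DT, Sigma) @ np.linalg.inv(DT).T)
```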

1.3 Introduction to Canonical Analysis

The formal extraction procedure referred to in Section 1.1 will be loosely called a "canonical analysis." It is convenient to describe the procedure in a sequential fashion. At the first stage, one looks for a pair $({}_1z_1, {}_2z_1)$ with the largest possible correlation. The derived variables are known as canonical variables and together they constitute the first pair of canonical variables. The associated correlation, denoted by $\rho_1$, is called the first canonical correlation. When one of the initial sets consists of criterion variables and the other of predictor variables, then the canonical variate representing the criterion set corresponds to Hotelling's most predictable criterion [11].

The analysis may be continued until $p_1$ pairs of canonical variates and canonical correlations have been found. At the $i$-th stage ($i = 2, 3, \ldots, p_1$), the $i$-th pair of canonical variates¹ $Z_{(i)}$ is chosen so that the corresponding $i$-th canonical correlation, designated $\rho_i$, is the maximum correlation obtainable which is consistent with

(1.3.1) $\mathrm{corr}({}_1z_j, {}_1z_i) = 0$, $j = 1, 2, \ldots, i-1$;

(1.3.2) $\mathrm{corr}({}_2z_j, {}_2z_i) = 0$, $j = 1, 2, \ldots, i-1$;

(1.3.3) $\mathrm{corr}({}_1z_j, {}_2z_i) = 0$, $j = 1, 2, \ldots, i-1$;

(1.3.4) $\mathrm{corr}({}_2z_j, {}_1z_i) = 0$, $j = 1, 2, \ldots, i-1$.

¹ The frequently appearing statement that $Z_{(i)}$ is the $i$-th pair of canonical variates is an imprecise way of saying that $Z_{(i)}$ is a particular one of the admissible $i$-th pairs of canonical variates.

It is not necessary to impose all of these constraints in order to arrive at the $i$-th pair of canonical variates. The derivations which follow the sequential formation of canonical variates, as outlined above, generally impose (1.3.1) and (1.3.2) in the process of determining the higher order pairs (see, for example, Anderson [1] or Lancaster [15]) and then proceed to show that (1.3.3) and (1.3.4) are also satisfied. In fact, even these two commonly used constraints ((1.3.1) and (1.3.2)) are more than is usually required. This point will be dealt with further in Section 1.5.

If $p_1 < p_2$, a formal completion of the analysis may be accomplished by finding an additional $(p_2 - p_1)$ canonical variables ${}_2z_j$, $j = p_1+1, p_1+2, \ldots, p_2$, associated with the second set.² The only conditions imposed on these variables are that they be uncorrelated with each of the $2p_1$ other canonical variables and among themselves. Although it is the canonical pairs $Z_{(i)}$ which receive most of the attention, the other $(p_2 - p_1)$ canonical variables can also be enlightening. In particular, the two sets $\{{}_1z_1, \ldots, {}_1z_{p_1}\}$ and $\{{}_2z_{p_1+1}, \ldots, {}_2z_{p_2}\}$ are always uncorrelated.

² Any $Z$ containing the $p_1$ canonical pairs plus these $(p_2 - p_1)$ additional canonical variables is referred to as a canonical $Z$.

The results of Section 1.4 establish that

(1.3.5) $\sum_{i=1}^{p_1} \rho_i^2 = \|R_{12}\|^2$.

The elements of $R_{12}$ are all that can be explained by the canonical analysis. (1.3.5) indicates in what sense the statement, "the ensemble of canonical variables accounts for all existing relations between the two sets," is valid.

The number of non-zero canonical correlations, which is the same as $r$, the rank of $\Sigma_{12}$, $R_{12}$, or $\Phi_{12}$, indicates the number of dimensions or factors common to both sets. The nature of these dimensions or factors can be clarified in at least two ways. One approach is given by Rao ([20], p. 496): he shows that the minimum number of common factors needed to explain $X_1$ and $X_2$ separately (the same common factors being used in both representations) by the usual factor analysis model (see Rao ([20], p. 498)) is just $r$. In Section 1.6, each canonical pair $Z_{(i)}$ is expressed in terms of a one or two principal component factor model. It is only the first $r$ pairs which admit interesting representations of this type. With respect to the single principal component factor model, the number of non-trivial factors, i.e., the number of $Z_{(i)}$ with non-trivial representations in terms of one factor, is again $r$.

An important feature of a canonical analysis is that the end products, namely the canonical variates and correlations, are invariant under nonsingular transformations of either set; this point was made by Hotelling [12] and has been used advantageously by Horst ([8], [9], and [10]), Steel [21], and others. It is, therefore, legitimate to make preliminary transformations from $X_1$ and $X_2$ to $Y_1$ and $Y_2$ in order to achieve orthonormal bases for the two sets of variables. In terms of $Y$, the canonical analysis is exposed as nothing more than the selection of orthogonal transformations ($B_1^*$ and $B_2^*$) to new orthonormal bases, $Z_1$ and $Z_2$, which have a particularly simple correlational structure.
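The invariance property can be illustrated numerically (a small added sketch, assuming NumPy/SciPy and a synthetic covariance matrix): the canonical correlations, computed here as the singular values of $R_{12}$, are unchanged when arbitrary nonsingular transformations are applied within each set.

```python
import numpy as np
from scipy.linalg import cholesky, inv, svdvals

def canonical_correlations(Sigma, p1):
    """Canonical correlations of X1 (first p1 coordinates) with X2 (the rest)."""
    S11, S12 = Sigma[:p1, :p1], Sigma[:p1, p1:]
    S22 = Sigma[p1:, p1:]
    T1 = cholesky(S11, lower=True)
    T2 = cholesky(S22, lower=True)
    R12 = inv(T1) @ S12 @ inv(T2).T          # whitened cross-covariance
    return svdvals(R12)                      # rho_1 >= rho_2 >= ...

rng = np.random.default_rng(1)
p1, p2 = 2, 4
A = rng.standard_normal((p1 + p2, p1 + p2))
Sigma = A @ A.T + 6 * np.eye(p1 + p2)

rho = canonical_correlations(Sigma, p1)

# Nonsingular (with probability one) transformations within each set.
C1 = rng.standard_normal((p1, p1)) + 3 * np.eye(p1)
C2 = rng.standard_normal((p2, p2)) + 3 * np.eye(p2)
C = np.block([[C1, np.zeros((p1, p2))], [np.zeros((p2, p1)), C2]])
rho_transformed = canonical_correlations(C @ Sigma @ C.T, p1)

assert np.allclose(rho, rho_transformed)     # invariance of the canonical correlations
```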


1.4 The Basic Theorem of Canonical Analysis³

³ The material in this section is included mainly for completeness. The proof of Theorem 1.4.2, however, has a new twist which is useful for the considerations in Section 1.5.

Theorem 1.4.1 can be fairly described as the basic theorem of canonical analysis because, together with its corollary, it provides the essential information for carrying out the canonical analysis calculations. The theorem establishes the existence of a transformation $D_B$ or $D_{B^*}$ which induces a unique $\Phi$ matrix with the property that the elements of $\Phi_{12}$ are all zero except along its main diagonal where they are non-negative, less than one, and in descending order of magnitude. The underlying $Z$ must be a canonical vector, as is demonstrated in Theorem 1.4.2.

THEOREM 1.4.1. Suppose $\Sigma$ is positive definite. Then there exists a matrix

(1.4.1) $D_B = \mathrm{diag}(B_1, B_2)$

and a unique matrix

(1.4.2) $\Phi = D_B\,\Sigma\,D_B'$

such that

(1.4.3) $\Phi = \begin{bmatrix} I & \Phi_{12} \\ \Phi_{21} & I \end{bmatrix}$,

with

(1.4.4) $\Phi_{12} = (D \;\; 0)$, where $D$ is $(p_1 \times p_1)$ and $0$ is $(p_1 \times (p_2 - p_1))$,

and

(1.4.5) $D = \mathrm{diag}(\rho_1, \rho_2, \ldots, \rho_{p_1})$, $\rho_1 \ge \rho_2 \ge \cdots \ge \rho_{p_1} \ge 0$.

The $\rho_1^2, \rho_2^2, \ldots, \rho_{p_1}^2$ must be the eigenvalues of $T_1^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}T_1^{-1\prime}$ and

(1.4.6) $B_1 = B_1^* T_1^{-1}$,

where $B_1^*$ is any orthogonal matrix such that $B_1^* T_1^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}T_1^{-1\prime} B_1^{*\prime} = D^2$. Also $B_2 = B_2^* T_2^{-1}$, where $B_2^*$ is any orthogonal matrix satisfying

(1.4.7) $B_1^* R_{12} B_2^{*\prime} = (D \;\; 0)$.

(1.4.7) determines the first $r$ rows of $B_2^*$, $r$ being the number of non-zero eigenvalues $\rho_i^2$; the remaining rows are any orthogonal completion of $B_2^*$.

Proof. Using Lemma A2⁴, there exist orthogonal matrices $B_1^*$ and $B_2^*$ such that $B_1^* R_{12} B_2^{*\prime} = (D \;\; 0)$ with $D$ as defined in (1.4.4) and (1.4.5). The $\rho_i^2$, according to Lemma A2, are the eigenvalues of $R_{12}R_{21}$. Moreover, $B_1^*$ may be any orthogonal matrix such that $B_1^* R_{12}R_{21} B_1^{*\prime} = D^2$. Note that the eigenvalues of $R_{12}R_{21}$ are the same as those of $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$. Let $D_B = D_{B^*} D_T^{-1}$. This definition of $D_B$ yields a $\Phi$ matrix of the type specified in (1.4.2)-(1.4.5). Suppose now that $D_{\hat{B}}$ is any other matrix of the form given in (1.4.1) which yields a matrix $\hat{\Phi}$ of the same type. Then

$\hat{\Phi}_{12} = \hat{B}_1 \Sigma_{12} \hat{B}_2' = (\hat{B}_1 T_1)\, T_1^{-1} \Sigma_{12} T_2^{-1\prime}\, (\hat{B}_2 T_2)'$,

where $\hat{B}_i T_i$ is an orthogonal matrix. Hence the $\hat{\phi}_i^2$ must be the eigenvalues of $R_{12}R_{21}$, and there is one and only one $\Phi$ matrix of the prescribed form. The properties of $B_2^*$ follow from the two equations

$B_1^* R_{12} = (D \;\; 0)\,B_2^*$ and $B_2^* B_2^{*\prime} = I$.

⁴ The "A" indicates that the lemma is in the appendix.

The first corollary contains alternative expressions, in terms of the original $X$ variables, for parts of the previous theorem.

COROLLARY 1.4.1.1. $B_1$ is any matrix such that

(1.4.8) $B_1 \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} B_1' = D^2$ and $B_1 \Sigma_{11} B_1' = I$.

Alternatively, the rows of $B_1$ are any complete set of eigenvectors of $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$, normalized so that $B_1 \Sigma_{11} B_1' = I$; the $i$-th row ${}_1\underline{b}_i'$ of $B_1$ is associated with the $i$-th largest eigenvalue $\rho_i^2$. The first $r$ rows of $B_2$ are determined by the equation

(1.4.9) ${}_2\underline{b}_j' = \rho_j^{-1}\, {}_1\underline{b}_j' \Sigma_{12} \Sigma_{22}^{-1}$, $j = 1, 2, \ldots, r$,

and the remaining $(p_2 - r)$ rows are chosen so that

(1.4.10) $B_2 \Sigma_{22} B_2' = I$.

Proof. Suppose (1.4.8) holds. Then $B_1 T_1$ is orthogonal and thus may be taken as $B_1^*$ in the theorem. Suppose now that

(1.4.11) $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} B_1' = B_1' D^2$ and $B_1 \Sigma_{11} B_1' = I$.

But (1.4.11) holds if and only if

$T_1^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}T_1^{-1\prime}\,(B_1 T_1)' = (B_1 T_1)' D^2$ and $(B_1 T_1)(B_1 T_1)' = I$.

Again $B_1 T_1$ is orthogonal and, because of (1.4.6), may be taken as $B_1^*$ in the theorem. The statement involving (1.4.9) and (1.4.10) is an immediate consequence of the theorem.
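A numerical sketch of the corollary (added for illustration; NumPy/SciPy and a synthetic covariance matrix are assumed) obtains $B_1$ from the generalized eigenproblem $\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\,\underline{b} = \rho^2\,\Sigma_{11}\,\underline{b}$, which enforces $B_1\Sigma_{11}B_1' = I$, and then forms the first $r$ rows of $B_2$ from (1.4.9):

```python
import numpy as np
from scipy.linalg import eigh, inv

def canonical_coefficients(Sigma, p1, tol=1e-10):
    """Coefficient matrices and canonical correlations in the spirit of Corollary 1.4.1.1."""
    S11, S12 = Sigma[:p1, :p1], Sigma[:p1, p1:]
    S21, S22 = Sigma[p1:, :p1], Sigma[p1:, p1:]

    # Generalized symmetric eigenproblem:
    #   Sigma12 Sigma22^-1 Sigma21 b = rho^2 Sigma11 b, with B1 Sigma11 B1' = I.
    lam, V = eigh(S12 @ inv(S22) @ S21, S11)     # ascending eigenvalues, V' S11 V = I
    order = np.argsort(-lam)
    rho = np.sqrt(np.clip(lam[order], 0.0, None))
    B1 = V[:, order].T                           # i-th row corresponds to i-th largest rho_i^2

    r = int(np.sum(rho > tol))
    B2_first_r = (B1[:r] @ S12 @ inv(S22)) / rho[:r, None]   # equation (1.4.9)
    return rho, B1, B2_first_r

# With these choices, B1 Sigma12 B2' has diag(rho_1, ..., rho_r) in its first r columns.
```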

THEOREM 1.4.2. Any $Z$ with covariance matrix equal to the unique $\Phi$ of Theorem 1.4.1 is a vector of canonical variates.

Proof. That $({}_1z_1, {}_2z_1)$ may be taken as the first pair of canonical variates follows from Lancaster [15] or Anderson ([1], p. 295). (Both use an argument like the one used below for the other pairs.) Now for $i = 2, 3, \ldots, r$ in turn select $({}_1\tilde{z}_i, {}_2\tilde{z}_i)$ to maximize $\mathrm{corr}({}_1\tilde{z}_i, {}_2\tilde{z}_i)$ subject to (1.3.1), where ${}_1\tilde{z}_i = \underline{\alpha}' Z_1$, $\underline{\alpha}' = (\alpha_1, \alpha_2, \ldots, \alpha_{p_1})$, $\|\underline{\alpha}\| = 1$, ${}_2\tilde{z}_i = \underline{\beta}' Z_2$, $\underline{\beta}' = (\beta_1, \beta_2, \ldots, \beta_{p_2})$, and $\|\underline{\beta}\| = 1$. In other words,

$\sum_{j=i}^{p_1} \alpha_j \beta_j \rho_j$

is to be maximized subject to $\alpha_j = 0$, $j = 1, 2, \ldots, i-1$.

Arguing as in Anderson ([1], p. 295), let $(\gamma_i, \gamma_{i+1}, \ldots, \gamma_{p_2}) = (\alpha_i\rho_i, \alpha_{i+1}\rho_{i+1}, \ldots, \alpha_r\rho_r, 0, \ldots, 0)$. Using the Cauchy-Schwarz inequality,

$\Bigl|\sum_{j=i}^{p_1} \alpha_j\beta_j\rho_j\Bigr| = \Bigl|\sum_{j=i}^{p_2} \gamma_j\beta_j\Bigr| \le \Bigl(\sum_{j=i}^{p_2} \gamma_j^2\Bigr)^{1/2}\Bigl(\sum_{j=i}^{p_2} \beta_j^2\Bigr)^{1/2}$,

with equality if and only if $(\gamma_i, \gamma_{i+1}, \ldots, \gamma_{p_2}) \propto (\beta_i, \beta_{i+1}, \ldots, \beta_{p_2})$. Assuming this equality condition, $\beta_j = 0$, $j = r+1, r+2, \ldots, p_2$. Changing the sign of all the $\beta$'s, if necessary, one is left with

(1.4.12) $\sum_{j=i}^{p_1} \alpha_j\beta_j\rho_j = C\sum_{j=i}^{p_1} \alpha_j^2\rho_j^2 = C\Bigl\{\sum_{j=i}^{p_1} (\rho_j^2 - \rho_i^2)\alpha_j^2 + \rho_i^2\Bigr\}$ for some $C > 0$.

If $\rho_i = \rho_{i+1} = \cdots = \rho_k$ (and $\rho_k > \rho_{k+1}$ if $k < p_1$), then to maximize (1.4.12) it is necessary that $\alpha_j = 0$, $j = k+1, k+2, \ldots, p_1$, and hence $\beta_j = 0$, $j = k+1, k+2, \ldots, r$. Clearly $\beta_j = 0$, $j = 1, 2, \ldots, i-1$, in addition, so that (1.3.2) holds. Evidently

(1.4.13) ${}_1\tilde{z}_i = \sum_{j=i}^{k} \alpha_j\, {}_1z_j$, $\sum_{j=i}^{k} \alpha_j^2 = 1$; ${}_2\tilde{z}_i = \sum_{j=i}^{k} \beta_j\, {}_2z_j$, $\sum_{j=i}^{k} \beta_j^2 = 1$; $\alpha_j = \beta_j$, $j = i, i+1, \ldots, k$; and $\mathrm{corr}({}_1\tilde{z}_i, {}_2\tilde{z}_i) = \rho_i$.

Take $\alpha_i = \beta_i = 1$ and note that conditions (1.3.1)-(1.3.4) are satisfied, so that $Z_{(i)}$ is the $i$-th pair of canonical variates. (A similar proof holds starting from (1.3.2) instead of (1.3.1).) For $r < p_1$, $({}_1z_{r+1}, {}_1z_{r+2}, \ldots, {}_1z_{p_1})$ is uncorrelated with $Z_2$, which means that the $(r+1)$-st canonical correlation is zero. In view of the structure of $\Phi$, this is enough for $Z$ to be a canonical vector.

The composition of an arbitrary $i$-th pair of canonical variates, $i \le r$, is spelled out in the corollary.

COROLLARY 1.4.2.1. Suppose $Z_{(1)}, Z_{(2)}, \ldots, Z_{(i-1)}$ are fixed pairs of canonical variates for $i \le r$. Suppose $Z_{(i)}^*, Z_{(i+1)}^*, \ldots, Z_{(k)}^*$ are also pairs of canonical variates with $\rho_i = \rho_{i+1} = \cdots = \rho_k$ and $\rho_k > \rho_{k+1}$ if $k < p_1$. Then any $i$-th pair of canonical variates must be of the form

$Z_{(i)} = \sum_{j=i}^{k} \alpha_j\, Z_{(j)}^*$, with $\sum_{j=i}^{k} \alpha_j^2 = 1$.

Proof. The proof for $i = 2, 3, \ldots, r$ follows from (1.4.13). The case $i = 1$ can be handled by a similar argument.

1.5 Minimal Conditions for Canonical Analysis

This section contains results pertaining to the effects of restrictions (1.3.1)-(1.3.4) on the generation of higher order canonical variates. The effects of two alternative constraints are also considered.

THEOREM 1.5.1. Let $Z_{(1)}, Z_{(2)}, \ldots, Z_{(i-1)}$ be the first $(i-1)$ pairs of canonical variates with $i \le r+1$. Then restrictions (1.3.1) and (1.3.4) on the $i$-th pair of variates are equivalent, as are the restrictions (1.3.2) and (1.3.3). Furthermore, for $i \le r$, variables ${}_1\tilde{z}_i$ and ${}_2\tilde{z}_i$ having the maximum correlation, subject to only one of the constraints (1.3.1)-(1.3.4), must in fact satisfy all four and thus may be identified as the $i$-th pair of canonical variates.

Proof. Let $Z_1$ and $Z_2$ contain the elements of $Z_{(1)}, Z_{(2)}, \ldots, Z_{(i-1)}$ in their proper places. Form ${}_1\tilde{z}_i = \underline{\alpha}' Z_1$ and ${}_2\tilde{z}_i = \underline{\beta}' Z_2$ as in the proof of Theorem 1.4.2. Requiring that ${}_1\tilde{z}_i$ satisfy (1.3.1) is equivalent to having $\alpha_j = 0$, $j = 1, 2, \ldots, i-1$. Requiring that ${}_1\tilde{z}_i$ satisfy (1.3.4) is equivalent to having $\alpha_j\rho_j = 0$, $j = 1, 2, \ldots, i-1$, which is the same as $\alpha_j = 0$, $j = 1, 2, \ldots, i-1$, since $i \le r+1$. This demonstrates that (1.3.1) and (1.3.4) are equivalent for $i \le r+1$, and a similar argument proves the equivalence of (1.3.2) and (1.3.3), again for $i \le r+1$. The proof of the second part of the theorem is contained in the proof of Theorem 1.4.2.

One constraint of the type (1.3.1)-(1.3.4) will not suffice to determine canonical variates in the case $r < i \le p_1$. Suppose, as one possibility, that ${}_1\tilde{z}_i$ and ${}_2\tilde{z}_i$ have the maximum correlation subject only to (1.3.1); since this maximum is zero, it is attained, for example, by the pair $({}_1z_i, {}_2z_1)$, where ${}_1z_i$ and ${}_2z_1$ are canonical variates. But this is clearly not the $i$-th canonical pair. Perhaps the most natural way of dealing with this case is to impose the two constraints (1.3.1) and (1.3.2). One then has the following well known result.

THEOREM 1.5.2. Let $Z_{(1)}, Z_{(2)}, \ldots, Z_{(i-1)}$ be the first $(i-1)$ pairs of canonical variates with $r < i \le p_1$. Then any $\tilde{Z}_{(i)}$ may be taken as the $i$-th pair of canonical variates (in other words, any $\tilde{Z}_{(i)}$ subject to (1.3.1) and (1.3.2)).


Proof. This follows from Theorems 1.4.1 and 1.4.2, noting that there exists a $Z$ with covariance matrix $\Phi$ as in (1.4.2)-(1.4.5) and containing $Z_{(1)}, Z_{(2)}, \ldots, Z_{(i)}$ in the proper locations.

Another type of constraint can be employed to generate $\min(p_1, r+1)$ canonical pairs.

THEOREM 1.5.3. Suppose $Z_{(1)}, Z_{(2)}, \ldots, Z_{(i-1)}$ are the first $(i-1)$ pairs of canonical variates. Then the variables ${}_1\tilde{z}_i$ and ${}_2\tilde{z}_i$ with the maximum correlation subject to the constraint

(1.5.1) $\mathbf{1}'\,\mathrm{corr}(Z_{(j)}, \tilde{Z}_{(i)})\,\mathbf{1} = 0$, $j = 1, 2, \ldots, i-1$,

must be the $i$-th canonical pair for $i \le \min(p_1, r+1)$ (but need not be in case $r+1 < i \le p_1$).

Proof. Start with ${}_1\tilde{z}_i = \underline{\alpha}' Z_1$ and ${}_2\tilde{z}_i = \underline{\beta}' Z_2$ exactly as in the proof of Theorem 1.4.2. Then

(1.5.2) $\mathbf{1}'\,\mathrm{corr}(Z_{(j)}, \tilde{Z}_{(i)})\,\mathbf{1} = (\alpha_j + \beta_j)(1 + \rho_j)$, $j = 1, 2, \ldots, i-1$.

Thus (1.5.1) holds if and only if $\alpha_j = -\beta_j$, $j = 1, 2, \ldots, i-1$. Then

(1.5.3) $\mathrm{corr}({}_1\tilde{z}_i, {}_2\tilde{z}_i) = -\sum_{j=1}^{i-1} \alpha_j^2 \rho_j + \sum_{j=i}^{p_1} \alpha_j \beta_j \rho_j$.

If $i \le r+1$, then $\alpha_j = \beta_j = 0$, $j = 1, 2, \ldots, i-1$, for a maximum. Using Theorems 1.5.1 and 1.5.2, it may now be inferred that $\tilde{Z}_{(i)}$ must be the $i$-th pair of canonical variates. If $i > r+1$, then $\rho_{i-1} = 0$ and the maximum value of (1.5.3), namely zero, may be obtained even with $\alpha_{i-1} \ne 0$. Consequently, $\tilde{Z}_{(i)}$ need not be the $i$-th pair of canonical variates in this case.


A constraint similar to (1.5.1) can be used with the same effect.

THEOREM 1.5.4. Theorem 1.5.3 remains correct if (1.5.1) is replaced by

(1.5.4) $(1, -1)\,\mathrm{corr}(Z_{(j)}, \tilde{Z}_{(i)})\,(1, -1)' = 0$, $j = 1, 2, \ldots, i-1$.

Proof. The proof is the same as that for Theorem 1.5.3 except that (1.5.2) is replaced by

$(1, -1)\,\mathrm{corr}(Z_{(j)}, \tilde{Z}_{(i)})\,(1, -1)' = (\alpha_j + \beta_j)(1 - \rho_j)$.

1.6. Principal Component Models for Pairs of Canonical Variates

The concepts of canonical analysis and principal component analysis can be brought together in a way which gives new insight into the structure of the canonical variates. This approach is extremely useful in developing and relating methods for the canonical analysis of several sets of variables, which is the subject of Chapter III.

The first theorem of this section and its corollary link the canonical variates to the principal components of $Y$.

THEOREM 1.6.1. The eigenvalues of $R$ are

$c_j(R) = \begin{cases} 1 + \rho_j, & j = 1, 2, \ldots, p_1 \\ 1, & j = p_1+1, p_1+2, \ldots, p_2 \\ 1 - \rho_{p+1-j}, & j = p_2+1, p_2+2, \ldots, p. \end{cases}$

If $\underline{v}_j$, $j = 1, 2, \ldots, p$, are corresponding orthonormal eigenvectors, then $\underline{v}_1, \underline{v}_2, \ldots, \underline{v}_r$, $\underline{v}_{p-r+1}, \underline{v}_{p-r+2}, \ldots, \underline{v}_p$ must be of the form

$\underline{v}_j' = \begin{cases} 2^{-1/2}\,({}_1\underline{b}_j^{*\prime},\; {}_2\underline{b}_j^{*\prime}), & j = 1, 2, \ldots, r \\ 2^{-1/2}\,(-{}_1\underline{b}_{p+1-j}^{*\prime},\; {}_2\underline{b}_{p+1-j}^{*\prime}), & j = p-r+1, p-r+2, \ldots, p, \end{cases}$

and $\underline{v}_{r+1}, \underline{v}_{r+2}, \ldots, \underline{v}_{p-r}$ can (but need not) be of the form

$\underline{v}_j' = \begin{cases} 2^{-1/2}\,({}_1\underline{b}_j^{*\prime},\; {}_2\underline{b}_j^{*\prime}), & j = r+1, r+2, \ldots, p_1 \\ (\underline{0}',\; {}_2\underline{b}_j^{*\prime}), & j = p_1+1, p_1+2, \ldots, p_2 \\ 2^{-1/2}\,(-{}_1\underline{b}_{p+1-j}^{*\prime},\; {}_2\underline{b}_{p+1-j}^{*\prime}), & j = p_2+1, p_2+2, \ldots, p-r. \end{cases}$

(The ${}_i\underline{b}_j^*$ are the rows of $B_1^*$ and $B_2^*$ which define the canonical variates.)

Proof. Let $V = (\underline{v}_1, \underline{v}_2, \ldots, \underline{v}_p)$ where the $\underline{v}_j$ are as defined above, and let $\Lambda = \mathrm{diag}(I + D,\; I,\; I - D)$ where $D$ is as in (1.4.5). Now

(1.6.1) $RV = V\Lambda \iff B_1^* R_{12} B_2^{*\prime} = (D \;\; 0)$.

The second equation in (1.6.1) holds by virtue of (1.4.7), so that the eigenvalues of $R$ are as claimed, and the $\underline{v}_j$ form a complete set of orthonormal eigenvectors. The proof that the first $r$ of the $\underline{v}_j$ and the last $r$ of the $\underline{v}_j$ must be of the prescribed form follows readily from Corollary 1.4.2.1.

COROLLARY 1.6.1.1. The first $r$ principal components of $Y$ must be of the form

$\underline{v}_j' Y = 2^{-1/2}\,({}_1z_j + {}_2z_j)$, $j = 1, 2, \ldots, r$;

and the last $r$ principal components must be of the form

$\underline{v}_j' Y = 2^{-1/2}\,(-{}_1z_{p+1-j} + {}_2z_{p+1-j})$, $j = p-r+1, p-r+2, \ldots, p$.

The corresponding variances are the respective $c_j(R)$. (The ${}_iz_j$ are canonical variates.)
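As an added numerical check (assuming NumPy/SciPy and reusing the whitening construction of Section 1.2 on a synthetic covariance matrix), the eigenvalues of the correlation matrix $R$ of $Y$ can be compared directly with $1 + \rho_i$, $1$, and $1 - \rho_i$:

```python
import numpy as np
from scipy.linalg import cholesky, inv, svdvals, eigvalsh

rng = np.random.default_rng(2)
p1, p2 = 2, 3
A = rng.standard_normal((p1 + p2, p1 + p2))
Sigma = A @ A.T + 5 * np.eye(p1 + p2)

T1 = cholesky(Sigma[:p1, :p1], lower=True)
T2 = cholesky(Sigma[p1:, p1:], lower=True)
R12 = inv(T1) @ Sigma[:p1, p1:] @ inv(T2).T
R = np.block([[np.eye(p1), R12], [R12.T, np.eye(p2)]])

rho = svdvals(R12)                       # canonical correlations rho_1 >= rho_2
expected = np.sort(np.concatenate([1 + rho, np.ones(p2 - p1), 1 - rho]))[::-1]
assert np.allclose(np.sort(eigvalsh(R))[::-1], expected)   # eigenvalue pattern of Theorem 1.6.1
```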

Okamoto and Kanazawa [18] present a development of principal component theory which is more general in scope than any previous approach. Their main result is recorded in Lemma A3. It can be used to further develop the work of Carroll [3]. Paraphrasing Carroll's central idea, the sum of squared sample correlations between each of the ${}_1z_1$ and ${}_2z_1$ sample vectors, standardized to zero sample mean and unit sample variance, and a third vector, at one's disposal, is a maximum with respect to the third vector when this third vector is proportional to the sum of the other two, and overall when the sample vectors are canonical. Suppose fixed ${}_1z_1$ and ${}_2z_1$ sample vectors and an optimal third vector are used as the coordinates of three points in Euclidean space. Then the line passing through the origin and the third point is the first principal component line (cf. Gower [7]) for the other two points (i.e., those corresponding to the sample vectors). A random variable development of Carroll's work, in the context of principal components, is possible with the aid of Lemma A3.

Suppose $Z_{(1)}, Z_{(2)}, \ldots, Z_{(i-1)}$ are the first $(i-1)$ pairs of canonical variates. Assume, for convenience, that $E(Z_{(i)}) = \underline{0}$ and $\mathrm{corr}({}_1z_i, {}_2z_i) = \phi_i \ge 0$. Consider as a representation of an arbitrary $Z_{(i)}$,

(1.6.2) $Z_{(i)} = \underline{t}_i F_i + \underline{\epsilon}_i$,

where $\underline{t}_i$ is a non-null vector, $F_i$ is a standardized (zero mean and unit variance) random variable, and $\underline{\epsilon}_i$ is an error vector. $\underline{t}_i$ and $F_i$ are to be chosen to produce the best fitting model for $Z_{(i)}$ according to the following criterion:

(1.6.3) minimize $f(\mathrm{var}(\underline{\epsilon}_i)) = g(\delta_1, \delta_2)$,

where $g$ is any strictly increasing function in $\delta_1$ and $\delta_2$, the eigenvalues of the non-negative definite matrix in the argument of $f$. Two possibilities for $f$ are the trace and the Euclidean norm functions.

According to Lemma A3, optimal choices (for all $f$ satisfying the above condition) are

(1.6.4) $\underline{t}_i = {}_1\lambda_i^{1/2}\,{}_1\underline{e}_i = \{(1 + \phi_i)/2\}^{1/2}\,\mathbf{1}$ if $\phi_i > 0$, $\;= {}_1\underline{e}_i$ if $\phi_i = 0$,

and

$F_i = {}_1\lambda_i^{-1/2}\,{}_1\underline{e}_i' Z_{(i)} = \{2(1 + \phi_i)\}^{-1/2}\,\mathbf{1}' Z_{(i)}$ if $\phi_i > 0$, $\;= {}_1\underline{e}_i' Z_{(i)}$ if $\phi_i = 0$,

in which case

(1.6.5) $\mathrm{var}(\underline{\epsilon}_i) = \Phi_{(i)} - \underline{t}_i\underline{t}_i'$, with eigenvalues $(1 - \phi_i)$ and $0$.

The $Z_{(i)}$ admitting the best fit of (1.6.2) in the sense of (1.6.3) must be the $i$-th pair of canonical variates since $(1 - \phi_i)$ is a minimum when $\phi_i$ is a maximum. This maximum is, of course, $\rho_i$.
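The optimal one-factor fit can be checked directly (a small added sketch, assuming NumPy): with $\underline{t} = \{(1+\phi)/2\}^{1/2}\mathbf{1}$, the residual covariance $\Phi_{(i)} - \underline{t}\,\underline{t}'$ has eigenvalues $1-\phi$ and $0$.

```python
import numpy as np

phi = 0.6                                    # correlation of an arbitrary pair (phi > 0)
Phi_i = np.array([[1.0, phi], [phi, 1.0]])   # covariance matrix of the pair

t = np.sqrt((1 + phi) / 2) * np.ones(2)      # optimal loading vector, cf. (1.6.4)
resid_cov = Phi_i - np.outer(t, t)           # var(eps_i) for the one-factor model (1.6.2)

eigvals = np.sort(np.linalg.eigvalsh(resid_cov))[::-1]
assert np.allclose(eigvals, [1 - phi, 0.0])  # eigenvalues 1 - phi and 0, cf. (1.6.5)
```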

Consider next a two factor model for $Z_{(i)}$,

(1.6.6) $Z_{(i)} = {}_1\underline{t}_i\,{}_1F_i + {}_2\underline{t}_i\,{}_2F_i + \underline{\epsilon}_i$,

where ${}_1\underline{t}_i$ and ${}_2\underline{t}_i$ are non-null vectors, ${}_1F_i$ and ${}_2F_i$ are standardized random variables, and $\underline{\epsilon}_i$ is an error vector. Clearly the $\underline{t}$'s and $F$'s can be chosen so that $\underline{\epsilon}_i = \underline{0}$. This however is to be done in a special way: first, pick ${}_1\underline{t}_i$ and ${}_1F_i$ to

(1.6.7) minimize $f(\mathrm{var}(Z_{(i)} - {}_1\underline{t}_i\,{}_1F_i))$;

second, pick ${}_2\underline{t}_i$ and ${}_2F_i$ to

(1.6.8) minimize $f(\mathrm{var}(Z_{(i)} - {}_1\underline{t}_i\,{}_1F_i - {}_2\underline{t}_i\,{}_2F_i))$.

In both instances, $g$ is the function defined in (1.6.3).

Applying Lemma A3, the choices are

(1.6.9) ${}_1\underline{t}_i = {}_1\lambda_i^{1/2}\,{}_1\underline{e}_i = \{(1+\phi_i)/2\}^{1/2}\,(1, 1)'$ if $\phi_i > 0$, $\;= {}_1\underline{e}_i$ if $\phi_i = 0$;
${}_2\underline{t}_i = {}_2\lambda_i^{1/2}\,{}_2\underline{e}_i = \{(1-\phi_i)/2\}^{1/2}\,(1, -1)'$ if $\phi_i > 0$, $\;= {}_2\underline{e}_i$ if $\phi_i = 0$;
${}_1F_i = {}_1\lambda_i^{-1/2}\,{}_1\underline{e}_i' Z_{(i)} = \{2(1+\phi_i)\}^{-1/2}\,(1, 1)\,Z_{(i)}$ if $\phi_i > 0$, $\;= {}_1\underline{e}_i' Z_{(i)}$ if $\phi_i = 0$;
${}_2F_i = {}_2\lambda_i^{-1/2}\,{}_2\underline{e}_i' Z_{(i)} = \{2(1-\phi_i)\}^{-1/2}\,(1, -1)\,Z_{(i)}$ if $\phi_i > 0$, $\;= {}_2\underline{e}_i' Z_{(i)}$ if $\phi_i = 0$.

An external criterion is needed for the "optimal" selection of $Z_{(i)}$. If $f$ is the trace function, $g = 1 - \phi_i$ after the first fit (1.6.7) and $g = 0$ after the second fit (1.6.8). Translated into words, the first factor accounts for an amount ${}_1\lambda_i = 1 + \phi_i$ of the variability in $Z_{(i)}$, and the second factor accounts for the remaining amount ${}_2\lambda_i = 1 - \phi_i$. A sensible approach is to choose $Z_{(i)}$ to make ${}_1\lambda_i$ large and ${}_2\lambda_i$ small. This, after all, is the effect of model (1.6.2), as can be seen from (1.6.5). It will prove beneficial for future work to state the external criterion in the following two equivalent ways: choose $Z_{(i)}$ to

(1.6.10) maximize ${}_1\lambda_i = 1 + \phi_i$,

or

(1.6.11) minimize ${}_2\lambda_i = 1 - \phi_i$.

Each leads to $Z_{(i)}$ being the $i$-th pair of canonical variates.

These results are summarized below.

THEOREM 1.6.2. Let $Z_{(1)}, Z_{(2)}, \ldots, Z_{(i-1)}$ be the first $(i-1)$ pairs of canonical variates. (Assume $E(Z_{(i)}) = \underline{0}$ and $\phi_i \ge 0$.) The best fit of model (1.6.2), for arbitrary $Z_{(i)}$, as measured by (1.6.3), is given by (1.6.4), and the $Z_{(i)}$ yielding the best fit is the $i$-th canonical pair. The best fit of model (1.6.6), for arbitrary $Z_{(i)}$, as measured by (1.6.7) and (1.6.8), is given by (1.6.9). The $Z_{(i)}$ admitting the best fit using (1.6.10) or (1.6.11) is again the $i$-th canonical pair.

COROLLARY 1.6.2.1. $\max_{Z_{(i)}} {}_1\lambda_i = 1 + \rho_i$ and $\min_{Z_{(i)}} {}_2\lambda_i = 1 - \rho_i$.

COROLLARY 1.6.2.2. Let $\tilde{Z}_{(i)}$ replace $Z_{(i)}$ in model (1.6.2). (Assume $E(\tilde{Z}_{(i)}) = \underline{0}$ and $\mathrm{corr}({}_1\tilde{z}_i, {}_2\tilde{z}_i) = \tilde{\phi}_i \ge 0$.) Requiring that the best (cf. (1.6.3)) fitting factor $\tilde{F}_i$ satisfy

$\mathrm{corr}(F_j, \tilde{F}_i) = 0$, $j = 1, 2, \ldots, i-1$,

is equivalent to requiring that $\tilde{Z}_{(i)}$ satisfy (1.5.1) if $\tilde{\phi}_i > 0$. This is also true for $\tilde{\phi}_i = 0$ if (and only if) ${}_1\underline{e}_i$ is chosen proportional to $\mathbf{1}$.

Proof. $\mathrm{corr}(F_j, \tilde{F}_i) = 0 \iff \mathrm{cov}(\mathbf{1}'Z_{(j)}, \mathbf{1}'\tilde{Z}_{(i)}) = 0 \iff \mathbf{1}'\,\mathrm{corr}(Z_{(j)}, \tilde{Z}_{(i)})\,\mathbf{1} = 0$, $j = 1, 2, \ldots, i-1$. Note that $\tilde{F}_i \propto \mathbf{1}'\tilde{Z}_{(i)}$ only when ${}_1\underline{e}_i \propto \mathbf{1}$.

COROLLARY 1.6.2.3. Let $\tilde{Z}_{(i)}$ and $\tilde{F}_i$ be as in Corollary 1.6.2.2. Requiring that the best (cf. (1.6.3)) fitting factor $\tilde{F}_i$ be such that

$\mathrm{corr}(\underline{E}_j, \tilde{\underline{E}}_i) = 0$, $j = 1, 2, \ldots, i-1$,

is equivalent to requiring that $\tilde{Z}_{(i)}$ satisfy (1.5.4) if $\tilde{\phi}_i > 0$. This is also true for $\tilde{\phi}_i = 0$ if (and only if) ${}_1\underline{e}_i \propto \mathbf{1}$ (which implies ${}_2\underline{e}_i' \propto (1, -1)$).

Proof. $\mathrm{corr}(\underline{E}_j, \tilde{\underline{E}}_i) = 0 \iff \mathrm{cov}((1, -1)Z_{(j)}, (1, -1)\tilde{Z}_{(i)}) = 0 \iff (1, -1)\,\mathrm{corr}(Z_{(j)}, \tilde{Z}_{(i)})\,(1, -1)' = 0$, $j = 1, 2, \ldots, i-1$. Note that $\tilde{\underline{E}}_i \propto (1, -1)'(1, -1)\tilde{Z}_{(i)}$ if and only if ${}_1\underline{t}_i \propto \mathbf{1}$.

COROLLARY 1.6.2.4. Let $\tilde{Z}_{(i)}$ replace $Z_{(i)}$ in model (1.6.6). (Assume $E(\tilde{Z}_{(i)}) = \underline{0}$ and $\tilde{\phi}_i \ge 0$.) Requiring that the best (cf. (1.6.7) and (1.6.8)) fitting factors ${}_1\tilde{F}_i$ and ${}_2\tilde{F}_i$ satisfy

$\mathrm{corr}({}_uF_j, {}_v\tilde{F}_i) = 0$, $u = 1, 2$; $v = 1, 2$; $j = 1, 2, \ldots, i-1$,

is equivalent to requiring that $\tilde{Z}_{(i)}$ satisfy

$\mathrm{corr}(Z_{(j)}, \tilde{Z}_{(i)}) = 0$, $j = 1, 2, \ldots, i-1$

(i.e., conditions (1.3.1)-(1.3.4)). This is true for $i = 2, 3, \ldots, p_1$.

Proof. $({}_1F_j, {}_2F_j)' = A_j Z_{(j)}$, $j = 1, 2, \ldots, i-1$, and $({}_1\tilde{F}_i, {}_2\tilde{F}_i)' = A_i \tilde{Z}_{(i)}$ for some nonsingular $A_1, A_2, \ldots, A_i$. Now $\mathrm{corr}(A_j Z_{(j)}, A_i \tilde{Z}_{(i)}) = 0 \iff \mathrm{corr}(Z_{(j)}, \tilde{Z}_{(i)}) = 0$, $j = 1, 2, \ldots, i-1$; $i = 2, 3, \ldots, p_1$.

1.7 Some Invariants under Transformations of One Set

The next theorem is the main force behind certain iterative procedures developed in Chapter IV, but it is also of interest here for the further insight it gives into the relations between two sets of variates. In essence, one set of variables is held fixed and attention is directed to some special functions which are invariant under internal nonsingular transformations of the other set.

Let the second set $X_2$ be fixed and consider the following functions of $Z_1$:

(1.7.1) $\alpha_i = \sum_{j=1}^{p_2} \{\mathrm{corr}({}_1z_i, {}_2x_j)\}^2$, $i = 1, 2, \ldots, p_1$;
$\beta_i = \Bigl\{\sum_{j=1}^{p_2} \mathrm{corr}({}_1z_i, {}_2x_j)\Bigr\}^2$, $i = 1, 2, \ldots, p_1$;
and
$\gamma_i = \left|\mathrm{var}\!\begin{pmatrix} {}_1z_i \\ X_2 \end{pmatrix}\right|$, $i = 1, 2, \ldots, p_1$.


The goal, for $i = 1, 2, \ldots, p_1$ in turn, is to select different ${}_1z_i$ to (i) maximize $\alpha_i$, (ii) maximize $\beta_i$, and (iii) minimize $\gamma_i$. The technique for doing this and the values attained are presented in Theorem 1.7.1. (Part (i) of this theorem was discussed by Fortier [6] and earlier by Rao [19].)

THEOREM 1.7.1. The optimal values for $\alpha_i$, $\beta_i$, and $\gamma_i$ defined in (1.7.1) are

(i) $\alpha_i = c_i(\Sigma_{11}^{-1}\Sigma_{12}D_{22}^{-1}\Sigma_{21})$, $i = 1, 2, \ldots, p_1$;

(ii) $\beta_1 = c_1(\Sigma_{11}^{-1}\Sigma_{12}D_{22}^{-1/2}\mathbf{1}\mathbf{1}'D_{22}^{-1/2}\Sigma_{21})$ and $\beta_i = 0$, $i = 2, 3, \ldots, p_1$;

(iii) $\gamma_i = |\Sigma_{22}|\,\{1 - c_i(\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21})\}$, $i = 1, 2, \ldots, p_1$;

where $D_{22}$ is the diagonal matrix formed from the diagonal elements of $\Sigma_{22}$. These values are attained when the ${}_1\underline{b}_i$ defining the ${}_1z_i$ are the eigenvectors, subject to $B_1\Sigma_{11}B_1' = I$, corresponding to the $i$-th largest eigenvalues of $\Sigma_{11}^{-1}\Sigma_{12}D_{22}^{-1}\Sigma_{21}$, $\Sigma_{11}^{-1}\Sigma_{12}D_{22}^{-1/2}\mathbf{1}\mathbf{1}'D_{22}^{-1/2}\Sigma_{21}$, and $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$, respectively. For $\beta_1 > 0$, the corresponding eigenvector is $\beta_1^{-1/2}\,\Sigma_{11}^{-1}\Sigma_{12}D_{22}^{-1/2}\mathbf{1}$.

Proof. Repeated use will be made of Lemma A1. The maximum value of $\alpha_1$ is

$\sup_{{}_1\underline{b}_1} \dfrac{{}_1\underline{b}_1'\Sigma_{12}D_{22}^{-1}\Sigma_{21}\,{}_1\underline{b}_1}{{}_1\underline{b}_1'\Sigma_{11}\,{}_1\underline{b}_1} = c_1(\Sigma_{11}^{-1}\Sigma_{12}D_{22}^{-1}\Sigma_{21})$

and occurs when ${}_1\underline{b}_1$ is a corresponding eigenvector of $\Sigma_{11}^{-1}\Sigma_{12}D_{22}^{-1}\Sigma_{21}$. The maximum value of $\alpha_2$ is

$\sup_{{}_1\underline{b}_2:\; {}_1\underline{b}_2'\Sigma_{11}\,{}_1\underline{b}_1 = 0} \dfrac{{}_1\underline{b}_2'\Sigma_{12}D_{22}^{-1}\Sigma_{21}\,{}_1\underline{b}_2}{{}_1\underline{b}_2'\Sigma_{11}\,{}_1\underline{b}_2} = c_2(\Sigma_{11}^{-1}\Sigma_{12}D_{22}^{-1}\Sigma_{21})$

and occurs when ${}_1\underline{b}_2$ is a corresponding eigenvector satisfying ${}_1\underline{b}_2'\Sigma_{11}\,{}_1\underline{b}_1 = 0$. Continuing in this way, the maximum value of $\alpha_{p_1}$ is $c_{p_1}(\Sigma_{11}^{-1}\Sigma_{12}D_{22}^{-1}\Sigma_{21})$ and occurs when ${}_1\underline{b}_{p_1}$ is a corresponding eigenvector satisfying ${}_1\underline{b}_j'\Sigma_{11}\,{}_1\underline{b}_{p_1} = 0$, $j = 1, 2, \ldots, p_1 - 1$. The results for the $\beta_i$ and $\gamma_i$ follow in a similar manner. To obtain the ${}_1z_i$ yielding the maximal $\beta_i$, one needs to consider the variation of

$\dfrac{{}_1\underline{b}_i'\Sigma_{12}D_{22}^{-1/2}\mathbf{1}\mathbf{1}'D_{22}^{-1/2}\Sigma_{21}\,{}_1\underline{b}_i}{{}_1\underline{b}_i'\Sigma_{11}\,{}_1\underline{b}_i}$.

And for the ${}_1z_i$ giving the minimal $\gamma_i$, it is the variation of

$\begin{vmatrix} 1 & {}_1\underline{b}_i'\Sigma_{12} \\ \Sigma_{21}\,{}_1\underline{b}_i & \Sigma_{22} \end{vmatrix} = |\Sigma_{22}|\,(1 - {}_1\underline{b}_i'\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\,{}_1\underline{b}_i)$

which is of interest. Minimizing this last expression is equivalent to maximizing ${}_1\underline{b}_i'\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\,{}_1\underline{b}_i$, and this may be carried out using Lemma A1.

The minimal $\gamma_i$ are especially noteworthy because of their intimate relation to the canonical correlations. Moreover, the corresponding ${}_1z_i$ are canonical variables for the first set. Part (iii) of the theorem is in essence an application of the Cauchy-Schwarz inequality.
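A small numerical sketch of part (i) (added for illustration; NumPy/SciPy and a synthetic covariance matrix are assumed): the maximized values of $\alpha_i$ are the eigenvalues of $\Sigma_{11}^{-1}\Sigma_{12}D_{22}^{-1}\Sigma_{21}$, obtained from the generalized eigenproblem whose eigenvectors satisfy $B_1\Sigma_{11}B_1' = I$.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(3)
p1, p2 = 2, 4
A = rng.standard_normal((p1 + p2, p1 + p2))
Sigma = A @ A.T + 6 * np.eye(p1 + p2)
S11, S12, S22 = Sigma[:p1, :p1], Sigma[:p1, p1:], Sigma[p1:, p1:]
D22 = np.diag(np.diag(S22))                  # variances of the (fixed) second set

# alpha_i = sum_j corr(1z_i, 2x_j)^2 = b' S12 D22^-1 S21 b  subject to  b' S11 b = 1.
lam, V = eigh(S12 @ np.linalg.inv(D22) @ S12.T, S11)   # generalized eigenproblem
alpha_opt = lam[::-1]                        # optimal alpha_1 >= alpha_2 >= ...
B1 = V[:, ::-1].T                            # rows 1b_i', with B1 S11 B1' = I

# Check the first optimum directly against the definition in (1.7.1).
z_coef = B1[0]
corr_with_2x = (z_coef @ S12) / np.sqrt(np.diag(S22))
assert np.isclose(np.sum(corr_with_2x ** 2), alpha_opt[0])
```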

1.8 Modified Pairs of Canonical Variates

The restrictions (1.3.1) - (1.3.4) are quite stringent in that they

leave the experimenter with little control over the extracted summary

variables which he must analyze. Typically, it is very difficult, if

not impossible, to attach meaning to the resulting higher order canonical

pairs (and sometimes to the first pair as well). Another awkward point

is that canonical variables derived from an estimated covariance matrix

will usually be correlated in the population, when not members of the

same pair, even though uncorrelated in the sample.

The purpose of this section is to consider one particular way of

adding flexibility to the types of higher order summary variables at

one's disposal. The resulting "modified" canonical variates could be

used to supplement the first pair as the usual higher order pairs do.

The i-th pair of modified canonical variates consists of variables

(1. 8.1)

subject to

1,2; ji 2 5

1,2, ••. ,i-l; L: e.. =l.j=l J1

5 This constraint on the 8ji is given for convenience only and can beeasily removed.

Page 36: CANONICAL ANALYSIS OF SEVERAL SETS OF VARIABLES JON … · 2018-11-13 · 5.2 Interpretation of the Results ... more than two sets of random variables. The first chapter contains

28

The maximum correlation is the i-th modified canonical correlation,

designated P ••J.

THEOREM 1.8.1. Suppose $({}_1z_j, {}_2z_j) = ({}_1\underline{b}_j^{*\prime} Y_1,\; {}_2\underline{b}_j^{*\prime} Y_2)$, $j = 1, 2, \ldots, i$, are the first $i$ pairs of canonical variates. Assume the corresponding canonical correlations are distinct. Let $({}_1\hat{z}_j, {}_2\hat{z}_j)$, $j = 1, 2, \ldots, i$, designate the modified pairs of canonical variates. Then the $i$-th pair of modified canonical variates, constructed to maximize (1.8.1) subject to (1.8.2), may be defined as

${}_1\hat{z}_i = {}_1\hat{\underline{b}}_i^{*\prime} Y_1$ and ${}_2\hat{z}_i = {}_2\hat{\underline{b}}_i^{*\prime} Y_2$,

where

${}_1\hat{\underline{b}}_i^* = \sum_{j=1}^{i} \theta_{ji}\,{}_1\underline{b}_j^*$ and ${}_2\hat{\underline{b}}_i^* = \sum_{j=1}^{i} \theta_{ji}\,{}_2\underline{b}_j^*$,

in which case the inequalities in (1.8.2) are all equalities and the corresponding $i$-th modified canonical correlation is

$\hat{\rho}_i = \rho_i + \sum_{j=1}^{i-1} \theta_{ji}^2(\rho_j - \rho_i)$.

Also, for $h < i$,

$\mathrm{corr}({}_1z_h, {}_2\hat{z}_i) = \mathrm{corr}({}_1\hat{z}_i, {}_2z_h) = \theta_{hi}\,\rho_h$,

$\mathrm{corr}({}_1\hat{z}_h, {}_1\hat{z}_i) = \mathrm{corr}({}_2\hat{z}_h, {}_2\hat{z}_i) = \sum_{j=1}^{h} \theta_{jh}\theta_{ji}$,

$\mathrm{corr}({}_1\hat{z}_h, {}_2\hat{z}_i) = \mathrm{corr}({}_1\hat{z}_i, {}_2\hat{z}_h) = \sum_{j=1}^{h} \theta_{jh}\theta_{ji}\rho_j$.

All of this holds for $i = 2, 3, \ldots, p_1$.


Proof. If $\theta_{ji} = 0$, $j = 1, 2, \ldots, i-1$, the results follow from Theorems 1.4.1 and 1.4.2. In general,

${}_1\hat{\underline{b}}_i^* = \sum_{j=1}^{p_1} \alpha_j\,{}_1\underline{b}_j^*$, $\;\sum_{j=1}^{p_1} \alpha_j^2 = 1$; $\quad {}_2\hat{\underline{b}}_i^* = \sum_{j=1}^{p_2} \beta_j\,{}_2\underline{b}_j^*$, $\;\sum_{j=1}^{p_2} \beta_j^2 = 1$;

$|\mathrm{corr}({}_1z_j, {}_1\hat{z}_i)| = |\alpha_j| \le \theta_{ji}$, $j = 1, 2, \ldots, i-1$;

$|\mathrm{corr}({}_2z_j, {}_2\hat{z}_i)| = |\beta_j| \le \theta_{ji}$, $j = 1, 2, \ldots, i-1$;

and

$\mathrm{corr}({}_1\hat{z}_i, {}_2\hat{z}_i) = \sum_{j=1}^{p_1} \alpha_j\beta_j\rho_j$.

The $\alpha$'s and $\beta$'s may be assumed non-negative without any loss. Consider $\alpha_j$ and $\beta_j$ fixed for $j = 1, 2, \ldots, i-1$ and define $\gamma_j = \alpha_j\rho_j$, $j = i, i+1, \ldots, p_1$, and $\gamma_j = 0$, $j = p_1+1, \ldots, p_2$. The argument proceeds along the lines of Anderson ([1], p. 295): $\sum_{j=i}^{p_2} \beta_j\gamma_j \le \bigl(\sum_{j=i}^{p_2}\beta_j^2\bigr)^{1/2}\bigl(\sum_{j=i}^{p_2}\gamma_j^2\bigr)^{1/2}$, with equality if and only if $(\gamma_i, \gamma_{i+1}, \ldots, \gamma_{p_2}) \propto (\beta_i, \beta_{i+1}, \ldots, \beta_{p_2})$. Thus, for a maximum,

$\sum_{j=i}^{p_1} \alpha_j\beta_j\rho_j = \sum_{j=i}^{p_2}\beta_j\gamma_j = K\sum_{j=i}^{p_2}\gamma_j^2 = K\sum_{j=i}^{p_1}\alpha_j^2\rho_j^2$ for some $K > 0$.

Maximizing over the free $\alpha$'s yields $\alpha_j = 0$, $j = i+1, i+2, \ldots, p_1$, $\alpha_i = \bigl(1 - \sum_{j=1}^{i-1}\alpha_j^2\bigr)^{1/2}$, $\beta_j = 0$, $j = i+1, i+2, \ldots, p_1$, and $\beta_i = \bigl(1 - \sum_{j=1}^{i-1}\beta_j^2\bigr)^{1/2}$. What remains is to maximize

$f(\alpha_1, \beta_1, \alpha_2, \beta_2, \ldots, \alpha_{i-1}, \beta_{i-1}) = \sum_{j=1}^{i-1}\alpha_j\beta_j\rho_j + \Bigl(1 - \sum_{j=1}^{i-1}\alpha_j^2\Bigr)^{1/2}\Bigl(1 - \sum_{j=1}^{i-1}\beta_j^2\Bigr)^{1/2}\rho_i$

in the region $0 \le \alpha_j \le \theta_{ji}$, $0 \le \beta_j \le \theta_{ji}$, $j = 1, 2, \ldots, i-1$.

If $\rho_i = 0$, it is easily seen that the maximum occurs when $\alpha_j = \beta_j = \theta_{ji}$, $j = 1, 2, \ldots, i-1$, so that $f = \sum_{j=1}^{i-1}\theta_{ji}^2\rho_j$. Hence suppose $\rho_i > 0$. Since $\theta_{ji} = 0$ implies $\alpha_j = \beta_j = 0$, it is only necessary to consider $\theta_{ji} > 0$. For any such $j$,

$\dfrac{\partial f}{\partial \alpha_j} = \beta_j\rho_j - \alpha_j\,\dfrac{\bigl(1 - \sum_{j=1}^{i-1}\beta_j^2\bigr)^{1/2}}{\bigl(1 - \sum_{j=1}^{i-1}\alpha_j^2\bigr)^{1/2}}\,\rho_i$, $\quad 0 \le \alpha_j < \theta_{ji}$, $0 \le \beta_j \le \theta_{ji}$,

and

$\dfrac{\partial f}{\partial \beta_j} = \alpha_j\rho_j - \beta_j\,\dfrac{\bigl(1 - \sum_{j=1}^{i-1}\alpha_j^2\bigr)^{1/2}}{\bigl(1 - \sum_{j=1}^{i-1}\beta_j^2\bigr)^{1/2}}\,\rho_i$, $\quad 0 \le \alpha_j \le \theta_{ji}$, $0 \le \beta_j < \theta_{ji}$.

Except for the solution $\alpha_j = \beta_j = 0$,

$\dfrac{\partial f}{\partial \alpha_j} = 0 \implies \dfrac{\beta_j}{\bigl(1 - \sum\beta_j^2\bigr)^{1/2}} < \dfrac{\alpha_j}{\bigl(1 - \sum\alpha_j^2\bigr)^{1/2}}$ and $\dfrac{\partial f}{\partial \beta_j} = 0 \implies \dfrac{\alpha_j}{\bigl(1 - \sum\alpha_j^2\bigr)^{1/2}} < \dfrac{\beta_j}{\bigl(1 - \sum\beta_j^2\bigr)^{1/2}}$.

Consequently, there is no interior stationary solution, and the maximum of $f$ is attained when, for each $j$, either $\alpha_j = 0$ or $\theta_{ji}$, or $\beta_j = 0$ or $\theta_{ji}$. But if $\alpha_j = 0$, then $\partial f/\partial\beta_j < 0$, $0 < \beta_j < \theta_{ji}$; if $\beta_j = 0$, then $\partial f/\partial\alpha_j < 0$, $0 < \alpha_j < \theta_{ji}$. This means there are four possibilities for each $j$ ($\theta_{ji} > 0$):

(i) $\alpha_j = \beta_j = \theta_{ji}$;
(ii) $\alpha_j = \theta_{ji}$, $0 < \beta_j < \theta_{ji}$;
(iii) $\beta_j = \theta_{ji}$, $0 < \alpha_j < \theta_{ji}$;
(iv) $\alpha_j = \beta_j = 0$.

Not all $j$'s can be associated with (iv), for then $f$ is less than that obtained when all $j$'s are associated with (i). Suppose $j$ is classified with (ii) and $j'$ with (iii). Then $\partial f/\partial\beta_j = \partial f/\partial\alpha_{j'} = 0$, so that $\theta_{ji}\theta_{j'i}\,\rho_j\rho_{j'} = \beta_j\alpha_{j'}\,\rho_i^2$, which implies $\beta_j\alpha_{j'} > \theta_{ji}\theta_{j'i}$, which is impossible. This means that either (ii) or (iii) can be ruled out. Now, for a $j$ connected with (ii), $\partial f/\partial\beta_j > \theta_{ji}(\rho_j - \rho_i) > 0$, which means that $\beta_j = \theta_{ji}$. Similarly, if the $j$ were linked with (iii) instead of (ii), one would infer that $\alpha_j = \theta_{ji}$. Thus the only remaining possibilities are (i) and (iv). Finally, (iv) is ruled out by noting that, for any $j$ connected with it, the corresponding value of $f$ must be less than the value when $\alpha_j = \beta_j = \theta_{ji}$. Evidently $\alpha_j = \beta_j = \theta_{ji}$, $j = 1, 2, \ldots, i-1$, and $f = \rho_i + \sum_{j=1}^{i-1}\theta_{ji}^2(\rho_j - \rho_i)$. The remainder of the theorem follows readily.


CHAPTER II

CANONICAL ANALYSIS IN THE PRESENCE OF SINGULARITIES

2.1 Introduction

The theory of canonical analysis is extended in this chapter to cover the situation where one or both of the sets may be singular. The key matrix in the canonical analysis of two nonsingular sets was shown in Theorem 1.4.1 to be $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$. Here, in the less restricted setting, the key matrix turns out to be $\Sigma_{11}^{-}\Sigma_{12}\Sigma_{22}^{-}\Sigma_{21}$ ("$^-$" indicates a generalized inverse). The main result is recorded in Theorem 2.3.1.

Three applications of the theorem are developed. In Section 2.4, the question of independence in a two way contingency table is considered as one of independence of two singular sets of variables. In Section 2.5, a scoring system which yields the maximum or minimum F statistic is constructed for qualitative responses in a one way analysis of variance. The corresponding most significant contrast in the treatment effects is obtained at the same time. In Section 2.6, a simple procedure is outlined for the appropriate canonical analysis when linear restrictions have been imposed on the coefficient vectors which define the canonical variates. The analysis is arrived at by first translating the problem into terms of two singular sets of variables.


2.2 Singular Sets of Variates

In many applications of canonical analysis, one must be prepared to handle singularities in one or both of the sets. From each singular set, it is customary to derive two subsets by an appropriate nonsingular transformation: one is nonsingular and the other is degenerate in that its covariance matrix is null. The degenerate subset is then eliminated on the grounds that it can contribute only a constant term to the canonical variates. The problem is thus reduced to the familiar analysis of two nonsingular sets of variates.

The theory presented in Section 2.3 demonstrates that the preliminary transformation and reduction are in fact unnecessary. In some situations, for example those of Sections 2.4, 2.5, and 2.6, considerable saving may be realized by not making the reduction. More generally, the theory yields a unified approach to the subject which covers the situation where both sets are nonsingular as a special case.

The degenerate subset mentioned above is formed from a particular basis of the null space of the covariance matrix of the singular set. The null space, in some cases, may contain useful statistical insights and thus be worthy of special attention. It is possible to obtain a basis for the null space as a by-product of the calculation of the generalized inverse of the singular set covariance matrix.

This generalized inverse plays an integral role in the ensuing development. A generalized inverse (g-inverse) of an arbitrary matrix $A$ is any matrix $A^-$ such that $AA^-A = A$. This definition has been popularized by Rao [20]; he has emphasized that it is often easier to compute than other (more restrictive) generalized inverses and that it is particularly relevant to statistical problems. For matrices with simple structures, like those in Sections 2.4, 2.5, and 2.6, one may be able to determine a g-inverse by inspection or by trial and error.
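As a quick computational aside (assuming NumPy and a synthetic singular matrix), the defining property $AA^-A = A$ is easy to verify; the Moore-Penrose pseudoinverse is one convenient g-inverse, though by no means the only admissible choice.

```python
import numpy as np

rng = np.random.default_rng(4)

# A singular covariance-like matrix (rank 2, size 3x3).
L = rng.standard_normal((3, 2))
A = L @ L.T

A_minus = np.linalg.pinv(A)                  # Moore-Penrose inverse: one particular g-inverse
assert np.allclose(A @ A_minus @ A, A)       # the defining property A A^- A = A

# Any matrix satisfying the property qualifies; e.g. adding a term built from a
# null-space vector of A still satisfies A A^- A = A.
null_vec = np.linalg.svd(A)[2][-1]           # right singular vector for the zero singular value
A_minus2 = A_minus + np.outer(rng.standard_normal(3), null_vec)
assert np.allclose(A @ A_minus2 @ A, A)
```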

McKeon [16] appears to have been the first to employ generalized inverses in the context of canonical analysis. Confining attention to square matrices, he defines a generalized inverse $A^+$ of $A$ as a matrix satisfying (i) $AA^+ = A^+A = E$ and (ii) $AE = EA = A$ for some idempotent $E$. Such a matrix $A^+$ exists whenever $A$ is symmetric, but may not otherwise (e.g., when $A = \bigl(\begin{smallmatrix} 0 & 1 \\ 0 & 0 \end{smallmatrix}\bigr)$). With $A^- = A'$, $AA^-A = A$, so that $A^-$ is a g-inverse.

The theory presented here may be taken as justification of McKeon's development, which he gives without proof. It is more general, however, in that it allows one to choose from the larger class of g-inverses where particularly simple and natural matrices, inadmissible in McKeon's setting, may be found.

2.3 The Extension Theorem

The ranks of $\Sigma_{11}$ and $\Sigma_{22}$ are denoted by $q_1$ and $q_2$. Because $q_1 \le p_1$ and $q_2 \le p_2$ there are only $(q_1 + q_2)$ canonical variates to be found instead of $(p_1 + p_2)$. Consequently, the dimensions of certain quantities defined in Section 1.2 need to be adjusted to reflect the correct number of canonical variates:

$B_1' = ({}_1\underline{b}_1, {}_1\underline{b}_2, \ldots, {}_1\underline{b}_{q_1})$ and $B_2' = ({}_2\underline{b}_1, {}_2\underline{b}_2, \ldots, {}_2\underline{b}_{q_2})$;

$Z_1' = ({}_1z_1, {}_1z_2, \ldots, {}_1z_{q_1})$ and $Z_2' = ({}_2z_1, {}_2z_2, \ldots, {}_2z_{q_2})$;

$\Phi_{12} = B_1\Sigma_{12}B_2'$, a $(q_1 \times q_2)$ matrix.


THEOREM 2.3.1. Let $q_1$, $q_2$, and $r$ be the ranks of $\Sigma_{11}$, $\Sigma_{22}$, and $\Sigma_{12}$ respectively. Let $\Sigma_{11}^-$ and $\Sigma_{22}^-$ be g-inverses of $\Sigma_{11}$ and $\Sigma_{22}$. Suppose $q_1 \le q_2$. The eigenvalues of $\Sigma_{11}^-\Sigma_{12}\Sigma_{22}^-\Sigma_{21}$ are $\rho_1^2, \rho_2^2, \ldots, \rho_{q_1}^2, 0, \ldots, 0$, where $1 \ge \rho_1^2 \ge \rho_2^2 \ge \cdots \ge \rho_{q_1}^2$ are the ordered squared canonical correlations. Eigenvectors ${}_1\underline{b}_i$ associated with the $\rho_i^2$ may be chosen so that $B_1\Sigma_{11}B_1' = I$ and used to define canonical variates $Z_1 = B_1X_1$ for the first set. Any matrix $B_2$ satisfying

(2.3.1) $\Sigma_{22}^-\Sigma_{21}B_1' = B_2'\Phi_{12}'$ and $B_2\Sigma_{22}B_2' = I$,

where

(2.3.2) $\Phi_{12} = (D \;\; 0)$, a $(q_1 \times q_2)$ matrix, and $D = \mathrm{diag}(\rho_1, \rho_2, \ldots, \rho_{q_1})$,

may be used to define corresponding canonical variates $Z_2 = B_2X_2$ for the second set.

Proof.¹ Use will be made of the existence of matrices $M_{12}$ and $M_{21}$ such that $\Sigma_{12} = M_{12}\Sigma_{22}$ and $\Sigma_{21} = M_{21}\Sigma_{11}$ (cf. Lemma A4). $\Sigma_{11}^-\Sigma_{11}$ is idempotent since $\Sigma_{11}^-\Sigma_{11}\Sigma_{11}^-\Sigma_{11} = \Sigma_{11}^-\Sigma_{11}$, and similarly for $\Sigma_{22}^-\Sigma_{22}$.

¹ This method of proof was suggested to me by Professor R. I. Jennrich.

Thus, by the matrix analogue of Lemma A5, the direct sum decomposition $V(I_{p_1}) = N(\Sigma_{11}) \oplus V(\Sigma_{11}^-\Sigma_{11})$ can be made. Similarly, $V(I_{p_2}) = N(\Sigma_{22}) \oplus V(\Sigma_{22}^-\Sigma_{22})$. Also note that $\dim[V(\Sigma_{11}^-\Sigma_{11})] = q_1$ and $\dim[V(\Sigma_{22}^-\Sigma_{22})] = q_2$. From Theorems 1.4.1 and 1.4.2, it follows that there exist vectors ${}_1\underline{b}_i$ and ${}_2\underline{b}_j$ defining canonical variates ${}_1z_i = {}_1\underline{b}_i'X_1$ and ${}_2z_j = {}_2\underline{b}_j'X_2$ with canonical correlations $\rho_i = \mathrm{corr}({}_1z_i, {}_2z_i)$, $i = 1, 2, \ldots, q_1$; $j = 1, 2, \ldots, q_2$. The component of ${}_1\underline{b}_i$ in $N(\Sigma_{11})$ adds only a constant to ${}_1z_i$, so that without loss it may be assumed that ${}_1\underline{b}_i \in V(\Sigma_{11}^-\Sigma_{11})$, and likewise that ${}_2\underline{b}_j \in V(\Sigma_{22}^-\Sigma_{22})$.

Therefore

$0 = \mathrm{cov}({}_1z_i,\; \rho_j\,{}_1z_j - {}_2z_j) = {}_1\underline{b}_i'(\rho_j\Sigma_{11}\,{}_1\underline{b}_j - \Sigma_{12}\,{}_2\underline{b}_j)$, $i = 1, 2, \ldots, q_1$,

and

$0 = \mathrm{cov}({}_2z_i,\; \rho_j\,{}_2z_j - {}_1z_j) = {}_2\underline{b}_i'(\rho_j\Sigma_{22}\,{}_2\underline{b}_j - \Sigma_{21}\,{}_1\underline{b}_j)$, $i = 1, 2, \ldots, q_2$,

which implies that one arrives at

$\rho_j\Sigma_{11}\,{}_1\underline{b}_j = \Sigma_{12}\,{}_2\underline{b}_j$ and $\rho_j\Sigma_{22}\,{}_2\underline{b}_j = \Sigma_{21}\,{}_1\underline{b}_j$, $j = 1, 2, \ldots, q_1$,

so that

$\rho_j\,{}_2\underline{b}_j = \rho_j\Sigma_{22}^-\Sigma_{22}\,{}_2\underline{b}_j = \Sigma_{22}^-\Sigma_{21}\,{}_1\underline{b}_j$.

Therefore

$\Sigma_{11}^-\Sigma_{12}\Sigma_{22}^-\Sigma_{21}\,{}_1\underline{b}_j = \rho_j^2\,{}_1\underline{b}_j$, $j = 1, 2, \ldots, q_1$.

Every $\rho_j^2$ is an eigenvalue of $\Sigma_{11}^-\Sigma_{12}\Sigma_{22}^-\Sigma_{21}$, and the rows of $B_1$ form a linearly independent set of eigenvectors with each ${}_1\underline{b}_j \in V(\Sigma_{11}^-\Sigma_{11})$. Since $\dim[N(\Sigma_{11})] = p_1 - q_1$ and $\Sigma_{11}^-\Sigma_{12}\Sigma_{22}^-\Sigma_{21}\underline{b} = \Sigma_{11}^-\Sigma_{12}\Sigma_{22}^-M_{21}\Sigma_{11}\underline{b} = \underline{0}$ for any $\underline{b} \in N(\Sigma_{11})$, it follows that the remaining $(p_1 - q_1)$ eigenvalues are all zero. Now let $B_1$ be any matrix of such eigenvectors with $B_1\Sigma_{11}B_1' = I$ and $B_1\Sigma_{12}\Sigma_{22}^-\Sigma_{21}B_1' = D^2$, with $D$ as in (2.3.2). Let the first $r$ rows of $B_2$ be defined by

$\Sigma_{22}^-\Sigma_{21}B_1' = B_2'\Phi_{12}'$,

with $\Phi_{12}$ as in (2.3.2), so that $D^2 = \Phi_{12}B_2\Sigma_{22}B_2'\Phi_{12}'$. Thus the matrix $B_2$ may be completed so that $B_2\Sigma_{22}B_2' = I$. Now $B_1\Sigma_{12}B_2' = B_1\Sigma_{12}\Sigma_{22}^-\Sigma_{22}B_2' = \Phi_{12}B_2\Sigma_{22}B_2' = \Phi_{12}$. $Z_1$ and $Z_2$, therefore, are vectors of canonical variates.

2.4 Canonical Analysis of ~ Two Way Contingency Table

The analysis of association in a two way contingency table is a

particular application of canonical analysis. This is well documented

(Fisher [5J, Williams [24J, Kendall and Stuart [141, and McKeon [16]).

Page 47: CANONICAL ANALYSIS OF SEVERAL SETS OF VARIABLES JON … · 2018-11-13 · 5.2 Interpretation of the Results ... more than two sets of random variables. The first chapter contains

39

Let f .. , f. , and f. be the probabilities of an observation falling~J ~- - J

in the (i,j) cell, the i-th row, and the j-th column, respectively_

Write

F = «fij » ,

f' = (f1- , f 2_, ... , f .>-1 P1

f' = (f -1' f •• II , f )-2 -2' -P2

D1diag(f1 _, f 2_, \I •• , f .>

P1

and

To make the connection with canonical analysis, it is necessary to

define the vector variates Xl and X2- They will be used as indicators

of row and column membership: for an observation in the i-th row and

j-th column, Xl would have a one as the i-th element with zeros else­

where and !2 would have a one as the j-th element with zeros else­

where. Then

and

It is easy to show that q = p - 1 and q2 p - l. Natural1 1 2

choices for the g-inverses - -1 and E22

-1 (McKeon usedare .E11

= D1 D2 -

E11

=-1 and E

22=

-1 where E = (I - p~11l') andE1D1 E1 E2D2 E2 1

Page 48: CANONICAL ANALYSIS OF SEVERAL SETS OF VARIABLES JON … · 2018-11-13 · 5.2 Interpretation of the Results ... more than two sets of random variables. The first chapter contains

there are

40

(squared) canonical

correlations which may be obtained as the ql largest eigenvalues of

One eigenvalue is always zero and 1 E N(~ll) an associated eigenvector.

This reflects the known constraint iiI = 10 The sum of the eigenvalues

is

2P P2 f ..L:

lL: 1.J

. 1 . 1 f. f .1.= J= 1. 0 OJ- 1.

It is easy to show that L: 12 = 0 if and only if Xl and X2

are

independent, as in the case of the multivariate normal distribution. It

is for this reason that the canonical correlations are directly related

to independence statements or hypotheses.

The sample covariance matrix S, based on n observations, has the

same structure as

n. In, and f .1. 0 OJ

by

One need only replace

n . In; n .. , n. , and° J 1.J 1. °

f ..1.J

n .OJ

by n .. In, f. by1.J 1. °

are the cell, row, and

column totals, respectively. Then, partitioning S like ~, it fol-

lows that

2nij

n. n .1. ° ° J

- 11 2=; X

where 2X is the "square contingency" statistic for measuring association

in a two way table. A test based on 2X is, therefore, a test based on

the sum of squared sample canonical correlations.

Page 49: CANONICAL ANALYSIS OF SEVERAL SETS OF VARIABLES JON … · 2018-11-13 · 5.2 Interpretation of the Results ... more than two sets of random variables. The first chapter contains

41

2.5 Scoring Qualitative Responses in ~ One Way Analysis of Variance

A problem concerning qualitative responses in a one way analysis of

variance (ANOVA) is mathematically equivalent to the contingency table

2analysis of Section 2.4. The ANOVA problem has been included to provide

one example from the multitude of rotational type problems which arise

frequently and, even though not reducible to the study of two sets of

random variables, can be translated into a form to which the usual ca-

nonical analysis techniques, or hybrids of them, are applicable. (Other

examples are discriminant analysis, as discussed by Bartlett [2] and

MCKeon [16], and the problem of orthogonal rotation to congruence in

factor analysis. Some references to the latter are Wrigley and Neuhaus

[25], Horst [8], and Cliff [4].)

Imagine a one way ANOVA with n observations representing qual-

itative responses at any of Pl possible levels on a total of

treatments. The object is to affix scores s' = (sl' s2' ••• , to

2

the response levels such that the resulting F statistic is a maximum

(or a minimum). The corresponding most significant contrast c' =

(cl , c2 ' ••• , c ) is also to be determined. The sampling can be sum­Pl

marized in n pairs of vectors which are realizations of indicator

variables (~l' X2) like those used in the previous section. The ob­

servation on Xl indicates the level of response while the X2 value

indicates which treatment was used. One can formally calculate a sample

covariance matrix S using these n pairs as suggested at the end of

Section 2.4. Making the substitutions specified there, one arrives at

This connection became apparent during discussions withDr. Peter Nemenyi.

Page 50: CANONICAL ANALYSIS OF SEVERAL SETS OF VARIABLES JON … · 2018-11-13 · 5.2 Interpretation of the Results ... more than two sets of random variables. The first chapter contains

4Z

811 = B1 -1.1 1i

and

where

~, -1 ),n (n1., nZ' , ... , n-1 PI

~, -1) ,n (n.l' n· Z' • & • , n-z 'PZ

B = -1),n diag(n1., nZ" ... , n1 PI

A -1) ,DZ n diag(n.l' n· Z' ... , n

'PZand

1'1 -1F n «n .. ))

~J

The F statistic, for fixed ~, is a function of the most signif-

icant contrast; that is,

(Z.5.1) F

[n-p j= sup __2

~'d = 0 PZ-l-F

~'~d)Z

after making the substitution

(Z.5.2)i)-I

~ = JJ2 .£.

Page 51: CANONICAL ANALYSIS OF SEVERAL SETS OF VARIABLES JON … · 2018-11-13 · 5.2 Interpretation of the Results ... more than two sets of random variables. The first chapter contains

43

The expression for F in terms of the usual sums of squares is

(2.5.3)

Define

F[n-P2]p -1

2

s,(~e-1~,_ ~ ~')s- 2 -1-1-

A Ab-1A •~'(D1 - FD2 F')~

F* = sup Fs E V(S~lSll)

F* is a monotonic function of

and F** = inf F .~ E V(S~lS11)

(P2-1)F*----::..---- = sup(n-P2) + (P2-1)F* ~ E V(S~lSll) (n-P2) + (P2-1)F

Substituting (2.5.1) in the numerator and (2.5.3) in the denominator

leads to

(2.5.4)

=

(~' ~d) 2sup sup

S E V(S~lS11). d E V(S;2S22)(~'Sll~)(i,e25.!)

Thus the optimal sand d are consistent with the optimal Ib 1

and 2b1 for the sample two way contingency analysis. Introducing

A2P1 for (2.5.4), i.e., the squared first sample canonical correlation

and solving for F* yields

F*

Page 52: CANONICAL ANALYSIS OF SEVERAL SETS OF VARIABLES JON … · 2018-11-13 · 5.2 Interpretation of the Results ... more than two sets of random variables. The first chapter contains

44

Having found an optimal s in accordance with Theorem 2.3.1 and assuming

1\2P1 > 0, it follows from (2.3.1) and (2.5.2) that the optimal c is

proportional to

(2.5.5)

The foregoing argument can be repeated in substantially the same way

to find an ~ yielding F** and the corresponding most significant

out.

"c: one need only replace

Then

"sup~ E V(SllSll>

hy " i.nf~ E V(S~lS11)

through-

F**[n-P 2]p -1

2 1 -

where is the smallest of the squared sample canonical corre-

lations associated with S. If1\2Pp -1 > 0,

1the most significant con-

trast c is proportional to (2.5.5).

2.6 Canonical Analysis with Linear Restrictions

A thorough study of the relations between two sets of variates may

include the systematic placement of linear restrictions on the coefficient

vectors which define the canonical variates. Such restrictions may be

used, for example to force particular coefficients to add to zero, be

equal, or occur in specified proportions; they may also assure that the

canonical variates are uncorrelated with some specified set of variables.

The restrictions are embedded in matrices M1 and M2; it is re-

1, 2, ••• , q1' and

Page 53: CANONICAL ANALYSIS OF SEVERAL SETS OF VARIABLES JON … · 2018-11-13 · 5.2 Interpretation of the Results ... more than two sets of random variables. The first chapter contains

45

and ql ~ q2· The effect of Ml and M2 is to

induce a new covariance matrix

[

MiIllMl

MiI 2lMl

MiI 12M2 ]

MiI 22M22 t

which can be formally treated according to the theorem. The coefficient~ ~

vectors Bl and B2 (say) obtained in this way are related to the de-

sired Bl and B2 by

and

3

It may be most natural to formulate the restrictions in terms of

matrices Gl

and G 3 and then t as a second stept to define Mland

2

M2 so that V(Ml) .1 V(I

llGl ) and V(M 2) .1 V(I 22G2) • Thus let

-1- Gl (GiI llGl)

-G' and -1- G2(GiI 22G2)-Gi t assumingM

l = III 1 M2 = I 22

now that I is positive definite. [Note: g-inverses of Ml and M2

are III and I 22t respectively.]

In compliance with the theorem t one then forms

For example, if the canonical variables for the first set are to beuncorrelated with lXl and 2Xlt then

1

o

o o

Page 54: CANONICAL ANALYSIS OF SEVERAL SETS OF VARIABLES JON … · 2018-11-13 · 5.2 Interpretation of the Results ... more than two sets of random variables. The first chapter contains

46

and works from there. The eigenvalues are the same as those of

Taking the trace, one finds that the sum of squared canonical correlations

between the constrained variables (MI!I' M2!2) is equal to the sum for

(Xl' !2) plus the sum for (Gi!l' GZ!2) minus the sum for (!l' GZ!2)

minus the sum for (Gi!l' !2)·

Page 55: CANONICAL ANALYSIS OF SEVERAL SETS OF VARIABLES JON … · 2018-11-13 · 5.2 Interpretation of the Results ... more than two sets of random variables. The first chapter contains

CHAPTER III

MODELS FOR THE CANONICAL ANALYSIS OF SEVERAL SETS OF VARIABLES

3.1 Introduction

Hotelling [12J recognized the need of extending his theory to deal

with more than two sets of random variables. He did not solve this pro­

blem, however, and it has only been since 1950 that much attention has

been given to it.

In the canonical analysis of several sets of variables, one would

like, as for two sets, to identify "canonical variates" which isolate

and summarize existing linear relations among the sets in a convenient

and insightful manner. One would at the same time like to measure the

intensity of the relationship through some sort of generalization of the

canonical correlation coefficient.

Five different techniques for the canonical analysis of several

sets will be presented here. They are

(i) the sum of correlations method (SUMCOR) ;

(ii) the maximum variance method (MAXVAR) ;

(iii) the sum of squared correlations method (SSQCOR) ;

(iv) the minimum variance method (MINVAR);

(v) the generalized variance method (GENVAR).

Of these, the SSQCOR and MINVAR procedures are new. Each is a general-

Page 56: CANONICAL ANALYSIS OF SEVERAL SETS OF VARIABLES JON … · 2018-11-13 · 5.2 Interpretation of the Results ... more than two sets of random variables. The first chapter contains

48

ization of the two set analysis in that it reduces to Rotelling's clas-

sical method whenever the number of sets is only two. Each is designed

to detect a different form of linear relationship among the sets. It is,

therefore, often advisable, especially in exploratory studies, to employ

more than one and perhaps all of these methods.

The notation relevant to the several set problem is contained in

Section 3.2. The criteria for picking first stage canonical variates

(analogous to the first pair of canonical variates for two sets) accord-

ing to the different methods are set down in Section 3.3. The main pur-

pose of this chapter is to develop models of the general principal com-

ponent type, extensions of those in Section 1.6, for the various methods.

This is done in Sections 3.4 - 3.8 in terms of first stage canonical var-

iates. The models are important in that they reveal the types of effects

which the methods can uncover. They also help to motivate and to inter-

relate the methods. The question of how to construct higher order ca-

nonical variates is postponed to Section 3.9.

3.2 Notation

The main notation for the next three chapters is recorded here for

future reference. Essentially the notation is the logical extension of

that given in Section 1.2 for two sets so it will not be necessary to

explain the symbols in detail.

The number of sets is designated by m. Thus

Xj

(p. x 1), j- J

1,2, ... , m, with

mand p = l:: Pi

i=l

Page 57: CANONICAL ANALYSIS OF SEVERAL SETS OF VARIABLES JON … · 2018-11-13 · 5.2 Interpretation of the Results ... more than two sets of random variables. The first chapter contains

49

are the original variables. The other random variables of interest are

Xl• (l .. ,

XI)"""'1ll '

and

yl (Yi, Xi, yl) with y. -1= It •• , = T. X.,"""'1ll ' -J J -J

Zl = (.?.i' .?.i' e •• , ZI) with Z. = B~Y , B.X""""'1ll ' -J J-J J-J

with

Although the elements of Z are still required to have unit variance,

they no longer need be uncorrelated within sets as in Chapter I.

The corresponding covariance matrices are

L: •• ,JJ

and

L: = «L: ij» ,

R = «Rij ))

~ = «~ij))'

~ (j) •

with R., = I,1.1.

Introducing block diagonal matrices

DT = diag(Tl , T2, ... , T ),m

DB = diag(Bl

, B2

, to .:. • , B ),m

DB* diag(B~, B~, €I •• , B*)m '

DB(j) = diag(lb ~ , ~j, e •• , b ~) ,-J nr-J

and

DB*(j) = diag(lbj' , 2b~ , • e • , b1~ I)-J nr-j ,

Page 58: CANONICAL ANALYSIS OF SEVERAL SETS OF VARIABLES JON … · 2018-11-13 · 5.2 Interpretation of the Results ... more than two sets of random variables. The first chapter contains

50

it follows that

DB* = DBDT ,

R = D-1I: D-1 'T T

and

Additionally, let

to the eigenvalues

vj

be orthonormal eigenvectors of R corresponding

c.(R), and let .e. be orthonormal eigenvectorsJ J-~

of cI>(i) corresponding to the eigenvalues1

Ai~

2Ai~ ... ~ mAi > 0,

m~,J i th I: .A. = m. It is also convenient to define

j=l J ~

and

Finally, let ~ be the largest canonical correlation obtainable"'max

between any two of the m sets.

3.3 Criteria for the Selection of First Stage Canonical Variables

The methods of canonical analysis involve a number of stages. The

goal at the s-th stage is to find ~(s) which optimizes some criterion

function. All of the criteria considered here involve the optimization

of some function, of The chosen .2 are called s-thJ s

stage canonical variates. The same criterion function is used at each

stage although certain restrictions on the admissible ~(s) are added

Page 59: CANONICAL ANALYSIS OF SEVERAL SETS OF VARIABLES JON … · 2018-11-13 · 5.2 Interpretation of the Results ... more than two sets of random variables. The first chapter contains

51

as one progresses beyond the first stage. The nature of these restric-

tions and the number of possible stages are matters which will be dealt

with in Section 3.9. It will suffice here to consider the construction

of the optimal !(l)'

The SUMCOR method was introduced by Horst in [8]. It is briefly

mentioned in [9J and again, more extensively, in [10] under the name of

the maximum correlation method;:/ The material in [10], however, is essen-,/'

tially the same as in [8J. '

The SUMCOR criterira is to select !(l) to

(3.3.1)m

maximizp f(~(l» = Ei=l

J/

An equi~?~ent criterion is to

sign changes in the jZl whereas the other four criteria to be con­

sidered are not.

mE var(.zl - .Zl)'

j=l 1. J

mminimize f(~(l» = E

i=l.,;<_{...-/7-~;1'",

/';'

/;/~~. It is worth remarking that the f in (3.3.1) or (3.3.2) is sensitive to

.. y'

Horst also originated the MAXVAR method (see [9] or [10]). His

motivation was to find a !(l) such that the associated ~ (1) gives the

best least squares or Euclidean norm approximation to a rank one matrix;

and, consequently, Horst refers to this in [10] as the "rank one approx-

imation method." 1He shows that the procedure is equivalent to picking

!(l) to

(3.3.3)

1 See Corollary 1.6.2.1 which deals with the special case m = 2.

Page 60: CANONICAL ANALYSIS OF SEVERAL SETS OF VARIABLES JON … · 2018-11-13 · 5.2 Interpretation of the Results ... more than two sets of random variables. The first chapter contains

52

For any ~(1)' the corresponding 1A1 is the variance of the first

principal component, lei ~(1). Thus the first stage canonical variates

have the property that their first principal component has a variance at

least as large as that for any other choice of ~(1).

McKeon [16] and Carroll [3] have arrived at this same procedure

using different approaches. Carroll's is particularly interesting. It

is the logical extension of his formulation for two sets which was de-

scribed in Section 1.6. Working in terms of a sample space, he finds

(sample) canonical variates and an auxiliary (sample) variate which are

most closely related in the sense that the sum of squared correlations

of the auxiliary with each of the canonical variates is a maximum. The

intimate connection between the auxiliary vector and the first principal

component line that was stated in Section 1.6 is also valid in this con-

text. More will be said about Carroll's work in Section 3.5.

The least informative<I> (1) is the identity matrix in that there

is no correlation between any two of the jZ1. It seems, therefore,

that a logical criterion for picking ~(l) would be to make as

"distant" from the identity as possible. This idea can be expressed

mathematically in terms of the Euclidean norm: pick ~(l) to maximize

I II - <P(l) I I· Or, equivalently,

(3.3.4) maximizemE

i=l=

At first glance, (3.3.4) appears to be only a variation of (3.3.1), but

in effect it is a much different criterion. (3.3.4) is the SSQCOR cri-

terion for determining ~(l). It is the obvious generalization of

(1.6.10). Loosely speaking, (3.3.4), calls for maximizing the variance

of the

Page 61: CANONICAL ANALYSIS OF SEVERAL SETS OF VARIABLES JON … · 2018-11-13 · 5.2 Interpretation of the Results ... more than two sets of random variables. The first chapter contains

53

The MINVAR criterion is, to some extent, the antithesis of the

MAXVAR criterion. The MINVAR ~(l) has the property that the variance

of its m-th principal component is a minimum. In other words, ~(l) is

chosen to

(3.3.5)

That this criterion is a generalization of the two set procedure follows

from Corollary 1.6.2.1.

The GENVAR procedure was proposed by Steel [21] in 1951 and is the

oldest of the five methods. As its name suggests, the criterion is to

find ~(l) with the smallest possible generalized variance. That is,

one wants to

(3.3.6)m

minimize f(~(l» = I~(l) I = IT JoA l ,j=l

which is the natural generalization of (1.6.11).

3.4 The Sum of Correlations Model

Further insight into the statistical nature of the SUMCOR criterion

can be obtained by considering the following model for ~(l) 2

(3.4.1)

2

where £1 is a known non-null vector, Fl

is a standardized random var­

iable, and !l is a vector of error variables.

Suppose that Fl

is chosen so as to minimize the sum of residual

variances, i. e. ,

The jZl will be assumed to have zero means in Sections 3.4 - 3.8

for convenience.

Page 62: CANONICAL ANALYSIS OF SEVERAL SETS OF VARIABLES JON … · 2018-11-13 · 5.2 Interpretation of the Results ... more than two sets of random variables. The first chapter contains

(3.4.2)

Now

(3.4.3)

where

minimize tr{var(E1

)} •

54

The problem, therefore, is to choose Fl so as to maximize ~ial.

Adjusting the sign of Fl , if necessary, to make li£l non-negative,

it follows that

by the Cauchy-Schwarz inequality. The second inequality becomes an

equality if and only if ~i!(l) and F1 are linearly related (with

probability one). Since the jZl

maximized when

and are standardized, lia1 is

and, for this choice of F1

,

Choosing canonical variates to give the best fit to (3.4.1), as

measured by (3.4.2), is thus equivalent to finding ,Zl which maximizeJ .

fi~(l)ll. If II oc l, then this is just the SUMCOR procedure. In other

Page 63: CANONICAL ANALYSIS OF SEVERAL SETS OF VARIABLES JON … · 2018-11-13 · 5.2 Interpretation of the Results ... more than two sets of random variables. The first chapter contains

55

words, Horst's method generates a ~(l) having the "best" fitting com­

mon factor, assuming the factor contributes with the~ potency to each

of the

3.5 The Maximum Variance Model

Carroll's development, mentioned in Section 3.3 and earlier in

Section 1.6, can be reformulated in terms of random variables and the

nature of the auxiliary variable revealed. A model for the MAXVAR pro-

cedure will be arrived at in the process. Returning to the model (3.4.1),

suppose that 11 is also allowed to vary so that both II and Fl

are

to be found according to the criterion (3.4.2).

Equation (3.4.3) can be rewritten as

(3.5.1) =

The second term on the right hand side is a positive definite quadratic

form in (11

- ~l); consequently, the minimum of (3.5.1) with respect

to II occurs when 11 = ~l. With this choice of 11 ,

=

m= m - E {corr(J,zl' F1)}2

j=l

The last sum corresponds to Carroll's criterion with Fl in the role of

the auxiliary variate.

The overall minimum of (3.5.1), for given ~(l)' is obtained when

(3.5.2) and

Page 64: CANONICAL ANALYSIS OF SEVERAL SETS OF VARIABLES JON … · 2018-11-13 · 5.2 Interpretation of the Results ... more than two sets of random variables. The first chapter contains

56

Using (3.5.2),

(3.5.3) =

In the limiting case given by 11.1 = m, the fit is perfect, each element

of is -~ and Icorr(jZl' Fl ) I = 1 for j = 1, 2,l~l±m , ..., m•

The .fl and Fl in (3.5.2) are, in fact, the optimal choices for

a general class of criterion functions, one of which is the trace func-

tion (3.4.2). This should be clear from the discussion, in Section 1.6,

of the work of Okamoto and Kanazawa [18] (see also Lemma A3). Thus

(3.4.2) can be replaced, in this section, by

(3.5.4)

where g is any strictly increasing function in each e. ,J

the eigen-

values of the non-negative definite matrix in the argument of f.

The goodness of fit obtainable depends, of course, on the choice of

~(l). The MAXVAR method is equivalent to finding ~(l) which admits the

best fit of (3.4.1) as measured by (3.5.4), with both £.1 and Fl at

one's disposal. This follows from (3.3.3) and (3.5.3) or, more generally,

Lemma A3.

3.6 The Sum of Squared Correlations Model

The model appropriate for the sum of squared correlations method

requires m factors: for a given ~(l)'

(3.6.1) ~(l)

where the .£.1J-

are arbitrary non-null vectors, the jFl are standard-

Page 65: CANONICAL ANALYSIS OF SEVERAL SETS OF VARIABLES JON … · 2018-11-13 · 5.2 Interpretation of the Results ... more than two sets of random variables. The first chapter contains

57

ized random variables, and El is a vector of error variables. (Of

course, with m factors, it is possible to choose the and

such that !l = Q.)

The factors are determined so that lFl is the most important,

and

the second most important, and so on.

are chosen to

More precisely, the

(3.6.2)j

minimize f(var(~(l) - L .tl .Fl )) = g(8 l , 82' ••• , 8m),i=l 1.- 1.

j = 1, 2, ... , m,

where g is the function in (3.5.4). The correct choice for l~l and

lFl follows from (3.5.2). To obtain ~l and 2Fl' subtract ltl lFl

from both sides of (3.6.1). Then (3.5.2) may be used to determine the

optimal ztl and ZFl , remembering that the largest eigenvalue of

(~(l) - lA l leI lei) is ZA I with associated eigenvector ~l. Con­

tinuing in this way through m steps, it is seen that

(3.6.3) and

The factors are just the standardized principal components of ~(l) and

hence are uncorrelated. With f as the trace function in (3.6.2), thej

minimum sum of residual variances with j factors is m - L .A l •i=l1.

Thus, by adding on the j-th factor, a reduction of jAl from the min-

imum sum with (j-l) factors may be attained.

An external criterion is needed as a basis for choosing the optimal

~(l)' and (3.3.4) is the one which will be used here.

The maximum of the sum of squares of m numbers which add to m

and lie in the range zero to one is Zm , and it occurs when one number

is m and the rest are zero. The minimum sum is m and results when

Page 66: CANONICAL ANALYSIS OF SEVERAL SETS OF VARIABLES JON … · 2018-11-13 · 5.2 Interpretation of the Results ... more than two sets of random variables. The first chapter contains

58

each number is equal to one. This suggests that the MAXVAR and SSQCOR

methods will yield similar !(l)'s whenever most Ei the variability can

be accounted for E.l.1! single factor. It further suggests that the effect

of the criterion (3.3.4) will be to produce a !(l) such that its first

"few" factors account for most of the variability and the last "few"

very little of the variability. That is, the tendency should be to

is the identity matrix.

This is in contrast with eachspread out the jA l as much as possible.

factor being equally important

the situation when and only when

3.7 The Minimum Variance Model

j = 1, 2, ... , m) which is

MINVAR canonical variates admit the best possible representation in

terms of (m-1) factors. To see in what sense this is true, consider an

(m-1) factor model,

(3.7.1) !(l)

m-11: . ·.f1 J.F1 + E1 ,

j=l J

for an arbitrary !(1). The jll' jF1 , and ~1 are as in (3.6.1).

andIf the .llJ-

is reached when

are chosen to satisfy (3.5.4), then the minimum

(3.7.2) and

This follows from Lemma A3. Alternatively the and can be

picked sequentially using (3.6.2) for j = 1,2, .•. , m-l. The optimal

choices are, again, as recorded in (3.7.2). Operating with criterion

(3.5.4) has the advantage that orthogonal rotations on the factors

(and simultaneously on the can be made without disturbing the

Page 67: CANONICAL ANALYSIS OF SEVERAL SETS OF VARIABLES JON … · 2018-11-13 · 5.2 Interpretation of the Results ... more than two sets of random variables. The first chapter contains

59

measure of fit. These rotated factors, however, would not be expected to

possess the pleasing properties of those in (3.7.2).

If the '£1 and .Fl specified in (3.7.2) are substituted intoJ- J

(3. 7. 1), then (3. 5 . 4) becomes

From this point, it is clear why MINVAR canonical variates can be inter-

preted as those which admit the best fit in terms of (m-l) factors.

Evidently the optimal (m-l) factors in (3.7.2) are such that the

single remaining (or m-th) factor needed to obtain a perfect fit in

(3.7.1) contributes the smallest possible amount to the residual var-

-~iances. The m-th factor is mFl = mAl ~i !(l)' the corresponding

o ~vector of weights is ~l = mAl ~l' and the error vector is

3.8 The Generalized Variance Model

The GENVAR method can be motivated in terms of the m factor model

in (3.6.1), using the criterion (3.6.2) and the results in (3.6.3).

Having done this for a fixed !(l)' it remains to define an effective

external criterion, in this case (3.3.6), for the optimal selection of

!(l) •

The SSQCOR criterion places main emphasis on the first "few" fac-

The GENVAR criterion has theand less on the last "few."tors or jAl

opposite emphasis: it is directed at diminishing the contribution of the

last "few" factors or. jAl with less weight given to the effect of the

firs t "few."

Page 68: CANONICAL ANALYSIS OF SEVERAL SETS OF VARIABLES JON … · 2018-11-13 · 5.2 Interpretation of the Results ... more than two sets of random variables. The first chapter contains

60

One might expect that the MINVAR and GENVAR methods would yield

similar !(l)s whenever it is possible to account for most of the var­

iability (in some !(l)) with (m-l) factors. A note of caution should

be added, however, as the multiplicative type function in (3.3.6) is

especially sensitive to the values of the "smaller" jAl and, conse­

quently, it is difficult to predict exactly how it will behave.

3.9 Higher Stage Canonical Variables

The study of relations among the sets can be continued beyond the

first stage by considering higher stage or higher order canonical var-

iates !(2) , !(3)' to supplement the optimal !(l). The same cri-

terion function is used at each stage, but restrictions are added to

assure that the canonical variables for a particular stage differ from

those for the previous stages. The restrictions are such that if the

number of sets were only two, the selected variates would be the

usual canonical pair as defined in Section 1.3.

It has already been demonstrated that there are a number of work­

able criterion functions for the several set situation which provide gen­

eralizations of the two set criterion. It is also true that for each

method there exist several different kinds of restrictions which can be

used in the construction of higher stage canonical variables, each lead­

ing to a different choice of variables and each being a generalization

of the restrictions used for two sets.

The models introduced in Sections 3.4 - 3.8 (or, in two instances

to be specified later, generalizations of them) can be adapted to a gen-

Page 69: CANONICAL ANALYSIS OF SEVERAL SETS OF VARIABLES JON … · 2018-11-13 · 5.2 Interpretation of the Results ... more than two sets of random variables. The first chapter contains

61

eral s-th stage by expressing each in terms of an arbitrary member of the

class of admissible ~(s)'s. One then seeks that ~(s) which admits the

best fit according to the appropriate standard.

A simple but useful type of restriction on Z-(s) is

(3.9.1) corr( .Z., .Z) = 0,J ~ J s

i = 1, 2, ... , s-l; j = 1, 2, •.. , m.

(~(j)' j = 1, 2, ... , s-l, are, of course, the canonical variates for

the preceding stages.) In other words, the canonical variates are re-

quired to be uncorrelated within sets. (3.9.1) has been used by Horst

([8], [9], and [10]) in connection with the SUMCOR and MAXVAR pro-

cedures. When m = 2, (3.9.1) is equivalent to (1.3.1) and (1.3.2).

A more flexible class of constraints is obtained by requiring the

canonical variables to be uncorrelated in some but not necessarily all

of the sets:

(3.9.2) corr(.Z., .Z) = 0, i = 1,2, ... , s-l; j ElkJ ~ J s

where I k is a non-empty subset of k of the first m integers,

specifically, i l < i 2 < ••• < ike When k = m, (3.9.2) is the same

as (3.9.1).

The most rigid restriction to be considered is

(3.9.3) corr(~(j)'~(s)) = 0, j = 1,2, ... , s-1.

That is, the canonical variables at the s-th stage must be uncorrelated

with all the lower order canonical variables. When m = 2, (3.9.3) is

equivalent to (1.3.1) - (1.3.4). If.F, j = 1, 2 , ..., m, are th eJ s

best fitting factors associated with the s-th stage fit of an m factor

Page 70: CANONICAL ANALYSIS OF SEVERAL SETS OF VARIABLES JON … · 2018-11-13 · 5.2 Interpretation of the Results ... more than two sets of random variables. The first chapter contains

62

model like (3.6.1) using a criterion like (3.6.2), and if the admissible

~(s)s are those for which

(3.9.4) corr(.F., kF ) = 0, i = 1, 2, ••• , m; j = 1, 2, •.• , s-l;~ J s

k=1,2, ••• ,m,

then (3.9.4) is equivalent to (3.9.3). This was proved in Corollary

1.6.2.4 for the case m = 2 and a similar proof applies here.

The final type of constraint to be considered is relevant to the

MAXVAR or MINVAR method. This constraint is important because it leads

to canonical variables which correspond to Carroll's higher stage var-

iab1es.

Extensions of the MAXVAR and MINVAR first stage models are required

in this connection. For the MAXVAR method, the model is

(3.9.5) Z =-(s) .F + E ,

J s --s1 :0; k < m ,

s

where the j~ and .FJ s

are chosen sequentially in accordance with the

s-th stage equivalent of (3.6.2) for j = 1, 2, ..• , k .s

The constraint

is in terms of the "last" factors, namely,

(3.9.6) corr(k F., k Fs ) = 0, j = 1,2, ... , s-l,j J s

and the criterion is to

(3.9.7) maximize k Ass

with respect to admissible ks and ~(s). It will be shown in Section

4.5 that k1

= 1 always and that the maximum of (3.9.7) is c (R).s

Page 71: CANONICAL ANALYSIS OF SEVERAL SETS OF VARIABLES JON … · 2018-11-13 · 5.2 Interpretation of the Results ... more than two sets of random variables. The first chapter contains

63

The revised model for the MINVAR method is

(3.9.8) z-(s)=

k -1s

L:j=l

.l .F + E ,J-S J S -s

1 < k :s; m,s

higher stages is again (3.9.6) where

where the j~ and jFS are selected in accordance with the s-th stage

equivalent of (3.5.4) as prescribed by Lemma A3. The constraint on the

F is now the next factor whichk s s

would be added if an additional term were allowed in (3.9.8). (This

new factor is just in (3.9.5).) The criterion is to

(3.9.9) minimize k Ass

with respect to admissible ks

and ~(s) • It will be seen in Section

4.6 that k1

= m of necessity and that the minimum in (3.9.9) is

c +l(R).p-s

In either situation, it is easy to prove that (3.9.6) can be re-

written as

(3.9.10) k :~.j corr(~(j)' ~(s)) k ~J s

0, j = 1, 2, "', s-1.

(See Corollaries 1.6.2.2 and 1.6.2.3 for the special cases with m = 2.)

When restrictions (3.9.1) or (3.9.3) are used in place of (3.9.10)

with the criteria (3.9.7) and (3.9.9), k = 1j

at all stages of the

MAXVAR procedure and k. = mJ

at all stages of the MINVAR procedure.

This, too, follows from Sections 4.5 and 4.6.

Nothing has been said so far about the number of stages that can be

usefully employed in conjunction with anyone of the indicated restric-

tions. This number, So say, specifies how many non-redundant ~(S)'S

can be generated in a manner which is consistent with the procedure when

m = 2.

Page 72: CANONICAL ANALYSIS OF SEVERAL SETS OF VARIABLES JON … · 2018-11-13 · 5.2 Interpretation of the Results ... more than two sets of random variables. The first chapter contains

64

Restrictions (3.9.1), (3.9.2), and (3.9.3) are applicable to any of

the five methods. For (3.9.1) , s o = P1' the number of variables in

the least numerous set. For (3.9.2) with k > 1, s = p .• The case0 J.1

k = 1 is more complex. Simply stated s is the greater of one and n,0

where n is the number of stages for which there exist canonical var-

iab1es with ~(s) T I (cf. Section 1.5). [It is easy to prove that n

is the greatest s such that the value of (4.5.8) is greater than one --

if R = I, n = 0.] As for (3.9.3), s is the largest value of so

for

[In terms of the matrices

as the zero matrix,

definedwhich (3.9.3) can be satisfied.

in (4.2.8), and interpreting jct

imum s such that (p.-rank(.C*»J J s

> 0 for

c*j s

s is the max­o

j = 1, 2, ... , m.]

One can deduce from Theorem 1.6.1 and Corollary 1.6.2.2 (1.6.2.3)

that the number of stages for the MAXVAR (MINVAR) method using restric-

tion (3.9.6) when m = 2 is the number of eigenvalues of R greater

(less) than one. The situation for m > 2 is somewhat arbitrary. The

approach taken here is that the number of MAXVAR (MINVAR) stages should

be at least equal to the number of c. (R)J

greater (less) than one. The

ambiguity arises in connection with the unit eigenvalues of R. For each

c.(R) = 1,J

it is possible that the associated variables, z-(s) say,

have ~(s) = I in which case they should not be considered as a sepa­

rate stage for the same reasons as when m = 2. If ~(s) ~ I, ~(s)

can reasonably be considered as part of either the MAXVAR or MINVAR

system.

Horst introduced a technique in [9], later labeled the "oblique

maximum variance method" in [10], which is mathematically equivalent to

P1 stages of the MAXVAR procedure using restriction (3.9.6). His "rank

Page 73: CANONICAL ANALYSIS OF SEVERAL SETS OF VARIABLES JON … · 2018-11-13 · 5.2 Interpretation of the Results ... more than two sets of random variables. The first chapter contains

65

one approximation method" ([9] and [10]) is equivalent to PI stages of

the MAXVAR procedure using (3.9.1). From the present point of view, the

only difference between these two methods of Horst's is in the type of

restriction used at the higher stages. However, it should be clear from

Theorem 1.6.1 that the first of these is not strictly within the

province of canonical analysis when m = 2, unless R12 is of full

rank (r = PI).

Carroll used his technique to generate what amounts to p stages

of the MAXVAR (or MINVAR) method using (3.9.6). Thus his derived var-

iates are a mixture of MAXVAR and MINVAR canonical variates plus, pos­

sibly, some others corresponding to eigenvalues of R equal to one.

Page 74: CANONICAL ANALYSIS OF SEVERAL SETS OF VARIABLES JON … · 2018-11-13 · 5.2 Interpretation of the Results ... more than two sets of random variables. The first chapter contains

CHAPTER IV

PROCEDURES FOR THE CANONICAL ANALYSIS OF SEVERAL SETS OF VARIABLES

4.1 Introduction

Procedures are developed in this chapter for finding the various

stages of canonical variates and the corresponding criterion va1ues~

The SUMCOR, SSQCOR, and GENVAR problems, considered in Sections

4.2 - 4.4, require iterative procedures for generating the optimal ~(s).

Fortunately, each procedure is of the same basic type so that one gen­

eral computer program can be used for all of them. The procedures are

such that convergence is guaranteed; and, given that the starting points

are appropriately chosen, the end products will be coefficient vectors

which define the desired canonical variates.

Horst proposed a different type of iterative procedure for finding

SUMCOR first stage and higher order canonical variates in accordance with

restriction (3.9.1). The convergence properties of his procedure, how­

ever, are not known.

The MAXVAR and MINVAR procedures are presented in Sections 4.5 and

4.6. The associated canonical variates and criterion values are re­

lated to the eigenvectors and eigenvalues of certain known matrices and

hence are easy to determine. The MAXVAR procedure, using (3.9.1) as the

constraint on the higher stages, is due to Horst [9].

Page 75: CANONICAL ANALYSIS OF SEVERAL SETS OF VARIABLES JON … · 2018-11-13 · 5.2 Interpretation of the Results ... more than two sets of random variables. The first chapter contains

67

4.2 The Sum of Correlations Procedure

The following special notation will be used:

(4.2.2) N* (Rjl 1~' c •• , R.. 1 . lb*, Rjj+1 j+1~' ... , R. b*) ,j s JJ- J- :-s Jm m-s

.h = .N 1,J-S J s-

and

h* .N*l.J-S J s-

The s-th stage of the SUMCOR method involves the maximization of

f s subject to certain restrictions on DB*(s). For the first stage,

start from the Lagrangian equation

where ii = (181' 281' ... , m81) is a vector of Lagrange multipliers.

Differentiating with respect to D~*(1)1 gives

Equating the derivative to zero leads to

(4.2.3)

or

Page 76: CANONICAL ANALYSIS OF SEVERAL SETS OF VARIABLES JON … · 2018-11-13 · 5.2 Interpretation of the Results ... more than two sets of random variables. The first chapter contains

(4.2.4) = 1) ,bl*,

J-j 1, 2, ... , m.

68

Premultiply (4.2.3) by DB*(l) or (4.2.4) by b*' ,j-l '

then

which means that

~l =

(4.2.5)m

l: corr(J,zl'iZl)' j = 1,2, ... , m.i=l

The relation between f l and the Lagrange multipliers is

(4.2.6) fl

= l' 8 •--1

Suppose that the restrictions imposed at the s-th stage (s > 1)

can be expressed in terms of matrices ,G* in the following manner:J s

(4.2.7) b*' C* 0'j-s j s = -' j 1,2, ••• ,m.

Restrictions (3.9.1), (3.9.2), and (3.9.3), in particular, can be ex-

pressed in this form. For (3.9.2) (which includes (3.9.1))

,C*J s

=

,b*2' ••• , ,b* 1)'J- J- s-

otherwise •

Alternatively, if the restrictions are like those in (3.9.3), then

The appropriate s-th stage Lagrangian equation is

(4.2.9)m m

gs = f + l: (1 - b*' .b*) ,8 - 2 l: b*' ,C* J'.:Y.g's 5=1 j-s J-S J s j =1 j-s J s

Page 77: CANONICAL ANALYSIS OF SEVERAL SETS OF VARIABLES JON … · 2018-11-13 · 5.2 Interpretation of the Results ... more than two sets of random variables. The first chapter contains

where the .8J s

8' = (8 8-s ls'2s'

and

... ,

jIs are Lagrange multipliers with

8 ).m s

69

3gs

'" b* = 2.h* - 2(.8 - 1) .b* - 2 C*.Y •o j-s J-S J S J-s j s J~

Setting this equal to zero yields

(4.2.10) o = .h* - (.8 - 1) b* - C*.Y •J-s J s j-s j s J~

Multiplying by C*,j s and solving for jIs' one obtains the general

solution (cf. Rao [20J, p.26)

(4.2.11) .Y = (.C*' .C*) .C*' .h* + {I - (.C*' .C*)J~ J s J s J s J-S J s J s

(.C*' .C*)}u ,J S J S -

where u is arbitrary. Substituting (4.2.11) into (4.2.10) gives

(4.2.12) o {I- .C*(.C*' .C*) .C*'} h*- (.8 -1) b*JSJs JS JS j-s JS j-s'

j = 1, 2, ... , m.

It may be inferred from (4.2.12) that

8 =j S

mr

i=lcorr( . Z , . Z )

J S 1. S

(4.2.13)

and

(4.2.14)

subject to

.b* E V(I - .C*(.C*' .C*) .C*')J-S JSJS JS JS

f = 1'8 •S --s

is defined to be the zero matrix, then the above results apply

for S = 1 as well: (4.2.4), (4.2.5), and (4.2.6) are just special cases

of (4.2.12), (4.2.13), and (4.2.14).

Page 78: CANONICAL ANALYSIS OF SEVERAL SETS OF VARIABLES JON … · 2018-11-13 · 5.2 Interpretation of the Results ... more than two sets of random variables. The first chapter contains

70

In terms of the original X variables, (4.2.12) becomes

(4.2.15) -1 -1j Cs (jC~

-1jCs )

- ,C' -1o = {2:jj

- 2: .. 2: .. 2: .. } ,h - (j es - 1) ,b ,JJ JJ J s JJ J-S J-s

j = 1, 2, ... , m,

where

and (4.2.7) becomes

,b' ,C =Q.',J-s J s

j = 1, 2, .•. , m.

The form of (4.2.12) or (4.2.15) suggests the following iterative

procedure (described in terms of the ! variables) for generating the

optimal s-th stage ~(s):

(i) specify initial variables

satisfy (4.2.7);

,z (0)J s

by vectors ,b*(o) whichJ-s

(ii) for n = 1, 2, ... and with ,h*(n) fixed, solveJ-S

(4.2.16) o = {I - ,C*(,C*' C*)JSJS js

,c*'} h*(n)J s j-s

(J,es(n) - 1) b*(n)j-s '

j = 1, 2, ... , m,

obtaining the maximum e(n)j s

and associated ,b* (n)J-s

The vector is computed like h*j-s using

(4.2.17)

so that

b*(n)l-s ' ... , b*(n) b*(n..,.l)

j-l-s 'j+l-s ' ... ,

(4.2.18) (. e(n) - 1)J s

j-l m= 2: corr(,Z(n) "~zen»~ + 2: corr(,z(n) "z(n-l»

i=l J s 1 s i=j+l J s 1 S

Page 79: CANONICAL ANALYSIS OF SEVERAL SETS OF VARIABLES JON … · 2018-11-13 · 5.2 Interpretation of the Results ... more than two sets of random variables. The first chapter contains

71

This procedure, as will be shown, must converge monotonically in

the value of fs

using Z(n) (n) (n-l)Is 'H.'jZS 'j+lZs ,H.,

f(n)j s '

Z(n-l)m s '

to a solution of (4.2.12). The solution will be optimal if the initial

variables are appropriately chosen. A sufficient condition for the

the calculation which is needed at each

solution to be optimal is for

than for any non-optimal ~(s)

Observe that for s = 1

m mE E corr(.Z(o) ,Z(o» to be greater

~ s 'J si=l j=lwhich comprises a solution of (4.2.12).

step of the iteration procedure is equivalent to the calculation of

Sl in Theorem 1.7.1 (ii). For s > 1, the calculation is of the same

type but with linear restrictions on the admissible coefficient vectors.

In either case, the calculation is quick and easy to execute.

The nature of the calculation is such that

(4.2.19) f (1) ::;1 s

Thus, since f(n) is bounded above by2

j s m ,

(4.2.20) lim f (n) = f (say) , j = 1, 2, ... , m,n+ oo j s s

where f is a positive finite number.s

Define ,~(n) by the equationJ s

(4.2.21)

( .~(n) _ 1)J s

and let

j-lI;

i=l( (n-l) ,z(n» +corr ,Z ,

J S 1. S

mE

i=j+l

. (n-l) (n-l)carr ( .Z . Z ) ,

J S '1. S

(4.2.22) f(n) =o s

f (n-l) •m s

Page 80: CANONICAL ANALYSIS OF SEVERAL SETS OF VARIABLES JON … · 2018-11-13 · 5.2 Interpretation of the Results ... more than two sets of random variables. The first chapter contains

72

Note that

(4.2.23)

and

.6(n)J s

< m

(4.2.24) fen)j-l s

= 2( e(n)j s

@(n))j s '

j = 1, 2, ... , m.

The limit of the left hand side of (4.2.24) is zero as n + 00 by

virtue of (4.2.19) and (4.2.20) so that

(4.2.25) lim (.e(n) _ .@(n)) = o.n + 00 J s J s

This implies that each element of can be made arbi-

trarily small in magnitude by taking n sufficiently large, provided

that .e(n) > O. (For .6 (n) = 0, b*(n) is arbitrary. )J s J s j-s

~

Suppose ~(s) is a particular choice of s-th stage variables with

associated~ ~ ~

.b*, .6, .h*,J-s J s J-s

and~

4>(s) such that

~

where fs

f = 1'~ 1s - (s)-

is the limiting value in (4.2.20). Then the~

b*j-s

and~

.6J s

comprise a solution of (4.2.12) since

changing anyone of the~

.b* •J-s

fs

can not be increased by

In practice, where the criterion values connected with different

solutions of (4.2.12) will be distinct and where the

be positive, the situation is such that

.e(n)J s

will always

Page 81: CANONICAL ANALYSIS OF SEVERAL SETS OF VARIABLES JON … · 2018-11-13 · 5.2 Interpretation of the Results ... more than two sets of random variables. The first chapter contains

(4.2.26)

(4.2.27)

(4.2.28)

73

lim ( b*(n) _ .b*(n-l)) = ..Q.,n-+ oo j-s J-S

.b* (n) rv

nl!moo = b*J-s j-s

h*(n) rv

lim = h*n-+ oo j-s j-s

8(n) rv

lim = .8 ,n -+ 00 j s J s

(4.2.20) holds, and the '"b*j-s and '". 8J s

form a solution of (4.2.12).

4.3 The Sum of Squared Correlations Procedure

The SSQCOR procedure is developed in much the same way as the SUMCOR

procedure. Define

(4.3.1) .p = .N .N'J s J s J s

and p* = N* N*'j s j s j s

where .NJ s

and N*j s

are as in (4.2.1) and (4.2.2). The criterion func-

tion at the s-th stage is

fs

=mL:

j=l('b*' b*)2 +J-s j-s

m mL: L:

i=l j=lifj

(.b*' R...b*)2,J.-S J.J J-s

which is to be maximized subject to certain restrictions on DB*(s).

The restrictions which are relevant to the higher stages of the

SSQCOR procedure are all of the form (4.2.7). If is again used

to represent the zero matrix, one need only consider the general

Lagrangian equation

m mgs = f + 2 L: (1 - b*' .b*).8 - 4L: b*' j C~ joYss j=l j-s J-S J s j=l j-s

where the .8 and joYs are Lagrange multipliers as in (4.2.9).J s

Page 82: CANONICAL ANALYSIS OF SEVERAL SETS OF VARIABLES JON … · 2018-11-13 · 5.2 Interpretation of the Results ... more than two sets of random variables. The first chapter contains

74

Clg':\,b~ 4,P* b* - 4(,8 - 1) .b* - 4,C* J'Yo J s j-s J S J-s J s -'-SJ-S

which, equating to zero, gives

(4.3.2) o = P* ,b * - (, 8 - 1) b * - ,C* .y •j s J-S J s j-s J s J-'-S

Multiplying by C*'j s

and solving for ,yJ-'-S

yields

(4.3.3) (,C*' C*) ,C*' P"~ b* + {I - (.C*' .C*)J s j s J s j s j-s J s J s

(C*' C*)}uj S j S -'

where u is arbitrary. Substituting (4.3.3) into (4.3.2), one arrives

at

(4.3.4) 0= [{I - .C*(,C*' ,C*) ,C*'} p* - (,8 - 1) IJ.b*,J s J s J S J S j S J S J-s

j = 1,2, ... , m,

which implies that

.8J S

m2::

i=l{ ( 2 .2 )} 2corr, ,

J.s ]. S

and

subject to

,b* E V(I - .C~~<'C*' C*) .C*')J-s JSJs js JS

f = 1'8 .S --s

In terms of the original X variables, (4.3.4) becomes

-1(4.3.5) 0 = [{2::

jj-1 (,-1

2:: ... C .C 2:: ...C)JJ J S J S JJ J S

C' 2::-1 } P - (8 - l)IJ ,b ,j S jj j S j S J-s

j = 1, 2, ... , m.

Page 83: CANONICAL ANALYSIS OF SEVERAL SETS OF VARIABLES JON … · 2018-11-13 · 5.2 Interpretation of the Results ... more than two sets of random variables. The first chapter contains

75

Equation (4.3.4) or (4.3.5) can be solved with the help of an iter-

ative procedure like the one described in the last section. The prin-

p*(n) is computed like ,p*j s J s

(4.2.18), one has

cipa1 change is that h*(n)j-s

is replaced by .p*(n) .b*(n) in (4.2.16).J s J-S

in (4.3.1\ using (4.2.17). In place of

(.e(n) - 1)J s

=j-1

L: {corr(,~~(n)i=l ] s

m+ L:

i=j+1

. «n) ,z(n-1»}2i. corr ,Z ,J S 1 S

Again, the procedure must converge monotonically in(n)

,f -- i. e. ,J s

(4.2.19) and (4.2.20) hold -- to a solution of the derivative equations,

(4.3.4) in this case. Thus the solution is optimal provided the ini-

tia1 variables are appropriately chosen.

For s = 1, each iteration involves a calculation like the one

used to find and 1£~ in Theorem 1.7.1 (i). For s > 1, the only

difference is that linear restrictions on the admissible coefficient

vectors are imposed. In both instances, the calculation involves the

determination of the largest eigenvalue and associated eigenvector of a

certain matrix.

Define "(n),eJ s

by the equation

( ~ (n) _ 1)j s

j-1 mL: {corr(,z(n-1) "z(n»}2 + L: {corr(,z(n-1) "z(n-1»}2,

i=l J s 1 s i=j+1 J s 1 S

analogous to (4.2.21), and let be as in (4.2.22). It is ap-

parent that equations (4.2.23), (4.2.24), and (4.2.25) are valid here,

although their interpretation is somewhat different in this context.

Page 84: CANONICAL ANALYSIS OF SEVERAL SETS OF VARIABLES JON … · 2018-11-13 · 5.2 Interpretation of the Results ... more than two sets of random variables. The first chapter contains

76

The most general conclusion is that and b*(n)j-s are, in the

limit as n ~ 00, in the same eigenspace the space corresponding to

as well as the associated

But here

of .p*(n) •J s

in Section 4.2,

. 8(n)J s

like~

<1> (s) •

~(s) ,

and

Consider

the largest eigenvalue

fs

~

where f is the limiting value in (4.2.20) for the SSQCOR iteratives

procedure. As before, the~

.b*J-S

~

and .8J s

must form a solution of the

current derivative equations, namely (4.3.4).

In practice, the criterion values corresponding to different so~

lutions of (4.3.4) will be distinct, the 8 (n) will be of multiplicityj s

one, and .b*(n-l) will equal b*(n) apart from a possible scalarJ-s j-s '

factor of (-1) , in the limit as n ~ 00 With no loss in generality,

the b*(n) may be adjusted in sign so thatj-s

lim p*(n) = p*n ~ 00 j s j s

equations (4.2.26), (4.2.27), and (4.2.28) hold for the SSQCOR procedure;

and the .b* and .8 form a solution of (4.3.4).J-S J s

4.4 The Generalized Variance Procedure

Steel [21] developed a system of non-linear equations for finding

first stageGENVAR canonical variates. The crux of his argument is that,

for any orthogonal DB*, 1<1>(1) I appears as a diagonal element of the

m-th compound matrix of <1>. Fixing attention on this element, deriv-

ative equations are found which, along with orthogonality restrictions,

Page 85: CANONICAL ANALYSIS OF SEVERAL SETS OF VARIABLES JON … · 2018-11-13 · 5.2 Interpretation of the Results ... more than two sets of random variables. The first chapter contains

77

m 2are I: Pi in number with as many unknowns, the elements of Br,

i=lB~, ... , B*. Although some solution to these equations will minimizem

I~(l) I, the equations appear difficult to solve except in special cases.

It is possible to obtain appropriate derivative equations without

recourse to compound matrices by making repeated use of a simple deter-

minanta1 expansion. These equations can be put in a form which is ame-

nab1e to solution.

Let jMS

be the matrix obtained from ~(s) by deleting the j-th

row and j-th column. Define

= .N .M-1

.N'J s J s J s

and .Q*J s

.N* .M-1

N*'J s J s j s

The criterion function at the s-th stage is

= I.M I .b 1<' (IJ s J-s

Q*) b* J.j s j-s'

1, 2, ... , m •

The crucial point here is that .M and Q* do not involve .b*.J s j s J-s

Considering restrictions like (3.9.1), (3.9.2), and (3.9.3), which

can be phrased in the form of (4.2.7), one has the general s-th stage

Lagrangian equation

= f s

m mI: (1 - .b*' .b*) I.M 1.6 + 2 I: I.M I.b*' .C* J.1.g

j =1 J-S J-s J s J s j =1 J s J-S J S

andwhere the .6J s

matrix as before.

are Lagrange multipliers and is the zero

Page 86: CANONICAL ANALYSIS OF SEVERAL SETS OF VARIABLES JON … · 2018-11-13 · 5.2 Interpretation of the Results ... more than two sets of random variables. The first chapter contains

3gs

a b* =j-s

- 21.M I.Q* .b* + 21.M 1<'8 - l).b* + 21.M I.c* .YJ s J S J-s J s J s J-s J s J s J""'"'S

78

and equating this to zero gives

(4.4.1) o = Q* b* - (.8 - 1) b* - C* .y •j s j-s J s j-s j s J-'-S

The general solution for j.Ys is

(4.4.2) jl:s = ( .C*' .C*) jQ~ b* + {I - ( .C*' .C*) ( .C*' .C*)}u,J s J s j-s J s J s J s J s -

with u arbitrary. Inserting (4.4.2) into (4.4.1) results in

(4.4.3) 0= [{I - .C*(.C*' .C*) .C*'}.Q* - (.8 - l)IJ.b* ,J S J S J S J S J S J S J-S

j=1,2, ••• ,m.

It follows from (4.4.3) that

.8 = 1 + b*' jQ~ b*J S j-s j-s

subject to

b* E V(I - .C*(.C*' C*) .C*')j-s J S J S j S J S

and

f = I.M 1(2 - .8 ),S J S J S

j = 1, 2, ... , m.

In other words, (.8 - 1) is the squared multiple correlation betweenJ S

jZS and (lZs' ••• , j-1Zs' j+1Zs' ••. , mZs)' The companion expression

of (4.4.3), in terms of the original! variables, is

Page 87: CANONICAL ANALYSIS OF SEVERAL SETS OF VARIABLES JON … · 2018-11-13 · 5.2 Interpretation of the Results ... more than two sets of random variables. The first chapter contains

79

(4.4.4) -1o = [{Ejj

j = 1, 2, ... , m.

Equation (4.4.3) or (4.4.4) can be solved using an iterative pro-

cedure which is basically the same as the ones suggested for (4.2.12)

is the

.C*']Y. andJ s -J

and b* (n)j-s

using (4.2.17). The quantity

for .h*(n). In this case,J-s

and (4.3.4). One now requires .Q*(n) which is computed like Q*J s j's

.Q*(n) .b*(n) is substituted in (4.2.16)J s J-S

(.e(n) - 1) is the square of the first ca­J s

nonical correlation between [I - .C*(.C*' .C*)J s J s J s

( z(n) z(n) (n-l) (n-l)Is '···'j-ls 'j+lZs ""'mZs ),

vector which defines the corresponding first canonical variate for the

first of these sets (cf. Theorem 1.7.1 (iii». For s > 1, this is an

example of a canonical analysis with linear restrictions (on the set of

variables Y.), as discussed in Section 2.5.-J

In contrast to (4.2.19), the iterative procedure is such that

(4.4.5) f (1) ;:;:1 s

f(l) ;:;:m s

Furthermore,

(4.4.6) o < f(n):o; 1 ,j s

and (4.2.20) holds for the GENVAR sequence .f(n) .J s

Now let

( ~ (n) _ 1)j s

= b*(n-l)' .Q*(n) b*(n-l)j-s J s j-s '

that

and

is, the squared multiple correlation coefficient between

( Z(n) Z(n) Z(n-l) z(n-l». Instead1 s , •.. , j-l s ' j+l s " •• , m s

Z (n-l)j s

Page 88: CANONICAL ANALYSIS OF SEVERAL SETS OF VARIABLES JON … · 2018-11-13 · 5.2 Interpretation of the Results ... more than two sets of random variables. The first chapter contains

80

of (4.2.23), one now has

(4.4.7)

Consider the ratio

A(n) (n)1 ~.8 ~ 8 < 2.

J s j s

(4.4.8)

f(n)j-l s

f(n)j s

=1 .M(n) I (2 ­J s

I .M(n) 1 (2J s

where

.MJ s

f(n) is defined as in (4.2.22) and M(n) is computed likeo s j s

using (4.2.17). The limit of the left hand side as n ~ 00 is one

because of (4.4.5) and (4.2.20). This, together with (4.4.7), assures

that (4.2.25) remains correct for the GENVAR sequence

From this point, the development proceeds exactly like the last

part of the previous section: one need only read .Q*(n)J s

and GENVAR for SSQCOR and redefine f s to be I~(s) I.

4.5 The Maximum Variance Procedure

for

The criterion function at the s-th stage of the MAXVAR procedure

is

(4.5.1) fs

=k ~ lP(s) k ~

s s

is a length one vector.

which is to be maximized with respect to choice of DB*(s) and

k , 1 ~ k < m, subject to restrictions (3.9.10) or (4.2.7). Note thats s

D~*(S) k ~s

The optimal criterion value and the corresponding canonical var-

iables at the first stage can be easily found with the aid of Lemma AI.

Page 89: CANONICAL ANALYSIS OF SEVERAL SETS OF VARIABLES JON … · 2018-11-13 · 5.2 Interpretation of the Results ... more than two sets of random variables. The first chapter contains

81

Let v be an arbitrary unit length vector. Then

sup v'Rv = Y1'RV1v - -

Suppose DB*(l) and k ~11

are required to satisfy

(4.5.2)

(4.5.2) implies that1

(4.5.3) j = 1, 2, .•. , m •

Next observe that ,iP (1) k ~1 = viRV1 = c1 (R). There cannot existk ~1

1 1an i, 11.t11 = 1, such that i'iP(l)i = i'DB*(l)RD~*(l)i = v'Rv > c

1(R) ,

since c1 (R) is the largest eigenvalue of R. This implies that

sup f 1 = c1 (R)

~(1)

and occurs when ~(1) is defined by (4.5.3). It also implies that

C1 (R) is the largest eigenvalue of iP(l) with corresponding eigen-

vector Thus

Suppose that the restrictions imposed at the s-th stage are of the

type (4.2.7) with

(4.5.4) =

1

Let Y = D~*(s) k~' It is desired to find v with the appropriates

properties which maximizes (4.5.1). Form the Lagrangian equation

If I I jY1 1 I = 0, jl! is arbitrary. A similar comment applies to

the other expressions of this type found in this and the next section.

Page 90: CANONICAL ANALYSIS OF SEVERAL SETS OF VARIABLES JON … · 2018-11-13 · 5.2 Interpretation of the Results ... more than two sets of random variables. The first chapter contains

82

    g_s = f_s + (1 - v'v)θ_s - 2v' D_C*s γ_s ,

where θ_s and γ_s are Lagrange multipliers.  The derivative with re-
spect to v is

    dg_s/dv = 2Rv - 2θ_s v - 2 D_C*s γ_s ,

which, when equated to zero, gives

(4.5.5)    0 = Rv - θ_s v - D_C*s γ_s .

Multiplying (4.5.5) by v' results in

(4.5.6)    f_s = θ_s .

Solving for γ_s and substituting the general expression for it into
(4.5.5) leads to

(4.5.7)    0 = [ {I - D_C*s(D'_C*s D_C*s)^- D'_C*s} R - θ_s I ] v .

Thus

(4.5.8)    sup_{Z(s)} f_s = c1( {I - D_C*s(D'_C*s D_C*s)^- D'_C*s} R )

and occurs when D_B*(s) and ke_s are such that

(4.5.9)    D'_B*(s) ke_s = v_s = ( 1v_s', 2v_s', ..., mv_s' )' ,

where v_s is a unit length eigenvector associated with the largest eigenvalue in
(4.5.7).  (4.5.9) implies

(4.5.10)    jb*_s = ± jv_s / || jv_s || ,    j = 1, 2, ..., m .

It remains to verify that ke_s is an eigenvector of the induced Φ(s).
To see this, first note that Φ(s) ke_s = D_B*(s) R D'_B*(s) ke_s = D_B*(s) R v_s.
Now, arguing as in the case s = 1, it follows not only that ke_s is an
eigenvector but also that k_s = 1 and 1λ_s = sup_{Z(s)} f_s, the value of
which is recorded in (4.5.8).

Restriction (3.9.10) is also easy to handle.  At the second stage,
and using Lemma A1 once more,

    sup_{Z(2)} f_2 = sup_{v: v_1'v = 0} v'Rv = v_2'Rv_2 = c2(R) .

Suppose D_B*(2) and 2e_2 are required to satisfy

    D'_B*(2) 2e_2 = v_2 ,

which implies that

    jb*_2 = ± jv_2 / || jv_2 || ,    j = 1, 2, ..., m .

Then, if 2e_2 can be shown to be an eigenvector of the associated
Φ(2), Z(2) must be the required vector of canonical variables.

Let R^(2) = (I - v_1v_1')R(I - v_1v_1') and form

    Ψ(2) = D_B*(2) R^(2) D'_B*(2) = Φ(2) - c1(R) D_B*(2) v_1 v_1' D'_B*(2) .

Now 2e_2 must be an eigenvector of Ψ(2), because

    Ψ(2) 2e_2 = D_B*(2) R^(2) D'_B*(2) 2e_2 = D_B*(2) R^(2) v_2 = c2(R) 2e_2 ,

and Ψ(2) 2e_2 = Φ(2) 2e_2, which proves that 2e_2 is an eigenvector of
Φ(2) corresponding to the eigenvalue

c2(R).  It is not true, however, that k_2 must be equal to one in this
situation; that is, 2λ_2 need not be the largest eigenvalue of Φ(2).
Examples are given in Sections 5.4 - 5.6 which demonstrate that, at
least for some s > 1, one can have k_s > 1.

Continuing in this way, one has at the s-th stage

(4.5.11)    sup_{Z(s): v_j'v = 0, j = 1, 2, ..., s-1} f_s = c_s(R) ,

with D'_B*(s) ke_s = v_s and

(4.5.12)    jb*_s = ± jv_s / || jv_s || ,    j = 1, 2, ..., m .

One can easily show, as for s = 2, that ke_s is an eigenvector of
Φ(s) corresponding to the eigenvalue c_s(R).

As with the other procedures, one does not in practice need to make

the transformation from X to Y. The equation analogous to (4.5.10)


which defines the s-th stage canonical variate coefficient vectors with
respect to X is

(4.5.13)    jb_s = ± Σ_jj^{-1} jw_s / || Σ_jj^{-1/2} jw_s || ,    j = 1, 2, ..., m ,

where

    w_s = ( 1w_s', 2w_s', ..., mw_s' )'

is an eigenvector corresponding to the largest eigenvalue of the matrix

(4.5.14)    ...

The largest eigenvalue of (4.5.14) is the maximum value of f_s, as
given in (4.5.8).  The companion expression of (4.5.12), in terms of the
original variables, is

    jb_s = ± Σ_jj^{-1} jw_s / || Σ_jj^{-1/2} jw_s || ,    j = 1, 2, ..., m ,

where

(4.5.15)    w_s' = ( 1w_s', 2w_s', ..., mw_s' )

is an eigenvector corresponding to the s-th largest eigenvalue
of this matrix subject to the restriction

    ... = 0 ,    j = 1, 2, ..., s-1.


Note that the s-th largest eigenvalue is c_s(R), the maximum value of f_s,
as recorded in (4.5.11).

4.6 The Minimum Variance Procedure

The mathematical developments of the MAXVAR and MINVAR procedures
are so much alike that many of the details for the latter procedure can
be safely omitted.  The s-th stage criterion function is the same as
(4.5.1).  The object now is to minimize (4.5.1) with respect to choice
of D_B*(s) and k_s, 1 ≤ k_s ≤ m, subject to restrictions (3.9.10) or
(4.2.7).

First of all, using (1f.2.1) of Rao [20], one has

    inf_v v'Rv = v_p'Rv_p = c_p(R) .

Suppose D_B*(1) and ke_1 are required to satisfy

    D'_B*(1) ke_1 = v_p .

It follows then that

(4.6.1)    jb*_1 = ± jv_p / || jv_p || ,    j = 1, 2, ..., m .

Arguing in a manner similar to that in the previous section, one con-
cludes that

    inf_{Z(1)} f_1 = c_p(R)

and occurs when Z(1) is defined by (4.6.1), c_p(R) is the smallest
eigenvalue of the associated Φ(1), and k_1 = m.
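Since the first-stage MINVAR solution differs from the MAXVAR one only in
using the smallest eigenvalue of R, a corresponding sketch (again
illustrative Python, under the same assumptions as before) is:

```python
import numpy as np

def minvar_first_stage(R, set_sizes):
    """First-stage MINVAR solution: the same construction as for MAXVAR,
    but built from the eigenvector of R belonging to the smallest
    eigenvalue c_p(R) (a sketch, not the author's code)."""
    eigvals, eigvecs = np.linalg.eigh(R)
    cp, vp = eigvals[0], eigvecs[:, 0]
    b_star, start = [], 0
    for p_j in set_sizes:
        v_j = vp[start:start + p_j]
        b_star.append(v_j / np.linalg.norm(v_j))  # the sign is arbitrary (±)
        start += p_j
    return cp, vp, b_star
```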


For restrictions of the type (4.2.7) on the higher stage canonical
variables, the key equation is again (4.5.7).  The relationship between
the present θ_s and f_s is the same as in (4.5.6).  The eigenvalues of
the 'adjusted' R matrix in (4.5.7) are the same as those of
{I - D_C*s(D'_C*s D_C*s)^- D'_C*s}²R because of the idempotency of the matrix in
braces.  These in turn are the same as those of

(4.6.2)    {I - D_C*s(D'_C*s D_C*s)^- D'_C*s} R {I - D_C*s(D'_C*s D_C*s)^- D'_C*s} .

If the rank of D_C*s is denoted by q, then q of the eigenvalues of
(4.6.2) are zero, and associated eigenvectors are q independent col-
umns of D_C*s.  Since the matrix in (4.6.2) is symmetric, the eigenvectors
associated with its non-zero eigenvalues must be orthogonal to the col-
umns of D_C*s -- i.e., each eigenvector is in V(I - D_C*s(D'_C*s D_C*s)^- D'_C*s).
One can infer from this fact that any eigenvector associated with a non-
zero eigenvalue of (4.6.2) is also an eigenvector corresponding to this
same eigenvalue for the 'adjusted' R matrix in (4.5.7).  This, together
with (1f.2.3) of Rao [20], shows that the smallest admissible θ_s is the
(p-q)-th largest eigenvalue of the 'adjusted' R matrix.  [In either
the MAXVAR or MINVAR procedure, one can work with the symmetric matrix
(4.6.2) in place of the 'adjusted' R matrix in (4.5.7).]

If

    v_{p-s+1} = ( 1v_{p-s+1}', 2v_{p-s+1}', ..., mv_{p-s+1}' )'

is the eigenvector associated with the smallest non-zero eigenvalue, and
if D_B*(s) and ke_s satisfy

    D'_B*(s) ke_s = v_{p-s+1} ,

which implies that

(4.6.3)    jb*_s = ± jv_{p-s+1} / || jv_{p-s+1} || ,    j = 1, 2, ..., m ,

then it follows readily that (i) ke_s is an eigenvector of the asso-
ciated Φ(s), (ii) k_s = m, and

(iii)    mλ_s = inf_{Z(s)} f_s = c_{p-q}( {I - D_C*s(D'_C*s D_C*s)^- D'_C*s} R ).
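A short sketch of the computation behind (4.6.2) may make the preceding
argument concrete.  The Python fragment below is illustrative only; D_C stands
for the matrix of restriction columns written D_C*s above, and the
pseudo-inverse plays the role of the generalized inverse (D'_C*s D_C*s)^-.

```python
import numpy as np

def restricted_eigen(R, D_C):
    """Eigenvalues and eigenvectors of the symmetric matrix (4.6.2),
    (I - P) R (I - P), where P projects onto the columns of D_C.
    The q independent columns of D_C contribute q zero eigenvalues;
    among the remaining ones, the largest is wanted for MAXVAR and the
    smallest non-zero one for MINVAR."""
    p = R.shape[0]
    # Projection onto the column space of D_C; pinv handles rank deficiency.
    P = D_C @ np.linalg.pinv(D_C.T @ D_C) @ D_C.T
    Q = np.eye(p) - P
    M = Q @ R @ Q
    eigvals, eigvecs = np.linalg.eigh(M)  # ascending order
    return eigvals, eigvecs
```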

Turning to restriction (3.9.10), the main results are

    inf_{Z(s): v_j'v = 0, j = 1, 2, ..., s-1} f_s = c_{p-s+1}(R) ,

with the optimal D_B*(s) and associated ke_s satisfying

    D'_B*(s) ke_s = v_{p-s+1} ,

which implies that

(4.6.4)    jb*_s = ± jv_{p-s+1} / || jv_{p-s+1} || ,    j = 1, 2, ..., m .

ke_s is indeed the eigenvector of Φ(s) associated with the eigenvalue
c_{p-s+1}(R).

The counterpart of (4.6.3) in terms of the X variables is

    jb_s = ± Σ_jj^{-1} jw / || Σ_jj^{-1/2} jw || ,    j = 1, 2, ..., m ,

where

    w' = ( 1w', 2w', ..., mw' )

is an eigenvector corresponding to the smallest non-zero eigenvalue of
the matrix in (4.5.14) using the current D_Cs.  The counterpart of

(4.6.4) is

    jb_s = ± Σ_jj^{-1} jw_{p-s+1} / || Σ_jj^{-1/2} jw_{p-s+1} || ,    j = 1, 2, ..., m ,

where

    w_{p-s+1}' = ( 1w_{p-s+1}', 2w_{p-s+1}', ..., mw_{p-s+1}' )

is like w_s in (4.5.15) but subject to

    ... = 0 ,    j = 1, 2, ..., s-1.


CHAPTER V

PRACTICAL CONSIDERATIONS AND EXAMPLES

5.1 Introduction

The final chapter is devoted to practical considerations and ex-

amples. Suggestions for the interpretation of the results of a canon-

ical analysis are made in Section 5.2. The third section contains spe-

cific recommendations on starting points for the iterative procedures.

Selected calculations and comments for three examples are presented in

Sections 5.4 - 5.6.

The examples are based on data taken from Thurstone and Thurstone

[22]. They study in their monograph a number of different variables,

each of which measures one of seven "primary abilities."  Three variables

are associated with each ability. The seven abilities are: verbal

comprehension (V), word fluency (W), number (N), space (S), rote

memory (M), reasoning (R), and perception (P).  The sample correlations,

based on 437 observations, are displayed in Table 5.4.1.

The SUMCOR, SSQCOR, and GENVAR iterative procedures were program-

med to terminate as soon as the condition

    Σ_{j=1}^{m} | jθ_s(n) - jθ_s(n-1) | < 0.0001

holds.  This condition has proved to be an effective test for convergence
of the criterion values jf_s(n).  (See Sections 4.2 - 4.4 for the
definitions of jθ_s(n) and jf_s(n).)
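A minimal sketch of this stopping rule (illustrative Python; theta_new and
theta_old stand for the vectors of jθ_s values from two successive
iterations):

```python
import numpy as np

def converged(theta_new, theta_old, tol=1e-4):
    """Termination test: stop once the theta values change by less than
    tol in total from one iteration to the next."""
    diff = np.abs(np.asarray(theta_new) - np.asarray(theta_old))
    return np.sum(diff) < tol
```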


5.2 Interpretation of the Results

The analysis should include more than an examination of the ca-

nonical variates and their correlations. Given the canonical vector

Z(s), one should also study the basic ingredients of the associated

best fitting model.

Taking the MAXVAR Z(1) as one example, attention should be di-
rected to 1λ_1, 1e_1, and 1F_1.  Hopefully, some meaning can be attached
to the "common factor" 1F_1.  The elements of 1e_1, together with 1λ_1,
are indicators of the potency of the factor in the different sets.  For
the MINVAR Z(1), as a second example, the most important quantities
are c_p(R), me_1, and mF_1.  If c_p(R) is near zero, the canonical var-
iables form a "nearly" singular set; that is, the elements of Z(1) are
"nearly" linearly related.  The relevant linear compound is, of course,
mF_1, which should be interpreted if possible.  In the situation where
only two elements of me_1 are non-zero (it will often happen that this
is approximately true), the interpretation is simplified in that mF_1,
apart from a constant multiplier, is the difference between the first
pair of canonical variates for the sets corresponding to the non-zero
elements of me_1, and the accompanying first canonical correlation is
just (1 - c_p(R)).  Furthermore, this correlation must be ≤ φ_max, the
largest possible first canonical correlation, for otherwise c_p(R)
would not be the minimum eigenvalue of R as claimed (cf. Theorem 1.6.1).


Sometimes the correlation φ_max is of interest in itself.  One
would like, especially when m is large, to obtain some information
about it without separately studying all (m choose 2) possible pairs of sets.
The off-diagonal element of any Φ(1) matrix which is largest in mag-
nitude provides a simple lower bound for φ_max.  A non-trivial upper
bound can be found in terms of the extreme eigenvalues of R.  It follows
from Theorem 1.6.1 that

    (1 + φ_max) ≤ c1(R)    and    (1 - φ_max) ≥ cp(R) .

Combining these inequalities, one has

(5.2.1)    φ_max ≤ min{ c1(R) - 1 , 1 - cp(R) } .

Since 0 < cp(R) ≤ 1, by virtue of R being a positive
definite correlation matrix, the right hand side of (5.2.1) is always
non-negative and less than one.

If the upper bound is small, that is, if c1(R) or cp(R) is
close to one, there would be little reason to further pursue the search
for relations between two (or more) of the sets.  Otherwise, it is rea-
sonable to consider the sets corresponding to the dominant elements (as-
suming there are some) of the MAXVAR 1e_1, if (c1(R) - 1) < (1 - cp(R)),
or the MINVAR me_1, if (c1(R) - 1) > (1 - cp(R)), as the ones most likely
to produce canonical variates with correlation equal to φ_max.  The
heuristic justification for this notion, in the case where (c1(R) - 1)
is the minimum term in (5.2.1), is that 1e_1' Φ(1) 1e_1 = c1(R) is large
(but less than two) by assumption, which suggests that the off-diagonal
elements of Φ(1) corresponding to the dominant elements of 1e_1 are
also large.  A similar argument can be made when (c1(R) - 1) > (1 - cp(R)).


5.3 Starting Points for Iterative Procedures

Iterative procedures were proposed in the last chapter for deter-

mining the coefficient vectors which define the SUMCOR, SSQCOR, and

GENVAR canonical variates. These procedures can be used not only at the

first stage but also at the higher stages, so long as the associated

constraints have the effect of confining the coefficient vectors to some

specified vector space.

There are two primary objectives to consider in the selection of

starting points for the iterative procedures: first, limiting the num-

ber of iterations needed to achieve convergence and, second, assuring

that convergence is to an optimal solution.

An "all purpose" starting point, which has been used successfully
with each of the procedures, begins with either jb_s ∝ 1 or jb*_s ∝ 1,
j = 1, 2, ..., m, depending on whether one is working with X or Y.
This equal weight starting point is most appropriate when one
does not want to bother to make the calculations needed to determine a
more refined starting point.  It can also be used to generate variables
to compare with those obtained using other starting points.

A better starting point for the first stage of the SUMCOR procedure
can usually be obtained by selecting the MAXVAR Z(1) for the initial
variables.  Recall from Sections 3.4 and 3.5 that the MAXVAR and SUMCOR
Z(1) vectors will be the same when the MAXVAR 1e_1 is proportional to
1.  Thus one would expect that the MAXVAR Z(1) will be a particularly
good starting point whenever the associated 1e_1 is nearly proportional
to a unit vector.  A similar type of starting point can be found for the
higher stages: choose that Z(s) which results when the MAXVAR pro-


cedure is applied subject to the constraint (4.2.7) but with

Z(1), Z(2), ..., Z(s-1) being the previously selected SUMCOR variables.
Again, if 1e_s is nearly proportional to 1, convergence should be

quick.

It was argued in Section 3.6 that the MAXVAR and SSQCOR procedures
tend to produce similar Z(1)'s whenever there exist first stage var-
iables which can be well explained by a single factor.  Accordingly, the
MAXVAR Z(1) can often serve as a useful starting point for the SSQCOR
procedure.  One can argue on similar grounds that a reasonable starting
point for the s-th stage of the SSQCOR procedure would be Z(s) con-
structed by the MAXVAR procedure subject to the constraint (4.2.7), in
which the SSQCOR Z(1), Z(2), ..., Z(s-1) are utilized instead of the
usual MAXVAR canonical variables.

Finally, the MINVAR Z(1) is often a satisfactory starting point
for the GENVAR first stage, especially when the number of sets is large.
(For a small number of sets, the MAXVAR Z(1) seems frequently to be a
better choice.)  And, at the higher stages, one can begin with Z(s)
constructed by the MINVAR procedure subject to (4.2.7), in which the
GENVAR Z(1), Z(2), ..., Z(s-1) are inserted in place of the usual
MINVAR canonical variables.

For all of the procedures requiring iterative techniques, it would

be advisable to try more than one starting point. Experience has shown

that convergence is usually obtained rapidly enough to permit this kind

of check without excessive expense.
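As a concrete illustration of the equal weight starting point, the sketch
below (Python; names are illustrative, and the scaling of each starting
vector so that the resulting variate has unit variance is an added assumption,
not something specified in this section) forms jb_1 proportional to 1 for
every set.

```python
import numpy as np

def equal_weight_start(Sigma_blocks):
    """Equal weight starting point: for each set j, take b_j proportional
    to the vector of ones, scaled so that b_j' Sigma_jj b_j = 1."""
    starts = []
    for S_jj in Sigma_blocks:
        ones = np.ones(S_jj.shape[0])
        starts.append(ones / np.sqrt(ones @ S_jj @ ones))
    return starts
```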


5.4 Example Number One

The first example is the one which Horst dealt with in [8], [9],
and [10].  In this example, m = 3, p1 = 3, p2 = 3, p3 = 3, and p = 9.
The first set contains variables 5, 6, and 7 (cf. Table 5.4.1); the
second contains variables 12, 13, and 14; and the third contains var-
iables 19, 20, and 21.  Horst made a preliminary transformation from
the original variables with (sample) covariance matrix Σ1 (the sub-
script indicates the example number) to new variables Y with (sample)
correlation matrix R1.  The analysis given here starts from this latter
matrix as found in [8] or [9] and displayed below:

R1 =  1.000  0.000  0.000  0.636  0.126  0.059  0.626  0.195  0.059
             1.000  0.000 -0.021  0.633  0.049  0.035  0.459  0.129
                    1.000  0.016  0.157  0.521  0.048  0.238  0.426
                           1.000  0.000  0.000  0.709  0.050 -0.002
                                  1.000  0.000  0.039  0.532  0.190
                                         1.000  0.067  0.258  0.299
                                                1.000  0.000  0.000
                                                       1.000  0.000
                                                              1.000


TABLE 5.4.1

Correlation Coefficients for the Twenty-one Variables

       2(M) 3(V) 4(W) 5(S) 6(N) 7(R) 8(P) 9(M) 10(V) 11(W) 12(S) 13(N) 14(R) 15(P) 16(M) 17(V) 18(W) 19(S) 20(N) 21(R)

 1(P)  .129 .261 .297 .181 .434 .292 .461 .156 .247 .239 .143 .497 .259 .430 .129 .204 .231 .221 .369 .348
 2(M)  .295 .286 .103 .196 .320 .209 .280 .364 .299 .046 .268 .381 .263 .478 .282 .311 .061 .261 .285
 3(V)  .419 .108 .298 .492 .297 .151 .829 .356 .033 .309 .555 .343 .234 .768 .407 .108 .351 .425
 4(W)  .176 .293 .391 .256 .243 .472 .654 .092 .348 .355 .447 .240 .428 .557 .127 .319 .398
 5(S)  .249 .271 .416 .227 .115 .192 .636 .183 .185 .279 .100 .272 .100 .626 .369 .279
 6(N)  .399 .339 .121 .323 .261 .138 .654 .262 .317 .216 .296 .233 .190 .527 .356
 7(R)  .378 .290 .468 .367 .180 .407 .613 .418 .249 .446 .305 .225 .471 .610
 8(P)  .338 .264 .221 .402 .334 .398 .505 .175 .400 .183 .424 .433 .401
 9(M)  .242 .184 .183 .040 .254 .224 .292 .234 .122 .252 .155 .217
10(V)  .415 .061 .347 .525 .349 .260 .775 .482 .125 .369 .381
11(W)  .165 .308 .323 .372 .217 .354 .514 .144 .334 .381
12(S)  .091 .147 .291 .078 .205 .009 .709 .254 .191
13(N)  .296 .354 .203 .271 .254 .103 .541 .394
14(R)  .385 .254 .523 .319 .179 .437 .496
15(P)  .209 .332 .350 .298 .347 .460
16(M)  .251 .236 .166 .140 .192
17(V)  .433 .238 .385 .396
18(W)  .066 .293 .303
19(S)  .291 .245
20(N)  .429

(Each row lists the correlations of that variable with the variables following it.)

P = perception    M = rote memory    V = verbal comprehension    W = word fluency
S = space         N = number         R = reasoning


The SSQCOR calculations for R1 using restriction (3.9.1) give


Φ(1) =  1.000  0.735  0.756
               1.000  0.743
                      1.000

Φ(2) =  1.000  0.603  0.504
               1.000  0.635
                      1.000

Φ(3) =  1.000  0.464  0.268
               1.000  0.165
                      1.000

E(1) =   0.578  -0.533  -0.619
         0.574   0.804  -0.155
         0.580  -0.265   0.770

E(2) =   0.559  -0.753  -0.348
         0.602   0.080   0.794
         0.571   0.653  -0.498

E(3) =   0.653  -0.176  -0.736
         0.611  -0.451   0.650
         0.447   0.875   0.187


The GENVAR, MAXVAR, and SUMCOR results using restriction (3.9.1)
are virtually the same in each case as those for the SSQCOR procedure.
The SUMCOR calculations also have been carried out using restriction
(3.9.2), first with I_k = {1} and then again with I_k = {1, 2}.  The
second and third stage correlation matrices in the first instance are

Φ(2) =  1.000  0.601  0.505
               1.000  0.638
                      1.000
and

Φ(3) =  1.000  0.315  0.153
               1.000  0.526
                      1.000

and in the second instance are

Φ(2) =  1.000  0.603  0.505
               1.000  0.635
                      1.000
and

Φ(3) =  1.000  0.465  0.269
               1.000  0.166
                      1.000

As one would expect, higher criterion values are attained as the
number of sets upon which the restrictions are imposed is reduced.  The
results are summarized in Table 5.4.2.


TABLE 5.4.2
Values of the SUMCOR Criterion Function for R1
Using Different Restrictions from the Class (3.9.2)

  I_k :     {1}      {1, 2}    {1, 2, 3}

  Stage 2   6.489    6.487     6.484
  Stage 3   4.988    4.800     4.794

The eigenvalues which are relevant to the MAXVAR/MINVAR procedures
using restriction (3.9.10), along with the values of k_s, are displayed
in Table 5.4.3.

TABLE 5.4.3
Values of c_j(R1) and Related MAXVAR/MINVAR Quantities

  MAXVAR
   j    c_j(R1)    k_j     eigenvalues of the associated Φ for Z(j)
   1    2.490       1      2.490   0.267   0.243
   2    2.164       1      2.164   0.497   0.338
   3    1.620       1      1.620   0.860   0.520

  MINVAR
   j    c_j(R1)    k       eigenvalues of the associated Φ for Z(10-j)
   4    0.867       2      1.517   0.867   0.616
   5    0.541       3      1.455   1.004   0.541
   6    0.487       3      1.757   0.756   0.487
   7    0.337       3      1.954   0.709   0.337
   8    0.259       3      2.365   0.376   0.259
   9    0.235       3      2.082   0.683   0.235


There are evidently three MAXVAR and six MINVAR stages.  Each
MAXVAR Z(s) has associated with it a representation of the type (3.9.5)
involving a single factor while each MINVAR Z(s), except Z(6), has
associated with it a representation of the type (3.9.8) using two fac-
tors.

The close resemblance of the MAXVAR λ'(j) in Table 5.4.3 to those
previously listed for the SSQCOR procedure suggests that the corre-
sponding Φ(j) and E(j) will be virtually the same in both cases, as
they indeed turn out to be.  (The same resemblance holds if (3.9.10) is
replaced by (3.9.1).)

The Φ(s) and E(s) matrices for the first two MINVAR stages
(using restriction (3.9.10)) are as follows:

Φ(1) =  1.000   0.345  -0.736
                1.000  -0.517
                        1.000

Φ(2) =  1.000  -0.725   0.626
                1.000  -0.695
                        1.000

E(1) =   0.591  -0.516   0.620
         0.493   0.839   0.228
        -0.638   0.171   0.751

and

E(2) =   0.574  -0.632   0.520
        -0.593   0.118   0.797
         0.565   0.766   0.307


Taking the element 0.756 from the SSQCOR Φ(1) matrix and using
(5.2.1), one has

    0.756 ≤ φ_max ≤ 0.765 .

The dominant elements of the MINVAR 3e_1 vector are the first and the
third, which suggests that the corresponding sets are the ones which yield
φ_max.  As a rough approximation, the canonical variates jZ_1, jZ_2, and
jZ_3, other than the MINVAR ones, are proportional to 1'X_j,
(2, -1, -1)X_j, and (0, 1, -1)X_j, j = 1, 2, 3.  Z(1) is well ex-
plained by a single factor accounting for 83.0 per cent of its vari-
ability and contributing with equal weight to each of the jZ_1.  The
factor is approximately proportional to 1'X.  For Z(2), the first
factor accounts for 72.1 per cent and the second factor 16.6 per cent
of the variability.  The second factor is present mainly in 1Z_2 and
3Z_2.  Two factors account for 82.6 per cent of the variability in
Z(3), about as much as one factor does for Z(1).

Table 5.4.4 shows the first stage SSQCOR criterion values(1) at var-
ious steps of the iteration procedure.  The following starting vectors
jb*_1(0)', j = 1, 2, 3, were used:

(i)    ( 0.574  0.574  0.574 ) ;
(ii)   ( 1.000  0.000  0.000 ) ;
(iii)  ( 0.000  0.000  1.000 ) ;

(1) The figures in Table 5.4.4 are subject to error in the sixth
decimal place.


(iv)   the SUMCOR jb*_1 under restriction (3.9.1);
(v)    the SUMCOR jb*_2 under restriction (3.9.1);
(vi)   the MAXVAR jb*_1.

Starting point (vi) gave the quickest convergence which is not sur-
prising in view of the MAXVAR 1e_1 being nearly proportional to 1.

TABLE 5.4.4

Values of SSQCOR 3f_1(n) on R1 Using Different Starting Points

Starting Points

n (i) (ii) (iii) (iv) (v) (vi)

1 6.303480 5.799538 5.118418 5.046118 3.635250 6.329807

2 6.324824 6.051978 6.044878 5.047182 3.686164 6.329809

3 6.328192 6.209252 6.235992 5.049262 4.686334

4 6.329200 6.281344 6.293624 5.054866 6.182884

5 6.329578 6.311030 6.315908 5.069552 6.323596

6 6.329720 6.322640 6.324514 5.107198 6.329160

7 6.329778 6.327086 6.327802 5.197832 6.329620

8 6.329796 6.328780 6.329044 5.385994 6.329738

9 6.329804 6.329422 6.329522 5.677556 6.329786

10 6.329810 6.329666 6.329700 5.972386 6.329798

11 6.329810 6.329752 6.329770 6.167168 6.329802

12 6.329792 6.329792 6.263168 6.329808

13 6.329802 6.329804 6.303786 6.329810

14 6.329808 6.329808 6.319844

15 6.329808 6.329808 6.326020

16 6.329806 6.328374

17 6.329264

18 6.329602

19 6.329734

20 6.329780

21 6.329800

22 6.329808

23 6.329804

24 6.329806


5.5 Example Number Two

Σ2 is the correlation matrix of the twenty-one variables in Table

5.4.1. The first set consists of the first seven variables, the second

set consists of the middle seven variables, and the third set consists

of the last seven variables.

The main features of the data can be extracted by the MAXVAR and

MINVAR procedures. The eigenvalues of R2 are given in Table 5.5.1

along with the number of factors for the models (3.9.5) and (3.9.8)

using restriction (3.9.10).

The other key quantities for the first two MAXVAR stages are

Φ(1) =  1.000  0.889  0.869
               1.000  0.871
                      1.000

Φ(2) =  1.000  0.710  0.636
               1.000  0.689
                      1.000

E(1) =   0.578  -0.449  -0.681
         0.579  -0.362   0.730
         0.575   0.817  -0.050

E(2) =   0.575  -0.640  -0.510
         0.589  -0.109   0.801
         0.568   0.760  -0.314

λ(1)' = ( 2.753  0.136  0.111 ) ,

λ(2)' = ( 2.357  0.366  0.277 ) ,


 1b_1' = (  0.114   0.149   0.532   0.216   0.190   0.124   0.190 ) ,
 2b_1' = (  0.208   0.031   0.538   0.179   0.097   0.184   0.224 ) ,
 3b_1' = (  0.193   0.109   0.534   0.134   0.028   0.223   0.206 ) ,
 1b_2' = ( -0.284   0.034   0.618   0.135  -0.759  -0.096  -0.166 ) ,
 2b_2' = ( -0.191  -0.149   0.605   0.041  -0.723  -0.291   0.014 ) ,
and
 3b_2' = ( -0.162   0.035   0.530   0.205  -0.819  -0.291  -0.051 ) .

It appears that the best fitting factor for Z(1) is largely asso-
ciated with the verbal primary ability while the best fitting factor for
Z(2) is, for the most part, a weighted difference of verbal and spatial

abilities.

For the MINVAR first stage

Φ(1) =  1.000  -0.894  -0.022
                1.000  -0.073
                        1.000

E(1) =   0.705  -0.081   0.704
        -0.708  -0.024   0.706
         0.040   0.996   0.074

λ(1)' = ( 1.896  1.003  0.101 ) ,

 1b_1' = ( -0.120  -0.188  -0.547  -0.219  -0.024  -0.137  -0.203 ) ,
 2b_1' = (  0.140   0.018   0.607   0.171  -0.025   0.208   0.221 ) ,
and
 3b_1' = (  0.074   0.553  -0.645  -0.084  -0.014  -0.358   0.820 ) .


TABLE 5.5.1
Values of c_i(R2) and c_i(R3) and Related Quantities

  i    c_i(R2)    k_i/k_(22-i)    c_i(R3)    k_i/k_(22-i)

1 2.752 1 4.406 1

2 2.357 1 2.650 1

3 2.148 1 2.107 1

4 1.808 1 1.815 1

5 1.606 1 1.519 2

6 1.482 1 1.321 2

7 1.356 1 0.892 4

8 0.961 3 0.840 5

9 0.888 2 0.742 4

10 0.838 3 0.645 5

11 0.800 3 0.625 5

12 0.666 3 0.573 5

13 0.641 3 0.506 6

14 0.586 3 0.480 5

15 0.478 3 0.428 6

16 0.434 3 0.390 6

17 0.404 3 0.307 6

18 0.310 3 0.239 6

19 0.259 3 0.228 6

20 0.123 3 0.161 6

21 0.101 3 0.124 6

Using the MINVAR Φ(1) and (5.2.1), one obtains

(5.5.1)    0.894 ≤ φ_max ≤ 0.899 .

The elements of 3e_1 suggest that φ_max occurs between the first and
second sets.


Note the close resemblance between the MAXVAR 1b_1 and 2b_1 and
the corresponding MINVAR 1b_1 (apart from a factor of (-1)) and 2b_1.
Without separately analyzing the first and second sets, one can surmise,
in light of (5.5.1) and the MINVAR Φ(1), that the MINVAR residual fac-
tor 3F_1 is, in essence, the difference between the first canonical
variates for these sets.

5.6 Example Number Three

Σ3 is the same matrix as Σ2, but the breakdown of variables into
sets is different (and hence R2 and R3 are different).  Now there are
six sets: the first, third, and fifth sets consist of the first four
variables in the three sets used in Example Two; the second, fourth, and
sixth sets consist of the last three variables from the sets in Example
Two.

The eigenvalues of R3 and the number of factors for models (3.9.5)
and (3.9.8), using restriction (3.9.10), may be found in Table 5.5.1.
The detailed calculations are given only for the first stages of the
SSQCOR and GENVAR procedures.  The SSQCOR results are given first:

Φ(1) =  1.000  0.598  0.822  0.662  0.791  0.591
               1.000  0.619  0.730  0.590  0.739
                      1.000  0.677  0.823  0.628
                             1.000  0.636  0.712
                                    1.000  0.592
                                           1.000


and

E(1) =   0.415  -0.398   0.089  -0.026  -0.715   0.387
         0.394   0.482   0.018  -0.781  -0.027  -0.033
         0.425  -0.361  -0.068   0.025  -0.008  -0.827
         0.409   0.292   0.749   0.396   0.171   0.032
         0.412  -0.421  -0.181  -0.095   0.671   0.401
         0.393   0.465  -0.628   0.472  -0.091   0.065

λ(1)' = ( 4.406  0.683  0.284  0.255  0.206  0.167 ) ,

 1b_1' = ( 0.322  0.229  0.562  0.307 ) ,
 3b_1' = ( 0.463  0.024  0.592  0.279 ) ,
 5b_1' = ( 0.449  0.133  0.608  0.154 ) ,
 2b_1' = ( 0.239  0.369  0.686 ) ,
 4b_1' = ( 0.254  0.498  0.651 ) ,
and
 6b_1' = ( 0.181  0.520  0.578 ) .

The GENVAR results are


Φ(1) =  1.000  0.548  0.858  0.604  0.814  0.538
               1.000  0.573  0.736  0.565  0.751
                      1.000  0.636  0.840  0.575
                             1.000  0.607  0.727
                                    1.000  0.567
                                           1.000


E(1) =   0.415  -0.423  -0.020   0.064  -0.590  -0.545
         0.392   0.460   0.192   0.772  -0.008   0.034
         0.427  -0.380  -0.004  -0.026  -0.171   0.802
         0.406   0.332  -0.821  -0.197   0.099  -0.049
         0.418  -0.378   0.164  -0.009   0.775  -0.235
         0.391   0.461   0.512  -0.600  -0.110  -0.030

λ(1)' = ( 4.317  0.842  0.271  0.248  0.186  0.136 ) ,

 1b_1' = ( 0.149   0.166  0.706  0.299 ) ,
 3b_1' = ( 0.298  -0.006  0.754  0.230 ) ,
 5b_1' = ( 0.293   0.094  0.708  0.207 ) ,
 2b_1' = ( 0.364   0.370  0.604 ) ,
 4b_1' = ( 0.355   0.515  0.581 ) ,
and
 6b_1' = ( 0.288   0.535  0.494 ) .

The largest off-diagonal element of the MINVAR Φ(1) matrix, in ab-
solute value, turns out to be 0.864 and appears in the first row and
third column.  Thus, with the aid of (5.2.1), it follows that

    0.864 ≤ φ_max ≤ 0.876 .

The dominant elements of the MINVAR 6e_1 are the first and the third,
which suggests that it is the corresponding sets which achieve φ_max.


The main difference between the two Φ(1) matrices is that the off-
diagonal elements of the SSQCOR matrix which are less (greater) than
0.7 are greater (less) than the corresponding elements in the GENVAR
matrix.  Looking at the two λ(1) vectors, one sees that
1λ_1, 3λ_1, 4λ_1, 5λ_1, and 6λ_1 are less for the GENVAR procedure than
for the SSQCOR procedure while only 2λ_1 is greater.  Passing from the
SSQCOR to the GENVAR jb_1, it is apparent that the most important dif-
ferences are a uniform increase in the verbal comprehension and space
coefficients and a uniform decrease in the perception and reasoning coef-
ficients.

Focusing on the SSQCOR results, the first two factors account for
84.8 per cent of the variability in Z(1).  The elements of 1e_1 indi-
cate that the first factor contributes with approximately the same
weight to each jZ_1.  The second factor, however, has positive weight
in the representation of 2Z_1, 4Z_1, and 6Z_1 and negative weight in
the representation of 1Z_1, 3Z_1, and 5Z_1.  Each of the remaining four
factors appears to be common to at most three of the jZ_1.


BIBLIOGRAPHY

[1] Anderson, T.W. (1958). Introduction to Multivariate Statisti-

cal Analysis. Wiley, New York.

[2] Bartlett, M.S. (1951). "The goodness of fit of a single hypo-

thetical discriminant function in the case of several

groups." Annals of Eugenics, 16, 199-214.

[3] Carroll, J. Douglas (1968). "A generalization of canonical

correlation analysis to three or more sets of variables."

To appear in Proceedings of American Psychological

Association.

[4] Cliff, Norman (1966). "Orthogonal rotation to congruence."

Psychometrika, 31, 33-42.

[5] Fisher, R.A. (1940). "The precision of discriminant functions."

Annals of Eugenics, 10, 422-429.

[6] Fortier, J.J. (1966). "Simultaneous linear prediction."

Psychometrika, 31, 369-381.

[7] Gower, J.C. (1967). "Multivariate analysis and multidimen-

sional geometry." The Statistician, 17, 13-28.

[8] Horst, Paul (1961). "Relations among m sets of measures."

Psychometrika, 26, 129-149.

[9] Horst, Paul (1961). "Generalized canonical correlations and

their applications to experimental data." Journal of

Clinical Psychology (monograph supplement), 14, 331-347.

[10] Horst, Paul (1965). Factor Analysis of Data Matrices. Holt,

Rinehart and Winston, New York.


[11] Hotelling, Harold (1935). "The most predictable criterion."

Journal of Educational Psychology, 26, 139-142.

[12] Hotelling, Harold (1936). "Relations between two sets of

variates." Biometrika, 28, 321-377.

[13] Hsu, P.L. (1946). "On a factorization of pseudo-orthogonal

matrices." Quarterly Journal of Mathematics, 17, 162-165.

[14] Kendall, Maurice G. and Stuart, Alan (1961). The Advanced

Theory of Statistics, Vol. 2. Hafner, New York.

[15] Lancaster, H.O. (1966). "Kolmogorov's remark on the Hotelling

canonical correlations." Biometrika, 53, 585-588.

[16] McKeon, J.J. (1966). "Canonical analysis: some relations be-

tween canonical correlation, factor analysis, discriminant

function analysis, and scaling theory." Psychometric

Monographs, 13.

[17] Mostow, George D., Sampson, Joseph H., and Meyer, Jean-Pierre

(1963). Fundamental Structures of Algebra. McGraw-Hill,

New York.

[18] Okamoto, Masashi and Kanazawa, Mitsuyo (1968). "Minimization

of eigenvalues of a matrix and optimality of principal

components." Annals of Mathematical Statistics, 39,

859-863.

[19] Rao, C. Radhakrishna (1964). "Use and interpretation of prin-

cipal components." Sankhya, 26, 329-358.

[20] Rao, C. Radhakrishna (1965). Linear Statistical Inference and

its Applications. Wiley, New York.


[21] Steel, Robert G.D. (1951). "Minimum generalized variance for a

set of linear functions." Annals of Mathematical Statis­

tics, 22, 456-460.

[22] Thurstone, L.L. and Thurstone, Thelma Gwinn (1941). "Factorial

studies of invariance." Psychometric Monographs, 2.

[23] Vinograde, Bernard. (1950). "Canonical positive definite ma-

trices under internal linear transformations." Proceedings

of the American Mathematical Society, 1, 159-161.

[24] Williams, E.J. (1952). "Use of scores for the analysis of

association in contingency tables." Biometrika, 39,

274-289.

[25] Wrigley, Charles and Neuhaus, Jack O. (1955). "The matching

of two sets of factors." American Psychologist, 10,

418-419.


APPENDIX

LEMMA A1.  Let A and B be (p x p) matrices with A symmetric
and B positive definite.  Let δ1 ≥ δ2 ≥ ... ≥ δp be the eigenvalues
of B^{-1}A and ℓ1, ℓ2, ..., ℓp a corresponding set of eigenvectors such
that ℓi'Bℓj = δij (Kronecker delta).  Then

    sup_y  y'Ay / y'By  =  ℓ1'Aℓ1 / ℓ1'Bℓ1  =  δ1

and

    sup_{y: ℓi'By = 0, i = 1, 2, ..., k}  y'Ay / y'By
        =  ℓ(k+1)'Aℓ(k+1) / ℓ(k+1)'Bℓ(k+1)  =  δ(k+1) ,    k = 1, 2, ..., p-1.

Proof.  Let x = B^{1/2}y and apply (1f.2.1) - (1f.2.3) of Rao [20].

LEMMA A2.  Let M be a (p x q) matrix with p ≤ q.  Then there
exist orthogonal matrices Γ and Δ such that

    Γ M Δ' = ( D  0 ) ,

where D = diag(δ1, δ2, ..., δp) and δ1², δ2², ..., δp² are the eigenvalues
of MM'.

Proof.  See Hsu [13] or Vinograde [23].


LEMMA A3.  Let X be a (p x 1) random vector with E(X) = 0
and var(X) = Σ.  Let M be any (p x q) matrix and Y any (q x 1)
random vector.  Let f be any real-valued function defined on the set
of all non-negative definite matrices of order p which is identical to
some function g(δ1, δ2, ..., δp), where δ1, δ2, ..., δp are the
eigenvalues of the matrix in the argument of f and g is strictly in-
creasing in each δj.  Then

    f( E{ (X - MY)(X - MY)' } )

is minimized with respect to M and Y when and only when

    MY = Σ_{j=1}^{q} v_j v_j' X ,

the v_j being orthonormal eigenvectors of Σ corresponding to the δj.
The minimum value of f is

    g( δ(q+1), δ(q+2), ..., δp, 0, ..., 0 ).

Proof.  See Okamoto and Kanazawa [18].

The next lemma is based on an assertion by Rao ([20], p. 441).

LEMMA A4.  Suppose

    Σ =  Σ11  Σ12
         Σ21  Σ22

is non-negative definite.  Then there exist matrices M12 and M21
such that

    Σ12 = M12 Σ22    and    Σ21 = M21 Σ11 .

Proof.  Σjj = Tj Tj' for some Tj, j = 1, 2; V(Σ12) ⊂ V(T1) = V(T1T1') = V(Σ11).
Therefore, Σ21 = M21 Σ11 for some M21, and similarly Σ12 = M12 Σ22.

LEMMA A5.  Let T: U → U be an idempotent (i.e., T² = T)
endomorphism of a vector space.  Then U is the direct sum of the
kernel of T and the image of T.

Proof.  See Mostow, Sampson, and Meyer ([17], p. 384).