CONTRIBUTIONS TO THE THEORY, COMPUTATION AND APPLICATION OF GENERALIZED INVERSES

Charles A. Rohde

Institute of Statistics Mimeograph Series No. 392
May 1964

TABLE OF CONTENTS

                                                              Page

LIST OF TABLES ............................................... vi
LIST OF ILLUSTRATIONS ........................................ vii

CHAPTER 1. INTRODUCTION ......................................  1

CHAPTER 2. LITERATURE REVIEW .................................  4
    2.1 Summary ..............................................  4
    2.2 Generalized Inverses .................................  4
    2.3 Reflexive Generalized Inverses ....................... 13
    2.4 Normalized Generalized Inverses ...................... 14
    2.5 Pseudo-Inverses ...................................... 16
    2.6 Generalizations to Hilbert Spaces .................... 19
    2.7 Generalized Inverses in Algebraic Structures ......... 29

CHAPTER 3. THEORETICAL RESULTS ............................... 32
    3.1 Summary .............................................. 32
    3.2 Results on Rank ...................................... 32
    3.3 Symmetry Results ..................................... 35
    3.4 Eigenvalues and Eigenvectors ......................... 37
    3.5 Orthogonal Projections ............................... 41
    3.6 Characterization of Generalized Inverses ............. 42

CHAPTER 4. COMPUTATIONAL METHODS ............................. 47
    4.1 Summary .............................................. 47
    4.2 An Expression for a Generalized Inverse of a
        Partitioned Matrix ................................... 47
    4.3 Some Computational Procedures for Finding
        Generalized Inverses ................................. 52
        4.3.1 Generalized Inverses ........................... 52
        4.3.2 Reflexive Generalized Inverses ................. 54
        4.3.3 Normalized Generalized Inverses ................ 56
        4.3.4 Pseudo-Inverses ................................ 56
    4.4 Abbreviated Doolittle Technique ...................... 59
    4.5 Some Results on Bordered Matrices .................... 69

CHAPTER 5. LEAST SQUARES APPLICATIONS ........................ 74
    5.1 Introduction ......................................... 74
    5.2 A Review of Some Current Theories .................... 75
    5.3 Weighted Least Squares and the Generalized
        Gauss-Markoff Theorem ................................ 79
    5.4 Linear Models with Restricted Parameters ............. 82
    5.5 Linear Models with a Singular Variance-Covariance
        Matrix ............................................... 87
    5.6 Multiple Analysis of Covariance ...................... 89
    5.7 Use of the Abbreviated Doolittle in Linear
        Estimation ........................................... 91
    5.8 Tests of Linear Hypotheses ........................... 92

CHAPTER 6. SUMMARY AND SUGGESTIONS FOR FUTURE RESEARCH ....... 96
    6.1 Summary .............................................. 96
    6.2 Suggestions for Future Research ...................... 96

CHAPTER 7. LIST OF REFERENCES ................................ 103

LIST OF TABLES

                                                              Page

4.1. The Abbreviated Doolittle format ........................ 63

LIST OF ILLUSTRATIONS

                                                              Page

2.1. The Pseudo-Inverse in geometric terms ................... 27

CHAPTER 1. INTRODUCTION

Statistics, which may be viewed as a branch of mathematics, has suggested many interesting mathematical problems. Certain developments in the areas of measure theory, modern algebra, number theory, numerical analysis, etc. have been fostered by research into statistical problems. Matrix theory affords perhaps the best example of the interplay between statistics and mathematics. The most widely used statistical techniques are Analysis of Variance and Regression, and each can be compactly treated using matrix algebra. Application of matrix algebra to these areas led, for example, to Cochran's Theorem, which has as its mathematical counterpart the Spectral Representation Theorem for a symmetric matrix. Matrix theory also has provided relatively simple proofs of the properties of least squares as used in Analysis of Variance and Regression problems. Mathematically these proofs are based on the concepts of projections and linear transformations on (finite dimensional) vector spaces. Recent work in the area of stationary stochastic processes has indicated that the infinite dimensional analogues of Analysis of Variance and Regression techniques will become increasingly important and useful to the applied statistician.

Application of the theory of least squares to the General Linear Model results in a set of linear equations called the normal or least squares equations. When represented in matrix form, these equations frequently have no unique solution, i.e. the coefficient matrix is singular. To cope with this situation, certain authors (Bose [1959], and C. R. Rao [1962]) developed the concept of a Generalized Inverse. Roughly speaking, a Generalized Inverse is an attempt to represent a typical or generic solution to a set of linear equations in a compact form. Several earlier generalizations of the inverse of a matrix developed in certain pure mathematical contexts.

Until the late 1950's little investigation of the theoretical properties of Generalized Inverses was undertaken. Starting in 1955 with an excellent article by Penrose, many authors have investigated certain properties of a Generalized Inverse called the Pseudo-Inverse. The Pseudo-Inverse of a matrix is a uniquely determined matrix which behaves almost exactly like an inverse. Due to its uniqueness it was natural that the properties of the Pseudo-Inverse were studied and extended to apply to other Algebraic Systems and just recently to Hilbert space. The Pseudo-Inverse of a matrix, while of interest because of its theoretical properties, is perhaps a little too cumbersome computationally for use in least squares theory. In particular the results of least squares computations happen to be invariant under choice of Generalized Inverse. Such invariance, coupled with the computational difficulties present with the Pseudo-Inverse, indicates one need for research on the properties of other types of Generalized Inverses. At present there appear to be four types of Generalized Inverses which deserve recognition and study.

In this work we shall attempt to accomplish five purposes:

1. Define the various types of Generalized Inverses which appear to be important and collate previous research,

2. Investigate inter-relationships between the various types of Generalized Inverses,

3. Discuss some computational aspects of obtaining the various types of Generalized Inverses,

4. Indicate and modify the statistical applications of Generalized Inverses which have appeared in the literature, and

5. Discuss possible areas for future research.

Although topics 1 through 5 were of primary importance in this work, at least one interesting by-product appeared. Upon investigating the computational methods it was found that, by a simple modification of the techniques and theory taught in beginning matrix theory courses, Generalized Inverses could be discussed at an elementary level. This suggests that the idea of Generalized Inverses might well be fitted into beginning matrix theory courses in order to achieve a more general and more satisfying treatment of systems of linear equations.

CHAPTER 2. LITERATURE REVIEW

2.1 Summary

In this chapter we investigate five definitions of Generalized Inverses. Each of these definitions, as mentioned in the Introduction, is designed to assist in solving a set of equations. Four of the definitions presented are concerned with linear equations in finite dimensional inner product spaces. Definition 2.1 provides the most general representation of a solution. Each of the other definitions can be considered as special cases of Definition 2.1, which defines what we shall call a Generalized Inverse. Definition 2.2 defines the concept of a Reflexive Generalized Inverse, which is simply a Generalized Inverse with a little symmetry. Normalized Generalized Inverses are introduced in Definition 2.3 and can be considered as Reflexive Generalized Inverses with a certain projection property. The concept of Pseudo-Inverse introduced in Definition 2.4 is simply a unique Generalized Inverse, or alternatively a Normalized Generalized Inverse with a symmetric projection property.

An extension of the concept of Generalized Inverses to Hilbert spaces is achieved in Definition 2.5, where the Pseudo-Inverse of a bounded linear transformation is defined. Further extensions of the concept of Generalized Inverses are discussed in Section 2.6.

2.2 Generalized Inverses

The definition of a Generalized Inverse formulated by Bose [1959], who used the term Conditional Inverse, revolved around the need to solve the system of non-homogeneous linear equations

(2.1)  Ax = y ,

where A is an n x p matrix (n > p), x is a p x 1 vector and y is an n x 1 vector. The rank of A is assumed to be r ≤ p. If n = p and r = p, then (2.1) admits a unique solution, namely x = A^{-1}y, where A^{-1} denotes the inverse of A. If r = p < n, then provided the equations are consistent (i.e., possess a solution), a solution is given by

(2.2)  x_0 = (A'A)^{-1}A'y .

Since (A'A)^{-1}A'A = I, we have Ax_0 = A(A'A)^{-1}A'(Ax̄) = Ax̄ = y for any consistent y = Ax̄, and so we see that (2.2) is a solution of (2.1). The matrix (A'A)^{-1}A' is an example of the Pseudo-Inverse of the matrix A and will be discussed in Section 2.5.

A solution of (2.1), provided it exists, is relatively easy to represent in both the cases mentioned above. In the more general case where r < p ≤ n, a solution of (2.1) is more difficult to represent. If we let A^g = A^{-1} or (A'A)^{-1}A', depending on whether r = p = n or r = p < n, then we see that

(2.3)  A A^g A = A .

This suggests that any matrix A^g furnishing a particular solution A^g y of (2.1) should satisfy (2.3). We thus introduce the following definition due to Bose.

Definition 2.1 (Generalized Inverse) A matrix A^g is said to be a Generalized Inverse of the matrix A if (2.3) is satisfied, i.e., if A A^g A = A.

Most of the theory of solving linear equations is based on the following three theorems, which are well known and stated here for future reference.

Theorem 2.1: If A is a square matrix, a necessary and sufficient condition that there exist x ≠ 0 such that Ax = 0 is that A be singular.

Theorem 2.2: A necessary and sufficient condition that the linear equations Ax = y be solvable for x is that

Rank [A] = Rank [A | y] .

Theorem 2.3: The general solution (more precisely, a typical element in the set of solutions) of the non-homogeneous linear equations Ax = y is obtained by adding to any particular solution, x_0, the general solution of

Ax = 0 .

Using the above three theorems, Bose proves the following results, the proofs of which we reproduce since they form an integral part of our later work. The first of these shows that any A^g satisfying Definition 2.1 plays a basic role in providing a solution to (2.1).

Theorem 2.4: (i) If A^g is a Generalized Inverse of A and the equations Ax = y are consistent (i.e., solvable for x), then A^g y = x_1 is a solution of Ax = y. (ii) If x_1 = A^g y is a solution of Ax = y for every y for which Ax = y admits a solution, then A A^g A = A, i.e., A^g is a Generalized Inverse of A.

Proof: (i) Since Ax = y is consistent, there exists an x̄ such that Ax̄ = y. Thus if x_1 = A^g y, then

A x_1 = A A^g y = A A^g (A x̄) = A x̄ = y .

Hence A^g y = x_1 is a solution to Ax = y.

(ii) We have A A^g y = y for every y for which the system Ax = y is consistent. In particular, let y = A x̄ for an arbitrary x̄. Then Ax = y is consistent, with A^g y = A^g A x̄ a solution, and thus we have

A A^g A x̄ = A x̄

for arbitrary x̄. Hence A A^g A = A.
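As a numerical illustration of Definition 2.1 and Theorem 2.4 (a NumPy sketch; the matrices are invented for the purpose, and the Moore-Penrose pseudo-inverse merely stands in as one convenient Generalized Inverse, since it satisfies A A^g A = A):

```python
import numpy as np

# A singular 3x3 matrix (row 3 = row 1 + row 2), so no ordinary inverse exists.
A = np.array([[1., 2., 0.],
              [0., 1., 1.],
              [1., 3., 1.]])

# Any matrix Ag with A Ag A = A is a Generalized Inverse (Definition 2.1);
# the Moore-Penrose pseudo-inverse is one such choice.
Ag = np.linalg.pinv(A)
assert np.allclose(A @ Ag @ A, A)

# A consistent right-hand side y = A xbar.
xbar = np.array([1., -1., 2.])
y = A @ xbar

# Theorem 2.4(i): x1 = Ag y solves A x = y.
x1 = Ag @ y
assert np.allclose(A @ x1, y)
```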

The following result, used by Bose in developing the theory of least squares, indicates an important set of linear equations which are consistent and possess a certain uniqueness property.

Theorem 2.5: The equations (X'X)β = X'y are consistent for any y ≠ 0, and Xβ is unique.

Proof: Since the rank of a product of matrices is less than or equal to the minimum of the separate ranks, we have

Rank [X'X | X'y] = Rank [X'(X | y)] ≤ Rank [X'] = Rank [X'X] .

We also have

Rank [X'X | X'y] ≥ Rank [X'X] .

Hence Rank [X'X | X'y] = Rank [X'X], and by Theorem 2.2 the equations (X'X)β = X'y are consistent.

Let β_1 and β_2 be any two solutions to (X'X)β = X'y; then

(X'X)(β_1 - β_2) = 0

and hence

(β_1 - β_2)' X'X (β_1 - β_2) = 0 ,

which implies

X(β_1 - β_2) = 0 ,

or

Xβ_1 = Xβ_2 ,

proving the uniqueness of Xβ.
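Theorem 2.5 can be illustrated numerically (a NumPy sketch with an invented rank-deficient design matrix; the pseudo-inverse supplies one solution of the normal equations, and a null vector of X generates a second, different one):

```python
import numpy as np

# Design matrix with a linear dependency: column 3 = column 1 + column 2.
X = np.array([[1., 1., 2.],
              [1., 0., 1.],
              [1., 1., 2.],
              [1., 0., 1.]])
y = np.array([3., 1., 4., 2.])

XtX, Xty = X.T @ X, X.T @ y

# Two distinct solutions of the normal equations (X'X)b = X'y:
b1 = np.linalg.pinv(XtX) @ Xty
v = np.array([1., 1., -1.])          # X v = 0, hence (X'X) v = 0
b2 = b1 + v
assert np.allclose(XtX @ b1, Xty)
assert np.allclose(XtX @ b2, Xty)
assert not np.allclose(b1, b2)       # the solutions differ ...
assert np.allclose(X @ b1, X @ b2)   # ... but X b is the same (Theorem 2.5)
```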

The following expanded version of Theorem 2.5, also due to Bose, finds extensive application in Chapter 4.

Theorem 2.6: If X' is a p x n matrix, there exists a p x n matrix Y such that (X'X)Y = X'. Further:

(i) XY is unique,
(ii) (XY)' = XY,
(iii) (XY)^2 = XY.

Proof: Let e_i be an (n x 1) vector with a 1 in the i-th row and 0's elsewhere. By Theorem 2.5, for each i = 1, 2, ..., n there exists a y_i such that (X'X)y_i = X'e_i. Hence

(X'X)[y_1, y_2, ..., y_n] = X'[e_1, e_2, ..., e_n] = X'I = X' ,

which proves the existence of Y by taking Y = [y_1, y_2, ..., y_n]. By Theorem 2.5, X y_i is unique for each i = 1, 2, ..., n; hence XY is unique. To prove (ii) and (iii), note that Y'(X'X)Y = Y'X'; the left-hand side is symmetric, so Y'X' = (Y'X')' = XY, and therefore (XY)' = Y'X' = XY. Also

(XY)^2 = (XY)'(XY) = Y'X'XY = Y'X' = XY .

Corollary 2.1 The matrix X(X'X)^g X', where (X'X)^g is any Generalized Inverse of (X'X), is uniquely determined, symmetric and idempotent.

Proof: By Theorems 2.5 and 2.6 the equations (X'X)Y = X' are consistent. Hence by Theorem 2.4 a solution is given by

Y = (X'X)^g X' .

By Theorem 2.6, X(X'X)^g X' has the stated properties.

In view of the importance of Corollary 2.1 an alternative proof will be presented.

(i) [(X'X)(X'X)^g (X'X)]' = (X'X)(X'X)^g' (X'X) = X'X implies that (X'X)^g' is also a Generalized Inverse of X'X.

(ii) [X - X(X'X)^g (X'X)]' [X - X(X'X)^g (X'X)]

= X'X - X'X(X'X)^g X'X - (X'X)(X'X)^g' X'X + (X'X)(X'X)^g' (X'X)(X'X)^g (X'X)

= X'X - X'X - X'X + X'X

= 0 ,

which implies that X = X(X'X)^g (X'X) for any Generalized Inverse of X'X.

(iii) Let (X'X)^g_1 and (X'X)^g_2 be any two Generalized Inverses of (X'X). Transposing the identity in (ii) gives X' = (X'X)(X'X)^g X' for any Generalized Inverse, so each cross product X(X'X)^g_i' (X'X)(X'X)^g_j X' reduces to X(X'X)^g_i' X', and hence

[X(X'X)^g_1 X' - X(X'X)^g_2 X']' [X(X'X)^g_1 X' - X(X'X)^g_2 X'] = 0 .

Thus X(X'X)^g_1 X' = X(X'X)^g_2 X', i.e., X(X'X)^g X' is uniquely determined.

(iv) [X(X'X)^g X']' = X(X'X)^g' X' = X(X'X)^g X' by (iii), since (X'X)^g' is also a Generalized Inverse.

(v) [X(X'X)^g X'][X(X'X)^g X'] = [X(X'X)^g (X'X)](X'X)^g X' = X(X'X)^g X' .

Thus X(X'X)^g X' is unique, symmetric, idempotent, X = X(X'X)^g (X'X), and X' = (X'X)(X'X)^g X'.
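Corollary 2.1 lends itself to a direct numerical check. In the NumPy sketch below (invented singular X'X), two different Generalized Inverses are produced by perturbing the pseudo-inverse along the null space, and X(X'X)^g X' is seen to be unchanged, symmetric and idempotent:

```python
import numpy as np

X = np.array([[1., 1., 2.],
              [1., 0., 1.],
              [1., 1., 2.]])
XtX = X.T @ X
v = np.array([1., 1., -1.])              # null vector: X v = 0

# Two different Generalized Inverses of X'X:
G1 = np.linalg.pinv(XtX)
G2 = G1 + np.outer(v, v)                 # adding v v' preserves A G A = A
assert np.allclose(XtX @ G1 @ XtX, XtX)
assert np.allclose(XtX @ G2 @ XtX, XtX)
assert not np.allclose(G1, G2)

# X (X'X)^g X' is invariant, symmetric and idempotent (Corollary 2.1).
H1, H2 = X @ G1 @ X.T, X @ G2 @ X.T
assert np.allclose(H1, H2)
assert np.allclose(H1, H1.T)
assert np.allclose(H1 @ H1, H1)
```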

Thus far we have not settled the question of existence of a Generalized Inverse for an arbitrary matrix A. The following result of Bose settles the existence question and in addition gives a type of canonical representation for a Generalized Inverse.

Theorem 2.7: If A is any matrix then a Generalized Inverse exists and is given by

A^g = P_2 B^g P_1 ,

where P_2 and P_1 are non-singular matrices such that

P_1 A P_2 = [ I(r)  0 ]
            [  0    0 ]  = B ,    r = Rank A ,

and

B^g = [ I(r)  U ]
      [  V    W ] ,

U, V and W being arbitrary matrices.

Further, every Generalized Inverse of A has the form P_2 B^g P_1 for some Generalized Inverse B^g of B.

Proof: For any matrix A there exist non-singular matrices P_1 and P_2 such that

P_1 A P_2 = [ I(r)  0 ]
            [  0    0 ]  = B ,

where r is the rank of A. Define

B^g = [ X  U ]
      [ V  W ] ;

then

B B^g B = [ I(r)  0 ] [ X  U ] [ I(r)  0 ]   [ X  0 ]
          [  0    0 ] [ V  W ] [  0    0 ] = [ 0  0 ] .

Hence B^g is a Generalized Inverse of B if and only if X = I(r). Thus

B^g = [ I(r)  U ]
      [  V    W ] ,

where U, V and W are arbitrary, is a canonical representation for a Generalized Inverse of B. Since A = P_1^{-1} B P_2^{-1}, we see that A^g = P_2 B^g P_1 is a Generalized Inverse of A, thus proving existence.

If A^g is any Generalized Inverse of A, then the equation A A^g A = A implies that

(P_1 A P_2)(P_2^{-1} A^g P_1^{-1})(P_1 A P_2) = P_1 A P_2 ,

or

B (P_2^{-1} A^g P_1^{-1}) B = B .

It follows that P_2^{-1} A^g P_1^{-1} is a Generalized Inverse of B, say B*. Hence

A^g = P_2 B* P_1

for some Generalized Inverse B* of B.
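The canonical construction of Theorem 2.7 can be traced on a small example (NumPy sketch; the matrix A and the reducing matrices P_1, P_2 are invented for the illustration):

```python
import numpy as np

A = np.array([[2., 4.],
              [1., 2.]])                 # rank 1

# Non-singular P1 (row operations) and P2 (column operations)
# reducing A to the canonical form B = [[1, 0], [0, 0]]:
P1 = np.array([[0.5, 0.],
               [-0.5, 1.]])
P2 = np.array([[1., -2.],
               [0., 1.]])
B = P1 @ A @ P2
assert np.allclose(B, [[1., 0.], [0., 0.]])

# Canonical Generalized Inverse of B: identity block plus arbitrary U, V, W.
for u, v, w in [(0., 0., 0.), (3., -1., 5.)]:
    Bg = np.array([[1., u],
                   [v, w]])
    Ag = P2 @ Bg @ P1                    # Theorem 2.7
    assert np.allclose(A @ Ag @ A, A)    # every choice yields A Ag A = A
```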

Bose applies the above results to obtain a solution of the so-called "normal" or least squares equations given by

(X'X)β = X'y ,

and discusses conditions for (linear) estimability, which reduce to the requirements for the uniqueness of a linear function λ'β, and (strong) testability, which reduce to conditions for the uniqueness of Lβ where the rank of L is full. (L is a rectangular matrix of order s x p (s < p) and rank s.)

C. R. Rao [1962], in a recent paper, has discussed what we have defined as a Generalized Inverse. Rao's treatment closely parallels that developed by Bose as given above. Rao defines his Generalized Inverse to be any matrix A^g such that for any y for which Ax = y is consistent, x = A^g y is a solution. He then establishes that A^g is a Generalized Inverse of A if and only if A A^g A = A. Rao discusses briefly the application of Generalized Inverses to the theory of least squares and distributions of quadratic forms in normal variables.

2.3 Reflexive Generalized Inverses

C. R. Rao [1962] established that there exists a Generalized Inverse with the additional property that

A^g A A^g = A^g .

In his proof of this result a constructive procedure for determining such an A^g is given. He simply uses the fact that for any matrix A there exist non-singular matrices P_1 and P_2 such that

P_1 A P_2 = [ I(r)  0 ]
            [  0    0 ]  = B ,

where r is the rank of A. The Generalized Inverse given by A^g = P_2 B' P_1 is such that A A^g A = A and A^g A A^g = A^g.

The reflexive property present with such a Generalized Inverse suggests that some interesting results might be obtained by considering the following type of Generalized Inverse.

Definition 2.2 A Reflexive Generalized Inverse of a matrix A is defined as any matrix A^r such that

(2.4)  A A^r A = A   and   A^r A A^r = A^r .
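Definition 2.2 can be illustrated numerically. The NumPy sketch below (invented matrices) exhibits a Generalized Inverse that fails the reflexive condition, and then applies a standard construction, not specific to this text: for any Generalized Inverse G of A, the product G A G satisfies both conditions (2.4):

```python
import numpy as np

A = np.array([[1., 0.],
              [0., 0.]])

# The identity is a Generalized Inverse of this A (A I A = A A = A,
# since A is idempotent) but it is not reflexive:
G = np.eye(2)
assert np.allclose(A @ G @ A, A)         # A G A = A holds
assert not np.allclose(G @ A @ G, G)     # G A G != G: not reflexive

# Standard construction: Ar = G A G is always a Reflexive Generalized Inverse.
Ar = G @ A @ G
assert np.allclose(A @ Ar @ A, A)        # first condition of (2.4)
assert np.allclose(Ar @ A @ Ar, Ar)      # second (reflexive) condition
```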

In the case of a symmetric matrix Rao has given a computational method (a modified version of which we discuss in detail in Chapter 4) for obtaining a Reflexive Generalized Inverse. Properties of this type of Generalized Inverse (other than those which a Generalized Inverse possesses) do not appear to have been previously discussed. It is of interest to point out here that Roy [1953] in his discussion of univariate Analysis of Variance utilizes an approach resting on the concept of the space spanned by the columns of a matrix. It can be easily shown that such an approach is exactly equivalent to working with a judicious choice of a Reflexive Generalized Inverse. This will be discussed briefly in Chapter 5.

Specific properties of the Reflexive Generalized Inverse will be discussed in Chapter 3.

2.4 Normalized Generalized Inverses

Zelen and Goldman [1963] in a recent article have proposed a type of Generalized Inverse which they call a Weak Generalized Inverse.

Definition 2.3 (Normalized Generalized Inverse) If X is an n x p matrix then a Normalized Generalized Inverse of X is a p x n matrix X^n satisfying

X X^n X = X ,   X^n X X^n = X^n ,   (X X^n)' = X X^n .
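As a numerical check (NumPy sketch with an invented rank-deficient X): the Moore-Penrose pseudo-inverse is in particular a Normalized Generalized Inverse, so it satisfies the conditions of Definition 2.3:

```python
import numpy as np

X = np.array([[1., 2.],
              [2., 4.],
              [3., 6.]])                 # n x p with rank 1

# The pseudo-inverse satisfies the three conditions of Definition 2.3
# (it satisfies a fourth, symmetry of X^n X, as well).
Xn = np.linalg.pinv(X)
assert np.allclose(X @ Xn @ X, X)            # X X^n X = X
assert np.allclose(Xn @ X @ Xn, Xn)          # X^n X X^n = X^n
assert np.allclose((X @ Xn).T, X @ Xn)       # X X^n symmetric
```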

Zelen and Goldman's proof of the existence of a Normalized Generalized Inverse is based on the following three Lemmas on bordered matrices, which we state without proof.

Lemma 2.1 Let A be a symmetric p x p matrix of rank r < p and let R be a p x q matrix of rank q = p - r. Then there exists a p x (p-r) matrix B with the properties

(*)  (a) B'A = 0 ,   (b) det (B'R) ≠ 0 ,

if and only if the square symmetric matrix

M = [ A   R ]
    [ R'  0 ]

is non-singular. In this case any B of rank p - r can be used as the B in (*). Furthermore M^{-1} has the form

M^{-1} = [ C             B(R'B)^{-1} ]
         [ (B'R)^{-1}B'       0      ] .

Lemma 2.2 Let A, R, M be as in Lemma 2.1 and assume M non-singular. Then there is a unique p x p symmetric matrix C associated to R, with the property that for at least one B satisfying (*),

(**)  (a) R'C = 0 .

(B bears no relation to the B used in Sections 2.2 and 2.3.) The matrix C satisfies (**) for every B satisfying (*), and has the additional properties:

(a) CAC = C, ACA = A,
(b) C is of rank r.

Also C can be written as C = [I - B(R'B)^{-1}R'][A + RB']^{-1}.

Lemma 2.3 Let A be a symmetric p x p matrix. If C satisfies (a) and (b) of Lemma 2.2 then C can be found using some R in the expression given for C in Lemma 2.2.

Note that C as given above is symmetric, since the inverse of the symmetric matrix M is symmetric. Also note that C is a Reflexive Generalized Inverse in the sense of Definition 2.2, but that C is required to be symmetric here.

The following Lemma, established by Zelen and Goldman, will be used in Chapter 3 to establish the relation between Reflexive Generalized Inverses and Normalized Generalized Inverses.

Lemma 2.4 Let X be an n x p matrix. Then the p x n matrix X^n is a Normalized Generalized Inverse of X if and only if

X^n = C X'

for some C associated with A = X'X as in Lemma 2.2.

Zelen and Goldman use the concept of Normalized Generalized Inverses to study and extend some of the concepts of minimum variance linear unbiased estimation theory.

2.5 Pseudo-Inverses

The earliest work on Generalized Inverses was done by E. H. Moore [1920]. Moore introduced the concept of the "general reciprocal" of a matrix in 1920 and later, in 1935, in a book entitled General Analysis, further investigated the concept. Penrose [1955, 1956], independently and through a different approach, discovered Moore's "general reciprocal", which he calls the "generalized inverse" of a matrix. Rado [1956] has pointed out the equivalence of the two approaches. Penrose's approach is much closer to the approach we have followed thus far, and we introduce this type of Generalized Inverse in this manner. Due to the recent literature we use the name Pseudo-Inverse for the Generalized Inverse developed by Moore and Penrose.

The definition of the Pseudo-Inverse of a matrix hinges on the following theorem proved by Penrose.

Theorem 2.8: The four equations

(i)   A A^t A = A ,
(ii)  A^t A A^t = A^t ,
(iii) (A A^t)' = A A^t ,
(iv)  (A^t A)' = A^t A ,

have a unique solution A^t for any A.

Proof: Penrose's proof can be greatly shortened using the results previously mentioned in this Chapter and the spectral representation of a symmetric matrix. We see that A'A is a symmetric matrix and hence has a spectral representation

A'A = Σ_i λ_i E_i ,

where E_i = E_i' = E_i^2, E_i E_j = 0 (i ≠ j), and λ_i is the i-th non-zero eigenvalue of A'A. If we let X = Σ_i λ_i^{-1} E_i, then (A'A) X (A'A) = Σ_i λ_i E_i = A'A, and similarly X (A'A) X = X. Hence X is a Reflexive Generalized Inverse of A'A, and by Corollary 2.1, A X A'A = A. We now define A^t = X A'. Then

A A^t A = A X A'A = A ,

A^t A A^t = X A'A X A' = X A' = A^t ,

A^t A = X A'A = Σ_i E_i = (Σ_i E_i)' = (A^t A)' ,

and

A A^t = A X A' = (A X A')' = (A A^t)' ,

since X is symmetric.

Following Penrose, uniqueness is shown by noting that if B also satisfies (i)-(iv), then

A^t = A^t A A^t = A^t A^t' A' = A^t A^t' A' B' A' = A^t A^t' A' (AB)' = A^t A^t' A' A B
    = A^t A B = A^t A A' B' B = A' B' B = (BA)' B = B A B = B ,

where the four defining relations, for both A^t and B, have been freely used.

Definition 2.4 The matrix A^t associated with A as in Theorem 2.8 is called the Pseudo-Inverse of A.
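The four Penrose equations are exactly the conditions satisfied by the matrix NumPy's `pinv` computes, which can be verified directly (illustrative matrix):

```python
import numpy as np

# np.linalg.pinv computes the unique Pseudo-Inverse of Theorem 2.8;
# the singular A below is arbitrary.
A = np.array([[1., 2., 3.],
              [2., 4., 6.],
              [0., 1., 1.]])
At = np.linalg.pinv(A)

assert np.allclose(A @ At @ A, A)            # (i)   A A^t A = A
assert np.allclose(At @ A @ At, At)          # (ii)  A^t A A^t = A^t
assert np.allclose((A @ At).T, A @ At)       # (iii) A A^t symmetric
assert np.allclose((At @ A).T, At @ A)       # (iv)  A^t A symmetric
```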

Penrose investigates many properties of the Pseudo-Inverse, the results being summarized in the following theorems.

Theorem 2.9: (i) (A^t)^t = A,

(ii) (A')^t = (A^t)',

(iii) A^t = A^{-1} if A is non-singular,

(iv) (λA)^t = λ^t A^t, λ any scalar, with λ^t = λ^{-1} if λ ≠ 0 and λ^t = 0 if λ = 0,

(v) (A'A)^t = A^t A^t',

(vi) If U and V are orthogonal, (UAV)^t = V' A^t U',

(vii) If A = Σ_i A_i, where A_i A_j' = 0 and A_i' A_j = 0 whenever i ≠ j, then A^t = Σ_i A_i^t,

(viii) A^t = (A'A)^t A',

(ix) A^t A, A A^t, I - A^t A, and I - A A^t are all symmetric idempotent matrices,

(x) If A is normal, A^t A = A A^t, and A, A^t, A^t A and A A^t all have rank equal to the trace of A^t A.

We recall that a normal matrix is a matrix such that A A' = A' A.


Theorem 2.10: A necessary and sufficient condition for the equation A X B = C to have a solution for X is

A A^t C B^t B = C ,

in which case the general solution is

X = A^t C B^t + Y - A^t A Y B B^t ,

where Y is arbitrary. Penrose points out that the matrix used in place of A^t need only satisfy A A^t A = A in order for Theorem 2.10 to hold. An immediate corollary is that the general solution of the consistent equation Px = c is

x = P^t c + (I - P^t P) z ,

where z is arbitrary.
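The corollary on consistent equations Px = c can be exercised numerically (NumPy sketch; P, c and the arbitrary vectors z are invented):

```python
import numpy as np

# Every solution of a consistent P x = c has the form
# x = P^t c + (I - P^t P) z for some z (corollary to Theorem 2.10).
P = np.array([[1., 1., 0.],
              [0., 0., 1.]])
c = np.array([2., 3.])                   # consistent: P has full row rank

Pt = np.linalg.pinv(P)
I = np.eye(3)
rng = np.random.default_rng(0)
for _ in range(3):
    z = rng.standard_normal(3)
    x = Pt @ c + (I - Pt @ P) @ z
    assert np.allclose(P @ x, c)         # every such x solves P x = c
```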

Theorem 2.11: The matrix A^t B is the unique best approximate solution of the equation A X = B. (The matrix X_0 is said to be a best approximate solution of the equation f(X) = G if for all X either

(i) ||f(X) - G|| > ||f(X_0) - G|| ,

or

(ii) ||f(X) - G|| = ||f(X_0) - G|| and ||X|| ≥ ||X_0|| .

Note that ||M|| is a norm of the matrix M and is commonly given by the square root of the trace of M'M.)
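Theorem 2.11, specialized to a single right-hand side b, says A^t b is the least squares solution of minimum norm. The NumPy sketch below (invented A, b) checks this against an SVD-based least squares routine and against a competing solution of larger norm:

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [2., 4., 6.],
              [1., 0., 1.]])             # rank 2 (column 3 = column 1 + column 2)
b = np.array([1., 0., 2.])               # inconsistent system

# A^t b and the SVD-based least squares routine give the same
# minimum-norm least squares solution.
x_pinv = np.linalg.pinv(A) @ b
x_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]
assert np.allclose(x_pinv, x_lstsq)

# Any other least squares solution (shifted along the null space of A)
# attains the same residual but has strictly larger norm.
v = np.array([1., 1., -1.])              # A v = 0
x_alt = x_pinv + v
assert np.allclose(A @ x_alt, A @ x_pinv)
assert np.linalg.norm(x_alt) > np.linalg.norm(x_pinv)
```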

2.6 Generalizations to Hilbert Spaces

A generalization of the concept of Pseudo-Inverses to infinite dimensional vector spaces, or more specifically to Hilbert space, has recently been achieved by Desoer and Whalen [1963] and also, in a more algebraic manner, by Foulis [1963]. Whatever is said about Hilbert space naturally holds for a finite-dimensional inner product space. Desoer and Whalen's approach will be discussed in some detail because of its importance in Hilbert space applications and also because the natural specialization of their approach to finite dimensional inner product spaces affords a modern look at the concept of Pseudo-Inverses. Their approach can best be described as a range-null-space approach. Before describing the approach taken by Desoer and Whalen we need to indicate certain basic definitions, theorems, and notations appropriate to the concepts of Hilbert space. The approach we take follows closely that of Simmons [1963] and Halmos [1951].

We recall that a linear space H is a set of points x, y, z such that:

I. There is defined an operation + on H x H to H such that (H, +) is an Abelian group, i.e.,

(2.6')
(i) (x + y) + z = x + (y + z) for all x, y and z in H,
(ii) there exists 0 ∈ H such that x + 0 = 0 + x = x for any x ∈ H,
(iii) for each x ∈ H there exists an element -x such that x + (-x) = (-x) + x = 0,
(iv) x + y = y + x for all x and y in H.

II. There is defined an operation on H x F (where F is a field of scalars) to H, called scalar multiplication, such that

(2.6'')
(i) α(x + y) = αx + αy,  x, y ∈ H, α ∈ F,
(ii) (α + β)x = αx + βx,  x ∈ H, α, β ∈ F,
(iii) (αβ)x = α(βx),  x ∈ H, α, β ∈ F,
(iv) 1·x = x,  x ∈ H.

The elements of H are called vectors, while the zero vector 0 is called the origin.

The linear space H is called a normed linear space if to each vector x ∈ H there corresponds a real number, denoted ||x|| and called the norm of x, such that

||x|| ≥ 0, and ||x|| = 0 if and only if x = 0 ,
||x + y|| ≤ ||x|| + ||y|| ,
||αx|| = |α| ||x|| .

It can be shown that a normed linear space H can be made into a metric space by defining the metric as

(2.8)  d(x, y) = ||x - y|| .

The metric defined by (2.8) is called the metric induced by the norm. We recall that a sequence x_n in a metric space H is convergent to x ∈ H, written x_n → x, if and only if d(x_n, x) → 0 as n → ∞. A Cauchy sequence in a metric space is a sequence x_n such that d(x_m, x_n) < ε whenever m, n ≥ n_0. Obviously a convergent sequence is a Cauchy sequence. A metric space H such that each Cauchy sequence in H is convergent to a point in H is called a complete metric space.

A complete normed linear space is defined to be a Banach space. In

a Banach space the induced norm provides us wi.th a natural measure of

distances but there is no natural measure of angles as exists in, say

R2 • The concept of a Hilbert space admits such a measure of angles. A

linear inner product space is a linear space with a function (called the


inner product) (x, y) on H × H to F such that

(2.9) (x, y) = (y, x)* (* denoting complex conjugation),
(αx + βy, z) = α(x, z) + β(y, z) for all x, y, z ∈ H and α, β ∈ F,
and (x, x) ≥ 0, with (x, x) = 0 if and only if x = 0.

If a linear inner product space H is complete and we define ||x|| by the equation

||x||² = (x, x)

then H can be shown to be a Banach space. The norm ||x|| is said to be induced by the inner product.

A Hilbert space is thus a complex Banach space whose norm arises from an inner product. The inner product allows a definition of orthogonality which generalizes the usual concept of orthogonality in R² and C². Vectors x and y in H are said to be orthogonal if (x, y) = 0. The notation x ⊥ y indicates that x is orthogonal to y, x ⊥ A indicates that x ⊥ y for all y ∈ A, while A ⊥ B indicates that x ⊥ y for all x ∈ A and y ∈ B.

It can be shown that the inner product, the induced norm, addition

and scalar multiplication are all continuous functions in the topology

generated by the metric defined by (2.8).

A subset M ⊂ H is said to be a subspace of H if (αx + βy) ∈ M whenever x, y ∈ M and α, β ∈ F. A subset M of a Hilbert space can be shown to be closed (in the sense of the norm topology) if and only if it is complete. If any vector x ∈ H can be written uniquely as x = Σ_j x_j where x_j ∈ M_j, a closed subspace of H, for all j, then we write H = M1 + M2 + ··· and call H the direct sum of the subspaces


M1, M2, .... It can be shown that if M1, M2, ... are pairwise orthogonal and H is spanned by the union of the M_j then H is the direct sum of the M_j.

Reflection on finite dimensional vector spaces reveals that linear

transformations are of importance in many applications. Linear transfor­

mations in finite dimensional vector spaces are represented by matrices.

The generalization of a matrix to Hilbert space results in the concept

of a linear transformation A. A linear transformation A is a mapping from H to another Hilbert space H' such that

(2.10) A(αx + βy) = αAx + βAy

for all x, y ∈ H and α, β ∈ F.

A linear transformation mapping one Hilbert space into another obviously preserves some of the algebraic structure of H, but since H is equipped with a topology it is useful to require that A be continuous in order to preserve the topological structure as well. The linear transformation A: H → H' is said to be bounded if there exists α > 0 such that

(2.11) ||Ax|| ≤ α ||x|| for all x ∈ H.

The norm ||A|| of A is defined as

(2.12) ||A|| = inf { α : ||Ax|| ≤ α ||x|| for all x ∈ H }.

The concept of boundedness is important since it can be proved that a linear transformation is continuous if and only if it is bounded. If A: H → H and is bounded then we shall call A a (linear) operator.


Similarly if A: H → R or C and is bounded then we shall call A a (linear) functional.

We shall have occasion to discuss only bounded operators and func-

tionals. We also note that there are many differences in terminology

used in defining functionals and operators.

The scalar multiple, sum, and product of the operators A and B are defined by

(2.13) (αA)x = α(Ax),
(A + B)x = Ax + Bx,
(AB)x = A(Bx),

and it can be shown that

(2.14) ||αA|| = |α| ||A||, ||A + B|| ≤ ||A|| + ||B|| and ||AB|| ≤ ||A|| ||B||.

It can also be shown that the set B(H) of all linear operators from H into H forms an algebra over the field F, i.e., a linear space over F where the elements (in this case the operators) and scalars can be multiplied in the following natural manner:

(2.15) A(BC) = (AB)C for A, B, C ∈ B(H),
A(B + C) = AB + AC for A, B, C ∈ B(H),
and α(AB) = (αA)B = A(αB) for A, B ∈ B(H) and α ∈ F.

In addition this algebra has an identity, namely the operator I defined by

(2.16) Ix = x for all x ∈ H.

An operator A is said to be invertible if there exists A-1 such that A-1A = I = AA-1.


If A is any operator it can be shown that there exists a unique operator A* with the property that

(2.17) (Ax, y) = (x, A*y)

for all x, y ∈ H. The operator A* is called the adjoint of A and plays a role similar to that of the conjugate transpose of a matrix in finite dimensional linear spaces. It can be shown that the adjoint has the following properties:

(2.18) (A*)* = A,
(αA)* = α*A* where α* = complex conjugate of α,
(A + B)* = A* + B*,
(AB)* = B*A*,
and (A*)-1 = (A-1)* if A is invertible.

There are several specific types of operators of importance in the

study of Hilbert spaces. We list these operators and their properties

below.

Hermitian Operator: An operator A is hermitian if A* = A (generalizes the concept of a hermitian matrix),

Normal Operator: An operator A is normal if A*A = AA*. A normal operator has the property that ||A*x|| = ||Ax|| for all x ∈ H,

Unitary Operator: An operator U is unitary if U*U = UU* = I (generalizes the concept of a unitary (orthogonal) matrix),

Projection Operator: The projection operator P on a subspace M is the operator P defined, for every z ∈ H of the form z = x + y where x ∈ M and y ∈ M⊥, by Pz = x. It can be shown that if P is


an idempotent (P² = P) and hermitian operator then P is the projection on M = {x: Px = x}. Theorem 2.8 indicates that AAt and AtA are projections on suitable subspaces in the finite dimensional case.

One of the main virtues of a Hilbert space is the fact that if M is a closed subspace then there exists a projection operator P such that M = {x: Px = x}. Further M⊥ = {x: Px = 0} and M + M⊥ = H. In accordance with customary usage M is called the range of P (written R(P)) while M⊥ is called the null space of P (written N(P)).

The existence of projections on a given closed subspace plays a funda-

mental role in the generalization of the concept of a Pseudo-Inverse to

Hilbert space.

The final concept of importance is the spectrum of an operator, which is defined as the set of all λ such that A - λI is not invertible. The spectrum of an operator is obviously a generalization of the set of characteristic roots of a matrix.

With the above preliminaries and notation completed we are in a

position to discuss Desoer and Whalen's definition of the Pseudo-Inverse

of a bounded linear transformation from H to H'.

The essential problem is simply to associate with each transformation A in B(H, H') another linear transformation At such that Ax = y is true if and only if At y = x. If A is invertible then A-1 clearly satisfies the requirements. In order to cope with the situation when A-1 does not exist Desoer and Whalen introduce the following definition.


Definition 2.5 Let A be a bounded linear transformation of a Hilbert space H into a Hilbert space H' such that R(A) is closed. At is said to be the Pseudo-Inverse of A if

(i) At(Ax) = x for all x ∈ [N(A)]⊥,
(ii) At y = 0 for all y ∈ [R(A)]⊥,
and (iii) if y1 ∈ R(A) and y2 ∈ [R(A)]⊥ then At(y1 + y2) = At y1 + At y2.

The connection between A and At can best be seen by the diagram given in Figure 2.1.

Figure 2.1. The Pseudo-Inverse in geometric terms

We note that since R(A) is closed the mapping At is a one-one mapping of R(A) onto [N(A)]⊥. This coupled with (ii) and (iii) serves to define At as a mapping of H' into H. In addition (iii) guarantees that At is a linear operator from H' into H with range [N(A)]⊥ and null space [R(A)]⊥.
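In the finite dimensional case conditions (i)-(iii) of Definition 2.5 are easy to verify numerically. A minimal sketch, assuming NumPy is available and using np.linalg.pinv (which computes the Moore-Penrose Pseudo-Inverse) in the role of At:

```python
import numpy as np

rng = np.random.default_rng(0)
# A rank-2 map from R^4 (H) to R^3 (H'); in finite dimensions R(A) is closed.
A = rng.standard_normal((3, 2)) @ rng.standard_normal((2, 4))
A_dag = np.linalg.pinv(A)

# (i) At(Ax) = x for x in [N(A)]-perp, i.e. the row space of A.
x = A.T @ rng.standard_normal(3)          # an arbitrary row-space vector
assert np.allclose(A_dag @ (A @ x), x)

# (ii) At y = 0 for y in [R(A)]-perp.
y = rng.standard_normal(3)
y_perp = y - A @ (A_dag @ y)              # project y off R(A)
assert np.allclose(A_dag @ y_perp, 0)

# (iii) additivity across the decomposition H' = R(A) + [R(A)]-perp.
y1 = A @ rng.standard_normal(4)           # a vector in R(A)
assert np.allclose(A_dag @ (y1 + y_perp), A_dag @ y1 + A_dag @ y_perp)
```

The sketch only illustrates the definition for one matrix; it is a check, not a proof.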


Using Definition 2.5 Desoer and Whalen prove that At has the properties (i) to (iv) of Theorem 2.8, thus indicating that Definition 2.5 provides an extension of the concept of Pseudo-Inverse (cf. Definition 2.4) to Hilbert space. Note that in the case of a finite dimensional vector space the range of any linear transformation is closed and hence in this case Definition 2.5 is equivalent to Definition 2.4.

Using Definition 2.5 the authors prove that At is such that (At)t = A and (At)* = (A*)t. The projection results indicated in Theorem 2.8, namely, that AAt is the projection of H' onto R(A) and that AtA is the projection of H onto [N(A)]⊥, are also established. Equation (viii) of Theorem 2.9, which plays an important role in the computational methods to be discussed in Chapter 4, is true in the Hilbert space context also and reduces the computation of the Pseudo-Inverse of an operator A to the computation of the Pseudo-Inverse of (A*A).

Several interesting properties of the Pseudo-Inverse are estab-

lished by Desoer and Whalen. The first states an equivalent definition in terms of an inner product relation and a condition on the null space of A. Specifically the equivalent definition is:

Definition 2.5.1 Let x, y be elements of H and write x = x1 + x2, y = y1 + y2 where x1, y1 are in [N(A)]⊥ and x2, y2 are in N(A). Then At is the Pseudo-Inverse of A if and only if

(i) (AtAx, y) = (x1, y1) for all x, y in H,
and
(ii) N(At) = [R(A)]⊥.


As stated in the introduction, the concept of Generalized Inverse is extremely important in the theory of least squares. Penrose's Theorem (2.11) on best approximate solutions of matrix equations generalizes in the following sense.

Theorem 2.11': If y ∈ H' and x1 = At y then (a) ||Ax1 - y|| ≤ ||Ax - y|| for all x ∈ H, and (b) ||x1|| ≤ ||x0|| for all x0 for which equality holds in (a). In intuitive terms Theorem 2.11' states that if Ax = y has a solution then At y is the solution which has smallest norm.
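In matrix terms the content of Theorem 2.11' can be checked numerically; a sketch, assuming NumPy, comparing At y with the minimum-norm least-squares solution returned by np.linalg.lstsq:

```python
import numpy as np

A = np.array([[1., 1.],
              [1., 1.],
              [0., 0.]])          # rank 1, so the least-squares minimizer is not unique
y = np.array([1., 2., 3.])

x1 = np.linalg.pinv(A) @ y                       # x1 = At y
x_ls = np.linalg.lstsq(A, y, rcond=None)[0]      # minimum-norm least-squares solution

assert np.allclose(x1, x_ls)

# (a) x1 minimizes ||Ax - y||: the residual is orthogonal to R(A).
assert np.allclose(A.T @ (A @ x1 - y), 0)

# (b) among all minimizers, x1 has smallest norm:
# any other minimizer is x1 plus a null-space vector, and is longer.
x0 = x1 + np.array([1., -1.])                    # (1, -1) spans N(A)
assert np.linalg.norm(x1) < np.linalg.norm(x0)
```

The agreement of pinv and lstsq here reflects that both are documented to return the minimum-norm least-squares solution for rank-deficient systems.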

As is well known (see for example Halmos [1958]) an arbitrary matrix can be factored as A = UQ where U is an isometry (an orthonormal matrix) and Q is a positive matrix (non-negative definite matrix). Desoer and Whalen proved the following generalization of this decomposition (which is called a polar decomposition because of its relation to the polar representation of a complex number as re^{iθ}).

Theorem 2.12: Let A be a bounded linear mapping from H into H'. If R(A) is closed then A can be factored as

A = UP

where U* = Ut, P is a positive self-adjoint operator on H and U maps H onto R(A).
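For matrices the factorization of Theorem 2.12 can be computed from the singular value decomposition; a sketch, assuming NumPy (the construction U = WV', P = VSV' from A = WSV' is one standard route, not the authors' argument):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))

# Polar decomposition A = UP via the SVD:
# A = W S V'  =>  U = W V' (an isometry), P = V S V' (positive self-adjoint).
W, s, Vt = np.linalg.svd(A)
U = W @ Vt
P = Vt.T @ np.diag(s) @ Vt

assert np.allclose(U @ P, A)
assert np.allclose(U.T @ U, np.eye(4))            # U is an isometry
assert np.allclose(P, P.T)                        # P is self-adjoint
assert np.all(np.linalg.eigvalsh(P) >= -1e-12)    # and positive (semi-definite)
```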

2.6 Generalized Inverses in Algebraic Structures

The concept of Generalized Inverses revived by Penrose [1955] in the case

of matrices led to immediate generalizations to other algebraic struc-

tures. Since the set of all n x n matrices is a ring under multiplica-

tion, it is natural to suspect that the concept of Generalized Inverse


extends to arbitrary rings. In recent articles Drazin [1958] and Munn

[1962] have investigated Generalized Inverses in semi-groups and rings.

A semi-group consists of a set S and a mapping (x, y) → xy of S × S into S such that the mapping is associative, i.e., if x, y and z are elements of S then

x(yz) = (xy)z.

Drazin and Munn introduce the following definition.

Definition 2.6 An element x in S is said to be pseudo-invertible if there exists an element x̄ in S such that

(i) x x̄ = x̄ x,
(ii) xⁿ = xⁿ⁺¹ x̄ for some positive integer n,
and
(iii) x̄ = x̄² x.

The uniqueness of x̄ (if existent) has been proved by Drazin.

In terms of the associative ring of n×n matrices under multiplication we see that any normal matrix (any matrix X such that X'X = XX') is pseudo-invertible in the sense of Munn and Drazin since

XtX = (X'X)tX'X = (XX')tXX' = X'tX' = (XXt)' = XXt,
X = XXtX = X²Xt,
and Xt = XtXXt = (Xt)²X.
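The three displayed identities are easy to confirm numerically for a concrete normal matrix; a sketch, assuming NumPy, with the Pseudo-Inverse Xt = np.linalg.pinv(X) playing the role of the element x̄ (with n = 1 in Definition 2.6):

```python
import numpy as np

# A real normal (here symmetric) singular matrix.
X = np.array([[2., 1., 0.],
              [1., 2., 0.],
              [0., 0., 0.]])
assert np.allclose(X.T @ X, X @ X.T)          # normality: X'X = XX'

Xd = np.linalg.pinv(X)                        # plays the role of x-bar

assert np.allclose(X @ Xd, Xd @ X)            # (i)   x x-bar = x-bar x
assert np.allclose(X, X @ X @ Xd)             # (ii)  x = x^2 x-bar  (n = 1)
assert np.allclose(Xd, Xd @ Xd @ X)           # (iii) x-bar = x-bar^2 x
```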

Using the above definition of a Pseudo-Inverse Munn and Drazin investigate the implications of the concept of Pseudo-Inverse in the structure of rings and semi-groups. In particular Drazin gives sufficient conditions for x to be pseudo-invertible when (S, ⊗) is a ring and Munn obtains necessary and sufficient conditions for an element of a


semi-group to be pseudo-invertible. This condition is simply that some power of x lies in a subgroup of (S, ⊗).

Foulis [1963] in a recent paper has considered Pseudo-Inverses in a

special type of semi-group. His work is highly algebraic in structure

and includes as a special case certain of the results obtained by Desoer

and Whalen [1963] in Hilbert space contexts.


CHAPTER 3. THEORETICAL RESULTS

3.1 Summary

In Chapter 2 several definitions of Generalized Inverses were

presented. It is the purpose of the present chapter to investigate some

of the implications of these definitions. Let g(A), r(A), n(A) and

t(A) denote the set of all Generalized, Reflexive Generalized, Normalized Generalized and Pseudo-Inverses respectively, of a matrix A. We

are interested in investigating relationships between a property of A

and the corresponding property of a typical element in g(A), r(A), n(A)

or t(A). The property of rank is discussed in Section 3.2, symmetry

in Section 3.3, and eigenvalues and eigenvectors in Section 3.4. In

statistical applications of Generalized Inverses orthogonal projections,

represented by idempotent symmetric matrices, play an important role and

these aspects of Generalized Inverses are discussed in Section 3.5. In

Section 3.6 several results are proved which indicate that under certain

conditions g(A), r(A), n(A) and t(A) can be characterized by the

relation of a property of A to the corresponding property of a typical

element in i(A), i=g, r, n, t.

3.2 Results on Rank

One of the most important characteristics of a matrix is its rank. In the case of a Generalized Inverse Ag of a matrix A, the rank of Ag satisfies the inequality Rank (Ag) ≥ Rank (A). The proof of this result hinges on the following Lemma.


Lemma 3.1: Rank (AAg) = Rank (AgA) = Rank (A).

Proof: Since the rank of a product of two matrices does not exceed the rank of either factor the conclusions follow from

Rank (A) ≥ Rank (AAg) ≥ Rank (AAgA) = Rank (A),
and Rank (A) ≥ Rank (AgA) ≥ Rank (AAgA) = Rank (A).

Using Lemma 3.1 we have

Theorem 3.1: Rank (Ag) ≥ Rank (AgA) = Rank (A).

If A is a square matrix the following theorem indicates that

singularity of A and singularity of its Generalized Inverse, Ag, are

not equivalent concepts.

Theorem 3.2: If A is singular then a non-singular Generalized

Inverse exists.

Proof: From Theorem 2.7 a Generalized Inverse of A is given by P2BgP1 where

P1AP2 = B = [I 0; 0 0]  and  Bg = [I V; W X],

the matrices V, W and X being arbitrary. Choosing V = 0, W = 0 and X = I makes Bg non-singular; since P1 and P2 are non-singular it follows that Ag = P2BgP1 is then a non-singular Generalized Inverse of A.
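A quick numerical illustration, assuming NumPy and using an ad hoc singular diagonal A for which the identity matrix already serves as a non-singular Generalized Inverse:

```python
import numpy as np

A = np.diag([1., 1., 0.])             # singular: rank 2
G = np.eye(3)                         # a non-singular candidate inverse

assert np.allclose(A @ G @ A, A)      # AGA = A, so G is a Generalized Inverse of A
assert np.linalg.matrix_rank(G) == 3  # yet G is non-singular
assert np.linalg.matrix_rank(A) == 2  # while A is singular
```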

One of the most useful results concerning inverses of square matrices is that CAB, where C, A, and B are non-singular, is invertible with inverse given by B-1A-1C-1. It is interesting and very useful (see Chapter 4) to note that a similar result holds for Generalized Inverses.


Theorem 3.3: If C and B are non-singular square matrices and A is any matrix such that CAB exists, then a Generalized Inverse of CAB is given by B-1AgC-1.

Proof: (CAB)(B-1AgC-1)(CAB) = CAAgAB = CAB.
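Theorem 3.3 can be spot-checked numerically; a sketch, assuming NumPy, with the Pseudo-Inverse standing in for Ag (any Generalized Inverse would do) and random non-singular B and C:

```python
import numpy as np

rng = np.random.default_rng(2)
A = np.array([[1., 2.],
              [2., 4.],
              [0., 0.]])                 # rank 1, deliberately not invertible
C = rng.standard_normal((3, 3))          # non-singular for this seed
B = rng.standard_normal((2, 2))

Ag = np.linalg.pinv(A)                   # one Generalized Inverse of A
G = np.linalg.inv(B) @ Ag @ np.linalg.inv(C)

M = C @ A @ B
assert np.allclose(M @ G @ M, M)         # B^-1 Ag C^-1 is a Generalized Inverse of CAB
```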

One would hope that an analogous result would hold if B and C were replaced by singular matrices and B-1AgC-1 replaced by BgAgCg, but consideration of the equation

(CAB)(BgAgCg)(CAB) = C(ABBg)Ag(CgCA)B

indicates that such a result cannot hold in general. A special case of some interest occurs when C and B are commutative idempotent matrices which also commute with A. Then since a Generalized Inverse of an idempotent matrix is itself we have (CAB)(BAgC)(CAB) = CABAgCAB = CBAAgACB = CBACB = CBCBA = CBA = CAB. Such a result could be of interest in the study of idempotent matrices and their associated projections.

When we consider Reflexive Generalized Inverses the inequality of

Theorem 3.1 becomes an equality.

Theorem 3.4: If Ar is a Reflexive Generalized Inverse of A then Rank (Ar) = Rank (A).

Proof: Theorem 3.1 applied to A and another application of Theorem 3.1 to Ar yields the results

Rank (Ar) ≥ Rank (A),
Rank (A) ≥ Rank (Ar),

and hence Rank (Ar) = Rank (A).

The analogue of Theorem 3.3 holds for Reflexive Generalized Inverses also.


Theorem 3.5: If C and B are non-singular square matrices and A is any matrix such that CAB exists then a Reflexive Generalized Inverse of CAB is given by B-1ArC-1.

Proof: In view of Theorem 3.3 the equation

(B-1ArC-1)(CAB)(B-1ArC-1) = B-1ArAArC-1 = B-1ArC-1

completes the proof.

In view of Theorem 3.4 both Normalized Generalized Inverses and Pseudo-Inverses preserve the rank of the original matrix. Theorems 3.3 and 3.5, however, become somewhat weaker when we restrict the class of Generalized Inverses. Simple computations show that a Normalized Generalized Inverse of BAC, where C is any non-singular square matrix and B is any orthonormal matrix, is C-1AnB'. As might be expected the Pseudo-Inverse of BAC when B and C are orthonormal is C'AtB' (this result was in fact given by Penrose [1955]).

3.3 Symmetry Results

In many applications of matrices square symmetric matrices become

extremely important. In view of the importance of symmetry and the

simple fact that the inverse of a non-singular symmetric matrix is again

symmetric, a natural question which arises is the extent to which Gener-

alized Inverses preserve or destroy symmetry of the original matrix.

The first result states that the two concepts are not equivalent

for Generalized Inverses and Reflexive Generalized Inverses while the

second is an existence theorem guaranteeing that symmetric Generalized

and Reflexive Generalized Inverses of symmetric matrices exist.


Theorem 3.6: If A is symmetric, then it is not necessarily true that Ag or Ar is symmetric.

Proof: By Theorem 2.7 a Generalized Inverse of A is given by Ag = P2BgP1 with Bg = [I V; W X], and it is clear from this form that Ag need not be symmetric. Similarly a Reflexive Generalized Inverse of A is given by the same expression with X = WV, and again it is clear that Ar need not be symmetric.

Theorem 3.7: If A is symmetric then there exists a symmetric Ag and a symmetric Ar.

Proof: In Theorem 2.7 we can choose P1 and P2 so that P2 = P1'. If we choose Bg and Br to be symmetric the conclusion follows from Theorem 3.5.

The situation for Normalized Generalized Inverses of symmetric matrices is not quite so simple. Direct computation of a Normalized Generalized Inverse of a suitably chosen singular symmetric matrix A shows that a Normalized Generalized Inverse of a symmetric matrix need not be symmetric. If the Normalized Generalized Inverse is symmetric

then it is the Pseudo-Inverse. Thus the existence result for Normalized


Generalized Inverses of symmetric matrices has little importance since,

as we shall show below, the Pseudo-Inverse of a symmetric matrix is

symmetric.

Theorem 3.8: If A is symmetric then the Pseudo-Inverse of A is symmetric.

Proof: From (ii) of Theorem 2.9 we have (A')t = (At)'. Hence if A' = A then

At = (A')t = (At)'

and At is symmetric.

Theorem 3.8 can, using (v) of Theorem 2.9, be strengthened to the

statement that A normal implies At normal.
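Theorem 3.8 is easy to confirm numerically; a sketch, assuming NumPy, with an ad hoc singular symmetric matrix:

```python
import numpy as np

A = np.array([[4., 1., 0.],
              [1., 4., 0.],
              [0., 0., 0.]])            # symmetric and singular
At = np.linalg.pinv(A)

assert np.allclose(A, A.T)
assert np.allclose(At, At.T)            # the Pseudo-Inverse inherits the symmetry
```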

3.4 Eigenvalues and Eigenvectors

The importance of the spectrum of a matrix and more generally the

spectrum of a linear transformation on a linear space is well known.

Classical matrix theory discusses the spectrum of a matrix via study of

the eigenvectors or characteristic vectors and their associated eigen-

values or characteristic roots. It is known that if x is an eigenvector of an invertible matrix A then x will also be an eigenvector of A-1; the corresponding eigenvalues will be reciprocals. Since Generalized Inverses behave so much like inverses a natural subject for investigation is the relation of the eigenvectors (values) of a square matrix A and those of an associated Generalized Inverse Ag.

For any square matrix M let Λ1(M) denote the set of non-zero eigenvalues of M, i.e.,

Λ1(M) = { λ : λ ≠ 0 and Mx = λx for some x ≠ 0 }

and let Λ1⁻¹(M) = { 1/λ : λ ∈ Λ1(M) }. For any square matrices M and N


let the sets Λ2(M) and Λ2⁻¹(N) be defined by

Λ2(M) = { x : x ≠ 0 and Mx = λx for some λ ∈ Λ1(M) },
Λ2⁻¹(N) = { x : x ≠ 0, Nx = λ⁻¹x and x ∈ Λ2(M) }.

If A is any square matrix and Ag is any Generalized Inverse of A then we define properties R1 and R2 as follows:

Property R1: Λ1(Ag) = Λ1⁻¹(A),
Property R2: Λ2(Ag) = Λ2⁻¹(A).

If Ag is such that property R1 is satisfied we say that Ag has property R1, and similarly for property R2.

Theorem 3.9: Generalized Inverses and Reflexive Generalized Inverses do not necessarily possess properties R1 and R2.

Proof: Consider a 3×3 matrix A with characteristic roots 0, 1, 3, together with a suitably chosen Generalized Inverse Ag and Reflexive Generalized Inverse Ar of A. The choices can be made so that the eigenvalues of Ag are 1, 1, 1, while those of Ar are (3 - √5)/2 and (3 + √5)/2, so that neither inverse possesses property R1. Note that the multiplicity of the zero eigenvalues is not preserved with Ag. Consideration of a symmetric matrix and a suitable Reflexive Generalized Inverse Ar of it shows that a Reflexive Generalized Inverse of a symmetric matrix can possess property R1 but not necessarily property R2, since such an Ar can have an eigenvector which is not an eigenvector of A.

Theorem 3.10: If A is symmetric then the non-zero eigenvalues of A and any Normalized Generalized Inverse of A are reciprocals, i.e., An possesses property R1 if A is symmetric.

Proof: If λ is a non-zero eigenvalue of A there exists x ≠ 0 such that

Ax = λx.

Multiplying both sides of the above equation by AAn yields AAnAx = λAAnx, i.e., Ax = λAAnx, so that x = AAnx. Since (AAn)' = AAn we have

x = (AAn)'x = An'Ax = λAn'x, or An'x = λ⁻¹x.

Hence λ⁻¹ is a non-zero eigenvalue of An.

If μ is a non-zero eigenvalue of An then there exists a non-zero vector y such that

An y = μy.

Hence

An'Ay = (AAn)'y = AAn y = μAy,

or

An'(Ay) = μ(Ay).

Thus Ay is an eigenvector of An' with eigenvalue μ. Multiplying both sides by A gives

AAn'(Ay) = μA(Ay).

Since AAn'A = (AAnA)' = A' = A, the left side equals Ay. Hence A(Ay) = μ⁻¹(Ay) or μ⁻¹ is a non-zero eigenvalue of A. Note that Ay ≠ 0 since if Ay = 0 then An'Ay = 0 or AAn y = 0 and hence An y = 0, which is a contradiction.

For non-symmetric matrices the conclusion of Theorem 3.10 is not valid, as consideration of

A = [1 1; 2 2]  with the Normalized Generalized Inverse  An = [1/5 2/5; 0 0]

shows. The matrix A has eigenvalues 0 and 3 while An has eigenvalues 0 and 1/5.

The following Lemma gives a sufficient condition for a Reflexive Generalized Inverse to possess property R2.

Lemma 3.2: A Reflexive Generalized Inverse Ar of A possesses property R2 if A and Ar commute (written A <-> Ar).

Proof: If A <-> Ar then for λ ≠ 0 we have

Ax = λx  =>  λx = Ax = AArAx = λAArx  =>  x = ArAx = λArx  =>  Arx = λ⁻¹x,

and similarly every eigenvector of Ar corresponding to a non-zero eigenvalue is an eigenvector of A.

The following simple corollary to Lemma 3.2 indicates that Pseudo-Inverses of normal matrices are indeed well behaved.

Theorem 3.11: If A is normal then the Pseudo-Inverse At possesses property R2.

Proof: By (viii) of Theorem 2.9, A normal implies that AAt = AtA. From Lemma 3.2 it follows that At possesses property R2.
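Property R2 for the Pseudo-Inverse of a normal (here symmetric) matrix can be seen numerically; a sketch assuming NumPy:

```python
import numpy as np

A = np.array([[2., 1., 0.],
              [1., 2., 0.],
              [0., 0., 0.]])                   # symmetric, eigenvalues 0, 1, 3
At = np.linalg.pinv(A)

def nonzero_eigs(M):
    """Sorted non-zero eigenvalues of a symmetric matrix."""
    return sorted(x for x in np.linalg.eigvalsh(M) if abs(x) > 1e-10)

assert np.allclose(nonzero_eigs(A), [1., 3.])
assert np.allclose(nonzero_eigs(At), [1/3., 1.])  # reciprocals of 3 and 1

# The eigenvectors are shared as well: v is an eigenvector of A for 3,
# and of At for 1/3.
v = np.array([1., 1., 0.]) / np.sqrt(2.)
assert np.allclose(A @ v, 3. * v)
assert np.allclose(At @ v, v / 3.)
```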


It is curious to note that if A is not normal then its Pseudo-Inverse need not even possess property R1. An example of this is furnished by the matrix

A = [1 1; 2 2]

and its Pseudo-Inverse

At = (1/10)[1 2; 1 2].

It is easily shown that A has eigenvalues 0 and 3 while At has eigenvalues 0 and 3/10. Necessary conditions for the Pseudo-Inverse of an arbitrary matrix to possess property R1 or R2 are not known at present.

3.5 Orthogonal Projections

The topic of orthogonal or perpendicular projections finds wide application in the theory of matrices and linear spaces (such as the spectral representation of a linear operator) as well as in the application of matrix theory to other fields (the sum of squares due to a hypothesis can be defined using the notion of an orthogonal projection on a certain subspace). We recall (see Section 2.6) that an orthogonal projection is simply a symmetric idempotent matrix. In this section we shall see that Generalized Inverses can be classified partly by the projection properties of AAg and AgA. Simple matrix multiplication shows that both AAg and AgA are idempotent. Neither Generalized Inverses nor Reflexive Generalized Inverses necessarily yield orthogonal projections because AAg and AgA are not necessarily symmetric.


Theorem 3.12: A Reflexive Generalized Inverse is a Normalized Generalized Inverse if and only if AAr is an orthogonal projection.

Proof: By the definitions.

Theorem 3.13: A Reflexive Generalized Inverse is the Pseudo-Inverse if and only if AAr and ArA are orthogonal projections.

Proof: Again obvious from the definitions.

Note that Theorems 3.12 and 3.13 merely put conditions (iii) of Definition 2.3 and (iv) of Definition 2.4 into geometric terms.

For applications to the theory of least squares it is of interest

to note that the trace of AgA and AAg is equal to the rank of A

(see Rao and Chipman [1964] for a proof of the result that the trace of

an idempotent matrix is equal to its rank).
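The trace-equals-rank observation can be checked numerically; a sketch, assuming NumPy, with a random rank-2 matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))   # rank 2
Ag = np.linalg.pinv(A)

# AAg and AgA are idempotent, and their common trace equals Rank(A).
assert np.allclose((A @ Ag) @ (A @ Ag), A @ Ag)
assert np.isclose(np.trace(A @ Ag), 2.0)
assert np.isclose(np.trace(Ag @ A), 2.0)
assert np.linalg.matrix_rank(A) == 2
```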

3.6 Characterization of Generalized Inverses

Thus far in this Chapter we have investigated the relation between

various properties of a matrix A and the corresponding properties of

a typical element of g(A), r(A), n(A) or t(A). In this section we

carry this investigation further and find criteria to characterize the sets g(A), r(A), n(A) and t(A).

The first result indicates that rank plays an important role in

distinguishing between Generalized Inverses and Reflexive Generalized

Inverses.

Theorem 3.14: A Generalized Inverse is a Reflexive Generalized Inverse if and only if Rank (Ag) = Rank (A).

Proof: If Ar is a Reflexive Generalized Inverse then by Theorem 3.4 Rank (Ar) = Rank (A).


To establish the converse let Ag be an arbitrary Generalized Inverse of A which is of the same rank as A. By Theorem 2.7 there exist P2 and P1 such that

P1AP2 = B = [I 0; 0 0]  and  Ag = P2BgP1  with  Bg = [I V; W X],

the matrices V, W and X being arbitrary. If Rank (A) = Rank (Ag) then Rank (A) = Rank (Ag) = Rank (Bg) = Rank (B). We see that

Rank (Bg) = Rank [I V; W X] = Rank [I V; 0 X - WV] ≥ Rank (B).

Hence Rank (Bg) = Rank (B) implies X - WV = 0 or X = WV. Thus

Bg = [I V; W WV]

if Ag is a Generalized Inverse with Rank (Ag) = Rank (A). We now find that

BgBBg = [I V; W WV][I 0; 0 0][I V; W WV] = [I 0; W 0][I V; W WV] = [I V; W WV] = Bg.

Hence

AgAAg = P2BgBBgP1 = P2BgP1 = Ag

and Ag is a Reflexive Generalized Inverse.

As has been the case throughout this Chapter the properties of the

Normalized Generalized Inverse are somewhat nebulous. The following

theorem, a slightly stronger version of Lemma 2.4 due to Zelen and

Goldman, characterizes Normalized Generalized Inverses in terms of

Generalized Inverses.

Theorem 3.15: A Generalized Inverse is a Normalized Generalized Inverse if and only if it can be written in the form

An = (A'A)gA'

for some Generalized Inverse (A'A)g of A'A.

Proof: If An = (A'A)gA' where (A'A)g is a Generalized Inverse of A'A then the equations

A(A'A)gA'A = A
and
(A(A'A)gA')' = A(A'A)gA'

show that An is a Normalized Generalized Inverse of A.

The converse follows immediately from Lemma 2.4.
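Theorem 3.15 can be illustrated numerically; a sketch, assuming NumPy, with an ad hoc matrix A and a deliberately non-symmetric Generalized Inverse of A'A:

```python
import numpy as np

A = np.array([[1., 1.],
              [2., 2.]])
AtA = A.T @ A                                  # = [[5, 5], [5, 5]]

G = np.array([[0.2, 0.],
              [0. , 0.]])                      # one (non-unique) Generalized Inverse of A'A
assert np.allclose(AtA @ G @ AtA, AtA)

An = G @ A.T                                   # An = (A'A)^g A'

assert np.allclose(A @ An @ A, A)              # An is a Generalized Inverse of A
assert np.allclose(A @ An, (A @ An).T)         # and A An is symmetric (an orthogonal projection)
```

Varying the choice of G sweeps out different Normalized Generalized Inverses of A.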


The property of commutativity of A and Ar serves to characterize the Pseudo-Inverse of a symmetric matrix as the following theorem indicates.

Theorem 3.16: If A is symmetric then A <-> At. Conversely, if A and Ar are symmetric and A <-> Ar then Ar = At.

Proof: If A is symmetric (viii) of Theorem 2.9 implies that AtA = AAt.

If A and Ar are symmetric and commute then

(AAr)' = Ar'A' = ArA = AAr
and
(ArA)' = A'Ar' = AAr = ArA.

Hence by Theorem 3.13, Ar = At.

The Pseudo-Inverse of a symmetric matrix can also be characterized in terms of An and Property R2.

Theorem 3.17: If the Normalized Generalized Inverse An of a symmetric matrix possesses Property R2 then An = At.

Proof: Let λ1, λ2, ..., λr denote the non-zero eigenvalues of A and E1, E2, ..., Er the corresponding eigenspaces. Also let E0 denote the eigenspace spanned by the vectors in the null space of A. Property R2 implies that

An xi = λi⁻¹ xi,   i = 1, 2, ..., r,

where xi is an arbitrary element in Ei, i = 1, 2, ..., r. We may write any vector x as

x = x1 + x2 + ... + xr + x0

where xi ∈ Ei, i = 1, 2, ..., r, and x0 ∈ E0. Hence

AAn x = AAn(x1 + ... + xr + x0) = x1 + ... + xr + AAn x0 = x1 + ... + xr

(AAn x0 vanishes since it lies in R(A) and is orthogonal to each Ei) and

An Ax = An A(x1 + ... + xr + x0) = x1 + ... + xr.

It follows that AAn = An A. Hence An A, like AAn, is an orthogonal projection and thus, by Theorem 3.13, An = At.


CHAPTER 4. COMPUTATIONAL METHODS

4.1 Summary

In this chapter certain computational methods for finding

Generalized Inverses are discussed. The subject is as vast as the

general theory of solving systems of linear equations and hence no claim

to completeness can be made.

In Section 4.2 an expression is obtained for a Generalized Inverse

of a suitably partitioned matrix. This result is subsequently used in

obtaining Generalized Inverses of Bordered Matrices in Section 4.5.

Section 4.3 provides a review of some computational methods. Some of

the theory underlying the Abbreviated Doolittle technique is discussed

using Generalized Inverses in Section 4.4.

4.2 An Expression for a Generalized Inverse of a Partitioned Matrix

In this section we shall develop a formula for obtaining a Generalized Inverse of a suitably partitioned symmetric matrix. We note that computation of Normalized Generalized Inverses and Pseudo-Inverses of a matrix X can be performed by finding either a Generalized Inverse or the Pseudo-Inverse of A = X'X (see Theorem 3.15 and (viii) of Theorem 2.9). Matrices of the form X'X arise frequently in statistical applications and occur commonly in partitioned form as

(4.1)    X'X = [ X_1'X_1    X_1'X_2 ]
               [ X_2'X_1    X_2'X_2 ]

For use in algebraic expressions occurring in statistical applications, and as a computational method to find Normalized Generalized Inverses and Pseudo-Inverses, we shall develop an expression for a Generalized Inverse of a matrix partitioned in the form (4.1).

Theorem 4.1: If a matrix X'X is partitioned as in (4.1) then a Generalized Inverse of X'X is given by

(4.2)    (X'X)^g = [ (X_1'X_1)^g + (X_1'X_1)^g(X_1'X_2)Q^g(X_2'X_1)(X_1'X_1)^g    -(X_1'X_1)^g(X_1'X_2)Q^g ]
                   [ -Q^g(X_2'X_1)(X_1'X_1)^g                                      Q^g                     ]

where Q = X_2'X_2 - (X_2'X_1)(X_1'X_1)^g(X_1'X_2) and Q^g and (X_1'X_1)^g are any Generalized Inverses of Q and X_1'X_1 respectively.

Proof: Formula (4.2) can be directly verified by forming (X'X)(X'X)^g(X'X) and simplifying. It is of some interest, however, to observe that (4.2) can be obtained in a straightforward manner using the results of previous chapters. Using the relations X_1' = (X_1'X_1)(X_1'X_1)^g X_1' and X_1 = X_1(X_1'X_1)^g X_1'X_1 we see that P_1 X'X P_2 = Z for suitable non-singular (elementary) matrices P_1 and P_2, where

Z = [ X_1'X_1    0 ]
    [ 0          Q ]

and Q = X_2'X_2 - X_2'X_1(X_1'X_1)^g X_1'X_2. Since X'X = P_1^{-1} Z P_2^{-1} it follows from Theorem 3.3 that a Generalized Inverse of X'X is

(X'X)^g = P_2 Z^g P_1


where Z^g is any Generalized Inverse of Z. The Generalized Inverse of Z which yields (4.2) is

Z^g = [ (X_1'X_1)^g    0   ]
      [ 0              Q^g ] .

Note that other choices of Z^g would be permissible and it appears possible that certain of these choices would lead to simpler expressions for (X'X)^g, although the choice made above is a natural one.
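As a concrete check of formula (4.2), the following sketch (an illustration, not part of the thesis) assembles a Generalized Inverse of a rank-deficient X'X from Generalized Inverses of X_1'X_1 and of Q; numpy's pinv stands in for "any" Generalized Inverse:

```python
import numpy as np

# Formula (4.2): a Generalized Inverse of X'X built from g-inverses of the
# block X1'X1 and of the (generalized) Schur complement Q.
rng = np.random.default_rng(1)
X1 = rng.standard_normal((8, 2)) @ rng.standard_normal((2, 3))  # rank 2 block
X2 = rng.standard_normal((8, 2))
X = np.hstack([X1, X2])

A11, A12 = X1.T @ X1, X1.T @ X2
A21, A22 = X2.T @ X1, X2.T @ X2
A11g = np.linalg.pinv(A11)              # a g-inverse of X1'X1
Q = A22 - A21 @ A11g @ A12
Qg = np.linalg.pinv(Q)                  # a g-inverse of Q

G = np.block([[A11g + A11g @ A12 @ Qg @ A21 @ A11g, -A11g @ A12 @ Qg],
              [-Qg @ A21 @ A11g,                     Qg]])

XtX = X.T @ X
assert np.allclose(XtX @ G @ XtX, XtX)  # G is a Generalized Inverse of X'X
```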

The following paragraph serves as an introduction to the proofs of Theorems 4.2 and 4.3. Using the relations X_1 = X_1(X_1'X_1)^g(X_1'X_1) and X_1' = (X_1'X_1)(X_1'X_1)^g X_1', the definition of Q and simple matrix multiplications show that

(4.3)    (X'X)^g(X'X)(X'X)^g =

[ (X_1'X_1)^g(X_1'X_1)(X_1'X_1)^g + (X_1'X_1)^g X_1'X_2 Q^g Q Q^g X_2'X_1 (X_1'X_1)^g    -(X_1'X_1)^g X_1'X_2 Q^g Q Q^g ]
[ -Q^g Q Q^g X_2'X_1 (X_1'X_1)^g                                                          Q^g Q Q^g                      ]

Using the results of (4.3) the following extended form of Theorem 4.1 can be stated.


Theorem 4.2: If X'X is partitioned as in (4.1) then a Reflexive Generalized Inverse of X'X is

(X'X)^r = [ (X_1'X_1)^r + (X_1'X_1)^r(X_1'X_2)Q^r(X_2'X_1)(X_1'X_1)^r    -(X_1'X_1)^r(X_1'X_2)Q^r ]
          [ -Q^r(X_2'X_1)(X_1'X_1)^r                                      Q^r                     ]

where Q = X_2'X_2 - X_2'X_1(X_1'X_1)^r X_1'X_2 and Q^r and (X_1'X_1)^r are any Reflexive Generalized Inverses of Q and (X_1'X_1) respectively.

Proof: From Theorem 4.1 it follows that (X'X)(X'X)^r(X'X) = X'X. From equation (4.3) with (X_1'X_1)^r(X_1'X_1)(X_1'X_1)^r = (X_1'X_1)^r and Q^r Q Q^r = Q^r it is clear that (X'X)^r(X'X)(X'X)^r = (X'X)^r, which completes the proof.

The results of Theorem 4.1 and Theorem 4.2 can be extended, under

certain circumstances, to include Normalized Generalized Inverses and

Pseudo-Inverses. Before indicating these extensions we establish the

following Lemma:

Lemma 4.1: If X'X is partitioned as in (4.1), where Rank X_1'X_1 = r, X_2'X_2 is non-singular (of order q × q), and Rank X'X = r + q, then

Q = X_2'X_2 - X_2'X_1(X_1'X_1)^g X_1'X_2

is non-singular.

Proof: Since Q is q × q it suffices to prove that Rank Q = q. Since the rank of a matrix is unaltered by pre- or post-multiplication by non-singular matrices it follows that the rank of X'X is the same


as that of Z, defined in the proof of Theorem 4.1. Hence

Rank Z = Rank (X'X) = r + q .

On the other hand Rank Z = Rank (X_1'X_1) + Rank Q = r + Rank Q. Hence Rank Q = q.

Using Lemma 4.1 we can establish the following result.

Theorem 4.3: If X'X is partitioned as in (4.1) and the assumptions of Lemma 4.1 are fulfilled, then

(a)    (X'X)^n = [ (X_1'X_1)^n + (X_1'X_1)^n(X_1'X_2)Q^{-1}(X_2'X_1)(X_1'X_1)^n    -(X_1'X_1)^n(X_1'X_2)Q^{-1} ]
                 [ -Q^{-1}(X_2'X_1)(X_1'X_1)^n                                      Q^{-1}                     ]

where (X_1'X_1)^n is any Normalized Generalized Inverse of (X_1'X_1), is a Normalized Generalized Inverse of X'X.

(b)    (X'X)^t = [ (X_1'X_1)^t + (X_1'X_1)^t(X_1'X_2)Q^{-1}(X_2'X_1)(X_1'X_1)^t    -(X_1'X_1)^t(X_1'X_2)Q^{-1} ]
                 [ -Q^{-1}(X_2'X_1)(X_1'X_1)^t                                      Q^{-1}                     ]

where (X_1'X_1)^t is the Pseudo-Inverse of (X_1'X_1), is the Pseudo-Inverse of (X'X).

Proof: (a) (X'X)(X'X)^n(X'X) = X'X and (X'X)^n(X'X)(X'X)^n = (X'X)^n by Theorem 4.2. From (4.3) it follows that (X'X)(X'X)^n = [(X'X)(X'X)^n]'.

(b) By part (a), (X'X)(X'X)^t(X'X) = X'X, (X'X)^t(X'X)(X'X)^t = (X'X)^t and (X'X)(X'X)^t = [(X'X)(X'X)^t]'. From (4.3) it follows that (X'X)^t(X'X) = [(X'X)^t(X'X)]'.


It is to be noted that if X'X is non-singular and (X_1'X_1) is non-singular then (4.2) reduces to a commonly given expression for the inverse of a partitioned matrix. The expression given by (4.2) can be used to find Normalized Generalized Inverses of arbitrary partitioned matrices. If a matrix A is partitioned as

A = [ A_11    A_12 ]
    [ A_21    A_22 ]

we simply apply (4.2) to A'A and form

A^n = (A'A)^g A'

which, according to Theorem 3.15, is a Normalized Generalized Inverse of A.

4.3 Some Computational Procedures for Finding Generalized Inverses

4.3.1 Generalized Inverses

Finding Generalized Inverses is, according to the definition, simply a matter of solving the linear equations

A x = y

for each y which makes the system consistent and expressing a solution as

x = A^g y .

The matrix A^g will then be a Generalized Inverse of A. There is a


wealth of literature dealing with the solution of systems of linear

equations (see for example Bodewig [1959], Dwyer [1951], and Fadeeva

[1959]) and to present a review of this material would be out of place

here. In Section 4.4 we shall present a method which has gained wide

acceptance in statistical circles. For the present the method called

sweep-out or pivotal condensation seems to be the simplest, although

not necessarily the most compact way of finding a Generalized Inverse

of an arbitrary matrix.

Consider the n × p matrix X_np and the augmented matrix

[ X_np    I_(n) ] .

One performs elementary row and column operations on this matrix until it has been reduced to the form

[ I_(r)        0_{r,p-r}      P_1 ]
[ 0_{n-r,r}    0_{n-r,p-r}        ]

where r = Rank (X_np), or more simply

[ X*    P_1 ] ,

the column operations being recorded in a matrix P_2. It follows that P_1 X P_2 = X* and, using Theorem 3.3, a Generalized Inverse of X is given by

(4.4)    X^g = P_2 X*^g P_1

where a possible X*^g is given by

X*^g = [ I_(r)        0_{r,n-r}   ]
       [ 0_{p-r,r}    0_{p-r,n-r} ] .
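A minimal sketch of the sweep-out computation (an assumed implementation, not the thesis's own worksheet layout): elementary row operations are accumulated in P_1, column operations in P_2, and the Generalized Inverse is then read off from equation (4.4):

```python
import numpy as np

def g_inverse_by_sweep_out(X, tol=1e-10):
    """Reduce X to [[I_r, 0], [0, 0]] by row/column sweep-out and return
    the Generalized Inverse P2 @ X*g @ P1 of equation (4.4)."""
    X = X.astype(float)
    n, p = X.shape
    P1, P2 = np.eye(n), np.eye(p)   # accumulate row ops in P1, column ops in P2
    A = X.copy()
    r = 0
    for _ in range(min(n, p)):
        sub = np.abs(A[r:, r:])
        if sub.size == 0 or sub.max() < tol:
            break
        i, j = divmod(int(np.argmax(sub)), p - r)
        i += r
        j += r
        A[[r, i]] = A[[i, r]]; P1[[r, i]] = P1[[i, r]]              # row swap
        A[:, [r, j]] = A[:, [j, r]]; P2[:, [r, j]] = P2[:, [j, r]]  # column swap
        piv = A[r, r]
        A[r] /= piv; P1[r] /= piv                                   # scale pivot row
        for k in range(n):                                          # clear pivot column
            if k != r:
                f = A[k, r]
                A[k] -= f * A[r]; P1[k] -= f * P1[r]
        for k in range(p):                                          # clear pivot row
            if k != r:
                f = A[r, k]
                A[:, k] -= f * A[:, r]; P2[:, k] -= f * P2[:, r]
        r += 1
    Xstar_g = np.zeros((p, n))
    Xstar_g[:r, :r] = np.eye(r)     # the possible X*g displayed above
    return P2 @ Xstar_g @ P1

X = np.array([[1., 2., 3.], [2., 4., 6.], [1., 0., 1.]])  # rank 2
G = g_inverse_by_sweep_out(X)
assert np.allclose(X @ G @ X, X)
```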

4.3.2 Reflexive Generalized Inverses

The computation of Reflexive Generalized Inverses can be achieved in a manner analogous to that used to compute Generalized Inverses. If X_np is an arbitrary matrix one forms the augmented matrix

M = [ X_np     I_(n) ]
    [ I_(p)    0     ] .

One then performs elementary row and column operations on M until M becomes

[ X*_np    P_1 ]
[ P_2          ]

where

X*_np = [ I_(r)        0_{r,p-r}   ]
        [ 0_{n-r,r}    0_{n-r,p-r} ]

with r = Rank (X_np). It then follows that P_1 X P_2 = X* and, using Theorem 3.5, we see that

X^r = P_2 (X*)' P_1

is a Reflexive Generalized Inverse of X.

If A is a symmetric matrix written as A = X'X it is possible to give a particularly simple method for computing a Reflexive Generalized Inverse. Pre- and post-multiplication of X'X by elementary matrices results in

E'X'XE = [ X_1'X_1    X_1'X_2 ]
         [ X_2'X_1    X_2'X_2 ]

where the row space of [X_1'X_1 | X_1'X_2] constitutes a basis for the row space of E'X'XE. (Hence X_1'X_1 is non-singular.) As is well known (see Roy [1957])

X_2'X_2 = X_2'X_1(X_1'X_1)^{-1}X_1'X_2 .

Routine matrix multiplication shows that a Reflexive Generalized Inverse of E'X'XE is given by

Q = [ (X_1'X_1)^{-1}    0 ]
    [ 0                 0 ] .

Hence, by Theorem 3.5, a Reflexive Generalized Inverse of X'X is given by

(4.6)    (X'X)^r = EQE' ,

say.

A direct verification of this result is simple since

E'(X'X)(X'X)^r(X'X)E = E'(X'X)[EQE'](X'X)E = [E'(X'X)E] Q [E'(X'X)E] = E'X'XE

and

(X'X)^r(X'X)(X'X)^r = EQ(E'X'XE)QE' = EQE' = (X'X)^r .

Note that (X'X)^r as given by (4.6) is not necessarily a Normalized Generalized Inverse of (X'X) but that (X'X)^r X' is a Normalized Generalized Inverse of X. Another method of computing a Reflexive Generalized Inverse, which does not require that X'X be partitioned as above, will be given in Section 4.5.

4.3.3 Normalized Generalized Inverses

By Theorem 3.15 any Generalized Inverse of the form (X'X)^g X' is a Normalized Generalized Inverse of X. Hence, the computation of a Normalized Generalized Inverse can be reduced to the computation of a Generalized Inverse.

Alternatively, the method due to Zelen and Goldman [1963] can be used. If one writes A = X'X (A is p × p of rank r) and chooses R (R is p × (p-r) of rank (p-r)) so that B'R is non-singular where B'A = 0, then a Normalized Generalized Inverse of X is given by CX', where C is obtained as follows. As discussed in Section 2.4, if one finds M^{-1} and partitions it in the same manner as M, then C occupies the same position in M^{-1} as A does in M, where

M = [ A     R ]
    [ R'    0 ] .

4.3.4 Pseudo-Inverses

Computational methods for finding Pseudo-Inverses have been explored from various viewpoints. Some of the methods previously published seem to have little practical value, however.

Equation (viii) of Theorem 2.9 allows us to concentrate on methods for obtaining the Pseudo-Inverse of a matrix of the form A = X'X. Penrose [1956] showed that the Pseudo-Inverse of A = X'X can be

expressed in terms of ordinary inverses via the formula

(4.8)    A^t = [ X_1'X_1 ] P [ X_1'X_1    X_1'X_2 ]
               [ X_2'X_1 ]

where

P = [(X_1'X_1)^2 + (X_1'X_2)(X_1'X_2)']^{-1}(X_1'X_1)[(X_1'X_1)^2 + (X_1'X_2)(X_1'X_2)']^{-1} .

Since we can rearrange the rows and columns of X'X by pre- and post-multiplication by suitable orthonormal matrices, (vi) of Theorem 2.9 shows that there is no loss of generality in assuming that [X_1'X_1 | X_1'X_2] forms a basis for the row space of X'X.

Greville [1959] showed that if one factors X_np as X_np = B_nr C_rp, where r = Rank X_np and B_nr and C_rp are both of rank r, then X^t can be expressed as

(4.9)    X^t = C'(CC')^{-1}(B'B)^{-1}B' .

In the case A = X'X we can factor A as A = T'T where T is r × p of rank r and (4.9) reduces to

(4.10)    A^t = T'(TT')^{-2}T .
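Greville's factorization formula (4.9) can be sketched as follows (illustration only, not from the thesis; the full-rank factors B and C are generated directly rather than computed from a given X):

```python
import numpy as np

# Formula (4.9): X = B C with B (n x r) and C (r x p) both of rank r gives
# X^t = C'(CC')^{-1}(B'B)^{-1}B'.
rng = np.random.default_rng(2)
B = rng.standard_normal((6, 2))
C = rng.standard_normal((2, 4))
X = B @ C                                   # a rank-2 matrix

Xt = C.T @ np.linalg.inv(C @ C.T) @ np.linalg.inv(B.T @ B) @ B.T
assert np.allclose(Xt, np.linalg.pinv(X))   # agrees with the Pseudo-Inverse
```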

In a recent paper van der Vaart [1964] has used a particularly simple matrix factorization to obtain an expression for the Pseudo-Inverse. If X_np is a matrix of rank r simply select an orthonormal basis for the row space of X_np. If the row vectors of this basis are p_1', p_2', ..., p_r' then we can write

X = QP'

where

P' = [ p_1' ]
     [ ...  ]        and    Q = XP .
     [ p_r' ]

It is then easy to verify that

X^t = P(Q'Q)^{-1}Q' .
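A sketch of van der Vaart's factorization (the orthonormal basis P is obtained here via a singular value decomposition, which is a convenience of this illustration; any orthonormal basis of the row space would do):

```python
import numpy as np

# van der Vaart's factorization: X = Q P' with P'P = I, Q = X P, and
# X^t = P (Q'Q)^{-1} Q'.
rng = np.random.default_rng(3)
X = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))   # rank 2

U, s, Vt = np.linalg.svd(X)
r = int((s > 1e-10).sum())
P = Vt[:r].T                       # p x r, orthonormal columns spanning row space
Q = X @ P

assert np.allclose(Q @ P.T, X)     # X = Q P'
Xt = P @ np.linalg.inv(Q.T @ Q) @ Q.T
assert np.allclose(Xt, np.linalg.pinv(X))
```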

Boot [1963] has also given a formula for the Pseudo-Inverse of a matrix in terms of ordinary inverses. We write X'_pn so that its first r rows are linearly independent (where r = Rank X). Then one defines F'_rp and G'_rn to be the matrices formed by the first r rows of X'X and of X' respectively. The Pseudo-Inverse of X_np is then given by

(4.11)    X^t_pn = F(F'F)^{-1}G' .

Boot gives two proofs of this result. The first is based on the result (see Ben Israel and Wersan [1962] or Rao and Chipman [1964]) that the Pseudo-Inverse X^t of a matrix X is the matrix which minimizes the trace of X^t(X^t)' subject to the restriction that (X'X)X^t = X'. The alternative proof is algebraic and, like Greville's treatment, rests on a factorization of X.

Zelen and Goldman [1963] mention that one can compute the Pseudo-Inverse of A = X'X (where Rank A = r) by choosing R = B in Lemma 2.1. The explicit formula

(4.12)    A^t = [I - R(R'R)^{-1}R'](A + RR')^{-1}

is presented.

It is interesting to note that (4.12) can be directly verified using the properties of the Pseudo-Inverse. We see that

[A + RR']^t = A^t + (RR')^t

by (iii) and (vii) of Theorem 2.9. Hence

[I - R(R'R)^{-1}R'][A + RR']^{-1} = [I - R(R'R)^{-1}R'][A^t + (RR')^t] = A^t

since it is easily verified that (RR')^t = R(R'R)^{-2}R' and since

R(R'R)^{-1}R'A^t = R(R'R)^{-1}R'A^tAA^t = R(R'R)^{-1}R'AA^tA^t = 0 .
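Formula (4.12) can also be confirmed numerically. In this sketch (not from the thesis) the columns of R are taken as an orthonormal basis of the null space of A, obtained from an eigendecomposition:

```python
import numpy as np

# Check of (4.12): A symmetric of rank r, R spans the null space of A
# (so R'A = 0) with R'R non-singular; then
# A^t = [I - R(R'R)^{-1}R'](A + RR')^{-1}.
rng = np.random.default_rng(4)
M = rng.standard_normal((5, 3))
A = M @ M.T                                  # symmetric, rank 3

w, V = np.linalg.eigh(A)
R = V[:, np.abs(w) < 1e-8]                   # orthonormal null-space basis (5 x 2)
I = np.eye(5)

At = (I - R @ np.linalg.inv(R.T @ R) @ R.T) @ np.linalg.inv(A + R @ R.T)
assert np.allclose(At, np.linalg.pinv(A))
```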

4.4 Abbreviated Doolittle Technique

In this section we indicate a presentation of the theory behind the Abbreviated Doolittle computing technique widely used by statisticians in dealing with linear equations, and illustrate how this technique can be used to obtain Generalized Inverses. We consider matrices of the form X'X and linear equations of the form (X'X)b = X'y. The development here closely follows that of Anderson [1959], who considered the case where X'X is non-singular.

In Section 5.7 of Chapter 5 we return to the Abbreviated Doolittle and its place in both the theory and application of least squares.


We recall that a matrix X can be factored as

X = ZH

where Z is such that Z'Z = D, a diagonal matrix, and H is an upper right triangular matrix with all diagonal elements unity. To conform with the usual presentation of the Abbreviated Doolittle technique we write X = ZH instead of the more natural X = HZ. The above factorization is simply a result of the Gram-Schmidt Orthogonalization Process applied to the columns of X. Let z_1, z_2, ..., z_p and x_1, x_2, ..., x_p be the columns of Z and X respectively. Then z_1, z_2, ..., z_p are determined recursively as

(4.13)    z_i = x_i - Σ_{j=1}^{i-1} (z_j'x_i / z_j'z_j) z_j

for i = 1, 2, ..., p. Since no assumptions have been made about the rank of X some care needs to be taken if z_ℓ = 0 for some ℓ. In this case for q = 1, 2, ..., p-ℓ we take

z_{ℓ+q} = x_{ℓ+q} - Σ_{j=1, j≠ℓ}^{ℓ+q-1} (z_j'x_{ℓ+q} / z_j'z_j) z_j .

It is clear that

ZH = X

where


(4.14)    If z_i ≠ 0:    H_ij = z_i'x_j / z_i'z_i    for j = i+1, i+2, ..., p ;
                         H_ij = 1                    for j = i ;
                         H_ij = 0                    for j = 1, 2, ..., i-1 .

          If z_i = 0:    H_ij = 0                    for j = i+1, i+2, ..., p ;
                         H_ij = 1                    for j = i ;
                         H_ij = 0                    for j = 1, 2, ..., i-1 .

Hence H is upper right triangular, non-singular and has diagonal elements unity. The vectors z_1, z_2, ..., z_p are orthogonal by construction and hence

(4.14')    Z'Z = D ,

where D is diagonal. Thus we have

(4.14'')    X'X = H'Z'ZH .

From Theorem 3.3 we see that a Generalized Inverse of X'X is given by

(4.15)    (X'X)^g = H^{-1}(Z'Z)^g(H')^{-1}

where (Z'Z)^g is a Generalized Inverse of (Z'Z). Since (Z'Z) is diagonal a Generalized Inverse is easily found. If we define (Z'Z)^g by

(4.16)    (Z'Z)^g = diag (d_ii)

where

d_ii = 1 / z_i'z_i    if z_i ≠ 0
d_ii arbitrary        if z_i = 0

then (X'X)^g is a Generalized Inverse of (X'X).

If (Z'Z)^r is defined by

(4.17)    (Z'Z)^r = diag (d_ii)

where

d_ii = 1 / z_i'z_i    if z_i ≠ 0
d_ii = 0              if z_i = 0

then (X'X)^r = H^{-1}(Z'Z)^r(H')^{-1} is a Reflexive Generalized Inverse of (X'X).
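The factorization X = ZH and equations (4.15)-(4.17) can be sketched as follows (an assumed implementation of the Gram-Schmidt recursion (4.13) with the zero-column convention; it is not the Doolittle tableau itself):

```python
import numpy as np

def zh_factor(X, tol=1e-10):
    """Gram-Schmidt on the columns of X: returns Z, H with X = Z H,
    Z'Z diagonal, H unit upper triangular, zero columns kept as zeros."""
    n, p = X.shape
    Z = np.zeros((n, p))
    H = np.eye(p)
    for i in range(p):
        z = X[:, i].copy()
        for j in range(i):
            d = Z[:, j] @ Z[:, j]
            if d > tol:                       # skip columns with z_j = 0
                H[j, i] = (Z[:, j] @ X[:, i]) / d
                z -= H[j, i] * Z[:, j]
        if z @ z < tol:
            z[:] = 0.0                        # convention for dependent columns
        Z[:, i] = z
    return Z, H

rng = np.random.default_rng(5)
X = rng.standard_normal((6, 2))
X = np.hstack([X, X @ np.array([[1.0], [2.0]])])   # third column is dependent

Z, H = zh_factor(X)
assert np.allclose(Z @ H, X)                       # X = Z H

d = np.sum(Z * Z, axis=0)
ZtZ_r = np.diag([1.0 / di if di > 1e-10 else 0.0 for di in d])  # (4.17)
Hinv = np.linalg.inv(H)
G = Hinv @ ZtZ_r @ Hinv.T                          # (4.15) with the choice (4.17)
XtX = X.T @ X
assert np.allclose(XtX @ G @ XtX, XtX)             # a Generalized Inverse of X'X
```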

The above remarks indicate that if one can obtain the matrices H

and Z one can easily find a Generalized Inverse or a Reflexive Gener-

alized Inverse of X'X.

The equations (4.13) and (4.14) for the determination of Z and H

indicate that such determination of Z and H is anything but trivial.

Fortunately the application of the Abbreviated Doolittle Computational

Procedure yields a systematic method for obtaining H. It turns out

that Z is not needed explicitly and the procedure yields Z'Z in a

simple manner.

The Abbreviated Doolittle computing procedure can be conveniently expressed in the format given in Table 4.1. The quantities a_ij = x_i'x_j are the elements of the X'X matrix and a_iy = x_i'y for i = 1, 2, ..., p. Some difficulty arises in using the format indicated in Table 4.1 when


Table 4.1. The Abbreviated Doolittle format

Row     1      2      ...    i      ...    p      y
A_1:    a_11   a_12   ...    a_1i   ...    a_1p   a_1y    where A_1j = a_1j ;                           j = 1, 2, ..., p, y
B_1:    1      B_12   ...    B_1i   ...    B_1p   B_1y    where B_1j = A_1j / A_11 ;                    j = 1, 2, ..., p, y
A_2:    0      A_22   ...    A_2i   ...    A_2p   A_2y    where A_2j = a_2j - B_12 A_1j ;               j = 1, 2, ..., p, y
B_2:    0      1      ...    B_2i   ...    B_2p   B_2y    where B_2j = A_2j / A_22 ;                    j = 1, 2, ..., p, y
...
A_i:    0      0      ...    A_ii   ...    A_ip   A_iy    where A_ij = a_ij - Σ_{k=1}^{i-1} B_ki A_kj ; j = 1, 2, ..., p, y
B_i:    0      0      ...    1      ...    B_ip   B_iy    where B_ij = A_ij / A_ii ;                    j = 1, 2, ..., p, y
...
A_p:    0      0      ...    0      ...    A_pp   A_py    where A_pj = a_pj - Σ_{k=1}^{p-1} B_kp A_kj ; j = 1, 2, ..., p, y
B_p:    0      0      ...    0      ...    1      B_py    where B_pj = A_pj / A_pp ;                    j = 1, 2, ..., p, y

A_ℓℓ = 0. This will occur if and only if x_ℓ is a linear combination of x_1, x_2, ..., x_{ℓ-1}. More precisely, if x_ℓ = Σ_{i=1}^{ℓ-1} c_i x_i then, since the equations X'Xb = X'y are consistent, it can be shown that A_ℓj = 0 for j = 1, 2, ..., p, y.

In such a case we define B_ℓj as follows:

(4.18)    B_ℓj = 0    for j > ℓ
          B_ℓj = 0    for j < ℓ
          B_ℓj = 1    for j = ℓ .

Similar procedures are followed if several A_ii's are null. The reason for this somewhat peculiar convention will become clear in what follows.

Recalling the definition of z_1 given in (4.13) we see that

A_1j = x_1'x_j = z_1'x_j ,    A_1y = z_1'y ,

and

B_11 = 1 ,    B_1j = z_1'x_j / z_1'z_1    for j = 2, ..., p ,    B_1y = z_1'y / z_1'z_1 .

Similarly B_21 = 0 and

A_2j = z_2'x_j ,    B_2j = z_2'x_j / z_2'z_2    for j = 3, ..., p ,    B_2y = z_2'y / z_2'z_2 .

In general it follows that

(4.19)    A_ℓj = 0                        for j < ℓ
          A_ℓℓ = z_ℓ'z_ℓ                  for j = ℓ
          A_ℓj = z_ℓ'x_j                  for j = ℓ+1, ℓ+2, ..., p

          B_ℓj = 0                        for j < ℓ
          B_ℓj = 1                        for j = ℓ
          B_ℓj = z_ℓ'x_j / z_ℓ'z_ℓ        for j = ℓ+1, ℓ+2, ..., p

          A_ℓy = z_ℓ'y ,    B_ℓy = z_ℓ'y / z_ℓ'z_ℓ

for ℓ = 1, 2, ..., p if z_ℓ ≠ 0. If z_ℓ = 0 we follow the convention that

(4.19')    B_ℓj = 0    for j < ℓ
           B_ℓj = 1    for j = ℓ
           B_ℓj = 0    for j > ℓ
           B_ℓy = 0 .

It is clear that

(4.20)    B = [ B_1' ]
              [ ...  ]  = H
              [ B_p' ]

where the matrices Z and H are as defined by equations (4.13) and (4.14). It is also of interest to note that, with A_ℓ' denoting the ℓ-th "A" row,

(4.21)    (Z'Z)H = (Z'Z)B = [ A_1' ]
                            [ ...  ]  = A .
                            [ A_p' ]

The matrix H^{-1} = B^{-1} can easily be found by the following

sequential scheme. Remembering that the inverse of a triangular

matrix is again triangular we see that the matrix equation


[ 1    B_12    B_13   ...    B_1p ] [ C_11    C_12   ...    C_1p ]   [ 1    0   ...    0 ]
[ 0    1       B_23   ...    B_2p ] [ 0       C_22   ...    C_2p ]   [ 0    1   ...    0 ]
[ ...                             ] [ ...                        ] = [ ...               ]
[ 0    0       0      ...    1    ] [ 0       0      ...    C_pp ]   [ 0    0   ...    1 ]

yields the following set of equations for determining the elements of the C matrix:

(4.22)    C_ij = 1                          for i = j
          C_ij = 0                          for j < i
          Σ_{k=i}^{j} B_ik C_kj = 0         for j > i ,    i = 1, 2, ..., p .

The simplest procedure is to start with the equation

C_{p-1,p} + C_pp B_{p-1,p} = 0

and proceed in a sequential fashion to determine the other C_ij's. It is clear that this procedure gives a routine method for finding B^{-1}. Thus the computation of a Generalized Inverse or a Reflexive Generalized Inverse can easily be performed within the Abbreviated Doolittle format utilizing equation (4.15), i.e.,

(X'X)^g = B^{-1}(Z'Z)^g(B')^{-1} .

It is of interest to note that the Doolittle procedure can be used to compute certain matrix products. To be specific we shall indicate how to compute W(X'X)^r W'. Form the augmented matrix

M = [ X'X | W' ]

and carry through the forward solution of the Abbreviated Doolittle on M. Note that since M is rectangular with p rows the Doolittle procedure will stop after p steps. If one forms the matrix A_M consisting of the "A" rows of the Doolittle and partitions it in the same way as M, it follows from A = (B')^{-1}X'X that

(4.23')    A_M = [ A | (B')^{-1}W' ] .

If we let B_M designate the analogously partitioned matrix of "B" rows it follows from the definition of the Doolittle operations that

B_M = [ B | (Z'Z)^r(B')^{-1}W' ] .

Hence

(4.24)    W(X'X)^r W' = [(B')^{-1}W']' [(Z'Z)^r(B')^{-1}W'] ,

i.e., W(X'X)^r W' is obtained by premultiplying the augmented portion of B_M by the transpose of the augmented portion of A_M.

Equation (4.24) gives a simple procedure for computing a Normalized Generalized Inverse of (X'X). Recall from Theorem 2.18 that a Normalized Generalized Inverse of (X'X) is given by

(4.25)

Upon forming the appropriate augmented matrix, the computation of (X'X)^n can be performed by an application of the Doolittle forward solution and equation (4.24). The computation of the matrix product W(X'X)^r W' discussed above is a generalization of

Aitken's triple product method [1931] and is discussed in a recent paper by Rohde and Harvey [1964].

The computation of the Pseudo-Inverse of (X'X) can be achieved by the Doolittle using the factorization representation discussed in Section 4.3.4, namely,

(X'X)^t = T'(TT')^{-2}T

where T is determined from the equations

X'X = T'T ,    Rank (T_rp) = r = Rank (X'X) .

To compute T we let T consist of the r non-zero rows of D^{1/2}B, where D and B are as defined in (4.14') and (4.20). It is easy to see that X'X = T'T. The matrix product T'(TT')^{-2}T can, of course, be computed from the Doolittle utilizing the augmented matrix

(4.26)    [ (TT')^2 | T ] .

Note that the determination of T is identical with the determination of the square root of X'X and hence the square root method or any similar factorization procedure can be used to obtain the Pseudo-Inverse in a routine fashion.
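The square-root representation can be sketched numerically as follows (in this illustration T is obtained from an eigendecomposition of X'X rather than from the Doolittle rows D^{1/2}B; both give X'X = T'T):

```python
import numpy as np

# Obtain T (r x p) with X'X = T'T and Rank T = Rank X'X, then apply
# (X'X)^t = T'(TT')^{-2}T as in Section 4.3.4.
rng = np.random.default_rng(6)
X = rng.standard_normal((7, 2)) @ rng.standard_normal((2, 4))   # rank 2
A = X.T @ X

w, V = np.linalg.eigh(A)
keep = w > 1e-8
T = np.sqrt(w[keep])[:, None] * V[:, keep].T     # r x p with A = T'T
assert np.allclose(T.T @ T, A)

TTt_inv = np.linalg.inv(T @ T.T)
At = T.T @ TTt_inv @ TTt_inv @ T                 # T'(TT')^{-2}T
assert np.allclose(At, np.linalg.pinv(A))
```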

4.5 Some Results on Bordered Matrices

A widely used technique in the solution of extremum problems is the Lagrangian Multiplier Technique. In this technique one considers the problem of maximizing (minimizing) a function F of p variables θ_1, θ_2, ..., θ_p subject to s < p restrictions. Let the restrictions be given by ψ_i(θ_1, θ_2, ..., θ_p) = r_i for i = 1, 2, ..., s. Define

θ' = (θ_1, θ_2, ..., θ_p) ,
λ' = (λ_1, λ_2, ..., λ_s) ,
ψ'(θ) = (ψ_1(θ), ψ_2(θ), ..., ψ_s(θ)) .

A solution to the problem of finding an extremum of F subject to the stated restrictions is often obtained by equating the partial derivatives of

F(θ) + λ'[ψ(θ) - r]

with respect to θ and λ to zero and solving for θ.

Under certain circumstances (e.g. when F is a quadratic form in θ and ψ(θ) is linear in θ) the above procedure results in a set of linear equations of the form

(4.27)    [ A     R ] [ θ ]   [ a ]
          [ R'    0 ] [ λ ] = [ r ]

where r is the vector of restriction constants and a arises from the derivatives of F. The matrix

[ A     R ]
[ R'    0 ]

is, for obvious reasons, called a Bordered Matrix. As a useful application of the theory of Generalized Inverses we shall discuss briefly the problem of obtaining a solution to a set of linear equations of the form (4.27). We shall do this by a simple application of the results of Section 4.2. Forming

L = [ A     R ]' [ A     R ]   [ A'A + RR'    A'R ]   [ K      A'R ]
    [ R'    0 ]  [ R'    0 ] = [ R'A          R'R ] = [ R'A    R'R ] ,  say,

we see that a Generalized Inverse of L is given by

L^g = [ K^g + K^g(A'R)Q^g(R'A)K^g    -K^g(A'R)Q^g ]
      [ -Q^g(R'A)K^g                 Q^g          ]

where Q = R'R - R'A K^g A'R and K^g is any Generalized Inverse of K. Hence a Normalized Generalized Inverse of

[ A     R ]
[ R'    0 ]

is given by

(4.28)    L^g [ A'    R ] .
              [ R'    0 ]

The general formula given by (4.28) can be specialized to yield the

results concerning Bordered Matrices which are widely known as well as

those which are not.

One application of Bordered Matrices is, as indicated in Section 4.3, the computation of Normalized Generalized Inverses and Pseudo-Inverses. Using (4.28) we shall derive alternative, but equivalent, expressions for the Normalized Generalized Inverses and Pseudo-Inverses given in Section 4.3.

Case I: If we adopt the assumptions used by Zelen and Goldman [1963], i.e., A is symmetric, p × p of rank r; B'A = 0 where B is p × (p-r) of rank (p-r); and R'B is non-singular, then

F = [ A     R ]
    [ R'    0 ]

is non-singular. Letting

F^{-1} = [ C     U ]
         [ U'    V ]

be partitioned in the same manner as F (F^{-1} is symmetric since F is), we see that the identity FF^{-1} = I yields the equations

AC + RU' = I ,
R'C = 0 ,
R'U = I ,
and AU + RV = 0 .

Thus U' = (B'R)^{-1}B' and V = 0. Since the inverse of F is unique we have

F^{-1} = [ C                B(R'B)^{-1} ]
         [ (B'R)^{-1}B'     0           ]

       = [ K^gA + K^gARQ^gR'AK^gA - K^gARQ^gR'    K^gR + K^gARQ^gR'AK^gR ]
         [ Q^gR' - Q^gR'AK^gA                     -Q^gR'AK^gR            ]

where Q = R'R - R'AK^gAR and K = A² + RR'. It follows that

-Q^gR'AK^gR = 0 ,
K^gR = B(R'B)^{-1} ,
and Q^gR' - Q^gR'AK^gA = (B'R)^{-1}B' .

Noting that the assumptions imply that K^g = K^{-1} = (A² + RR')^{-1}, we have an explicit expression for C given by

C = K^gA - K^gAR[Q^gR' - Q^gR'AK^gA]
  = K^gA[I - R(B'R)^{-1}B']
  = (A² + RR')^{-1}A[I - R(B'R)^{-1}B'] .

This is an alternative form of the expression given in Section 4.3.3. It is easily checked that C is a Reflexive Generalized Inverse of A² and hence CA is a Normalized Generalized Inverse of A.

Case II: If we let B = R in Case I we see that the assumptions imply that

K = A² + RR' ,    R'R ,    and    Q = R'R - R'AK^{-1}AR

are all invertible, and (4.28) becomes the ordinary inverse of F. Since the inverse of a symmetric matrix is again symmetric it follows that

C = (A² + RR')^{-1}A[I - R(R'R)^{-1}R'] = (A² + RR')^{-1}A

(since AR = 0). If we let A^t be the Pseudo-Inverse of A we see that A^t = C = (A² + RR')^{-1}A. This expression involves only one matrix inversion while that given by Zelen and Goldman involves two inversions.


CHAPTER 5. LEAST SQUARES APPLICATIONS

5.1 Introduction

In an excellent historical review Eisenhart [1963] has pointed out

that the analysis of a set of observed data using least squares tech-

niques is by no means new. It has only been in the last thirty years,

however, that research into the statistical properties and implications

of such analyses has been initiated. Much of this research grew out of

the application of least squares analyses to a wide class of models

variously called Analysis of Variance, Regression or General Linear

Models.

Generally speaking, least squares techniques yield tractable estimators when the model used to represent the observed data is linear in the parameters. There is concern about properties of least squares estimators when parameters enter into models in a non-linear fashion. However,

alternative methods of estimation such as maximum likelihood, minimum

chi-square, minimum modified chi-square etc. can often be viewed as

quasi-least squares estimators.

Least squares theory (and indeed most of the important applications)

is relatively complete for the class of models which can be fitted into

the following definition.

Definition 5.1: A model for a set of observed data y is called a general linear model if

(5.1)    (i)   E(y_{n1}) = X_{np} β_{p1} ,    Rank X = r ≤ p < n ,

and      (ii)  Var(y) = V σ² ,    Rank V = n ,

where β and σ² are unknown parameters.


Of importance is inference about the parameters. We shall devote

most of our attention to problems of estimation in this chapter. In the

framework of Definition 5.1 not all the parameters are estimable in the

following sense.

Definition 5.2: A parametric function λ'β is said to be linearly estimable if there exists c' such that

(5.2)    E(c'y) = λ'β

regardless of the value of β.

There is some controversy about restricting problems of estimation to those parametric functions which are (linearly) estimable. Two reasons can be given for such a restriction. First, it can be shown (Section 5.2) that estimable functions possess unique minimum variance unbiased linear estimators (called BLUE estimators). The criterion of minimum variance unbiasedness is widely used in other estimation problems (although occasionally it leads to nonsensical estimators, c.f. Lehmann [1950], pages 3-14 to 3-15) and hence such a restriction seems reasonable. Second, most of the common applications require for their solution that only estimable functions in the sense of Definition 5.2 need be considered.

In the next section we review some of the current approaches to least squares theory in general linear models.

5.2 A Review of Some Current Theories

The theory of estimation in the general linear model is well developed and can be compactly presented. In this section we shall describe briefly the essence of this theory. We assume that we have a

Page 81: of Series No. 1964boos/library/mimeo.archive/ISMS_1964_392.pdfAn extension of the concept of Generalized Inverses to Hilbert spaces is achieved in Definition 2.5 where the Pseudo-Inverseof

76

vector of random variables y with the following properties

(5.3)    (i)  E(y) = Xβ

and     (ii)  Var(y) = Iσ² ,  Rank X = r ≤ p ≤ n ,

where β and σ² are unknown parameters.

It is desired to estimate l'β, a (linearly) estimable function in the sense of Definition 5.2, using a minimum variance linear unbiased

estimator or BLUE estimator.

The most elegant proof of the existence and uniqueness of a BLUE

estimator for !'~ uses the theory of projections on finite dimensional

vector spaces. The details of this theory are available in many sources. For convenience we follow the line of development used by Scheffé [1959]. Recall that l'β is linearly estimable if and only if there exists c such that l' = c'X. Scheffé proves that there exists a unique linear unbiased estimator for l'β given by c*'y. The vector c* lies in V_r, the vector space spanned by the columns of X. In addition, if c'y is any linear unbiased estimator of l'β then c* is the projection of c on V_r along V_r^⊥.¹ Using these results Scheffé establishes the celebrated Gauss-Markoff Theorem.

Theorem 5.1: (Gauss-Markoff) Under the model (5.3) every (linearly) estimable function l'β has a unique linear unbiased estimator l'β̂ which has minimum variance in the class of all linear unbiased estimators. The form of l'β̂ is given by

¹Bose [1959] calls V_r the "estimation space" and V_r^⊥ the "error space."


l'β̂ = l'b ,

where b is any solution to the "least squares" or normal equations

(X'X)b = X'y .

From Theorem 2.5 we have b = (X'X)ᵍX'y. Hence the BLUE of l'β is l'β̂ = l'(X'X)ᵍX'y. It can also be shown that E[(y - Xb)'(y - Xb)] = E[SSE] = qσ², where q = n - Rank(X'X). Thus an unbiased estimator for σ² is σ̂² = SSE/q.
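The computation l'β̂ = l'(X'X)ᵍX'y can be illustrated with a small numerical sketch (not part of the dissertation; it uses modern NumPy, with numpy.linalg.pinv standing in for one particular choice of (X'X)ᵍ):

```python
import numpy as np

# One-way classification with two groups: X is 4 x 3 of rank 2,
# so X'X is singular and has no ordinary inverse.
X = np.array([[1, 1, 0],
              [1, 1, 0],
              [1, 0, 1],
              [1, 0, 1]], dtype=float)
y = np.array([2.0, 4.0, 1.0, 3.0])

XtX = X.T @ X
# The Moore-Penrose Pseudo-Inverse is one member of g(X'X):
b = np.linalg.pinv(XtX) @ X.T @ y      # a solution of (X'X)b = X'y

# l' = c'X is estimable (here c averages the first group); the BLUE
# l'b does not depend on which generalized inverse was chosen.
l = X.T @ np.array([0.5, 0.5, 0.0, 0.0])
print(l @ b)                           # approximately 3.0, the first group mean
```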

Other approaches to the theory of least squares are possible, and

have certain desirable qualities. The approach of Plackett [1960] will

be extended in Section 5.4. An approach due to Roy [1953] is of interest

because it is equivalent to using a Reflexive Generalized Inverse of

(X'X). Roy's approach is also important in that it often leads to more efficient programming of linear model problems on high-speed computing equipment, as is brought out by the following paragraph.

Roy's approach utilizes a basis of X and is as follows. Rewrite the model (5.3) as

E(y) = [X₁ | X₂]β ,

where X₁ is a basis of X. Note that the elements of β have been

rearranged to conform with the rearrangement of X. From the condition

for linear estimability (5.2) it follows that l'β is estimable if and only if there exists c such that

l' = c'[X₁ | X₂] .

Using this form of the estimability condition Roy shows that the BLUE estimator for l'β is given by


l'β̂ = l₁'(X₁'X₁)⁻¹X₁'y ,

and that this estimator coincides with the least squares estimator of l'β. Roy also develops criteria for testing and establishing the testability of hypotheses and shows that these results are invariant under choice of a basis of X.

It is interesting to note that a simple proof of the invariance of, say, l'β̂ under choice of a basis of X can be given using Generalized Inverses. We see that

l'β̂ = l'(X'X)ᵍX'y .

Since l'β is estimable we have [l₁' | l₂'] = [c₁' | c₂'][X₁ | X₂] for some [c₁' | c₂']. Thus

l'β̂ = [c₁' | c₂'] X(X'X)ᵍX'y ,

and since X(X'X)ᵍX' is uniquely determined by Theorem 1.6 it follows that another choice of basis for X will yield the same l'β̂.
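The uniqueness of X(X'X)ᵍX' invoked in this argument can be checked numerically. The sketch below (illustrative code, not from the original) manufactures a second generalized inverse from a first and confirms that both yield the same X(X'X)ᵍX':

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))
X[:, 3] = X[:, 0] + X[:, 1]            # force X to be rank-deficient
A = X.T @ X

G1 = np.linalg.pinv(A)                 # Moore-Penrose choice of (X'X)^g
# G2 = G1 + (I - G1 A) W is another generalized inverse for any W:
W = rng.standard_normal((4, 4))
G2 = G1 + (np.eye(4) - G1 @ A) @ W
assert np.allclose(A @ G2 @ A, A)      # G2 satisfies A G2 A = A

# X(X'X)^g X' is the same for both choices:
assert np.allclose(X @ G1 @ X.T, X @ G2 @ X.T)
```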

The above approaches to the theory of the general linear model can be generalized in that one can replace the assumption that Var(y) = Iσ² by the assumption that Var(y) = Vσ², where V is positive definite and known. If V is not known, then estimates of its elements must be substituted and iteration will need to be used to find b. The theory necessary to handle linear model theory under the assumption that Var(y) = Vσ² with V known will be discussed in Section 5.3.


5.3 Weighted Least Squares and the Generalized Gauss-Markoff Theorem

In Section 5.2, it was mentioned that the assumption of the general linear model could be somewhat relaxed in that Var(y) = Iσ² could be replaced by Var(y) = Vσ². In this section we define the weighted least squares estimator of a linearly estimable function and show the equivalence of the weighted least squares estimator and the BLUE. The results extend Plackett's [1960] results and utilize the theory of Generalized Inverses. The proofs are simple, purely algebraic, and reduce to the case considered in Section 5.2 when V = Iσ².

Definition 5.3 If a set of linear parametric functions Lβ is such that there exists C satisfying E[Cy] = Lβ for all β, then Lβ is said to be a (linearly) estimable set of parametric functions.

Theorem 5.2: If Lβ is a set of linearly estimable functions then Cy, the best (minimum variance) linear unbiased estimator for Lβ, is given when

C = L(X'V⁻¹X)ᵍX'V⁻¹ .

Further the value of Var(Cy) is L(X'V⁻¹X)ᵍL'.

Before proving Theorem 5.2 we need to establish the following

Lemma.

Lemma 5.1: The identity (X'V⁻¹X)(X'V⁻¹X)ᵍX'V⁻¹ = X'V⁻¹ holds, and the quantity V^(-½)X(X'V⁻¹X)ᵍX'V^(-½)' is uniquely determined, symmetric and idempotent.

Proof: Let Y = V^(-½)X where V^(½)'V^(½) = V. Then V^(-½)X(X'V⁻¹X)ᵍX'V^(-½)' = Y(Y'Y)ᵍY' is uniquely determined, symmetric


and idempotent by Theorem 2.6. Also by Theorem 2.6 it follows that

(Y'Y)(Y'Y)ᵍY' = Y' ,

which is the identity of the Lemma.

Proof of Theorem 5.2. If Cy is to be unbiased for Lβ then E[Cy] = CXβ = Lβ for all β. Hence CX = L. We must show that the diagonal elements of Var(Cy) = CVC', subject to CX = L, are minimized when C = L(X'V⁻¹X)ᵍX'V⁻¹. We see that

[C - L(X'V⁻¹X)ᵍX'V⁻¹] V [C - L(X'V⁻¹X)ᵍX'V⁻¹]'

= CVC' - CVV⁻¹X(X'V⁻¹X)ᵍ'L' - L(X'V⁻¹X)ᵍX'V⁻¹VC' + L(X'V⁻¹X)ᵍX'V⁻¹VV⁻¹X(X'V⁻¹X)ᵍ'L'

= CVC' - CX(X'V⁻¹X)ᵍ'X'C' - CX(X'V⁻¹X)ᵍX'C' + CX(X'V⁻¹X)ᵍX'V⁻¹X(X'V⁻¹X)ᵍ'X'C'

= CVC' - CX(X'V⁻¹X)ᵍX'C'    by Lemma 4.1.

Hence

Var(Lβ̂) = Var(Cy) = CVC'
= CX(X'V⁻¹X)ᵍX'C' + [C - L(X'V⁻¹X)ᵍX'V⁻¹]V[C - L(X'V⁻¹X)ᵍX'V⁻¹]'
= L(X'V⁻¹X)ᵍL' + [C - L(X'V⁻¹X)ᵍX'V⁻¹]V[C - L(X'V⁻¹X)ᵍX'V⁻¹]' .

Thus each diagonal element of Var(Lβ̂) is minimized when

C - L(X'V⁻¹X)ᵍX'V⁻¹ = 0 ,

or when C = L(X'V⁻¹X)ᵍX'V⁻¹. With this choice of C we have Var(Lβ̂) = L(X'V⁻¹X)ᵍL'.
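Theorem 5.2 can be sketched numerically as follows (randomly generated stand-ins for X, V and y; numpy.linalg.pinv plays the role of (X'V⁻¹X)ᵍ):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 8, 3
X = rng.standard_normal((n, p))
B = rng.standard_normal((n, n))
V = B @ B.T + n * np.eye(n)            # a known positive definite V
y = rng.standard_normal(n)
L = np.eye(p)                          # X is full rank here, so L beta is estimable

Vi = np.linalg.inv(V)
G = np.linalg.pinv(X.T @ Vi @ X)       # a choice of (X'V^-1 X)^g
C = L @ G @ X.T @ Vi                   # C = L(X'V^-1 X)^g X'V^-1

blue = C @ y                           # BLUE of L beta
cov = L @ G @ L.T                      # Var(BLUE)/sigma^2 = L(X'V^-1 X)^g L'
assert np.allclose(C @ X, L)           # the unbiasedness constraint CX = L
```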


Having found the linear minimum variance unbiased estimator for Lβ we now show the equivalence of this estimator and the weighted least squares estimator.

Definition 5.4 The weighted least squares estimator for a linearly estimable set of parametric functions Lβ is given by Lβ̂ where β̂ is any solution to the equations

(X'V⁻¹X)b = X'V⁻¹y .

Theorem 5.3: The vector β̂ used in determining Lβ̂ minimizes the weighted error sum of squares

WSSE = (y - Xb)'V⁻¹(y - Xb) .

Proof: We first note that

WSSE = (y - Xb)'V⁻¹(y - Xb) = e'V⁻¹e ,

where

e = [y - X(X'V⁻¹X)ᵍX'V⁻¹y] - [Xb - X(X'V⁻¹X)ᵍX'V⁻¹y] .

Hence

WSSE = [y - X(X'V⁻¹X)ᵍX'V⁻¹y]'V⁻¹[y - X(X'V⁻¹X)ᵍX'V⁻¹y]
     - [y - X(X'V⁻¹X)ᵍX'V⁻¹y]'V⁻¹[Xb - X(X'V⁻¹X)ᵍX'V⁻¹y]
     - [Xb - X(X'V⁻¹X)ᵍX'V⁻¹y]'V⁻¹[y - X(X'V⁻¹X)ᵍX'V⁻¹y]
     + [Xb - X(X'V⁻¹X)ᵍX'V⁻¹y]'V⁻¹[Xb - X(X'V⁻¹X)ᵍX'V⁻¹y] .

By Lemma 4.1 we have

[Xb - X(X'V⁻¹X)ᵍX'V⁻¹y]'V⁻¹[y - X(X'V⁻¹X)ᵍX'V⁻¹y]
  = [b - (X'V⁻¹X)ᵍX'V⁻¹y]'[X'V⁻¹y - (X'V⁻¹X)(X'V⁻¹X)ᵍX'V⁻¹y] = 0 .


Similarly

[y - X(X'V⁻¹X)ᵍX'V⁻¹y]'V⁻¹[Xb - X(X'V⁻¹X)ᵍX'V⁻¹y] = 0 .

Hence

WSSE = y'[I - X(X'V⁻¹X)ᵍX'V⁻¹]'V⁻¹[I - X(X'V⁻¹X)ᵍX'V⁻¹]y
     + [Xb - X(X'V⁻¹X)ᵍX'V⁻¹y]'V⁻¹[Xb - X(X'V⁻¹X)ᵍX'V⁻¹y]

is minimized if b is chosen so that Xb = X(X'V⁻¹X)ᵍX'V⁻¹y. Obviously b = (X'V⁻¹X)ᵍX'V⁻¹y satisfies this equation. Further, if b satisfies the above equation then b also satisfies

(X'V⁻¹X)b = (X'V⁻¹X)(X'V⁻¹X)ᵍX'V⁻¹y = X'V⁻¹y ;

that is, b = β̂ = (X'V⁻¹X)ᵍX'V⁻¹y is a solution to the normal or least squares equations.

5.4 Linear Models with Restricted Parameters

Occasionally linear models arise in which the parameters β are restricted in a linear fashion. That is, it is known that

(5.4)    R'β = r

for some matrix R and vector r. In such a restricted model one feels that it is only reasonable that a set of estimators β̂ should satisfy (5.4).

We are thus led to consider a restricted linear model of the form

(5.5)    (i)  E(y) = Xβ ;  Rank X = r ≤ p < n ,

        (ii)  Var(y) = Vσ² ;  V positive definite,

and    (iii)  R'β = r .

Under the model (5.5) C. R. Rao [1952] established the following

theorem.


Theorem 5.4: A linear function l'β is estimable under (5.5) if and only if there exist c and d such that

c'X + d'R' = l'    and    d'r = b₀

hold. If l'β is estimable then its BLUE is given by c'y + b₀ where c' = λ'X'V⁻¹ and b₀ = d'r. The quantities λ and d are any solution to the equations

[X'V⁻¹X  R] [λ]   [l]
[R'      0] [d] = [0] .

Proof: If c and d are such that c'X + d'R' = l' and d'r = b₀ then l'β is estimable since

E(c'y + b₀) = c'Xβ + b₀ = l'β - d'R'β + b₀ = l'β - d'r + d'r = l'β .

If l'β is estimable and R'β = r then there exist c and b₀ such that E(c'y + b₀) = l'β. Hence

c'Xβ + b₀ = l'β .

Thus

l' - c'X = d'R'

for some d, and b₀ = d'r.

To find the BLUE of l'β one simply lets l'β̂ = c'y + b₀ and determines c and b₀ such that Var(l'β̂) is a minimum. Since Var(l'β̂) = c'Vc we are led to form the Lagrangian function

F(c, λ₀, λ) = c'Vc + 2λ₀(b₀ - d'r) - 2λ'(X'c + Rd - l) .

Taking partials with respect to c, λ₀, b₀ and d yields the equations


∂F/∂c  = 2c'V - 2λ'X' ,

∂F/∂λ₀ = 2(b₀ - d'r) ,

∂F/∂λ  = -2(c'X + d'R' - l') ,

∂F/∂b₀ = 2λ₀ ,

∂F/∂d  = -2λ₀r' - 2λ'R .

Equating the partial derivatives to zero and simplifying yields the equations

c' = λ'X'V⁻¹ ,    c'X + d'R' = l' ,    b₀ = d'r ,    λ₀ = 0 ,    λ'R = 0' .

Hence c' = λ'X'V⁻¹ and b₀ = d'r, where λ and d are solutions to the equations

[X'V⁻¹X  R] [λ]   [l]
[R'      0] [d] = [0] .

Since the use of Lagrangian multipliers only yields necessary conditions

for extrema, one still needs to verify that the above solutions do in fact yield a minimum. Rao has shown that these solutions do in fact minimize Var(l'β̂) and that l'β̂ is uniquely determined.
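The bordered equations of Theorem 5.4 can be solved directly. The sketch below assumes the arrangement of the system as reconstructed above and takes V = I for simplicity; all data are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 10, 3
X = rng.standard_normal((n, p))
V = np.eye(n)
y = rng.standard_normal(n)

R = np.array([[1.0], [1.0], [1.0]])    # one restriction on beta
r = np.array([2.0])
l = np.array([1.0, 0.0, 0.0])          # estimate the first parameter

A = X.T @ np.linalg.inv(V) @ X
# Bordered system  [A  R][lam]   [l]
#                  [R' 0][ d ] = [0]
M = np.block([[A, R], [R.T, np.zeros((1, 1))]])
sol = np.linalg.solve(M, np.concatenate([l, np.zeros(1)]))
lam, d = sol[:p], sol[p:]

c = np.linalg.inv(V) @ X @ lam         # c' = lam' X' V^-1, transposed
b0 = d @ r
blue = c @ y + b0                      # BLUE of l'beta under R'beta = r
assert np.allclose(X.T @ c + R @ d, l) # estimability: c'X + d'R' = l'
```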

Dwyer [1959] has obtained the natural generalization of Rao's

formula to the case of a set of linear functions. He has shown that


Lβ is a set of estimable functions in the model (5.5) if and only if there exist C and D such that

C'X + D'R' = L    and    D'r = b₀ .

If Lβ is an estimable set of linear functions then its BLUE is given by Lβ̂ = C'y + b₀ where

C' = Λ'X'V⁻¹    and    b₀ = D'r .

The quantities Λ and D are given as any solution to the equations

[X'V⁻¹X  R] [Λ]   [L']
[R'      0] [D] = [0 ] .

Dwyer also gives an explicit expression for the BLUE in the case

that X is of full rank and R is of full rank. He mentions that Rao

did not obtain such an explicit expression for the BLUE of a single

estimable function. This is not surprising since Rao assumes that X

is not necessarily of full rank. Rao assumes, however, that R is of

full rank.

Using the theory of Generalized Inverses of bordered matrices we can give an explicit expression for Lβ̂ which includes both Dwyer's

and Rao's results as special cases. Zelen and Goldman [1963] have

achieved such results but their formulae seem unnecessarily complicated

and cumbersome.


From Section 4.5 a Generalized Inverse of

[X'V⁻¹X  R]
[R'      0]

is given by the partitioned expression obtained there, where A = X'V⁻¹X, Q = R'R - R'K⁻¹A'R and K = A'A + RR'. Hence Λ and D can be written down explicitly. Thus

b₀ = L(I - K⁻¹A')RQᵍr .

By allowing the assumptions on X and R to vary one can obtain as special cases of the above formulae the results of Dwyer and an explicit expression for Rao's situation. Note that since a Generalized Inverse of 0 is 0 there is no problem in treating the case R = 0 as in Dwyer's method.

The variance-covariance matrix of the BLUE for Lβ can easily be obtained and is simply

V(Lβ̂) = C'VC .

The expressions given by Dwyer for various special cases can be obtained by simple substitution into the above formula.


5.5 Linear Models with a Singular Variance-Covariance Matrix

In this section we generalize the results of Section 5.3 to the case where V is singular. This situation can occur when one of the y's is a linear combination of some of the other y's or when one of the y's is a constant. Zelen and Goldman [1963] have presented some results in this connection but their results seem too restrictive. The treatment presented here is straightforward and utilizes the results of Section 5.4 as well as the theory of Generalized Inverses.

The linear model under consideration is now

  (i)  E(y) = Xβ ,

and (ii)  Var(y) = V ,  V positive semi-definite of rank q ≤ n .

Without loss of generality we may assume that V has been written in the form

V = [V₁₁  V₁₂]
    [V₂₁  V₂₂] ,

where [V₁₁ | V₁₂] is a basis for V. It then follows that V₁₁ is non-singular and that V₂₂ = V₂₁V₁₁⁻¹V₁₂. If we let

P' = [    I       0]
     [-V₂₁V₁₁⁻¹   I] ,

then simple matrix multiplications show that

P'VP = [V₁₁  0]
       [ 0   0] .


It follows that the transformation z = P'y yields a linear model for z = [z₁' | z₂']', where z₁ corresponds to V₁₁ and X is partitioned conformably into X₁ and X₂, given by

  (i)  E(z₁) = X₁β ,  E(z₂) = (X₂ - V₂₁V₁₁⁻¹X₁)β ,

 (ii)  Var(z) = [V₁₁  0]
                [ 0   0] .

Hence z₂ = (X₂ - V₂₁V₁₁⁻¹X₁)β with probability one.
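The reduction can be checked numerically; the sketch below builds a rank-2 positive semi-definite V of order 4 and verifies that the transformation sweeps the singular part out of the variance matrix (illustrative code, not from the dissertation):

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.standard_normal((4, 2))
V = B @ B.T                            # positive semi-definite, rank 2
V11, V21 = V[:2, :2], V[2:, :2]        # V11 is non-singular here

# P' = [[I, 0], [-V21 V11^-1, I]]:
T = -V21 @ np.linalg.inv(V11)
P_t = np.block([[np.eye(2), np.zeros((2, 2))],
                [T, np.eye(2)]])

W = P_t @ V @ P_t.T                    # Var(P'y): block diag(V11, 0)
assert np.allclose(W[:2, :2], V11)
assert np.allclose(W[2:, :], 0.0)      # the z2 components have zero variance
```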

It is clear that the model has been reduced by the transformation P' to a restricted model of the form considered in Section 5.4. The model is thus

  (i)  E(z₁) = X₁β ,

 (ii)  Var(z₁) = V₁₁ ,

and (iii)  R'β = (X₂ - V₂₁V₁₁⁻¹X₁)β = z₂ .

From the results presented in Section 5.4 it follows that

Lβ̂ = C'z₁ + b₀ ,

where

C' = L[I - K⁻¹A'RQᵍR']K⁻¹X₁'V₁₁⁻¹ ,

b₀ = L(I - K⁻¹A')RQᵍz₂ ,

and

A  = X₁'V₁₁⁻¹X₁ ,
K  = A'A + RR' ,
Q  = R'R - R'K⁻¹A'R ,
R' = X₂ - V₂₁V₁₁⁻¹X₁ .


The formula presented in Section 5.4 for the variance-covariance matrix can be used to obtain the variance-covariance matrix of the BLUE of Lβ.

5.6 Multiple Analysis of Covariance

The use of the theory of Generalized Inverses allows an uncluttered treatment of multiple analysis of covariance. In particular the results of Section 4.2 can be used to show that the estimators for the regression coefficients of the covariables are unbiased.

Suppose that we have the model

y = [X₁ | X₂] [β₁] + ε
              [β₂]

where

y  is an n × 1 vector,
ε  is an n × 1 vector,
X₁ is an n × p matrix of rank r ≤ p < n,
X₂ is an n × q matrix of covariables of rank q < n,
β₁ is a p × 1 vector of parameters,

and β₂ is a q × 1 vector of regression coefficients for the covariables. We make the usual assumptions that E(ε) = 0 and Var(ε) = Iσ² as well as the assumption that [X₁ | X₂] has rank (r + q) ≤ n.

The normal equations for estimating β₁ and β₂ are, in partitioned form,

[X₁'X₁  X₁'X₂] [b₁]   [X₁'y]
[X₂'X₁  X₂'X₂] [b₂] = [X₂'y] ,

where b₁ and b₂ are the estimates of β₁ and β₂ respectively. From Theorem 2.6, using the partitioned Generalized Inverse (6.1) with Q = X₂'X₂ - X₂'X₁(X₁'X₁)ᵍX₁'X₂, we have

b₂ = Qᵍ X₂'[I - X₁(X₁'X₁)ᵍX₁']y .

From Lemma 4.1 it follows that Q is non-singular. Hence

b₂ = Q⁻¹X₂'[I - X₁(X₁'X₁)ᵍX₁']y .

It follows that β̂₂ is an unbiased estimator for β₂ with variance-covariance matrix given by

V(β̂₂) = Q⁻¹[X₂' - X₂'X₁(X₁'X₁)ᵍX₁'][X₂ - X₁(X₁'X₁)ᵍX₁'X₂]Q⁻¹ σ²
       = Q⁻¹[X₂'X₂ - X₂'X₁(X₁'X₁)ᵍX₁'X₂]Q⁻¹ σ²
       = Q⁻¹QQ⁻¹ σ²
       = Q⁻¹ σ² .
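The estimator of the covariable coefficients can be sketched numerically (noiseless data, so b₂ recovers β₂ exactly; pinv supplies the choice of (X₁'X₁)ᵍ):

```python
import numpy as np

rng = np.random.default_rng(4)
n, q = 12, 1
X1 = np.column_stack([np.ones(n), rng.integers(0, 2, n).astype(float)])
X2 = rng.standard_normal((n, q))       # a single covariable
beta1 = np.array([1.0, 2.0])
beta2 = np.array([0.5])
y = X1 @ beta1 + X2 @ beta2            # noiseless response

G1 = np.linalg.pinv(X1.T @ X1)         # a generalized inverse of X1'X1
P1 = X1 @ G1 @ X1.T                    # projection on the columns of X1
Q = X2.T @ X2 - X2.T @ P1 @ X2         # as in (6.1); non-singular here
b2 = np.linalg.solve(Q, X2.T @ (y - P1 @ y))
assert np.allclose(b2, beta2)          # recovers beta2 exactly (no noise)
```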


5.7 Use of the Abbreviated Doolittle in Linear Estimation Problems

In a recent paper Rohde and Harvey [1964] have generalized a method of computing certain matrix products originally due to Aitken [1937]. The Abbreviated Doolittle procedure described in Chapter 4 can be used to give a compact formulation of Aitken's method, which computes matrix products of the form CA⁻¹B. Of more importance in statistical applications is the computation of the matrix products which arise in the theory of linear estimation.

To be specific let us indicate how one can use the Doolittle procedure to compute the matrix products L(X'V⁻¹X)ᵍX'V⁻¹y and L(X'V⁻¹X)ᵍL', which arise in the theory developed in Section 5.3. Augment the X'V⁻¹X matrix and X'V⁻¹y column of the Doolittle by L' = X'C to obtain

[X'V⁻¹X | X'V⁻¹y | X'C = L'] .

If one carries through the forward solution of the Doolittle on the augmented matrix it follows from Section 4.5 that the matrix composed of the "A" rows is

[(Z'Z)B | Z'y | Z'C] .

Similarly the matrix composed of the "B" rows can be written down. We see immediately that

C'Z(Z'Z)ᵍZ'y = C'XB⁻¹(B'⁻¹X'V⁻¹XB⁻¹)ᵍB'⁻¹X'V⁻¹y
             = C'X(X'V⁻¹X)ᵍX'V⁻¹y
             = L(X'V⁻¹X)ᵍX'V⁻¹y .


Similarly

C'Z(Z'Z)ᵍZ'C = L(X'V⁻¹X)ᵍL' .

Thus the computation of the BLUE and variance-covariance matrix of an estimable function can be obtained from the forward solution of the properly augmented Doolittle. Note that it is the fact that (X'V⁻¹X)ᵍ can be any Generalized Inverse of (X'V⁻¹X) which allows the above procedure to function properly. By changing the (X'V⁻¹X) and L matrices properly one can use the above method to find the BLUE and variance-covariance matrix of the estimable functions considered in Sections 5.4, 5.5 and 5.6.

5.8 Tests of Linear Hypotheses

Associated with the linear model (5.1) one can consider the testing of various (linear) hypotheses about β. Such hypotheses are often expressed in the form

H₀: Lβ = 0    vs.    H_A: Lβ ≠ 0 .

We confine ourselves to strongly testable hypotheses (hypotheses where L is of full rank and Lβ is a set of (linearly) estimable functions).

The statistic used to test the hypothesis H₀ will be composed of two parts. The numerator of the statistic consists of the sum of squares due to the hypothesis, SSH₀, divided by its degrees of freedom, Rank L, and the denominator of the statistic consists of the sum of squares due to error divided by its degrees of freedom, [n - Rank(X'X)]. The results on obtaining such quantities are widely known and available so we


shall be content to present the formulas as follows:

SSH₀ = [L(X'V⁻¹X)ᵍX'V⁻¹y]'[L(X'V⁻¹X)ᵍL']⁻¹[L(X'V⁻¹X)ᵍX'V⁻¹y] = y'B₀y

and

SSE = [y - X(X'V⁻¹X)ᵍX'V⁻¹y]'V⁻¹[y - X(X'V⁻¹X)ᵍX'V⁻¹y] = y'By .

We note in passing that both B₀ and B are idempotent and symmetric and that B₀VB = 0. Intuitively SSH₀ is a measure of that part of the total sum of squares which can be explained by the null hypothesis H₀ while SSE is a measure of the part of the total sum of squares which cannot be explained by the model. The usual F statistic defined by

(5.6)    F = [SSH₀ / Rank L] / [SSE / (n - Rank X'X)]


is thus seen to be a reasonable criterion to judge the import of the hypothesis. The divisors, [Rank L] and [n - Rank X'X], simply put the numerator and denominator on a unit basis. This intuitive interpretation of the F statistic can be greatly strengthened when certain additional distributional assumptions are included as part of the model. More precisely, if ε is assumed to have a multinormal distribution with E(ε) = 0 and Var(ε) = V then the F statistic has, under the null hypothesis, Lβ = 0, a central F distribution and allows one to make a judgment about the hypothesis via significance tests. When the null hypothesis is false, say Lβ = a ≠ 0, then the distribution of F is a non-central F distribution with [Rank L] and [n - Rank X'X] degrees of freedom.


The above distributional results follow directly from the following

two Lemmas due to Graybill [1961].

Lemma 5.2: If y has a multinormal distribution with E(y) = μ and Var(y) = V then y'By has a non-central chi-square distribution with [Rank B] degrees of freedom and non-centrality parameter λ given by

λ = ½ μ'Bμ .

Lemma 5.3: If y has a multinormal distribution with E(y) = μ and Var(y) = V then y'Ay and y'By are independent if and only if AVB = 0.

From the above Lemmas it follows from previous remarks that:

  (i)  SSH₀ has a non-central chi-square distribution with [Rank L] degrees of freedom and non-centrality parameter λ = ½ (Lβ)'[L(X'V⁻¹X)ᵍL']⁻¹(Lβ),

 (ii)  SSE has a central chi-square distribution with [n - Rank(X'V⁻¹X)] degrees of freedom,

(iii)  SSH₀ and SSE are independent,

and (iv)  F, as defined by (5.6), has a non-central F distribution with [Rank L] and [n - Rank(X'V⁻¹X)] degrees of freedom.
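The F statistic of this section can be assembled directly from the quantities above; the sketch below takes V = I and simulated data (all numbers illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 20, 3
X = rng.standard_normal((n, p))
V = np.eye(n)
beta = np.array([0.0, 1.0, -1.0])
y = X @ beta + rng.standard_normal(n)

L = np.array([[1.0, 0.0, 0.0]])        # H0: beta_1 = 0 (true here)
Vi = np.linalg.inv(V)
G = np.linalg.pinv(X.T @ Vi @ X)

Lb = L @ G @ X.T @ Vi @ y              # BLUE of L beta
ssh0 = Lb @ np.linalg.inv(L @ G @ L.T) @ Lb
resid = y - X @ G @ X.T @ Vi @ y
sse = resid @ Vi @ resid
df1 = np.linalg.matrix_rank(L)
df2 = n - np.linalg.matrix_rank(X.T @ Vi @ X)
F = (ssh0 / df1) / (sse / df2)         # refer F to the F(df1, df2) law
```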

It is to be pointed out that the seemingly more general hypothesis

H₀: Lβ = a    vs.    H_A: Lβ ≠ a

can be tested by the above method simply by replacing the model by

E(y) = [X | 0][β' | a']'

and testing the homogeneous hypothesis

[L | -I][β' | a']' = 0 .

The resulting SSH₀ is found to be

[L(X'V⁻¹X)ᵍX'V⁻¹y - a]'[L(X'V⁻¹X)ᵍL']⁻¹[L(X'V⁻¹X)ᵍX'V⁻¹y - a] .

This simple device was pointed out by Dwyer [1959] in another context.


CHAPTER 6. SUMMARY AND SUGGESTIONS FOR FUTURE RESEARCH

6 .1 Summary

In this dissertation four types of Generalized Inverses of matrices were discussed. The sets defined by

g(A) = {Aᵍ : AAᵍA = A} ,

r(A) = {Aʳ : AAʳA = A and AʳAAʳ = Aʳ} ,

n(A) = {Aⁿ : AAⁿA = A, AⁿAAⁿ = Aⁿ and (AAⁿ)' = AAⁿ} ,

t(A) = {Aᵗ : AAᵗA = A, AᵗAAᵗ = Aᵗ, (AAᵗ)' = AAᵗ and (AᵗA)' = AᵗA}

define the sets consisting of the Generalized, Reflexive Generalized, Normalized Generalized and Pseudo-Inverses respectively. It can easily be shown that g(A) ⊇ r(A) ⊇ n(A) ⊇ t(A) with equality holding if and only if A is non-singular. In Chapter 2 the work of various authors

only if A is non-singular. In Chapter 2 the work of various authors

regarding the sets i(A) for i = g, r, n and t was reviewed. In

particular the fundamental results of Bose [1959] on g(A) and Penrose

[1955] on t(A) were emphasized, and extensions to Hilbert space and

other algebraic systems were discussed.
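The nesting g(A) ⊇ r(A) ⊇ n(A) ⊇ t(A) can be probed numerically by checking the four defining conditions one at a time; penrose_flags below is a hypothetical helper written for this sketch:

```python
import numpy as np

def penrose_flags(A, G):
    """Report which of the four defining conditions G satisfies for A."""
    return (bool(np.allclose(A @ G @ A, A)),      # membership in g(A)
            bool(np.allclose(G @ A @ G, G)),      # + reflexivity: r(A)
            bool(np.allclose((A @ G).T, A @ G)),  # + AG symmetric: n(A)
            bool(np.allclose((G @ A).T, G @ A)))  # + GA symmetric: t(A)

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])             # singular, rank 1
At = np.linalg.pinv(A)                 # the Pseudo-Inverse: all four hold
print(penrose_flags(A, At))            # (True, True, True, True)

# Perturbing At by a term annihilated by A gives a member of g(A)
# that is no longer reflexive:
Ag = At + 0.3 * (np.eye(2) - At @ A)
print(penrose_flags(A, Ag)[0], penrose_flags(A, Ag)[1])   # True False
```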

In Chapter 3 the various types of Generalized Inverses were studied from a theoretical viewpoint. The principle underlying this investigation was that an intuitively appealing method of studying the sets g(A), r(A), n(A) and t(A) is to relate a property of the matrix A to the corresponding property of a typical element of the set i(A), i = g, r, n and t. Properties investigated from this viewpoint were rank, symmetry, eigenvalues and eigenvectors. In particular it was shown that the rank of any matrix Aʳ in r(A) is equal to the rank of A,


and that if A is symmetric then the non-zero eigenvalues of any matrix Aⁿ in n(A) are reciprocals of the non-zero eigenvalues of A. It was also shown that if A is symmetric then a symmetric Generalized Inverse always exists and that the eigenvectors of the Pseudo-Inverse of A coincide with the eigenvectors of A if A is symmetric. Using results on the properties of the sets g(A), r(A), n(A) and t(A), certain relations between the sets were established. It was shown that Aᵍ ∈ r(A) if and only if Rank(Aᵍ) = Rank(A), and that Aᵍ ∈ n(A) if and only if Aᵍ = (A'A)ᵍA' where (A'A)ᵍ ∈ g(A). This second result is a slight extension of one due to Zelen and Goldman [1963]. Relations were also obtained for t(A) by restricting A to be normal or symmetric.

In Chapter 4 computational methods were reviewed and the results of Chapters 2 and 3 used to develop some useful modifications. Of particular importance in applications is the following expression obtained for a Generalized Inverse of a partitioned matrix. If X'X is partitioned as

X'X = [X₁'X₁  X₁'X₂]
      [X₂'X₁  X₂'X₂] ,

then a Generalized Inverse of X'X is given by

(6.1)  (X'X)ᵍ = [(X₁'X₁)ᵍ + (X₁'X₁)ᵍX₁'X₂QᵍX₂'X₁(X₁'X₁)ᵍ    -(X₁'X₁)ᵍ(X₁'X₂)Qᵍ]
                [-QᵍX₂'X₁(X₁'X₁)ᵍ                             Qᵍ              ] ,

where Q = X₂'X₂ - X₂'X₁(X₁'X₁)ᵍX₁'X₂. It was also shown by suitably choosing (X₁'X₁)ᵍ and Qᵍ that the above expression can be used to find a Reflexive Generalized Inverse of X'X. It was also shown that clever partitioning of X'X and suitable choice of (X₁'X₁)ᵍ permit the computation of Normalized Generalized and Pseudo-Inverses. Application of (6.1) to Bordered Matrices was discussed briefly and alternative but equivalent expressions for previously known results were developed. The theory of Generalized Inverses was also used to develop the theory of the Abbreviated Doolittle technique when the coefficient matrix is singular.
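Formula (6.1) can be exercised numerically: the sketch below assembles the partitioned generalized inverse for a rank-deficient X'X and verifies the defining condition (pinv supplies the choices of (X₁'X₁)ᵍ and Qᵍ):

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.standard_normal((8, 4))
X[:, 1] = X[:, 0]                      # make X'X singular
X1, X2 = X[:, :2], X[:, 2:]

G11 = np.linalg.pinv(X1.T @ X1)        # a g-inverse of X1'X1
Q = X2.T @ X2 - X2.T @ X1 @ G11 @ X1.T @ X2
Gq = np.linalg.pinv(Q)

# Assemble (6.1) blockwise:
G = np.block([
    [G11 + G11 @ X1.T @ X2 @ Gq @ X2.T @ X1 @ G11, -G11 @ X1.T @ X2 @ Gq],
    [-Gq @ X2.T @ X1 @ G11,                         Gq],
])

XtX = X.T @ X
assert np.allclose(XtX @ G @ XtX, XtX) # G is a generalized inverse of X'X
```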

Application of Generalized Inverses to least squares theory formed the content of Chapter 5. In particular, different formulations of certain well known results were obtained. The use of partitioned matrices and (6.1) allowed a simple treatment of an estimation problem present in multiple covariance. Also discussed were the Gauss-Markoff Theorem, Weighted Least Squares, Linear Models with restrictions, Linear Models with a Singular Variance-Covariance Matrix and use of the Abbreviated Doolittle Technique.

6.2 Suggestions for Future Research

There are many possibilities for future research in the area of Generalized Inverses. First of all it is clear that nearly all results hold when the matrices are allowed to have complex elements, provided one interprets the transpose operation as the conjugate transpose operation and replaces symmetric by Hermitian. Of interest would be careful delineation of the results which do hold for complex matrices and those which do not. The relation of a property of a matrix A and the corresponding property of a typical element of i(A), i = g, r, n and t,


can clearly be investigated for properties other than the obvious ones studied here. One by-product of such research might be different and possibly more natural ways of characterizing the sets g(A), r(A), n(A) and t(A). Many minor though not uninteresting problems are also present. For example:

(1) If A is not symmetric but Aⁿ is, does this imply that Aⁿ possesses Property R₁ (the non-zero eigenvalues of Aᵍ are reciprocals of the non-zero eigenvalues of A and conversely) or Property R₂ (the eigenvectors of Aᵍ corresponding to non-zero eigenvalues are identical to the eigenvectors of A with the associated eigenvalues being reciprocals)?

(2) Does Aⁿ continue to possess Property R₁ if symmetric is replaced by normal in Theorem 3.10?

(3) What happens to the multiplicities of the eigenvalues (including the zero eigenvalue) when a Generalized Inverse possesses Property R₁?

(4) For what class of matrices does the Pseudo-Inverse necessarily possess Properties R₁ or R₂?

An extensive list of such problems could easily be drawn up.

A topic of considerable interest which has not been touched upon in this work is the topic of convergence of a sequence of matrices. As is well known (see John [1956] or Marcus [1960]) there are many ways to define a matrix norm on the set of all square matrices. Among the most common and most natural are

‖A‖ = l.u.b. [x'A'Ax / x'x]^½ = max_{λ∈Λ} λ^½


where

Λ = {λ : A'Ax = λx for some x ≠ 0} ,

and

‖A‖ = [Trace(A'A)]^½ = [ Σ_{λ∈Λ} λ ]^½ .

The matrix norms defined above satisfy all the requirements of norms discussed in Section 2.6. A convergence problem of interest can be simply stated as: given Aₙ → A in norm as n → ∞, under what conditions does Aₙᵍ → Aᵍ in (the same) norm? From the eigenvalue and eigenvector properties discussed in Section 3.5 it would appear that for a Normalized Generalized Inverse and the Pseudo-Inverse one might find a simple solution to the convergence problem.
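The convergence question is delicate precisely because rank can drop in the limit. Below is a sketch of the two norms above together with the standard example in which Aₙ → A in norm while Aₙᵗ fails to converge to Aᵗ (illustrative code):

```python
import numpy as np

def spectral_norm(A):
    # ||A|| = l.u.b. [x'A'Ax / x'x]^(1/2), the largest singular value
    return np.linalg.norm(A, 2)

def frobenius_norm(A):
    # ||A|| = [Trace(A'A)]^(1/2)
    return np.sqrt(np.trace(A.T @ A))

A = np.diag([1.0, 0.0])                # rank 1
for eps in [1e-2, 1e-4, 1e-6]:
    An = np.diag([1.0, eps])           # An -> A in either norm as eps -> 0
    gap = spectral_norm(np.linalg.pinv(An) - np.linalg.pinv(A))
    print(eps, gap)                    # gap = 1/eps, growing without bound
```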

There are also numerous computational aspects of Generalized Inverses which can and should be investigated. For matrices which occur frequently in certain investigations (such as least squares analyses) it would be useful to have available a compilation of the various types of Generalized Inverses. Variants of formula (4.2) might prove useful in such computations. Little work has been done in the computation of Generalized Inverses in Hilbert Spaces. Recent work by Parzen [1959] indicates that it might be profitable to investigate the computation of Generalized Inverses in Hilbert spaces using the notions of Reproducing Kernel Hilbert Spaces.

The results of Section 5.5 should be investigated to see if they are basis-invariant. Some recent work by Kalman [1963] and C. R. Rao [1962] on the application of Generalized Inverses to singular multinormal distributions indicates that it should be possible to develop


the theories of Regression and Analysis of Variance on an underlying

singular multinormal distribution.

Aside from least squares it appears that Generalized Inverses might

be applicable in quasi-least-squares situations which include the various minimum chi-square estimators and, in some cases, maximum likelihood

estimators. Let $x = (x_1, x_2, \ldots, x_s)$ be a vector-valued random variable with a probability distribution given by $P_\theta$, where $\theta = (\theta_1, \theta_2, \ldots, \theta_r)$ is a vector of (real) parameters. The mean vector and the variance-covariance matrix of $x$ will be denoted by $\mu(\theta)$ and $V(\theta)$ respectively. Let $x_1, x_2, \ldots, x_n$ represent independent observations on $x$ and define $z_n = (\sum_{i=1}^{n} x_i)/n$. It is well known (see Ferguson [1959]) that the common minimum chi-square estimators for $\theta$ can be found by minimizing the quadratic form

$$\Gamma = [\,g(z_n) - g(\mu(\theta))\,]'\, M(\theta, z_n)\, [\,g(z_n) - g(\mu(\theta))\,],$$

where

$$g(w_1, w_2, \ldots, w_s) = \begin{bmatrix} g_1(w_1, w_2, \ldots, w_s) \\ \vdots \\ g_s(w_1, w_2, \ldots, w_s) \end{bmatrix}, \qquad G(w) = \left[ \frac{\partial g_i}{\partial w_j} \right],$$

for suitable choices of $g$ and $M(\theta, z_n)$. The usual method of choosing

$\theta$ to minimize $\Gamma$ consists of expanding $\Gamma$ in a Taylor series, ignoring terms higher than the second, and minimizing the resulting quadratic expression. Such a procedure yields linear equations with a coefficient matrix depending on the unknown $\theta$. Iteration is usually


applied to solve such a system. Often the resulting equations are

singular and the usual procedures call for some sort of reparametrization in order to solve the system. It appears to the author that one

of the types of Generalized Inverses might prove useful to avoid

reparametrization.
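That suggestion can be sketched in NumPy (a hypothetical illustration with made-up data, not the author's procedure): when the coefficient matrix $X'X$ of a linearized step is singular, the Pseudo-Inverse delivers a (minimum-norm) solution of the normal equations directly, with no reparametrization of $\theta$.

```python
# Sketch: solving singular normal equations via the Pseudo-Inverse
# instead of reparametrizing the model.
import numpy as np

# Overparametrized design: column 0 equals column 1 + column 2,
# so X'X is singular and the normal equations have no unique solution.
X = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [1.0, 0.0, 1.0]])
y = np.array([1.0, 1.2, 2.0, 2.1])

XtX = X.T @ X
assert np.linalg.matrix_rank(XtX) < XtX.shape[0]   # singular coefficient matrix

theta = np.linalg.pinv(XtX) @ X.T @ y              # minimum-norm least squares solution
residual = y - X @ theta

# theta solves the normal equations exactly: X'(y - X theta) = 0,
# so the fitted values agree with any other least squares solution.
assert np.allclose(X.T @ residual, 0.0, atol=1e-8)
```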

The previously mentioned results concerning statistical applications of Generalized Inverses indicate that much work remains to be

done in extending, simplifying and unifying these applications.

Although certain of these developments might achieve little from a

practical viewpoint it appears that they would be extremely important

from the standpoint of presenting a logical development of statistical

analysis in the classroom.


CHAPTER 7. LIST OF REFERENCES

Aitken, A. C. 1937. The evaluation of a certain triple product matrix. Proc. of the Royal Soc. of Edinburgh 57:172-181.

Anderson, R. L. 1959. Unpublished lectures on the applications of least squares in economics. N. C. State of the University of North Carolina at Raleigh, Raleigh, N. C.

Ben-Israel, A. and S. J. Wersan. 1962. A least square method for computing the generalized inverse of an arbitrary complex matrix. Northwestern University, O. N. R. Research Memorandum No. 61.

Bodewig, E. 1959. Matrix Calculus, 2nd Edition. North Holland Publishing Co., Amsterdam.

Boot, J. C. G. 1963. The computation of the generalized inverse of singular or rectangular matrices. Amer. Math. Monthly 70(3):302-303.

Bose, R. C. 1959. Unpublished lecture notes on analysis of variance. Univ. of North Carolina at Chapel Hill, Chapel Hill, N. C.

Desoer, C. A. and B. H. Whalen. 1963. A note on Pseudo-Inverses. SIAM Jour. 11(2):442-447.

Drazin, M. P. 1958. Pseudo-Inverses in associative rings and semigroups. Amer. Math. Monthly 65:506-514.

Dwyer, P. S. 1959. Generalizations of a Gaussian theorem. Annals of Math. Stat. 29:106-117.

Eisenhart, C. 1963. The background and evolution of the method of least squares. 34th Session of the Inter. Stat. Inst., Ottawa, Canada, 21-29 August 1963.

Fadeeva, V. N. 1959. Computational Methods of Linear Algebra. Dover Publications, Inc., New York.

Ferguson, T. S. 1959. A method of generating best asymptotically normal estimators with application to the estimation of bacterial densities. Annals of Math. Stat. 29:1046-1062.

Foulis, D. J. 1963. Relative inverses in Baer *-semigroups. Michigan Math. Jour. 10(1):65-84.

Graybill, F. A. 1961. An Introduction to Linear Statistical Models, Vol. I. McGraw Hill Book Co., New York.


Greville, T. N. E. 1959. The Pseudo-Inverse of a rectangular or singular matrix and its application to the solution of systems of linear equations. SIAM Review 1(1):38-43.

Halmos, P. R. 1951. Introduction to Hilbert Space and the Theory of Spectral Multiplicity. Chelsea Publishing Co., New York.

Halmos, P. R. 1958. Finite-Dimensional Vector Spaces. D. Van Nostrand Co., Inc., Princeton, New Jersey.

John, F. 1956. Advanced Numerical Analysis. New York University Inst. of Math. Sciences.

Kalman, R. E. 1963. New methods in Wiener filtering. Proceedings of the First Symposium on Engineering Applications of Random Function Theory and Probability. John Wiley and Sons, New York.

Lehmann, E. L. 1950. Notes on the Theory of Estimation, Chapters I to IV. Associated Students' Store, Univ. of California, Berkeley 4, California.

Marcus, M. 1960. Basic theorems in matrix theory. National Bureau of Standards, Applied Math. Series 57.

Moore, E. H. 1920. Abstract. Bull. of the Amer. Math. Soc. 26:394-395.

Moore, E. H. 1935. General Analysis. Memoirs of the Amer. Philosophical Soc., Vol. I.

Munn, W. D. 1962. Pseudo-Inverses in semi-groups. Proc. of the Cambridge Philosophical Soc. 57:247-250.

Parzen, E. 1959. Statistical inference on time series by Hilbert space methods, I. Tech. Report No. 23, Applied Math. and Stat. Lab., Stanford University.

Penrose, R. 1955. A generalized inverse for matrices. Proc. of the Cambridge Philosophical Soc. 51:406-413.

Penrose, R. 1956. On best approximate solutions of linear matrix equations. Proc. of the Cambridge Philosophical Soc. 52:17-19.

Plackett, R. L. 1960. Principles of Regression Analysis. Oxford University Press, London.

Rado, R. 1956. Note on generalized inverses of matrices. Proc. of the Cambridge Philosophical Soc. 52:600-601.


Rao, C. R. 1952. Advanced Statistical Methods in Biometric Research. John Wiley and Sons, New York.

Rao, C. R. 1962. A note on a generalized inverse of a matrix with applications to problems in mathematical statistics. Jour. of the Royal Stat. Soc., Series B 24(1):152-158.

Rao, M. M. and J. S. Chipman. 1964. Projections, generalized inverses and quadratic forms. To be published in Jour. of Math. Analysis and Applications.

Roy, S. N. 1953. Some notes on least squares and analysis of variance - I. Inst. of Stat. Mimeo. Series No. 81.

Roy, S. N. 1957. Some Aspects of Multivariate Analysis. John Wiley and Sons, New York.

Rohde, C. A. and J. R. Harvey. Unified least squares analysis. Submitted for publication to Jour. of the Amer. Stat. Assoc.

Scheffe, H. 1959. The Analysis of Variance. John Wiley and Sons, New York.

Simmons, G. F. 1963. Introduction to Topology and Modern Analysis. McGraw Hill Book Co., New York.

Van der Vaart, R. 1964. Appendix to "Generalizations of Wilcoxon statistic for the case of k samples" by Elizabeth Yen. To be published in Statistica Neerlandica.


Zelen, M. and A. J. Goldman. 1963. Weak generalized inverses and minimum variance linear unbiased estimation. Math. Research Center Tech. Report 314, U. S. Army, Madison, Wisconsin.