CONTRIBUTIONS TO THE THEORY, COMPUTATION AND
APPLICATION OF GENERALIZED INVERSES

Charles A. Rohde

Institute of Statistics Mimeograph Series No. 392, May 1964
TABLE OF CONTENTS

LIST OF TABLES
LIST OF ILLUSTRATIONS

CHAPTER 1. INTRODUCTION

CHAPTER 2. LITERATURE REVIEW
2.1 Summary
2.2 Generalized Inverses
2.3 Reflexive Generalized Inverses
2.4 Normalized Generalized Inverses
2.5 Pseudo-Inverses
2.6 Generalizations to Hilbert Spaces
2.7 Generalized Inverses in Algebraic Structures

CHAPTER 3. THEORETICAL RESULTS
3.1 Summary
3.2 Results on Rank
3.3 Symmetry Results
3.4 Eigenvalues and Eigenvectors
3.5 Orthogonal Projections
3.6 Characterization of Generalized Inverses

CHAPTER 4. COMPUTATIONAL METHODS
4.1 Summary
4.2 An Expression for a Generalized Inverse of a Partitioned Matrix
4.3 Some Computational Procedures for Finding Generalized Inverses
4.3.1 Generalized Inverses
4.3.2 Reflexive Generalized Inverses
4.3.3 Normalized Generalized Inverses
4.3.4 Pseudo-Inverses
4.4 Abbreviated Doolittle Technique
4.5 Some Results on Bordered Matrices

CHAPTER 5. LEAST SQUARES APPLICATIONS
5.1 Introduction
5.2 A Review of Some Current Theories
5.3 Weighted Least Squares and the Generalized Gauss-Markoff Theorem
5.4 Linear Models with Restricted Parameters
5.5 Linear Models with a Singular Variance-Covariance Matrix
5.6 Multiple Analysis of Covariance
5.7 Use of the Abbreviated Doolittle in Linear Estimation Problems
5.8 Tests of Linear Hypotheses

CHAPTER 6. SUMMARY AND SUGGESTIONS FOR FUTURE RESEARCH
6.1 Summary
6.2 Suggestions for Future Research

CHAPTER 7. LIST OF REFERENCES
LIST OF TABLES

4.1. The Abbreviated Doolittle format

LIST OF ILLUSTRATIONS

2.1. The Pseudo-Inverse in geometric terms
CHAPTER 1. INTRODUCTION
Statistics, which may be viewed as a branch of mathematics, has
suggested many interesting mathematical problems. Certain developments
in the areas of measure theory, modern algebra, number theory, numerical
analysis, etc. have been fostered by research into statistical problems.
Matrix theory affords perhaps the best example of the interplay between
statistics and mathematics. The most widely used statistical techniques
are Analysis of Variance and Regression and each can be compactly treated
using matrix algebra. Application of matrix algebra to these areas led,
for example, to Cochran's Theorem, which has as its mathematical
counterpart the Spectral Representation Theorem for a symmetric matrix. Matrix
theory also has provided relatively simple proofs of the properties of
least squares as used in Analysis of Variance and Regression problems.
Mathematically these proofs are based on the concepts of projections and
linear transformations on (finite dimensional) vector spaces. Recent
work in the area of stationary stochastic processes has indicated that
the infinite dimensional analogues of Analysis of Variance and Regression
techniques will become increasingly important and useful to the applied
statistician.
Application of the theory of least squares to the General Linear
Model results in a set of linear equations called the normal or least
squares equations. When represented in matrix form, these equations
frequently have no unique solution, i.e., the coefficient matrix is
singular. To cope with this situation, certain authors (Bose [1959], and
C. R. Rao [1962]) developed the concept of a Generalized Inverse. Roughly
speaking, a Generalized Inverse is an attempt to represent a typical
or generic solution to a set of linear equations in a compact form.
Several earlier generalizations of the inverse of a matrix developed
in certain pure mathematical contexts.
Until the late 1950's little investigation of the theoretical
properties of Generalized Inverses was undertaken. Starting in 1955
with an excellent article by Penrose, many authors have investigated
certain properties of a Generalized Inverse called the Pseudo-Inverse.
The Pseudo-Inverse of a matrix is a uniquely determined matrix which
behaves almost exactly like an inverse. Due to its uniqueness it was
natural that the properties of the Pseudo-Inverse were studied and extended
to apply to other Algebraic Systems and just recently to Hilbert
space. The Pseudo-Inverse of a matrix, while of interest because of its
theoretical properties, is perhaps a little too cumbersome computationally
for use in least squares theory. In particular, the results of
least squares computations happen to be invariant under the choice of
Generalized Inverse. Such invariance, coupled with the computational
difficulties present with the Pseudo-Inverse, indicates one need for
research on the properties of other types of Generalized Inverses. At
present there appear to be four types of Generalized Inverses which
deserve recognition and study.
In this work we shall attempt to accomplish five purposes:
1. Define the various types of Generalized Inverses which appear
to be important and collate previous research,
2. Investigate inter-relationships between the various types of
Generalized Inverses,
3. Discuss some computational aspects of obtaining the various
types of Generalized Inverses,
4. Indicate and modify the statistical applications of Generalized
Inverses which have appeared in the literature,
and
5. Discuss possible areas for future research.
Although topics 1 through 5 were of primary importance in this
work, at least one interesting by-product appeared. Upon investigating
the computational methods it was found that by a simple modification of
the techniques and theory taught in beginning matrix theory courses
Generalized Inverses could be discussed at an elementary level. This
suggests that the idea of Generalized Inverses might well be fitted into
beginning matrix theory courses in order to achieve a more general and
more satisfying treatment of systems of linear equations.
CHAPTER 2. LITERATURE REVIEW
2.1 Summary
In this chapter we investigate five definitions of Generalized
Inverses. Each of these definitions, as mentioned in the Introduction,
is designed to assist in solving a set of equations. Four of the definitions
presented are concerned with linear equations in finite dimensional
inner product spaces. Definition 2.1 provides the most general
representation of a solution. Each of the other definitions can be
considered as special cases of Definition 2.1 which defines what we
shall call a Generalized Inverse. Definition 2.2 defines the concept
of a Reflexive Generalized Inverse which is simply a Generalized Inverse
with a little symmetry. Normalized Generalized Inverses are introduced
in Definition 2.3 and can be considered as Reflexive Generalized Inverses
with a certain projection property. The concept of Pseudo-Inverse,
introduced in Definition 2.4, is simply a unique Generalized Inverse or,
alternatively, a Normalized Generalized Inverse with a symmetric
projection property.
An extension of the concept of Generalized Inverses to Hilbert
spaces is achieved in Definition 2.5 where the Pseudo-Inverse of a
bounded linear transformation is defined. Further extensions of the
concept of Generalized Inverses are discussed in Section 2.6.
2.2 Generalized Inverses
The definition of a Generalized Inverse formulated by Bose [1959],
who used the term Conditional Inverse, revolved around the need to solve
the system of non-homogeneous linear equations

(2.1) A x = y,

where A is an n x p matrix (n > p), x is a p x 1 vector and y is
an n x 1 vector. The rank of A is assumed to be r <= p. If n = p
and r = p, then (2.1) admits a unique solution, namely x = A^(-1) y, where
A^(-1) denotes the inverse of A. If r = p < n, then provided the equations
are consistent (i.e., possess a solution), a solution is given by

(2.2) x_0 = (A'A)^(-1) A' y.

Since

A x_0 = A(A'A)^(-1) A' y = A(A'A)^(-1) (A'A) x = A x = y,

we see that (2.2) is a solution of (2.1). The matrix (A'A)^(-1) A' is an
example of the Pseudo-Inverse of the matrix A and will be discussed in
Section 2.4.

A solution of (2.1), provided it exists, is relatively easy to
represent in both the cases mentioned above. In the more general case
where r < p < n, a solution of (2.1) is more difficult to represent.
If we let Ag = A^(-1) or (A'A)^(-1) A' depending on whether r = p = n or
r = p < n, then we see that

(2.3) A Ag A = A.

This suggests that a particular solution of (2.1) should be of the form
Ag y where Ag satisfies (2.3). We thus introduce the following definition
due to Bose.

Definition 2.1 (Generalized Inverse): A matrix Ag is said to be
a Generalized Inverse of the matrix A if (2.3) is satisfied, i.e., if

A Ag A = A.
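Definition 2.1 can be illustrated numerically. The sketch below (a convenience of this exposition, assuming NumPy; the matrix A is an arbitrary rank-deficient example) exhibits one readily computed Generalized Inverse, the Moore-Penrose inverse, and checks the defining equation:

```python
import numpy as np

# A hypothetical 4 x 3 matrix of rank 2; no ordinary inverse exists.
A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 2.],
              [2., 1., 3.]])

# The Moore-Penrose inverse is one particular Generalized Inverse of A.
Ag = np.linalg.pinv(A)

# Definition 2.1: A Ag A = A.
assert np.allclose(A @ Ag @ A, A)
```

Many other matrices besides this one satisfy A Ag A = A; the definition singles out a class, not a unique matrix.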
Most of the theory of solving linear equations is based on the
following three theorems which are well known and stated here for future
reference.
Theorem 2.1: If A is a square matrix, a necessary and sufficient
condition that there exists x != 0 such that A x = 0 is that A be
singular.

Theorem 2.2: A necessary and sufficient condition that the linear
equations A x = y be solvable for x is that

Rank [A | y] = Rank [A].

Theorem 2.3: The general solution (more precisely a typical element
in the set of solutions) of the non-homogeneous linear equations A x = y
is obtained by adding to any particular solution, x_0, the general
solution of

A x = 0.
Using the above three theorems, Bose proves the following results,
the proofs of which we reproduce since they form an integral part of our
later work. The first of these shows that any Ag satisfying Definition
2.1 plays a basic role in providing a solution to (2.1).

Theorem 2.4: (i) If Ag is a Generalized Inverse of A and the
equations A x = y are consistent (i.e., solvable for x) then Ag y = x_1
is a solution of A x = y.

(ii) If x_1 = Ag y is a solution of A x = y for
every y for which A x = y admits a solution then A Ag A = A, i.e., Ag
is a Generalized Inverse of A.

Proof: (i) Since the equations A x = y are consistent there exists x
such that A x = y. Thus if x_1 = Ag y then

A x_1 = A Ag y = A Ag (A x) = A x = y.

Hence Ag y = x_1 is a solution to A x = y.
(ii) We have A Ag y = y for every y for which the
system A x = y is consistent. In particular let y = A x for an arbitrary
x. Then A x = y is consistent with Ag y = Ag A x a solution and
thus we have

A Ag A x = A x

for arbitrary x. Hence A Ag A = A.
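Theorem 2.4 (i) is easy to observe numerically. The sketch below (assuming NumPy; the singular matrix and right-hand side are hypothetical examples, with y built to be consistent) confirms that Ag y solves the system even though A has no inverse:

```python
import numpy as np

# A hypothetical singular 3 x 3 matrix (third row = first row + second row).
A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 2.]])
x_true = np.array([1., 2., 3.])
y = A @ x_true                 # a consistent right-hand side by construction

Ag = np.linalg.pinv(A)         # one Generalized Inverse of A
x1 = Ag @ y                    # Theorem 2.4 (i): Ag y solves A x = y
assert np.allclose(A @ x1, y)
```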
The following result, used by Bose in developing the theory of
least squares, indicates an important set of linear equations which are
consistent and possess a certain uniqueness property.
Theorem 2.5: The equations (X'X) b = X' y for any y != 0 are
consistent and X b is unique.

Proof: Since the rank of a product of matrices is less than or
equal to the minimum of the separate ranks we have

Rank [(X'X) | X'y] = Rank [X'(X | y)] <= Rank [X'] = Rank [X'X].

We also have

Rank [(X'X) | X'y] >= Rank [X'X].

Hence Rank [(X'X) | X'y] = Rank [X'X], and by Theorem 2.2 the equations
(X'X) b = X' y are consistent.
Let b_1 and b_2 be any two solutions to (X'X) b = X' y; then

(X'X)(b_1 - b_2) = 0

and hence

(b_1 - b_2)' X'X (b_1 - b_2) = 0,

which implies

X(b_1 - b_2) = 0,

or

X b_1 = X b_2,

proving uniqueness of X b.
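The content of Theorem 2.5 can be checked numerically. In the sketch below (assuming NumPy; the design matrix X is a hypothetical example whose first column is the sum of the other two, so X'X is singular), two different solutions of the normal equations nevertheless yield the same X b:

```python
import numpy as np

# Hypothetical design matrix: first column = second column + third column,
# so X'X is singular and the normal equations have many solutions.
X = np.array([[1., 1., 0.],
              [1., 1., 0.],
              [1., 0., 1.]])
y = np.array([1., 2., 3.])
XtX, Xty = X.T @ X, X.T @ y

b1 = np.linalg.pinv(XtX) @ Xty        # one solution of (X'X) b = X' y
n = np.array([1., -1., -1.])          # X n = 0, hence (X'X) n = 0
b2 = b1 + n                           # a second, different solution
assert np.allclose(XtX @ b1, Xty) and np.allclose(XtX @ b2, Xty)
assert np.allclose(X @ b1, X @ b2)    # yet X b is the same for both
```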
The following expanded version of Theorem 2.5, also due to Bose, finds
extensive application in Chapter 4.

Theorem 2.6: If X' is a p x n matrix, there exists a p x n
matrix Y such that (X'X)Y = X'.

Further:

(i) XY is unique,
(ii) (XY)' = XY,
(iii) (XY)^2 = XY.

Proof: Let e_i be an (n x 1) vector with a 1 in the i-th row
and 0's elsewhere. By Theorem 2.5, for each i = 1, 2, ..., n there exists
a y_i such that (X'X) y_i = X' e_i. Hence

(X'X) [y_1, y_2, ..., y_n] = X' [e_1, e_2, ..., e_n] = X' I = X',

which proves the existence of Y by taking Y = [y_1, y_2, ..., y_n]. By
Theorem 2.5, X y_i is unique for each i = 1, 2, ..., n. Hence XY is
unique. To prove (ii) and (iii), premultiply (X'X)Y = X' by Y' to obtain
Y'X'XY = Y'X', i.e., (XY)'(XY) = Y'X'. Since the left-hand side is symmetric,
Y'X' = (Y'X')' = XY, so (XY)' = XY. Also

(XY)'(XY) = Y'X' = XY, so (XY)^2 = (XY)'(XY) = XY.
Corollary 2.1: The matrix X(X'X)g X', where (X'X)g is any
Generalized Inverse of (X'X), is uniquely determined, symmetric and
idempotent.

Proof: By Theorems 2.5 and 2.6 the equations (X'X)Y = X' are
consistent. Hence by Theorem 2.4 a solution is given by

Y = (X'X)g X'.

By Theorem 2.6, X(X'X)g X' has the stated properties.
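Corollary 2.1 lends itself to numerical verification. The sketch below (assuming NumPy) manufactures a second Generalized Inverse from a first via G1 = G0 + V - G0 A V A G0 with V arbitrary (a standard construction, not taken from this text) and confirms that X(X'X)g X' is unchanged, symmetric and idempotent:

```python
import numpy as np

X = np.array([[1., 1., 0.],
              [1., 1., 0.],
              [1., 0., 1.]])           # hypothetical X with singular X'X
A = X.T @ X

G0 = np.linalg.pinv(A)                 # one Generalized Inverse of X'X
V = np.arange(9.).reshape(3, 3)        # arbitrary matrix
G1 = G0 + V - G0 @ A @ V @ A @ G0      # another Generalized Inverse of X'X
assert np.allclose(A @ G1 @ A, A)

P0, P1 = X @ G0 @ X.T, X @ G1 @ X.T
assert np.allclose(P0, P1)                                 # uniquely determined
assert np.allclose(P0, P0.T) and np.allclose(P0 @ P0, P0)  # symmetric, idempotent
```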
In view of the importance of Corollary 2.1 an alternative proof
will be presented.

(i) [(X'X)(X'X)g(X'X)]' = (X'X)(X'X)g'(X'X) = X'X implies that
(X'X)g' is also a Generalized Inverse of X'X,

(ii) [X - X(X'X)g(X'X)]'[X - X(X'X)g(X'X)]

= [X' - (X'X)(X'X)g' X'][X - X(X'X)g(X'X)]

= X'X - X'X(X'X)g(X'X) - (X'X)(X'X)g' X'X + (X'X)(X'X)g'(X'X)(X'X)g(X'X)

= X'X - X'X - X'X + X'X

= 0

implies that X = X(X'X)g(X'X) for any Generalized Inverse of X'X,

(iii) Let (X'X)g_1 and (X'X)g_2 be any two Generalized Inverses of
(X'X); then

[X(X'X)g_1 X' - X(X'X)g_2 X']'[X(X'X)g_1 X' - X(X'X)g_2 X']

= X(X'X)g_1' X'X(X'X)g_1 X' - X(X'X)g_1' X'X(X'X)g_2 X'
- X(X'X)g_2' X'X(X'X)g_1 X' + X(X'X)g_2' X'X(X'X)g_2 X'

= X(X'X)g_1 X' - X(X'X)g_2 X' - X(X'X)g_1 X' + X(X'X)g_2 X'

= 0.

Thus X(X'X)g_1 X' = X(X'X)g_2 X', i.e., X(X'X)g X' is uniquely determined,

(iv) [X(X'X)g X']' = X(X'X)g' X' = X(X'X)g X' by (iii), since (X'X)g'
is also a Generalized Inverse by (i),

(v) [X(X'X)g X'][X(X'X)g X'] = [X(X'X)g(X'X)](X'X)g X' = X(X'X)g X'.

Thus X(X'X)g X' is unique, symmetric, idempotent, X = X(X'X)g(X'X),
and X' = (X'X)(X'X)g X'.
Thus far we have not settled the question of existence of a Generalized
Inverse for an arbitrary matrix A. The following result of Bose
settles the existence question and in addition gives a type of canonical
representation for a Generalized Inverse.
Theorem 2.7: If A is any matrix then a Generalized Inverse
exists and is given by

Ag = P_2 Bg P_1,

where P_2 and P_1 are non-singular matrices such that

P_1 A P_2 = [I_(r)  0] = B,    r = Rank A,
            [  0    0]

and

Bg = [I_(r)  U]
     [  V    W],

U, V and W being arbitrary matrices.

Further, every Generalized Inverse of A has the form P_2 Bg P_1 for
some Generalized Inverse, Bg, of B.

Proof: For any matrix A there exist non-singular matrices P_1
and P_2 such that

P_1 A P_2 = [I_(r)  0] = B,
            [  0    0]

where r is the rank of A. Define

Bg = [X  U]
     [V  W];

then

B Bg B = [X  0]
         [0  0].

Hence Bg is a Generalized Inverse of B if and only if X = I_(r).
Thus

Bg = [I_(r)  U]
     [  V    W],

where U, V and W are arbitrary, is a canonical
representation for a Generalized Inverse of B. Since A = P_1^(-1) B P_2^(-1) we
see that Ag = P_2 Bg P_1 is a Generalized Inverse of A, thus proving
existence.
If Ag is any Generalized Inverse of A then the equation

A Ag A = A

implies that

(P_1 A P_2)(P_2^(-1) Ag P_1^(-1))(P_1 A P_2) = P_1 A P_2,

or

B (P_2^(-1) Ag P_1^(-1)) B = B.

It follows that P_2^(-1) Ag P_1^(-1) is a Generalized Inverse of B, say B*.
Hence

Ag = P_2 B* P_1

for some Generalized Inverse B* of B.
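The canonical construction of Theorem 2.7 can be carried out numerically. In the sketch below (assuming NumPy; using the singular value decomposition to supply non-singular factors P_1 and P_2 is a convenience of this sketch, not the method of the proof), a Generalized Inverse is assembled from the canonical form with arbitrary blocks:

```python
import numpy as np

A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 2.]])            # hypothetical matrix of rank 2
n, p = A.shape
U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))

# Non-singular P1, P2 with P1 A P2 = [[I_r, 0], [0, 0]] = B.
d = np.ones(n)
d[:r] = 1.0 / s[:r]
P1, P2 = np.diag(d) @ U.T, Vt.T
B = P1 @ A @ P2
assert np.allclose(B[:r, :r], np.eye(r)) and np.allclose(B[:, r:], 0)

# Canonical Bg = [[I_r, U0], [V0, W0]] with arbitrary blocks U0, V0, W0.
rng = np.random.default_rng(0)
Bg = rng.standard_normal((p, n))
Bg[:r, :r] = np.eye(r)
Ag = P2 @ Bg @ P1                       # Theorem 2.7: Ag = P2 Bg P1
assert np.allclose(A @ Ag @ A, A)
```

Varying the arbitrary blocks sweeps out the whole family of Generalized Inverses of A.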
Bose applies the above results to obtain a solution of the so-called
"normal" or least squares equations given by

(X'X) b = X' y,

and discusses conditions for (linear) estimability, which reduce to the
requirements for the uniqueness of a linear function l'b, and (strong)
testability, which reduces to conditions for the uniqueness of L b where
the rank of L is full. (L is a rectangular matrix of order
s x p (s < p) and rank s.)
C. R. Rao [1962], in a recent paper, has discussed what we have
defined as a Generalized Inverse. Rao's treatment closely parallels
that developed by Bose as given above. Rao defines his Generalized Inverse
to be any matrix Ag such that for any y for which A x = y is
consistent, x = Ag y is a solution. He then establishes that Ag is
a Generalized Inverse of A if and only if A Ag A = A. Rao discusses
briefly the application of Generalized Inverses to the theory of least
squares and distributions of quadratic forms in normal variables.
2.3 Reflexive Generalized Inverses
C. R. Rao [1962] established that there exists a Generalized Inverse
with the additional property that

Ag A Ag = Ag.

In his proof of this result a constructive procedure for determining
such an Ag is given. He simply uses the fact that for any matrix A
there exist non-singular matrices P_1 and P_2 such that

P_1 A P_2 = [I_(r)  0] = B,
            [  0    0]

where r is the rank of A. The Generalized Inverse given by
Ag = P_2 B' P_1 is such that A Ag A = A and Ag A Ag = Ag.

The reflexive property present with such a Generalized Inverse
suggests that some interesting results might be obtained by considering the
following type of Generalized Inverse.
Definition 2.2: A Reflexive Generalized Inverse of a matrix A is
defined as any matrix Ar such that

(2.4) A Ar A = A and Ar A Ar = Ar.
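Rao's choice Ag = P_2 B' P_1 can be checked against (2.4) numerically. The sketch below (assuming NumPy; obtaining P_1 and P_2 from the singular value decomposition is a convenience of the sketch) confirms both defining properties of a Reflexive Generalized Inverse:

```python
import numpy as np

A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 2.]])            # hypothetical matrix of rank 2
U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))
d = np.ones(A.shape[0])
d[:r] = 1.0 / s[:r]
P1, P2 = np.diag(d) @ U.T, Vt.T        # P1 A P2 = B = [[I_r, 0], [0, 0]]
B = np.zeros_like(A)
B[:r, :r] = np.eye(r)

Ar = P2 @ B.T @ P1                     # Rao's choice Ag = P2 B' P1
assert np.allclose(A @ Ar @ A, A)      # a Generalized Inverse ...
assert np.allclose(Ar @ A @ Ar, Ar)    # ... with the reflexive property (2.4)
```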
In the case of a symmetric matrix Rao has given a computational
method (a modified version of which we discuss in detail in Chapter 4)
for obtaining a Reflexive Generalized Inverse. Properties of this type
of Generalized Inverse (other than those which a Generalized Inverse
possesses) do not appear to have been previously discussed. It is of
interest to point out here that Roy [1953] in his discussion of
univariate Analysis of Variance utilizes an approach resting on the
concept of the space spanned by the columns of a matrix. It can be
easily shown that such an approach is exactly equivalent to working with
a judicious choice of a Reflexive Generalized Inverse. This will be
discussed briefly in Chapter 5.
Specific properties of the Reflexive Generalized Inverse will be
discussed in Chapter 3.
2.4 Normalized Generalized Inverses
Zelen and Goldman [1963] in a recent article have proposed a type
of Generalized Inverse which they call a Weak Generalized Inverse.
Definition 2.3 (Normalized Generalized Inverse): If X is an
n x p matrix then a Normalized Generalized Inverse of X is a p x n
matrix Xn satisfying

X Xn X = X,
Xn X Xn = Xn,
(X Xn)' = X Xn.
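Since the Pseudo-Inverse of Section 2.5 satisfies all of these conditions, it furnishes a convenient numerical example of a Normalized Generalized Inverse. A minimal sketch (assuming NumPy; X is a hypothetical matrix):

```python
import numpy as np

X = np.array([[1., 1.],
              [1., 1.],
              [1., 0.]])                 # hypothetical n x p matrix
Xn = np.linalg.pinv(X)                   # the Pseudo-Inverse is, in particular,
                                         # a Normalized Generalized Inverse
assert np.allclose(X @ Xn @ X, X)        # X Xn X = X
assert np.allclose(Xn @ X @ Xn, Xn)      # Xn X Xn = Xn
assert np.allclose((X @ Xn).T, X @ Xn)   # X Xn symmetric (projection property)
```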
Zelen and Goldman's proof of the existence of a Normalized Generalized
Inverse is based on the following three Lemmas on bordered
matrices, which we state without proof.
Lemma 2.1: Let A be a symmetric p x p matrix of rank r < p and
let R be a p x q matrix of rank q = p - r. Then there exists a
p x (p - r) matrix B with the properties

(*)  (a) B'A = 0,
     (b) det (B'R) != 0,

if and only if the square symmetric matrix

M = [A   R]
    [R'  0]

is non-singular. In this case any B of rank p - r can be used as the
B in (*).

Furthermore, M^(-1) has the form

M^(-1) = [C             B(R'B)^(-1)]
         [(B'R)^(-1) B'      0     ].
Lemma 2.2: Let A, R, M be as in Lemma 2.1 and assume M non-singular.
Then there is a unique p x p symmetric matrix C associated
to R, with the property that for at least one B satisfying (*),

(**)  (a) R'C = 0.

(B bears no relation to the B used in Sections 2.2 and 2.3.) The
matrix C satisfies (**) for every B satisfying (*), and has the
additional properties:

(a) CAC = C, ACA = A,
(b) C is of rank r.

Also, C can be written as C = [I - B(R'B)^(-1) R'][A + RB']^(-1).
Lemma 2.3: Let A be a symmetric p x p matrix. If C satisfies
(a) and (b) of Lemma 2.2 then C can be found using some R in the
expression given for C in Lemma 2.2.

Note that C as given above is symmetric since the inverse of the
symmetric matrix M is symmetric. Also note that C is a Reflexive
Generalized Inverse in the sense of Definition 2.2, but that C is
required to be symmetric here.
The following Lemma established by Zelen and Goldman will be used
in Chapter 3 to establish the relation between Reflexive Generalized
Inverses and Normalized Generalized Inverses.
Lemma 2.4: Let X be an n x p matrix. Then the p x n matrix Xn
is a Normalized Generalized Inverse of X if and only if

Xn = C X'

for some C associated with A = X'X as in Lemma 2.2.
Zelen and Goldman use the concept of Normalized Generalized Inverses
to study and extend some of the concepts of minimum variance
linear unbiased estimation theory.
2.5 Pseudo-Inverses

The earliest work on Generalized Inverses was done by E. H. Moore
[1920]. Moore introduced the concept of the "general reciprocal" of a
matrix in 1920 and later, in 1935, in a book entitled General Analysis,
further investigated the concept. Penrose [1955, 1956], independently
and through a different approach, discovered Moore's "general reciprocal",
which he calls the "generalized inverse" of a matrix. Rado [1956] has
pointed out the equivalence of the two approaches. Penrose's approach
is much closer to the approach we have followed thus far and we introduce
this type of Generalized Inverse in this manner. Due to the
recent literature we use the name Pseudo-Inverse for the Generalized
Inverse developed by Moore and Penrose.
The definition of the Pseudo-Inverse of a matrix hinges on the
following theorem proved by Penrose.
Theorem 2.8: The four equations

(2.6)  (i)   A A† A = A,
       (ii)  A† A A† = A†,
       (iii) (A A†)' = A A†,
       (iv)  (A† A)' = A† A,

have a unique solution A† for any A.

Proof: Penrose's proof can be greatly shortened using the results
previously mentioned in this Chapter and the spectral representation of
a symmetric matrix.

We see that (A'A) is a symmetric matrix and hence has a spectral
representation

A'A = Σ_i λ_i E_i,

where E_i = E_i' = E_i^2, E_i E_j = 0 (i != j), and λ_i is the i-th non-zero
eigenvalue of A'A. If we let X = Σ_i λ_i^(-1) E_i then

(A'A) X (A'A) = Σ_i λ_i E_i = (A'A)

and similarly X A'A X = X. Hence X is a Reflexive Generalized
Inverse of A'A. Hence by Corollary 2.1, A X A'A = A. We now
define A† = X A'. Then

A A† A = A X A'A = A,

A† A A† = X A'A X A' = X A' = A†,

A† A = X A'A = Σ_i E_i = (Σ_i E_i)' = (A† A)',

and

A A† = A X A' = (A X A')' = (A A†)' by Theorem 2.6.
Following Penrose, uniqueness is shown by noting that if B also
satisfies (2.6) then

A† = A†AA† = A†A†'A' = A†A†'A'B'A' = A†A†'A'(AB)' = A†A†'A'AB
   = A†AB = A†AA'B'B = A'B'B = (BA)'B = BAB = B,

where the relations A†A†'A' = A†, A†AA' = A', (AB)' = AB, and
A = AA'A†' have been freely used.
Definition 2.4: The matrix A† associated with A as in Theorem
2.8 is called the Pseudo-Inverse of A.
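The spectral construction in the proof of Theorem 2.8 translates directly into a computation. The sketch below (assuming NumPy; A is a hypothetical rank-deficient matrix, and the rank-one projections q q' play the role of the E_i) builds A† = X A' and checks the four equations (2.6):

```python
import numpy as np

A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 2.],
              [2., 1., 3.]])             # hypothetical 4 x 3 matrix of rank 2

# Spectral representation of A'A; each outer product below is a projection E_i.
lam, Q = np.linalg.eigh(A.T @ A)
X = np.zeros((3, 3))
for l, q in zip(lam, Q.T):
    if l > 1e-10:                        # sum over non-zero eigenvalues only
        X += np.outer(q, q) / l          # X = Σ_i λ_i^(-1) E_i

At = X @ A.T                             # A† = X A' as in the proof
assert np.allclose(A @ At @ A, A)        # (i)
assert np.allclose(At @ A @ At, At)      # (ii)
assert np.allclose((A @ At).T, A @ At)   # (iii)
assert np.allclose((At @ A).T, At @ A)   # (iv)
assert np.allclose(At, np.linalg.pinv(A))  # agrees with the built-in Pseudo-Inverse
```

The final assertion reflects the uniqueness asserted by Theorem 2.8.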
Penrose investigates many properties of the Pseudo-Inverse, the
results being summarized in the following theorems.
Theorem 2.9:

(i) (A†)† = A,
(ii) (A')† = (A†)',
(iii) A† = A^(-1) if A is non-singular,
(iv) (λA)† = λ†A†, λ any scalar, with λ† = 0 if λ = 0,
(v) (A'A)† = A†A†',
(vi) If U and V are orthonormal, (UAV)† = V'A†U',
(vii) If A = Σ_i A_i, where A_i A_j' = 0 and A_i'A_j = 0
whenever i != j, then A† = Σ_i A_i†,
(viii) A† = (A'A)†A',
(ix) A†A, AA†, I - A†A, and I - AA† are all symmetric
idempotent matrices; A†A and AA† both have rank equal to the
trace of A†A,
(x) If A is normal, A†A = AA†.

We recall that a normal matrix is a matrix such that AA' = A'A.
Theorem 2.10: A necessary and sufficient condition for the equation
A X B = C to have a solution for X is

A A† C B† B = C,

in which case the general solution is

X = A† C B† + Y - A† A Y B B†,

where Y is arbitrary. Penrose points out that A need only satisfy
A A† A = A in order for Theorem 2.10 to hold. An immediate corollary is
that the general solution of the consistent equation P x = c is
x = P† c + (I - P† P) z, where z is arbitrary.
Theorem 2.11: The matrix A† B is the unique best approximate
solution of the equation A X = B. (The matrix X_0 is said to be a
best approximate solution of the equation f(X) = G if for all X
either

(i) ||f(X) - G|| > ||f(X_0) - G||

or

(ii) ||f(X) - G|| = ||f(X_0) - G|| and ||X|| >= ||X_0||.

Note that ||M|| is a norm of the matrix M and is commonly given by the
square root of the trace of M'M.)
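Theorem 2.11 is the least squares property in disguise, and can be observed numerically. The sketch below (assuming NumPy; A and b are hypothetical, with A x = b inconsistent) checks that A† b matches the least squares solution and that perturbed candidates do no better in residual norm:

```python
import numpy as np

A = np.array([[1., 1.],
              [1., 1.],
              [1., 0.]])
b = np.array([1., 2., 3.])                    # A x = b has no exact solution

x0 = np.linalg.pinv(A) @ b                    # best approximate solution A†b
r0 = np.linalg.norm(A @ x0 - b)
assert np.allclose(x0, np.linalg.lstsq(A, b, rcond=None)[0])

rng = np.random.default_rng(2)
for _ in range(5):
    x = x0 + rng.standard_normal(2)           # any competing candidate
    assert np.linalg.norm(A @ x - b) >= r0 - 1e-12
```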
2.6 Generalizations to Hilbert Spaces
A generalization of the concept of Pseudo-Inverses to infinite
dimensional vector spaces or more specifically to Hilbert space has
recently been achieved by Desoer and Whalen [1963] and also by Foulis
[1963] in a more algebraic manner. Whatever is said about Hilbert space
naturally holds for a finite-dimensional inner product space. Desoer
and Whalen's approach will be discussed in some detail because of its
importance in Hilbert space applications and also because the natural
specialization of their approach to finite dimensional inner product
spaces affords a modern look at the concept of Pseudo-Inverses. Their
approach can best be described as a range-null-space approach. Before
describing the approach taken by Desoer and Whalen we need to indicate
certain basic definitions, theorems, and notations appropriate to the
concepts of Hilbert space. The approach we take follows closely that
of Simmons [1963] and Halmos [1951].
We recall that a linear space H is a set of points x, y, z
such that

I. There is defined an operation + on H x H to H such that
(H, +) is an Abelian Group, i.e.,

(2.6')
(i) (x + y) + z = x + (y + z) for all x, y and z in H,
(ii) There exists 0 ∈ H such that x + 0 = 0 + x = x for
any x ∈ H,
(iii) For each x ∈ H there exists an element -x such that
x + (-x) = (-x) + x = 0,
(iv) x + y = y + x for all x and y in H.

II. There is defined an operation on H x F (where F is a field
of scalars) to H called scalar multiplication such that

(2.6")
(i) α(x + y) = αx + αy for x, y ∈ H, α ∈ F,
(ii) (α + β)x = αx + βx for x ∈ H, α, β ∈ F,
(iii) (αβ)x = α(βx) for x ∈ H, α, β ∈ F,
(iv) 1·x = x for x ∈ H.
The elements of H are called vectors while the zero vector 0 is
called the origin.
The linear space H is called a normed linear space if to each
vector x ∈ H there corresponds a real number, denoted ||x|| and called
the norm of x, such that

||x|| >= 0, and ||x|| = 0 if and only if x = 0,

||αx|| = |α| ||x||,

and

||x + y|| <= ||x|| + ||y||.

It can be shown that a normed linear space H can be made into a metric
space by defining the metric as

(2.8) d(x, y) = ||x - y||.

The metric defined by (2.8) is called the metric induced by the norm.
We recall that a sequence x_n in a metric space H is convergent to
x ∈ H, written x_n → x, if and only if d(x_n, x) → 0 as n → ∞.
A Cauchy sequence in a metric space is a sequence x_n such that
d(x_m, x_n) < ε whenever n, m >= n_0. Obviously a convergent sequence is
a Cauchy sequence. A metric space H such that each Cauchy sequence
in H is convergent to a point in H is called a complete metric
space.
A complete normed linear space is defined to be a Banach space. In
a Banach space the induced norm provides us with a natural measure of
distances but there is no natural measure of angles as exists in, say,
R². The concept of a Hilbert space admits such a measure of angles. A
linear inner product space is a linear space with a function (x, y)
(called the inner product) such that

(x + y, z) = (x, z) + (y, z),
(αx, y) = α(x, y),
(x, y) = (y, x)*,
(x, x) >= 0, with equality if and only if x = 0.

If a linear inner product space H is complete and we define ||x|| by
the equation

||x||² = (x, x),

then H can be shown to be a Banach space. The norm ||x|| is said to
be induced by the inner product.
A Hilbert space is thus a complex Banach space whose norm arises
from an inner product. The inner product allows a definition of orthogonality
which generalizes the usual concept of orthogonality in R² and
R³. Vectors x and y in H are said to be orthogonal if (x, y) = 0.
The notation x ⊥ y indicates that x is orthogonal to y, x ⊥ A
indicates that x ⊥ y for all y ∈ A, while A ⊥ B indicates that
x ⊥ y for all x ∈ A and y ∈ B.

It can be shown that the inner product, the induced norm, addition
and scalar multiplication are all continuous functions in the topology
generated by the metric defined by (2.8).
A subset M ⊂ H is said to be a subspace of H if (αx + βy) ∈ M
whenever x, y ∈ M and α, β ∈ F. A subspace M of a Hilbert space can
be shown to be closed (in the sense of the norm topology) if and only if
it is complete. If any vector x ∈ H can be written uniquely as
x = Σ_j x_j, where x_j ∈ M_j, a closed subspace of H, for all j, then we
write H = M_1 + M_2 + ... and call H the direct sum of the subspaces
M_1, M_2, .... It can be shown that if M_1, M_2, ... are pairwise
orthogonal and H is spanned by the union of the M_j, then H is the
direct sum of the M_j.
Reflection on finite dimensional vector spaces reveals that linear
transformations are of importance in many applications. Linear transformations
in finite dimensional vector spaces are represented by matrices.
The generalization of a matrix to Hilbert space results in the concept
of a linear transformation A. A linear transformation A is a mapping
from H to another Hilbert space H' such that

(2.10) A(αx + βy) = αAx + βAy

for all x, y ∈ H and α, β ∈ F.
A linear transformation mapping one Hilbert Space into another
obviously preserves some of the algebraic structure of H, but since H
is equipped with a topology it is useful to require that A be continuous
in order to preserve the topological structure as well. The linear
transformation A: H → H' is said to be bounded if there exists α > 0
such that

(2.11) ||Ax|| <= α ||x|| for all x ∈ H.

The norm ||A|| of A is defined as

(2.12) ||A|| = inf { α : ||Ax|| <= α ||x|| for all x ∈ H }.

The concept of boundedness is important since it can be proved that a
linear transformation is continuous if and only if it is bounded. If
A: H → H and is bounded then we shall call A a (linear) operator.
Similarly if A: H → R or C and is bounded then we shall call A
a (linear) functional.
We shall have occasion to discuss only bounded operators and func-
tionals. We also note that there are many differences in terminology
used in defining functionals and operators.
The scalar multiple, sum, and product of the operators A and B
are defined by

(αA)x = α(Ax),
(A + B)x = Ax + Bx,
(AB)x = A(Bx),

and it can be shown that

(2.14) ||αA|| = |α| ||A||, ||A + B|| <= ||A|| + ||B|| and ||AB|| <= ||A|| ||B||.
It can also be shown that the set of all linear operators B(H)
from H into H forms an algebra over the field F, i.e., a linear
space over F where the elements (in this case the operators) and
scalars can be multiplied in the following natural manner:

A(BC) = (AB)C for A, B, C ∈ B(H),
A(B + C) = AB + AC for A, B, C ∈ B(H),
and α(AB) = (αA)B = A(αB) for A, B ∈ B(H) and α ∈ F.

In addition this algebra has an identity, namely the operator I defined
by

(2.16) I x = x for all x ∈ H.

An operator A is said to be invertible if there exists A^(-1) such that

A^(-1) A = I = A A^(-1).
If A is any operator it can be shown that there exists a unique
operator A* with the property that

(Ax, y) = (x, A*y)

for all x, y ∈ H. The operator A* is called the adjoint of A and
plays a role similar to that of the conjugate transpose of a matrix in
finite dimensional linear spaces. It can be shown that the adjoint has
the following properties:

(2.18) (A*)* = A,
(αA)* = α*A* where α* = complex conjugate of α,
(A + B)* = A* + B*,
(AB)* = B*A*,
and (A*)^(-1) = (A^(-1))* if A is invertible.
There are several specific types of operators of importance in the
study of Hilbert spaces. We list these operators and their properties
below.

Hermitian Operator: An operator A is hermitian if A* = A
(generalizes the concept of a hermitian matrix),

Normal Operator: An operator A is normal if A*A = AA*. A
normal operator has the property that ||A*x|| = ||Ax|| for all x ∈ H,

Unitary Operator: An operator U is unitary if U*U = UU* = I
(generalizes the concept of a unitary (orthogonal) matrix),

Projection Operator: The projection operator P on a subspace M
is the operator P defined, for every z ∈ H of the form z = x + y
where x ∈ M and y ∈ M⊥, by Pz = x. It can be shown that if P is
an idempotent (P² = P) and hermitian operator then P is the projection
on M = {x : Px = x}.

Theorem 2.8 indicates that AA† and A†A are projections on suitable
subspaces in the finite dimensional case.
One of the main virtues of a Hilbert space is the fact that if M
is a closed subspace then there exists a projection operator P such
that M = {x : Px = x}. Further, M⊥ = {x : Px = 0} and M + M⊥ = H.
In accordance with customary usage M is called the range of P
(written R(P)) while M⊥ is called the null space of P (written N(P)).
The existence of projections on a given closed subspace plays a fundamental
role in the generalization of the concept of a Pseudo-Inverse to
Hilbert space.

The final concept of importance is the spectrum of an operator,
which is defined as the set of all λ such that A - λI is not invertible.
The spectrum of an operator is obviously a generalization of
the set of characteristic roots of a matrix.
With the above preliminaries and notation completed we are in a
position to discuss Desoer and Whalen's definition of the Pseudo-Inverse
of a bounded linear transformation from H to H'.
The essential problem is simply to associate with each transformation
A in B(H, H') another linear transformation At such that
Ax = y is true if and only if At y = x. If A is invertible then A^-1
clearly satisfies the requirements. In order to cope with the situation
when A^-1 does not exist Desoer and Whalen introduce the following
definition.
Definition 2.5 Let A be a bounded linear transformation of a
Hilbert space H into a Hilbert space H' such that R(A) is closed.
At is said to be the Pseudo-Inverse of A if
(i) At A x = x for all x ∈ [N(A)]⊥ ,
(ii) At y = 0 for all y ∈ [R(A)]⊥ ,
and (iii) if y1 ∈ R(A) and y2 ∈ [R(A)]⊥ then At(y1 + y2) = At y1 + At y2 .
The connection between A and At can best be seen by the diagram
given in Figure 2.1.
Figure 2.1. The Pseudo-Inverse in geometric terms
We note that since R(A) is closed the mapping At is a one-one
mapping of R(A) onto [N(A)]⊥. This coupled with (ii) and (iii) serves
to define At as a mapping of H' into H. In addition (iii) guarantees
that At is a linear operator from H' into H with range
[N(A)]⊥ and null space [R(A)]⊥.
Using Definition 2.5 Desoer and Whalen prove that At has the
properties (i) to (iv) of Theorem 2.8, thus indicating that Definition
2.5 provides an extension of the concept of Pseudo-Inverse (c.f. Defini-
tion 2.4) to Hilbert space. Note that in the case of a finite dimension-
al vector space the range of any linear transformation is closed and
hence in this case Definition 2.5 is equivalent to Definition 2.4.
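The finite dimensional content of Definition 2.5 can be checked numerically. The following sketch (in modern notation; `numpy.linalg.pinv` is assumed to compute the Pseudo-Inverse of Definition 2.4, and the rank-one matrix is purely illustrative) verifies properties (i) and (ii).

```python
import numpy as np

# Finite-dimensional check of Definition 2.5 (illustrative example;
# numpy.linalg.pinv is assumed to give the Pseudo-Inverse of Definition 2.4).
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [0.0, 0.0]])                 # rank 1, so R(A) is closed

A_dag = np.linalg.pinv(A)

# (i) At A x = x for x in [N(A)]-perp; here [N(A)]-perp is spanned by (1, 2)'
x = np.array([1.0, 2.0])
check_i = np.allclose(A_dag @ A @ x, x)

# (ii) At y = 0 for y in [R(A)]-perp; R(A) is spanned by (1, 2, 0)'
check_ii = all(np.allclose(A_dag @ y, 0.0)
               for y in (np.array([2.0, -1.0, 0.0]), np.array([0.0, 0.0, 1.0])))

assert check_i and check_ii
```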
Using Definition 2.5 the authors prove that At is such that
(At)t = A and (At)* = (A*)t. The projection results indicated in
Theorem 2.8, namely, that AAt is the projection of H' onto R(A)
and that AtA is the projection of H onto [N(A)]⊥, are also estab-
lished. Equation (viii) of Theorem 2.9, which plays an important role
in the computational methods to be discussed in Chapter 4, is true in
the Hilbert space context also and reduces the computation of the
Pseudo-Inverse of an operator A to the computation of the Pseudo-
Inverse of (A*A).
Several interesting properties of the Pseudo-Inverse are estab-
lished by Desoer and Whalen. The first states an equivalent definition
in terms of an inner product relation and a condition on the null space
of A. Specifically the equivalent definition is:
Definition 2.5.1 Let x, y be elements of H and write
x = x1 + x2, y = y1 + y2 where x1, y1 are in [N(A)]⊥ and x2, y2
are in N(A). Then At is the Pseudo-Inverse of A if and only if
(i) (At A x, y) = (x1, y1) for all x, y in H,
and
(ii) N(At) = [R(A)]⊥ .
As stated in the introduction, the concept of Generalized Inverse
is extremely important in the theory of least squares. Penrose's
Theorem (2.11) on best approximate solutions of matrix equations
generalizes in the following sense.
Theorem 2.11': If y ∈ H' and x1 = At y then (a) ||A x1 - y||
≤ ||A x - y|| for all x ∈ H, and (b) ||x1|| ≤ ||x0|| for all x0 attaining
equality in (a). In intuitive terms Theorem 2.11' states that if Ax = y
has a solution then At y is the solution which has smallest norm.
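In finite dimensions the minimum-norm least-squares property of Theorem 2.11' can be illustrated as follows (a sketch only; `numpy.linalg.pinv` and `lstsq` are assumed to compute the Pseudo-Inverse and a least-squares solution respectively, and the rank-deficient matrix is a made-up example).

```python
import numpy as np

# Sketch of Theorem 2.11': x1 = At y minimizes ||Ax - y||, and among all
# minimizers it has the smallest norm.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
A[:, 2] = A[:, 0] + A[:, 1]          # make A rank-deficient
y = rng.standard_normal(5)

x1 = np.linalg.pinv(A) @ y

# (a) x1 attains the least-squares minimum computed by lstsq
x_ls, *_ = np.linalg.lstsq(A, y, rcond=None)
assert np.isclose(np.linalg.norm(A @ x1 - y), np.linalg.norm(A @ x_ls - y))

# (b) perturbing x1 along the null space of A leaves the residual
# unchanged but increases the norm of the solution
null_dir = np.array([1.0, 1.0, -1.0]) / np.sqrt(3.0)  # A @ null_dir = 0
x0 = x1 + 0.5 * null_dir
assert np.isclose(np.linalg.norm(A @ x0 - y), np.linalg.norm(A @ x1 - y))
assert np.linalg.norm(x1) <= np.linalg.norm(x0) + 1e-12
```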
As is well known (see for example Halmos [1958]) an arbitrary matrix can
be factored as A = UQ where U is an isometry (an orthonormal matrix)
and Q is a positive matrix (non-negative definite matrix). Desoer and
Whalen proved the following generalization of this decomposition (which
is called a polar decomposition because of its relation to the polar
representation of a complex number as re^{iθ}).
Theorem 2.12: Let A be a bounded linear mapping from H into
H'. If R(A) is closed then A can be factored as
A = UP
where Ut = U*, P is a positive self adjoint operator on H and U
maps H onto R(A).
2.6 Generalized Inverses in Algebraic Structures
The concept of Generalized Inverses revived by Penrose [1955] in the case
of matrices led to immediate generalizations to other algebraic struc-
tures. Since the set of all n x n matrices is a ring under multiplica-
tion, it is natural to suspect that the concept of Generalized Inverse
extends to arbitrary rings. In recent articles Drazin [1958] and Munn
[1962] have investigated Generalized Inverses in semi-groups and rings.
A semi-group consists of a set S and a mapping of S × S into S
such that the mapping is associative, i.e., if x, y and z are elements
of S then
x(yz) = (xy)z .
Drazin and Munn introduce the following definition.
Definition 2.6 An element x in S is said to be pseudo-
invertible if there exists an element x̄ in S such that
(i) x x̄ = x̄ x ,
(ii) x^n = x^{n+1} x̄ for some positive integer n,
and
(iii) x̄ = x̄^2 x .
The uniqueness of x̄ (if it exists) has been proved by Drazin.
In terms of the associative ring of n×n matrices under multipli-
cation we see that any normal matrix (any matrix X such that X'X = XX')
is pseudo-invertible in the sense of Munn and Drazin since
Xt X = (X'X)t X'X = (XX')t XX' = X't X' = (X Xt)' = X Xt ,
X = X Xt X = X^2 Xt ,
and Xt = Xt X Xt = Xt^2 X .
Using the above definition of a Pseudo-Inverse Munn and Drazin investi-
gate the implications of the concept of Pseudo-Inverse in the structure
of rings and semi-groups. In particular Drazin gives sufficient condi-
tions for x to be pseudo-invertible when (S, ⊗) is a ring and Munn
obtains necessary and sufficient conditions for an element of a
semi-group to be pseudo-invertible. This condition is simply that some
power of x lies in a subgroup of (S, ⊗) .
Foulis [1963] in a recent paper has considered Pseudo-Inverses in a
special type of semi-group. His work is highly algebraic in structure
and includes as a special case certain of the results obtained by Desoer
and Whalen [1963] in Hilbert space contexts.
CHAPTER 3. THEORETICAL RESULTS
3.1 Summary
In Chapter 2 several definitions of Generalized Inverses were
presented. It is the purpose of the present chapter to investigate some
of the implications of these definitions. Let g(A), r(A), n(A) and
t(A) denote the set of all Generalized, Reflexive Generalized, Normal-
ized Generalized and Pseudo-Inverses respectively, of a matrix A. We
are interested in investigating relationships between a property of A
and the corresponding property of a typical element in g(A), r(A), n(A)
or t(A). The property of rank is discussed in Section 3.2, symmetry
in Section 3.3, and eigenvalues and eigenvectors in Section 3.4. In
statistical applications of Generalized Inverses orthogonal projections,
represented by idempotent symmetric matrices, play an important role and
these aspects of Generalized Inverses are discussed in Section 3.5. In
Section 3.6 several results are proved which indicate that under certain
conditions g(A), r(A), n(A) and t(A) can be characterized by the
relation of a property of A to the corresponding property of a typical
element in i(A), i=g, r, n, t.
3.2 Results on Rank
One of the most important characteristics of a matrix is its rank.
In the case of a Generalized Inverse Ag, of a matrix A, the rank of Ag
satisfies the following inequality
Rank (Ag) ≥ Rank (A) .
The proof of this result hinges on the following Lemma.
Lemma 3.1: Rank (AAg) = Rank (AgA) = Rank (A).
Proof: Since the rank of a product of two matrices does not exceed the
rank of either factor the conclusions follow from
Rank (A) ≥ Rank (AAg) ≥ Rank (AAgA) = Rank (A),
and Rank (A) ≥ Rank (AgA) ≥ Rank (AAgA) = Rank (A) .
Using Lemma 3.1 we have
Theorem 3.1: Rank (Ag) ≥ Rank (AgA) = Rank (A).
If A is a square matrix the following theorem indicates that
singularity of A and singularity of its Generalized Inverse, Ag, are
not equivalent concepts.
Theorem 3.2: If A is singular then a non-singular Generalized
Inverse exists.
Proof: From Theorem 2.7 a Generalized Inverse of A is given by
P2 Bg P1 where
    P1 A P2 = B = [ I_r  0 ]        Bg = [ I_r  0 ]
                  [  0   0 ]   and       [  0   I ] .
Since P1 and P2 are non-singular it follows that Ag = P2 Bg P1 is a
non-singular Generalized Inverse of A.
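A small numerical instance of this construction (the matrices P1, P2 and the choice Bg = I below are illustrative assumptions consistent with Theorem 2.7, not taken from the text):

```python
import numpy as np

# Theorem 3.2 in action: a singular A with a non-singular Generalized
# Inverse, built as Ag = P2 Bg P1 with Bg = I.
A = np.array([[2.0, 0.0], [0.0, 0.0]])        # singular, rank 1
P1 = np.array([[0.5, 0.0], [0.0, 1.0]])       # chosen so P1 A P2 = diag(1, 0)
P2 = np.eye(2)

B = P1 @ A @ P2
assert np.allclose(B, np.diag([1.0, 0.0]))

Bg = np.eye(2)                                 # non-singular choice of Bg
Ag = P2 @ Bg @ P1

assert np.allclose(A @ Ag @ A, A)              # Ag is a Generalized Inverse
assert abs(np.linalg.det(Ag)) > 1e-12          # and Ag is non-singular
```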
One of the most useful results concerning inverses of square
matrices is that CAB, where C, A, and B are non-singular, is in-
vertible with inverse given by B^-1 A^-1 C^-1. It is interesting and very
useful (see Chapter 4) to note that a similar result holds for Gener-
alized Inverses.
Theorem 3.3: If C and B are non-singular square matrices and
A is any matrix such that CAB exists, then a Generalized Inverse of
CAB is given by B^-1 Ag C^-1 .
Proof: (CAB)(B^-1 Ag C^-1)(CAB) = C A Ag A B = CAB.
One would hope that an analogous result would hold if B and C
were replaced by singular matrices and B^-1 Ag C^-1 replaced by Bg Ag Cg,
but consideration of the equation
    (CAB)(Bg Ag Cg)(CAB)
indicates that such a result cannot hold in general. A special case of
some interest occurs when C and B are commutative idempotent matrices
which also commute with A. Then since a Generalized Inverse of an idem-
potent matrix is itself we have (CAB)(B Ag C)(CAB) = CAB Ag CAB = CBA Ag ACB
= CBACB = CBCBA = CBA = CAB. Such a result could be of interest in the
study of idempotent matrices and their associated projections.
When we consider Reflexive Generalized Inverses the inequality of
Theorem 3.1 becomes an equality.
Theorem 3.4: If Ar is a Reflexive Generalized Inverse of A
then Rank (Ar) = Rank (A).
Proof: Theorem 3.1 applied to A and another application of Theorem
3.1 to Ar yields the results
Rank (Ar) ≥ Rank (A),
Rank (A) ≥ Rank (Ar),
and hence Rank (Ar) = Rank (A) .
The analogue of Theorem 3.3 holds for Reflexive Generalized Inverses
also.
35
Theorem 3.5: If C and B are non-singular square matrices and
A is any matrix such that CAB exists then a Reflexive Generalized
Inverse of CAB is given by B^-1 Ar C^-1 .
Proof: In view of Theorem 3.3 the equation
(CAB)r(CAB)(CAB)r = (B^-1 Ar C^-1)(CAB)(B^-1 Ar C^-1) = B^-1 Ar A Ar C^-1
= B^-1 Ar C^-1
completes the proof.
In view of Theorem 3.4 both Normalized Generalized Inverses and
Pseudo-Inverses preserve the rank of the original matrix. Theorems 3.3
and 3.5, however, become somewhat weaker when we restrict the class of
Generalized Inverses. Simple computations show that a Normalized Gener-
alized Inverse of BAC, where C is any non-singular square matrix and
B is any orthonormal matrix, is C^-1 An B'. As might be expected the
Pseudo-Inverse of BAC when B and C are orthonormal is C' At B'
(this result was in fact given by Penrose [1955]).
3.3 Symmetry Results
In many applications of matrices square symmetric matrices become
extremely important. In view of the importance of symmetry and the
simple fact that the inverse of a non-singular symmetric matrix is again
symmetric, a natural question which arises is the extent to which Gener-
alized Inverses preserve or destroy symmetry of the original matrix.
The first result states that the two concepts are not equivalent
for Generalized Inverses and Reflexive Generalized Inverses while the
second is an existence theorem guaranteeing that symmetric Generalized
and Reflexive Generalized Inverses of symmetric matrices exist.
36
Theorem 3.6: If A is symmetric, then it is not necessarily true
that Ag or Ar is symmetric.
Proof: By Theorem 2.7 a Generalized Inverse of A is given by
and it is clear from this form that Ag need not be symmetric.
Similarly a Reflexive Generalized Inverse of A is given by
and again it is clear that Ar need not be symmetric.
Theorem 3.7: If A is symmetric then there exists a symmetric Ag
and a symmetric Ar.
Proof: In Theorem 2.7 we can choose P1 and P2 so that P2 = P1'.
If we choose Bg and Br to be symmetric the conclusion follows from
Theorem 3.5.
The situation for Normalized Generalized Inverses of symmetric
matrices is not quite so simple. Consideration of an explicit numerical
example of a Normalized Generalized Inverse of a symmetric matrix A
shows that a Normalized Generalized Inverse of a symmetric matrix need
not be symmetric. If the Normalized Generalized Inverse is symmetric
then it is the Pseudo-Inverse. Thus the existence result for Normalized
Generalized Inverses of symmetric matrices has little importance since,
as we shall show below, the Pseudo-Inverse of a symmetric matrix is
symmetric.
Theorem 3.8: If A is symmetric then the Pseudo-Inverse of A is
symmetric.
Proof: From (ii) of Theorem 2.9 we have (A')t = (At)'. Hence
At = (A')t = (At)' ,
so that At is symmetric.
Theorem 3.8 can, using (v) of Theorem 2.9, be strengthened to the
statement that A normal implies At normal.
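Theorem 3.8 is easy to check numerically (a sketch; `numpy.linalg.pinv` is assumed to compute the Pseudo-Inverse, and the singular symmetric matrix is an illustrative choice):

```python
import numpy as np

# Theorem 3.8: the Pseudo-Inverse of a symmetric matrix is symmetric,
# even when the matrix is singular.
A = np.array([[1.0, 1.0], [1.0, 1.0]])   # symmetric, rank 1
A_dag = np.linalg.pinv(A)

assert np.allclose(A_dag, A_dag.T)       # At is symmetric
assert np.allclose(A_dag, A / 4.0)       # for this A, At = A / 4
```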
3.4 Eigenvalues and Eigenvectors
The importance of the spectrum of a matrix and more generally the
spectrum of a linear transformation on a linear space is well known.
Classical matrix theory discusses the spectrum of a matrix via study of
the eigenvectors or characteristic vectors and their associated eigen-
values or characteristic roots. It is known that if x is an eigen-
vector of an invertible matrix A then x will also be an eigenvector
of A^-1. The corresponding eigenvalues will be reciprocals. Since
Generalized Inverses behave so much like inverses a natural subject for
investigation is the relation of the eigenvectors (values) of a square
matrix A and those of an associated Generalized Inverse Ag.
For any square matrix M let Λ1(M) denote the set of non-zero
eigenvalues of M, i.e.,
Λ1(M) = {λ : λ ≠ 0 and Mx = λx for some x ≠ 0}
and let Λ1^-1(M) = {λ^-1 : λ ∈ Λ1(M)}. For any square matrices M and N
let the sets Λ2(M) and Λ2^-1(N) be defined by
Λ2(M) = {x : x ≠ 0 and Mx = λx for some λ ∈ Λ1(M)}
Λ2^-1(N) = {x : x ≠ 0, Nx = λ^-1 x and x ∈ Λ2(M)} .
If A is any square matrix and Ag is any Generalized Inverse of
A then we define properties R1 and R2 as follows:
Property R1: Λ1(Ag) = Λ1^-1(A),
Property R2: Λ2(Ag) = Λ2^-1(A) .
If Ag is such that property R1 is satisfied we say that Ag has
property R1, and similarly for property R2.
Theorem 3.9: Generalized Inverses and Reflexive Generalized In-
verses do not necessarily possess properties R1 and R2.
Proof: There is a 3 × 3 matrix A with characteristic roots 0, 1, 3
which possesses a Generalized Inverse Ag and a Reflexive Generalized
Inverse Ar such that the eigenvalues of Ag are 1, 1, 1, while those
of Ar are (3 - √5)/2 and (3 + √5)/2. Note that the multiplicity
of the zero eigenvalue is not preserved with Ag. Consideration of a
second example shows that a Reflexive Generalized Inverse of a symmetric
matrix can possess property R1 but not necessarily property R2, since
a vector can be an eigenvector of Ar and not an eigenvector of A.
Theorem 3.10: If A is symmetric then the non-zero eigenvalues of
A and any Normalized Generalized Inverse of A are reciprocals, i.e.,
An possesses property R1 if A is symmetric.
Proof: If λ is a non-zero eigenvalue of A there exists x ≠ 0 such
that
A x = λ x .
Multiplying both sides of the above equation by λ^-1 A An yields
A An x = x. Since (A An)' = A An we have
x = (A An)' x = An' A x or An' x = λ^-1 x .
Hence λ^-1 is a non-zero eigenvalue of An .
If μ is a non-zero eigenvalue of An then there exists a non-
zero vector y such that
An y = μ y .
Hence
A An y = μ A y
or
An' A y = μ A y .
Thus A y is an eigenvector of An' with eigenvalue μ. Multiplying
both sides by A gives A An' A y = μ A(A y).
Hence A(A y) = μ^-1 (A y) or μ^-1 is a non-zero eigenvalue of A. Note
that A y ≠ 0 since if A y = 0 then An' A y = 0 or A An y = 0 and hence
An y = 0 which is a contradiction.
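The reciprocal-eigenvalue relation of Theorem 3.10 can be checked numerically (a sketch; the Pseudo-Inverse computed by `numpy.linalg.pinv` stands in for An, since the Pseudo-Inverse is a Normalized Generalized Inverse, and the symmetric matrix below is an illustrative choice):

```python
import numpy as np

# Theorem 3.10 check: for symmetric singular A, the non-zero eigenvalues
# of A and of a Normalized Generalized Inverse are reciprocals.
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 0.0]])            # eigenvalues 0, 1, 3
A_n = np.linalg.pinv(A)

eig_A = sorted(x for x in np.linalg.eigvalsh(A) if abs(x) > 1e-10)
eig_An = sorted(x for x in np.linalg.eigvalsh(A_n) if abs(x) > 1e-10)
assert np.allclose(sorted(1.0 / x for x in eig_A), eig_An)
```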
For non-symmetric matrices the conclusion of Theorem 3.10 is not
valid: there is a (non-symmetric) matrix A with eigenvalues 0 and 3
whose Normalized Generalized Inverse An has eigenvalues 0 and 1/5.
The following Lemma gives a sufficient condition for a Reflexive
Generalized Inverse to possess property ~.
Lemma 3.2: A Reflexive Generalized Inverse Ar of A possesses
property R2 if A and Ar commute (written A <-> Ar).
Proof: If A <-> Ar then for λ ≠ 0 we have
A x = λ x => A x = λ A Ar x => λ x = λ^2 Ar x => λ^-1 x = Ar x ,
and the same argument applied to Ar in place of A shows that
Ar x = λ^-1 x implies A x = λ x .
The following simple corollary to Lemma 3.2 indicates that Pseudo-
Inverses of normal matrices are indeed well behaved.
Theorem 3.11: If A is normal then the Pseudo-Inverse, At,
possesses property R2 .
Proof: By (viii) of Theorem 2.9, A normal implies that A At = At A.
From Lemma 3.2 it follows that At possesses property R2 .
It is curious to note that if A is not normal then its Pseudo-
Inverse need not even possess property R1. An example of this is
furnished by a matrix A with eigenvalues 0 and 3 whose Pseudo-Inverse
At has eigenvalues 0 and 3/10. Necessary conditions for the Pseudo-
Inverse of an arbitrary matrix to possess property R1 or R2 are not
known at present.
3.5 Orthogonal Projections
The topic of orthogonal or perpendicular projections finds wide
application in the theory of matrices and linear spaces (such as the
spectral representation of a linear operator) as well as in the applica-
tion of matrix theory to other fields (the sum of squares due to a
hypothesis can be defined using the notion of an orthogonal projection
on a certain subspace). We recall (see Section 2.6) that an orthogonal
projection is simply a symmetric idempotent matrix. In this section we
shall see that Generalized Inverses can be classified partly by the
projection properties of A Ag and Ag A. Simple matrix multiplication
shows that both A Ag and Ag A are idempotent. Neither Generalized In-
verses nor Reflexive Generalized Inverses necessarily yield orthogonal
projections, because A Ag and Ag A are not necessarily symmetric.
Theorem 3.12: A Reflexive Generalized Inverse is a Normalized
Generalized Inverse if and only if A Ar is an orthogonal projection.
Proof: By the definitions.
Theorem 3.13: A Reflexive Generalized Inverse is the Pseudo-
Inverse if and only if A Ar and Ar A are orthogonal projections.
Proof: Again obvious from the definitions.
Note that Theorems 3.12 and 3.13 merely put conditions (iii) of
Definition 2.3 and (iv) of Definition 2.4 into geometric terms.
For applications to the theory of least squares it is of interest
to note that the trace of AgA and AAg is equal to the rank of A
(see Rao and Chipman [1964] for a proof of the result that the trace of
an idempotent matrix is equal to its rank).
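The trace-equals-rank fact is easy to confirm numerically (a sketch with a made-up rank-2 matrix; `numpy.linalg.pinv` supplies one convenient Generalized Inverse):

```python
import numpy as np

# trace(AgA) = Rank(A): AgA is idempotent, and the trace of an
# idempotent matrix equals its rank.
rng = np.random.default_rng(1)
B = rng.standard_normal((4, 2))
A = B @ rng.standard_normal((2, 5))        # rank 2 by construction
Ag = np.linalg.pinv(A)

P = Ag @ A
assert np.allclose(P @ P, P)                              # idempotent
assert np.isclose(np.trace(P), np.linalg.matrix_rank(A))  # trace = rank = 2
```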
3.6 Characterization of Generalized Inverses
Thus far in this Chapter we have investigated the relation between
various properties of a matrix A and the corresponding properties of
a typical element of g(A), r(A), n(A) or t(A). In this section we
carry this investigation further and find criteria to characterize the
sets g(A), r(A), n(A) and t(A).
The first result indicates that rank plays an important role in
distinguishing between Generalized Inverses and Reflexive Generalized
Inverses.
Theorem 3.14: A Generalized Inverse is a Reflexive Generalized
Inverse if and only if Rank Ag = Rank A.
Proof: If Ar is a Reflexive Generalized Inverse then by Theorem 3.4
Rank Ar = Rank A.
To establish the converse let Ag be an arbitrary Generalized
Inverse of A which is of the same rank as A. By Theorem 2.7 there
exist non-singular P1 and P2 such that Ag = P2 Bg P1 where
    B = [ I_r  0 ]        Bg = [ I_r  V ]
        [  0   0 ]   and       [  W   X ] ,
the matrices V, W and X being arbitrary. If Rank A = Rank Ag then
Rank A = Rank Ag = Rank Bg = Rank B. We see that
    Rank Bg = Rank [ I_r  V ] = Rank [ I_r     0     ] ≥ Rank B .
                   [  W   X ]        [  0   X - W V  ]
Hence Rank Bg = Rank B implies X - W V = 0 or X = W V .
Thus
    Bg = [ I_r   V  ]
         [  W   W V ]
if Ag is a Generalized Inverse with Rank Ag = Rank A. We now find
that
    Bg B Bg = [ I_r   V  ][ I_r  0 ][ I_r   V  ] = [ I_r   V  ] = Bg .
              [  W   W V ][  0   0 ][  W   W V ]   [  W   W V ]
Hence
    Ag A Ag = P2 Bg (P1 A P2) Bg P1 = P2 Bg B Bg P1 = P2 Bg P1 = Ag
and Ag is a Reflexive Generalized Inverse.
As has been the case throughout this Chapter the properties of the
Normalized Generalized Inverse are somewhat nebulous. The following
theorem, a slightly stronger version of Lemma 2.4 due to Zelen and
Goldman, characterizes Normalized Generalized Inverses in terms of
Generalized Inverses.
Theorem 3.15: A Generalized Inverse is a Normalized Generalized
Inverse if and only if it can be written in the form
An = (A'A)g A'
for some Generalized Inverse (A'A)g of A'A.
Proof: If An = (A'A)g A' where (A'A)g is a Generalized Inverse of
A'A then the equations
A An A = A(A'A)g A'A = A
and
(A An)' = A An
show that An is a Normalized Generalized Inverse of A.
The converse follows immediately from Lemma 2.4.
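The forward direction of Theorem 3.15 can be confirmed numerically (a sketch; `numpy.linalg.pinv` of A'A serves as one admissible choice of (A'A)g, and the rank-deficient matrix is a made-up example):

```python
import numpy as np

# Theorem 3.15 check: An = (A'A)g A' is a Normalized Generalized Inverse.
rng = np.random.default_rng(4)
A = rng.standard_normal((5, 3))
A[:, 2] = A[:, 0] - A[:, 1]                  # make A rank-deficient (rank 2)

An = np.linalg.pinv(A.T @ A) @ A.T

ok_g = np.allclose(A @ An @ A, A)            # An is a Generalized Inverse of A
ok_n = np.allclose(A @ An, (A @ An).T)       # A An is symmetric (normalized)
assert ok_g and ok_n
```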
The property of commutativity of A and Ar serves to characterize
the Pseudo-Inverse of a symmetric matrix as the following theorem indi-
cates.
Theorem 3.16: If A is symmetric then A <-> At. Conversely,
if A and Ar are symmetric and A <-> Ar then Ar = At .
Proof: If A is symmetric, (viii) of Theorem 2.9 implies that
At A = A At .
If A and Ar are symmetric and commute then
(A Ar)' = Ar' A' = Ar A = A Ar
and
(Ar A)' = A' Ar' = A Ar = Ar A .
Hence Ar = At .
The Pseudo-Inverse of a symmetric matrix can also be characterized
in terms of An and Property R2 .
Theorem 3.17: If the Normalized Generalized Inverse An of a
symmetric matrix possesses Property R2 then An = At .
Proof: Let λ1, λ2, ..., λr denote the non-zero eigenvalues of A and
E1, E2, ..., Er the corresponding eigenspaces. Also let E0 denote
the eigenspace spanned by the vectors in the null space of A. Property
R2 implies that
An x_i = λ_i^-1 x_i ,   i = 1, 2, ..., r
where x_i is an arbitrary element in E_i, i = 1, 2, ..., r .
We may write any vector x as
x = x_1 + x_2 + ... + x_r + x_0
where x_i ∈ E_i, i = 1, 2, ..., r and x_0 ∈ E_0. Hence
A An x = x_1 + ... + x_r
and An A x = An A(x_1 + ... + x_r + x_0) = x_1 + ... + x_r .
It follows that A An = An A. Hence An A = (A An)' = A An is symmetric,
and thus An = At .
CHAPTER 4. COMPUTATIONAL METHODS
4.1 Summary
In this chapter certain computational methods for finding
Generalized Inverses are discussed. The subject is as vast as the
general theory of solving systems of linear equations and hence no claim
to completeness can be made.
In Section 4.2 an expression is obtained for a Generalized Inverse
of a suitably partitioned matrix. This result is subsequently used in
obtaining Generalized Inverses of Bordered Matrices in Section 4.5.
Section 4.3 provides a review of some computational methods. Some of
the theory underlying the Abbreviated Doolittle technique is discussed
using Generalized Inverses in Section 4.4.
4.2 An Expression for a Generalized Inverse of a Partitioned Matrix
In this section we shall develop a formula for obtaining a Gener-
alized Inverse of a suitably partitioned symmetric matrix. We note
that computation of Normalized Generalized Inverses and Pseudo-Inverses
of a matrix X can be performed by finding either a Generalized Inverse
or the Pseudo-Inverse of A = X'X (see Theorem 3.15 and (viii) of
Theorem 2.9). Matrices of the form X'X arise frequently in statis-
tical applications and occur commonly in partitioned form as
(4.1)    X'X = [ X1'X1  X1'X2 ]
               [ X2'X1  X2'X2 ] .
For use in algebraic expressions occurring in statistical applications
and as a computational method to find Normalized Generalized Inverses
and Pseudo-Inverses we shall develop an expression for a Generalized
Inverse of a matrix partitioned in the form (4.1).
Theorem 4.1: If a matrix X'X is partitioned as in (4.1),
then a Generalized Inverse of X'X is given by
(4.2)
(X'X)g = [ (X1'X1)g + (X1'X1)g(X1'X2)Qg(X2'X1)(X1'X1)g   -(X1'X1)g(X1'X2)Qg ]
         [ -Qg(X2'X1)(X1'X1)g                             Qg                ]
where Q = X2'X2 - (X2'X1)(X1'X1)g(X1'X2), and Qg and (X1'X1)g are any
Generalized Inverses of Q and X1'X1 respectively.
Proof: Formula (4.2) can be directly verified by forming (X'X)(X'X)g(X'X)
and simplifying. It is of some interest, however, to observe that (4.2)
can be obtained in a straightforward manner using the results of
previous chapters. Using the relations X1' = (X1'X1)(X1'X1)g X1' and
X1 = X1(X1'X1)g X1'X1 we see that P1 X'X P2 = Z where
    Z = [ X1'X1  0 ]
        [   0    Q ]   and Q = X2'X2 - X2'X1(X1'X1)g X1'X2 .
Since X'X = P1^-1 Z P2^-1 it follows from Theorem 3.3 that a Generalized
Inverse of X'X is
    (X'X)g = P2 Zg P1
where Zg is any Generalized Inverse of Z. The Generalized Inverse of
Z which yields (4.2) is
    Zg = [ (X1'X1)g  0  ]
         [    0      Qg ] .
Note that other choices of Zg would be permissible and it appears
possible that certain of these choices would lead to simpler expressions
for (X'X)g although the choice made above is a natural one.
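Formula (4.2) transcribes directly into a numerical check (a sketch with made-up data; `numpy.linalg.pinv` supplies the arbitrary Generalized Inverses Qg and (X1'X1)g):

```python
import numpy as np

# Direct transcription of (4.2) and verification that it yields a
# Generalized Inverse of X'X.
rng = np.random.default_rng(2)
X1 = rng.standard_normal((6, 2))
X2 = rng.standard_normal((6, 3))
X = np.hstack([X1, X2])

A11, A12 = X1.T @ X1, X1.T @ X2      # X1'X1, X1'X2
A21, A22 = X2.T @ X1, X2.T @ X2      # X2'X1, X2'X2

A11g = np.linalg.pinv(A11)
Q = A22 - A21 @ A11g @ A12
Qg = np.linalg.pinv(Q)

G = np.block([
    [A11g + A11g @ A12 @ Qg @ A21 @ A11g, -A11g @ A12 @ Qg],
    [-Qg @ A21 @ A11g,                     Qg],
])

XtX = X.T @ X
assert np.allclose(XtX @ G @ XtX, XtX)    # G is a Generalized Inverse of X'X
```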
The following paragraph serves as an introduction to the proofs of
Theorems 4.2 and 4.3. Using the relations X1 = X1(X1'X1)g(X1'X1),
X1' = X1'X1(X1'X1)g X1', the definition of Q and simple matrix multiplica-
tions shows that
(4.3)
(X'X)g(X'X)(X'X)g =
[ (X1'X1)g(X1'X1)(X1'X1)g + (X1'X1)g X1'X2 Qg Q Qg X2'X1(X1'X1)g   -(X1'X1)g X1'X2 Qg Q Qg ]
[ -Qg Q Qg X2'X1(X1'X1)g                                            Qg Q Qg               ]
Using the results of (4.3) the following extended form of Theorem 4.1
can be stated.
Theorem 4.2: If X'X is partitioned as in (4.1) then a Reflexive
Generalized Inverse of X'X is given by the expression (4.2) with Qg
and (X1'X1)g replaced by Qr and (X1'X1)r,
where Q = X2'X2 - X2'X1(X1'X1)r X1'X2 and Qr and (X1'X1)r are any Reflex-
ive Generalized Inverses of Q and (X1'X1) respectively.
Proof: From Theorem 4.1 it follows that (X'X)(X'X)r(X'X) = X'X. From
equation (4.3) with (X1'X1)r(X1'X1)(X1'X1)r = (X1'X1)r and Qr Q Qr = Qr it
is clear that (X'X)r(X'X)(X'X)r = (X'X)r, which completes the proof.
The results of Theorem 4.1 and Theorem 4.2 can be extended, under
certain circumstances, to include Normalized Generalized Inverses and
Pseudo-Inverses. Before indicating these extensions we establish the
following Lemma:
Lemma 4.1: If X'X is partitioned as
    X'X = [ X1'X1  X1'X2 ]
          [ X2'X1  X2'X2 ]
where Rank X1'X1 = r, X2'X2 is non-singular (of order q × q) and
Rank X'X = r + q, then
    Q = X2'X2 - X2'X1(X1'X1)g X1'X2
is non-singular.
Proof: Since Q is q × q it suffices to prove that Rank Q = q.
Since the rank of a matrix is unaltered by pre- or post-multiplication
by non-singular matrices it follows that the rank of X·X is the same
as that of Z, defined in the proof of Theorem 4.1. Hence
Rank Z = Rank (X'X) = r + q .
On the other hand Rank Z = Rank (X1'X1) + Rank Q = r + Rank Q. Hence
Rank Q = q.
Using Lemma 4.1 we can establish the following result.
Theorem 4.3: If X'X is partitioned as in (4.1),
where the assumptions of Lemma 4.1 are fulfilled, then
(a) the expression (4.2) with (X1'X1)g replaced by (X1'X1)n and Qg
replaced by Q^-1,
where (X1'X1)n is any Normalized Generalized Inverse of (X1'X1), is a
Normalized Generalized Inverse of X'X;
(b)
(X'X)t = [ (X1'X1)t + (X1'X1)t(X1'X2)Q^-1(X2'X1)(X1'X1)t   -(X1'X1)t(X1'X2)Q^-1 ]
         [ -Q^-1(X2'X1)(X1'X1)t                             Q^-1                ]
where (X1'X1)t is the Pseudo-Inverse of (X1'X1), is the Pseudo-Inverse
of (X'X).
Proof: (a) (X'X)(X'X)n(X'X) = X'X and (X'X)n(X'X)(X'X)n = (X'X)n by
Theorem 4.2. From (4.3) it follows that (X'X)(X'X)n = [(X'X)(X'X)n]' .
(b) By part (a) (X'X)(X'X)t(X'X) = X'X, (X'X)t(X'X)(X'X)t = (X'X)t
and (X'X)(X'X)t = [(X'X)(X'X)t]'. From (4.3) it follows that
(X'X)t(X'X) = [(X'X)t(X'X)]' .
It is to be noted that if X'X is non-singular and (X1'X1) is
non-singular then (4.2) reduces to a commonly given expression for the
inverse of a partitioned matrix. The expression given by (4.2) can be
used to find Normalized Generalized Inverses of arbitrary partitioned
matrices. If a matrix A is partitioned as
    A = [ A11  A12 ]
        [ A21  A22 ]
we simply apply (4.2) to A'A and form (A'A)g A',
which, according to Theorem 3.15, is a Normalized Generalized Inverse of
A.
4.3 Some Computational Procedures for Finding Generalized Inverses
4.3.1 Generalized Inverses
Finding Generalized Inverses, according to the definition, is
simply a matter of solving the linear equations
    A Y A = A
for the unknown matrix Y and expressing a solution as
    Ag = Y .
The matrix Ag will then be a Generalized Inverse of A. There is a
wealth of literature dealing with the solution of systems of linear
equations (see for example Bodewig [1959], Dwyer [1951], and Fadeeva
[1959]) and to present a review of this material would be out of place
here. In Section 4.4 we shall present a method which has gained wide
acceptance in statistical circles. For the present the method called
sweep-out or pivotal condensation seems to be the simplest, although
not necessarily the most compact way of finding a Generalized Inverse
of an arbitrary matrix.
Consider the n × p matrix X_np and the augmented matrix
    [ X_np   I_(n) ] .
One performs elementary row and column operations on this matrix until
it has been reduced to the form
    [ I_(r)     0_r,p-r    |  P1 ]
    [ 0_n-r,r   0_n-r,p-r  |     ]
where r = Rank (X_np), or more simply
    [ X*  P1 ] .
It follows that P1 X P2 = X* and using Theorem 3.3 a Generalized Inverse
of X is given by
(4.4)    Xg = P2 X*g P1
where a possible X*g is given by
    X*g = [ I_(r)     0_r,n-r   ]
          [ 0_p-r,r   0_p-r,n-r ] .
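The sweep-out procedure can be sketched in code (an illustrative implementation, not from the text: P1 accumulates the row operations, P2 the column operations, and partial pivoting is added for numerical safety):

```python
import numpy as np

# Sweep-out computation of a Generalized Inverse: reduce X to
# X* = diag(I_r, 0) via row operations (in P1) and column operations
# (in P2); then Xg = P2 X*g P1 with X*g = diag(I_r, 0) of transposed shape.
def sweep_out_ginverse(X, tol=1e-10):
    X = X.astype(float)
    n, p = X.shape
    B = X.copy()
    P1, P2 = np.eye(n), np.eye(p)
    r = 0
    while r < min(n, p):
        sub = np.abs(B[r:, r:])                 # search for a pivot
        i, j = np.unravel_index(np.argmax(sub), sub.shape)
        if sub[i, j] <= tol:
            break
        i += r; j += r
        B[[r, i]] = B[[i, r]]; P1[[r, i]] = P1[[i, r]]              # row swap
        B[:, [r, j]] = B[:, [j, r]]; P2[:, [r, j]] = P2[:, [j, r]]  # column swap
        scale = B[r, r]
        B[r] /= scale; P1[r] /= scale           # normalize the pivot row
        for k in range(n):                      # clear the pivot column
            if k != r and abs(B[k, r]) > 0:
                f = B[k, r]
                B[k] -= f * B[r]; P1[k] -= f * P1[r]
        for k in range(r + 1, p):               # clear the pivot row
            f = B[r, k]
            B[:, k] -= f * B[:, r]; P2[:, k] -= f * P2[:, r]
        r += 1
    Xstar_g = np.zeros((p, n)); Xstar_g[:r, :r] = np.eye(r)
    return P2 @ Xstar_g @ P1

X = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0]])                 # rank 2
Xg = sweep_out_ginverse(X)
assert np.allclose(X @ Xg @ X, X)
```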
4.3.2 Reflexive Generalized Inverses
The computation of Reflexive Generalized Inverses can be achieved
in a manner analagous to that used to compute Generalized Inverses.
If Xnp is an arbitrary matrix one forms the augmented matrix
_[xnp I I(n)]M- I(P) 0 ....
One then performs elementary row and column operations on M until M
becomes
~:p I ~l~where
X* = ~I(r) I ~r,p-r ]np 0 0...m-r,r ...~r,p-r
with r = Rank (X ).np
*It then follows that P1XP2 = X and using Theorem 3.5 we see that
is a Reflexive Generalized Inverse of X.
If A is a symmetric matrix written as A = X'X it is possible
to give a particularly simple method for computing a Reflexive Gener-
alized Inverse. Pre- and post-multiplication of X'X by elementary
matrices E', E results in
    E'X'XE = [ X1'X1  X1'X2 ]
             [ X2'X1  X2'X2 ]
where the row space of [X1'X1 | X1'X2] constitutes a basis for the row
space of E'X'XE. (Hence X1'X1 is non-singular.) As is well known (see
Roy [1957])
    X2'X2 = X2'X1(X1'X1)^-1 X1'X2 .
Routine matrix multiplication shows that a Reflexive Generalized Inverse
of E'X'XE is given by
    Q = [ (X1'X1)^-1  0 ]
        [     0       0 ] .
Hence, by Theorem 3.5, a Reflexive Generalized Inverse of X'X is given
by
(4.6)    (X'X)r = E Q E' ,
say.
A direct verification of this result is simple since
    E'(X'X)(X'X)r(X'X)E = E'(X'X)[E Q E'](X'X)E
                        = [E'(X'X)E] Q [E'(X'X)E]
                        = E'X'XE
and
    (X'X)r(X'X)(X'X)r = E Q(E'X'XE)Q E' = E Q E' = (X'X)r .
Note that (X'X)r as given by (4.6) is not necessarily a Normalized
Generalized Inverse of (X'X) but that (X'X)r X' is a Normalized
Generalized Inverse of X. Another method of computing a Reflexive
Generalized Inverse will be given in Section 4.5 which does not require
that X'X be partitioned as above.
4.3.3 Normalized Generalized Inverses
By Theorem 3.15 any Generalized Inverse of the form (X'X)g X' is
a Normalized Generalized Inverse of X. Hence, the computation of a
Normalized Generalized Inverse can be reduced to the computation of a
Generalized Inverse.
Alternatively, the method due to Zelen and Goldman [1963] can be
used. If one writes A = X'X (A is p × p of rank r) and chooses
R (R is p × (p-r) of rank (p-r)) so that B'R is non-singular where
B'A = 0, then a Normalized Generalized Inverse of X is given by C X',
where C is determined as follows. As discussed in Section 2.4, if one
finds M^-1 and partitions it in the same manner as M, then C occupies
the same position in M^-1 as A does in M, where
    M = [ A   R ]
        [ R'  0 ] .
4.3.4 Pseudo-Inverses
Computational methods for finding Pseudo-Inverses have been explored
from various viewpoints. Some of the methods previously published seem
to have little practical value, however.
Equation (viii) of Theorem 2.9 allows us to concentrate on methods
for obtaining the Pseudo-Inverse of a matrix of the form A = X'X.
Penrose [1956] showed that the Pseudo-Inverse of A = X'X can be
expressed in terms of ordinary inverses via the formula
(4.8)    (X'X)t = [ X1'X1 ] P [ X1'X1  X1'X2 ]
                  [ X2'X1 ]
where
    P = [(X1'X1)^2 + (X1'X2)(X2'X1)]^-1 (X1'X1) [(X1'X1)^2 + (X1'X2)(X2'X1)]^-1 .
Since we can rearrange the rows and columns of X'X by pre- and post-
multiplication by suitable orthonormal matrices, (vi) of Theorem 2.9
shows that there is no loss of generality in assuming that [X1'X1 | X1'X2]
forms a basis for the row space of X'X.
Greville [1959] showed that if one factors X_np as X_np = B_nr C_rp,
where r = Rank X and B_nr and C_rp are both of rank r, then
Xt can be expressed as
(4.9)    Xt = C'(CC')^-1 (B'B)^-1 B' .
In the case A = X'X we can factor A as A = T'T where T is r × p
of rank r and (4.9) reduces to
(4.10)    At = T'(TT')^-2 T .
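Greville's formula (4.9) can be tried out numerically (a sketch; the full-rank factorization below is simply built by construction, and `numpy.linalg.pinv` provides the reference Pseudo-Inverse):

```python
import numpy as np

# Greville's formula (4.9): Xt = C'(CC')^-1 (B'B)^-1 B' for any full-rank
# factorization X = B C.
rng = np.random.default_rng(3)
B = rng.standard_normal((5, 2))
C = rng.standard_normal((2, 4))
X = B @ C                                     # rank 2 by construction

X_dag = C.T @ np.linalg.inv(C @ C.T) @ np.linalg.inv(B.T @ B) @ B.T

assert np.allclose(X_dag, np.linalg.pinv(X))  # agrees with the SVD Pseudo-Inverse
```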
In a recent paper van der Vaart [1964] has used a particularly simple
matrix factorization to obtain an expression for the Pseudo-Inverse. If
X_np is a matrix of rank r simply select an orthonormal basis for the
row space of X_np. If the row vectors of this basis are p1', p2', ...,
pr' then we can write
    X = Q P'
where P' is the r × p matrix with rows p1', p2', ..., pr' and Q = XP .
It is then easy to verify that
    Xt = P(Q'Q)^-1 Q' .
Boot [1963] has also given a formula for the Pseudo-Inverse of a
matrix in terms of ordinary inverses. We write X'_pn so that its first
r rows are linearly independent (where r = Rank X). Then one defines
F'_rp and G'_rn by the equations
    X'X = [ F'_rp ]           X' = [ G'_rn ]
          [  N'   ]   and          [  M'   ] .
The Pseudo-Inverse of X is then given by
(4.11)    Xt_pn = F(F'F)^-1 G' .
Boot gives two proofs of this result. The first is based on the result
(see Ben Israel and Wersan [1962] or Rao and Chipman [1964]) that the
Pseudo-Inverse Xt of a matrix X is that matrix which minimizes the
trace of Xt(Xt)' subject to the restriction that (X'X)Xt = X'. The
alternative proof is algebraic and like Greville's treatment rests on a
factorization of X.
Zelen and Goldman [1963] mention that one can compute the Pseudo-
Inverse of A = X'X (where rank A = r) by choosing R = B in Lemma
2.1. The explicit formula
(4.12)    At = [I - R(R'R)^-1 R'](A + RR')^-1
is presented.
It is interesting to note that (4.12) can be directly verified using the properties of the Pseudo-Inverse. We see that

[A + RR']^{-1} = [A + RR']^t = A^t + (RR')^t

by (iii) and (vii) of Theorem 2.9. Hence

[I - R(R'R)^{-1}R'][A + RR']^{-1} = [I - R(R'R)^{-1}R'][A^t + (RR')^t] = A^t ,

since it is easily verified that (RR')^t = R(R'R)^{-2}R' and since

R(R'R)^{-1}R'A^t = R(R'R)^{-1}R'A^t A A^t = R(R'R)^{-1}R'A A^t A^t = 0 .
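The same verification can be run numerically. In this sketch (an illustration with hypothetical data, not from the text) R is a single column spanning the null space of A = X'X, playing the role of B in Lemma 2.1:

```python
import numpy as np

X = np.array([[1., 1., 0.],
              [1., 1., 0.],
              [1., 0., 1.]])
A = X.T @ X                              # symmetric, rank 2, p = 3
R = np.array([[1.], [-1.], [-1.]])       # spans the null space of A, so R'A = 0
# formula (4.12): A^t = [I - R(R'R)^{-1}R'](A + RR')^{-1}
P = np.eye(3) - R @ np.linalg.inv(R.T @ R) @ R.T
At = P @ np.linalg.inv(A + R @ R.T)
```

The computed At coincides with the Moore-Penrose inverse of A.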
4.4 Abbreviated Doolittle Technique
In this section we indicate a presentation of the theory behind the
Abbreviated Doolittle computing technique widely used by statisticians
in dealing with linear equations, and illustrate how this technique can
be used to obtain Generalized Inverses. We consider matrices of the
form X'X and linear equations of the form (X'X)b = X'y. The development followed here closely follows that of Anderson [1959] who considered the case where X'X is non-singular.
In Section 5.7 we return to the Abbreviated Doolittle and its place in both the theory and application of least squares.
We recall that a matrix X can be factored as follows
X = ZH
where Z is such that Z' Z = D, a diagonal matrix, and H is an upper
right triangular matrix with all diagonal elements unity. To conform
with the usual presentation of the Abbreviated Doolittle technique we
write X = ZH instead of the more natural X = HZ. The above factori-
zation is simply a result of the Gram-Schmidt Orthogonalization Process
applied to the columns of X. Let z_1, z_2, ..., z_p and x_1, x_2, ..., x_p be the columns of Z and X respectively. Then z_1, z_2, ..., z_p are determined recursively as

(4.13)    z_i = x_i - Σ_{j=1}^{i-1} (z_j'x_i / z_j'z_j) z_j
for i = 1,2,...,p. Since no assumptions have been made about the rank of X some care needs to be taken if z_ℓ = 0 for some ℓ. In this case for q = 1,2,...,p-ℓ we take

z_{ℓ+q} = x_{ℓ+q} - Σ_{j=1, j≠ℓ}^{ℓ+q-1} (z_j'x_{ℓ+q} / z_j'z_j) z_j .
It is clear that
ZH = X
where
(4.14)    If z_i ≠ 0:    H_ij = z_i'x_j / z_i'z_i   for j = i+1, i+2, ..., p ,
                         H_ij = 1                   for j = i ,
                         H_ij = 0                   for j = 1, 2, ..., i-1 ;

          if z_i = 0:    H_ij = 0   for j = i+1, i+2, ..., p ,
                         H_ij = 1   for j = i ,
                         H_ij = 0   for j = 1, 2, ..., i-1 .
Hence H is upper right triangular, non-singular and has diagonal elements unity. The vectors z_1, z_2, ..., z_p are orthogonal by construction and hence

(4.14')    Z'Z = D ,

where D is diagonal. Thus we have

(4.14'')    X'X = H'Z'ZH .
From Theorem 3.3 we see that a Generalized Inverse of X'X is given by

(4.15)    (X'X)^g = H^{-1}(Z'Z)^g(H')^{-1} ,

where (Z'Z)^g is a Generalized Inverse of (Z'Z). Since (Z'Z) is diagonal a Generalized Inverse is easily found. If we define (Z'Z)^g by

(4.16)    (Z'Z)^g = diag d_ii ,   where d_ii = 1/z_i'z_i if z_i ≠ 0 and d_ii is arbitrary if z_i = 0 ,

then (X'X)^g is a Generalized Inverse of (X'X).
If (Z'Z)^r is defined by

(4.17)    (Z'Z)^r = diag d_ii ,   where d_ii = 1/z_i'z_i if z_i ≠ 0 and d_ii = 0 if z_i = 0 ,

then (X'X)^r is a Reflexive Generalized Inverse of (X'X).
The above remarks indicate that if one can obtain the matrices H
and Z one can easily find a Generalized Inverse or a Reflexive Gener-
alized Inverse of X'X.
The equations (4.13) and (4.14) for the determination of Z and H
indicate that such determination of Z and H is anything but trivial.
Fortunately the application of the Abbreviated Doolittle Computational
Procedure yields a systematic method for obtaining H. It turns out
that Z is not needed explicitly and the procedure yields Z'Z in a
simple manner.
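The recursion (4.13)-(4.14) and the inverses (4.15)-(4.17) can be sketched directly. The following is a modern numpy illustration under the stated conventions (the rank-deficient test matrix is hypothetical):

```python
import numpy as np

def zh_factor(X, tol=1e-10):
    """Gram-Schmidt factorization X = ZH of (4.13)-(4.14): the columns
    of Z are orthogonal (a column is set to zero when x_i depends
    linearly on its predecessors), H is unit upper triangular."""
    n, p = X.shape
    Z = np.zeros((n, p))
    H = np.eye(p)
    for i in range(p):
        z = X[:, i].copy()
        for j in range(i):
            d = Z[:, j] @ Z[:, j]
            if d > tol:
                H[j, i] = (Z[:, j] @ X[:, i]) / d
                z -= H[j, i] * Z[:, j]
        if z @ z <= tol:
            z[:] = 0.0            # convention (4.14): keep H[i, i] = 1
        Z[:, i] = z
    return Z, H

def gen_inverse(X, tol=1e-10):
    """(4.15)-(4.17): (X'X)^g = H^{-1}(Z'Z)^g(H')^{-1} with the diagonal
    g-inverse of Z'Z; choosing 0 for the 'arbitrary' zero pivots makes
    the result simultaneously a Reflexive Generalized Inverse."""
    Z, H = zh_factor(X, tol)
    d = np.diag(Z.T @ Z)
    dg = np.where(d > tol, 1.0 / np.where(d > tol, d, 1.0), 0.0)
    Hi = np.linalg.inv(H)
    return Hi @ np.diag(dg) @ Hi.T
```

Applied to a rank-deficient X, the factorization reproduces X, Z'Z comes out diagonal, and the resulting matrix G satisfies both AGA = A and GAG = G for A = X'X.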
The Abbreviated Doolittle computing procedure can be conveniently
expressed in the format given in Table 4.1. The quantities a_ij = x_i'x_j are the elements of the X'X matrix and a_iy = x_i'y for i = 1,2,...,p.
Some difficulty arises in using the format indicated in Table 4.1 when
Table 4.1. The Abbreviated Doolittle format

A_1':  a_11  a_12  ...  a_1i  ...  a_1p  a_1y    where A_1j = a_1j ;                      j = 1,2,...,p,y
B_1':  1     B_12  ...  B_1i  ...  B_1p  B_1y    where B_1j = A_1j/A_11 ;                 j = 1,2,...,p,y
A_2':  0     A_22  ...  A_2i  ...  A_2p  A_2y    where A_2j = a_2j - B_12 A_1j ;          j = 1,2,...,p,y
B_2':  0     1     ...  B_2i  ...  B_2p  B_2y    where B_2j = A_2j/A_22 ;                 j = 1,2,...,p,y
  .
  .
A_i':  0     0     ...  A_ii  ...  A_ip  A_iy    where A_ij = a_ij - Σ_{k<i} B_ki A_kj ;  j = 1,2,...,p,y
B_i':  0     0     ...  1     ...  B_ip  B_iy    where B_ij = A_ij/A_ii ;                 j = 1,2,...,p,y
  .
  .
A_p':  0     0     ...  0     ...  A_pp  A_py    where A_pj = a_pj - Σ_{k<p} B_kp A_kj ;  j = 1,2,...,p,y
B_p':  0     0     ...  0     ...  1     B_py    where B_pj = A_pj/A_pp ;                 j = 1,2,...,p,y
A_ℓℓ = 0. This will occur if and only if x_ℓ is a linear combination of x_1, x_2, ..., x_{ℓ-1}. More precisely, if x_ℓ = Σ_{i=1}^{ℓ-1} c_i x_i then, since the equations X'Xb = X'y are consistent, it can be shown that A_ℓj = 0 for j = 1,2,...,p,y. In such a case we define

(4.18)    B_ℓj = 0 for j > ℓ ,    B_ℓj = 0 for j < ℓ ,    B_ℓj = 1 for j = ℓ .
Similar procedures are followed if several A_ii's are null. The reason
for this somewhat peculiar convention will become clear in what follows.
Recalling the definition of z_1 given in (4.13) we see that

A_1' = [z_1'z_1, z_1'x_2, ..., z_1'x_p, z_1'y]

and

B_1' = [1, z_1'x_2/z_1'z_1, ..., z_1'x_p/z_1'z_1, z_1'y/z_1'z_1] .

Also B_21 = 0 ; B_2j = z_2'x_j/z_2'z_2 for j = 2,3,...,p ; and B_2y = z_2'y/z_2'z_2 .
In general it follows that, for ℓ = 1,2,...,p with z_ℓ ≠ 0,

(4.19)    A_ℓj = 0 for j < ℓ ;   A_ℓℓ = z_ℓ'z_ℓ ;   A_ℓj = z_ℓ'x_j for j = ℓ+1, ℓ+2, ..., p ;
          B_ℓj = 0 for j < ℓ ;   B_ℓℓ = 1 ;         B_ℓj = z_ℓ'x_j / z_ℓ'z_ℓ for j = ℓ+1, ℓ+2, ..., p ;
          A_ℓy = z_ℓ'y ;         B_ℓy = z_ℓ'y / z_ℓ'z_ℓ .

If z_ℓ = 0 we follow the convention that

(4.19')    B_ℓj = 0 for j < ℓ ;   B_ℓℓ = 1 ;   B_ℓj = 0 for j > ℓ ;   B_ℓy = 0 .
It is clear that the matrix of "B" rows reproduces H:

B = [ B_1' ]
    [ B_2' ]
    [  ... ]  = H ,
    [ B_p' ]

and, with D = Z'Z = diag(A_11, ..., A_pp),

(4.20)    A = [ A_1' ]
              [  ... ]  = DB ,
              [ A_p' ]

where the matrices Z and H are as defined by equations (4.13) and (4.14). It is also of interest to note that

(4.21)    (Z'Z)H = (Z'Z)B = A .
The matrix H^{-1} = B^{-1} can easily be found by the following sequential scheme. Remembering that the inverse of a triangular matrix is again triangular, we see that the matrix equation

[ 1  B_12  B_13 ...  B_1p ] [ C_11  C_12 ...  C_1p ]   [ 1  0 ...  0 ]
[ 0   1    B_23 ...  B_2p ] [  0    C_22 ...  C_2p ] = [ 0  1 ...  0 ]
[ .                     . ] [  .                 .  ]   [ .         . ]
[ 0   0     0   ...   1   ] [  0     0   ...  C_pp ]   [ 0  0 ...  1 ]

yields the following set of equations for determining the elements of the C matrix:

(4.22)    C_ij = 1 for i = j ;    C_ij = 0 for j < i ;    Σ_{k=i}^{j} B_ik C_kj = 0 for j > i ,

for i = 1,2,...,p.
The simplest procedure is to start with the equation

C_{p-1,p} + C_pp B_{p-1,p} = 0

and proceed in a sequential fashion to determine the other C_ij's. It is clear that this procedure gives a routine method for finding B^{-1}. Thus the computation of a Generalized Inverse or a Reflexive Generalized Inverse can easily be performed within the Abbreviated Doolittle format utilizing equation (4.15), i.e., (X'X)^g = B^{-1}(Z'Z)^g(B')^{-1}.
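The sequential scheme (4.22) amounts to a short back-substitution loop. A minimal sketch (modern illustration, with a made-up unit upper triangular B):

```python
import numpy as np

def unit_upper_inv(B):
    """Invert a unit upper triangular B by the scheme (4.22):
    C[i, i] = 1, C[i, j] = 0 for j < i, and for j > i the relation
    sum_{k=i}^{j} B[i, k] C[k, j] = 0 determines C[i, j] sequentially."""
    p = B.shape[0]
    C = np.eye(p)
    for j in range(p):
        for i in range(j - 1, -1, -1):
            C[i, j] = -B[i, i + 1:j + 1] @ C[i + 1:j + 1, j]
    return C

B = np.array([[1., 2., 3.],
              [0., 1., 4.],
              [0., 0., 1.]])
```

Working up each column from the diagonal reproduces the order suggested in the text, starting from the C_{p-1,p} equation.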
It is of interest to note that the Doolittle procedure can be used to compute certain matrix products. To be specific we shall indicate how to compute L(X'X)^r L'. Form the augmented matrix

M = [X'X | L']

and carry through the forward solution of the Abbreviated Doolittle on M. Note that since M is rectangular with p rows the Doolittle procedure will stop after p steps. If one forms the matrix Ā, consisting of the "A" rows of the Doolittle, and partitions it in the same way as M, it follows from Ā = (B')^{-1}M that

(4.23')    Ā = [(Z'Z)B | (B')^{-1}L'] = [Ā_1 | Ā_2] .

If we let B̄ = [B̄_1 | B̄_2] designate the analogously partitioned matrix of "B" rows, it follows from the definition of the Doolittle operations that B̄_2 = (Z'Z)^r Ā_2. Hence

(4.24)    L(X'X)^r L' = Ā_2' B̄_2 .
Equation (4.24) gives a simple procedure for computing a Normalized Generalized Inverse of (X'X). Recall from Theorem 2.18 that a Normalized Generalized Inverse of (X'X) is given by

(4.25)    (X'X)^n = [(X'X)(X'X)]^r (X'X) .

Upon forming the augmented matrix [(X'X)(X'X) | X'X], the computation of (X'X)^n can be performed by an application of the Doolittle forward solution and equation (4.24). The computation of the matrix product L(X'X)^r L' discussed above is a generalization of
Aitken's triple product method [1931] and is discussed in a recent
paper by Rohde and Harvey [1964].
The computation of the Pseudo-Inverse of (X'X) can be achieved by the Doolittle using the factorization representation discussed in Section 4.3.4, namely

(X'X)^t = T'(TT')^{-2}T ,

where T (r x p) is determined from the equations

X'X = T'T ,   Rank T = r = Rank(X'X) .

To compute T we let T consist of the r non-zero rows of D^{1/2}B, where D and B are as defined in (4.20). It is easy to see that X'X = T'T. The matrix product T'(TT')^{-2}T can, of course, be computed from the Doolittle utilizing the augmented matrix

(4.26)    [(TT')(TT') | T] .

Note that the determination of T is identical with the determination of the square root of X'X and hence the square root method or any similar factorization procedure can be used to obtain the Pseudo-Inverse in a routine fashion.
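Formula (4.10) is easy to check numerically. The sketch below obtains a rank factorization X'X = T'T from an eigendecomposition — a stand-in for the square-root/Doolittle determination of T described above; the test matrix is hypothetical:

```python
import numpy as np

def pinv_from_TT(A, tol=1e-10):
    """Pseudo-inverse of a symmetric non-negative definite A via a
    rank factorization A = T'T (T of full row rank r), using the
    representation A^t = T'(TT')^{-2}T of (4.10)."""
    w, U = np.linalg.eigh(A)
    keep = w > tol
    T = np.sqrt(w[keep])[:, None] * U[:, keep].T     # T is r x p
    M = T @ T.T                                      # r x r, invertible
    return T.T @ np.linalg.inv(M @ M) @ T
```

For a rank-deficient A = X'X the result matches `np.linalg.pinv`.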
4.5 Some Results on Bordered Matrices

A widely used technique in the solution of extremum problems is the Lagrangian Multiplier Technique. In this technique one considers the problem of maximizing (minimizing) a function F of p variables θ_1, θ_2, ..., θ_p subject to s < p restrictions. Let the restrictions be given by ψ_i(θ_1, θ_2, ..., θ_p) = r_i for i = 1,2,...,s. Define

θ' = (θ_1, θ_2, ..., θ_p) ,
λ' = (λ_1, λ_2, ..., λ_s) ,
ψ'(θ) = (ψ_1(θ), ψ_2(θ), ..., ψ_s(θ)) .

A solution to the problem of finding an extremum of F subject to the stated restrictions is often obtained by equating the partial derivatives of the Lagrangian function

F(θ) + λ'[ψ(θ) - r]

with respect to θ and λ to zero and solving for θ.

Under certain circumstances (e.g. when F is a quadratic form in θ and ψ(θ) is linear in θ) the above procedure results in a set of linear equations of the form

(4.27)    [ A   R ] [ θ ]   [ c ]
          [ R'  0 ] [ λ ] = [ r ] ,

for appropriate A, R and c.
The matrix

[ A   R ]
[ R'  0 ]

is, for obvious reasons, called a Bordered Matrix. As a useful application of the theory of Generalized Inverses we shall discuss briefly the problem of obtaining a solution to a set of linear equations of the form (4.27). We shall do this by a simple application of the results of Section 4.2.

Forming

L = [ A'  R ] [ A   R ]   [ A'A + RR'   A'R ]   [ K     A'R ]
    [ R'  0 ] [ R'  0 ] = [ R'A         R'R ] = [ R'A   R'R ] , say,

we see that a Generalized Inverse of L is given by

L^g = [ K^g + K^g A'R Q^g R'A K^g    -K^g A'R Q^g ]
      [ -Q^g R'A K^g                  Q^g         ] ,

where Q = R'R - R'A K^g A'R. Hence a Normalized Generalized Inverse of

[ A   R ]
[ R'  0 ]

is given by

(4.28)    [ K^g A' - K^g A'R(Q^g R' - Q^g R'A K^g A')    K^g R + K^g A'R Q^g R'A K^g R ]
          [ Q^g R' - Q^g R'A K^g A'                      -Q^g R'A K^g R               ] .
The general formula given by (4.28) can be specialized to yield the
results concerning Bordered Matrices which are widely known as well as
those which are not.
One application of Bordered Matrices is, as indicated in Section 4.3, the computation of Normalized Generalized Inverses and Pseudo-Inverses. Using (4.28) we shall derive alternative, but equivalent, expressions for the Normalized Generalized Inverses and Pseudo-Inverses given in Section 4.3.
Case I: If we adopt the assumptions used by Zelen and Goldman [1963], i.e., A is symmetric, p x p of rank r; B'A = 0 where B is p x (p-r) of rank (p-r); and R'B is non-singular, then

F = [ A   R ]
    [ R'  0 ]

is non-singular. Letting

F^{-1} = [ C    U ]
         [ U'   V ]

be partitioned in the same manner as F, we see that the identity FF^{-1} = I yields the equations

AC + RU' = I ,
R'C = 0 ,
R'U = I ,
and AU + RV = 0 .
Thus U' = (B'R)^{-1}B' and V = 0. Since the inverse of F is unique we have

F^{-1} = [ C              B(R'B)^{-1} ]
         [ (B'R)^{-1}B'   0           ]

       = [ K^g A + K^g AR Q^g R'A K^g A - K^g AR Q^g R'    K^g R + K^g AR Q^g R'A K^g R ]
         [ Q^g R' - Q^g R'A K^g A                          -Q^g R'A K^g R               ] ,

where Q = R'R - R'A K^g AR and K = A^2 + RR'. It follows that

-Q^g R'A K^g R = 0 ,
K^g R = B(R'B)^{-1} ,
and Q^g R' - Q^g R'A K^g A = (B'R)^{-1}B' .

Noting that the assumptions imply that K^g = K^{-1} = (A^2 + RR')^{-1}, we have an explicit expression for C given by

C = K^g A - K^g AR[Q^g R' - Q^g R'A K^g A]
  = K^g A[I - R(B'R)^{-1}B']
  = (A^2 + RR')^{-1}A[I - R(B'R)^{-1}B'] .

This is an alternative form of the expression given in Section 4.3.3. It is easily checked that C is a Reflexive Generalized Inverse of A^2 and hence CA is a Normalized Generalized Inverse of A.
Case II: If we let B = R in Case I we see that the assumptions imply that

K = A^2 + RR' ,   R'R   and   Q = R'R - R'A K^{-1}AR

are all invertible. Obviously F is symmetric, so (4.28) becomes the ordinary inverse F^{-1}. Since the inverse of a symmetric matrix is again symmetric it follows that

C = (A^2 + RR')^{-1}A[I - R(R'R)^{-1}R']

is symmetric. If we let A^t be the Pseudo-Inverse of A we see that

A^t = C = (A^2 + RR')^{-1}A[I - R(R'R)^{-1}R'] = (A^2 + RR')^{-1}A ,

since AR = 0. This expression involves only one matrix inversion while that given by Zelen and Goldman involves two inversions.
CHAPTER 5. LEAST SQUARES APPLICATIONS

5.1 Introduction
In an excellent historical review Eisenhart [1963] has pointed out
that the analysis of a set of observed data using least squares tech-
niques is by no means new. It has only been in the last thirty years,
however, that research into the statistical properties and implications
of such analyses has been initiated. Much of this research grew out of
the application of least squares analyses to a wide class of models
variously called Analysis of Variance, Regression or General Linear
Models.
Generally speaking, least squares techniques yield tractable estima-
tors when the model used to represent the observed data is linear in the
parameters. There is concern about properties of least squares estima-
tors when parameters enter into models in a non-linear fashion. However,
alternative methods of estimation such as maximum likelihood, minimum
chi-square, minimum modified chi-square etc. can often be viewed as
quasi-least squares estimators.
Least squares theory (and indeed most of the important applications)
is relatively complete for the class of models which can be fitted into
the following definition.
Definition 5.1 A model for a set of observed data y is called a general linear model if

(5.1)    (i)  E(y) = Xβ ,   Rank X = r ≤ p < n ,
and      (ii) Var(y) = Vσ² ,   Rank V = n ,

where β and σ² are unknown parameters.
Of importance is inference about the parameters. We shall devote most of our attention to problems of estimation in this chapter. In the framework of Definition 5.1 not all the parameters are estimable in the following sense.

Definition 5.2 A parametric function ℓ'β is said to be linearly estimable if there exists c' such that

(5.2)    E(c'y) = ℓ'β

regardless of the value of β.
There is some controversy about restricting problems of estimation
to those parametric functions which are (linearly) estimable. Two
reasons can be given for such a restriction. First it can be shown
(Section 5.2) that estimable functions possess unique minimum variance
unbiased linear estimators (called BLUE estimators). The criterion of
minimum variance unbiasedness is widely used in other estimation problems (although occasionally it leads to nonsensical estimators, c.f. Lehmann [1950], pages 3-14 to 3-15) and hence such a restriction seems
reasonable. Second, most of the common applications require for their
solution that only estimable functions in the sense of Definition 5.2
need be considered.
In the next section we review some of the current approaches to
least squares theory in general linear models.
5.2 A Review of Some Current Theories
The theory of estimation in the general linear model is well
developed and can be compactly presented. In this section we shall
describe briefly the essence of this theory. We assume that we have a
vector of random variables y with the following properties

(5.3)    (i)  E(y) = Xβ ,   Rank X = r ≤ p ≤ n ,
and      (ii) Var(y) = Iσ² ,

where β and σ² are unknown parameters.
It is desired to estimate ℓ'β, a (linearly) estimable function in the sense of Definition 5.2, using a minimum variance linear unbiased estimator or BLUE estimator.
The most elegant proof of the existence and uniqueness of a BLUE estimator for ℓ'β uses the theory of projections on finite dimensional vector spaces. The details of this theory are available in many sources. For convenience we follow the line of development used by Scheffé [1959]. Recall that ℓ'β is linearly estimable if and only if there exists c such that ℓ' = c'X. Scheffé proves that there exists a unique linear unbiased estimator for ℓ'β given by c*'y. The vector c* lies in V_r, the vector space spanned by the columns of X. In addition, if c'y is any linear unbiased estimator of ℓ'β then c* is the projection of c on V_r along V_r^⊥. 1/ Using these results Scheffé establishes the celebrated Gauss-Markoff Theorem.
Theorem 5.1: (Gauss-Markoff) Under the model (5.3) every (linearly) estimable function ℓ'β has a unique linear unbiased estimator ℓ'β̂ which has minimum variance in the class of all linear unbiased estimators. The form of ℓ'β̂ is given by ℓ'β̂ = ℓ'b, where b is any solution to the "least squares" or "normal" equations

(X'X)b = X'y .

1/ Scheffé [1959] calls V_r the "estimation space" and V_r^⊥ the "error space".

From Theorem 2.5 we have b = (X'X)^g X'y. Hence the BLUE of ℓ'β is ℓ'b = ℓ'(X'X)^g X'y. It can also be shown that E[(y - Xb)'(y - Xb)] = E[SSE] = qσ², where q = n - Rank(X'X). Thus an unbiased estimator for σ² is σ̂² = SSE/q.
Other approaches to the theory of least squares are possible, and
have certain desirable qualities. The approach of Plackett [1960] will
be extended in Section 5.4. An approach due to Roy [1953] is of interest
because it is equivalent to using a Reflexive Generalized Inverse of
(X'X). Roy's approach is also important in that it often leads to more
efficient programming of linear model problems on high-speed computing
equipment as is brought out by the following paragraph.
Roy's approach utilizes a basis of X and is as follows. Rewrite the model (5.3) as

E(y) = [X_1 | X_2] [ β_1 ]
                   [ β_2 ] ,

where X_1 is a basis of X. Note that the elements of β have been rearranged to conform with the rearrangement of X. From the condition for linear estimability (5.2) it follows that ℓ'β is estimable if and only if there exists c such that

[ℓ_1' | ℓ_2'] = c'[X_1 | X_2] .

Using this form of the estimability condition Roy shows that the BLUE estimator for ℓ'β is given by

ℓ'β̂ = ℓ_1'(X_1'X_1)^{-1}X_1'y

and that this estimator coincides with the least squares estimator of ℓ'β. Roy also develops criteria for testing and establishing the testability of hypotheses and shows that these results are invariant under choice of a basis of X.
It is interesting to note that a simple proof of the invariance of, say, ℓ'β̂ under choice of a basis of X can be given using Generalized Inverses. We see that

ℓ'β̂ = ℓ'(X'X)^r X'y .

Since ℓ'β is estimable we have [ℓ_1' | ℓ_2'] = [c_1' | c_2'][X_1 | X_2] for some [c_1' | c_2']. Thus

ℓ'β̂ = [c_1' | c_2'] X(X'X)^r X'y ,

and since X(X'X)^r X' is uniquely determined by Theorem 1.6 it follows that another choice of basis for X will yield the same ℓ'β̂.
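The uniqueness of X(X'X)^r X' that drives this invariance argument is easy to see numerically. In the sketch below (a modern illustration with a hypothetical design matrix) a second generalized inverse is manufactured as G + (I - GA)W, which is a g-inverse of A for any choice of W:

```python
import numpy as np

X = np.array([[1., 1., 0.],
              [1., 1., 0.],
              [1., 0., 1.],
              [1., 0., 1.]])           # rank 2 design
A = X.T @ X
G1 = np.linalg.pinv(A)
W = np.full((3, 3), 0.3)               # arbitrary perturbation
G2 = G1 + (np.eye(3) - G1 @ A) @ W     # still satisfies A G2 A = A
```

Both choices give the same matrix X G X', as the invariance result asserts.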
The above approaches to the theory of the general linear model can be generalized in that one can replace the assumption that Var(y) = Iσ² by the assumption that Var(y) = Vσ², where V is positive definite and known. If V is not known, then estimates of its elements must be substituted and iteration will need to be used to find b. The theory necessary to handle linear model theory under the assumption that Var(y) = Vσ² and V known will be discussed in Section 5.3.
5.3 Weighted Least Squares and the Generalized Gauss-Markoff Theorem
In Section 5.2 it was mentioned that the assumptions of the general linear model could be somewhat relaxed in that Var(y) = Iσ² could be replaced by Var(y) = Vσ². In this section we define the weighted least squares estimator of a linearly estimable function and show the equivalence of the weighted least squares estimator and the BLUE. The results extend Plackett's [1960] results and utilize the theory of Generalized Inverses. The proofs are simple, purely algebraic, and reduce to the case considered in Section 5.2 when V = I.
Definition 5.3 If a set of linear parametric functions Lβ is such that there exists C satisfying E[Cy] = Lβ for all β, then Lβ is said to be a (linearly) estimable set of parametric functions.

Theorem 5.2: If Lβ is a set of linearly estimable functions then Cy, the best (minimum variance) linear unbiased estimator for Lβ, is given when

C = L(X'V^{-1}X)^g X'V^{-1} .

Further, the value of Var(Cy) is L(X'V^{-1}X)^g L'.
Before proving Theorem 5.2 we need to establish the following Lemma.

Lemma 5.1: The identity (X'V^{-1}X)(X'V^{-1}X)^g X'V^{-1} = X'V^{-1} holds, and the quantity V^{-(1/2)}X(X'V^{-1}X)^g X'V^{-(1/2)'} is uniquely determined, symmetric and idempotent.

Proof: Let Y = V^{-(1/2)}X, where V^{(1/2)'}V^{(1/2)} = V. Then V^{-(1/2)}X(X'V^{-1}X)^g X'V^{-(1/2)'} = Y(Y'Y)^g Y' is uniquely determined, symmetric and idempotent by Theorem 2.6. Also by Theorem 2.6 it follows that

(Y'Y)(Y'Y)^g Y' = Y' ,

which is the stated identity.
Proof of Theorem 5.2. If Cy is to be unbiased for Lβ then E[Cy] = CXβ = Lβ for all β. Hence CX = L. We must show that the diagonal elements of Var(Cy) = CVC', subject to CX = L, are minimized when C = L(X'V^{-1}X)^g X'V^{-1}. We see that

[C - L(X'V^{-1}X)^g X'V^{-1}] V [C - L(X'V^{-1}X)^g X'V^{-1}]'
  = CVC' - CVV^{-1}X(X'V^{-1}X)^{g'}L' - L(X'V^{-1}X)^g X'V^{-1}VC'
        + L(X'V^{-1}X)^g X'V^{-1}VV^{-1}X(X'V^{-1}X)^{g'}L'
  = CVC' - CX(X'V^{-1}X)^{g'}X'C' - CX(X'V^{-1}X)^g X'C'
        + CX(X'V^{-1}X)^g (X'V^{-1}X)(X'V^{-1}X)^{g'}X'C'
  = CVC' - CX(X'V^{-1}X)^g X'C'   by Lemma 5.1.

Hence

Var(L̂β) = Var(Cy) = CVC'
  = L(X'V^{-1}X)^g L' + [C - L(X'V^{-1}X)^g X'V^{-1}] V [C - L(X'V^{-1}X)^g X'V^{-1}]' .

Thus each diagonal element of Var(L̂β) is minimized when

C - L(X'V^{-1}X)^g X'V^{-1} = 0 ,

or when C = L(X'V^{-1}X)^g X'V^{-1}. With this choice of C we have Var(L̂β) = L(X'V^{-1}X)^g L'.
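As a numerical illustration of Theorem 5.2 (hypothetical data, not from the text), the BLUE of an estimable set Lβ is the same whichever generalized inverse of X'V^{-1}X is used:

```python
import numpy as np

X = np.array([[1., 1., 0.],
              [1., 1., 0.],
              [1., 0., 1.],
              [1., 0., 1.]])             # rank 2 design
V = np.diag([1., 2., 1., 2.])            # known positive definite variance matrix
y = np.array([1., 2., 3., 5.])
L = np.array([[0., 1., -1.]])            # estimable: a row in the row space of X
Vi = np.linalg.inv(V)
A = X.T @ Vi @ X
G1 = np.linalg.pinv(A)
G2 = G1 + (np.eye(3) - G1 @ A) @ np.ones((3, 3))   # a different g-inverse of A
blue1 = L @ G1 @ X.T @ Vi @ y
blue2 = L @ G2 @ X.T @ Vi @ y
```

The two estimates agree exactly, reflecting the invariance of L(X'V^{-1}X)^g X'V^{-1}y for estimable L.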
Having found the linear minimum variance unbiased estimator for Lβ we now show the equivalence of this estimator and the weighted least squares estimator.

Definition 5.4 The weighted least squares estimator for a linearly estimable set of parametric functions Lβ is given by Lb̂, where b̂ is any solution to the equations

(X'V^{-1}X)b = X'V^{-1}y .

Theorem 5.3: The vector b̂ used in determining Lb̂ minimizes the weighted error sum of squares

WSSE = (y - Xb)'V^{-1}(y - Xb) .

Proof: We first note that

WSSE = (y - Xb)'V^{-1}(y - Xb) = e'V^{-1}e ,

where

e = [(y - X(X'V^{-1}X)^g X'V^{-1}y) + (X(X'V^{-1}X)^g X'V^{-1}y - Xb)] .

Hence

WSSE = [y - X(X'V^{-1}X)^g X'V^{-1}y]'V^{-1}[y - X(X'V^{-1}X)^g X'V^{-1}y]
     - [y - X(X'V^{-1}X)^g X'V^{-1}y]'V^{-1}[Xb - X(X'V^{-1}X)^g X'V^{-1}y]
     - [Xb - X(X'V^{-1}X)^g X'V^{-1}y]'V^{-1}[y - X(X'V^{-1}X)^g X'V^{-1}y]
     + [Xb - X(X'V^{-1}X)^g X'V^{-1}y]'V^{-1}[Xb - X(X'V^{-1}X)^g X'V^{-1}y] .

By Lemma 5.1 we have

[Xb - X(X'V^{-1}X)^g X'V^{-1}y]'V^{-1}[y - X(X'V^{-1}X)^g X'V^{-1}y]
  = [b - (X'V^{-1}X)^g X'V^{-1}y]'[X'V^{-1}y - (X'V^{-1}X)(X'V^{-1}X)^g X'V^{-1}y] = 0 .

Similarly

[y - X(X'V^{-1}X)^g X'V^{-1}y]'V^{-1}[Xb - X(X'V^{-1}X)^g X'V^{-1}y] = 0 .

Hence

WSSE = y'[I - X(X'V^{-1}X)^g X'V^{-1}]'V^{-1}[I - X(X'V^{-1}X)^g X'V^{-1}]y
     + [Xb - X(X'V^{-1}X)^g X'V^{-1}y]'V^{-1}[Xb - X(X'V^{-1}X)^g X'V^{-1}y]

is minimized if b is chosen so that Xb = X(X'V^{-1}X)^g X'V^{-1}y. Obviously b = (X'V^{-1}X)^g X'V^{-1}y satisfies this equation. Further, if b satisfies the above equation then b also satisfies

(X'V^{-1}X)b = (X'V^{-1}X)(X'V^{-1}X)^g X'V^{-1}y = X'V^{-1}y ;

that is, b = b̂ = (X'V^{-1}X)^g X'V^{-1}y is a solution to the "normal" or "least squares" equations.
5.4 Linear Models with Restricted Parameters

Occasionally linear models arise in which the parameters β are restricted in a linear fashion. That is, it is known that

(5.4)    R'β = r

for some matrix R and vector r. In such a restricted model one feels that it is only reasonable that a set of estimators β̂ should satisfy (5.4).
We are thus led to consider a restricted linear model of the form

(5.5)    (i)   E(y) = Xβ ;   Rank X = r ≤ p < n ,
         (ii)  Var(y) = Vσ² ;   V positive definite ,
and      (iii) R'β = r .
Under the model (5.5) C. R. Rao [1952] established the following theorem.

Theorem 5.4: A linear function ℓ'β is estimable under (5.5) if and only if there exist c and d such that

c'X + d'R' = ℓ'   and   d'r = b_0

hold. If ℓ'β is estimable then its BLUE is given by c'y + b_0, where c' = λ'X'V^{-1} and b_0 = d'r. The quantities λ and d are any solution to the equations

[ X'V^{-1}X   R ] [ λ ]   [ ℓ ]
[ R'          0 ] [ d ] = [ 0 ] .
Proof: If c and d are such that c'X + d'R' = ℓ' and d'r = b_0, then c'y + b_0 is unbiased for ℓ'β since

E(c'y + b_0) = c'Xβ + b_0 = ℓ'β - d'R'β + b_0 = ℓ'β - d'r + d'r = ℓ'β .

If ℓ'β is estimable and R'β = r, then there exist c and b_0 such that E(c'y + b_0) = ℓ'β. Hence

c'Xβ + b_0 = ℓ'β .

Thus

c'X - ℓ' = -d'R'   and   b_0 = d'r

for some d.

To find the BLUE of ℓ'β one simply lets ℓ̂'β = c'y + b_0 and
determines c and b_0 such that Var(ℓ̂'β) is a minimum. Since Var(ℓ̂'β) = c'Vc we are led to form the Lagrangian function

F(c, λ_0, λ) = c'Vc + 2λ_0(b_0 - d'r) - 2λ'(X'c + Rd - ℓ) .

Taking partials with respect to c, λ_0, λ, b_0 and d yields the equations

∂F/∂c = 2c'V - 2λ'X' ,
∂F/∂λ_0 = 2(b_0 - d'r) ,
∂F/∂λ = -2(X'c + Rd - ℓ)' ,
∂F/∂b_0 = 2λ_0 ,
∂F/∂d = -2λ_0 r' - 2λ'R' .

Equating the partial derivatives to zero and simplifying yields the equations

c' = λ'X'V^{-1} ,   c'X + d'R' = ℓ' ,
b_0 = d'r ,   λ_0 = 0 ,
and λ'R = 0' .

Hence c' = λ'X'V^{-1} and b_0 = d'r, where λ and d are solutions to the equations

[ X'V^{-1}X   R ] [ λ ]   [ ℓ ]
[ R'          0 ] [ d ] = [ 0 ] .

Since the use of Lagrangian multipliers only yields necessary conditions for extrema, one still needs to verify that the above solutions do in fact yield a minimum. Rao has shown that these solutions do in fact minimize Var(ℓ̂'β) and that ℓ̂'β is uniquely determined.
Dwyer [1959] has obtained the natural generalization of Rao's formula to the case of a set of linear functions. He has shown that Lβ is a set of estimable functions in the model (5.5) if and only if there exist C and D such that

C'X + D'R' = L   and   D'r = b_0 .

If Lβ is an estimable set of linear functions then its BLUE is given by L̂β = C'y + b_0, where

C' = Λ'X'V^{-1}   and   b_0 = D'r .

The quantities Λ and D are given as any solution to the equations

[ X'V^{-1}X   R ] [ Λ ]   [ L' ]
[ R'          0 ] [ D ] = [ 0  ] .

Dwyer also gives an explicit expression for the BLUE in the case
that X is of full rank and R is of full rank. He mentions that Rao
did not obtain such an explicit expression for the BLUE of a single
estimable function. This is not surprising since Rao assumes that X
is not necessarily of full rank. Rao assumes, however, that R is of
full rank.
Using the theory of Generalized Inverses of bordered matrices we can give an explicit expression for L̂β which includes both Dwyer's
and Rao's results as special cases. Zelen and Goldman [1963] have
achieved such results but their formulae seem unnecessarily complicated
and cumbersome.
From Section 4.5 a Generalized Inverse of

[ X'V^{-1}X   R ]
[ R'          0 ]

is given by

[ K^g + K^g A'R Q^g R'A K^g    -K^g A'R Q^g ]
[ -Q^g R'A K^g                  Q^g         ] ,

where A = X'V^{-1}X, Q = R'R - R'A K^g A'R and K = A'A + RR'. Hence

C' = L[I - K^g A'R Q^g R']K^g X'V^{-1} ,

and thus

b_0 = L(I - K^g A')R Q^g r .
By allowing the assumptions on X and R to vary one can obtain as special cases of the above formulae the results of Dwyer and an explicit expression for Rao's situation. Note that since a Generalized Inverse of 0 is 0 there is no problem in treating the case R = 0 as in Dwyer's method.
The variance covariance matrix of the BLUE for Lβ can easily be obtained and is simply

Var(L̂β) = C'VC .

The expressions given by Dwyer for various special cases can be obtained by simple substitution into the above formula.
5.5 Linear Models with a Singular Variance-Covariance Matrix

In this section we generalize the results of Section 5.3 to the case where V is singular. This situation can occur when one of the y's is a linear combination of some of the other y's or when one of the y's is a constant. Zelen and Goldman [1963] have presented some results in this connection but their results seem too restrictive. The treatment presented here is straight-forward and utilizes the results of Section 5.4 as well as the theory of Generalized Inverses.
The linear model under consideration is now

(i)  E(y) = Xβ ,
and (ii) Var(y) = V ,   V positive semi-definite of rank q ≤ n .

Without loss of generality we may assume that V has been written in the form

V = [ V_11   V_12 ]
    [ V_21   V_22 ] ,

where [V_11 | V_12] is a basis for V. It then follows that V_11 is non-singular and that

V_22 = V_21 V_11^{-1} V_12 .

If we let

P' = [ I                 0 ]
     [ -V_21 V_11^{-1}   I ] ,

then simple matrix multiplications show that

P'VP = [ V_11   0 ]
       [ 0      0 ] .
It follows that the transformation z = P'y yields a linear model for z = [z_1 ; z_2], where z_1 corresponds to V_11, given by

(i)  E(z) = P'Xβ ,  with  E(z_1) = X_1 β  and  E(z_2) = (X_2 - V_21 V_11^{-1} X_1)β ,

(ii) Var(z) = [ V_11   0 ]
              [ 0      0 ] .

Hence z_2 = (X_2 - V_21 V_11^{-1} X_1)β with probability one.

It is clear that the model has been reduced by the transformation P' to a restricted model of the form considered in Section 5.4. The model is thus

(i)   E(z_1) = X_1 β ,
(ii)  Var(z_1) = V_11 ,
and (iii) R'β = (X_2 - V_21 V_11^{-1} X_1)β = z_2 .
From the results presented in Section 5.4 it follows that

L̂β = C'z_1 + b_0 ,

where

C' = L[I - K^g A'R Q^g R']K^g X_1'V_11^{-1} ,
b_0 = L(I - K^g A')R Q^g z_2 ,
A = X_1'V_11^{-1}X_1 ,
K = A'A + RR' ,
Q = R'R - R'A K^g A'R ,
and R' = X_2 - V_21 V_11^{-1} X_1 .
The formula presented in Section 5.4 for the variance covariance
matrix can be used to obtain the variance covariance matrix of the BLUE
of ~.
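The reduction can be checked numerically. The sketch below builds a singular V whose last two coordinates are exact linear functions of the first two (W is a hypothetical coefficient matrix) and verifies that P'VP has the stated block-diagonal form:

```python
import numpy as np

V11 = np.array([[2., 1.],
                [1., 2.]])                 # non-singular block
W = np.array([[1., 1.],
              [0.5, -0.5]])                # hypothetical: y_2 = W y_1 a.s.
V = np.block([[V11,      V11 @ W.T],
              [W @ V11,  W @ V11 @ W.T]])  # singular variance matrix, rank 2
P = np.block([[np.eye(2),         -W.T],   # P' = [I 0; -V21 V11^{-1} I]
              [np.zeros((2, 2)),  np.eye(2)]])
T = P.T @ V @ P                            # should equal diag(V11, 0)
```

Here V_21 V_11^{-1} = W, so the lower-left block of P' is -W as coded.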
5.6 Multiple Analysis of Covariance
The use of the theory of Generalized Inverses allows an uncluttered
treatment of multiple analysis of covariance. In particular the results
of Section 4.2 can be used to show that the estimators for the regres-
sion coefficients of the covariables are unbiased.
Suppose that we have the model

y = [X_1 | X_2] [ β_1 ]
                [ β_2 ]  + ε ,

where

y is an n x 1 vector,
ε is an n x 1 vector,
X_1 is an n x p matrix of rank r ≤ p < n,
X_2 is an n x q matrix of covariables of rank q < n,
β_1 is a p x 1 vector of parameters,
and β_2 is a q x 1 vector of regression coefficients for the covariables.

We make the usual assumptions that E(ε) = 0 and Var(ε) = Iσ², as well as the assumption that [X_1 | X_2] has rank (r + q) ≤ n.
The normal equations for estimating β_1 and β_2 are, in partitioned form,

[ X_1'X_1   X_1'X_2 ] [ b_1 ]   [ X_1'y ]
[ X_2'X_1   X_2'X_2 ] [ b_2 ] = [ X_2'y ] ,

where b_1 and b_2 are the estimates of β_1 and β_2 respectively. From Theorem 2.6 we have

b_2 = Q^{-1}[X_2' - X_2'X_1(X_1'X_1)^g X_1']y ,

where Q = X_2'X_2 - X_2'X_1(X_1'X_1)^g X_1'X_2. From Lemma 4.1 it follows that Q is non-singular. In terms of the model, using (4.3),

E(b_2) = Q^{-1}[X_2' - X_2'X_1(X_1'X_1)^g X_1'][X_1 β_1 + X_2 β_2] = Q^{-1}Q β_2 = β_2 .

It follows that β̂_2 is an unbiased estimator for β_2 with variance covariance matrix given by

V(β̂_2) = Q^{-1}[X_2' - X_2'X_1(X_1'X_1)^g X_1'][X_2 - X_1(X_1'X_1)^g X_1'X_2]Q^{-1} σ²
       = Q^{-1}[X_2'X_2 - X_2'X_1(X_1'X_1)^g X_1'X_2]Q^{-1} σ²
       = Q^{-1}QQ^{-1} σ²
       = Q^{-1} σ² .
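A quick numerical check of the unbiasedness claim (hypothetical design, covariable and parameters): applying the estimator to E(y) = X_1 β_1 + X_2 β_2 returns β_2 exactly.

```python
import numpy as np

X1 = np.array([[1., 1., 0.],
               [1., 1., 0.],
               [1., 0., 1.],
               [1., 0., 1.]])                 # rank 2 design
X2 = np.array([[0.5], [1.0], [1.5], [2.5]])   # one covariable, not in col(X1)
P1 = X1 @ np.linalg.pinv(X1.T @ X1) @ X1.T    # projection on col(X1)
M = np.eye(4) - P1
Q = X2.T @ M @ X2                             # non-singular here
beta1 = np.array([1., 2., -1.])
beta2 = np.array([0.7])
Ey = X1 @ beta1 + X2 @ beta2                  # E(y) under the model
b2_mean = np.linalg.solve(Q, X2.T @ M @ Ey)   # E(b2)
```

Since M annihilates X1, the β_1 part of the mean drops out and only β_2 survives.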
5.7 Use of the Abbreviated Doolittle in Linear Estimation Problems
In a recent paper Rohde and Harvey [1964] have generalized a method of computing certain matrix products originally due to Aitken [1937]. The Abbreviated Doolittle procedure described in Chapter 4 can be used to give a compact formulation of Aitken's method, which computes matrix products of the form CA^{-1}B. Of more importance in statistical applications is the computation of the matrix products which arise in the theory of linear estimation.

To be specific let us indicate how one can use the Doolittle procedure to compute the matrix products L(X'V^{-1}X)^g X'V^{-1}y and L(X'V^{-1}X)^g L', which arise in the theory developed in Section 5.3. Augment the X'V^{-1}X matrix and X'V^{-1}y column of the Doolittle by L' = X'C to obtain

[ X'V^{-1}X | X'V^{-1}y | X'C = L' ] .
If one carries through the forward solution of the Doolittle on the augmented matrix it follows from Section 4.5 that the matrix composed of the "A" rows is

[ (Z'Z)B | Z'y | Z'C ] .

Similarly the matrix composed of the "B" rows is

[ B | (Z'Z)^r Z'y | (Z'Z)^r Z'C ] .

We see immediately that

C'Z(Z'Z)^r Z'y = C'XB^{-1}(B'^{-1}X'V^{-1}XB^{-1})^r B'^{-1}X'V^{-1}y
              = C'X(X'V^{-1}X)^r X'V^{-1}y
              = L(X'V^{-1}X)^g X'V^{-1}y .
Similarly

L(X'V^{-1}X)^g L' = C'Z(Z'Z)^r Z'C .
Thus the computation of the BLUE and variance covariance matrix of
an estimable function can be obtained from the forward solution of the
properly augmented Doolittle. Note that it is the fact that (xlv-Ix)g
can be any Generalized Inverse of (xlv-Ix) which allows the above
procedure to function properly. By changing the (XIv-Ix) and L
matrices properly one can use the above method to find the BLUE and
variance-covariance matrix of the estimable functions considered in
Sections 5.4, 5.5 and 5.6.
5.8 Tests of Linear Hypotheses
Associated with the linear model (5.1) one can consider the testing of various (linear) hypotheses about β. Such hypotheses are often expressed in the form

H_0: Lβ = 0   vs.   H_A: Lβ ≠ 0 .

We confine ourselves to strongly testable hypotheses (hypotheses where L is of full rank and Lβ is a set of (linearly) estimable functions).
The statistic used to test the hypothesis H_0 will be composed of two parts. The numerator of the statistic consists of the sum of squares due to the hypothesis, SSH_0, divided by its degrees of freedom, Rank L, and the denominator of the statistic consists of the sum of squares due to error divided by its degrees of freedom, [n - Rank(X'X)]. The results on obtaining such quantities are widely known and available so we shall be content to present the formulas as follows:

SSH_0 = (Lb)'[L(X'V^{-1}X)^g L']^{-1}(Lb) = y'A_0 y ,   with b = (X'V^{-1}X)^g X'V^{-1}y ,
and
SSE = y'[V^{-1} - V^{-1}X(X'V^{-1}X)^g X'V^{-1}]y = y'A_e y .

We note in passing that both A_0 and A_e are symmetric and idempotent (in the sense that A_0 V A_0 = A_0 and A_e V A_e = A_e), and that A_0 V A_e = 0.

Intuitively SSH_0 is a measure of that part of the total sum of
squares which can be explained by the null hypothesis H_0, while SSE is a measure of the part of the total sum of squares which cannot be explained by the model. The usual F statistic defined by

(5.6)    F = [SSH_0 / Rank L] / [SSE / (n - Rank X'X)]
is thus seen to be a reasonable criterion to judge the import of the
hypothesis. The diVisors, [Rank L] and [n-Rank X'X], simply put the
numerator and denominator on a unit basis. This intuitive interpreta-
tion of the F statistic can be greatly strengthened when certain
additional distributional assumptions are included as part of the model.
More precisely if ε is assumed to have a multinormal distribution with
E(ε) = 0 and Var(ε) = V then the F statistic has, under the null
hypothesis Lβ = 0, a central F distribution and allows one to make
a judgment about the hypothesis via significance tests. When the null
hypothesis is false, say Lβ = δ ≠ 0, then the distribution of F is
a non-central F distribution with [Rank L] and [n − Rank X'X]
degrees of freedom.
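As a concrete check on the mechanics above, the following sketch computes SSH0, SSE and the F ratio of (5.6) for a small one-way layout, taking V = I and using numpy's pinv as one particular choice of generalized inverse. The design matrix, contrast L and data are illustrative inventions, not taken from the dissertation.

```python
import numpy as np

# Illustrative one-way layout: an over-parameterized design matrix
# (overall mean plus two group effects), so X'X is singular.
X = np.array([[1, 1, 0],
              [1, 1, 0],
              [1, 0, 1],
              [1, 0, 1]], dtype=float)
y = np.array([1.0, 2.0, 4.0, 5.0])
n = len(y)

# V = I here, so X'V^{-1}X = X'X; pinv supplies one generalized inverse.
G = np.linalg.pinv(X.T @ X)
beta_hat = G @ X.T @ y

# Strongly testable hypothesis L beta = 0 with L of full row rank;
# L beta must be estimable (here: the group-difference contrast).
L = np.array([[0.0, 1.0, -1.0]])
Lb = L @ beta_hat
SSH = Lb.T @ np.linalg.inv(L @ G @ L.T) @ Lb

# Residual sum of squares and the F ratio of (5.6).
SSE = y @ y - y @ X @ beta_hat
rank_L = np.linalg.matrix_rank(L)
rank_XX = np.linalg.matrix_rank(X.T @ X)
F = (SSH.item() / rank_L) / (SSE / (n - rank_XX))
print(F)
```

Because the contrast is estimable, SSH and SSE are invariant to which generalized inverse is chosen for X'X.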
The above distributional results follow directly from the following
two Lemmas due to Graybill [1961].

Lemma 5.2: If y has a multinormal distribution with E(y) = μ
and Var(y) = V then y'By has a non-central chi-square distribution
with [Rank B] degrees of freedom and non-centrality parameter λ
given by

    λ = (1/2) μ'Bμ ,

provided BV is idempotent.

Lemma 5.3: If y has a multinormal distribution with E(y) = μ
and Var(y) = V then y'Ay and y'By are independent if and only if
AVB = 0.
From the above Lemmas it follows from previous remarks that:

(i) SSH0 has a non-central chi-square distribution with [Rank L]
    degrees of freedom and non-centrality parameter given by

        λ = (1/2) (Lβ)'[L(X'V⁻¹X)^g L']⁻¹(Lβ) ,

(ii) SSE has a central chi-square distribution with
     [n − Rank (X'V⁻¹X)] degrees of freedom,

(iii) SSH0 and SSE are independent,

and (iv) F, as defined by (5.6), has a non-central F distribution
with [Rank L] and [n − Rank (X'V⁻¹X)] degrees of freedom.
It is to be pointed out that the seemingly more general hypothesis

    H0: Lβ = a

can be tested by the above method simply by replacing the model by

    E(y) = [X ⋮ 0] (β', a')'

and testing the homogeneous hypothesis

    [L ⋮ −I] (β', a')' = 0 .

The resulting SSH0 is found to be

    [L(X'V⁻¹X)^g X'V⁻¹y − a]' [L(X'V⁻¹X)^g L']⁻¹ [L(X'V⁻¹X)^g X'V⁻¹y − a] .

This simple device was pointed out by Dwyer [1959] in another context.
CHAPTER 6. SUMMARY AND SUGGESTIONS FOR FUTURE RESEARCH

6.1 Summary
In this dissertation four types of Generalized Inverses of matrices
were discussed. The sets defined by

    g(A) = {A^g : AA^gA = A}
    r(A) = {A^r : AA^rA = A and A^rAA^r = A^r}
    n(A) = {A^n : AA^nA = A, A^nAA^n = A^n and (AA^n)' = AA^n}
    t(A) = {A^t : AA^tA = A, A^tAA^t = A^t, (AA^t)' = AA^t and (A^tA)' = A^tA}

define the sets consisting of the Generalized, Reflexive Generalized,
Normalized Generalized and Pseudo-Inverses respectively. It can easily
be shown that g(A) ⊇ r(A) ⊇ n(A) ⊇ t(A) with equality holding if and
only if A is non-singular. In Chapter 2 the work of various authors
regarding the sets i(A) for i = g, r, n and t was reviewed. In
particular the fundamental results of Bose [1959] on g(A) and Penrose
[1955] on t(A) were emphasized, and extensions to Hilbert space and
other algebraic systems were discussed.
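The four defining conditions above can be verified numerically. The sketch below (Python with numpy, a modern convenience outside the dissertation's own methods) checks a candidate inverse against each condition in turn; numpy's pinv satisfies all four, so it lies in t(A) and hence in every one of the larger sets.

```python
import numpy as np

def condition_flags(A, B, tol=1e-10):
    """The four defining conditions for g(A), r(A), n(A) and t(A),
    evaluated for a candidate inverse B of A."""
    c1 = np.allclose(A @ B @ A, A, atol=tol)      # A B A = A
    c2 = np.allclose(B @ A @ B, B, atol=tol)      # B A B = B
    c3 = np.allclose((A @ B).T, A @ B, atol=tol)  # (A B)' = A B
    c4 = np.allclose((B @ A).T, B @ A, atol=tol)  # (B A)' = B A
    return c1, c2, c3, c4

# A singular (rank 1) matrix and its Pseudo-Inverse.
A = np.array([[1.0, 2.0],
              [2.0, 4.0]])
At = np.linalg.pinv(A)

c1, c2, c3, c4 = condition_flags(A, At)
# Membership in the four nested sets of the summary above.
in_g = c1
in_r = c1 and c2
in_n = c1 and c2 and c3
in_t = c1 and c2 and c3 and c4
print(in_g, in_r, in_n, in_t)
```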
In Chapter 3 the various types of Generalized Inverses were studied
from a theoretical viewpoint. The principle underlying this investigation
was that an intuitively appealing method of studying the sets g(A),
r(A), n(A) and t(A) is to relate a property of the matrix A to the
corresponding property of a typical element of the set i(A), i = g, r,
n and t. Properties investigated from this viewpoint were rank,
symmetry, eigenvalues and eigenvectors. In particular it was shown
that the rank of any matrix A^r in r(A) is equal to the rank of A,
and that if A is symmetric then the non-zero eigenvalues of any matrix
An in n(A) are reciprocals of the non-zero eigenvalues of A. It was
also shown that if A is symmetric then a symmetric Generalized Inverse
always exists and that the eigenvectors of the Pseudo-Inverse of A
coincide with the eigenvectors of A if A is symmetric. Using
results on the properties of the sets g(A), r(A), n(A) and t(A),
certain relations between the sets were established. It was shown that
A^g ∈ r(A) if and only if Rank(A^g) = Rank(A) and that A^g ∈ n(A) if
and only if A^g = (A'A)^gA' where (A'A)^g ∈ g(A). This second result is
a slight extension of one due to Zelen and Goldman [1963]. Relations
were also obtained for t(A) by restricting A to be normal or
symmetric.
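The reciprocal-eigenvalue result for symmetric A can be illustrated with the Pseudo-Inverse, which lies in n(A). A small numpy sketch with an invented symmetric singular matrix:

```python
import numpy as np

# A symmetric singular matrix: eigenvalues 0, 1 and 3.
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 0.0]])
At = np.linalg.pinv(A)   # the Pseudo-Inverse lies in n(A)

eig_A = np.linalg.eigvalsh(A)
eig_At = np.linalg.eigvalsh(At)

# Non-zero eigenvalues of A^t are the reciprocals of those of A,
# the zero eigenvalue staying zero.
nz_A = sorted(x for x in eig_A if abs(x) > 1e-10)
nz_At = sorted(x for x in eig_At if abs(x) > 1e-10)
print(nz_A, nz_At)
```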
In Chapter 4 computational methods were reviewed and the results of
Chapters 2 and 3 used to develop some useful modifications. Of particular
importance in applications is the following expression obtained for
a Generalized Inverse of a partitioned matrix. If X'X is partitioned
as

    X'X = [ X₁'X₁   X₁'X₂ ]
          [ X₂'X₁   X₂'X₂ ]

then a Generalized Inverse of X'X is given by

    (6.1)   (X'X)^g = [ (X₁'X₁)^g + (X₁'X₁)^g X₁'X₂ Q^g X₂'X₁(X₁'X₁)^g    −(X₁'X₁)^g X₁'X₂ Q^g ]
                      [ −Q^g X₂'X₁(X₁'X₁)^g                                Q^g                  ]

where Q = X₂'X₂ − X₂'X₁(X₁'X₁)^g X₁'X₂. It was also shown by suitably
choosing (X₁'X₁)^g and Q^g that the above expression can be used to find a
Reflexive Generalized Inverse of X'X. It was also shown that clever
partitioning of X'X and suitable choice of (X₁'X₁)^g permit the computation
of Normalized Generalized and Pseudo-Inverses. Application of
(6.1) to Bordered Matrices was discussed briefly and alternative but
equivalent expressions for previously known results were developed.
The theory of Generalized Inverses was also used to develop the theory
of the Abbreviated Doolittle technique when the coefficient matrix is
singular.
Application of Generalized Inverses to least squares theory formed
the content of Chapter 5. In particular, different formulations of
certain well known results were obtained. The use of partitioned
matrices and (6.1) allowed a simple treatment of an estimation problem
present in multiple covariance. Also discussed were the Gauss-Markoff
Theorem, Weighted Least Squares, Linear Models with restrictions, Linear
Models with a Singular Variance-Covariance Matrix and use of the
Abbreviated Doolittle Technique.
6.2 Suggestions for Future Research
There are many possibilities for future research in the area of
Generalized Inverses. First of all it is clear that nearly all results
hold when the matrices are allowed to have complex elements, provided
one interprets the transpose operation as the conjugate transpose
operation and replaces symmetric by Hermitian. Of interest would be careful
delineation of the results which do hold for complex matrices and those
which do not. The relation of a property of a matrix A and the
corresponding property of a typical element of i(A), i = g, r, n and t,
can clearly be investigated for properties other than the obvious ones
studied here. One by-product of such research might be different and
possibly more natural ways of characterizing the sets g(A), r(A),
n(A) and t(A). Many minor though not uninteresting problems are also
present. For example:
(1) If A is not symmetric but A^n is, does this imply that A^n
    possesses Property R1 (the non-zero eigenvalues of A^g are
    reciprocals of the non-zero eigenvalues of A and conversely)
    or Property R2 (the eigenvectors of A^g corresponding to non-zero
    eigenvalues are identical to the eigenvectors of A with
    the associated eigenvalues being reciprocals)?

(2) Does A^n continue to possess Property R1 if symmetric is
    replaced by normal in Theorem 3.10?

(3) What happens to the multiplicities of the eigenvalues (including
    the zero eigenvalue) when a Generalized Inverse possesses
    Property R1?

(4) For what class of matrices does the Pseudo-Inverse necessarily
    possess Properties R1 or R2?
An extensive list of such problems could easily be drawn up.
A topic of considerable interest which has not been touched upon in
this work is the topic of convergence of a sequence of matrices. As is
well known (see John [1956] or Marcus [1960]) there are many ways to
define a matrix norm in the set of all square matrices. Among the most
common and most natural are

    ‖A‖² = l.u.b. (x'A'Ax)/(x'x) = max {λ : λ ∈ Λ}

where

    Λ = {λ : A'Ax = λx for some x ≠ 0}

and

    ‖A‖ = [Trace(A'A)]^(1/2) = [Σ_{λ∈Λ} λ]^(1/2) .
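Both norms are easy to compute from the eigenvalues of A'A; a brief numpy sketch with an arbitrary invented 2 × 2 matrix, checked against numpy's own norm routines:

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [4.0, 5.0]])

# Eigenvalues of A'A (the set of λ above).
lam = np.linalg.eigvalsh(A.T @ A)

# Spectral norm: its square is the largest eigenvalue of A'A.
spectral = np.sqrt(lam.max())
# Euclidean norm: [Trace(A'A)]^(1/2) = (sum of the eigenvalues)^(1/2).
frobenius = np.sqrt(lam.sum())

print(np.isclose(spectral, np.linalg.norm(A, 2)),
      np.isclose(frobenius, np.linalg.norm(A, 'fro')))
```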
The matrix norms defined above satisfy all the requirements of
norms discussed in Section 2.6. A convergence problem of interest can
be simply stated as: given Aₙ → A in norm as n → ∞, under what
conditions does Aₙ^g → A^g in (the same) norm?

From the eigenvalue and eigenvector properties discussed in Section
3.5 it would appear that for a Normalized Generalized Inverse and the
Pseudo-Inverse one might find a simple solution to the convergence
problem.
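The delicacy of this convergence problem is easy to exhibit: the Pseudo-Inverse is discontinuous wherever the rank of the limit drops. A numpy sketch with an invented one-parameter family (not an example from the dissertation):

```python
import numpy as np

# A_n -> A in norm, but the limit A has lower rank than every A_n.
def A_n(eps):
    return np.array([[1.0, 0.0],
                     [0.0, eps]])

A = A_n(0.0)                       # rank 1 limit
for eps in [1e-1, 1e-3, 1e-6]:
    diff = np.linalg.norm(A_n(eps) - A, 2)                   # -> 0
    pinv_norm = np.linalg.norm(np.linalg.pinv(A_n(eps)), 2)  # 1/eps, blows up
    print(diff, pinv_norm)

# The Pseudo-Inverse of the limit stays bounded, so A_n^t does not
# converge to A^t; eventual equality of rank is the obstruction.
print(np.linalg.norm(np.linalg.pinv(A), 2))
```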
There are also numerous computational aspects of Generalized
Inverses which can and should be investigated. For matrices which occur
frequently in certain investigations (such as least squares analyses)
it would be useful to have available a compilation of the various types
of Generalized Inverses. Variants of formula (4.2) might prove useful
in such computations. Little work has been done in the computation of
Generalized Inverses in Hilbert Spaces. Recent work by Parzen [1959]
indicates that it might be profitable to investigate the computation of
Generalized Inverses in Hilbert spaces using the notions of Reproducing
Kernel Hilbert Spaces.
The results of Section 5.5 should be investigated to see if they
are basis-invariant. Some recent work by Kalman [1963] and C. R. Rao
[1962] on the application of Generalized Inverses to singular multinormal
distributions indicates that it should be possible to develop
the theories of Regression and Analysis of Variance on an underlying
singular multinormal distribution.
Aside from least squares it appears that Generalized Inverses might
be applicable in quasi-least-squares situations which include the various
minimum chi-square estimators and, in some cases, maximum likelihood
estimators. Let x = (x₁, x₂, ..., x_s)' be a vector valued random
variable with a probability distribution given by P_θ, where
θ = (θ₁, θ₂, ..., θ_r) is a vector of (real) parameters. The mean vector
and the variance-covariance matrix of x will be denoted by μ(θ) and V(θ)
respectively. Let x₁, x₂, ..., x_n represent independent observations
on x and define z̄ₙ = (Σᵢ₌₁ⁿ xᵢ)/n. It is well known (see Ferguson
[1959]) that the common minimum chi-square estimators for θ can be
found by minimizing

    χ² = [g(z̄ₙ) − g(μ(θ))]' M(θ, z̄ₙ) [g(z̄ₙ) − g(μ(θ))]

where

    g(w) = (g₁(w₁, w₂, ..., w_s), ..., g_s(w₁, w₂, ..., w_s))'

and G(w) = [∂gᵢ(w)/∂wⱼ] is the matrix of partial derivatives, for
suitable choice of M(θ, z̄ₙ) and g. The usual method of choosing
θ to minimize χ² consists of expanding χ² in a Taylor series,
ignoring terms higher than the second, and minimizing the resulting
quadratic expression. Such a procedure yields linear equations with a
coefficient matrix depending on the unknown θ. Iteration is usually
applied to solve such a system. Often the resulting equations are
singular and the usual procedures call for some sort of reparametrization
in order to solve the system. It appears to the author that one
of the types of Generalized Inverses might prove useful to avoid
reparametrization.
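The suggestion can be sketched as follows: when the linearized equations are singular but consistent, a Generalized Inverse produces a solution directly, with no reparametrization. The system below is an invented illustration (numpy assumed); pinv gives the minimum-norm choice, but any member of g(C) would serve.

```python
import numpy as np

# Singular "normal equations" C t = b, as arise from minimizing the
# quadratic approximation: C is the singular coefficient matrix.
C = np.array([[2.0, 1.0, 1.0],
              [1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]])   # row 0 = row 1 + row 2: rank 2
b = np.array([3.0, 2.0, 1.0])     # consistent: b[0] = b[1] + b[2]

# Instead of reparametrizing, take t = C^g b with a Generalized
# Inverse of C; for a consistent system this solves C t = b exactly.
t = np.linalg.pinv(C) @ b
print(np.allclose(C @ t, b))
```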
The previously mentioned results concerning statistical applications
of Generalized Inverses indicate that much work remains to be
done in extending, simplifying and unifying these applications.
Although certain of these developments might achieve little from a
practical viewpoint it appears that they would be extremely important
from the standpoint of presenting a logical development of statistical
analysis in the classroom.
CHAPTER 7. LIST OF REFERENCES
Aitken, A. C. 1937. The evaluation of a certain triple product matrix. Proc. of the Royal Soc. of Edinburgh 57:172-181.

Anderson, R. L. 1959. Unpublished lectures on the applications of least squares in economics. N. C. State of the University of North Carolina at Raleigh, Raleigh, N. C.

Ben-Israel, A. and S. J. Wersan. 1962. A least square method for computing the generalized inverse of an arbitrary complex matrix. Northwestern University, O. N. R. Research Memorandum No. 61.

Bodewig, E. 1959. Matrix Calculus, 2nd Edition. North Holland Publishing Co., Amsterdam.

Boot, J. C. G. 1963. The computation of the generalized inverse of singular or rectangular matrices. Amer. Math. Monthly 70(3):302-303.

Bose, R. C. 1959. Unpublished lecture notes on analysis of variance. Univ. of North Carolina at Chapel Hill, Chapel Hill, N. C.

Desoer, C. A. and B. H. Whalen. 1963. A note on Pseudo-Inverses. SIAM Jour. 11(2):442-447.

Drazin, M. P. 1958. Pseudo-Inverses in associative rings and semigroups. Amer. Math. Monthly 65:506-514.

Dwyer, P. S. 1959. Generalizations of a Gaussian theorem. Annals of Math. Stat. 29:106-117.

Eisenhart, C. 1963. The background and evolution of the method of least squares. 34th Session of the Inter. Stat. Inst., Ottawa, Canada, 21-29 August 1963.

Fadeeva, V. N. 1959. Computational Methods of Linear Algebra. Dover Publications, Inc., New York.

Ferguson, T. S. 1959. A method of generating best asymptotically normal estimators with application to the estimation of bacterial densities. Annals of Math. Stat. 29:1046-1062.

Foulis, D. J. 1963. Relative inverses in Baer *-semigroups. Michigan Math. Jour. 10(1):65-84.

Graybill, F. A. 1961. An Introduction to Linear Statistical Models, Vol. I. McGraw Hill Book Co., New York.
Greville, T. N. E. 1959. The Pseudo-Inverse of a rectangular or singular matrix and its application to the solution of systems of linear equations. SIAM Review 1(1):38-43.

Halmos, P. R. 1951. Introduction to Hilbert Space and the Theory of Spectral Multiplicity. Chelsea Publishing Co., New York.

Halmos, P. R. 1958. Finite-Dimensional Vector Spaces. D. Van Nostrand Co., Inc., Princeton, New Jersey.

John, F. 1956. Advanced Numerical Analysis. New York University Inst. of Math. Sciences.

Kalman, R. E. 1963. New methods in Wiener filtering. Proceedings of the First Symposium on Engineering Applications of Random Function Theory and Probability. John Wiley and Sons, New York.

Lehmann, E. L. 1950. Notes on the Theory of Estimation, Chapters I to IV. Associated Students' Store, Univ. of California, Berkeley 4, California.

Marcus, M. 1960. Basic theorems in matrix theory. National Bureau of Standards, Applied Math. Series 57.

Moore, E. H. 1920. Abstract. Bull. of the Amer. Math. Soc. 26:394-395.

Moore, E. H. 1935. General Analysis. Memoirs of the Amer. Philosophical Soc., Vol. I.

Munn, W. D. 1962. Pseudo-Inverses in semi-groups. Proc. of the Cambridge Philosophical Soc. 57:247-250.

Parzen, E. 1959. Statistical inference on time series by Hilbert space methods, I. Tech. Report No. 23, Applied Math. and Stat. Lab., Stanford University.

Penrose, R. 1955. A generalized inverse for matrices. Proc. of the Cambridge Philosophical Soc. 51:406-413.

Penrose, R. 1956. On best approximate solutions of linear matrix equations. Proc. of the Cambridge Philosophical Soc. 52:17-19.

Plackett, R. L. 1960. Principles of Regression Analysis. Oxford University Press, London.

Rado, R. 1956. Note on generalized inverses of matrices. Proc. of the Cambridge Philosophical Soc. 52:600-601.
Rao, C. R. 1952. Advanced Statistical Methods in Biometric Research. John Wiley and Sons, New York.

Rao, C. R. 1962. A note on a generalized inverse of a matrix with applications to problems in mathematical statistics. Jour. of the Royal Stat. Soc., Series B 24(1):152-158.

Rao, M. M. and J. S. Chipman. 1964. Projections, generalized inverses and quadratic forms. To be published in Jour. of Math. Analysis and Applications.

Roy, S. N. 1953. Some notes on least squares and analysis of variance, I. Inst. of Stat. Mimeo. Series No. 81.

Roy, S. N. 1957. Some Aspects of Multivariate Analysis. John Wiley and Sons, New York.

Rohde, C. A. and J. R. Harvey. Unified least squares analysis. Submitted for publication to Jour. of the Amer. Stat. Assoc.

Scheffe, H. 1959. The Analysis of Variance. John Wiley and Sons, New York.

Simmons, G. F. 1963. Introduction to Topology and Modern Analysis. McGraw Hill Book Co., New York.

Van der Vaart, R. 1964. Appendix to "Generalizations of Wilcoxon statistic for the case of k samples" by Elizabeth Yen. To be published in Statistica Neerlandica.

Wersan, J. J. and A. Ben-Israel. 1962. A least square method for computing the generalized inverse of an arbitrary complex matrix. Northwestern University, O. N. R. Research Memorandum No. 61.

Zelen, M. and A. J. Goldman. 1963. Weak generalized inverses and minimum variance linear unbiased estimation. Math. Research Center Tech. Report 314, U. S. Army, Madison, Wisconsin.