
Chapter 1

Matrix Algebra. Definitions and Operations

1.1 Matrices

Matrices play very important roles in the computation and analysis of several engineering problems. First, matrices allow for compact notations. As discussed below, matrices are collections of objects arranged in rows and columns. The symbols representing these collections can then induce an algebra, in which different operations such as matrix addition or matrix multiplication can be defined. (This compact representation has become even more significant today with the advent of computer software that allows simple statements such as A+B or A*B to be evaluated directly.)

Aside from the convenience in representation, once the matrices have been constructed, other internal properties can be derived and assessed. Properties such as determinant, rank, trace, eigenvalues and eigenvectors (all to be defined later) determine characteristics about the systems from which the matrices were obtained. These properties can then help in the analysis and improvement of the systems under study.

It can be argued that some problems may be solved without the use of matrices. However, as the complexity of the problem increases, matrices can help improve the tractability of the solution.

Definition 1.1 A matrix is a collection of objects, called the elements of the matrix, arranged in rows and columns.

These elements could be numbers,

$$A = \begin{pmatrix} 1 & 0 & -0.3 \\ -2 & 3+i & \tfrac{1}{2} \end{pmatrix} \qquad \text{with } i = \sqrt{-1}$$


or functions,

$$B = \begin{pmatrix} 1 & 2x(t) + a \\ \int \sin(\omega t)\, dt & dy/dt \end{pmatrix}$$

We restrict the discussion to matrices whose elements admit binary operations, such as addition, subtraction, multiplication and division, that make algebraic sense.

To distinguish the elements from the collection, we refer to the valid elements of the matrix as scalars. Thus, a scalar is not the same as a matrix having only one row and one column.

We will denote the element of matrix A positioned at the ith row and jth column as $a_{ij}$. We will use capital letters to denote matrices. For example, let matrix A have m rows and n columns,

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}$$

We will also use the symbol "[=]" to denote "has the size", i.e. A [=] m × n means A has m rows and n columns.

A row vector is simply a matrix having one row.

$$v = (v_1, v_2, \ldots, v_n)$$

If v has n elements, then v is said to have length n. Likewise, a column vector is simply a matrix having one column.

$$v = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix}$$

By default, "vector" will imply a column vector, unless it has been specified to be a row vector.

A square matrix is a matrix with the same number of columns and rows. Special cases include:

1. L, a lower triangular matrix

$$L = \begin{pmatrix} \ell_{11} & 0 & 0 & \cdots & 0 \\ \ell_{21} & \ell_{22} & 0 & \cdots & 0 \\ \ell_{31} & \ell_{32} & \ell_{33} & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \ell_{n1} & \ell_{n2} & \ell_{n3} & \cdots & \ell_{nn} \end{pmatrix}$$


2. U , an upper triangular matrix

$$U = \begin{pmatrix} u_{11} & u_{12} & u_{13} & \cdots & u_{1n} \\ 0 & u_{22} & u_{23} & \cdots & u_{2n} \\ 0 & 0 & u_{33} & \cdots & u_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & u_{nn} \end{pmatrix}$$

3. D, a diagonal matrix

$$D = \begin{pmatrix} d_{11} & 0 & 0 & \cdots & 0 \\ 0 & d_{22} & 0 & \cdots & 0 \\ 0 & 0 & d_{33} & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & d_{nn} \end{pmatrix}$$

A shorthand notation is $D = \mathrm{diag}(d_{11}, d_{22}, \ldots, d_{nn})$.

4. I, the identity matrix

I = diag(1, 1, . . . , 1)

We will also use $I_n$ to denote an identity matrix of size n.

1.2 Matrix Operations

1. Matrix Addition. Let $A = (a_{ij})$, $B = (b_{ij})$, $C = (c_{ij})$; then

$$A + B = C$$

if and only if $c_{ij} = a_{ij} + b_{ij}$.
Condition: A, B and C all have the same size.

2. Scalar-Matrix Multiplication. Let $A = (a_{ij})$, $B = (b_{ij})$, and α a scalar (e.g. a real number or a complex number); then

$$\alpha A = B$$

if and only if $b_{ij} = \alpha\, a_{ij}$.
Condition: A and B have the same size.

3. Matrix Multiplication. Let A [=] m × n, B [=] n × p, C [=] m × p; then

$$A * B = C$$

if and only if

$$c_{ij} = \sum_{\ell=1}^{n} a_{i\ell}\, b_{\ell j}$$

Remarks:

(a) A shorthand notation is AB.

(b) For the operation AB, we say A pre-multiplies B and B post-multiplies A.

(c) When the number of columns of A is equal to the number of rows of B, then we say that A is conformable with B for the operation AB.

(d) In general, AB is not equal to BA. For those special cases in which AB = BA, we say that A commutes with B.
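These remarks map directly onto matrix software. Below is a minimal sketch in Python with NumPy (an assumed choice of software, in the spirit of the A+B and A*B statements mentioned earlier); the matrices are invented purely for illustration:

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])

C = A + B      # matrix addition: c_ij = a_ij + b_ij
D = 2.5 * A    # scalar-matrix multiplication: d_ij = alpha * a_ij
E = A @ B      # matrix multiplication: e_ij = sum_l a_il * b_lj

# In general AB != BA; A commutes with B only in special cases.
print(np.array_equal(A @ B, B @ A))   # False for this pair
```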

4. Hadamard-Schur Product. Let $A = (a_{ij})$, $B = (b_{ij})$, $C = (c_{ij})$; then

$$A \circ B = C$$

if and only if $c_{ij} = a_{ij} b_{ij}$.
Condition: A, B and C all have the same size.

5. Kronecker (Direct) Product. Let $A = (a_{ij})\,[=]\,m \times n$; then

$$A \otimes B = C = \begin{pmatrix} a_{11}B & a_{12}B & \cdots & a_{1n}B \\ a_{21}B & a_{22}B & \cdots & a_{2n}B \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1}B & a_{m2}B & \cdots & a_{mn}B \end{pmatrix}$$
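Both products are one-liners in matrix software. A short sketch, again assuming Python with NumPy:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[10, 20], [30, 40]])

hadamard = A * B        # elementwise (Hadamard-Schur) product: c_ij = a_ij * b_ij
kron = np.kron(A, B)    # Kronecker product: blocks a_ij * B, giving a 4 x 4 matrix here

print(hadamard)         # [[ 10  40]
                        #  [ 90 160]]
print(kron.shape)       # (4, 4)
```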

6. Transpose. Let $A = (a_{ij})$; then the transpose of A, denoted $A^T$, is obtained by interchanging the positions of rows and columns.¹ For example, suppose A is given by

$$A = \begin{pmatrix} a & b & c & d \\ e & f & g & h \end{pmatrix}$$

then the transpose is given by

$$A^T = \begin{pmatrix} a & e \\ b & f \\ c & g \\ d & h \end{pmatrix}$$

¹ In other journals and books, the transpose symbol is an apostrophe (′), i.e. A′ instead of $A^T$.


If $A = A^T$, then A is said to be symmetric. If $A = -A^T$, then A is said to be skew-symmetric.

If the elements of the matrix belong to the complex number field, then a related operation is the conjugate transpose $A^* = (\bar{a}_{ji})$, where $A = (a_{ij})$ and $\bar{a}$ is the complex conjugate of a. If $A = A^*$ then A is said to be Hermitian. If $A = -A^*$ then A is said to be skew-Hermitian.

7. Vectorization. Let $A = (a_{ij})\,[=]\,m \times n$; then

$$x = \mathrm{vec}(A) = \begin{pmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \\ \vdots \\ a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{pmatrix}$$
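In column-major ("Fortran") storage, vectorization is just a reshape. A sketch assuming NumPy:

```python
import numpy as np

A = np.array([[1, 4], [2, 5], [3, 6]])   # 3 x 2 matrix

# vec(A) stacks the columns of A on top of each other;
# NumPy's column-major ordering (order='F') does exactly that.
x = A.flatten(order='F')
print(x)   # [1 2 3 4 5 6]
```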

8. Determinant. Let A be a square matrix of size n; then the determinant of A is given by

$$\det(A) \text{ or } |A| = \sum_{k_1 \neq k_2 \neq \cdots \neq k_n} p(k_1, \ldots, k_n)\, a_{1,k_1} a_{2,k_2} \cdots a_{n,k_n} \tag{1.1}$$

where $p(k_1, \ldots, k_n) = (-1)^h$ is called the permutation index and h is equal to the number of flips needed to make the sequence $k_1, k_2, k_3, \ldots, k_n$ equal to the sequence $1, 2, 3, \ldots, n$.

Example 1.1

Let A be a 3 × 3 matrix; then the determinant of A is obtained as follows:

k1  k2  k3    h    (-1)^h a_{1k1} a_{2k2} a_{3k3}
1   2   3     0      a11 a22 a33
1   3   2     1    - a11 a23 a32
2   1   3     1    - a12 a21 a33
2   3   1     2      a12 a23 a31
3   1   2     2      a13 a21 a32
3   2   1     1    - a13 a22 a31

$$|A| = a_{11}a_{22}a_{33} - a_{11}a_{23}a_{32} - a_{12}a_{21}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{13}a_{22}a_{31}$$


♦♦♦

From the definition given, we expect the summation to consist of n! terms. This definition is not usually used when doing actual determinant calculations. Instead, it is used more for proving some theorems which involve determinants. It is crucial to remember that (1.1) is the definition of a determinant (and not the computation method using the cofactor that is developed below).
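As an illustration of definition (1.1), the following sketch (assuming Python with NumPy and the standard itertools module) sums over all n! permutations and compares the result against a library routine; it is meant to illuminate the definition, not to serve as an efficient method:

```python
import numpy as np
from itertools import permutations

def det_by_definition(A):
    """Determinant via the permutation definition (1.1): a sum of n! terms."""
    n = A.shape[0]
    total = 0.0
    for perm in permutations(range(n)):
        # permutation index p = (-1)^h, with h = number of pairwise flips (inversions)
        h = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
        term = (-1.0) ** h
        for row, col in enumerate(perm):
            term *= A[row, col]
        total += term
    return total

A = np.array([[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
print(det_by_definition(A), np.linalg.det(A))   # both give 8.0
```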

9. Cofactor of $a_{ij}$. Let $A_{ij\downarrow}$ denote a new matrix obtained by deleting the ith row and jth column of A; then the cofactor of $a_{ij}$, denoted $\mathrm{cof}(a_{ij})$, is given by $(-1)^{i+j}|A_{ij\downarrow}|$. Using cofactors, the determinant of a matrix can be obtained recursively as follows:

(a) The determinant of a 1×1 matrix is equal to that element, e.g. |(a)| = a.

(b) The determinant of an n × n matrix can be obtained by column expansion

$$|A| = \sum_{i=1}^{n} a_{ik}\, \mathrm{cof}(a_{ik}) \qquad k \text{ is any one fixed column}$$

or by row expansion

$$|A| = \sum_{j=1}^{n} a_{kj}\, \mathrm{cof}(a_{kj}) \qquad k \text{ is any one fixed row}$$
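The recursion above can be coded directly. A sketch in Python, expanding along the first row (a fixed row with k = 1); like the permutation formula, this is for illustration rather than efficiency:

```python
import numpy as np

def det_by_cofactors(A):
    """Recursive determinant via cofactor expansion along row 0."""
    n = A.shape[0]
    if n == 1:
        return A[0, 0]          # determinant of a 1x1 matrix is the element itself
    total = 0.0
    for j in range(n):
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)   # delete row 0 and column j
        # (-1)^(1+j) in 1-based indexing reduces to (-1)^j with 0-based j
        total += A[0, j] * (-1) ** j * det_by_cofactors(minor)
    return total

A = np.array([[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
print(det_by_cofactors(A))   # 8.0
```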

10. Matrix Adjoint. The matrix adjoint of a square matrix, denoted adj(A), is obtained by first replacing each element $a_{ij}$ by its cofactor and then taking the transpose of the resulting matrix:

$$\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix} \xrightarrow[\text{cofactors}]{\text{replace with}} \begin{pmatrix} \mathrm{cof}(a_{11}) & \cdots & \mathrm{cof}(a_{1n}) \\ \vdots & \ddots & \vdots \\ \mathrm{cof}(a_{n1}) & \cdots & \mathrm{cof}(a_{nn}) \end{pmatrix} \xrightarrow{\text{transpose}} \begin{pmatrix} \mathrm{cof}(a_{11}) & \cdots & \mathrm{cof}(a_{n1}) \\ \vdots & \ddots & \vdots \\ \mathrm{cof}(a_{1n}) & \cdots & \mathrm{cof}(a_{nn}) \end{pmatrix}$$

11. Trace of a Square Matrix. The trace of an n × n matrix A, denoted tr(A), is given by

$$\mathrm{tr}(A) = \sum_{i=1}^{n} a_{ii}$$


12. Inverse of a Square Matrix. The matrix denoted by $A^{-1}$ is called the inverse of A if and only if

$$A^{-1} A = A A^{-1} = I$$

where I is the identity matrix.

Condition: The inverse of a square matrix exists only if its determinant is not equal to zero. A matrix whose determinant is zero is called a singular matrix.

Lemma 1.1 The inverse of a square matrix A can be obtained using the identity

$$A^{-1} = \frac{1}{|A|}\, \mathrm{adj}(A) \tag{1.2}$$

(see page 23 for proof)

Note that even though only nonsingular square matrices have inverses, all square matrices can still have matrix adjoints.
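A sketch of the adjoint computed from cofactors, used to reproduce Lemma 1.1 numerically (Python with NumPy assumed; the 2 × 2 matrix is arbitrary):

```python
import numpy as np

def adjoint(A):
    """Matrix adjoint: replace each a_ij by its cofactor, then transpose."""
    n = A.shape[0]
    cof = np.empty_like(A, dtype=float)
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            cof[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return cof.T

A = np.array([[1.0, 2.0], [3.0, 4.0]])
print(adjoint(A) / np.linalg.det(A))   # equals A^{-1} by Lemma 1.1
print(np.linalg.inv(A))                # same result from the library routine
```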

13. Augmentation of Matrices. Let matrices A[=]m × n and B[=]m × s have the same number of rows; then row augmentation

$$C = \mathrm{augmentrow}(A, B) \tag{1.3}$$

is obtained by simply attaching each row of A to the corresponding row of B, i.e.

$$c_{ij} = \begin{cases} a_{ij} & \text{if } j \le n \\ b_{i,j-n} & \text{if } j > n \end{cases} \qquad i = 1, \ldots, m;\; j = 1, \ldots, n+s \tag{1.4}$$

Likewise, for A[=]n × m and B[=]s × m, column augmentation

$$G = \mathrm{augmentcol}(A, B) \tag{1.5}$$

is obtained by simply attaching each column of A to the corresponding column of B, i.e.

$$g_{ij} = \begin{cases} a_{ij} & \text{if } i \le n \\ b_{i-n,j} & \text{if } i > n \end{cases} \qquad i = 1, \ldots, n+s;\; j = 1, \ldots, m \tag{1.6}$$

As a shorthand, we will also use vertical bars to represent row augmentation and horizontal bars to represent column augmentation, i.e.

$$\mathrm{augmentrow}(A, B) = (A \,|\, B) \tag{1.7}$$

$$\mathrm{augmentcol}(A, B) = \left( \frac{A}{B} \right) \tag{1.8}$$
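In matrix software the two augmentations are usually called horizontal and vertical stacking. A sketch assuming NumPy:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5], [6]])     # same number of rows as A

C = np.hstack((A, B))        # row augmentation (A|B): result is 2 x 3
G = np.vstack((A, B.T))      # column augmentation (A over B.T): result is 3 x 2

print(C)
print(G)
```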


14. Block Matrix Operations. The reverse of the augmentation operation is the partitioning of matrices into blocks of matrices. Once partitioned, these block matrices obtain a set of operations that can take advantage of the special structure of the resulting matrix blocks.

Suppose that the matrices can be partitioned into different blocks of appropriate (i.e. conformable) sizes; then we have the following matrix block operations:

(a) Block matrix multiplication

$$\begin{pmatrix} A & B \\ C & D \end{pmatrix} \begin{pmatrix} E & F \\ G & H \end{pmatrix} = \begin{pmatrix} AE + BG & AF + BH \\ CE + DG & CF + DH \end{pmatrix} \tag{1.9}$$

(b) Block matrix determinant

Let A[=]n × n, D[=]m × m; then

$$\det \begin{pmatrix} A & 0 \\ C & D \end{pmatrix} = \det(A) \det(D) \tag{1.10}$$

$$\det \begin{pmatrix} A & B \\ C & D \end{pmatrix} = \det(A) \det(D - CA^{-1}B) \quad \text{if } A \text{ is nonsingular} \tag{1.11}$$

$$\det \begin{pmatrix} A & B \\ C & D \end{pmatrix} = \det(D) \det(A - BD^{-1}C) \quad \text{if } D \text{ is nonsingular} \tag{1.12}$$

(see page 22 for proof of (1.10) and page 23 for proof of (1.11))

(c) Block matrix inverse

Let

$$\Omega = \begin{pmatrix} A & B \\ C & D \end{pmatrix}$$

where A[=]n × n and $\Gamma = D - CA^{-1}B$ [=] m × m are both nonsingular; then

$$\Omega^{-1} = \begin{pmatrix} W & X \\ Y & Z \end{pmatrix}$$

where

$$Y = -\Gamma^{-1} C A^{-1} \tag{1.13}$$

$$Z = \Gamma^{-1} \tag{1.14}$$

$$X = -A^{-1} B \Gamma^{-1} \tag{1.15}$$

$$W = A^{-1}(I + B \Gamma^{-1} C A^{-1}) \tag{1.16}$$

(see exercise E12.)
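Equations (1.13)-(1.16) are easy to check numerically. A sketch assuming NumPy; the diagonal shifts are only there to keep the randomly drawn A and Γ safely nonsingular:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 2
A = rng.normal(size=(n, n)) + 3 * np.eye(n)   # keep A well away from singular
B = rng.normal(size=(n, m))
C = rng.normal(size=(m, n))
D = rng.normal(size=(m, m)) + 3 * np.eye(m)

Ainv = np.linalg.inv(A)
Gamma = D - C @ Ainv @ B                      # Schur complement of A

Z = np.linalg.inv(Gamma)                      # (1.14)
Y = -Z @ C @ Ainv                             # (1.13)
X = -Ainv @ B @ Z                             # (1.15)
W = Ainv @ (np.eye(n) + B @ Z @ C @ Ainv)     # (1.16)

Omega = np.block([[A, B], [C, D]])
Omega_inv = np.block([[W, X], [Y, Z]])
print(np.allclose(Omega @ Omega_inv, np.eye(n + m)))   # True
```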


15. Derivative of a Matrix of Functions of One Variable. The derivative of a matrix is simply defined as the matrix of derivatives of each element of the matrix, i.e.

$$\frac{d}{dt} \begin{pmatrix} a_{11}(t) & a_{12}(t) & \cdots & a_{1n}(t) \\ a_{21}(t) & a_{22}(t) & \cdots & a_{2n}(t) \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1}(t) & a_{n2}(t) & \cdots & a_{nn}(t) \end{pmatrix} = \begin{pmatrix} da_{11}/dt & da_{12}/dt & \cdots & da_{1n}/dt \\ da_{21}/dt & da_{22}/dt & \cdots & da_{2n}/dt \\ \vdots & \vdots & \ddots & \vdots \\ da_{n1}/dt & da_{n2}/dt & \cdots & da_{nn}/dt \end{pmatrix} \tag{1.17}$$

16. Partial Derivative of a Vector of Functions with Multiple Variables

Let $f(x_1, x_2, \ldots, x_n)$ be a scalar function in which $x_1, x_2, \ldots, x_n$ are independent variables; then we will denote the operation df/dx as the 1 × n row vector given by

$$\frac{d}{dx} f(x_1, x_2, \ldots, x_n) = \left( \frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n} \right) \tag{1.18}$$

If f is a vector of m functions, then we denote df/dx as the m × n matrix given by

$$\frac{d}{dx} \begin{pmatrix} f_1(x_1, \ldots, x_n) \\ f_2(x_1, \ldots, x_n) \\ \vdots \\ f_m(x_1, \ldots, x_n) \end{pmatrix} = \begin{pmatrix} \partial f_1/\partial x_1 & \partial f_1/\partial x_2 & \cdots & \partial f_1/\partial x_n \\ \partial f_2/\partial x_1 & \partial f_2/\partial x_2 & \cdots & \partial f_2/\partial x_n \\ \vdots & \vdots & \ddots & \vdots \\ \partial f_m/\partial x_1 & \partial f_m/\partial x_2 & \cdots & \partial f_m/\partial x_n \end{pmatrix} \tag{1.19}$$

17. Integral of a Matrix. The integral of a matrix is defined as the matrix of integrals of each element of the matrix, i.e.

$$\int_a^b \begin{pmatrix} a_{11}(t) & \cdots & a_{1n}(t) \\ \vdots & \ddots & \vdots \\ a_{n1}(t) & \cdots & a_{nn}(t) \end{pmatrix} dt = \begin{pmatrix} \int_a^b a_{11}\, dt & \cdots & \int_a^b a_{1n}\, dt \\ \vdots & \ddots & \vdots \\ \int_a^b a_{n1}\, dt & \cdots & \int_a^b a_{nn}\, dt \end{pmatrix} \tag{1.20}$$

1.3 Elementary Matrix Operators and Gaussian Elimination

Let $e_i$ be the ith unit vector, defined as a column vector whose ith element is 1 and whose other elements are all 0:

$$e_i = \begin{pmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{pmatrix} \leftarrow i\text{th position}$$


Note that the identity matrix can also be represented by $I_n = (e_1 | \ldots | e_n)$. Using the unit vectors, we can construct matrices known as elementary matrix operators. These matrices, say E, manipulate the columns of matrix A by postmultiplication to yield matrix B, i.e. B = AE. Likewise, the transpose of matrix operator E works on the rows of matrix A by premultiplication, i.e. $E^T A = B$.

By using a sequence of elementary matrix operations, via both premultiplication and postmultiplication, one can reduce square matrices to upper triangular, lower triangular or diagonal form. This process is generally known as Gaussian elimination.²

1.3.1 Elementary Matrix Operators

We will group the elementary matrix operators into only three types: permutation, combination and scaling matrices.

1. Permutation Operator

$$E_{\mathrm{permute}}(k_1, \ldots, k_n) = \left( e_{k_1} | e_{k_2} | \ldots | e_{k_n} \right), \qquad k_1 \neq k_2 \neq \cdots \neq k_n \tag{1.21}$$

When A is postmultiplied by $E_{\mathrm{permute}}$, the columns of A will be rearranged according to the sequence given by $k_1, \ldots, k_n$. Likewise, if A is premultiplied by $E_{\mathrm{permute}}^T$, the rows are permuted according to the given sequence.

Example 1.2

$$A\, E_{\mathrm{permute}}(3, 1, 2) = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix} = \begin{pmatrix} a_{13} & a_{11} & a_{12} \\ a_{23} & a_{21} & a_{22} \\ a_{33} & a_{31} & a_{32} \end{pmatrix}$$

$$E_{\mathrm{permute}}(3, 1, 2)^T A = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} = \begin{pmatrix} a_{31} & a_{32} & a_{33} \\ a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix}$$

♦♦♦

Properties:

² One can use the same procedures to reduce a non-square matrix to an upper or lower echelon form.


(a) Determinant

$$\left| E_{\mathrm{permute}}(k_1, \ldots, k_n) \right| = p(k_1, k_2, \ldots, k_n) = (-1)^h \tag{1.22}$$

where h is the number of flips to make the sequence $k_1, k_2, k_3, \ldots, k_n$ equal to the sequence $1, 2, 3, \ldots, n$.

(b) Inverse

$$E_{\mathrm{permute}}^{-1} = E_{\mathrm{permute}}^{T} \tag{1.23}$$

2. Combination Operator

$$E^{[i]}_{\mathrm{combine}}(v) = \left( e_1 | \ldots | e_{i-1} | v | e_{i+1} | \ldots | e_n \right) \tag{1.24}$$

where v is a vector whose ith element is equal to 1; that is, $E^{[i]}_{\mathrm{combine}}(v)$ is the identity matrix with its ith column replaced by v.

When matrix $E^{[i]}_{\mathrm{combine}}$ postmultiplies a matrix A, the ith column of A is replaced by the original column plus a linear combination of the other columns, with the weights given by the coefficients $v_j$, $j \neq i$.

Similarly, if matrix A is premultiplied by $(E^{[i]}_{\mathrm{combine}})^T$, the ith row of A is replaced by the original row plus a linear combination of the other rows, with weights given by the coefficients $v_j$, $j \neq i$.

Example 1.3

$$A\, E^{[2]}_{\mathrm{combine}}\!\left( [0.5, 1, 0]^T \right) = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \begin{pmatrix} 1 & 0.5 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} a_{11} & (0.5a_{11} + a_{12}) & a_{13} \\ a_{21} & (0.5a_{21} + a_{22}) & a_{23} \\ a_{31} & (0.5a_{31} + a_{32}) & a_{33} \end{pmatrix}$$


$$E^{[3]}_{\mathrm{combine}}\!\left( [-0.2, 0, 1]^T \right)^T A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -0.2 & 0 & 1 \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ (a_{31} - 0.2a_{11}) & (a_{32} - 0.2a_{12}) & (a_{33} - 0.2a_{13}) \end{pmatrix}$$

♦♦♦

However, during the Gaussian elimination procedure that will be described later, we actually premultiply a matrix by $E_{\mathrm{combine}}$ itself, not its transpose. Doing so has a different effect. For $\left[ E^{[i]}_{\mathrm{combine}}(v) \right] A$, the jth row (j ≠ i) of A will be changed as follows:

$$a^{\mathrm{new}}_{jk} = a^{\mathrm{old}}_{jk} + v_j\, a_{ik}$$

(Note the plus sign: with the elimination weights $v_\ell = -m_{\ell j}/\beta$ used in the algorithm of section 1.3.3, this update zeroes out the targeted column entries.)

Properties:

(a) Determinant

By expanding along the ith column,

$$\left| E_{\mathrm{combine}} \right| = 1 \tag{1.25}$$

(b) Inverse

$$\left[ E^{[i]}_{\mathrm{combine}}(v) \right]^{-1} = E^{[i]}_{\mathrm{combine}}(2e_i - v) \tag{1.26}$$

(c) Decomposition

$$E^{[i]}_{\mathrm{combine}}(v) = \prod_{k \neq i} E^{[i]}_{\mathrm{combine}}(v_k e_k + e_i) \tag{1.27}$$

3. Scale Operator

$$E_{\mathrm{scale}}(\alpha_1, \alpha_2, \ldots, \alpha_n) = \mathrm{diag}(\alpha_1, \ldots, \alpha_n), \qquad \alpha_i \neq 0 \tag{1.28}$$

When matrix $E_{\mathrm{scale}}$ postmultiplies a matrix A, the ith column of A is scaled by the factor $\alpha_i$. Similarly, if matrix A is premultiplied by $E_{\mathrm{scale}}$, the ith row of A is scaled by the factor $\alpha_i$.


Example 1.4

$$A\, E_{\mathrm{scale}}(1, 0.5, -2) = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0.5 & 0 \\ 0 & 0 & -2 \end{pmatrix} = \begin{pmatrix} a_{11} & 0.5a_{12} & -2a_{13} \\ a_{21} & 0.5a_{22} & -2a_{23} \\ a_{31} & 0.5a_{32} & -2a_{33} \end{pmatrix}$$

$$E_{\mathrm{scale}}(1, -2, 1.5)\, A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & 1.5 \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ -2a_{21} & -2a_{22} & -2a_{23} \\ 1.5a_{31} & 1.5a_{32} & 1.5a_{33} \end{pmatrix}$$

♦♦♦

Properties:

(a) Determinant

$$\left| E_{\mathrm{scale}} \right| = \prod_{i=1}^{n} \alpha_i \tag{1.29}$$

(b) Inverse

$$E_{\mathrm{scale}}^{-1} = \mathrm{diag}\left( \frac{1}{\alpha_1}, \ldots, \frac{1}{\alpha_n} \right) \tag{1.30}$$
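All three operator types can be built from an identity matrix. A sketch assuming NumPy, reproducing the operators used in Examples 1.2-1.4:

```python
import numpy as np

n = 3
I = np.eye(n)

# Permutation operator E_permute(3,1,2) = (e3|e1|e2)
E_perm = I[:, [2, 0, 1]]

# Combination operator E^[2]_combine([0.5, 1, 0]^T):
# identity with its 2nd column replaced by v (v has 1 in position 2)
v = np.array([0.5, 1.0, 0.0])
E_comb = I.copy()
E_comb[:, 1] = v

# Scale operator E_scale(1, 0.5, -2)
E_scale = np.diag([1.0, 0.5, -2.0])

A = np.arange(1.0, 10.0).reshape(3, 3)
print(A @ E_perm)    # columns of A rearranged to (a3 | a1 | a2)
print(A @ E_comb)    # column 2 replaced by 0.5*(column 1) + (column 2)
print(A @ E_scale)   # columns scaled by 1, 0.5, -2
```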

1.3.2 Gaussian Elimination

For a matrix A[=]m × n, the Gaussian elimination (or reduction) procedure involves a series of premultiplications and postmultiplications by $E_{\mathrm{combine}}$, $E_{\mathrm{permute}}$ and $E_{\mathrm{scale}}$ so that

$$QAW = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} \tag{1.31}$$


where Q and W are products of elementary matrix operators, and r ≤ min(m, n).

Gaussian elimination procedures are useful for understanding the solutions of linear equations. However, these procedures can propagate round-off errors during the determination of the required elementary operators. Hence, for large matrices, with the exception of sparse matrices, Gaussian elimination is used with caution. Instead, more efficient and stable numerical methods, e.g. singular value decomposition, are preferred in most applications involving linear equations. The Gaussian elimination algorithm included below is just one among many possible methods, and hence the operators Q and W are not unique.

Rather than focusing on the actual computation of Q and W, it is more instructive for our discussion in the next chapters to note the existence of operators Q and W such that equation (1.31) is true.

1.3.3 Gaussian Elimination Algorithm

Gaussian Elimination Algorithm: Given A[=]n × n, the following procedure obtains the required operators Q and W. (The algorithm for a nonsquare A is left as an exercise.)

1. Initialize: M = A, β = 1, i = 1

2. Reduce M to upper triangular form:
While i ≤ n

(a) Determine the pivot term, $\beta = m_{kj}$, where

$$(k, j) = \arg\left[ \max_{k \ge i,\; j \ge i} \left( |m_{kj}| \right) \right]$$

If β = 0 then exit the while-loop; otherwise

(b) Eliminate all terms in column j (except for the kth entry) using operator $Z_i$:

$$Z_i = E^{[k]}_{\mathrm{combine}}(v), \qquad v_\ell = \begin{cases} 1 & \text{if } \ell = k \\ -m_{\ell j}/\beta & \text{if } \ell \neq k \end{cases}$$

(c) Interchange column i and column j using $G_i = P_{i,j}$, and interchange row i and row k using $H_i = P_{i,k}^T$, where

$$P_{i,\ell} = \begin{cases} I & \text{if } \ell = i \\ E_{\mathrm{permute}}(1, 2, \ldots, i-1, \ell, i+1, \ldots, \ell-1, i, \ell+1, \ldots, n) & \text{if } \ell > i \end{cases}$$

(d) Update matrix M: $M \leftarrow H_i Z_i M G_i$


(e) Increment i: $i \leftarrow i + 1$

(End of while-loop)

3. Normalize the nonzero diagonal terms in M, using operator D:

$$D = E_{\mathrm{scale}}(d_1, \ldots, d_n) = \mathrm{diag}(d_1, \ldots, d_n), \qquad d_i = \begin{cases} 1/m_{ii} & \text{if } m_{ii} \neq 0 \\ 1 & \text{if } m_{ii} = 0 \end{cases}$$

M is updated:

$$M \leftarrow DM = \begin{cases} I & \text{if } r = n \\ \begin{pmatrix} I_r & B \\ 0 & 0 \end{pmatrix} & \text{if } r < n \end{cases}$$

where r is the number of times a complete while-loop was invoked in the previous step, and B[=]r × (n − r).

4. Eliminate B, using operator F:

$$F = \begin{pmatrix} I_r & -B \\ 0 & I_{n-r} \end{pmatrix}, \qquad M \leftarrow MF = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}$$

5. Evaluate the required operators:

$$Q = D H_r Z_r \cdots H_2 Z_2 H_1 Z_1, \qquad W = G_1 G_2 \cdots G_r F$$

Example 1.5

Given

$$A = \begin{pmatrix} 0 & 2 & 2 & 4 \\ 2 & 1 & 3 & 4 \\ -1 & 2 & 1 & 3 \\ 3 & 1 & 4 & 6 \end{pmatrix}$$

Set M = A.


i = 1, β = 6, k = 4, j = 4:

$$Z_1 = \begin{pmatrix} 1 & 0 & 0 & -4/\beta \\ 0 & 1 & 0 & -4/\beta \\ 0 & 0 & 1 & -3/\beta \\ 0 & 0 & 0 & 1 \end{pmatrix}, \quad H_1 = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix}, \quad G_1 = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix}$$

$$M \leftarrow H_1 Z_1 M G_1 = \begin{pmatrix} 6 & 1 & 4 & 3 \\ 0 & 1/3 & 1/3 & 0 \\ 0 & 3/2 & -1 & -5/2 \\ 0 & 4/3 & -2/3 & -2 \end{pmatrix}$$

i = 2, β = −5/2, k = 3, j = 4:

$$Z_2 = \begin{pmatrix} 1 & 0 & -3/\beta & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 2/\beta & 1 \end{pmatrix}, \quad H_2 = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \quad G_2 = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}$$

$$M \leftarrow H_2 Z_2 M G_2 = \begin{pmatrix} 6 & 0 & 14/5 & 14/5 \\ 0 & -5/2 & -1 & 3/2 \\ 0 & 0 & 1/3 & 1/3 \\ 0 & 0 & 4/30 & 4/30 \end{pmatrix}$$

i = 3, β = 1/3, k = 3, j = 3:

$$Z_3 = \begin{pmatrix} 1 & 0 & -14/(5\beta) & 0 \\ 0 & 1 & 1/\beta & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & -4/(30\beta) & 1 \end{pmatrix}, \quad H_3 = I, \quad G_3 = I$$

$$M \leftarrow H_3 Z_3 M G_3 = \begin{pmatrix} 6 & 0 & 0 & 0 \\ 0 & -5/2 & 0 & 5/2 \\ 0 & 0 & 1/3 & 1/3 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$

i = 4, β = 0 −→ exit while-loop, r = 3

$$D = \begin{pmatrix} 1/6 & 0 & 0 & 0 \\ 0 & -2/5 & 0 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}; \qquad M \leftarrow DM = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$


$$F = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 1 \end{pmatrix}; \qquad M \leftarrow MF = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$

Thus the required operators Q and W are given by

$$Q = D H_3 Z_3 H_2 Z_2 H_1 Z_1 = \begin{pmatrix} 0 & -7/5 & 1/5 & 1 \\ 0 & -6/5 & -2/5 & 1 \\ 0 & 3 & 0 & -2 \\ 1 & -2/5 & -4/5 & 0 \end{pmatrix}$$

$$W = G_1 G_2 G_3 F = \begin{pmatrix} 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & -1 \\ 1 & 0 & 0 & 0 \end{pmatrix}$$

As a check,

$$QAW = \begin{pmatrix} 0 & -7/5 & 1/5 & 1 \\ 0 & -6/5 & -2/5 & 1 \\ 0 & 3 & 0 & -2 \\ 1 & -2/5 & -4/5 & 0 \end{pmatrix} \begin{pmatrix} 0 & 2 & 2 & 4 \\ 2 & 1 & 3 & 4 \\ -1 & 2 & 1 & 3 \\ 3 & 1 & 4 & 6 \end{pmatrix} \begin{pmatrix} 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & -1 \\ 1 & 0 & 0 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$

♦♦♦
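The check above can be repeated in matrix software. A sketch assuming NumPy, using Q and W exactly as computed in Example 1.5:

```python
import numpy as np

# Matrices from Example 1.5 (Gaussian elimination to the canonical form (1.31))
A = np.array([[0, 2, 2, 4],
              [2, 1, 3, 4],
              [-1, 2, 1, 3],
              [3, 1, 4, 6]], dtype=float)

Q = np.array([[0, -7/5, 1/5, 1],
              [0, -6/5, -2/5, 1],
              [0, 3, 0, -2],
              [1, -2/5, -4/5, 0]])

W = np.array([[0, 1, 0, 1],
              [0, 0, 0, 1],
              [0, 0, 1, -1],
              [1, 0, 0, 0]])

print(np.round(Q @ A @ W, 10))   # diag(1, 1, 1, 0), i.e. rank r = 3
```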

1.4 Some Important Properties of Matrix Operations

Property 1. Let A and B be square matrices of the same size; then the determinant of the product AB is the product of the determinants of A and B:

$$|AB| = |A||B| \tag{1.32}$$

(see page 21 for proof)

Property 2. If any column of A is a linear combination of the other columns of A, then det(A) = 0. Likewise, if any row of A is a linear combination of the other rows of A, then det(A) = 0. (see page 21 for proof)


Property 3. The transpose operation does not affect the determinant:

$$|A| = |A^T| \tag{1.33}$$

(see exercise E3.)

Property 4. The determinant of an upper triangular matrix U, a lower triangular matrix L or a diagonal matrix D is equal to the product of its diagonal elements:

$$|U| = \prod_{i=1}^{n} u_{ii} \tag{1.34}$$

$$|L| = \prod_{i=1}^{n} \ell_{ii} \tag{1.35}$$

$$|D| = \prod_{i=1}^{n} d_{ii} \tag{1.36}$$

(see exercise E2)

Property 5. The transpose of a matrix product is the product of the transposes of each matrix in reversed order:

$$(AB)^T = B^T A^T \tag{1.37}$$

(see exercise E7)

Property 6. The inverse of a matrix product is the product of the inverses of each matrix in reversed order:

$$(AB)^{-1} = B^{-1}A^{-1} \tag{1.38}$$

(see exercise E8)

Property 7. The determinant of the inverse of a nonsingular matrix A is the reciprocal of the determinant of A:

$$|A^{-1}| = \frac{1}{|A|} \tag{1.39}$$

(see exercise E9)

Property 8. Let A[=]m × n be a matrix of constants and x[=]n × 1 a vector of independent variables $x_1, \ldots, x_n$; then

$$\frac{d}{dx}(Ax) = A \tag{1.40}$$

(see page 23 for proof.)

Property 9. Let A[=]n × n be a matrix of constants and x[=]n × 1 a vector of independent variables $x_1, \ldots, x_n$; then

$$\frac{d}{dx}(x^T A x) = x^T (A + A^T) \tag{1.41}$$

(see page 24 for proof.)
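Properties 8 and 9 can be verified numerically by finite differences. A sketch assuming NumPy; since $x^T A x$ is quadratic, the central difference recovers the gradient almost exactly:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))
x = rng.normal(size=4)
eps = 1e-6

# Property 9: gradient of x^T A x should equal x^T (A + A^T)
grad_fd = np.empty(4)
for i in range(4):
    dx = np.zeros(4)
    dx[i] = eps
    # central difference of the scalar function x -> x^T A x
    grad_fd[i] = ((x + dx) @ A @ (x + dx) - (x - dx) @ A @ (x - dx)) / (2 * eps)

print(np.allclose(grad_fd, x @ (A + A.T), atol=1e-5))   # True
```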


1.5 Appendix: Proofs

• Proof of |AB| = |A||B|: (cf. page 19)³ Let C = AB; then

$$c_{1k_1} = \sum_{\ell_1=1}^{n} a_{1\ell_1} b_{\ell_1 k_1}, \quad \ldots, \quad c_{nk_n} = \sum_{\ell_n=1}^{n} a_{n\ell_n} b_{\ell_n k_n}$$

$$\begin{aligned} |C| &= \sum_{k_1 \neq k_2 \neq \cdots \neq k_n} p(k_1, \ldots, k_n)\, c_{1,k_1} \cdots c_{n,k_n} \\ &= \sum_{k_1 \neq k_2 \neq \cdots \neq k_n} p(k_1, \ldots, k_n) \left( \sum_{\ell_1=1}^{n} a_{1\ell_1} b_{\ell_1 k_1} \right) \cdots \left( \sum_{\ell_n=1}^{n} a_{n\ell_n} b_{\ell_n k_n} \right) \\ &= \sum_{k_1 \neq k_2 \neq \cdots \neq k_n} \sum_{\ell_1=1}^{n} \cdots \sum_{\ell_n=1}^{n} p(k_1, \ldots, k_n) (a_{1\ell_1} \cdots a_{n\ell_n})(b_{\ell_1 k_1} \cdots b_{\ell_n k_n}) \\ &= \sum_{\ell_1=1}^{n} \cdots \sum_{\ell_n=1}^{n} (a_{1\ell_1} \cdots a_{n\ell_n}) \sum_{k_1 \neq k_2 \neq \cdots \neq k_n} p(k_1, \ldots, k_n)(b_{\ell_1 k_1} \cdots b_{\ell_n k_n}) \end{aligned}$$

but

$$\sum_{k_1 \neq k_2 \neq \cdots \neq k_n} p(k_1, \ldots, k_n)(b_{\ell_1 k_1} \cdots b_{\ell_n k_n}) = 0 \quad \text{if } \ell_i = \ell_j \text{ for some } i, j = 1, \ldots, n$$

so

$$|C| = \sum_{\ell_1 \neq \ell_2 \neq \cdots \neq \ell_n} (a_{1\ell_1} \cdots a_{n\ell_n}) \sum_{k_1 \neq k_2 \neq \cdots \neq k_n} p(k_1, \ldots, k_n)(b_{\ell_1 k_1} \cdots b_{\ell_n k_n})$$

The inner summation can be further reindexed as

$$\sum_{k_1 \neq k_2 \neq \cdots \neq k_n} p(k_1, \ldots, k_n)\, p(\ell_1, \ldots, \ell_n)(b_{1k_1} \cdots b_{nk_n})$$

and the determinant of C then becomes

$$|C| = \left( \sum_{\ell_1 \neq \ell_2 \neq \cdots \neq \ell_n} p(\ell_1, \ldots, \ell_n)\, a_{1\ell_1} \cdots a_{n\ell_n} \right) \left( \sum_{k_1 \neq k_2 \neq \cdots \neq k_n} p(k_1, \ldots, k_n)\, b_{1k_1} \cdots b_{nk_n} \right) = |A||B|$$

• Proof of Property 2: (cf. page 19)

³ Proof is based on A. Deif, Advanced Matrix Theory for Scientists and Engineers, Halsted Press, London, 1982, pp. 14-15.


Denote the ith column of A as $a_i$. Suppose the kth column of A is a linear combination of the other columns of A, i.e.

$$A = (a_1 | \cdots | a_k | \cdots | a_n), \qquad a_k = \sum_{j \neq k} \gamma_j a_j$$

where $\gamma_j \neq 0$ for some $j \neq k$. By postmultiplying A with $E_{\mathrm{combine}}$,

$$A \, E_{\mathrm{combine}}(k:\; 1, -\gamma_1;\; \cdots;\; n, -\gamma_n) = B = (b_1 | \cdots | b_k | \cdots | b_n)$$

where

$$b_i = a_i \ \text{ if } i \neq k, \qquad b_k = \sum_{j \neq k} \gamma_j a_j - \sum_{j \neq k} \gamma_j a_j = 0$$

Since matrix B has column k filled with 0's, expanding along this column gives |B| = 0. Using Property 1, $|A||E_{\mathrm{combine}}| = 0$. Since $|E_{\mathrm{combine}}| \neq 0$, it must be that |A| = 0.

• Proof of (1.10): (cf. page 10) First, suppose A = (a), a 1 × 1 matrix. Then by expanding along the first row,

$$\left| \begin{pmatrix} a & 0 \\ C & B \end{pmatrix} \right| = a|B|$$

Now assume that (1.10) is true for A[=](n−1) × (n−1). Let

$$Z = \begin{pmatrix} G & 0 \\ C & B \end{pmatrix}$$

with G[=]n × n. By expanding along the first row,

$$|Z| = \sum_{j=1}^{n+m} z_{1j}\, \mathrm{cof}(z_{1j})$$

where

$$z_{1j} = \begin{cases} g_{1j} & j \le n \\ 0 & j > n \end{cases}, \qquad \mathrm{cof}(z_{1j}) = (-1)^{1+j} \left| \begin{pmatrix} G_{1j\downarrow} & 0 \\ C & B \end{pmatrix} \right|, \quad j \le n$$

then

$$|Z| = \sum_{j=1}^{n} g_{1j}\, \mathrm{cof}(g_{1j})\, |B| = |G||B|$$

Equation (1.10) is proved by induction starting from n = 1.


• Proof of (1.11): (cf. page 10) Using (1.9), with A nonsingular,

$$\begin{pmatrix} A & D \\ C & B \end{pmatrix} \begin{pmatrix} I & -A^{-1}D \\ 0 & I \end{pmatrix} = \begin{pmatrix} A & 0 \\ C & B - CA^{-1}D \end{pmatrix}$$

Applying (1.32) and (1.10),

$$\left| \begin{pmatrix} A & D \\ C & B \end{pmatrix} \right| |I| = |A|\, |B - CA^{-1}D|$$

• Proof of the matrix inverse formula (1.2): (cf. page 9) Let $B = A\, \mathrm{adj}(A) = (b_{ij})$; then

$$b_{kk} = \sum_{j=1}^{n} a_{kj}\, \mathrm{cof}(a_{kj}) = |A|$$

If $i \neq j$,

$$b_{ij} = \sum_{\ell=1}^{n} a_{i\ell}\, \mathrm{cof}(a_{j\ell})$$

which is the same as the determinant, obtained via expansion along row j, of a matrix M whose row j has been set equal to row i:

$$M = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{i1} & a_{i2} & \cdots & a_{in} \\ \vdots & \vdots & \ddots & \vdots \\ a_{j1} & a_{j2} & \cdots & a_{jn} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix} \qquad \text{with } a_{jk} = a_{ik} \text{ (determinant obtained by expanding along row } j\text{)}$$

Because row j is equal to row i, Property 2 implies $b_{ij} = |M| = 0$ for $i \neq j$, or $B = \mathrm{diag}(|A|, |A|, \ldots, |A|)$. Thus $A\, \mathrm{adj}(A)/|A| = I$, or $A^{-1} = \mathrm{adj}(A)/|A|$.

• Proof of (1.40): d(Ax)/dx = A. (cf. page 20)

Let n = 1. Then $x = (x_1)$ and $A = (a_{11}, \ldots, a_{m1})^T$:

$$\frac{d}{dx} Ax = \begin{pmatrix} d(a_{11}x_1)/dx_1 \\ \vdots \\ d(a_{m1}x_1)/dx_1 \end{pmatrix} = A$$

Now assume that (1.40) is true for some A[=]m × n. Let B = (A|v), i.e. append an m × 1 vector v to the right of A, and let $y = (x^T, \alpha)^T$ be a new vector of length n + 1, where α is a scalar variable. Then

$$By = (A \,|\, v) \begin{pmatrix} x \\ \alpha \end{pmatrix} = Ax + v\alpha$$


$$\frac{d}{dy}(Ax + v\alpha) = \left( \frac{\partial(Ax + v\alpha)}{\partial x} \;\middle|\; \frac{\partial(Ax + v\alpha)}{\partial \alpha} \right) = (A \,|\, v) = B$$

Thus (1.40) is proved by induction from n = 1.

• Proof of (1.41): $d(x^T Ax)/dx = x^T(A + A^T)$ (cf. page 20)

Let n = 1:

$$\frac{d}{dx} x^T A x = 2x_1 a_{11} = x^T(A + A^T)$$

Now assume that (1.41) is true for A[=]n × n. Let $y = (x^T, \alpha)^T$ be a new vector of length n + 1, where α is a scalar variable. Let B[=](n+1) × (n+1) be obtained by appending vectors of constants v and w and a scalar constant β:

$$B = \begin{pmatrix} A & v \\ w^T & \beta \end{pmatrix}$$

then

$$y^T B y = x^T A x + \alpha w^T x + x^T v \alpha + \alpha \beta \alpha$$

$$\begin{aligned} \frac{d}{dy}(y^T B y) &= \left( \frac{\partial(y^T B y)}{\partial x}, \; \frac{\partial(y^T B y)}{\partial \alpha} \right) \\ &= \left( x^T(A + A^T) + \alpha(w^T + v^T) \;\middle|\; x^T(w + v) + 2\alpha\beta \right) \\ &= (x^T \,|\, \alpha) \begin{pmatrix} A + A^T & w + v \\ w^T + v^T & 2\beta \end{pmatrix} \\ &= y^T \left( \begin{pmatrix} A & v \\ w^T & \beta \end{pmatrix} + \begin{pmatrix} A^T & w \\ v^T & \beta \end{pmatrix} \right) = y^T(B + B^T) \end{aligned}$$

where we took advantage of the fact that $x^T v$ and $x^T w$ are 1 × 1 matrices and are therefore symmetric. Thus (1.41) is proved by induction from n = 1.

Exercises

E1. Show that $A^T A$, $AA^T$ and $A + A^T$ are symmetric, and $A - A^T$ is skew-symmetric.

E2. Prove that the determinant of a triangular matrix is the product of the diagonals. (Hint: Use induction.)


E3. Show that $|A| = |A^T|$. (Hint: use the row and column expansion formulas for determinants.)

E4. Let A[=]n × m and B[=]m × n; show that

$$|I_n - AB| = |I_m - BA|$$

(Hint: use (1.11) and (1.12))

E5. For a nonsingular matrix G, define the relative gain array of G as

$$R = G \circ G^{-T}$$

where $G^{-T} = (G^{-1})^T$ and ∘ denotes the Hadamard-Schur product. Prove the following properties for R:

1. The sum of all elements along any row or along any column is 1, i.e.

$$\sum_{i=1}^{n} r_{ij} = 1 \text{ for all } j, \qquad \sum_{j=1}^{n} r_{ij} = 1 \text{ for all } i$$

2. If G is triangular then R = I.

E6. Let A, C and $C^{-1} + DA^{-1}B$ be nonsingular, where B and D are not necessarily square. Prove the matrix inversion lemma:

$$(A + BCD)^{-1} = A^{-1} - A^{-1}B\left[ C^{-1} + DA^{-1}B \right]^{-1} DA^{-1}$$

(Hint: use the definition of inverse and show

$$(A + BCD)(A + BCD)^{-1} = I = (A + BCD)^{-1}(A + BCD)$$

Be careful that matrix multiplications are not commutative in general, and do not take inverses of B or D, since these are not square.)

E7. Prove that $(AB)^T = B^T A^T$. (Hint: Use the definition of matrix multiplication.)

E8. Prove that $(AB)^{-1} = B^{-1}A^{-1}$. (Hint: Use the definition of inverses.)

E9. Prove that $|A^{-1}| = 1/|A|$ if A is nonsingular. (Hint: use the definition of inverse and (1.32).)

E10. If $A^2 = A$ then A is said to be idempotent. Let A and B be idempotent. Prove the following:


1. I − A is idempotent.

2. AB is idempotent if AB = BA.

3. A + B is idempotent only if $BA = -(AB)^2$.

E11. If $A^r = 0$, with $A^{r-1} \neq 0$ for some positive integer r, then A is said to be nilpotent of index r. Show that a triangular matrix whose diagonal elements are all zero is nilpotent.

E12. Show that equations (1.13)-(1.16) yield the required block matrices which form the inverse of a partitioned matrix. (Hint: Substitute to show that

$$\begin{pmatrix} A & B \\ C & D \end{pmatrix} \begin{pmatrix} W & X \\ Y & Z \end{pmatrix} = I$$

is true.)

E13. The Vandermonde matrix is a matrix having a special structure given by

$$V = \begin{pmatrix} \lambda_1^{n-1} & \lambda_1^{n-2} & \cdots & \lambda_1 & 1 \\ \lambda_2^{n-1} & \lambda_2^{n-2} & \cdots & \lambda_2 & 1 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ \lambda_n^{n-1} & \lambda_n^{n-2} & \cdots & \lambda_n & 1 \end{pmatrix}$$

Prove that the determinant is given by

$$|V| = \prod_{i<j} (\lambda_i - \lambda_j)$$

(Hint: Use the fact that the product series can be rearranged to become

$$\prod_{i<j} (\lambda_i - \lambda_j) = \sum_{k=1}^{n} \left( (-1)^{k+1} \lambda_k^{n-1} \prod_{\substack{i<j \\ i \neq k,\, j \neq k}} (\lambda_i - \lambda_j) \right)$$

and then prove using induction starting with n = 2.)

E14. Let A[=]m × q and B[=]q × n. Show that

$$\frac{d}{dt}(AB) = \left( \frac{d}{dt}A \right) B + A \left( \frac{d}{dt}B \right)$$

E15. Unsteady-state heat conduction in a flat rectangular plate, say of dimension L × W, can be described by

$$\alpha \frac{\partial T}{\partial t} = \frac{\partial^2 T}{\partial x^2} + \frac{\partial^2 T}{\partial y^2}$$


Suppose the boundary conditions are given by

$$T(0, y, t) = 100, \quad T(L, y, t) = 100, \quad T(x, 0, t) = 100, \quad T(x, W, t) = 100$$

and the initial condition of the plate is

$$T(x, y, 0) = 50$$

A finite difference approximation to the differential equation (with Δx = Δy, L = NΔx and W = MΔx) is given by

$$T_{n,m}(k+1) = \frac{1}{\mu} \left( \left[ T_{n-1,m}(k) + \left( \frac{\mu}{2} - 2 \right) T_{n,m}(k) + T_{n+1,m}(k) \right] + \left[ T_{n,m-1}(k) + \left( \frac{\mu}{2} - 2 \right) T_{n,m}(k) + T_{n,m+1}(k) \right] \right) \tag{1.42}$$

where

$$\mu = \frac{\alpha (\Delta x)^2}{\Delta t}$$

is the modulus, which needs to be > 4 for the approximation to be stable. T(k) is the temperature distribution at time t = kΔt on the rectangular grid, and T(k) is an N × M matrix of temperature values.

1. Obtain matrices A, B and C such that the finite difference equation in (1.42) can be written as

$$T(k+1) = AT(k) + T(k)B + C \tag{1.43}$$

2. (Computer project) Using available matrix software, obtain and animate the temperature profiles using (1.43) from k = 0 to k = 1000, using μ = 8, N = 50 and M = 100. (To reduce memory problems, store only the images at every 10th k-increment.)

E16. The general equation of an ellipse is given by

$$\left( \frac{x}{a_1} \right)^2 + \left( \frac{y}{a_2} \right)^2 - \frac{2xy}{a_1 a_2} \cos\delta = \sin^2\delta \tag{1.44}$$

where $a_1, a_2, \sin(\delta) \neq 0$.

Let $v = (x, y)^T$; find matrix A such that equation (1.44) can be written as

$$v^T A v = 1$$

(Note: $v^T A v = 1$ is the general equation of a conic, and if A is positive definite, a property to be discussed in the next chapter, then the conic is an ellipse.)


E17. Discuss the changes needed for the Gaussian elimination algorithm given in section 1.3.3 to apply to nonsquare matrices of size n × m, n ≠ m.

E18. Show that tr(AB) = tr(BA) (assuming conformability conditions are met).


Chapter 2

The Linear Equation: Ax = b

In this chapter, we will treat the equation,

$$Ax = b \tag{2.1}$$

in three perspectives. The first view is to treat (2.1) as the matrix description of a set of n linear equations in which $x = x_1, \ldots, x_n$ is the set of unknowns which needs to be evaluated such that all the equations are satisfied. The objective is to determine the set x. If no set exists that achieves equality, the problem is said to have no solution. This view will be referred to as the solution of linear equations.

The second view is to consider matrix $A = (a_1 | \ldots | a_n)$ as a set of column vectors $a_i$. The column vector $x = (x_1, \ldots, x_n)^T$ is a set of scalar weights $x_i$ which linearly combine the columns of A. The problem is to find x such that the sum $a_1 x_1 + \ldots + a_n x_n$ matches vector b. In some cases, an exact match cannot be obtained. The problem is then relaxed to that of finding the closest match, e.g. in a least-squared-error sense. The column vectors of A form the base or basis set on which an algebra is performed. This view is thus referred to as the linear algebra treatment of (2.1).

The third view is to consider matrix A as an operator which moves the position of vector x to a new position described by vector b. In this treatment, the problem is not so much the evaluation of x. Instead, the main focus is the analysis of the properties of A, i.e. to determine what it does to various input vectors x as well as what output vectors b can result. This third view is the linear operator view of (2.1).

All three views are important in engineering applications. In solving linear equations, the main purpose is the exact determination of unknown values. The second view, i.e. the linear algebra perspective, is useful for mathematical modeling where the best approximation is the focus. The third view, the linear operator perspective, is useful for the design or redesign of operator A to achieve desired characteristics.

Unfortunately, the coexistence of these different views can cause confusion. One source of confusion is the term "coefficients". When solving linear equations, the rows of A are the


coefficients of each equation. However, when taking the linear algebra view, x is the set of weights, or coefficients, and not matrix A.

Since all three viewpoints consider exactly the same equation (2.1), it is inevitable that the tools developed within each perspective will enter into the treatment of the other perspectives. Thus it is important to be able to distinguish among the three perspectives while still being able to temporarily switch views when needed.

2.1 Solution of Linear Equations

For a given set of linear equations,

$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\ &\;\vdots \\ a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n &= b_n \end{aligned} \tag{2.2}$$

where $a_{ij}$ and $b_i$ are fixed constants, the problem is to find the values of $x_1, x_2, \ldots, x_n$ which satisfy all the equations simultaneously.

The set of equations in (2.2) can be written in matrix notation,

$$Ax = b$$

where

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}, \qquad x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \qquad b = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}$$

Three cases are possible: (1) there is a unique solution, (2) there are multiple (indeed infinitely many) solutions, and (3) there is no solution that satisfies all the given equations. Matrix theory will be used to determine which case the problem falls under, as well as to obtain the solutions (if they exist).

Case 1: Unique Solution Exists.

If $A^{-1}$ exists, then the solution is unique and is immediately obtained by premultiplying (2.1) by $A^{-1}$ to yield

$$x = A^{-1}b \tag{2.3}$$

In some applications, one might be interested in only a much smaller subset of $x_1, \ldots, x_n$. A method known as Cramer's rule is useful in these instances.


Theorem 2.1 Cramer's Rule.

If A is nonsingular, then the jth element of the solution vector x for Ax = b is given by

$$x_j = \frac{|\Omega^{(j)}|}{|A|} \tag{2.4}$$

where $\Omega^{(j)}$ is matrix A whose jth column is replaced by vector b.

(see page 64 for proof.)
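Cramer's rule translates into a few lines of code. A sketch assuming NumPy; the 2 × 2 system is invented for illustration:

```python
import numpy as np

def cramer(A, b):
    """Solve Ax = b element by element via Cramer's rule (2.4)."""
    detA = np.linalg.det(A)
    x = np.empty(len(b))
    for j in range(len(b)):
        Omega = A.copy()
        Omega[:, j] = b            # replace the jth column of A by b
        x[j] = np.linalg.det(Omega) / detA
    return x

A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([5.0, 10.0])
print(cramer(A, b), np.linalg.solve(A, b))   # both give [1. 3.]
```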

Case 2: Multiple Solutions.

If A is singular, then either an infinite number of solutions exist or no solution exists. Recall from the Gaussian elimination procedure discussed in section 1.3.2 that there exist nonsingular matrices Q and W such that

$$QAW = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} \tag{2.5}$$

(if A is nonsingular, then r = n and QAW = I)

Applying these matrices to (2.1),

$$(QAW)(W^{-1}x) = Qb$$

If we let $y = W^{-1}x$ and partition y and Q as follows:

$$y = \begin{pmatrix} y^{(1)} \\ y^{(2)} \end{pmatrix}, \qquad Q = \begin{pmatrix} Q^{(1)} \\ Q^{(2)} \end{pmatrix}$$

where $y^{(1)}[=]r \times 1$ and $Q^{(1)}[=]r \times n$, this results in two sets of matrix equations:

$$y^{(1)} = Q^{(1)}b \tag{2.6}$$

$$0\, y^{(2)} = Q^{(2)}b \tag{2.7}$$

Equation (2.7) can now be used to determine if a solution exists. A solution is possible only if $Q^{(2)}b = 0$.

If consistency is present, the solution is given by

$$x = W \begin{pmatrix} Q^{(1)}b \\ y^{(2)} \end{pmatrix}$$

in which $y^{(2)}$ becomes an (n − r) × 1 vector of arbitrary constants.


Example 2.1

Suppose the set of equations is given by

$$\begin{pmatrix} 0 & 2 & 2 & 4 \\ 2 & 1 & 3 & 4 \\ -1 & 2 & 1 & 3 \\ 3 & 1 & 4 & 6 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} 8 \\ 10 \\ 5 \\ 14 \end{pmatrix}$$

From example 1.5 (cf. page 17),

$$Q = \begin{pmatrix} 0 & -7/5 & 1/5 & 1 \\ 0 & -6/5 & -2/5 & 1 \\ 0 & 3 & 0 & -2 \\ 1 & -2/5 & -4/5 & 0 \end{pmatrix}, \qquad W = \begin{pmatrix} 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & -1 \\ 1 & 0 & 0 & 0 \end{pmatrix}$$

and r = 3. $Q^{(1)}$ consists of the first three rows of Q, while $Q^{(2)}$ consists of the last row of Q. Since

$$Q^{(2)}b = (1, -2/5, -4/5, 0) \begin{pmatrix} 8 \\ 10 \\ 5 \\ 14 \end{pmatrix} = (0)$$

there are multiple solutions. Setting $y^{(2)} = (\alpha)$,

$$x = W \begin{pmatrix} Q^{(1)}b \\ \alpha \end{pmatrix} = W \begin{pmatrix} 1 \\ 0 \\ 2 \\ \alpha \end{pmatrix} = \begin{pmatrix} \alpha \\ \alpha \\ 2 - \alpha \\ 1 \end{pmatrix}$$

♦♦♦

If one is interested only in determining whether a solution exists or not, another index, called the rank, can be used. This index has already appeared earlier as r in (2.5).

Definition 2.1 The rank of a matrix A, denoted rank(A), is the size of the largest nonsingular square submatrix obtained from the columns and rows of A.


With this definition, and the property that |AB| = |A||B|, one can see that with B nonsingular,

$$\mathrm{rank}(A) = \mathrm{rank}(AB) = \mathrm{rank}(BA) \tag{2.8}$$

Since Q and W used in the Gaussian elimination procedure are nonsingular, the size of the identity matrix $I_r$ in (2.5) turns out to be the rank of A.

Next, obtain a partitioned matrix

$$G = (A \,|\, b)$$

and

$$QG = (QA \,|\, Qb) = \begin{pmatrix} V & Q^{(1)}b \\ 0 & Q^{(2)}b \end{pmatrix} \tag{2.9}$$

where V is a matrix formed by extracting the first r rows of $W^{-1}$, and rank(V) = rank(QA) = r. From (2.9) we arrive at the rank test for the existence of solutions.

Theorem 2.2 A solution exists for (2.1) if and only if

$$\mathrm{rank}(A) = \mathrm{rank}((A \,|\, b)) \tag{2.10}$$

For the special case of b = 0, the rank condition (2.10) is immediately satisfied. If A is nonsingular, the only solution is x = 0. However, if A is singular, nonzero solutions exist, even though x = 0 still satisfies Ax = 0. Here, x = 0 is referred to as the trivial solution, since it is always true for Ax = 0, while the nonzero solutions, x ≠ 0, are called the nontrivial solutions.

Case 3: No Solution.

As a corollary to theorem 2.2, we have

Corollary 2.3 A solution does not exist for (2.1) if and only if

$$\mathrm{rank}(A) < \mathrm{rank}((A \,|\, b))$$

Suppose there is additional knowledge of which equations may be unreliable, say equation j. One could set $b_j$ as an unknown to determine a new value for $b_j$ that satisfies the rank test (2.10), i.e. determine $b_j$ such that $Q^{(2)}b = 0$.

Example 2.2


$$\begin{pmatrix} 0 & 2 & 2 & 4 \\ 2 & 1 & 3 & 4 \\ -1 & 2 & 1 & 3 \\ 3 & 1 & 4 & 6 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} 3 \\ 5 \\ 6 \\ 6 \end{pmatrix}$$

Matrix A is the same as in example 2.1, so we can use the Q and W determined from that example. However, there is no solution for this example because $Q^{(2)}b = (-19/5) \neq (0)$.

Suppose the second equation is unreliable. We want to find $b_2$ so that $Q^{(2)}b = 0$:

$$Q^{(2)}b = (1, -2/5, -4/5, 0) \begin{pmatrix} 3 \\ b_2 \\ 6 \\ 6 \end{pmatrix} = \left( -\, \frac{9 + 2 b_2}{5} \right)$$

i.e. set $b_2 = -9/2$. With this new vector b, the case of multiple solutions can now be solved.

♦♦♦

The results for the solution of Ax = b for x can be generalized to other linear problems. For instance, the Lyapunov equations for the unknown matrix X are given by the form

$$BX + XC = E \tag{2.11}$$

where B[=]n × n, C[=]m × m, X[=]n × m and E[=]n × m. Except for the special case when either B or C is an identity matrix, one cannot solve (2.11) by a direct matrix inversion approach. However, using the properties of Kronecker products (cf. page 6) and vectorization operations (cf. page 7), one can easily transform problems like (2.11) into the familiar form of Ax = b. This transformation depends on the two lemmas given below.

Lemma 2.1 For matrices X[=]m × n and Y[=]m × n,

$$\mathrm{vec}(X + Y) = \mathrm{vec}(X) + \mathrm{vec}(Y) \tag{2.12}$$

Lemma 2.2 For matrices B[=]p × q, X[=]q × r and C[=]r × s,

$$\mathrm{vec}(BXC) = \left( C^T \otimes B \right) \mathrm{vec}(X) \tag{2.13}$$


(see page 64 for proof of lemma 2.2.)

Returning to (2.11), we can now apply both lemmas and obtain

$$\mathrm{vec}(BX + XC) = \mathrm{vec}(E)$$

$$(I_m \otimes B)\, \mathrm{vec}(X) + \left( C^T \otimes I_n \right) \mathrm{vec}(X) = \mathrm{vec}(E)$$

$$Ax = b$$

where

$$A = I_m \otimes B + C^T \otimes I_n$$

and x = vec(X), b = vec(E). With this transformation, the same criterion can be used on A to determine whether the linear equation has a unique, non-unique or no solution. Of course, after x has been determined, an additional step is required to reverse the vectorization operation to reshape x to the same size as X.
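The transformation can be coded directly from Lemma 2.2. A sketch assuming NumPy; the matrices B, C and E are invented, and the final reshape (column-major) reverses the vectorization:

```python
import numpy as np

def solve_lyapunov(B, C, E):
    """Solve BX + XC = E (2.11) via vectorization and Kronecker products."""
    n, m = B.shape[0], C.shape[0]
    A = np.kron(np.eye(m), B) + np.kron(C.T, np.eye(n))
    x = np.linalg.solve(A, E.flatten(order='F'))   # b = vec(E), column stacking
    return x.reshape((n, m), order='F')            # undo the vectorization

B = np.array([[2.0, 0.0], [1.0, 3.0]])
C = np.array([[1.0, 1.0], [0.0, 2.0]])
E = np.array([[1.0, 2.0], [3.0, 4.0]])

X = solve_lyapunov(B, C, E)
print(np.allclose(B @ X + X @ C, E))   # True
```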

Exercises

E1. An N-multistage evaporator system is shown in Figure 2.1.

Figure 2.1: Flowsheet of an N-stage evaporator system.

Brine having a mass fraction $x_{L_{N+1}}$ of salt is fed to stage N at a rate $n_{L_{N+1}}$. Condensation of steam from stage (i − 1) is used to evaporate some water that is in the brine fed to stage i. By fixing the pressure at each stage i and assuming that the enthalpy of the brine solution is the same as that of pure water, the values of the enthalpies $H_{L_i}$ and $H_{V_i}$ can be considered constant. For steady-state operations, the equations are given by

Enthalpy balances:

$$\left( H_{V_{i-1}} - H_{L_{i-1}} \right) n_{V_{i-1}} - n_{V_i} H_{V_i} - n_{L_i} H_{L_i} + n_{L_{i+1}} H_{L_{i+1}} = 0 \tag{2.14}$$

Mass balance:

$$n_{L_{i+1}} - n_{L_i} - n_{V_i} = 0 \tag{2.15}$$

Component balance:

$$n_{L_{i+1}} x_{L_{i+1}} - n_{L_i} x_{L_i} = 0 \tag{2.16}$$

From these equations, it is desired to obtain the amount of fresh steam needed by stage 1 in order to obtain a concentrated brine solution of $x_{L_1}$.

1. Show that equation 2.16 yields the overall mass balance equation,

$$x_{L_1} n_{L_1} = x_{L_{N+1}} n_{L_{N+1}} \tag{2.17}$$

2. Collect the various mass flow rates into vectors as follows:

$$n_V = \begin{pmatrix} n_{V_1} \\ n_{V_2} \\ \vdots \\ n_{V_N} \end{pmatrix}; \qquad n_L = \begin{pmatrix} n_{L_1} \\ n_{L_2} \\ \vdots \\ n_{L_N} \end{pmatrix}$$

Obtain the matrices A, B and C, and vectors q, h, f and g such that equations (2.14), (2.15) and (2.17) can be rewritten in matrix form

$$A n_V + B n_L + q\, n_{V_0} = h \tag{2.18}$$

$$n_V + C n_L = f \tag{2.19}$$

$$g^T n_L = x_{L_{N+1}} n_{L_{N+1}} \tag{2.20}$$

where

$$h = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ H_{L_{N+1}} n_{L_{N+1}} \end{pmatrix}; \qquad f = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ n_{L_{N+1}} \end{pmatrix}$$

3. These equations can be further put in compact form as

$$G\, n = v \tag{2.21}$$

where

$$G = \begin{pmatrix} A & B & q \\ I & C & 0 \\ 0 & g^T & 0 \end{pmatrix} \tag{2.22}$$

$$n = \begin{pmatrix} n_V \\ n_L \\ n_{V_0} \end{pmatrix} \tag{2.23}$$

$$v = \begin{pmatrix} h \\ f \\ x_{L_{N+1}} n_{L_{N+1}} \end{pmatrix} \tag{2.24}$$

where $n_{V_0}$ can be extracted as the last element of $n = G^{-1}v$.

G is of size (2N+1) × (2N+1) but relatively sparse. There exist numerical methods which can take advantage of the sparseness. Another method is to manipulate the matrix equations. Show that $n_{V_0}$ can be obtained from (2.18), (2.19) and (2.20) to be

$$n_{V_0} = \frac{x_{L_{N+1}} n_{L_{N+1}} - g^T (AC - B)^{-1} (Af - h)}{g^T (AC - B)^{-1} q} \tag{2.25}$$

Note that equation (2.25) involves the inverse of a smaller matrix (AC − B) that is of size N × N.

(Hint: first solve for $n_V$ in equation (2.19), then substitute into (2.18). Finally, substitute into (2.20) and solve for $n_{V_0}$.)

4. Show that you can also derive equation (2.25) using Cramer's rule together with equations (1.11) and (1.13)-(1.16).

E2. For the multiple reversible first-order batch reaction shown in Figure 2.2, the equilibrium equation is given by Ax = 0, where

Figure 2.2: A schematic of three-component reversible reaction.

$$A = \begin{pmatrix} -(k_{ab} + k_{ac}) & k_{ba} & k_{ca} \\ k_{ab} & -(k_{ba} + k_{bc}) & k_{cb} \\ k_{ac} & k_{bc} & -(k_{ca} + k_{cb}) \end{pmatrix}$$


$$x = \begin{pmatrix} x_a \\ x_b \\ x_c \end{pmatrix}$$

with $x_i$ being the concentration of component i (in units of mass/volume) and $k_{ij}$ being the specific rate constant of the formation of j from i. By noting that A is singular, the equation should yield multiple solutions. However, starting with initial concentration $x_{i0}$ for component i, the equilibrium is unique. What is the missing equation? Is it possible, even with the additional equation, that a case of non-unique solutions or a case with no solution may still occur? Explain.

E3. Let A[=]n × n, X[=]n × m and C[=]n × m; show that the equation AX = C can be rewritten as

$$(I_m \otimes A)\, x = c$$

where x = vec(X) and c = vec(C).

E4. Let A[=]n × n, B[=]m × m, G[=]n × n, H[=]m × m, X[=]n × m and C[=]n × m. To solve for X in

$$AX + XB + GXH = C$$

determine D such that

$$Dx = c$$

where x = vec(X) and c = vec(C).

2.2 Linear Algebra Interpretation of Ax = b

The linear algebra perspective on the equation Ax = b is especially useful in mathematical modeling, where linear regressions are needed to determine unknown model parameters.

To illustrate, suppose a study has gathered the experimental data shown in Table 2.1 relating the effects of temperature and pressure on concentration.

The investigator wishes to obtain a linear model given by

$$C = \alpha T + \beta P + \gamma \tag{2.26}$$

that would best fit the data. To determine the unknown parameters of the model (2.26), let us first assume that the model will fit each data point:

Table 2.1: Sample experimental data.

Temperature (°F)   Pressure (mm Hg)   Concentration (mole fraction)
      60.2               600                  0.660
      70.1               602                  0.672
      79.9               610                  0.686
      90.0               590                  0.684
      62.3               680                  0.702
      75.5               672                  0.711
      77.2               670                  0.713
      85.3               670                  0.720
      60.7               720                  0.721
      69.3               720                  0.731
      78.7               725                  0.742
      92.6               700                  0.742
      60.5               800                  0.760
      70.6               800                  0.771
      81.7               790                  0.777
      91.8               795                  0.790

$$\begin{aligned} 0.660 &= \alpha\,(60.2) + \beta\,(600) + \gamma \\ 0.672 &= \alpha\,(70.1) + \beta\,(602) + \gamma \\ &\;\vdots \\ 0.790 &= \alpha\,(91.8) + \beta\,(795) + \gamma \end{aligned} \tag{2.27}$$

To write these equations in matrix form,

$$A = \begin{pmatrix} 60.2 & 600 & 1 \\ 70.1 & 602 & 1 \\ \vdots & \vdots & \vdots \\ 91.8 & 795 & 1 \end{pmatrix}, \qquad x = \begin{pmatrix} \alpha \\ \beta \\ \gamma \end{pmatrix}, \qquad b = \begin{pmatrix} 0.660 \\ 0.672 \\ \vdots \\ 0.790 \end{pmatrix}$$

then the set of equations in (2.27) becomes

$$Ax = b$$

It is highly unlikely that all the equations in (2.27) can be satisfied simultaneously. Further, matrix A is not a square matrix and does not have an inverse. What we will develop in this chapter is a pseudo-inverse, $A^\dagger$, which acts almost like an inverse in the sense that $x = A^\dagger b$ becomes the best approximate solution.
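The least squares machinery developed below is available in standard software. As a preview, a sketch assuming NumPy, applied to the data of Table 2.1:

```python
import numpy as np

# Regression data from Table 2.1: columns are (T, P, C)
data = np.array([
    [60.2, 600, 0.660], [70.1, 602, 0.672], [79.9, 610, 0.686],
    [90.0, 590, 0.684], [62.3, 680, 0.702], [75.5, 672, 0.711],
    [77.2, 670, 0.713], [85.3, 670, 0.720], [60.7, 720, 0.721],
    [69.3, 720, 0.731], [78.7, 725, 0.742], [92.6, 700, 0.742],
    [60.5, 800, 0.760], [70.6, 800, 0.771], [81.7, 790, 0.777],
    [91.8, 795, 0.790]])

A = np.column_stack([data[:, 0], data[:, 1], np.ones(len(data))])
b = data[:, 2]

# Least squares fit of the model C = alpha*T + beta*P + gamma
x, residuals, rank, _ = np.linalg.lstsq(A, b, rcond=None)
alpha, beta, gamma = x
print(alpha, beta, gamma)
```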


2.2.1 Preliminaries: Terms and Notations

Consider a set of vectors $a_1, a_2, \ldots, a_m$, each having n elements,

$$a_1 = \begin{pmatrix} a_{11} \\ a_{12} \\ \vdots \\ a_{1n} \end{pmatrix}; \quad a_2 = \begin{pmatrix} a_{21} \\ a_{22} \\ \vdots \\ a_{2n} \end{pmatrix}; \quad \ldots; \quad a_m = \begin{pmatrix} a_{m1} \\ a_{m2} \\ \vdots \\ a_{mn} \end{pmatrix} \tag{2.28}$$

Each vector describes a point in an n-dimensional space.

1. Linear Combination and Basis Vectors.

A linear combination of vectors $a_1, \ldots, a_m$ (each of dimension n ≥ m) is a weighted sum of these vectors to yield another vector, say b, of the same dimension,

$$b = x_1 a_1 + \cdots + x_m a_m \tag{2.29}$$

where $x_1, x_2, \ldots, x_m$ are scalars.

2. Span

In (2.29), if each scalar $x_i$ is allowed to take on a range of values, then vector b will reside in a subspace of dimension less than or equal to m. This subspace is called the span, and it contains all the vectors resulting from a linear combination of the spanning vectors, including the origin:

$$\mathrm{Span}(a_1, \ldots, a_m) = x_1 a_1 + \ldots + x_m a_m \tag{2.30}$$

where the scalars $x_i$ are real or complex depending on the application.

3. Euclidean Norm

The norm of a vector is the distance of the point from the origin. One specific measure, called the Euclidean norm, uses the Pythagorean theorem to obtain the distance between two points: one point is the position represented by the vector and the other point is the origin. The Euclidean norm¹ for vector $a_i$, denoted by $\|a_i\|$, is evaluated as follows:

$$\|a_i\| = \sqrt{a_{i1}^2 + \cdots + a_{in}^2} \tag{2.31}$$

Example 2.3

¹ This norm is also known as the 2-norm, since each element is raised to the second power; it is also denoted by $\|a_i\|_2$.


Consider the following vectors:

$$a_1 = \begin{pmatrix} 2 \\ -2 \\ 2 \end{pmatrix}, \quad a_2 = \begin{pmatrix} -2 \\ 0 \\ 0 \end{pmatrix}, \quad a_3 = \begin{pmatrix} 2 \\ 2 \\ -2 \end{pmatrix}, \quad a_4 = \begin{pmatrix} -2 \\ 2 \\ 1 \end{pmatrix}$$

Since there are three elements in vector $a_4$, the dimension is n = 3. The norm of $a_4$ is given by

$$\|a_4\| = \sqrt{(-2)^2 + 2^2 + 1^2} = 3$$

Figure 2.3 shows the point described by vector $a_4$ and its norm.

Figure 2.3: The norm of vector $a_4$.

The span of $a_1$, $a_3$ and $a_4$ is given by

$$q = \mathrm{Span}(a_1, a_3, a_4) = x_1 a_1 + x_3 a_3 + x_4 a_4 = \begin{pmatrix} 2x_1 + 2x_3 - 2x_4 \\ -2x_1 + 2x_3 + 2x_4 \\ 2x_1 - 2x_3 + x_4 \end{pmatrix}$$

which turns out to be the whole three-dimensional space as $x_1$, $x_3$ and $x_4$ take on any real values. This is shown in Figure 2.4. Note that the vectors $a_1$, $a_3$ and $a_4$ are not coplanar with the origin.

The span of $a_1$, $a_2$ and $a_3$ is given by

$$\mathrm{Span}(a_1, a_2, a_3) = x_1 a_1 + x_2 a_2 + x_3 a_3 = \begin{pmatrix} 2x_1 - 2x_2 + 2x_3 \\ -2x_1 + 2x_3 \\ 2x_1 - 2x_3 \end{pmatrix}$$


Figure 2.4: The span of the set a1, a3 and a4.

It turns out that the points $a_1$, $a_2$ and $a_3$, plus the origin, lie in a two-dimensional plane described by

$$y + z = 0$$

Thus the dimension of the span space can be less than m, the number of spanning vectors. It turns out that in this example $a_3 = -a_1 - 2a_2$, i.e. $a_3$ is linearly dependent on vectors $a_1$ and $a_2$. The notion of linear independence is defined more formally below.

♦♦♦

Definition 2.2 A set of vectors $a_1, a_2, \ldots, a_m$ is linearly independent if the only possibility for the linear combination

$$x_1 a_1 + x_2 a_2 + \cdots + x_m a_m = 0$$

to be true is for the scalars $x_1, \ldots, x_m$ to be all zero. Otherwise, the set is linearly dependent.

If the spanning vectors a1, a2, . . . , am are linearly independent, then they are called the basisvectors.

One method for checking linear independence is through the use of the Grammian matrix.


Figure 2.5: The span of the set a1, a2 and a3.

Definition 2.3 Given the set of n-dimensional vectors a1, a2, . . . , am, and

A = (a1|a2| · · · |am)

The Grammian G is defined as

G = A^T A

Theorem 2.4 A set of n-dimensional vectors a1, a2, . . . , am is linearly independent if and only if the corresponding Grammian is nonsingular.

(See page 65 for the proof of Theorem 2.4.)

An immediate consequence of Theorem 2.4 is that the set of n-dimensional vectors a1, . . . , am is linearly independent if and only if rank(A) = min(m, n), where A = (a1| · · · |am).
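As a quick numerical illustration (a sketch outside the text, assuming Python with the numpy library), the Grammian test can be applied to the vectors of Example 2.3; the near-zero determinant flags the linear dependence found there:

```python
import numpy as np

# Columns are a1, a2, a3 from Example 2.3; recall a3 = -a1 - 2*a2.
A = np.column_stack([[2, -2, 2], [-2, 0, 0], [2, 2, -2]])

G = A.T @ A                       # the Grammian, G = A^T A
print(np.linalg.det(G))           # ~0 (up to round-off): G is singular
print(np.linalg.matrix_rank(A))   # 2 < min(m, n) = 3: columns are dependent
```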

2.2.2 Least squares solution to Ax = b


Let matrix A be the matrix obtained by augmenting the basis vectors a1, . . . , am

A = (a1| · · · |am)


(In this section, we will restrict the elements of the vectors and matrices to be real numbers.)

The corresponding scalar coefficients of a linear combination can be collected into a vector x,

x = \begin{pmatrix} x_1 \\ \vdots \\ x_m \end{pmatrix}

and if x is allowed to be arbitrary, the span is given by the product Ax.

If vector b is included in the span of the columns of A, equality in Ax = b is possible. On the other hand, if vector b is not in the span, the best one can do is to find a point in the span that is closest to the point b. This means that a value of x has to be found that minimizes the error vector

e = Ax − b

Figure 2.6: Graphical interpretation of the least squares problem.

One choice for measuring the error is the Euclidean norm. However, for simpler calculations, the problem does not change if we use the square of the norm instead. The minimization problem is then formulated as

x_{lsq} = \arg\min_x \, ‖Ax - b‖^2

This is often referred to as the least squares problem, and x_lsq is referred to as the least squares solution.²

To find the minimum of a function that depends on only one variable, the result from calculus states that at the minimum point the derivative has to be zero, while at the same time the second derivative has to be positive (i.e. the function is concave upward). For situations in which the function to be minimized depends on more than one variable, we will need the extension given in Theorem 2.5. It states that the partial derivatives with respect to each independent variable have to be zero. In addition, for the function to attain a minimum, it is sufficient that the matrix containing the second-order partial derivatives, called the Hessian, attain a condition called positive definiteness.

² Strictly speaking, it should be the "least squares error" problem and solution, but the term "error" is dropped since the method applies to more general situations where the objective functions do not involve errors.

Definition 2.4 A matrix P [=] n × n is positive definite if x^T P x > 0 for all x ≠ 0.

Example 2.4

Let A be given by

A = \begin{pmatrix} 3 & -1 \\ -1 & 2 \end{pmatrix}

then

x^T A x = (x_1, x_2)\, A \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = 3x_1^2 - 2x_1 x_2 + 2x_2^2 = 2x_1^2 + (x_1^2 - 2x_1 x_2 + x_2^2) + x_2^2 = 2x_1^2 + (x_1 - x_2)^2 + x_2^2

Thus A is positive definite, because x^T A x is always positive as long as x1 and x2 are not both zero. The process just shown is called completing the squares. (This method can become too complicated for large dimensions. There are other, more efficient methods for determining positive definiteness of square matrices, which will be discussed in later sections.) To show why positive definiteness is an extension of the concept of concavity, a plot of x^T A x is shown in Figure 2.7.

♦♦♦
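For a numerical check (a sketch outside the text, assuming Python with numpy), one standard test for symmetric matrices, of the kind deferred to "later sections" above, is that all eigenvalues be positive; Cholesky factorization gives an equivalent test:

```python
import numpy as np

A = np.array([[3.0, -1.0],
              [-1.0, 2.0]])

# A symmetric matrix is positive definite iff all its eigenvalues are positive.
print(np.linalg.eigvalsh(A))      # approximately [1.382, 3.618], both > 0

# Cholesky factorization succeeds only for positive definite matrices;
# it raises LinAlgError otherwise.
print(np.linalg.cholesky(A))
```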

Theorem 2.5 Let f(x) be a scalar function of vector x. Then x_opt minimizes the value of f if

\frac{df}{dx}(x = x_{opt}) = 0

and the Hessian matrix H is positive definite, where H is given by

H = \frac{d}{dx}\left[ \frac{df}{dx}(x = x_{opt}) \right]


Figure 2.7: Plot of x^T A x, where A is positive definite.

Expanding the square of the norm of vector e,

‖e‖^2 = (Ax - b)^T (Ax - b) = (x^T A^T - b^T)(Ax - b) = x^T A^T A x - 2 b^T A x + b^T b

\frac{d}{dx}\,‖e‖^2 = 2 x^T A^T A - 2 b^T A

After taking the transpose and equating to zero,

A^T A\, x_{lsq} = A^T b \qquad (2.32)

If A^T A is nonsingular,

x_{lsq} = (A^T A)^{-1} A^T b \qquad (2.33)

This equation is referred to as the normal equation.³ To determine if the solution (2.33) indeed yields a minimum, we need to check whether

\frac{d}{dx}\left( \frac{d}{dx}\,‖e‖^2 \right) = 2 A^T A

³ The term is due to the matrix A^T A being a normal matrix. As will be discussed later, a matrix N is normal if N^T N = N N^T, of which symmetric matrices are special cases.


is positive definite. If A^T A is nonsingular, it is immediately a positive definite matrix. To show this, let s = Ax; then

p = x^T (A^T A) x = (x^T A^T)(A x) = s^T s

and p > 0 as long as s ≠ 0.

Recall also that A^T A is the Grammian for the columns of A. Thus, as long as the columns of A are linearly independent, A^T A is guaranteed to be nonsingular and positive definite. This means that the normal equation yields a least-squares-error solution to Ax = b.
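A minimal numerical sketch of the normal equation (an illustration assuming Python with numpy; np.linalg.lstsq is a library routine used here only for comparison, not something developed in the text):

```python
import numpy as np

# A small overdetermined system Ax = b with independent columns.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([0.1, 0.9, 2.1])

# Least squares solution from the normal equation (2.33).
x_lsq = np.linalg.solve(A.T @ A, A.T @ b)

# Library routine for comparison (it avoids forming A^T A explicitly).
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x_lsq, x_ref)               # the two solutions agree
```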

Let us take another look at the normal equation (2.33). The product (A^T A)^{-1} A^T acts like an inverse in solving the problem Ax = b. This group of terms is often referred to as the pseudoinverse of A, denoted A†. If A is square and nonsingular, the inverse of A and the pseudoinverse of A are exactly the same:

A^{\dagger} = (A^T A)^{-1} A^T = A^{-1}(A^T)^{-1} A^T = A^{-1}

For more general cases, e.g. including when A^T A is singular, the pseudoinverse (also called the Moore-Penrose generalized inverse) is defined as the matrix A† that satisfies the following conditions:

1. AA† and A†A are Hermitian

2. AA†A = A

3. A†AA† = A†
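These conditions can be checked numerically, as in the sketch below (an illustration assuming numpy; np.linalg.pinv is numpy's built-in Moore-Penrose routine):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])

A_dag = np.linalg.pinv(A)         # Moore-Penrose pseudoinverse

# For full-column-rank A this matches (A^T A)^{-1} A^T.
print(np.allclose(A_dag, np.linalg.inv(A.T @ A) @ A.T))

# The three defining conditions (real case, so Hermitian = symmetric):
print(np.allclose(A @ A_dag, (A @ A_dag).T))    # A A† Hermitian
print(np.allclose(A_dag @ A, (A_dag @ A).T))    # A† A Hermitian
print(np.allclose(A @ A_dag @ A, A))            # A A† A = A
print(np.allclose(A_dag @ A @ A_dag, A_dag))    # A† A A† = A†
```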

2.2.3 Linear-in-parameters Model

Oftentimes, the models required to fit the data can be rearranged into a form that is linear in the unknown parameters:

g(y1, . . . , yn) = α1f1(y1, . . . , yn) + · · · + αmfm(y1, . . . , yn) (2.34)

Here α1, . . . , αm are the unknown parameters, y1, . . . , yn are the variables that are observed in an experiment, and g, f1, . . . , fm are functions which do not contain any unknown parameters.

Once in this form, the matrix notation becomes Ax = b, where

A = \begin{pmatrix} f_{1,[1]} & \cdots & f_{m,[1]} \\ f_{1,[2]} & \cdots & f_{m,[2]} \\ \vdots & & \vdots \\ f_{1,[T]} & \cdots & f_{m,[T]} \end{pmatrix} \qquad
x = \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_m \end{pmatrix} \qquad
b = \begin{pmatrix} g_{[1]} \\ g_{[2]} \\ \vdots \\ g_{[T]} \end{pmatrix} \qquad (2.35)


The subscript [i] denotes evaluation of the function using the values observed at the i-th sample, and T > m is the total number of samples in the experiment. The unknown parameters can then be easily determined using the normal equation, as long as A^T A is nonsingular, i.e. the columns of A are linearly independent.

Example 2.5

Consider the raw data from the laboratory for the determination of vapor pressure as a function of temperature:

Table 2.2: Raw Data from Vapor Pressure Experiment.

    Temperature (°C)    Vapor Pressure (mm Hg)
    29.0                  20
    30.5                  21
    40.0                  35
    45.3                  46
    53.6                  68
    60.1                  92
    72.0                 152
    79.7                 206
    83.5                 238
    90.2                 305
    105.2                512
    110.5                607
    123.2                897
    130.0               1092
    132.0               1156

Suppose one wishes to determine the coefficients of the Antoine equation, given by

\log_{10}(P_{vap}) = A - \frac{B}{T + C} \qquad (2.36)

where Pvap is the vapor pressure in mm Hg and T is the temperature in °C. The model given in (2.36) is not linear in the parameters A, B and C. We can rearrange the equation to make it amenable to the normal equation:

(T + C) \log_{10}(P_{vap}) = (T + C) A - B

T \log_{10}(P_{vap}) = -C \log_{10}(P_{vap}) + A T + (AC - B) \qquad (2.37)


Compared with (2.34), we have g(T, P) = T log10(Pvap), f1(T, P) = log10(Pvap), f2(T, P) = T and f3(T, P) = 1. These functions can be evaluated at each data point, as shown in Table 2.3. The parameters are recast as α1 = −C, α2 = A and α3 = AC − B.

Table 2.3: Function Evaluations for Different Samples.

    g = T log10(Pvap)    f1 = log10(Pvap)    f2 = T    f3 = 1
    37.43                1.291               29.0      1
    40.47                1.327               30.5      1
    61.85                1.546               40.0      1
    75.28                1.662               45.3      1
    98.30                1.834               53.6      1
    117.91               1.962               60.1      1
    157.04               2.181               72.0      1
    184.42               2.314               79.7      1
    198.47               2.377               83.5      1
    224.09               2.484               90.2      1
    284.98               2.709               105.2     1
    307.56               2.783               110.5     1
    363.76               2.953               123.2     1
    394.97               3.038               130.0     1
    404.29               3.063               132.0     1

\begin{pmatrix} 1.291 & 29.0 & 1 \\ 1.327 & 30.5 & 1 \\ \vdots & \vdots & \vdots \\ 3.063 & 132.0 & 1 \end{pmatrix} \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \end{pmatrix} = \begin{pmatrix} 37.43 \\ 40.47 \\ \vdots \\ 404.29 \end{pmatrix}

Solving the normal equation yields

α1 = −222.5, α2 = 7.39, α3 = 110.28

The original parameters can be obtained:

A = α2 = 7.39, C = −α1 = 222.5, B = AC − α3 = 1534

Figure 2.8 shows the data points together with the model (2.36) after applying these estimates.

♦♦♦
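A rough computational sketch of this regression (an illustration, not part of the text; it assumes numpy and, for brevity, uses only a subset of Table 2.2, so the printed estimates will differ slightly from those quoted above):

```python
import numpy as np

# A few (T, Pvap) pairs from Table 2.2; the full table was used in the text.
T = np.array([29.0, 40.0, 53.6, 72.0, 90.2, 110.5, 132.0])
P = np.array([20.0, 35.0, 68.0, 152.0, 305.0, 607.0, 1156.0])

logP = np.log10(P)
A = np.column_stack([logP, T, np.ones_like(T)])   # f1, f2, f3 of (2.37)
b = T * logP                                      # g = T log10(Pvap)

a1, a2, a3 = np.linalg.solve(A.T @ A, A.T @ b)    # normal equation
A_ant, C = a2, -a1                                # A = alpha2, C = -alpha1
B = A_ant * C - a3                                # B = A*C - alpha3
print(A_ant, B, C)                                # near 7.39, 1534, 222.5
```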


Figure 2.8: Comparison of the Antoine model and raw data.

2.2.4 Least Squares Approximation Under Equality Constraints.

Oftentimes, the physics of the system requires that the model pass strictly through specified points. For instance, for multicomponent systems, an fL-fV diagram requires at least that the mole fraction in the liquid phase fL = 0 when the mole fraction in the vapor phase fV = 0. Likewise, fL = 1 when fV = 1. Constraints can generally be handled by using methods such as the calculus of variations. However, in cases where the constraints can be set up as systems of linear algebraic equations, and where the model can be formulated in a form that is linear in the parameters, a simple modification of the normal equation offers a more direct solution.

Let x be the vector of model parameters, i.e.

u_1 x_1 + u_2 x_2 + \cdots + u_n x_n = y

or

U x = Y \qquad (2.38)

that needs to be identified based on available measurements of the ui and y. Also, suppose that there are r (< n) independent equality constraints, i.e.

q_{11} x_1 + q_{12} x_2 + \cdots + q_{1n} x_n = z_1
q_{21} x_1 + q_{22} x_2 + \cdots + q_{2n} x_n = z_2
\qquad \vdots
q_{r1} x_1 + q_{r2} x_2 + \cdots + q_{rn} x_n = z_r


or

Q x = Z

Further, suppose Q and x are partitioned as

(Q_a \,|\, Q_b) \begin{pmatrix} x_a \\ x_b \end{pmatrix} = Z

where Qa is nonsingular. (Note: reordering or reindexing of the parameters may be necessary in most cases to make Qa nonsingular.) Then

x_a = -Q_a^{-1} Q_b x_b + Q_a^{-1} Z \qquad (2.39)

Let Ua be the matrix formed from the first r columns u1, . . . , ur, and let Ub be formed from the remaining n − r columns ur+1, . . . , un. Then equation (2.38) becomes

(U_a \,|\, U_b) \begin{pmatrix} x_a \\ x_b \end{pmatrix} = U_a x_a + U_b x_b = Y

Using xa from equation (2.39),

U_a(-Q_a^{-1} Q_b x_b + Q_a^{-1} Z) + U_b x_b = Y

(U_b - U_a Q_a^{-1} Q_b)\, x_b = Y - U_a Q_a^{-1} Z

The least squares solution for xb is then given by

x_b = (M^T M)^{-1} M^T W \qquad (2.40)

where M = U_b − U_a Q_a^{-1} Q_b and W = Y − U_a Q_a^{-1} Z. Finally, xa is evaluated by substituting xb into equation (2.39).
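The partitioned procedure in equations (2.39)-(2.40) translates directly into code. The following is a minimal sketch (assuming numpy; the function name constrained_lsq and the argument r, the number of constraints, are conveniences introduced here), which assumes the parameters have already been ordered so that Qa is nonsingular:

```python
import numpy as np

def constrained_lsq(U, Y, Q, Z, r):
    """Least squares for U x = Y subject to Q x = Z, following (2.39)-(2.40).
    Assumes the parameters are ordered so that the first r columns of Q
    (the block Qa) form a nonsingular matrix."""
    Qa, Qb = Q[:, :r], Q[:, r:]
    Ua, Ub = U[:, :r], U[:, r:]
    M = Ub - Ua @ np.linalg.solve(Qa, Qb)     # M = Ub - Ua Qa^{-1} Qb
    W = Y - Ua @ np.linalg.solve(Qa, Z)       # W = Y - Ua Qa^{-1} Z
    xb = np.linalg.solve(M.T @ M, M.T @ W)    # equation (2.40)
    xa = np.linalg.solve(Qa, Z - Qb @ xb)     # equation (2.39)
    return np.concatenate([xa, xb])
```

With U assembled from the data columns (1, fL, fL²) in the reordered sequence (γ, β, α), this reproduces the constrained fit of Example 2.6 below.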

Example 2.6

Consider the task of finding a second-order polynomial model approximation for the fL-fV diagram,

f_V = \alpha f_L^2 + \beta f_L + \gamma \qquad (2.41)

to fit the set of vapor-liquid equilibrium data for a binary system given in Table 2.4.

It is required that fL = 0 at fV = 0 and fL = 1 at fV = 1. (For some systems, an azeotrope point needs to be fixed at fL,az = fV,az, 0 < fL,az < 1.) The regression has to produce a model which passes through these fixed points. Based on the model (2.41), the constraints are given as

\begin{pmatrix} 0 & 0 & 1 \\ 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \\ \gamma \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}


Table 2.4: Vapor-Liquid Equilibrium Data.

    fL       fV          fL       fV
    0.0000   0.0000      0.4052   0.5968
    0.0718   0.1700      0.4483   0.6522
    0.1121   0.2332      0.5172   0.6759
    0.1322   0.1937      0.5690   0.7549
    0.1753   0.2530      0.6236   0.8103
    0.1983   0.3636      0.6753   0.8142
    0.2500   0.3478      0.7443   0.8300
    0.2931   0.4506      0.7902   0.8972
    0.3190   0.5257      0.9080   0.9289
    0.3362   0.5217      0.9167   0.9802
    0.3937   0.4032      1.0000   1.0000

In order to use the partition suggested for (2.39), we need to rewrite the equation as

Q x = Z \;\longrightarrow\; \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} \gamma \\ \beta \\ \alpha \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}

i.e.

Q_a = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} \qquad Q_b = \begin{pmatrix} 0 \\ 1 \end{pmatrix} \qquad Z = \begin{pmatrix} 0 \\ 1 \end{pmatrix} \qquad x_a = \begin{pmatrix} \gamma \\ \beta \end{pmatrix} \qquad x_b = (\alpha)

Using the data,

U_a = \begin{pmatrix} 1 & 0 \\ 1 & 0.0718 \\ 1 & 0.1121 \\ \vdots & \vdots \\ 1 & 0.9167 \\ 1 & 1 \end{pmatrix} \qquad
U_b = \begin{pmatrix} 0^2 \\ 0.0718^2 \\ 0.1121^2 \\ \vdots \\ 0.9167^2 \\ 1^2 \end{pmatrix} \qquad
Y = \begin{pmatrix} 0 \\ 0.1700 \\ 0.2332 \\ \vdots \\ 0.9802 \\ 1 \end{pmatrix}

Proceeding with equation (2.40) and then equation (2.39), the model fit obtained from least squares with constraints is

f_V = -0.6869\, f_L^2 + 1.6869\, f_L \qquad (2.42)


Without imposing the constraints, i.e. applying the normal equation to U x = Y, the fit is given by

f_V = -0.6349\, f_L^2 + 1.5999\, f_L + 0.0262 \qquad (2.43)

The plots shown in Figure 2.9 compare the model fit that uses the constraints with the one that does not.

Figure 2.9: Comparison of models using constraints and not using constraints.

♦♦♦

Exercises

E1. Determine which of the following matrices are positive definite:

A = \begin{pmatrix} 3 & 4 \\ 1 & 2 \end{pmatrix} \qquad
B = \begin{pmatrix} 3 & 4 \\ 1 & 0 \end{pmatrix} \qquad
C = \begin{pmatrix} 0 & 2 \\ -2 & 0 \end{pmatrix}

E2. The linear regression model is given by

y = m x + b

where x and y are the independent and dependent variables, respectively, m is the slope of the line and b is the intercept. Show that the familiar formulas

m = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2}


Table 2.5: Data to be fit by (2.44)

    w    z         w    z         w    z          w    z          w    z
    0.1  0.65      1.1  1.82      2.1  7.12       3.1  13.99      4.1  23.33
    0.2  -0.01     1.2  3.15      2.2  11.25      3.2  24.27      4.2  36.52
    0.3  0.87      1.3  4.01      2.3  14.27      3.3  23.53      4.3  40.41
    0.4  -0.55     1.4  2.51      2.4  9.11       3.4  15.45      4.4  24.56
    0.5  -1.02     1.5  -0.21     2.5  -1.81      3.5  0.62       4.5  0.63
    0.6  -0.46     1.6  -3.39     2.6  -9.44      3.6  -16.37     4.6  -26.76
    0.7  -0.08     1.7  -8.18     2.7  -18.24     3.7  -30.67     4.7  -46.66
    0.8  -1.23     1.8  -7.52     2.8  -16.55     3.8  -31.63     4.8  -47.83
    0.9  -1.99     1.9  -4.23     2.9  -11.20     3.9  -20.13     4.9  -31.23
    1.0  0.89      2.0  0.15      3.0  0.64       4.0  -2.48      5.0  0.55

b = \frac{\sum_{i=1}^{n} y_i \sum_{i=1}^{n} x_i^2 - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} x_i y_i}{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2}

can be obtained from the normal equation.

E3. From the data given in Table 2.5, obtain the parameters α1, α2 and α3 that would yield the least squares fit using the following model:

z = (\alpha_1 w^2 + \alpha_2 w + \alpha_3) \sin(2\pi w) \qquad (2.44)

E4. Given the vapor-liquid equilibrium data shown in Table 2.6, obtain a 5th-order polynomial fit that satisfies the following constraints:

fL = 0 when fV = 0
fL = 1 when fV = 1
fL = 0.65 when fV = 0.65

2.2.5 Gram-Schmidt Orthogonalization

Suppose we are given a set of linearly independent vectors, say x1, x2, . . . , xn. These vectors are basis vectors that span an n-dimensional space. However, in order to obtain a better description of the span, it is often better to use a different basis set of n vectors that are perpendicular (or orthogonal) to each other. For instance, for models that are linear in parameters, if the columns of data are closer to orthogonality (a property defined below), solving the normal equation becomes more computationally stable. The Gram-Schmidt algorithm is one procedure for obtaining these mutually perpendicular basis vectors.


Table 2.6: Vapor Liquid Equilibrium Data

    fL     fV         fL     fV
    0.02   0.07       0.85   0.79
    0.06   0.16       0.89   0.86
    0.11   0.24       0.95   0.92
    0.18   0.33       0.99   0.97
    0.22   0.35       0.25   0.37
    0.29   0.43       0.38   0.45
    0.37   0.50       0.54   0.58
    0.50   0.56       0.81   0.73
    0.71   0.68       0.94   0.88
    0.78   0.74       0.97   0.93

Definition 2.5 Let a and b be two vectors of the same length, which could contain complex numbers. Then the inner product of a and b, denoted by ⟨a, b⟩, is given by

⟨a, b⟩ = a^* b \qquad (2.45)

(Note that ⟨b, a⟩ is the complex conjugate of ⟨a, b⟩.)

Definition 2.6 Let a and b be two vectors of the same length. Then a and b are orthogonal to each other if ⟨a, b⟩ = 0. A set of vectors z1, . . . , zn is called an orthonormal set if

⟨z_i, z_j⟩ = \begin{cases} 0 & \text{if } i \neq j \\ 1 & \text{if } i = j \end{cases} \qquad (2.46)

Gram-Schmidt Algorithm:

Given a set of linearly independent vectors: x1, . . . , xn

1. Start with x1. Let y1 = x1, then normalize y1 via

z_1 = \frac{y_1}{‖y_1‖}


2. Perform the following recursion for k = 2, . . . , n:

y_k = x_k - \sum_{i=1}^{k-1} \langle z_i, x_k \rangle\, z_i

or, in matrix notation,

y_k = \left[ I_n - (z_1 | \cdots | z_{k-1}) \begin{pmatrix} z_1^* \\ \vdots \\ z_{k-1}^* \end{pmatrix} \right] x_k

then normalize yk to get zk:

z_k = \frac{y_k}{‖y_k‖}

Example 2.7

Consider the following vectors,

x_1 = \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix} \qquad
x_2 = \begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix} \qquad
x_3 = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}

Applying the method, we obtain

y_1 = \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix} \qquad z_1 = \begin{pmatrix} 0.408 \\ 0.816 \\ 0.408 \end{pmatrix}

y_2 = \left[ I_3 - \begin{pmatrix} 0.408 \\ 0.816 \\ 0.408 \end{pmatrix} (0.408, 0.816, 0.408) \right] \begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix} = \begin{pmatrix} -0.667 \\ -0.333 \\ 1.333 \end{pmatrix} \qquad z_2 = \begin{pmatrix} -0.436 \\ -0.218 \\ 0.873 \end{pmatrix}

y_3 = \left[ I_3 - \begin{pmatrix} 0.408 & -0.436 \\ 0.816 & -0.218 \\ 0.408 & 0.873 \end{pmatrix} \begin{pmatrix} 0.408 & 0.816 & 0.408 \\ -0.436 & -0.218 & 0.873 \end{pmatrix} \right] \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0.214 \\ -0.143 \\ 0.071 \end{pmatrix} \qquad z_3 = \begin{pmatrix} 0.802 \\ -0.534 \\ 0.267 \end{pmatrix}

♦♦♦
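For comparison with the hand computation above, here is a minimal classical Gram-Schmidt sketch (an illustration assuming numpy and real-valued vectors; for complex vectors the transposes below would need conjugation):

```python
import numpy as np

def gram_schmidt(X):
    """Classical Gram-Schmidt: orthonormalize the columns of X."""
    Z = np.zeros_like(X, dtype=float)
    for k in range(X.shape[1]):
        # subtract the projections onto the previously found z's
        y = X[:, k] - Z[:, :k] @ (Z[:, :k].T @ X[:, k])
        Z[:, k] = y / np.linalg.norm(y)
    return Z

X = np.array([[1.0, 0.0, 1.0],
              [2.0, 1.0, 1.0],
              [1.0, 2.0, 0.0]])
print(gram_schmidt(X))   # columns match z1, z2, z3 of Example 2.7
```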

For large systems, the Gram-Schmidt algorithm described above has been shown to be sensitive to round-off errors. A modification of the Gram-Schmidt algorithm has been suggested that yields more stable results. In addition, this modification also yields a very useful factoring of the matrix X = (x1| · · · |xn) containing the original vectors:

X = Q R \qquad (2.47)

where Q is a matrix containing orthogonal vectors and R is an upper triangular matrix. Due to the convention used by the originators of this factorization (or decomposition), it is referred to as the QR decomposition.

Modified Gram-Schmidt Orthogonalization (and QR Decomposition) Algorithm:

While i < n:

1. Obtain rii and qi:

r_{ii} = ‖x_i‖, \qquad q_i = \frac{x_i}{r_{ii}}

2. Obtain rik for k = i + 1, . . . , n:

r_{ik} = q_i^* x_k

3. Update the remaining xk vectors:

x_k \longleftarrow x_k - r_{ik}\, q_i

End while-loop.

Obtain rnn and qn:

r_{nn} = ‖x_n‖, \qquad q_n = \frac{x_n}{r_{nn}}

Example 2.8

Let us use the same set as example 2.7 for comparison,

x_1 = \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix} \qquad
x_2 = \begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix} \qquad
x_3 = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}

Using the modified Gram-Schmidt algorithm,

r_{11} = ‖x_1‖ = 2.450, \qquad q_1 = \begin{pmatrix} 0.408 \\ 0.816 \\ 0.408 \end{pmatrix}

(r_{12}, r_{13}) = q_1^* (x_2 | x_3) = (0.408, 0.816, 0.408) \begin{pmatrix} 0 & 1 \\ 1 & 1 \\ 2 & 0 \end{pmatrix} = (1.633, 1.225)

Update x2 and x3:

(x_2 | x_3) \longleftarrow (x_2 | x_3) - q_1 (r_{12}, r_{13}) = \begin{pmatrix} 0 & 1 \\ 1 & 1 \\ 2 & 0 \end{pmatrix} - \begin{pmatrix} 0.408 \\ 0.816 \\ 0.408 \end{pmatrix} (1.633, 1.225) = \begin{pmatrix} -0.667 & 0.5 \\ -0.333 & 0 \\ 1.333 & -0.5 \end{pmatrix}

Continuing,


r_{22} = ‖x_2‖ = 1.528, \qquad q_2 = \begin{pmatrix} -0.436 \\ -0.218 \\ 0.873 \end{pmatrix}

r_{23} = q_2^* x_3 = (-0.436, -0.218, 0.873) \begin{pmatrix} 0.5 \\ 0 \\ -0.5 \end{pmatrix} = -0.655

x_3 \longleftarrow x_3 - r_{23}\, q_2 = \begin{pmatrix} 0.214 \\ -0.143 \\ 0.071 \end{pmatrix}

Finally,

r_{33} = ‖x_3‖ = 0.2673, \qquad q_3 = \begin{pmatrix} 0.802 \\ -0.535 \\ 0.267 \end{pmatrix}

The algorithm should now have also produced the following decomposition:

Q = \begin{pmatrix} 0.408 & -0.436 & 0.802 \\ 0.816 & -0.218 & -0.535 \\ 0.408 & 0.873 & 0.267 \end{pmatrix}; \qquad
R = \begin{pmatrix} 2.450 & 1.633 & 1.225 \\ 0 & 1.528 & -0.655 \\ 0 & 0 & 0.267 \end{pmatrix}

i.e.

Q R = \begin{pmatrix} 1 & 0 & 1 \\ 2 & 1 & 1 \\ 1 & 2 & 0 \end{pmatrix} = X

♦♦♦
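The modified algorithm is equally short in code. The sketch below (again an illustration assuming numpy and real vectors) reproduces the Q and R of Example 2.8:

```python
import numpy as np

def mgs_qr(X):
    """Modified Gram-Schmidt QR: X = Q R, Q orthonormal, R upper triangular."""
    X = X.astype(float).copy()
    n = X.shape[1]
    Q, R = np.zeros_like(X), np.zeros((n, n))
    for i in range(n):
        R[i, i] = np.linalg.norm(X[:, i])
        Q[:, i] = X[:, i] / R[i, i]
        for k in range(i + 1, n):
            R[i, k] = Q[:, i] @ X[:, k]
            X[:, k] -= R[i, k] * Q[:, i]      # update the remaining columns
    return Q, R

X = np.array([[1.0, 0.0, 1.0],
              [2.0, 1.0, 1.0],
              [1.0, 2.0, 0.0]])
Q, R = mgs_qr(X)
print(np.allclose(Q @ R, X))                  # True; Q, R as in Example 2.8
```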


2.3 Matrices as Operators

In this section we will focus our attention on matrix A in the equation Ax = b. This view treats matrix A as an operator that transforms an input vector x into an output vector b, as shown in Figure 2.10.

Figure 2.10: Matrix as an operator.

If A [=] m × n, then it transforms vectors of n dimensions into vectors of m dimensions. When A is square, the number of dimensions is preserved. If the elements of x describe the position of a point, then the effect of A is to move it to another position; see Figure 2.11 for an example. More generally, the operation is also called a "mapping".

Figure 2.11: Example of A operating on x to yield b.

In physical systems, the elements of a vector may have units attached to the scalars. For instance, let

A = \begin{pmatrix} 2\ \mathrm{g/cc} & 0\ \mathrm{g/cc} \\ 1\ \mathrm{g/cc} & 1\ \mathrm{g/cc} \end{pmatrix}; \qquad x = \begin{pmatrix} 1\ \mathrm{cc} \\ 2\ \mathrm{cc} \end{pmatrix}

then

A x = b = \begin{pmatrix} 2\ \mathrm{g} \\ 3\ \mathrm{g} \end{pmatrix}

For this case, one should be cautious about plotting x and b on the same graph. Even though both x and b have the same number of dimensions, they have different units and therefore reside in different spaces. Nonetheless, we will be dealing mostly with systems where both the input x and the output b reside in the same space. Thus, unless otherwise noted, we can visualize A as simply repositioning x to b. Furthermore, we will mainly focus our discussion on square matrices in the sections below.

2.3.1 Orthogonal and Unitary Matrices

There are special types of operators that do not change the norm of a vector. The first type is the orthogonal matrix.

Definition 2.7 A square matrix A is an orthogonal matrix if

A^T A = A A^T = I \qquad (2.48)

Recall from equation (2.31) that for vectors x with real elements, the distance measured using the Pythagorean theorem is given by

‖x‖ = \sqrt{\sum_{i=1}^{n} x_i^2} = \sqrt{x^T x}

Thus the norm of b = Ax is given by

‖b‖^2 = ‖Ax‖^2 = (Ax)^T (Ax) = x^T (A^T A) x = x^T x = ‖x‖^2

where the next-to-last step is due to (2.48),

which means that the operation of A simply changes the position of x into another position, described by b, which has the same distance from the origin as x had.

Example 2.9

The matrix Rcw defined below is a clockwise rotation operator, which can be shown to be an orthogonal matrix (see Figure 2.12 for an example):

R_{cw} = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} \qquad (2.49)


Figure 2.12: Example of Rcw operating on x to yield b.

To show that Rcw is orthogonal, we check that it satisfies the condition stated in (2.48):

R_{cw}^T R_{cw} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} = \begin{pmatrix} \cos^2\theta + \sin^2\theta & 0 \\ 0 & \cos^2\theta + \sin^2\theta \end{pmatrix} = I

Similarly, one can show R_{cw} R_{cw}^T = I.

♦♦♦

Note that the condition described by (2.48) states that for an orthogonal matrix, A^T = A^{-1}. Computationally, this means that the inverse of an orthogonal matrix can be found by simply taking the transpose. For instance, the transpose of the clockwise rotation operator Rcw yields the counterclockwise rotation operator:

R_{ccw} = R_{cw}^T = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}

Since we will soon be dealing with vectors whose elements are complex numbers, we need to generalize our definition of norms and orthogonality.

Definition 2.8 Let v be a vector whose elements are complex numbers. Then the Euclidean norm of v, denoted ‖v‖, is given by

‖v‖ = \sqrt{\sum_{i=1}^{n} \bar{v}_i v_i} = \sqrt{v^* v} \qquad (2.50)


Definition 2.9 A matrix A whose elements are complex numbers is called a unitary matrix if

A^* A = A A^* = I \qquad (2.51)

Example 2.10

An important example of a unitary matrix operator is the Householder transformation operator Uw, given by

U_w = I - \frac{2}{w^* w}\, w w^* \qquad (2.52)

where w is a nonzero vector. Note that U_w^* = U_w.

To show that Uw is unitary, we check the condition:

U_w^* U_w = \left( I - \frac{2}{w^* w} w w^* \right)^* \left( I - \frac{2}{w^* w} w w^* \right)
= \left( I - \frac{2}{w^* w} w w^* \right) \left( I - \frac{2}{w^* w} w w^* \right)
= I - \frac{4}{w^* w} w w^* + \frac{4}{(w^* w)^2}\, w w^* w w^*
= I - \frac{4}{w^* w} w w^* + \frac{4}{w^* w} w w^*
= I

The action of Uw on a vector x is to reflect the vector across the hyperplane perpendicular to w, as shown in Figure 2.13.

One can also use the Householder transformation operator to move a vector x to another vector y having the same norm as x by choosing w = x − y, i.e.

U_{x-y}\, x = \left( I - \frac{2}{(x-y)^*(x-y)}\,(x-y)(x-y)^* \right) x
= x - \frac{1}{x^* x - y^* x}\,(x x^* x - y x^* x - x y^* x + y y^* x)
= \frac{x x^* x - x y^* x - x x^* x + y x^* x + x y^* x - y y^* x}{x^* x - y^* x}
= y
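As a numerical illustration of this property (a sketch outside the text, assuming numpy; the vectors x and y below are chosen for illustration only, with equal norms as required):

```python
import numpy as np

def householder(w):
    """Householder reflector U_w = I - 2 w w^* / (w^* w), equation (2.52)."""
    w = w.reshape(-1, 1).astype(complex)
    return np.eye(len(w)) - 2.0 * (w @ w.conj().T) / (w.conj().T @ w)

x = np.array([2.0, -2.0, 1.0])
y = np.array([3.0, 0.0, 0.0])                  # same norm as x (both 3)
U = householder(x - y)
print(np.allclose(U @ x, y))                   # True: U_{x-y} x = y
print(np.allclose(U.conj().T @ U, np.eye(3)))  # True: U is unitary
```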

♦♦♦

We will further analyze the characteristics of matrix operators, such as eigenvectors, eigenvalues and canonical forms, in the next chapter. The purpose of including the discussion of matrix operators in this section is simply to underline the fact that the simple-looking equation Ax = b has at least three different perspectives, and in the last of these we begin to investigate the behavior of A itself rather than of x or b.


Figure 2.13: Householder transformation as a reflector operator.

2.4 Appendix: Proofs

• Proof of Cramer's Rule: (cf. page 31)

Using A^{-1} = \mathrm{Adj}(A)/|A|,

\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \frac{1}{|A|} \begin{pmatrix} \mathrm{cof}(a_{11}) & \mathrm{cof}(a_{21}) & \cdots & \mathrm{cof}(a_{n1}) \\ \mathrm{cof}(a_{12}) & \mathrm{cof}(a_{22}) & \cdots & \mathrm{cof}(a_{n2}) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{cof}(a_{1n}) & \mathrm{cof}(a_{2n}) & \cdots & \mathrm{cof}(a_{nn}) \end{pmatrix} \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}

Thus for the j-th entry in x,

x_j = \frac{\sum_{k=1}^{n} b_k\, \mathrm{cof}(a_{jk})}{|A|}

The numerator is the determinant of a matrix, Ω(j), in which the j-th column of the original matrix A is replaced by the vector b.

• Proof of Lemma 2.2: (cf. page 34)

Let the notation M(•,j) stand for the j-th column of any matrix M. Then

(XC)_{(\bullet,j)} = X\, C_{(\bullet,j)} = \left( X_{(\bullet,1)}\ X_{(\bullet,2)}\ \cdots\ X_{(\bullet,r)} \right) \begin{pmatrix} c_{1j} \\ c_{2j} \\ \vdots \\ c_{rj} \end{pmatrix} = \sum_{i=1}^{r} c_{ij}\, X_{(\bullet,i)}


Extending this to BXC,

(BXC)_{(\bullet,j)} = (BX)\, C_{(\bullet,j)} = B\left( X C_{(\bullet,j)} \right) = \sum_{i=1}^{r} c_{ij}\, B X_{(\bullet,i)} = \left( c_{1j}B\ c_{2j}B\ \cdots\ c_{rj}B \right) \begin{pmatrix} X_{(\bullet,1)} \\ X_{(\bullet,2)} \\ \vdots \\ X_{(\bullet,r)} \end{pmatrix} = \left( c_{1j}B\ c_{2j}B\ \cdots\ c_{rj}B \right) \mathrm{vec}(X)

Collecting these into a column,

\mathrm{vec}(BXC) = \begin{pmatrix} (BXC)_{(\bullet,1)} \\ (BXC)_{(\bullet,2)} \\ \vdots \\ (BXC)_{(\bullet,s)} \end{pmatrix} = \begin{pmatrix} c_{11}B & c_{21}B & \cdots & c_{r1}B \\ c_{12}B & c_{22}B & \cdots & c_{r2}B \\ \vdots & \vdots & \ddots & \vdots \\ c_{1s}B & c_{2s}B & \cdots & c_{rs}B \end{pmatrix} \mathrm{vec}(X) = \left( C^T \otimes B \right) \mathrm{vec}(X)

QED

• Proof of Theorem 2.4: (cf. page 43) Let A [=] n × m, m ≤ n, with

A = (a_1 | \cdots | a_m)

Then

A x = 0 \;\Longrightarrow\; A^T A x = 0 \;\Longrightarrow\; G x = 0

If the Grammian G is nonsingular, then the only solution is x1 = · · · = xm = 0, i.e. the columns of A are linearly independent.

Exercises


E1. Determine which of the following matrices are orthogonal or unitary:

a) Epermute(5, 1, 4, 2, 3)

b) B = \left( I - \frac{2}{w^* w} w w^* \right) \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}

c) A = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & \cos\theta \end{pmatrix}

E2. Let x1, x2, . . . , xm be a collection of vectors of length n < m. Will the Gram-Schmidt orthogonalization procedure produce a new set of orthogonal vectors? Why or why not?

E3. Show that products of unitary matrices are unitary.

E4. Find an operator that would rotate a three-dimensional vector by an angle equal to θ radians clockwise around: a) the z-axis, b) the y-axis, and c) the x-axis. Also find a single operator that would first rotate a point 30° counterclockwise around the z-axis and then 30° clockwise around the x-axis. (Note that three-dimensional rotation operators are not commutative. Verify the operators found by using sample vectors, and plot the vectors before and after being operated on by the matrices found.)

E5. In two-dimensional graphics, the transformation is done on vectors given by

v = \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}

which describe the x coordinate, the y coordinate, and a constant last entry equal to 1 that allows translation of points. This extension then uses two operators, Gtranslate and Gcw rotate:

G_{translate} = \begin{pmatrix} 1 & 0 & x_{trans} \\ 0 & 1 & y_{trans} \\ 0 & 0 & 1 \end{pmatrix} \qquad
G_{cw\,rotate} = \begin{pmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}

The curve shown in Figure 2.14 is generated using the data in Table 2.7.

1. Find an operator that would rotate the curve θ radians counterclockwise around the point (x, y) = (a, b). Test this operator on the data given in Table 2.7 with (a, b) = (4, −1.5) and θ = π/2.


2. Find an operator that would reflect the curve across a line that contains the points (x1, y1) and (x2, y2). (Hint: you might need a corresponding Householder-type reflector that is a 3 × 3 matrix operator.) Test this operator on the data given in Table 2.7 with (x1, y1) = (0, −1) and (x2, y2) = (10, −3).

Figure 2.14: The curve used for the exercise.


Table 2.7: Data for curve shown in Figure 2.14.

    x      y           x      y
    0.0    0.0000      5.0   -1.1776
    0.5   -0.0615      5.5   -1.0068
    1.0   -0.2348      6.0   -0.8494
    1.5   -0.4882      6.5   -0.7399
    2.0   -0.7761      7.0   -0.6999
    2.5   -1.0484      7.5   -0.7337
    3.0   -1.2601      8.0   -0.8283
    3.5   -1.3801      8.5   -0.9581
    4.0   -1.3971      9.0   -1.0903
    4.5   -1.3205      9.5   -1.1939
                      10.0   -1.2459


Chapter 3

Matrix Analysis

In the previous chapter, we discussed three perspectives or uses of matrices for the equation Ax = b. In this chapter, we will further explore some analysis tools and decompositions (factorings) of matrices. Some of the results are useful in understanding how the elements affect the operational behavior of the matrix, while other results simply improve computational efficiency.

We will first introduce the concepts of eigenvalues and eigenvectors, followed by a list of properties pertaining to eigenvalues and eigenvectors. Next, we describe three important transformations that are strongly based on eigenvalues: Schur triangularization, diagonalization and the Jordan canonical form. Each of these transformations is useful in furthering the results of eigenvalue analysis. They are also useful in the evaluation of functions whose arguments are matrices. After a discussion of a few methods for handling matrix functions, we include a section on a particular numerical method that is widely used in the determination of eigenvalues, called the QR method. We then end this chapter with three more matrix decompositions that are useful in both matrix computation and analysis, namely the LU decomposition, the Singular Value Decomposition and the Polar Decomposition.

3.1 Eigenvalues and Eigenvectors

To study the characteristics of a particular matrix operator A, one can collect several pairs of x and b = Ax. Some of these pairs behave more distinctively than others and will yield more information about the operator; they act somewhat like signatures of the operator. Specifically, we are interested in those vectors v which, when operated on by matrix A, result in a vector that is scaled by a factor λ,

Av = λv where λ is a scalar (3.1)


These vectors are known as the eigenvectors of A. For instance, if λ is real, this condition simply states that the eigenvectors v are special vectors for A, on which the only effect of A is to move the position radially out by a factor λ if |λ| > 1, or radially in if |λ| < 1. It can also flip v by 180° if λ < 0.

To determine these eigenvectors, we need to use the condition given in (3.1).

Av = λv = λIv

Av − λIv = 0

(A − λI)v = 0 (3.2)

Obviously, v = 0 will always be a solution to equation (3.2). Thus the zero vector is called a trivial solution, because it does not give us any more information about A other than A·0 = 0. To obtain nontrivial solutions, we need the matrix (A − λI) to be singular, i.e.

det( A − λI ) = 0 (3.3)

Equation (3.3) is known as the characteristic equation of A. With (A − λI) singular, the solution to this equation is not unique, i.e. there is more than one eigenvalue. The set of values of λ which satisfy the characteristic equation (3.3) are known as the eigenvalues of A.

Using the definition of determinants, plus the fact that each element of A − λI is either aij or ajj − λ, the characteristic equation can be expanded into a single polynomial of order n, where n is the size of A. Thus, in the matrix case, the term characteristic polynomial equation is used interchangeably with characteristic equation. The polynomial is given by

p(\lambda) = \lambda^n + \beta_{n-1}\lambda^{n-1} + \cdots + \beta_1 \lambda + \beta_0

where β0, β1, . . . , βn−1 are coefficients resulting from the expansion of equation (3.3). The characteristic polynomial will yield n roots. The eigenvalues may be either real numbers or complex numbers, and some may occur more than once. The collection of all the eigenvalues of A, including multiplicities, is also known as the spectrum of A, denoted by σ(A).

Once the eigenvalues are found, the eigenvectors can be obtained by substituting each eigenvalue, one at a time, into equation (3.2):

(A - \lambda_1 I)\, v_1 = 0
(A - \lambda_2 I)\, v_2 = 0
\qquad \vdots
(A - \lambda_n I)\, v_n = 0


Recall that each of these equations yields infinitely many solutions. However, only the directions of these vectors are important. As a standard, the desired eigenvectors are those whose lengths equal 1. These vectors are known as normalized eigenvectors.

Example 3.1

Let

A = \begin{pmatrix} 2 & -3 \\ 4 & 2 \end{pmatrix}

then the characteristic equation becomes

\lambda^2 - 4\lambda + 16 = 0

whose roots are λ1 = 2 + 2√3 i and λ2 = 2 − 2√3 i. Upon substitution, these eigenvalues yield the following eigenvectors:

v_1 = \alpha \begin{pmatrix} \frac{\sqrt{3}}{2}\, i \\ 1 \end{pmatrix} \quad \text{and} \quad v_2 = \beta \begin{pmatrix} -\frac{\sqrt{3}}{2}\, i \\ 1 \end{pmatrix}

where α and β are arbitrary numbers. To obtain normalized vectors, we need α = 0.7559 and β = 0.7559.

♦♦♦
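Numerically, eigenvalues and normalized eigenvectors are usually obtained from a library routine rather than from the characteristic polynomial. A minimal check of Example 3.1 (a sketch outside the text, assuming numpy):

```python
import numpy as np

A = np.array([[2.0, -3.0],
              [4.0, 2.0]])

lam, V = np.linalg.eig(A)
print(lam)   # 2 ± 3.4641j, i.e. 2 ± 2*sqrt(3) i
print(V)     # normalized eigenvectors as columns

# verify the defining relation Av = lambda v for the first pair
print(np.allclose(A @ V[:, 0], lam[0] * V[:, 0]))
```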

3.2 Properties of Eigenvalues and Eigenvectors.

In this section we list some useful properties and identities that apply to eigenvalues and eigenvectors. These will aid in both future analysis and computational efficiency. (Unless otherwise noted, the proofs are left as exercises, mostly using (3.1).)

Property 1. Let B = S^{-1} A S, where S is nonsingular. Then the eigenvalues of A and B are the same. (Note: S^{-1} A S is called a similarity transformation of A.)

Property 2. Let λ be an eigenvalue of A; then λ^k is an eigenvalue of A^k, as long as A^k exists, for k = . . . , −2, −1, 0, 1, 2, . . .. The eigenvector corresponding to λ for A is the same eigenvector corresponding to λ^k for A^k.

Property 3. Let λ be an eigenvalue of A and let α be a scalar; then αλ is an eigenvalue of αA.


Property 4. The eigenvalues of diagonal and triangular matrices are the diagonal entries of these matrices. The eigenvector of a diagonal matrix D corresponding to the eigenvalue djj is the unit vector ej.

Property 5. A and A^T have the same eigenvalues, but in general they do not have the same set of eigenvectors.

Property 6. Eigenvectors corresponding to distinct eigenvalues are linearly independent.

(see page 103 for proof.)

Property 7. \prod_{i=1}^{n} \lambda_i = |A|. (Thus at least one of the eigenvalues of a singular matrix will be zero.)

(see page 104 for proof.)

Property 8. \sum_{i=1}^{n} \lambda_i = \mathrm{tr}(A).

(see page 104 for proof.)

Property 9. The eigenvalues of Hermitian matrices are all real-valued. (Thus the eigenvalues of real-valued symmetric matrices are also all real.)

(see page 105 for proof.)

Property 10. The eigenvalues of skew-Hermitian matrices are all pure imaginary.

(see page 105 for proof.)

Property 11. The eigenvectors of a Hermitian matrix corresponding to distinct eigenvalues are orthogonal.

(see page 105 for proof.)

Property 12. The eigenvalues of block diagonal, upper block triangular or lower block triangular matrices are the collection of all the eigenvalues of each of the block matrices on the diagonal.

3.3 Schur triangularization.

For any square matrix A, one can find a unitary matrix operator U such that U^* A U yields an upper triangular matrix with all the eigenvalues appearing on the diagonal. This approach is known as the Schur triangularization method. The result is useful in proving some properties of eigenvalues. Also, for certain types of matrices, called normal matrices (to be defined later), the Schur triangularization method yields diagonal matrices.

Schur Triangularization Algorithm.


1. Initialization: Let m = n and G = A.

2. If m = 1 then Um = I and go to step 7.

3. Determine an eigenvalue λ of G and its corresponding orthonormal eigenvector v.

4. Using Gram-Schmidt orthogonalization, obtain w2, . . . , wm such that v, w2, . . . , wm forms an orthonormal set.

5. Let H = (v | w2 | · · · | wm); then

H^* G H = \begin{pmatrix} \lambda & b^T \\ 0 & C \end{pmatrix}

6. Set m ← m − 1, G = C,

U_m = \begin{pmatrix} I_{n-m} & 0 \\ 0 & H \end{pmatrix}

then repeat from step 2.

7. Calculate U = U_n U_{n-1} \cdots U_1.

Example 3.2

Let

A = \begin{pmatrix} -2 & 1 & 1 \\ 1 & -2 & 1 \\ 1 & 1 & -2 \end{pmatrix}

whose eigenvalues are (−3, −3, 0). (Note that −3 occurs twice.)

Based on the procedure,

1. m = n = 3, G = A, λ = −3 and

v = \begin{pmatrix} \sqrt{2}/2 \\ -\sqrt{2}/2 \\ 0 \end{pmatrix} \qquad
w_2 = \begin{pmatrix} \sqrt{2}/2 \\ \sqrt{2}/2 \\ 0 \end{pmatrix} \qquad
w_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}

H = U_3 = \begin{pmatrix} \sqrt{2}/2 & \sqrt{2}/2 & 0 \\ -\sqrt{2}/2 & \sqrt{2}/2 & 0 \\ 0 & 0 & 1 \end{pmatrix}

H^* G H = \begin{pmatrix} -3 & 0 & 0 \\ 0 & -1 & \sqrt{2} \\ 0 & \sqrt{2} & -2 \end{pmatrix}


2. m = 2,

G = \begin{pmatrix} -1 & \sqrt{2} \\ \sqrt{2} & -2 \end{pmatrix}, \qquad \lambda = -3, \qquad
v = \begin{pmatrix} -\sqrt{3}/3 \\ \sqrt{6}/3 \end{pmatrix}, \qquad
w_2 = \begin{pmatrix} \sqrt{6}/3 \\ \sqrt{3}/3 \end{pmatrix}

H = \begin{pmatrix} -\sqrt{3}/3 & \sqrt{6}/3 \\ \sqrt{6}/3 & \sqrt{3}/3 \end{pmatrix} \qquad
U_2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -\sqrt{3}/3 & \sqrt{6}/3 \\ 0 & \sqrt{6}/3 & \sqrt{3}/3 \end{pmatrix}

H^* G H = \begin{pmatrix} -3 & 0 \\ 0 & 0 \end{pmatrix}

3. m = 1, U_1 = I.

Then,

U = U_3 U_2 U_1 = \begin{pmatrix} \sqrt{2}/2 & -\sqrt{6}/6 & \sqrt{3}/3 \\ -\sqrt{2}/2 & -\sqrt{6}/6 & \sqrt{3}/3 \\ 0 & \sqrt{6}/3 & \sqrt{3}/3 \end{pmatrix}

As a final check,

U^* A U = \begin{pmatrix} -3 & 0 & 0 \\ 0 & -3 & 0 \\ 0 & 0 & 0 \end{pmatrix}

♦♦♦
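In practice the triangularization is obtained from a library routine rather than by the step-by-step procedure above. A sketch using SciPy's schur routine (an assumption; SciPy is not used in the text) applied to the matrix of Example 3.2:

```python
import numpy as np
from scipy.linalg import schur

A = np.array([[-2.0, 1.0, 1.0],
              [1.0, -2.0, 1.0],
              [1.0, 1.0, -2.0]])

T, U = schur(A)                    # A = U T U^T with T (quasi-)triangular
print(np.diag(T))                  # the eigenvalues -3, -3, 0, in some order
print(np.allclose(U @ T @ U.T, A))
```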

3.4 Diagonalization

For some square matrices A there exists a similarity transformation T^{-1} A T, with T nonsingular, which yields a diagonal matrix whose diagonal elements are the eigenvalues of A, i.e.

T^{-1} A T = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}

Matrices which can be diagonalized are classified as diagonalizable matrices or semisimple matrices.


Two cases (not necessarily disjoint) are guaranteed to be diagonalizable. The first case involves matrices having all distinct eigenvalues. The second case involves normal matrices (to be defined below). Other diagonalizable matrices may have repeated eigenvalues but require some rank conditions.

3.4.1 Diagonalizable Class 1: All Eigenvalues Are Distinct.

The n eigenvalue equations,

A v_1 = \lambda_1 v_1, \quad \ldots, \quad A v_n = \lambda_n v_n

can be rewritten compactly as follows:

A V = V \Lambda \qquad (3.4)

where

V = (v_1 | \cdots | v_n), \qquad \Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)

If all the eigenvalues are different from each other, then the corresponding eigenvectors v1, . . . , vn are all linearly independent (see Property 6 on page 72). This means V is nonsingular, and diagonalization is obtained by premultiplying (3.4) by V^{-1} to yield

V^{-1} A V = \Lambda

Example 3.3

A = \begin{pmatrix} a & b \\ 0 & c \end{pmatrix}, \qquad a \neq c

The eigenvalues are λ1 = a, λ2 = c, while the corresponding eigenvectors are

v_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \qquad v_2 = \begin{pmatrix} b/(c-a) \\ 1 \end{pmatrix}

Then

V^{-1} A V = \begin{pmatrix} 1 & b/(c-a) \\ 0 & 1 \end{pmatrix}^{-1} \begin{pmatrix} a & b \\ 0 & c \end{pmatrix} \begin{pmatrix} 1 & b/(c-a) \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} a & 0 \\ 0 & c \end{pmatrix} = \Lambda

♦♦♦


3.4.2 Diagonalizable Class 2: Normal matrices.

Definition 3.1 A matrix A is normal if

AA∗ = A∗A (3.5)

Examples of normal matrices include Hermitian, real symmetric, skew-Hermitian, real skew-symmetric, unitary, orthogonal and circulant matrices.

Circulant matrices are matrices that have the form

A = \begin{pmatrix} a_0 & a_1 & a_2 & \cdots & a_n \\ a_n & a_0 & a_1 & \cdots & a_{n-1} \\ a_{n-1} & a_n & a_0 & \cdots & a_{n-2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_1 & a_2 & a_3 & \cdots & a_0 \end{pmatrix}

which can also be described by the following sum formula:

A = \sum_{k=0}^{n-1} \alpha_k\, E^k_{permute}(n, 1, 2, \ldots, n-1) \qquad (3.6)

where n is the size of matrix A.

Example 3.4

A = -I + 2 E_{permute}(3, 1, 2) + E^2_{permute}(3, 1, 2) = \begin{pmatrix} -1 & 2 & 1 \\ 1 & -1 & 2 \\ 2 & 1 & -1 \end{pmatrix}

is circulant. Since

A^* A = \begin{pmatrix} 6 & -1 & -1 \\ -1 & 6 & -1 \\ -1 & -1 & 6 \end{pmatrix} = A A^*

A is a normal matrix.

♦♦♦

Theorem 3.1 If A is a normal matrix, then there exists a unitary matrix U (e.g. obtained via the Schur triangularization procedure) such that

U^* A U = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)

where λ1, . . . , λn are the eigenvalues of A.

(See page 106 for the proof of Theorem 3.1.)


3.4.3 Diagonalizable Class 3: Repeated Eigenvalues Under Rank Conditions.

Theorem 3.2 Let λi be the eigenvalues of A repeated ki times, i = 1, 2, . . . , p, where p is the number of distinct eigenvalues. If

\mathrm{rank}(\lambda_i I - A) = n - k_i \qquad (3.7)

then there exist n linearly independent eigenvectors v1, . . . , vn.

(See page 106 for the proof of Theorem 3.2.)

Under the condition given in (3.7), one can again obtain

A V = V \Lambda

in which V is nonsingular. Then

V^{-1} A V = \Lambda

Example 3.5

A = \begin{pmatrix} -1 & 0 & 0 \\ -1 & -2 & 0 \\ 0 & 0 & -2 \end{pmatrix}

whose eigenvalues are (−2, −2, −1). Thus λ1 = −2, k1 = 2, and λ2 = −1, k2 = 1.

\lambda_1 I - A = \begin{pmatrix} -1 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \qquad
\lambda_2 I - A = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}

With rank(λ1I − A) = 1 = n − 2 and rank(λ2I − A) = 2 = n − 1, the conditions of Theorem 3.2 are satisfied. This implies that A is diagonalizable.

Solving for the eigenvectors,

(\lambda_1 I - A)\, v = \begin{pmatrix} -1 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} v = 0


v = \begin{pmatrix} 0 \\ \alpha \\ \beta \end{pmatrix}

Choose α = 1, β = 0 for v1, and α = 0, β = 1 for v2. Next, to solve for v3,

(\lambda_2 I - A)\, v_3 = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} v_3 = 0

v_3 = \begin{pmatrix} \gamma \\ -\gamma \\ 0 \end{pmatrix}

Choose γ = 1. Thus,

V = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix}

and

V^{-1} A V = \begin{pmatrix} -2 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & -1 \end{pmatrix}

♦♦♦
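A quick numerical verification of Example 3.5 (a sketch outside the text, assuming numpy):

```python
import numpy as np

A = np.array([[-1.0, 0.0, 0.0],
              [-1.0, -2.0, 0.0],
              [0.0, 0.0, -2.0]])

V = np.array([[0.0, 0.0, 1.0],     # columns are v1, v2, v3 found above
              [1.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])

print(np.linalg.inv(V) @ A @ V)    # diag(-2, -2, -1), as claimed
```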

3.5 Jordan Canonical Form

For nondiagonalizable matrices, the closest form to a diagonal matrix via a similarity transformation is the Jordan canonical form, given by the block diagonal matrix

J = \begin{pmatrix} J_1 & 0 & \cdots & 0 \\ 0 & J_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & J_k \end{pmatrix} \qquad (3.8)

where Ji, called a Jordan block, is either a scalar or of the form

J_i = \begin{pmatrix} \lambda & 1 & 0 & \cdots & 0 \\ 0 & \lambda & 1 & \cdots & 0 \\ 0 & 0 & \lambda & \ddots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \lambda \end{pmatrix} \qquad (3.9)


The similarity matrix T that transforms a square matrix A to a matrix J in Jordan canonical form, i.e.

T^{-1} A T = J

is called the modal matrix. The columns of the modal matrix are also collectively called the canonical basis of A. The canonical basis is composed of vectors derived from eigenvector chains of different orders.

Definition 3.2 Given matrix A and one of its eigenvalues λ, the eigenvector chain with respect to λ of order r is

\mathrm{chain}(A, \lambda, r) = (v_1, v_2, \ldots, v_r) \qquad (3.10)

where

(A - \lambda I)^r v_r = 0, \qquad (A - \lambda I)^{r-1} v_r \neq 0

v_j = (A - \lambda I)\, v_{j+1}, \qquad j = (r-1), \ldots, 1

Note: If the order of the chain is 1, then the chain is composed of only one eigenvector.

Algorithm for Obtaining Chain(A,λ,r).

1. Obtain vector vr to begin the chain.

(a) Construct matrix M,

M(\lambda, r) = \begin{pmatrix} (A - \lambda I)^{r-1} & -I \\ (A - \lambda I)^{r} & 0 \end{pmatrix}

(b) Use Gaussian elimination to obtain Q, W and q such that

Q M W = \begin{pmatrix} I_q & 0 \\ 0 & 0 \end{pmatrix}

(c) Set vector h,

h_j = \begin{cases} 0 & j = 1, 2, \ldots, q \\ \text{a randomly generated number} & j = q+1, \ldots, 2n \end{cases}

(d) Obtain vr by extracting the first n elements of z = Wh.

2. Calculate the rest of the chain:

v_j = (A - \lambda I)\, v_{j+1}, \qquad j = (r-1), \ldots, 1


Example 3.6

Let

A = \begin{pmatrix} 3 & 0 & 0 & 0 & 1 \\ 0 & 3 & 0 & 1 & 1 \\ 1 & 0 & 3 & 0 & 0 \\ 0 & 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 0 & 3 \end{pmatrix}

To find the eigenvector chain of A with respect to 3 of order 3, first find v3 according to the algorithm,

v_3 = \begin{pmatrix} 0.8892 \\ 1.1826 \\ 1.8175 \\ 0 \\ -1.2992 \end{pmatrix}

Checking,

(A - 3I)^3 v_3 = 0 \quad \text{while} \quad (A - 3I)^2 v_3 = \begin{pmatrix} 0 \\ 0 \\ -1.2992 \\ 0 \\ 0 \end{pmatrix}

Next, we evaluate v2 and then v1:

v_2 = (A - 3I)\, v_3 = \begin{pmatrix} -1.2992 \\ -1.2992 \\ 0.8892 \\ 0 \\ 0 \end{pmatrix} \qquad
v_1 = (A - 3I)\, v_2 = \begin{pmatrix} 0 \\ 0 \\ -1.2992 \\ 0 \\ 0 \end{pmatrix}

Now collect the vectors to form the required chain:

\mathrm{chain}(A, 3, 3) = (v_1, v_2, v_3) = \begin{pmatrix} 0 & -1.2992 & 0.8892 \\ 0 & -1.2992 & 1.1826 \\ -1.2992 & 0.8892 & 1.8175 \\ 0 & 0 & 0 \\ 0 & 0 & -1.2992 \end{pmatrix}


♦♦♦

To obtain the canonical basis, we still need to determine the required eigenvector chains. To do so, we need to calculate the orders of matrix degeneracy with respect to an eigenvalue λi, denoted by Ni,k, which is just the difference in ranks of succeeding powers, i.e.

N_{i,k} = \mathrm{rank}(A - \lambda_i I)^{k-1} - \mathrm{rank}(A - \lambda_i I)^{k} \qquad (3.11)

Using these orders of degeneracy, one can calculate the required orders for the eigenvector chains. The algorithm below describes in more detail the procedure for obtaining the canonical basis.

Algorithm for Obtaining Canonical Basis. Given A[=]n × n. For each distinct λi:

1. Determine the multiplicity mi.

2. Calculate the orders of the required eigenvector chains.

Let

p_i = \arg\left( \min_{1 \le p \le n} \left[ \mathrm{rank}(A - \lambda_i I)^p = n - m_i \right] \right)

then obtain ord_i = (\gamma_{i,1}, \ldots, \gamma_{i,p_i}), where

\gamma_{i,k} = \begin{cases} N_{i,k} & \text{if } k = p_i \\ \max\left(0, \left[ N_{i,k} - \sum_{j=k+1}^{p_i} \gamma_{i,j} \right]\right) & \text{if } k < p_i \end{cases}

where

N_{i,k} = \mathrm{rank}(A - \lambda_i I)^{k-1} - \mathrm{rank}(A - \lambda_i I)^{k}

3. Obtain the required eigenvector chains.

For each γi,k > 0, find γi,k sets of chain(A, λi, k) and add them to the collection of canonical basis vectors.

One can show that the eigenvector chains found will be linearly independent, i.e. T is nonsingular. Thus the Jordan canonical form can then be determined from the similarity transformation T^{-1} A T = J.


Example 3.7

Consider the matrix A,

A = \begin{pmatrix} 3 & 0 & 0 & 0 & 1 \\ 0 & 3 & 0 & 1 & 1 \\ 1 & 0 & 3 & 0 & 0 \\ 0 & 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 0 & 3 \end{pmatrix}

then

    λi   mi   pi   Ni,k         ordi
    2    1    1    [1]          [1]
    3    4    3    [2, 1, 1]    [1, 0, 1]

Next calculate the required chains:

\mathrm{chain}(A, 2, 1) = \begin{pmatrix} 0 \\ -0.707 \\ 0 \\ 0.707 \\ 0 \end{pmatrix} \qquad
\mathrm{chain}(A, 3, 1) = \begin{pmatrix} 0 \\ -0.5843 \\ -1.0107 \\ 0 \\ 0 \end{pmatrix}

\mathrm{chain}(A, 3, 3) = \begin{pmatrix} 0 & -1.2992 & 0.8892 \\ 0 & -1.2992 & 1.1826 \\ -1.2992 & 0.8892 & 1.8175 \\ 0 & 0 & 0 \\ 0 & 0 & -1.2992 \end{pmatrix}

The modal matrix T is obtained by collecting the chains just calculated:

T = \begin{pmatrix} 0 & 0 & 0 & -1.2992 & 0.8892 \\ -0.7071 & -0.5843 & 0 & -1.2992 & 1.1826 \\ 0 & -1.0107 & -1.2992 & 0.8892 & 1.8175 \\ 0.7071 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & -1.2992 \end{pmatrix}

and the Jordan canonical form is given by,


J = T^{-1} A T = \begin{pmatrix} 2 & 0 & 0 & 0 & 0 \\ 0 & 3 & 0 & 0 & 0 \\ 0 & 0 & 3 & 1 & 0 \\ 0 & 0 & 0 & 3 & 1 \\ 0 & 0 & 0 & 0 & 3 \end{pmatrix}

♦♦♦

3.6 Functions of Square Matrices

There are several functions of square matrices, such as sin(A), cos(A) and exp(A), which are comparable to their analogous scalar functions but also exhibit significant differences, especially because of the loss of commutativity. We begin with the definition of well-defined functions.

Definition 3.3 Let f(x) be a function having a power series expansion

f(x) = \sum_{i=0}^{\infty} \alpha_i\, x^i \qquad (3.12)

which is convergent for |x| < R. Then the function of a square matrix A defined by

f(A) = \sum_{i=0}^{\infty} \alpha_i\, A^i \qquad (3.13)

is called a well-defined function if each eigenvalue has absolute value less than the radius of convergence R.

Example 3.8

\exp(A) = I + A + \frac{1}{2}A^2 + \frac{1}{3!}A^3 + \cdots \qquad (3.14)


♦♦♦

Of course, it is not often advisable to actually calculate the power series of a square matrix directly from the definition. Instead, one can use several methods to simplify the evaluation. One method is to implement diagonalization when possible, or to use the Jordan canonical form when matrix A is not diagonalizable. Another method is to make use of the Cayley-Hamilton theorem to produce a finite series that is equivalent to the power series.

3.6.1 Case 1: A is Diagonalizable

Let us first take the situation when A is diagonalizable. This means that there exists a nonsingular matrix T such that T^{-1} A T = D_λ, where D_λ is a diagonal matrix with the eigenvalues on the diagonal, or equivalently, A = T D_λ T^{-1}. The power series function (3.13) then becomes

f(A) = \alpha_0 T T^{-1} + \alpha_1 T D_\lambda T^{-1} + \alpha_2 (T D_\lambda T^{-1})(T D_\lambda T^{-1}) + \cdots
= T\left( \alpha_0 I + \alpha_1 D_\lambda + \alpha_2 D_\lambda^2 + \cdots \right) T^{-1}
= T \begin{pmatrix} f(\lambda_1) & 0 & \cdots & 0 \\ 0 & f(\lambda_2) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & f(\lambda_n) \end{pmatrix} T^{-1} \qquad (3.15)

Example 3.9

Suppose A is diagonalizable; then

\exp(A) = T \begin{pmatrix} \exp(\lambda_1) & 0 & \cdots & 0 \\ 0 & \exp(\lambda_2) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \exp(\lambda_n) \end{pmatrix} T^{-1} \qquad (3.16)

♦♦♦
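A numerical sketch of (3.16) (an illustration assuming numpy and SciPy, neither of which appears in the text; scipy.linalg.expm serves only as an independent check):

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[-1.0, 0.0, 0.0],
              [-1.0, -2.0, 0.0],
              [0.0, 0.0, -2.0]])

lam, T = np.linalg.eig(A)                          # A is diagonalizable
eA = T @ np.diag(np.exp(lam)) @ np.linalg.inv(T)   # equation (3.16)
print(np.allclose(eA, expm(A)))                    # matches the library
```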


3.6.2 Case 2: A is Not Diagonalizable.

If A is not diagonalizable, we need to use the Jordan canonical form instead, i.e. T^{-1} A T = J, where J is a block diagonal matrix with Jordan blocks on the diagonal.

Before we expand the power series, let us take a closer look at different powers of a Jordan block. Recall that a Jordan block is of the form

J_i = \begin{pmatrix} \lambda_i & 1 & 0 & \cdots & 0 & 0 \\ 0 & \lambda_i & 1 & \cdots & 0 & 0 \\ 0 & 0 & \lambda_i & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & \lambda_i & 1 \\ 0 & 0 & 0 & \cdots & 0 & \lambda_i \end{pmatrix}

It can be shown that

J_i^k = \begin{pmatrix} \lambda_i^k & \beta_{[k,k-1]}\lambda_i^{k-1} & \beta_{[k,k-2]}\lambda_i^{k-2} & \cdots & \beta_{[k,k-n+1]}\lambda_i^{k-n+1} \\ 0 & \lambda_i^k & \beta_{[k,k-1]}\lambda_i^{k-1} & \cdots & \beta_{[k,k-n+2]}\lambda_i^{k-n+2} \\ 0 & 0 & \lambda_i^k & \cdots & \beta_{[k,k-n+3]}\lambda_i^{k-n+3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \lambda_i^k \end{pmatrix} \qquad (3.17)

where

\beta_{k,j} = \begin{cases} \dfrac{k!}{(k-j)!\, j!} & \text{if } j \ge 0 \\ 0 & \text{otherwise} \end{cases} \qquad (3.18)

So the power series (3.13) applied to a Jordan block Ji yields

f(J_i) = \alpha_0 I + \alpha_1 J_i + \alpha_2 J_i^2 + \alpha_3 J_i^3 + \cdots = \begin{pmatrix} \gamma_{[i,0]} & \gamma_{[i,1]} & \cdots & \gamma_{[i,n-1]} \\ 0 & \gamma_{[i,0]} & \cdots & \gamma_{[i,n-2]} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \gamma_{[i,0]} \end{pmatrix} \qquad (3.19)

where

\gamma_{[i,j]} = \sum_{k=0}^{\infty} \alpha_{k+j}\, \beta_{[k+j,k]}\, \lambda_i^k


= \sum_{k=0}^{\infty} \alpha_{k+j}\, \frac{(k+j)!}{k!\, j!}\, \lambda_i^k = \frac{1}{j!} \sum_{k=0}^{\infty} \alpha_{k+j}\, \frac{(k+j)!}{k!}\, \lambda_i^k \qquad (3.20)

Using the following identity for the n-th derivative of f(λ),

\frac{d^n f(\lambda)}{d\lambda^n} = \sum_{k=0}^{\infty} \alpha_{k+n}\, \frac{(k+n)!}{k!}\, \lambda^k \qquad (3.21)

we can rewrite γ[i,j] as

\gamma_{[i,j]} = \frac{1}{j!}\, \frac{d^j f(\lambda_i)}{d\lambda_i^j} \qquad (3.22)

or

f(J_i) = \begin{pmatrix} f(\lambda_i) & f^{(1)}(\lambda_i) & \cdots & \frac{1}{(n-1)!} f^{(n-1)}(\lambda_i) \\ 0 & f(\lambda_i) & \cdots & \frac{1}{(n-2)!} f^{(n-2)}(\lambda_i) \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & f(\lambda_i) \end{pmatrix} \qquad (3.23)

Example 3.10

\exp(J_i) = \begin{pmatrix} \exp(\lambda_i) & \exp(\lambda_i) & \cdots & \frac{1}{(n-1)!} \exp(\lambda_i) \\ 0 & \exp(\lambda_i) & \cdots & \frac{1}{(n-2)!} \exp(\lambda_i) \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \exp(\lambda_i) \end{pmatrix} \qquad (3.24)

♦♦♦

Returning to the main issue of evaluating a nondiagonalizable matrix A, with m Jordan blocks occurring in the Jordan canonical form, f(A) is given by

f(A) = T \begin{pmatrix} f(J_1) & 0 & \cdots & 0 \\ 0 & f(J_2) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & f(J_m) \end{pmatrix} T^{-1} \qquad (3.25)


3.6.3 Case 3: Using Finite Sums to Evaluate Matrix Functions

The method of using finite matrix sums begins with the use of the Cayley-Hamilton theorem.

Theorem 3.3 For any square matrix A [=] n × n whose characteristic polynomial is given by

\mathrm{charpoly}(\lambda) = a_0 + a_1 \lambda + \cdots + a_n \lambda^n = 0 \qquad (3.26)

matrix A will also satisfy the characteristic polynomial with A replacing λ, i.e.

\mathrm{charpoly}(A) = a_0 I + a_1 A + \cdots + a_n A^n = 0 \qquad (3.27)

(see page 107 for proof of theorem 3.3)

Using the Cayley-Hamilton theorem, we can see that A^n can be written as a linear combination of A^i, i = 0, 1, . . . , (n − 1):

A^n = \frac{-1}{a_n}\left( a_0 I + \cdots + a_{n-1} A^{n-1} \right) \qquad (3.28)

The same is true of A^{n+1}:

A^{n+1} = \frac{-1}{a_n}\left( a_0 A + \cdots + a_{n-1} A^n \right)
= \frac{-1}{a_n}\left( a_0 A + \cdots + a_{n-1}\left[ \frac{-1}{a_n}\left( a_0 I + \cdots + a_{n-1} A^{n-1} \right) \right] \right)
= \beta_0 I + \beta_1 A + \cdots + \beta_{n-1} A^{n-1} \qquad (3.29)

We can continue this process and conclude that A^{n+j}, j > 0, can always be recast as a linear combination of I, A, . . . , A^{n-1}. This means that for any well-defined matrix function,

f(A) = c_0 I + c_1 A + \cdots + c_{n-1} A^{n-1} \qquad (3.30)

for some coefficients c0, . . . , cn−1. Therefore the key evaluations are of these coefficients, which require n linearly independent equations.

Since the suggested derivation of (3.30) was based on the characteristic polynomial, this equation should also hold if A is replaced by λi, an eigenvalue of A. Thus we can get m linearly independent equations from the m distinct eigenvalues:

f(\lambda_1) = c_0 + c_1 \lambda_1 + \cdots + c_{n-1} \lambda_1^{n-1}
\qquad \vdots
f(\lambda_m) = c_0 + c_1 \lambda_m + \cdots + c_{n-1} \lambda_m^{n-1}


For the remaining equations, we can use [d^q f(λ)/dλ^q]_{λ=λi}, q = 1, ..., r − 1, where r is the multiplicity of λi in the spectrum of A, i.e.

$$\left.\frac{d^q f(\lambda)}{d\lambda^q}\right|_{\lambda=\lambda_i} = \left.\frac{d^q}{d\lambda^q}\left(c_0 + c_1\lambda + \cdots + c_{n-1}\lambda^{n-1}\right)\right|_{\lambda=\lambda_i} \qquad (3.31)$$

After obtaining the required independent linear equations, c0, ..., c_{n−1} can be calculated and used in (3.30).¹

Example 3.11

Let

$$A = \begin{pmatrix} -2 & 0 & 0 \\ 0 & -3 & 1 \\ 1 & 0 & -3 \end{pmatrix}$$

To find exp(A), we first obtain the eigenvalues. These are: −2, −3, −3. To apply the finite sequence algorithm,

$$\exp(-2) = c_0 + c_1(-2) + c_2(-2)^2$$
$$\exp(-3) = c_0 + c_1(-3) + c_2(-3)^2$$
$$\exp(-3) = c_1 + 2c_2(-3)$$

Solving the simultaneous equations yields: c0 = 0.5210, c1 = 0.2644, c2 = 0.0358.

$$\exp(A) = c_0 I + c_1 A + c_2 A^2 = \begin{pmatrix} 0.1353 & 0 & 0 \\ 0.0358 & 0.0498 & 0.0498 \\ 0.0855 & 0 & 0.0498 \end{pmatrix}$$

As an alternative solution for verification, one can apply the power series definition (3.14) truncated at a high order, say at the 20th power, and should find a very close match. In fact, since the power series was truncated, the finite sequence method actually gives the more accurate answer.

♦♦♦

¹Note that in cases where the degree of degeneracy of A introduces multiple Jordan blocks corresponding to the same eigenvalue, this method may not yield n linearly independent equations. In those cases, however, there exists a polynomial of lower order than the characteristic polynomial (called the minimal polynomial) such that the required number of coefficients will be equal to the number of linear equations obtained from the method just described.
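To reproduce Example 3.11 numerically, a minimal sketch follows (NumPy/SciPy assumed; variable names are ours). It assembles the three linear equations in c0, c1, c2 and then evaluates (3.30).

    import numpy as np
    from scipy.linalg import expm

    A = np.array([[-2.0,  0.0,  0.0],
                  [ 0.0, -3.0,  1.0],
                  [ 1.0,  0.0, -3.0]])

    # Rows: f(-2), f(-3), f'(-3), where f(x) = c0 + c1*x + c2*x^2
    M = np.array([[1.0, -2.0,  4.0],
                  [1.0, -3.0,  9.0],
                  [0.0,  1.0, -6.0]])
    rhs = np.array([np.exp(-2.0), np.exp(-3.0), np.exp(-3.0)])
    c0, c1, c2 = np.linalg.solve(M, rhs)    # 0.5210, 0.2644, 0.0358

    expA = c0 * np.eye(3) + c1 * A + c2 * (A @ A)
    print(np.round(expA, 4))
    print(np.allclose(expA, expm(A)))       # True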


3.7 Numerical Methods for Eigenvalue Calculations

For large systems, the determination of eigenvalues and eigenvectors can become susceptible to numerical errors, especially since the roots of polynomials are very sensitive to small perturbations in the polynomial coefficients. Fortunately, there exist other more robust and accurate methods to determine eigenvalues aside from using the basic definitions. One of the more stable and reliable methods is the QR decomposition method. Recall from section 2.2.5 that the modified Gram-Schmidt method for orthogonalization also produces a decomposition of the original matrix A into a product of two matrices Q and R, where Q is unitary while R is upper triangular,

A = QR (3.32)

Since Q is unitary, a similarity transformation of A based on Q,

$$A^{(1)} = Q^{-1}AQ = RQ \qquad (3.33)$$

only involves reversing the order of multiplication of Q and R in (3.32). Since A^{(1)} and A are similar, they have exactly the same eigenvalues. Due to the upper triangular form of R, one can show² that performing the QR decomposition iteratively on A^{(i)}, i = 1, 2, ..., will reliably converge to a matrix that can be partitioned as follows:

$$A^{(i)} = \begin{pmatrix} A_{(i),11} & A_{(i),12} \\ 0 & A_{(i),22} \end{pmatrix}$$

where A_{(i),22} is either a scalar or a 2 × 2 submatrix. When A_{(i),22} is a scalar, this scalar is one of the eigenvalues of A. When A_{(i),22} is a 2 × 2 submatrix, it can be solved to obtain a complex-conjugate pair of eigenvalues of A. The same QR decomposition is then applied to the submatrix A_{(i),11}. This process continues until all the eigenvalues are obtained.
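The basic iteration is short enough to state as code; the sketch below (NumPy assumed; the function name is ours) simply repeats the decompose-and-swap step (3.33).

    import numpy as np

    def qr_iterate(A, iters=200):
        # Unshifted QR iteration: each pass is the similarity
        # transform G <- Q^{-1} G Q = R Q of equation (3.33).
        G = A.astype(float).copy()
        for _ in range(iters):
            Q, R = np.linalg.qr(G)
            G = R @ Q
        return G    # eigenvalues appear on the (block) diagonal

    A = np.array([[2.0, 1.0], [1.0, 3.0]])
    print(np.round(qr_iterate(A), 4))   # diagonal -> 3.618, 1.382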

Although the QR method converges to the required eigenvalues, two enhancements to the method significantly help in accelerating the convergence. The first is called the shifted QR method. Here, instead of taking the QR decomposition of A^{(i)}, the decomposition is applied to the shifted matrix

$$A^{(i)} - \sigma_i I = Q_i R_i$$

where σi is the element of A^{(i)} in the (n, n)th position. Then

$$A^{(i+1)} = R_i Q_i + \sigma_i I = Q_i^{-1}\left(A^{(i)} - \sigma_i I\right)Q_i + \sigma_i I = Q_i^{-1}A^{(i)}Q_i$$

²For a detailed proof, refer to G. H. Golub and C. Van Loan, Matrix Computations, third edition, 1996, Johns Hopkins University Press.


which is again a similarity transformation, thus preserving the set of eigenvalues.

The second enhancement to the algorithm is the use of Householder operators to first put A into upper Hessenberg form. An upper Hessenberg matrix is one in which all elements below the first subdiagonal are zero,

$$H = \begin{pmatrix} \times & \times & \times & \cdots & \times & \times & \times \\ \times & \times & \times & \cdots & \times & \times & \times \\ 0 & \times & \times & \cdots & \times & \times & \times \\ 0 & 0 & \times & \cdots & \times & \times & \times \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & \times & \times & \times \\ 0 & 0 & 0 & \cdots & 0 & \times & \times \end{pmatrix} \qquad (3.34)$$

where “×” denotes arbitrary values. Householder operators Uw (cf. page 63) have already been shown to be unitary (and Hermitian), so UwAUw is another similarity transformation and thus preserves the set of eigenvalues. Once in Hessenberg form, the shifted-QR method requires significantly fewer steps to converge to a similar matrix with the eigenvalues on the diagonal (or in 2 × 2 diagonal blocks for complex eigenvalues). The main idea is to use a Householder operator U_{x−y} to transform a vector x to another vector y of the same norm in which only the first element is nonzero, i.e.

$$U_{x-y}\,x = \left(I - \frac{2}{\langle x-y,\,x-y\rangle}(x-y)(x-y)^*\right)x = y \qquad\text{with}\qquad y = \begin{pmatrix} \|x\| \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$

(cf. example shown on page 63). This idea is then expanded to introduce different unitary transforms that iteratively eliminate the terms below the first subdiagonal.
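As a concrete check of this construction, the sketch below (NumPy assumed; the helper name is ours) forms the reflector for a sample x and verifies that it maps x to (‖x‖, 0, ..., 0)ᵀ.

    import numpy as np

    def householder_to_e1(x):
        # Build U such that U @ x = (||x||, 0, ..., 0)^T
        y = np.zeros_like(x)
        y[0] = np.linalg.norm(x)
        d = x - y
        mu = d @ d                    # <x - y, x - y>
        if mu == 0:                   # x is already aligned with e1
            return np.eye(len(x))
        return np.eye(len(x)) - (2.0 / mu) * np.outer(d, d)

    x = np.array([0.0, 2.0, -2.0])
    print(np.round(householder_to_e1(x) @ x, 4))   # [2.8284 0. 0.]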

Algorithm for Householder Transformations to Upper Hessenberg Form.

Given matrix A[=]n × n

1. Initialize. Set m = 1 and G = A

2. While m ≤ (n − 2)

(a) Partition G:

$$G = \begin{pmatrix} g_{11} & G_{12} \\ w & G_{22} \end{pmatrix}$$

where w is a column vector of length n − m.


(b) Generate the Householder operator:

Let y = (‖w‖, 0, ···, 0)ᵀ and µ = ⟨y − w, y − w⟩.

If µ = 0, set Hm = I and go to step (e); otherwise set

$$U_{y-w} = I - \frac{2}{\mu}(y-w)(y-w)^*$$

(c) Form matrix Hm:

$$H_m = \begin{pmatrix} I_m & 0 \\ 0 & U_{y-w} \end{pmatrix}$$

(d) Reduce matrix G:

$$B = \begin{pmatrix} 1 & 0 \\ 0 & U_{y-w} \end{pmatrix} G \begin{pmatrix} 1 & 0 \\ 0 & U_{y-w} \end{pmatrix} = \begin{pmatrix} B_{11} & B_{12} \\ \begin{pmatrix} \|w\| \\ 0 \\ \vdots \\ 0 \end{pmatrix} & B_{22} \end{pmatrix}$$

(e) Shrink G. Update G to be the submatrix B22 [=] (n − m) × (n − m) (taking B = G if step (d) was skipped), and update m:

G ←− B22 and m ←− m + 1

End While-Loop

3. Determine the complete unitary matrix H:

$$H = H_1 H_2 \cdots H_{n-2}$$

Then HᵀAH should yield a matrix that is in upper Hessenberg form.
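A compact NumPy transcription of the loop above follows (a sketch; for brevity each reflector is applied to the full n × n matrix rather than to the shrinking submatrix G, which is algebraically equivalent).

    import numpy as np

    def to_hessenberg(A):
        # Householder reduction to upper Hessenberg form.
        # Returns (H, G) with G = H^T A H.
        n = A.shape[0]
        G = A.astype(float).copy()
        H = np.eye(n)
        for m in range(n - 2):
            w = G[m + 1:, m]           # column below the subdiagonal
            y = np.zeros_like(w)
            y[0] = np.linalg.norm(w)
            d = y - w
            mu = d @ d
            if mu == 0:                # column already reduced
                continue
            U = np.eye(n - m - 1) - (2.0 / mu) * np.outer(d, d)
            Hm = np.eye(n)
            Hm[m + 1:, m + 1:] = U     # Hm = diag(I, U_{y-w})
            G = Hm @ G @ Hm            # similarity transform
            H = H @ Hm
        return H, G

    A = np.array([[3., -4., 0., 12., 12.],
                  [0.,  1., 0.,  0.,  2.],
                  [0.,  0., -2., 3.,  3.],
                  [0.,  2., 0., -5., -6.],
                  [0., -2., 0.,  6.,  7.]])
    H, G = to_hessenberg(A)
    print(np.round(G, 4))   # reproduces the result of Example 3.12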

Example 3.12

Given

$$A = \begin{pmatrix} 3 & -4 & 0 & 12 & 12 \\ 0 & 1 & 0 & 0 & 2 \\ 0 & 0 & -2 & 3 & 3 \\ 0 & 2 & 0 & -5 & -6 \\ 0 & -2 & 0 & 6 & 7 \end{pmatrix}$$


Set G ←− A.

m = 1: The first column is already in the required form, so set H1 = I. Update G and m:

$$m \longleftarrow 2 \qquad\text{and}\qquad G \longleftarrow \begin{pmatrix} 1 & 0 & 0 & 2 \\ 0 & -2 & 3 & 3 \\ 2 & 0 & -5 & -6 \\ -2 & 0 & 6 & 7 \end{pmatrix}$$

m = 2:

$$w = \begin{pmatrix} 0 \\ 2 \\ -2 \end{pmatrix} \qquad y = \begin{pmatrix} \sqrt{8} \\ 0 \\ 0 \end{pmatrix}$$

$$U_{y-w} = \begin{pmatrix} 0 & 0.7071 & -0.7071 \\ 0.7071 & 0.5 & 0.5 \\ -0.7071 & 0.5 & 0.5 \end{pmatrix}$$

$$H_2 = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0.7071 & -0.7071 \\ 0 & 0 & 0.7071 & 0.5 & 0.5 \\ 0 & 0 & -0.7071 & 0.5 & 0.5 \end{pmatrix}$$

$$B = \begin{pmatrix} 1 & 0 \\ 0 & U_{y-w} \end{pmatrix} G \begin{pmatrix} 1 & 0 \\ 0 & U_{y-w} \end{pmatrix} = \begin{pmatrix} 1 & -1.4142 & 1 & 1 \\ 2.8284 & 1 & -8.4853 & -8.4853 \\ 0 & 0 & 1.6213 & 3.6213 \\ 0 & 0 & -0.6213 & -2.6213 \end{pmatrix}$$

then update G and m,

$$m \longleftarrow 3 \qquad\text{and}\qquad G \longleftarrow \begin{pmatrix} 1 & -8.4853 & -8.4853 \\ 0 & 1.6213 & 3.6213 \\ 0 & -0.6213 & -2.6213 \end{pmatrix}$$

m = 3: the first column of G is already in the required form, so H3 = I and we stop.

The unitary matrix required for the Hessenberg transform is thus

$$H = H_1H_2H_3 = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0.7071 & -0.7071 \\ 0 & 0 & 0.7071 & 0.5 & 0.5 \\ 0 & 0 & -0.7071 & 0.5 & 0.5 \end{pmatrix}$$

and the resulting Hessenberg form is given by


$$H^TAH = \begin{pmatrix} 3 & -4 & 0 & 12 & 12 \\ 0 & 1 & -1.4142 & 1 & 1 \\ 0 & 2.8284 & 1 & -8.4853 & -8.4853 \\ 0 & 0 & 0 & 1.6213 & 3.6213 \\ 0 & 0 & 0 & -0.6213 & -2.6213 \end{pmatrix}$$

In this example, it was very fortuitous that the resulting Hessenberg form is already block upper triangular, so the eigenvalues can be obtained from each submatrix block on the diagonal. This is not usually the case, and the QR (or shifted-QR) method is needed to obtain the eigenvalues.

♦♦♦

QR Algorithm for Obtaining Eigenvalues.

Given matrix A[=]n × n and a specified tolerance ε,

1. Use Householder’s method to obtain Hessenberg Form.

G = HT AH

2. Use QR to reduce G.

While G[=]m × m, m > 2, let σ = G_{m,m}.

• Case 1: (|G_{m,m−1}| > ε) and (|G_{m−1,m−2}| > ε).
Iterate on the following steps until either the condition of Case 2 or of Case 3 results:

(a) Find Q and R such that QR = G − σI

(b) Update G: G ←− RQ + σI

• Case 2: (|G_{m,m−1}| ≤ ε).
Add G_{m,m} to the list of eigenvalues.
Update G by removing the last row and last column.


• Case 3: (|G_{m−1,m−2}| ≤ ε).
Let b = −(G_{m−1,m−1} + G_{m,m}) and c = G_{m−1,m−1}G_{m,m} − G_{m,m−1}G_{m−1,m}.
Add µ1 and µ2 to the list of eigenvalues, where

$$\mu_1 = \frac{-b + \sqrt{b^2 - 4c}}{2} \qquad \mu_2 = \frac{-b - \sqrt{b^2 - 4c}}{2}$$

Update G by removing the last 2 rows and last 2 columns.

End While-loop

3. Solve for the last remaining eigenvalues.

• Case 1: G[=]1 × 1. Add g11 to the eigenvalue list.

• Case 2: G[=]2 × 2. Let b = −(G_{1,1} + G_{2,2}) and c = G_{1,1}G_{2,2} − G_{1,2}G_{2,1}, and add µ1 and µ2 to the list of eigenvalues, where

$$\mu_1 = \frac{-b + \sqrt{b^2 - 4c}}{2} \qquad \mu_2 = \frac{-b - \sqrt{b^2 - 4c}}{2}$$
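The deflation loop translates into the sketch below (NumPy assumed; tolerance handling simplified, names ours). Applied to the Hessenberg matrix of Example 3.13, it should recover the spectrum listed there up to rounding.

    import numpy as np

    def eig_shifted_qr(A, eps=1e-10, max_iter=500):
        # Shifted-QR with deflation, following Cases 1-3 above.
        G = A.astype(float).copy()
        eigs = []

        def quad_pair(b, c):          # roots of mu^2 + b*mu + c = 0
            disc = np.sqrt(complex(b * b - 4.0 * c))
            return [(-b + disc) / 2.0, (-b - disc) / 2.0]

        while G.shape[0] > 2:
            m = G.shape[0]
            for _ in range(max_iter):   # Case 1: shifted QR steps
                if abs(G[m-1, m-2]) <= eps or abs(G[m-2, m-3]) <= eps:
                    break
                s = G[m-1, m-1]
                Q, R = np.linalg.qr(G - s * np.eye(m))
                G = R @ Q + s * np.eye(m)
            if abs(G[m-1, m-2]) <= eps:        # Case 2: real eigenvalue
                eigs.append(G[m-1, m-1])
                G = G[:m-1, :m-1]
            else:                              # Case 3: complex pair
                b = -(G[m-2, m-2] + G[m-1, m-1])
                c = G[m-2, m-2] * G[m-1, m-1] - G[m-1, m-2] * G[m-2, m-1]
                eigs += quad_pair(b, c)
                G = G[:m-2, :m-2]
        if G.shape[0] == 1:                    # last remaining block
            eigs.append(G[0, 0])
        else:
            b = -(G[0, 0] + G[1, 1])
            c = G[0, 0] * G[1, 1] - G[0, 1] * G[1, 0]
            eigs += quad_pair(b, c)
        return eigs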

Example 3.13

Let

$$A = \begin{pmatrix} 1 & 2 & 0 & 1 & 1 \\ 2 & 1 & 0 & 0 & 2 \\ 1 & 0 & -1 & 2 & -1 \\ 2 & 2 & 1 & 0 & 3 \\ 0 & 0 & 2 & 0 & 1 \end{pmatrix}$$

First, obtain the upper Hessenberg matrix G using Householder’s method

$$G = \begin{pmatrix} 1 & -2 & -0.1060 & -1.3072 & -0.5293 \\ -3 & 1.8889 & -1.1542 & 2.3544 & 1.7642 \\ 0 & -1.0482 & 0.8190 & 0.6139 & -0.4563 \\ 0 & 0 & -1.2738 & 0.0036 & 2.9704 \\ 0 & 0 & 0 & -0.8456 & -1.7115 \end{pmatrix}$$


After 10 iterations of the shifted-QR method, G has become

$$\begin{pmatrix} 4.2768 & 0.2485 & -2.2646 & 2.2331 & -5.7024 \\ 0 & -1.8547 & 2.3670 & -1.3323 & 0.2085 \\ 0 & -1.5436 & 0.4876 & 1.0912 & -0.0094 \\ 0 & 0 & -0.2087 & 0.3759 & -0.0265 \\ 0 & 0 & 0 & 0 & -1.2856 \end{pmatrix}$$

Thus −1.2856 is one of the eigenvalues. Next, update G by deleting the last row and column:

$$G \longleftarrow \begin{pmatrix} 4.2768 & 0.2485 & -2.2646 & 2.2331 \\ 0 & -1.8547 & 2.3670 & -1.3323 \\ 0 & -1.5436 & 0.4876 & 1.0912 \\ 0 & 0 & -0.2087 & 0.3759 \end{pmatrix}$$

After 4 iterations of the shifted-QR method, G has become

$$\begin{pmatrix} 4.2768 & -0.7732 & -1.7695 & -0.6332 \\ 0 & -1.1442 & 2.9004 & -2.7753 \\ 0 & -0.9076 & 0.0814 & -0.2402 \\ 0 & 0 & 0 & 0.0716 \end{pmatrix}$$

Thus 0.0716 is one of the eigenvalues. Next, update G by deleting the last row and column:

$$G \longleftarrow \begin{pmatrix} 4.2768 & -0.7732 & -1.7695 \\ 0 & -1.1442 & 2.9004 \\ 0 & -0.9076 & 0.0814 \end{pmatrix}$$

Since G_{2,1} = 0, there is no need to apply the shifted-QR method. We can simply find the eigenvalues of the 2 × 2 matrix in the lower right corner. This yields the complex pair −0.5314 ± 1.5023i, which are then included as eigenvalues.

Finally, after extracting the last 2 rows and 2 columns, G = (4.2768). Thus 4.2768 is the last eigenvalue in the list. Gathering all the values, the spectrum of A is given by: −1.2856, 0.0716, −0.5314 ± 1.5023i, 4.2768.

♦♦♦

3.8 Miscellaneous Matrix Decompositions

There are several other possible matrix decompositions. In this section, we will describe three of the more popular and useful decompositions:


Decomposition                  Formula     Description                    Main Use
LU Factorization               A = LU      L is lower triangular,         Solution of
                                           U is upper triangular          linear equations
Singular Value Decomposition   A = UΣV*    U is unitary, V is unitary,    Generalized inverses
                                           Σ is diagonal
Polar Decomposition            A = UP      U is unitary,                  Analysis of operators
                               or          P is positive definite;
                               A = SV      S is positive definite,
                                           V is unitary

3.8.1 LU Decomposition

The main goal of the LU decomposition is to factor a square matrix A so that the solution of simultaneous linear equations is broken down into a sequence of two operations: forward substitution and backward substitution. Observe that if L is a lower triangular matrix, U is an upper triangular matrix, and A = LU, then the general equation for the solution of linear equations becomes

$$Ax = b$$
$$L(Ux) = b$$
$$Ly = b \qquad (3.35)$$

where

$$y = Ux \qquad (3.36)$$

From (3.35), note that the first equation is given by

$$\ell_{11}y_1 = b_1 \quad\Longrightarrow\quad y_1 = \frac{b_1}{\ell_{11}}$$

The second equation can then use the value of y1 just calculated:

$$\ell_{21}y_1 + \ell_{22}y_2 = b_2 \quad\Longrightarrow\quad y_2 = \frac{b_2 - \ell_{21}y_1}{\ell_{22}}$$

This process now continues until the last element of y is obtained:

$$y_n = \frac{1}{\ell_{nn}}\left(b_n - \sum_{i=1}^{n-1}\ell_{ni}y_i\right)$$


This process is called the forward substitution part of the solution.

Once the vector y has been obtained, equation (3.36) is then applied. This time we work backwards from the last equation:

$$u_{nn}x_n = y_n \quad\Longrightarrow\quad x_n = \frac{y_n}{u_{nn}}$$

Then the (n − 1)th equation can use the value of xn just calculated:

$$u_{n-1,n-1}x_{n-1} + u_{n-1,n}x_n = y_{n-1} \quad\Longrightarrow\quad x_{n-1} = \frac{y_{n-1} - u_{n-1,n}x_n}{u_{n-1,n-1}}$$

This process now continues until the first element of x is obtained:

$$x_1 = \frac{1}{u_{11}}\left(y_1 - \sum_{i=2}^{n}u_{1,i}x_i\right)$$

This process is called the backward substitution part of the solution. The key part of the method is then the factorization itself.
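The two sweeps translate directly into code; a sketch (NumPy assumed, no pivoting) is given below. Given A = LU, the solution of Ax = b is then backward_sub(U, forward_sub(L, b)).

    import numpy as np

    def forward_sub(L, b):
        # Solve L y = b, L lower triangular, from the top row down.
        n = len(b)
        y = np.zeros(n)
        for i in range(n):
            y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]
        return y

    def backward_sub(U, y):
        # Solve U x = y, U upper triangular, from the bottom row up.
        n = len(y)
        x = np.zeros(n)
        for i in range(n - 1, -1, -1):
            x[i] = (y[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
        return x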

Algorithm for LU Decomposition (Crout’s Method)

Given a nonsingular matrix A[=]n × n (for simplicity, we assume that none of the diagonal terms of A are zero; otherwise pivoting is necessary),

For i = 1, . . . , n

1. Calculate diagonal entries.

$$u_{ii} = 1$$

$$\ell_{ii} = \begin{cases} a_{ii} & \text{if } i = 1 \\ a_{ii} - \displaystyle\sum_{k=1}^{i-1}\ell_{ik}u_{ki} & \text{otherwise} \end{cases}$$

2. Calculate off-diagonal entries. For j = i + 1, ..., n,

$$u_{ij} = \begin{cases} a_{ij}/\ell_{ii} & \text{if } i = 1 \\ \left(a_{ij} - \displaystyle\sum_{k=1}^{i-1}\ell_{ik}u_{kj}\right)\Big/\ell_{ii} & \text{otherwise} \end{cases}$$

$$\ell_{ji} = \begin{cases} a_{ji}/u_{ii} & \text{if } i = 1 \\ \left(a_{ji} - \displaystyle\sum_{k=1}^{i-1}\ell_{jk}u_{ki}\right)\Big/u_{ii} & \text{otherwise} \end{cases}$$
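Crout's method as stated translates into the short routine below (a sketch assuming NumPy, no pivoting); on the matrix of Example 3.14 that follows, it reproduces the factors shown there.

    import numpy as np

    def crout_lu(A):
        # Crout LU: L carries the diagonal, U has a unit diagonal.
        n = A.shape[0]
        L = np.zeros((n, n))
        U = np.eye(n)
        for i in range(n):
            L[i, i] = A[i, i] - L[i, :i] @ U[:i, i]
            for j in range(i + 1, n):
                U[i, j] = (A[i, j] - L[i, :i] @ U[:i, j]) / L[i, i]
                L[j, i] = A[j, i] - L[j, :i] @ U[:i, i]   # u_ii = 1
        return L, U

    A = np.array([[1., 2., 1.], [2., -1., 1.], [1., 0., 3.]])
    L, U = crout_lu(A)
    print(np.allclose(L @ U, A))   # True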


Example 3.14

Let

$$A = \begin{pmatrix} 1 & 2 & 1 \\ 2 & -1 & 1 \\ 1 & 0 & 3 \end{pmatrix}$$

After the first iteration, we find

$$L = \begin{pmatrix} 1 & 0 & 0 \\ 2 & \times & 0 \\ 1 & \times & \times \end{pmatrix} \qquad U = \begin{pmatrix} 1 & 2 & 1 \\ 0 & \times & \times \\ 0 & 0 & \times \end{pmatrix}$$

where “×” denotes an entry that is unknown at this stage. Proceeding with the second iteration,

$$L = \begin{pmatrix} 1 & 0 & 0 \\ 2 & -5 & 0 \\ 1 & -2 & \times \end{pmatrix} \qquad U = \begin{pmatrix} 1 & 2 & 1 \\ 0 & 1 & 1/5 \\ 0 & 0 & \times \end{pmatrix}$$

Finally,

$$L = \begin{pmatrix} 1 & 0 & 0 \\ 2 & -5 & 0 \\ 1 & -2 & 12/5 \end{pmatrix} \qquad U = \begin{pmatrix} 1 & 2 & 1 \\ 0 & 1 & 1/5 \\ 0 & 0 & 1 \end{pmatrix}$$

♦♦♦

3.8.2 Singular Value Decomposition

For any matrix A[=]m × n, there exists a decomposition such that

$$A = U\Sigma V^* \qquad (3.37)$$

The matrices U[=]m × m and V[=]n × n are unitary. Σ[=]m × n is a block diagonal matrix of the form:

$$\Sigma = \begin{pmatrix} \begin{matrix} \sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_r \end{matrix} & 0 \\ 0 & 0 \end{pmatrix}$$


where σi = √λi are called the singular values of A, and λi is a positive eigenvalue of A*A.

Algorithm for Singular Value Decomposition

1. Calculate the Eigenvalues and Eigenvectors of A∗A.

Since A*A is Hermitian, the eigenvalues are all real and nonnegative. Moreover, A*A is also normal, and thus diagonalizable, e.g. via Schur triangularization.

2. Build matrix Σ.

$$\Sigma = \begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix} \qquad\text{where } D = \text{diag}(\sigma_1, \ldots, \sigma_r)$$

3. Arrange eigenvectors into matrix V. Let V = [V1 | V2], where V1 contains the r eigenvectors corresponding to the nonzero singular values in Σ.

4. Obtain U1:

$$U_1 = AV_1D^{-1}$$

5. Complete the Columns of U . If m > r, augment U1 with m− n column vectors ofrandom numbers. Then use Gram-Schmidt (or QR) orthogonalization to obtain U .

Example 3.15

Let

$$A = \begin{pmatrix} 1 & 0 & 1 \\ 2 & 1 & 0 \\ 1 & 1 & -1 \\ 0 & 1 & -2 \end{pmatrix}$$

then

$$A^*A = \begin{pmatrix} 6 & 3 & 0 \\ 3 & 3 & -3 \\ 0 & -3 & 6 \end{pmatrix}$$

Using Schur triangularization,

$$V = \begin{pmatrix} 0.5774 & 0.7071 & -0.4082 \\ 0.5774 & 0.0000 & 0.8165 \\ -0.5774 & 0.7071 & 0.4082 \end{pmatrix} \qquad T = \begin{pmatrix} 9 & 0 & 0 \\ 0 & 6 & 0 \\ 0 & 0 & 0 \end{pmatrix}$$


Next, since r = 2, extract the first 2 columns of V to build V1, and extract the 2 × 2 diagonal submatrix of T, taking its square root to form D:

$$V_1 = \begin{pmatrix} 0.5774 & 0.7071 \\ 0.5774 & 0 \\ -0.5774 & 0.7071 \end{pmatrix} \qquad D = \begin{pmatrix} 3 & 0 \\ 0 & 2.4495 \end{pmatrix}$$

and

$$\Sigma = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 2.4495 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$$

Calculate U1 = AV1D^{−1}:

$$U_1 = \begin{pmatrix} 0.0000 & 0.5774 \\ 0.5774 & 0.5774 \\ 0.5774 & 0.0000 \\ 0.5774 & -0.5774 \end{pmatrix}$$

Completing the columns of U and orthonormalizing them,

$$U = \begin{pmatrix} 0.0000 & 0.5774 & -0.5297 & -0.6213 \\ 0.5774 & 0.5774 & -0.0458 & 0.5755 \\ 0.5774 & 0.0000 & 0.6213 & -0.5297 \\ 0.5774 & -0.5774 & -0.5755 & -0.0458 \end{pmatrix}$$

To check,

$$U\Sigma V^* = \begin{pmatrix} 1 & 0 & 1 \\ 2 & 1 & 0 \\ 1 & 1 & -1 \\ 0 & 1 & -2 \end{pmatrix}$$

♦♦♦

3.8.3 Polar Decomposition

Given a matrix A[=]n × n, the polar decomposition of A is given by

A = UP (3.38)


in which P is a symmetric positive definite matrix and U is unitary. The main implication of the polar decomposition is that it separates the action of A into a sequence of operations on a vector, say x. First, the symmetric positive definite matrix P has, by its symmetry, all real positive eigenvalues and orthogonal eigenvectors; thus Px simply stretches x according to the positive scales determined by the eigenvalues of P along the principal axes determined by the orthogonal set of eigenvectors. Then U(Px) applies a unitary matrix, which in most cases involves a rotation of the vector without changing its magnitude. This idea is useful when analyzing deformation of systems in which the two operators P and U need to be analyzed separately.

Algorithm for Polar Decomposition. Given a matrix A[=]n × n,

1. Obtain the Grammian of A: G = A*A.

2. Calculate P to be the positive square root of G.

Since G is normal, Schur triangularization yields the diagonalization

$$G = Q\Lambda Q^*$$

where Q is unitary and Λ is the diagonal matrix of eigenvalues of G. Let P be the positive square root of G, i.e. P² = G:

$$P = Q\begin{pmatrix} \sqrt{\lambda_1} & 0 & \cdots & 0 \\ 0 & \sqrt{\lambda_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sqrt{\lambda_n} \end{pmatrix}Q^*$$

3. Calculate U: U = AP^{−1}.

Example 3.16

Let

$$A = \begin{pmatrix} 1.4575 & -1.2745 \\ -0.0245 & 0.7075 \end{pmatrix}$$

The positive definite factor, P, is given by

$$P = (A^*A)^{1/2} = \begin{pmatrix} 1.25 & -0.75 \\ -0.75 & 1.25 \end{pmatrix}$$


Then the unitary (orthogonal) part is given by

$$U = AP^{-1} = \begin{pmatrix} 0.866 & -0.5 \\ 0.5 & 0.866 \end{pmatrix}$$

To see how each matrix affects different vectors, Figure 3.1 shows how the points of a unit circle are stretched by P. Specifically, note that the eigenvalues of P are λ1 = 2 and λ2 = 0.5, with corresponding eigenvectors

$$v_1 = \begin{pmatrix} 0.7071 \\ -0.7071 \end{pmatrix} \qquad v_2 = \begin{pmatrix} -0.7071 \\ -0.7071 \end{pmatrix}$$

[Figure 3.1: The effects of P on the unit circle.]

Along v1, the points are stretched to twice their original size, while along v2 the points are compressed to half their original size.

Afterwards, the operator U rotates the ellipse counterclockwise by an angle θ = tan⁻¹(0.5/0.866) = 30°.

♦♦♦

The main rationale for calling this a polar decomposition is its analogy to the scalar case, in which a complex number z can be represented by

$$z = re^{i\theta}$$


[Figure 3.2: The effects of U on the ellipse formed by P on the unit circle.]

where r is a positive real number and the term exp(iθ) has a magnitude of 1, describing a rotation.

There is a dual polar decomposition to (3.38), in which

$$A = SV \qquad (3.39)$$

where S is a positive definite matrix and V is unitary. This polar decomposition first rotates the vector using V prior to the stretching obtained using S. Here, S = (AA*)^{1/2} and V = S^{−1}A.
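The dual factorization is obtained the same way from AA*; a sketch under the same assumptions as the polar helper above:

    import numpy as np

    def polar_left(A):
        # Left polar decomposition A = S @ V, S = (A A*)^(1/2).
        lam, Q = np.linalg.eigh(A @ A.conj().T)
        S = Q @ np.diag(np.sqrt(lam)) @ Q.conj().T
        V = np.linalg.inv(S) @ A
        return S, V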

3.9 Proofs

• Proof that eigenvectors of distinct eigenvalues are linearly independent. (cf. property 6, page 72)

Let λ1, ..., λm be a set of distinct eigenvalues of A[=]n × n, with m ≤ n, and let v1, ..., vm be the corresponding eigenvectors. Now search for a set of coefficients α1, ..., αm such that

$$\alpha_1 v_1 + \cdots + \alpha_m v_m = 0$$

Premultiplying by A, A², ..., A^{m−1},

$$\alpha_1\lambda_1 v_1 + \cdots + \alpha_m\lambda_m v_m = 0$$
$$\vdots$$
$$\alpha_1\lambda_1^{m-1} v_1 + \cdots + \alpha_m\lambda_m^{m-1} v_m = 0$$


which can be rewritten as the partitioned matrix equation

$$\left(\alpha_1 v_1 \,|\, \cdots \,|\, \alpha_m v_m\right)\begin{pmatrix} 1 & \lambda_1 & \cdots & \lambda_1^{m-1} \\ 1 & \lambda_2 & \cdots & \lambda_2^{m-1} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & \lambda_m & \cdots & \lambda_m^{m-1} \end{pmatrix} = 0$$

But the second matrix on the left is just a Vandermonde matrix, which is nonsingular because λi ≠ λj whenever i ≠ j (cf. exercise 13, page 26). Thus

$$\alpha_1 v_1 = \cdots = \alpha_m v_m = 0 \qquad\text{or}\qquad \alpha_1 = \cdots = \alpha_m = 0$$

since the eigenvectors are nontrivial.

• Proof that ∏ λi = |A|. (cf. property 7, page 72)

Using the Schur triangularization procedure, a similarity transformation with a unitary matrix U produces an upper triangular matrix with the eigenvalues on the diagonal,

$$U^*AU = \begin{pmatrix} \lambda_1 & * & \cdots & * \\ 0 & \lambda_2 & \cdots & * \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}$$

Next, take the determinant of both sides:

$$|U^*|\,|A|\,|U| = \prod_{i=1}^{n}\lambda_i$$

Since |U*||U| = |U*U| = |I| = 1,

$$|A| = \prod_{i=1}^{n}\lambda_i$$

• Proof that Σ λi = tr(A). (cf. property 8, page 72)

Using the Schur triangularization procedure,

$$U^*AU = \begin{pmatrix} \lambda_1 & * & \cdots & * \\ 0 & \lambda_2 & \cdots & * \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}$$


Next, take the trace of both sides. Using the cyclic property tr(XY) = tr(YX),

$$\text{tr}(U^*AU) = \text{tr}(AUU^*) = \text{tr}(A) = \sum_{i=1}^{n}\lambda_i$$
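Both identities are easy to spot-check numerically (NumPy assumed):

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((5, 5))
    lam = np.linalg.eigvals(A)
    print(np.isclose(np.prod(lam), np.linalg.det(A)))   # True
    print(np.isclose(np.sum(lam),  np.trace(A)))        # True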

• Proof that eigenvalues of Hermitian matrices are real. (cf. property 9, page 72)

Let H be Hermitian. The eigenvalue equation is

Hv = λv

Next premultiply by v∗,

v∗Hv = λv∗v

Next note that

(v∗Hv)∗ = v∗H∗v = v∗Hv

which means v*Hv is real-valued. Since v*v is also real-valued, we conclude that λ has to be real.

• Proof that eigenvalues of skew-Hermitian matrices are pure imaginary. (cf. property 10, page 72)

Let H be skew-Hermitian. The eigenvalue equation is

Hv = λv

Next premultiply by v∗,

v∗Hv = λv∗v

Next note that

(v∗Hv)∗ = v∗H∗v = −v∗Hv

which means v*Hv is pure imaginary. Since v*v is real-valued, we conclude that λ has to be pure imaginary.

• Proof that eigenvectors of Hermitian matrices corresponding to distinct eigenvalues are orthogonal. (cf. property 11, page 72)

Writing out the eigenvector condition for λi and λj,

Avi = λivi

Avj = λjvj


Multiplying the first equation by v*j and the second by v*i,

$$v_j^* A v_i = \lambda_i\, v_j^* v_i$$
$$v_i^* A v_j = \lambda_j\, v_i^* v_j$$

Taking the conjugate transpose of the second equation (using A* = A and the fact that eigenvalues of Hermitian matrices are real) and subtracting it from the first,

$$0 = (\lambda_i - \lambda_j)\, v_j^* v_i$$

With λi ≠ λj, it must be that v*j vi = 0, i.e. vi is orthogonal to vj.

• Proof of theorem 3.1: (cf. page 76)

Using the Schur triangularization process,

$$U^*AU = B = \begin{pmatrix} \lambda_1 & b_{12} & \cdots & b_{1n} \\ 0 & \lambda_2 & \cdots & b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}$$

However, if A is normal then B is also normal:

$$B^*B = (U^*AU)^*(U^*AU) = U^*A^*UU^*AU = U^*A^*AU = U^*AA^*U = U^*AUU^*A^*U = (U^*AU)(U^*AU)^* = BB^*$$

Comparing the ith diagonal element of B*B with that of BB* (working down from the first row),

$$\bar{\lambda}_i\lambda_i = \bar{\lambda}_i\lambda_i + \sum_{k=i+1}^{n} b_{ik}\bar{b}_{ik}$$

Because B is normal, the two sides must be equal, so it must be that bik = 0 for i = 1, ..., n, i < k ≤ n. Thus U*AU = diag(λ1, ..., λn).

• Proof of theorem 3.2 (cf. page 77)

Suppose λ1 is repeated k1 times. From the rank assumption,

$$\text{rank}(\lambda_1 I - A) = n - k_1$$

means that solving

$$(\lambda_1 I - A)\,v = 0$$


for the eigenvectors contains k1 arbitrary constants (cf. section 2.1, case 2). Thus there are k1 linearly independent eigenvectors that can be obtained for λ1. Likewise, there are k2 linearly independent eigenvectors that can be obtained for λ2, and so forth. Let the first set of k1 eigenvectors v1, ..., v_{k1} correspond to λ1, the subsequent set of k2 eigenvectors v_{k1+1}, ..., v_{k1+k2} correspond to λ2, and so forth. Each eigenvector from the first set is linearly independent of those in the other sets, and the same can be said of the eigenvectors of the other sets. In the end, all n eigenvectors obtained will form a linearly independent set.

• Proof of theorem 3.3, the Cayley-Hamilton theorem: (cf. page 87)

Using the Jordan canonical decomposition, A = TJT^{−1}, where T is the modal matrix and J is a matrix in Jordan canonical form with m Jordan blocks,

$$a_0I + a_1A + \cdots + a_nA^n = T\left(a_0I + a_1J + \cdots + a_nJ^n\right)T^{-1} = T\begin{pmatrix} \text{charpoly}(J_1) & 0 & \cdots & 0 \\ 0 & \text{charpoly}(J_2) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \text{charpoly}(J_m) \end{pmatrix}T^{-1} \qquad (3.40)$$

The elements of charpoly(Ji) are either 0, charpoly(λi), or derivatives of charpoly(λi) multiplied by finite scalars (cf. (3.23)). Since λi is a root of the characteristic polynomial with multiplicity at least equal to the size of Ji, all of these entries vanish. Thus each charpoly(Ji) is a zero matrix, and the right-hand side of equation (3.40) is a zero matrix.

Exercises

E1. Find the eigenvalues and eigenvectors of A, where

$$A = \begin{pmatrix} k+1 & 3 & 0 \\ -1 & 2 & 1 \\ 0 & 1 & 3 \end{pmatrix}$$

E2. Prove properties 1 to 5 and 12 of eigenvalues and eigenvectors given on page 71.

E3. An n × n companion matrix is one that has the following form:

$$C = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ -\alpha_0 & -\alpha_1 & -\alpha_2 & \cdots & -\alpha_{n-1} \end{pmatrix}$$


Show that the characteristic polynomial of C is given by

$$\lambda^n + \sum_{i=0}^{n-1}\alpha_i\lambda^i = 0$$

Further, show that the eigenvector corresponding to each distinct eigenvalue λi of C is given by

$$v_i = \begin{pmatrix} 1 \\ \lambda_i \\ \vdots \\ \lambda_i^{n-1} \end{pmatrix}$$

E4. Determine whether or not all sums of normal matrices are also normal.

E5. Determine which matrices below are diagonalizable and which are not. If diagonalizable, obtain the matrix T that diagonalizes the matrix; otherwise, determine the modal matrix T that produces a similarity transformation to the Jordan canonical form.

a) $$A = \begin{pmatrix} 1 & -2 & 0 & 2 \\ 2 & 1 & -2 & 0 \\ 0 & 2 & 1 & -2 \\ -2 & 0 & 2 & 1 \end{pmatrix}$$

b) $$B = \begin{pmatrix} 4 & 1 & 2 \\ 1 & 3 & 1 \\ -1 & 0 & 2 \end{pmatrix}$$

c) $$C = \begin{pmatrix} 1 & 1 & -1 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix}$$

E6. The polar decomposition of a square matrix A can be achieved using two sequences. One sequence, which was discussed in section 3.8.3, A = UP, first stretches the vector using P and then rotates it with U. The other sequence, which is equally valid, is A = TV, where the vector is first rotated by an orthogonal matrix V and then stretched via a positive definite matrix T.

Obtain an algorithm for the evaluation of T and V. (Hint: T is the square root of AAᵀ.)