
MT3501 Linear Mathematics

MRQ

November 27, 2014


Contents

Introduction

1 Vector spaces
  Definition and examples of vector spaces
  Basic properties of vector spaces
  Subspaces
  Spanning sets
  Linearly independent elements and bases

2 Linear transformations
  Definition and basic properties
  Constructing linear transformations
  The matrix of a linear transformation
  Change of basis

3 Direct sums
  Definition and basic properties
  Projection maps
  Direct sums of more summands

4 Diagonalisation of linear transformations
  Eigenvectors and eigenvalues
  Diagonalisability
  Algebraic and geometric multiplicities
  Minimum polynomial

5 Jordan normal form

6 Inner product spaces
  Orthogonality and orthonormal bases
  Orthogonal complements

7 The adjoint of a transformation and self-adjoint transformations


Introduction

The aims of this module are stated on the School website as:

“To show an intending honours mathematician the importance of linearity in many areas of mathematics ranging from linear algebra through geometric applications to linear operators and special functions.”

How should we interpret such a statement? Firstly, as we progress into Honours modules, we have already begun to develop some experience of the sort of objects that occur in mathematics and to develop some facility in working with them. Functions are extremely important in mathematics and to date probably the most studied object in a student’s programme. In this module, we shall see how functions that are linear are far more tractable to study. Indeed, we can hope (and succeed!) to obtain information about linear mappings that we would not attempt to find for arbitrary functions. Moreover, linearity occurs naturally throughout mathematics and for this reason understanding linear mappings is vital for the study of pure, applied and statistical mathematics. The obvious example is differentiation:

\[ \frac{d}{dx}(f + g) = \frac{df}{dx} + \frac{dg}{dx}, \qquad \frac{d}{dx}(cf) = c\,\frac{df}{dx} \]

(when c is a constant).

We start by discussing vector spaces, since these are the correct and natural place within which to frame linearity. They may initially look like an abstract algebraic concept, but they do illustrate the two main themes of this course:

(i) Vector spaces are the easiest algebraic structure to study; we can establish facts that we would hope to be true more easily than in other algebraic systems. (For example, in groups many such results do not even hold, though arguably this makes for interesting mathematics.)

(ii) An abstract setting is good to work within since we can establish facts in general and then apply them to many settings. Remember that linearity occurs throughout mathematics: rather than proving something just for Euclidean vectors, we can obtain results that cover vectors, complex numbers, matrices, polynomials, differentiable functions, solutions to differential equations, etc.

Overview of course content

• Vector spaces: subspaces, spanning sets, linearly independent sets, bases

• Linear transformations: rank, nullity, general form of a linear transformation, matrix of a linear transformation, change of basis

• Direct sums: projection maps

• Diagonalisation of linear transformations: eigenvectors and eigenvalues, characteristic polynomial, minimum polynomial, characterisation of diagonalisable transformations

• Jordan normal form: method to determine the Jordan normal form

• Inner product spaces: orthogonality, associated inequalities, orthonormal bases, Gram–Schmidt process, applications

• Examples of infinite-dimensional inner product spaces

Comment about lecture content

The printed lecture notes (as available either through MMS or my website) will contain a number of examples that will not be covered on the board during the actual lectures. This is an attempt to provide more examples, while not decreasing the amount of material covered. These omitted examples are numbered in the form “1A” (etc.) in the body of these notes.

Recommended texts

• T. S. Blyth & E. F. Robertson, Basic Linear Algebra, Second Edition, Springer Undergraduate Mathematics Series (Springer-Verlag 2002)

• T. S. Blyth & E. F. Robertson, Further Linear Algebra, Springer Undergraduate Mathematics Series (Springer-Verlag 2002)

• R. Kaye & R. Wilson, Linear Algebra, Oxford Science Publications (OUP 1998)


Section 1

Vector spaces

Definition and examples of vector spaces

Being “linear” will boil down to “preserving addition” and “preserving scalar multiplication.” Our first step is to specify the scalars with which we intend to work.

Definition 1.1 A field is a set F together with two binary operations

F × F → F, (α, β) ↦ α + β    and    F × F → F, (α, β) ↦ αβ,

called addition and multiplication, respectively, such that

(i) α+ β = β + α for all α, β ∈ F ;

(ii) (α+ β) + γ = α+ (β + γ) for all α, β, γ ∈ F ;

(iii) there exists an element 0 in F such that α+ 0 = α for all α ∈ F ;

(iv) for each α ∈ F , there exists an element −α in F such that α+(−α) = 0;

(v) αβ = βα for all α, β ∈ F ;

(vi) (αβ)γ = α(βγ) for all α, β, γ ∈ F ;

(vii) α(β + γ) = αβ + αγ for all α, β, γ ∈ F ;

(viii) there exists an element 1 in F such that 1 ≠ 0 and 1α = α for all α ∈ F;

(ix) for each α ∈ F with α ≠ 0, there exists an element α^{-1} (or 1/α) in F such that αα^{-1} = 1.


Although a full set of axioms has been provided, we are not going to examine them in detail nor spend time developing the theory of fields. Instead, one should simply note that in a field one may add, subtract, multiply and divide (by non-zero scalars) and that all the normal rules of arithmetic hold. This is illustrated by our examples.

Example 1.2 The following are examples of fields:

(i) Q = { m/n | m, n ∈ Z, n ≠ 0 }

(ii) R

(iii) C = { x + iy | x, y ∈ R }

with all three possessing the usual addition and multiplication.

(iv) Z/pZ = {0, 1, . . . , p − 1}, where p is a prime number, with addition and multiplication being performed modulo p.

The latter example is important in the context of pure mathematics and it is for this reason that I mention it. For the purposes of this module and for many applications of linear algebra in applied mathematics and the physical sciences, the examples R and C are the most important and it is safe to think of them as the typical examples of a field throughout. For those of a pure mathematical bent, however, it is worth noting that much of what is done in this course will work over an arbitrary field.
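As an illustrative aside (not part of the original notes), example (iv) is easy to experiment with on a computer: in Z/pZ every non-zero element has a multiplicative inverse. A minimal Python sketch, with p = 5 chosen purely for illustration:

    # Arithmetic in Z/pZ for a prime p: addition and multiplication are taken
    # modulo p, and every non-zero element has a multiplicative inverse.
    p = 5
    for a in range(1, p):
        inverse = next(b for b in range(1, p) if (a * b) % p == 1)
        print(f"{a} * {inverse} = 1 (mod {p})")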

Definition 1.3 Let F be a field. A vector space over F is a set V together with the following operations

V × V → V, (u, v) ↦ u + v    and    F × V → V, (α, v) ↦ αv,

called addition and scalar multiplication, respectively, such that

(i) u+ v = v + u for all u, v ∈ V ;

(ii) (u+ v) +w = u+ (v + w) for all u, v, w ∈ V ;

(iii) there exists a vector 0 in V such that v + 0 = v for all v ∈ V ;

(iv) for each v ∈ V , there exists a vector −v in V such that v + (−v) = 0;

(v) α(u+ v) = αu+ αv for all u, v ∈ V and α ∈ F ;

(vi) (α+ β)v = αv + βv for all v ∈ V and α, β ∈ F ;

(vii) (αβ)v = α(βv) for all v ∈ V and α, β ∈ F ;

(viii) 1v = v for all v ∈ V .


Comments:

(i) A vector space then consists of a collection of vectors which we are permitted to add and which we may multiply by scalars from our base field. These operations behave in a natural way.

(ii) One aspect which requires some care is that the field contains the number 0 while the vector space contains the zero vector 0. The latter will be denoted by boldface in notes and on the slides. On the board boldface is unavailable, so although the difference is usually clear from the context (we can multiply vectors by scalars, but cannot multiply two vectors, and we can add two vectors but cannot add a vector to a scalar) we shall use 0̃ to denote the zero vector on that medium.

(iii) We shall use the term real vector space to refer to a vector space over the field R and complex vector space to refer to one over the field C. Almost all examples in this module will be either real or complex vector spaces.

(iv) We shall sometimes refer simply to a vector space V without specifying the base field F. Nevertheless, there is always such a field F in the background and we will then use the term scalar to refer to the elements of this field when we fail to actually name it.

To illustrate why these are important objects, we shall give a number of examples, many of which should be familiar (not least from MT2001).

Example 1.4 (i) Let n be a positive integer and let F^n denote the set of column vectors of length n with entries from the field F:

\[ F^n = \left\{ \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} \;\middle|\; x_1, x_2, \dots, x_n \in F \right\}. \]

This is an example of a vector space over F. Addition in F^n is given by

\[ \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} + \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_n + y_n \end{pmatrix}, \]

while scalar multiplication is similarly given by

\[ \alpha \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} \alpha x_1 \\ \alpha x_2 \\ \vdots \\ \alpha x_n \end{pmatrix}. \]

The zero vector is

\[ \mathbf{0} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix} \]

and

\[ -\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} -x_1 \\ -x_2 \\ \vdots \\ -x_n \end{pmatrix} \]

specifies the negative of a vector.

We could also consider the set of row vectors of length n as a vector space over F. This is sometimes also denoted F^n and has the advantage of being more easily written on the page! However, column vectors turn out to be slightly more natural than row vectors when we consider the matrix of a linear transformation later in the module.

As a further comment about distinguishing between scalars and vectors, we shall follow the usual convention of using boldface (or writing something such as ṽ on the board) to denote column vectors in the vector space F^n. We shall, however, usually not use boldface letters when referring to vectors in an abstract vector space (since in an actual example, they could become genuine column vectors, but also possibly matrices, polynomials, functions, etc.).

(ii) The complex numbers C can be viewed as a vector space over R. Addition is the usual addition of complex numbers:

(x1 + iy1) + (x2 + iy2) = (x1 + x2) + i(y1 + y2);

while scalar multiplication is given by

α(x+ iy) = (αx) + i(αy) (for α ∈ R).

The zero vector is the element 0 = 0 + i0 ∈ C.

(iii) A polynomial over the field F is an expression of the form

f(x) = a0 + a1x + a2x^2 + · · · + amx^m,

for some m ≥ 0, where a0, a1, . . . , am ∈ F and where we ignore terms with 0 as the coefficient. The set of all polynomials over F is usually denoted by F[x]. If necessary we can “pad” such an expression for a polynomial using 0 as the coefficient for the extra terms to increase its length. Thus to add f(x) above to another polynomial g(x), we may assume they are represented by expressions of the same length, say

g(x) = b0 + b1x + b2x^2 + · · · + bmx^m.

Then

f(x) + g(x) = (a0 + b0) + (a1 + b1)x + (a2 + b2)x^2 + · · · + (am + bm)x^m.

Scalar multiplication is straightforward:

αf(x) = (αa0) + (αa1)x + (αa2)x^2 + · · · + (αam)x^m

for f(x) as above and α ∈ F. The vector space axioms are pretty much straightforward to verify. The zero vector is the polynomial with all coefficients 0:

0 = 0 + 0x + 0x^2 + · · · + 0x^m

(for any choice of m) and

−f(x) = (−a0) + (−a1)x + (−a2)x^2 + · · · + (−am)x^m.

(iv) The final example is heavily related to the example of polynomials, which are after all special types of functions. Let FR denote the set of all functions f : R → R. Define the addition of two functions f and g by

(f + g)(x) = f(x) + g(x) (for x ∈ R)

and scalar multiplication of f by α ∈ R by

(αf)(x) = α · f(x) (for x ∈ R).

Then FR is a real vector space with

(−f)(x) = −f(x)

and 0 being the function given by x ↦ 0 for all x ∈ R.
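As an aside (not in the original notes), examples (i) and (iv) are easy to mimic on a computer; the following minimal Python sketch spot-checks a couple of the vector space axioms numerically, using numpy purely for illustration.

    import numpy as np

    # Example (i): column vectors in R^3, with entrywise operations.
    u = np.array([1.0, 2.0, 3.0])
    v = np.array([-1.0, 0.5, 4.0])
    alpha = 2.0
    print(np.allclose(u + v, v + u))                        # axiom (i): u + v = v + u
    print(np.allclose(alpha * (u + v), alpha*u + alpha*v))  # axiom (v): distributivity

    # Example (iv): functions R -> R, added pointwise.
    f = lambda x: x**2
    g = lambda x: np.sin(x)
    h = lambda x: f(x) + g(x)        # the vector f + g
    print(np.isclose(h(0.7), f(0.7) + g(0.7)))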

These examples illustrate that vector spaces occur in numerous situations (functions covering a large class of mathematical objects for a start) and so the study of linear algebra is of considerable importance. We shall spend some time developing (and reviewing the development of) the theory of vector spaces.


Basic properties of vector spaces

In order to work with the vectors in a vector space, we record the basic properties we need.

Proposition 1.5 Let V be a vector space over a field F. Let v ∈ V and α ∈ F. Then

(i) α0 = 0;

(ii) 0v = 0;

(iii) if αv = 0, then either α = 0 or v = 0;

(iv) (−α)v = −αv = α(−v).

Proof: [Omitted in lectures — appears in MT2001]

(i) Use condition (v) of Definition 1.3 to give

α(0 + 0) = α0 + α0;

that is,

α0 = α0 + α0.

Now add −α0 to both sides to yield 0 = α0.

(ii) Use condition (vi) of Definition 1.3 to give

(0 + 0)v = 0v + 0v;

that is,

0v = 0v + 0v.

Now add −0v to deduce 0 = 0v.

(iii) Suppose αv = 0, but that α ≠ 0. Then F contains the scalar α^{-1} and multiplying by this gives

α^{-1}(αv) = α^{-1}0 = 0 (using (i)).

Therefore

1v = (α^{-1} · α)v = 0.

Condition (viii) of Definition 1.3 then shows v = 0. Hence if αv = 0, either α = 0 or v = 0.

(iv) αv + (−α)v = (α + (−α))v = 0v = 0,

so (−α)v is the vector which when added to αv yields 0; that is, (−α)v = −αv.

Similarly, αv + α(−v) = α(v + (−v)) = α0 = 0

and we deduce that α(−v) must be the vector −αv. □


Subspaces

Although linear algebra is a branch of mathematics that is used throughout the whole spectrum of pure and applied mathematics, it is nonetheless a branch of algebra. As a consequence, we should expect to do the sort of thing that is done throughout algebra, namely examine substructures and structure preserving maps. For the former, we make the following definition.

Definition 1.6 Let V be a vector space over a field F. A subspace W of V is a non-empty subset such that

(i) if u, v ∈W , then u+ v ∈W , and

(ii) if v ∈W and α ∈ F , then αv ∈W .

Thus a subspace W is a non-empty subset of the vector space V such that W is closed under vector addition and scalar multiplication by any scalar from the field F. The following basic properties hold:

Lemma 1.7 Let V be a vector space and let W be a subspace of V . Then

(i) 0 ∈W ;

(ii) if v ∈W , then −v ∈W .

Proof: (i) Since W is non-empty, there exists at least one vector u ∈ W. Now W is closed under scalar multiplication, so 0u ∈ W; that is, 0 ∈ W (by Proposition 1.5(ii)).

(ii) Let v be any vector in W . Then W contains

(−1)v = −1v = −v.

Consequence: If W is a subspace of V (over a field F), then W is also a vector space over F: if u, v are elements of W and α ∈ F, then u + v, −v and αv are defined elements of W. We have a zero vector 0 ∈ W and the axioms are all inherited from the fact that they hold universally on all vectors in V.

Example 1.8 Many examples of subspaces were presented in MT2001. We list a few here with full details, but these details will probably be omitted during the lectures.

(i) Let V = R^3, the real vector space of column vectors of length 3. Consider

\[ W = \left\{ \begin{pmatrix} x \\ y \\ 0 \end{pmatrix} \;\middle|\; x, y \in \mathbb{R} \right\} \subseteq \mathbb{R}^3; \]

so W consists of all vectors with zero in the last entry. We check

\[ \begin{pmatrix} x_1 \\ y_1 \\ 0 \end{pmatrix} + \begin{pmatrix} x_2 \\ y_2 \\ 0 \end{pmatrix} = \begin{pmatrix} x_1 + x_2 \\ y_1 + y_2 \\ 0 \end{pmatrix} \in W \]

and

\[ \alpha \begin{pmatrix} x \\ y \\ 0 \end{pmatrix} = \begin{pmatrix} \alpha x \\ \alpha y \\ 0 \end{pmatrix} \in W \quad (\text{for } \alpha \in \mathbb{R}). \]

Thus W is closed under sums and scalar multiplication; that is, W is a subspace of R^3.

(ii) Let FR be the set of all functions f : R → R, which forms a real vector space under

(f + g)(x) = f(x) + g(x);    (αf)(x) = α · f(x).

Let P denote the set of polynomial functions; i.e., each f ∈ P has the form

f(x) = a0 + a1x + a2x^2 + · · · + amx^m

for some m ≥ 0 and a0, a1, . . . , am ∈ R. Then P ⊆ FR and, since the sum of two polynomials is a polynomial and a scalar multiple of a polynomial is a polynomial, P is a subspace of FR.

We shall meet a generic way of constructing subspaces in a short while. The following specifies basic ways of manipulating subspaces.

Definition 1.9 Let V be a vector space and let U and W be subspaces of V.

(i) The intersection of U and W is

U ∩W = { v | v ∈ U and v ∈W }.

(ii) The sum of U and W is

U +W = {u+ w | u ∈ U,w ∈W }.

Since V is a vector space, addition of a vector u ∈ U ⊆ V and w ∈ W ⊆ V makes sense. Thus the sum U + W is a sensible collection of vectors in V.

Proposition 1.10 Let V be a vector space and let U and W be subspaces of V. Then

(i) U ∩W is a subspace of V ;


(ii) U +W is a subspace of V .

Proof: (i) First note that Lemma 1.7(i) tells us that 0 lies in both U and W. Hence 0 ∈ U ∩ W, so this intersection is non-empty. Let u, v ∈ U ∩ W and α be a scalar from the base field. Then U is a subspace containing u and v, so u + v ∈ U and αv ∈ U. Equally, u, v ∈ W so we deduce u + v ∈ W and αv ∈ W. Hence u + v ∈ U ∩ W and αv ∈ U ∩ W. This shows U ∩ W is a subspace of V.

(ii) Using the fact that 0 lies in U and W, we see 0 = 0 + 0 ∈ U + W. Hence U + W is non-empty. Now let v1, v2 ∈ U + W, say v1 = u1 + w1 and v2 = u2 + w2 where u1, u2 ∈ U and w1, w2 ∈ W. Then

v1 + v2 = (u1 + w1) + (u2 + w2) = (u1 + u2) + (w1 + w2) ∈ U +W

and if α is a scalar then

αv1 = α(u1 +w1) = (αu1) + (αw1) ∈ U +W.

Hence U + W is a subspace of V. □

A straightforward induction argument then establishes:

Corollary 1.11 Let V be a vector space and let U1, U2, . . . , Uk be subspaces of V. Then

U1 + U2 + · · ·+ Uk = {u1 + u2 + · · ·+ uk | ui ∈ Ui for each i }

is a subspace of V .

Spanning sets

We have defined earlier what is meant by a subspace. We shall now describe a good way (indeed, probably the canonical way) to specify subspaces.

Definition 1.12 Let V be a vector space over a field F and suppose that A = {v1, v2, . . . , vk} is a set of vectors in V. A linear combination of these vectors is a vector of the form

α1v1 + α2v2 + · · · + αkvk

for some α1, α2, . . . , αk ∈ F. The set of all such linear combinations is called the span of the vectors v1, v2, . . . , vk and is denoted by Span(v1, v2, . . . , vk) or by Span(A).


Remarks

(i) We shall often use the familiar summation notation to abbreviate a linear combination:

\[ \sum_{i=1}^{k} \alpha_i v_i = \alpha_1 v_1 + \alpha_2 v_2 + \dots + \alpha_k v_k. \]

(ii) In some settings, one might (and certainly some authors do) write 〈A〉 or 〈v1, v2, . . . , vk〉 for the span of A = {v1, v2, . . . , vk}. We will not do so in this course, since we wish to reserve angled brackets for inner products later in the course.

(iii) When A is an infinite set of vectors in a vector space V (over a field F), we need to apply a little care when defining Span(A). It does not make sense to add together infinitely many vectors: our addition only allows us to combine two vectors at a time. Consequently for an arbitrary set A of vectors we make the following definition:

\[ \operatorname{Span}(A) = \left\{ \sum_{i=1}^{k} \alpha_i v_i \;\middle|\; v_1, v_2, \dots, v_k \in A,\ \alpha_1, \alpha_2, \dots, \alpha_k \in F \right\}. \]

Thus Span(A) is the set of all linear combinations formed by selecting finitely many vectors from A. When A is finite, this coincides with Definition 1.12.

Proposition 1.13 Let A be a set of vectors in the vector space V. Then Span(A) is a subspace of V.

Proof: [Omitted in lectures — appears in MT2001]

We prove the proposition for the case when A = {v1, v2, . . . , vk} is a finite set of vectors from V. The case when A is infinite requires few changes. First note that, taking αi = 0 for each i, we see

\[ \mathbf{0} = \sum_{i=1}^{k} 0\,v_i \in \operatorname{Span}(A). \]

Let u, v ∈ Span(A), say

\[ u = \sum_{i=1}^{k} \alpha_i v_i \quad\text{and}\quad v = \sum_{i=1}^{k} \beta_i v_i \]

where the αi and βi are scalars. Then

\[ u + v = \sum_{i=1}^{k} (\alpha_i + \beta_i) v_i \in \operatorname{Span}(A) \]

and if γ is a further scalar then

\[ \gamma v = \sum_{i=1}^{k} (\gamma \alpha_i) v_i \in \operatorname{Span}(A). \]

Thus Span(A) is a non-empty subset of V which is closed under addition and scalar multiplication; that is, it is a subspace of V. □

It is fairly easy to see that if W is a subspace of a vector space V, then W = Span(A) for some choice of A ⊆ W. Indeed, the fact that W is closed under addition and scalar multiplication ensures that linear combinations of its elements are again in W and hence W = Span(W). However, what we will typically want to do is seek sets A which span particular subspaces where A can be made reasonably small.

Definition 1.14 A spanning set for a subspace W is a set A of vectors such that Span(A) = W.

Thus if Span(A) = W, then each element of W can be written in the form

\[ v = \sum_{i=1}^{k} \alpha_i v_i \]

where v1, v2, . . . , vk ∈ A and α1, α2, . . . , αk are scalars from the base field. We seek to find efficient choices of spanning sets A; i.e., make A as small as possible.

Example 1.15 (i) Since every vector in R^3 has the form

\[ \begin{pmatrix} x \\ y \\ z \end{pmatrix} = x \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + y \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} + z \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}, \]

we conclude that

\[ \left\{ \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \right\} \]

is a spanning set for R^3. However, note that although this is probably the most natural spanning set, it is not the only one. For example,

\[ \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \frac{x+y}{2} \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} + \frac{x-y}{2} \begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix} + z \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}, \]

so

\[ \left\{ \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \right\} \]

is also a spanning set.

We can also add vectors to a set that already spans and produce yet another spanning set. (Though there will inevitably be a level of redundancy to this and this relates to the concept of linear independence which we address next.) For example,

\[ \left\{ \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix} \right\} \]

is a spanning set for R^3, since every vector in R^3 can be written as a linear combination of the vectors appearing (in multiple ways); for example,

\[ \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \left( x + \frac{z}{2} \right) \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + (y + z) \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} + \frac{z}{2} \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} - \frac{z}{2} \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix}. \]

(ii) Recall the vector space F[x] of polynomials over the field F; its elements have the form

f(x) = a0 + a1x + a2x^2 + · · · + amx^m.

We can therefore write

f(x) = a0 f0(x) + a1 f1(x) + a2 f2(x) + · · · + am fm(x)

where fi(x) = x^i for i = 0, 1, 2, . . . . Hence the set

M = {1, x, x^2, x^3, . . . }

of all monomials is a spanning set for F[x].
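As an illustrative aside (not part of the notes), one can test membership of a span mechanically: a column vector b lies in Span(v1, . . . , vk) exactly when appending b as an extra column does not increase the rank of the matrix with columns v1, . . . , vk. A minimal numpy sketch:

    import numpy as np

    # Does b lie in Span(v1, v2) inside R^3?
    v1 = np.array([1.0, 1.0, 0.0])
    v2 = np.array([1.0, -1.0, 0.0])
    b  = np.array([3.0, 5.0, 0.0])

    A = np.column_stack([v1, v2])
    in_span = np.linalg.matrix_rank(np.column_stack([A, b])) == np.linalg.matrix_rank(A)
    print(in_span)    # True, since b = 4*v1 - v2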

Linearly independent elements and bases

We described spanning sets in the previous section. When seeking to make these as efficient as possible, we will want to use linearly independent spanning sets.

Definition 1.16 Let V be a vector space over a field F. A set A = {v1, v2, . . . , vk} is called linearly independent if the only solution to the equation

\[ \sum_{i=1}^{k} \alpha_i v_i = \mathbf{0} \]

(with αi ∈ F) is α1 = α2 = · · · = αk = 0.

If A is not linearly independent, we shall call it linearly dependent.


The method to check whether a set of vectors is linearly independent was covered in MT2001 and is fairly straightforward. Take the set of vectors A = {v1, v2, . . . , vk} and consider the equation ∑_{i=1}^{k} αivi = 0. This will usually convert to a system of linear equations in the variables αi. The usual method of solving systems of linear equations (such as applying row operations to the matrix associated to the system) determines whether αi = 0 for all i is the only solution or whether there are non-trivial solutions.

Example 1A Determine whether the set {x + x^2, 1 − 2x^2, 3 + 6x} is linearly independent in the vector space P of all real polynomials.

Solution: We solve

α(x + x^2) + β(1 − 2x^2) + γ(3 + 6x) = 0;

that is,

(β + 3γ) + (α + 6γ)x + (α − 2β)x^2 = 0.        (1.1)

Equating coefficients yields the system of equations

β + 3γ = 0
α + 6γ = 0
α − 2β = 0;

that is,

\[ \begin{pmatrix} 0 & 1 & 3 \\ 1 & 0 & 6 \\ 1 & -2 & 0 \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \\ \gamma \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}. \]

A sequence of row operations (Check!) converts this to

\[ \begin{pmatrix} 1 & -2 & 0 \\ 0 & 1 & 3 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \\ \gamma \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}. \]

Hence the original equation (1.1) is equivalent to

α − 2β = 0
β + 3γ = 0.

Since there are fewer equations remaining than the number of variables, we have enough freedom to produce a non-zero solution. For example, if we set γ = 1, then β = −3γ = −3 and α = 2β = −6. Hence the set {x + x^2, 1 − 2x^2, 3 + 6x} is linearly dependent. □
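As an aside (not part of the original notes), the same conclusion can be reached by computing the rank of the coefficient matrix used above; here is a minimal numpy check.

    import numpy as np

    # Columns hold the coefficients (constant, x, x^2) of x + x^2, 1 - 2x^2 and 3 + 6x.
    coeffs = np.array([[0, 1, 3],
                       [1, 0, 6],
                       [1, -2, 0]])
    rank = np.linalg.matrix_rank(coeffs)
    print(rank)    # 2 < 3, so the three polynomials are linearly dependent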


In general, if A = {v1, v2, . . . , vk} is linearly dependent, then we have

\[ \sum_{i=1}^{k} \alpha_i v_i = \mathbf{0} \]

with not all αi ∈ F equal to zero. Suppose αj ≠ 0 and rearrange the equation to

αjvj = −(α1v1 + · · · + αj−1vj−1 + αj+1vj+1 + · · · + αkvk).

Therefore

vj = (−α1/αj)v1 + · · · + (−αj−1/αj)vj−1 + (−αj+1/αj)vj+1 + · · · + (−αk/αj)vk,

so vj can be expressed as a linear combination of the other vectors in A. Equally such an expression can be rearranged into an equation of linear dependence for A. Hence we have the following important observation:

Lemma 1.17 Let A = {v1, v2, . . . , vk} be a set of vectors in the vector space V. Then A is linearly independent if and only if no vector in A can be expressed as a linear combination of the others. □

Suppose that A = {v1, v2, . . . , vk} is a finite set of vectors which is linearly dependent and let W = Span(A). Now as A is linearly dependent, Lemma 1.17 says that one of the vectors in A is a linear combination of the others. Let us suppose without loss of generality (and for notational convenience only) that vk is such a vector. Say

\[ v_k = \sum_{i=1}^{k-1} \alpha_i v_i \]

for some scalars αi. Put B = {v1, v2, . . . , vk−1} ⊆ A.

Claim: Span(B) = Span(A ) = W .

Proof: Since B ⊆ A, it is clear that Span(B) ⊆ Span(A): any linear combination of the vectors in B is a linear combination of those in A by taking 0 as the coefficient for vk.

Conversely, if w ∈ Span(A ), then

w = β1v1 + β2v2 + · · ·+ βkvk

for some scalars β1, β2, . . . , βk. Hence

\[ w = \beta_1 v_1 + \beta_2 v_2 + \dots + \beta_{k-1} v_{k-1} + \beta_k \sum_{i=1}^{k-1} \alpha_i v_i = \sum_{i=1}^{k-1} (\beta_i + \beta_k \alpha_i) v_i \in \operatorname{Span}(B). \]

This proves the claim and we have the following result. □

Lemma 1.18 Let A be a linearly dependent set of vectors belonging to a vector space V. Then there exists some vector v in A such that A \ {v} spans the same subspace as A.

As we started with a finite set, repeating this process eventually stops, at which point we must have produced a linearly independent set. Hence, we conclude:

Theorem 1.19 Let V be a vector space. If A is a finite subset of V and W = Span(A), then there exists a linearly independent subset B with B ⊆ A and Span(B) = W. □

Thus we can pass from a finite spanning set to a linearly independent spanning set simply by omitting the correct choice of vectors. (The above tells us that we want to omit a vector that can be expressed as a linear combination of the others, and then repeat.) In particular, if the vector space V possesses a finite spanning set, then it possesses a linearly independent spanning set. Accordingly, we make the following definition:

Definition 1.20 Let V be a vector space over the field F. A basis for V is a linearly independent spanning set. We say that V is finite-dimensional if it possesses a finite spanning set; that is, if V possesses a finite basis. The dimension of V is the size of any basis for V and is denoted by dim V.

Example 1.21 (i) The set

\[ B = \left\{ \begin{pmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \dots, \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix} \right\} \]

is a basis for V = F^n. We shall call it the standard basis for F^n. Hence dim F^n = n (as one would probably expect). Throughout we shall write

\[ e_i = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \]

where the 1 is in the ith position, so that B = {e1, e2, . . . , en}.

[Verification (omitted in lectures): If v is an arbitrary vector in V, say

\[ v = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \sum_{i=1}^{n} x_i e_i \]

(where xi ∈ F). Thus B = {e1, e2, . . . , en} spans V. Suppose there exist scalars α1, α2, . . . , αn such that

\[ \sum_{i=1}^{n} \alpha_i e_i = \mathbf{0}; \]

that is,

\[ \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}. \]

Hence α1 = α2 = · · · = αn = 0. Thus B is linearly independent.]

(ii) Let Pn be the set of polynomials over the field F of degree at most n:

Pn = { f(x) | f(x) = a0 + a1x + a2x^2 + · · · + anx^n for some ai ∈ F }.

It is easy to check Pn is closed under sums and scalar multiples, so Pn forms a vector subspace of the space F[x] of all polynomials. The set of monomials {1, x, x^2, . . . , x^n} is a basis for Pn. Hence dim Pn = n + 1.

In these examples we have referred to dimension as though it is uniquely determined. In Corollary 1.24 we shall show that it is. Beforehand, however, we shall observe how bases are efficient as spanning sets, since they produce a uniqueness to the linear combinations required.


Lemma 1.22 Let V be a vector space of dimension n and suppose that B = {v1, v2, . . . , vn} is a basis for V. Then every vector in V can be expressed as a linear combination of the vectors in B in a unique way.

Proof: Let v ∈ V. Since B is a spanning set for V, we can certainly express v as a linear combination of the vectors in B. Suppose we have two expressions for v:

\[ v = \sum_{i=1}^{n} \alpha_i v_i = \sum_{i=1}^{n} \beta_i v_i. \]

Hence

\[ \sum_{i=1}^{n} (\alpha_i - \beta_i) v_i = \mathbf{0}. \]

Since the set B is linearly independent, we deduce αi − βi = 0 for all i; that is, αi = βi for all i. Hence our linear combination expression for v is indeed unique. □

Theorem 1.23 Let V be a finite-dimensional vector space. Suppose that {v1, v2, . . . , vm} is a linearly independent set of vectors and {w1, w2, . . . , wn} is a spanning set for V. Then

m ≤ n.

The proof of this theorem appears only in the lecture notes, but it will not be presented in the actual lectures. Before giving the proof, we first record the important consequence which we referred to above.

Corollary 1.24 Let V be a finite-dimensional vector space. Then any two bases for V have the same size and consequently dim V is uniquely determined.

Proof: If {v1, v2, . . . , vm} and {w1, w2, . . . , wn} are bases for V, then they are both linearly independent and spanning sets for V, so Theorem 1.23 applied twice gives

m ≤ n and n ≤ m.

Hence m = n. □

Let us now turn to the proof of the main theorem about linearly independent sets and spanning sets.

Proof of Theorem 1.23: [Omitted in lectures]

Let V be a finite-dimensional vector space over a field F. We shall assume that {v1, v2, . . . , vm} is a linearly independent set in V and {w1, w2, . . . , wn} is a spanning set for V. In particular, there exist α1, α2, . . . , αn ∈ F such that

\[ v_1 = \sum_{i=1}^{n} \alpha_i w_i. \]

Since {v1, v2, . . . , vm} is linearly independent, certainly v1 ≠ 0. Thus some of the αi are non-zero. By re-arranging the wi, there is no loss of generality in assuming α1 ≠ 0. Then

\[ w_1 = \frac{1}{\alpha_1} \left( v_1 - \sum_{i=2}^{n} \alpha_i w_i \right) \]

and this enables us to replace w1 in any expression by a linear combination of v1 and w2, . . . , wn. Since V = Span(w1, w2, . . . , wn), we now deduce

V = Span(v1, w2, . . . , wn).

Suppose that we manage to show

V = Span(v1, v2, . . . , vj, wj+1, . . . , wn)

for some value of j where j < m, n. Then vj+1 is a vector in V, so can be expressed as a linear combination of v1, v2, . . . , vj, wj+1, . . . , wn, say

\[ v_{j+1} = \sum_{i=1}^{j} \beta_i v_i + \sum_{i=j+1}^{n} \beta_i w_i \]

for some scalars βi ∈ F. Now if βj+1 = · · · = βn = 0, then we would have

\[ v_{j+1} = \sum_{i=1}^{j} \beta_i v_i, \]

which would contradict the set {v1, v2, . . . , vm} being linearly independent (see Lemma 1.17). Hence some βi, with i ≥ j + 1, is non-zero. Re-arranging the wi (again), we can assume that it is βj+1 ≠ 0. Hence

\[ w_{j+1} = \frac{1}{\beta_{j+1}} \left( v_{j+1} - \sum_{i=1}^{j} \beta_i v_i - \sum_{i=j+2}^{n} \beta_i w_i \right). \]

Consequently, we can replace wj+1 by a linear combination of the vectors v1, v2, . . . , vj+1, wj+2, . . . , wn. Therefore

V = Span(v1, v2, . . . , vj, wj+1, wj+2, . . . , wn) = Span(v1, v2, . . . , vj+1, wj+2, . . . , wn).

If it were the case that m > n, then this process stops when we have replaced all the wi by vi and we have

V = Span(v1, v2, . . . , vn).

But then vn+1 is a linear combination of v1, v2, . . . , vn, and this contradicts {v1, v2, . . . , vm} being linearly independent.

Consequently m ≤ n, as required. □

In two examples that finish this section, we shall illustrate two ways to build bases for subspaces. The first is the following:

Example 1.25 Let

\[ v_1 = \begin{pmatrix} 1 \\ -1 \\ 0 \\ 3 \end{pmatrix}, \quad v_2 = \begin{pmatrix} 2 \\ 1 \\ 1 \\ 0 \end{pmatrix}, \quad v_3 = \begin{pmatrix} 0 \\ 3 \\ 1 \\ -6 \end{pmatrix}, \quad v_4 = \begin{pmatrix} 0 \\ 1 \\ 0 \\ -1 \end{pmatrix}, \quad v_5 = \begin{pmatrix} -1 \\ 1 \\ -1 \\ 0 \end{pmatrix} \]

and let U be the subspace of R^4 spanned by the set A = {v1, v2, v3, v4, v5}. Find a basis B for U and hence determine the dimension of U.

Solution: Since dim R^4 = 4, the maximum size for a linearly independent set is 4 (by Theorem 1.23). This tells us that A is certainly not linearly independent; we need to find a linearly independent subset B of A that also spans U (see Theorem 1.19). This is done by finding which vectors in A can be written as a linear combination of the other vectors in A (see Lemma 1.17). We solve

αv1 + βv2 + γv3 + δv4 + εv5 = 0;        (1.2)

that is,

\[ \alpha \begin{pmatrix} 1 \\ -1 \\ 0 \\ 3 \end{pmatrix} + \beta \begin{pmatrix} 2 \\ 1 \\ 1 \\ 0 \end{pmatrix} + \gamma \begin{pmatrix} 0 \\ 3 \\ 1 \\ -6 \end{pmatrix} + \delta \begin{pmatrix} 0 \\ 1 \\ 0 \\ -1 \end{pmatrix} + \varepsilon \begin{pmatrix} -1 \\ 1 \\ -1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} \]

or, equivalently,

\[ \begin{pmatrix} 1 & 2 & 0 & 0 & -1 \\ -1 & 1 & 3 & 1 & 1 \\ 0 & 1 & 1 & 0 & -1 \\ 3 & 0 & -6 & -1 & 0 \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \\ \gamma \\ \delta \\ \varepsilon \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}. \]

We apply row operations as follows to the appropriate augmented matrix:

\[ \left( \begin{array}{ccccc|c} 1 & 2 & 0 & 0 & -1 & 0 \\ -1 & 1 & 3 & 1 & 1 & 0 \\ 0 & 1 & 1 & 0 & -1 & 0 \\ 3 & 0 & -6 & -1 & 0 & 0 \end{array} \right) \longrightarrow \left( \begin{array}{ccccc|c} 1 & 2 & 0 & 0 & -1 & 0 \\ 0 & 3 & 3 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 & -1 & 0 \\ 0 & -6 & -6 & -1 & 3 & 0 \end{array} \right) \quad (r_2 \mapsto r_2 + r_1,\ r_4 \mapsto r_4 - 3r_1) \]

\[ \longrightarrow \left( \begin{array}{ccccc|c} 1 & 2 & 0 & 0 & -1 & 0 \\ 0 & 1 & 1 & 0 & -1 & 0 \\ 0 & 3 & 3 & 1 & 0 & 0 \\ 0 & -6 & -6 & -1 & 3 & 0 \end{array} \right) \quad (r_2 \longleftrightarrow r_3) \]

\[ \longrightarrow \left( \begin{array}{ccccc|c} 1 & 2 & 0 & 0 & -1 & 0 \\ 0 & 1 & 1 & 0 & -1 & 0 \\ 0 & 0 & 0 & 1 & 3 & 0 \\ 0 & 0 & 0 & -1 & -3 & 0 \end{array} \right) \quad (r_3 \mapsto r_3 - 3r_2,\ r_4 \mapsto r_4 + 6r_2) \]

\[ \longrightarrow \left( \begin{array}{ccccc|c} 1 & 2 & 0 & 0 & -1 & 0 \\ 0 & 1 & 1 & 0 & -1 & 0 \\ 0 & 0 & 0 & 1 & 3 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{array} \right) \quad (r_4 \mapsto r_4 + r_3) \]

So our equation (1.2) is equivalent to:

α + 2β − ε = 0
β + γ − ε = 0
δ + 3ε = 0

Given arbitrary γ and ε, we can read off α, β and δ that solve the equation. Taking γ = 1, ε = 0 and γ = 0, ε = 1 tells us that the vectors v3 and v5 in A can be written as a linear combination of the others, as we shall now observe.

If γ = 1 and ε = 0, then the above tells us:

δ = −3ε = 0
β = −γ + ε = −1
α = −2β + ε = 2

Hence

2v1 − v2 + v3 = 0,

so

v3 = −2v1 + v2.        (1.3)

If γ = 0 and ε = 1, then the above tells us:

δ = −3ε = −3
β = −γ + ε = 1
α = −2β + ε = −1.

Hence

−v1 + v2 − 3v4 + v5 = 0,

so

v5 = v1 − v2 + 3v4.        (1.4)

Equations (1.3) and (1.4) tell us v3, v5 ∈ Span(v1, v2, v4). Therefore any linear combination of the vectors in A can also be written as a linear combination of B = {v1, v2, v4} (using (1.3) and (1.4) to achieve this). Hence

U = Span(A ) = Span(B).

We finish by observing that B is linearly independent. Solve

αv1 + βv2 + γv4 = 0;

that is,

\[ \alpha \begin{pmatrix} 1 \\ -1 \\ 0 \\ 3 \end{pmatrix} + \beta \begin{pmatrix} 2 \\ 1 \\ 1 \\ 0 \end{pmatrix} + \gamma \begin{pmatrix} 0 \\ 1 \\ 0 \\ -1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \]

or

α + 2β = 0
−α + β + γ = 0
β = 0
3α − γ = 0

In this case, we can automatically read off the solution:

β = 0, α = −2β = 0, γ = 3α = 0.

Hence B is linearly independent. It follows that B is the required basis for U. Then

dim U = |B| = 3.
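As an illustrative aside (not part of the original notes), this computation can be checked mechanically: sympy's columnspace method returns the pivot columns of a matrix, which here are exactly v1, v2 and v4.

    from sympy import Matrix

    # Columns are v1, ..., v5 from Example 1.25.
    A = Matrix([[ 1, 2,  0,  0, -1],
                [-1, 1,  3,  1,  1],
                [ 0, 1,  1,  0, -1],
                [ 3, 0, -6, -1,  0]])
    basis = A.columnspace()   # a basis for U consisting of pivot columns: v1, v2, v4
    print(len(basis))         # 3, so dim U = 3
    for b in basis:
        print(b.T)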

Before the final example, we shall make some important observations concerning the creation of bases for finite-dimensional vector spaces.


Suppose that V is a finite-dimensional vector space, say dim V = n, and suppose that we already have a linearly independent set of vectors, say A = {v1, v2, . . . , vm}. If A happens to span V, then it is a basis for V (and consequently m = n).

If not, there exists some vector, which we shall call vm+1, such that vm+1 ∉ Span(A). Consider the set

A′ = {v1, v2, . . . , vm, vm+1}.

Claim: A ′ is linearly independent.

Proof: Suppose

\[ \sum_{i=1}^{m+1} \alpha_i v_i = \mathbf{0}. \]

If αm+1 ≠ 0, then

\[ v_{m+1} = -\frac{1}{\alpha_{m+1}} \sum_{i=1}^{m} \alpha_i v_i \in \operatorname{Span}(A), \]

which is a contradiction. Thus αm+1 = 0, so

\[ \sum_{i=1}^{m} \alpha_i v_i = \mathbf{0}, \]

which implies α1 = α2 = · · · = αm = 0, as A is linearly independent. □

Hence, if A is a linearly independent subset which does not span V then we can adjoin another vector to produce a larger linearly independent set.

Let us now repeat the process. This cannot continue forever, since Theorem 1.23 says there is a maximum size for a linearly independent set, namely n = dim V. Hence we must eventually reach a linearly independent set containing A that does span V. This proves:

Proposition 1.26 Let V be a finite-dimensional vector space. Then every linearly independent set of vectors in V can be extended to a basis for V by adjoining a finite number of vectors. □

Corollary 1.27 Let V be a vector space of finite dimension n. If A is a linearly independent set containing n vectors, then A is a basis for V.

Proof: By Proposition 1.26, we can extend A to a basis B for V. But by Corollary 1.24, B contains dim V = n vectors. Hence we cannot have introduced any new vectors and so B = A is the basis we have found. □


Example 1.28 Let V = R^4. Show that the set

\[ A = \left\{ \begin{pmatrix} 3 \\ 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \\ 3 \\ 4 \end{pmatrix} \right\} \]

is a linearly independent set of vectors. Find a basis for R^4 containing A.

Solution: To show A is linearly independent, we suppose

\[ \alpha_1 \begin{pmatrix} 3 \\ 1 \\ 0 \\ 0 \end{pmatrix} + \alpha_2 \begin{pmatrix} 1 \\ 0 \\ 3 \\ 4 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}. \]

This yields four equations:

3α1 + α2 = 0, α1 = 0, 3α2 = 0, 4α2 = 0.

Hence α1 = α2 = 0. Thus A is linearly independent.

We now seek to extend A to a basis of R^4. We do so by first attempting to add the first vector of the standard basis for R^4 to A: set

\[ B = \left\{ \begin{pmatrix} 3 \\ 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \\ 3 \\ 4 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} \right\}. \]

.

Suppose

α1

3100

+ α2

1034

+ α3

1000

=

0000

.

Therefore

3α1 + α2 + α3 = 0, α1 = 0, 3α2 = 0, 4α2 = 0.

So α1 = α2 = 0 (from the second and third equations) and we deduce α3 = −3α1 − α2 = 0. Hence our new set B is linearly independent.

If we now attempt to adjoin

\[ \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix} \]

to B and repeat the above, we would find that we were unable to force the corresponding αi to all be zero. Indeed,

\[ \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 3 \\ 1 \\ 0 \\ 0 \end{pmatrix} - 3 \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} \in \operatorname{Span}(B). \]


Thus there is no need to adjoin the second standard basis vector to B.

Now let us attempt to adjoin

\[ \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix} \]

to B:

\[ C = \left\{ \begin{pmatrix} 3 \\ 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \\ 3 \\ 4 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix} \right\}. \]

Suppose

\[ \alpha_1 \begin{pmatrix} 3 \\ 1 \\ 0 \\ 0 \end{pmatrix} + \alpha_2 \begin{pmatrix} 1 \\ 0 \\ 3 \\ 4 \end{pmatrix} + \alpha_3 \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} + \alpha_4 \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}. \]

Hence

3α1 + α2 + α3 = 0
α1 = 0
3α2 + α4 = 0
4α2 = 0

Therefore α1 = α2 = 0, from which we deduce α3 = α4 = 0. Thus we have produced a linearly independent set C of size 4. But dim R^4 = 4 and hence C must now be a basis for R^4. □
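As an aside (not from the notes), the greedy procedure just used, try each standard basis vector in turn and keep it only if it enlarges the span, is easy to mechanise; a minimal sympy sketch reproduces the basis C found above.

    from sympy import Matrix, eye

    # Start from the linearly independent set A of Example 1.28.
    basis = [Matrix([3, 1, 0, 0]), Matrix([1, 0, 3, 4])]

    for j in range(4):
        e = eye(4).col(j)                    # jth standard basis vector of R^4
        candidate = Matrix.hstack(*basis, e)
        if candidate.rank() > len(basis):    # e is not already in the span
            basis.append(e)

    print(len(basis))                        # 4, so we have a basis of R^4
    for b in basis:
        print(b.T)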


Section 2

Linear transformations

Definition and basic properties

Linear transformations are functions between vector spaces that interact well with the vector space structure and are probably the most important thing we study in linear algebra.

Definition 2.1 Let V and W be vector spaces over the same field F. A linear mapping (also called a linear transformation) from V to W is a function T : V → W such that

(i) T (u+ v) = T (u) + T (v) for all u, v ∈ V , and

(ii) T (αv) = αT (v) for all v ∈ V and α ∈ F .

Comment: Sometimes we shall write Tv for the image of the vector v under the linear transformation T (instead of T(v)). We shall particularly do this when v is a column vector, since it already possesses its own pair of brackets.

Linear transformations were discussed in great detail during the MT2001 module. We recall below some of these facts but omit the proofs in the lectures. (The proofs do appear in the lecture notes, however.)

Lemma 2.2 Let T : V → W be a linear mapping between two vector spaces over the field F. Then

(i) T (0) = 0;

(ii) T (−v) = −T (v) for all v ∈ V ;

(iii) if v1, v2, . . . , vk ∈ V and α1, α2, . . . , αk ∈ F, then

\[ T\left( \sum_{i=1}^{k} \alpha_i v_i \right) = \sum_{i=1}^{k} \alpha_i T(v_i). \]


Proof: [Omitted in lectures — appears in MT2001]

(i) Since 0 · 0 = 0, we have

T(0) = T(0 · 0) = 0 · T(0) = 0

(using Proposition 1.5(ii)).

(ii) Proposition 1.5(iv) tells us (−1)v = −1v = −v, so

T(−v) = T((−1)v) = (−1)T(v) = −T(v).

(iii) Using the two conditions of Definition 2.1, we see

\[ T\left( \sum_{i=1}^{k} \alpha_i v_i \right) = \sum_{i=1}^{k} T(\alpha_i v_i) = \sum_{i=1}^{k} \alpha_i T(v_i). \]

Definition 2.3 Let T : V → W be a linear transformation between vector spaces over a field F.

(i) The image of T is

T (V ) = imT = {T (v) | v ∈ V }.

(ii) The kernel or null space of T is

ker T = { v ∈ V | T (v) = 0W }.

Note here that we are working with two vector spaces, each of which will possess its own zero vector. For emphasis in the definition, we are writing 0W for the zero vector belonging to the vector space W, so the kernel consists of those vectors in V which are mapped by T to the zero vector of W.

Of course, T(v) has to be a vector in W, so actually there is little harm in writing simply T(v) = 0. For this equation to make any sense, the zero vector referred to must be that belonging to W, so confusion should not arise. Nevertheless, to start with we shall write 0W just to be completely careful and clear.

Warning: The previous version of the MT3501 lecture course used R(T) to denote the image (and called it the range) and used N(T) to denote the null space. This will be observed when consulting previous exam papers and appropriate care should be taken.

Proposition 2.4 Let T : V → W be a linear transformation between vector spaces V and W over the field F. The image and kernel of T are subspaces of W and V, respectively.


Proof: [Omitted in lectures — appears in MT2001]

Certainly im T is non-empty since it contains all the images of vectors under the application of T. Let x, y ∈ im T. Then x = T(u) and y = T(v) for some u, v ∈ V. Hence

x + y = T(u) + T(v) = T(u + v) ∈ im T

and

αx = αT(u) = T(αu) ∈ im T

for any α ∈ F. Hence im T is a subspace of W.

Note that T(0V) = 0W, so we see 0V ∈ ker T. So to start with ker T is non-empty. Now let u, v ∈ ker T. Then

T(u + v) = T(u) + T(v) = 0W + 0W = 0W

and

T(αv) = αT(v) = α · 0W = 0W.

Hence u + v, αv ∈ ker T (for all α ∈ F) and we deduce that ker T is a subspace of V. □

Definition 2.5 Let T : V → W be a linear transformation between vector spaces over the field F.

(i) The rank of T, which we shall denote rank T, is the dimension of the image of T.

(ii) The nullity of T, which we shall denote null T, is the dimension of the kernel of T.

Comment: The notations here are not uniformly established and I have simply selected a convenient notation rather than a definitive one. Many authors use different notation or, more often, no specific notation whatsoever for these two concepts.

Theorem 2.6 (Rank-Nullity Theorem) Let V and W be vector spaces over the field F with V finite-dimensional and let T : V → W be a linear transformation. Then

rank T + null T = dim V.

Comment: [For those who have done MT2002] This can be viewed as an analogue of the First Isomorphism Theorem for groups within the world of vector spaces. Rearranging gives

dim V − dim ker T = dim im T


and since (as we shall see in Problem Sheet II, Question 2) dimension essentially determines vector spaces we conclude

V / ker T ≅ im T.

Of course, I have not specified what is meant by a quotient and an isomorphism (and I will not address the former at all in this course), but this does give some context for the theorem.

Proof: [Omitted in lectures — appears in MT2001]

Let B = {v1, v2, . . . , vn} be a basis for ker T (so that n = null T) and extend this (by Proposition 1.26) to a basis C = {v1, v2, . . . , vn, vn+1, . . . , vn+k} for V (so that dim V = n + k). We now seek to find a basis for im T.

If w ∈ im T, then w = T(v) for some v ∈ V. We can write v as a linear combination of the vectors in the basis C, say

\[ v = \sum_{i=1}^{n+k} \alpha_i v_i. \]

Then, applying T and using linearity,

\[ w = T(v) = T\left( \sum_{i=1}^{n+k} \alpha_i v_i \right) = \sum_{i=1}^{n+k} \alpha_i T(v_i) = \sum_{j=1}^{k} \alpha_{n+j} T(v_{n+j}) \]

since T(v1) = · · · = T(vn) = 0 as v1, . . . , vn ∈ ker T. This shows that D = {T(vn+1), . . . , T(vn+k)} spans im T.

Now suppose that

\[ \sum_{j=1}^{k} \beta_j T(v_{n+j}) = \mathbf{0}; \]

that is,

\[ T\left( \sum_{j=1}^{k} \beta_j v_{n+j} \right) = \mathbf{0}. \]

Hence ∑_{j=1}^{k} βj vn+j ∈ ker T, so as B = {v1, . . . , vn} is a basis for ker T, we have

\[ \sum_{j=1}^{k} \beta_j v_{n+j} = \sum_{i=1}^{n} \gamma_i v_i \]

for some γ1, γ2, . . . , γn ∈ F. We now have an expression

(−γ1)v1 + · · · + (−γn)vn + β1vn+1 + · · · + βkvn+k = 0

involving the vectors in the basis C for V. Since C is linearly independent, we conclude all the coefficients occurring here are zero. In particular, β1 = β2 = · · · = βk = 0. This shows that D = {T(vn+1), . . . , T(vn+k)} is a linearly independent set and consequently a basis for im T. Thus

rank(T) = dim im T = k = dim V − null T

and this establishes the theorem. □

Constructing linear transformations

The above describes the basic facts about linear transformations. When it comes to giving examples, it is possible to describe various different ones, some of which can seem natural, some more esoteric. Instead, what we shall do is to describe a standard method for defining linear transformations.

Let V and W be vector spaces over a field F. Suppose that V is finite-dimensional and that B = {v1, v2, . . . , vn} is a basis for V. We shall define a linear transformation T : V → W by specifying its effect on each of the basis vectors vi.

Pick any vectors y1, y2, . . . , yn ∈ W. (These can be completely arbitrary; we do not need them to be linearly independent nor to span W; some can be the zero vector of W and we can even repeat the same vector over and over again if we want.) We intend to show that there is a linear transformation T : V → W satisfying T(vi) = yi for i = 1, 2, . . . , n. If such a T does exist, consider the effect T has on an arbitrary vector v in V. Since B is a basis for V, the vector v can be uniquely expressed as

\[ v = \sum_{i=1}^{n} \alpha_i v_i \]

for scalars αi in F. Then linearity of T implies

\[ T(v) = T\left( \sum_{i=1}^{n} \alpha_i v_i \right) = \sum_{i=1}^{n} \alpha_i T(v_i) = \sum_{i=1}^{n} \alpha_i y_i. \tag{2.1} \]

Hence T, if it exists, is uniquely specified by this formula (2.1).

We now claim that this formula does indeed define a linear transformation T : V → W. If u, v ∈ V, say

\[ u = \sum_{i=1}^{n} \alpha_i v_i \quad\text{and}\quad v = \sum_{i=1}^{n} \beta_i v_i \]

for some uniquely determined αi, βi ∈ F. Then u + v = ∑_{i=1}^{n} (αi + βi)vi and this must be the unique expression for u + v in terms of the basis B. So

\[ T(u + v) = \sum_{i=1}^{n} (\alpha_i + \beta_i) y_i = \sum_{i=1}^{n} \alpha_i y_i + \sum_{i=1}^{n} \beta_i y_i = T(u) + T(v). \]

Similarly γu = ∑_{i=1}^{n} (γαi)vi, so

\[ T(\gamma u) = \sum_{i=1}^{n} (\gamma \alpha_i) y_i = \gamma \sum_{i=1}^{n} \alpha_i y_i = \gamma T(u) \quad \text{for any } \gamma \in F. \]

This shows T is a linear transformation. We have now established the following result:

Proposition 2.7 Let V be a finite-dimensional vector space over the field F with basis {v1, v2, . . . , vn} and let W be any vector space over F. If y1, y2, . . . , yn are arbitrary vectors in W, there is a unique linear transformation T : V → W such that

T(vi) = yi    for i = 1, 2, . . . , n.

Moreover, we have shown this transformation T is given by

\[ T\left( \sum_{i=1}^{n} \alpha_i v_i \right) = \sum_{i=1}^{n} \alpha_i y_i \]

for an arbitrary linear combination ∑_{i=1}^{n} αivi in V.
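As an illustrative aside (not part of the notes), when B is the standard basis of R^n and W = R^m, the unique T of Proposition 2.7 is simply multiplication by the matrix whose columns are y1, . . . , yn. A minimal numpy sketch with arbitrarily chosen images:

    import numpy as np

    # Images of the standard basis vectors e1, e2, e3 of R^3 (chosen arbitrarily).
    y1 = np.array([2.0, 1.0])
    y2 = np.array([0.0, -1.0])
    y3 = np.array([1.0, 4.0])

    T = np.column_stack([y1, y2, y3])   # T(v) = T @ v defines a linear map R^3 -> R^2

    v = np.array([1.0, 2.0, -1.0])      # v = 1*e1 + 2*e2 - 1*e3
    print(T @ v)                        # equals 1*y1 + 2*y2 - 1*y3, as formula (2.1) predicts
    print(1*y1 + 2*y2 - 1*y3)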

We now give an example of our method of creating linear transformations.

Example 2.8 Define a linear transformation T : R^4 → R^3 in terms of the standard basis B = {e1, e2, e3, e4} by

\[ T(e_1) = y_1 = \begin{pmatrix} 2 \\ 1 \\ 3 \end{pmatrix}, \quad T(e_2) = y_2 = \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}, \quad T(e_3) = y_3 = \begin{pmatrix} 0 \\ 1 \\ 5 \end{pmatrix}, \quad T(e_4) = y_4 = \begin{pmatrix} -5 \\ -2 \\ -5 \end{pmatrix}. \]

Calculate the linear transformation T and its rank and nullity.

Solution: The effect of T on an arbitrary vector of R^4 can be calculated by the linearity property:

\[ T \begin{pmatrix} \alpha \\ \beta \\ \gamma \\ \delta \end{pmatrix} = T(\alpha e_1 + \beta e_2 + \gamma e_3 + \delta e_4) = \alpha T(e_1) + \beta T(e_2) + \gamma T(e_3) + \delta T(e_4) \]

\[ = \alpha \begin{pmatrix} 2 \\ 1 \\ 3 \end{pmatrix} + \beta \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix} + \gamma \begin{pmatrix} 0 \\ 1 \\ 5 \end{pmatrix} + \delta \begin{pmatrix} -5 \\ -2 \\ -5 \end{pmatrix} = \begin{pmatrix} 2\alpha - \beta - 5\delta \\ \alpha + \gamma - 2\delta \\ 3\alpha + \beta + 5\gamma - 5\delta \end{pmatrix}. \]

[Exercise: Check by hand that this formula does really define a linear transformation T : R^4 → R^3.]

Now let us determine the kernel of this transformation T. Suppose v ∈ ker T. Here v is some vector in R^4, say

\[ v = \begin{pmatrix} \alpha \\ \beta \\ \gamma \\ \delta \end{pmatrix} = \alpha e_1 + \beta e_2 + \gamma e_3 + \delta e_4, \]

where

\[ T(v) = \begin{pmatrix} 2\alpha - \beta - 5\delta \\ \alpha + \gamma - 2\delta \\ 3\alpha + \beta + 5\gamma - 5\delta \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}. \]

We have here three simultaneous equations in four variables which we convert to the matrix equation

\[ \begin{pmatrix} 2 & -1 & 0 & -5 \\ 1 & 0 & 1 & -2 \\ 3 & 1 & 5 & -5 \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \\ \gamma \\ \delta \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}. \]

We solve this by performing the usual row operations used in Gaussian elimination [see MT1002]:

\[ \left( \begin{array}{cccc|c} 2 & -1 & 0 & -5 & 0 \\ 1 & 0 & 1 & -2 & 0 \\ 3 & 1 & 5 & -5 & 0 \end{array} \right) \longrightarrow \left( \begin{array}{cccc|c} 1 & 0 & 1 & -2 & 0 \\ 2 & -1 & 0 & -5 & 0 \\ 3 & 1 & 5 & -5 & 0 \end{array} \right) \quad (r_1 \leftrightarrow r_2) \]

\[ \longrightarrow \left( \begin{array}{cccc|c} 1 & 0 & 1 & -2 & 0 \\ 0 & -1 & -2 & -1 & 0 \\ 0 & 1 & 2 & 1 & 0 \end{array} \right) \quad (r_2 \mapsto r_2 - 2r_1,\ r_3 \mapsto r_3 - 3r_1) \]

\[ \longrightarrow \left( \begin{array}{cccc|c} 1 & 0 & 1 & -2 & 0 \\ 0 & 1 & 2 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{array} \right) \quad (r_3 \mapsto r_3 + r_2,\ r_2 \mapsto -r_2) \]

So given arbitrary γ and δ, we require

α+ γ − 2δ = 0 and β + 2γ + δ = 0.


We remain with two degrees of freedom (the free choice of γ and δ) and so ker T is 2-dimensional:

\[ \ker T = \left\{ \begin{pmatrix} -\gamma + 2\delta \\ -2\gamma - \delta \\ \gamma \\ \delta \end{pmatrix} \;\middle|\; \gamma, \delta \in \mathbb{R} \right\} = \left\{ \gamma \begin{pmatrix} -1 \\ -2 \\ 1 \\ 0 \end{pmatrix} + \delta \begin{pmatrix} 2 \\ -1 \\ 0 \\ 1 \end{pmatrix} \;\middle|\; \gamma, \delta \in \mathbb{R} \right\} = \operatorname{Span}\left( \begin{pmatrix} -1 \\ -2 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 2 \\ -1 \\ 0 \\ 1 \end{pmatrix} \right). \]

It is easy to check these two spanning vectors are linearly independent, so null T = dim ker T = 2. The Rank-Nullity Theorem then says

rank T = dim R^4 − null T = 4 − 2 = 2.

Essentially this boils down to the four image vectors y1, y2, y3, y4 spanning a 2-dimensional space. Indeed, note that they are not linearly independent because

\[ y_3 = \begin{pmatrix} 0 \\ 1 \\ 5 \end{pmatrix} = \begin{pmatrix} 2 \\ 1 \\ 3 \end{pmatrix} + 2 \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix} = y_1 + 2y_2, \qquad y_4 = \begin{pmatrix} -5 \\ -2 \\ -5 \end{pmatrix} = -2 \begin{pmatrix} 2 \\ 1 \\ 3 \end{pmatrix} + \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix} = -2y_1 + y_2. \]

The full explanation behind this lies in the following result. □
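(As an aside, not part of the original notes, before the promised result: the kernel and rank found in Example 2.8 can be confirmed mechanically with sympy.)

    from sympy import Matrix

    # The matrix whose columns are y1, ..., y4, i.e. T with respect to the standard bases.
    T = Matrix([[2, -1, 0, -5],
                [1,  0, 1, -2],
                [3,  1, 5, -5]])
    print(T.rank())              # 2 = rank T
    print(len(T.nullspace()))    # 2 = null T, and 2 + 2 = 4 = dim R^4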

Proposition 2.9 Let V be a finite-dimensional vector space over the field F with basis {v1, v2, . . . , vn} and let W be a vector space over F. Fix vectors y1, y2, . . . , yn in W and let T : V → W be the unique linear transformation given by T(vi) = yi for i = 1, 2, . . . , n. Then

(i) imT = Span(y1, y2, . . . , yn).

(ii) ker T = {0} if and only if {y1, y2, . . . , yn} is a linearly independent set.

Proof: (i) If x ∈ im T, then x = T(v) for some v ∈ V. We can write v = ∑_{i=1}^{n} αivi for some αi ∈ F. Then

\[ x = T(v) = T\left( \sum_{i=1}^{n} \alpha_i v_i \right) = \sum_{i=1}^{n} \alpha_i T(v_i) = \sum_{i=1}^{n} \alpha_i y_i. \]

Thus, im T consists of all linear combinations of the vectors y1, y2, . . . , yn; that is,

im T = Span(y1, y2, . . . , yn).

(ii) Consider a vector v = ∑_{i=1}^{n} αivi expressed in terms of the basis vectors of V. If v lies in ker T, then T(v) = ∑_{i=1}^{n} αiyi equals 0. If the yi are linearly independent, this forces αi = 0 for all i and we deduce v = 0. So linear independence of the yi implies ker T = {0}.

Conversely, if ker T = {0}, consider an equation ∑_{i=1}^{n} αiyi = 0 involving the yi. Set v = ∑_{i=1}^{n} αivi. Our assumption forces v ∈ ker T, so v = 0 by hypothesis. Thus ∑_{i=1}^{n} αivi = 0 and, since the vi are linearly independent, we deduce αi = 0 for all i. Hence {y1, y2, . . . , yn} is linearly independent. □

Comment: Part (ii) can also be deduced (pretty much immediately) from part (i) using the Rank-Nullity Theorem.

Example 2A Define a linear transformation T : R^3 → R^3 in terms of the standard basis B = {e1, e2, e3} by

\[ T(e_1) = y_1 = \begin{pmatrix} 2 \\ 1 \\ -1 \end{pmatrix}, \quad T(e_2) = y_2 = \begin{pmatrix} -1 \\ 0 \\ 2 \end{pmatrix}, \quad T(e_3) = y_3 = \begin{pmatrix} 0 \\ -1 \\ 4 \end{pmatrix}. \]

Show that ker T = {0} and im T = R^3.

Solution: We check whether {y1, y2, y3} is linearly independent. Solve

αy1 + βy2 + γy3 = 0;

that is,

2α − β = 0
α − γ = 0
−α + 2β + 4γ = 0.

The second equation tells us that γ = α while the first says β = 2α. Substituting for β and γ in the third equation gives

−α + 4α + 4α = 7α = 0.

Hence α = 0 and consequently β = γ = 0.

This shows {y1, y2, y3} is linearly independent. Consequently, ker T = {0} by Proposition 2.9. The Rank-Nullity Theorem now says

dim im T = dim R^3 − dim ker T = 3 − 0 = 3.

Therefore im T = R^3 as it has the same dimension.


[Alternatively, since dim R3 = 3 and {y1, y2, y3} is linearly independent, this set must be a basis for R3 (see Corollary 1.27). Therefore, by Proposition 2.9(i),

im T = Span(y1, y2, y3) = R3,

once again.] □

Proposition 2.9(i) tells us that if {v1, v2, . . . , vn} is a basis for V and T : V → W is a linear transformation, then im T is spanned by the n vectors

yi = T (vi) for i = 1, 2, . . . , n.

There is, however, no expectation that this set is a basis for im T . (Indeed, it is only linearly independent when ker T = {0} by part (ii) of the proposition.) In such a situation, we apply Theorem 1.19 to tell us that there is a basis

B ⊆ {y1, y2, . . . , yn}

for im T and the method presented in Example 1.25 can be used to find this basis.

Example 2.10 Let T : R4 → R3 be the linear transformation defined in terms of the standard basis B = {e1, e2, e3, e4} by

\[
T(e_1) = y_1 = \begin{pmatrix} 2\\1\\3 \end{pmatrix}, \qquad
T(e_2) = y_2 = \begin{pmatrix} -1\\0\\1 \end{pmatrix}, \qquad
T(e_3) = y_3 = \begin{pmatrix} 0\\1\\5 \end{pmatrix}, \qquad
T(e_4) = y_4 = \begin{pmatrix} -5\\-2\\-5 \end{pmatrix}.
\]

Find a basis for the image of T .

Solution: This is the linear transformation considered in Example 2.8. We observed there that dim im T = rank T = 2. We also know from Proposition 2.9 that

im T = Span(y1, y2, y3, y4),

so we conclude that im T has a basis C containing 2 vectors and satisfying C ⊆ {y1, y2, y3, y4}. Note that

\[
\{y_1, y_2\} = \left\{ \begin{pmatrix} 2\\1\\3 \end{pmatrix}, \begin{pmatrix} -1\\0\\1 \end{pmatrix} \right\}
\]

is linearly independent. Indeed if

\[
\alpha \begin{pmatrix} 2\\1\\3 \end{pmatrix} + \beta \begin{pmatrix} -1\\0\\1 \end{pmatrix} = \begin{pmatrix} 0\\0\\0 \end{pmatrix}
\]


then we deduce straightaway α = 0 and then β = 0. We now have a linearly independent subset of im T of the right size to be a basis. Hence C = {y1, y2} is a basis for im T . □
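[Computational aside, not part of the original notes, again assuming Python with SymPy: columnspace() returns the pivot columns of the matrix whose columns are y1, y2, y3, y4, which recovers the basis {y1, y2} for im T found in this example.]

    from sympy import Matrix

    A = Matrix([[2, -1, 0, -5],
                [1,  0, 1, -2],
                [3,  1, 5, -5]])

    print(A.columnspace())   # a list of two column vectors: y1 = (2,1,3) and y2 = (-1,0,1)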

The matrix of a linear transformation

We have attempted to describe an arbitrary linear transformation T : V → W . Given a basis {v1, v2, . . . , vn} for V , we have observed that T is uniquely determined by specifying the images T (v1), T (v2), . . . , T (vn) of the basis vectors. If we are also given a basis for W , we can then express these image vectors as linear combinations of the basis vectors of W and hence completely specify them.

Definition 2.11 Let V and W be finite-dimensional vector spaces over the field F and let B = {v1, v2, . . . , vn} and C = {w1, w2, . . . , wm} be bases for V and W , respectively. If T : V → W is a linear transformation, let

\[
T(v_j) = \sum_{i=1}^{m} \alpha_{ij} w_i
\]

express the image of the vector vj under T as a linear combination of the basis C (for j = 1, 2, . . . , n). The m × n matrix [αij ] is called the matrix of T with respect to the bases B and C . We shall denote this by Mat(T ) or, when we wish to be explicit about the dependence upon the bases B and C , by MatB,C (T ).

In the special case of a linear transformation T : V → V , we shall speak of the matrix of T with respect to the basis B to mean MatB,B(T ).

Note that the entries of the jth column of the matrix of T are

\[
\begin{pmatrix} \alpha_{1j} \\ \alpha_{2j} \\ \vdots \\ \alpha_{mj} \end{pmatrix};
\]

i.e., the jth column specifies the image T (vj) of the jth basis vector by listing the coefficients when it is expressed as a linear combination of the vectors in C .

It should be noted that the matrix of a linear transformation does very much depend upon the choices of bases. Accordingly it is much safer to employ the notation MatB,C (T ) and retain reference to the bases involved.

What does the matrix of a linear transformation actually represent? This question could be answered at great length and can get as complicated and subtle as one wants. The short answer is that if V and W are m- and n-dimensional vector spaces over a field F , then they “look like” Fm and Fn (formally, are isomorphic to these spaces, see Problem Sheet II for details). Then T maps vectors from V into W in the same way that the matrix Mat(T ) maps vectors from Fm into Fn. (There is a technical formulation of what “in the same way” means here, but that goes way beyond the requirements of this course. It will result in the kernels of the two linear maps being of the same dimension, similarly for the images, etc.)
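[Computational aside, not part of the original notes: Definition 2.11 can be carried out mechanically, since each column of MatB,C (T ) is found by solving a linear system expressing T (vj) in the basis C . The function name and the small 2-dimensional example below are made up purely for illustration; Python with SymPy is assumed.]

    from sympy import Matrix

    def matrix_of(T_images, C):
        """Columns are the coordinates of each T(v_j) with respect to the basis C."""
        C_cols = Matrix.hstack(*C)                   # basis vectors of W as columns
        return Matrix.hstack(*[C_cols.solve(y) for y in T_images])

    # Illustration in R^2: T(v1) = (3,1), T(v2) = (0,2), basis C = {(1,1), (1,-1)}.
    images = [Matrix([3, 1]), Matrix([0, 2])]
    C = [Matrix([1, 1]), Matrix([1, -1])]
    print(matrix_of(images, C))   # Matrix([[2, 1], [1, -1]])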

Example 2.12 Define a linear transformation T : R4 → R4 by the following formula:

\[
T \begin{pmatrix} x\\y\\z\\t \end{pmatrix} = \begin{pmatrix} x+4y \\ y \\ 2z+t \\ z+2t \end{pmatrix}.
\]

Let B = {e1, e2, e3, e4} denote the standard basis for R4 and let C be the basis

\[
C = \{v_1, v_2, v_3, v_4\} = \left\{ \begin{pmatrix} 2\\0\\2\\0 \end{pmatrix}, \begin{pmatrix} 0\\1\\-1\\0 \end{pmatrix}, \begin{pmatrix} 0\\0\\1\\0 \end{pmatrix}, \begin{pmatrix} 3\\0\\0\\1 \end{pmatrix} \right\}.
\]

Determine the matrices MatB,B(T ), MatC ,B(T ) and MatC ,C (T ).

Solution: We calculate

\[
T(e_1) = T\begin{pmatrix} 1\\0\\0\\0 \end{pmatrix} = \begin{pmatrix} 1\\0\\0\\0 \end{pmatrix} = e_1, \qquad
T(e_2) = T\begin{pmatrix} 0\\1\\0\\0 \end{pmatrix} = \begin{pmatrix} 4\\1\\0\\0 \end{pmatrix} = 4e_1 + e_2,
\]
\[
T(e_3) = T\begin{pmatrix} 0\\0\\1\\0 \end{pmatrix} = \begin{pmatrix} 0\\0\\2\\1 \end{pmatrix} = 2e_3 + e_4, \qquad
T(e_4) = T\begin{pmatrix} 0\\0\\0\\1 \end{pmatrix} = \begin{pmatrix} 0\\0\\1\\2 \end{pmatrix} = e_3 + 2e_4.
\]


So the matrix of T with respect to the basis B is

\[
\mathrm{Mat}_{B,B}(T) = \begin{pmatrix} 1 & 4 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 1 \\ 0 & 0 & 1 & 2 \end{pmatrix}.
\]

[We leave it as an exercise for the reader to check that C is indeed a basis for R4. Do this by showing it is linearly independent, i.e., the only solution to

\[
\begin{pmatrix} 2 & 0 & 0 & 3 \\ 0 & 1 & 0 & 0 \\ 2 & -1 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} \alpha\\\beta\\\gamma\\\delta \end{pmatrix} = \begin{pmatrix} 0\\0\\0\\0 \end{pmatrix}
\]

is α = β = γ = δ = 0.]

We shall now calculate the matrices MatC ,B(T ) and MatC ,C (T ).

\[
T(v_1) = T\begin{pmatrix} 2\\0\\2\\0 \end{pmatrix} = \begin{pmatrix} 2\\0\\4\\2 \end{pmatrix} = 2e_1 + 4e_3 + 2e_4, \qquad
T(v_2) = T\begin{pmatrix} 0\\1\\-1\\0 \end{pmatrix} = \begin{pmatrix} 4\\1\\-2\\-1 \end{pmatrix} = 4e_1 + e_2 - 2e_3 - e_4,
\]
\[
T(v_3) = T\begin{pmatrix} 0\\0\\1\\0 \end{pmatrix} = \begin{pmatrix} 0\\0\\2\\1 \end{pmatrix} = 2e_3 + e_4, \qquad
T(v_4) = T\begin{pmatrix} 3\\0\\0\\1 \end{pmatrix} = \begin{pmatrix} 3\\0\\1\\2 \end{pmatrix} = 3e_1 + e_3 + 2e_4.
\]

Hence

\[
\mathrm{Mat}_{C,B}(T) = \begin{pmatrix} 2 & 4 & 0 & 3 \\ 0 & 1 & 0 & 0 \\ 4 & -2 & 2 & 1 \\ 2 & -1 & 1 & 2 \end{pmatrix}.
\]

To find MatC ,C (T ), we need to express each T (vj) in terms of the basis C .

\[
T(v_1) = \begin{pmatrix} 2\\0\\4\\2 \end{pmatrix}
= -2\begin{pmatrix} 2\\0\\2\\0 \end{pmatrix} + 8\begin{pmatrix} 0\\0\\1\\0 \end{pmatrix} + 2\begin{pmatrix} 3\\0\\0\\1 \end{pmatrix}
= -2v_1 + 8v_3 + 2v_4,
\]
\[
T(v_2) = \begin{pmatrix} 4\\1\\-2\\-1 \end{pmatrix}
= \tfrac{7}{2}\begin{pmatrix} 2\\0\\2\\0 \end{pmatrix} + \begin{pmatrix} 0\\1\\-1\\0 \end{pmatrix} - 8\begin{pmatrix} 0\\0\\1\\0 \end{pmatrix} - \begin{pmatrix} 3\\0\\0\\1 \end{pmatrix}
= \tfrac{7}{2}v_1 + v_2 - 8v_3 - v_4,
\]
\[
T(v_3) = \begin{pmatrix} 0\\0\\2\\1 \end{pmatrix}
= -\tfrac{3}{2}\begin{pmatrix} 2\\0\\2\\0 \end{pmatrix} + 5\begin{pmatrix} 0\\0\\1\\0 \end{pmatrix} + \begin{pmatrix} 3\\0\\0\\1 \end{pmatrix}
= -\tfrac{3}{2}v_1 + 5v_3 + v_4,
\]
\[
T(v_4) = \begin{pmatrix} 3\\0\\1\\2 \end{pmatrix}
= -\tfrac{3}{2}\begin{pmatrix} 2\\0\\2\\0 \end{pmatrix} + 4\begin{pmatrix} 0\\0\\1\\0 \end{pmatrix} + 2\begin{pmatrix} 3\\0\\0\\1 \end{pmatrix}
= -\tfrac{3}{2}v_1 + 4v_3 + 2v_4.
\]

Hence

\[
\mathrm{Mat}_{C,C}(T) = \begin{pmatrix} -2 & \tfrac{7}{2} & -\tfrac{3}{2} & -\tfrac{3}{2} \\ 0 & 1 & 0 & 0 \\ 8 & -8 & 5 & 4 \\ 2 & -1 & 1 & 2 \end{pmatrix}.
\]
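[Computational aside, not part of the original notes: anticipating the change of basis formula proved in the next subsection (Theorem 2.13), MatC ,C (T ) can also be recovered as P−1AP , where A = MatB,B(T ) and the columns of P are v1, v2, v3, v4. A quick check, assuming Python with SymPy:]

    from sympy import Matrix

    A = Matrix([[1, 4, 0, 0],
                [0, 1, 0, 0],
                [0, 0, 2, 1],
                [0, 0, 1, 2]])        # Mat_{B,B}(T)
    P = Matrix([[2, 0, 0, 3],
                [0, 1, 0, 0],
                [2, -1, 1, 0],
                [0, 0, 0, 1]])        # columns are v1, v2, v3, v4

    print(P.inv() * A * P)            # reproduces Mat_{C,C}(T), including the entries 7/2 and -3/2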

Change of basis

Suppose we are given two bases B and C for the same vector space V . We shall now describe how MatB,B(T ) and MatC ,C (T ) are related for a linear transformation T : V → V . (A similar description can be given for a linear transformation V → W with two bases B, B′ for V and two bases C , C ′ for W . This would be more complicated, but essentially the same ideas apply.) [The following discussion also appeared in MT2001. Only a brief summary will be presented during lectures.]

Let B = {v1, v2, . . . , vn} and C = {w1, w2, . . . , wn} be our two bases for V . Since they are both bases, we can write a vector in one as a linear combination of the vectors in the other and vice versa. Say

\[
w_j = \sum_{k=1}^{n} \lambda_{kj} v_k \tag{2.2}
\]


and

\[
v_\ell = \sum_{i=1}^{n} \mu_{i\ell} w_i \tag{2.3}
\]

for some scalars λkj , µiℓ ∈ F . Let P = [λij ] be the matrix whose entries are the coefficients from (2.2). We call P the change of basis matrix from B to C . Note that we write the coefficients appearing when wj is expressed in terms of the basis B down the jth column of P . Similarly Q = [µij ] is the change of basis matrix from C to B.

Let us substitute (2.2) into (2.3):

\[
v_\ell = \sum_{i=1}^{n} \mu_{i\ell} \sum_{k=1}^{n} \lambda_{ki} v_k = \sum_{k=1}^{n} \Bigl( \sum_{i=1}^{n} \lambda_{ki} \mu_{i\ell} \Bigr) v_k.
\]

This must be the unique way of writing vℓ as a linear combination of the vectors in B = {v1, v2, . . . , vn}. Thus

\[
\sum_{i=1}^{n} \lambda_{ki} \mu_{i\ell} = \delta_{k\ell} = \begin{cases} 1 & \text{if } k = \ell \\ 0 & \text{if } k \neq \ell. \end{cases}
\]

(This δkℓ is called the Kronecker delta.) The left-hand side is the formula for matrix multiplication, so

PQ = I,

the n × n identity matrix. Similarly, substituting (2.3) into (2.2) gives

QP = I.

We conclude that Q = P−1.

Now suppose that T : V → V is a linear transformation whose matrix is MatB,B(T ) = A = [αij ]. This means

\[
T(v_j) = \sum_{i=1}^{n} \alpha_{ij} v_i \qquad \text{for } j = 1, 2, \dots, n. \tag{2.4}
\]

To find MatC ,C (T ), apply T to (2.2):

\[
\begin{aligned}
T(w_j) = T\Bigl( \sum_{k=1}^{n} \lambda_{kj} v_k \Bigr) &= \sum_{k=1}^{n} \lambda_{kj} T(v_k) \\
&= \sum_{k=1}^{n} \lambda_{kj} \sum_{\ell=1}^{n} \alpha_{\ell k} v_\ell && \text{(from (2.4))} \\
&= \sum_{\ell=1}^{n} \sum_{k=1}^{n} \alpha_{\ell k} \lambda_{kj} \sum_{i=1}^{n} \mu_{i\ell} w_i && \text{(from (2.3))} \\
&= \sum_{i=1}^{n} \Bigl( \sum_{\ell=1}^{n} \sum_{k=1}^{n} \mu_{i\ell} \alpha_{\ell k} \lambda_{kj} \Bigr) w_i.
\end{aligned}
\]

Hence MatC ,C (T ) = B = [βij ] where

\[
\beta_{ij} = \sum_{\ell=1}^{n} \sum_{k=1}^{n} \mu_{i\ell} \alpha_{\ell k} \lambda_{kj};
\]

that is,

B = QAP = P−1AP.

We have proved:

Theorem 2.13 Let V be a vector space of dimension n over a field F and let T : V → V be a linear transformation. Let B = {v1, v2, . . . , vn} and C = {w1, w2, . . . , wn} be bases for V and let A and B be the matrices of T with respect to B and C , respectively. Then there is an invertible matrix P such that

B = P−1AP.

Specifically, the (i, j)th entry of P is the coefficient of vi when wj is expressed as a linear combination of the basis vectors in B. □

Let us illustrate what we have just done with an example. This happens to be the first part of Question 1 on the January 2005 exam paper. It features a 2-dimensional vector space, principally chosen because calculating the inverse of a 2 × 2 matrix is much easier than doing so with one of larger dimension. However, for larger dimensions exactly the same method should be used.

Example 2.14 Let V be a 2-dimensional vector space over R with basis B = {v1, v2}. Let

w1 = 3v1 − 5v2, w2 = −v1 + 2v2 (2.5)

and C = {w1, w2}. Define the linear transformation T : V → V by

T (v1) = 16v1 − 30v2

T (v2) = 9v1 − 17v2.

Find the matrix MatC ,C (T ).


Solution: The formula for T tells us that the matrix of T in terms of the basis B is

\[
A = \mathrm{Mat}_{B,B}(T) = \begin{pmatrix} 16 & 9 \\ -30 & -17 \end{pmatrix}.
\]

The formula (2.5) expresses the wj in terms of the vi. Hence our change of basis matrix is

\[
P = \begin{pmatrix} 3 & -1 \\ -5 & 2 \end{pmatrix}.
\]

Then

det P = 3 × 2 − (−1 × −5) = 6 − 5 = 1,

so

\[
P^{-1} = \frac{1}{\det P}\begin{pmatrix} 2 & 1 \\ 5 & 3 \end{pmatrix} = \begin{pmatrix} 2 & 1 \\ 5 & 3 \end{pmatrix}.
\]

So

\[
\mathrm{Mat}_{C,C}(T) = P^{-1}AP
= \begin{pmatrix} 2 & 1 \\ 5 & 3 \end{pmatrix}\begin{pmatrix} 16 & 9 \\ -30 & -17 \end{pmatrix}\begin{pmatrix} 3 & -1 \\ -5 & 2 \end{pmatrix}
= \begin{pmatrix} 2 & 1 \\ -10 & -6 \end{pmatrix}\begin{pmatrix} 3 & -1 \\ -5 & 2 \end{pmatrix}
= \begin{pmatrix} 1 & 0 \\ 0 & -2 \end{pmatrix}.
\]

We have diagonalised our linear transformation T . We shall discuss this topic in more detail later in these notes.

As a check, observe

\[
T(w_2) = T(-v_1 + 2v_2) = -T(v_1) + 2T(v_2) = -(16v_1 - 30v_2) + 2(9v_1 - 17v_2) = 2v_1 - 4v_2 = -2(-v_1 + 2v_2) = -2w_2,
\]

and similarly for T (w1). □
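[Computational aside, not part of the original notes: Example 2.14 can be verified numerically, assuming Python with SymPy.]

    from sympy import Matrix

    A = Matrix([[16, 9], [-30, -17]])   # Mat_{B,B}(T)
    P = Matrix([[3, -1], [-5, 2]])      # columns: w1 and w2 written in the basis B

    print(P.inv() * A * P)              # Matrix([[1, 0], [0, -2]])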

Example 2B Let

\[
B = \left\{ \begin{pmatrix} 0\\1\\-1 \end{pmatrix}, \begin{pmatrix} 1\\0\\-1 \end{pmatrix}, \begin{pmatrix} 2\\-1\\0 \end{pmatrix} \right\}.
\]

(i) Show that B is a basis for R3.


(ii) Write down the change of basis matrix from the standard basis E = {e1, e2, e3} to B.

(iii) Let

\[
A = \begin{pmatrix} -2 & -2 & -3 \\ 1 & 1 & 2 \\ -1 & -2 & -2 \end{pmatrix}
\]

and view A as a linear transformation R3 → R3. Find the matrix of A with respect to the basis B.

Solution: (i) We first establish that B is linearly independent. Solve

\[
\alpha \begin{pmatrix} 0\\1\\-1 \end{pmatrix} + \beta \begin{pmatrix} 1\\0\\-1 \end{pmatrix} + \gamma \begin{pmatrix} 2\\-1\\0 \end{pmatrix} = \begin{pmatrix} 0\\0\\0 \end{pmatrix};
\]

that is,

β + 2γ = 0, α − γ = 0, −α − β = 0.

Thus γ = α and the first equation yields 2α + β = 0. Adding the third equation now gives α = 0 and hence β = γ = 0. This shows B is linearly independent and it is therefore a basis for R3 since dim R3 = 3 = |B|.

(ii) We write each vector in B in terms of the standard basis:

\[
\begin{pmatrix} 0\\1\\-1 \end{pmatrix} = e_2 - e_3, \qquad
\begin{pmatrix} 1\\0\\-1 \end{pmatrix} = e_1 - e_3, \qquad
\begin{pmatrix} 2\\-1\\0 \end{pmatrix} = 2e_1 - e_2,
\]

and write the coefficients appearing down the columns of the change of basis matrix:

\[
P = \begin{pmatrix} 0 & 1 & 2 \\ 1 & 0 & -1 \\ -1 & -1 & 0 \end{pmatrix}.
\]

(iii) Theorem 2.13 says MatB,B(A) = P−1AP (as the matrix of A with respect to the standard basis is A itself). We first calculate the inverse of P via the usual row operation method:

\[
\left(\begin{array}{ccc|ccc} 0 & 1 & 2 & 1 & 0 & 0 \\ 1 & 0 & -1 & 0 & 1 & 0 \\ -1 & -1 & 0 & 0 & 0 & 1 \end{array}\right)
\longrightarrow
\left(\begin{array}{ccc|ccc} 0 & 1 & 2 & 1 & 0 & 0 \\ 1 & 0 & -1 & 0 & 1 & 0 \\ 0 & -1 & -1 & 0 & 1 & 1 \end{array}\right)
\qquad r_3 \mapsto r_3 + r_2
\]
\[
\longrightarrow
\left(\begin{array}{ccc|ccc} 1 & 0 & -1 & 0 & 1 & 0 \\ 0 & 1 & 2 & 1 & 0 & 0 \\ 0 & -1 & -1 & 0 & 1 & 1 \end{array}\right)
\qquad r_1 \leftrightarrow r_2
\]
\[
\longrightarrow
\left(\begin{array}{ccc|ccc} 1 & 0 & -1 & 0 & 1 & 0 \\ 0 & 1 & 2 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 & 1 & 1 \end{array}\right)
\qquad r_3 \mapsto r_3 + r_2
\]
\[
\longrightarrow
\left(\begin{array}{ccc|ccc} 1 & 0 & 0 & 1 & 2 & 1 \\ 0 & 1 & 0 & -1 & -2 & -2 \\ 0 & 0 & 1 & 1 & 1 & 1 \end{array}\right)
\qquad \begin{array}{l} r_1 \mapsto r_1 + r_3 \\ r_2 \mapsto r_2 - 2r_3 \end{array}
\]

Hence

\[
P^{-1} = \begin{pmatrix} 1 & 2 & 1 \\ -1 & -2 & -2 \\ 1 & 1 & 1 \end{pmatrix}
\]

and so

\[
\mathrm{Mat}_{B,B}(A) = P^{-1}AP
= \begin{pmatrix} 1 & 2 & 1 \\ -1 & -2 & -2 \\ 1 & 1 & 1 \end{pmatrix}
\begin{pmatrix} -2 & -2 & -3 \\ 1 & 1 & 2 \\ -1 & -2 & -2 \end{pmatrix}
\begin{pmatrix} 0 & 1 & 2 \\ 1 & 0 & -1 \\ -1 & -1 & 0 \end{pmatrix}
= \begin{pmatrix} -1 & -2 & -1 \\ 2 & 4 & 3 \\ -2 & -3 & -3 \end{pmatrix}
\begin{pmatrix} 0 & 1 & 2 \\ 1 & 0 & -1 \\ -1 & -1 & 0 \end{pmatrix}
= \begin{pmatrix} -1 & 0 & 0 \\ 1 & -1 & 0 \\ 0 & 1 & -1 \end{pmatrix}.
\]
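[Computational aside, not part of the original notes: a quick check of Example 2B, assuming Python with SymPy.]

    from sympy import Matrix

    A = Matrix([[-2, -2, -3], [1, 1, 2], [-1, -2, -2]])
    P = Matrix([[0, 1, 2], [1, 0, -1], [-1, -1, 0]])   # columns: the vectors of B

    print(P.det())          # -1, non-zero, so B is a basis (part (i))
    print(P.inv() * A * P)  # Matrix([[-1, 0, 0], [1, -1, 0], [0, 1, -1]]) (part (iii))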


Section 3

Direct sums

Definition and basic properties

The following construction is extremely useful.

Definition 3.1 Let V be a vector space over a field F . We say that V is the direct sum of two subspaces U1 and U2, written V = U1 ⊕ U2, if every vector in V can be expressed uniquely in the form u1 + u2 where u1 ∈ U1 and u2 ∈ U2.

Proposition 3.2 Let V be a vector space and U1 and U2 be subspaces of V . Then V = U1 ⊕ U2 if and only if the following conditions hold:

(i) V = U1 + U2;

(ii) U1 ∩ U2 = {0}.

Comment: Many authors use these two conditions to define what is meant by a direct sum and then show it is equivalent to our “unique expression” definition.

Proof: By definition, U1 + U2 = { u1 + u2 | u1 ∈ U1, u2 ∈ U2 }, so certainly every vector in V can be expressed in the form u1 + u2 (where ui ∈ Ui) if and only if V = U1 + U2. We must show that condition (ii) corresponds to the uniqueness part.

So suppose V = U1 ⊕ U2. Let u ∈ U1 ∩ U2. Then we have u = u + 0 = 0 + u as two ways of expressing u as the sum of a vector in U1 and a vector in U2. The uniqueness condition forces u = 0, so U1 ∩ U2 = {0}.

Conversely, suppose U1 ∩ U2 = {0}. Suppose v = u1 + u2 = u′1 + u′2 are expressions for a vector v where u1, u′1 ∈ U1 and u2, u′2 ∈ U2. Then

u1 − u′1 = u′2 − u2 ∈ U1 ∩ U2,


so u1 − u′1 = u′2 − u2 = 0 and we deduce u1 = u′1 and u2 = u′2. Hence our expressions are unique, so (i) and (ii) together imply V = U1 ⊕ U2. □

Example 3.3 Let V = R3 and let

\[
U_1 = \operatorname{Span}\!\left( \begin{pmatrix} 1\\1\\1 \end{pmatrix}, \begin{pmatrix} 2\\1\\0 \end{pmatrix} \right)
\quad\text{and}\quad
U_2 = \operatorname{Span}\!\left( \begin{pmatrix} 0\\3\\1 \end{pmatrix} \right).
\]

Show that V = U1 ⊕ U2.

Solution: Let us solve

\[
\alpha \begin{pmatrix} 1\\1\\1 \end{pmatrix} + \beta \begin{pmatrix} 2\\1\\0 \end{pmatrix} + \gamma \begin{pmatrix} 0\\3\\1 \end{pmatrix} = \begin{pmatrix} 0\\0\\0 \end{pmatrix}.
\]

We find

α + 2β = α + β + 3γ = α + γ = 0.

Thus γ = −α, so the second equation gives β − 2α = 0; i.e., β = 2α. Hence 5α = 0, so α = 0, which implies β = γ = 0. Thus the three vectors

\[
\begin{pmatrix} 1\\1\\1 \end{pmatrix}, \qquad \begin{pmatrix} 2\\1\\0 \end{pmatrix}, \qquad \begin{pmatrix} 0\\3\\1 \end{pmatrix}
\]

are linearly independent and hence form a basis for R3. Therefore every vector in R3 can be expressed (uniquely) as

\[
\alpha \begin{pmatrix} 1\\1\\1 \end{pmatrix} + \beta \begin{pmatrix} 2\\1\\0 \end{pmatrix} + \gamma \begin{pmatrix} 0\\3\\1 \end{pmatrix} = u_1 + u_2 \in U_1 + U_2.
\]

So R3 = U1 + U2. If v ∈ U1 ∩ U2, then

\[
v = \alpha \begin{pmatrix} 1\\1\\1 \end{pmatrix} + \beta \begin{pmatrix} 2\\1\\0 \end{pmatrix} = \gamma \begin{pmatrix} 0\\3\\1 \end{pmatrix}
\]

for some α, β, γ ∈ R and we would have

\[
\alpha \begin{pmatrix} 1\\1\\1 \end{pmatrix} + \beta \begin{pmatrix} 2\\1\\0 \end{pmatrix} - \gamma \begin{pmatrix} 0\\3\\1 \end{pmatrix} = \begin{pmatrix} 0\\0\\0 \end{pmatrix}.
\]

Linear independence forces α = β = γ = 0. Hence v = 0, so U1 ∩ U2 = {0}. Thus R3 = U1 ⊕ U2. □
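[Computational aside, not part of the original notes: since dim U1 + dim U2 = 3 here, Proposition 3.2 reduces to checking that the three spanning vectors together form a basis of R3, which is a rank computation. Assuming Python with SymPy:]

    from sympy import Matrix

    spanning = Matrix([[1, 2, 0],
                       [1, 1, 3],
                       [1, 0, 1]])   # columns: the two spanning vectors of U1 and the one of U2

    print(spanning.rank())           # 3, so the three vectors are a basis and R^3 = U1 (+) U2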


The link between a basis for V and a direct sum decomposition V = U1 ⊕ U2 has now arisen. We formalise this in the following observation.

Proposition 3.4 Let V = U1 ⊕ U2 be a finite-dimensional vector space expressed as a direct sum of two subspaces. If B1 and B2 are bases for U1 and U2, respectively, then B1 ∪ B2 is a basis for V .

Proof: Let B1 = {u1, u2, . . . , um} and B2 = {v1, v2, . . . , vn}. If v ∈ V , then v = x + y where x ∈ U1 and y ∈ U2. Since B1 and B2 span U1 and U2, respectively, there exist scalars αi and βj such that

x = α1u1 + · · · + αmum and y = β1v1 + · · · + βnvn.

Then

v = x + y = α1u1 + · · · + αmum + β1v1 + · · · + βnvn

and it follows that B = {u1, u2, . . . , um, v1, v2, . . . , vn} spans V .

Now suppose

α1u1 + · · · + αmum + β1v1 + · · · + βnvn = 0

for some scalars αi, βi. Put

x = α1u1 + · · · + αmum ∈ U1 and y = β1v1 + · · · + βnvn ∈ U2.

Then x + y = 0 must be the unique decomposition of 0 produced by the direct sum V = U1 ⊕ U2; that is, it must be 0 + 0 = 0. Hence

α1u1 + · · · + αmum = x = 0 and β1v1 + · · · + βnvn = y = 0.

Linear independence of B1 and B2 now gives

α1 = · · · = αm = 0 and β1 = · · · = βn = 0.

Hence B = B1 ∪ B2 is linearly independent and therefore a basis for V . □

Corollary 3.5 If V = U1 ⊕ U2 is a finite-dimensional vector space expressed as a direct sum of two subspaces, then

dim V = dim U1 + dim U2.

Example 3.3 is in some sense typical of direct sums. To gain a visual understanding, the following picture illustrates the 3-dimensional space R3 as the direct sum of a 1-dimensional subspace U1 and a 2-dimensional subspace U2 (these being a line and a plane passing through the origin, respectively).


[Figure: coordinate axes x, y, z with origin 0, showing the line U1 and the plane U2 through the origin.]

Projection maps

Definition 3.6 Let V = U1 ⊕ U2 be a vector space expressed as a direct sum of two subspaces. The two projection maps P1 : V → V and P2 : V → V onto U1 and U2, respectively, corresponding to this decomposition are defined as follows:

if v ∈ V , express v uniquely as v = u1 + u2 where u1 ∈ U1 and u2 ∈ U2; then

P1(v) = u1 and P2(v) = u2.

Note that the uniqueness of expression guarantees that precisely one value is specified for P1(v) and one for P2(v). (If we only had V = U1 + U2, then we would have a choice as to which expression v = u1 + u2 to use and we would not have well-defined maps.)

Lemma 3.7 Let V = U1 ⊕ U2 be a direct sum of subspaces with projection maps P1 : V → V and P2 : V → V . Then

(i) P1 and P2 are linear transformations;

(ii) P1(u) = u for all u ∈ U1 and P1(w) = 0 for all w ∈ U2;

(iii) P2(u) = 0 for all u ∈ U1 and P2(w) = w for all w ∈ U2;

(iv) kerP1 = U2 and imP1 = U1;

(v) kerP2 = U1 and imP2 = U2.


Proof: We just deal with the parts relating to P1. Those for P2 are established by identical arguments. To simplify notation we shall discard the subscript and simply write P for the projection map onto U1 associated to the direct sum decomposition V = U1 ⊕ U2. This is defined by P (v) = u1 when v = u1 + u2 with u1 ∈ U1 and u2 ∈ U2.

(i) Let v, v′ ∈ V and write v = u1 + u2, v′ = u′1 + u′2 where u1, u′1 ∈ U1 and u2, u′2 ∈ U2. Then

v + v′ = (u1 + u′1) + (u2 + u′2)

and u1 + u′1 ∈ U1, u2 + u′2 ∈ U2. This must be the unique decomposition for v + v′, so

P (v + v′) = u1 + u′1 = P (v) + P (v′).

Equally if α ∈ F , then αv = αu1 + αu2 where αu1 ∈ U1, αu2 ∈ U2. Thus

P (αv) = αu1 = αP (v).

Hence P is a linear transformation.

(ii) If u ∈ U1, then u = u + 0 is the decomposition we use to calculate P , so P (u) = u. If w ∈ U2, then w = 0 + w is the required decomposition, so P (w) = 0.

(iv) For any vector v, P (v) is always the U1-part in the decomposition of v, so certainly im P ⊆ U1. On the other hand, if u ∈ U1, then part (ii) says u = P (u) ∈ im P . Hence im P = U1.

Part (ii) also says P (w) = 0 for all w ∈ U2, so U2 ⊆ ker P . On the other hand, if v = u1 + u2 lies in ker P , then P (v) = u1 = 0, so v = u2 ∈ U2. Hence ker P = U2. □

The major facts about projections are the following:

Proposition 3.8 Let P : V → V be a projection corresponding to some direct sum decomposition of the vector space V . Then

(i) P^2 = P ;

(ii) V = kerP ⊕ imP ;

(iii) I − P is also a projection;

(iv) V = kerP ⊕ ker(I − P ).

Here I : V → V denotes the identity transformation I : v ↦ v for v ∈ V .

Proof: As a projection map, P must be associated to a direct sum decomposition of V , so let us assume that V = U1 ⊕ U2 and that P = P1 is the corresponding projection onto the subspace U1 (i.e., that P denotes the same projection as in the previous proof).


(i) If v ∈ V , then P (v) ∈ U1, so by Lemma 3.7(ii),

P^2(v) = P (P (v)) = P (v).

Hence P^2 = P .

(ii) ker P = U2 and im P = U1, so

V = U1 ⊕ U2 = im P ⊕ ker P,

as required.

(iii) Let Q : V → V denote the projection onto U2. If v ∈ V , say v = u1 + u2 where u1 ∈ U1 and u2 ∈ U2, then

Q(v) = u2 = v − u1 = v − P (v) = (I − P )(v).

Hence I − P is the projection Q.

(iv) ker P = U2, while ker(I − P ) = ker Q = U1. Hence

V = U1 ⊕ U2 = ker(I − P ) ⊕ ker P.

We give an example to illustrate how projection maps depend on both summands in the direct sum decomposition.

Example 3.9 Let

\[
U_1 = \operatorname{Span}\!\left( \begin{pmatrix} 1\\0 \end{pmatrix} \right), \qquad
U_2 = \operatorname{Span}\!\left( \begin{pmatrix} 0\\1 \end{pmatrix} \right), \qquad
U_3 = \operatorname{Span}\!\left( \begin{pmatrix} 1\\1 \end{pmatrix} \right).
\]

Show that

R2 = U1 ⊕ U2 and R2 = U1 ⊕ U3,

and, if P : R2 → R2 is the projection onto U1 corresponding to the first decomposition and Q : R2 → R2 is the projection onto U1 corresponding to the second decomposition, that P ≠ Q.

Solution: If v = \begin{pmatrix} x \\ y \end{pmatrix} ∈ R2, then

\[
v = \begin{pmatrix} x\\y \end{pmatrix} = x\begin{pmatrix} 1\\0 \end{pmatrix} + y\begin{pmatrix} 0\\1 \end{pmatrix}
\quad\text{and}\quad
v = \begin{pmatrix} x\\y \end{pmatrix} = (x-y)\begin{pmatrix} 1\\0 \end{pmatrix} + y\begin{pmatrix} 1\\1 \end{pmatrix}.
\]

Hence R2 = U1 + U2 = U1 + U3. Moreover,

\[
U_1 = \left\{ \begin{pmatrix} x\\0 \end{pmatrix} \;\middle|\; x \in \mathbb{R} \right\}
\quad\text{and}\quad
U_2 = \left\{ \begin{pmatrix} 0\\y \end{pmatrix} \;\middle|\; y \in \mathbb{R} \right\},
\]


so U1 ∩ U2 = {0}. Therefore we do have a direct sum R2 = U1 ⊕ U2. Similarly, one can see U1 ∩ U3 = {0}, so the second sum is also direct.

We know by Lemma 3.7(ii) that

P (u) = Q(u) = u for all u ∈ U1,

but if we take v = \begin{pmatrix} 3 \\ 2 \end{pmatrix} ∈ R2, we obtain different values for P (v) and Q(v). Indeed

\[
\begin{pmatrix} 3\\2 \end{pmatrix} = \begin{pmatrix} 3\\0 \end{pmatrix} + \begin{pmatrix} 0\\2 \end{pmatrix}
\]

is the decomposition corresponding to R2 = U1 ⊕ U2, which yields

\[
P\begin{pmatrix} 3\\2 \end{pmatrix} = \begin{pmatrix} 3\\0 \end{pmatrix} \in U_1,
\]

while

\[
\begin{pmatrix} 3\\2 \end{pmatrix} = \begin{pmatrix} 1\\0 \end{pmatrix} + \begin{pmatrix} 2\\2 \end{pmatrix}
\]

is that corresponding to R2 = U1 ⊕ U3, which yields

\[
Q\begin{pmatrix} 3\\2 \end{pmatrix} = \begin{pmatrix} 1\\0 \end{pmatrix} \in U_1.
\]

Also note ker P = U2 ≠ ker Q = U3, which is more information indicating the difference between these two transformations. □

Example 3A Let V = R3 and U = Span(v1), where

\[
v_1 = \begin{pmatrix} 3\\-1\\2 \end{pmatrix}.
\]

(i) Find a subspace W such that V = U ⊕ W .

(ii) Let P : V → V be the associated projection onto W . Calculate P (u) where

\[
u = \begin{pmatrix} 4\\4\\4 \end{pmatrix}.
\]

Solution: (i) We first extend {v1} to a basis for R3. We claim that

\[
B = \left\{ \begin{pmatrix} 3\\-1\\2 \end{pmatrix}, \begin{pmatrix} 1\\0\\0 \end{pmatrix}, \begin{pmatrix} 0\\1\\0 \end{pmatrix} \right\}
\]


is a basis for R3. We solve

\[
\alpha \begin{pmatrix} 3\\-1\\2 \end{pmatrix} + \beta \begin{pmatrix} 1\\0\\0 \end{pmatrix} + \gamma \begin{pmatrix} 0\\1\\0 \end{pmatrix} = \begin{pmatrix} 0\\0\\0 \end{pmatrix};
\]

that is,

3α + β = −α + γ = 2α = 0.

Hence α = 0, so β = −3α = 0 and γ = α = 0. Thus B is linearly independent. Since dim V = 3 and |B| = 3, we conclude that B is a basis for R3.

Let W = Span(v2, v3) where

\[
v_2 = \begin{pmatrix} 1\\0\\0 \end{pmatrix} \quad\text{and}\quad v_3 = \begin{pmatrix} 0\\1\\0 \end{pmatrix}.
\]

Since B = {v1, v2, v3} is a basis for V , if v ∈ V , then there exist α1, α2, α3 ∈ R such that

v = (α1v1) + (α2v2 + α3v3) ∈ U + W.

Hence V = U + W . If v ∈ U ∩ W , then there exist α, β1, β2 ∈ R such that

v = αv1 = β1v2 + β2v3.

Therefore

αv1 + (−β1)v2 + (−β2)v3 = 0.

Since B is linearly independent, we conclude α = −β1 = −β2 = 0, so v = αv1 = 0. Thus U ∩ W = {0} and so

V = U ⊕ W.

(ii) We write u as a linear combination of the basis B. Inspection shows

\[
u = \begin{pmatrix} 4\\4\\4 \end{pmatrix} = 2\begin{pmatrix} 3\\-1\\2 \end{pmatrix} - 2\begin{pmatrix} 1\\0\\0 \end{pmatrix} + 6\begin{pmatrix} 0\\1\\0 \end{pmatrix} = \begin{pmatrix} 6\\-2\\4 \end{pmatrix} + \begin{pmatrix} -2\\6\\0 \end{pmatrix},
\]

where the first term belongs to U and the second to W . Hence

\[
P(u) = \begin{pmatrix} -2\\6\\0 \end{pmatrix}
\]

(since this is the W -component of u). □
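[Computational aside, not part of the original notes: the projection in Example 3A can be written down as a matrix. If Q has the basis B as its columns, then the projection onto W along U keeps the v2- and v3-coordinates and kills the v1-coordinate, so its matrix is Q diag(0, 1, 1) Q^{-1}. Assuming Python with SymPy:]

    from sympy import Matrix, diag

    Q = Matrix([[3, 1, 0],
                [-1, 0, 1],
                [2, 0, 0]])            # columns: v1, v2, v3

    P = Q * diag(0, 1, 1) * Q.inv()    # projection onto W = Span(v2, v3) along U = Span(v1)

    print(P * Matrix([4, 4, 4]))       # Matrix([[-2], [6], [0]]), as in part (ii)
    print(P * P == P)                  # True, illustrating Proposition 3.8(i)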


Direct sums of more summands

We briefly address the situation when V is expressed as a direct sum of more than two subspaces.

Definition 3.10 Let V be a vector space. We say that V is the direct sum of subspaces U1, U2, . . . , Uk, written V = U1 ⊕ U2 ⊕ · · · ⊕ Uk, if every vector in V can be uniquely expressed in the form u1 + u2 + · · · + uk where ui ∈ Ui for each i.

Again this can be translated into a condition involving sums and intersections, though the intersection condition is more complicated. We omit the proof.

Proposition 3.11 Let V be a vector space with subspaces U1, U2, . . . , Uk. Then V = U1 ⊕ U2 ⊕ · · · ⊕ Uk if and only if the following conditions hold:

(i) V = U1 + U2 + · · · + Uk;

(ii) Ui ∩ (U1 + · · · + Ui−1 + Ui+1 + · · · + Uk) = {0} for each i.

We shall exploit the potential of direct sums to produce useful bases for our vector spaces. The following adapts quite easily from Proposition 3.4:

Proposition 3.12 Let V = U1 ⊕ U2 ⊕ · · · ⊕ Uk be a direct sum of subspaces. If Bi is a basis for Ui for i = 1, 2, . . . , k, then B1 ∪ B2 ∪ · · · ∪ Bk is a basis for V .


Section 4

Diagonalisation of linear transformations

In this section, we seek to discuss the diagonalisation of a linear transformation; that is, to understand when a linear transformation can be represented by a diagonal matrix with respect to some basis. We first need to refer to eigenvectors and eigenvalues.

Eigenvectors and eigenvalues

Definition 4.1 Let V be a vector space over a field F and let T : V → V be a linear transformation. (This will be our setup throughout this section.) A non-zero vector v is an eigenvector for T with eigenvalue λ (where λ ∈ F ) if

T (v) = λv.

Note: The condition v ≠ 0 is important since T (0) = 0 = λ0 for every λ ∈ F , so considering 0 will never provide interesting information about our transformation T .

Note that T (v) = λv implies that T (v) − λv = 0. Consequently, we make the following definition:

Definition 4.2 Let V be a vector space over a field F , let T : V → V be a linear transformation, and let λ ∈ F . The eigenspace corresponding to the eigenvalue λ is the subspace

Eλ = ker(T − λI) = { v ∈ V | T (v) − λv = 0 } = { v ∈ V | T (v) = λv }.

Recall that I denotes the identity transformation v ↦ v.


Thus Eλ consists of all the eigenvectors of T with eigenvalue λ together with the zero vector 0. Note that T − λI is a linear transformation (since it is built from the linear transformations T and I), so Eλ is certainly a subspace of V .

From now on we assume that V is finite-dimensional over the field F .

To see how to find eigenvalues and eigenvectors, note that if v is an eigenvector, then

(T − λI)(v) = 0 or (λI − T )(v) = 0,

so v ∈ ker(λI − T ). Consequently, λI − T is not invertible (since ker(λI − T ) ≠ {0}). If A is the matrix of T with respect to some basis, then λI − A is the matrix of λI − T (where in the former I refers to the identity matrix). We now know that λI − A is not invertible, so det(λI − A) = 0.

Definition 4.3 Let T : V → V be a linear transformation of the finite-dimensional vector space V (over F ) and let A be the matrix of T with respect to some basis. The characteristic polynomial of T is

cT (x) = det(xI − A)

where x is an indeterminate variable.
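[Computational aside, not part of the original notes: characteristic polynomials are easy to compute by machine. The sketch below assumes Python with SymPy and uses the matrix that appears in Example 4.8 below.]

    from sympy import Matrix, symbols

    x = symbols('x')
    A = Matrix([[8, 6, 0], [-9, -7, 0], [3, 3, 2]])

    print(A.charpoly(x).as_expr().factor())      # (x - 2)**2*(x + 1)
    print((x*Matrix.eye(3) - A).det().factor())  # the same, computed directly as det(xI - A)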

We have established one half of the following lemma.

Lemma 4.4 Suppose that T : V → V is a linear transformation of the finite-dimensional vector space V over F . Then λ is an eigenvalue of T if and only if λ is a root of the characteristic polynomial of T .

Proof: We have shown that if λ is an eigenvalue of T , then λ is a root of cT (x) (in the above notation).

For the converse, first note that, for any linear map S : V → V , S is invertible if and only if ker S = {0}. (For if ker S = {0}, the Rank-Nullity Theorem says im S = V ; that is, S is surjective. If Sv = Sw for some v and w, then S(v − w) = 0 and the fact that the kernel is zero forces v = w. Hence, S is a bijection.)

Now if λ is a root of cT (x), then det(λI − A) = 0, so λI − A is not invertible. The previous observation then tells us Eλ = ker(λI − T ) ≠ {0} and then any non-zero vector in the eigenspace Eλ is an eigenvector for T with eigenvalue λ. □

Remarks

(i) Some authors view it as easier to calculate det(A − xI) rather than det(xI − A) and so define the characteristic polynomial slightly


differently. Since we are just multiplying every entry in the matrix by −1, the resulting determinant will merely differ by a factor of (−1)^n = ±1 (where n = dim V ) and the roots will be unchanged. The reason for defining cT (x) = det(xI − A) is that it ensures the highest degree term is always x^n (rather than −x^n in the case that n is odd). The latter will be relevant in later discussions.

(ii) On the face of it the characteristic polynomial depends on the choice of basis, since, as we have seen, changing basis changes the matrix of a linear transformation. In fact, we get the same polynomial no matter which basis we use.

Lemma 4.5 Let V be a finite-dimensional vector space over F and T : V → V be a linear transformation. The characteristic polynomial cT (x) is independent of the choice of basis for V .

Consequently, cT (x) depends only on T .

Proof: Let A and A′ be the matrices of T with respect to two different bases for V . Theorem 2.13 tells us that A′ = P−1AP for some invertible matrix P . Then

P−1(xI − A)P = xP−1IP − P−1AP = xI − A′,

so

\[
\det(xI - A') = \det\bigl(P^{-1}(xI - A)P\bigr) = \det P^{-1} \cdot \det(xI - A) \cdot \det P = (\det P)^{-1} \cdot \det(xI - A) \cdot \det P = \det(xI - A)
\]

(since multiplication in the field F is commutative — see condition (v) of Definition 1.1). Hence we get the same answer for the characteristic polynomial. □

Diagonalisability

Let us now move on to the diagonalisation of linear transformations.

Definition 4.6 (i) Let T : V → V be a linear transformation of a finite-dimensional vector space V . We say that T is diagonalisable if there is a basis with respect to which T is represented by a diagonal matrix.

(ii) A square matrix A is diagonalisable if there is an invertible matrix P such that P−1AP is diagonal.


Of course, in (ii) we are simply viewing the n × n matrix A as a linear transformation Fn → Fn, and forming P−1AP simply corresponds to choosing a (non-standard) basis for Fn with respect to which the transformation is represented by a diagonal matrix (see Theorem 2.13).

Proposition 4.7 Let V be a finite-dimensional vector space and T : V → V be a linear transformation. Then T is diagonalisable if and only if there is a basis for V consisting of eigenvectors for T .

Proof: If T is diagonalisable, there is a basis B = {v1, v2, . . . , vn} with respect to which T is represented by a diagonal matrix, say

\[
\mathrm{Mat}_{B,B}(T) = \begin{pmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_n \end{pmatrix}
\]

for some λ1, λ2, . . . , λn ∈ F . Then T (vi) = λivi for i = 1, 2, . . . , n, so each basis vector in B is an eigenvector.

Conversely, if each vector in a basis B is an eigenvector, then the matrix MatB,B(T ) is diagonal (with each diagonal entry being the corresponding eigenvalue). □

Example 4.8 Let V = R3 and consider the linear transformation T : V → V given by the matrix

\[
A = \begin{pmatrix} 8 & 6 & 0 \\ -9 & -7 & 0 \\ 3 & 3 & 2 \end{pmatrix}.
\]

Show that T is diagonalisable, find a matrix P such that D = P−1AP is diagonal and find D.

Solution: To say that T is given by the above matrix is to say that this matrix represents T with respect to the standard basis for R3 (i.e., that we obtain T by multiplying vectors on the left by A). We first calculate the characteristic polynomial:

\[
\begin{aligned}
\det(xI - A) = \det \begin{pmatrix} x-8 & -6 & 0 \\ 9 & x+7 & 0 \\ -3 & -3 & x-2 \end{pmatrix}
&= (x-2)\bigl((x-8)(x+7) + 6 \times 9\bigr) \\
&= (x-2)\bigl((x-8)(x+7) + 54\bigr) \\
&= (x-2)(x^2 - x - 2) \\
&= (x-2)(x+1)(x-2) \\
&= (x+1)(x-2)^2,
\end{aligned}
\]

so

cT (x) = (x + 1)(x − 2)^2

and the eigenvalues of T are −1 and 2. (We cannot yet guarantee we have enough eigenvectors to form a basis.)

We now need to go looking for eigenvectors. First we seek v ∈ R3 such that T (v) = −v; i.e., such that (T + I)(v) = 0. Thus we solve

\[
\begin{pmatrix} 9 & 6 & 0 \\ -9 & -6 & 0 \\ 3 & 3 & 3 \end{pmatrix} \begin{pmatrix} x\\y\\z \end{pmatrix} = \begin{pmatrix} 0\\0\\0 \end{pmatrix},
\]

so

9x + 6y = 0, 3x + 3y + 3z = 0.

So given arbitrary x, take y = −(3/2)x and z = −x − y. We have one degree of freedom (the choice of x) and we determine that

\[
E_{-1} = \ker(T + I) = \left\{ \begin{pmatrix} x \\ -\tfrac{3}{2}x \\ \tfrac{1}{2}x \end{pmatrix} \;\middle|\; x \in \mathbb{R} \right\} = \operatorname{Span}\!\left( \begin{pmatrix} 2\\-3\\1 \end{pmatrix} \right).
\]

Hence our eigenspace E−1 is one-dimensional and the vector

\[
\begin{pmatrix} 2\\-3\\1 \end{pmatrix}
\]

is a suitable eigenvector with eigenvalue −1.

Now we seek v ∈ R3 such that T (v) = 2v; i.e., (T − 2I)(v) = 0. We therefore solve

\[
\begin{pmatrix} 6 & 6 & 0 \\ -9 & -9 & 0 \\ 3 & 3 & 0 \end{pmatrix} \begin{pmatrix} x\\y\\z \end{pmatrix} = \begin{pmatrix} 0\\0\\0 \end{pmatrix},
\]

so

x + y = 0.

Hence our eigenspace is

\[
E_2 = \ker(T - 2I) = \left\{ \begin{pmatrix} x \\ -x \\ z \end{pmatrix} \;\middle|\; x, z \in \mathbb{R} \right\} = \operatorname{Span}\!\left( \begin{pmatrix} 1\\-1\\0 \end{pmatrix}, \begin{pmatrix} 0\\0\\1 \end{pmatrix} \right).
\]


Hence the eigenspace E2 is two-dimensional and we can find two linearly independent eigenvectors with eigenvalue 2, for example

\[
\begin{pmatrix} 1\\-1\\0 \end{pmatrix}, \qquad \begin{pmatrix} 0\\0\\1 \end{pmatrix}.
\]

We conclude that with respect to the basis

\[
B = \left\{ \begin{pmatrix} 2\\-3\\1 \end{pmatrix}, \begin{pmatrix} 1\\-1\\0 \end{pmatrix}, \begin{pmatrix} 0\\0\\1 \end{pmatrix} \right\}
\]

the matrix of T is

\[
D = \begin{pmatrix} -1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix}.
\]

In particular, T is diagonalisable. It remains to find P such that D = P−1AP , but Theorem 2.13 tells us we need simply take the matrix whose entries are the coefficients of the vectors in B when expressed in terms of the standard basis:

\[
P = \begin{pmatrix} 2 & 1 & 0 \\ -3 & -1 & 0 \\ 1 & 0 & 1 \end{pmatrix}
\]

since

\[
\begin{pmatrix} 2\\-3\\1 \end{pmatrix} = 2\begin{pmatrix} 1\\0\\0 \end{pmatrix} - 3\begin{pmatrix} 0\\1\\0 \end{pmatrix} + \begin{pmatrix} 0\\0\\1 \end{pmatrix} = 2e_1 - 3e_2 + e_3,
\]

so the entries of the first column are 2, −3 and 1, and similarly for the other columns.
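[Computational aside, not part of the original notes: a check of Example 4.8, assuming Python with SymPy.]

    from sympy import Matrix

    A = Matrix([[8, 6, 0], [-9, -7, 0], [3, 3, 2]])
    P = Matrix([[2, 1, 0], [-3, -1, 0], [1, 0, 1]])

    print(A.eigenvals())      # {-1: 1, 2: 2} (eigenvalue: algebraic multiplicity)
    print(P.inv() * A * P)    # Matrix([[-1, 0, 0], [0, 2, 0], [0, 0, 2]])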

Before continuing to develop the theory of diagonalisation, we give an example that illustrates how it can be applied in an applied mathematics setting.

Example 4.9 Solve the following system of differential equations involving differentiable functions x, y and z of one variable:

\[
\begin{aligned}
8\frac{dx}{dt} + 6\frac{dy}{dt} &= 2 - 2t - 2t^2 \\
-9\frac{dx}{dt} - 7\frac{dy}{dt} &= -2 + 2t + 3t^2 \\
3\frac{dx}{dt} + 3\frac{dy}{dt} + 2\frac{dz}{dt} &= 2 + 2t - t^2
\end{aligned}
\]


Solution: To simplify notation, we shall use a dash (′) to denote differentiation with respect to t. Define the vector-valued function v = v(t) by

\[
v = \begin{pmatrix} x\\y\\z \end{pmatrix}.
\]

Our system of differential equations then becomes

\[
Av' = \begin{pmatrix} 2 - 2t - 2t^2 \\ -2 + 2t + 3t^2 \\ 2 + 2t - t^2 \end{pmatrix}
\qquad\text{where}\qquad
A = \begin{pmatrix} 8 & 6 & 0 \\ -9 & -7 & 0 \\ 3 & 3 & 2 \end{pmatrix},
\]

which happens to be the matrix considered in Example 4.8. As a consequence of that example, we know that P−1AP = D, where

\[
P = \begin{pmatrix} 2 & 1 & 0 \\ -3 & -1 & 0 \\ 1 & 0 & 1 \end{pmatrix}
\quad\text{and}\quad
D = \begin{pmatrix} -1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix}.
\]

Multiply our matrix equation on the left by P−1 and define w = P−1v (so that v = Pw and v′ = Pw′). We then obtain

\[
P^{-1}APw' = P^{-1}Av' = P^{-1} \begin{pmatrix} 2 - 2t - 2t^2 \\ -2 + 2t + 3t^2 \\ 2 + 2t - t^2 \end{pmatrix}.
\]

The usual method [which will be omitted during lectures] finds the inverse of P :

\[
\left(\begin{array}{ccc|ccc} 2 & 1 & 0 & 1 & 0 & 0 \\ -3 & -1 & 0 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 & 0 & 1 \end{array}\right)
\longrightarrow
\left(\begin{array}{ccc|ccc} 0 & 1 & -2 & 1 & 0 & -2 \\ 0 & -1 & 3 & 0 & 1 & 3 \\ 1 & 0 & 1 & 0 & 0 & 1 \end{array}\right)
\qquad \begin{array}{l} r_1 \mapsto r_1 - 2r_3 \\ r_2 \mapsto r_2 + 3r_3 \end{array}
\]
\[
\longrightarrow
\left(\begin{array}{ccc|ccc} 0 & 1 & -2 & 1 & 0 & -2 \\ 0 & 0 & 1 & 1 & 1 & 1 \\ 1 & 0 & 1 & 0 & 0 & 1 \end{array}\right)
\qquad r_2 \mapsto r_2 + r_1
\]
\[
\longrightarrow
\left(\begin{array}{ccc|ccc} 0 & 1 & 0 & 3 & 2 & 0 \\ 0 & 0 & 1 & 1 & 1 & 1 \\ 1 & 0 & 0 & -1 & -1 & 0 \end{array}\right)
\qquad \begin{array}{l} r_1 \mapsto r_1 + 2r_2 \\ r_3 \mapsto r_3 - r_2 \end{array}
\]
\[
\longrightarrow
\left(\begin{array}{ccc|ccc} 1 & 0 & 0 & -1 & -1 & 0 \\ 0 & 1 & 0 & 3 & 2 & 0 \\ 0 & 0 & 1 & 1 & 1 & 1 \end{array}\right)
\]


(finally rearranging the rows). Hence

\[
P^{-1} = \begin{pmatrix} -1 & -1 & 0 \\ 3 & 2 & 0 \\ 1 & 1 & 1 \end{pmatrix}.
\]

Hence our equation becomes

\[
Dw' = \begin{pmatrix} -1 & -1 & 0 \\ 3 & 2 & 0 \\ 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} 2 - 2t - 2t^2 \\ -2 + 2t + 3t^2 \\ 2 + 2t - t^2 \end{pmatrix}
= \begin{pmatrix} -t^2 \\ 2 - 2t \\ 2 + 2t \end{pmatrix}.
\]

Let us write a, b and c for the three entries of w. So our equation becomes

\[
-\frac{da}{dt} = -t^2, \qquad 2\frac{db}{dt} = 2 - 2t, \qquad 2\frac{dc}{dt} = 2 + 2t;
\]

that is,

\[
\frac{da}{dt} = t^2, \qquad \frac{db}{dt} = 1 - t, \qquad \frac{dc}{dt} = 1 + t.
\]

Hence

\[
a = \tfrac{1}{3}t^3 + c_1, \qquad b = t - \tfrac{1}{2}t^2 + c_2, \qquad c = t + \tfrac{1}{2}t^2 + c_3
\]

for some constants c1, c2 and c3; that is,

\[
w = \begin{pmatrix} \tfrac{1}{3}t^3 \\ t - \tfrac{1}{2}t^2 \\ t + \tfrac{1}{2}t^2 \end{pmatrix} + c
\]

for some constant vector c. Therefore

\[
v = Pw = \begin{pmatrix} 2 & 1 & 0 \\ -3 & -1 & 0 \\ 1 & 0 & 1 \end{pmatrix} \left( \begin{pmatrix} \tfrac{1}{3}t^3 \\ t - \tfrac{1}{2}t^2 \\ t + \tfrac{1}{2}t^2 \end{pmatrix} + c \right)
= \begin{pmatrix} t - \tfrac{1}{2}t^2 + \tfrac{2}{3}t^3 \\ -t + \tfrac{1}{2}t^2 - t^3 \\ t + \tfrac{1}{2}t^2 + \tfrac{1}{3}t^3 \end{pmatrix} + k
\]

for some constant vector k. Hence our solution is

\[
\begin{aligned}
x &= \tfrac{2}{3}t^3 - \tfrac{1}{2}t^2 + t + k_1 \\
y &= -t^3 + \tfrac{1}{2}t^2 - t + k_2 \\
z &= \tfrac{1}{3}t^3 + \tfrac{1}{2}t^2 + t + k_3
\end{aligned}
\]

for some constants k1, k2 and k3.
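[Computational aside, not part of the original notes: the solution of Example 4.9 can be verified by substituting it back into the original system, assuming Python with SymPy.]

    from sympy import symbols, Matrix, Rational, diff, simplify

    t, k1, k2, k3 = symbols('t k1 k2 k3')

    x = Rational(2, 3)*t**3 - Rational(1, 2)*t**2 + t + k1
    y = -t**3 + Rational(1, 2)*t**2 - t + k2
    z = Rational(1, 3)*t**3 + Rational(1, 2)*t**2 + t + k3

    A = Matrix([[8, 6, 0], [-9, -7, 0], [3, 3, 2]])
    rhs = Matrix([2 - 2*t - 2*t**2, -2 + 2*t + 3*t**2, 2 + 2*t - t**2])
    v_dash = Matrix([diff(x, t), diff(y, t), diff(z, t)])

    print(simplify(A*v_dash - rhs))   # Matrix([[0], [0], [0]])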


Algebraic and geometric multiplicities

Suppose T : V → V is a diagonalisable linear transformation. Then there is a basis B for V with respect to which the matrix of T has the form

\[
\mathrm{Mat}_{B,B}(T) = A = \begin{pmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_n \end{pmatrix}
\]

for some λ1, λ2, . . . , λn ∈ F (possibly including repeats). The characteristic polynomial of T does not depend on the choice of basis (Lemma 4.5), so

\[
c_T(x) = \det(xI - A) = \det \begin{pmatrix} x-\lambda_1 & & & \\ & x-\lambda_2 & & \\ & & \ddots & \\ & & & x-\lambda_n \end{pmatrix}
= (x - \lambda_1)(x - \lambda_2) \cdots (x - \lambda_n).
\]

So:

Lemma 4.10 If the linear transformation T : V → V is diagonalisable, then the characteristic polynomial of T is a product of linear factors. □

Here, a linear polynomial is one of degree 1, so we mean that the characteristic polynomial cT (x) is a product of factors of the form αx + β. Of course, as the leading term of cT (x) is x^n, the linear factors can always be arranged to have the form x + β (β ∈ F ). Then −β would be a root of cT (x) and hence an eigenvalue of T (by Lemma 4.4).

Note:

(i) This lemma only gives a necessary condition for diagonalisability. We shall next meet an example where cT (x) is a product of linear factors but T is not diagonalisable.

(ii) The field C of complex numbers is algebraically closed, i.e., every polynomial factorises as a product of linear factors. So Lemma 4.10 gives no information in that case.

Example 4.11 Let T : R3 → R3 be the linear transformation given by the matrix

\[
B = \begin{pmatrix} 8 & 3 & 0 \\ -18 & -7 & 0 \\ -9 & -4 & 2 \end{pmatrix}.
\]

Determine whether T is diagonalisable.


Solution: Now

\[
\begin{aligned}
\det(xI - B) = \det \begin{pmatrix} x-8 & -3 & 0 \\ 18 & x+7 & 0 \\ 9 & 4 & x-2 \end{pmatrix}
&= (x-2)\bigl((x-8)(x+7) + 3 \times 18\bigr) \\
&= (x-2)\bigl((x-8)(x+7) + 54\bigr) \\
&= (x-2)(x^2 - x - 2) \\
&= (x+1)(x-2)^2,
\end{aligned}
\]

so

cT (x) = (x + 1)(x − 2)^2.

Again we have a product of linear factors, exactly as we did in Example 4.8. This time, however, issues arise when we seek to find eigenvectors with eigenvalue 2. We solve (T − 2I)(v) = 0; that is,

\[
\begin{pmatrix} 6 & 3 & 0 \\ -18 & -9 & 0 \\ -9 & -4 & 0 \end{pmatrix} \begin{pmatrix} x\\y\\z \end{pmatrix} = \begin{pmatrix} 0\\0\\0 \end{pmatrix},
\]

so

6x + 3y = −18x − 9y = −9x − 4y = 0;

that is,

2x + y = 0, 9x + 4y = 0.

The first equation gives y = −2x, and when we substitute in the second we obtain x = 0 and so y = 0. Hence our eigenspace corresponding to eigenvalue 2 is

\[
E_2 = \ker(T - 2I) = \left\{ \begin{pmatrix} 0\\0\\z \end{pmatrix} \;\middle|\; z \in \mathbb{R} \right\} = \operatorname{Span}\!\left( \begin{pmatrix} 0\\0\\1 \end{pmatrix} \right).
\]

Thus dim E2 = 1 and we cannot find two linearly independent eigenvectors with eigenvalue 2.

I shall omit the calculation, but only a single linearly independent eigenvector can be found with eigenvalue −1. We can therefore only find a linearly independent set of eigenvectors of size at most two. Hence there is no basis for R3 consisting of eigenvectors for T and T is not diagonalisable.
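[Computational aside, not part of the original notes: SymPy (assumed tooling) reports the same conclusion for Example 4.11.]

    from sympy import Matrix

    B = Matrix([[8, 3, 0], [-18, -7, 0], [-9, -4, 2]])

    print(B.is_diagonalizable())   # False
    # eigenvects() returns (eigenvalue, algebraic multiplicity, basis of the eigenspace):
    for val, alg_mult, vecs in B.eigenvects():
        print(val, alg_mult, len(vecs))   # -1 1 1  and  2 2 1: the geometric multiplicity of 2 is only 1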


The basic issue occurring in this example is that not as many linearly independent eigenvectors could be found as there were relevant linear factors in the characteristic polynomial. This relates to the so-called algebraic and geometric multiplicity of the eigenvalue.

Definition 4.12 Let V be a finite-dimensional vector space over the field F and let T : V → V be a linear transformation of V . Let λ ∈ F .

(i) The algebraic multiplicity of λ (as an eigenvalue of T ) is the largest power k such that (x − λ)^k is a factor of the characteristic polynomial cT (x).

(ii) The geometric multiplicity of λ is the dimension of the eigenspace Eλ corresponding to λ.

If λ is not an eigenvalue of T , then λ is not a root of the characteristic polynomial (by Lemma 4.4), so (x − λ) does not divide cT (x). Equally, there are no eigenvectors with eigenvalue λ in this case, so Eλ = ker(T − λI) = {0}. Hence the algebraic and geometric multiplicities of λ are both 0 when λ is not an eigenvalue. Consequently we shall not be at all interested in these two multiplicities in this case.

On the other hand, when λ is an eigenvalue, the same sort of argument shows that the algebraic and geometric multiplicities are both at least 1.

We shall shortly discuss how these two multiplicities are linked. We shall make use of the following important fact:

Proposition 4.13 Let T : V → V be a linear transformation of a vector space V . A set of eigenvectors of T corresponding to distinct eigenvalues is linearly independent.

Proof: Let A = {v1, v2, . . . , vk} be a set of eigenvectors of T . Let λi be the eigenvalue of vi and assume the λi are distinct. We proceed by induction on k.

Note that if k = 1, then A = {v1} consists of a single non-zero vector so is linearly independent.

So assume that k > 1 and that the result holds for smaller sets of eigenvectors for T . Suppose

α1v1 + α2v2 + · · · + αkvk = 0. (4.1)

Apply T :

α1T (v1) + α2T (v2) + · · · + αkT (vk) = 0,

so

α1λ1v1 + α2λ2v2 + · · · + αkλkvk = 0.


Multiply (4.1) by λk and subtract:

α1(λ1 − λk)v1 + α2(λ2 − λk)v2 + · · ·+ αk−1(λk−1 − λk)vk−1 = 0.

This is an expression of linear dependence involving the eigenvectors v1, v2, . . . , vk−1, so by induction

αi(λi − λk) = 0 for i = 1, 2, . . . , k − 1.

By assumption the eigenvalues λi are distinct, so λi − λk ≠ 0 for i = 1, 2, . . . , k − 1, and, dividing by this non-zero scalar, we deduce

αi = 0 for i = 1, 2, . . . , k − 1.

Equation (4.1) now yields αkvk = 0, which forces αk = 0 as the eigenvector vk is non-zero. This completes the induction step. □

Theorem 4.14 Let V be a finite-dimensional vector space over the field F and let T : V → V be a linear transformation of V .

(i) If the characteristic polynomial cT (x) is a product of linear factors (as always happens, for example, if F = C), then the sum of the algebraic multiplicities equals dim V .

(ii) Let λ ∈ F and let rλ be the algebraic multiplicity and nλ be the geometric multiplicity of λ. Then

nλ ≤ rλ.

(iii) The linear transformation T is diagonalisable if and only if cT (x) is a product of linear factors and nλ = rλ for all eigenvalues λ.

Proof: (i) Let n = dim V and write cT (x) as a product of linear factors

\[
c_T(x) = (x - \lambda_1)^{r_1} (x - \lambda_2)^{r_2} \cdots (x - \lambda_k)^{r_k}
\]

where λ1, λ2, . . . , λk are the distinct roots of cT (x) (i.e., the distinct eigenvalues of T ). Since cT (x) is the determinant of a specific n × n matrix, it is a polynomial of degree n, so

dim V = n = r1 + r2 + · · · + rk,

the sum of the algebraic multiplicities.

(ii) Let λ be an eigenvalue of T and let m = nλ, the dimension of the eigenspace Eλ = ker(T − λI). Choose a basis {v1, v2, . . . , vm} for Eλ and extend to a basis B = {v1, v2, . . . , vm, vm+1, . . . , vn} for V . Note then

T (vi) = λvi for i = 1, 2, . . . , m,


so the matrix of T with respect to B has the form

\[
A = \mathrm{Mat}_{B,B}(T) = \begin{pmatrix} \lambda I_m & * \\ 0 & * \end{pmatrix},
\]

written in block form, where Im denotes the m × m identity matrix and the asterisks denote arbitrary entries (coming from the images of vm+1, . . . , vn). Hence

\[
c_T(x) = \det(xI - A) = \det \begin{pmatrix} (x-\lambda)I_m & * \\ 0 & * \end{pmatrix} = (x - \lambda)^m p(x)
\]

for some polynomial p(x). Hence rλ ≥ m = nλ.

(iii) Suppose that

\[
c_T(x) = (x - \lambda_1)^{r_1} (x - \lambda_2)^{r_2} \cdots (x - \lambda_k)^{r_k}
\]

where λ1, λ2, . . . , λk are the distinct eigenvalues of T , so

r1 + r2 + · · · + rk = n = dim V

(by (i)). Let ni = dim Eλi be the geometric multiplicity of λi. We suppose that ni = ri. Choose a basis Bi = {vi1, vi2, . . . , vini} for each Eλi and let

B = B1 ∪ B2 ∪ · · · ∪ Bk = { vij | i = 1, 2, . . . , k; j = 1, 2, . . . , ni }.

Claim: B is linearly independent.

Suppose

\[
\sum_{\substack{1 \le i \le k \\ 1 \le j \le n_i}} \alpha_{ij} v_{ij} = 0.
\]


Let wi = \sum_{j=1}^{n_i} \alpha_{ij} v_{ij}. Then wi is a linear combination of vectors in the eigenspace Eλi, so wi ∈ Eλi. Now

w1 + w2 + · · · + wk = 0.

Proposition 4.13 says that eigenvectors for distinct eigenvalues are linearly independent, so this relation would be impossible if any of the wi were non-zero (the non-zero wi would be eigenvectors for distinct eigenvalues). Therefore

wi = 0 for i = 1, 2, . . . , k;

that is,

\[
\sum_{j=1}^{n_i} \alpha_{ij} v_{ij} = 0 \qquad \text{for } i = 1, 2, \dots, k.
\]

Since Bi is a basis for Eλi, it is linearly independent and we conclude that αij = 0 for all i and j. Hence B is a linearly independent set.

Now since ni = ri by assumption, we have

|B| = n1 + n2 + · · · + nk = n.

Hence B is a linearly independent set of size equal to the dimension of V . Therefore B is a basis for V and it consists of eigenvectors for T . Hence T is diagonalisable.

Conversely, suppose T is diagonalisable. We have already observed that cT (x) is a product of linear factors (Lemma 4.10). We may therefore maintain the notation of the first part of this proof. Since T is diagonalisable, there is a basis B for V consisting of eigenvectors for T . Let Bi = B ∩ Eλi; that is, Bi consists of those vectors from B that have eigenvalue λi. As every vector in B is an eigenvector,

B = B1 ∪ B2 ∪ · · · ∪ Bk.

As B is linearly independent, so is Bi and Theorem 1.23 tells us

|Bi| ≤ dim Eλi = ni.

Hence

n = |B| = |B1| + |B2| + · · · + |Bk| ≤ n1 + n2 + · · · + nk.

But ni ≤ ri and r1 + r2 + · · · + rk = n, so we deduce ni = ri for all i. This completes the proof of (iii). □

This theorem now explains what was actually going on in Example 4.11. It was an example where the characteristic polynomial splits into linear factors but the geometric multiplicity of one of the eigenvalues is smaller than its algebraic multiplicity.


Example 4A Let

\[
A = \begin{pmatrix} -1 & 2 & -1 \\ -4 & 5 & -2 \\ -4 & 3 & 0 \end{pmatrix}.
\]

Show that A is not diagonalisable.

Solution: The characteristic polynomial of A is

\[
\begin{aligned}
c_A(x) = \det(xI - A) &= \det \begin{pmatrix} x+1 & -2 & 1 \\ 4 & x-5 & 2 \\ 4 & -3 & x \end{pmatrix} \\
&= (x+1) \det \begin{pmatrix} x-5 & 2 \\ -3 & x \end{pmatrix} + 2 \det \begin{pmatrix} 4 & 2 \\ 4 & x \end{pmatrix} + \det \begin{pmatrix} 4 & x-5 \\ 4 & -3 \end{pmatrix} \\
&= (x+1)\bigl(x(x-5) + 6\bigr) + 2(4x - 8) + (-12 - 4x + 20) \\
&= (x+1)(x^2 - 5x + 6) + 8(x-2) - 4x + 8 \\
&= (x+1)(x-2)(x-3) + 8(x-2) - 4(x-2) \\
&= (x-2)\bigl((x+1)(x-3) + 8 - 4\bigr) \\
&= (x-2)(x^2 - 2x - 3 + 4) \\
&= (x-2)(x^2 - 2x + 1) \\
&= (x-2)(x-1)^2.
\end{aligned}
\]

In particular, the algebraic multiplicity of the eigenvalue 1 is 2.

We now determine the eigenspace for eigenvalue 1. We solve (A − I)v = 0; that is,

\[
\begin{pmatrix} -2 & 2 & -1 \\ -4 & 4 & -2 \\ -4 & 3 & -1 \end{pmatrix} \begin{pmatrix} x\\y\\z \end{pmatrix} = \begin{pmatrix} 0\\0\\0 \end{pmatrix}. \tag{4.2}
\]

We solve this by applying row operations:

\[
\left(\begin{array}{ccc|c} -2 & 2 & -1 & 0 \\ -4 & 4 & -2 & 0 \\ -4 & 3 & -1 & 0 \end{array}\right)
\longrightarrow
\left(\begin{array}{ccc|c} -2 & 2 & -1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & -1 & 1 & 0 \end{array}\right)
\qquad \begin{array}{l} r_2 \mapsto r_2 - 2r_1 \\ r_3 \mapsto r_3 - 2r_1 \end{array}
\]
\[
\longrightarrow
\left(\begin{array}{ccc|c} -2 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & -1 & 1 & 0 \end{array}\right)
\qquad r_1 \mapsto r_1 + 2r_3
\]

So Equation (4.2) is equivalent to

−2x + z = 0 = −y + z.


Hence z = 2x and y = z = 2x. Therefore the eigenspace is

\[
E_1 = \left\{ \begin{pmatrix} x\\2x\\2x \end{pmatrix} \;\middle|\; x \in \mathbb{R} \right\} = \operatorname{Span}\!\left( \begin{pmatrix} 1\\2\\2 \end{pmatrix} \right)
\]

and we conclude dim E1 = 1. Thus the geometric multiplicity of 1 is not equal to the algebraic multiplicity, so A is not diagonalisable. □

Minimum polynomial

To gain further information about diagonalisation of linear transformations, we shall introduce the concept of the minimum polynomial (also called the minimal polynomial).

Let V be a vector space over a field F of dimension n. Consider a linear transformation T : V → V . We know the following facts about linear transformations:

(i) the composition of two linear transformations is also a linear transformation; in particular, T^2, T^3, T^4, . . . are all linear transformations;

(ii) a scalar multiple of a linear transformation is a linear transformation;

(iii) the sum of two linear transformations is a linear transformation.

Consider a polynomial f(x) over the field F , say

f(x) = a0 + a1x + a2x^2 + · · · + akx^k.

The facts above ensure that we have a well-defined linear transformation

f(T ) = a0I + a1T + a2T^2 + · · · + akT^k.

This is what we shall mean by substituting the linear transformation T into the polynomial f(x).

Moreover, since dim V = n, the set of linear transformations V → V also forms a vector space which has dimension n^2 (see Problem Sheet II, Question 6). Now if we return to our linear transformation T : V → V , then let us consider the following collection of linear transformations:

I, T, T^2, T^3, . . . , T^{n^2}.

There are n^2 + 1 linear transformations listed, so they must form a linearly dependent set. Hence there exist scalars α0, α1, . . . , α_{n^2} ∈ F , not all zero, such that

α0I + α1T + α2T^2 + · · · + α_{n^2}T^{n^2} = 0


(the latter being the zero map 0 : v ↦ 0 for all v ∈ V ). Omitting zero coefficients and dividing by the last non-zero scalar αk yields an expression of the form

T^k + b_{k−1}T^{k−1} + · · · + b_2T^2 + b_1T + b_0I = 0

where bi = αi/αk for i = 0, 1, . . . , k − 1. Hence there exists a monic polynomial (that is, one whose leading coefficient is 1)

f(x) = x^k + b_{k−1}x^{k−1} + · · · + b_2x^2 + b_1x + b_0

such that

f(T ) = 0.

We make the following definition:

Definition 4.15 Let T : V → V be a linear transformation of a finite-dimensional vector space over the field F . The minimum polynomial mT (x) is the monic polynomial over F of smallest degree such that

mT (T ) = 0.

Note: Our definition of the characteristic polynomial as cT (x) = det(xI − A) ensures that cT (x) is always a monic polynomial.

We have observed that if V has dimension n and T : V → V is a linear transformation, then there certainly does exist some monic polynomial f(x) such that f(T ) = 0. Hence it makes sense to speak of a monic polynomial of smallest degree such that f(T ) = 0. However, if

f(x) = x^k + α_{k−1}x^{k−1} + · · · + α_1x + α_0 and g(x) = x^k + β_{k−1}x^{k−1} + · · · + β_1x + β_0

are two different polynomials of the same degree such that f(T ) = g(T ) = 0, then

h(x) = f(x) − g(x) = (α_{k−1} − β_{k−1})x^{k−1} + · · · + (α_1 − β_1)x + (α_0 − β_0)

is a non-zero polynomial of smaller degree satisfying h(T ) = 0, and some scalar multiple of h(x) is then monic. Hence there is a unique monic polynomial f(x) of smallest degree such that f(T ) = 0.

We have also observed that if V has dimension n and T : V → V , then there is a polynomial f(x) of degree at most n^2 such that f(T ) = 0. In fact, there is a major theorem that does considerably better:


Theorem 4.16 (Cayley–Hamilton Theorem) Let T : V → V be a linear transformation of a finite-dimensional vector space V . If cT (x) is the characteristic polynomial of T , then

cT (T ) = 0.

This is a difficult theorem to prove, so we omit the proof. One proof can be found in Blyth and Robertson's "Basic" book. We need to prove that when T is substituted into

cT (x) = det(xI − A) (where A = Mat(T ))

we produce the zero transformation. The main difficulty comes from the fact that x must be treated as a scalar when expanding the determinant and then T is substituted in instead of this scalar variable.
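[Computational aside, not part of the original notes: the Cayley–Hamilton Theorem is easy to illustrate on a specific matrix. Using the matrix of Example 4.8, whose characteristic polynomial is (x + 1)(x − 2)^2, and assuming Python with SymPy:]

    from sympy import Matrix, zeros

    A = Matrix([[8, 6, 0], [-9, -7, 0], [3, 3, 2]])
    I3 = Matrix.eye(3)

    print((A + I3) * (A - 2*I3)**2 == zeros(3, 3))   # True: c_A(A) = 0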

The upshot of the Cayley–Hamilton Theorem is to show that the minimum polynomial of T has degree at most the dimension of V and (as we shall see) to indicate close links between mT (x) and cT (x).

To establish the relevance of the minimum polynomial to diagonalisation, we will need some basic properties of polynomials.

Facts about polynomials: Let F be a field and recall F [x] denotes the set of polynomials with coefficients from F :

f(x) = a_nx^n + a_{n−1}x^{n−1} + · · · + a_1x + a_0 (where ai ∈ F ).

Then F [x] is an example of what is known as a Euclidean domain (see MT4517 Rings and Fields for full details). A summary of its main properties is:

• We can add, multiply and subtract polynomials;

• if we attempt to divide f(x) by g(x) (where g(x) ≠ 0), we obtain

f(x) = g(x)q(x) + r(x)

where either r(x) = 0 or the degree of r(x) satisfies deg r(x) < deg g(x) (i.e., we can perform long division with polynomials);

• when the remainder is 0, that is, when f(x) = g(x)q(x) for some polynomial q(x), we say that g(x) divides f(x);

• if f(x) and g(x) are non-zero polynomials, their greatest common divisor is the polynomial d(x) of largest degree dividing them both. It is uniquely determined up to multiplying by a scalar and can be expressed as

d(x) = a(x)f(x) + b(x)g(x)

for some polynomials a(x), b(x).


Those familiar with divisibility in the integers Z (particularly those who have attended MT1003) will recognise these facts as being standard properties of Z (which is also a standard example of a Euclidean domain).

Example 4.17 Find the remainder upon dividing x^3 + 2x^2 + 1 by x^2 + 1 in the Euclidean domain R[x].

Solution:

x^3 + 2x^2 + 1 = x(x^2 + 1) + 2x^2 − x + 1 = (x + 2)(x^2 + 1) − x − 1.

Here the degree of the remainder r(x) = −x − 1 is less than the degree of x^2 + 1, so we have our required form. The quotient is q(x) = x + 2 and the remainder is r(x) = −x − 1. □

Proposition 4.18 Let V be a finite-dimensional vector space over a field F and let T : V → V be a linear transformation. If f(x) is any polynomial (over F ) such that f(T ) = 0, then the minimum polynomial mT (x) divides f(x).

Proof: Attempt to divide f(x) by the minimum polynomial mT (x):

f(x) = mT (x)q(x) + r(x)

for some polynomials q(x) and r(x) with either r(x) = 0 or deg r(x) < deg mT (x). Substituting the transformation T for the variable x gives

0 = f(T ) = mT (T )q(T ) + r(T ) = r(T )

since mT (T ) = 0 by definition. Since mT has the smallest degree among non-zero polynomials which vanish when T is substituted, we conclude r(x) = 0. Hence

f(x) = mT (x)q(x);

that is, mT (x) divides f(x). □

Corollary 4.19 Suppose that T : V → V is a linear transformation of a finite-dimensional vector space V . Then the minimum polynomial mT (x) divides the characteristic polynomial cT (x).

Proof: This follows immediately from the Cayley–Hamilton Theorem and Proposition 4.18. □

This corollary gives one link between the minimum polynomial and the characteristic polynomial. The following gives even stronger information, since it says a linear factor (x − λ) occurs in one if and only if it occurs in the other.


Theorem 4.20 Let V be a finite-dimensional vector space over a field Fand let T : V → V be a linear transformation of V . Then the roots ofthe minimum polynomial mT (x) and the roots of the characteristic polyno-mial cT (x) coincide.

Recall that the roots of cT (x) are precisely the eigenvalues of T . Thusthis theorem has some significance in the context of diagonalisation.

Proof: Let λ be a root of mT (x), so (x− λ) is a factor of mT (x). It couldbe deduced from Corollary 4.19 (i.e., from the Cayley–Hamilton Theorem)that (x− λ) divides cT (x), but we will do this directly. Factorise

mT (x) = (x− λ)f(x)

for some non-zero polynomial f(x). Then deg f(x) < degmT (x), so f(T ) 6=0. Hence there exists a vector v ∈ V such that w = f(T )v 6= 0. NowmT (T ) = 0, so

0 = mT (T )v = (T − λI)f(T )v = (T − λI)w,

so T (w) = λw and we conclude that w is an eigenvector with eigenvalue λ.Hence λ is a root of cT (x) (by Lemma 4.4).

Conversely, suppose λ is a root of cT (x), so λ is an eigenvalue of T . Hencethere is an eigenvector v (note v 6= 0) with eigenvalue λ. Then T (v) = λv,

T 2(v) = T (T (v)) = T (λv) = λT (v) = λ2v,

and in general T i(v) = λiv for all i = 1, 2, . . . . Suppose

mT (x) = xk + αk−1xk−1 + · · ·+ α1x+ α0.

Then

0 = mT (T )v = (T k + αk−1Tk−1 + · · ·+ α1T + α0I)v

= T k(v) + αk−1Tk−1(v) + · · ·+ α1T (v) + α0v

= λkv + αk−1λk−1v + · · ·+ α1λv + α0v

= (λk + αk−1λk−1 + · · ·+ α1λ+ α0)v

= mT (λ)v.

Since v 6= 0, we conclude mT (λ) = 0; i.e., λ is a root of mT (x). �

To see the full link to diagonalisability, we finally prove:

Theorem 4.21 Let V be a finite-dimensional vector space over the field Fand let T : V → V be a linear transformation. Then T is diagonalisable ifand only if the minimum polynomial mT (x) is a product of distinct linearfactors.

75

Page 77: MRQ November27,2014martyn/3501/3501lecturenotes.pdf · something just for Euclidean vectors, we can obtain results that cover vectors, complex numbers, matrices, polynomials, differentiable

Proof: Suppose there is a basis B with respect to which T is representedby a diagonal matrix:

MatB,B(T ) = A =

λ1

. . .

λ1 0λ2

. . .

λ2

. . .

0 λk

. . .

λk

where the λi are the distinct eigenvalues. Then

A− λ1I =

0. . .

0λ2 − λ1

. . .

λk − λ1

(with all non-diagonal entries being 0) and similar expressions apply toA− λ2I, . . . , A− λkI. Hence

(A− λ1I)(A− λ2I) . . . (A− λkI) =

0 · · · 0...

...0 · · · 0

= 0,

so(T − λ1I)(T − λ2I) . . . (T − λkI) = 0

Thus mT (x) divides (x − λ1)(x − λ2) . . . (x − λk) by Proposition 4.18. (Infact, by Theorem 4.20, it equals this product.) Hence mT (x) is a product ofdistinct linear factors.

To prove the converse we make use of the following lemma which exploitsthe greatest common divisor of polynomials.

Lemma 4.22 Let T : V → V be a linear transformation of a vector spaceover the field F and let f(x) and g(x) be coprime polynomials over F . Then

ker f(T )g(T ) = ker f(T )⊕ ker g(T ).

76

Page 78: MRQ November27,2014martyn/3501/3501lecturenotes.pdf · something just for Euclidean vectors, we can obtain results that cover vectors, complex numbers, matrices, polynomials, differentiable

To say that f(x) and g(x) are coprime means that the largest degreepolynomial that divides both of them is a constant (i.e., has degree 0).

Proof: The greatest common divisor of f(x) and g(x) is a constant, sothere exist polynomials a(x) and b(x) over F such that

1 = a(x)f(x) + b(x)g(x).

HenceI = a(T )f(T ) + b(T )g(T ),

sov = a(T )f(T )v + b(T )g(T )v for v ∈ V . (4.3)

Now if v ∈ ker f(T )g(T ), then

g(T ) (a(T )f(T )v) = a(T )f(T )g(T )v = a(T )0 = 0,

so a(T )f(T )v ∈ ker g(T ). (We are here using the fact that if f(T ) and g(T )are polynomial expressions in the linear transformation T , then the lineartransformations f(T ) and g(T ) commute; that is, f(T )g(T ) = g(T )f(T ).)

Similarly

f(T ) (b(T )g(T )v) = b(T )f(T )g(T )v = b(T )0 = 0,

so b(T )g(T )v ∈ ker f(T ). Therefore

ker f(T )g(T ) = ker f(T ) + ker g(T ).

(Note that ker f(T ) ⊆ ker f(T )g(T ), since if v ∈ ker f(T ), then f(T )g(T )v =g(T )f(T )v = g(T )0 = 0, and similarly ker g(T ) ⊆ ker f(T )g(T ).)

If v ∈ ker f(T ) ∩ ker g(T ), then, from Equation 4.3,

v = a(T )f(T )v + b(T )g(T )v = a(T )0+ b(T )0 = 0.

Hence ker f(T ) ∩ ker g(T ) = {0} and we deduce

ker f(T )g(T ) = ker f(T )⊕ ker g(T ).

We now return and complete the proof of Theorem 4.21. Supposem(x) =(x−λ1)(x−λ2) . . . (x−λk) where the λi are distinct scalars. Now mT (T ) = 0and distinct linear polynomials are coprime (as only constants have smallerdegree), so

V = kermT (T )

= ker(T − λ1I)(T − λ2I) . . . (T − λkI)

= ker(T − λ1I)⊕ ker(T − λ2I)⊕ · · · ⊕ ker(T − λkI)

77

Page 79: MRQ November27,2014martyn/3501/3501lecturenotes.pdf · something just for Euclidean vectors, we can obtain results that cover vectors, complex numbers, matrices, polynomials, differentiable

= Eλ1⊕Eλ2

⊕ · · · ⊕ Eλk

by repeated use of 4.22. Pick a basis Bi for each eigenspace Eλi. Then

B = B1 ∪B2 ∪ · · · ∪Bk is a basis for V consisting of eigenvectors for T(using Proposition 3.12). We deduce that T is diagonalisable. �

Example 4.23 Let us return to our two earlier examples and discuss themin the context of Theorem 4.21.

In Examples 4.8 and 4.11, we defined linear transformations R3 → R3

given by the matrices

A =

8 6 0−9 −7 03 3 2

and B =

8 3 0−18 −7 0−9 −4 2

,

respectively. Determine the minimum polynomials of A and B.

Solution: We observed that the matrices have the same characteristicpolynomial

cA(x) = cB(x) = (x+ 1)(x − 2)2,

but A is diagonalisable while B is not. The minimum polynomial of eachdivides (x + 1)(x − 2)2 and certainly has (x + 1)(x − 2) as a factor (byTheorem 4.20). Now Theorem 4.21 tells us that mA(x) is a product ofdistinct linear factors, but mB(x) is not. Therefore

mA(x) = (x+ 1)(x− 2) and mB(x) = (x+ 1)(x− 2)2.

[Exercise: Verify (A+ I)(A− 2I) = 0 and (B + I)(B − 2I) 6= 0.]

Example 4.24 Consider the linear transformation R3 → R3 given by thematrix

D =

3 0 12 2 2−1 0 1

.

Calculate the characteristic polynomial of D, determine if D is diagonalis-able and calculate the minimum polynomial.

Solution: The characteristic polynomial is

cD(x) = det

x− 3 0 −1−2 x− 2 −21 0 x− 1

= (x− 3)(x − 2)(x− 1) + (x− 2)

78

Page 80: MRQ November27,2014martyn/3501/3501lecturenotes.pdf · something just for Euclidean vectors, we can obtain results that cover vectors, complex numbers, matrices, polynomials, differentiable

= (x− 2)(x2 − 4x+ 3 + 1)

= (x− 2)(x2 − 4x+ 4)

= (x− 2)3.

Therefore D is a diagonalisable only if mD(x) = x− 2. But

D − 2I =

1 0 12 0 2−1 0 −1

6=

0 0 00 0 00 0 0

,

so mD(x) 6= x− 2. Thus D is not diagonalisable. Indeed

(D − 2I)2 =

1 0 12 0 2−1 0 −1

1 0 12 0 2−1 0 −1

=

0 0 00 0 00 0 0

,

so we deduce mD(x) = (x− 2)2.

Example 4.25 Consider the linear transformation R3 → R3 given by thematrix

E =

−3 −4 −120 −11 −240 4 9

.

Calculate the characteristic polynomial of E, determine if E is diagonalisableand calculate its minimum polynomial.

Solution:

cE(x) = det

x+ 3 4 120 x+ 11 240 −4 x− 9

= (x+ 3) ((x+ 11)(x − 9) + 96)

= (x+ 3)(x2 + 2x− 3)

= (x+ 3)(x− 1)(x+ 3)

= (x− 1)(x+ 3)2.

So the eigenvalues of E are 1 and −3. Now E is diagonalisable only ifmE(x) = (x− 1)(x+ 3). We calculate

E − I =

−4 −4 −120 −12 −240 4 8

, E + 3I =

0 −4 −120 −8 −240 4 12

,

so

(E − I)(E + 3I) =

−4 −4 −120 −12 −240 4 8

0 −4 −120 −8 −240 4 12

=

0 0 00 0 00 0 0

.

79

Page 81: MRQ November27,2014martyn/3501/3501lecturenotes.pdf · something just for Euclidean vectors, we can obtain results that cover vectors, complex numbers, matrices, polynomials, differentiable

Hence mE(x) = (x− 1)(x+ 3) and E is diagonalisable.

Example 4B Let

A =

0 −2 −11 5 3−1 −2 0

.

Calculate the characteristic polynomial and the minimum polynomial of A.Hence determine whether A is diagonalisable.

Solution:

cA = det(xI −A)

= det

x 2 1−1 x− 5 −31 2 x

= xdet

(

x− 5 −32 x

)

− 2 det

(

−1 −31 x

)

+ det

(

−1 x− 51 2

)

= x(

x(x− 5) + 6)

− 2(−x+ 3) + (−2− x+ 5)

= x(x2 − 5x+ 6) + 2(x− 3)− x+ 3

= x(x− 3)(x − 2) + 2(x− 3)− (x− 3)

= (x− 3)(

x(x− 2) + 2− 1)

= (x− 3)(x2 − 2x+ 1)

= (x− 3)(x − 1)2.

Since the minimum polynomial divides cA(x) and has the same roots, wededuce

mA(x) = (x− 3)(x − 1) or mA(x) = (x− 3)(x− 1)2.

We calculate

(A− 3I)(A − I) =

−3 −2 −11 2 3−1 −2 −3

−1 −2 −11 4 3−1 −2 −1

=

2 0 −2−2 0 22 0 −2

6= 0.

Hence mA(x) 6= (x− 3)(x − 1). We conclude

mA(x) = (x− 3)(x − 1)2.

This is not a product of distinct linear factors, so A is not diagonalisable. �

80

Page 82: MRQ November27,2014martyn/3501/3501lecturenotes.pdf · something just for Euclidean vectors, we can obtain results that cover vectors, complex numbers, matrices, polynomials, differentiable

Section 5

Jordan normal form

In the previous section we discussed at great length the diagonalisation oflinear transformations. This is useful since it is much easier to work withdiagonal matrices than arbitrary matrices. However, as we saw, not everylinear transformation can be diagonalised. In this section, we discuss analternative which, at least in the case of vector spaces over C, can be usedfor any linear transformation or matrix.

Definition 5.1 A Jordan block is an n× n matrix of the form

Jn(λ) =

λ 1

λ 1 0. . .

. . .

0 . . . 1λ

for some positive integer n and some scalar λ.A linear transformation T : V → V (of a vector space V ) has Jordan

normal form A if there exists a basis for V with respect to which the matrixof T is

Mat(T ) = A =

Jn1(λ1) 0 · · · · · · 00 Jn2

(λ2) 0 · · · 0... 0

. . .. . .

......

.... . . 0

0 0 · · · 0 Jnk(λk)

for some positive integers n1, n2, . . . , nk and scalars λ1, λ2, . . . , λk. (Theoccurrences of 0 here indicate zero matrices of appropriate sizes.)

81

Page 83: MRQ November27,2014martyn/3501/3501lecturenotes.pdf · something just for Euclidean vectors, we can obtain results that cover vectors, complex numbers, matrices, polynomials, differentiable

Comment: Blyth & Robertson use the term “elementary Jordan matrix”for what we have called a Jordan block and use the term “Jordan blockmatrix” for something that is a hybrid between our two concepts above. Ibelieve the terminology above is most common.

Theorem 5.2 Let V be a finite-dimensional vector space and T : V → V bea linear transformation of V such that the characteristic polynomial cT (x)is a product of linear factors with eigenvalues λ1, λ2, . . . , λn, then thereexist a basis for V with respect to which Mat(T ) is in Jordan normal formwhere each Jordan block has the form Jm(λi) for some m and some i.

In particular, this theorem applies when our field is C, since every poly-nomial is a product of linear factors over C. When cT (x) is not a productof linear factors, Jordan normal form cannot be used. Instead, one usessomething called rational normal form, which I shall not address here.

Corollary 5.3 Let A be a square matrix over C. Then there exist aninvertible matrix P (over C) such that P−1AP is in Jordan normal form.

This corollary follows from Theorem 5.2 and Theorem 2.13 (which tellsus that change of basis corresponds to forming P−1AP ).

We shall not prove Theorem 5.2. It is reasonably hard to prove and ismost easily addressed by developing more advanced concepts and theory.Instead, we shall use this section to explain how to calculate the Jordannormal form associated to a linear transformation or matrix.

First consider a Jordan block

J = Jn(λ) =

λ 1

λ 1 0. . .

. . .

0 . . . 1λ

.

We shall first determine its characteristic polynomial and minimum polyno-mial.

The characteristic polynomial of J is

cJ (x) = det

x− λ −1x− λ −1 0

. . .. . .

0. . . −1

x− λ

82

Page 84: MRQ November27,2014martyn/3501/3501lecturenotes.pdf · something just for Euclidean vectors, we can obtain results that cover vectors, complex numbers, matrices, polynomials, differentiable

= (x− λ) det

x− λ −10. . .

. . .

. . . −10 x− λ

...

= (x− λ)n.

When we turn to calculating the minimum polynomial, we note thatmJ(x) divides cJ (x), somJ(x) = (x−λ)k for some value of k with 1 6 k 6 n.Our problem is to determine what k must be.

We note

J − λI =

0 1

0 1 0. . .

. . .

0 . . . 10

.

Let us now calculate successive powers of J − λI:

(J − λI)2 =

0 1

0 1 0. . .

. . .

0 . . . 10

0 1

0 1 0. . .

. . .

0 . . . 10

=

0 0 1 0 · · · · · · 00 0 0 1 0 · · · 0...

.... . .

. . .. . .

......

.... . . 1 0

......

. . . 10 0 · · · · · · · · · · · · 00 0 · · · · · · · · · · · · 0

.

83

Page 85: MRQ November27,2014martyn/3501/3501lecturenotes.pdf · something just for Euclidean vectors, we can obtain results that cover vectors, complex numbers, matrices, polynomials, differentiable

Repeatedly multiplying by J −λI successively moves the diagonal of 1s onelevel higher in the matrix. Thus

(J − λI)3 =

0 0 0 1 0 · · · · · · 00 0 0 0 1 0 · · · 0...

.... . .

. . .. . .

......

.... . . 1 0

......

. . . 10 0 · · · · · · · · · · · · · · · 00 0 · · · · · · · · · · · · · · · 00 0 · · · · · · · · · · · · · · · 0

.

Finally, we find

(J − λI)n−1 =

0 · · · 0 10 · · · 0 0...

......

0 · · · 0 0

and (J − λI)n =

0 · · · 0...

...0 · · · 0

.

So (J − λI)n = 0 but (J − λI)n−1 6= 0. Therefore

mJ(x) = (x− λ)n.

In particular, we see the characteristic and minimum polynomials of a Jordanblock coincide. We record these observations for future use:

Proposition 5.4 Let J = Jn(λ) be an n× n Jordan block. Then

(i) cJ(x) = (x− λ)n;

(ii) mJ(x) = (x− λ)n;

(iii) the eigenspace Eλ of J has dimension 1.

Proof: It remains to prove part (iii). To find the eigenspace Eλ, we solve(J − λI)(v) = 0. We have calculated J − λI above, so we solve

0 1

0 1 0. . .

. . .

0 . . . 10

x1x2x3...xn

=

000...0

;

84

Page 86: MRQ November27,2014martyn/3501/3501lecturenotes.pdf · something just for Euclidean vectors, we can obtain results that cover vectors, complex numbers, matrices, polynomials, differentiable

that is,

x2x3...xn0

=

00...00

.

Hence x2 = x3 = · · · = xn, while x1 may be arbitrary. Therefore

Eλ =

x00...0

x ∈ R

= Span

100...0

.

Hence dimEλ = 1, as claimed. �

We now use the information obtained in the previous proposition to tellus how to embark on solving the following general proposition.

Problem: Let V be a finite-dimensional vector space and let T : V → V bea linear transformation. If the characteristic polynomial cT (x) is a productof linear factors, find a basis B with respect to which T is in Jordan normalform and determine what this Jordan normal form is.

If B is the basis solving this problem, then

MatB,B(T ) = A =

Jn1(λ1)

Jn2(λ2)

0

0. . .

Jnk(λk)

,

where Jn1(λ1), Jn2

(λ2), . . . , Jnk(λk) are the Jordan blocks. When we calcu-

late the characteristic polynomial cT (x) using this matrix, each block Jni(λi)

contributes a factor of (x− λi)ni (see Proposition 5.4(i)). Collecting all the

factors corresponding to the same eigenvalue, we conclude:

Observation 5.5 The algebraic multiplicity of λ as an eigenvalue of Tequals the sum of the sizes of the Jordan blocks Jn(λ) (associated to λ)occurring in the Jordan normal form for T .

This means, of course, that the number of times that λ occurs on thediagonal in the Jordan normal form matrix A is precisely the algebraicmultiplicity rλ of λ.

85

Page 87: MRQ November27,2014martyn/3501/3501lecturenotes.pdf · something just for Euclidean vectors, we can obtain results that cover vectors, complex numbers, matrices, polynomials, differentiable

If particular, if rλ = 1, a single 1× 1 Jordan block occurs in A, namelyJ1(λ) = (λ). If rλ = 2, then either two 1 × 1 Jordan blocks occur or a2× 2 Jordan block J2(λ) occurs in A. Thus A either contains

(

λ 00 λ

)

or

(

λ 10 λ

)

.

Similar observations may be made for other small values of rλ, but thepossibilities grow more complicated as rλ increases.

To distinguish between these possibilities, we first make use of the min-imum polynomial. To ensure the block Jni

(λi) becomes 0 when we substi-tute A into the polynomial, we must have at least a factor (x − λi)

ni (seeProposition 5.4(ii)). Consequently:

Observation 5.6 If λ is an eigenvalue of T , then the power of (x − λ)occurring in the minimum polynomial mT (x) is (x − λ)m where m is thelargest size of a Jordan block associated to λ occurring in the Jordan normalform for T .

Observations 5.5 and 5.6 are enough to determine the Jordan normalform in small cases.

Example 5.7 Let V = R4 and let T : V → V be the linear transformationgiven by the matrix

B =

2 1 0 −30 2 0 44 5 −2 10 0 0 −2

.

Determine the Jordan normal form of T .

Solution: We first determine the characteristic polynomial of T :

cT (x) = det

x− 2 −1 0 30 x− 2 0 −4−4 −5 x+ 2 −10 0 0 x+ 2

= (x+ 2) det

x− 2 −1 00 x− 2 0−4 −5 x+ 2

= (x+ 2)2 det

(

x− 2 −10 x− 2

)

= (x− 2)2(x+ 2)2.

So the Jordan normal form contains either a single Jordan block J2(2) corre-sponding to eigenvalue 2 or two blocks J1(2) of size 1. Similar observations

86

Page 88: MRQ November27,2014martyn/3501/3501lecturenotes.pdf · something just for Euclidean vectors, we can obtain results that cover vectors, complex numbers, matrices, polynomials, differentiable

apply to the Jordan block(s) corresponding to the eigenvalue −2. To deter-mine which occurs, we consider the minimum polynomial.

We now know the minimum polynomial of T has the form mT (x) =(x − 2)i(x + 2)j where 1 6 i, j 6 2 by Corollary 4.19 and Theorem 4.20.Now

B − 2I =

0 1 0 −30 0 0 44 5 −4 10 0 0 −4

and B + 2I =

4 1 0 −30 4 0 44 5 0 10 0 0 0

,

so

(B − 2I)(B + 2I) =

0 4 0 40 0 0 00 4 0 40 0 0 0

6= 0

and

(B − 2I)2(B + 2I) =

0 1 0 −30 0 0 44 5 −4 10 0 0 −4

0 4 0 40 0 0 00 4 0 40 0 0 0

=

0 0 0 00 0 0 00 0 0 00 0 0 0

.

The first calculation shows mT (x) is not equal to the only possibility ofdegree 2, so we conclude from the second calculation that

mT (x) = (x− 2)2(x+ 2).

Hence at least one Jordan block J2(2) of size 2 occurs in the Jordan normalform of T , while all Jordan blocks corresponding to the eigenvalue −2 havesize 1.

We conclude the Jordan normal form of T is

J2(2) 0 00 J1(−2) 00 0 J1(−2)

=

2 1 0 00 2 0 00 0 −2 00 0 0 −2

.

Example 5.8 Let V = R4 and let T : V → V be the linear transformationgiven by the matrix

C =

3 0 1 −11 2 1 −1−1 0 1 10 0 0 2

.

87

Page 89: MRQ November27,2014martyn/3501/3501lecturenotes.pdf · something just for Euclidean vectors, we can obtain results that cover vectors, complex numbers, matrices, polynomials, differentiable

Determine the Jordan normal form of T .

Solution:

cT (x) = det

x− 3 0 −1 1−1 x− 2 −1 11 0 x− 1 −10 0 0 x− 2

= (x− 2) det

x− 3 0 −1−1 x− 2 −11 0 x− 1

= (x− 2)2 det

(

x− 3 −11 x− 1

)

= (x− 2)2(

(x− 3)(x− 1) + 1)

= (x− 2)2(x2 − 4x+ 3 + 1)

= (x− 2)2(x2 − 4x+ 4)

= (x− 2)4.

Now we calculate the minimum polynomial:

C − 2I =

1 0 1 −11 0 1 −1−1 0 −1 10 0 0 0

,

so

(C − 2I)2 =

1 0 1 −11 0 1 −1−1 0 −1 10 0 0 0

1 0 1 −11 0 1 −1−1 0 −1 10 0 0 0

=

0 0 0 00 0 0 00 0 0 00 0 0 0

.

Hence mT (x) = (x − 2)2. We now know the Jordan normal form for Tincludes at least one block J2(2) but we cannot tell whether the remainingblocks are a single block of size 2 or two blocks of size 1.

To actually determine which, we need to go beyond the characteristic andminimum polynomials, and consider the eigenspace E2. We shall describethis in general and return to complete the solution of this example later.

88

Page 90: MRQ November27,2014martyn/3501/3501lecturenotes.pdf · something just for Euclidean vectors, we can obtain results that cover vectors, complex numbers, matrices, polynomials, differentiable

Consider a linear transformation T : V → V with Jordan normal form A.Each block Jn(λ) occurring in A contributes one linearly independent eigen-vector to a basis for the eigenspace Eλ (see Proposition 5.4(iii)). Thus thenumber of blocks in A corresponding to a particular eigenvalue λ will equal

dimEλ = nλ,

the geometric multiplicity of λ. In summary:

Observation 5.9 The geometric multiplicity of λ as an eigenvalue of Tequals the number of Jordan blocks Jn(λ) occurring in the Jordan normalform for T .

Example 5.8 (cont.) Let us determine the eigenspace E2 for our trans-formation T with matrix C. We solve (T − 2I)(v) = 0; that is,

1 0 1 −11 0 1 −1−1 0 −1 10 0 0 0

xyzt

=

0000

.

This reduces to a single equation

x+ z − t = 0,

so

E2 =

xyz

x+ z

x, y, z ∈ R

= Span

1001

,

0100

,

0011

.

Hence n2 = dimE2 = 3. It follows that the Jordan normal for T containsthree Jordan blocks corresponding to the eigenvalue 2. Therefore the Jordannormal form of T is

J2(2) 0 00 J1(2) 00 0 J1(2)

=

2 1 0 00 2 0 00 0 2 00 0 0 2

.

Our three observations are enough to determine the Jordan normal formfor the linear transformations that will be encountered in this course. Theyare sufficient for small matrices, but will not solve the problem for all pos-sibilities. For example, they do not distinguish between the 7× 7 matrices

J3(λ) 0 00 J2(λ) 00 0 J2(λ)

and

J3(λ) 0 00 J3(λ) 00 0 J1(λ)

,

89

Page 91: MRQ November27,2014martyn/3501/3501lecturenotes.pdf · something just for Euclidean vectors, we can obtain results that cover vectors, complex numbers, matrices, polynomials, differentiable

which both have characteristic polynomial (x − λ)7, minimum polynomial(x − λ)3 and geometric multiplicity nλ = dimEλ = 3. To deal with suchpossible Jordan normal forms one needs to generalise Observation 5.9 toconsider the dimension of generalisations of eigenspaces:

dimker(T − λI)2, dimker(T − λI)3, . . . .

We leave the details to the interested and enthused student.

We finish this section by returning to the final part of the general prob-lem: finding a basis with respect to which a linear transformation is inJordan normal form.

Example 5.10 Let

C =

3 0 1 −11 2 1 −1−1 0 1 10 0 0 2

(the matrix from Example 5.8). Find an invertible matrix P such thatP−1CP is in Jordan normal form.

Solution: We have already established the Jordan normal form of thetransformation T : R4 → R4 with matrix C is

A =

J2(2) 0 00 J1(2) 00 0 J1(2)

=

2 1 0 00 2 0 00 0 2 00 0 0 2

.

Our problem is equivalent (by Theorem 2.13) to finding a basis B withrespect to which the matrix of T equals A. Thus B = {v1, v2, v3, v4} suchthat

T (v1) = 2v1, T (v2) = v1 + 2v2, T (v3) = 2v3, T (v4) = 2v4.

So we need to choose v1, v3 and v4 to lie in the eigenspace E2 (which wedetermined earlier).

On the face of it, the choice of v2 appears to be less straightforward: werequire (T − 2I)(v2) = v1, some non-zero vector in E2 and this indicates wealso probably do not have total freedom in the choice of v1. In Example 5.8,we calculated

E2 =

xyz

x+ z

x, y, z ∈ R

.

90

Page 92: MRQ November27,2014martyn/3501/3501lecturenotes.pdf · something just for Euclidean vectors, we can obtain results that cover vectors, complex numbers, matrices, polynomials, differentiable

Let us solve

(T − 2I)(v) =

1 0 1 −11 0 1 −1−1 0 −1 10 0 0 0

αβγδ

=

xyz

x+ z

.

We need to establish for which values of x, y and z this has a non-zerosolution (and in the process we determine possibilities for both v1 and v2).The above matrix equation implies

α+ γ − δ = x = y = −z

andx+ z = 0.

Any value of x will determine a possible solution, so let us choose x = 1.Then y = 1 and z = −1. Hence we shall take

v1 =

11−10

and then the equation (T − 2I)(v2) = v1 has non-zero solutions, namely

v2 =

αβγδ

where α+ γ − δ = 1.

There are many possible solutions, we shall take α = 1, β = γ = δ = 0 andhence

v2 =

1000

will be good enough.To find v3 and v4, we need two vectors from E2 which together with v1

form a basis for E2. We shall choose

v3 =

0100

and v4 =

0011

.

91

Page 93: MRQ November27,2014martyn/3501/3501lecturenotes.pdf · something just for Euclidean vectors, we can obtain results that cover vectors, complex numbers, matrices, polynomials, differentiable

Indeed, note that an arbitrary vector in E2 can be expressed as

xyz

x+ z

= x

11−10

+ (y − x)

0100

+ (x+ z)

0011

= xv1 + (y − x)v3 + (x+ z)v4,

so E2 = Span(v1, v3, v4).We now have our required basis

B =

11−10

,

1000

,

0100

,

0011

and the required change of basis matrix is

P =

1 1 0 01 0 1 0−1 0 0 10 0 0 1

.

[Exercise: Calculate P−1CP and verify it has the correct form.]

Example 5.11 Let T : R4 → R4 be the linear transformation given by thematrix

D =

−3 2 1

2−2

0 0 0 00 −3 −3 −30 0 0 0

.

Determine the Jordan normal form J of T and find an invertible matrix Psuch that P−1DP = J .

Solution:

cT (x) = det

x+ 3 −2 −1

22

0 x 0 00 3 x+ 3 30 0 0 x

= (x+ 3) det

x 0 03 x+ 3 30 0 x

= x(x+ 3) det

(

x 03 x+ 3

)

92

Page 94: MRQ November27,2014martyn/3501/3501lecturenotes.pdf · something just for Euclidean vectors, we can obtain results that cover vectors, complex numbers, matrices, polynomials, differentiable

= x2(x+ 3)2.

So the eigenvalues of T are 0 and −3. Then mT (x) = xi(x + 3)j where1 6 i, j 6 2. Since

D + 3I =

0 2 1

2−2

0 3 0 00 −3 0 −30 0 0 3

,

we calculate

D(D + 3I) =

0 −3

2−3

2−3

2

0 0 0 00 0 0 00 0 0 0

6= 0

and

D(D + 3I)2 =

0 −3

2−3

2−3

2

0 0 0 00 0 0 00 0 0 0

0 2 1

2−2

0 3 0 00 −3 0 −30 0 0 3

=

0 0 0 00 0 0 00 0 0 00 0 0 0

.

Hence mT (x) = x(x+ 3)2. Therefore the Jordan normal form of T is

J =

J1(0) 0 00 J1(0) 00 0 J2(−3)

=

0 0 0 00 0 0 00 0 −3 10 0 0 −3

.

We now seek a basis B = {v1, v2, v3, v4} with respect to which the matrixof T is J . Thus, we require v1, v2 ∈ E0, v3 ∈ E−3 and

T (v4) = v3 − 3v4.

We first solve T (v) = 0:

−3 2 1

2−2

0 0 0 00 −3 −3 −30 0 0 0

xyzt

=

0000

,

so

−3x+ 2y + 1

2z − 2t = 0

93

Page 95: MRQ November27,2014martyn/3501/3501lecturenotes.pdf · something just for Euclidean vectors, we can obtain results that cover vectors, complex numbers, matrices, polynomials, differentiable

−3y − 3z − 3t = 0.

Hence given arbitrary z, t ∈ R, we have y = −z − t and

x = 1

3(2y + 1

2z − 2t)

= 1

3(−3

2z − 4t)

= −1

2z − 4

3t.

So

E0 =

−1

2z − 4

3t

−z − tzt

z, t ∈ R

= Span

−1

2

−110

,

−4

3

−101

.

Take

v1 =

−1

2

−110

and v2 =

−4

3

−101

.

Now solve (T + 3I)(v) = 0:

0 2 1

2−2

0 3 0 00 −3 0 −30 0 0 3

zyzt

=

0000

,

so2y + 1

2z − 2t = 0

3y = 0−3y − 3t = 0

3t = 0.

Hence y = t = 0 and we deduce z = 0, while x may be arbitrary. Thus

E−3 =

x000

x ∈ R

= Span

1000

.

Take

v3 =

1000

.

94

Page 96: MRQ November27,2014martyn/3501/3501lecturenotes.pdf · something just for Euclidean vectors, we can obtain results that cover vectors, complex numbers, matrices, polynomials, differentiable

We finally solve T (v4) = v3 − 3v4; that is, (T + 3I)(v4) = v3:

0 2 1

2−2

0 3 0 00 −3 0 −30 0 0 3

xyzt

=

1000

.

Hence2y + 1

2z − 2t = 1

3y = 0−3y − 3t = 0

3t = 0,

so y = t = 0 and then 1

2z = 1, which forces z = 2, while x may be arbitrary.

Thus

v4 =

0020

is one solution. Thus

B =

−1

2

−110

,

−4

3

−101

,

1000

,

0020

and our change of basis matrix is

P =

−1

2−4

31 0

−1 −1 0 01 0 0 20 1 0 0

.

This last example illustrates some general principles. When seeking theinvertible matrix P such that P−1AP is in Jordan normal form, we seekparticular vectors to form a basis. These basis vectors can be found bysolving appropriate systems of linear equations (though sometimes care isneeded to find the correct system to solve as was illustrated in Example 5.10).

Example 5A Let V = R5 and let T : V → V be the linear transformationgiven by the matrix

E =

1 0 −1 0 −80 1 4 0 29−1 0 1 1 50 0 −1 1 −110 0 0 0 −2

.

Determine a Jordan normal form J of T and find an invertible matrix Psuch that P−1EP = J .

95

Page 97: MRQ November27,2014martyn/3501/3501lecturenotes.pdf · something just for Euclidean vectors, we can obtain results that cover vectors, complex numbers, matrices, polynomials, differentiable

Solution: We first determine the characteristic polynomial of T :

cT (x) = det

x− 1 0 1 0 80 x− 1 −4 0 −291 0 x− 1 −1 −50 0 1 x− 1 110 0 0 0 x+ 2

= (x+ 2) det

x− 1 0 1 00 x− 1 −4 01 0 x− 1 −10 0 1 x− 1

= (x− 1)(x+ 2) det

x− 1 1 01 x− 1 −10 1 x− 1

= (x− 1)(x+ 2)

(

(x− 1) det

(

x− 1 −11 x− 1

)

− det

(

1 −10 x− 1

))

= (x− 1)(x+ 2)

(

(x− 1)(

(x− 1)2 + 1)

− (x− 1)

)

= (x− 1)2(x+ 2)(

(x− 1)2 + 1− 1)

= (x− 1)4(x+ 2).

We now know that the Jordan normal form for T contains a single Jordanblock J1(−2) corresponding to eigenvalue −2 and some number of Jordanblocks Jn(1) corresponding to eigenvalue 1. The sum of the sizes of theselatter blocks equals 4.

Let us now determine the minimum polynomial of T . We know mT (x) =(x− 1)i(x+ 2) where 1 6 i 6 4 by Corollary 4.19 and Theorem 4.20. Now

E − I =

0 0 −1 0 −80 0 4 0 29−1 0 0 1 50 0 −1 0 −110 0 0 0 −3

and E+2I =

3 0 −1 0 −80 3 4 0 29−1 0 3 1 50 0 −1 3 −110 0 0 0 0

,

so

(E − I)(E + 2I) =

1 0 −3 −1 −5−4 0 12 4 20−3 0 0 3 −31 0 −3 −1 −50 0 0 0 0

6= 0,

96

Page 98: MRQ November27,2014martyn/3501/3501lecturenotes.pdf · something just for Euclidean vectors, we can obtain results that cover vectors, complex numbers, matrices, polynomials, differentiable

(E − I)2(E + 2I) =

0 0 −1 0 −80 0 4 0 29−1 0 0 1 50 0 −1 0 −110 0 0 0 −3

1 0 −3 −1 −5−4 0 12 4 20−3 0 0 3 −31 0 −3 −1 −50 0 0 0 0

=

3 0 0 −3 3−12 0 0 12 −120 0 0 0 03 0 0 −3 30 0 0 0 0

6= 0

and

(E − I)3(E + 2I) =

0 0 −1 0 −80 0 4 0 29−1 0 0 1 50 0 −1 0 −110 0 0 0 −3

3 0 0 −3 3−12 0 0 12 −120 0 0 0 03 0 0 −3 30 0 0 0 0

=

0 0 0 0 00 0 0 0 00 0 0 0 00 0 0 0 00 0 0 0 0

.

Hence mT (x) = (x− 1)3(x+2). As a consequence, the Jordan normal formof T must contain at least one Jordan block J3(1) of size 3. Since the sizesof the Jordan blocks associated to the eigenvalue 1 has sum equal to 4 (fromearlier), there remains a single Jordan block J1(1) of size 1.

Our conclusion is a Jordan normal form of T is

J =

J3(1) 0 00 J1(1) 00 0 J1(−2)

=

1 1 0 0 00 1 1 0 00 0 1 0 00 0 0 1 00 0 0 0 −2

.

We now want to find a basis B = {v1,v2,v3,v4,v5} for R5 such thatMatB,B(T ) = J . In particular, v1 and v4 are required to be eigenvectorswith eigenvalue 1. Let us first find the eigenspace E1 by solving (T−I)(v) =0; that is,

0 0 −1 0 80 0 4 0 29−1 0 0 1 50 0 −1 0 −110 0 0 0 −3

xyztu

=

00000

.

97

Page 99: MRQ November27,2014martyn/3501/3501lecturenotes.pdf · something just for Euclidean vectors, we can obtain results that cover vectors, complex numbers, matrices, polynomials, differentiable

The fifth row yields −3u = 0; that is, u = 0. It follows from the first rowthat −z+8u = 0 and hence z = 0. The only row yielding further informationis the third which says −x+ t+ 5u = 0 and so x = t. Hence

E1 =

xy0x0

x, y ∈ R

.

From this we can read off a basis for the eigenspace E1, but this does nottell us which vector to take as v1. We need v1 to be a suitable choice ofeigenvector so that T (v2) = v1 + v2, that is, (T − I)(v2) = v1, is possible.We solve for (T − I)(v) = w where w is a typical vector in E1. So consider

0 0 −1 0 80 0 4 0 29−1 0 0 1 50 0 −1 0 −110 0 0 0 −3

xyztu

=

αβ0α0

for some non-zero scalars α and β. Thus −3u = 0 and so u = 0. We thenobtain three equations

−z = α, 4z = β, −x+ t = 0. (5.1)

Thus to have a solution we must have −4α = β. This tells us what to takeas v1: we want a vector of the form

α−4α0α0

where α ∈ R, but α 6= 0.

So take α = 1 and

v1 =

1−4010

.

Then Equations (5.1) tell us that vector v2 is given by z = −1, x = t andy and x can be arbitrary. We shall take x = y = 0 (mainly for convenience):

v2 =

00−100

.

98

Page 100: MRQ November27,2014martyn/3501/3501lecturenotes.pdf · something just for Euclidean vectors, we can obtain results that cover vectors, complex numbers, matrices, polynomials, differentiable

The vector v3 is required to satisfy T (v3) = v2 + v3, so to find v3 we solve(T − I)(v) = v2:

0 0 −1 0 80 0 4 0 29−1 0 0 1 50 0 −1 0 −110 0 0 0 −3

xyztu

=

00−100

.

Thus u = 0, z = 0 and −x + t = −1. Any other choices are arbitrary, sowe shall take x = 1, y = 0 and then t = 0. So we take

v3 =

10000

.

For v3, we note that v1, v4 should be linearly independent vectors (as theyform part of a basis) in the eigenspace

E1 =

xy0x0

x, y ∈ R

.

Given our choice of v1, there remains much choice for v4 (essentially it mustnot be a scalar multiple of v1). We shall take

v4 =

10010

(i.e., take x = 1, y = 0).Finally, we require v5 to be an eigenvector for T with eigenvalue −2, so

we solve T (v) = −2v or, equivalently, (T + 2I)(v) = 0:

3 0 −1 0 −80 3 4 0 29−1 0 3 1 50 0 −1 3 −110 0 0 0 0

xyztu

=

00000

.

99

Page 101: MRQ November27,2014martyn/3501/3501lecturenotes.pdf · something just for Euclidean vectors, we can obtain results that cover vectors, complex numbers, matrices, polynomials, differentiable

We apply row operations to solve this system of equations:

3 0 −1 0 −80 3 4 0 29−1 0 3 1 50 0 −1 3 −110 0 0 0 0

00000

−→

0 0 8 3 70 3 4 0 29−1 0 3 1 50 0 −1 3 −110 0 0 0 0

00000

r1 7→ r1 + 3r3

−→

0 0 0 27 −810 3 0 12 −15−1 0 0 10 −280 0 −1 3 −110 0 0 0 0

00000

r1 7→ r1 + 8r4r2 7→ r2 + 4r4r3 7→ r3 + 3r4

−→

0 0 0 1 −30 3 0 4 −5−1 0 0 10 −280 0 −1 3 −110 0 0 0 0

00000

r1 7→ 1

27r1

r2 7→ 1

3r2

Hence

t − 3u = 0

y + 4t − 5u = 0

−x + 10t− 28u = 0

−z + 3t− 11u = 0.

Take u = 1 (it can be non-zero, but otherwise arbitrary, when producingthe eigenvector v5). Then

t = 3u = 3

y = 5u− 4t = −7x = 10t− 28u = 2

z = 3t− 11u = −2.

So we take

v5 =

2−7−231

.

100

Page 102: MRQ November27,2014martyn/3501/3501lecturenotes.pdf · something just for Euclidean vectors, we can obtain results that cover vectors, complex numbers, matrices, polynomials, differentiable

With the above choices, the matrix of T with respect to the basis B ={v1,v2,v3,v4,v5} is then our Jordan normal form J . The change of basismatrix P such that P−1EP = J is found by writing each vj in terms of thestandard basis and placing the coefficients in the jth column of P . Thus

P =

1 0 1 1 2−4 0 0 0 −70 −1 0 0 −21 0 0 1 30 0 0 0 1

.

101

Page 103: MRQ November27,2014martyn/3501/3501lecturenotes.pdf · something just for Euclidean vectors, we can obtain results that cover vectors, complex numbers, matrices, polynomials, differentiable

Section 6

Inner product spaces

We now head off in a different direction from the subject of representinglinear transformations by matrices. We shall consider the topic of innerproduct spaces. These are vector spaces endowed with an “inner product”(essentially a generalisation of the dot product of vectors in R3) and areextremely important. If time allows (which it probably won’t!), we shall seea link to the topic of diagonalisation.

Throughout this section (and the rest of the course), our base field Fwill be either R or C. Recall that if z = x+ iy ∈ C, the complex conjugateof z is given by

z = x− iy.

To save space and time, we shall use the complex conjugate even whenF = R. Thus, when F = R and α appears, it means α = α for a scalarα ∈ R.

Definition 6.1 Let F = R or C. An inner product space is a vector space Vover F together with an inner product

V × V → F

(v,w) 7→ 〈v,w〉

such that

(i) 〈u+ v,w〉 = 〈u,w〉+ 〈v,w〉 for all u, v, w ∈ V ,

(ii) 〈αv,w〉 = α〈v,w〉 for all v,w ∈ V and α ∈ F ,

(iii) 〈v,w〉 = 〈w, v〉 for all v,w ∈ V ,

(iv) 〈v, v〉 is a real number satisfying 〈v, v〉 > 0 for all v ∈ V ,

(v) 〈v, v〉 = 0 if and only if v = 0.

102

Page 104: MRQ November27,2014martyn/3501/3501lecturenotes.pdf · something just for Euclidean vectors, we can obtain results that cover vectors, complex numbers, matrices, polynomials, differentiable

Thus, in the case when F = RRR, our inner product is symmetric in thesense that Condition (iii) then becomes

〈v,w〉 = 〈w, v〉 for all v,w ∈ V .

Example 6.2 (i) The vector space Rn of column vectors of real numbersis an inner product space with respect to the usual dot product :

x1x2...xn

,

y1y2...yn

=

x1x2...xn

·

y1y2...yn

=

n∑

i=1

xiyi.

Note that if v =

x1x2...xn

, then

〈v,v〉 =n∑

i=1

x2i

and from this Condition (iv) follows immediately.

(ii) We can endow Cn with an inner product by introducing the complexconjugate:

z1z2...zn

,

w1

w2

...wn

=

n∑

i=1

ziwi.

Note that if v =

z1z2...zn

, then

〈v,v〉 =n∑

i=1

zizi =

n∑

i=1

|zi|2.

(iii) If a < b, the set C[a, b] of continuous functions f : [a, b] → R is a realvector space when we define

(f + g)(x) = f(x) + g(x)

(αf)(x) = α · f(x)

103

Page 105: MRQ November27,2014martyn/3501/3501lecturenotes.pdf · something just for Euclidean vectors, we can obtain results that cover vectors, complex numbers, matrices, polynomials, differentiable

for f, g ∈ C[a, b] and α ∈ R. In fact, C[a, b] is an inner product spacewhen we define

〈f, g〉 =∫ b

af(x)g(x) dx.

Since f(x)2 > 0 for all x, we have

〈f, f〉 =∫ b

af(x)2 dx > 0.

(iv) The space Pn of real polynomials of degree at most n is a real vec-tor space of dimension n + 1. It becomes an inner product space byinheriting the inner product from above, for example:

〈f, g〉 =∫

1

0

f(x)g(x) dx

for real polynomials f(x), g(x) ∈ Pn.

We can also generalise these last two examples to complex-valued func-tions. For example, the complex vector space of polynomials

f(x) = αnxn + αn−1x

n−1 + · · ·+ α1x+ α0

where α0, α1, . . . , αn ∈ C becomes an inner product space when wedefine

〈f, g〉 =∫

1

0

f(x)g(x) dx

wheref(x) = αnx

n + αn−1xn−1 + · · ·+ α1x+ α0.

Definition 6.3 Let V be an inner product space with inner product 〈·, ·〉.The norm is the function ‖ · ‖ : V → R defined by

‖v‖ =√

〈v, v〉.

(This makes sense since 〈v, v〉 > 0 for all v ∈ V .)

Lemma 6.4 Let V be an inner product space with inner product 〈·, ·〉.Then

(i) 〈v, αw〉 = α〈v,w〉 for all v,w ∈ V and α ∈ F ;

(ii) ‖αv‖ = |α| · ‖v‖ for all v ∈ V and α ∈ F ;

(iii) ‖v‖ > 0 whenever v 6= 0.

104

Page 106: MRQ November27,2014martyn/3501/3501lecturenotes.pdf · something just for Euclidean vectors, we can obtain results that cover vectors, complex numbers, matrices, polynomials, differentiable

Proof: (i)

〈v, αw〉 = 〈αw, v〉 = α〈w, v〉 = α〈w, v〉 = α〈v,w〉.

(ii)‖αv‖2 = 〈αv, αv〉 = α〈v, αv〉 = αα〈v, v〉 = |α|2‖v‖2

and taking square roots gives the result.(iii) 〈v, v〉 > 0 whenever v 6= 0. �

Theorem 6.5 (Cauchy–Schwarz Inequality) Let V be an inner prod-uct space with inner product 〈·, ·〉. Then

|〈u, v〉| 6 ‖u‖ · ‖v‖

for all u, v ∈ V .

Proof: If v = 0, then we see

〈u, v〉 = 〈u,0〉 = 〈u, 0 · 0〉 = 0〈u,0〉 = 0.

Hence|〈u, v〉| = 0 = ‖u‖ · ‖v‖

as ‖v‖ = 0.In the remainder of the proof we assume v 6= 0. Let α be a scalar, put

w = u+ αv and expand 〈w,w〉:

0 6 〈w,w〉 = 〈u+ αv, u + αv〉= 〈u, u〉+ α〈v, u〉 + α〈u, v〉 + αα〈v, v〉= ‖u‖2 + α〈u, v〉 + α〈u, v〉 + |α|2‖v‖2.

Now take α = −〈u, v〉/‖v‖2. We deduce

0 6 ‖u‖2 − 〈u, v〉 · 〈u, v〉‖v‖2 − 〈u, v〉〈u, v〉‖v‖2 +|〈u, v〉|2‖v‖4 ‖v‖

2

= ‖u‖2 − |〈u, v〉|2

‖v‖2 ,

so|〈u, v〉|2 6 ‖u‖2‖v‖2

and taking square roots gives the result. �

Corollary 6.6 (Triangle Inequality) Let V be an inner product space.Then

‖u+ v‖ 6 ‖u‖ + ‖v‖for all u, v ∈ V .

105

Page 107: MRQ November27,2014martyn/3501/3501lecturenotes.pdf · something just for Euclidean vectors, we can obtain results that cover vectors, complex numbers, matrices, polynomials, differentiable

Proof:

‖u+ v‖2 = 〈u+ v, u+ v〉= 〈u, u〉 + 〈u, v〉 + 〈v, u〉 + 〈v, v〉= ‖u‖2 + 〈u, v〉+ 〈u, v〉 + ‖v‖2

= ‖u‖2 + 2Re〈u, v〉+ ‖v‖2

6 ‖u‖2 + 2|〈u, v〉| + ‖v‖2

6 ‖u‖2 + 2‖u‖ · ‖v‖+ ‖v‖2 (by Cauchy–Schwarz)

= (‖u‖ + ‖v‖)2

and taking square roots gives the result. �

The triangle inequality is a fundamental observation that tells us we canuse the norm to measure distance on an inner product space in the sameway that modulus |x| is used to measure distance on R or C. We can thenperform analysis and speak of continuity and convergence. This topic isaddressed in greater detail in the study of Functional Analysis.

Orthogonality and orthonormal bases

Definition 6.7 Let V be an inner product space.

(i) Two vectors v and w are said to be orthogonal if 〈v,w〉 = 0.

(ii) A set A of vectors is orthogonal if every pair of vectors within it areorthogonal.

(iii) A set A of vectors is orthonormal if it is orthogonal and every vectorin A has unit norm.

Thus the set A = {v1, v2, . . . , vk} is orthonormal if

〈vi, vj〉 = δij =

{

0 if i 6= j

1 if i = j.

An orthonormal basis for an inner product space V is a basis which isitself an orthonormal set.

Example 6.8 (i) The standard basis E = {e1, e2, . . . , en} is an orthonor-mal basis for Rn:

〈ei, ej〉 = ei · ej ={

0 if i 6= j

1 if i = j.

106

Page 108: MRQ November27,2014martyn/3501/3501lecturenotes.pdf · something just for Euclidean vectors, we can obtain results that cover vectors, complex numbers, matrices, polynomials, differentiable

(ii) Consider the inner product space C[−π, π], consisting of all continousfunctions f : [−π, π]→ R, with inner product

〈f, g〉 =∫ π

−πf(x)g(x) dx.

Define

e0(x) =1√2π

en(x) =1√πcosnx

fn(x) =1√πsinnx

for n = 1, 2, . . . . These functions (without the scaling) were studiedin MT2001. We have the following facts

〈em, en〉 = 0 if m 6= n,

〈fm, fn〉 = 0 if m 6= n,

〈em, fn〉 = 0 for all m, n

and‖en‖ = ‖fn‖ = 1 for all n.

(The reason for the scaling factors is to achieve unit norm for eachfunction.) The topic of Fourier series relates to expressing functionsas linear combinations of the orthonormal set

{ e0, en, fn | n = 1, 2, 3, . . . }.

Theorem 6.9 An orthogonal set of non-zero vectors is linearly indepen-dent.

Proof: Let A = {v1, v2, . . . , vk} be an orthogonal set of non-zero vectors.Suppose that

k∑

i=1

αivi = 0.

Then, by linearity of the inner product in the first entry, for j = 1, 2, . . . , kwe have

0 =

⟨ k∑

i=1

αivi, vj

=

k∑

i=1

αi〈vi, vj〉 = αj‖vj‖2,

since by assumption 〈vi, vj〉 = 0 for i 6= j. Now vj 6= 0, so ‖vj‖ 6= 0. Hencewe must have

αj = 0 for all j.

Thus A is linearly independent. �

107

Page 109: MRQ November27,2014martyn/3501/3501lecturenotes.pdf · something just for Euclidean vectors, we can obtain results that cover vectors, complex numbers, matrices, polynomials, differentiable

Problem: Given a (finite-dimensional) inner product space V , how do wefind an orthonormal basis?

Theorem 6.10 (Gram–Schmidt Process) Suppose that V is a finite-dimensional inner product space with basis {v1, v2, . . . , vn}. The followingprocedure constructs an orthonormal basis {e1, e2, . . . , en} for V .

Step 1: Define e1 =1

‖v1‖v1.

Step k: Suppose {e1, e2, . . . , ek−1} has been constructed. Define

wk = vk −k−1∑

i=1

〈vk, ei〉ei

and

ek =1

‖wk‖wk.

Proof: We claim that {e1, e2, . . . , ek} is always an orthonormal set con-tained in Span(v1, v2, . . . , vk).

Step 1: v1 is a non-zero vector, so ‖v1‖ 6= 0 and hence e1 = 1

‖v1‖v1 isdefined. Now

‖e1‖ =∥

1

‖v1‖v1

=1

‖v1‖· ‖v1‖ = 1.

Hence {e1} is an orthonormal set (there are no orthogonality conditions tocheck) and by definition e1 ∈ Span(v1).

Step k: Suppose that we have shown {e1, e2, . . . , ek−1} is an orthonormalset contained in Span(v1, v2, . . . , vk−1). Consider

wk = vk −k−1∑

i=1

〈vk, ei〉.

We claim that wk 6= 0. Indeed, if wk = 0, then

vk =

k−1∑

i=1

〈vk, ei〉ei ∈ Span(e1, . . . , ek−1)

⊆ Span(v1, . . . , vk−1).

But this contradicts {v1, v2, . . . , vn} being linearly independent. Thus wk 6=0 and hence ek = 1

‖wk‖wk is defined.

108

Page 110: MRQ November27,2014martyn/3501/3501lecturenotes.pdf · something just for Euclidean vectors, we can obtain results that cover vectors, complex numbers, matrices, polynomials, differentiable

By construction ‖ek‖ = 1 and

ek =1

‖wk‖

(

vk −k−1∑

i=1

〈vk, ei〉ei)

∈ Span(e1, . . . , ek−1, vk)

⊆ Span(v1, . . . , vk−1, vk).

It remains to check that ek is orthogonal to ej for j = 1, 2, . . . , k − 1. Wecalculate

〈wk, ej〉 =⟨

vk −k−1∑

i=1

〈vk, ei〉ei, ej⟩

= 〈vk, ej〉 −k−1∑

i=1

〈vk, ei〉〈ei, ej〉

= 〈vk, ej〉 − 〈vk, ej〉‖ej‖2 (by inductive hypothesis)

= 〈vk, ej〉 − 〈vk, ej〉 = 0.

Hence

〈ek, ej〉 =⟨

1

‖wk‖wk, ej

=1

‖wk‖〈wk, ej〉 = 0

for j = 1, 2, . . . , k − 1.

This completes the induction. We conclude that, at the final stage,{e1, e2, . . . , en} is an orthonormal set. Theorem 6.9 tells us this set is linearlyindependent and hence a basis for V (since dimV = n). �

Example 6.11 Consider R3 with the usual inner product. Find an or-thonormal basis for the subspace U spanned by the vectors

v1 =

10−1

and v2 =

231

.

Solution: We apply the Gram–Schmidt Process to {v1,v2}.

‖v1‖2 =⟨

10−1

,

10−1

= 12 + (−1)2 = 2.

Take

e1 =1

‖v1‖v1 =

1√2

10−1

.

109

Page 111: MRQ November27,2014martyn/3501/3501lecturenotes.pdf · something just for Euclidean vectors, we can obtain results that cover vectors, complex numbers, matrices, polynomials, differentiable

Now

〈v2,e1〉 =⟨

231

,1√2

10−1

=1√2(2− 1) =

1√2.

Put

w2 = v2 − 〈v2,e1〉e1

=

231

− 1√2· 1√

2

10−1

=

231

1/20−1/2

=

3/233/2

.

So

‖w2‖2 = (3/2)2 + 32 + (3/2)2 =27

2

and

‖w2‖ =3√3√2.

Take

e2 =1

‖w2‖w2 =

2

3

1/211/2

=1√6

121

.

Thus

1√2

10−1

,1√6

121

is an orthonormal basis for U . �

Example 6.12 (Laguerre polynomials) We can define an inner producton the space P of real polynomials f(x) by

〈f, g〉 =∫ ∞

0

f(x)g(x)e−x dx.

The Laguerre polynomials form the orthonormal basis for P that is producedwhen we apply the Gram–Schmidt process to the standard basis

{1, x, x2, x3, . . . }

of monomials.Determine the first three Laguerre polynomials.

110

Page 112: MRQ November27,2014martyn/3501/3501lecturenotes.pdf · something just for Euclidean vectors, we can obtain results that cover vectors, complex numbers, matrices, polynomials, differentiable

Solution: We apply the Gram–Schmidt process to the basis {1, x, x2} forthe inner product space P2, of polynomials of degree at most 2, with innerproduct as above. We shall make use of the fact (determined by inductionand integration by parts) that

∫ ∞

0

xne−x dx = n!

Define fi(x) = xi for i = 0, 1, 2. Then

‖f0‖2 =∫ ∞

0

f0(x)2e−x dx =

∫ ∞

0

e−x dx = 1,

so

L0(x) =1

‖f0‖f0(x) = 1.

We now calculate L1. First

〈f1, L0〉 =∫ ∞

0

f1(x)L0(x)e−x dx =

∫ ∞

0

xe−x dx = 1.

The Gram-Schmidt process says we first put

w1(x) = f1(x)− 〈f1, L0〉L0(x) = x− 1.

Now

‖w1‖2 =∫ ∞

0

w1(x)2e−x dx

=

∫ ∞

0

(x2e−x − 2xe−x + e−x) dx

= 2− 2 + 1 = 1.

Hence

L1(x) =1

‖w1‖w1(x) = x− 1.

In the next step of the Gram–Schmidt process, we calculate

〈f2, L0〉 =∫ ∞

0

x2e−x dx = 2

and

〈f2, L1〉 =∫ ∞

0

x2(x− 1)e−x dx

=

∫ ∞

0

(x3e−x − x2e−x) dx

= 3!− 2! = 6− 2 = 4.

111

Page 113: MRQ November27,2014martyn/3501/3501lecturenotes.pdf · something just for Euclidean vectors, we can obtain results that cover vectors, complex numbers, matrices, polynomials, differentiable

So we put

w2(x) = f2(x)− 〈f2, L0〉L0(x)− 〈f2, L1〉L1(x)

= x2 − 4(x− 1)− 2

= x2 − 4x+ 2.

Now

‖w2‖2 =∫ ∞

0

w2(x)2e−x dx

=

∫ ∞

0

(x4 − 8x3 + 20x2 − 16x+ 4)e−x dx

= 4!− 8 · 3! + 20 · 2!− 16 + 4

= 4.

Hence we take

L2(x) =1

‖w2‖w2(x) =

1

2(x2 − 4x+ 2).

Similar calculations can be performed to determine L3, L4, . . . , but theybecome increasingly more complicated (and consequently less suitable forpresenting on a whiteboard!). �

Example 6.13 Define an inner product on the space P of real polynomialsby

〈f, g〉 =∫

1

−1

f(x)g(x) dx.

Applying the Gram–Schmidt process to the monomials {1, x, x2, x3, . . . }produces an orthonormal basis (with respect to this inner product). Thepolynomials produced are scalar multiples of the Legendre polynomials:

P0(x) = 1

P1(x) = x

P2(x) =1

2(3x2 − 1)

...

The set {Pn(x) | n = 0, 1, 2, . . . } of Legendre polynomials is orthogonal, butnot orthonormal. This is the reason why the Gram–Schmidt process onlyproduces a scalar multiple of them. The scalars appearing are determinedby the norms of the Pn with respect to this inner product.

For example,

‖P0‖2 =∫

1

−1

P0(x)2 dx =

1

−1

dx = 2,

112

Page 114: MRQ November27,2014martyn/3501/3501lecturenotes.pdf · something just for Euclidean vectors, we can obtain results that cover vectors, complex numbers, matrices, polynomials, differentiable

so the polynomial of unit norm produced will be 1√2P0(x). Similar calcula-

tions (of increasing length) can be performed for the other polynomials.

The Hermite polynomials form an orthogonal set in the space P whenwe endow it with the following inner product

〈f, g〉 =∫ ∞

−∞f(x)g(x)e−x2/2 dx.

Again the orthonomal basis produced by applying the Gram–Schmidt pro-cess to the monomials are scalar multiples of the Hermite polynomials.

Orthogonal complements

Definition 6.14 Let V be an inner product space. If U is a subspace of V ,the orthogonal complement to U is

U⊥ = { v ∈ V | 〈v, u〉 = 0 for all u ∈ U }.

Thus U⊥ consists of those vectors which are orthogonal to every single vectorin U .

Lemma 6.15 Let V be an inner product space and U be a subspace of V .Then

(i) U⊥ is a subspace of V , and

(ii) U ∩ U⊥ = {0}.

Proof: (i) First note 〈0, u〉 = 0 for all u ∈ U , so 0 ∈ U⊥. Now let v,w ∈ U⊥

and α ∈ F . Then

〈v + w, u〉 = 〈v, u〉 + 〈w, u〉 = 0 + 0 = 0

and〈αv, u〉 = α〈v, u〉 = α · 0 = 0

for all u ∈ U . So we deduce v + w ∈ U⊥ and αv ∈ U⊥. This shows thatU⊥ is a subspace.

(ii) Let u ∈ U ∩ U⊥. Then

‖u‖2 = 〈u, u〉 = 0

(since the element u is, in particular, orthogonal to itself). Hence u = 0. �

Theorem 6.16 Let V be a finite-dimensional inner product space and U bea subspace of V . Then

V = U ⊕ U⊥.

113

Page 115: MRQ November27,2014martyn/3501/3501lecturenotes.pdf · something just for Euclidean vectors, we can obtain results that cover vectors, complex numbers, matrices, polynomials, differentiable

Proof: We already know that U ∩ U⊥ = {0}, so it remains to show V =U + U⊥.

Let {v1, v2, . . . , vk} be a basis for U . Extend it to a basis

B = {v1, v2, . . . , vk, wk+1, . . . , wn}

for V . Now apply the Gram–Schmidt process to B and hence produce anorthonormal basis E = {e1, e2, . . . , en} for V . By construction,

{e1, e2, . . . , ek} ⊆ Span(v1, v2, . . . , vk) = U

and, since it is an orthonormal set, {e1, e2, . . . , ek} is a linearly independentset of size k = dimU . Therefore {e1, e2, . . . , ek} is a basis for U .

Hence any vector u ∈ U can be uniquely written as u =∑k

i=1αiei. Then

for all such u

〈u, ej〉 =⟨ k∑

i=1

αiei, ej

=

k∑

i=1

αi〈ei, ej〉 = 0

for j = k + 1, k + 2, . . . , n. That is,

ek+1, ek+2, . . . , en ∈ U⊥.

Now if v ∈ V , we can write

v = β1e1 + · · ·+ βkek + βk+1ek+1 + · · ·+ βnen

for some scalars β1, β2, . . . , βn and

β1e1 + · · ·+ βkek ∈ U and βk+1ek+1 + · · ·+ βnen ∈ U⊥.

This shows that every vector in V is the sum of a vector in U and one in U⊥,so

V = U + U⊥ = U ⊕ U⊥,

as required to complete the proof. �

Once we have a direct sum, we can consider an associated projectionmap. In particular, we have the projection PU : V → V onto U associatedto the decomposition V = U ⊕ U⊥. This is given by

PU (v) = u

where v = u+w is the unique decomposition of v with u ∈ U and w ∈ U⊥.

Theorem 6.17 Let V be a finite-dimensional inner product space and U bea subspace of V . Let PU : V → V be the projection map onto U associatedto the direct sum decomposition V = U ⊕ U⊥. If v ∈ V , then PU (v) is thevector in U that is closest to v.

114

Page 116: MRQ November27,2014martyn/3501/3501lecturenotes.pdf · something just for Euclidean vectors, we can obtain results that cover vectors, complex numbers, matrices, polynomials, differentiable

Proof: Recall that the norm ‖ · ‖ determines the distance between twovectors, specifically ‖v − u‖ is the distance from v to u. Write v = u0 + w0

where u0 ∈ U and w0 ∈ U⊥, so that PU (v) = u0. Then if u is any vectorin U ,

‖v − u‖2 = ‖v − u0 + (u0 − u)‖2

= ‖w0 + (u0 − u)‖2= 〈w0 + (u0 − u), w0 + (u0 − u)〉= 〈w0, w0〉+ 〈w0, u0 − u〉+ 〈u0 − u,w0〉+ 〈u0 − u, u0 − u〉= ‖w0‖2 + ‖u0 − u‖2 (since w0 is orthogonal to u0 − u ∈ U)

> ‖w0‖2 (since ‖u0 − u‖ > 0)

= ‖v − u0‖2

= ‖v − PU (v)‖2.Hence

‖v − u‖ > ‖v − PU (v)‖ for all u ∈ U .

This proves the theorem: PU (v) is closer to v than any other vector in U .�

Example 6.18 Find the distance from the vector w0 =

−151

in R3 to the

subspace

U = Span

111

,

01−2

.

Solution: We need to find U⊥, which must be a 1-dimensional subspacesince R3 = U ⊕ U⊥. We solve the condition 〈v,u〉 = 0 for all u ∈ U :

xyz

,

111

=

xyz

·

111

= x+ y + z

and⟨

xyz

,

01−2

=

xyz

·

01−2

= y − 2z.

Hencex+ y + z = y − 2z = 0.

Given arbitrary z, we take y = 2z and x = −y − z = −3z. Therefore

U⊥ =

−3z2zz

z ∈ R

= Span

−321

.

115

Page 117: MRQ November27,2014martyn/3501/3501lecturenotes.pdf · something just for Euclidean vectors, we can obtain results that cover vectors, complex numbers, matrices, polynomials, differentiable

The closest vector in U to w0 is PU (w0) where PU : R3 → R3 is theprojection onto U associated to R3 = U ⊕ U⊥. To determine this we solve

w0 =

−151

= α

111

+ β

01−2

+ γ

−321

,

so

α − 3γ = −1 (6.1)

α + β + 2γ = 5 (6.2)

α− 2β + γ = 1. (6.3)

Multiplying (6.2) by 2 and adding to (6.3) gives

3α+ 5γ = 11.

Then multiplying (6.1) by 3 and subtracting gives

14γ = 14.

Hence γ = 1, α = −1 + 3γ = 2 and β = 5− α− 2γ = 1. We conclude

w0 = 2

111

+

01−2

+

−321

= PU (w0) +

−321

.

We know PU (w0) is the nearest vector in U to w0, so the distance of w0

to U is

‖w0 − PU (w0)‖ =

−321

=√

(−3)2 + 22 + 12 =√14.

Example 6A (Exam Paper, January 2010) Let 〈·, ·〉 denote the usualinner product on R4, namely

〈u,v〉 =4

i=1

xiyi

for u =

x1x2x3x4

and v =

y1y2y3y4

.

116

Page 118: MRQ November27,2014martyn/3501/3501lecturenotes.pdf · something just for Euclidean vectors, we can obtain results that cover vectors, complex numbers, matrices, polynomials, differentiable

(i) Apply the Gram–Schmidt Process to the set

A =

11−11

,

31−22

,

2−431

,

1000

to produce an orthonormal basis for R4.

(ii) Let U be the subspace of R4 spanned by

B =

11−11

,

31−22

.

Find a basis for the orthogonal complement to U in R4.

(iii) Find the vector in U that is nearest to

2−431

.

Solution: (i) Define

v1 =

11−11

, v2 =

31−22

, v3 =

2−431

, v4 =

1000

.

We perform the steps of the Gram–Schmidt Process:

Step 1:‖v1‖2 = 12 + 12 + (−1)2 + 12 = 4,

so‖v1‖ = 2.

Take

e1 =1

‖v1‖v1 =

1

2

11−11

.

117

Page 119: MRQ November27,2014martyn/3501/3501lecturenotes.pdf · something just for Euclidean vectors, we can obtain results that cover vectors, complex numbers, matrices, polynomials, differentiable

Step 2:

〈v2,e1〉 =1

2

31−22

,

11−11

= 1

2(3 + 1 + 2 + 2) = 4.

Take

w2 = v2 − 〈v2,e1〉e1 =

31−22

− 2

11−11

=

1−100

.

Then‖w2‖2 = 12 + (−1)2 = 2,

so take

e2 =1

‖w2‖w2 =

1√2

1−100

.

Step 3:

〈v3,e1〉 =1

2

2−431

,

11−11

= 1

2(2− 4− 3 + 1) = −2

〈v3,e2〉 =1√2

2−431

,

1−100

= 1√2(2 + 4 + 0 + 0) =

6√2= 3√2.

Take

w3 = v3 − 〈v3,e1〉e1 − 〈v3,e2〉e2

=

2−431

+ 2 · 12

11−11

− 3√2 · 1√

2

1−100

=

2−431

+

11−11

− 3

1−100

=

0022

.

Then‖w3‖2 = 22 + 22 = 8,

118

Page 120: MRQ November27,2014martyn/3501/3501lecturenotes.pdf · something just for Euclidean vectors, we can obtain results that cover vectors, complex numbers, matrices, polynomials, differentiable

so take

e3 =1

‖w3‖w3 =

1

2√2w3 =

1√2

0011

.

Step 4:

〈v4,e1〉 =1

0

1000

,

11−11

= 1

2

〈v4,e2〉 =1√2

1000

,

1−100

=1√2

〈v4,e3〉 =1√2

1000

,

0011

= 0.

Take

w4 = v4 − 〈v4,e1〉e1 − 〈v4,e2〉e2 − 〈v4,e3〉e3

=

1000

− 1

2· 12

11−11

− 1√2· 1√

2

1−100

=

1000

− 1

4

11−11

− 1

2

1−100

=

1/41/41/4−1/4

.

Then

‖w4‖2 =(

1

4

)2+

(

1

4

)2+

(

1

4

)2+

(

−1

4

)2=

1

4,

so take

e4 =1

‖w4‖w4 =

1

2

111−1

.

119

Page 121: MRQ November27,2014martyn/3501/3501lecturenotes.pdf · something just for Euclidean vectors, we can obtain results that cover vectors, complex numbers, matrices, polynomials, differentiable

Hence

1

2

11−11

,1√2

1−100

,1√2

0011

,1

2

111−1

is the orthonormal basis for R4 obtained by applying the Gram–SchmidtProcess to A .

(ii) In terms of the notation of (i), U = Span(v1,v2). However, themethod of the Gram–Schmidt Process (see the proof of Theorem 6.10) showsthat

Span(e1,e2) = Span(v1,v2) = U.

If v = αe1 + βe2 + γe3 + δe4 is an arbitrary vector of R4 (expressed interms of our orthonormal basis), then

〈v,e1〉 = α and 〈v,e2〉 = β.

Hence if v ∈ U⊥, then in particular α = β = 0, so U⊥ ⊆ Span(e3,e4).Conversely, if v = γe3 + δe4 ∈ Span(e3,e4), then

〈ζe1 + ηe2, γe3 + δe4〉 = 0

since 〈ei,ej〉 = 0 for i 6= j. Hence every vector in Span(e3,e4) is orthogonalto every vector in U and we conclude

U⊥ = Span(e3,e4).

Thus {e3,e4} is a basis for U⊥.(iii) Let P : V → V be the projection onto U associated to the direct

sum decomposition V = U ⊕U⊥. Then P (v) is the vector in U closest to v.Now in our application of the Gram–Schmidt Process,

w3 = v3 − 〈v3,e1〉e1 − 〈v3,e2〉e2,

soP (w3) = P (v3)− 〈v3,e1〉P (e1)− 〈v3,e1〉P (e2).

Therefore0 = P (v3)− 〈v3,e1〉e1 − 〈v3,e2〉e2,

since w3 = ‖w3‖e3 ∈ U⊥ and e1,e2 ∈ U . Hence the closest vector in Uto v3 is

\begin{align*}
P(v_3) &= \langle v_3, e_1 \rangle e_1 + \langle v_3, e_2 \rangle e_2 \\
&= (-2) \cdot \frac{1}{2} \begin{pmatrix} 1 \\ 1 \\ -1 \\ 1 \end{pmatrix} + 3\sqrt{2} \cdot \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \\ 0 \\ 0 \end{pmatrix} \\
&= -\begin{pmatrix} 1 \\ 1 \\ -1 \\ 1 \end{pmatrix} + 3 \begin{pmatrix} 1 \\ -1 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 2 \\ -4 \\ 1 \\ -1 \end{pmatrix}.
\end{align*}
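
Purely as an illustrative check (not from the notes), the following NumPy snippet confirms parts (ii) and (iii): e3 and e4 are orthogonal to U, and the orthogonal projection of v3 onto U is (2, −4, 1, −1).

    import numpy as np

    e1 = np.array([1.0, 1.0, -1.0, 1.0]) / 2
    e2 = np.array([1.0, -1.0, 0.0, 0.0]) / np.sqrt(2)
    e3 = np.array([0.0, 0.0, 1.0, 1.0]) / np.sqrt(2)
    e4 = np.array([1.0, 1.0, 1.0, -1.0]) / 2
    v3 = np.array([2.0, -4.0, 3.0, 1.0])

    # (ii) e3 and e4 are orthogonal to e1 and e2, hence to every vector of U
    print(np.dot(e3, e1), np.dot(e3, e2), np.dot(e4, e1), np.dot(e4, e2))   # all 0.0

    # (iii) the nearest vector of U to v3 is its orthogonal projection onto U
    proj = np.dot(v3, e1) * e1 + np.dot(v3, e2) * e2
    print(proj)        # [ 2. -4.  1. -1.]
    print(v3 - proj)   # [0. 0. 2. 2.] = w3, which lies in U-perp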


Section 7

The adjoint of a transformation and self-adjoint transformations

Throughout this section, V is a finite-dimensional inner product space over a field F (where, as before, F = R or C) with inner product 〈·, ·〉.

Definition 7.1 Let T : V → V be a linear transformation. The adjoint of T is a map T ∗ : V → V such that

〈T (v), w〉 = 〈v, T ∗(w)〉 for all v,w ∈ V .

Remark: More generally, if T : V → W is a linear map between inner product spaces, the adjoint T ∗ : W → V is a map satisfying the above equation for all v ∈ V and w ∈ W. Appropriate parts of what we describe here can be done in this more general setting.

Lemma 7.2 Let V be a finite-dimensional inner product space and let T : V → V be a linear transformation. Then there is a unique adjoint T ∗ for T and, moreover, T ∗ is a linear transformation.

Proof: We first show that if T ∗ exists, then it is unique. For if S : V → V also satisfies the same condition, then

〈v, T ∗(w)〉 = 〈T (v), w〉 = 〈v, S(w)〉

for all v,w ∈ V . Hence

〈v, T ∗(w)〉 − 〈v, S(w)〉 = 0,

that is, 〈v, T ∗(w) − S(w)〉 = 0 for all v, w ∈ V.


Let us fix w ∈ V and take v = T ∗(w)− S(w). Then

〈T ∗(w)− S(w), T ∗(w)− S(w)〉 = 0.

The axioms of an inner product space tell us

T ∗(w)− S(w) = 0

so
S(w) = T ∗(w) for all w ∈ V,

as claimed.

It remains to show that such a linear map T ∗ actually exists. Let B = {e1, e2, . . . , en} be an orthonormal basis for V. (The Gram–Schmidt Process guarantees that this exists.) Let A = [αij] be the matrix of T with respect to B. Define T ∗ : V → V to be the linear map whose matrix with respect to B is the conjugate transpose of A. Thus

\[
T^{*}(e_j) = \sum_{i=1}^{n} \overline{\alpha_{ji}}\, e_i \qquad \text{for } j = 1, 2, \ldots, n.
\]

(Here we are using Proposition 2.7 to guarantee that this determines a unique linear transformation T ∗.) Note also that

\[
T(e_j) = \sum_{i=1}^{n} \alpha_{ij}\, e_i \qquad \text{for } j = 1, 2, \ldots, n.
\]

Claim: 〈T (v), w〉 = 〈v, T ∗(w)〉 for all v,w ∈ V .

Write $v = \sum_{j=1}^{n} \beta_j e_j$ and $w = \sum_{k=1}^{n} \gamma_k e_k$ in terms of the basis B. Then
\begin{align*}
\langle T(v), w \rangle &= \biggl\langle T\Bigl( \sum_{j=1}^{n} \beta_j e_j \Bigr), \ \sum_{k=1}^{n} \gamma_k e_k \biggr\rangle \\
&= \biggl\langle \sum_{j=1}^{n} \beta_j T(e_j), \ \sum_{k=1}^{n} \gamma_k e_k \biggr\rangle \\
&= \biggl\langle \sum_{j=1}^{n} \beta_j \sum_{i=1}^{n} \alpha_{ij} e_i, \ \sum_{k=1}^{n} \gamma_k e_k \biggr\rangle \\
&= \sum_{j=1}^{n} \sum_{i=1}^{n} \sum_{k=1}^{n} \beta_j \alpha_{ij} \overline{\gamma_k}\, \langle e_i, e_k \rangle \\
&= \sum_{j=1}^{n} \sum_{i=1}^{n} \beta_j \alpha_{ij} \overline{\gamma_i},
\end{align*}


while
\begin{align*}
\langle v, T^{*}(w) \rangle &= \biggl\langle \sum_{j=1}^{n} \beta_j e_j, \ T^{*}\Bigl( \sum_{k=1}^{n} \gamma_k e_k \Bigr) \biggr\rangle \\
&= \biggl\langle \sum_{j=1}^{n} \beta_j e_j, \ \sum_{k=1}^{n} \gamma_k T^{*}(e_k) \biggr\rangle \\
&= \biggl\langle \sum_{j=1}^{n} \beta_j e_j, \ \sum_{k=1}^{n} \gamma_k \sum_{i=1}^{n} \overline{\alpha_{ki}}\, e_i \biggr\rangle \\
&= \sum_{j=1}^{n} \sum_{k=1}^{n} \sum_{i=1}^{n} \beta_j \overline{\gamma_k}\, \alpha_{ki}\, \langle e_j, e_i \rangle \\
&= \sum_{j=1}^{n} \sum_{k=1}^{n} \beta_j \overline{\gamma_k}\, \alpha_{kj} \\
&= \langle T(v), w \rangle.
\end{align*}

Hence T ∗ is indeed the adjoint of T. □

We also record what was observed in the course of this proof:

If A = MatB,B(T) is the matrix of T with respect to an orthonormal basis, then

MatB,B(T ∗) = Ā^T

(the conjugate transpose of A).
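
This observation can be checked numerically. The following Python sketch (illustrative only; it assumes the standard inner product on C^n, which is conjugate-linear in the second argument, matching the convention of these notes) verifies 〈T(v), w〉 = 〈v, T ∗(w)〉 when T ∗ is represented by the conjugate transpose.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 4
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))   # matrix of T w.r.t. an orthonormal basis
    A_star = A.conj().T                                                   # conjugate transpose = matrix of T*

    def ip(x, y):
        # standard inner product on C^n, conjugate-linear in the second argument
        return np.sum(x * np.conj(y))

    v = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    w = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    print(np.isclose(ip(A @ v, w), ip(v, A_star @ w)))   # True: <T(v), w> = <v, T*(w)>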

Definition 7.3 A linear transformation T : V → V is self-adjoint if T ∗ = T .

Interpreting this in terms of matrices (using our observation above), we conclude:

Lemma 7.4 (i) A real matrix A defines a self-adjoint transformation if and only if it is symmetric: A^T = A.

(ii) A complex matrix A defines a self-adjoint transformation if and only if it is Hermitian: Ā^T = A. □

The most important theorem concerning self-adjoint transformations is the following:

Theorem 7.5 A self-adjoint transformation of a finite-dimensional inner product space is diagonalisable.

Interpreting this in terms of matrices gives us:


Corollary 7.6 (i) A real symmetric matrix is diagonalisable.

(ii) A Hermitian matrix is diagonalisable.
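
For a concrete illustration (not part of the notes), NumPy's eigh routine, which is intended for symmetric and Hermitian input, exhibits exactly this behaviour: real eigenvalues and an orthonormal eigenbasis that diagonalises the matrix. The example matrix below is chosen arbitrarily.

    import numpy as np

    H = np.array([[2.0, 1 - 1j],
                  [1 + 1j, 3.0]])              # Hermitian: equal to its conjugate transpose
    assert np.allclose(H, H.conj().T)

    eigvals, P = np.linalg.eigh(H)             # designed for symmetric/Hermitian matrices
    print(eigvals)                             # [1. 4.] -- real, as Lemma 7.7 below predicts
    print(np.round(P.conj().T @ H @ P, 10))    # diagonal matrix with the eigenvalues on the diagonal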

We finish the course by proving Theorem 7.5. First we establish the main tools needed for the proof.

Lemma 7.7 Let V be a finite-dimensional inner product space and T : V → V be a self-adjoint transformation. Then the characteristic polynomial of T is a product of linear factors and every eigenvalue of T is real.

Proof: Any polynomial factorises over C into a product of linear factors, so it is sufficient to show that all the roots of the characteristic polynomial are real.

Let W be an inner product space over C with the same dimension as V and let S : W → W be a linear transformation whose matrix A with respect to an orthonormal basis is the same as that of T with respect to an orthonormal basis for V. Then S is also self-adjoint since Ā^T = A (because T ∗ = T). (Essentially this process deals with the fact that V might be a vector space over R, so we replace it by one over C that in all other ways is the same.)

Let λ ∈ C be a root of c_S(x) = det(xI − A) = c_T(x). Then λ is an eigenvalue of S, so there exists an eigenvector v ∈ W for S:

S(v) = λv.

Therefore
〈S(v), v〉 = 〈λv, v〉 = λ‖v‖²,

but also

〈S(v), v〉 = 〈v, S∗(v)〉 = 〈v, S(v)〉 = 〈v, λv〉 = λ̄‖v‖².

Hence
λ‖v‖² = λ̄‖v‖²

and since v ≠ 0, we conclude λ = λ̄. This shows that λ ∈ R and the lemma is proved. □

Lemma 7.8 Let V be an inner product space and T : V → V be a linear map. If U is a subspace of V such that T(U) ⊆ U (i.e., U is T-invariant), then T ∗(U⊥) ⊆ U⊥ (i.e., U⊥ is T ∗-invariant).

Proof: Let v ∈ U⊥. Then for any u ∈ U , we have

〈u, T ∗(v)〉 = 〈T (u), v〉 = 0,

since T(u) ∈ U (by assumption) and v ∈ U⊥. Hence T ∗(v) ∈ U⊥. □
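
A small numerical illustration of Lemma 7.8 (a sketch, not from the notes): a real block upper-triangular matrix leaves U = Span(e1, e2) invariant, so its adjoint, here simply the transpose, must leave U⊥ = Span(e3, e4) invariant.

    import numpy as np

    # A is block upper triangular, so it maps U = Span(e1, e2) into itself.
    A = np.array([[1., 2., 5., 6.],
                  [3., 4., 7., 8.],
                  [0., 0., 9., 1.],
                  [0., 0., 2., 3.]])

    # Lemma 7.8: the adjoint A^T maps U-perp = Span(e3, e4) into itself.
    for u in (np.array([0., 0., 1., 0.]), np.array([0., 0., 0., 1.])):
        print(A.T @ u)    # first two coordinates are 0, so the image stays in U-perp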


These two lemmas now enable us to prove the main theorem about diagonalisation of self-adjoint transformations.

Proof of Theorem 7.5: We proceed by induction on n = dim V. If n = 1, then T is represented by a 1 × 1 matrix, which is already diagonal.

Consider the characteristic polynomial c_T(x). By Lemma 7.7, this is a product of linear factors and all its roots are real; in particular, there exists some root λ ∈ F (even when F = R). Let v1 be an eigenvector with eigenvalue λ and let U = Span(v1) be the 1-dimensional subspace spanned by v1. By Theorem 6.16,

V = U ⊕ U⊥.

Now as T(v1) = λv1 ∈ U, we see that U is T-invariant. Hence U⊥ is also T-invariant by Lemma 7.8 (since T ∗ = T).

Now consider the restriction S = T|U⊥ : U⊥ → U⊥ of T to U⊥. This is self-adjoint, since

〈T(v), w〉 = 〈v, T(w)〉 for all v, w ∈ U⊥

tells us
(T|U⊥)∗ = T|U⊥.

By induction (applied to the (n − 1)-dimensional space U⊥), S = T|U⊥ is diagonalisable. Hence there is a basis {v2, . . . , vn} for U⊥ consisting of eigenvectors for T. Then as V = U ⊕ U⊥, we conclude that {v1, v2, . . . , vn} is a basis for V consisting of eigenvectors for T. Hence T is diagonalisable and the proof is complete. □
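
The induction can be mirrored directly in code. The NumPy sketch below (illustrative only; the function name self_adjoint_eigenbasis and the example matrix are hypothetical, and in practice one would simply call np.linalg.eigh) splits off one eigenvector, restricts to the orthogonal complement, and recurses, producing an orthonormal basis of eigenvectors.

    import numpy as np

    def self_adjoint_eigenbasis(A):
        # Sketch of the induction in the proof of Theorem 7.5: split off one
        # eigenvector, pass to the orthogonal complement, and recurse.
        n = A.shape[0]
        if n == 1:
            return np.ones((1, 1), dtype=complex)          # a 1x1 matrix is already diagonal
        eigvals, eigvecs = np.linalg.eig(A)                 # Lemma 7.7: the eigenvalues are real
        v1 = eigvecs[:, 0] / np.linalg.norm(eigvecs[:, 0])  # one unit eigenvector; U = Span(v1)
        Q = np.linalg.qr(np.column_stack([v1, np.eye(n)]))[0]
        Q2 = Q[:, 1:]                                       # orthonormal basis of U-perp (Theorem 6.16)
        B = Q2.conj().T @ A @ Q2                            # matrix of T restricted to U-perp (Lemma 7.8)
        E = self_adjoint_eigenbasis(B)                      # induction on the dimension
        return np.column_stack([v1, Q2 @ E])                # orthonormal eigenbasis of the whole space

    A = np.array([[2., 1., 0.],
                  [1., 2., 1.],
                  [0., 1., 2.]], dtype=complex)             # real symmetric, hence self-adjoint
    P = self_adjoint_eigenbasis(A)
    print(np.round(P.conj().T @ A @ P, 8))                  # diagonal, so A has been diagonalised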
