

Part IB of the Mathematical Tripos

of the University of Cambridge

Michaelmas 2012

Linear Algebra

Lectured by: Prof. I. Grojnowski

Notes by: Alex Chan

Please send comments and corrections to [email protected].

The most recent copy of these notes can be downloaded from alexwlchan.net/maths

These notes are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

The following resources are not endorsed by the University of Cambridge.

Printed Tuesday, 7 May 2013.

Course schedule

Definition of a vector space (over R or C), subspaces, the space spanned by a subset. Linear independence, bases, dimension. Direct sums and complementary subspaces. [3]

Linear maps, isomorphisms. Relation between rank and nullity. The space of linear maps from U to V, representation by matrices. Change of basis. Row rank and column rank. [4]

Determinant and trace of a square matrix. Determinant of a product of two matrices and of the inverse matrix. Determinant of an endomorphism. The adjugate matrix. [3]

Eigenvalues and eigenvectors. Diagonal and triangular forms. Characteristic and minimal polynomials. Cayley-Hamilton Theorem over C. Algebraic and geometric multiplicity of eigenvalues. Statement and illustration of Jordan normal form. [4]

Dual of a finite-dimensional vector space, dual bases and maps. Matrix representation, rank and determinant of dual map. [2]

Bilinear forms. Matrix representation, change of basis. Symmetric forms and their link with quadratic forms. Diagonalisation of quadratic forms. Law of inertia, classification by rank and signature. Complex Hermitian forms. [4]

Inner product spaces, orthonormal sets, orthogonal projection, V = W ⊕ W⊥. Gram-Schmidt orthogonalisation. Adjoints. Diagonalisation of Hermitian matrices. Orthogonality of eigenvectors and properties of eigenvalues. [4]

Appropriate books

C.W. Curtis Linear Algebra: an Introductory Approach. Springer 1984 (38.50 hardback)

P.R. Halmos Finite-Dimensional Vector Spaces. Springer 1974 (31.50 hardback)

K. Hoffman and R. Kunze Linear Algebra. Prentice-Hall 1971 (72.99 hardback)

Contents

1 Vector spaces
1.1 Definitions
1.2 Subspaces
1.3 Bases
1.4 Linear maps and matrices
1.5 Conservation of dimension: the rank-nullity theorem
1.6 Sums and intersections of subspaces

2 Endomorphisms
2.1 Determinants

3 Jordan normal form
3.1 Eigenvectors and eigenvalues
3.2 Cayley-Hamilton theorem
3.3 Combinatorics of nilpotent matrices
3.4 Applications of JNF

4 Duals

5 Bilinear forms
5.1 Symmetric forms
5.2 Anti-symmetric forms

6 Hermitian forms
6.1 Inner product spaces
6.2 Hermitian adjoints for inner products


1 Vector spaces

1.1 Definitions

Lecture 1. We start by fixing a field F. We say that F is a field if:

• F is an abelian group under an operation called addition, (+), with additive identity 0;

• F\{0} is an abelian group under an operation called multiplication, (·), with multiplicative identity 1;

• Multiplication is distributive over addition; that is, a(b + c) = ab + ac for all a, b, c ∈ F.

Fields we've encountered before include the reals R, the complex numbers C, the ring of integers modulo p, Z/p = Fp, the rationals Q, as well as Q[√3] = {a + b√3 : a, b ∈ Q}.

Everything we will discuss works over any field, but it's best to have R and C in mind, since that's what we're most familiar with.

Definition. A vector space over F is a tuple (V, +, ·) consisting of a set V, operations + : V × V → V (vector addition) and · : F × V → V (scalar multiplication) such that

(i) (V,+) is an abelian group, that is:

• Associative: for all v1, v2, v3 ∈ V , (v1 + v2) + v3 = v1 + (v2 + v3);

• Commutative: for all v1, v2 ∈ V , v1 + v2 = v2 + v1;

• Identity: there is some (unique) 0 ∈ V such that, for all v ∈ V, 0 + v = v = v + 0;

• Inverse: for all v ∈ V , there is some u ∈ V with u+ v = v + u = 0.

This inverse is unique, and often denoted −v.

(ii) Scalar multiplication satisfies

• Associative: for all λ1, λ2 ∈ F, v ∈ V , λ1 · (λ2 · v) = (λ1λ2) · v;

• Identity: for all v ∈ V , the unit 1 ∈ F acts by 1 · v = v;

• · distributes over +V : for all λ ∈ F, v1, v2 ∈ V , λ·(v1 + v2) = λ·v1+λ·v2;

• +F distributes over ·: for all λ1, λ2 ∈ F, v ∈ V , (λ1 + λ2)·v = λ1·v+λ2·v;

We usually say “the vector space V ” rather than (V,+, ·).


Examples 1.1.

(i) {0} is a vector space.

(ii) Vectors in the plane under vector addition form a vector space.

(iii) The space of n-tuples with entries in F, denoted Fn = {(a1, . . . , an) : ai ∈ F},

with component-wise addition

(a1, . . . , an) + (b1, . . . , bn) = (a1 + b1, . . . , an + bn)

and scalar multiplication

λ · (a1, . . . , an) = (λa1, . . . , λan) .

Proving that this is a vector space is an exercise. It is also a special case of the next example.

(iv) Let X be any set, and FX = {f : X → F} be the set of all functions X → F. This is a vector space, with addition defined pointwise:

(f + g)(x) = f(x) + g(x)

and multiplication also defined pointwise:

(λ · f)(x) = λf(x)

if λ ∈ F, f, g ∈ FX, x ∈ X. If X = {1, . . . , n}, then FX = Fn and we have the previous example.

Proof that FX is a vector space.

• As + in F is commutative, we have

(f + g)(x) = f(x) + g(x) = g(x) + f(x) = (g + f)(x),

so f + g = g + f. Similarly, associativity of + in F implies f + (g + h) = (f + g) + h, and we have (−f)(x) = −f(x) and 0(x) = 0.

• Axioms for scalar multiplication follow from the relationship between · and + in F. Check this yourself!

(v) C is a vector space over R.

Lemma 1.2. Let V be a vector space over F.

(i) For all λ ∈ F, λ · 0 = 0, and for all v ∈ V , 0 · v = 0.

(ii) Conversely, if λ · v = 0 and λ ∈ F has λ ≠ 0, then v = 0.

(iii) For all v ∈ V , −1 · v = −v.

Proof.

(i) λ · 0 = λ · (0 + 0) = λ · 0 + λ · 0 =⇒ λ · 0 = 0.

0 · v = (0 + 0) · v = 0 · v + 0 · v =⇒ 0 · v = 0.

(ii) As λ ∈ F, λ ≠ 0, there exists λ−1 ∈ F such that λ−1λ = 1, so v = (λ−1λ) · v = λ−1(λ · v), hence if λ · v = 0, we get v = λ−1 · 0 = 0 by (i).

(iii) 0 = 0 · v = (1 + (−1)) · v = 1 · v + (−1 · v) = v + (−1 · v) =⇒ −1 · v = −v.

We will write λv rather than λ · v from now on, as the lemma means this will not cause any confusion.


1.2 Subspaces

Definition. Let V be a vector space over F. A subset U ⊆ V is a vector subspace (or just a subspace), written U ≤ V, if the following holds:

(i) 0 ∈ U ;

(ii) If u1, u2 ∈ U , then u1 + u2 ∈ U ;

(iii) If u ∈ U , λ ∈ F, then λu ∈ U .

Equivalently, U is a subspace if U ⊆ V, U ≠ ∅ (U is non-empty) and for all u, v ∈ U, λ, µ ∈ F, λu + µv ∈ U.

Lemma 1.3. If V is a vector space over F and U ≤ V, then U is a vector space over F under the restriction of the operations + and · on V to U. (Proof is an exercise.)

Examples 1.4.

(i) {0} and V are always subspaces of V .

(ii) {(r1, . . . , rn, 0, . . . , 0) : ri ∈ R} ⊆ Rn+m is a subspace of Rn+m.

(iii) The following are all subspaces of sets of functions:

C1(R) = {f : R → R | f continuously differentiable} ⊆ C(R) = {f : R → R | f continuous} ⊆ RR = {f : R → R}.

Consider: f, g continuous implies that f + g is, and λf is, for λ ∈ R; the zero function is continuous, so C(R) is a subspace of RR, and similarly for C1(R).

(iv) Let X be any set, and write

F[X] = (FX)fin = {f : X → F | f(x) ≠ 0 for only finitely many x ∈ X}.

This is the set of finitely supported functions, which is a subspace of FX. Consider: if f(x) = 0, then λf(x) = 0, so if f ∈ (FX)fin, then so is λf. Similarly,

(f + g)−1(F\{0}) ⊆ f−1(F\{0}) ∪ g−1(F\{0}),

and if these two are finite, then so is the LHS. Now consider the special case X = N:

F[N] = (FN)fin = {(λ0, λ1, . . .) | only finitely many λi are non-zero}.

We write xi for the function which sends i ↦ 1, j ↦ 0 if j ≠ i; that is, for the tuple (0, . . . , 0, 1, 0, . . .) with the 1 in the ith place. Thus

F[N] = {∑ λi xi | only finitely many λi non-zero}.

We can do better than a vector space here: define multiplication by

(∑ λi xi)(∑ µj xj) = ∑ λi µj xi+j.

This is still in F[N]. It is more usual to denote this by F[x], the polynomials in x over F (and this is a formal definition of the polynomial ring).


1.3 Bases

Lecture 2. Definition. Suppose V is a vector space over F, and S ⊆ V is a subset of V. Then v is a linear combination of elements of S if there is some n > 0 and λ1, . . . , λn ∈ F, v1, . . . , vn ∈ S such that v = λ1v1 + · · · + λnvn, or if v = 0.

We write 〈S〉 for the span of S, the set of all linear combinations of elements of S.

Notice that it is important in the definition to use only finitely many elements – infinite sums do not make sense in arbitrary vector spaces.

We will see later why it is convenient notation to say that 0 is a linear combination of n = 0 elements of S.

Example 1.5. 〈∅〉 = {0}.

Lemma 1.6.

(i) 〈S〉 is a subspace of V .

(ii) If W ≤ V is a subspace, and S ⊆ W, then 〈S〉 ≤ W; that is, 〈S〉 is the smallest subspace of V containing S.

Proof. (i) is immediate from the definition. (ii) is immediate, by (i) applied to W .

Definition. We say that S spans V if 〈S〉 = V .

Example 1.7. The set {(1, 0, 0), (0, 1, 0), (1, 1, 0), (7, 8, 0)} spans {(x, y, 0) : x, y ∈ R} ≤ R3.

Definition. Let v1, . . . , vn be a sequence of elements in V. We say they are linearly dependent if there exist λ1, . . . , λn ∈ F, not all zero, such that

∑_{i=1}^n λi vi = 0,

which we call a linear relation among the vi. We say that v1, . . . , vn are linearly independent if they are not linearly dependent; that is, if there is no linear relation among them, or equivalently if

∑_{i=1}^n λi vi = 0 =⇒ λi = 0 for all i.

We say that a subset S ⊆ V is linearly independent if every finite sequence of distinct elements in S is linearly independent.


Note that if v1, . . . , vn is linearly independent, then so is every reordering vπ(1), . . . , vπ(n).

• If v1, . . . , vn are linearly independent, and vi1, . . . , vik is a subsequence, then the subsequence is also linearly independent.

• If some vi = 0, then 1 · 0 = 0 is a linear relation, so v1, . . . , vn is not linearly independent.

• If vi = vj for some i ≠ j, then 1 · vi + (−1) vj = 0 is a linear relation, so the sequence isn't linearly independent.

• If |S| < ∞, say S = {v1, . . . , vn}, then S is linearly independent if and only if v1, . . . , vn are linearly independent.

Example 1.8. Let V = R3 and S = {(1, 0, 0)T, (0, 1, 0)T}; then

λ1 (1, 0, 0)T + λ2 (0, 1, 0)T = (λ1, λ2, 0)T

is zero if and only if λ1 = λ2 = 0, and so S is linearly independent.

Exercises 1.9.

(i) Show that v1, v2 ∈ V are linearly dependent if and only if v1 = 0 or v2 = λv1 for some λ ∈ F.

(ii) Let S = {(1, 0, 1)T, (1, 2, 0)T, (2, 1, 0)T}; then

λ1 (1, 0, 1)T + λ2 (1, 2, 0)T + λ3 (2, 1, 0)T = A (λ1, λ2, λ3)T,

where A is the 3 × 3 matrix whose columns are the elements of S, so linear independence of S is the same as Aλ = 0 =⇒ λ = 0. Show that in this example, there are no non-zero solutions.

(iii) If S ⊆ Fn, S = {v1, . . . , vm}, then show that finding a relation of linear dependence ∑_{i=1}^m λi vi = 0 is equivalent to solving Aλ = 0, where A = (v1 · · · vm) is the n × m matrix whose columns are the vi.

(iv) Hence show that every collection of four vectors in R3 has a relation of linear dependence (see the sketch below).
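For parts (iii) and (iv), here is a small numerical sketch in Python (assuming numpy and scipy are available; the four vectors are arbitrary example data):

# Find a relation of linear dependence among four vectors in R^3
# by computing a non-zero element of ker A, where the columns of A
# are the given vectors (cf. exercise 1.9 (iii)-(iv)).
import numpy as np
from scipy.linalg import null_space

v1, v2, v3, v4 = [1, 0, 1], [1, 2, 0], [2, 1, 0], [0, 1, 1]
A = np.column_stack([v1, v2, v3, v4])   # a 3 x 4 matrix
lam = null_space(A)[:, 0]               # some non-zero solution of A lam = 0
print(lam)                              # coefficients of the relation
print(A @ lam)                          # approximately the zero vector

Since A is 3 × 4, its kernel is always non-trivial, which is exactly the statement of (iv).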

Definition. The set S ⊆ V is a basis for V if

(i) S is linearly independent and;

(ii) S spans V .

Remark. This is slightly the wrong notion. We should order S, but we'll deal with this later.


Examples 1.10.

(i) By convention, the vector space {0} has ∅ as a basis.

(ii) S = {e1, . . . , en}, where ei is a vector of all zeroes except for a one in the ith position, is a basis of Fn called the standard basis.

(iii) F[x] = F[N] = (FN)fin has basis {1, x, x2, . . .}.

More generally, F[X] has {δx | x ∈ X} as a basis, where

δx(y) = 1 if x = y, and 0 otherwise,

so F[X] is, formally, the set of linear combinations of elements of X.

For amusement: F[N] ≤ FN, and 1, x, x2, . . . are linearly independent in FN as they are linearly independent in F[N], but they do not span FN, as (1, 1, 1, . . .) ∉ F[N].

Show that if a basis of FN exists, then it is uncountable.

Lemma 1.11. A set S is a basis of V if and only if every vector v ∈ V can be written uniquely as a linear combination of elements of S.

Proof. (⇐) Writing v as a linear combination of elements of S for every v ∈ V means that 〈S〉 = V. Uniquely means that, in particular, 0 can be written uniquely, and so S is linearly independent.

(⇒) If v = ∑_{i=1}^n λi vi = ∑_{i=1}^n µi vi, where vi ∈ S and i = 1, . . . , n, then ∑_{i=1}^n (λi − µi) vi = 0, and since the vi are linearly independent, λi = µi for all i.

Observe: if S is a basis of V, |S| = d and |F| = q < ∞ (for example, F = Z/pZ, and q = p), then the lemma gives |V| = q^d, which implies that d is the same regardless of the choice of basis for V; that is, every basis of V has the same size. In fact, this is true when F = R or indeed when F is arbitrary, which means we must give a proof without counting. We will now slowly show this, showing that the language of vector spaces reduces the proof to a statement about matrices – Gaussian elimination (row reduction) – which we're already familiar with.

Definition. V is finite dimensional if there exists a finite set S which spans V .

Lecture 3

Theorem 1.12

Let V be a vector space over F, and let S span V. If S is finite, then S has a subset which is a basis for V. In particular, if V is finite dimensional, then V has a basis.

Proof. If S is linearly independent, then we're done. Otherwise, there exists a relation of linear dependence ∑_i ci vi = 0 among the elements v1, . . . , vm of S, where not all ci are zero (for ci ∈ F). Suppose ci0 ≠ 0; then ci0 vi0 = −∑_{j≠i0} cj vj, so vi0 = −∑_{j≠i0} cj vj / ci0, and hence we claim 〈v1, . . . , vm〉 = 〈v1, . . . , vi0−1, vi0+1, . . . , vm〉 (proof is an exercise). So removing vi0 doesn't change the span. We repeat this process, continuing to remove elements until we have a basis.

Remark. If S = {0}, say with V = {0}, then the proof says remove 0 from the set S to get ∅, which is why it is convenient to say that ∅ is a basis of {0}.


Theorem 1.13

Let V be a vector space over F, and V finite dimensional. If v1, . . . , vr are linearly independent vectors, then there exist elements vr+1, . . . , vn ∈ V such that {v1, . . . , vr, vr+1, . . . , vn} is a basis.

That is, any linearly independent set can be extended to a basis of V .

Remark. This theorem is true without the assumption that V is finite dimensional – any vector space has a basis. The proof is similar to what we give below, plus a bit of fiddling with the axiom of choice. The interesting theorems in this course are about finite dimensional vector spaces, so you're not missing much by this omission.

First, we prove a lemma.

Lemma 1.14. Let v1, . . . , vm be linearly independent, and v ∈ V. Then v ∉ 〈v1, . . . , vm〉 if and only if v1, . . . , vm, v are linearly independent.

Proof. (⇐) If v ∈ 〈v1, . . . , vm〉, then v = ∑_{i=1}^m ci vi for some ci ∈ F, so ∑_{i=1}^m ci vi + (−1) · v = 0 is a non-trivial relation of linear dependence.

(⇒) Conversely, if v1, . . . , vm, v are linearly dependent, then there exist ci, b such that ∑ ci vi + bv = 0, with not all ci, b zero. If b = 0, we get ∑ ci vi = 0, which is a non-trivial relation on the linearly independent vi, which is not possible, so b ≠ 0. So v = −∑ ci vi / b and v ∈ 〈v1, . . . , vm〉.

Proof of theorem 1.13. Since V is finite dimensional, there is a finite spanning set S = {w1, . . . , wd}. Now, if wi ∈ 〈v1, . . . , vr〉 for all i, then V = 〈w1, . . . , wd〉 ⊆ 〈v1, . . . , vr〉, so in this case v1, . . . , vr is already a basis.

Otherwise, there is some wi ∉ 〈v1, . . . , vr〉. But then the lemma implies that v1, . . . , vr, wi is linearly independent. We recurse, adding elements in this way until we have a basis.

Theorem 1.15

Let V be a vector space over F. Let S = {v1, . . . , vm} span V and L = {w1, . . . , wn} be linearly independent. Then m ≥ n.

In particular, if B1, B2 are two bases of V, then |B1| = |B2|.

Proof. As the vk's span V, we can write each wi as a linear combination of the vk's, wi = ∑_{k=1}^m cki vk, for some cki ∈ F. Now we know the wi's are linearly independent, which means ∑_i λi wi = 0 =⇒ λi = 0 for all i. But

∑_i λi wi = ∑_i λi (∑_k cki vk) = ∑_k (∑_i cki λi) vk.

We write C = (cki) for the m × n matrix formed by the coefficients cki. Observe that the rules of matrix multiplication are such that the coefficient of vk in ∑ λi wi is the kth entry of the column vector Cλ.

If m < n, we learned in Vectors & Matrices that there is a non-trivial solution λ ≠ 0 of Cλ = 0. (We have m linear equations in n variables, so a non-zero solution exists; the proof is by row reduction.) This contradicts the wi's being linearly independent. So m ≥ n.

Now, if B1 and B2 are bases, then apply this to S = B1, L = B2 to get |B1| ≥ |B2|. Similarly apply this to S = B2, L = B1 to get |B2| ≥ |B1|, and so |B1| = |B2|.


Definition. Let V be a vector space over a field F. Then the dimension of V, denoted by dimV, is the number of elements in a basis of V.

Example 1.16. dimFn = n, as e1, . . . , en is a basis, called the standard basis, where ei is the vector that is 1 in the ith place, and 0 everywhere else (for i = 1, . . . , n).

Corollary 1.17.

(i) If S spans V, then |S| ≥ dimV, with equality if and only if S is a basis.

(ii) If L = {v1, . . . , vk} is linearly independent, then |L| ≤ dimV, with equality if and only if L is a basis.

Proof. Immediate. Theorem 1.12 implies (i) and theorem 1.13 implies (ii).

Lemma 1.18. Let W ≤ V, and V be finite dimensional. Then W is finite dimensional, and dimW ≤ dimV. Moreover, dimW = dimV if and only if W = V.

Proof. The subtle point is to show that W is finite dimensional.

Let w1, . . . , wr be linearly independent vectors in W. Then they are linearly independent when considered as vectors in V, so r ≤ dimV by our theorem. If 〈w1, . . . , wr〉 ≠ W, then there is some w ∈ W with w ∉ 〈w1, . . . , wr〉, and so by lemma 1.14, w1, . . . , wr, w is linearly independent, and r + 1 ≤ dimV.

Continue in this way finding linearly independent vectors in W; we must stop after at most dimV steps. When we stop, we have a finite basis of W, so W is finite dimensional, and the rest of the theorem is immediate.

Lemma 1.19. Let V be finite dimensional and S any spanning set. Then there is a finite subset S′ of S which still spans V, and hence a finite subset of that which is a basis.

Proof. As V is finite dimensional, there is a finite spanning set {v1, . . . , vn}. Now, as S spans V, we can write each vi as a finite linear combination of elements of S.

But when you do this, you use only finitely many elements of S for each i. Hence as there are only finitely many vi (there are n of them!), this only uses finitely many elements of S. We call this finite subset S′. By construction, V = 〈v1, . . . , vn〉 ⊆ 〈S′〉.


1.4 Linear maps and matrices

Definition. Let V and W be vector spaces over F, and ϕ : V → W a map. We say that ϕ is linear if

(i) ϕ is a homomorphism of abelian groups; that is, ϕ(0) = 0 and for all v1, v2 ∈ V, we have ϕ(v1 + v2) = ϕ(v1) + ϕ(v2).

(ii) ϕ respects scalar multiplication; that is, ϕ(λv) = λϕ(v) for all λ ∈ F, v ∈ V .

Combining these two conditions, we see that a map ϕ is linear if and only if

ϕ(λ1 v1 + λ2 v2) = λ1 ϕ(v1) + λ2 ϕ(v2)

for all λ1, λ2 ∈ F, v1, v2 ∈ V .

Definition. We write L(V,W) to be the set of linear maps from V to W; that is,

L(V,W) = {ϕ : V → W | ϕ linear}.

A linear map ϕ : V → W is an isomorphism if there is a linear map ψ : W → V such that ϕψ = 1W and ψϕ = 1V.

Notice that if ϕ is an isomorphism, then in particular ϕ is a bijection on sets. The converse also holds:

Lemma 1.20. A linear map ϕ is an isomorphism if ϕ is a bijection; that is, if ϕ−1 exists as a map of sets.

Proof. We must show that ϕ−1 : W → V is linear; that is,

ϕ−1(a1w1 + a2w2) = a1 ϕ−1(w1) + a2 ϕ−1(w2). (∗)

But we have

ϕ(a1 ϕ−1(w1) + a2 ϕ−1(w2)) = a1 ϕ(ϕ−1(w1)) + a2 ϕ(ϕ−1(w2)) = a1w1 + a2w2,

as ϕ is linear. Now apply ϕ−1 to get (∗).

Lemma 1.21. If ϕ : V →W is a vector space isomorphism, then dimV = dimW .

Proof (Lecture 4). Let b1, . . . , bn be a basis of V. We claim that ϕ(b1), . . . , ϕ(bn) is a basis of W. First we check linear independence: suppose

0 = ∑_{i=1}^n λi ϕ(bi) = ϕ(∑_{i=1}^n λi bi).

As ϕ is injective, ∑ λi bi = 0, and hence as the bi are linearly independent, λi = 0 for i = 1, . . . , n. So the ϕ(bi) are linearly independent.

Then we check they span: since ϕ is surjective, for all w ∈ W, we have w = ϕ(v) for some v ∈ V. But v = ∑ λi bi for some λi ∈ F, as the bi span V. But then w = ϕ(v) = ∑ λi ϕ(bi), and the ϕ(bi) span W.

Since both spaces have a basis of the same size, it follows that dimV = dimW.


Definition. If b1, . . . , bn is a basis of V, and v = ∑_i λi bi, we say λ1, . . . , λn are the coordinates of v with respect to the basis b1, . . . , bn.

Here is another view of what the coordinates of a vector mean:

Proposition 1.22. Let V be a finite-dimensional vector space over F, with dimV = n. Then there is a bijection

{ordered bases b1, . . . , bn of V} → {isomorphisms ϕ : Fn → V}.

The idea of the proposition is that the coordinates of a vector with respect to a basis define a point in Fn, and hence a choice of a basis is a choice of an isomorphism of our vector space V with Fn.

Proof. Given an ordered basis b1, . . . , bn of V, call it B, we can write every vector v ∈ V as v = ∑ λi bi for unique λ1, . . . , λn ∈ F. Define αB : V → Fn by

αB(v) = (λ1, . . . , λn)T = ∑_{i=1}^n λi ei,

where {ei} is the standard basis of Fn.

It is clear that αB is well-defined, linear and an isomorphism, and the inverse sends (λ1, . . . , λn) ↦ ∑ λi bi.

This defines a map {ordered bases B} → {isomorphisms α : V → Fn} taking B ↦ αB.

To see that this map is a bijection, suppose we are given an isomorphism α : V → Fn. Then α−1 : Fn → V is also an isomorphism, and we define bi = α−1(ei). The proof of the previous lemma showed that b1, . . . , bn is a basis of V. It is clear that for this ordered basis B, αB = α.

Let V and W be finite dimensional vector spaces over F, and choose bases v1, . . . , vn and w1, . . . , wm of V and W, respectively. Then we have the square diagram with Fn and Fm along the top, V and W along the bottom, vertical isomorphisms Fn ≅ V and Fm ≅ W given by the chosen bases, and α : V → W along the bottom.

Now, suppose α : V → W is a linear map. As α is linear, and every v ∈ V can be written as v = ∑ λi vi for some λ1, . . . , λn, we have

α(v) = α(∑_{i=1}^n λi vi) = ∑_{i=1}^n λi α(vi),

so α is determined by its values α(v1), . . . , α(vn). But then write each α(vj) as a sum of the basis elements w1, . . . , wm:

α(vj) = ∑_{i=1}^m aij wi, for j = 1, . . . , n,

for some aij ∈ F.


Hence, if (λ1, . . . , λn) are the coordinates of v ∈ V with respect to the basis v1, . . . , vn, that is, if v = ∑ λj vj, then

α(v) = α(∑_{j=1}^n λj vj) = ∑_{i,j} aij λj wi;

that is,

(∑_j a1j λj, ∑_j a2j λj, . . . , ∑_j amj λj)T = A (λ1, λ2, . . . , λn)T

are the coordinates of α(v) with respect to w1, . . . , wm.

That is, by choosing bases v1, . . . , vn and w1, . . . , wm of V and W, respectively, every linear map α : V → W determines a matrix A ∈ Matm,n(F).

Conversely, given A ∈ Matm,n(F), we can define

α(∑_{j=1}^n λj vj) = ∑_{i=1}^m (∑_{j=1}^n aij λj) wi,

which is a well-defined linear map α : V → W. These constructions are inverse to one another, and so we've proved the following theorem:

Theorem 1.23

A choice of bases v1, . . . , vn and w1, . . . , wm of vector spaces V and W defines an isomorphism L(V,W) → Matm,n(F).

Remark. Actually, L(V,W) is a vector space. The vector space structure is given by defining, for a, b ∈ F, α, β ∈ L(V,W),

(aα + bβ)(v) = aα(v) + bβ(v).

Also, Matm,n(F) is a vector space over F, and these mutually inverse maps L(V,W) ⇄ Matm,n(F) are vector space isomorphisms.

The choice of bases for V and W define isomorphisms with Fn and Fm respectively, such that the square diagram with A : Fn → Fm along the top, α : V → W along the bottom, and the vertical isomorphisms Fn ≅ V and Fm ≅ W, commutes.

We say a diagram commutes if every directed path through the diagram with the same start and end vertices leads to the same result by composition. This is convenient shorthand language for a bunch of linear equations – that the coordinates of the different maps that you get by composing maps in the different manners agree.

Corollary 1.24. dimL(V,W ) = dim Matm,n(F) = nm = dimV dimW .


Lemma 1.25. Let α : V →W , β : W → U be linear maps of vector spaces U, V,W .

(i) βα : V → U is linear.

(ii) If v1, . . . , vn is a basis of V, w1, . . . , wm is a basis of W, and u1, . . . , ur is a basis of U,

and A ∈ Matm,n(F) is the matrix of α with respect to the vi, wj bases, and B ∈ Matr,m(F) is the matrix of β with respect to the wj, uk bases,

then the matrix of βα : V → U with respect to the vi, uk bases is BA.

Proof.

(i) Exercise.

(ii) We have from our earlier work

α(vj) = ∑_{i=1}^m aij wi and β(wi) = ∑_{k=1}^r bki uk.

Now we have

(βα)(vj) = β(∑_{i=1}^m aij wi) = ∑_{k=1}^r (∑_{i=1}^m bki aij) uk,

and so the coefficient of uk is ∑_i bki aij = (BA)kj.

Definition. A linear map ϕ : V → V is an automorphism if it is an isomorphism. The set of automorphisms forms a group, and is denoted

GL(V) = {ϕ : V → V | ϕ a linear isomorphism} = {ϕ ∈ L(V, V) | ϕ an isomorphism}.

Example 1.26. We write GLn(F) = {ϕ : Fn → Fn | ϕ an isomorphism} = GL(Fn).

Exercise 1.27. Show that if ϕ : V → W is an isomorphism, then it induces an isomorphism of groups GL(V) ≅ GL(W), so GL(V) ≅ GLdimV(F).

Lecture 5

Lemma 1.28. Let v1, . . . , vn be a basis of V and ϕ : V → V be an isomorphism; that is, let ϕ ∈ GL(V). Then we showed that ϕ(v1), . . . , ϕ(vn) is also a basis of V, and hence

(i) If v1 = ϕ(v1), . . . , vn = ϕ(vn), then ϕ = idV. In other words, we get the same ordered basis if and only if ϕ is the identity map.

(ii) If v′1, . . . , v′n is another basis of V, then the linear map ϕ : V → V defined by

ϕ(∑ λi vi) = ∑ λi v′i

(that is, the map sending vi ↦ v′i) is an isomorphism.

Proof. Define its inverse ψ : V → V by v′i ↦ vi; that is,

ψ(∑ λi v′i) = ∑ λi vi.

Then it is clear that ϕψ = ψϕ = idV : V → V.


So (i) and (ii) say that:

Proposition 1.29. GL(V) acts simply and transitively on the set of bases; that is, given a basis v′1, . . . , v′n, there is a unique ϕ ∈ GL(V) such that ϕ(v1) = v′1, . . . , ϕ(vn) = v′n.

Corollary 1.30. |GLn(Fp)| = (p^n − 1)(p^n − p) · · · (p^n − p^{n−1}).

Proof. It is enough to count ordered bases of Fp^n, which is done by proceeding as follows:

Choose v1, which can be any non-zero element, so we have p^n − 1 choices.

Choose v2, any non-zero element not a multiple of v1, so p^n − p choices.

Choose v3, any non-zero element not in 〈v1, v2〉, so p^n − p^2 choices.

...

Choose vn, any non-zero element not in 〈v1, . . . , vn−1〉, so p^n − p^{n−1} choices.

Example 1.31. |GL2(Fp)| = p(p − 1)^2(p + 1).

Remark. We could express the same proof by saying that a matrix A ∈ Matn(Fp) is invertible if and only if all of its columns are linearly independent, and the proof works by picking each column in turn.
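For small n and p, the formula in corollary 1.30 can be checked by brute force along exactly these lines (a pure Python sketch; the invertibility test below just checks that the columns are linearly independent over Fp, i.e. that Ax = 0 has only the zero solution):

from itertools import product

def is_invertible_mod_p(A, p):
    # A is a list of rows; its columns are independent over F_p iff
    # the only x with Ax = 0 (mod p) is x = 0
    n = len(A)
    for x in product(range(p), repeat=n):
        if any(x) and all(sum(A[i][j] * x[j] for j in range(n)) % p == 0
                          for i in range(n)):
            return False
    return True

def count_invertible(n, p):
    # enumerate all n x n matrices with entries in {0, ..., p-1}
    count = 0
    for entries in product(range(p), repeat=n * n):
        A = [list(entries[i * n:(i + 1) * n]) for i in range(n)]
        if is_invertible_mod_p(A, p):
            count += 1
    return count

# |GL_2(F_3)| should be (3^2 - 1)(3^2 - 3) = 48
print(count_invertible(2, 3), (3**2 - 1) * (3**2 - 3))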

Let v1, . . . , vn be a basis of V, and ϕ ∈ GL(V). Then v′i = ϕ(vi) is a new basis of V. Let A be the matrix of ϕ with respect to the original basis v1, . . . , vn for both source and target of ϕ : V → V. Then

ϕ(vi) = v′i = ∑_j aji vj,

so the columns of A are the coordinates of the new basis in terms of the old. We can also express this by saying that the square diagram commutes whose top and bottom rows are the isomorphism Fn → V sending ei ↦ vi, whose left vertical map is A : Fn → Fn, and whose right vertical map is ϕ : V → V.

Conversely, if v1, . . . , vn is a basis of V, and v′1, . . . , v′n is another basis, then we can define ϕ : V → V by ϕ(vi) = v′i, and we can express this by saying that the corresponding triangle commutes: the isomorphism Fn → V sending ei ↦ vi, followed by ϕ (which sends vi ↦ v′i), agrees with the isomorphism Fn → V sending ei ↦ v′i.

This is just language meant to clarify the relation between changing bases, and bases as giving isomorphisms with a fixed Fn. If it instead confuses you, feel free to ignore it. In contrast, here is a practical and important question about bases and linear maps, which you can't ignore:

Consider a linear map α : V → W. Let v1, . . . , vn be a basis of V, w1, . . . , wm a basis of W, and A the matrix of α. If we have new bases v′1, . . . , v′n and w′1, . . . , w′m, then we get a new matrix of α. What is the matrix with respect to these new bases? We write

v′i = ∑_j pji vj, w′i = ∑_j qji wj.


Exercise 1.32. Show that w′i = ∑_j qji wj if and only if wi = ∑_j (Q−1)ji w′j, where Q = (qab).

Then we have

α(v′i) = ∑_j pji α(vj) = ∑_{j,k} pji akj wk = ∑_{j,k,l} pji akj (Q−1)lk w′l = ∑_l (Q−1AP)li w′l,

and we see that the matrix is Q−1AP.
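A small numerical check of this formula (a Python sketch assuming numpy; P and Q are arbitrary invertible matrices whose columns express the new bases in terms of the old, as above):

import numpy as np
A = np.array([[1., 2., 0.],
              [0., 1., 3.]])            # matrix of alpha w.r.t. the old bases
P = np.array([[1., 1., 0.],
              [0., 1., 1.],
              [0., 0., 1.]])            # new basis of V in terms of the old
Q = np.array([[2., 1.],
              [1., 1.]])                # new basis of W in terms of the old
B = np.linalg.inv(Q) @ A @ P            # matrix of alpha w.r.t. the new bases
lam = np.array([1., -2., 3.])           # coordinates of some v w.r.t. v_1', v_2', v_3'
old_coords = A @ (P @ lam)              # alpha(v) in the old basis of W
print(np.allclose(Q @ (B @ lam), old_coords))   # True: B lam are the new coordinates of alpha(v)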

Finally, a definition.

Definition.

(i) Two matrices A, B ∈ Matm,n(F) are said to be equivalent if they represent the same linear map Fn → Fm with respect to different bases; that is, if there exist P ∈ GLn(F), Q ∈ GLm(F) such that

B = Q−1AP.

(ii) The linear maps α : V → W and β : V → W are equivalent if their matrices look the same after an appropriate choice of bases; that is, if there exist isomorphisms p ∈ GL(V), q ∈ GL(W) such that the square diagram with α : V → W along the top, β : V → W along the bottom, and vertical maps p : V → V and q : W → W from the bottom row to the top row commutes; that is to say, if q−1αp = β.


1.5 Conservation of dimension: the rank-nullity theorem

Definition. For a linear map α : V → W, we define the kernel to be the set of all elements that are mapped to zero,

kerα = {x ∈ V : α(x) = 0} = K ≤ V,

and the image to be the points in W which we can reach from V,

Imα = α(V) = {α(v) : v ∈ V} ≤ W.

Proving that these are subspaces is left as an exercise. We then say that r(α) = dim Imα is the rank and n(α) = dim kerα is the nullity.

Theorem 1.33: Rank-nullity theorem

For a linear map α : V →W , where V is finite dimensional, we have

r(α) + n(α) = dim Imα+ dim kerα = dimV.

Proof. Let v1, . . . , vd be a basis of kerα, and extend it to a basis of V, say v1, . . . , vd, vd+1, . . . , vn. We show the following claim, which implies the theorem immediately:

Claim. α(vd+1), . . . , α(vn) is a basis of Imα.

Proof of claim. Span: if w ∈ Imα, then w = α(v) for some v ∈ V. But v1, . . . , vn is a basis, so there are some λ1, . . . , λn ∈ F with v = ∑ λi vi. Then

α(v) = α(∑_{i=1}^n λi vi) = ∑_{i=d+1}^n λi α(vi)

as α(v1) = · · · = α(vd) = 0; that is, α(vd+1), . . . , α(vn) span Imα.

Linear independence: we have

∑_{i=d+1}^n λi α(vi) = 0 =⇒ α(∑_{i=d+1}^n λi vi) = 0,

and hence ∑_{i=d+1}^n λi vi ∈ kerα. But kerα has basis v1, . . . , vd, and so there are µ1, . . . , µd ∈ F such that

∑_{i=d+1}^n λi vi = ∑_{i=1}^d µi vi.

But this is a relation of linear dependence on v1, . . . , vn, a basis of V, so we must have

−µ1 = · · · = −µd = λd+1 = · · · = λn = 0;

in particular all the λi vanish, and hence α(vd+1), . . . , α(vn) are linearly independent.

Corollary 1.34. Let α : V → W be a linear map between finite dimensional spaces V and W. If dimV = dimW, then α : V → W is an isomorphism if and only if α is injective, and if and only if α is surjective.

Proof. The map α is injective if and only if dim kerα = 0, if and only if dim Imα = dimV (which is dimW here), which is true if and only if α is surjective.
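The rank-nullity theorem is easy to check numerically for any particular matrix; a sketch using numpy and scipy (the matrix below is arbitrary):

import numpy as np
from scipy.linalg import null_space

A = np.array([[1., 2., 3., 4.],
              [2., 4., 6., 8.],
              [0., 1., 1., 0.]])        # a map R^4 -> R^3

rank = np.linalg.matrix_rank(A)         # dim Im(alpha)
nullity = null_space(A).shape[1]        # dim ker(alpha)
print(rank, nullity, rank + nullity)    # 2, 2, 4 = dim V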


Remark (Lecture 6). If v1, . . . , vn is a basis for V, w1, . . . , wm is a basis for W and A is the matrix of the linear map α : V → W, then Imα ≅ 〈column space of A〉 and kerα ≅ kerA, where the isomorphisms are induced by the choice of bases for V and W, that is, by the isomorphisms W ≅ Fm, V ≅ Fn.

You'll notice that the rank-nullity theorem follows easily from our basic results about how linearly independent sets extend to bases. You'll recall that these results in turn depended on row and column reduction of matrices. We'll now show that in turn they imply the basic results about row and column reduction – the first third of this course is really just learning fancy language in which to rephrase Gaussian elimination.

The language will be useful in future years, especially when you learn geometry. However it doesn't really help when you are trying to solve linear equations – that is, finding the kernel of a linear transformation. For that, there's not much you can say other than: write the linear map in terms of a basis, as a matrix, and row and column reduce!

Theorem 1.35

(i) Let A ∈ Matm,n(F). Then A is equivalent to the m × n matrix B whose first r entries on the leading diagonal are 1, for some r, and whose other entries are all 0 (in block form, the identity Ir in the top left corner and zeros elsewhere); that is, there exist invertible P ∈ GLn(F), Q ∈ GLm(F) such that B = Q−1AP.

(ii) The matrix B is well defined. That is, if A is equivalent to another matrixB ′ of the same form, then B ′ = B.

Part (ii) of the theorem is clunkily phrased. We'll phrase it better in a moment by saying that the number of ones is the rank of A, and that equivalent matrices have the same rank.

Proof 1 of theorem 1.35.

(i) Let V = Fn, W = Fm and α : V → W be the linear map taking x ↦ Ax. Define d = dim kerα. Choose a basis y1, . . . , yd of kerα, and extend this to a basis v1, . . . , vn−d, y1, . . . , yd of V.

Then by the proof of the rank-nullity theorem, α(vi) = wi, for 1 ≤ i ≤ n − d, are linearly independent in W, and we can extend this to a basis w1, . . . , wm of W. But then with respect to these new bases of V and W, the matrix of α is just B, as desired.

(ii) The number of ones (n − d here) in this matrix equals the rank of B. By definition,

r(A) = column rank of A = dim Imα = dim(subspace spanned by the columns of A).

So to finish the proof, we need a lemma.


Lemma 1.36. If α, β : V → W are equivalent linear maps, then

dim kerα = dim kerβ and dim Imα = dim Imβ.

Proof of lemma. Recall that α, β : V → W are equivalent if there are some (p, q) ∈ GL(V) × GL(W) such that β = q−1αp, so that the square diagram with α along the top, β along the bottom, and vertical maps p : V → V, q : W → W commutes.

Claim. x ∈ kerβ ⇐⇒ px ∈ kerα.

Proof. β(x) = q−1αp(x). As q is an isomorphism, q−1(α(p(x))) = 0 ⇐⇒ α(p(x)) = 0; that is, the restriction of p to kerβ maps kerβ to kerα, and p : kerβ → kerα is an isomorphism, as p−1 exists on V. (So p−1y ∈ kerβ ⇐⇒ y ∈ kerα.)

Similarly, you can show that q induces an isomorphism q : Imβ → Imα.

Note that the rank-nullity theorem implies that in the lemma, if we know dim kerα = dim kerβ, then we know dim Imα = dim Imβ, but we didn't need to use this.

Theorem 1.37: Previous theorem restated

The GL(V) × GL(W) orbits on L(V,W) are in bijection with {r : 0 ≤ r ≤ min(dimV, dimW)}, under the map taking α : V → W to rk(α) = dim Imα.

Here GL(V) × GL(W) acts on L(V,W) by (q, p) · β = qβp−1.

Hard exercise.

(i) What are the orbits of GL(V) × GL(W) × GL(U) on the set L(V,W) × L(W,U) = {(α, β) : α : V → W, β : W → U both linear}?

(ii) What are the orbits of GL(V) × GL(W) on L(V,W) × L(W,V)?

You won't be able to do part (ii) of the exercise before the next chapter, when you learn Jordan normal form. It's worthwhile trying to do them then.

Proof 2 of theorem 1.35.

(ii) As before; no theorems were used.

(i) We'll write an algorithm to find P and Q explicitly.

Step 1: If the top left entry a11 ≠ 0, then we can clear all of the first column by row operations, and all of the first row by column operations.

Let's remember what this means.

Let Eij be the matrix with a 1 in the (i, j)th position, and zeros elsewhere. Recall that for i ≠ j, (I + αEij)A is a new matrix, whose ith row is the ith row of A plus α · (jth row of A). This is an elementary row operation.

Similarly, A(I + αEij) is an elementary column operation. As an exercise, state this precisely, as we did for the rows.


We have

E′m E′m−1 · · · E′1 A E1 · · · En = the block matrix with a11 in the (1,1) entry, zeros in the rest of the first row and column, and some matrix A′ in the bottom right, where

E′i = I − (ai1/a11) Ei1, Ej = I − (a1j/a11) E1j.

Step 2: if a11 = 0, either A = 0, in which case we are done, or there is some aij ≠ 0.

Consider the matrix sij, which is the identity matrix with the ith row and the jth row swapped; for example, s12 = (0 1; 1 0).

Exercise. sijA is the matrix A with the ith row and the jth row swapped; Asij is the matrix A with the ith and the jth columns swapped.

Hence si1Asj1 has (1, 1) entry aij ≠ 0.

Now go back to step 1 with this matrix instead of A.

Step 3: multiply by the diagonal matrix with ones along the diagonal except in the (1, 1) position, where it is a11^{−1}. Note it doesn't matter whether we multiply on the left or the right; we get a matrix of the block form with 1 in the top left corner, zeros in the rest of the first row and column, and some matrix A′′ in the bottom right.

Step 4: Repeat this algorithm for A′′.

When the algorithm finishes, we end up with a diagonal matrix B with some ones on the diagonal, then zeros, and we have written it as a product

(row ops for column n) ∗ · · · ∗ (row ops for column 1) ∗ A ∗ (column ops for row 1) ∗ · · · ∗ (column ops for row n),

where the product of all the factors to the left of A is Q−1, the product of all the factors to the right of A is P, and each ∗ is either some sij or the identity, times an invertible diagonal matrix (which is mostly ones, but in the ith place is aii^{−1}).

But this is precisely writing B as a product Q−1AP.
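The algorithm above is easy to carry out mechanically. Here is a sketch in Python (assuming numpy, and working numerically over R rather than a general field) which produces matrices L and P with L·A·P in the normal form of theorem 1.35, so that L plays the role of Q−1:

import numpy as np

def rank_normal_form(A, tol=1e-12):
    # Returns (L, P, B) with B = L @ A @ P, L and P invertible, and B having
    # r ones on the leading diagonal and zeros elsewhere.
    A = np.array(A, dtype=float)
    m, n = A.shape
    B, L, P = A.copy(), np.eye(m), np.eye(n)   # L, P accumulate the row/column ops
    for k in range(min(m, n)):
        # Step 2: find a non-zero pivot in the remaining block, move it to (k, k)
        rows, cols = np.where(np.abs(B[k:, k:]) > tol)
        if rows.size == 0:
            break                               # remaining block is zero; done
        i, j = rows[0] + k, cols[0] + k
        B[[k, i], :] = B[[i, k], :]; L[[k, i], :] = L[[i, k], :]   # row swap
        B[:, [k, j]] = B[:, [j, k]]; P[:, [k, j]] = P[:, [j, k]]   # column swap
        # Step 3: scale the pivot to 1
        L[k, :] /= B[k, k]; B[k, :] /= B[k, k]
        # Step 1: clear column k by row ops, then row k by column ops
        for i in range(m):
            if i != k and abs(B[i, k]) > tol:
                L[i, :] -= B[i, k] * L[k, :]
                B[i, :] -= B[i, k] * B[k, :]
        for j in range(n):
            if j != k and abs(B[k, j]) > tol:
                P[:, j] -= B[k, j] * P[:, k]
                B[:, j] -= B[k, j] * B[:, k]
    B[np.abs(B) < tol] = 0.0
    return L, P, B

A = np.array([[1., 2., 1.], [2., 4., 3.]])
L, P, B = rank_normal_form(A)
print(B)                                        # ones then zeros on the diagonal
print(np.allclose(L @ A @ P, B))                # True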

Corollary 1.38. Another direct proof of the rank-nullity theorem.

Proof. Part (ii) showed that dim kerA = dim kerB and dim ImA = dim ImB if A and B are equivalent; by part (i) of the theorem, it is enough to show the rank-nullity statement for B in the special form above. But there it is obvious.

Remark. Notice that this proof really is just the Gaussian elimination argument you learned last year. We used this to prove the theorem on bases. So now that we've written the proof here, the course really is self contained. It's better to think of everything we've been doing as dressing up this algorithm in coordinate independent language.

In particular, we have given coordinate independent meaning to the kernel and column space of a matrix, and hence to its column rank. We should also give a coordinate independent meaning for the row space and row rank, for the transposed matrix AT, and show that column rank equals row rank. This will happen in chapter 4.


1.6 Sums and intersections of subspaces

Lemma 1.39. Let V be a vector space over F, and Ui ≤ V subspaces. Then U = ⋂ Ui is a subspace.

Proof (Lecture 7). Since 0 ∈ Ui for all i, certainly 0 ∈ ⋂ Ui. And if u, v ∈ U, then u, v ∈ Ui for all i, so λu + µv ∈ Ui for all i, and hence λu + µv ∈ U.

By contrast, the union U1 ∪ U2 is not a subspace unless U1 ⊆ U2 or U2 ⊆ U1.

Definition. Let U1, . . . , Ur ≤ V be subspaces. The sum of the Ui is the subspace denoted

∑_{i=1}^r Ui = U1 + · · · + Ur = {u1 + u2 + · · · + ur | ui ∈ Ui} = 〈U1, . . . , Ur〉,

which is the span of ⋃_{i=1}^r Ui.

Exercise: prove the two equalities in the definition.

Definition. The set of d-dimensional subspaces of V, {U | U ≤ V, dimU = d}, is called the Grassmannian of d-planes in V, denoted Grd(V).

Example 1.40. We have

Gr1(F2) = {lines L in F2} = F ∪ {∞},

as L = 〈λe1 + µe2〉. If λ ≠ 0, we get L = 〈e1 + γe2〉, where γ = µ/λ ∈ F. If λ = 0, then L = 〈e2〉, which we think of as ∞.

If F = R, then this is R ∪ {∞}, the circle. If F = C, then this is C ∪ {∞}, the Riemann sphere.

Theorem 1.41

Suppose U1, U2 ≤ V and Ui finite dimensional. Then

dim(U1 ∩ U2) + dim(U1 + U2) = dimU1 + dimU2.

Proof 1. Pick a basis v1, . . . , vd of U1 ∩ U2. Extend it to a basis v1, . . . , vd, w1, . . . , wr of U1 and a basis v1, . . . , vd, y1, . . . , ys of U2.

Claim. {v1, . . . , vd, w1, . . . , wr, y1, . . . , ys} is a basis of U1 + U2. The claim implies the theorem immediately.

Proof of claim. Span: an element of U1 + U2 can be written x + y for x ∈ U1, y ∈ U2, and so

x = ∑ λi vi + ∑ µj wj, y = ∑ αi vi + ∑ βk yk.

Combining these two, we have

x + y = ∑ (λi + αi) vi + ∑ µj wj + ∑ βk yk.


Linear independence is obvious, but messy to write: if ∑ αi vi + ∑ βj wj + ∑ γk yk = 0, then

∑ αi vi + ∑ βj wj = −∑ γk yk,

where the left-hand side lies in U1 and the right-hand side lies in U2. Hence ∑ γk yk ∈ U1 ∩ U2, and hence ∑ γk yk = ∑ θi vi for some θi, as v1, . . . , vd is a basis of U1 ∩ U2. But the vi, yk are linearly independent, so γk = θi = 0 for all i, k. Thus ∑ αi vi + ∑ βj wj = 0, but as the vi, wj are linearly independent, we have αi = βj = 0 for all i, j.

We can rephrase this by introducing more notation. Suppose Ui ≤ V, and U = ∑ Ui. We say that the sum is direct if every u ∈ U can be written uniquely as u = u1 + · · · + uk, for some ui ∈ Ui.

Lemma 1.42. U1 + U2 is a direct sum if and only if U1 ∩ U2 = {0}.

Proof. (⇒) Suppose v ∈ U1 ∩ U2. Then

v = v + 0 = 0 + v,

writing it first with v ∈ U1, 0 ∈ U2 and then with 0 ∈ U1, v ∈ U2. This is two ways of writing v, so uniqueness gives v = 0.

(⇐) If u1 + u2 = u′1 + u′2, for ui, u′i ∈ Ui, then u1 − u′1 = u′2 − u2, where the left-hand side lies in U1 and the right-hand side lies in U2. This element is therefore in U1 ∩ U2 = {0}, and so u1 = u′1 and u2 = u′2, and sums are unique.

Definition. Let U ≤ V. A complement to U is a subspace W ≤ V such that W + U = V and W ∩ U = {0}.

Example 1.43. Let V = R2, and U be the line spanned by e1. Any line different from U is a complement to U; that is, W = 〈e2 + λe1〉 is a complement to U, for any λ ∈ F.

In particular, complements are not unique. But they always exist:

Lemma 1.44. Let U ≤ V and V finite dimensional. Then a complement to U exists.

Proof. We've seen that U is finite dimensional. Choose v1, . . . , vd as a basis of U, and extend it by w1, . . . , wr to a basis v1, . . . , vd, w1, . . . , wr of V.

Then W = 〈w1, . . . , wr〉 is a complement.

Exercise 1.45. Show that if W′ is another complement to U, then there exists a unique linear map ϕ : W → U such that W′ = {w + ϕ(w) | w ∈ W}, and conversely.

In other words, show that there is a bijection from the set of complements of U to L(W,U).

Lemma 1.46. If U1, . . . , Ur ≤ V are such that U1 + · · · + Ur is a direct sum, then dim(U1 + · · · + Ur) = dimU1 + · · · + dimUr.

Proof. Exercise. Show that the union of bases for the Ui is a basis for ∑ Ui.


Now let U1, U2 ≤ V be finite dimensional subspaces of V. Choose W1 ≤ U1 a complement to U1 ∩ U2 in U1, and W2 ≤ U2 a complement to U1 ∩ U2 in U2. Then:

Corollary 1.47. U1 + U2 = (U1 ∩ U2) + W1 + W2 is a direct sum, and the previous lemma gives another proof that

dim(U1 + U2) + dim(U1 ∩ U2) = dimU1 + dimU2.

Once more, let’s look at this:

Definition. Let V1, V2 be two vector spaces over F. Then define V1 ⊕ V2, the direct sum of V1 and V2, to be the product set V1 × V2, with vector space structure

(v1, v2) + (w1, w2) = (v1 + w1, v2 + w2), λ(v1, v2) = (λv1, λv2).

Exercises 1.48.

(i) Show that V1 ⊕ V2 is a vector space. Consider the linear maps

i1 : V1 ↪→ V1 ⊕ V2 taking v1 ↦ (v1, 0),
i2 : V2 ↪→ V1 ⊕ V2 taking v2 ↦ (0, v2).

These make V1 ≅ i1(V1) and V2 ≅ i2(V2) subspaces of V1 ⊕ V2 such that i1(V1) ∩ i2(V2) = {0}, and V1 ⊕ V2 = i1(V1) + i2(V2), so it is a direct sum.

(ii) Show that F ⊕ · · · ⊕ F (n times) = Fn.

Once more let U1, U2 ≤ V be subspaces of V. Consider U1, U2 as vector spaces in their own right, and form U1 ⊕ U2, a new vector space. (This is no longer a subspace of V.)

Lemma 1.49. Consider the linear map π : U1 ⊕ U2 → V taking (u1, u2) ↦ u1 + u2.

(i) This is linear.

(ii) kerπ = {(−w, w) | w ∈ U1 ∩ U2}.

(iii) Imπ = U1 + U2 ≤ V.

Proof. Exercise.

Corollary 1.50. Show that the rank-nullity theorem implies

dim kerπ + dim Imπ = dim(U1 ⊕ U2) = dimU1 + dimU2,

where dim kerπ = dim(U1 ∩ U2) and dim Imπ = dim(U1 + U2).

This is our slickest proof yet. All three proofs are really the same – they ended up reducing to Gaussian elimination – but the advantage of this formulation is we never have to mention bases. Not only is it the cleanest proof, it actually makes it easier to calculate. It is certainly helpful for part (ii) of the following exercise.

Exercise 1.51. Let V = Rn and U1, U2 ≤ Rn. Let U1 have a basis v1, . . . , vr and U2 have a basis w1, . . . , ws. (A numerical sketch follows below.)

(i) Find a basis for U1 + U2.

(ii) Find a basis for U1 ∩ U2.
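For exercise 1.51, the description of π in lemma 1.49 translates directly into a computation; a sketch in Python (assuming numpy and scipy, with arbitrary example data):

import numpy as np
from scipy.linalg import null_space

# Bases of U1, U2 <= R^4 as the columns of V1 and V2.
V1 = np.array([[1., 0.],
               [0., 1.],
               [1., 1.],
               [0., 0.]])
V2 = np.array([[1., 1.],
               [1., 0.],
               [2., 1.],
               [0., 1.]])

# U1 + U2 is the image of pi, i.e. the column space of [V1 V2].
S = np.hstack([V1, V2])
print(np.linalg.matrix_rank(S))            # dim(U1 + U2), here 3

# ker pi consists of pairs (-w, w) with w in U1 n U2 (lemma 1.49), so each
# null-space vector c = (a, b) of [V1 V2] gives an intersection vector w = V2 b;
# these vectors form a basis of U1 n U2.
K = null_space(S)
intersection = V2 @ K[V1.shape[1]:, :]
print(intersection)                        # spans the line through (1, 1, 2, 0)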


2 Endomorphisms

Lecture 8. In this chapter, unless stated otherwise, we take V to be a vector space over a field F, and α : V → V to be a linear map.

Definition. An endomorphism of V is a linear map from V to V. We write End(V) = L(V, V) to denote the set of endomorphisms of V:

End(V) = {α : V → V, α linear}.

The set End(V) is an algebra: as well as being a vector space over F, we can also multiply elements of it – if α, β ∈ End(V), then αβ ∈ End(V), i.e. the product is composition of linear maps.

Recall we have also defined

GL(V) = {α ∈ End(V) : α invertible}.

Fix a basis b1, . . . , bn of V and use it as the basis for both the source and target of α : V → V. Then α defines a matrix A ∈ Matn(F), by α(bj) = ∑_i aij bi. If b′1, . . . , b′n is another basis, with change of basis matrix P, then the matrix of α with respect to the new basis is PAP−1.

In other words, the square diagram with α : V → V along the top, A : Fn → Fn along the bottom, and the vertical isomorphisms V ≅ Fn given by the basis, commutes.

Hence the properties of α : V → V which don't depend on choice of basis are the properties of the matrix A which are also the properties of all conjugate matrices PAP−1.

These are the properties of the set of GL(V) orbits on End(V) = L(V, V), where GL(V) acts on End(V) by (g, α) ↦ gαg−1.

In the next two chapters we will determine the set of orbits. This is called the theory of Jordan normal forms, and is quite involved.

Contrast this with the properties of a linear map α : V → W which don't depend on the choice of basis of both V and W; that is, the determination of the GL(V) × GL(W) orbits on L(V,W). In chapter 1, we've seen that the only property of a linear map which doesn't depend on the choices of a basis is its rank – equivalently, that the set of orbits is isomorphic to {i | 0 ≤ i ≤ min(dimV, dimW)}.

We begin by defining the determinant, which is a property of an endomorphism which doesn't depend on the choice of a basis.


2.1 Determinants

Definition. We define the map det : Matn(F) → F by

detA = ∑_{σ∈Sn} ε(σ) a1,σ(1) · · · an,σ(n).

Recall that Sn is the group of permutations of {1, . . . , n}. Any σ ∈ Sn can be written as a product of transpositions (i j).

Then ε : Sn → {±1} is a group homomorphism taking

ε(σ) = +1 if the number of transpositions is even, and −1 if the number of transpositions is odd.

In class, we had a nice interlude here on drawing pictures for symmetric group elements as braids, composition as concatenating pictures of braids, and how ε(w) is the parity of the number of crossings in any picture of w. This was just too unpleasant to type up; sorry!

Example 2.1. We can calculate det by hand for small values of n:

det(a11) = a11;

for a 2 × 2 matrix, detA = a11a22 − a12a21;

for a 3 × 3 matrix, detA = a11a22a33 + a12a23a31 + a13a21a32 − a13a22a31 − a12a21a33 − a11a23a32.

The complexity of these expressions grows nastily; when calculating determinants it's usually better to use a different technique rather than directly using the definition.
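For concreteness, the definition can be implemented directly; a short Python sketch (assuming numpy, with the matrix taken from example 2.5 below), which also shows why the n! terms make this hopeless for large n:

import numpy as np
from itertools import permutations

def det_from_definition(A):
    # det A = sum over sigma of eps(sigma) * a_{1,sigma(1)} ... a_{n,sigma(n)}
    n = len(A)
    total = 0
    for sigma in permutations(range(n)):
        # sign of sigma = parity of the number of inversions
        inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                         if sigma[i] > sigma[j])
        prod = (-1) ** inversions
        for i in range(n):
            prod *= A[i][sigma[i]]
        total += prod
    return total

A = [[1, 7, 1],
     [3, 4, 1],
     [2, 3, 0]]
print(det_from_definition(A))                            # 12
print(round(np.linalg.det(np.array(A, dtype=float))))    # 12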

Lemma 2.2. If A is upper triangular, that is, if aij = 0 for all i > j, then detA = a11 · · · ann.

Proof. From the definition of the determinant,

detA = ∑_{σ∈Sn} ε(σ) a1,σ(1) · · · an,σ(n).

If a product contributes, then we must have σ(i) ≥ i for all i = 1, . . . , n. Hence σ(n) = n, then σ(n − 1) = n − 1, and so on down to σ(1) = 1. Thus the only term that contributes is the identity, σ = id, and detA = a11 · · · ann.

Lemma 2.3. detAT = detA, where (AT)ij = Aji is the transpose.

Proof. From the definition of the determinant, we have

detAT = ∑_{σ∈Sn} ε(σ) aσ(1),1 · · · aσ(n),n = ∑_{σ∈Sn} ε(σ) ∏_{i=1}^n aσ(i),i.

Now ∏_{i=1}^n aσ(i),i = ∏_{i=1}^n ai,σ−1(i), since they contain the same factors but in a different order. We relabel the indices accordingly:

detAT = ∑_{σ∈Sn} ε(σ) ∏_{k=1}^n ak,σ−1(k).

Now since ε is a group homomorphism, we have ε(σ · σ−1) = ε(id) = 1, and thus ε(σ) = ε(σ−1). We also note that just as σ runs through Sn, so does σ−1. We thus have

detAT = ∑_{σ∈Sn} ε(σ) ∏_{k=1}^n ak,σ(k) = detA.

Writing vi for the ith column of A, we can consider A as an n-tuple of column vectors, A = (v1, . . . , vn). Then Matn(F) ≅ Fn × · · · × Fn, and det is a function Fn × · · · × Fn → F.

Proposition 2.4. The function det : Matn(F) → F is multilinear; that is, it is linear in each column of the matrix separately, so:

det(v1, . . . , λi vi, . . . , vn) = λi det(v1, . . . , vi, . . . , vn),
det(v1, . . . , v′i + v′′i, . . . , vn) = det(v1, . . . , v′i, . . . , vn) + det(v1, . . . , v′′i, . . . , vn).

We can combine this into the single condition

det(v1, . . . , λ′i v′i + λ′′i v′′i, . . . , vn) = λ′i det(v1, . . . , v′i, . . . , vn) + λ′′i det(v1, . . . , v′′i, . . . , vn).

Proof. Immediate from the definition: detA is a sum of terms a1,σ(1) · · · an,σ(n), each of which contains exactly one factor from the ith column, namely aσ−1(i),i. If the ith column is λ′i v′i + λ′′i v′′i, so that this factor is λ′i a′σ−1(i),i + λ′′i a′′σ−1(i),i, then the determinant expands as claimed.

Example 2.5. If we split a matrix along a single column, as below, then detA = detA′ + detA′′ (rows separated by semicolons):

det(1 7 1; 3 4 1; 2 3 0) = det(1 3 1; 3 2 1; 2 1 0) + det(1 4 1; 3 2 1; 2 2 0).

Observe how the first and third columns remain the same, and only the second column changes. (Don't get confused: note that det(A + B) ≠ detA + detB for general A and B.)

Corollary 2.6. det(λA) = λ^n detA.

Proof. This follows immediately from the definition, or from applying the result of proposition 2.4 to each column in turn.

Proposition 2.7. If two columns of A are the same, then detA = 0.

Proof. Suppose the columns vi and vj are the same, with i ≠ j. Let τ = (i j) be the transposition in Sn which swaps i and j. Then Sn = An ∐ Anτ, where An = ker(ε : Sn → {±1}). We will prove the result by splitting the sum

detA = ∑_{σ∈Sn} ε(σ) ∏_{k=1}^n ak,σ(k)


into a sum over these two cosets for An, observing that for all σ ∈ An, ε(σ) = 1 and ε(τσ) = −1.

Now, for all σ ∈ An we have

a1,σ(1) · · · an,σ(n) = a1,τσ(1) · · · an,τσ(n),

as if σ(k) ∉ {i, j}, then τσ(k) = σ(k), and if σ(k) = i, then

ak,τσ(k) = ak,τ(i) = ak,j = ak,i = ak,σ(k),

and similarly if σ(k) = j. Hence

detA = ∑_{σ∈An} (∏_{k=1}^n ak,σ(k) − ∏_{k=1}^n ak,τσ(k)) = 0.

Proposition 2.8. If I is the identity matrix, then det I = 1

Proof. Immediate.

Theorem 2.9

These three properties characterise the function det.

Before proving this, we need some language.

Definition. A function f : Fn × · · · × Fn → F is a volume form on Fn if

(i) It is multilinear; that is,

f(v1, . . . , λi vi, . . . , vn) = λi f(v1, . . . , vi, . . . , vn),
f(v1, . . . , vi + v′i, . . . , vn) = f(v1, . . . , vi, . . . , vn) + f(v1, . . . , v′i, . . . , vn).

We saw earlier that we can write this as a single condition:

f(v1, . . . , λi vi + λ′i v′i, . . . , vn) = λi f(v1, . . . , vi, . . . , vn) + λ′i f(v1, . . . , v′i, . . . , vn).

(ii) It is alternating; that is, whenever i ≠ j and vi = vj, then f(v1, . . . , vn) = 0.

Example 2.10. We have seen that det : Fn × · · · × Fn → F is a volume form. It isa volume form f with f(e1, . . . , en) = 1 (that is, det I = 1).

Remark. Let's explain the name "volume form".

Let F = R, and consider the volume of a rectangular box with a corner at 0 and sides defined by v1, . . . , vn in Rn. The volume of this box is a function of v1, . . . , vn that almost satisfies the properties above. It doesn't quite satisfy linearity, as the volume of a box with sides defined by −v1, v2, . . . , vn is the same as that of the box with sides defined by v1, . . . , vn, but this is the only problem. (Exercise: check that the other properties of a volume form are immediate for volumes of rectangular boxes.)

You should think of this as saying that a volume form gives a signed version of the volume of a rectangular box (and the actual volume is the absolute value). In any case, this explains the name. You've also seen this in multi-variable calculus, in the way that the determinant enters into the formula for what happens to integrals when you change coordinates.

Theorem 2.11: Precise form

The set of volume forms forms a vector space of dimension 1. This line is called the determinant line.

Proof (Lecture 9). It is immediate from the definition that volume forms form a vector space. Let e1, . . . , en be a basis of V with n = dimV. Every element of V^n is of the form

(∑_i ai1 ei, ∑_i ai2 ei, . . . , ∑_i ain ei),

with aij ∈ F (that is, we have an isomorphism of sets V^n ≅ Matn(F)). So if f is a volume form, then

f(∑_{i1} a_{i1,1} e_{i1}, . . . , ∑_{in} a_{in,n} e_{in}) = ∑_{i1} a_{i1,1} f(e_{i1}, ∑_{i2} a_{i2,2} e_{i2}, . . . , ∑_{in} a_{in,n} e_{in}) = · · · = ∑_{1≤i1,...,in≤n} a_{i1,1} · · · a_{in,n} f(e_{i1}, . . . , e_{in}),

by linearity in each variable. But as f is alternating, f(e_{i1}, . . . , e_{in}) = 0 unless i1, . . . , in is 1, . . . , n in some order; that is,

(i1, . . . , in) = (σ(1), . . . , σ(n)) for some σ ∈ Sn.

Claim. f(eσ(1), . . . , eσ(n)) = ε(σ) f(e1, . . . , en).

Given the claim, the sum above simplifies to

∑_{σ∈Sn} aσ(1),1 · · · aσ(n),n ε(σ) f(e1, . . . , en),

and so the volume form is determined by f(e1, . . . , en); that is, dim({volume forms}) ≤ 1. But det : Matn(F) → F is a well-defined non-zero volume form, so we must have dim({volume forms}) = 1.

Note that we have just shown that for any volume form f,

f(v1, . . . , vn) = det(v1, . . . , vn) f(e1, . . . , en).

So to finish our proof, we just have to prove our claim.

Proof of claim. First, for any v1, . . . , vn ∈ V, we show that

f(. . . , vi, . . . , vj, . . .) = −f(. . . , vj, . . . , vi, . . .),

that is, swapping the ith and jth entries changes the sign. Applying multilinearity is enough to see this:

f(. . . , vi + vj, . . . , vi + vj, . . .) = f(. . . , vi, . . . , vi, . . .) + f(. . . , vj, . . . , vj, . . .) + f(. . . , vi, . . . , vj, . . .) + f(. . . , vj, . . . , vi, . . .),

and the left-hand side and the first two terms on the right are zero as f is alternating, giving the sign change.

Now the claim follows, as an arbitrary permutation can be written as a product of transpositions, and ε(w) = (−1)^{number of transpositions}.


Remark. Notice that if Z/2 is not a subfield of F (that is, if 1 + 1 ≠ 0), then for a multilinear form f(x, y) to be alternating, it suffices that f(x, y) = −f(y, x). This is because we have f(x, x) = −f(x, x), so 2f(x, x) = 0, but 2 ≠ 0 and so 2−1 exists, giving f(x, x) = 0. If 2 = 0, then f(x, y) = −f(y, x) for any f, and the correct definition of alternating is f(x, x) = 0.

If that didn't make too much sense, don't worry: this is included for mathematical interest, and isn't essential to understand anything else in the course.

Remark. If σ ∈ Sn, then we can attach to it a matrix P(σ) ∈ GLn by

P(σ)ij = 1 if σ−1(i) = j, and 0 otherwise.

Exercises 2.12. Show that:

(i) P(σ) has exactly one non-zero entry in each row and column, and that entry is a 1. Such a matrix is called a permutation matrix.

(ii) P(σ) ei = eσ(i), hence

(iii) P : Sn → GLn is a group homomorphism;

(iv) ε(σ) = detP(σ).

Theorem 2.13

Let A,B ∈ Matn(F). Then detAB = detAdetB.

Slick proof. Fix A ∈ Matn(F), and consider f : Matn(F) → F taking f(B) = detAB, viewed as a function of the columns of B. We observe that f is a volume form. (Exercise: check this!) But then

f(B) = detB · f(e1, . . . , en),

and by the definition, f(e1, . . . , en) = f(I) = detA; that is, detAB = detAdetB.

Corollary 2.14. If A ∈ Matn(F) is invertible, then detA−1 = 1/detA.

Proof. Since AA−1 = I, we have

detAdetA−1 = detAA−1 = det I = 1,

by the theorem, and rearranging gives the result.

Corollary 2.15. If P ∈ GLn(F), then

det(PAP−1) = detP detAdetP−1 = detA.

Definition. Let α : V → V be a linear map. Define detα ∈ F as follows: choose any basis b1, . . . , bn of V, and let A be the matrix of α with respect to the basis. Set detα = detA, which is well-defined by the corollary.


Remark. Here is a coordinate free definition of detα.

Pick any volume form f for V, f ≠ 0. Then

(x1, . . . , xn) ↦ f(αx1, . . . , αxn) = (fα)(x1, . . . , xn)

is also a volume form. But the space of volume forms is one-dimensional, so there is some λ ∈ F with fα = λf, and we define

detα = λ.

(Though this definition is independent of a basis, we haven't gained much, as we needed to choose a basis to say anything about it.)

Proof 2 of detAB = detAdetB. We first observe that it's true if B is an elementary column operation; that is, B = I + αEij with i ≠ j. Then detB = 1. But

detAB = detA + α detA′,

where A′ agrees with A except that its jth column is the ith column of A. Then detA′ = 0 as two of its columns are the same, so detAB = detA = detAdetB.

Next, if B is the permutation matrix P((i j)) = sij, that is, the matrix obtained from the identity matrix by swapping the ith and jth columns, then detB = −1, but Asij is A with its ith and jth columns swapped, so detAB = detAdetB.

Finally, if B is a matrix of zeroes with r ones along the leading diagonal, then if r = n, then B = I and detB = 1. If r < n, then detB = 0. But then if r < n, AB has some columns which are zero, so detAB = 0, and so the theorem is true for these B also.

Now any B ∈ Matn(F) can be written as a product of these three types of matrices. Soif B = X1 · · ·Xr is a product of these three types of matrices, then

detAB = det((AX1 · · ·Xm−1

)Xm)

= det(AX1 · · ·Xm−1) detXm

= · · · = detAdetX1 · · · detXm

= · · · = detAdet (X1 · · ·Xm)

= detAdetB.

Remark. That determinants behave well with respect to row and column operations isalso a useful way for humans (as opposed to machines!) to compute determinants.

Proposition 2.16. Let A ∈ Matn(F). Then the following are equivalent:

(i) A is invertible;

(ii) detA 6= 0;

(iii) r(A) = n.

Proof. (i) =⇒ (ii). Follows since detA−1 = 1/detA.

(iii) =⇒ (i). From the rank-nullity theorem, we have

r(A) = n ⇐⇒ kerα = {0} ⇐⇒ A invertible.


Finally we must show (ii) =⇒ (iii). If r(A) < n, then ker α ≠ {0}, so there is some Λ = (λ1, . . . , λn)^T ∈ F^n such that AΛ = 0, with λk ≠ 0 for some k. Now put

B = ( e1 ··· e_{k−1} Λ e_{k+1} ··· en ),

the identity matrix with its kth column replaced by Λ. Then det B = λk ≠ 0 (expand along the kth column), but AB is a matrix whose kth column is AΛ = 0, so det AB = 0. Since det AB = det A det B and det B ≠ 0, this forces det A = 0.

This is a horrible and unenlightening proof that detA 6= 0 implies the existence of A−1.A good proof would write the matrix coefficients of A−1 in terms of (detA)−1 and thematrix coefficients of A. We will now do this, after some showing some further propertiesof the determinant.

We can compute det A by expanding along any column or row.

Lecture 10

Definition. Let A_{ij} be the matrix obtained from A by deleting the ith row and the jth column.

Theorem 2.17

(i) Expanding along the jth column:

det A = (−1)^{j+1} a_{1j} det A_{1j} + (−1)^{j+2} a_{2j} det A_{2j} + ··· + (−1)^{j+n} a_{nj} det A_{nj}
      = (−1)^{j+1} [ a_{1j} det A_{1j} − a_{2j} det A_{2j} + a_{3j} det A_{3j} − ··· + (−1)^{n+1} a_{nj} det A_{nj} ]

(the thing to observe here is that the signs alternate!)

(ii) Expanding along the ith row:

det A = ∑_{j=1}^{n} (−1)^{i+j} a_{ij} det A_{ij}.

The proof is boring bookkeeping.

Proof. Put in the definition of A_{ij} as a sum over w ∈ S_{n−1}, and expand. We can tidy this up slightly, by writing it as follows: write A = (v1 ··· vn), so vj = ∑_i a_{ij} e_i. Then

det A = det(v1, . . . , vn) = ∑_{i=1}^{n} a_{ij} det(v1, . . . , v_{j−1}, e_i, v_{j+1}, . . . , vn)
      = ∑_{i=1}^{n} (−1)^{j−1} a_{ij} det(e_i, v1, . . . , v_{j−1}, v_{j+1}, . . . , vn),

as ε(1 2 . . . j) = (−1)^{j−1} (in class we drew a picture of this symmetric group element, and observed it had j − 1 crossings). Now e_i = (0, . . . , 0, 1, 0, . . . , 0)^T, so we pick up (−1)^{i−1} as the sign of the permutation (1 2 . . . i) that rotates the 1st through ith rows, and so we get

∑_i (−1)^{i+j−2} a_{ij} det ( 1 ∗ ; 0 A_{ij} ) = ∑_i (−1)^{i+j} a_{ij} det A_{ij}.
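To make the expansion concrete, here is a small Python sketch (an illustration, not part of the printed notes): it computes det A by expanding along the first column, exactly as in (i) with j = 1, and checks it on the 3 × 3 matrix used in Example 2.18 below.

```python
# A minimal sketch (assumed helper, not from the notes): determinant by
# cofactor expansion along the first column, as in Theorem 2.17 (i) with j = 1.
def det(A):
    """Determinant of a square matrix given as a list of lists."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for i in range(n):
        # A_{i1}: delete row i and column 1
        minor = [row[1:] for k, row in enumerate(A) if k != i]
        total += (-1) ** i * A[i][0] * det(minor)
    return total

assert det([[1, 1, 2], [0, 2, 1], [1, 0, 2]]) == 1
```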

Definition. For A ∈ Matn(F), the adjugate matrix , denoted by adjA, is thematrix with

(adjA)ij = (−1)i+j detAji.

Example 2.18.

adj ( a b ; c d ) = ( d −b ; −c a ),

adj ( 1 1 2 ; 0 2 1 ; 1 0 2 ) = ( 4 −2 −3 ; 1 0 −1 ; −2 1 2 ).

Theorem 2.19: Cramer’s rule

(adjA) ·A = A · (adjA) = (detA) · I.

Proof. We have

((adjA)A

)jk

=n∑i=1

(adjA)ji aik =n∑i=1

(−1)i+j detAijaik

Now, if we have a diagonal entry j = k then this is exactly the formula for detA in (i)above. If j 6= k, then by the same formula, this is detA ′, where A ′ is obtained fromA by replacing its jth column with the kth column of A; that is A ′ has the j and kthcolumns the same, so detA ′ = 0, and so this term is zero.

Corollary 2.20. A⁻¹ = (1/det A) adj A if det A ≠ 0.

The proof of Cramer’s rule only involved multiplying and adding, and the fact that theysatisfy the usual distributive rules and that multiplication and addition are commutative.A set in which you can do this is called a commutative ring. Examples include theintegers Z, or polynomials F[x].

So we’ve shown that if A ∈ Matn(R), where R is any commutative ring, then thereexists an inverse A−1 ∈ Matn(R) if and only if detA has an inverse in R: (detA)−1 ∈ R.For example, an integer matrix A ∈ Matn(Z) has an inverse with integer coefficients ifand only if detA = ±1.
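As a small illustration of this over the integers (Python; my own ad hoc helpers, not part of the notes or any library), here is the adjugate formula used to invert a matrix whose determinant is ±1.

```python
# A minimal sketch (assumed helpers): adj A via cofactors, (adj A)_{ij} = (-1)^{i+j} det A_{ji}.
# If det A = +1 or -1, then adj A / det A is an inverse of A with integer entries.
from fractions import Fraction

def minor(A, i, j):
    return [[A[r][c] for c in range(len(A)) if c != j]
            for r in range(len(A)) if r != i]

def det(A):
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** i * A[i][0] * det(minor(A, i, 0)) for i in range(len(A)))

def adj(A):
    n = len(A)
    return [[(-1) ** (i + j) * det(minor(A, j, i)) for j in range(n)]
            for i in range(n)]

A = [[2, 1], [1, 1]]                                   # det A = 1, invertible over Z
d = det(A)
A_inv = [[Fraction(x, d) for x in row] for row in adj(A)]
assert A_inv == [[1, -1], [-1, 2]]                     # integer inverse, as expected
```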

Moreover, the matrix coefficients of adjA are polynomials in the matrix coefficients ofA, so the matrix coefficients of A−1 are polynomials in the matrix coefficients of A andthe inverse of the function detA (which is itself a polynomial function of the matrixcoefficients of A).

That’s very nice to know.


3 Jordan normal form

In this chapter, unless stated otherwise, we take V to be a finite dimensional vectorspace over a field F, and α : V → V to be a linear map. We’re going to look at whatmatrices look like up to conjugacy; that is, what the map α looks like, given the freedomto choose a basis for V .

3.1 Eigenvectors and eigenvalues

Definition. A non-zero vector v ∈ V is an eigenvector for α : V → V if α(v) = λv,for some λ ∈ F. Then λ is called the eigenvalue associated with v, and the set

Vλ ={v ∈ V : α(v) = λv

}is called the eigenspace of λ for α, which is a subspace of V .

We observe that if ι : V → V is the identity map, then

Vλ = ker(λι− α : V → V ).

So if v ∈ Vλ, then we can choose v to be a non-zero vector if and only if ker(λι−α) 6= {0},which is equivalent to saying that λι− α is not invertible. Thus

det(λι− α) = 0,

by the results of the previous chapter.

Definition. If b1, . . . , bn is a basis of V , and A ∈ Matn(F) is a matrix of α, then

chα(x) = det(xι− α) = det(xI −A)

is the characteristic polynomial of α.

The following properties follow from the definition:

(i) The general form is

chα(x) = chA(x) = det ( x−a11  −a12  ···  −a1n ; −a21  x−a22  ⋱  ⋮ ; ⋮  ⋱  ⋱  ⋮ ; −an1  ···  ···  x−ann ) ∈ F[x].

Observe that chA(x) ∈ F [x] is a polynomial in x, equal to xn plus terms of smallerdegree, and the coefficients are polynomials in the matrix coefficients aij .

Lecture 11

For example, if A = ( a11 a12 ; a21 a22 ), then

chA(x) = x² − x (a11 + a22) + (a11 a22 − a12 a21) = x² − x tr A + det A.


(ii) Conjugate matrices have the same characteristic polynomials. Explicitly:

chPAP−1(x) = det(xI − PAP−1)

= det(P (xI −A)P−1)

= det(xI −A)

= chA(x).

(iii) For λ ∈ F, chα(λ) = 0 if and only if Vλ = {v ∈ V : α(v) = λv} ≠ {0}; that is, if and only if λ is an eigenvalue of α. This gives us a way to find the eigenvalues of a linear map.

Example 3.1. If A is upper-triangular with aii in the ith diagonal entry, then

chA(x) = ∏_{i=1}^{n} (x − aii).

It follows that the diagonal terms of an upper triangular matrix are its eigenvalues.
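As a quick numerical check of this (Python with numpy; an illustration, not part of the notes), the eigenvalues of an upper-triangular matrix are exactly its diagonal entries:

```python
# Sketch: for an upper-triangular matrix the eigenvalues are the diagonal entries.
import numpy as np

A = np.array([[2.0, 5.0, 1.0],
              [0.0, 3.0, 7.0],
              [0.0, 0.0, 2.0]])
eigenvalues = np.linalg.eigvals(A)
assert np.allclose(sorted(eigenvalues), [2.0, 2.0, 3.0])
```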

Definition. We say that chA(x) factors if it factors into linear factors; that is, if

chA(x) =r∏i=1

(x− λi)ni ,

for some ni ∈ N, λi ∈ F and λi 6= λj for i 6= j.

Examples 3.2. If we take F = C, then the fundamental theorem of algebra saysthat every polynomial f ∈ C[x] factors into linear terms.

In R, consider the rotation matrix

A =

(cos θ sin θ− sin θ cos θ

),

which has characteristic polynomial

chA(x) = x2 − 2x cos θ + 1,

which factors if and only if A = ±I and θ = 0 or π.

Definition. If F is any field, then there is some bigger field F̄, the algebraic closure of F, such that F ⊆ F̄ and every polynomial in F̄[x] factors into linear factors. This is proved in the Part II course Galois Theory.

Theorem 3.3

If A is an n× n matrix over F, then chA(x) factors if and only if A is conjugate toan upper triangular matrix.

In particular, this means that if F = F̄, such as F = C, then every matrix is conjugate to an upper triangular matrix.


We can give a coordinate free formulation of the theorem: if α : V → V is a linear map,then chα(x) factors if and only if there is some basis b1, . . . , bn of V such that the matrixof α with respect to the basis is upper triangular.

Proof. (⇐) If A is upper triangular, then chA(x) =∏

(x− aii), so done.

(⇒) Otherwise, set V = Fn, and α(x) = Ax. We induct on dimV . If dimV = n = 1,then we have nothing to prove.

As chα(x) factors, there is some λ ∈ F such that chα(λ) = 0, so there is some λ ∈ Fwith a non-zero eigenvector b1. Extend this to a basis b1, . . . , bn of V .

Now conjugate A by the change of basis matrix. (In other words, write the linear mapα, x 7→ Ax with respect to this basis bi rather than the standard basis ei).

We get a new matrix

A =

(λ ∗0 A ′

),

and it has characteristic polynomial

chA

(x) = (x− λ) chA′(x).

So if chα(x) factors, then so does chA′(x).

Now, by induction, there is some matrix P ∈ GLn−1(F) such that PA′P−1 is uppertriangular. But now (

1P

)A

(1

P−1

)=

PA ′P−1

),

proving the theorem.

Aside: what is the meaning of the matrix A ′? We can ask this question more generally.Let α : V → V be linear, and W ≤ V a subspace. Choose a basis b1, . . . , br of W , extendit to be a basis of V (add br+1, . . . , bn).

Then α(W ) ⊆W if and only if the matrix of α with respect to this basis looks like(X Z0 Y

),

where X is r × r and Y is (n− r) × (n− r), and it is clear that α∣∣W

: W → W hasmatrix X, with respect to a basis b1, . . . , br of W .

Then our question is: What is the meaning of the matrix Y ?

The answer requires a new concept, the quotient vector space.

Exercise 3.4. Consider V as an abelian group, and consider the coset groupV/W = {v +W : v ∈ V }. Show that this is a vector space, that br+1+W, . . . , bn+Wis a basis for it, and α : V → V induces a linear map α : V/W → V/W byα(v +W ) = α(v) +W (you need to check this is well-defined and linear), and thatwith respect to this basis, Y is the matrix of α.

Remark. Let V = W ′⊕W ′′; that is, W ′∩W ′′ = {0}, W ′+W ′′ = V , and suppose thatα(W ′) ⊆ W ′ and α(W ′′) ⊆ W ′′. We write this as α = α′ ⊕ α ′′, where α′ : W ′ → W ′,α ′′ : W ′′ →W ′′ are the restrictions of α.

In this special case the matrix of α looks even more special then the above for any basisb1, . . . , br of W ′ and br+1, . . . , bn of W ′′ – we have Z = 0 also.


Definition. The trace of a matrix A = (aij), denoted tr(A), is given by

tr(A) =∑i

aii,

Lemma 3.5. tr(AB) = tr(BA).

Proof. tr(AB) =∑

i(AB)ii =∑

i,j aijbji =∑

j(BA)jj = tr(BA).

Corollary 3.6. tr(PAP−1) = tr(P−1PA) = tr(A).

So we define, if α : V → V is linear, tr(α) = tr(A), where A is a matrix of α with respectto some basis b1, . . . , bn, and this doesn’t depend on the choice of a basis.

Proposition 3.7. If chα(x) factors as (x− λ1) · · · (x− λn) (repetition allowed), then

(i) trα =∑

i λi;

(ii) detα =∏i λi.

Proof. As chα factors, there is some basis b1, . . . , bn of V such that the matrix of A isupper triangular, the diagonal entries are λ1, . . . , λn, and we’re done.

Remark. This is true whatever F is. Embed F ⊆ F̄ (for example, R ⊆ C), and chA factors over F̄ as (x − λ1) ··· (x − λn). Now λ1, . . . , λn ∈ F̄, not necessarily in F. Regard A ∈ Matn(F̄), which doesn't change tr A or det A, and we get the same result. Note that ∑_i λi and ∏_i λi are in F even though the λi need not be.

Example 3.8. Take A = ( cos θ  sin θ ; −sin θ  cos θ ). Its eigenvalues (over C) are e^{iθ}, e^{−iθ}, so

tr A = e^{iθ} + e^{−iθ} = 2 cos θ,   det A = e^{iθ} · e^{−iθ} = 1.
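A quick numerical check of this example (Python with numpy; an illustration, not part of the notes): the sum and product of the eigenvalues match the trace and determinant.

```python
# Sketch: trace = sum of eigenvalues, det = product of eigenvalues, for the rotation matrix.
import numpy as np

theta = 0.7
A = np.array([[np.cos(theta),  np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])
eigs = np.linalg.eigvals(A)                      # approximately e^{i theta}, e^{-i theta}
assert np.isclose(eigs.sum(), np.trace(A)) and np.isclose(np.trace(A), 2 * np.cos(theta))
assert np.isclose(eigs.prod(), np.linalg.det(A)) and np.isclose(np.linalg.det(A), 1.0)
```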

Note that

chA(x) = (x − λ1) ··· (x − λn)
       = x^n − (∑_i λi) x^{n−1} + (∑_{i<j} λi λj) x^{n−2} − ··· + (−1)^n (λ1 ··· λn)
       = x^n − tr(A) x^{n−1} + e2 x^{n−2} − ··· + (−1)^{n−1} e_{n−1} x + (−1)^n det A,

where the coefficients e1 = tr A, e2, . . . , e_{n−1}, en = det A are functions of λ1, . . . , λn called elementary symmetric functions (which we see more of in Galois Theory).

Each of these ei depends only on the conjugacy class of A, as

chPAP−1(x) = chA(x).

Note that A and B conjugate implies chA(x) = chB(x), but the converse is false. Con-sider, for example, the matrices

A = ( 0 0 ; 0 0 ),   B = ( 0 1 ; 0 0 ),

and chA(x) = chB(x) = x², but A and B are not conjugate (the only matrix conjugate to the zero matrix is the zero matrix itself).


For an upper triangular matrix, the diagonal entries are the eigenvalues. What is themeaning of the upper triangular coefficients?

This example shows there is some information in the upper triangular entries of an upper-triangular matrix, but the question is how much? We would like to always diagonalise A, but this example shows that it isn't always possible. Let's understand when it is possible.

Lecture 12

Proposition 3.9. If v1, . . . , vk are eigenvectors with eigenvalues λ1, . . . , λk, and λi 6= λjif i 6= j, then v1, . . . , vk are linearly independent.

Proof 1. Induct on k. This is clearly true when k = 1. Now if the result is false,then there are ai ∈ F such that

∑ki=1 ai vi = 0, with some ai 6= 0, and without loss of

generality, a1 6= 0. (In fact, all ai 6= 0, as if not, we have a relation of linearly dependenceamong (k − 1) eigenvectors, contradicting our inductive assumption.)

Apply α to ∑_{i=1}^{k} ai vi = 0 to get

∑_{i=1}^{k} λi ai vi = 0.

Now multiply ∑_{i=1}^{k} ai vi = 0 by λ1, and we get

∑_{i=1}^{k} λ1 ai vi = 0.

Subtract these two, and we get

∑_{i=2}^{k} (λi − λ1) ai vi = 0,   with each λi − λ1 ≠ 0,

a relation of linear dependence among v2, . . . , vk, so ai = 0 for all i ≥ 2 by induction, a contradiction.

Proof 2. Suppose ∑ ai vi = 0. Apply α, and we get ∑ λi ai vi = 0; apply α², and we get ∑ λi² ai vi = 0, and so on, so ∑_{i=1}^{k} λi^r ai vi = 0 for all r ≥ 0. In particular,

( 1        ···   1        )   ( a1 v1 )
( λ1       ···   λk       )   (   ⋮   )  =  0.
( ⋮               ⋮        )   (   ⋮   )
( λ1^{k−1} ···   λk^{k−1} )   ( ak vk )

Lemma 3.10 (The Vandermonde determinant). The determinant of the above matrix is ∏_{i<j} (λj − λi).

Proof. Exercise!

By the lemma, if λi 6= λj , this matrix is invertible, and so (a1 v1, . . . , ak vk)T = 0; that

is, ai = 0 for all i.

Note these two proofs are the same: the first version of the proof was surreptitiouslyshowing that the Vandermonde determinant was non-zero. It looks like the first proofis easier to understand, but the second proof makes clear what’s actually going on.
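A quick numerical check of the Vandermonde formula (Python with numpy; an illustration, not part of the notes), for a few distinct values λi:

```python
# Sketch: the Vandermonde determinant equals prod_{i<j} (lambda_j - lambda_i).
import numpy as np
from itertools import combinations

lams = [1.0, 2.0, 4.0, 7.0]
V = np.array([[l ** r for l in lams] for r in range(len(lams))])   # rows 1, l, l^2, ...
product = np.prod([lj - li for li, lj in combinations(lams, 2)])
assert np.isclose(np.linalg.det(V), product)
```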


Definition. A map α is diagonalisable if there is some basis for V such that thematrix of α : V → V is diagonal.

Corollary 3.11. The map α is diagonalisable if and only if chα(x) factors as ∏_{i=1}^{r} (x − λi)^{ni} and dim Vλi = ni for all i.

Proof. (⇒) α is diagonalisable means that in some basis it is diagonal, with ni copies of λi in the diagonal entries, hence the characteristic polynomial is as claimed.

(⇐) ∑_i Vλi is a direct sum, by the proposition, so

dim(∑_i Vλi) = ∑ dim Vλi = ∑ ni = n = dim V,

by our assumption. Now in any basis which is the union of bases for the Vλi, the matrix of α is diagonal.

Corollary 3.12. If A is conjugate to an upper triangular matrix with λi as the diagonalentries, and the λi are distinct, then A is conjugate to the diagonal matrix with λi inthe entries.

Example 3.13.

( 1 7 ; 0 2 ) is conjugate to ( 1 0 ; 0 2 ).

The upper triangular entries “contain no information”. That is, they are an artefact of the choice of basis.
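Here is a small Python sketch (an illustration under the obvious numerical assumptions, not part of the notes) of the criterion in Corollary 3.11: compare geometric and algebraic multiplicities.

```python
# Sketch: A is diagonalisable iff, for each eigenvalue, dim V_lambda equals its
# algebraic multiplicity in ch_A.
import numpy as np

def is_diagonalisable(A, tol=1e-9):
    n = A.shape[0]
    eigenvalues = np.linalg.eigvals(A)
    for lam in np.unique(np.round(eigenvalues, 6)):
        algebraic = np.sum(np.isclose(eigenvalues, lam))
        geometric = n - np.linalg.matrix_rank(A - lam * np.eye(n), tol=tol)
        if geometric != algebraic:
            return False
    return True

assert is_diagonalisable(np.array([[1.0, 7.0], [0.0, 2.0]]))       # distinct eigenvalues
assert not is_diagonalisable(np.array([[0.0, 1.0], [0.0, 0.0]]))   # the Jordan block J_2
```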

Remark. If F = C, then the diagonalisable A are dense in Matn(C) = C^{n²} (exercise). In general, if F = F̄, then the diagonalisable A are dense in Matn(F) = F^{n²}, in the sense of algebraic geometry.

Exercise 3.14. If A is upper triangular with diagonal entries λ1, . . . , λn, so that A e_i = λi e_i + ∑_{j<i} a_{ji} e_j, show that if λ1, . . . , λn are distinct, then you can “correct” each e_i to an eigenvector v_i just by adding smaller terms; that is, there are p_{ji} ∈ F such that

v_i = e_i + ∑_{j<i} p_{ji} e_j   has   A v_i = λi v_i,

which gives yet another proof of our proposition.


3.2 Cayley-Hamilton theorem

Let α : V → V be a linear map, and V a finite dimensional vector space over F.

Theorem 3.15: Cayley-Hamilton theorem

Every square matrix over a commutative ring (such as R or C) satisfies chA(A) = 0.

Example 3.16. A = ( 2 3 ; 1 2 ) and chA(x) = x² − 4x + 1, so chA(A) = A² − 4A + I. Then

A² = ( 7 12 ; 4 7 ),

which does equal 4A − I.
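A direct numerical verification of this example (Python with numpy; an illustration, not part of the notes):

```python
# Sketch: ch_A(A) = A^2 - 4A + I should be the zero matrix for this A.
import numpy as np

A = np.array([[2.0, 3.0], [1.0, 2.0]])
assert np.allclose(A @ A - 4 * A + np.eye(2), 0)
```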

Remark. We have

chA(x) = det(xI −A) = det

x− a11 · · · −a1n...

. . ....

−an1 · · · x− ann

= xn − e1xn−1 + · · · ± en,

so we don’t get a proof by saying chα(α) = det(α − α) = 0. This just doesn’t makesense. However, you can make it make sense, and our second proof will do this.

Proof 1. If A is upper triangular,

A = ( λ1 ∗ ; ⋱ ; 0 λn ),

then the characteristic polynomial is chA(x) = ∏_{i=1}^{n} (x − λi). Now if A were in fact diagonal, then

chA(A) = ∏_{i=1}^{n} (A − λiI) = diag(0, ∗, . . . , ∗) · diag(∗, 0, ∗, . . . , ∗) ··· diag(∗, . . . , ∗, 0) = 0,

since the ith factor has a zero in its ith diagonal entry, so the product kills every coordinate. But even when A is only upper triangular, this product is zero.

Example 3.17. The product

( 0 ∗ ∗ ; 0 ∗ ∗ ; 0 0 ∗ ) ( ∗ ∗ ∗ ; 0 0 ∗ ; 0 0 ∗ ) ( ∗ ∗ ∗ ; 0 ∗ ∗ ; 0 0 0 )

of upper triangular matrices, with a zero in the first, second and third diagonal entry respectively, is still zero.


Here is a nicer way of writing this:

Let W0 = {0}, Wi = 〈e1, . . . , ei〉 ≤ V . If A is upper triangular, then AWi ⊆ Wi, andeven (A− λiI)Wi ⊆Wi−1. So (A− λnI)Wn ⊆Wn−1, and so

(A− λn−1I) (A− λnI)Wn ⊆ (A− λn−1I)Wn−1 ⊆Wn−2,

and so on, untiln∏i=1

(A− λiI)Wn ⊆W0 = {0};

that is, chA(A) = 0.

If F = F̄, then we can choose a basis for V such that α : V → V has an upper-triangular matrix with respect to this basis, and hence the above shows chA(A) = 0; that is, chA(A) = 0 for all A ∈ Matn(F).

If F ⊆ F̄, then as Cayley-Hamilton is true for all A ∈ Matn(F̄), it is certainly still true for A ∈ Matn(F) ⊆ Matn(F̄).

Definition. The generalised eigenspace with eigenvalue λ is given by

V λ ={v ∈ V : (α− λI)dimV (v) = 0

}= ker (λI − α)dimV : V → V.

Note that Vλ ⊆ V λ.

Example 3.18. Let A = ( λ ∗ ; ⋱ ; 0 λ ) be upper triangular with every diagonal entry equal to λ.

Then (λI − A) e_i involves only e1, . . . , e_{i−1}, so (λI − A)^{dim V} e_i = 0 for all i, as in our proof of Cayley-Hamilton (or indeed, by Cayley-Hamilton).

Further, if µ ≠ λ, then

µI − A = ( µ−λ ∗ ; ⋱ ; 0 µ−λ ),   and so   (µI − A)^n = ( (µ−λ)^n ∗ ; ⋱ ; 0 (µ−λ)^n )

has non-zero diagonal terms, and so trivial kernel. Thus in this case V^λ = V, V^µ = 0 if µ ≠ λ, and in general V^µ = 0 if chα(µ) ≠ 0; that is, ker(A − µI)^N = {0} for all N ≥ 0.


Theorem 3.19

Lecture 13

If chA(x) = ∏_{i=1}^{r} (x − λi)^{ni}, with the λi distinct, then

V ≅ ⊕_{i=1}^{r} V^{λi},

and dim V^{λi} = ni. In other words, choose any basis of V which is the union of bases of the V^{λi}. Then the matrix of α is block diagonal. Moreover, we can choose the basis of each V^{λi} so that each diagonal block is upper triangular, with only one eigenvalue — λi — on its diagonal.

We say “different eigenvalues don’t interact”.

Remark. If n1 = n2 = · · · = nr = 1 (and so r = n), then this is our previous theoremthat matrices with distinct eigenvalues are diagonalisable.

Proof. Consider

h_{λi}(x) = ∏_{j≠i} (x − λj)^{nj} = chα(x) / (x − λi)^{ni}.

Then define

W^{λi} = Im( h_{λi}(A) : V → V ) ≤ V.

Now Cayley-Hamilton implies that

(A − λiI)^{ni} h_{λi}(A) = chA(A) = 0,

that is, W^{λi} ⊆ ker(A − λiI)^{ni} ⊆ ker(A − λiI)^{n} = V^{λi}.

We want to show that

(i) ∑_i W^{λi} = V;

(ii) this sum is direct.

Now, the h_{λi} are coprime polynomials, so Euclid's algorithm implies that there are polynomials fi ∈ F[x] such that

∑_{i=1}^{r} fi h_{λi} = 1,   and so   ∑_{i=1}^{r} h_{λi}(A) fi(A) = I in End(V).

Now, if v ∈ V, then this gives

v = ∑_{i=1}^{r} h_{λi}(A) fi(A) v,   with each term h_{λi}(A) fi(A) v ∈ W^{λi};

that is, ∑_{i=1}^{r} W^{λi} = V. This is (i).


To see the sum is direct: if 0 = ∑_{i=1}^{r} wi with wi ∈ W^{λi}, then we want to show that each wi = 0. But h_{λj}(A) wi = 0 for i ≠ j, as wi ∈ ker(A − λiI)^{ni} and (x − λi)^{ni} divides h_{λj}, so (i) gives

wi = ∑_{j=1}^{r} h_{λj}(A) fj(A) wi = fi(A) h_{λi}(A) wi;

so apply fi(A) h_{λi}(A) to ∑_{i=1}^{r} wi = 0 and get wi = 0.

Now define πi = fi(A) h_{λi}(A) = h_{λi}(A) fi(A). We showed that πi : V → V has

Im πi = W^{λi} ⊆ V^{λi}   and   πi|_{W^{λi}} = identity, and so πi² = πi;

that is, πi is the projection to W^{λi}. Compare with h_{λi}(A) = hi, which has hi(V) = W^{λi} ⊆ V^{λi} and hi|_{V^{λi}} an isomorphism, but not the identity; that is, fi|_{V^{λi}} = hi⁻¹|_{V^{λi}}.

This tells us that to understand what matrices look like up to conjugacy, it is enough to understand matrices with a single eigenvalue λ, and by subtracting λI from our matrix we may as well assume that eigenvalue is zero.

Before we continue investigating this, we digress and give another proof of Cayley-Hamilton.

Proof 2 of Cayley-Hamilton. Let ϕ : V → V be linear, V finite dimensional over F. Pick a basis e1, . . . , en of V, so ϕ(ei) = ∑_j a_{ji} ej, and we have the matrix A = (a_{ij}). Consider

ϕI − A^T = ( ϕ−a11  ···  −a_{n1} ; ⋮  ⋱  ⋮ ; −a_{1n}  ···  ϕ−a_{nn} ) ∈ Matn(End(V)),

where a_{ij} ∈ F ↪ End(V) by regarding an element λ as the operation of scalar multiplication V → V, v ↦ λv. The elements of Matn(End(V)) act on V^n by the usual formulas. So

(ϕI − A^T) (e1, . . . , en)^T = (0, . . . , 0)^T

by the definition of A.

The problem is, it isn’t clear how to define det : Matn(End(V )) → End(V ), as thematrix coefficients (that is, elements of End(V )) do not commute in general. But thematrix elements of the above matrix do commute, so this shouldn’t be a problem.

To make it not a problem, consider ϕI−AT ∈ Matn(F [ϕ]); that is, F [ϕ] are polynomialsin the symbol ϕ. This is a commutative ring and now det behaves as always:

(i) det(ϕI −AT ) = chA(ϕ) ∈ F [ϕ] (by definition);

(ii) adj(ϕI −AT ) · (ϕI −AT ) = det(ϕI −AT ) · I ∈ Matn(F [ϕ]), as we’ve shown.

This is true for any B ∈ Matn(R), where R is a commutative ring. Here R = F [ϕ],B = ϕI −AT .

Make F [ϕ] act on V , by∑

i ai ϕi : v 7→

∑i ai ϕ

i(v), so

(ϕI −AT )

e1...en

=

0...0

.


Thus

0 = adj(ϕI − A^T) · (ϕI − A^T) (e1, . . . , en)^T = det(ϕI − A^T) (e1, . . . , en)^T = (chA(ϕ) e1, . . . , chA(ϕ) en)^T.

So this says that chA(A) ei = chA(ϕ) ei = 0 for all i, so chA(A) : V → V is the zero map, as e1, . . . , en is a basis of V; that is, chA(A) = 0.

This correct proof is as close to the nonsense tautological “proof” (just set x equal toA) as you can hope for. You will meet it again several times in later life, where it iscalled Nakayama’s lemma.

3.3 Combinatorics of nilpotent matrices

Lecture 14

Definition. If ϕ : V → V can be written in block diagonal form; that is, if there are some W′, W′′ ≤ V such that

ϕ(W′) ⊆ W′,   ϕ(W′′) ⊆ W′′,   V = W′ ⊕ W′′,

then we say that ϕ is decomposable and write

ϕ = ϕ′ ⊕ ϕ′′,   where ϕ′ = ϕ|_{W′} : W′ → W′ and ϕ′′ = ϕ|_{W′′} : W′′ → W′′.

We say that ϕ is the direct sum of ϕ′ and ϕ′′. Otherwise, we say that ϕ is indecomposable.

Examples 3.20.

(i) ϕ = ( 0 0 ; 0 0 ) = (0 : F → F) ⊕ (0 : F → F) is decomposable.

(ii) ϕ = ( 0 1 ; 0 0 ) : F² → F² is indecomposable, as 〈e1〉 is the unique ϕ-stable line, so V cannot be the direct sum of two ϕ-stable lines.

(iii) If ϕ : V → V has chϕ(x) = ∏_{i=1}^{r} (x − λi)^{ni} with λi ≠ λj for i ≠ j, then V = ⊕_{i=1}^{r} V^{λi} decomposes ϕ into pieces ϕ_{λi} = ϕ|_{V^{λi}} : V^{λi} → V^{λi} such that each ϕ_{λi} has only one eigenvalue, λi.

This decomposition is exactly the amount of information in chϕ(x). So weneed new information to further understand what matrices are up to conju-gacy.

Observe that ϕλ is decomposable if and only if ϕλ − λI is, and the onlyeigenvalue of ϕλ − λI is zero.

Definition. The map ϕ is nilpotent if the following (equivalent) conditions apply:

• ϕ^{dim V} = 0;

• ker ϕ^{dim V} = V;

• V^0 = V;

• chϕ(x) = x^{dim V}, and the only eigenvalue is zero.


Theorem 3.21

Let ϕ be nilpotent. Then ϕ is indecomposable if and only if there is a basis v1, . . . , vn such that

ϕ(vi) = 0 if i = 1,   and   ϕ(vi) = v_{i−1} if i > 1;

that is, if the matrix of ϕ is

Jn = ( 0 1 ; ⋱ ⋱ ; ⋱ 1 ; 0 ),

the n × n matrix with 1s on the superdiagonal and 0s elsewhere. This is the Jordan block of size n with eigenvalue 0.

Definition. The Jordan block of size n, eigenvalue λ, is given by

Jn(λ) = λI + Jn.

Theorem 3.22: Jordan normal form

Every matrix is conjugate to a direct sum of Jordan blocks. Moreover, these are unique up to rearranging their order.

Proof. Observe that theorem 3.22 =⇒ theorem 3.21, if we show that Jn is indecom-posable (and theorem 3.21 =⇒ theorem 3.22, existence).

[Proof of Theorem 3.21, ⇐] Put Wi = 〈v1, . . . , vi〉.

Then Wi is the only ϕ-invariant subspace of V of dimension i, so the only candidate for a complement of Wi is W_{n−i}; but W_{n−i} is not a complement, as Wi ∩ W_{n−i} = W_{min(i,n−i)} ≠ {0} for 0 < i < n. Hence Jn is indecomposable.

Proof of Theorem 3.22, uniqueness. Suppose α : V → V is nilpotent and α = ⊕_{i=1}^{r} J_{ki}. Rearrange their order so that k1 ≥ k2 ≥ ··· ≥ kr, and group equal blocks together, so

α = ⊕_i mi Ji,

where there are mi blocks of size i; that is,

mi = #{ a | ka = i }.   (∗)

Example 3.23. If (k1, k2, . . . ) = (3, 3, 2, 1, 1, 1), then n = 11, and m1 = 3,m2 =1,m3 = 2, and ma = 0 for a > 3. (It is customary to omit the zero entries whenlisting these numbers).

Definition. Let Pn = {(k1, k2, . . . , kn) ∈ Nn | k1 ≥ k2 ≥ · · · ≥ kn ≥ 0,∑ki = n}

be the set of partitions of n. This is isomorphic to the set {m : N→ N |∑

i im(i) =n} as above.


We represent k ∈ Pn by a picture, with a row of length kj for each j (equivalently, withmi rows of length i). For example, the above partition (3, 3, 2, 1, 1, 1) has picture

k =

X X XX X XX XXXX,

Now define kT , if k ∈ Pn, the dual partition, to be the partition attached to the trans-posed diagram.

In the above example kT = (6, 3, 2).

It is clear that k determines kT . In formulas:

kT = (m1 +m2 +m3 + · · ·+mn,m2 +m3 + · · ·+mn, . . . ,mn)

Now, let α : V → V, and α = ⊕_{i=1}^{r} J_{ki} = ⊕_i mi Ji as before. Observe that

dim ker α = # of Jordan blocks = r = ∑_{i=1}^{n} mi = (k^T)_1,

dim ker α² = ∑_{i=1}^{n} mi + ∑_{i=2}^{n} mi = (k^T)_1 + (k^T)_2   (each block of size 1 contributes 1, each block of size ≥ 2 contributes 2),

⋮

dim ker α^n = ∑_{k=1}^{n} ∑_{i=k}^{n} mi = ∑_{i=1}^{n} (k^T)_i = n.

That is, dim kerα, . . . , dim kerαn determine the dual partition to the partition of n intoJordan blocks, and hence determine it.

It follows that the decomposition α =⊕n

i=1miJi is unique.

Remark. This is a practical way to compute the JNF of a matrix A: first compute chA(x) = ∏_{i=1}^{r} (x − λi)^{ni}, then for each eigenvalue λi compute ker(A − λiI), ker(A − λiI)², . . . , ker(A − λiI)^n.
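Here is a small Python sketch of this remark (an illustration under the obvious numerical assumptions, not part of the notes): for a nilpotent matrix, the dimensions d_k = dim ker A^k determine the dual partition, and hence the Jordan block sizes.

```python
# Sketch: recover the Jordan block sizes of a nilpotent matrix from d_k = dim ker A^k.
import numpy as np

def nilpotent_block_sizes(A, tol=1e-9):
    n = A.shape[0]
    d = [0]                                      # d[k] = dim ker A^k
    P = np.eye(n)
    for _ in range(n):
        P = P @ A
        d.append(n - np.linalg.matrix_rank(P, tol=tol))
    # number of blocks of size >= k is d[k] - d[k-1]; size exactly k follows
    sizes = []
    for k in range(1, n + 1):
        exactly_k = (d[k] - d[k - 1]) - (d[min(k + 1, n)] - d[min(k, n)])
        sizes += [k] * exactly_k
    return sorted(sizes, reverse=True)

# J_2 + J_1: one block of size 2, one of size 1
A = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
assert nilpotent_block_sizes(A) == [2, 1]
```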

Corollary 3.24. The number of nilpotent conjugacy classes is equal to the size of Pn.

Exercises 3.25.

(i) List all the partitions of n = 5; show there are 7 of them.

(ii) Show that the size of Pn is the coefficient of x^n in

∏_{i≥1} 1/(1 − x^i) = (1 + x + x² + x³ + ···)(1 + x² + x⁴ + x⁶ + ···)(1 + x³ + x⁶ + x⁹ + ···) ··· = ∏_{k≥1} ∑_{i=0}^{∞} x^{ki}.
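A quick computational check of these exercises (Python; an illustration, not part of the notes), counting partitions by building up the product of geometric series one part-size at a time:

```python
# Sketch: count partitions of n, equivalent to expanding prod_i 1/(1 - x^i).
def partitions(n):
    # ways[m] = number of partitions of m using the part sizes considered so far
    ways = [1] + [0] * n
    for part in range(1, n + 1):
        for m in range(part, n + 1):
            ways[m] += ways[m - part]
    return ways[n]

assert partitions(5) == 7
assert partitions(11) == 56
```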


Theorem 3.26: Jordan Normal Form

Lecture 15

Every matrix is conjugate to a direct sum of Jordan blocks

Jn(λ) = ( λ 1 ; ⋱ ⋱ ; ⋱ 1 ; 0 λ ).

Proof. It is enough to show this when ϕ : V → V has a single generalised eigenspacewith eigenvalue λ, and now replacing ϕ by ϕ− λI, we can assume that ϕ is nilpotent.

Induct on n = dimV . The case n = 1 is clear.

Consider V ′ = Imϕ = ϕ(V ). Then V ′ 6= V , as ϕ is nilpotent, and ϕ(V ′) ⊆ ϕ(V ) = V ′,and ϕ | V ′ : V ′ → V ′ is obviously nilpotent, so induction gives the existence of a basis

e1, . . . , e_{k1} (giving J_{k1}),   e_{k1+1}, . . . , e_{k1+k2} (giving J_{k2}),   . . . ,   . . . , e_{k1+···+kr} (giving J_{kr}),

such that ϕ|V′ is in JNF with respect to this basis.

Because V ′ = Imϕ, it must be that the tail end of these strings is in Imϕ; that is, thereexist b1, . . . , br ∈ V \V ′ such that ϕ(bi) = ek1+···+ki , as ek1+···+ki 6∈ ϕ(V ′). Notice theseare linearly independent, as if

∑λi bi = 0, then∑

λi ϕ(bi) =∑

λi ek1+···+ki = 0.

But ek1 , . . . , ek1+...+kr are linearly independent, hence λ1 = · · · = λr = 0. Even better:{ej , bi | j ≤ k1 + . . .+ kr, 1 ≤ i ≤ r} are linearly independent. (Proof: exercise.)

Finally, extend e1, ek1+1, . . . , ek1+...+kr−1+1︸ ︷︷ ︸basis of kerϕ ∩ Imϕ

to a basis of kerϕ, by adding basis vectors.

Denote these by q1, . . . , qs. Exercise. Show {ej , bi, qk} are linearly independent. (∗∗)

Now, the rank-nullity theorem shows that dim Imϕ+ dim kerϕ = dimV . But dim Imϕis the number of the ei, that is k1 + · · · + kr, and dim kerϕ is the number of Jordanblocks, which is r+ s, (r is the number of blocks of size greater than one, s the numberof size one), which is the number of bi plus the number of qk.

So this shows that ej , bi, qk are a basis of V , and hence with respect to this basis,

ϕ = Jk1+1 ⊕ · · · ⊕ Jkr+1 ⊕ J1 ⊕ · · · ⊕ J1︸ ︷︷ ︸s times


3.4 Applications of JNF

Definition. Suppose α : V → V . The minimum polynomial of α is a monicpolynomial p(x) of smallest degree such that p(α) = 0.

Lemma 3.27. If q(x) ∈ F[x] and q(α) = 0, then p | q.

Proof. Write q = pa+ r, with a, r ∈ F [x] and deg r < deg p. Then

0 = q(α) = p(α) a(α) + r(α) = r(α) =⇒ r(α) = 0,

which contradicts deg p as minimal unless r = 0.

As chα(α) = 0, p(x) | chα(x), and in particular the minimum polynomial exists. (And by our lemma, it is unique.) Here is a cheap proof that the minimum polynomial exists, which doesn't use Cayley-Hamilton.

Proof. I, α, α², . . . , α^{n²} are n² + 1 elements of End V, a vector space of dimension n². Hence there must be a relation of linear dependence ∑_{i=0}^{n²} ai α^i = 0 (with not all ai zero), so q(x) = ∑_{i=0}^{n²} ai x^i is a non-zero polynomial with q(α) = 0.

Now, let's use JNF to determine the minimal polynomial.

Exercise 3.28. Let A ∈ Matn(F), chA(x) = (x − λ1)n1 · · · (x − λr)nr . Supposethat the maximal size of a Jordan block with eigenvalue λi is ki. (So ki ≤ ni for alli). Show that the minimum polynomial of A is (x− λ1)k1 · · · (x− λr)kr .

So the minimum polynomial forgets most of the structure of the Jordan normal form.

Another application of JNF is that we can use it to compute powers B^m of a matrix B for any m ≥ 0. First observe that (PAP⁻¹)^m = P A^m P⁻¹. Now write B = PAP⁻¹ with A in Jordan normal form. So to finish we must compute what the powers of Jordan blocks look like. But Jn has 1s on the superdiagonal, Jn² has 1s on the second superdiagonal, and so on: Jn^k has 1s on the kth superdiagonal and 0s elsewhere, so Jn^{n−1} has a single 1 in the top-right corner and Jn^n = 0. Hence

(λI + Jn)^a = ∑_{k≥0} (a choose k) λ^{a−k} Jn^k,

a finite sum, since Jn^k = 0 for k ≥ n.

Now assume F = C.

Definition. expA =∑n≥0

An

n!, A ∈ Matn(C).

This is an infinite sum, and we must show it converges. This means that each matrixcoefficient converges. This is very easy, but we omit here for lack of time.


Example 3.29. For a diagonal matrix:

exp

λ1 0. . .

0 λn

=

eλ1 0

. . .

0 eλn

and convergence is usual convergence of exp.

Exercises 3.30.

(i) If AB = BA, then exp(A+B) = expA expB.

(ii) Hence exp(Jn + λI) = eλ exp(Jn)

(iii) P · exp(A) · P−1 = exp(PAP−1)

So now you know how to compute exp(A), for A ∈ Matn(C).
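Here is a small Python sketch of that recipe (an illustration, not part of the notes): exp of a Jordan block λI + Jn is a finite sum, and it agrees with a general-purpose matrix exponential (scipy's expm is used only as a cross-check).

```python
# Sketch: exp(lambda*I + J_n) = e^lambda * sum_{k<n} J_n^k / k!, since J_n^n = 0.
import numpy as np
from math import factorial

def exp_jordan_block(lam, n):
    J = np.diag(np.ones(n - 1), 1)            # J_n: ones on the superdiagonal
    S = sum(np.linalg.matrix_power(J, k) / factorial(k) for k in range(n))
    return np.exp(lam) * S

from scipy.linalg import expm                 # optional cross-check
A = 2.0 * np.eye(3) + np.diag(np.ones(2), 1)  # the Jordan block J_3(2)
assert np.allclose(exp_jordan_block(2.0, 3), expm(A))
```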

We can use this to solve linear ODEs with constant coefficients:

Consider the linear ODEdy

dt= Ay,

for A ∈ Matn(C), y = (y1(t), y2(t), . . . , yn(t))T , yi(t) ∈ C∞(C).

Example 3.31. Consider

d^n z/dt^n + c_{n−1} d^{n−1}z/dt^{n−1} + ··· + c0 z = 0.   (∗∗)

This is a particular case of the above, where A is the matrix

A = ( 0 1 ; 0 1 ; ⋱ ⋱ ; 0 1 ; −c0 −c1 ··· −c_{n−1} ).

To see this, consider what y′ = Ay means. Set z = y1; then y2 = y1′ = z′, y3 = y2′ = z′′, . . . , yn = y_{n−1}′ = d^{n−1}z/dt^{n−1}, and (∗∗) is the last equation.

There is a unique solution of dydt = Ay with fixed initial conditions y(0), by a theorem

of analysis. On the other hand:

Exercise 3.32. exp(At) y(0) is a solution, that is

d

dt

(exp(At) y(0)

)= A exp(At) y(0)

Hence it is the unique solution with value y(0).

Compute this when A = λI + Jn is a Jordan block of size n.


4 Duals

This chapter really belongs after chapter 1 – it's just definitions and interpretations of row reduction.

Lecture 16

Definition. Let V be a vector space over a field F. Then

V ∗ = L(V,F) = {linear functions V → F}

is the dual space of V .

Examples 4.1.

(i) Let V = R3. Then (x, y, z) 7→ x− y is in V ∗.

(ii) If V = C([0, 1]) =⟨continuous functions [0, 1]→ R

⟩, then f 7→

∫ 10 f(t) dt is

in C([0, 1])∗.

Definition. Let V be a finite dimensional vector space over F, and v1, . . . , vn bea basis of V . Then define v∗i ∈ V ∗ by

v∗i (vj) = δij =

{0 if i 6= j,

1 if i = j,

and extend linearly. That is, v∗i

(∑j λj vj

)= λi.

Lemma 4.2. The set v∗1, . . . , v∗n is a basis for V ∗, called the basis dual to or dual basis

for v1, . . . , vn. In particular, dimV ∗ = dimV .

Proof. Linear independence: if ∑ λi v_i^∗ = 0, then 0 = (∑ λi v_i^∗)(vj) = λj, so λj = 0 for all j. Span: if ϕ ∈ V∗, then we claim

ϕ = ∑_{j=1}^{n} ϕ(vj) · v_j^∗.

As ϕ is linear, it is enough to check that the right hand side applied to vk is ϕ(vk). But ∑_j ϕ(vj) v_j^∗(vk) = ∑_j ϕ(vj) δ_{jk} = ϕ(vk).
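A concrete way to see the dual basis (Python with numpy; an illustration, not part of the notes): if the basis vectors are the columns of an invertible matrix B, then the dual basis functionals are the rows of B⁻¹.

```python
# Sketch: rows of B^{-1} are the dual basis functionals, since (B^{-1} B)_{ij} = delta_{ij}.
import numpy as np

B = np.array([[1.0, 1.0], [0.0, 2.0]])      # columns are v_1, v_2
dual = np.linalg.inv(B)                     # row i represents the functional v_i^*
assert np.allclose(dual @ B, np.eye(2))     # v_i^*(v_j) = delta_{ij}
```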

Remarks.

(i) We know in general that dimL(V,W ) = dimV dimW .

(ii) If V is finite dimensional, then this shows that V ∼= V ∗, as any two vector spacesof dimension n are isomorphic. But they are not canonically isomorphic (there isno natural choice of isomorphism).

If the vector space V has more structure (for example, a group G acts upon it),then V and V ∗ are not usually isomorphic in a way that respects this structure.

(iii) If V = F [x], then V ∗∼−→ FN by the isomorphism θ ∈ V ∗ 7→ (θ(1), θ(x), θ(x2), . . .),

and conversely, if λi ∈ F, i = 0, 1, 2, . . . is any sequence of elements of F, we getan element of V ∗ by sending

∑ai x

i 7→∑ai λi (notice this is a finite sum).

Thus V and V ∗ are not isomorphic, since dimV is countable, but dimFN is un-countable.


Definition. Let V and W be vector space over F, and α a linear map V → W ,α ∈ L(V,W ). Then we define α∗ : W ∗ → V ∗ ∈ L(W ∗, V ∗), by setting α∗(θ) = θα :

V → F.

(Note: α linear, θ linear implies θα linear, and so α∗(θ) ∈ V ∗ as claimed, if θ ∈W ∗.)

Lemma 4.3. Let V,W be finite dimensional vector spaces, with

v1, . . . , vn a basis of V , and w1, . . . , wm a basis for W ;

v∗1, . . . , v∗n the dual basis of V ∗, and w∗1, . . . , w

∗m the dual basis for W ∗;

If α is a linear map V → W , and A is the matrix of α with respect to vi, wj, then AT

is the matrix of α∗ : W ∗ → V ∗ with respect to w∗j , v∗i .

Proof. Write α∗(w_i^∗) = ∑_{j=1}^{n} c_{ji} v_j^∗, so C = (c_{ij}) is the matrix of α∗. Apply both sides to vk:

LHS = (α∗(w_i^∗))(vk) = w_i^∗(α(vk)) = w_i^∗(∑_ℓ a_{ℓk} w_ℓ) = a_{ik},
RHS = c_{ki};

that is, c_{ki} = a_{ik} for all i, k, so C = A^T.

This was the promised interpretation of AT .

Corollary 4.4.

(i) (αβ)∗ = β∗α∗;

(ii) (α+ β)∗ = α∗ + β∗;

(iii) detα∗ = detα

Proof. (i) and (ii) are immediate from the definition, or use the result (AB)T = BTAT .(iii) we proved in the section on determinants where we showed that detAT = detA.

Now observe that (AT )T

= A. What does this mean?

Proposition 4.5.

(i) Consider the map V → V ∗∗ = (V ∗)∗ taking v 7→ ˆv, where ˆv(θ) = θ(v) if θ ∈ V ∗.Then ˆv ∈ V ∗∗, and the map V 7→ V ∗∗ is linear and injective.

(ii) Hence if V is a finite dimensional vector space over F, then this map is an iso-morphism, so V

∼−→ V ∗∗ canonically.

Proof.

(i) We first show ˆv ∈ V ∗∗, that is ˆv : V ∗ → F, is linear:

ˆv(a1θ1 + a2θ2) = (a1θ1 + a2θ2)(v) = a1 θ1(v) + a2 θ2(v)

= a1ˆv(θ1) + a2

ˆv(θ2).

Next, the map V → V ∗∗ is linear. This is because

(λ1v1 + λ2v2)ˆ(θ) = θ (λ1v1 + λ2v2)

= λ1 θ(v1) + λ2 θ(v2)

=(λ1

ˆv1 + λ2ˆv2

)(θ).

Finally, if v 6= 0, then there exists a linear function θ : V → F such that θ(v) 6= 0.


(Proof: extend v to a basis, and then define θ on this basis. We’ve only provedthat this is okay when V is finite dimensional, but it’s always okay.)

Thus ˆv(θ) 6= 0, so ˆv 6= 0, and V → V ∗∗ is injective.

(ii) Immediate.

Definition.

(i) If U ≤ V , then define

U◦ ={θ ∈ V ∗ | θ(U) = 0

}={θ ∈ V ∗ | θ(u) = 0 ∀u ∈ U

}≤ V ∗.

This is the annihilator of U , a subspace of V ∗, often denoted U⊥.

(ii) If W ≤ V ∗, then define

◦W ={v ∈ V | ϕ(v) = 0 ∀ϕ ∈W

}≤ V.

This is often denoted ⊥W .

Example 4.6. If V = R3, U =⟨(1, 2, 1)

⟩, then

U◦ =

3∑i=1

ai e∗i ∈ V ∗ | a1 + 2a2 + a3 = 0

=

⟨−210

,

01−2

⟩ .Remark. If V is finite dimensional, and W ≤ V ∗, then under the canonical isomorphismV → V ∗∗, we have ◦W 7→W ◦, where ◦W ≤ V and (W ◦) ≤ (V ∗)∗. Proof is an exercise.

Lemma 4.7. Let V be a finite dimensional vector space with U ≤ V . Then

dimU + dimU◦ = dimV.

Proof. Consider the restriction map Res : V ∗ → U∗ taking ϕ 7→ ϕ|U . (Note thatRes = ι∗, where ι : U ↪→ V is the inclusion.)

Then ker Res = U◦, by definition, and Res is surjective (why?).

So the rank-nullity theorem implies the result, as dimV ∗ = dimV .

Proposition 4.8. Let V,W be a finite dimensional vector space over F, with α ∈L(V,W ). Then

(i) ker(α∗ : W ∗ → V ∗) = (Imα)◦ (≤W ∗);

(ii) rk(α∗) = rk(α); that is, rkAT = rkA, as promised;

(iii) Imα∗ = (kerα)◦.

Proof.

(i) Let θ ∈ W ∗. Then θ ∈ kerα∗ ⇐⇒ θα = 0 ⇐⇒ θα(v) = 0 ∀v ∈ V ⇐⇒ θ ∈(Imα)◦.

(ii) By rank-nullity, we have

rkα∗ = dimW − dim kerα∗

= dimW − dim(Imα)◦,by (i),

= dim Imα, by the previous lemma,

= rkα, by definition.


(iii) Let ϕ ∈ Imα∗, and then ϕ = θ ◦ α for some θ ∈ W ∗. Now, let v ∈ kerα. Thenϕ(v) = θα(v) = 0, so ϕ ∈ (kerα)◦; that is, Imα∗ ⊆ (kerα)◦.

But by (ii),

dim Imα∗ = rk(α∗) = rkα = dimV − dim kerα

= dim (kerα)◦

by the previous lemma; that is, they both have the same dimension, so they areequal.

Lemma 4.9. Let U1, U2 ≤ V , and V be finite dimensional. Then

(i) U◦◦1∼−→ ◦(U◦1 )

∼−→ U1 under the isomorphism V∼−→ V ∗∗.

(ii) (U1 + U2)◦ = U◦1 ∩ U◦2 .

(iii) (U1 ∩ U2)◦ = U◦1 + U◦2 .

Proof. Exercise!


5 Bilinear forms

Lecture 17

Definition. Let V be a vector space over F. A bilinear form on V is a multilinear form V × V → F; that is, ψ : V × V → F such that

ψ(v, a1w1 + a2w2) = a1 ψ(v, w1) + a2 ψ(v, w2)

ψ(a1v1 + a2v2, w) = a1 ψ(v1, w) + a2 ψ(v2, w).

Examples 5.1.

(i) V = Fn, ψ((x1, . . . , xn), (y1, . . . , yn)

)=∑n

i=1 xi yi, which is the dot productof F = Rn.

(ii) V = Fn, A ∈ Matn(F). Define ψ(v, w) = vTAw. This is bilinear.

(i) is the special case when A = I. Another special case is A = 0, which isalso a bilinear form.

(iii) Take V = C([0, 1]), the set of continuous functions on [0, 1]. Then

(f, g) 7→∫ 1

0f(t) g(t) dt

is bilinear.

Definition. The set of bilinear forms of V is denoted

Bil(V ) = {ψ : V × V → F bilinear}

Exercise 5.2. If g ∈ GL(V ), ψ ∈ Bil(V ), then gψ : (v, w) 7→ ψ(g−1v, g−1w) is abilinear form. Show this defines a group action of GL(V ) on Bil(V ). In particular,show that h(gψ) = (hg)ψ, and you’ll see why the inverse is in the definition of gψ.

Definition. We say that ψ,ϕ ∈ Bil(V ) are isomorphic if there is some g ∈ GL(V )such that ϕ = gψ; that is, if they are in the same orbit.

Q: What are the orbits of GL(V ) on Bil(V ); that is, what are the isomorphism classesof bilinear forms?

Compare with:

• L(V,W )/GL(V ) × GL(W ) ↔{i ∈ N | 0 ≤ i ≤ min(dimV,dimW )

}with ϕ 7→

rkϕ. Here (g, h) ◦ ϕ = hϕg−1.

• L(V, V )/GL(V )↔ JNF. Here g◦ϕ = gϕg−1 and we require F algebraically closed.

• Bil(V )/GL(V )↔??, with (g ◦ ψ)(v, w) = ψ(g−1v, g−1w).

First, let’s express this in matrix form. Let v1, . . . , vn be a basis for V , where V is afinite dimensional vector space over F, and ψ ∈ Bil(V ). Then

ψ(∑_i xi vi, ∑_j yj vj) = ∑_{i,j} xi yj ψ(vi, vj).

So if we define a matrix A by A = (a_{ij}), a_{ij} = ψ(vi, vj), then we say that A is the matrix of the bilinear form with respect to the basis v1, . . . , vn.


In other words, the isomorphism θ : V ≅ F^n induces an isomorphism Bil(V) ≅ Matn(F), ψ ↦ (a_{ij}) = (ψ(vi, vj)).

Now, let v′1, . . . , v′n be another basis, with v′j = ∑_i p_{ij} vi. Then

ψ(v′a, v′b) = ψ(∑_i p_{ia} vi, ∑_j p_{jb} vj) = ∑_{i,j} p_{ia} ψ(vi, vj) p_{jb} = (P^T A P)_{ab}.

So if P is the matrix of the linear map g−1 : V → V , then the matrix of gψ =ψ(g−1(·), g−1(·)) is P TAP .
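A quick numerical check of this change-of-basis rule (Python with numpy; an illustration with randomly chosen matrices, not part of the notes):

```python
# Sketch: the matrix of psi in the new basis v'_j = sum_i p_ij v_i is P^T A P.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))             # matrix of psi in the old basis
P = rng.standard_normal((3, 3))             # columns express the new basis vectors

new_matrix = P.T @ A @ P
a, b = 1, 2                                  # check one entry against the double sum
assert np.isclose(new_matrix[a, b],
                  sum(P[i, a] * A[i, j] * P[j, b] for i in range(3) for j in range(3)))
```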

So the concrete version of our question “what are the orbits Bil(V )/GL(V )” is “whatare the orbits of GLn on Matn(F) for this action?”

Definition. Suppose Q acts on A by QAQT . We say that A and B are congruentif B = QAQT for some Q ∈ GLn.

We want to understand when two matrices are congruent.

Recall that if P,Q ∈ GLn, then rank(PAQ) = rank(A). Hence taking Q = P T , we getrank(PAP T ) = rankA, and so the following definition makes sense:

Definition. If ψ ∈ Bil(V ), then the rank of ψ, denoted rankψ or rkψ is the rankof the matrix of ψ with respect to some (and hence any) basis of V .

We will see later how to give a basis independent definition of the rank.

Definition. A form ψ ∈ Bil(V ) is

• symmetric if ψ(v, w) = ψ(w, v) for all v, w ∈ V . In terms of the matrix A ofψ, this is requiring AT = A.

• anti-symmetric if ψ(v, v) = 0 for all v ∈ V , which implies ψ(v, w) = −ψ(w, v)for all v, w ∈ V . In terms of the matrix, AT = −A.

From now on, we assume that charF 6= 2, so 1 + 1 = 2 6= 0 and 1/2 exists.

Given ψ, put

ψ+(v, w) = 12

[ψ(v, w) + ψ(w, v)

]ψ−(v, w) = 1

2

[ψ(v, w)− ψ(w, v)

],

which splits a form into symmetric and anti-symmetric components, and ψ = ψ+ +ψ−.

Observe that if ψ is symmetric or anti-symmetric, then so is gψ = ψ(g−1(·), g−1(·)),or in matrix form, A is symmetric or anti-symmetric if and only if PAP T is, since(PAP T )T = PATP T .

So to understand Bil(V )/GL(V ), we will first understand the simpler question of clas-sifying symmetric and anti-symmetric forms. Set

Bilε(V ) ={ψ ∈ Bil(V ) | ψ(v, w) = εψ(w, v) ∀v, w ∈ V

}ε = ± 1.

So Bil+(V ) is the symmetric forms, and Bil− is the antisymmetric forms.

So our simpler question is to ask, “What is Bilε(V )/GL(V )?”

Hard exercise: Once you’ve finished revising the course, go and classify Bil(V )/GL(V ).


5.1 Symmetric forms

Let V be a finite dimensional vector space over F and charF 6= 2. If ψ ∈ Bil+(V ) is asymmetric form, then define Q : V → F as

Q(v) = Qψ(v) = ψ(v, v).

We have

Q(u+ v) = ψ(u+ v, u+ v)

= ψ(u, u) + ψ(v, v) + ψ(u, v) + ψ(v, u)

= Q(u) +Q(v) + ψ(u, v) + ψ(v, u)

Q(λu) = ψ(λu, λu)

= λ2 ψ(u, u)

= λ2Q(u).

Definition. A quadratic form on V is a function Q : V → F such that

(i) Q(λv) = λ2Q(v);

(ii) Set ψQ(u, v) = 12

[Q(u+ v)−Q(u)−Q(v)

]; then ψQ : V ×V → F is bilinear.

Lemma 5.3. The map Bil+(V ) → {quadratic forms on V }, ψ 7→ Qψ is a bijection;Q 7→ ψQ is its inverse.

Proof. Clear. We just note that

ψQ(v, v) = 12

(Q(2v)− 2Q(v)

)= 1

2

(4Q(v)− 2Q(v)

)= Q(v),

as Q(λu) = λ2Q(u).

Remark. If v1, . . . , vn is a basis of V with ψ(vi, vj) = aij , then

Q(∑

xivi)

=∑aijxixj = xTAx,

that is, a quadratic form is a homogeneous polynomial of degree 2 in the variablesx1, . . . , xn.

Theorem 5.4

Let V be a finite dimensional vector space over F and ψ ∈ Bil+(V ) a symmetricbilinear form. Then there is some basis v1, . . . , vn of V such that ψ(vi, vj) = 0 ifi 6= j. That is, we can choose a basis so that the matrix of ψ is diagonal.

Proof. Induct on dimV . Now dimV = 1 is clear. It is also clear if ψ(v, w) = 0 for allv, w ∈ V .

So assume otherwise. Then there exists a w ∈ V such that ψ(w,w) 6= 0. (As ifψ(w,w) = 0 for all w ∈ V ; that is, Q(w) = 0 for all w ∈ V , then by the lemma,ψ(v, w) = 0 for all v, w ∈ V .)

To continue, we need some notation. For an arbitrary ψ ∈ Bil(V ), U ≤ V , define

U⊥ ={v ∈ V : ψ(u, v) = 0 for all u ∈ U

}.


Claim. 〈w〉 ⊕ 〈w〉⊥ = V is a direct sum.

[Proof of claim] As ψ(w,w) 6= 0, w 6∈ 〈w〉⊥, so 〈w〉 ∩ 〈w〉⊥ = 0, and the sum is direct.

Now we must show 〈w〉+ 〈w〉⊥ = V .

Let v ∈ V . Consider v − λw. We want to find a λ such that v − λw ∈ 〈w〉⊥, as thenv = λw + (v − λw) shows v ∈ 〈w〉+ 〈w〉⊥.

But v − λw ∈ 〈w〉⊥ ⇐⇒ ψ(w, v − λw) = 0 ⇐⇒ ψ(w, v) = λψ(w,w); that is, set

λ =ψ(w, v)

ψ(w,w).Lecture 18

Now let W = 〈w〉⊥, and ψ ′ = ψ |W : W ×W → F the restriction of ψ. This is symmetricbilinear, so by induction there is some basis v2, . . . , vn of W such that ψ(vi, vj) = λi δijfor λi ∈ F.

Hence, as ψ(w, vi) = ψ(vi, w) = 0 if i ≥ 2, put v1 = w and we get that with respect tothe basis v1, . . . , vn, the matrix of ψ is

ψ(w,w) 0λ2

. . .

0 λn

.

Warning. The diagonal entries are not determined by ψ. For example,

diag(a1, . . . , an) · diag(λ1, . . . , λn) · diag(a1, . . . , an)^T = diag(a1² λ1, . . . , an² λn);

that is, rescaling the basis element vi to ai vi changes Q(ai vi) = ai² Q(vi).

Also, we can reorder our basis – equivalently, take P = P (w), the permutation matrixof w ∈ Sn, and note P T = P (w−1), so

P (w)AP (w)T = P (w)AP (w)−1.

Furthermore, it's not obvious that more complicated things can't happen; for example,

P^T ( 2 0 ; 0 3 ) P = ( 5 0 ; 0 30 )   if P = ( 1 −3 ; 1 2 ).

Corollary 5.5. Let V be a finite dimensional vector space over F, and suppose F isalgebraically closed (such as F = C). Then

Bil+(V )/GL(V )∼−→ {i : 0 ≤ i ≤ dimV } ,

under the isomorphism taking ψ 7→ rkψ.

Proof. By the above, we can reorder and rescale so the matrix looks like

1. . .

10

. . .

0

as√λi is always in F.


That is, there exists a basis of Q such that

Q

n∑i=1

xi vi

=r∑i=1

x2i ,

where r = rkQ ≤ n.

Now let F = R, and ψ : V × V → R be bilinear symmetric.

By the theorem, there is some basis v1, . . . , vn such that ψ(vi, vj) = λi δ_{ij}. Replacing vi by vi/√|λi| if λi ≠ 0 and reordering the basis, we get that ψ is represented by the block diagonal matrix

diag(Ip, −Iq, 0),

for p, q ≥ 0; that is, with respect to this basis

Q(∑_{i=1}^{n} xi vi) = ∑_{i=1}^{p} xi² − ∑_{i=p+1}^{p+q} xi².

Note that rk ψ = p + q.

Definition. The signature of ψ is signψ = p− q.

We need to show this is well defined, and not an artefact of the basis chosen.

Theorem 5.6: Sylvester’s law of inertia

The signature does not depend on the choice of basis; that is, if ψ is represented by

diag(Ip, −Iq, 0) with respect to v1, . . . , vn,   and by   diag(I_{p′}, −I_{q′}, 0) with respect to w1, . . . , wn,

then p = p′ and q = q′.

Warning. In general, tr(P TAP ) 6= tr(A), so we can’t prove it that way.

Definition. Let Q : V → R be a quadratic form on V , where V is a vector spaceover R, and U ≤ V .

We say Q is positive semi-definite on U if for all u ∈ U , Q(u) ≥ 0. Further, ifQ(u) = 0 ⇐⇒ u = 0, then we say that Q is positive definite on U .

If U = V , then we just say that Q is positive (semi) definite.

We define negative (semi) definite to mean −Q is positive (semi) definite.


Proof of theorem. Let P = 〈v1, . . . , vp〉. So if v = ∑_{i=1}^{p} λi vi ∈ P, then Q(v) = ∑_{i=1}^{p} λi² ≥ 0, and Q(v) = 0 ⟺ v = 0, so Q is positive definite on P.

Let U =⟨vp+1, . . . , vp+q, . . . , vn

⟩, so Q is negative semi-definite on U . And now let P ′

be any positive definite subspace.

Now we claim that P ′ ∩ U = {0}.

Consider: if v ∈ P ′, then Q(v) ≥ 0; if v ∈ U , Q(u) ≤ 0. So if v ∈ P ′∩U , then Q(v) = 0.But P ′ is positive definite, so v = 0. Hence

dimP ′ + dimU = dim(P ′ + U) ≤ dimV = n,

and sodimP ′ ≤ dimV − dimU = dimP,

that is, p is the maximum dimension of any positive definite subspace, and hence p ′ = p.Similarly, q is the maximum dimension of any negative definite subspace, so q ′ = q.

Note that (p, q) determine (rk, sign), and conversely, p = 12 (rk + sign) and q = 1

2 (rk− sign).So we now have

Bil+(Rn)/GLn(R)→{

(p, q) : p, q ≥ 0, p+ q ≤ n} ∼−→ {(rk, sgn)}.

Example 5.7. Let V = R2, and Q

(x1

x2

)= x2

1 − x22.

Consider the line Lλ = 〈e1 + λe2〉, Q(

)= 1 − λ2, so this is positive definite if

|λ| < 1, and negative definite if |λ| > 1.

In particular, p = q = 1, but there are many choices of positive and negative definitesubspaces of maximal dimension. (Recall that lines in R2 are parameterised bypoints on the circle R ∪ {∞}).

Example 5.8. Compute the rank and signature of

Q(x, y, z) = x2 + y2 + 2z2 + 2xy + 2xz − 2yz.

Note the matrix A of Q is

A = ( 1 1 1 ; 1 1 −1 ; 1 −1 2 ),   that is,   Q(x, y, z) = (x y z) A (x, y, z)^T.

(Recall that for an arbitrary quadratic form Q, its matrix A is given by

Q(∑_i xi vi) = ∑_{i,j} a_{ij} xi xj = ∑_i a_{ii} xi² + ∑_{i<j} 2a_{ij} xi xj,

which is why the off-diagonal terms are halved!)

We could apply the method in the proof of the theorem: begin by finding w ∈ R3

such that Q(w) 6= 0. Take w = e1 = (1, 0, 0). Now find 〈w〉⊥. To do this, we seekλ such that e2 + λe1 ∈ 〈e1〉⊥. But Q(e1, e2 + λe1) = 0 implies λ = −1. Similarlywe find e3− e1 ∈ 〈e1〉⊥, so 〈e1〉⊥ = 〈e2 − e1, e3 − e1〉. Now continue with Q | 〈e1〉⊥,and so on.


Here is a nicer way of writing this same computation: row and column reduce A:

First, R2 7→ R2−R1 and C2 7→ C2− C1. In matrix form:

(I − E21) A (I − E12) = ( 1 0 1 ; 0 0 −2 ; 1 −2 2 ).

Next R3 ↦ R3 − R1 and C3 ↦ C3 − C1, giving ( 1 0 0 ; 0 0 −2 ; 0 −2 1 ).

Then swap R2, R3 and C2, C3, giving ( 1 0 0 ; 0 1 −2 ; 0 −2 0 ).

Then R3 ↦ R3 + 2R2 and C3 ↦ C3 + 2C2, giving diag(1, 1, −4).

Finally, rescale the last basis vector, giving diag(1, 1, −1).

That is, if we put

P = (I − E12)(I − E13) P((2 3)) (I + 2E23) diag(1, 1, ½),

then

P^T A P = diag(1, 1, −1).

Method 2: we could just try to complete the square

Q(x, y, z) = (x+ y + z)2 + z2 − 4yz = (x+ y + z)2 + (z − 2y)2 − 4y2

Remark. We will see in Chapter 6 that sign(A) is the number of positive eigenvaluesminus the number of negative eigenvalues, so we could also compute it by computingthe characteristic polynomial of A.
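A quick cross-check of Example 5.8 using the eigenvalue criterion just quoted (Python with numpy; an illustration, not part of the notes):

```python
# Sketch: rank = number of non-zero eigenvalues; signature = (# positive) - (# negative).
import numpy as np

A = np.array([[1.0,  1.0,  1.0],
              [1.0,  1.0, -1.0],
              [1.0, -1.0,  2.0]])
eigs = np.linalg.eigvalsh(A)                  # real, since A is symmetric
rank = int(np.sum(~np.isclose(eigs, 0.0)))
signature = int(np.sum(eigs > 1e-9) - np.sum(eigs < -1e-9))
assert rank == 3 and signature == 1           # agrees with the diagonal form diag(1, 1, -1)
```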


5.2 Anti-symmetric forms

We begin with a basis independent meaning of the rank of an arbitrary bilinear form.

Proposition 5.9. Let V be a finite dimensional vector space over F. Then

rkψ = dimV − dimV ⊥ = dimV − dim⊥V,

where V ⊥ ={v ∈ V : ψ(V, v) = 0

}and ⊥V =

{v ∈ V : ψ(v, V ) = 0

}.

Lecture 19

Proof. Define a linear map Bil(V) → L(V, V∗), ψ ↦ ψ_L, with ψ_L(v)(w) = ψ(v, w). First we check that this is well-defined: ψ(v, ·) linear implies ψ_L(v) ∈ V∗, and ψ(·, w) linear implies ψ_L(λv + λ′v′) = λ ψ_L(v) + λ′ ψ_L(v′); that is, ψ_L is linear, and ψ_L ∈ L(V, V∗).

It is clear that the map is injective (as ψ 6≡ 0 implies there are some v, w such thatψ(v, w) 6= 0, and so ψL(v)(w) 6= 0) and hence an isomorphism, as Bil(V ) and L(V, V ∗)are both vector spaces of dimension (dimV )2.

Let v1, . . . , vn be a basis of V , and v∗1, . . . , v∗n be the dual basis of V ∗; that is, v∗i (vj) = δij .

Let A = (aij) be the matrix of ψL with respect to these bases; that is,

ψL(vj) =∑i

aij v∗i . (∗)

Apply both sides of (∗), to vi, and we have

ψ(vj , vi) = ψL(vj , vi) = aij .

So the matrix of ψ with respect to the basis vi is just AT .

Exercise 5.10. Define ψR ∈ L(V, V ∗) by ψR(v)(w) = ψ(w, v). Show the matrix ofψR is the matrix of ψ (which we’ve just seen is the transpose of the matrix of ψL).

Now we haverkA = dim Im(ψL : V → V ∗) = dimV − dim kerψL

andkerψL =

{v ∈ V : ψ(v, V ) = 0

}= ⊥V.

But alsorkA = rkAT = dim Im(ψR : V → V ∗) = dimV − dim kerψR,

and kerψR = V ⊥.

Definition. A form ψ ∈ Bil(V ) is non-degenerate if any of the following equivalentconditions hold:

• V ⊥ = ⊥V = {0};• rkψ = dimV ;

• ψL : V → V ∗ taking v 7→ ψ(v, ·) is an isomorphism;

• ψR : V → V ∗ taking v 7→ ψ(·, v) is an isomorphism;

• for all v ∈ V \{0}, there is some w ∈ V such that ψ(v, w) 6= 0; that is, anon-degenerate bilinear form gives an isomorphism between V and V ∗.


Proposition 5.11. Let W ≤ V and ψ ∈ Bil(V ). Then

dimW + dimW⊥ − dim(W ∩ ⊥V ) = dimV.

Proof. Consider the map V →W ∗ taking v 7→ ψ(·, v). (When we write ψ(·, v) : W → F,we mean the map w 7→ ψ(w, v).) The kernel is

ker ={v ∈ V : ψ(v, w) = 0 ∀w ∈W

}= W⊥,

so the rank-nullity theorem gives

dimV = dimW⊥ + dim Im .

So what is the image? Recall that

dim Im(θ : V →W ∗) = dim Im(θ∗ : W = W ∗∗ → V ∗) = dimW − dim ker θ∗.

Butθ∗(w) = ψ(w, ·) : V → F,

and so θ∗(w) = 0 if and only if w ∈ ⊥V ; that is, ker θ∗ = W ∩ ⊥V , proving theproposition.

Remark. If you are comfortable with the notion of a quotient vector space, considerinstead the map V → (W/W ∩ ⊥V )∗, v 7→ ψ(·, v) and show it is well-defined, surjectiveand has W⊥ as the kernel.

Example 5.12. If V = R² and Q(x1, x2) = x1² − x2², then A = diag(1, −1). Then if W = 〈(1, 1)〉, we have W⊥ = W, and the proposition says 1 + 1 − 0 = 2.

Or if we let V = C² and Q(x1, x2) = x1² + x2², so A = diag(1, 1), and set W = 〈(1, i)〉, then W = W⊥.

Corollary 5.13. ψ∣∣W

: W ×W → F is non-degenerate if and only if V = W ⊕W⊥.

Proof. (⇐) ψ∣∣W

is non-degenerate means that for all w ∈W\{0}, there is some w ′ ∈Wsuch that ψ(w,w ′) 6= 0, so if w ∈ W⊥ ∩W , w 6= 0, then for all w ′ ∈ W , ψ(w,w ′) = 0,a contradiction, and so W ∩W⊥ = {0}. Now

dim(W +W⊥) = dimW + dimW⊥ ≥ dimV,

by the proposition, so W +W⊥ = V (and also ψ is non-degenerate on all of V clearly).

(⇒) Clear by our earlier remarks that W ∩W⊥ = 0 if and only if ψ∣∣W

is non-degenerate.


Theorem 5.14

Let ψ ∈ Bil−(V) be an anti-symmetric bilinear form. Then there is some basis v1, . . . , vn of V such that the matrix of ψ is block diagonal,

( 0 1 ; −1 0 ) ⊕ ( 0 1 ; −1 0 ) ⊕ ··· ⊕ ( 0 1 ; −1 0 ) ⊕ 0 ⊕ ··· ⊕ 0.

In particular, rk ψ is even! (F is arbitrary.)

Remark. If ψ ∈ Bil±(V ), then W⊥ = ⊥W for all W ≤ V .

Proof. We induct on rk ψ. If rk ψ = 0, then ψ = 0 and we're done.

Otherwise, there are some v1, v2 ∈ V such that ψ(v1, v2) ≠ 0. If v2 = λv1, then ψ(v1, v2) = ψ(v1, λv1) = λψ(v1, v1) = 0, as ψ is anti-symmetric; so v1, v2 are linearly independent. Change v2 to v2/ψ(v1, v2).

So now ψ(v1, v2) = 1. Put W = 〈v1, v2〉, then ψ∣∣W

has matrix0 1−1 0

, which is

non-degenerate, so the corollary gives V = W ⊕W⊥. And now induction gives the basisof W⊥, v3, . . . , vn, of the correct form, and v1, . . . , vn is our basis.

So we’ve shown that there is an isomorphism

Bil−(V )/GL(V )∼−→{

2i : 0 ≤ i ≤ 12 dimV

}.

taking ψ 7→ rkψ.

Remark. A non-degenerate anti-symmetric form ψ is usually called a symplectic form.

Let ψ ∈ Bil−(V ) be non-degenerate, rkψ = n = dimV (even!). Put L = 〈v1, v3, v5, . . .〉,with v1, . . . , vn as above, and then L⊥ = L. Such a subspace is called Lagrangian.

If U ≤ L, then U⊥ ≥ L⊥ = L, and so U ⊆ U⊥. Such a subspace is called isotropic.


Definition. If ψ ∈ Bil(V ), then the isometries of ψ are

Isomψ ={g ∈ GL(V ) : gψ = ψ

}={g ∈ GL(V ) : ψ(g−1v, g−1w) = ψ(v, w) ∀v, w ∈ V

}={X ∈ GLn(F) : XAXT = A

}if A is a matrix of ψ.

This is a group.

Exercise 5.15. Show that Isom(gψ) = g Isom(ψ) g−1, and so isomorphism bilinearforms have isomorphic isometry groups.

If ψ ∈ Bil+(V ), ψ is non-degenerate, we often write O(ψ), the orthogonal group of ψ forthe isometry group of ψ.

Example 5.16. Suppose F = C. If ψ ∈ Bil+(V ), and ψ is non-degenerate, then ψis isomorphic to the standard quadratic form, whose matrix A = I, and so Isomψis conjugate to the group

Isom(A = I) ={X ∈ GLn(C) : XXT = I

}= On(C),

which is what we usually call the orthogonal group.

If F = R, then

Op,q(R) =

{X | X

(Ip−Iq

)XT =

(Ip−Iq

)}.

are the possible isometry groups of non-degenerate symmetric forms.

For any field F, if ψ ∈ Bil−(V) is non-degenerate, then Isom ψ is called the symplectic group, and it is conjugate to the group

Sp_{2n}(F) = {X : XJX^T = J},

where J is the block diagonal matrix ( 0 1 ; −1 0 ) ⊕ ··· ⊕ ( 0 1 ; −1 0 ).


6 Hermitian forms

Lecture 20

A non-degenerate quadratic form on a vector space over C doesn't behave like an inner product on R². For example, if Q(x1, x2) = x1² + x2², then Q(1, i) = 1 + i² = 0.

We don’t have a notion of positive definite, but there is a modification of a notion of abilinear form which does.

Definition. Let V be a vector space over C; then a function ψ : V × V → C iscalled sesquilinear if

(i) For all v ∈ V, ψ(·, v) : u ↦ ψ(u, v) is linear; that is,

ψ(λ1 u1 + λ2 u2, v) = λ1 ψ(u1, v) + λ2 ψ(u2, v).

(ii) For all u, v1, v2 ∈ V and λ1, λ2 ∈ C,

ψ(u, λ1 v1 + λ2 v2) = λ̄1 ψ(u, v1) + λ̄2 ψ(u, v2),

where z̄ is the complex conjugate of z.

It is called Hermitian if it also satisfies

(iii) ψ(v, w) is the complex conjugate of ψ(w, v), for all v, w ∈ V.

Note that (i) and (iii) imply (ii).

Let V be a vector space over C, and ψ : V × V → C a Hermitian form. Define

Q(v) = ψ(v, v) = ψ(v, v)

by (iii), so Q : V → R.

Lemma 6.1. We have Q(v) = 0 for all v ∈ V if and only if ψ(v, w) = 0 for all v, w ∈ V .

Proof. We have

Q(u ± v) = ψ(u ± v, u ± v) = ψ(u, u) + ψ(v, v) ± ψ(u, v) ± ψ(v, u) = Q(u) + Q(v) ± 2 Re ψ(u, v),

as z + z̄ = 2 Re(z). Thus

Q(u + v) − Q(u − v) = 4 Re ψ(u, v),
Q(u + iv) − Q(u − iv) = 4 Im ψ(u, v);

that is, Q : V → R determines ψ : V × V → C if ψ is Hermitian:

ψ(u, v) = ¼ [Q(u + v) + iQ(u + iv) − Q(u − v) − iQ(u − iv)].

Note that Q(λv) = ψ(λv, λv) = λ λ̄ ψ(v, v) = |λ|² Q(v).

If ψ : V × V → C is Hermitian, and v1, . . . , vn is a basis of V, then we write A = (a_{ij}), a_{ij} = ψ(vi, vj), and we call this the matrix of ψ with respect to v1, . . . , vn.

Observe that Ā^T = A; that is, A is a Hermitian matrix.


Exercise 6.2. Show that if we change basis, v′j = ∑_i p_{ij} vi, P = (p_{ij}), then A ↦ P^T A P̄.

Theorem 6.3

If V is a finite dimensional vector space over C, and ψ : V × V → C is Hermitian,then there is a basis for V such that the matrix A of ψ isIp −Iq

0

for some p, q ≥ 0. Moreover, p and q are uniquely determined by ψ: p is the maxi-mum dimension of a positive definite subspace P , and q is the maximal dimensionof a negative definite subspace.

Here P ≤ V is positive definite if Q(v) ≥ 0 for all v ∈ P , and Q(v) = 0 when v ∈ Pimplies v = 0; P is negative definite −Q is positive definite on P .

Proof. Exactly as for real symmetric forms, using the Hermitian ingredients before thetheorem instead of their bilinear counterparts.

Definition. If W ≤ V , then the orthogonal complement to W is given by

W⊥ ={v ∈ V | ψ(W, v) = ψ(v,W ) = 0

}= ⊥W.

We say that ψ is non-degenerate if V⊥ = 0, equivalently if p + q = dim V. We also define the unitary group

U(p, q) = Isom( diag(Ip, −Iq) ) = { X ∈ GLn(C) | X̄^T diag(Ip, −Iq) X = diag(Ip, −Iq) },

the stabiliser of the form diag(Ip, −Iq) with respect to the GLn(C) action, where the action takes ψ ↦ gψ, with (gψ)(x, y) = ψ(g⁻¹x, g⁻¹y). Again, note the g⁻¹ here, so that (gh)ψ = g(hψ).

In the special case where the form ψ is positive definite, that is, conjugate to In, we call this the unitary group

U(n) = U(n, 0) = { X ∈ GLn(C) | X̄^T X = I }.

Proposition 6.4. Let V be a vector space over C (or R), and ψ : V × V → C (or R) a Hermitian (respectively, symmetric) form, so Q : V → R.

Let v1, . . . , vn be a basis of V, and A ∈ Matn(C) the matrix of ψ, so Ā^T = A. Then Q : V → R is positive definite if and only if, for all k, 1 ≤ k ≤ n, the top left k × k submatrix Ak of A has det Ak ∈ R and det Ak > 0.


Proof. (⇒) If Q is positive definite, then A = P^T I P̄ = P^T P̄ for some P ∈ GLn(C), and so

det A = det P^T det P̄ = |det P|² > 0.   (∗)

But if U ≤ V, then as Q is positive definite on V, it is positive definite on U. Take U = 〈v1, . . . , vk〉; then Q|U is positive definite, Ak is the matrix of Q|U, and by (∗), det Ak > 0.

(⇐) Induct on n = dim V. The case n = 1 is clear. Now the induction hypothesis tells us that ψ|〈v1, . . . , vn−1〉 is positive definite, and hence the maximal dimension p of a positive definite subspace satisfies p ≥ n − 1.

So by the classification of Hermitian forms, there is some P ∈ GLn(C) such that

A = P^T diag(In−1, c) P̄,

where c = 0, 1 or −1. But det A = |det P|² c > 0 by assumption, so c = 1, and A = P^T P̄; that is, Q is positive definite.
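A small numerical illustration of the proposition (a sketch; the Hermitian matrix below is an arbitrary choice of mine): checking that every leading principal minor is a positive real number agrees with checking that all eigenvalues are positive.

import numpy as np

A = np.array([[4, 1 + 1j, 0],
              [1 - 1j, 3, 1j],
              [0, -1j, 2]])            # a Hermitian matrix

def positive_definite_by_minors(A):
    # Proposition 6.4: positive definite iff det(A_k) is real and > 0
    # for every top-left k x k submatrix A_k
    n = A.shape[0]
    for k in range(1, n + 1):
        d = np.linalg.det(A[:k, :k])
        if not (abs(d.imag) < 1e-10 and d.real > 0):
            return False
    return True

print(positive_definite_by_minors(A))            # True
print(np.all(np.linalg.eigvalsh(A) > 0))         # True, agrees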

Definition. If V is a vector space over F = R or F = C, then an inner product on V is a positive definite symmetric bilinear/Hermitian form 〈·, ·〉 : V × V → F, and we say that V is an inner product space.

Example 6.5. Consider Rn or Cn, and the dot product 〈x, y〉 = Σi xi ȳi. These forms behave exactly as our intuition tells us in R².

6.1 Inner product spaces

Definition. Let V be an inner product space over F, with 〈·, ·〉 : V × V → F. Then Q(v) ∈ R≥0, and so we can define

|v| = √Q(v)

to be the length or norm of v. Note that |v| = 0 if and only if v = 0.

Lemma 6.6 (Cauchy-Schwarz inequality). |〈v, w〉| ≤ |v| |w|.

Proof. As you've seen many times before:

0 ≤ 〈−λv + w, −λv + w〉 = |λ|² 〈v, v〉 + 〈w, w〉 − λ 〈v, w〉 − \overline{λ 〈v, w〉}.

The result is clear if v = 0; otherwise |v| ≠ 0, and we put λ = \overline{〈v, w〉} / 〈v, v〉. We get

0 ≤ |〈v, w〉|² / 〈v, v〉 − 2 |〈v, w〉|² / 〈v, v〉 + 〈w, w〉,

that is, |〈v, w〉|² ≤ 〈v, v〉 〈w, w〉.

Note that if F = R, then 〈v, w〉 / (|v| |w|) ∈ [−1, 1], so there is some θ ∈ [0, π] such that cos θ = 〈v, w〉 / (|v| |w|). We call θ the angle between v and w.


Corollary 6.7 (Triangle inequality). For all v, w ∈ V , |v + w| ≤ |v|+ |w|.

Proof. As you've seen many times before:

|v + w|² = 〈v + w, v + w〉 = |v|² + 2 Re 〈v, w〉 + |w|²
         ≤ |v|² + 2 |v| |w| + |w|²     (by the lemma)
         = (|v| + |w|)².

Lecture 21

Given v1, . . . , vn with 〈vi, vj〉 = 0 whenever i ≠ j, we say that v1, . . . , vn are orthogonal. If 〈vi, vj〉 = δij, then we say that v1, . . . , vn are orthonormal.

So if v1, . . . , vn are orthogonal and vi ≠ 0 for all i, then v̂1, . . . , v̂n are orthonormal, where v̂i = vi / |vi|.

Lemma 6.8. If v1, . . . , vn are non-zero and orthogonal, and if v = Σ_{i=1}^n λi vi, then λi = 〈v, vi〉 / |vi|².

Proof. 〈v, vk〉 = Σ_{i=1}^n λi 〈vi, vk〉 = λk 〈vk, vk〉, hence the result.

In particular, orthonormal vectors v1, . . . , vn are linearly independent, since Σi λi vi = 0 implies λi = 0 for all i.

As 〈·, ·〉 is Hermitian, we know there is a basis v1, . . . , vn such that the matrix of 〈·, ·〉 is

diag(Ip, −Iq, 0).

As 〈·, ·〉 is positive definite, we must have p = n, q = 0 and rank = dim V; that is, this matrix is In. So there exists an orthonormal basis v1, . . . , vn; that is, V ≅ Rn with 〈x, y〉 = Σi xi yi, or V ≅ Cn with 〈x, y〉 = Σi xi ȳi.

Here is another constructive proof that orthonormal bases exist.

Theorem 6.9: Gram-Schmidt orthogonalisation

Let V have a basis v1, . . . , vn. Then there exists an orthonormal basis e1, . . . , en such that 〈v1, . . . , vk〉 = 〈e1, . . . , ek〉 for all 1 ≤ k ≤ n.

Proof. Induct on k. For k = 1, set e1 = v1 / |v1|.

Suppose we've found e1, . . . , ek such that 〈e1, . . . , ek〉 = 〈v1, . . . , vk〉. Define

ẽk+1 = vk+1 − Σ_{1≤i≤k} 〈vk+1, ei〉 ei.

Then

〈ẽk+1, ei〉 = 〈vk+1, ei〉 − 〈vk+1, ei〉 = 0 if i ≤ k.

Also ẽk+1 ≠ 0: if ẽk+1 = 0, then vk+1 ∈ 〈e1, . . . , ek〉 = 〈v1, . . . , vk〉, which contradicts v1, . . . , vk+1 being linearly independent.

So put ek+1 = ẽk+1 / |ẽk+1|; then e1, . . . , ek+1 are orthonormal, and 〈e1, . . . , ek+1〉 = 〈v1, . . . , vk+1〉.
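The proof is an algorithm, and it translates directly into code. Here is an illustrative numpy sketch (the conventions and variable names are mine): the columns of V are the basis v1, . . . , vn, and the columns of the output are the orthonormal basis e1, . . . , en.

import numpy as np

def gram_schmidt(V):
    # span(e_1, ..., e_k) = span(v_1, ..., v_k) for every k
    n = V.shape[1]
    E = np.zeros(V.shape, dtype=complex)
    for k in range(n):
        e = V[:, k].astype(complex)
        for i in range(k):
            # subtract the component of v_{k+1} along each earlier e_i;
            # <x, y> = sum_j x_j conj(y_j), so <v, e_i> = vdot(e_i, v)
            e = e - np.vdot(E[:, i], V[:, k]) * E[:, i]
        E[:, k] = e / np.linalg.norm(e)          # normalise: e = e-tilde / |e-tilde|
    return E

rng = np.random.default_rng(1)
V = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
E = gram_schmidt(V)
print(np.allclose(E.conj().T @ E, np.eye(4)))    # True: the columns are orthonormal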


Corollary 6.10. Any orthonormal set can be extended to an orthonormal basis.

Proof. Extend the orthonormal set to a basis; now the Gram-Schmidt algorithm doesn't change v1, . . . , vk if they are already orthonormal.

Recall that if W ≤ V, then W⊥ = ⊥W = {v ∈ V | 〈v, w〉 = 0 for all w ∈ W}.

Proposition 6.11. If W ≤ V , V an inner product space, then W ⊕W⊥ = V .

Proof 1. If 〈·, ·〉 is positive definite on V, then it is also positive definite on W, and thus 〈·, ·〉|W is non-degenerate. If F = R, then 〈·, ·〉 is bilinear, and we've shown that W ⊕ W⊥ = V when the form 〈·, ·〉|W is non-degenerate. If F = C, then exactly the same proof for sesquilinear forms shows the result.

Proof 2. Pick an orthonormal basis w1, . . . , wr for W, and extend it to an orthonormal basis w1, . . . , wn for V.

Now observe that 〈wr+1, . . . , wn〉 = W⊥. The inclusion (⊆) is clear. For (⊇): if Σ_{i=1}^n λi wi ∈ W⊥, then pairing with wi for i ≤ r gives λi = 0 for i ≤ r. So V = W ⊕ W⊥.

Geometric interpretation of the key step in the Gram-Schmidt algorithm

Let V be an inner product space, with W ≤ V and V = W ⊕ W⊥. Define a map π : V → W, the orthogonal projection onto W, as follows: if v ∈ V, write v = w + w′ with w ∈ W and w′ ∈ W⊥ (uniquely), and set π(v) = w.

This satisfies π|W = id : W → W, π² = π, and π is linear.

Proposition 6.12. If W has an orthonormal basis e1, . . . , ek and π : V → W is as above, then

(i) π(v) = Σ_{i=1}^k 〈v, ei〉 ei;

(ii) π(v) is the vector in W closest to v; that is, |v − π(v)| ≤ |v − w| for all w ∈ W, with equality if and only if w = π(v).

Proof.

(i) If v ∈ V, then put w = Σ_{i=1}^k 〈v, ei〉 ei and w′ = v − w. Then w ∈ W, and we want w′ ∈ W⊥. But

〈w′, ei〉 = 〈v, ei〉 − 〈v, ei〉 = 0 for all i, 1 ≤ i ≤ k,

so indeed w′ ∈ W⊥, and π(v) = w by definition.

(ii) We have v − π(v) ∈ W⊥, and if w ∈ W, then π(v) − w ∈ W, so

|v − w|² = |(v − π(v)) + (π(v) − w)|² = |v − π(v)|² + |π(v) − w|² + 2 Re 〈v − π(v), π(v) − w〉,

and the last inner product is zero, as v − π(v) ∈ W⊥ and π(v) − w ∈ W. So |v − w|² ≥ |v − π(v)|², with equality if and only if |π(v) − w| = 0; that is, if π(v) = w.
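In the same numpy style as above (a sketch; the subspace and the vectors are arbitrary choices of mine), the projection formula (i) and the closest-point property (ii) are easy to verify:

import numpy as np

rng = np.random.default_rng(2)
# an orthonormal basis e_1, e_2 of a 2-dimensional subspace W of C^4
# (obtained here from numpy's QR factorisation, purely for convenience)
E, _ = np.linalg.qr(rng.normal(size=(4, 2)) + 1j * rng.normal(size=(4, 2)))

def project(v, E):
    # pi(v) = sum_i <v, e_i> e_i, with <x, y> = sum_j x_j conj(y_j)
    return sum(np.vdot(E[:, i], v) * E[:, i] for i in range(E.shape[1]))

v = rng.normal(size=4) + 1j * rng.normal(size=4)
p = project(v, E)

print(np.allclose(E.conj().T @ (v - p), 0))            # v - pi(v) is orthogonal to W
w = E @ (rng.normal(size=2) + 1j * rng.normal(size=2)) # some other point of W
print(np.linalg.norm(v - p) <= np.linalg.norm(v - w))  # pi(v) is at least as close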


6.2 Hermitian adjoints for inner products

Let V and W be inner product spaces over F and α : V →W a linear map.

Proposition 6.13. There is a unique linear map α∗ : W → V such that for all v ∈ V, w ∈ W,

〈α(v), w〉 = 〈v, α∗(w)〉.

This map is called the Hermitian adjoint.

Moreover, if e1, . . . , en is an orthonormal basis of V, f1, . . . , fm is an orthonormal basis for W, and A = (aij) is the matrix of α with respect to these bases, then Ā^T is the matrix of α∗.

Proof. If β : W → V is a linear map with matrix B = (bij), then

〈α(v), w〉 = 〈v, β(w)〉 for all v, w

if and only if

〈α(ej), fk〉 = 〈ej, β(fk)〉 for all 1 ≤ j ≤ n, 1 ≤ k ≤ m.

But we have

akj = 〈Σi aij fi, fk〉 = 〈α(ej), fk〉 = 〈ej, β(fk)〉 = 〈ej, Σi bik ei〉 = b̄jk,

that is, B = Ā^T. Now define α∗ to be the map with matrix Ā^T.
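Concretely, in numpy (a sketch; the matrix and vectors are arbitrary choices of mine), the adjoint of x ↦ Ax with respect to the standard inner products is given by the conjugate transpose:

import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(3, 4)) + 1j * rng.normal(size=(3, 4))   # alpha : C^4 -> C^3
A_star = A.conj().T                                          # the claimed adjoint

v = rng.normal(size=4) + 1j * rng.normal(size=4)
w = rng.normal(size=3) + 1j * rng.normal(size=3)

inner = lambda x, y: np.vdot(y, x)        # <x, y> = sum_j x_j conj(y_j)
print(np.isclose(inner(A @ v, w), inner(v, A_star @ w)))     # True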

Exercise 6.14. If F = R, identify V ≅ V∗ by v ↦ 〈v, ·〉 and W ≅ W∗ by w ↦ 〈w, ·〉, and then show that α∗ is just the dual map.

More generally, if α : V → W is a linear map over F, and ψ ∈ Bil(V), ψ′ ∈ Bil(W) are both non-degenerate, then you can define the adjoint by ψ′(α(v), w) = ψ(v, α∗(w)) for all v ∈ V, w ∈ W, and show that it is the dual map.

Lecture 22

Lemma 6.15.

(i) If α, β : V → W, then (α + β)∗ = α∗ + β∗.

(ii) (λα)∗ = λ̄ α∗.

(iii) α∗∗ = α.

Proof. Immediate from the properties of A ↦ Ā^T.

Definition. A map α : V → V is self-adjoint if α = α∗.

If v1, . . . , vn is an orthonormal basis for V, and A is the matrix of α, then α is self-adjoint if and only if A = Ā^T.

In short, if F = R, then A is symmetric, and if F = C, then A is Hermitian.


Theorem 6.16

Let α : V → V be self-adjoint. Then

(i) All the eigenvalues of α are real.

(ii) Eigenvectors with distinct eigenvalues are orthogonal.

(iii) There exists an orthonormal basis of eigenvectors for α. In particular, α is diagonalisable.

Proof.

(i) First assume F = C. If αv = λv for a non-zero vector v and λ ∈ C, then

λ 〈v, v〉 = 〈λv, v〉 = 〈αv, v〉 = 〈v, α∗v〉 = 〈v, αv〉 = 〈v, λv〉 = λ̄ 〈v, v〉,

as α is self-adjoint. Since v ≠ 0, we have 〈v, v〉 ≠ 0 and thus λ = λ̄, so λ ∈ R.

If F = R, then let A = A^T be the matrix of α; regard it as a matrix over C, which is obviously Hermitian, and then the above shows that the eigenvalues of A are real.

Remark. This shows that we should introduce some notation so that we can phrase this argument without choosing a basis. Here is one way: let V be a vector space over R. Define VC = V ⊕ iV, a vector space over R of twice the dimension, and make it a complex vector space by setting i(v + iw) = (−w + iv), so that dimR V = dimC VC. Now suppose the matrix of α : V → V is A. Then show that the matrix of αC : VC → VC is also A, where αC(v + iw) = α(v) + i α(w).

Now we can phrase (i) of the proof using VC: show that λ ∈ R implies we can choose a λ-eigenvector v ∈ VC to lie in V ⊆ VC.

(ii) If α(vi) = λi vi for i = 1, 2, where vi ≠ 0 and λ1 ≠ λ2, then

λ1 〈v1, v2〉 = 〈αv1, v2〉 = 〈v1, αv2〉 = λ̄2 〈v1, v2〉 = λ2 〈v1, v2〉,

as α = α∗ and λ2 ∈ R by (i); so if 〈v1, v2〉 ≠ 0, then λ1 = λ2, a contradiction.

(iii) Induct on dim V. The case dim V = 1 is clear, so assume n = dim V > 1. By (i), there is a real eigenvalue λ and an eigenvector v1 ∈ V, which we may take to have |v1| = 1, such that α(v1) = λv1. Thus V = 〈v1〉 ⊕ 〈v1〉⊥, as V is an inner product space. Now put W = 〈v1〉⊥.

Claim. α(W) ⊆ W; that is, if 〈x, v1〉 = 0, then 〈α(x), v1〉 = 0.

Proof. We have 〈α(x), v1〉 = 〈x, α∗(v1)〉 = 〈x, α(v1)〉 = λ̄ 〈x, v1〉 = 0.

Also, α|W : W → W is self-adjoint, as 〈α(v), w〉 = 〈v, α(w)〉 for all v, w ∈ V, and so in particular for all v, w ∈ W. Hence by induction W has an orthonormal basis v2, . . . , vn of eigenvectors, and so v1, v2, . . . , vn is an orthonormal basis of eigenvectors for V.


Definition. Let V be an inner product space over C. Then the group of isometries of the form 〈·, ·〉, denoted U(V), is defined to be

U(V) = Isom(V) = {α : V → V | 〈α(v), α(w)〉 = 〈v, w〉 for all v, w ∈ V}

     = {α ∈ GL(V) | 〈α(v), w′〉 = 〈v, α⁻¹(w′)〉 for all v, w′ ∈ V}    (putting w′ = α(w))

     = {α ∈ GL(V) | α⁻¹ = α∗}.

Here we note that an isometry α : V → V is automatically an isomorphism: v ≠ 0 if and only if |v| ≠ 0, and as α is an isometry, |αv| = |v| ≠ 0, so α is injective, hence bijective since dim V < ∞.

This is called the unitary group.

If V = Cn, and 〈·, ·〉 is the standard inner product 〈x, y〉 = Σi xi ȳi, then we write

Un = U(n) = U(Cn) = {X ∈ GLn(C) | X̄^T X = I}.

So an orthonormal basis (that is, a choice of isomorphism V ≅ Cn) gives us an isomorphism U(V) ≅ Un.

Theorem 6.17

Let V be an inner product space over C, and α : V → V an isometry; that is, α∗ = α⁻¹, so α ∈ U(V). Then

(i) All eigenvalues λ of α have |λ| = 1; that is, they lie on the unit circle.

(ii) Eigenvectors with distinct eigenvalues are orthogonal.

(iii) There exists an orthonormal basis of eigenvectors for α; in particular, α is diagonalisable.

Remark. If V is an inner product space over R, then Isom 〈·, ·〉 = O(V), the usual orthogonal group, also denoted On(R). If we choose an orthonormal basis for V, then α ∈ O(V) if and only if A, the matrix of α, satisfies A^T A = I.

Then this theorem, applied to A considered as a complex matrix, shows that A is diagonalisable over C; but as all the eigenvalues of A have |λ| = 1, it is not diagonalisable over R unless the only eigenvalues are ±1.

Example 6.18. The matrix

A = (  cos θ   sin θ
      −sin θ   cos θ )  ∈ O(2)

is diagonalisable over C, and conjugate to

( e^{iθ}      0
    0      e^{−iθ} ),

but not over R, unless sin θ = 0.
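A quick numerical check of this example (a sketch; the value of θ is arbitrary):

import numpy as np

theta = 0.7
A = np.array([[np.cos(theta), np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])

eigenvalues = np.linalg.eigvals(A)
print(np.allclose(np.abs(eigenvalues), 1))    # both eigenvalues lie on the unit circle
print(np.allclose(np.sort_complex(eigenvalues),
                  np.sort_complex([np.exp(1j * theta), np.exp(-1j * theta)])))   # they are e^{+-i theta}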


Proof.

(i) If α(v) = λv for v non-zero, then

λ 〈v, v〉 = 〈λv, v〉 = 〈α(v), v〉 = 〈v, α∗(v)〉 = 〈v, α⁻¹(v)〉 = 〈v, λ⁻¹v〉 = λ̄⁻¹ 〈v, v〉,

and so λ = λ̄⁻¹ and λλ̄ = 1.

(ii) If α(vi) = λi vi with vi non-zero, and λi ≠ λj, then

λi 〈vi, vj〉 = 〈α(vi), vj〉 = 〈vi, α⁻¹(vj)〉 = (λ̄j)⁻¹ 〈vi, vj〉 = λj 〈vi, vj〉,

using |λj| = 1 in the last step, and so λi ≠ λj implies 〈vi, vj〉 = 0.

(iii) Induct on n = dim V. As V is a vector space over C, a non-zero eigenvector v1 exists with some eigenvalue λ, so α(v1) = λv1; rescale so that |v1| = 1.

Put W = 〈v1〉⊥, so V = 〈v1〉 ⊕ W, as V is an inner product space.

Claim. α(W) ⊆ W; that is, 〈x, v1〉 = 0 implies 〈α(x), v1〉 = 0.

Proof. We have 〈α(x), v1〉 = 〈x, α⁻¹(v1)〉 = 〈x, λ⁻¹v1〉 = (λ̄)⁻¹ 〈x, v1〉 = 0.

Also, 〈α(v), α(w)〉 = 〈v, w〉 for all v, w ∈ V implies that this also holds for all v, w ∈ W, so α|W is unitary; that is, (α|W)∗ = (α|W)⁻¹. So induction gives an orthonormal basis v2, . . . , vn of W consisting of eigenvectors for α, and so v1, v2, . . . , vn is an orthonormal basis for V.

Remark. The previous two theorems admit the following generalisation: define α : V →V to be normal if αα∗ = α∗α; that is, if α and α∗ commute.

Theorem 6.19

If α is normal, then there is an orthonormal basis consisting of eigenvectors for α.

Proof. Exercise!

Lecture 23

Recall that

(i) GLn(C) acts on Matn(C) by (P, A) ↦ PAP⁻¹.

Interpretation: a choice of basis of a vector space V identifies Matn(C) ≅ L(V, V), and a change of basis changes A to PAP⁻¹.

(ii) GLn(C) acts on Matn(C) by (P, A) ↦ PAP̄^T.

Interpretation: a choice of basis of a vector space V identifies Matn(C) with sesquilinear forms. A change of basis changes A to PAP̄^T (where P is Q⁻¹, if Q is the change of basis matrix).

These are genuinely different; that is, the theory of linear maps and the theory of sesquilinear forms are different.

But P ∈ Un if and only if P̄^T P = I, that is, P⁻¹ = P̄^T, and then these two actions coincide! This occurs if and only if the columns of P are an orthonormal basis with respect to the usual inner product on Cn.


Proposition 6.20.

(i) Let A ∈ Matn(C) be Hermitian, so Ā^T = A. Then there exists P ∈ Un such that PAP⁻¹ = PAP̄^T is real and diagonal.

(ii) Let A ∈ Matn(R) be symmetric, so A^T = A. Then there exists P ∈ On(R) such that PAP⁻¹ = PAP^T is real and diagonal.

Proof. Given A ∈ Matn(F) (for F = C or R), the map α : Fn → Fn taking x ↦ Ax is self-adjoint with respect to the standard inner product. By theorem 6.16, there is an orthonormal basis v1, . . . , vn of eigenvectors for α : Fn → Fn; that is, there are some λ1, . . . , λn ∈ R such that Avi = λi vi. Then

A (v1 · · · vn) = (λ1v1 · · · λnvn) = (v1 · · · vn) diag(λ1, . . . , λn).

If we set Q = (v1 · · · vn) ∈ Matn(F), then

AQ = Q diag(λ1, . . . , λn),

and v1, . . . , vn are an orthonormal basis if and only if Q̄^T = Q⁻¹, so we put P = Q⁻¹ and get the result.

Corollary 6.21. If ψ is a Hermitian form on V with matrix A, then the signature sign(ψ) is the number of positive eigenvalues of A less the number of negative eigenvalues.

Proof. If the matrix A is diagonal, then this is clear: rescale each basis vector vi with Q(vi) ≠ 0 by 1/|Q(vi)|^{1/2}, and the signature is the number of original diagonal entries which are positive, less the number which are negative.

Now for a general A, the proposition shows that we can choose P ∈ Un such that PAP⁻¹ = PAP̄^T is diagonal; this represents the same form with respect to the new basis, and it also has the same eigenvalues as A.
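In code (a sketch; the Hermitian matrix is an arbitrary choice of mine), the rank and signature can therefore be read off from the signs of the eigenvalues:

import numpy as np

rng = np.random.default_rng(5)
M = rng.normal(size=(5, 5)) + 1j * rng.normal(size=(5, 5))
A = M + M.conj().T                       # matrix of some Hermitian form

eigs = np.linalg.eigvalsh(A)             # real eigenvalues
p = int(np.sum(eigs > 1e-10))            # number of positive eigenvalues
q = int(np.sum(eigs < -1e-10))           # number of negative eigenvalues
print("rank =", p + q, " signature =", p - q)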

Corollary 6.22. Both rk(ψ) and sign(ψ) can be read off from the characteristic polynomial of any matrix A for ψ.

Exercise 6.23. Let ψ : Rn × Rn → R be ψ(x, y) = x^T A y, where A is the n × n matrix with 0 in every diagonal entry and 1 in every off-diagonal entry.

Show that chA(x) = (x + 1)^{n−1} (x − (n − 1)), so the signature is 2 − n and the rank is n.
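Before proving it, the claim is easy to check numerically for a particular n (a sketch; n = 6 is an arbitrary choice):

import numpy as np

n = 6
A = np.ones((n, n)) - np.eye(n)          # 0 on the diagonal, 1 everywhere else

eigs = np.sort(np.linalg.eigvalsh(A))
print(eigs)                              # n - 1 eigenvalues equal to -1, and one equal to n - 1
print("signature =", int(np.sum(eigs > 0) - np.sum(eigs < 0)))   # 2 - n
print("rank =", int(np.sum(np.abs(eigs) > 1e-10)))               # n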

Another consequence of the proposition is the simultaneous diagonalisation of some bilinear forms.


Theorem 6.24

Let V be a finite dimensional vector space over C (or R), and let ϕ, ψ : V × V → F be two Hermitian (respectively, symmetric bilinear) forms.

If ϕ is positive definite, then there is some basis v1, . . . , vn of V such that, with respect to this basis, both forms ϕ and ψ are diagonal; that is, ψ(vi, vj) = ϕ(vi, vj) = 0 if i ≠ j.

Proof. As ϕ is positive definite, there exists an orthonormal basis for ϕ; that is, some w1, . . . , wn such that ϕ(wi, wj) = δij.

Now let B be the matrix of ψ with respect to this basis; that is, bij = ψ(wi, wj) = \overline{ψ(wj, wi)} = b̄ji, as ψ is Hermitian.

By the proposition, there is some P ∈ Un (or On(R), if V is over R) such that

P^T B P̄ = D = diag(λ1, . . . , λn)

is diagonal, with λi ∈ R, and the matrix of ϕ with respect to our new basis is P^T I P̄ = I, also diagonal.

Now we ask: what is the "meaning" of the diagonal entries λ1, . . . , λn?

If ϕ, ψ : V × V → F are any two bilinear/sesquilinear forms, then they determine (anti-)linear maps V → V∗ taking v ↦ ϕ(·, v) and v ↦ ψ(·, v), and if ϕ is a non-degenerate form, then the map V → V∗, v ↦ ϕ(·, v), is an (anti-)linear isomorphism. So we can take its inverse, and compose it with the map V → V∗, v ↦ ψ(·, v), to get a (linear!) map V → V. Then λ1, . . . , λn are the eigenvalues of this map.

Exercise 6.25. If ϕ, ψ are both not positive definite, need they be simultaneously diagonalisable?

Remark. In coordinates: choose any basis for V, and let A be the matrix of ϕ and B the matrix of ψ with respect to it. The proof above produces a change of basis matrix R ∈ GLn(C) with R^T A R̄ = I (as ϕ is positive definite) and R^T B R̄ = D. Hence

det(D − xI) = det(R^T (B − xA) R̄) = |det R|² det(B − xA) = det(A)⁻¹ det(B − xA),

so the diagonal entries λ1, . . . , λn are the roots of the polynomial det(B − xA); that is, the roots of det(BA⁻¹ − xI), as claimed.
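Numerically (a sketch; the matrices are arbitrary choices of mine, and the reduction through a Cholesky factor of A is just one convenient way to organise the computation), the λi can be found as the eigenvalues of A⁻¹B:

import numpy as np

rng = np.random.default_rng(6)
n = 4
M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
A = M @ M.conj().T + n * np.eye(n)       # Hermitian and positive definite: the matrix of phi
K = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
B = K + K.conj().T                       # Hermitian: the matrix of psi

L = np.linalg.cholesky(A)                # A = L L*, so L encodes a change to a phi-orthonormal basis
C = np.linalg.inv(L) @ B @ np.linalg.inv(L).conj().T
lam = np.linalg.eigvalsh(C)              # the diagonal entries lambda_1, ..., lambda_n

# they agree with the eigenvalues of A^{-1} B, i.e. the roots of det(B - x A)
print(np.allclose(np.sort(lam), np.sort(np.linalg.eigvals(np.linalg.solve(A, B)).real)))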

Lecture 24

Consider the relationship between On(R) ↪ GLn(R) and Un ↪ GLn(C).

Example 6.26. Take n = 1. We have

GL1(C) = C∗ and U1 = {λ ∈ C : |λ| = 1} = S¹.

We have C∗ ≅ S¹ × R>0, with (λ, r) ↦ λr.


In R, we have GL1(R) = R∗, O1(R) = {±1} and R∗ = {±1} × R>0.

For n > 1, Gram-Schmidt orthonormalisation tells us the relation. Define

A = { diag(λ1, . . . , λn) | λi ∈ R>0 },

the diagonal matrices with positive real entries, and

N(F) = { upper triangular matrices with 1s on the diagonal and entries from F above the diagonal },

where F = R or C. Then A, as a set, is homeomorphic to Rn, and N(F), as a set (not as a group), is isomorphic to F^{n(n−1)/2}, so to R^{n(n−1)/2} or C^{n(n−1)/2}.

Exercise 6.27. Show that

A · N(F) = { upper triangular matrices with diagonal entries λi ∈ R>0 and entries above the diagonal in F }

is a group, that N(F) is a normal subgroup, and that A ∩ N(F) = {I}.

Theorem 6.28

Any A ∈ GLn(C) can be written uniquely as A = QR, with Q ∈ Un, R ∈ A·N(C).

Similarly, any A ∈ GLn(R) can be written uniquely as A = QR, with Q ∈ On(R),R ∈ A ·N(R).

Example 6.29. n = 1 is above.

Proof. This is just Gram-Schmidt.

Write A = (v1 · · · vn), vi ∈ Fn, so v1, . . . , vn is a basis for Fn. Now the Gram-Schmidt algorithm gives an orthonormal basis e1, . . . , en. Recall how it went: set

ẽ1 = v1,
ẽ2 = v2 − (〈v2, ẽ1〉 / 〈ẽ1, ẽ1〉) ẽ1,
ẽ3 = v3 − (〈v3, ẽ2〉 / 〈ẽ2, ẽ2〉) ẽ2 − (〈v3, ẽ1〉 / 〈ẽ1, ẽ1〉) ẽ1,
...
ẽn = vn − Σ_{i=1}^{n−1} (〈vn, ẽi〉 / 〈ẽi, ẽi〉) ẽi,

so that ẽ1, . . . , ẽn are orthogonal, and if we set ei = ẽi / |ẽi|, then e1, . . . , en are an orthonormal basis. So

ẽi = vi + correction terms ∈ vi + 〈ẽ1, . . . , ẽi−1〉 = vi + 〈v1, . . . , vi−1〉;


that is, we can write

ẽ1 = v1,
ẽ2 = v2 + (∗) v1,
ẽ3 = v3 + (∗) v2 + (∗) v1,

and so on, with ∗ ∈ F. In matrix form,

(ẽ1 · · · ẽn) = (v1 · · · vn) U,

where U is upper triangular with 1s on the diagonal, and

(e1 · · · en) = (ẽ1 · · · ẽn) diag(λ1, . . . , λn),

with λi = 1 / |ẽi| > 0. So if Q = (e1 · · · en), this says

Q = A · U · diag(λ1, . . . , λn),

and we call the product U diag(λ1, . . . , λn) = R⁻¹.

Thus QR = A, with R ∈ A · N(F), and e1, . . . , en is an orthonormal basis if and only if Q ∈ Un; that is, if Q̄^T Q = I.

For uniqueness: if QR = Q′R′, then

(Q′)⁻¹Q = R′R⁻¹,

where the left-hand side lies in Un and the right-hand side lies in A · N(F). So it is enough to show that if X = (xij) ∈ A · N(F) ∩ Un, then X = I.

But X is upper triangular with diagonal entries x11, . . . , xnn ∈ R>0, and both the columns and the rows of X are orthonormal, since X ∈ Un. The first column is (x11, 0, . . . , 0)^T, so |x11| = 1; and then, as the first row also satisfies Σ_{i=1}^n |x1i|² = 1, we get x12 = x13 = · · · = x1n = 0.

Then x11 ∈ R>0 ∩ {λ ∈ C | |λ| = 1} implies x11 = 1, so

X = ( 1   0
      0   X′ ),

with X′ ∈ Un−1 ∩ A · N(F), so induction gives X′ = I.
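This factorisation is computed by numpy.linalg.qr. One small adjustment (my normalisation, to match the uniqueness statement): numpy does not insist that the diagonal of R be real and positive, so we rescale by a diagonal unitary matrix.

import numpy as np

rng = np.random.default_rng(7)
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))   # invertible with probability 1

Q, R = np.linalg.qr(A)
phases = np.diag(R) / np.abs(np.diag(R))      # unit complex numbers
Q = Q * phases                                # rescale the columns of Q ...
R = np.conj(phases)[:, None] * R              # ... and the rows of R, so diag(R) > 0

print(np.allclose(Q @ R, A))                                    # A = QR
print(np.allclose(Q.conj().T @ Q, np.eye(4)))                   # Q is unitary
print(np.allclose(R, np.triu(R)), np.all(np.diag(R).real > 0))  # R is in the group A.N(C)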

Warning. Notice that Un is a group, A · N(C) is a group, and if you want you can make Un × A · N(C) into a group by the direct product. But if you do this, then the map in the theorem is not a group homomorphism.


The theorem says that the map

φ : Un × A · N(C) → GLn(C),   (Q, R) ↦ QR

is a bijection of sets, not an isomorphism of groups.

This theorem tells us that the 'shape' of the group GLn(C) and the shape of the group Un are the "same" – one differs from the other by a factor which is a space of the form Ck, a vector space. You will learn in topology the precise words for this – the two groups are homotopy equivalent – and you will learn later on that this means that many of their essential features are the same.

Finally (!!!), let's give another proof that every element of the unitary group is diagonalisable. We already know a very strong form of this. The following proof gives a weaker result, but gives it for a wider class of groups. It uses the same ideas as in the above (probably cryptic) remark.

Consider the map

θ : Matn(C) = Cn² = R2n² → Matn(C) = Cn² = R2n²,   A ↦ Ā^T A.

This is a continuous map, and θ⁻¹({I}) = Un, so Un, as the inverse image of a closed set, is a closed subset of Cn². We also observe that Σj |aij|² = 1 for every row of a unitary matrix, which implies that

Un ⊆ {(aij) | |aij| ≤ 1}

is a bounded set, so Un is a closed, bounded subset of Cn². Thus Un is a compact topological space, and a group (a compact group).

Proposition 6.30. Let G ≤ GLn(C) be a subgroup which is also a closed, bounded subset, that is, a compact subgroup of GLn(C). Then every g ∈ G is diagonalisable as an element of GLn(C); that is, there is some P ∈ GLn(C) such that PgP⁻¹ is diagonal.

Example 6.31. Any g ∈ Un is diagonalisable.

Proof. Consider the sequence of elements 1, g, g², g³, . . . in G. As G is a closed bounded subset, this sequence must have a convergent subsequence.

Let P ∈ GLn(C) be such that PgP⁻¹ is in JNF.

Claim. A sequence a1, a2, . . . in GLn converges if and only if Pa1P⁻¹, Pa2P⁻¹, . . . converges.

Proof of claim. For fixed P, the map A ↦ PAP⁻¹ is a continuous map on Cn², as the matrix coefficients of PAP⁻¹ are linear functions of the matrix coefficients of A; applying this to P and to P⁻¹ gives the claim.

If PgP⁻¹ has a Jordan block λI + Ja of size a > 1 (λ down the diagonal and 1s immediately above it), with λ ≠ 0 since g is invertible,

Hermitian forms | 83

then

(λI + Ja)^N = λ^N I + N λ^{N−1} Ja + (N(N−1)/2) λ^{N−2} Ja² + · · ·

is upper triangular, with λ^N on the diagonal and N λ^{N−1} just above the diagonal.

If |λ| > 1, this has unbounded entries on the diagonal as N → ∞; if |λ| < 1, then the same applies to g⁻¹ ∈ G, whose corresponding Jordan block has eigenvalue λ⁻¹ with |λ⁻¹| > 1 (equivalently, the entries are unbounded as N → −∞). Either way this contradicts the existence of a convergent subsequence.

So it must be that |λ| = 1. But now examine the entries just above the diagonal, and observe that these have absolute value N |λ|^{N−1} = N, which is unbounded as N → ∞, again contradicting the existence of a convergent subsequence.
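A quick numerical illustration of this last step (a sketch; the eigenvalue and the power are arbitrary choices of mine): even with |λ| = 1, the powers of a non-trivial Jordan block are unbounded, because the superdiagonal entry of (λI + Ja)^N has absolute value N.

import numpy as np

lam = np.exp(1j * 0.3)                       # |lam| = 1
J = np.array([[lam, 1.0],
              [0.0, lam]])                   # a 2 x 2 Jordan block

print(np.abs(np.linalg.matrix_power(J, 1000)[0, 1]))   # about 1000: grows linearly in N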