
Linear Algebra: MAT 217
Lecture notes, Spring 2012

Michael Damron

compiled from lectures and exercises designed with Tasho Kaletha

Princeton University


Contents

1 Vector spaces
  1.1 Definitions
  1.2 Subspaces
  1.3 Linear independence and bases
  1.4 Exercises

2 Linear transformations
  2.1 Definitions and basic properties
  2.2 Range and nullspace, one-to-one, onto
  2.3 Isomorphisms and L(V, W)
  2.4 Matrices and coordinates
  2.5 Exercises

3 Dual spaces
  3.1 Definitions
  3.2 Annihilators
  3.3 Double dual
  3.4 Dual maps
  3.5 Exercises

4 Determinants
  4.1 Permutations
  4.2 Determinants: existence and uniqueness
  4.3 Properties of determinants
  4.4 Exercises

5 Eigenvalues
  5.1 Definitions and the characteristic polynomial
  5.2 Eigenspaces and the main diagonalizability theorem
  5.3 Exercises

6 Jordan form
  6.1 Generalized eigenspaces
  6.2 Primary decomposition theorem
  6.3 Nilpotent operators
  6.4 Existence and uniqueness of Jordan form, Cayley-Hamilton
  6.5 Exercises

7 Bilinear forms
  7.1 Definitions
  7.2 Symmetric bilinear forms
  7.3 Sesquilinear and Hermitian forms
  7.4 Exercises

8 Inner product spaces
  8.1 Definitions
  8.2 Orthogonality
  8.3 Adjoints
  8.4 Spectral theory of self-adjoint operators
  8.5 Normal and commuting operators
  8.6 Exercises


1 Vector spaces

1.1 Definitions

We begin with the definition of a vector space. (Keep in mind vectors in R^n or C^n.)

Definition 1.1.1. A vector space is a collection of two sets, V and F . The elements of F (usually we take R or C) are called scalars and the elements of V are called vectors. For each v, w ∈ V , there is a vector sum, v + w ∈ V , with the following properties.

0. There is one (and only one) vector called ~0 with the property

v +~0 = v for all v ∈ V ;

1. for each v ∈ V there is one (and only one) vector called −v with the property

v + (−v) = ~0 for all v ∈ V ;

2. commutativity of vector sum:

v + w = w + v for all v, w ∈ V ;

3. associativity of vector sum:

(v + w) + z = v + (w + z) for all v, w, z ∈ V .

Furthermore, for each v ∈ V and c ∈ F there is a scalar product cv ∈ V with the following properties.

1. For all v ∈ V , 1v = v.

2. For all v ∈ V and c, d ∈ F , (cd)v = c(dv).

3. For all c ∈ F and v, w ∈ V , c(v + w) = cv + cw.

4. For all c, d ∈ F and v ∈ V , (c + d)v = cv + dv.

Here are some examples.

1. V = Rn, F = R. Addition is given by

(v1, . . . , vn) + (w1, . . . , wn) = (v1 + w1, . . . , vn + wn)

and scalar multiplication is given by

c(v1, . . . , vn) = (cv1, . . . , cvn) .


2. Polynomials: take V to be all polynomials of degree up to n with real coefficients

   V = {anx^n + · · · + a1x + a0 : ai ∈ R for all i}

and take F = R. Note the similarity to R^(n+1).

3. Let S be any nonempty set and let V be the set of functions from S to C. Set F = C. If f1, f2 ∈ V set f1 + f2 to be the function given by

(f1 + f2)(s) = f1(s) + f2(s) for all s ∈ S

and if c ∈ C set cf1 to be the function given by

(cf1)(s) = c(f1(s)) .

(A small Python sketch of this example is given after the list of examples.)

4. Let V consist of a single fixed matrix (writing 2 × 2 matrices with rows separated by semicolons),

   V = { (1 1; 0 0) }

(or any other fixed object) with F = C. Define

   (1 1; 0 0) + (1 1; 0 0) = (1 1; 0 0)  and  c (1 1; 0 0) = (1 1; 0 0) .
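As a quick illustration of example 3, here is a minimal sketch in Python of pointwise addition and scalar multiplication of functions; the helper names (add_fn, scale_fn) are illustrative, not from the notes.

```python
# Pointwise operations on functions from a set S to C (example 3 above).

def add_fn(f1, f2):
    """Return the pointwise sum: (f1 + f2)(s) = f1(s) + f2(s)."""
    return lambda s: f1(s) + f2(s)

def scale_fn(c, f):
    """Return the scalar multiple: (c f)(s) = c * f(s)."""
    return lambda s: c * f(s)

# Sample functions with F = C: f1(s) = s^2 and f2(s) = i*s.
f1 = lambda s: s ** 2
f2 = lambda s: 1j * s

g = add_fn(f1, scale_fn(2 + 1j, f2))
print(g(3.0))  # f1(3) + (2+1j)*f2(3) = 9 + (2+1j)*3j = (6+6j)
```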

In general F is allowed to be a field.

Definition 1.1.2. A set F is called a field if for each a, b ∈ F there is an element ab ∈ F and an element a + b ∈ F such that the following hold.

1. For all a, b, c ∈ F we have (ab)c = a(bc) and (a+ b) + c = a+ (b+ c).

2. For all a, b ∈ F we have ab = ba and a+ b = b+ a.

3. There exists an element 0 ∈ F such that for all a ∈ F , we have a + 0 = a; furthermore there is a non-zero element 1 ∈ F such that for all a ∈ F , we have 1a = a.

4. For each a ∈ F there is an element −a ∈ F such that a + (−a) = 0. If a ≠ 0 there exists an element a−1 such that aa−1 = 1.

5. For all a, b, c ∈ F ,

   a(b + c) = ab + ac .

Here are some general facts.

1. For all c ∈ F , c~0 = ~0.


Proof.

   c~0 = c(~0 + ~0) = c~0 + c~0 .

Adding −(c~0) to both sides:

   c~0 + (−(c~0)) = (c~0 + c~0) + (−(c~0))
   ~0 = c~0 + (c~0 + (−(c~0)))
   ~0 = c~0 + ~0
   ~0 = c~0 .

Similarly one may prove that for all v ∈ V , 0v = ~0.

2. For all v ∈ V , (−1)v = −v.

Proof.

v + (−1)v = 1v + (−1)v

= (1 + (−1))v

= 0v

= ~0 .

However −v is the unique vector such that v + (−v) = ~0. Therefore (−1)v = −v.

1.2 Subspaces

Definition 1.2.1. A subset W ⊆ V of a vector space is called a subspace if (W,F ) with the same operations is also a vector space.

Many of the rules for vector spaces follow directly by “inheritance.” For example, if W ⊆ V then for all v, w ∈ W we have v + w = w + v. We actually only need to check a few:

A. ~0 ∈ W .

B. For all w ∈ W the vector −w is also in W .

C. For all w ∈ W and c ∈ F , cw ∈ W .

D. For all v, w ∈ W , v + w ∈ W .

Theorem 1.2.2. W ⊆ V is a subspace if and only if it is nonempty and for all v, w ∈ W and c ∈ F we have cv + w ∈ W .

Proof. Suppose that W is a subspace and let v, w ∈ W , c ∈ F . By (C) we have cv ∈ W . By (D) we have cv + w ∈ W .

Conversely suppose that for all v, w ∈ W and c ∈ F we have cv + w ∈ W . Then we need to show A-D.


A. Since W is nonempty choose w ∈ W . Let v = w and c = −1. This gives ~0 = w − w ∈ W .

B. Set v = w, w = ~0 (possible by A) and c = −1.

C. Set v = w, w = ~0 and c ∈ F .

D. Set c = 1.

Examples:

1. If V is a vector space then {0} is a subspace.

2. Take V = C^n. Let

   W = {(z1, . . . , zn) : z1 + · · · + zn = 0} .

Then W is a subspace. (Exercise.)

3. Let V be the set of 2 × 2 matrices with real entries, with

   (a1 b1; c1 d1) + (a2 b2; c2 d2) = (a1 + a2  b1 + b2; c1 + c2  d1 + d2)

and

   c (a1 b1; c1 d1) = (ca1 cb1; cc1 cd1) .

Then is W a subspace, where

   W = { (a b; c d) : a + d = 1 } ?

4. In R^n all subspaces are “flats” through the origin: {~0}, lines through the origin, planes through the origin, and so on, up to R^n itself.

Theorem 1.2.3. Suppose that C is a non-empty collection of subspaces of V . Then the intersection

   U = ∩_{W ∈ C} W

is a subspace.

Proof. Let c ∈ F and v, w ∈ U . We need to show that (a) U ≠ ∅ and (b) cv + w ∈ U . The first holds because each W ∈ C is a subspace, so ~0 ∈ W for every W ∈ C and thus ~0 ∈ U . Next, for all W ∈ C we have v, w ∈ W , so cv + w ∈ W . Therefore cv + w ∈ U .

Definition 1.2.4. If S ⊆ V is a subset of vectors let C_S be the collection of all subspaces containing S. The span of S is defined as

   span(S) = ∩_{W ∈ C_S} W .

Since C_S is non-empty (it contains V ), span(S) is a subspace.


Question: What is span({})?

Definition 1.2.5. If

   v = a1w1 + · · · + akwk

for scalars ai ∈ F and vectors wi ∈ V then we say that v is a linear combination of {w1, . . . , wk}.

Theorem 1.2.6. If S ≠ ∅ then span(S) is equal to the set of all finite linear combinations of elements of S.

Proof. Write S′ for the set of all finite linear combinations of elements of S. We want to show that S′ = span(S). First we show that S′ ⊆ span(S). Let

   a1s1 + · · · + aksk ∈ S′

and let W be a subspace in C_S. Since si ∈ S for all i we have si ∈ W . By virtue of W being a subspace, a1s1 + · · · + aksk ∈ W . Since this is true for all W ∈ C_S, then a1s1 + · · · + aksk ∈ span(S). Therefore S′ ⊆ span(S).

In the other direction, the set S′ is itself a subspace and it contains S (exercise). Thus S′ ∈ C_S and so

   span(S) = ∩_{W ∈ C_S} W ⊆ S′ .

Question: If W1 and W2 are subspaces, is W1 ∪ W2 a subspace? If it were, it would be the smallest subspace containing W1 ∪ W2 and thus would be equal to span(W1 ∪ W2). But in general it is not! (Consider two distinct lines through the origin in R^2.)

Proposition 1.2.7. If W1 and W2 are subspaces then span(W1 ∪ W2) = W1 + W2, where

   W1 + W2 = {w1 + w2 : w1 ∈ W1, w2 ∈ W2} .

Furthermore for each n ≥ 1,

   span(W1 ∪ · · · ∪ Wn) = W1 + · · · + Wn .

Proof. A general element w1 + w2 in W1 + W2 is a linear combination of elements in W1 ∪ W2, so it is in the span of W1 ∪ W2. On the other hand, if a1v1 + · · · + anvn is an element of the span then we can split the vi's into vectors from W1 and those from W2: after reordering, say v1, . . . , vk ∈ W1 and vk+1, . . . , vn ∈ W2. Now a1v1 + · · · + akvk ∈ W1 and ak+1vk+1 + · · · + anvn ∈ W2 by the fact that these are subspaces. Thus this linear combination is equal to w1 + w2 for

   w1 = a1v1 + · · · + akvk and w2 = ak+1vk+1 + · · · + anvn .

The general case is an exercise.

Remark. 1. Span(S) is the “smallest” subspace containing S in the following sense: if W is any subspace containing S then span(S) ⊆ W .

2. If S ⊆ T then Span(S) ⊆ Span(T ).

3. If W is a subspace then Span(W ) = W . Therefore Span(Span(S)) = Span(S).


1.3 Linear independence and bases

Now we move on to linear independence.

Definition 1.3.1. We say that vectors v1, . . . , vk ∈ V are linearly independent if whenever

a1v1 + · · ·+ akvk = ~0 for scalars ai ∈ F

then ai = 0 for all i. Otherwise we say they are linearly dependent.

Lemma 1.3.2. Let S = {v1, . . . , vn} for n ≥ 1. Then S is linearly dependent if and only if there exists v ∈ S such that v ∈ Span(S \ {v}).

Proof. Suppose first that S is linearly dependent and that n ≥ 2. Then there exist scalars a1, . . . , an ∈ F which are not all zero such that a1v1 + · · · + anvn = ~0. By reordering we may assume that a1 ≠ 0. Now

   v1 = (−a2/a1)v2 + · · · + (−an/a1)vn .

So v1 ∈ Span(S \ {v1}).

If S is linearly dependent and n = 1 then there exists a nonzero a1 such that a1v1 = ~0, so v1 = ~0. Now v1 ∈ Span({}) = Span(S \ {v1}).

Conversely, suppose there exists v ∈ S such that v ∈ Span(S \ {v}) and n ≥ 2. By reordering we may suppose that v = v1. Then there exist scalars a2, . . . , an such that

   v1 = a2v2 + · · · + anvn .

But now we have

   (−1)v1 + a2v2 + · · · + anvn = ~0 .

Since this is a nontrivial linear combination, S is linearly dependent.

If n = 1 and v1 ∈ Span(S \ {v1}) then v1 ∈ Span({}) = {~0}, so that v1 = ~0. Now it is easy to see that S is linearly dependent.

Examples:

1. For two vectors v1, v2 ∈ V , they are linearly dependent if and only if one is a scalar multiple of the other. By reordering, we may suppose v1 = av2.

2. For three vectors this is not true anymore. The vectors (1, 1), (1, 0) and (0, 1) in R^2 are linearly dependent since

   (1, 1) + (−1)(1, 0) + (−1)(0, 1) = (0, 0) .

However none of these is a scalar multiple of another. (This example is checked numerically after the list.)


3. {~0} is linearly dependent: 1 · ~0 = ~0 .
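A quick numerical check of example 2, assuming numpy is available: three vectors in R^2 are linearly dependent exactly when the matrix having them as rows has rank less than 3.

```python
import numpy as np

# The three vectors of example 2 as the rows of a matrix.
vectors = np.array([[1, 1],
                    [1, 0],
                    [0, 1]])

# Rank 2 < 3, so the three vectors are linearly dependent.
print(np.linalg.matrix_rank(vectors))        # 2

# The dependence relation from the notes: (1,1) - (1,0) - (0,1) = (0,0).
print(vectors[0] - vectors[1] - vectors[2])  # [0 0]
```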

Proposition 1.3.3. If S is a linearly independent set and T is a nonempty subset then T is linearly independent.

Proof. Suppose that

   a1t1 + · · · + antn = ~0

for distinct vectors ti ∈ T and scalars ai ∈ F . Since each ti ∈ S this is a linear combination of elements of S. Since S is linearly independent all the coefficients must be zero. Thus ai = 0 for all i. This was an arbitrary linear combination of elements of T , so T is linearly independent.

Corollary 1.3.4. If S is linearly dependent and R contains S then R is linearly dependent. In particular, any set containing ~0 is linearly dependent.

Proposition 1.3.5. If S is linearly independent and v ∈ Span(S) then there exist unique vectors v1, . . . , vn ∈ S and scalars a1, . . . , an such that

v = a1v1 + · · ·+ anvn .

Proof. Suppose that S is linearly independent and there are two representations

   v = a1v1 + · · · + anvn and v = b1w1 + · · · + bkwk .

Split the vectors in {v1, . . . , vn} ∪ {w1, . . . , wk} into three sets: S1 are those in the first representation but not the second, S2 are those in the second but not the first, and S3 are those in both. Subtracting the two representations,

   ~0 = v − v = ∑_{sj ∈ S1} aj sj − ∑_{sj ∈ S2} bj sj + ∑_{sj ∈ S3} (aj − bj) sj .

This is a linear combination of elements from S and by linear independence all coefficients are zero. Thus both representations used the same vectors (those in S3, since the coefficients on S1 and S2 vanish) and with the same coefficients, and are thus the same.

Lemma 1.3.6 (Steinitz exchange). Let L = {v1, . . . , vk} be a linearly independent set in a vector space V and let S = {w1, . . . , wm} be a spanning set; that is, Span(S) = V . Then

1. k ≤ m and

2. there exist m− k vectors s1, . . . , sm−k ∈ S such that

Span({v1, . . . , vk, s1, . . . , sm−k}) = V .


Proof. We will prove this by induction on k. For k = 0 it is obviously true (using the fact that {} is linearly independent). Suppose it is true for k and we will prove it for k + 1. In other words, let {v1, . . . , vk+1} be a linearly independent set. Since {v1, . . . , vk} is linearly independent (Proposition 1.3.3), the inductive hypothesis gives k ≤ m and vectors s1, . . . , sm−k ∈ S with

   Span({v1, . . . , vk, s1, . . . , sm−k}) = V .

Now since vk+1 ∈ V we can find scalars a1, . . . , ak and b1, . . . , bm−k in F such that

   vk+1 = a1v1 + · · · + akvk + b1s1 + · · · + bm−ksm−k .   (1)

We claim that not all of the bi's are zero. If this were the case then we would have

   vk+1 = a1v1 + · · · + akvk
   a1v1 + · · · + akvk + (−1)vk+1 = ~0 ,

a contradiction to linear independence. Also this implies that k ≠ m, since otherwise the linear combination (1) would contain no bi's. Thus

   k + 1 ≤ m .

Suppose for example that b1 ≠ 0. Then we could write

   b1s1 = (−a1)v1 + · · · + (−ak)vk + vk+1 + (−b2)s2 + · · · + (−bm−k)sm−k
   s1 = (−a1/b1)v1 + · · · + (−ak/b1)vk + (1/b1)vk+1 + (−b2/b1)s2 + · · · + (−bm−k/b1)sm−k .

In other words,

   s1 ∈ Span(v1, . . . , vk+1, s2, . . . , sm−k)

or, said differently,

   V = Span(v1, . . . , vk, s1, . . . , sm−k) ⊆ Span(v1, . . . , vk+1, s2, . . . , sm−k) .

This completes the proof.

This lemma has loads of consequences.

Definition 1.3.7. A set B ⊆ V is called a basis for V if B is linearly independent and Span(B) = V .

Corollary 1.3.8. If B1 and B2 are bases for V then they have the same number of elements.

Proof. If both sets are infinite, we are done. If B1 is infinite and B2 is finite then we may choose |B2| + 1 elements of B1 that will be linearly independent. Since B2 spans V , this contradicts Steinitz. Similarly if B1 is finite and B2 is infinite.

Otherwise they are both finite. Since each spans V and is linearly independent, we apply Steinitz twice to get |B1| ≤ |B2| and |B2| ≤ |B1|.


Definition 1.3.9. We define the dimension dim V to be the number of elements in a basis. By the above, this is well-defined.

Remark. Each nonzero element of V has a unique representation in terms of a basis.

Theorem 1.3.10. Let V be a nonzero vector space and suppose that V is finitely generated; that is, there is a finite set S ⊆ V such that V = Span(S). Then V has a finite basis: dim V < ∞.

Proof. Let B be a minimal spanning subset of S. We claim that B is linearly independent. If not, then by a previous result, there is a vector b ∈ B such that b ∈ Span(B \ {b}). It then follows that

   V = Span(B) ⊆ Span(B \ {b}) ,

so B \ {b} is a spanning set, a contradiction.

Theorem 1.3.11 (1 subspace theorem). Let V be a finite dimensional vector space and W be a nonzero subspace of V . Then dim W < ∞. If C = {w1, . . . , wk} is a basis for W then there exists a basis B for V such that C ⊆ B.

Proof. It is an exercise to show that dim W < ∞. Write n for the dimension of V and let B be a basis for V (with n elements). Since this is a spanning set and C is a linearly independent set, there exist vectors b1, . . . , bn−k ∈ B such that

   B′ := C ∪ {b1, . . . , bn−k}

is a spanning set. Note that B′ is a spanning set with n = dim V elements. Thus we will be done if we prove the following lemma.

Lemma 1.3.12. Let V be a vector space of dimension n ≥ 1 and S = {v1, . . . , vk} ⊆ V .

1. If k < n then S cannot span V .

2. If k > n then S cannot be linearly independent.

3. If k = n then S is linearly independent if and only if S spans V .

Proof. Let B be a basis of V . If S spans V , Steinitz gives that |B| ≤ |S|. This proves 1. If S is linearly independent then again Steinitz gives |S| ≤ |B|. This proves 2.

Suppose that k = n and S is linearly independent. By Steinitz (here m = k = n), we may add m − k = 0 vectors from the set B to S to make S span V . Thus Span(S) = V . Conversely, suppose that Span(S) = V . If S is linearly dependent then there exists s ∈ S such that s ∈ Span(S \ {s}). Then S \ {s} is a set of smaller cardinality than S and spans V . But now this contradicts Steinitz, using B as our linearly independent set and S \ {s} as our spanning set.

Corollary 1.3.13. If V is a vector space and W is a subspace then dim W ≤ dim V .

Theorem 1.3.14. Let W1 and W2 be subspaces of V (with dim V < ∞). Then

dim W1 + dim W2 = dim (W1 ∩W2) + dim (W1 +W2) .


Proof. Easy if either is zero. Otherwise we argue as follows. Let

   {v1, . . . , vk}

be a basis for W1 ∩ W2. By the 1-subspace theorem, extend this to a basis of W1:

   {v1, . . . , vk, w1, . . . , wm1}

and also extend it to a basis for W2:

   {v1, . . . , vk, u1, . . . , um2} .

We claim that

   B := {v1, . . . , vk, w1, . . . , wm1, u1, . . . , um2}

is a basis for W1 + W2. It is not hard to see it is spanning. To show linear independence, suppose

   a1v1 + · · · + akvk + b1w1 + · · · + bm1wm1 + c1u1 + · · · + cm2um2 = ~0 .

Then

   c1u1 + · · · + cm2um2 = (−a1)v1 + · · · + (−ak)vk + (−b1)w1 + · · · + (−bm1)wm1 ∈ W1 .

Also it is clearly in W2. So it is in W1 ∩ W2. So we can find scalars d1, . . . , dk such that

   c1u1 + · · · + cm2um2 = d1v1 + · · · + dkvk
   d1v1 + · · · + dkvk + (−c1)u1 + · · · + (−cm2)um2 = ~0 .

This is a linear combination of basis elements of W2, so all the ci's are zero. A similar argument gives all bi's as zero. Thus we finally have

   a1v1 + · · · + akvk = ~0 .

But again this is a linear combination of basis elements so the ai's are zero. Counting elements, dim W1 + dim W2 = (k + m1) + (k + m2) = k + (k + m1 + m2) = dim (W1 ∩ W2) + dim (W1 + W2).
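A numerical sanity check of this dimension formula in R^3, assuming numpy: two coordinate planes intersect in a line, so 2 + 2 = 1 + 3.

```python
import numpy as np

W1 = np.array([[1., 0., 0.],
               [0., 1., 0.]])   # spans the xy-plane
W2 = np.array([[0., 1., 0.],
               [0., 0., 1.]])   # spans the yz-plane

# dim(W1 + W2) = rank of the stacked spanning vectors.
dim_sum = np.linalg.matrix_rank(np.vstack([W1, W2]))  # 3

# dim W1 + dim W2 = dim(W1 ∩ W2) + dim(W1 + W2): here 2 + 2 = 1 + 3,
# the intersection being the y-axis.
print(2 + 2 == 1 + dim_sum)    # True
```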

Theorem 1.3.15 (2 subspace theorem). Let V be a finite dimensional vector space and W1 and W2 be nonzero subspaces. There exists a basis of V that contains bases for W1 and W2.

Proof. The proof of the previous theorem shows that there is a basis for W1 + W2 which contains bases for W1 and W2. Extend this to a basis for V .


1.4 Exercises

Notation:

1. If F is a field then define s1 : F → F by s1(x) = x. For integers n ≥ 2 define sn : F → F by sn(x) = x + sn−1(x). Last, define the characteristic of F as

char(F ) = min{n : sn(1) = 0} .

If sn(1) ≠ 0 for all n ≥ 1 then we set char(F ) = 0.

Exercises:

1. The finite field Fp: For n ∈ N, let Zn denote the set of integers mod n. That is, each element of Z/nZ is a subset of Z of the form d + nZ, where d ∈ Z. We define addition and multiplication on Zn by

• (a+ nZ) + (b+ nZ) = (a+ b) + nZ.

• (a+ nZ) · (b+ nZ) = (ab) + nZ.

Show that these operations are well defined. That is, if a′, b′ ∈ Z are integers such that a + nZ = a′ + nZ and b + nZ = b′ + nZ, then (a′ + nZ) + (b′ + nZ) = (a + b) + nZ and (a′ + nZ) · (b′ + nZ) = (ab) + nZ. Moreover, show that these operations make Zn into a field if and only if n is prime. In that case, one writes Z/pZ = Fp. (A small numerical illustration of this exercise is given after the list of exercises.)

2. The finite field Fp^n: Let F be a finite field.

(a) Show that char(F ) is a prime number (in particular non-zero).

(b) Write p for the characteristic of F and define

   F′ = {sn(1) : n = 1, . . . , p} ,

where sn is the function given in the notation section. Show that F′ is a subfield of F , isomorphic to Fp.

(c) We can consider F as a vector space over F′. Vector addition and scalar multiplication are interpreted using the operations of F . Show that F has finite dimension.

(d) Writing n for the dimension of F over F′, show that

   |F | = p^n .

3. Consider R as a vector space over Q (using addition and multiplication of real numbers). Does this vector space have finite dimension?


4. Recall the definition of direct sum: if W1 and W2 are subspaces of a vector space V then we write W1 ⊕ W2 for the space W1 + W2 if W1 ∩ W2 = {~0}. For k ≥ 3 we write W1 ⊕ · · · ⊕ Wk for the space W1 + · · · + Wk if for each i = 2, . . . , k, we have

Wi ∩ (W1 + · · ·+Wi−1) = {~0} .

Let S = {v1, . . . , vn} be a subset of nonzero vectors in a vector space V and for each k = 1, . . . , n write Wk = Span({vk}). Show that S is a basis for V if and only if

V = W1 ⊕ · · · ⊕Wn .
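Returning to Exercise 1, here is a small numerical experiment (an illustration only, not a substitute for the proof), assuming a Python interpreter: every nonzero residue has a multiplicative inverse mod n exactly when n is prime.

```python
def has_all_inverses(n):
    """Check whether every nonzero a in Z/nZ has b with a*b = 1 (mod n)."""
    return all(any(a * b % n == 1 for b in range(1, n)) for a in range(1, n))

print(has_all_inverses(7))   # True: Z/7Z = F_7 is a field
print(has_all_inverses(6))   # False: 2 has no inverse mod 6
```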


2 Linear transformations

2.1 Definitions and basic properties

We now move to linear transformations.

Definition 2.1.1. Let V and W be vector spaces over the same field F . A function T : V → W is called a linear transformation if

1. for all v1, v2 ∈ V , T (v1 + v2) = T (v1) + T (v2) and

2. for all v ∈ V and c ∈ F , T (cv) = cT (v).

Remark. We only need to check that for all v1, v2 ∈ V and c ∈ F , T (cv1 + v2) = cT (v1) + T (v2).

Examples:

1. Let V = F^n and W = F^m, the vector spaces of n-tuples and m-tuples respectively. Any m × n matrix A defines a linear transformation LA : F^n → F^m by

   LA~v = A · ~v .

2. Let V be a finite dimensional vector space (of dimension n) and fix an (ordered) basis

β = {v1, . . . , vn}

of V . Define Tβ : V → F^n by

   Tβ(v) = (a1, . . . , an) ,

where v = a1v1 + · · · + anvn. Then Tβ is linear. It is called the coordinate map for the ordered basis β.
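A concrete sketch of the coordinate map for V = R^2, assuming numpy: if the basis vectors are the columns of a matrix B, the coordinates of v are the solution of B a = v.

```python
import numpy as np

# Ordered basis beta = {(1,0), (1,1)}, stored as the columns of B.
B = np.array([[1., 1.],
              [0., 1.]])
v = np.array([3., 2.])

# T_beta(v) = (a_1, a_2) where a solves B a = v.
a = np.linalg.solve(B, v)
print(a)   # [1. 2.], i.e. v = 1*(1,0) + 2*(1,1)
```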

Suppose that T : F^n → F^m is a linear transformation and ~x ∈ F^n. Then writing ~x = (x1, . . . , xn),

   T (~x) = T ((x1, . . . , xn))
         = T (x1(1, 0, . . . , 0) + · · · + xn(0, . . . , 0, 1))
         = x1T ((1, 0, . . . , 0)) + · · · + xnT ((0, . . . , 0, 1)) .

Therefore we only need to know the values of T at the standard basis. This leads us to:

Theorem 2.1.2 (The slogan). Given V and W , vector spaces over F , let {v1, . . . , vn} be a basis for V . If {w1, . . . , wn} are any vectors in W , there exists exactly one linear transformation T : V → W such that

T (vi) = wi for all i = 1, . . . , n .


Proof. We need to prove two things: (a) there is such a linear transformation and (b) there cannot be more than one. Motivated by the above, we first prove (a).

Each v ∈ V has a unique representation

   v = a1v1 + · · · + anvn .

Define T by

   T (v) = a1w1 + · · · + anwn .

Note that by unique representations, T is well-defined. We claim that T is linear. Let v, v′ ∈ V and c ∈ F . We must show that T (cv + v′) = cT (v) + T (v′). If v = a1v1 + · · · + anvn and v′ = a′1v1 + · · · + a′nvn then the unique representation of cv + v′ is

   cv + v′ = (ca1 + a′1)v1 + · · · + (can + a′n)vn .

Therefore

   T (cv + v′) = (ca1 + a′1)w1 + · · · + (can + a′n)wn
              = c(a1w1 + · · · + anwn) + (a′1w1 + · · · + a′nwn)
              = cT (v) + T (v′) .

Thus T is linear.

Now we show that T is unique. Suppose that T′ is another linear transformation such that

   T′(vi) = wi for all i = 1, . . . , n .

Then if v ∈ V write v = a1v1 + · · · + anvn. We have

   T′(v) = T′(a1v1 + · · · + anvn)
         = a1T′(v1) + · · · + anT′(vn)
         = a1w1 + · · · + anwn
         = T (v) .

Since T (v) = T′(v) for all v, by definition T = T′.

2.2 Range and nullspace, one-to-one, onto

Definition 2.2.1. If T : V → W is linear we define the nullspace of T by

N(T ) = {v ∈ V : T (v) = ~0} .

Here, ~0 is the zero vector in W . We also define the range of T

R(T ) = {w ∈ W : there exists v ∈ V s.t. T (v) = w} .


Proposition 2.2.2. If T : V → W is linear then N(T ) is a subspace of V and R(T ) is a subspace of W .

Proof. First note that T (~0) = ~0. This holds from

~0 = 0T (v) = T (0v) = T (~0) .

Therefore both spaces are nonempty.

Choose v1, v2 ∈ N(T ) and c ∈ F . Then

T (cv1 + v2) = cT (v1) + T (v2) = c~0 +~0 = ~0 ,

so that cv1 + v2 ∈ N(T ). Therefore N(T ) is a subspace of V . Also if w1, w2 ∈ R(T ) and c ∈ F then we may find v1, v2 ∈ V such that T (v1) = w1 and T (v2) = w2. Now

T (cv1 + v2) = cT (v1) + T (v2) = cw1 + w2 ,

so that cw1 + w2 ∈ R(T ). Therefore R(T ) is a subspace of W .

Definition 2.2.3. Let S and T be sets and f : S → T a function.

1. f is one-to-one if it maps distinct points to distinct points. In other words, if s1, s2 ∈ S such that s1 ≠ s2 then f(s1) ≠ f(s2). Equivalently, whenever f(s1) = f(s2) then s1 = s2.

2. f is onto if its range is equal to T . That is, for each t ∈ T there exists s ∈ S such that f(s) = t.

Theorem 2.2.4. Let T : V → W be linear. Then T is one-to-one if and only if N(T ) = {~0}.

Proof. Suppose T is one-to-one. We want to show that N(T ) = {~0}. Clearly ~0 ∈ N(T ) as it is a subspace of V . If v ∈ N(T ) then we have

   T (v) = ~0 = T (~0) .

Since T is one-to-one this implies that v = ~0.

Suppose conversely that N(T ) = {~0}. If T (v1) = T (v2) then

   ~0 = T (v1) − T (v2) = T (v1 − v2) ,

so that v1 − v2 ∈ N(T ). But the only vector in the nullspace is ~0 so v1 − v2 = ~0. This implies that v1 = v2 and T is one-to-one.

We now want to give a theorem that characterizes one-to-one and onto linear maps in a different way.

Theorem 2.2.5. Let T : V → W be linear.


1. T is one-to-one if and only if it maps linearly independent sets in V to linearly independent sets in W .

2. T is onto if and only if it maps spanning sets of V to spanning sets of W .

Proof. Suppose that T is one-to-one and that S is a linearly independent set in V . We will show that T (S), defined by

T (S) := {T (s) : s ∈ S} ,

is also linearly independent. If

a1T (s1) + · · ·+ akT (sk) = ~0

for some si ∈ S and ai ∈ F then

T (a1s1 + · · ·+ aksk) = ~0 .

Therefore a1s1 + · · ·+ aksk ∈ N(T ). But T is one-to-one so N(T ) = {~0}. This gives

a1s1 + · · ·+ aksk = ~0 .

Linear independence of S gives that the ai's are zero. Thus T (S) is linearly independent.

Suppose conversely that T maps linearly independent sets to linearly independent sets. If v is any nonzero vector in V then {v} is linearly independent. Therefore so is {T (v)}. This implies that T (v) ≠ ~0. Therefore N(T ) = {~0} and so T is one-to-one.

Next suppose that T maps spanning sets to spanning sets. Let S be a spanning set of V , so that T (S) spans W . If w ∈ W we can write w = a1T (s1) + · · · + akT (sk) for ai ∈ F and si ∈ S, so

w = T (a1s1 + · · ·+ aksk) ∈ R(T ) ,

giving that T is onto.

For the converse suppose that T is onto and that S spans V . We claim that T (S) spans

W . To see this, let w ∈ W and note there exists v ∈ V such that T (v) = w. Write

v = a1s1 + · · ·+ aksk ,

so w = T (v) = a1T (s1) + · · ·+ akT (sk) ∈ Span(T (S)). Therefore T (S) spans W .

Corollary 2.2.6. Let T : V → W be linear.

1. If V and W are finite dimensional, then T is an isomorphism (one-to-one and onto) if and only if T maps bases to bases.

2. If V is finite dimensional, then every basis of V is mapped to a spanning set of R(T ).

3. If V is finite dimensional, then T is one-to-one if and only if T maps bases of V to bases of R(T ).


Theorem 2.2.7 (Rank-nullity theorem). Let T : V → W be linear and V of finite dimension. Then, writing rank(T ) = dim R(T ) and nullity(T ) = dim N(T ),

   rank(T ) + nullity(T ) = dim V .

Proof. Let

   {v1, . . . , vk}

be a basis for N(T ). Extend it to a basis

{v1, . . . , vk, vk+1, . . . , vn}

of V . Write N ′ = Span{vk+1, . . . , vn} and note that V = N(T )⊕N ′.

Lemma 2.2.8. If N(T ) and N ′ are complementary subspaces (that is, V = N(T ) ⊕ N ′) then T is one-to-one on N ′.

Proof. If z1, z2 ∈ N ′ such that T (z1) = T (z2) then z1 − z2 ∈ N(T ). But it is in N ′ so it is in N ′ ∩ N(T ), which is only the zero vector. So z1 = z2.

We may view T as a linear transformation only on N ′; call it T |N ′ ; in other words, T |N ′ is a linear transformation from N ′ to W that acts exactly as T does. By the corollary, {T |N ′(vk+1), . . . , T |N ′(vn)} is a basis for R(T |N ′). Therefore {T (vk+1), . . . , T (vn)} is a basis for R(T |N ′). By part two of the corollary, {T (v1), . . . , T (vn)} spans R(T ). So

   R(T ) = Span({T (v1), . . . , T (vn)}) = Span({T (vk+1), . . . , T (vn)}) = R(T |N ′) .

The second equality follows because the vectors T (v1), . . . , T (vk) are all zero and do not contribute to the span (you can work this out as an exercise). Thus {T (vk+1), . . . , T (vn)} is a basis for R(T ) and

rank(T ) + nullity(T ) = (n− k) + k = n = dim V .
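A numerical illustration of rank-nullity, assuming numpy and scipy: for T = LA : R^4 → R^3 given by a matrix A, the rank of A plus the dimension of its nullspace is 4.

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1., 2., 0., 1.],
              [0., 1., 1., 0.],
              [1., 3., 1., 1.]])      # third row = first + second, so rank 2

rank = np.linalg.matrix_rank(A)
nullity = null_space(A).shape[1]      # number of basis vectors of N(L_A)
print(rank, nullity, rank + nullity)  # 2 2 4  (= dim V)
```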

2.3 Isomorphisms and L(V,W )

Definition 2.3.1. If S and T are sets and f : S → T is a function then a function g : T → S is called an inverse function for f (written f−1) if

f(g(t)) = t and g(f(s)) = s for all t ∈ T, s ∈ S .

Fact: f : S → T has an inverse function if and only if f is one-to-one and onto. Furthermore the inverse is one-to-one and onto. (Explain this.)

Theorem 2.3.2. If T : V → W is an isomorphism then the inverse map T−1 : W → V is an isomorphism.


Proof. The inverse map is one-to-one and onto; we just need to show it is linear. Suppose that w1, w2 ∈ W and c ∈ F . Then

   T (T−1(cw1 + w2)) = cw1 + w2

and

   T (cT−1(w1) + T−1(w2)) = cT (T−1(w1)) + T (T−1(w2)) = cw1 + w2 .

However T is one-to-one, so

T−1(cw1 + w2) = cT−1(w1) + T−1(w2) .

The proof of the next lemma is in the homework.

Lemma 2.3.3. Let V and W be finite-dimensional vector spaces with dim V = dim W . If T : V → W is linear then T is one-to-one if and only if T is onto.

Example: The coordinate map is an isomorphism. For V of dimension n choose a basis

β = {v1, . . . , vn}

and define Tβ : V → F^n by Tβ(v) = (a1, . . . , an), where

   v = a1v1 + · · · + anvn .

Then Tβ is linear (check). To show one-to-one and onto we only need to check one (since dim V = dim F^n). If (a1, . . . , an) ∈ F^n then define v = a1v1 + · · · + anvn. Now

   Tβ(v) = (a1, . . . , an) .

So Tβ is onto.

The space of linear maps. Let V and W be vector spaces over the same field F . Define

L(V,W ) = {T : V → W |T is linear} .

We define addition and scalar multiplication as usual: for T, U ∈ L(V,W ) and c ∈ F ,

(T + U)(v) = T (v) + U(v) and (cT )(v) = cT (v) .

This is a vector space (exercise).

Theorem 2.3.4. If dim V = n and dim W = m then dim L(V,W ) = mn. Given bases {v1, . . . , vn} and {w1, . . . , wm} of V and W , the set {Ti,j : 1 ≤ i ≤ n, 1 ≤ j ≤ m} is a basis for L(V,W ), where

   Ti,j(vk) = wj if i = k, and Ti,j(vk) = ~0 otherwise.


Proof. First, to show linear independence, suppose that

   ∑_{i,j} ai,j Ti,j = 0 ,

where the element on the right is the zero transformation. Then for each k = 1, . . . , n, apply both sides to vk:

   ∑_{i,j} ai,j Ti,j(vk) = ~0 .

We then get

   ~0 = ∑_{j=1}^m ∑_{i=1}^n ai,j Ti,j(vk) = ∑_{j=1}^m ak,j wj .

But the wj's form a basis, so ak,1 = · · · = ak,m = 0. This is true for all k so the Ti,j's are linearly independent.

To show spanning suppose that T : V → W is linear. Then for each i = 1, . . . , n, the vector T (vi) is in W , so we can write it in terms of the wj's:

   T (vi) = ai,1w1 + · · · + ai,mwm .

Now define the transformation

   U = ∑_{i,j} ai,j Ti,j .

We claim that U equals T . To see this, we must only check on the basis vectors. For each k = 1, . . . , n,

   T (vk) = ak,1w1 + · · · + ak,mwm .

However,

   U(vk) = ∑_{i,j} ai,j Ti,j(vk) = ∑_{j=1}^m ∑_{i=1}^n ai,j Ti,j(vk) = ∑_{j=1}^m ak,j wj = ak,1w1 + · · · + ak,mwm .

Since T and U agree on a basis, the slogan (Theorem 2.1.2) gives T = U.

2.4 Matrices and coordinates

Let T : V → W be a linear transformation and

β = {v1, . . . , vn} and γ = {w1, . . . , wm}

bases for V and W , respectively.

We now build a matrix, which we label [T]^β_γ (superscript for the input basis, subscript for the output basis), using the column convention.


1. Since T (v1) ∈ W , we can write

T (v1) = a1,1w1 + · · ·+ am,1wm .

2. Put the entries a1,1, . . . , am,1 into the first column of [T]^β_γ .

3. Repeat for k = 1, . . . , n, writing

T (vk) = a1,kw1 + · · ·+ am,kwm

and place the entries a1,k, . . . , am,k into the k-th column.
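As a concrete illustration of this construction (the choice of operator here is an assumption of this sketch, not an example from the notes): Python code building, column by column, the matrix of the differentiation operator on polynomials of degree at most 2, with β = γ = {1, x, x^2}.

```python
import numpy as np

# Column k holds the gamma-coordinates of T(v_k), where T = d/dx.
M = np.zeros((3, 3))
M[:, 0] = [0, 0, 0]   # T(1)   = 0
M[:, 1] = [1, 0, 0]   # T(x)   = 1
M[:, 2] = [0, 2, 0]   # T(x^2) = 2x

# Check on p(x) = 3 + 5x + 7x^2: coordinates (3, 5, 7); p'(x) = 5 + 14x.
print(M @ np.array([3, 5, 7]))   # [ 5. 14.  0.]
```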

Theorem 2.4.1. For each T : V → W and bases β and γ, there exists a unique m × n matrix [T]^β_γ such that for all v ∈ V ,

   [T]^β_γ [v]_β = [T (v)]_γ .

Proof. Let v ∈ V . Then write v = a1v1 + · · ·+ anvn.

T (v) = a1T (v1) + · · ·+ anT (vn)

= a1(a1,1w1 + · · ·+ am,1wm) + · · ·+ an(a1,nw1 + · · ·+ am,nwm) .

Collecting terms,

T (v) = (a1a1,1 + · · ·+ ana1,n)w1 + · · ·+ (a1am,1 + · · ·+ anam,n)wm .

This gives the coordinates of T (v) in terms of γ:

[T (v)]_γ = ( a1a1,1 + · · · + ana1,n ; · · · ; a1am,1 + · · · + anam,n )
          = ( a1,1 · · · a1,n ; · · · ; am,1 · · · am,n ) ( a1 ; · · · ; an )
          = [T]^β_γ [v]_β .

Suppose that A and B are two matrices such that for all v ∈ V ,

   A[v]_β = [T (v)]_γ = B[v]_β .

Take v = vk. Then [v]_β = ~ek and A[v]_β is the k-th column of A (and similarly for B). Therefore A and B have all the same columns. This means A = B.

Theorem 2.4.2. Given bases β and γ of V and W , the map T ↦ [T]^β_γ from L(V,W ) to the space of m × n matrices over F is an isomorphism.

Proof. It is one-to-one, since by Theorem 2.4.1 the matrix determines T (v) for every v. To show linearity let c ∈ F and T, U ∈ L(V,W ). For each k = 1, . . . , n,

   [cT + U]^β_γ [vk]_β = [(cT + U)(vk)]_γ = c[T (vk)]_γ + [U(vk)]_γ = c[T]^β_γ [vk]_β + [U]^β_γ [vk]_β = (c[T]^β_γ + [U]^β_γ)[vk]_β .

However the left side is the k-th column of [cT + U]^β_γ and the right side is the k-th column of c[T]^β_γ + [U]^β_γ .

Both spaces have the same dimension, so the map is also onto and thus an isomorphism.


Examples:

1. Change of coordinates: Let V be finite dimensional and β, β′ two bases of V . How do we express coordinates with respect to β′ in terms of those with respect to β? For each v ∈ V ,

   [v]_{β′} = [Iv]_{β′} = [I]^β_{β′} [v]_β .

We multiply by the matrix [I]^β_{β′} . (A numerical sketch follows these examples.)

2. Suppose that V, W and Z are vector spaces over the same field. Let T : V → W and U : W → Z be linear with β, γ and δ bases for V, W and Z.

(a) UT : V → Z is linear. If v1, v2 ∈ V and c ∈ F then

   (UT )(cv1 + v2) = U(T (cv1 + v2)) = U(cT (v1) + T (v2)) = c(UT )(v1) + (UT )(v2) .

(b) For each v ∈ V ,

   [(UT )v]_δ = [U(T (v))]_δ = [U]^γ_δ [T (v)]_γ = [U]^γ_δ [T]^β_γ [v]_β .

Therefore

   [UT]^β_δ = [U]^γ_δ [T]^β_γ .

(c) If T is an isomorphism from V to W ,

   Id = [I]^β_β = [T−1T]^β_β = [T−1]^γ_β [T]^β_γ .

Similarly,

   Id = [I]^γ_γ = [TT−1]^γ_γ = [T]^β_γ [T−1]^γ_β .

This implies that [T−1]^γ_β = ([T]^β_γ)^{−1}.

3. To change coordinates back,

   [I]^{β′}_β = ([I]^β_{β′})^{−1} .
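A numerical sketch of examples 1 and 3, assuming numpy: for two bases of R^2 whose vectors are the columns of B and B2 (in standard coordinates), the change-of-basis matrix from β to β′ is B2^{-1} B, and changing back uses its inverse.

```python
import numpy as np

B  = np.array([[1., 1.],
               [0., 1.]])          # basis beta (columns)
B2 = np.array([[1., 0.],
               [1., 1.]])          # basis beta' (columns)

P = np.linalg.solve(B2, B)         # [I]^beta_beta' = B2^{-1} B
v_beta = np.array([2., -1.])       # coordinates of some v relative to beta

v_beta2 = P @ v_beta               # coordinates of the same v relative to beta'
print(B2 @ v_beta2 - B @ v_beta)   # [0. 0.]: both describe the same vector

# Changing back multiplies by the inverse matrix.
print(np.linalg.solve(P, v_beta2)) # [ 2. -1.]: recovers [v]_beta
```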

Definition 2.4.3. An n × n matrix A is invertible if there exists an n × n matrix B such that

   I = AB = BA .

Remark. If β, β′ are bases of V then

   I = [I]^β_β = [I]^{β′}_β [I]^β_{β′} .

Therefore each change of basis matrix is invertible.

Now how do we relate the matrix of T with respect to different bases?


Theorem 2.4.4. Let V and W be finite-dimensional vector spaces over F with β, β′ bases for V and γ, γ′ bases for W .

1. If T : V → W is linear then there exist invertible matrices P and Q such that

   [T]^β_γ = P [T]^{β′}_{γ′} Q .

2. If T : V → V is linear then there exists an invertible matrix P such that

   [T]^β_β = P^{−1} [T]^{β′}_{β′} P .

Proof.

   [T]^β_γ = [I]^{γ′}_γ [T]^{β′}_{γ′} [I]^β_{β′} .

Also

   [T]^β_β = [I]^{β′}_β [T]^{β′}_{β′} [I]^β_{β′} = ([I]^β_{β′})^{−1} [T]^{β′}_{β′} [I]^β_{β′} .

Definition 2.4.5. Two n × n matrices A and B are similar if there exists an n × n invertible matrix P such that

   A = P^{−1}BP .

Theorem 2.4.6. Let A and B be n × n matrices with entries from F . If A and B are similar then there exists an n-dimensional vector space V , a linear transformation T : V → V , and bases β, β′ such that

   A = [T]^β_β and B = [T]^{β′}_{β′} .

Proof. Since similarity is symmetric, we may write B = P^{−1}AP for an invertible matrix P . Define the linear transformation LA : F^n → F^n by

   LA(~v) = A · ~v .

If we choose β to be the standard basis then

   [LA]^β_β = A .

Next we will show that if β′ = {~p1, . . . , ~pn} are the columns of P then β′ is a basis and P = [I]^{β′}_β , where I : F^n → F^n is the identity map. If we prove this, then P^{−1} = [I]^β_{β′} and so

   B = P^{−1}AP = [I]^β_{β′} [LA]^β_β [I]^{β′}_β = [LA]^{β′}_{β′} .

Why is β′ a basis? Note that

   ~pk = P · ~ek ,

so that if LP is invertible then β′ will be the image of a basis and thus a basis. But for all ~v ∈ F^n,

   LP−1LP~v = P^{−1} · P · ~v = ~v

and

   LPLP−1~v = ~v .

So (LP )^{−1} = LP−1 . This completes the proof.


The moral: Similar matrices represent the same transformation but with respect to two different bases. Any property of matrices that is invariant under conjugation can be viewed as a property of the underlying transformation.

Example: Trace. Given an n × n matrix A with entries from F , define

   Tr A = ∑_{i=1}^n ai,i .

Note that if P is another matrix (not necessarily invertible),

   Tr (AP ) = ∑_{i=1}^n (AP )i,i = ∑_{i=1}^n [ ∑_{l=1}^n ai,l pl,i ] = ∑_{l=1}^n [ ∑_{i=1}^n pl,i ai,l ] = ∑_{l=1}^n (PA)l,l = Tr (PA) .

Therefore if P is invertible,

   Tr (P^{−1}AP ) = Tr (APP^{−1}) = Tr A .

This means that trace is invariant under conjugation. Thus if T : V → V is linear (and V is finite dimensional) then Tr T can be defined as

   Tr [T]^β_β

for any basis β.
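A quick numerical check of this invariance, assuming numpy:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
P = rng.standard_normal((4, 4))           # almost surely invertible

conjugated = np.linalg.inv(P) @ A @ P
print(np.trace(A), np.trace(conjugated))  # equal up to rounding error
```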

2.5 Exercises

Notation:

1. A group is a pair (G, ·), where G is a set and · : G × G → G is a function (usually called product) such that

(a) there is an identity element e; that is, an element with the property

e · g = g · e = g for all g ∈ G ,

(b) for all g ∈ G there is an inverse element in G called g−1 such that

g−1 · g = g · g−1 = e ,

(c) and the operation is associative: for all g, h, k ∈ G,

(g · h) · k = g · (h · k) .

If the operation is commutative (that is, for all g, h ∈ G we have g · h = h · g) then we call G abelian.


2. If (G, ·) is a group and H is a subset of G then we call H a subgroup of G if (H, ·|H×H) is a group. Equivalently (and analogously to vector spaces and subspaces), H ⊆ G is a subgroup of G if and only if

(a) for all h1, h2 ∈ H we have h1 · h2 ∈ H and

(b) for all h ∈ H, we have h−1 ∈ H.

3. If G and H are groups then a function Φ : G → H is called a group homomorphism if

Φ(g1 · g2) = Φ(g1) · Φ(g2) for all g1 and g2 ∈ G .

Note that the product on the left is in G whereas the product on the right is in H. We define the kernel of Φ to be

Ker(Φ) = {g ∈ G : Φ(g) = eH} .

Here, eH refers to the identity element of H.

4. A group homomorphism Φ : G → H is called

• a monomorphism (or injective, or one-to-one), if Φ(g1) = Φ(g2) ⇒ g1 = g2;

• an epimorphism (or surjective, or onto), if Φ(G) = H;

• an isomorphism (or bijective), if it is both injective and surjective.

A group homomorphism Φ : G → G is called an endomorphism of G. An endomorphism which is also an isomorphism is called an automorphism. The set of automorphisms of a group G is denoted by Aut(G).

5. Recall that, if F is a field (with operations + and ·) then (F,+) and (F \ {0}, ·) are abelian groups, with identity elements 0 and 1 respectively. If F and G are fields and Φ : F → G is a function then we call Φ a field homomorphism if

(a) for all a, b ∈ F we have Φ(a + b) = Φ(a) + Φ(b) ,

(b) for all a, b ∈ F we have Φ(ab) = Φ(a)Φ(b) ,

(c) and Φ(1F ) = 1G. Here 1F and 1G are the multiplicative identities of F and G respectively.

6. If V and W are vector spaces over the same field F then we define their product to be the set

   V × W = {(v, w) : v ∈ V and w ∈ W} ,

which becomes a vector space under the operations

   (v, w) + (v′, w′) = (v + v′, w + w′),  c(v, w) = (cv, cw)  for v, v′ ∈ V , w, w′ ∈ W , c ∈ F .


If you are familiar with the notion of an external direct sum, notice that the product of two vector spaces is the same as their external direct sum. The two notions cease being equivalent when one considers infinitely many factors/summands.

If Z is another vector space over F then we call a function f : V ×W → Z bilinear if

(a) for each fixed v ∈ V , the function fv : W → Z defined by

fv(w) = f((v, w))

is a linear transformation as a function of w and

(b) for each fixed w ∈ W , the function fw : V → Z defined by

fw(v) = f((v, w))

is a linear transformation as a function of v.

Exercises:

1. Suppose that G and H are groups and Φ : G→ H is a homomorphism.

(a) Prove that if H ′ is a subgroup of H then the inverse image

Φ−1(H ′) = {g ∈ G : Φ(g) ∈ H ′}

is a subgroup of G. Deduce that Ker(Φ) is a subgroup of G.

(b) Prove that if G′ is a subgroup of G, then the image of G′ under Φ,

Φ(G′) = {Φ(g)|g ∈ G′}

is a subgroup of H.

(c) Prove that Φ is one-to-one if and only if Ker(Φ) = {eG}. (Here, eG is the identity element of G.)

2. Prove that every field homomorphism is one-to-one.

3. Let V and W be finite dimensional vector spaces with dim V = n and dim W = m. Suppose that T : V → W is a linear transformation.

(a) Prove that if n > m then T cannot be one-to-one.

(b) Prove that if n < m then T cannot be onto.

(c) Prove that if n = m then T is one-to-one if and only if T is onto.


4. Let F be a field, V, W be finite-dimensional F -vector spaces, and Z be any F -vector space. Choose a basis {v1, . . . , vn} of V and a basis {w1, . . . , wm} of W . Let

   {zi,j : 1 ≤ i ≤ n, 1 ≤ j ≤ m}

be any set of mn vectors from Z. Show that there is precisely one bilinear transformation f : V × W → Z such that

f(vi, wj) = zi,j for all i, j .

5. Let V be a vector space and T : V → V a linear transformation. Show that the following two statements are equivalent.

(A) V = R(T )⊕N(T ), where R(T ) is the range of T and N(T ) is the nullspace of T .

(B) N(T ) = N(T 2), where T 2 is T composed with itself.

6. Let V, W and Z be finite-dimensional vector spaces over a field F . If T : V → W and U : W → Z are linear transformations, prove that

rank(UT ) ≤ min{rank(U), rank(T )} .

Prove also that if either of U or T is invertible, then the rank of UT is equal to the rank of the other one. Deduce that if P : V → V and Q : W → W are isomorphisms then the rank of QTP equals the rank of T .

7. Let V and W be finite-dimensional vector spaces over a field F and T : V → W be a linear transformation. Show that there exist ordered bases β of V and γ of W such that

   ([T]^β_γ)i,j = 0 if i ≠ j, and ([T]^β_γ)i,j equals 0 or 1 if i = j.

8. The purpose of this question is to show that the row rank of a matrix is equal to its column rank. Note that this is obviously true for a matrix of the form described in the previous exercise. Our goal will be to put an arbitrary matrix in this form without changing either its row rank or its column rank. Let A be an m × n matrix with entries from a field F .

(a) Show that the column rank of A is equal to the rank of the linear transformation LA : F^n → F^m defined by LA(~v) = A · ~v, viewing ~v as a column vector.

(b) Use exercise 6 to show that if P and Q are invertible n × n and m × m matrices respectively then the column rank of QAP equals the column rank of A.

(c) Show that the row rank of A is equal to the rank of the linear transformation RA : F^m → F^n defined by RA(~v) = ~v · A, viewing ~v as a row vector.

(d) Use exercise 6 to show that if P and Q are invertible n × n and m × m matrices respectively then the row rank of QAP equals the row rank of A.


(e) Show that there exist invertible n × n and m × m matrices P and Q respectively such that QAP has the form described in exercise 7. Deduce that the row rank of A equals the column rank of A.

9. Given an angle θ ∈ [0, 2π) let Tθ : R^2 → R^2 be the function which rotates a vector clockwise about the origin by an angle θ. Find the matrix of Tθ relative to the standard basis. You do not need to prove that Tθ is linear.

10. Given m ∈ R, define the line

Lm = {(x, y) ∈ R^2 : y = mx} .

(a) Let Tm be the function which maps a point in R^2 to its closest point in Lm. Find the matrix of Tm relative to the standard basis. You do not need to prove that Tm is linear.

(b) Let Rm be the function which maps a point in R^2 to the reflection of this point about the line Lm. Find the matrix of Rm relative to the standard basis. You do not need to prove that Rm is linear.

Hint for (a) and (b): first find the matrix relative to a carefully chosen basis and then perform a change of basis.

11. The quotient space V/W . Let V be an F-vector space, and W ⊂ V a subspace. A subset S ⊂ V is called a W -affine subspace of V , if the following holds:

∀s, s′ ∈ S : s− s′ ∈ W and ∀s ∈ S,w ∈ W : s+ w ∈ S.

(a) Let S and T be W -affine subspaces of V and c ∈ F. Put

   S + T := {s + t : s ∈ S, t ∈ T}, and cT := {ct : t ∈ T} if c ≠ 0, cT := W if c = 0.

Show that S + T and cT are again W -affine subspaces of V .

(b) Show that the above operations define an F-vector space structure on the set of all W -affine subspaces of V .

We will write V/W for the set of W -affine subspaces of V . We now know that it is a vector space. Note that the elements of V/W are subsets of V .

(c) Show that if v ∈ V , then

v +W := {v + w : w ∈ W}

is a W -affine subspace of V . Show moreover that for any W -affine subspace S ⊂ V there exists a v ∈ V such that S = v + W .

(d) Show that the map p : V → V/W defined by p(v) = v+W is linear and surjective.


(e) Compute the nullspace of p and, if the dimension of V is finite, the dimension of V/W .

A helpful way to think about the quotient space V/W is to think of it as “being” the vector space V , but with a new notion of “equality” of vectors. Namely, two vectors v1, v2 ∈ V are now seen as equal if v1 − v2 ∈ W . Use this point of view to find a solution for the following exercise. When you find it, use the formal definition given above to write your solution rigorously.

12. Let V and X be F-vector spaces, and f ∈ L(V,X). Let W be a subspace of V contained in N(f). Consider the quotient space V/W and the map p : V → V/W from the previous exercise.

(a) Show that there exists a unique g ∈ L(V/W,X) such that f = g ◦ p.

(b) Show that g is injective if and only if W = N(f).


3 Dual spaces

3.1 Definitions

Consider the space F^n and write each vector as a column vector. When can we say that a vector is zero? When all coordinates are zero. Further, we say that two vectors are the same if all of their coordinates are the same. This motivates the definition of the coordinate maps

   e∗i : F^n → F by e∗i (~v) = i-th coordinate of ~v .

Notice that each e∗i is a linear function from F^n to F . Furthermore they are linearly independent, so since the dimension of L(F^n, F ) is n, they form a basis.

Last, it is clear that a vector ~v is zero if and only if e∗i (~v) = 0 for all i. This is true if and only if f(~v) = 0 for all f which are linear functions from F^n to F . This motivates the following definition.

Definition 3.1.1. If V is a vector space over F we define the dual space V ∗ as the space of linear functionals

   V ∗ = {f : V → F | f is linear} .

We can view this as the space L(V, F ), where F is considered as a one-dimensional vector space over itself.

Suppose that V is finite dimensional and f ∈ V ∗. Then by the rank-nullity theorem, either f ≡ 0 or N(f) is a (dim V − 1)-dimensional subspace of V . Conversely, you will show in the homework that any (dim V − 1)-dimensional subspace W (that is, a hyperspace) is the nullspace of some linear functional.

Definition 3.1.2. If β = {v1, . . . , vn} is a basis for V then we define the dual basis β∗ = {fv1 , . . . , fvn} as the unique functionals satisfying

   fvi(vj) = 1 if i = j, and fvi(vj) = 0 if i ≠ j.

From our proof of the dimension of L(V,W ) we know that β∗ is a basis of V ∗.

Proposition 3.1.3. Given a basis β = {v1, . . . , vn} of V and dual basis β∗ of V ∗, each f ∈ V ∗ can be written

   f = f(v1)fv1 + · · · + f(vn)fvn .

In other words, the coefficients for f in the dual basis are just f(v1), . . . , f(vn).

Proof. Given f ∈ V ∗, we can write

f = a1fv1 + · · ·+ anfvn .

To find the coefficients, we evaluate both sides at vk. The left is just f(vk). The right is

akfvk(vk) = ak .

Therefore ak = f(vk) and we are done.
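A numerical sketch of this proposition in F^n, assuming numpy: if the basis vectors are the columns of B, the dual-basis functionals fvi (written as row vectors) are the rows of B^{-1}, since (B^{-1}B)i,j is 1 when i = j and 0 otherwise.

```python
import numpy as np

B = np.array([[1., 1.],
              [0., 1.]])   # basis v1 = (1,0), v2 = (1,1) as columns
dual = np.linalg.inv(B)    # rows are f_{v1}, f_{v2}

f = np.array([3., 5.])     # the functional f(x, y) = 3x + 5y as a row vector
coeffs = f @ B             # (f(v1), f(v2))
print(coeffs)              # [3. 8.]
print(coeffs @ dual)       # [3. 5.]: recovers f, as the proposition says
```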


3.2 Annihilators

We now study annihilators.

Definition 3.2.1. If S ⊆ V we define the annihilator of S as

S⊥ = {f ∈ V ∗ : f(v) = 0 for all v ∈ S} .

Theorem 3.2.2. Let V be a vector space and S ⊆ V .

1. S⊥ is a subspace of V ∗ (although S does not have to be a subspace of V ).

2. S⊥ = (Span S)⊥.

3. If dim V < ∞ and U is a subspace of V then whenever {v1, . . . , vk} is a basis for U and {v1, . . . , vn} is a basis for V ,

{fvk+1, . . . , fvn} is a basis for U⊥ .

Proof. First we show that S⊥ is a subspace of V ∗. Note that the zero functional obviously sends every vector in S to zero, so 0 ∈ S⊥. If c ∈ F and f1, f2 ∈ S⊥, then for each v ∈ S,

   (cf1 + f2)(v) = cf1(v) + f2(v) = 0 .

So cf1 + f2 ∈ S⊥ and S⊥ is a subspace of V ∗.

Next we show that S⊥ = (Span S)⊥. To prove the forward inclusion, take f ∈ S⊥. Then if v ∈ Span S we can write

   v = a1v1 + · · · + amvm

for scalars ai ∈ F and vi ∈ S. Thus

   f(v) = a1f(v1) + · · · + amf(vm) = 0 ,

so f ∈ (Span S)⊥. On the other hand if f ∈ (Span S)⊥ then clearly f(v) = 0 for all v ∈ S (since S ⊆ Span S). This completes the proof of item 2.

For the third item, we know that the functionals fvk+1 , . . . , fvn are linearly independent. Therefore we just need to show that they span U⊥. To do this, take f ∈ U⊥. We can write f in terms of the dual basis fv1 , . . . , fvn :

   f = a1fv1 + · · · + akfvk + ak+1fvk+1 + · · · + anfvn .

Using the formula we have for the coefficients, we get aj = f(vj), which is zero for j ≤ k. Therefore

f = ak+1fvk+1+ · · ·+ anfvn

and we are done.


Corollary 3.2.3. If V is finite dimensional and W is a subspace then

dim V = dim W + dim W⊥ .

Definition 3.2.4. For S ′ ⊆ V ∗ we define

⊥(S ′) = {v ∈ V : f(v) = 0 for all f ∈ S ′} .

In the homework you will prove similar properties for ⊥(S ′).

Fact: v ∈ V is zero if and only if f(v) = 0 for all f ∈ V ∗. One implication is easy. To prove the other, suppose that v ≠ ~0 and extend {v} to a basis for V . Then the dual basis has the property that fv(v) ≠ 0.

Proposition 3.2.5. If W ⊆ V is a subspace and V is finite-dimensional then ⊥(W⊥) = W .

Proof. If w ∈ W then for all f ∈ W⊥, we have f(w) = 0, so w ∈ ⊥(W⊥). Suppose conversely that w ∈ V has f(w) = 0 for all f ∈ W⊥. If w ∉ W then build a basis {v1, . . . , vn} of V such that {v1, . . . , vk} is a basis for W and vk+1 = w. Then by Theorem 3.2.2, {fvk+1 , . . . , fvn} is a basis for W⊥. However fvk+1(w) = 1 ≠ 0, which is a contradiction, since fvk+1 ∈ W⊥.

3.3 Double dual

Lemma 3.3.1. If v ∈ V is nonzero and dim V <∞, there exists a linear functional fv ∈ V ∗such that fv(v) = 1. Therefore v = ~0 if and only if f(v) = 0 for all f ∈ V ∗.

Proof. Extend {v} to a basis of V and consider the dual basis. fv is in this basis and fv(v) = 1.

For each v ∈ V we can define the evaluation map v̂ : V ∗ → F by

v̂(f) = f(v) .

Theorem 3.3.2. Suppose that V is finite-dimensional. Then the map Φ : V → V ∗∗ given by

Φ(v) = v̂

is an isomorphism.

Proof. First we show that if v ∈ V then Φ(v) ∈ V ∗∗. Clearly v̂ maps V ∗ to F , but we just need to show that v̂ is linear. If f1, f2 ∈ V ∗ and c ∈ F then

v̂(cf1 + f2) = (cf1 + f2)(v) = cf1(v) + f2(v) = cv̂(f1) + v̂(f2) .

Therefore Φ(v) ∈ V ∗∗. We now must show that Φ is linear and either one-to-one or onto (since the dimension of V is equal to the dimension of V ∗∗). First if v1, v2 ∈ V , c ∈ F then we want to show that

Φ(cv1 + v2) = cΦ(v1) + Φ(v2) .


Both sides are elements of V ∗∗ so we need to show they act the same on elements of V ∗. Let f ∈ V ∗. Then

Φ(cv1 + v2)(f) = f(cv1 + v2) = cf(v1) + f(v2) = cΦ(v1)(f) + Φ(v2)(f) = (cΦ(v1) + Φ(v2))(f) .

Finally to show one-to-one, we show that N(Φ) = {0}. If Φ(v) = ~0 then for all f ∈ V ∗,

0 = Φ(v)(f) = f(v) .

This implies v = ~0.

Theorem 3.3.3. Let V be finite dimensional.

1. If β = {v1, . . . , vn} is a basis for V then Φ(β) = {Φ(v1), . . . ,Φ(vn)} is the double dual of this basis.

2. If W is a subspace of V then Φ(W ) is equal to (W⊥)⊥.

Proof. Recall that the dual basis of β is β∗ = {fv1 , . . . , fvn}, where fvi(vj) = 1 if i = j and 0 if i ≠ j.

Since Φ is an isomorphism, Φ(β) is a basis of V ∗∗. Now

Φ(vi)(fvk) = fvk(vi)

which is 1 if i = k and 0 otherwise. This means Φ(β) = β∗∗.

Next if W is a subspace, let w ∈ W . Letting f ∈ W⊥,

Φ(w)(f) = f(w) = 0 .

So w ∈ (W⊥)⊥. However, since Φ is an isomorphism, Φ(W ) is a subspace of (W⊥)⊥. But they have the same dimension, so they are equal.

3.4 Dual maps

Definition 3.4.1. Let T : V → W be linear. We define the dual map T ∗ : W ∗ → V ∗ by

T ∗(g)(v) = g(T (v)) .

Theorem 3.4.2. Let V and W be finite dimensional and let β and γ be bases for V and W . If T : V → W is linear, so is T ∗. If β∗ and γ∗ are the dual bases, then

[T ∗]γ∗β∗ = ([T ]βγ)t .


Proof. First we show that T ∗ is linear. If g1, g2 ∈ W ∗ and c ∈ F then for each v ∈ V ,

T ∗(cg1 + g2)(v) = (cg1 + g2)(T (v)) = cg1(T (v)) + g2(T (v))

= cT ∗(g1)(v) + T ∗(g2)(v) = (cT ∗(g1) + T ∗(g2))(v) .

Next let β = {v1, . . . , vn}, γ = {w1, . . . , wm}, β∗ = {fv1 , . . . , fvn} and γ∗ = {gw1 , . . . , gwm}, and write [T ]βγ = (ai,j). Then recall from Proposition 3.1.3 that for any f ∈ V ∗ we have

f = f(v1)fv1 + · · ·+ f(vn)fvn .

Therefore the coefficient of fvi for T ∗(gwk) is

T ∗(gwk)(vi) = gwk(T (vi)) = gwk(a1,iw1 + · · ·+ am,iwm) = ak,i .

So

T ∗(gwk) = ak,1fv1 + · · ·+ ak,nfvn .

This is the k-th column of [T ∗]γ∗β∗ .
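As an illustrative aside (not part of the original notes), here is a minimal numerical sketch of this theorem in coordinates: if T : R3 → R2 has matrix A in the standard bases, a functional g on W = R2 is a row vector, and T ∗(g) = g ◦ T has row vector gA, so T ∗ acts on dual coordinates by At. The matrices below are arbitrary examples.

    import numpy as np

    # Hypothetical T : R^3 -> R^2 with matrix A in the standard bases.
    A = np.array([[1., 2., 0.],
                  [3., -1., 4.]])
    g = np.array([5., -2.])   # a functional on W = R^2, written as a row vector

    # T*(g) = g o T corresponds to the row vector g A, whose coordinates in the
    # dual basis are its entries -- i.e. T* acts on coordinates by A^t.
    assert np.allclose(g @ A, A.T @ g)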

Theorem 3.4.3. If V and W are finite dimensional and T : V → W is linear then R(T ∗) = (N(T ))⊥ and (R(T ))⊥ = N(T ∗).

Proof. If g ∈ N(T ∗) then T ∗(g)(v) = 0 for all v ∈ V . If w ∈ R(T ) then w = T (v) for some v ∈ V . Then g(w) = g(T (v)) = T ∗(g)(v) = 0. Thus g ∈ (R(T ))⊥. If, conversely, g ∈ (R(T ))⊥ then we would like to show that T ∗(g)(v) = 0 for all v ∈ V . We have

T ∗(g)(v) = g(T (v)) = 0 .

Let f ∈ R(T ∗). Then f = T ∗(g) for some g ∈ W ∗. If T (v) = 0, we have f(v) = T ∗(g)(v) = g(T (v)) = 0. Therefore f ∈ (N(T ))⊥. This gives R(T ∗) ⊆ (N(T ))⊥. To show the other direction, write n = dim V and m = dim W . By rank-nullity,

dim R(T ∗) = m− dim N(T ∗)

and

dim (N(T ))⊥ = n− dim N(T ) .

However, since N(T ∗) = (R(T ))⊥ (proved above), dim N(T ∗) = m− dim R(T ), so

dim R(T ∗) = dim R(T ) = n− dim N(T ) = dim (N(T ))⊥ .

This gives the other inclusion.


3.5 Exercises

Notation:

1. Recall the definition of a bilinear function. Let F be a field, and V,W and Z be F -vector spaces. A function f : V ×W → Z is called bilinear if

(a) for each v ∈ V the function fv : W → Z given by fv(w) = f(v, w) is linear as a function of w and

(b) for each w ∈ W the function fw : V → Z given by fw(v) = f(v, w) is linear as a function of v.

When Z is the F -vector space F , one calls f a bilinear form.

2. Given a bilinear function f : V ×W → Z, we define its left kernel and its right kernel as

LN(f) = {v ∈ V : f(v, w) = 0 ∀w ∈ W},

RN(f) = {w ∈ W : f(v, w) = 0 ∀v ∈ V }.

More generally, for subspaces U ⊂ V and X ⊂ W we define their orthogonal complements

U⊥f = {w ∈ W : f(u,w) = 0 ∀u ∈ U},
⊥fX = {v ∈ V : f(v, x) = 0 ∀x ∈ X}.

Notice that LN(f) = ⊥fW and RN(f) = V ⊥f .

Exercises:

1. Let V and W be vector spaces over a field F and let f : V ×W → F be a bilinear form. For each v ∈ V , we denote by fv the linear functional W → F given by fv(w) = f(v, w). For each w ∈ W , we denote by fw the linear functional V → F given by fw(v) = f(v, w).

(a) Show that the map

Φ : V → W ∗, Φ(v) = fv

is linear and its kernel is LN(f).

(b) Analogously, show that the map

Ψ : W → V ∗, Ψ(w) = fw

is linear and its kernel is RN(f).

(c) Assume now that V and W are finite-dimensional. Show that the map W ∗∗ → V ∗ given by composing Ψ with the inverse of the canonical isomorphism W → W ∗∗ is equal to Φ∗, the map dual to Φ.


(d) Assuming further dim(V ) = dim(W ), conclude that the following statements are equivalent:

i. LN(f) = {0},
ii. RN(f) = {0},
iii. Φ is an isomorphism,
iv. Ψ is an isomorphism.

2. Let V,W be finite-dimensional F -vector spaces. Denote by ξV and ξW the canonical isomorphisms V → V ∗∗ and W → W ∗∗. Show that if T : V → W is linear then ξW^{−1} ◦ T ∗∗ ◦ ξV = T .

3. Let V be a finite-dimensional F -vector space. Show that any basis (λ1, . . . , λn) of V ∗ is the dual basis to some basis (v1, . . . , vn) of V .

4. Let V be an F -vector space, and S ′ ⊂ V ∗ a subset. Recall the definition

⊥S ′ = {v ∈ V : λ(v) = 0 ∀λ ∈ S ′}.

Imitating a proof given in class, show the following:

(a) ⊥S ′ = ⊥span(S ′).

(b) Assume that V is finite-dimensional, let U ′ ⊂ V ∗ be a subspace, and let (λ1, . . . , λn) be a basis for V ∗ such that (λ1, . . . , λk) is a basis for U ′. If (v1, . . . , vn) is the basis of V from the previous exercise, then (vk+1, . . . , vn) is a basis for ⊥U ′. In particular, dim(U ′) + dim(⊥U ′) = dim(V ∗).

5. Let V be a finite-dimensional F-vector space, and U ⊂ V a hyperplane (that is, a subspace of V of dimension dim V − 1).

(a) Show that there exists λ ∈ V ∗ with N(λ) = U .

(b) Show that if µ ∈ V ∗ is another functional with N(µ) = U , then there exists c ∈ F× with µ = cλ.

6. (From Hoffman-Kunze)

(a) Let A and B be n× n matrices with entries from a field F . Show that Tr (AB) = Tr (BA).

(b) Let T : V → V be a linear transformation on a finite-dimensional vector space. Define the trace of T as the trace of the matrix of T , represented in some basis. Prove that the definition of trace does not depend on the basis thus chosen.

(c) Prove that on the space of n × n matrices with entries from a field F , the trace function Tr is a linear functional. Show also that, conversely, if some linear functional g on this space satisfies g(AB) = g(BA) then g is a scalar multiple of the trace function.
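For part (a), a quick computational sanity check may be helpful (an illustration on random example matrices, not a proof): Tr(AB) = ∑ over i, k of Ai,kBk,i, which is visibly symmetric in A and B.

    import numpy as np

    rng = np.random.default_rng(0)          # arbitrary integer test matrices
    A = rng.integers(-5, 5, (4, 4))
    B = rng.integers(-5, 5, (4, 4))
    # Tr(AB) = sum_{i,k} A[i,k] B[k,i] = Tr(BA)
    assert np.trace(A @ B) == np.trace(B @ A)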


4 Determinants

4.1 Permutations

Now we move to permutations. These will be used when we talk about the determinant.

Definition 4.1.1. A permutation on n letters is a function π : {1, . . . , n} → {1, . . . , n} which is a bijection.

The set of all permutations forms a group under composition. There are n! elements. There are two main ways to write a permutation.

1. Row notation:

1 2 3 4 5 6
6 4 2 3 5 1

Here we write the elements of {1, . . . , n} in the first row, in order. In the second row we write the elements they are mapped to, in order.

2. Cycle decomposition: (1 6)(2 4 3)(5)

All cycles are disjoint. It is easier to compose permutations this way. Suppose π is the permutation given above and π′ is the permutation

π′ = (1 2 3 4 5)(6) .

Then the product ππ′ is (here we will apply π′ first)

[(1 6)(2 4 3)(5)][(1 2 3 4 5)(6)] = (1 4 5 6)(2)(3) .

It is a simple fact that each permutation has a cycle decomposition with disjoint cycles.
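As a hedged illustration (the helper names from_cycles and compose are our own, not from the notes), the following short Python sketch composes the two permutations above and confirms the product.

    # Permutations on {1,...,n} stored as dicts i -> pi(i).
    def from_cycles(cycles, n):
        perm = {i: i for i in range(1, n + 1)}
        for cyc in cycles:
            for a, b in zip(cyc, cyc[1:] + cyc[:1]):
                perm[a] = b            # each letter maps to the next one in its cycle
        return perm

    def compose(pi, sigma):            # (pi sigma)(i) = pi(sigma(i)): sigma acts first
        return {i: pi[sigma[i]] for i in sigma}

    pi = from_cycles([(1, 6), (2, 4, 3), (5,)], 6)
    pi_prime = from_cycles([(1, 2, 3, 4, 5), (6,)], 6)
    print(compose(pi, pi_prime))  # {1: 4, 2: 2, 3: 3, 4: 5, 5: 6, 6: 1} = (1 4 5 6)(2)(3)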

Definition 4.1.2. A transposition is a permutation that swaps two letters and fixes the others. Removing the fixed letters, it looks like (i j) for i ≠ j. An adjacent transposition is one that swaps neighboring letters.

Lemma 4.1.3. Every permutation π can be written as a product of transpositions. (Not necessarily disjoint.)

Proof. All we need to do is write a cycle as a product of transpositions. Note

(a1 a2 · · · ak) = (a1 ak)(a1 ak−1) · · · (a1 a3)(a1 a2) .

Definition 4.1.4. A pair of numbers (i, j) is an inversion pair for π if i < j but π(i) > π(j). Write Ninv(π) for the number of inversion pairs of π.


For example in the permutation (13)(245), also written as

1 2 3 4 5
3 4 1 5 2 ,

we have inversion pairs (1, 3), (1, 5), (2, 3), (2, 5), (4, 5).

Lemma 4.1.5. Let π be a permutation and σ = (k k+1) be an adjacent transposition. Then Ninv(σπ) = Ninv(π) ± 1. If σ1, . . . , σm are adjacent transpositions then Ninv(πσ1 · · ·σm)−Ninv(π) is even if m is even, and odd if m is odd.

Proof. Let a < b ∈ {1, . . . , n}. If π(a), π(b) ∈ {k, k+1} then σπ(a)−σπ(b) = −(π(a)−π(b)), so (a, b) is an inversion pair for π if and only if it is not one for σπ. We claim that in all other cases, the sign of σπ(a) − σπ(b) is the same as the sign of π(a) − π(b). If neither of π(a) and π(b) is in {k, k + 1} then σπ(a) − σπ(b) = π(a) − π(b). The other cases are somewhat similar: if π(a) = k but π(b) > k + 1 then σπ(a)− σπ(b) = k + 1− π(b) < 0 and π(a)− π(b) = k − π(b) < 0. The remaining cases are handled the same way.

Therefore σπ has exactly the same inversion pairs as π except for the pair {π−1(k), π−1(k + 1)}, which switches status. This proves the lemma.

Definition 4.1.6. Given a permutation π on n letters, we say that π is even if it can be written as a product of an even number of transpositions and odd otherwise. This is called the signature (or sign) of a permutation:

sgn(π) = +1 if π is even, and −1 if π is odd.

Theorem 4.1.7. If π can be written as a product of an even number of transpositions, it cannot be written as a product of an odd number of transpositions. In other words, signature is well-defined.

Proof. Suppose that π = s1 · · · sk and π = t1 · · · tj. We want to show that k − j is even. In other words, if k is odd, so is j and if k is even, so is j.

Now note that each transposition can be written as a product of an odd number of adjacent transpositions:

(5 1) = (5 4)(4 3)(3 2)(1 2)(2 3)(3 4)(4 5)

so write s1 · · · sk = s′1 · · · s′k′ and t1 · · · tj = t′1 · · · t′j′, where the s′i and t′i are adjacent transpositions, k − k′ is even and j − j′ is even. We have 0 = Ninv(id) = Ninv((t′j′)−1 · · · (t′1)−1 π), which is Ninv(π) plus an even number if j′ is even or an odd number if j′ is odd. This means that Ninv(π) − j′ is even. The same argument works for k, so Ninv(π)− k′ is even. Now

j − k = (j − j′) + (j′ −Ninv(π)) + (Ninv(π)− k′) + (k′ − k)

is even.


Corollary 4.1.8. For any two permutations π and π′,

sgn(ππ′) = sgn(π)sgn(π′) .
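A brief computational check of this corollary (an editorial illustration with our own helper names, not part of the notes): computing sgn via the parity of the inversion count, as Lemma 4.1.5 justifies, and verifying multiplicativity over all of S3.

    from itertools import combinations, permutations

    def sgn(p):  # p lists the values (pi(1), ..., pi(n)); parity of the inversion count
        inversions = sum(1 for i, j in combinations(range(len(p)), 2) if p[i] > p[j])
        return -1 if inversions % 2 else 1

    def compose(pi, tau):  # (pi tau)(i) = pi(tau(i)); entries are 1-based values
        return tuple(pi[t - 1] for t in tau)

    for pi in permutations(range(1, 4)):
        for tau in permutations(range(1, 4)):
            assert sgn(compose(pi, tau)) == sgn(pi) * sgn(tau)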

4.2 Determinants: existence and uniqueness

Given n vectors ~v1, . . . , ~vn in Rn we want to define something like the volume of the parallelepiped spanned by these vectors. What properties would we expect of a volume?

1. vol(e1, . . . , en) = 1.

2. If two of the vectors ~vi are equal the volume should be zero.

3. For each c > 0, vol(c~v1, ~v2, . . . , ~vn) = c vol(~v1, . . . , ~vn). Same in other arguments.

4. For each ~v′1, vol(~v1 + ~v′1, ~v2, . . . , ~vn) = vol(~v1, . . . , ~vn) + vol(~v′1, ~v2, . . . , ~vn). Same in other arguments.

Using the motivating example of the volume, we define a multilinear function as follows.

Definition 4.2.1. If V is an n-dimensional vector space over F then define

V n = {(v1, . . . , vn) : vi ∈ V for all i = 1, . . . , n} .

A function f : V n → F is called multilinear if for each i and vectors v1, . . . , vi−1, vi+1, . . . , vn ∈ V , the function fi : V → F is linear, where

fi(v) = f(v1, . . . , vi−1, v, vi+1, . . . , vn) .

A multilinear function f is called alternating if f(v1, . . . , vn) = 0 whenever vi = vj for some i ≠ j.

Proposition 4.2.2. Let f : V n → F be a multilinear function. If F does not have characteristic two then f is alternating if and only if for all v1, . . . , vn and i < j,

f(v1, . . . , vi, . . . , vj, . . . , vn) = −f(v1, . . . , vj, . . . , vi, . . . , vn) .

Proof. Suppose that f is alternating. Then

0 = f(v1, . . . , vi + vj, . . . , vi + vj, . . . , vn)

= f(v1, . . . , vi, . . . , vi + vj, . . . , vn) + f(v1, . . . , vj, . . . , vi + vj, . . . , vn)

= f(v1, . . . , vi, . . . , vj, . . . , vn) + f(v1, . . . , vj, . . . , vi, . . . , vn) .

Conversely suppose that f has the property above. Then if vi = vj,

f(v1, . . . , vi, . . . , vj, . . . , vn) = −f(v1, . . . , vj, . . . , vi, . . . , vn)

= −f(v1, . . . , vi, . . . , vj, . . . , vn) .

Since F does not have characteristic two, this forces f(v1, . . . , vi, . . . , vj, . . . , vn) = 0.


Corollary 4.2.3. Let f : V n → F be an n-linear alternating function. Then for each π ∈ Sn,

f(vπ(1), . . . , vπ(n)) = sgn(π) f(v1, . . . , vn) .

Proof. Write π = σ1 · · ·σk where the σi’s are transpositions and (−1)k = sgn(π). Then

f(vπ(1), . . . , vπ(n)) = −f(vσ1···σk−1(1), . . . , vσ1···σk−1(n)) .

Applying this k − 1 more times gives the corollary.

Theorem 4.2.4. Let {v1, . . . , vn} be a basis for V . There is at most one multilinear alternating function f : V n → F such that f(v1, . . . , vn) = 1.

Proof. Let u1, . . . , un ∈ V and write

uk = a1,kv1 + · · ·+ an,kvn .

Then

f(u1, . . . , un) = ∑_{i1=1}^{n} ai1,1 f(vi1 , u2, . . . , un) = ∑_{i1,...,in=1}^{n} ai1,1 · · · ain,n f(vi1 , . . . , vin) .

However whenever two ij's are equal, we get zero, so we can restrict the sum to all distinct ij's. So this is

∑_{i1,...,in distinct} ai1,1 · · · ain,n f(vi1 , . . . , vin) .

This can now be written as

∑_{π∈Sn} aπ(1),1 · · · aπ(n),n f(vπ(1), . . . , vπ(n)) = ∑_{π∈Sn} sgn(π) aπ(1),1 · · · aπ(n),n f(v1, . . . , vn) = ∑_{π∈Sn} sgn(π) aπ(1),1 · · · aπ(n),n .

We now find that if dim V = n and f : V n → F is an n-linear alternating function with f(v1, . . . , vn) = 1 for some fixed basis {v1, . . . , vn} then we have a specific form for f . Writing vectors u1, . . . , un as

uk = a1,kv1 + · · ·+ an,kvn ,


then we have

f(u1, . . . , un) = ∑_{π∈Sn} sgn(π) aπ(1),1 · · · aπ(n),n .
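As a computational aside (not in the original notes), the following hedged Python sketch evaluates this sum over permutations directly; det_leibniz is our own name, and exact rational entries are used to avoid rounding. The example matrix is arbitrary.

    from itertools import combinations, permutations
    from fractions import Fraction

    def sgn(p):
        inversions = sum(1 for i, j in combinations(range(len(p)), 2) if p[i] > p[j])
        return -1 if inversions % 2 else 1

    def det_leibniz(A):          # A is a list of rows; implements the sum over S_n above
        n = len(A)
        total = 0
        for p in permutations(range(n)):
            term = sgn(p)
            for col in range(n):
                term *= A[p[col]][col]          # the factor a_{pi(col), col}
            total += term
        return total

    A = [[Fraction(x) for x in row] for row in ([2, 1, 0], [1, 3, 1], [0, 1, 2])]
    print(det_leibniz(A))        # 8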

Now we would like to show that the formula above indeed does define an n-linear alternating function with the required property.

1. Alternating. Suppose that ui = uj for some i < j. We will then split the set of permutations into two classes. Let A = {π ∈ Sn : π(i) < π(j)}. Letting σi,j = (i j), write π̄ = πσi,j for π ∈ A; as π ranges over A, π̄ ranges over Sn \ A.

f(u1, . . . , un) = ∑_{π∈A} sgn(π) aπ(1),1 · · · aπ(n),n + ∑_{τ∈Sn\A} sgn(τ) aτ(1),1 · · · aτ(n),n
= ∑_{π∈A} sgn(π) aπ(1),1 · · · aπ(n),n + ∑_{π∈A} sgn(πσi,j) aπσi,j(1),1 · · · aπσi,j(n),n .

However, πσi,j(k) = π(k) when k ≠ i, j and ui = uj, so this equals

∑_{π∈A} [sgn(π) + sgn(πσi,j)] aπ(1),1 · · · aπ(n),n = 0 .

2. 1 at the basis. Note that for ui = vi for all i we have ai,j = 1 if i = j and 0 if i ≠ j.

Therefore the value of f is

∑_{π∈Sn} sgn(π) aπ(1),1 · · · aπ(n),n = sgn(id) a1,1 · · · an,n = 1 .

3. Multilinear. Write

uk = a1,kv1 + · · ·+ an,kvn for k = 1, . . . , n

and write u = b1v1 + · · ·+ bnvn. Now for c ∈ F ,

cu+ u1 = (cb1 + a1,1)v1 + · · ·+ (cbn + an,1)vn .

f(cu+ u1, u2, . . . , un) = ∑_{π∈Sn} sgn(π) [cbπ(1) + aπ(1),1] aπ(2),2 · · · aπ(n),n
= c ∑_{π∈Sn} sgn(π) bπ(1) aπ(2),2 · · · aπ(n),n + ∑_{π∈Sn} sgn(π) aπ(1),1 aπ(2),2 · · · aπ(n),n
= cf(u, u2, . . . , un) + f(u1, . . . , un) .


4.3 Properties of determinants

Theorem 4.3.1. Let f : V n → F be a multilinear alternating function and let {v1, . . . , vn} be a basis with f(v1, . . . , vn) ≠ 0. Then {u1, . . . , un} is linearly dependent if and only if f(u1, . . . , un) = 0.

Proof. One direction is on the homework: suppose that f(u1, . . . , un) = 0 but that {u1, . . . , un} is linearly independent. Then write

vk = a1,ku1 + · · ·+ an,kun .

By the same computation as above,

f(v1, . . . , vn) = f(u1, . . . , un) ∑_{π∈Sn} sgn(π) aπ(1),1 · · · aπ(n),n = 0 ,

which is a contradiction. Therefore {u1, . . . , un} is linearly dependent.

Conversely, if {u1, . . . , un} is linearly dependent then for n ≥ 2 we can find some uj and

scalars ai for i ≠ j such that

uj = ∑_{i≠j} ai ui .

Now we have

f(u1, . . . , uj, . . . , un) = ∑_{i≠j} ai f(u1, . . . , ui, . . . , un) = 0 ,

since each term on the right (with ui placed in the j-th slot) has a repeated argument.

Definition 4.3.2. On the space F n we define det : (F n)n → F as the unique alternating n-linear function that gives det(e1, . . . , en) = 1. If A is an n× n matrix then we define

det A = det(~a1, . . . ,~an) ,

where ~ak is the k-th column of A.

Corollary 4.3.3. An n× n matrix A over F is invertible if and only if det A ≠ 0.

Proof. We have det A ≠ 0 if and only if the columns of A are linearly independent. This is true if and only if A is invertible.

We start with the multiplicative property of determinants.

Theorem 4.3.4. Let A and B be n× n matrices over a field F . Then

det (AB) = det A · det B .


Proof. If det B = 0 then B is not invertible, so it cannot have full column rank. Therefore neither can AB (by a homework problem). This means det (AB) = 0 and we are done.

Otherwise det B ≠ 0. Define a function f : Mn×n(F )→ F by

f(A) = det (AB) / det B .

We claim that f is n-linear, alternating and assigns the value 1 to the standard basis (that is, the identity matrix).

1. f is alternating. If A has two equal columns then its column rank is not full. Therefore neither is the column rank of AB, and we have det (AB) = 0. This implies f(A) = 0.

2. f(I) = 1. This is clear since IB = B.

3. f is n-linear. This follows because det is.

But there is exactly one function satisfying the above. We find f(A) = det A and we are done.

For the rest of the lecture we will give further properties of determinants.

• detA = detAt. This is on homework.

• det is alternating and n-linear as a function of rows.

• If A is (a block matrix) of the form

A = (B C)
    (0 D)

then detA = detB detD. This is also on homework.

• The determinant is unchanged if we add a multiple of one column (or row) to another. To show this, write a matrix A as a collection of columns (~a1, . . . ,~an). For example if we add a multiple of column 1 to column 2 we get

det(~a1, c~a1 + ~a2, ~a3, . . . ,~an) = c det(~a1,~a1,~a3, . . . ,~an) + det(~a1,~a2,~a3, . . . ,~an) = det(~a1,~a2,~a3, . . . ,~an) .

• det cA = cn detA.

We will now discuss cofactor expansion.

Definition 4.3.5. Let A ∈ Mn×n(F ). For i, j ∈ [1, n] define the (i, j)-minor of A (written A(i|j)) to be the (n− 1)× (n− 1) matrix obtained from A by removing the i-th row and the j-th column.


Theorem 4.3.6 (Laplace expansion). Let A ∈ Mn×n(F ) for n ≥ 2 and fix j ∈ [1, n]. We have

detA = ∑_{i=1}^{n} (−1)i+j Ai,j detA(i|j) .

Proof. Let us begin by taking j = 1. Now write the column ~a1 as

~a1 = A1,1e1 + A2,1e2 + · · ·+ An,1en .

Then we get

detA = det(~a1, . . . ,~an) = ∑_{i=1}^{n} Ai,1 det(ei,~a2, . . . ,~an) . (2)

We now consider the term det(ei,~a2, . . . ,~an). This is the determinant of the following matrix:

(0 A1,2 · · · A1,n)
(· · ·)
(1 Ai,2 · · · Ai,n)
(· · ·)
(0 An,2 · · · An,n) .

Here, the first column is 0 except for a 1 in the i-th spot. We can now swap the i-th row to the top using i − 1 adjacent transpositions (1 2) · · · (i−1 i). We are left with (−1)i−1 times the determinant of the matrix

(1 Ai,2 · · · Ai,n)
(0 A1,2 · · · A1,n)
(· · ·)
(0 Ai−1,2 · · · Ai−1,n)
(0 Ai+1,2 · · · Ai+1,n)
(· · ·)
(0 An,2 · · · An,n) .

This is a block matrix of the form

(1 B)
(0 A(i|1)) .

By the remarks earlier, the determinant of this block matrix is detA(i|1)× 1. Plugging this into formula (2), we get

detA = ∑_{i=1}^{n} (−1)i−1 Ai,1 detA(i|1) ,

which equals ∑_{i=1}^{n} (−1)i+j Ai,j detA(i|j) when j = 1.

If j ≠ 1 then we perform j − 1 adjacent column switches to move the j-th column to the first. This gives us a new matrix Ã. For this matrix, the formula holds. Compensating for


the switches,

detA = (−1)j−1 det à = (−1)j−1 ∑_{i=1}^{n} (−1)i−1 Ãi,1 det Ã(i|1) = ∑_{i=1}^{n} (−1)i+j Ai,j detA(i|j) .

Check the last equality: Ãi,1 = Ai,j and Ã(i|1) = A(i|j). This completes the proof.

We have discussed determinants of matrices (and of n vectors in F n). We will now define determinants for transformations.

Definition 4.3.7. Let V be a finite dimensional vector space over a field F . If T : V → V is linear, we define detT as det[T ]ββ for any basis β of V .

• Note that detT does not depend on the choice of basis. Indeed, if β′ is another basis,

det[T ]β′β′ = det([I]β′β [T ]ββ [I]ββ′) = det[T ]ββ ,

since det is multiplicative and the change-of-basis matrices [I]β′β and [I]ββ′ are inverse to each other.

• If T and U are linear transformations from V to V then detTU = detT detU .

• det cT = c^{dim V} detT .

• detT = 0 if and only if T is non-invertible.

4.4 Exercises

Notation:

1. n = {1, . . . , n} is the finite set of natural numbers between 1 and n;

2. Sn is the set of all bijective maps n→ n;

3. For a sequence k1, . . . , kt of distinct elements of n, we denote by (k1k2 . . . kt) the element σ of Sn which is defined by σ(ks−1) = ks for 1 < s ≤ t, σ(kt) = k1, and σ(i) = i for i /∈ {k1, . . . , kt}.

Elements of this form are called cycles (or t-cycles). Two cycles (k1 . . . kt) and (l1 . . . ls) are called disjoint if the sets {k1, . . . , kt} and {l1, . . . , ls} are disjoint.

4. Let σ ∈ Sn. A subset {k1, . . . , kt} ⊂ n is called an orbit of σ if the following conditions hold


• For any j ∈ N there exists a 1 ≤ i ≤ t such that σj(k1) = ki.

• For any 1 ≤ i ≤ t there exists a j ∈ N such that ki = σj(k1).

Here σj is the product of j copies of σ.

5. Let V and W be two vector spaces over an arbitrary field F , and k ∈ N. Recall that a k-linear map f : V k → W is called

• alternating, if f(v1, . . . , vk) = 0 whenever the vectors (v1, . . . , vk) are not distinct;

• skew-symmetric, if f(v1, . . . , vk) = −f(vτ(1), . . . , vτ(k)) for any transposition τ ∈ Sk;

• symmetric, if f(v1, . . . , vk) = f(vτ(1), . . . , vτ(k)) for any transposition τ ∈ Sk.

6. If k and n are positive integers such that k ≤ n, the binomial coefficient (n k) is defined as

(n k) = n! / (k! (n− k)!) .

Note that this number is equal to the number of distinct subsets of size k of a finite set of size n.

7. Given an F-vector space V , denote by Altk(V ) the set of alternating k-linear forms (functions) on V ; that is, the set of alternating k-linear maps V k → F.

Exercises:

1. Prove that composition of maps defines a group law on Sn. Show that this group is abelian only if n ≤ 2.

2. Let f : Sn → Z be a function which is multiplicative, i.e. f(στ) = f(σ)f(τ). Show that f must be one of the following three functions: f(σ) = 0, f(σ) = 1, f(σ) = sgn(σ).

3. (From Dummit-Foote) List explicitly the 24 permutations of degree 4 and state which are odd and which are even.

4. Let k1, . . . , kt be a sequence of distinct elements of n. Show that sgn((k1 . . . kt)) = (−1)t−1.

5. Let π ∈ Sn be the element (k1 . . . kt) from the previous exercise, and let σ ∈ Sn be any element. Find a formula for the element σπσ−1.

6. Let σ = (k1 . . . kt) and τ = (l1 . . . ls) be disjoint cycles. Show that then στ = τσ. One says that σ and τ commute.

7. Let σ ∈ Sn. Show that σ can be written as a product of disjoint (and hence, by the previous exercise, commuting) cycles.

Hint: Consider the orbits of σ.


8. If G is a group and S ⊆ G is a subset, define 〈S〉 to be the intersection of all subgroups of G that contain S. (This is the subgroup of G generated by S.)

(a) Show that if S is a subset of G then 〈S〉 is a subgroup of G.

(b) Let S be a subset of G and define

S̄ = S ∪ {s−1 : s ∈ S} .

Show that

〈S〉 = {a1 · · · ak : k ≥ 1 and ai ∈ S̄ for all i} .

9. Prove that Sn = 〈(12), (12 · · ·n)〉.

Hint: Use exercise 5.

10. Let V be a finite-dimensional vector space over some field F , W an arbitrary vector space over F , and k > dim(V ). Show that every alternating k-linear function V k → W is identically zero. Give an example (choose F , V , W , and k as you wish, as long as k > dim(V )) of a skew-symmetric k-linear function V k → W which is not identically zero.

11. (From Hoffman-Kunze) Let F be a field and f : F2 × F2 → F be a 2-linear alternating function. Show that

f((a, b), (c, d)) = (ad− bc) f(e1, e2) .

Find an analogous formula for F3. Deduce from this the formula for the determinant of a 2× 2 and a 3× 3 matrix.

12. Let V be an n-dimensional vector space over a field F. Suppose that f : V n → F is an n-linear alternating function such that f(v1, . . . , vn) ≠ 0 for some basis {v1, . . . , vn} of V . Show that f(u1, . . . , un) = 0 implies that {u1, . . . , un} is linearly dependent.

13. Let V and W be vector spaces over a field F , and f : V → W a linear map.

(a) For η ∈ Altk(W ) let f ∗η be the function on V k defined by

[f ∗η](v1, . . . , vk) = η(f(v1), . . . , f(vk)).

Show that f ∗η ∈ Altk(V ).

(b) Show that in this way we obtain a linear map f ∗ : Altk(W )→ Altk(V ).

(c) Show that, given a third vector space X over F and a linear map g : W → X, one has (g ◦ f)∗ = f ∗ ◦ g∗.

(d) Show that if f is an isomorphism, then so is f ∗.


14. For n ≥ 2, we call M ∈ Mn×n(F) a block upper-triangular matrix if there exists k with 1 ≤ k ≤ n− 1 and matrices A ∈ Mk×k(F), B ∈ Mk×(n−k)(F) and C ∈ M(n−k)×(n−k)(F) such that M has the form

(A B)
(0 C) .

That is, the elements of M are given by Mi,j = Ai,j for 1 ≤ i ≤ k and 1 ≤ j ≤ k; Mi,j = Bi,j−k for 1 ≤ i ≤ k and k < j ≤ n; Mi,j = 0 for k < i ≤ n and 1 ≤ j ≤ k; and Mi,j = Ci−k,j−k for k < i ≤ n and k < j ≤ n.

We will show in this exercise that

detM = detA · detC . (3)

(a) Show that if detC = 0 then formula (3) holds.

(b) Suppose that detC ≠ 0 and define a function à 7→ φ(Ã) for à ∈ Mk×k(F) by

φ(Ã) = [detC]−1 det (à B)
                     (0 C) .

That is, φ(Ã) is a scalar multiple of the determinant of the block upper-triangular matrix we get when we replace A by à and keep B and C fixed.

i. Show that φ is k-linear as a function of the columns of Ã.

ii. Show that φ is alternating and satisfies φ(Ik) = 1, where Ik is the k × k identity matrix.

iii. Conclude that formula (3) holds when detC ≠ 0.

15. Suppose that A ∈ Mn×n(F) is upper-triangular; that is, ai,j = 0 when 1 ≤ j < i ≤ n. Show that detA = a1,1a2,2 · · · an,n.

16. Let A ∈ Mn×n(F) be such that Ak = 0 for some k ≥ 1. Show that detA = 0.

17. Let a0, a1, . . . , an be distinct complex numbers. Write Mn(a0, . . . , an) for the matrix

(1 a0 a0^2 · · · a0^n)
(1 a1 a1^2 · · · a1^n)
(· · ·)
(1 an an^2 · · · an^n) .

The goal of this exercise is to show that

detMn(a0, . . . , an) = ∏_{0≤i<j≤n} (aj − ai) . (4)

We will argue by induction on n.


(a) Show that if n = 2 then formula (4) holds.

(b) Now suppose that k ≥ 3 and that formula (4) holds for all 2 ≤ n ≤ k. Show that it holds for n = k + 1 by completing the following outline.

i. Define the function f : C→ C by f(z) = detMn(z, a1, . . . , an). Show that f is a polynomial of degree at most n.

ii. Find all the zeros of f .

iii. Show that the coefficient of zn is (−1)n detMn−1(a1, . . . , an).

iv. Show that formula (4) holds for n = k + 1, completing the proof.

18. Show that if A ∈Mn×n(F) then detA = detAt, where At is the transpose of A.

19. Let V be an n-dimensional F-vector space and k ≤ n. The purpose of this problem is to show that

dim(Altk(V )) = (n k) ,

by completing the following steps:

(a) Let W be a subspace of V and let B = (v1, . . . , vn) be a basis for V such that (v1, . . . , vk) is a basis for W . Show that

pW,B : V → W, vi 7→ vi for i ≤ k and vi 7→ 0 for i > k

specifies a linear map with the property that pW,B ◦ pW,B = pW,B. Such a map (that is, a T such that T ◦ T = T ) is called a projection.

(b) With W and B as in the previous part, let dW be a non-zero element of Altk(W ). Show that [pW,B]∗dW is a non-zero element of Altk(V ). (Recall this notation from exercise 13.)

(c) Let B = (v1, . . . , vn) be a basis of V . Let S1, . . . , St be subsets of n = {1, . . . , n}. Assume that each Si has exactly k elements and no two Si's are the same. Let

Wi = Span({vj : j ∈ Si}).

For i = 1, . . . , t, let dWi ∈ Altk(Wi) be non-zero. Show that the collection {[pWi,B]∗dWi : 1 ≤ i ≤ t} of elements of Altk(V ) is linearly independent.

(d) Show that the above collection is also generating, by taking an arbitrary η ∈ Altk(V ), an arbitrary collection u1, . . . , uk of vectors in V , expressing each ui as a linear combination of (v1, . . . , vn) and plugging those linear combinations into η.

In doing this, it may be helpful (although certainly not necessary) to assume that dWi is the unique element of Altk(Wi) with dWi(w1, . . . , wk) = 1, where (w1, . . . , wk) is the basis {vj : j ∈ Si} of Wi.


20. Let A ∈ Mn×n(F ) for some field F . Recall that if 1 ≤ i, j ≤ n then the (i, j)-th minor of A, written A(i|j), is the (n− 1)× (n− 1) matrix obtained by removing the i-th row and j-th column from A. Define the cofactor

Ci,j = (−1)i+j detA(i|j) .

Note that the Laplace expansion for the determinant can be written

detA = ∑_{i=1}^{n} Ai,j Ci,j .

(a) Show that if 1 ≤ j, k ≤ n with j ≠ k then

∑_{i=1}^{n} Ai,k Ci,j = 0 .

(b) Define the classical adjoint of A, written adj A, by

(adj A)i,j = Cj,i .

Show that (adj A)A = (detA)I.

(c) Show that A(adj A) = (detA)I and deduce that if A is invertible then

A−1 = (detA)−1adj A .

Hint: begin by applying the result of the previous part to At.

(d) Use the formula in the last part to find the inverses of the following matrices:

(1 2 4)
(1 3 9)
(1 4 16) ,

(1 2 3 4)
(1 0 0 0)
(0 1 1 1)
(6 0 1 1) .

21. Consider a system of equations in n variables with coefficients from a field F . We can write this as AX = Y for an n × n matrix A, an n × 1 matrix X (with entries x1, . . . , xn) and an n× 1 matrix Y (with entries y1, . . . , yn). Given the matrices A and Y we would like to solve for X.

(a) Show that

(detA) xj = ∑_{i=1}^{n} (−1)i+j yi detA(i|j) .

(b) Show that if detA ≠ 0 then we have

xj = (detA)−1 detBj ,

where Bj is the n× n matrix obtained from A by replacing the j-th column of A by Y . This is known as Cramer's rule.


(c) Solve the following systems of equations using Cramer's rule.

2x− y + z = 3
2y − z = 1
y − x = 1

and

2x− y + z − 2t = −5
2x+ 2y − 3z + t = −1
−x+ y − z = −1
4x− 3y + 2z − 3t = −8
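A short illustrative sketch of Cramer's rule in Python (our own helper names; the 2 × 2 system solved below is an arbitrary example, deliberately not one of the systems above).

    from fractions import Fraction

    def det(A):                                    # cofactor expansion, as in the sketch above
        if len(A) == 1:
            return A[0][0]
        return sum((-1) ** i * A[i][0] * det([r[1:] for k, r in enumerate(A) if k != i])
                   for i in range(len(A)))

    def cramer(A, Y):
        d = det(A)
        # x_j = det(B_j) / det(A), where B_j replaces column j of A by Y
        return [det([row[:j] + [y] + row[j + 1:] for row, y in zip(A, Y)]) / d
                for j in range(len(A))]

    A = [[Fraction(2), Fraction(1)], [Fraction(1), Fraction(3)]]
    print(cramer(A, [Fraction(3), Fraction(5)]))   # [Fraction(4, 5), Fraction(7, 5)]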

22. Find the determinants of the following matrices. In the first example, the entries are from R and in the second they are from Z3.

(1 4 5 7)
(0 0 2 3)
(1 4 −1 7)
(2 8 10 14) ,

(1 1 0 0 0)
(1 1 1 0 0)
(0 1 1 1 0)
(0 0 1 1 1)
(0 0 0 1 1)


5 Eigenvalues

5.1 Definitions and the characteristic polynomial

The simplest matrix is λI for some λ ∈ F . These act just like the field F . What is the second simplest? A diagonal matrix; that is, a matrix D that satisfies Dij = 0 if i ≠ j.

Definition 5.1.1. Let V be a finite dimensional vector space over F . A linear transformation T is called diagonalizable if there exists a basis β such that [T ]ββ is diagonal.

Proposition 5.1.2. T is diagonalizable if and only if there exists a basis {v1, . . . , vn} of V and scalars λ1, . . . , λn such that

T (vi) = λivi for all i .

Proof. Suppose that T is diagonalizable. Then there is a basis β = {v1, . . . , vn} such that [T ]ββ is diagonal. Then, writing D = [T ]ββ,

[Tvk]β = D[vk]β = Dek = Dk,kek ,

so T (vk) = Dk,kvk.

Now we can choose λk = Dk,k.

If the second condition holds then we see that [T ]ββ is diagonal with entries Di,j = 0 if i ≠ j and Di,i = λi.

This motivates the following definition.

Definition 5.1.3. If T : V → V is linear then we call a nonzero vector v an eigenvector of T if there exists λ ∈ F such that T (v) = λv. In this case we call λ the eigenvalue for v.

Theorem 5.1.4. If dim V < ∞ and T : V → V is linear then the following are equivalent.

1. λ is an eigenvalue of T (for some eigenvector).

2. T − λI is not invertible.

3. det(T − λI) = 0.

Proof. If (1) holds then the eigenvector v is a non-zero vector in the nullspace of T − λI. Thus T − λI is not invertible. We already know that (2) and (3) are equivalent. If T − λI is not invertible then there is a non-zero vector in its nullspace. This vector is an eigenvector with eigenvalue λ.

Definition 5.1.5. If T : V → V is linear and dim V = n then we define the characteristic polynomial c : F → F by

c(λ) = det(T − λI) .

• Note that c(λ) does not depend on the choice of basis.


• We can write c(λ) in terms of a matrix representing T :

c(λ) = det(T − λI) = det([T − λI]ββ) = det([T ]ββ − λId) .

• Eigenvalues are exactly the roots of c(λ).

Facts about c(x).

1. c is a polynomial of degree n. We can see this by analyzing each term in the definition of the determinant: set B = A− xI, where A = [T ]ββ, and consider

sgn(π) B1,π(1) · · ·Bn,π(n) .

Each term Bi,π(i) is a polynomial of degree 0 or 1 in x. So the product has degree at most n. A sum of such polynomials is a polynomial of degree at most n.

In fact, the only term of degree n is

sgn(id) B1,1 · · ·Bn,n = (A1,1 − x) · · · (An,n − x) .

So the coefficient of xn is (−1)n.

2. In the above description of c(x), all terms corresponding to non-identity permutations π have degree at most n − 2. Therefore the degree n − 1 term comes from (A1,1 − x) · · · (An,n − x) as well. It is

(−1)n−1xn−1 [A1,1 + · · ·+ An,n] = (−1)n−1xn−1Tr A .

3. Because c(0) = det A, the constant term of c(x) is det A; that is,

c(x) = (−1)n [xn − Tr A xn−1 + · · ·+ (−1)n det A] .

For F = C (or any field so that c(x) splits), we can always write c(x) = (−1)n(x − λ1) · · · (x− λn). Evaluating at x = 0 gives c(0) = ∏ λi. Therefore

c(x) = (−1)n [xn − Tr A xn−1 + · · ·+ (−1)n ∏_{i=1}^{n} λi] .

Comparing constant terms, we find det A = ∏ λi in C (and, comparing the xn−1 terms, Tr A = ∑ λi).
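A quick numerical check of these two facts (an editorial illustration on an arbitrary example matrix; numpy's eigvals extracts the eigenvalues).

    import numpy as np

    A = np.array([[2., 1.], [1., 3.]])                 # arbitrary example
    lam = np.linalg.eigvals(A)
    assert np.isclose(lam.sum(), np.trace(A))          # Tr A = sum of eigenvalues
    assert np.isclose(lam.prod(), np.linalg.det(A))    # det A = product of eigenvalues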

Theorem 5.1.6. If dim V = n and c(λ) has n distinct roots then T is diagonalizable. The converse is not true.

Proof. Write the eigenvalues as λ1, . . . , λn. For each λi we have an eigenvector vi. We claim that the vi's are linearly independent. This follows from the lemma:


Lemma 5.1.7. If λ1, . . . , λk are k distinct eigenvalues associated to eigenvectors v1, . . . , vk then {v1, . . . , vk} is linearly independent.

Proof. Suppose that

a1v1 + · · ·+ akvk = ~0 .

Apply T to both sides:

a1λ1v1 + · · ·+ akλkvk = ~0 .

Keep doing this, k − 1 times in total, so we get the system of equations

a1v1 + · · ·+ akvk = ~0
a1λ1v1 + · · ·+ akλkvk = ~0
· · ·
a1λ1^{k−1}v1 + · · ·+ akλk^{k−1}vk = ~0

Write each vi as [vi]β for some basis β. This is then equivalent to the matrix equation

(a1[v1]β  a2[v2]β  · · ·  ak[vk]β) M = (~0  ~0  · · ·  ~0) ,

where M is the k × k matrix

(1 λ1 · · · λ1^{k−1})
(1 λ2 · · · λ2^{k−1})
(· · ·)
(1 λk · · · λk^{k−1}) .

Here the left matrix is n × k and has j-th column equal to the column vector aj[vj]β. But the matrix M has nonzero determinant when the λi's are distinct: its determinant is ∏_{1≤i<j≤k} (λj − λi). Therefore it is invertible. Multiplying both sides by its inverse, we find aivi = ~0 for all i. Since vi ≠ ~0, it follows that ai = 0 for all i.

5.2 Eigenspaces and the main diagonalizability theorem

Definition 5.2.1. If λ ∈ F we define the eigenspace

Eλ = N(T − λI) = {v ∈ V : T (v) = λv} .

Note that Eλ is a subspace even if λ is not an eigenvalue. Furthermore,

Eλ ≠ {0} if and only if λ is an eigenvalue of T

and

Eλ is T -invariant for all λ ∈ F .

What this means is that if v ∈ Eλ then so is T (v):

(T − λI)(T (v)) = (T − λI)(λv) = λ(T − λI)v = ~0 .


Definition 5.2.2. If W1, . . . ,Wk are subspaces of a vector space V then we write

W1 ⊕ · · · ⊕Wk

for the sum space W1 + · · ·+Wk and say the sum is direct if

Wj ∩ [W1 + · · ·+Wj−1] = {0} for all j = 2, . . . , k .

We also say the subspaces are independent.

Theorem 5.2.3. If λ1, . . . , λk are distinct eigenvalues of T then

Eλ1 + · · ·+ Eλk = Eλ1 ⊕ · · · ⊕ Eλk .

Furthermore

dim (∑_{i=1}^{k} Eλi) = ∑_{i=1}^{k} dim Eλi .

Proof. The theorem follows directly from the following lemma, whose proof is in the homework.

Lemma 5.2.4. Let W1, . . . ,Wk be subspaces of V . The following are equivalent.

1. W1 + · · ·+Wk = W1 ⊕ · · · ⊕Wk .

2. Whenever w1 + · · ·+ wk = ~0 for wi ∈ Wi for all i, we have wi = ~0 for all i.

3. Whenever βi is a basis for Wi for all i, the βi's are disjoint and β := ∪_{i=1}^{k} βi is a basis for ∑_{i=1}^{k} Wi.

So take w1 + · · · + wk = ~0 for wi ∈ Eλi for all i. Note that each nonzero wi is an eigenvector for the eigenvalue λi. Remove all the zero ones. If we are left with any nonzero ones, by Lemma 5.1.7 they must be linearly independent. This would be a contradiction. So they are all zero.

For the second claim take bases βi of Eλi . By the lemma, ∪_{i=1}^{k} βi is a basis for ∑_{i=1}^{k} Eλi . This implies the claim.

Theorem 5.2.5 (Main diagonalizability theorem). Let T : V → V be linear and dim V < ∞. The following are equivalent.

1. T is diagonalizable.

2. c(x) can be written as (−1)n(x− λ1)n1 · · · (x− λk)nk , where ni = dim Eλi for all i.

3. V = Eλ1 ⊕ · · · ⊕ Eλk , where λ1, . . . , λk are the distinct eigenvalues of T .


Proof. Suppose first that T is diagonalizable. Then there exists a basis β of eigenvectors for T ; that is, for which [T ]ββ is diagonal. Clearly each diagonal element is an eigenvalue. For each i, call ni the number of entries on the diagonal that are equal to λi; since [T ]ββ − xI is diagonal, expanding its determinant gives c(x) = (−1)n(x− λ1)n1 · · · (x− λk)nk . Moreover [T − λiI]ββ has ni zeros on the diagonal. All other diagonal entries must be non-zero, so the nullspace has dimension ni. In other words, ni = dim Eλi .

Suppose that condition 2 holds. Since c is a polynomial of degree n we must have

dim Eλ1 + · · ·+ dim Eλk = dim V .

However since the λi’s are distinct the previous theorem gives that

dim ∑_{i=1}^{k} Eλi = dim V .

In other words, V = ∑_{i=1}^{k} Eλi . The previous theorem implies that the sum is direct and the claim follows.

Suppose that condition 3 holds. Then take βi a basis for Eλi for all i. Then β = ∪_{i=1}^{k} βi is a basis for V . We claim that [T ]ββ is diagonal. This is because each vector in β is an eigenvector. This proves 1 and completes the proof.

5.3 Exercises

1. Let V be an F-vector space and let W1, . . . ,Wk be subspaces of V . Recall the definition of the sum ∑_{i=1}^{k} Wi. It is the subspace of V given by

{w1 + · · ·+ wk : wi ∈ Wi}.

Recall further that this sum is called direct, and written as ⊕_{i=1}^{k} Wi, if and only if for all 1 < i ≤ k we have

Wi ∩ (∑_{j=1}^{i−1} Wj) = {0}.

Show that the following statements are equivalent:

(a) The sum ∑_{i=1}^{k} Wi is direct.

(b) For any collection {w1, . . . , wk} with wi ∈ Wi for all i, we have

∑_{i=1}^{k} wi = 0 ⇒ ∀i : wi = 0.

(c) If, for each i, βi is a basis of Wi, then the βi's are disjoint and their union β = ⊔_{i=1}^{k} βi is a basis for the subspace ∑_{i=1}^{k} Wi.

(d) For any v ∈ ∑_{i=1}^{k} Wi there exist unique vectors w1, . . . , wk such that wi ∈ Wi for all i and v = ∑_{i=1}^{k} wi.


2. Let V be an F-vector space. Recall that a linear map p ∈ L(V, V ) is called a projection if p ◦ p = p.

(a) Show that if p is a projection, then so is q = idV −p, and we have p◦q = q ◦p = 0.

(b) Let W1, . . . ,Wk be subspaces of V and assume that V = ⊕_{i=1}^{k} Wi. For 1 ≤ t ≤ k, show that there is a unique element pt ∈ L(V,Wt) such that for any choice of vectors w1, . . . , wk with wj ∈ Wj for all j, pt(wj) = wj if j = t and pt(wj) = 0 if j ≠ t.

(c) Show that each pt defined in the previous part is a projection. Show furthermore that ∑_{t=1}^{k} pt = idV and that for t ≠ s we have pt ◦ ps = 0.

(d) Show conversely that if p1, . . . , pk ∈ L(V, V ) are projections with the properties (a) ∑_{i=1}^{k} pi = idV and (b) pi ◦ pj = 0 for all i ≠ j, and if we put Wi = R(pi), then V = ⊕_{i=1}^{k} Wi.

3. Let V be an F-vector space.

(a) If U ⊂ V is a subspace, W is another F-vector space, and f ∈ L(V,W ), define f |U : U → W by

f |U(u) = f(u) ∀u ∈ U.

Show that the map f 7→ f |U is a linear map L(V,W )→ L(U,W ). It is called the restriction map.

(b) Let f ∈ L(V, V ) and let U ⊂ V be an f -invariant subspace (that is, a subspace U with the property that f(u) ∈ U whenever u ∈ U). Observe that f |U ∈ L(U,U). If W ⊂ V is another f -invariant subspace and V = U ⊕W , show that

N(f) = N(f |U)⊕N(f |W ), R(f) = R(f |U)⊕R(f |W ), det(f) = det(f |U) det(f |W ).

(c) Let f, g ∈ L(V, V ) be two commuting endomorphisms, i.e. we have f ◦ g = g ◦ f . Show that N(g) and R(g) are f -invariant subspaces of V .

4. Consider the matrix

A := (1 1)
     (0 1) .

Show that there does not exist an invertible matrix P ∈ M2×2(C) such that PAP−1 is diagonal.

5. Let V be a finite-dimensional F-vector space and f ∈ L(V, V ). Observe that for each natural number k we have N(fk) ⊂ N(fk+1).

(a) Show that there exists a natural number k so that N(fk) = N(fk+1).


(b) Show further that for all l ≥ k one has N(f l) = N(fk).

6. Let V be a finite-dimensional F-vector space and f ∈ L(V, V ).

(a) Let U ⊂ V be an f -invariant subspace and β = (v1, . . . , vn) a basis of V such that β′ = (v1, . . . , vk) is a basis for U . Show that

[f ]ββ = (A B)
         (0 C)

with A = [f |U ]β′β′ ∈ Mk×k(F), B ∈ Mk×(n−k)(F), and C ∈ M(n−k)×(n−k)(F).

(b) Let U,W ⊂ V be f -invariant subspaces with V = U ⊕W . Let β′ be a basis for U , β′′ a basis for W , and β = β′ ⊔ β′′. Show that

[f ]ββ = (A 0)
         (0 C)

with A = [f |U ]β′β′ and C = [f |W ]β′′β′′ .

7. Let V be a finite-dimensional F-vector space. Recall that an element p ∈ L(V, V ) with p2 = p is called a projection. On the other hand, an element i ∈ L(V, V ) with i2 = idV is called an involution.

(a) Assume that char(F) ≠ 2. Show that the maps

i 7→ (1/2)(idV + i) and p 7→ 2p− idV

are mutually inverse bijections between the involutions on V and the projections in V .

(b) Show that if p ∈ L(V, V ) is a projection, then the only eigenvalues of p are 0 and 1. Furthermore, V = E0(p) ⊕ E1(p) (the eigenspaces for p). That is, p is diagonalizable.

(c) Show that if i ∈ L(V, V ) is an involution, then the only eigenvalues of i are +1 and −1. Furthermore, V = E+1(i)⊕ E−1(i). That is, i is diagonalizable.

Observe that projections and involutions are examples of diagonalizable endomorphisms which do not have dim(V )-many distinct eigenvalues.

8. In this problem we will show that every endomorphism of a vector space over an algebraically closed field can be represented as an upper triangular matrix. This is a simpler result than (and is implied by) the Jordan canonical form, which we will cover in class soon.


We will argue by (strong) induction on the dimension of V . Clearly the result holds for dim V = 1. So suppose that for some k ≥ 1, whenever dim W ≤ k and U : W → W is linear, we can find a basis of W with respect to which the matrix of U is upper-triangular. Further, let V be a vector space of dimension k + 1 over F and T : V → V be linear.

(a) Let λ be an eigenvalue of T . Show that the dimension of R := R(T − λI) is strictly less than dim V and that R is T -invariant.

(b) Apply the inductive hypothesis to T |R to find a basis of R with respect to which T |R is upper-triangular. Extend this to a basis for V and complete the argument.

9. Let A ∈ Mn×n(F) be upper-triangular. Show that the eigenvalues of A are the diagonal entries of A.

10. Let A be the matrix

A = (6 −3 −2)
    (4 −1 −2)
    (10 −5 −3) .

(a) Is A diagonalizable over R? If so, find a basis for R3 of eigenvectors of A.

(b) Is A diagonalizable over C? If so, find a basis for C3 of eigenvectors of A.

11. For which values of a, b, c ∈ R is the following matrix diagonalizable over R?

(0 0 0 0)
(a 0 0 0)
(0 b 0 0)
(0 0 c 0)

12. Let V be a finite dimensional vector space over a field F and let T : V → V be linear.

Suppose that every subspace of V is T -invariant. What can you say about T?

13. Let V be a finite dimensional vector space over a field F and let T, U : V → V be linear transformations.

(a) Prove that if I − TU is invertible then I − UT is invertible and

(I − UT )−1 = I + U(I − TU)−1T .

(b) Use the previous part to show that TU and UT have the same eigenvalues.

14. Let A be the matrix

(−1 1 1)
(1 −1 1)
(1 1 −1) .

Find An for all n ≥ 1.

Hint: first diagonalize A.
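As a numerical sanity check of the hint (an editorial sketch, not a substitute for the exercise): this A is symmetric, so numpy's eigh diagonalizes it with an orthogonal P, and then An = PDnP−1.

    import numpy as np

    A = np.array([[-1., 1., 1.],
                  [1., -1., 1.],
                  [1., 1., -1.]])
    lam, P = np.linalg.eigh(A)       # A is symmetric, so P is orthogonal: P^{-1} = P^t
    n = 5
    An = P @ np.diag(lam ** n) @ P.T                   # A^n = P D^n P^{-1}
    assert np.allclose(An, np.linalg.matrix_power(A, n))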


6 Jordan form

6.1 Generalized eigenspaces

It is of course not always true that T is diagonalizable. There can be a couple of reasons for that. First it may be that the roots of the characteristic polynomial do not lie in the field. For instance

(0 1)
(−1 0)

has characteristic polynomial x2 + 1. Even still it may be that the eigenvalues are in the field, but we still cannot diagonalize. On the homework you will see that the matrix

(1 1)
(0 1)

is not diagonalizable over C (although its eigenvalues are certainly in C). So we resort to looking for a block diagonal matrix.

Suppose that we can show that

V = W1 ⊕ · · · ⊕Wk

for some subspaces Wi. Then we can choose a basis for V made up of bases for the Wi's. If the Wi's are T -invariant then the matrix will be in block form.

Definition 6.1.1. Let T : V → V be linear. A subspace W of V is T -invariant if T (w) ∈ W whenever w ∈ W .

• Each eigenspace is T -invariant. If w ∈ Eλ then

(T − λI)T (w) = λ(T − λI)w = ~0 .

• Therefore the eigenspace decomposition is a T -invariant direct sum.

To find a general T -invariant direct sum we define generalized eigenspaces.

Definition 6.1.2. Let T : V → V be linear. If λ ∈ F then the set

Eλ = {v ∈ V : (T − λI)kv = ~0 for some k}

is called the generalized eigenspace for λ.

• This is a subspace. If v, w ∈ Eλ and c ∈ F then there exists kv and kw such that

(T − λI)kvv = ~0 = (T − λI)kww .

Choosing k = max{kv, kw} we find

(T − λI)k(cv + w) = ~0 .


• Each generalized eigenspace is T -invariant. To see this, suppose that (T − λI)kv = ~0. Then because T commutes with (T − λI)k we have

(T − λI)kTv = T (T − λI)kv = ~0 .

To make sure the characteristic polynomial has roots we will take F to be an algebraically closed field. That is, each non-constant polynomial with coefficients in F has a root in F .

6.2 Primary decomposition theorem

Theorem 6.2.1 (Primary decomposition theorem). Let F be algebraically closed and V a finite-dimensional vector space over F . If T : V → V is linear and λ1, . . . , λk are the distinct eigenvalues of T then

V = Eλ1 ⊕ · · · ⊕ Eλk .

Proof. We follow several steps.

Step 1. c(x) has a root. Therefore T has an eigenvalue. Let λ1 be this value.

Step 2. Consider the generalized eigenspace Eλ1 . We first show that there exists k1 such that

Eλ1 = N(T − λ1I)k1 .

Let v1, . . . , vm be a basis for Eλ1 . Then for each i there is a ji such that (T − λ1I)ji vi = ~0. Choose k1 = max{j1, . . . , jm}. Then (T − λ1I)k1 kills all the basis vectors and thus kills everything in Eλ1 . Therefore

Eλ1 ⊆ N(T − λ1I)k1 .

The other direction is obvious.

Step 3. We now claim that

V = R(T − λ1I)k1 ⊕N(T − λ1I)k1 = R(T − λ1I)k1 ⊕ Eλ1 .

First we show that the intersection is only the zero vector. Suppose that v is in the intersection. Then (T − λ1I)k1v = ~0 and there exists w such that (T − λ1I)k1w = v. Then (T − λ1I)2k1w = ~0 so w ∈ Eλ1 . Therefore

v = (T − λ1I)k1w = ~0 .

By the rank-nullity theorem,

dim R(T − λ1I)k1 + dim N(T − λ1I)k1 = dim V .

By the 2-subspace (dimension) theorem,

V = R(T − λ1I)k1 +N(T − λ1I)k1 ,

and so it is a direct sum.


Step 4. Write W1 = R(T − λ1I)k1 so that

V = Eλ1 ⊕W1 .

These spaces are T -invariant. To see this, note that we already know Eλ1 is. For W1, suppose that w ∈ W1. Then there exists u such that

w = (T − λ1I)k1u .

So

(T − λ1I)k1(T − λ1I)u = (T − λ1I)w .

Therefore (T − λ1I)w ∈ W1 and thus W1 is (T − λ1I)-invariant. If w ∈ W1 then

Tw = (T − λ1I)w + λ1Iw ∈ W1 ,

so W1 is T -invariant.

Step 5. We now argue by induction and do the base case. Let e(T ) be the number of distinct eigenvalues of T . Note e(T ) ≥ 1.

We first assume e(T ) = 1. In this case we write λ1 for the eigenvalue and see

V = Eλ1 ⊕R(T − λ1I)k1 = Eλ1 ⊕W1 .

We claim that the second space is only the zero vector. Otherwise we restrict T to it to get an operator TW1 . Then TW1 has an eigenvalue λ. So there is a nonzero vector w ∈ W1 such that

Tw = TW1w = λw ,

so w is an eigenvector for T . But T has only one eigenvalue so λ = λ1. This means that w ∈ Eλ1 and thus

w ∈ Eλ1 ∩W1 = {~0} .

This is a contradiction, so

V = Eλ1 ⊕ {~0} = Eλ1 ,

and we are done.

Step 6. Suppose the theorem is true for any transformation U with e(U) = k (k ≥ 1). Then suppose that e(T ) = k + 1. Let λ1, . . . , λk+1 be the distinct eigenvalues of T and decompose as before:

V = Eλ1 ⊕R(T − λ1I)k1 = Eλ1 ⊕W1 .

Now restrict T to W1 and call it TW1 .

Claim 6.2.2. TW1 has eigenvalues λ2, . . . , λk+1 with the generalized eigenspaces from T : they are Eλ2 , . . . , Eλk+1 .


Once we show this we will be done: we will have e(TW1) = k and so we can apply the theorem:

W1 = Eλ2 ⊕ · · · ⊕ Eλk+1 ,

so

V = Eλ1 ⊕ · · · ⊕ Eλk+1 .

Proof. We first show that each of Eλ2 , . . . , Eλk+1 is in W1. For this we want a lemma and a definition:

Definition 6.2.3. If p(x) is a polynomial with coefficients in F and T : V → V is linear, where V is a vector space over F , we define the transformation

p(T ) = anTn + · · ·+ a1T + a0I ,

where p(x) = anxn + · · ·+ a1x+ a0.

Lemma 6.2.4. Suppose that p(x) and q(x) are two polynomials with coefficients in F . If they have no common root then there exist polynomials a(x) and b(x) such that

a(x)p(x) + b(x)q(x) = 1 .

Proof. Homework.

Now choose v ∈ Eλj for some j = 2, . . . , k + 1. By the decomposition we can write

v = u+ w where u ∈ Eλ1 and w ∈ W1. We can now write

Eλ1 = N(T − λ1I)k1 and Eλj = N(T − λjI)kj

and see

~0 = (T − λjI)kjv = (T − λjI)kju+ (T − λjI)kjw .

However Eλ1 and W1 are T -invariant so they are (T − λjI)kj -invariant. This is a sum of vectors equal to zero, where one is in Eλ1 and the other is in W1. Because these spaces direct sum to V we know both vectors are zero. Therefore

u satisfies (T − λjI)kju = ~0 = (T − λ1I)k1u .

In other words, p(T )u = q(T )u = ~0, where p(x) = (x − λj)kj and q(x) = (x − λ1)k1 . Since these polynomials have no root in common we can find a(x) and b(x) as in the lemma. Finally,

u = (a(T )p(T ) + b(T )q(T ))u = ~0 .

This implies that v = w ∈ W1 and therefore all of Eλ2 , . . . , Eλk+1 are in W1.

Because of the above statement, we now know that all of λ2, . . . , λk+1 are eigenvalues of TW1 . Furthermore if λW1 is an eigenvalue of TW1 then it is an eigenvalue of T . It cannot be λ1 because then any eigenvector for TW1 with eigenvalue λW1 would have to be in Eλ1


but also in W1, so it would be zero, a contradiction. Therefore the eigenvalues of TW1 are precisely λ2, . . . , λk+1.

Let E^{W1}_{λj} be the generalized eigenspace for TW1 corresponding to λj. We want

E^{W1}_{λj} = Eλj , j = 2, . . . , k + 1 .

If w ∈ E^{W1}_{λj} then there exists k such that (TW1 − λjI)kw = ~0. But now on W1, (TW1 − λjI)k is the same as (T − λjI)k, so

(T − λjI)kw = (TW1 − λjI)kw = ~0 ,

so that E^{W1}_{λj} ⊆ Eλj . To show the other inclusion, take w ∈ Eλj . Since Eλj ⊆ W1, this implies that w ∈ W1. Now since there exists k such that (T − λjI)kw = ~0, we find

(TW1 − λjI)kw = (T − λjI)kw = ~0 ,

and we are done. We find

V = Eλ1 ⊕ · · · ⊕ Eλk+1 .

6.3 Nilpotent operators

Now we look at the operator T on the generalized eigenspaces. We need only restrict T to each eigenspace to determine the action on all of V . So for this purpose we will assume that T has only one generalized eigenspace: there exists λ ∈ F such that

V = Eλ .

In other words, for each v ∈ V there exists k such that (T − λI)kv = ~0. Recall we can then argue that there exists k∗ such that

V = N(T − λI)k∗,

or, if U = T − λI, Uk∗ = 0.

Definition 6.3.1. Let U : V → V be linear. We say that U is nilpotent if there exists k such that

Uk = 0 .

The smallest k for which Uk = 0 is called the degree of U .

The point of this section will be to give a structure theorem for nilpotent operators. It can be seen as a special case of Jordan form when all eigenvalues are zero. To prove this structure theorem, we will look at the nullspaces of powers of U . Note that if k = deg(U), then N(Uk) = V but N(Uk−1) ≠ V . We then get an increasing chain of subspaces

N0 ⊆ N1 ⊆ · · · ⊆ Nk, where Nj = N(U j) .
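As a small numerical illustration of this chain (our own example, assuming numpy: U is a single 4 × 4 Jordan block with eigenvalue 0, so deg(U) = 4).

    import numpy as np

    U = np.diag(np.ones(3), k=1)     # 4x4 matrix with ones on the superdiagonal; U^4 = 0
    for j in range(5):
        nullity = 4 - np.linalg.matrix_rank(np.linalg.matrix_power(U, j))
        print(j, nullity)            # dim N(U^j) = 0, 1, 2, 3, 4: strictly increasing until j = 4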


• If v ∈ Nj \ Nj−1 then U(v) ∈ Nj−1 \ Nj−2.

Proof. v has the property that U^j v = ~0 but U^{j−1} v ≠ ~0. Therefore U^{j−1}(Uv) = ~0 but U^{j−2}(Uv) ≠ ~0.

Definition 6.3.2. If W1 ⊆ W2 are subspaces of V then we say that v1, . . . , vm ∈ W2 are linearly independent mod W1 if

a1v1 + · · ·+ amvm ∈ W1 implies ai = 0 for all i .

Lemma 6.3.3. Suppose that dim W2 − dim W1 = l and v1, . . . , vm ∈ W2 \ W1 are linearly independent mod W1. Then

1. m ≤ l and

2. we can choose l − m vectors vm+1, . . . , vl in W2 \ W1 such that {v1, . . . , vl} are linearly independent mod W1.

Proof. It suffices to show that we can add just one vector. Let w1, . . . , wt be a basis for W1. Then define

X = Span({w1, . . . , wt, v1, . . . , vm}) .

Then the set {w1, . . . , wt, v1, . . . , vm} is linearly independent. Indeed, if

a1w1 + · · ·+ atwt + b1v1 + · · ·+ bmvm = ~0 ,

then b1v1 + · · ·+ bmvm ∈ W1, so all bi’s are zero. Then

a1w1 + · · ·+ atwt = ~0 ,

so all ai’s are zero. Thus

t+m = dim X ≤ dim W2 = t+ l

or m ≤ l. For the second part, if m = l, we are done. Otherwise dim X < dim W2, so there exists vm+1 ∈ W2 \ X. To show linear independence mod W1, suppose that

a1v1 + · · ·+ amvm + am+1vm+1 = w ∈ W1 .

If am+1 = 0 then we are done. Otherwise we can solve for vm+1 and see it is in X. This is a contradiction.

Lemma 6.3.4. Suppose that for some m, v1, . . . , vp ∈ Nm \ Nm−1 are linearly independent mod Nm−1. Then U(v1), . . . , U(vp) are linearly independent mod Nm−2.


Proof. Suppose that

a1U(v1) + · · ·+ apU(vp) = ~n ∈ Nm−2 .

Then U(a1v1 + · · ·+ apvp) = ~n. Now

U^{m−1}(a1v1 + · · ·+ apvp) = U^{m−1}(~n) = ~0 .

Therefore a1v1 + · · · + apvp ∈ Nm−1. But v1, . . . , vp are linearly independent mod Nm−1 so we find that ai = 0 for all i.

Now we do the following.

1. Write dm = dim Nm − dim Nm−1. Starting at the top, choose

βk = {v^k_1, . . . , v^k_{dk}} ⊆ Nk \ Nk−1

which are linearly independent mod Nk−1.

2. Move down one level: write v^{k−1}_i = U(v^k_i). Then {v^{k−1}_1, . . . , v^{k−1}_{dk}} is linearly independent mod Nk−2, so dk ≤ dk−1. By the lemma we can extend this to

βk−1 = {v^{k−1}_1, . . . , v^{k−1}_{dk}, v^{k−1}_{dk+1}, . . . , v^{k−1}_{dk−1}} ,

a maximal linearly independent set mod Nk−2 in Nk−1 \ Nk−2.

3. Repeat.

Note that dk + dk−1 + · · ·+ d1 = dim V . We claim that if βi is the set at level i then

β = β1 ∪ · · · ∪ βk

is a basis for V . It suffices to show linear independence. For this, we use the following fact.

• If W1 ⊆ · · · ⊆ Wk = V is a nested sequence of subspaces with βi ⊆ Wi \ Wi−1 linearly independent mod Wi−1 then β = ∪iβi is linearly independent. (Check.)

We have shown the following result.

Definition 6.3.5. A chain of length l for U is a set {v, U(v), U^2(v), . . . , U^{l−1}(v)} of non-zero vectors such that U^l(v) = ~0.

Theorem 6.3.6. If U : V → V is linear and nilpotent (dim V < ∞) then there exists a basis of V consisting entirely of chains for U .
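As an illustration (a sketch, not from the original notes), the chain structure of a concrete nilpotent matrix can be read off from the numbers nj = dim N(U^j), exactly as in the uniqueness proof below; the matrix here is chosen to have one chain of length 3 and one of length 1:

    # Sketch: recovering chain lengths from nullspace dimensions with sympy.
    from sympy import Matrix

    U = Matrix([[0, 1, 0, 0],
                [0, 0, 1, 0],
                [0, 0, 0, 0],
                [0, 0, 0, 0]])

    n = [0]                                  # n[j] = dim N(U**j), with n[0] = 0
    for j in range(1, 5):
        n.append(4 - (U**j).rank())          # rank-nullity: dim N = dim V - rank
    nbar = [n[j] - n[j-1] for j in range(1, 5)]              # chains of length >= j
    l = [nbar[j] - nbar[j+1] for j in range(3)] + [nbar[3]]  # exactly length j+1
    print(n[1:], l)                          # [2, 3, 4, 4] and [1, 0, 1, 0]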


Let U : V → V be nilpotent. If C = {U^{l−1}v, U^{l−2}v, . . . , U(v), v} is a chain then note that

Span(C) is U -invariant .

Since V has a basis of chains, say C1, . . . , Cm, we can write

V = Span(C1) ⊕ · · · ⊕ Span(Cm) .

Each Ci is a basis for Span(Ci), so our matrix for U is block diagonal, one block per chain. Then U restricted to Span(Ci) has the following matrix w.r.t. Ci:

\begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 & 1 \\ 0 & 0 & \cdots & 0 & 0 \end{pmatrix} .

Theorem 6.3.7 (Uniqueness of nilpotent form). Let U : V → V be linear and nilpotent with dim V < ∞. Write

li(β) = # of (maximal) chains of length i in β .

Then if β, β′ are bases of V consisting of chains for U then

li(β) = li(β′) for all i .

Proof. Write Ki(β) for the set of elements v of β such that U^i(v) = ~0 but U^{i−1}(v) ≠ ~0 (for i = 1 we only require U(v) = ~0). Let l̄i(β) be the number of (maximal) chains in β of length at least i.

We first claim that #Ki(β) = l̄i(β) for all i. To see this note that for each chain C of length at least i there is a unique element v ∈ C such that v ∈ Ki(β). Conversely, for each v ∈ Ki(β) there is a unique chain of length at least i containing v.

Let ni be the dimension of N(U^i). We claim that ni equals the number mi(β) of v ∈ β such that U^i(v) = ~0. Indeed, the set of such v’s is linearly independent and in N(U^i) so ni ≥ mi(β). However all other v’s (dim V − mi(β) of them) are mapped to distinct elements of β (since β is made up of chains), so dim R(U^i) ≥ dim V − mi(β), so ni ≤ mi(β).

Because N(U^i) contains N(U^{i−1}) for all i (here we take N(U^0) = {~0}), we have

l̄i(β) = #Ki(β) = ni − ni−1 .

Therefore

li(β) = l̄i(β) − l̄i+1(β) = (ni − ni−1) − (ni+1 − ni) .

The right side does not depend on β and in fact the same argument shows it is equal to li(β′). This completes the proof.


6.4 Existence and uniqueness of Jordan form, Cayley-Hamilton

Definition 6.4.1. A Jordan block for λ of size l is the l × l matrix

J_{λ,l} = \begin{pmatrix} λ & 1 & 0 & \cdots & 0 \\ 0 & λ & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & λ & 1 \\ 0 & 0 & \cdots & 0 & λ \end{pmatrix} .

Theorem 6.4.2 (Jordan canonical form). Let T : V → V be linear with dim V < ∞ and F algebraically closed. Then there is a basis β of V such that [T ]ββ is block diagonal with Jordan blocks.

Proof. First decompose V = Eλ1 ⊕ · · · ⊕ Eλk . On each Eλi , the operator T − λiI is nilpotent. Each chain for (T − λiI)|Eλi gives a block in the nilpotent decomposition. Then T = (T − λiI) + λiI gives a Jordan block.

Draw a picture at this point (of sets of chains). We first decompose

V = Eλ1 ⊕ · · · ⊕ Eλk

and then Eλi = Ci1 ⊕ · · · ⊕ Ciki , where each Cij is the span of a chain of generalized eigenvectors {v1, . . . , vp}, with

T (v1) = λiv1, T (v2) = λiv2 + v1, . . . , T (vp) = λivp + vp−1 .
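For checking hand computations, sympy can produce a Jordan form directly. A sketch (the matrix is an arbitrary 4 × 4 example, not taken from the notes):

    # Sketch: Jordan form of a concrete matrix via sympy.
    from sympy import Matrix

    A = Matrix([[5, 4, 2, 1],
                [0, 1, -1, -1],
                [-1, -1, 3, 0],
                [1, 1, -1, 2]])

    P, J = A.jordan_form()     # A = P * J * P**-1 with J block diagonal
    print(J)                   # Jordan blocks along the diagonal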

Theorem 6.4.3 (Cayley-Hamilton). Let T : V → V be linear with dim V < ∞ and F algebraically closed, and let c(x) be the characteristic polynomial of T . Then

c(T ) = 0 .

Remark. In fact the theorem holds even if F is not algebraically closed, by doing a field extension.

Lemma 6.4.4. If U : V → V is linear and nilpotent with dim V = n < ∞ then

U^n = 0 .

Therefore if T : V → V is linear and v ∈ Eλ then

(T − λI)^{dim Eλ} v = ~0 .

Proof. Let β be a basis of chains for U . Then the length of the longest chain is at most n, so U^n kills every element of β.


Lemma 6.4.5. Suppose T : V → V is linear with dim V < ∞ and β is a basis such that [T ]ββ is in Jordan form. For each eigenvalue λ, let Sλ be the set of basis vectors corresponding to blocks for λ. Then

Span(Sλ) = Eλ for each λ .

Therefore if

c(x) = ∏_{i=1}^{k} (λi − x)^{ni} ,

then ni = dim Eλi for each i.

Proof. Write λ1, . . . , λk for the distinct eigenvalues of T . Let

Wi = Span(Sλi) .

We may assume that the blocks corresponding to λ1 appear first, λ2 second, and so on. Since [T ]ββ is in block form, this means V is a T -invariant direct sum

W1 ⊕ · · · ⊕Wk .

However for each i, T − λiI restricted to Wi is in nilpotent form. Thus (T − λiI)^{dim Wi} v = ~0 for each v ∈ Sλi . This means

Wi ⊆ Eλi for all i, or dim Wi ≤ dim Eλi .

But V = Eλ1 ⊕ · · · ⊕ Eλk , so ∑_{i=1}^{k} dim Eλi = dim V . This gives that dim Wi = dim Eλi for all i, or Wi = Eλi .

For the second claim, ni is the number of times that λi appears on the diagonal; that is, the dimension of Span(Sλi).

Proof of Cayley-Hamilton. We first factor

c(x) = ∏_{i=1}^{k} (λi − x)^{ni} ,

where ni is called the algebraic multiplicity of the eigenvalue λi. Let β be a basis such that [T ]ββ is in Jordan form. If v ∈ β is in a block corresponding to λj then v ∈ Eλj and so (T − λjI)^{dim Eλj} v = ~0 (by the previous lemma). Now

c(T )v = ( ∏_{i=1}^{k} (λiI − T )^{ni} ) v = ( ∏_{i ≠ j} (λiI − T )^{ni} ) (λjI − T )^{nj} v = ~0

since nj = dim Eλj .
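The theorem is easy to test numerically. A sketch (the matrix is an arbitrary choice) evaluating the characteristic polynomial at the matrix itself by Horner's rule:

    # Sketch: checking Cayley-Hamilton for a concrete matrix with sympy.
    from sympy import Matrix, eye, zeros

    A = Matrix([[2, 1, 0],
                [0, 2, 1],
                [1, 0, 3]])

    coeffs = A.charpoly().all_coeffs()   # leading coefficient first
    C = zeros(3, 3)
    for c in coeffs:                     # Horner evaluation of c(A)
        C = C * A + c * eye(3)
    assert C == zeros(3, 3)              # c(A) = 0, as the theorem predicts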

Finally we have uniqueness of Jordan form.


Theorem 6.4.6. If β and β′ are bases of V for which T is in Jordan form, then the matrices are the same up to permutation of blocks.

Proof. First we note that the characteristic polynomial can be read off of the matrices and is invariant. This gives that the diagonal entries are the same, and the number of vectors corresponding to each eigenvalue is the same.

We see from the second lemma that if βi and β′i are the parts of the bases corresponding to blocks involving λi then

Wi := Span(βi) = Eλi = Span(β′i) =: W ′i .

Restricting T to Wi and to W ′i then gives the blocks for λi. But then βi and β′i are just bases of Eλi of chains for T − λiI. The number of chains of each length is the same, and this is the number of blocks of each size.

6.5 Exercises

Notation:

1. If F is a field then we write F[X] for the set of polynomials with coefficients in F.

2. If P,Q ∈ F[X] then we say that P divides Q and write P |Q if there exists S ∈ F[X] such that Q = SP .

3. If P ∈ F[X] then we write deg(P ) for the largest k such that the coefficient of x^k in P is nonzero. We define the degree of the zero polynomial to be −∞.

4. If P ∈ F[X] then we say that P is monic if the coefficient of x^n is 1, where n = deg(P ).

5. For a complex number z, we denote the complex conjugate by z, i.e. if z = a+ ib, witha, b ∈ R, then z = a− ib.

6. If V is an F -vector space, recall the definition of V × V : it is the F -vector space whose elements are

V × V = {(v1, v2) : v1, v2 ∈ V } .

Vector addition is performed as (v1, v2) + (v3, v4) = (v1 + v3, v2 + v4) and for c ∈ F , c(v1, v2) is defined as (cv1, cv2).

Exercises

1. (a) Show that for P,Q ∈ F[X], one has deg(PQ) = deg(P ) + deg(Q).

(b) Show that for P,D ∈ F[X] such that D is nonzero, there exist Q,R ∈ F[X] such that P = QD + R and deg(R) < deg(D). Hint: Use induction on deg(P ).


(c) Show that, for any λ ∈ F ,

P (λ) = 0⇔ (X − λ)|P.

(d) Let P ∈ F[X] be of the form P (X) = a(X − λ1)^{n1} · · · (X − λk)^{nk} for some a, λ1, . . . , λk ∈ F and natural numbers n1, . . . , nk. Show that Q ∈ F[X] divides P if and only if Q(X) = b(X − λ1)^{m1} · · · (X − λk)^{mk} for some b ∈ F and natural numbers mi with mi ≤ ni (we allow mi = 0).

(e) Assume that F is algebraically closed. Show that every P ∈ F[X] is of the form a(X − λ1)^{n1} · · · (X − λk)^{nk} for some a, λ1, . . . , λk ∈ F and natural numbers n1, . . . , nk with n1 + · · · + nk = deg(P ). In this case we call the λi’s the roots of P .

2. Let F be a field and suppose that P,Q are nonzero polynomials in F[X]. Define the subset S of F[X] as follows:

S = {AP +BQ : A,B ∈ F[X]} .

(a) Let D ∈ S be of minimal degree. Show that D divides both P and Q.

Hint: Use part 2 of the previous problem.

(b) Show that if T ∈ F[X] divides both P and Q then T divides D.

(c) Conclude that there exists a unique monic polynomial D ∈ F[X] satisfying the following conditions:

i. D divides P and Q.

ii. If T ∈ F[X] divides both P and Q then T divides D.

(Such a polynomial is called the greatest common divisor (GCD) of P and Q.)

(d) Show that if F is algebraically closed and P and Q are polynomials in F[X] with no common root then there exist A and B in F[X] such that

AP +BQ = 1 .

3. Let F be any field, V be an F -vector space, f ∈ L(V, V ), and W ⊂ V an f -invariant subspace.

(a) Let p : V → V/W denote the natural map defined in Homework 8, problem 5. Show that there exists an element of L(V/W, V/W ), which we will denote by f |V/W , such that p ◦ f = f |V/W ◦ p. It is customary to express this fact using the following diagram:

        V ----f----> V
        |            |
        p            p
        v            v
       V/W -f|V/W-> V/W


(b) Let β′ be a basis for W and β be a basis for V which contains β′. Show that the image of β′′ := β \ β′ under p is a basis for V/W .

(c) Let β′′ be a subset of V with the property that the restriction of p to β′′ (which is a map of sets β′′ → V/W ) is injective and its image is a basis for V/W . Show that β′′ is a linearly-independent set. Show moreover that if β′ is a basis for W , then β := β′ ⊔ β′′ is a basis for V .

(d) Let β, β′, and β′′ be as above. Show that

[f ]ββ = \begin{pmatrix} A & B \\ 0 & C \end{pmatrix}

with A = [f |W ]β′β′ and C = [f |V/W ]p(β′′)p(β′′).

4. The minimal polynomial. Let F be any field, V an F -vector space, and f ∈ L(V, V ).

(a) Consider the subset S ⊂ F [X] defined by

S = {P ∈ F [X]|P (f) = 0}.

Show that S contains a nonzero element.

(b) Let Mf ∈ S be a monic non-zero element of minimal degree. Show that Mf divides any other element of S. Conclude that Mf as defined is unique. It is called the minimal polynomial of f .

(c) Show that the roots of Mf are precisely the eigenvalues of f by completing the following steps.

i. Suppose that r ∈ F is such that Mf (r) = 0. Show that

Mf (X) = Q(X)(X − r)^k

for some positive integer k and Q ∈ F[X] such that Q(r) ≠ 0. Prove also that Q(f) ≠ 0.

ii. Show that if r ∈ F satisfies Mf (r) = 0 then f − rI is not invertible and thus r is an eigenvalue of f .

iii. Conversely, let λ be an eigenvalue of f with eigenvector v. Show that if P ∈ F[X] then

P (f)v = P (λ)v .

Conclude that λ is a root of Mf .

(d) Assume that F is algebraically closed. For each eigenvalue λ of f , express multλ(Mf ) in terms of the Jordan form of f .

(e) Assume that F is algebraically closed. Show that f is diagonalizable if and only if multλ(Mf ) = 1 for all eigenvalues λ of f .


(f) Assume that F is algebraically closed. Under which circumstances does Mf equal the characteristic polynomial of f?

5. If T : V → V is linear and V is a finite-dimensional F-vector space with F algebraically closed, we define the algebraic multiplicity of an eigenvalue λ to be a(λ), the dimension of the generalized eigenspace Eλ. The geometric multiplicity of λ is g(λ), the dimension of the eigenspace Eλ. Finally, the index of λ is i(λ), the length of the longest chain of generalized eigenvectors in Eλ.

Suppose that λ is an eigenvalue of T and g = g(λ) and i = i(λ) are given integers.

(a) What is the minimal possible value for a = a(λ)?

(b) What is the maximal possible value for a?

(c) Show that a can take any value between the answers for the above two questions.

(d) What is the smallest dimension n of V for which there exist two linear transformations T and U from V to V with all of the following properties? (i) There exists λ ∈ F which is the only eigenvalue of either T or U , (ii) T and U are not similar transformations, and (iii) the geometric multiplicity of λ for T equals that of U and similarly for the index.

6. Find the Jordan form for each of the following matrices over C. Write the minimal polynomial and characteristic polynomial for each. To do this, first find the eigenvalues. Then, for each eigenvalue λ, find the dimensions of the nullspaces of (A − λI)^k for pertinent values of k (where A is the matrix in question). Use this information to deduce the block forms. (A computational check is sketched after the list.)

(a) \begin{pmatrix} −1 & 0 & 0 \\ 1 & 4 & −1 \\ −1 & 4 & 0 \end{pmatrix}   (b) \begin{pmatrix} 2 & 3 & 0 \\ 0 & −1 & 0 \\ 0 & −1 & 2 \end{pmatrix}   (c) \begin{pmatrix} 5 & 1 & 3 \\ 0 & 2 & 0 \\ −6 & −1 & −4 \end{pmatrix}

(d) \begin{pmatrix} 3 & 0 & 0 \\ 4 & 2 & 0 \\ 5 & 0 & 2 \end{pmatrix}   (e) \begin{pmatrix} 2 & 3 & 2 \\ −1 & 4 & 2 \\ 0 & 1 & 3 \end{pmatrix}   (f) \begin{pmatrix} 4 & 1 & 0 \\ −1 & 2 & 0 \\ 1 & 1 & 3 \end{pmatrix}

7. (a) The characteristic polynomial of the matrix

A = \begin{pmatrix} 7 & 1 & 2 & 2 \\ 1 & 4 & −1 & −1 \\ −2 & 1 & 5 & −1 \\ 1 & 1 & 2 & 8 \end{pmatrix}

is c(x) = (x − 6)^4. Find an invertible matrix S such that S^{−1}AS is in Jordan form.

(b) Find all complex matrices in Jordan form with characteristic polynomial

c(x) = (i − x)^3 (2 − x)^2 .


8. Complexification of finite-dimensional real vector spaces. Let V be an R-vector space. Just as we can view R as a subset of C, we will be able to view V as a subset of a C-vector space. This will be useful because C is algebraically closed, so we can, for example, use the theory of Jordan form in VC and bring it back (in a certain form) to V . We will give two constructions of the complexification; the first is more elementary and the second is the standard construction you will see in algebra.

We put VC = V × V .

(a) Right now, VC is only an R-vector space. We must define what it means to multiply vectors by complex scalars. For z ∈ C, z = a + ib with a, b ∈ R, and v = (vr, vi) ∈ VC, we define the element zv ∈ VC to be

(avr − bvi, avi + bvr).

Show that in this way, VC becomes a C-vector space. This is the complexificationof V .

(b) We now show how to view V as a “subset” of VC. Show that the map ι : V → VC which maps v to (v, 0) is injective and R-linear. (Thus the set ι(V ) is a copy of V sitting in VC.)

(c) Show that dimC(VC) = dimR(V ). Conclude that VC is equal to spanC(ι(V )). Conclude further that if v1, . . . , vn is an R-basis for V , then ι(v1), . . . , ι(vn) is a C-basis for VC.

(d) Complex conjugation: We define the complex conjugation map c : VC → VC to be the map (vr, vi) 7→ (vr,−vi). Just as R (sitting inside of C) is invariant under complex conjugation, so will our copy of V (and its subspaces) be inside of VC.

i. Prove that c^2 = 1 and ι(V ) = {v ∈ VC | c(v) = v}.

ii. Show that for all z ∈ C and v ∈ VC, we have c(zv) = \bar{z} c(v). Maps with this property are called anti-linear.

iii. In the next two parts, we classify those subspaces of VC that are invariant under c. Let W be a subspace of V . Show that the C-subspace of VC spanned by ι(W ) equals

{(w1, w2) ∈ VC : w1, w2 ∈ W}

and is invariant under c.

iv. Show conversely that if a subspace W̃ of the C-vector space VC is invariant under c, then there exists a subspace W ⊂ V such that W̃ = spanC(ι(W )).

v. Last, notice that the previous two parts told us the following: The subspaces of VC which are invariant under conjugation are precisely those which are equal to WC for subspaces W of the R-vector space V . Show that moreover, in that situation, the restriction of the complex conjugation map c : VC → VC to WC is equal to the complex conjugation map defined for WC (the latter map is defined intrinsically for WC, i.e. without viewing it as a subspace of VC).


9. Let V be a finite dimensional R-vector space. For this exercise we use the notation of the previous one.

(a) Let W be another finite dimensional R-vector space, and let f ∈ L(V,W ). Show that

fC((v, w)) := (f(v), f(w))

defines an element fC ∈ L(VC,WC).

(b) Show that for v ∈ VC, we have fC(c(v)) = c(fC(v)). Show conversely that if g ∈ L(VC,WC) has the property that g(c(v)) = c(g(v)) for all v ∈ VC, then g = fC for some f ∈ L(V,W ).

10. In this problem we will establish the real Jordan form. Let V be a vector space over R of dimension n < ∞. Let T : V → V be linear and TC its complexification.

(a) If λ ∈ C is an eigenvalue of TC, and Eλ is the corresponding generalized eigenspace, show that

c(Eλ) = E_{\bar{λ}}.

(b) Show that the non-real eigenvalues of TC come in pairs. In other words, show that we can list the distinct eigenvalues of TC as

λ1, . . . , λr, σ1, σ2, . . . , σ2m ,

where for each j = 1, . . . , r, λj = \bar{λ}j and for each i = 1, . . . ,m, σ2i−1 = \bar{σ}2i.

(c) Because C is algebraically closed, the proof of Jordan form shows that

VC = Eλ1 ⊕ · · · ⊕ Eλr ⊕ Eσ1 ⊕ · · · ⊕ Eσ2m .

Using the previous two points, show that for j = 1, . . . , r and i = 1, . . . ,m, the subspaces of VC

Eλj and Eσ2i−1 ⊕ Eσ2i

are c-invariant.

(d) Deduce from the results of problem 6, homework 10 that there exist subspaces X1, . . . , Xr and Y1, . . . , Ym of V such that for each j = 1, . . . , r and i = 1, . . . ,m,

Eλj = SpanC(ι(Xj)) and Eσ2i−1 ⊕ Eσ2i = SpanC(ι(Yi)) .

Show that V = X1 ⊕ · · · ⊕Xr ⊕ Y1 ⊕ · · · ⊕ Ym .

(e) Prove that for each j = 1, . . . , r, the transformation T − λjI restricted to Xj is nilpotent and thus we can find a basis βj for Xj consisting entirely of chains for T − λjI.


(f) For each i = 1, . . . ,m, let

β̃i = {(v^i_1, w^i_1), . . . , (v^i_{ni}, w^i_{ni})}

be a basis of Eσ2i−1 consisting of chains for TC − σ2i−1I. Prove that

βi = {v^i_1, w^i_1, . . . , v^i_{ni}, w^i_{ni}}

is a basis for Yi. Describe the form of the matrix representation of T restricted to Yi relative to βi.

(g) Gathering the previous parts, state and prove a version of Jordan form for linear transformations over finite-dimensional real vector spaces. Your version should be of the form “If T : V → V is linear then there exists a basis β of V such that [T ]ββ has the form . . . ”


7 Bilinear forms

7.1 Definitions

We now switch gears from Jordan form.

Definition 7.1.1. If V is a vector space over F , a function f : V × V → F is called a bilinear form if for fixed v ∈ V , f(v, w) is linear in w and for fixed w ∈ V , f(v, w) is linear in v.

Bilinear forms have matrix representations similar to those for linear transformations. Choose a basis β = {v1, . . . , vn} for V and write

v = a1v1 + · · ·+ anvn, w = b1v1 + · · ·+ bnvn .

Now

f(v, w) = ∑_{i=1}^{n} ai f(vi, w) = ∑_{i=1}^{n} ∑_{j=1}^{n} ai bj f(vi, vj) .

Define an n × n matrix A by Ai,j = f(vi, vj). Then this is

∑_{i=1}^{n} ai ( ∑_{j=1}^{n} bj Ai,j ) = ∑_{i=1}^{n} ai (A ·~b)_i = (~a)^t A ·~b = [v]^t_β [f ]ββ [w]β .

We have proved:

Theorem 7.1.2. If dim V < ∞ and f : V × V → F is a bilinear form there exists a unique matrix [f ]ββ such that for all v, w ∈ V ,

f(v, w) = [v]^t_β [f ]ββ [w]β .

Furthermore the map f 7→ [f ]ββ is an isomorphism from Bil(V, F ) to Mn×n(F ).

Proof. We showed existence. To prove uniqueness, suppose that A is another such matrix. Then

Ai,j = e^t_i A ej = [vi]^t_β A [vj]β = f(vi, vj) .

If β′ is another basis, write [I]ββ′ for the change of coordinates matrix taking β′-coordinates to β-coordinates. Then

([v]β′)^t ( ([I]ββ′)^t [f ]ββ [I]ββ′ ) [w]β′ = ([I]ββ′ [v]β′)^t [f ]ββ ([I]ββ′ [w]β′) = ([v]β)^t [f ]ββ [w]β = f(v, w) .

Therefore

[f ]β′β′ = ([I]ββ′)^t [f ]ββ [I]ββ′ .
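A quick numerical check of this change-of-basis rule, as a sketch (the form and the change of coordinates matrix are arbitrary choices, not from the notes):

    # Sketch: a bilinear form's matrix changes by congruence, S^t [f] S.
    import numpy as np

    rng = np.random.default_rng(0)
    F = rng.standard_normal((3, 3))      # matrix of f in the basis beta
    S = np.array([[1., 1., 0.],          # change of coordinates from beta' to beta
                  [0., 1., 1.],
                  [0., 0., 1.]])
    v = rng.standard_normal(3)           # beta'-coordinates of a vector
    w = rng.standard_normal(3)

    lhs = v @ (S.T @ F @ S) @ w          # computed with [f] in beta'
    rhs = (S @ v) @ F @ (S @ w)          # computed with [f] in beta
    assert np.isclose(lhs, rhs)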

Note that for fixed v ∈ V the map Lf (v) : V → F given by Lf (v)(w) = f(v, w) is a linear functional. So f gives a map in L(V, V ∗).


Theorem 7.1.3. Denote by Bil(V, F ) the set of bilinear forms on V . The map ΦL : Bil(V, F ) → L(V, V ∗) given by

ΦL(f) = Lf

is an isomorphism.

Proof. If f, g ∈ Bil(V, F ) and c ∈ F then

(ΦL(cf + g)(v)) (w) = (Lcf+g(v)) (w) = (cf + g)(v, w) = cf(v, w) + g(v, w)
= cLf (v)(w) + Lg(v)(w) = (cLf (v) + Lg(v))(w)
= (cΦL(f)(v) + ΦL(g)(v))(w) .

Thus ΦL(cf + g)(v) = cΦL(f)(v) + ΦL(g)(v). This is the same as (cΦL(f) + ΦL(g))(v). Therefore ΦL(cf + g) = cΦL(f) + ΦL(g). Thus ΦL is linear.

Now Bil(V, F ) has dimension n^2, because the map from the last theorem is an isomorphism onto Mn×n(F ). The space L(V, V ∗) also has dimension n^2. Therefore we only need to show one-to-one or onto. To show one-to-one, suppose that ΦL(f) = 0. Then for all v, Lf (v) = 0. In other words, for all v and w ∈ V , f(v, w) = 0. This means f = 0.

Remark. We can also define Rf (w) by Rf (w)(v) = f(v, w). Then the corresponding map ΦR is an isomorphism.

You will prove the following fact in homework. If β is a basis for V and β∗ is the dual basis, then for each f ∈ Bil(V, F ),

[Rf ]ββ∗ = [f ]ββ .

Then we have

[Lf ]ββ∗ = ([f ]ββ)^t .

To see this, set g ∈ Bil(V, F ) to be g(v, w) = f(w, v). Then for each v, w ∈ V ,

([w]β)^t [f ]ββ [v]β = f(w, v) = g(v, w) .

Taking the transpose on both sides,

([v]β)^t ([f ]ββ)^t [w]β = g(v, w) ,

so [g]ββ = ([f ]ββ)^t. But Lf = Rg, so

([f ]ββ)^t = [Rg]ββ∗ = [Lf ]ββ∗ .

Definition 7.1.4. If f ∈ Bil(V, F ) then we define the rank of f to be the rank of Rf .

• By the above remark, the rank equals the rank of either of the matrices

[f ]ββ or [Lf ]ββ∗ .

Therefore the rank of [f ]ββ does not depend on the choice of basis.


• For f ∈ Bil(V, F ), define

N(f) = {v ∈ V : f(v, w) = 0 for all w ∈ V } .

This is just {v ∈ V : Lf (v) = 0} = N(Lf ) .

But Lf is a map from V to V ∗, so we have

rank(f) = rank(Lf ) = dim V − dim N(f) .

7.2 Symmetric bilinear forms

Definition 7.2.1. A bilinear form f ∈ Bil(V, F ) is called symmetric if f(v, w) = f(w, v) for all v, w ∈ V . f is called skew-symmetric if f(v, v) = 0 for all v ∈ V .

• The matrix for a symmetric bilinear form is symmetric and the matrix for a skew-symmetric bilinear form is anti-symmetric.

• Furthermore, each symmetric matrix A gives a symmetric bilinear form:

f(v, w) = ([v]β)tA · [w]β .

Similarly for skew-symmetric matrices.

Lemma 7.2.2. If f ∈ Bil(V, F ) is symmetric and char(F ) ≠ 2 then f(v, w) = 0 for all v, w ∈ V if and only if f(v, v) = 0 for all v ∈ V .

Proof. One direction is clear. For the other, suppose that f(v, v) = 0 for all v ∈ V . Then

f(v + w, v + w) = f(v, v) + 2f(v, w) + f(w,w)
f(v − w, v − w) = f(v, v) − 2f(v, w) + f(w,w) .

Therefore

0 = f(v + w, v + w) − f(v − w, v − w) = 4f(v, w) .

Here 4f(v, w) means f(v, w) added to itself 3 times. If char(F ) ≠ 2 then this implies f(v, w) = 0.

Remark. If char(F ) = 2 then the above lemma is false. Take F = Z2 with V = F^2 and f with matrix

\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} .

Check that this has f(v, v) = 0 for all v but f(v, w) is clearly not 0 for all v, w ∈ V .

Definition 7.2.3. A basis β = {v1, . . . , vn} of V is orthogonal for f ∈ Bil(V, F ) if f(vi, vj) = 0 whenever i ≠ j. It is orthonormal if it is orthogonal and f(vi, vi) = 1 for all i.


Theorem 7.2.4 (Diagonalization of symmetric bilinear forms). Let f ∈ Bil(V, F ) with char(F ) ≠ 2 and dim V < ∞. If f is symmetric then V has an orthogonal basis for f .

Proof. We argue by induction on n = dim V . If n = 1 it is clear. Let us now suppose that the statement holds for all k < n for some n > 1 and show that it holds for n. If f(v, v) = 0 for all v then f is identically zero and thus we are done. Otherwise we can find some v ≠ 0 such that f(v, v) ≠ 0.

Define

W = {w ∈ V : f(v, w) = 0} .

Since this is the nullspace of Lf (v) and Lf (v) is a nonzero element of V ∗, it follows that W is (n − 1)-dimensional. Because f restricted to W is still a symmetric bilinear form, we can find a basis β′ of W such that [f ]β′β′ is diagonal. Write β′ = {v1, . . . , vn−1} and β = β′ ∪ {v}. Then we claim β is a basis for V : if

a1v1 + · · ·+ an−1vn−1 + av = ~0 ,

then applying Lf (v) to both sides we find af(v, v) = 0, or a = 0. Linear independence gives that the other ai’s are zero.

Now it is clear that [f ]ββ is diagonal. For i ≠ j which are both < n this follows because β′ is a basis for which f is diagonal on W . Otherwise one of i, j is n and then the other vector is in W and so f(vi, vj) = 0.

• In the basis β, f has a diagonal matrix. This says that for each symmetric matrix A we can find an invertible matrix S such that

S^t A S is diagonal .

• In fact, if F is a field such that each number has a square root (like C) then we can make a new basis, replacing each element v of β such that f(v, v) ≠ 0 by v/√f(v, v) and leaving all elements such that f(v, v) = 0, to find a basis such that the representation of f is diagonal, with all 1 and 0 on the diagonal. The number of 1’s equals the rank of f .

• Therefore if f has full rank and each element of F has a square root, there exists an orthonormal basis of V for f .

Theorem 7.2.5 (Sylvester’s law). Let f be a symmetric bilinear form on R^n. There exists a basis β such that [f ]ββ is diagonal, with only 0’s, 1’s and −1’s. Furthermore, the number of each is independent of the choice of basis that puts f into this form.

Proof. Certainly such a basis exists: just modify the construction above by dividing by √|f(vi, vi)| instead. So we show the other claim. Because the number of 0’s is independent of the basis, we need only show the statement for 1’s.


For a basis β, let V +(β) be the span of the vi’s such that f(vi, vi) > 0, and similarly for V −(β) and V 0(β). Clearly

V = V +(β) ⊕ V −(β) ⊕ V 0(β) .

Note that the number of 1’s equals the dimension of V +(β). Furthermore, for each nonzero v ∈ V +(β), writing v = a1v1 + · · · + apvp in terms of the basis vectors v1, . . . , vp for V +(β), we have

f(v, v) = ∑_{i=1}^{p} a_i^2 f(vi, vi) > 0 .

A similar argument gives f(v, v) ≤ 0 for all v ∈ V −(β) ⊕ V 0(β).

If β′ is another basis we also have

V = V +(β′) ⊕ V −(β′) ⊕ V 0(β′) .

Suppose that dim V +(β′) > dim V +(β). Then

dim V +(β′) + dim (V −(β) ⊕ V 0(β)) > n ,

so V +(β′) intersects V −(β) ⊕ V 0(β) in at least one non-zero vector, say v. Since v ∈ V +(β′), f(v, v) > 0. However, since v ∈ V −(β) ⊕ V 0(β), f(v, v) ≤ 0, a contradiction. Therefore dim V +(β) = dim V +(β′) and we are done.
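Over R the counts in Sylvester's law can be exhibited numerically: for a real symmetric matrix, an orthogonal diagonalization is in particular a congruence, so the signs of the eigenvalues give the numbers of 1's, −1's and 0's. A sketch using numpy (the matrix is the first one from exercise 10 below):

    # Sketch: the Sylvester signature of a real symmetric form.
    import numpy as np

    A = np.array([[2., 3., 5.],
                  [3., 7., 11.],
                  [5., 11., 13.]])

    w, Q = np.linalg.eigh(A)             # A = Q diag(w) Q^T with Q orthogonal
    pos = int(np.sum(w > 1e-12))         # number of 1's after rescaling
    neg = int(np.sum(w < -1e-12))        # number of -1's
    print(pos, neg, len(w) - pos - neg)  # and the number of 0's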

• The subspace V 0(β) is unique. We can define

NL(f) = N(Lf ), NR(f) = N(Rf ) .

In the symmetric case, these are equal and we can define it to be N(f). We claim that

V 0(β) = N(f) for all β .

Indeed, if v ∈ V 0(β) then because the basis β is orthogonal for f , f(v, w) = 0 for all w ∈ β and so v ∈ N(f). On the other hand,

dim V 0(β) = dim V − [dim V +(β) + dim V −(β)] = dim V − rank [f ]ββ = dim N(f) .

• However the spaces V +(β) and V −(β) are not unique. Let us take f ∈ Bil(R^2,R) with matrix (in the standard basis)

[f ]ββ = \begin{pmatrix} 1 & 0 \\ 0 & −1 \end{pmatrix} .

Then f((a, b), (c, d)) = ac − bd. Take v1 = (2, √3) and v2 = (√3, 2). Thus we get

f(v1, v1) = (2)(2) − (√3)(√3) = 1 ,
f(v1, v2) = (2)(√3) − (√3)(2) = 0 ,
f(v2, v2) = (√3)(√3) − (2)(2) = −1 .


7.3 Sesquilinear and Hermitian forms

One important example of a symmetric bilinear form on R^n is

f(v, w) = v1w1 + · · ·+ vnwn .

In this case, √f(v, v) actually defines a good notion of length of vectors on R^n (we will define precisely what this means later). In particular, we have f(v, v) ≥ 0 for all v. On C^n, however, this is not true. If f is the bilinear form from above, then f((i, . . . , i), (i, . . . , i)) < 0. But if we define the form

f(v, w) = v1\bar{w}1 + · · ·+ vn\bar{w}n ,

then it is true. This is not bilinear, but it is sesquilinear.

Definition 7.3.1. Let V be a finite dimensional complex vector space. A function f : V × V → C is called sesquilinear if

1. for each w ∈ V , the function v 7→ f(v, w) is linear and

2. for each v ∈ V , the function w 7→ f(v, w) is anti-linear.

To be anti-linear means that f(v, cw1 + w2) = \bar{c}f(v, w1) + f(v, w2). The sesquilinear form f is called Hermitian if f(v, w) = \overline{f(w, v)}.

• Note that if f is Hermitian, then f(v, v) = \overline{f(v, v)}, so f(v, v) ∈ R.

1. If f(v, v) ≥ 0 (> 0) for all v ≠ 0 then f is positive semi-definite (positive definite).

2. If f(v, v) ≤ 0 (< 0) for all v ≠ 0 then f is negative semi-definite (negative definite).

• If f is a sesquilinear form and β is a basis then there is a matrix for f : as before, if v = a1v1 + · · ·+ anvn and w = b1v1 + · · ·+ bnvn,

f(v, w) = ∑_{i=1}^{n} ai f(vi, w) = ∑_{i=1}^{n} ∑_{j=1}^{n} ai \bar{b}j f(vi, vj) = [v]^t_β [f ]ββ \overline{[w]β} .

• The map w 7→ f(·, w) is a conjugate isomorphism from V to V ∗.

• We have the polarization formula

4f(u, v) = f(u+ v, u+ v) − f(u− v, u− v) + if(u+ iv, u+ iv) − if(u− iv, u− iv) .

From this we deduce that if f(v, v) = 0 for all v then f = 0. (A numerical check is sketched below.)

Theorem 7.3.2 (Sylvester for Hermitian forms). Let f be a Hermitian form on a finite-dimensional complex vector space V . There exists a basis β of V such that [f ]ββ is diagonal with only 0’s, 1’s and −1’s. Furthermore the number of each does not depend on β so long as the matrix is in diagonal form.

Proof. Same proof.


7.4 Exercises

Notation:

1. For all problems below, F is a field of characteristic different from 2, and V is a finite-dimensional F -vector space. We write Bil(V, F ) for the vector space of bilinear forms on V , and Sym(V, F ) for the subspace of symmetric bilinear forms.

2. If B is a bilinear form on V and W ⊂ V is any subspace, we define the restriction of B to W , written B|W ∈ Bil(W,F ), by B|W (w1, w2) = B(w1, w2).

3. We call B ∈ Sym(V, F ) non-degenerate if N(B) = 0.

Exercises

1. Let l ∈ V ∗. Define a symmetric bilinear form B on V by B(v, w) = l(v)l(w). Compute the nullspace of B.

2. Let B be a symmetric bilinear form on V . Suppose that W ⊂ V is a subspace with the property that V = W ⊕ N(B). Show that B|W is non-degenerate.

3. Let B be a symmetric bilinear form on V and char(F) ≠ 2. Suppose that W ⊂ V is a subspace such that B|W is non-degenerate. Show that then V = W ⊕ W⊥. Hint: Use induction on dim(W ).

4. Recall the isomorphism Φ : Bil(V, F ) → L(V, V ∗) given by

Φ(B)(v)(w) = B(v, w), v, w ∈ V.

If β is a basis of V , and β∗ is the dual basis of V ∗, show that

[B]ββ = ([Φ(B)]ββ∗)^T .

5. Let n denote the dimension of V . Let d ∈ Altn(V ) and B ∈ Sym(V, F ) both be non-zero. We are going to show that there exists a constant cd,B ∈ F with the property that for any vectors v1, . . . , vn, w1, . . . , wn ∈ V , the following identity holds:

det( (B(vi, wj))_{i,j=1}^{n} ) = cd,B d(v1, . . . , vn) d(w1, . . . , wn),

by completing the following steps:

(a) Show that for fixed (v1, . . . , vn), there exists a constant cd,B(v1, . . . , vn) ∈ F such that

det( (B(vi, wj))_{i,j=1}^{n} ) = cd,B(v1, . . . , vn) · d(w1, . . . , wn).


(b) We now let (v1, . . . , vn) vary. Show that there exists a constant cd,B ∈ F such that

cd,B(v1, . . . , vn) = cd,B · d(v1, . . . , vn).

Show further that cd,B = 0 precisely when B is degenerate.

6. The orthogonal group. Let B be a non-degenerate symmetric bilinear form on V . Consider

O(B) = {f ∈ L(V, V ) | B(f(v), f(w)) = B(v, w) ∀v, w ∈ V }.

(a) Show that if f ∈ O(B), then det(f) is either 1 or −1.

Hint: Use the previous exercise.

(b) Show that the composition of maps makes O(B) into a group.

(c) Let V = R^2, B((x1, x2), (y1, y2)) = x1y1 + x2y2. Give a formula for the 2 × 2 matrices that belong to O(B).

7. The vector product. Assume that V is 3-dimensional. Let B ∈ Sym(V, F ) be non-degenerate, and d ∈ Alt3(V ) be non-zero.

(a) Show that for any v, w ∈ V there exists a unique vector z ∈ V such that for all vectors x ∈ V the following identity holds: B(z, x) = d(v, w, x).

Hint: Consider the element d(v, w, ·) ∈ V ∗.

(b) We will denote the unique vector z from part (a) by v × w. Show that V × V → V , (v, w) 7→ v × w is bilinear and skew-symmetric.

(c) For f ∈ O(B), show that f(v × w) = det(f) · (f(v)× f(w)).

(d) Show that v × w is B-orthogonal to both v and w.

(e) Show that v × w = 0 precisely when v and w are linearly dependent.

8. Let V be a finite dimensional R-vector space. Recall its complexification VC, defined in the exercises of the last chapter. It is a C-vector space with dimC VC = dimR V . As an R-vector space, it equals V ⊕ V . We have the injection ι : V → VC, v 7→ (v, 0). We also have the complex conjugation map c(v, w) = (v,−w).

(a) Let B be a bilinear form on V . Show that

BC((v, w), (x, y)) := B(v, x) − B(w, y) + iB(v, y) + iB(w, x)

defines a bilinear form on VC. Show that N(BC) = N(B)C. Show that BC is symmetric if and only if B is.

(b) Show that for v, w ∈ VC, we have BC(c(v), c(w)) = \overline{BC(v, w)}. Show conversely that any bilinear form B̃ on VC with the property B̃(c(v), c(w)) = \overline{B̃(v, w)} is equal to BC for some bilinear form B on V .


(c) Let B be a symmetric bilinear form on V . Show that

BH((v, w), (x, y)) := B(v, x) + B(w, y) − iB(v, y) + iB(w, x)

defines a Hermitian form on VC. Show that N(BH) = N(B)C.

(d) Show that for v, w ∈ VC, we have BH(c(v), c(w)) = \overline{BH(v, w)}. Show conversely that any Hermitian form B̃ on VC with the property B̃(c(v), c(w)) = \overline{B̃(v, w)} is equal to BH for some bilinear form B on V .

9. Prove that if V is a finite-dimensional F-vector space with char(F ) ≠ 2 and f is a nonzero skew-symmetric bilinear form (that is, a bilinear form such that f(v, w) = −f(w, v) for all v, w ∈ V ) then there is no basis β for V such that [f ]ββ is upper triangular.

10. For each of the following real matrices A, find an invertible matrix S such that S^t A S is diagonal:

\begin{pmatrix} 2 & 3 & 5 \\ 3 & 7 & 11 \\ 5 & 11 & 13 \end{pmatrix} ,   \begin{pmatrix} 0 & 1 & 2 & 3 \\ 1 & 0 & 1 & 2 \\ 2 & 1 & 0 & 1 \\ 3 & 2 & 1 & 0 \end{pmatrix} .

Also find a complex matrix T such that T^t A T is diagonal with only entries 0 and 1.


8 Inner product spaces

8.1 Definitions

We will be interested in positive definite hermitian forms.

Definition 8.1.1. Let V be a complex vector space. A hermitian form f is called an inner product (or scalar product) if f is positive definite. In this case we call V a (complex) inner product space.

An example is the standard dot product:

〈u, v〉 = u1\bar{v}1 + · · ·+ un\bar{v}n .

It is customary to write an inner product f(u, v) as 〈u, v〉. In addition, we write ‖u‖ = √〈u, u〉. This is the norm induced by the inner product 〈·, ·〉. In fact, (V, d) is a metric space, using

d(u, v) = ‖u− v‖ .

Properties of norm. Let (V, 〈·, ·〉) be a complex inner product space.

1. For all c ∈ C, ‖cu‖ = |c|‖u‖.

2. ‖u‖ = 0 if and only if u = ~0.

3. (Cauchy-Schwarz inequality) For u, v ∈ V ,

|〈u, v〉| ≤ ‖u‖‖v‖ .

Proof. If u or v is ~0 then we are done. Otherwise, set w = u − (〈u, v〉/‖v‖^2) v. Then

0 ≤ 〈w,w〉 = 〈w, u〉 − \overline{(〈u, v〉/‖v‖^2)} 〈w, v〉 .

However

〈w, v〉 = 〈u, v〉 − (〈u, v〉/‖v‖^2) 〈v, v〉 = 0 ,

so

0 ≤ 〈w, u〉 = 〈u, u〉 − (〈u, v〉/‖v‖^2) 〈v, u〉 = 〈u, u〉 − 〈u, v〉\overline{〈u, v〉}/‖v‖^2 ,

and

0 ≤ 〈u, u〉 − |〈u, v〉|^2/‖v‖^2 ⇒ |〈u, v〉|^2 ≤ ‖u‖^2‖v‖^2 .


Everything above is an equality, so we have equality if and only if w = ~0; that is, if and only if v and u are linearly dependent.

4. (Triangle inequality) For u, v ∈ V ,

‖u+ v‖ ≤ ‖u‖+ ‖v‖ .

This is also written ‖u− v‖ ≤ ‖u− w‖+ ‖w − v‖.

Proof.

‖u+ v‖^2 = 〈u+ v, u+ v〉 = 〈u, u〉+ 〈u, v〉+ \overline{〈u, v〉}+ 〈v, v〉
= 〈u, u〉+ 2 Re 〈u, v〉+ 〈v, v〉
≤ ‖u‖^2 + 2 |〈u, v〉|+ ‖v‖^2
≤ ‖u‖^2 + 2‖u‖‖v‖+ ‖v‖^2 = (‖u‖+ ‖v‖)^2 .

Taking square roots gives the result.

8.2 Orthogonality

Definition 8.2.1. Given a complex inner product space (V, 〈·, ·〉) we say that vectors u, v ∈ Vare orthogonal if 〈u, v〉 = 0.

Theorem 8.2.2. Let v1, . . . , vk be nonzero and pairwise orthogonal in a complex inner product space. Then they are linearly independent.

Proof. Suppose that

a1v1 + · · ·+ akvk = ~0 .

Then we take the inner product with vi:

0 = 〈~0, vi〉 = ∑_{j=1}^{k} aj〈vj, vi〉 = ai‖vi‖^2 .

Therefore ai = 0.

We begin with a method to transform a linearly independent set into an orthonormal set.

Theorem 8.2.3 (Gram-Schmidt). Let V be a complex inner product space and v1, . . . , vk ∈ V which are linearly independent. There exist u1, . . . , uk such that

1. {u1, . . . , uk} is orthonormal and

2. for all j = 1, . . . , k, Span({u1, . . . , uj}) = Span({v1, . . . , vj}).


Proof. We will prove by induction. If k = 1, we must have v1 ≠ ~0, so set u1 = v1/‖v1‖. This gives ‖u1‖ = 1 so that {u1} is orthonormal and certainly the second condition holds.

If k ≥ 2 then assume the statement holds for k − 1. Find vectors u1, . . . , uk−1 as in the statement. Now to define uk we set

wk = vk − [〈vk, u1〉u1 + · · ·+ 〈vk, uk−1〉uk−1] .

We claim that wk is orthogonal to all uj’s and is not zero. To check the first, let 1 ≤ j ≤ k − 1 and compute

〈wk, uj〉 = 〈vk, uj〉 − [〈vk, u1〉〈u1, uj〉+ · · ·+ 〈vk, uk−1〉〈uk−1, uj〉] = 〈vk, uj〉 − 〈vk, uj〉〈uj, uj〉 = 0 .

Second, if wk were zero then we would have

vk ∈ Span({u1, . . . , uk−1}) = Span({v1, . . . , vk−1}) ,

a contradiction to linear independence. Therefore we set uk = wk/‖wk‖ and we see that {u1, . . . , uk} is orthonormal and therefore linearly independent.

Furthermore note that by induction,

Span({u1, . . . , uk}) ⊆ Span({u1, . . . , uk−1, vk}) ⊆ Span({v1, . . . , vk}) .

Since the spaces on the left and right have the same dimension they are equal.
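The proof is really an algorithm; here is a direct transcription of it as a sketch in numpy, for the standard Hermitian inner product (the function name is our own choice):

    # Sketch: Gram-Schmidt for the standard Hermitian inner product on C^n.
    import numpy as np

    def gram_schmidt(vectors):
        # Orthonormalize a list of linearly independent vectors.
        ortho = []
        for v in vectors:
            w = v.astype(complex)
            for u in ortho:
                w = w - np.vdot(u, v) * u        # subtract the projection <v, u> u
            ortho.append(w / np.linalg.norm(w))
        return ortho

    us = gram_schmidt([np.array([1., 1., 0.]), np.array([1., 0., 1.])])
    print(np.round(np.vdot(us[0], us[1]), 12), np.linalg.norm(us[0]))  # 0 and 1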

Corollary 8.2.4. If V is a finite-dimensional inner product space then V has an orthonormal basis.

What do vectors look like represented in an orthonormal basis? Let β = {v1, . . . , vn} be an orthonormal basis and let v ∈ V . Then

v = a1v1 + · · ·+ anvn .

Taking the inner product with vj on both sides gives aj = 〈v, vj〉, so

v = 〈v, v1〉v1 + · · ·+ 〈v, vn〉vn .

Thus we can view in this case (orthonormal) the number 〈v, vi〉 as the projection of v onto vi. We can then find the norm of v easily:

‖v‖^2 = 〈v, v〉 = 〈v, ∑_{i=1}^{n} 〈v, vi〉vi〉 = ∑_{i=1}^{n} \overline{〈v, vi〉}〈v, vi〉 = ∑_{i=1}^{n} |〈v, vi〉|^2 .

This is known as Parseval’s identity.


Definition 8.2.5. If V is an inner product space and W is a subspace of V we define the orthogonal complement of W as

W⊥ = {v ∈ V : 〈v, w〉 = 0 for all w ∈ W} .

• {~0}⊥ = V and V ⊥ = {~0}.

• If S ⊆ V then S⊥ is always a subspace of V (even if S was not). Furthermore,

S⊥ = (Span S)⊥ and (S⊥)⊥ = Span S .

Theorem 8.2.6. Let V be a finite-dimensional inner product space with W a subspace. Then

V = W ⊕W⊥ .

Proof. Let {w1, . . . , wk} be a basis for W and extend it to a basis {w1, . . . , wn} for V . Then perform Gram-Schmidt to get an orthonormal basis {v1, . . . , vn} such that

Span({v1, . . . , vj}) = Span({w1, . . . , wj}) for all j = 1, . . . , n .

In particular, {v1, . . . , vk} is an orthonormal basis for W . We claim that {vk+1, . . . , vn} is a basis for W⊥. To see this, define W̃ to be the span of these vectors. Clearly W̃ ⊆ W⊥. On the other hand,

W ∩W⊥ = {w ∈ W : 〈w,w′〉 = 0 for all w′ ∈ W} ⊆ {w ∈ W : 〈w,w〉 = 0} = {~0} .

This means that dim W + dim W⊥ ≤ n, or dim W⊥ ≤ n − k. Since dim W̃ = n − k and W̃ ⊆ W⊥, we see they are equal.

This leads us to a definition.

Definition 8.2.7. Let V be a finite dimensional inner product space. If W is a subspace of V we write PW : V → V for the operator

PW (v) = w1 ,

where v is written uniquely as w1 + w2 for w1 ∈ W and w2 ∈ W⊥. PW is called the orthogonal projection onto W .

Properties of orthogonal projection.

1. PW is linear.

2. P^2_W = PW .

3. PW⊥ = I − PW .


4. For all v1, v2 ∈ V ,

〈PW (v1), v2〉 = 〈PW (v1), PW (v2)〉+ 〈PW (v1), PW⊥(v2)〉
= 〈PW (v1), PW (v2)〉+ 〈PW⊥(v1), PW (v2)〉 = 〈v1, PW (v2)〉 .

Alternatively one may define an orthogonal projection as a linear map with properties 2 and 4. That is, if T : V → V is linear with T^2 = T and 〈T (v), w〉 = 〈v, T (w)〉 for all v, w ∈ V then (check this)

• V = R(T )⊕N(T ) where N(T ) = (R(T ))⊥ and

• T = PR(T ).

Example. Orthogonal projection onto a 1-dimensional subspace. What we saw in the proof of V = W ⊕ W⊥ is the following. If W is a subspace of a finite dimensional inner product space, there exists an orthonormal basis of V of the form β = β1 ∪ β2, where β1 is an orthonormal basis of W and β2 is an orthonormal basis of W⊥.

Choose an orthonormal basis {v1, . . . , vn} of V so that v1 spans W and the other vectors span W⊥. For any v ∈ V ,

v = 〈v, v1〉v1 + 〈v, v2〉v2 + · · ·+ 〈v, vn〉vn ,

which is a representation of v in terms of W and W⊥. Thus PW (v) = 〈v, v1〉v1.

For any vector v′ we can define the orthogonal projection onto v′ as PW , where W = Span{v′}. Then we choose w′ = v′/‖v′‖ as our first vector in the orthonormal basis and

Pv′(v) = PW (v) = 〈v, w′〉w′ = (〈v, v′〉/‖v′‖^2) v′ .

Theorem 8.2.8. If V is a finite dimensional inner product space with W a subspace of V , for each v ∈ V , PW (v) is the closest vector in W to v (using the distance coming from ‖ · ‖). That is, for all w ∈ W ,

‖v − PW (v)‖ ≤ ‖v − w‖ .

Proof. First we note that if w ∈ W and w′ ∈ W⊥ then the Pythagoras theorem holds:

‖w + w′‖^2 = 〈w + w′, w + w′〉 = 〈w,w〉+ 〈w′, w′〉 = ‖w‖^2 + ‖w′‖^2 .

We now take v ∈ V and w ∈ W and write v − w = PW (v) + PW⊥(v) − w = (PW (v) − w) + PW⊥(v). Applying Pythagoras,

‖v − w‖^2 = ‖PW (v)− w‖^2 + ‖PW⊥(v)‖^2 ≥ ‖PW⊥(v)‖^2 = ‖PW (v)− v‖^2 .


8.3 Adjoints

Theorem 8.3.1. Let V be a finite-dimensional inner product space. If T : V → V is linear then there exists a unique linear transformation T ∗ : V → V such that for all v, u ∈ V ,

〈T (v), u〉 = 〈v, T ∗(u)〉 . (5)

We call T ∗ the adjoint of T .

Proof. We will use the Riesz representation theorem.

Lemma 8.3.2 (Riesz). Let V be a finite-dimensional inner product space. For each f ∈ V ∗ there exists a unique z ∈ V such that

f(v) = 〈v, z〉 for all v ∈ V .

Given this we will define T ∗ as follows. For u ∈ V we define the linear functional

fu,T : V → C by fu,T (v) = 〈T (v), u〉 .

You can check this is indeed a linear functional. By Riesz, there exists a unique z ∈ V suchthat

fu,T (v) = 〈v, z〉 for all v ∈ V .

We define this z to be T ∗(u). In other words, for a given u ∈ V , T ∗(u) is the unique vector in V with the property

〈T (v), u〉 = fu,T (v) = 〈v, T ∗(u)〉 for all v ∈ V .

Because of this identity, we see that there exists a function T ∗ : V → V such that for all u, v ∈ V , (5) holds. In other words, given T , we have a way of mapping a vector u ∈ V to another vector which we call T ∗(u). We need to know that it is unique and linear.

Suppose that R : V → V is another function such that for all u, v ∈ V ,

〈T (v), u〉 = 〈v,R(u)〉 .

Then we see that

〈v, T ∗(u)−R(u)〉 = 〈v, T ∗(u)〉 − 〈v,R(u)〉 = 〈T (v), u〉 − 〈v,R(u)〉 = 0

for all u, v ∈ V . Choosing v = T ∗(u)−R(u) gives that

‖T ∗(u)−R(u)‖ = 0 ,

or T ∗(u) = R(u) for all u. This means T ∗ = R. To show linearity, let c ∈ C and u1, u2, v ∈ V . Then

〈T (v), cu1 + u2〉 = \bar{c}〈T (v), u1〉+ 〈T (v), u2〉 = \bar{c}〈v, T ∗(u1)〉+ 〈v, T ∗(u2)〉 = 〈v, cT ∗(u1) + T ∗(u2)〉 .


This means that

〈v, T ∗(cu1 + u2)− cT ∗(u1)− T ∗(u2)〉 = 0

for all v ∈ V . Choosing v = T ∗(cu1 + u2)− cT ∗(u1)− T ∗(u2) gives that

T ∗(cu1 + u2) = cT ∗(u1) + T ∗(u2) ,

or T ∗ is linear.

Properties of adjoint.

1. T ∗ : V → V is linear. To see this, if w1, w2 ∈ V and c ∈ F then for all v,

〈T (v), cw1 + w2〉 = \bar{c}〈T (v), w1〉+ 〈T (v), w2〉 = \bar{c}〈v, T ∗(w1)〉+ 〈v, T ∗(w2)〉 = 〈v, cT ∗(w1) + T ∗(w2)〉 .

By uniqueness, T ∗(cw1 + w2) = cT ∗(w1) + T ∗(w2).

2. If β is an orthonormal basis of V then [T ∗]ββ = \overline{([T ]ββ)^t}, the conjugate transpose. (A numerical check appears after this list.)

Proof. If β is an orthonormal basis, remembering that 〈·, ·〉 is a sesquilinear form, the matrix of this form in the basis β is simply the identity. Therefore

〈T (v), w〉 = [T (v)]^t_β \overline{[w]β} = ([T ]ββ [v]β)^t \overline{[w]β} = [v]^t_β ([T ]ββ)^t \overline{[w]β} .

On the other hand, for all v, w,

〈v, T ∗(w)〉 = [v]^t_β \overline{[T ∗(w)]β} = [v]^t_β \overline{[T ∗]ββ} \overline{[w]β} .

Therefore

[v]^t_β \overline{[T ∗]ββ} \overline{[w]β} = [v]^t_β ([T ]ββ)^t \overline{[w]β} .

Choosing v = vi and w = vj tells us that all entries of the two matrices are equal, so \overline{[T ∗]ββ} = ([T ]ββ)^t, which is the claim.

3. (T + S)∗ = T ∗ + S∗.

Proof. If v, w ∈ V ,

〈(T + S)(v), w〉 = 〈T (v), w〉+ 〈S(v), w〉 = 〈v, T ∗(w)〉+ 〈v, S∗(w)〉 .

This equals 〈v, (T ∗ + S∗)(w)〉.

4. (cT )∗ = \bar{c}T ∗. This is similar.


5. (TS)∗ = S∗T ∗.

Proof. For all v, w ∈ V ,

〈(TS)(v), w〉 = 〈T (S(v)), w〉 = 〈S(v), T ∗(w)〉 = 〈v, S∗(T ∗(w))〉 .

This is 〈v, (S∗T ∗)(w)〉.

6. (T ∗)∗ = T .

Proof. If v, w ∈ V ,

〈T ∗(v), w〉 = \overline{〈w, T ∗(v)〉} = \overline{〈T (w), v〉} = 〈v, T (w)〉 .
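Property 2 in particular is easy to check numerically: in an orthonormal basis the adjoint is the conjugate transpose. A sketch with a random matrix and random vectors:

    # Sketch: <T(v), w> = <v, T*(w)> with T* the conjugate transpose.
    import numpy as np

    rng = np.random.default_rng(2)
    T = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
    Tstar = T.conj().T

    v = rng.standard_normal(3) + 1j * rng.standard_normal(3)
    w = rng.standard_normal(3) + 1j * rng.standard_normal(3)

    lhs = np.vdot(w, T @ v)        # <T(v), w> for the standard inner product
    rhs = np.vdot(Tstar @ w, v)    # <v, T*(w)>
    assert np.isclose(lhs, rhs)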

8.4 Spectral theory of self-adjoint operators

Definition 8.4.1. If V is an inner product space and T : V → V is linear we say that T is

1. self-adjoint if T = T ∗;

2. skew-adjoint if T = −T ∗;

3. unitary if T is invertible and T−1 = T ∗;

4. normal if TT ∗ = T ∗T .

Note all the above operators are normal. Also orthogonal projections are self-adjoint. Draw relation to complex numbers (self-adjoint is real, skew-adjoint is purely imaginary, unitary is the unit circle).

Theorem 8.4.2. Let V be an inner product space and T : V → V linear with λ an eigenvalue of T .

1. If T is self-adjoint then λ is real.

2. If T is skew-adjoint then λ is purely imaginary.

3. If T is unitary then |λ| = 1 and |det T | = 1.

Proof. Let T : V → V be linear and λ an eigenvalue with eigenvector v.

1. Suppose T = T ∗. Then

λ‖v‖^2 = 〈T (v), v〉 = 〈v, T (v)〉 = \bar{λ}‖v‖^2 .

But v ≠ 0 so λ = \bar{λ}.


2. Suppose that T = −T ∗. Define S = iT . Then S∗ = (iT )∗ = −iT ∗ = −i(−T ) = iT = S, so S is self-adjoint. Now iλ is an eigenvalue of S:

S(v) = (iT )(v) = iλv .

This means iλ is real, or λ is purely imaginary.

3. Suppose T ∗ = T−1. Then

|λ|^2‖v‖^2 = 〈T (v), T (v)〉 = 〈v, T ∗T (v)〉 = ‖v‖^2 .

This means |λ| = 1. Furthermore, det T is the product of the eigenvalues, so |det T | = 1.

What do these operators look like? If β is an orthonormal basis for V then:

1. If T = T ∗ then [T ]ββ = \overline{([T ]ββ)^t}.

2. If T = −T ∗ then [T ]ββ = −\overline{([T ]ββ)^t}.

Lemma 8.4.3. Let V be a finite-dimensional inner product space. If T : V → V is linear, the following are equivalent.

1. T is unitary.

2. For all v ∈ V , ‖T (v)‖ = ‖v‖.

3. For all v, w ∈ V , 〈T (v), T (w)〉 = 〈v, w〉.

Proof. If T is unitary then 〈T (v), T (v)〉 = 〈v, T−1T (v)〉 = 〈v, v〉. This shows 1 implies 2. If T preserves norm then it also preserves the inner product by the polarization identity. This proves 2 implies 3. To see that 3 implies 1, we take v, w ∈ V and see

〈v, w〉 = 〈T (v), T (w)〉 = 〈v, T ∗T (w)〉 .

This implies that 〈v, w − T ∗T (w)〉 = 0 for all v. Taking v = w − T ∗T (w) gives that T ∗T (w) = w. Thus T must be invertible and T ∗ = T−1.

• Furthermore, T is unitary if and only if T maps orthonormal bases to orthonormalbases. In particular, [T ]ββ has orthonormal columns whenever β is orthonormal.

• For β an orthonormal basis, the unitary operators are exactly those whose matricesrelative to β have orthonormal columns.

We begin with a definition.


Definition 8.4.4. If V is a finite-dimensional inner product space and T : V → V is linear, we say that T is unitarily diagonalizable if there exists an orthonormal basis β of V such that [T ]ββ is diagonal.

Note that T is unitarily diagonalizable if and only if there exists a unitary operator U such that

U−1TU is diagonal .

Theorem 8.4.5 (Spectral theorem). Let V be a finite-dimensional inner product space. If T : V → V is self-adjoint then T is unitarily diagonalizable.

Proof. We will use induction on dim V = n. If n = 1 just choose a vector of norm 1. Otherwise suppose the statement is true for dimensions less than k and we will show it for k ≥ 2. Since T has an eigenvalue λ, it has an eigenvector v1. Choose v1 with norm 1.

Let Uλ = T − λI. We claim that

V = N(Uλ)⊕R(Uλ) .

To show this we need only prove that R(Uλ) = N(Uλ)⊥. This will follow from a lemma:

Lemma 8.4.6. Let V be a finite-dimensional inner product space and U : V → V linear. Then

R(U) = N(U∗)⊥ .

Proof. If w ∈ R(U) then let z ∈ N(U∗). There exists v ∈ V such that U(v) = w. Therefore

〈w, z〉 = 〈U(v), z〉 = 〈v, U∗(z)〉 = 0 .

Therefore R(U) ⊆ N(U∗)⊥. For the other containment, note that dim R(U) = dim R(U∗) (since the matrix of U∗ in an orthonormal basis is just the conjugate transpose of that of U). Therefore

dim R(U) = dim R(U∗) = dim V − dim N(U∗) = dim N(U∗)⊥ .

Now we apply the lemma. Note that since T is self-adjoint,

U∗λ = (T − λI)∗ = T ∗ − \bar{λ}I = T − λI = Uλ ,

since λ ∈ R. Thus using the lemma with Uλ,

V = N(U∗λ)⊕N(U∗λ)⊥ = N(Uλ)⊕R(Uλ) .

Note that these are T -invariant spaces and dim R(Uλ) < k since λ is an eigenvalue of T . Thus by induction there is an orthonormal basis β′ of R(Uλ) such that T restricted to this space is diagonal. On N(Uλ), T acts as λI, so any orthonormal basis β′′ of N(Uλ) consists of eigenvectors. Taking

β = β′ ∪ β′′

gives an orthonormal basis such that [T ]ββ is diagonal.
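In coordinates, the spectral theorem is what numpy.linalg.eigh computes; a sketch for a small Hermitian matrix (the matrix is an arbitrary example):

    # Sketch: unitary diagonalization of a Hermitian matrix.
    import numpy as np

    A = np.array([[2., 1j],
                  [-1j, 3.]])
    assert np.allclose(A, A.conj().T)              # A is self-adjoint

    w, U = np.linalg.eigh(A)                       # A = U diag(w) U^*
    assert np.allclose(U.conj().T @ U, np.eye(2))  # U is unitary
    assert np.allclose(U @ np.diag(w) @ U.conj().T, A)
    print(w)                                       # real eigenvalues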


• Note that if T is skew-adjoint, iT is self-adjoint, so we can find an orthonormal basis β such that [iT ]ββ is diagonal. This implies that T itself can be diagonalized by β: its matrix is just −i[iT ]ββ.

8.5 Normal and commuting operators

Lemma 8.5.1. Let U, T : V → V be linear and F be algebraically closed. Write E^T_{λ1}, . . . , E^T_{λk} for the generalized eigenspaces for T . If T and U commute then

V = E^T_{λ1} ⊕ · · · ⊕ E^T_{λk}

is both a T -invariant direct sum and a U -invariant direct sum.

Proof. We need only show that the generalized eigenspaces of T are U -invariant. If v ∈ N(T − λiI)^m then

(T − λiI)^m (U(v)) = U (T − λiI)^m v = ~0 .

Theorem 8.5.2. Let U, T : V → V be linear and F algebraically closed. Suppose that Tand U commute. Then

1. If T and U are diagonalizable then there exists a basis β such that both [T ]ββ and [U ]ββare diagonal.

2. If V is an inner product space and T and U are self-adjoint then we can choose β tobe orthonormal.

Proof. Suppose that T and U are diagonalizable. Then the generalized eigenspaces of T are simply the eigenspaces, so

V = E^T_{λ1} ⊕ · · · ⊕ E^T_{λk} .

For each j, choose a Jordan basis βj for U on E^T_{λj}. Set β = ∪_{j=1}^{k} βj. These are all eigenvectors for T so [T ]ββ is diagonal. Further, [U ]ββ is in Jordan form. But since U is diagonalizable, its Jordan form is diagonal. By uniqueness, [U ]ββ is diagonal.

If T and U are self-adjoint, the decomposition

V = E^T_{λ1} ⊕ · · · ⊕ E^T_{λk}

is orthogonal. For each j, choose an orthonormal basis βj of E^T_{λj} of eigenvectors for U (since U is self-adjoint on E^T_{λj}). Now β = ∪_{j=1}^{k} βj is an orthonormal basis of eigenvectors for both T and U .

Theorem 8.5.3. Let V be a finite-dimensional inner product space. If T : V → V is linear then T is normal if and only if T is unitarily diagonalizable.


Definition 8.5.4. If V is an inner product space and T : V → V is linear, we write

T = (1/2)(T + T ∗) + (1/2)(T − T ∗) = T1 + T2

and call these operators the self-adjoint part and the skew-adjoint part of T , respectively.

Of course each part of a linear transformation can be unitarily diagonalized on its own. We now need to diagonalize them simultaneously.

Proof. If T is unitarily diagonalizable, then, taking a unitary U such that T = U−1DU for a diagonal D, we get

TT ∗ = U−1DU(U−1DU)∗ = U∗DUU∗D∗U = U∗DD∗U = U∗D∗DU = T ∗T

(using U−1 = U∗ and the fact that diagonal matrices commute), so T is normal.

Suppose conversely that T is normal. Then T1 and T2 commute: since TT ∗ = T ∗T , expanding T1T2 and T2T1 gives equality. Note that T1 is self-adjoint and iT2 is also. They commute, so by Theorem 8.5.2 we can find an orthonormal basis β such that [T1]ββ and [iT2]ββ are diagonal. Now

[T ]ββ = [T1]ββ − i[iT2]ββ

is diagonal.
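For intuition, a small scipy sketch (ours): a plane rotation is normal but not self-adjoint, and its complex Schur form, which is always attained by a unitary change of basis, comes out diagonal, exactly as the theorem predicts.

    import numpy as np
    from scipy.linalg import schur

    t = 0.7
    A = np.array([[np.cos(t), -np.sin(t)],
                  [np.sin(t),  np.cos(t)]])       # normal (here A A* = A* A = I), not symmetric
    assert np.allclose(A @ A.conj().T, A.conj().T @ A)

    Tri, Z = schur(A.astype(complex), output='complex')   # Z unitary, Z* A Z = Tri triangular
    assert np.allclose(Tri, np.diag(np.diag(Tri)))        # the triangular part vanishes
    print(np.diag(Tri))                                   # eigenvalues e^{it}, e^{-it}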

8.6 Exercises

Notation

1. If V is a vector space over R and 〈·, ·〉 : V × V → R is a positive-definite symmetric bilinear form, then we call 〈·, ·〉 a (real) inner product. The pair (V, 〈·, ·〉) is called a (real) inner product space.

2. If (V, 〈·, ·〉) is a real inner product space and S is a subset of V we say that S is orthogonal if 〈v, w〉 = 0 whenever v, w ∈ S are distinct. We say S is orthonormal if S is orthogonal and 〈v, v〉 = 1 for all v ∈ S.

3. If f is a symmetric bilinear form on a vector space V , the orthogonal group of f is the set

O(f) = {T : V → V | f(T (u), T (v)) = f(u, v) for all u, v ∈ V } .

Exercises

1. Let V be a complex inner product space. Let T ∈ L(V, V ) be such that T ∗ = −T . We call such T skew-self-adjoint. Show that the eigenvalues of T are purely imaginary. Show further that V is the orthogonal direct sum of the eigenspaces of T . In other words, V is a direct sum of the eigenspaces and 〈v, w〉 = 0 if v and w are in distinct eigenspaces.

Hint: Construct from T a suitable self-adjoint operator and apply the known results from the lecture to that operator.


2. Let V be a complex inner product space, and T ∈ L(V, V ).

(a) Show that T is unitary if and only if it maps orthonormal bases to orthonormal bases.

(b) Let β be an orthonormal basis of V . Show that T is unitary if and only if the columns of the matrix [T ]ββ form a set of orthonormal vectors in C^n with respect to the standard Hermitian form (standard dot product).
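A minimal numpy check of (b) (ours; the unitary comes from a QR factorization):

    import numpy as np

    rng = np.random.default_rng(3)
    M = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
    Q, _ = np.linalg.qr(M)                    # Q is unitary
    # the columns of Q are orthonormal for the standard Hermitian form on C^4
    assert np.allclose(Q.conj().T @ Q, np.eye(4))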

3. Let (V, 〈·, ·〉) be a complex inner product space, and η be a Hermitian form on V (in addition to 〈·, ·〉). Show that there exists an orthonormal basis β of V such that [η]ββ is diagonal, by completing the following steps:

(a) Show that for each w ∈ V , there exists a unique vector, which we call Aw, in V with the property that for all v ∈ V ,

η(v, w) = 〈v, Aw〉.

(b) Show that the map A : V → V , which sends a vector w ∈ V to the vector Aw just defined, is linear and self-adjoint.

(c) Use the spectral theorem for self-adjoint operators to complete the problem.

4. Let (V, 〈·, ·〉) be a real inner product space.

(a) Define ‖ · ‖ : V → R by

‖v‖ = √〈v, v〉 .

Show that for all v, w ∈ V ,

|〈v, w〉| ≤ ‖v‖‖w‖ .

(b) Show that ‖ · ‖ is a norm on V .

(c) Show that there exists an orthonormal basis β of V .
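For part (c), a short numpy sketch of the Gram–Schmidt process (our own implementation; this is one standard route to an orthonormal basis):

    import numpy as np

    def gram_schmidt(vectors):
        basis = []
        for v in vectors:
            u = v.astype(float)
            for e in basis:
                u = u - np.dot(u, e) * e       # remove the component along e
            basis.append(u / np.linalg.norm(u))
        return basis

    rng = np.random.default_rng(4)
    vecs = list(rng.standard_normal((3, 3)))   # three rows: almost surely a basis of R^3
    E = np.array(gram_schmidt(vecs))
    assert np.allclose(E @ E.T, np.eye(3))     # the rows are now orthonormal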

5. Let (V, 〈·, ·〉) be a real inner product space and T : V → V be linear.

(a) Prove that for each f ∈ V ∗ there exists a unique z ∈ V such that for all v ∈ V ,

f(v) = 〈v, z〉 .

(b) For each u ∈ V define fu,T : V → R by

fu,T (v) = 〈T (v), u〉 .

Prove that fu,T ∈ V ∗. Define T t(u) to be the unique z ∈ V (from part (a), applied to fu,T ) such that for all v ∈ V ,

〈T (v), u〉 = 〈v, T t(u)〉 ,

and show that T t is linear.


(c) Show that if β is an orthonormal basis for V then

[T t]ββ = ([T ]ββ)t .
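A quick numpy check of (c) (ours), in the standard orthonormal basis of R^4, where the claim says T t is represented by the transposed matrix:

    import numpy as np

    rng = np.random.default_rng(5)
    M = rng.standard_normal((4, 4))            # [T]ββ in the standard basis
    v, u = rng.standard_normal(4), rng.standard_normal(4)
    # ⟨T(v), u⟩ = ⟨v, T^t(u)⟩ with [T^t]ββ = ([T]ββ)^t
    assert np.allclose(np.dot(M @ v, u), np.dot(v, M.T @ u))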

6. Let (V, 〈·, ·〉) be a real inner product space and define the complexification of 〈·, ·〉 as in homework 11 by

〈(v, w), (x, y)〉C = 〈v, x〉+ 〈w, y〉 − i〈v, y〉+ i〈w, x〉 .

(a) Show that 〈·, ·〉C is an inner product on VC. (A numerical check appears after this exercise.)

(b) Let T : V → V be linear.

i. Prove that (TC)∗ = (T t)C.

ii. If T t = T then we say that T is symmetric. Show in this case that TC is Hermitian.

iii. If T t = −T then we say T is anti-symmetric. Show in this case that TC is skew-adjoint.

iv. If T is invertible and T t = T−1 then we say that T is orthogonal. Show in this case that TC is unitary. Show that this is equivalent to

T ∈ O(〈·, ·〉) ,

where O(〈·, ·〉) is the orthogonal group for 〈·, ·〉.
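As promised, a numerical check for part (a) (ours): identifying (v, w) ∈ VC with v + iw ∈ C^3, the displayed formula agrees with the standard Hermitian inner product, which is conjugate-linear in the second slot.

    import numpy as np

    rng = np.random.default_rng(6)
    v, w, x, y = (rng.standard_normal(3) for _ in range(4))
    lhs = v @ x + w @ y - 1j * (v @ y) + 1j * (w @ x)   # the complexified form above
    rhs = np.vdot(x + 1j * y, v + 1j * w)               # np.vdot conjugates its first argument
    assert np.allclose(lhs, rhs)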

7. Let (V, 〈·, ·〉) be a real inner product space and T : V → V be linear.

(a) Suppose that TT t = T tT . Show then that TC is normal. In this case, we can find a basis β of VC such that β is orthonormal (with respect to 〈·, ·〉C) and [TC]ββ is diagonal. Define the subspaces of V

X1, . . . , Xr, Y1, . . . , Y2m

as in problem 1, question 3. Show that these are mutually orthogonal; that is, if v, w are in different subspaces then 〈v, w〉 = 0.

(b) If T is symmetric then show that there exists an orthonormal basis β of V such that [T ]ββ is diagonal.

(c) If T is skew-symmetric, what is the form of the matrix of T in real Jordan form?

(d) A ∈ M2×2(R) is called a rotation matrix if there exists θ ∈ [0, 2π) such that

A = ( cos θ   sin θ )
    ( −sin θ  cos θ ) .

If T is orthogonal, show that there exists a basis β of V such that [T ]ββ is block diagonal, and the blocks are either 2 × 2 rotation matrices or 1 × 1 matrices consisting of 1 or −1.

Hint. Use the real Jordan form.
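For part (d), a scipy sketch (ours; it takes the real Schur route rather than the real Jordan form the hint suggests, which is legitimate here because an orthogonal matrix is normal): the real Schur form of a random orthogonal matrix comes out block diagonal with 2 × 2 rotation blocks and ±1 entries.

    import numpy as np
    from scipy.linalg import schur

    rng = np.random.default_rng(7)
    Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))   # a random 5x5 orthogonal matrix
    T, Z = schur(Q, output='real')                     # Z orthogonal, Z^t Q Z = T
    # for an orthogonal (hence normal) matrix, T is block diagonal up to roundoff:
    # 2x2 blocks of the form ((c, s), (-s, c)) with c^2 + s^2 = 1, and 1x1 blocks 1 or -1
    print(np.round(T, 3))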
