
Math 235.9 Spring 2015 Course Notes

Andrew J. Havens

April 15, 2015

1 Systems of Real Linear Equations

Let’s consider two geometric problems:

(1) Find the intersection point, if it exists, for the pair of lines whose equations in “standard form” are given as
\[
2x + 4y = 6\,, \qquad x - y = 0\,.
\]

More generally, can we solve the two dimensional linear system:

\[
ax + by = e\,, \qquad cx + dy = f\,,
\]
provided a solution exists? Can we develop criteria to understand when there is a unique solution, or multiple solutions, or no solution at all?

(2) Consider the vectors
\[
\begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad \begin{pmatrix} 2 \\ -1 \end{pmatrix}.
\]

We can depict these as arrows in the plane as follows:

Figure 1: The two vectors above depicted as “geometric arrows” in the Cartesian coordinate plane.

Imagine that we can only take “steps” corresponding to these vectors, i.e. we can only move parallel to these vectors, and a valid move consists of adding one of these two vectors to our position to obtain our next position. Can we make it from the origin O = (0, 0) to the point (6, 0)?


We will see that these two kinds of problems are actually related more closely than they would initially appear (though the second has a restriction that the first does not require, namely we seek an integer solution; nonetheless, there is an underlying algebraic formalism which allows us to consider this problem one of linear algebra).

First, we solve problem (1). There are many ways to solve the given numerical problem. Among them: solving one equation for either x or y (the second is ripe for this) and substituting the result into the other equation, writing both equations in slope-intercept form and setting them equal (this is clearly equivalent to the substitution described), or eliminating variables by multiplying the equations by suitable constants and respectively adding the resulting left and right hand sides to obtain a single variable equation:
\[
\begin{cases} 2x + 4y = 6 \\ x - y = 0 \end{cases}
\longleftrightarrow
\begin{cases} x + 2y = 3 \\ 2x - 2y = 0 \end{cases}
\implies 3x = 3\,.
\]
From this we see that x = 1, and substituting into the second of the two original equations, we see that y = 1 as well.

Figure 2: The two lines plotted in the Cartesian coordinate plane.

The motivation to use these manipulations will become more clear when we see higher-dimensional linear systems (more variables and more equations motivates a systematic approach, which we will develop in subsequent lectures). One often notates this kind of problem and the manipulations involved by writing down only the coefficients and constants in what is called an augmented matrix:
\[
\left[\begin{array}{cc|c} 2 & 4 & 6 \\ 1 & -1 & 0 \end{array}\right].
\]
The square portion of the matrix is the coefficient matrix, and the final column contains the constants from the standard forms of our linear equations. This notation generalizes nicely when encoding large systems of linear equations in many unknowns. Let us describe what the manipulations of the equations correspond to in this matrix notation:

(i) A row may be scaled by a nonzero number since equations may be multiplied/divided on left and right sides by a nonzero number,


(ii) A nonzero multiple of a row may be added to another row, and the sum may replace that row, since we can recombine equations by addition as above.

(iii) Two rows may be swapped, since the order in which equations are written down does not determine or affect their solutions.

The above are known as elementary row operations. Note that for constants p, q ∈ R an augmented matrix of the form
\[
\left[\begin{array}{cc|c} 1 & 0 & p \\ 0 & 1 & q \end{array}\right]
\]
corresponds to a solution x = p, y = q. Further, note that we can combine operations (i) and (ii) into a more general and powerful row operation: we may replace a row by any nontrivial linear combination of that row and other rows, i.e. we may take a nonzero multiple of a row and add multiples of other rows, and replace the original row with this sum.
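To make the effect of these operations concrete, here is a minimal computational sketch (not part of the original notes; it assumes NumPy is available) that applies a sequence of elementary row operations to the augmented matrix above and recovers the solution x = 1, y = 1 found earlier:

    import numpy as np

    # Augmented matrix for the system 2x + 4y = 6, x - y = 0.
    M = np.array([[2.0,  4.0, 6.0],
                  [1.0, -1.0, 0.0]])

    M[0] = M[0] / 2.0         # (1/2)R1 -> R1 : scale a row by a nonzero number
    M[1] = M[1] - M[0]        # R2 - R1 -> R2 : add a multiple of one row to another
    M[1] = M[1] / (-3.0)      # (-1/3)R2 -> R2
    M[0] = M[0] - 2.0 * M[1]  # R1 - 2R2 -> R1 : eliminate above the second pivot

    print(M)  # [[1. 0. 1.], [0. 1. 1.]], i.e. x = 1 and y = 1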

Let us apply row operations to attempt to solve the abstract system
\[
\begin{cases} ax + by = e \\ cx + dy = f \end{cases}
\longleftrightarrow
\left[\begin{array}{cc|c} a & b & e \\ c & d & f \end{array}\right].
\]
We assume temporarily that a ≠ 0. We will discuss this assumption in more depth later. Since our goal is to make the coefficient matrix have ones along the diagonal from left top to right bottom, and zeros elsewhere, we work to first zero out the bottom left entry. This can be done, for example, by taking a times the second row and subtracting c times the first row, and replacing the second row with the result. We denote this by writing
\[
aR_2 - cR_1 \mapsto R_2'
\]

(I may get lazy and stop writing the primes, where it will be understood that R2 after the arrow represents a row replacement by the quantity on the left). The effect on the augmented matrix is
\[
\left[\begin{array}{cc|c} a & b & e \\ c & d & f \end{array}\right]
\longmapsto
\left[\begin{array}{cc|c} a & b & e \\ 0 & ad - bc & af - ce \end{array}\right].
\]

We see that if ad − bc = 0, then either there is no solution, or we must have af − ce = 0. Let’s plug on assuming that ad − bc ≠ 0. We may eliminate the upper right position held by b in the coefficient matrix by (ad − bc)R1 − bR2 ↦ R1′, yielding
\[
\left[\begin{array}{cc|c} a & b & e \\ 0 & ad - bc & af - ce \end{array}\right]
\mapsto
\left[\begin{array}{cc|c} a(ad - bc) & 0 & (ad - bc)e - b(af - ce) \\ 0 & ad - bc & af - ce \end{array}\right]
=
\left[\begin{array}{cc|c} a(ad - bc) & 0 & ade - abf \\ 0 & ad - bc & af - ce \end{array}\right].
\]

Since we assumed a and ad − bc nonzero, we may apply the final row operations (1/(a(ad − bc)))R1 ↦ R1′ and (1/(ad − bc))R2 ↦ R2′ to obtain
\[
\left[\begin{array}{cc|c} 1 & 0 & (de - bf)/(ad - bc) \\ 0 & 1 & (af - ce)/(ad - bc) \end{array}\right],
\]
so we obtain the solution as
\[
x = \frac{de - bf}{ad - bc}\,, \qquad y = \frac{af - ce}{ad - bc}\,.
\]
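As an illustrative check (not from the notes), one can evaluate these closed-form expressions for the concrete system 2x + 4y = 6, x − y = 0, where a = 2, b = 4, c = 1, d = −1, e = 6, f = 0:

    # Closed-form solution of ax + by = e, cx + dy = f when ad - bc is nonzero.
    a, b, c, d, e, f = 2.0, 4.0, 1.0, -1.0, 6.0, 0.0

    det = a * d - b * c      # ad - bc
    assert det != 0

    x = (d * e - b * f) / det
    y = (a * f - c * e) / det

    print(x, y)              # 1.0 1.0, matching the solution found earlier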

Note that if a = 0 but bc ≠ 0, the solutions are still well defined, and one can obtain the corresponding expressions with a = 0 substituted in by instead performing elimination on
\[
\left[\begin{array}{cc|c} 0 & b & e \\ c & d & f \end{array}\right],
\]


where the first step might be a simple row swap. However, if ad − bc = 0, there is no hope for the unique solution expressions we obtained, though there may still be solutions, or there may be none at all. We will characterize this failure geometrically eventually. First, we turn to problem (2).

Problem (2) is best rephrased in terms of the language of linear combinations of vectors. Recall that addition of the real vectors, which we are representing as arrows in the plane, has both geometric and algebraic definitions. The geometric definition is of course the parallelogram rule: the sum of two vectors a and b is the diagonal of the parallelogram completed by parallel translating a along b and b along a:

Figure 3: Vector addition with arrows.

The corresponding algebraic operation is merely addition of components: if
\[
\mathbf{a} = \begin{pmatrix} a_x \\ a_y \end{pmatrix}, \qquad \mathbf{b} = \begin{pmatrix} b_x \\ b_y \end{pmatrix},
\]
then define
\[
\mathbf{a} + \mathbf{b} := \begin{pmatrix} a_x + b_x \\ a_y + b_y \end{pmatrix}.
\]
It is left to the reader to see that these two notions of addition are equivalent, and satisfy properties such as commutativity and associativity. Moreover, one can iterate addition, and thus define for any positive integer n ∈ Z
\[
n\mathbf{a} = \underbrace{\mathbf{a} + \mathbf{a} + \dots + \mathbf{a}}_{n\ \text{times}}\,.
\]

Similarly, one can define subtraction, which regards
\[
-\mathbf{a} := \begin{pmatrix} -a_x \\ -a_y \end{pmatrix}
\]
as a natural additive inverse to a.

In fact, geometrically, we need not restrict ourselves to integer multiples, for we can scale a vector by any real number (reversing direction if negative), and algebraically this corresponds to simply multiplying each component by that real number. (For the math majors among you, we are giving the space R2 of vectors, thought of either as pairs of real numbers or as arrows in the plane, an abelian group structure but also a structure as a free R-module; we will see many of these properties later when we define vector spaces formally, but a further generalization is to study groups and modules; an elementary theory of groups is treated in introductory abstract algebra, Math 411 here at UMass, while more advanced group theory, ring theory, and module theory are left to more advanced abstract algebra courses, such as Math 412 and Math 611.)

We restrict our attention to integral linear combinations of the vectors
\[
\mathbf{a} := \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad \mathbf{b} := \begin{pmatrix} 2 \\ -1 \end{pmatrix},
\]


i.e. combinations of the form xa + yb, where x, y ∈ Z. Then problem (2) is easily rephrased as follows: does there exist an integral linear combination of a and b equal to the vector
\[
\begin{pmatrix} 6 \\ 0 \end{pmatrix}?
\]

Visually, it would seem quite plausible (make two parallelograms as shown below!)

Figure 4: The two vectors above depicted as “geometric arrows” in the Cartesian coordinate plane.

Algebraically, we can apply the definitions of vector scaling and addition to unravel the meaning of the question: we are seeking integers x and y such that
\[
\begin{pmatrix} 6 \\ 0 \end{pmatrix}
= x \begin{pmatrix} 1 \\ 1 \end{pmatrix} + y \begin{pmatrix} 2 \\ -1 \end{pmatrix}
= \begin{pmatrix} x + 2y \\ x - y \end{pmatrix}.
\]

This is equivalent to a linear system as seen in problem (1)! In fact, we can use the solution of (1) to slickly obtain a solution to (2): since (1, 1) = (x, y) is a solution to
\[
\begin{pmatrix} 3 \\ 0 \end{pmatrix} = \begin{pmatrix} x + 2y \\ x - y \end{pmatrix},
\]

we can multiply both sides by 2 to obtain
\[
\begin{pmatrix} 6 \\ 0 \end{pmatrix}
= \begin{pmatrix} 2x + 4y \\ 2x - 2y \end{pmatrix}
= 2x \begin{pmatrix} 1 \\ 1 \end{pmatrix} + 2y \begin{pmatrix} 2 \\ -1 \end{pmatrix}
= 2(1) \begin{pmatrix} 1 \\ 1 \end{pmatrix} + 2(1) \begin{pmatrix} 2 \\ -1 \end{pmatrix}.
\]

Thus, taking two steps along a and two steps along b lands on the desired point (6, 0).

Let’s summarize what we’ve seen in these two problems. We have two dual perspectives:

Intersection problem: find the intersection of two lines, i.e. solve a linear system of two equations:
\[
\begin{cases} ax + by = e \\ cx + dy = f\,. \end{cases}
\]

Linear combination problem: find a linear combination of two vectors
\[
\mathbf{a} = \begin{pmatrix} a \\ c \end{pmatrix} \quad \text{and} \quad \mathbf{b} = \begin{pmatrix} b \\ d \end{pmatrix}:
\qquad
x \begin{pmatrix} a \\ c \end{pmatrix} + y \begin{pmatrix} b \\ d \end{pmatrix} = \begin{pmatrix} e \\ f \end{pmatrix}.
\]


Let’s return to studying the intersection problem to fill in the gap: what can we say about existence or uniqueness of solutions if the quantity ad − bc is equal to zero?

Proposition 1.1. For a given two-variable linear system described by the equations
\[
\begin{cases} ax + by = e \\ cx + dy = f \end{cases}
\]
the quantity ad − bc = 0 if and only if the lines described by the equations have the same slope.

Proof. We must show two directions, since this is an if and only if statement. Namely, we must show that if the lines have the same slopes, then ad − bc = 0, and conversely, if we know only that ad − bc = 0, we must deduce the corresponding lines possess the same slopes. Let’s prove the former. We have several cases we need to consider. First, let’s suppose that none of the coefficients are zero, in which case we can write each equation in slope-intercept form:
\[
ax + by = e \longleftrightarrow y = -\frac{a}{b}x + \frac{e}{b}\,,
\qquad
cx + dy = f \longleftrightarrow y = -\frac{c}{d}x + \frac{f}{d}\,,
\]

and applying the assumption that the lines have identical slopes, we obtain
\[
-\frac{a}{b} = -\frac{c}{d} \implies ad = bc \implies ad - bc = 0\,. \tag{1}
\]

On the other hand, if, for example, a = 0, then the first equation is by = e, which describes a horizontal line (we must have b ≠ 0 if this equation is meaningful). This tells us that the other equation is also for a horizontal line, so c = 0 and consequently ad − bc = 0 · d − b · 0 = 0. A nearly identical argument works when the lines are vertical, which happens if and only if b = 0 = d.

It now remains to show the converse, that if ad − bc = 0, we can deduce the equality of the lines’ slopes. Provided neither a nor d is zero, we can work backwards in equation (1):
\[
ad - bc = 0 \implies -\frac{a}{b} = -\frac{c}{d}\,.
\]

Else, if a = 0 or d = 0 and ad − bc = 0, then since ad − bc = −bc, either b = 0 or c = 0. But a and b cannot both be zero if we have a meaningful system (or indeed, the equations of lines). Thus if a = 0 and ad − bc = 0, then c = 0 and the lines are both horizontal. Similarly, if d = 0 and ad − bc = 0, then b = 0 and we are faced with two vertical lines.

There are thus three pictures, dependent on ad − bc, e, and f:

1. If ad − bc ≠ 0, there is a unique solution (x, y) for any e and f we choose, and this pair (x, y) corresponds to the unique intersection point of two non-parallel lines.

2. If ad − bc = 0, but af − ec = 0 = bf − ed, then one equation is a multiple of the other, and geometrically we are looking at redundant equations for a single line. There are infinitely many solutions (x, y) corresponding to all ordered pairs lying on this line.

3. If ad − bc = 0 but af ≠ ec, we have two parallel lines, which never intersect. There are no solutions to the linear system.


While there is much more that can be done with two dimensional linear algebra, we have a fairly complete idea of how to solve each of the basic problems posed. We now will explore the analogous problems in three dimensions, as a way to build up to solving general linear systems. Thus, consider the following problems from three dimensional geometry:

1. Given three “generic” planes in R3 which intersect in a unique point, can we locate their point of intersection?

2. Given two planes intersecting along a line, can we describe the line “parametrically”?

3. Given three vectors u, v, w in R3, can we describe a fourth vector b as a linear combination of the other three?

Before approaching this, we review some important properties of the real numbers, and the description of Cartesian coordinates on Cartesian products of the reals.

R denotes the real numbers, which has some additional structure such as a notion of distance given by absolute value, and a notion of partial ordering (≤). With these notions, together with ordinary real number arithmetic, we can view R as a normed, ordered scalar field. The properties of R which make it a field are:

(i.) R comes equipped with a notion of associative, commutative addition: for any real numbers a, b, and c, a + b = b + a is also a real number, and (a + b) + c = a + b + c = a + (b + c). Moreover, there is a unique element 0 ∈ R which acts as an identity for the addition of real numbers: 0 + a = a for any a ∈ R. Every a ∈ R has a unique additive inverse (−a) such that a + (−a) = 0.

(ii.) R comes equipped with a notion of associative, commutative, and distributive multiplication: for any a, b, c ∈ R, ab = ba determines a real number, a(bc) = abc = (ab)c, and a(b + c) = ab + ac = (b + c)a. Moreover, 0a = 0 for any a ∈ R, and there is a unique number 1 ∈ R which acts as an identity for multiplication of real numbers: 1a = a for any a ∈ R.

(iii.) To any nonzero a ∈ R there corresponds a multiplicative inverse 1/a := a⁻¹ satisfying aa⁻¹ = 1.

A mathematical set with a structure as above is called a field. We will encounter other fields later on. We’ve already seen examples of “vectors” in the plane, utilizing the coordinates coming from a Cartesian product:
\[
\mathbb{R}^2 = \mathbb{R} \times \mathbb{R} := \{(x, y) \mid x, y \in \mathbb{R}\}\,.
\]

When we wish to emphasize that we are talking about vectors, we write them not as ordered pairs horizontally, but as vertical tuples:
\[
\mathbf{x} = \begin{pmatrix} x \\ y \end{pmatrix} \in \mathbb{R}^2\,.
\]
We can regard such a vector as the position vector of the point (x, y), which means it is geometrically the arrow pointing from the origin (0, 0) to the point (x, y). It has a notion of geometric length coming from the Pythagorean theorem:
\[
\|\mathbf{x}\| = \sqrt{x^2 + y^2}\,.
\]
We can extend the ideas of this construction to create “higher dimensional” spaces. The geometry we are working with here is called Euclidean (vector) geometry. We define R3 analogously:
\[
\mathbb{R}^3 = \mathbb{R} \times \mathbb{R} \times \mathbb{R} := \{(x, y, z) \mid x, y, z \in \mathbb{R}\}\,.
\]


In R3, we can carve out subsets called planes. They have equations with general form
\[
ax + by + cz = d\,, \qquad a, b, c, d \in \mathbb{R}\,,
\]
where x, y, z are real variables for coordinates on the plane. Let’s try to find an intersection point for a system of three planes.

Example 1.1. Consider the 3 × 3 system
\[
\begin{cases} x + y + z = 6 \\ x - 2y + 3z = 6 \\ 4x - 5y + 6z = 12 \end{cases}
\longleftrightarrow
\left[\begin{array}{ccc|c} 1 & 1 & 1 & 6 \\ 1 & -2 & 3 & 6 \\ 4 & -5 & 6 & 12 \end{array}\right].
\]
Our goal is to manipulate the system via operations corresponding to adding or scaling the equations, in order to obtain
\[
\left[\begin{array}{ccc|c} 1 & 0 & 0 & p \\ 0 & 1 & 0 & q \\ 0 & 0 & 1 & r \end{array}\right],
\]
which corresponds to a solution (x, y, z) = (p, q, r) for some p, q, r ∈ R.

A simple list of valid manipulations corresponds to the following elementary row operations:

1. We may swap two rows, just as we may write the equations in any order we please. We notate a swap of the ith and jth rows of an augmented matrix by Ri ↔ Rj.

2. We may replace a row Ri with the row obtained by scaling the original row by a nonzero real number. We notate this by sRi ↦ Ri.

3. We may replace a row Ri by the difference of that row and a multiple of another row. We notate this by Ri − sRj ↦ Ri.

Before we proceed to apply these row operations to try to solve our system, I remark that combining these elementary operations allows us to describe a more general valid manipulation: we may replace a row by a linear combination of rows, where the original row is weighted by a nonzero real number. E.g., if s ≠ 0, then the following is the most general row operation (up to row swapping) involving the rows R1, R2, R3:
\[
sR_1 + tR_2 + uR_3 \mapsto R_1\,.
\]

Now, to create our solution with row operations. Notice that the top left entry of the matrix is already a 1, which is good news! We want 1s on the main diagonal, and zeros elsewhere on the coefficient side of the augmented matrix. So if the top left entry was a 0, we’d swap rows to get a nonzero entry there, and then if it was not 1 we’d scale the first row by the multiplicative inverse of that entry. Once we’ve got a nonzero entry there, we call this position the first pivot, and our goal is to use it to create a column of zeroes beneath that position.

Focusing on that first column, we have:
\[
\left[\begin{array}{ccc|c} 1 & \cdots & & 6 \\ 1 & \cdots & & 6 \\ 4 & \cdots & & 12 \end{array}\right].
\]
It is clear that we can eliminate the second entry in the first column by the row operation R2 − R1 ↦ R2. Similarly, we can create a zero in the first entry of the third row by R3 − 4R1 ↦ R3. This yields
\[
\left[\begin{array}{ccc|c} 1 & 1 & 1 & 6 \\ 1 & -2 & 3 & 6 \\ 4 & -5 & 6 & 12 \end{array}\right]
\longmapsto
\left[\begin{array}{ccc|c} 1 & 1 & 1 & 6 \\ 0 & -3 & 2 & 0 \\ 0 & -9 & 2 & -12 \end{array}\right]
\]


Next, we want to make the middle entry from a −3 into a 1. This is readily accomplished by a row operation of the second type: −(1/3)R2 ↦ R2. One should check that after performing in sequence the moves R3 + 9R2 ↦ R3, −(1/4)R3 ↦ R3, R2 + (2/3)R3 ↦ R2, R1 − R3 ↦ R1, and R1 − R2 ↦ R1, the matrix reduces to
\[
\left[\begin{array}{ccc|c} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & 3 \end{array}\right].
\]
Thus the solution to our system is (1, 2, 3), which is the point where these planes intersect.
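As a quick numerical sanity check (mine, not part of the notes), a library solver such as NumPy’s np.linalg.solve, which performs elimination internally, returns the same point:

    import numpy as np

    A = np.array([[1.0,  1.0, 1.0],
                  [1.0, -2.0, 3.0],
                  [4.0, -5.0, 6.0]])
    b = np.array([6.0, 6.0, 12.0])

    print(np.linalg.solve(A, b))          # [1. 2. 3.]
    print(A @ np.array([1.0, 2.0, 3.0]))  # [ 6.  6. 12.], the right-hand side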

The process where we used a pivot to make zeroes below that entry is called pivoting down, while the process where we eliminated entries above a pivot position is called pivoting up.

Exercise 1.1. Show that the row operations are invertible, by producing for a given elementary row operation, another elementary operation which applied either before or after the given one will result in the final matrix being unchanged.

Example 1.2. Let us turn to the second geometric problem, regarding the description of a line of intersection of two planes. Take, for instance, the two planes
\[
\begin{cases} x + y + z = 6 \\ x - 2y + 3z = 6\,. \end{cases}
\]
By applying the row operations in the preceding example together with a few more (which ones?), we see that we can get the system to reduce to
\[
\left[\begin{array}{ccc|c} 1 & 0 & 5/3 & 6 \\ 0 & 1 & -2/3 & 0 \end{array}\right].
\]

Notice that there can be at most two pivots, since there are only two rows! We rewrite the matrix rows as equations to try to parametrize the line:
\[
x = 6 - (5/3)z\,, \qquad y = (2/3)z\,,
\]
whence
\[
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
= \begin{pmatrix} 6 - (5/3)z \\ (2/3)z \\ z \end{pmatrix}
= \begin{pmatrix} 6 \\ 0 \\ 0 \end{pmatrix}
+ z \begin{pmatrix} -5/3 \\ 2/3 \\ 1 \end{pmatrix}.
\]

Thus the line can be parametrized by z ∈ R, which is the height along the line which begins at (6, 0, 0) on the xy-plane in R3 when z = 0, and travels with velocity
\[
\mathbf{v} = \begin{pmatrix} -5/3 \\ 2/3 \\ 1 \end{pmatrix}.
\]
Note that above we wrote the solution as a linear combination of the vectors for the starting position and the velocity. It will be common to solve systems where the final solution is an arbitrary linear combination dependent on some scalar weights coming from undetermined variables. By convention, we often choose different letters from the variable designations, such as s and t, to represent the scalings in such a solution. Thus we would write
\[
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
= \begin{pmatrix} 6 \\ 0 \\ 0 \end{pmatrix}
+ s \begin{pmatrix} -5/3 \\ 2/3 \\ 1 \end{pmatrix}, \qquad s \in \mathbb{R}\,,
\]

where we’ve taken z = s as a free variable.


For the third problem, the key observation is that it is essentially the same as the first problem, dualized. We can write down the equation
\[
x\mathbf{u} + y\mathbf{v} + z\mathbf{w} = \mathbf{b}\,,
\]
for some unknowns x, y, z ∈ R, and after scaling the vectors entry by entry, and adding entry by entry, we have two vectors which are ostensibly equal. Thus setting their entries equal, we obtain a system of three equations, which can be solved via elimination/row operations on the corresponding augmented matrix.

Example 1.3. Let
\[
\mathbf{u} = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}, \qquad
\mathbf{v} = \begin{pmatrix} -2 \\ -1 \\ -3 \end{pmatrix}, \qquad
\mathbf{w} = \begin{pmatrix} 3 \\ 2 \\ 5 \end{pmatrix}.
\]
Can the vector
\[
\mathbf{b} = \begin{pmatrix} 0 \\ 1 \\ 4 \end{pmatrix}
\]
be written as a linear combination of u, v, and w?

The claim is that this is not possible. Observe that if such a linear combination exists, then there’s a solution to the vector equation
\[
x\mathbf{u} + y\mathbf{v} + z\mathbf{w} = \mathbf{b}\,.
\]

We can rewrite this as a system as follows:
\[
\begin{cases} x - 2y + 3z = 0 \\ 2x - y + 2z = 1 \\ 3x - 3y + 5z = 4 \end{cases}
\longleftrightarrow
\left[\begin{array}{ccc|c} 1 & -2 & 3 & 0 \\ 2 & -1 & 2 & 1 \\ 3 & -3 & 5 & 4 \end{array}\right]
\]
We apply the row operations R2 − 2R1 ↦ R2 and R3 − 3R1 ↦ R3 to obtain
\[
\left[\begin{array}{ccc|c} 1 & -2 & 3 & 0 \\ 0 & 3 & -4 & 1 \\ 0 & 3 & -4 & 4 \end{array}\right],
\]
and then R3 − R2 ↦ R3 leaves us with
\[
\left[\begin{array}{ccc|c} 1 & -2 & 3 & 0 \\ 0 & 3 & -4 & 1 \\ 0 & 0 & 0 & 3 \end{array}\right].
\]

The last row corresponds to the impossible equation 0z = 3 ⟹ 0 = 3, so there is no possible solution! We call such a system inconsistent. Otherwise, if the equation can be solved (even if the solution is not unique), we refer to the system as consistent.

Some possible practice problems: Problems 1–18 in Section 1.1 (Introduction to Linear Systems) of Otto Bretscher’s textbook Linear Algebra with Applications.

These problems generalize easily into higher dimensions, and it will be nice to see that our procedure illustrated in the above examples works just as well in those settings. Thus, it seems fitting that we study the general algorithm which allows us to reduce systems and solve either for an explicit solution, or to realize a system is inconsistent. As we will use this algorithm extensively, I devote several lectures to its details and implementation.


2 Gauss-Jordan Elimination

In this section we describe the general algorithm which takes a matrix and reduces it in order to solve a system or determine that it is inconsistent. Let us begin with some language and notations.

Definition 2.1. A matrix is said to be in Row Echelon Form (REF) if the following conditions hold:

1. All rows containing only zeros appear below rows with nonzero entries.

2. The first nonzero entry in any row appears in a column to the right of the first nonzero entry in any preceding row, and any such initial nonzero entry is a 1.

The columns with leading 1s are called pivot columns, and the entries containing leading 1s are called pivots. If, in addition, all entries other than the pivot entries are zero, we say the matrix is in Reduced Row Echelon Form (RREF).

Example 2.1.
\[
\begin{bmatrix} 1 & 0 & 5/3 \\ 0 & 1 & -2/3 \end{bmatrix}
\]
is a matrix in row echelon form, while
\[
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}
\]
is a matrix in reduced row echelon form.

We write elementary row ops as follows: let s ∈ R \ {0} be a nonzero scalar, and A ∈ Matm×n(R) a matrix which contains m rows and n columns of real entries. Let Ri denote the ith row of A for any integer i, 1 ≤ i ≤ m. Then the elementary row operations are

1. Row swap: Ri ↔ Rj swaps the ith and jth rows.

2. Rescaling: sRi ↦ Ri scales Ri by s.

3. Row combine: Ri − sRj ↦ Ri combines Ri with the scalar multiple sRj of Rj.

We are ready to describe the procedure for pivoting downward :

Definition 2.2. Let aij denote the entry in the ith row and jth column of A ∈ Matm×n(R). To pivot downward on the (i,j)th entry is to perform the following operations:

(i.) (1/aij)Ri ↦ Ri,

(ii.) for each integer k ≥ 1 with i + k ≤ m, Ri+k − ai+k,j Ri ↦ Ri+k.

In words, make aij into a 1, and use this one to eliminate (make 0) all other entries directly below the (i,j)th entry.

Let’s give a brief overview of what the Gauss-Jordan algorithm accomplishes. First, given an input matrix, it searches for the leftmost nonzero column. Then, after finding this column, and after exchanging rows if necessary, it brings the first nonzero entry up to the top. It then pivots downwards on this entry. It subsequently narrows its view to the submatrix with the first row and column removed, and repeats the procedure. Once it has located all pivot columns and pivoted down in each one, it starts from the rightmost pivot and pivots up, then moves left to the next pivot and pivots up. It then continues pivoting up and moving left until the matrix is in reduced row echelon form.
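The following is a minimal pure-Python sketch of the procedure just described (it is my own illustration, not from the notes; function and variable names are my own, and for simplicity it eliminates above and below each pivot in a single pass rather than in a separate upward sweep, which yields the same reduced form):

    def rref(rows):
        """Row-reduce a matrix (list of lists of numbers) to reduced row echelon form."""
        A = [[float(entry) for entry in row] for row in rows]
        m, n = len(A), len(A[0])
        pivot_row = 0
        for col in range(n):
            # Find a row at or below pivot_row with a nonzero entry in this column.
            p = next((r for r in range(pivot_row, m) if abs(A[r][col]) > 1e-12), None)
            if p is None:
                continue                                   # no pivot in this column
            A[pivot_row], A[p] = A[p], A[pivot_row]        # row swap
            scale = A[pivot_row][col]
            A[pivot_row] = [entry / scale for entry in A[pivot_row]]  # make the pivot a 1
            for r in range(m):                             # eliminate the rest of the pivot column
                if r != pivot_row and abs(A[r][col]) > 1e-12:
                    factor = A[r][col]
                    A[r] = [a - factor * b for a, b in zip(A[r], A[pivot_row])]
            pivot_row += 1
            if pivot_row == m:
                break
        return A

    # The augmented matrix of Example 1.1 reduces to [I | (1, 2, 3)] (up to rounding).
    for row in rref([[1, 1, 1, 6], [1, -2, 3, 6], [4, -5, 6, 12]]):
        print(row)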

The descriptions and charts I gave in class are largely taken from a textbook which is in my office (the name escapes me). However, the technical details given in class are not a principal focus, and in particular, will not appear on the exam in any formal capacity (as long as you can perform the algorithm in practice, then you’ve got what you need for the remainder of the course). I may come back and include these details at a future date.


3 Matrices and Linear Maps of Rn → Rm

Now that we have an algorithm for solving systems, let’s return to the vector picture again. Here, we review some basic vector algebra in two and three dimensions. Regard R2 as the set of vectors
\[
\mathbb{R}^2 = \left\{ \begin{pmatrix} x \\ y \end{pmatrix} \;\middle|\; x, y \in \mathbb{R} \right\},
\]
and similarly regard R3 as the set of vectors
\[
\mathbb{R}^3 = \left\{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} \;\middle|\; x, y, z \in \mathbb{R} \right\}.
\]
Recall the dot product, which I define in R3 (for R2 simply forget the last coordinate):
\[
\begin{pmatrix} a \\ b \\ c \end{pmatrix} \cdot \begin{pmatrix} x \\ y \\ z \end{pmatrix} = ax + by + cz\,.
\]

Notice that the right hand side is in fact identical to the expression appearing on the left hand side of our general equation for a plane in R3! This is not a coincidence. One way to geometrically determine a plane is to fix a vector
\[
\mathbf{n} = \begin{pmatrix} a \\ b \\ c \end{pmatrix}
\]
and find the set of all points such that the displacement vector from a fixed point (x0, y0, z0) is perpendicular to n. The key fact (which we will prove later in the course) is that u · v = ‖u‖‖v‖ cos θ for any vectors u, v ∈ R3, where θ ∈ [0, π] is the angle made between the two vectors (which can always be chosen to be in the interval [0, π]). Thus, a plane equation has the form
\[
\mathbf{n} \cdot (\mathbf{x} - \mathbf{x}_0) = 0\,,
\]
where
\[
\mathbf{x} = \begin{pmatrix} x \\ y \\ z \end{pmatrix}, \qquad \mathbf{x}_0 = \begin{pmatrix} x_0 \\ y_0 \\ z_0 \end{pmatrix}.
\]
It is simple algebra of real numbers which turns this into the equation ax + by + cz = d, where d = n · x0 is a constant determined by the choices of n and x0. One refers to the function f(x, y, z) = ax + by + cz, with a, b, c ∈ R known and x, y, z ∈ R variable, as a linear function. So another viewpoint is that a plane in R3 is a level set of a linear function in three variables.

We can regard the dot product in another way: as a 1 × 3 matrix acting on a 3 × 1 matrix by matrix multiplication:
\[
\begin{bmatrix} a & b & c \end{bmatrix}
\begin{bmatrix} x \\ y \\ z \end{bmatrix}
= [\, ax + by + cz \,] = \mathbf{n} \cdot \mathbf{x}\,,
\]
where I’ve abused notation slightly by taking the 1 × 1 resulting matrix, and regarding it as merely the real number it contains. We take this as the definition of matrix multiplication in the case where we are given a 1 × 3 matrix (a row vector) and a 3 × 1 matrix (a column vector). We wish to extend this definition to matrices acting on column vectors, and we will see that the definition is powerful enough to capture both the concepts of linear systems and linear combinations.


The idea is simple: we’ll let rows of a matrix be dotted with a vector, as above, which gives us a new vector consisting of the real numbers resulting from each row-column product. Formally, we can define it in Rn, which we think of as the space of column vectors with n real entries:
\[
\mathbb{R}^n = \left\{ \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \;\middle|\; x_1, \dots, x_n \in \mathbb{R} \right\}.
\]

Definition 3.1. Let A ∈ Matm×n(R) be a matrix given by
\[
\begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix}.
\]
Let vi denote the vector whose entries are taken from the ith row of A:
\[
\mathbf{v}_i := \begin{pmatrix} a_{i1} \\ \vdots \\ a_{in} \end{pmatrix}.
\]
Then define the matrix-vector product as a map Matm×n(R) × Rn → Rm given by the formula
\[
\mathbf{x} \mapsto A\mathbf{x} := \begin{pmatrix} \mathbf{v}_1 \cdot \mathbf{x} \\ \vdots \\ \mathbf{v}_m \cdot \mathbf{x} \end{pmatrix} \in \mathbb{R}^m\,.
\]

Example 3.1. Compute the matrix-vector product Au where
\[
A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & -2 & 3 \\ 4 & -5 & 6 \end{bmatrix}, \qquad
\mathbf{u} = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}.
\]
To compute this, we need to dot each row with the column vector u. For example, the first row gives
\[
\begin{bmatrix} 1 & 1 & 1 \end{bmatrix}
\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}
= 1(1) + 1(2) + 1(3) = 6\,.
\]

Note that dotting a vector u with a vector v consisting entirely of ones simply sums the components of u. Computing the remaining rows this way, we obtain the vector
\[
A\mathbf{u} = \begin{bmatrix} 1 & 1 & 1 \\ 1 & -2 & 3 \\ 4 & -5 & 6 \end{bmatrix}
\begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}
= \begin{pmatrix} 6 \\ 6 \\ 12 \end{pmatrix}.
\]

Let’s call this vector b. Recall that u was a solution to the system with augmented matrix
\[
\left[\begin{array}{c|c} A & \mathbf{b} \end{array}\right]!
\]
This is no coincidence. We can view the system of equations as being equivalent to solving the following problem: find a vector x such that Ax = b. In this case we’d solved that system for x = u, and just checked via matrix-vector multiplication that indeed, it is a solution!
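As a sketch (mine, not from the notes, assuming NumPy), Definition 3.1 translates directly into code: each entry of Ax is the dot product of a row of A with x, and applying it to Example 3.1 reproduces b:

    import numpy as np

    def matvec(A, x):
        # The i-th entry of Ax is the dot product of the i-th row of A with x.
        return np.array([np.dot(row, x) for row in A])

    A = np.array([[1.0,  1.0, 1.0],
                  [1.0, -2.0, 3.0],
                  [4.0, -5.0, 6.0]])
    u = np.array([1.0, 2.0, 3.0])

    print(matvec(A, u))   # [ 6.  6. 12.], the vector b from the example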

We have one last perspective on this, which is that we found a linear combination of the columns of A:
\[
x \begin{pmatrix} 1 \\ 1 \\ 4 \end{pmatrix}
+ y \begin{pmatrix} 1 \\ -2 \\ -5 \end{pmatrix}
+ z \begin{pmatrix} 1 \\ 3 \\ 6 \end{pmatrix}
= \begin{pmatrix} 6 \\ 6 \\ 12 \end{pmatrix}
\]


is solved by x = 1, y = 2, and z = 3.

Thus, we’ve explored numerous ways to understand the solution of the equation
\[
A\mathbf{x} = \begin{bmatrix} 1 & 1 & 1 \\ 1 & -2 & 3 \\ 4 & -5 & 6 \end{bmatrix}
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
= \begin{pmatrix} 6 \\ 6 \\ 12 \end{pmatrix}.
\]
Let us remark on some basic properties of matrix-vector products. We know that we can view them as giving maps between Euclidean spaces of vectors. We have the following observations:

1. For any matrix A ∈ Matm×n(R) and any vectors x,y ∈ Rn, A(x + y) = Ax + Ay.

2. For any A ∈ Matm×n(R), any vector x ∈ Rn, and scalar s ∈ R, A(sx) = s(Ax).

Are these not familiar properties? Consider, for example, limits, derivatives, integrals. Another way of stating these properties is to say we have discovered operators which, upon acting on linear combinations of inputs, output a linear combination of the sub-outputs. That is, matrices take linear combinations of vectors to linear combinations of matrix-vector products, derivatives take linear combinations of differentiable functions to linear combinations of the derivatives of the simpler functions, and integrals act analogously on integrable functions. Both derivatives and integrals behave this way because limits do, so the linearity was somehow inherited. We’d gradually like to come to an understanding of the word linear describing the commonality among these various operations, which behave well with respect to linear combinations. To do this, we need to see what spaces of objects have the right properties to form linear combinations, and to ensure that we consider maps of such spaces which respect this structure in a way analogous to the above two properties.
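A quick numerical illustration of the two properties (my own sketch, assuming NumPy; the random matrix and vectors are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.integers(-3, 4, size=(2, 3)).astype(float)
    x = rng.integers(-3, 4, size=3).astype(float)
    y = rng.integers(-3, 4, size=3).astype(float)
    s = 5.0

    print(np.allclose(A @ (x + y), A @ x + A @ y))  # True: additivity
    print(np.allclose(A @ (s * x), s * (A @ x)))    # True: compatibility with scaling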

Practice:

Exercise 3.1. Let A be the matrix
\[
A = \begin{bmatrix} 0 & 2 & -1 \\ -2 & 0 & 3 \\ 1 & -3 & 0 \end{bmatrix}.
\]
Compute Ax for
\[
\text{1.}\ \mathbf{x} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \qquad
\text{2.}\ \mathbf{x} = \begin{pmatrix} 3 \\ 1 \\ 2 \end{pmatrix}, \qquad
\text{3.}\ \mathbf{x} = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix},\ \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix},\ \text{or}\ \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.
\]
Can you interpret the results geometrically? We will eventually have a good understanding of the geometry of the transformation x ↦ Ax for the above matrix, and others which share a certain property which it possesses. (Preview: it is a skew-symmetric matrix, and represents a certain cross-product operation.)


We now investigate so-called linear maps from Rn to Rm.

Definition 3.2. A map T: Rn → Rm is called a linear transformation, or a linear map, if the following properties hold:

1. For all s ∈ R and any x ∈ Rn, T(sx) = s(Tx).

2. For any pair of vectors x, y ∈ Rn, T(x + y) = Tx + Ty.

We refer to T as a linear operator if these properties hold. Note the convention of often omitting parentheses between the operator T and the vector input x: Tx := T(x).

Clearly, the operator TA : Rn → Rm defined by TAx = Ax defines a linear map. Let us see how linear systems fit into this framework. First, a formal description of linear systems:

Definition 3.3. A system of linear equations in n variables is a set of m ≥ 1 equations of the form
\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n &= b_1 \\
a_{21}x_1 + a_{22}x_2 + \dots + a_{2n}x_n &= b_2 \\
&\ \ \vdots \\
a_{m1}x_1 + a_{m2}x_2 + \dots + a_{mn}x_n &= b_m\,.
\end{aligned}
\]

Observation 3.1. A system of linear equations can be captured by the linear transformation TA associated to a matrix A = (aij) ∈ Matm×n(R). Thus, a linear system can be written as Ax = b for x ∈ Rn unknown. The system Ax = b is solvable if and only if b is in the image of TA.

We need to recall what is meant by this terminology, so what follows is a handful of definitions regarding functions (not necessarily just linear functions; these definitions are standard and are usually introduced in high school or college algebra and precalculus courses).

Definition 3.4. Let X and Y be mathematical sets. A function f : X → Y assigns to each x ∈ X precisely one y ∈ Y. X is called the domain or source, and Y is called the codomain or target.

Note that one y may be assigned to multiple xs, but each x can be assigned no more than one y... this is a distinction which often trips folk up when first learning about functions. To better understand this distinction, let’s view functions as triples consisting of a domain X (the inputs), a codomain Y (the possible outputs), and a rule f assigning outputs to inputs. Note that we need to specify all of these to completely identify a function. Now, if the domain were the keys on a keyboard, and the outputs the symbols on your screen in a basic word processing environment, you’d declare your keyboard “broken” if after pushing the same key several times, your screen displayed various unexpected results. On the other hand, if your function was determined by a preset font, you could imagine pushing many different keys, and having all of the outputs be the same. In this latter case, the keyboard is functioning, but the rule assigning outputs happens to be a silly one (every keystroke produces a ‘k’, for example). Thus, a function may assign at most one output per input, but may reuse outputs as often as it pleases.

Sometimes a function is also called a map, especially if the sets involved are thought of as “spaces” in some way. We will later define structures on a set which turn them into something called a vector space, and we will study linear maps on them, which are just functions with properties analogous to those for linear functions from Rn to Rm.

Definition 3.5. Given sets X and Y and a function f : X → Y, the set
\[
f(X) := \mathrm{Im}(f) = \{y \in Y \mid y = f(x) \text{ for some } x \in X\} \subset Y
\]
is called the image of f.


Definition 3.6. Given sets X and Y and a function f : X → Y, and given a subset V ⊂ Y, the set
\[
f^{-1}(V) := \{x \in X \mid f(x) \in V\} \subset X
\]
is called the preimage of V.

Be warned: the preimage of a subset is merely the set of things being mapped to that subset, but is not necessarily constructed by “inverting a function,” since not every function is invertible (but any subset of a codomain has defined for it a preimage by any function mapping to the codomain; that preimage may be empty!). If, on the other hand, for every y ∈ Y there is a unique x ∈ X such that y = f(x), then we use the same notation f−1 to describe the inverse function. We will talk more about inverses after a few more definitions.

Definition 3.7. A map f : X → Y is called surjective or onto if and only if for every y ∈ Y, there is an x ∈ X such that y = f(x); equivalently, if the preimage f−1({y}) ≠ ∅ for all y ∈ Y, f is a surjection from X to Y. A common shorthand is to write f : X ↠ Y to indicate a surjection; in class I avoid this shorthand because it is easy to miss until one becomes quite comfortable with the notion. However, in these notes, I will from time to time use it, while also reminding the reader that a particular map is a surjection by declaring it “onto” or “surjective” in the commentary.

Note that a map f : X → Y is a surjective map if and only if the image is equal to the codomain: f(X) = Y. In our keyboard analogy, we’d want to be able to produce any symbol capable of being displayed in a word processing program by finding an appropriate keystroke in order to declare that our typing with a particular font was “surjective”. Thus, the rule for producing outputs has to be powerful relative to the set of outputs: any output can be achieved by an appropriate input into a surjective function. Another remark is that if we start with some function f : X → Y, and then restrict our codomain to the image f(X) ⊆ Y, we obtain a new function, which we abusively might still label f. This function is surjective! Said another way, any function surjects onto its image, because we’ve thrown out anything in the codomain which wasn’t in the image when we restricted the codomain! So, in our typing analogy, perhaps we can’t produce all symbols with a given font, but if we declare our codomain to be only the symbols that display in that font with regular typing inputs (no fancy stuff, multiple keys at once, sequences of keystrokes, etc.¹), then we have automatically built an onto map between keys and displayable symbols in the given font.

¹ If we define our domain to be the set of all sequences of keystrokes which can produce a single symbol output, and our codomain to be all possible outputs in the font, then we have a bijection between keystroke sequences and outputs if and only if the font contains no repeated characters, and the hardcoding contains no redundant input sequences.

Definition 3.8. A map f : X → Y is called injective or one-to-one if and only if for every distinct pair of points x1, x2 ∈ X, they possess distinct images:
\[
x_1 \neq x_2 \implies f(x_1) \neq f(x_2) \quad \text{for all } x_1, x_2 \in X\,.
\]
Equivalently, for any y ∈ f(X), the preimage of y, f−1({y}), contains precisely one element from X. As a shorthand, one often writes f : X ↪ Y, and refers to f as an injection.

Exercise 3.2. Show that a function f : X → Y is injective if and only if whenever f(x1) = f(x2), one has that x1 = x2.

Definition 3.9. A map f : X → Y is called a bijection if and only if it is both injective and surjective.

Definition 3.10. Given a map f : X → Y, a map f−1 : Y → X is called an inverse for f if and only if

(i.) f−1 ◦ f = IdX, i.e. f−1(f(x)) = x for every x ∈ X,

(ii.) f ◦ f−1 = IdY, i.e. f(f−1(y)) = y for every y ∈ Y.

If such a function exists, we say f is invertible.

Exercise 3.3. A function f : X → Y is invertible if and only if it is a bijection.

Note that there are two ways to show that some map f : X → Y is a bijection. You can show that it is both injective and surjective separately, or you can prove that an inverse exists.

We’d now like to return to doing linear algebra, a little brighter with our language of functions. Consider the following questions:

Question 1: If Ax = b possesses a solution x for every b ∈ Rm, then what can we say about the linear map TA : Rn → Rm?

Question 2: If Ax = b possesses a unique solution x for every b ∈ Im(TA) =: ARn, then what can we say about the linear map TA : Rn → Rm?

These answers give us a surprising motivation to study specific properties of linear maps, such as which vectors they send to the zero vector. Here I provide incomplete answers to these questions. For the first, we know that the map is surjective, though we need to discover what that means in terms of our matrix; in particular, we’d like to answer “what property must a matrix have for the associated matrix-vector multiplication map to be surjective?” Similarly, for the second question, we know that the map must be injective, and would hope to characterize injectivity in an easily computable way for a given map coming from multiplying vectors by matrices.

Surjectivity, recall, is equivalent to the image being the entire codomain. So for a linear map T: Rn → Rm to be surjective, we merely require that T(Rn) = Rm. To know when a given matrix can accomplish this, we’ll need to do more matrix algebra, and come to an understanding of the concept of dimension. For now, I’ll state without argument that there’s certainly no hope if n < m. But it’s also possible to have n ≫ m and still produce a map which doesn’t cover Rm (e.g. by mapping everything onto 0, or onto some linear set carved out by a linear equation system).

Injectivity is more subtle. Begin first by observing that if T: Rn → Rm is linear, then T(0_{R^n}) = 0_{R^m}, where 0_{R^n} is the zero vector, consisting of n zeroes for components, and similarly for 0_{R^m} (I will often drop the subscript when it is clear which zero vector is being invoked). This is because of the first property in the definition of linearity:
\[
\mathbf{0} = 0(T\mathbf{x}) = T(0\mathbf{x}) = T\mathbf{0} \quad \text{for any } \mathbf{x} \in \mathbb{R}^n\,.
\]
So certainly, the preimage of 0 by a linear map contains 0. If it contains anything else, then the map is not injective by definition. I claim that the converse is true: if there’s only the zero vector in the preimage of the zero vector, then the linear map is an injection. The proof is constructed as a solution to the first question on the second written assignment (HW quiz 2, problem 1), in greater generality (the result, correctly stated, holds for vector spaces). We’ll discuss this proposition more later. Generally, we want to know about solutions to the homogeneous equation Ax = 0, and in particular, when there are nontrivial solutions (which means the matrix-vector multiplication map is not injective). It seems clear that this information comes from applying Gauss-Jordan to the matrix, and counting the pivots. If there are no free variables, then the homogeneous system is solved uniquely, and the map is injective. If it is also surjective, we’d like to be able to find an inverse function which solves the general, inhomogeneous system Ax = b once and for all! We need a little more information about matrix algebra if we wish to accomplish this. Along the way, we will further motivate the development of abstract vector spaces.
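To make the discussion of homogeneous solutions concrete, here is a small hypothetical example (mine, not from the notes; it assumes NumPy): a matrix with only one pivot has nontrivial solutions to Ax = 0, while a matrix with a pivot in every column does not.

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [2.0, 4.0]])   # second row is twice the first: only one pivot
    v = np.array([2.0, -1.0])    # a nontrivial solution of Ax = 0
    print(A @ v)                 # [0. 0.], so the map x -> Ax is not injective

    B = np.array([[1.0, 2.0],
                  [0.0, 1.0]])   # two pivots: Bx = 0 forces x = 0
    print(np.linalg.solve(B, np.zeros(2)))  # [0. 0.], the only solution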


4 Matrix algebra

Suppose we wanted to compose a pair of linear maps induced by matrix multiplication:
\[
\mathbb{R}^k \xrightarrow{\ T_B\ } \mathbb{R}^n \xrightarrow{\ T_A\ } \mathbb{R}^m\,,
\]
where B ∈ Matn×k(R) and A ∈ Matm×n(R). Let TAB = TA ◦ TB denote the composition obtained by first applying TB and then applying TA.

Exercise 4.1. Check that TAB above is a linear map.

We want to know if we can represent TAB by a matrix-vector multiplication. It turns out we can, and the corresponding matrix can be thought of as a matrix product of A and B. Let us do an example before defining this product in full generality.

Example 4.1. Let
\[
A = \begin{bmatrix} 3 & 2 & 1 \\ 6 & 5 & 4 \end{bmatrix} \in \mathrm{Mat}_{2\times 3}(\mathbb{R})\,, \qquad \text{and} \qquad
B = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix} \in \mathrm{Mat}_{3\times 2}(\mathbb{R})\,.
\]
Thus, TA : R3 → R2 is given by TAy = Ay and TB : R2 → R3 is given by TBx = Bx. Given
\[
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \in \mathbb{R}^2,
\]
the map TAB : R2 → R2 sends x to A(Bx). Let y = Bx. Then
\[
\mathbf{y} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= \begin{bmatrix} x_1 + 2x_2 \\ 3x_1 + 4x_2 \\ 5x_1 + 6x_2 \end{bmatrix}.
\]

We can then compute Ay:
\[
\begin{aligned}
T_{AB}\mathbf{x} = A\mathbf{y} = A(B\mathbf{x})
&= \begin{bmatrix} 3 & 2 & 1 \\ 6 & 5 & 4 \end{bmatrix}
\begin{bmatrix} x_1 + 2x_2 \\ 3x_1 + 4x_2 \\ 5x_1 + 6x_2 \end{bmatrix}
= \begin{bmatrix} 3(x_1 + 2x_2) + 2(3x_1 + 4x_2) + 1(5x_1 + 6x_2) \\ 6(x_1 + 2x_2) + 5(3x_1 + 4x_2) + 4(5x_1 + 6x_2) \end{bmatrix} \\[4pt]
&= \begin{bmatrix} \big(3(1) + 2(3) + 1(5)\big)x_1 + \big(3(2) + 2(4) + 1(6)\big)x_2 \\ \big(6(1) + 5(3) + 4(5)\big)x_1 + \big(6(2) + 5(4) + 4(6)\big)x_2 \end{bmatrix} \\[4pt]
&= \begin{bmatrix} 3(1) + 2(3) + 1(5) & 3(2) + 2(4) + 1(6) \\ 6(1) + 5(3) + 4(5) & 6(2) + 5(4) + 4(6) \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= \begin{bmatrix} 14 & 20 \\ 41 & 56 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= \begin{bmatrix} 14x_1 + 20x_2 \\ 41x_1 + 56x_2 \end{bmatrix}.
\end{aligned}
\]

Notice that the matrix in the penultimate line above is obtained by forming dot products from the row vectors of A with the column vectors of B to obtain each entry. This is how we will define matrix multiplication in general: we treat the columns of the second matrix as vectors, and compute matrix-vector products in order to obtain new column vectors.

We are now ready to define the matrix product as the matrix which successfully captures a composition of two linear maps coming from matrix-vector multiplication. Let’s return to the setup.


Definition 4.1. Suppose we have linear maps
\[
\mathbb{R}^k \xrightarrow{\ T_B\ } \mathbb{R}^n \xrightarrow{\ T_A\ } \mathbb{R}^m\,,
\]
where B ∈ Matn×k(R) and A ∈ Matm×n(R). Let TAB = TA ◦ TB : Rk → Rm denote the composition obtained by first applying TB and then applying TA. Then there is a matrix M such that TABx = Mx for any x ∈ Rk, and we wish to define AB := M. Following the ideas of the above example, we can (exercise!) realize M = (mij) ∈ Matm×k(R) as the matrix whose entries are given by the formula
\[
m_{ij} = \sum_{l=1}^{n} a_{il} b_{lj}\,.
\]
Thus, the columns of AB are precisely the matrix-vector products Avj, where vj is the jth column of B. We refer to AB ∈ Matm×k(R) as the matrix product of A and B.
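The entry formula can be transcribed directly into code; the following sketch (not part of the notes) reproduces the product computed in Example 4.1:

    def matmul(A, B):
        # (AB)_ij = sum over l of a_il * b_lj, with matrices given as lists of lists.
        m, n, k = len(A), len(B), len(B[0])
        assert len(A[0]) == n, "columns of A must match rows of B"
        return [[sum(A[i][l] * B[l][j] for l in range(n)) for j in range(k)]
                for i in range(m)]

    A = [[3, 2, 1],
         [6, 5, 4]]
    B = [[1, 2],
         [3, 4],
         [5, 6]]
    print(matmul(A, B))   # [[14, 20], [41, 56]], as in Example 4.1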

Several remarks are in order. First, note that there is a distinguished identity matrix In ∈ Matn×n(R) such that for any A ∈ Matm×n, AIn = A, and for any B ∈ Matn×k, InB = B. This matrix consists of entries δij which are 1 if i = j and 0 if i ≠ j:
\[
I_n = \begin{bmatrix}
1 & 0 & 0 & \dots & 0 \\
0 & 1 & 0 & \dots & 0 \\
\vdots & & \ddots & & \vdots \\
0 & 0 & \dots & 0 & 1
\end{bmatrix} \in \mathrm{Mat}_{n\times n}(\mathbb{R})\,.
\]
Clearly, for any vector x ∈ Rn, Inx = x, whence it also acts as an identity for matrix multiplication, when products are defined.

Notice also that the number of columns of the first matrix must match the number of rows of the second matrix. In particular, if A ∈ Matm×n(R) and B ∈ Matn×k(R), then AB is well defined, but BA is well defined if and only if k = m.

Worse yet, like function composition, matrix multiplication, even if it can be defined in both orders, is in general not commutative, as the maps of the two differently ordered compositions may land in different spaces altogether!

Example 4.2. Suppose A ∈ Mat2×3(R), and B ∈ Mat3×2(R). Then both AB and BA are defined, but AB ∈ Mat2×2(R), while BA ∈ Mat3×3(R)!

We may hope that things are nicer if we deal with square matrices only, so that products of matrices stay in the same space. Alas, even here, commutativity is in general lost, as the next example illustrates.

Example 4.3. Consider the following matrices:
\[
\begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}, \qquad
\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}.
\]
We compute the products in each order:
\[
\begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}
= \begin{bmatrix} 2 & -1 \\ 1 & 0 \end{bmatrix},
\qquad
\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}
\begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}
= \begin{bmatrix} 0 & -1 \\ 1 & 2 \end{bmatrix}.
\]
Thus, matrix multiplication isn’t generally commutative, even for 2 × 2 square matrices where all products are always defined.


Another remark, which would require some work to prove, is that multiplication of real matrices is associative. In particular, if A, B, and C are matrices for which the products A(BC) and (AB)C are defined, then in fact these are the same, and thus without ambiguity we have
\[
A(BC) = ABC = (AB)C\,.
\]
There are several other important constructions in matrix algebra, which rely on the structure of the Euclidean spaces of vectors we’ve been working with. Note that we can define sums of images of vectors under a linear map. This allows us to also define sums of matrices.

Definition 4.2. Given A, B ∈ Matm×n(R), we can define the sum A + B to be the matrix such that for any x ∈ Rn, (A + B)x = Ax + Bx. Using the indicial notation for entries, we have then that
\[
\sum_{j=1}^{n} a_{ij}x_j + \sum_{j=1}^{n} b_{ij}x_j = \sum_{j=1}^{n} (a_{ij} + b_{ij})x_j\,,
\]
which implies that A + B is obtained by adding corresponding entries of A and B.

Matrices can also be scaled, by simply scaling all the entries: sA = (saij) for any s ∈ R. In particular, we may also subtract matrices, and each matrix has an additive inverse. There’s a unique zero matrix in any given matrix space Matm×n(R), consisting of all zero entries. Denote this zero matrix by 0m×n.

We define a few more operations with matrices. If A ∈ Matm×n(R), then we can define a new matrix called its transpose, which lives in Matn×m(R):

Definition 4.3. The matrix A = (aij) has transpose Aτ = (aji); in other words, the transpose matrix is the matrix obtained by exchanging the rows of A for columns.

Example 4.4.
\[
\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}^{\tau}
= \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}.
\]
Finally, we discuss, for square matrices, the notion of a matrix inverse. The inverse matrix of a matrix A ∈ Matn×n(R) is one which, if it exists, undoes the action of the linear map x ↦ Ax. In particular, we seek a matrix A−1 such that A−1A = In = AA−1. Recall that the map must be bijective for it to be fully invertible.

Proposition 4.1. If an inverse matrix for a matrix A ∈ Matn×n(R) exists, then one can compute it by solving the system with augmented matrix
\[
\left[\begin{array}{c|c} A & I_n \end{array}\right].
\]
This can be done if and only if the reduced row echelon form of A is the n × n identity, that is, RREF(A) = In. In this case, after applying Gauss-Jordan to this augmented matrix, one has the matrix
\[
\left[\begin{array}{c|c} I_n & A^{-1} \end{array}\right].
\]
Proof. The condition AA−1 = In gives us n systems of n equations in n variables, corresponding to the systems Avj = ej for vj a column of A−1, and ej the jth column of the identity matrix In. The row operations to put A into RREF do not depend on ej, so applying these operations to the matrix
\[
\left[\begin{array}{c|ccc} A & \mathbf{e}_1 & \dots & \mathbf{e}_n \end{array}\right]
= \left[\begin{array}{c|c} A & I_n \end{array}\right]
\]
simultaneously solves all n systems, provided that RREF(A) = In. If RREF(A) ≠ In, then there are free variables, and the columns of our hypothetical inverse cannot be uniquely determined, and


in fact, at least one of the systems will consequently be inconsistent. This latter statement will be more carefully proved when we discuss linear independence. Assuming the reduction can be completed to solve for A−1, then the final form of the augmented matrix is clearly
\[
\left[\begin{array}{c|ccc} I_n & \mathbf{v}_1 & \dots & \mathbf{v}_n \end{array}\right]
= \left[\begin{array}{c|c} I_n & A^{-1} \end{array}\right],
\]

which gives the desired matrix inverse.

Example 4.5. Let’s compute
\[
\begin{bmatrix} 1 & 2 \\ 1 & 3 \end{bmatrix}^{-1}.
\]
The augmented matrix system we need is
\[
\left[\begin{array}{cc|cc} 1 & 2 & 1 & 0 \\ 1 & 3 & 0 & 1 \end{array}\right].
\]
Applying the row operations R2 − R1 ↦ R2 followed by R1 − 2R2 ↦ R1, one obtains
\[
\left[\begin{array}{cc|cc} 1 & 0 & 3 & -2 \\ 0 & 1 & -1 & 1 \end{array}\right].
\]

We can check easily by multiplying, in either order, to obtain the identity matrix.
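For instance (an illustrative check of my own, assuming NumPy), multiplying out both orders and comparing with a library inverse:

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [1.0, 3.0]])
    A_inv = np.array([[ 3.0, -2.0],
                      [-1.0,  1.0]])   # the inverse found by row reduction above

    print(A @ A_inv)         # the 2x2 identity
    print(A_inv @ A)         # the 2x2 identity
    print(np.linalg.inv(A))  # agrees with A_inv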

Exercise 4.2. Find
\[
\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}^{-1},
\]
if it exists.

Exercise 4.3. Show that A(B + C) = AB + AC whenever the products and sums are defined. Convince yourself that s(AB) = A(sB) for any scalar s ∈ R, provided the matrix product is defined. What can you say about (A + B)τ and (AB)τ?

5 Vector Spaces

5.1 Indulging a Motivation

In the previous section, we saw that matrices have algebraic properties identical in some sense to the algebraic properties of vectors in a Euclidean vector space: we can add them and scale them, and we can form linear combinations of matrices if we so please, with all these operations being commutative and associative. Matrix multiplication, on the other hand, defines linear maps of Euclidean vectors. But since we can also multiply matrices by each other under the right (dimensional) conditions, we may want a way to regard matrices as determining linear maps on the spaces of matrices. More specifically, given M ∈ Matm×n(R) and A ∈ Matn×k(R), we can define a map
\[
T_M : \mathrm{Mat}_{n\times k}(\mathbb{R}) \to \mathrm{Mat}_{m\times k}(\mathbb{R})\,,
\]
given by the rule
\[
A \mapsto MA\,.
\]
By the exercise at the end of the last section, we have that
\[
T_M(sA + tB) = M(sA + tB) = s(MA) + t(MB) = sT_M A + tT_M B\,.
\]


Thus, we want to be able to regard this map as a linear map since it shares the properties which defined linear maps from Rn to Rm.

One way to easily realize this is to actually identify the spaces Matn×k(R) with some Euclidean vector space. By concatenating the columns of matrices in some chosen order, we can create a bijective map from Matn×k(R) to Rnk. Of course, there’s not a single natural way to do this; we could also concatenate rows, or scramble the entries up somehow, as long as we do it consistently for all matrices.

Example 5.1. We can identify Mat2×2(R) with R4 as follows: given a matrix
\[
\begin{bmatrix} a & b \\ c & d \end{bmatrix},
\]
we can map it to the 4-vector
\[
\begin{pmatrix} a \\ c \\ b \\ d \end{pmatrix},
\]
obtained by concatenating the first and second columns, but we can also map it to the 4-vector
\[
\begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix},
\]
obtained by concatenating rows. Neither choice is better than the other, so we say that our identification, whichever we choose, is non-canonical, since there’s not a particularly more natural choice.
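In NumPy terms (an aside of mine, not from the notes), these two identifications are exactly column-major and row-major flattening:

    import numpy as np

    M = np.array([[1, 2],
                  [3, 4]])     # stands for the abstract matrix [[a, b], [c, d]]

    print(M.flatten(order='F'))  # [1 3 2 4] : concatenate columns, (a, c, b, d)
    print(M.flatten(order='C'))  # [1 2 3 4] : concatenate rows,    (a, b, c, d)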

Exercise 5.1. Given A ∈ Matn×k(R), how many different ways can one identify A with a vector in Rnk which contains the same entries as A? How many ways can we bijectively map Matn×k(R) and Rnk?

5.2 The Big Definition

Another approach, which is quite fruitful, is to investigate spaces which have the appropriate general algebraic structure to support a notion of “linear map”. This brings us to the study of vector spaces.

Definition 5.1. A vector space is a set V whose elements will be called vectors, together with additional structure depending on a pair of operations and a choice of a scalar field F (for now, mentally picture F = R, the field of real numbers, or F = Q, the field of rational numbers; other examples will be given later, including the complex numbers C and finite fields). The operations are vector addition and scalar multiplication. Vector addition takes two vectors x, y ∈ V and produces a (possibly new) vector x + y ∈ V, while scalar multiplication takes a scalar s ∈ F and a vector x ∈ V and produces a (possibly new) vector sx ∈ V. These operations are required to satisfy 8 axioms:

Axiom 1: Commutativity of vector addition: for any x,y ∈ V , x + y = y + x.

Axiom 2: Associativity of vector addition: for any x, y, z ∈ V, x + (y + z) = x + y + z = (x + y) + z.


Axiom 3: Identity for vector addition: there exists a vector 0 ∈ V such that for any x ∈ V, x + 0 = x.

Claim. The zero vector 0 ∈ V is unique.

Proof. This follows from the preceding axioms: assume we have found 0. Then if 0′ ∈ V is a vector such that x + 0′ = x for any x ∈ V as well, then taking x = 0 one has 0 = 0 + 0′ = 0′ + 0 = 0′, showing that our new candidate was in fact the same as the zero vector.

Axiom 4: Inverse for vector addition: for any x ∈ V, there is an inverse element (−x) such that x + (−x) = 0.

Axiom 5: Scalar distributivity over vector addition: for any s ∈ F and any x, y ∈ V, s(x + y) = sx + sy.

Axiom 6: Vector distributivity over scalar addition: for any x ∈ V and any scalars r, s ∈ F, (r + s)x = rx + sx.

Axiom 7: Associativity of scaling: for any x ∈ V and any scalars r, s ∈ F, s(rx) = (sr)x.

Axiom 8: Scalar identity: for any x ∈ V, 1x = x, where 1 ∈ F is the multiplicative identity for the field.

A set V with vector addition and scalar multiplication satisfying the above eight axioms for a field F is called a “vector space over F” or simply “an F-vector space”.

Exercise 5.2. Let V be an F-vector space. Prove that for any given x ∈ V, the inverse (−x) is unique, and equals (−1)x.

Given the abstraction of the above definition, let us convince ourselves that it is a worthwhile definition by exhibiting a plethora of examples. The longer one studies math, the more one discovers many ubiquitous vector spaces, which vindicate the choices made in crafting such a long, abstract definition. After a while, one potentially becomes disappointed when one encounters something that’s almost a vector space (modules over commutative rings with zero divisors: I’m looking at you!), but rest assured, there are plenty of vector spaces out there to become acquainted with! The following examples are also “thought exercises” where you should convince yourself that the examples meet the conditions set forth in the above axioms.

Example 5.2. The obvious example is Rn: every axiom seems to have been picked from observing the essential structure of Rn as a vector space over R.

Example 5.3. It doesn’t take much work at this point to show that Matm×n(R) is an R-vector space for any positive integers m and n. Convince yourself that all eight axioms are met if we take matrix addition as the vector addition, and scaling a matrix as the scalar multiplication operation.

Example 5.4. Let Pn(R) denote the space of all polynomials of degree less than or equal to n with real coefficients:
\[
P_n(\mathbb{R}) = \{a_0 + a_1 x + \dots + a_n x^n \mid a_0, \dots, a_n \in \mathbb{R}\}\,.
\]
Then I claim this is naturally a vector space over R with the vector addition given by usual addition of polynomials, and the scalar multiplication given by scaling polynomials in the usual way.


Example 5.5. The complex numbers C := {a + bi | a, b ∈ R, i^2 = −1} are naturally a vector space over the real numbers, but since C is also a field, C can be regarded as a C-vector space. In general, any field F is itself a vector space over F, and Fn may be defined as it was for Rn. Fn inherits a natural vector space structure, much as Rn did, by allowing componentwise addition of vectors using the additive structure of F, and allowing the multiplicative structure of F to determine the scalar action componentwise.

Example 5.6. Let p be a prime number. Then there exists a field Fp which has p elements. Wecan regard this field as the set of remainder classes modulo the prime p, and so we write

Fp = {0, 1, . . . , p− 1}

as a set. The additive structure is determined by taking the remainder of addition modulo p, and the multiplicative structure is determined likewise. For example, if p = 3, one has F3 = {0, 1, 2} as a set, and the operations are

0 + 0 = 0, 0 + 1 = 1, 0 + 2 = 2

1 + 1 = 2, 1 + 2 = 0, 2 + 2 = 1

0(1) = 0(2) = 0(0) = 0, 1(1) = 1, 1(2) = 2, 2(2) = 1 .

Given any Fp, we can construct Fp^n, which is certainly an Fp-vector space, but it will contain only p^n elements. We can also construct the space Pn(Fp) of polynomials of degree less than or equal to n with Fp coefficients. These spaces are interesting in their own right within the study of number theory. However, a simple example shows that these are not so abstract: let p = 2. F2 is called the binary field. Recall that any given integer m possesses a binary expansion, which is an expression of the form

m = a0 2^0 + a1 2^1 + a2 2^2 + . . . + an 2^n

for some integer n, where a0, . . . , an ∈ F2 are each equal to either 0 or 1. This is just a polynomial in Pn(F2) evaluated at x = 2! Thus, there is a correspondence between binary expansions of integers and polynomials in the vector space Pn(F2). As an example, consider the integer 46. We know that 46 = 32 + 8 + 4 + 2 = 2^5 + 2^3 + 2^2 + 2^1. The corresponding polynomial is then 0 + 1x + 1x^2 + 1x^3 + 0x^4 + 1x^5 ∈ P5(F2), while the binary expansion is just the list of these coefficients (with highest degree first): 46_10 = 101110_2.
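
As a quick computational aside (a minimal Python sketch, not part of the original notes; the function names are just illustrative), one can pass between an integer, its list of F2 coefficients, and the value of the corresponding polynomial at x = 2:

def binary_coefficients(m):
    # Return a_0, a_1, ..., a_n in F_2 with m = a_0*2^0 + a_1*2^1 + ... + a_n*2^n.
    coeffs = []
    while m > 0:
        coeffs.append(m % 2)
        m //= 2
    return coeffs

def evaluate_at_two(coeffs):
    # Evaluate the polynomial a_0 + a_1 x + ... + a_n x^n at x = 2.
    return sum(a * 2**i for i, a in enumerate(coeffs))

coeffs = binary_coefficients(46)
print(coeffs)                   # [0, 1, 1, 1, 0, 1], i.e. x + x^2 + x^3 + x^5 in P_5(F_2)
print(evaluate_at_two(coeffs))  # 46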

Example 5.7. Fix an interval I ⊂ R, and let C0(I,R) denote the set of all continuous R-valued functions on I. Convince yourself that this is indeed a vector space. One can also give a vector space structure to the continuously differentiable functions C1(I,R) defined over an open interval I ⊂ R.

5.3 Linear Maps and Machinery

We can now proceed to define and study linear maps between vector spaces. What we will see is that the phenomena in Rn aren’t particularly special to Rn, but rather a consequence of vector space structure. We will be able to prove facts for all vector spaces and linear maps at once, which gives us the power to transfer ideas about how to solve problems in one space to other spaces. Our definition of linear map won’t appear any different, but we will see that it is truly the two properties we’ve settled on which create much of the rigidity in the study of linear algebra.

Definition 5.2. A map T: V → W of F-vector spaces is called an F-linear map or an F-linear transformation if

(i.) for any u,v ∈ V , T(u + v) = Tu + Tv,

(ii.) for any s ∈ F and any v ∈ V , T(sv) = sTv.


If the field is understood, one simply says “linear map”, “linear function”, or “linear transformation.” The symbol T is often referred to as a linear operator on V .

In analogy to how, in elementary algebra, one studies roots of polynomials, i.e. points which a polynomial maps to 0, one may concern oneself with solutions v to the homogeneous linear equation Tv = 0W for a linear map T: V → W . We have a special name for the set of solutions to such an equation:

Definition 5.3. The kernel of an F-linear map T: V → W of F-vector spaces is the preimage of the zero vector 0W ∈ W :

ker T := T^{−1}({0W }) = {v ∈ V | Tv = 0W } .

Thus, the kernel of a linear map is the set of solutions to the homogeneous equation determined by that map:

v ∈ ker T ⇐⇒ Tv = 0W .

Proposition 5.1. A linear map T: V → W is an injection if and only if the kernel is trivial, i.e. ker T = {0V }.

Proof. The proof is built in HW quiz 2, problem 1.

Example 5.8. We’ve already encountered linear maps of R-vector spaces extensively, and in the case of a linear map given by matrix-vector multiplication, we can easily characterize injectivity. In particular, if A ∈ Matm×n(R) is a matrix determining a linear map TA : Rn → Rm, it’s injective if and only if the homogeneous equation Ax = 0 ∈ Rm is uniquely solved by the zero vector 0 ∈ Rn. This occurs if and only if A has n pivots. If there are fewer than n pivots, we have free variables, and can write the solution to the homogeneous equation as a linear combination of vectors which generate or span the kernel. We’ve seen this basic procedure performed when solving for the intersection of two planes, though in that case there was an additional vector with scalar weight 1, since we were solving an inhomogeneous equation of the form Ax = b for b ∈ R2.

So by the above discussion, we can detect injectivity of the map x ↦ Ax by examining the row reduction of A and counting the pivot entries. Note also that this implies that if n > m, there is no hope for injectivity, as there can be at most as many pivots as the minimum of n and m. We will often abuse notation and write ker A for the kernel of the linear map TA, and refer to this kernel as the null space of A. This language will be better justified when we study subspaces and the rank-nullity theorem in coming lectures.
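
As an illustrative sketch (using sympy; not part of the original notes, and the matrix below is just a sample), one can row reduce a matrix and count pivots to test injectivity of x ↦ Ax:

from sympy import Matrix

A = Matrix([[1, 2], [0, 1], [3, 4]])    # a sample 3x2 matrix
rref_form, pivot_cols = A.rref()        # reduced row echelon form and pivot column indices
injective = len(pivot_cols) == A.cols   # T_A is injective iff A has n pivots
print(pivot_cols, injective)            # (0, 1) True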

We also have a special name for bijective linear maps, owing to the fact that linear maps preserve vector space structure well:

Definition 5.4. Given two vector spaces V and W over a field F, an F-linear map T : V → W is called a linear isomorphism or a vector space isomorphism if it is a bijection. In this case we say that V and W are isomorphic as F-vector spaces, and we write

V ∼= W .

If it is clear we are dealing with two vector spaces over a common field, we may simply say that the map is an isomorphism and that the vector spaces are isomorphic.

Exercise 5.3. Show that Pn(F) ∼= Fn+1 by exhibiting a linear isomorphism.

Exercise 5.4. Deduce that if A ∈ Matn×n(R) is an invertible matrix, it determines a self-isomorphism of Rn. We call such a self-isomorphism a linear automorphism.


Exercise 5.5. Compute the kernel of the linear map TA with matrix

A =

4 1 4
1 1 1
4 1 4

Describe the general solution to Ax = b in terms of the components b1, b2, b3 of b and the elements of the kernel (in particular, you should be able to express the solution as a linear combination of some vectors; what is this geometrically?)
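
A quick way to check your computation for this exercise (a sympy sketch, not part of the original notes):

from sympy import Matrix

A = Matrix([[4, 1, 4], [1, 1, 1], [4, 1, 4]])
print(A.nullspace())   # a basis of ker T_A (the null space of A)
print(A.rank())        # the number of pivots of A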

5.4 Subspaces

An important concept in the study of vector spaces is that of a subspace. The idea is that linear equations carve out smaller vector spaces within larger ones, and vector spaces nest well in other vector spaces.

Definition 5.5. Let V be an F-vector space, U ⊂ V a nonempty subset. We call U a vector subspace or linear subspace if and only if the following two conditions hold:

(i.) for any u,v ∈ U , u + v ∈ U ,

(ii.) for any s ∈ F and any u ∈ U , su ∈ U .

Exercise 5.6. Verify that a subset U of a vector space V over F is a vector subspace if and only if it is itself a vector space with the operations it inherits from V .

Exercise 5.7. Convince yourself (and me, if you care) that U ⊂ V is a vector subspace if and only if it passes the following subspace test:

For any u, v ∈ U and any s ∈ F, u + sv ∈ U .

This is analogous to the statement that a map T: V → W is F-linear if and only if T(u + sv) = Tu + sTv for any u, v ∈ V and any s ∈ F.

Example 5.9. Given any vector space V , V is a vector subspace of itself, called the improper subspace. A subspace U ⊂ V is called proper if and only if it is not all of V .

Example 5.10. For any vector space V , {0} is a vector subspace of V , called the trivial subspace. This justifies the language “the kernel is trivial”, as the kernel is trivial if and only if it equals the trivial subspace. We often drop the braces and write 0 for the subspace as well as the element.

Example 5.11. If T: V → W is a linear map, then the kernel ker T ⊂ V is a subspace and similarly the image T(V ) ⊂ W is a subspace. Let us prove the former, and leave the latter as an exercise. We have to check the two conditions of being a subspace, namely, whether it is closed under addition and scalar multiplication. By some trickery, one can claim that it suffices to check that for any u, v ∈ ker T, and any scalar s, u + sv ∈ ker T. (Why?) This is readily verified:

T(u + sv) = Tu + sTv = 0 + s0 = 0 =⇒ u + sv ∈ ker T .

Thus the kernel of the map T is a subspace of V .

Exercise 5.8. Check that Pk(F) ⊂ Pn(F) is naturally a subspace so long as k ≤ n. Define the space of all polynomials over F

P(F) := {p(x) | p(x) ∈ Pn(F) for some n ∈ N} = ∪n Pn(F) .

Then convince yourself that Pn(F) ⊂ P(F) is a subspace for any nonnegative integer n.


Example 5.12. We can view the set C1((a, b), R) of continuously differentiable functions on an open interval (a, b) as sitting inside of the continuous functions C0((a, b), R), indeed, as a vector subspace (prove this to yourself!) The derivative map provides a linear map

d/dx : C1((a, b), R) → C0((a, b), R) ,

and since the kernel of this map is nontrivial (it consists of all the constant functions, which as a vector subspace is R sitting inside C1((a, b), R)), we know the map is not injective, and so in particular, it is not the map giving us the inclusion C1((a, b), R) ↪ C0((a, b), R). On the other hand, by the fundamental theorem of calculus, the map is surjective, since we can always integrate a continuous function f ∈ C0((a, b), R) to obtain a continuously differentiable function

g(x) := ∫_a^x f(t) dt , g(x) ∈ C1((a, b), R) , with d/dx g(x) = f(x) .

Thus, we’ve furnished an example of a proper vector subspace which possesses a surjective but not injective linear map onto its parent vector space. This is possible because the spaces are infinite dimensional – a notion we will make precise soon! We will also show that these oddities don’t occur in the finite dimensional cases.

Before we can define dimension properly, we must carefully come to understand the role played by linear combinations in building subspaces, and in describing elements of vector spaces. Thus, we will define linear combinations and linear independence for a general vector space V over a field F.

Definition 5.6. Let V be an F-vector space. Given a finite collection of vectors {v1, . . . , vk} ⊂ V , and a collection of scalars (not necessarily distinct) a1, . . . , ak ∈ F, the expression

a1v1 + . . . + akvk = ∑_{i=1}^{k} aivi

is called an F-linear combination of the vectors v1, . . . , vk with scalar weights a1, . . . , ak. It is called nontrivial if at least one ai ≠ 0, otherwise it is called trivial.

As alluded to, one major use of linear combinations is to construct new subspaces. Consider looking at the collection of all linear combinations made from a collection of vectors. We will call this their span:

Definition 5.7. The linear span of a finite collection {v1, . . . , vk} ⊂ V of vectors is the set of all linear combinations of those vectors:

span {v1, . . . , vk} := { ∑_{i=1}^{k} aivi | ai ∈ F, i = 1, . . . , k } .

If S ⊂ V is an infinite set of vectors, the span is defined to be the set of finite linear combinations made from finite collections of vectors in S.

Proposition 5.2. Let V be an F-vector space. Given a finite collection of vectors S ⊂ V , the span span(S) is a vector subspace of V .

Proof. A sketch was given in class. You are encouraged to go through a careful argument and determine which axioms of being a vector space are applied where.


5.5 Linear Independence and Bases

Definition 5.8. A collection {v1, . . . , vk} ⊂ V of vectors in an F-vector space V is called linearly independent if and only if the only linear combination of v1, . . . , vk equal to 0 ∈ V is the trivial linear combination:

{v1, . . . , vk} linearly independent ⇐⇒ ( ∑_{i=1}^{k} aivi = 0 =⇒ a1 = . . . = ak = 0 ) .

Otherwise we say that {v1, . . . , vk} is linearly dependent.

Proposition 5.3. {v1, . . . , vk} is linearly dependent if and only if there is some vi ∈ {v1, . . . , vk} which can be expressed as a linear combination of the vectors vj for j ≠ i.

Proof. Suppose {v1, . . . , vk} is linearly dependent. After possibly relabeling we can assume that there’s a tuple (a1, . . . , ak) ∈ F^k such that a1 ≠ 0, and ∑_{i=1}^{k} aivi = 0. Then rearranging, one has

v1 = ∑_{i=2}^{k} −(ai/a1) vi ,

and thus we have expressed one of the vectors as a linear combination of the others.

Conversely, if there’s a vector vi ∈ {v1, . . . , vk} such that it can be expressed as a linear combination of the other vectors, then we have vi = ∑_{j≠i} ajvj for some constants aj ∈ F, and rearranging one has vi − ∑_{j≠i} ajvj = 0, which is a nontrivial linear combination equal to the zero vector. This establishes that {v1, . . . , vk} is linearly dependent.

Example 5.13. Let V = Rn, and suppose {v1, . . . , vk} ⊂ Rn is a collection of k ≤ n vectors. Then we have the following proposition:

Proposition 5.4. The set of vectors {v1, . . . , vk} is linearly independent if and only if the matrix A = [v1 . . . vk] has k pivots.

Proof. Consider the system Ax = 0. If ker A := ker TA ≠ {0}, then there’s some nonzero x ∈ R^k such that ∑_{i=1}^{k} xivi = 0, which implies that {v1, . . . , vk} is linearly dependent. Thus, {v1, . . . , vk} is linearly independent if and only if ker A is trivial, which is true if and only if there are k pivots.

Definition 5.9. A vector space V over F is called finite dimensional if and only if there exists a finite collection S = {v1, . . . , vk} ⊂ V such that the F-linear span of S is V . If no finite collection of vectors spans V , we say V is infinite dimensional.

Proposition 5.5. Any finite dimensional F-vector space V contains a linearly independent set B ⊂ V such that span B = V , and moreover, any other such set B′ ⊂ V such that span B′ = V has the same number of elements as B.

Proof. Let V be a finite dimensional F-vector space. Observe that because V is finite dimensional, by definition there exists a finite subset S ⊂ V such that span S = V . If S is linearly independent then we merely have to show that no other linearly independent spanning set has a different number of elements. On the other hand, if S is linearly dependent, then since S is finite, we can remove at most finitely many vectors in S without changing the span. The claim is that removing a vector which is a linear combination of the remaining vectors does not alter the span. This is obvious, since the span is the set of linear combinations of the vectors, so if we throw some vector w out of S, the set S \ {w} still contains w in its span, and hence any other linear combination which potentially involved w can be constructed using only S \ {w}. Thus, after throwing out finitely many vectors, we have a set B which is linearly independent, such that span B = span S = V . It now remains to show that the size of any linearly independent set B′ which also spans V is the same as that of B. To do this we need the following lemma:


Lemma 5.1. If S ⊂ V is a finite set and B ⊂ span S is a linearly independent set, then |B| ≤ |S|.

Assuming the lemma, let’s finish the proof of the proposition. Suppose |B| = n and |B′| = m. From the lemma, since span B = V ⊃ B′ and B′ is linearly independent, we deduce that m ≤ n. We similarly conclude that since span B′ = V ⊃ B and B is linearly independent, n ≤ m. Thus m = n and we are done.

We now prove the lemma:

Proof. Let S = {v1, . . . , vm} and suppose B ⊂ span S is a linearly independent set. Choose some finite subset E ⊂ B. Since B is linearly independent, so is E. Suppose E = {u1, . . . , uk}. Since E ⊂ span S, there’s a linear relation

uk = a1v1 + . . . + amvm .

Since uk ≠ 0 by linear independence of E, we deduce that at least one aj ≠ 0. We may assume it is a1, whence we can write v1 as a linear combination of {uk, v2, . . . , vm}. Note that E is also in the span of this new set. We readily conclude that uk−1 is in the span of this new set, and repeating the argument above we can claim v2 ∈ span {uk, uk−1, v3, . . . , vm}. Note that E is also in the span of this new set. We can repeat this procedure until either we’ve used up E, in which case k ≤ m, or until we run out of elements of S. If we were to run out of elements of S without running out of elements of E, then since E is in the span of each of the sets we are building, we’d be forced to conclude that there are elements of E which are linear combinations of other elements in E, which contradicts its linear independence. Thus, it must be the case that k ≤ m, as desired.

Definition 5.10. Given a vector space V over F, we say that a linearly independent set B such that V = span_F B is a basis of V . Thus, the above proposition amounts to stating that we can always provide a basis for a finite dimensional vector space, and moreover, any basis will have the same number of elements.

Definition 5.11. Given a finite dimensional vector space V over F, the dimension of V is the size of any F-basis B of V :

dimF V := |B| .

A remark: the subscript F is necessary at times, since a given set V may have different vector space structures over different fields, and consequently different dimensions. Specifying the field removes ambiguity. We will see examples of this shortly.

Example 5.14. The standard basis of Fn is the set BS := {e1, . . . , en} consisting of the vectors which are the columns of In. In particular, for any x ∈ Fn:

x = (x1, . . . , xn)^τ = x1e1 + . . . + xnen = ∑_{i=1}^{n} xiei .

Clearly, the vectors of BS are linearly independent since they are columns of the identity matrix.

Exercise 5.9. Show that if A ∈ Matn×n(R) is an invertible matrix, then the columns of A form a basis of Rn. Note that dimR Rn = n as expected, either by the previous example or this one.

Example 5.15. A choice of basis for Pn(F) can be given by the set of monomials of degree at most n: {1, x, . . . , x^n}. Clearly, any polynomial with coefficients in F is an F-linear combination of these, as indeed, that is how one defines polynomials! We merely need to check linear independence. This is clear since the only polynomial equal to the zero polynomial is the zero polynomial, and so any F-linear combination of the monomials equal to the zero polynomial necessarily has all zero coefficients, and thus is the trivial linear combination. Note that there are n + 1 monomials in the basis, so dimF Pn(F) = n + 1.


Example 5.16. The complex numbers C, regarded as a real vector space, have a basis with two elements: {1, i}, and thus dimR C = 2. But as a vector space over the field C, a basis choice could be any nonzero complex number, and in particular, {1} is a basis of C as a vector space over C, so dimC C = 1. More generally, dimR Cn = 2n while dimC Cn = n. Note that for any field, dimF Fn = n, which is established for example by looking at the standard basis.

Example 5.17. Let us examine an analogue of the standard basis in the case that our vector space is the space of real m × n matrices, Matm×n(R). Define a basis

BS = {eij | 1 ≤ i ≤ m, 1 ≤ j ≤ n, i, j ∈ N} ,

such that eij is the matrix containing a single 1 in the (i, j)-th entry, and zeros in all other entries. It is easy to check that this is an R-basis of Matm×n(R), and thus that Matm×n(R) is an mn-dimensional real vector space.

Exercise 5.10. Consider the set of all matrices Eij ∈ Matn×n(F) defined by

Eij := In − eii − ejj + eij + eji .

(a) Given an n× n matrix A, what is EijA?

(b) Describe the vector space span_F {Eij | 1 ≤ i, j ≤ n, i, j ∈ N} ⊂ Matn×n(F), and give a basis for this vector space. (Hint: first figure out what happens for 2 × 2 matrices and 3 × 3 matrices, then generalize.)

The notion of basis is useful in describing linear maps, in addition to giving us a notion of “linear coordinates”. Let us examine the connection between bases and linear maps. The first result in this direction is the following theorem:

Theorem 5.1. Let V be a finite dimensional vector space over F and B = {v1, . . . , vn} a basis of V . Let W be a vector space and {w1, . . . , wn} ⊂ W a collection of not necessarily distinct vectors. Then there is a unique linear map T: V → W such that

Tv1 = w1, . . .Tvn = wn .

Proof. For any v ∈ V we can write v as a linear combination of the basis vectors. Thus, let v = ∑_{i=1}^{n} aivi. Suppose T : V → W is a linear map which satisfies the conditions

Tv1 = w1, . . . , Tvn = wn .

Then the claim is that the value Tv is determined uniquely. Indeed, since T is linear, one has

Tv = T( ∑_{i=1}^{n} aivi ) = ∑_{i=1}^{n} aiTvi = ∑_{i=1}^{n} aiwi .

Moreover, we may construct a unique T from the data Tvi = wi by the above formula, and define this to be the linear extension of the map on the basis.

This proposition tells us that if we determine the values to which the basis vectors are sent, then we can linearly extend to describe a linear map on all of V , and so the following corollary should come as no surprise (we’ve alluded to the fact before in comments and exercises):

Corollary 5.1. Let V be a finite dimensional vector space over a field F. Then V is non-canonically² isomorphic to Fn where n = dimF V .

² The term “non-canonical” in mathematics refers to the fact that the construction depends on choices in such a way that there is no natural preference. In this case, there are many isomorphisms that may exist between V and Fn, and we have no reason to prefer a specific choice outside of specific applications.


Proof. Since V is finite dimensional, we may find some basis B = {v1, . . . , vn}, where dimF V = n. Then define LB on B by specifying

LB vi = ei , i = 1, . . . , n ,

where {e1, . . . , en} = BS ⊂ Fn is the standard basis. Then by the above proposition, we may linearly extend LB to a linear map LB : V → Fn. It is clearly an isomorphism, as LB^{−1} is defined on BS and determines a unique linear map LB^{−1} : Fn → V which clearly satisfies

LB ◦ LB^{−1} = Id_{Fn} and LB^{−1} ◦ LB = Id_V .

Example 5.18. Regarding Cn as a real vector space we have an isomorphism Cn ∼= R2n. Similarly, we have Pn(R) ∼= Rn+1 and Matm×n(R) ∼= Rmn. This latter fact justifies the notation that many authors (including Bretscher) exploit of writing Rm×n instead of Matm×n(R).

Exercise 5.11. For each of the above examples, write down explicit isomorphisms (in particular, produce a basis and describe how to map it to a basis of an appropriate model vector space Rk).

Exercise 5.12. Explain why there can be no invertible linear map T: R3 → R2. (This will be clarified more deeply in the discussion of the Rank-Nullity theorem; try to prove this using a simple argument about bases!)

We will later explore the use of the maps LB : V → Rn of real vector spaces to discuss linear coordinates and change of basis matrices. For now, we finish with another important example: using a basis to describe a linear map via a matrix. Since any n-dimensional vector space over R is isomorphic to Rn, it suffices to understand how to write matrices for maps T: Rn → Rm.

Theorem 5.2. Let T: Rn → Rm be a linear map. Then there is a matrix A ∈ Matm×n(R), called the matrix of T relative to the standard basis, or simply the standard matrix of T, such that Tx = Ax. The matrix has columns given by the effect of the map T on the standard basis:

A = [Te1 . . .Ten] ∈ Matm×n(R) .

Proof. The proof is a simple computation. Let x = ∑_{i=1}^{n} xiei. Then

Tx = T( ∑_{i=1}^{n} xiei ) = ∑_{i=1}^{n} xiTei = [Te1 . . . Ten] (x1, . . . , xn)^τ ,

and the right hand side is precisely Ax.

Remark 5.1. If we think about the one line proof above, it should be clear that the image of the linear map T: Rn → Rm is nothing more than the span of the columns of the matrix A representing the map:

T(Rn) = span {Te1, . . . , Ten} =: Col A .

The last notation is new: for any matrix A ∈ Matm×n(R), Col A is the subspace of Rm spanned by the columns of A. This is called the column space of A, though we may also just refer to it as the image of the matrix map x ↦ Ax. We’ll see the column space again shortly.
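
One can assemble the standard matrix of a linear map in code by feeding it the standard basis vectors. Here is a minimal numpy sketch (not from the original notes; the map T below is a hypothetical example from R^3 to R^2):

import numpy as np

def T(x):
    # A sample linear map T: R^3 -> R^2, T(x, y, z) = (x + 2y, 3z).
    return np.array([x[0] + 2 * x[1], 3 * x[2]])

n = 3
standard_basis = np.eye(n)                             # columns e_1, ..., e_n
A = np.column_stack([T(e) for e in standard_basis.T])  # A = [Te_1 ... Te_n]
x = np.array([1.0, -1.0, 2.0])
print(A)
print(np.allclose(T(x), A @ x))                        # True: Tx = Ax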


Example 5.19. Let us demonstrate how to construct a matrix representing the linear map

d/dx : P2(R) → P1(R) .

Since matrices describe maps between Euclidean vector spaces, we need to exploit the isomorphisms

ϕ2 : P2(R) → R3 , ϕ2(a0 + a1x + a2x^2) = a0e1 + a1e2 + a2e3 ∈ R3 ,
ϕ1 : P1(R) → R2 , ϕ1(a0 + a1x) = a0e1 + a1e2 ∈ R2 .

The matrix we desire will actually then be the standard matrix of the map

ϕ1 ◦ d/dx ◦ ϕ2^{−1} : R3 → R2 ,

which completes the diagram:

            d/dx
  P2(R) ----------> P1(R)
    |                 |
   ϕ2                ϕ1
    ↓                 ↓
   R3  ---------->   R2
      ϕ1 ◦ d/dx ◦ ϕ2^{−1}

Note that since (d/dx)p(x) = a1 + 2a2x for p(x) = a0 + a1x + a2x^2, the bottom map in the diagram is defined by

(ϕ1 ◦ d/dx ◦ ϕ2^{−1})(a0e1 + a1e2 + a2e3) = a1e1 + 2a2e2 ,

and by applying our theorem we find that the desired matrix representing the derivative is

A =
0 1 0
0 0 2
.

Observe that the first column is a zero column, and this is entirely sensible since the derivative of a constant is 0.
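
As a computational sketch (sympy, not part of the original notes), one can reproduce this matrix by differentiating each monomial basis vector of P2(R) and reading off its coordinates in the monomial basis of P1(R):

import sympy as sp

x = sp.symbols('x')
domain_basis = [sp.Integer(1), x, x**2]   # monomial basis of P_2(R)
m = 2                                     # dim P_1(R) = 2, with basis {1, x}

# Each column of A is the coordinate vector of d/dx applied to a basis monomial.
columns = []
for p in domain_basis:
    dp = sp.expand(sp.diff(p, x))
    columns.append([dp.coeff(x, i) for i in range(m)])

A = sp.Matrix(columns).T
print(A)    # expected: Matrix([[0, 1, 0], [0, 0, 2]])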

Exercise 5.13. Expand on the above example and describe matrices representing the derivative of polynomials in Pn(R), and do the same for the integral. (This is part of exercise 4 on HW quiz 3.)

Exercise 5.14. Fix a real number a ∈ R and a positive R ∈ R and denote by I the open interval (a − R, a + R). Denote by Cω(I,R) the space of power series centered at a and convergent on I.

(a) Show that Cω(I,R) is a vector space over R with vector addition and scalar multiplication defined in the natural ways.

(b) Is this vector space finite dimensional?

(c) Describe a basis of Cω(I,R).

(d) Give an example of a linear transformation T: Cω(I,R) → Cω(I,R) that is surjective but not injective. Can you find an example of a linear transformation of Cω(I,R) which is injective, has image of the same dimension as Cω(I,R), but is not surjective?


6 Rank and Nullity, and the General Solution to Ax = b

This section introduces us to the notions of rank and nullity, and will also give us the relation between them. The theorem relating them, called the rank-nullity theorem, is also sometimes affectionately referred to as the fundamental theorem of linear algebra. This is because it gives us a rigid relationship between the dimension of the domain of a linear map, the dimension of its image, and the dimension of its kernel, effectively telling us that linear maps can at worst collapse a subspace (the kernel, if it is nontrivial), leaving the image as a possibly lower dimensional shadow of the source vector space, sitting inside the target vector space. We will then discuss the general solution of linear systems.

6.1 Images, Kernels and their Dimensions

Let us introduce the main definitions and their elementary properties. Throughout, let V be a finite dimensional vector space over a field F, and let T: V → W be a linear map.

Definition 6.1. The rank of the linear map T: V →W is the dimension of the image:

rank T := dimF T (V ) .

It is sometimes abbreviated as rk T.

Remark 6.1. Note that rank T ≤ dimF V and rank T ≤ dimF W .

Exercise 6.1. Explain the above remark about bounds on the rank of a linear map.

Definition 6.2. The nullity of the linear map T: V →W is the dimension of the kernel:

null T := dimF ker T .

Remark 6.2. Observe that null T ≤ dimF V , but it need not be bounded by the dimension of W .

Exercise 6.2. Explain the above remark about the bound on the nullity of a linear map.

Let us consider how nullity and rank are computed when a linear map is given by matrix multiplication. Consider, for example, a linear map T: Rn → Rm given by the rule Tx = Ax for A ∈ Matm×n(R). Recall that the image of the map T is the same as the set of all vectors which can be written as linear combinations of the columns of A (this is why some books call it the column space of A.) Thus, the rank is the maximal number of linearly independent columns, as such a collection of linearly independent columns of A is a basis for the image. But we know that a set of k vectors is linearly independent if and only if the matrix whose columns are the k vectors has k pivots, and so we deduce that the rank of the map T is precisely the number of pivots of A. The nullity is the dimension of the kernel, and each free variable of A contributes a vector which is in a basis of the kernel (think about using Gauss-Jordan to solve Ax = 0). It is thus clear that the nullity of the map T can be computed by counting free variables, or equivalently by subtracting the number of pivots from the total number of columns of A. We then have the obvious relationship: rank plus nullity gives the number of columns, which is just the dimension of the domain Rn. This is the rank-nullity theorem, as stated for matrices. We will show it generally:
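
For a quick numerical sanity check of this relationship (a sympy sketch, not part of the original notes; the matrix is just a sample):

from sympy import Matrix

A = Matrix([[1, 2, 3], [2, 4, 6], [1, 0, 1]])    # a sample 3x3 matrix
rank = A.rank()                                   # number of pivots
nullity = len(A.nullspace())                      # number of free variables
print(rank, nullity, rank + nullity == A.cols)    # 2 1 True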

Theorem 6.1. Rank-Nullity

Let V be a finite dimensional F-vector space, and let T: V → W be a linear map. Then dimF V = dimF T(V ) + dimF ker T = rank T + null T .


Proof. Since V is finite dimensional, there exists a basis B of V . Moreover, since ker T ⊂ V is a subspace, it is itself a finite dimensional vector space, and it thus possesses a basis. Let B = {u1, . . . , uk, v1, . . . , vr} be a basis of V such that ker T = span {u1, . . . , uk}. We claim several things: that we can indeed procure a basis of V satisfying this property, and that {Tv1, . . . , Tvr} are a basis of the image.

For the first claim, note that we can start with any basis B of V and some basis {u1, . . . , uk} of K = ker T ⊂ V , where k = dimF K. Assume that dimF V = n. Then to produce a basis of the form above, we start by replacing a vector of B by u1. If the resulting set fails to be linearly independent, then we choose a different vector in B to be replaced by u1. I claim there is a choice such that the modified set is still a basis. For if not, then u1 is in the span of any n − 1 vectors in B. But then we have a pair of distinct linear relations involving u1, and by subtracting these we obtain a nontrivial linear relation involving the elements of B, contradicting the linear independence of the vectors in B. Thus, we may choose to replace a vector of the basis with u1 to form a different basis. The set of elements in B − {u1} is then a basis of an n − 1 dimensional subspace complementary to span {u1}, and we can iterate the process of replacement by elements of the basis of K, until we’ve exhausted ourselves of the ui, i = 1, . . . , k. The final set B is a basis of the form given above, where n = k + r, and we know that k = null T is the nullity.

For the second claim, observe that the image of T satisfies

T(V ) = span {Tu1, . . . , Tuk, Tv1, . . . , Tvr}
     = span {0, . . . , 0, Tv1, . . . , Tvr}
     = span {Tv1, . . . , Tvr} .

Thus the set {Tv1, . . . , Tvr} spans the image. We need to show that this set is linearly independent. We prove this by contradiction. Suppose that there is a nontrivial relation ∑_{i=1}^{r} aiTvi = 0. Then

T( ∑_{i=1}^{r} aivi ) = 0 =⇒ ∑_{i=1}^{r} aivi ∈ ker T .

Since {u1, . . . , uk} are a basis of ker T, we then can express the linear combination of the vis as a linear combination of the ujs:

∑_{i=1}^{r} aivi = ∑_{j=1}^{k} bjuj .

We thus obtain a relation

a1v1 + . . . + arvr − b1u1 − . . . − bkuk = 0 ,

and since at least one of the ais is nonzero, this relation is nontrivial. This contradicts the linear independence of the elements of B. Thus, the assumption that there exists a non-trivial linear relation on the set {Tv1, . . . , Tvr} is untenable. We conclude that {Tv1, . . . , Tvr} is a basis of the image, so the rank is then r.

It is therefore clear that

dimF V = n = r + k = dimF T(V ) + dimF ker T = rank T + null T .

Let’s examine the consequences of this theorem briefly. First, note that if a map T: V → W is an injection from a finite dimensional vector space V , then the kernel has dimension 0, and by rank-nullity we have that the dimension of the image is the same as the dimension of the domain. In particular, if a linear map is injective, its image is an “isomorphic copy” of the domain, and one may refer to such maps as linear embeddings, since we can imagine that we are identifying the domain with its image as a subspace of the target space.


If we have a surjective map T: V → W from a finite dimensional vector space V , then the image has the same dimension as W . We see that the dimensions then satisfy

dimF ker T = dimF V − dimF W ,

whence we see that the nullity is the difference in the dimensions of the domain and codomain for a surjective map. We can interpret this as follows: to cover the space W linearly by V , we have to squish extra dimensions, nullifying a subspace (the kernel) whose dimension is complementary to that of W .

Finally, of course, in a linear isomorphism T: V → W , we have injectivity and surjectivity, and so in particular we have null T = 0 and dimF V = dimF W = rank T.

6.2 Column Space, Null Space, Row Space

This section introduces some language which is seen in many linear algebra textbooks for talking about the various subspaces associated to a linear map defined by matrix multiplication. We will presume a linear map T: Rn → Rm throughout, given by Tx = Ax for some matrix A ∈ Matm×n(R).

Definition 6.3. The column space of the matrix A is the span of the columns of A. Observe that the column space is thus a subspace of Rm; indeed, it is just another name for the image of the map T, i.e. Col A = T(Rn) ⊆ Rm.

Definition 6.4. The row space of a matrix A is the span of the rows of A, and is denoted Row A. Technically, this is a subspace of Mat1×n(R), but often one identifies the row space with a corresponding subspace of Rn (via the isomorphism ·τ : Mat1×n(R) → Rn sending a row vector to the corresponding column vector).

Definition 6.5. The null space (or right null space as it is sometimes called) of the matrix A is the space of vectors x such that Ax = 0. Note this is just another term for the kernel of the map T. There is a notion of a “left null space” of A, which is the kernel of the map whose matrix is Aτ . The right nullity is just the nullity (i.e. the dimension of the kernel of T), and the left nullity is the dimension of the left null space. I will tend to use the term kernel instead of null space, except when dealing with both left and right null spaces of a given matrix.

One can naturally identify rows with linear functions from Rn to R, and so there is a more formal viewpoint on the row space: it is a subspace of the dual vector space to Rn. We develop this idea with a few exercises. We first define duals in general:

Definition 6.6. Let V be a vector space over F. Then V ∗ = {f : V → F | f is F-linear} has a natural vector space structure induced by scaling and addition of functions, and when endowed with this structure is called the dual vector space to V , or the “space of linear functionals on V ”.

Exercise 6.3. Show that for any finite dimensional F-vector space V , V ∗ ∼= V (non-canonically).

Exercise 6.4. What geometric objects give a model of the dual vector space to R3?

By the preceding exercise, we see that the space of linear functionals on Rn is isomorphic to Rn. By fixing the standard basis as our basis, we can realize linear functionals as row vectors, and their action by the matrix product. Thus, we see that the row space of a matrix is a subspace of (Rn)∗, and we can pass through the aforementioned transposition isomorphism to Rn.

Exercise 6.5. What is the relationship between the row space of A and the column space of Aτ? What does rank-nullity tell us about the relationships of the dimension of the row space, the dimension of the column space, and the right and left nullities?


6.3 The General Solution At Last

We now will discuss the general solution to a linear system. We’ve already seen how to algorithmically solve a matrix equation of an inhomogeneous linear system Ax = b, where A ∈ Matm×n(R), x ∈ Rn and constant b ∈ Rm, using Gauss-Jordan. We wish to more deeply interpret these results in light of our knowledge of the various subspaces associated to a linear map (or to a matrix), and the rank-nullity theorem. Throughout, assume A ∈ Matm×n(R), and b ∈ Rm fixed. We begin with a few observations.

Observation 6.1. Let K = ker(x ↦ Ax). Note this is precisely the space of solutions to the homogeneous linear system Ax = 0. Suppose x0 ∈ K, and that xp solves the inhomogeneous system Ax = b. Then note that xp + x0 is also a solution of the inhomogeneous system:

A(xp + x0) = Axp + Ax0 = b + 0 = b .

Observation 6.2. If xp and x′p both solve the inhomogeneous system, then they differ by an element of K:

A(xp − x′p) = Axp − Ax′p = b − b = 0 ,
=⇒ xp − x′p ∈ K .

These two observations together imply the following: given any particular solution xp to the inhomogeneous linear system Ax = b, we can obtain any other solution by adding elements of the kernel of the map x ↦ Ax. In particular, we can describe the general solution to Ax = b as being of the form

x = xp + x0 ,

for x0 ∈ K.

When we reduce the augmented matrix [A | b] and write the solution as a sum of a constant vector with coefficient 1 and a linear combination of vectors with coefficients coming from the free variables, we are in fact describing a general solution of the above form. The constant vector is an example of a particular solution, while the remaining vectors which are scaled by free variables give a basis of the null space.

We thus know how to solve a general linear system and produce a basis for the null space. How do we find a basis of the column space? The procedure is remarkably simple once we’ve reduced the matrix A: simply look for the pivot columns, and then take the corresponding columns of the original matrix A, and this collection gives a basis of the image of the map x ↦ Ax.
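
A computational sketch of this recipe (sympy, not part of the original notes; the matrix and right hand side are just samples):

from sympy import Matrix, linsolve, symbols

A = Matrix([[1, 2, 1], [2, 4, 0]])
b = Matrix([3, 2])

# Null space basis (solutions of Ax = 0) and a parametrized general solution of Ax = b.
kernel_basis = A.nullspace()
x1, x2, x3 = symbols('x1 x2 x3')
general = linsolve((A, b), [x1, x2, x3])
print(kernel_basis)   # a basis of ker(x -> Ax)
print(general)        # particular solution plus free-variable multiples of the kernel vectors

# Pivot columns of the original A give a basis of the column space (the image).
_, pivot_cols = A.rref()
print([A.col(j) for j in pivot_cols])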

6.4 Exercises

Recommended exercises from Bretscher’s text:

• Any (really many) of the problems at the end of section 3.1. Especially 9-12, 19, 20, 22-31, 35, 36, 42, 43, 48-50.

• Problems 28, 29, 34-44 at the end of section 3.2

• Problems 33-39 at the end of section 3.3

• Problems 1-10 and 16-39 at the end of section 4.1


7 A Tour of Linear Geometry in R2 and R3

This section was covered in class primarily on the dates 3/6, 3/9, and 3/11. Please read Bretscher, chapter 2, section 2. I covered more than the contents of Bretscher, providing a number of pictures, proofs and examples. The notes will be updated to more completely reflect what was stated in class at some point, but in the interim, please find a classmate’s notes if you were unable to attend, or attempt to prove the given formulae by constructing your own compelling geometric arguments. The outline of what was covered in class and the statements of the main formulae may be found below, with propositions, theorems, and definitions generalized to Rn where applicable.

7.1 Geometry of linear transformations of the plane

Before exploring linear transformations of the plane, we need to understand the Euclidean structure of R2. As it happens, this structure comes from the dot product, and indeed the dot product gives a Euclidean structure to any Euclidean vector space Rn.

Proposition 7.1 (Bilinearity of the dot product). Given a fixed vector u ∈ Rn, x ↦ u · x gives a linear map from Rn to R. Since the dot product is commutative, we have in particular that the map · : Rn × Rn → R is bilinear (linear in each factor).

Theorem 7.1 (Geometric interpretation of the dot product). Let u and v be vectors in Rn. Then

u · v = ‖u‖‖v‖ cos θ ,

where θ ∈ [0, π] is the (lesser) angle between the vectors u and v as measured in the plane they span.

Remark 7.1. It suffices to prove the above in R2, since the angle is always measured in the two dimensional subspace span {u, v} ∼= R2. We used elementary trigonometry to deduce this.

Proposition 7.2 (Euclidean orthogonality from the dot product). Two vectors u, v ∈ Rn are orthogonal if and only if u · v = 0.

Definition 7.1. Given u, v ∈ Rn, the orthogonal projection of v onto u is the vector

proj_u v := ((u · v)/‖u‖^2) u .

Remark 7.2. If instead we take a unit vector û ∈ S1 := {x | ‖x‖ = 1}, then the formula simplifies to

proj_û v = (û · v) û .
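
As a small numerical illustration (numpy, not part of the original notes; the vectors are just samples), the projection formula and its defining orthogonality property look like this:

import numpy as np

def proj(u, v):
    # Orthogonal projection of v onto the line spanned by u: ((u . v) / ||u||^2) u
    return (np.dot(u, v) / np.dot(u, u)) * u

u = np.array([1.0, 1.0])
v = np.array([2.0, 0.0])
p = proj(u, v)
print(p)                 # [1. 1.]
print(np.dot(v - p, u))  # ~0: the residual v - proj_u(v) is orthogonal to u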

Exercise 7.1. Prove the above remark using the formula in the definition of orthogonal projection. Then give a matrix for the operator proj_u for u ∈ R2, and show that this is the same as the matrix for proj_û where û := u/‖u‖ is the normalization of u. Find also the corresponding matrices if u ∈ R3.

We may use the above construction to understand reflections through 1-dimensional subspaces of R2 (namely, reflections across lines through the origin). The remaining theorems and exercises of this subsection concern linear automorphisms of R2, i.e. bijective linear maps of R2 to itself. In particular, rotations and reflections are explored through the following exercises.

Exercise 7.2. Prove the following theorems for the rotation and reflection formulae in the plane (this was done in class!):


Theorem 7.2. Given an angle θ ∈ [0, 2π), the operator for counter-clockwise rotation of R2 by the angle θ has standard matrix

Rθ =
cos(θ)  −sin(θ)
sin(θ)   cos(θ)
.

Using the isomorphism C ∼= R2 given by mapping the basis (1, i) to (e1, e2), the operator Rθ corresponds to the 1D C-linear operation

Rθ(z) = e^{iθ} z .

Theorem 7.3. Let L ⊂ R2 be a line through 0, and suppose u is a vector spanning L. Then the operator giving reflection through L is

ML = (2 proj_u − I2) : R2 → R2 ,

and it is well defined independently of the choice of u spanning L. If θ ∈ [0, π) is the angle made by L with the x-axis, then the matrix of ML in the standard basis of R2 is

cos(2θ)   sin(2θ)
sin(2θ)  −cos(2θ)
.

We can thus determine a reflection by the angle θ ∈ [0, π) made by the line L with the x-axis, and may also write Mθ to indicate the dependence on this parameter.

Moreover, if

A =
a   b
b  −a

for a, b ∈ R such that a^2 + b^2 = 1, then A represents a reflection through the line L = span(u), where u is any vector lying on the line bisecting the angle between the first column vector of A and e1.

Using the isomorphism C ∼= R2 given by mapping the basis (1, i) to (e1, e2), the operator Mθ corresponds to the operation

Mθ(z) = e^{2iθ} z̄ ,

where z̄ = Re z − i Im z is the complex conjugate of z. (Note this operation is not, strictly speaking, complex linear, since complex conjugation is not C-linear.)

Exercise 7.3. Given an arbitrary nonzero complex number a ∈ C∗ = C − {0}, what is the effect of the map z ↦ az? Give a matrix representation when this is viewed as a map of R2.

One then has the following conclusion about the relation between complex and real representations of rigid linear motions in the plane: “rigid linear motions of R2 are captured by C-linear motions of C together with conjugation; that is, C-linear motions of C are more restricted (they preserve orientation), but including the complex conjugation operation recovers R-linear motions of C as an R-vector space.”

Example 7.1. Let L be the line in R2 through the origin making angle 3π/4 with the x-axis, and let M be the line in R2 through the origin making angle π/6 with the x-axis. Find the standard matrix for the composition T = MM ◦ ML of reflections through the lines L and M . What is the geometric interpretation of this composition? Write a formula for it using complex numbers.

Solution: By the above theorems, if a line L is spanned by a unit vector u = cos θ e1 + sin θ e2, then we can compute the reflection through L as

ML(x) = 2(u · x)u − x = (2 proj_u − I)x ,

ML(x) = 2(u · x)u− x = (2proju − I)x ,


and the matrix (2 proj_u − I) is given as

(2 proj_u − I) =
2cos^2 θ − 1     2 sin θ cos θ
2 sin θ cos θ    2 sin^2 θ − 1
=
cos(2θ)   sin(2θ)
sin(2θ)  −cos(2θ)
.

Thus, first we determine the unit vectors associated to each line:

L = span { (cos(3π/4), sin(3π/4)) } = span { (−√2/2, √2/2) } ,
M = span { (cos(π/6), sin(π/6)) } = span { (√3/2, 1/2) } .

Let A be the matrix such that ML(x) = Ax and let B be the matrix such that MM(x) = Bx.

We then have

A =
 0  −1
−1   0
,

B =
1/2    √3/2
√3/2  −1/2
.

The composition of the maps T = MM ◦ ML has matrix equal to the matrix product

BA =
−√3/2  −1/2
 1/2   −√3/2
.

Note that this matrix is the matrix of a rotation! Since sin θ = 1/2 and cos θ = −√3/2, we conclude that the angle of the associated counterclockwise rotation is θ = 5π/6, and therefore

MM ◦ ML = R_{5π/6} .

As a complex linear map, T can be realized by z ↦ e^{5πi/6} z.
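
One can sanity-check this composition numerically. Here is a minimal numpy sketch (not part of the original notes) using the reflection and rotation matrices from the theorems above:

import numpy as np

def reflection(theta):
    # Standard matrix of reflection across the line through 0 at angle theta with the x-axis.
    return np.array([[np.cos(2 * theta),  np.sin(2 * theta)],
                     [np.sin(2 * theta), -np.cos(2 * theta)]])

def rotation(theta):
    # Standard matrix of counterclockwise rotation by theta.
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

A = reflection(3 * np.pi / 4)   # reflection through L
B = reflection(np.pi / 6)       # reflection through M
print(np.allclose(B @ A, rotation(5 * np.pi / 6)))   # True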

Exercise 7.4. Give matrix-vector formulae for rotation about an arbitrary point of R2 and reflection through an arbitrary line (not necessarily containing 0).

Exercise 7.5. Characterize all bijective linear maps of R2 which do not decompose as a composition involving rotations or reflections.

Exercise 7.6. (Hard!) Describe an algorithm which, for a given matrix A describing a bijective linear map x ↦ Ax of R2, produces a decomposition in terms of reflections, rotations, and the maps described in the previous exercise. Can one decompose any linear automorphism of R2 using just reflections and the maps from the previous exercise (i.e., can we exclude rotations in our decompositions)?

7.2 Geometry of linear transformations of three-dimensional space

Below is a summary of the contents of the two lectures given on the geometry of linear transformations of R3. If you missed those lectures, then it is advised you copy notes and discuss the material with a classmate or myself during office hours. The essential points, such as computing 3 × 3 determinants, are reviewed in future sections.

• Projections - the formula for projection onto a line appears the same. Can you find a formula for projection onto a plane?


• Planes and normals - This is largely overlap material with Math 233; I chose to present it from a linear algebra perspective in class as a point of unification (e.g. deriving the equation of a plane, which we’ve used for a while without justification).

• Reflections in planes - the visual argument for this is analogous to the argument used to derive reflections across a line in R2.

• Cross products and determinants/Triple Scalar Products - The 3 × 3 determinant was introduced and used as a mnemonic for the computation of the 3D cross product. Note that there is no cross product in dimensions other than 3 and 7 (though there’s a pseudo-cross product in R2 which returns the signed area of the parallelogram spanned by the pair of vectors being multiplied). It was observed that the 3 × 3 determinant is in fact the signed volume of a parallelepiped spanned by the (column or) row vectors. This construction is equivalent to dotting the vector corresponding to the first row with the cross product of the vectors corresponding to the second and third rows.

• Spatial Rotations - Using the cross product and projections, we obtained a beautiful formula for rotation of R3 about an axis by an angle θ.

8 Coordinates, Basis changes, and Matrix Similarity

Please read sections 3.4, 4.3, and 4.4 in Bretscher for the presentation and examples of the following topics.

8.1 Linear Coordinates in Rn

8.2 Coordinates on a finite dimensional vector space

8.3 Change of Basis

9 Determinants and Invertibility

Please read Bretscher, chapter 6; this section of the notes will include definitions and proofs auxiliary to those provided by the text.

9.1 Review of Determinants in 2 and 3 Dimensions

Recall that we defined the determinant of a 2× 2 matrix A as follows:

det A := a11a22 − a21a12 , where A = (aij) ∈ Mat2×2(F) .

Note that this definition can be applied for matrices over any field (or more generally, even over a ring, such as the integers). Note also that det A = det Aτ .

For 2 × 2 matrices over a field, we know that invertibility of the matrix is equivalent to non-vanishing of its determinant. A natural question is whether we can generalize this to square matrices of any size. Recall the geometric interpretation of the 2 × 2 determinant for matrices with real entries:


Example 9.1. (HW 1 Bonus 1) Show that ad − bc is the signed area of the parallelogram spanned by u = (a, c)^τ and v = (b, d)^τ (the columns of the matrix), where the sign is positive if rotating u counter-clockwise to be colinear to v sweeps into the parallelogram, and is negative otherwise.

Solution: First, let us suppose u and v are unit vectors, i.e. a^2 + c^2 = 1 = b^2 + d^2. Geometrically, they are vectors lying on the unit circle, and so we can express their components as trigonometric functions of the angles they make with the x-axis. Let u make an angle of α with the x-axis and v make an angle of β with the x-axis. Then the angle between the vectors is β − α, and from the sine subtraction formula: sin(β − α) = cos(α) sin(β) − cos(β) sin(α) = ad − bc.

vectors lying on the unit circle, and so we can express their components as trigonometric functionsof the angles they make with the x axis. Let u make an angle of α with the x axis and v makean angle of β with the x axis. Then the angle between the vectors is β − α, and from the sinesubtraction formula: sin(β − α) = cos(α) sin(β)− cos(β) sin(α) = ad− bc.

Recall that the area of a parallelogram is the base times an altitude, formed by taking an orthogonal line segment from one side to an opposite side. From a picture, one sees that the area of a parallelogram can be expressed as the product of side lengths times the sine of the internal angle between adjacent sides. If the sides are the unit vectors u and v, then the area is |sin(β − α)|. Thus, for unit vectors, ad − bc is ± the area, with the sign positive if the angle β − α ∈ (0, π), negative if β − α ∈ (π, 2π), and 0 if the angle β − α = 0 or π (the colinear case). Thus, for the non-colinear case, if u sweeps into the parallelogram when rotated counterclockwise towards v, the sign is positive. Note that switching the order of the vectors switches the sign of the determinant ad − bc, and this is consistently reflected in the convention regarding the vectors’ orientations.

For general vectors, one scales the area of the parallelogram as well as the components, and discovers that the scale factors for the area and the expression ad − bc are identical: e.g. if we scale u by λ, then the area scales by λ, and so do the components:

λu = (λa, λc)^τ ,

so the determinant scales to (λa)d − b(λc) = λ(ad − bc). Thus, the determinant is the signed area, accounting for the orientation/ordering of the two vectors.

We also defined determinants for 3 × 3 matrices, and discovered that our generalization has an analogous geometric interpretation as a signed volume in R3 of the parallelepiped whose sides are determined by the column vectors (or row vectors) of the matrix:

| a11 a12 a13 |
| a21 a22 a23 | = a11(a22a33 − a32a23) − a12(a21a33 − a23a31) + a13(a21a32 − a22a31) .
| a31 a32 a33 |

See Bretscher, section 6.1, for a discussion of Sarrus’s rule, and why it fails to generalize to give determinants for n > 3.

9.2 Defining a General Determinant

For the definition provided in class, please read Bretscher, section 6.1. Here, I rephrase his definition (which uses “patterns” and “inversions”) in the modern, standard language. We need to define a very important object, called a permutation group, in order to give the modern definition of the determinant. This definition is very formal, and is not necessary for the kinds of computations we will be doing (see instead the discussions of computing determinants by row reduction, or via expansion by minors.) It is recommended you read Bretscher’s treatment or the in-class notes regarding patterns and signatures first, before approaching this section. The end of the section describes how to define determinants in the general setting of finite dimensional vector spaces over a field, where instead of matrices we consider maps of the vector space to itself, called endomorphisms.


Definition 9.1. Consider a set of n symbols, e.g. the standard collection of integers 1 through n: {1, . . . , n}. Define the permutation group on n symbols to be the set of all bijective maps of {1, . . . , n} to itself, with group operation given by composition. See HW 4 for the definition of a group and an exercise realizing a representation of this group. Denote this permutation group by Sn.

A common notation for the above group’s elements is cycle notation. For example, let us consider S3, the permutation group of the symbols {1, 2, 3}. Consider the map which sends 1 to 2, 2 to 3 and 3 to 1. We notate this element as (1 2 3). We interpret the symbol as telling us where to send each element as follows: if an integer m appears directly to the right of k, then k is mapped to m, and the last integer on the right in the cycle is mapped to the first listed on the left. The cycle (1 2 3) clearly gives a bijection, so we can regard (1 2 3) ∈ S3. This is called a cyclic permutation, as it consists of a single cycle. Another special type of permutation is a cyclic permutation with just two elements, which is called a transposition. An example would be the map which sends 1 to itself, but swaps 2 and 3. This is notated (2 3) ∈ S3. The lack of the appearance of 1 tells us that 1 is mapped to itself (sometimes, this transposition would be denoted (1)(2 3) to emphasize this.) The convention I will follow is that if an integer is missing from a cycle, then it is sent to itself by that cycle. To see the effect of the map determined by a cycle, we’ll denote its action sometimes by writing how it permutes the ordered tuple (1, . . . , n), e.g. if σ = (1 3) ∈ S3, then

σ : (1, 2, 3) ↦ (3, 2, 1) .

One can cyclically reorder any cycle and it will represent the same map, e.g. (1 2 3) = (2 3 1) = (3 1 2). By convention one usually starts the cycle with the lowest integer on which the cycle acts nontrivially. The empty cycle () represents the identity map on the set of symbols.

One can “multiply” cycles to compute a composition of permutations as follows:

1. Two adjacent cycles represent applying one cycle after another, from right to left. For example, in permutations of 6 symbols, S6, the cycles σ = (1 2 3) and σ′ = (3 5 4 6) can be composed in two ways:

σσ′ = (1 2 3)(3 5 4 6), which acts as (1, 2, 3, 4, 5, 6) ↦ (2, 3, 5, 6, 4, 1) ,

σ′σ = (3 5 4 6)(1 2 3), which acts as (1, 2, 3, 4, 5, 6) ↦ (2, 5, 1, 6, 4, 3) .

2. Any cycle product can be rewritten as a product of disjoint cycles. Disjoint cycles commute with each other, e.g. (1 2)(3 4) = (3 4)(1 2) ∈ S4 represents the map

(1, 2, 3, 4) 7→ (2, 1, 4, 3) .

If cycles are not disjoint, to write them as disjoint cycles, one reads where the rightmost cycle sends a given symbol, then scans left to find its image in the cycles to the left, then follows this image to the left, etc. E.g. using the examples from (1):

σσ′ = (1 2 3)(3 5 4 6) = (3 5 4 6 1 2) = (1 2 3 5 4 6) .

σ′σ = (3 5 4 6)(1 2 3) = (1 2 5 4 6 3) .

In these cases the result is a single cycle (which is therefore a product of disjoint ones). A more interesting example is the product

(1 3 5)(5 6)(1 4 2 6) = (1 4 2)(3 5 6) .


3. Any cycle can be decomposed as a product of (not necessarily disjoint) transpositions. E.g.

(1 2 3) = (1 2)(2 3)

σσ′ = (1 2)(2 3)(3 5)(5 4)(4 6)

A permutation is called even if it can be decomposed into an even number of transpositions, otherwise it is said to be odd.
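
To make the cycle arithmetic concrete, here is a small Python sketch (not part of the original notes; the helper names are hypothetical) that composes permutations given as cycles, applying the rightmost cycle first as in the convention above, and verifies the disjoint-cycle example:

from functools import reduce

def cycle_to_map(cycle, n):
    # Turn a cycle like (1, 3, 5) into a dict on the symbols 1..n (missing symbols are fixed).
    perm = {i: i for i in range(1, n + 1)}
    for a, b in zip(cycle, cycle[1:] + cycle[:1]):
        perm[a] = b
    return perm

def compose(*cycles, n=6):
    # Compose cycles right-to-left, i.e. the rightmost cycle acts first.
    maps = [cycle_to_map(c, n) for c in cycles]
    return {i: reduce(lambda x, m: m[x], reversed(maps), i) for i in range(1, n + 1)}

lhs = compose((1, 3, 5), (5, 6), (1, 4, 2, 6))
rhs = compose((1, 4, 2), (3, 5, 6))
print(lhs == rhs)   # True: (1 3 5)(5 6)(1 4 2 6) = (1 4 2)(3 5 6)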

Exercise 9.1. Argue that the notions of evenness and oddness of a permutation are well defined. Thus you must show that if a permutation has one decomposition into evenly many transpositions, then any decomposition into transpositions has an even number of transpositions, and similarly if it admits an odd decomposition, then all decompositions are odd.

Definition 9.2. A given permutation σ has signature sgn σ = 1 if σ is even and −1 if σ is odd. By the above exercise, this is well defined, and in fact determines a unique map sgn : Sn → {−1, 1} such that sgn(σ1σ2) = sgn(σ1) sgn(σ2) and with sgn(τ) = −1 for any single transposition τ ∈ Sn.

The “patterns” Bretscher speaks of are actually the result of applying permutations to the indices of entries in the matrix. In particular, one can define a pattern as follows. Let’s assume we are given a matrix A ∈ Matn×n(F). Fix a permutation σ ∈ Sn. Then we obtain a pattern

Pσ = (a_{1,σ(1)}, a_{2,σ(2)}, . . . , a_{n,σ(n)}) .

The claim is that all patterns are of this form and that the signature of the pattern is equal to the signature of the associated permutation. Given this fact, one can realize Bretscher’s definition as the more common Lagrange formula for the determinant:

Definition 9.3. The determinant of A ∈ Matn×n(F) is the scalar det A ∈ F given by

det A := ∑_{σ∈Sn} sgn(σ) ∏_{i=1}^{n} a_{iσ(i)} = ∑_{σ∈Sn} sgn(σ) ( a_{1σ(1)} · · · a_{nσ(n)} ) .
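
This formula can be implemented directly, summing over all n! permutations. The following Python sketch (not part of the original notes, and far too slow for large n, but faithful to the definition) computes the sign by counting inversions:

from itertools import permutations

def sign(perm):
    # Sign of a permutation tuple (perm[i] is the image of i), via the parity of its inversions.
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def det(A):
    # Sum over all sigma of sgn(sigma) * a_{0,sigma(0)} * ... * a_{n-1,sigma(n-1)} (0-indexed).
    n = len(A)
    total = 0
    for sigma in permutations(range(n)):
        prod = 1
        for i in range(n):
            prod *= A[i][sigma[i]]
        total += sign(sigma) * prod
    return total

A = [[4, 1, 4], [1, 1, 1], [4, 1, 4]]
print(det(A))   # 0, consistent with the nontrivial kernel found in Exercise 5.5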

One can readily recover some basic properties of the determinant from this definition. For example, suppose one were to swap the ith and jth columns of a matrix A. This is equivalent to acting on the matrix by the transposition τ = τij = (i j) ∈ Sn. Denote the image matrix as τijA and let A = (akl), so that (τijA)_{kl} = a_{k,τ(l)}. Note that sgn τ = −1 and sgn(σ) = −sgn(τσ) for any σ ∈ Sn. Moreover, since Sn is a group, the map Sn → Sn given by σ ↦ τσ is a bijection. Thus

det(τijA) = ∑_{σ∈Sn} sgn(σ) ∏_{k=1}^{n} a_{k,τ(σ(k))}
= ∑_{σ∈Sn} −sgn(τσ) ∏_{k=1}^{n} a_{k,(τσ)(k)}
= −∑_{τσ∈Sn} sgn(τσ) ∏_{k=1}^{n} a_{k,(τσ)(k)}
= −∑_{σ∈Sn} sgn(σ) ∏_{k=1}^{n} a_{k,σ(k)}
= −det(A) .

Exercise 9.2. Use the above definition to show that det A = det Aτ for any matrix A ∈ Matn×n(R).

Exercise 9.3. Use the above definition to describe the effect of the other elementary row/column operations on the determinant of a square matrix.


Let us now generalize our definition of determinants to a suitable class of maps of abstract finite dimensional vector spaces. Given a finite dimensional vector space V over a field F, we can consider endomorphisms of V and their determinants:

Definition 9.4. Let V be a vector space over F. Then a linear endomorphism, vector space endomorphism, or simply endomorphism of V as an F-vector space is an F-linear map T: V → V. We denote the space of all endomorphisms of the F-vector space V by EndF(V).

Let us consider a finite dimensional vector space V, with dimension n. Thus, there is a basis A of V consisting of n vectors, giving us a coordinate system in Fn. If T ∈ EndF(V) is an endomorphism of V, we can find a matrix A representing T relative to the basis A.

Definition 9.5. For V an F-vector space with dimF V = n, the determinant of an endomorphism T: V → V is the determinant of any matrix A ∈ Matn×n(F) representing T in coordinates determined by some basis A of V:

det T := det A, where A ∈ Matn×n(F) is such that [Tv]_A = A[v]_A for all v ∈ V.

We need to check that this is a reasonable definition. Mathematicians speak of checking if a given construction or definition is “well-defined”. In this case, that means we need to check that the determinant depends only on the endomorphism T, and not on the choice of basis A of V.

Claim 9.1. The determinant of an endomorphism T: V → V of a finite dimensional vector space is well defined.

Proof. Suppose A and B are bases of V, and A and B are the coordinate matrices of T ∈ EndF(V) relative to A and B respectively. It suffices to show that det A = det B. We know that A and B are similar, for if S is the change of basis matrix from A to B, i.e. the standard matrix of the isomorphism L_B ◦ L_A^{−1}: Fn → Fn, then AS = SB, whence B = S^{−1}AS. Then by properties of the determinant of a square matrix, we have:

\[
\begin{aligned}
\det B &= \det(S^{-1}AS) \\
&= (\det S^{-1})(\det A)(\det S) \\
&= (\det S^{-1})(\det S)(\det A) \\
&= \big(\det(S^{-1}S)\big)(\det A) \\
&= (\det I_n)(\det A) \\
&= \det A \, .
\end{aligned}
\]
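As a sanity check only (a NumPy sketch, not part of the proof), one can verify numerically that the determinant is unchanged under similarity:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
S = rng.standard_normal((5, 5))   # a random matrix is invertible with probability 1
B = np.linalg.inv(S) @ A @ S
print(np.isclose(np.linalg.det(A), np.linalg.det(B)))   # True
\end{verbatim}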

An alternative definition of general determinants of endomorphisms of a finite dimensional vector space is to define the determinant of a map as the product of its eigenvalues (see the next section). This alternative definition has the advantage of being completely coordinate free; one need not invoke coordinates directly in the definition, and it is clearly well defined since the eigenspectrum is determined only by the map itself.

We now consider the properties of the determinant.

Proposition 9.1. Let V be a finite dimensional vector space over the field F, dimF V = n. Then there are isomorphisms

\[ \operatorname{End}_{F}(V) \cong V^{n} := \underbrace{V \times \cdots \times V}_{n \text{ times}} \cong \operatorname{Mat}_{n \times n}(F) \cong F^{n \times n} \, . \]

Exercise 9.4. Prove the above proposition.


Definition 9.6. Given a product of vector spaces V1 × V2 × . . . × Vn, a map

T: V1 × V2 × . . . × Vn → F

is said to be multilinear if it is linear in each factor, i.e., if for any i ∈ {1, . . . , n}, any α, β ∈ F, any vectors xk ∈ Vk, and any yi ∈ Vi,

T(x1, x2, . . . , αxi + βyi, . . . , xn) = αT(x1, x2, . . . , xi, . . . , xn) + βT(x1, x2, . . . , yi, . . . , xn).

Definition 9.7. A multilinear map T: V × V × . . . × V → F is called alternating if and only if for any pair of distinct indices i, j ∈ {1, . . . , n}

T(x1, x2, . . . , xi, . . . , xj, . . . , xn) = −T(x1, x2, . . . , xj, . . . , xi, . . . , xn),

i.e. after swapping any pair of inputs, the map is scaled by −1 ∈ F. A multilinear map is called symmetric if and only if such a swap does not change the value of the map on its inputs.

Remark 9.1. Note that if F is of characteristic 2, then a map is alternating if and only if it is symmetric. Otherwise (e.g. over the fields we've worked with most, such as R, C, Q, or Fp with p ≠ 2) a map might be one but not the other, or might be neither.

Exercise 9.5. Show that any alternating multilinear map T: V × . . . × V → F evaluates to zero if it has repeated inputs, provided char F ≠ 2 (compare Remark 9.1). E.g. for an alternating bilinear map B: V × V → F, B(x, x) = 0 necessarily.

Theorem 9.1. Let V be a finite dimensional vector space over the field F, dimF V = n. There is a unique map D: EndF(V) → F satisfying the following properties:

(i.) D is multilinear and alternating when viewed as a map D: V^n → F,

(ii.) For any endomorphisms T, S ∈ EndF(V), D(T ◦ S) = D(T)D(S),

(iii.) D(IdV) = 1.

Exercise 9.6. Prove the above theorem and show that the map D is indeed the determinant as defined above. Note in particular that the multilinearity and alternating property of D should be independent of the choice of isomorphism EndF(V) ≅ V^n.

9.3 Expansion by Minors

We now show that one can recursively compute the determinant. It suffices to demonstrate that a recursive formula can be produced for a given A ∈ Matn×n(F). We work from the definition

\[ \det A := \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{i=1}^{n} a_{i\sigma(i)} = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \, a_{1\sigma(1)} \cdots a_{n\sigma(n)} \, . \]

Fix a particular index i ∈ {1, . . . , n} =: [n], and observe that

\[ \prod_{k=1}^{n} a_{k\sigma(k)} = a_{i\sigma(i)} \prod_{k \in [n] \setminus \{i\}} a_{k\sigma(k)} \, . \]

Let j := σ(i). The following exercise is to deduce the details leading to our recursive formula for determinant computation.

Exercise 9.7. Let P_σ = ∏_{k=1}^{n} a_{kσ(k)}, and take a_{ij} as above. Let P^{ij}_σ = ∏_{k∈[n]\setminus\{i\}} a_{kσ(k)}. Let 𝒫_σ and 𝒫^{ij}_σ be the respective patterns corresponding to these products (taken in order of the first index). Show that


a. sgn(σ) = sgn(𝒫_σ),

b. sgn(𝒫_σ) = (−1)^{i+j} sgn(𝒫^{ij}_σ),

c. sgn(σ) P_σ = (−1)^{i+j} a_{ij} sgn(𝒫^{ij}_σ) P^{ij}_σ.

Theorem 9.2 (Expansion by Minors/The Laplace Expansion). Let A = (aij) ∈ Matn×n(F). Fix a column (or row), with index j (or i respectively). Denote by Aij the submatrix of A obtained by removing the i-th row and j-th column. Then

\[ \det A = \sum_{i=1}^{n} (-1)^{i+j} a_{ij} \det(A_{ij}) = \sum_{j=1}^{n} (-1)^{i+j} a_{ij} \det(A_{ij}) \, . \]

The rightmost formula is an expansion by minors along the i-th row, and the middle formula is an expansion by minors down the j-th column. Note that the pattern for choosing the signs, as shown in the preceding exercise, is a checkerboard, with the upper left corner positive:

\[ \begin{bmatrix} + & - & + & - & \cdots \\ - & + & - & + & \cdots \\ + & - & + & - & \cdots \\ - & + & - & + & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots \end{bmatrix} \]
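A minimal recursive sketch of expansion by minors down the first column, in Python (the function names are my own; a practical implementation would instead use an LU-based routine such as numpy.linalg.det, since this recursion has factorial cost):

\begin{verbatim}
def minor(A, i, j):
    # Submatrix of A with row i and column j removed (0-indexed).
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det_by_minors(A):
    n = len(A)
    if n == 1:
        return A[0][0]
    # Expand down the first column: det A = sum_i (-1)^i * a_{i0} * det(A_{i0}).
    return sum((-1) ** i * A[i][0] * det_by_minors(minor(A, i, 0))
               for i in range(n))

A = [[2, 0, 1],
     [1, 3, -1],
     [0, 5, 4]]
print(det_by_minors(A))   # 39, matching the pattern-sum computation above
\end{verbatim}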

Exercise 9.8. Let A ∈ Matn×n(R) and suppose that k is a positive integer such that A has a k × k minor which has nonzero determinant, and such that there are no minors of larger size in A with nonzero determinant (note, the minor might be A itself). Show that rk A = k. Moreover, show that if rk A = k for some k, then the largest size of a minor with nonzero determinant in A is k × k.

9.4 Cramer’s Rule and the Inverse Matrix Theorem

Theorem 9.3 (Cramer's Rule). Consider the linear system Ax = b, where A ∈ Matn×n(R) and b ∈ Rn. Suppose x is the unique solution to the system, and xi = ei · x is the i-th component of x. Then

\[ x_i = \frac{\det(A_{\mathbf{b},i})}{\det(A)} \, , \]

where Ab,i is the matrix obtained from A by replacing the i-th column with the vector b.

Proof. We compute det(Ab,i) assuming that Ax = b. We write A = [v1, . . . , vn], where vj is the j-th column of A, as is usual. Then

\[
\begin{aligned}
\det(A_{\mathbf{b},i}) &= \big| \ \mathbf{v}_1 \ \mathbf{v}_2 \ \ldots \ \mathbf{v}_{i-1} \ \mathbf{b} \ \mathbf{v}_{i+1} \ \ldots \ \mathbf{v}_n \ \big| \\
&= \big| \ \mathbf{v}_1 \ \mathbf{v}_2 \ \ldots \ A\mathbf{x} \ \ldots \ \mathbf{v}_n \ \big| \\
&= \big| \ \mathbf{v}_1 \ \ldots \ (x_1\mathbf{v}_1 + \ldots + x_i\mathbf{v}_i + \ldots + x_n\mathbf{v}_n) \ \ldots \ \mathbf{v}_n \ \big| \\
&= \big| \ \mathbf{v}_1 \ \ldots \ x_i\mathbf{v}_i \ \ldots \ \mathbf{v}_n \ \big| \\
&= x_i \big| \ \mathbf{v}_1 \ \ldots \ \mathbf{v}_i \ \ldots \ \mathbf{v}_n \ \big| \\
&= x_i \det(A) \, .
\end{aligned}
\]

The fourth equality uses multilinearity in the i-th column together with the fact that a determinant with a repeated column vanishes, which lets us discard the terms xjvj with j ≠ i. Since x is the unique solution, A has nonzero determinant (as it must be invertible), and we conclude that for each i ∈ {1, . . . , n}

\[ x_i = \frac{\det(A_{\mathbf{b},i})}{\det(A)} \, . \]
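A direct Python sketch of Cramer's rule (illustrative only, since it computes one determinant per unknown; the function name cramer_solve is my own and NumPy is assumed):

\begin{verbatim}
import numpy as np

def cramer_solve(A, b):
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    d = np.linalg.det(A)
    x = np.empty(len(b))
    for i in range(len(b)):
        A_bi = A.copy()
        A_bi[:, i] = b                    # replace the i-th column with b
        x[i] = np.linalg.det(A_bi) / d
    return x

print(cramer_solve([[1, 2], [3, 4]], [5, 6]))    # [-4.   4.5]
\end{verbatim}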


An interesting corollary of this is the following algorithm for computing the inverse of an invertible matrix. Define the (i, j)-th cofactor of A to be cij = det Aij, where Aij is the matrix obtained from A by removing the i-th row and j-th column, and let C = ((−1)^{i+j} cij) be the signed cofactor matrix. Then the classical adjoint is

A∗ := C^τ .

Corollary 9.1. If A ∈ Matn×n(F) is invertible, then the inverse of A is given by

\[ A^{-1} = \frac{1}{\det A} A^{*} \, . \]
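A short sketch of this algorithm in Python, assuming NumPy; the function name is my own, and for matrices of any serious size the standard inversion routines are preferable:

\begin{verbatim}
import numpy as np

def classical_adjoint_inverse(A):
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    C = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            A_ij = np.delete(np.delete(A, i, axis=0), j, axis=1)  # drop row i, column j
            C[i, j] = (-1) ** (i + j) * np.linalg.det(A_ij)       # signed cofactor
    return C.T / np.linalg.det(A)                                 # (1/det A) * C^tau

A = [[2, 0, 1], [1, 3, -1], [0, 5, 4]]
print(np.allclose(classical_adjoint_inverse(A), np.linalg.inv(A)))   # True
\end{verbatim}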

Exercise 9.9. Prove the above corollary, using Cramer's rule. (The proof was given in class and can be found in Bretscher, but see if you can reproduce it without referencing anything other than Cramer's rule!)

10 Eigenvectors, Eigenvalues, and the Characteristic Equation

10.1 The concepts of eigenvectors and eigenvalues

Consider the following puzzle, whose solution is intuitive. We have three friends sitting around a table, and each is given some amount of putty: at time t = 0 minutes one of them has a > 0 grams of putty, another has b > 0 grams of putty, and the last individual has c > 0 grams of putty. They play with their respective wads of putty for nearly a minute, and then divide their wads into perfect halves. Exactly at the one minute mark, each person passes one half to the friend to their left, and the other half to the friend to their right. They then play with their wads of putty for nearly another minute before agreeing to again divide and pass exactly as they did at t = 1. For each integer number of minutes n, at exactly t = n they pass half of the putty in their possession at the time to the adjacent friends. What happens in the long term? Does any one friend end up with all of the putty, or most of the putty, or does it rather approach an equilibrium?

What we've described is an example of a discrete dynamical system. In this particular case, it is in fact a linear system: you can check that if xt is the vector describing the putty in the possession of our three friends at time t, then at time t = n we have xn = Axn−1, where

\[ A = \begin{bmatrix} 0 & 1/2 & 1/2 \\ 1/2 & 0 & 1/2 \\ 1/2 & 1/2 & 0 \end{bmatrix} \, . \]

It is easy to see that we can define a function, for nonnegative integral t:

\[ \mathbf{x}_{\bullet} : \mathbb{Z}_{\geq 0} \to \mathbb{R}^3 \, , \qquad \mathbf{x}_t = A^t \mathbf{x}_0 \, , \]

where x0 = ae1 + be2 + ce3 is the initial vector describing the putty held by each friend at time t = 0. The question of long term behavior is then stated mathematically as “Find

\[ \lim_{n \to \infty} \mathbf{x}_n = \lim_{n \to \infty} A^n \mathbf{x}_0 \, , \]

if it exists.”

One observation about this putty problem: at each step, the total amount of putty in the hands of the collective of friends is conserved. This might give us some hope that the limit exists, but of course we need to understand what it means for this system to converge for a given initial value x, and actually show that it does (if this is the case). Before we analyze this system in full, let us explore two-dimensional systems, and define an incredibly useful tool which will allow us to solve such linear discrete dynamical systems (LDDSs for short). A quick numerical simulation of the putty system appears below.
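Assuming NumPy, the following sketch (with arbitrarily chosen initial amounts) suggests what the answer to the puzzle ought to be, though of course it proves nothing:

\begin{verbatim}
import numpy as np

A = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])
x = np.array([6.0, 1.0, 2.0])     # sample initial amounts a = 6, b = 1, c = 2 grams

for n in range(20):
    x = A @ x

print(x)          # approximately [3. 3. 3.] for this initial condition
print(x.sum())    # 9.0 -- the total amount of putty is conserved at every step
\end{verbatim}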


Exercise 10.1. Generalize the putty problem in two different ways to feature n friends. Intuitively, can you argue that the long term behavior of each such system is qualitatively the same as that we expect in the original putty problem?

Example 10.1. Let us consider the matrix

\[ A = \begin{bmatrix} 1 & 2 \\ 0 & -1 \end{bmatrix} \, . \]

Suppose we wanted to understand the action of the map x ↦ Ax on the plane R2. One natural question is “does the map T(x) = Ax admit any invariant proper subspaces (in this case, lines) in R2?” That is, are there lines L such that the image T(L) of L is L? Suppose that L ⊂ R2 is a 1-dimensional subspace invariant under the map T. Then there is some nonzero vector v ∈ R2 such that L = span{v}. Then Tv = Av = λv for some scalar λ ∈ R, since Tv ∈ span{v}. We can rearrange this equation as

Av − λv = (A − λI2)v = 0 .

Thus, v ∈ ker(A − λI2). Since we assumed v ≠ 0, the kernel of A − λI2 is nontrivial, so A − λI2 is not invertible and it follows that det(A − λI2) = 0. This gives us a polynomial equation, which should determine λ. We call this the characteristic equation of the matrix A. Using the given values, we have

\[
\det\left( \begin{bmatrix} 1 & 2 \\ 0 & -1 \end{bmatrix} - \begin{bmatrix} \lambda & 0 \\ 0 & \lambda \end{bmatrix} \right)
= \det \begin{bmatrix} 1-\lambda & 2 \\ 0 & -1-\lambda \end{bmatrix}
= (1-\lambda)(-1-\lambda) = 0
\iff \lambda = 1 \ \text{or} \ \lambda = -1 \, .
\]

That we get two such scalars λ suggests that there are two subspaces invariant with respect to our map T. The λs are called eigenvalues, and the corresponding invariant subspaces are called eigenspaces (“eigen” means “own” or “self” in German, though it has come to mean “characteristic” or “self-similar” owing to its extensive appearance in modern mathematics as a prefix for gadgets coming from linear operators).

We can find a pair of eigenvectors describing our two eigenlines. Indeed, we can use the values of λ we found to solve the vector equations

(A − (1)I2)v1 = 0 ,

(A − (−1)I2)v2 = 0 .

Exercise 10.2. Find the vectors v1 and v2 above. Note that in class we deduced that we could read off the eigenvalues from the main diagonal of the matrix in this case, since the matrix is upper-triangular. In general, an upper triangular matrix or lower triangular matrix has eigenvalues precisely equal to the entries along the main diagonal. In class we used our eigenvectors to form a basis, and rewrote the linear map in eigencoordinates, exhibiting that in the appropriate coordinates, it was merely a reflection across one axis.
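For comparison, a numerical check of this example (a NumPy sketch, not part of the exercise):

\begin{verbatim}
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, -1.0]])
eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)    # 1 and -1 (order may vary)
print(eigenvectors)   # columns are eigenvectors: multiples of (1, 0) and (1, -1)
\end{verbatim}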

Please read Bretscher 7.1 - 7.3, which will cover much of the following topics:


10.2 The characteristic equation

10.3 Eigenvalue formulae for Traces and Determinants

10.4 Eigenspaces and Eigenbases

10.5 Diagonalization

10.6 Jordan Canonical form

11 Orthogonality and Inner Product Spaces

Will there be time?
