Contents
A Note to Students - READ THIS! . . . . . . . . . . . . . . . . . . iii
How To Make The Most Of This Book: SQ3R . . . . . . . . . . . . viii
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . ix
1 Vectors in Euclidean Space 1
1.1 Vector Addition and Scalar Multiplication . . . . . . . . . . . . . . 1
1.2 Bases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 Subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.4 Dot Product, Cross Product, and Scalar Equations . . . . . . . . . . 30
1.5 Projections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
2 Systems of Linear Equations 48
2.1 Systems of Linear Equations . . . . . . . . . . . . . . . . . . . . . 48
2.2 Solving Systems of Linear Equations . . . . . . . . . . . . . . . . . 53
3 Matrices and Linear Mappings 72
3.1 Operations on Matrices . . . . . . . . . . . . . . . . . . . . . . . . 72
3.2 Linear Mappings . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
3.3 Special Subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
3.4 Operations on Linear Mappings . . . . . . . . . . . . . . . . . . . 106
4 Vector Spaces 110
4.1 Vector Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
4.2 Bases and Dimension . . . . . . . . . . . . . . . . . . . . . . . . . 124
4.3 Coordinates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
5 Inverses and Determinants 143
5.1 Matrix Inverses . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
5.2 Elementary Matrices . . . . . . . . . . . . . . . . . . . . . . . . . 151
5.3 Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
5.4 Determinants and Systems of Equations . . . . . . . . . . . . . . . 173
5.5 Area and Volume . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
6 Eigenvectors and Diagonalization 182
6.1 Matrix of a Linear Mapping, Similar Matrices . . . . . . . . . . . . 182
6.2 Eigenvalues and Eigenvectors . . . . . . . . . . . . . . . . . . . . . 188
6.3 Diagonalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
6.4 Powers of Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . 204
7 Fundamental Subspaces 207
7.1 Bases of Fundamental Subspaces . . . . . . . . . . . . . . . . . . . 207
8 Linear Mappings 215
8.1 General Linear Mappings . . . . . . . . . . . . . . . . . . . . . . . 215
8.2 Rank-Nullity Theorem . . . . . . . . . . . . . . . . . . . . . . . . 221
8.3 Matrix of a Linear Mapping . . . . . . . . . . . . . . . . . . . . . 225
8.4 Isomorphisms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230
9 Inner Products 237
9.1 Inner Product Spaces . . . . . . . . . . . . . . . . . . . . . . . . . 237
9.2 Orthogonality and Length . . . . . . . . . . . . . . . . . . . . . . . 242
9.3 The Gram-Schmidt Procedure . . . . . . . . . . . . . . . . . . . . 256
9.4 General Projections . . . . . . . . . . . . . . . . . . . . . . . . . . 262
9.5 The Fundamental Theorem . . . . . . . . . . . . . . . . . . . . . . 270
9.6 The Method of Least Squares . . . . . . . . . . . . . . . . . . . . . 273
10 Applications of Orthogonal Matrices 280
10.1 Orthogonal Similarity . . . . . . . . . . . . . . . . . . . . . . . . . 280
10.2 Orthogonal Diagonalization . . . . . . . . . . . . . . . . . . . . . . 284
10.3 Quadratic Forms . . . . . . . . . . . . . . . . . . . . . . . . . . . 292
10.4 Graphing Quadratic Forms . . . . . . . . . . . . . . . . . . . . . . 299
10.5 Optimizing Quadratic Forms . . . . . . . . . . . . . . . . . . . . . 305
10.6 Singular Value Decomposition . . . . . . . . . . . . . . . . . . . . 309
11 Complex Vector Spaces 319
11.1 Complex Number Review . . . . . . . . . . . . . . . . . . . . . . . 319
11.2 Complex Vector Spaces . . . . . . . . . . . . . . . . . . . . . . . . 326
11.3 Complex Diagonalization . . . . . . . . . . . . . . . . . . . . . . . 333
11.4 Complex Inner Products . . . . . . . . . . . . . . . . . . . . . . . 336
11.5 Unitary Diagonalization . . . . . . . . . . . . . . . . . . . . . . . . 345
11.6 Cayley-Hamilton Theorem . . . . . . . . . . . . . . . . . . . . . . 353
Index 356
A NOTE TO STUDENTS - READ THIS!
Best Selling Novel: “My Favourite Math,” by Al G. Braw
Welcome to Linear Algebra!
In this book, you will be introduced to basic elements in linear algebra, including
vectors, vector spaces, matrices, linear mappings, and solving systems of linear equa-
tions. You will then be led to more advanced topics and applications including gen-
eral projections and unitary diagonalization. The strategy used in this book is to
help you understand an example of a concept before generalizing it. For example, in
Chapter 1 you will learn about vectors in Rn, and then in Chapter 4 we will extend
part of what we did with vectors in Rn to the abstract setting of a real vector space.
Of course, the better you understand the particular example, the easier it will be for
you to understand the more complicated concepts introduced later.
As you will quickly see, this book contains both computational and theoretical ele-
ments. Most students, with enough practice, will find the computational questions
fairly easy, but may have difficulty with the theory portion (definitions, theorems and
proofs). While studying linear algebra you should keep in mind that when applying
these concepts in the future, the problems will likely be far too large to compute by
hand. Hence, the computational work will be done by a computer, and you will use
your knowledge and understanding of the theory to set up the problem in terms of
linear algebra and to interpret the results. Why do we make you compute things by
hand if they can be done by a computer? Because solving problems by hand can often
help you understand the theory. Thus, when you first learn how to solve problems,
pay close attention to how solving the problems applies to the concepts in the course
(definitions and theorems). In doing so, not only will you find the computational
problems easier, but it will also help you greatly when doing proofs.
Prerequisites:
Teacher: Recall from your last course that...
Student: What?!? You mean we were supposed to remember that?
We expect that you know:
- The basics of set theory
- How to solve a system of equations using substitution and elimination
- The basics of vectors in R2 and R3, including the dot product and cross product
- The basics of factoring polynomials
- The basics of how to write proofs
If you are unsure of any of these, we recommend that you do some self study to
learn/remember these topics.
What is linear algebra used for?
Student: Will we ever use this in real life?
Professor: Not if you get a job flipping hamburgers.
Linear algebra is used in the social sciences, the natural sciences, engineering, busi-
ness, computer science, statistics, and all branches of mathematics. A small sample
of courses at the University of Waterloo that have linear algebra prerequisites include:
AFM 272, ACTSC 291, AMATH 333, CM 271, CO 250, CS 371, ECON 301, PHYS
234, PMATH 330, STAT 331.
Note that although we will mention applications of linear algebra, we will not discuss
them in depth in this book. This is because in most cases it would require knowledge
of the various applications that we do not expect you to have at this point. Addition-
ally, you will have a lot of time to apply your linear algebra knowledge and skills in
future courses.
How to learn linear algebra!
Student: Is there life after death?
Teacher: Why do you ask?
Student: I’ll need the extra time to finish all the homework you gave us.
1. Attend all the lectures.
Although this book has been written to help you learn linear algebra, the
lectures are your primary resource to the material covered in the course. The
lectures will provide examples and explanations to help you understand the
course material. Additionally, the lectures will be interactive. That is, when
you do not understand something or need clarification, you can ask. Books do
not provide this feature (yet). You will likely find it difficult to get a good grade
in the course without attending all the lectures.
2. Read this book.
It is generally recommended that students read course notes/text books prior
to class. If you are able to teach yourself some of the material before it is
taught in class, you will find the lectures twice as effective. The lecturers will
then be there for you to clarify what you taught yourself and to help with the
areas that you had difficulty with. Trust me, you will enjoy your classes much
more if you understand most of what is being taught in the lectures. Addition-
ally, you will likely find it considerably easier to take notes and to listen to the
professor at the same time.
Note that the theorems, lemmas, definitions, etc., are numbered
in the form A.B.C where A indicates the chapter, B indicates the section, and
C indicates which item. For example, Theorem 3.1.4 indicates Theorem 4 in
Section 1 of Chapter 3.
3. RED MEANS STOP!
The mid-section exercises are designed to help you ensure that you understand
the material. Many are of a routine nature, but some are there to make you think
more in-depth about the material. You should always do these exercises and
check your answers before proceeding.
4. Doing/Checking homework assignments.
Homework assignments are designed to help you learn the material and prepare
for the tests. You should always give a serious effort to solve a problem before
seeking help. Spending the time required to solve a hard problem is not only
very satisfying, but it will greatly increase your knowledge of the course and
your problem solving skills. Naturally, if you are unable to figure out how
to solve a problem, it is very important to get help so that you can learn how
to do it. Make sure that you always indicate on each assignment where you
have received help. Never just read/copy solutions. On a test, you are not
going to have a solution to read/copy from. You need to make sure that you
are able to solve problems without referring to similar problems. It is highly
recommended that you collect your assignments and compare them with the
solutions. Even if you have the correct answer, it is worth checking if there is
another way to solve the problem.
5. Study!
This might sound a little obvious, but most students do not do nearly enough
of this. Moreover, you need to study properly. Staying up all night the night
before a test is not studying properly! Learning math, and most other subjects,
is like building a house. If you do not have a firm foundation, then it will be
very difficult for you to build on top of it. It is extremely important that you
know and do not forget the basics.
6. Use learning resources.
There are many places you can get help outside of the lectures. They are there
to assist you... make use of them!
Proofs
Student 1: What is your favorite part of mathematics?
Student 2: Knot Theory.
Student 1: Me neither.
Do you have difficulty with proof questions? Here are some suggestions to help you.
1. Know and understand all definitions and theorems.
Proofs are essentially puzzles that you have to put together. In class and in this
book we will give you all of the pieces and it is up to you to figure out how
they fit together to give the desired picture. Of course, if you are missing or
do not understand a required piece, then you are not going to be able to get the
correct result.
2. Learn proof techniques.
In class and in this book you will see numerous proofs. You should use these
to learn how to select the correct puzzle pieces and to learn the ways of putting
the puzzle pieces together. In particular, I always recommend that students ask
the following two questions for every step in each proof:
(a) “Why is the step true?”
(b) “What was the purpose of the step?”
Answering the first question should really help your understanding of the def-
initions and theorems from class (the puzzle pieces). Answering the second
question, which is generally more difficult, will help you learn how to think
about creating proofs (how the puzzle pieces fit together).
3. Practice!
As with everything else, you will get better at proofs with experience. I rec-
ommend working in groups of two or three. First, you each individually write
your own proof of a statement. Then, you each pass your proof to another
group member who will try to evaluate whether your proof is valid and/or easy
to follow. After every group member has looked at everyone else’s proof, discuss
the different proofs in the group and together try to write one very good proof.
Steps For Writing Your Own Proof:
1. Read the statement you are trying to prove carefully. Write down the definition
of any key term in the statement and think about what theorems relate to these
terms.
2. Think about what strategy/proof technique (see below) you will use for the
proof.
3. Based on the strategy you selected, write down the beginning and the end of
the proof.
4. Use the definitions and theorems that relate to the key terms to try to manipulate
the beginning towards the end and/or the end towards the beginning.
5. If you have trouble completing the proof, then either (a) look for more theorems
that relate to the statement, (b) decide upon a new strategy and return to Step
3, or (c) ask a friend, tutor, TA, or prof for a HINT. (Giving up or reading a
solution are not good options).
6. Once you have a proof, make sure that every step is justified by a theorem or
definition that you have been given.
Most Commonly Used Proof Strategies:
Common Proof Prove the statement is true by showing the definition of the conclusion is satis-
fied, or prove an equation is valid by starting with one side and showing that it
can be modified into the other side.
Constructive Proof Prove an object exists by first constructing (defining) the object and then show-
ing it has the desired properties. A common way of figuring out how to define
the object is to work backwards... that is assume the object exists and figure
out how it must be defined (this is often done in ε-δ proofs in Calculus).
Subset Proof Prove a set S is a subset of a set T by first picking any element s ∈ S (first step:
Let s ∈ S ) and then showing that s satisfies the condition of T (last step: Thus,
s ∈ T ).
Sets Equal Proof Prove two sets are equal by showing they are subsets of each other.
Uniqueness Proof Prove an object is unique by assuming there are two of them and showing they
must actually be the same object.
How To Make The Most Of This Book: SQ3R
The SQ3R reading technique was developed by Francis Robinson to help students
read textbooks more effectively. It will not only help you learn even more from this
book (or any textbook), but I’m sure you will find that it also makes the reading more
interesting.
Survey: Quickly skim over the section. Make note of any heading or boldface
words. Read over the definitions, the statement of theorems, and the statement of
examples or exercises (don’t read proofs or solutions at this time).
Question: Make a purpose for your reading by writing down general questions
about the headings, boldface words, definitions, or theorems that you surveyed. For
example, a couple of questions for Section 1.2 could be:
What is the geometric interpretation of spanning?
How do I determine if a set is linearly independent?
What is the relationship between spanning and linear independence?
How does this material relate to content I have already learned in the course?
What is the purpose of a basis?
How do I find a basis for a surface in higher dimensions?
Read: Read the material in chunks of about one to two pages long. Read carefully
and look for the answers to your questions as well as key concepts and supporting
details. Try to solve examples before reading the solutions. Try to figure out the proofs
before you read them. If you are not able to solve them, look carefully through the
provided solution to figure out the step where you got stuck.
Recall: As you finish each chunk, put the book aside and summarize the important
details of what you have just read. Write down the answers to any questions that you
made and write down any further questions that you have. Think critically about how
well you have understood the concepts and if necessary go back and reread a part or
do some relevant end of section problems.
Review: This is an ongoing process. Once you complete an entire section, go back
and review your notes and questions from the entire section. Test your understanding
by trying to solve the end of chapter problems without referring to the book or your
notes. Repeat this again when you finish an entire chapter and then again in the future
as necessary.
Yes, you are going to find that this makes the reading go much slower for the first
couple of chapters, but students who use this technique consistently report that they
feel that they end up spending a lot less time studying for the course as they learn the
material so much better at the beginning (which makes future concepts much easier
to learn).
ACKNOWLEDGMENTS
Thanks are expressed to:
Mike La Croix: for formatting, LaTeX consulting, and all of the amazing figures.
Eddie Dupont: for the interactive features.
Pantelis Eleftheriou: for editing and providing many useful suggestions.
Brian Ingalls: for editing, providing many useful suggestions, and adding the problem
sets.
Peiyao Zeng: for editing and providing many useful suggestions.
Li Liu: for assisting with LaTeXing.
Kate Schreiner, Joseph Horan, Laurentiu Anton, Chi Zhang, and Henry Ho: for
proof-reading, editing, and suggestions.
Stephen New, Martin Pei, Conrad Hewitt, Robert Andre, Uldis Celmins, C. T. NG,
and many other of my colleagues and students who have broadened my
knowledge of linear algebra and of how to teach it.
Chapter 1
Vectors in Euclidean Space
1.1 Vector Addition and Scalar Multiplication
2-dimensional and 3-dimensional Euclidean spaces were studied by ancient Greek
mathematicians. In Euclid’s Elements, Euclid defined two and three dimensional
space with a set of postulates, definitions, and common notions. In modern mathe-
matics, we define Euclidean spaces differently so that we are able to easily generalize
the concepts. This allows us to use the same ideas to solve problems in other areas.
You may be familiar with defining n-dimensional Euclidean space, Rn, to be the set of
all elements of the form (x1, . . . , xn) where xi ∈ R for 1 ≤ i ≤ n. The elements of Rn
are called points and are usually denoted P(x1, . . . , xn). However, in linear algebra,
we will view Rn as a set of vectors rather than as a set of points. In particular, we
will write an element of Rn as a column and denote it with the usual vector symbol
~x (some textbooks will denote vectors in boldface, i.e. x).
DEFINITION
Rn
The set Rn is defined by

Rn = { ~x = [x1, . . . , xn]^T | x1, . . . , xn ∈ R }

Two vectors ~x = [x1, . . . , xn]^T and ~y = [y1, . . . , yn]^T in Rn are said to be equal
if xi = yi for 1 ≤ i ≤ n.
REMARK
Note that working in, for example, R6 does not mean that we are necessarily in 6-
dimensional space. When analyzing the motion of a particle, an engineer will require
3 variables for the particle’s position as well as 3 variables for the particle’s velocity.
In Economics, one can easily be working with 1000 variables and hence working in
R1000. Understanding the ideas of linear algebra is necessary to determine which
calculations are required and to interpret the results.
Although we are going to view Rn as a set of vectors, it is important to understand
that we really are still referring to n-dimensional Euclidean space. In some cases it
will be useful to think of the vector [x1, . . . , xn]^T as the point (x1, . . . , xn). In particular, we
will sometimes interpret a set of vectors as a set of points to get a geometric object
(such as a line or plane). So, geometrically, a vector in Rn always has its tail at the
origin and its head at its corresponding point.
REMARK
In linear algebra all vectors in Rn should be written as column vectors. However, we
will sometimes write these as n-tuples to match the notation that is commonly used
in other areas. In particular, when we are dealing with functions of vectors, we will
often write f (x1, . . . , xn) rather than f ([x1, . . . , xn]^T).
One advantage of viewing the elements of Rn as vectors instead of as points is that
we can perform operations on vectors. You may have seen vectors in R2 and R3
in physics being used to represent motion or force. In these physical examples we
add vectors by summing their components, and multiply a vector by a scalar by
multiplying each entry of the vector by the scalar. We keep these definitions for
vectors in Rn in linear algebra.
DEFINITION
Addition
Scalar Multiplication
Let ~x = [x1, . . . , xn]^T, ~y = [y1, . . . , yn]^T ∈ Rn and let c ∈ R (called a scalar). We define addition, ~x + ~y, and scalar multiplication, c~x, by

~x + ~y = [x1 + y1, . . . , xn + yn]^T,    c~x = [cx1, . . . , cxn]^T
REMARK
We define subtraction in terms of addition and scalar multiplication. In particular,
when we write ~x − ~y we mean
~x − ~y = ~x + (−1)~y
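These definitions are componentwise, so they are easy to mirror in code. Here is a minimal sketch in Python, using plain lists as vectors; the helper names (vec_add, scalar_mul, vec_sub) are ours, not the book's.

```python
# Componentwise vector operations in R^n, following the definitions above.
# (Helper names are our own illustration.)

def vec_add(x, y):
    """Return x + y componentwise; x and y must have the same length."""
    assert len(x) == len(y), "vectors must lie in the same R^n"
    return [xi + yi for xi, yi in zip(x, y)]

def scalar_mul(c, x):
    """Return the scalar multiple c * x componentwise."""
    return [c * xi for xi in x]

def vec_sub(x, y):
    """Subtraction defined as x + (-1) * y, as in the remark above."""
    return vec_add(x, scalar_mul(-1, y))
```

Note that vec_sub is deliberately written in terms of the other two operations, mirroring how the remark defines subtraction rather than treating it as a new primitive.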
EXAMPLE 1 For ~x = [1, 2, −3]^T and ~y = [−1, 3, 0]^T in R3 we have

~x + ~y = [1 + (−1), 2 + 3, −3 + 0]^T = [0, 5, −3]^T

√2 ~y = √2 [−1, 3, 0]^T = [−√2, 3√2, 0]^T

2~x − 3~y = 2[1, 2, −3]^T + (−3)[−1, 3, 0]^T = [2, 4, −6]^T + [3, −9, 0]^T = [5, −5, −6]^T
We can view addition of vectors geometrically.
EXAMPLE 2 Add ~x = [3, 1]^T and ~y = [−1, 2]^T in R2. Show the result geometrically.
Solution: We have

~x + ~y = [3 + (−1), 1 + 2]^T = [2, 3]^T

[Figure: ~x, ~y, and ~x + ~y drawn in the x1x2-plane.]
Observe that the points corresponding to the origin, ~x, ~y, and ~x + ~y form a parallelo-
gram. Thus, the vector rule for addition is sometimes called the parallelogram rule
for addition.
EXERCISE 1 Add ~x = [3, −2]^T and ~y = [−4, −1]^T in R2. Show the result geometrically.
Since we will often look at the sums of scalar multiples of vectors, we make the
following definition.
DEFINITION
Linear Combination
For ~v1, . . . ,~vk ∈ Rn and c1, . . . , ck ∈ R we call the sum
c1~v1 + c2~v2 + · · · + ck~vk
a linear combination of ~v1, . . . ,~vk.
Although it is intuitive that any linear combination
c1~v1 + c2~v2 + · · · + ck~vk
of vectors ~v1, . . . ,~vk ∈ Rn is going to be a vector in Rn, it is instructive to prove
this along with several other properties. We will see throughout this text that these
properties are extremely important.
THEOREM 1 If ~x, ~y, ~w ∈ Rn and c, d ∈ R, then
V1 ~x + ~y ∈ Rn
V2 (~x + ~y) + ~w = ~x + (~y + ~w)
V3 ~x + ~y = ~y + ~x
V4 There exists a vector ~0 ∈ Rn such that ~x + ~0 = ~x for all ~x ∈ Rn
V5 For every ~x ∈ Rn there exists (−~x) ∈ Rn such that ~x + (−~x) = ~0
V6 c~x ∈ Rn
V7 c(d~x) = (cd)~x
V8 (c + d)~x = c~x + d~x
V9 c(~x + ~y) = c~x + c~y
V10 1~x = ~x
Proof: We will prove V1 and V3. Let ~x = [x1, . . . , xn]^T and ~y = [y1, . . . , yn]^T be any vectors in Rn.
For V1, by definition of addition we have
~x + ~y = [x1 + y1, . . . , xn + yn]^T
Since xi + yi ∈ R for 1 ≤ i ≤ n, we have that ~x +~y satisfies the condition of the set Rn
and hence ~x + ~y ∈ Rn.
Similarly, for V3, we have
~x + ~y = [x1 + y1, . . . , xn + yn]^T    by definition of addition
        = [y1 + x1, . . . , yn + xn]^T    since addition of real numbers is commutative
        = ~y + ~x                         by definition of addition
�
EXERCISE 2 Finish the proof of Theorem 1.
REMARKS
1. Observe that properties V2, V3, V7, V8, V9, and V10 are only about the oper-
ations of addition and scalar multiplication, while the other properties V1, V4,
V5, and V6 are about the relationship between the operations and the set Rn.
These facts should be clear in the proof of the theorem.
2. The proof of V4 shows that ~0 = [0, . . . , 0]^T. It is called the zero vector.
3. The proof of V5 shows that the additive inverse of ~x = [x1, . . . , xn]^T is
(−~x) = [−x1, . . . , −xn]^T.
4. Properties V1 and V6 show that Rn is closed under linear combinations. That
is, if ~v1, . . . ,~vk ∈ Rn, then c1~v1 + · · · + ck~vk ∈ Rn for any c1, . . . , ck ∈ R. This
fact might seem rather obvious in Rn; however, we will soon see that there
are sets which are not closed under linear combinations. In linear algebra,
it will be important and useful to identify which sets are closed under linear
combinations and, in fact, which sets have all the properties V1 - V10.
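The closure idea in Remark 4 can be made concrete with a quick check. As a hedged sketch in Python (the set below is our own illustration, not one from the text): Rn itself is closed under linear combinations, but a set such as the vectors in R2 with non-negative entries fails property V6.

```python
# The set S = {x in R^2 : x1 >= 0 and x2 >= 0} is NOT closed under
# scalar multiplication: multiplying by a negative scalar leaves S.
# (This set is our own illustration; it does not appear in the text.)

def in_nonneg_quadrant(x):
    """Membership test for S = {x in R^2 : x1 >= 0 and x2 >= 0}."""
    return x[0] >= 0 and x[1] >= 0

x = [1.0, 2.0]                        # x is in S
cx = [-3 * xi for xi in x]            # the scalar multiple (-3) * x

closed_under_scaling = in_nonneg_quadrant(cx)   # False: V6 fails for S
```

By contrast, any linear combination of vectors in Rn stays in Rn, which is exactly what properties V1 and V6 guarantee.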
Section 1.1 Problems
1. Compute the following linear combinations.
(a) 2[2, 1, −1]^T + 3[−1, 0, 1]^T
(b) (1/3)[3, 1, −2]^T − [4, 1/2, −2/3]^T
(c) √2 [√2, 0, √2]^T + (2/3)[6, 0, 0]^T
(d) 3[1, 0, 1]^T − 2[−2, 3, 1]^T + [−7, 6, −1]^T
(e) (a + b)[1, 1, −1]^T + (−a + b)[1, 2, −3]^T + (−a + 2b)[−1, −1, 2]^T
2. Let ~x,~v ∈ Rn. Solve the following equations for ~x using only properties V1 - V10, clearly indicating each algebraic step being performed and referencing which of the properties V1 - V10 is being used.
(a) 2~x + 2~v = 4~x
(b) ~x + 2~v = ~v + (−~x)
3. In this exercise, we look at the area of the parallelogram formed by the parallelogram rule for addition on ~x, ~y ∈ R2.
(a) Calculate each of the following and show the result geometrically. Find the area of each parallelogram formed by the parallelogram rule for addition.
    i. [−1, 1]^T + [0, 2]^T
    ii. [2, −1]^T + [−2, 1]^T
(b) What condition must be placed on ~x and ~y to ensure that the parallelogram has non-zero area?
(c) Create two more pairs ~x, ~y ∈ R2 which induce a parallelogram with non-zero area. Find the area of each parallelogram.
(d) Find a general formula for the area of the parallelogram induced by ~x, ~y ∈ R2. Prove that your formula is correct.
1.2 Bases
We now look at three extremely important concepts in linear algebra: spanning, linear
independence, and bases. We will introduce these using geometric interpretations of
sets of vectors. Note that all of these concepts will be used in every chapter in this
text, so spending the time to grasp these concepts now will have long term benefits.
EXAMPLE 1 Let ~v = [1, 1]^T ∈ R2 and consider the set S of all scalar multiples of ~v. That is,

S = {c~v | c ∈ R}

What does S represent geometrically?
Solution: Every vector ~x ∈ S has the form

~x = [x1, x2]^T = c[1, 1]^T = [c, c]^T, c ∈ R

Hence, we see this is all vectors in R2 where x1 = x2. Alternately, we can view this
as the set of all points (x1, x1) in R2, which we recognize as the line x2 = x1.

[Figure: the line ~x = c[1, 1]^T through the origin O, with direction vector ~v = [1, 1]^T.]
Notice in the example that the vector ~v gives the line its direction. Hence, we call ~v a
direction vector of the line.
EXAMPLE 2 Let ~e1 = [1, 0]^T, ~e2 = [0, 1]^T ∈ R2. How would you describe geometrically the set S of all possible linear combinations of ~e1 and ~e2?
Solution: Every vector ~x in S has the form

~x = c1~e1 + c2~e2 = c1[1, 0]^T + c2[0, 1]^T = [c1, c2]^T

for any c1, c2 ∈ R. Therefore, any vector in R2 can be represented as a linear combi-
nation of ~e1 and ~e2. Hence, S is the x1x2-plane.
Instead of using set notation to indicate a set of vectors, we sometimes use a
vector equation, as we did in Examples 1 and 2.
EXAMPLE 3 Let ~m = [−1, 1, 1]^T, ~b = [4/5, 1/5, 6/5]^T ∈ R3. Describe geometrically the set with vector equation

~x = t~m + ~b, t ∈ R

Solution: We have all possible scalar multiples of ~m translated by ~b. Hence, this is
a line in R3 that passes through the point (4/5, 1/5, 6/5) and has direction vector ~m.

[Figure: the line ~x = t~m + ~b in R3, with ~m and ~b drawn from the origin.]
EXERCISE 1 Let ~y = [1, 1, 1]^T, ~z = [1, 0, 1]^T ∈ R3. Describe geometrically the set with vector equation

~x = s~y + t~z, s, t ∈ R
In words, we could describe the set of vectors in Exercise 1 as the set of all possible
linear combinations of ~y and ~z. Since we will often look at the set of all possible
linear combinations of a set of vectors we make the following definition.
DEFINITION
Span
Let B = {~v1, . . . ,~vk} be a set of vectors in Rn. We define the span of B by
Span B = {c1~v1 + c2~v2 + · · · + ck~vk | c1, . . . , ck ∈ R}
We say that the set Span B is spanned by B and that B is a spanning set for Span B.
EXAMPLE 4 Using the definition above, we can denote the line in Example 1 as Span{[1, 1]^T}.
Alternatively, we could say that the line is spanned by {[1, 1]^T}.
The set S in Example 2 can be written as S = Span{[1, 0]^T, [0, 1]^T} = Span{~e1, ~e2}.
So, we could say that {~e1, ~e2} is a spanning set for R2.
EXAMPLE 5 Is the vector [1, 3]^T ∈ Span{[2, −1]^T, [3, 3]^T}?
Solution: By definition of span, we need to determine whether there exist c1, c2 ∈ R such that

[1, 3]^T = c1[2, −1]^T + c2[3, 3]^T

Performing operations on vectors on the right hand side, this gives

[1, 3]^T = [2c1 + 3c2, −c1 + 3c2]^T

Since vectors are equal if and only if their corresponding entries are equal, we get
that this vector equation implies

1 = 2c1 + 3c2
3 = −c1 + 3c2

Subtracting the second equation from the first gives −2 = 3c1 and so c1 = −2/3.
Substituting this into either equation gives c2 = 7/9. Hence, we have that

[1, 3]^T = −(2/3)[2, −1]^T + (7/9)[3, 3]^T

Thus, by definition, [1, 3]^T ∈ Span{[2, −1]^T, [3, 3]^T}.
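The two-equation system above can be replayed in code. Here is a sketch using Cramer's rule for the 2x2 case (the helper name is ours; the fractions module gives exact arithmetic, so the answers match the hand computation rather than floating-point approximations):

```python
from fractions import Fraction

# Check whether b is in Span{u, v} for u, v, b in R^2 by solving
# c1*u + c2*v = b with Cramer's rule. This sketch assumes u and v
# are not parallel, so the system has a unique solution.

def span_coefficients(u, v, b):
    """Return (c1, c2) with c1*u + c2*v = b, as exact fractions."""
    det = u[0] * v[1] - u[1] * v[0]
    assert det != 0, "u and v are parallel; handle that case separately"
    c1 = Fraction(b[0] * v[1] - b[1] * v[0], det)
    c2 = Fraction(u[0] * b[1] - u[1] * b[0], det)
    return c1, c2

# The data from the example above:
c1, c2 = span_coefficients([2, -1], [3, 3], [1, 3])
```

Running this reproduces c1 = −2/3 and c2 = 7/9 from the hand computation.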
EXAMPLE 6 Describe the subset S of R3 defined by S = Span{[1, 0, 0]^T, [0, 1, 0]^T} geometrically and determine a vector equation for S.
Solution: By definition of span we have

S = {c1[1, 0, 0]^T + c2[0, 1, 0]^T | c1, c2 ∈ R}

Thus, a vector equation for S is ~x = c1[1, 0, 0]^T + c2[0, 1, 0]^T, c1, c2 ∈ R.
Therefore, S is the set of all vectors of the form [c1, c2, 0]^T, and so represents the
x1x2-plane in R3.
EXAMPLE 7 Describe the subset T of R3 defined by T = Span{[1, 0, 1]^T, [2, 0, 2]^T} geometrically and determine a vector equation for T.
Solution: We have

T = {c1[1, 0, 1]^T + c2[2, 0, 2]^T | c1, c2 ∈ R}

A vector equation for T is ~x = c1[1, 0, 1]^T + c2[2, 0, 2]^T, c1, c2 ∈ R.
However, we see that [2, 0, 2]^T = 2[1, 0, 1]^T, so we can simplify the vector equation. We get

~x = c1[1, 0, 1]^T + 2c2[1, 0, 1]^T = (c1 + 2c2)[1, 0, 1]^T = c[1, 0, 1]^T

where c = c1 + 2c2 ∈ R. Therefore, T is the set of all vectors of the form [c, 0, c]^T, and so
T is a line in R3 with direction vector [1, 0, 1]^T that passes through the origin.
For Example 7, we used the fact that the second vector was a scalar multiple of the
first to simplify the vector equation. Of course, whenever we write a vector equa-
tion/spanning set for a set of vectors, we will want to simplify the equation/spanning
set as much as possible by removing any unnecessary vectors.
In particular, we want to know under what conditions we have
Span{~v1, . . . ,~vk} = Span{~v1, . . . ,~vi−1,~vi+1, . . . ,~vk}
Thinking about what these two sets represent (or thinking geometrically), we get the
following theorem.
THEOREM 1 Let ~v1, . . . ,~vk ∈ Rn. Some vector ~vi, 1 ≤ i ≤ k, can be written as a linear combination
of ~v1, . . . ,~vi−1,~vi+1, . . . ,~vk if and only if
Span{~v1, . . . ,~vk} = Span{~v1, . . . ,~vi−1,~vi+1, . . . ,~vk}
Proof: ⇒ If~vi can be written as a linear combination of~v1, . . . ,~vi−1,~vi+1, . . . ,~vk, then,
by definition of linear combination, there exist c1, . . . , ci−1, ci+1, . . . , ck ∈ R such that
c1~v1 + · · · + ci−1~vi−1 + ci+1~vi+1 + · · · + ck~vk = ~vi (1.1)
To prove that Span{~v1, . . . ,~vk} = Span{~v1, . . . ,~vi−1,~vi+1, . . . ,~vk} we will show that the
sets are subsets of each other.
By definition of span, for any ~x ∈ Span{~v1, . . . ,~vk} there exist d1, . . . , dk ∈ R such that
~x = d1~v1 + · · · + di−1~vi−1 + di~vi + di+1~vi+1 + · · · + dk~vk
Using equation (1.1) to substitute in for ~vi gives
~x = d1~v1 + · · · + di(c1~v1 + · · · + ci−1~vi−1 + ci+1~vi+1 + · · · + ck~vk) + · · · + dk~vk
Rearranging using properties from Theorem 1.1.1 gives
~x = (d1 + dic1)~v1 + · · · + (di−1 + dici−1)~vi−1 + (di+1 + dici+1)~vi+1 + · · · + (dk + dick)~vk
Thus, by definition, ~x ∈ Span{~v1, . . . ,~vi−1,~vi+1, . . . ,~vk} and hence
Span{~v1, . . . ,~vk} ⊆ Span{~v1, . . . ,~vi−1,~vi+1, . . . ,~vk}
Now, if ~y ∈ Span{~v1, . . . ,~vi−1,~vi+1, . . . ,~vk}, then there exist a1, . . . , ai−1, ai+1, . . . , ak ∈ R such that
~y = a1~v1 + · · · + ai−1~vi−1 + ai+1~vi+1 + · · · + ak~vk
= a1~v1 + · · · + ai−1~vi−1 + 0~vi + ai+1~vi+1 + · · · + ak~vk
Thus, ~y ∈ Span{~v1, . . . ,~vk}. Hence, we also have Span{~v1, . . . ,~vi−1,~vi+1, . . . ,~vk} ⊆ Span{~v1, . . . ,~vk} and so
Span{~v1, . . . ,~vk} = Span{~v1, . . . ,~vi−1,~vi+1, . . . ,~vk}
⇐ Assume that Span{~v1, . . . ,~vk} = Span{~v1, . . . ,~vi−1,~vi+1, . . . ,~vk}. Since ~vi ∈ Span{~v1, . . . ,~vk}, our assumption implies that ~vi ∈ Span{~v1, . . . ,~vi−1,~vi+1, . . . ,~vk}. Consequently, there exist b1, . . . , bi−1, bi+1, . . . , bk ∈ R such that
~vi = b1~v1 + · · · + bi−1~vi−1 + bi+1~vi+1 + · · · + bk~vk
Therefore, ~vi is a linear combination of ~v1, . . . ,~vi−1,~vi+1, . . . ,~vk as required. �
We are going to apply this theorem frequently throughout this course. So, let’s
demonstrate how it works with some examples.
Section 1.2 Bases 11
EXAMPLE 8 Describe each of the sets geometrically and give a simplified vector equation.
(a) S = Span{[1, 1], [3, 3]}
Solution: Since [3, 3] is a linear combination (scalar multiple) of [1, 1], Theorem 1 gives
Span{[1, 1], [3, 3]} = Span{[1, 1]}
Hence, S is a line in R2 with vector equation ~x = c1 [1, 1], c1 ∈ R.
(b) T = Span{[1, 0], [2, 0], [1, 3], [0, 1]}
Solution: Since [2, 0] is a scalar multiple of [1, 0], Theorem 1 gives
Span{[1, 0], [2, 0], [1, 3], [0, 1]} = Span{[1, 0], [1, 3], [0, 1]}
We also have that [1, 3] can be written as a linear combination of [1, 0] and [0, 1], hence
Span{[1, 0], [1, 3], [0, 1]} = Span{[1, 0], [0, 1]}
Since neither vector in the spanning set {[1, 0], [0, 1]} is a scalar multiple of the other, by Theorem 1, we cannot reduce the spanning set any further. Hence, as described in Example 2, T is all of R2 and has vector equation ~x = c1 [1, 0] + c2 [0, 1], c1, c2 ∈ R.
(c) U = Span{[1, 1, 2], [0, 1, −1], [0, 0, 0]}
Solution: Since [0, 0, 0] is a scalar multiple of any vector, we get by Theorem 1 that
Span{[1, 1, 2], [0, 1, −1], [0, 0, 0]} = Span{[1, 1, 2], [0, 1, −1]}
Since neither vector in the spanning set {[1, 1, 2], [0, 1, −1]} is a scalar multiple of the other, by Theorem 1, we cannot reduce the spanning set any further. Hence, U is a plane in R3 with vector equation ~x = c1 [1, 1, 2] + c2 [0, 1, −1], c1, c2 ∈ R.
REMARK
Observe that spanning sets are not unique. For example, each of the following would also be a correct answer for part (b) in Example 8.
~x = c1 [1, 0] + c2 [1, 3], c1, c2 ∈ R
~x = c1 [2, 0] + c2 [0, 1], c1, c2 ∈ R
~x = c1 [2, 0] + c2 [1, 3], c1, c2 ∈ R
From the examples we see that it is very important to detect when one or more vectors in the set can be written as linear combinations of some (or all) of the other vectors. We call such sets linearly dependent sets. For example, {[1, 0], [2, 0], [1, 3], [0, 1]} is linearly dependent and {[1, 0], [0, 1]} is linearly independent. In the examples it was quite easy to see when we had this situation; however, in real world applications we may get hundreds of vectors each having hundreds of entries. So, we need to develop an algorithm for detecting whether one vector in a set can be written as a linear combination of the others. To do this, we first need a more mathematical definition of linear dependence.
If a set {~v1, . . . ,~vk} is linearly dependent, then it means that one of the vectors can be written as a linear combination of some or all of the other vectors. Say, ~vi can be written as a linear combination of the other vectors. That is, there exist c1, . . . , ck ∈ R with ci ≠ 0 such that
−ci~vi = c1~v1 + · · · + ci−1~vi−1 + ci+1~vi+1 + · · · + ck~vk
We can rewrite this as
~0 = c1~v1 + · · · + ci−1~vi−1 + ci~vi + ci+1~vi+1 + · · · + ck~vk (1.2)
So, if the set is linearly dependent, then equation (1.2) has a solution where at least
one coefficient is non-zero. On the other hand, if at least one coefficient in equation
(1.2) is non-zero, say c j, then we can solve the equation for ~v j to get
−c j~v j = c1~v1 + · · · + c j−1~v j−1 + c j+1~v j+1 + · · · + ck~vk
~v j = −(c1/c j)~v1 − · · · − (c j−1/c j)~v j−1 − (c j+1/c j)~v j+1 − · · · − (ck/c j)~vk
Therefore, ~v j ∈ Span{~v1, . . . ,~v j−1,~v j+1, . . . ,~vk} and so the set is linearly dependent.
We now use this to formulate a mathematical definition for linear independence and
linear dependence.
DEFINITION
Linearly Dependent
Linearly Independent
A set of vectors {~v1, . . . ,~vk} in Rn is said to be linearly dependent if there exist
coefficients c1, . . . , ck not all zero such that
~0 = c1~v1 + · · · + ck~vk
A set of vectors {~v1, . . . ,~vk} is said to be linearly independent if the only solution to
~0 = c1~v1 + · · · + ck~vk
is c1 = c2 = · · · = ck = 0 (called the trivial solution).
The work above which we used to motivate the definition proves the following result.
THEOREM 2 A set of vectors {~v1, . . . ,~vk} in Rn is linearly dependent if and only if
~vi ∈ Span{~v1, . . . ,~vi−1,~vi+1, . . . ,~vk} for some i, 1 ≤ i ≤ k.
EXAMPLE 9 Determine whether {[7, −14, 6], [−10, 15, 15/14], [−1, 0, 3]} is linearly independent. If not, find a vector which is in the span of the others.
Solution: Using the definition of linear independence/dependence, we consider
c1 [7, −14, 6] + c2 [−10, 15, 15/14] + c3 [−1, 0, 3] = [0, 0, 0]
Calculating the linear combination of the vectors on the left gives
[7c1 − 10c2 − c3, −14c1 + 15c2, 6c1 + (15/14)c2 + 3c3] = [0, 0, 0]
Comparing corresponding entries gives the 3 equations in 3 unknowns
7c1 − 10c2 − c3 = 0
−14c1 + 15c2 = 0
6c1 + (15/14)c2 + 3c3 = 0
Solving by substitution and elimination we find that there are infinitely many solutions. One solution is c1 = 3/7, c2 = 2/5, and c3 = −1. Hence, the system has a non-trivial solution, so the set is linearly dependent.
Now, Theorem 2 tells us that at least one of the vectors can be written as a linear
combination of the other vectors. Moreover, the proof of the theorem shows us how
to actually find the linear combination. First, the solution above gives
(3/7) [7, −14, 6] + (2/5) [−10, 15, 15/14] + (−1) [−1, 0, 3] = [0, 0, 0] (1.3)
Following the proof, we pick one of the non-zero coefficients in (1.3) and solve for
the corresponding vector. Picking c2 = 2/5, we solve the equation for ~v2 to get
−(2/5) [−10, 15, 15/14] = (3/7) [7, −14, 6] + (−1) [−1, 0, 3]
[−10, 15, 15/14] = −(15/14) [7, −14, 6] + (5/2) [−1, 0, 3]
Thus, we have written one of the vectors as a linear combination of the others.
REMARKS
1. In Example 9, we could have written any vector in (1.3) with a non-zero coefficient as a linear combination of the other vectors.
2. It is important to realize that to show linear independence we must show the
only solution to c1~v1 + · · · + ck~vk = ~0 is c1 = · · · = ck = 0. In particular, notice
that c1 = · · · = ck = 0 is always a solution regardless of whether {~v1, . . . ,~vk} is
linearly independent or not.
3. You will see frequently in this text that a proof often does a lot more than just
prove that the statement of a theorem is true. In many cases, the proof will
teach us how to solve computational problems, as demonstrated in Example
9. Additionally, understanding the techniques and strategies used in one proof
can help one develop proofs for other results.
EXAMPLE 10 Determine whether {[5, 2, 7], [3, −8, 4], [2, 6, 15]} is linearly independent.
Solution: By definition of linear dependence, we consider
c1 [5, 2, 7] + c2 [3, −8, 4] + c3 [2, 6, 15] = [0, 0, 0]
Calculating the linear combination of the vectors on the left gives
[5c1 + 3c2 + 2c3, 2c1 − 8c2 + 6c3, 7c1 + 4c2 + 15c3] = [0, 0, 0]
Comparing corresponding entries gives the 3 equations in 3 unknowns
5c1 + 3c2 + 2c3 = 0
2c1 − 8c2 + 6c3 = 0
7c1 + 4c2 + 15c3 = 0
Solving by substitution and elimination we find that the only solution is c1 = c2 =
c3 = 0. Consequently, the set is linearly independent.
EXERCISE 2 Determine whether {[1, 2, 1], [1, −1, 2], [1, −1, −3]} is linearly independent. If not, find a vector which is in the span of the others.
EXERCISE 3 Determine whether {[1, −1, 3], [2, −4, 1], [1, 0, −1], [3, −4, 0]} is linearly independent. If not, find a vector which is in the span of the others.
Observe that the procedure for determining whether a set {~v1, . . . ,~vk} in Rn is linearly
dependent or linearly independent requires determining solutions of the vector equa-
tion c1~v1 + · · · + ck~vk = ~0. However, this vector equation actually represents n scalar
equations (one for each entry of the vectors) in k unknowns c1, . . . , ck. In the next
chapter, we will look at how to efficiently solve such systems of equations. However,
in some cases it may be easy to detect solutions by inspection.
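The by-hand elimination used in Examples 9 and 10 can be automated. As an aside only (the text develops the systematic theory in Chapter 2), the following Python sketch places the vectors as rows of a matrix, row reduces, and counts the non-zero rows (the rank); the set is linearly independent exactly when the rank equals the number of vectors. The helper names are ours, not the text's.

```python
def rank(rows, tol=1e-12):
    """Rank of a matrix (given as a list of rows) via Gaussian elimination."""
    A = [list(map(float, row)) for row in rows]
    m, n = len(A), len(A[0])
    r = 0  # number of pivots found so far
    for col in range(n):
        # find a row at or below r with a non-negligible entry in this column
        pivot = next((i for i in range(r, m) if abs(A[i][col]) > tol), None)
        if pivot is None:
            continue  # no pivot in this column
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(r + 1, m):  # eliminate entries below the pivot
            f = A[i][col] / A[r][col]
            for j in range(col, n):
                A[i][j] -= f * A[r][j]
        r += 1
    return r

def is_linearly_independent(vectors):
    """Vectors are independent iff the matrix they form has rank equal to their number."""
    return rank(vectors) == len(vectors)

print(is_linearly_independent([[7, -14, 6], [-10, 15, 15/14], [-1, 0, 3]]))  # False (Example 9)
print(is_linearly_independent([[5, 2, 7], [3, -8, 4], [2, 6, 15]]))          # True (Example 10)
```

Floating-point elimination needs the tolerance `tol`; the exact by-hand computation, of course, does not.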
EXAMPLE 11 Show that {[1, 2, 1], [2, 4, 2]} is linearly dependent.
Solution: By definition of linear dependence, we consider
c1 [1, 2, 1] + c2 [2, 4, 2] = [0, 0, 0]
Since the second vector is a scalar multiple of the first, we see that if we take c1 = −2 and c2 = 1 we get
(−2) [1, 2, 1] + [2, 4, 2] = [0, 0, 0]
Therefore, the equation has non-trivial solutions, so the set is linearly dependent.
EXAMPLE 12 Show that {[3, 1, 4, 1], [5, 9, 2, 6], [0, 0, 0, 0]} is linearly dependent.
Solution: Consider
c1 [3, 1, 4, 1] + c2 [5, 9, 2, 6] + c3 [0, 0, 0, 0] = [0, 0, 0, 0]
We immediately observe that this equation has the non-trivial solution c1 = c2 = 0,
c3 = 1. Hence, the set is linearly dependent.
REMARK
Notice in both of the last two examples that there were in fact infinitely many different
non-trivial solutions to the equations.
We generalize our method of solution in Example 12 to prove the following theorem.
THEOREM 3 If a set of vectors {~v1, . . . ,~vk} contains the zero vector, then it is linearly dependent.
Proof: If ~vi = ~0, then
0~v1 + · · · + 0~vi−1 + 1~vi + 0~vi+1 + · · · + 0~vk = ~0
Hence, the equation ~0 = c1~v1 + · · · + ck~vk has a solution with one coefficient, namely
ci, that is non-zero. So, by definition, the set is linearly dependent. �
Theorem 2 tells us that a set is linearly independent if and only if none of the vectors
in the set can be written as a linear combination of the others. That is, the simplest
spanning set for a given set is one that is linearly independent. Hence, we make the
following definition.
DEFINITION
Basis
Let S be a subset of Rn. If {~v1, . . . ,~vk} is a linearly independent set of vectors in Rn
such that S = Span{~v1, . . . ,~vk}, then the set {~v1, . . . ,~vk} is called a basis for S .
We define a basis for the set {~0} to be the empty set.
Note that this definition and its generalization will be extremely important throughout
the remainder of the book.
EXAMPLE 13 Prove that the set B = {[1, 0], [0, 1]} is a basis for R2.
Solution: To prove this, we need to show that Span B = R2 (B is a spanning set for R2) and that B is linearly independent.
To prove Span B = R2 we will show that they are subsets of each other. For any [x1, x2] ∈ R2 we have
[x1, x2] = x1 [1, 0] + x2 [0, 1]
This shows that every vector in R2 is a linear combination of the vectors in B, so R2 ⊆ Span B. On the other hand, properties V1 and V6 of Theorem 1.1.1 tell us that all linear combinations of the vectors in B are in R2. Hence, Span B ⊆ R2 and consequently, Span B = R2.
To prove that B is linearly independent, we use the definition of linear independence. Consider
[0, 0] = c1 [1, 0] + c2 [0, 1] = [c1, c2]
So, c1 = c2 = 0 and hence B is also linearly independent. Thus, B is a basis for R2.
DEFINITION
Standard Basis
In Rn, let ~ei represent the vector whose i-th component is 1 and all other components
are 0. The set {~e1, . . . , ~en} is called the standard basis for Rn.
EXAMPLE 14 The standard basis for R2 is {~e1, ~e2} = {[1, 0], [0, 1]}.
EXERCISE 4 Write the standard basis for R4.
REMARK
Observe that the standard basis vectors for Rn just represent the usual coordinate
axes. For example, in R3 we have that Span{~e1} is the x-axis, Span{~e2} is the y-axis,
and Span{~e3} is the z-axis. In particular, whenever we write a point (x, y, z) in R3 we
interpret it as
[x, y, z] = x [1, 0, 0] + y [0, 1, 0] + z [0, 0, 1] = x~e1 + y~e2 + z~e3
EXAMPLE 15 Is the set B = {[1, 0, 1], [−2, 1, 1]} a basis for R3?
Solution: Thinking geometrically, we guess that a basis for R3 should need at least
three vectors to span R3. So, we expect that B does not span R3. To prove this, we
just need to show that there exists a vector ~x ∈ R3 that cannot be written as a linear
combination of the vectors in B. Consider
[0, 1, 0] = c1 [1, 0, 1] + c2 [−2, 1, 1] = [c1 − 2c2, c2, c1 + c2]
Comparing entries gives three equations in two unknowns
c1 − 2c2 = 0
c2 = 1
c1 + c2 = 0
Substituting c2 = 1 into the first and third equations gives c1 = 2 and c1 = −1, a contradiction, which shows that there is no solution. Hence, ~x = [0, 1, 0] ∉ Span B and so B is not a basis for R3.
EXAMPLE 16 Prove that the set B = {[−1, 2], [1, 1]} is a basis for R2.
Solution: We first show spanning. Let [x1, x2] ∈ R2 and consider
[x1, x2] = c1 [−1, 2] + c2 [1, 1] = [−c1 + c2, 2c1 + c2] (1.4)
Comparing entries we get two equations in two unknowns
−c1 + c2 = x1
2c1 + c2 = x2
Solving gives c1 = (1/3)(−x1 + x2) and c2 = (1/3)(2x1 + x2). Hence, we have that
(1/3)(−x1 + x2) [−1, 2] + (1/3)(2x1 + x2) [1, 1] = [x1, x2]
Thus, every vector in R2 is a linear combination of the vectors in B. On the other hand, every linear combination of [−1, 2] and [1, 1] is in R2 by Theorem 1.1.1. Consequently, Span B = R2.
For linear independence, we take x1 = x2 = 0 in equation (1.4). Our general solution
to that vector equation says that the only solution is c1 = c2 = 0. So, B is also linearly
independent. Therefore, B is a basis for R2.
Graphically, the basis in the last example just represents a different set of coordinate axes for R2. The figure below shows how these vectors still form a grid covering all points in R2. Observe that the vector [4, 1] is −1 ‘unit’ in the direction of [−1, 2] and 3 ‘units’ in the direction of [1, 1].
Notice that the reason we need a basis to be a spanning set is so that the grid covers
every point in the space. The reason that we need a basis to be linearly independent
is so that, like the standard coordinate axes, each vector has a unique representation as a point on the grid.
[Figure: the grid generated by [−1, 2] and [1, 1] in the x-y plane, with the vector [4, 1] = −[−1, 2] + 3 [1, 1] marked.]
Figure 1.2.1: Geometric representation of the basis B in Example 16
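The coordinates −1 and 3 in the figure can be computed from the formulas found in Example 16, or directly by solving the 2×2 system. A small Python sketch (the helper name `coords_in_basis` is ours, not the text's), using the standard 2×2 determinant formula:

```python
def coords_in_basis(v1, v2, x):
    """Coordinates (c1, c2) with c1*v1 + c2*v2 = x, for a basis {v1, v2} of R^2.

    Solves the 2x2 system by Cramer's rule; the determinant is non-zero
    precisely because {v1, v2} is linearly independent.
    """
    det = v1[0] * v2[1] - v1[1] * v2[0]
    c1 = (x[0] * v2[1] - x[1] * v2[0]) / det
    c2 = (v1[0] * x[1] - v1[1] * x[0]) / det
    return c1, c2

# The basis from Example 16 and the vector [4, 1] from Figure 1.2.1:
print(coords_in_basis([-1, 2], [1, 1], [4, 1]))  # (-1.0, 3.0)
```

For the basis and vector of the figure this reproduces [4, 1] = −[−1, 2] + 3 [1, 1].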
THEOREM 4 If B = {~v1, . . . ,~vk} is a basis for a subset S of Rn, then every vector ~x ∈ S can be
written as a unique linear combination of the vectors in B.
Proof: By definition of a basis, B is a spanning set for S . Thus, by definition of a
spanning set, every vector in S can be written as a linear combination of the vectors
in B.
To prove that every such linear combination is unique, we assume to the contrary that
there is a vector ~x ∈ S such that
c1~v1 + · · · + ck~vk = ~x = d1~v1 + · · · + dk~vk
Using the properties from Theorem 1.1.1, we can rearrange this to get
(c1 − d1)~v1 + · · · + (ck − dk)~vk = ~0 (1.5)
Because {~v1, . . . ,~vk} is linearly independent, the only solution to equation (1.5) is
ci − di = 0 for 1 ≤ i ≤ k. Hence, ci = di for 1 ≤ i ≤ k. Consequently, every vector in
S has a unique representation as a linear combination of the vectors in B. �
As with many proofs, some of the “easy steps” in the proof above are omitted. In particular, we skipped the algebraic steps required to simplify
c1~v1 + · · · + ck~vk = d1~v1 + · · · + dk~vk
to
(c1 − d1)~v1 + · · · + (ck − dk)~vk = ~0
At first glance, this simplification should seem straightforward (the reason it was
omitted). However, if you try to justify each step using the properties in Theorem
1.1.1, you will likely discover it takes quite a bit of effort and thought.
EXERCISE 5 Show and justify ALL steps required to change
c1~v1 + · · · + ck~vk = d1~v1 + · · · + dk~vk
into
(c1 − d1)~v1 + · · · + (ck − dk)~vk = ~0
using only properties in Theorem 1.1.1.
Check the solution to the exercise. If you managed to justify every step, you can be
particularly proud of yourself. At this point, you may think that justifying all these
steps has no purpose. However, it is necessary! Soon we will start seeing objects
where some of the familiar properties that we take for granted do not hold! For
example, in Chapter 3, we will see objects where A times B usually does not equal B times A. In these cases, it will be important that you think about every step that you are doing so that you don’t use a property that doesn’t hold.
Surfaces in Higher Dimensions
We now extend our geometrical concepts of lines and planes to Rn for any n ≥ 2.
DEFINITION
Line in Rn
Let ~v, ~b ∈ Rn with ~v ≠ ~0. We call the set with vector equation ~x = c1~v + ~b, c1 ∈ R a line in Rn through ~b.
DEFINITION
Plane in Rn
Let ~v1,~v2, ~b ∈ Rn with {~v1,~v2} being a linearly independent set. We call the set with
vector equation ~x = c1~v1 + c2~v2 + ~b, c1, c2 ∈ R a plane in Rn through ~b.
DEFINITION
Hyperplane in Rn
Let ~v1, . . . ,~vn−1, ~b ∈ Rn with {~v1, . . . ,~vn−1} being linearly independent. The set with
vector equation ~x = c1~v1 + · · · + cn−1~vn−1 + ~b, c1, . . . , cn−1 ∈ R is called a hyperplane
in Rn through ~b.
REMARK
Intuitively we see that a line in Rn is one-dimensional, a plane in Rn is two-dimensional, and a hyperplane in Rn is (n − 1)-dimensional. We will justify this intuition in Chapter 4 when we make a precise mathematical definition of dimension.
EXAMPLE 17 B = {[1, 0, 0, 1], [0, 1, 0, −2], [0, 1, 1, −1]} is linearly independent, so Span B is a hyperplane in R4.
EXERCISE 6 Give a vector equation for a hyperplane in R5.
As we will see, we will sometimes want to refer to a k-dimensional object in Rn. So,
we make one more definition.
DEFINITION
k-Flat in Rn
Let ~v1, . . . ,~vk, ~b ∈ Rn with {~v1, . . . ,~vk} linearly independent. We call the set with
vector equation ~x = c1~v1 + . . . + ck~vk + ~b, c1, . . . , ck ∈ R a k-flat in Rn through ~b.
Observe that a point in Rn is a 0-flat, a line in Rn is a 1-flat, and a hyperplane in Rn is
an (n − 1)-flat.
Section 1.2 Problems
1. Describe geometrically the following sets and
write a simplified vector equation for each.
(a) Span{[1, 0]}
(b) Span{[−1, 1], [2, −2]}
(c) Span{[1, −3, 1], [−2, 6, −2]}
(d) {[1, −3, 1], [−2, 6, −2]}
(e) Span{[1, 0, −2], [2, 1, −1]}
(f) Span{[0, 0, 0]}
(g) Span{[1, 0, 1, 1], [1, 0, 2, 1], [3, 1, 0, 0]}
(h) Span{[1, 1, 1, 0]}
2. Determine which of the following sets are linearly independent. If a set is linearly dependent, write one of the vectors in the set as a linear combination of the others.
(a) {[1, 2], [1, 3], [1, 4]}
(b) {[3, 1], [−1, 3]}
(c) {[1, 1], [1, 0]}
(d) {[2, 3], [−4, −6]}
(e) {[1, 2, 1]}
(f) {[1, −3, −2], [4, 6, 1], [0, 0, 0]}
(g) {[1, 1, 0], [1, 2, −1], [−2, −4, 2]}
(h) {[1, −2, 1], [2, 3, 4], [0, −1, −2]}
(i) {[1, 1, 2, 1], [2, 2, 4, 2], [1, 0, 1, 0], [2, 1, 3, 1]}
(j) {[1, 1, 0, 1], [0, 1, 1, 1]}
3. Determine which of the following sets forms a basis for R2.
(a) {[3, 2]}
(b) {[2, 3], [1, 0]}
(c) {[−1, 1], [1, 3]}
(d) {[−1, 1], [1, 3], [0, 2]}
(e) {[1, 0], [−3, 5]}
(f) {[1, 0], [0, −3], [0, 0]}
4. Determine which of the following sets forms a basis for R3.
(a) {[1, 2, 1], [0, 0, 0], [1, 4, 3]}
(b) {[−1, 2, −1], [1, 1, 2]}
(c) {[1, 0, 1]}
(d) {[1, 0, 1], [0, 1, 1], [1, 1, 0]}
(e) {[1, 1, 1], [1, 1, 0], [0, 0, 3]}
(f) {[1, 1, −1], [1, 2, −3], [−1, −1, 2]}
5. Write a basis for a hyperplane in R3. Justify
your answer.
6. Prove that {~v1,~v2} is linearly independent if and
only if neither ~v1 nor ~v2 is a scalar multiple of
the other.
7. Prove that if ~v1 ∈ Span{~v2, ~v3}, then {~v1, ~v2, ~v3} is linearly dependent.
8. Let ~x = [1, 1, 0] and ~y = [0, 1, 2].
(a) Prove that {~x, ~y} is not a basis for R3.
(b) Find, with proof, a vector ~z ∈ R3 such that
{~x, ~y,~z} is a basis for R3.
(c) With your ~z from part (b), is {~x, ~y, t~z} a ba-
sis for R3 for all t ∈ R? If so, give a proof.
If not, what restriction must you make on
t?
(d) Is there a vector ~w that is not a scalar multiple of ~z such that {~x, ~y, ~w} is a basis for R3?
9. Repeat Question 8 with ~x = [1, 2, −1] and ~y = [0, 1, 2].
10. Prove that if {~v1,~v2,~v3} is linearly independent,
then {~v1,~v2} is also linearly independent.
11. Prove that Span{~v1, ~v2} = Span{~v1, t~v2} for any t ∈ R, t ≠ 0.
12. Prove that if {~v1, ~v2} is linearly independent and ~v3 ∉ Span{~v1, ~v2}, then {~v1, ~v2, ~v3} is linearly independent.
13. Let ~a, ~b, ~c ∈ R4 and let r, s, t be non-zero real
numbers. Prove that
Span{~a, ~b, ~c} = Span{r~a, s~b, t~c}
14. Determine whether each statement is true or
false. Justify your answer.
(a) {~v1} is linearly independent if and only if ~v1 ≠ ~0.
(b) If ~v1, ~v2 ∈ R3, then Span{~v1, ~v2} is a plane in R3.
(c) If {~v1, ~v2} spans R2, then {~v1, ~v1 + ~v2} spans R2.
(d) If {~v1, ~v2, ~v3} is linearly dependent, then ~v3 ∈ Span{~v1, ~v2}.
(e) If ~v1 is not a scalar multiple of ~v2, then {~v1, ~v2} is linearly independent.
(f) If ~b is a scalar multiple of ~d, then ~x = t~d + ~b is a line through the origin.
(g) If ~v1, . . . , ~vk ∈ Rn, then ~0 ∈ Span{~v1, . . . , ~vk}.
(h) If ~x1, ~x2, ~x3, ~x4 ∈ R4 are distinct vectors such that {~x1, ~x2} and {~x3, ~x4} are both linearly independent sets, then {~x1, ~x2, ~x3, ~x4} is also linearly independent.
(i) If {~v1, . . . , ~vk} spans a subset S of Rn and {~w1, . . . , ~wℓ} spans a subset T of Rn, then Span{~v1, . . . , ~vk, ~w1, . . . , ~wℓ} = S ∪ T.
Section 1.3 Subspaces 23
1.3 Subspaces
As we mentioned when we first looked at the ten properties of vector addition and
scalar multiplication in Theorem 1.1.1, properties V1 and V6 seem rather obvious.
However, it is easy to see that not all subsets of Rn have this property. For example,
the set S = {[1, 1]} clearly does not satisfy properties V1 and V6 since
[1, 1] + [1, 1] = [2, 2] ∉ S
c [1, 1] = [c, c] ∉ S if c ≠ 1
A set which satisfies property V1 is said to be closed under addition. A set which
satisfies property V6 is said to be closed under scalar multiplication.
You might be wondering why we should be so interested in whether a set has prop-
erties V1 and V6 or not. One reason is that a non-empty subset S of Rn which has
these properties is closed under linear combinations. Moreover, such a subset S will
in fact satisfy all the same ten properties as Rn. That is, it is a “space” which is a
subset of the larger space Rn.
DEFINITION
Subspace of Rn
A subset S of Rn is called a subspace of Rn if for every ~x, ~y, ~w ∈ S and c, d ∈ R we
have
S1 ~x + ~y ∈ S
S2 (~x + ~y) + ~w = ~x + (~y + ~w)
S3 ~x + ~y = ~y + ~x
S4 There exists a vector ~0 ∈ S such that ~x + ~0 = ~x for all ~x ∈ S
S5 For every ~x ∈ S there exists (−~x) ∈ S such that ~x + (−~x) = ~0
S6 c~x ∈ S
S7 c(d~x) = (cd)~x
S8 (c + d)~x = c~x + d~x
S9 c(~x + ~y) = c~x + c~y
S10 1~x = ~x
It would be tedious to have to check all ten properties each time we wanted to prove
a non-empty subset S of Rn is a subspace. The good news is that this is not necessary.
Observe that properties S2, S3, S7, S8, S9, and S10 are only about the operations of
addition and scalar multiplication. Thus, we already know that these properties hold
by Theorem 1.1.1. Also, if S1 and S6 hold, then for any ~v ∈ S we have ~0 = 0~v ∈ S and (−~v) = (−1)~v ∈ S, so S4 and S5 hold. Thus, to check if a non-empty subset of Rn is a subspace, we can use the following result.
THEOREM 1 (Subspace Test)
Let S be a non-empty subset of Rn. If ~x + ~y ∈ S and c~x ∈ S for all ~x, ~y ∈ S and c ∈ R,
then S is a subspace of Rn.
EXAMPLE 1 Show that the line through the origin in R2 with vector equation ~x = s [1, −4], s ∈ R is a subspace of R2.
Solution: By V6, we know that s [1, −4] ∈ R2 for every s ∈ R. Thus, the line is a non-empty subset of R2.
To show that the line is closed under addition (satisfies S1), we pick any two vectors ~x and ~y on the line and show that ~x + ~y is also on the line. Pick ~x = s1 [1, −4] and ~y = s2 [1, −4] for some s1, s2 ∈ R. Then, since ~x and ~y are in R2, we can apply property V8 to get
~x + ~y = s1 [1, −4] + s2 [1, −4] = (s1 + s2) [1, −4]
and hence ~x + ~y is also on the line. Similarly, to show that the line is closed under scalar multiplication (satisfies S6) we need to show that c~x is also on the line for any c ∈ R. As above, we can use property V7 to get
c~x = c(s1 [1, −4]) = (cs1) [1, −4]
So, c~x is also on the line. Hence, by the Subspace Test, the line is a subspace of R2.
If S is non-empty and is closed under scalar multiplication, then our work above
shows us that S must contain ~0. Thus, the standard way of checking if S is non-
empty is to determine if ~0 satisfies the conditions of the set. If ~0 is not in S, then we
automatically know that it is not a subspace of Rn as it would not satisfy S4 or S6.
EXERCISE 1 If L is a line in R2 that does not pass through the origin, can L be a subspace of R2?
EXAMPLE 2 Prove that S1 = {[x1, x2, x3] ∈ R3 | x1 + x2 = 0, x1 − x3 = 0} is a subspace of R3.
Solution: By definition S1 is a subset of R3 and ~0 satisfies the conditions of the set (0 + 0 = 0 and 0 − 0 = 0) so ~0 ∈ S1. Thus, S1 is a non-empty subset of R3.
To show that S1 is closed under addition we pick two vectors ~x, ~y ∈ S1 and show that ~x + ~y satisfies the conditions of S1. If ~x = [x1, x2, x3] and ~y = [y1, y2, y3] are in S1, then we have x1 + x2 = 0, x1 − x3 = 0, y1 + y2 = 0, and y1 − y3 = 0.
By definition of addition, we get
~x + ~y = [x1 + y1, x2 + y2, x3 + y3]
We see that
(x1 + y1) + (x2 + y2) = (x1 + x2) + (y1 + y2) = 0 + 0 = 0
and
(x1 + y1) − (x3 + y3) = (x1 − x3) + (y1 − y3) = 0 + 0 = 0
Hence, ~x + ~y satisfies the conditions of S1, so S1 is closed under addition.
Similarly, for any c ∈ R we have
c~x = [cx1, cx2, cx3]
and
cx1 + cx2 = c(x1 + x2) = c0 = 0, cx1 − cx3 = c(x1 − x3) = c0 = 0
Hence, S1 is also closed under scalar multiplication.
Therefore, by the Subspace Test S1 is a subspace of R3.
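The proof above covers all vectors of S1 at once; a computation can only spot-check it, but doing so is a useful sanity test. A minimal Python sketch (the helper names are ours, not the text's) that checks the two closure conditions on sample vectors of S1:

```python
def in_S1(x):
    """Membership test for S1 = {x in R^3 : x1 + x2 = 0 and x1 - x3 = 0}."""
    x1, x2, x3 = x
    return x1 + x2 == 0 and x1 - x3 == 0

def add(x, y):
    """Componentwise vector addition."""
    return [a + b for a, b in zip(x, y)]

def scale(c, x):
    """Scalar multiplication."""
    return [c * a for a in x]

# Vectors in S1 have the form [t, -t, t]:
x, y = [2, -2, 2], [-5, 5, -5]
assert in_S1(x) and in_S1(y)
assert in_S1(add(x, y))    # closed under addition (for these samples)
assert in_S1(scale(7, x))  # closed under scalar multiplication
```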
EXAMPLE 3 Prove that S2 = {[x1, x2, x3] ∈ R3 | x1 + x2 = 1} is not a subspace of R3.
Solution: By definition S2 is a subset of R3, but we see that ~0 does not satisfy the condition of S2 (0 + 0 ≠ 1), hence S2 is not a subspace of R3.
EXAMPLE 4 Prove that S3 = {[x1, x2, x3] ∈ R3 | x1 − 2x2 = 0} is a subspace of R3.
Solution: By definition S3 is a subset of R3, and ~0 satisfies the condition of the set (0 − 2(0) = 0) so ~0 ∈ S3. Thus, S3 is a non-empty subset of R3.
Let ~x = [x1, x2, x3] and ~y = [y1, y2, y3] be vectors in S3. Then we have x1 − 2x2 = 0 and y1 − 2y2 = 0.
Then
~x + ~y = [x1 + y1, x2 + y2, x3 + y3]
and
(x1 + y1) − 2(x2 + y2) = (x1 − 2x2) + (y1 − 2y2) = 0 + 0 = 0
Hence, S3 is closed under addition.
Similarly, for any c ∈ R we have
c~x = [cx1, cx2, cx3]
and
cx1 − 2(cx2) = c(x1 − 2x2) = c0 = 0
Hence, S3 is also closed under scalar multiplication.
Therefore, S3 is a subspace of R3 by the Subspace Test.
EXERCISE 2 Let ~v, ~b ∈ Rn with ~v ≠ ~0. Show that the line S = {t~v + ~b | t ∈ R} is a subspace of Rn if and only if ~b is a scalar multiple of ~v.
THEOREM 2 If ~v1, . . . ,~vk ∈ Rn, then S = Span{~v1, . . . ,~vk} is a subspace of Rn.
Bases of Subspaces
Theorem 2 implies that if a subset S of Rn has a basis, then S must be a subspace of Rn. In Chapter 4, we will prove the converse: every subspace S of Rn has a basis. For now, we look at the procedure for finding a basis of a given subspace.
The algorithm for finding a basis of a subspace S of Rn will be an extension of what we did in the last section. We will first find a general form for a vector in S and then use the general form to find a spanning set for S. Finally, we use Theorem 1.2.1 as many times as necessary until we get a linearly independent spanning set for S.
Of course, as we saw for bases of Rn, every subspace of Rn, excluding the trivial
subspace {~0}, will have infinitely many bases.
REMARK
To prove that a set B spans a subspace S of Rn, we are required to prove that Span B ⊆ S and S ⊆ Span B. However, observe that if all of the vectors in B are in S, then we must have Span B ⊆ S since S is closed under linear combinations. Since verifying that all the vectors in B are in S is considered trivial, it is often omitted.
EXAMPLE 5 Find a basis for the subspace S1 = {[x1, x2, x3] ∈ R3 | x1 + x2 = 0, x1 − x3 = 0} of R3.
Solution: We begin by using the conditions on S1 to find a general form of a vector ~x ∈ S1. If [x1, x2, x3] ∈ S1, then we have x1 + x2 = 0 and x1 − x3 = 0. Hence, we get x2 = −x1 and x3 = x1. Therefore, every vector in S1 can be represented as
[x1, x2, x3] = [x1, −x1, x1] = x1 [1, −1, 1]
Thus, S1 ⊆ Span{[1, −1, 1]}. Moreover, since [1, −1, 1] ∈ S1 we have that Span{[1, −1, 1]} ⊆ S1. Consequently, B = {[1, −1, 1]} is a spanning set for S1. Clearly, B is also linearly independent (it contains a single non-zero vector). Hence, B is a basis for S1.
EXERCISE 3 Find another basis for the subspace S1 in Example 5.
EXAMPLE 6 Find a basis for the subspace S2 = {[a − b, b − c, c − a] | a, b, c ∈ R} of R3.
Solution: We are given that every vector in S2 has the form [a − b, b − c, c − a]. To write this as a linear combination of vectors, we factor out the variables individually to get
[a − b, b − c, c − a] = a [1, 0, −1] + b [−1, 1, 0] + c [0, −1, 1]
Thus, S2 ⊆ Span{[1, 0, −1], [−1, 1, 0], [0, −1, 1]}. Since each of the vectors is also in S2, we get that {[1, 0, −1], [−1, 1, 0], [0, −1, 1]} spans S2.
However, observe that this set is linearly dependent. In particular, we have
[0, −1, 1] = −[1, 0, −1] − [−1, 1, 0]
Therefore, by Theorem 1.2.1 we get that S2 = Span{[1, 0, −1], [−1, 1, 0]}.
Since B = {[1, 0, −1], [−1, 1, 0]} is also linearly independent (neither vector is a scalar multiple of the other), we get that B is a basis for S2.
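The pruning in Example 6 (repeatedly dropping a vector that is a linear combination of the others, by Theorem 1.2.1) can be phrased as a greedy procedure: walk through the spanning set and keep a vector only if it enlarges the span of the vectors kept so far. A Python sketch of this idea, using exact arithmetic over the rationals (the function names are ours, not the text's):

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) via Gaussian elimination over the rationals."""
    A = [[Fraction(v) for v in row] for row in rows]
    r = 0  # pivots found so far
    for col in range(len(A[0]) if A else 0):
        piv = next((i for i in range(r, len(A)) if A[i][col] != 0), None)
        if piv is None:
            continue
        A[r], A[piv] = A[piv], A[r]
        for i in range(r + 1, len(A)):  # clear the column below the pivot
            f = A[i][col] / A[r][col]
            A[i] = [a - f * b for a, b in zip(A[i], A[r])]
        r += 1
    return r

def extract_basis(vectors):
    """Keep each vector only if it is not in the span of those already kept."""
    kept = []
    for v in vectors:
        if rank(kept + [v]) > rank(kept):
            kept.append(v)
    return kept

# The spanning set of S2 from Example 6:
print(extract_basis([[1, 0, -1], [-1, 1, 0], [0, -1, 1]]))
# [[1, 0, -1], [-1, 1, 0]]
```

Applied to the spanning set of S2 this keeps the first two vectors and drops the third, reproducing the basis found above.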
The purpose of the next exercise is to stress the importance of verifying that the
vectors in a set B are actually in the set S when trying to prove that SpanB = S.
EXERCISE 4 Consider the subspace S = {[x1, x1, x1] ∈ R3 | x1 ∈ R}.
(a) Show that every vector in S is a linear combination of ~v1 = [1, 0, 1] and ~v2 = [0, 1, 0].
(b) Explain why B = {~v1, ~v2} does not span S.
The importance and usefulness of bases will become more and more clear as we
proceed further into linear algebra. For now, the most important thing to realize
about bases is that they give us a simple explicit formula for every vector in the set.
One advantage of having this is that it becomes very easy to identify the geometric
interpretation of the subspace.
EXAMPLE 7 Find a geometric interpretation of the subspace S3 = {[x1, x2, x3] ∈ R3 | x1 − 2x2 = 0}.
Solution: Every [x1, x2, x3] ∈ S3 satisfies x1 − 2x2 = 0. We have x1 = 2x2 and so every vector in S3 can be represented as
[x1, x2, x3] = [2x2, x2, x3] = x2 [2, 1, 0] + x3 [0, 0, 1]
Since we also have that [2, 1, 0], [0, 0, 1] ∈ S3, we get that B = {[2, 1, 0], [0, 0, 1]} spans S3. Finally, B is linearly independent since neither vector in B is a scalar multiple of the other. So, B is a basis for S3.
Therefore, by definition, S3 is a plane in R3 through the origin.
Section 1.3 Problems
1. Determine which of the following sets are subspaces of the given Rn. Find a basis of each subspace and describe the subspace geometrically.
(a) S1 = Span{[1, 1, −1], [2, 1, 4]} of R3
(b) S2 = {[x1, x2] ∈ R2 | x1 = x2} of R2
(c) S3 = {[5, x2, x3] | x2, x3 ∈ R} of R3
(d) S4 = {[x1, x2, x3] ∈ R3 | x1 + x2 = x3} of R3
(e) S5 = {[0, 0, 0, 0]} of R4
(f) S6 = {[1, 2, 3]} of R3
(g) S7 = {[x1, x2, x3, x4] | x2 = x1 − x3 + x4} of R4
(h) S8 = {[a − b, b − d, a + b − 2d, c − d] | a, b, c, d ∈ R} of R4
2. Prove that a plane in R3 is a subspace of R3 if
and only if it passes through the origin.
3. Let S = {~v1, . . . ,~vk} be a set of non-zero vectors
in Rn. Prove that S is not a subspace of Rn.
4. Let B = {[1, 3, 1], [2, 0, 1]}.
(a) Prove that B is not a basis for R3.
(b) Find a subspace S of R3 such that B is a basis for S.
5. Explain the difference between a subset of Rn
and a subspace of Rn.
6. Determine whether the set B is a basis for the given subspace S of R3.
(a) S1 = {[x1, x2, x3] ∈ R3 | x1 − 2x2 + x3 = 0}; B = {[0, 1, 2], [1, 0, −1]}
(b) S2 = {[x1, x2, x3] ∈ R3 | x3 = 2x1 − 3x2}; B = {[1, 1, −1]}
(c) S3 = {[x1, x2, x3] ∈ R3 | x1 + 2x2 = 0}; B = {[1, 2, 0], [0, 0, 1]}
(d) S4 = {[x1, x2, x3] ∈ R3 | x1 = 2x2, x2 = −4x3}; B = {[4, 2, −1/2]}
(e) S5 = {[x1, x2, x3] ∈ R3 | 3x1 + x2 = x3}; B = {[2, 1, 7], [1, −1, 2]}
7. Find a basis for the subspace S of R3 defined by S = Span{[1, −1, 3], [−1, 0, 2], [0, 1, −5]}.
8. Let B = {[1, 2, 1], [2, 2, 1]}. Find a, b, c ∈ R such that B is a basis for the subspace
S = {[x1, x2, x3] ∈ R3 | ax1 + bx2 + cx3 = 0}
of R3.
9. Let a, b, c, d ∈ R with d ≠ 0 and let P = { [x1, x2, x3] ∈ R3 | ax1 + bx2 + cx3 = d }.
(a) Show that P is not a subspace of R3.
(b) Explain why P does not have a basis.
(c) Find a vector equation for P.
(d) What is P geometrically?
1.4 Dot Product, Cross Product, and Scalar Equations
So far, we have looked at how to add two vectors in Rn and how to multiply a vector
by a scalar. We now look at two ways of defining a product of two vectors.
Dot Product
Recall that the dot product in R2 and R3 is defined by
[x1, x2] · [y1, y2] = x1y1 + x2y2
[x1, x2, x3] · [y1, y2, y3] = x1y1 + x2y2 + x3y3
Two useful aspects of the dot product are calculating the length of a vector and determining whether two vectors are orthogonal. In particular, the length of a vector ~v is ‖~v‖ = √(~v · ~v), and two vectors ~x and ~y are orthogonal if and only if ~x · ~y = 0. In
relation to the latter, the dot product also determines the angle between two vectors
in R2.
THEOREM 1 If ~x, ~y ∈ R2 and θ is an angle between ~x and ~y, then
~x · ~y = ‖~x ‖ ‖~y ‖ cos θ
Figure 1.4.1: An angle between vectors ~x and ~y.
Since the dot product is so useful in R2 and R3, it makes sense to extend it to Rn.
DEFINITION
Dot Product
Let ~x = [x1, . . . , xn] and ~y = [y1, . . . , yn] be vectors in Rn. The dot product of ~x and ~y is
~x · ~y = x1y1 + · · · + xnyn = Σ_{i=1}^{n} xiyi
REMARK
The dot product is sometimes called the standard inner product on Rn, or the scalar
product.
EXAMPLE 1 Let ~x = [1, 2, −1, 1], ~y = [0, 0, 1, 1], and ~z = [3, 1, −2, −2]. Calculate ~x · ~y, ~y · ~z, ~z · ~z, and (~x · ~z)~z.
Solution: We have
~x · ~y = 1(0) + 2(0) + (−1)(1) + 1(1) = 0
~y ·~z = 0(3) + 0(1) + 1(−2) + 1(−2) = −4
~z ·~z = 3(3) + 1(1) + (−2)(−2) + (−2)(−2) = 18
(~x · ~z)~z = (1(3) + 2(1) + (−1)(−2) + 1(−2))~z = 5[3, 1, −2, −2] = [15, 5, −10, −10]
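Computations like these are easy to cross-check by machine. A minimal sketch in Python (the helper name `dot` is our own, not from the text):

```python
def dot(x, y):
    """Dot product of two vectors of equal length: the sum of x_i * y_i."""
    assert len(x) == len(y), "vectors must have the same number of components"
    return sum(xi * yi for xi, yi in zip(x, y))

x = [1, 2, -1, 1]
y = [0, 0, 1, 1]
z = [3, 1, -2, -2]

print(dot(x, y))                     # 0, so x and y are orthogonal
print(dot(y, z))                     # -4
print(dot(z, z))                     # 18
print([dot(x, z) * zi for zi in z])  # (x . z) z = 5 z = [15, 5, -10, -10]
```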
As usual when looking at a new operation, we should prove some important prop-
erties of the dot product. It also should be noted that these properties are not only
important when working with the dot product, but, like Theorem 1.1.1, we will make
important use of these properties later in the course.
THEOREM 2 If ~x, ~y,~z ∈ Rn and s, t ∈ R, then
(1) ~x · ~x ≥ 0, and ~x · ~x = 0 if and only if ~x = ~0
(2) ~x · ~y = ~y · ~x
(3) ~x · (s~y + t~z) = s(~x · ~y) + t(~x · ~z)
Proof: Let ~x = [x1, . . . , xn], ~y = [y1, . . . , yn], and ~z = [z1, . . . , zn]. For (1) we have
~x · ~x = x1² + · · · + xn²
Since this is a sum of non-negative numbers, we get that ~x · ~x ≥ 0 for all ~x, and that
~x · ~x = 0 if and only if xi = 0 for 1 ≤ i ≤ n.
For (2) we have
~x · ~y = x1y1 + · · · + xnyn = y1x1 + · · · + ynxn = ~y · ~x
For (3) we have
~x · (s~y + t~z) = [x1, . . . , xn] · [sy1 + tz1, . . . , syn + tzn]
= x1(sy1 + tz1) + · · · + xn(syn + tzn)
= sx1y1 + tx1z1 + · · · + sxnyn + txnzn
= s(x1y1 + · · · + xnyn) + t(x1z1 + · · · + xnzn)
= s(~x · ~y) + t(~x ·~z)
∎
Observe that property (1) allows us to define the length of a vector in Rn to match our
formula in R2 and R3.
DEFINITION
Length
Let ~x ∈ Rn. The length or norm of ~x is defined to be
‖~x‖ = √(~x · ~x)
EXAMPLE 2 Find the lengths of ~x = [1, 2, −1, 0], ~y = [1, −2, 3, 1, 1], and ~z = [1/2, 1/2, 1/2, 1/2].
Solution: We have
‖~x‖ = √(~x · ~x) = √(1² + 2² + (−1)² + 0²) = √6
‖~y‖ = √(~y · ~y) = √(1² + (−2)² + 3² + 1² + 1²) = √16 = 4
‖~z‖ = √(~z · ~z) = √((1/2)² + (1/2)² + (1/2)² + (1/2)²) = √1 = 1
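These length computations are also easy to verify by machine. A minimal sketch in Python (the helper name `norm` is our own, not from the text):

```python
import math

def norm(x):
    """Length (norm) of a vector: the square root of x . x."""
    return math.sqrt(sum(xi * xi for xi in x))

print(norm([1, 2, -1, 0]))         # sqrt(6), approximately 2.449
print(norm([1, -2, 3, 1, 1]))      # 4.0
print(norm([0.5, 0.5, 0.5, 0.5]))  # 1.0, a unit vector
```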
In some applications, it is useful to have vectors of length 1. So, we make the fol-
lowing definition.
DEFINITION
Unit Vector
A vector ~x ∈ Rn such that ‖~x ‖ = 1 is called a unit vector.
THEOREM 3 If ~x, ~y ∈ Rn and c ∈ R, then
(1) ‖~x‖ ≥ 0, and ‖~x‖ = 0 if and only if ~x = ~0
(2) ‖c~x‖ = |c| ‖~x‖
(3) |~x · ~y| ≤ ‖~x‖ ‖~y‖ (Cauchy-Schwarz-Bunyakovski Inequality)
(4) ‖~x + ~y‖ ≤ ‖~x‖ + ‖~y‖ (Triangle Inequality)
Proof: We prove (3) and leave the rest as exercises.
If ~x = ~0, the result is obvious. For ~x ≠ ~0, ~y ∈ Rn and t ∈ R, we use property (1) and
properties in Theorem 2 to get
0 ≤ ‖t~x + ~y‖2 = (t~x + ~y) · (t~x + ~y) = (~x · ~x)t2 + 2(~x · ~y)t + (~y · ~y)
This is a polynomial in the variable t where the coefficient ~x · ~x of t2 is positive. For
this polynomial to be non-negative it must have non-positive discriminant. Hence,
(2~x · ~y)2 − 4(~x · ~x)(~y · ~y) ≤ 0
4(~x · ~y)2 − 4‖~x ‖2 ‖~y ‖2 ≤ 0
(~x · ~y)2 ≤ ‖~x ‖2‖~y ‖2
Taking square roots of both sides gives the desired result. ∎
We now also extend the concept of angles and orthogonality to vectors in Rn.
DEFINITION
Angle in Rn
Let ~x, ~y ∈ Rn. We define an angle between ~x and ~y to be an angle θ such that
~x · ~y = ‖~x ‖ ‖~y ‖ cos θ
REMARK
Property (3) of Theorem 3 guarantees that such an angle θ exists for each pair ~x, ~y.
Observe that if the angle between two vectors ~x and ~y is θ = π/2, then we get that
~x · ~y = 0. On the other hand, if ~x · ~y = 0, then we can take θ = π/2. Hence, we make
the following definition.
DEFINITION
Orthogonal
Let ~x, ~y ∈ Rn. If ~x · ~y = 0, then ~x and ~y are said to be orthogonal.
EXAMPLE 3 Let ~w = [1, 0, 1, 2], ~y = [2, 3, −4, 1], ~u = [1, 2, 1, −1, 1], and ~v = [1, 1, 2, 1, 5]. We have that ~w and ~y are orthogonal because
~w · ~y = 1(2) + 0(3) + 1(−4) + 2(1) = 0
Meanwhile, ~u and ~v are not orthogonal because
~u · ~v = 1(1) + 2(1) + 1(2) + (−1)(1) + 1(5) = 9
Notice that our definition of orthogonal does not match exactly with our geometric
view of two vectors being perpendicular because of the zero vector. In particular, we
have the following theorem.
THEOREM 4 The zero vector ~0 ∈ Rn is orthogonal to every vector ~x ∈ Rn.
We can also extend the definition of orthogonal to a set of vectors.
DEFINITION
Orthogonal Set
A set of vectors {~v1, . . . , ~vk} in Rn is called an orthogonal set if ~vi · ~vj = 0 for all i ≠ j.
EXAMPLE 4 Show that the standard basis vectors for R3 form an orthogonal set.
Solution: We have
[1, 0, 0] · [0, 1, 0] = 0, [1, 0, 0] · [0, 0, 1] = 0, and [0, 0, 1] · [0, 1, 0] = 0
as required.
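Checking that a set is orthogonal just means checking every pair of distinct vectors, which is easy to automate. A minimal sketch in Python (the function name `is_orthogonal_set` is our own):

```python
from itertools import combinations

def is_orthogonal_set(vectors):
    """A set is orthogonal if every pair of distinct vectors has dot product 0."""
    dot = lambda x, y: sum(a * b for a, b in zip(x, y))
    return all(dot(u, v) == 0 for u, v in combinations(vectors, 2))

# The standard basis vectors for R^3 form an orthogonal set:
std_basis = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(is_orthogonal_set(std_basis))  # True
```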
REMARK
In fact, one reason the standard basis vectors for Rn are so easy to work with is be-
cause they form an orthogonal set of unit vectors. Such a set is called an orthonormal
set. We will look at other orthonormal sets in Chapter 9.
Cross Product
We have seen the dot product of two vectors is a scalar which has a nice geometric
interpretation. We now look at another way of taking a product of two vectors in R3.
Let ~v = [v1, v2, v3], ~w = [w1, w2, w3] ∈ R3. In many applications, it is necessary to determine a vector ~n = [n1, n2, n3] which is orthogonal to both ~v and ~w. That is, we want
0 = ~v · ~n = v1n1 + v2n2 + v3n3
0 = ~w · ~n = w1n1 + w2n2 + w3n3
Solving these 2 equations in 3 unknowns, we find that one solution is
~n = [v2w3 − v3w2, v3w1 − v1w3, v1w2 − v2w1]
Hence, we make the following definition.
DEFINITION
Cross Product
Let ~v, ~w ∈ R3. The cross product of ~v = [v1, v2, v3] and ~w = [w1, w2, w3] is defined to be
~v × ~w = [v2w3 − v3w2, v3w1 − v1w3, v1w2 − v2w1]
REMARK
The 2 equations in 3 unknowns we solved to derive the formula for the cross-product
in fact has infinitely many solutions. The natural question to ask is why we picked
this particular one. The reason is two-fold. First, we have picked it so that {~v, ~w, ~v × ~w} forms what is called a right-handed system. Secondly, we have picked it so that it
has a nice geometric interpretation (see Section 5.5).
EXAMPLE 5 We have
[1, 2, 3] × [6, 5, 4] = [2(4) − 3(5), −(1(4) − 3(6)), 1(5) − 2(6)] = [−7, 14, −7]
[0, 1, 0] × [1, 0, 0] = [1(0) − 0(0), −(0(0) − 0(1)), 0(0) − 1(1)] = [0, 0, −1]
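The component formula translates directly into code. A minimal sketch in Python (the function name `cross` is our own), which also checks property (1) of Theorem 5 below, that the result is orthogonal to both factors:

```python
def cross(v, w):
    """Cross product of two vectors in R^3, by the component formula."""
    v1, v2, v3 = v
    w1, w2, w3 = w
    return [v2 * w3 - v3 * w2,
            v3 * w1 - v1 * w3,
            v1 * w2 - v2 * w1]

n = cross([1, 2, 3], [6, 5, 4])
print(n)  # [-7, 14, -7], as in Example 5

# The result is orthogonal to both factors:
dot = lambda x, y: sum(a * b for a, b in zip(x, y))
print(dot(n, [1, 2, 3]), dot(n, [6, 5, 4]))  # 0 0
```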
REMARK
One way to help remember the formula for the cross product is as follows: Write the components of the vectors next to each other as

v1 w1
v2 w2
v3 w3

To get the first component of ~v × ~w we cover the first row and calculate the difference of the products of the cross-terms. That is,

v2 w2
v3 w3   gives v2w3 − v3w2

To get the second component we cover the second row and calculate the negative of the difference of the products of the cross-terms:

v1 w1
v3 w3   gives −(v1w3 − v3w1) = v3w1 − v1w3

For the third component we cover the third row and calculate the difference of the products of the cross-terms like we did for the first component:

v1 w1
v2 w2   gives v1w2 − v2w1

Be careful to include the extra negative in the calculation of the second component.
EXERCISE 1 Evaluate the following:
(a) [1, 0, 0] × [0, 1, 0]
(b) [3, −1, 1] × [1, 2, −1]
(c) [1, 1, 3] × [c, c, 3c]
The following theorem gives some of the important properties of the cross product.
THEOREM 5 Suppose that ~v, ~w, ~x ∈ R3 and c ∈ R.
(1) If ~n = ~v × ~w, then for any ~y ∈ Span{~v, ~w} we have ~y · ~n = 0
(2) ~v × ~w = −~w × ~v
(3) ~v × ~v = ~0
(4) ~v × ~w = ~0 if and only if either ~v = ~0 or ~w is a scalar multiple of ~v
(5) ~v × (~w + ~x) = ~v × ~w + ~v × ~x
(6) (c~v) × ~w = c(~v × ~w)
(7) ‖~v × ~w‖ = ‖~v‖ ‖~w‖ |sin θ|, where θ is the angle between ~v and ~w
REMARK
It is important to remember that, as derived, the cross product of ~v, ~w ∈ R3 gives a
vector that is orthogonal to ~v and ~w (as stated in property (1) of Theorem 5).
Scalar Equation of a Plane
In some applications the vector equation of a plane is not the most suitable. We now
look at how to change from the vector equation of a plane in R3 to another useful
representation.
THEOREM 6 Let ~v, ~w, ~b ∈ R3 with {~v, ~w} being linearly independent and let P be a plane in R3
with vector equation ~x = s~v + t~w + ~b, s, t ∈ R. If ~n = ~v × ~w, then an equation for the
plane is
(~x − ~b) · ~n = 0 (1.6)
Proof: We will prove that equation 1.6 is an equation for the plane by showing that
a point is on the plane if and only if it satisfies equation 1.6.
First observe that since {~v, ~w} is linearly independent, neither ~v nor ~w is a scalar multiple of the other. Thus, property (4) of Theorem 5 gives ~n ≠ ~0.
Consider any point X(x1, x2, x3) in the plane. For the corresponding vector ~x = [x1, x2, x3], the vector equation of the plane gives ~x − ~b = s~v + t~w. Thus,
(~x − ~b) · ~n = (s~v + t~w) · ~n = 0
by property (1) of Theorem 5. Hence, every point in the plane satisfies (1.6).
On the other hand, if X(x1, x2, x3) is any point in R3 that satisfies (~x − ~b) · ~n = 0, then
0 = ([x1, x2, x3] − [b1, b2, b3]) · [v2w3 − v3w2, v3w1 − v1w3, v1w2 − v2w1]
0 = [x1 − b1, x2 − b2, x3 − b3] · [v2w3 − v3w2, v3w1 − v1w3, v1w2 − v2w1]
0 = (v2w3 − v3w2)(x1 − b1) + (v3w1 − v1w3)(x2 − b2) + (v1w2 − v2w1)(x3 − b3)
Since ~n ≠ ~0, we get that at least one entry of ~n is non-zero. We will prove the case where v1w2 − v2w1 ≠ 0. The other two cases are similar.
Bringing (v1w2 − v2w1)(x3 − b3) to the other side gives
−(v1w2 − v2w1)(x3 − b3) = (v2w3 − v3w2)(x1 − b1) + (v3w1 − v1w3)(x2 − b2)
Therefore, we get
−(v1w2 − v2w1)[x1 − b1, x2 − b2, x3 − b3]
= [−(v1w2 − v2w1)(x1 − b1), −(v1w2 − v2w1)(x2 − b2), (v2w3 − v3w2)(x1 − b1) + (v3w1 − v1w3)(x2 − b2)]
= (x1 − b1)[−(v1w2 − v2w1), 0, v2w3 − v3w2] + (x2 − b2)[0, −(v1w2 − v2w1), v3w1 − v1w3]
= (x1 − b1)(−w2~v + v2~w) + (x2 − b2)(w1~v − v1~w)
Dividing both sides by −(v1w2 − v2w1) gives
~x − ~b = [(−w2(x1 − b1) + w1(x2 − b2))/(−v1w2 + v2w1)] ~v + [(v2(x1 − b1) − v1(x2 − b2))/(−v1w2 + v2w1)] ~w
Consequently, there exist s, t ∈ R such that ~x = s~v + t~w + ~b. Hence, the only points in R3 which satisfy (1.6) are in P.
Thus, the equation (~x − ~b) · ~n = 0 defines the plane. ∎
Observe that we can take the vector ~n in Theorem 6 to be any non-zero scalar multiple
of ~v × ~w. Moreover, we can use properties of the dot product to rearrange equation
(1.6) to be in the form ~x · ~n = ~b · ~n.
DEFINITION
Normal Vector
Scalar Equation
Let P be a plane in R3 that passes through the point B(b1, b2, b3). If ~n = [n1, n2, n3] ∈ R3 is
a vector such that
n1x1 + n2x2 + n3x3 = b1n1 + b2n2 + b3n3 (1.7)
is an equation for P, then ~n is called a normal vector for P. We call equation (1.7) a
scalar equation of P.
EXAMPLE 6 Find a scalar equation of the plane through B(1, 2, 3) with normal vector ~n = [1, −1, 2].
Solution: By definition a scalar equation of the plane is
n1x1 + n2x2 + n3x3 = b1n1 + b2n2 + b3n3
1x1 + (−1)x2 + 2x3 = 1(1) + (−1)(2) + 2(3) = 5
EXAMPLE 7 What is a normal vector for the plane with scalar equation 3x1 + x2 − x3 = 4?
Solution: By definition, the coefficients of x1, x2, and x3 in a scalar equation for the plane are the components of a normal vector ~n. Thus, ~n = [3, 1, −1].
EXERCISE 2 List 2 other normal vectors for the plane with scalar equation 3x1 + x2 − x3 = 4.
EXAMPLE 8 Find a scalar equation of the plane through A(−1, 0, 2) that is parallel to the plane
x1 − 2x2 + 3x3 = −2.
Solution: For two planes to be parallel, their normal vectors must be non-zero scalar
multiples of each other. That is, we can take a normal vector for the required plane to be ~n = [1, −2, 3]. Thus, a scalar equation of the required plane is
x1 − 2x2 + 3x3 = 1(−1) + (−2)(0) + 3(2) = 5
EXAMPLE 9 Find a scalar equation for the plane defined by ~x = s[3, 1, −1] + t[−1, −1, 5] + [2, 3, 3], s, t ∈ R.
Solution: By Theorem 6 we find that a normal vector for the plane is
~n = [3, 1, −1] × [−1, −1, 5] = [4, −14, −2]
We can find a point on the plane by picking any values for s and t. Taking s = t = 0,
we find that B(2, 3, 3) is on the plane. Thus, a scalar equation of the plane is
4x1 + (−14)x2 + (−2)x3 = 4(2) + (−14)(3) + (−2)(3) = −40
Of course, we could divide both sides of the equation by -2 to get a slightly simpler
scalar equation
−2x1 + 7x2 + x3 = 20
EXAMPLE 10 Find a scalar equation for the plane defined by ~x = s[4, 1, −2] + t[0, 1, 2], s, t ∈ R.
Solution: We have
[4, 1, −2] × [0, 1, 2] = [4, −8, 4]
Since any non-zero scalar multiple of this is also a normal vector to the plane, we pick ~n = [1, −2, 1]. We also notice that taking s = t = 0 shows us that the plane passes through the origin. Thus, a scalar equation of the plane is
x1 − 2x2 + x3 = 1(0) + (−2)(0) + 1(0) = 0
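The recipe used in Examples 9 and 10 (cross the two direction vectors to get ~n, then dot ~n with a point ~b on the plane) is mechanical enough to code. A minimal sketch in Python (the function name `scalar_equation` is our own):

```python
def scalar_equation(v, w, b):
    """Given a plane x = s v + t w + b in R^3, return (n, d) so that the
    scalar equation is n1 x1 + n2 x2 + n3 x3 = d, with n = v x w, d = n . b."""
    v1, v2, v3 = v
    w1, w2, w3 = w
    n = [v2 * w3 - v3 * w2, v3 * w1 - v1 * w3, v1 * w2 - v2 * w1]
    d = sum(ni * bi for ni, bi in zip(n, b))
    return n, d

n, d = scalar_equation([3, 1, -1], [-1, -1, 5], [2, 3, 3])
print(n, d)  # [4, -14, -2] -40, matching Example 9
```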
EXERCISE 3 Find a scalar equation for the plane with vector equation ~x = s[−1, −1, 3] + t[2, 1, 4], s, t ∈ R.
Scalar Equation of a Hyperplane
If {~v1, . . . ,~vm−1} is a linearly independent set of vectors in Rm, m ≥ 2, and ~b ∈ Rm,
then, ~x = c1~v1 + · · ·+ cm−1~vm−1 +~b is a vector equation of a hyperplane in Rm. If there
exists a non-zero vector ~n = [n1, . . . , nm] ∈ Rm which is orthogonal to each of ~v1, . . . , ~vm−1,
then we can repeat our argument above to get that an equation for the hyperplane,
also called a scalar equation, is
n1x1 + · · · + nmxm = n1b1 + · · · + nmbm
REMARK
Although our geometrical intuition tells us that such a vector ~n exists, we do need to
prove that it exists. To do this, we would repeat what we did in deriving the formula
for the cross product. That is, we would consider the m− 1 equations in m unknowns
given by ~n · ~v1 = 0, . . . , ~n · ~vm−1 = 0. In the next chapter we will look at the theory of
solving such systems which will guarantee us that there is a solution.
Section 1.4 Problems
1. Evaluate the following:
(a) [5, 3, −6] · [3, 2, 4]
(b) [1, −2, −2, 3] · [2, 1/2, 1/2, −1]
(c) ‖[√2, 1/√2]‖
(d) [1, 4, −1] · [3, −1, −1]
(e) ‖[1, 2, −1, 3]‖
(f) [2, 1, −3] × [1, 1, −1]
(g) [1, 1, −1] × [2, 1, −3]
(h) [3, 2, 3] × [3a, 2a, 3a]
2. Find a scalar equation for:
(a) The plane with normal vector ~n = [1, −3, 3] that passes through the origin.
(b) The plane through (1, 2, 0) that is parallel to the plane 3x1 + 2x2 − x3 = 1.
(c) The plane with vector equation ~x = s[1, 0, −2] + t[1, 1, −1], s, t ∈ R.
(d) The plane with vector equation ~x = s[3, 3, 3] + t[2, 1, −1], s, t ∈ R.
(e) The plane P = Span{ [3, 1, −2], [2, 2, 1] }.
3. Determine which of the following sets are orthogonal.
(a) { [1, 1, 2], [−1, −1, −1], [0, 1, 1] }
(b) { [1, 2, −1], [3, −2, −1], [2, 1, 4] }
(c) { [0, 0, 0], [1, 0, 1], [0, 1, 0] }
(d) { [1/√2, 1/√3, 1/√6], [0, −1/√3, 2/√6], [1/√2, −1/√3, −1/√6] }
4. Show that { [1/√3, 1/√3, 1/√3], [1/√2, 0, −1/√2], [1/√6, −2/√6, 1/√6] } is an orthogonal set of unit vectors.
5. Prove that ‖c~x ‖ = |c|‖~x ‖ for any ~x ∈ Rn and
c ∈ R.
6. Prove that if ~x ≠ ~0, then (1/‖~x‖)~x is a unit vector.
7. Prove that if ~v1 and ~v2 are unit vectors such that
~v1 ·~v2 = 0, then {~v1,~v2} is linearly independent.
8. Prove that if ~v1 · ~v2 = 0, then ‖~v1 + ~v2‖² = ‖~v1‖² + ‖~v2‖².
9. Prove that if ~n = ~v × ~w, then for any ~y ∈ Span{~v, ~w} we have ~y · ~n = 0.
10. Prove that the set of all vectors orthogonal to ~x = [1, 1, 1] is a subspace of R3.
11. Prove Theorem 4.
12. Determine whether each statement is true or
false. Justify your answer.
(a) Let ~x, ~y,~z ∈ R4. If ~x · ~y = 0 and ~y · ~z = 0,
then ~x ·~z = 0.
(b) If ~x and ~y are orthogonal, then {~x, ~y} is lin-
early independent.
(c) If ~x and ~y are orthogonal, then {s~x, t~y} is an
orthogonal set for any s, t ∈ R.
(d) If ~x and ~y are both orthogonal to~z, then~z is
orthogonal to ~v = s~x + t~y for any s, t ∈ R.
(e) ~x× (~y×~z) = (~x×~y)×~z for any ~x, ~y,~z ∈ R3.
(f) If ~n is a normal vector for a plane P in R3, then for any t ∈ R, t ≠ 0, t~n is also a normal vector for P.
1.5 Projections
We have seen how to combine vectors together via linear combinations to create other
vectors. We now look at this procedure in reverse. We want to take a vector and write
it as a sum of two other vectors. In elementary physics, this is used to split a force
into its components along certain directions (usually horizontal and vertical).
Projection onto a Line
Given vectors ~u, ~v ∈ Rn with ~v ≠ ~0, we want to write ~u as a sum of a scalar multiple of
~v and another vector ~w which is orthogonal to ~v as in Figure 1.5.1. That is, we want
~u = c~v + ~w for some c ∈ R such that ~w · ~v = 0.
Figure 1.5.1: Examples of projections
We begin by determining the value of c. Observe that
~u · ~v = (c~v + ~w) · ~v = c(~v · ~v) + ~w · ~v = c‖~v ‖2 + 0
Hence, c = (~u · ~v)/‖~v‖².
We define the quantity c~v to be the projection of ~u onto the line spanned by ~v. In
particular, it is the amount of ~u in the direction of ~v.
DEFINITION
Projection onto a Line in Rn
Let ~u, ~v ∈ Rn with ~v ≠ ~0. The projection of ~u onto ~v is defined by
proj~v(~u) = ((~u · ~v)/‖~v‖²) ~v
Now that we have the projection of ~u onto ~v, it is easy to find the desired vector ~w that is orthogonal to ~v. We get that ~w = ~u − c~v = ~u − proj~v(~u).
DEFINITION
Perpendicular of a Projection in Rn
Let ~u, ~v ∈ Rn with ~v ≠ ~0. The perpendicular of ~u onto ~v is defined by
perp~v(~u) = ~u − proj~v(~u)
EXERCISE 1 Verify that proj~v(~u) · perp~v(~u) = 0.
EXAMPLE 1 Let ~u = [2, 3] and ~v = [2, 1]. Find proj~v(~u) and perp~v(~u).
Solution: We have
proj~v(~u) = ((~u · ~v)/‖~v‖²) ~v = (7/5)[2, 1] = [14/5, 7/5]
perp~v(~u) = ~u − proj~v(~u) = [2, 3] − [14/5, 7/5] = [−4/5, 8/5]
EXAMPLE 2 Let ~u = [−1, −3] and ~v = [2, 1]. Find proj~v(~u) and perp~v(~u).
Solution: We have
proj~v(~u) = ((~u · ~v)/‖~v‖²) ~v = (−5/5)[2, 1] = [−2, −1]
perp~v(~u) = ~u − proj~v(~u) = [−1, −3] − [−2, −1] = [1, −2]
EXAMPLE 3 Let ~v = [1, 3, −1] and ~u = [0, 2, 1]. Find proj~v(~u) and perp~v(~u).
Solution: We have
proj~v(~u) = ((~u · ~v)/‖~v‖²) ~v = (5/11)[1, 3, −1] = [5/11, 15/11, −5/11]
perp~v(~u) = ~u − proj~v(~u) = [0, 2, 1] − [5/11, 15/11, −5/11] = [−5/11, 7/11, 16/11]
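The formulas proj~v(~u) and perp~v(~u) can be sketched in Python; exact rational arithmetic via the standard-library `Fraction` type keeps the answers in the same form as the examples (the function names `proj` and `perp` are our own):

```python
from fractions import Fraction

def proj(u, v):
    """Projection of u onto v: ((u . v) / ||v||^2) v, exact arithmetic."""
    dot = lambda x, y: sum(a * b for a, b in zip(x, y))
    c = Fraction(dot(u, v), dot(v, v))  # requires v != 0
    return [c * vi for vi in v]

def perp(u, v):
    """Perpendicular part: u - proj_v(u), which is orthogonal to v."""
    return [ui - pi for ui, pi in zip(u, proj(u, v))]

print(proj([2, 3], [2, 1]))  # [Fraction(14, 5), Fraction(7, 5)]
print(perp([2, 3], [2, 1]))  # [Fraction(-4, 5), Fraction(8, 5)]
```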
EXERCISE 2 Let ~a = [1, 2, 2] and ~b = [1, 0, 1]. Find proj~b(~a) and perp~b(~a).
Projection onto a Plane
So far we have been finding the projection of a vector ~u onto the line spanned by a vector ~v. How do we define the projection of a vector ~u onto a plane P = Span{~v, ~w} in R3? Our goal is the same as before. That is, we want to be able to write any vector ~x ∈ R3 as a sum of a vector ~u which is in the plane and a vector ~z which is orthogonal to the plane (that is, a scalar multiple of any normal vector).
Figure 1.5.2 suggests that the desired scalar multiple ~z is equal to the projection of ~x onto a normal vector ~n of the plane. Indeed, we know that we can write
~x = perp~n(~x) + proj~n(~x)
where perp~n(~x) is orthogonal to ~n and hence, by Theorem 1.4.6, is in P. Thus, we get our desired result by taking ~u = perp~n(~x) and ~z = proj~n(~x).
Figure 1.5.2: Projection of a vector onto a plane
DEFINITION
Projection and Perpendicular onto a Plane in R3
Let P be a plane in R3 that passes through the origin and has normal vector ~n. The
projection of ~x ∈ R3 onto P is defined by
projP(~x) = perp~n(~x)
The perpendicular of ~x onto P is defined by
perpP(~x) = proj~n(~x)
EXAMPLE 4 Find the projection and perpendicular of ~y = [2, 3, 1] onto the plane 3x1 − x2 + 4x3 = 0.
Solution: We see that a normal vector for the plane is ~n = [3, −1, 4]. Hence, we get
projP(~y) = perp~n(~y) = ~y − ((~y · ~n)/‖~n‖²)~n = [2, 3, 1] − (7/26)[3, −1, 4] = [31/26, 85/26, −1/13]
perpP(~y) = proj~n(~y) = ((~y · ~n)/‖~n‖²)~n = (7/26)[3, −1, 4] = [21/26, −7/26, 14/13]
EXAMPLE 5 Find the projection of ~u = [−1, 1, 3] onto the plane P with vector equation
~x = s[1, 2, 1] + t[−2, 2, 0], s, t ∈ R
Solution: To find the projection, we first need to find a normal vector of the plane. We have
[1, 2, 1] × [−2, 2, 0] = [−2, −2, 6]
Thus, we can take ~n = [1, 1, −3]. Then, the projection of ~u onto the plane is
projP(~u) = perp~n(~u) = ~u − ((~u · ~n)/‖~n‖²)~n = [−1, 1, 3] − (−9/11)[1, 1, −3] = [−2/11, 20/11, 6/11]
EXAMPLE 6 Find the projection of ~u = [2, 3, 5] onto the plane P = Span{ [1, −1, 0], [1, 2, 0] }.
Solution: A vector equation of the plane is ~x = s[1, −1, 0] + t[1, 2, 0], s, t ∈ R. We have
[1, −1, 0] × [1, 2, 0] = [0, 0, 3]
Thus, a normal vector of the plane is ~n = [0, 0, 1]. So, the projection of ~u onto P is
projP(~u) = perp~n(~u) = ~u − ((~u · ~n)/‖~n‖²)~n = [2, 3, 5] − (5/1)[0, 0, 1] = [2, 3, 0]
This result makes sense since P is the x1x2-plane in R3.
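The recipe of Examples 4-6 (cross the spanning vectors to get ~n, then subtract proj~n) can be sketched as one function in Python with exact arithmetic (the function name `proj_plane` is our own; it assumes, as the text does, that the spanning set is linearly independent):

```python
from fractions import Fraction

def proj_plane(x, v, w):
    """Projection of x onto the plane through the origin spanned by v and w:
    proj_P(x) = x - proj_n(x), where n = v x w is a normal vector."""
    v1, v2, v3 = v
    w1, w2, w3 = w
    n = [v2 * w3 - v3 * w2, v3 * w1 - v1 * w3, v1 * w2 - v2 * w1]
    dot = lambda a, b: sum(ai * bi for ai, bi in zip(a, b))
    c = Fraction(dot(x, n), dot(n, n))  # assumes {v, w} linearly independent
    return [xi - c * ni for xi, ni in zip(x, n)]

p = proj_plane([2, 3, 5], [1, -1, 0], [1, 2, 0])
print(p == [2, 3, 0])  # True, matching Example 6
```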
Section 1.5 Problems
1. Evaluate proj~v(~u) and perp~v(~u) where
(a) ~v = [2, 3], ~u = [−3, 0]
(b) ~v = [1, 3], ~u = [3, 2]
(c) ~v = [2, −3], ~u = [6, 4]
(d) ~v = [3, 2], ~u = [1, 3]
(e) ~v = [1, 0, 0], ~u = [3, 2, 4]
(f) ~v = [1, 1, 1], ~u = [3, 2, 4]
(g) ~v = [1, 0, 0, 1], ~u = [2, 5, −6, 6]
(h) ~v = [1, 4, 4, −1], ~u = [1, 1, 1, 1]
(i) ~v = [1, 2, 0, 1], ~u = [3, 1, 1, −1]
(j) ~v = [−2, −4, 0, −2], ~u = [3, 1, 1, −1]
2. Find the projection and perpendicular of ~v onto the plane P.
(a) ~v = [1, 0, −1] and P has scalar equation 3x1 − 2x2 + 3x3 = 0.
(b) ~v = [−1, 2, 1] and P has scalar equation x1 + 2x2 + 2x3 = 0.
(c) ~v = [3, −4, 1] and P has vector equation ~x = s[0, −1, 1] + t[0, 3, 1], s, t ∈ R.
(d) ~v = [1, 1, 2] and P has vector equation ~x = s[1, −2, 1] + t[−2, 1, −2], s, t ∈ R.
3. Let P = Span{ [1, 1, 1], [1, 0, −1] }. Find projP(~x) where
(a) ~x = [1, 2, 3]
(b) ~x = [2, 1, 0]
(c) ~x = [1, −2, 1]
(d) ~x = [2, 3, 3]
4. Find the projection of ~v = [2, 2, 1] onto the plane S = Span{ [3, 3, −1], [2, 3, −1] }.
5. Let ~v ∈ Rn with ~v ≠ ~0. Prove that for any ~x, ~y ∈ Rn we have
(a) proj~v(~x + ~y) = proj~v(~x) + proj~v(~y)
(b) proj~v(s~x) = s proj~v(~x)
(c) proj~v(proj~v(~x)) = proj~v(~x)
6. Let ~v, ~x ∈ Rn with ~v ≠ ~0. Determine whether each statement is true or false. Justify your answer.
(a) proj~v(~x) = proj−~v(~x).
(b) proj~v(−~x) = − proj~v(~x).
(c) {proj~v(~x), perp~v(~x)} is linearly independent.
(d) proj~v(perp~v(~x)) = perp~v(proj~v(~x))
7. Let P = Span{~v1,~v2} be a plane in R3 with
normal vector ~n.
(a) Explain how we know that {~v1,~v2} is
linearly independent.
(b) Prove that {~v1, ~v2, proj~n(~x)} is linearly independent for all ~x ∈ R3, ~x ∉ P.
(c) Prove that {~v1,~v2, ~n} is a basis for R3.
8. Let P(p1, p2) be a point in R2 and let L be a line
in R2 which passes through the origin.
(a) Let ~p = [p1, p2] and let ~d ≠ ~0 be a direction vector for the line L. Prove that
‖~p − proj~d(~p)‖ < ‖~p − ~x‖
for all ~x ∈ Span{~d}, ~x ≠ proj~d(~p).
(b) Explain in words what part (a) says about
the relationship of proj~d (~p) to P and L.
(c) Determine the geometric meaning of
‖ perp~d (~p)‖ in this context.
9. Suppose that ~u and ~v are orthogonal unit vec-
tors in R3.
(a) Prove that {~u,~v} is linearly independent.
(b) Prove that for every ~x ∈ R3,
perp~u×~v(~x) = proj~u(~x) + proj~v(~x)
Chapter 2
Systems of Linear Equations
In the last chapter, we saw a few cases where we needed to solve a system of m
equations in n unknowns. For example, when determining whether a set was linearly
independent, and when deriving the formula for the cross product. For each of these
cases, the examples and exercises given could be solved, with some effort, using
the method of substitution and elimination. However, in the real world, we don’t
often get a set of 3 linear equations in 3 unknowns. In applications we can easily
have thousands of equations and thousands of unknowns. Hence, we want to develop
some theory that will allow us to solve such problems.
It is important to remember while studying this chapter that in typical applications
the computations are performed by computer. Thus, although we expect that you can
solve small systems by hand, it is more important that you understand the theory of
solving systems. This theory will be used frequently throughout this book.
2.1 Systems of Linear Equations
DEFINITION
Linear Equation
An equation in n variables (unknowns) x1, . . . , xn that can be written in the form
a1x1 + · · · + anxn = b
where a1, . . . , an, b are constants is called a linear equation. The constants ai are
called the coefficients of the equation and b is called the right-hand side.
REMARKS
1. In Chapters 1 - 10 the coefficients and right-hand side will be real numbers. In
Chapter 11 we will look at linear equations where the coefficients and right-
hand side are a little more complex.
2. From our work in the last chapter, we know that a linear equation in n variables
where a1, . . . , an, b ∈ R geometrically represents a hyperplane in Rn.
EXAMPLE 1 The equation x1 + 0x2 + 2x3 = 4 is linear.
The equation x2 = −4x1 is also linear as it can be written as 4x1 + x2 = 0.
The equations x1² + x1x2 = 5 and x sin x = 0 are not linear equations.
DEFINITION
System of LinearEquations
A set of m linear equations in the same variables x1, . . . , xn is called a system of m
linear equations in n variables.
A general system of m linear equations in n variables has the form
a11x1 + a12x2 + · · · + a1nxn = b1
a21x1 + a22x2 + · · · + a2nxn = b2
⋮
am1x1 + am2x2 + · · · + amnxn = bm
Observe that the coefficient ai j represents the coefficient of x j in the i-th equation.
DEFINITION
Solution of aSystem
Solution Set
A vector ~s = [s1, . . . , sn] ∈ Rn is called a solution of a system of m linear equations in n
variables if all m equations are satisfied when we set xi = si for 1 ≤ i ≤ n. The set of
all solutions of a system of linear equations is called the solution set of the system.
DEFINITION
Consistent
Inconsistent
If a system of linear equations has at least one solution, then it is said to be
consistent. Otherwise, it is said to be inconsistent.
Observe that geometrically a system of m linear equations in n variables represents
m hyperplanes in Rn and so a solution of the system is a vector in Rn which lies on all
m hyperplanes. The system is inconsistent if all m hyperplanes do not share a point
of intersection.
EXAMPLE 2 The system of 2 equations in 3 variables
x1 + x2 + x3 = 1
x1 + x2 + x3 = 2
does not have any solutions since the two corresponding hyperplanes are parallel.
Hence, the system is inconsistent.
EXAMPLE 3 The system of 3 linear equations in 2 variables
x1 + x2 = 1
2x1 − 3x2 = 12
−2x1 − 3x2 = 0
is consistent since if we take x1 = 3 and x2 = −2, then we get
3 + (−2) = 1
2(3) − 3(−2) = 12
−2(3) − 3(−2) = 0
It can be shown this is the only solution, so the solution set is { [3, −2] }.
.
EXERCISE 1 Sketch the lines x1 + x2 = 1, 2x1 − 3x2 = 12, and −2x1 − 3x2 = 0 on a single graph to
verify the result of Example 3 geometrically.
EXAMPLE 4 It can be shown that the solution set of the system of 2 equations in 3 variables
x1 − 2x2 = 3
x1 + x2 + 3x3 = 9
has vector equation
~x = [7, 2, 0] + t[2, 1, −1], t ∈ R
Geometrically, this means that the planes x1 − 2x2 = 3 and x1 + x2 + 3x3 = 9 intersect
in the line with the given vector equation.
EXERCISE 2 Verify that every vector on the line with vector equation ~x = [7, 2, 0] + t[2, 1, −1], t ∈ R is indeed a solution of the system of linear equations in Example 4.
It is easy to show geometrically that a system of 2 linear equations in 2 variables
is either inconsistent, consistent with a unique solution, or consistent with infinitely
many solutions (you should sketch all 3 possibilities). In particular, if a system of 2
linear equations in 2 unknowns has two solutions, then it must in fact have infinitely
many solutions. We now prove that this is true for any system of linear equations.
Section 2.1 Systems of Linear Equations 51
THEOREM 1 If the system of linear equations
a11x1 + a12x2 + · · · + a1nxn = b1
a21x1 + a22x2 + · · · + a2nxn = b2
⋮
am1x1 + am2x2 + · · · + amnxn = bm
has two distinct solutions ~s = [s1, . . . , sn] and ~t = [t1, . . . , tn], then ~x = ~s + c(~s − ~t) is a distinct
solution for each c ∈ R.
Proof: Observe that the i-th equation of the system is
ai1x1 + · · · + ainxn = bi
Substituting
~x = ~s + c(~s − ~t) = [s1 + c(s1 − t1), . . . , sn + c(sn − tn)]
into the i-th equation gives
ai1(s1 + c(s1 − t1)) + · · · + ain(sn + c(sn − tn))
= ai1s1 + cai1s1 − cai1t1 + · · · + ainsn + cainsn − caintn
= ai1s1 + · · · + ainsn + c(ai1s1 + · · · + ainsn) − c(ai1t1 + · · · + aintn)
= bi + cbi − cbi
= bi
Therefore, the i-th equation is satisfied when ~x = ~s + c(~s − ~t). Since this is valid for
all 1 ≤ i ≤ m, we have shown that ~x = ~s + c(~s − ~t) is a solution for each c ∈ R.
We now show that for any c1 ≠ c2, the solutions ~s + c1(~s − ~t) and ~s + c2(~s − ~t) are distinct. If they are not distinct, then
~s + c1(~s − ~t) = ~s + c2(~s − ~t)
c1(~s − ~t) = c2(~s − ~t)
(c1 − c2)(~s − ~t) = ~0
Since c1 ≠ c2, this implies that ~s − ~t = ~0 and hence ~s = ~t. But, this contradicts our assumption that ~s ≠ ~t. ∎
This proves that the solution set of a system of m linear equations in n variables must
either be empty, contain exactly one vector, or have infinitely many vectors in it. In
the next section, we will derive an efficient method for finding the solution set of a
system of linear equations.
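Theorem 1 is easy to test numerically. A minimal sketch in Python, using the system from Example 4 (the two starting solutions are the points on its solution line at t = 0 and t = 1):

```python
def new_solution(s, t, c):
    """Given two solutions s and t of a linear system, Theorem 1 says
    s + c (s - t) is also a solution for every scalar c."""
    return [si + c * (si - ti) for si, ti in zip(s, t)]

# Two solutions of x1 - 2x2 = 3, x1 + x2 + 3x3 = 9:
s = [7, 2, 0]
t = [9, 3, -1]

for c in [-2, 1, 10]:
    x1, x2, x3 = new_solution(s, t, c)
    print(x1 - 2 * x2, x1 + x2 + 3 * x3)  # 3 9 each time
```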
Section 2.1 Problems
1. Write a system of linear equations with 3 equa-
tions in 3 unknowns such that:
(a) The system has a unique solution.
(b) The system is inconsistent.
(c) The system has infinitely many solutions.
(d) The solution set of the system is the line
~x = s[1, 2, 1] + [1, 0, 1], s ∈ R.
2. For each of the following, determine if the
given vector ~x is a solution of the specified
system of linear equations.
(a) 6x1 + 2x2 + 3x3 = 1
2x1 + x2 + 6x3 = 2
4x1 + 5x2 − 6x3 = −9,  ~x = [3/7, −10/7, 3/7]
(b) 3x1 + 6x2 + 7x3 = 9
−4x1 + 3x2 − 3x3 = −4
x1 − 13x2 − 2x3 = 4,  ~x = [−2, −1, 3]
(c) x1 − 2x2 + x3 − 3x4 = 0
−4x1 + 5x2 + 5x3 − 6x4 = 0
5x1 − 3x2 + 3x3 + 8x4 = 0,  ~x = [4, 3, −1, −1]
(d) x1 + 2x2 + x3 = 3
−x1 + x2 + 5x3 = −9
x1 + x2 − x3 = 5,  ~x = [7 + 3t, −2 − 2t, t]
3. Given that [1, 0, 0] and [0, 1, 1] are both solutions of a system of linear equations, list three other solutions of the same system.
4. Determine whether each statement is true or
false. Justify your answer.
(a) x1 + x3 = −3x2 is a linear equation.
(b) x1² + 2x1x2 + x2² = 0 is a linear equation.
(c) If a system of linear equations has [1, 1, 1] as a solution, then [2, 2, 2] is also a solution of the system.
(d) It is impossible to find a solution to a sys-
tem of linear equations that has more un-
knowns than equations.
5. Consider a system of linear equations
a11x1 + · · · + a1nxn = 0
a21x1 + · · · + a2nxn = 0
⋮
am1x1 + · · · + amnxn = 0
(a) Explain geometrically why this system
must be consistent.
(b) Prove algebraically that this system must
be consistent.
6. It can be shown that the three hyperplanes
2x1 + x2 + 4x3 + x4 = 6
x1 + x3 + 2x4 = 4
−x1 − 4x2 − 9x3 + 10x4 = 4
intersect in the plane ~x = [4, −2, 0, 0] + t[−1, −2, 1, 0] + s[−2, 3, 0, 1], s, t ∈ R.
(a) Prove that the point P(−1, 2, 1, 2) lies on all three hyperplanes.
(b) Find three more points that lie on all three hyperplanes.
2.2 Solving Systems of Linear Equations
To derive a method for solving large systems of linear equations, we start by looking
more closely at how we solve small systems.
Given a system of 2 linear equations in 2 unknowns, we can use the method of sub-
stitution and elimination to solve it.
EXAMPLE 1 Solve the system
2x1 + 3x2 = 11
3x1 + 6x2 = 7
Solution: We first want to solve for one variable, say x2. To do this, we eliminate x1
from one of the equations. Multiplying the first equation by (-3) gives
−6x1 − 9x2 = −33
3x1 + 6x2 = 7
Now, if we multiply the second equation by 2 we get
−6x1 − 9x2 = −33
6x1 + 12x2 = 14
Adding the first equation to the second we get
−6x1 − 9x2 = −33
0x1 + 3x2 = −19
Multiplying the second equation by 1/3 gives
−6x1 − 9x2 = −33
0x1 + x2 = −19/3
We can now add 9 times the second equation to the first equation to get
−6x1 + 0x2 = −90
0x1 + x2 = −19/3
Finally, multiplying the first equation by −1/6 we get
x1 + 0x2 = 15
0x1 + x2 = −19/3
Hence, the system is consistent with unique solution ~x = [15, −19/3].
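A quick way to build confidence in a solution is to substitute it back into the original equations. The following Python sketch (our own illustration, not part of the text) checks Example 1's answer using exact rational arithmetic so that the fraction −19/3 causes no rounding trouble:

```python
from fractions import Fraction

# Substitute x1 = 15, x2 = -19/3 back into both original equations.
x1 = Fraction(15)
x2 = Fraction(-19, 3)

eq1 = 2 * x1 + 3 * x2  # left-hand side of the first equation
eq2 = 3 * x1 + 6 * x2  # left-hand side of the second equation
print(eq1, eq2)  # -> 11 7
```

Both left-hand sides match the right-hand sides 11 and 7, confirming the solution.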
Observe that after each step we actually have a new system of linear equations to
solve. The idea is that each new system has the same solution set as the original and
is easier to solve.
DEFINITION
Equivalent
Two systems of equations which have the same solution set are said to be equivalent.
Of course, this method can be used to solve larger systems as well.
EXERCISE 1 Solve the system by using substitution and elimination. Clearly show/explain all of
your steps used in solving the system and describe your general procedure.
x1 − 2x2 − 3x3 = 0
2x1 + x2 + 5x3 = −1
−x1 + x2 − x3 = 2
Now imagine that you need to solve a system of a hundred thousand equations with
a hundred thousand variables. Certainly you are not going to want to do this by hand
(it would be environmentally unfriendly to use that much paper). So, being clever,
you decide that you want to program a computer to solve the system for you. To do
this, you need to ask yourself a few questions: How is the computer going to store
the system? What operations is the computer allowed to perform on the equations?
How will the computer know which operations it should use to solve the system?
To answer the first question, we observe that when using substitution and elimina-
tion, we do not really need to write down the variables xi each time as we are only
modifying the coefficients and right-hand side. Thus, it makes sense to store just the
coefficients and right-hand side of each equation in the system in a rectangular array.
DEFINITION
Coefficient Matrix
Augmented Matrix
The coefficient matrix for the system of linear equations
a11x1 + a12x2 + · · · + a1nxn = b1
a21x1 + a22x2 + · · · + a2nxn = b2
...
am1x1 + am2x2 + · · · + amnxn = bm
is the rectangular array
[ a11 a12 · · · a1n ]
[ a21 a22 · · · a2n ]
[  ...              ]
[ am1 am2 · · · amn ]
The augmented matrix of the system is
[ a11 a12 · · · a1n | b1 ]
[ a21 a22 · · · a2n | b2 ]
[  ...              | ... ]
[ am1 am2 · · · amn | bm ]
REMARK
It is very important to observe that we can look at the coefficient matrix in two ways.
First, the i-th row of the coefficient matrix represents the coefficients of the i-th equation
in the system. Second, the j-th column of the coefficient matrix represents the
coefficients of xj in all of the equations.
For computational problems, the notation in the definition is necessary. However,
when working on theory, it would be much better to have more compact notation.
Observe that by using operations on vectors in Rm, the system of linear equations
a11x1 + · · · + a1nxn = b1
...
am1x1 + · · · + amnxn = bm
can be written as
[b1, . . . , bm] = [a11x1 + · · · + a1nxn, . . . , am1x1 + · · · + amnxn] = x1[a11, . . . , am1] + · · · + xn[a1n, . . . , amn]
Notice that the columns of the coefficient matrix of the system are the vectors
~aj = [a1j, . . . , amj]
for 1 ≤ j ≤ n. Thus, we can denote the coefficient matrix of the system by
A = [~a1 · · · ~an]. If we let ~b = [b1, . . . , bm], then we can denote the augmented matrix of
the system by [~a1 · · · ~an | ~b] or sometimes simply as [A | ~b].
EXAMPLE 2 The system of linear equations
3x1 + x2 + x3 = −1
6x1 + x3 = −2
−3x1 + 2x2 = 3
has coefficient matrix
A =
[ 3 1 1 ]
[ 6 0 1 ]
[ −3 2 0 ]
and augmented matrix
[A | ~b] =
[ 3 1 1 | −1 ]
[ 6 0 1 | −2 ]
[ −3 2 0 | 3 ]
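These matrix representations translate directly into data structures. The sketch below (our own illustration, not code from the text) stores Example 2's system as nested Python lists, one row per equation:

```python
# Row i of A holds the coefficients of equation i.
A = [[ 3, 1, 1],
     [ 6, 0, 1],
     [-3, 2, 0]]        # coefficient matrix
b = [-1, -2, 3]         # right-hand side

# The augmented matrix [A | b] appends b_i to row i.
aug = [row + [bi] for row, bi in zip(A, b)]
print(aug)  # -> [[3, 1, 1, -1], [6, 0, 1, -2], [-3, 2, 0, 3]]

# Column j of A collects the coefficients of x_{j+1} across all equations,
# matching the two ways of reading the coefficient matrix noted above.
col2 = [row[1] for row in A]
print(col2)  # -> [1, 0, 2]
```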
To answer the question about what operations the computer is allowed to perform, it
is helpful to write each step of our example in matrix form.
EXAMPLE 3 Write each step of Example 1 in matrix form and describe how the rows of the matrix
were modified in each step.
Solution: The initial system has augmented matrix
[ 2 3 | 11 ]
[ 3 6 | 7 ]
We multiplied the first row of the augmented matrix by (−3) to get the augmented matrix
[ −6 −9 | −33 ]
[ 3 6 | 7 ]
Then we multiplied the second row by 2 to get
[ −6 −9 | −33 ]
[ 6 12 | 14 ]
Adding the first row to the second row gave
[ −6 −9 | −33 ]
[ 0 3 | −19 ]
Next we multiplied the second row by 1/3 to get
[ −6 −9 | −33 ]
[ 0 1 | −19/3 ]
Then we added 9 times the second row to the first.
[ −6 0 | −90 ]
[ 0 1 | −19/3 ]
The last step was to multiply the first row by −1/6.
[ 1 0 | 15 ]
[ 0 1 | −19/3 ]
This is the augmented matrix for the system
x1 = 15
x2 = −19/3
which gives us the solution.
This is the augmented matrix for the system
x1 = 15
x2 = −19/3
which gives us the solution.
EXERCISE 2 Write each step of Exercise 1 in matrix form and describe how the rows of the matrix
were modified in each step.
Our method of solving has involved two types of operations. One was to multiply
a row by a non-zero number and the second was to add a multiple of one row to
another. However, to create a nice method for a computer to solve a system, we will
add one more type of operation: swapping rows.
DEFINITION
Elementary Row Operations
The three elementary row operations (EROs) are:
1. multiplying a row by a non-zero scalar,
2. adding a multiple of one row to another,
3. swapping two rows.
When applying EROs to a matrix, called row reducing the matrix, we often use
shorthand notation to indicate which operations we are using:
1. cRi indicates multiplying the i-th row by c ≠ 0.
2. Ri + cRj indicates adding c times the j-th row to the i-th row.
3. Ri ↔ Rj indicates swapping the i-th row and the j-th row.
For now you should always indicate which ERO you use in each step. Not only will
this help you in row reducing a matrix and finding/fixing computational errors, but in
Chapter 5 we will see that this is necessary.
Observe that EROs are reversible (we can always perform an ERO that undoes what
another ERO did). Hence, if there exists a sequence of elementary row operations that
transforms A into B, then there also exists a sequence of elementary row operations
that transforms B into A.
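The three elementary row operations, and the fact that each is undone by another operation of the same type, can be sketched in Python as follows (our own illustration; the function names are ours, not the text's):

```python
from fractions import Fraction

def scale_row(M, i, c):
    """cRi: multiply row i by a non-zero scalar c."""
    assert c != 0
    return [[c * x for x in row] if k == i else row[:] for k, row in enumerate(M)]

def add_multiple(M, i, j, c):
    """Ri + cRj: add c times row j to row i."""
    return [[x + c * y for x, y in zip(row, M[j])] if k == i else row[:]
            for k, row in enumerate(M)]

def swap_rows(M, i, j):
    """Ri <-> Rj: swap rows i and j."""
    N = [row[:] for row in M]
    N[i], N[j] = N[j], N[i]
    return N

# Reversibility: each ERO below is undone by another ERO of the same type.
M = [[Fraction(2), Fraction(3), Fraction(11)],
     [Fraction(3), Fraction(6), Fraction(7)]]
assert scale_row(scale_row(M, 0, Fraction(-3)), 0, Fraction(-1, 3)) == M
assert add_multiple(add_multiple(M, 1, 0, 2), 1, 0, -2) == M
assert swap_rows(swap_rows(M, 0, 1), 0, 1) == M
print("each ERO is reversed by an ERO of the same type")
```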
DEFINITION
Row Equivalent
Two matrices A and B are said to be row equivalent if there exists a sequence of
elementary row operations that transform A into B. We write A ∼ B.
Recall that our goal is to row reduce the augmented matrix of a system into the
augmented matrix of an equivalent system that is easier to solve.
THEOREM 1 If the augmented matrices [A1 | ~b1] and [A2 | ~b2] are row equivalent, then the systems
of linear equations associated with these matrices are equivalent.
Finally, to answer the third question about which operations a computer should use
to solve the system, we need to figure out an algorithm for the computer to follow.
The algorithm should apply elementary row operations to the augmented matrix of
a system until we have an equivalent system which is easy to solve. We make a
definition for the form of the coefficient matrix that makes the system easiest to solve.
DEFINITION
Reduced Row Echelon Form
A matrix R is said to be in reduced row echelon form (RREF) if:
1. All rows containing a non-zero entry are above all rows that contain only zeros.
2. The first non-zero entry in each non-zero row is 1, called a leading one.
3. The leading one in each non-zero row is to the right of the leading one in any
row above it.
4. A leading one is the only non-zero entry in its column.
If A is row equivalent to a matrix R in RREF, then we say that R is the reduced row
echelon form of A.
In the definition, we can say that R is the reduced row echelon form of A because of
the following theorem.
THEOREM 2 If A is a matrix, then A has a unique reduced row echelon form R.
The proof requires additional tools which will be presented in Chapter 4.
EXAMPLE 4 Determine which of the following matrices are in RREF.
(a)
[ 1 0 2 0 ]
[ 0 1 1 0 ]
[ 0 0 0 1 ]
(b)
[ 0 1 0 0 ]
[ 0 0 0 1 ]
[ 0 0 0 0 ]
(c)
[ 1 0 0 2 ]
[ 0 1 1 1 ]
[ 0 0 1 1 ]
(d)
[ 0 1 0 ]
[ 1 0 0 ]
Solution: The matrices in (a) and (b) are in RREF. The matrix in (c) is not in RREF
as the column containing the leading one in the third row has another non-zero entry.
The matrix (d) is not in RREF since the leading one in the second row is to the left
of the leading one in the row above it.
EXERCISE 3 Determine which of the following matrices are in RREF.
(a)
[ 1 0 −1 1 ]
[ 0 1 −1 −1 ]
[ 0 0 0 1 ]
(b)
[ 1 0 −1 0 ]
[ 0 0 0 1 ]
[ 0 0 0 0 ]
(c)
[ 0 2 0 3 ]
[ 0 0 2 0 ]
[ 0 0 0 0 ]
(d)
[ 1 0 7 0 ]
[ 0 0 0 0 ]
[ 0 1 0 1 ]
We now just need to give our computer an algorithm to row reduce any matrix to
its reduced row echelon form. The typical method for row reducing an augmented
matrix to its RREF is called Gauss-Jordan Elimination. Although this algorithm for
row reducing a matrix always works, it is not always the best way to reduce a matrix
by hand (or by computer). In individual cases, you may want to perform various
“tricks” to make the row reductions/computations easier. For example, you probably
want to avoid fractions as long as possible. These “tricks” are best learned from
experience. That is, you should row reduce lots of matrices.
ALGORITHM
Gauss-Jordan Elimination To row reduce a non-zero matrix into RREF, we
1. Consider the first non-zero column. Use elementary row operations (if neces-
sary) to get a leading one in the top of the column.
2. Use the elementary row operation add a multiple of one row to another to make
all entries beneath the leading one into a 0.
3. Consider the submatrix consisting of the columns to the right of the column
you just modified and the rows beneath the row that just got a leading one. Use
elementary row operations (if necessary) to get a leading one in the top of the
first non-zero column of this submatrix.
4. Use the elementary row operation add a multiple of one row to another to make
all other entries in the column (for the whole matrix, not just the submatrix)
containing the new leading one into a 0.
5. Repeat steps 3 and 4 until the matrix is in RREF.
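The steps above can be turned into a short program. The following Python sketch (our own illustration, not code from the text) implements Gauss-Jordan elimination using exact Fraction arithmetic and reproduces Example 5 below; a production solver would instead use floating point with partial pivoting for numerical stability:

```python
from fractions import Fraction

def rref(M):
    """Row reduce M (a list of rows) to reduced row echelon form."""
    M = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(M), len(M[0])
    lead = 0  # column in which we look for the next leading one
    for r in range(rows):
        # Steps 1/3: find the first column at or right of `lead` that has a
        # non-zero entry at or below row r.
        while lead < cols:
            pivot = next((i for i in range(r, rows) if M[i][lead] != 0), None)
            if pivot is not None:
                break
            lead += 1
        else:
            break  # no non-zero columns remain: the matrix is in RREF
        M[r], M[pivot] = M[pivot], M[r]        # swap rows (ERO 3)
        c = M[r][lead]
        M[r] = [x / c for x in M[r]]           # scale to get a leading one (ERO 1)
        # Steps 2/4: make every other entry in the pivot column zero (ERO 2).
        for i in range(rows):
            if i != r and M[i][lead] != 0:
                f = M[i][lead]
                M[i] = [x - f * y for x, y in zip(M[i], M[r])]
        lead += 1
    return M

A = [[1, 1, 0, -7], [2, 4, 1, -16], [1, 2, 1, 9]]  # Example 5's augmented matrix
R = rref(A)
print([[int(x) for x in row] for row in R])  # entries happen to be integers here
# -> [[1, 0, 0, 11], [0, 1, 0, -18], [0, 0, 1, 34]]
```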
In this example, we will follow the Gauss-Jordan Elimination Algorithm to demon-
strate how it works.
EXAMPLE 5 Solve the following system of equations by row reducing the augmented matrix to
RREF.
x1 + x2 = −7
2x1 + 4x2 + x3 = −16
x1 + 2x2 + x3 = 9
Solution: The upper left corner is already a leading one, so we place 0s beneath it.
[ 1 1 0 | −7 ]
[ 2 4 1 | −16 ]
[ 1 2 1 | 9 ]
R2 − 2R1 ∼
[ 1 1 0 | −7 ]
[ 0 2 1 | −2 ]
[ 1 2 1 | 9 ]
R3 − R1 ∼
[ 1 1 0 | −7 ]
[ 0 2 1 | −2 ]
[ 0 1 1 | 16 ]
We now consider the submatrix where we ignore the first row and the first column.
The entry in the top of the first column of this submatrix is a 2. We need to make this
a leading one.
R2 ↔ R3 ∼
[ 1 1 0 | −7 ]
[ 0 1 1 | 16 ]
[ 0 2 1 | −2 ]
We now need to get 0s above and below the new leading one.
R1 − R2 ∼
[ 1 0 −1 | −23 ]
[ 0 1 1 | 16 ]
[ 0 2 1 | −2 ]
R3 − 2R2 ∼
[ 1 0 −1 | −23 ]
[ 0 1 1 | 16 ]
[ 0 0 −1 | −34 ]
We repeat the procedure on the new submatrix [ −1 | −34 ].
(−1)R3 ∼
[ 1 0 −1 | −23 ]
[ 0 1 1 | 16 ]
[ 0 0 1 | 34 ]
R1 + R3 ∼
[ 1 0 0 | 11 ]
[ 0 1 1 | 16 ]
[ 0 0 1 | 34 ]
R2 − R3 ∼
[ 1 0 0 | 11 ]
[ 0 1 0 | −18 ]
[ 0 0 1 | 34 ]
This corresponds to the system x1 = 11, x2 = −18, x3 = 34. Hence, the solution is ~x = [11, −18, 34].
When row reducing a matrix, we must remember that we are only allowed to perform
one elementary row operation at a time as in the example. However, writing out the
matrix so many times becomes quite tedious and time consuming. Thus, whenever
possible, we include multiple elementary row operations in a single rewriting of the
matrix. When doing this, we must make sure that we are not modifying a row that
we are using in another elementary row operation.
EXAMPLE 6 Find the solution set of the following system of equations.
x1 + x2 + 2x3 + x4 = 3
x1 + 2x2 + 4x3 + x4 = 7
x1 + x4 = −21
Solution: We write the augmented matrix and row reduce:
[ 1 1 2 1 | 3 ]
[ 1 2 4 1 | 7 ]
[ 1 0 0 1 | −21 ]
R2 − R1, R3 − R1 ∼
[ 1 1 2 1 | 3 ]
[ 0 1 2 0 | 4 ]
[ 0 −1 −2 0 | −24 ]
R1 − R2, R3 + R2 ∼
[ 1 0 0 1 | −1 ]
[ 0 1 2 0 | 4 ]
[ 0 0 0 0 | −20 ]
Observe that the last row represents the equation 0x1+0x2+0x3+0x4 = −20. Clearly
this is impossible, so the system is inconsistent.
It is clear that if we ever obtain a row of the form [ 0 · · · 0 | b ] with b ≠ 0, then the
system of equations is inconsistent, as this row corresponds to the equation 0 = b.
EXERCISE 4 Solve the following system of linear equations.
x1 + 2x2 + x3 = 0
−x1 − 2x2 + 2x3 = 0
2x2 + 4x3 = −3
Infinitely Many Solutions
As we discussed above, a consistent system of linear equations either has a unique
solution or infinitely many. We now look at how to determine all solutions of a system
with infinitely many solutions. We start by considering an example.
EXAMPLE 7 Show the following system of equations has infinitely many solutions and find the
solution set.
x1 + x2 + x3 = 4
x2 + x3 = 3
Solution: We row reduce the augmented matrix to RREF.
[ 1 1 1 | 4 ]
[ 0 1 1 | 3 ]
R1 − R2 ∼
[ 1 0 0 | 1 ]
[ 0 1 1 | 3 ]
Observe that the RREF of the augmented matrix corresponds to the system of equa-
tions
x1 = 1
x2 + x3 = 3
There are infinitely many choices for x2 and x3 that will satisfy x2 + x3 = 3. Hence,
the system has infinitely many solutions.
Observe that by choosing any x3 = t ∈ R, we get a unique value for x2; in particular,
x2 = 3 − t. Thus, the solution set of the system is
~x = [x1, x2, x3] = [1, 3 − t, t] = [1, 3, 0] + t[0, −1, 1], t ∈ R
In this example, we could have solved for x3 in terms of x2 instead. However, for
consistency and ease, we make the convention that we always solve for the variables
whose columns have leading ones in terms of the variables whose columns do not
have leading ones.
DEFINITION
Free Variable
Let R be the RREF of the coefficient matrix of a consistent system of linear equations.
If the j-th column of R does not contain a leading one, then we call xj a free variable
of the system of linear equations.
We can now give an algorithm for solving a system of linear equations.
ALGORITHM
To solve a system of linear equations:
1. Write the augmented matrix for the system.
2. Use elementary row operations to row reduce the augmented matrix into RREF.
3. Write the system of linear equations corresponding to the RREF.
4. If the system contains an equation of the form 0 = 1, the system is inconsistent.
5. Otherwise, move each free variable (if any) to the right hand side of each equa-
tion and assign each free variable a parameter.
6. Determine the solution set by using vector operations to write the system as a
linear combination of vectors.
EXAMPLE 8 Solve the following system of linear equations.
2x1 + x2 + x3 + x4 = −2
x1 + x3 + x4 = 1
−2x2 + 2x3 + 2x4 = 8
Solution: We have
[ 2 1 1 1 | −2 ]
[ 1 0 1 1 | 1 ]
[ 0 −2 2 2 | 8 ]
R1 ↔ R2, (1/2)R3 ∼
[ 1 0 1 1 | 1 ]
[ 2 1 1 1 | −2 ]
[ 0 −1 1 1 | 4 ]
R2 − 2R1 ∼
[ 1 0 1 1 | 1 ]
[ 0 1 −1 −1 | −4 ]
[ 0 −1 1 1 | 4 ]
R3 + R2 ∼
[ 1 0 1 1 | 1 ]
[ 0 1 −1 −1 | −4 ]
[ 0 0 0 0 | 0 ]
The corresponding system of equations is
x1 + x3 + x4 = 1
x2 − x3 − x4 = −4
0 = 0
Since x3 and x4 are free variables, we let x3 = s ∈ R and x4 = t ∈ R. Thus, we have
x1 = 1 − x3 − x4 = 1 − s − t and x2 = −4 + x3 + x4 = −4 + s + t. Therefore, all solutions
have the form
~x = [x1, x2, x3, x4] = [1 − s − t, −4 + s + t, s, t] = [1, −4, 0, 0] + s[−1, 1, 1, 0] + t[−1, 1, 0, 1], s, t ∈ R
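As a check on the parametric form, substituting the general solution back into Example 8's equations should leave zero residual for every choice of the parameters. A small Python sketch (our own illustration; the sample parameter values are arbitrary):

```python
def solution(s, t):
    # x = [1, -4, 0, 0] + s[-1, 1, 1, 0] + t[-1, 1, 0, 1]
    return [1 - s - t, -4 + s + t, s, t]

def residuals(x):
    """Left-hand side minus right-hand side for each of Example 8's equations."""
    x1, x2, x3, x4 = x
    return [2*x1 + x2 + x3 + x4 - (-2),
            x1 + x3 + x4 - 1,
            -2*x2 + 2*x3 + 2*x4 - 8]

for s in (-2, 0, 3):
    for t in (-1, 0, 5):
        assert residuals(solution(s, t)) == [0, 0, 0]
print("all sampled (s, t) satisfy the system")
```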
EXAMPLE 9 Solve the following system of linear equations.
x1 + x3 + 2x4 = 4
2x1 + x2 + 4x3 + x4 = 6
−x1 − 4x2 − 9x3 + 5x4 = 4
Solution: We have
[ 1 0 1 2 | 4 ]
[ 2 1 4 1 | 6 ]
[ −1 −4 −9 5 | 4 ]
R2 − 2R1, R3 + R1 ∼
[ 1 0 1 2 | 4 ]
[ 0 1 2 −3 | −2 ]
[ 0 −4 −8 7 | 8 ]
R3 + 4R2 ∼
[ 1 0 1 2 | 4 ]
[ 0 1 2 −3 | −2 ]
[ 0 0 0 −5 | 0 ]
−(1/5)R3 ∼
[ 1 0 1 2 | 4 ]
[ 0 1 2 −3 | −2 ]
[ 0 0 0 1 | 0 ]
R1 − 2R3, R2 + 3R3 ∼
[ 1 0 1 0 | 4 ]
[ 0 1 2 0 | −2 ]
[ 0 0 0 1 | 0 ]
The corresponding system of equations is
x1 + x3 = 4
x2 + 2x3 = −2
x4 = 0
Since x3 is a free variable, we let x3 = s ∈ R. Thus, we have x1 = 4 − x3 = 4 − s and
x2 = −2 − 2x3 = −2 − 2s. Therefore, all solutions have the form
~x = [x1, x2, x3, x4] = [4 − s, −2 − 2s, s, 0] = [4, −2, 0, 0] + s[−1, −2, 1, 0], s ∈ R
EXERCISE 5 Solve the following system of linear equations.
x2 + 3x3 = 0
2x1 + 3x2 + x3 + x4 = 0
−2x1 + 8x3 = 0
REMARK
Instead of thinking of this algorithm as solving a system, we can think about it as just
changing the form of the system. The original system is an implicit representation
for the intersection of the hyperplanes, while a vector equation for the solution set is
an explicit form. Note that both representations have their uses. For example, if you
wanted to find ten points which were on the intersection of the hyperplanes in Ex-
ample 8, it would be much easier to use a vector equation of the solution set. On the
other hand, if you wanted to determine if ten particular points were on the intersec-
tion, it would be much easier to test the original scalar equations of the hyperplanes.
One reason we developed our theory of solving systems of linear equations was so
that we could solve more complicated linear algebra problems.
EXAMPLE 10 Prove that B = {[2, −1, 2], [−1, 1, 0], [2, 0, 5]} is a basis for R3.
Solution: By definition of a basis, we need to prove that B spans R3 and is linearly
independent. For spanning, we need to show that R3 = SpanB. First, since B ⊂ R3
we have that Span B ⊆ R3 by Theorem 1.1.1. Next, for any [x1, x2, x3] ∈ R3 we need to
show that we can find t1, t2, and t3 such that
[x1, x2, x3] = t1[2, −1, 2] + t2[−1, 1, 0] + t3[2, 0, 5] = [2t1 − t2 + 2t3, −t1 + t2, 2t1 + 5t3]
Comparing entries gives the system of linear equations
2t1 − t2 + 2t3 = x1
−t1 + t2 = x2
2t1 + 5t3 = x3
We now need to determine if this system is consistent for all x1, x2, and x3. Row
reducing gives
[ 2 −1 2 | x1 ]
[ −1 1 0 | x2 ]
[ 2 0 5 | x3 ]
R1 + R2 ∼
[ 1 0 2 | x1 + x2 ]
[ −1 1 0 | x2 ]
[ 2 0 5 | x3 ]
R2 + R1, R3 − 2R1 ∼
[ 1 0 2 | x1 + x2 ]
[ 0 1 2 | x1 + 2x2 ]
[ 0 0 1 | −2x1 − 2x2 + x3 ]
R1 − 2R3, R2 − 2R3 ∼
[ 1 0 0 | 5x1 + 5x2 − 2x3 ]
[ 0 1 0 | 5x1 + 6x2 − 2x3 ]
[ 0 0 1 | −2x1 − 2x2 + x3 ]
Thus, the system is consistent for all x1, x2, and x3 as required. Hence, R3 ⊆ SpanB.
Therefore, SpanB = R3.
To determine if B is linearly independent, we use the definition of linear indepen-
dence. Consider
t1[2, −1, 2] + t2[−1, 1, 0] + t3[2, 0, 5] = [0, 0, 0]
We see that this is going to give us exactly the same system as above, except that
the right-hand side is now all zeros. Hence, performing the same elementary row
operations will give
[ 2 −1 2 | 0 ]
[ −1 1 0 | 0 ]
[ 2 0 5 | 0 ]
∼
[ 1 0 0 | 0 ]
[ 0 1 0 | 0 ]
[ 0 0 1 | 0 ]
Consequently, the only solution is t1 = t2 = t3 = 0 and so B is also linearly indepen-
dent. Therefore, it is a basis for R3.
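For a quick numerical cross-check of Example 10 (not the method used in the text), one can use a fact developed in Chapter 5: three vectors in R3 form a basis exactly when the 3 × 3 matrix with those vectors as columns has non-zero determinant. A Python sketch:

```python
def det3(u, v, w):
    """Determinant of the 3x3 matrix whose columns are u, v, w
    (cofactor expansion along the first row)."""
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - v[0] * (u[1] * w[2] - u[2] * w[1])
            + w[0] * (u[1] * v[2] - u[2] * v[1]))

d = det3([2, -1, 2], [-1, 1, 0], [2, 0, 5])
print(d)  # -> 1, which is non-zero, so B is a basis for R^3
```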
EXERCISE 6 Prove that B = {[1, 2, −1], [2, 1, 1], [3, 3, 1]} is a basis for R3.
Homogeneous Systems
In the last example, we saw that determining if a set is linearly independent cor-
responds to solving a system where the right-hand side only contains zeros. Such
systems arise frequently in linear algebra and its applications, so we look at these a
little more closely.
DEFINITION
Homogeneous System
A system of linear equations is said to be a homogeneous system if the right-hand
side only contains zeros. That is, it has the form [A | ~0].
Observe that the augmented part of any matrix that is row equivalent to the augmented
matrix of a homogeneous system will only contain zeros. Thus, it is standard practice
to omit this column when solving a homogeneous system; we just row reduce the
coefficient matrix.
Moreover, we immediately observe that it is impossible to get a row of the form
[ 0 · · · 0 | 1 ], hence a homogeneous system is always consistent. In particular, it
is easy to see that the zero vector ~0 is always a solution of a homogeneous system.
EXAMPLE 11 Find the solution set of the homogeneous system
x1 + 2x2 + 3x3 = 0
−2x1 − 3x2 − 6x3 = 0
x1 + 3x2 + 4x3 = 0
x1 + 4x2 + 5x3 = 0
Solution: We row reduce the coefficient matrix to get
[ 1 2 3 ]
[ −2 −3 −6 ]
[ 1 3 4 ]
[ 1 4 5 ]
R2 + 2R1, R3 − R1, R4 − R1 ∼
[ 1 2 3 ]
[ 0 1 0 ]
[ 0 1 1 ]
[ 0 2 2 ]
R1 − 2R2, R3 − R2, R4 − 2R2 ∼
[ 1 0 3 ]
[ 0 1 0 ]
[ 0 0 1 ]
[ 0 0 2 ]
R1 − 3R3, R4 − 2R3 ∼
[ 1 0 0 ]
[ 0 1 0 ]
[ 0 0 1 ]
[ 0 0 0 ]
Since this is the coefficient matrix of a homogeneous system we must remember that
the right-hand sides of all the equations are 0. Thus, when we write the RREF back
into equation form we get x1 = 0, x2 = 0, and x3 = 0. Therefore, the only solution is
the trivial solution.
EXAMPLE 12 Find the solution set of the homogeneous system
x1 + 2x2 = 0
x1 + 3x2 + x3 = 0
x2 + x3 = 0
Solution: We row reduce the coefficient matrix to get
[ 1 2 0 ]
[ 1 3 1 ]
[ 0 1 1 ]
R2 − R1 ∼
[ 1 2 0 ]
[ 0 1 1 ]
[ 0 1 1 ]
R1 − 2R2, R3 − R2 ∼
[ 1 0 −2 ]
[ 0 1 1 ]
[ 0 0 0 ]
Rewriting the RREF as a homogeneous system gives x1 − 2x3 = 0 and x2 + x3 = 0.
Since x3 is a free variable, we let x3 = t ∈ R. Then x1 = 2x3 = 2t and x2 = −x3 = −t,
so the solution set has vector equation
~x = [x1, x2, x3] = [2t, −t, t] = t[2, −1, 1], t ∈ R
We see in both of the examples that the solution set is a subspace of R3.
THEOREM 3 The solution set of a homogeneous system of m linear equations in n variables is a
subspace of Rn.
Proof: For 1 ≤ i ≤ m, denote the i-th equation of the homogeneous system by
ai1x1 + · · · + ainxn = 0
By definition, the solution set of the system is a subset of Rn, and clearly ~0 is a solution
of the system. Let ~s = [s1, . . . , sn] and ~t = [t1, . . . , tn] both be solutions of the system. Observe that
ai1(s1 + t1) + · · · + ain(sn + tn) = (ai1s1 + · · · + ainsn) + (ai1t1 + · · · + aintn) = 0 + 0 = 0
and
ai1(cs1) + · · · + ain(csn) = c(ai1s1 + · · · + ainsn) = c(0) = 0
for 1 ≤ i ≤ m. Thus, ~s + ~t and c~s are also solutions for any c ∈ R, so the solution set
is a subspace of Rn by the Subspace Test. ∎
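Theorem 3's closure properties are easy to illustrate numerically. The sketch below (our own illustration) checks them on Example 12's homogeneous system, whose solutions are the multiples of [2, −1, 1]:

```python
def is_solution(x):
    """Does x satisfy the homogeneous system of Example 12?"""
    x1, x2, x3 = x
    return x1 + 2*x2 == 0 and x1 + 3*x2 + x3 == 0 and x2 + x3 == 0

s = [2, -1, 1]
u = [-6, 3, -3]
assert is_solution(s) and is_solution(u)
assert is_solution([a + b for a, b in zip(s, u)])  # closed under addition
assert is_solution([7 * a for a in s])             # closed under scalar multiples
print("the solution set is closed under + and scalar multiplication")
```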
DEFINITION
Solution Space
The solution set of a homogeneous system is called the solution space of the system.
Rank
We see that the number of free variables in a system of linear equations is the number
of columns in the RREF of the coefficient matrix which do not have leading ones.
Thus, it makes sense to count the number of leading ones in the RREF of a matrix.
DEFINITION
Rank
The rank of a matrix A is the number of leading ones in the RREF of the matrix and
is denoted rank A.
EXAMPLE 13 Determine the rank of the following matrices.
(a) A =
[ 1 0 0 1 0 ]
[ 0 0 1 0 −2 ]
[ 0 0 0 0 0 ]
Solution: Since A is in RREF, we see that A has two leading ones, so rank A = 2.
(b) B =
[ 2 1 1 1 −2 ]
[ 1 0 1 1 1 ]
[ 0 −2 2 2 8 ]
Solution: Observe that the matrix B is just the augmented matrix in Example 8.
Hence, the rank of B is 2 since the RREF of B is
R =
[ 1 0 1 1 1 ]
[ 0 1 −1 −1 −4 ]
[ 0 0 0 0 0 ]
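Counting leading ones can be automated by counting pivots during elimination. The following Python sketch (our own illustration, not code from the text) computes the rank of both matrices in Example 13 with exact Fraction arithmetic:

```python
from fractions import Fraction

def rank(M):
    """Count pivots found during forward elimination
    (= the number of leading ones in the RREF)."""
    M = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(M), len(M[0])
    r = 0  # pivots found so far
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

print(rank([[1, 0, 0, 1, 0], [0, 0, 1, 0, -2], [0, 0, 0, 0, 0]]))   # -> 2
print(rank([[2, 1, 1, 1, -2], [1, 0, 1, 1, 1], [0, -2, 2, 2, 8]]))  # -> 2
```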
By definition of rank, we have that the rank of an m × n matrix A must be less than
or equal to the number of rows of A. However, it is also important to observe that
by definition of RREF, we can have at most one leading one per column as well. So,
we also have that the rank of a matrix A must be less than or equal to the number of
columns of A. That is, we have the following theorem.
THEOREM 4 For any m × n matrix A we have
rank A ≤ min(m, n)
We now get the following extremely important theorem which summarizes all of
our results on systems of linear equations in terms of rank. This theorem will be
referenced many times throughout the rest of this book.
THEOREM 5 (System-Rank Theorem)
Let A be the coefficient matrix of a system of m linear equations in n unknowns, [A | ~b].
(1) The rank of A is less than the rank of the augmented matrix [A | ~b] if and only
if the system is inconsistent.
(2) If the system [A | ~b] is consistent, then the system contains (n − rank A) free
variables (parameters).
(3) rank A = m if and only if the system [A | ~b] is consistent for every ~b ∈ Rm.
Proof: (1) The rank of A is less than the rank of the augmented matrix if and only
if there is a row in the RREF of the augmented matrix of the form [ 0 · · · 0 | 1 ].
This is true if and only if the system is inconsistent.
(2) This follows immediately from the definition of rank and free variables.
(3) Assume that the RREF of [A | ~b] is [R | ~c]. If [A | ~b] is inconsistent for some
~b ∈ Rm, then the RREF of the augmented matrix must contain a row of the form
[ 0 · · · 0 | 1 ], as otherwise we could find a solution to [A | ~b] by picking all
free variables to be 0 and setting the variable corresponding to the leading one in the i-th
row equal to ci. Therefore, rank A < m.
On the other hand, if rank A < m, then the RREF R of A has a row of all zeros. Thus,
[R | ~em] is inconsistent as it has a row of the form [ 0 · · · 0 | 1 ]. Since elementary
row operations are reversible, we can apply the reverse of the row operations needed
to row reduce A to R on [R | ~em] to get [A | ~b] for some ~b ∈ Rm. Then this system
is inconsistent since elementary row operations do not change the solution set. Thus,
there exists some ~b ∈ Rm such that [A | ~b] is inconsistent. ∎
Part (2) of Theorem 5 tells us that a consistent system [A | ~b] of m linear equations
in n variables has a unique solution if and only if rank A = n. The next theorem
tells us that if [A | ~b] is consistent with rank A < n, then the set of (n − rank A)
vectors corresponding to the free variables in our method for finding the solution set
is necessarily linearly independent.
THEOREM 6 (Solution Theorem)
Let [A | ~b] be a consistent system of m linear equations in n variables with RREF
[R | ~c]. If rank A = k < n, then a vector equation of the solution set of [A | ~b] has the
form
~x = ~d + t1~v1 + · · · + tn−k~vn−k,  t1, . . . , tn−k ∈ R
where ~d ∈ Rn and {~v1, . . . , ~vn−k} is a linearly independent set in Rn. In particular, the
solution set of [A | ~b] is an (n − k)-flat in Rn.
The idea of the proof is to observe that if xi is a free variable, then it will correspond
to a term in a vector equation of the solution set of the form
ci~vi = ci[v1, . . . , vi−1, 1, 0, . . . , 0]
Thus, the vectors corresponding to free variables with index less than i must have a 0 as their i-th entry, so
{~v1, . . . , ~vn−k} will be linearly independent.
One of the many important uses of the System-Rank Theorem is it allows us to easily
prove some important facts about spanning, linear independence, and bases in Rn.
THEOREM 7 A set of n vectors in Rn is linearly independent if and only if it spans Rn.
REMARK
The method used in the proof of Theorem 7 will be used several more times in this
book.
Section 2.2 Problems
1. For each of the following systems:
(i) Write the coefficient matrix and aug-
mented matrix.
(ii) Find the rank of the coefficient and aug-
mented matrix by row reducing to RREF.
(iii) Determine the solution set for the system.
(iv) Describe the system and its solution set ge-
ometrically.
(a) x1 + 2x2 + 2x3 = 1
x1 + x2 + x3 = 2
(b) 2x1 + 2x2 − 3x3 = 3
x1 + x2 + x3 = 9
(c) 2x1 + x2 − x3 = 4
2x1 − x2 − 3x3 = −1
3x1 + 2x2 − x3 = 8
(d) x1 + 2x2 = 3
x1 + 3x2 − x3 = 4
−3x1 − 8x2 + 2x3 = −11
(e) x1 + 3x2 − x3 = 9
2x1 + x2 − 2x3 = −2
(f) 2x1 − 2x2 + 2x3 + x4 = −4
3x1 − 3x2 + 3x3 + 4x4 = 9
(g) x1 + x2 + x3 + 6x4 = 5
2x2 − 4x3 + 4x4 = −4
2x1 + 4x3 + 5x4 = 4
−x1 + 2x2 − 3x3 + 4x4 = 9
(h) x1 + 2x2 + x3 + 2x4 = −2
2x1 + x2 + 2x3 + x4 = 2
2x2 + x3 + x4 = 2
x1 + x2 + 2x3 = 6
(i) x2 − 2x3 = 3
2x1 + 2x2 + 3x3 = 1
x1 + 2x2 + 4x3 = −1
2x1 + x2 − x3 = 1
2. For each of the following, determine if ~x is in Span B.
(a) B = {[1, 2, −1], [3, −4, 2]},  ~x = [1, 3, 1]
(b) B = {[1, 2, −1], [3, −4, 2]},  ~x = [1, 0, 0]
(c) B = {[3, 1, 2, 3], [2, 1, 3, −4], [−1, 0, −4, 3]},  ~x = [1, −3, −7, −9]
(d) B = {[5, −3, 4], [−6, 3, −4], [6, −6, 8]},  ~x = [1, −7, −9]
3. Which of the following sets are linearly independent?
(a) {[1, 2, −1], [1, 3, 1], [1, −1, 4], [2, 1, 1]}
(b) {[1, 0, −1], [2, 1, −2], [3, 3, −1]}
(c) {[1, 2, 3], [4, 5, 6], [7, 8, 9]}
(d) {[5, −3, 4], [−6, 3, −4], [6, −6, 8]}
(e) {[3, 1, 2], [2, 1, 3], [1, 3, 2]}
(f) {[3, 1, 3, 5], [6, 4, −3, 4], [0, 2, −1, 8]}
4. Which of the following sets is a basis for R3?
(a) {[5, −3, 7], [−4, 1, 6]}
(b) {[1, −1, 3], [1, 0, −1], [6, −2, 5], [2, −1, 4]}
(c) {[1, 3, −2], [2, −5, 3], [4, 1, 0]}
(d) {[1, 1, 1], [2, 1, −1], [1, 0, −3]}
(e) {[1, 0, −4], [3, 3, −5], [3, 6, 2]}
(f) {[3, 2, −3], [4, 4, 0], [2, 1, −2]}
5. Prove Theorem 7.
6. Let a, b, c, d ∈ R. Prove that the set {[a, c], [b, d]} is a basis for R2 if and only if ad − bc ≠ 0.
7. Consider three planes P1, P2, and P3, with re-
spective equations
a11x1 + a12x2 + a13x3 = b1,
a21x1 + a22x2 + a23x3 = b2, and
a31x1 + a32x2 + a33x3 = b3
The intersections of these planes are illustrated below. Assume that the lines of intersection L1, L2, and L3 are parallel.
[Figure: planes P1, P2, P3 intersecting pairwise in the parallel lines L1, L2, L3.]
What is the rank of
A =
[ a11 a12 a13 ]
[ a21 a22 a23 ]
[ a31 a32 a33 ]
?
8. Prove that if two planes in R3 intersect, then
they either intersect in a line or a plane.
9. Determine whether each statement is true or
false. Justify your answer.
(a) The solution set of a system of linear equa-
tions is a subspace of Rn.
(b) A system of 3 equations in 5 variables has
infinitely many solutions.
(c) A system of 5 equations in 3 variables can-
not have infinitely many solutions.
(d) A homogeneous system of 3 equations in
5 variables cannot have a unique solution.
(e) If A is the coefficient matrix of a system
of 3 linear equations in 4 variables where
rank A = 2, then the solution set is a plane
in R4.
(f) If A is the coefficient matrix of a system
of 3 linear equations in 5 variables where
rank A = 3, then the solution set is a plane
in R5.
(g) If the solution space of a homogeneous
system of 5 linear equations in 3 variables
is a line in R3, then the rank of the coeffi-
cient matrix is 2.
(h) If A is the coefficient matrix of a system
of m linear equations in n variables where
m < n, then rank A = m.
(i) If a system of m linear equations in n variables [A | ~b] is consistent for every ~b ∈ Rm with a unique solution, then m = n.
10. Let B = {~v1, . . . , ~vk} be a set of vectors in Rn. Prove that B is linearly independent if and only if the rank of the matrix [~v1 · · · ~vk] is k.
11. Prove that:
(a) a set of k vectors {~v1, . . . , ~vk} in Rn spans Rn if and only if the rank of the coefficient matrix of the system c1~v1 + · · · + ck~vk = ~b is n.
(b) if Span{~v1, . . . ,~vk} = Rn, then k ≥ n.
12. Prove that:
(a) if S is a non-trivial subspace of Rn,
Span{~v1, . . . ,~vℓ} = S, and {~u1, . . . , ~uk} is
a linearly independent set of vectors in S,
then k ≤ ℓ.
(b) if {~v1, . . . , ~vℓ} and {~u1, . . . , ~uk} are bases of
a non-trivial subspace S, then k = ℓ.
13. Consider a system of m linear equations in n
variables with coefficient matrix A and right-
hand side ~b. Prove that if ~y is a solution of [A | ~0] and ~z is a solution of [A | ~b], then
~x = ~z + c~y is a solution of [A | ~b] for each c ∈ R.
14. Steady flow through a network of conductors can be described by a system of linear equations.
Such networks can be used to describe systems
such as traffic/fluid flow, electrical current,
metabolism, and warehouse supply chains.
[Figure: a flow network with four nodes, edge flows f1, f2, f3, f4, f5, and external flows of 50, 50, 60, and 40 at the nodes.]
To characterize the possible flows in the net-
work shown above, we observe that, in steady
state, the flow into each corner node must be
equal to the flow out. For instance, at the top
node, this balance constraint takes the form
(flow in) = f1 + f2 = 50 = (flow out).
(a) Create a system of equations by writing
the constraint equation for all four nodes.
(b) Solve this system to determine how the
flows f1, . . . , f5 are related in steady state.
(c) How do the constraints you found in part (a) change if the flows cannot take negative values (i.e., fi ≥ 0 for i = 1, 2, . . . , 5)?
Chapter 3
Matrices and Linear Mappings
3.1 Operations on Matrices
Addition and Scalar Multiplication of Matrices
In the last chapter we used matrix notation to help us solve systems of linear equa-
tions. However, matrices can be used in other settings as well. We will now see that
we can treat matrices just like vectors in Rn.
DEFINITION
Matrix
Mm×n(R)
An m × n matrix A is a rectangular array with m rows and n columns. We denote the entry in the i-th row and j-th column by aij. That is,

A = [ a11  a12  · · ·  a1j  · · ·  a1n ]
    [ a21  a22  · · ·  a2j  · · ·  a2n ]
    [  ⋮    ⋮           ⋮           ⋮  ]
    [ ai1  ai2  · · ·  aij  · · ·  ain ]
    [  ⋮    ⋮           ⋮           ⋮  ]
    [ am1  am2  · · ·  amj  · · ·  amn ]

Two m × n matrices A and B are equal if aij = bij for all 1 ≤ i ≤ m, 1 ≤ j ≤ n.
The set of all m × n matrices with real entries is denoted by Mm×n(R).
Addition and scalar multiplication of matrices are also defined to match what we did with vectors in Rn.
DEFINITION
Addition and Scalar
Multiplication
Let A, B ∈ Mm×n(R) and c ∈ R. We define A + B and cA by
(A + B)i j = (A)i j + (B)i j
(cA)i j = c(A)i j
REMARK
1. Note that addition is only defined if both matrices have the same size.
2. As with vectors in Rn, a sum of scalar multiples of matrices is called a linear
combination.
EXAMPLE 1
If A = [1 2; 3 −3], B = [2 5; −1 3], and C = [1 −2; −√2 √3], then

A + B = [1 2; 3 −3] + [2 5; −1 3] = [3 7; 2 0]

√2 C = √2 [1 −2; −√2 √3] = [√2 −2√2; −2 √6]

3A − 2B = [3 6; 9 −9] + [−4 −10; 2 −6] = [−1 −4; 11 −15]
Not surprisingly, addition and scalar multiplication of matrices satisfy the same ten
properties as vectors in Rn. The proof of each property is essentially the same as in
Rn, and hence is left to the reader.
THEOREM 1 If A, B,C ∈ Mm×n(R) and c, d ∈ R, then
V1 A + B ∈ Mm×n(R)
V2 (A + B) +C = A + (B + C)
V3 A + B = B + A
V4 There exists a matrix Om,n ∈ Mm×n(R), such that A + Om,n = A for all A.
V5 For every A ∈ Mm×n(R) there exists (−A) ∈ Mm×n(R) such that A+ (−A) = Om,n
V6 cA ∈ Mm×n(R)
V7 c(dA) = (cd)A
V8 (c + d)A = cA + dA
V9 c(A + B) = cA + cB
V10 1A = A
The matrix Om,n is the m×n matrix with all entries zero and is called the zero matrix.
The Transpose of a Matrix
We will soon see that we sometimes wish to treat the rows of an m × n matrix as
vectors in Rn. To preserve our convention of writing vectors in Rn as column vectors,
we invent some notation for turning rows into columns and vice versa.
DEFINITION
Transpose
The transpose of an m × n matrix A is the n × m matrix A^T whose ij-th entry is the ji-th entry of A. That is,
(A^T)ij = (A)ji
EXAMPLE 2
Determine the transpose of A = [1 2 5], B = [−3; 0; 1], and C = [3 −2; 4 1; 7 −6].
Solution: We have
A^T = [1 2 5]^T = [1; 2; 5]
B^T = [−3; 0; 1]^T = [−3 0 1]
C^T = [3 −2; 4 1; 7 −6]^T = [3 4 7; −2 1 −6]
THEOREM 2 If A, B ∈ Mm×n(R) and c ∈ R, then
(1) (A^T)^T = A
(2) (A + B)^T = A^T + B^T
(3) (cA)^T = cA^T
Proof: (1) By definition of the transpose we have ((A^T)^T)ij = (A^T)ji = (A)ij.
(2) ((A + B)^T)ij = (A + B)ji = (A)ji + (B)ji = (A^T)ij + (B^T)ij = (A^T + B^T)ij.
(3) ((cA)^T)ij = (cA)ji = c(A)ji = c(A^T)ij = (cA^T)ij. □
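Properties (1)-(3) are easy to check on a concrete matrix. A small NumPy sketch (my own, not from the book):

```python
import numpy as np

C = np.array([[3, -2], [4, 1], [7, -6]])
D = np.array([[1, 0], [0, 2], [5, 5]])

# .T swaps rows and columns, so (C^T)^T recovers C
print(C.T)                                # [[ 3  4  7] [-2  1 -6]]
print(((C.T).T == C).all())               # property (1): True
print(((C + D).T == C.T + D.T).all())     # property (2): True
print(((2 * C).T == 2 * C.T).all())       # property (3): True
```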
As mentioned above, one use of the transpose is so that we have notation for the rows
of a matrix. We demonstrate this in the next example.
EXAMPLE 3
If ~a1 = [1; 3; 1] and ~a2 = [2; −1; 4], then the matrix A = [~a1^T; ~a2^T] = [1 3 1; 2 −1 4].
EXERCISE 1
If [~a1^T; ~a2^T; ~a3^T] = [1 3 1; −4 0 2; 5 9 −3], then what are ~a1, ~a2, and ~a3?
Matrix-Vector Multiplication
We saw in the last chapter that we had two ways of viewing the coefficient matrix A of a system of linear equations [A | ~b]: the coefficients of the i-th equation are the entries in the i-th row of A, or the coefficients of xj are the entries in the j-th column of A. We will now use both of these interpretations to derive equivalent definitions of matrix-vector multiplication so that we can represent the system [A | ~b] by the equation A~x = ~b where ~x = [x1; . . . ; xn]. Note that both of the definitions are going to be used frequently throughout this book.
Coefficients are the Rows of A
Let A be an m × n matrix. Using the transpose, we can write A = [~a1^T; . . . ; ~am^T] where ~ai ∈ Rn. Observe that the i-th equation of the system [A | ~b] can be written ~ai · ~x = bi since

[ai1; . . . ; ain] · [x1; . . . ; xn] = ai1 x1 + · · · + ain xn

Therefore, the system [A | ~b] can be written as

[~a1 · ~x; . . . ; ~am · ~x] = [b1; . . . ; bm]

To write the left-hand side in the form A~x, we make the following definition.
DEFINITION
Matrix-Vector Multiplication
Let A be an m × n matrix whose rows are denoted ~ai^T for 1 ≤ i ≤ m. For any ~x ∈ Rn, we define A~x by
A~x = [~a1 · ~x; . . . ; ~am · ~x]
REMARK
If A is an m × n matrix, then A~x is only defined if ~x ∈ Rn. Moreover, if ~x ∈ Rn, then
A~x ∈ Rm.
EXAMPLE 4
Calculate [1 3; 2 −4; 9 −1][x1; x2].
Solution: Let [1 3; 2 −4; 9 −1][x1; x2] = [b1; b2; b3]. We have
b1 = [1; 3] · [x1; x2] = x1 + 3x2
b2 = [2; −4] · [x1; x2] = 2x1 − 4x2
b3 = [9; −1] · [x1; x2] = 9x1 − x2
Hence,
[1 3; 2 −4; 9 −1][x1; x2] = [x1 + 3x2; 2x1 − 4x2; 9x1 − x2]
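The row interpretation translates directly into code: each entry of A~x is one dot product. A sketch using NumPy (the helper name `matvec_rows` is my own):

```python
import numpy as np

def matvec_rows(A, x):
    # i-th entry of Ax is the dot product of the i-th row of A with x
    return np.array([np.dot(row, x) for row in A])

A = np.array([[1, 3], [2, -4], [9, -1]])
x = np.array([2, 1])
print(matvec_rows(A, x))  # [ 5  0 17], i.e. (x1+3x2, 2x1-4x2, 9x1-x2) at (2,1)
```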
EXAMPLE 5
Let A = [3 4 −5; 1 0 2]. Calculate A~x where:
(a) ~x = [2; −1; 6]  (b) ~x = [1; 0; 0]  (c) ~x = [0; 0; 1]
Solution: We have
(a) A~x = [3 4 −5; 1 0 2][2; −1; 6] = [3(2) + 4(−1) + (−5)(6); 1(2) + 0(−1) + (2)(6)] = [−28; 14]
(b) A~x = [3 4 −5; 1 0 2][1; 0; 0] = [3(1) + 4(0) + (−5)(0); 1(1) + 0(0) + (2)(0)] = [3; 1]
(c) A~x = [3 4 −5; 1 0 2][0; 0; 1] = [3(0) + 4(0) + (−5)(1); 1(0) + 0(0) + (2)(1)] = [−5; 2]
EXERCISE 2
Calculate the following matrix-vector products.
(a) [1 3 2; −1 4 5][2; −1; 6]
(b) [6 −1 1][2; 3; 1]
(c) [1 −4; 3 1; 1 −2][0; 1]
EXERCISE 3 Write the following system of linear equations in the form A~x = ~b.
x1 + x2 = 3
2x1 = 15
−2x1 − 3x2 = −5
Coefficients are the Columns of A
Observe that by using operations on vectors in Rn, the system of linear equations

a11 x1 + · · · + a1n xn = b1
          ⋮
am1 x1 + · · · + amn xn = bm

can be written as

x1 [a11; . . . ; am1] + · · · + xn [a1n; . . . ; amn] = [b1; . . . ; bm]

To write the left-hand side in the form A~x, we make the following definition.
DEFINITION
Matrix-Vector Multiplication
Let A = [~a1 · · · ~an] be an m × n matrix. For any ~x = [x1; . . . ; xn] ∈ Rn, we define A~x by
A~x = x1~a1 + · · · + xn~an
REMARKS
1. As above, if A is an m×n matrix, then A~x only makes sense if ~x ∈ Rn. Moreover,
if ~x ∈ Rn, then A~x ∈ Rm.
2. Observe that by definition for any ~x ∈ Rn, A~x is a linear combination of the
columns of A.
EXAMPLE 6
Calculate [1 3; 2 −4; 9 −1][x1; x2].
Solution: We have
[1 3; 2 −4; 9 −1][x1; x2] = x1[1; 2; 9] + x2[3; −4; −1] = [x1 + 3x2; 2x1 − 4x2; 9x1 − x2]
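The column interpretation can be coded just as directly: A~x is built as a linear combination of the columns of A. A sketch (the helper name `matvec_cols` is mine, not the book's):

```python
import numpy as np

def matvec_cols(A, x):
    # Ax is the linear combination x1*a1 + ... + xn*an of the columns of A
    m, n = A.shape
    b = np.zeros(m)
    for j in range(n):
        b += x[j] * A[:, j]
    return b

A = np.array([[1, 3], [2, -4], [9, -1]])
x = np.array([2, 1])
print(matvec_cols(A, x))  # [ 5.  0. 17.], same as the row interpretation
```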
EXAMPLE 7
Let A = [3 4 −5; 1 0 2]. Calculate A~x where:
(a) ~x = [2; −1; 6]  (b) ~x = [0; 1; 0]
Solution: We have
(a) [3 4 −5; 1 0 2][2; −1; 6] = 2[3; 1] + (−1)[4; 0] + 6[−5; 2] = [−28; 14]
(b) [3 4 −5; 1 0 2][0; 1; 0] = 0[3; 1] + 1[4; 0] + 0[−5; 2] = [4; 0]
EXERCISE 4
Calculate the following matrix-vector products by expanding them as a linear combination of the columns of the matrix.
(a) [1 3 2; −1 4 5][2; −1; 6]
(b) [6 −1 1][2; 3; 1]
(c) [1 −4; 3 1; 1 −2][0; 1]
Observe that this exercise contained the same problems as Exercise 2. This was to demonstrate the fact that both definitions of matrix-vector multiplication indeed give the same answer. At this point you may wonder why we have two different definitions for the same thing. The reason is that they have different uses. We use the first method for computing matrix-vector products or when we want to work with the rows of a matrix. When we are working with the columns of a matrix or with linear combinations of vectors, we use the second method.
The following is a simple, but useful theorem.
THEOREM 3
If ~ei is the i-th standard basis vector and A = [~a1 · · · ~an], then A~ei = ~ai.
The following formula will be extremely important later in the text.
DEFINITION
~x^T~y
If ~x, ~y ∈ Rn, then
~x^T~y = ~x · ~y
EXAMPLE 8
If ~x = [3; 1; 2; −1] and ~y = [−2; 0; 4; 5], then
~x^T~y = ~x · ~y = 3(−2) + 1(0) + 2(4) + (−1)(5) = −3
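Numerically, ~x^T~y and the dot product are the same computation. A one-line check in NumPy (code is mine, for illustration only):

```python
import numpy as np

x = np.array([3, 1, 2, -1])
y = np.array([-2, 0, 4, 5])

# x^T y, the product of a row vector with a column vector, is the dot product
print(x @ y)         # -3
print(np.dot(x, y))  # -3
```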
REMARK
Because of this property, we can now write our first definition of matrix-vector multiplication as
A~x = [~a1^T; . . . ; ~am^T]~x = [~a1^T~x; . . . ; ~am^T~x]
The properties of matrix-vector multiplication and of ~xT~y are the same as the proper-
ties of matrix-matrix multiplication given in Theorem 4 below.
In most cases, we will now represent a system of linear equations [A | ~b] as A~x = ~b.
This will allow us to use properties of matrix-vector multiplication when dealing with
systems of linear equations.
EXAMPLE 9 Assume that ~v is a solution of the system of linear equations A~x = ~b and that ~z is a
vector such that A~z = ~0. Prove that ~x = ~v + t~z is also a solution of the system for
every t ∈ R.
Solution: For any t ∈ R we have
A(~v + t~z) = A~v + A(t~z) by Theorem 4 (1)
= A~v + tA~z by Theorem 4 (3)
= ~b + t~0 since A~v = ~b and A~z = ~0
= ~b
Matrix Multiplication
We now use matrix-vector multiplication to define the product of two appropriately
sized matrices.
DEFINITION
Matrix Multiplication
For an m × n matrix A and an n × p matrix B = [~b1 · · · ~bp] we define AB to be the m × p matrix
AB = A[~b1 · · · ~bp] = [A~b1 · · · A~bp]
REMARKS
1. Notice that the product AB is only defined if the number of columns of A is
equal to the number of rows of B.
2. There are a variety of ways to justify/motivate this definition of matrix mul-
tiplication. Many of these ways are quite interesting. However, we will wait
until Section 3.4 to justify the definition. There we will show how matrix mul-
tiplication corresponds to a composition of functions.
We can compute the entries of AB easily using our first interpretation of matrix-vector multiplication. In particular, if ~ai^T are the rows of A, then the ij-th entry of AB is the i-th entry of A~bj. That is,
(AB)ij = ~ai · ~bj = ~ai^T~bj = ai1 b1j + ai2 b2j + · · · + ain bnj = Σ_{k=1}^{n} (A)ik (B)kj
EXAMPLE 10
[1 3 2; −1 0 −3][−2 4; −4 5; 0 0]
= [1(−2) + 3(−4) + 2(0), 1(4) + 3(5) + 2(0); (−1)(−2) + 0(−4) + (−3)(0), (−1)(4) + 0(5) + (−3)(0)]
= [−14 19; 2 −4]

[−2 4; −4 5; 0 0][1 3 2; −1 0 −3]
= [(−2)(1) + 4(−1), (−2)(3) + 4(0), (−2)(2) + 4(−3); (−4)(1) + 5(−1), (−4)(3) + 5(0), (−4)(2) + 5(−3); 0(1) + 0(−1), 0(3) + 0(0), 0(2) + 0(−3)]
= [−6 −6 −16; −9 −12 −23; 0 0 0]
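The entry formula (AB)ij = Σ_k (A)ik (B)kj can be implemented directly. A sketch (the helper name `matmul` is my own; NumPy's built-in `@` does the same thing):

```python
import numpy as np

def matmul(A, B):
    # (AB)_ij = sum over k of (A)_ik (B)_kj, the entry formula above
    m, n = A.shape
    n2, p = B.shape
    assert n == n2, "columns of A must equal rows of B"
    C = np.zeros((m, p), dtype=A.dtype)
    for i in range(m):
        for j in range(p):
            C[i, j] = sum(A[i, k] * B[k, j] for k in range(n))
    return C

A = np.array([[1, 3, 2], [-1, 0, -3]])
B = np.array([[-2, 4], [-4, 5], [0, 0]])
print(matmul(A, B))  # [[-14  19] [  2  -4]], as in Example 10
```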
EXERCISE 5
Calculate the following matrix products, or state why they don't exist.
(a) [1 3; 2 −1; 0 1][2 3 0 1; 0 1 −2 1]
(b) [2 3 0 1; 0 1 −2 1][1 3; 2 −1; 0 1]
(c) [1 2 3][−1; 2; 1]
(d) [−1; 2; 1][1 2 3]
THEOREM 4 If A, B, and C are matrices of the correct size so that the required products are
defined, and t ∈ R, then
(1) A(B +C) = AB + AC
(2) (A + B)C = AC + BC
(3) t(AB) = (tA)B = A(tB)
(4) A(BC) = (AB)C
(5) (AB)^T = B^T A^T
Proof: We prove (1) and (4) and leave the others as exercises. For (1) we have

(A(B + C))ij = Σ_{k=1}^{n} (A)ik (B + C)kj
             = Σ_{k=1}^{n} (A)ik ((B)kj + (C)kj)
             = Σ_{k=1}^{n} (A)ik (B)kj + Σ_{k=1}^{n} (A)ik (C)kj
             = (AB)ij + (AC)ij = (AB + AC)ij

For (4) we have

(A(BC))ij = Σ_{k=1}^{n} (A)ik (BC)kj
          = Σ_{k=1}^{n} (A)ik Σ_{ℓ=1}^{p} (B)kℓ (C)ℓj
          = Σ_{k=1}^{n} Σ_{ℓ=1}^{p} (A)ik (B)kℓ (C)ℓj
          = Σ_{ℓ=1}^{p} Σ_{k=1}^{n} (A)ik (B)kℓ (C)ℓj
          = Σ_{ℓ=1}^{p} ( Σ_{k=1}^{n} (A)ik (B)kℓ ) (C)ℓj
          = Σ_{ℓ=1}^{p} (AB)iℓ (C)ℓj = ((AB)C)ij □
Observe that Example 10 shows that in general AB ≠ BA. When multiplying a matrix equation by a matrix we have to make sure that we keep this in mind. That is, if A = B and we multiply both sides by a matrix D, then we get either DA = DB or AD = BD depending on whether we multiply by D on the left or on the right. Similarly, we do not have a cancellation law for matrix multiplication; if AC = BC, then we cannot guarantee that A = B. This is demonstrated in the next example.
EXAMPLE 11
We have
[1 3; 0 0][0 1; 0 0] = [0 1; 0 0] = [1 15; 0 −5][0 1; 0 0]
but
[1 3; 0 0] ≠ [1 15; 0 −5]
However, in many cases we will want to prove that two matrices are equal. To do
this, we often use the following theorem.
THEOREM 5 (Matrices Equal Theorem)
If A and B are m × n matrices such that A~x = B~x for every ~x ∈ Rn, then A = B.
Proof: Let A = [~a1 · · · ~an] and B = [~b1 · · · ~bn]. Let ~ei denote the i-th standard basis vector. Then, since A~x = B~x for all ~x ∈ Rn, we have that
~ai = A~ei = B~ei = ~bi
for 1 ≤ i ≤ n. Hence A = B. □
Identity Matrix
When considering multiplication, we are often interested in finding the multiplicative identity: the element I such that AI = A = IA for every element A. We first observe that if A is an m × n matrix and AB = A, then B must be an n × n matrix, while if BA = A, then B must be m × m. Therefore, the set Mm×n(R) with m ≠ n cannot have a multiplicative identity. However, we can have a multiplicative identity for Mn×n(R).
To find the multiplicative identity for Mn×n(R) we want to find an n × n matrix B = [~b1 · · · ~bn] such that
AB = A = BA
for every n × n matrix A = [~a1 · · · ~an]. Observe that if A = AB, then we have
[~a1 · · · ~an] = A[~b1 · · · ~bn] = [A~b1 · · · A~bn]
Thus, ~ai = A~bi. However, we know that ~ai = A~ei, where ~ei is the i-th standard basis vector, for all 1 ≤ i ≤ n. Thus, we should take I = [~e1 · · · ~en].
THEOREM 6
If I = [~e1 · · · ~en], then for any n × n matrix A we have AI = A = IA.
EXERCISE 6 Prove Theorem 6.
Theorem 6 shows that I = [~e1 · · · ~en] is a multiplicative identity. We now prove that it is the only multiplicative identity for Mn×n(R).
THEOREM 7 The multiplicative identity for Mn×n(R) is unique.
Proof: Suppose I1 and I2 are both multiplicative identities for Mn×n(R). Then, since I2 is an identity, I1 = I1I2, and since I1 is an identity, I1I2 = I2. Hence I1 = I2, so the multiplicative identity for Mn×n(R) is unique. □
We make the following definition.
DEFINITION
Identity Matrix
The n × n identity matrix, denoted by I or In, is the matrix such that (I)ii = 1 for 1 ≤ i ≤ n and (I)ij = 0 whenever i ≠ j. In particular,
I = [~e1 · · · ~en]
EXAMPLE 12
The 2 × 2 identity matrix is I2 = [~e1 ~e2] = [1 0; 0 1] and the 3 × 3 identity matrix is I3 = [~e1 ~e2 ~e3] = [1 0 0; 0 1 0; 0 0 1].
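The construction of I column by column from the standard basis vectors can be mirrored in code. A sketch (my own; NumPy's `np.eye` builds the same matrix directly):

```python
import numpy as np

n = 3
# Build I column by column from the standard basis vectors e1, ..., en
cols = []
for i in range(n):
    e = np.zeros(n)
    e[i] = 1.0
    cols.append(e)
I = np.column_stack(cols)  # equals np.eye(3)

A = np.array([[1.0, 2.0, -3.0], [0.0, 3.0, 1.0], [1.0, 0.0, 2.0]])
print((A @ I == A).all() and (I @ A == A).all())  # True: AI = A = IA
```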
EXERCISE 7 Let A be an m×n matrix such that the system of linear equations A~x = ~ei is consistent
for all 1 ≤ i ≤ m.
(a) Prove that the system of equations A~x = ~y is consistent for all ~y ∈ Rm.
(b) What can you conclude about the rank of A from the result in part (a)?
(c) Prove that there exists a matrix B such that AB = Im.
(d) Let A = [1 2 1; 0 1 1]. Use your method in part (c) to find a matrix B such that AB = I2.
Block Multiplication
So far in this section, we have often made use of the fact that we could represent
a matrix as a list of columns (n × 1 matrices) or as a list of rows (1 × n matrices).
Sometimes it can also be very useful to represent a matrix as a group of submatrices
called blocks.
DEFINITION
Block Matrix
If A is an m × n matrix, then we can write A as the k × ℓ block matrix
A = [A11 · · · A1ℓ; ⋮ ⋱ ⋮; Ak1 · · · Akℓ]
where Aij is a block such that all blocks in the i-th row have the same number of rows and all blocks in the j-th column have the same number of columns.
EXAMPLE 13
If A = [1 −1 3 4; 0 0 0 0; 0 3 1 2], then we can write A as
A = [A11 A12; A21 A22]
where A11 = [1 −1; 0 0], A12 = [3 4; 0 0], A21 = [0 3], and A22 = [1 2]. Alternately, we could write A as
A = [A11 A12; A21 A22; A31 A32]
where A11 = [1 −1 3], A12 = [4], A21 = [0 0 0], A22 = [0], A31 = [0 3 1], and A32 = [2].
Let A be an m × n matrix and let B be an n × p matrix. If we write A as a k × ℓ block matrix and B as an ℓ × s block matrix, then we can calculate AB by multiplying the block matrices together using the definition of matrix multiplication. For example, if
A = [A11 A12; A21 A22; A31 A32] and B = [B11 B12; B21 B22], then
AB = [A11B11 + A12B21, A11B12 + A12B22; A21B11 + A22B21, A21B12 + A22B22; A31B11 + A32B21, A31B12 + A32B22]
assuming that we have divided A and B into blocks such that all of the required products of the blocks are defined.
EXAMPLE 14
Let A = [1 2 −3; 0 3 1; 1 0 2] and B = [2 3 1; 0 1 −2; 0 0 3]. Let A11 = [1], A12 = [2 −3], A21 = [0; 1], and A22 = [3 1; 0 2] so that A = [A11 A12; A21 A22], and let B11 = [2], B12 = [3 1], B21 = [0; 0], and B22 = [1 −2; 0 3], so that B = [B11 B12; B21 B22]. Then we have
AB = [A11 A12; A21 A22][B11 B12; B21 B22] = [A11B11 + A12B21, A11B12 + A12B22; A21B11 + A22B21, A21B12 + A22B22]
Computing each entry we get
A11B11 + A12B21 = [1][2] + [2 −3][0; 0] = [2] + [0] = [2]
A11B12 + A12B22 = [1][3 1] + [2 −3][1 −2; 0 3] = [3 1] + [2 −13] = [5 −12]
A21B11 + A22B21 = [0; 1][2] + [3 1; 0 2][0; 0] = [0; 2] + [0; 0] = [0; 2]
A21B12 + A22B22 = [0; 1][3 1] + [3 1; 0 2][1 −2; 0 3] = [0 0; 3 1] + [3 −3; 0 6] = [3 −3; 3 7]
Hence,
AB = [2 5 −12; 0 3 −3; 2 3 7]
REMARK
If you multiply AB using the definition of matrix-matrix multiplication and compare
this to the calculations above it becomes clear why block multiplication works.
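The block computation of Example 14 can be replayed with array slicing. A NumPy sketch (the partition indices follow the example; the code itself is my own):

```python
import numpy as np

A = np.array([[1, 2, -3], [0, 3, 1], [1, 0, 2]])
B = np.array([[2, 3, 1], [0, 1, -2], [0, 0, 3]])

# Partition as in Example 14: a 1x1 top-left block and the rest
A11, A12 = A[:1, :1], A[:1, 1:]
A21, A22 = A[1:, :1], A[1:, 1:]
B11, B12 = B[:1, :1], B[:1, 1:]
B21, B22 = B[1:, :1], B[1:, 1:]

top = np.hstack([A11 @ B11 + A12 @ B21, A11 @ B12 + A12 @ B22])
bot = np.hstack([A21 @ B11 + A22 @ B21, A21 @ B12 + A22 @ B22])
AB = np.vstack([top, bot])
print(AB)  # [[ 2  5 -12] [ 0  3  -3] [ 2  3   7]], the same as A @ B
```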
Section 3.1 Problems
1. Compute the following linear combinations.
(a) 2[2 1; −3 4] + [3 −1; 0 2]
(b) (1/3)[1 3; 1 −6] − (1/4)[4 8; 1/2 3]
(c) (1/√2)[2 4; 0 2] + (1/√3)[6 0; 0 6]
(d) −2[3 3; 0 −1] − 3[−2 −2; 0 2]
(e) x[1 0; 0 0] + y[0 1; 0 0] + z[0 0; 1 0]
(f) 0[−4 3; 3 5] + 0[9 −8; 1 5]
2. Compute the following or state why they don't exist.
(a) [3 1; 2 1][2 −2; 0 1]
(b) [3 2 −1; 1 4 3][1 2 2; −3 5 0]
(c) [1 2 3; 4 5 6; 7 8 9][1; 0; 0]
(d) [1 2 3; 4 5 6; 7 8 9][0 0; 1 0; 0 1]
(e) [1 5; 3 2; 3 −2][3 1; 2 1]^T
(f) [1 4 −1; 2 1 0; 3 1 1][4 1 −1; 0 0 0; 3 1 1]
(g) [2; 3; 5][1; −1; 1]
(h) [−1 2 0; 2 1 1][5 −3; 2 7; −1 −3]
(i) [2; 3; 5]^T[1; −1; 1]
(j) [5 −3; 2 7; −1 −3][−1 2 0; 2 1 1]
(k) [2; 3; 5][1; −1; 1]^T
(l) [2; 3; 5]^T[x1; x2; x3]
3. We define the trace of an n × n matrix A by tr A = Σ_{i=1}^{n} aii. Let A and B be n × n matrices.
(a) Prove that tr(A + B) = tr A + tr B.
(b) Prove that tr(A^T B) = Σ_{i=1}^{n} Σ_{j=1}^{n} (A)ji (B)ji.
4. Find a matrix A and vectors ~x and ~b such that
A~x = ~b represents the system
3x1 + 2x2 − x3 = 4
2x1 − x2 + 5x3 = 5
5. Find all solutions to the equation
c1[3 1; 3 1] + c2[2 1; 2 −1] + c3[1 1; 1 −2] = [0 0; 0 0]
6. Find t1, t2, and t3 such that
t1[1 2; −2 −1] + t2[1 1; 0 −1] + t3[2 3; 1 1] = [−1 −4; 3 −2]
7. Determine whether each statement is true or
false. Justify your answer.
(a) If A ∈ M3×2(R) and A~x is defined, then
~x ∈ R3.
(b) If A ∈ M2×4(R) and A~x is defined, then
A~x ∈ R4.
(c) If A ∈ M3×3(R), then there is no matrix B
such that AB = BA.
(d) The only 2×2 matrix A such that A2 = O2,2
is O2,2. (NOTE: As usual, by A2 we mean
AA.)
(e) If A and B are 2 × 2 matrices such that
AB = O2,2, then either A is the zero ma-
trix or B is the zero matrix.
8. Prove that if A and B are m × n matrices, then
A + B = B + A.
9. Let A be an m × n matrix. Prove that ImA = A
and AIn = A.
10. Prove Theorem 3.
11. Let A be an m× n matrix with one row of zeros
and B be an n × p matrix.
(a) Use block multiplication to prove that AB
also has at least one row of zeros.
(b) Give an example where AB has more than
one row of zeros.
12. Find all 2 × 2 matrices A such that A2 = I.
13. Consider the populations of individuals in two adjacent regions X and Y. Suppose that changes in these populations only occur via transfer (migration) of individuals from one region to the other. Let xt and yt denote the populations of regions X and Y, respectively, in year t. Suppose that each year, 1/10 of the population of X migrates to region Y, while 1/5 of the population of Y migrates to X. We can then express the year-to-year change in the populations as:
xt+1 = (9/10)xt + (1/5)yt
yt+1 = (1/10)xt + (4/5)yt
(a) Express the year-to-year change in the population as a matrix product
[xt+1; yt+1] = A[xt; yt]
(b) Suppose that in year t = 0 the populations
are given by x0 = 1000, y0 = 2000. Use
your formula from part (a) to calculate the
populations in years t = 1, 2 and 3. Con-
firm that in each year, the total population
(xt + yt) is 3000. Why should this be ex-
pected?
(c) If the initial populations are x0 = 2000,
y0 = 1000, what are the populations in
year t = 1? What about year t = 2? What
about year t = 186? Explain your reason-
ing.
14. Suppose that the only competing suppliers of
internet services are NewServices and Real-
West. At present, they each have 50% of
the market share in their community. How-
ever, NewServices is upgrading their connec-
tion speeds but at a higher cost. Because of this,
each month 30% of RealWest’s customers will
switch to NewServices, while 10% of NewSer-
vices customers will switch to RealWest.
(a) Express the month-to-month change in the percent of market share as a matrix product.
(b) Calculate the market share of each com-
pany for the first two months.
(c) Use a computer to calculate the market
share of each company after six months.
If this continues for a long time, how big
will NewServices share become?
15. A car rental company serving one city has three
locations: the airport, the train station, and the
city centre. Of the cars rented at the airport,
8/10 are returned to the airport, 1/10 are left
at the train station, and 1/10 are left at the city
centre. Of cars rented at the train station, 3/10
are left at the airport, 6/10 are returned to the
train station, and 1/10 are left at the city cen-
tre. Of cars rented at the city centre, 3/10 go
to the airport, 1/10 go to the train station, and
6/10 are returned to the city centre. Express
the change in the location of the cars as a ma-
trix product.
16. The Fibonacci sequence is defined as
1, 1, 2, 3, 5, 8, 13, . . .
After the first two values, each entry in the sequence is the sum of the preceding two. Denoting the n-th entry in the sequence by an, we have
an+2 = an + an+1 for n = 0, 1, 2, . . .
This is called a two-step recursion, because each term depends on the two preceding terms. The sequence can also be constructed as a pair of one-step recursions as follows:
xn+1 = yn
yn+1 = xn + yn
(a) Verify that using this rule and taking x0 = 1, y0 = 1, the xn sequence is the Fibonacci sequence, while the yn sequence is a shifted version of the Fibonacci sequence.
(b) An advantage of the one-step recursion form of the sequence definition is that it can be written as a matrix product. Find the matrix A so that the recursion takes the form
[xn+1; yn+1] = A[xn; yn]
This matrix formulation will be used to find an explicit expression for the n-th term in this sequence in Problem 11 in Section 6.4.
3.2 Linear Mappings
Recall that for sets A and B, a function f : A → B is a rule that associates with each
element a ∈ A one element f (a) ∈ B called the image of a under f . The set A is
called the domain of f and the set B is called the codomain of f .
Matrix Mappings
Observe that if A is an m × n matrix, then we can define a function f : Rn → Rm by
f (~x) = A~x called a matrix mapping.
REMARK
As we indicated previously, if f : Rn → Rm, then we should write
f([x1; . . . ; xn]) = [y1; . . . ; ym]
However, we will often write f(x1, . . . , xn) = (y1, . . . , ym) as is typically done in other areas.
EXAMPLE 1
Let A = [1 3 2; −1 3 1] and define f(~x) = A~x. From the last section we know that we can calculate A~x if and only if ~x ∈ R3. Thus, the domain of f is R3. Also, we see that A~x ∈ R2 and so the codomain of f is R2. We can easily compute the images of some vectors in R3 under f using either definition of matrix-vector multiplication.
f(4, −2, 5) = [1 3 2; −1 3 1][4; −2; 5] = [8; −5]
f(−1, −1, 2) = [1 3 2; −1 3 1][−1; −1; 2] = [0; 0]
f(1, 0, 0) = [1 3 2; −1 3 1][1; 0; 0] = [1; −1]
f(x1, x2, x3) = [1 3 2; −1 3 1][x1; x2; x3] = [x1 + 3x2 + 2x3; −x1 + 3x2 + x3]
Using the properties of matrix multiplication, we prove the following important prop-
erty of matrix mappings.
THEOREM 1 If A is an m × n matrix and f : Rn → Rm is defined by f (~x) = A~x, then for all
~x, ~y ∈ Rn and b, c ∈ R we have
f (b~x + c~y) = b f (~x) + c f (~y)
Proof: Using Theorem 4 of Section 3.1 we get
f (b~x + c~y) = A(b~x + c~y) = A(b~x) + A(c~y) = bA~x + cA~y = b f (~x) + c f (~y)
□
Any function f which has the property that f (b~x + c~y) = b f (~x) + c f (~y) is said to
preserve linear combinations. This property, called the linearity property, is ex-
tremely important in linear algebra. We now look at functions that have this property.
Linear Mappings
DEFINITION
Linear Mapping
Linear Operator
A function L : Rn → Rm is said to be a linear mapping if for every ~x, ~y ∈ Rn and
b, c ∈ R we have
L(b~x + c~y) = bL(~x) + cL(~y)
Two linear mappings L : Rn → Rm and M : Rn → Rm are said to be equal if
L(~x) = M(~x) for all ~x ∈ Rn. We write L = M.
A linear mapping L : Rn → Rn is sometimes called a linear operator. This is often
done when we wish to stress the fact that the domain and codomain of the linear
mapping are the same.
Observe that if L : Rn → Rm is a linear mapping and ~x can be written as a linear
combination of vectors ~v1, . . . ,~vk, say
~x = c1~v1 + · · · + ck~vk
then the image of ~x under L can be written as a linear combination of the images of
the vectors ~v1, . . . ,~vk under L. That is
L(~x) = L(c1~v1 + · · · + ck~vk) = c1L(~v1) + · · · + ckL(~vk)
since L preserves linear combinations. This is the most important property of linear
mappings; it will be used frequently!
EXAMPLE 2 Prove that the function L : R3 → R2 defined by L(x1, x2, x3) = (3x1 − x2, 2x1 + 2x3)
is a linear mapping.
Solution: Let ~x, ~y ∈ R3 and b, c ∈ R. Then
L(b~x + c~y) = L(bx1 + cy1, bx2 + cy2, bx3 + cy3)
= (3(bx1 + cy1) − (bx2 + cy2), 2(bx1 + cy1) + 2(bx3 + cy3))
= b(3x1 − x2, 2x1 + 2x3) + c(3y1 − y2, 2y1 + 2y3)
= bL(~x) + cL(~y)
Hence, L is linear.
EXAMPLE 3
Prove that L : R2 → R2 defined by L(x1, x2) = (x1^2 − x2^2, x1x2) is not linear.
Solution: To prove L is not linear we just need to find one example to show that L does not preserve linear combinations. Observe that L(1, 2) = (−3, 2) and L(2, 4) = (−12, 8), but 2L(1, 2) = (−6, 4) ≠ L(2(1, 2)) = L(2, 4). Hence, L is not linear.
EXAMPLE 4 Let ~a ∈ Rn. Show that proj~a : Rn → Rn is a linear mapping.
Solution: Let ~x, ~y ∈ Rn and b, c ∈ R. Then
proj~a(b~x + c~y) = ((b~x + c~y) · ~a / ‖~a‖^2)~a = (b(~x · ~a)/‖~a‖^2)~a + (c(~y · ~a)/‖~a‖^2)~a = b proj~a(~x) + c proj~a(~y)
Hence, proj~a is linear.
EXERCISE 1 Prove that the mapping L : R3 → R2 defined by L(x1, x2, x3) = (x1 − x2, x1 − 2x3) is
linear.
Theorem 3.2.1 shows that every matrix mapping is a linear mapping. It is natural to ask whether the converse is true: "Is every linear mapping a matrix mapping?" To answer this question, we will see that our second interpretation of matrix-vector multiplication is very important.
Let L : Rn → Rm be a linear mapping and let ~x ∈ Rn. We know that ~x can be written as a unique linear combination of the standard basis vectors, say
~x = x1~e1 + · · · + xn~en
Hence, using the linearity property we get
L(~x) = L(x1~e1 + · · · + xn~en) = x1L(~e1) + · · · + xnL(~en)
Our second interpretation of matrix-vector multiplication says that a matrix times a vector gives us a linear combination of the columns of the matrix. Using this in reverse, we see that
L(~x) = x1L(~e1) + · · · + xnL(~en) = [L(~e1) · · · L(~en)][x1; . . . ; xn]
We have proven the following theorem.
THEOREM 2
Every linear mapping L : Rn → Rm can be represented as a matrix mapping whose matrix has as its i-th column the image of the i-th standard basis vector of Rn under L, for 1 ≤ i ≤ n. That is, L(~x) = [L]~x where
[L] = [L(~e1) · · · L(~en)]
DEFINITION
Standard Matrix
Let L : Rn → Rm be a linear mapping. The matrix [L] = [L(~e1) · · · L(~en)] is called the standard matrix of L. It satisfies L(~x) = [L]~x.
REMARK
We are calling this the standard matrix of L since it is based on the standard basis. In
Chapters 6 and 8, we will see that we can find the matrix representation of a linear
mapping with respect to different bases.
EXAMPLE 5 Determine the standard matrix of L(x1, x2, x3) = (3x1 − x2, 2x1 + 2x3).
Solution: By definition, the columns of the standard matrix [L] are the images of the
standard basis vectors of R3 under L. We have
L(1, 0, 0) = (3, 2)
L(0, 1, 0) = (−1, 0)
L(0, 0, 1) = (0, 2)
Thus,
[L] = [L(~e1) L(~e2) L(~e3)] = [3 −1 0; 2 0 2]
REMARK
Be careful to remember that when we write L(1, 0, 0) = (3, 2), we really mean
L(~e1) = L([1; 0; 0]) = [3; 2]
so that L(~e1) is in fact a column of [L].
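The recipe "apply L to each standard basis vector and stack the images as columns" is mechanical enough to code. A sketch for the mapping of Example 5 (helper names are my own):

```python
import numpy as np

def L(x):
    # the linear mapping L(x1, x2, x3) = (3x1 - x2, 2x1 + 2x3) of Example 5
    x1, x2, x3 = x
    return np.array([3 * x1 - x2, 2 * x1 + 2 * x3])

# Columns of [L] are the images L(e1), L(e2), L(e3)
std_matrix = np.column_stack([L(e) for e in np.eye(3)])
print(std_matrix)  # [[ 3. -1.  0.] [ 2.  0.  2.]]

# Sanity check: [L]x reproduces L(x)
x = np.array([1.0, 2.0, 3.0])
print(std_matrix @ x)  # [1. 8.], same as L(x)
```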
EXAMPLE 6
Let ~a = [1; 2]. Determine the standard matrix of proj~a : R2 → R2.
Solution: We need to find the projection of the standard basis vectors of R2 onto ~a.
proj~a(~e1) = (~e1 · ~a/‖~a‖^2)~a = (1/5)[1; 2] = [1/5; 2/5]
proj~a(~e2) = (~e2 · ~a/‖~a‖^2)~a = (2/5)[1; 2] = [2/5; 4/5]
Hence, [proj~a] = [proj~a(~e1) proj~a(~e2)] = [1/5 2/5; 2/5 4/5].
.
EXERCISE 2 Determine the standard matrix of the linear mapping
L(x1, x2, x3) = (x1 + 4x2, x1 − x2 + 2x3)
We now look at two important examples of linear mappings that will be used later in
the book.
Rotations in R2
Let Rθ : R2 → R2 denote the function that rotates a vector ~x ∈ R2 about the origin through an angle θ. Using basic trigonometric identities, we can show that
Rθ(x1, x2) = (x1 cos θ − x2 sin θ, x1 sin θ + x2 cos θ)
and that Rθ is linear. Using this we get that the standard matrix of Rθ is
[Rθ] = [cos θ −sin θ; sin θ cos θ]
[Figure: a vector ~x and its image Rθ(~x), rotated through the angle θ about the origin.]
EXAMPLE 7 Determine the standard matrix of Rπ/4.
Solution: We have
[Rπ/4] = [cos π/4 −sin π/4; sin π/4 cos π/4] = [1/√2 −1/√2; 1/√2 1/√2]
We call Rθ a rotation in R2 and we call its standard matrix [Rθ] a rotation matrix.
The following theorem shows some important properties of a rotation matrix.
THEOREM 3 If Rθ : R2 → R2 is a rotation with rotation matrix A = [Rθ], then the columns of A
are orthogonal unit vectors.
Proof: The columns of A are ~a1 = [cos θ; sin θ] and ~a2 = [−sin θ; cos θ]. Hence,
~a1 · ~a2 = −cos θ sin θ + cos θ sin θ = 0
‖~a1‖^2 = cos^2 θ + sin^2 θ = 1
‖~a2‖^2 = cos^2 θ + sin^2 θ = 1 □
Reflections
Let reflP : Rn → Rn denote the mapping that sends a vector ~x to its mirror image in the hyperplane P with normal vector ~n. The figure shows a reflection over a line in R2 with normal vector ~n. From the figure, we see that the reflection of ~x over the line with normal vector ~n is given by
reflP(~x) = ~x − 2 proj~n(~x)
We extend this formula directly to the n-dimensional case.
[Figure: a line in R2 with normal vector ~n, a vector ~x, its projection proj~n(~x), and its reflection reflP(~x) on the other side of the line.]
EXAMPLE 8 Prove that reflP : Rn → Rn is a linear mapping.
Solution: Since proj~n is a linear mapping we get
reflP(b~x + c~y) = (b~x + c~y) − 2 proj~n(b~x + c~y)
= b~x + c~y − 2(b proj~n(~x) + c proj~n(~y))
= b(~x − 2 proj~n(~x)) + c(~y − 2 proj~n(~y))
= b reflP(~x) + c reflP(~y)
for all ~x, ~y ∈ Rn and b, c ∈ R as required.
EXAMPLE 9
Find the reflection of ~x = [1; 1; 1; 3] over the hyperplane P in R4 with normal vector ~n = [1; 2; 2; 0].
Solution: We have
reflP(~x) = ~x − 2 proj~n(~x) = ~x − 2(~x · ~n/‖~n‖^2)~n = [1; 1; 1; 3] − (10/9)[1; 2; 2; 0] = [−1/9; −11/9; −11/9; 3]
EXAMPLE 10
Determine the standard matrix of reflP : R3 → R3 where ~n = [1; 2; −3].
Solution: We have
reflP(~e1) = [1; 0; 0] − (2/14)[1; 2; −3] = [6/7; −2/7; 3/7]
reflP(~e2) = [0; 1; 0] − (4/14)[1; 2; −3] = [−2/7; 3/7; 6/7]
reflP(~e3) = [0; 0; 1] + (6/14)[1; 2; −3] = [3/7; 6/7; −2/7]
Hence,
[reflP] = [6/7 −2/7 3/7; −2/7 3/7 6/7; 3/7 6/7 −2/7]
Section 3.2 Problems
1. Let A = [−2 3; 3 0; 1 5; 4 −6] and let L(~x) = A~x be the corresponding matrix mapping.
(a) Determine the domain and codomain of L.
(b) Determine L(2, −5) and L(−3, 4).
(c) Determine [L].
2. Let B = [1 2 −3 0; 2 −1 0 3; 1 0 2 −1] and let f(~x) = B~x be the corresponding matrix mapping.
(a) Determine the domain and codomain of f.
(b) Determine f(2, −2, 3, 1) and f(−3, 1, 4, 2).
(c) Determine [f].
3. Determine which of the following are linear.
Find the standard matrix of each linear map-
ping.
(a) L(x1, x2, x3) = (x1 + x2, 0)
(b) L(x1, x2, x3) = (x1 − x2, x2 + x3)
(c) L(x1, x2, x3) = (1, x1, x2 + x3)
(d) L(x1, x2) = (x21, x2
2)
(e) L(x1, x2, x3) = (0, 0, 0)
(f) L(x1, x2, x3, x4) = (x4 − x1, 2x2 + 3x3)
4. Let ~a =
[
2
2
]
and ~x =
[
3
−1
]
.
(a) Find [proj~a].
(b) Use the standard matrix you found in (a)
to find proj~a(~x).
(c) Check your answer in (b) by computing
proj~a(~x) directly.
5. Let P be a plane with normal vector ~n =
3
1
−1
.
Find the standard matrix of reflP : R3 → R3.
6. Let ~a ∈ Rn. Prove that perp~a : Rn → Rn is a
linear mapping.
7. Let ~v ∈ Rn and define DOT~v : Rn → R by
DOT~v(~x) = ~x ·~v. Prove that DOT~v is linear and
find it’s standard matrix.
8. Prove that if L : Rn → Rm is linear, then
L(~0) = ~0.
9. Let L be a linear mapping with standard matrix
[L] =
[
1 3 2 1
−1 3 0 5
]
.
(a) What is the domain of L?
(b) What is the codomain of L?
(c) What is L(~x) for any ~x in the domain of L.
10. Consider the rotation Rπ/3 : R2 → R2.
(a) Find the standard matrix of Rπ/3.
(b) Use the standard matrix to determine
Rπ/3(1/√
2, 1/√
2).
11. (a) Invent a linear mapping L : R2 → R3 such
that L(1, 0) = (3,−1, 4) and
L(0, 1) = (1,−5, 9).
(b) Invent a non-linear mapping L : R2 → R3
such that L(1, 0) = (3,−1, 4) and
L(0, 1) = (1,−5, 9).
12. Invent a linear mapping L : R2 → R2 such that
L(1, 1) = (2, 3) and L(1,−1) = (3, 1).
13. (a) Let L : Rn → Rm be a linear mapping.
Prove that if {L(~v1), . . . , L(~vk)} is a linearly
independent set in Rm, then {~v1, . . . ,~vk} is
linearly independent.
(b) Give an example of a linear mapping
L : Rn → Rm where {~v1, . . . ,~vk} is linearly
independent in Rn, but {L(~v1), . . . , L(~vk)} is
linearly dependent.
14. Prove that if ~u is a unit vector, then
[proj~u] = ~u~uT
3.3 Special Subspaces
When working with a function we are often interested in the subset of the codomain
consisting of all possible images under the function, called the range of the
function. In this section we will look not only at the range of a linear mapping, but
also at a special subset of its domain. We will then use the connection between linear
mappings and their standard matrices to derive four important subspaces of a matrix.
Range
DEFINITION
Range
Let L : Rn → Rm be a linear mapping. The range of L is defined by
Range(L) = {L(~x) ∈ Rm | ~x ∈ Rn}
EXAMPLE 1 Let L : R2 → R3 be defined by L(x1, x2) = (x1 + x2, x1 − x2, x2). Determine which of
the following vectors are in the range of L: ~y1 = (0, 0, 0), ~y2 = (2, 3, 1), ~y3 = (3, −1, 2).

Solution: Observe that L(0, 0) = (0 + 0, 0 − 0, 0) = (0, 0, 0), so ~y1 ∈ Range(L).

For ~y2, we need to determine whether there exists ~x = (x1, x2) such that

(2, 3, 1) = L(x1, x2) = (x1 + x2, x1 − x2, x2)

Comparing corresponding entries gives us the system of linear equations

x1 + x2 = 2
x1 − x2 = 3
x2 = 1

It is easy to see that this system is inconsistent. Thus, ~y2 ∉ Range(L).

For ~y3, we need to determine whether there exist x1 and x2 such that

(3, −1, 2) = L(x1, x2) = (x1 + x2, x1 − x2, x2)

That is, we need to solve the system of equations

x1 + x2 = 3
x1 − x2 = −1
x2 = 2

We find that the system is consistent with unique solution x1 = 1, x2 = 2. Hence,
L(1, 2) = (3, −1, 2), so ~y3 ∈ Range(L).
EXAMPLE 2 Find a basis for the range of L(x1, x2, x3) = (3x1 − x2, 2x1 + 2x3).

Solution: We see that every vector L(~x) has the form

(3x1 − x2, 2x1 + 2x3) = x1(3, 2) + x2(−1, 0) + x3(0, 2)

Hence,

Range(L) = Span{(3, 2), (−1, 0), (0, 2)} = Span{(−1, 0), (0, 2)}

since (3, 2) = (−3)(−1, 0) + (0, 2). Thus, B = {(−1, 0), (0, 2)} spans Range(L) and is clearly
linearly independent, so it is a basis for Range(L).
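Since the range of L is spanned by the columns of its standard matrix (Theorem 6 later in this section makes this precise), the size of a basis for the range is the rank of that matrix. A short numerical sketch of Example 2 (Python with NumPy, our own illustration):

```python
import numpy as np

# Standard matrix of L(x1, x2, x3) = (3x1 - x2, 2x1 + 2x3)
A = np.array([[3.0, -1.0, 0.0],
              [2.0,  0.0, 2.0]])

# Range(L) is the span of the columns of A; its dimension is rank(A).
print(np.linalg.matrix_rank(A))  # 2, matching the two-vector basis found above

# The dependence (3, 2) = (-3)(-1, 0) + (0, 2) used in the example:
print(np.allclose(-3 * A[:, 1] + A[:, 2], A[:, 0]))  # True
```

Rank 2 tells us any two independent columns form a basis; it does not by itself say which column is redundant.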
EXAMPLE 3 Let ~a = (2, −1). Find a basis for the range of proj~a : R2 → R2.

Solution: Let ~y ∈ Range(proj~a). Then, by definition, there exists ~x ∈ R2 such that

~y = proj~a(~x) = (~x · ~a)/‖~a‖² ~a ∈ Span{~a}

So, Range(proj~a) ⊆ Span{~a}. On the other hand, pick any c~a ∈ Span{~a}. Taking
~y = (0, −5c) gives

proj~a(~y) = (~y · ~a)/‖~a‖² ~a = ((0)(2) + (−5c)(−1))/5 ~a = c~a

Thus, Span{~a} ⊆ Range(proj~a). Hence, Range(proj~a) = Span{~a}. Since {~a} is also
clearly linearly independent, it is a basis for Range(proj~a).
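Both inclusions in Example 3 can be tested numerically. This sketch (Python with NumPy, our own illustration) builds the standard matrix of proj~a as ~a~aT/‖~a‖² — the formula of Problem 14 in Section 3.2, rescaled since ~a is not a unit vector:

```python
import numpy as np

a = np.array([2.0, -1.0])
P = np.outer(a, a) / (a @ a)   # standard matrix of proj_a

# Every image P @ x lies in Span{a} ...
x = np.array([4.0, 3.0])
y = P @ x
print(np.linalg.matrix_rank(np.vstack([y, a])))  # 1: y is a multiple of a

# ... and every c*a is attained, using the preimage (0, -5c) from the example:
c = 2.0
print(np.allclose(P @ np.array([0.0, -5.0 * c]), c * a))  # True
```

The rank-1 check works because stacking two parallel nonzero vectors gives a matrix of rank 1.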
EXERCISE 1 Find a basis for the range of L(x1, x2, x3) = (x1 − x2, x1 − 2x3).
Observe in the examples above that the range of a linear mapping is a subspace of the
codomain. To prove that this is always the case, we will use the following lemma.
LEMMA 1 If L : Rn → Rm is a linear mapping, then L(~0) = ~0.
THEOREM 2 If L : Rn → Rm is a linear mapping, then Range(L) is a subspace of Rm.
In many cases we are interested in whether the range of a linear mapping L equals its
codomain. That is, for every vector ~y in the codomain, can we find a vector ~x in the
domain such that L(~x) = ~y?
REMARK
A function whose range equals its codomain is said to be onto. We will look at onto
mappings in Chapter 8.
EXAMPLE 4 Let L : R3 → R3 be defined by L(x1, x2, x3) = (x1 + x2 + x3, 2x1 − 2x3, x2 + 3x3). Prove
that Range(L) = R3.

Solution: Since Range(L) is a subset of R3 by definition, to prove that Range(L) = R3
we only need to show that for any vector ~y = (y1, y2, y3) ∈ R3, we can find a vector
~x ∈ R3 such that L(~x) = ~y. Consider

(y1, y2, y3) = L(x1, x2, x3) = (x1 + x2 + x3, 2x1 − 2x3, x2 + 3x3)

This gives the system of linear equations

x1 + x2 + x3 = y1
2x1 − 2x3 = y2
x2 + 3x3 = y3

Row reducing the corresponding augmented matrix gives

[1 1  1 | y1]   [1 0 0 | −y1 + y2 + y3      ]
[2 0 −2 | y2] ∼ [0 1 0 | 3y1 − (3/2)y2 − 2y3]
[0 1  3 | y3]   [0 0 1 | −y1 + (1/2)y2 + y3 ]

Therefore, Range(L) = R3 since the system is consistent for all y1, y2, and y3. In
particular, taking x1 = −y1 + y2 + y3, x2 = 3y1 − (3/2)y2 − 2y3, x3 = −y1 + (1/2)y2 + y3 we get
L(x1, x2, x3) = (y1, y2, y3).
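The closed-form preimage found in Example 4 can be compared against a direct linear solve. In this sketch (Python with NumPy; the target vector ~y = (4, −2, 7) is an arbitrary choice of ours), `np.linalg.solve` succeeds because the standard matrix has rank 3:

```python
import numpy as np

A = np.array([[1.0, 1.0,  1.0],
              [2.0, 0.0, -2.0],
              [0.0, 1.0,  3.0]])

y = np.array([4.0, -2.0, 7.0])   # an arbitrary target vector in R^3
x = np.linalg.solve(A, y)        # solvable since rank A = 3
print(np.allclose(A @ x, y))     # True: y is in Range(L)

# The closed-form solution read off from the row reduction:
y1, y2, y3 = y
x_formula = np.array([-y1 + y2 + y3,
                      3*y1 - 1.5*y2 - 2*y3,
                      -y1 + 0.5*y2 + y3])
print(np.allclose(x, x_formula))  # True
```

Trying other values of ~y gives the same agreement, which is exactly the statement Range(L) = R3.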
EXAMPLE 5 Let A be an n × n matrix and let L : Rn → Rn be defined by L(~x) = A~x. Prove that
Range(L) = Rn if and only if rank A = n.
Solution: If Range(L) = Rn, then for every ~y ∈ Rn there exists a vector ~x such that
~y = L(~x) = A~x. Hence, the system of equations A~x = ~y with coefficient matrix A is
consistent for every ~y ∈ Rn. The System-Rank Theorem then implies that rank(A) =
n.
On the other hand, if rank A = n, then by the System-Rank Theorem L(~x) = A~x = ~y
is consistent for every ~y ∈ Rn, so Range(L) = Rn.
Kernel
We now look at a special subset of the domain of a linear mapping.
DEFINITION
Kernel
Let L : Rn → Rm be a linear mapping. The kernel (nullspace) of L is the set of all
vectors in the domain which are mapped to the zero vector in the codomain. That is,
Ker(L) = {~x ∈ Rn | L(~x) = ~0}
EXAMPLE 6 Let L : R2 → R2 be defined by L(x1, x2) = (2x1 − 2x2, −x1 + x2). Determine which of
the following vectors are in the kernel of L: ~x1 = (0, 0), ~x2 = (2, 2), ~x3 = (3, −1).

Solution: We have

L(~x1) = L(0, 0) = (0, 0)
L(~x2) = L(2, 2) = (0, 0)
L(~x3) = L(3, −1) = (8, −4)

Hence, ~x1, ~x2 ∈ Ker(L) while ~x3 ∉ Ker(L).
EXAMPLE 7 Find a basis for the kernel of L(x1, x2, x3) = (3x1 − x2, 2x1 + 2x3).

Solution: By definition, ~x = (x1, x2, x3) is in the kernel of L if

(0, 0) = L(x1, x2, x3) = (3x1 − x2, 2x1 + 2x3)

Hence, ~x is in the kernel of L if and only if it satisfies

3x1 − x2 = 0
2x1 + 2x3 = 0

Row reducing the corresponding coefficient matrix gives

[3 −1 0]   [1 0 1]
[2  0 2] ∼ [0 1 3]

We find that the solution space of the homogeneous system has vector equation
~x = t(−1, −3, 1), t ∈ R. Thus, Ker(L) = Span{(−1, −3, 1)}. Since {(−1, −3, 1)} is also
linearly independent, it is a basis for Ker(L).
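The basis vector found in Example 7 is easy to verify: it must be sent to ~0 by the standard matrix, and the dimension count (number of columns minus rank) confirms one vector suffices. A sketch in Python with NumPy (our own illustration):

```python
import numpy as np

# Standard matrix of L(x1, x2, x3) = (3x1 - x2, 2x1 + 2x3)
A = np.array([[3.0, -1.0, 0.0],
              [2.0,  0.0, 2.0]])

v = np.array([-1.0, -3.0, 1.0])   # claimed basis vector for Ker(L)
print(np.allclose(A @ v, 0))      # True: v is in the kernel

# dim Ker(L) = (number of columns) - rank(A) = 3 - 2 = 1,
# so the single independent vector v is a basis.
print(A.shape[1] - np.linalg.matrix_rank(A))  # 1
```

The dimension formula used in the comment is the rank–nullity relationship implicit in the System-Rank Theorem.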
EXAMPLE 8 Find a basis for the kernel of Rθ : R2 → R2.

Solution: Let’s first think about this geometrically. We are looking for all vectors in
R2 such that rotating them by an angle of θ gives the zero vector. Since a rotation
does not affect length, the only vector whose image could be the zero vector is the
zero vector itself.

We verify our geometric intuition algebraically. Consider

(0, 0) = Rθ(~x) = [Rθ]~x = [cos θ −sin θ; sin θ cos θ]~x

Row reducing the coefficient matrix of the corresponding homogeneous system gives

[cos θ −sin θ]   [1 0]
[sin θ  cos θ] ∼ [0 1]

Hence, the system has the unique solution ~x = ~0. So,

Ker(Rθ) = {(0, 0)}

Recall that we defined a basis for {~0} to be the empty set. Hence, the empty set is a
basis for Ker(Rθ).
EXERCISE 2 Find a basis for the kernel of L(x1, x2, x3) = (x1 − x2, x1 − 2x3).
In the examples, the kernel of a linear mapping was always a subspace of the
appropriate Rn.
THEOREM 3 If L : Rn → Rm is a linear mapping, then Ker(L) is a subspace of Rn.
Proof: We use the Subspace Test. We first observe that by definition Ker(L) is a
subset of Rn. Also, we have that ~0 ∈ Ker(L) by Lemma 1. Let ~x, ~y ∈ Ker(L). Then,
L(~x) = ~0 and L(~y) = ~0 so
L(~x + ~y) = L(~x) + L(~y) = ~0 + ~0 = ~0
so ~x+~y ∈ Ker(L). Thus, Ker(L) is closed under addition. Similarly, for any c ∈ R we
have
L(c~x) = cL(~x) = c~0 = ~0
so c~x ∈ Ker(L). Therefore, Ker(L) is also closed under scalar multiplication. Hence,
Ker(L) is a subspace of Rn. ∎
The Four Fundamental Subspaces of a Matrix
We now look at the relationship of the standard matrix of a linear mapping to its
range and kernel. In doing so, we will derive four important subspaces of a matrix.
THEOREM 4 Let L : Rn → Rm be a linear mapping with standard matrix [L]. Then, ~x ∈ Ker(L) if
and only if [L]~x = ~0.
Proof: If ~x ∈ Ker(L), then ~0 = L(~x) = [L]~x.
On the other hand, if [L]~x = ~0, then ~0 = [L]~x = L(~x), so ~x ∈ Ker(L). ∎
We get an immediate corollary to this theorem.
COROLLARY 5 Let A ∈ Mm×n(R). The set {~x ∈ Rn | A~x = ~0} is a subspace of Rn.
The corollary motivates the following definition.
DEFINITION
Nullspace
Let A be an m × n matrix. The nullspace (kernel) of A is defined by
Null(A) = {~x ∈ Rn | A~x = ~0}
EXAMPLE 9 Let A = [1 2; −1 0]. Determine if ~x = (−1, 2) is in the nullspace of A.

Solution: We have A~x = (3, 1). Since A~x ≠ ~0, ~x ∉ Null(A).
EXAMPLE 10 Let A = [1 3 1; −1 −2 2]. Find a basis for the nullspace of A.

Solution: We need to find all ~x such that A~x = ~0. We observe that this is a homogeneous
system with coefficient matrix A. Row reducing gives

[ 1  3 1]   [1 0 −8]
[−1 −2 2] ∼ [0 1  3]

Thus, a vector equation for the solution set of the system is

~x = t(8, −3, 1), t ∈ R

Hence, {(8, −3, 1)} spans Null(A) and is linearly independent, so it is a basis for Null(A).
EXAMPLE 11 Let A =
[ 1 2 −1]
[ 5 0 −4]
[−2 6  4]
. Show that Null(A) = {~0}.

Solution: We need to show that the only solution to the homogeneous system A~x = ~0
is ~x = ~0. Row reducing the associated coefficient matrix we get

[ 1 2 −1]   [1 0 0]
[ 5 0 −4] ∼ [0 1 0]
[−2 6  4]   [0 0 1]

Hence, the rank of the coefficient matrix equals the number of columns, so by the
System-Rank Theorem there is a unique solution. Therefore, since the system is
homogeneous, the only solution is ~x = ~0. Thus, Null(A) = {~0}.
The connection between the range of a linear mapping and its standard matrix follows
easily from our second interpretation of matrix-vector multiplication.
THEOREM 6 If L : Rn → Rm is a linear mapping with standard matrix [L] = A = [~a1 · · · ~an], then

Range(L) = Span{~a1, . . . , ~an}

Proof: Observe that for any ~x ∈ Rn we have

L(~x) = A~x = [~a1 · · · ~an](x1, . . . , xn) = x1~a1 + · · · + xn~an

Thus, every vector in Range(L) is in Span{~a1, . . . , ~an} and every vector in
Span{~a1, . . . , ~an} is in Range(L) as required. ∎
Therefore, the range of a linear mapping is the subspace spanned by the columns of
its standard matrix.
DEFINITION
Column Space
Let A ∈ Mm×n(R). The column space of A is the subspace of Rm defined by
Col(A) = {A~x ∈ Rm | ~x ∈ Rn}
EXAMPLE 12 Let A = [1 5 −3; 9 2 1] and B =
[1 −3]
[0  2]
[0  1]
. Then Col(A) = Span{(1, 9), (5, 2), (−3, 1)} and
Col(B) = Span{(1, 0, 0), (−3, 2, 1)}.
EXERCISE 3 Let A be an m × n matrix. Prove that Col(A) = Rm if and only if rank(A) = m.
Recall that taking the transpose of a matrix changes its columns into rows
and vice versa. Hence, the subspace of Rn spanned by the rows of an m × n matrix A
is just the column space of AT.
DEFINITION
Row Space
Let A be an m × n matrix. The row space of A is the subspace of Rn defined by
Row(A) = {AT~x ∈ Rn | ~x ∈ Rm}
EXAMPLE 13 Let A = [1 5 −3; 9 2 1] and B =
[1 −3]
[0  2]
[0  1]
. Then Row(A) = Span{(1, 5, −3), (9, 2, 1)} and
Row(B) = Span{(1, −3), (0, 2), (0, 1)}.
Lastly, since we have looked at the column space of AT it makes sense to also consider
the nullspace of AT .
DEFINITION
Left Nullspace
Let A be an m × n matrix. The left nullspace of A is the subspace of Rm defined by
Null(AT ) = {~x ∈ Rm | AT~x = ~0}
EXAMPLE 14 Let A =
[1 −3]
[0  2]
[0  1]
. Find a basis for the left nullspace of A.

Solution: We have AT = [1 0 0; −3 2 1]. Solving the homogeneous system AT~x = ~0 we get

~x = t(0, −1, 2), t ∈ R

Therefore, {(0, −1, 2)} spans Null(AT) and is linearly independent, hence it is a basis for
Null(AT).
For any m × n matrix A, we call the nullspace, column space, row space, and left
nullspace the four fundamental subspaces of A.
EXAMPLE 15 Find a basis for each of the four fundamental subspaces of A = [1 3 2; 2 5 2].

Solution: To find a basis for Null(A) we solve A~x = ~0. Row reducing A gives

[1 3 2]   [1 0 −4]
[2 5 2] ∼ [0 1  2]

Thus, a vector equation for the solution space is ~x = t(4, −2, 1), t ∈ R. Therefore,
{(4, −2, 1)} spans Null(A) and is linearly independent, so it is a basis for Null(A).

By definition of the column space of A we have

Col(A) = Span{(1, 2), (3, 5), (2, 2)}

Observe that −4(1, 2) + 2(3, 5) = (2, 2). Thus, by Theorem 1.2.1 we have

Col(A) = Span{(1, 2), (3, 5)}

Thus, {(1, 2), (3, 5)} spans Col(A) and is clearly linearly independent, so it is a basis for
Col(A).

By definition, the row space of A is spanned by {(1, 3, 2), (2, 5, 2)}. This set is also linearly
independent and hence forms a basis for Row(A).

To determine the left nullspace of A, we solve AT~x = ~0. Row reducing AT gives

[1 2]   [1 0]
[3 5] ∼ [0 1]
[2 2]   [0 0]

Hence, the only solution is ~x = ~0, so Null(AT) = {~0}. Consequently, a basis for
Null(AT) is the empty set.
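The dimensions found in Example 15 follow a general pattern: for an m × n matrix of rank r, the column and row spaces have dimension r, the nullspace has dimension n − r, and the left nullspace has dimension m − r. A numerical sketch (Python with NumPy, our own illustration):

```python
import numpy as np

A = np.array([[1.0, 3.0, 2.0],
              [2.0, 5.0, 2.0]])
r = np.linalg.matrix_rank(A)   # 2

# Dimensions of the four fundamental subspaces of an m x n matrix of rank r:
#   dim Col(A) = r,  dim Row(A) = r,
#   dim Null(A) = n - r,  dim Null(A^T) = m - r
m, n = A.shape
print(r, r, n - r, m - r)      # 2 2 1 0

# Check the claimed basis vector of Null(A):
print(np.allclose(A @ np.array([4.0, -2.0, 1.0]), 0))  # True
```

The 0 in the last slot matches the empty basis found for Null(AT) in the example.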
Section 3.3 Problems
1. For each of the following linear mappings L:
(i) Determine a basis for the range and kernel
of L;
(ii) Find the standard matrix [L] of L;
(iii) Find a basis for the four fundamental sub-
spaces of [L].
(a) L(x1, x2, x3) = (x1 + x2, 0)
(b) L(x1, x2, x3) = (x1 − x2, x2 + x3)
(c) L(x1, x2, x3) = (0, 0, 0)
(d) L(x1, x2, x3, x4) = (x1, x2 + x4)
(e) L(x1, x2) = (x1, x2)
(f) L(x1, x2, x3, x4) = (x4 − x1, 2x2 + 3x3)
(g) L(x1) = (3x1)
(h) L(x1, x2, x3) = (2x1 + x3, x1 − x3, x1 + x3)
(i) L(x1) = (x1, 2x1,−x1)
(j) L(x1, x2) = (x1 + x2, 2x1 + 4x2)
2. Find a basis for the four fundamental subspaces
of each matrix.
(a) A = [1 2; 3 4]
(b) B = [1 0 1; 0 1 −1]
(c) C =
[ 1  2  0]
[−1 −2  0]
[ 1  3 −1]
(d) D =
[ 1  2  1 1]
[−2 −5 −2 1]
[−1 −3 −1 2]
3. Let A be an m × n matrix. Prove that a consis-
tent system of linear equations A~x = ~b has a
unique solution if and only if Null(A) = {~0}.
4. Prove Lemma 1 and Theorem 2.
5. Suppose that {~v1, . . . ,~vk} is a linearly indepen-
dent set in Rn and that L : Rn → Rm is a
linear map with Ker(L) = {~0}. Prove that
{L(~v1), . . . , L(~vk)} is linearly independent.
6. Let L : Rn → Rn be a linear operator. Prove
that Ker(L) = {~0} if and only if
Range(L) = Rn.
7. Let A be an n × n matrix such that A² = On,n.
Prove that the column space of A is a subset of
the nullspace of A.
8. In the notes we proved that the nullspace of an
m × n matrix A is a subspace of Rn by proving
that the nullspace of A equals the kernel of the
matrix mapping L(~x) = A~x. It is instructive to
prove this result in other ways.
(a) Prove that Null(A) is a subspace of Rn di-
rectly using the Subspace Test.
(b) Prove that Null(A) is a subspace of Rn
by using one of the definitions of matrix-
vector multiplication.
9. Prove that if an m×n matrix A is row equivalent
to a matrix B, then Row(A) = Row(B).
10. Prove or disprove the following statements.
(a) A system of linear equations A~x = ~b is
consistent if and only if ~b ∈ Col(A).
(b) If L : Rn → Rm is a linear mapping and
~x ∈ Ker(L), then ~x = ~0.
(c) Let L : Rn → Rm be a linear mapping
with standard matrix [L]. If Range(L) =
Span{~a1, . . . , ~an}, then
[L] = [~a1 · · · ~an]
3.4 Operations on Linear Mappings
In the preceding sections we have been treating linear mappings as functions. We
have also seen that there is a close connection between a linear mapping and its
standard matrix. We now use this connection to show that we can also treat linear
mappings just like vectors in Rn or matrices. We begin with some basic definitions.
DEFINITION
Addition
Scalar Multiplication
Let L : Rn → Rm and M : Rn → Rm be linear mappings. We define L+M : Rn → Rm
and cL : Rn → Rm by
(L + M)(~x) = L(~x) + M(~x)
(cL)(~x) = cL(~x)
EXAMPLE 1 Let L : R2 → R3 and M : R2 → R3 be defined by L(x1, x2) = (x1, x1 + x2,−x2) and
M(x1, x2) = (x1, x1, x2). Calculate L + M and 3L.
Solution: L + M is the mapping defined by
(L + M)(~x) = L(~x) + M(~x) = (x1, x1 + x2,−x2) + (x1, x1, x2) = (2x1, 2x1 + x2, 0)
and 3L is the mapping defined by
(3L)(~x) = 3L(~x) = 3(x1, x1 + x2,−x2) = (3x1, 3x1 + 3x2,−3x2)
By analyzing Example 1 we can learn more than just how to add linear mappings and
how to multiply them by a scalar. We first observe that L + M and 3L are both linear
mappings. This makes us realize that we can rewrite the calculations in the example
in terms of standard matrices. For example,
(L + M)(~x) = L(~x) + M(~x) = [L]~x + [M]~x = ([L] + [M])~x
            = ([1 0; 1 1; 0 −1] + [1 0; 1 0; 0 1])~x
            = [2 0; 2 1; 0 0](x1, x2)
            = (2x1, 2x1 + x2, 0)
This matches our result above. Generalizing what we did here gives the following
theorem.
THEOREM 1 If L,M : Rn → Rm are linear mappings and c ∈ R, then L + M : Rn → Rm and
cL : Rn → Rm are linear mappings. Moreover, we have
[L + M] = [L] + [M] and [cL] = c[L]
Proof: For any ~x ∈ Rn we have
(L + M)(~x) = L(~x) + M(~x) = [L]~x + [M]~x = ([L] + [M])~x
(cL)(~x) = cL(~x) = c[L]~x = (c[L])~x
Thus, L + M and cL are matrix mappings with m × n standard matrices. Hence, they
are both linear mappings from Rn to Rm. Moreover, the standard matrix of L + M is
[L] + [M] and the standard matrix of cL is c[L] as required. �
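Theorem 1 says that computing L(~x) + M(~x) and computing ([L] + [M])~x must always agree. A numerical sketch using the mappings from Example 1 (Python with NumPy, our own illustration):

```python
import numpy as np

# Standard matrices of L(x1,x2) = (x1, x1+x2, -x2) and M(x1,x2) = (x1, x1, x2)
L = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, -1.0]])
M = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])

x = np.array([5.0, -2.0])
# (L + M)(x) computed two ways agree, illustrating [L + M] = [L] + [M]:
print(np.allclose(L @ x + M @ x, (L + M) @ x))  # True
print((L + M) @ x)  # (2x1, 2x1 + x2, 0) = [10, 8, 0]
```

The agreement holds for every ~x, since matrix addition distributes over matrix-vector multiplication.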
If we let L denote the set of all possible linear mappings L : Rn → Rm, then Theorem
1 shows that L is closed under addition and scalar multiplication. Moreover, we
see that we can use the same proof technique to show that L satisfies the same 10
properties as vectors in Rn and matrices in Mm×n(R).
THEOREM 2 If L,M,N ∈ L and c, d ∈ R, then
V1 L + M ∈ L
V2 (L + M) + N = L + (M + N)
V3 L + M = M + L
V4 There exists a linear mapping O : Rn → Rm, such that L + O = L for all L. In
particular, O is the linear mapping defined by O(~x) = ~0 for all ~x ∈ Rn.
V5 For any linear mapping L : Rn → Rm, there exists a linear mapping
(−L) : Rn → Rm with the property that L + (−L) = O. In particular, (−L) is the
linear mapping defined by (−L)(~x) = −L(~x) for all ~x ∈ Rn.
V6 cL ∈ L
V7 c(dL) = (cd)L
V8 (c + d)L = cL + dL
V9 c(L + M) = cL + cM
V10 1L = L
Proof: Theorem 1 proves properties V1 and V6. We will prove V3 and V9, and
leave the rest as exercises.
To prove V3 we need to show (L + M)(~x) = (M + L)(~x) for all ~x ∈ Rn. We have

(L + M)(~x) = [L + M]~x = ([L] + [M])~x = ([M] + [L])~x = [M + L]~x = (M + L)(~x)

For V9, we need to show that (c(L + M))(~x) = (cL + cM)(~x) for all ~x ∈ Rn. We have

(c(L + M))(~x) = [c(L + M)]~x = c([L + M])~x = c([L] + [M])~x
              = (c[L] + c[M])~x = ([cL] + [cM])~x = [cL + cM]~x = (cL + cM)(~x)
∎
EXERCISE 1 Prove V7 of Theorem 2.
The fact that the definitions of matrix addition and scalar multiplication match with
the definitions of addition and scalar multiplication of linear mappings is truly
remarkable. Of course, the wonderfulness of mathematics does not end there. We
now show, as promised, that matrix-matrix multiplication also corresponds to a stan-
dard operation on functions.
DEFINITION
Composition
Let L : Rn → Rm and M : Rm → Rp be linear mappings. The composition of M and
L is the function M ◦ L : Rn → Rp defined by
(M ◦ L)(~x) = M(L(~x))
REMARK
Observe that it is necessary that the range of L is a subset of the domain of M for
M ◦ L to be defined.
EXAMPLE 2 Let L : R2 → R3 and M : R3 → R be the linear mappings defined by
L(x1, x2) = (x1 + x2, 0, x1 − 2x2) and M(x1, x2, x3) = x1 + x2 + x3. Then M ◦ L is the
mapping defined by
(M ◦ L)(x1, x2) = M(x1 + x2, 0, x1 − 2x2) = (x1 + x2) + 0 + (x1 − 2x2) = 2x1 − x2
EXAMPLE 3 Let L : R2 → R2 and M : R2 → R2 be the linear mappings defined by
L(x1, x2) = (2x1 + x2, x1 + x2) and M(x1, x2) = (x1 − x2, −x1 + 2x2). Then M ◦ L is the
mapping defined by

(M ◦ L)(x1, x2) = M(2x1 + x2, x1 + x2)
                = ((2x1 + x2) − (x1 + x2), −(2x1 + x2) + 2(x1 + x2))
                = (x1, x2)
THEOREM 3 If L : Rn → Rm and M : Rm → Rp are linear mappings, then M ◦ L : Rn → Rp is a
linear mapping and
[M ◦ L] = [M][L]
EXAMPLE 4 Let L : R2 → R2 and M : R2 → R2 be the linear mappings defined by
L(x1, x2) = (2x1 + x2, x1 + x2) and M(x1, x2) = (x1 − x2,−x1 + 2x2). Then
[M ◦ L] = [M][L] = [1 −1; −1 2][2 1; 1 1] = [1 0; 0 1]
In Example 3, M ◦ L is the mapping such that (M ◦ L)(~x) = ~x for all ~x ∈ R2. We see
from Example 4 that the standard matrix of this mapping is the identity matrix. Thus,
we make the following definition.
DEFINITION
Identity Mapping
The linear mapping Id : Rn → Rn defined by Id(~x) = ~x is called the identity
mapping.
Section 3.4 Problems
1. For each of the following linear mappings L
and M, determine L + M and [L + M].
(a) L(x1, x2) = (2x1 − x2, x1),
M(x1, x2) = (3x1 + x2, x1 + x2)
(b) L(x1, x2, x3) = (x1 + x2, x1 + x3),
M(x1, x2, x3) = (−x3, x1 + x2 + x3)
(c) L(x1, x2, x3) = (2x1 − x2, x2 + 3x3),
M(x1, x2, x3) = (−2x1 + x2,−x2 − 3x3)
2. For each of the following linear mappings L
and M, determine M ◦ L, L ◦ M, [M ◦ L], and
[L ◦ M].
(a) L(x1, x2) = (2x1 − x2, x1)
M(x1, x2) = (3x1 + x2, x1 + x2)
(b) L(x1, x2, x3) = (x1 − 2x2, x1 + x2 + x3),
M(x1, x2) = (0, x1 + x2, x1)
(c) L(x1, x2) = (3x1 + x2, 5x1 + 2x2),
M(x1, x2) = (2x1 − x2,−5x1 − 3x2)
3. Let L : Rn → Rm and M : Rm → Rp be linear
mappings.
(a) Prove that if Range(L) = Rm and
Range(M) = Rp, then Range(M ◦ L) = Rp.
(b) Give an example where Range(L) , Rm,
but Range(M ◦ L) = Rp.
(c) Is it possible to give an example where
Range(M) , Rp, but
Range(M ◦ L) = Rp? Explain.
4. Prove properties V4, V5, V8 of Theorem 2.
5. Prove Theorem 3.
6. Let L1, L2 : R3 → R2 be the linear mappings
defined by
L1(x1, x2, x3) = (2x1, x2 − x3)
L2(x1, x2, x3) = (x1 + x2 + x3, x3)
Find all t1 and t2 such that
t1L1 + t2L2 = O
7. Let L1, L2, L3 : R2 → R2 be the linear map-
pings defined by
L1(x1, x2) = (x1 + 3x2, 6x1 + x2)
L2(x1, x2) = (2x1 + 5x2, 8x1 + 3x2)
L3(x1, x2) = (x1 + x2,−2x1 + 3x2)
Find all t1, t2, t3 such that
t1L1 + t2L2 + t3L3 = O
8. Let L denote the set of all linear operators
L : Rn → Rn. Prove that there exists a linear
mapping Id ∈ L such that L ◦ Id = L = Id ◦ L
for every L ∈ L.
Chapter 4
Vector Spaces
4.1 Vector Spaces
We have seen that Rn, subspaces of Rn, the set Mm×n(R), and the set L of all possible
linear mappings L : Rn → Rm satisfy the same ten properties. In fact, as we will see,
there are many other sets with operations of addition and scalar multiplication that
satisfy these same properties. Instead of studying each of these separately, it makes
sense to define and study one abstract concept which contains all of them.
DEFINITION
Vector Space
A set V with an operation of addition, denoted ~x + ~y, and an operation of scalar
multiplication, denoted c~x, is called a vector space over R if for every ~v, ~x, ~y ∈ V
and c, d ∈ R we have:

V1 ~x + ~y ∈ V
V2 (~x + ~y) + ~v = ~x + (~y + ~v)
V3 ~x + ~y = ~y + ~x
V4 There is a vector ~0 ∈ V, called the zero vector, such that ~x + ~0 = ~x for all ~x ∈ V
V5 For every ~x ∈ V there exists (−~x) ∈ V such that ~x + (−~x) = ~0
V6 c~x ∈ V
V7 c(d~x) = (cd)~x
V8 (c + d)~x = c~x + d~x
V9 c(~x + ~y) = c~x + c~y
V10 1~x = ~x
We call the elements of V vectors.
Many textbooks will denote vectors in a vector space in boldface, i.e. x. However,
in this book we will use the usual vector symbol ~x as it is more readable and gives
students a nice way of writing vectors on assignments and tests.
REMARKS
1. We sometimes denote addition by ~x⊕~y and scalar multiplication by c⊙~x to dis-
tinguish these from “normal” operations of addition and scalar multiplication.
See Example 8 below.
2. In some cases, when working with multiple vector spaces, we denote the zero
vector in a vector space V by ~0V.
3. The ten vector space axioms define a “structure” for the set based on the oper-
ations of addition and scalar multiplication. The study of vector spaces is the
study of this structure. Note that there may be additional operations defined on
some vector spaces that are not included in this structure. For example, ma-
trix multiplication in M2×2(R) is an additional operation that is not related to
M2×2(R) being a vector space.
4. The definition of a vector space makes sense if the scalars are allowed to be
taken from any set that has all the same important properties as R. Such sets
are called fields, for example, Q or C. In Chapter 11, we will consider the case
where the scalars are allowed to be complex. Until then, in this book when we
say vector space we mean a vector space over R.
EXAMPLE 1 In Chapter 1, we saw that Rn and any subspace of Rn under standard addition and
scalar multiplication of vectors satisfy all ten vector space axioms and hence are
vector spaces.
EXAMPLE 2 The set Mm×n(R) is a vector space under standard addition and scalar multiplication
of matrices.
EXAMPLE 3 The set L of all linear mappings L : Rn → Rm is a vector space under standard
addition and scalar multiplication of linear mappings.
EXAMPLE 4 Denote the set of all polynomials of degree at most n with real coefficients by Pn(R).
That is,
Pn(R) = {a0 + a1x + · · · + anx^n | a0, a1, . . . , an ∈ R}

Define standard addition and scalar multiplication of polynomials by

(a0 + · · · + anx^n) + (b0 + · · · + bnx^n) = (a0 + b0) + · · · + (an + bn)x^n
c(a0 + · · · + anx^n) = (ca0) + · · · + (can)x^n
for all c ∈ R. It is easy to verify that Pn(R) is a vector space under these operations.
We find that the zero vector in Pn(R) is the polynomial z(x) = 0 for all x, and the
additive inverse of p(x) = a0 + a1x + · · · + anxn is (−p)(x) = −a0 − a1x − · · · − anxn.
EXAMPLE 5 The set P(R) of all polynomials with real coefficients is a vector space under standard
addition and scalar multiplication of polynomials.
EXAMPLE 6 The set C[a, b] of all continuous functions defined on the interval [a, b] is a vector
space under standard addition and scalar multiplication of functions.
EXAMPLE 7 Let V = {~0} and define addition by ~0 + ~0 = ~0, and scalar multiplication by c~0 = ~0 for
all c ∈ R. Show that V is a vector space under these operations.
Solution: To show that V is a vector space, we need to show that it satisfies all ten
axioms.
V1 The only element in V is ~0 and ~0 + ~0 = ~0 ∈ V.
V2 (~0 + ~0) + ~0 = ~0 + ~0 = ~0 + (~0 + ~0).
V3 ~0 + ~0 = ~0 = ~0 + ~0.
V4 ~0 satisfies ~0 + ~0 = ~0 and ~0 ∈ V, so ~0 is the zero vector in V.
V5 The additive inverse of ~0 is ~0 since ~0 + ~0 = ~0 and ~0 ∈ V.
V6 c~0 = ~0 ∈ V.
V7 c(d~0) = c~0 = ~0 = (cd)~0.
V8 (c + d)~0 = ~0 = ~0 + ~0 = c~0 + d~0 .
V9 c(~0 + ~0) = c~0 = ~0 = ~0 + ~0 = c~0 + c~0.
V10 1~0 = ~0.
for all c, d ∈ R. Thus, V is a vector space. It is called the trivial vector space.
EXAMPLE 8 Let S = {x ∈ R | x > 0}. Define addition in S by x ⊕ y = xy, and define scalar
multiplication by c ⊙ x = x^c for all x, y ∈ S and all c ∈ R. Prove that S is a vector
space under these operations.

Solution: We will show that S satisfies all ten vector space axioms. For any x, y, z ∈ S
and c, d ∈ R we have

V1 x ⊕ y = xy > 0 since x > 0 and y > 0. Hence, x ⊕ y ∈ S.
V2 (x ⊕ y) ⊕ z = (xy) ⊕ z = (xy)z = x(yz) = x ⊕ (yz) = x ⊕ (y ⊕ z).
V3 x ⊕ y = xy = yx = y ⊕ x.
V4 The zero vector is 1 since 1 ∈ S and x ⊕ 1 = x · 1 = x for all x ∈ S.
V5 If x ∈ S, then (1/x) ⊕ x = (1/x)x = 1. Thus, 1/x is the additive inverse of x.
Also, 1/x > 0 since x > 0 and hence 1/x ∈ S.
V6 c ⊙ x = x^c > 0 since x > 0. So, c ⊙ x ∈ S.
V7 c ⊙ (d ⊙ x) = c ⊙ x^d = (x^d)^c = x^(cd) = (cd) ⊙ x.
V8 (c + d) ⊙ x = x^(c+d) = x^c x^d = x^c ⊕ x^d = (c ⊙ x) ⊕ (d ⊙ x).
V9 c ⊙ (x ⊕ y) = c ⊙ (xy) = (xy)^c = x^c y^c = x^c ⊕ y^c = (c ⊙ x) ⊕ (c ⊙ y).
V10 1 ⊙ x = x^1 = x.

Therefore, S is a vector space.
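The unusual operations of Example 8 can be tested on concrete numbers. This sketch (plain Python; the helper names `vadd` and `smul` are ours) spot-checks a few of the axioms:

```python
import math

# The "exotic" vector space S = (0, inf) with x (+) y = xy and c (.) x = x**c.
def vadd(x, y):
    return x * y

def smul(c, x):
    return x ** c

x, c, d = 2.0, 3.0, -1.5

# V8: (c + d) (.) x  ==  (c (.) x) (+) (d (.) x)
print(math.isclose(smul(c + d, x), vadd(smul(c, x), smul(d, x))))  # True
# V4: the "zero vector" is 1
print(vadd(x, 1.0) == x)  # True
# V5: the additive inverse of x is 1/x
print(vadd(x, 1.0 / x) == 1.0)  # True
```

Of course, a spot check on one x is not a proof; the algebra in the example is what establishes the axioms for all x, y, c, d.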
EXAMPLE 9 Let D = { [a1 0; 0 a2] | a1, a2 ∈ R }. Show that D is a vector space under standard
addition and scalar multiplication of matrices.

Solution: We show that D satisfies all ten vector space axioms. For any
A = [a1 0; 0 a2], B = [b1 0; 0 b2], and C = [c1 0; 0 c2] in D and s, t ∈ R we have

V1 A + B = [a1 + b1 0; 0 a2 + b2] ∈ D.
V2 (A + B) + C = A + (B + C) by Theorem 3.1.1.
V3 A + B = B + A by Theorem 3.1.1.
V4 Clearly O2,2 ∈ D, and O2,2 satisfies A + O2,2 = A for all A ∈ D by
Theorem 3.1.1. Hence, O2,2 is the zero vector in D.
V5 By Theorem 3.1.1, the additive inverse of A is [−a1 0; 0 −a2] ∈ D for all A ∈ D.
V6 sA = [sa1 0; 0 sa2] ∈ D.
V7 s(t(A)) = (st)A by Theorem 3.1.1.
V8 (s + t)A = sA + tA by Theorem 3.1.1.
V9 s(A + B) = sA + sB by Theorem 3.1.1.
V10 1A = A by Theorem 3.1.1.

Hence, D is a vector space.
Example 9 should remind you a lot of the Subspace Test we did back in Chapter 1.
In particular, we automatically had that all the axioms V2, V3, V5, V7, V8, V9, and
V10 held since we were using the addition and scalar multiplication operations of a
vector space. We will look more at this below.
EXAMPLE 10 Let V and W be vector spaces. We define the Cartesian product of V and W by

V × W = {(~v, ~w) | ~v ∈ V, ~w ∈ W}

If we define addition and scalar multiplication in V × W by

(~v1, ~w1) ⊕ (~v2, ~w2) = (~v1 + ~v2, ~w1 + ~w2)
c ⊙ (~v1, ~w1) = (c~v1, c~w1)

for all (~v1, ~w1), (~v2, ~w2) ∈ V × W and c ∈ R, then V × W is a vector space.
EXERCISE 1 Prove that the Cartesian product of two vector spaces with addition and scalar multi-
plication defined as in Example 10 is a vector space.
EXAMPLE 11 Show that the set S = {(x, y) | x, y ∈ R} with addition defined by (x1, y1) ⊕ (x2, y2) =
(x1 + y2, x2 + y1) and scalar multiplication defined by c ⊙ (x1, y1) = (cx1, cy1) is not a
vector space.

Solution: To show that S is not a vector space, we only need to find one counterexample
to one of the vector space axioms. Observe that (1, 2) ∈ S and that

(1 + 1) ⊙ (1, 2) = 2 ⊙ (1, 2) = (2, 4)
(1 ⊙ (1, 2)) ⊕ (1 ⊙ (1, 2)) = (1, 2) ⊕ (1, 2) = (1 + 2, 2 + 1) = (3, 3)

Hence, V8 is not satisfied. Thus, S is not a vector space.
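The counterexample in Example 11 can be checked mechanically. A sketch in plain Python (the helper names `vadd` and `smul` are ours):

```python
# The failed axiom V8 from Example 11, checked directly.
def vadd(u, v):
    x1, y1 = u
    x2, y2 = v
    return (x1 + y2, x2 + y1)   # the non-standard addition

def smul(c, u):
    return (c * u[0], c * u[1])

u = (1, 2)
lhs = smul(1 + 1, u)                 # (2, 4)
rhs = vadd(smul(1, u), smul(1, u))   # (3, 3)
print(lhs, rhs, lhs == rhs)          # (2, 4) (3, 3) False
```

One failing instance of one axiom is all it takes: S is not a vector space under these operations.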
EXAMPLE 12 Show that the set Z2 = {(x1, x2) | x1, x2 ∈ Z} is not a vector space under standard addition
and scalar multiplication of vectors.

Solution: Observe that (1, 2) ∈ Z2, but √2 (1, 2) = (√2, 2√2) ∉ Z2. Hence, Z2 does not
satisfy axiom V6, so it is not a vector space.
We have now seen that there are numerous different vector spaces. The advantage of
having all of these belonging to the one abstract concept of a vector space is that we
can study all of them at the same time. That is, if we prove a result for a general vector
space, then it immediately applies for all specific vector spaces. We now demonstrate
this.
THEOREM 1 If V is a vector space, then
(1) 0~x = ~0 for all ~x ∈ V,
(2) (−~x) = (−1)~x for all ~x ∈ V.
Proof: We prove (1) and leave (2) as an exercise. For all ~x ∈ V we have
0~x = 0~x + ~0 by V4
= 0~x + [~x + (−~x)] by V5
= 0~x + [1~x + (−~x)] by V10
= [0~x + 1~x] + (−~x) by V2
= (0 + 1)~x + (−~x) by V8
= 1~x + (−~x) operation of numbers in R
= ~x + (−~x) by V10
= ~0 by V5
�
REMARK
Observe that Theorem 1 implies that the zero vector in a vector space is unique, as is
the additive inverse of any vector in the vector space.
We can now use this result to find the zero vector in any vector space V by simply
finding 0~x for any ~x ∈ V. Similarly, we can find the additive inverse of any ~x ∈ V by
calculating (−1)~x.
EXAMPLE 13 Let S be defined as in Example 8. Observe that we indeed have ~0 = 0 ⊙ x = x^0 = 1, and (−x) = (−1) ⊙ x = x^(−1) = 1/x for any x ∈ S.
Subspaces
Observe that the set D in Example 9 is a vector space that is contained in the larger
vector space M2×2(R). This is exactly the same as we saw in Chapter 1 with subspaces
of Rn.
DEFINITION
Subspace
Let V be a vector space. If S is a subset of V and S is a vector space under the same
operations as V, then S is called a subspace of V.
As in Example 9, we see that we do not need to test all ten vector space axioms to
show that a set S is a subspace of a vector space V. In particular, we get the following
theorem which matches what we saw in Chapter 1.
THEOREM 2 (Subspace Test)
If S is a non-empty subset of V such that ~x + ~y ∈ S and c~x ∈ S for all ~x, ~y ∈ S and
c ∈ R under the operations of V, then S is a subspace of V.
REMARKS
1. It is very important to always remember to check that S is a subset of V and is
non-empty as part of the Subspace Test.
2. As before, we will say that S is closed under addition if ~x + ~y ∈ S for all ~x, ~y ∈ S, and we will say that S is closed under scalar multiplication if c~x ∈ S for all ~x ∈ S and c ∈ R.
EXAMPLE 14 Determine if S1 = {xp(x) | p(x) ∈ P2(R)} is a subspace of P4(R).
Solution: Recall that P2(R) is the set of all polynomials of degree at most 2 and P4(R)
is the set of polynomials of degree at most 4. Consequently, every vector q ∈ S1 has
the form
q(x) = x(a0 + a1x + a2x2) = a0x + a1x2 + a2x3
which is a polynomial of degree at most 4 (actually, of at most 3). Hence, S1 is a
subset of P4(R).
0 + 0x + 0x2 ∈ P2(R), so 0 = x(0 + 0x + 0x2) ∈ S1. Thus, S1 is non-empty.
If q1, q2 ∈ S1, then there exist p1, p2 ∈ P2(R) such that q1(x) = xp1(x) and q2(x) =
xp2(x). We have
q1(x) + q2(x) = xp1(x) + xp2(x) = x(p1(x) + p2(x))
and p1 + p2 ∈ P2(R) as P2(R) is closed under addition. So, q1 + q2 ∈ S1.
Similarly, for any c ∈ R we get
cq1(x) = c(xp1(x)) = x(cp1(x)) ∈ S1
since cp1 ∈ P2(R) as P2(R) is closed under scalar multiplication.
Therefore, S1 is a subspace of P4(R) by the Subspace Test.
EXAMPLE 15 Determine if S2 = { [a1 a2; 0 a3] | a1a2a3 = 0 } is a subspace of M2×2(R).
Solution: Observe that A = [1 1; 0 0] and B = [0 0; 0 1] are both in S2, but
A + B = [1 1; 0 1] ∉ S2
So, S2 is not closed under addition and hence is not a subspace.
EXAMPLE 16 Determine if S3 = {p(x) ∈ P2(R) | p(2) = 0} is a subspace of P2(R).
Solution: By definition S3 is a subset of P2(R).
The zero vector in P2(R) is z(x) = 0. Observe z ∈ S3 since it satisfies z(2) = 0.
Therefore, S3 is non-empty.
If p, q ∈ S3, then p(2) = 0 and q(2) = 0, so (p + q)(2) = p(2) + q(2) = 0 + 0 = 0.
Hence, (p + q) ∈ S3. Thus, S3 is closed under addition.
Similarly, (cp)(2) = cp(2) = c(0) = 0 for all c ∈ R, so (cp) ∈ S3. Thus, it is
also closed under scalar multiplication. Therefore S3 is a subspace of P2(R) by the
Subspace Test.
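Examples 14-16 all apply the Subspace Test in the same way. The closure conditions of Example 16 can also be sanity-checked numerically. The following is an illustration only, not a proof; the helper names `sample_s3` and `p_at_2`, the sampling ranges, and the tolerance are our own choices and not part of the text.

```python
import random

# Randomized sanity check of the Subspace Test for S3 from Example 16.
def sample_s3():
    # A polynomial a + bx + cx^2 lies in S3 when p(2) = a + 2b + 4c = 0,
    # so pick b and c freely and solve for a.
    b, c = random.uniform(-5, 5), random.uniform(-5, 5)
    return (-2 * b - 4 * c, b, c)

def p_at_2(p):
    a, b, c = p
    return a + 2 * b + 4 * c

random.seed(0)
for _ in range(100):
    p, q = sample_s3(), sample_s3()
    t = random.uniform(-5, 5)
    total = tuple(pi + qi for pi, qi in zip(p, q))   # p + q
    scaled = tuple(t * pi for pi in p)               # t * p
    assert abs(p_at_2(total)) < 1e-9
    assert abs(p_at_2(scaled)) < 1e-9
print("closure checks passed")
```

A passing randomized check never proves closure, but a single failing sample would disprove it, which makes this a cheap way to catch errors before writing the proof.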
EXAMPLE 17 Determine if S4 = { [1 2; 3 4] } is a subspace of M2×2(R).
Solution: Observe that 0 [1 2; 3 4] = [0 0; 0 0] ∉ S4. Therefore, S4 is not closed under scalar multiplication and hence is not a subspace.
REMARK
As with subspaces in Chapter 1, normally the easiest way to verify the set S is non-empty is to check if the zero vector of V, ~0V, is in S. Moreover, if ~0V ∉ S, then, as we saw in the example above, S is not closed under scalar multiplication and hence is not a subspace.
EXAMPLE 18 A matrix A is called skew-symmetric if AT = −A. Determine if the set of all n × n
skew-symmetric matrices with real entries is a vector space under standard addition
and scalar multiplication of matrices.
Solution: To prove the set S of skew-symmetric matrices is a vector space, we will
prove that it is a subspace of Mn×n(R).
By definition S is a subset of Mn×n(R). Also, the n×n zero matrix On,n clearly satisfies
AT = −A, thus On,n ∈ S, and so S is non-empty.
Let A and B be any two skew-symmetric matrices. Then,
(A + B)T = AT + BT = (−A) + (−B) = −A − B = −(A + B)
So, A + B is skew-symmetric and hence S is closed under addition.
For any c ∈ R we get (cA)T = c(AT ) = c(−A) = −(cA) so (cA) is skew-symmetric
and therefore S is closed under scalar multiplication.
Thus, S is a subspace of Mn×n(R) by the Subspace Test. Therefore, S is a vector space
by definition of a subspace.
EXERCISE 2 Prove that the set S = { [a1 a2; a3 a4] | a1 + a2 = a3 − a4 } is a subspace of M2×2(R).
Axioms V1 and V6 imply that every vector space is closed under linear combinations.
Therefore, it makes sense to look at the concepts of spanning and linear independence
as we did in Chapter 1.
Spanning
DEFINITION
Span
Let B = {~v1, . . . ,~vk} be a set of vectors in a vector space V. We define the span of B by
SpanB = {c1~v1 + c2~v2 + · · · + ck~vk | c1, . . . , ck ∈ R}
We say that SpanB is spanned by B and that B is a spanning set for SpanB.
EXAMPLE 19 Let B = {1 + 2x + 3x2, 3 + 2x + x2, 5 + 6x + 7x2}. Determine which of the following
vectors is in SpanB.
(a) p(x) = 1 + x + 2x2.
Solution: To determine if p(x) ∈ SpanB, we need to find whether there exist coeffi-
cients c1, c2, and c3 such that
1 + x + 2x2 = c1(1 + 2x + 3x2) + c2(3 + 2x + x2) + c3(5 + 6x + 7x2)
= (c1 + 3c2 + 5c3) + (2c1 + 2c2 + 6c3)x + (3c1 + c2 + 7c3)x2
Comparing coefficients of powers of x on both sides gives us the system of linear
equations
c1 + 3c2 + 5c3 = 1
2c1 + 2c2 + 6c3 = 1
3c1 + c2 + 7c3 = 2
Row reducing the augmented matrix of the system we get
[ 1 3 5 | 1 ]   [ 1  3  5 |  1 ]
[ 2 2 6 | 1 ] ∼ [ 0 −4 −4 | −1 ]
[ 3 1 7 | 2 ]   [ 0  0  0 |  1 ]
Hence, the system is inconsistent, so p(x) ∉ SpanB.
(b) q(x) = 2 + x.
Solution: To determine if q(x) ∈ SpanB, we need to find whether there exist coeffi-
cients c1, c2, and c3 such that
2 + x = c1(1 + 2x + 3x2) + c2(3 + 2x + x2) + c3(5 + 6x + 7x2)
= (c1 + 3c2 + 5c3) + (2c1 + 2c2 + 6c3)x + (3c1 + c2 + 7c3)x2
Comparing coefficients of powers of x on both sides gives us the system of linear
equations
c1 + 3c2 + 5c3 = 2
2c1 + 2c2 + 6c3 = 1
3c1 + c2 + 7c3 = 0
Row reducing the augmented matrix of the system we get
[ 1 3 5 | 2 ]   [ 1 0 2 | −1/4 ]
[ 2 2 6 | 1 ] ∼ [ 0 1 1 |  3/4 ]
[ 3 1 7 | 0 ]   [ 0 0 0 |   0  ]
Hence, the system is consistent, so q(x) ∈ SpanB. In particular, if we solve the system, we get
~c = [c1; c2; c3] = [−1/4; 3/4; 0] + t[−2; −1; 1]
for any t ∈ R. Therefore, there are infinitely many different ways to write q(x) as a linear combination of the vectors in B.
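Both checks in Example 19 are instances of one computation: a polynomial lies in SpanB exactly when the linear system for the coefficients is consistent, i.e. when the coefficient matrix and the augmented matrix have the same rank. A NumPy sketch of this criterion (the function name `in_span` is our own, not from the text):

```python
import numpy as np

def in_span(vectors, target):
    """Return True when `target` is a linear combination of `vectors`.

    Polynomials are represented by coefficient lists, e.g.
    1 + 2x + 3x^2 -> [1, 2, 3].  The system A c = b is consistent
    exactly when rank(A) equals the rank of the augmented matrix [A | b].
    """
    A = np.column_stack(vectors)
    augmented = np.column_stack(vectors + [target])
    return np.linalg.matrix_rank(A) == np.linalg.matrix_rank(augmented)

B = [[1, 2, 3], [3, 2, 1], [5, 6, 7]]  # the three polynomials in B

print(in_span(B, [1, 1, 2]))  # p(x) = 1 + x + 2x^2: False (inconsistent)
print(in_span(B, [2, 1, 0]))  # q(x) = 2 + x: True (consistent)
```

Floating-point rank has its usual caveats for nearly singular matrices, but for small integer examples like these it reproduces the row-reduction answer.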
EXAMPLE 20 Prove that the subspace S = { [a1 a2; 0 a3] | a1, a2, a3 ∈ R } of M2×2(R) is spanned by
B = { [1 1; 0 1], [1 2; 0 −1], [1 3; 0 2] }
Solution: We need to show that every vector of the form [a1 a2; 0 a3] can be written as a linear combination of the vectors in B. That is, we need to show that the equation
c1 [1 1; 0 1] + c2 [1 2; 0 −1] + c3 [1 3; 0 2] = [a1 a2; 0 a3]
is consistent for all a1, a2, a3 ∈ R. Using operations on matrices on the left-hand side, we get
[ c1 + c2 + c3   c1 + 2c2 + 3c3 ; 0   c1 − c2 + 2c3 ] = [a1 a2; 0 a3]
Comparing entries gives the system of linear equations
c1 + c2 + c3 = a1
c1 + 2c2 + 3c3 = a2
c1 − c2 + 2c3 = a3
We row reduce the coefficient matrix to get
[ 1  1 1 ]   [ 1 0 0 ]
[ 1  2 3 ] ∼ [ 0 1 0 ]
[ 1 −1 2 ]   [ 0 0 1 ]
By the System-Rank Theorem, the system is consistent for all a1, a2, a3 ∈ R.
REMARKS
1. As we saw in Chapter 1, when proving that a set B spans a vector space V,
we technically need to prove that SpanB ⊆ V and V ⊆ SpanB. However,
as long as B ⊂ V, we have that SpanB ⊆ V since V is closed under linear
combinations. Thus, it is standard practice to only show that V ⊆ SpanB.
2. To prove that the system in Example 20 was consistent for all a1, a2, a3 ∈ R, we only needed to determine the rank of the coefficient matrix. However, if we had row reduced the corresponding augmented matrix, we would have found equations for c1, c2, and c3 in terms of a1, a2, and a3.
EXERCISE 3 Prove that B = {1 + x + x2, 1 + 2x − x2, 1 + 3x + 2x2} spans P2(R).
EXERCISE 4 Prove that a set of three vectors in M2×2(R) cannot span M2×2(R).
At this time you should notice that all of this is exactly the same as what we did with
spanning in Chapter 1 except that we no longer necessarily have a geometric inter-
pretation of the spanned set. The fact that it is the same should not be surprising since
a general vector space is just a generalization of Rn and our definition of spanning is
exactly the same as before. So, we would expect to have exactly the same theorems as we did for Rn. Indeed, the following theorems and their proofs are exactly the same.
THEOREM 3 If B = {~v1, . . . ,~vk} is a set of vectors in a vector space V, then SpanB is a subspace
of V.
THEOREM 4 Let V be a vector space and ~v1, . . . ,~vk ∈ V. Then, ~vi ∈ Span{~v1, . . . ,~vi−1,~vi+1, . . . ,~vk} if and only if
Span{~v1, . . . ,~vk} = Span{~v1, . . . ,~vi−1,~vi+1, . . . ,~vk}
Linear Independence
As was the case with spanning, we now see that our theory of linear independence is
also exactly the same as in Chapter 1.
DEFINITION
LinearlyDependent
LinearlyIndependent
A set of vectors {~v1, . . . ,~vk} in a vector space V is said to be linearly dependent if
there exist coefficients c1, . . . , ck not all zero such that
~0 = c1~v1 + · · · + ck~vk
A set of vectors {~v1, . . . ,~vk} is said to be linearly independent if the only solution to
~0 = c1~v1 + · · · + ck~vk
is the trivial solution c1 = · · · = ck = 0.
EXAMPLE 21 Determine if the set { [1 2; 1 −2], [1 −2; 2 2], [2 0; 3 0], [1 1; 1 1] } is linearly independent or linearly dependent in M2×2(R).
Solution: Consider
[0 0; 0 0] = c1 [1 2; 1 −2] + c2 [1 −2; 2 2] + c3 [2 0; 3 0] + c4 [1 1; 1 1]
= [ c1 + c2 + 2c3 + c4   2c1 − 2c2 + c4 ; c1 + 2c2 + 3c3 + c4   −2c1 + 2c2 + c4 ]
Comparing entries we get the homogeneous system
c1 + c2 + 2c3 + c4 = 0
2c1 − 2c2 + c4 = 0
c1 + 2c2 + 3c3 + c4 = 0
−2c1 + 2c2 + c4 = 0
We row reduce the corresponding coefficient matrix to get
[  1  1 2 1 ]   [ 1 0 1 0 ]
[  2 −2 0 1 ] ∼ [ 0 1 1 0 ]
[  1  2 3 1 ]   [ 0 0 0 1 ]
[ −2  2 0 1 ]   [ 0 0 0 0 ]
Therefore, by the System-Rank Theorem, the system has infinitely many solutions and hence the set is linearly dependent. In particular, we can find that
[0 0; 0 0] = (−t) [1 2; 1 −2] + (−t) [1 −2; 2 2] + t [2 0; 3 0] + 0 [1 1; 1 1]
for all t ∈ R.
EXAMPLE 22 Determine if the set {1 + x + 2x2, x − x2,−2 + x2} is linearly independent or linearly
dependent in P2(R).
Solution: Consider
0 = c1(1 + x + 2x2) + c2(x − x2) + c3(−2 + x2)
0 + 0x + 0x2 = (c1 − 2c3) + (c1 + c2)x + (2c1 − c2 + c3)x2
Comparing coefficients of like powers of x we get the homogeneous system
c1 − 2c3 = 0
c1 + c2 = 0
2c1 − c2 + c3 = 0
Row reducing the corresponding coefficient matrix gives
[ 1  0 −2 ]   [ 1 0 0 ]
[ 1  1  0 ] ∼ [ 0 1 0 ]
[ 2 −1  1 ]   [ 0 0 1 ]
Hence, the only solution to the system is c1 = c2 = c3 = 0. Thus, the set is linearly independent.
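Examples 21 and 22 both reduce to the same rank computation: vectors written in coordinates are linearly independent exactly when the matrix having them as columns has rank equal to the number of vectors. A hedged NumPy sketch (the name `is_linearly_independent` and the flattening convention are our own):

```python
import numpy as np

def is_linearly_independent(vectors):
    """Coordinate vectors are linearly independent exactly when the
    matrix with them as columns has rank equal to the number of
    vectors, i.e. A c = 0 has only the trivial solution."""
    A = np.column_stack(vectors)
    return np.linalg.matrix_rank(A) == len(vectors)

# Example 22: {1 + x + 2x^2, x - x^2, -2 + x^2} as coefficient vectors
polys = [[1, 1, 2], [0, 1, -1], [-2, 0, 1]]
print(is_linearly_independent(polys))  # True

# Example 21: the four 2x2 matrices, each flattened row-by-row into R^4
mats = [[1, 2, 1, -2], [1, -2, 2, 2], [2, 0, 3, 0], [1, 1, 1, 1]]
print(is_linearly_independent(mats))   # False
```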
As expected, we get the following theorems which match what we did in Chapter 1.
THEOREM 5 A set of vectors {~v1, . . . ,~vk} in a vector space V is linearly dependent if and only if
~vi ∈ Span{~v1, . . . ,~vi−1,~vi+1, . . . ,~vk} for some i, 1 ≤ i ≤ k.
THEOREM 6 Any set of vectors {~v1, . . . ,~vk} in a vector space V which contains the zero vector is
linearly dependent.
EXERCISE 5 Prove that a set of four vectors in P2(R) must be linearly dependent.
Section 4.1 Problems
1. Determine, with proof, which of the following
are subspaces of the given vector space.
(a) S1 = {x2 p(x) | p(x) ∈ P2(R)} of P4(R)
(b) S2 = { [a1 a2; 0 a3] ∈ M2×2(R) | a1 + a3 = 0 } of M2×2(R)
(c) S3 = {p(x) ∈ P2(R) | p(1) = 0} of P2(R)
(d) S4 = { [0 0; 0 0] } of M2×2(R)
(e) S5 = {a + bx + cx2 ∈ P2(R) | a = c} of P2(R)
(f) S6 = {A ∈ M2×2(R) | AT = A} of M2×2(R)
(g) S7 = {1, x, x2} of P2(R)
(h) S8 = Span{1 + x + x2, x + x2 + x3, 1 + x2 + x3} of P3(R)
(i) S9 = { [a1 a2; a3 a4] ∈ M2×2(R) | a1 + a3 = 0, a2 = 2a4 } of M2×2(R)
2. In each case, determine if ~v is in SpanB.
(a) ~v = 1 + x2, B = {1 + x + x2, −1 + 2x + 2x2, 5 + x + x2}
(b) ~v = [2 5; 4 4], B = { [1 0; −1 1], [0 1; 2 2], [3 1; −3 1] }
(c) ~v = [0; −3; 3; −6], B = { [1; 2; 0; −1], [1; 3; 1; −1], [2; 4; 3; −5] }
(d) ~v = −3x + 3x2 − 6x3, B = {1 + 2x − x3, 1 + 3x + x2 − x3, 2 + 4x + 3x2 − 5x3}
(e) ~v = [0 −3; 3 −6], B = { [1 2; 0 −1], [1 3; 1 −1], [2 4; 3 −5] }
3. In each case, determine if B is linearly independent.
(a) B = {3 − 2x + x2, 2 − 5x + 2x2, 6 + 7x − 2x2}
(b) B = { [1 0; 0 1], [0 0; 0 0], [3 −4; −4 3] }
(c) B = { [1 1; −2 1], [2 2; 1 0], [1 −2; 3 4] }
(d) B = {1 + x − 2x2 + x3, 2 + 2x + x2, 1 − 2x + 3x2 + 4x3}
(e) B = { [1 −1; 2 1], [3 1; 2 1], [−1 5; 0 1], [3 1; 2 2] }
4. Let L1, L2, L3 : R2 → R2 be the linear map-
pings defined by
L1(x1, x2) = (x1 + 3x2, 6x1 + x2)
L2(x1, x2) = (2x1 + 5x2, 8x1 + 3x2)
L3(x1, x2) = (x1 + x2,−2x1 + 3x2)
Determine if the set {L1, L2, L3} is linearly in-
dependent or linearly dependent in L.
5. Let V be a vector space and ~v1, . . . ,~vk ∈ V. De-
termine whether each statement is true or false.
Justify your answer.
(a) If B = {~v1, . . . ,~vk} is linearly independent,
then any subset of B is linearly indepen-
dent.
(b) If c~v1 = c~v2, then ~v1 = ~v2.
(c) c~0V = ~0V for all c ∈ R.
6. Let V be a vector space. Prove that {~0V} is a
vector space under the same operations as V.
7. Let S be the set of all n × n matrices with real
entries. Determine if S is a vector space where
addition and scalar multiplication are defined
by A ⊕ B = AB and c · A = cA for all A, B ∈ S and c ∈ R.
8. A sequence is an infinite list of real numbers.
For example 1, 2, 3, 4, 5, 6, . . . is a sequence, as
is 1,−1, 2,−2, 4,−4, . . .. (The terms may or
may not follow a discernible pattern). We define addition and scalar multiplication of sequences in a component-wise fashion, so that
{1, 2, 3, 4, . . .} + {1, −1, 2, 4, . . .} = {2, 1, 5, 8, . . .}
c{1, 2, 3, . . .} = {c, 2c, 3c, . . .}
Verify that the set of sequences is a vector
space.
9. Let V be a vector space. Prove that
(−~x) = (−1)~x for all ~x ∈ V.
10. Prove Theorem 4.
11. Prove Theorem 5.
4.2 Bases and Dimension
We have seen the importance and usefulness of bases in Rn. A basis for a subspace S
of Rn provides an explicit formula for every vector in S. Moreover, we can geometri-
cally view this explicit formula as a coordinate system. That is, a basis of a subspace
of Rn forms a set of coordinate axes for that subspace. Thus, we desire to extend the
concept of bases to general vector spaces.
Bases
Of course, our definition of a basis mimics our definition from Chapter 1.
DEFINITION
Basis
Let V be a vector space. A set B is called a basis for V if B is a linearly independent
spanning set for V.
We define a basis for {~0V} to be the empty set.
EXAMPLE 1 The set {1, x, x2, . . . , xn} is a basis for Pn(R) since it is clearly linearly independent
and spans Pn(R). It is called the standard basis for Pn(R).
EXAMPLE 2 The set { [1 0; 0 0], [0 1; 0 0], [0 0; 1 0], [0 0; 0 1] } is the standard basis for M2×2(R).
EXAMPLE 3 Prove that B = {1 + 2x + x2, 2 + 9x, 3 + 3x + 4x2} is a basis for P2(R).
Solution: First, we need to show that for any polynomial p(x) = c+ bx+ ax2 we can
find c1, c2, c3 such that
c + bx + ax2 = c1(1 + 2x + x2) + c2(2 + 9x) + c3(3 + 3x + 4x2) (4.1)
= (c1 + 2c2 + 3c3) + (2c1 + 9c2 + 3c3)x + (c1 + 4c3)x2
Row reducing the coefficient matrix of the corresponding system gives
[ 1 2 3 ]   [ 1 0 0 ]
[ 2 9 3 ] ∼ [ 0 1 0 ]
[ 1 0 4 ]   [ 0 0 1 ]
By the System-Rank Theorem, SpanB = P2(R) since the system is consistent for every right-hand side [c; b; a]. Moreover, if we let a = b = c = 0 in equation (4.1), we see that we get the unique solution c1 = c2 = c3 = 0, and hence B is also linearly independent. Therefore, it is a basis for P2(R).
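For a square system like the one in Example 3, both basis conditions can be read off from a single determinant: the coefficient matrix is invertible exactly when the system is consistent for every right-hand side (spanning) and the homogeneous system has only the trivial solution (independence). A NumPy sketch of this check, with a tolerance of our own choosing:

```python
import numpy as np

# Columns are the coefficient vectors of 1 + 2x + x^2, 2 + 9x, and
# 3 + 3x + 4x^2 with respect to the standard basis {1, x, x^2}.
A = np.array([[1, 2, 3],
              [2, 9, 3],
              [1, 0, 4]], dtype=float)

# A square coefficient matrix is invertible (nonzero determinant)
# exactly when B both spans P2(R) and is linearly independent.
print(abs(np.linalg.det(A)) > 1e-12)  # True, so B is a basis
```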
EXERCISE 1 Prove that B = {3 − 2x + x2, 2 − 5x + 2x2, 1 − x + x2} is a basis for P2(R).
EXAMPLE 4 Consider the subspace U = { [a b; 0 c] | a, b, c ∈ R } of M2×2(R). Determine whether
B = { [1 1; 0 1], [1 2; 0 −1], [1 3; 0 2] }
is a basis for U.
Solution: We first check to see if the set is linearly independent. Consider
[0 0; 0 0] = c1 [1 1; 0 1] + c2 [1 2; 0 −1] + c3 [1 3; 0 2] = [ c1 + c2 + c3   c1 + 2c2 + 3c3 ; 0   c1 − c2 + 2c3 ]
This gives us the homogeneous system of linear equations
c1 + c2 + c3 = 0
c1 + 2c2 + 3c3 = 0
c1 − c2 + 2c3 = 0
Row reducing the coefficient matrix of the homogeneous system we get
[ 1  1 1 ]   [ 1 0 0 ]
[ 1  2 3 ] ∼ [ 0 1 0 ]
[ 1 −1 2 ]   [ 0 0 1 ]
Hence, the only solution is c1 = c2 = c3 = 0, so the set is linearly independent.
Next, we need to determine if B spans U. Consider
[a b; 0 c] = c1 [1 1; 0 1] + c2 [1 2; 0 −1] + c3 [1 3; 0 2]
This gives a system of linear equations with the same coefficient matrix as above. So, using the same row operations we did above we see that the matrix has a leading one in each row of its RREF, hence the system is consistent by the System-Rank Theorem for all matrices [a b; 0 c]. Consequently, the set spans U. Therefore, B is a basis for U.
EXERCISE 2 Prove that B = { [1 1; 0 0], [1 0; 0 1], [0 0; 1 1] } is not a basis for M2×2(R). Find a subspace S of M2×2(R) such that B is a basis for S.
Finding A Basis
In linear algebra and its applications we often need to find a basis for a vector space.
We now give a general procedure for finding a basis for a vector space which can be
spanned by a finite number of vectors.
ALGORITHM
Let V be a vector space that can be spanned by a finite number of vectors. To find a
basis for V:
1. Find the general form of a vector ~x ∈ V.
2. Write the general form of ~x as a linear combination of vectors ~v1, . . . ,~vk in V.
3. Check if B = {~v1, . . . ,~vk} is linearly independent. If it is, then stop as B is a
basis.
4. If B is linearly dependent, pick some vector ~vi ∈ B that can be written as a
linear combination of the other vectors in B and remove it from the set using
Theorem 4.1.4. Repeat this until the set is linearly independent.
EXAMPLE 5 Find a basis for the subspace S of M2×2(R) of skew-symmetric matrices.
Solution: We first want to find the general form of a vector A ∈ S. Recall that if A = [a b; c d] is skew-symmetric, then AT = −A. So,
[a c; b d] = [−a −b; −c −d]
For this to be true, we must have a = −a, c = −b, b = −c, and d = −d. Thus, a = 0 = d and b = −c. Hence, every vector in S can be written in the form
A = [a b; c d] = [0 b; −b 0] = b [0 1; −1 0]
Thus, S = Span{ [0 1; −1 0] }, so B = { [0 1; −1 0] } is a spanning set for S. Moreover, B contains one non-zero vector, so it is clearly linearly independent. Therefore, B is a basis for S.
EXAMPLE 6 Find a basis for the subspace V = { [a − b; a − c; c − b] | a, b, c ∈ R } of R3.
Solution: Any vector ~v ∈ V has the form
~v = [a − b; a − c; c − b] = a[1; 1; 0] + b[−1; 0; −1] + c[0; −1; 1]
Thus, V = Span{ [1; 1; 0], [−1; 0; −1], [0; −1; 1] }. Consider
[0; 0; 0] = c1[1; 1; 0] + c2[−1; 0; −1] + c3[0; −1; 1] = [c1 − c2; c1 − c3; −c2 + c3]   (4.2)
Row reducing the coefficient matrix of the corresponding system of linear equations gives
[ 1 −1  0 ]   [ 1 0 −1 ]
[ 1  0 −1 ] ∼ [ 0 1 −1 ]
[ 0 −1  1 ]   [ 0 0  0 ]
Hence, we see that equation (4.2) has infinitely many solutions. Therefore, { [1; 1; 0], [−1; 0; −1], [0; −1; 1] } is linearly dependent. A vector equation of the solution set of the system is
[c1; c2; c3] = t[1; 1; 1], t ∈ R
Taking t = 1 gives c1 = c2 = c3 = 1. Substituting these into equation (4.2) gives
[0; −1; 1] = −[1; 1; 0] − [−1; 0; −1]
So, we can apply Theorem 4.1.4, to get that the set B = { [1; 1; 0], [−1; 0; −1] } also spans V. Now consider
[0; 0; 0] = c1[1; 1; 0] + c2[−1; 0; −1] = [c1 − c2; c1; −c2]
Using exactly the same row operations as above gives
[ 1 −1 ]   [ 1 0 ]
[ 1  0 ] ∼ [ 0 1 ]
[ 0 −1 ]   [ 0 0 ]
Hence, the system has a unique solution, so B is linearly independent. Thus, B is a basis for V.
In general, it would be very inefficient to remove one vector at a time to reduce
a spanning set to a basis. Notice that when we removed a vector from B in the
example above, all we did was remove the corresponding column from the matrix
representation of the corresponding system of equations. So, we often can use our
knowledge and understanding of solving systems of equations to remove all linearly
dependent vectors at once. This is demonstrated in the next example.
EXAMPLE 7 Consider the vectors p1, p2, p3, p4, and p5 given by
p1(x) = 4 + 7x − 12x3 + 9x4 p2(x) = −2 + 12x − 8x2 + 4x4
p3(x) = −16 + 3x − 16x2 + 36x3 − 19x4 p4(x) = −6 + 19x2 − 6x3
p5(x) = 82 − 49x − 96x2 − 12x3 + 17x4
in P4(R), and let V = Span{p1, p2, p3, p4, p5}. Determine a basis for V given that the
following two matrices are row equivalent:
[  4  −2 −16 −6  82 ]       [ 1 0 −3 0  5 ]
[  7  12   3  0 −49 ]       [ 0 1  2 0 −7 ]
[  0  −8 −16 19 −96 ]  and  [ 0 0  0 1 −8 ]
[ −12  0  36 −6 −12 ]       [ 0 0  0 0  0 ]
[  9   4 −19  0  17 ]       [ 0 0  0 0  0 ]
Solution: Observe that the first matrix is the coefficient matrix of the homogeneous
system corresponding to the equation
0 = c1 p1 + c2 p2 + c3 p3 + c4 p4 + c5 p5
The RREF shows us that a vector equation for the solution set of the system is
[c1; c2; c3; c4; c5] = s[3; −2; 1; 0; 0] + t[−5; 7; 0; 8; 1], s, t ∈ R
This implies that the set {p1, p2, p3, p4, p5} is linearly dependent. However, it also
shows a lot more. If we take s = 1 and t = 0, we get that c1 = 3, c2 = −2, and c3 = 1,
so
0 = 3p1 − 2p2 + p3 + 0p4 + 0p5
Thus, p3 = −3p1 + 2p2. Similarly, taking s = 0 and t = 1 gives
0 = −5p1 + 7p2 + 0p3 + 8p4 + p5
and so p5 = 5p1 − 7p2 − 8p4. Hence, by Theorem 4.1.4, we get that V =
Span{p1, p2, p4}. Moreover, if we ignore the third and fifth columns of the matrices,
we see that the equation
0 = c1 p1 + c2 p2 + c4 p4
has a unique solution, hence {p1, p2, p4} is linearly independent. Thus, {p1, p2, p4} is
a basis for V.
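The procedure of Example 7 — keep the vectors corresponding to pivot columns of the RREF and discard the rest — can be sketched in a few lines. The function name `basis_from_spanning_set` is our own; it uses rank checks rather than literal row reduction, but keeps exactly the pivot-column vectors:

```python
import numpy as np

def basis_from_spanning_set(vectors):
    """Keep each vector that raises the rank of the set kept so far,
    i.e. each vector not in the span of the previous keepers.  These
    are exactly the pivot columns of the matrix whose columns are the
    given vectors, so the result is a basis for their span."""
    kept, rank = [], 0
    for v in vectors:
        r = np.linalg.matrix_rank(np.column_stack(kept + [v]))
        if r > rank:
            kept.append(v)
            rank = r
    return kept

# The five polynomials of Example 7 as coefficient vectors in R^5
p1 = [4, 7, 0, -12, 9]
p2 = [-2, 12, -8, 0, 4]
p3 = [-16, 3, -16, 36, -19]
p4 = [-6, 0, 19, -6, 0]
p5 = [82, -49, -96, -12, 17]

basis = basis_from_spanning_set([p1, p2, p3, p4, p5])
print(basis == [p1, p2, p4])  # True: the basis found in Example 7
```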
Dimension
In Chapter 1, we geometrically understood that Rn is n-dimensional and, for example,
a plane in Rn is two-dimensional. We now generalize the concept of dimension to
vector spaces. We begin with two important theorems.
THEOREM 1 Let B = {~v1, . . . ,~vn} be a basis for a vector space V and let C = {~w1, . . . , ~wk} be a set
in V. If k > n, then C is linearly dependent.
Proof: Since ~w1, . . . , ~wk ∈ V and B is a basis for V, we can write every vector
~w1, . . . , ~wk as a linear combination of the vectors in B. Say,
~w1 = a11~v1 + a12~v2 + · · · + a1n~vn
~w2 = a21~v1 + a22~v2 + · · · + a2n~vn
⋮
~wk = ak1~v1 + ak2~v2 + · · · + akn~vn
Consider,
~0 = c1~w1 + · · · + ck~wk (4.3)
= c1(a11~v1 + a12~v2 + · · · + a1n~vn) + · · · + ck(ak1~v1 + ak2~v2 + · · · + akn~vn)
= (c1a11 + · · · + ckak1)~v1 + · · · + (c1a1n + · · · + ckakn)~vn (4.4)
Since B is a basis, it is linearly independent, and hence equation (4.4) implies that
c1a11 + · · · + ckak1 = 0
⋮
c1a1n + · · · + ckakn = 0
Since k > n, this system has infinitely many solutions by the System-Rank Theorem.
Hence, equation (4.3) has infinitely many solutions, and so C is linearly dependent
as required. �
REMARK
Recall that we can think of a basis as a minimal spanning set. This theorem shows
that we can also think of a basis as a maximal linearly independent set. That is, a
basis B is a linearly independent set such that if we add any other vector from the
vector space to B, then the resulting set would be linearly dependent.
THEOREM 2 If B = {~v1, . . . ,~vn} and C = {~w1, . . . , ~wk} are bases for a vector space V, then k = n.
Proof: Since B is a basis and C is linearly independent, we get k ≤ n by Theorem 1.
Similarly, C is a basis and B is linearly independent, so n ≤ k. Thus, n = k. �
Thus, if a vector space V has a basis containing a finite number of elements, then
every basis of V has the same number of elements. This implies that every basis for
Rn contains exactly n vectors, and every basis for a plane in Rn contains exactly two
vectors which matches our geometric understanding of dimension. Thus, we make
the following definition.
DEFINITION
Dimension
If B = {~v1, . . . ,~vn} is a basis for a vector space V, then we say the dimension of V is
n and write
dim V = n
If V is the trivial vector space, then dimV = 0. If V does not have a basis with a
finite number of vectors in it, then V is said to be infinite dimensional.
EXAMPLE 8 The dimension of Rn is n since the standard basis {~e1, . . . , ~en} contains n vectors.
The dimension of Pn(R) is n + 1 since the standard basis {1, x, . . . , xn} contains n + 1
vectors.
The dimension of Mm×n(R) is mn since the standard basis contains mn vectors.
The vector space P(R) of all polynomials with real coefficients is infinite dimensional
since a basis is {1, x, x2, . . .}.
EXAMPLE 9 Since { [0 1; −1 0] } is a basis for the subspace S of M2×2(R) of skew-symmetric matrices, we have that the dimension of S is 1.
EXAMPLE 10 In Example 6, we found that a basis for the subspace V = { [a − b; a − c; c − b] | a, b, c ∈ R } of R3 is B = { [1; 1; 0], [−1; 0; −1] }. Thus, dimV = 2. So, geometrically, V is a plane in R3.
Knowing the dimension of a vector space can be very useful. This is demonstrated
in the following theorem.
THEOREM 3 If V is an n-dimensional vector space with n > 0, then
(1) a set of more than n vectors in V must be linearly dependent
(2) a set of fewer than n vectors in V cannot span V
(3) a set of n vectors in V is linearly independent if and only if it spans V
REMARK
Observe that (3) shows that if we have a set B of n linearly independent vectors in
an n-dimensional vector space V, then B is a basis for V. Similarly, if C contains n
vectors in V and SpanC = V, then C is a basis for V.
EXAMPLE 11 Find a basis for the hyperplane x1 + x2 + 2x3 + x4 = 0 in R4 and then extend the basis to obtain a basis for R4.
Solution: By definition a hyperplane in R4 is 3-dimensional. Hence, by Theorem 3, any set of 3 linearly independent vectors in the hyperplane will form a basis for the hyperplane. Observe that the vectors ~v1 = [1; −1; 0; 0], ~v2 = [0; 2; −1; 0], and ~v3 = [0; 0; 1; −2] satisfy the equation of the hyperplane and hence are in the hyperplane. Now, consider
[0; 0; 0; 0] = c1~v1 + c2~v2 + c3~v3 = [c1; −c1 + 2c2; −c2 + c3; −2c3]
The only solution is c1 = c2 = c3 = 0, so the vectors are linearly independent. Thus, B = {~v1,~v2,~v3} is a basis for the hyperplane.
To extend the basis B to a basis for R4, we just need to add a vector ~v4 so that {~v1,~v2,~v3,~v4} is linearly independent. Since ~v4 = [1; 0; 0; 0] does not satisfy the equation of the hyperplane, it is not in Span{~v1,~v2,~v3}, and so {~v1,~v2,~v3,~v4} is linearly independent by Theorem 4.1.5. Therefore, {~v1,~v2,~v3,~v4} is a basis for R4.
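The extension step of Example 11 generalizes: given a linearly independent set in R^n, try the standard basis vectors one at a time and keep those that raise the rank. A sketch under the assumption that vectors are given as coordinate lists (the name `extend_to_basis` is ours):

```python
import numpy as np

def extend_to_basis(vectors, n):
    """Extend a linearly independent set in R^n to a basis for R^n by
    appending standard basis vectors whenever they raise the rank."""
    kept = list(vectors)
    rank = np.linalg.matrix_rank(np.column_stack(kept))
    for i in range(n):
        e = [0.0] * n
        e[i] = 1.0
        r = np.linalg.matrix_rank(np.column_stack(kept + [e]))
        if r > rank:
            kept.append(e)
            rank = r
        if rank == n:
            break
    return kept

# The hyperplane basis from Example 11
v1, v2, v3 = [1, -1, 0, 0], [0, 2, -1, 0], [0, 0, 1, -2]
full = extend_to_basis([v1, v2, v3], 4)
print(len(full))  # 4: here e1 already raises the rank, as in the text
```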
THEOREM 4 If V is an n-dimensional vector space and {~v1, . . . ,~vk} is a linearly indepen-
dent set in V with k < n, then there exist vectors ~wk+1, . . . , ~wn in V such that
{~v1, . . . ,~vk, ~wk+1, . . . , ~wn} is a basis for V.
We will use this fact that we can always extend a linearly independent set to a basis
for a finite dimensional vector space in a few proofs later in the book.
COROLLARY 5 If S is a subspace of a finite dimensional vector space V, then dimS ≤ dimV.
Section 4.2 Problems
1. Determine if the given set B is a basis for the
specified vector space.
(a) B = {1 + x2, 1 − x + x3, 2x + x2 − 3x3, 1 + 4x + x2, 1 − x − x2 + x3} of P3(R)
(b) B = {1 + 2x + x2, 2 + 9x, 3 + 3x + 4x2} of
P2(R)
(c) B = { [1 1; 0 1], [1 2; 0 −1], [1 3; 0 2] } of the subspace U = { [a b; 0 c] | a, b, c ∈ R } of M2×2(R)
(d) B = { [1 2; 3 4], [5 6; 7 8], [9 0; 1 2] } of M2×2(R)
2. Find a basis and the dimension for each of the
following subspaces.
(a) S1 = {xp(x) | p(x) ∈ P2(R)} of P4(R)
(b) S2 = { [a1 a2; 0 a3] | a1 + a3 = 0 } of M2×2(R)
(c) S3 = {p(x) ∈ P2(R) | p(2) = 0} of P2(R)
(d) S4 = { [0 0; 0 0] } of M2×2(R)
(e) S5 = {a + bx + cx2 | a = c} of P2(R)
(f) S6 = {A ∈ M2×2(R) | AT = A} of M2×2(R)
(g) S7 = { [a1 a2; a3 a4] | a1 + a3 = 0, a2 = 2a4 } of M2×2(R)
(h) S8 = Span{1 + x, 1 + x + x2, x + 2x2, x + x2 + x3, 1 + x2 + x3} of P3(R)
(i) S9 = Span{ [1; −3; 2], [−1; 4; −1], [1; −1; 4], [0; 2; 2], [0; 1; 1] } of R3
3. State the standard basis for M2×3(R).
4. Prove Theorem 3.
5. Find a basis for P3(R) that includes the vectors
~v1 = 1 + x + x3 and ~v2 = 1 + x2.
6. Find a basis for M2×2(R) that includes the vectors ~v1 = [1 2; 2 1] and ~v2 = [1 2; 3 4].
7. Find a basis for the hyperplane in R4 defined by
x1+ x2+2x3+ x4 = 0, and then extend the basis
to obtain a basis for R4.
8. Prove that if S is a subspace of a vector space
V and dimV = dimS, then S = V.
9. Prove Theorem 4 and Corollary 5.
10. Prove or disprove the following statements.
(a) If V is an n-dimensional vector space and
{~v1, . . . ,~vk} is a linearly independent set in
V, then k ≤ n.
(b) Every basis for P2(R) has exactly 2 vectors
in it.
(c) If {~v1,~v2} is a basis for a 2-dimensional vector space V, then {a~v1 + b~v2, c~v1 + d~v2} is also a basis for V for any non-zero real numbers a, b, c, d.
(d) If B = {~v1, . . . ,~vk} spans a vector space V,
then some subset of B is a basis for V.
(e) If {~v1, . . . ,~vk} is a linearly independent set
in a vector space V, then dimV ≥ k.
(f) Every set of four non-zero matrices in
M2×2(R) is a basis for M2×2(R).
11. Prove that if S and T are both 3-dimensional subspaces of a 5-dimensional vector space V, then S ∩ T ≠ {~0}.
4.3 Coordinates
Recall that when we write ~x = [x1; . . . ; xn] in Rn we really mean ~x = x1~e1 + · · · + xn~en where
{~e1, . . . , ~en} is the standard basis for Rn. For example, when you originally learned to
plot the point (1, 2) in the xy-plane, you were taught that this means you move 1 in
the x-direction (~e1) and 2 in the y-direction (~e2). We can extend this idea to bases for
general vector spaces.
THEOREM 1 If B = {~v1, . . . ,~vn} is a basis for a vector space V, then every ~v ∈ V can be written as
a unique linear combination of the vectors in B.
Of course, the proof is the same as Theorem 1.2.4. Moreover, the theorem shows
us that we can think of any basis B = {~v1, . . . ,~vn} for a vector space V as forming
coordinate axes for V. That is, we can think of a vector ~v = b1~v1 + · · · + bn~vn in V
as being b1 in the ~v1 direction, b2 in the ~v2 direction, etc. Hence, just like in Rn, it is
really the coefficients of the linear combination that identify the vector.
The next example shows that once we have written out the vectors with respect to
a basis, performing operations on vectors in any finite dimensional vector space is
exactly the same as operations on vectors in Rn.
EXAMPLE 1 Let ~x = [1; −2; 3], ~y = [0; 5; −4], p(x) = 1 − 2x + 3x2, q(x) = 5x − 4x2. Also, let {~v1,~v2,~v3} be a basis for V = Span{~v1,~v2,~v3} and let ~v = ~v1 − 2~v2 + 3~v3, and ~w = 0~v1 + 5~v2 − 4~v3. We have
~x + ~y = [1 + 0; −2 + 5; 3 − 4] = [1; 3; −1]
p(x) + q(x) = (1 + 0) + (−2 + 5)x + (3 − 4)x2 = 1 + 3x − x2
~v + ~w = (1 + 0)~v1 + (−2 + 5)~v2 + (3 − 4)~v3 = 1~v1 + 3~v2 − ~v3
This motivates the following definition.
DEFINITION
B-Coordinates
B-CoordinateVector
Let B = {~v1, . . . ,~vn} be a basis for a vector space V. If ~v = b1~v1 + · · · + bn~vn, then b1, . . . , bn are called the B-coordinates of ~v, and we define the B-coordinate vector (also called the coordinate vector of ~v with respect to B) by
[~v]B = [b1; . . . ; bn]
EXAMPLE 2 If B = {~v1,~v2,~v3} is a basis for a vector space V and ~v = 3~v1 − 2~v2 + 4~v3, then the B-coordinate vector of ~v is
[~v]B = [3; −2; 4]
EXAMPLE 3 Given that B = { [2; 1], [3; 3] } is a basis for R2 and that [~x]B = [√2; −π], find ~x.
Solution: By definition of coordinates we get
~x = √2 [2; 1] − π [3; 3] = [2√2 − 3π; √2 − 3π]
EXERCISE 1 Given that B = { [1 1; 0 −1], [1 0; 1 1], [0 1; 1 2] } is a basis for the subspace of M2×2(R) which it spans, find A and B where [A]B = [2; 1; 3] and [B]B = [−4; 1; 5].
EXAMPLE 4 Find the coordinate vector of $\vec{x} = \begin{bmatrix} 3 \\ 1 \end{bmatrix}$ with respect to the basis $B = \left\{ \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ -1 \end{bmatrix} \right\}$ for R2.
Solution: We just need to write ~x as a linear combination of the basis vectors. That
is, we need to find c1 and c2 such that
$$\begin{bmatrix} 3 \\ 1 \end{bmatrix} = c_1\begin{bmatrix} 1 \\ 1 \end{bmatrix} + c_2\begin{bmatrix} 1 \\ -1 \end{bmatrix}$$
By inspection (or by solving the corresponding system) we get that c1 = 2 and c2 = 1.
Hence, we have
$$\begin{bmatrix} 3 \\ 1 \end{bmatrix} = 2\begin{bmatrix} 1 \\ 1 \end{bmatrix} + 1\begin{bmatrix} 1 \\ -1 \end{bmatrix}$$
So,
$$[\vec{x}]_B = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$$
Section 4.3 Coordinates 135
EXAMPLE 5 Given that $B = \left\{ \begin{bmatrix} 1 & 1 \\ 0 & -1 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 1 & 2 \end{bmatrix} \right\}$ is a basis for the subspace of M2×2(R)
which it spans, find the B-coordinate vector of $A = \begin{bmatrix} 1 & -3 \\ 2 & 3 \end{bmatrix}$ and of $B = \begin{bmatrix} -1 & 0 \\ 3 & 7 \end{bmatrix}$.
Solution: We need to find c1, c2, c3 and d1, d2, d3 such that
$$c_1\begin{bmatrix} 1 & 1 \\ 0 & -1 \end{bmatrix} + c_2\begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} + c_3\begin{bmatrix} 0 & 1 \\ 1 & 2 \end{bmatrix} = \begin{bmatrix} 1 & -3 \\ 2 & 3 \end{bmatrix}$$
$$d_1\begin{bmatrix} 1 & 1 \\ 0 & -1 \end{bmatrix} + d_2\begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} + d_3\begin{bmatrix} 0 & 1 \\ 1 & 2 \end{bmatrix} = \begin{bmatrix} -1 & 0 \\ 3 & 7 \end{bmatrix}$$
Observe that each corresponding system will have the same coefficient matrix, but a
different augmented part. Thus, we can solve both systems simultaneously by row
reducing a doubly augmented matrix. We get
$$\left[\begin{array}{ccc|cc} 1 & 1 & 0 & 1 & -1 \\ 1 & 0 & 1 & -3 & 0 \\ 0 & 1 & 1 & 2 & 3 \\ -1 & 1 & 2 & 3 & 7 \end{array}\right] \sim \left[\begin{array}{ccc|cc} 1 & 0 & 0 & -2 & -2 \\ 0 & 1 & 0 & 3 & 1 \\ 0 & 0 & 1 & -1 & 2 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right]$$
For the first system we have c1 = −2, c2 = 3, and c3 = −1, and for the second system
we have d1 = −2, d2 = 1, and d3 = 2. Thus,
$$[A]_B = \begin{bmatrix} -2 \\ 3 \\ -1 \end{bmatrix} \quad \text{and} \quad [B]_B = \begin{bmatrix} -2 \\ 1 \\ 2 \end{bmatrix}$$
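The multi-augmented computation in Example 5 is easy to check numerically. A minimal sketch (our own illustration, not part of the text; the helper names are made up) flattens each basis matrix into a column of a coefficient matrix and solves both systems at once with NumPy:

```python
import numpy as np

# Basis of the subspace of M2x2(R) from Example 5; each matrix is
# flattened into one column of the coefficient matrix.
basis = [np.array([[1, 1], [0, -1]]),
         np.array([[1, 0], [1, 1]]),
         np.array([[0, 1], [1, 2]])]
P = np.column_stack([M.flatten() for M in basis])  # 4x3 coefficient matrix

A = np.array([[1, -3], [2, 3]])
B = np.array([[-1, 0], [3, 7]])

# Solve both systems simultaneously; lstsq handles the 4x3 shape, and the
# residual is zero because A and B actually lie in the span of the basis.
rhs = np.column_stack([A.flatten(), B.flatten()])
coords, *_ = np.linalg.lstsq(P, rhs, rcond=None)
coords_A, coords_B = coords[:, 0], coords[:, 1]
```

Running this reproduces the coordinate vectors found by row reduction: `coords_A` is (−2, 3, −1) and `coords_B` is (−2, 1, 2).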
EXERCISE 2 Find the B-coordinate vectors of p(x) = 1 + x + x^2 and q(x) = 1 − 2x + 3x^2 where
B = {1, 1 + x, 1 + x + x^2}.
REMARK
It is important to notice that the coordinates of a vector depend on which basis
is being used for the vector space. Moreover, they also depend on the order of the
vectors in the basis. Thus, to prevent confusion, when we speak of a basis, we mean
an ordered basis.
As intended, coordinate vectors allow us to change from working with vectors in
some vector space with respect to some basis to working with vectors in Rn. In
particular, observe that taking coordinates of a vector ~v with respect to a basis B of a
vector space V can be thought of as a function that takes a vector in V and outputs a
vector in Rn. Not surprisingly, this function is linear.
THEOREM 2 If V is a vector space with basis B = {~v1, . . . , ~vn}, then for any ~v, ~w ∈ V and s, t ∈ R we have
$$[s\vec{v} + t\vec{w}]_B = s[\vec{v}]_B + t[\vec{w}]_B$$
Proof: Since B is a basis for V, there exist b1, . . . , bn, c1, . . . , cn ∈ R such
that ~v = b1~v1 + · · · + bn~vn and ~w = c1~v1 + · · · + cn~vn. Observe that
$$\begin{aligned}
s\vec{v} + t\vec{w} &= s(b_1\vec{v}_1 + \cdots + b_n\vec{v}_n) + t(c_1\vec{v}_1 + \cdots + c_n\vec{v}_n) \\
&= s(b_1\vec{v}_1) + \cdots + s(b_n\vec{v}_n) + t(c_1\vec{v}_1) + \cdots + t(c_n\vec{v}_n) && \text{by V9} \\
&= (sb_1)\vec{v}_1 + \cdots + (sb_n)\vec{v}_n + (tc_1)\vec{v}_1 + \cdots + (tc_n)\vec{v}_n && \text{by V7} \\
&= (sb_1)\vec{v}_1 + (tc_1)\vec{v}_1 + \cdots + (sb_n)\vec{v}_n + (tc_n)\vec{v}_n && \text{by V3} \\
&= (sb_1 + tc_1)\vec{v}_1 + \cdots + (sb_n + tc_n)\vec{v}_n && \text{by V8}
\end{aligned}$$
Therefore,
$$[s\vec{v} + t\vec{w}]_B = \begin{bmatrix} sb_1 + tc_1 \\ \vdots \\ sb_n + tc_n \end{bmatrix} = s\begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix} + t\begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix} = s[\vec{v}]_B + t[\vec{w}]_B$$
□
Change of Coordinates
We have seen how to find different bases for a vector space V and how to find the
coordinates of a vector ~v ∈ V with respect to any basis B of V. In some cases, it is
useful to have a quick way of determining the coordinates of ~v with respect to some
basis C for V if we are given the coordinates of ~v with respect to the basis B of V.
For example, in some applications it is useful to write polynomials in terms of powers
of x − c for some real number c. That is, given a polynomial p(x) = a_0 + a_1 x + · · · + a_n x^n,
we want to write it in the form
$$p(x) = b_0 + b_1(x - c) + b_2(x - c)^2 + \cdots + b_n(x - c)^n$$
Such a situation may arise if the values of x you are working with are very close to
c. If you are working with many polynomials, then it would be very helpful to have
a fast way of converting each polynomial. We can rephrase this problem in terms of
linear algebra.
Let S = {1, x, . . . , x^n} be the standard basis for Pn(R). Given
$$[p(x)]_S = \begin{bmatrix} a_0 \\ \vdots \\ a_n \end{bmatrix}$$
we want to determine [p(x)]_B where B is the basis B = {1, x − c, (x − c)^2, . . . , (x − c)^n} for
Pn(R).
Since taking coordinates is a linear operation by Theorem 2, we get
$$\begin{aligned}
[p(x)]_B &= [a_0 1 + a_1 x + \cdots + a_n x^n]_B \\
&= a_0[1]_B + a_1[x]_B + \cdots + a_n[x^n]_B \\
&= \begin{bmatrix} [1]_B & [x]_B & \cdots & [x^n]_B \end{bmatrix} \begin{bmatrix} a_0 \\ \vdots \\ a_n \end{bmatrix}
\end{aligned}$$
So, the multiplication of this matrix and the coordinate vector of p(x) with respect to
the standard basis gives us the coordinates of p(x) with respect to the new basis B.
We demonstrate how this works with an example.
EXAMPLE 6 Let B = {1, x − 3, (x − 3)^2}. Determine the matrix $\begin{bmatrix} [1]_B & [x]_B & [x^2]_B \end{bmatrix}$ and use it to
write the polynomials p(x) = 2 + x + 3x^2, q(x) = x − 4x^2, and r(x) = 3 + 5x − x^2 in
powers of x − 3.
Solution: To find the matrix, we first need to find the coordinates of 1, x, and x^2 with
respect to B. That is, we need to find a1, a2, a3, b1, b2, b3, c1, c2, c3 such that
$$\begin{aligned}
1 &= a_1(1) + a_2(x - 3) + a_3(x - 3)^2 = (a_1 - 3a_2 + 9a_3) + (a_2 - 6a_3)x + a_3 x^2 \\
x &= b_1(1) + b_2(x - 3) + b_3(x - 3)^2 = (b_1 - 3b_2 + 9b_3) + (b_2 - 6b_3)x + b_3 x^2 \\
x^2 &= c_1(1) + c_2(x - 3) + c_3(x - 3)^2 = (c_1 - 3c_2 + 9c_3) + (c_2 - 6c_3)x + c_3 x^2
\end{aligned}$$
Comparing like powers of x we get three systems of linear equations all with the
same coefficient matrix. Thus, to solve all three simultaneously we row reduce the
corresponding multi-augmented matrix. We get
$$\left[\begin{array}{ccc|ccc} 1 & -3 & 9 & 1 & 0 & 0 \\ 0 & 1 & -6 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 & 1 \end{array}\right] \sim \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 1 & 3 & 9 \\ 0 & 1 & 0 & 0 & 1 & 6 \\ 0 & 0 & 1 & 0 & 0 & 1 \end{array}\right]$$
Hence, we get that
$$\begin{bmatrix} [1]_B & [x]_B & [x^2]_B \end{bmatrix} = \begin{bmatrix} 1 & 3 & 9 \\ 0 & 1 & 6 \\ 0 & 0 & 1 \end{bmatrix}$$
Therefore, according to our work above, we get that
$$[2 + x + 3x^2]_B = \begin{bmatrix} 1 & 3 & 9 \\ 0 & 1 & 6 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix} = \begin{bmatrix} 32 \\ 19 \\ 3 \end{bmatrix}$$
Therefore, by definition of coordinates,
$$2 + x + 3x^2 = 32 + 19(x - 3) + 3(x - 3)^2$$
It is easy to verify that this is indeed correct.
It is easy to verify that this is indeed correct.
Similarly,
$$[x - 4x^2]_B = \begin{bmatrix} 1 & 3 & 9 \\ 0 & 1 & 6 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \\ -4 \end{bmatrix} = \begin{bmatrix} -33 \\ -23 \\ -4 \end{bmatrix}$$
$$[3 + 5x - x^2]_B = \begin{bmatrix} 1 & 3 & 9 \\ 0 & 1 & 6 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 3 \\ 5 \\ -1 \end{bmatrix} = \begin{bmatrix} 9 \\ -1 \\ -1 \end{bmatrix}$$
Hence,
$$x - 4x^2 = -33 - 23(x - 3) - 4(x - 3)^2$$
and
$$3 + 5x - x^2 = 9 - (x - 3) - (x - 3)^2$$
It is important to observe how easy it was to convert polynomials from standard
coordinates to B-coordinates once we had found the matrix.
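This conversion can be sketched numerically. Below is a small illustration of Example 6's matrix in action (the function name `to_powers_of` is our own, not the book's); the check at the end evaluates both forms of p(x) at a sample point:

```python
import numpy as np

# The matrix [[1]_B [x]_B [x^2]_B] from Example 6, where
# B = {1, x - 3, (x - 3)^2}.
P = np.array([[1, 3, 9],
              [0, 1, 6],
              [0, 0, 1]])

def to_powers_of(coeffs, P):
    """Map standard coefficients (a0, a1, a2) to B-coordinates."""
    return P @ np.asarray(coeffs)

b = to_powers_of([2, 1, 3], P)   # p(x) = 2 + x + 3x^2

# Both forms of p should agree at any point, e.g. x = 5:
x = 5.0
lhs = 2 + x + 3 * x**2
rhs = b[0] + b[1] * (x - 3) + b[2] * (x - 3) ** 2
```

Here `b` comes out as (32, 19, 3), matching the hand computation.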
Of course, we can repeat the argument above in the case where B and C are any two
bases of a vector space V. If B = {~v1, . . . , ~vn} and $[\vec{x}]_B = \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix}$, then
$$\begin{aligned}
[\vec{x}]_C &= [b_1\vec{v}_1 + \cdots + b_n\vec{v}_n]_C \\
&= b_1[\vec{v}_1]_C + \cdots + b_n[\vec{v}_n]_C \\
&= \begin{bmatrix} [\vec{v}_1]_C & \cdots & [\vec{v}_n]_C \end{bmatrix} \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix} \\
&= \begin{bmatrix} [\vec{v}_1]_C & \cdots & [\vec{v}_n]_C \end{bmatrix} [\vec{x}]_B
\end{aligned}$$
Hence, we make the following definition.
DEFINITION
Change of Coordinates Matrix
Let B = {~v1, . . . , ~vn} and C both be bases for a vector space V. The change of
coordinates matrix from B-coordinates to C-coordinates is defined by
$${}_C P_B = \begin{bmatrix} [\vec{v}_1]_C & \cdots & [\vec{v}_n]_C \end{bmatrix}$$
and for any ~x ∈ V we have
$$[\vec{x}]_C = {}_C P_B [\vec{x}]_B$$
REMARK
Observe that the notation CPB can help you remember which side has the C-
coordinate vector of ~x and which side has the B-coordinate vector of ~x.
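For bases of Rn, the columns of the change of coordinates matrix can be found by solving one linear system per basis vector. A minimal sketch (the helper name is ours, and the bases are the ones used later in Example 8):

```python
import numpy as np

def change_of_coords(B_vectors, C_vectors):
    """Return P with [x]_C = P @ [x]_B, for two bases of R^n given as
    lists of vectors."""
    C = np.column_stack(C_vectors)
    B = np.column_stack(B_vectors)
    # Column j of P solves C @ p_j = b_j, i.e. p_j = [b_j]_C.
    return np.linalg.solve(C, B)

B = [np.array([1, 3]), np.array([2, 1])]
C = [np.array([-1, 1]), np.array([5, -4])]
P = change_of_coords(B, C)
```

With these bases, `P` agrees with the matrix found by row reduction in Example 8: [[19, 13], [4, 3]].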
EXAMPLE 7 Consider the basis $B = \left\{ \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}, \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} -2 \\ 0 \\ -3 \end{bmatrix} \right\}$ for R3. Find the change of coordinates
matrix from B to the standard basis S.
Solution: The columns of the desired change of coordinates matrix are the S-coordinate
vectors of the vectors in B. We have
$$\left[\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}\right]_S = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}, \quad \left[\begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}\right]_S = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}, \quad \left[\begin{bmatrix} -2 \\ 0 \\ -3 \end{bmatrix}\right]_S = \begin{bmatrix} -2 \\ 0 \\ -3 \end{bmatrix}$$
So, the change of coordinates matrix ${}_S P_B$ is
$${}_S P_B = \begin{bmatrix} 1 & -1 & -2 \\ 2 & 0 & 0 \\ 3 & 1 & -3 \end{bmatrix}$$
EXAMPLE 8 Let $B = \left\{ \begin{bmatrix} 1 \\ 3 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \end{bmatrix} \right\}$ and $C = \left\{ \begin{bmatrix} -1 \\ 1 \end{bmatrix}, \begin{bmatrix} 5 \\ -4 \end{bmatrix} \right\}$ be bases for R2. Find ${}_C P_B$ and ${}_B P_C$.
Solution: To find ${}_C P_B$ we need to find the C-coordinate vectors of the vectors in B.
Thus, we need to solve the systems
$$c_1\begin{bmatrix} -1 \\ 1 \end{bmatrix} + c_2\begin{bmatrix} 5 \\ -4 \end{bmatrix} = \begin{bmatrix} 1 \\ 3 \end{bmatrix} \qquad d_1\begin{bmatrix} -1 \\ 1 \end{bmatrix} + d_2\begin{bmatrix} 5 \\ -4 \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$$
Observe that since each system has the same coefficient matrix, the same row operations
will solve each system. To make this easier, we form the multiple augmented matrix
$$\left[\begin{array}{cc|cc} -1 & 5 & 1 & 2 \\ 1 & -4 & 3 & 1 \end{array}\right]$$
Row reducing this matrix gives
$$\left[\begin{array}{cc|cc} -1 & 5 & 1 & 2 \\ 1 & -4 & 3 & 1 \end{array}\right] \sim \left[\begin{array}{cc|cc} 1 & 0 & 19 & 13 \\ 0 & 1 & 4 & 3 \end{array}\right]$$
Hence,
$${}_C P_B = \begin{bmatrix} 19 & 13 \\ 4 & 3 \end{bmatrix}$$
Similarly, for ${}_B P_C$ we need to find the B-coordinate vectors of the vectors in C. We
again form a multiple augmented matrix and row reduce to get
$$\left[\begin{array}{cc|cc} 1 & 2 & -1 & 5 \\ 3 & 1 & 1 & -4 \end{array}\right] \sim \left[\begin{array}{cc|cc} 1 & 0 & 3/5 & -13/5 \\ 0 & 1 & -4/5 & 19/5 \end{array}\right]$$
So,
$${}_B P_C = \begin{bmatrix} 3/5 & -13/5 \\ -4/5 & 19/5 \end{bmatrix}$$
EXAMPLE 9 Let B = {1 + x, 1 + 2x + x^2, x − x^2} be a basis for P2(R). Determine the change of
coordinates matrix from B to the standard basis S = {1, x, x^2} for P2(R).
Solution: We need to find the S-coordinate vectors of the vectors in B. We have
$$[1 + x]_S = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} \qquad [1 + 2x + x^2]_S = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} \qquad [x - x^2]_S = \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix}$$
Hence, the change of coordinates matrix is
$${}_S P_B = \begin{bmatrix} [1 + x]_S & [1 + 2x + x^2]_S & [x - x^2]_S \end{bmatrix} = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 2 & 1 \\ 0 & 1 & -1 \end{bmatrix}$$
EXAMPLE 10 Consider the basis $B = \left\{ \begin{bmatrix} 1 \\ 3 \\ -1 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 3 \\ 4 \\ 1 \end{bmatrix} \right\}$ for R3. Find the change of coordinates matrix
from the standard basis S to B and hence find $\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}_B$. Verify the answer is correct by
using the definition of coordinates.
Solution: The columns of the desired change of coordinates matrix are the B-coordinate
vectors of the standard basis vectors. To find these we need to solve the augmented systems
$$\left[\begin{array}{ccc|c} 1 & 2 & 3 & 1 \\ 3 & 1 & 4 & 0 \\ -1 & 1 & 1 & 0 \end{array}\right], \quad \left[\begin{array}{ccc|c} 1 & 2 & 3 & 0 \\ 3 & 1 & 4 & 1 \\ -1 & 1 & 1 & 0 \end{array}\right], \quad \left[\begin{array}{ccc|c} 1 & 2 & 3 & 0 \\ 3 & 1 & 4 & 0 \\ -1 & 1 & 1 & 1 \end{array}\right]$$
Again, we row reduce the multiple augmented matrix to get
$$\left[\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 3 & 1 & 4 & 0 & 1 & 0 \\ -1 & 1 & 1 & 0 & 0 & 1 \end{array}\right] \sim \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 3/5 & -1/5 & -1 \\ 0 & 1 & 0 & 7/5 & -4/5 & -1 \\ 0 & 0 & 1 & -4/5 & 3/5 & 1 \end{array}\right]$$
Thus, the change of coordinates matrix from S-coordinates to B-coordinates is
$${}_B P_S = \begin{bmatrix} 3/5 & -1/5 & -1 \\ 7/5 & -4/5 & -1 \\ -4/5 & 3/5 & 1 \end{bmatrix}$$
Therefore, we have
$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}_B = {}_B P_S [\vec{x}]_S = \begin{bmatrix} 3/5 & -1/5 & -1 \\ 7/5 & -4/5 & -1 \\ -4/5 & 3/5 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} \frac{3}{5}x_1 - \frac{1}{5}x_2 - x_3 \\ \frac{7}{5}x_1 - \frac{4}{5}x_2 - x_3 \\ -\frac{4}{5}x_1 + \frac{3}{5}x_2 + x_3 \end{bmatrix}$$
To verify that these are the B-coordinates of ~x we apply the definition of coordinates.
We have
$$\begin{aligned}
&\left(\tfrac{3}{5}x_1 - \tfrac{1}{5}x_2 - x_3\right)\begin{bmatrix} 1 \\ 3 \\ -1 \end{bmatrix} + \left(\tfrac{7}{5}x_1 - \tfrac{4}{5}x_2 - x_3\right)\begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix} + \left(-\tfrac{4}{5}x_1 + \tfrac{3}{5}x_2 + x_3\right)\begin{bmatrix} 3 \\ 4 \\ 1 \end{bmatrix} \\
&= \begin{bmatrix} 1 & 2 & 3 \\ 3 & 1 & 4 \\ -1 & 1 & 1 \end{bmatrix} \begin{bmatrix} \frac{3}{5}x_1 - \frac{1}{5}x_2 - x_3 \\ \frac{7}{5}x_1 - \frac{4}{5}x_2 - x_3 \\ -\frac{4}{5}x_1 + \frac{3}{5}x_2 + x_3 \end{bmatrix} \\
&= \begin{bmatrix} 1 & 2 & 3 \\ 3 & 1 & 4 \\ -1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 3/5 & -1/5 & -1 \\ 7/5 & -4/5 & -1 \\ -4/5 & 3/5 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \\
&= \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
\end{aligned}$$
as required.
Notice in Example 10 that the matrix
$$\begin{bmatrix} 1 & 2 & 3 \\ 3 & 1 & 4 \\ -1 & 1 & 1 \end{bmatrix}$$
is the change of coordinates matrix from B-coordinates to S-coordinates. So, the calculations in the example show that ${}_S P_B \, {}_B P_S = I$. Of course, this makes sense since the product should change from
S-coordinates to B-coordinates and then back to S-coordinates. This is exactly how
we will prove the following theorem.
THEOREM 3 If B and C are bases for an n-dimensional vector space V, then the change of coordinates matrices ${}_C P_B$ and ${}_B P_C$ satisfy
$${}_C P_B \, {}_B P_C = I = {}_B P_C \, {}_C P_B$$
Proof: By definition of the change of coordinates matrix we have $[\vec{x}]_C = {}_C P_B [\vec{x}]_B$ and $[\vec{x}]_B = {}_B P_C [\vec{x}]_C$. Combining these gives
$$[\vec{x}]_C = {}_C P_B [\vec{x}]_B = {}_C P_B \, {}_B P_C [\vec{x}]_C$$
Hence, $I[\vec{x}]_C = {}_C P_B \, {}_B P_C [\vec{x}]_C$ for all $[\vec{x}]_C \in \mathbb{R}^n$, so ${}_C P_B \, {}_B P_C = I$ by the Matrices Equal
Theorem. Similarly,
$$[\vec{x}]_B = {}_B P_C [\vec{x}]_C = {}_B P_C \, {}_C P_B [\vec{x}]_B$$
for all $[\vec{x}]_B \in \mathbb{R}^n$, hence ${}_B P_C \, {}_C P_B = I$. □
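Theorem 3 is easy to confirm numerically. A short sketch using the bases of Example 8 (our own check, not part of the text):

```python
import numpy as np

# Bases from Example 8, as columns.
B = np.column_stack([[1, 3], [2, 1]])
C = np.column_stack([[-1, 1], [5, -4]])

# Columns of P_CB are the C-coordinates of B's vectors, and vice versa.
P_CB = np.linalg.solve(C, B)
P_BC = np.linalg.solve(B, C)

prod1 = P_CB @ P_BC
prod2 = P_BC @ P_CB
```

Both products come out as the 2 × 2 identity matrix, as the theorem predicts.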
Section 4.3 Problems
1. Given that B is a basis for the subspace it spans, find ~x such that:
(a) $B = \left\{ \begin{bmatrix} 1 & 3 \\ -1 & 2 \end{bmatrix}, \begin{bmatrix} 3 & 6 \\ -4 & 3 \end{bmatrix} \right\}$, $[\vec{x}]_B = \begin{bmatrix} \sqrt{2} \\ -4 \end{bmatrix}$
(b) B = {1 + x, 1 + x + x^2, 1 + x^2}, $[\vec{x}]_B = \begin{bmatrix} 1 \\ 0 \\ 3 \end{bmatrix}$
2. Consider the basis $B = \left\{ \begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix}, \begin{bmatrix} 3 \\ 7 \\ 3 \end{bmatrix}, \begin{bmatrix} -2 \\ -3 \\ 5 \end{bmatrix} \right\}$ for R3.
Find $\begin{bmatrix} -2 \\ -5 \\ 1 \end{bmatrix}_B$ and $\begin{bmatrix} 4 \\ 8 \\ -3 \end{bmatrix}_B$.
3. In each case, given that B is a basis for the subspace it spans, determine the coordinate vectors
of A and B with respect to B.
(a) $B = \left\{ \begin{bmatrix} 1 & 1 \\ 0 & -1 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 1 & 2 \end{bmatrix} \right\}$, $A = \begin{bmatrix} 1 & -3 \\ 2 & 3 \end{bmatrix}$, $B = \begin{bmatrix} -1 & 0 \\ 3 & 7 \end{bmatrix}$.
(b) $B = \left\{ \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix}, \begin{bmatrix} 2 & 0 \\ 0 & -1 \end{bmatrix} \right\}$, $A = \begin{bmatrix} 0 & 1 \\ 1 & 2 \end{bmatrix}$, $B = \begin{bmatrix} -4 & 1 \\ 1 & 4 \end{bmatrix}$.
4. In each case, given that B is a basis for the set it spans, determine the coordinate vectors of p(x)
and q(x) with respect to B.
(a) B = {1 + x + x^2, 1 + 3x + 2x^2, 4 + x^2}, p(x) = −2 + 8x + 5x^2, q(x) = −4 + 8x + 4x^2.
(b) B = {1 + x − 4x^2, 2 + 3x − 3x^2, 3 + 6x + 4x^2}, p(x) = −3 − 5x − 3x^2, q(x) = x.
5. Given that $B = \left\{ \begin{bmatrix} 3 \\ 1 \end{bmatrix}, \begin{bmatrix} 5 \\ 3 \end{bmatrix} \right\}$ and $C = \left\{ \begin{bmatrix} 2 \\ 1 \end{bmatrix}, \begin{bmatrix} 5 \\ 2 \end{bmatrix} \right\}$ are both bases for R2, find ${}_B P_C$ and ${}_C P_B$.
6. Given the bases B = {1, x − 1, (x − 1)^2} and C = {1 + x + x^2, 1 + 3x − x^2, 1 − x − x^2} for P2(R),
find ${}_B P_C$ and ${}_C P_B$.
7. Let B = {1 − x + x^2, 1 − 2x + 3x^2, 1 − 2x + 4x^2} be a basis for P2(R) and let S denote the standard
basis for P2(R).
(a) Find ${}_S P_B$ and ${}_B P_S$.
(b) Find p(x) if $[p(x)]_B = \begin{bmatrix} 3 \\ 1 \\ -2 \end{bmatrix}$.
(c) Find $[q(x)]_B$ if q(x) = 1 + x^2.
(d) Find r(x) if $[r(x)]_B = \begin{bmatrix} 2 \\ -3 \\ 2 \end{bmatrix}$.
8. Let $T = \left\{ \begin{bmatrix} a_1 & a_2 \\ 0 & a_3 \end{bmatrix} \,\middle|\, a_1, a_2, a_3 \in \mathbb{R} \right\}$ be the vector space of upper triangular matrices with
standard addition and scalar multiplication of matrices. Given that
$$S = \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \right\} \quad \text{and} \quad B = \left\{ \begin{bmatrix} 3 & 0 \\ 0 & 3 \end{bmatrix}, \begin{bmatrix} -1 & 3 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 1 & -3 \\ 0 & 1 \end{bmatrix} \right\}$$
are both bases for T, find the change of coordinates matrix from B to S and the change of
coordinates matrix from S to B. Verify that the product of the change of coordinates matrices
is I.
9. Let $A = \begin{bmatrix} 1 & 2 & 2 \\ 2 & 4 & 1 \\ 3 & 8 & 0 \end{bmatrix}$. Find a matrix B such that AB = I.
10. If B = {~v1, . . . ,~vn} is a basis for V, show that
{[~v1]B, . . . , [~vn]B} is a basis for Rn.
11. Suppose that B = {~v1, . . . ,~vn} and C =
{~w1, . . . , ~wn} are both bases for a vector space
V such that [~x]B = [~x]C for all ~x ∈ V. Must it
be true that ~vi = ~wi for each 1 ≤ i ≤ n? Explain
or prove your conclusion.
Chapter 5
Inverses and Determinants
5.1 Matrix Inverses
In the last chapter, we saw that if B and C are bases for a finite dimensional vector
space V, then the change of coordinates matrix BPC and the change of coordinates
matrix CPB satisfy
$${}_B P_C \, {}_C P_B = I = {}_C P_B \, {}_B P_C$$
Since the products of these matrices give the identity matrix (the multiplicative iden-
tity), these matrices are multiplicative inverses of each other. We now develop and
explore the idea of inverse matrices.
DEFINITION
Left and Right Inverse
Let A be an m × n matrix. If B is an n × m matrix such that AB = Im, then B is called
a right inverse of A. If C is an n × m matrix such that CA = In, then C is called a left
inverse of A.
EXAMPLE 1 If $A = \begin{bmatrix} 1 & 2 & 2 \\ -1 & 3 & 1 \end{bmatrix}$ and $B = \begin{bmatrix} 3/5 & -2/5 \\ 1/5 & 1/5 \\ 0 & 0 \end{bmatrix}$, then B is a right inverse of A since
$$\begin{bmatrix} 1 & 2 & 2 \\ -1 & 3 & 1 \end{bmatrix} \begin{bmatrix} 3/5 & -2/5 \\ 1/5 & 1/5 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$
Note that this also shows that A is a left inverse of B.
How would we try to find a right inverse of an m × n matrix A? We want to find an
n × m matrix $B = \begin{bmatrix} \vec{b}_1 & \cdots & \vec{b}_m \end{bmatrix}$ such that
$$I_m = AB$$
$$\begin{bmatrix} \vec{e}_1 & \cdots & \vec{e}_m \end{bmatrix} = A\begin{bmatrix} \vec{b}_1 & \cdots & \vec{b}_m \end{bmatrix} = \begin{bmatrix} A\vec{b}_1 & \cdots & A\vec{b}_m \end{bmatrix}$$
Comparing columns, we see that we need to find ~bi such that $A\vec{b}_i = \vec{e}_i$ for 1 ≤ i ≤ m. But, this is just solving m systems of linear equations with the same coefficient
matrix. To do this, we row reduce the multiple augmented matrix $[\, A \mid I_m \,]$ to RREF.
For this to work, we require that all m systems are consistent. By the result of Exercise 3.1.7 (b) we get that rank(A) = m. Hence, by Theorem 2.2.4, we must have
n ≥ m.
Moreover, if n > m, then by the System Rank Theorem (2) we will have infinitely
many solutions and hence infinitely many right inverses.
On the other hand, if n < m, then the systems cannot all be consistent and hence A
cannot have a right inverse. We state this as a theorem.
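Under the assumption rank(A) = m, one particular right inverse can be written down directly: B = A^T(AA^T)^{-1} satisfies AB = AA^T(AA^T)^{-1} = I. A minimal NumPy sketch using the matrix from Example 1 (this generally produces a different right inverse from the one shown there, since with n > m there are infinitely many):

```python
import numpy as np

# A 2x3 matrix of rank 2, so rank(A) = m and right inverses exist.
A = np.array([[1.0, 2.0, 2.0],
              [-1.0, 3.0, 1.0]])

# One right inverse: B = A^T (A A^T)^{-1}; A A^T is invertible
# because the rows of A are linearly independent.
B = A.T @ np.linalg.inv(A @ A.T)
check = A @ B
```

The product `check` is the 2 × 2 identity matrix, confirming B is a right inverse.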
THEOREM 1 If A is an m × n matrix with m > n, then A cannot have a right inverse.
COROLLARY 2 If A is an m × n matrix with m < n, then A cannot have a left inverse.
Proof: If A has a left inverse C, so that CA = In, then C is an n × m matrix with n > m
that has a right inverse (namely A), which contradicts the previous theorem. □
We see that only an n × n matrix can have a left and a right inverse.
DEFINITION
Square Matrix
An n × n matrix is called a square matrix.
We now show that if a matrix has both a left and a right inverse, then the left and right
inverses must be equal.
THEOREM 3 If A, B, and C are n × n matrices such that AB = I = CA, then B = C.
Proof: We have
$$B = IB = (CA)B = C(AB) = CI = C$$
□
DEFINITION
Matrix Inverse
Invertible
Let A be an n × n matrix. If B is a matrix such that AB = I = BA, then B is called the
inverse of A. We write B = A−1 and we say that A is invertible.
REMARKS
1. Observe from the definition that if B = A−1, then B is invertible and A = B−1.
2. In the definition we said that B is ‘the’ inverse of A. The fact that the inverse
of a matrix is unique follows immediately from Theorem 3.
EXAMPLE 2 Observe that
$$\begin{bmatrix} 3 & -1 \\ -2 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 2 & 3 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 2 & 3 \end{bmatrix} \begin{bmatrix} 3 & -1 \\ -2 & 1 \end{bmatrix}$$
Hence
$$\begin{bmatrix} 3 & -1 \\ -2 & 1 \end{bmatrix}^{-1} = \begin{bmatrix} 1 & 1 \\ 2 & 3 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 1 & 1 \\ 2 & 3 \end{bmatrix}^{-1} = \begin{bmatrix} 3 & -1 \\ -2 & 1 \end{bmatrix}$$
EXERCISE 1 Is $A = \begin{bmatrix} 1 & 1 & 2 \\ 1 & 2 & 2 \\ 1 & -1 & 3 \end{bmatrix}$ the inverse of $B = \begin{bmatrix} 8 & -5 & -2 \\ -1 & 1 & 0 \\ -3 & 2 & 1 \end{bmatrix}$?
The following theorem is quite useful. It not only shows that to prove that B is the
inverse of a square matrix A we only need to show that B is the left or right inverse
of A, but it also shows that the rank of an invertible matrix is n.
THEOREM 4 If A and B are n × n matrices such that AB = I, then A and B are invertible and
rank A = rank B = n.
Proof: Assume that AB = I and consider the homogeneous system B~x = ~0. We get
$$\vec{0} = A\vec{0} = A(B\vec{x}) = (AB)\vec{x} = I\vec{x} = \vec{x}$$
So, the system has a unique solution and hence, by the System Rank Theorem, the
coefficient matrix B has rank n. But, rank B = n implies that the system B~x = ~y
is consistent for all ~y ∈ Rn. So, for any ~y ∈ Rn we can write ~y = B~x, which gives
$$BA\vec{y} = BA(B\vec{x}) = B(AB)\vec{x} = BI\vec{x} = B\vec{x} = \vec{y} = I\vec{y}$$
Thus, BA~y = I~y for every ~y ∈ Rn and hence BA = I by the Matrices Equal Theorem. Therefore, A and B are invertible. Moreover, since BA = I we can repeat the
argument above to prove that rank A = n. □
We now use this result to prove the following theorem.
THEOREM 5 If A and B are invertible matrices and c ∈ R with c ≠ 0, then
(1) $(cA)^{-1} = \frac{1}{c}A^{-1}$
(2) $(A^T)^{-1} = (A^{-1})^T$
(3) $(AB)^{-1} = B^{-1}A^{-1}$
Proof: We just prove (1) and leave (2) and (3) as exercises.
By Theorem 4, to show that B = A^{-1} we just need to show that AB = I. Thus, to
show that $(cA)^{-1} = \frac{1}{c}A^{-1}$ we just need to show that $(cA)\left(\frac{1}{c}A^{-1}\right) = I$. We have
$$(cA)\left(\frac{1}{c}A^{-1}\right) = \frac{c}{c}AA^{-1} = 1I = I$$
□
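All three identities in Theorem 5 can be spot-checked numerically. A quick sketch using the invertible matrices from Problem 8 of this section:

```python
import numpy as np

A = np.array([[2.0, 1.0], [3.0, 1.0]])  # det = -1, invertible
B = np.array([[1.0, 2.0], [3.0, 5.0]])  # det = -1, invertible
c = 2.0

# (1) (cA)^{-1} = (1/c) A^{-1}
lhs1, rhs1 = np.linalg.inv(c * A), (1 / c) * np.linalg.inv(A)
# (2) (A^T)^{-1} = (A^{-1})^T
lhs2, rhs2 = np.linalg.inv(A.T), np.linalg.inv(A).T
# (3) (AB)^{-1} = B^{-1} A^{-1}  (note the reversed order)
lhs3, rhs3 = np.linalg.inv(A @ B), np.linalg.inv(B) @ np.linalg.inv(A)
```

Each left-hand side agrees with its right-hand side to floating-point precision.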
The statement of Theorem 4 says that for an n × n matrix A to be invertible, it is
necessary that rank A = n. We now show that rank A = n is also sufficient for A to be
invertible.
THEOREM 6 If A is an n × n matrix such that rank A = n, then A is invertible.
Proof: If rank A = n, then the systems of equations $A\vec{b}_i = \vec{e}_i$, 1 ≤ i ≤ n, are all
consistent by the System Rank Theorem. Hence, if we let $B = \begin{bmatrix} \vec{b}_1 & \cdots & \vec{b}_n \end{bmatrix}$, then
$$AB = A\begin{bmatrix} \vec{b}_1 & \cdots & \vec{b}_n \end{bmatrix} = \begin{bmatrix} A\vec{b}_1 & \cdots & A\vec{b}_n \end{bmatrix} = \begin{bmatrix} \vec{e}_1 & \cdots & \vec{e}_n \end{bmatrix} = I$$
Thus, A is invertible by Theorem 4. □
Observe that the proof of this theorem teaches us one way of finding the inverse of
a matrix A. The columns of the inverse are the vectors ~bi such that $A\vec{b}_i = \vec{e}_i$ for
1 ≤ i ≤ n. Since each of these systems has the same coefficient matrix A, we
can solve them by making one multiple augmented matrix and row reducing. In
particular, assuming that A is invertible we get
$$[\, A \mid I \,] \sim [\, I \mid A^{-1} \,]$$
REMARK
Compare this with the method for finding the change of coordinates matrix from
standard coordinates in Rn to any other basis B of Rn.
EXAMPLE 3 Find the inverse of $A = \begin{bmatrix} 1 & -2 & 3 \\ -1 & 4 & 1 \\ 1 & -1 & 4 \end{bmatrix}$.
Solution: We row reduce the multiple augmented matrix [ A | I ].
$$\left[\begin{array}{ccc|ccc} 1 & -2 & 3 & 1 & 0 & 0 \\ -1 & 4 & 1 & 0 & 1 & 0 \\ 1 & -1 & 4 & 0 & 0 & 1 \end{array}\right] \sim \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & -17/2 & -5/2 & 7 \\ 0 & 1 & 0 & -5/2 & -1/2 & 2 \\ 0 & 0 & 1 & 3/2 & 1/2 & -1 \end{array}\right]$$
Thus, $A^{-1} = \begin{bmatrix} -17/2 & -5/2 & 7 \\ -5/2 & -1/2 & 2 \\ 3/2 & 1/2 & -1 \end{bmatrix}$.
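The row reduction in Example 3 can be double-checked with NumPy, which computes the same inverse by (essentially) the same elimination process:

```python
import numpy as np

A = np.array([[1.0, -2.0, 3.0],
              [-1.0, 4.0, 1.0],
              [1.0, -1.0, 4.0]])
A_inv = np.linalg.inv(A)

# The inverse found by hand in Example 3.
expected = np.array([[-17/2, -5/2, 7],
                     [-5/2, -1/2, 2],
                     [3/2, 1/2, -1]])
```

`A_inv` matches `expected`, and `A @ A_inv` is the identity.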
EXERCISE 2 Find the inverse of $A = \begin{bmatrix} 1 & 2 & -1 \\ 0 & 1 & 2 \\ -1 & 3 & 10 \end{bmatrix}$.
EXAMPLE 4 Let A ∈ M2×2(R). Find a condition on the entries of A that guarantees that A is
invertible and then, assuming that A is invertible, find A^{-1}.
Solution: Let $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$. For A to be invertible, we require that rank A = 2. Thus,
we cannot have a = 0 = c. Assume that a ≠ 0. Row reducing [ A | I ] we get
$$\left[\begin{array}{cc|cc} a & b & 1 & 0 \\ c & d & 0 & 1 \end{array}\right] \overset{R_2 - \frac{c}{a}R_1}{\sim} \left[\begin{array}{cc|cc} a & b & 1 & 0 \\ 0 & d - \frac{bc}{a} & -\frac{c}{a} & 1 \end{array}\right] \overset{aR_2}{\sim} \left[\begin{array}{cc|cc} a & b & 1 & 0 \\ 0 & ad - bc & -c & a \end{array}\right]$$
Since we need rank A = 2, we now require that ad − bc ≠ 0. Assuming this, we
continue row reducing to get
$$\left[\begin{array}{cc|cc} a & b & 1 & 0 \\ 0 & ad - bc & -c & a \end{array}\right] \sim \left[\begin{array}{cc|cc} 1 & 0 & \frac{d}{ad-bc} & \frac{-b}{ad-bc} \\ 0 & 1 & \frac{-c}{ad-bc} & \frac{a}{ad-bc} \end{array}\right]$$
Thus, we get that A is invertible if and only if ad − bc ≠ 0. Moreover, if A is invertible,
then
$$A^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$$
Repeating the argument in the case that c ≠ 0 gives the same result.
REMARK
This quantity ad − bc, which determines whether or not a 2 × 2 matrix is invertible,
is called the determinant of the matrix. We will look at this more closely later in the chapter.
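The 2 × 2 formula from Example 4 is short enough to implement directly. A minimal sketch (the function name is our own):

```python
import numpy as np

def inv2x2(M):
    """Inverse of a 2x2 matrix via the ad - bc formula from Example 4."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible: ad - bc = 0")
    return (1 / det) * np.array([[d, -b], [-c, a]])

# The matrix from Example 2, whose inverse we already know.
M = np.array([[1.0, 1.0], [2.0, 3.0]])
M_inv = inv2x2(M)
```

Here `M_inv` reproduces the inverse from Example 2, [[3, −1], [−2, 1]].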
EXERCISE 3 Let $A = \begin{bmatrix} a & b & c \\ 0 & d & e \\ 0 & 0 & f \end{bmatrix}$. Find a condition on the entries of A that guarantees that A is
invertible and then, assuming that A is invertible, find A^{-1}.
Invertible Matrix Theorem
The following theorem ties together many of the concepts we have looked at previ-
ously. Additionally, we will find it very useful throughout the rest of the book.
THEOREM 7 (Invertible Matrix Theorem)
For any n × n matrix A, the following are equivalent:
(1) A is invertible
(2) The RREF of A is I
(3) rank A = n
(4) The system of equations A~x = ~b is consistent with a unique solution for all ~b ∈ Rn
(5) The nullspace of A is {~0}
(6) The columns of A form a basis for Rn
(7) The rows of A form a basis for Rn
(8) AT is invertible
This theorem says that if any one of the eight items is true for an n × n matrix, then
all of the other items are also true. We leave the proof of this theorem as an important
exercise. It is an excellent way of testing your knowledge and understanding of
much of the material we have covered up to this point.
REMARK
The Invertible Matrix Theorem is not limited to these eight statements. At this point,
you should be able to add many more to the list. Additionally, we will add a few
more statements to it as we proceed through the text.
Parts (1) and (4) of the Invertible Matrix Theorem tell us that if A is invertible, then
the system of equations A~x = ~b is consistent with a unique solution. What is the
solution? Observe that since A is invertible, we have
$$\begin{aligned} A^{-1}A\vec{x} &= A^{-1}\vec{b} \\ I\vec{x} &= A^{-1}\vec{b} \\ \vec{x} &= A^{-1}\vec{b} \end{aligned}$$
Hence, if A is invertible, we can solve the system A~x = ~b by simply computing the
matrix-vector product A^{-1}~b.
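Numerically, both routes give the same answer; here is a short check using the system solved by hand in Example 5 below. (In numerical practice a direct solver is usually preferred over forming the inverse explicitly, but for exact small systems the two agree.)

```python
import numpy as np

A = np.array([[3.0, -2.0],
              [1.0, 5.0]])
b = np.array([4.0, 7.0])

x_via_inverse = np.linalg.inv(A) @ b   # x = A^{-1} b
x_via_solve = np.linalg.solve(A, b)    # elimination, no explicit inverse
```

Both computations return the solution (2, 1).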
EXAMPLE 5 Solve the system of linear equations
$$\begin{aligned} 3x_1 - 2x_2 &= 4 \\ x_1 + 5x_2 &= 7 \end{aligned}$$
Solution: We can represent the system as A~x = ~b where $A = \begin{bmatrix} 3 & -2 \\ 1 & 5 \end{bmatrix}$ and $\vec{b} = \begin{bmatrix} 4 \\ 7 \end{bmatrix}$.
From our work in Example 4, we see that A is invertible with $A^{-1} = \frac{1}{17}\begin{bmatrix} 5 & 2 \\ -1 & 3 \end{bmatrix}$.
Hence, the solution of the system is
$$\vec{x} = A^{-1}\vec{b} = \frac{1}{17}\begin{bmatrix} 5 & 2 \\ -1 & 3 \end{bmatrix}\begin{bmatrix} 4 \\ 7 \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$$
EXAMPLE 6 Let $A = \begin{bmatrix} 1 & -2 & 3 \\ -1 & 4 & 1 \\ 1 & -1 & 4 \end{bmatrix}$. Solve the systems $A\vec{x} = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}$ and $A\vec{y} = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}$.
Solution: In Example 3 we found that $A^{-1} = \begin{bmatrix} -17/2 & -5/2 & 7 \\ -5/2 & -1/2 & 2 \\ 3/2 & 1/2 & -1 \end{bmatrix}$. Hence, the solution of the first system is
$$\vec{x} = A^{-1}\vec{b} = \begin{bmatrix} -17/2 & -5/2 & 7 \\ -5/2 & -1/2 & 2 \\ 3/2 & 1/2 & -1 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} -3/2 \\ -1/2 \\ 1/2 \end{bmatrix}$$
and the solution of the second system is
$$\vec{y} = A^{-1}\vec{b} = \begin{bmatrix} -17/2 & -5/2 & 7 \\ -5/2 & -1/2 & 2 \\ 3/2 & 1/2 & -1 \end{bmatrix}\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} = \begin{bmatrix} -13/2 \\ -3/2 \\ 3/2 \end{bmatrix}$$
The example above shows that once we have found A−1 we can solve any system of
linear equations with coefficient matrix A very quickly. Moreover, at first glance, it
might seem like we are solving the system of linear equations without row reducing.
However, this is an illusion. The row operations are in fact “stored” inside of the
inverse of the matrix (we used row operations to find the inverse). In the next section
we look at how to discover those stored row operations.
Section 5.1 Problems
1. For each matrix, find the inverse or show that the matrix is not invertible.
(a) $\begin{bmatrix} 2 & 1 \\ -3 & 4 \end{bmatrix}$ (b) $\begin{bmatrix} 1 & 3 \\ 1 & -6 \end{bmatrix}$ (c) $\begin{bmatrix} 2 & 4 \\ 3 & 6 \end{bmatrix}$ (d) $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix}$
(e) $\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}$ (f) $\begin{bmatrix} 1 & 2 & 1 \\ 0 & -2 & 4 \\ 3 & 4 & 4 \end{bmatrix}$ (g) $\begin{bmatrix} 3 & 1 & -2 \\ 1 & 2 & 1 \\ 2 & 1 & 1 \end{bmatrix}$ (h) $\begin{bmatrix} 1 & 0 & 1 \\ 0 & 2 & -3 \\ 0 & 0 & 3 \end{bmatrix}$
(i) $\begin{bmatrix} -1 & 0 & -1 & 2 \\ 1 & -1 & -2 & 4 \\ -3 & -1 & 5 & 2 \\ -1 & -2 & 4 & 4 \end{bmatrix}$ (j) $\begin{bmatrix} 1 & 2 & 3 & 2 \\ 1 & 1 & 1 & 1 \\ -1 & -3 & 1 & 4 \\ 0 & 1 & -3 & -5 \end{bmatrix}$
2. Let $A = \begin{bmatrix} 1 & -2 & 2 \\ -1 & 1 & 0 \\ 3 & -2 & -3 \end{bmatrix}$. Find A^{-1} and use it to solve A~x = ~d where:
(a) $\vec{d} = \begin{bmatrix} 1 \\ -2 \\ 3 \end{bmatrix}$ (b) $\vec{d} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$ (c) $\vec{d} = \begin{bmatrix} 1/2 \\ -1/3 \\ \sqrt{2} \end{bmatrix}$
3. Let $A = \begin{bmatrix} \sqrt{2} & 3/2 \\ 1/2 & 0 \end{bmatrix}$. Find A^{-1} and use it to solve A~x = ~d where:
(a) $\vec{d} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ (b) $\vec{d} = \begin{bmatrix} \sqrt{2} \\ 1 \end{bmatrix}$ (c) $\vec{d} = \begin{bmatrix} \sqrt{3} \\ \pi \end{bmatrix}$
4. Find all right inverses of $A = \begin{bmatrix} 3 & 1 & 0 \\ 1 & 2 & 0 \end{bmatrix}$.
5. Find all right inverses of $A = \begin{bmatrix} 1 & -2 & 1 \\ 1 & -1 & 1 \end{bmatrix}$.
6. Find all left inverses of $B = \begin{bmatrix} 1 & 1 \\ -2 & -1 \\ 1 & 1 \end{bmatrix}$.
7. Find all left inverses of $B = \begin{bmatrix} 1 & 2 \\ 0 & -1 \\ 3 & 3 \end{bmatrix}$.
8. Let $A = \begin{bmatrix} 2 & 1 \\ 3 & 1 \end{bmatrix}$ and $B = \begin{bmatrix} 1 & 2 \\ 3 & 5 \end{bmatrix}$.
(a) Find A^{-1} and B^{-1}.
(b) Calculate AB and (AB)^{-1} and verify that (AB)^{-1} = B^{-1}A^{-1}.
(c) Calculate (2A)^{-1} and verify that it equals $\frac{1}{2}A^{-1}$.
(d) Calculate (A^T)^{-1} and verify that A^T(A^T)^{-1} = I.
9. Prove that $(A^T)^{-1} = (A^{-1})^T$.
10. Let {~v1, . . . ,~vk} be a linearly independent set in
Rn and let A be an invertible n×n matrix. Prove
that {A~v1, . . . , A~vk} is linearly independent.
11. (a) Prove that if A and B are n×n matrices such
that AB is invertible, then A and B are invert-
ible.
(b) Find non-invertible matrices C and D such
that CD is invertible.
12. Prove that if A is an m × n matrix with m > n,
then A cannot have a right inverse.
13. Prove or disprove the following statements. In
each case, A and B are both n × n matrices.
(a) If C is a 3 × 2 matrix, then C has a left
inverse.
(b) If Null(AT ) = {~0}, then A is invertible.
(c) If A and B are invertible matrices, then
A + B is invertible.
(d) If AA = A and A is not the zero matrix,
then A is invertible.
(e) If AA is invertible, then A is invertible.
(f) If A and B satisfy AB = On,n, then B is not
invertible.
(g) There exists a basis {A1, A2, A3, A4} for
M2×2(R) such that each Ai is invertible.
(h) If A has a column of zeroes, then A is not
invertible.
(i) If A~x = ~0 has a unique solution, then
Col(A) = Rn.
(j) If rank AT = n, then rank A = n.
(k) If A is invertible, then the columns of AT
are linearly independent.
(l) If A satisfies $A^2 - 2A + I = O_{n,n}$, then A is invertible.
14. (a) Find n × n matrices A and B such that $(AB)^{-1} \neq A^{-1}B^{-1}$.
(b) What condition is required on A and B so that we do have $(AB)^{-1} = A^{-1}B^{-1}$?
5.2 Elementary Matrices
Consider the system of linear equations A~x = ~b where A is invertible. We compare
our two methods of solving this system. First, we can solve this system using the
methods of Chapter 2 to row reduce the augmented matrix:
$$[\, A \mid \vec{b} \,] \sim [\, I \mid \vec{x} \,]$$
Alternately, we find A^{-1} using the method in the previous section. We row reduce
$$[\, A \mid I \,] \sim [\, I \mid A^{-1} \,]$$
and then solve the system by computing $\vec{x} = A^{-1}\vec{b}$.
Observe that in both methods we can use exactly the same row operations to row
reduce A to I. In the first method, we are applying those row operations directly
to ~b to determine ~x. Thus, in the second method the row operations used are being
“stored” inside of A−1 and the matrix vector product A−1~b is “performing” those row
operations on ~b so that we get the same answer as in the first method.
This not only shows us that A−1 is made of elementary row operations, but also that
there is a close connection between matrix multiplication and performing elementary
row operations.
DEFINITION
Elementary Matrix
An n × n matrix E is called an elementary matrix if it can be obtained from the n × n
identity matrix by performing exactly one elementary row operation.
EXAMPLE 1 Find the 3 × 3 elementary matrix corresponding to each row operation by performing
the row operation on the 3 × 3 identity matrix.
(a) R2 − 3R1
Solution: We have
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \overset{R_2 - 3R_1}{\sim} \begin{bmatrix} 1 & 0 & 0 \\ -3 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
Thus, the corresponding elementary matrix is $E_1 = \begin{bmatrix} 1 & 0 & 0 \\ -3 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$.
(b) R1 ↔ R2
Solution: We have
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \overset{R_1 \leftrightarrow R_2}{\sim} \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
Thus, the corresponding elementary matrix is $E_2 = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$.
(c) 3R3
Solution: We have
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \overset{3R_3}{\sim} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 3 \end{bmatrix}$$
Thus, the corresponding elementary matrix is $E_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 3 \end{bmatrix}$.
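Each type of elementary matrix in Example 1 is just the identity with one row operation applied, which makes them easy to construct programmatically. A minimal sketch (helper names are our own; rows are 0-indexed in the code):

```python
import numpy as np

def add_multiple(n, i, j, c):
    """Elementary matrix for R_i + c*R_j."""
    E = np.eye(n)
    E[i, j] = c
    return E

def swap(n, i, j):
    """Elementary matrix for R_i <-> R_j."""
    E = np.eye(n)
    E[[i, j]] = E[[j, i]]
    return E

def scale(n, i, c):
    """Elementary matrix for c*R_i."""
    E = np.eye(n)
    E[i, i] = c
    return E

E1 = add_multiple(3, 1, 0, -3)   # R2 - 3R1, as in Example 1(a)
E2 = swap(3, 0, 1)               # R1 <-> R2, as in Example 1(b)
E3 = scale(3, 2, 3)              # 3R3, as in Example 1(c)
```

The three results reproduce the matrices found in Example 1.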
EXAMPLE 2 Determine which of the following matrices are elementary. For each elementary
matrix, indicate the associated elementary row operation.
(a) $\begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$
Solution: This matrix is elementary as it can be obtained from the 3 × 3 identity
matrix by performing the row operation R1 + 2R3.
(b) $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$
Solution: This matrix is elementary as it can be obtained from the 2 × 2 identity
matrix by performing the row operation R1 ↔ R2.
(c) $\begin{bmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 3 \end{bmatrix}$
Solution: This matrix is not elementary as it would require three elementary row
operations to get this matrix from the 3 × 3 identity matrix.
(d) $\begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$
Solution: This matrix is elementary as it can be obtained from the 3 × 3 identity
matrix by performing the row operation (−1)R2.
EXERCISE 1 Determine which of the following matrices are elementary. For each elementary
matrix, indicate the associated elementary row operation.
$$E_1 = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}, \quad E_2 = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}, \quad E_3 = \begin{bmatrix} 2 & 3 \\ 0 & 1 \end{bmatrix}$$
It is clear that the RREF of every elementary matrix is I; we just need to perform
the inverse (opposite) row operation to turn the elementary matrix E back to I. Thus,
we not only have that every elementary matrix is invertible, but the inverse is the
elementary matrix associated with the reverse row operation.
THEOREM 1 If E is an elementary matrix, then E is invertible and E−1 is also an elementary matrix.
EXAMPLE 3 Find the inverse of each of the following elementary matrices. Check your answer
by multiplying the matrices together.
(a) $E_1 = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}$
Solution: The inverse matrix $E_1^{-1}$ is the elementary matrix associated with the row
operation required to bring E1 back to I. That is, R1 − 2R2. Therefore,
$$E_1^{-1} = \begin{bmatrix} 1 & -2 \\ 0 & 1 \end{bmatrix}$$
Checking, we get $E_1^{-1}E_1 = \begin{bmatrix} 1 & -2 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$.
(b) $E_2 = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}$
Solution: The inverse matrix $E_2^{-1}$ is the elementary matrix associated with the row
operation required to bring E2 back to I. That is, R1 ↔ R3. Therefore,
$$E_2^{-1} = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}$$
Checking, we get $E_2^{-1}E_2 = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}\begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$.
(c) $E_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -3 \end{bmatrix}$
Solution: The inverse matrix $E_3^{-1}$ is the elementary matrix associated with the row
operation required to bring E3 back to I. That is, (−1/3)R3. Therefore,
$$E_3^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1/3 \end{bmatrix}$$
Checking, we get $E_3^{-1}E_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1/3 \end{bmatrix}\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -3 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$.
If we look at our calculations in the example above we see something interesting.
Consider an elementary matrix E. We saw that the product EI is the matrix obtained
from I by performing the row operation associated with E on I, and that E^{-1}E is the
matrix obtained from E by performing the row operation associated with E^{-1} on E.
We demonstrate this further with another example.
EXAMPLE 4 Let $E_1 = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$, $E_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 3 \end{bmatrix}$, and $A = \begin{bmatrix} 1 & 1 & 3 \\ 0 & -1 & 2 \\ 0 & 5 & 6 \end{bmatrix}$. Calculate E1A and
E2E1A. Describe the products in terms of matrices obtained from A by elementary row
operations.
Solution: By matrix multiplication, we get
$$E_1 A = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 1 & 3 \\ 0 & -1 & 2 \\ 0 & 5 & 6 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 3 \\ 2 & 1 & 8 \\ 0 & 5 & 6 \end{bmatrix}$$
Observe that E1A is the matrix obtained from A by performing the row operation
R2 + 2R1. That is, by performing the row operation associated with E1.
$$E_2 E_1 A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 3 \end{bmatrix}\begin{bmatrix} 1 & 1 & 3 \\ 2 & 1 & 8 \\ 0 & 5 & 6 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 3 \\ 2 & 1 & 8 \\ 0 & 15 & 18 \end{bmatrix}$$
Thus, E2E1A is the matrix obtained from A by first performing the row operation
R2 + 2R1, and then performing the row operation associated with E2, namely 3R3.
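The key fact in Example 4 — left-multiplying by an elementary matrix performs the corresponding row operation — can be checked directly in code:

```python
import numpy as np

A = np.array([[1, 1, 3],
              [0, -1, 2],
              [0, 5, 6]])
E1 = np.array([[1, 0, 0],
               [2, 1, 0],
               [0, 0, 1]])   # elementary matrix for R2 + 2R1

# Multiplying by E1 on the left...
product = E1 @ A

# ...matches applying R2 + 2R1 directly to the rows of A.
row_op = A.copy()
row_op[1] = row_op[1] + 2 * row_op[0]
```

`product` and `row_op` are identical, matching the E1A computed in Example 4.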
The following theorems prove that this useful fact is true for all elementary matrices.
THEOREM 2 If A is an m × n matrix and E is the m × m elementary matrix corresponding to the row
operation Ri + cRj, for i ≠ j, then EA is the matrix obtained from A by performing
the row operation Ri + cRj on A.
Proof: Observe that the i-th row of E satisfies $e_{ii} = 1$, $e_{ij} = c$, and $e_{ik} = 0$ for all
k ≠ i, j. So, the entries in the i-th row of EA are
$$(EA)_{ik} = \sum_{\ell=1}^{m} e_{i\ell}\, a_{\ell k} = e_{ij}a_{jk} + e_{ii}a_{ik} = c\,a_{jk} + a_{ik}$$
for 1 ≤ k ≤ n. Therefore, the i-th row of EA is the result of applying Ri + cRj to A. The remaining rows
of E are rows from the identity matrix, and hence the remaining rows of EA are just
the corresponding rows of A. □
The proofs of the following two theorems are similar to that of Theorem 2.
THEOREM 3 If A is an m × n matrix and E is the m × m elementary matrix corresponding to the
row operation cRi, then EA is the matrix obtained from A by performing the row
operation cRi on A.
THEOREM 4 If A is an m × n matrix and E is the m × m elementary matrix corresponding to the row
operation Ri ↔ Rj, for i ≠ j, then EA is the matrix obtained from A by performing
the row operation Ri ↔ Rj on A.
Since elementary row operations do not change the rank of a matrix, we immediately
get the following result.
COROLLARY 5 If A is an m × n matrix and E is an m × m elementary matrix, then
rank(EA) = rank A
Matrix Decomposition into Elementary Matrices
We began this section by asking about how the inverse of a matrix stores elementary
row operations. From our work above, we guess that these are stored as elementary
matrices. We now look at how to decompose a matrix into its elementary factors.
THEOREM 6 If A is an m × n matrix with reduced row echelon form R, then there exists a sequence
E1, . . . , Ek of m × m elementary matrices such that Ek · · · E2E1A = R. In particular,

A = E1⁻¹E2⁻¹ · · · Ek⁻¹R
Proof: Using the methods of Chapter 2, we know that we can row reduce a matrix A
to its RREF R with a sequence of elementary row operations. Let E1 denote the ele-
mentary matrix corresponding to the first row operation performed, E2 the elementary
matrix corresponding to the second row operation performed, up to Ek corresponding
to the last row operation performed. Then, Theorems 2, 3, and 4 give us that

EkEk−1 · · · E2E1A = R (5.1)

Because each elementary matrix is invertible, we get that the product EkEk−1 · · · E1 is
invertible by Theorem 5.1.5 (3). In particular, multiplying by the inverse of Ek · · · E1
on both sides of (5.1) we get that

A = (Ek · · · E1)⁻¹R = E1⁻¹E2⁻¹ · · · Ek⁻¹R

as required. ∎
EXAMPLE 5 Write A = [1 3 0 1; −1 −3 3 2; 2 7 0 2] as a product of elementary matrices and its RREF R.

Solution: We first row reduce A to R keeping track of the row operations used.

[1 3 0 1; −1 −3 3 2; 2 7 0 2]  R2 + R1, R3 − 2R1  ∼ [1 3 0 1; 0 0 3 3; 0 1 0 0]  R2 ↔ R3  ∼
[1 3 0 1; 0 1 0 0; 0 0 3 3]  (1/3)R3  ∼ [1 3 0 1; 0 1 0 0; 0 0 1 1]  R1 − 3R2  ∼ [1 0 0 1; 0 1 0 0; 0 0 1 1]

Next, we write the elementary matrix for each row operation we used. Observe that
each elementary matrix will be 3 × 3 since A has three rows. We have

E1 = [1 0 0; 1 1 0; 0 0 1], E2 = [1 0 0; 0 1 0; −2 0 1], E3 = [1 0 0; 0 0 1; 0 1 0],
E4 = [1 0 0; 0 1 0; 0 0 1/3], E5 = [1 −3 0; 0 1 0; 0 0 1]

and E5E4E3E2E1A = R. Then

A = E1⁻¹E2⁻¹E3⁻¹E4⁻¹E5⁻¹R
= [1 0 0; −1 1 0; 0 0 1][1 0 0; 0 1 0; 2 0 1][1 0 0; 0 0 1; 0 1 0][1 0 0; 0 1 0; 0 0 3][1 3 0; 0 1 0; 0 0 1][1 0 0 1; 0 1 0 0; 0 0 1 1]
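As a sanity check, the factorization found in this example can be verified numerically. This sketch assumes NumPy and re-enters the five elementary matrices by hand.

```python
import numpy as np

# The matrix A and elementary matrices E1..E5 from Example 5.
A = np.array([[1.0, 3, 0, 1],
              [-1, -3, 3, 2],
              [2, 7, 0, 2]])
E = [np.array([[1.0, 0, 0], [1, 1, 0], [0, 0, 1]]),    # R2 + R1
     np.array([[1.0, 0, 0], [0, 1, 0], [-2, 0, 1]]),   # R3 - 2R1
     np.array([[1.0, 0, 0], [0, 0, 1], [0, 1, 0]]),    # R2 <-> R3
     np.array([[1.0, 0, 0], [0, 1, 0], [0, 0, 1/3]]),  # (1/3)R3
     np.array([[1.0, -3, 0], [0, 1, 0], [0, 0, 1]])]   # R1 - 3R2

R = A
for Ei in E:                      # E5 E4 E3 E2 E1 A = R
    R = Ei @ R

prod = np.eye(3)
for Ei in E:                      # builds E1^(-1) E2^(-1) ... E5^(-1)
    prod = prod @ np.linalg.inv(Ei)

print(np.allclose(prod @ R, A))   # True: the inverses rebuild A from R
```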
EXAMPLE 6 Write A = [1 −1 0; −1 2 1; 0 1 1; 1 −1 0] as a product of elementary matrices and its RREF R.

Solution: We first row reduce A to R keeping track of the row operations used.

[1 −1 0; −1 2 1; 0 1 1; 1 −1 0]  R2 + R1, R4 − R1  ∼ [1 −1 0; 0 1 1; 0 1 1; 0 0 0]  R1 + R2, R3 − R2  ∼ [1 0 1; 0 1 1; 0 0 0; 0 0 0]

Next, we write the elementary matrix for each row operation we used. Observe that
each elementary matrix will be 4 × 4 since A has four rows. We have

E1 = [1 0 0 0; 1 1 0 0; 0 0 1 0; 0 0 0 1], E2 = [1 0 0 0; 0 1 0 0; 0 0 1 0; −1 0 0 1],
E3 = [1 1 0 0; 0 1 0 0; 0 0 1 0; 0 0 0 1], E4 = [1 0 0 0; 0 1 0 0; 0 −1 1 0; 0 0 0 1]

and E4E3E2E1A = R. Then,

A = E1⁻¹E2⁻¹E3⁻¹E4⁻¹R
= [1 0 0 0; −1 1 0 0; 0 0 1 0; 0 0 0 1][1 0 0 0; 0 1 0 0; 0 0 1 0; 1 0 0 1][1 −1 0 0; 0 1 0 0; 0 0 1 0; 0 0 0 1][1 0 0 0; 0 1 0 0; 0 1 1 0; 0 0 0 1][1 0 1; 0 1 1; 0 0 0; 0 0 0]
EXERCISE 2 Write A = [0 1 2 4; −2 3 4 4] as a product of elementary matrices and its RREF R.
If A is invertible, then we get that the reduced row echelon form of A is I. So, we
can write A entirely as a product of elementary matrices. Moreover, by Theorem 6
we have
Ek · · · E1A = I
and
A⁻¹ = Ek · · · E1
Consequently, we can also write A−1 as a product of elementary matrices. In particu-
lar, we have proven the following corollary.
COROLLARY 7 If A is an n × n invertible matrix, then A and A−1 can be written as a product of
elementary matrices.
EXAMPLE 7 Let A = [1 3; −3 −11]. Write A and A⁻¹ as products of elementary matrices.

Solution: Row reducing A to RREF gives

[1 3; −3 −11]  R2 + 3R1  ∼ [1 3; 0 −2]  (−1/2)R2  ∼ [1 3; 0 1]  R1 − 3R2  ∼ [1 0; 0 1]

Hence,

E1 = [1 0; 3 1], E2 = [1 0; 0 −1/2], E3 = [1 −3; 0 1]

and E3E2E1A = I. Thus,

A⁻¹ = E3E2E1 = [1 −3; 0 1][1 0; 0 −1/2][1 0; 3 1]

and

A = E1⁻¹E2⁻¹E3⁻¹ = [1 0; −3 1][1 0; 0 −2][1 3; 0 1]
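A numerical check of this example, assuming NumPy: E3E2E1A should be the identity, so E3E2E1 should be A⁻¹.

```python
import numpy as np

# The matrices from Example 7: E1 is R2 + 3R1, E2 is (-1/2)R2, E3 is R1 - 3R2.
A  = np.array([[1.0, 3], [-3, -11]])
E1 = np.array([[1.0, 0], [3, 1]])
E2 = np.array([[1.0, 0], [0, -0.5]])
E3 = np.array([[1.0, -3], [0, 1]])

print(np.allclose(E3 @ E2 @ E1 @ A, np.eye(2)))     # True: reduces A to I
print(np.allclose(E3 @ E2 @ E1, np.linalg.inv(A)))  # True: the product is A^(-1)
```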
EXAMPLE 8 Let A = [1 0 1; 1 0 3; −1 1 −1]. Write A and A⁻¹ as products of elementary matrices.

Solution: Row reducing A to RREF gives

[1 0 1; 1 0 3; −1 1 −1]  R2 − R1, R3 + R1  ∼ [1 0 1; 0 0 2; 0 1 0]  (1/2)R2  ∼ [1 0 1; 0 0 1; 0 1 0]  R2 ↔ R3  ∼
[1 0 1; 0 1 0; 0 0 1]  R1 − R3  ∼ [1 0 0; 0 1 0; 0 0 1]

Hence,

E1 = [1 0 0; −1 1 0; 0 0 1], E2 = [1 0 0; 0 1 0; 1 0 1], E3 = [1 0 0; 0 1/2 0; 0 0 1],
E4 = [1 0 0; 0 0 1; 0 1 0], E5 = [1 0 −1; 0 1 0; 0 0 1]

and E5E4E3E2E1A = I. Therefore,

A⁻¹ = E5E4E3E2E1 = [1 0 −1; 0 1 0; 0 0 1][1 0 0; 0 0 1; 0 1 0][1 0 0; 0 1/2 0; 0 0 1][1 0 0; 0 1 0; 1 0 1][1 0 0; −1 1 0; 0 0 1]

and

A = E1⁻¹E2⁻¹E3⁻¹E4⁻¹E5⁻¹ = [1 0 0; 1 1 0; 0 0 1][1 0 0; 0 1 0; −1 0 1][1 0 0; 0 2 0; 0 0 1][1 0 0; 0 0 1; 0 1 0][1 0 1; 0 1 0; 0 0 1]
We end this section with one additional result which will be used in a couple of proofs
later in the course.
THEOREM 8 If E is an m × m elementary matrix, then ET is an elementary matrix.
Proof: If E is obtained from I by multiplying a row by a non-zero constant c, then E
is diagonal, so ET = E.
If E is obtained from I by swapping row i and row j, then eij = 1 = eji, so ET = E.
Finally, if E is obtained from I by adding c times row i to row j, then ET is obtained
from I by adding c times row j to row i.
Hence, in all cases, ET is elementary. ∎
Section 5.2 Problems
1. Determine which of the following are elementary matrices. For each elementary
matrix, state the corresponding row operation and the inverse of the matrix.
(a) [1 3; 0 1]  (b) [1 0; 0 0]  (c) [1 0; −4 1]
(d) [1 0 0; 0 0 1; 0 1 0]  (e) [1 2 0; 0 1 0; 1 2 1]  (f) [1 0 0; 0 1 0; 0 0 −4]
(g) [1 0 1; 0 1 0; 1 0 1]  (h) [0 0 0 1; 0 1 0 0; 0 0 1 0; 1 0 0 0]
(i) [2 0 0 0; 0 2 0 0; 0 0 2 0; 0 0 0 2]  (j) [2 0 0 1; 0 1 0 0; 0 0 1 0; 0 0 0 1]
2. Write the 3 × 3 elementary matrix and its inverse
associated with each elementary row operation.
(a) R1 − 2R3  (b) R1 ↔ R2
(c) √2 R2  (d) R3 + 2R2
(e) R2 ↔ R3  (f) 3R1
(g) R2 + 2R1  (h) (1/2)R3
3. Write each matrix A as a product of elementary
matrices and its reduced row echelon form.
(a) A = [1 2; −1 1; 2 1]  (b) A = [1 −1 2; 2 1 1]
(c) A = [0 1 3; 0 −1 −2; 2 2 4]  (d) A = [2 4 1; 1 2 2; 1 2 2]
(e) A = [3 0 1; 2 1 0; 1 0 1]  (f) A = [2 1 1; 2 3 −1; 2 1 1]
(g) A = [1 0 2; 0 1 3; −1 1 1; −2 2 5]  (h) A = [1 2 1 2; 0 0 1 2; −1 −1 3 6]
4. Write A and A⁻¹ as a product of elementary matrices.
(a) A = [2 1; 3 3]  (b) A = [1 1 3; 1 2 2; 1 1 1]
(c) A = [0 3; 1 −2]  (d) A = [0 2 6; 1 4 4; −1 2 8]
(e) A = [4 2; 2 2]  (f) A = [1 0 −1; −2 0 −2; −4 1 4]
(g) A = [1 3; −3 1]  (h) A = [1 −2 4 1; −1 3 −4 −1; 0 1 2 0; −2 4 −8 −1]
5. Without using matrix-matrix multiplication, calculate A where
A = [1 2; 0 1][1/2 0; 0 1][0 1; 1 0][1 0; −3 1][1 0; 0 3][1 1; 0 1]
6. Let A = [1 2; 2 6] and ~b = [3; 5].
(a) Determine elementary matrices E1, E2, and E3 such that E3E2E1A = I.
(b) Since A is invertible, we know that the system A~x = ~b has unique solution
~x = A⁻¹~b = E3E2E1~b
Calculate the solution ~x without using matrix-matrix multiplication.
(c) Solve the system A~x = ~b by row reducing [A | ~b]. Observe the operations
that you use on the augmented part of the system and compare with part (b).
7. Prove that if A is invertible and B is row equiv-
alent to A, then B is also invertible.
8. We have seen that multiplying an m × n matrix
A on the left by an elementary matrix has the
same effect as performing the elementary row
operation corresponding to the elementary ma-
trix on A. If E is an n × n elementary matrix,
then what effect does multiplying by A on the
right by E have? Justify your answer.
5.3 Determinants
In Example 5.1.4 we showed that a matrix A = [a b; c d] is invertible if and only if
ad − bc ≠ 0. Since this quantity determines whether the matrix is invertible or not,
we make the following definition.
DEFINITION
2 × 2 Determinant
Let A = [a b; c d]. The determinant of A is
det A = ad − bc
REMARK
The determinant is often denoted by vertical straight lines:
|a b; c d| = det [a b; c d] = ad − bc
EXAMPLE 1 Evaluate det [3 5; −1 2] and |3 6; 2 4|.

Solution: We have
det [3 5; −1 2] = 3(2) − 5(−1) = 11
|3 6; 2 4| = 3(4) − 6(2) = 0
We, of course, want to extend the definition of the determinant to n × n matrices. So,
we next consider the 3 × 3 case. Let A = [a11 a12 a13; a21 a22 a23; a31 a32 a33]. It can be shown (with some
effort) that A is invertible if and only if
a11a22a33 − a11a23a32 − a12a21a33 + a13a21a32 + a12a23a31 − a13a22a31 ≠ 0
Thus, we define
det A = a11a22a33 − a11a23a32 − a12a21a33 + a13a21a32 + a12a23a31 − a13a22a31
However, we would like to write the formula in a nicer form. Moreover, we would
like to find some sort of pattern with this and the 2 × 2 determinant so that we can
figure out how to define the determinant for general n × n matrices. We observe that
a11a22a33 − a11a23a32 − a12a21a33 + a13a21a32 + a12a23a31 − a13a22a31
= a11(a22a33 − a23a32) − a21(a12a33 − a13a32) + a31(a12a23 − a13a22)
= a11 |a22 a23; a32 a33| − a21 |a12 a13; a32 a33| + a31 |a12 a13; a22 a23|   (5.2)
Therefore, the 3×3 determinant can be written as a linear combination of 2×2 deter-
minants of submatrices of A where the coefficients are (plus or minus) the coefficients
from the first column of A.
To extend all of this to n × n matrices we will recursively use the same pattern. We
first make a definition.
DEFINITION
Cofactor
Let A be an n × n matrix with n ≥ 2. Let A(i, j) be the (n − 1) × (n − 1) matrix obtained
from A by deleting the i-th row and the j-th column. The cofactor of aij is
Cij = (−1)^{i+j} det A(i, j)
So far, we only have a definition of a determinant for up to 3 × 3 matrices. So,
we currently can only use this definition up to n = 4. We will rectify that problem
shortly.
EXAMPLE 2 Find all nine of the cofactors of A = [1 0 2; 0 −1 3; −2 −3 4].
Solution: The cofactor C11 of a11 is defined by C11 = (−1)^{1+1} det A(1, 1), where
A(1, 1) is the matrix obtained from A by deleting the first row and first column. That is
A(1, 1) = [−1 3; −3 4]
Hence, C11 = (−1)^{1+1} |−1 3; −3 4| = (1)[(−1)(4) − 3(−3)] = 5.
The cofactor C12 of a12 is defined by C12 = (−1)^{1+2} det A(1, 2), where A(1, 2) is the
matrix obtained from A by deleting the first row and second column. We have
A(1, 2) = [0 3; −2 4]
Thus, C12 = (−1)^{1+2} |0 3; −2 4| = (−1)[0(4) − 3(−2)] = −6.
Similarly, the cofactor C13 of a13 is defined by C13 = (−1)^{1+3} det A(1, 3), where A(1, 3)
is the matrix obtained from A by deleting the first row and third column. We have
A(1, 3) = [0 −1; −2 −3]
Thus, C13 = (−1)^{1+3} |0 −1; −2 −3| = (1)[0(−3) − (−1)(−2)] = −2.
Continuing in this way, we find that the other cofactors are:
C21 = (−1)^{2+1} |0 2; −3 4| = (−1)[0(4) − 2(−3)] = −6
C22 = (−1)^{2+2} |1 2; −2 4| = (1)[1(4) − 2(−2)] = 8
C23 = (−1)^{2+3} |1 0; −2 −3| = (−1)[1(−3) − 0(−2)] = 3
C31 = (−1)^{3+1} |0 2; −1 3| = (1)[0(3) − 2(−1)] = 2
C32 = (−1)^{3+2} |1 2; 0 3| = (−1)[1(3) − 2(0)] = −3
C33 = (−1)^{3+3} |1 0; 0 −1| = (1)[1(−1) − 0(0)] = −1
EXERCISE 1 Find all four of the cofactors of A = [−1 −2; 3 4].
Using cofactors we can rewrite equation (5.2) as
det A = a11C11 + a21C21 + a31C31 = Σ_{i=1}^{3} ai1Ci1
To define the determinant of an n × n matrix we repeat this pattern.
DEFINITION
n × n Determinant
Let A be an n × n matrix with n ≥ 2. The determinant of A is defined to be
det A = Σ_{i=1}^{n} ai1Ci1
where the determinant of a 1 × 1 matrix is defined by det [c] = c.
REMARKS
1. Notice that the definition of a determinant is recursive. The determinant of
an n × n matrix is defined in terms of cofactors which are determinants of
(n − 1) × (n − 1) matrices.
2. As we did with 2 × 2 matrices, we often represent the determinant of a matrix
with vertical straight lines. For example, we write
det [a11 a12 a13; a21 a22 a23; a31 a32 a33] = |a11 a12 a13; a21 a22 a23; a31 a32 a33|
EXAMPLE 3 Find the determinant of A = [1 0 2; 0 1 1; −2 2 3].

Solution: By definition, we have
det A = a11C11 + a21C21 + a31C31
= 1(−1)^{1+1} |1 1; 2 3| + 0(−1)^{2+1} |0 2; 2 3| + (−2)(−1)^{3+1} |0 2; 1 1|
= 1(1)(1(3) − 1(2)) + 0(−1)(0(3) − 2(2)) + (−2)(1)(0(1) − 2(1))
= 1 + 0 + 4 = 5
In the example above we did not actually need to evaluate the cofactor C21 since the
coefficient a21 = 0. Such steps are normally omitted.
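The recursive definition translates directly into code. The sketch below (plain Python, no libraries) expands along the first column exactly as the definition prescribes; it is fine for small matrices, but as noted later in this section, this approach is far too slow for large ones.

```python
# Recursive determinant via cofactor expansion along the first column,
# with det [c] = c as the base case. Rows are plain Python lists.
def det_cofactor(A):
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for i in range(n):
        # A(i, 0): delete row i and the first column
        minor = [row[1:] for k, row in enumerate(A) if k != i]
        # sign (-1)^(i+1+1) is (-1)^i with 0-based indexing
        total += A[i][0] * (-1) ** i * det_cofactor(minor)
    return total

A = [[1, 0, 2],
     [0, 1, 1],
     [-2, 2, 3]]
print(det_cofactor(A))   # 5, as in Example 3
```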
EXAMPLE 4 Calculate |0 2 −4; 1 3 2; 0 1 −2|.

Solution: By definition, we have
det A = a11C11 + a21C21 + a31C31
= 1(−1)^{2+1} |2 −4; 1 −2|
= (−1)[2(−2) − (−4)(1)]
= 0
EXERCISE 2 Find the determinant of A = [−1 −2 1; −2 4 0; 0 1 1].
EXAMPLE 5 Find the determinant of B = [0 1 0 0; 1 0 1 0; 0 1 0 1; −2 0 2 3].

Solution: By definition, we have
det B = b11C11 + b21C21 + b31C31 + b41C41
= 1(−1)^{2+1} |1 0 0; 1 0 1; 0 2 3| + (−2)(−1)^{4+1} |1 0 0; 0 1 0; 1 0 1|
= (−1) |1 0 0; 1 0 1; 0 2 3| + 2 |1 0 0; 0 1 0; 1 0 1|

We now need to evaluate these 3 × 3 determinants. By definition, we get
det B = (−1)( (−1)^{1+1} |0 1; 2 3| + (−1)^{2+1} |0 0; 2 3| ) + 2( (−1)^{1+1} |1 0; 0 1| + (−1)^{3+1} |0 0; 1 0| )
= (−1)[ (0(3) − 1(2)) + (−1)(0(3) − 0(2)) ] + 2[ (1(1) − 0(0)) + (0(0) − 0(1)) ]
= 2 + 2 = 4
This example clearly demonstrates the recursive nature of the definition of the de-
terminant. In particular, notice that we were quite lucky to have lots of zeros in the
matrix as otherwise we would have had many more cofactors to evaluate. In general,
if we just use a cofactor expansion to find the determinant of an n × n matrix, we
need to calculate the value of n cofactors. Each of these cofactors is in terms of
a determinant of an (n − 1) × (n − 1) matrix. To calculate the determinant of the
(n − 1) × (n − 1) matrices, we need to calculate the value of n − 1 determinants of
(n − 2) × (n − 2) matrices, and so on. For even a relatively small value of n this can
be a lot of work. Moreover, in some modern applications, it is necessary to calcu-
late determinants of very large matrices. Therefore, we want to find a faster way of
evaluating a determinant.
The Cofactor Expansion
Observe that our way of factoring the 3×3 determinant in (5.2) was not the only way
that equation could be factored. We also could get
det A = a12C12 + a22C22 + a32C32
det A = a13C13 + a23C23 + a33C33
det A = a11C11 + a12C12 + a13C13
det A = a21C21 + a22C22 + a23C23
det A = a31C31 + a32C32 + a33C33
Hence, we did not have to use the coefficients and cofactors from the first column of
the matrix. We can in fact use the coefficients and cofactors from any row or column
of the matrix. For n × n matrices we have the following theorem.
THEOREM 1 Let A be an n × n matrix. For any i with 1 ≤ i ≤ n,
det A = Σ_{k=1}^{n} aikCik
called the cofactor expansion across the i-th row, OR for any j with 1 ≤ j ≤ n,
det A = Σ_{k=1}^{n} akjCkj
called the cofactor expansion across the j-th column.
REMARK
The proof of this theorem is fairly lengthy and so is omitted. However, the proof is
actually quite interesting, so you may wish to search for it online. Alternatively, if
you are interested in a challenge, try proving it yourself.
This theorem allows us to expand the determinant along the row or column with the
most zeros.
EXAMPLE 6 Find the determinant of each of the following matrices.
(a) A = [1 2 0; −3 4 6; −2 5 0].
Solution: Using the cofactor expansion along the third column gives
det A = a13C13 + a23C23 + a33C33 = 6(−1)^{2+3} |1 2; −2 5| = −6(1(5) − 2(−2)) = −54
(b) B = [1 4 2 5; 0 1 0 0; −1 3 0 1; 0 −2 3 0]
Solution: Using the cofactor expansion along the second row gives
det B = b21C21 + b22C22 + b23C23 + b24C24 = 1(−1)^{2+2} |1 2 5; −1 0 1; 0 3 0|
We now expand the 3 × 3 determinant along the third row to get
det B = 1(3)(−1)^{3+2} |1 5; −1 1| = −3(1(1) − 5(−1)) = −18
(c) C = [1 51 −73 29; 0 2 −16 15; 0 0 3 −99; 0 0 0 4].
Solution: Using the cofactor expansion along the first column in each step gives
det C = 1(−1)^{1+1} |2 −16 15; 0 3 −99; 0 0 4| = 1(2)(−1)^{1+1} |3 −99; 0 4| = 1(2)(3(4) − (−99)(0)) = 1(2)(3)(4) = 24
(d) D = [3 −3 4 0 5; −6 8 9 0 3; −3 5 2 0 1; −5 −5 3 0 1; 3 7 3 0 1].
Solution: Using the cofactor expansion along the fourth column gives det D = 0.
EXERCISE 3 Find the determinant of each of the following matrices.
(a) A = [2 3 1; −4 2 −5; 0 −3 0]  (b) B = [4 0 0; −11 3 0; 15 17 8]
Part (c) in Example 6 and part (b) in Exercise 3 motivate the following definitions,
which lead to an important result.
DEFINITION
Upper Triangular
Lower Triangular
An m × n matrix U is said to be upper triangular if uij = 0 whenever i > j. An m × n
matrix L is said to be lower triangular if ℓij = 0 whenever i < j.
EXAMPLE 7 The matrices [1 2 3 1; 0 0 2 5; 0 0 −1 1], [1 1 1; 0 1 1; 0 0 1], and [−4]
are all upper triangular.
The matrices [3 0 0 0; 1 4 0 0; 0 0 −2 0], [1 0 0; 1 0 0; 1 0 0], and [0 0; 0 0]
are all lower triangular.
THEOREM 2 If an n × n matrix A is upper triangular or lower triangular, then
det A = a11a22 · · · ann
Proof: We prove this by induction on n for n × n upper triangular matrices. The
proof for lower triangular matrices is similar.
If A is a 1 × 1 matrix, then A is upper triangular and det A = a11 as required. Assume
that if B is an (n − 1) × (n − 1) upper triangular matrix, then det B = b11 · · · b(n−1)(n−1).
Let A be an n × n upper triangular matrix. Expanding the determinant along the first
column gives
det A = a11(−1)^{1+1} det A(1, 1) + 0 + · · · + 0 = a11 det A(1, 1)
But, A(1, 1) is the (n − 1) × (n − 1) upper triangular matrix formed by deleting the first
row and first column of A. Thus, det A(1, 1) = a22 · · · ann by the inductive hypothesis.
Hence, det A = a11a22 · · · ann as required. ∎
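A quick numerical illustration of this theorem, assuming NumPy and using the upper triangular matrix from Example 6(c):

```python
import numpy as np

# For a triangular matrix, the determinant is the product of the diagonal.
U = np.array([[1.0, 51, -73, 29],
              [0, 2, -16, 15],
              [0, 0, 3, -99],
              [0, 0, 0, 4]])

print(np.prod(np.diag(U)))            # 24.0, the product of the diagonal
print(np.isclose(np.linalg.det(U), 24))  # True (up to floating point)
```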
EXERCISE 4 Calculate the determinant of the following matrices.
A = [1 0 0; −√2 3 0; π 5 5], B = [2 −3 5; 0 2 3; 0 0 2], C = [3 0 1; 2 1 0; 1 0 0]
Determinants and Row Operations
We have seen that it is very difficult to calculate the determinant of a large matrix
with very few zeros, but very easy to calculate the determinant of an upper or lower
triangular matrix. Therefore, we now look at how applying elementary row oper-
ations to a matrix changes the determinant, so that we can row reduce a matrix to
upper (or lower) triangular form to make calculating the determinant much easier.
THEOREM 3 If A is an n × n matrix and B is the matrix obtained from A by multiplying one row
of A by c ∈ R, then det B = c det A.
Proof: We will prove the theorem by induction on n. For n = 1, the result is easily
verified. Assume the result holds for all (n − 1) × (n − 1) matrices with n ≥ 2. Let B
be the matrix obtained from A by multiplying the ℓ-th row of A by c. Then
bij = aij if i ≠ ℓ, and bij = caij if i = ℓ
Let B(x, y) denote the (n − 1) × (n − 1) submatrix of B obtained by deleting the x-th
row and y-th column of B where x ≠ ℓ. Then B(x, y) is the matrix obtained from
A(x, y) by multiplying some row of A(x, y) by c. Hence, by the inductive hypothesis,
det B(x, y) = c det A(x, y). Therefore, using the cofactor expansion along the x-th row
gives
det B = Σ_{k=1}^{n} bxk(−1)^{x+k} det B(x, k)
= Σ_{k=1}^{n} axk(−1)^{x+k}[c det A(x, k)]
= c Σ_{k=1}^{n} axk(−1)^{x+k} det A(x, k)
= c det A ∎
THEOREM 4 If A is an n× n matrix and B is the matrix obtained from A by swapping two rows of
A, then det B = − det A.
COROLLARY 5 If an n × n matrix A has two identical rows, then det A = 0.
Proof: If the i-th and j-th rows of A are identical, then let B be the matrix obtained
from A by swapping the i-th and j-th rows of A. Then, clearly A = B. But, by
Theorem 4, we have that det B = − det A. Hence, det A = − det A, so det A = 0. ∎
THEOREM 6 If A is an n × n matrix and B is the matrix obtained from A by adding a multiple of
one row of A to another row, then det B = det A.
Now we can find a determinant quite efficiently by row reducing the matrix to upper
triangular form, keeping track of how the row operations change the determinant.
Since adding a multiple of one row to another does not change the determinant, it
reduces the chance of errors if this is the only row operation used.
EXAMPLE 8 Evaluate the following determinants.
(a) |1 2 3 1; −1 −1 −1 2; 1 3 1 1; −2 −2 0 −1|.
Solution: Since adding a multiple of one row to another does not change the determinant, we get
|1 2 3 1; −1 −1 −1 2; 1 3 1 1; −2 −2 0 −1| = |1 2 3 1; 0 1 2 3; 0 1 −2 0; 0 2 6 1|
= |1 2 3 1; 0 1 2 3; 0 0 −4 −3; 0 0 2 −5| = |1 2 3 1; 0 1 2 3; 0 0 −4 −3; 0 0 0 −13/2|
= 1(1)(−4)(−13/2) = 26
(b) |3 2 −1 1; 2 −1 0 −3; 5 2 3 −2; 1 3 −1 4|
Solution: Using the row operation R4 + R2 and then R4 − R1, we get
|3 2 −1 1; 2 −1 0 −3; 5 2 3 −2; 1 3 −1 4| = |3 2 −1 1; 2 −1 0 −3; 5 2 3 −2; 3 2 −1 1|
= |3 2 −1 1; 2 −1 0 −3; 5 2 3 −2; 0 0 0 0| = 0
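The row-reduction strategy used above is easy to automate. The following sketch (assuming NumPy; this is a generic Gaussian-elimination routine, not code from the text) reduces to upper triangular form using only row swaps (each contributing a factor of −1) and Ri + cRj operations (which change nothing), then multiplies the diagonal.

```python
import numpy as np

def det_by_reduction(A):
    # Determinant by row reduction: track only sign flips from swaps.
    A = np.array(A, dtype=float)
    n = A.shape[0]
    sign = 1.0
    for j in range(n):
        # find a pivot row at or below the diagonal
        p = next((i for i in range(j, n) if A[i, j] != 0), None)
        if p is None:
            return 0.0                        # a zero column: det is 0
        if p != j:
            A[[j, p]] = A[[p, j]]             # Ri <-> Rj flips the sign
            sign = -sign
        for i in range(j + 1, n):
            A[i] -= (A[i, j] / A[j, j]) * A[j]  # Ri + cRj: no change
    return sign * np.prod(np.diag(A))

A = [[1, 2, 3, 1],
     [-1, -1, -1, 2],
     [1, 3, 1, 1],
     [-2, -2, 0, -1]]
print(det_by_reduction(A))   # 26.0, agreeing with Example 8(a)
```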
EXERCISE 5 Find the determinant of the following matrices.
A = [1 2 1 2; −1 3 −1 1; 2 3 3 2; 1 1 1 1], B = [1 4 2 3; −1 −4 −5 4; −1 −3 1 1; 0 0 1 1]
Determinants and Elementary Matrices
Since multiplying a matrix A on the left by an elementary matrix E performs the same
elementary row operation on A as the corresponding row operation for E, Theorems
3, 4, and 6 give the following result.
COROLLARY 7 If A is an n × n matrix and E is an n × n elementary matrix, then
det EA = det E det A
Proof: Observe from Theorems 3, 4, and 6, that performing an elementary row op-
eration multiplies the determinant of the original matrix by a non-zero real number
c (in the case of swapping rows or adding a multiple of one row to another we have
c = −1 or c = 1 respectively). Since E is obtained from I by performing a single row
operation, we get that det E = c, where c depends on the single row operation. Simi-
larly, since EA is the matrix obtained from A by performing the same row operation,
we have that det EA = c det A = det E det A. ∎
This corollary allows us to prove some other important results.
THEOREM 8 (addition to the Invertible Matrix Theorem)
An n × n matrix A is invertible if and only if det A ≠ 0.
Proof: By Theorem 5.2.6 there exists a sequence of elementary matrices E1, . . . , Ek
such that Ek · · · E1A = R where R is the reduced row echelon form of A. Then
det R = det(Ek · · · E1A) = det Ek · · · det E1 det A
Since the determinant of an elementary matrix is non-zero, we get that det A ≠ 0 if
and only if det R ≠ 0. But det R ≠ 0 if and only if R = I. Hence, det A ≠ 0 if and
only if A is invertible. ∎
THEOREM 9 If A and B are n × n matrices, then det(AB) = det A det B.
Proof: By Theorem 5.2.6 there exists a sequence of elementary matrices E1, . . . , Ek
such that A = E1 · · · EkR where R is the reduced row echelon form of A. If det A ≠ 0,
then A is invertible and R = I. Using Corollary 7, we get
det(AB) = det(E1 · · · EkB) = det(E1 · · · Ek) det B = det A det B
If det A = 0, then R ≠ I and so R contains at least one row of zeros. Corollary 7 then
gives
det(AB) = det(E1 · · · EkRB) = det E1 · · · det Ek det(RB)
Using block multiplication, we see that if R contains a row of zeros, then so does RB.
Thus, det(RB) = 0 and hence
det(AB) = 0 = det A det B ∎
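Theorem 9 is easy to spot-check numerically; this quick sketch (assuming NumPy) compares det(AB) with det A det B on random matrices.

```python
import numpy as np

# Random 4x4 matrices with a fixed seed for reproducibility.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

# det(AB) = det(A) det(B), up to floating-point error.
print(np.isclose(np.linalg.det(A @ B),
                 np.linalg.det(A) * np.linalg.det(B)))  # True
```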
COROLLARY 10 If A is an invertible matrix, then det A⁻¹ = 1/det A.
THEOREM 11 If A is an n × n matrix, then det A = det AT .
Since det A = det AT we can use column operations instead of row operations when
evaluating a determinant. In particular, adding a multiple of one column to another
does not change the determinant, swapping two columns multiplies the determinant
by (−1), and multiplying a column by a scalar c multiplies the determinant by c.
Furthermore, combining row operations, column operations, and cofactor expansions
can make evaluating determinants quite efficient.
EXAMPLE 9 Evaluate the determinant of D = [1 3 −1 1; −3 2 1 2; 2 −1 1 1; 2 −3 2 −3].

Solution: Using the row operation R3 + R1 gives
det D = |1 3 −1 1; −3 2 1 2; 3 2 0 2; 2 −3 2 −3|
We now use the column operation C2 − C4 to get
det D = |1 2 −1 1; −3 0 1 2; 3 0 0 2; 2 0 2 −3|
Using the cofactor expansion along the second column gives
det D = 2(−1)^{1+2} |−3 1 2; 3 0 2; 2 2 −3|
Applying R3 − 2R1 and then using the cofactor expansion along the second column
we get
det D = −2 |−3 1 2; 3 0 2; 8 0 −7| = (−2)(1)(−1)^{1+2} |3 2; 8 −7| = −74
Section 5.3 Problems
1. Find all the cofactors of the following matrices.
(a) [4 −1; 5 8]  (b) [1 0 3; 1 2 0; −1 1 3]
(c) [3 2 −1; 0 4 3; 2 −1 2]  (d) [−3 1 −4; 1 −5 9; −2 6 −5]
2. Evaluate the following determinants.
(a) |2 1; −3 4|  (b) |1 3; 1 −6|  (c) |4 2; 6 3|
(d) |1 2 −1; 0 2 7; 0 0 3|  (e) |3 0 −4; −2 0 5; 3 −1 8|  (f) |1 2 1; 5 2 0; 3 0 0|
(g) |1 2 3; −1 3 −8; 2 −5 2|  (h) |2 3 1; −2 2 0; 1 3 4|  (i) |6 8 −8; 7 5 8; −2 −4 8|
(j) |1 10 7 −9; 7 −7 7 7; 2 −2 6 2; −3 −3 4 1|
(k) |1 a a² a³; 1 b b² b³; 1 c c² c³; 1 d d² d³|
(l) |−1 2 6 4; 0 3 5 6; 1 −1 4 −2; 1 2 1 2|
3. Prove that if A is invertible, then det A⁻¹ = 1/det A.
4. Two n×n matrices A and B are said to be similar
if there exists an invertible matrix P such that
P−1AP = B. Prove that if A and B are similar,
then
det A = det B
5. An n × n matrix A is called orthogonal if
AT = A−1.
(a) If A is orthogonal prove that det A = ±1.
(b) Give an example of an orthogonal matrix
for which det A = −1.
6. Let A and B be n × n matrices. Determine whether each statement is true or
false. Justify your answer.
(a) If the columns of A are linearly independent, then det A ≠ 0.
(b) det(A + B) = det A + det B.
(c) The system of equations A~x = ~b is consistent only if det A ≠ 0.
(d) If det(AB) = 0, then det A = 0 or det B = 0.
7. Let A be an n × n matrix and let B be the matrix obtained from A by swapping
two rows of A. Prove that det B = − det A.
5.4 Determinants and Systems of Equations
The Invertible Matrix Theorem shows us that there is a close connection between
whether an n × n matrix A is invertible, the number of solutions of the system of
linear equations A~x = ~b, and the determinant of A. We now examine this relationship
a little further by looking at how to use determinants to find the inverse of a matrix
and to solve systems of linear equations.
Inverse by Cofactors
Observe that we can write the inverse of a matrix A = [a11 a12; a21 a22] in the form
A⁻¹ = (1/det A) [a22 −a12; −a21 a11] = (1/det A) [C11 C21; C12 C22]
That is, the entries of A⁻¹ are the cofactors of A. In particular,
(A⁻¹)ij = (1/det A) Cji   (5.3)
It is important to take careful notice of the change of order in the subscripts in the
line above.
It is instructive to verify this result by multiplying out A and A⁻¹.
AA⁻¹ = [a11 a12; a21 a22] (1/det A) [C11 C21; C12 C22]
= (1/det A) [a11C11 + a12C12  a11C21 + a12C22; a21C11 + a22C12  a21C21 + a22C22]
= (1/det A) [det A  a11(−a12) + a12a11; a21a22 + a22(−a21)  det A]
= [1 0; 0 1]
Our goal now is to prove that equation (5.3) holds in the n × n case. We begin by
proving a lemma.
Our goal now is to prove that equation (5.3) holds in the n × n case. We begin by
proving a lemma.
LEMMA 1 If A is an n × n matrix with cofactors Cij and i ≠ j, then
Σ_{k=1}^{n} (A)ikCjk = 0
Proof: Let B be the matrix obtained from A by replacing the j-th row of A by the
i-th row of A. That is, bℓk = aℓk for all ℓ ≠ j, and bjk = aik. Then B has two identical
rows, so det B = 0. Moreover, since the cofactors of the j-th row of B are the same
as the cofactors of the j-th row of A we get
0 = det B = Σ_{k=1}^{n} bjkCjk = Σ_{k=1}^{n} aikCjk ∎
In words, this lemma says that if we try to do a cofactor expansion of the determinant,
but use the coefficients from one row and the cofactors from another row, then we will
always get 0.
THEOREM 2 If A is an invertible n × n matrix, then (A⁻¹)ij = (1/det A) Cji.

Proof: Let B be the n × n matrix defined by (B)ij = (1/det A) Cji. Then, for 1 ≤ i ≤ n we
have that
(AB)ii = Σ_{k=1}^{n} (A)ik(B)ki = (1/det A) Σ_{k=1}^{n} (A)ikCik = 1
For (AB)ij with i ≠ j, we have
(AB)ij = Σ_{k=1}^{n} (A)ik(B)kj = (1/det A) Σ_{k=1}^{n} (A)ikCjk = 0
by Lemma 1. Therefore, AB = I so B = A⁻¹. ∎
Because of this result, we make the following two definitions.
DEFINITION
Cofactor Matrix
Let A be an n × n matrix. The cofactor matrix, cof A, of A is the matrix whose ij-th
entry is the ij-th cofactor of A. That is,
(cof A)ij = Cij

DEFINITION
Adjugate
Let A be an n × n matrix. The adjugate of A is the matrix defined by
(adj A)ij = Cji
In particular, adj A = (cof A)T.
REMARK
By Theorem 2, we have that A⁻¹ = (1/det A) adj A.
EXAMPLE 1 Determine the adjugate of A = [1 0 2; −2 1 0; 3 −1 1] and verify that A⁻¹ = (1/det A) adj A.

Solution: We find that the cofactor matrix of A is
cof A = [1 2 −1; −2 −5 1; −2 −4 1]
So
adj A = (cof A)T = [1 −2 −2; 2 −5 −4; −1 1 1]
Next, we find that
det A = |1 0 2; −2 1 0; 3 −1 1| = |1 0 2; 0 1 0; 1 −1 1| = −1
Multiplying we find
A( (1/det A) adj A ) = [1 0 2; −2 1 0; 3 −1 1] (1/(−1)) [1 −2 −2; 2 −5 −4; −1 1 1] = [1 0 0; 0 1 0; 0 0 1]
Hence,
A⁻¹ = (1/det A) adj A = [−1 2 2; −2 5 4; 1 −1 −1]
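The cofactor construction is mechanical enough to code directly. The sketch below (assuming NumPy; the helper name `adjugate` is ours, not the book's) builds cof A entry by entry from Theorem 2's formula and checks it against Example 1.

```python
import numpy as np

def adjugate(A):
    # Build the cofactor matrix entry by entry: C_ij = (-1)^(i+j) det A(i, j),
    # then transpose, since adj A = (cof A)^T.
    n = A.shape[0]
    cof = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            cof[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return cof.T

# The matrix from Example 1.
A = np.array([[1.0, 0, 2],
              [-2, 1, 0],
              [3, -1, 1]])

print(np.round(adjugate(A)))   # matches adj A found by hand above
print(np.allclose(adjugate(A) / np.linalg.det(A), np.linalg.inv(A)))  # True
```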
EXERCISE 1 Determine the adjugate of A = [3 4 0; −2 1 2; 1 3 1] and verify that A⁻¹ = (1/det A) adj A.
Notice that this is not a very efficient way of calculating the inverse. However, it can
be useful in some theoretical applications as it gives a formula for the entries of the
inverse.
Cramer’s Rule
Consider the system of linear equations A~x = ~b where A is an n × n matrix. If A is
invertible, then we know this has unique solution
~x = A⁻¹~b = (1/det A)(adj A)~b = (1/det A) [C11 · · · Cn1; ⋮ ⋮; C1n · · · Cnn][b1; ⋮; bn]
Moreover, we can find that the i-th component of ~x is
xi = (b1C1i + · · · + bnCni)/det A
Observe that the quantity b1C1i + · · · + bnCni is the determinant of the matrix where we
have replaced the i-th column of A by ~b. In particular, if we let Ai denote the matrix
where the i-th column of A is replaced by ~b,
Ai = [a11 · · · a1(i−1) b1 a1(i+1) · · · a1n; ⋮ ⋮ ⋮ ⋮; an1 · · · an(i−1) bn an(i+1) · · · ann]
then we get
xi = det Ai / det A

THEOREM 3 (Cramer’s Rule)
If A is an n × n invertible matrix, then the solution ~x of A~x = ~b is given by
xi = det Ai / det A, 1 ≤ i ≤ n
where Ai is the matrix obtained from A by replacing the i-th column of A by ~b.
EXAMPLE 2 Solve the system of linear equations using Cramer’s Rule.
x1 + 2x2 + x3 = 2
−4x1 − x2 + 3x3 = 6
−2x1 + x2 + 2x3 = 3

Solution: We have A = [1 2 1; −4 −1 3; −2 1 2] and ~b = [2; 6; 3]. We find that
det A = |1 2 1; −4 −1 3; −2 1 2| = |1 2 1; 0 7 7; 0 5 4| = −7
Therefore, A is invertible and we can apply Cramer's Rule to get

x_1 = \frac{\det A_1}{\det A} = -\frac{1}{7} \begin{vmatrix} 2 & 2 & 1 \\ 6 & -1 & 3 \\ 3 & 1 & 2 \end{vmatrix} = -\frac{1}{7} \begin{vmatrix} 0 & 2 & 1 \\ 0 & -1 & 3 \\ -1 & 1 & 2 \end{vmatrix} = \frac{-7}{-7} = 1

x_2 = \frac{\det A_2}{\det A} = -\frac{1}{7} \begin{vmatrix} 1 & 2 & 1 \\ -4 & 6 & 3 \\ -2 & 3 & 2 \end{vmatrix} = -\frac{1}{7} \begin{vmatrix} 1 & 0 & 1 \\ -4 & 0 & 3 \\ -2 & -1 & 2 \end{vmatrix} = \frac{7}{-7} = -1

x_3 = \frac{\det A_3}{\det A} = -\frac{1}{7} \begin{vmatrix} 1 & 2 & 2 \\ -4 & -1 & 6 \\ -2 & 1 & 3 \end{vmatrix} = -\frac{1}{7} \begin{vmatrix} 1 & 2 & 2 \\ 0 & 7 & 14 \\ 0 & 5 & 7 \end{vmatrix} = \frac{-21}{-7} = 3

So, the solution is ~x = \begin{bmatrix} 1 \\ -1 \\ 3 \end{bmatrix}.
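The rule x_i = det A_i / det A translates directly into a few lines of code. Below is a minimal sketch in Python with NumPy (the helper name `cramer_solve` is ours, not a library function), checked against the system of this example:

```python
import numpy as np

def cramer_solve(A, b):
    """Solve A x = b by Cramer's Rule: x_i = det(A_i) / det(A),
    where A_i is A with its i-th column replaced by b."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    d = np.linalg.det(A)
    x = np.empty(len(b))
    for i in range(len(b)):
        Ai = A.copy()
        Ai[:, i] = b              # replace the i-th column by b
        x[i] = np.linalg.det(Ai) / d
    return x

# The system from Example 2:
A = [[1, 2, 1], [-4, -1, 3], [-2, 1, 2]]
b = [2, 6, 3]
x = cramer_solve(A, b)
print(x)  # approximately [1, -1, 3]
```

As the text notes, this computes n + 1 determinants and is far slower than elimination; its value is the explicit formula, not efficiency.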
EXAMPLE 3 Let A = \begin{bmatrix} a & 1 & 2 \\ 0 & b & -1 \\ c & 1 & d \end{bmatrix}. Assuming that \det A ≠ 0, solve A~x = \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}.

Solution: We have

\det A = a(bd + 1) + c(-1 - 2b) = abd + a - c - 2bc

Thus,

x_1 = \frac{1}{\det A} \begin{vmatrix} 1 & 1 & 2 \\ -1 & b & -1 \\ 1 & 1 & d \end{vmatrix} = \frac{1}{\det A} \begin{vmatrix} 1 & 1 & 2 \\ 0 & b+1 & 1 \\ 0 & 0 & d-2 \end{vmatrix} = \frac{(b+1)(d-2)}{abd + a - c - 2bc}

x_2 = \frac{1}{\det A} \begin{vmatrix} a & 1 & 2 \\ 0 & -1 & -1 \\ c & 1 & d \end{vmatrix} = \frac{1}{\det A} \begin{vmatrix} a & 0 & 1 \\ 0 & -1 & -1 \\ c & 0 & d-1 \end{vmatrix} = \frac{-ad + a + c}{abd + a - c - 2bc}

x_3 = \frac{1}{\det A} \begin{vmatrix} a & 1 & 1 \\ 0 & b & -1 \\ c & 1 & 1 \end{vmatrix} = \frac{1}{\det A} \begin{vmatrix} a & 0 & 1 \\ 0 & b+1 & -1 \\ c & 0 & 1 \end{vmatrix} = \frac{(b+1)(a-c)}{abd + a - c - 2bc}

That is, the solution is ~x = \frac{1}{abd + a - c - 2bc} \begin{bmatrix} (b+1)(d-2) \\ -ad + a + c \\ (b+1)(a-c) \end{bmatrix}.
Section 5.4 Problems
1. Determine the adjugate of A and verify that A(\operatorname{adj} A) = (\det A)I.

(a) A = \begin{bmatrix} 1 & 3 \\ -2 & 5 \end{bmatrix}  (b) A = \begin{bmatrix} 4 & 5 \\ -1 & 3 \end{bmatrix}  (c) A = \begin{bmatrix} 1 & 3 \\ 4 & 10 \end{bmatrix}

(d) A = \begin{bmatrix} 1 & 2 & -1 \\ 0 & 2 & 7 \\ 0 & 0 & 3 \end{bmatrix}  (e) A = \begin{bmatrix} 1 & -2 & 2 \\ 3 & 0 & 1 \\ 5 & 1 & 1 \end{bmatrix}  (f) A = \begin{bmatrix} 4 & 0 & -4 \\ 0 & -1 & 1 \\ -2 & 2 & -1 \end{bmatrix}

2. Assuming that each matrix is invertible, use the adjugate to find A^{-1}.

(a) A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}  (b) A = \begin{bmatrix} 2 & -3 & t \\ -2 & 0 & 1 \\ 3 & 1 & 1 \end{bmatrix}  (c) A = \begin{bmatrix} 2 & 1 & 3 \\ 3 & t & -1 \\ -2 & 1 & 4 \end{bmatrix}  (d) A = \begin{bmatrix} a & 0 & b \\ 0 & c & 0 \\ b & 0 & d \end{bmatrix}
3. Solve the following systems of linear equations
using Cramer’s Rule.
(a) 3x1 − x2 = 2
4x1 + 7x2 = 5
(b) 2x1 + x2 = 1
3x1 + 7x2 = −2
(c) 2x1 + x2 = 3
3x1 + 7x2 = 5
(d) 2x1 + 3x2 + x3 = 1
x1 + x2 − x3 = −1
−2x1 + 2x3 = 1
(e) 5x1 + x2 − x3 = 4
9x1 + x2 − x3 = 1
x1 − x2 + 5x3 = 2
5.5 Area and Volume
Area of a Parallelogram in R2
Let ~u = \begin{bmatrix} u_1 \\ u_2 \end{bmatrix} and ~v = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} be two non-zero vectors in R². We can form a parallelogram
in R² with corner points (0, 0), (u_1, u_2), (v_1, v_2), and (u_1 + v_1, u_2 + v_2). This is called
the parallelogram induced by ~u and ~v. See Figure 5.5.1.
Figure 5.5.1: The parallelogram induced by vectors ~u and ~v
We know the area of a parallelogram is base × height. As in Figure 5.5.2, we get

Area(~u,~v) = ‖~u‖ ‖perp_{~u}(~v)‖

Also observe from Figure 5.5.2 that ‖perp_{~u}(~v)‖ = ‖~v‖ |\sin θ|. Now, recall from Section
1.4 that \cos θ = \frac{~u · ~v}{‖~u‖ ‖~v‖}. Using this gives

(Area(~u,~v))^2 = ‖~u‖^2 ‖~v‖^2 \sin^2 θ
= ‖~u‖^2 ‖~v‖^2 (1 − \cos^2 θ)
= ‖~u‖^2 ‖~v‖^2 − (~u · ~v)^2
= (u_1^2 + u_2^2)(v_1^2 + v_2^2) − (u_1 v_1 + u_2 v_2)^2
= u_1^2 v_2^2 + u_2^2 v_1^2 − 2 u_1 v_1 u_2 v_2
= (u_1 v_2 − u_2 v_1)^2
= \begin{vmatrix} u_1 & v_1 \\ u_2 & v_2 \end{vmatrix}^2

Hence, the area of the parallelogram is given by Area(~u,~v) = \left| \det \begin{bmatrix} ~u & ~v \end{bmatrix} \right|.
Figure 5.5.2: The area of the parallelogram induced by ~u and ~v
EXAMPLE 1 Determine the area of the parallelogram induced by ~u = \begin{bmatrix} 1 \\ 3 \end{bmatrix} and ~v = \begin{bmatrix} 2 \\ 1 \end{bmatrix}.

Solution: We have

Area(~u,~v) = \left| \det \begin{bmatrix} 1 & 2 \\ 3 & 1 \end{bmatrix} \right| = |-5| = 5
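The computation above is a one-liner numerically. A quick sketch with NumPy (illustrative only):

```python
import numpy as np

# Area of the parallelogram induced by u and v is |det[u v]|.
u = np.array([1.0, 3.0])
v = np.array([2.0, 1.0])
area = abs(np.linalg.det(np.column_stack([u, v])))
print(area)  # 5.0, up to floating-point rounding
```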
REMARK

Observe that the determinant in the example above is negative because the angle θ
from ~u to ~v is greater than π radians. In particular, if we swap ~u and ~v, that is, swap
the columns of the matrix \begin{bmatrix} ~u & ~v \end{bmatrix}, then we multiply the determinant by −1. In a
similar way, we can use the result above to get geometric confirmation of many of the
properties of determinants.
Volume of a Parallelepiped
We can repeat what we did above to find the volume of a parallelepiped induced by
three non-zero vectors ~u = \begin{bmatrix} u_1 \\ u_2 \\ u_3 \end{bmatrix}, ~v = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}, and ~w = \begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix} in R³.
Figure 5.5.3: The volume of a parallelepiped is determined by the area of its base times its height
The volume of a parallelepiped is (area of base) × height. The base is the parallelogram
induced by the vectors ~u and ~v. By Theorem 1.4.5, we have that

area of base = ‖~u‖ ‖~v‖ |\sin θ| = ‖~u × ~v‖

Also, the height is the length of the part of ~w perpendicular to the base. But this is
just the length of the projection of ~w onto the normal vector ~n of the base. Since ~u and ~v lie in the
base, we can take ~n = ~u × ~v. Therefore,

height = ‖proj_{~n}(~w)‖ = \frac{|~w · ~n|}{‖~n‖} = \frac{|~w · (~u × ~v)|}{‖~u × ~v‖}

Hence, the volume of the parallelepiped is given by

Volume(~u,~v,~w) = \frac{|~w · (~u × ~v)|}{‖~u × ~v‖} × ‖~u × ~v‖ = |~w · (~u × ~v)|
= |w_1(u_2 v_3 − u_3 v_2) − w_2(u_1 v_3 − u_3 v_1) + w_3(u_1 v_2 − u_2 v_1)|
= \left| \det \begin{bmatrix} u_1 & v_1 & w_1 \\ u_2 & v_2 & w_2 \\ u_3 & v_3 & w_3 \end{bmatrix} \right|
EXAMPLE 2 Find the volume of the parallelepiped determined by ~u = \begin{bmatrix} 1 \\ 1 \\ 4 \end{bmatrix}, ~v = \begin{bmatrix} 1 \\ 3 \\ 4 \end{bmatrix}, and ~w = \begin{bmatrix} -2 \\ 1 \\ -5 \end{bmatrix}.

Solution: The volume is

Volume(~u,~v,~w) = \left| \det \begin{bmatrix} ~u & ~v & ~w \end{bmatrix} \right| = \left| \det \begin{bmatrix} 1 & 1 & -2 \\ 1 & 3 & 1 \\ 4 & 4 & -5 \end{bmatrix} \right| = \left| \det \begin{bmatrix} 1 & 1 & -2 \\ 0 & 2 & 3 \\ 0 & 0 & 3 \end{bmatrix} \right| = 6
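The two formulas derived above, the scalar triple product |~w · (~u × ~v)| and the determinant form, can be checked against each other on this example. A sketch with NumPy (illustrative only):

```python
import numpy as np

# Volume of the parallelepiped: |w . (u x v)| = |det[u v w]|.
u = np.array([1.0, 1.0, 4.0])
v = np.array([1.0, 3.0, 4.0])
w = np.array([-2.0, 1.0, -5.0])

vol_triple = abs(np.dot(w, np.cross(u, v)))               # scalar triple product
vol_det = abs(np.linalg.det(np.column_stack([u, v, w])))  # determinant form
print(vol_triple, vol_det)  # both 6, up to rounding
```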
REMARK

Of course, this can be extended to higher dimensions. If ~v_1, . . . , ~v_n ∈ R^n, then we
say that they induce an n-dimensional parallelotope (the n-dimensional version of a
parallelogram or parallelepiped). The n-volume V of the parallelotope is

V = \left| \det \begin{bmatrix} ~v_1 & \cdots & ~v_n \end{bmatrix} \right|
Section 5.5 Problems
1. Find the area of the parallelogram induced by ~u and ~v.

(a) ~u = \begin{bmatrix} 2 \\ -3 \end{bmatrix}, ~v = \begin{bmatrix} 3 \\ 2 \end{bmatrix}  (b) ~u = \begin{bmatrix} 4 \\ 1 \end{bmatrix}, ~v = \begin{bmatrix} -1 \\ 1 \end{bmatrix}

(c) ~u = \begin{bmatrix} 4 \\ 1 \end{bmatrix}, ~v = \begin{bmatrix} -3 \\ 3 \end{bmatrix}  (d) ~u = \begin{bmatrix} 2 \\ 3 \end{bmatrix}, ~v = \begin{bmatrix} -2 \\ 2 \end{bmatrix}

2. Find the volume of the parallelepiped determined by ~u, ~v, and ~w.

(a) ~u = \begin{bmatrix} 5 \\ -1 \\ 1 \end{bmatrix}, ~v = \begin{bmatrix} 1 \\ 4 \\ -1 \end{bmatrix}, ~w = \begin{bmatrix} 1 \\ 2 \\ -6 \end{bmatrix}
(b) ~u = \begin{bmatrix} 1 \\ 1 \\ 4 \end{bmatrix}, ~v = \begin{bmatrix} 1 \\ 3 \\ 4 \end{bmatrix}, ~w = \begin{bmatrix} -2 \\ 1 \\ -5 \end{bmatrix}
(c) ~u = \begin{bmatrix} 2 \\ 3 \\ 4 \end{bmatrix}, ~v = \begin{bmatrix} 2 \\ -1 \\ -5 \end{bmatrix}, ~w = \begin{bmatrix} 1 \\ 5 \\ 2 \end{bmatrix}

3. Find the 4-volume of the parallelotope induced by ~v_1 = \begin{bmatrix} 1 \\ 1 \\ 2 \\ 1 \end{bmatrix}, ~v_2 = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 3 \end{bmatrix}, ~v_3 = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 0 \end{bmatrix}, ~v_4 = \begin{bmatrix} 1 \\ 0 \\ 5 \\ 7 \end{bmatrix}.
4. Let ~v_1, . . . , ~v_n ∈ R^n. Prove that the n-volume of the parallelotope induced by
~v_1, . . . , ~v_n is the same as the n-volume of the parallelotope induced by
~v_1, . . . , ~v_{n−1}, ~v_n + t~v_1.

5. (a) Let L : R² → R² be a linear mapping. Prove that

Area(L(~u), L(~v)) = |\det[L]| Area(~u,~v)

Thus, |\det[L]| is called the area scaling factor of the linear mapping.

(b) Calculate the area scaling factor of L(x_1, x_2) = (2x_1 + 8x_2, −x_1 + 5x_2).

(c) If L, M : R³ → R³ are linear mappings, then prove that the volume scaling factor
of M ◦ L is |\det([M][L])|.
Chapter 6
Eigenvectors and Diagonalization
6.1 Matrix of a Linear Mapping, Similar Matrices
Matrix of a Linear Operator
In Chapter 3, we saw how to find the standard matrix of a linear mapping by finding
the image of the standard basis vectors under the linear mapping. However, the
standard matrix of a linear mapping is not always nice to work with. For example,
consider the matrix

A = \begin{bmatrix} 1/5 & 2/5 \\ 2/5 & 4/5 \end{bmatrix}

By just looking at this matrix, can you give a geometric interpretation of the matrix
mapping L(~x) = A~x? Probably not. We found in Example 3.2.6 that this is the
standard matrix of proj_{~a} : R² → R² where ~a = \begin{bmatrix} 1 \\ 2 \end{bmatrix}. Even though we can use this
matrix to calculate the projection of a vector ~x onto ~a, the matrix itself does not seem
to provide us with any information about the action of the projection. This is because
we are trying to use the standard basis vectors to determine how a vector is projected
onto ~a. It would make more sense to instead use a basis for R² which contains ~a and
a vector orthogonal to ~a. Of course, to do this we first need to determine how to find
the matrix of a linear operator with respect to a basis other than the standard basis.
The standard matrix [L] of a linear operator L : R^n → R^n satisfies L(~x) = [L]~x.
Thus, to define the matrix [L]_B of L with respect to any basis B of R^n, we mimic this
formula in B-coordinates instead of standard coordinates. That is, we want

[L(~x)]_B = [L]_B [~x]_B

Let B = {~v_1, . . . , ~v_n}. Then, we can write ~x = b_1~v_1 + · · · + b_n~v_n and we get

[L(~x)]_B = [L(b_1~v_1 + · · · + b_n~v_n)]_B
= [b_1 L(~v_1) + · · · + b_n L(~v_n)]_B
= b_1 [L(~v_1)]_B + · · · + b_n [L(~v_n)]_B
= \begin{bmatrix} [L(~v_1)]_B & \cdots & [L(~v_n)]_B \end{bmatrix} \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix}
Therefore, we have [L(~x)]B as a product of a matrix and [~x]B as desired. We make
the following definition.
DEFINITION
B-Matrix

Let B = {~v_1, . . . , ~v_n} be a basis for R^n and let L : R^n → R^n be a linear operator. The
B-matrix of L is defined to be

[L]_B = \begin{bmatrix} [L(~v_1)]_B & \cdots & [L(~v_n)]_B \end{bmatrix}

It satisfies [L(~x)]_B = [L]_B [~x]_B.
EXAMPLE 1 Let L be a linear mapping with [L] = \begin{bmatrix} -6 & -2 & 9 \\ -5 & -1 & 7 \\ -7 & -2 & 10 \end{bmatrix} and let
B = \left\{ \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix} \right\}.
Find the B-matrix of L and use it to determine [L(~x)]_B where [~x]_B = \begin{bmatrix} 1 \\ -2 \\ 0 \end{bmatrix}.

Solution: By definition, the columns of the B-matrix are the B-coordinate vectors of
the images of the vectors in B under L. Thus, we first find the images of the vectors
in B under L and write them as linear combinations of the vectors in B. We get

\begin{bmatrix} -6 & -2 & 9 \\ -5 & -1 & 7 \\ -7 & -2 & 10 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = 1\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} + 0\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} + 0\begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix}

\begin{bmatrix} -6 & -2 & 9 \\ -5 & -1 & 7 \\ -7 & -2 & 10 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 3 \\ 2 \\ 3 \end{bmatrix} = 2\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} + 1\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} + 0\begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix}

\begin{bmatrix} -6 & -2 & 9 \\ -5 & -1 & 7 \\ -7 & -2 & 10 \end{bmatrix} \begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix} = \begin{bmatrix} 6 \\ 6 \\ 7 \end{bmatrix} = 3\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} + 2\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} + 1\begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix}

Hence, [L]_B = \begin{bmatrix} [L(1,1,1)]_B & [L(1,0,1)]_B & [L(1,3,2)]_B \end{bmatrix} = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix}.

Thus, [L(~x)]_B = [L]_B [~x]_B = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ -2 \\ 0 \end{bmatrix} = \begin{bmatrix} -3 \\ -2 \\ 0 \end{bmatrix}.
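The hand computation above can be checked numerically: column i of [L]_B is the B-coordinate vector of [L]~v_i, i.e. the solution c of Pc = [L]~v_i, where P has the basis vectors as its columns. A sketch with NumPy (illustrative; solving for all columns at once amounts to computing P^{-1}[L]P):

```python
import numpy as np

A = np.array([[-6, -2,  9],
              [-5, -1,  7],
              [-7, -2, 10]], dtype=float)

# Basis vectors of B as the columns of P.
P = np.array([[1, 1, 1],
              [1, 0, 3],
              [1, 1, 2]], dtype=float)

# Column i of [L]_B solves P c = A v_i; doing all columns at once:
L_B = np.linalg.solve(P, A @ P)
print(np.round(L_B))  # [[1 2 3], [0 1 2], [0 0 1]], as in the example
```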
EXERCISE 1 Let L : R² → R² be the linear mapping defined by L(x_1, x_2) = (2x_1 + 3x_2, 4x_1 − x_2)
and B = \left\{ \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 1 \\ -1 \end{bmatrix} \right\}. Find the B-matrix of L and use it to determine [L(~x)]_B where
[~x]_B = \begin{bmatrix} 1 \\ 1 \end{bmatrix}.
In the last example and the last exercise we used the methods of Section 4.3 to find
the coordinates of the image vectors with respect to the basis B. However, if we can
pick a basis for the linear mapping which is geometrically suited to the mapping, then
we can often find the coordinates of the image vectors with respect to the basis by
observation.
EXAMPLE 2 Let ~a = \begin{bmatrix} 1 \\ 2 \end{bmatrix}. Determine a geometrically natural basis B for proj_{~a} : R² → R², and
then determine [proj_{~a}]_B.

Solution: As discussed above, it makes sense to include ~a in our geometrically natural
basis as well as a vector orthogonal to ~a. So, we take the other basis vector to
be ~n = \begin{bmatrix} 2 \\ -1 \end{bmatrix} and let B = {~a, ~n}. By definition, the columns of the B-matrix are the
B-coordinate vectors of the images of the vectors in B under proj_{~a}. We have

proj_{~a}(~a) = ~a = 1\begin{bmatrix} 1 \\ 2 \end{bmatrix} + 0\begin{bmatrix} 2 \\ -1 \end{bmatrix}

proj_{~a}(~n) = ~0 = 0\begin{bmatrix} 1 \\ 2 \end{bmatrix} + 0\begin{bmatrix} 2 \\ -1 \end{bmatrix}

Therefore,

[proj_{~a}]_B = \begin{bmatrix} [proj_{~a}(~a)]_B & [proj_{~a}(~n)]_B \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}
Let us compare the standard matrix of proj_{~a} to its B-matrix found in the example
above. First, we already discussed that its standard matrix [proj_{~a}] = \begin{bmatrix} 1/5 & 2/5 \\ 2/5 & 4/5 \end{bmatrix}
does not provide any useful information about the mapping. It does show us that
the projection of \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} onto ~a is \begin{bmatrix} \frac{1}{5}x_1 + \frac{2}{5}x_2 \\ \frac{2}{5}x_1 + \frac{4}{5}x_2 \end{bmatrix}, but this does not tell us much about the
geometry of the mapping.

On the other hand, consider its B-matrix [proj_{~a}]_B = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}. What information does
this give us about the mapping? First, this matrix clearly shows the action of the
mapping. In particular, for any ~x = b_1~a + b_2~n ∈ R² we have

[proj_{~a}(~x)]_B = [proj_{~a}]_B [~x]_B = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} = \begin{bmatrix} b_1 \\ 0 \end{bmatrix}

That is, the image of ~x under proj_{~a} is the amount of ~x in the direction of ~a. Also, it is
clear from the form of the matrix that the range of the mapping is all scalar multiples
of the first vector in B and the kernel is all scalar multiples of the second vector in B.
The form of the matrix in Example 2 is obviously important. Therefore, we make the
following definition.
DEFINITION
Diagonal Matrix

An n × n matrix D is said to be a diagonal matrix if d_{ij} = 0 for all i ≠ j. We denote
a diagonal matrix by

diag(d_{11}, d_{22}, . . . , d_{nn})
EXAMPLE 3 The diagonal matrix D = diag(1, 2, 0, 1) is D = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.
Let's consider another example. Let L : R³ → R³ be the linear mapping with standard
matrix [L] = \begin{bmatrix} 1 & 0 & -1 \\ -1/3 & 2/3 & -1/3 \\ -2/3 & -2/3 & 4/3 \end{bmatrix}. Without doing any further work, what can you
determine about L from [L]? Not much. However, it can be shown that there exists a
basis B = {~v_1, ~v_2, ~v_3} for R³ such that [L]_B = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 0 \end{bmatrix}. What can you say about L by
looking at [L]_B? We see that the action of L is to map a vector ~x = b_1~v_1 + b_2~v_2 + b_3~v_3
onto b_1~v_1 + 2b_2~v_2 + 0b_3~v_3, the range of L is Span{~v_1, ~v_2}, and the kernel of L is Span{~v_3}.

Clearly it is very useful to have a B-matrix of a linear operator L : R^n → R^n in
diagonal form. This leads to a couple of questions. How do we determine if there
exists a basis B such that [L]_B has this form? If there is such a basis, how do we find
it? We start answering these questions by looking at our method for calculating the
B-matrix a little more closely.
Similar Matrices
Let L : R^n → R^n be a linear operator with standard matrix A = [L] and let
B = {~v_1, . . . , ~v_n} be a basis for R^n. Then, by definition, we have

[L]_B = \begin{bmatrix} [A~v_1]_B & \cdots & [A~v_n]_B \end{bmatrix}   (6.1)

From our work in Section 4.3, we know that the change of coordinates matrix from
B-coordinates to S-coordinates is P = \begin{bmatrix} ~v_1 & \cdots & ~v_n \end{bmatrix}. Hence, we have [~x]_B = P^{-1}~x
for any ~x ∈ R^n. Since A~v_i ∈ R^n for 1 ≤ i ≤ n, we can apply this to (6.1) to get

[L]_B = \begin{bmatrix} [A~v_1]_B & \cdots & [A~v_n]_B \end{bmatrix} = \begin{bmatrix} P^{-1}A~v_1 & \cdots & P^{-1}A~v_n \end{bmatrix} = P^{-1}A\begin{bmatrix} ~v_1 & \cdots & ~v_n \end{bmatrix} = P^{-1}AP
REMARK

You can also show that [L]_B = P^{-1}[L]P by showing that [L]_B[~x]_B = P^{-1}[L]P[~x]_B
for all ~x ∈ R^n, simplifying the right-hand side of the equation. This is an excellent
exercise.
Since [L]B and [L] represent the same linear mapping, we would expect that these
two matrices have many of the same properties.
THEOREM 1 If A and B are n × n matrices such that P^{-1}AP = B for some invertible matrix P, then

(1) rank A = rank B
(2) \det A = \det B
(3) tr A = tr B, where tr A = \sum_{i=1}^{n} a_{ii} is called the trace of the matrix.
We see that such matrices are similar in many ways. This motivates the following
definition.
DEFINITION
Similar Matrices

If A and B are n × n matrices such that P^{-1}AP = B for some invertible matrix P, then
A is said to be similar to B.
REMARK
Observe that if P−1AP = B, then taking Q = P−1 we get that Q is an invertible
matrix such that A = PBP−1 = Q−1BQ. So, the similarity property is symmetric. In
particular, we usually will just say that A and B are similar.
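The shared properties of Theorem 1 are easy to confirm numerically. A small sketch with NumPy (the particular A and P here are made up for illustration; any invertible P would do):

```python
import numpy as np

A = np.array([[1, 2, 0],
              [0, 3, 1],
              [2, 0, 1]], dtype=float)
P = np.array([[1, 1, 0],
              [0, 1, 1],
              [1, 0, 1]], dtype=float)   # det P = 2, so P is invertible
B = np.linalg.inv(P) @ A @ P             # B is similar to A

# Similar matrices share rank, determinant, and trace.
assert np.linalg.matrix_rank(A) == np.linalg.matrix_rank(B)
assert np.isclose(np.linalg.det(A), np.linalg.det(B))
assert np.isclose(np.trace(A), np.trace(B))
```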
We can now rephrase our earlier questions. Given a linear operator L : R^n → R^n, is the
standard matrix of L similar to a diagonal matrix? If so, how do we find an invertible
matrix P such that P^{-1}[L]P is diagonal? In the next section, we will continue to work
towards answering these questions.
Section 6.1 Problems
1. For each linear operator L and basis B, find the B-matrix of L.

(a) L(x_1, x_2) = (3x_1 + 2x_2, 5x_1 + 4x_2), B = \left\{ \begin{bmatrix} 1 \\ -2 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \end{bmatrix} \right\}

(b) L(x_1, x_2) = (2x_1 + 4x_2, 3x_1 − x_2), B = \left\{ \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \end{bmatrix} \right\}

(c) L(x_1, x_2) = (−7x_1 − 3x_2, 10x_1 + 4x_2), B = \left\{ \begin{bmatrix} -3 \\ 5 \end{bmatrix}, \begin{bmatrix} 1 \\ -2 \end{bmatrix} \right\}

(d) L(x_1, x_2, x_3) = (x_1 + 3x_3, x_2 − x_3, 2x_1 + 3x_2), B = \left\{ \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ -1 \\ -1 \end{bmatrix} \right\}

(e) L(x_1, x_2, x_3) = (5x_1 − 6x_3, 2x_2, 3x_1 − 4x_3), B = \left\{ \begin{bmatrix} 2 \\ -1 \\ -1 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ -1 \end{bmatrix} \right\}

(f) L(x_1, x_2, x_3) = (5x_1 − 6x_3, 2x_2, 3x_1 − 4x_3), B = \left\{ \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} \right\}

(g) L(x_1, x_2, x_3) = (2x_1 + x_3, x_1 + 2x_2 + x_3, x_3), B = \left\{ \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \right\}
2. Consider the basis B = \left\{ \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} \right\} for R³.
Let L : R³ → R³ be the linear mapping where [L]_B = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 4 & 2 \\ 3 & 3 & 0 \end{bmatrix}.

(a) Find L(~x) if [~x]_B = \begin{bmatrix} 1 \\ 4 \\ 3 \end{bmatrix}.
(b) Find L(~y) if [~y]_B = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}.
(c) Find [L].
3. Consider the basis B = \left\{ \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} \right\} for R³.
Let L : R³ → R³ be the linear mapping where [L]_B = \begin{bmatrix} 0 & 0 & 3 \\ 5 & 0 & -1 \\ 0 & 2 & 2 \end{bmatrix}.

(a) Find L(~x) if [~x]_B = \begin{bmatrix} 3 \\ -1 \\ 2 \end{bmatrix}.
(b) Find L(~y) if [~y]_B = \begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix}.
(c) Find [L].
4. Let ~a = \begin{bmatrix} 3 \\ -1 \end{bmatrix}, ~b = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, and let P be a plane with normal vector ~n = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}.
For each linear mapping, find a geometrically natural basis B and the B-matrix of the mapping.

(a) proj_{~a} : R² → R²  (b) proj_{~b} : R² → R²
(c) perp_{~b} : R² → R²  (d) refl_P : R³ → R³
5. Let A, B, and C be n × n matrices.
(a) Prove that A is similar to A.
(b) Prove that if A is similar to B and B is sim-
ilar to C, then A is similar to C.
6. Prove that A^T A is similar to I if and only if A^T = A^{-1}.

7. If A is similar to B, prove that A^n is similar to B^n for all positive integers n.
8. Prove Theorem 1(1).
9. Let A and B be n×n matrices. Prove or disprove
each statement.
(a) If rank A = rank B, then A and B are simi-
lar.
(b) If A and B are similar and A is invertible,
then B is invertible.
(c) A is similar to its reduced row echelon
form R.
(d) If A is similar to the diagonal matrix
D = diag(d11, d22, . . . , dnn), then
det A = d11d22 · · · dnn.
6.2 Eigenvalues and Eigenvectors
Let A = [L] be the standard matrix of a linear operator L : R^n → R^n. To determine
how to construct an invertible matrix P such that P^{-1}AP = D is diagonal, we work in
reverse. That is, we will assume that such a matrix P exists and use this to find what
properties P and D must have.

Let P = \begin{bmatrix} ~v_1 & \cdots & ~v_n \end{bmatrix} and let D = diag(λ_1, . . . , λ_n) = \begin{bmatrix} λ_1~e_1 & \cdots & λ_n~e_n \end{bmatrix} such that
P^{-1}AP = D, or alternately AP = PD. This gives

A\begin{bmatrix} ~v_1 & \cdots & ~v_n \end{bmatrix} = P\begin{bmatrix} λ_1~e_1 & \cdots & λ_n~e_n \end{bmatrix}
\begin{bmatrix} A~v_1 & \cdots & A~v_n \end{bmatrix} = \begin{bmatrix} λ_1 P~e_1 & \cdots & λ_n P~e_n \end{bmatrix}
\begin{bmatrix} A~v_1 & \cdots & A~v_n \end{bmatrix} = \begin{bmatrix} λ_1~v_1 & \cdots & λ_n~v_n \end{bmatrix}

Thus, we must have A~v_i = λ_i~v_i for 1 ≤ i ≤ n. Moreover, for P to be invertible, the
columns of P must be linearly independent. In particular, ~v_i ≠ ~0 for 1 ≤ i ≤ n.
DEFINITION
Eigenvalues, Eigenvectors, Eigenpair

Let A be an n × n matrix. If there exists a vector ~v ≠ ~0 such that A~v = λ~v, then λ
is called an eigenvalue of A and ~v is called an eigenvector of A corresponding to λ.
The pair (λ, ~v) is called an eigenpair.
If L : Rn → Rn is a linear operator with standard matrix A = [L] and (λ,~v) is an
eigenpair of A, then observe that we have
L(~v) = A~v = λ~v
Hence, it makes sense to also define eigenvalues and eigenvectors for linear operators.
DEFINITION
Eigenvalues, Eigenvectors

Let L : R^n → R^n be a linear operator. If there exists a vector ~v ≠ ~0 such that
L(~v) = λ~v, then λ is called an eigenvalue of L and ~v is called an eigenvector of L
corresponding to λ.
EXAMPLE 1 Let ~a = \begin{bmatrix} 1 \\ 2 \end{bmatrix} and ~n = \begin{bmatrix} 2 \\ -1 \end{bmatrix}. We have

proj_{~a}(~a) = 1~a
proj_{~a}(~n) = ~0 = 0~n

Hence, ~a is an eigenvector of proj_{~a} with eigenvalue 1, and ~n is an eigenvector of
proj_{~a} with eigenvalue 0.
Observe that it is easy to determine whether a given vector ~v is an eigenvector of a
given matrix A. We just need to determine whether the image of ~v under A is a scalar
multiple of ~v.
EXAMPLE 2 Determine which of the following vectors are eigenvectors of A = \begin{bmatrix} 2 & -3 & -1 \\ 1 & -2 & -1 \\ 1 & -3 & 0 \end{bmatrix}.

(a) ~v_1 = \begin{bmatrix} 3 \\ 1 \\ 0 \end{bmatrix}  (b) ~v_2 = \begin{bmatrix} -1 \\ 2 \\ -1 \end{bmatrix}  (c) ~v_3 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}

Solution: We have A~v_1 = \begin{bmatrix} 2 & -3 & -1 \\ 1 & -2 & -1 \\ 1 & -3 & 0 \end{bmatrix} \begin{bmatrix} 3 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 3 \\ 1 \\ 0 \end{bmatrix}.

So, ~v_1 is an eigenvector with eigenvalue λ_1 = 1.

For ~v_2 we get A~v_2 = \begin{bmatrix} 2 & -3 & -1 \\ 1 & -2 & -1 \\ 1 & -3 & 0 \end{bmatrix} \begin{bmatrix} -1 \\ 2 \\ -1 \end{bmatrix} = \begin{bmatrix} -7 \\ -4 \\ -7 \end{bmatrix}.

Since A~v_2 is not a scalar multiple of ~v_2, it is not an eigenvector.

For ~v_3 we get A~v_3 = \begin{bmatrix} 2 & -3 & -1 \\ 1 & -2 & -1 \\ 1 & -3 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} -2 \\ -2 \\ -2 \end{bmatrix} = (-2)\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}.

So, ~v_3 is an eigenvector with eigenvalue λ_2 = −2.
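This check mechanizes easily: compute A~v and test whether it is a scalar multiple of ~v. A sketch with NumPy (the helper `is_eigenvector` is ours, for illustration):

```python
import numpy as np

def is_eigenvector(A, v):
    """Return (True, lam) if A v = lam v for some scalar lam, else (False, None).
    Assumes v is nonzero."""
    v = np.asarray(v, dtype=float)
    Av = np.asarray(A, dtype=float) @ v
    i = int(np.argmax(v != 0))        # index of a nonzero component of v
    lam = Av[i] / v[i]
    return (True, lam) if np.allclose(Av, lam * v) else (False, None)

A = [[2, -3, -1], [1, -2, -1], [1, -3, 0]]
print(is_eigenvector(A, [3, 1, 0]))    # (True, 1.0)
print(is_eigenvector(A, [-1, 2, -1]))  # (False, None)
print(is_eigenvector(A, [1, 1, 1]))    # (True, -2.0)
```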
EXERCISE 1 Determine which of the following vectors are eigenvectors of A = \begin{bmatrix} 1 & 4 & 0 \\ -1 & -5 & 1 \\ -3 & -4 & -2 \end{bmatrix}.

(a) ~v_1 = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}  (b) ~v_2 = \begin{bmatrix} -4 \\ 3 \\ 5 \end{bmatrix}  (c) ~v_3 = \begin{bmatrix} -2 \\ 1 \\ 2 \end{bmatrix}
How do we determine whether a given scalar λ is an eigenvalue of a given matrix A?
We need to determine whether there exists a vector ~v ≠ ~0 such that A~v = λ~v. At first
glance this equation might look like a system of linear equations, but we observe that
the unknown vector ~v is actually on both sides of the equation. To turn this into a
system of linear equations, we need to move λ~v to the other side. We get

A~v − λ~v = ~0
(A − λI)~v = ~0

Hence, we have a homogeneous system of linear equations with coefficient matrix
A − λI. Since the definition requires that ~v ≠ ~0, we see that λ is an eigenvalue of A if
and only if this system has infinitely many solutions. Thus, since A − λI is n × n, we
know by the Invertible Matrix Theorem that we require \det(A − λI) = 0. Moreover,
we see that to find all the eigenvectors associated with the eigenvalue λ, we just need
to determine all non-zero vectors ~v such that (A − λI)~v = ~0.
EXAMPLE 3 Find all of the eigenvalues of A = \begin{bmatrix} 1 & 6 & 3 \\ 0 & -2 & 0 \\ 3 & 6 & 1 \end{bmatrix}. Determine all eigenvectors associated
with each eigenvalue.

Solution: Our work above shows that all eigenvalues of A must satisfy \det(A − λI) = 0.
Thus, to find all eigenvalues of A, we solve this equation for λ. We have

0 = \det(A − λI) = \begin{vmatrix} 1-λ & 6 & 3 \\ 0 & -2-λ & 0 \\ 3 & 6 & 1-λ \end{vmatrix} = (-2-λ)\begin{vmatrix} 1-λ & 3 \\ 3 & 1-λ \end{vmatrix} = -(λ+2)(λ^2 - 2λ - 8) = -(λ+2)(λ+2)(λ-4)

Hence, the eigenvalues are λ_1 = −2, λ_2 = −2, and λ_3 = 4.

To find the eigenvectors associated with the eigenvalue λ_1 = −2, we solve the homogeneous
system (A − λ_1 I)~v = ~0. Row reducing the coefficient matrix gives

A − λ_1 I = \begin{bmatrix} 3 & 6 & 3 \\ 0 & 0 & 0 \\ 3 & 6 & 3 \end{bmatrix} ∼ \begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}

The solution of the homogeneous system is ~v = s\begin{bmatrix} -2 \\ 1 \\ 0 \end{bmatrix} + t\begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}, s, t ∈ R. Hence,
the eigenvectors for λ_1 are all non-zero vectors of this form. Of course, these are also the
eigenvectors for λ_2.
Similarly, for λ_3 = 4 we row reduce the coefficient matrix of the homogeneous system
(A − λ_3 I)~v = ~0 to get

A − λ_3 I = \begin{bmatrix} -3 & 6 & 3 \\ 0 & -6 & 0 \\ 3 & 6 & -3 \end{bmatrix} ∼ \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}

The solution of the homogeneous system is ~v = t\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, t ∈ R. Hence, all eigenvectors
for λ_3 are ~v = t\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} with t ≠ 0.
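In practice one rarely factors C(λ) by hand; numerical routines return the eigenvalues and eigenvectors directly. A sketch checking this example with NumPy (illustrative; note that `eig` reports one eigenvector per column and does not report multiplicities):

```python
import numpy as np

A = np.array([[1,  6, 3],
              [0, -2, 0],
              [3,  6, 1]], dtype=float)

# Eigenvalues are the roots of det(A - lam I).
eigvals, eigvecs = np.linalg.eig(A)
print(np.sort(eigvals))  # close to [-2, -2, 4]

# Every returned pair satisfies A v = lam v:
for lam, v in zip(eigvals, eigvecs.T):
    assert np.allclose(A @ v, lam * v)
```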
Observe that for any n × n matrix A, \det(A − λI) is always an n-th degree polynomial
in λ, and the roots of this polynomial are the eigenvalues of A.
DEFINITION
Characteristic Polynomial

Let A be an n × n matrix. The characteristic polynomial of A is the n-th degree
polynomial

C(λ) = \det(A − λI)
THEOREM 1 A scalar λ is an eigenvalue of an n × n matrix A if and only if C(λ) = 0.
We also see that the set of all eigenvectors associated with an eigenvalue λ is the
nullspace of A − λI not including the zero vector.
DEFINITION
Eigenspace

Let A be an n × n matrix with eigenvalue λ. We call the nullspace of A − λI the
eigenspace of λ. The eigenspace is denoted E_λ.
REMARKS

1. It is important to remember that the set of all eigenvectors for the eigenvalue λ
of A is all vectors in E_λ excluding the zero vector.

2. Since the eigenspace is just the nullspace of A − λI, it is also a subspace of R^n.

3. The examples and exercises we will do are relatively simple. In real-world
applications, the matrices are generally much larger and most likely have non-integer
eigenvalues.
EXAMPLE 4 Find all eigenvalues and a basis for each eigenspace for A = \begin{bmatrix} -4 & 3 \\ 2 & 1 \end{bmatrix}.

Solution: To find all of the eigenvalues, we find the roots of the characteristic polynomial.
We have

0 = C(λ) = \det(A − λI) = \begin{vmatrix} -4-λ & 3 \\ 2 & 1-λ \end{vmatrix} = λ^2 + 3λ − 10 = (λ − 2)(λ + 5)

Hence, the eigenvalues of A are λ_1 = 2 and λ_2 = −5. For λ_1 = 2, we have

A − λ_1 I = \begin{bmatrix} -6 & 3 \\ 2 & -1 \end{bmatrix} ∼ \begin{bmatrix} 1 & -1/2 \\ 0 & 0 \end{bmatrix}

Hence, a basis for E_{λ_1} is \left\{ \begin{bmatrix} 1 \\ 2 \end{bmatrix} \right\}.

For λ_2 = −5, we have

A − λ_2 I = \begin{bmatrix} 1 & 3 \\ 2 & 6 \end{bmatrix} ∼ \begin{bmatrix} 1 & 3 \\ 0 & 0 \end{bmatrix}

Hence, a basis for E_{λ_2} is \left\{ \begin{bmatrix} -3 \\ 1 \end{bmatrix} \right\}.
EXAMPLE 5 Find all eigenvalues and a basis for each eigenspace for A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.

Solution: To find all of the eigenvalues, we find the roots of the characteristic polynomial.
We have

0 = C(λ) = \det(A − λI) = \begin{vmatrix} 1-λ & 0 & 1 \\ 0 & 1-λ & 0 \\ 0 & 0 & 1-λ \end{vmatrix} = (1 − λ)^3

Hence, all three eigenvalues of A are 1. For λ_1 = 1, we have A − λ_1 I = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.

Hence, a basis for E_{λ_1} is \left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \right\}.
EXERCISE 2 Find all eigenvalues and a basis for each eigenspace for A = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 1 & 2 \\ 0 & 0 & -2 \end{bmatrix}.
From our work in the last example and the last exercise, we see that finding the
eigenvalues of a triangular matrix is easy.
THEOREM 2 If A is an n × n upper or lower triangular matrix, then the eigenvalues of A are the
diagonal entries of A.
Observe in the problems above that the dimension of the eigenspace of an eigenvalue
is always less than or equal to the number of times the eigenvalue is repeated as a
root of the characteristic polynomial. Before we prove this is always the case, we
make the following definitions and prove a lemma.
DEFINITION
Algebraic Multiplicity, Geometric Multiplicity

Let A be an n × n matrix with eigenvalue λ_1. The algebraic multiplicity of λ_1,
denoted a_{λ_1}, is the number of times that λ_1 is a root of the characteristic polynomial
C(λ). That is, if C(λ) = (λ − λ_1)^k C_1(λ), where C_1(λ_1) ≠ 0, then a_{λ_1} = k. The
geometric multiplicity of λ_1, denoted g_{λ_1}, is the dimension of its eigenspace. So,
g_{λ_1} = \dim(E_{λ_1}).
EXAMPLE 6 In Example 5, we found that the matrix A has only one eigenvalue λ_1 = 1, which is a
triple root of the characteristic polynomial. Thus, the algebraic multiplicity of λ_1 is
a_{λ_1} = 3. The geometric multiplicity of λ_1 is g_{λ_1} = \dim E_{λ_1} = 2.
EXAMPLE 7 In Example 4, the matrix A has eigenvalues λ_1 = 2 and λ_2 = −5, each of which occurs
as a single root of the characteristic polynomial. Hence, the algebraic multiplicity of
both eigenvalues is 1. Moreover, the eigenspace of each eigenvalue has dimension 1,
so the geometric multiplicity of each eigenvalue is 1.
EXAMPLE 8 Find the geometric and algebraic multiplicity of all eigenvalues of A = \begin{bmatrix} -1 & 6 & 3 \\ 1 & 0 & -1 \\ -3 & 6 & 5 \end{bmatrix}.

Solution: We have

0 = C(λ) = \begin{vmatrix} -1-λ & 6 & 3 \\ 1 & -λ & -1 \\ -3 & 6 & 5-λ \end{vmatrix}
= (-1-λ)\begin{vmatrix} -λ & -1 \\ 6 & 5-λ \end{vmatrix} + 6(-1)\begin{vmatrix} 1 & -1 \\ -3 & 5-λ \end{vmatrix} + 3\begin{vmatrix} 1 & -λ \\ -3 & 6 \end{vmatrix}
= (-1-λ)(λ^2 - 5λ + 6) - 6(-λ + 2) + 3(6 - 3λ)
= -λ^3 + 4λ^2 - 4λ = -λ(λ^2 - 4λ + 4) = -λ(λ - 2)^2

Thus, the eigenvalues of A are λ_1 = 0 and λ_2 = 2. Since λ_1 is a single root of C(λ),
we have that a_{λ_1} = 1. We have a_{λ_2} = 2 since λ_2 is a double root of C(λ).

For λ_1 = 0, we have

A − λ_1 I = \begin{bmatrix} -1 & 6 & 3 \\ 1 & 0 & -1 \\ -3 & 6 & 5 \end{bmatrix} ∼ \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 1/3 \\ 0 & 0 & 0 \end{bmatrix}

Thus, a basis for E_{λ_1} is \left\{ \begin{bmatrix} 1 \\ -1/3 \\ 1 \end{bmatrix} \right\}. Hence, g_{λ_1} = \dim E_{λ_1} = 1.

For λ_2 = 2, we have

A − λ_2 I = \begin{bmatrix} -3 & 6 & 3 \\ 1 & -2 & -1 \\ -3 & 6 & 3 \end{bmatrix} ∼ \begin{bmatrix} 1 & -2 & -1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}

Thus, a basis for E_{λ_2} is \left\{ \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} \right\}. Hence, g_{λ_2} = \dim E_{λ_2} = 2.
EXERCISE 3 Find the geometric and algebraic multiplicity of all eigenvalues of A = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 1 & 2 \\ 0 & 0 & -2 \end{bmatrix}.
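For a matrix with known, well-separated eigenvalues, both multiplicities can be computed mechanically: count repeated roots of C(λ) for a_λ, and use g_λ = n − rank(A − λI). A sketch on the matrix of Example 8 (illustrative; robust multiplicity detection for general floating-point spectra is considerably harder than this):

```python
import numpy as np

A = np.array([[-1, 6,  3],
              [ 1, 0, -1],
              [-3, 6,  5]], dtype=float)
lam = 2.0

# Algebraic multiplicity: number of roots of C(lambda) equal to lam.
alg = int(np.sum(np.isclose(np.linalg.eigvals(A), lam)))

# Geometric multiplicity: dim Null(A - lam I) = n - rank(A - lam I).
geo = A.shape[0] - np.linalg.matrix_rank(A - lam * np.eye(3))

print(alg, geo)  # 2 2, matching Example 8
```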
Observe in Example 8 that we needed to factor a cubic polynomial. For purposes of
this course, we typically have two options to make this easier. First, we can try to
use row and/or column operations when finding det(A−λI) so that we do not have to
factor a cubic polynomial, or alternately we can use the Rational Roots Theorem and
synthetic division to factor the cubic. Of course, in the real world we generally have
polynomials of degree much larger than 3 and are not so lucky as to have rational
roots. In these cases, techniques for approximating roots of polynomials or more
specialized methods for approximating eigenvalues are employed. These are beyond
the scope of this book. However, you should keep in mind that the problems we do
here are to help you understand the theory.
EXAMPLE 9 Find the geometric and algebraic multiplicity of all eigenvalues of A = \begin{bmatrix} -8 & -3 & -1 \\ 9 & 4 & -2 \\ -8 & -8 & -1 \end{bmatrix}.

Solution: Since adding a multiple of one column to another or adding a multiple of
one row to another does not change the determinant, we get

0 = C(λ) = \begin{vmatrix} -8-λ & -3 & -1 \\ 9 & 4-λ & -2 \\ -8 & -8 & -1-λ \end{vmatrix} = \begin{vmatrix} -5-λ & -3 & -1 \\ 5+λ & 4-λ & -2 \\ 0 & -8 & -1-λ \end{vmatrix} = \begin{vmatrix} -5-λ & -3 & -1 \\ 0 & 1-λ & -3 \\ 0 & -8 & -1-λ \end{vmatrix}
= (-5-λ)(λ^2 - 25) = -(λ + 5)^2(λ - 5)

Hence, the eigenvalues of A are λ_1 = −5 and λ_2 = 5 with a_{λ_1} = 2 and a_{λ_2} = 1.

For λ_1 = −5, we have A − λ_1 I = \begin{bmatrix} -3 & -3 & -1 \\ 9 & 9 & -2 \\ -8 & -8 & 4 \end{bmatrix} ∼ \begin{bmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}

Thus, a basis for E_{λ_1} is \left\{ \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix} \right\}. Hence, g_{λ_1} = \dim E_{λ_1} = 1.

For λ_2 = 5, we have A − λ_2 I = \begin{bmatrix} -13 & -3 & -1 \\ 9 & -1 & -2 \\ -8 & -8 & -6 \end{bmatrix} ∼ \begin{bmatrix} 1 & 0 & -1/8 \\ 0 & 1 & 7/8 \\ 0 & 0 & 0 \end{bmatrix}

Thus, a basis for E_{λ_2} is \left\{ \begin{bmatrix} 1/8 \\ -7/8 \\ 1 \end{bmatrix} \right\}. Hence, g_{λ_2} = \dim E_{λ_2} = 1.
Recall that our goal so far in this chapter is to answer the following two questions.
Given a linear operator L : R^n → R^n, is the standard matrix of L similar to a diagonal
matrix? If so, how do we find an invertible matrix P such that P^{-1}[L]P is diagonal?
In this section, we have discovered that if [L] is similar to a diagonal matrix D, then
the columns in P must be eigenvectors of [L] and the diagonal entries in D must be
eigenvalues of [L]. Using this and the following two results, we will finally be able
to fully answer both of these questions in the next section.
LEMMA 3 If A and B are similar matrices, then A and B have the same characteristic polynomial,
and hence the same eigenvalues.

THEOREM 4 If A is an n × n matrix with eigenvalue λ_1, then 1 ≤ g_{λ_1} ≤ a_{λ_1}.
Proof: First, by definition of an eigenvalue we have that g_{λ_1} = \dim(E_{λ_1}) ≥ 1.

Let {~v_1, . . . , ~v_k} be a basis for E_{λ_1}. Extend this to a basis {~v_1, . . . , ~v_k, ~w_{k+1}, . . . , ~w_n} for
R^n and let P = \begin{bmatrix} ~v_1 & \cdots & ~v_k & ~w_{k+1} & \cdots & ~w_n \end{bmatrix}. Since the columns of P form a basis for
R^n, we have that P is invertible by the Invertible Matrix Theorem. Let F = \begin{bmatrix} ~v_1 & \cdots & ~v_k \end{bmatrix}
and G = \begin{bmatrix} ~w_{k+1} & \cdots & ~w_n \end{bmatrix} so that P can be written as the block matrix P = \begin{bmatrix} F & G \end{bmatrix}.
Similarly, write P^{-1} as the block matrix P^{-1} = \begin{bmatrix} H \\ J \end{bmatrix} so that

I = P^{-1}P = \begin{bmatrix} H \\ J \end{bmatrix} \begin{bmatrix} F & G \end{bmatrix} = \begin{bmatrix} HF & HG \\ JF & JG \end{bmatrix}

In particular, this gives that HF = I_k and JF = O_{n-k,k}, the (n − k) × k zero matrix.
Define the matrix B by B = P^{-1}AP, so that B is similar to A. Then

B = \begin{bmatrix} H \\ J \end{bmatrix} A \begin{bmatrix} F & G \end{bmatrix} = \begin{bmatrix} H \\ J \end{bmatrix} \begin{bmatrix} AF & AG \end{bmatrix} = \begin{bmatrix} H \\ J \end{bmatrix} \begin{bmatrix} λ_1 F & AG \end{bmatrix} = \begin{bmatrix} λ_1 HF & HAG \\ λ_1 JF & JAG \end{bmatrix} = \begin{bmatrix} λ_1 I_k & X \\ O_{n-k,k} & Y \end{bmatrix}

where X is k × (n − k) and Y is (n − k) × (n − k). Then, by Lemma 3, the
characteristic polynomial C(λ) of A equals the characteristic polynomial of B. In
particular,

C(λ) = \det(B − λI) = \begin{vmatrix} (λ_1 − λ)I_k & X \\ O_{n-k,k} & Y − λI \end{vmatrix}

Expanding the determinant, we get

C(λ) = (−1)^k (λ − λ_1)^k \det(Y − λI)

Since λ_1 may or may not be a root of \det(Y − λI), we get that a_{λ_1} ≥ k = g_{λ_1}. ∎
Section 6.2 Problems
1. Determine which of the following are eigenvectors of A = \begin{bmatrix} -1 & 1 & -4 \\ 5 & -5 & -4 \\ -5 & 11 & 10 \end{bmatrix}.

(a) ~v_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}  (b) ~v_2 = \begin{bmatrix} 2 \\ 2 \\ -3 \end{bmatrix}  (c) ~v_3 = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}

2. Determine which of the following are eigenvectors of B = \begin{bmatrix} 1 & 3 & 3 \\ 6 & 7 & 12 \\ -3 & -3 & -5 \end{bmatrix}.

(a) ~v_1 = \begin{bmatrix} -1 \\ -1 \\ 1 \end{bmatrix}  (b) ~v_2 = \begin{bmatrix} 3 \\ -6 \\ 3 \end{bmatrix}  (c) ~v_3 = \begin{bmatrix} 2 \\ -2 \\ 1 \end{bmatrix}
3. Find the geometric and algebraic multiplicity of all eigenvalues of each matrix.

(a) \begin{bmatrix} 3 & 2 \\ 4 & 5 \end{bmatrix}  (b) \begin{bmatrix} -2 & 0 \\ 5 & -2 \end{bmatrix}  (c) \begin{bmatrix} 4 & -3 \\ 3 & -4 \end{bmatrix}

(d) \begin{bmatrix} -3 & -2 & -2 \\ 2 & 2 & 1 \\ 0 & 0 & 1 \end{bmatrix}  (e) \begin{bmatrix} -3 & 6 & -2 \\ -1 & 2 & -1 \\ 1 & -3 & 0 \end{bmatrix}  (f) \begin{bmatrix} 4 & 2 & 2 \\ 2 & 4 & 2 \\ 2 & 2 & 4 \end{bmatrix}

(g) \begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 2 \\ 0 & 2 & 1 \end{bmatrix}  (h) \begin{bmatrix} 2 & 2 & -1 \\ -2 & 1 & 2 \\ 2 & 2 & -1 \end{bmatrix}  (i) \begin{bmatrix} 1 & 6 & -4 \\ 5 & -3 & 1 \\ -5 & 6 & 2 \end{bmatrix}  (j) \begin{bmatrix} 1 & 0 & 1 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{bmatrix}
4. Let ~v = \begin{bmatrix} 1 \\ 3 \end{bmatrix}. Use geometric reasoning to determine the eigenvalues and
eigenvectors of the projection proj_{~v} : R² → R² and the rotation R_{π/3} : R² → R².
5. Show that if A is invertible and ~v is an eigenvector of A, then ~v is also an
eigenvector of A^{-1}. How are the corresponding eigenvalues related?
6. Invent a 2 × 2 matrix A that has eigenvalues 2 and 3 with corresponding eigenvectors
\begin{bmatrix} 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 1 \\ 3 \end{bmatrix} respectively.
7. Invent a 4 × 4 matrix A that has an eigenvalue λ such that g_λ < a_λ. Justify your answer.

8. Invent a 4 × 4 matrix A that has 4 distinct eigenvalues. Justify your answer.
9. Suppose A and B are n × n matrices and that ~u and ~v are eigenvectors of both A
and B, where A~u = 6~u, B~u = 5~u, A~v = 10~v, and B~v = 3~v.

(a) If C = AB, show that 5~u + 3~v is an eigenvector of C. Find the corresponding eigenvalue.

(b) If n = 2 and ~w = \begin{bmatrix} 0.2 \\ 1.4 \end{bmatrix}, compute AB~w.
10. Let A and B be n × n matrices with eigenvalues
λ and µ respectively. Does this imply that λµ is
an eigenvalue of AB? Justify your answer.
11. Suppose that A is an n × n matrix such that the sum of the entries in each row is
the same. That is, \sum_{j=1}^{n} a_{ij} = c for 1 ≤ i ≤ n. Show that ~v = \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix}
is an eigenvector of A. (Such matrices arise in probability theory.)
12. Prove Theorem 2.
13. Prove Lemma 3.
14. Prove the following addition to the Invertible
Matrix Theorem. A matrix A is invertible if
and only if λ = 0 is not an eigenvalue of A.
6.3 Diagonalization
In this section, we will reveal how to determine when a matrix A is similar to a
diagonal matrix D, and if it is, how to find a matrix P such that P−1AP = D. We
begin with a definition.
DEFINITION
Diagonalizable
An n × n matrix A is said to be diagonalizable if A is similar to a diagonal matrix D.
If P−1AP = D, then we say that P diagonalizes A.
REMARK
For now, we will restrict ourselves to diagonalizing real matrices with real eigenval-
ues. That is, if A has a non-real eigenvalue, then we will say that A is not diagonaliz-
able over R. In Chapter 11, we will look at diagonalizing matrices over the complex
numbers.
Our work in the last section motivates the following theorem.
THEOREM 1 (Diagonalization Theorem)
An n × n matrix A is diagonalizable if and only if there exists a basis {~v1, . . . ,~vn} for
Rn of eigenvectors of A.
Proof: Assume that A is diagonalizable. By definition, this means that there exists
an invertible matrix P = [~v1 · · · ~vn] such that
P−1AP = D (6.2)
where D = diag(λ1, . . . , λn) is diagonal. Since D is diagonal, the eigenvalues of D
are λ1, . . . , λn. Therefore, by Lemma 6.2.3, λ1, . . . , λn are also the eigenvalues of A.
Equation (6.2) gives
AP = PD
A[~v1 · · · ~vn] = P[λ1~e1 · · · λn~en]
[A~v1 · · · A~vn] = [λ1P~e1 · · · λnP~en]
[A~v1 · · · A~vn] = [λ1~v1 · · · λn~vn]
Thus, A~vi = λi~vi for 1 ≤ i ≤ n. Moreover, since P is invertible, {~v1, . . . ,~vn} is a
basis for Rn by the Invertible Matrix Theorem, which also implies ~vi ≠ ~0 for
1 ≤ i ≤ n. Hence, {~v1, . . . ,~vn} is a basis for Rn of eigenvectors of A.
On the other hand, if {~v1, . . . ,~vn} is a basis for Rn of eigenvectors, then the matrix
P = [~v1 · · · ~vn] is invertible by the Invertible Matrix Theorem, and we have
P−1AP = P−1[A~v1 · · · A~vn]
= P−1[λ1~v1 · · · λn~vn]
= P−1[λ1P~e1 · · · λnP~en]
= P−1P[λ1~e1 · · · λn~en]
= diag(λ1, . . . , λn)
Therefore, A is diagonalizable. �
It is very important to understand the proof of the Diagonalization Theorem. It does
a lot more than just prove the theorem. It shows us how to diagonalize a matrix!
In particular, we just need to find a basis {~v1, . . . ,~vn} for Rn of eigenvectors of A
corresponding to eigenvalues λ1, . . . , λn and then take P = [~v1 · · · ~vn]. Moreover,
it shows that taking this matrix P gives
P−1AP = diag(λ1, . . . , λn)
The next two results show us how to construct a basis for Rn of eigenvectors of A if
such a basis exists.
LEMMA 2 If A is an n × n matrix with eigenpairs (λ1,~v1), (λ2,~v2), . . . , (λk,~vk) where λi ≠ λ j for
i ≠ j, then {~v1, . . . ,~vk} is linearly independent.
Proof: We will prove this by induction. If k = 1, then the result is trivial, since by
definition of an eigenvector ~v1 ≠ ~0. Assume that the result is true for some k ≥ 1. To
show {~v1, . . . ,~vk,~vk+1} is linearly independent we consider
c1~v1 + · · · + ck~vk + ck+1~vk+1 = ~0 (6.3)
Observe that since A~vi = λi~vi we have
(A − λiI)~v j = A~v j − λi~v j = λ j~v j − λi~v j = (λ j − λi)~v j
for all 1 ≤ i, j ≤ k + 1. Thus, multiplying both sides of (6.3) by A − λk+1I gives
~0 = c1(λ1 − λk+1)~v1 + · · · + ck(λk − λk+1)~vk + ck+1(λk+1 − λk+1)~vk+1
= c1(λ1 − λk+1)~v1 + · · · + ck(λk − λk+1)~vk + ~0
By our induction hypothesis, {~v1, . . . ,~vk} is linearly independent and hence all the
coefficients must be 0. But, λi ≠ λk+1 for all 1 ≤ i ≤ k, so we must have
c1 = · · · = ck = 0. Thus (6.3) becomes
~0 + ck+1~vk+1 = ~0
But ~vk+1 ≠ ~0 since it is an eigenvector, hence ck+1 = 0. Therefore, the set is linearly
independent. �
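As a quick numerical illustration of Lemma 2 (a NumPy sketch, not part of the text's development): a matrix with distinct eigenvalues yields a linearly independent set of computed eigenvectors.

```python
import numpy as np

# Illustrate Lemma 2 numerically: this matrix has the distinct
# eigenvalues 3 and -1, so its eigenvectors must be independent.
A = np.array([[1.0, 2.0],
              [2.0, 1.0]])
eigvals, eigvecs = np.linalg.eig(A)   # columns of eigvecs are eigenvectors

assert not np.isclose(eigvals[0], eigvals[1])   # eigenvalues are distinct
assert np.linalg.matrix_rank(eigvecs) == 2      # eigenvectors are independent
```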
THEOREM 3 If A is an n × n matrix with distinct eigenvalues λ1, . . . , λk and
Bi = {~vi,1, . . . ,~vi,gλi} is a basis for the eigenspace of λi for 1 ≤ i ≤ k, then B1 ∪ B2 ∪
· · · ∪ Bk is a linearly independent set.
Proof: We prove the result by induction on k. If k = 1, then B1 is linearly indepen-
dent since it is a basis for Eλ1. Assume the result is true for some j ≥ 1. That is,
assume that B1 ∪ · · · ∪ B j is linearly independent and consider B1 ∪ · · · ∪ B j ∪ B j+1.
For simplicity, we will denote the vectors in B1 ∪ · · · ∪ B j by ~w1, . . . , ~wℓ. Assume that
there exists a non-trivial solution to the equation
c1~w1 + · · · + cℓ~wℓ + d1~v j+1,1 + · · · + dgλ j+1~v j+1,gλ j+1 = ~0
Observe that c1, . . . , cℓ cannot all be zero, as otherwise this would contradict the
fact that B j+1 is a basis. Similarly, d1, . . . , dgλ j+1 cannot all be zero, as this would
contradict the inductive hypothesis. Thus, we must have at least one ci non-zero
and at least one di non-zero. But then we would have a linear combination of some
vectors in B j+1 equaling a linear combination of vectors in B1, . . . ,B j which would
contradict Lemma 2. Hence, the only solution must be the trivial solution. Therefore,
B1∪· · ·∪B j∪B j+1 is linearly independent. The result now follows by induction. �
Theorem 3 shows us how to form a linearly independent set of eigenvectors of a
matrix A. We first find a basis for the eigenspace of every eigenvalue of A. Then,
the set containing all the vectors in every basis is linearly independent by Theorem 3.
However, for the matrix P to be invertible, we actually need n linearly independent
eigenvectors of A. The next result gives us the condition required for there to exist n
linearly independent eigenvectors.
COROLLARY 4 If A is an n × n matrix with distinct eigenvalues λ1, . . . , λk, then A is diagonalizable
if and only if gλi = aλi for 1 ≤ i ≤ k.
If λ is an eigenvalue of A such that gλ < aλ, then λ is said to be deficient. Observe
that since gλ ≥ 1 for every eigenvalue λ, by Theorem 6.2.4 any eigenvalue λ such
that aλ = 1 cannot be deficient. This gives the following result.
COROLLARY 5 If A is an n × n matrix with n distinct eigenvalues, then A is diagonalizable.
We now have an algorithm for diagonalizing a matrix A or showing that it is not
diagonalizable.
ALGORITHM
To diagonalize an n × n matrix A, or show that A is not diagonalizable:
1. Find and factor the characteristic polynomial C(λ) = det(A − λI).
2. Let λ1, . . . , λn denote the n roots of C(λ) (repeated according to multiplicity).
If any of the eigenvalues λi are not real, then A is not diagonalizable over R.
3. Find a basis for the eigenspace of each λi by finding a basis for the nullspace
of A − λiI.
4. If gλi < aλi for any λi, then A is not diagonalizable. Otherwise, form
a basis {~v1, . . . ,~vn} for Rn of eigenvectors of A by using Theorem 3. Let
P = [~v1 · · · ~vn]. Then, P−1AP = diag(λ1, . . . , λn) where λi is an eigenvalue
corresponding to the eigenvector ~vi for 1 ≤ i ≤ n.
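The steps above can be sketched numerically with NumPy. This is an illustrative sketch, not the text's exact-arithmetic procedure: `diagonalize` is a hypothetical helper, and floating-point eigenvectors stand in for the exact eigenspace bases of Step 3.

```python
import numpy as np

def diagonalize(A, tol=1e-10):
    """Hypothetical helper: return (P, D) with P^(-1) A P = D, or None
    if A is not diagonalizable over R."""
    eigvals, eigvecs = np.linalg.eig(A)
    if np.max(np.abs(eigvals.imag)) > tol:
        return None            # Step 2: a non-real eigenvalue
    P = eigvecs.real           # candidate eigenvector basis (Step 3)
    if abs(np.linalg.det(P)) < tol:
        return None            # Step 4: some eigenvalue is deficient
    return P, np.diag(eigvals.real)

A = np.array([[1.0, 2.0], [2.0, 1.0]])
P, D = diagonalize(A)
assert np.allclose(np.linalg.inv(P) @ A @ P, D)                  # P diagonalizes A
assert diagonalize(np.array([[1.0, 1.0], [0.0, 1.0]])) is None   # deficient (Example 2)
assert diagonalize(np.array([[0.0, 1.0], [-1.0, 0.0]])) is None  # non-real (Example 4)
```

Floating-point tolerances make this a heuristic check only; the hand computation with exact eigenspace bases remains the reliable method.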
EXAMPLE 1 Show that A = [−1 6 3; 1 0 −1; −3 6 5] is diagonalizable and find an invertible
matrix P and a diagonal matrix D such that P−1AP = D.
Solution: In Example 6.2.8 we found that the eigenvalues of A are λ1 = 0 and
λ2 = 2, and that aλ1 = gλ1 and aλ2 = gλ2. Hence, A is diagonalizable by Corollary
4. Moreover, the columns of P are the basis vectors from each of the eigenspaces.
That is,
P = [1 2 1; −1/3 1 0; 1 0 1]
The diagonal entries in D are the eigenvalues of A corresponding column-wise to the
columns in P. Hence,
P−1AP = D = diag(0, 2, 2) = [0 0 0; 0 2 0; 0 0 2]
REMARK
It is important to remember that the answer is not unique. We can arrange the linearly
independent eigenvectors of A as the columns of P in any order. We only have to
ensure the entry dii in D is the eigenvalue of A corresponding to the eigenvector in the
i-th column of P.
EXERCISE 1 Find two other invertible matrices P which diagonalize
A = [−1 6 3; 1 0 −1; −3 6 5]. In each case, give the corresponding diagonal matrix D.
EXAMPLE 2 Show that A = [1 1; 0 1] is not diagonalizable.
Solution: We have
0 = C(λ) = |1 − λ 1; 0 1 − λ| = (λ − 1)^2.
Hence, λ1 = 1 is the only eigenvalue and aλ1 = 2. We then get
A − λ1I = [0 1; 0 0]
So, a basis for Eλ1 is {[1, 0]}. Thus, A is not diagonalizable, because gλ1 = 1 < aλ1.
EXAMPLE 3 Show that A = [1 2; 2 1] is diagonalizable and find an invertible matrix P
and a diagonal matrix D such that P−1AP = D.
Solution: We have
0 = C(λ) = |1 − λ 2; 2 1 − λ| = λ^2 − 2λ − 3 = (λ − 3)(λ + 1).
Thus, the eigenvalues of A are λ1 = 3 and λ2 = −1. Since the 2 × 2 matrix A has 2
distinct eigenvalues, it is diagonalizable by Corollary 5.
For λ1 = 3, we have
A − λ1I = [−2 2; 2 −2] ∼ [1 −1; 0 0]
Thus, a basis for Eλ1 is {[1, 1]}.
For λ2 = −1, we have
A − λ2I = [2 2; 2 2] ∼ [1 1; 0 0]
Thus, a basis for Eλ2 is {[−1, 1]}.
We now let P = [−1 1; 1 1] and get D = P−1AP = [−1 0; 0 3].
EXAMPLE 4 Show that A = [0 1; −1 0] is not diagonalizable over R.
Solution: We have
0 = C(λ) = |−λ 1; −1 −λ| = λ^2 + 1.
The eigenvalues of A are not real, so A is not diagonalizable over R.
REMARK
We will show in Chapter 11 that A = [0 1; −1 0] is diagonalizable over the complex
numbers.
EXAMPLE 5 Show that A = [1 2 2; 2 1 2; 2 2 1] is diagonalizable and find an invertible
matrix P and a diagonal matrix D such that P−1AP = D.
Solution: Using column and row operations on the determinant, we get
0 = C(λ) = |1 − λ 2 2; 2 1 − λ 2; 2 2 1 − λ|
= |1 − λ 2 0; 2 1 − λ 1 + λ; 2 2 −1 − λ|
= |1 − λ 2 0; 4 3 − λ 0; 2 2 −1 − λ|
= (−1 − λ)|1 − λ 2; 4 3 − λ|
= −(λ + 1)(λ^2 − 4λ − 5)
= −(λ + 1)^2(λ − 5)
Thus, the eigenvalues of A are λ1 = −1 and λ2 = 5, where aλ1 = 2 and aλ2 = 1.
For λ1 = −1, we have
A − λ1I = [2 2 2; 2 2 2; 2 2 2] ∼ [1 1 1; 0 0 0; 0 0 0]
Thus, a basis for Eλ1 is {[−1, 1, 0], [−1, 0, 1]}, so gλ1 = aλ1.
For λ2 = 5, we have
A − λ2I = [−4 2 2; 2 −4 2; 2 2 −4] ∼ [1 0 −1; 0 1 −1; 0 0 0]
Thus, a basis for Eλ2 is {[1, 1, 1]}, so gλ2 = aλ2.
Therefore, A is diagonalizable by Corollary 4. We can take P = [−1 −1 1; 1 0 1; 0 1 1]
and get D = P−1AP = diag(−1, −1, 5).
We conclude this section with a couple of remarks and a result about the eigenvalues
of a matrix. First, remember that by definition of an eigenvalue λ of a matrix A, the
RREF of A − λI must have at least one row of zeros. If you ever find that the RREF
of your A − λI is I, then you either made a mistake in your row reducing, or your
eigenvalue is not actually an eigenvalue. Secondly, a lot of mathematics is about
looking for patterns. As you do diagonalization examples, look for patterns in the
values of eigenvalues and eigenvectors. There are some tricks for finding eigenvalues
and/or eigenvectors in some special cases. The following theorem can help.
THEOREM 6 If λ1, . . . , λn are all the eigenvalues of an n × n matrix A, then
tr A = λ1 + · · · + λn
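Theorem 6 is easy to spot-check numerically (a NumPy illustration; the matrix is the one from Example 5, whose eigenvalues −1, −1, 5 sum to the trace 1 + 1 + 1 = 3):

```python
import numpy as np

# Matrix from Example 5: eigenvalues -1, -1, 5; trace 1 + 1 + 1 = 3.
A = np.array([[1.0, 2.0, 2.0],
              [2.0, 1.0, 2.0],
              [2.0, 2.0, 1.0]])
assert np.isclose(np.trace(A), np.sum(np.linalg.eigvals(A)))
```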
Section 6.3 Problems
1. For each of the following matrices, determine
the eigenvalues and a basis for each eigenspace
and hence determine if the matrix is diagonal-
izable. If it is, write a diagonalizing matrix P
and the resulting matrix D.
(a) [3 −1; −1 3]
(b) [1 2; 0 1]
(c) [4 −2; −2 7]
(d) [5 −1 1; −1 5 1; 1 1 5]
(e) [1 0 −1; 2 2 1; 4 0 5]
(f) [1 1 2; 2 0 1; 1 −1 −1]
(g) [1 −2 3; 2 6 −6; 1 2 −1]
(h) [3 1 1; −4 −2 −5; 2 2 5]
(i) [−3 −3 5; 13 10 −13; 3 2 −1]
(j) [8 6 −10; 5 1 −5; 5 3 −7]
(k) [2 1 0; 1 3 1; 3 1 4]
(l) [1 1 1 1; 1 1 1 1; 1 1 1 1; 1 1 1 1]
2. For each linear operator L, find a basis B such
that [L]B is diagonal.
(a) L(x1, x2) = (3x1 − x2,−x1 + 3x2)
(b) L(x1, x2) = (x1 + 3x2, 4x1 + 2x2)
(c) L(x1, x2, x3) = (−2x1 + 2x2 + 2x3,−3x1 +
3x2 + 2x3,−2x1 + 2x2 + 2x3)
3. Let A be an n × n diagonalizable matrix with
eigenvalues λ1, . . . , λn repeated according to
multiplicity.
(a) Prove that tr A = λ1 + · · · + λn.
(b) Prove that det A = λ1 · · ·λn.
4. Write an invertible matrix A that is not diago-
nalizable, an invertible matrix B that is diago-
nalizable, a non-invertible matrix C that is not
diagonalizable, and a non-invertible matrix E
that is diagonalizable.
5. Suppose that A is diagonalizable and that the
eigenvalues of A are λ1, . . . , λn. Show that
A − λ1I is also diagonalizable and its eigenval-
ues are 0, λ2 − λ1, λ3 − λ1, . . . , λn − λ1.
6. Prove that if A is diagonalizable, then AT is diagonalizable.
7. Let A = [a b; c d].
(a) Prove that A is diagonalizable over R if (a − d)^2 + 4bc > 0 and is not
diagonalizable over R if (a − d)^2 + 4bc < 0.
(b) Find two examples to demonstrate that if (a − d)^2 + 4bc = 0, then A may
or may not be diagonalizable.
(c) What condition is required on (a − d)^2 + 4bc to ensure that A has two
integer eigenvalues?
6.4 Powers of Matrices
We now briefly look at one of the many applications of diagonalization. In particular,
we will look at how to use diagonalization to easily compute powers of a diagonaliz-
able matrix.
Let A be an n × n matrix. We define powers of a matrix in the expected way. That is,
for any positive integer k, A^k is defined to be A multiplied by itself k times. Thus, for
A = [1 2; −1 4] we have
A^2 = [1 2; −1 4][1 2; −1 4] = [−1 10; −5 14]
A^3 = A^2 A = [−1 10; −5 14][1 2; −1 4] = [−11 38; −19 46]
A^1000 = [2^1001 − 3^1000  −2^1001 + 2·3^1000; 2^1000 − 3^1000  −2^1000 + 2·3^1000]
To calculate A^1000 we certainly did not multiply a thousand matrices A together. In-
stead, we used the power of the diagonal form of a matrix.
Observe that multiplying a diagonal matrix D by itself is easy. In particular, if
D = diag(d1, . . . , dn), then D^k = diag(d1^k, . . . , dn^k). The following theorem
shows us how we can use this fact with diagonalization to quickly take powers of a
diagonalizable matrix A.
THEOREM 1 Let A be an n × n matrix. If there exists an invertible matrix P and a diagonal
matrix D such that P−1AP = D, then
A^k = PD^k P−1
Proof: We prove the result by induction on k. If k = 1, then P−1AP = D implies
A = PDP−1 and so the result holds. Assume the result is true for some k ≥ 1. We
then have
A^(k+1) = A^k A = (PD^k P−1)(PDP−1) = PD^k P−1PDP−1 = PD^k IDP−1 = PD^(k+1) P−1
as required. �
EXAMPLE 1 Let A = [1 2; −1 4]. Show that
A^1000 = [2^1001 − 3^1000  −2^1001 + 2·3^1000; 2^1000 − 3^1000  −2^1000 + 2·3^1000].
Solution: We first diagonalize A. We have
0 = det(A − λI) = |1 − λ 2; −1 4 − λ| = λ^2 − 5λ + 6 = (λ − 2)(λ − 3)
Thus, the eigenvalues of A are λ1 = 2 and λ2 = 3.
For λ1 = 2 we have
A − λ1I = [−1 2; −1 2] ∼ [−1 2; 0 0]
so a basis for Eλ1 is {[2, 1]}.
For λ2 = 3 we have
A − λ2I = [−2 2; −1 1] ∼ [1 −1; 0 0]
so a basis for Eλ2 is {[1, 1]}.
Therefore, P = [2 1; 1 1] and D = [2 0; 0 3]. We find that P−1 = [1 −1; −1 2]. Hence,
A^1000 = PD^1000 P−1
= [2 1; 1 1][2^1000 0; 0 3^1000][1 −1; −1 2]
= [2^1001 3^1000; 2^1000 3^1000][1 −1; −1 2]
= [2^1001 − 3^1000  −2^1001 + 2·3^1000; 2^1000 − 3^1000  −2^1000 + 2·3^1000]
EXAMPLE 2 Let A = [−2 2; −3 5]. Calculate A^200.
Solution: We have
0 = det(A − λI) = |−2 − λ 2; −3 5 − λ| = λ^2 − 3λ − 4 = (λ + 1)(λ − 4).
Thus, the eigenvalues of A are λ1 = −1 and λ2 = 4.
For λ1 = −1 we have A − λ1I = [−1 2; −3 6] ∼ [1 −2; 0 0], so a basis for Eλ1 is {[2, 1]}.
For λ2 = 4 we have A − λ2I = [−6 2; −3 1] ∼ [1 −1/3; 0 0], so a basis for Eλ2 is {[1, 3]}.
Therefore, P = [2 1; 1 3] and D = [−1 0; 0 4]. We find that P−1 = (1/5)[3 −1; −1 2]. Hence,
A^200 = PD^200 P−1 = [2 1; 1 3][1 0; 0 4^200] (1/5)[3 −1; −1 2]
= (1/5)[6 − 4^200  −2 + 2·4^200; 3 − 3·4^200  −1 + 6·4^200]
Section 6.4 Problems
1. Let A = [4 −1; −2 5]. Use diagonalization to calculate A^3. Verify your answer
by computing A^3 directly.
2. Calculate A^100 where A = [−6 −10; 4 7].
3. Calculate A^100 where A = [2 2; −3 −5].
4. Calculate A^200 where A = [−2 2; −3 5].
5. Calculate A^100 where A = [−2 1 1; −1 0 1; −2 2 1].
6. Show that if λ is an eigenvalue of A, then λ^n is an eigenvalue of A^n. How are
the corresponding eigenvectors related?
7. An n × n matrix T is called a Markov matrix (or transition matrix) if ti j ≥ 0 for
all i, j, and t1 j + t2 j + · · · + tn j = 1 for each j. Prove that one eigenvalue of a
Markov matrix T is λ1 = 1.
8. Recall the population-migration system described in Problem 3.1.13 on page 86.
(a) Given initial populations of x0 and y0, use diagonalization to verify that the
populations in year t are given by
[xt; yt] = A^t [x0; y0] = (1/3)[2 + (7/10)^t  2 − 2(7/10)^t; 1 − (7/10)^t  1 + 2(7/10)^t][x0; y0]
(b) Use this formula to confirm that, if the initial populations sum to T (that is,
x0 + y0 = T ), then in the long term (i.e. as t gets very large), the populations
tend to
x = (2/3)T, y = (1/3)T
regardless of the separate values of x0 and y0. This is referred to as a stable
population profile. (Compare with Problem 3.1.13(c).)
9. Recall the competing suppliers of internet ser-
vices described in Problem 3.1.14 on page 83.
Use diagonalization to calculate the market
share of each company after six months and to
calculate how the market shares will be divided
in the long run.
10. Recall the transferring of cars between the
three locations of a car rental company de-
scribed in Problem 3.1.15 on page 83. Use di-
agonalization to determine how the cars will be
distributed in the long run.
11. Recall the one-step recursion definition of the Fibonacci sequence given in
Problem 3.1.16 on page 83. Use diagonalization to derive the following explicit
expression for the n-th term of the Fibonacci sequence:
an = (1/√5)[((1 + √5)/2)^n − ((1 − √5)/2)^n]
Chapter 7
Fundamental Subspaces
The main purpose of this chapter is to review a few important concepts from the first
six chapters. These concepts include subspaces, bases, and dimension. As you will
soon see, the rest of the book relies heavily on these and other concepts from the first
six chapters.
7.1 Bases of Fundamental Subspaces
Recall from Section 3.3 the four fundamental subspaces of a matrix.
DEFINITION
Fundamental Subspaces
Let A be an m × n matrix. The four fundamental subspaces of A are
1. Col(A) = {A~x ∈ Rm | ~x ∈ Rn}, called the column space,
2. Row(A) = {AT~x ∈ Rn | ~x ∈ Rm}, called the row space,
3. Null(A) = {~x ∈ Rn | A~x = ~0}, called the nullspace,
4. Null(AT ) = {~x ∈ Rm | AT~x = ~0}, called the left nullspace.
THEOREM 1 If A is an m × n matrix, then Col(A) and Null(AT ) are subspaces of Rm, and Row(A)
and Null(A) are subspaces of Rn.
Our goal now is to find an easy way to determine a basis for each of the four funda-
mental subspaces.
REMARK
To help you understand the following proof, you may wish to pick a simple 3 × 2
matrix A and follow the steps of the proof with your matrix A.
THEOREM 2 If A is an m × n matrix, then the columns of A which correspond to leading ones in
the reduced row echelon form of A form a basis for Col(A). Moreover,
dim Col(A) = rank A
Proof: We first observe that if A is the zero matrix, then the result is trivial. Other-
wise, A contains a non-zero entry and so rank A = r > 0.
Let the RREF of A be R = [~r1 · · · ~rn]. Since rank A = r, R contains r leading ones.
Let t1, . . . , tr denote the indices of the columns of R which contain leading ones. We
will first show that B = {~rt1 , . . . ,~rtr } is a basis for Col(R).
By definition of the RREF, the vectors ~rt1 , . . . ,~rtr contain a leading one and all other
entries must be zero. Thus, they are standard basis vectors. Moreover, they are
distinct standard basis vectors since each leading one must be in a different row.
Hence, B is linearly independent.
Also, by definition of the RREF, all columns of R without leading ones are in Span B,
as they cannot contain a non-zero entry in any row without a leading one. Therefore,
we also have Span B = Col(R) and hence B is a basis for Col(R) as claimed.
Let A = [~a1 · · · ~an]. We will show that C = {~at1 , . . . , ~atr } is a basis for Col(A)
by using the fact that B is a basis for Col(R). To do this, we first need to find a
relationship between the vectors in B and C.
Since R is the reduced row echelon form of A, by Theorem 5.2.6 there exists a
sequence of elementary matrices E1, . . . , Ek such that Ek · · · E1A = R. Let
E = Ek · · · E1. Recall that every elementary matrix is invertible, hence
E−1 = E1−1 · · · Ek−1 exists. Then
R = EA = [E~a1 · · · E~an]
Consequently, ~ri = E~ai, or ~ai = E−1~ri.
To prove that C is linearly independent, we use the definition of linear independence.
Consider
c1~at1 + · · · + cr~atr =~0
Multiply both sides by E to get
E(c1~at1 + · · · + cr~atr ) = E~0
c1E~at1 + · · · + crE~atr =~0
c1~rt1 + · · · + cr~rtr =~0
This implies that c1 = · · · = cr = 0 since {~rt1 , . . . ,~rtr } is linearly independent. Thus,
C is linearly independent.
To show that C spans Col(A), we need to show that if we pick any ~b ∈ Col(A), we
can write it as a linear combination of the vectors ~at1 , . . . , ~atr . Let ~b ∈ Col(A).
By definition of the column space, there exists a ~x ∈ Rn such that
~b = A~x
~b = E−1R~x
E~b = R~x
Therefore, E~b is in the column space of R and hence can be written as a linear com-
bination of the basis vectors {~rt1 , . . . ,~rtr }. That is, there exist d1, . . . , dr ∈ R such
that
E~b = d1~rt1 + · · · + dr~rtr
~b = E−1(d1~rt1 + · · · + dr~rtr )
~b = d1E−1~rt1 + · · · + drE−1~rtr
~b = d1~at1 + · · · + dr~atr
as required. Thus, we also have Span{~at1 , . . . , ~atr} = Col(A). Hence, {~at1 , . . . , ~atr} is a
basis for Col(A).
Recall that the dimension of a vector space is the number of vectors in any basis.
Thus, dim Col(A) = r = rank A. �
LEMMA 3 If R is an m × n matrix and E is an n × n invertible matrix, then
{RE~x | ~x ∈ Rn} = {R~y | ~y ∈ Rn}
THEOREM 4 If A is an m × n matrix, then the non-zero rows in the reduced row echelon form of
A form a basis for Row(A). Hence, dim Row(A) = rank A.
Proof: Let R be the RREF of A. By definition of RREF, we have that the set of all
non-zero rows of R form a basis for the row space of R. Moreover, if E is a matrix as
in the proof of Theorem 2, then we get that A = E−1R and
Row(A) = {AT~x | ~x ∈ Rm}
= {(E−1R)T~x | ~x ∈ Rm}
= {RT(E−1)T~x | ~x ∈ Rm}
= {RT~y | ~y ∈ Rm}  by Lemma 3
= Row(R)
Hence, the set of all non-zero rows of R also form a basis for the row space of A. �
Using Theorem 2 and Theorem 4 we get the following corollary.
COROLLARY 5 For any m × n matrix A we have rank A = rank AT .
REMARK
Due to its usefulness, it is very common to define the rank of a matrix as the dimen-
sion of the column space (or row space) and then prove that this is equivalent to the
number of leading ones in the reduced row echelon form.
Thanks to our previous experience in finding bases of nullspaces (which you prac-
ticed a lot in Chapter 6), once we have found the reduced row echelon form of a ma-
trix, we can easily find a basis for the nullspace, column space, and row space of
the matrix. To find a basis for the left nullspace of a matrix, we can use our method
for finding a basis for the nullspace on AT . We demonstrate this with a couple of
examples.
EXAMPLE 1 Let A = [1 2 −1; 1 1 2; 1 0 5]. Find a basis for the four fundamental subspaces.
Solution: Row reducing A we get
[1 2 −1; 1 1 2; 1 0 5] ∼ [1 0 5; 0 1 −3; 0 0 0] = R
A basis for Row(A) is the set of non-zero rows of R. Thus, a basis is {[1, 0, 5], [0, 1, −3]}
(remember that in linear algebra, vectors in Rn are always written as column vectors).
A basis for Col(A) is given by the columns of A which correspond to the columns of R
which contain leading ones. Hence, a basis for Col(A) is {[1, 1, 1], [2, 1, 0]}.
To find a basis for Null(A), we solve A~x = ~0 as normal. That is, we rewrite the system
R~x = ~0 as a system of equations:
x1 + 5x3 = 0
x2 − 3x3 = 0
Hence, x3 is a free variable and a vector equation of the solution space is
[x1, x2, x3] = [−5x3, 3x3, x3] = x3 [−5, 3, 1], x3 ∈ R
Consequently, a basis for Null(A) is {[−5, 3, 1]}.
To find a basis for Null(AT ), we can just use our method for finding the nullspace of
a matrix on AT . That is, we row reduce AT to get
[1 1 1; 2 1 0; −1 2 5] ∼ [1 0 −1; 0 1 2; 0 0 0]
Thus, x3 is a free variable and a vector equation for the solution space is
[x1, x2, x3] = [x3, −2x3, x3] = x3 [1, −2, 1], x3 ∈ R
Therefore, a basis for the left nullspace of A is {[1, −2, 1]}.
EXAMPLE 2 Let A = [2 4 −4 2; 1 2 2 −7; −1 −2 0 3]. Find a basis for the four fundamental
subspaces.
Solution: Row reducing A we get
[2 4 −4 2; 1 2 2 −7; −1 −2 0 3] ∼ [1 2 0 −3; 0 0 1 −2; 0 0 0 0] = R
The set of non-zero rows {[1, 2, 0, −3], [0, 0, 1, −2]} of R forms a basis for Row(A).
The columns of A which correspond to the columns of R which contain leading ones
form a basis for Col(A). Hence, a basis for Col(A) is {[2, 1, −1], [−4, 2, 0]}.
To find a basis for Null(A), we solve A~x = ~0 as normal. That is, we rewrite the system
R~x = ~0 as a system of equations:
x1 + 2x2 − 3x4 = 0
x3 − 2x4 = 0
Hence, x2 and x4 are free variables and a vector equation for the solution set is
[x1, x2, x3, x4] = [−2x2 + 3x4, x2, 2x4, x4] = x2 [−2, 1, 0, 0] + x4 [3, 0, 2, 1], x2, x4 ∈ R
Hence, a basis for Null(A) is {[−2, 1, 0, 0], [3, 0, 2, 1]}.
Now, we row reduce AT to get
[2 1 −1; 4 2 −2; −4 2 0; 2 −7 3] ∼ [1 0 −1/4; 0 1 −1/2; 0 0 0; 0 0 0]
Consequently, x3 is a free variable and the general solution is
[x1, x2, x3] = [(1/4)x3, (1/2)x3, x3] = x3 [1/4, 1/2, 1], x3 ∈ R
Therefore, a basis for the left nullspace of A is {[1/4, 1/2, 1]}.
REMARK
It is not actually necessary to row reduce AT to find a basis for the left nullspace of A.
We can use the fact that EA = R where E is the product of elementary matrices used
to bring A to its reduced row echelon form R to find a basis for the left nullspace. The
derivation of this procedure is left as an exercise.
Notice in each of the examples that the dimension of the column space plus the
dimension of the nullspace equals the number of columns of the matrix. This result
should not be surprising since Theorem 2 tells us that the dimension of the column
space equals the number of leading ones, and the Solution Theorem (Theorem 2.2.6)
tells us that the dimension of the nullspace equals the number of columns without
leading ones. We get the following theorem.
THEOREM 6 (Dimension Theorem)
If A is an m × n matrix, then rank A + dim Null(A) = n.
Proof: Let B = {~v1, . . . ,~vk} be a basis for Null(A) so that dim Null(A) = k. By
Theorem 4.2.4 we can extend B to a basis {~v1, . . . ,~vk,~vk+1, . . . ,~vn} for Rn. We will
prove that {A~vk+1, . . . , A~vn} is a basis for Col(A).
Consider
~0 = ck+1A~vk+1 + · · · + cnA~vn = A(ck+1~vk+1 + · · · + cn~vn)
This implies that ck+1~vk+1 + · · · + cn~vn ∈ Null(A) and hence we can write it as a linear
combination of the basis vectors for Null(A). That is, there exist d1, . . . , dk ∈ R such
that
ck+1~vk+1 + · · · + cn~vn = d1~v1 + · · · + dk~vk
−d1~v1 − · · · − dk~vk + ck+1~vk+1 + · · · + cn~vn = ~0
Thus, d1 = · · · = dk = ck+1 = · · · = cn = 0 since {~v1, . . . ,~vn} is linearly independent.
Therefore, {A~vk+1, . . . , A~vn} is linearly independent.
Let ~b ∈ Col(A). Then ~b = A~x for some ~x ∈ Rn. Since ~x ∈ Rn, there exist
s1, . . . , sn ∈ R such that
~b = A(s1~v1 + · · · + sk~vk + sk+1~vk+1 + · · · + sn~vn)
= s1A~v1 + · · · + skA~vk + sk+1A~vk+1 + · · · + snA~vn
But, ~vi ∈ Null(A) for 1 ≤ i ≤ k, so A~vi = ~0. Hence, we have
~b = ~0 + · · · + ~0 + sk+1A~vk+1 + · · · + snA~vn = sk+1A~vk+1 + · · · + snA~vn
Thus, Span{A~vk+1, . . . , A~vn} = Col(A).
Therefore, we have shown that {A~vk+1, . . . , A~vn} is a basis for Col(A) and hence
rank A = dim Col(A) = n − k = n − dim Null(A)
as required. �
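The theorem is easy to spot-check (a SymPy illustration using the matrix of Example 2, which has rank 2 and a 2-dimensional nullspace):

```python
from sympy import Matrix

# Rank-nullity on the matrix of Example 2: rank A + dim Null(A) = n.
A = Matrix([[2, 4, -4, 2],
            [1, 2, 2, -7],
            [-1, -2, 0, 3]])
assert A.rank() + len(A.nullspace()) == A.cols   # 2 + 2 = 4 columns
```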
Wait! Why did we just give a proof of this theorem when we already stated that
the result follows from Theorem 1 and the Solution Theorem? We did this for three
reasons. First, a proof can often teach us more than just that the statement of the
theorem is true. For example, this proof teaches us a way of finding a basis for the
column space if we are given a basis for the nullspace, and it teaches us a proof
technique (which we will use again soon!). Second, in Chapter 2 we omitted a proof
of the Solution Theorem, so it is nice to make up for that now. Lastly, because proving
theorems is fun.
EXERCISE 1 Let L : Rn → Rm be a linear mapping. Prove that
dim Range(L) + dim Ker(L) = dim(Rn)
in two ways. First, by mimicking the proof of the Dimension Theorem. Second, by
using the Dimension Theorem and the theorems in Section 3.3.
Section 7.1 Problems
1. Find a basis for the four fundamental subspaces of each matrix.
(a) [0 1 3; 1 2 3]
(b) [1 2 −1; 2 4 3]
(c) [2 −1 5; 3 4 2; 1 1 1]
(d) [1 −2 3 5; −2 4 0 −4; 3 −6 −5 1]
(e) [3 2 5 3; −1 0 −3 1; 1 1 1 1; 1 4 −5 1]
2. If {[1, 0, 0], [0, 1, 0]} is a basis for the nullspace of a 4 × 3 matrix
A = [~a1 ~a2 ~a3], then what is a basis for Col(A)?
3. Let A be an m × n matrix with rank(A) =
r < m and reduced row echelon form R. Let
E1, . . . , Ek be a sequence of elementary matri-
ces such that Ek · · ·E1A = R. Prove that the
bottom m − r rows of E = Ek · · ·E1 form a ba-
sis for the left nullspace of A.
4. Prove Lemma 3.
5. Let A be an n × n matrix. Prove that Null(A) = {~0} if and only if det A ≠ 0.
6. Let B be an m × n matrix.
(a) Prove that if ~x is any vector in the left
nullspace of B, then ~xT B = ~0T .
(b) Prove that if ~x is any vector in the left
nullspace of B and ~y is any vector in the
column space of B, then ~x · ~y = 0.
7. Invent a 2 × 2 matrix A such that
Null(A) = Col(A).
8. Prove that if n×n matrices A and B are similar,
then rank A = rank B.
9. Let A be an m × n matrix.
(a) Prove that Null(AT A) = Null(A).
(b) Prove that rank(AT A) = rank(A).
10. Prove that if AB = Om,n, then the column space of B is a subspace of the
nullspace of A.
11. Let S = Span{[1, 1, 1], [1, 0, 2]}.
(a) Find a matrix A such that Col(A) = S.
(b) Find a matrix B such that Row(B) = S.
(c) Find a matrix C such that Null(C) = S.
(d) Find a matrix D such that Null(DT ) = S.
(e) Is there a matrix F such that Col(F) = S
and Null(F) = S?
Chapter 8
Linear Mappings
8.1 General Linear Mappings
Linear Mappings L : V→ W
We now extend our definition of a linear mapping to the case where the domain and
codomain are general vector spaces instead of just Rn.
DEFINITION
Linear Mapping
Let V and W be vector spaces. A mapping L : V → W is called linear if
L(s~x + t~y) = sL(~x) + tL(~y)
for all ~x, ~y ∈ V and s, t ∈ R.
Two linear mappings L : V → W and M : V → W are said to be equal if L(~v) = M(~v)
for all ~v ∈ V.
REMARKS
1. The definition of a linear mapping above is well defined because of the closure
properties of vector spaces.
2. As before, we sometimes call a linear mapping L : V→ V a linear operator.
EXAMPLE 1 Let L : P3(R) → R2 be defined by L(a + bx + cx^2 + dx^3) = [a + d, b + c].
(a) Evaluate L(1 + 2x^2 − x^3).
Solution: L(1 + 2x^2 − x^3) = [1 + (−1), 0 + 2] = [0, 2].
.
(b) Prove that L is linear.
Solution: For any a1 + b1x + c1x^2 + d1x^3, a2 + b2x + c2x^2 + d2x^3 ∈ P3(R) and
s, t ∈ R we get
L(s(a1 + b1x + c1x^2 + d1x^3) + t(a2 + b2x + c2x^2 + d2x^3))
= L((sa1 + ta2) + (sb1 + tb2)x + (sc1 + tc2)x^2 + (sd1 + td2)x^3)
= [sa1 + ta2 + sd1 + td2, sb1 + tb2 + sc1 + tc2]
= s[a1 + d1, b1 + c1] + t[a2 + d2, b2 + c2]
= sL(a1 + b1x + c1x^2 + d1x^3) + tL(a2 + b2x + c2x^2 + d2x^3)
Thus, L is linear.
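The linearity check in part (b) can also be run numerically by encoding a + bx + cx^2 + dx^3 as the coefficient tuple (a, b, c, d). This is a hypothetical encoding chosen for illustration; the helpers below are not from the text:

```python
def L(p):
    """Apply the mapping of Example 1 to a coefficient tuple (a, b, c, d)."""
    a, b, c, d = p
    return (a + d, b + c)

def comb(s, p, t, q):
    """Form the linear combination s*p + t*q of coefficient tuples."""
    return tuple(s * pi + t * qi for pi, qi in zip(p, q))

assert L((1, 0, 2, -1)) == (0, 2)      # part (a): L(1 + 2x^2 - x^3)

# Spot-check linearity: L(s*p + t*q) == s*L(p) + t*L(q).
p, q, s, t = (1, 2, 3, 4), (0, -1, 5, 2), 2, -3
assert L(comb(s, p, t, q)) == comb(s, L(p), t, L(q))
```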
EXAMPLE 2 Let tr : Mn×n(R) → R be defined by tr A = (A)11 + (A)22 + · · · + (A)nn (called
the trace of a matrix). Prove that tr is linear.
Solution: Let A, B ∈ Mn×n(R) and s, t ∈ R. We get
tr(sA + tB) = (sA + tB)11 + · · · + (sA + tB)nn
= (s(A)11 + t(B)11) + · · · + (s(A)nn + t(B)nn)
= s((A)11 + · · · + (A)nn) + t((B)11 + · · · + (B)nn)
= s tr A + t tr B
Thus, tr is linear.
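The same computation can be spot-checked numerically (a NumPy illustration with a fixed random seed; it complements, but does not replace, the proof):

```python
import numpy as np

rng = np.random.default_rng(0)        # fixed seed for reproducibility
A = rng.integers(-5, 5, (3, 3))
B = rng.integers(-5, 5, (3, 3))
s, t = 2, -3
# Integer arithmetic, so the linearity identity holds exactly.
assert np.trace(s * A + t * B) == s * np.trace(A) + t * np.trace(B)
```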
EXAMPLE 3 Prove that the mapping L : P2(R) → M2×2(R) defined by
L(a + bx + cx^2) = [a bc; 0 abc] is not linear.
Solution: Observe that
L(1 + x + x^2) = [1 1; 0 1]
But,
L(2(1 + x + x^2)) = L(2 + 2x + 2x^2) = [2 4; 0 8] ≠ 2L(1 + x + x^2)
Of course, operations on linear mappings L : V→W are defined in the same way as
for linear mappings L : Rn → Rm.
DEFINITION
Addition
Scalar Multiplication
Let L : V → W and M : V → W be linear mappings. We define L + M : V → W by
(L + M)(~v) = L(~v) + M(~v)
and for any t ∈ R we define tL : V→W by
(tL)(~v) = tL(~v)
EXAMPLE 4 Let L : M2×2(R) → P2(R) be defined by L([a b; c d]) = a + bx + (a + c + d)x^2 and
M : M2×2(R) → P2(R) be defined by M([a b; c d]) = (b + c) + dx^2. Then, L and M are
both linear and L + M is the mapping defined by
(L + M)([a b; c d]) = L([a b; c d]) + M([a b; c d])
= [a + bx + (a + c + d)x^2] + [(b + c) + dx^2]
= (a + b + c) + bx + (a + c + 2d)x^2
Similarly, 4L is the mapping defined by
(4L)([a b; c d]) = 4L([a b; c d]) = 4a + 4bx + (4a + 4c + 4d)x^2
THEOREM 1 Let V and W be vector spaces. The set L of all linear mappings L : V → W with
standard addition and scalar multiplication of mappings is a vector space.
Proof: To prove that L is a vector space, we need to show that it satisfies all ten
vector space axioms. We will prove V1 and V2 and leave the rest as exercises.
Let L, M ∈ L. Then, L : V → W and M : V → W are both linear.
V1 To prove that L is closed under addition, we need to show that L + M : V → W is a linear mapping.
For any ~v1,~v2 ∈ V and s, t ∈ R we have
(L + M)(s~v1 + t~v2) = L(s~v1 + t~v2) + M(s~v1 + t~v2)
= sL(~v1) + tL(~v2) + sM(~v1) + tM(~v2)
= s[L(~v1) + M(~v1)] + t[L(~v2) + M(~v2)]
= s(L + M)(~v1) + t(L + M)(~v2)
Thus, L + M is linear and so L + M ∈ L.
V2 For any ~v ∈ V we have
(L + M)(~v) = L(~v) + M(~v) = M(~v) + L(~v) = (M + L)(~v)
since addition in W is commutative. Hence L + M = M + L.
�
We also define the composition of mappings in the expected way.
DEFINITION
Composition
Let L : V→W and M :W→ U be linear mappings. We define M ◦ L : V→ U by
(M ◦ L)(~v) = M(L(~v)), for all ~v ∈ V
EXAMPLE 5 If L : P2(R) → R3 is defined by L(a + bx + cx^2) = \begin{bmatrix} a + b \\ c \\ 0 \end{bmatrix} and M : R3 → M2×2(R) is
defined by M\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = \begin{bmatrix} x_1 & 0 \\ x_2 + x_3 & 0 \end{bmatrix}, then M ◦ L is the mapping defined by

(M ◦ L)(a + bx + cx^2) = M(L(a + bx + cx^2)) = M\left(\begin{bmatrix} a + b \\ c \\ 0 \end{bmatrix}\right) = \begin{bmatrix} a + b & 0 \\ c & 0 \end{bmatrix}

Observe that M ◦ L is in fact a linear mapping from P2(R) to M2×2(R).
THEOREM 2 If L : V → W and M : W → U are linear mappings, then M ◦ L : V → U is a linear
mapping.
DEFINITION
InvertibleMapping
Let L : V → W and M : W → V be linear mappings. If (M ◦ L)(~v) = ~v for all ~v ∈ V
and (L ◦ M)(~w) = ~w for all ~w ∈ W, then L and M are said to be invertible. We write
M = L^{−1} and L = M^{−1}.
EXAMPLE 6 Let L : P3(R) → M2×2(R) be the linear mapping defined by

L(a + bx + cx^2 + dx^3) = \begin{bmatrix} a + b + c & −a + b + c + d \\ a + c − d & a + b + d \end{bmatrix}

and M : M2×2(R) → P3(R) be the linear mapping defined by

M\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = (−a + c + d) + (4a − b − 3c − 2d)x + (−2a + b + 2c + d)x^2 + (−3a + b + 2c + 2d)x^3

Prove that L and M are inverses of each other.
Solution: Observe that

(M ◦ L)(a + bx + cx^2 + dx^3) = M\left(\begin{bmatrix} a + b + c & −a + b + c + d \\ a + c − d & a + b + d \end{bmatrix}\right) = a + bx + cx^2 + dx^3

(L ◦ M)\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = L((−a + c + d) + (4a − b − 3c − 2d)x + (−2a + b + 2c + d)x^2 + (−3a + b + 2c + 2d)x^3) = \begin{bmatrix} a & b \\ c & d \end{bmatrix}

Thus, L and M are inverses of each other.
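In coordinates, Example 6 reduces to a matrix computation. The sketch below (our own encoding, not part of the text: identify a + bx + cx^2 + dx^3 with (a, b, c, d) and a 2×2 matrix with its entries read row by row) represents L and M as 4×4 matrices and checks that these matrices are inverses:

```python
import numpy as np

A = np.array([[ 1,  1,  1,  0],    # a + b + c
              [-1,  1,  1,  1],    # -a + b + c + d
              [ 1,  0,  1, -1],    # a + c - d
              [ 1,  1,  0,  1]])   # a + b + d

B = np.array([[-1,  0,  1,  1],    # -a + c + d
              [ 4, -1, -3, -2],    # 4a - b - 3c - 2d
              [-2,  1,  2,  1],    # -2a + b + 2c + d
              [-3,  1,  2,  2]])   # -3a + b + 2c + 2d

# L and M are inverses exactly when their matrices multiply to I
assert np.array_equal(A @ B, np.eye(4))
assert np.array_equal(B @ A, np.eye(4))
```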
Section 8.1 Problems
1. Determine which of the following mappings are linear. If it is linear, prove it. If not, give a counterexample to show that it is not linear.
(a) L : R2 → M2×2(R) defined by L(a, b) = \begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix}
(b) T : P2(R) → P2(R) defined by T(a + bx + cx^2) = (a − b) + (bc)x^2.
(c) L : P2(R) → M2×2(R) defined by L(a + bx + cx^2) = \begin{bmatrix} 1 & 0 \\ a + c & b + c \end{bmatrix}.
(d) T : M2×2(R) → R3 defined by T\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = \begin{bmatrix} a + b \\ b + c \\ c − a \end{bmatrix}
(e) D : P3(R) → P2(R) defined by D(a + bx + cx^2 + dx^3) = b + 2cx + 3dx^2
(f) L : M2×2(R) → R defined by L(A) = det A.
2. For the given vector spaces V and W, invent a linear mapping L : V → W that satisfies the given properties.
(a) V = R3, W = P2(R); L(1, 0, 0) = 1 + x, L(0, 1, 0) = 1 − x^2, L(0, 0, 1) = 1 + x + x^2
(b) V = P2(R), W = M2×2(R), L(1) = \begin{bmatrix} 1 & 2 \\ 1 & 2 \end{bmatrix}, L(x) = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, L(1 + x + x^2) = \begin{bmatrix} 2 & 2 \\ 2 & 3 \end{bmatrix}.
3. Let B be a basis for an n-dimensional vector
space V. Prove that the mapping L : V → Rn
defined by L(~v) = [~v]B for all ~v ∈ V is linear.
4. Let L : V→W be a linear mapping. Prove that
L(~0) = ~0.
5. Let L : V → W and M : W → U be linear
mappings. Prove that M ◦L is a linear mapping
from V to U.
6. (a) Let L : V→W be a linear mapping. Prove
that if {L(~v1), . . . , L(~vk)} is a linearly inde-
pendent set in W, then {~v1, . . . ,~vk} is lin-
early independent in V.
(b) Give an example of a linear mapping L :
V → W, where {~v1, . . . ,~vk} is linearly in-
dependent in V, but {L(~v1), . . . , L(~vk)} is
linearly dependent in W.
7. Let L be the set of all linear mappings from V
to W with standard addition and scalar multi-
plication of linear mappings. Prove that
(a) tL ∈ L for all L ∈ L and t ∈ R.
(b) t(L + M) = tL + tM for all L,M ∈ L and
t ∈ R.
8. Prove that a linear mapping L : Rn → Rn is in-
vertible if and only if its standard matrix [L] is
invertible.
9. Find the inverse of the linear mapping
L : R2 → R2 defined by
L(x1, x2) = (2x1 + x2, 3x1 − 4x2)
10. Let S denote the vector space of infinite se-
quences with addition and scalar multiplication
defined component-wise (see Problem 4.1.8 pg
123). Define the left shift L : S → S by
L(x1, x2, x3, . . .) = (x2, x3, x4, . . .)
and the right shift R : S → S by
R(x1, x2, x3, . . .) = (0, x1, x2, x3, . . .)
It is easy to verify that L and R are linear.
Prove that for any sequence ~x ∈ S we have
(L ◦ R)(~x) = ~x but (R ◦ L)(~x) ≠ ~x. This shows
that L has a right inverse, but it does not have a
left inverse. It is important in this example that
S is infinite dimensional.
8.2 Rank-Nullity Theorem
Our goal in this section is not only to extend the definitions of the range and kernel
of a linear mapping to general linear mappings, but to also generalize the result of
Exercise 7.1.1 to general linear mappings. We will see that this generalization, called
the Rank-Nullity Theorem, is extremely useful.
DEFINITION
Range
Kernel
For a linear mapping L : V→W the kernel of L is defined to be
Ker(L) = {~v ∈ V | L(~v) = ~0W}
and the range of L is defined to be
Range(L) = {L(~v) ∈ W | ~v ∈ V}
THEOREM 1 If V and W are vector spaces and L : V → W is a linear mapping, then
L(~0) = ~0
THEOREM 2 If L : V → W is a linear mapping, then Ker(L) is a subspace of V and Range(L) is a
subspace of W.
The procedure for finding a basis for the range and kernel of a general linear mapping
is, of course, exactly the same as we saw for linear mappings from Rn to Rm.
EXAMPLE 1 Find a basis for the range and kernel of L : P3(R) → R2 defined by

L(a + bx + cx^2 + dx^3) = \begin{bmatrix} a + d \\ b + c \end{bmatrix}

Solution: If a + bx + cx^2 + dx^3 ∈ Ker(L), then

\begin{bmatrix} 0 \\ 0 \end{bmatrix} = L(a + bx + cx^2 + dx^3) = \begin{bmatrix} a + d \\ b + c \end{bmatrix}

Therefore, a = −d and b = −c. Thus, every vector in Ker(L) has the form

a + bx + cx^2 + dx^3 = −d − cx + cx^2 + dx^3 = d(−1 + x^3) + c(−x + x^2)

Since {−1 + x^3, −x + x^2} is clearly linearly independent, it is a basis for Ker(L).
The range of L contains all vectors of the form

\begin{bmatrix} a + d \\ b + c \end{bmatrix} = (a + d)\begin{bmatrix} 1 \\ 0 \end{bmatrix} + (b + c)\begin{bmatrix} 0 \\ 1 \end{bmatrix}

So, a basis for Range(L) is \left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right\}.
EXAMPLE 2 Determine the dimension of the range and kernel of L : P2(R) → M3×2(R) defined by

L(a + bx + cx^2) = \begin{bmatrix} a & b \\ c & b \\ a + b & a + c − b \end{bmatrix}

Solution: If a + bx + cx^2 ∈ Ker(L), then

\begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix} = L(a + bx + cx^2) = \begin{bmatrix} a & b \\ c & b \\ a + b & a + c − b \end{bmatrix}

Hence, we must have a = b = c = 0. Thus, Ker(L) = {~0}, so a basis for Ker(L) is the
empty set. Hence, dim(Ker(L)) = 0.
The range of L contains all vectors of the form

\begin{bmatrix} a & b \\ c & b \\ a + b & a + c − b \end{bmatrix} = a\begin{bmatrix} 1 & 0 \\ 0 & 0 \\ 1 & 1 \end{bmatrix} + b\begin{bmatrix} 0 & 1 \\ 0 & 1 \\ 1 & −1 \end{bmatrix} + c\begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}

Thus, \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 0 \\ 1 & 1 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 0 & 1 \\ 1 & −1 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} \right\} spans Range(L) and can be shown to be linearly
independent. Hence, it is a basis for Range(L). Consequently, dim(Range(L)) = 3.
EXERCISE 1 Find a basis for the range and kernel of L : R2 → M2×2(R) defined by

L\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \begin{bmatrix} x_1 & x_1 + x_2 \\ 0 & x_1 + x_2 \end{bmatrix}
Observe in all of the examples above that dim(Range(L)) + dim(Ker(L)) = dim V,
which matches the result we had for linear mappings L : Rn → Rm. Before we
extend this to the general case, we make a couple of definitions.
DEFINITION
Rank
Nullity
Let L : V→W be a linear mapping. We define the rank of L by
rank(L) = dim(Range(L))
We define the nullity of L to be
nullity(L) = dim(Ker(L))
THEOREM 3 (Rank-Nullity Theorem)
Let V be an n-dimensional vector space and let W be a vector space. If L : V → W is linear, then
rank(L) + nullity(L) = n
Proof: Assume that {~v_1, . . . ,~v_k} is a basis for Ker(L), which means nullity(L) = k.
By Theorem 4.2.4, we can extend {~v_1, . . . ,~v_k} to a basis {~v_1, . . . ,~v_k,~v_{k+1}, . . . ,~v_n} for
V. We claim that B = {L(~v_{k+1}), . . . , L(~v_n)} is a basis for Range(L).
Consider

c_{k+1}L(~v_{k+1}) + · · · + c_nL(~v_n) = ~0
L(c_{k+1}~v_{k+1} + · · · + c_n~v_n) = ~0

This implies that c_{k+1}~v_{k+1} + · · · + c_n~v_n ∈ Ker(L). Thus, we can write

c_{k+1}~v_{k+1} + · · · + c_n~v_n = d_1~v_1 + · · · + d_k~v_k

But then we have

−d_1~v_1 − · · · − d_k~v_k + c_{k+1}~v_{k+1} + · · · + c_n~v_n = ~0

Hence, d_1 = · · · = d_k = c_{k+1} = · · · = c_n = 0 since {~v_1, . . . ,~v_n} is linearly independent
as it is a basis for V. Thus, B is linearly independent.
By definition, for any ~y ∈ Range(L) there exists ~v ∈ V such that ~y = L(~v). Writing
~v = a_1~v_1 + · · · + a_n~v_n in terms of the basis for V, and using that ~v_1, . . . ,~v_k ∈ Ker(L), we get

~y = L(a_1~v_1 + · · · + a_k~v_k + a_{k+1}~v_{k+1} + · · · + a_n~v_n)
= a_1L(~v_1) + · · · + a_kL(~v_k) + a_{k+1}L(~v_{k+1}) + · · · + a_nL(~v_n)
= ~0 + · · · + ~0 + a_{k+1}L(~v_{k+1}) + · · · + a_nL(~v_n)
= a_{k+1}L(~v_{k+1}) + · · · + a_nL(~v_n)
Thus, ~y ∈ Span{L(~vk+1), . . . , L(~vn)} and so B is also a spanning set for Range(L).
Therefore, we have shown that {L(~vk+1), · · · , L(~vn)} is a basis for Range(L) and hence
rank(L) = dim(Range(L)) = n − k = n − nullity(L)
�
REMARK
The proof of the Rank-Nullity Theorem should seem very familiar. It is essentially
identical to that of the Dimension Theorem in Chapter 7. In this book there will be a
few times where proofs are essentially repeated, so spending time to make sure you
understand proofs when you first see them can have real long term benefits.
EXAMPLE 3 Let L : P2(R) → M2×3(R) be defined by

L(a + bx + cx^2) = \begin{bmatrix} a & a & c \\ a & a & c \end{bmatrix}

Find the rank and nullity of L.
Solution: If a + bx + cx^2 ∈ Ker(L), then

\begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} = L(a + bx + cx^2) = \begin{bmatrix} a & a & c \\ a & a & c \end{bmatrix}

Hence, a = 0 and c = 0. So,

Ker(L) = {0 + bx + 0x^2 | b ∈ R} = Span{x}

Therefore, a basis for Ker(L) is {x} and nullity(L) = 1. Then, by the Rank-Nullity
Theorem, we get that

rank(L) = dim P2(R) − nullity(L) = 3 − 1 = 2
In the example above, we could have instead found a basis for the range and then
used the Rank-Nullity Theorem to find the nullity of L.
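The Rank-Nullity bookkeeping in this example can also be verified numerically. In the sketch below (our own illustration), M is the 6×3 matrix that sends the coefficients (a, b, c) to the six entries of L(a + bx + cx^2):

```python
import numpy as np

# rows of M are the six entries of [[a, a, c], [a, a, c]], read row by row
M = np.array([[1, 0, 0],
              [1, 0, 0],
              [0, 0, 1],
              [1, 0, 0],
              [1, 0, 0],
              [0, 0, 1]])

rank = np.linalg.matrix_rank(M)
nullity = 3 - rank
assert rank == 2 and nullity == 1   # rank(L) + nullity(L) = dim P2(R) = 3
```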
Section 8.2 Problems
1. Find the rank and nullity of the following linear mappings.
(a) L : P2(R) → M2×2(R) defined by L(a + bx + cx^2) = \begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix}.
(b) L : P2(R) → P2(R) defined by L(a + bx + cx^2) = (a − b) + (b + c)x^2.
(c) T : P2(R) → M2×2(R) defined by T(a + bx + cx^2) = \begin{bmatrix} 0 & 0 \\ a + c & b + c \end{bmatrix}.
(d) L : R3 → P1(R) defined by L(a, b, c) = (a + b) + (a + b + c)x.
(e) T : M2×2(R) → M2×2(R) defined by T(A) = A^T.
(f) M : P2(R) → M2×2(R) defined by M(a + bx + cx^2) = \begin{bmatrix} −a − 2c & 2b − c \\ −2a + 2c & −2b − c \end{bmatrix}.
2. Let ~v_1, . . . ,~v_k be vectors in a vector space V and L : V → W be a linear mapping. Prove that if {L(~v_1), . . . , L(~v_k)} spans W, then dim V ≥ dim W.
3. Find a linear mapping L : V → W where dim V ≥ dim W, but Range(L) ≠ W.
4. Find a linear mapping L : V → V such that Ker(L) = Range(L).
5. Let V and W be n-dimensional vector spaces and let L : V → W be a linear mapping. Prove that Range(L) = W if and only if Ker(L) = {~0}.
6. Let U, V, W be finite dimensional vector spaces and let L : V → U and M : U → W be linear mappings.
(a) Prove that rank(M ◦ L) ≤ rank(M).
(b) Prove that rank(M ◦ L) ≤ rank(L).
8.3 Matrix of a Linear Mapping
We now show that every linear mapping L : V→W can also be represented as a ma-
trix mapping. However, we must be careful when dealing with general vector spaces
as our domain and codomain. For example, it is certainly impossible to represent a
linear mapping L : P2(R) → M2×2(R) as a matrix mapping L(~x) = A~x, since we
cannot multiply a matrix by a polynomial ~x ∈ P2(R).
Thus, if we are going to define a matrix representation of a general linear mapping,
we need to convert vectors in V to vectors in Rn. Recall that the coordinate vector of
~x ∈ V with respect to a basis B is a vector in Rn. In particular, if B = {~v_1, . . . ,~v_n} and
~x = b_1~v_1 + · · · + b_n~v_n, then the coordinate vector of ~x with respect to B is defined to
be

[~x]_B = \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix}
Using coordinates, we can write a matrix mapping representation for a linear map-
ping L : V→W. We want to find a matrix A such that
[L(~x)]C = A[~x]B
for every ~x ∈ V, where B is a basis for V and C is a basis forW.
Consider the left-hand side [L(~x)]_C. Using properties of linear mappings and coordinates, we get

[L(~x)]_C = [L(b_1~v_1 + · · · + b_n~v_n)]_C
= [b_1L(~v_1) + · · · + b_nL(~v_n)]_C
= b_1[L(~v_1)]_C + · · · + b_n[L(~v_n)]_C
= \begin{bmatrix} [L(~v_1)]_C & · · · & [L(~v_n)]_C \end{bmatrix} \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix}

Thus, we have the matrix \begin{bmatrix} [L(~v_1)]_C & · · · & [L(~v_n)]_C \end{bmatrix} being matrix-vector multiplied
by the vector [~x]_B, as desired. Consequently, this is the matrix we were trying to find.
DEFINITION
Matrix of a Linear Mapping
Suppose B = {~v1, . . . ,~vn} is any basis for a vector space V and C is any basis for a
finite dimensional vector spaceW. For a linear mapping L : V → W, the matrix of
L with respect to bases B and C is defined by
C[L]_B = \begin{bmatrix} [L(~v_1)]_C & · · · & [L(~v_n)]_C \end{bmatrix}
and satisfies
[L(~x)]C = C[L]B[~x]B
for all ~x ∈ V.
EXAMPLE 1 Let L : P2(R) → R2 be the linear mapping defined by L(a + bx + cx^2) = \begin{bmatrix} a + c \\ b − c \end{bmatrix},
B = {1 + x^2, 1 + x, −1 + x + x^2}, and C = \left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \end{bmatrix} \right\}. Find C[L]_B.
Solution: To find C[L]_B we need to determine the C-coordinate vectors of the images
of the vectors in B under L. We have

L(1 + x^2) = \begin{bmatrix} 2 \\ −1 \end{bmatrix} = 3\begin{bmatrix} 1 \\ 0 \end{bmatrix} + (−1)\begin{bmatrix} 1 \\ 1 \end{bmatrix}
L(1 + x) = \begin{bmatrix} 1 \\ 1 \end{bmatrix} = 0\begin{bmatrix} 1 \\ 0 \end{bmatrix} + 1\begin{bmatrix} 1 \\ 1 \end{bmatrix}
L(−1 + x + x^2) = \begin{bmatrix} 0 \\ 0 \end{bmatrix} = 0\begin{bmatrix} 1 \\ 0 \end{bmatrix} + 0\begin{bmatrix} 1 \\ 1 \end{bmatrix}

Hence,

C[L]_B = \begin{bmatrix} [L(1 + x^2)]_C & [L(1 + x)]_C & [L(−1 + x + x^2)]_C \end{bmatrix} = \begin{bmatrix} 3 & 0 & 0 \\ −1 & 1 & 0 \end{bmatrix}
We can check our answer. Let p(x) = a + bx + cx^2. Recall, to use C[L]_B we need
[p(x)]_B. Hence, we need to write p(x) as a linear combination of the vectors in B. In
particular, we need to solve

a + bx + cx^2 = t_1(1 + x^2) + t_2(1 + x) + t_3(−1 + x + x^2) = (t_1 + t_2 − t_3) + (t_2 + t_3)x + (t_1 + t_3)x^2

Row reducing the corresponding augmented matrix gives

\begin{bmatrix} 1 & 1 & −1 & a \\ 0 & 1 & 1 & b \\ 1 & 0 & 1 & c \end{bmatrix} ∼ \begin{bmatrix} 1 & 0 & 0 & (a − b + 2c)/3 \\ 0 & 1 & 0 & (a + 2b − c)/3 \\ 0 & 0 & 1 & (−a + b + c)/3 \end{bmatrix}

Thus,

[p(x)]_B = \begin{bmatrix} (a − b + 2c)/3 \\ (a + 2b − c)/3 \\ (−a + b + c)/3 \end{bmatrix}

So, we get

[L(p(x))]_C = C[L]_B[p(x)]_B = \begin{bmatrix} 3 & 0 & 0 \\ −1 & 1 & 0 \end{bmatrix} \begin{bmatrix} (a − b + 2c)/3 \\ (a + 2b − c)/3 \\ (−a + b + c)/3 \end{bmatrix} = \begin{bmatrix} a − b + 2c \\ b − c \end{bmatrix}

Therefore, by definition of C-coordinates,

L(p(x)) = (a − b + 2c)\begin{bmatrix} 1 \\ 0 \end{bmatrix} + (b − c)\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} a + c \\ b − c \end{bmatrix}

as required.
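The same check can be run numerically. In this sketch (an illustration only; the helper coords_B is our own name for the formula obtained from the row reduction in this example):

```python
import numpy as np

CLB = np.array([[ 3, 0, 0],
                [-1, 1, 0]])   # the matrix C[L]_B found above

def coords_B(a, b, c):
    """[p]_B for p = a + bx + cx^2, from the row reduction."""
    return np.array([a - b + 2*c, a + 2*b - c, -a + b + c]) / 3

a, b, c = 2.0, 5.0, -1.0
lhs = CLB @ coords_B(a, b, c)             # [L(p)]_C
Lp = np.array([a + c, b - c])             # L(p) computed directly

# C-coordinates (s, t) mean the vector s*[1,0] + t*[1,1]
recovered = lhs[0]*np.array([1.0, 0.0]) + lhs[1]*np.array([1.0, 1.0])
assert np.allclose(recovered, Lp)
```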
EXAMPLE 2 Let T : R2 → M2×2(R) be the linear mapping defined by T\left(\begin{bmatrix} a \\ b \end{bmatrix}\right) = \begin{bmatrix} a + b & 0 \\ 0 & a − b \end{bmatrix},
let B = \left\{ \begin{bmatrix} 2 \\ −1 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \end{bmatrix} \right\}, and let C = \left\{ \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} \right\}. Determine C[T]_B
and use it to calculate T(~v) where [~v]_B = \begin{bmatrix} 2 \\ −3 \end{bmatrix}.
Solution: To find C[T]_B we need to determine the C-coordinate vectors of the images
of the vectors in B under T. We have

T(2, −1) = \begin{bmatrix} 1 & 0 \\ 0 & 3 \end{bmatrix} = −2\begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix} + 1\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + 2\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} + 0\begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}

T(1, 2) = \begin{bmatrix} 3 & 0 \\ 0 & −1 \end{bmatrix} = 4\begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix} + 3\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + (−4)\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} + 0\begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}

Hence,

C[T]_B = \begin{bmatrix} [T(2, −1)]_C & [T(1, 2)]_C \end{bmatrix} = \begin{bmatrix} −2 & 4 \\ 1 & 3 \\ 2 & −4 \\ 0 & 0 \end{bmatrix}

Thus, we get

[T(~v)]_C = \begin{bmatrix} −2 & 4 \\ 1 & 3 \\ 2 & −4 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 2 \\ −3 \end{bmatrix} = \begin{bmatrix} −16 \\ −7 \\ 16 \\ 0 \end{bmatrix}

So,

T(~v) = (−16)\begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix} + (−7)\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + 16\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} + 0\begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} −7 & 0 \\ 0 & 9 \end{bmatrix}
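Both routes to T(~v), through C[T]_B and directly, can be compared numerically (a sketch, not part of the text):

```python
import numpy as np

def T(a, b):
    return np.array([[a + b, 0], [0, a - b]])

basis_B = [np.array([2, -1]), np.array([1, 2])]
basis_C = [np.array([[1, 1], [0, 0]]), np.array([[1, 0], [0, 1]]),
           np.array([[1, 1], [0, 1]]), np.array([[0, 0], [1, 0]])]

CTB = np.array([[-2, 4], [1, 3], [2, -4], [0, 0]])
v_B = np.array([2, -3])

# route 1: through the matrix of T
coords = CTB @ v_B
via_matrix = sum(c * M for c, M in zip(coords, basis_C))

# route 2: directly, since [v]_B = (2, -3) means v = 2*B[0] - 3*B[1]
v = 2*basis_B[0] - 3*basis_B[1]
direct = T(v[0], v[1])

assert np.array_equal(via_matrix, direct)
assert np.array_equal(direct, [[-7, 0], [0, 9]])
```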
In the special case of a linear operator L acting on a finite-dimensional vector space
V with basis B, we often wish to find the matrix B[L]B.
DEFINITION
Matrix of a Linear Operator
Suppose B = {~v1, . . . ,~vn} is any basis for a vector space V and let L : V → V be a
linear operator. The B-matrix of L (or the matrix of L with respect to the basis B) is
defined by
[L]_B = \begin{bmatrix} [L(~v_1)]_B & · · · & [L(~v_n)]_B \end{bmatrix}
and satisfies
[L(~x)]B = [L]B[~x]B
for all ~x ∈ V.
EXAMPLE 3 Let B = \left\{ \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix} \right\} and let A = \begin{bmatrix} −6 & −2 & 9 \\ −5 & −1 & 7 \\ −7 & −2 & 10 \end{bmatrix}. Find the B-matrix of the linear
mapping defined by L(~x) = A~x.
Solution: By definition, the columns of the B-matrix are the B-coordinate vectors of
the images of the vectors in B under L. We find that

A\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = 1\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} + 0\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} + 0\begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix}

A\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 3 \\ 2 \\ 3 \end{bmatrix} = 2\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} + 1\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} + 0\begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix}

A\begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix} = \begin{bmatrix} 6 \\ 6 \\ 7 \end{bmatrix} = 3\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} + 2\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} + 1\begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix}

Hence, [L]_B = \begin{bmatrix} [L(1, 1, 1)]_B & [L(1, 0, 1)]_B & [L(1, 3, 2)]_B \end{bmatrix} = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix}
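For a linear operator on Rn given by a matrix A, the B-matrix can also be computed as P^{-1}AP, where the columns of P are the vectors of B. A NumPy sketch for this example (an illustration of the change-of-basis idea, not the book's method):

```python
import numpy as np

A = np.array([[-6, -2,  9],
              [-5, -1,  7],
              [-7, -2, 10]])
P = np.array([[1, 1, 1],      # columns are the basis vectors in B
              [1, 0, 3],
              [1, 1, 2]])

L_B = np.linalg.inv(P) @ A @ P
assert np.allclose(L_B, [[1, 2, 3],
                         [0, 1, 2],
                         [0, 0, 1]])
```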
EXAMPLE 4 Let L : P2(R) → P2(R) be defined by L(a + bx + cx^2) = (2a + b) + (a + b + c)x + (b + 2c)x^2.
Find [L]_B where B = {−1 + x^2, 1 − 2x + x^2, 1 + x + x^2}, and find [L]_S where S = {1, x, x^2}.
Solution: To determine [L]_B we need to find the B-coordinate vectors of the images
of the vectors in B under L.

L(−1 + x^2) = −2 + 0x + 2x^2 = 2(−1 + x^2) + 0(1 − 2x + x^2) + 0(1 + x + x^2)
L(1 − 2x + x^2) = 0 + 0x + 0x^2 = 0(−1 + x^2) + 0(1 − 2x + x^2) + 0(1 + x + x^2)
L(1 + x + x^2) = 3 + 3x + 3x^2 = 0(−1 + x^2) + 0(1 − 2x + x^2) + 3(1 + x + x^2)

Thus,

[L]_B = \begin{bmatrix} [L(−1 + x^2)]_B & [L(1 − 2x + x^2)]_B & [L(1 + x + x^2)]_B \end{bmatrix} = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 3 \end{bmatrix}

Similarly, for [L]_S we calculate the S-coordinate vectors of the images of the vectors
in S under L.

L(1) = 2 + x = 2(1) + 1(x) + 0(x^2)
L(x) = 1 + x + x^2 = 1(1) + 1(x) + 1(x^2)
L(x^2) = x + 2x^2 = 0(1) + 1(x) + 2(x^2)

Hence,

[L]_S = \begin{bmatrix} 2 & 1 & 0 \\ 1 & 1 & 1 \\ 0 & 1 & 2 \end{bmatrix}
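The two matrices in Example 4 are similar: [L]_B = P^{-1}[L]_S P, where the columns of P hold the S-coordinates of the vectors in B. A NumPy sketch (illustration only):

```python
import numpy as np

L_S = np.array([[2, 1, 0],
                [1, 1, 1],
                [0, 1, 2]])
# columns: S-coordinates of -1 + x^2, 1 - 2x + x^2, 1 + x + x^2
P = np.array([[-1,  1, 1],
              [ 0, -2, 1],
              [ 1,  1, 1]])

L_B = np.linalg.inv(P) @ L_S @ P
assert np.allclose(L_B, np.diag([2, 0, 3]))
```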
In Example 4 we found that the matrix of L with respect to the basis B is diagonal while
the matrix of L with respect to the basis S is not. This is perhaps not surprising since
we see that the vectors in B are eigenvectors of L (that is, they satisfy L(~vi) = λi~vi),
while the vectors in S are not. Recall from Section 6.1 that we called such a basis a
geometrically natural basis.
REMARK
The relationship between diagonalization and the matrix of a linear operator is ex-
tremely important. We will review this a little later in the text. You may also find it
useful at this point to review Section 6.1.
Section 8.3 Problems
1. Find the matrix of each linear mapping with respect to the bases B and C.
(a) B = {1, x, x^2}, C = {1, x}, D : P2(R) → P1(R) defined by D(a + bx + cx^2) = b + 2cx,
(b) B = \left\{ \begin{bmatrix} 1 \\ −1 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \end{bmatrix} \right\}, C = {1 + x^2, 1 + x, −1 − x + x^2}, L : R2 → P2(R) defined by L(a_1, a_2) = (a_1 + a_2) + a_1x^2,
(c) B = \left\{ \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 1 \end{bmatrix} \right\}, C = \left\{ \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \begin{bmatrix} −1 \\ 1 \end{bmatrix} \right\}, T : M2×2(R) → R2 defined by T\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = \begin{bmatrix} a + c \\ b + c \end{bmatrix}.
(d) B = {1 + x^2, x − x^2, x^2, x^3}, C = \left\{ \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \end{bmatrix} \right\}, L : P3(R) → R2 defined by L(p) = \begin{bmatrix} p(0) \\ p(1) \end{bmatrix}.
(e) B = {1, x − 1, (x − 1)^2}, C = {1, x + 1, (x + 1)^2}, L : P2(R) → P2(R) defined by L(a + bx + cx^2) = a + bx + cx^2.
(f) B = \left\{ \begin{bmatrix} 1 \\ −1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ −1 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} \right\}, C = {1 − x, x}, L : R3 → P1(R) defined by L(a, b, c) = (a + c) + (a + 2b)x.
2. Find the B-matrix of each linear mapping with respect to the given basis B.
(a) B = {1, x, x^2}, D : P2(R) → P2(R) defined by D(a + bx + cx^2) = b + 2cx,
(b) B = \left\{ \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} \right\}, L : M2×2(R) → M2×2(R) defined by L\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = \begin{bmatrix} a & b + c \\ 0 & d \end{bmatrix}.
(c) B = \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix} \right\}, T : U → U, where U is the subspace of diagonal matrices in M2×2(R), defined by T\left(\begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix}\right) = \begin{bmatrix} a + b & 0 \\ 0 & 2a + b \end{bmatrix}.
(d) B = {1 + x^2, −1 + x, 1 − x + x^2}, L : P2(R) → P2(R) defined by L(a + bx + cx^2) = a + (b + c)x^2.
3. Let V be an n-dimensional vector space and let L : V → V be the linear operator defined by L(~v) = ~v. Prove that [L]_B = I for any basis B of V.
4. Invent a mapping L : R2 → R2 and a basis B for R2 such that Col([L]_B) ≠ Range(L). Justify your answer.
8.4 Isomorphisms
Recall that the ten vector space axioms define a “structure” for the set based on
the operations of addition and scalar multiplication. Since all vector spaces satisfy
the same ten properties, we would expect that all n-dimensional vector spaces should
have the same structure. This idea seems to be confirmed by our work with coordinate
vectors in Section 4.3. No matter what n-dimensional vector space V we use and
which basis we use for the vector space, we have a nice way of relating the vectors
in V to vectors in Rn. Moreover, the operations of addition and scalar multiplication
are preserved with respect to this basis. Consider the following calculations:
\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} + \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix} = \begin{bmatrix} 5 \\ 7 \\ 9 \end{bmatrix}
(1 + 2x + 3x^2) + (4 + 5x + 6x^2) = 5 + 7x + 9x^2
(~v_1 + 2~v_2 + 3~v_3) + (4~v_1 + 5~v_2 + 6~v_3) = 5~v_1 + 7~v_2 + 9~v_3
No matter which vector space we are using and which basis for that vector space,
any linear combination of vectors is really just performed on the coordinates of the
vectors with respect to the basis.
We will now look at how to use general linear mappings to mathematically prove
these observations. We begin with some familiar concepts.
One-To-One and Onto
DEFINITION
One-To-One
Onto
Let L : V → W be a linear mapping. L is called one-to-one (injective) if for every
~u,~v ∈ V such that L(~u) = L(~v), we must have ~u = ~v.
L is called onto (surjective) if for every ~w ∈W, there exists ~v ∈ V such that L(~v) = ~w.
EXAMPLE 1 Let L : R2 → P2(R) be the linear mapping defined by L(a_1, a_2) = a_1 + a_2x^2. Then
L is one-to-one, since if L(a_1, a_2) = L(b_1, b_2), then a_1 + a_2x^2 = b_1 + b_2x^2 and hence
a_1 = b_1 and a_2 = b_2. L is not onto, since there is no ~a ∈ R2 such that L(~a) = x.
EXAMPLE 2 Let L : M2×2(R) → R be the linear mapping defined by L(A) = tr A. L is not one-to-one
since L\left(\begin{bmatrix} 1 & 0 \\ 0 & −1 \end{bmatrix}\right) = L\left(\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}\right), but \begin{bmatrix} 1 & 0 \\ 0 & −1 \end{bmatrix} ≠ \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}. L is onto since if we
pick any x ∈ R, we can find a matrix A ∈ M2×2(R), for example A = \begin{bmatrix} x & 0 \\ 0 & 0 \end{bmatrix}, such that
L(A) = x.
Observe that a linear mapping L : V → W being onto means that Range(L) = W.
Hence, there is a direct connection between a mapping being onto and its range. On
the other hand, L being one-to-one means that for each ~w ∈ Range(L) there exists
exactly one ~v ∈ V such that L(~v) = ~w. We now establish a relationship between
a mapping being one-to-one and its kernel. These connections will be exploited in
some proofs below.
LEMMA 1 Let L : V→W be a linear mapping. L is one-to-one if and only if Ker(L) = {~0}.
Proof: Assume L is one-to-one. If ~x ∈ Ker(L), then L(~x) = ~0 = L(~0) which implies
that ~x = ~0 since L is one-to-one. Hence Ker(L) = {~0}.
Assume Ker(L) = {~0}. If L(~u) = L(~v), then
~0 = L(~u) − L(~v) = L(~u − ~v)
Hence, ~u − ~v ∈ Ker(L) which implies that ~u − ~v = ~0. Therefore, ~u = ~v and so L is
one-to-one. �
EXAMPLE 3 Let L : R3 → P5(R) be the linear mapping defined by

L(a, b, c) = a(1 + x^3) + b(1 + x^2 + x^5) + cx^4

Prove that L is one-to-one, but not onto.
Solution: Let ~a = \begin{bmatrix} a \\ b \\ c \end{bmatrix} ∈ Ker(L). Then,

0 = L(a, b, c) = a(1 + x^3) + b(1 + x^2 + x^5) + cx^4

This gives a = b = c = 0, so Ker(L) = {~0}. Hence, L is one-to-one by Lemma 1.
Observe that a basis for Range(L) is {1 + x^3, 1 + x^2 + x^5, x^4}. Hence, dim(Range(L)) =
3 < dim P5(R). Thus, Range(L) ≠ P5(R), so L is not onto.
EXAMPLE 4 Let L : M2×2(R) → R2 be the linear mapping defined by L\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = \begin{bmatrix} a \\ b \end{bmatrix}.
Prove that L is onto, but not one-to-one.
Solution: Observe that \begin{bmatrix} 0 & 0 \\ 1 & 1 \end{bmatrix} ∈ Ker(L). Thus, L is not one-to-one by Lemma 1.
Let \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} ∈ R2. Observe that L\left(\begin{bmatrix} x_1 & x_2 \\ 0 & 0 \end{bmatrix}\right) = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, so L is onto.
EXERCISE 1 Let L : R3 → P2(R) be the linear mapping defined by L(a, b, c) = a + bx + cx^2. Prove
that L is one-to-one and onto.
Isomorphisms
For two vector spaces V andW to have the same structure we will need each vector
~v ∈ V to be identified with a unique vector ~w ∈ W such that linear combinations
are preserved. To relate each ~v ∈ V with a unique ~w ∈ W we see that we just need
to find a mapping L from V to W which is both one-to-one and onto. For linear
combinations to be preserved we need the mapping to be linear.
DEFINITION
Isomorphism
Isomorphic
A vector space V is said to be isomorphic to a vector spaceW if there exists a linear
mapping L : V → W which is one-to-one and onto. L is called an isomorphism
from V toW.
EXAMPLE 5 Show that R4 is isomorphic to M2×2(R).
Solution: We need to invent a linear mapping L from R4 to M2×2(R) that is one-to-one
and onto. To define such a linear mapping, we think about how to relate vectors in
R4 to vectors in M2×2(R). Keeping in mind our work at the beginning of this section,
we define the mapping L : R4 → M2×2(R) by

L(a_0, a_1, a_2, a_3) = \begin{bmatrix} a_0 & a_1 \\ a_2 & a_3 \end{bmatrix}

Linear: Let ~a = (a_0, a_1, a_2, a_3), ~b = (b_0, b_1, b_2, b_3) ∈ R4 and s, t ∈ R. Then L is linear since

L(s~a + t~b) = L(sa_0 + tb_0, sa_1 + tb_1, sa_2 + tb_2, sa_3 + tb_3)
= \begin{bmatrix} sa_0 + tb_0 & sa_1 + tb_1 \\ sa_2 + tb_2 & sa_3 + tb_3 \end{bmatrix}
= s\begin{bmatrix} a_0 & a_1 \\ a_2 & a_3 \end{bmatrix} + t\begin{bmatrix} b_0 & b_1 \\ b_2 & b_3 \end{bmatrix}
= sL(~a) + tL(~b)

One-To-One: If L(~a) = L(~b), then \begin{bmatrix} a_0 & a_1 \\ a_2 & a_3 \end{bmatrix} = \begin{bmatrix} b_0 & b_1 \\ b_2 & b_3 \end{bmatrix}, which implies a_i = b_i for
i = 0, 1, 2, 3. Therefore, ~a = ~b and so L is one-to-one.

Onto: Pick \begin{bmatrix} a_0 & a_1 \\ a_2 & a_3 \end{bmatrix} ∈ M2×2(R). Then L(a_0, a_1, a_2, a_3) = \begin{bmatrix} a_0 & a_1 \\ a_2 & a_3 \end{bmatrix}, so L is onto.

Hence, L is an isomorphism and so R4 is isomorphic to M2×2(R).
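The isomorphism in Example 5 is essentially what NumPy's reshape does, which makes a concrete illustration (not part of the text):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
L_a = a.reshape(2, 2)      # L(a0, a1, a2, a3) = [[a0, a1], [a2, a3]]
assert np.array_equal(L_a, [[1.0, 2.0], [3.0, 4.0]])

# linear combinations are preserved (the linearity of L)
b = np.array([5.0, 6.0, 7.0, 8.0])
s, t = 2.0, -1.0
assert np.array_equal((s*a + t*b).reshape(2, 2),
                      s*a.reshape(2, 2) + t*b.reshape(2, 2))
```

Reshaping is invertible (flattening recovers the vector), which is exactly the one-to-one and onto property proved above.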
It is instructive to think carefully about the isomorphism in the example above. Ob-
serve that the images of the standard basis vectors for R4 are the standard basis vec-
tors for M2×2(R). That is, the isomorphism is mapping basis vectors to basis vectors.
We will keep this in mind when constructing an isomorphism in the next example.
EXAMPLE 6 Let T = {p(x) ∈ P2(R) | p(1) = 0} and S = \left\{ \begin{bmatrix} a \\ b \\ c \end{bmatrix} ∈ R3 \;\middle|\; a + b + c = 0 \right\}. Prove that T
is isomorphic to S.
Solution: By the Factor Theorem, every vector p(x) ∈ T has the form (x − 1)(a + bx).
Hence, a basis for T is B = {1 − x, x − x^2}. Also, every vector ~x ∈ S has the form

~x = \begin{bmatrix} a \\ b \\ −a − b \end{bmatrix} = a\begin{bmatrix} 1 \\ 0 \\ −1 \end{bmatrix} + b\begin{bmatrix} 0 \\ 1 \\ −1 \end{bmatrix}

Therefore, a basis for S is C = \left\{ \begin{bmatrix} 1 \\ 0 \\ −1 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ −1 \end{bmatrix} \right\}. To define an isomorphism we can map
the vectors in B to the vectors in C. In particular, we define L : T → S by

L(a(1 − x) + b(x − x^2)) = a\begin{bmatrix} 1 \\ 0 \\ −1 \end{bmatrix} + b\begin{bmatrix} 0 \\ 1 \\ −1 \end{bmatrix}

Linear: Let p(x), q(x) ∈ T with p(x) = a_1(1 − x) + b_1(x − x^2) and
q(x) = a_2(1 − x) + b_2(x − x^2). For any s, t ∈ R we get

L(sp(x) + tq(x)) = L((sa_1 + ta_2)(1 − x) + (sb_1 + tb_2)(x − x^2))
= (sa_1 + ta_2)\begin{bmatrix} 1 \\ 0 \\ −1 \end{bmatrix} + (sb_1 + tb_2)\begin{bmatrix} 0 \\ 1 \\ −1 \end{bmatrix}
= s\left( a_1\begin{bmatrix} 1 \\ 0 \\ −1 \end{bmatrix} + b_1\begin{bmatrix} 0 \\ 1 \\ −1 \end{bmatrix} \right) + t\left( a_2\begin{bmatrix} 1 \\ 0 \\ −1 \end{bmatrix} + b_2\begin{bmatrix} 0 \\ 1 \\ −1 \end{bmatrix} \right)
= sL(p(x)) + tL(q(x))

Thus, L is linear.
One-To-One: Let a(1 − x) + b(x − x^2) ∈ Ker(L). Then,

\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} = L(a(1 − x) + b(x − x^2)) = a\begin{bmatrix} 1 \\ 0 \\ −1 \end{bmatrix} + b\begin{bmatrix} 0 \\ 1 \\ −1 \end{bmatrix} = \begin{bmatrix} a \\ b \\ −a − b \end{bmatrix}

Hence, a = b = 0, so Ker(L) = {~0}. Therefore, by Lemma 1, L is one-to-one.
Onto: Let a\begin{bmatrix} 1 \\ 0 \\ −1 \end{bmatrix} + b\begin{bmatrix} 0 \\ 1 \\ −1 \end{bmatrix} ∈ S. We have

L(a(1 − x) + b(x − x^2)) = a\begin{bmatrix} 1 \\ 0 \\ −1 \end{bmatrix} + b\begin{bmatrix} 0 \\ 1 \\ −1 \end{bmatrix}

So, L is onto.
Thus, L is an isomorphism and T is isomorphic to S.
EXERCISE 2 Prove that P2(R) is isomorphic to R3.
Observe in the examples above that the isomorphic vector spaces have the same di-
mension. This makes sense since if we are mapping basis vectors to basis vectors,
then both vector spaces need to have the same number of basis vectors. We now
prove that this is indeed always the case.
THEOREM 2 Let V and W be finite dimensional vector spaces. V is isomorphic to W if and only
if dim V = dim W.
Proof: On one hand, assume that dimV = dimW = n. Let B = {~v1, . . . ,~vn} be a
basis for V and C = {~w1, . . . , ~wn} be a basis for W. As in our examples above, we
define a mapping L : V → W which maps the basis vectors in B to the basis vectors
in C. So, we define
L(t1~v1 + · · · + tn~vn) = t1~w1 + · · · + tn~wn
Linear: Let ~x = a_1~v_1 + · · · + a_n~v_n, ~y = b_1~v_1 + · · · + b_n~v_n ∈ V and s, t ∈ R. We have

L(s~x + t~y) = L((sa_1 + tb_1)~v_1 + · · · + (sa_n + tb_n)~v_n)
= (sa_1 + tb_1)~w_1 + · · · + (sa_n + tb_n)~w_n
= s(a_1~w_1 + · · · + a_n~w_n) + t(b_1~w_1 + · · · + b_n~w_n)
= sL(~x) + tL(~y)
Therefore, L is linear.
One-To-One: If ~v ∈ Ker(L), then
~0 = L(~v) = L(c1~v1 + · · · + cn~vn) = c1L(~v1) + · · · + cnL(~vn) = c1~w1 + · · · + cn~wn
C is linearly independent so we get c1 = · · · = cn = 0. Therefore, Ker(L) = {~0} and
hence L is one-to-one.
Onto: We have Ker(L) = {~0} since L is one-to-one. Thus, we get rank(L) = dim V − 0 = n
by the Rank-Nullity Theorem. Consequently, Range(L) is an n-dimensional
subspace of W which implies that Range(L) = W, so L is onto.
Thus, L is an isomorphism from V to W and so V is isomorphic to W.
On the other hand, assume that V is isomorphic to W. Then there exists an isomorphism
L from V to W. Since L is one-to-one and onto, we get Range(L) = W and
Ker(L) = {~0}. Thus, the Rank-Nullity Theorem gives

dim W = dim(Range(L)) = rank(L) = dim V − nullity(L) = dim V

as required. �
The proof not only proves the theorem, but it demonstrates a few additional facts.
1. It shows the intuitively obvious fact that if V is isomorphic to W, then W is
isomorphic to V. Typically we just say that V andW are isomorphic.
2. It confirms that we can make an isomorphism from V to W by mapping basis
vectors of V to basis vectors of W.
3. Finally, observe that once we had proven that L was one-to-one, we could
exploit the Rank-Nullity Theorem and that dimV = dimW to immediately get
that the mapping was onto. In particular, it shows us how to prove the following
theorem.
THEOREM 3 If V and W are both n-dimensional vector spaces and L : V → W is linear, then L is
one-to-one if and only if L is onto.
REMARK
Note that if V and W are both n-dimensional, then it does not mean every linear
mapping L : V → W must be an isomorphism. For example, L : R4 → M2×2(R)
defined by L(x_1, x_2, x_3, x_4) = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} is definitely not one-to-one nor onto!
We saw that we can make an isomorphism by mapping basis vectors to basis vectors.
The following theorem shows that this property actually characterizes isomorphisms.
THEOREM 4 Let V and W be isomorphic vector spaces and let {~v1, . . . ,~vn} be a basis for V. A
linear mapping L : V → W is an isomorphism if and only if {L(~v1), . . . , L(~vn)} is a
basis for W.
The following exercise is intended to be quite challenging. It requires complete un-
derstanding of what we have done in this chapter.
EXERCISE 3 Use the following steps to find a basis for the vector space L of all linear operators
L : V → V on a 2-dimensional vector space V.
1. Prove that L is isomorphic to M2×2(R) by constructing an isomorphism T .
2. Find an isomorphism T−1 from M2×2(R) to L. Let {~e1, ~e2, ~e3, ~e4} denote the stan-
dard basis vectors for M2×2(R). Calculate T−1(~e1), T−1(~e2), T−1(~e3), T−1(~e4).
By Theorem 4 these vectors form a basis for L.
3. Demonstrate that you are correct by proving that the set
{T−1(~e1), T−1(~e2), T−1(~e3), T−1(~e4)} is linearly independent and spans L.
Section 8.4 Problems
1. For each of the following pairs of vector
spaces, define an explicit isomorphism to es-
tablish that the spaces are isomorphic. Prove
that your map is an isomorphism.
(a) P1(R) and R2.
(b) P3(R) and M2×2(R).
(c) The subspace U of 2 × 2 upper triangular
matrices and the subspace
P = {p(x) ∈ P3(R) | p(1) = 0}.
(d) The subspace S = \left\{ \begin{bmatrix} x_1 \\ x_2 \\ 0 \\ x_3 \end{bmatrix} \;\middle|\; x_1 − x_2 + x_3 = 0 \right\}
of R4 and the subspace U = {a + bx + cx^2 | a = b} of P2(R).
(e) Any n-dimensional vector space V and Rn.
(f) Any n-dimensional vector space V and
Pn−1(R).
2. Let V and W be n-dimensional vector spaces
and let L : V → W be a linear mapping. Prove
that L is one-to-one if and only if L is onto.
3. Let L : V → W be an isomorphism and
let {~v1, . . . ,~vn} be a basis for V. Prove that
{L(~v1), . . . , L(~vn)} is a basis for W.
4. Let L : V → U and M : U → W be linear
mappings.
(a) Prove that if L and M are onto, then M ◦ L
is onto.
(b) Give an example where L is not onto, but
M ◦ L is onto.
(c) Is it possible to give an example where M
is not onto, but M ◦ L is onto? Explain.
5. Let V and W be vector spaces with dim V = n
and dimW = m, let L : V → W be a linear
mapping, and let A be the matrix of L with re-
spect to bases B for V and C forW.
(a) Define an explicit isomorphism from
Range(L) to Col(A). Prove that your map
is an isomorphism.
(b) Use (a) to prove that rank(L) = rank(A).
6. Let U and V be finite dimensional vector spaces
and L : U → V and M : V → U both be one-
to-one linear mappings. Prove that U and V are
isomorphic.
7. Prove or disprove the following statements. Assume that U, V, W are all finite dimensional vector spaces.
(a) If dim V = dim U and L : U → V is linear, then L is an isomorphism.
(b) If L : V → W is a linear mapping and dim V < dim W, then L cannot be onto.
(c) If L : V → W is a linear mapping and dim V > dim W, then L is onto.
(d) If L : V → W is a linear mapping and dim V > dim W, then L cannot be one-to-one.
Chapter 9
Inner Products
In Chapter 1, we looked at the dot product for vectors in Rn. We saw that the dot
product has an important relationship to the length of a vector and to the angle be-
tween two vectors. Moreover, the dot product gave us an easy way of finding the
projection of a vector onto a plane. In Chapter 8, we saw that every n-dimensional
vector space has the same structure as Rn in terms of being a vector space. Thus, we
expect that we can extend the concepts of length, orthogonality, and projections to
general vector spaces.
9.1 Inner Product Spaces
Recall that we defined a vector space so that the operations of addition and scalar
multiplication had the essential properties of addition and scalar multiplication of
vectors in Rn. Thus, to generalize the concept of the dot product to general vector
spaces, it makes sense to include the essential properties of the dot product in our
definition (see Theorem 1.4.2 on page 31).
DEFINITION
Inner Product
Inner Product Space
Let V be a vector space. An inner product on V is a function 〈 , 〉 : V × V→ R that
has the following properties: for every ~v, ~u, ~w ∈ V and s, t ∈ R we have
I1 〈~v,~v〉 ≥ 0, and 〈~v,~v〉 = 0 if and only if ~v = ~0 (Positive Definite)
I2 〈~v, ~w〉 = 〈~w,~v〉 (Symmetric)
I3 〈s~v + t~u, ~w〉 = s〈~v, ~w〉 + t〈~u, ~w〉 (Left linear)
A vector space V with an inner product 〈 , 〉 on V is called an inner product space.
REMARK
Notice that since an inner product is left linear and symmetric, then it is also right
linear:
〈~w, s~v + t~u〉 = s〈~w,~v〉 + t〈~w, ~u〉
Thus, we say that an inner product is bilinear.
In the same way that a vector space is dependent on the defined operations of addition
and scalar multiplication, an inner product space is dependent on the definitions of
addition, scalar multiplication, and the inner product. This will be demonstrated in
the examples below.
EXAMPLE 1 The dot product is an inner product on Rn, called the standard inner product on Rn.
EXAMPLE 2 Which of the following defines an inner product on R3?

(a) 〈~x, ~y〉 = x1y1 + 2x2y2 + 4x3y3.

Solution: We have
〈~x, ~x〉 = x1^2 + 2x2^2 + 4x3^2 ≥ 0
and 〈~x, ~x〉 = 0 if and only if ~x = ~0. Hence, it is positive definite. Also,
〈~x, ~y〉 = x1y1 + 2x2y2 + 4x3y3 = y1x1 + 2y2x2 + 4y3x3 = 〈~y, ~x〉
Therefore, it is symmetric. Also, it is bilinear since
〈s~x + t~y, ~z〉 = (sx1 + ty1)z1 + 2(sx2 + ty2)z2 + 4(sx3 + ty3)z3
= s(x1z1 + 2x2z2 + 4x3z3) + t(y1z1 + 2y2z2 + 4y3z3)
= s〈~x, ~z〉 + t〈~y, ~z〉
Thus 〈 , 〉 is an inner product.

(b) 〈~x, ~y〉 = x1y1 − x2y2 + x3y3.

Solution: Observe that if ~x = (0, 1, 0), then
〈~x, ~x〉 = 0(0) − 1(1) + 0(0) = −1 < 0
So, it is not positive definite and thus it is not an inner product.

(c) 〈~x, ~y〉 = x1y1 + x2y2.

Solution: If ~x = (0, 0, 1), then
〈~x, ~x〉 = 0(0) + 0(0) = 0
but ~x ≠ ~0. So, it is not positive definite and thus it is not an inner product.
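The verification in part (a) can also be spot-checked numerically. The sketch below is an addition to the text (it assumes NumPy, which the book does not use): it writes the candidate inner product as 〈~x, ~y〉 = ~x^T G~y with G = diag(1, 2, 4) and tests the three axioms on random vectors. A random spot-check illustrates the axioms; it does not replace the algebraic proof above.

```python
import numpy as np

# <x, y> = x1*y1 + 2*x2*y2 + 4*x3*y3, i.e. x^T G y with G = diag(1, 2, 4).
G = np.diag([1.0, 2.0, 4.0])

def ip(x, y):
    return x @ G @ y

rng = np.random.default_rng(0)
x, y, z = rng.standard_normal((3, 3))
s, t = 2.0, -3.0

assert np.isclose(ip(x, y), ip(y, x))                         # symmetric
assert np.isclose(ip(s*x + t*y, z), s*ip(x, z) + t*ip(y, z))  # left linear
assert ip(x, x) > 0                                           # positive for x != 0
```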
EXAMPLE 3 On the vector space M2×2(R), define 〈 , 〉 by
〈A, B〉 = tr(B^T A)
where tr is the trace of a matrix, defined as the sum of the diagonal entries. Show that 〈 , 〉 is an inner product on M2×2(R).

Solution: Let A, B, C ∈ M2×2(R) and s, t ∈ R. Then,
〈A, A〉 = tr(A^T A) = tr([a11 a21; a12 a22][a11 a12; a21 a22])
= tr([a11^2 + a21^2, a11a12 + a21a22; a11a12 + a21a22, a12^2 + a22^2])
= a11^2 + a21^2 + a12^2 + a22^2 ≥ 0
Moreover, 〈A, A〉 = 0 if and only if A is the zero matrix. Hence, 〈 , 〉 is positive definite.
〈A, B〉 = tr(B^T A) = tr([b11 b21; b12 b22][a11 a12; a21 a22])
= tr([a11b11 + a21b21, b11a12 + b21a22; a11b12 + a21b22, a12b12 + a22b22])
= a11b11 + a12b12 + a21b21 + a22b22
With a similar calculation we find that
〈B, A〉 = a11b11 + a12b12 + a21b21 + a22b22 = 〈A, B〉
Therefore, 〈 , 〉 is symmetric. In Example 8.1.2, we proved that tr is a linear operator. Hence, we get
〈sA + tB, C〉 = tr(C^T (sA + tB)) = tr(sC^T A + tC^T B) = s tr(C^T A) + t tr(C^T B) = s〈A, C〉 + t〈B, C〉
Thus, 〈 , 〉 is also bilinear, and so it is an inner product on M2×2(R).
REMARKS
1. Of course, this can be generalized to an inner product 〈A, B〉 = tr(BT A) on
Mm×n(R). This is called the standard inner product on Mm×n(R).
2. Observe that calculating 〈A, B〉 corresponds exactly to finding the dot product
on vectors in Rmn under the obvious isomorphism. As a result, when using this
inner product you do not actually have to compute BT A.
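Remark 2 can be seen directly in code. The following sketch (assuming NumPy; not from the text) checks that tr(B^T A) agrees with the entrywise dot product of A and B, so B^T A never needs to be formed.

```python
import numpy as np

# Standard inner product <A, B> = tr(B^T A) on M_{m x n}(R): the value equals
# the dot product of the entries under the "flatten to R^{mn}" isomorphism.
rng = np.random.default_rng(1)
A = rng.standard_normal((2, 2))
B = rng.standard_normal((2, 2))

via_trace = np.trace(B.T @ A)
via_entries = np.sum(A * B)        # entrywise product, then sum

assert np.isclose(via_trace, via_entries)
```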
EXAMPLE 4 Prove that the function 〈 , 〉 defined by
〈p(x), q(x)〉 = p(−1)q(−1) + p(0)q(0) + p(1)q(1)
is an inner product on P2(R).
Solution: Let p(x) = a + bx + cx2, q(x), r(x) ∈ P2(R). Then,
〈p(x), p(x)〉 = [p(−1)]^2 + [p(0)]^2 + [p(1)]^2 = [a − b + c]^2 + a^2 + [a + b + c]^2 ≥ 0
Moreover, 〈p(x), p(x)〉 = 0 if and only if a − b + c = 0, a = 0, and a + b + c = 0,
which implies a = b = c = 0. Hence, 〈 , 〉 is positive definite. We have
〈p(x), q(x)〉 = p(−1)q(−1) + p(0)q(0) + p(1)q(1)
= q(−1)p(−1) + q(0)p(0) + q(1)p(1)
= 〈q(x), p(x)〉
Hence, 〈 , 〉 is symmetric. Finally, we get
〈(sp + tq)(x), r(x)〉 = (sp + tq)(−1)r(−1) + (sp + tq)(0)r(0) + (sp + tq)(1)r(1)
= [sp(−1) + tq(−1)]r(−1) + [sp(0) + tq(0)]r(0) + [sp(1) + tq(1)]r(1)
= s[p(−1)r(−1) + p(0)r(0) + p(1)r(1)] + t[q(−1)r(−1) + q(0)r(0) + q(1)r(1)]
= s〈p(x), r(x)〉 + t〈q(x), r(x)〉
Therefore, 〈 , 〉 is also bilinear and hence is an inner product on P2(R).
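The evaluation inner product of Example 4 is easy to compute mechanically. Here is an illustrative Python sketch; the coefficient-triple representation and helper names are our own, not the book's.

```python
import numpy as np

# Evaluation inner product on P2(R): <p, q> = p(-1)q(-1) + p(0)q(0) + p(1)q(1).
# A polynomial a + bx + cx^2 is stored as the coefficient triple (a, b, c).
NODES = np.array([-1.0, 0.0, 1.0])

def evaluate(coeffs, x):
    a, b, c = coeffs
    return a + b * x + c * x**2

def ip(p, q):
    return float(np.sum(evaluate(p, NODES) * evaluate(q, NODES)))

# p(x) = 1 + x has values (0, 1, 2); q(x) = x^2 has values (1, 0, 1).
assert ip((1, 1, 0), (0, 0, 1)) == 2.0
# <x, x> = (-1)^2 + 0^2 + 1^2 = 2, so ||x|| = sqrt(2).
assert ip((0, 1, 0), (0, 1, 0)) == 2.0
```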
EXAMPLE 5 An extremely important inner product in applied mathematics, physics, and engineering is the inner product
〈 f (x), g(x)〉 = ∫_{−π}^{π} f (x)g(x) dx
on the vector space C[−π, π] of continuous functions defined on the closed interval from −π to π. This is the foundation for Fourier series.
THEOREM 1 If V is an inner product space with inner product 〈 , 〉, then for any ~v ∈ V we have
〈~v, ~0〉 = 0
REMARK
Whenever one considers an inner product space, one must specify which inner product is being used. However, in Rn and Mm×n(R) one generally uses the standard inner
product. Thus, whenever we are working in Rn or Mm×n(R), if no other inner product
is defined, we will take that to mean that the standard inner product is being used.
Section 9.1 Problems
1. Calculate the following inner products under
the standard inner product on M2×2(R).
(a) 〈[1 2; 0 −1], [1 2; 0 −1]〉
(b) 〈[3 −4; 2 5], [4 1; 2 −1]〉
(c) 〈[2 3; −4 1], [−5 3; 1 5]〉
(d) 〈[0 0; 0 0], [3 7; −4 3]〉
2. Calculate the following using the inner product
for P2(R) defined by
〈p(x), q(x)〉 = p(0)q(0) + p(1)q(1) + p(2)q(2)
(a) 〈x, 1 + x + x^2〉   (b) 〈1 + x^2, 1 + x^2〉
(c) 〈x + x^2, 1 + x + x^2〉   (d) 〈x, x〉
3. Determine, with proof, which of the following
functions defines an inner product on R3.
(a) 〈~x, ~y〉 = 2x1y1 + 3x2y2 + 4x3y3
(b) 〈~x, ~y〉 = 2x1y1 − x1y2 − x2y1 + 2x2y2 + x3y3
4. Determine, with proof, which of the following
functions defines an inner product on P2(R).
(a) 〈p(x), q(x)〉 = p(−1)q(−1) + p(1)q(1)
(b) 〈p(x), q(x)〉 = p(−2)q(−2) + p(1)q(1) +
p(2)q(2)
(c) 〈p(x), q(x)〉 = 2p(−1)q(−1) + 2p(0)q(0) +
2p(1)q(1) − p(−1)q(1) − p(1)q(−1)
5. Determine, with proof, which of the following
functions defines an inner product on M2×2(R).
(a) 〈A, B〉 = 2a1b1 + a2b2 + 2a3b3
(b) 〈A, B〉 = a1b1 + a2b2 − a3b3 − a4b4
6. Let V be an inner product space with inner
product 〈 , 〉. Prove that 〈~v, ~0〉 = 0 for any ~v ∈ V.
7. Let 〈 , 〉 denote the standard inner product on Rn
and let A ∈ Mn×n(R). Prove that for all ~x, ~y ∈ Rn
〈A~x, ~y〉 = 〈~x, AT~y〉
8. Let {~e1, ~e2} be the standard basis for R2. Prove
that for any inner product 〈 , 〉 on R2 we have
〈~x, ~y〉 = x1y1〈~e1, ~e1〉+x1y2〈~e1, ~e2〉+x2y1〈~e2, ~e1〉+x2y2〈~e2, ~e2〉
9. Verify that 〈 f (x), g(x)〉 = ∫_{−π}^{π} f (x)g(x) dx is an inner product on C[−π, π].
10. Determine if the following statements are true or false. Justify your answer.
(a) If there exist ~x, ~y ∈ V such that 〈~x, ~y〉 < 0, then 〈 , 〉 is not an inner product on V.
(b) If 〈 , 〉 is an inner product on an inner product space V, then for all ~x, ~y ∈ V we have
〈a~x + b~y, c~x + d~y〉 = ac〈~x, ~x〉 + 2bc〈~x, ~y〉 + bd〈~y, ~y〉
9.2 Orthogonality and Length
The concepts of length and orthogonality are fundamental in geometry and have
many real world applications. Thus, we now extend these concepts to general in-
ner product spaces.
Length
DEFINITION
Length
Let V be an inner product space with inner product 〈 , 〉. For any ~v ∈ V we define the
length (or norm) of ~v by
‖~v‖ = √〈~v, ~v〉
Observe that the length of a vector is well defined since an inner product must be
positive definite. Also note that the length of a vector in an inner product space is
dependent on the definition of the inner product being used in the same way that
the sum of two vectors is dependent on the definition of addition being used. This
might seem a little strange at first, but it is actually extremely useful. It means in
applications we have flexibility to define the lengths of some vectors as required by
the problem.
EXAMPLE 1 In R3 under the standard inner product, the length of ~e1 = (1, 0, 0) is 1. However, under the inner product 〈~x, ~y〉 = ax1y1 + x2y2 + x3y3 where a > 0, we get
‖~e1‖ = √〈~e1, ~e1〉 = √(a(1)^2 + 0^2 + 0^2) = √a
EXAMPLE 2 Let p(x) = x and q(x) = 2 − 3x2 be vectors in P2(R) with inner product
〈p(x), q(x)〉 = p(−1)q(−1) + p(0)q(0) + p(1)q(1)
Calculate the length of p(x) and the length of q(x).
Solution: We have
‖p(x)‖ = √〈p(x), p(x)〉 = √(p(−1)p(−1) + p(0)p(0) + p(1)p(1)) = √((−1)(−1) + 0(0) + 1(1)) = √2
‖q(x)‖ = √〈q(x), q(x)〉 = √(q(−1)q(−1) + q(0)q(0) + q(1)q(1)) = √((−1)(−1) + 2(2) + (−1)(−1)) = √6
EXAMPLE 3 Find the length of p(x) = x in P2(R) with inner product
〈p(x), q(x)〉 = p(0)q(0) + p(1)q(1) + p(2)q(2)
Solution: We have
‖p(x)‖ = √〈p(x), p(x)〉 = √(p(0)p(0) + p(1)p(1) + p(2)p(2)) = √(0(0) + 1(1) + 2(2)) = √5
EXAMPLE 4 On C[−π, π], find the length of f (x) = x under the inner product
〈 f (x), g(x)〉 = ∫_{−π}^{π} f (x)g(x) dx
Solution: We have
‖x‖^2 = ∫_{−π}^{π} x^2 dx = (1/3)x^3 |_{−π}^{π} = (2/3)π^3
Thus, ‖x‖ = √(2π^3/3).
EXAMPLE 5 Find the length of A = [1/2 1/2; −1/2 −1/2] in M2×2(R).
Solution: Using the standard inner product we get
‖A‖ = √〈A, A〉 = √((1/2)^2 + (1/2)^2 + (−1/2)^2 + (−1/2)^2) = 1
Of course, we get the same familiar properties of length that we saw in Chapter 1.
THEOREM 1 Let V be an inner product space with inner product 〈 , 〉. For any ~x, ~y ∈ V and t ∈ R we have
(1) ‖~x‖ ≥ 0, and ‖~x‖ = 0 if and only if ~x = ~0.
(2) ‖t~x‖ = |t| ‖~x‖
(3) 〈~x, ~y〉 ≤ ‖~x‖ ‖~y‖ (Cauchy-Schwarz-Bunyakovski Inequality)
(4) ‖~x + ~y‖ ≤ ‖~x‖ + ‖~y‖ (Triangle Inequality)
The proofs of these properties are essentially identical to the proofs of the corre-
sponding properties in Rn.
In many cases it is useful (and sometimes necessary) to use vectors which have length
one.
DEFINITION
Unit Vector
Let V be an inner product space with inner product 〈 , 〉. If ~v ∈ V is a vector such that
‖~v ‖ = 1, then ~v is called a unit vector.
In many situations, we will be given a general vector ~v ∈ V and need to find a unit vector v̂ in the direction of ~v. This is called normalizing the vector. Theorem 1 (2) shows us that we can take
v̂ = (1/‖~v‖) ~v
EXAMPLE 6 Find a unit vector in the direction of p(x) = x in P2(R) with inner product
〈p(x), q(x)〉 = p(0)q(0) + p(1)q(1) + p(2)q(2)
Solution: We have
‖p(x)‖ = √(p(0)p(0) + p(1)p(1) + p(2)p(2)) = √(0(0) + 1(1) + 2(2)) = √5
Thus, a unit vector in the direction of p(x) is
p̂(x) = (1/√5) x
Orthogonality
We now extend the idea of vectors being orthogonal to inner product spaces.
DEFINITION
Orthogonal Vectors
Let V be an inner product space with inner product 〈 , 〉. If ~x, ~y ∈ V are such that
〈~x, ~y〉 = 0
then ~x and ~y are said to be orthogonal.
Just like length, whether two vectors are orthogonal or not is dependent on the defi-
nition of the inner product.
EXAMPLE 7 The standard basis vectors for R2 are not orthogonal under the inner product 〈~x, ~y〉 = 2x1y1 + x1y2 + x2y1 + 2x2y2 on R2 because
〈(1, 0), (0, 1)〉 = 2(1)(0) + 1(1) + 0(0) + 2(0)(1) = 1
EXAMPLE 8 Show that [1 2; 3 −1] is orthogonal to [−1 2; −1 0] in M2×2(R).
Solution: We have
〈[1 2; 3 −1], [−1 2; −1 0]〉 = (−1)(1) + 2(2) + (−1)(3) + 0(−1) = 0
So, they are orthogonal.
DEFINITION
Orthogonal Set
If S = {~v1, . . . ,~vk} is a set of vectors in an inner product space V with inner product 〈 , 〉 such that 〈~vi, ~vj〉 = 0 for all i ≠ j, then S is called an orthogonal set.
EXAMPLE 9 Prove that {1, x, 3x^2 − 2} is an orthogonal set of vectors in P2(R) under the inner product 〈p(x), q(x)〉 = p(−1)q(−1) + p(0)q(0) + p(1)q(1).
Solution: We have
〈1, x〉 = 1(−1) + 1(0) + 1(1) = 0
〈1, 3x^2 − 2〉 = 1(1) + 1(−2) + 1(1) = 0
〈x, 3x^2 − 2〉 = (−1)(1) + 0(−2) + 1(1) = 0
Hence, each vector is orthogonal to the others, and so the set is orthogonal.
EXAMPLE 10 Prove that {1, x, 3x^2 − 2} is not an orthogonal set of vectors in P2(R) under the inner product 〈p(x), q(x)〉 = p(0)q(0) + p(1)q(1) + p(2)q(2).
Solution: We have 〈1, x〉 = 1(0) + 1(1) + 1(2) = 3, so 1 and x are not orthogonal. Thus, the set is not orthogonal.
EXAMPLE 11 Prove that {[1 2; 0 1], [0 1; 0 −2], [0 0; 1 0]} is an orthogonal set in M2×2(R).
Solution: We have
〈[1 2; 0 1], [0 1; 0 −2]〉 = 1(0) + 2(1) + 0(0) + 1(−2) = 0
〈[1 2; 0 1], [0 0; 1 0]〉 = 1(0) + 2(0) + 0(1) + 1(0) = 0
〈[0 0; 1 0], [0 1; 0 −2]〉 = 0(0) + 0(1) + 1(0) + 0(−2) = 0
Hence, the set is orthogonal.
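Since the standard inner product on M2×2(R) is just the entrywise dot product (see the Remark in Section 9.1), the computations in Example 11 can be confirmed in a few lines. This sketch assumes NumPy and is not part of the text:

```python
import numpy as np

# The three matrices of Example 11; <A, B> = tr(B^T A) equals the sum of
# entrywise products, so orthogonality is a pairwise zero-sum check.
S = [np.array([[1, 2], [0, 1]]),
     np.array([[0, 1], [0, -2]]),
     np.array([[0, 0], [1, 0]])]

for i in range(len(S)):
    for j in range(i + 1, len(S)):
        assert np.sum(S[i] * S[j]) == 0   # each distinct pair is orthogonal
```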
EXERCISE 1 Determine if {[3 1; −1 2], [1 −3; 2 1], [1 2; 3 −1]} is an orthogonal set in M2×2(R).
One important property of orthogonal vectors in R2 is the Pythagorean Theorem.
This extends to an orthogonal set in an inner product space.
THEOREM 2 If {~v1, . . . ,~vk} is an orthogonal set in an inner product space V, then
‖~v1 + · · · + ~vk‖^2 = ‖~v1‖^2 + · · · + ‖~vk‖^2
The following theorem gives us an important result about orthogonal sets which do
not contain the zero vector.
THEOREM 3 If S = {~v1, . . . ,~vk} is an orthogonal set in an inner product space V with inner product 〈 , 〉 such that ~vi ≠ ~0 for all 1 ≤ i ≤ k, then S is linearly independent.
Proof: Consider
~0 = c1~v1 + · · · + ck~vk
Take the inner product of both sides with ~vi to get
〈~vi, ~0〉 = 〈~vi, c1~v1 + · · · + ck~vk〉
0 = c1〈~vi, ~v1〉 + · · · + ci〈~vi, ~vi〉 + · · · + ck〈~vi, ~vk〉
= 0 + · · · + 0 + ci‖~vi‖^2 + 0 + · · · + 0
= ci‖~vi‖^2
Since ~vi ≠ ~0, we have that ‖~vi‖ ≠ 0, and so ci = 0 for 1 ≤ i ≤ k. Therefore, S is linearly independent. �
Consequently, if we have an orthogonal set of non-zero vectors which spans an inner
product space V, then it will be a basis for V.
DEFINITION
Orthogonal Basis
If B is an orthogonal set in an inner product space V that is a basis for V, then B is
called an orthogonal basis for V.
Since the vectors in the basis are all orthogonal to each other, our geometric intuition
tells us that it should be quite easy to find the coordinates of any vector with respect
to this basis. The following theorem demonstrates this.
THEOREM 4 If S = {~v1, . . . ,~vn} is an orthogonal basis for an inner product space V with inner product 〈 , 〉 and ~v ∈ V, then the coefficient of ~vi when ~v is written as a linear combination of the vectors in S is 〈~v, ~vi〉 / ‖~vi‖^2. In particular,
~v = (〈~v, ~v1〉 / ‖~v1‖^2) ~v1 + · · · + (〈~v, ~vn〉 / ‖~vn‖^2) ~vn
Proof: Since ~v ∈ V, we can write
~v = c1~v1 + · · · + cn~vn
Taking the inner product of both sides with ~vi gives
〈~v, ~vi〉 = 〈c1~v1 + · · · + cn~vn, ~vi〉 = c1〈~v1, ~vi〉 + · · · + ci〈~vi, ~vi〉 + · · · + cn〈~vn, ~vi〉 = 0 + · · · + 0 + ci‖~vi‖^2 + 0 + · · · + 0
since S is orthogonal. Also, ~vi ≠ ~0, which implies that ‖~vi‖^2 ≠ 0. Therefore, we get
ci = 〈~v, ~vi〉 / ‖~vi‖^2
This is valid for 1 ≤ i ≤ n and so the result follows. �
EXAMPLE 12 Find the coordinate vector of ~x = (1, 2, −3) in R3 with respect to the orthogonal basis B = {(1, 1, 1), (1, 0, −1), (1, −2, 1)} = {~v1, ~v2, ~v3}.
Solution: By Theorem 4, we have
c1 = 〈~x, ~v1〉 / ‖~v1‖^2 = 0/3 = 0
c2 = 〈~x, ~v2〉 / ‖~v2‖^2 = 4/2 = 2
c3 = 〈~x, ~v3〉 / ‖~v3‖^2 = −6/6 = −1
Thus, [~x]B = (0, 2, −1).
Note that it is easy to verify that
(1, 2, −3) = 0(1, 1, 1) + 2(1, 0, −1) + (−1)(1, −2, 1)
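Theorem 4 translates directly into a computation. The following sketch (NumPy, illustrative only) reproduces the coordinates found in Example 12 under the standard inner product:

```python
import numpy as np

# The i-th coordinate of x with respect to an orthogonal basis is
# <x, v_i> / ||v_i||^2 (Theorem 4), here for the standard inner product on R^3.
x = np.array([1.0, 2.0, -3.0])
basis = [np.array([1.0, 1.0, 1.0]),
         np.array([1.0, 0.0, -1.0]),
         np.array([1.0, -2.0, 1.0])]

coords = np.array([(x @ v) / (v @ v) for v in basis])
assert np.allclose(coords, [0.0, 2.0, -1.0])   # matches Example 12

# Reconstructing x from its coordinates verifies the expansion.
assert np.allclose(sum(c * v for c, v in zip(coords, basis)), x)
```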
EXAMPLE 13 Find the coordinate vector of A = [4 2; 3 −1] with respect to the orthogonal basis B = {[1 1; 1 1], [−2 0; 1 1], [0 0; 1 −1]} = {~v1, ~v2, ~v3} for a subspace S of M2×2(R).
Solution: We have
c1 = 〈A, ~v1〉 / ‖~v1‖^2 = (4(1) + 2(1) + 3(1) + (−1)(1)) / (1^2 + 1^2 + 1^2 + 1^2) = 8/4 = 2
c2 = 〈A, ~v2〉 / ‖~v2‖^2 = (4(−2) + 2(0) + 3(1) + (−1)(1)) / ((−2)^2 + 0^2 + 1^2 + 1^2) = −6/6 = −1
c3 = 〈A, ~v3〉 / ‖~v3‖^2 = (4(0) + 2(0) + 3(1) + (−1)(−1)) / (0^2 + 0^2 + 1^2 + (−1)^2) = 4/2 = 2
Thus, [A]B = (2, −1, 2).
EXERCISE 2 Consider P2(R) with inner product 〈p(x), q(x)〉 = p(−1)q(−1) + p(0)q(0) + p(1)q(1). Verify that B = {1 − x − 2x^2, 2 − x^2, 2 + 3x − 4x^2} is an orthogonal basis for P2(R) and determine the B-coordinate vector of 1 + x^2.
Orthonormal Bases
Observe that the formula for the coordinates with respect to an orthogonal basis
would be even easier if all the basis vectors were unit vectors.
DEFINITION
Orthonormal Set
If S = {~v1, . . . ,~vk} is an orthogonal set in an inner product space V such that ‖~vi ‖ = 1
for 1 ≤ i ≤ k, then S is called an orthonormal set.
By Theorem 3, an orthonormal set is necessarily linearly independent.
DEFINITION
Orthonormal Basis
A basis for an inner product space V which is an orthonormal set is called an
orthonormal basis of V.
EXAMPLE 14 The standard basis of Rn is an orthonormal basis under the standard inner product.
EXAMPLE 15 The standard basis for R3 is not an orthonormal basis under the inner product
〈~x, ~y〉 = x1y1 + 2x2y2 + x3y3
since ~e2 is not a unit vector under this inner product.
EXAMPLE 16 Let B = { (1/√3)(1, 1, −1), (1/√2)(1, 0, 1), (1/√6)(1, −2, −1) }. Verify that B is an orthonormal basis for R3.
Solution: We first verify that each vector is a unit vector. We have
〈(1/√3)(1, 1, −1), (1/√3)(1, 1, −1)〉 = 1/3 + 1/3 + 1/3 = 1
〈(1/√2)(1, 0, 1), (1/√2)(1, 0, 1)〉 = 1/2 + 0 + 1/2 = 1
〈(1/√6)(1, −2, −1), (1/√6)(1, −2, −1)〉 = 1/6 + 4/6 + 1/6 = 1
We now verify that B is orthogonal.
〈(1/√3)(1, 1, −1), (1/√2)(1, 0, 1)〉 = (1/√6)(1 + 0 − 1) = 0
〈(1/√3)(1, 1, −1), (1/√6)(1, −2, −1)〉 = (1/√18)(1 − 2 + 1) = 0
〈(1/√2)(1, 0, 1), (1/√6)(1, −2, −1)〉 = (1/√12)(1 + 0 − 1) = 0
Therefore, B is an orthonormal set and hence is a linearly independent set of three vectors in R3. Thus, B is a basis for R3.
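Anticipating Theorem 6 below, the whole verification in Example 16 collapses to one matrix product: placing the vectors as columns of a matrix Q, the set is orthonormal exactly when Q^T Q = I. A NumPy sketch (not from the text):

```python
import numpy as np

# Columns of Q are the three vectors of Example 16; orthonormality of the set
# is equivalent to Q^T Q being the identity matrix.
Q = np.column_stack([
    np.array([1.0, 1.0, -1.0]) / np.sqrt(3),
    np.array([1.0, 0.0, 1.0]) / np.sqrt(2),
    np.array([1.0, -2.0, -1.0]) / np.sqrt(6),
])

assert np.allclose(Q.T @ Q, np.eye(3))
```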
EXERCISE 3 Show that B = {[1/√2 1/√2; 0 0], [0 0; 1/√2 1/√2], [1/2 −1/2; 1/2 −1/2]} is an orthonormal set in M2×2(R).
EXERCISE 4 Show that B = {1, x, x2} is not an orthonormal set for P2(R) under the inner product
〈p(x), q(x)〉 = p(−2)q(−2) + p(0)q(0) + p(2)q(2)
EXAMPLE 17 Turn {1, x, 3x^2 − 2} into an orthonormal basis for P2(R) under the inner product
〈p(x), q(x)〉 = p(−1)q(−1) + p(0)q(0) + p(1)q(1)
Solution: In Example 9, we showed that {1, x, 3x^2 − 2} is orthogonal. Hence, it is a linearly independent set of three vectors in P2(R) and thus is a basis for P2(R). To turn it into an orthonormal basis, we just need to normalize each vector. We have
‖1‖ = √(1^2 + 1^2 + 1^2) = √3
‖x‖ = √((−1)^2 + 0^2 + 1^2) = √2
‖3x^2 − 2‖ = √(1^2 + (−2)^2 + 1^2) = √6
Hence, an orthonormal basis for P2(R) is
{ 1/√3, x/√2, (3x^2 − 2)/√6 }
EXAMPLE 18 Show that the set {1, sin x, cos x} is an orthogonal set in C[−π, π] under the inner product 〈 f (x), g(x)〉 = ∫_{−π}^{π} f (x)g(x) dx, and then normalize the vectors to make it an orthonormal set.
Solution:
〈1, sin x〉 = ∫_{−π}^{π} 1 · sin x dx = 0
〈1, cos x〉 = ∫_{−π}^{π} 1 · cos x dx = 0
〈sin x, cos x〉 = ∫_{−π}^{π} sin x · cos x dx = 0
Thus, the set is orthogonal. Next, we find that
〈1, 1〉 = ∫_{−π}^{π} 1^2 dx = 2π
〈sin x, sin x〉 = ∫_{−π}^{π} sin^2 x dx = π
〈cos x, cos x〉 = ∫_{−π}^{π} cos^2 x dx = π
Hence, ‖1‖ = √(2π), ‖sin x‖ = √π, and ‖cos x‖ = √π. Therefore,
{ 1/√(2π), (sin x)/√π, (cos x)/√π }
is an orthonormal set in C[−π, π].
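The integrals in Example 18 can be spot-checked with numerical quadrature. The sketch below (plain Python, not from the text) uses a composite midpoint rule; since quadrature is approximate, the checks use a tolerance.

```python
import math

# Approximate <f, g> = integral of f*g over [-pi, pi] by a midpoint rule.
def ip(f, g, n=20000):
    h = 2 * math.pi / n
    return sum(f(x) * g(x) for x in
               (-math.pi + (k + 0.5) * h for k in range(n))) * h

one = lambda x: 1.0
assert abs(ip(one, math.sin)) < 1e-6           # <1, sin x> = 0
assert abs(ip(one, math.cos)) < 1e-6           # <1, cos x> = 0
assert abs(ip(math.sin, math.cos)) < 1e-6      # <sin x, cos x> = 0
assert abs(ip(one, one) - 2 * math.pi) < 1e-6  # ||1||^2 = 2*pi
assert abs(ip(math.sin, math.sin) - math.pi) < 1e-6  # ||sin x||^2 = pi
```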
The following corollary to Theorem 4 shows that it is very easy to find coordinates
of a vector with respect to an orthonormal basis.
COROLLARY 5 If ~v is any vector in an inner product space V with inner product 〈 , 〉 and B = {~v1, . . . ,~vn} is an orthonormal basis for V, then
~v = 〈~v, ~v1〉~v1 + · · · + 〈~v, ~vn〉~vn
Proof: By Theorem 4 we have
~v = (〈~v, ~v1〉 / ‖~v1‖^2) ~v1 + · · · + (〈~v, ~vn〉 / ‖~vn‖^2) ~vn = 〈~v, ~v1〉~v1 + · · · + 〈~v, ~vn〉~vn
since ‖~vi‖ = 1 for 1 ≤ i ≤ n. �
REMARK
The formula for finding coordinates with respect to an orthonormal basis looks nicer than the one for an orthogonal basis. However, when doing problems by hand, it is not necessarily easier to use, since the vectors in an orthonormal basis often contain square roots. Compare the following example to Example 12. In proofs, however, it will generally be much nicer to have an orthonormal set.
EXAMPLE 19 Let ~x = (1, 2, 3). Find the coordinates of ~x with respect to the orthonormal basis
B = { (1/√3)(1, 1, −1), (1/√2)(1, 0, 1), (1/√6)(1, −2, −1) } = {~v1, ~v2, ~v3}
for R3.
Solution: By Corollary 5, the coordinates of ~x with respect to B are
c1 = 〈~x, ~v1〉 = 〈(1, 2, 3), (1/√3)(1, 1, −1)〉 = 0
c2 = 〈~x, ~v2〉 = 〈(1, 2, 3), (1/√2)(1, 0, 1)〉 = 4/√2 = 2√2
c3 = 〈~x, ~v3〉 = 〈(1, 2, 3), (1/√6)(1, −2, −1)〉 = −6/√6 = −√6
EXAMPLE 20 Consider P2(R) with inner product defined by
〈p(x), q(x)〉 = p(−1)q(−1) + p(0)q(0) + p(1)q(1)
Write f (x) = 1 + x + x^2 as a linear combination of the vectors in the orthonormal basis B = {~v1, ~v2, ~v3} = { 1/√3, x/√2, (3x^2 − 2)/√6 }.
Solution: By Corollary 5, the coordinates of f with respect to B are
c1 = 〈 f , ~v1〉 = 1(1/√3) + 1(1/√3) + 3(1/√3) = 5/√3
c2 = 〈 f , ~v2〉 = 1(−1/√2) + 1(0) + 3(1/√2) = √2
c3 = 〈 f , ~v3〉 = 1(1/√6) + 1(−2/√6) + 3(1/√6) = 2/√6
Thus,
1 + x + x^2 = (5/√3)(1/√3) + √2 (x/√2) + (2/√6)((3x^2 − 2)/√6)
Orthogonal Matrices
We end this section by looking at a very important application of orthonormal bases
of Rn.
THEOREM 6 For an n × n matrix P, the following are equivalent.
(1) The columns of P form an orthonormal basis for Rn
(2) P^T = P^(−1)
(3) The rows of P form an orthonormal basis for Rn
Proof: Let P = [~v1 · · · ~vn].
(1) ⇔ (2): By definition of matrix multiplication, the (i, j)-entry of P^T P is ~vi · ~vj, so
P^T P = [~v1 · ~v1, ~v1 · ~v2, . . . , ~v1 · ~vn; ~v2 · ~v1, ~v2 · ~v2, . . . , ~v2 · ~vn; . . . ; ~vn · ~v1, ~vn · ~v2, . . . , ~vn · ~vn]
Therefore, P^T P = I if and only if ~vi · ~vj = 0 whenever i ≠ j and ~vi · ~vi = 1 for all i. That is, P^T = P^(−1) if and only if {~v1, . . . ,~vn} is an orthonormal basis for Rn.
(2) ⇔ (3): The rows of P form an orthonormal basis for Rn if and only if the columns of P^T form an orthonormal basis for Rn. We proved above that this is true if and only if
(P^T)^T = (P^T)^(−1)
P = (P^(−1))^T
P^T = ((P^(−1))^T)^T
P^T = P^(−1)
�
DEFINITION
Orthogonal Matrix
If the columns of an n×n matrix P form an orthonormal basis for Rn, then P is called
an orthogonal matrix.
REMARKS
1. Be very careful to remember that an orthogonal matrix has orthonormal rows
and columns. A matrix whose columns form an orthogonal set but not an
orthonormal set is not orthogonal!
2. By Theorem 6, we could have defined an orthogonal matrix as a matrix P
such that PT = P−1 instead. We will use this property of orthogonal matrices
frequently.
EXAMPLE 21 Which of the following matrices are orthogonal?
A = [0 1 0; 0 0 1; 1 0 0],  B = [1/2 1/2; −1/2 1/2],  C = [1/√2 1/√2 0; 1/√6 −1/√6 2/√6; −1/√3 1/√3 1/√3]
Solution: The columns of A are the standard basis vectors for R3, which we know form an orthonormal basis for R3. Thus, A is orthogonal.
Although the columns of B are clearly orthogonal, we have that
‖(1/2, −1/2)‖ = √((1/2)^2 + (−1/2)^2) = √(1/2) ≠ 1
Thus, the first column is not a unit vector, so B is not orthogonal.
By matrix multiplication, we find that
CC^T = [1/√2 1/√2 0; 1/√6 −1/√6 2/√6; −1/√3 1/√3 1/√3][1/√2 1/√6 −1/√3; 1/√2 −1/√6 1/√3; 0 2/√6 1/√3] = I
Hence, C is orthogonal.
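The test P^T P = I from Theorem 6 gives a one-line orthogonality check. Here is an illustrative NumPy sketch applied to the three matrices of Example 21 (the helper name is ours, not the book's):

```python
import numpy as np

# A matrix is orthogonal iff P^T P = I (Theorem 6).
def is_orthogonal(P):
    P = np.asarray(P, dtype=float)
    return np.allclose(P.T @ P, np.eye(P.shape[0]))

A = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]])
B = np.array([[0.5, 0.5], [-0.5, 0.5]])
C = np.array([[1/np.sqrt(2), 1/np.sqrt(2), 0],
              [1/np.sqrt(6), -1/np.sqrt(6), 2/np.sqrt(6)],
              [-1/np.sqrt(3), 1/np.sqrt(3), 1/np.sqrt(3)]])

assert is_orthogonal(A)
assert not is_orthogonal(B)   # columns orthogonal but not unit length
assert is_orthogonal(C)
```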
We now look at some useful properties of orthogonal matrices.
THEOREM 7 If P and Q are n × n orthogonal matrices and ~x, ~y ∈ Rn, then
(1) (P~x) · (P~y) = ~x · ~y
(2) ‖P~x‖ = ‖~x‖
(3) det P = ±1
(4) All real eigenvalues of P are 1 or −1
(5) PQ is an orthogonal matrix
Proof: We will prove (1) and (2) and leave the others as exercises. We have
(P~x) · (P~y) = (P~x)^T (P~y) = ~x^T P^T P~y = ~x^T ~y = ~x · ~y
Thus,
‖P~x‖^2 = (P~x) · (P~x) = ~x · ~x = ‖~x‖^2
�
Section 9.2 Problems
1. Let B = {1 + x, −2 + 3x, 2 − 3x^2}. Consider the
inner product 〈 , 〉 on P2(R) defined by
〈p(x), q(x)〉 = p(−1)q(−1)+p(0)q(0)+p(1)q(1)
(a) Prove that B is an orthogonal basis for
P2(R).
(b) Find the B-coordinate vector of p(x) = 1.
(c) Find the B-coordinate vector of p(x) = x.
(d) Find the B-coordinate vector of p(x) = x2.
(e) Find the B-coordinate vector of
p(x) = 3 − 7x + x2 in three ways.
i. By row reducing an augmented
matrix.
ii. By using (b), (c), (d), and Theorem
4.3.2.
iii. By using the formula for coordinates
of vectors with respect to a basis
(Theorem 4).
2. Let ~x = (4, 3, 5), and let B = {(1, −2, 2), (2, 2, 1), (−2, 1, 2)} be a basis for R3.
(a) Prove that B is an orthogonal basis for R3.
(b) Turn B into an orthonormal basis C by normalizing the vectors in B.
(c) Find the B-coordinate vector of ~x.
(d) Find the C-coordinate vector of ~x.
3. Consider the inner product 〈 , 〉 on R3 defined by 〈~x, ~y〉 = x1y1 + 2x2y2 + x3y3, and let B = {(1, 2, −1), (3, −1, −1), (3, 1, 7)} and ~x = (1, 1, 1).
(a) Prove that B is an orthogonal basis for R3.
(b) Turn B into an orthonormal basis C by normalizing the vectors in B.
(c) Find the B-coordinate vector of ~x.
(d) Find the C-coordinate vector of ~x.
4. Consider the subspace S = Span B of M2×2(R), where B = {[1 1; −1 −1], [1 −1; 1 −1], [−1 1; 1 −1]}.
(a) Prove that B is an orthogonal basis for S.
(b) Turn B into an orthonormal basis C by normalizing the vectors in B.
(c) Find the C-coordinate vector of ~x = [−3 6; −2 −1].
5. Which of the following matrices are orthogonal?
(a) [cos θ − sin θ; sin θ cos θ]
(b) [1/√10 −3/√10; 3/√10 1/√10]
(c) [2 1; −1 2]
(d) [2/3 1/3 2/3; −1/√18 4/√18 −1/√18; −1/√2 0 1/√2]
(e) [1 1 1; 2 −1 0; 1 1 −1]
(f) [0 1 0; 0 0 1; 1 0 0]
(g) [2/√20 3/√11 0; 3/√20 −1/√11 −1/√2; 3/√20 −1/√11 1/√2]
6. Consider P2(R) with inner product
〈p(x), q(x)〉 = p(−1)q(−1) + p(0)q(0) + p(1)q(1)
Given that B = {1 − x^2, (1/2)(x − x^2)} is an orthonormal set, extend B to find an orthonormal basis for P2(R).
7. Let V be an inner product space with inner
product 〈 , 〉. Prove that for any ~v ∈ V and t ∈ R,
we have
‖t~v ‖ = |t| ‖~v ‖
8. Let P and Q be n × n orthogonal matrices
(a) Prove that det P = ±1.
(b) Prove that all real eigenvalues of P are ±1.
(c) Give an orthogonal matrix P whose eigen-
values are not ±1.
(d) Prove that PQ is an orthogonal matrix.
9. Let B = { (1/√6, −2/√6, 1/√6), (1/√3, 1/√3, 1/√3), (1/√2, 0, −1/√2) } be an orthonormal basis for R3. Using B, determine another orthonormal basis for R3 which includes the vector (1/√6, 1/√3, 1/√2), and briefly explain why your basis is orthonormal.
10. Let {~v1, . . . ,~vm} be a set of m orthonormal vectors in Rn with m < n. Let Q = [~v1 · · · ~vm]. Prove that Q^T Q = Im.
11. Let {~v1, . . . ,~vn} be an orthonormal basis for an inner product space V with inner product 〈 , 〉, and let ~x = c1~v1 + · · · + cn~vn and ~y = d1~v1 + · · · + dn~vn.
(a) Show that 〈~x, ~y〉 = c1d1 + · · · + cndn
(b) Show that ‖~x‖^2 = c1^2 + · · · + cn^2
12. Let V be an inner product space with inner
product 〈 , 〉. Define the distance between two
vectors ~x, ~y ∈ V by
d(~x, ~y) = ‖~x − ~y‖
Prove the following properties of the distance
function.
(a) d(~x, ~y) ≥ 0.
(b) d(~x, ~y) = 0 if and only if ~x = ~y.
(c) d(~x, ~y) = d(~y, ~x).
(d) d(~x,~z) ≤ d(~x, ~y) + d(~y,~z).
(e) d(~x, ~y) = d(~x +~z, ~y +~z).
(f) d(c~x, c~y) = |c|d(~x, ~y).
9.3 The Gram-Schmidt Procedure
We have now seen that orthogonal and orthonormal bases are very nice. However,
this is only useful if we can find such a basis for any inner product space we wish to
use. So, our goal in this section is to not only prove that we can find an orthogonal
basis for any finite dimensional inner product space, but to also find an algorithm
for finding such a basis. Note that once we have an orthogonal basis, it is easy to
normalize each vector to make it orthonormal.
Let W be a k-dimensional subspace of an inner product space V. We will prove that W has an orthogonal basis constructively. That is, we will derive a method for turning any basis {~w1, . . . , ~wk} for W into an orthogonal basis {~v1, . . . ,~vk} for W.
We will do this inductively.
First, consider the case where k = 1. That is, {~w1} is a basis for W. In this case, we can take ~v1 = ~w1 so that {~v1} is an orthogonal basis for W.
Now, consider the case k = 2 where {~w1, ~w2} is a basis for W. Starting as in the case above, we let ~v1 = ~w1. We then need to pick a non-zero vector ~v2 that is orthogonal to ~v1. To see how to find such a vector ~v2, we work backwards. If {~v1, ~v2} is an orthogonal basis for W, then ~w2 ∈ Span{~v1, ~v2}. If this is true, then, from our work with coordinates with respect to an orthogonal basis, we find that
~w2 = (〈~w2, ~v1〉 / ‖~v1‖^2) ~v1 + (〈~w2, ~v2〉 / ‖~v2‖^2) ~v2
where 〈~w2, ~v2〉 / ‖~v2‖^2 ≠ 0 since ~w2 is not a scalar multiple of ~w1 = ~v1. Rearranging gives
(〈~w2, ~v2〉 / ‖~v2‖^2) ~v2 = ~w2 − (〈~w2, ~v1〉 / ‖~v1‖^2) ~v1
Because multiplying by a non-zero scalar does not change orthogonality, this shows that for {~v1, ~v2} to be an orthogonal basis for W, it is necessary that ~v2 be a multiple of ~w2 − (〈~w2, ~v1〉 / ‖~v1‖^2) ~v1. We now need to prove that this is also sufficient.
If we take ~v2 = ~w2 − (〈~w2, ~v1〉 / ‖~v1‖^2) ~v1, then we have
〈~v1, ~v2〉 = 〈~v1, ~w2 − (〈~w2, ~v1〉 / ‖~v1‖^2) ~v1〉
= 〈~v1, ~w2〉 − (〈~w2, ~v1〉 / ‖~v1‖^2)〈~v1, ~v1〉
= 〈~v1, ~w2〉 − (〈~w2, ~v1〉 / ‖~v1‖^2) ‖~v1‖^2
= 0
Consequently, {~v1, ~v2} is an orthogonal set of two non-zero vectors in W and hence is an orthogonal basis for W by Theorem 9.2.3 and Theorem 4.2.3.
For k = 3, we can repeat the same procedure. We start by picking ~v1 and ~v2 as in the case k = 2. We then need to find ~v3 such that {~v1, ~v2, ~v3} is orthogonal. Using the same method as in the previous case, we find that taking
~v3 = ~w3 − (〈~w3, ~v1〉 / ‖~v1‖^2) ~v1 − (〈~w3, ~v2〉 / ‖~v2‖^2) ~v2
will give an orthogonal basis {~v1, ~v2, ~v3} for W.
Continuing this gives the following result.
THEOREM 1 (Gram-Schmidt Orthogonalization Theorem)
Let {~w1, . . . , ~wn} be a basis for an inner product space W. If we define ~v1, . . . ,~vn successively as follows:
~v1 = ~w1
~v2 = ~w2 − (〈~w2, ~v1〉 / ‖~v1‖^2) ~v1
~vi = ~wi − (〈~wi, ~v1〉 / ‖~v1‖^2) ~v1 − (〈~wi, ~v2〉 / ‖~v2‖^2) ~v2 − · · · − (〈~wi, ~vi−1〉 / ‖~vi−1‖^2) ~vi−1
for 3 ≤ i ≤ n, then {~v1, . . . ,~vk} is an orthogonal basis for Span{~w1, . . . , ~wk} for 1 ≤ k ≤ n.
EXERCISE 1 Prove Theorem 1.
Hint: Since the algorithm is recursive, what proof technique should you use? Also, make sure that you fully prove every part of the conclusion of the theorem.
Observe that Theorem 1 implies that every finite dimensional inner product space has an orthogonal basis. We call the process defined in Theorem 1 for finding an orthogonal basis {~v1, . . . ,~vn} for an inner product space W the Gram-Schmidt procedure.
EXAMPLE 1 Use the Gram-Schmidt procedure to transform B = {(1, 1, 1), (1, −1, 1), (1, 1, 0)} = {~w1, ~w2, ~w3} into an orthonormal basis for R3.
Solution: We first take ~v1 = ~w1 = (1, 1, 1). Next, we calculate
~w2 − (〈~w2, ~v1〉 / ‖~v1‖^2) ~v1 = (1, −1, 1) − (1/3)(1, 1, 1) = (2/3, −4/3, 2/3)
We can take any non-zero scalar multiple of this, so we take ~v2 = (1, −2, 1) to simplify the next calculation. Finally,
~v3 = ~w3 − (〈~w3, ~v1〉 / ‖~v1‖^2) ~v1 − (〈~w3, ~v2〉 / ‖~v2‖^2) ~v2
= (1, 1, 0) − (2/3)(1, 1, 1) + (1/6)(1, −2, 1)
= (1/2, 0, −1/2)
By Theorem 1, {~v1, ~v2, ~v3} is an orthogonal basis for R3. To get an orthonormal basis, we normalize each vector. We get
v̂1 = (1/√3)(1, 1, 1)
v̂2 = (1/√6)(1, −2, 1)
v̂3 = (1/√2)(1, 0, −1)
So, {v̂1, v̂2, v̂3} is an orthonormal basis for R3.
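The procedure of Theorem 1 is short to implement for the standard inner product on Rn. The following NumPy sketch (illustrative, not the book's code) reproduces the intermediate vectors of Example 1 before rescaling:

```python
import numpy as np

# Gram-Schmidt for the standard inner product on R^n: subtract from each w
# its projections onto the previously constructed orthogonal vectors.
def gram_schmidt(vectors):
    ortho = []
    for w in vectors:
        v = w - sum((w @ u) / (u @ u) * u for u in ortho)
        ortho.append(v)
    return ortho

W = [np.array([1.0, 1.0, 1.0]),
     np.array([1.0, -1.0, 1.0]),
     np.array([1.0, 1.0, 0.0])]
V = gram_schmidt(W)

# Pairwise orthogonal; v2 and v3 match the unscaled vectors of Example 1.
assert abs(V[0] @ V[1]) < 1e-12 and abs(V[0] @ V[2]) < 1e-12 and abs(V[1] @ V[2]) < 1e-12
assert np.allclose(V[1], [2/3, -4/3, 2/3])
assert np.allclose(V[2], [0.5, 0.0, -0.5])
```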
EXAMPLE 2 Use the Gram-Schmidt procedure to transform B =
1
1
1
,
1
−1
1
,
1
1
0
= {~w1, ~w2, ~w3}
into an orthogonal basis for R3 with inner product
⟨
~x, ~y⟩
= 2x1y1 + x2y2 + 3x3y3
Solution: Take ~v1 =
1
1
1
. Then,
~w2 −⟨
~w2,~v1
⟩
‖~v1‖2~v1 =
1
−1
1
− 4
6
1
1
1
=
1/3−5/31/3
To simplify calculations we take ~v2 =
1
−5
1
. Next,
~v3 = ~w3 − (⟨~w3, ~v1⟩/‖~v1‖²)~v1 − (⟨~w3, ~v2⟩/‖~v2‖²)~v2
    = (1, 1, 0) − (3/6)(1, 1, 1) + (3/30)(1, −5, 1)
    = (3/5, 0, −2/5)

Rescaling this by 5, an orthogonal basis is {(1, 1, 1), (1, −5, 1), (3, 0, −2)}.
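The same recursion works for any inner product once the dot products are replaced. A sketch for Example 2 (assuming NumPy; `gram_schmidt_ip` is our name), writing ⟨~x, ~y⟩ = ~x^T G ~y with G = diag(2, 1, 3):

```python
import numpy as np

# The inner product of Example 2, <x, y> = 2*x1*y1 + x2*y2 + 3*x3*y3,
# written as <x, y> = x^T G y for the diagonal matrix G below.
G = np.diag([2.0, 1.0, 3.0])
inner = lambda x, y: x @ G @ y

def gram_schmidt_ip(vectors, ip):
    """Gram-Schmidt with respect to an arbitrary inner product ip."""
    basis = []
    for w in vectors:
        v = np.asarray(w, dtype=float).copy()
        for u in basis:
            v = v - (ip(w, u) / ip(u, u)) * u
        basis.append(v)
    return basis

B = [np.array([1.0, 1.0, 1.0]), np.array([1.0, -1.0, 1.0]), np.array([1.0, 1.0, 0.0])]
v1, v2, v3 = gram_schmidt_ip(B, inner)
```

The text rescales v2 to (1, −5, 1) and v3 to (3, 0, −2); rescaling preserves orthogonality under any inner product.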
EXAMPLE 3 Find an orthogonal basis of P3(R) with inner product ⟨p, q⟩ = ∫_{−1}^{1} p(x)q(x) dx by applying the Gram-Schmidt procedure to the basis {1, x, x², x³}.

Solution: Take ~v1 = 1. Then

~v2 = x − (⟨x, 1⟩/‖1‖²)1 = x − (0/2)1 = x

~v3 = x² − (⟨x², 1⟩/‖1‖²)1 − (⟨x², x⟩/‖x‖²)x = x² − ((2/3)/2)1 − (0/(2/3))x = x² − 1/3
We instead take ~v3 = 3x² − 1 to make calculations easier. Finally, we get

~v4 = x³ − (⟨x³, 1⟩/‖1‖²)1 − (⟨x³, x⟩/‖x‖²)x − (⟨x³, 3x² − 1⟩/‖3x² − 1‖²)(3x² − 1) = x³ − (3/5)x

Hence, an orthogonal basis for this inner product space is {1, x, 3x² − 1, x³ − (3/5)x}.
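Example 3 can be reproduced symbolically. The sketch below (assuming SymPy is available) runs the same recursion with the integral inner product:

```python
import sympy as sp

x = sp.symbols('x')
# the inner product of Example 3: integrate p*q over [-1, 1]
ip = lambda p, q: sp.integrate(p * q, (x, -1, 1))

orth = []
for w in [sp.Integer(1), x, x**2, x**3]:
    v = w
    for u in orth:
        v -= (ip(w, u) / ip(u, u)) * u  # Gram-Schmidt step
    orth.append(sp.expand(v))
```

Up to the rescalings done by hand in the text, these are the first few Legendre polynomials.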
EXAMPLE 4 Use the Gram-Schmidt procedure to find an orthogonal basis for the subspace W of M2×2(R) spanned by

{~w1, ~w2, ~w3, ~w4} = { [1 1; 0 1], [1 0; 1 1], [2 1; 1 2], [1 0; 0 1] }

under the standard inner product. (2 × 2 matrices are written here as [row 1; row 2].)
Solution: Take ~v1 = ~w1. Next, we get

~w2 − (⟨~w2, ~v1⟩/‖~v1‖²)~v1 = [1 0; 1 1] − (2/3)[1 1; 0 1] = [1/3 −2/3; 1 1/3]

So, we take ~v2 = [1 −2; 3 1]. We then find that
~v3 = ~w3 − (⟨~w3, ~v1⟩/‖~v1‖²)~v1 − (⟨~w3, ~v2⟩/‖~v2‖²)~v2 = [2 1; 1 2] − (5/3)[1 1; 0 1] − (5/15)[1 −2; 3 1] = [0 0; 0 0]
At first glance, we may think something has gone wrong since the zero vector cannot
be a member of the basis. However, if we look closely, we see that this implies that
~w3 is a linear combination of ~w1 and ~w2. Hence, we have
Span{~w1, ~w2, ~w4} = Span{~w1, ~w2, ~w3, ~w4}
Therefore, we can ignore ~w3 and continue the procedure using ~w4.
~v3 = ~w4 − (⟨~w4, ~v1⟩/‖~v1‖²)~v1 − (⟨~w4, ~v2⟩/‖~v2‖²)~v2 = [1 0; 0 1] − (2/3)[1 1; 0 1] − (2/15)[1 −2; 3 1] = [1/5 −2/5; −2/5 1/5]

Hence, an orthogonal basis for W is {~v1, ~v2, ~v3}.
REMARK
It is important to realize that if you change the order in which you use the vectors of a basis in the Gram-Schmidt procedure, you will likely get a different result. The next exercise will demonstrate how big the difference can be. For the end of section problems below, you are expected to use the vectors in the order provided (from left to right).
EXERCISE 2 Use the Gram-Schmidt procedure to find an orthogonal basis for the subspace W of M2×2(R) from Example 4, using the vectors in the order

{~w4, ~w2, ~w1} = { [1 0; 0 1], [1 0; 1 1], [1 1; 0 1] }
QR-Decomposition (optional)
The Gram-Schmidt procedure can be used to generate an extremely useful decompo-
sition of an m × n matrix A.
THEOREM 2 (QR-Decomposition)
Let A = [~a1 · · · ~an] ∈ Mm×n(R) where dim Col(A) = n. Let ~q1, . . . , ~qn denote the vectors that result from applying the Gram-Schmidt procedure to the columns of A (in order) and then normalizing. If we define

Q = [~q1 · · · ~qn]

and

R = [ ~a1 · ~q1   ~a2 · ~q1   · · ·    ~an · ~q1
      0           ~a2 · ~q2   · · ·    ~an · ~q2
      ...                     . . .    ~an · ~qn−1
      0           · · ·       0        ~an · ~qn ]

then the columns of Q are orthonormal (so Q is orthogonal when m = n), R is invertible, and

A = QR
EXERCISE 3 Prove the QR-Decomposition Theorem.
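As a numerical sanity check on Theorem 2, the sketch below (assuming NumPy; `qr_by_gram_schmidt` is our name) builds Q and R exactly as the theorem prescribes and verifies A = QR on the matrix from Problem 9 at the end of this section:

```python
import numpy as np

def qr_by_gram_schmidt(A):
    """QR-decomposition per Theorem 2: Gram-Schmidt on the columns of A
    (then normalizing) gives Q; R holds the dot products a_j . q_i."""
    A = A.astype(float)
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].copy()
        for i in range(j):
            R[i, j] = A[:, j] @ Q[:, i]   # entry a_j . q_i of R
            v -= R[i, j] * Q[:, i]        # Gram-Schmidt subtraction
        R[j, j] = np.linalg.norm(v)       # equals a_j . q_j
        Q[:, j] = v / R[j, j]
    return Q, R

A = np.array([[1, 0, 0], [1, 1, 0], [1, 1, 1]])  # matrix from Problem 9
Q, R = qr_by_gram_schmidt(A)
```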
Section 9.3 Problems
1. Use the Gram-Schmidt procedure to transform B = {(1, 0, −1), (2, 2, 3), (1, 3, 1)} into an orthogonal basis for R3.
2. Use the Gram-Schmidt procedure to transform B = {(−3, 1, −1), (2, −2, 2), (2, 3, −1)} into an orthogonal basis for R3.
3. Use the Gram-Schmidt procedure to transform S = {1, x, x²} into an orthonormal basis for P2(R) under the inner product
⟨p(x), q(x)⟩ = p(−1)q(−1) + p(0)q(0) + p(1)q(1)
4. Use the Gram-Schmidt procedure to transform S = {1, x, x²} into an orthonormal basis for P2(R) under the inner product
⟨p(x), q(x)⟩ = p(0)q(0) + p(1)q(1) + p(2)q(2)
5. Find an orthogonal basis for the subspace S of R4 defined by S = Span{(−2, 2, 2, 2), (−1, 3, 1, 1), (1, 2, −2, 0)}
6. Let B = { [−1 1; 2 2], [1 0; −1 −1], [1 1; 0 0], [1 0; 0 1] }. Find an orthogonal basis for the subspace S = Span B of M2×2(R).
7. Suppose P2(R) has an inner product ⟨ , ⟩ which satisfies the following:
⟨1, 1⟩ = 2    ⟨1, x⟩ = 2    ⟨1, x²⟩ = −2
⟨x, x⟩ = 4    ⟨x, x²⟩ = −2    ⟨x², x²⟩ = 3
Given that B = {x, 2x² + x, 2} is a basis of P2(R), apply the Gram-Schmidt procedure to this basis to find an orthogonal basis of P2(R).
8. Let ~v1 = (1, −1, −1, 1) and ~v2 = (3, 1, 1, −1). Extend the set {~v1, ~v2} to an orthogonal basis for R4 by following the indicated steps.
(a) Use the methods of Chapter 4 to extend the
set {~v1,~v2} to be a basis {~v1,~v2, ~w3, ~w4} for
R4.
(b) Apply the Gram-Schmidt procedure to the
basis you found in (a) to get an orthogonal
basis {~v1,~v2,~v3,~v4} for R4.
9. (optional) Find a QR-Decomposition for
A =
1 0 0
1 1 0
1 1 1
10. (optional) Find a QR-Decomposition for
A =
3 −6
4 −8
0 1
9.4 General Projections
In Chapter 1, we saw how to calculate the projection of a vector onto a plane in R3.
Such projections have many important uses. So, our goal now is to extend projections
to general inner product spaces.
We will do this by mimicking what we did with projections in R3. We first recall
from Section 1.5 that given a vector ~x ∈ R3 and a plane P in R3 which passes through
the origin, we wanted to write ~x as the sum of a vector in P and a vector orthogonal
to every vector in P. Therefore, given a subspace W of an inner product space V and any vector ~v ∈ V, we want to find a vector projW(~v) in W and a vector perpW(~v) which is orthogonal to every vector in W such that

~v = projW(~v) + perpW(~v)

In the case of projecting ~v ∈ R3 onto a plane P, we knew that perpP(~v) was a normal vector of P. In the general case, we need to start by looking at the set of vectors which are orthogonal to every vector in W.
DEFINITION
Orthogonal Complement
Let W be a subspace of an inner product space V. The orthogonal complement W⊥ of W in V is defined by

W⊥ = {~v ∈ V | ⟨~w, ~v⟩ = 0 for all ~w ∈ W}
Instead of having to show a vector ~v ∈ V is orthogonal to every vector ~w ∈ W to prove that ~v ∈ W⊥, the following theorem tells us that we only need to show that ~v is orthogonal to every vector in a spanning set for W.

THEOREM 1 Let {~v1, . . . , ~vk} be a spanning set for a subspace W of an inner product space V, and let ~x ∈ V. We have that ~x ∈ W⊥ if and only if ⟨~x, ~vi⟩ = 0 for 1 ≤ i ≤ k.
Proof: If ~x ∈ W⊥, then by definition ⟨~w, ~x⟩ = 0 for all ~w ∈ W. Thus, since ~vi ∈ W, we have ⟨~x, ~vi⟩ = 0 for 1 ≤ i ≤ k.

On the other hand, let ~x ∈ V such that ⟨~x, ~vi⟩ = 0 for 1 ≤ i ≤ k. Let ~w ∈ W. Since {~v1, . . . , ~vk} spans W, there exist c1, . . . , ck ∈ R such that

~w = c1~v1 + · · · + ck~vk

Hence,

⟨~x, ~w⟩ = ⟨~x, c1~v1 + · · · + ck~vk⟩ = c1⟨~x, ~v1⟩ + · · · + ck⟨~x, ~vk⟩ = 0

Thus, ~x ∈ W⊥. �
EXAMPLE 1 Let W = Span{(1, 0, 0, 1), (1, 1, 0, 0)}. Find W⊥ in R4.

Solution: We want to find all ~v = (v1, v2, v3, v4) such that

0 = ⟨(v1, v2, v3, v4), (1, 0, 0, 1)⟩ = v1 + v4
0 = ⟨(v1, v2, v3, v4), (1, 1, 0, 0)⟩ = v1 + v2

Solving this homogeneous system we find that a vector equation for the solution space is ~v = s(0, 0, 1, 0) + t(−1, 1, 0, 1), s, t ∈ R. Thus,

W⊥ = Span{(0, 0, 1, 0), (−1, 1, 0, 1)}
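By Theorem 1, W⊥ is exactly the nullspace of the matrix whose rows are the spanning vectors of W. A numerical sketch for Example 1 (assuming NumPy; the SVD-based nullspace computation is a standard trick, not the book's method):

```python
import numpy as np

A = np.array([[1.0, 0.0, 0.0, 1.0],   # rows: the spanning vectors of W
              [1.0, 1.0, 0.0, 0.0]])

# Nullspace basis via the SVD: the right singular vectors belonging
# to zero singular values span Null(A) = W-perp.
_, s, Vt = np.linalg.svd(A)
rank = int(np.sum(s > 1e-12))
W_perp_basis = Vt[rank:]              # one basis vector per row
```

The basis computed this way is orthonormal, so it differs from the book's {(0, 0, 1, 0), (−1, 1, 0, 1)}, but it spans the same subspace.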
EXAMPLE 2 Let W = Span{x} in P2(R) with ⟨p(x), q(x)⟩ = ∫_0^1 p(x)q(x) dx. Find W⊥.

Solution: We want to find all p(x) = a + bx + cx² such that ⟨p, x⟩ = 0. This gives

0 = ∫_0^1 (ax + bx² + cx³) dx = (1/2)a + (1/3)b + (1/4)c

So, c = −2a − (4/3)b. Thus, every vector in the orthogonal complement has the form a + bx − (2a + (4/3)b)x² = a(1 − 2x²) + b(x − (4/3)x²). Hence, we get

W⊥ = Span{1 − 2x², 3x − 4x²}
The following theorem gives some important properties of the orthogonal comple-
ment which we will require later.
THEOREM 2 If W is a subspace of an inner product space V, then
(1) W⊥ is a subspace of V
(2) If dim V = n, then dim W⊥ = n − dim W
(3) If dim V = n, then (W⊥)⊥ = W
(4) W ∩ W⊥ = {~0}
(5) If dim V = n, {~v1, . . . , ~vk} is an orthogonal basis for W, and {~vk+1, . . . , ~vn} is an orthogonal basis for W⊥, then {~v1, . . . , ~vk, ~vk+1, . . . , ~vn} is an orthogonal basis for V
Proof: For (1), we apply the Subspace Test. By definition, W⊥ is a subset of V. Also, ~0 ∈ W⊥ since ⟨~0, ~w⟩ = 0 for all ~w ∈ W by Theorem 9.1.1. Let ~u, ~v ∈ W⊥ and t ∈ R. For any ~w ∈ W we have

⟨~u + ~v, ~w⟩ = ⟨~u, ~w⟩ + ⟨~v, ~w⟩ = 0 + 0 = 0

and

⟨t~u, ~w⟩ = t⟨~u, ~w⟩ = t(0) = 0

Thus, ~u + ~v ∈ W⊥ and t~u ∈ W⊥, so W⊥ is a subspace of V.
For (2), let {~v1, . . . , ~vk} be an orthonormal basis for W. We can extend this to a basis for V and then apply the Gram-Schmidt procedure to get an orthonormal basis {~v1, . . . , ~vn} for V. Then, for any ~x ∈ W⊥ we have ~x ∈ V so we can write

~x = ⟨~x, ~v1⟩~v1 + · · · + ⟨~x, ~vk⟩~vk + ⟨~x, ~vk+1⟩~vk+1 + · · · + ⟨~x, ~vn⟩~vn
   = ~0 + · · · + ~0 + ⟨~x, ~vk+1⟩~vk+1 + · · · + ⟨~x, ~vn⟩~vn

Hence, W⊥ = Span{~vk+1, . . . , ~vn}. Moreover, B = {~vk+1, . . . , ~vn} is linearly independent since it is orthonormal. Thus, B is a basis for W⊥ and hence dim W⊥ = n − k = n − dim W.
For (3), if ~w ∈ W, then ⟨~w, ~v⟩ = 0 for all ~v ∈ W⊥ by definition of W⊥. Hence, ~w ∈ (W⊥)⊥. So, W ⊆ (W⊥)⊥. Also, by (2), dim (W⊥)⊥ = n − dim W⊥ = n − (n − dim W) = dim W. Therefore, W = (W⊥)⊥.
For (4), if ~x ∈ W ∩ W⊥, then ⟨~x, ~x⟩ = 0 and hence ~x = ~0. Therefore, W ∩ W⊥ = {~0}.

For (5), we have that {~v1, . . . , ~vk, ~vk+1, . . . , ~vn} is an orthogonal set of non-zero vectors in V since every vector in W⊥ is orthogonal to every vector in W. Moreover, it contains n vectors by (2). Hence, it is a linearly independent set of n vectors. Thus, by Theorem 4.2.3, it is an orthogonal basis for V. �
Property (3) may seem obvious at first glance, which may make us wonder why the condition that dim V = n was necessary. This is because this property may not be true in an infinite dimensional inner product space! That is, we can have (W⊥)⊥ ≠ W. We demonstrate this with an example.
EXAMPLE 3 Let P be the inner product space of all real polynomials with inner product

⟨a0 + a1x + · · · + akx^k, b0 + b1x + · · · + blx^l⟩ = a0b0 + a1b1 + · · · + apbp

where p = min(k, l). Now, let U be the subspace

U = {g(x) ∈ P | g(1) = 0}

We claim that U⊥ = {0}. Let f(x) ∈ U⊥, say f(x) = a0 + a1x + · · · + anx^n. Note that ⟨f, x^{n+1}⟩ = 0. Define g(x) = f(x) − f(1)x^{n+1}. We get that g(1) = 0 and hence g ∈ U. Therefore, we have

0 = ⟨f, g⟩ = ⟨f, f − f(1)x^{n+1}⟩ = ⟨f, f⟩ − f(1)⟨f, x^{n+1}⟩ = ⟨f, f⟩

and hence f(x) = 0.

Of course (U⊥)⊥ = P, since ⟨0, p(x)⟩ = 0 for every p(x) ∈ P. Therefore, (U⊥)⊥ ≠ U.
REMARK
Notice that, as in the proof of property (3) above, we do have that U ⊂ (U⊥)⊥.
We now return to our purpose of looking at orthogonal complements: to define the projection of a vector ~v onto a subspace W of an inner product space V. We stated that we want to find projW(~v) and perpW(~v) such that ~v = projW(~v) + perpW(~v) with projW(~v) ∈ W and perpW(~v) ∈ W⊥.
Suppose that dim V = n, {~v1, . . . , ~vk} is an orthogonal basis for W, and {~vk+1, . . . , ~vn} is an orthogonal basis for W⊥. By Theorem 2, we have that {~v1, . . . , ~vk, ~vk+1, . . . , ~vn} is an orthogonal basis for V. So, for any ~v ∈ V we get

~v = (⟨~v, ~v1⟩/‖~v1‖²)~v1 + · · · + (⟨~v, ~vk⟩/‖~vk‖²)~vk + (⟨~v, ~vk+1⟩/‖~vk+1‖²)~vk+1 + · · · + (⟨~v, ~vn⟩/‖~vn‖²)~vn

This gives us what we want! Taking

projW(~v) = (⟨~v, ~v1⟩/‖~v1‖²)~v1 + · · · + (⟨~v, ~vk⟩/‖~vk‖²)~vk ∈ W

and

perpW(~v) = (⟨~v, ~vk+1⟩/‖~vk+1‖²)~vk+1 + · · · + (⟨~v, ~vn⟩/‖~vn‖²)~vn ∈ W⊥

we get ~v = projW(~v) + perpW(~v) with projW(~v) ∈ W and perpW(~v) ∈ W⊥.
At first glance it looks like these choices depend on the orthogonal basis for V being
used. However, it is easy to show that the projection and perpendicular are in fact
unique (independent of basis).
DEFINITION
Projection
Perpendicular
Suppose W is a k-dimensional subspace of an inner product space V and {~v1, . . . , ~vk} is an orthogonal basis for W. For any ~v ∈ V we define the projection of ~v onto W by

projW(~v) = (⟨~v, ~v1⟩/‖~v1‖²)~v1 + · · · + (⟨~v, ~vk⟩/‖~vk‖²)~vk

and the perpendicular of the projection by

perpW(~v) = ~v − projW(~v)
It is very important to notice that we require an orthogonal basis for W to calculate the projection. For this reason, these are often called orthogonal projections. To save ourselves some work, we have defined the perpendicular of the projection in such a way that we do not need an orthogonal basis (or any basis) for W⊥. To ensure this definition is valid, we need to verify that perpW(~v) ∈ W⊥.
THEOREM 3 If W is a k-dimensional subspace of an inner product space V, then for any ~v ∈ V we have

perpW(~v) = ~v − projW(~v) ∈ W⊥

Proof: Observe that

perpW(~v) = ~v − projW(~v) = ~v − (⟨~v, ~v1⟩/‖~v1‖²)~v1 − · · · − (⟨~v, ~vk⟩/‖~vk‖²)~vk

So, by the Gram-Schmidt Orthogonalization Theorem, {~v1, . . . , ~vk, perpW(~v)} is an orthogonal set. That is, perpW(~v) is orthogonal to ~v1, . . . , ~vk. Thus, perpW(~v) ∈ W⊥ by Theorem 1. �
Observe that this implies that projW(~v) and perpW(~v) are orthogonal for any ~v ∈ V as
we saw in Chapter 1. Additionally, these functions satisfy other properties which we
saw for projections in Chapter 1.
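In Rn with the dot product, the definitions of projW and perpW translate directly into code. A minimal sketch (assuming NumPy; the names `proj` and `perp` and the sample subspace below are ours, chosen for illustration):

```python
import numpy as np

def proj(v, orth_basis):
    """proj_W(v) for an ORTHOGONAL basis of W, per the definition above."""
    return sum((np.dot(v, u) / np.dot(u, u)) * u for u in orth_basis)

def perp(v, orth_basis):
    """perp_W(v) = v - proj_W(v)."""
    return v - proj(v, orth_basis)

# hypothetical example: W = Span{(1,1,1), (1,-2,1)} in R^3 (an orthogonal pair)
basis = [np.array([1.0, 1.0, 1.0]), np.array([1.0, -2.0, 1.0])]
v = np.array([3.0, 1.0, 2.0])
p, q = proj(v, basis), perp(v, basis)
```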
THEOREM 4 If W is a k-dimensional subspace of an inner product space V, then projW is a linear operator on V with kernel W⊥.

THEOREM 5 If W is a subspace of a finite dimensional inner product space V, then for any ~v ∈ V we have

projW⊥(~v) = perpW(~v)
EXAMPLE 4 Let W = Span{ [1 0; 0 1], [1 0; 0 −1] } be a subspace of M2×2(R). Find projW([1 −1; 2 3]).

Solution: Denote the vectors by ~v1 = [1 0; 0 1], ~v2 = [1 0; 0 −1], and ~x = [1 −1; 2 3]. Since {~v1, ~v2} is an orthogonal basis for W we get

projW(~x) = (⟨~x, ~v1⟩/‖~v1‖²)~v1 + (⟨~x, ~v2⟩/‖~v2‖²)~v2 = (4/2)[1 0; 0 1] + (−2/2)[1 0; 0 −1] = [1 0; 0 3]
EXAMPLE 5 Let B = {(1, 0, −1, 1), (1, 1, 0, 1), (−1, 1, 0, −1)} = {~b1, ~b2, ~b3} and let ~x = (4, 3, −2, 5). Find the projection of ~x onto the subspace S = Span B of R4.

Solution: To find the projection, we must have an orthogonal basis for S. Thus, our first step must be to perform the Gram-Schmidt procedure on B. Let ~w1 = ~b1. Next, we get

~b2 − (⟨~b2, ~w1⟩/‖~w1‖²)~w1 = (1, 1, 0, 1) − (2/3)(1, 0, −1, 1) = (1/3)(1, 3, 2, 1)

To simplify calculations we use ~w2 = (1, 3, 2, 1) instead. Then, we get

~b3 − (⟨~b3, ~w1⟩/‖~w1‖²)~w1 − (⟨~b3, ~w2⟩/‖~w2‖²)~w2 = (−1, 1, 0, −1) + (2/3)(1, 0, −1, 1) − (1/15)(1, 3, 2, 1) = (2/5)(−1, 2, −2, −1)

We take ~w3 = (−1, 2, −2, −1). Thus, the set {~w1, ~w2, ~w3} is an orthogonal basis for S. We can now determine the projection.

projS(~x) = (⟨~x, ~w1⟩/‖~w1‖²)~w1 + (⟨~x, ~w2⟩/‖~w2‖²)~w2 + (⟨~x, ~w3⟩/‖~w3‖²)~w3 = (11/3)~w1 + (14/15)~w2 + (1/10)~w3 = (9/2, 3, −2, 9/2)
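The two-stage computation of Example 5 (Gram-Schmidt first, then the projection) can be verified numerically (assuming NumPy):

```python
import numpy as np

b1, b2, b3 = (np.array([1.0, 0.0, -1.0, 1.0]),
              np.array([1.0, 1.0, 0.0, 1.0]),
              np.array([-1.0, 1.0, 0.0, -1.0]))
x = np.array([4.0, 3.0, -2.0, 5.0])

# Gram-Schmidt on {b1, b2, b3}, as in the solution above
w1 = b1
w2 = b2 - (b2 @ w1 / (w1 @ w1)) * w1
w3 = b3 - (b3 @ w1 / (w1 @ w1)) * w1 - (b3 @ w2 / (w2 @ w2)) * w2

# projection of x onto S using the orthogonal basis {w1, w2, w3}
proj_x = sum((x @ w / (w @ w)) * w for w in (w1, w2, w3))
```

Here `w2` is left unscaled as (1/3, 1, 2/3, 1/3) instead of the book's rescaled (1, 3, 2, 1); the projection is the same either way.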
Notice that if {~v1, . . . , ~vk} is an orthonormal basis for W, then the formula for calculating a projection simplifies to

projW(~v) = ⟨~v, ~v1⟩~v1 + · · · + ⟨~v, ~vk⟩~vk
EXAMPLE 6 Let W = Span{(0, 1, 0), (−4/5, 0, 3/5)} be a subspace of R3. Find projW((1, 1, 1)) and perpW((1, 1, 1)).

Solution: Observe that {(0, 1, 0), (−4/5, 0, 3/5)} is an orthonormal basis for W. Hence,

projW((1, 1, 1)) = ⟨(1, 1, 1), (0, 1, 0)⟩(0, 1, 0) + ⟨(1, 1, 1), (−4/5, 0, 3/5)⟩(−4/5, 0, 3/5)
                 = (0, 1, 0) + (−1/5)(−4/5, 0, 3/5)
                 = (4/25, 1, −3/25)

perpW((1, 1, 1)) = (1, 1, 1) − (4/25, 1, −3/25) = (21/25, 0, 28/25)
EXAMPLE 7 On C[−π, π] under the inner product ⟨f(x), g(x)⟩ = ∫_{−π}^{π} f(x)g(x) dx, the set B = {1/√(2π), (sin x)/√π, (cos x)/√π} is an orthonormal basis for the subspace S = Span B (see Example 9.2.18). Let f(x) = |x|. Find projS(f).

Solution: We have

⟨|x|, 1⟩ = ∫_{−π}^{π} |x| dx = π²

⟨|x|, sin x⟩ = ∫_{−π}^{π} |x| sin x dx = 0

⟨|x|, cos x⟩ = ∫_{−π}^{π} |x| cos x dx = −4

Hence,

projS(f) = ⟨|x|, 1/√(2π)⟩ (1/√(2π)) + ⟨|x|, (sin x)/√π⟩ (sin x)/√π + ⟨|x|, (cos x)/√π⟩ (cos x)/√π
         = (1/(2π))⟨|x|, 1⟩ 1 + (1/π)⟨|x|, sin x⟩ sin x + (1/π)⟨|x|, cos x⟩ cos x
         = π/2 − (4/π) cos x

This is called the first degree Fourier polynomial of f.
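The three inner product integrals above can be approximated numerically (assuming NumPy; the midpoint-rule quadrature here is our own crude substitute for exact integration):

```python
import numpy as np

# midpoint-rule quadrature on [-pi, pi]
N = 400000
dt = 2 * np.pi / N
t = -np.pi + (np.arange(N) + 0.5) * dt
ip = lambda F, G: np.sum(F(t) * G(t)) * dt

c1 = ip(np.abs, lambda s: np.ones_like(s))  # <|x|, 1>,     exactly pi^2
c2 = ip(np.abs, np.sin)                     # <|x|, sin x>, exactly 0
c3 = ip(np.abs, np.cos)                     # <|x|, cos x>, exactly -4
```

Plugging these values into the formula for projS(f) recovers the first degree Fourier polynomial computed above.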
Section 9.4 Problems
1. Find a basis for the orthogonal complement of
each subspace.
(a) S1 = Span{(3, 2, 1), (1, −2, 3)} in R3.
(b) S2 = Span{(1, −2, 0, 1)} in R4.
(c) S3 = {(x1, 0, x1) | x1 ∈ R} in R3.
(d) S4 = Span{ [1 0; 1 0], [2 −1; 1 3] } in M2×2(R).
(e) S5 = Span{ [2 1; 1 1], [−1 1; 3 1] } in M2×2(R).
(f) S6 = { [a b; c d] | a + b + d = 0 } in M2×2(R).
2. Find an orthogonal basis for the orthogonal
complement of each subspace of P2(R) under
the inner product
〈p(x), q(x)〉 = p(−1)q(−1)+p(0)q(0)+p(1)q(1)
(a) T1 = Span{x² + 1}
(b) T2 = Span{x²}
3. Let ~w1 = (1, 2, 0, 1), ~w2 = (2, 1, 1, 2), ~w3 = (2, 1, 3, 2), and let S = Span{~w1, ~w2, ~w3}.
(a) Find an orthonormal basis for S.
(b) Let ~y = (0, 0, 0, 12). Determine projS(~y).
(c) Let ~z = (1, 0, 0, 2). Determine perpS(~z).
4. In M2×2(R), consider the subspace

S = Span{ [1 0; −1 1], [1 1; 1 1], [2 0; 1 1] }

(a) Find an orthogonal basis for S.
(b) Let A = [4 3; −2 5]. Calculate projS(A) and perpS(A).
(c) Let B = [0 2; 1 1]. Calculate projS(B) and perpS(B).
(d) Without doing any further work, deter-
mine whether A and/or B are in S. Explain.
5. On P2(R) define the inner product
⟨p, q⟩ = p(−1)q(−1) + p(0)q(0) + p(1)q(1)
and let S = Span{1, x − x²}.
(a) Find an orthogonal basis for S.
(b) Determine projS(1 + x + x²) and perpS(1 + x + x²).
(c) Determine projS(2 + 2x + 2x²) and perpS(2 + 2x + 2x²).
6. Suppose W is a k-dimensional subspace of an
inner product space V. Prove that projW is a
linear operator on V with kernelW⊥.
7. Let W be a finite-dimensional subspace of an
inner product space V and let ~v ∈ V. Prove that
projW(~v) and perpW(~v) are unique.
8. In P3(R) using the inner product
⟨p(x), q(x)⟩ = p(−1)q(−1) + p(0)q(0) + p(1)q(1) + p(2)q(2)
the following four vectors are orthogonal:
p1(x) = 9 − 13x − 15x² + 10x³
p2(x) = 1 + x − x²
p3(x) = 1 − 2x
p4(x) = 1
Let S = Span{p1(x), p2(x), p3(x)}. Determine the projection of −1 + x + x² − x³ onto S. [HINT: If you do an excessive number of calculations, you have missed the point of the question.]
9.5 The Fundamental Theorem
Let W be a subspace of a finite dimensional inner product space V. In the last section we saw that any ~v ∈ V can be written as ~v = ~x + ~y where ~x ∈ W and ~y ∈ W⊥. We now invent some notation for this.
DEFINITION
Direct Sum
Let U and W be subspaces of a vector space V such that U ∩ W = {~0}. The direct sum of U and W is

U ⊕ W = {~u + ~w ∈ V | ~u ∈ U, ~w ∈ W}
EXAMPLE 1 Let U = Span{(1, 0, 1)} and W = Span{(1, 1, 1)}. What is U ⊕ W?

Solution: Since U ∩ W = {~0} we have

U ⊕ W = {~u + ~w ∈ V | ~u ∈ U, ~w ∈ W} = { s(1, 0, 1) + t(1, 1, 1) | s, t ∈ R } = Span{(1, 0, 1), (1, 1, 1)}
THEOREM 1 If U andW are subspaces of a vector space V such that U ∩W = {~0}, then U ⊕W is
a subspace of V. Moreover, if {~v1, . . . ,~vk} is a basis for U and {~w1, . . . , ~wℓ} is a basis
forW, then {~v1, . . . ,~vk, ~w1, . . . , ~wℓ} is a basis for U ⊕W.
COROLLARY 2 If U and W are subspaces of a vector space V such that U ∩ W = {~0} and ~v ∈ U ⊕ W, then there exist unique ~u ∈ U and ~w ∈ W such that ~v = ~u + ~w.
EXAMPLE 2 Let U = Span{x2 + 1, x4} andW = Span{x3 − x, x3 + x2 + 1}. Find a basis for U ⊕W.
Solution: Since U ∩W = {~0} a basis for U ⊕W is {x2 + 1, x4, x3 − x, x3 + x2 + 1}.
Theorem 9.4.2 tells us that for any subspace W of an n-dimensional inner product space V we have W ∩ W⊥ = {~0} and dim W⊥ = n − dim W. Combining this with Theorem 1 we get the following result.
THEOREM 3 If V is a finite dimensional inner product space andW is a subspace of V, then
W ⊕W⊥ = V
We now look at the related result for matrices. We start by looking at an example.
EXAMPLE 3 Find a basis for each of the four fundamental subspaces of A =
1 2 0 −1
3 6 1 −1
−2 −4 2 6
and
then determine the orthogonal complement of the row space of A and the orthogonal
complement of the column space of A.
Solution: To find a basis for each of the four fundamental subspaces, we use the
method derived in Section 7.1. Row reducing A to reduced row echelon form gives
R =
1 2 0 −1
0 0 1 2
0 0 0 0
Hence, a basis for Row(A) is {(1, 2, 0, −1), (0, 0, 1, 2)}, a basis for Null(A) is {(−2, 1, 0, 0), (1, 0, −2, 1)}, and a basis for Col(A) is {(1, 3, −2), (0, 1, 2)}.

Row reducing AT gives

1 0 −8
0 1 2
0 0 0
0 0 0

Thus, a basis for the left nullspace is {(8, −2, 1)}.
Observe that the basis vectors for Null(A) are orthogonal to the basis vectors for
Row(A). Additionally, we know that dim(Row(A))⊥ = 4 − 2 = 2 by Theorem 9.4.2
(2). Therefore, since dim(Null(A)) = 2 we have that (Row(A))⊥ = Null(A).
Similarly, we see that (Col(A))⊥ = Null(AT ).
The result of this example seems pretty amazing. Moreover, applying parts (4) and
(5) of Theorem 9.4.2 and our notation for direct sums, we get that
Rn = Row(A) ⊕ Null(A) and Rm = Col(A) ⊕ Null(AT )
This tells us that every vector in R4 is the sum of a vector in the row space of A and
a vector in the nullspace of A.
The generalization of this result to any m × n matrix is extremely important.
THEOREM 4 (The Fundamental Theorem of Linear Algebra)
If A is an m × n matrix, then Col(A)⊥ = Null(AT ) and Row(A)⊥ = Null(A). In
particular,
Rn = Row(A) ⊕ Null(A) and Rm = Col(A) ⊕ Null(AT )
Proof: Let A have rows ~v1^T, . . . , ~vm^T.

If ~x ∈ Row(A)⊥, then ~x is orthogonal to each column of AT. Hence, ~vi · ~x = 0 for 1 ≤ i ≤ m. Thus, we get

A~x = (~v1^T ~x, . . . , ~vm^T ~x) = (~v1 · ~x, . . . , ~vm · ~x) = ~0
Hence, ~x ∈ Null(A) and so Row(A)⊥ ⊆ Null(A).
On the other hand, let ~x ∈ Null(A), then A~x = ~0 so ~vi · ~x = 0 for 1 ≤ i ≤ m. Now, pick
any ~b ∈ Row(A). Then ~b = c1~v1 + · · · + cm~vm for some c1, . . . , cm ∈ R. Therefore,
~b · ~x = (c1~v1 + · · · + cm~vm) · ~x = c1(~v1 · ~x) + · · · + cm(~vm · ~x) = 0
Hence, ~x ∈ Row(A)⊥ and so Null(A) ⊆ Row(A)⊥. Thus, Row(A)⊥ = Null(A).
Applying what we proved above to AT we get
(Col(A))⊥ = (Row(AT ))⊥ = Null(AT )
�
Observe that the Fundamental Theorem of Linear Algebra tells us more than the relationship between the four fundamental subspaces. For example, if A is an m × n matrix, then we know that the rank and nullity of the linear mapping L(~x) = A~x are rank L = rank A = dim Row(A) and nullity L = dim Null(A). Since Rn = Row(A) ⊕ Null(A), we get by Theorem 9.4.2 (2) that

n = rank L + nullity L

That is, the Fundamental Theorem of Linear Algebra implies the Rank-Nullity Theorem.
It can also be shown that the Fundamental Theorem of Linear Algebra implies a lot
of our results about solving systems of linear equations. We will also find it useful in
a couple of proofs in future sections.
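The Fundamental Theorem can also be checked numerically on the matrix of Example 3 (assuming NumPy; `null_space` is our helper, built on the SVD):

```python
import numpy as np

def null_space(M, tol=1e-12):
    """Rows of the result form an orthonormal basis for Null(M)."""
    _, s, Vt = np.linalg.svd(M)
    rank = int(np.sum(s > tol))
    return Vt[rank:]

A = np.array([[1.0, 2.0, 0.0, -1.0],
              [3.0, 6.0, 1.0, -1.0],
              [-2.0, -4.0, 2.0, 6.0]])   # the matrix from Example 3

N_A = null_space(A)      # Null(A)   = Row(A)-perp
N_AT = null_space(A.T)   # Null(A^T) = Col(A)-perp
```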
Section 9.5 Problems
1. Let A =
1 1 3 1
2 2 6 2
−1 0 −2 −3
3 1 7 7
. Find a basis for
Row(A)⊥.
2. Prove that if U andW are subspaces of a vector
space V such that U∩W = {~0}, then U⊕W is a
subspace of V. Moreover, if {~v1, . . . ,~vk} is a ba-
sis for U and {~w1, · · · , ~wℓ} is a basis forW, then
{~v1, . . . ,~vk, ~w1, . . . , ~wℓ} is a basis for U ⊕W.
9.6 The Method of Least Squares
In the sciences one often tries to find a correlation between two quantities by collect-
ing data from repeated experimentation. Say, for example, a scientist is comparing
quantities x and y which are known to satisfy a quadratic relation y = a0 + a1x + a2x².
The scientist would like to find appropriate values for a0, a1, and a2.
To do this, the scientist performs some experiments involving the two quantities.
Every time they do the experiment they will get quantities yi and xi. Putting this into their equation, they get a linear equation

yi = a0 + a1xi + a2xi²

in the three unknowns a0, a1, and a2. To get as accurate a result as possible, the scientist will perform the experiment many times. We thus end up with a system of linear equations

y1 = a0 + a1x1 + a2x1²
...
ym = a0 + a1xm + a2xm²
which has more equations than unknowns. Such a system of equations is called an overdetermined system. Also notice that, due to experimentation error, the system is very likely to be inconsistent. So, the scientist needs to find the values of a0, a1, and a2 which best approximate the data they collected.
To solve this problem, we rephrase it in terms of linear algebra. Let A be an m × n
matrix with m > n and let the system A~x = ~b be inconsistent. We want to find a
vector ~x that minimizes the distance between A~x and ~b. Hence, we need to minimize
‖~b − A~x‖. The following theorem tells us how to do this.
THEOREM 1 (Approximation Theorem)
Let W be a finite dimensional subspace of an inner product space V. If ~v ∈ V, then the vector closest to ~v in W is projW(~v). That is,

‖~v − projW(~v)‖ < ‖~v − ~w‖

for all ~w ∈ W, ~w ≠ projW(~v).
Proof: Consider ~v − ~w = (~v − projW(~v)) + (projW(~v) − ~w). Now, observe that {~v − projW(~v), projW(~v) − ~w} is an orthogonal set since ~v − projW(~v) = perpW(~v) ∈ W⊥ and projW(~v) − ~w ∈ W. Therefore, by Theorem 9.2.2, we get

‖~v − ~w‖² = ‖(~v − projW(~v)) + (projW(~v) − ~w)‖²
          = ‖~v − projW(~v)‖² + ‖projW(~v) − ~w‖²
          > ‖~v − projW(~v)‖²

since ‖projW(~v) − ~w‖² > 0 if ~w ≠ projW(~v). The result now follows. �
Notice that A~x is in the column space of A. Thus, the Approximation Theorem tells
us that we can minimize ‖~b − A~x ‖ by finding the projection of ~b onto the column
space of A. Therefore, if we solve the consistent system
A~x = projCol A(~b)
we will find the desired vector ~x. This might seem quite simple, but we can make it
even easier. Subtracting both sides of this equation from ~b we get
~b − A~x = ~b − projCol A(~b) = perpCol A(~b)
Thus, ~b − A~x is in the orthogonal complement of the column space of A which, by
the Fundamental Theorem of Linear Algebra, means ~b−A~x is in the nullspace of AT .
Hence
AT (~b − A~x) = ~0
or equivalently
AT A~x = AT~b
This is called the normal system and the individual equations are called the normal
equations. This system will be consistent by construction. However, it need not have
a unique solution. If it does have infinitely many solutions, then each of the solutions
will minimize ‖~b − A~x‖.
EXAMPLE 1 Determine the vector ~x that minimizes ‖~b − A~x‖ for the system
3x1 − x2 = 4
x1 + 2x2 = 0
2x1 + x2 = 1
Solution: We have A = [3 −1; 1 2; 2 1] (rows (3, −1), (1, 2), (2, 1)) and ~b = (4, 0, 1), so

AT A = [14 1; 1 6]    and    AT~b = (14, −3)

Since AT A is invertible, we find that

~x = (AT A)−1 AT~b = (1/83)[6 −1; −1 14](14, −3) = (1/83)(87, −56)

Thus, x1 = 87/83 and x2 = −56/83.

Note that these give 3x1 − x2 ≈ 3.82, x1 + 2x2 ≈ −0.30, and 2x1 + x2 ≈ 1.42, so we have found a fairly good approximation.
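The arithmetic of Example 1 is easy to confirm (assuming NumPy); `np.linalg.lstsq` minimizes the same quantity directly:

```python
import numpy as np

A = np.array([[3.0, -1.0],
              [1.0, 2.0],
              [2.0, 1.0]])
b = np.array([4.0, 0.0, 1.0])

# solve the normal system  A^T A x = A^T b ...
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# ... and compare with NumPy's built-in least-squares solver
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
```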
EXAMPLE 2 Determine the vector ~x that minimizes ‖~b − A~x‖ for the system
2x1 + x2 = −5
−2x1 + x2 = 8
2x1 + 3x2 = 1
Solution: We have A = [2 1; −2 1; 2 3] (rows (2, 1), (−2, 1), (2, 3)), so

AT A = [12 6; 6 11],    AT~b = (−24, 6)

Since AT A is invertible, we find that

~x = (AT A)−1 AT~b = (1/96)[11 −6; −6 12](−24, 6) = (1/96)(−300, 216)

Thus, x1 = −25/8 and x2 = 9/4.
This method of finding an approximate solution is called the method of least squares.
It is called this because, writing ~v = ~b − A~x with components v1, . . . , vm, we are minimizing

‖~b − A~x‖ = √(v1² + · · · + vm²)

which is equivalent to minimizing v1² + · · · + vm².
We now return to our problem of finding the curve of best fit for a set of data points.
Say in an experiment we get a set of data points (x1, y1), . . . , (xm, ym) and we want to
find the values of a0, . . . , an such that p(x) = a0 + a1x + · · · + anxn is the polynomial
of best fit. That is, we want the values of a0, . . . , an such that the values of yi are
approximated as well as possible by p(xi).
Let ~y = (y1, . . . , ym) and define p(~x) = (p(x1), . . . , p(xm)). To make this look like the method of least squares we write

p(~x) = (p(x1), . . . , p(xm)) = (a0 + a1x1 + · · · + anx1^n, . . . , a0 + a1xm + · · · + anxm^n) = X~a

where X is the m × (n + 1) matrix with i-th row (1, xi, . . . , xi^n) and ~a = (a0, . . . , an). Thus, we are trying to minimize ‖X~a − ~y‖ and we can use the method of least squares above.
This can be summarized in the following theorem.
THEOREM 2 Let m data points (x1, y1), . . . , (xm, ym) be given and write
~y = (y1, . . . , ym),    X = [ 1 x1 x1² · · · x1^n
                                1 x2 x2² · · · x2^n
                                ...
                                1 xm xm² · · · xm^n ]

If ~a = (a0, . . . , an) is any solution to the normal system

XT X~a = XT~y

then the polynomial

p(x) = a0 + a1x + · · · + anx^n

is the best fitting polynomial of degree n for the given data. Moreover, if at least n + 1 of the numbers x1, . . . , xm are distinct, then the matrix XT X is invertible and thus ~a is unique with

~a = (XT X)−1 XT~y
Proof: We proved the first part above, so we just need to prove the second part.
Suppose that n + 1 of the xi are distinct and consider a linear combination of the columns of X:

c0(1, 1, . . . , 1) + c1(x1, x2, . . . , xm) + · · · + cn(x1^n, x2^n, . . . , xm^n) = (0, 0, . . . , 0)

Let q(x) = c0 + c1x + · · · + cnx^n. Comparing entries, we see that x1, x2, . . . , xm are all roots of q(x). Thus, q(x) is a polynomial of degree at most n with at least n + 1
distinct roots. Consequently, by the Fundamental Theorem of Algebra, q(x) must be
the zero polynomial, and so ci = 0 for 0 ≤ i ≤ n. Thus, the columns of X are linearly
independent.
To show this implies that XT X is invertible we consider XT X~v = ~0. We have
‖X~v ‖2 = (X~v)T X~v = ~vT XT X~v = ~vT~0 = 0
Hence, X~v = ~0, so ~v = ~0 since the columns of X are linearly independent. Thus, XT X
is invertible by the Invertible Matrix Theorem. �
EXAMPLE 3 Find a0 and a1 to obtain the best fitting equation of the form y = a0 + a1x for the given data.

x | 1 3 4 6 7
y | 1 2 3 4 5

Solution: We let X = [1 1; 1 3; 1 4; 1 6; 1 7] (rows (1, xi)) and ~y = (1, 2, 3, 4, 5). We then get

XT X = [5 21; 21 111],    XT~y = (15, 78)

By Theorem 2, we get that ~a = (a0, a1) is unique and satisfies

~a = (XT X)−1 XT~y = (1/114)[111 −21; −21 5](15, 78) = (1/38)(9, 25)

So, the line of best fit is y = 9/38 + (25/38)x.
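Example 3 in code (assuming NumPy), using the design matrix X of Theorem 2:

```python
import numpy as np

x = np.array([1.0, 3.0, 4.0, 6.0, 7.0])
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

X = np.column_stack([np.ones_like(x), x])   # rows (1, x_i)
a = np.linalg.solve(X.T @ X, X.T @ y)       # normal system of Theorem 2
```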
EXAMPLE 4 Find a0, a1, a2 to obtain the best fitting equation of the form y = a0 + a1x + a2x² for the given data.

x | −3 −1 0 1 3
y | 3 1 1 2 4

Solution: We have ~y = (3, 1, 1, 2, 4) and X = [1 −3 9; 1 −1 1; 1 0 0; 1 1 1; 1 3 9] (rows (1, xi, xi²)). Hence,

XT X = [5 0 20; 0 20 0; 20 0 164],    XT~y = (11, 4, 66)

By Theorem 2, we get that ~a = (a0, a1, a2) is unique and satisfies

~a = (XT X)−1 XT~y = (1/210)(242, 42, 55)

So the best fitting parabola is

p(x) = 121/105 + (1/5)x + (11/42)x²
Section 9.6 Problems
1. Find the vector ~x that minimizes ‖A~x − ~b‖ for
each of the following systems.
(a) x1 + 2x2 = 1
x1 + x2 = 2
x1 − x2 = 5
(b) x1 + 5x2 = 1
−2x1 − 7x2 = 1
x1 + 2x2 = 0
(c) 2x1 + x2 = 4
2x1 − x2 = −1
3x1 + 2x2 = 8
(d) x1 + 2x2 = 2
x1 + 2x2 = 3
x1 + 3x2 = 2
x1 + 3x2 = 3
(e) x1 − x2 = 2
2x1 − x2 + x3 = −2
2x1 + 2x2 + x3 = 3
x1 − x2 = −2
(f) 2x1 − 2x2 = 1
−x1 + x2 = 2
3x1 + x2 = 1
2x1 − x2 = 2
2. Let ~x = (1, 2, 1), ~y = (3, 1, −2), ~z = (1, −1, 1), and consider the plane P in R3 with scalar equation 2x1 + x2 − x3 = 0.
(a) Find the vector in P that is closest to ~x.
(b) Find the vector in P that is closest to ~y.
(c) Find the vector in P that is closest to ~z.
(d) Without doing any further calculations,
which vector ~x, ~y, or ~z is furthest from P.
Explain.
3. Find a0 and a1 to obtain the best fitting equation
of the form y = a0 + a1x for the given data.
(a) x | −1 0 1
    y | −3 2 2
(b) x | −1 0 1 2
    y | −3 −1 0 1
4. Find a0, a1, a2 to obtain the best fitting equation
of the form y = a0 + a1x + a2x2 for the given
data.
(a) x | −2 −1 1 2
    y | 0 −2 0 1
(b) x | −2 −1 0 1 2
    y | 0 1 −1 3 1
5. Find a1 and a2 to obtain the best fitting equation of the form y = a1x + a2x² for the given data.
x | −1 0 1
y | 4 1 1
6. (optional)
(a) Let A ∈ Mm×n(R) with rank(A) = n. Let
A = QR be a QR-Decomposition (see page
260) of A. Show that the normal equa-
tions AT A~x = AT~b can be rewritten as
R~x = QT~b.
(b) Let A = [3 −6; 4 −8; 0 1] and ~b = (−1, 7, 2). Solve the normal system AT A~x = AT~b by using the QR-Decomposition found in Problem 9.3.10.
Chapter 10
Applications of Orthogonal Matrices
10.1 Orthogonal Similarity
In Chapter 6, we saw that two matrices A and B are similar if there exists an invertible
matrix P such that P−1AP = B. Moreover, we saw that this corresponds to calculating
the matrix of a linear operator with respect to a basis. In particular, if
L : Rn → Rn is a linear operator and B = {~v1, . . . , ~vn} is a basis for Rn, then taking P = [~v1 · · · ~vn] we get

[L]B = P−1[L]P
We then looked for a basis B of Rn such that [L]B was diagonal. We found that such
a basis is made up of the eigenvectors of [L].
In the last chapter, we saw that orthonormal bases have some very nice properties. For
example, if the columns of P form an orthonormal basis for Rn, then P is orthogonal
and it is very easy to find P−1. In particular, we have P−1 = PT . Hence, we now turn
our attention to trying to find an orthonormal basis B of eigenvectors such that [L]B
is diagonal. Since the corresponding matrix P will be orthogonal, we will get
[L]B = PT [L]P
DEFINITION
Orthogonally Similar
Two matrices A and B are said to be orthogonally similar if there exists an orthogo-
nal matrix P such that
PT AP = B
Since PT = P−1 we have that if A and B are orthogonally similar, then they are similar.
Therefore, all the properties of similar matrices still hold. In particular, if A and B
are orthogonally similar, then rank A = rank B, tr A = tr B, det A = det B, and A and
B have the same eigenvalues.
We saw in Chapter 6 that not every square matrix A has a set of eigenvectors which
forms a basis for Rn. Since we are now going to require the additional condition that
the basis is orthonormal, we expect that even fewer matrices will have this property.
So, we first just look for matrices A such that there exists an orthogonal matrix P for
which PT AP is upper triangular.
280
Section 10.1 Orthogonal Similarity 281
THEOREM 1 (Triangularization Theorem)
If A is an n × n matrix with real eigenvalues, then A is orthogonally similar to an
upper triangular matrix T .
Proof: We prove the result by induction on n. If n = 1, then A is upper triangular.
Therefore, we can take P to be the orthogonal matrix P = [1] and the result follows.
Now assume the result holds for all (n − 1) × (n − 1) matrices and consider an n × n
matrix A with all real eigenvalues.
Let ~v1 be a unit eigenvector of one of A’s eigenvalues λ1. Extend the set {~v1} to a
basis for Rn and apply the Gram-Schmidt procedure to produce an orthonormal basis
{~v1, ~w2, . . . , ~wn} for Rn. Then, the matrix P1 = [ ~v1 ~w2 · · · ~wn ] is orthogonal and

          [ ~vT1 ]
PT1 AP1 = [ ~wT2 ] A [ ~v1 ~w2 · · · ~wn ]
          [  ... ]
          [ ~wTn ]

          [ ~v1 · A~v1   ~v1 · A~w2   · · ·   ~v1 · A~wn ]
        = [ ~w2 · A~v1   ~w2 · A~w2   · · ·   ~w2 · A~wn ]
          [     ...          ...       . . .      ...    ]
          [ ~wn · A~v1   ~wn · A~w2   · · ·   ~wn · A~wn ]
Consider the entries in the first column. Since A~v1 = λ1~v1 we get ~v1 · A~v1 = λ1, and
~wi · A~v1 = ~wi · λ1~v1 = λ1(~wi · ~v1) = 0
for 2 ≤ i ≤ n. Hence
PT1 AP1 = [ λ1   ~bT ]
          [ ~0    A1 ]
where A1 is an (n − 1) × (n − 1) matrix, ~b is a vector in Rn−1, and ~0 is the zero vector
in Rn−1. Because A is similar to [ λ1 ~bT ; ~0 A1 ], we get that all of the eigenvalues of A1 are
eigenvalues of A, and hence all the eigenvalues of A1 are real. So, by the inductive
hypothesis, there exists an (n−1)×(n−1) orthogonal matrix Q such that QT A1Q = T1
is upper triangular.
Let
P2 = [ 1   ~0T ]
     [ ~0   Q  ]
Then, the columns of P2 are orthonormal since the columns of
Q are orthonormal. Hence, P2 is an orthogonal matrix. Consequently, the matrix
P = P1P2 is also orthogonal by Theorem 9.2.7. By block multiplication we get
PT AP = (P1P2)T A(P1P2) = PT2 PT1 AP1P2

      = [ 1   ~0T ] [ λ1   ~bT ] [ 1   ~0T ]
        [ ~0   QT ] [ ~0    A1 ] [ ~0   Q  ]

      = [ λ1   ~bT Q   ]
        [ ~0    QT A1Q ]

      = [ λ1   ~bT Q ]
        [ ~0    T1   ]
is upper triangular. Thus, the result follows by induction. □
If A is orthogonally similar to an upper triangular matrix T , then A and T must share
the same eigenvalues. Thus, since T is upper triangular, the eigenvalues must appear
along the main diagonal of T .
Notice that the proof of the theorem gives us a method for finding an orthogonal
matrix P so that PT AP = T is upper triangular. However, since the proof is by
induction, this leads to a recursive algorithm. We demonstrate this with an example.
EXAMPLE 1 Let
A = [ 2  1   1 ]
    [ 0  3  −2 ]
    [ 0  2  −1 ]
Find an orthogonal matrix P such that PT AP is upper triangular.
Solution: As in the proof, we first need to find one of the real eigenvalues of A. By
inspection, we see that 2 is an eigenvalue of A with unit eigenvector ~v1 = (1, 0, 0).
Next, we extend {~v1} to an orthonormal basis for R3. This is very easy in this case as
we can take ~w2 = (0, 1, 0) and ~w3 = (0, 0, 1). Thus, we get P1 = I and

          [ 2  1   1 ]   [ 2   ~bT ]
PT1 AP1 = [ 0  3  −2 ] = [ ~0   A1 ]
          [ 0  2  −1 ]

where A1 = [ 3 −2 ; 2 −1 ] and ~bT = [ 1 1 ].
We now need to apply the inductive hypothesis on A1. To do this, we need to find
a 2 × 2 orthogonal matrix Q such that QT A1Q = T1 is upper triangular. We start
by finding a real eigenvalue of A1. Consider

A1 − λI = [ 3 − λ     −2   ]
          [   2     −1 − λ ]

We have C(λ) = det(A1 − λI) = (λ − 1)^2, which implies that the only eigenvalue of
A1 is λ = 1. We find that

A1 − I = [ 2  −2 ]
         [ 2  −2 ]

so a corresponding unit eigenvector is (1/√2, 1/√2). We extend {(1/√2, 1/√2)} to the
orthonormal basis {(1/√2, 1/√2), (−1/√2, 1/√2)} for R2. Therefore,

Q = (1/√2) [ 1  −1 ]
           [ 1   1 ]

is an orthogonal matrix and

QT A1Q = (1/√2) [  1  1 ] [ 3  −2 ] (1/√2) [ 1  −1 ]   [ 1  −4 ]
                [ −1  1 ] [ 2  −1 ]        [ 1   1 ] = [ 0   1 ] = T1
Now that we have Q and T1, the proof of the theorem tells us to choose

P2 = [ 1   ~0T ]   [ 1    0      0   ]
     [ ~0   Q  ] = [ 0  1/√2  −1/√2 ]
                   [ 0  1/√2   1/√2 ]

and P = P1P2 = P2. This gives

PT AP = [ 2   ~bT Q ]   [ 2  √2   0 ]
        [ ~0    T1  ] = [ 0   1  −4 ]
                        [ 0   0   1 ]

as required.
Since the procedure is recursive, it would be very inefficient for larger matrices.
There is a much better algorithm for doing this, although we will not cover it in
this book. The purpose of the example above is not to teach one how to find such an
orthogonal matrix, but to help one understand the proof.
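The recursion in the proof can nevertheless be turned into a short program. The following is a minimal sketch, assuming NumPy is available; `triangularize` is our own name for the helper, and this is the inefficient recursion of the proof, not the better algorithm alluded to above.

```python
import numpy as np

def triangularize(A):
    """Orthogonal P with P.T @ A @ P upper triangular, built by the
    recursion in the proof of the Triangularization Theorem.
    Assumes every eigenvalue of A is real."""
    n = A.shape[0]
    if n == 1:
        return np.eye(1)
    # A unit eigenvector v1 for a (real) eigenvalue of A.
    evals, evecs = np.linalg.eig(A)
    i = int(np.argmin(np.abs(evals.imag)))
    v1 = np.real(evecs[:, i]).copy()
    v1 /= np.linalg.norm(v1)
    # Extend {v1} to an orthonormal basis; QR plays the role of the
    # Gram-Schmidt procedure, and the first column of P1 is +/- v1.
    P1, _ = np.linalg.qr(np.column_stack([v1, np.eye(n)]))
    B = P1.T @ A @ P1              # block form [[lam1, b^T], [0, A1]]
    Q = triangularize(B[1:, 1:])   # recurse on the (n-1) x (n-1) block A1
    P2 = np.eye(n)
    P2[1:, 1:] = Q
    return P1 @ P2                 # product of orthogonal matrices
```

On the matrix of Example 1 this produces an orthogonal P with PT AP upper triangular, though not necessarily the same P found by hand, since computed eigenvectors can differ in sign and order.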
Section 10.1 Problems
1. By following the steps in the proof of the Triangularization Theorem, find an orthogonal matrix P and upper triangular matrix T such that PT AP = T for each of the following matrices.
(a) A = [ 2 4 ; −4 −6 ]
(b) A = [ 1 4 ; 7 4 ]
(c) A = [ −3 4 5 ; 0 1 4 ; 0 7 4 ]
(d) A = [ 7 −6 5 ; 0 1 4 ; 0 6 3 ]
(e) A = [ 1 0 −1 ; 2 2 1 ; 4 0 5 ]
(f) A = [ 0 1 1 ; 2 3 6 ; −1 −1 −2 ]
(g) A = [ 2 0 0 1 ; 0 2 0 −1 ; −1 1 2 0 ; 0 0 0 2 ]
2. Prove that if A is orthogonally similar to B and B is orthogonally similar to C, then A is orthogonally similar to C.
3. Let A and B be orthogonally similar matrices.
(a) Prove that if A and B are invertible, then
A−1 and B−1 are orthogonally similar.
(b) Prove that A2 and B2 are orthogonally sim-
ilar.
(c) Prove that if AT = A, then BT = B.
(d) Prove that A and B are orthogonally simi-
lar to the same upper triangular matrix T .
10.2 Orthogonal Diagonalization
We now know that for any linear operator L on Rn with real eigenvalues we can find
an orthonormal basis B such that the B-matrix of L is upper triangular. However, as
we saw in Chapter 6, having a diagonal matrix would be even better. So, we now
look at which matrices are orthogonally similar to a diagonal matrix.
DEFINITION
Orthogonally Diagonalizable
An n × n matrix A is said to be orthogonally diagonalizable if there exists an
orthogonal matrix P and diagonal matrix D such that
PT AP = D
that is, if A is orthogonally similar to a diagonal matrix.
Our first goal is to determine all matrices which are orthogonally diagonalizable. But,
how do we even start looking for non-trivial matrices which have an orthonormal
basis of eigenvectors for Rn?
A clever way to approach this problem is to work backwards. That is, start by looking
at what condition a matrix must have if it is orthogonally diagonalizable.
So, assume that A is orthogonally diagonalizable. Then, by definition, there exists
an orthogonal matrix P such that PT AP = D is diagonal. We are looking for some
special condition that the matrix A must have. Of course, since A and D are similar (in
fact, orthogonally similar), they must have the same properties. So, we can instead
ask ourselves what nice properties must D have. Since D is diagonal, we can see that
DT = D. Let’s try to prove that A has the same property.
Using the fact that P is orthogonal, we can rewrite PT AP = D as A = PDPT . Taking
transposes gives
AT = (PDPT )T = (PT )T DT PT = PDPT = A
Thus, we also have AT = A. We have proven the following theorem.
THEOREM 1 If A is orthogonally diagonalizable, then AT = A.
The condition AT = A is going to be very important, so we make the following
definition.
DEFINITION
Symmetric Matrix
A matrix A such that AT = A is said to be symmetric.
It is clear from the definition that a matrix must be square to be symmetric. Moreover,
observe that a matrix A is symmetric if and only if aij = aji for all 1 ≤ i, j ≤ n.
EXAMPLE 1 Determine which of the following matrices is symmetric.
(a) A = [ 1 2 1 ; 2 0 1 ; 1 1 −3 ], (b) B = [ 1 0 1 ; 1 1 1 ; 1 0 1 ], (c) C = [ 1 0 0 ; 0 2 0 ; 0 0 3 ]
Solution: It is easy to verify that AT = A and CT = C, so they are both symmetric.
However, we have BT ≠ B, so B is not symmetric.
Theorem 1 shows that every orthogonally diagonalizable matrix is symmetric. But,
is the converse true? Let's begin by considering the case of 2 × 2 symmetric matrices.
EXAMPLE 2 Prove that every 2 × 2 symmetric matrix A = [ a b ; b c ] is orthogonally diagonalizable.
Solution: We will show that we can find an orthonormal basis of eigenvectors for A
so that we can find an orthogonal matrix P which diagonalizes A. We have
C(λ) = det [ a − λ     b   ]
           [   b     c − λ ] = λ^2 − (a + c)λ + (ac − b^2)

Thus, by the quadratic formula, the eigenvalues of A are

λ+ = (a + c + √((a + c)^2 − 4(ac − b^2)))/2 = (a + c + √((a − c)^2 + 4b^2))/2

λ− = (a + c − √((a + c)^2 − 4(ac − b^2)))/2 = (a + c − √((a − c)^2 + 4b^2))/2
If a = c and b = 0, then A is already diagonal and thus can be orthogonally diagonal-
ized by I. Otherwise, A has two distinct real eigenvalues. We have

A − λ+I = [ (a − c − √((a − c)^2 + 4b^2))/2                  b                 ]
          [                 b                 (c − a − √((a − c)^2 + 4b^2))/2 ]

        ~ [ 1   2b/(a − c − √((a − c)^2 + 4b^2)) ]
          [ 0                  0                 ]

Hence, an eigenvector for λ+ is ~v1 = ( −2b/(a − c − √((a − c)^2 + 4b^2)), 1 ). Also,

A − λ−I = [ (a − c + √((a − c)^2 + 4b^2))/2                  b                 ]
          [                 b                 (c − a + √((a − c)^2 + 4b^2))/2 ]

        ~ [ 1   2b/(a − c + √((a − c)^2 + 4b^2)) ]
          [ 0                  0                 ]

so an eigenvector for λ− is ~v2 = ( −2b/(a − c + √((a − c)^2 + 4b^2)), 1 ).
Observe that

~v1 · ~v2 = ( −2b/(a − c − √((a − c)^2 + 4b^2)) )( −2b/(a − c + √((a − c)^2 + 4b^2)) ) + 1(1)
         = 4b^2/((a − c)^2 − ((a − c)^2 + 4b^2)) + 1 = −1 + 1 = 0

Thus, the eigenvectors are orthogonal and hence we can normalize them to get an
orthonormal basis for R2. So, A can be diagonalized by an orthogonal matrix.
Example 2 not only shows that every 2 × 2 symmetric matrix is orthogonally diag-
onalizable, it also establishes two other important facts: the eigenvalues of any 2 × 2
symmetric matrix must be real, and the eigenvectors corresponding to different
eigenvalues are naturally orthogonal.
LEMMA 2 If A is a symmetric matrix with real entries, then all of its eigenvalues are real.
The proof requires properties of vectors and matrices with complex entries and so
will be delayed until Chapter 11.
The wonderful part of Lemma 2 is that it implies that we can use the Triangularization
Theorem on any symmetric matrix. This makes it very easy to prove that every
symmetric matrix is orthogonally diagonalizable.
THEOREM 3 (Principal Axis Theorem)
Every symmetric matrix A is orthogonally diagonalizable.
Proof: By Lemma 2 all eigenvalues of A are real. Therefore, we can apply the
Triangularization Theorem to get that there exists an orthogonal matrix P such that
PT AP = T is upper triangular. Since A is symmetric, we have that AT = A and hence
T T = (PT AP)T = PT AT (PT )T = PT AP = T
Therefore, T is an upper triangular symmetric matrix. But, if T is upper triangular,
then T T is lower triangular, and so T is both upper and lower triangular and hence T
is diagonal. □
REMARK
We have now proven that a matrix is orthogonally diagonalizable if and only if it is
symmetric.
One disadvantage in using the Triangularization Theorem to prove the Principal Axis
Theorem is that it does not give us a nice way of orthogonally diagonalizing a sym-
metric matrix. So, we instead refer to our solution of Example 2. In the example, we
saw that the eigenvectors corresponding to different eigenvalues were naturally or-
thogonal. To prove that this is always the case, we first prove an important property
of symmetric matrices.
THEOREM 4 A matrix A is symmetric if and only if ~x · (A~y) = (A~x) · ~y for all ~x, ~y ∈ Rn.
Proof: Suppose that A is symmetric, then for any ~x, ~y ∈ Rn we have
~x · (A~y) = ~xT A~y = ~xT AT~y = (A~x)T~y = (A~x) · ~y
Conversely, if ~x · (A~y) = (A~x) · ~y for all ~x, ~y ∈ Rn, then
(~xT A)~y = (A~x)T~y = (~xT AT )~y
Since this is valid for all ~y ∈ Rn we get that ~xT A = ~xT AT by the Matrices Equal
Theorem. Taking transposes of both sides gives AT~x = A~x. This is still valid for all
~x ∈ Rn, so using the Matrices Equal Theorem again gives AT = A as required. □
THEOREM 5 If ~v1,~v2 are eigenvectors of a symmetric matrix A corresponding to distinct eigenval-
ues λ1, λ2, then ~v1 is orthogonal to ~v2.
Proof: We are assuming that A~v1 = λ1~v1 and A~v2 = λ2~v2 with λ1 ≠ λ2. Theorem 4 gives
λ1(~v1 · ~v2) = (λ1~v1) · ~v2 = (A~v1) · ~v2 = ~v1 · (A~v2) = ~v1 · (λ2~v2) = λ2(~v1 · ~v2)
Hence, (λ1 − λ2)(~v1 · ~v2) = 0. But, λ1 ≠ λ2, so ~v1 · ~v2 = 0 as required. □
Consequently, if a symmetric matrix A has n distinct eigenvalues, the basis of eigen-
vectors which diagonalizes A will naturally be orthogonal. Hence, to orthogonally
diagonalize such a matrix A, we just need to normalize these eigenvectors to form an
orthonormal basis for Rn of eigenvectors of A.
EXAMPLE 3 Find an orthogonal matrix P such that PT AP is diagonal, where
A = [  1  0  −1 ]
    [  0  1   2 ]
    [ −1  2   5 ]
Solution: The characteristic polynomial is

       | 1 − λ    0     −1   |
C(λ) = |   0    1 − λ    2   | = (1 − λ)[(1 − λ)(5 − λ) − 4] − (1 − λ) = −λ(λ − 1)(λ − 6)
       |  −1      2    5 − λ |
Thus, the eigenvalues of A are λ1 = 0, λ2 = 1, and λ3 = 6. For λ1 = 0 we get

         [  1  0  −1 ]   [ 1  0  −1 ]
A − 0I = [  0  1   2 ] ~ [ 0  1   2 ]
         [ −1  2   5 ]   [ 0  0   0 ]

Hence, a basis for Eλ1, the eigenspace of λ1, is {(1, −2, 1)}. For λ2 = 1, we get

        [ 1  −2  0 ]
A − I ~ [ 0   0  1 ]
        [ 0   0  0 ]

Hence, a basis for Eλ2 is {(2, 1, 0)}. For λ3 = 6, we get

         [ 1  0   1/5 ]
A − 6I ~ [ 0  1  −2/5 ]
         [ 0  0    0  ]

Hence, a basis for Eλ3 is {(−1, 2, 5)}.
As predicted by Theorem 5, we can see that these are orthogonal to each other. After
normalizing we find that an orthonormal basis of eigenvectors for A is

{ (1/√6, −2/√6, 1/√6), (2/√5, 1/√5, 0), (−1/√30, 2/√30, 5/√30) }

Thus, we take

    [  1/√6  2/√5  −1/√30 ]
P = [ −2/√6  1/√5   2/√30 ]
    [  1/√6   0     5/√30 ]

Since the columns of P form an orthonormal basis for R3 we get that P is orthogonal.
Moreover, since the columns of P form a basis of eigenvectors for A, we get that P
diagonalizes A. In particular,

        [ 0  0  0 ]
PT AP = [ 0  1  0 ]
        [ 0  0  6 ]
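In practice this computation is delegated to a library routine that performs both steps at once. As a quick check of Example 3 (assuming NumPy is available), `numpy.linalg.eigh`, the eigenvalue routine for symmetric matrices, returns the eigenvalues in ascending order together with an orthogonal matrix whose columns are unit eigenvectors:

```python
import numpy as np

# The symmetric matrix A of Example 3.
A = np.array([[ 1.0, 0.0, -1.0],
              [ 0.0, 1.0,  2.0],
              [-1.0, 2.0,  5.0]])

# eigh: ascending eigenvalues and an orthogonal matrix P of eigenvectors.
evals, P = np.linalg.eigh(A)

D = P.T @ A @ P    # diagonal with entries 0, 1, 6, as in the example
```

The columns of P agree with the hand-computed eigenvectors up to sign.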
Notice that Theorem 5 does not say anything about different eigenvectors correspond-
ing to the same eigenvalue. In particular, if an eigenvalue of a symmetric matrix A has
geometric multiplicity 2, then there is no guarantee that a basis for the eigenspace of
the eigenvalue will be orthogonal. Of course, this is easily fixed as we can just apply
the Gram-Schmidt procedure to find an orthonormal basis for the eigenspace.
EXAMPLE 4 Orthogonally diagonalize
A = [  8  −2  2 ]
    [ −2   5  4 ]
    [  2   4  5 ]
Solution: The characteristic polynomial of A is

       | 8 − λ   −2      2   |
C(λ) = |  −2    5 − λ    4   | = −λ(λ − 9)^2
       |   2      4    5 − λ |
Thus, the eigenvalues are λ1 = 0 and λ2 = 9, where the algebraic multiplicity of λ1 is
1 and the algebraic multiplicity of λ2 is 2. For λ1 = 0 we get

         [ 1  0  1/2 ]
A − 0I ~ [ 0  1   1  ]
         [ 0  0   0  ]

Hence, a basis for Eλ1 is {(1, 2, −2)} = {~v1}. For λ2 = 9 we get

         [ 1  2  −2 ]
A − 9I ~ [ 0  0   0 ]
         [ 0  0   0 ]

Hence, a basis for Eλ2 is {(−2, 1, 0), (2, 0, 1)} = {~v2, ~v3}.
As predicted by Theorem 5, ~v2 and ~v3 are both orthogonal to ~v1. But, since they
are not orthogonal to each other, we apply the Gram-Schmidt procedure on the basis
{~v2, ~v3} for Eλ2 to get an orthogonal basis for Eλ2. We take ~w2 = ~v2 and get

~v3 − (⟨~v3, ~w2⟩/‖~w2‖^2)~w2 = (2, 0, 1) − (−4/5)(−2, 1, 0) = (2/5, 4/5, 1)

To simplify calculations, we take ~w3 = (2, 4, 5). Thus, {~w2, ~w3} is an orthogonal basis for
the eigenspace of λ2 and hence {~v1, ~w2, ~w3} is an orthogonal basis for R3 of eigenvec-
tors of A. We normalize the vectors to get the orthonormal basis

{ (1/3, 2/3, −2/3), (−2/√5, 1/√5, 0), (2/√45, 4/√45, 5/√45) }

Hence, we get

    [  1/3  −2/√5  2/√45 ]                     [ 0  0  0 ]
P = [  2/3   1/√5  4/√45 ]   and   D = PT AP = [ 0  9  0 ]
    [ −2/3    0    5/√45 ]                     [ 0  0  9 ]
EXAMPLE 5 Orthogonally diagonalize
A = [ 1  1  1 ]
    [ 1  1  1 ]
    [ 1  1  1 ]
Solution: The characteristic polynomial of A is

       | 1 − λ    1      1   |
C(λ) = |   1    1 − λ    1   | = −λ^2(λ − 3)
       |   1      1    1 − λ |
Thus, the eigenvalues are λ1 = 0 with algebraic multiplicity 2 and λ2 = 3 with
algebraic multiplicity 1. For λ1 = 0 we get

         [ 1  1  1 ]
A − 0I ~ [ 0  0  0 ]
         [ 0  0  0 ]

Hence, a basis for Eλ1 is {(−1, 1, 0), (−1, 0, 1)} = {~v1, ~v2}.
Since this is not an orthogonal basis for Eλ1, we need to apply the Gram-Schmidt
procedure. We take ~w1 = ~v1 and get

(−1, 0, 1) − (1/2)(−1, 1, 0) = (−1/2, −1/2, 1)

So, we take ~w2 = (1, 1, −2). Then, {~w1, ~w2} is an orthogonal basis for Eλ1. For λ2 = 3 we get

         [ 1  0  −1 ]
A − 3I ~ [ 0  1  −1 ]
         [ 0  0   0 ]

Hence, a basis for Eλ2 is {(1, 1, 1)}.
We normalize the basis vectors to get the orthonormal basis

{ (−1/√2, 1/√2, 0), (1/√6, 1/√6, −2/√6), (1/√3, 1/√3, 1/√3) }

Hence, we take

    [ −1/√2  1/√6  1/√3 ]
P = [  1/√2  1/√6  1/√3 ]
    [   0   −2/√6  1/√3 ]

and get

        [ 0  0  0 ]
PT AP = [ 0  0  0 ]
        [ 0  0  3 ]
Section 10.2 Problems
1. Orthogonally diagonalize each of the following symmetric matrices.
(a) [ 2 2 ; 2 2 ]
(b) [ 0 2 −2 ; 2 1 0 ; −2 0 1 ]
(c) [ 4 −2 ; −2 7 ]
(d) [ 2 −2 −5 ; −2 −5 −2 ; −5 −2 2 ]
(e) [ 1 −2 ; −2 −2 ]
(f) [ 1 0 2 ; 0 1 4 ; 2 4 2 ]
(g) [ 1 1 1 ; 1 1 1 ; 1 1 1 ]
(h) [ 2 2 4 ; 2 4 2 ; 4 2 2 ]
(i) [ 2 −4 −4 ; −4 2 −4 ; −4 −4 2 ]
(j) [ 5 −1 1 ; −1 5 1 ; 1 1 5 ]
(k) [ 0 1 −1 ; 1 0 1 ; −1 1 0 ]
(l) [ 1 1 1 1 ; 1 1 1 1 ; 1 1 1 1 ; 1 1 1 1 ]
2. Invent a 3 × 3 orthogonally diagonalizable matrix that has eigenvalues λ1 = −3, λ2 = 1, and λ3 = 6, with corresponding eigenvectors ~v1 = (−1, −2, 2), ~v2 = (−2, 1, 0), and ~v3 = (2, 4, 5).
3. Prove that if A is invertible and orthogonally
diagonalizable, then A−1 is orthogonally diago-
nalizable.
4. Let A and B be n×n symmetric matrices. Deter-
mine which of the following is also symmetric.
(a) AT A (b) AAT
(c) AB (d) A2
5. Let A be a symmetric matrix. Prove that all
eigenvalues of A are non-negative if and only
if there exists a symmetric matrix B such that
A = B2.
6. Determine whether each statement is true or
false. Justify your answer with a proof or a
counter example.
(a) Every orthogonal matrix is orthogonally
diagonalizable.
(b) If A and B are orthogonally diagonaliz-
able, then AB is orthogonally diagonaliz-
able.
(c) If A is orthogonally similar to a symmetric
matrix B, then A is orthogonally diagonal-
izable.
(d) If λ is an eigenvalue of a symmetric matrix
A, then gλ = aλ.
10.3 Quadratic Forms
We now use the results of the last section to study a very important class of functions
called quadratic forms. Quadratic forms are not only important in linear algebra, but
also in number theory, group theory, differential geometry, and many other areas.
Simply put, a quadratic form is a function defined as a linear combination of all
possible terms xix j for 1 ≤ i ≤ j ≤ n. For example, for two variables x1, x2 we have
a1x1^2 + a2x1x2 + a3x2^2, and for three variables x1, x2, x3 we get a1x1^2 + a2x1x2 + a3x1x3 +
a4x2^2 + a5x2x3 + a6x3^2.
Notice that a quadratic form is certainly not a linear function. Instead, quadratic forms
are tied to linear algebra by matrix multiplication. For example, if B = [ 1 2 ; 0 −1 ]
and ~x = (x1, x2), then we have

Q(~x) = ~xT B~x = [ x1 x2 ] [ 1   2 ] [ x1 ]   [ x1 x2 ] [ x1 + 2x2 ]
                          [ 0  −1 ] [ x2 ] =           [   −x2    ]

      = x1(x1 + 2x2) + x2(−x2) = x1^2 + 2x1x2 − x2^2
which is a quadratic form. We now use this to precisely define a quadratic form.
DEFINITION
Quadratic Form
Let A be an n × n matrix. A function Q : Rn → R of the form
Q(~x) = ~xT A~x = Σ_{i=1}^n Σ_{j=1}^n aij xixj
is called a quadratic form.
There is a connection between the dot product and quadratic forms. Observe that
Q(~x) = ~xT A~x = ~x · (A~x) = (A~x) · ~x = (A~x)T~x = ~xT AT~x
Hence, A and AT give the same quadratic form (this does not imply that A = AT ).
Of course, this is most natural when A is symmetric. In particular, for any quadratic
form
Q(~x) = ~xT B~x = Σ_{i=1}^n Σ_{j=1}^n bij xixj

if we define a matrix A by

(A)ij = (A)ji = { bii              if i = j
                { (bij + bji)/2    if i ≠ j
then we have that A is symmetric and Q(~x) = ~xT A~x. Hence, there is a one-to-one
correspondence between symmetric matrices and quadratic forms. Thus, as we will
see, we often deal with a quadratic form and its corresponding symmetric matrix in
the same way.
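Since averaging bij with bji entry by entry is the same as forming (B + B^T)/2, the construction above condenses to one line of code. A minimal sketch, assuming NumPy is available (`symmetrize` is our own name for the helper):

```python
import numpy as np

def symmetrize(B):
    """Symmetric matrix A with x^T A x = x^T B x for every x.
    Entrywise a_ij = (b_ij + b_ji)/2, i.e. A = (B + B^T)/2."""
    B = np.asarray(B, dtype=float)
    return (B + B.T) / 2

# The matrix B used in the discussion above.
B = np.array([[1.0,  2.0],
              [0.0, -1.0]])
A = symmetrize(B)    # [[1, 1], [1, -1]]

# Both matrices give the same quadratic form.
x = np.array([3.0, -2.0])
same = np.isclose(x @ B @ x, x @ A @ x)
```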
EXAMPLE 1 Let B = [ 1 2 ; 0 −1 ]. Find the symmetric matrix corresponding to Q(~x) = ~xT B~x.
Solution: From our work above we have

Q(~x) = ~xT B~x = x1^2 + 2x1x2 − x2^2

Thus, we take (A)11 = b11 = 1, (A)12 = (A)21 = (1/2)(b12 + b21) = (1/2)(2) = 1, and (A)22 = b22 = −1.
So,

A = [ 1   1 ]
    [ 1  −1 ]
It is easy to verify that Q(~x) = ~xT A~x.
EXAMPLE 2 Let Q(x1, x2) = 2x1^2 + 3x1x2 − 4x2^2. Find the corresponding symmetric matrix A.
Solution: We get a11 = 2, 2a12 = 3, and a22 = −4. Hence,

A = [  2   3/2 ]
    [ 3/2  −4  ]
EXAMPLE 3 Let Q(x1, x2, x3) = x1^2 − 2x1x2 + 8x1x3 + 3x2^2 + x2x3 − 5x3^2. Find the corresponding
symmetric matrix A.
Solution: We have a11 = 1, 2a12 = −2, 2a13 = 8, a22 = 3, 2a23 = 1, and a33 = −5.
Hence,

A = [  1  −1   4  ]
    [ −1   3  1/2 ]
    [  4  1/2  −5 ]
EXERCISE 1 Let Q(x1, x2, x3) = 3x1^2 + (1/2)x1x2 − x2^2 + 2x2x3 + 2x3^2. Find the corresponding symmetric matrix A.
Of course, this also works in reverse. That is, if A is symmetric, then the coefficient
bij of xixj is given by

bij = { aii    if i = j
      { 2aij   if i < j
EXAMPLE 4 If A = [ 1 4 ; 4 3 ], then the quadratic form corresponding to A is

Q(x1, x2) = x1^2 + 2(4)x1x2 + 3x2^2 = x1^2 + 8x1x2 + 3x2^2
EXAMPLE 5 If A = [ 1 2 −3 ; 2 −4 0 ; −3 0 −1 ], then the quadratic form corresponding to A is

Q(x1, x2, x3) = x1^2 + 2(2)x1x2 + 2(−3)x1x3 − 4x2^2 + 2(0)x2x3 − x3^2
             = x1^2 + 4x1x2 − 6x1x3 − 4x2^2 − x3^2
EXERCISE 2 Let A = [ −3 1/2 1 ; 1/2 3/2 2 ; 1 2 0 ]. Find the quadratic form corresponding to A.
EXAMPLE 6 If A = [ 1 0 0 ; 0 −2 0 ; 0 0 −3 ], then the quadratic form corresponding to A is

Q(x1, x2, x3) = x1^2 − 2x2^2 − 3x3^2
The terms xixj with i ≠ j are called cross-terms. We see that a quadratic form Q(~x)
has no cross terms if and only if its corresponding symmetric matrix is diagonal. We
will call such a quadratic form a diagonal quadratic form.
Classifying Quadratic Forms
DEFINITION
Positive Definite
Negative Definite
Indefinite
Positive Semidefinite
Negative Semidefinite
Let Q(~x) be a quadratic form. We say that:
1. Q(~x) is positive definite if Q(~x) > 0 for all ~x ≠ ~0,
2. Q(~x) is negative definite if Q(~x) < 0 for all ~x ≠ ~0,
3. Q(~x) is indefinite if Q(~x) > 0 for some ~x and Q(~x) < 0 for some ~x,
4. Q(~x) is positive semidefinite if Q(~x) ≥ 0 for all ~x,
5. Q(~x) is negative semidefinite if Q(~x) ≤ 0 for all ~x.
EXAMPLE 7 Q(x1, x2) = x1^2 + x2^2 is positive definite, Q(x1, x2) = −x1^2 − x2^2 is negative definite,
Q(x1, x2) = x1^2 − x2^2 is indefinite, Q(x1, x2) = x1^2 is positive semidefinite, and
Q(x1, x2) = −x1^2 is negative semidefinite.
We classify symmetric matrices in the same way that we classify their corresponding
quadratic forms.
EXAMPLE 8 The symmetric matrix A = [ 1 0 ; 0 1 ] corresponds to the quadratic form
Q(x1, x2) = x1^2 + x2^2, so A is positive definite as Q is positive definite.
EXAMPLE 9 Classify the symmetric matrix A = [ 4 −2 ; −2 4 ] and the corresponding quadratic form
Q(~x) = ~xT A~x.
Solution: Observe that we have

Q(~x) = 4x1^2 − 4x1x2 + 4x2^2 = (2x1 − x2)^2 + 3x2^2 > 0

whenever ~x ≠ ~0. Thus, Q(~x) and A are both positive definite.
REMARK
These definitions can be generalized to other functions or matrices. For example, the
matrix A = [ 1 1 ; 0 1 ] could be called positive definite since

~xT A~x = x1^2 + x1x2 + x2^2 = (x1 + (1/2)x2)^2 + (3/4)x2^2 > 0

for all ~x ≠ ~0. In this book, we will only be dealing with quadratic forms and sym-
metric matrices. However, it is important to remember that “Assume A is positive
definite” does not imply that A is symmetric.
Each of the quadratic forms in the example above was simple to classify since it either
had no cross-terms or we were able to easily complete a square. To classify more
difficult quadratic forms, we use our theory from the last section and the connection
between quadratic forms and symmetric matrices.
THEOREM 1 Let A be the symmetric matrix corresponding to a quadratic form Q(~x) = ~xT A~x. If P
is an orthogonal matrix that diagonalizes A, then Q(~x) can be expressed as
λ1y1^2 + · · · + λnyn^2
where ~y = PT~x and where λ1, . . . , λn are the eigenvalues of A corresponding to the
columns of P.
Proof: Since P is orthogonal, if ~y = PT~x, then ~x = P~y. Moreover, we have that
PT AP = D is diagonal where the diagonal entries of D are the eigenvalues λ1, . . . , λn
of A. Hence,
Q(~x) = ~xT A~x = (P~y)T A(P~y) = ~yT PT AP~y = ~yT D~y = λ1y1^2 + · · · + λnyn^2
□
This theorem shows that every quadratic form Q(~x) is equivalent to a diagonal quadratic
form. Moreover, Q(~x) can be brought into this form by the change of variables
~y = PT~x where P is an orthogonal matrix which diagonalizes the corresponding
symmetric matrix A.
EXAMPLE 10 Find new variables y1, y2, y3, y4 such that

Q(x1, x2, x3, x4) = 3x1^2 + 2x1x2 − 10x1x3 + 10x1x4 + 3x2^2 + 10x2x3 − 10x2x4 + 3x3^2 + 2x3x4 + 3x4^2

has diagonal form. Use the diagonal form to classify Q(~x).
Solution: We have corresponding symmetric matrix

A = [  3   1  −5   5 ]
    [  1   3   5  −5 ]
    [ −5   5   3   1 ]
    [  5  −5   1   3 ]

We first find an orthogonal matrix P which diagonalizes A. Using the methods of
Section 10.2 we find that

P = (1/2) [  1  −1  1   1 ]
          [ −1  −1  1   1 ]
          [ −1   1  1  −1 ]
          [  1  −1  1  −1 ]

Wait — we find that

P = (1/2) [  1   1  1   1 ]
          [ −1  −1  1   1 ]
          [ −1   1  1  −1 ]
          [  1  −1  1  −1 ]

and

PT AP = [ 12   0  0  0 ]
        [  0  −8  0  0 ]
        [  0   0  4  0 ]
        [  0   0  0  4 ]

Thus, if

~y = (y1, y2, y3, y4) = PT~x = (1/2) (x1 − x2 − x3 + x4, x1 − x2 + x3 − x4, x1 + x2 + x3 + x4, x1 + x2 − x3 − x4)

then we get

Q = 12y1^2 − 8y2^2 + 4y3^2 + 4y4^2
Therefore, Q(~x) clearly takes positive and negative values, so Q(~x) is indefinite.
REMARKS
1. The eigenvectors we used to make up P are called the principal axes of A. We
will see a geometric interpretation of this in the next section.
2. By changing the order of the eigenvectors in P we also change the order of
the eigenvalues in D and hence the coefficients of the corresponding yi. For
example, in Example 10, if we took

P = (1/2) [  1  1   1   1 ]
          [ −1  1   1  −1 ]
          [  1  1  −1  −1 ]
          [ −1  1  −1   1 ]

then we would get Q = −8y1^2 + 4y2^2 + 4y3^2 + 12y4^2. Notice that, since we can pick
any vector ~y ∈ R4, this does in fact give us exactly the same set of values as
our choice above. Alternatively, you can think of it as doing another change of
variables z1 = y2, z2 = y3, z3 = y4, and z4 = y1.
Generalizing the method of the example above gives us the following theorem.
THEOREM 2 If A is a symmetric matrix, then the quadratic form Q(~x) = ~xT A~x is
(1) positive definite if and only if the eigenvalues of A are all positive
(2) negative definite if and only if the eigenvalues of A are all negative
(3) indefinite if and only if some of the eigenvalues of A are positive and some
are negative
(4) positive semidefinite if and only if all of the eigenvalues are non-negative
(5) negative semidefinite if and only if all of the eigenvalues are non-positive
Proof: We will prove (1). The proofs of the others are similar. By the Principal Axis
Theorem A is orthogonally diagonalizable, so there exists an orthogonal matrix P
such that
Q(~x) = ~xT A~x = ~yT D~y = λ1y1^2 + · · · + λnyn^2

where ~y = PT~x and λ1, . . . , λn are the eigenvalues of A. Consequently, Q(~x) > 0 for
all ~x ≠ ~0 if and only if all λi are positive. Thus, Q(~x) is positive definite if and only if
the eigenvalues of A are all positive. □
EXAMPLE 11 Classify the following:
(a) Q(x1, x2) = 4x1^2 − 6x1x2 + 2x2^2.
Solution: We have A = [ 4 −3 ; −3 2 ] so

det(A − λI) = | 4 − λ   −3   |
              |  −3    2 − λ | = λ^2 − 6λ − 1

Applying the quadratic formula we get λ = (6 ± √40)/2. So, we have both positive and
negative eigenvalues, and hence Q is indefinite.
(b) Q(x1, x2, x3) = 2x1^2 + 2x1x2 + 2x1x3 + 2x2^2 + 2x2x3 + 2x3^2.
Solution: We have A = [ 2 1 1 ; 1 2 1 ; 1 1 2 ]. We can find that the eigenvalues are 1, 1, and 4.
Hence all the eigenvalues of A are positive, so A is positive definite.
(c) A = [ 3 2 0 ; 2 2 2 ; 0 2 1 ].
Solution: The eigenvalues of A are 5, 2, and −1. Therefore, A is indefinite.
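Theorem 2 gives an immediate computational test: compute the eigenvalues of the corresponding symmetric matrix and inspect their signs. A minimal sketch, assuming NumPy is available (`classify` and the tolerance `tol` are our own choices), checked against the matrices of Example 11:

```python
import numpy as np

def classify(A, tol=1e-10):
    """Classify the quadratic form x^T A x of a symmetric matrix A
    by the signs of its eigenvalues (Theorem 2)."""
    evals = np.linalg.eigvalsh(A)   # real eigenvalues of a symmetric matrix
    if np.all(evals > tol):
        return "positive definite"
    if np.all(evals < -tol):
        return "negative definite"
    if np.any(evals > tol) and np.any(evals < -tol):
        return "indefinite"
    if np.all(evals >= -tol):
        return "positive semidefinite"
    return "negative semidefinite"

# The matrices of Example 11: (a) and (c) are indefinite, (b) is positive definite.
A_a = np.array([[4.0, -3.0], [-3.0, 2.0]])
A_b = np.array([[2.0, 1.0, 1.0], [1.0, 2.0, 1.0], [1.0, 1.0, 2.0]])
A_c = np.array([[3.0, 2.0, 0.0], [2.0, 2.0, 2.0], [0.0, 2.0, 1.0]])
```

The tolerance guards against eigenvalues that are zero in exact arithmetic but come out as tiny nonzero floats.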
Section 10.3 Problems
1. For each of the following quadratic forms Q(~x):
(i) Find the corresponding symmetric matrix.
(ii) Classify the quadratic form and its corre-
sponding symmetric matrix.
(iii) Find a corresponding diagonal form of
Q(~x) and the change of variables which
brings it into this form.
(a) Q(~x) = x21+ 3x1x2 + x2
2
(b) Q(~x) = 8x21+ 4x1x2 + 11x2
2
(c) Q(~x) = x21+ 8x1x2 − 5x2
2
(d) Q(~x) = −x21+ 4x1x2 + 2x2
2
(e) Q(~x) = 4x21+4x1x2+4x1x3+4x2
2+4x2x3+
4x23
(f) Q(~x) = 4x1x2 − 4x1x3 + x22 − x2
3
(g) Q(~x) = 4x21−2x1x2+2x1x3+4x2
2+2x2x3+
4x23
(h) Q(~x) = −4x21+2x1x2+2x1x3−4x2
2−2x2x3−
4x23
2. Let Q(~x) = ~xT A~x with A = [ a b ; b c ] and det A ≠ 0.
(a) Prove that Q is positive definite if det A > 0 and a > 0.
(b) Prove that Q is negative definite if det A > 0 and a < 0.
(c) Prove that Q is indefinite if det A < 0.
3. Let A be an n × n matrix and let ~x, ~y ∈ Rn. Define ⟨~x, ~y⟩ = ~xT A~y. Prove that ⟨ , ⟩ is an inner product on Rn if and only if A is a positive definite, symmetric matrix.
4. Let A be a positive definite symmetric matrix.
Prove that:
(a) the diagonal entries of A are all positive.
(b) A is invertible.
5. Let A and B be symmetric n×n matrices whose
eigenvalues are all positive. Show that the
eigenvalues of A + B are all positive.
10.4 Graphing Quadratic Forms
In many applications of quadratic forms it is important to be able to sketch the graph
of a quadratic form Q(~x) = k for some k. Observe that, in general, it is not easy to
identify the shape of the graph of an equation of the form ax1^2 + bx1x2 + cx2^2 = k by
inspection. On the other hand, if you are familiar with conic sections, it is easy to
determine the shape of the graph of an equation of the form λ1x1^2 + λ2x2^2 = k. We
demonstrate this in a table.
The graph of λ1x1^2 + λ2x2^2 = k looks like:

                   | k > 0          | k = 0                    | k < 0
λ1 > 0, λ2 > 0     | ellipse        | point (0, 0)             | does not exist
λ1 < 0, λ2 < 0     | does not exist | point (0, 0)             | ellipse
λ1λ2 < 0           | hyperbola      | asymptotes for hyperbola | hyperbola
λ1 = 0, λ2 > 0     | parallel lines | line x2 = 0              | does not exist
λ1 = 0, λ2 < 0     | does not exist | line x2 = 0              | parallel lines
In the last section, we saw that we could bring a quadratic form into diagonal form
by performing the change of variables ~y = PT~x, where P orthogonally diagonalizes
the corresponding symmetric matrix. Therefore, to sketch an equation of the form
ax1^2 + bx1x2 + cx2^2 = k we could first apply the change of variables to bring it into the
form λ1y1^2 + λ2y2^2 = k, which will be easy to sketch. However, we must determine how
performing the change of variables will affect the graph.
THEOREM 1 If Q(~x) = ax1^2 + bx1x2 + cx2^2 with a, b, c not all zero, then there exists an orthogonal
matrix P, which corresponds to a rotation in R2, such that the change of variables
~y = PT~x brings Q(~x) into diagonal form.

Proof: Let A = [ a b/2 ; b/2 c ]. Since A is symmetric we can apply the Principal Axis
Theorem to get that there exists an orthonormal basis {~v1, ~v2} of R2 of eigenvectors of
A. Let ~v1 = (a1, a2) and ~v2 = (b1, b2). Since ~v1 is a unit vector we must have a1^2 + a2^2 = 1.
Hence, the components lie on the unit circle and so there exists an angle θ such that
a1 = cos θ and a2 = sin θ. Moreover, since ~v2 is a unit vector orthogonal to ~v1 we can
pick b1 = − sin θ and b2 = cos θ. Hence, we have

P = [ cos θ  − sin θ ]
    [ sin θ    cos θ ]

which corresponds to a rotation by θ. Finally, from our work above, we know that
this change of coordinates matrix brings Q into diagonal form. □
REMARK
Observe that our choices for ~v1 and ~v2 in the proof above were not the only possibili-
ties. For example, we could have picked ~v1 = (cos θ, sin θ) and ~v2 = (sin θ, − cos θ).
In this case the matrix P = [ ~v1 ~v2 ] would correspond to a rotation and a reflection.
This does not contradict the theorem though. The theorem only says that we can
choose P to correspond to a rotation, not that a rotation is the only choice. We will
demonstrate this in an example below.
EXAMPLE 1  Consider the equation x1² + x1x2 + x2² = 1. Find a rotation so that the equation has no
cross terms.

Solution: We have

    A = [ 1    1/2 ]
        [ 1/2  1   ]

The eigenvalues are λ1 = 1/2 and λ2 = 3/2.
Therefore, the graph of x1² + x1x2 + x2² = 1 is an ellipse. We find that the corresponding
unit eigenvectors are

    ~v1 = (1/√2) [ −1 ]    and    ~v2 = (1/√2) [ −1 ]
                 [  1 ]                        [ −1 ]

So, from our work in the proof of the theorem, we can pick cos θ = −1/√2 and
sin θ = 1/√2. Hence, θ = 3π/4 and

    [ x1 ]   [ −1/√2  −1/√2 ] [ y1 ]   [ (−y1 − y2)/√2 ]
    [ x2 ] = [  1/√2  −1/√2 ] [ y2 ] = [ (y1 − y2)/√2  ]
Substituting these into the equation gives

    1 = x1² + x1x2 + x2²
      = ((−y1 − y2)/√2)² + ((−y1 − y2)/√2)((y1 − y2)/√2) + ((y1 − y2)/√2)²
      = (1/2)(y1² + 2y1y2 + y2²) + (1/2)(−y1² + y2²) + (1/2)(y1² − 2y1y2 + y2²)
      = (1/2)y1² + (3/2)y2²

Thus, the graph of x1² + x1x2 + x2² = 1 is the graph of (1/2)y1² + (3/2)y2² = 1 rotated
by 3π/4 radians. This is shown below.
Section 10.4 Graphing Quadratic Forms 301
Let us look at this procedure in general. Assume that we want to sketch Q(x1, x2) = k,
where Q(x1, x2) is a quadratic form with corresponding symmetric matrix A. We
first write Q(x1, x2) in diagonal form λ1y1² + λ2y2² by finding the orthogonal matrix
P = [ ~v1  ~v2 ] which diagonalizes A and performing the change of variables ~y = PT~x.
We can then easily sketch λ1y1² + λ2y2² = k (finding the equations of the asymptotes if
it is a hyperbola) in the y1y2-plane. Then, using the fact that

    ~x = P~y = [ ~v1  ~v2 ] [ y1 ]  = y1~v1 + y2~v2
                            [ y2 ]

we can find the y1-axis and the y2-axis in the x1x2-plane. In particular, the y1-axis in
the y1y2-plane is spanned by

    [ 1 ]
    [ 0 ]

thus the y1-axis in the x1x2-plane is spanned by

    ~x = 1~v1 + 0~v2 = ~v1

Hence, performing the change of variables rotates the y1-axis to be in the direction
of ~v1. Similarly, the y2-axis is rotated to the ~v2 direction. Therefore, we can sketch
the graph of Q(x1, x2) = k in the x1x2-plane by drawing the y1- and y2-axes in the
x1x2-plane and sketching the graph of λ1y1² + λ2y2² = k on these axes as we did in
the y1y2-plane. For this reason, the orthogonal eigenvectors ~v1, ~v2 of A are called the
principal axes for Q(x1, x2).
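This procedure is easy to check numerically. The sketch below (our own illustration, assuming NumPy is available; it is not part of the text) finds the principal axes of Q(x1, x2) = x1² + x1x2 + x2² from Example 1 and confirms that the change of variables ~y = PT~x eliminates the cross term.

```python
import numpy as np

# Symmetric matrix of Q(x1, x2) = x1^2 + x1*x2 + x2^2 (Example 1)
A = np.array([[1.0, 0.5],
              [0.5, 1.0]])

# eigh returns eigenvalues in ascending order with orthonormal eigenvectors;
# the columns of P are the principal axes of Q.
eigvals, P = np.linalg.eigh(A)

# In the new variables y = P^T x the form is lambda1*y1^2 + lambda2*y2^2,
# so P^T A P must be the diagonal matrix of eigenvalues: no cross term.
D = P.T @ A @ P
print(eigvals)   # eigenvalues 1/2 and 3/2, as in the example
```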
EXAMPLE 2  Sketch 6x1² + 6x1x2 − 2x2² = 5.

Solution: We start by finding an orthogonal matrix which diagonalizes the corresponding
symmetric matrix A. We have

    A = [ 6   3 ]
        [ 3  −2 ]

so det(A − λI) = | 6 − λ    3     |
                 | 3       −2 − λ | = (λ − 7)(λ + 3).

We find corresponding eigenvectors of A are

    [ 3 ]  for λ = 7    and    [ −1 ]  for λ = −3.
    [ 1 ]                      [  3 ]

Thus,

    P = [ 3/√10  −1/√10 ]
        [ 1/√10   3/√10 ]

orthogonally diagonalizes A.

Our work above shows that the change of variables ~y = PT~x changes
6x1² + 6x1x2 − 2x2² = 5 into 7y1² − 3y2² = 5.

To sketch the hyperbola 5 = 7y1² − 3y2² more accurately, we first find the equations
of its asymptotes. We get

    0 = 7y1² − 3y2²
    y2 = ±(√21/3) y1 ≈ ±1.528 y1
Now to sketch 6x1² + 6x1x2 − 2x2² = 5 in the x1x2-plane we first draw the y1-axis in the
direction of

    [ 3 ]
    [ 1 ]

and the y2-axis in the direction of

    [ −1 ]
    [  3 ]

Next, to be more precise, we use the change of variables ~y = PT~x to convert the
equations of the asymptotes of the hyperbola to equations in the x1x2-plane. We have

    [ y1 ]  =  (1/√10) [  3  1 ] [ x1 ]  =  (1/√10) [ 3x1 + x2  ]
    [ y2 ]             [ −1  3 ] [ x2 ]             [ −x1 + 3x2 ]

Hence, we get the equations of the asymptotes are

    y2 = ±(√21/3) y1
    (1/√10)(−x1 + 3x2) = ±(√21/3)(1/√10)(3x1 + x2)
    −3x1 + 9x2 = ±3√21 x1 ± √21 x2
    x2(9 ∓ √21) = (3 ± 3√21)x1
    x2 = (3 ± 3√21)/(9 ∓ √21) · x1

Plotting the asymptotes and copying the graph from above onto the y1 and y2 axes
gives the picture to the right.
REMARKS
1. The advantage of finding the principal axes over just calculating the angle of
rotation is that we actually get precise equations of the axes and the asymptotes
which are required in some applications.
2. Another way of converting the equations of the asymptotes is to find the direc-
tion vectors ~a1 and ~a2 of the asymptotes in the y1y2-plane and compute P~a1 and
P~a2 which will be the direction vectors of the asymptotes in the x1x2-plane.
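The direction-vector method of Remark 2 can be illustrated with the numbers from Example 2 (a sketch assuming NumPy; the direction vectors are read off from y2 = ±(√21/3)y1):

```python
import numpy as np

s10 = np.sqrt(10)
# P from Example 2; its columns are the principal axes [3,1]/sqrt(10), [-1,3]/sqrt(10)
P = np.array([[3/s10, -1/s10],
              [1/s10,  3/s10]])

# Direction vectors of the asymptotes y2 = ±(sqrt(21)/3) y1 in the y1y2-plane
a1 = np.array([3.0,  np.sqrt(21)])
a2 = np.array([3.0, -np.sqrt(21)])

# Remark 2: P maps them to the asymptote directions in the x1x2-plane
d1, d2 = P @ a1, P @ a2

# Their slopes match x2 = (3 ± 3*sqrt(21))/(9 ∓ sqrt(21)) x1 from Example 2
slope1 = d1[1] / d1[0]
slope2 = d2[1] / d2[0]
print(slope1, slope2)
```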
EXAMPLE 3  Sketch 2x1² + 4x1x2 + 5x2² = 1.

Solution: We have

    A = [ 2  2 ]
        [ 2  5 ]

so det(A − λI) = | 2 − λ   2     |
                 | 2       5 − λ | = (λ − 6)(λ − 1).

We find corresponding eigenvectors of A are

    [ 1 ]  for λ = 6    and    [ −2 ]  for λ = 1.
    [ 2 ]                      [  1 ]

To demonstrate a rotation and a reflection, we will pick

    P = [ −2/√5  1/√5 ]
        [  1/√5  2/√5 ]

The change of variables ~y = PT~x changes 2x1² + 4x1x2 + 5x2² = 1 into the ellipse
y1² + 6y2² = 1. Graphing gives the picture to the right.

To sketch 2x1² + 4x1x2 + 5x2² = 1 in the x1x2-plane we draw the y1-axis in the
direction of

    [ −2 ]
    [  1 ]

and the y2-axis in the direction of

    [ 1 ]
    [ 2 ]

and copy the picture above onto the axes appropriately to get the picture to the right.
EXAMPLE 4  Sketch x1² + 4x1x2 − 2x2² = 1.

Solution: We have

    A = [ 1   2 ]
        [ 2  −2 ]

so det(A − λI) = | 1 − λ   2     |
                 | 2      −2 − λ | = (λ + 3)(λ − 2).

We find corresponding eigenvectors of A are

    [ −1 ]  for λ = −3    and    [ 2 ]  for λ = 2.
    [  2 ]                       [ 1 ]

Thus,

    P = [ −1/√5  2/√5 ]
        [  2/√5  1/√5 ]

orthogonally diagonalizes A.

Then, the change of variables ~y = PT~x changes x1² + 4x1x2 − 2x2² = 1 into
−3y1² + 2y2² = 1. This is a hyperbola, so we need to find the equations of its
asymptotes. We get

    0 = −3y1² + 2y2²
    y2 = ±(√6/2) y1 ≈ ±1.225 y1
To sketch x1² + 4x1x2 − 2x2² = 1 in the x1x2-plane we first draw the y1-axis in the
direction of

    [ −1 ]
    [  2 ]

and the y2-axis in the direction of

    [ 2 ]
    [ 1 ]

Next, we use the change of variables ~y = PT~x to convert the equations of the
asymptotes of the hyperbola to equations in the x1x2-plane. We have

    [ y1 ]  =  (1/√5) [ −1  2 ] [ x1 ]  =  (1/√5) [ −x1 + 2x2 ]
    [ y2 ]            [  2  1 ] [ x2 ]            [ 2x1 + x2  ]

Hence, we get the equations of the asymptotes are

    y2 = ±(√6/2) y1
    (1/√5)(2x1 + x2) = ±(√6/2)(1/√5)(−x1 + 2x2)
    4x1 + 2x2 = ∓√6 x1 ± 2√6 x2
    x2(2 ∓ 2√6) = (−4 ∓ √6)x1
    x2 = (−4 ∓ √6)/(2 ∓ 2√6) · x1

Plotting the asymptotes and copying the graph from above onto the y1 and y2 axes
gives the picture to the right.
Notice that in the last two examples, the change of variables we used actually
corresponded to a reflection and a rotation.
Section 10.4 Problems

1. Sketch the graph of each of the following equations showing both the original
   and new axes. For any hyperbola, find the equations of the asymptotes.

   (a) x1² + 8x1x2 + x2² = 6          (b) 3x1² − 2x1x2 + 3x2² = 12
   (c) −4x1² + 4x1x2 − 7x2² = −8      (d) −3x1² − 4x1x2 = 4
   (e) −x1² + 4x1x2 + 2x2² = 6        (f) 4x1² + 4x1x2 + 4x2² = 12
   (g) 2x1² − 4x1x2 − x2² = 6         (h) 3x1² + 4x1x2 + 3x2² = 5
Section 10.5 Optimizing Quadratic Forms 305
10.5 Optimizing Quadratic Forms
We use quadratic forms in calculus to classify critical points as local minimums and
maximums. However, in many other applications of quadratic forms, we actually
want to find the maximum and/or minimum of the quadratic form subject to a constraint.
Most of the time, it is possible to use a change of variables so that the constraint is
‖~x‖ = 1. That is, given a quadratic form Q(~x) = ~xT A~x on Rn we want to find
the maximum and minimum value of Q(~x) subject to ‖~x‖ = 1. For ease, we instead
use the equivalent constraint

    1 = ‖~x‖² = x1² + · · · + xn²
To develop a procedure for solving this problem in general, we begin by looking at a
couple of examples.
EXAMPLE 1  Find the maximum and minimum of Q(x1, x2) = 2x1² + 3x2² subject to the constraint
x1² + x2² = 1.

Solution: Since we don't have a general method for solving this type of problem, we
resort to the basics. We will first try a few points (x1, x2) that satisfy the constraint.
We get

    Q(1, 0) = 2
    Q(√3/2, 1/2) = 9/4
    Q(1/√2, 1/√2) = 5/2
    Q(1/2, √3/2) = 11/4
    Q(0, 1) = 3

Although we have only tried a few points, we may guess that 3 is the maximum value.
Since we have already found that Q(0, 1) = 3, to prove that 3 is the maximum we just
need to show that 3 is an upper bound for Q(x1, x2) subject to x1² + x2² = 1. Indeed,
we have that

    Q(x1, x2) = 2x1² + 3x2² ≤ 3x1² + 3x2² = 3(x1² + x2²) = 3

Hence, 3 is the maximum.

Similarly, we have

    Q(x1, x2) = 2x1² + 3x2² ≥ 2x1² + 2x2² = 2(x1² + x2²) = 2

Hence, 2 is a lower bound for Q(x1, x2) subject to the constraint and Q(1, 0) = 2, so
2 is the minimum.
This example was extremely easy since we had no cross-terms in the quadratic form.
We now look at what happens if we have cross-terms.
EXAMPLE 2  Find the maximum and minimum of Q(x1, x2) = 7x1² + 8x1x2 + x2² subject to the
constraint x1² + x2² = 1.

Solution: We first try some values of (x1, x2) that satisfy the constraint. We have

    Q(1, 0) = 7
    Q(1/√2, 1/√2) = 8
    Q(0, 1) = 1
    Q(−1/√2, 1/√2) = 0
    Q(−1, 0) = 7

From this we might think that the maximum is 8 and the minimum is 0, although we
again realize that we have tested very few points. Indeed, if we try more points, we
quickly find that these are not the maximum and minimum. So, instead of testing
points, we need to try to think of where the maximum and minimum should occur.
In the previous example, they occurred at (1, 0) and (0, 1) which are on the principal
axes. Thus, it makes sense to consider the principal axes of this quadratic form as
well.

We find that

    A = [ 7  4 ]
        [ 4  1 ]

has eigenvalues λ1 = 9 and λ2 = −1 with corresponding unit eigenvectors

    ~v1 = [ 2/√5 ]    and    ~v2 = [  1/√5 ]
          [ 1/√5 ]                 [ −2/√5 ]

Taking these for (x1, x2) we get

    Q(2/√5, 1/√5) = 9,    Q(1/√5, −2/√5) = −1

Moreover, taking

    ~y = PT~x = (1/√5) [ 2   1 ] [ x1 ]  = (1/√5) [ 2x1 + x2 ]
                       [ 1  −2 ] [ x2 ]           [ x1 − 2x2 ]

we get

    Q(x1, x2) = ~yT D~y = 9((2/√5)x1 + (1/√5)x2)² − ((1/√5)x1 − (2/√5)x2)²
              ≤ 9((2/√5)x1 + (1/√5)x2)² + 9((1/√5)x1 − (2/√5)x2)²
              = 9[((2/√5)x1 + (1/√5)x2)² + ((1/√5)x1 − (2/√5)x2)²]
              = 9

since

    ((2/√5)x1 + (1/√5)x2)² + ((1/√5)x1 − (2/√5)x2)² = x1² + x2² = 1

Similarly, we can show that Q(x1, x2) ≥ −1. Thus, the maximum of Q(x1, x2) subject
to ‖~x‖ = 1 is 9 and the minimum is −1.
We generalize what we learned in the examples above to get the following theorem.
THEOREM 1 Let Q(~x) be a quadratic form on Rn with corresponding symmetric matrix A. The
maximum value and minimum value of Q(~x) subject to the constraint ‖~x ‖ = 1 are the
greatest and least eigenvalues of A respectively. Moreover, these values occur when
~x is taken to be a unit eigenvector corresponding to the eigenvalue.
Proof: Since A is symmetric, we can find an orthogonal matrix P = [ ~v1 · · · ~vn ]
which orthogonally diagonalizes A, where ~v1, . . . , ~vn are arranged so that the
corresponding eigenvalues λ1, . . . , λn satisfy λ1 ≥ λ2 ≥ · · · ≥ λn.

Hence, there exists a diagonal matrix D such that

    ~xT A~x = ~yT D~y

where ~x = P~y. Since P is orthogonal we have that

    ‖~y‖ = ‖P~y‖ = ‖~x‖ = 1

Thus, the quadratic form ~yT D~y subject to ‖~y‖ = 1 takes the same set of values as the
quadratic form ~xT A~x subject to ‖~x‖ = 1. Hence, we just need to find the maximum
and minimum of ~yT D~y subject to ‖~y‖ = 1.

Since y1² + · · · + yn² = ‖~y‖² = 1, we have

    ~yT D~y = λ1y1² + · · · + λnyn² ≤ λ1y1² + · · · + λ1yn² = λ1(y1² + · · · + yn²) = λ1

and ~yT D~y = λ1 with ~y = ~e1. Hence, λ1 is the maximum value, with ~x = P~e1 = ~v1.
Similarly,

    ~yT D~y = λ1y1² + · · · + λnyn² ≥ λny1² + · · · + λnyn² = λn(y1² + · · · + yn²) = λn

and ~yT D~y = λn with ~y = ~en. Hence, λn is the minimum value, with ~x = P~en = ~vn. ∎
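The theorem is easy to check numerically. The sketch below (our own illustration, assuming NumPy) computes the extreme values of the quadratic form from Example 2 of this section and verifies that Q at random unit vectors never escapes the interval they bound.

```python
import numpy as np

# Symmetric matrix of Q(x1, x2) = 7x1^2 + 8x1x2 + x2^2 (Example 2)
A = np.array([[7.0, 4.0],
              [4.0, 1.0]])

eigvals, P = np.linalg.eigh(A)           # ascending order
q_min, q_max = eigvals[0], eigvals[-1]   # least and greatest eigenvalues

# Q at random unit vectors stays within [q_min, q_max]
rng = np.random.default_rng(0)
for _ in range(1000):
    x = rng.standard_normal(2)
    x /= np.linalg.norm(x)
    q = x @ A @ x
    assert q_min - 1e-9 <= q <= q_max + 1e-9

print(q_min, q_max)   # approximately -1 and 9, as found in Example 2
```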
EXAMPLE 3  Let

    A = [ −3  0  −1 ]
        [  0  3   0 ]
        [ −1  0  −3 ]

Find the maximum and minimum value of the quadratic form Q(~x) = ~xT A~x subject
to x1² + x2² + x3² = 1.

Solution: The characteristic polynomial of A is

    C(λ) = | −3 − λ   0       −1     |
           |  0       3 − λ    0     |
           | −1       0       −3 − λ | = −(λ − 3)(λ + 4)(λ + 2)

By Theorem 1, the maximum of Q(~x) subject to the constraint is 3 and the minimum
is −4.
EXAMPLE 4  Find the maximum and minimum value of the quadratic form

    Q(x1, x2, x3) = 4x1² + 2x1x2 + 2x1x3 + 4x2² − 2x2x3 + 4x3²

subject to x1² + x2² + x3² = 1. Find vectors at which the maximum and minimum occur.

Solution: The characteristic polynomial of the corresponding symmetric matrix is

    C(λ) = | 4 − λ   1       1     |
           | 1       4 − λ  −1     |
           | 1      −1       4 − λ | = −(λ − 5)²(λ − 2)

We find that unit eigenvectors corresponding to λ1 = 5 and λ2 = 2 are

    ~v1 = [ 1/√2 ]    and    ~v2 = [ −1/√3 ]
          [ 1/√2 ]                 [  1/√3 ]
          [ 0    ]                 [  1/√3 ]

respectively. By Theorem 1, the maximum of Q(~x) subject to x1² + x2² + x3² = 1 is 5
and occurs when ~x = ~v1, and the minimum is 2 and occurs when ~x = ~v2.
Section 10.5 Problems

1. Find the maximum and minimum of each quadratic form subject to ‖~x‖ = 1.

   (a) Q(~x) = 4x1² − 2x1x2 + 4x2²
   (b) Q(~x) = 3x1² + 10x1x2 − 3x2²
   (c) Q(~x) = −x1² + 8x2² + 4x2x3 + 11x3²
   (d) Q(~x) = 3x1² − 4x1x2 + 8x1x3 + 6x2² + 4x2x3 + 3x3²
   (e) Q(~x) = 2x1² + 2x1x2 + 2x1x3 + 2x2² + 2x2x3 + 2x3²
Section 10.6 Singular Value Decomposition 309
10.6 Singular Value Decomposition
We have now seen that finding the maximum and minimum of a quadratic form sub-
ject to ‖~x ‖ = 1 is very easy. However, in many applications we are not lucky enough
to get a symmetric matrix; in fact, we often do not even get a square matrix. Thus, it
is very desirable to derive a similar method for finding the maximum and minimum
of ‖A~x ‖ for an m × n matrix A subject to ‖~x ‖ = 1. This derivation will lead us to a
very important matrix factorization known as the singular value decomposition. We
again begin by looking at an example.
EXAMPLE 1  Consider a linear mapping f : R3 → R2 defined by f(~x) = A~x with

    A = [ 1  0   1 ]
        [ 1  1  −1 ]

Since the range is now a subspace of R2, we want to find the maximum length of A~x
subject to ‖~x‖ = 1. That is, we want to maximize ‖A~x‖. Observe that

    ‖A~x‖² = (A~x)T(A~x) = ~xT AT A~x

Since AT A is symmetric, this is a quadratic form. Hence, using Theorem 10.5.1, to
maximize ‖A~x‖ we just need to find the square root of the largest eigenvalue of AT A.
We have

    AT A = [ 2   1   0 ]
           [ 1   1  −1 ]
           [ 0  −1   2 ]

The characteristic polynomial of AT A is C(λ) = −λ(λ − 3)(λ − 2). Thus, the eigenvalues
of AT A are 3, 2, and 0. Hence, the maximum of ‖A~x‖ subject to ‖~x‖ = 1 is √3
and the minimum is 0. Moreover, these occur at the corresponding eigenvectors

    [ −1/√3 ]         [ −1/√6 ]
    [ −1/√3 ]   and   [  2/√6 ]
    [  1/√3 ]         [  1/√6 ]

of AT A.
In the example, we found that the eigenvalues of AT A were all non-negative. This
was important as we had to take the square root of them to maximize/minimize ‖A~x‖.
We now prove that this is always the case.
THEOREM 1  If A is an m × n matrix and λ1, . . . , λn are the eigenvalues of AT A with corresponding
unit eigenvectors ~v1, . . . , ~vn, then λ1, . . . , λn are all non-negative. In particular,

    ‖A~vi‖ = √λi,   1 ≤ i ≤ n

Proof: For 1 ≤ i ≤ n we have AT A~vi = λi~vi and hence

    ‖A~vi‖² = (A~vi)T A~vi = ~viT AT A~vi = ~viT(λi~vi) = λi~viT~vi = λi‖~vi‖² = λi

since ‖~vi‖ = 1. ∎
Observe from the example above that the square roots of the largest and smallest
eigenvalues of AT A are the maximum and minimum values of ‖A~x‖ subject to
‖~x‖ = 1. So, these are behaving like the eigenvalues of a symmetric matrix. This
motivates the following definition.
DEFINITION
Singular Values
The singular values σ1, . . . , σn of an m × n matrix A are the square roots of the
eigenvalues of AT A arranged so that σ1 ≥ σ2 ≥ · · · ≥ σn ≥ 0.
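This definition can be checked against a library routine (a sketch assuming NumPy; the matrix is B from Example 3 below, whose singular values turn out to be √7 and √2):

```python
import numpy as np

B = np.array([[1.0, 1.0],
              [-1.0, 2.0],
              [-1.0, 1.0]])

# Singular values: square roots of the eigenvalues of B^T B, greatest first
eigvals = np.linalg.eigvalsh(B.T @ B)   # ascending order
sigmas = np.sqrt(eigvals[::-1])

print(sigmas)                                 # [sqrt(7), sqrt(2)]
print(np.linalg.svd(B, compute_uv=False))     # same values from NumPy's SVD
```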
EXAMPLE 2  Find the singular values of

    A = [ 1  1  1 ]
        [ 2  2  2 ]

Solution: We have

    AT A = [ 5  5  5 ]
           [ 5  5  5 ]
           [ 5  5  5 ]

The characteristic polynomial is C(λ) = −λ²(λ − 15), so the eigenvalues are (from
greatest to least) λ1 = 15, λ2 = 0, and λ3 = 0. Thus, the singular values of A are
σ1 = √15, σ2 = 0, and σ3 = 0.
EXAMPLE 3  Find the singular values of

    B = [  1  1 ]
        [ −1  2 ]
        [ −1  1 ]

Solution: We have

    BT B = [  3  −2 ]
           [ −2   6 ]

The characteristic polynomial is C(λ) = (λ − 2)(λ − 7). Hence, the eigenvalues of BT B
are λ1 = 7 and λ2 = 2, so the singular values of B are σ1 = √7 and σ2 = √2.
EXAMPLE 4  Find the singular values of

    A = [ 2  −1 ]              B = [  2  1  1 ]
        [ 1   0 ]     and          [ −1  0  2 ]
        [ 1   2 ]

Solution: We have

    AT A = [ 6  0 ]
           [ 0  5 ]

Hence, the eigenvalues of AT A are λ1 = 6 and λ2 = 5. Thus, the singular values of A
are σ1 = √6 and σ2 = √5.

We have

    BT B = [ 5  2  0 ]
           [ 2  1  1 ]
           [ 0  1  5 ]

The characteristic polynomial of BT B is

    C(λ) = | 5 − λ   2       0     |
           | 2       1 − λ   1     |
           | 0       1       5 − λ | = −λ(λ − 5)(λ − 6)

Hence, the eigenvalues of BT B are λ1 = 6, λ2 = 5, and λ3 = 0. Thus, the singular
values of B are σ1 = √6, σ2 = √5, and σ3 = 0.
EXERCISE 1  Find the singular values of

    A = [ 1  −1 ]              B = [ 1  −1  1 ]
        [ 3   1 ]     and          [ 0   2  2 ]
        [ 1   1 ]
It is easy to show that the number of non-zero eigenvalues of AT A is the rank of AT A.
Thus, it is natural to ask if there is a relationship between the number of non-zero
singular values of an m × n matrix A and the rank of A. In the problems above, we see
that the rank of A equals the number of non-zero singular values. To prove
this result in general, we use the following lemmas.
LEMMA 2  If A is an m × n matrix, then Null(AT A) = Null(A).

Proof: If A~x = ~0, then AT A~x = AT~0 = ~0. Hence, the nullspace of A is a subset of the
nullspace of AT A. On the other hand, consider AT A~x = ~0. Then,

    ‖A~x‖² = (A~x) · (A~x) = (A~x)T(A~x) = ~xT AT A~x = ~xT~0 = 0

Thus, A~x = ~0. Hence, the nullspace of AT A is also a subset of the nullspace of A, and
the result follows. ∎
LEMMA 3  If A is an m × n matrix, then rank(AT A) = rank(A).

Proof: Using Lemma 2, we get that dim(Null(AT A)) = dim(Null(A)). Thus, the
Dimension Theorem gives

    rank(AT A) = n − dim(Null(AT A)) = n − dim(Null(A)) = rank(A)

as required. ∎
COROLLARY 4 If A is an m × n matrix and rank(A) = r, then A has r non-zero singular values.
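Corollary 4 can be illustrated numerically (a sketch assuming NumPy; the tolerance cutoff is our own choice, since floating-point zero singular values are only approximate). The matrix is the rank-1 matrix of Example 2.

```python
import numpy as np

A = np.array([[1.0, 1.0, 1.0],
              [2.0, 2.0, 2.0]])   # rank 1: the rows are parallel

sigmas = np.linalg.svd(A, compute_uv=False)
nonzero = int(np.sum(sigmas > 1e-10))   # count singular values above tolerance

print(nonzero, np.linalg.matrix_rank(A))   # both equal 1
```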
We now want to extend the similarity of singular values and eigenvalues by defining
singular vectors. Observe that if A is an m × n matrix with m ≠ n, then we cannot
have A~v = σ~v since A~v ∈ Rm. Hence, the best we can do to match the definition of
eigenvalues and eigenvectors is to pick suitable non-zero vectors ~v ∈ Rn and ~u ∈ Rm
such that A~v = σ~u.

By definition, for any non-zero singular value σ of A there is a vector ~v ≠ ~0 such
that AT A~v = σ²~v. Thus, if we have A~v = σ~u, then we have AT A~v = AT(σ~u), so
σ²~v = σAT~u. Dividing by σ, we see that we must also have AT~u = σ~v. Moreover, by
Theorem 1, if ~v is a unit eigenvector of AT A, then we get that ~u = (1/σ)A~v is also a
unit vector.
Therefore, for a non-zero singular value σ of A, we will want unit vectors ~v and ~u
such that

    A~v = σ~u   and   AT~u = σ~v

However, our derivation does not work for σ = 0. In this case, we only require that
one of these conditions is satisfied.
DEFINITION
Singular Vectors

Let A be an m × n matrix. If ~v ∈ Rn and ~u ∈ Rm are unit vectors and σ ≠ 0 is a
singular value of A such that

    A~v = σ~u   and   AT~u = σ~v

then we say that ~u is a left singular vector of A corresponding to σ and ~v is a right
singular vector of A corresponding to σ. For σ = 0, if ~u is a unit vector such that
AT~u = ~0, then ~u is a left singular vector of A corresponding to σ = 0. If ~v is a
unit vector such that A~v = ~0, then ~v is a right singular vector of A corresponding to
σ = 0.
Our derivation above the definition proves the following theorem which we will make
frequent use of.
THEOREM 5  Let A be an m × n matrix. If ~v ∈ Rn is a unit eigenvector of AT A corresponding
to a non-zero singular value σ of A, then ~u = (1/σ)A~v is a left singular vector of A
corresponding to σ.
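Theorem 5 can be checked directly (a sketch assuming NumPy, using B from Example 3): the vector ~u = (1/σ)B~v is a unit vector and satisfies both conditions in the definition of singular vectors.

```python
import numpy as np

B = np.array([[1.0, 1.0],
              [-1.0, 2.0],
              [-1.0, 1.0]])

eigvals, V = np.linalg.eigh(B.T @ B)   # ascending order
sigma = np.sqrt(eigvals[-1])           # largest singular value, sqrt(7)
v = V[:, -1]                           # unit eigenvector for lambda = 7

u = B @ v / sigma                      # Theorem 5: a left singular vector

assert np.isclose(np.linalg.norm(u), 1.0)   # u is a unit vector
assert np.allclose(B @ v, sigma * u)        # B v = sigma u
assert np.allclose(B.T @ u, sigma * v)      # B^T u = sigma v
```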
The definition of right singular vectors not only preserves our relationship between
these vectors and the eigenvectors of AT A, but also gives us the corresponding result
for the left singular vectors and the eigenvectors of AAT.
THEOREM 6 Let A be an m × n matrix. A vector ~v ∈ Rn is a right singular vector of A if and only
if ~v is a unit eigenvector of AT A. A vector ~u ∈ Rm is a left singular vector of A if and
only if ~u is a unit eigenvector of AAT .
Now that we have singular values and singular vectors mimicking eigenvalues and
eigenvectors, we wish to try to mimic orthogonal diagonalization. Let A be an m × n
matrix. As was the case with singular vectors, if m ≠ n, then PT AP is not defined for
a square matrix P. Thus, we want to find an m × m orthogonal matrix U and an n × n
orthogonal matrix V such that UT AV = Σ, where Σ is an m × n "diagonal" matrix.

To do this, we require that for any m × n matrix A there exists an orthonormal basis for
Rn of right singular vectors and an orthonormal basis for Rm of left singular vectors.
We now prove that these always exist.
LEMMA 7  Let A be an m × n matrix with rank(A) = r and singular values σ1, . . . , σn. If
{~v1, . . . , ~vn} is an orthonormal basis for Rn consisting of eigenvectors of AT A
arranged so that the corresponding eigenvalues are arranged from greatest to least, then

    { (1/σ1)A~v1, . . . , (1/σr)A~vr }

is an orthonormal basis for Col(A).

Proof: Since A has rank r, we know that AT A has rank r and so AT A has r non-zero
eigenvalues. Thus, σ1, . . . , σr are the r non-zero singular values of A.

Now, observe that for 1 ≤ i, j ≤ r, i ≠ j, we have

    A~vi · A~vj = (A~vi)T A~vj = ~viT AT A~vj = ~viT λj~vj = λj ~viT ~vj = 0

because {~v1, . . . , ~vr} is orthogonal. So, {A~v1, . . . , A~vr} is also orthogonal. Theorem 1
then implies that

    { (1/σ1)A~v1, . . . , (1/σr)A~vr }

is orthonormal. Since dim Col(A) = r, Theorem 4.2.3 implies that this is an
orthonormal basis for Col(A). ∎
THEOREM 8  If A is an m × n matrix with rank r and singular values σ1, . . . , σn, then there exists an
orthonormal basis {~v1, . . . , ~vn} for Rn of right singular vectors of A and an orthonormal
basis {~u1, . . . , ~um} for Rm of left singular vectors of A such that

    A~vi = σi~ui,   1 ≤ i ≤ min(m, n)

Proof: By Theorem 6, unit eigenvectors of AT A are right singular vectors of A.
Hence, since AT A is symmetric, we can find an orthonormal basis {~v1, . . . , ~vn} of
right singular vectors of A.

Let

    ~ui = (1/σi)A~vi,   1 ≤ i ≤ r

By Lemma 7, {~u1, . . . , ~ur} is an orthonormal basis for Col(A). Also, by definition, a
left singular vector ~uj of A corresponding to σ = 0 lies in the nullspace of AT. Hence,
by the Fundamental Theorem of Linear Algebra, if {~ur+1, . . . , ~um} is an orthonormal
basis for the nullspace of AT, then {~u1, . . . , ~um} is an orthonormal basis for Rm of left
singular vectors of A. ∎
EXAMPLE 5  Let

    B = [  1  1 ]
        [ −1  2 ]
        [ −1  1 ]

Find an orthonormal basis for R2 of right singular vectors of B and an orthonormal
basis for R3 of left singular vectors of B.

Solution: By Theorem 6, we can find an orthonormal basis for R2 of right singular
vectors of B by finding an orthonormal basis for R2 of eigenvectors of BT B.

In Example 3, we found that

    BT B = [  3  −2 ]
           [ −2   6 ]

with eigenvalues λ1 = 7 and λ2 = 2.

We have

    BT B − 7I = [ −4  −2 ]  ~  [ 1  1/2 ]
                [ −2  −1 ]     [ 0  0   ]

Thus, a unit eigenvector corresponding to λ1 = 7 is

    ~v1 = [ −1/√5 ]
          [  2/√5 ]

We have

    BT B − 2I = [  1  −2 ]  ~  [ 1  −2 ]
                [ −2   4 ]     [ 0   0 ]

Thus, a unit eigenvector corresponding to λ2 = 2 is

    ~v2 = [ 2/√5 ]
          [ 1/√5 ]

Hence, an orthonormal basis for R2 of right singular vectors of B is {~v1, ~v2}.

To find an orthonormal basis for R3 of left singular vectors, we follow the steps in
the proof of Theorem 8.

We first find an orthonormal basis for Col(B) using Lemma 7. We get

    ~u1 = (1/σ1)B~v1 = (1/√7) [  1  1 ] [ −1/√5 ]  =  [ 1/√35 ]
                              [ −1  2 ] [  2/√5 ]     [ 5/√35 ]
                              [ −1  1 ]               [ 3/√35 ]

    ~u2 = (1/σ2)B~v2 = (1/√2) [  1  1 ] [ 2/√5 ]  =  [  3/√10 ]
                              [ −1  2 ] [ 1/√5 ]     [  0     ]
                              [ −1  1 ]              [ −1/√10 ]

So, {~u1, ~u2} is an orthonormal basis for Col(B). To complete the orthonormal basis
for R3, we add an orthonormal basis for Null(BT). We have

    BT ~ [ 1  0  −1/3 ]
         [ 0  1   2/3 ]

Thus, if we let

    ~u3 = [  1/√14 ]
          [ −2/√14 ]
          [  3/√14 ]

we get that {~u3} is an orthonormal basis for Null(BT) and {~u1, ~u2, ~u3} is an
orthonormal basis for R3 of left singular vectors of B.
EXERCISE 2  Let

    A = [ 1  −1  −1 ]
        [ 1   2   1 ]

Find an orthonormal basis for R3 of right singular vectors of A and an orthonormal
basis for R2 of left singular vectors of A. Compare to the solution of Example 5.
Let A ∈ Mm×n(R) with rank(A) = r. Let {~v1, . . . , ~vn} be an orthonormal basis for Rn of
right singular vectors of A, let σ1, . . . , σr be the non-zero singular values of A, and let
{~u1, . . . , ~um} be an orthonormal basis for Rm of left singular vectors of A constructed
as in Theorem 8. Observe that we have

    A[ ~v1 · · · ~vn ] = [ A~v1 · · · A~vr  A~vr+1 · · · A~vn ]
                       = [ σ1~u1 · · · σr~ur  ~0 · · · ~0 ]
                       = [ ~u1 · · · ~um ]Σ

where Σ is the m × n matrix with (Σ)ii = σi for 1 ≤ i ≤ r and all other entries 0.

Hence, we have that there exists an orthogonal matrix V and an orthogonal matrix U
such that AV = UΣ. Instead of writing this as UT AV = Σ, we typically write it as
A = UΣVT to get a matrix decomposition of A.
DEFINITION
Singular Value Decomposition (SVD)

Let A be an m × n matrix with rank(A) = r and with non-zero singular values
σ1, . . . , σr. A singular value decomposition of A is a factorization of the form

    A = UΣVT

where U is an orthogonal matrix containing left singular vectors of A, V is an
orthogonal matrix containing right singular vectors of A, and Σ is the m × n matrix
with (Σ)ii = σi for 1 ≤ i ≤ r and all other entries of Σ are 0.
ALGORITHM
To find a singular value decomposition of a matrix A with rank r:

(1) Find the eigenvalues λ1, . . . , λn of AT A arranged from greatest to least and a
    corresponding set of orthonormal eigenvectors {~v1, . . . , ~vn}. Define σi = √λi
    for 1 ≤ i ≤ r.

(2) Let V = [ ~v1 · · · ~vn ] and let Σ be the m × n matrix such that (Σ)ii = σi for
    1 ≤ i ≤ r and all other entries of Σ are 0.

(3) Find left singular vectors of A by computing ~ui = (1/σi)A~vi for 1 ≤ i ≤ r, and
    then extend the set {~u1, . . . , ~ur} to an orthonormal basis {~u1, . . . , ~um} for Rm.
    Take U = [ ~u1 · · · ~um ].

Then, A = UΣVT.
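The three steps of the algorithm can be sketched in code (our own helper, assuming NumPy; the QR-based extension in step (3) is just one convenient way to complete the basis, where the text instead uses a basis for Null(AT)):

```python
import numpy as np

def svd_via_eigenvectors(A, tol=1e-12):
    """Sketch of the three-step algorithm above for a real matrix A:
    (1) eigen-decompose A^T A, (2) build V and Sigma,
    (3) u_i = (1/sigma_i) A v_i, extended to an orthonormal basis of R^m."""
    m, n = A.shape
    eigvals, V = np.linalg.eigh(A.T @ A)      # ascending order
    order = np.argsort(eigvals)[::-1]         # rearrange greatest to least
    eigvals, V = eigvals[order], V[:, order]
    sigmas = np.sqrt(np.clip(eigvals, 0.0, None))
    r = int(np.sum(sigmas > tol))             # rank = # non-zero singular values

    Sigma = np.zeros((m, n))
    Sigma[:r, :r] = np.diag(sigmas[:r])

    U_r = A @ V[:, :r] / sigmas[:r]           # left singular vectors u_1..u_r
    # Extend {u_1, ..., u_r} to an orthonormal basis for R^m via QR; QR may
    # flip signs of the first r columns, so realign them with U_r afterwards.
    U, _ = np.linalg.qr(np.hstack([U_r, np.eye(m)]))
    U = U[:, :m]
    for i in range(r):
        if U[:, i] @ U_r[:, i] < 0:
            U[:, i] = -U[:, i]
    return U, Sigma, V

# The matrix of Example 6 below; its singular values are 4 and sqrt(6)
A = np.array([[1.0, -1.0, 3.0],
              [3.0,  1.0, 1.0]])
U, Sigma, V = svd_via_eigenvectors(A)
print(np.diag(Sigma))                    # 4 and sqrt(6) ≈ 2.449
print(np.allclose(U @ Sigma @ V.T, A))   # True
```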
REMARK
As indicated in the proof of Theorem 8, one way to extend {~u1, . . . , ~ur} to an
orthonormal basis for Rm is by finding an orthonormal basis for Null(AT).
EXAMPLE 6  Find a singular value decomposition of

    A = [ 1  −1  3 ]
        [ 3   1  1 ]

Solution: We find that

    AT A = [ 10   2   6 ]
           [  2   2  −2 ]
           [  6  −2  10 ]

has eigenvalues (ordered from greatest to least) λ1 = 16, λ2 = 6, and λ3 = 0.
Corresponding orthonormal eigenvectors are

    ~v1 = [ 1/√2 ]     ~v2 = [  1/√3 ]     ~v3 = [  1/√6 ]
          [ 0    ]           [  1/√3 ]           [ −2/√6 ]
          [ 1/√2 ]           [ −1/√3 ]           [ −1/√6 ]

So, the singular values of A are σ1 = √16 = 4 and σ2 = √6.

We define

    V = [ 1/√2   1/√3   1/√6 ]
        [ 0      1/√3  −2/√6 ]     and     Σ = [ 4  0   0 ]
        [ 1/√2  −1/√3  −1/√6 ]                 [ 0  √6  0 ]

To find the left singular vectors we compute

    ~u1 = (1/σ1)A~v1 = (1/4) [ 4/√2 ]  =  [ 1/√2 ]
                             [ 4/√2 ]     [ 1/√2 ]

    ~u2 = (1/σ2)A~v2 = (1/√6) [ −3/√3 ]  =  [ −1/√2 ]
                              [  3/√3 ]     [  1/√2 ]

Since this forms a basis for R2, we take

    U = [ 1/√2  −1/√2 ]
        [ 1/√2   1/√2 ]

Then, we have a singular value decomposition A = UΣVT of A.
EXAMPLE 7  Find a singular value decomposition of

    B = [  2  −4 ]
        [  2   2 ]
        [ −4   0 ]
        [  1   4 ]

Solution: We have

    BT B = [ 25   0 ]
           [  0  36 ]

The eigenvalues are λ1 = 36 and λ2 = 25 with corresponding orthonormal eigenvectors

    [ 0 ]    and    [ 1 ]
    [ 1 ]           [ 0 ]

Consequently, the singular values of B are σ1 = √36 = 6 and σ2 = √25 = 5.

We define

    V = [ 0  1 ]     and     Σ = [ 6  0 ]
        [ 1  0 ]                 [ 0  5 ]
                                 [ 0  0 ]
                                 [ 0  0 ]
Next, we compute

    ~u1 = (1/σ1)B~v1 = (1/6) [ −4 ]         ~u2 = (1/σ2)B~v2 = (1/5) [  2 ]
                             [  2 ]                                  [  2 ]
                             [  0 ]                                  [ −4 ]
                             [  4 ]                                  [  1 ]

But, we only have two orthogonal vectors in R4, so we need to extend {~u1, ~u2} to an
orthonormal basis {~u1, ~u2, ~u3, ~u4} for R4. We know that we can do this by finding an
orthonormal basis for Null(BT). We apply the Gram-Schmidt algorithm to a basis for
Null(BT) to get vectors

    ~u3 = (1/3) [  1 ]         ~u4 = (1/15) [ 8 ]
                [ −2 ]                      [ 8 ]
                [  0 ]                      [ 9 ]
                [  2 ]                      [ 4 ]

We let U = [ ~u1  ~u2  ~u3  ~u4 ] and we get B = UΣVT.
EXAMPLE 8  Find a singular value decomposition for

    A = [  1   1 ]
        [  2   2 ]
        [ −1  −1 ]

Solution: We have

    AT A = [ 6  6 ]
           [ 6  6 ]

Thus, by inspection, the eigenvalues of AT A are λ1 = 12 and λ2 = 0, with
corresponding unit eigenvectors

    ~v1 = [ 1/√2 ]    and    ~v2 = [ −1/√2 ]
          [ 1/√2 ]                 [  1/√2 ]

respectively. Hence, the singular values of A are σ1 = √12 and σ2 = 0. Thus, we have

    V = [ 1/√2  −1/√2 ]          Σ = [ √12  0 ]
        [ 1/√2   1/√2 ]              [ 0    0 ]
                                     [ 0    0 ]

We compute

    ~u1 = (1/σ1)A~v1 = (1/√24) [  2 ]
                               [  4 ]
                               [ −2 ]

We now need to extend {~u1} to an orthonormal basis {~u1, ~u2, ~u3} for R3. To do this, we
find an orthonormal basis for Null(AT).

We observe that a basis for Null(AT) is

    { [ −2 ]   [ 1 ] }
    { [  1 ] , [ 0 ] }
    { [  0 ]   [ 1 ] }

Applying the Gram-Schmidt algorithm to this basis and normalizing gives

    ~u2 = (1/√5) [ −2 ]         ~u3 = (1/√30) [ 1 ]
                 [  1 ]                       [ 2 ]
                 [  0 ]                       [ 5 ]

We let U = [ ~u1  ~u2  ~u3 ] and we have a singular value decomposition UΣVT of A.
Section 10.6 Problems

1. Find a singular value decomposition for the following matrices.

   (a) [  2  2 ]      (b) [ 3  −4 ]      (c) [ 2  3 ]
       [ −1  2 ]          [ 8   6 ]          [ 0  2 ]

   (d) [ 1  3 ]       (e) [ 2  −1 ]      (f) [  1  0   1 ]
       [ 2  1 ]           [ 2  −1 ]          [  1  1  −1 ]
       [ 1  1 ]           [ 2  −1 ]          [ −1  1   0 ]
                                             [  0  1   1 ]

   (g) [ 1  2 ]       (h) [ 1  1  1  1 ]     (i) [ 4  4  −2 ]
       [ 1  2 ]           [ 2  2  2  2 ]         [ 3  2   4 ]
       [ 1  2 ]
       [ 1  2 ]

2. Find the maximum of ‖A~x‖ subject to ‖~x‖ = 1 for each of the following
   matrices A.

   (a) [ 2   3 ]      (b) [ −1   3 ]     (c) [ 1  2   3 ]
       [ 6  −1 ]          [  2  −1 ]         [ 2  3  −1 ]
                          [  1   2 ]

3. Let A be an m × n matrix and let P be an m × m orthogonal matrix. Prove that
   PA has the same singular values as A.

4. Prove Theorem 6.

5. Let UΣVT be a singular value decomposition for an m × n matrix A with rank r.
   Find, with proof, an orthonormal basis for Row(A), Col(A), Null(A), and
   Null(AT) from the columns of U and V.

6. Let

       A = [ 2  1  1 ]
           [ 0  2  0 ]
           [ 0  0  2 ]

   be the standard matrix of a linear mapping L. (Observe that A is not
   diagonalizable.)

   (a) Find a basis B for R3 of right singular vectors of A.
   (b) Find a basis C for R3 of left singular vectors of A.
   (c) Determine C[L]B.

7. Show that a singular value decomposition A = UΣVT allows us to write A as

       A = σ1~u1~v1T + · · · + σr~ur~vrT

   where r = rank(A).
Chapter 11
Complex Vector Spaces
Our goal in this chapter is to extend everything we have done with real vector spaces
to vector spaces which use complex numbers for their scalars instead of just real
numbers. We begin with a very brief review of the basic operations on complex
numbers.
11.1 Complex Number Review
Recall that a complex number is a number of the form z = a + bi where a, b ∈ R and
i² = −1. We call a the real part of z and denote it by
Re z = a
and we call b the imaginary part of z and denote it by
Im z = b
The set of all complex numbers is denoted C.
It is important to notice that the real numbers are a subset of the complex numbers.
In particular, every real number a is the complex number a+0i. A non-zero complex
number whose real part is 0 is said to be imaginary. Any complex number which
has non-zero imaginary part is said to be non-real.
We define addition and subtraction of complex numbers by
(a + bi) ± (c + di) = (a ± c) + (b ± d)i
EXAMPLE 1 We have
(3 + 4i) + (2 − 3i) = (3 + 2) + (4 − 3)i = 5 + i
(1 − 3i) − (4 + 2i) = (1 − 4) + (−3 − 2)i = −3 − 5i
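The two sums above can be checked with Python's built-in complex type, which follows the same componentwise rules (a small illustration, not part of the text; Python writes i as j):

```python
# Python's built-in complex numbers add and subtract componentwise
z1 = 3 + 4j
z2 = 2 - 3j
print(z1 + z2)                # (5+1j)
print((1 - 3j) - (4 + 2j))    # (-3-5j)
```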
Observe that addition of complex numbers is defined by adding the real parts and
adding the imaginary parts of the complex numbers. That is, we add complex numbers
componentwise, just like vectors in Rn.

In fact, the set C with this rule for addition and with scalar multiplication by a real
scalar t defined by

    t(a + bi) = ta + tbi

forms a vector space over R. Let's find a basis for this real vector space. Notice that
every vector in C can be written as a + bi with a, b ∈ R, so {1, i} spans C. Also, {1, i}
is clearly linearly independent as the only solution to

    t1·1 + t2·i = 0 + 0i

is t1 = t2 = 0, since t1 and t2 are real scalars. Hence, C forms a two-dimensional
real vector space. Consequently, it is isomorphic to every other two-dimensional real
vector space. In particular, it is isomorphic to the plane R2. As a result, the set C is
often called the complex plane. We generally think of having an isomorphism which
maps the complex number z = a + bi to the point (a, b) in R2.
We can use this isomorphism to define the absolute value of a complex number z =
a + bi. Recall that the absolute value of a real number x measures the distance that
x is from the origin. So, we define the absolute value of z as the distance from the
point (a, b) in R2 to the origin.
DEFINITION
Absolute Value
The absolute value or modulus of a complex number z = a + bi is defined by

|z| = √(a² + b²)
EXAMPLE 2 We have

|2 − 3i| = √(2² + (−3)²) = √13

|1/2 + (1/2)i| = √((1/2)² + (1/2)²) = √(1/2)
Of course, this has the same properties as the real absolute value.
THEOREM 1 If w, z ∈ C, then
(1) |z| is a non-negative real number
(2) |z| = 0 if and only if z = 0
(3) |w + z| ≤ |w| + |z|
REMARK
If z = a + bi is real, then b = 0, so |z| = |a + 0i| = √(a² + 0²) = |a|. Hence, this is a
direct generalization of the real absolute value function.
The ability to represent complex numbers as points in the plane is very important
in the study of complex numbers. However, viewing C as a real vector space has
one serious limitation: it only defines the multiplication of a complex number by
a real scalar and does not immediately give us a way of multiplying two complex
numbers together. So, we return to the standard form of complex numbers and define
multiplication of complex numbers by using the normal distributive property and the
fact that i² = −1. We get

(a + bi)(c + di) = ac + adi + bci + bdi² = ac − bd + (ad + bc)i
EXAMPLE 3 We have
(3 + 4i)(2 − 3i) = 3(2) − 4(−3) + [4(2) + 3(−3)]i = 18 − i
(1 − 3i)(4 + 2i) = 1(4) − (−3)(2) + [(−3)(4) + 1(2)]i = 10 − 10i
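These products can be double-checked with Python's built-in complex type; a quick sketch (Python writes the imaginary unit as j):

```python
# Python's built-in complex type implements the rules above; i is written j.
z, w = 3 + 4j, 2 - 3j
print(z + w)                # componentwise addition: (3+2) + (4-3)i
print(z * w)                # (ac - bd) + (ad + bc)i = 18 - i
print((1 - 3j) * (4 + 2j))  # 10 - 10i
```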
We get the following familiar properties.
THEOREM 2 If z1, z2, z3 ∈ C, then
(1) z1 + z2 ∈ C
(2) z1 + z2 = z2 + z1
(3) z1z2 = z2z1
(4) z1 + (z2 + z3) = (z1 + z2) + z3
(5) z1(z2z3) = (z1z2)z3
(6) z1(z2 + z3) = z1z2 + z1z3
(7) z1 + 0 = z1
(8) 1z1 = z1
(9) For each z ∈ C there exists (−z) ∈ C such that z + (−z) = 0.
(10) For each z ∈ C with z ≠ 0, there exists (1/z) ∈ C such that z(1/z) = 1.
REMARK
Any set with two operations of addition and multiplication that satisfies properties
(1) to (10) is called a field.
EXERCISE 1 Prove that |z1z2| = |z1||z2| for z1, z2 ∈ C.
We look at one more important example of complex multiplication.
EXAMPLE 4 (a + bi)(a − bi) = a² + b² + [b(a) − a(b)]i = a² + b² = |a + bi|²
This example motivates the following definition.
DEFINITION
Complex Conjugate
The complex conjugate \overline{z} of z = a + bi ∈ C is defined to be

\overline{z} = a − bi
EXAMPLE 5 We have

\overline{4 − 3i} = 4 + 3i
\overline{3i} = −3i
\overline{−4} = −4
\overline{2 + 2i} = 2 − 2i
The complex conjugate has many important properties.
THEOREM 3 If w, z ∈ C with z = a + bi, then
(1) \overline{\overline{z}} = z
(2) z is real if and only if \overline{z} = z
(3) z is imaginary if and only if \overline{z} = −z and z ≠ 0
(4) \overline{z ± w} = \overline{z} ± \overline{w}
(5) \overline{zw} = \overline{z} \overline{w}
(6) z + \overline{z} = 2 Re(z) = 2a
(7) z − \overline{z} = 2i Im(z) = 2bi
(8) z\overline{z} = a² + b² = |z|²
Since the absolute value of a non-zero complex number is a positive real number,
we can use property (8) to divide complex numbers. We demonstrate this with some
examples.
EXAMPLE 6 Calculate (2 − 3i)/(1 + 2i).
Solution: To calculate this, we make the denominator real by multiplying the numerator
and the denominator by the complex conjugate of the denominator. We get

(2 − 3i)/(1 + 2i) = [(2 − 3i)/(1 + 2i)] · [(1 − 2i)/(1 − 2i)] = [2 − 6 + (−3 − 4)i]/(1² + 2²) = (−4 − 7i)/5 = −4/5 − (7/5)i

We can easily check this answer. We have

(1 + 2i)(−4/5 − (7/5)i) = −4/5 + 14/5 + [−8/5 − 7/5]i = 2 − 3i
EXAMPLE 7 Calculate i/(1 + i).
Solution: We have

i/(1 + i) = [i/(1 + i)] · [(1 − i)/(1 − i)] = (1 + i)/(1 + 1) = 1/2 + (1/2)i
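Division by a complex number can also be checked numerically; a small sketch using Python's complex type, which performs division via the conjugate exactly as above:

```python
z, w = 2 - 3j, 1 + 2j
q = z / w
print(q)               # approximately -0.8 - 1.4j, i.e. -4/5 - (7/5)i
print(w * q)           # recovers 2 - 3j (up to rounding)
print(w.conjugate())   # the conjugate used in the calculation: 1 - 2j
```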
EXERCISE 2 Calculate 5/(3 − 4i).
In some real world applications, such as problems involving electric circuits, one
needs to solve systems of linear equations involving complex numbers. The procedure
for solving systems of linear equations is the same, except that we can now
multiply a row by a non-zero complex number and we can add a complex multiple
of one row to another. We demonstrate this with a few examples.
EXAMPLE 8 Solve the system of linear equations

z1 + iz2 + (−1 + 2i)z3 = −1 + 2i
z2 + 2z3 = 2 + 2i
2z1 + (−1 + 2i)z2 + (−6 + 4i)z3 = −4

Solution: We row reduce the augmented matrix of the system to RREF.

[ 1    i       −1+2i | −1+2i ]
[ 0    1        2    |  2+2i ]
[ 2   −1+2i    −6+4i |  −4   ]  R3 − 2R1

~ [ 1    i    −1+2i | −1+2i ]  R1 − iR2
  [ 0    1     2    |  2+2i ]
  [ 0   −1    −4    | −2−4i ]  R3 + R2

~ [ 1   0   −1 |  1    ]
  [ 0   1    2 |  2+2i ]
  [ 0   0   −2 | −2i   ]  −(1/2)R3

~ [ 1   0   −1 | 1    ]  R1 + R3
  [ 0   1    2 | 2+2i ]  R2 − 2R3
  [ 0   0    1 | i    ]

~ [ 1   0   0 | 1+i ]
  [ 0   1   0 | 2   ]
  [ 0   0   1 | i   ]

Hence, the only solution is z1 = 1 + i, z2 = 2, and z3 = i.
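Systems like this can also be solved numerically; a sketch using NumPy, which handles complex coefficient matrices directly:

```python
import numpy as np

# Coefficient matrix A and right-hand side b of Example 8.
A = np.array([[1, 1j, -1 + 2j],
              [0, 1, 2],
              [2, -1 + 2j, -6 + 4j]])
b = np.array([-1 + 2j, 2 + 2j, -4])
z = np.linalg.solve(A, b)
print(z)   # approximately [1+1j, 2, 1j], matching the row reduction
```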
REMARK
It is very easy to make calculation mistakes when working with complex numbers.
Thus, it is highly recommended that you check your answer whenever possible.
EXAMPLE 9 Solve the system of linear equations

z1 + z2 + iz3 = 1
−2iz1 + z2 + (1 − 2i)z3 = −2i
(1 + i)z1 + (2 + 2i)z2 − 2z3 = 2

Solution: We row reduce the augmented matrix of the system to RREF.

[ 1     1      i    |  1  ]
[ −2i   1     1−2i  | −2i ]  R2 + 2iR1
[ 1+i   2+2i  −2    |  2  ]  R3 − (1 + i)R1

~ [ 1    1      i     | 1   ]
  [ 0    1+2i  −1−2i  | 0   ]  (1/(1 + 2i))R2
  [ 0    1+i   −1−i   | 1−i ]

~ [ 1    1     i    | 1   ]  R1 − R2
  [ 0    1    −1    | 0   ]
  [ 0    1+i  −1−i  | 1−i ]  R3 − (1 + i)R2

~ [ 1   0   1+i | 1   ]
  [ 0   1   −1  | 0   ]
  [ 0   0    0  | 1−i ]

Hence, the system is inconsistent.
EXAMPLE 10 Find the solution set of the system of linear equations

z1 − iz2 = 1 + i
(1 + i)z1 + (1 − i)z2 + (1 − i)z3 = 1 + 3i
2z1 − 2iz2 + (3 + i)z3 = 1 + 5i

Solution: We row reduce the augmented matrix of the system to RREF.

[ 1     −i    0    | 1+i  ]
[ 1+i   1−i   1−i  | 1+3i ]  R2 − (1 + i)R1
[ 2     −2i   3+i  | 1+5i ]  R3 − 2R1

~ [ 1   −i   0    | 1+i   ]
  [ 0    0   1−i  | 1+i   ]  (1/(1 − i))R2
  [ 0    0   3+i  | −1+3i ]

~ [ 1   −i   0    | 1+i   ]
  [ 0    0   1    | i     ]
  [ 0    0   3+i  | −1+3i ]  R3 − (3 + i)R2

~ [ 1   −i   0 | 1+i ]
  [ 0    0   1 | i   ]
  [ 0    0   0 | 0   ]

Thus, z2 is a free variable. Since we are solving this system over C, this means that
z2 can take any complex value. Hence, we let z2 = α ∈ C. Then, a vector equation for
the solution set is

~z = [1 + i; 0; i] + α[i; 1; 0],   α ∈ C
Section 11.1 Problems
1. Calculate the following
(a) (3 + 4i) − (−2 + 6i)
(b) (2 − 3i) + (1 + 3i)
(c) (−1 + 2i)(3 + 2i)
(d) −3i(−2 + 3i)
(e) \overline{3 − 7i}
(f) (1 + i)(1 − i)
(g) |(1 + i)(1 − 2i)(3 + 4i)|
(h) |1 + 6i|
(i) |2/3 − √2 i|
(j) 2/(1 − i)
(k) (4 − 3i)/(3 − 4i)
(l) (2 + 5i)/(−3 − 6i)
2. Solve each equation for z.
(a) iz = 4 − zi
(b) 1/(1 − z) = 1 − 5i
(c) (1 + i)z = 2 − (2 + i)z
3. Solve the following systems of linear equations over C.
(a) z1 + (2 + i)z2 + iz3 = 1 + i
iz1 + (−1 + 2i)z2 + 2iz4 = −i
z1 + (2 + i)z2 + (1 + i)z3 + 2iz4 = 2 − i
(b) iz1 + 2z2 − (3 + i)z3 = 1
(1 + i)z1 + (2 − 2i)z2 − 4z3 = i
iz1 + 2z2 − (3 + 3i)z3 = 1 + 2i
(c) iz1 + (1 + i)z2 + z3 = 2i
(1 − i)z1 + (1 − 2i)z2 + (−2 + i)z3 = −2 + i
2iz1 + 2iz2 + 2z3 = 4 + 2i
(d) z1 − z2 + iz3 = 2i
(1 + i)z1 − iz2 + iz3 = −2 + i
(1 − i)z1 + (−1 + 2i)z2 + (1 + 2i)z3 = 3 + 2i
4. Prove that for any z1, z2, z3 ∈ C we have
(a) z1(z2 + z3) = z1z2 + z1z3
(b) For each z ∈ C with z ≠ 0, there exists (1/z) ∈ C such that z(1/z) = 1.
(c) z1 is real if and only if \overline{z1} = z1.
(d) \overline{z1z2} = \overline{z1} \overline{z2}
(e) |z1 + z2| ≤ |z1| + |z2|
5. Let z1, z2 ∈ C such that z1 + z2 and z1z2 are each negative real numbers. Prove that z1 and z2 must be real numbers.
6. Prove that if |z| = 1, z ≠ 1, then Re(1/(1 − z)) = 1/2.
7. Prove that for any positive integer n and z ∈ C we have

\overline{z^n} = (\overline{z})^n
11.2 Complex Vector Spaces
We saw in the last section that the set of all complex numbers C can be thought of
as a two dimensional real vector space. However, we see this is inappropriate for
the solution space of Example 11.1.10 since it requires multiplication by complex
scalars. Thus, we need to extend the definition of a real vector space to a complex
vector space.
DEFINITION
Complex Vector Space
A set V with operations of addition and scalar multiplication is called a vector space
over C if for any ~v, ~z, ~w ∈ V and α, β ∈ C we have:
V1 ~z + ~w ∈ V
V2 (~z + ~w) + ~v = ~z + (~w + ~v)
V3 ~z + ~w = ~w + ~z
V4 There exists a vector ~0 ∈ V such that ~z + ~0 = ~z for all ~z ∈ V
V5 For every ~z ∈ V there exists (−~z) ∈ V such that ~z + (−~z) = ~0
V6 α~z ∈ V
V7 α(β~z) = (αβ)~z
V8 (α + β)~z = α~z + β~z
V9 α(~z + ~w) = α~z + α~w
This definition looks a lot like the definition of a real vector space. In fact, the only
difference is that we now allow the scalars to be any complex number, instead of just
any real number. In the rest of this section we will generalize most of our concepts
from real vector spaces to complex vector spaces.
EXAMPLE 1 The set Cn = { [z1; . . . ; zn] | zi ∈ C } of column vectors with complex entries is a vector space over C with addition defined by

[z1; . . . ; zn] + [w1; . . . ; wn] = [z1 + w1; . . . ; zn + wn]

and scalar multiplication defined by

α[z1; . . . ; zn] = [αz1; . . . ; αzn]

for any α ∈ C.
Just like a complex number z ∈ C, we can split a vector in Cn into a real and imaginary
part.
THEOREM 1 If ~z ∈ Cn, then there exist vectors ~x, ~y ∈ Rn such that

~z = ~x + i~y
Other familiar real vector spaces can be turned into complex vector spaces.
EXAMPLE 2 The set Mm×n(C) of all m×n matrices with complex entries is a complex vector space
with standard addition and complex scalar multiplication of matrices.
We now extend all of our theory from real vector spaces to complex vector spaces.
DEFINITION
Subspace
If S is a subset of a complex vector space V and S is a complex vector space under
the same operations as V, then S is said to be a subspace of V.
As in the real case, we use the Subspace Test to prove a non-empty subset S of a
complex vector space V is a subspace of V.
EXAMPLE 3 Prove that S = { [z1; z2; z3] | z1 + iz2 = −z3 } is a subspace of C3.
Solution: By definition S is a subset of C3. Moreover, S is non-empty since [0; 0; 0]
satisfies 0 + i(0) = −0. Let [z1; z2; z3], [w1; w2; w3] ∈ S. Then z1 + iz2 = −z3 and w1 + iw2 = −w3.
Hence,

[z1; z2; z3] + [w1; w2; w3] = [z1 + w1; z2 + w2; z3 + w3] ∈ S

since (z1 + w1) + i(z2 + w2) = z1 + iz2 + w1 + iw2 = −z3 − w3 = −(z3 + w3). Also,

α[z1; z2; z3] = [αz1; αz2; αz3] ∈ S

since αz1 + i(αz2) = α(z1 + iz2) = α(−z3) = −(αz3). Thus, S is a subspace of C3 by
the Subspace Test.
EXAMPLE 4 Is R a subspace of C as a vector space over C?
Solution: By definition R is a non-empty subset of C. Let x, y ∈ R. Then, x + y is
also a real number, so the set is closed under addition. But, the difference between a
real vector space and a complex vector space is that we have scalar multiplication by
complex numbers in a complex vector space. So, if we take 2 ∈ R and the scalar i ∈ C,
then we get i(2) = 2i ∉ R. Thus, R is not closed under complex scalar multiplication, and
hence it is not a subspace of C.
The concepts of linear independence, spanning, bases, and dimension are defined the
same as in the real case except that we now have complex scalar multiplication.
EXAMPLE 5 Find a basis for C as a vector space over C.
Solution: It may be tempting to think that a basis for C is {1, i}, but this would
be incorrect since this set is linearly dependent in C. In particular, the equation
α1(1) + α2(i) = 0 has solution α1 = 1 and α2 = i since 1(1) + i(i) = 1 − 1 = 0. A basis
for C is {1} since any complex number a + bi is just equal to (a + bi)(1).
EXERCISE 1 What is the standard basis for Cn?
EXAMPLE 6 Determine the dimension of the subspace S = { [z1; z2; z3] | z1 + iz2 = −z3 } of C3.
Solution: Every vector in S can be written in the form

[z1; z2; z3] = [z1; z2; −z1 − iz2] = z1[1; 0; −1] + z2[0; 1; −i]

Thus, { [1; 0; −1], [0; 1; −i] } spans S and is clearly linearly independent since neither vector is
a scalar multiple of the other, so it is a basis for S. Thus, dim S = 2.
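The dimension count can be confirmed numerically; a sketch (assuming NumPy) that puts the two spanning vectors into the columns of a matrix and computes its rank:

```python
import numpy as np

# Columns are the spanning vectors of S found above; the rank equals dim S.
A = np.array([[1, 0],
              [0, 1],
              [-1, -1j]])
print(np.linalg.matrix_rank(A))   # 2
```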
EXERCISE 2 Find a basis for the four fundamental subspaces of

A = [ 1    −1      −i    ]
    [ i     1     −1+2i  ]
    [ −i    1+2i  −3+2i  ]
    [ 2     0      2i    ]
REMARK
Be warned that it is possible for two complex vectors to be scalar multiples of each
other without looking like they are. For example, can you determine by inspection
which of the following vectors is a scalar multiple of [1 − i; −2i]?

[1; −1 − i],   [−i; −1 − i],   [7 − i; 8 + 6i]
Coordinates with respect to a basis, linear mappings, and matrices of linear mappings
are also defined in exactly the same way.
EXAMPLE 7 Given that B = { [1; 1 + i; 1 − i], [−1; −i; −1 + 2i], [i; i; 2 + 2i] } is a basis of C3, find the B-coordinate
vector of ~z = [2i; −2 + i; 3 + 2i].
Solution: We need to find complex numbers α1, α2, and α3 such that

[2i; −2 + i; 3 + 2i] = α1[1; 1 + i; 1 − i] + α2[−1; −i; −1 + 2i] + α3[i; i; 2 + 2i]

Row reducing the corresponding augmented matrix gives

[ 1     −1     i    | 2i   ]
[ 1+i   −i     i    | −2+i ]  R2 − (1 + i)R1
[ 1−i   −1+2i  2+2i | 3+2i ]  R3 − (1 − i)R1

~ [ 1   −1   i   | 2i ]  R1 + R2
  [ 0    1   1   | −i ]
  [ 0    i   1+i | 1  ]  R3 − iR2

~ [ 1   0   1+i | i  ]  R1 − (1 + i)R3
  [ 0   1   1   | −i ]  R2 − R3
  [ 0   0   1   | 0  ]

~ [ 1   0   0 | i  ]
  [ 0   1   0 | −i ]
  [ 0   0   1 | 0  ]

Thus, [~z]B = [i; −i; 0].
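Finding coordinates amounts to solving a linear system, so the result can be checked numerically; a sketch using NumPy with the basis vectors as the columns of P:

```python
import numpy as np

# Basis vectors of B as columns; the coordinates satisfy P @ alpha = z.
P = np.array([[1, -1, 1j],
              [1 + 1j, -1j, 1j],
              [1 - 1j, -1 + 2j, 2 + 2j]])
z = np.array([2j, -2 + 1j, 3 + 2j])
alpha = np.linalg.solve(P, z)
print(alpha)   # approximately [1j, -1j, 0]
```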
EXAMPLE 8 Find the standard matrix of L : C2 → C3 given by L(z1, z2) = [iz1; z2; z1 + z2].
Solution: The standard basis for C2 is { [1; 0], [0; 1] }. We have L(1, 0) = [i; 0; 1] and
L(0, 1) = [0; 1; 1]. Hence,

[L] = [ L(1, 0)   L(0, 1) ] = [ i  0 ]
                              [ 0  1 ]
                              [ 1  1 ]
Of course, the concepts of determinants and invertibility are also the same.
EXAMPLE 9 Calculate det [1 + i  2  3; 0  −2i  −2 + 3i; 0  0  1 − i] and det [i  i  2i; 1  1  2; 1 + i  1 − i  3i].
Solution: We have

det [ 1+i   2     3     ]
    [ 0    −2i   −2+3i  ] = (1 + i)(−2i)(1 − i) = −4i
    [ 0     0     1−i   ]

det [ i     i     2i ]       [ i     i     2i ]
    [ 1     1     2  ] = det [ 0     0     0  ] = 0
    [ 1+i   1−i   3i ]       [ 1+i   1−i   3i ]

since performing R2 + iR1 (which does not change the determinant) produces a row of zeros.
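Both determinants can be verified numerically; a quick NumPy sketch:

```python
import numpy as np

U = np.array([[1 + 1j, 2, 3],
              [0, -2j, -2 + 3j],
              [0, 0, 1 - 1j]])
print(np.linalg.det(U))   # product of the diagonal entries: -4i

V = np.array([[1j, 1j, 2j],
              [1, 1, 2],
              [1 + 1j, 1 - 1j, 3j]])
print(np.linalg.det(V))   # 0, since the second row is -i times the first
```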
EXAMPLE 10 Let A = [1 + i  1; 1  2 − i] and B = [i  0  1; 1  i  1 − i; 0  1  1 + i]. Show that A and B are invertible and
find their inverses.
Solution: We have det A = (1 + i)(2 − i) − 1(1) = 2 + i ≠ 0, so A is invertible. Using the
formula for the inverse of a 2 × 2 matrix we found in Section 5.1 gives

A⁻¹ = 1/(2 + i) [ 2−i   −1  ]
                [ −1    1+i ]

Row reducing [B | I] to RREF gives

[ i   0   1    | 1  0  0 ]     [ 1   0   0 | 2−2i    −1     i ]
[ 1   i   1−i  | 0  1  0 ]  ~  [ 0   1   0 | 3−i     −1−i   i ]
[ 0   1   1+i  | 0  0  1 ]     [ 0   0   1 | −1−2i    i     1 ]

Since the RREF of B is I, B is invertible. Moreover,

B⁻¹ = [ 2−2i    −1     i ]
      [ 3−i     −1−i   i ]
      [ −1−2i    i     1 ]
EXERCISE 3 Let A = [1 + i  2  i; 2  3 − 2i  1 + i; 1 − i  1 − 3i  1 + i]. Find the determinant of A and A⁻¹.
Complex Conjugates
We saw in Section 11.1 that the complex conjugate was a very important and useful
operation on complex numbers. Thus, we should expect that it will also be useful
and important in the study of general complex vector spaces. Therefore, we extend
the definition of the complex conjugate to vectors in Cn and matrices in Mm×n(C).
DEFINITION
Complex Conjugate in Cn
For ~z ∈ Cn with entries z1, . . . , zn, we define \overline{~z} to be the vector with entries \overline{z1}, . . . , \overline{zn}.

DEFINITION
Complex Conjugate in Mm×n(C)
For A ∈ Mm×n(C) with entries zℓj, we define \overline{A} to be the matrix with entries \overline{zℓj}.
THEOREM 2 If A ∈ Mm×n(C) and ~z ∈ Cn, then \overline{A~z} = \overline{A} \overline{~z}.
EXAMPLE 11 Let ~w = [1; 2i; 1 − i], ~z = [1; 0; 1], and A = [1  1 − 3i; −i  2i]. Then

\overline{~w} = [1; −2i; 1 + i],   \overline{~z} = [1; 0; 1],   \overline{A} = [1  1 + 3i; i  −2i]
EXAMPLE 12 Is the mapping L : C2 → C2 defined by L(~z) = \overline{~z} a linear mapping?
Solution: Consider L(α~z1 + ~z2) = \overline{α~z1 + ~z2} = \overline{α} \overline{~z1} + \overline{~z2}, but \overline{α} ≠ α if α is not real. So,
it is not linear. For example,

L((1 + i)[1; 0]) = L([1 + i; 0]) = [1 − i; 0] ≠ [1 + i; 0] = (1 + i)L([1; 0])
Even though the complex conjugate of a vector in Cn does not define a linear mapping,
we will see that complex conjugates of vectors in Cn and matrices in Mm×n(C)
occur naturally in the study of complex vector spaces.
Section 11.2 Problems
1. Determine which of the following sets is a subspace of C3. Find a basis and the dimension of each subspace.
(a) S1 = { [z1; z2; z3] ∈ C3 | iz1 = z3 }
(b) S2 = { [z1; z2; z3] ∈ C3 | z1z2 = 0 }
(c) S3 = { [z1; z2; z3] ∈ C3 | z1 + z2 + z3 = 0 }
2. Find a basis for the four fundamental subspaces
of the following matrices.
(a) A = [ 1     i     ]
        [ 1+i   −1+i  ]
        [ −1    i     ]

(b) B = [ 1   1+i   −1 ]
        [ i   −1+i   i ]

(c) C = [ 1   1+i   3    ]
        [ 0   2     2−2i ]
        [ i   1−i   −i   ]
3. Let ~z ∈ Cn. Prove that there exist vectors ~x, ~y ∈ Rn such that ~z = ~x + i~y.
4. Let L : C3 → C2 be defined by

L(z1, z2, z3) = [ −iz1 + (1 + i)z2 + (1 + 2i)z3 ]
                [ (−1 + i)z1 − 2iz2 − 3iz3      ]

(a) Prove that L is linear and find the standard matrix of L.
(b) Determine a basis for the range and kernel of L.
5. Let L : C2 → C2 be defined by

L(z1, z2) = [ z1 + (1 + i)z2    ]
            [ (1 + i)z1 + 2iz2  ]

(a) Prove that L is linear.
(b) Determine a basis for the range and kernel of L.
(c) Use the result of part (b) to define a basis B for C2 such that [L]B is diagonal.
6. Calculate the determinant of

A = [ 1     −1      i    ]
    [ 1+i   −i      i    ]
    [ 1−i   −1+2i   1+2i ]

7. Find the inverse of

A = [ 1   1     2    ]
    [ 0   1+i   1    ]
    [ i   1+i   1+2i ]

8. Prove that for any n × n matrix A, we have det \overline{A} = \overline{det A}.
11.3 Complex Diagonalization
In Chapter 6, we saw that some matrices were not diagonalizable over R because
they had complex eigenvalues. But, if we extend all of our concepts of eigenvalues,
eigenvectors, and diagonalization to complex vector spaces, then we may be able to
diagonalize these matrices.
DEFINITION
Eigenvalue
Eigenvector
Let A ∈ Mn×n(C). If there exists λ ∈ C and ~z ∈ Cn with ~z ≠ ~0 such that A~z = λ~z, then
λ is called an eigenvalue of A and ~z is called an eigenvector of A corresponding to
λ. We call (λ,~z) an eigenpair.
All of the theory of similar matrices, eigenvalues, eigenvectors, and diagonalization
we did in Chapter 6 still applies in the complex case, except that we now allow the
use of complex eigenvalues and eigenvectors.
EXAMPLE 1 Diagonalize A = [0  1; −1  0] over C.
Solution: The characteristic polynomial is

C(λ) = det [ −λ    1  ] = λ² + 1
           [ −1   −λ  ]

Therefore, the eigenvalues of A are λ1 = i and λ2 = −i. For λ1 = i we get

A − iI = [ −i    1  ] ~ [ 1   i ]
         [ −1   −i  ]   [ 0   0 ]

Hence, a basis for Eλ1 is { [−i; 1] }. For λ2 = −i we get

A + iI = [ i    1 ] ~ [ 1   −i ]
         [ −1   i ]   [ 0    0 ]

So, a basis for Eλ2 is { [i; 1] }. Thus, letting P = [−i  i; 1  1] gives

P⁻¹AP = [ i    0  ]
        [ 0   −i  ]
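This diagonalization can be reproduced numerically; a sketch with NumPy, whose eig routine returns complex eigenpairs for a real matrix with non-real eigenvalues:

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [-1.0, 0.0]])
vals, vecs = np.linalg.eig(A)
print(vals)   # i and -i, in some order
```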
EXAMPLE 2 Diagonalize B = [i  1 + i; 1 + i  1] over C.
Solution: The characteristic polynomial is

C(λ) = det [ i−λ    1+i  ] = λ² − (1 + i)λ − i
           [ 1+i    1−λ  ]

Using the quadratic formula we get

λ = [(1 + i) ± √(6i)]/2 = [(1 + i) ± √6 · (1 + i)/√2]/2 = [(1 ± √3)/2](1 + i)

So, the eigenvalues are λ1 = [(1 + √3)/2](1 + i) and λ2 = [(1 − √3)/2](1 + i). For λ1 we get

B − λ1I = [ i − [(1+√3)/2](1+i)    1+i                  ] ~ [ −√3+i    2     ] ~ [ 1   −2/(√3 − i) ]
          [ 1+i                    1 − [(1+√3)/2](1+i)  ]   [ 2       −√3−i  ]   [ 0    0          ]

Then, a basis for Eλ1 is { [2; √3 − i] }. For λ2 we get

B − λ2I = [ i − [(1−√3)/2](1+i)    1+i                  ] ~ [ √3+i   2    ] ~ [ 1   2/(√3 + i) ]
          [ 1+i                    1 − [(1−√3)/2](1+i)  ]   [ 2      √3−i ]   [ 0   0          ]

Hence, a basis for Eλ2 is { [−2; √3 + i] }. Thus, B is diagonalized by

P = [ 2       −2    ]    to    D = [ λ1   0  ]
    [ √3−i    √3+i  ]              [ 0    λ2 ]
EXAMPLE 3 Diagonalize F = [3  2 + i; 2 − i  7] over C.
Solution: The characteristic polynomial is

C(λ) = det [ 3−λ    2+i  ] = λ² − 10λ + 16 = (λ − 2)(λ − 8)
           [ 2−i    7−λ  ]

So the eigenvalues are λ1 = 2 and λ2 = 8. For λ1 we get

F − λ1I = [ 1     2+i ] ~ [ 1   2+i ]
          [ 2−i   5   ]   [ 0   0   ]

Then, a basis for Eλ1 is { [−2 − i; 1] }. For λ2 we get

F − λ2I = [ −5    2+i ] ~ [ 1   −1/(2 − i) ]
          [ 2−i   −1  ]   [ 0    0         ]

Hence, a basis for Eλ2 is { [−1; −2 + i] }. Thus, F is diagonalized by

P = [ −2−i   −1   ]    to    D = [ 2   0 ]
    [ 1      −2+i ]              [ 0   8 ]
EXERCISE 1 Diagonalize A = [ −3  −1  4; −5  3  5; −8  −2  9 ] over C.
As usual, these examples teach us more than just how to diagonalize complex matrices.
In the first example we see that when the matrix has only real entries, then
the eigenvalues come in complex conjugate pairs. The second example shows that
when working with matrices with non-real entries our theory of symmetric matrices
for real matrices does not apply. In particular, we had Bᵀ = B, but the eigenvalues of
B are not real. On the other hand, observe that the matrix F has non-real entries but
the eigenvalues of F are all real.
THEOREM 1 Let A ∈ Mn×n(R). If λ is a non-real eigenvalue of A with corresponding eigenvector
~z, then \overline{λ} is also an eigenvalue of A with eigenvector \overline{~z}.
Proof: We have A~z = λ~z. Hence,

\overline{λ} \overline{~z} = \overline{λ~z} = \overline{A~z} = \overline{A} \overline{~z} = A\overline{~z}

since A is a real matrix. Thus, \overline{λ} is an eigenvalue of A with eigenvector \overline{~z}. □
COROLLARY 2 If A ∈ Mn×n(R) with n odd, then A has at least one real eigenvalue.
EXAMPLE 4 Find all eigenvalues of

A = [ 0    2   1 ]
    [ −2   3   0 ]
    [ 1    0   2 ]

given that λ1 = 2 + i is an eigenvalue of A.
Solution: Since A is a real matrix and λ1 = 2 + i is an eigenvalue of A, Theorem 1
tells us that λ2 = \overline{λ1} = 2 − i is also an eigenvalue of A. Finally, by Theorem 6.3.6,
we know that the sum of the eigenvalues is the trace of the matrix, so the remaining
eigenvalue, which must be real, is
λ3 = tr(A) − λ1 − λ2 = 5 − (2 + i) − (2 − i) = 1
It is easy to verify this result by checking that det(A − I) = 0.
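Both the conjugate-pair behaviour and the trace identity are easy to confirm numerically; a sketch with NumPy:

```python
import numpy as np

A = np.array([[0, 2, 1],
              [-2, 3, 0],
              [1, 0, 2]], dtype=float)
vals = np.linalg.eigvals(A)
print(sorted(vals, key=lambda v: (v.real, v.imag)))   # approximately [1, 2-i, 2+i]
print(np.isclose(vals.sum(), np.trace(A)))            # True: eigenvalues sum to tr(A) = 5
```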
Section 11.3 Problems
1. For each of the following matrices, either diagonalize the matrix over C, or show that it is not diagonalizable.

(a) [ 2     1+i ]
    [ 1−i   1   ]

(b) [ 3    5  ]
    [ −5   −3 ]

(c) [ 2    2   −1 ]
    [ −4   1    2 ]
    [ 2    2   −1 ]

(d) [ 1   i  ]
    [ i   −1 ]

(e) [ 2    1   −1 ]
    [ 2    1    0 ]
    [ 3   −1    2 ]

(f) [ 1+i   1   0  ]
    [ 1     1   −i ]
    [ 1     0   1  ]

(g) [ 5       −1+i   2i  ]
    [ −2−2i    2     1−i ]
    [ 4i      −1−i   −1  ]

(h) [ −6−3i   −2   −3−2i ]
    [ 10       2    5    ]
    [ 8+6i     3    4+4i ]

2. For any θ ∈ R, let

Rθ = [ cos θ   −sin θ ]
     [ sin θ    cos θ ]

(a) Diagonalize Rθ over C.
(b) Verify your answer in (a) is correct, by showing the matrix P and diagonal matrix
D from part (a) satisfy P⁻¹RθP = D for θ = 0 and θ = π/4.
3. Let A ∈ Mn×n(C) and let ~z be an eigenvector of A. Prove that \overline{~z} is an eigenvector of \overline{A}. What is the corresponding eigenvalue?
4. Suppose that a real 2 × 2 matrix A has 2 + i as an eigenvalue with a corresponding eigenvector [1 + i; i]. Determine A.
5. Prove that if A is a real 3 × 3 matrix with a non-real eigenvalue λ, then A is diagonalizable over C.
11.4 Complex Inner Products
The concepts of orthogonality and orthonormal bases were very useful in real vector
spaces. Hence, it makes sense to extend these concepts to complex vector spaces. In
particular, we would like to be able to define the concepts of orthogonality and length
in complex vector spaces to match those in the real case.
REMARK
In this section we leave the proofs of the theorems to the reader (or homework assignments/tests)
as they are essentially the same as in the real case.
We start by looking at how to define an inner product for Cn.
We first observe that if we extend the standard dot product to vectors in Cn, then this
does not define an inner product. Indeed, if ~z = ~x + i~y, then we have

~z · ~z = z1² + · · · + zn² = (x1² + · · · + xn² − y1² − · · · − yn²) + 2i(x1y1 + · · · + xnyn)

But, to be an inner product we require that 〈~z, ~z〉 be a non-negative real number so
that we can define the length of a vector. Since ~z · ~z does not even have to be real, it
cannot define an inner product.
Thinking of properties of complex numbers, one way we can ensure that we get
a non-negative real number when multiplying complex numbers is to multiply the
complex number by its conjugate. In particular, if for ~z, ~w ∈ Cn we define

〈~z, ~w〉 = z1\overline{w1} + · · · + zn\overline{wn} = ~z · \overline{~w}

then we get

〈~z, ~z〉 = z1\overline{z1} + · · · + zn\overline{zn} = |z1|² + · · · + |zn|²

which is not only a non-negative real number, but it is 0 if and only if ~z = ~0. Thus,
this is what we are looking for.

DEFINITION
Standard Inner Product on Cn
The standard inner product 〈 , 〉 on Cn is defined by

〈~z, ~w〉 = ~z · \overline{~w} = z1\overline{w1} + · · · + zn\overline{wn}

for any ~z, ~w ∈ Cn.
REMARK
In some other areas, for example engineering, one instead uses

〈~z, ~w〉 = \overline{~z} · ~w

as the definition of the standard inner product for Cn. Be warned that many computer
programs use the engineering definition of the inner product for Cn.
EXAMPLE 1 Let ~z = [1; 2 + i; 1 − i] and ~w = [i; 2; 1 + 3i]. Calculate 〈~z, ~z〉, 〈~z, ~w〉, and 〈~w, ~z〉.
Solution: We have

〈~z, ~z〉 = ~z · \overline{~z} = 1(1) + (2 + i)(2 − i) + (1 − i)(1 + i) = 1 + 5 + 2 = 8
〈~z, ~w〉 = ~z · \overline{~w} = 1(−i) + (2 + i)(2) + (1 − i)(1 − 3i) = 2 − 3i
〈~w, ~z〉 = ~w · \overline{~z} = i(1) + 2(2 − i) + (1 + 3i)(1 + i) = 2 + 3i
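The convention warning in the Remark above matters in practice: NumPy's vdot conjugates its first argument (the engineering convention), opposite to this book. A sketch:

```python
import numpy as np

z = np.array([1, 2 + 1j, 1 - 1j])
w = np.array([1j, 2, 1 + 3j])

# Textbook convention: conjugate the SECOND argument.
print(np.sum(z * w.conj()))   # 2 - 3i, matching <z, w> above

# numpy.vdot conjugates its FIRST argument instead.
print(np.vdot(z, w))          # 2 + 3i, i.e. <w, z> in the book's convention
```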
This example shows that the standard inner product for Cn is not symmetric. The
next example shows that it is also not bilinear.
EXAMPLE 2 Let ~z, ~w ∈ Cn. Calculate 〈α~z, ~w〉 and 〈~z, α~w〉 for any α ∈ C.
Solution: We have

〈α~z, ~w〉 = (α~z) · \overline{~w} = α(~z · \overline{~w}) = α〈~z, ~w〉
〈~z, α~w〉 = ~z · \overline{α~w} = \overline{α}(~z · \overline{~w}) = \overline{α}〈~z, ~w〉
EXERCISE 1 Let ~z = [1 + i; 1 − i; 2] and ~w = [1 − i; 2i; −2]. Find 〈~z, ~w〉 and 〈~w, ~z〉.
THEOREM 1 If ~v, ~z, ~w ∈ Cn and α ∈ C, then
(1) 〈~z, ~z〉 ∈ R, 〈~z, ~z〉 ≥ 0, and 〈~z, ~z〉 = 0 if and only if ~z = ~0
(2) 〈~z, ~w〉 = \overline{〈~w, ~z〉}
(3) 〈~v + ~z, ~w〉 = 〈~v, ~w〉 + 〈~z, ~w〉
(4) 〈α~z, ~w〉 = α〈~z, ~w〉
As we saw in the examples, this theorem shows that the standard inner product on
Cn does not satisfy the same properties as a real inner product. So, to define an inner
product on a general complex vector space, we should use these properties instead of
those for a real inner product.
DEFINITION
Hermitian Inner Product
Let V be a complex vector space. A Hermitian inner product on V is a function
〈 , 〉 : V × V → C such that for all ~v, ~w, ~z ∈ V and α ∈ C we have
(1) 〈~z, ~z〉 ∈ R, 〈~z, ~z〉 ≥ 0 for all ~z ∈ V, and 〈~z, ~z〉 = 0 if and only if ~z = ~0
(2) 〈~z, ~w〉 = \overline{〈~w, ~z〉}
(3) 〈~v + ~z, ~w〉 = 〈~v, ~w〉 + 〈~z, ~w〉
(4) 〈α~z, ~w〉 = α〈~z, ~w〉
A complex vector space with a Hermitian inner product is called a Hermitian inner
product space.
THEOREM 2 If 〈 , 〉 is a Hermitian inner product on a complex vector space V, then for all ~v, ~w, ~z ∈ V and α ∈ C we have

〈~z, ~v + ~w〉 = 〈~z, ~v〉 + 〈~z, ~w〉
〈~z, α~w〉 = \overline{α}〈~z, ~w〉
It is important to observe that this is a true generalization of the real inner product.
That is, we could use this for the definition of the real inner product since if 〈~z, ~w〉
and α are strictly real, then this will satisfy the definition of a real inner product.
THEOREM 3 The function 〈A, B〉 = tr(\overline{B}ᵀA) defines a Hermitian inner product on Mm×n(C). It is
called the standard inner product for Mm×n(C).
EXAMPLE 3 Let A = [1  i; 2  2 + i] and B = [−3  3 + i; −2i  0] ∈ M2×2(C). Find 〈A, B〉.
Solution: We have

〈A, B〉 = tr(\overline{B}ᵀA) = tr( [ −3    2i ] [ 1   i   ] ) = tr [ −3+4i   −2+i  ] = −3 + 4i + 1 + 3i = −2 + 7i
                                  [ 3−i   0  ] [ 2   2+i ]        [ 3−i     1+3i  ]
EXAMPLE 4 If A, B ∈ Mm×n(C) have entries aℓj and bℓj respectively, then the ( j, j)-entry of \overline{B}ᵀA is Σ_{ℓ=1}^{m} \overline{bℓj} aℓj. Hence, we get

tr(\overline{B}ᵀA) = Σ_{ℓ=1}^{m} aℓ1\overline{bℓ1} + Σ_{ℓ=1}^{m} aℓ2\overline{bℓ2} + · · · + Σ_{ℓ=1}^{m} aℓn\overline{bℓn} = Σ_{j=1}^{n} Σ_{ℓ=1}^{m} aℓj\overline{bℓj}

which corresponds to the standard inner product of the corresponding vectors under
the obvious isomorphism with Cmn.
REMARK
As in the real case, whenever we are working with Cn or Mm×n(C), if no other inner
product is specified, we will assume we are working with the standard inner product.
EXERCISE 2 Let A, B ∈ M2×2(C) with A = [1  1 − i; 2 + 2i  i] and B = [2 + i  1; 2i  1 + i]. Determine
〈A, B〉 and 〈B, A〉.
Length and Orthogonality
We can now define length and orthogonality to match the definitions in the real case.
DEFINITION
Length, Unit Vector
Let V be a Hermitian inner product space with inner product 〈 , 〉. For any ~z ∈ V we
define the length of ~z by

‖~z‖ = √〈~z, ~z〉

If ‖~z‖ = 1, then ~z is called a unit vector.
DEFINITION
Orthogonality
Let V be a Hermitian inner product space with inner product 〈 , 〉. For any ~z, ~w ∈ V
we say that ~z and ~w are orthogonal if 〈~z, ~w〉 = 0.
EXAMPLE 5 If ~u = [i; 1 + i; 2 − i] ∈ C3, then

‖~u‖ = √〈~u, ~u〉 = √(~u · \overline{~u}) = √(i(−i) + (1 + i)(1 − i) + (2 − i)(2 + i)) = √(1 + 2 + 5) = √8
EXAMPLE 6 Let ~z = [1; i; 2 + 3i] ∈ C3. Which of the following vectors are orthogonal to ~z:

~u = [−i; 1; 0],   ~v = [i; 1; 0],   ~w = [1 − 2i; −7; 1 − i]

Solution: We have

〈~z, ~u〉 = ~z · \overline{~u} = 1(i) + i(1) + (2 + 3i)(0) = 2i
〈~z, ~v〉 = ~z · \overline{~v} = 1(−i) + i(1) + (2 + 3i)(0) = 0
〈~z, ~w〉 = ~z · \overline{~w} = 1(1 + 2i) + i(−7) + (2 + 3i)(1 + i) = (1 + 2i) − 7i + (−1 + 5i) = 0

Hence, ~v and ~w are orthogonal to ~z, and ~u is not orthogonal to ~z.
Of course, these satisfy all of our familiar properties of length and orthogonality.
THEOREM 4 If V is a Hermitian inner product space with inner product 〈 , 〉, then for any ~z, ~w ∈ V
and α ∈ C we have
(1) ‖α~z‖ = |α| ‖~z‖
(2) ‖~z + ~w‖ ≤ ‖~z‖ + ‖~w‖
(3) (1/‖~z‖)~z is a unit vector in the direction of ~z (for ~z ≠ ~0)
EXERCISE 3 Let V be a Hermitian inner product space with inner product 〈 , 〉. Prove that

‖α~z‖ = |α| ‖~z‖

for all ~z ∈ V and α ∈ C.
DEFINITION
Orthogonal, Orthonormal
Let S = {~z1, . . . , ~zk} be a set in a Hermitian inner product space with Hermitian inner
product 〈 , 〉. S is said to be orthogonal if 〈~zℓ, ~zj〉 = 0 for all ℓ ≠ j. S is said to be
orthonormal if it is orthogonal and ‖~zj‖ = 1 for 1 ≤ j ≤ k.
THEOREM 5 If S = {~z1, . . . , ~zk} is an orthogonal set in a Hermitian inner product space V, then

‖~z1 + · · · + ~zk‖² = ‖~z1‖² + · · · + ‖~zk‖²
THEOREM 6 If S = {~z1, . . . ,~zk} is an orthogonal set of non-zero vectors in a Hermitian inner
product space V, then S is linearly independent.
The formulas for finding coordinates with respect to an orthogonal basis, for performing
the Gram-Schmidt procedure, and for finding projections and perpendiculars are
still valid for Hermitian inner products. However, be careful to remember that a Hermitian
inner product is not symmetric, so you need to have the vectors in the inner products
in the correct order.
EXERCISE 4 Let {~v1, . . . , ~vn} be an orthogonal basis for a Hermitian inner product space V. Prove
that if ~z ∈ V, then

~z = (〈~z, ~v1〉/‖~v1‖²)~v1 + · · · + (〈~z, ~vn〉/‖~vn‖²)~vn
EXAMPLE 7 Use the Gram-Schmidt procedure to find an orthogonal basis for the subspace S of
C4 defined by

S = Span{ [1; 0; 1; 0], [i; 1; i; 1], [1 + i; −1; i; 1 + i] } = Span{~w1, ~w2, ~w3}

Solution: First, we let ~v1 = ~w1. Then we have

~v2 = ~w2 − (〈~w2, ~v1〉/‖~v1‖²)~v1 = [i; 1; i; 1] − (2i/2)[1; 0; 1; 0] = [0; 1; 0; 1]

~v3 = ~w3 − (〈~w3, ~v1〉/‖~v1‖²)~v1 − (〈~w3, ~v2〉/‖~v2‖²)~v2
    = [1 + i; −1; i; 1 + i] − ((1 + 2i)/2)[1; 0; 1; 0] − (i/2)[0; 1; 0; 1]
    = [1/2; −1 − i/2; −1/2; 1 + i/2]

Consequently, {~v1, ~v2, ~v3} is an orthogonal basis for S.
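The procedure translates directly into code, provided every inner product conjugates the second argument. A sketch (assuming NumPy; gram_schmidt is a hypothetical helper, not from the text):

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthogonalize complex vectors using <z, w> = sum z_k * conj(w_k)."""
    basis = []
    for w in vectors:
        v = np.asarray(w, dtype=complex)
        for u in basis:
            # Subtract the projection of v onto u; conjugate the second argument.
            v = v - (np.sum(v * u.conj()) / np.sum(u * u.conj())) * u
        basis.append(v)
    return basis

w1 = np.array([1, 0, 1, 0])
w2 = np.array([1j, 1, 1j, 1])
w3 = np.array([1 + 1j, -1, 1j, 1 + 1j])
v1, v2, v3 = gram_schmidt([w1, w2, w3])
print(v2)   # [0, 1, 0, 1]
print(v3)   # [1/2, -1 - i/2, -1/2, 1 + i/2]
```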
We recall that a real square matrix with orthonormal columns is an orthogonal matrix.
Since orthogonal matrices were so useful in the real case, we generalize them as well.
DEFINITION
Unitary Matrix
If the columns of a matrix U form an orthonormal basis for Cn, then U is called
unitary.

THEOREM 7 If U ∈ Mn×n(C), then the following are equivalent:
(1) the columns of U form an orthonormal basis for Cn
(2) \overline{U}ᵀ = U⁻¹
(3) the rows of U form an orthonormal basis for Cn
EXAMPLE 8 Determine which of the following matrices are unitary:

A = [ 1    1+2i ],   B = [ (1+i)/√2   0   0 ],   C = [ (1+i)/2   (1−i)/√6   1/√6        ]
    [ −1   1+2i ]        [ 0          0   1 ]        [ i/2       0          (−3−3i)/√24 ]
                         [ 0          i   0 ]        [ 1/2       2i/√6      (1−i)/√24   ]

Solution: The columns of A are not unit vectors, so A is not unitary.
The columns of B are clearly unit vectors and orthogonal to each other, so the
columns of B form an orthonormal basis for C3. Hence, B is unitary.
We find that \overline{C}ᵀC = I and hence C is unitary.
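The check on C can be reproduced numerically; a sketch that tests the defining property (conjugate-transpose times the matrix equals the identity):

```python
import numpy as np

s6, s24 = np.sqrt(6), np.sqrt(24)
C = np.array([[(1 + 1j) / 2, (1 - 1j) / s6, 1 / s6],
              [1j / 2, 0, (-3 - 3j) / s24],
              [1 / 2, 2j / s6, (1 - 1j) / s24]])
print(np.allclose(C.conj().T @ C, np.eye(3)))   # True: C is unitary
```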
REMARK
Observe that if the entries of A are all real, then A is unitary if and only if it is
orthogonal.
THEOREM 8 If U and W are unitary matrices, then UW is a unitary matrix.
Notice this is the second time we have seen the conjugate of the transpose of a matrix.
So, we make the following definition.
DEFINITION
Conjugate Transpose
For A ∈ Mm×n(C) the conjugate transpose A∗ of A is defined to be

A∗ = \overline{A}ᵀ
EXAMPLE 9 If A = [ 1  0  1+i  −2i; 1−3i  2  2+i  i; i  −3i  4−i  1−5i ], then

A∗ = [ 1      1+3i   −i   ]
     [ 0      2      3i   ]
     [ 1−i    2−i    4+i  ]
     [ 2i     −i     1+5i ]
REMARK
Observe that if A is a real matrix, then A∗ = AT . Hence, the conjugate transpose is
the complex version of the transpose.
THEOREM 9 If A, B are matrices of the appropriate size, ~z, ~w ∈ Cn, and α ∈ C, then
(1) 〈A~z, ~w〉 = 〈~z, A∗~w〉
(2) (A∗)∗ = A
(3) (A + B)∗ = A∗ + B∗
(4) (αA)∗ = \overline{α}A∗
(5) (AB)∗ = B∗A∗
Section 11.4 Problems
1. Let ~u = [1; 2i; 1 − i], ~z = [2; −2i; 1 − i], and ~w = [1 + i; 1 − 2i; i] be vectors in C3.
Calculate the following.
(a) ‖~u‖   (b) ‖~w‖
(c) 〈~u, ~z〉   (d) 〈~z, ~u〉
(e) 〈~u, (2 + i)~z〉   (f) 〈~z, ~w〉
(g) 〈~w, ~z〉   (h) 〈~u + ~z, 2i~w − i~z〉
2. Let A = [1  2i; −i  1 + i], B = [i  2; 1 − i  3] ∈ M2×2(C). Calculate the following.
(a) 〈A, B〉   (b) 〈B, A〉
(c) ‖A‖   (d) ‖B‖
3. Determine which of the following matrices are unitary.
(a) \begin{bmatrix} i/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & -i/\sqrt{2} \end{bmatrix}   (b) \begin{bmatrix} (1+i)/2 & (1-i)/\sqrt{6} & (1-i)/\sqrt{12} \\ 1/2 & 0 & 3i/\sqrt{12} \\ 1/2 & 2i/\sqrt{6} & -i/\sqrt{12} \end{bmatrix}
(c) \begin{bmatrix} 1/\sqrt{2} & -1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix}   (d) \begin{bmatrix} 0 & i & 0 \\ i & 0 & 0 \\ 0 & 0 & (1+i)/2 \end{bmatrix}
4. Let U ∈ Mn×n(C). Prove that if the columns
of U form an orthonormal basis for Cn, then
U∗ = U−1.
5. Consider C3 with its standard inner product. Let ~z = \begin{bmatrix} 1+i \\ 2-i \\ -1+i \end{bmatrix}, ~w = \begin{bmatrix} 1-i \\ -2-3i \\ -1 \end{bmatrix}.
(a) Evaluate 〈~z, ~w〉 and 〈~w, 2i~z〉.
(b) Find a non-zero vector in Span{~z, ~w} that is orthogonal to ~z.
(c) Let ~u ∈ C3. Write the formula for the projection of ~u onto S = Span{~z, ~w}.
6. Use the Gram-Schmidt procedure to produce
an orthogonal basis for the subspace S of M2×2(C) defined by
S = Span\left\{ \begin{bmatrix} 1 & i \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} i & 0 \\ 0 & i \end{bmatrix}, \begin{bmatrix} 0 & 2 \\ 0 & i \end{bmatrix} \right\}
7. Let A, B ∈ Mm×n(C).
(a) Prove that (A∗)∗ = A.
(b) Prove that (αA)^* = \overline{α}A^* for all α ∈ C.
(c) Prove that (AB)∗ = B∗A∗ (for matrices of appropriate sizes).
(d) Prove that 〈A~z, ~w〉 = 〈~z, A∗~w〉 for all ~z, ~w ∈ Cn.
8. Let U be an n × n unitary matrix.
(a) Show that 〈U~z,U~w〉 = 〈~z, ~w〉 for any ~z, ~w ∈ Cn.
(b) Suppose λ is an eigenvalue of U. Use part (a) to show that |λ| = 1.
(c) Find a unitary matrix U which has eigenvalue λ where λ ≠ ±1.
9. Let V be a complex inner product space, with
complex inner product 〈 , 〉. Prove that if
〈~u,~v〉 = 0, then ‖~u + ~v ‖2 = ‖~u ‖2 + ‖~v ‖2. Is
the converse true?
11.5 Unitary Diagonalization
We have seen that orthogonal diagonalization is amazingly useful. Thus, it makes
sense to try to extend this to the complex case as well.
DEFINITION
Unitarily Similar
Let A, B ∈ Mn×n(C). If there exists a unitary matrix U such that U∗AU = B, then we
say that A and B are unitarily similar.
If A and B are unitarily similar, then they are similar. Consequently, all of our prop-
erties of similarity still apply.
DEFINITION
Unitarily Diagonalizable
A matrix A ∈ Mn×n(C) is said to be unitarily diagonalizable if it is unitarily similar
to a diagonal matrix.
In Section 10.2, we saw that every real symmetric matrix is orthogonally diagonaliz-
able. It is natural to ask if there is a comparable result in the case of matrices with
complex entries.
We observe that if A is a real symmetric matrix, then the condition AT = A is equiv-
alent to A∗ = A. Hence, this condition should be our analogue of symmetric.
DEFINITION
Hermitian matrix
A matrix A ∈ Mn×n(C) such that A∗ = A is called Hermitian.
EXAMPLE 1 Which of the following matrices are Hermitian?
A = \begin{bmatrix} 2 & 3-i \\ 3+i & 4 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 2i \\ -2i & 3-i \end{bmatrix}, \quad C = \begin{bmatrix} 0 & i & i \\ -i & 0 & i \\ -i & i & 0 \end{bmatrix}
Solution: We have A^* = \begin{bmatrix} 2 & 3-i \\ 3+i & 4 \end{bmatrix} = A, so A is Hermitian.
B^* = \begin{bmatrix} 1 & 2i \\ -2i & 3+i \end{bmatrix} ≠ B, so B is not Hermitian.
C^* = \begin{bmatrix} 0 & i & i \\ -i & 0 & -i \\ -i & -i & 0 \end{bmatrix} ≠ C, so C is not Hermitian.
Observe that if A is Hermitian, then we have \overline{(A)_{ℓj}} = (A)_{jℓ}, so the diagonal entries of A
must be real, and for ℓ ≠ j the ℓj-th entry must be the complex conjugate of the jℓ-th entry. Moreover, we see that every real symmetric matrix is Hermitian.
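The definition tests itself in one line of NumPy. The helper name `is_hermitian` is ours, not the book's; the three matrices are those of Example 1.

```python
import numpy as np

def is_hermitian(A: np.ndarray) -> bool:
    """A matrix is Hermitian exactly when it equals its conjugate transpose."""
    return bool(np.allclose(A, A.conj().T))

# The three matrices of Example 1:
A = np.array([[2, 3-1j], [3+1j, 4]])
B = np.array([[1, 2j], [-2j, 3-1j]])                      # diagonal entry 3-i is not real
C = np.array([[0, 1j, 1j], [-1j, 0, 1j], [-1j, 1j, 0]])   # off-diagonal entries fail the conjugate test
```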
Our goal now is to prove that every Hermitian matrix is unitarily diagonalizable. We
will do this in exactly the same way that we did in Section 10.2 for symmetric
matrices: by first proving that every square matrix is unitarily similar to an upper
triangular matrix. The following extension of the Triangularization Theorem is
extremely important and useful.
THEOREM 1 (Schur’s Theorem)
If A ∈ Mn×n(C), then A is unitarily similar to an upper triangular matrix T . Moreover,
the diagonal entries of T are the eigenvalues of A.
Proof: We prove this by induction. If n = 1, then A is upper triangular, so we
can take U = I. Assume the result holds for all (n − 1) × (n − 1) matrices. Let
λ1 be an eigenvalue of A with corresponding unit eigenvector ~z1. Extend {~z1} to an
orthonormal basis {~z1, . . . ,~zn} of Cn and let U1 = [~z1 · · · ~zn]. Then, U1 is unitary
and we have

U_1^* A U_1 = \begin{bmatrix} \overline{\vec z_1}^T \\ \vdots \\ \overline{\vec z_n}^T \end{bmatrix} A \begin{bmatrix} \vec z_1 & \cdots & \vec z_n \end{bmatrix}
= \begin{bmatrix}
\overline{\vec z_1}^T \lambda_1 \vec z_1 & \overline{\vec z_1}^T A \vec z_2 & \cdots & \overline{\vec z_1}^T A \vec z_n \\
\overline{\vec z_2}^T \lambda_1 \vec z_1 & \overline{\vec z_2}^T A \vec z_2 & \cdots & \overline{\vec z_2}^T A \vec z_n \\
\vdots & \vdots & & \vdots \\
\overline{\vec z_n}^T \lambda_1 \vec z_1 & \overline{\vec z_n}^T A \vec z_2 & \cdots & \overline{\vec z_n}^T A \vec z_n
\end{bmatrix}

For 1 ≤ j ≤ n, we have

\overline{\vec z_j}^T \lambda_1 \vec z_1 = \lambda_1 \overline{\vec z_j}^T \vec z_1 = \lambda_1 \overline{\vec z_j} \cdot \vec z_1 = \lambda_1 \vec z_1 \cdot \overline{\vec z_j} = \lambda_1 \langle \vec z_1, \vec z_j \rangle
Since {~z1, . . . ,~zn} is orthonormal, we get that \overline{\vec z_1}^T \lambda_1 \vec z_1 = \lambda_1 and \overline{\vec z_j}^T \lambda_1 \vec z_1 = 0 for 2 ≤ j ≤ n. Hence, we can write

U_1^* A U_1 = \begin{bmatrix} \lambda_1 & \vec b^T \\ \vec 0 & A_1 \end{bmatrix}

where A1 is an (n−1)×(n−1) matrix and ~b ∈ Cn−1. Thus, by the inductive hypothesis
we get that there exists a unitary matrix Q such that Q∗A1Q = T1 is upper triangular.

Let U_2 = \begin{bmatrix} 1 & \vec 0^T \\ \vec 0 & Q \end{bmatrix}, then U2 is clearly unitary and hence U = U1U2 is unitary. Thus,

U^* A U = (U_1 U_2)^* A (U_1 U_2) = U_2^* U_1^* A U_1 U_2
= \begin{bmatrix} 1 & \vec 0^T \\ \vec 0 & Q^* \end{bmatrix} \begin{bmatrix} \lambda_1 & \vec b^T \\ \vec 0 & A_1 \end{bmatrix} \begin{bmatrix} 1 & \vec 0^T \\ \vec 0 & Q \end{bmatrix}
= \begin{bmatrix} \lambda_1 & \vec b^T Q \\ \vec 0 & T_1 \end{bmatrix}

which is upper triangular. Since unitarily similar matrices have the same eigenvalues
and the eigenvalues of a triangular matrix are its diagonal entries, we get that the
eigenvalues of A are along the main diagonal of U∗AU as required. �
REMARK
Schur’s Theorem shows that for every n × n matrix A there exists a unitary matrix U
such that U∗AU = T is upper triangular. Since U∗ = U−1 we can solve this for A to
get A = UTU∗. This is called a Schur decomposition of A.
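The induction in the proof translates directly into a short recursive program. The sketch below uses NumPy's `eig` and `qr` (all names are ours); it is a teaching illustration of the proof, not a production algorithm, since numerical libraries compute the Schur form with the shifted QR iteration instead.

```python
import numpy as np

def schur_triangularize(A):
    """Schur decomposition following the proof's induction: take a unit
    eigenvector, extend it to an orthonormal basis of C^n by QR, deflate,
    and recurse on the trailing (n-1)x(n-1) block.  Returns (U, T) with
    U unitary and T = U* A U upper triangular."""
    A = np.asarray(A, dtype=complex)
    n = A.shape[0]
    if n == 1:
        return np.eye(1, dtype=complex), A.copy()
    # A unit eigenvector z1 of A (for the eigenvalue evals[0]).
    evals, evecs = np.linalg.eig(A)
    z1 = evecs[:, 0] / np.linalg.norm(evecs[:, 0])
    # Extend {z1} to an orthonormal basis: QR preserves the span of the
    # leading columns, so the first column of Q is z1 up to a unit scalar.
    Q, _ = np.linalg.qr(np.column_stack([z1, np.eye(n)])[:, :n])
    B = Q.conj().T @ A @ Q            # block form [[lambda_1, b^T], [0, A_1]]
    Q2, _ = schur_triangularize(B[1:, 1:])
    U2 = np.eye(n, dtype=complex)
    U2[1:, 1:] = Q2
    U = Q @ U2                        # U = U_1 U_2 as in the proof
    return U, U.conj().T @ A @ U

# Demo on a random 4x4 complex matrix.
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
U, T = schur_triangularize(A)
```

Up to rounding error, U is unitary, T is upper triangular, and the diagonal of T lists the eigenvalues of A.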
We can now prove the result corresponding to the Principal Axis Theorem.
THEOREM 2 (Spectral Theorem for Hermitian Matrices)
If A is Hermitian, then A is unitarily diagonalizable.
Proof: By Schur’s Theorem, there exists a unitary matrix U and an upper triangular
matrix T such that U∗AU = T . Since A∗ = A we get that
T ∗ = (U∗AU)∗ = U∗A∗U∗∗ = U∗AU = T
Since T is upper triangular, we get that T ∗ is lower triangular. Thus, T is both upper
and lower triangular and hence is diagonal. Consequently, U unitarily diagonalizes
A. �
Just like the Principal Axis Theorem, this proof does not give us an effective way
of unitarily diagonalizing a Hermitian matrix. But, not surprisingly, we find that
Hermitian matrices have the same properties as symmetric matrices.
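In NumPy this decomposition is what `np.linalg.eigh` computes: given a Hermitian matrix it returns real eigenvalues (in ascending order) together with a unitary matrix of eigenvectors. A quick check on the Hermitian matrix A of Example 1 (the variable names are ours):

```python
import numpy as np

A = np.array([[2, 3-1j],
              [3+1j, 4]])
eigenvalues, U = np.linalg.eigh(A)   # real eigenvalues, unitary eigenvector matrix

D = np.diag(eigenvalues)
is_diagonalization = np.allclose(U.conj().T @ A @ U, D)
```

Here C(λ) = λ² − 6λ − 2, so the eigenvalues are 3 ± √11, and `eigenvalues` comes back as a real array even though A is complex.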
THEOREM 3 A matrix A ∈ Mn×n(C) is Hermitian if and only if for all ~z, ~w ∈ Cn we have
〈A~z, ~w〉 = 〈~z, A~w〉
Proof: If A is Hermitian, then A∗ = A and hence by Theorem 11.4.9 we get
〈A~z, ~w〉 = 〈~z, A∗~w〉 = 〈~z, A~w〉
On the other hand, if 〈~z, A~w〉 = 〈A~z, ~w〉, then we have
\vec z^T \overline{A}\,\overline{\vec w} = \vec z^T \overline{A \vec w} = 〈~z, A~w〉 = 〈A~z, ~w〉 = (A\vec z)^T \overline{\vec w} = \vec z^T A^T \overline{\vec w}
Since this is valid for all ~z, ~w ∈ Cn, we can apply the Matrices Equal Theorem twice
to get \overline{A} = A^T, and hence A = A∗ as required. �
THEOREM 4 If A is a Hermitian matrix, then
(1) All the eigenvalues of A are real.
(2) If λ1 and λ2 are distinct eigenvalues with corresponding eigenvectors ~v1 and ~v2
respectively, then ~v1 and ~v2 are orthogonal.
Proof: We will prove (1) and leave (2) as an exercise. Let λ be an eigenvalue of A
with corresponding eigenvector ~z. Then,
λ〈~z,~z〉 = 〈λ~z,~z〉 = 〈A~z,~z〉 = 〈~z, A∗~z〉 = 〈~z, A~z〉 = 〈~z, λ~z〉 = \overline{λ}〈~z,~z〉
Since 〈~z,~z〉 ≠ 0 we have λ = \overline{λ}, so λ is real. �
REMARK
Since every real symmetric matrix is Hermitian, this result implies Lemma 10.2.2.
Thus, we can unitarily diagonalize a Hermitian matrix in exactly the same way that
we orthogonally diagonalized symmetric matrices. The amazing thing though is that
we are guaranteed to get a real matrix D.
EXAMPLE 2 Let A = \begin{bmatrix} -4 & 1-3i \\ 1+3i & 5 \end{bmatrix}. Verify that A is Hermitian and show that A is unitarily diagonalizable.

Solution: We see that A^* = \begin{bmatrix} -4 & 1-3i \\ 1+3i & 5 \end{bmatrix} = A, so A is Hermitian. We have

C(λ) = \begin{vmatrix} -4-λ & 1-3i \\ 1+3i & 5-λ \end{vmatrix} = λ^2 - λ - 30 = (λ+5)(λ-6)

Hence, the eigenvalues are λ1 = −5 and λ2 = 6. We have

A + 5I = \begin{bmatrix} 1 & 1-3i \\ 1+3i & 10 \end{bmatrix} \sim \begin{bmatrix} 1 & 1-3i \\ 0 & 0 \end{bmatrix}

So, a basis for E_{λ_1} is \left\{ \begin{bmatrix} -1+3i \\ 1 \end{bmatrix} \right\}. We also have

A - 6I = \begin{bmatrix} -10 & 1-3i \\ 1+3i & -1 \end{bmatrix} \sim \begin{bmatrix} 1 & -(1-3i)/10 \\ 0 & 0 \end{bmatrix}

So, a basis for E_{λ_2} is \left\{ \begin{bmatrix} 1-3i \\ 10 \end{bmatrix} \right\}. Observe that

\left\langle \begin{bmatrix} -1+3i \\ 1 \end{bmatrix}, \begin{bmatrix} 1-3i \\ 10 \end{bmatrix} \right\rangle = (-1+3i)(1+3i) + 1(10) = 0

Thus, the eigenvectors are orthogonal. Normalizing them, we get that A is diagonalized by the unitary matrix

U = \begin{bmatrix} (-1+3i)/\sqrt{11} & (1-3i)/\sqrt{110} \\ 1/\sqrt{11} & 10/\sqrt{110} \end{bmatrix}

to D = \begin{bmatrix} -5 & 0 \\ 0 & 6 \end{bmatrix}.
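The answer can be verified numerically (a convenience check of ours, not part of the text):

```python
import numpy as np

# The matrix A of Example 2 and the unitary matrix U found there.
A = np.array([[-4, 1-3j],
              [1+3j, 5]])
U = np.column_stack([np.array([-1+3j, 1]) / np.sqrt(11),
                     np.array([1-3j, 10]) / np.sqrt(110)])
D = U.conj().T @ A @ U   # should be diag(-5, 6)
```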
In the real case, we also had that every orthogonally diagonalizable matrix was sym-
metric. Unfortunately, the corresponding result is not true in the complex case as the
next example demonstrates.
EXAMPLE 3 Prove that A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} is unitarily diagonalizable, but not Hermitian.

Solution: Observe that A^* = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} ≠ A, so A is not Hermitian.

In Example 11.3.1, we showed that A is diagonalized by P = \begin{bmatrix} -i & i \\ 1 & 1 \end{bmatrix}. Observe that

\left\langle \begin{bmatrix} -i \\ 1 \end{bmatrix}, \begin{bmatrix} i \\ 1 \end{bmatrix} \right\rangle = (-i)(-i) + 1(1) = 0

so the columns of P are orthogonal. Hence, if we normalize them, we get that A is unitarily diagonalized by U = \begin{bmatrix} -i/\sqrt{2} & i/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix}.
The matrix in Example 3 satisfies A∗ = −A and so is called skew-Hermitian. Thus,
we see that there are matrices which are not Hermitian but whose eigenvectors form
an orthonormal basis for Cn and hence are unitarily diagonalizable. In particular, in
the proof of the Spectral Theorem for Hermitian Matrices, we can observe that the
reverse direction fails because if A is not Hermitian, we cannot guarantee that T ∗ = T
since some of the eigenvalues may not be real.
So, as we did in the real case, we should look for a necessary condition for a matrix
to be unitarily diagonalizable.
Assume that the eigenvectors of A ∈ Mn×n(C) form an orthonormal basis {~z1, . . . ,~zn} for
Cn. Then, we know that U = [~z1 · · · ~zn] unitarily diagonalizes A. That is, we
have U∗AU = D, where D is diagonal. Hence, D∗ = U∗A∗U. Now, notice that
DD∗ = D∗D since D and D∗ are diagonal. This gives
(U∗AU)(U∗A∗U) = (U∗A∗U)(U∗AU) ⇒ U∗AA∗U = U∗A∗AU
Consequently, if A is unitarily diagonalizable, then we must have AA∗ = A∗A.
DEFINITION
Normal Matrix
A matrix A ∈ Mn×n(C) is called normal if AA∗ = A∗A.
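A direct NumPy check of the definition (the helper name `is_normal` is ours): a Hermitian matrix and the skew-Hermitian matrix of Example 3 are both normal, while a Jordan block is not.

```python
import numpy as np

def is_normal(A: np.ndarray) -> bool:
    """A matrix is normal exactly when it commutes with its conjugate transpose."""
    A_star = A.conj().T
    return bool(np.allclose(A @ A_star, A_star @ A))

hermitian = np.array([[2, 3-1j], [3+1j, 4]])   # Hermitian, hence normal
skew = np.array([[0., 1.], [-1., 0.]])         # skew-Hermitian, also normal
not_normal = np.array([[1., 1.], [0., 1.]])    # a Jordan block fails the test
```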
THEOREM 5 (Spectral Theorem for Normal Matrices)
A matrix A is normal if and only if it is unitarily diagonalizable.
Proof: We proved that every unitarily diagonalizable matrix is normal above. So,
we just need to prove that every normal matrix is unitarily diagonalizable. Of course,
we do this using Schur’s Theorem.
By Schur’s Theorem, there exists an upper triangular matrix T and a unitary matrix
U such that U∗AU = T . We just need to prove that T is in fact diagonal. Observe
that
TT ∗ = (U∗AU)(U∗A∗U) = U∗AA∗U = U∗A∗AU = (U∗A∗U)(U∗AU) = T ∗T
Hence, T is also normal.
We now just need to prove that every upper triangular normal matrix is diagonal.
Write T as

T = \begin{bmatrix} t_{11} & t_{12} & \cdots & t_{1n} \\ 0 & t_{22} & \cdots & t_{2n} \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & t_{nn} \end{bmatrix}

Comparing the (1, 1)-entries of TT∗ and T∗T gives

|t_{11}|^2 + |t_{12}|^2 + \cdots + |t_{1n}|^2 = |t_{11}|^2

Notice that this is only possible if t12 = · · · = t1n = 0.

Therefore, we have T = \begin{bmatrix} t_{11} & 0 & \cdots & 0 \\ 0 & t_{22} & \cdots & t_{2n} \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & t_{nn} \end{bmatrix}.
Next, we compare the (2, 2)-entries of TT∗ and T∗T. We get
|t22|2 + · · · + |t2n|2 = |t22|2
Hence, t23 = · · · = t2n = 0.
Continuing in this way we get that T is diagonal as required. �
As a result normal matrices are very important. We now look at some useful proper-
ties of normal matrices.
THEOREM 6 If A ∈ Mn×n(C) is normal, then
(1) ‖A~z ‖ = ‖A∗~z ‖, for all ~z ∈ Cn
(2) A − λI is normal for every λ ∈ C
(3) If A~z = λ~z, then A^*~z = \overline{λ}~z
(4) If ~z1 and ~z2 are eigenvectors of A corresponding to distinct eigenvalues λ1 and λ2 of A, then ~z1 and ~z2 are orthogonal.
Proof: For (1), we use Theorem 11.4.9. For any ~z ∈ Cn we get
‖A∗~z ‖2 = 〈A∗~z, A∗~z〉 = 〈~z, (A∗)∗A∗~z〉 = 〈~z, AA∗~z〉 = 〈~z, A∗A~z〉 = 〈A~z, A~z〉 = ‖A~z‖2
For (2), we observe that

(A - λI)(A - λI)^* = (A - λI)(A^* - \overline{λ}I) = AA^* - \overline{λ}A - λA^* + |λ|^2 I
= A^*A - \overline{λ}A - λA^* + |λ|^2 I = (A^* - \overline{λ}I)(A - λI)
= (A - λI)^*(A - λI)

For (3), suppose that A~z = λ~z for some ~z ∈ Cn and let B = A − λI. Then, B is normal
by (2) and
B~z = (A − λI)~z = A~z − λ~z = ~0.
So, by (1) we get
0 = ‖B~z ‖ = ‖B^*~z ‖ = ‖(A^* - \overline{λ}I)~z ‖ = ‖A^*~z - \overline{λ}~z ‖
Thus, A^*~z = \overline{λ}~z, as required.
The proof of (4) is left as an exercise. �
Notice that property (4) shows us that the procedure for unitarily diagonalizing a
normal matrix is exactly the same as the procedure for orthogonally diagonalizing a
real symmetric matrix, except that the calculations are a little more complex.
EXAMPLE 4 Unitarily diagonalize A = \begin{bmatrix} 4i & 1+3i \\ -1+3i & i \end{bmatrix}.

Solution: We have C(λ) = λ2 − 5iλ + 6. Using the quadratic formula we get eigenvalues λ1 = 6i and λ2 = −i. For λ1 = 6i we get

A - λ_1 I = \begin{bmatrix} -2i & 1+3i \\ -1+3i & -5i \end{bmatrix} \;\Rightarrow\; \vec z_1 = \begin{bmatrix} 3-i \\ 2 \end{bmatrix}

For λ2 = −i we get

A - λ_2 I = \begin{bmatrix} 5i & 1+3i \\ -1+3i & 2i \end{bmatrix} \;\Rightarrow\; \vec z_2 = \begin{bmatrix} 1+3i \\ -5i \end{bmatrix}

Hence, taking U = \begin{bmatrix} (3-i)/\sqrt{14} & (1+3i)/\sqrt{35} \\ 2/\sqrt{14} & -5i/\sqrt{35} \end{bmatrix} gives U^*AU = \begin{bmatrix} 6i & 0 \\ 0 & -i \end{bmatrix}.
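Again the result can be verified numerically (a convenience check of ours):

```python
import numpy as np

# The matrix of Example 4 together with the unitary U found there.
A = np.array([[4j, 1+3j],
              [-1+3j, 1j]])
U = np.column_stack([np.array([3-1j, 2]) / np.sqrt(14),
                     np.array([1+3j, -5j]) / np.sqrt(35)])
D = U.conj().T @ A @ U   # should be diag(6i, -i)
```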
Section 11.5 Problems
1. Prove that every skew-Hermitian matrix is unitarily diagonalizable by:
(a) mimicking the proof of the Spectral Theorem for Hermitian Matrices.
(b) showing directly that every skew-Hermitian matrix is normal.
2. Prove that all the eigenvalues of a skew-
Hermitian matrix A are purely imaginary.
3. Prove that every unitary matrix is unitarily diagonalizable by:
(a) mimicking the proof of the Spectral Theorem for Hermitian Matrices.
(b) showing directly that every unitary matrix is normal.
4. Determine if the matrix is Hermitian, skew-Hermitian, and/or normal.
(a) \begin{bmatrix} 3 & 1-i \\ 1+i & 5 \end{bmatrix}   (b) \begin{bmatrix} 2 & -i \\ i & 1+i \end{bmatrix}
(c) \begin{bmatrix} 0 & 1-i \\ -1-i & 0 \end{bmatrix}   (d) \begin{bmatrix} 1-i & 2i \\ 2 & 3 \end{bmatrix}
(e) \begin{bmatrix} i & 2i \\ 2 & i \end{bmatrix}   (f) \begin{bmatrix} 0 & 3i \\ -3i & 1 \end{bmatrix}
5. Unitarily diagonalize the following matrices.
(a) A = \begin{bmatrix} a & b \\ -b & a \end{bmatrix}   (b) A = \begin{bmatrix} 2 & 1+i \\ 1-i & 3 \end{bmatrix}
(c) A = \begin{bmatrix} 5 & \sqrt{2}-i \\ \sqrt{2}+i & 3 \end{bmatrix}   (d) A = \begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}
(e) A = \begin{bmatrix} 1 & 0 & 1+i \\ 0 & 2 & 0 \\ 1-i & 0 & 0 \end{bmatrix}   (f) A = \begin{bmatrix} 1 & i & -i \\ -i & -1 & i \\ i & -i & 0 \end{bmatrix}
6. Let A ∈ Mn×n(C) be normal and invertible.
Prove that B = A∗A−1 is unitary.
7. Let A and B be Hermitian matrices. Prove that
AB is Hermitian if and only if AB = BA.
8. Let A = \begin{bmatrix} 2i & -2+i \\ 1-i & 3 \end{bmatrix}.
(a) Prove that λ1 = 2+ i is an eigenvalue of A.
(b) Find a unitary matrix U such that U∗AU =
T is upper triangular.
9. Let A ∈ Mn×n(C) satisfy A∗ = iA.
(a) Prove that A is normal.
(b) Show that every eigenvalue λ of A must satisfy λ = -i\overline{λ}.
10. Let A ∈ Mn×n(C), and let λ1, . . . , λn denote the
eigenvalues of A (repeated according to multi-
plicity). Prove that
det A = λ1 · · · λn, and tr A = λ1 + · · · + λn
11.6 Cayley-Hamilton Theorem
We now use Schur’s theorem to prove a result about the relationship of a matrix with
its characteristic polynomial.
THEOREM 1 (Cayley-Hamilton Theorem)
If A ∈ Mn×n(C), then A is a root of its characteristic polynomial C(λ).
Proof: The characteristic polynomial of A has the form
C(λ) = (−1)nλn + cn−1λn−1 + · · · + c1λ + c0
We now consider the polynomial
C(X) = (−1)nXn + cn−1Xn−1 + · · · + c1X + c0I
whose argument is a matrix X.
Now, by Schur’s theorem, there exists a unitary matrix U and upper triangular matrix
T such that U∗AU = T .
Since the eigenvalues λ1, . . . , λn of A are the diagonal entries of T we have
T = \begin{bmatrix} \lambda_1 & t_{12} & \cdots & t_{1n} \\ 0 & \lambda_2 & \cdots & t_{2n} \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & \lambda_n \end{bmatrix}
Moreover, we know that the roots of the characteristic polynomial are the eigenvalues
of A. Thus, we can factor C(X) as
C(X) = (X − λ1I)(X − λ2I) · · · (X − λnI)
Therefore,
C(T ) = (T − λ1I)(T − λ2I) · · · (T − λnI) (11.1)
We will show that C(T ) is the zero matrix by showing that the first k columns of
(T − λ1I)(T − λ2I) · · · (T − λkI) only contain zeros for 1 ≤ k ≤ n.
Observe that the first column of T − λ1I is zero since the first column of T is
\begin{bmatrix} \lambda_1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}.
Assume that all entries of the first k columns of (T − λ1I)(T − λ2I) · · · (T − λkI) only
contain zeros. Then, since the bottom n − k entries of the (k + 1)-st column of T − λk+1I are all zero, we get that
the first k + 1 columns of (T − λ1I) · · · (T − λkI)(T − λk+1I) only contain zeros. Thus,
by induction, C(T ) = On,n.
We now observe that since A = UTU∗ we have
C(A) = C(UTU∗)
= (−1)n(UTU∗)n + cn−1(UTU∗)n−1 + · · · + c1(UTU∗) + c0I
= (−1)nUT nU∗ + cn−1UT n−1U∗ + · · · + c1UTU∗ + c0UU∗
= U[(−1)nT n + cn−1T n−1 + · · · + c1T + c0I]U∗
= UC(T )U∗ = On,n
as required.
�
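The theorem is easy to test numerically. NumPy's `np.poly` (assumed here) returns the coefficients of det(λI − A), highest power first, which differs from the book's C(λ) = det(A − λI) only by the factor (−1)ⁿ, so A annihilates it as well; the matrix polynomial is evaluated by Horner's rule.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

coeffs = np.poly(A)          # monic characteristic polynomial of A
C_A = np.zeros_like(A)
for c in coeffs:             # Horner's rule with a matrix argument
    C_A = C_A @ A + c * np.eye(n)

cayley_hamilton_holds = np.allclose(C_A, 0, atol=1e-8)
```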
EXAMPLE 1 Verify that A = \begin{bmatrix} a & b \\ 0 & c \end{bmatrix} is a root of its characteristic polynomial.

Solution: Since A is upper triangular, we have C(λ) = (λ−a)(λ−c) = λ2−(a+c)λ+ac.
Then,

C(A) = A^2 - (a+c)A + acI
= \begin{bmatrix} a^2 & ab+bc \\ 0 & c^2 \end{bmatrix} - \begin{bmatrix} a^2+ac & ab+cb \\ 0 & ac+c^2 \end{bmatrix} + \begin{bmatrix} ac & 0 \\ 0 & ac \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}
EXERCISE 1 Verify that A = \begin{bmatrix} a & b & c \\ 0 & d & e \\ 0 & 0 & f \end{bmatrix}
is a root of its characteristic polynomial in two ways. First,
show it directly by using the method of Example 1. Second, show it by evaluating
C(A) = −(A − aI)(A − dI)(A − f I)
by multiplying from left to right as in the proof.
We now briefly look at one of the uses of the Cayley-Hamilton Theorem.
THEOREM 2 If A ∈ Mn×n(C) is invertible, then A−1 can be written as a linear combination of
powers of A.
Proof: Assume that the characteristic polynomial of A is
C(λ) = (−1)nλn + cn−1λn−1 + · · · + c1λ + c0
Observe that if c0 = 0, then λ1 = 0 is a root of C(λ) and hence det A = 0. But, this
would contradict the fact that A is invertible. So, c0 ≠ 0.
Then, by the Cayley-Hamilton Theorem we get
(−1)nAn + cn−1An−1 + · · · + c1A + c0I = On,n
Rearranging this gives

I = -\frac{1}{c_0}\left( (-1)^n A^n + c_{n-1}A^{n-1} + \cdots + c_1 A \right)
= -\frac{1}{c_0}\left( (-1)^n A^{n-1} + c_{n-1}A^{n-2} + \cdots + c_1 I \right) A

Hence,

A^{-1} = -\frac{1}{c_0}\left( (-1)^n A^{n-1} + c_{n-1}A^{n-2} + \cdots + c_1 I \right)
as required. �
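The formula in the proof can be turned into a short function. This is a sketch of ours using `np.poly` for the characteristic coefficients; since `np.poly` returns the monic polynomial det(λI − A), the constant term is (−1)ⁿ det A and the same rearrangement applies.

```python
import numpy as np

def inverse_via_char_poly(A):
    """A^{-1} from the Cayley-Hamilton Theorem: with monic coefficients
    [1, c_{n-1}, ..., c_1, c_0] of det(lambda*I - A), we have
    A^{-1} = -(A^{n-1} + c_{n-1}A^{n-2} + ... + c_1 I)/c_0."""
    n = A.shape[0]
    coeffs = np.poly(A)
    c0 = coeffs[-1]              # equals (-1)^n det A, nonzero when A is invertible
    B = np.zeros_like(A, dtype=complex)
    for c in coeffs[:-1]:        # Horner's rule on the truncated polynomial
        B = B @ A + c * np.eye(n)
    return -B / c0

# Demo on a random invertible 3x3 complex matrix.
rng = np.random.default_rng(3)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
A_inv = inverse_via_char_poly(A)
```

The result agrees with `np.linalg.inv` up to rounding error, which is exactly the statement that A⁻¹ is a linear combination of I, A, ..., Aⁿ⁻¹.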
Section 11.6 Problems
1. Given that the characteristic polynomial of A = \begin{bmatrix} 1 & 1 & 1 \\ 0 & -1 & -1 \\ -1 & 2 & 0 \end{bmatrix} is
C(λ) = λ3 + 2λ − 2, write A−1 as a linear combination of powers of A.
2. Determine A3 given that the characteristic polynomial of A = \begin{bmatrix} 3 & -8 & -2 \\ -8 & -9 & -4 \\ -2 & -4 & 6 \end{bmatrix} is
C(λ) = λ3 − 147λ + 686.
3. Find a 3 × 3 matrix A that has characteristic polynomial λ3 − 3λ + 2.
INDEX
C(a, b), 112
k-flat, 21
L, 107
Mm×n(R), 72, 111
Pn(R), 111
P(R), 112
Rn, 1
operations on, 2
properties of, 4
subspaces of, 23
Adjugate, 174
Algebraic Multiplicity, 192
Angle in Rn, 33
Approximation Theorem, 273
Augmented Matrix, 54
B-Matrix, 183
Basis
algorithm for finding, 126
for subsets of Rn, 16, 26
for a vector space, 124
Block Matrix, 84
Block Multiplication, 84
Cartesian Product, 113
Cayley-Hamilton Theorem, 353
Change of Coordinates Matrix, 138
Characteristic Polynomial, 191
Coefficient Matrix, 54
Cofactor, 161
inverse by, 173
Cofactor Expansion, 165
Cofactor Matrix, 174
Complex Conjugates
of complex numbers, 322
of complex vectors, 331
of complex matrices, 331
Composition, 108, 218
Coordinate Vector, 133
Column Space, 102, 207
Consistent System, 49
Cramer’s Rule, 176
Cross Product, 35
properties of, 36
Determinant
2 × 2 matrix, 147, 160
3 × 3 matrix, 161
n × n matrix, 162
Diagonal Matrix, 185
Diagonalization, 197
algorithm, 200
orthogonal, 284
unitary, 345
Diagonalization Theorem, 197
Dimension, 130
Dimension Theorem, 212
Direct Sum, 270
Dot Product, 31
Eigenpair, 188, 333
Eigenspace, 191
Eigenvalue, 188, 333
Eigenvector, 188, 333
Elementary Matrix, 151
Elementary Row Operations (EROs), 57
Equivalent System, 54
Four Fundamental Subspaces, 103, 207
Free Variable, 62
Fundamental Theorem of Linear Algebra, 272
Geometric Multiplicity, 192
Gauss-Jordan Elimination, 59
Gram-Schmidt Procedure, 257
Hermitian Inner Product, 338
Hermitian Inner Product Space, 338
Hermitian Matrix, 345
Homogeneous System, 65
Hyperplane in Rn, 20
Identity Mapping, 109
Identity Matrix, 83
Inconsistent System, 49
Infinite Dimensional, 130
Inner Product, 237
Inner Product Space, 237
Invertible Matrix, 144
Invertible Matrix Theorem, 148, 170
Isomorphism, 232
Kernel, 99, 221
Least Squares Method, 275
Left Inverse, 143
Left Nullspace, 103, 207
Length, 32, 242, 340
properties of, 33
Line in Rn, 20
Linearity Property, 89
Linear Combination, 3
Linear Equation, 48
Linear Mappings L : Rn → Rm, 89
B-matrix, 183
eigenvalues, 188
kernel, 99
operations on, 106, 108
properties of, 107
range, 96
standard matrix, 91
Linear Mappings L : V→W, 215
matrix of, 225, 227
kernel, 221
one-to-one, 230
onto, 230
operations on, 217, 218
properties of, 218
range, 221
rank, 222
Linear Operator, 89, 215
Linearly Dependent/Independent, 13, 121
Lower Triangular, 167
Matrices Equal Theorem, 82
Matrix, 72
addition, 72
augmented, 54
coefficient, 54
column space, 102
conjugate transpose, 343
diagonalizable, 197
eigenvalues, 188
Hermitian, 345
inverse, 144
left nullspace, 103
mappings, 88
normal, 349
of a linear mapping, 183
orthogonal, 253
nullspace, 101
properties of, 73
row space, 103
scalar multiplication, 72
similar, 186
transpose, 74
unitary, 342
Matrix-Matrix Multiplication, 80
block multiplication, 84
identity, 83
properties of, 81
Matrix-Vector Multiplication, 75, 77
properties of, ??
Normal Equations, 274
Normal Matrix, 349
Normal System, 274
Normal Vector, 38
Normalizing, 244
Nullity, 222
Nullspace, 101, 207
One-to-one, 230
Onto, 230
Orthogonal Complement, 262
Orthogonal Matrix, 253
Orthogonal Vectors, 34, 244, 340
Orthogonal Basis, 246
Orthogonal Set, 34, 245, 341
Orthogonally Similar, 280
Orthonormal Basis, 248
Orthonormal Set, 248, 341
Parallelotope, 181
Perpendicular (of a projection), 43, 44, 266
Plane in Rn, 20
Projection
onto a line, 42
onto a plane, 44
onto a subspace, 266
QR-Decomposition, 260
Quadratic Form, 292
classifications, 294
diagonal, 294
principal axes, 301
Range, 96, 221
Rank
of a linear mapping, 222
of a matrix, 67
Rank-Nullity Theorem, 223
Reduced Row Echelon Form (RREF), 58
Reflections, 93
Right Inverse, 143
Rotation in R2, 93
Rotation Matrix, 93
Row Equivalent, 57
Row Reducing Algorithm, 62
Row Space, 103, 207
Scalar Equation of a Plane, 37, 38
Scalar Equation of a Hyperplane, 40
Scalar Product, 31
Schur’s Theorem, 346
Similar Matrices, 186
orthogonally, 280
unitarily, 345
Singular Value Decomposition, 315
Singular Values, 310
Singular Vectors, 312
Skew-Symmetric, 117
Solution/Solution Set, 49
Solution Space, 67
Solution Theorem, 69
Span
in Rn, 7
in a vector space, 118
Spectral Theorem
for Hermitian matrices, 347
for normal matrices, 350
Square Matrix, 144
Standard Basis
for Rn, 17
for M2×2(R), 124
for Pn(R), 124
Standard Inner Product
for Cn, 337
for Rn, 31
Standard Matrix, 91
Subspace
of Rn, 23
of a vector space over R, 115
bases of, 26
Subspace Test, 23, 115
Symmetric Matrix, 284
System of Linear Equations, 49
algorithm for solving, 62
System Rank Theorem, 68
Trace, 186, 216
Transpose, 74
properties of, 74
Triangularization Theorem, 281
Trivial Solution, 13
Trivial Vector Space, 112
Unit Vector, 33, 244, 340
Unitarily Diagonalizable, 345
Unitarily Similar, 345
Unitary Matrix, 342
properties of, 334
Upper Triangular, 167
Vector Equation, 7
Vector Space over C, 326
subspace of, 327
Vector Space over R, 110
subspace of, 115
bases of, 124