Contents
A Note to Students - READ THIS! . . . . . . . . . . . . . . . . . . iii
How To Make The Most Of This Book: SQ3R . . . . . . . . . . . . viii
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . ix
1 Vectors in Euclidean Space 1
1.1 Vector Addition and Scalar Multiplication . . . . . . . . . . . . . . 1
1.2 Bases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 Subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.4 Dot Product, Cross Product, and Scalar Equations . . . . . . . . . . 30
1.5 Projections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
2 Systems of Linear Equations 48
2.1 Systems of Linear Equations . . . . . . . . . . . . . . . . . . . . . 48
2.2 Solving Systems of Linear Equations . . . . . . . . . . . . . . . . . 53
3 Matrices and Linear Mappings 72
3.1 Operations on Matrices . . . . . . . . . . . . . . . . . . . . . . . . 72
3.2 Linear Mappings . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
3.3 Special Subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
3.4 Operations on Linear Mappings . . . . . . . . . . . . . . . . . . . 106
4 Vector Spaces 110
4.1 Vector Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
4.2 Bases and Dimension . . . . . . . . . . . . . . . . . . . . . . . . . 124
4.3 Coordinates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
5 Inverses and Determinants 143
5.1 Matrix Inverses . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
5.2 Elementary Matrices . . . . . . . . . . . . . . . . . . . . . . . . . 151
5.3 Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
5.4 Determinants and Systems of Equations . . . . . . . . . . . . . . . 173
5.5 Area and Volume . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
6 Eigenvectors and Diagonalization 182
6.1 Matrix of a Linear Mapping, Similar Matrices . . . . . . . . . . . . 182
6.2 Eigenvalues and Eigenvectors . . . . . . . . . . . . . . . . . . . . . 188
6.3 Diagonalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
6.4 Powers of Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . 204
7 Fundamental Subspaces 207
7.1 Bases of Fundamental Subspaces . . . . . . . . . . . . . . . . . . . 207
8 Linear Mappings 215
8.1 General Linear Mappings . . . . . . . . . . . . . . . . . . . . . . . 215
8.2 Rank-Nullity Theorem . . . . . . . . . . . . . . . . . . . . . . . . 221
8.3 Matrix of a Linear Mapping . . . . . . . . . . . . . . . . . . . . . 225
8.4 Isomorphisms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230
9 Inner Products 237
9.1 Inner Product Spaces . . . . . . . . . . . . . . . . . . . . . . . . . 237
9.2 Orthogonality and Length . . . . . . . . . . . . . . . . . . . . . . . 242
9.3 The Gram-Schmidt Procedure . . . . . . . . . . . . . . . . . . . . 256
9.4 General Projections . . . . . . . . . . . . . . . . . . . . . . . . . . 262
9.5 The Fundamental Theorem . . . . . . . . . . . . . . . . . . . . . . 270
9.6 The Method of Least Squares . . . . . . . . . . . . . . . . . . . . . 273
10 Applications of Orthogonal Matrices 280
10.1 Orthogonal Similarity . . . . . . . . . . . . . . . . . . . . . . . . . 280
10.2 Orthogonal Diagonalization . . . . . . . . . . . . . . . . . . . . . . 284
10.3 Quadratic Forms . . . . . . . . . . . . . . . . . . . . . . . . . . . 292
10.4 Graphing Quadratic Forms . . . . . . . . . . . . . . . . . . . . . . 299
10.5 Optimizing Quadratic Forms . . . . . . . . . . . . . . . . . . . . . 305
10.6 Singular Value Decomposition . . . . . . . . . . . . . . . . . . . . 309
11 Complex Vector Spaces 319
11.1 Complex Number Review . . . . . . . . . . . . . . . . . . . . . . . 319
11.2 Complex Vector Spaces . . . . . . . . . . . . . . . . . . . . . . . . 326
11.3 Complex Diagonalization . . . . . . . . . . . . . . . . . . . . . . . 333
11.4 Complex Inner Products . . . . . . . . . . . . . . . . . . . . . . . 336
11.5 Unitary Diagonalization . . . . . . . . . . . . . . . . . . . . . . . . 345
11.6 Cayley-Hamilton Theorem . . . . . . . . . . . . . . . . . . . . . . 353
Index 356
A NOTE TO STUDENTS - READ THIS!
Best Selling Novel: “My Favourite Math,” by Al G. Braw
Welcome to Linear Algebra!
In this book, you will be introduced to basic elements in linear algebra, including
vectors, vector spaces, matrices, linear mappings, and solving systems of linear equa-
tions. You will then be led to more advanced topics and applications including gen-
eral projections and unitary diagonalization. The strategy used in this book is to
help you understand an example of a concept before generalizing it. For example, in
Chapter 1 you will learn about vectors in Rn, and then in Chapter 4 we will extend
part of what we did with vectors in Rn to the abstract setting of a real vector space.
Of course, the better you understand the particular example, the easier it will be for
you to understand the more complicated concepts introduced later.
As you will quickly see, this book contains both computational and theoretical ele-
ments. Most students, with enough practice, will find the computational questions
fairly easy, but may have difficulty with the theory portion (definitions, theorems and
proofs). While studying linear algebra you should keep in mind that when applying
these concepts in the future, the problems will likely be far too large to compute by
hand. Hence, the computational work will be done by a computer, and you will use
your knowledge and understanding of the theory to set up the problem in terms of
linear algebra and to interpret the results. Why do we make you compute things by
hand if they can be done by a computer? Because solving problems by hand can often
help you understand the theory. Thus, when you first learn how to solve problems,
pay close attention to how solving the problems applies to the concepts in the course
(definitions and theorems). In doing so, not only will you find the computational
problems easier, but it will also help you greatly when doing proofs.
Prerequisites:
Teacher: Recall from your last course that...
Student: What?!? You mean we were supposed to remember that?
We expect that you know:
- The basics of set theory
- How to solve a system of equations using substitution and elimination
- The basics of vectors in R2 and R3, including the dot product and cross product
- The basics of factoring polynomials
- The basics of how to write proofs
If you are unsure of any of these, we recommend that you do some self study to
learn/remember these topics.
What is linear algebra used for?
Student: Will we ever use this in real life?
Professor: Not if you get a job flipping hamburgers.
Linear algebra is used in the social sciences, the natural sciences, engineering, busi-
ness, computer science, statistics, and all branches of mathematics. A small sample
of courses at the University of Waterloo that have linear algebra prerequisites include:
AFM 272, ACTSC 291, AMATH 333, CM 271, CO 250, CS 371, ECON 301, PHYS
234, PMATH 330, STAT 331.
Note that although we will mention applications of linear algebra, we will not discuss
them in depth in this book. This is because in most cases it would require knowledge
of the various applications that we do not expect you to have at this point. Addition-
ally, you will have a lot of time to apply your linear algebra knowledge and skills in
future courses.
How to learn linear algebra!
Student: Is there life after death?
Teacher: Why do you ask?
Student: I’ll need the extra time to finish all the homework you gave us.
1. Attend all the lectures.
Although this book has been written to help you learn linear algebra, the
lectures are your primary resource to the material covered in the course. The
lectures will provide examples and explanations to help you understand the
course material. Additionally, the lectures will be interactive. That is, when
you do not understand something or need clarification, you can ask. Books do
not provide this feature (yet). You will likely find it difficult to get a good grade
in the course without attending all the lectures.
2. Read this book.
It is generally recommended that students read course notes/text books prior
to class. If you are able to teach yourself some of the material before it is
taught in class, you will find the lectures twice as effective. The lecturers will
then be there for you to clarify what you taught yourself and to help with the
areas that you had difficulty with. Trust me, you will enjoy your classes much
more if you understand most of what is being taught in the lectures. Addition-
ally, you will likely find it considerably easier to take notes and to listen to the
professor at the same time.
Note that the theorems, lemmas, definitions, etc., are numbered
in the form A.B.C where A indicates the chapter, B indicates the section, and
C indicates which item. For example, Theorem 3.1.4 indicates Theorem 4 in
Section 1 of Chapter 3.
3. RED MEANS STOP!
The mid-section exercises are designed to help you ensure that you understand
the material. Many are of a routine nature, but some are there to make you think
more in-depth about the material. You should always do these exercises and
check your answers before proceeding.
4. Doing/Checking homework assignments.
Homework assignments are designed to help you learn the material and prepare
for the tests. You should always give a serious effort to solve a problem before
seeking help. Spending the time required to solve a hard problem is not only
very satisfying, but it will greatly increase your knowledge of the course and
your problem solving skills. Naturally, if you are unable to figure out how
to solve a problem, it is very important to get help so that you can learn how
to do it. Make sure that you always indicate on each assignment where you
have received help. Never just read/copy solutions. On a test, you are not
going to have a solution to read/copy from. You need to make sure that you
are able to solve problems without referring to similar problems. It is highly
recommended that you collect your assignments and compare them with the
solutions. Even if you have the correct answer, it is worth checking if there is
another way to solve the problem.
5. Study!
This might sound a little obvious, but most students do not do nearly enough
of this. Moreover, you need to study properly. Staying up all night the night
before a test is not studying properly! Learning math, and most other subjects,
is like building a house. If you do not have a firm foundation, then it will be
very difficult for you to build on top of it. It is extremely important that you
know and do not forget the basics.
6. Use learning resources.
There are many places you can get help outside of the lectures. They are there
to assist you... make use of them!
Proofs
Student 1: What is your favorite part of mathematics?
Student 2: Knot Theory.
Student 1: Me neither.
Do you have difficulty with proof questions? Here are some suggestions to help you.
1. Know and understand all definitions and theorems.
Proofs are essentially puzzles that you have to put together. In class and in this
book we will give you all of the pieces and it is up to you to figure out how
they fit together to give the desired picture. Of course, if you are missing or
do not understand a required piece, then you are not going to be able to get the
correct result.
2. Learn proof techniques.
In class and in this book you will see numerous proofs. You should use these
to learn how to select the correct puzzle pieces and to learn the ways of putting
the puzzle pieces together. In particular, I always recommend that students ask
the following two questions for every step in each proof:
(a) “Why is the step true?”
(b) “What was the purpose of the step?”
Answering the first question should really help your understanding of the def-
initions and theorems from class (the puzzle pieces). Answering the second
question, which is generally more difficult, will help you learn how to think
about creating proofs (how the puzzle pieces fit together).
3. Practice!
As with everything else, you will get better at proofs with experience. I rec-
ommend working in groups of two or three. First, you each individually write
your own proof of a statement. Then, you each pass your proof to another
group member who will try to evaluate whether your proof is valid and/or easy
to follow. After every group member has looked at everyone else’s proof, discuss
the different proofs in the group and together try to write one very good proof.
Steps For Writing Your Own Proof:
1. Read the statement you are trying to prove carefully. Write down the definition
of any key term in the statement and think about what theorems relate to these
terms.
2. Think about what strategy/proof technique (see below) you will use for the
proof.
3. Based on the strategy you selected, write down the beginning and the end of
the proof.
4. Use the definitions and theorems that relate to the key terms to try to manipulate
the beginning towards the end and/or the end towards the beginning.
5. If you have trouble completing the proof, then either (a) look for more theorems
that relate to the statement, (b) decide upon a new strategy and return to Step
3, or (c) ask a friend, tutor, TA, or prof for a HINT. (Giving up or reading a
solution are not good options).
6. Once you have a proof, make sure that every step is justified by a theorem or
definition that you have been given.
Most Commonly Used Proof Strategies:
Common Proof Prove the statement is true by showing the definition of the conclusion is satis-
fied, or prove an equation is valid by starting with one side and showing that it
can be modified into the other side.
Constructive Proof Prove an object exists by first constructing (defining) the object and then show-
ing it has the desired properties. A common way of figuring out how to define
the object is to work backwards... that is assume the object exists and figure
out how it must be defined (this is often done in ε-δ proofs in Calculus).
Subset Proof Prove a set S is a subset of a set T by first picking any element s ∈ S (first step:
Let s ∈ S ) and then showing that s satisfies the condition of T (last step: Thus,
s ∈ T ).
Sets Equal Proof Prove two sets are equal by showing they are subsets of each other.
Uniqueness Proof Prove an object is unique by assuming there are two of them and showing they
must actually be the same object.
How To Make The Most Of This Book: SQ3R
The SQ3R reading technique was developed by Francis Robinson to help students
read textbooks more effectively. It will not only help you learn even more from this
book (or any textbook), but I’m sure you will find that it also makes the reading more
interesting.
Survey: Quickly skim over the section. Make note of any heading or boldface
words. Read over the definitions, the statement of theorems, and the statement of
examples or exercises (don’t read proofs or solutions at this time).
Question: Make a purpose for your reading by writing down general questions
about the headings, boldface words, definitions, or theorems that you surveyed. For
example, a couple of questions for Section 1.2 could be:
What is the geometric interpretation of spanning?
How do I determine if a set is linearly independent?
What is the relationship between spanning and linear independence?
How does this material relate to content I have already learned in the course?
What is the purpose of a basis?
How do I find a basis for a surface in higher dimensions?
Read: Read the material in chunks of about one to two pages long. Read carefully
and look for the answers to your questions as well as key concepts and supporting
details. Try to solve examples before reading the solutions. Try to figure out the proofs
before you read them. If you are not able to solve them, look carefully through the
provided solution to figure out the step where you got stuck.
Recall: As you finish each chunk, put the book aside and summarize the important
details of what you have just read. Write down the answers to any questions that you
made and write down any further questions that you have. Think critically about how
well you have understood the concepts and if necessary go back and reread a part or
do some relevant end of section problems.
Review: This is an ongoing process. Once you complete an entire section, go back
and review your notes and questions from the entire section. Test your understanding
by trying to solve the end of chapter problems without referring to the book or your
notes. Repeat this again when you finish an entire chapter and then again in the future
as necessary.
Yes, you are going to find that this makes the reading go much slower for the first
couple of chapters, but students who use this technique consistently report that they
feel that they end up spending a lot less time studying for the course as they learn the
material so much better at the beginning (which makes future concepts much easier
to learn).
ACKNOWLEDGMENTS
Thanks are expressed to:
Mike La Croix: for formatting, LaTeX consulting, and all of the amazing figures.
Eddie Dupont: for the interactive features.
Pantelis Eleftheriou: for editing and providing many useful suggestions.
Brian Ingalls: for editing, providing many useful suggestions, and adding the problem
sets.
Peiyao Zeng: for editing and providing many useful suggestions.
Li Liu: for assisting with LaTeXing.
Kate Schreiner, Joseph Horan, Laurentiu Anton, Chi Zhang, and Henry Ho: for
proof-reading, editing, and suggestions.
Stephen New, Martin Pei, Conrad Hewitt, Robert Andre, Uldis Celmins, C. T. NG,
and many other of my colleagues and students who have broadened my
knowledge of linear algebra and of how to teach it.
Chapter 1
Vectors in Euclidean Space
1.1 Vector Addition and Scalar Multiplication
2-dimensional and 3-dimensional Euclidean spaces were studied by ancient Greek
mathematicians. In Euclid’s Elements, Euclid defined two and three dimensional
space with a set of postulates, definitions, and common notions. In modern mathe-
matics, we define Euclidean spaces differently so that we are able to easily generalize
the concepts. This allows us to use the same ideas to solve problems in other areas.
You may be familiar with defining n-dimensional Euclidean space, Rn, to be the set of
all elements of the form (x1, . . . , xn) where xi ∈ R for 1 ≤ i ≤ n. The elements of Rn
are called points and are usually denoted P(x1, . . . , xn). However, in linear algebra,
we will view Rn as a set of vectors rather than as a set of points. In particular, we
will write an element of Rn as a column and denote it with the usual vector symbol
~x (some textbooks will denote vectors in boldface, i.e. x).
DEFINITION
Rn
The set Rn is defined by

Rn = { ~x = [x1, . . . , xn]^T | x1, . . . , xn ∈ R }

Two vectors ~x = [x1, . . . , xn]^T and ~y = [y1, . . . , yn]^T in Rn are said to be equal
if xi = yi for 1 ≤ i ≤ n.
REMARK
Note that working in, for example, R6 does not mean that we are necessarily in 6-
dimensional space. When analyzing the motion of a particle, an engineer will require
3 variables for the particle’s position as well as 3 variables for the particle’s velocity.
In Economics, one can easily be working with 1000 variables and hence working in
R1000. Understanding the ideas of linear algebra is necessary to determine which
calculations are required and to interpret the results.
Although we are going to view Rn as a set of vectors, it is important to understand
that we really are still referring to n-dimensional Euclidean space. In some cases it
will be useful to think of the vector [x1, . . . , xn]^T as the point (x1, . . . , xn). In particular, we
will sometimes interpret a set of vectors as a set of points to get a geometric object
(such as a line or plane). So, geometrically, a vector in Rn always has its tail at the
origin and its head at its corresponding point.
REMARK
In linear algebra all vectors in Rn should be written as column vectors. However, we
will sometimes write these as n-tuples to match the notation that is commonly used
in other areas. In particular, when we are dealing with functions of vectors, we will
often write f (x1, . . . , xn) rather than f ([x1, . . . , xn]^T).
One advantage of viewing the elements of Rn as vectors instead of as points is that
we can perform operations on vectors. You may have seen vectors in R2 and R3
in physics being used to represent motion or force. In these physical examples we
add vectors by summing their components, and multiply a vector by a scalar by
multiplying each entry of the vector by the scalar. We keep these definitions for
vectors in Rn in linear algebra.
DEFINITION
Addition
Scalar Multiplication
Let ~x = [x1, . . . , xn]^T, ~y = [y1, . . . , yn]^T ∈ Rn and let c ∈ R (called a scalar). We define addition, ~x + ~y, and scalar multiplication, c~x, by

~x + ~y = [x1 + y1, . . . , xn + yn]^T,    c~x = [cx1, . . . , cxn]^T
REMARK
We define subtraction in terms of addition and scalar multiplication. In particular,
when we write ~x − ~y we mean
~x − ~y = ~x + (−1)~y
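These definitions are componentwise, so they are easy to mirror in code. Here is a minimal sketch in Python, using plain lists as vectors; the helper names (vec_add, scalar_mul, vec_sub) are ours, not the book's.

```python
# Componentwise vector operations in R^n, following the definitions above.
# (Helper names are our own illustration.)

def vec_add(x, y):
    """Return x + y componentwise; x and y must have the same length."""
    assert len(x) == len(y), "vectors must lie in the same R^n"
    return [xi + yi for xi, yi in zip(x, y)]

def scalar_mul(c, x):
    """Return the scalar multiple c * x componentwise."""
    return [c * xi for xi in x]

def vec_sub(x, y):
    """Subtraction defined as x + (-1) * y, as in the remark above."""
    return vec_add(x, scalar_mul(-1, y))
```

Note that vec_sub is deliberately written in terms of the other two operations, mirroring how the remark defines subtraction rather than treating it as a new primitive.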
EXAMPLE 1 For ~x = [1, 2, −3]^T and ~y = [−1, 3, 0]^T in R3 we have

~x + ~y = [1 + (−1), 2 + 3, −3 + 0]^T = [0, 5, −3]^T

√2 ~y = √2 [−1, 3, 0]^T = [−√2, 3√2, 0]^T

2~x − 3~y = 2[1, 2, −3]^T + (−3)[−1, 3, 0]^T = [2, 4, −6]^T + [3, −9, 0]^T = [5, −5, −6]^T
We can view addition of vectors geometrically.
EXAMPLE 2 Add ~x = [3, 1]^T and ~y = [−1, 2]^T in R2. Show the result geometrically.
Solution: We have

~x + ~y = [3 + (−1), 1 + 2]^T = [2, 3]^T

[Figure: ~x, ~y, and ~x + ~y drawn in the x1x2-plane.]
Observe that the points corresponding to the origin, ~x, ~y, and ~x + ~y form a parallelo-
gram. Thus, the vector rule for addition is sometimes called the parallelogram rule
for addition.
EXERCISE 1 Add ~x = [3, −2]^T and ~y = [−4, −1]^T in R2. Show the result geometrically.
Since we will often look at the sums of scalar multiples of vectors, we make the
following definition.
DEFINITION
Linear Combination
For ~v1, . . . ,~vk ∈ Rn and c1, . . . , ck ∈ R we call the sum
c1~v1 + c2~v2 + · · · + ck~vk
a linear combination of ~v1, . . . ,~vk.
Although it is intuitive that any linear combination
c1~v1 + c2~v2 + · · · + ck~vk
of vectors ~v1, . . . ,~vk ∈ Rn is going to be a vector in Rn, it is instructive to prove
this along with several other properties. We will see throughout this text that these
properties are extremely important.
THEOREM 1 If ~x, ~y, ~w ∈ Rn and c, d ∈ R, then
V1 ~x + ~y ∈ Rn
V2 (~x + ~y) + ~w = ~x + (~y + ~w)
V3 ~x + ~y = ~y + ~x
V4 There exists a vector ~0 ∈ Rn such that ~x + ~0 = ~x for all ~x ∈ Rn
V5 For every ~x ∈ Rn there exists (−~x) ∈ Rn such that ~x + (−~x) = ~0
V6 c~x ∈ Rn
V7 c(d~x) = (cd)~x
V8 (c + d)~x = c~x + d~x
V9 c(~x + ~y) = c~x + c~y
V10 1~x = ~x
Proof: We will prove V1 and V3. Let ~x = [x1, . . . , xn]^T and ~y = [y1, . . . , yn]^T be any vectors in Rn.
For V1, by definition of addition we have
~x + ~y = [x1 + y1, . . . , xn + yn]^T
Since xi + yi ∈ R for 1 ≤ i ≤ n, we have that ~x +~y satisfies the condition of the set Rn
and hence ~x + ~y ∈ Rn.
Similarly, for V3, we have
~x + ~y = [x1 + y1, . . . , xn + yn]^T    by definition of addition
        = [y1 + x1, . . . , yn + xn]^T    since addition of real numbers is commutative
        = ~y + ~x                         by definition of addition
�
EXERCISE 2 Finish the proof of Theorem 1.
REMARKS
1. Observe that properties V2, V3, V7, V8, V9, and V10 are only about the oper-
ations of addition and scalar multiplication, while the other properties V1, V4,
V5, and V6 are about the relationship between the operations and the set Rn.
These facts should be clear in the proof of the theorem.
2. The proof of V4 shows that ~0 = [0, . . . , 0]^T. It is called the zero vector.
3. The proof of V5 shows that the additive inverse of ~x = [x1, . . . , xn]^T is
(−~x) = [−x1, . . . , −xn]^T.
4. Properties V1 and V6 show that Rn is closed under linear combinations. That
is, if ~v1, . . . ,~vk ∈ Rn, then c1~v1 + · · · + ck~vk ∈ Rn for any c1, . . . , ck ∈ R. This
fact might seem rather obvious in Rn; however, we will soon see that there
are sets which are not closed under linear combinations. In linear algebra,
it will be important and useful to identify which sets are closed under linear
combinations and, in fact, which sets have all the properties V1 - V10.
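The closure idea in Remark 4 can be made concrete with a quick check. As a hedged sketch in Python (the set below is our own illustration, not one from the text): Rn itself is closed under linear combinations, but a set such as the vectors in R2 with non-negative entries fails property V6.

```python
# The set S = {x in R^2 : x1 >= 0 and x2 >= 0} is NOT closed under
# scalar multiplication: multiplying by a negative scalar leaves S.
# (This set is our own illustration; it does not appear in the text.)

def in_nonneg_quadrant(x):
    """Membership test for S = {x in R^2 : x1 >= 0 and x2 >= 0}."""
    return x[0] >= 0 and x[1] >= 0

x = [1.0, 2.0]                        # x is in S
cx = [-3 * xi for xi in x]            # the scalar multiple (-3) * x

closed_under_scaling = in_nonneg_quadrant(cx)   # False: V6 fails for S
```

By contrast, any linear combination of vectors in Rn stays in Rn, which is exactly what properties V1 and V6 guarantee.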
Section 1.1 Problems
1. Compute the following linear combinations.
(a) 2[2, 1, −1]^T + 3[−1, 0, 1]^T
(b) (1/3)[3, 1, −2]^T − [4, 1/2, −2/3]^T
(c) √2 [√2, 0, √2]^T + (2/3)[6, 0, 0]^T
(d) 3[1, 0, 1]^T − 2[−2, 3, 1]^T + [−7, 6, −1]^T
(e) (a + b)[1, 1, −1]^T + (−a + b)[1, 2, −3]^T + (−a + 2b)[−1, −1, 2]^T
2. Let ~x,~v ∈ Rn. Solve the following equations for ~x using only properties V1 - V10, clearly indicating each algebraic step being performed and referencing which of the properties V1 - V10 is being used.
(a) 2~x + 2~v = 4~x
(b) ~x + 2~v = ~v + (−~x)
3. In this exercise, we look at the area of the parallelogram formed by the parallelogram rule for addition on ~x, ~y ∈ R2.
(a) Calculate each of the following and show the result geometrically. Find the area of each parallelogram formed by the parallelogram rule for addition.
    i. [−1, 1]^T + [0, 2]^T
    ii. [2, −1]^T + [−2, 1]^T
(b) What condition must be placed on ~x and ~y to ensure that the parallelogram has non-zero area?
(c) Create two more pairs ~x, ~y ∈ R2 which induce a parallelogram with non-zero area. Find the area of each parallelogram.
(d) Find a general formula for the area of the parallelogram induced by ~x, ~y ∈ R2. Prove that your formula is correct.
1.2 Bases
We now look at three extremely important concepts in linear algebra: spanning, linear
independence, and bases. We will introduce these using geometric interpretations of
sets of vectors. Note that all of these concepts will be used in every chapter in this
text, so spending the time to grasp these concepts now will have long term benefits.
EXAMPLE 1 Let ~v = [1, 1]^T ∈ R2 and consider the set S of all scalar multiples of ~v. That is,

S = {c~v | c ∈ R}

What does S represent geometrically?
Solution: Every vector ~x ∈ S has the form

~x = [x1, x2]^T = c[1, 1]^T = [c, c]^T, c ∈ R

Hence, we see this is all vectors in R2 where x1 = x2. Alternately, we can view this
as the set of all points (x1, x1) in R2, which we recognize as the line x2 = x1.

[Figure: the line ~x = c[1, 1]^T through the origin O, with direction vector ~v = [1, 1]^T.]
Notice in the example that the vector ~v gives the line its direction. Hence, we call ~v a
direction vector of the line.
EXAMPLE 2 Let ~e1 = [1, 0]^T, ~e2 = [0, 1]^T ∈ R2. How would you describe geometrically the set S of all possible linear combinations of ~e1 and ~e2?
Solution: Every vector ~x in S has the form

~x = c1~e1 + c2~e2 = c1[1, 0]^T + c2[0, 1]^T = [c1, c2]^T

for any c1, c2 ∈ R. Therefore, any vector in R2 can be represented as a linear combi-
nation of ~e1 and ~e2. Hence, S is the x1x2-plane.
Instead of using set notation to indicate a set of vectors, we sometimes use a
vector equation, as we did in Examples 1 and 2.
EXAMPLE 3 Let ~m = [−1, 1, 1]^T, ~b = [4/5, 1/5, 6/5]^T ∈ R3. Describe geometrically the set with vector equation

~x = t~m + ~b, t ∈ R

Solution: We have all possible scalar multiples of ~m translated by ~b. Hence, this is
a line in R3 that passes through the point (4/5, 1/5, 6/5) and has direction vector ~m.

[Figure: the line ~x = t~m + ~b in R3, with ~m and ~b drawn from the origin.]
EXERCISE 1 Let ~y = [1, 1, 1]^T, ~z = [1, 0, 1]^T ∈ R3. Describe geometrically the set with vector equation

~x = s~y + t~z, s, t ∈ R
In words, we could describe the set of vectors in Exercise 1 as the set of all possible
linear combinations of ~y and ~z. Since we will often look at the set of all possible
linear combinations of a set of vectors we make the following definition.
DEFINITION
Span
Let B = {~v1, . . . ,~vk} be a set of vectors in Rn. We define the span of B by
Span B = {c1~v1 + c2~v2 + · · · + ck~vk | c1, . . . , ck ∈ R}
We say that the set Span B is spanned by B and that B is a spanning set for Span B.
EXAMPLE 4 Using the definition above, we can denote the line in Example 1 as Span{[1, 1]^T}.
Alternatively, we could say that the line is spanned by {[1, 1]^T}.
The set S in Example 2 can be written as S = Span{[1, 0]^T, [0, 1]^T} = Span{~e1, ~e2}.
So, we could say that {~e1, ~e2} is a spanning set for R2.
EXAMPLE 5 Is the vector [1, 3]^T ∈ Span{[2, −1]^T, [3, 3]^T}?
Solution: By definition of span, we need to determine whether there exist c1, c2 ∈ R such that

[1, 3]^T = c1[2, −1]^T + c2[3, 3]^T

Performing operations on vectors on the right hand side, this gives

[1, 3]^T = [2c1 + 3c2, −c1 + 3c2]^T

Since vectors are equal if and only if their corresponding entries are equal, we get
that this vector equation implies

1 = 2c1 + 3c2
3 = −c1 + 3c2

Subtracting the second equation from the first gives −2 = 3c1 and so c1 = −2/3.
Substituting this into either equation gives c2 = 7/9. Hence, we have that

[1, 3]^T = −(2/3)[2, −1]^T + (7/9)[3, 3]^T

Thus, by definition, [1, 3]^T ∈ Span{[2, −1]^T, [3, 3]^T}.
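The two-equation system above can be replayed in code. Here is a sketch using Cramer's rule for the 2x2 case (the helper name is ours; the fractions module gives exact arithmetic, so the answers match the hand computation rather than floating-point approximations):

```python
from fractions import Fraction

# Check whether b is in Span{u, v} for u, v, b in R^2 by solving
# c1*u + c2*v = b with Cramer's rule. This sketch assumes u and v
# are not parallel, so the system has a unique solution.

def span_coefficients(u, v, b):
    """Return (c1, c2) with c1*u + c2*v = b, as exact fractions."""
    det = u[0] * v[1] - u[1] * v[0]
    assert det != 0, "u and v are parallel; handle that case separately"
    c1 = Fraction(b[0] * v[1] - b[1] * v[0], det)
    c2 = Fraction(u[0] * b[1] - u[1] * b[0], det)
    return c1, c2

# The data from the example above:
c1, c2 = span_coefficients([2, -1], [3, 3], [1, 3])
```

Running this reproduces c1 = −2/3 and c2 = 7/9 from the hand computation.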
EXAMPLE 6 Describe the subset S of R3 defined by S = Span{[1, 0, 0]^T, [0, 1, 0]^T} geometrically and determine a vector equation for S.
Solution: By definition of span we have

S = {c1[1, 0, 0]^T + c2[0, 1, 0]^T | c1, c2 ∈ R}

Thus, a vector equation for S is ~x = c1[1, 0, 0]^T + c2[0, 1, 0]^T, c1, c2 ∈ R.
Therefore, S is the set of all vectors of the form [c1, c2, 0]^T, and so represents the
x1x2-plane in R3.
EXAMPLE 7 Describe the subset T of R3 defined by T = Span{[1, 0, 1]^T, [2, 0, 2]^T} geometrically and determine a vector equation for T.
Solution: We have

T = {c1[1, 0, 1]^T + c2[2, 0, 2]^T | c1, c2 ∈ R}

A vector equation for T is ~x = c1[1, 0, 1]^T + c2[2, 0, 2]^T, c1, c2 ∈ R.
However, we see that [2, 0, 2]^T = 2[1, 0, 1]^T, so we can simplify the vector equation. We get

~x = c1[1, 0, 1]^T + 2c2[1, 0, 1]^T = (c1 + 2c2)[1, 0, 1]^T = c[1, 0, 1]^T

where c = c1 + 2c2 ∈ R. Therefore, T is the set of all vectors of the form [c, 0, c]^T, and so
T is a line in R3 with direction vector [1, 0, 1]^T that passes through the origin.
For Example 7, we used the fact that the second vector was a scalar multiple of the
first to simplify the vector equation. Of course, whenever we write a vector equa-
tion/spanning set for a set of vectors, we will want to simplify the equation/spanning
set as much as possible by removing any unnecessary vectors.
In particular, we want to know under what conditions we have
Span{~v1, . . . ,~vk} = Span{~v1, . . . ,~vi−1,~vi+1, . . . ,~vk}
Thinking about what these two sets represent (or thinking geometrically), we get the
following theorem.
THEOREM 1 Let ~v1, . . . ,~vk ∈ Rn. Some vector ~vi, 1 ≤ i ≤ k, can be written as a linear combination
of ~v1, . . . ,~vi−1,~vi+1, . . . ,~vk if and only if
Span{~v1, . . . ,~vk} = Span{~v1, . . . ,~vi−1,~vi+1, . . . ,~vk}
Proof: ⇒ If~vi can be written as a linear combination of~v1, . . . ,~vi−1,~vi+1, . . . ,~vk, then,
by definition of linear combination, there exist c1, . . . , ci−1, ci+1, . . . , ck ∈ R such that
c1~v1 + · · · + ci−1~vi−1 + ci+1~vi+1 + · · · + ck~vk = ~vi (1.1)
To prove that Span{~v1, . . . ,~vk} = Span{~v1, . . . ,~vi−1,~vi+1, . . . ,~vk} we will show that the
sets are subsets of each other.
By definition of span, for any ~x ∈ Span{~v1, . . . ,~vk} there exist d1, . . . , dk ∈ R such that
~x = d1~v1 + · · · + di−1~vi−1 + di~vi + di+1~vi+1 + · · · + dk~vk
Using equation (1.1) to substitute in for ~vi gives
~x = d1~v1 + · · · + di(c1~v1 + · · · + ci−1~vi−1 + ci+1~vi+1 + · · · + ck~vk) + · · · + dk~vk
Rearranging using properties from Theorem 1.1.1 gives
~x = (d1 + dic1)~v1 + · · · + (di−1 + dici−1)~vi−1 + (di+1 + dici+1)~vi+1 + · · · + (dk + dick)~vk
Thus, by definition, ~x ∈ Span{~v1, . . . ,~vi−1,~vi+1, . . . ,~vk} and hence
Span{~v1, . . . ,~vk} ⊆ Span{~v1, . . . ,~vi−1,~vi+1, . . . ,~vk}
Now, if ~y ∈ Span{~v1, . . . ,~vi−1,~vi+1, . . . ,~vk}, then there exist a1, . . . , ai−1, ai+1, . . . , ak ∈ R such that
~y = a1~v1 + · · · + ai−1~vi−1 + ai+1~vi+1 + · · · + ak~vk
= a1~v1 + · · · + ai−1~vi−1 + 0~vi + ai+1~vi+1 + · · · + ak~vk
Thus, ~y ∈ Span{~v1, . . . ,~vk}. Hence, we also have Span{~v1, . . . ,~vi−1,~vi+1, . . . ,~vk} ⊆ Span{~v1, . . . ,~vk} and so
Span{~v1, . . . ,~vk} = Span{~v1, . . . ,~vi−1,~vi+1, . . . ,~vk}
⇐ Assume that Span{~v1, . . . ,~vk} = Span{~v1, . . . ,~vi−1,~vi+1, . . . ,~vk}. Since ~vi ∈ Span{~v1, . . . ,~vk}, our assumption implies that ~vi ∈ Span{~v1, . . . ,~vi−1,~vi+1, . . . ,~vk}. Consequently, there exist b1, . . . , bi−1, bi+1, . . . , bk ∈ R such that
~vi = b1~v1 + · · · + bi−1~vi−1 + bi+1~vi+1 + · · · + bk~vk
Therefore, ~vi is a linear combination of ~v1, . . . ,~vi−1,~vi+1, . . . ,~vk as required. �
We are going to apply this theorem frequently throughout this course. So, let’s
demonstrate how it works with some examples.
Section 1.2 Bases 11
EXAMPLE 8 Describe each of the sets geometrically and give a simplified vector equation.
(a) S = Span{[1, 1], [3, 3]}
Solution: Since [3, 3] is a linear combination (scalar multiple) of [1, 1], Theorem 1 gives
Span{[1, 1], [3, 3]} = Span{[1, 1]}
Hence, S is a line in R2 with vector equation ~x = c1 [1, 1], c1 ∈ R.
(b) T = Span{[1, 0], [2, 0], [1, 3], [0, 1]}
Solution: Since [2, 0] is a scalar multiple of [1, 0], Theorem 1 gives
Span{[1, 0], [2, 0], [1, 3], [0, 1]} = Span{[1, 0], [1, 3], [0, 1]}
We also have that [1, 3] can be written as a linear combination of [1, 0] and [0, 1], hence
Span{[1, 0], [1, 3], [0, 1]} = Span{[1, 0], [0, 1]}
Since neither vector in the spanning set {[1, 0], [0, 1]} is a scalar multiple of the other, by Theorem 1, we cannot reduce the spanning set any further. Hence, as described in Example 2, T is all of R2 and has vector equation ~x = c1 [1, 0] + c2 [0, 1], c1, c2 ∈ R.
(c) U = Span{[1, 1, 2], [0, 1, −1], [0, 0, 0]}
Solution: Since [0, 0, 0] is a scalar multiple of any vector, we get by Theorem 1 that
Span{[1, 1, 2], [0, 1, −1], [0, 0, 0]} = Span{[1, 1, 2], [0, 1, −1]}
Since neither vector in the spanning set {[1, 1, 2], [0, 1, −1]} is a scalar multiple of the other, by Theorem 1, we cannot reduce the spanning set any further. Hence, U is a plane in R3 with vector equation ~x = c1 [1, 1, 2] + c2 [0, 1, −1], c1, c2 ∈ R.
REMARK
Observe that spanning sets are not unique. For example, each of the following would also be a correct answer for part (b) in Example 8.
~x = c1 [1, 0] + c2 [1, 3], c1, c2 ∈ R
~x = c1 [2, 0] + c2 [0, 1], c1, c2 ∈ R
~x = c1 [2, 0] + c2 [1, 3], c1, c2 ∈ R
From the examples we see that it is very important to detect when one or more vectors in the set can be written as linear combinations of some (or all) of the other vectors. We call such sets linearly dependent sets. For example, {[1, 0], [2, 0], [1, 3], [0, 1]} is linearly dependent and {[1, 0], [0, 1]} is linearly independent. In the examples it was quite easy to see when we had this situation; however, in real world applications we may get hundreds of vectors each having hundreds of entries. So, we need to develop an algorithm for detecting whether one vector in a set can be written as a linear combination of the others. To do this, we first need a more mathematical definition of linear dependence.
If a set {~v1, . . . ,~vk} is linearly dependent, then it means that one of the vectors can be written as a linear combination of some or all of the other vectors. Say, ~vi can be written as a linear combination of the other vectors. That is, there exist c1, . . . , ck ∈ R with ci ≠ 0 such that
−ci~vi = c1~v1 + · · · + ci−1~vi−1 + ci+1~vi+1 + · · · + ck~vk
We can rewrite this as
~0 = c1~v1 + · · · + ci−1~vi−1 + ci~vi + ci+1~vi+1 + · · · + ck~vk (1.2)
So, if the set is linearly dependent, then equation (1.2) has a solution where at least
one coefficient is non-zero. On the other hand, if at least one coefficient in equation
(1.2) is non-zero, say c j, then we can solve the equation for ~v j to get
−c j~v j = c1~v1 + · · · + c j−1~v j−1 + c j+1~v j+1 + · · · + ck~vk
~v j = −(c1/c j)~v1 − · · · − (c j−1/c j)~v j−1 − (c j+1/c j)~v j+1 − · · · − (ck/c j)~vk
Therefore, ~v j ∈ Span{~v1, . . . ,~v j−1,~v j+1, . . . ,~vk} and so the set is linearly dependent.
We now use this to formulate a mathematical definition for linear independence and
linear dependence.
DEFINITION
Linearly Dependent
Linearly Independent
A set of vectors {~v1, . . . ,~vk} in Rn is said to be linearly dependent if there exist
coefficients c1, . . . , ck not all zero such that
~0 = c1~v1 + · · · + ck~vk
A set of vectors {~v1, . . . ,~vk} is said to be linearly independent if the only solution to
~0 = c1~v1 + · · · + ck~vk
is c1 = c2 = · · · = ck = 0 (called the trivial solution).
The work above which we used to motivate the definition proves the following result.
THEOREM 2 A set of vectors {~v1, . . . ,~vk} in Rn is linearly dependent if and only if
~vi ∈ Span{~v1, . . . ,~vi−1,~vi+1, . . . ,~vk} for some i, 1 ≤ i ≤ k.
EXAMPLE 9 Determine whether {[7, −14, 6], [−10, 15, 15/14], [−1, 0, 3]} is linearly independent. If not, find a vector which is in the span of the others.
Solution: Using the definition of linear independence/dependence, we consider
c1 [7, −14, 6] + c2 [−10, 15, 15/14] + c3 [−1, 0, 3] = [0, 0, 0]
Calculating the linear combination of the vectors on the left gives
[7c1 − 10c2 − c3, −14c1 + 15c2, 6c1 + (15/14)c2 + 3c3] = [0, 0, 0]
Comparing corresponding entries gives the 3 equations in 3 unknowns
7c1 − 10c2 − c3 = 0
−14c1 + 15c2 = 0
6c1 + (15/14)c2 + 3c3 = 0
Solving by substitution and elimination we find that there are infinitely many solutions. One solution is c1 = 3/7, c2 = 2/5, and c3 = −1. Hence, the system has a non-trivial solution, so the set is linearly dependent.
Now, Theorem 2 tells us that at least one of the vectors can be written as a linear
combination of the other vectors. Moreover, the proof of the theorem shows us how
to actually find the linear combination. First, the solution above gives
(3/7) [7, −14, 6] + (2/5) [−10, 15, 15/14] + (−1) [−1, 0, 3] = [0, 0, 0] (1.3)
Following the proof, we pick one of the non-zero coefficients in (1.3) and solve for
the corresponding vector. Picking c2 = 2/5, we solve the equation for ~v2 to get
−(2/5) [−10, 15, 15/14] = (3/7) [7, −14, 6] + (−1) [−1, 0, 3]
[−10, 15, 15/14] = −(15/14) [7, −14, 6] + (5/2) [−1, 0, 3]
Thus, we have written one of the vectors as a linear combination of the others.
REMARKS
1. In Example 9, we could have written any vector in (1.3) with a non-zero coefficient as a linear combination of the other vectors.
2. It is important to realize that to show linear independence we must show the
only solution to c1~v1 + · · · + ck~vk = ~0 is c1 = · · · = ck = 0. In particular, notice
that c1 = · · · = ck = 0 is always a solution regardless of whether {~v1, . . . ,~vk} is
linearly independent or not.
3. You will see frequently in this text that a proof often does a lot more than just
prove that the statement of a theorem is true. In many cases, the proof will
teach us how to solve computational problems, as demonstrated in Example
9. Additionally, understanding the techniques and strategies used in one proof
can help one develop proofs for other results.
EXAMPLE 10 Determine whether {[5, 2, 7], [3, −8, 4], [2, 6, 15]} is linearly independent.
Solution: By definition of linear dependence, we consider
c1 [5, 2, 7] + c2 [3, −8, 4] + c3 [2, 6, 15] = [0, 0, 0]
Calculating the linear combination of the vectors on the left gives
[5c1 + 3c2 + 2c3, 2c1 − 8c2 + 6c3, 7c1 + 4c2 + 15c3] = [0, 0, 0]
Comparing corresponding entries gives the 3 equations in 3 unknowns
5c1 + 3c2 + 2c3 = 0
2c1 − 8c2 + 6c3 = 0
7c1 + 4c2 + 15c3 = 0
Solving by substitution and elimination we find that the only solution is c1 = c2 =
c3 = 0. Consequently, the set is linearly independent.
EXERCISE 2 Determine whether {[1, 2, 1], [1, −1, 2], [1, −1, −3]} is linearly independent. If not, find a vector which is in the span of the others.
EXERCISE 3 Determine whether {[1, −1, 3], [2, −4, 1], [1, 0, −1], [3, −4, 0]} is linearly independent. If not, find a vector which is in the span of the others.
Observe that the procedure for determining whether a set {~v1, . . . ,~vk} in Rn is linearly
dependent or linearly independent requires determining solutions of the vector equa-
tion c1~v1 + · · · + ck~vk = ~0. However, this vector equation actually represents n scalar
equations (one for each entry of the vectors) in k unknowns c1, . . . , ck. In the next
chapter, we will look at how to efficiently solve such systems of equations. However,
in some cases it may be easy to detect solutions by inspection.
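The by-hand elimination used in Examples 9 and 10 can be automated. As an aside only (the text develops the systematic theory in Chapter 2), the following Python sketch places the vectors as rows of a matrix, row reduces, and counts the non-zero rows (the rank); the set is linearly independent exactly when the rank equals the number of vectors. The helper names are ours, not the text's.

```python
def rank(rows, tol=1e-12):
    """Rank of a matrix (given as a list of rows) via Gaussian elimination."""
    A = [list(map(float, row)) for row in rows]
    m, n = len(A), len(A[0])
    r = 0  # number of pivots found so far
    for col in range(n):
        # find a row at or below r with a non-negligible entry in this column
        pivot = next((i for i in range(r, m) if abs(A[i][col]) > tol), None)
        if pivot is None:
            continue  # no pivot in this column
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(r + 1, m):  # eliminate entries below the pivot
            f = A[i][col] / A[r][col]
            for j in range(col, n):
                A[i][j] -= f * A[r][j]
        r += 1
    return r

def is_linearly_independent(vectors):
    """Vectors are independent iff the matrix they form has rank equal to their number."""
    return rank(vectors) == len(vectors)

print(is_linearly_independent([[7, -14, 6], [-10, 15, 15/14], [-1, 0, 3]]))  # False (Example 9)
print(is_linearly_independent([[5, 2, 7], [3, -8, 4], [2, 6, 15]]))          # True (Example 10)
```

Floating-point elimination needs the tolerance `tol`; the exact by-hand computation, of course, does not.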
EXAMPLE 11 Show that {[1, 2, 1], [2, 4, 2]} is linearly dependent.
Solution: By definition of linear dependence, we consider
c1 [1, 2, 1] + c2 [2, 4, 2] = [0, 0, 0]
Since the second vector is a scalar multiple of the first, we see that if we take c1 = −2 and c2 = 1 we get
(−2) [1, 2, 1] + [2, 4, 2] = [0, 0, 0]
Therefore, the equation has non-trivial solutions, so the set is linearly dependent.
EXAMPLE 12 Show that {[3, 1, 4, 1], [5, 9, 2, 6], [0, 0, 0, 0]} is linearly dependent.
Solution: Consider
c1 [3, 1, 4, 1] + c2 [5, 9, 2, 6] + c3 [0, 0, 0, 0] = [0, 0, 0, 0]
We immediately observe that this equation has the non-trivial solution c1 = c2 = 0,
c3 = 1. Hence, the set is linearly dependent.
REMARK
Notice in both of the last two examples that there were in fact infinitely many different
non-trivial solutions to the equations.
We generalize our method of solution in Example 12 to prove the following theorem.
THEOREM 3 If a set of vectors {~v1, . . . ,~vk} contains the zero vector, then it is linearly dependent.
Proof: If ~vi = ~0, then
0~v1 + · · · + 0~vi−1 + 1~vi + 0~vi+1 + · · · + 0~vk = ~0
Hence, the equation ~0 = c1~v1 + · · · + ck~vk has a solution with one coefficient, namely
ci, that is non-zero. So, by definition, the set is linearly dependent. �
Theorem 2 tells us that a set is linearly independent if and only if none of the vectors
in the set can be written as a linear combination of the others. That is, the simplest
spanning set for a given set is one that is linearly independent. Hence, we make the
following definition.
DEFINITION
Basis
Let S be a subset of Rn. If {~v1, . . . ,~vk} is a linearly independent set of vectors in Rn
such that S = Span{~v1, . . . ,~vk}, then the set {~v1, . . . ,~vk} is called a basis for S .
We define a basis for the set {~0} to be the empty set.
Note that this definition and its generalization will be extremely important throughout
the remainder of the book.
EXAMPLE 13 Prove that the set B = {[1, 0], [0, 1]} is a basis for R2.
Solution: To prove this, we need to show that Span B = R2 (B is a spanning set for R2) and that B is linearly independent.
To prove Span B = R2 we will show that they are subsets of each other. For any [x1, x2] ∈ R2 we have
[x1, x2] = x1 [1, 0] + x2 [0, 1]
This shows that every vector in R2 is a linear combination of the vectors in B, so R2 ⊆ Span B. On the other hand, properties V1 and V6 of Theorem 1.1.1 tell us that all linear combinations of the vectors in B are in R2. Hence, Span B ⊆ R2 and consequently, Span B = R2.
To prove that B is linearly independent, we use the definition of linear independence. Consider
[0, 0] = c1 [1, 0] + c2 [0, 1] = [c1, c2]
So, c1 = c2 = 0 and hence B is also linearly independent. Thus, B is a basis for R2.
DEFINITION
Standard Basis
In Rn, let ~ei represent the vector whose i-th component is 1 and all other components
are 0. The set {~e1, . . . , ~en} is called the standard basis for Rn.
EXAMPLE 14 The standard basis for R2 is {~e1, ~e2} = {[1, 0], [0, 1]}.
EXERCISE 4 Write the standard basis for R4.
REMARK
Observe that the standard basis vectors for Rn just represent the usual coordinate
axes. For example, in R3 we have that Span{~e1} is the x-axis, Span{~e2} is the y-axis,
and Span{~e3} is the z-axis. In particular, whenever we write a point (x, y, z) in R3 we
interpret it as
[x, y, z] = x [1, 0, 0] + y [0, 1, 0] + z [0, 0, 1] = x~e1 + y~e2 + z~e3
EXAMPLE 15 Is the set B = {[1, 0, 1], [−2, 1, 1]} a basis for R3?
Solution: Thinking geometrically, we guess that a basis for R3 should need at least
three vectors to span R3. So, we expect that B does not span R3. To prove this, we
just need to show that there exists a vector ~x ∈ R3 that cannot be written as a linear
combination of the vectors in B. Consider
[0, 1, 0] = c1 [1, 0, 1] + c2 [−2, 1, 1] = [c1 − 2c2, c2, c1 + c2]
Comparing entries gives three equations in two unknowns
c1 − 2c2 = 0
c2 = 1
c1 + c2 = 0
Substituting c2 = 1 into the first and third equations gives c1 = 2 and c1 = −1, a contradiction, which shows that there is no solution. Hence, ~x = [0, 1, 0] ∉ Span B and so B is not a basis for R3.
EXAMPLE 16 Prove that the set B = {[−1, 2], [1, 1]} is a basis for R2.
Solution: We first show spanning. Let [x1, x2] ∈ R2 and consider
[x1, x2] = c1 [−1, 2] + c2 [1, 1] = [−c1 + c2, 2c1 + c2] (1.4)
Comparing entries we get two equations in two unknowns
−c1 + c2 = x1
2c1 + c2 = x2
Solving gives c1 = (1/3)(−x1 + x2) and c2 = (1/3)(2x1 + x2). Hence, we have that
(1/3)(−x1 + x2) [−1, 2] + (1/3)(2x1 + x2) [1, 1] = [x1, x2]
Thus, every vector in R2 is a linear combination of the vectors in B. On the other hand, every linear combination of [−1, 2] and [1, 1] is in R2 by Theorem 1.1.1. Consequently, Span B = R2.
For linear independence, we take x1 = x2 = 0 in equation (1.4). Our general solution
to that vector equation says that the only solution is c1 = c2 = 0. So, B is also linearly
independent. Therefore, B is a basis for R2.
Graphically, the basis in the last example just represents a different set of coordinate axes for R2. The figure below shows how these vectors still form a grid covering all points in R2. Observe that the vector [4, 1] is −1 ‘unit’ in the direction of [−1, 2] and 3 ‘units’ in the direction of [1, 1].
Notice that the reason we need a basis to be a spanning set is so that the grid covers
every point in the space. The reason that we need a basis to be linearly independent
is so that, like the standard coordinate axes, each vector has a unique representation as a point on the grid.
[Figure: the grid generated by [−1, 2] and [1, 1] in the x-y plane, with the vector [4, 1] = −[−1, 2] + 3 [1, 1] marked.]
Figure 1.2.1: Geometric representation of the basis B in Example 16
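The coordinates −1 and 3 in the figure can be computed from the formulas found in Example 16, or directly by solving the 2×2 system. A small Python sketch (the helper name `coords_in_basis` is ours, not the text's), using the standard 2×2 determinant formula:

```python
def coords_in_basis(v1, v2, x):
    """Coordinates (c1, c2) with c1*v1 + c2*v2 = x, for a basis {v1, v2} of R^2.

    Solves the 2x2 system by Cramer's rule; the determinant is non-zero
    precisely because {v1, v2} is linearly independent.
    """
    det = v1[0] * v2[1] - v1[1] * v2[0]
    c1 = (x[0] * v2[1] - x[1] * v2[0]) / det
    c2 = (v1[0] * x[1] - v1[1] * x[0]) / det
    return c1, c2

# The basis from Example 16 and the vector [4, 1] from Figure 1.2.1:
print(coords_in_basis([-1, 2], [1, 1], [4, 1]))  # (-1.0, 3.0)
```

For the basis and vector of the figure this reproduces [4, 1] = −[−1, 2] + 3 [1, 1].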
THEOREM 4 If B = {~v1, . . . ,~vk} is a basis for a subset S of Rn, then every vector ~x ∈ S can be
written as a unique linear combination of the vectors in B.
Proof: By definition of a basis, B is a spanning set for S . Thus, by definition of a
spanning set, every vector in S can be written as a linear combination of the vectors
in B.
To prove that every such linear combination is unique, we assume to the contrary that
there is a vector ~x ∈ S such that
c1~v1 + · · · + ck~vk = ~x = d1~v1 + · · · + dk~vk
Using the properties from Theorem 1.1.1, we can rearrange this to get
(c1 − d1)~v1 + · · · + (ck − dk)~vk = ~0 (1.5)
Because {~v1, . . . ,~vk} is linearly independent, the only solution to equation (1.5) is
ci − di = 0 for 1 ≤ i ≤ k. Hence, ci = di for 1 ≤ i ≤ k. Consequently, every vector in
S has a unique representation as a linear combination of the vectors in B. �
As with many proofs, some of the “easy steps” in the proof above are omitted. In particular, we skipped the algebraic steps required to simplify
c1~v1 + · · · + ck~vk = d1~v1 + · · · + dk~vk
to
(c1 − d1)~v1 + · · · + (ck − dk)~vk = ~0
At first glance, this simplification should seem straightforward (the reason it was
omitted). However, if you try to justify each step using the properties in Theorem
1.1.1, you will likely discover it takes quite a bit of effort and thought.
EXERCISE 5 Show and justify ALL steps required to change
c1~v1 + · · · + ck~vk = d1~v1 + · · · + dk~vk
into
(c1 − d1)~v1 + · · · + (ck − dk)~vk = ~0
using only properties in Theorem 1.1.1.
Check the solution to the exercise. If you managed to justify every step, you can be
particularly proud of yourself. At this point, you may think that justifying all these
steps has no purpose. However, it is necessary! Soon we will start seeing objects
where some of the familiar properties that we take for granted do not hold! For
example, in Chapter 3, we will see objects where A times B usually does not equal B times A. In these cases, it will be important that you think about every step that you are doing so that you don’t use a property that doesn’t hold.
Surfaces in Higher Dimensions
We now extend our geometrical concepts of lines and planes to Rn for any n ≥ 2.
DEFINITION
Line in Rn
Let ~v, ~b ∈ Rn with ~v ≠ ~0. We call the set with vector equation ~x = c1~v + ~b, c1 ∈ R a line in Rn through ~b.
DEFINITION
Plane in Rn
Let ~v1,~v2, ~b ∈ Rn with {~v1,~v2} being a linearly independent set. We call the set with
vector equation ~x = c1~v1 + c2~v2 + ~b, c1, c2 ∈ R a plane in Rn through ~b.
DEFINITION
Hyperplane in Rn
Let ~v1, . . . ,~vn−1, ~b ∈ Rn with {~v1, . . . ,~vn−1} being linearly independent. The set with
vector equation ~x = c1~v1 + · · · + cn−1~vn−1 + ~b, c1, . . . , cn−1 ∈ R is called a hyperplane
in Rn through ~b.
REMARK
Intuitively we see that a line in Rn is one-dimensional, a plane in Rn is two-dimensional, and a hyperplane in Rn is (n − 1)-dimensional. We will justify this intuition in Chapter 4 when we make a precise mathematical definition of dimension.
EXAMPLE 17 B = {[1, 0, 0, 1], [0, 1, 0, −2], [0, 1, 1, −1]} is linearly independent, so Span B is a hyperplane in R4.
EXERCISE 6 Give a vector equation for a hyperplane in R5.
As we will see, we will sometimes want to refer to a k-dimensional object in Rn. So,
we make one more definition.
DEFINITION
k-Flat in Rn
Let ~v1, . . . ,~vk, ~b ∈ Rn with {~v1, . . . ,~vk} linearly independent. We call the set with
vector equation ~x = c1~v1 + . . . + ck~vk + ~b, c1, . . . , ck ∈ R a k-flat in Rn through ~b.
Observe that a point in Rn is a 0-flat, a line in Rn is a 1-flat, and a hyperplane in Rn is
an (n − 1)-flat.
Section 1.2 Problems
1. Describe geometrically the following sets and
write a simplified vector equation for each.
(a) Span{[1, 0]}
(b) Span{[−1, 1], [2, −2]}
(c) Span{[1, −3, 1], [−2, 6, −2]}
(d) {[1, −3, 1], [−2, 6, −2]}
(e) Span{[1, 0, −2], [2, 1, −1]}
(f) Span{[0, 0, 0]}
(g) Span{[1, 0, 1, 1], [1, 0, 2, 1], [3, 1, 0, 0]}
(h) Span{[1, 1, 1, 0]}
2. Determine which of the following sets are linearly independent. If a set is linearly dependent, write one of the vectors in the set as a linear combination of the others.
(a) {[1, 2], [1, 3], [1, 4]}
(b) {[3, 1], [−1, 3]}
(c) {[1, 1], [1, 0]}
(d) {[2, 3], [−4, −6]}
(e) {[1, 2, 1]}
(f) {[1, −3, −2], [4, 6, 1], [0, 0, 0]}
(g) {[1, 1, 0], [1, 2, −1], [−2, −4, 2]}
(h) {[1, −2, 1], [2, 3, 4], [0, −1, −2]}
(i) {[1, 1, 2, 1], [2, 2, 4, 2], [1, 0, 1, 0], [2, 1, 3, 1]}
(j) {[1, 1, 0, 1], [0, 1, 1, 1]}
3. Determine which of the following sets forms a basis for R2.
(a) {[3, 2]}
(b) {[2, 3], [1, 0]}
(c) {[−1, 1], [1, 3]}
(d) {[−1, 1], [1, 3], [0, 2]}
(e) {[1, 0], [−3, 5]}
(f) {[1, 0], [0, −3], [0, 0]}
4. Determine which of the following sets forms a basis for R3.
(a) {[1, 2, 1], [0, 0, 0], [1, 4, 3]}
(b) {[−1, 2, −1], [1, 1, 2]}
(c) {[1, 0, 1]}
(d) {[1, 0, 1], [0, 1, 1], [1, 1, 0]}
(e) {[1, 1, 1], [1, 1, 0], [0, 0, 3]}
(f) {[1, 1, −1], [1, 2, −3], [−1, −1, 2]}
5. Write a basis for a hyperplane in R3. Justify
your answer.
6. Prove that {~v1,~v2} is linearly independent if and
only if neither ~v1 nor ~v2 is a scalar multiple of
the other.
7. Prove that if ~v1 ∈ Span{~v2, ~v3}, then {~v1, ~v2, ~v3} is linearly dependent.
8. Let ~x = [1, 1, 0] and ~y = [0, 1, 2].
(a) Prove that {~x, ~y} is not a basis for R3.
(b) Find, with proof, a vector ~z ∈ R3 such that
{~x, ~y,~z} is a basis for R3.
(c) With your ~z from part (b), is {~x, ~y, t~z} a ba-
sis for R3 for all t ∈ R? If so, give a proof.
If not, what restriction must you make on
t?
(d) Is there a vector ~w that is not a scalar multiple of ~z such that {~x, ~y, ~w} is a basis for R3?
9. Repeat Question 8 with ~x = [1, 2, −1] and ~y = [0, 1, 2].
10. Prove that if {~v1,~v2,~v3} is linearly independent,
then {~v1,~v2} is also linearly independent.
11. Prove that Span{~v1, ~v2} = Span{~v1, t~v2} for any t ∈ R, t ≠ 0.
12. Prove that if {~v1, ~v2} is linearly independent and ~v3 ∉ Span{~v1, ~v2}, then {~v1, ~v2, ~v3} is linearly independent.
13. Let ~a, ~b, ~c ∈ R4 and let r, s, t be non-zero real
numbers. Prove that
Span{~a, ~b, ~c} = Span{r~a, s~b, t~c}
14. Determine whether each statement is true or
false. Justify your answer.
(a) {~v1} is linearly independent if and only if ~v1 ≠ ~0.
(b) If ~v1, ~v2 ∈ R3, then Span{~v1, ~v2} is a plane in R3.
(c) If {~v1, ~v2} spans R2, then {~v1, ~v1 + ~v2} spans R2.
(d) If {~v1, ~v2, ~v3} is linearly dependent, then ~v3 ∈ Span{~v1, ~v2}.
(e) If ~v1 is not a scalar multiple of ~v2, then {~v1, ~v2} is linearly independent.
(f) If ~b is a scalar multiple of ~d, then ~x = t~d + ~b is a line through the origin.
(g) If ~v1, . . . , ~vk ∈ Rn, then ~0 ∈ Span{~v1, . . . , ~vk}.
(h) If ~x1, ~x2, ~x3, ~x4 ∈ R4 are distinct vectors such that {~x1, ~x2} and {~x3, ~x4} are both linearly independent sets, then {~x1, ~x2, ~x3, ~x4} is also linearly independent.
(i) If {~v1, . . . , ~vk} spans a subset S of Rn and {~w1, . . . , ~wℓ} spans a subset T of Rn, then Span{~v1, . . . , ~vk, ~w1, . . . , ~wℓ} = S ∪ T.
Section 1.3 Subspaces 23
1.3 Subspaces
As we mentioned when we first looked at the ten properties of vector addition and
scalar multiplication in Theorem 1.1.1, properties V1 and V6 seem rather obvious.
However, it is easy to see that not all subsets of Rn have this property. For example,
the set S = {[1, 1]} clearly does not satisfy properties V1 and V6 since
[1, 1] + [1, 1] = [2, 2] ∉ S
c [1, 1] = [c, c] ∉ S if c ≠ 1
A set which satisfies property V1 is said to be closed under addition. A set which
satisfies property V6 is said to be closed under scalar multiplication.
You might be wondering why we should be so interested in whether a set has prop-
erties V1 and V6 or not. One reason is that a non-empty subset S of Rn which has
these properties is closed under linear combinations. Moreover, such a subset S will
in fact satisfy all the same ten properties as Rn. That is, it is a “space” which is a
subset of the larger space Rn.
DEFINITION
Subspace of Rn
A subset S of Rn is called a subspace of Rn if for every ~x, ~y, ~w ∈ S and c, d ∈ R we
have
S1 ~x + ~y ∈ S
S2 (~x + ~y) + ~w = ~x + (~y + ~w)
S3 ~x + ~y = ~y + ~x
S4 There exists a vector ~0 ∈ S such that ~x + ~0 = ~x for all ~x ∈ S
S5 For every ~x ∈ S there exists (−~x) ∈ S such that ~x + (−~x) = ~0
S6 c~x ∈ S
S7 c(d~x) = (cd)~x
S8 (c + d)~x = c~x + d~x
S9 c(~x + ~y) = c~x + c~y
S10 1~x = ~x
It would be tedious to have to check all ten properties each time we wanted to prove
a non-empty subset S of Rn is a subspace. The good news is that this is not necessary.
Observe that properties S2, S3, S7, S8, S9, and S10 are only about the operations of
addition and scalar multiplication. Thus, we already know that these properties hold
by Theorem 1.1.1. Also, if S1 and S6 hold, then for any ~v ∈ S we have ~0 = 0~v ∈ S and (−~v) = (−1)~v ∈ S, so S4 and S5 hold. Thus, to check if a non-empty subset of Rn is a subspace, we can use the following result.
THEOREM 1 (Subspace Test)
Let S be a non-empty subset of Rn. If ~x + ~y ∈ S and c~x ∈ S for all ~x, ~y ∈ S and c ∈ R,
then S is a subspace of Rn.
EXAMPLE 1 Show that the line through the origin in R2 with vector equation ~x = s [1, −4], s ∈ R is a subspace of R2.
Solution: By V6, we know that s [1, −4] ∈ R2 for every s ∈ R. Thus, the line is a non-empty subset of R2.
To show that the line is closed under addition (satisfies S1), we pick any two vectors ~x and ~y on the line and show that ~x + ~y is also on the line. Pick ~x = s1 [1, −4] and ~y = s2 [1, −4] for some s1, s2 ∈ R. Then, since ~x and ~y are in R2, we can apply property V8 to get
~x + ~y = s1 [1, −4] + s2 [1, −4] = (s1 + s2) [1, −4]
and hence ~x + ~y is also on the line. Similarly, to show that the line is closed under scalar multiplication (satisfies S6) we need to show that c~x is also on the line for any c ∈ R. As above, we can use property V7 to get
c~x = c(s1 [1, −4]) = (cs1) [1, −4]
So, c~x is also on the line. Hence, by the Subspace Test, the line is a subspace of R2.
If S is non-empty and is closed under scalar multiplication, then our work above
shows us that S must contain ~0. Thus, the standard way of checking if S is non-
empty is to determine if ~0 satisfies the conditions of the set. If ~0 is not in S, then we
automatically know that it is not a subspace of Rn as it would not satisfy S4 or S6.
EXERCISE 1 If L is a line in R2 that does not pass through the origin, can L be a subspace of R2?
EXAMPLE 2 Prove that S1 = {[x1, x2, x3] ∈ R3 | x1 + x2 = 0, x1 − x3 = 0} is a subspace of R3.
Solution: By definition S1 is a subset of R3 and ~0 satisfies the conditions of the set (0 + 0 = 0 and 0 − 0 = 0) so ~0 ∈ S1. Thus, S1 is a non-empty subset of R3.
To show that S1 is closed under addition we pick two vectors ~x, ~y ∈ S1 and show that ~x + ~y satisfies the conditions of S1. If ~x = [x1, x2, x3] and ~y = [y1, y2, y3] are in S1, then we have x1 + x2 = 0, x1 − x3 = 0, y1 + y2 = 0, and y1 − y3 = 0.
By definition of addition, we get
~x + ~y = [x1 + y1, x2 + y2, x3 + y3]
We see that
(x1 + y1) + (x2 + y2) = (x1 + x2) + (y1 + y2) = 0 + 0 = 0
and
(x1 + y1) − (x3 + y3) = (x1 − x3) + (y1 − y3) = 0 + 0 = 0
Hence, ~x + ~y satisfies the conditions of S1, so S1 is closed under addition.
Similarly, for any c ∈ R we have
c~x = [cx1, cx2, cx3]
and
cx1 + cx2 = c(x1 + x2) = c0 = 0, cx1 − cx3 = c(x1 − x3) = c0 = 0
Hence, S1 is also closed under scalar multiplication.
Therefore, by the Subspace Test S1 is a subspace of R3.
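The proof above covers all vectors of S1 at once; a computation can only spot-check it, but doing so is a useful sanity test. A minimal Python sketch (the helper names are ours, not the text's) that checks the two closure conditions on sample vectors of S1:

```python
def in_S1(x):
    """Membership test for S1 = {x in R^3 : x1 + x2 = 0 and x1 - x3 = 0}."""
    x1, x2, x3 = x
    return x1 + x2 == 0 and x1 - x3 == 0

def add(x, y):
    """Componentwise vector addition."""
    return [a + b for a, b in zip(x, y)]

def scale(c, x):
    """Scalar multiplication."""
    return [c * a for a in x]

# Vectors in S1 have the form [t, -t, t]:
x, y = [2, -2, 2], [-5, 5, -5]
assert in_S1(x) and in_S1(y)
assert in_S1(add(x, y))    # closed under addition (for these samples)
assert in_S1(scale(7, x))  # closed under scalar multiplication
```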
EXAMPLE 3 Prove that S2 = {[x1, x2, x3] ∈ R3 | x1 + x2 = 1} is not a subspace of R3.
Solution: By definition S2 is a subset of R3, but we see that ~0 does not satisfy the condition of S2 (0 + 0 ≠ 1), hence S2 is not a subspace of R3.
EXAMPLE 4 Prove that S3 = {[x1, x2, x3] ∈ R3 | x1 − 2x2 = 0} is a subspace of R3.
Solution: By definition S3 is a subset of R3, and ~0 satisfies the condition of the set (0 − 2(0) = 0) so ~0 ∈ S3. Thus, S3 is a non-empty subset of R3.
Let ~x = [x1, x2, x3] and ~y = [y1, y2, y3] be vectors in S3. Then we have x1 − 2x2 = 0 and y1 − 2y2 = 0.
Then
~x + ~y = [x1 + y1, x2 + y2, x3 + y3]
and
(x1 + y1) − 2(x2 + y2) = (x1 − 2x2) + (y1 − 2y2) = 0 + 0 = 0
Hence, S3 is closed under addition.
Similarly, for any c ∈ R we have
c~x = [cx1, cx2, cx3]
and
cx1 − 2(cx2) = c(x1 − 2x2) = c0 = 0
Hence, S3 is also closed under scalar multiplication.
Therefore, S3 is a subspace of R3 by the Subspace Test.
EXERCISE 2 Let ~v, ~b ∈ Rn with ~v ≠ ~0. Show that the line S = {t~v + ~b | t ∈ R} is a subspace of Rn if and only if ~b is a scalar multiple of ~v.
THEOREM 2 If ~v1, . . . ,~vk ∈ Rn, then S = Span{~v1, . . . ,~vk} is a subspace of Rn.
Bases of Subspaces
Theorem 2 implies that if a subset S of Rn has a basis, then S must be a subspace of Rn. In Chapter 4, we will prove the converse: every subspace S of Rn has a basis. For now, we look at the procedure for finding a basis of a given subspace.
The algorithm for finding a basis of a subspace S of Rn will be an extension of what we did in the last section. We will first find a general form for a vector in S and then use the general form to find a spanning set for S. Finally, we use Theorem 1.2.1 as many times as necessary until we get a linearly independent spanning set for S.
Of course, as we saw for bases of Rn, every subspace of Rn, excluding the trivial
subspace {~0}, will have infinitely many bases.
REMARK
To prove that a set B spans a subspace S of Rn, we are required to prove that Span B ⊆ S and S ⊆ Span B. However, observe that if all of the vectors in B are in S, then we must have Span B ⊆ S since S is closed under linear combinations. Since verifying that all the vectors in B are in S is considered trivial, it is often omitted.
EXAMPLE 5 Find a basis for the subspace S1 = {[x1, x2, x3] ∈ R3 | x1 + x2 = 0, x1 − x3 = 0} of R3.
Solution: We begin by using the conditions on S1 to find a general form of a vector ~x ∈ S1. If [x1, x2, x3] ∈ S1, then we have x1 + x2 = 0 and x1 − x3 = 0. Hence, we get x2 = −x1 and x3 = x1. Therefore, every vector in S1 can be represented as
[x1, x2, x3] = [x1, −x1, x1] = x1 [1, −1, 1]
Thus, S1 ⊆ Span{[1, −1, 1]}. Moreover, since [1, −1, 1] ∈ S1 we have that Span{[1, −1, 1]} ⊆ S1. Consequently, B = {[1, −1, 1]} is a spanning set for S1. Clearly, B is also linearly independent (it contains a single non-zero vector). Hence, B is a basis for S1.
EXERCISE 3 Find another basis for the subspace S1 in Example 5.
EXAMPLE 6 Find a basis for the subspace S2 = {[a − b, b − c, c − a] | a, b, c ∈ R} of R3.
Solution: We are given that every vector in S2 has the form [a − b, b − c, c − a]. To write this as a linear combination of vectors, we factor out the variables individually to get
[a − b, b − c, c − a] = a [1, 0, −1] + b [−1, 1, 0] + c [0, −1, 1]
Thus, S2 ⊆ Span{[1, 0, −1], [−1, 1, 0], [0, −1, 1]}. Since each of the vectors is also in S2, we get that {[1, 0, −1], [−1, 1, 0], [0, −1, 1]} spans S2.
However, observe that this set is linearly dependent. In particular, we have
[0, −1, 1] = −[1, 0, −1] − [−1, 1, 0]
Therefore, by Theorem 1.2.1 we get that S2 = Span{[1, 0, −1], [−1, 1, 0]}.
Since B = {[1, 0, −1], [−1, 1, 0]} is also linearly independent (neither vector is a scalar multiple of the other), we get that B is a basis for S2.
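The pruning in Example 6 (repeatedly dropping a vector that is a linear combination of the others, by Theorem 1.2.1) can be phrased as a greedy procedure: walk through the spanning set and keep a vector only if it enlarges the span of the vectors kept so far. A Python sketch of this idea, using exact arithmetic over the rationals (the function names are ours, not the text's):

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) via Gaussian elimination over the rationals."""
    A = [[Fraction(v) for v in row] for row in rows]
    r = 0  # pivots found so far
    for col in range(len(A[0]) if A else 0):
        piv = next((i for i in range(r, len(A)) if A[i][col] != 0), None)
        if piv is None:
            continue
        A[r], A[piv] = A[piv], A[r]
        for i in range(r + 1, len(A)):  # clear the column below the pivot
            f = A[i][col] / A[r][col]
            A[i] = [a - f * b for a, b in zip(A[i], A[r])]
        r += 1
    return r

def extract_basis(vectors):
    """Keep each vector only if it is not in the span of those already kept."""
    kept = []
    for v in vectors:
        if rank(kept + [v]) > rank(kept):
            kept.append(v)
    return kept

# The spanning set of S2 from Example 6:
print(extract_basis([[1, 0, -1], [-1, 1, 0], [0, -1, 1]]))
# [[1, 0, -1], [-1, 1, 0]]
```

Applied to the spanning set of S2 this keeps the first two vectors and drops the third, reproducing the basis found above.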
The purpose of the next exercise is to stress the importance of verifying that the
vectors in a set B are actually in the set S when trying to prove that SpanB = S.
EXERCISE 4 Consider the subspace S = {[x1, x1, x1] ∈ R3 | x1 ∈ R}.
(a) Show that every vector in S is a linear combination of ~v1 = [1, 0, 1] and ~v2 = [0, 1, 0].
(b) Explain why B = {~v1, ~v2} does not span S.
The importance and usefulness of bases will become more and more clear as we
proceed further into linear algebra. For now, the most important thing to realize
about bases is that they give us a simple explicit formula for every vector in the set.
One advantage of having this is that it becomes very easy to identify the geometric
interpretation of the subspace.
EXAMPLE 7 Find a geometric interpretation of the subspace S3 = {[x1, x2, x3] ∈ R3 | x1 − 2x2 = 0}.
Solution: Every [x1, x2, x3] ∈ S3 satisfies x1 − 2x2 = 0. We have x1 = 2x2 and so every vector in S3 can be represented as
[x1, x2, x3] = [2x2, x2, x3] = x2 [2, 1, 0] + x3 [0, 0, 1]
Since we also have that [2, 1, 0], [0, 0, 1] ∈ S3, we get that B = {[2, 1, 0], [0, 0, 1]} spans S3. Finally, B is linearly independent since neither vector in B is a scalar multiple of the other. So, B is a basis for S3.
Therefore, by definition, S3 is a plane in R3 through the origin.
Section 1.3 Problems
1. Determine which of the following sets are subspaces of the given Rn. Find a basis of each subspace and describe the subspace geometrically.
(a) S1 = Span{[1, 1, −1], [2, 1, 4]} of R3
(b) S2 = {[x1, x2] ∈ R2 | x1 = x2} of R2
(c) S3 = {[5, x2, x3] | x2, x3 ∈ R} of R3
(d) S4 = {[x1, x2, x3] ∈ R3 | x1 + x2 = x3} of R3
(e) S5 = {[0, 0, 0, 0]} of R4
(f) S6 = {[1, 2, 3]} of R3
(g) S7 = {[x1, x2, x3, x4] | x2 = x1 − x3 + x4} of R4
(h) S8 = {[a − b, b − d, a + b − 2d, c − d] | a, b, c, d ∈ R} of R4
2. Prove that a plane in R3 is a subspace of R3 if
and only if it passes through the origin.
3. Let S = {~v1, . . . ,~vk} be a set of non-zero vectors
in Rn. Prove that S is not a subspace of Rn.
4. Let B = {[1, 3, 1], [2, 0, 1]}.
(a) Prove that B is not a basis for R3.
(b) Find a subspace S of R3 such that B is a basis for S.
5. Explain the difference between a subset of Rn
and a subspace of Rn.
6. Determine whether the set B is a basis for the given subspace S of R3.
(a) S1 = {[x1, x2, x3] ∈ R3 | x1 − 2x2 + x3 = 0}; B = {[0, 1, 2], [1, 0, −1]}
(b) S2 = {[x1, x2, x3] ∈ R3 | x3 = 2x1 − 3x2}; B = {[1, 1, −1]}
(c) S3 = {[x1, x2, x3] ∈ R3 | x1 + 2x2 = 0}; B = {[1, 2, 0], [0, 0, 1]}
(d) S4 = {[x1, x2, x3] ∈ R3 | x1 = 2x2, x2 = −4x3}; B = {[4, 2, −1/2]}
(e) S5 = {[x1, x2, x3] ∈ R3 | 3x1 + x2 = x3}; B = {[2, 1, 7], [1, −1, 2]}
7. Find a basis for the subspace S of R3 defined by S = Span{[1, −1, 3], [−1, 0, 2], [0, 1, −5]}.
8. Let B = {[1, 2, 1], [2, 2, 1]}. Find a, b, c ∈ R such that B is a basis for the subspace
S = {[x1, x2, x3] ∈ R3 | ax1 + bx2 + cx3 = 0}
of R3.
9. Let a, b, c, d ∈ R with d ≠ 0 and let P = { [x1, x2, x3] ∈ R3 | ax1 + bx2 + cx3 = d }.
(a) Show that P is not a subspace of R3.
(b) Explain why P does not have a basis.
(c) Find a vector equation for P.
(d) What is P geometrically?
1.4 Dot Product, Cross Product, and Scalar Equations
So far, we have looked at how to add two vectors in Rn and how to multiply a vector
by a scalar. We now look at two ways of defining a product of two vectors.
Dot Product
Recall that the dot product in R2 and R3 is defined by
[x1, x2] · [y1, y2] = x1y1 + x2y2
[x1, x2, x3] · [y1, y2, y3] = x1y1 + x2y2 + x3y3
Two useful aspects of the dot product are calculating the length of a vector and determining whether two vectors are orthogonal. In particular, the length of a vector ~v is ‖~v‖ = √(~v · ~v), and two vectors ~x and ~y are orthogonal if and only if ~x · ~y = 0. In
relation to the latter, the dot product also determines the angle between two vectors
in R2.
THEOREM 1 If ~x, ~y ∈ R2 and θ is an angle between ~x and ~y, then
~x · ~y = ‖~x ‖ ‖~y ‖ cos θ
Figure 1.4.1: An angle between vectors ~x and ~y.
Since the dot product is so useful in R2 and R3, it makes sense to extend it to Rn.
DEFINITION
Dot Product
Let ~x = [x1, . . . , xn] and ~y = [y1, . . . , yn] be vectors in Rn. The dot product of ~x and ~y is
~x · ~y = x1y1 + · · · + xnyn = Σ_{i=1}^{n} xiyi
REMARK
The dot product is sometimes called the standard inner product on Rn, or the scalar
product.
EXAMPLE 1 Let ~x = [1, 2, −1, 1], ~y = [0, 0, 1, 1], and ~z = [3, 1, −2, −2]. Calculate ~x · ~y, ~y · ~z, ~z · ~z, and (~x · ~z)~z.
Solution: We have
~x · ~y = 1(0) + 2(0) + (−1)(1) + 1(1) = 0
~y ·~z = 0(3) + 0(1) + 1(−2) + 1(−2) = −4
~z ·~z = 3(3) + 1(1) + (−2)(−2) + (−2)(−2) = 18
(~x · ~z)~z = (1(3) + 2(1) + (−1)(−2) + 1(−2))~z = 5[3, 1, −2, −2] = [15, 5, −10, −10]
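Computations like these are easy to cross-check by machine. A minimal sketch in Python (the helper name `dot` is our own, not from the text):

```python
def dot(x, y):
    """Dot product of two vectors of equal length: the sum of x_i * y_i."""
    assert len(x) == len(y), "vectors must have the same number of components"
    return sum(xi * yi for xi, yi in zip(x, y))

x = [1, 2, -1, 1]
y = [0, 0, 1, 1]
z = [3, 1, -2, -2]

print(dot(x, y))                     # 0, so x and y are orthogonal
print(dot(y, z))                     # -4
print(dot(z, z))                     # 18
print([dot(x, z) * zi for zi in z])  # (x . z) z = 5 z = [15, 5, -10, -10]
```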
As usual when looking at a new operation, we should prove some important prop-
erties of the dot product. It also should be noted that these properties are not only
important when working with the dot product, but, like Theorem 1.1.1, we will make
important use of these properties later in the course.
THEOREM 2 If ~x, ~y,~z ∈ Rn and s, t ∈ R, then
(1) ~x · ~x ≥ 0, and ~x · ~x = 0 if and only if ~x = ~0
(2) ~x · ~y = ~y · ~x
(3) ~x · (s~y + t~z) = s(~x · ~y) + t(~x · ~z)
Proof: Let ~x = [x1, . . . , xn], ~y = [y1, . . . , yn], and ~z = [z1, . . . , zn]. For (1) we have
~x · ~x = x1² + · · · + xn²
Since this is a sum of non-negative numbers, we get that ~x · ~x ≥ 0 for all ~x, and that
~x · ~x = 0 if and only if xi = 0 for 1 ≤ i ≤ n.
For (2) we have
~x · ~y = x1y1 + · · · + xnyn = y1x1 + · · · + ynxn = ~y · ~x
For (3) we have
~x · (s~y + t~z) = [x1, . . . , xn] · [sy1 + tz1, . . . , syn + tzn]
= x1(sy1 + tz1) + · · · + xn(syn + tzn)
= sx1y1 + tx1z1 + · · · + sxnyn + txnzn
= s(x1y1 + · · · + xnyn) + t(x1z1 + · · · + xnzn)
= s(~x · ~y) + t(~x ·~z)
∎
Observe that property (1) allows us to define the length of a vector in Rn to match our
formula in R2 and R3.
DEFINITION
Length
Let ~x ∈ Rn. The length or norm of ~x is defined to be
‖~x‖ = √(~x · ~x)
EXAMPLE 2 Find the lengths of ~x = [1, 2, −1, 0], ~y = [1, −2, 3, 1, 1], and ~z = [1/2, 1/2, 1/2, 1/2].
Solution: We have
‖~x‖ = √(~x · ~x) = √(1² + 2² + (−1)² + 0²) = √6
‖~y‖ = √(~y · ~y) = √(1² + (−2)² + 3² + 1² + 1²) = √16 = 4
‖~z‖ = √(~z · ~z) = √((1/2)² + (1/2)² + (1/2)² + (1/2)²) = √1 = 1
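These length computations are also easy to verify by machine. A minimal sketch in Python (the helper name `norm` is our own, not from the text):

```python
import math

def norm(x):
    """Length (norm) of a vector: the square root of x . x."""
    return math.sqrt(sum(xi * xi for xi in x))

print(norm([1, 2, -1, 0]))         # sqrt(6), approximately 2.449
print(norm([1, -2, 3, 1, 1]))      # 4.0
print(norm([0.5, 0.5, 0.5, 0.5]))  # 1.0, a unit vector
```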
In some applications, it is useful to have vectors of length 1. So, we make the fol-
lowing definition.
DEFINITION
Unit Vector
A vector ~x ∈ Rn such that ‖~x ‖ = 1 is called a unit vector.
THEOREM 3 If ~x, ~y ∈ Rn and c ∈ R, then
(1) ‖~x‖ ≥ 0, and ‖~x‖ = 0 if and only if ~x = ~0
(2) ‖c~x‖ = |c| ‖~x‖
(3) |~x · ~y| ≤ ‖~x‖ ‖~y‖ (Cauchy-Schwarz-Bunyakovski Inequality)
(4) ‖~x + ~y‖ ≤ ‖~x‖ + ‖~y‖ (Triangle Inequality)
Proof: We prove (3) and leave the rest as exercises.
If ~x = ~0, the result is obvious. For ~x ≠ ~0, ~y ∈ Rn and t ∈ R, we use property (1) and
properties in Theorem 2 to get
0 ≤ ‖t~x + ~y‖2 = (t~x + ~y) · (t~x + ~y) = (~x · ~x)t2 + 2(~x · ~y)t + (~y · ~y)
This is a polynomial in the variable t where the coefficient ~x · ~x of t2 is positive. For
this polynomial to be non-negative it must have non-positive discriminant. Hence,
(2~x · ~y)2 − 4(~x · ~x)(~y · ~y) ≤ 0
4(~x · ~y)2 − 4‖~x ‖2 ‖~y ‖2 ≤ 0
(~x · ~y)2 ≤ ‖~x ‖2‖~y ‖2
Taking square roots of both sides gives the desired result. ∎
We now also extend the concept of angles and orthogonality to vectors in Rn.
DEFINITION
Angle in Rn
Let ~x, ~y ∈ Rn. We define an angle between ~x and ~y to be an angle θ such that
~x · ~y = ‖~x ‖ ‖~y ‖ cos θ
REMARK
Property (3) of Theorem 3 guarantees that such an angle θ exists for each pair ~x, ~y.
Observe that if the angle between two vectors ~x and ~y is θ = π/2, then we get that
~x · ~y = 0. On the other hand, if ~x · ~y = 0, then we can take θ = π/2. Hence, we make
the following definition.
DEFINITION
Orthogonal
Let ~x, ~y ∈ Rn. If ~x · ~y = 0, then ~x and ~y are said to be orthogonal.
EXAMPLE 3 Let ~w = [1, 0, 1, 2], ~y = [2, 3, −4, 1], ~u = [1, 2, 1, −1, 1], and ~v = [1, 1, 2, 1, 5]. We have that ~w and ~y are orthogonal because
~w · ~y = 1(2) + 0(3) + 1(−4) + 2(1) = 0
Meanwhile, ~u and ~v are not orthogonal because
~u · ~v = 1(1) + 2(1) + 1(2) + (−1)(1) + 1(5) = 9
Notice that our definition of orthogonal does not match exactly with our geometric
view of two vectors being perpendicular because of the zero vector. In particular, we
have the following theorem.
THEOREM 4 The zero vector ~0 ∈ Rn is orthogonal to every vector ~x ∈ Rn.
We can also extend the definition of orthogonal to a set of vectors.
DEFINITION
Orthogonal Set
A set of vectors {~v1, . . . , ~vk} in Rn is called an orthogonal set if ~vi · ~vj = 0 for all i ≠ j.
EXAMPLE 4 Show that the standard basis vectors for R3 form an orthogonal set.
Solution: We have
[1, 0, 0] · [0, 1, 0] = 0, [1, 0, 0] · [0, 0, 1] = 0, and [0, 0, 1] · [0, 1, 0] = 0
as required.
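Checking that a set is orthogonal just means checking every pair of distinct vectors, which is easy to automate. A minimal sketch in Python (the function name `is_orthogonal_set` is our own):

```python
from itertools import combinations

def is_orthogonal_set(vectors):
    """A set is orthogonal if every pair of distinct vectors has dot product 0."""
    dot = lambda x, y: sum(a * b for a, b in zip(x, y))
    return all(dot(u, v) == 0 for u, v in combinations(vectors, 2))

# The standard basis vectors for R^3 form an orthogonal set:
std_basis = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(is_orthogonal_set(std_basis))  # True
```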
REMARK
In fact, one reason the standard basis vectors for Rn are so easy to work with is be-
cause they form an orthogonal set of unit vectors. Such a set is called an orthonormal
set. We will look at other orthonormal sets in Chapter 9.
Cross Product
We have seen the dot product of two vectors is a scalar which has a nice geometric
interpretation. We now look at another way of taking a product of two vectors in R3.
Let ~v = [v1, v2, v3], ~w = [w1, w2, w3] ∈ R3. In many applications, it is necessary to determine a vector ~n = [n1, n2, n3] which is orthogonal to both ~v and ~w. That is, we want
0 = ~v · ~n = v1n1 + v2n2 + v3n3
0 = ~w · ~n = w1n1 + w2n2 + w3n3
Solving these 2 equations in 3 unknowns, we find that one solution is
~n = [v2w3 − v3w2, v3w1 − v1w3, v1w2 − v2w1]
Hence, we make the following definition.
DEFINITION
Cross Product
Let ~v, ~w ∈ R3. The cross product of ~v = [v1, v2, v3] and ~w = [w1, w2, w3] is defined to be
~v × ~w = [v2w3 − v3w2, v3w1 − v1w3, v1w2 − v2w1]
REMARK
The 2 equations in 3 unknowns we solved to derive the formula for the cross-product
in fact has infinitely many solutions. The natural question to ask is why we picked
this particular one. The reason is two-fold. First, we have picked it so that {~v, ~w, ~v × ~w} forms what is called a right-handed system. Secondly, we have picked it so that it
has a nice geometric interpretation (see Section 5.5).
EXAMPLE 5 We have
[1, 2, 3] × [6, 5, 4] = [2(4) − 3(5), −(1(4) − 3(6)), 1(5) − 2(6)] = [−7, 14, −7]
[0, 1, 0] × [1, 0, 0] = [1(0) − 0(0), −(0(0) − 0(1)), 0(0) − 1(1)] = [0, 0, −1]
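The component formula translates directly into code. A minimal sketch in Python (the function name `cross` is our own), which also checks property (1) of Theorem 5 below, that the result is orthogonal to both factors:

```python
def cross(v, w):
    """Cross product of two vectors in R^3, by the component formula."""
    v1, v2, v3 = v
    w1, w2, w3 = w
    return [v2 * w3 - v3 * w2,
            v3 * w1 - v1 * w3,
            v1 * w2 - v2 * w1]

n = cross([1, 2, 3], [6, 5, 4])
print(n)  # [-7, 14, -7], as in Example 5

# The result is orthogonal to both factors:
dot = lambda x, y: sum(a * b for a, b in zip(x, y))
print(dot(n, [1, 2, 3]), dot(n, [6, 5, 4]))  # 0 0
```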
REMARK
One way to help remember the formula for the cross product is as follows: Write the components of the vectors next to each other as

v1 w1
v2 w2
v3 w3

To get the first component of ~v × ~w we cover the first row and calculate the difference of the products of the cross-terms. That is,

v2 w2
v3 w3   gives v2w3 − v3w2

To get the second component we cover the second row and calculate the negative of the difference of the products of the cross-terms:

v1 w1
v3 w3   gives −(v1w3 − v3w1) = v3w1 − v1w3

For the third component we cover the third row and calculate the difference of the products of the cross-terms like we did for the first component:

v1 w1
v2 w2   gives v1w2 − v2w1

Be careful to include the extra negative in the calculation of the second component.
EXERCISE 1 Evaluate the following:
(a) [1, 0, 0] × [0, 1, 0]
(b) [3, −1, 1] × [1, 2, −1]
(c) [1, 1, 3] × [c, c, 3c]
The following theorem gives some of the important properties of the cross product.
THEOREM 5 Suppose that ~v, ~w, ~x ∈ R3 and c ∈ R.
(1) If ~n = ~v × ~w, then for any ~y ∈ Span{~v, ~w} we have ~y · ~n = 0
(2) ~v × ~w = −~w × ~v
(3) ~v × ~v = ~0
(4) ~v × ~w = ~0 if and only if either ~v = ~0 or ~w is a scalar multiple of ~v
(5) ~v × (~w + ~x) = ~v × ~w + ~v × ~x
(6) (c~v) × ~w = c(~v × ~w)
(7) ‖~v × ~w‖ = ‖~v‖ ‖~w‖ |sin θ|, where θ is the angle between ~v and ~w
REMARK
It is important to remember that, as derived, the cross product of ~v, ~w ∈ R3 gives a
vector that is orthogonal to ~v and ~w (as stated in property (1) of Theorem 5).
Scalar Equation of a Plane
In some applications the vector equation of a plane is not the most suitable. We now
look at how to change from the vector equation of a plane in R3 to another useful
representation.
THEOREM 6 Let ~v, ~w, ~b ∈ R3 with {~v, ~w} being linearly independent and let P be a plane in R3
with vector equation ~x = s~v + t~w + ~b, s, t ∈ R. If ~n = ~v × ~w, then an equation for the
plane is
(~x − ~b) · ~n = 0 (1.6)
Proof: We will prove that equation 1.6 is an equation for the plane by showing that
a point is on the plane if and only if it satisfies equation 1.6.
First observe that since {~v, ~w} is linearly independent, neither ~v nor ~w is a scalar multiple of the other. Thus, property (4) of Theorem 5 gives ~n ≠ ~0.
Consider any point X(x1, x2, x3) in the plane. For the corresponding vector ~x = [x1, x2, x3], the vector equation of the plane gives ~x − ~b = s~v + t~w. Thus,
(~x − ~b) · ~n = (s~v + t~w) · ~n = 0
by property (1) of Theorem 5. Hence, every point in the plane satisfies (1.6).
On the other hand, if X(x1, x2, x3) is any point in R3 that satisfies (~x − ~b) · ~n = 0, then
0 = ([x1, x2, x3] − [b1, b2, b3]) · [v2w3 − v3w2, v3w1 − v1w3, v1w2 − v2w1]
0 = [x1 − b1, x2 − b2, x3 − b3] · [v2w3 − v3w2, v3w1 − v1w3, v1w2 − v2w1]
0 = (v2w3 − v3w2)(x1 − b1) + (v3w1 − v1w3)(x2 − b2) + (v1w2 − v2w1)(x3 − b3)
Since ~n ≠ ~0, we get that at least one entry of ~n is non-zero. We will prove the case where v1w2 − v2w1 ≠ 0. The other two cases are similar.
Bringing (v1w2 − v2w1)(x3 − b3) to the other side gives
−(v1w2 − v2w1)(x3 − b3) = (v2w3 − v3w2)(x1 − b1) + (v3w1 − v1w3)(x2 − b2)
Therefore, we get
−(v1w2 − v2w1)[x1 − b1, x2 − b2, x3 − b3]
= [−(v1w2 − v2w1)(x1 − b1), −(v1w2 − v2w1)(x2 − b2), (v2w3 − v3w2)(x1 − b1) + (v3w1 − v1w3)(x2 − b2)]
= (x1 − b1)[−(v1w2 − v2w1), 0, v2w3 − v3w2] + (x2 − b2)[0, −(v1w2 − v2w1), v3w1 − v1w3]
= (x1 − b1)(−w2~v + v2~w) + (x2 − b2)(w1~v − v1~w)
Dividing both sides by −(v1w2 − v2w1) gives
~x − ~b = [(−w2(x1 − b1) + w1(x2 − b2))/(−v1w2 + v2w1)] ~v + [(v2(x1 − b1) − v1(x2 − b2))/(−v1w2 + v2w1)] ~w
Consequently, there exist s, t ∈ R such that ~x = s~v + t~w + ~b. Hence, the only points in R3 which satisfy (1.6) are in P.
Thus, the equation (~x − ~b) · ~n = 0 defines the plane. ∎
Observe that we can take the vector ~n in Theorem 6 to be any non-zero scalar multiple
of ~v × ~w. Moreover, we can use properties of the dot product to rearrange equation
(1.6) to be in the form ~x · ~n = ~b · ~n.
DEFINITION
Normal Vector
Scalar Equation
Let P be a plane in R3 that passes through the point B(b1, b2, b3). If ~n = [n1, n2, n3] ∈ R3 is
a vector such that
n1x1 + n2x2 + n3x3 = b1n1 + b2n2 + b3n3 (1.7)
is an equation for P, then ~n is called a normal vector for P. We call equation (1.7) a
scalar equation of P.
EXAMPLE 6 Find a scalar equation of the plane through B(1, 2, 3) with normal vector ~n = [1, −1, 2].
Solution: By definition a scalar equation of the plane is
n1x1 + n2x2 + n3x3 = b1n1 + b2n2 + b3n3
1x1 + (−1)x2 + 2x3 = 1(1) + (−1)(2) + 2(3) = 5
EXAMPLE 7 What is a normal vector for the plane with scalar equation 3x1 + x2 − x3 = 4?
Solution: By definition, the coefficients of x1, x2, and x3 in a scalar equation for the plane are the components of a normal vector ~n. Thus, ~n = [3, 1, −1].
EXERCISE 2 List 2 other normal vectors for the plane with scalar equation 3x1 + x2 − x3 = 4.
EXAMPLE 8 Find a scalar equation of the plane through A(−1, 0, 2) that is parallel to the plane
x1 − 2x2 + 3x3 = −2.
Solution: For two planes to be parallel, their normal vectors must be non-zero scalar
multiples of each other. That is, we can take a normal vector for the required plane to be ~n = [1, −2, 3]. Thus, a scalar equation of the required plane is
x1 − 2x2 + 3x3 = 1(−1) + (−2)(0) + 3(2) = 5
EXAMPLE 9 Find a scalar equation for the plane defined by ~x = s[3, 1, −1] + t[−1, −1, 5] + [2, 3, 3], s, t ∈ R.
Solution: By Theorem 6 we find that a normal vector for the plane is
~n = [3, 1, −1] × [−1, −1, 5] = [4, −14, −2]
We can find a point on the plane by picking any values for s and t. Taking s = t = 0,
we find that B(2, 3, 3) is on the plane. Thus, a scalar equation of the plane is
4x1 + (−14)x2 + (−2)x3 = 4(2) + (−14)(3) + (−2)(3) = −40
Of course, we could divide both sides of the equation by -2 to get a slightly simpler
scalar equation
−2x1 + 7x2 + x3 = 20
EXAMPLE 10 Find a scalar equation for the plane defined by ~x = s[4, 1, −2] + t[0, 1, 2], s, t ∈ R.
Solution: We have
[4, 1, −2] × [0, 1, 2] = [4, −8, 4]
Since any non-zero scalar multiple of this is also a normal vector to the plane, we pick ~n = [1, −2, 1]. We also notice that taking s = t = 0 shows us that the plane passes through the origin. Thus, a scalar equation of the plane is
x1 − 2x2 + x3 = 1(0) + (−2)(0) + 1(0) = 0
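The recipe used in Examples 9 and 10 (cross the two direction vectors to get ~n, then dot ~n with a point ~b on the plane) is mechanical enough to code. A minimal sketch in Python (the function name `scalar_equation` is our own):

```python
def scalar_equation(v, w, b):
    """Given a plane x = s v + t w + b in R^3, return (n, d) so that the
    scalar equation is n1 x1 + n2 x2 + n3 x3 = d, with n = v x w, d = n . b."""
    v1, v2, v3 = v
    w1, w2, w3 = w
    n = [v2 * w3 - v3 * w2, v3 * w1 - v1 * w3, v1 * w2 - v2 * w1]
    d = sum(ni * bi for ni, bi in zip(n, b))
    return n, d

n, d = scalar_equation([3, 1, -1], [-1, -1, 5], [2, 3, 3])
print(n, d)  # [4, -14, -2] -40, matching Example 9
```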
EXERCISE 3 Find a scalar equation for the plane with vector equation ~x = s[−1, −1, 3] + t[2, 1, 4], s, t ∈ R.
Scalar Equation of a Hyperplane
If {~v1, . . . ,~vm−1} is a linearly independent set of vectors in Rm, m ≥ 2, and ~b ∈ Rm,
then, ~x = c1~v1 + · · ·+ cm−1~vm−1 +~b is a vector equation of a hyperplane in Rm. If there
exists a non-zero vector ~n = [n1, . . . , nm] ∈ Rm which is orthogonal to each of ~v1, . . . , ~vm−1,
then we can repeat our argument above to get that an equation for the hyperplane,
also called a scalar equation, is
n1x1 + · · · + nmxm = n1b1 + · · · + nmbm
REMARK
Although our geometrical intuition tells us that such a vector ~n exists, we do need to
prove that it exists. To do this, we would repeat what we did in deriving the formula
for the cross product. That is, we would consider the m− 1 equations in m unknowns
given by ~n · ~v1 = 0, . . . , ~n · ~vm−1 = 0. In the next chapter we will look at the theory of
solving such systems which will guarantee us that there is a solution.
Section 1.4 Problems
1. Evaluate the following:
(a) [5, 3, −6] · [3, 2, 4]
(b) [1, −2, −2, 3] · [2, 1/2, 1/2, −1]
(c) ‖[√2, 1/√2]‖
(d) [1, 4, −1] · [3, −1, −1]
(e) ‖[1, 2, −1, 3]‖
(f) [2, 1, −3] × [1, 1, −1]
(g) [1, 1, −1] × [2, 1, −3]
(h) [3, 2, 3] × [3a, 2a, 3a]
2. Find a scalar equation for:
(a) The plane with normal vector ~n = [1, −3, 3] that passes through the origin.
(b) The plane through (1, 2, 0) that is parallel to the plane 3x1 + 2x2 − x3 = 1.
(c) The plane with vector equation ~x = s[1, 0, −2] + t[1, 1, −1], s, t ∈ R.
(d) The plane with vector equation ~x = s[3, 3, 3] + t[2, 1, −1], s, t ∈ R.
(e) The plane P = Span{ [3, 1, −2], [2, 2, 1] }.
3. Determine which of the following sets are orthogonal.
(a) { [1, 1, 2], [−1, −1, −1], [0, 1, 1] }
(b) { [1, 2, −1], [3, −2, −1], [2, 1, 4] }
(c) { [0, 0, 0], [1, 0, 1], [0, 1, 0] }
(d) { [1/√2, 1/√3, 1/√6], [0, −1/√3, 2/√6], [1/√2, −1/√3, −1/√6] }
4. Show that { [1/√3, 1/√3, 1/√3], [1/√2, 0, −1/√2], [1/√6, −2/√6, 1/√6] } is an orthogonal set of unit vectors.
5. Prove that ‖c~x ‖ = |c|‖~x ‖ for any ~x ∈ Rn and
c ∈ R.
6. Prove that if ~x ≠ ~0, then (1/‖~x‖)~x is a unit vector.
7. Prove that if ~v1 and ~v2 are unit vectors such that
~v1 ·~v2 = 0, then {~v1,~v2} is linearly independent.
8. Prove that if ~v1 · ~v2 = 0, then ‖~v1 + ~v2‖² = ‖~v1‖² + ‖~v2‖².
9. Prove that if ~n = ~v × ~w, then for any ~y ∈ Span{~v, ~w} we have ~y · ~n = 0.
10. Prove that the set of all vectors orthogonal to ~x = [1, 1, 1] is a subspace of R3.
11. Prove Theorem 4.
12. Determine whether each statement is true or
false. Justify your answer.
(a) Let ~x, ~y,~z ∈ R4. If ~x · ~y = 0 and ~y · ~z = 0,
then ~x ·~z = 0.
(b) If ~x and ~y are orthogonal, then {~x, ~y} is lin-
early independent.
(c) If ~x and ~y are orthogonal, then {s~x, t~y} is an
orthogonal set for any s, t ∈ R.
(d) If ~x and ~y are both orthogonal to~z, then~z is
orthogonal to ~v = s~x + t~y for any s, t ∈ R.
(e) ~x× (~y×~z) = (~x×~y)×~z for any ~x, ~y,~z ∈ R3.
(f) If ~n is a normal vector for a plane P in R3, then for any t ∈ R, t ≠ 0, t~n is also a normal vector for P.
1.5 Projections
We have seen how to combine vectors together via linear combinations to create other
vectors. We now look at this procedure in reverse. We want to take a vector and write
it as a sum of two other vectors. In elementary physics, this is used to split a force
into its components along certain directions (usually horizontal and vertical).
Projection onto a Line
Given vectors ~u, ~v ∈ Rn with ~v ≠ ~0, we want to write ~u as a sum of a scalar multiple of
~v and another vector ~w which is orthogonal to ~v as in Figure 1.5.1. That is, we want
~u = c~v + ~w for some c ∈ R such that ~w · ~v = 0.
Figure 1.5.1: Examples of projections
We begin by determining the value of c. Observe that
~u · ~v = (c~v + ~w) · ~v = c(~v · ~v) + ~w · ~v = c‖~v ‖2 + 0
Hence, c = (~u · ~v)/‖~v‖².
We define the quantity c~v to be the projection of ~u onto the line spanned by ~v. In
particular, it is the amount of ~u in the direction of ~v.
DEFINITION
Projection onto a Line in Rn
Let ~u, ~v ∈ Rn with ~v ≠ ~0. The projection of ~u onto ~v is defined by
proj~v(~u) = ((~u · ~v)/‖~v‖²) ~v
Now that we have the projection of ~u onto ~v, it is easy to find the desired vector ~w that is orthogonal to ~v. We get that ~w = ~u − c~v = ~u − proj~v(~u).
DEFINITION
Perpendicular of a Projection in Rn
Let ~u, ~v ∈ Rn with ~v ≠ ~0. The perpendicular of ~u onto ~v is defined by
perp~v(~u) = ~u − proj~v(~u)
EXERCISE 1 Verify that proj~v(~u) · perp~v(~u) = 0.
EXAMPLE 1 Let ~u = [2, 3] and ~v = [2, 1]. Find proj~v(~u) and perp~v(~u).
Solution: We have
proj~v(~u) = ((~u · ~v)/‖~v‖²) ~v = (7/5)[2, 1] = [14/5, 7/5]
perp~v(~u) = ~u − proj~v(~u) = [2, 3] − [14/5, 7/5] = [−4/5, 8/5]
EXAMPLE 2 Let ~u = [−1, −3] and ~v = [2, 1]. Find proj~v(~u) and perp~v(~u).
Solution: We have
proj~v(~u) = ((~u · ~v)/‖~v‖²) ~v = (−5/5)[2, 1] = [−2, −1]
perp~v(~u) = ~u − proj~v(~u) = [−1, −3] − [−2, −1] = [1, −2]
EXAMPLE 3 Let ~v = [1, 3, −1] and ~u = [0, 2, 1]. Find proj~v(~u) and perp~v(~u).
Solution: We have
proj~v(~u) = ((~u · ~v)/‖~v‖²) ~v = (5/11)[1, 3, −1] = [5/11, 15/11, −5/11]
perp~v(~u) = ~u − proj~v(~u) = [0, 2, 1] − [5/11, 15/11, −5/11] = [−5/11, 7/11, 16/11]
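The formulas proj~v(~u) and perp~v(~u) can be sketched in Python; exact rational arithmetic via the standard-library `Fraction` type keeps the answers in the same form as the examples (the function names `proj` and `perp` are our own):

```python
from fractions import Fraction

def proj(u, v):
    """Projection of u onto v: ((u . v) / ||v||^2) v, exact arithmetic."""
    dot = lambda x, y: sum(a * b for a, b in zip(x, y))
    c = Fraction(dot(u, v), dot(v, v))  # requires v != 0
    return [c * vi for vi in v]

def perp(u, v):
    """Perpendicular part: u - proj_v(u), which is orthogonal to v."""
    return [ui - pi for ui, pi in zip(u, proj(u, v))]

print(proj([2, 3], [2, 1]))  # [Fraction(14, 5), Fraction(7, 5)]
print(perp([2, 3], [2, 1]))  # [Fraction(-4, 5), Fraction(8, 5)]
```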
EXERCISE 2 Let ~a = [1, 2, 2] and ~b = [1, 0, 1]. Find proj~b(~a) and perp~b(~a).
Projection onto a Plane
So far we have been finding the projection of a vector ~u onto the line spanned by a vector ~v. How do we define the projection of a vector ~u onto a plane P = Span{~v, ~w} in R3? Our goal is the same as before. That is, we want to be able to write any vector ~x ∈ R3 as a sum of a vector ~u which is in the plane and a vector ~z which is orthogonal to the plane (that is, a scalar multiple of any normal vector).
Figure 1.5.2 suggests that the desired scalar multiple ~z is equal to the projection of ~x onto a normal vector ~n of the plane. Indeed, we know that we can write
~x = perp~n(~x) + proj~n(~x)
where perp~n(~x) is orthogonal to ~n and hence, by Theorem 1.4.6, is in P. Thus, we get our desired result by taking ~u = perp~n(~x) and ~z = proj~n(~x).
Figure 1.5.2: Projection of a vector onto a plane
DEFINITION
Projection and Perpendicular onto a Plane in R3
Let P be a plane in R3 that passes through the origin and has normal vector ~n. The
projection of ~x ∈ R3 onto P is defined by
projP(~x) = perp~n(~x)
The perpendicular of ~x onto P is defined by
perpP(~x) = proj~n(~x)
EXAMPLE 4 Find the projection and perpendicular of ~y = [2, 3, 1] onto the plane 3x1 − x2 + 4x3 = 0.
Solution: We see that a normal vector for the plane is ~n = [3, −1, 4]. Hence, we get
projP(~y) = perp~n(~y) = ~y − ((~y · ~n)/‖~n‖²)~n = [2, 3, 1] − (7/26)[3, −1, 4] = [31/26, 85/26, −1/13]
perpP(~y) = proj~n(~y) = ((~y · ~n)/‖~n‖²)~n = (7/26)[3, −1, 4] = [21/26, −7/26, 14/13]
EXAMPLE 5 Find the projection of ~u = [−1, 1, 3] onto the plane P with vector equation
~x = s[1, 2, 1] + t[−2, 2, 0], s, t ∈ R
Solution: To find the projection, we first need to find a normal vector of the plane. We have
[1, 2, 1] × [−2, 2, 0] = [−2, −2, 6]
Thus, we can take ~n = [1, 1, −3]. Then, the projection of ~u onto the plane is
projP(~u) = perp~n(~u) = ~u − ((~u · ~n)/‖~n‖²)~n = [−1, 1, 3] − (−9/11)[1, 1, −3] = [−2/11, 20/11, 6/11]
EXAMPLE 6 Find the projection of ~u = [2, 3, 5] onto the plane P = Span{ [1, −1, 0], [1, 2, 0] }.
Solution: A vector equation of the plane is ~x = s[1, −1, 0] + t[1, 2, 0], s, t ∈ R. We have
[1, −1, 0] × [1, 2, 0] = [0, 0, 3]
Thus, a normal vector of the plane is ~n = [0, 0, 1]. So, the projection of ~u onto P is
projP(~u) = perp~n(~u) = ~u − ((~u · ~n)/‖~n‖²)~n = [2, 3, 5] − (5/1)[0, 0, 1] = [2, 3, 0]
This result makes sense since P is the x1x2-plane in R3.
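The recipe of Examples 4-6 (cross the spanning vectors to get ~n, then subtract proj~n) can be sketched as one function in Python with exact arithmetic (the function name `proj_plane` is our own; it assumes, as the text does, that the spanning set is linearly independent):

```python
from fractions import Fraction

def proj_plane(x, v, w):
    """Projection of x onto the plane through the origin spanned by v and w:
    proj_P(x) = x - proj_n(x), where n = v x w is a normal vector."""
    v1, v2, v3 = v
    w1, w2, w3 = w
    n = [v2 * w3 - v3 * w2, v3 * w1 - v1 * w3, v1 * w2 - v2 * w1]
    dot = lambda a, b: sum(ai * bi for ai, bi in zip(a, b))
    c = Fraction(dot(x, n), dot(n, n))  # assumes {v, w} linearly independent
    return [xi - c * ni for xi, ni in zip(x, n)]

p = proj_plane([2, 3, 5], [1, -1, 0], [1, 2, 0])
print(p == [2, 3, 0])  # True, matching Example 6
```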
Section 1.5 Problems
1. Evaluate proj~v(~u) and perp~v(~u) where
(a) ~v = [2, 3], ~u = [−3, 0]
(b) ~v = [1, 3], ~u = [3, 2]
(c) ~v = [2, −3], ~u = [6, 4]
(d) ~v = [3, 2], ~u = [1, 3]
(e) ~v = [1, 0, 0], ~u = [3, 2, 4]
(f) ~v = [1, 1, 1], ~u = [3, 2, 4]
(g) ~v = [1, 0, 0, 1], ~u = [2, 5, −6, 6]
(h) ~v = [1, 4, 4, −1], ~u = [1, 1, 1, 1]
(i) ~v = [1, 2, 0, 1], ~u = [3, 1, 1, −1]
(j) ~v = [−2, −4, 0, −2], ~u = [3, 1, 1, −1]
2. Find the projection and perpendicular of ~v onto the plane P.
(a) ~v = [1, 0, −1] and P has scalar equation 3x1 − 2x2 + 3x3 = 0.
(b) ~v = [−1, 2, 1] and P has scalar equation x1 + 2x2 + 2x3 = 0.
(c) ~v = [3, −4, 1] and P has vector equation ~x = s[0, −1, 1] + t[0, 3, 1], s, t ∈ R.
(d) ~v = [1, 1, 2] and P has vector equation ~x = s[1, −2, 1] + t[−2, 1, −2], s, t ∈ R.
3. Let P = Span{ [1, 1, 1], [1, 0, −1] }. Find projP(~x) where
(a) ~x = [1, 2, 3]
(b) ~x = [2, 1, 0]
(c) ~x = [1, −2, 1]
(d) ~x = [2, 3, 3]
4. Find the projection of ~v = [2, 2, 1] onto the plane S = Span{ [3, 3, −1], [2, 3, −1] }.
5. Let ~v ∈ Rn with ~v ≠ ~0. Prove that for any ~x, ~y ∈ Rn we have
(a) proj~v(~x + ~y) = proj~v(~x) + proj~v(~y)
(b) proj~v(s~x) = s proj~v(~x)
(c) proj~v(proj~v(~x)) = proj~v(~x)
6. Let ~v, ~x ∈ Rn with ~v ≠ ~0. Determine whether each statement is true or false. Justify your answer.
(a) proj~v(~x) = proj−~v(~x).
(b) proj~v(−~x) = − proj~v(~x).
(c) {proj~v(~x), perp~v(~x)} is linearly independent.
(d) proj~v(perp~v(~x)) = perp~v(proj~v(~x))
7. Let P = Span{~v1,~v2} be a plane in R3 with
normal vector ~n.
(a) Explain how we know that {~v1,~v2} is
linearly independent.
(b) Prove that {~v1, ~v2, proj~n(~x)} is linearly independent for all ~x ∈ R3, ~x ∉ P.
(c) Prove that {~v1,~v2, ~n} is a basis for R3.
8. Let P(p1, p2) be a point in R2 and let L be a line
in R2 which passes through the origin.
(a) Let ~p = [p1, p2] and let ~d ≠ ~0 be a direction vector for the line L. Prove that
‖~p − proj~d(~p)‖ < ‖~p − ~x‖
for all ~x ∈ Span{~d}, ~x ≠ proj~d(~p).
(b) Explain in words what part (a) says about
the relationship of proj~d (~p) to P and L.
(c) Determine the geometric meaning of
‖ perp~d (~p)‖ in this context.
9. Suppose that ~u and ~v are orthogonal unit vec-
tors in R3.
(a) Prove that {~u,~v} is linearly independent.
(b) Prove that for every ~x ∈ R3,
perp~u×~v(~x) = proj~u(~x) + proj~v(~x)
Chapter 2
Systems of Linear Equations
In the last chapter, we saw a few cases where we needed to solve a system of m
equations in n unknowns. For example, when determining whether a set was linearly
independent, and when deriving the formula for the cross product. For each of these
cases, the examples and exercises given could be solved, with some effort, using
the method of substitution and elimination. However, in the real world, we don’t
often get a set of 3 linear equations in 3 unknowns. In applications we can easily
have thousands of equations and thousands of unknowns. Hence, we want to develop
some theory that will allow us to solve such problems.
It is important to remember while studying this chapter that in typical applications
the computations are performed by computer. Thus, although we expect that you can
solve small systems by hand, it is more important that you understand the theory of
solving systems. This theory will be used frequently throughout this book.
2.1 Systems of Linear Equations
DEFINITION
Linear Equation
An equation in n variables (unknowns) x1, . . . , xn that can be written in the form
a1x1 + · · · + anxn = b
where a1, . . . , an, b are constants is called a linear equation. The constants ai are
called the coefficients of the equation and b is called the right-hand side.
REMARKS
1. In Chapters 1 - 10 the coefficients and right-hand side will be real numbers. In
Chapter 11 we will look at linear equations where the coefficients and right-
hand side are a little more complex.
2. From our work in the last chapter, we know that a linear equation in n variables
where a1, . . . , an, b ∈ R geometrically represents a hyperplane in Rn.
EXAMPLE 1 The equation x1 + 0x2 + 2x3 = 4 is linear.
The equation x2 = −4x1 is also linear as it can be written as 4x1 + x2 = 0.
The equations x1² + x1x2 = 5 and x sin x = 0 are not linear equations.
DEFINITION
System of LinearEquations
A set of m linear equations in the same variables x1, . . . , xn is called a system of m
linear equations in n variables.
A general system of m linear equations in n variables has the form
a11x1 + a12x2 + · · · + a1nxn = b1
a21x1 + a22x2 + · · · + a2nxn = b2
⋮
am1x1 + am2x2 + · · · + amnxn = bm
Observe that the coefficient ai j represents the coefficient of x j in the i-th equation.
DEFINITION
Solution of aSystem
Solution Set
A vector ~s = [s1, . . . , sn] ∈ Rn is called a solution of a system of m linear equations in n
variables if all m equations are satisfied when we set xi = si for 1 ≤ i ≤ n. The set of
all solutions of a system of linear equations is called the solution set of the system.
DEFINITION
Consistent
Inconsistent
If a system of linear equations has at least one solution, then it is said to be
consistent. Otherwise, it is said to be inconsistent.
Observe that geometrically a system of m linear equations in n variables represents
m hyperplanes in Rn and so a solution of the system is a vector in Rn which lies on all
m hyperplanes. The system is inconsistent if all m hyperplanes do not share a point
of intersection.
EXAMPLE 2 The system of 2 equations in 3 variables
x1 + x2 + x3 = 1
x1 + x2 + x3 = 2
does not have any solutions since the two corresponding hyperplanes are parallel.
Hence, the system is inconsistent.
EXAMPLE 3 The system of 3 linear equations in 2 variables
x1 + x2 = 1
2x1 − 3x2 = 12
−2x1 − 3x2 = 0
is consistent since if we take x1 = 3 and x2 = −2, then we get
3 + (−2) = 1
2(3) − 3(−2) = 12
−2(3) − 3(−2) = 0
It can be shown this is the only solution, so the solution set is { [3, −2] }.
.
EXERCISE 1 Sketch the lines x1 + x2 = 1, 2x1 − 3x2 = 12, and −2x1 − 3x2 = 0 on a single graph to
verify the result of Example 3 geometrically.
EXAMPLE 4 It can be shown that the solution set of the system of 2 equations in 3 variables
x1 − 2x2 = 3
x1 + x2 + 3x3 = 9
has vector equation
~x = [7, 2, 0] + t[2, 1, −1], t ∈ R
Geometrically, this means that the planes x1 − 2x2 = 3 and x1 + x2 + 3x3 = 9 intersect
in the line with the given vector equation.
EXERCISE 2 Verify that every vector on the line with vector equation ~x = [7, 2, 0] + t[2, 1, −1], t ∈ R is indeed a solution of the system of linear equations in Example 4.
It is easy to show geometrically that a system of 2 linear equations in 2 variables
is either inconsistent, consistent with a unique solution, or consistent with infinitely
many solutions (you should sketch all 3 possibilities). In particular, if a system of 2
linear equations in 2 unknowns has two solutions, then it must in fact have infinitely
many solutions. We now prove that this is true for any system of linear equations.
Section 2.1 Systems of Linear Equations 51
THEOREM 1 If the system of linear equations
a11x1 + a12x2 + · · · + a1nxn = b1
a21x1 + a22x2 + · · · + a2nxn = b2
⋮
am1x1 + am2x2 + · · · + amnxn = bm
has two distinct solutions ~s = [s1, . . . , sn] and ~t = [t1, . . . , tn], then ~x = ~s + c(~s − ~t) is a distinct
solution for each c ∈ R.
Proof: Observe that the i-th equation of the system is
ai1x1 + · · · + ainxn = bi
Substituting
~x = ~s + c(~s − ~t) = [s1 + c(s1 − t1), . . . , sn + c(sn − tn)]
into the i-th equation gives
ai1(s1 + c(s1 − t1)) + · · · + ain(sn + c(sn − tn))
= ai1s1 + cai1s1 − cai1t1 + · · · + ainsn + cainsn − caintn
= ai1s1 + · · · + ainsn + c(ai1s1 + · · · + ainsn) − c(ai1t1 + · · · + aintn)
= bi + cbi − cbi
= bi
Therefore, the i-th equation is satisfied when ~x = ~s + c(~s − ~t). Since this is valid for
all 1 ≤ i ≤ m, we have shown that ~x = ~s + c(~s − ~t) is a solution for each c ∈ R.
We now show that for any c1 ≠ c2, the solutions ~s + c1(~s − ~t) and ~s + c2(~s − ~t) are distinct. If they are not distinct, then
~s + c1(~s − ~t) = ~s + c2(~s − ~t)
c1(~s − ~t) = c2(~s − ~t)
(c1 − c2)(~s − ~t) = ~0
Since c1 ≠ c2, this implies that ~s − ~t = ~0 and hence ~s = ~t. But, this contradicts our assumption that ~s ≠ ~t. ∎
This proves that the solution set of a system of m linear equations in n variables must
either be empty, contain exactly one vector, or have infinitely many vectors in it. In
the next section, we will derive an efficient method for finding the solution set of a
system of linear equations.
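Theorem 1 is easy to test numerically. A minimal sketch in Python, using the system from Example 4 (the two starting solutions are the points on its solution line at t = 0 and t = 1):

```python
def new_solution(s, t, c):
    """Given two solutions s and t of a linear system, Theorem 1 says
    s + c (s - t) is also a solution for every scalar c."""
    return [si + c * (si - ti) for si, ti in zip(s, t)]

# Two solutions of x1 - 2x2 = 3, x1 + x2 + 3x3 = 9:
s = [7, 2, 0]
t = [9, 3, -1]

for c in [-2, 1, 10]:
    x1, x2, x3 = new_solution(s, t, c)
    print(x1 - 2 * x2, x1 + x2 + 3 * x3)  # 3 9 each time
```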
Section 2.1 Problems
1. Write a system of linear equations with 3 equa-
tions in 3 unknowns such that:
(a) The system has a unique solution.
(b) The system is inconsistent.
(c) The system has infinitely many solutions.
(d) The solution set of the system is the line
~x = s[1, 2, 1] + [1, 0, 1], s ∈ R.
2. For each of the following, determine if the
given vector ~x is a solution of the specified
system of linear equations.
(a) 6x1 + 2x2 + 3x3 = 1
2x1 + x2 + 6x3 = 2
4x1 + 5x2 − 6x3 = −9,  ~x = [3/7, −10/7, 3/7]
(b) 3x1 + 6x2 + 7x3 = 9
−4x1 + 3x2 − 3x3 = −4
x1 − 13x2 − 2x3 = 4,  ~x = [−2, −1, 3]
(c) x1 − 2x2 + x3 − 3x4 = 0
−4x1 + 5x2 + 5x3 − 6x4 = 0
5x1 − 3x2 + 3x3 + 8x4 = 0,  ~x = [4, 3, −1, −1]
(d) x1 + 2x2 + x3 = 3
−x1 + x2 + 5x3 = −9
x1 + x2 − x3 = 5,  ~x = [7 + 3t, −2 − 2t, t]
3. Given that [1, 0, 0] and [0, 1, 1] are both solutions of a system of linear equations, list three other solutions of the same system.
4. Determine whether each statement is true or
false. Justify your answer.
(a) x1 + x3 = −3x2 is a linear equation.
(b) x1² + 2x1x2 + x2² = 0 is a linear equation.
(c) If a system of linear equations has [1, 1, 1] as a solution, then [2, 2, 2] is also a solution of the system.
(d) It is impossible to find a solution to a sys-
tem of linear equations that has more un-
knowns than equations.
5. Consider a system of linear equations
a11x1 + · · · + a1nxn = 0
a21x1 + · · · + a2nxn = 0
⋮
am1x1 + · · · + amnxn = 0
(a) Explain geometrically why this system
must be consistent.
(b) Prove algebraically that this system must
be consistent.
6. It can be shown that the three hyperplanes
2x1 + x2 + 4x3 + x4 = 6
x1 + x3 + 2x4 = 4
−x1 − 4x2 − 9x3 + 10x4 = 4
intersect in the plane ~x = [4, −2, 0, 0] + t[−1, −2, 1, 0] + s[−2, 3, 0, 1], s, t ∈ R.
(a) Prove that the point P(−1, 2, 1, 2) lies on all three hyperplanes.
(b) Find three more points that lie on all three hyperplanes.
2.2 Solving Systems of Linear Equations
To derive a method for solving large systems of linear equations, we start by looking
more closely at how we solve small systems.
Given a system of 2 linear equations in 2 unknowns, we can use the method of sub-
stitution and elimination to solve it.
EXAMPLE 1 Solve the system
2x1 + 3x2 = 11
3x1 + 6x2 = 7
Solution: We first want to solve for one variable, say x2. To do this, we eliminate x1
from one of the equations. Multiplying the first equation by (-3) gives
−6x1 − 9x2 = −33
3x1 + 6x2 = 7
Now, if we multiply the second equation by 2 we get
−6x1 − 9x2 = −33
6x1 + 12x2 = 14
Adding the first equation to the second we get
−6x1 − 9x2 = −33
0x1 + 3x2 = −19
Multiplying the second equation by 1/3 gives
−6x1 − 9x2 = −33
0x1 + x2 = −19/3
We can now add 9 times the second equation to the first equation to get
−6x1 + 0x2 = −90
0x1 + x2 = −19/3
Finally, multiplying the first equation by −1/6 we get
x1 + 0x2 = 15
0x1 + x2 = −19/3
Hence, the system is consistent with unique solution ~x = [15, −19/3].
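A quick way to build confidence in a solution is to substitute it back into the original equations. The following Python sketch (our own illustration, not part of the text) checks Example 1's answer using exact rational arithmetic so that the fraction −19/3 causes no rounding trouble:

```python
from fractions import Fraction

# Substitute x1 = 15, x2 = -19/3 back into both original equations.
x1 = Fraction(15)
x2 = Fraction(-19, 3)

eq1 = 2 * x1 + 3 * x2  # left-hand side of the first equation
eq2 = 3 * x1 + 6 * x2  # left-hand side of the second equation
print(eq1, eq2)  # -> 11 7
```

Both left-hand sides match the right-hand sides 11 and 7, confirming the solution.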
Observe that after each step we actually have a new system of linear equations to
solve. The idea is that each new system has the same solution set as the original and
is easier to solve.
DEFINITION
Equivalent
Two systems of equations which have the same solution set are said to be equivalent.
Of course, this method can be used to solve larger systems as well.
EXERCISE 1 Solve the system by using substitution and elimination. Clearly show/explain all of
your steps used in solving the system and describe your general procedure.
x1 − 2x2 − 3x3 = 0
2x1 + x2 + 5x3 = −1
−x1 + x2 − x3 = 2
Now imagine that you need to solve a system of a hundred thousand equations with
a hundred thousand variables. Certainly you are not going to want to do this by hand
(it would be environmentally unfriendly to use that much paper). So, being clever,
you decide that you want to program a computer to solve the system for you. To do
this, you need to ask yourself a few questions: How is the computer going to store
the system? What operations is the computer allowed to perform on the equations?
How will the computer know which operations it should use to solve the system?
To answer the first question, we observe that when using substitution and elimina-
tion, we do not really need to write down the variables xi each time as we are only
modifying the coefficients and right-hand side. Thus, it makes sense to store just the
coefficients and right-hand side of each equation in the system in a rectangular array.
DEFINITION
Coefficient Matrix
Augmented Matrix
The coefficient matrix for the system of linear equations
a11x1 + a12x2 + · · · + a1nxn = b1
a21x1 + a22x2 + · · · + a2nxn = b2
...
am1x1 + am2x2 + · · · + amnxn = bm
is the rectangular array
[ a11 a12 · · · a1n ]
[ a21 a22 · · · a2n ]
[  ...              ]
[ am1 am2 · · · amn ]
The augmented matrix of the system is
[ a11 a12 · · · a1n | b1 ]
[ a21 a22 · · · a2n | b2 ]
[  ...              | ... ]
[ am1 am2 · · · amn | bm ]
REMARK
It is very important to observe that we can look at the coefficient matrix in two ways.
First, the i-th row of the coefficient matrix represents the coefficients of the i-th equation
in the system. Second, the j-th column of the coefficient matrix represents the
coefficients of xj in all of the equations.
For computational problems, the notation in the definition is necessary. However,
when working on theory, it would be much better to have more compact notation.
Observe that by using operations on vectors in Rm, the system of linear equations
a11x1 + · · · + a1nxn = b1
...
am1x1 + · · · + amnxn = bm
can be written as
[b1, . . . , bm] = [a11x1 + · · · + a1nxn, . . . , am1x1 + · · · + amnxn] = x1[a11, . . . , am1] + · · · + xn[a1n, . . . , amn]
Notice that the columns of the coefficient matrix of the system are the vectors
~aj = [a1j, . . . , amj]
for 1 ≤ j ≤ n. Thus, we can denote the coefficient matrix of the system by
A = [~a1 · · · ~an]. If we let ~b = [b1, . . . , bm], then we can denote the augmented matrix of
the system by [~a1 · · · ~an | ~b] or sometimes simply as [A | ~b].
EXAMPLE 2 The system of linear equations
3x1 + x2 + x3 = −1
6x1 + x3 = −2
−3x1 + 2x2 = 3
has coefficient matrix
A =
[ 3 1 1 ]
[ 6 0 1 ]
[ −3 2 0 ]
and augmented matrix
[A | ~b] =
[ 3 1 1 | −1 ]
[ 6 0 1 | −2 ]
[ −3 2 0 | 3 ]
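These matrix representations translate directly into data structures. The sketch below (our own illustration, not code from the text) stores Example 2's system as nested Python lists, one row per equation:

```python
# Row i of A holds the coefficients of equation i.
A = [[ 3, 1, 1],
     [ 6, 0, 1],
     [-3, 2, 0]]        # coefficient matrix
b = [-1, -2, 3]         # right-hand side

# The augmented matrix [A | b] appends b_i to row i.
aug = [row + [bi] for row, bi in zip(A, b)]
print(aug)  # -> [[3, 1, 1, -1], [6, 0, 1, -2], [-3, 2, 0, 3]]

# Column j of A collects the coefficients of x_{j+1} across all equations,
# matching the two ways of reading the coefficient matrix noted above.
col2 = [row[1] for row in A]
print(col2)  # -> [1, 0, 2]
```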
To answer the question about what operations the computer is allowed to perform, it
is helpful to write each step of our example in matrix form.
EXAMPLE 3 Write each step of Example 1 in matrix form and describe how the rows of the matrix
were modified in each step.
Solution: The initial system has augmented matrix
[ 2 3 | 11 ]
[ 3 6 | 7 ]
We multiplied the first row of the augmented matrix by (−3) to get the augmented matrix
[ −6 −9 | −33 ]
[ 3 6 | 7 ]
Then we multiplied the second row by 2 to get
[ −6 −9 | −33 ]
[ 6 12 | 14 ]
Adding the first row to the second row gave
[ −6 −9 | −33 ]
[ 0 3 | −19 ]
Next we multiplied the second row by 1/3 to get
[ −6 −9 | −33 ]
[ 0 1 | −19/3 ]
Then we added 9 times the second row to the first.
[ −6 0 | −90 ]
[ 0 1 | −19/3 ]
The last step was to multiply the first row by −1/6.
[ 1 0 | 15 ]
[ 0 1 | −19/3 ]
This is the augmented matrix for the system
x1 = 15
x2 = −19/3
which gives us the solution.
This is the augmented matrix for the system
x1 = 15
x2 = −19/3
which gives us the solution.
EXERCISE 2 Write each step of Exercise 1 in matrix form and describe how the rows of the matrix
were modified in each step.
Our method of solving has involved two types of operations. One was to multiply
a row by a non-zero number and the second was to add a multiple of one row to
another. However, to create a nice method for a computer to solve a system, we will
add one more type of operation: swapping rows.
DEFINITION
Elementary Row Operations
The three elementary row operations (EROs) are:
1. multiplying a row by a non-zero scalar,
2. adding a multiple of one row to another,
3. swapping two rows.
When applying EROs to a matrix, called row reducing the matrix, we often use
shorthand notation to indicate which operations we are using:
1. cRi indicates multiplying the i-th row by c ≠ 0.
2. Ri + cRj indicates adding c times the j-th row to the i-th row.
3. Ri ↔ Rj indicates swapping the i-th row and the j-th row.
For now you should always indicate which ERO you use in each step. Not only will
this help you in row reducing a matrix and finding/fixing computational errors, but in
Chapter 5 we will see that this is necessary.
Observe that EROs are reversible (we can always perform an ERO that undoes what
another ERO did). Hence, if there exists a sequence of elementary row operations that
transforms A into B, then there also exists a sequence of elementary row operations
that transforms B into A.
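The three elementary row operations, and the fact that each is undone by another operation of the same type, can be sketched in Python as follows (our own illustration; the function names are ours, not the text's):

```python
from fractions import Fraction

def scale_row(M, i, c):
    """cRi: multiply row i by a non-zero scalar c."""
    assert c != 0
    return [[c * x for x in row] if k == i else row[:] for k, row in enumerate(M)]

def add_multiple(M, i, j, c):
    """Ri + cRj: add c times row j to row i."""
    return [[x + c * y for x, y in zip(row, M[j])] if k == i else row[:]
            for k, row in enumerate(M)]

def swap_rows(M, i, j):
    """Ri <-> Rj: swap rows i and j."""
    N = [row[:] for row in M]
    N[i], N[j] = N[j], N[i]
    return N

# Reversibility: each ERO below is undone by another ERO of the same type.
M = [[Fraction(2), Fraction(3), Fraction(11)],
     [Fraction(3), Fraction(6), Fraction(7)]]
assert scale_row(scale_row(M, 0, Fraction(-3)), 0, Fraction(-1, 3)) == M
assert add_multiple(add_multiple(M, 1, 0, 2), 1, 0, -2) == M
assert swap_rows(swap_rows(M, 0, 1), 0, 1) == M
print("each ERO is reversed by an ERO of the same type")
```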
DEFINITION
Row Equivalent
Two matrices A and B are said to be row equivalent if there exists a sequence of
elementary row operations that transform A into B. We write A ∼ B.
Recall that our goal is to row reduce the augmented matrix of a system into the
augmented matrix of an equivalent system that is easier to solve.
THEOREM 1 If the augmented matrices [A1 | ~b1] and [A2 | ~b2] are row equivalent, then the systems
of linear equations associated with these matrices are equivalent.
Finally, to answer the third question about which operations a computer should use
to solve the system, we need to figure out an algorithm for the computer to follow.
The algorithm should apply elementary row operations to the augmented matrix of
a system until we have an equivalent system which is easy to solve. We make a
definition for the form of the coefficient matrix that makes the system easiest to solve.
DEFINITION
Reduced Row Echelon Form
A matrix R is said to be in reduced row echelon form (RREF) if:
1. All rows containing a non-zero entry are above all rows that contain only zeros.
2. The first non-zero entry in each non-zero row is 1, called a leading one.
3. The leading one in each non-zero row is to the right of the leading one in any
row above it.
4. A leading one is the only non-zero entry in its column.
If A is row equivalent to a matrix R in RREF, then we say that R is the reduced row
echelon form of A.
In the definition, we can say that R is the reduced row echelon form of A because of
the following theorem.
THEOREM 2 If A is a matrix, then A has a unique reduced row echelon form R.
The proof requires additional tools which will be presented in Chapter 4.
EXAMPLE 4 Determine which of the following matrices are in RREF.
(a)
[ 1 0 2 0 ]
[ 0 1 1 0 ]
[ 0 0 0 1 ]
(b)
[ 0 1 0 0 ]
[ 0 0 0 1 ]
[ 0 0 0 0 ]
(c)
[ 1 0 0 2 ]
[ 0 1 1 1 ]
[ 0 0 1 1 ]
(d)
[ 0 1 0 ]
[ 1 0 0 ]
Solution: The matrices in (a) and (b) are in RREF. The matrix in (c) is not in RREF
as the column containing the leading one in the third row has another non-zero entry.
The matrix (d) is not in RREF since the leading one in the second row is to the left
of the leading one in the row above it.
EXERCISE 3 Determine which of the following matrices are in RREF.
(a)
[ 1 0 −1 1 ]
[ 0 1 −1 −1 ]
[ 0 0 0 1 ]
(b)
[ 1 0 −1 0 ]
[ 0 0 0 1 ]
[ 0 0 0 0 ]
(c)
[ 0 2 0 3 ]
[ 0 0 2 0 ]
[ 0 0 0 0 ]
(d)
[ 1 0 7 0 ]
[ 0 0 0 0 ]
[ 0 1 0 1 ]
We now just need to give our computer an algorithm to row reduce any matrix to
its reduced row echelon form. The typical method for row reducing an augmented
matrix to its RREF is called Gauss-Jordan Elimination. Although this algorithm for
row reducing a matrix always works, it is not always the best way to reduce a matrix
by hand (or by computer). In individual cases, you may want to perform various
“tricks” to make the row reductions/computations easier. For example, you probably
want to avoid fractions as long as possible. These “tricks” are best learned from
experience. That is, you should row reduce lots of matrices.
ALGORITHM
Gauss-Jordan Elimination To row reduce a non-zero matrix into RREF, we
1. Consider the first non-zero column. Use elementary row operations (if neces-
sary) to get a leading one in the top of the column.
2. Use the elementary row operation add a multiple of one row to another to make
all entries beneath the leading one into a 0.
3. Consider the submatrix consisting of the columns to the right of the column
you just modified and the rows beneath the row that just got a leading one. Use
elementary row operations (if necessary) to get a leading one in the top of the
first non-zero column of this submatrix.
4. Use the elementary row operation add a multiple of one row to another to make
all other entries in the column (for the whole matrix, not just the submatrix)
containing the new leading one into a 0.
5. Repeat steps 3 and 4 until the matrix is in RREF.
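The steps above can be turned into a short program. The following Python sketch (our own illustration, not code from the text) implements Gauss-Jordan elimination using exact Fraction arithmetic and reproduces Example 5 below; a production solver would instead use floating point with partial pivoting for numerical stability:

```python
from fractions import Fraction

def rref(M):
    """Row reduce M (a list of rows) to reduced row echelon form."""
    M = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(M), len(M[0])
    lead = 0  # column in which we look for the next leading one
    for r in range(rows):
        # Steps 1/3: find the first column at or right of `lead` that has a
        # non-zero entry at or below row r.
        while lead < cols:
            pivot = next((i for i in range(r, rows) if M[i][lead] != 0), None)
            if pivot is not None:
                break
            lead += 1
        else:
            break  # no non-zero columns remain: the matrix is in RREF
        M[r], M[pivot] = M[pivot], M[r]        # swap rows (ERO 3)
        c = M[r][lead]
        M[r] = [x / c for x in M[r]]           # scale to get a leading one (ERO 1)
        # Steps 2/4: make every other entry in the pivot column zero (ERO 2).
        for i in range(rows):
            if i != r and M[i][lead] != 0:
                f = M[i][lead]
                M[i] = [x - f * y for x, y in zip(M[i], M[r])]
        lead += 1
    return M

A = [[1, 1, 0, -7], [2, 4, 1, -16], [1, 2, 1, 9]]  # Example 5's augmented matrix
R = rref(A)
print([[int(x) for x in row] for row in R])  # entries happen to be integers here
# -> [[1, 0, 0, 11], [0, 1, 0, -18], [0, 0, 1, 34]]
```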
In this example, we will follow the Gauss-Jordan Elimination Algorithm to demon-
strate how it works.
EXAMPLE 5 Solve the following system of equations by row reducing the augmented matrix to
RREF.
x1 + x2 = −7
2x1 + 4x2 + x3 = −16
x1 + 2x2 + x3 = 9
Solution: The upper left corner is already a leading one, so we place 0s beneath it.
[ 1 1 0 | −7 ]
[ 2 4 1 | −16 ]
[ 1 2 1 | 9 ]
R2 − 2R1 ∼
[ 1 1 0 | −7 ]
[ 0 2 1 | −2 ]
[ 1 2 1 | 9 ]
R3 − R1 ∼
[ 1 1 0 | −7 ]
[ 0 2 1 | −2 ]
[ 0 1 1 | 16 ]
We now consider the submatrix where we ignore the first row and the first column.
The entry in the top of the first column of this submatrix is a 2. We need to make this
a leading one.
R2 ↔ R3 ∼
[ 1 1 0 | −7 ]
[ 0 1 1 | 16 ]
[ 0 2 1 | −2 ]
We now need to get 0s above and below the new leading one.
R1 − R2 ∼
[ 1 0 −1 | −23 ]
[ 0 1 1 | 16 ]
[ 0 2 1 | −2 ]
R3 − 2R2 ∼
[ 1 0 −1 | −23 ]
[ 0 1 1 | 16 ]
[ 0 0 −1 | −34 ]
We repeat the procedure on the new submatrix [ −1 | −34 ].
(−1)R3 ∼
[ 1 0 −1 | −23 ]
[ 0 1 1 | 16 ]
[ 0 0 1 | 34 ]
R1 + R3 ∼
[ 1 0 0 | 11 ]
[ 0 1 1 | 16 ]
[ 0 0 1 | 34 ]
R2 − R3 ∼
[ 1 0 0 | 11 ]
[ 0 1 0 | −18 ]
[ 0 0 1 | 34 ]
This corresponds to the system x1 = 11, x2 = −18, x3 = 34. Hence, the solution is ~x = [11, −18, 34].
When row reducing a matrix, we must remember that we are only allowed to perform
one elementary row operation at a time as in the example. However, writing out the
matrix so many times becomes quite tedious and time consuming. Thus, whenever
possible, we include multiple elementary row operations in a single rewriting of the
matrix. When doing this, we must make sure that we are not modifying a row that
we are using in another elementary row operation.
EXAMPLE 6 Find the solution set of the following system of equations.
x1 + x2 + 2x3 + x4 = 3
x1 + 2x2 + 4x3 + x4 = 7
x1 + x4 = −21
Solution: We write the augmented matrix and row reduce:
[ 1 1 2 1 | 3 ]
[ 1 2 4 1 | 7 ]
[ 1 0 0 1 | −21 ]
R2 − R1, R3 − R1 ∼
[ 1 1 2 1 | 3 ]
[ 0 1 2 0 | 4 ]
[ 0 −1 −2 0 | −24 ]
R1 − R2, R3 + R2 ∼
[ 1 0 0 1 | −1 ]
[ 0 1 2 0 | 4 ]
[ 0 0 0 0 | −20 ]
Observe that the last row represents the equation 0x1+0x2+0x3+0x4 = −20. Clearly
this is impossible, so the system is inconsistent.
It is clear that if we ever obtain a row of the form [ 0 · · · 0 | b ] with b ≠ 0, then the
system of equations is inconsistent, as this row corresponds to the equation 0 = b.
EXERCISE 4 Solve the following system of linear equations.
x1 + 2x2 + x3 = 0
−x1 − 2x2 + 2x3 = 0
2x2 + 4x3 = −3
Infinitely Many Solutions
As we discussed above, a consistent system of linear equations either has a unique
solution or infinitely many. We now look at how to determine all solutions of a system
with infinitely many solutions. We start by considering an example.
EXAMPLE 7 Show the following system of equations has infinitely many solutions and find the
solution set.
x1 + x2 + x3 = 4
x2 + x3 = 3
Solution: We row reduce the augmented matrix to RREF.
[ 1 1 1 | 4 ]
[ 0 1 1 | 3 ]
R1 − R2 ∼
[ 1 0 0 | 1 ]
[ 0 1 1 | 3 ]
Observe that the RREF of the augmented matrix corresponds to the system of equa-
tions
x1 = 1
x2 + x3 = 3
There are infinitely many choices for x2 and x3 that will satisfy x2 + x3 = 3. Hence,
the system has infinitely many solutions.
Observe that by choosing any x3 = t ∈ R, we get a unique value for x2; in particular,
x2 = 3 − t. Thus, the solution set of the system is
~x = [x1, x2, x3] = [1, 3 − t, t] = [1, 3, 0] + t[0, −1, 1], t ∈ R
In this example, we could have solved for x3 in terms of x2 instead. However, for
consistency and ease, we make the convention that we always solve for the variables
whose columns have leading ones in terms of the variables whose columns do not
have leading ones.
DEFINITION
Free Variable
Let R be the RREF of the coefficient matrix of a consistent system of linear equations.
If the j-th column of R does not contain a leading one, then we call xj a free variable
of the system of linear equations.
We can now give an algorithm for solving a system of linear equations.
ALGORITHM
To solve a system of linear equations:
1. Write the augmented matrix for the system.
2. Use elementary row operations to row reduce the augmented matrix into RREF.
3. Write the system of linear equations corresponding to the RREF.
4. If the system contains an equation of the form 0 = 1, the system is inconsistent.
5. Otherwise, move each free variable (if any) to the right hand side of each equa-
tion and assign each free variable a parameter.
6. Determine the solution set by using vector operations to write the system as a
linear combination of vectors.
EXAMPLE 8 Solve the following system of linear equations.
2x1 + x2 + x3 + x4 = −2
x1 + x3 + x4 = 1
−2x2 + 2x3 + 2x4 = 8
Solution: We have
[ 2 1 1 1 | −2 ]
[ 1 0 1 1 | 1 ]
[ 0 −2 2 2 | 8 ]
R1 ↔ R2, (1/2)R3 ∼
[ 1 0 1 1 | 1 ]
[ 2 1 1 1 | −2 ]
[ 0 −1 1 1 | 4 ]
R2 − 2R1 ∼
[ 1 0 1 1 | 1 ]
[ 0 1 −1 −1 | −4 ]
[ 0 −1 1 1 | 4 ]
R3 + R2 ∼
[ 1 0 1 1 | 1 ]
[ 0 1 −1 −1 | −4 ]
[ 0 0 0 0 | 0 ]
The corresponding system of equations is
x1 + x3 + x4 = 1
x2 − x3 − x4 = −4
0 = 0
Since x3 and x4 are free variables, we let x3 = s ∈ R and x4 = t ∈ R. Thus, we have
x1 = 1 − x3 − x4 = 1 − s − t and x2 = −4 + x3 + x4 = −4 + s + t. Therefore, all solutions
have the form
~x = [x1, x2, x3, x4] = [1 − s − t, −4 + s + t, s, t] = [1, −4, 0, 0] + s[−1, 1, 1, 0] + t[−1, 1, 0, 1], s, t ∈ R
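As a check on the parametric form, substituting the general solution back into Example 8's equations should leave zero residual for every choice of the parameters. A small Python sketch (our own illustration; the sample parameter values are arbitrary):

```python
def solution(s, t):
    # x = [1, -4, 0, 0] + s[-1, 1, 1, 0] + t[-1, 1, 0, 1]
    return [1 - s - t, -4 + s + t, s, t]

def residuals(x):
    """Left-hand side minus right-hand side for each of Example 8's equations."""
    x1, x2, x3, x4 = x
    return [2*x1 + x2 + x3 + x4 - (-2),
            x1 + x3 + x4 - 1,
            -2*x2 + 2*x3 + 2*x4 - 8]

for s in (-2, 0, 3):
    for t in (-1, 0, 5):
        assert residuals(solution(s, t)) == [0, 0, 0]
print("all sampled (s, t) satisfy the system")
```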
EXAMPLE 9 Solve the following system of linear equations.
x1 + x3 + 2x4 = 4
2x1 + x2 + 4x3 + x4 = 6
−x1 − 4x2 − 9x3 + 5x4 = 4
Solution: We have
[ 1 0 1 2 | 4 ]
[ 2 1 4 1 | 6 ]
[ −1 −4 −9 5 | 4 ]
R2 − 2R1, R3 + R1 ∼
[ 1 0 1 2 | 4 ]
[ 0 1 2 −3 | −2 ]
[ 0 −4 −8 7 | 8 ]
R3 + 4R2 ∼
[ 1 0 1 2 | 4 ]
[ 0 1 2 −3 | −2 ]
[ 0 0 0 −5 | 0 ]
−(1/5)R3 ∼
[ 1 0 1 2 | 4 ]
[ 0 1 2 −3 | −2 ]
[ 0 0 0 1 | 0 ]
R1 − 2R3, R2 + 3R3 ∼
[ 1 0 1 0 | 4 ]
[ 0 1 2 0 | −2 ]
[ 0 0 0 1 | 0 ]
The corresponding system of equations is
x1 + x3 = 4
x2 + 2x3 = −2
x4 = 0
Since x3 is a free variable, we let x3 = s ∈ R. Thus, we have x1 = 4 − x3 = 4 − s and
x2 = −2 − 2x3 = −2 − 2s. Therefore, all solutions have the form
~x = [x1, x2, x3, x4] = [4 − s, −2 − 2s, s, 0] = [4, −2, 0, 0] + s[−1, −2, 1, 0], s ∈ R
EXERCISE 5 Solve the following system of linear equations.
x2 + 3x3 = 0
2x1 + 3x2 + x3 + x4 = 0
−2x1 + 8x3 = 0
REMARK
Instead of thinking of this algorithm as solving a system, we can think about it as just
changing the form of the system. The original system is an implicit representation
for the intersection of the hyperplanes, while a vector equation for the solution set is
an explicit form. Note that both representations have their uses. For example, if you
wanted to find ten points which were on the intersection of the hyperplanes in Ex-
ample 8, it would be much easier to use a vector equation of the solution set. On the
other hand, if you wanted to determine if ten particular points were on the intersec-
tion, it would be much easier to test the original scalar equations of the hyperplanes.
One reason we developed our theory of solving systems of linear equations was so
that we could solve more complicated linear algebra problems.
EXAMPLE 10 Prove that B = {[2, −1, 2], [−1, 1, 0], [2, 0, 5]} is a basis for R3.
Solution: By definition of a basis, we need to prove that B spans R3 and is linearly
independent. For spanning, we need to show that R3 = SpanB. First, since B ⊂ R3
we have that Span B ⊆ R3 by Theorem 1.1.1. Next, for any [x1, x2, x3] ∈ R3 we need to
show that we can find t1, t2, and t3 such that
[x1, x2, x3] = t1[2, −1, 2] + t2[−1, 1, 0] + t3[2, 0, 5] = [2t1 − t2 + 2t3, −t1 + t2, 2t1 + 5t3]
Comparing entries gives the system of linear equations
2t1 − t2 + 2t3 = x1
−t1 + t2 = x2
2t1 + 5t3 = x3
We now need to determine if this system is consistent for all x1, x2, and x3. Row
reducing gives
[ 2 −1 2 | x1 ]
[ −1 1 0 | x2 ]
[ 2 0 5 | x3 ]
R1 + R2 ∼
[ 1 0 2 | x1 + x2 ]
[ −1 1 0 | x2 ]
[ 2 0 5 | x3 ]
R2 + R1, R3 − 2R1 ∼
[ 1 0 2 | x1 + x2 ]
[ 0 1 2 | x1 + 2x2 ]
[ 0 0 1 | −2x1 − 2x2 + x3 ]
R1 − 2R3, R2 − 2R3 ∼
[ 1 0 0 | 5x1 + 5x2 − 2x3 ]
[ 0 1 0 | 5x1 + 6x2 − 2x3 ]
[ 0 0 1 | −2x1 − 2x2 + x3 ]
Thus, the system is consistent for all x1, x2, and x3 as required. Hence, R3 ⊆ SpanB.
Therefore, SpanB = R3.
To determine if B is linearly independent, we use the definition of linear indepen-
dence. Consider
t1[2, −1, 2] + t2[−1, 1, 0] + t3[2, 0, 5] = [0, 0, 0]
We see that this is going to give us exactly the same system as above, except that
the right-hand side is now all zeros. Hence, performing the same elementary row
operations will give
[ 2 −1 2 | 0 ]
[ −1 1 0 | 0 ]
[ 2 0 5 | 0 ]
∼
[ 1 0 0 | 0 ]
[ 0 1 0 | 0 ]
[ 0 0 1 | 0 ]
Consequently, the only solution is t1 = t2 = t3 = 0 and so B is also linearly indepen-
dent. Therefore, it is a basis for R3.
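For a quick numerical cross-check of Example 10 (not the method used in the text), one can use a fact developed in Chapter 5: three vectors in R3 form a basis exactly when the 3 × 3 matrix with those vectors as columns has non-zero determinant. A Python sketch:

```python
def det3(u, v, w):
    """Determinant of the 3x3 matrix whose columns are u, v, w
    (cofactor expansion along the first row)."""
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - v[0] * (u[1] * w[2] - u[2] * w[1])
            + w[0] * (u[1] * v[2] - u[2] * v[1]))

d = det3([2, -1, 2], [-1, 1, 0], [2, 0, 5])
print(d)  # -> 1, which is non-zero, so B is a basis for R^3
```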
EXERCISE 6 Prove that B = {[1, 2, −1], [2, 1, 1], [3, 3, 1]} is a basis for R3.
Homogeneous Systems
In the last example, we saw that determining if a set is linearly independent cor-
responds to solving a system where the right-hand side only contains zeros. Such
systems arise frequently in linear algebra and its applications, so we look at these a
little more closely.
DEFINITION
Homogeneous System
A system of linear equations is said to be a homogeneous system if the right-hand
side only contains zeros. That is, it has the form [A | ~0].
Observe that the augmented part of any matrix that is row equivalent to the augmented
matrix of a homogeneous system will only contain zeros. Thus, it is standard practice
to omit this column when solving a homogeneous system; we just row reduce the
coefficient matrix.
Moreover, we immediately observe that it is impossible to get a row of the form
[ 0 · · · 0 | 1 ], hence a homogeneous system is always consistent. In particular, it
is easy to see that the zero vector ~0 is always a solution of a homogeneous system.
EXAMPLE 11 Find the solution set of the homogeneous system
x1 + 2x2 + 3x3 = 0
−2x1 − 3x2 − 6x3 = 0
x1 + 3x2 + 4x3 = 0
x1 + 4x2 + 5x3 = 0
Solution: We row reduce the coefficient matrix to get
[ 1 2 3 ]
[ −2 −3 −6 ]
[ 1 3 4 ]
[ 1 4 5 ]
R2 + 2R1, R3 − R1, R4 − R1 ∼
[ 1 2 3 ]
[ 0 1 0 ]
[ 0 1 1 ]
[ 0 2 2 ]
R1 − 2R2, R3 − R2, R4 − 2R2 ∼
[ 1 0 3 ]
[ 0 1 0 ]
[ 0 0 1 ]
[ 0 0 2 ]
R1 − 3R3, R4 − 2R3 ∼
[ 1 0 0 ]
[ 0 1 0 ]
[ 0 0 1 ]
[ 0 0 0 ]
Since this is the coefficient matrix of a homogeneous system we must remember that
the right-hand sides of all the equations are 0. Thus, when we write the RREF back
into equation form we get x1 = 0, x2 = 0, and x3 = 0. Therefore, the only solution is
the trivial solution.
EXAMPLE 12 Find the solution set of the homogeneous system
x1 + 2x2 = 0
x1 + 3x2 + x3 = 0
x2 + x3 = 0
Solution: We row reduce the coefficient matrix to get
[ 1 2 0 ]
[ 1 3 1 ]
[ 0 1 1 ]
R2 − R1 ∼
[ 1 2 0 ]
[ 0 1 1 ]
[ 0 1 1 ]
R1 − 2R2, R3 − R2 ∼
[ 1 0 −2 ]
[ 0 1 1 ]
[ 0 0 0 ]
Rewriting the RREF as a homogeneous system gives x1 − 2x3 = 0 and x2 + x3 = 0.
Since x3 is a free variable, we let x3 = t ∈ R. Then x1 = 2x3 = 2t and x2 = −x3 = −t,
so the solution set has vector equation
~x = [x1, x2, x3] = [2t, −t, t] = t[2, −1, 1], t ∈ R
We see in both of the examples that the solution set is a subspace of R3.
THEOREM 3 The solution set of a homogeneous system of m linear equations in n variables is a
subspace of Rn.
Proof: For 1 ≤ i ≤ m, denote the i-th equation of the homogeneous system by
ai1x1 + · · · + ainxn = 0
By definition, the solution set of the system is a subset of Rn, and clearly ~0 is a solution
of the system. Let ~s = [s1, . . . , sn] and ~t = [t1, . . . , tn] both be solutions of the system. Observe that
ai1(s1 + t1) + · · · + ain(sn + tn) = (ai1s1 + · · · + ainsn) + (ai1t1 + · · · + aintn) = 0 + 0 = 0
and
ai1(cs1) + · · · + ain(csn) = c(ai1s1 + · · · + ainsn) = c(0) = 0
for 1 ≤ i ≤ m. Thus, ~s + ~t and c~s are also solutions for any c ∈ R, so the solution set
is a subspace of Rn by the Subspace Test. ∎
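Theorem 3's closure properties are easy to illustrate numerically. The sketch below (our own illustration) checks them on Example 12's homogeneous system, whose solutions are the multiples of [2, −1, 1]:

```python
def is_solution(x):
    """Does x satisfy the homogeneous system of Example 12?"""
    x1, x2, x3 = x
    return x1 + 2*x2 == 0 and x1 + 3*x2 + x3 == 0 and x2 + x3 == 0

s = [2, -1, 1]
u = [-6, 3, -3]
assert is_solution(s) and is_solution(u)
assert is_solution([a + b for a, b in zip(s, u)])  # closed under addition
assert is_solution([7 * a for a in s])             # closed under scalar multiples
print("the solution set is closed under + and scalar multiplication")
```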
DEFINITION
Solution Space
The solution set of a homogeneous system is called the solution space of the system.
Rank
We see that the number of free variables in a system of linear equations is the number
of columns in the RREF of the coefficient matrix which do not have leading ones.
Thus, it makes sense to count the number of leading ones in the RREF of a matrix.
DEFINITION
Rank
The rank of a matrix A is the number of leading ones in the RREF of the matrix and
is denoted rank A.
EXAMPLE 13 Determine the rank of the following matrices.
(a) A =
[ 1 0 0 1 0 ]
[ 0 0 1 0 −2 ]
[ 0 0 0 0 0 ]
Solution: Since A is in RREF, we see that A has two leading ones, so rank A = 2.
(b) B =
[ 2 1 1 1 −2 ]
[ 1 0 1 1 1 ]
[ 0 −2 2 2 8 ]
Solution: Observe that the matrix B is just the augmented matrix in Example 8.
Hence, the rank of B is 2 since the RREF of B is
R =
[ 1 0 1 1 1 ]
[ 0 1 −1 −1 −4 ]
[ 0 0 0 0 0 ]
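Counting leading ones can be automated by counting pivots during elimination. The following Python sketch (our own illustration, not code from the text) computes the rank of both matrices in Example 13 with exact Fraction arithmetic:

```python
from fractions import Fraction

def rank(M):
    """Count pivots found during forward elimination
    (= the number of leading ones in the RREF)."""
    M = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(M), len(M[0])
    r = 0  # pivots found so far
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

print(rank([[1, 0, 0, 1, 0], [0, 0, 1, 0, -2], [0, 0, 0, 0, 0]]))   # -> 2
print(rank([[2, 1, 1, 1, -2], [1, 0, 1, 1, 1], [0, -2, 2, 2, 8]]))  # -> 2
```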
By definition of rank, we have that the rank of an m × n matrix A must be less than
or equal to the number of rows of A. However, it is also important to observe that
by definition of RREF, we can have at most one leading one per column as well. So,
we also have that the rank of a matrix A must be less than or equal to the number of
columns of A. That is, we have the following theorem.
THEOREM 4 For any m × n matrix A we have
rank A ≤ min(m, n)
We now get the following extremely important theorem which summarizes all of
our results on systems of linear equations in terms of rank. This theorem will be
referenced many times throughout the rest of this book.
THEOREM 5 (System-Rank Theorem)
Let A be the coefficient matrix of a system of m linear equations in n unknowns, [A | ~b].
(1) The rank of A is less than the rank of the augmented matrix [A | ~b] if and only
if the system is inconsistent.
(2) If the system [A | ~b] is consistent, then the system contains (n − rank A) free
variables (parameters).
(3) rank A = m if and only if the system [A | ~b] is consistent for every ~b ∈ Rm.
Proof: (1) The rank of A is less than the rank of the augmented matrix if and only
if there is a row in the RREF of the augmented matrix of the form [ 0 · · · 0 | 1 ].
This is true if and only if the system is inconsistent.
(2) This follows immediately from the definition of rank and free variables.
(3) Assume that the RREF of [A | ~b] is [R | ~c]. If [A | ~b] is inconsistent for some
~b ∈ Rm, then the RREF of the augmented matrix must contain a row of the form
[ 0 · · · 0 | 1 ], as otherwise we could find a solution to [A | ~b] by picking all
free variables to be 0 and setting the variable corresponding to the leading one in the i-th
row equal to ci. Therefore, rank A < m.
On the other hand, if rank A < m, then the RREF R of A has a row of all zeros. Thus,
[R | ~em] is inconsistent as it has a row of the form [ 0 · · · 0 | 1 ]. Since elementary
row operations are reversible, we can apply the reverse of the row operations needed
to row reduce A to R on [R | ~em] to get [A | ~b] for some ~b ∈ Rm. Then this system
is inconsistent since elementary row operations do not change the solution set. Thus,
there exists some ~b ∈ Rm such that [A | ~b] is inconsistent. ∎
Part (2) of Theorem 5 tells us that a consistent system [A | ~b] of m linear equations
in n variables has a unique solution if and only if rank A = n. The next theorem
tells us that if [A | ~b] is consistent with rank A < n, then the set of (n − rank A)
vectors corresponding to the free variables in our method for finding the solution set
is necessarily linearly independent.
THEOREM 6 (Solution Theorem)
Let [A | ~b] be a consistent system of m linear equations in n variables with RREF
[R | ~c]. If rank A = k < n, then a vector equation of the solution set of [A | ~b] has the
form
~x = ~d + t1~v1 + · · · + tn−k~vn−k,  t1, . . . , tn−k ∈ R
where ~d ∈ Rn and {~v1, . . . , ~vn−k} is a linearly independent set in Rn. In particular, the
solution set of [A | ~b] is an (n − k)-flat in Rn.
The idea of the proof is to observe that if xi is a free variable, then it will correspond
to a term in a vector equation of the solution set of the form
ci~vi = ci[v1, . . . , vi−1, 1, 0, . . . , 0]
Thus, the vectors corresponding to free variables with index less than i must have a 0 as their i-th entry, so
{~v1, . . . , ~vn−k} will be linearly independent.
One of the many important uses of the System-Rank Theorem is it allows us to easily
prove some important facts about spanning, linear independence, and bases in Rn.
THEOREM 7 A set of n vectors in Rn is linearly independent if and only if it spans Rn.
REMARK
The method used in the proof of Theorem 7 will be used several more times in this
book.
Section 2.2 Problems
1. For each of the following systems:
(i) Write the coefficient matrix and aug-
mented matrix.
(ii) Find the rank of the coefficient and aug-
mented matrix by row reducing to RREF.
(iii) Determine the solution set for the system.
(iv) Describe the system and its solution set ge-
ometrically.
(a) x1 + 2x2 + 2x3 = 1
x1 + x2 + x3 = 2
(b) 2x1 + 2x2 − 3x3 = 3
x1 + x2 + x3 = 9
(c) 2x1 + x2 − x3 = 4
2x1 − x2 − 3x3 = −1
3x1 + 2x2 − x3 = 8
(d) x1 + 2x2 = 3
x1 + 3x2 − x3 = 4
−3x1 − 8x2 + 2x3 = −11
(e) x1 + 3x2 − x3 = 9
2x1 + x2 − 2x3 = −2
(f) 2x1 − 2x2 + 2x3 + x4 = −4
3x1 − 3x2 + 3x3 + 4x4 = 9
(g) x1 + x2 + x3 + 6x4 = 5
2x2 − 4x3 + 4x4 = −4
2x1 + 4x3 + 5x4 = 4
−x1 + 2x2 − 3x3 + 4x4 = 9
(h) x1 + 2x2 + x3 + 2x4 = −2
2x1 + x2 + 2x3 + x4 = 2
2x2 + x3 + x4 = 2
x1 + x2 + 2x3 = 6
(i) x2 − 2x3 = 3
2x1 + 2x2 + 3x3 = 1
x1 + 2x2 + 4x3 = −1
2x1 + x2 − x3 = 1
2. For each of the following, determine if ~x is in Span B.
(a) B = {[1, 2, −1], [3, −4, 2]},  ~x = [1, 3, 1]
(b) B = {[1, 2, −1], [3, −4, 2]},  ~x = [1, 0, 0]
(c) B = {[3, 1, 2, 3], [2, 1, 3, −4], [−1, 0, −4, 3]},  ~x = [1, −3, −7, −9]
(d) B = {[5, −3, 4], [−6, 3, −4], [6, −6, 8]},  ~x = [1, −7, −9]
3. Which of the following sets are linearly independent?
(a) {[1, 2, −1], [1, 3, 1], [1, −1, 4], [2, 1, 1]}
(b) {[1, 0, −1], [2, 1, −2], [3, 3, −1]}
(c) {[1, 2, 3], [4, 5, 6], [7, 8, 9]}
(d) {[5, −3, 4], [−6, 3, −4], [6, −6, 8]}
(e) {[3, 1, 2], [2, 1, 3], [1, 3, 2]}
(f) {[3, 1, 3, 5], [6, 4, −3, 4], [0, 2, −1, 8]}
4. Which of the following sets is a basis for R3?
(a) {[5, −3, 7], [−4, 1, 6]}
(b) {[1, −1, 3], [1, 0, −1], [6, −2, 5], [2, −1, 4]}
(c) {[1, 3, −2], [2, −5, 3], [4, 1, 0]}
(d) {[1, 1, 1], [2, 1, −1], [1, 0, −3]}
(e) {[1, 0, −4], [3, 3, −5], [3, 6, 2]}
(f) {[3, 2, −3], [4, 4, 0], [2, 1, −2]}
5. Prove Theorem 7.
6. Let a, b, c, d ∈ R. Prove that the set {[a, c], [b, d]} is a basis for R2 if and only if ad − bc ≠ 0.
7. Consider three planes P1, P2, and P3, with re-
spective equations
a11x1 + a12x2 + a13x3 = b1,
a21x1 + a22x2 + a23x3 = b2, and
a31x1 + a32x2 + a33x3 = b3
The intersections of these planes are illustrated below. Assume that the lines of intersection L1, L2, and L3 are parallel.
[Figure: planes P1, P2, P3 intersecting pairwise in the parallel lines L1, L2, L3.]
What is the rank of
A =
[ a11 a12 a13 ]
[ a21 a22 a23 ]
[ a31 a32 a33 ]
?
8. Prove that if two planes in R3 intersect, then
they either intersect in a line or a plane.
9. Determine whether each statement is true or
false. Justify your answer.
(a) The solution set of a system of linear equa-
tions is a subspace of Rn.
(b) A system of 3 equations in 5 variables has
infinitely many solutions.
(c) A system of 5 equations in 3 variables can-
not have infinitely many solutions.
(d) A homogeneous system of 3 equations in
5 variables cannot have a unique solution.
(e) If A is the coefficient matrix of a system
of 3 linear equations in 4 variables where
rank A = 2, then the solution set is a plane
in R4.
(f) If A is the coefficient matrix of a system
of 3 linear equations in 5 variables where
rank A = 3, then the solution set is a plane
in R5.
(g) If the solution space of a homogeneous
system of 5 linear equations in 3 variables
is a line in R3, then the rank of the coeffi-
cient matrix is 2.
(h) If A is the coefficient matrix of a system
of m linear equations in n variables where
m < n, then rank A = m.
(i) If a system of m linear equations in n variables [A | ~b] is consistent for every ~b ∈ Rm with a unique solution, then m = n.
10. Let B = {~v1, . . . , ~vk} be a set of vectors in Rn. Prove that B is linearly independent if and only if the rank of the matrix [~v1 · · · ~vk] is k.
11. Prove that:
(a) a set of k vectors {~v1, . . . , ~vk} in Rn spans Rn if and only if the rank of the coefficient matrix of the system c1~v1 + · · · + ck~vk = ~b is n.
(b) if Span{~v1, . . . ,~vk} = Rn, then k ≥ n.
12. Prove that:
(a) if S is a non-trivial subspace of Rn,
Span{~v1, . . . ,~vℓ} = S, and {~u1, . . . , ~uk} is
a linearly independent set of vectors in S,
then k ≤ ℓ.
(b) if {~v1, . . . , ~vℓ} and {~u1, . . . , ~uk} are bases of
a non-trivial subspace S, then k = ℓ.
13. Consider a system of m linear equations in n
variables with coefficient matrix A and right-
hand side ~b. Prove that if ~y is a solution of [A | ~0] and ~z is a solution of [A | ~b], then
~x = ~z + c~y is a solution of [A | ~b] for each c ∈ R.
14. Steady flow through a network of conductors can be described by a system of linear equations.
Such networks can be used to describe systems
such as traffic/fluid flow, electrical current,
metabolism, and warehouse supply chains.
[Figure: a flow network with four nodes, edge flows f1, f2, f3, f4, f5, and external flows of 50, 50, 60, and 40 at the nodes.]
To characterize the possible flows in the net-
work shown above, we observe that, in steady
state, the flow into each corner node must be
equal to the flow out. For instance, at the top
node, this balance constraint takes the form
(flow in) = f1 + f2 = 50 = (flow out).
(a) Create a system of equations by writing
the constraint equation for all four nodes.
(b) Solve this system to determine how the
flows f1, . . . , f5 are related in steady state.
(c) How do the constraints you found in part (a) change if the flows cannot take negative values (i.e., fi ≥ 0 for i = 1, 2, . . . , 5)?
Chapter 3
Matrices and Linear Mappings
3.1 Operations on Matrices
Addition and Scalar Multiplication of Matrices
In the last chapter we used matrix notation to help us solve systems of linear equa-
tions. However, matrices can be used in other settings as well. We will now see that
we can treat matrices just like vectors in Rn.
DEFINITION
Matrix
Mm×n(R)
An m × n matrix A is a rectangular array with m rows and n columns. We denote the entry in the i-th row and j-th column by aij. That is,

A = [ a11  a12  · · ·  a1j  · · ·  a1n ]
    [ a21  a22  · · ·  a2j  · · ·  a2n ]
    [  ⋮    ⋮           ⋮           ⋮  ]
    [ ai1  ai2  · · ·  aij  · · ·  ain ]
    [  ⋮    ⋮           ⋮           ⋮  ]
    [ am1  am2  · · ·  amj  · · ·  amn ]

Two m × n matrices A and B are equal if aij = bij for all 1 ≤ i ≤ m, 1 ≤ j ≤ n.
The set of all m × n matrices with real entries is denoted by Mm×n(R).
Addition and scalar multiplication of matrices are also defined to match what we did with vectors in Rn.
DEFINITION
Addition and Scalar
Multiplication
Let A, B ∈ Mm×n(R) and c ∈ R. We define A + B and cA by
(A + B)i j = (A)i j + (B)i j
(cA)i j = c(A)i j
REMARK
1. Note that addition is only defined if both matrices have the same size.
2. As with vectors in Rn, a sum of scalar multiples of matrices is called a linear
combination.
EXAMPLE 1
If A = [1 2; 3 −3], B = [2 5; −1 3], and C = [1 −2; −√2 √3], then

A + B = [1 2; 3 −3] + [2 5; −1 3] = [3 7; 2 0]

√2 C = √2 [1 −2; −√2 √3] = [√2 −2√2; −2 √6]

3A − 2B = [3 6; 9 −9] + [−4 −10; 2 −6] = [−1 −4; 11 −15]
Not surprisingly, addition and scalar multiplication of matrices satisfy the same ten
properties as vectors in Rn. The proof of each property is essentially the same as in
Rn, and hence is left to the reader.
THEOREM 1 If A, B,C ∈ Mm×n(R) and c, d ∈ R, then
V1 A + B ∈ Mm×n(R)
V2 (A + B) +C = A + (B + C)
V3 A + B = B + A
V4 There exists a matrix Om,n ∈ Mm×n(R), such that A + Om,n = A for all A.
V5 For every A ∈ Mm×n(R) there exists (−A) ∈ Mm×n(R) such that A+ (−A) = Om,n
V6 cA ∈ Mm×n(R)
V7 c(dA) = (cd)A
V8 (c + d)A = cA + dA
V9 c(A + B) = cA + cB
V10 1A = A
The matrix Om,n is the m×n matrix with all entries zero and is called the zero matrix.
The Transpose of a Matrix
We will soon see that we sometimes wish to treat the rows of an m × n matrix as
vectors in Rn. To preserve our convention of writing vectors in Rn as column vectors,
we invent some notation for turning rows into columns and vice versa.
DEFINITION
Transpose
The transpose of an m × n matrix A is the n × m matrix A^T whose ij-th entry is the ji-th entry of A. That is,
(A^T)ij = (A)ji
EXAMPLE 2
Determine the transpose of A = [1 2 5], B = [−3; 0; 1], and C = [3 −2; 4 1; 7 −6].
Solution: We have
A^T = [1 2 5]^T = [1; 2; 5]
B^T = [−3; 0; 1]^T = [−3 0 1]
C^T = [3 −2; 4 1; 7 −6]^T = [3 4 7; −2 1 −6]
THEOREM 2 If A, B ∈ Mm×n(R) and c ∈ R, then
(1) (A^T)^T = A
(2) (A + B)^T = A^T + B^T
(3) (cA)^T = cA^T
Proof: (1) By definition of the transpose we have ((A^T)^T)ij = (A^T)ji = (A)ij.
(2) ((A + B)^T)ij = (A + B)ji = (A)ji + (B)ji = (A^T)ij + (B^T)ij = (A^T + B^T)ij.
(3) ((cA)^T)ij = (cA)ji = c(A)ji = c(A^T)ij = (cA^T)ij. □
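Properties (1)-(3) are easy to check on a concrete matrix. A small NumPy sketch (my own, not from the book):

```python
import numpy as np

C = np.array([[3, -2], [4, 1], [7, -6]])
D = np.array([[1, 0], [0, 2], [5, 5]])

# .T swaps rows and columns, so (C^T)^T recovers C
print(C.T)                                # [[ 3  4  7] [-2  1 -6]]
print(((C.T).T == C).all())               # property (1): True
print(((C + D).T == C.T + D.T).all())     # property (2): True
print(((2 * C).T == 2 * C.T).all())       # property (3): True
```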
As mentioned above, one use of the transpose is so that we have notation for the rows
of a matrix. We demonstrate this in the next example.
EXAMPLE 3
If ~a1 = [1; 3; 1] and ~a2 = [2; −1; 4], then the matrix A = [~a1^T; ~a2^T] = [1 3 1; 2 −1 4].
EXERCISE 1
If [~a1^T; ~a2^T; ~a3^T] = [1 3 1; −4 0 2; 5 9 −3], then what are ~a1, ~a2, and ~a3?
Matrix-Vector Multiplication
We saw in the last chapter that we had two ways of viewing the coefficient matrix A of a system of linear equations [A | ~b]: the coefficients of the i-th equation are the entries in the i-th row of A, or the coefficients of xj are the entries in the j-th column of A. We will now use both of these interpretations to derive equivalent definitions of matrix-vector multiplication so that we can represent the system [A | ~b] by the equation A~x = ~b where ~x = [x1; . . . ; xn]. Note that both of the definitions are going to be used frequently throughout this book.
Coefficients are the Rows of A
Let A be an m × n matrix. Using the transpose, we can write A = [~a1^T; . . . ; ~am^T] where ~ai ∈ Rn. Observe that the i-th equation of the system [A | ~b] can be written ~ai · ~x = bi since

[ai1; . . . ; ain] · [x1; . . . ; xn] = ai1 x1 + · · · + ain xn

Therefore, the system [A | ~b] can be written as

[~a1 · ~x; . . . ; ~am · ~x] = [b1; . . . ; bm]

To write the left-hand side in the form A~x, we make the following definition.
DEFINITION
Matrix-Vector Multiplication
Let A be an m × n matrix whose rows are denoted ~ai^T for 1 ≤ i ≤ m. For any ~x ∈ Rn, we define A~x by
A~x = [~a1 · ~x; . . . ; ~am · ~x]
REMARK
If A is an m × n matrix, then A~x is only defined if ~x ∈ Rn. Moreover, if ~x ∈ Rn, then
A~x ∈ Rm.
EXAMPLE 4
Calculate [1 3; 2 −4; 9 −1][x1; x2].
Solution: Let [1 3; 2 −4; 9 −1][x1; x2] = [b1; b2; b3]. We have
b1 = [1; 3] · [x1; x2] = x1 + 3x2
b2 = [2; −4] · [x1; x2] = 2x1 − 4x2
b3 = [9; −1] · [x1; x2] = 9x1 − x2
Hence,
[1 3; 2 −4; 9 −1][x1; x2] = [x1 + 3x2; 2x1 − 4x2; 9x1 − x2]
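The row interpretation translates directly into code: each entry of A~x is one dot product. A sketch using NumPy (the helper name `matvec_rows` is my own):

```python
import numpy as np

def matvec_rows(A, x):
    # i-th entry of Ax is the dot product of the i-th row of A with x
    return np.array([np.dot(row, x) for row in A])

A = np.array([[1, 3], [2, -4], [9, -1]])
x = np.array([2, 1])
print(matvec_rows(A, x))  # [ 5  0 17], i.e. (x1+3x2, 2x1-4x2, 9x1-x2) at (2,1)
```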
EXAMPLE 5
Let A = [3 4 −5; 1 0 2]. Calculate A~x where:
(a) ~x = [2; −1; 6]  (b) ~x = [1; 0; 0]  (c) ~x = [0; 0; 1]
Solution: We have
(a) A~x = [3 4 −5; 1 0 2][2; −1; 6] = [3(2) + 4(−1) + (−5)(6); 1(2) + 0(−1) + (2)(6)] = [−28; 14]
(b) A~x = [3 4 −5; 1 0 2][1; 0; 0] = [3(1) + 4(0) + (−5)(0); 1(1) + 0(0) + (2)(0)] = [3; 1]
(c) A~x = [3 4 −5; 1 0 2][0; 0; 1] = [3(0) + 4(0) + (−5)(1); 1(0) + 0(0) + (2)(1)] = [−5; 2]
EXERCISE 2
Calculate the following matrix-vector products.
(a) [1 3 2; −1 4 5][2; −1; 6]
(b) [6 −1 1][2; 3; 1]
(c) [1 −4; 3 1; 1 −2][0; 1]
EXERCISE 3 Write the following system of linear equations in the form A~x = ~b.
x1 + x2 = 3
2x1 = 15
−2x1 − 3x2 = −5
Coefficients are the Columns of A
Observe that by using operations on vectors in Rn, the system of linear equations

a11 x1 + · · · + a1n xn = b1
          ⋮
am1 x1 + · · · + amn xn = bm

can be written as

x1 [a11; . . . ; am1] + · · · + xn [a1n; . . . ; amn] = [b1; . . . ; bm]

To write the left-hand side in the form A~x, we make the following definition.
DEFINITION
Matrix-Vector Multiplication
Let A = [~a1 · · · ~an] be an m × n matrix. For any ~x = [x1; . . . ; xn] ∈ Rn, we define A~x by
A~x = x1~a1 + · · · + xn~an
REMARKS
1. As above, if A is an m×n matrix, then A~x only makes sense if ~x ∈ Rn. Moreover,
if ~x ∈ Rn, then A~x ∈ Rm.
2. Observe that by definition for any ~x ∈ Rn, A~x is a linear combination of the
columns of A.
EXAMPLE 6
Calculate [1 3; 2 −4; 9 −1][x1; x2].
Solution: We have
[1 3; 2 −4; 9 −1][x1; x2] = x1[1; 2; 9] + x2[3; −4; −1] = [x1 + 3x2; 2x1 − 4x2; 9x1 − x2]
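The column interpretation can be coded just as directly: A~x is built as a linear combination of the columns of A. A sketch (the helper name `matvec_cols` is mine, not the book's):

```python
import numpy as np

def matvec_cols(A, x):
    # Ax is the linear combination x1*a1 + ... + xn*an of the columns of A
    m, n = A.shape
    b = np.zeros(m)
    for j in range(n):
        b += x[j] * A[:, j]
    return b

A = np.array([[1, 3], [2, -4], [9, -1]])
x = np.array([2, 1])
print(matvec_cols(A, x))  # [ 5.  0. 17.], same as the row interpretation
```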
EXAMPLE 7
Let A = [3 4 −5; 1 0 2]. Calculate A~x where:
(a) ~x = [2; −1; 6]  (b) ~x = [0; 1; 0]
Solution: We have
(a) [3 4 −5; 1 0 2][2; −1; 6] = 2[3; 1] + (−1)[4; 0] + 6[−5; 2] = [−28; 14]
(b) [3 4 −5; 1 0 2][0; 1; 0] = 0[3; 1] + 1[4; 0] + 0[−5; 2] = [4; 0]
EXERCISE 4
Calculate the following matrix-vector products by expanding them as a linear combination of the columns of the matrix.
(a) [1 3 2; −1 4 5][2; −1; 6]
(b) [6 −1 1][2; 3; 1]
(c) [1 −4; 3 1; 1 −2][0; 1]
Observe that this exercise contained the same problems as Exercise 2. This was to demonstrate the fact that both definitions of matrix-vector multiplication indeed give the same answer. At this point you may wonder why we have two different definitions for the same thing. The reason is that they have different uses. We use the first method for computing matrix-vector products or when we want to work with the rows of a matrix. When we are working with the columns of a matrix or with linear combinations of vectors, we use the second method.
The following is a simple, but useful theorem.
THEOREM 3
If ~ei is the i-th standard basis vector and A = [~a1 · · · ~an], then A~ei = ~ai.
The following formula will be extremely important later in the text.
DEFINITION
~x^T~y
If ~x, ~y ∈ Rn, then
~x^T~y = ~x · ~y
EXAMPLE 8
If ~x = [3; 1; 2; −1] and ~y = [−2; 0; 4; 5], then
~x^T~y = ~x · ~y = 3(−2) + 1(0) + 2(4) + (−1)(5) = −3
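Numerically, ~x^T~y and the dot product are the same computation. A one-line check in NumPy (code is mine, for illustration only):

```python
import numpy as np

x = np.array([3, 1, 2, -1])
y = np.array([-2, 0, 4, 5])

# x^T y, the product of a row vector with a column vector, is the dot product
print(x @ y)         # -3
print(np.dot(x, y))  # -3
```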
REMARK
Because of this property, we can now write our first definition of matrix-vector multiplication as
A~x = [~a1^T; . . . ; ~am^T]~x = [~a1^T~x; . . . ; ~am^T~x]
The properties of matrix-vector multiplication and of ~xT~y are the same as the proper-
ties of matrix-matrix multiplication given in Theorem 4 below.
In most cases, we will now represent a system of linear equations [A | ~b] as A~x = ~b.
This will allow us to use properties of matrix-vector multiplication when dealing with
systems of linear equations.
EXAMPLE 9 Assume that ~v is a solution of the system of linear equations A~x = ~b and that ~z is a
vector such that A~z = ~0. Prove that ~x = ~v + t~z is also a solution of the system for
every t ∈ R.
Solution: For any t ∈ R we have
A(~v + t~z) = A~v + A(t~z) by Theorem 4 (1)
= A~v + tA~z by Theorem 4 (3)
= ~b + t~0 since A~v = ~b and A~z = ~0
= ~b
Matrix Multiplication
We now use matrix-vector multiplication to define the product of two appropriately
sized matrices.
DEFINITION
Matrix Multiplication
For an m × n matrix A and an n × p matrix B = [~b1 · · · ~bp] we define AB to be the m × p matrix
AB = A[~b1 · · · ~bp] = [A~b1 · · · A~bp]
REMARKS
1. Notice that the product AB is only defined if the number of columns of A is
equal to the number of rows of B.
2. There are a variety of ways to justify/motivate this definition of matrix mul-
tiplication. Many of these ways are quite interesting. However, we will wait
until Section 3.4 to justify the definition. There we will show how matrix mul-
tiplication corresponds to a composition of functions.
We can compute the entries of AB easily using our first interpretation of matrix-vector multiplication. In particular, if ~ai^T are the rows of A, then the ij-th entry of AB is the i-th entry of A~bj. That is,
(AB)ij = ~ai · ~bj = ~ai^T~bj = ai1 b1j + ai2 b2j + · · · + ain bnj = Σ_{k=1}^{n} (A)ik (B)kj
EXAMPLE 10
[1 3 2; −1 0 −3][−2 4; −4 5; 0 0]
= [1(−2) + 3(−4) + 2(0), 1(4) + 3(5) + 2(0); (−1)(−2) + 0(−4) + (−3)(0), (−1)(4) + 0(5) + (−3)(0)]
= [−14 19; 2 −4]

[−2 4; −4 5; 0 0][1 3 2; −1 0 −3]
= [(−2)(1) + 4(−1), (−2)(3) + 4(0), (−2)(2) + 4(−3); (−4)(1) + 5(−1), (−4)(3) + 5(0), (−4)(2) + 5(−3); 0(1) + 0(−1), 0(3) + 0(0), 0(2) + 0(−3)]
= [−6 −6 −16; −9 −12 −23; 0 0 0]
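The entry formula (AB)ij = Σ_k (A)ik (B)kj can be implemented directly. A sketch (the helper name `matmul` is my own; NumPy's built-in `@` does the same thing):

```python
import numpy as np

def matmul(A, B):
    # (AB)_ij = sum over k of (A)_ik (B)_kj, the entry formula above
    m, n = A.shape
    n2, p = B.shape
    assert n == n2, "columns of A must equal rows of B"
    C = np.zeros((m, p), dtype=A.dtype)
    for i in range(m):
        for j in range(p):
            C[i, j] = sum(A[i, k] * B[k, j] for k in range(n))
    return C

A = np.array([[1, 3, 2], [-1, 0, -3]])
B = np.array([[-2, 4], [-4, 5], [0, 0]])
print(matmul(A, B))  # [[-14  19] [  2  -4]], as in Example 10
```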
EXERCISE 5
Calculate the following matrix products, or state why they don't exist.
(a) [1 3; 2 −1; 0 1][2 3 0 1; 0 1 −2 1]
(b) [2 3 0 1; 0 1 −2 1][1 3; 2 −1; 0 1]
(c) [1 2 3][−1; 2; 1]
(d) [−1; 2; 1][1 2 3]
THEOREM 4 If A, B, and C are matrices of the correct size so that the required products are
defined, and t ∈ R, then
(1) A(B +C) = AB + AC
(2) (A + B)C = AC + BC
(3) t(AB) = (tA)B = A(tB)
(4) A(BC) = (AB)C
(5) (AB)^T = B^T A^T
Proof: We prove (1) and (4) and leave the others as exercises. For (1) we have

(A(B + C))ij = Σ_{k=1}^{n} (A)ik (B + C)kj
             = Σ_{k=1}^{n} (A)ik ((B)kj + (C)kj)
             = Σ_{k=1}^{n} (A)ik (B)kj + Σ_{k=1}^{n} (A)ik (C)kj
             = (AB)ij + (AC)ij = (AB + AC)ij

For (4) we have

(A(BC))ij = Σ_{k=1}^{n} (A)ik (BC)kj
          = Σ_{k=1}^{n} (A)ik Σ_{ℓ=1}^{p} (B)kℓ (C)ℓj
          = Σ_{k=1}^{n} Σ_{ℓ=1}^{p} (A)ik (B)kℓ (C)ℓj
          = Σ_{ℓ=1}^{p} Σ_{k=1}^{n} (A)ik (B)kℓ (C)ℓj
          = Σ_{ℓ=1}^{p} ( Σ_{k=1}^{n} (A)ik (B)kℓ ) (C)ℓj
          = Σ_{ℓ=1}^{p} (AB)iℓ (C)ℓj = ((AB)C)ij □
Observe that Example 10 shows that in general AB ≠ BA. When multiplying a matrix equation by a matrix we have to make sure that we keep this in mind. That is, if A = B and we multiply both sides by a matrix D, then we get either DA = DB or AD = BD depending on whether we multiply by D on the left or on the right. Similarly, we do not have a cancellation law for matrix multiplication; if AC = BC, then we cannot guarantee that A = B. This is demonstrated in the next example.
EXAMPLE 11
We have
[1 3; 0 0][0 1; 0 0] = [0 1; 0 0] = [1 15; 0 −5][0 1; 0 0]
but
[1 3; 0 0] ≠ [1 15; 0 −5]
However, in many cases we will want to prove that two matrices are equal. To do
this, we often use the following theorem.
THEOREM 5 (Matrices Equal Theorem)
If A and B are m × n matrices such that A~x = B~x for every ~x ∈ Rn, then A = B.
Proof: Let A = [~a1 · · · ~an] and B = [~b1 · · · ~bn]. Let ~ei denote the i-th standard basis vector. Then, since A~x = B~x for all ~x ∈ Rn, we have that
~ai = A~ei = B~ei = ~bi
for 1 ≤ i ≤ n. Hence A = B. □
Identity Matrix
When considering multiplication, we are often interested in finding the multiplicative identity: the element I such that AI = A = IA for every element A. We first observe that if A is an m × n matrix and AB = A, then B must be an n × n matrix, while if BA = A, then B must be m × m. Therefore, the set Mm×n(R) with m ≠ n cannot have a multiplicative identity. However, we can have a multiplicative identity for Mn×n(R).
To find the multiplicative identity for Mn×n(R) we want to find an n × n matrix B = [~b1 · · · ~bn] such that
AB = A = BA
for every n × n matrix A = [~a1 · · · ~an]. Observe that if A = AB, then we have
[~a1 · · · ~an] = A[~b1 · · · ~bn] = [A~b1 · · · A~bn]
Thus, ~ai = A~bi. However, we know that ~ai = A~ei, where ~ei is the i-th standard basis vector, for all 1 ≤ i ≤ n. Thus, we should take I = [~e1 · · · ~en].
THEOREM 6
If I = [~e1 · · · ~en], then for any n × n matrix A we have AI = A = IA.
EXERCISE 6 Prove Theorem 6.
Theorem 6 shows that I = [~e1 · · · ~en] is a multiplicative identity. We now prove that it is the only multiplicative identity for Mn×n(R).
THEOREM 7 The multiplicative identity for Mn×n(R) is unique.
Proof: Suppose I1 and I2 are both multiplicative identities for Mn×n(R). Then, since I2 is an identity, I1 = I1I2, and since I1 is an identity, I1I2 = I2. Hence I1 = I2, so the multiplicative identity for Mn×n(R) is unique. □
We make the following definition.
DEFINITION
Identity Matrix
The n × n identity matrix, denoted by I or In, is the matrix such that (I)ii = 1 for 1 ≤ i ≤ n and (I)ij = 0 whenever i ≠ j. In particular,
I = [~e1 · · · ~en]
EXAMPLE 12
The 2 × 2 identity matrix is I2 = [~e1 ~e2] = [1 0; 0 1] and the 3 × 3 identity matrix is I3 = [~e1 ~e2 ~e3] = [1 0 0; 0 1 0; 0 0 1].
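The construction of I column by column from the standard basis vectors can be mirrored in code. A sketch (my own; NumPy's `np.eye` builds the same matrix directly):

```python
import numpy as np

n = 3
# Build I column by column from the standard basis vectors e1, ..., en
cols = []
for i in range(n):
    e = np.zeros(n)
    e[i] = 1.0
    cols.append(e)
I = np.column_stack(cols)  # equals np.eye(3)

A = np.array([[1.0, 2.0, -3.0], [0.0, 3.0, 1.0], [1.0, 0.0, 2.0]])
print((A @ I == A).all() and (I @ A == A).all())  # True: AI = A = IA
```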
EXERCISE 7 Let A be an m×n matrix such that the system of linear equations A~x = ~ei is consistent
for all 1 ≤ i ≤ m.
(a) Prove that the system of equations A~x = ~y is consistent for all ~y ∈ Rm.
(b) What can you conclude about the rank of A from the result in part (a)?
(c) Prove that there exists a matrix B such that AB = Im.
(d) Let A = [1 2 1; 0 1 1]. Use your method in part (c) to find a matrix B such that AB = I2.
Block Multiplication
So far in this section, we have often made use of the fact that we could represent
a matrix as a list of columns (n × 1 matrices) or as a list of rows (1 × n matrices).
Sometimes it can also be very useful to represent a matrix as a group of submatrices
called blocks.
DEFINITION
Block Matrix
If A is an m × n matrix, then we can write A as the k × ℓ block matrix
A = [A11 · · · A1ℓ; ⋮ ⋱ ⋮; Ak1 · · · Akℓ]
where Aij is a block such that all blocks in the i-th row have the same number of rows and all blocks in the j-th column have the same number of columns.
EXAMPLE 13
If A = [1 −1 3 4; 0 0 0 0; 0 3 1 2], then we can write A as
A = [A11 A12; A21 A22]
where A11 = [1 −1; 0 0], A12 = [3 4; 0 0], A21 = [0 3], and A22 = [1 2]. Alternately, we could write A as
A = [A11 A12; A21 A22; A31 A32]
where A11 = [1 −1 3], A12 = [4], A21 = [0 0 0], A22 = [0], A31 = [0 3 1], and A32 = [2].
Let A be an m × n matrix and let B be an n × p matrix. If we write A as a k × ℓ block matrix and B as an ℓ × s block matrix, then we can calculate AB by multiplying the block matrices together using the definition of matrix multiplication. For example, if
A = [A11 A12; A21 A22; A31 A32] and B = [B11 B12; B21 B22], then
AB = [A11B11 + A12B21, A11B12 + A12B22; A21B11 + A22B21, A21B12 + A22B22; A31B11 + A32B21, A31B12 + A32B22]
assuming that we have divided A and B into blocks such that all of the required products of the blocks are defined.
EXAMPLE 14
Let A = [1 2 −3; 0 3 1; 1 0 2] and B = [2 3 1; 0 1 −2; 0 0 3]. Let A11 = [1], A12 = [2 −3], A21 = [0; 1], and A22 = [3 1; 0 2] so that A = [A11 A12; A21 A22], and let B11 = [2], B12 = [3 1], B21 = [0; 0], and B22 = [1 −2; 0 3], so that B = [B11 B12; B21 B22]. Then we have
AB = [A11 A12; A21 A22][B11 B12; B21 B22] = [A11B11 + A12B21, A11B12 + A12B22; A21B11 + A22B21, A21B12 + A22B22]
Computing each entry we get
A11B11 + A12B21 = [1][2] + [2 −3][0; 0] = [2] + [0] = [2]
A11B12 + A12B22 = [1][3 1] + [2 −3][1 −2; 0 3] = [3 1] + [2 −13] = [5 −12]
A21B11 + A22B21 = [0; 1][2] + [3 1; 0 2][0; 0] = [0; 2] + [0; 0] = [0; 2]
A21B12 + A22B22 = [0; 1][3 1] + [3 1; 0 2][1 −2; 0 3] = [0 0; 3 1] + [3 −3; 0 6] = [3 −3; 3 7]
Hence,
AB = [2 5 −12; 0 3 −3; 2 3 7]
REMARK
If you multiply AB using the definition of matrix-matrix multiplication and compare
this to the calculations above it becomes clear why block multiplication works.
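The block computation of Example 14 can be replayed with array slicing. A NumPy sketch (the partition indices follow the example; the code itself is my own):

```python
import numpy as np

A = np.array([[1, 2, -3], [0, 3, 1], [1, 0, 2]])
B = np.array([[2, 3, 1], [0, 1, -2], [0, 0, 3]])

# Partition as in Example 14: a 1x1 top-left block and the rest
A11, A12 = A[:1, :1], A[:1, 1:]
A21, A22 = A[1:, :1], A[1:, 1:]
B11, B12 = B[:1, :1], B[:1, 1:]
B21, B22 = B[1:, :1], B[1:, 1:]

top = np.hstack([A11 @ B11 + A12 @ B21, A11 @ B12 + A12 @ B22])
bot = np.hstack([A21 @ B11 + A22 @ B21, A21 @ B12 + A22 @ B22])
AB = np.vstack([top, bot])
print(AB)  # [[ 2  5 -12] [ 0  3  -3] [ 2  3   7]], the same as A @ B
```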
Section 3.1 Problems
1. Compute the following linear combinations.
(a) 2[2 1; −3 4] + [3 −1; 0 2]
(b) (1/3)[1 3; 1 −6] − (1/4)[4 8; 1/2 3]
(c) (1/√2)[2 4; 0 2] + (1/√3)[6 0; 0 6]
(d) −2[3 3; 0 −1] − 3[−2 −2; 0 2]
(e) x[1 0; 0 0] + y[0 1; 0 0] + z[0 0; 1 0]
(f) 0[−4 3; 3 5] + 0[9 −8; 1 5]
2. Compute the following or state why they don't exist.
(a) [3 1; 2 1][2 −2; 0 1]
(b) [3 2 −1; 1 4 3][1 2 2; −3 5 0]
(c) [1 2 3; 4 5 6; 7 8 9][1; 0; 0]
(d) [1 2 3; 4 5 6; 7 8 9][0 0; 1 0; 0 1]
(e) [1 5; 3 2; 3 −2][3 1; 2 1]^T
(f) [1 4 −1; 2 1 0; 3 1 1][4 1 −1; 0 0 0; 3 1 1]
(g) [2; 3; 5][1; −1; 1]
(h) [−1 2 0; 2 1 1][5 −3; 2 7; −1 −3]
(i) [2; 3; 5]^T[1; −1; 1]
(j) [5 −3; 2 7; −1 −3][−1 2 0; 2 1 1]
(k) [2; 3; 5][1; −1; 1]^T
(l) [2; 3; 5]^T[x1; x2; x3]
3. We define the trace of an n × n matrix A by tr A = Σ_{i=1}^{n} aii. Let A and B be n × n matrices.
(a) Prove that tr(A + B) = tr A + tr B.
(b) Prove that tr(A^T B) = Σ_{i=1}^{n} Σ_{j=1}^{n} (A)ji (B)ji.
4. Find a matrix A and vectors ~x and ~b such that
A~x = ~b represents the system
3x1 + 2x2 − x3 = 4
2x1 − x2 + 5x3 = 5
5. Find all solutions to the equation
c1[3 1; 3 1] + c2[2 1; 2 −1] + c3[1 1; 1 −2] = [0 0; 0 0]
6. Find t1, t2, and t3 such that
t1[1 2; −2 −1] + t2[1 1; 0 −1] + t3[2 3; 1 1] = [−1 −4; 3 −2]
7. Determine whether each statement is true or
false. Justify your answer.
(a) If A ∈ M3×2(R) and A~x is defined, then
~x ∈ R3.
(b) If A ∈ M2×4(R) and A~x is defined, then
A~x ∈ R4.
(c) If A ∈ M3×3(R), then there is no matrix B
such that AB = BA.
(d) The only 2×2 matrix A such that A2 = O2,2
is O2,2. (NOTE: As usual, by A2 we mean
AA.)
(e) If A and B are 2 × 2 matrices such that
AB = O2,2, then either A is the zero ma-
trix or B is the zero matrix.
8. Prove that if A and B are m × n matrices, then
A + B = B + A.
9. Let A be an m × n matrix. Prove that ImA = A
and AIn = A.
10. Prove Theorem 3.
11. Let A be an m× n matrix with one row of zeros
and B be an n × p matrix.
(a) Use block multiplication to prove that AB
also has at least one row of zeros.
(b) Give an example where AB has more than
one row of zeros.
12. Find all 2 × 2 matrices A such that A2 = I.
13. Consider the populations of individuals in two adjacent regions X and Y. Suppose that changes in these populations only occur via transfer (migration) of individuals from one region to the other. Let xt and yt denote the populations of regions X and Y, respectively, in year t. Suppose that each year, 1/10 of the population of X migrates to region Y, while 1/5 of the population of Y migrates to X. We can then express the year-to-year change in the populations as:
xt+1 = (9/10)xt + (1/5)yt
yt+1 = (1/10)xt + (4/5)yt
(a) Express the year-to-year change in the population as a matrix product
[xt+1; yt+1] = A[xt; yt]
(b) Suppose that in year t = 0 the populations
are given by x0 = 1000, y0 = 2000. Use
your formula from part (a) to calculate the
populations in years t = 1, 2 and 3. Con-
firm that in each year, the total population
(xt + yt) is 3000. Why should this be ex-
pected?
(c) If the initial populations are x0 = 2000,
y0 = 1000, what are the populations in
year t = 1? What about year t = 2? What
about year t = 186? Explain your reason-
ing.
14. Suppose that the only competing suppliers of
internet services are NewServices and Real-
West. At present, they each have 50% of
the market share in their community. How-
ever, NewServices is upgrading their connec-
tion speeds but at a higher cost. Because of this,
each month 30% of RealWest’s customers will
switch to NewServices, while 10% of NewSer-
vices customers will switch to RealWest.
(a) Express the month-to-month change in the percent of market share as a matrix product.
(b) Calculate the market share of each com-
pany for the first two months.
(c) Use a computer to calculate the market
share of each company after six months.
If this continues for a long time, how big
will NewServices share become?
15. A car rental company serving one city has three
locations: the airport, the train station, and the
city centre. Of the cars rented at the airport,
8/10 are returned to the airport, 1/10 are left
at the train station, and 1/10 are left at the city
centre. Of cars rented at the train station, 3/10
are left at the airport, 6/10 are returned to the
train station, and 1/10 are left at the city cen-
tre. Of cars rented at the city centre, 3/10 go
to the airport, 1/10 go to the train station, and
6/10 are returned to the city centre. Express
the change in the location of the cars as a ma-
trix product.
16. The Fibonacci sequence is defined as
1, 1, 2, 3, 5, 8, 13, . . .
After the first two values, each entry in the sequence is the sum of the preceding two. Denoting the n-th entry in the sequence by an, we have
an+2 = an + an+1 for n = 0, 1, 2, . . .
This is called a two-step recursion, because each term depends on the two preceding terms. The sequence can also be constructed as a pair of one-step recursions as follows:
xn+1 = yn
yn+1 = xn + yn
(a) Verify that using this rule and taking x0 = 1, y0 = 1, the xn sequence is the Fibonacci sequence, while the yn sequence is a shifted version of the Fibonacci sequence.
(b) An advantage of the one-step recursion form of the sequence definition is that it can be written as a matrix product. Find the matrix A so that the recursion takes the form
[xn+1; yn+1] = A[xn; yn]
This matrix formulation will be used to find an explicit expression for the n-th term in this sequence in Problem 11 in Section 6.4.
3.2 Linear Mappings
Recall that for sets A and B, a function f : A → B is a rule that associates with each
element a ∈ A one element f (a) ∈ B called the image of a under f . The set A is
called the domain of f and the set B is called the codomain of f .
Matrix Mappings
Observe that if A is an m × n matrix, then we can define a function f : Rn → Rm by
f (~x) = A~x called a matrix mapping.
REMARK
As we indicated previously, if f : Rn → Rm, then we should write
f([x1; . . . ; xn]) = [y1; . . . ; ym]
However, we will often write f(x1, . . . , xn) = (y1, . . . , ym) as is typically done in other areas.
EXAMPLE 1
Let A = [1 3 2; −1 3 1] and define f(~x) = A~x. From the last section we know that we can calculate A~x if and only if ~x ∈ R3. Thus, the domain of f is R3. Also, we see that A~x ∈ R2 and so the codomain of f is R2. We can easily compute the images of some vectors in R3 under f using either definition of matrix-vector multiplication.
f(4, −2, 5) = [1 3 2; −1 3 1][4; −2; 5] = [8; −5]
f(−1, −1, 2) = [1 3 2; −1 3 1][−1; −1; 2] = [0; 0]
f(1, 0, 0) = [1 3 2; −1 3 1][1; 0; 0] = [1; −1]
f(x1, x2, x3) = [1 3 2; −1 3 1][x1; x2; x3] = [x1 + 3x2 + 2x3; −x1 + 3x2 + x3]
Using the properties of matrix multiplication, we prove the following important prop-
erty of matrix mappings.
THEOREM 1 If A is an m × n matrix and f : Rn → Rm is defined by f (~x) = A~x, then for all
~x, ~y ∈ Rn and b, c ∈ R we have
f (b~x + c~y) = b f (~x) + c f (~y)
Proof: Using Theorem 4 of Section 3.1 we get
f (b~x + c~y) = A(b~x + c~y) = A(b~x) + A(c~y) = bA~x + cA~y = b f (~x) + c f (~y)
□
Any function f which has the property that f (b~x + c~y) = b f (~x) + c f (~y) is said to
preserve linear combinations. This property, called the linearity property, is ex-
tremely important in linear algebra. We now look at functions that have this property.
Linear Mappings
DEFINITION
Linear Mapping
Linear Operator
A function L : Rn → Rm is said to be a linear mapping if for every ~x, ~y ∈ Rn and
b, c ∈ R we have
L(b~x + c~y) = bL(~x) + cL(~y)
Two linear mappings L : Rn → Rm and M : Rn → Rm are said to be equal if
L(~x) = M(~x) for all ~x ∈ Rn. We write L = M.
A linear mapping L : Rn → Rn is sometimes called a linear operator. This is often
done when we wish to stress the fact that the domain and codomain of the linear
mapping are the same.
Observe that if L : Rn → Rm is a linear mapping and ~x can be written as a linear
combination of vectors ~v1, . . . ,~vk, say
~x = c1~v1 + · · · + ck~vk
then the image of ~x under L can be written as a linear combination of the images of
the vectors ~v1, . . . ,~vk under L. That is
L(~x) = L(c1~v1 + · · · + ck~vk) = c1L(~v1) + · · · + ckL(~vk)
since L preserves linear combinations. This is the most important property of linear
mappings; it will be used frequently!
EXAMPLE 2 Prove that the function L : R3 → R2 defined by L(x1, x2, x3) = (3x1 − x2, 2x1 + 2x3)
is a linear mapping.
Solution: Let ~x, ~y ∈ R3 and b, c ∈ R. Then
L(b~x + c~y) = L(bx1 + cy1, bx2 + cy2, bx3 + cy3)
= (3(bx1 + cy1) − (bx2 + cy2), 2(bx1 + cy1) + 2(bx3 + cy3))
= b(3x1 − x2, 2x1 + 2x3) + c(3y1 − y2, 2y1 + 2y3)
= bL(~x) + cL(~y)
Hence, L is linear.
EXAMPLE 3
Prove that L : R2 → R2 defined by L(x1, x2) = (x1^2 − x2^2, x1x2) is not linear.
Solution: To prove L is not linear we just need to find one example to show that L does not preserve linear combinations. Observe that L(1, 2) = (−3, 2) and L(2, 4) = (−12, 8), but 2L(1, 2) = (−6, 4) ≠ L(2(1, 2)) = L(2, 4). Hence, L is not linear.
EXAMPLE 4 Let ~a ∈ Rn. Show that proj~a : Rn → Rn is a linear mapping.
Solution: Let ~x, ~y ∈ Rn and b, c ∈ R. Then
proj~a(b~x + c~y) = ((b~x + c~y) · ~a / ‖~a‖^2)~a = (b(~x · ~a)/‖~a‖^2)~a + (c(~y · ~a)/‖~a‖^2)~a = b proj~a(~x) + c proj~a(~y)
Hence, proj~a is linear.
EXERCISE 1 Prove that the mapping L : R3 → R2 defined by L(x1, x2, x3) = (x1 − x2, x1 − 2x3) is
linear.
Theorem 3.2.1 shows that every matrix mapping is a linear mapping. It is natural to ask whether the converse is true: "Is every linear mapping a matrix mapping?" To answer this question, we will see that our second interpretation of matrix-vector multiplication is very important.
Let L : Rn → Rm be a linear mapping and let ~x ∈ Rn. We know that ~x can be written as a unique linear combination of the standard basis vectors, say
~x = x1~e1 + · · · + xn~en
Hence, using the linearity property we get
L(~x) = L(x1~e1 + · · · + xn~en) = x1L(~e1) + · · · + xnL(~en)
Our second interpretation of matrix-vector multiplication says that a matrix times a vector gives us a linear combination of the columns of the matrix. Using this in reverse, we see that
L(~x) = x1L(~e1) + · · · + xnL(~en) = [L(~e1) · · · L(~en)][x1; . . . ; xn]
We have proven the following theorem.
THEOREM 2
Every linear mapping L : Rn → Rm can be represented as a matrix mapping whose matrix has as its i-th column the image of the i-th standard basis vector of Rn under L, for 1 ≤ i ≤ n. That is, L(~x) = [L]~x where
[L] = [L(~e1) · · · L(~en)]
DEFINITION
Standard Matrix
Let L : Rn → Rm be a linear mapping. The matrix [L] = [L(~e1) · · · L(~en)] is called the standard matrix of L. It satisfies L(~x) = [L]~x.
REMARK
We are calling this the standard matrix of L since it is based on the standard basis. In
Chapters 6 and 8, we will see that we can find the matrix representation of a linear
mapping with respect to different bases.
EXAMPLE 5 Determine the standard matrix of L(x1, x2, x3) = (3x1 − x2, 2x1 + 2x3).
Solution: By definition, the columns of the standard matrix [L] are the images of the
standard basis vectors of R3 under L. We have
L(1, 0, 0) = (3, 2)
L(0, 1, 0) = (−1, 0)
L(0, 0, 1) = (0, 2)
Thus,
[L] = [L(~e1) L(~e2) L(~e3)] = [3 −1 0; 2 0 2]
REMARK
Be careful to remember that when we write L(1, 0, 0) = (3, 2), we really mean
L(~e1) = L([1; 0; 0]) = [3; 2]
so that L(~e1) is in fact a column of [L].
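The recipe "apply L to each standard basis vector and stack the images as columns" is mechanical enough to code. A sketch for the mapping of Example 5 (helper names are my own):

```python
import numpy as np

def L(x):
    # the linear mapping L(x1, x2, x3) = (3x1 - x2, 2x1 + 2x3) of Example 5
    x1, x2, x3 = x
    return np.array([3 * x1 - x2, 2 * x1 + 2 * x3])

# Columns of [L] are the images L(e1), L(e2), L(e3)
std_matrix = np.column_stack([L(e) for e in np.eye(3)])
print(std_matrix)  # [[ 3. -1.  0.] [ 2.  0.  2.]]

# Sanity check: [L]x reproduces L(x)
x = np.array([1.0, 2.0, 3.0])
print(std_matrix @ x)  # [1. 8.], same as L(x)
```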
EXAMPLE 6
Let ~a = [1; 2]. Determine the standard matrix of proj~a : R2 → R2.
Solution: We need to find the projection of the standard basis vectors of R2 onto ~a.
proj~a(~e1) = (~e1 · ~a/‖~a‖^2)~a = (1/5)[1; 2] = [1/5; 2/5]
proj~a(~e2) = (~e2 · ~a/‖~a‖^2)~a = (2/5)[1; 2] = [2/5; 4/5]
Hence, [proj~a] = [proj~a(~e1) proj~a(~e2)] = [1/5 2/5; 2/5 4/5].
.
EXERCISE 2 Determine the standard matrix of the linear mapping
L(x1, x2, x3) = (x1 + 4x2, x1 − x2 + 2x3)
We now look at two important examples of linear mappings that will be used later in
the book.
Rotations in R2
Let Rθ : R2 → R2 denote the function that rotates a vector ~x ∈ R2 about the origin through an angle θ. Using basic trigonometric identities, we can show that
Rθ(x1, x2) = (x1 cos θ − x2 sin θ, x1 sin θ + x2 cos θ)
and that Rθ is linear. Using this we get that the standard matrix of Rθ is
[Rθ] = [cos θ −sin θ; sin θ cos θ]
[Figure: a vector ~x and its image Rθ(~x), rotated through the angle θ about the origin.]
EXAMPLE 7 Determine the standard matrix of Rπ/4.
Solution: We have
[Rπ/4] = [cos π/4 −sin π/4; sin π/4 cos π/4] = [1/√2 −1/√2; 1/√2 1/√2]
We call Rθ a rotation in R2 and we call its standard matrix [Rθ] a rotation matrix.
The following theorem shows some important properties of a rotation matrix.
THEOREM 3 If Rθ : R2 → R2 is a rotation with rotation matrix A = [Rθ], then the columns of A
are orthogonal unit vectors.
Proof: The columns of A are ~a1 = [cos θ; sin θ] and ~a2 = [−sin θ; cos θ]. Hence,
~a1 · ~a2 = −cos θ sin θ + cos θ sin θ = 0
‖~a1‖^2 = cos^2 θ + sin^2 θ = 1
‖~a2‖^2 = cos^2 θ + sin^2 θ = 1 □
Reflections
Let reflP : Rn → Rn denote the mapping that sends a vector ~x to its mirror image in the hyperplane P with normal vector ~n. The figure shows a reflection over a line in R2 with normal vector ~n. From the figure, we see that the reflection of ~x over the line with normal vector ~n is given by
reflP(~x) = ~x − 2 proj~n(~x)
We extend this formula directly to the n-dimensional case.
[Figure: a line in R2 with normal vector ~n, a vector ~x, its projection proj~n(~x), and its reflection reflP(~x) on the other side of the line.]
EXAMPLE 8 Prove that reflP : Rn → Rn is a linear mapping.
Solution: Since proj~n is a linear mapping we get
reflP(b~x + c~y) = (b~x + c~y) − 2 proj~n(b~x + c~y)
= b~x + c~y − 2(b proj~n(~x) + c proj~n(~y))
= b(~x − 2 proj~n(~x)) + c(~y − 2 proj~n(~y))
= b reflP(~x) + c reflP(~y)
for all ~x, ~y ∈ Rn and b, c ∈ R as required.
EXAMPLE 9
Find the reflection of ~x = [1; 1; 1; 3] over the hyperplane P in R4 with normal vector ~n = [1; 2; 2; 0].
Solution: We have
reflP(~x) = ~x − 2 proj~n(~x) = ~x − 2(~x · ~n/‖~n‖^2)~n = [1; 1; 1; 3] − (10/9)[1; 2; 2; 0] = [−1/9; −11/9; −11/9; 3]
EXAMPLE 10
Determine the standard matrix of reflP : R3 → R3 where ~n = [1; 2; −3].
Solution: We have
reflP(~e1) = [1; 0; 0] − (2/14)[1; 2; −3] = [6/7; −2/7; 3/7]
reflP(~e2) = [0; 1; 0] − (4/14)[1; 2; −3] = [−2/7; 3/7; 6/7]
reflP(~e3) = [0; 0; 1] + (6/14)[1; 2; −3] = [3/7; 6/7; −2/7]
Hence,
[reflP] = [6/7 −2/7 3/7; −2/7 3/7 6/7; 3/7 6/7 −2/7]
Section 3.2 Problems
1. Let A = [−2 3; 3 0; 1 5; 4 −6] and let L(~x) = A~x be the corresponding matrix mapping.
(a) Determine the domain and codomain of L.
(b) Determine L(2, −5) and L(−3, 4).
(c) Determine [L].
2. Let B = [1 2 −3 0; 2 −1 0 3; 1 0 2 −1] and let f(~x) = B~x be the corresponding matrix mapping.
(a) Determine the domain and codomain of f.
(b) Determine f(2, −2, 3, 1) and f(−3, 1, 4, 2).
(c) Determine [f].
3. Determine which of the following are linear.
Find the standard matrix of each linear map-
ping.
(a) L(x1, x2, x3) = (x1 + x2, 0)
(b) L(x1, x2, x3) = (x1 − x2, x2 + x3)
(c) L(x1, x2, x3) = (1, x1, x2 + x3)
(d) L(x1, x2) = (x21, x2
2)
(e) L(x1, x2, x3) = (0, 0, 0)
(f) L(x1, x2, x3, x4) = (x4 − x1, 2x2 + 3x3)
4. Let ~a =
[
2
2
]
and ~x =
[
3
−1
]
.
(a) Find [proj~a].
(b) Use the standard matrix you found in (a)
to find proj~a(~x).
(c) Check your answer in (b) by computing
proj~a(~x) directly.
5. Let P be a plane with normal vector ~n =
3
1
−1
.
Find the standard matrix of reflP : R3 → R3.
6. Let ~a ∈ Rn. Prove that perp~a : Rn → Rn is a
linear mapping.
7. Let ~v ∈ Rn and define DOT~v : Rn → R by
DOT~v(~x) = ~x ·~v. Prove that DOT~v is linear and
find it’s standard matrix.
8. Prove that if L : Rn → Rm is linear, then
L(~0) = ~0.
9. Let L be a linear mapping with standard matrix
[L] =
[
1 3 2 1
−1 3 0 5
]
.
(a) What is the domain of L?
(b) What is the codomain of L?
(c) What is L(~x) for any ~x in the domain of L.
10. Consider the rotation Rπ/3 : R2 → R2.
(a) Find the standard matrix of Rπ/3.
(b) Use the standard matrix to determine
Rπ/3(1/√
2, 1/√
2).
11. (a) Invent a linear mapping L : R2 → R3 such
that L(1, 0) = (3,−1, 4) and
L(0, 1) = (1,−5, 9).
(b) Invent a non-linear mapping L : R2 → R3
such that L(1, 0) = (3,−1, 4) and
L(0, 1) = (1,−5, 9).
12. Invent a linear mapping L : R2 → R2 such that
L(1, 1) = (2, 3) and L(1,−1) = (3, 1).
13. (a) Let L : Rn → Rm be a linear mapping.
Prove that if {L(~v1), . . . , L(~vk)} is a linearly
independent set in Rm, then {~v1, . . . ,~vk} is
linearly independent.
(b) Give an example of a linear mapping
L : Rn → Rm where {~v1, . . . ,~vk} is linearly
independent in Rn, but {L(~v1), . . . , L(~vk)} is
linearly dependent.
14. Prove that if ~u is a unit vector, then
[proj~u] = ~u~uT
3.3 Special Subspaces
When working with a function we are often interested in the subset of the codomain
consisting of all possible images under the function, called the range of the
function. In this section we will look not only at the range of a linear mapping, but
also at a special subset of its domain. We will then use the connection between linear
mappings and their standard matrices to derive four important subspaces of a matrix.
Range
DEFINITION
Range
Let L : Rn → Rm be a linear mapping. The range of L is defined by
Range(L) = {L(~x) ∈ Rm | ~x ∈ Rn}
EXAMPLE 1 Let L : R2 → R3 be defined by L(x1, x2) = (x1 + x2, x1 − x2, x2). Determine which of
the following vectors are in the range of L: ~y1 = (0, 0, 0), ~y2 = (2, 3, 1), ~y3 = (3, −1, 2).

Solution: Observe that L(0, 0) = (0 + 0, 0 − 0, 0) = (0, 0, 0), so ~y1 ∈ Range(L).

For ~y2, we need to determine whether there exists ~x = (x1, x2) such that

(2, 3, 1) = L(x1, x2) = (x1 + x2, x1 − x2, x2)

Comparing corresponding entries gives us the system of linear equations

x1 + x2 = 2
x1 − x2 = 3
x2 = 1

It is easy to see that this system is inconsistent. Thus, ~y2 ∉ Range(L).

For ~y3, we need to determine whether there exist x1 and x2 such that

(3, −1, 2) = L(x1, x2) = (x1 + x2, x1 − x2, x2)

That is, we need to solve the system of equations

x1 + x2 = 3
x1 − x2 = −1
x2 = 2

We find that the system is consistent with unique solution x1 = 1, x2 = 2. Hence,
L(1, 2) = (3, −1, 2), so ~y3 ∈ Range(L).
EXAMPLE 2 Find a basis for the range of L(x1, x2, x3) = (3x1 − x2, 2x1 + 2x3).

Solution: We see that every vector L(~x) has the form

(3x1 − x2, 2x1 + 2x3) = x1(3, 2) + x2(−1, 0) + x3(0, 2)

Hence,

Range(L) = Span{(3, 2), (−1, 0), (0, 2)} = Span{(−1, 0), (0, 2)}

since (3, 2) = (−3)(−1, 0) + (0, 2). Thus, B = {(−1, 0), (0, 2)} spans Range(L) and is clearly
linearly independent, so it is a basis for Range(L).
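Since the range of L is spanned by the columns of its standard matrix (Theorem 6 later in this section makes this precise), the size of a basis for the range is the rank of that matrix. A short numerical sketch of Example 2 (Python with NumPy, our own illustration):

```python
import numpy as np

# Standard matrix of L(x1, x2, x3) = (3x1 - x2, 2x1 + 2x3)
A = np.array([[3.0, -1.0, 0.0],
              [2.0,  0.0, 2.0]])

# Range(L) is the span of the columns of A; its dimension is rank(A).
print(np.linalg.matrix_rank(A))  # 2, matching the two-vector basis found above

# The dependence (3, 2) = (-3)(-1, 0) + (0, 2) used in the example:
print(np.allclose(-3 * A[:, 1] + A[:, 2], A[:, 0]))  # True
```

Rank 2 tells us any two independent columns form a basis; it does not by itself say which column is redundant.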
EXAMPLE 3 Let ~a = (2, −1). Find a basis for the range of proj~a : R2 → R2.

Solution: Let ~y ∈ Range(proj~a). Then, by definition, there exists ~x ∈ R2 such that

~y = proj~a(~x) = (~x · ~a)/‖~a‖² ~a ∈ Span{~a}

So, Range(proj~a) ⊆ Span{~a}. On the other hand, pick any c~a ∈ Span{~a}. Taking
~y = (0, −5c) gives

proj~a(~y) = (~y · ~a)/‖~a‖² ~a = ((0)(2) + (−5c)(−1))/5 ~a = c~a

Thus, Span{~a} ⊆ Range(proj~a). Hence, Range(proj~a) = Span{~a}. Since {~a} is also
clearly linearly independent, it is a basis for Range(proj~a).
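Both inclusions in Example 3 can be tested numerically. This sketch (Python with NumPy, our own illustration) builds the standard matrix of proj~a as ~a~aT/‖~a‖² — the formula of Problem 14 in Section 3.2, rescaled since ~a is not a unit vector:

```python
import numpy as np

a = np.array([2.0, -1.0])
P = np.outer(a, a) / (a @ a)   # standard matrix of proj_a

# Every image P @ x lies in Span{a} ...
x = np.array([4.0, 3.0])
y = P @ x
print(np.linalg.matrix_rank(np.vstack([y, a])))  # 1: y is a multiple of a

# ... and every c*a is attained, using the preimage (0, -5c) from the example:
c = 2.0
print(np.allclose(P @ np.array([0.0, -5.0 * c]), c * a))  # True
```

The rank-1 check works because stacking two parallel nonzero vectors gives a matrix of rank 1.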
EXERCISE 1 Find a basis for the range of L(x1, x2, x3) = (x1 − x2, x1 − 2x3).
Observe in the examples above that the range of a linear mapping is a subspace of the
codomain. To prove that this is always the case, we will use the following lemma.
LEMMA 1 If L : Rn → Rm is a linear mapping, then L(~0) = ~0.
THEOREM 2 If L : Rn → Rm is a linear mapping, then Range(L) is a subspace of Rm.
In many cases we are interested in whether the range of a linear mapping L equals its
codomain. That is, for every vector ~y in the codomain, can we find a vector ~x in the
domain such that L(~x) = ~y?
REMARK
A function whose range equals its codomain is said to be onto. We will look at onto
mappings in Chapter 8.
EXAMPLE 4 Let L : R3 → R3 be defined by L(x1, x2, x3) = (x1 + x2 + x3, 2x1 − 2x3, x2 + 3x3). Prove
that Range(L) = R3.

Solution: Since Range(L) is a subset of R3 by definition, to prove that Range(L) = R3
we only need to show that for any vector ~y = (y1, y2, y3) ∈ R3, we can find a vector
~x ∈ R3 such that L(~x) = ~y. Consider

(y1, y2, y3) = L(x1, x2, x3) = (x1 + x2 + x3, 2x1 − 2x3, x2 + 3x3)

This gives the system of linear equations

x1 + x2 + x3 = y1
2x1 − 2x3 = y2
x2 + 3x3 = y3

Row reducing the corresponding augmented matrix gives

[1 1  1 | y1]   [1 0 0 | −y1 + y2 + y3      ]
[2 0 −2 | y2] ∼ [0 1 0 | 3y1 − (3/2)y2 − 2y3]
[0 1  3 | y3]   [0 0 1 | −y1 + (1/2)y2 + y3 ]

Therefore, Range(L) = R3 since the system is consistent for all y1, y2, and y3. In
particular, taking x1 = −y1 + y2 + y3, x2 = 3y1 − (3/2)y2 − 2y3, x3 = −y1 + (1/2)y2 + y3 we get
L(x1, x2, x3) = (y1, y2, y3).
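The closed-form preimage found in Example 4 can be compared against a direct linear solve. In this sketch (Python with NumPy; the target vector ~y = (4, −2, 7) is an arbitrary choice of ours), `np.linalg.solve` succeeds because the standard matrix has rank 3:

```python
import numpy as np

A = np.array([[1.0, 1.0,  1.0],
              [2.0, 0.0, -2.0],
              [0.0, 1.0,  3.0]])

y = np.array([4.0, -2.0, 7.0])   # an arbitrary target vector in R^3
x = np.linalg.solve(A, y)        # solvable since rank A = 3
print(np.allclose(A @ x, y))     # True: y is in Range(L)

# The closed-form solution read off from the row reduction:
y1, y2, y3 = y
x_formula = np.array([-y1 + y2 + y3,
                      3*y1 - 1.5*y2 - 2*y3,
                      -y1 + 0.5*y2 + y3])
print(np.allclose(x, x_formula))  # True
```

Trying other values of ~y gives the same agreement, which is exactly the statement Range(L) = R3.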
EXAMPLE 5 Let A be an n × n matrix and let L : Rn → Rn be defined by L(~x) = A~x. Prove that
Range(L) = Rn if and only if rank A = n.
Solution: If Range(L) = Rn, then for every ~y ∈ Rn there exists a vector ~x such that
~y = L(~x) = A~x. Hence, the system of equations A~x = ~y with coefficient matrix A is
consistent for every ~y ∈ Rn. The System-Rank Theorem then implies that rank(A) =
n.
On the other hand, if rank A = n, then by the System-Rank Theorem L(~x) = A~x = ~y
is consistent for every ~y ∈ Rn, so Range(L) = Rn.
Kernel
We now look at a special subset of the domain of a linear mapping.
DEFINITION
Kernel
Let L : Rn → Rm be a linear mapping. The kernel (nullspace) of L is the set of all
vectors in the domain which are mapped to the zero vector in the codomain. That is,
Ker(L) = {~x ∈ Rn | L(~x) = ~0}
EXAMPLE 6 Let L : R2 → R2 be defined by L(x1, x2) = (2x1 − 2x2, −x1 + x2). Determine which of
the following vectors are in the kernel of L: ~x1 = (0, 0), ~x2 = (2, 2), ~x3 = (3, −1).

Solution: We have

L(~x1) = L(0, 0) = (0, 0)
L(~x2) = L(2, 2) = (0, 0)
L(~x3) = L(3, −1) = (8, −4)

Hence, ~x1, ~x2 ∈ Ker(L) while ~x3 ∉ Ker(L).
EXAMPLE 7 Find a basis for the kernel of L(x1, x2, x3) = (3x1 − x2, 2x1 + 2x3).

Solution: By definition, ~x = (x1, x2, x3) is in the kernel of L if

(0, 0) = L(x1, x2, x3) = (3x1 − x2, 2x1 + 2x3)

Hence, ~x is in the kernel of L if and only if it satisfies

3x1 − x2 = 0
2x1 + 2x3 = 0

Row reducing the corresponding coefficient matrix gives

[3 −1 0]   [1 0 1]
[2  0 2] ∼ [0 1 3]

We find that the solution space of the homogeneous system has vector equation
~x = t(−1, −3, 1), t ∈ R. Thus, Ker(L) = Span{(−1, −3, 1)}. Since {(−1, −3, 1)} is also
linearly independent, it is a basis for Ker(L).
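The basis vector found in Example 7 is easy to verify: it must be sent to ~0 by the standard matrix, and the dimension count (number of columns minus rank) confirms one vector suffices. A sketch in Python with NumPy (our own illustration):

```python
import numpy as np

# Standard matrix of L(x1, x2, x3) = (3x1 - x2, 2x1 + 2x3)
A = np.array([[3.0, -1.0, 0.0],
              [2.0,  0.0, 2.0]])

v = np.array([-1.0, -3.0, 1.0])   # claimed basis vector for Ker(L)
print(np.allclose(A @ v, 0))      # True: v is in the kernel

# dim Ker(L) = (number of columns) - rank(A) = 3 - 2 = 1,
# so the single independent vector v is a basis.
print(A.shape[1] - np.linalg.matrix_rank(A))  # 1
```

The dimension formula used in the comment is the rank–nullity relationship implicit in the System-Rank Theorem.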
EXAMPLE 8 Find a basis for the kernel of Rθ : R2 → R2.

Solution: Let’s first think about this geometrically. We are looking for all vectors in
R2 such that rotating them by an angle of θ gives the zero vector. Since a rotation
does not affect length, the only vector whose image could be the zero vector is the
zero vector itself.

We verify our geometric intuition algebraically. Consider

(0, 0) = Rθ(~x) = [Rθ]~x = [cos θ −sin θ; sin θ cos θ]~x

Row reducing the coefficient matrix of the corresponding homogeneous system gives

[cos θ −sin θ]   [1 0]
[sin θ  cos θ] ∼ [0 1]

Hence, the system has the unique solution ~x = ~0. So,

Ker(Rθ) = {(0, 0)}

Recall that we defined a basis for {~0} to be the empty set. Hence, the empty set is a
basis for Ker(Rθ).
EXERCISE 2 Find a basis for the kernel of L(x1, x2, x3) = (x1 − x2, x1 − 2x3).
In the examples, the kernel of a linear mapping was always a subspace of the
appropriate Rn.
THEOREM 3 If L : Rn → Rm is a linear mapping, then Ker(L) is a subspace of Rn.
Proof: We use the Subspace Test. We first observe that by definition Ker(L) is a
subset of Rn. Also, we have that ~0 ∈ Ker(L) by Lemma 1. Let ~x, ~y ∈ Ker(L). Then,
L(~x) = ~0 and L(~y) = ~0 so
L(~x + ~y) = L(~x) + L(~y) = ~0 + ~0 = ~0
so ~x+~y ∈ Ker(L). Thus, Ker(L) is closed under addition. Similarly, for any c ∈ R we
have
L(c~x) = cL(~x) = c~0 = ~0
so c~x ∈ Ker(L). Therefore, Ker(L) is also closed under scalar multiplication. Hence,
Ker(L) is a subspace of Rn. ∎
The Four Fundamental Subspaces of a Matrix
We now look at the relationship of the standard matrix of a linear mapping to its
range and kernel. In doing so, we will derive four important subspaces of a matrix.
THEOREM 4 Let L : Rn → Rm be a linear mapping with standard matrix [L]. Then, ~x ∈ Ker(L) if
and only if [L]~x = ~0.
Proof: If ~x ∈ Ker(L), then ~0 = L(~x) = [L]~x.
On the other hand, if [L]~x = ~0, then ~0 = [L]~x = L(~x), so ~x ∈ Ker(L). ∎
We get an immediate corollary to this theorem.
COROLLARY 5 Let A ∈ Mm×n(R). The set {~x ∈ Rn | A~x = ~0} is a subspace of Rn.
The corollary motivates the following definition.
DEFINITION
Nullspace
Let A be an m × n matrix. The nullspace (kernel) of A is defined by
Null(A) = {~x ∈ Rn | A~x = ~0}
EXAMPLE 9 Let A = [1 2; −1 0]. Determine if ~x = (−1, 2) is in the nullspace of A.

Solution: We have A~x = (3, 1). Since A~x ≠ ~0, ~x ∉ Null(A).
EXAMPLE 10 Let A = [1 3 1; −1 −2 2]. Find a basis for the nullspace of A.

Solution: We need to find all ~x such that A~x = ~0. We observe that this is a homogeneous
system with coefficient matrix A. Row reducing gives

[ 1  3 1]   [1 0 −8]
[−1 −2 2] ∼ [0 1  3]

Thus, a vector equation for the solution set of the system is

~x = t(8, −3, 1), t ∈ R

Hence, {(8, −3, 1)} spans Null(A) and is linearly independent, so it is a basis for Null(A).
EXAMPLE 11 Let A =
[ 1 2 −1]
[ 5 0 −4]
[−2 6  4]
. Show that Null(A) = {~0}.

Solution: We need to show that the only solution to the homogeneous system A~x = ~0
is ~x = ~0. Row reducing the associated coefficient matrix we get

[ 1 2 −1]   [1 0 0]
[ 5 0 −4] ∼ [0 1 0]
[−2 6  4]   [0 0 1]

Hence, the rank of the coefficient matrix equals the number of columns, so by the
System-Rank Theorem there is a unique solution. Therefore, since the system is
homogeneous, the only solution is ~x = ~0. Thus, Null(A) = {~0}.
The connection between the range of a linear mapping and its standard matrix follows
easily from our second interpretation of matrix-vector multiplication.
THEOREM 6 If L : Rn → Rm is a linear mapping with standard matrix [L] = A = [~a1 · · · ~an], then

Range(L) = Span{~a1, . . . , ~an}

Proof: Observe that for any ~x ∈ Rn we have

L(~x) = A~x = [~a1 · · · ~an](x1, . . . , xn) = x1~a1 + · · · + xn~an

Thus, every vector in Range(L) is in Span{~a1, . . . , ~an} and every vector in
Span{~a1, . . . , ~an} is in Range(L) as required. ∎
Therefore, the range of a linear mapping is the subspace spanned by the columns of
its standard matrix.
DEFINITION
Column Space
Let A ∈ Mm×n(R). The column space of A is the subspace of Rm defined by
Col(A) = {A~x ∈ Rm | ~x ∈ Rn}
EXAMPLE 12 Let A = [1 5 −3; 9 2 1] and B =
[1 −3]
[0  2]
[0  1]
. Then Col(A) = Span{(1, 9), (5, 2), (−3, 1)} and
Col(B) = Span{(1, 0, 0), (−3, 2, 1)}.
EXERCISE 3 Let A be an m × n matrix. Prove that Col(A) = Rm if and only if rank(A) = m.
Recall that taking the transpose of a matrix changes its columns into rows
and vice versa. Hence, the subspace of Rn spanned by the rows of an m × n matrix A
is just the column space of AT.
DEFINITION
Row Space
Let A be an m × n matrix. The row space of A is the subspace of Rn defined by
Row(A) = {AT~x ∈ Rn | ~x ∈ Rm}
EXAMPLE 13 Let A = [1 5 −3; 9 2 1] and B =
[1 −3]
[0  2]
[0  1]
. Then Row(A) = Span{(1, 5, −3), (9, 2, 1)} and
Row(B) = Span{(1, −3), (0, 2), (0, 1)}.
Lastly, since we have looked at the column space of AT it makes sense to also consider
the nullspace of AT .
DEFINITION
Left Nullspace
Let A be an m × n matrix. The left nullspace of A is the subspace of Rm defined by
Null(AT ) = {~x ∈ Rm | AT~x = ~0}
EXAMPLE 14 Let A =
[1 −3]
[0  2]
[0  1]
. Find a basis for the left nullspace of A.

Solution: We have AT = [1 0 0; −3 2 1]. Solving the homogeneous system AT~x = ~0 we get

~x = t(0, −1, 2), t ∈ R

Therefore, {(0, −1, 2)} spans Null(AT) and is linearly independent, hence it is a basis for
Null(AT).
For any m × n matrix A, we call the nullspace, column space, row space, and left
nullspace the four fundamental subspaces of A.
EXAMPLE 15 Find a basis for each of the four fundamental subspaces of A = [1 3 2; 2 5 2].

Solution: To find a basis for Null(A) we solve A~x = ~0. Row reducing A gives

[1 3 2]   [1 0 −4]
[2 5 2] ∼ [0 1  2]

Thus, a vector equation for the solution space is ~x = t(4, −2, 1), t ∈ R. Therefore,
{(4, −2, 1)} spans Null(A) and is linearly independent, so it is a basis for Null(A).

By definition of the column space of A we have

Col(A) = Span{(1, 2), (3, 5), (2, 2)}

Observe that −4(1, 2) + 2(3, 5) = (2, 2). Thus, by Theorem 1.2.1 we have

Col(A) = Span{(1, 2), (3, 5)}

Thus, {(1, 2), (3, 5)} spans Col(A) and is clearly linearly independent, so it is a basis for
Col(A).

By definition, the row space of A is spanned by {(1, 3, 2), (2, 5, 2)}. This set is also linearly
independent and hence forms a basis for Row(A).

To determine the left nullspace of A, we solve AT~x = ~0. Row reducing AT gives

[1 2]   [1 0]
[3 5] ∼ [0 1]
[2 2]   [0 0]

Hence, the only solution is ~x = ~0, so Null(AT) = {~0}. Consequently, a basis for
Null(AT) is the empty set.
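The dimensions found in Example 15 follow a general pattern: for an m × n matrix of rank r, the column and row spaces have dimension r, the nullspace has dimension n − r, and the left nullspace has dimension m − r. A numerical sketch (Python with NumPy, our own illustration):

```python
import numpy as np

A = np.array([[1.0, 3.0, 2.0],
              [2.0, 5.0, 2.0]])
r = np.linalg.matrix_rank(A)   # 2

# Dimensions of the four fundamental subspaces of an m x n matrix of rank r:
#   dim Col(A) = r,  dim Row(A) = r,
#   dim Null(A) = n - r,  dim Null(A^T) = m - r
m, n = A.shape
print(r, r, n - r, m - r)      # 2 2 1 0

# Check the claimed basis vector of Null(A):
print(np.allclose(A @ np.array([4.0, -2.0, 1.0]), 0))  # True
```

The 0 in the last slot matches the empty basis found for Null(AT) in the example.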
Section 3.3 Problems
1. For each of the following linear mappings L:
(i) Determine a basis for the range and kernel
of L;
(ii) Find the standard matrix [L] of L;
(iii) Find a basis for the four fundamental sub-
spaces of [L].
(a) L(x1, x2, x3) = (x1 + x2, 0)
(b) L(x1, x2, x3) = (x1 − x2, x2 + x3)
(c) L(x1, x2, x3) = (0, 0, 0)
(d) L(x1, x2, x3, x4) = (x1, x2 + x4)
(e) L(x1, x2) = (x1, x2)
(f) L(x1, x2, x3, x4) = (x4 − x1, 2x2 + 3x3)
(g) L(x1) = (3x1)
(h) L(x1, x2, x3) = (2x1 + x3, x1 − x3, x1 + x3)
(i) L(x1) = (x1, 2x1,−x1)
(j) L(x1, x2) = (x1 + x2, 2x1 + 4x2)
2. Find a basis for the four fundamental subspaces
of each matrix.
(a) A = [1 2; 3 4]
(b) B = [1 0 1; 0 1 −1]
(c) C =
[ 1  2  0]
[−1 −2  0]
[ 1  3 −1]
(d) D =
[ 1  2  1 1]
[−2 −5 −2 1]
[−1 −3 −1 2]
3. Let A be an m × n matrix. Prove that a consis-
tent system of linear equations A~x = ~b has a
unique solution if and only if Null(A) = {~0}.
4. Prove Lemma 1 and Theorem 2.
5. Suppose that {~v1, . . . ,~vk} is a linearly indepen-
dent set in Rn and that L : Rn → Rm is a
linear map with Ker(L) = {~0}. Prove that
{L(~v1), . . . , L(~vk)} is linearly independent.
6. Let L : Rn → Rn be a linear operator. Prove
that Ker(L) = {~0} if and only if
Range(L) = Rn.
7. Let A be an n × n matrix such that A² = On,n.
Prove that the column space of A is a subset of
the nullspace of A.
8. In the notes we proved that the nullspace of an
m × n matrix A is a subspace of Rn by proving
that the nullspace of A equals the kernel of the
matrix mapping L(~x) = A~x. It is instructive to
prove this result in other ways.
(a) Prove that Null(A) is a subspace of Rn di-
rectly using the Subspace Test.
(b) Prove that Null(A) is a subspace of Rn
by using one of the definitions of matrix-
vector multiplication.
9. Prove that if an m×n matrix A is row equivalent
to a matrix B, then Row(A) = Row(B).
10. Prove or disprove the following statements.
(a) A system of linear equations A~x = ~b is
consistent if and only if ~b ∈ Col(A).
(b) If L : Rn → Rm is a linear mapping and
~x ∈ Ker(L), then ~x = ~0.
(c) Let L : Rn → Rm be a linear mapping
with standard matrix [L]. If Range(L) =
Span{~a1, . . . , ~an}, then
[L] = [~a1 · · · ~an]
3.4 Operations on Linear Mappings
In the preceding sections we have been treating linear mappings as functions. We
have also seen that there is a close connection between a linear mapping and its
standard matrix. We now use this connection to show that we can also treat linear
mappings just like vectors in Rn or matrices. We begin with some basic definitions.
DEFINITION
Addition
Scalar Multiplication
Let L : Rn → Rm and M : Rn → Rm be linear mappings. We define L+M : Rn → Rm
and cL : Rn → Rm by
(L + M)(~x) = L(~x) + M(~x)
(cL)(~x) = cL(~x)
EXAMPLE 1 Let L : R2 → R3 and M : R2 → R3 be defined by L(x1, x2) = (x1, x1 + x2,−x2) and
M(x1, x2) = (x1, x1, x2). Calculate L + M and 3L.
Solution: L + M is the mapping defined by
(L + M)(~x) = L(~x) + M(~x) = (x1, x1 + x2,−x2) + (x1, x1, x2) = (2x1, 2x1 + x2, 0)
and 3L is the mapping defined by
(3L)(~x) = 3L(~x) = 3(x1, x1 + x2,−x2) = (3x1, 3x1 + 3x2,−3x2)
By analyzing Example 1 we can learn more than just how to add linear mappings and
how to multiply them by a scalar. We first observe that L + M and 3L are both linear
mappings. This makes us realize that we can rewrite the calculations in the example
in terms of standard matrices. For example,
(L + M)(~x) = L(~x) + M(~x) = [L]~x + [M]~x = ([L] + [M])~x
            = ([1 0; 1 1; 0 −1] + [1 0; 1 0; 0 1])~x
            = [2 0; 2 1; 0 0](x1, x2)
            = (2x1, 2x1 + x2, 0)
This matches our result above. Generalizing what we did here gives the following
theorem.
THEOREM 1 If L,M : Rn → Rm are linear mappings and c ∈ R, then L + M : Rn → Rm and
cL : Rn → Rm are linear mappings. Moreover, we have
[L + M] = [L] + [M] and [cL] = c[L]
Proof: For any ~x ∈ Rn we have
(L + M)(~x) = L(~x) + M(~x) = [L]~x + [M]~x = ([L] + [M])~x
(cL)(~x) = cL(~x) = c[L]~x = (c[L])~x
Thus, L + M and cL are matrix mappings with m × n standard matrices. Hence, they
are both linear mappings from Rn to Rm. Moreover, the standard matrix of L + M is
[L] + [M] and the standard matrix of cL is c[L] as required. �
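Theorem 1 says that computing L(~x) + M(~x) and computing ([L] + [M])~x must always agree. A numerical sketch using the mappings from Example 1 (Python with NumPy, our own illustration):

```python
import numpy as np

# Standard matrices of L(x1,x2) = (x1, x1+x2, -x2) and M(x1,x2) = (x1, x1, x2)
L = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, -1.0]])
M = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])

x = np.array([5.0, -2.0])
# (L + M)(x) computed two ways agree, illustrating [L + M] = [L] + [M]:
print(np.allclose(L @ x + M @ x, (L + M) @ x))  # True
print((L + M) @ x)  # (2x1, 2x1 + x2, 0) = [10, 8, 0]
```

The agreement holds for every ~x, since matrix addition distributes over matrix-vector multiplication.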
If we let L denote the set of all possible linear mappings L : Rn → Rm, then Theorem
1 shows that L is closed under addition and scalar multiplication. Moreover, we
see that we can use the same proof technique to show that L satisfies the same 10
properties as vectors in Rn and matrices in Mm×n(R).
THEOREM 2 If L,M,N ∈ L and c, d ∈ R, then
V1 L + M ∈ L
V2 (L + M) + N = L + (M + N)
V3 L + M = M + L
V4 There exists a linear mapping O : Rn → Rm, such that L + O = L for all L. In
particular, O is the linear mapping defined by O(~x) = ~0 for all ~x ∈ Rn.
V5 For any linear mapping L : Rn → Rm, there exists a linear mapping
(−L) : Rn → Rm with the property that L + (−L) = O. In particular, (−L) is the
linear mapping defined by (−L)(~x) = −L(~x) for all ~x ∈ Rn.
V6 cL ∈ L
V7 c(dL) = (cd)L
V8 (c + d)L = cL + dL
V9 c(L + M) = cL + cM
V10 1L = L
Proof: Theorem 1 proves properties V1 and V6. We will prove V3 and V9, and
leave the rest as exercises.
To prove V3 we need to show (L + M)(~x) = (M + L)(~x) for all ~x ∈ Rn. We have

(L + M)(~x) = [L + M]~x = ([L] + [M])~x = ([M] + [L])~x = [M + L]~x = (M + L)(~x)

For V9, we need to show that (c(L + M))(~x) = (cL + cM)(~x) for all ~x ∈ Rn. We have

(c(L + M))(~x) = [c(L + M)]~x = c([L + M])~x = c([L] + [M])~x
              = (c[L] + c[M])~x = ([cL] + [cM])~x = [cL + cM]~x = (cL + cM)(~x)
∎
EXERCISE 1 Prove V7 of Theorem 2.
The fact that the definitions of matrix addition and scalar multiplication match with
the definitions of addition and scalar multiplication of linear mappings is truly
remarkable. Of course, the wonderfulness of mathematics does not end there. We
now show, as promised, that matrix-matrix multiplication also corresponds to a stan-
dard operation on functions.
DEFINITION
Composition
Let L : Rn → Rm and M : Rm → Rp be linear mappings. The composition of M and
L is the function M ◦ L : Rn → Rp defined by
(M ◦ L)(~x) = M(L(~x))
REMARK
Observe that it is necessary that the range of L is a subset of the domain of M for
M ◦ L to be defined.
EXAMPLE 2 Let L : R2 → R3 and M : R3 → R be the linear mappings defined by
L(x1, x2) = (x1 + x2, 0, x1 − 2x2) and M(x1, x2, x3) = x1 + x2 + x3. Then M ◦ L is the
mapping defined by
(M ◦ L)(x1, x2) = M(x1 + x2, 0, x1 − 2x2) = (x1 + x2) + 0 + (x1 − 2x2) = 2x1 − x2
EXAMPLE 3 Let L : R2 → R2 and M : R2 → R2 be the linear mappings defined by
L(x1, x2) = (2x1 + x2, x1 + x2) and M(x1, x2) = (x1 − x2, −x1 + 2x2). Then M ◦ L is the
mapping defined by

(M ◦ L)(x1, x2) = M(2x1 + x2, x1 + x2)
                = ((2x1 + x2) − (x1 + x2), −(2x1 + x2) + 2(x1 + x2))
                = (x1, x2)
THEOREM 3 If L : Rn → Rm and M : Rm → Rp are linear mappings, then M ◦ L : Rn → Rp is a
linear mapping and
[M ◦ L] = [M][L]
EXAMPLE 4 Let L : R2 → R2 and M : R2 → R2 be the linear mappings defined by
L(x1, x2) = (2x1 + x2, x1 + x2) and M(x1, x2) = (x1 − x2,−x1 + 2x2). Then
[M ◦ L] = [M][L] = [1 −1; −1 2][2 1; 1 1] = [1 0; 0 1]
In Example 3, M ◦ L is the mapping such that (M ◦ L)(~x) = ~x for all ~x ∈ R2. We see
from Example 4 that the standard matrix of this mapping is the identity matrix. Thus,
we make the following definition.
DEFINITION
Identity Mapping
The linear mapping Id : Rn → Rn defined by Id(~x) = ~x is called the identity
mapping.
Section 3.4 Problems
1. For each of the following linear mappings L
and M, determine L + M and [L + M].
(a) L(x1, x2) = (2x1 − x2, x1),
M(x1, x2) = (3x1 + x2, x1 + x2)
(b) L(x1, x2, x3) = (x1 + x2, x1 + x3),
M(x1, x2, x3) = (−x3, x1 + x2 + x3)
(c) L(x1, x2, x3) = (2x1 − x2, x2 + 3x3),
M(x1, x2, x3) = (−2x1 + x2,−x2 − 3x3)
2. For each of the following linear mappings L
and M, determine M ◦ L, L ◦ M, [M ◦ L], and
[L ◦ M].
(a) L(x1, x2) = (2x1 − x2, x1)
M(x1, x2) = (3x1 + x2, x1 + x2)
(b) L(x1, x2, x3) = (x1 − 2x2, x1 + x2 + x3),
M(x1, x2) = (0, x1 + x2, x1)
(c) L(x1, x2) = (3x1 + x2, 5x1 + 2x2),
M(x1, x2) = (2x1 − x2,−5x1 − 3x2)
3. Let L : Rn → Rm and M : Rm → Rp be linear
mappings.
(a) Prove that if Range(L) = Rm and
Range(M) = Rp, then Range(M ◦ L) = Rp.
(b) Give an example where Range(L) , Rm,
but Range(M ◦ L) = Rp.
(c) Is it possible to give an example where
Range(M) , Rp, but
Range(M ◦ L) = Rp? Explain.
4. Prove properties V4, V5, V8 of Theorem 2.
5. Prove Theorem 3.
6. Let L1, L2 : R3 → R2 be the linear mappings
defined by
L1(x1, x2, x3) = (2x1, x2 − x3)
L2(x1, x2, x3) = (x1 + x2 + x3, x3)
Find all t1 and t2 such that
t1L1 + t2L2 = O
7. Let L1, L2, L3 : R2 → R2 be the linear map-
pings defined by
L1(x1, x2) = (x1 + 3x2, 6x1 + x2)
L2(x1, x2) = (2x1 + 5x2, 8x1 + 3x2)
L3(x1, x2) = (x1 + x2,−2x1 + 3x2)
Find all t1, t2, t3 such that
t1L1 + t2L2 + t3L3 = O
8. Let L denote the set of all linear operators
L : Rn → Rn. Prove that there exists a linear
mapping Id ∈ L such that L ◦ Id = L = Id ◦ L
for every L ∈ L.
Chapter 4
Vector Spaces
4.1 Vector Spaces
We have seen that Rn, subspaces of Rn, the set Mm×n(R), and the set L of all possible
linear mappings L : Rn → Rm satisfy the same ten properties. In fact, as we will see,
there are many other sets with operations of addition and scalar multiplication that
satisfy these same properties. Instead of studying each of these separately, it makes
sense to define and study one abstract concept which contains all of them.
DEFINITION
Vector Space
A set V with an operation of addition, denoted ~x + ~y, and an operation of scalar
multiplication, denoted c~x, is called a vector space over R if for every ~v, ~x, ~y ∈ V
and c, d ∈ R we have:

V1 ~x + ~y ∈ V
V2 (~x + ~y) + ~v = ~x + (~y + ~v)
V3 ~x + ~y = ~y + ~x
V4 There is a vector ~0 ∈ V, called the zero vector, such that ~x + ~0 = ~x for all ~x ∈ V
V5 For every ~x ∈ V there exists (−~x) ∈ V such that ~x + (−~x) = ~0
V6 c~x ∈ V
V7 c(d~x) = (cd)~x
V8 (c + d)~x = c~x + d~x
V9 c(~x + ~y) = c~x + c~y
V10 1~x = ~x
We call the elements of V vectors.
Many textbooks will denote vectors in a vector space in boldface, i.e. x. However,
in this book we will use the usual vector symbol ~x as it is more readable and gives
students a nice way of writing vectors on assignments and tests.
REMARKS
1. We sometimes denote addition by ~x⊕~y and scalar multiplication by c⊙~x to dis-
tinguish these from “normal” operations of addition and scalar multiplication.
See Example 8 below.
2. In some cases, when working with multiple vector spaces, we denote the zero
vector in a vector space V by ~0V.
3. The ten vector space axioms define a “structure” for the set based on the oper-
ations of addition and scalar multiplication. The study of vector spaces is the
study of this structure. Note that there may be additional operations defined on
some vector spaces that are not included in this structure. For example, ma-
trix multiplication in M2×2(R) is an additional operation that is not related to
M2×2(R) being a vector space.
4. The definition of a vector space makes sense if the scalars are allowed to be
taken from any set that has all the same important properties as R. Such sets
are called fields, for example, Q or C. In Chapter 11, we will consider the case
where the scalars are allowed to be complex. Until then, in this book when we
say vector space we mean a vector space over R.
EXAMPLE 1 In Chapter 1, we saw that Rn and any subspace of Rn under standard addition and
scalar multiplication of vectors satisfy all ten vector space axioms and hence are
vector spaces.
EXAMPLE 2 The set Mm×n(R) is a vector space under standard addition and scalar multiplication
of matrices.
EXAMPLE 3 The set L of all linear mappings L : Rn → Rm is a vector space under standard
addition and scalar multiplication of linear mappings.
EXAMPLE 4 Denote the set of all polynomials of degree at most n with real coefficients by Pn(R).
That is,
Pn(R) = {a0 + a1x + · · · + anx^n | a0, a1, . . . , an ∈ R}

Define standard addition and scalar multiplication of polynomials by

(a0 + · · · + anx^n) + (b0 + · · · + bnx^n) = (a0 + b0) + · · · + (an + bn)x^n
c(a0 + · · · + anx^n) = (ca0) + · · · + (can)x^n
for all c ∈ R. It is easy to verify that Pn(R) is a vector space under these operations.
We find that the zero vector in Pn(R) is the polynomial z(x) = 0 for all x, and the
additive inverse of p(x) = a0 + a1x + · · · + anxn is (−p)(x) = −a0 − a1x − · · · − anxn.
EXAMPLE 5 The set P(R) of all polynomials with real coefficients is a vector space under standard
addition and scalar multiplication of polynomials.
EXAMPLE 6 The set C[a, b] of all continuous functions defined on the interval [a, b] is a vector
space under standard addition and scalar multiplication of functions.
EXAMPLE 7 Let V = {~0} and define addition by ~0 + ~0 = ~0, and scalar multiplication by c~0 = ~0 for
all c ∈ R. Show that V is a vector space under these operations.
Solution: To show that V is a vector space, we need to show that it satisfies all ten
axioms.
V1 The only element in V is ~0 and ~0 + ~0 = ~0 ∈ V.
V2 (~0 + ~0) + ~0 = ~0 + ~0 = ~0 + (~0 + ~0).
V3 ~0 + ~0 = ~0 = ~0 + ~0.
V4 ~0 satisfies ~0 + ~0 = ~0 and ~0 ∈ V, so ~0 is the zero vector in V.
V5 The additive inverse of ~0 is ~0 since ~0 + ~0 = ~0 and ~0 ∈ V.
V6 c~0 = ~0 ∈ V.
V7 c(d~0) = c~0 = ~0 = (cd)~0.
V8 (c + d)~0 = ~0 = ~0 + ~0 = c~0 + d~0 .
V9 c(~0 + ~0) = c~0 = ~0 = ~0 + ~0 = c~0 + c~0.
V10 1~0 = ~0.
for all c, d ∈ R. Thus, V is a vector space. It is called the trivial vector space.
EXAMPLE 8 Let S = {x ∈ R | x > 0}. Define addition in S by x ⊕ y = xy, and define scalar
multiplication by c ⊙ x = x^c for all x, y ∈ S and all c ∈ R. Prove that S is a vector
space under these operations.

Solution: We will show that S satisfies all ten vector space axioms. For any x, y, z ∈ S
and c, d ∈ R we have

V1 x ⊕ y = xy > 0 since x > 0 and y > 0. Hence, x ⊕ y ∈ S.
V2 (x ⊕ y) ⊕ z = (xy) ⊕ z = (xy)z = x(yz) = x ⊕ (yz) = x ⊕ (y ⊕ z).
V3 x ⊕ y = xy = yx = y ⊕ x.
V4 The zero vector is 1 since 1 ∈ S and x ⊕ 1 = x · 1 = x for all x ∈ S.
V5 If x ∈ S, then (1/x) ⊕ x = (1/x)x = 1. Thus, 1/x is the additive inverse of x.
Also, 1/x > 0 since x > 0 and hence 1/x ∈ S.
V6 c ⊙ x = x^c > 0 since x > 0. So, c ⊙ x ∈ S.
V7 c ⊙ (d ⊙ x) = c ⊙ x^d = (x^d)^c = x^(cd) = (cd) ⊙ x.
V8 (c + d) ⊙ x = x^(c+d) = x^c x^d = x^c ⊕ x^d = (c ⊙ x) ⊕ (d ⊙ x).
V9 c ⊙ (x ⊕ y) = c ⊙ (xy) = (xy)^c = x^c y^c = x^c ⊕ y^c = (c ⊙ x) ⊕ (c ⊙ y).
V10 1 ⊙ x = x^1 = x.

Therefore, S is a vector space.
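The unusual operations of Example 8 can be tested on concrete numbers. This sketch (plain Python; the helper names `vadd` and `smul` are ours) spot-checks a few of the axioms:

```python
import math

# The "exotic" vector space S = (0, inf) with x (+) y = xy and c (.) x = x**c.
def vadd(x, y):
    return x * y

def smul(c, x):
    return x ** c

x, c, d = 2.0, 3.0, -1.5

# V8: (c + d) (.) x  ==  (c (.) x) (+) (d (.) x)
print(math.isclose(smul(c + d, x), vadd(smul(c, x), smul(d, x))))  # True
# V4: the "zero vector" is 1
print(vadd(x, 1.0) == x)  # True
# V5: the additive inverse of x is 1/x
print(vadd(x, 1.0 / x) == 1.0)  # True
```

Of course, a spot check on one x is not a proof; the algebra in the example is what establishes the axioms for all x, y, c, d.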
EXAMPLE 9 Let D = { [a1 0; 0 a2] | a1, a2 ∈ R }. Show that D is a vector space under standard
addition and scalar multiplication of matrices.

Solution: We show that D satisfies all ten vector space axioms. For any
A = [a1 0; 0 a2], B = [b1 0; 0 b2], and C = [c1 0; 0 c2] in D and s, t ∈ R we have

V1 A + B = [a1 + b1 0; 0 a2 + b2] ∈ D.
V2 (A + B) + C = A + (B + C) by Theorem 3.1.1.
V3 A + B = B + A by Theorem 3.1.1.
V4 Clearly O2,2 ∈ D, and O2,2 satisfies A + O2,2 = A for all A ∈ D by
Theorem 3.1.1. Hence, O2,2 is the zero vector in D.
V5 By Theorem 3.1.1, the additive inverse of A is [−a1 0; 0 −a2] ∈ D for all A ∈ D.
V6 sA = [sa1 0; 0 sa2] ∈ D.
V7 s(t(A)) = (st)A by Theorem 3.1.1.
V8 (s + t)A = sA + tA by Theorem 3.1.1.
V9 s(A + B) = sA + sB by Theorem 3.1.1.
V10 1A = A by Theorem 3.1.1.

Hence, D is a vector space.
Example 9 should remind you a lot of the Subspace Test we did back in Chapter 1.
In particular, we automatically had that all the axioms V2, V3, V5, V7, V8, V9, and
V10 held since we were using the addition and scalar multiplication operations of a
vector space. We will look more at this below.
EXAMPLE 10 Let V and W be vector spaces. We define the Cartesian product of V and W by

V × W = {(~v, ~w) | ~v ∈ V, ~w ∈ W}

If we define addition and scalar multiplication in V × W by

(~v1, ~w1) ⊕ (~v2, ~w2) = (~v1 + ~v2, ~w1 + ~w2)
c ⊙ (~v1, ~w1) = (c~v1, c~w1)

for all (~v1, ~w1), (~v2, ~w2) ∈ V × W and c ∈ R, then V × W is a vector space.
EXERCISE 1 Prove that the Cartesian product of two vector spaces with addition and scalar multi-
plication defined as in Example 10 is a vector space.
EXAMPLE 11 Show that the set S = {(x, y) | x, y ∈ R} with addition defined by (x1, y1) ⊕ (x2, y2) =
(x1 + y2, x2 + y1) and scalar multiplication defined by c ⊙ (x1, y1) = (cx1, cy1) is not a
vector space.

Solution: To show that S is not a vector space, we only need to find one counterexample
to one of the vector space axioms. Observe that (1, 2) ∈ S and that

(1 + 1) ⊙ (1, 2) = 2 ⊙ (1, 2) = (2, 4)
(1 ⊙ (1, 2)) ⊕ (1 ⊙ (1, 2)) = (1, 2) ⊕ (1, 2) = (1 + 2, 2 + 1) = (3, 3)

Hence, V8 is not satisfied. Thus, S is not a vector space.
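The counterexample in Example 11 can be checked mechanically. A sketch in plain Python (the helper names `vadd` and `smul` are ours):

```python
# The failed axiom V8 from Example 11, checked directly.
def vadd(u, v):
    x1, y1 = u
    x2, y2 = v
    return (x1 + y2, x2 + y1)   # the non-standard addition

def smul(c, u):
    return (c * u[0], c * u[1])

u = (1, 2)
lhs = smul(1 + 1, u)                 # (2, 4)
rhs = vadd(smul(1, u), smul(1, u))   # (3, 3)
print(lhs, rhs, lhs == rhs)          # (2, 4) (3, 3) False
```

One failing instance of one axiom is all it takes: S is not a vector space under these operations.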
EXAMPLE 12 Show that the set Z2 = {(x1, x2) | x1, x2 ∈ Z} is not a vector space under standard addition
and scalar multiplication of vectors.

Solution: Observe that (1, 2) ∈ Z2, but √2 (1, 2) = (√2, 2√2) ∉ Z2. Hence, Z2 does not
satisfy axiom V6, so it is not a vector space.
We have now seen that there are numerous different vector spaces. The advantage of
having all of these belonging to the one abstract concept of a vector space is that we
can study all of them at the same time. That is, if we prove a result for a general vector
space, then it immediately applies for all specific vector spaces. We now demonstrate
this.
THEOREM 1 If V is a vector space, then
(1) 0~x = ~0 for all ~x ∈ V,
(2) (−~x) = (−1)~x for all ~x ∈ V.
Proof: We prove (1) and leave (2) as an exercise. For all ~x ∈ V we have
0~x = 0~x + ~0 by V4
= 0~x + [~x + (−~x)] by V5
= 0~x + [1~x + (−~x)] by V10
= [0~x + 1~x] + (−~x) by V2
= (0 + 1)~x + (−~x) by V8
= 1~x + (−~x) operation of numbers in R
= ~x + (−~x) by V10
= ~0 by V5
�
REMARK
Observe that Theorem 1 implies that the zero vector in a vector space is unique, as is
the additive inverse of any vector in the vector space.
We can now use this result to find the zero vector in any vector space V by simply
finding 0~x for any ~x ∈ V. Similarly, we can find the additive inverse of any ~x ∈ V by
calculating (−1)~x.
EXAMPLE 13 Let S be defined as in Example 8. Observe that we indeed have ~0 = 0 ⊙ x = x^0 = 1, and (−x) = (−1) ⊙ x = x^(−1) = 1/x for any x ∈ S.
Subspaces
Observe that the set D in Example 9 is a vector space that is contained in the larger
vector space M2×2(R). This is exactly the same as we saw in Chapter 1 with subspaces
of Rn.
DEFINITION
Subspace
Let V be a vector space. If S is a subset of V and S is a vector space under the same
operations as V, then S is called a subspace of V.
As in Example 9, we see that we do not need to test all ten vector space axioms to
show that a set S is a subspace of a vector space V. In particular, we get the following
theorem which matches what we saw in Chapter 1.
THEOREM 2 (Subspace Test)
If S is a non-empty subset of V such that ~x + ~y ∈ S and c~x ∈ S for all ~x, ~y ∈ S and
c ∈ R under the operations of V, then S is a subspace of V.
REMARKS
1. It is very important to always remember to check that S is a subset of V and is
non-empty as part of the Subspace Test.
2. As before, we will say that S is closed under addition if ~x + ~y ∈ S for all ~x, ~y ∈ S, and we will say that S is closed under scalar multiplication if c~x ∈ S for all ~x ∈ S and c ∈ R.
EXAMPLE 14 Determine if S1 = {xp(x) | p(x) ∈ P2(R)} is a subspace of P4(R).
Solution: Recall that P2(R) is the set of all polynomials of degree at most 2 and P4(R)
is the set of polynomials of degree at most 4. Consequently, every vector q ∈ S1 has
the form
q(x) = x(a0 + a1x + a2x2) = a0x + a1x2 + a2x3
which is a polynomial of degree at most 4 (actually, of at most 3). Hence, S1 is a
subset of P4(R).
0 + 0x + 0x2 ∈ P2(R), so 0 = x(0 + 0x + 0x2) ∈ S1. Thus, S1 is non-empty.
If q1, q2 ∈ S1, then there exist p1, p2 ∈ P2(R) such that q1(x) = xp1(x) and q2(x) =
xp2(x). We have
q1(x) + q2(x) = xp1(x) + xp2(x) = x(p1(x) + p2(x))
and p1 + p2 ∈ P2(R) as P2(R) is closed under addition. So, q1 + q2 ∈ S1.
Similarly, for any c ∈ R we get
cq1(x) = c(xp1(x)) = x(cp1(x)) ∈ S1
since cp1 ∈ P2(R) as P2(R) is closed under scalar multiplication.
Therefore, S1 is a subspace of P4(R) by the Subspace Test.
EXAMPLE 15 Determine if S2 = { [a1 a2; 0 a3] | a1a2a3 = 0 } is a subspace of M2×2(R).
Solution: Observe that A = [1 1; 0 0] and B = [0 0; 0 1] are both in S2, but
A + B = [1 1; 0 1] ∉ S2
So, S2 is not closed under addition and hence is not a subspace.
EXAMPLE 16 Determine if S3 = {p(x) ∈ P2(R) | p(2) = 0} is a subspace of P2(R).
Solution: By definition S3 is a subset of P2(R).
The zero vector in P2(R) is z(x) = 0. Observe z ∈ S3 since it satisfies z(2) = 0.
Therefore, S3 is non-empty.
If p, q ∈ S3, then p(2) = 0 and q(2) = 0, so (p + q)(2) = p(2) + q(2) = 0 + 0 = 0.
Hence, (p + q) ∈ S3. Thus, S3 is closed under addition.
Similarly, (cp)(2) = cp(2) = c(0) = 0 for all c ∈ R, so (cp) ∈ S3. Thus, it is
also closed under scalar multiplication. Therefore S3 is a subspace of P2(R) by the
Subspace Test.
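Examples 14-16 all apply the Subspace Test in the same way. The closure conditions of Example 16 can also be sanity-checked numerically. The following is an illustration only, not a proof; the helper names `sample_s3` and `p_at_2`, the sampling ranges, and the tolerance are our own choices and not part of the text.

```python
import random

# Randomized sanity check of the Subspace Test for S3 from Example 16.
def sample_s3():
    # A polynomial a + bx + cx^2 lies in S3 when p(2) = a + 2b + 4c = 0,
    # so pick b and c freely and solve for a.
    b, c = random.uniform(-5, 5), random.uniform(-5, 5)
    return (-2 * b - 4 * c, b, c)

def p_at_2(p):
    a, b, c = p
    return a + 2 * b + 4 * c

random.seed(0)
for _ in range(100):
    p, q = sample_s3(), sample_s3()
    t = random.uniform(-5, 5)
    total = tuple(pi + qi for pi, qi in zip(p, q))   # p + q
    scaled = tuple(t * pi for pi in p)               # t * p
    assert abs(p_at_2(total)) < 1e-9
    assert abs(p_at_2(scaled)) < 1e-9
print("closure checks passed")
```

A passing randomized check never proves closure, but a single failing sample would disprove it, which makes this a cheap way to catch errors before writing the proof.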
EXAMPLE 17 Determine if S4 = { [1 2; 3 4] } is a subspace of M2×2(R).
Solution: Observe that 0 [1 2; 3 4] = [0 0; 0 0] ∉ S4. Therefore, S4 is not closed under scalar multiplication and hence is not a subspace.
REMARK
As with subspaces in Chapter 1, normally the easiest way to verify the set S is non-empty is to check if the zero vector of V, ~0V, is in S. Moreover, if ~0V ∉ S, then, as we saw in the example above, S is not closed under scalar multiplication and hence is not a subspace.
EXAMPLE 18 A matrix A is called skew-symmetric if AT = −A. Determine if the set of all n × n
skew-symmetric matrices with real entries is a vector space under standard addition
and scalar multiplication of matrices.
Solution: To prove the set S of skew-symmetric matrices is a vector space, we will
prove that it is a subspace of Mn×n(R).
By definition S is a subset of Mn×n(R). Also, the n×n zero matrix On,n clearly satisfies
AT = −A, thus On,n ∈ S, and so S is non-empty.
Let A and B be any two skew-symmetric matrices. Then,
(A + B)T = AT + BT = (−A) + (−B) = −A − B = −(A + B)
So, A + B is skew-symmetric and hence S is closed under addition.
For any c ∈ R we get (cA)T = c(AT ) = c(−A) = −(cA) so (cA) is skew-symmetric
and therefore S is closed under scalar multiplication.
Thus, S is a subspace of Mn×n(R) by the Subspace Test. Therefore, S is a vector space
by definition of a subspace.
EXERCISE 2 Prove that the set S = { [a1 a2; a3 a4] | a1 + a2 = a3 − a4 } is a subspace of M2×2(R).
Axioms V1 and V6 imply that every vector space is closed under linear combinations.
Therefore, it makes sense to look at the concepts of spanning and linear independence
as we did in Chapter 1.
Spanning
DEFINITION
Span
Let B = {~v1, . . . ,~vk} be a set of vectors in a vector space V. We define the span of B by
SpanB = {c1~v1 + c2~v2 + · · · + ck~vk | c1, . . . , ck ∈ R}
We say that SpanB is spanned by B and that B is a spanning set for SpanB.
EXAMPLE 19 Let B = {1 + 2x + 3x2, 3 + 2x + x2, 5 + 6x + 7x2}. Determine which of the following
vectors is in SpanB.
(a) p(x) = 1 + x + 2x2.
Solution: To determine if p(x) ∈ SpanB, we need to find whether there exist coeffi-
cients c1, c2, and c3 such that
1 + x + 2x2 = c1(1 + 2x + 3x2) + c2(3 + 2x + x2) + c3(5 + 6x + 7x2)
= (c1 + 3c2 + 5c3) + (2c1 + 2c2 + 6c3)x + (3c1 + c2 + 7c3)x2
Comparing coefficients of powers of x on both sides gives us the system of linear
equations
c1 + 3c2 + 5c3 = 1
2c1 + 2c2 + 6c3 = 1
3c1 + c2 + 7c3 = 2
Row reducing the augmented matrix of the system we get
[ 1 3 5 | 1 ]   [ 1  3  5 |  1 ]
[ 2 2 6 | 1 ] ∼ [ 0 −4 −4 | −1 ]
[ 3 1 7 | 2 ]   [ 0  0  0 |  1 ]
Hence, the system is inconsistent, so p(x) ∉ SpanB.
(b) q(x) = 2 + x.
Solution: To determine if q(x) ∈ SpanB, we need to find whether there exist coeffi-
cients c1, c2, and c3 such that
2 + x = c1(1 + 2x + 3x2) + c2(3 + 2x + x2) + c3(5 + 6x + 7x2)
= (c1 + 3c2 + 5c3) + (2c1 + 2c2 + 6c3)x + (3c1 + c2 + 7c3)x2
Comparing coefficients of powers of x on both sides gives us the system of linear
equations
c1 + 3c2 + 5c3 = 2
2c1 + 2c2 + 6c3 = 1
3c1 + c2 + 7c3 = 0
Row reducing the augmented matrix of the system we get
[ 1 3 5 | 2 ]   [ 1 0 2 | −1/4 ]
[ 2 2 6 | 1 ] ∼ [ 0 1 1 |  3/4 ]
[ 3 1 7 | 0 ]   [ 0 0 0 |   0  ]
Hence, the system is consistent, so q(x) ∈ SpanB. In particular, if we solve the system, we get
~c = [c1; c2; c3] = [−1/4; 3/4; 0] + t[−2; −1; 1]
for any t ∈ R. Therefore, there are infinitely many different ways to write q(x) as a linear combination of the vectors in B.
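Both checks in Example 19 are instances of one computation: a polynomial lies in SpanB exactly when the linear system for the coefficients is consistent, i.e. when the coefficient matrix and the augmented matrix have the same rank. A NumPy sketch of this criterion (the function name `in_span` is our own, not from the text):

```python
import numpy as np

def in_span(vectors, target):
    """Return True when `target` is a linear combination of `vectors`.

    Polynomials are represented by coefficient lists, e.g.
    1 + 2x + 3x^2 -> [1, 2, 3].  The system A c = b is consistent
    exactly when rank(A) equals the rank of the augmented matrix [A | b].
    """
    A = np.column_stack(vectors)
    augmented = np.column_stack(vectors + [target])
    return np.linalg.matrix_rank(A) == np.linalg.matrix_rank(augmented)

B = [[1, 2, 3], [3, 2, 1], [5, 6, 7]]  # the three polynomials in B

print(in_span(B, [1, 1, 2]))  # p(x) = 1 + x + 2x^2: False (inconsistent)
print(in_span(B, [2, 1, 0]))  # q(x) = 2 + x: True (consistent)
```

Floating-point rank has its usual caveats for nearly singular matrices, but for small integer examples like these it reproduces the row-reduction answer.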
EXAMPLE 20 Prove that the subspace S = { [a1 a2; 0 a3] | a1, a2, a3 ∈ R } of M2×2(R) is spanned by
B = { [1 1; 0 1], [1 2; 0 −1], [1 3; 0 2] }
Solution: We need to show that every vector of the form [a1 a2; 0 a3] can be written as a linear combination of the vectors in B. That is, we need to show that the equation
c1 [1 1; 0 1] + c2 [1 2; 0 −1] + c3 [1 3; 0 2] = [a1 a2; 0 a3]
is consistent for all a1, a2, a3 ∈ R. Using operations on matrices on the left-hand side, we get
[ c1 + c2 + c3   c1 + 2c2 + 3c3 ; 0   c1 − c2 + 2c3 ] = [a1 a2; 0 a3]
Comparing entries gives the system of linear equations
c1 + c2 + c3 = a1
c1 + 2c2 + 3c3 = a2
c1 − c2 + 2c3 = a3
We row reduce the coefficient matrix to get
[ 1  1 1 ]   [ 1 0 0 ]
[ 1  2 3 ] ∼ [ 0 1 0 ]
[ 1 −1 2 ]   [ 0 0 1 ]
By the System-Rank Theorem, the system is consistent for all a1, a2, a3 ∈ R.
REMARKS
1. As we saw in Chapter 1, when proving that a set B spans a vector space V,
we technically need to prove that SpanB ⊆ V and V ⊆ SpanB. However,
as long as B ⊂ V, we have that SpanB ⊆ V since V is closed under linear
combinations. Thus, it is standard practice to only show that V ⊆ SpanB.
2. To prove that the system in Example 20 was consistent for all a1, a2, a3 ∈ R, we only needed to determine the rank of the coefficient matrix. However, if we had row reduced the corresponding augmented matrix, we would have found equations for c1, c2, and c3 in terms of a1, a2, and a3.
EXERCISE 3 Prove that B = {1 + x + x2, 1 + 2x − x2, 1 + 3x + 2x2} spans P2(R).
EXERCISE 4 Prove that a set of three vectors in M2×2(R) cannot span M2×2(R).
At this time you should notice that all of this is exactly the same as what we did with
spanning in Chapter 1 except that we no longer necessarily have a geometric inter-
pretation of the spanned set. The fact that it is the same should not be surprising since
a general vector space is just a generalization of Rn and our definition of spanning is
exactly the same as before. So, we would expect to have exactly the same theorems as we did for Rn. Indeed, the following theorems and their proofs are exactly the same.
THEOREM 3 If B = {~v1, . . . ,~vk} is a set of vectors in a vector space V, then SpanB is a subspace
of V.
THEOREM 4 Let V be a vector space and ~v1, . . . ,~vk ∈ V. Then, ~vi ∈ Span{~v1, . . . ,~vi−1,~vi+1, . . . ,~vk} if and only if
Span{~v1, . . . ,~vk} = Span{~v1, . . . ,~vi−1,~vi+1, . . . ,~vk}
Linear Independence
As was the case with spanning, we now see that our theory of linear independence is
also exactly the same as in Chapter 1.
DEFINITION
LinearlyDependent
LinearlyIndependent
A set of vectors {~v1, . . . ,~vk} in a vector space V is said to be linearly dependent if
there exist coefficients c1, . . . , ck not all zero such that
~0 = c1~v1 + · · · + ck~vk
A set of vectors {~v1, . . . ,~vk} is said to be linearly independent if the only solution to
~0 = c1~v1 + · · · + ck~vk
is the trivial solution c1 = · · · = ck = 0.
EXAMPLE 21 Determine if the set { [1 2; 1 −2], [1 −2; 2 2], [2 0; 3 0], [1 1; 1 1] } is linearly independent or linearly dependent in M2×2(R).
Solution: Consider
[0 0; 0 0] = c1 [1 2; 1 −2] + c2 [1 −2; 2 2] + c3 [2 0; 3 0] + c4 [1 1; 1 1]
= [ c1 + c2 + 2c3 + c4   2c1 − 2c2 + c4 ; c1 + 2c2 + 3c3 + c4   −2c1 + 2c2 + c4 ]
Comparing entries we get the homogeneous system
c1 + c2 + 2c3 + c4 = 0
2c1 − 2c2 + c4 = 0
c1 + 2c2 + 3c3 + c4 = 0
−2c1 + 2c2 + c4 = 0
We row reduce the corresponding coefficient matrix to get
[  1  1 2 1 ]   [ 1 0 1 0 ]
[  2 −2 0 1 ] ∼ [ 0 1 1 0 ]
[  1  2 3 1 ]   [ 0 0 0 1 ]
[ −2  2 0 1 ]   [ 0 0 0 0 ]
Therefore, by the System-Rank Theorem, the system has infinitely many solutions and hence the set is linearly dependent. In particular, we can find that
[0 0; 0 0] = (−t) [1 2; 1 −2] + (−t) [1 −2; 2 2] + t [2 0; 3 0] + 0 [1 1; 1 1]
for all t ∈ R.
EXAMPLE 22 Determine if the set {1 + x + 2x2, x − x2,−2 + x2} is linearly independent or linearly
dependent in P2(R).
Solution: Consider
0 = c1(1 + x + 2x2) + c2(x − x2) + c3(−2 + x2)
0 + 0x + 0x2 = (c1 − 2c3) + (c1 + c2)x + (2c1 − c2 + c3)x2
Comparing coefficients of like powers of x we get the homogeneous system
c1 − 2c3 = 0
c1 + c2 = 0
2c1 − c2 + c3 = 0
Row reducing the corresponding coefficient matrix gives
[ 1  0 −2 ]   [ 1 0 0 ]
[ 1  1  0 ] ∼ [ 0 1 0 ]
[ 2 −1  1 ]   [ 0 0 1 ]
Hence, the only solution to the system is c1 = c2 = c3 = 0. Thus, the set is linearly independent.
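Examples 21 and 22 both reduce to the same rank computation: vectors written in coordinates are linearly independent exactly when the matrix having them as columns has rank equal to the number of vectors. A hedged NumPy sketch (the name `is_linearly_independent` and the flattening convention are our own):

```python
import numpy as np

def is_linearly_independent(vectors):
    """Coordinate vectors are linearly independent exactly when the
    matrix with them as columns has rank equal to the number of
    vectors, i.e. A c = 0 has only the trivial solution."""
    A = np.column_stack(vectors)
    return np.linalg.matrix_rank(A) == len(vectors)

# Example 22: {1 + x + 2x^2, x - x^2, -2 + x^2} as coefficient vectors
polys = [[1, 1, 2], [0, 1, -1], [-2, 0, 1]]
print(is_linearly_independent(polys))  # True

# Example 21: the four 2x2 matrices, each flattened row-by-row into R^4
mats = [[1, 2, 1, -2], [1, -2, 2, 2], [2, 0, 3, 0], [1, 1, 1, 1]]
print(is_linearly_independent(mats))   # False
```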
As expected, we get the following theorems which match what we did in Chapter 1.
THEOREM 5 A set of vectors {~v1, . . . ,~vk} in a vector space V is linearly dependent if and only if
~vi ∈ Span{~v1, . . . ,~vi−1,~vi+1, . . . ,~vk} for some i, 1 ≤ i ≤ k.
THEOREM 6 Any set of vectors {~v1, . . . ,~vk} in a vector space V which contains the zero vector is
linearly dependent.
EXERCISE 5 Prove that a set of four vectors in P2(R) must be linearly dependent.
Section 4.1 Problems
1. Determine, with proof, which of the following
are subspaces of the given vector space.
(a) S1 = {x2 p(x) | p(x) ∈ P2(R)} of P4(R)
(b) S2 = { [a1 a2; 0 a3] ∈ M2×2(R) | a1 + a3 = 0 } of M2×2(R)
(c) S3 = {p(x) ∈ P2(R) | p(1) = 0} of P2(R)
(d) S4 = { [0 0; 0 0] } of M2×2(R)
(e) S5 = {a + bx + cx2 ∈ P2(R) | a = c} of P2(R)
(f) S6 = {A ∈ M2×2(R) | AT = A} of M2×2(R)
(g) S7 = {1, x, x2} of P2(R)
(h) S8 = Span{1 + x + x2, x + x2 + x3, 1 + x2 + x3} of P3(R)
(i) S9 = { [a1 a2; a3 a4] ∈ M2×2(R) | a1 + a3 = 0, a2 = 2a4 } of M2×2(R)
2. In each case, determine if ~v is in SpanB.
(a) ~v = 1 + x2, B = {1 + x + x2, −1 + 2x + 2x2, 5 + x + x2}
(b) ~v = [2 5; 4 4], B = { [1 0; −1 1], [0 1; 2 2], [3 1; −3 1] }
(c) ~v = [0; −3; 3; −6], B = { [1; 2; 0; −1], [1; 3; 1; −1], [2; 4; 3; −5] }
(d) ~v = −3x + 3x2 − 6x3, B = {1 + 2x − x3, 1 + 3x + x2 − x3, 2 + 4x + 3x2 − 5x3}
(e) ~v = [0 −3; 3 −6], B = { [1 2; 0 −1], [1 3; 1 −1], [2 4; 3 −5] }
3. In each case, determine if B is linearly independent.
(a) B = {3 − 2x + x2, 2 − 5x + 2x2, 6 + 7x − 2x2}
(b) B = { [1 0; 0 1], [0 0; 0 0], [3 −4; −4 3] }
(c) B = { [1 1; −2 1], [2 2; 1 0], [1 −2; 3 4] }
(d) B = {1 + x − 2x2 + x3, 2 + 2x + x2, 1 − 2x + 3x2 + 4x3}
(e) B = { [1 −1; 2 1], [3 1; 2 1], [−1 5; 0 1], [3 1; 2 2] }
4. Let L1, L2, L3 : R2 → R2 be the linear map-
pings defined by
L1(x1, x2) = (x1 + 3x2, 6x1 + x2)
L2(x1, x2) = (2x1 + 5x2, 8x1 + 3x2)
L3(x1, x2) = (x1 + x2,−2x1 + 3x2)
Determine if the set {L1, L2, L3} is linearly in-
dependent or linearly dependent in L.
5. Let V be a vector space and ~v1, . . . ,~vk ∈ V. De-
termine whether each statement is true or false.
Justify your answer.
(a) If B = {~v1, . . . ,~vk} is linearly independent,
then any subset of B is linearly indepen-
dent.
(b) If c~v1 = c~v2, then ~v1 = ~v2.
(c) c~0V = ~0V for all c ∈ R.
6. Let V be a vector space. Prove that {~0V} is a
vector space under the same operations as V.
7. Let S be the set of all n × n matrices with real
entries. Determine if S is a vector space where
addition and scalar multiplication are defined
by A ⊕ B = AB and c · A = cA for all A, B ∈ S and c ∈ R.
8. A sequence is an infinite list of real numbers.
For example 1, 2, 3, 4, 5, 6, . . . is a sequence, as
is 1,−1, 2,−2, 4,−4, . . .. (The terms may or
may not follow a discernible pattern). We define addition and scalar multiplication of sequences in a component-wise fashion, so that
{1, 2, 3, 4, . . .} + {1, −1, 2, 4, . . .} = {2, 1, 5, 8, . . .}
c{1, 2, 3, . . .} = {c, 2c, 3c, . . .}
Verify that the set of sequences is a vector
space.
9. Let V be a vector space. Prove that
(−~x) = (−1)~x for all ~x ∈ V.
10. Prove Theorem 4.
11. Prove Theorem 5.
4.2 Bases and Dimension
We have seen the importance and usefulness of bases in Rn. A basis for a subspace S
of Rn provides an explicit formula for every vector in S. Moreover, we can geometri-
cally view this explicit formula as a coordinate system. That is, a basis of a subspace
of Rn forms a set of coordinate axes for that subspace. Thus, we desire to extend the
concept of bases to general vector spaces.
Bases
Of course, our definition of a basis mimics our definition from Chapter 1.
DEFINITION
Basis
Let V be a vector space. A set B is called a basis for V if B is a linearly independent
spanning set for V.
We define a basis for {~0V} to be the empty set.
EXAMPLE 1 The set {1, x, x2, . . . , xn} is a basis for Pn(R) since it is clearly linearly independent
and spans Pn(R). It is called the standard basis for Pn(R).
EXAMPLE 2 The set { [1 0; 0 0], [0 1; 0 0], [0 0; 1 0], [0 0; 0 1] } is the standard basis for M2×2(R).
EXAMPLE 3 Prove that B = {1 + 2x + x2, 2 + 9x, 3 + 3x + 4x2} is a basis for P2(R).
Solution: First, we need to show that for any polynomial p(x) = c+ bx+ ax2 we can
find c1, c2, c3 such that
c + bx + ax2 = c1(1 + 2x + x2) + c2(2 + 9x) + c3(3 + 3x + 4x2) (4.1)
= (c1 + 2c2 + 3c3) + (2c1 + 9c2 + 3c3)x + (c1 + 4c3)x2
Row reducing the coefficient matrix of the corresponding system gives
[ 1 2 3 ]   [ 1 0 0 ]
[ 2 9 3 ] ∼ [ 0 1 0 ]
[ 1 0 4 ]   [ 0 0 1 ]
By the System-Rank Theorem, SpanB = P2(R) since the system is consistent for every right-hand side [c; b; a]. Moreover, if we let a = b = c = 0 in equation (4.1), we see that we get the unique solution c1 = c2 = c3 = 0, and hence B is also linearly independent. Therefore, it is a basis for P2(R).
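For a square system like the one in Example 3, both basis conditions can be read off from a single determinant: the coefficient matrix is invertible exactly when the system is consistent for every right-hand side (spanning) and the homogeneous system has only the trivial solution (independence). A NumPy sketch of this check, with a tolerance of our own choosing:

```python
import numpy as np

# Columns are the coefficient vectors of 1 + 2x + x^2, 2 + 9x, and
# 3 + 3x + 4x^2 with respect to the standard basis {1, x, x^2}.
A = np.array([[1, 2, 3],
              [2, 9, 3],
              [1, 0, 4]], dtype=float)

# A square coefficient matrix is invertible (nonzero determinant)
# exactly when B both spans P2(R) and is linearly independent.
print(abs(np.linalg.det(A)) > 1e-12)  # True, so B is a basis
```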
EXERCISE 1 Prove that B = {3 − 2x + x2, 2 − 5x + 2x2, 1 − x + x2} is a basis for P2(R).
EXAMPLE 4 Consider the subspace U = { [a b; 0 c] | a, b, c ∈ R } of M2×2(R). Determine whether
B = { [1 1; 0 1], [1 2; 0 −1], [1 3; 0 2] }
is a basis for U.
Solution: We first check to see if the set is linearly independent. Consider
[0 0; 0 0] = c1 [1 1; 0 1] + c2 [1 2; 0 −1] + c3 [1 3; 0 2] = [ c1 + c2 + c3   c1 + 2c2 + 3c3 ; 0   c1 − c2 + 2c3 ]
This gives us the homogeneous system of linear equations
c1 + c2 + c3 = 0
c1 + 2c2 + 3c3 = 0
c1 − c2 + 2c3 = 0
Row reducing the coefficient matrix of the homogeneous system we get
[ 1  1 1 ]   [ 1 0 0 ]
[ 1  2 3 ] ∼ [ 0 1 0 ]
[ 1 −1 2 ]   [ 0 0 1 ]
Hence, the only solution is c1 = c2 = c3 = 0, so the set is linearly independent.
Next, we need to determine if B spans U. Consider
[a b; 0 c] = c1 [1 1; 0 1] + c2 [1 2; 0 −1] + c3 [1 3; 0 2]
This gives a system of linear equations with the same coefficient matrix as above. So, using the same row operations we did above we see that the matrix has a leading one in each row of its RREF, hence the system is consistent by the System-Rank Theorem for all matrices [a b; 0 c]. Consequently, the set spans U. Therefore, B is a basis for U.
EXERCISE 2 Prove that B = { [1 1; 0 0], [1 0; 0 1], [0 0; 1 1] } is not a basis for M2×2(R). Find a subspace S of M2×2(R) such that B is a basis for S.
Finding A Basis
In linear algebra and its applications we often need to find a basis for a vector space.
We now give a general procedure for finding a basis for a vector space which can be
spanned by a finite number of vectors.
ALGORITHM
Let V be a vector space that can be spanned by a finite number of vectors. To find a
basis for V:
1. Find the general form of a vector ~x ∈ V.
2. Write the general form of ~x as a linear combination of vectors ~v1, . . . ,~vk in V.
3. Check if B = {~v1, . . . ,~vk} is linearly independent. If it is, then stop as B is a
basis.
4. If B is linearly dependent, pick some vector ~vi ∈ B that can be written as a
linear combination of the other vectors in B and remove it from the set using
Theorem 4.1.4. Repeat this until the set is linearly independent.
EXAMPLE 5 Find a basis for the subspace S of M2×2(R) of skew-symmetric matrices.
Solution: We first want to find the general form of a vector A ∈ S. Recall that if A = [a b; c d] is skew-symmetric, then AT = −A. So,
[a c; b d] = [−a −b; −c −d]
For this to be true, we must have a = −a, c = −b, b = −c, and d = −d. Thus, a = 0 = d and b = −c. Hence, every vector in S can be written in the form
A = [a b; c d] = [0 b; −b 0] = b [0 1; −1 0]
Thus, S = Span{ [0 1; −1 0] }, so B = { [0 1; −1 0] } is a spanning set for S. Moreover, B contains one non-zero vector, so it is clearly linearly independent. Therefore, B is a basis for S.
EXAMPLE 6 Find a basis for the subspace V = { [a − b; a − c; c − b] | a, b, c ∈ R } of R3.
Solution: Any vector ~v ∈ V has the form
~v = [a − b; a − c; c − b] = a[1; 1; 0] + b[−1; 0; −1] + c[0; −1; 1]
Thus, V = Span{ [1; 1; 0], [−1; 0; −1], [0; −1; 1] }. Consider
[0; 0; 0] = c1[1; 1; 0] + c2[−1; 0; −1] + c3[0; −1; 1] = [c1 − c2; c1 − c3; −c2 + c3]   (4.2)
Row reducing the coefficient matrix of the corresponding system of linear equations gives
[ 1 −1  0 ]   [ 1 0 −1 ]
[ 1  0 −1 ] ∼ [ 0 1 −1 ]
[ 0 −1  1 ]   [ 0 0  0 ]
Hence, we see that equation (4.2) has infinitely many solutions. Therefore, { [1; 1; 0], [−1; 0; −1], [0; −1; 1] } is linearly dependent. A vector equation of the solution set of the system is
[c1; c2; c3] = t[1; 1; 1], t ∈ R
Taking t = 1 gives c1 = c2 = c3 = 1. Substituting these into equation (4.2) gives
[0; −1; 1] = −[1; 1; 0] − [−1; 0; −1]
So, we can apply Theorem 4.1.4, to get that the set B = { [1; 1; 0], [−1; 0; −1] } also spans V. Now consider
[0; 0; 0] = c1[1; 1; 0] + c2[−1; 0; −1] = [c1 − c2; c1; −c2]
Using exactly the same row operations as above gives
[ 1 −1 ]   [ 1 0 ]
[ 1  0 ] ∼ [ 0 1 ]
[ 0 −1 ]   [ 0 0 ]
Hence, the system has a unique solution, so B is linearly independent. Thus, B is a basis for V.
In general, it would be very inefficient to remove one vector at a time to reduce
a spanning set to a basis. Notice that when we removed a vector from B in the
example above, all we did was remove the corresponding column from the matrix
representation of the corresponding system of equations. So, we often can use our
knowledge and understanding of solving systems of equations to remove all linearly
dependent vectors at once. This is demonstrated in the next example.
EXAMPLE 7 Consider the vectors p1, p2, p3, p4, and p5 given by
p1(x) = 4 + 7x − 12x3 + 9x4 p2(x) = −2 + 12x − 8x2 + 4x4
p3(x) = −16 + 3x − 16x2 + 36x3 − 19x4 p4(x) = −6 + 19x2 − 6x3
p5(x) = 82 − 49x − 96x2 − 12x3 + 17x4
in P4(R), and let V = Span{p1, p2, p3, p4, p5}. Determine a basis for V given that the
following two matrices are row equivalent:
[  4  −2 −16 −6  82 ]       [ 1 0 −3 0  5 ]
[  7  12   3  0 −49 ]       [ 0 1  2 0 −7 ]
[  0  −8 −16 19 −96 ]  and  [ 0 0  0 1 −8 ]
[ −12  0  36 −6 −12 ]       [ 0 0  0 0  0 ]
[  9   4 −19  0  17 ]       [ 0 0  0 0  0 ]
Solution: Observe that the first matrix is the coefficient matrix of the homogeneous
system corresponding to the equation
0 = c1 p1 + c2 p2 + c3 p3 + c4 p4 + c5 p5
The RREF shows us that a vector equation for the solution set of the system is
[c1; c2; c3; c4; c5] = s[3; −2; 1; 0; 0] + t[−5; 7; 0; 8; 1], s, t ∈ R
This implies that the set {p1, p2, p3, p4, p5} is linearly dependent. However, it also
shows a lot more. If we take s = 1 and t = 0, we get that c1 = 3, c2 = −2, and c3 = 1,
so
0 = 3p1 − 2p2 + p3 + 0p4 + 0p5
Thus, p3 = −3p1 + 2p2. Similarly, taking s = 0 and t = 1 gives
0 = −5p1 + 7p2 + 0p3 + 8p4 + p5
and so p5 = 5p1 − 7p2 − 8p4. Hence, by Theorem 4.1.4, we get that V =
Span{p1, p2, p4}. Moreover, if we ignore the third and fifth columns of the matrices,
we see that the equation
0 = c1 p1 + c2 p2 + c4 p4
has a unique solution, hence {p1, p2, p4} is linearly independent. Thus, {p1, p2, p4} is
a basis for V.
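The procedure of Example 7 — keep the vectors corresponding to pivot columns of the RREF and discard the rest — can be sketched in a few lines. The function name `basis_from_spanning_set` is our own; it uses rank checks rather than literal row reduction, but keeps exactly the pivot-column vectors:

```python
import numpy as np

def basis_from_spanning_set(vectors):
    """Keep each vector that raises the rank of the set kept so far,
    i.e. each vector not in the span of the previous keepers.  These
    are exactly the pivot columns of the matrix whose columns are the
    given vectors, so the result is a basis for their span."""
    kept, rank = [], 0
    for v in vectors:
        r = np.linalg.matrix_rank(np.column_stack(kept + [v]))
        if r > rank:
            kept.append(v)
            rank = r
    return kept

# The five polynomials of Example 7 as coefficient vectors in R^5
p1 = [4, 7, 0, -12, 9]
p2 = [-2, 12, -8, 0, 4]
p3 = [-16, 3, -16, 36, -19]
p4 = [-6, 0, 19, -6, 0]
p5 = [82, -49, -96, -12, 17]

basis = basis_from_spanning_set([p1, p2, p3, p4, p5])
print(basis == [p1, p2, p4])  # True: the basis found in Example 7
```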
Dimension
In Chapter 1, we geometrically understood that Rn is n-dimensional and, for example,
a plane in Rn is two-dimensional. We now generalize the concept of dimension to
vector spaces. We begin with two important theorems.
THEOREM 1 Let B = {~v1, . . . ,~vn} be a basis for a vector space V and let C = {~w1, . . . , ~wk} be a set
in V. If k > n, then C is linearly dependent.
Proof: Since ~w1, . . . , ~wk ∈ V and B is a basis for V, we can write every vector
~w1, . . . , ~wk as a linear combination of the vectors in B. Say,
~w1 = a11~v1 + a12~v2 + · · · + a1n~vn
~w2 = a21~v1 + a22~v2 + · · · + a2n~vn
⋮
~wk = ak1~v1 + ak2~v2 + · · · + akn~vn
Consider,
~0 = c1~w1 + · · · + ck~wk (4.3)
= c1(a11~v1 + a12~v2 + · · · + a1n~vn) + · · · + ck(ak1~v1 + ak2~v2 + · · · + akn~vn)
= (c1a11 + · · · + ckak1)~v1 + · · · + (c1a1n + · · · + ckakn)~vn (4.4)
Since B is a basis, it is linearly independent, and hence equation (4.4) implies that
c1a11 + · · · + ckak1 = 0
⋮
c1a1n + · · · + ckakn = 0
Since k > n, this system has infinitely many solutions by the System-Rank Theorem.
Hence, equation (4.3) has infinitely many solutions, and so C is linearly dependent
as required. �
REMARK
Recall that we can think of a basis as a minimal spanning set. This theorem shows
that we can also think of a basis as a maximal linearly independent set. That is, a
basis B is a linearly independent set such that if we add any other vector from the
vector space to B, then the resulting set would be linearly dependent.
THEOREM 2 If B = {~v1, . . . ,~vn} and C = {~w1, . . . , ~wk} are bases for a vector space V, then k = n.
Proof: Since B is a basis and C is linearly independent, we get k ≤ n by Theorem 1.
Similarly, C is a basis and B is linearly independent, so n ≤ k. Thus, n = k. �
Thus, if a vector space V has a basis containing a finite number of elements, then
every basis of V has the same number of elements. This implies that every basis for
Rn contains exactly n vectors, and every basis for a plane in Rn contains exactly two
vectors which matches our geometric understanding of dimension. Thus, we make
the following definition.
DEFINITION
Dimension
If B = {~v1, . . . ,~vn} is a basis for a vector space V, then we say the dimension of V is
n and write
dim V = n
If V is the trivial vector space, then dimV = 0. If V does not have a basis with a
finite number of vectors in it, then V is said to be infinite dimensional.
EXAMPLE 8 The dimension of Rn is n since the standard basis {~e1, . . . , ~en} contains n vectors.
The dimension of Pn(R) is n + 1 since the standard basis {1, x, . . . , xn} contains n + 1
vectors.
The dimension of Mm×n(R) is mn since the standard basis contains mn vectors.
The vector space P(R) of all polynomials with real coefficients is infinite dimensional
since a basis is {1, x, x2, . . .}.
EXAMPLE 9 Since { [0 1; −1 0] } is a basis for the subspace S of M2×2(R) of skew-symmetric matrices, we have that the dimension of S is 1.
EXAMPLE 10 In Example 6, we found that a basis for the subspace V = { [a − b; a − c; c − b] | a, b, c ∈ R } of R3 is B = { [1; 1; 0], [−1; 0; −1] }. Thus, dimV = 2. So, geometrically, V is a plane in R3.
Knowing the dimension of a vector space can be very useful. This is demonstrated
in the following theorem.
THEOREM 3 If V is an n-dimensional vector space with n > 0, then
(1) a set of more than n vectors in V must be linearly dependent
(2) a set of fewer than n vectors in V cannot span V
(3) a set of n vectors in V is linearly independent if and only if it spans V
REMARK
Observe that (3) shows that if we have a set B of n linearly independent vectors in
an n-dimensional vector space V, then B is a basis for V. Similarly, if C contains n
vectors in V and SpanC = V, then C is a basis for V.
EXAMPLE 11 Find a basis for the hyperplane x1 + x2 + 2x3 + x4 = 0 in R4 and then extend the basis to obtain a basis for R4.
Solution: By definition a hyperplane in R4 is 3-dimensional. Hence, by Theorem 3, any set of 3 linearly independent vectors in the hyperplane will form a basis for the hyperplane. Observe that the vectors ~v1 = [1; −1; 0; 0], ~v2 = [0; 2; −1; 0], and ~v3 = [0; 0; 1; −2] satisfy the equation of the hyperplane and hence are in the hyperplane. Now, consider
[0; 0; 0; 0] = c1~v1 + c2~v2 + c3~v3 = [c1; −c1 + 2c2; −c2 + c3; −2c3]
The only solution is c1 = c2 = c3 = 0, so the vectors are linearly independent. Thus, B = {~v1,~v2,~v3} is a basis for the hyperplane.
To extend the basis B to a basis for R4, we just need to add a vector ~v4 so that {~v1,~v2,~v3,~v4} is linearly independent. Since ~v4 = [1; 0; 0; 0] does not satisfy the equation of the hyperplane, it is not in Span{~v1,~v2,~v3}, and so {~v1,~v2,~v3,~v4} is linearly independent by Theorem 4.1.5. Therefore, {~v1,~v2,~v3,~v4} is a basis for R4.
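The extension step of Example 11 generalizes: given a linearly independent set in R^n, try the standard basis vectors one at a time and keep those that raise the rank. A sketch under the assumption that vectors are given as coordinate lists (the name `extend_to_basis` is ours):

```python
import numpy as np

def extend_to_basis(vectors, n):
    """Extend a linearly independent set in R^n to a basis for R^n by
    appending standard basis vectors whenever they raise the rank."""
    kept = list(vectors)
    rank = np.linalg.matrix_rank(np.column_stack(kept))
    for i in range(n):
        e = [0.0] * n
        e[i] = 1.0
        r = np.linalg.matrix_rank(np.column_stack(kept + [e]))
        if r > rank:
            kept.append(e)
            rank = r
        if rank == n:
            break
    return kept

# The hyperplane basis from Example 11
v1, v2, v3 = [1, -1, 0, 0], [0, 2, -1, 0], [0, 0, 1, -2]
full = extend_to_basis([v1, v2, v3], 4)
print(len(full))  # 4: here e1 already raises the rank, as in the text
```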
THEOREM 4 If V is an n-dimensional vector space and {~v1, . . . ,~vk} is a linearly indepen-
dent set in V with k < n, then there exist vectors ~wk+1, . . . , ~wn in V such that
{~v1, . . . ,~vk, ~wk+1, . . . , ~wn} is a basis for V.
We will use this fact that we can always extend a linearly independent set to a basis
for a finite dimensional vector space in a few proofs later in the book.
COROLLARY 5 If S is a subspace of a finite dimensional vector space V, then dimS ≤ dimV.
Section 4.2 Problems
1. Determine if the given set B is a basis for the
specified vector space.
(a) B = {1 + x2, 1 − x + x3, 2x + x2 − 3x3, 1 + 4x + x2, 1 − x − x2 + x3} of P3(R)
(b) B = {1 + 2x + x2, 2 + 9x, 3 + 3x + 4x2} of
P2(R)
(c) B = { [1 1; 0 1], [1 2; 0 −1], [1 3; 0 2] } of the subspace U = { [a b; 0 c] | a, b, c ∈ R } of M2×2(R)
(d) B = { [1 2; 3 4], [5 6; 7 8], [9 0; 1 2] } of M2×2(R)
2. Find a basis and the dimension for each of the
following subspaces.
(a) S1 = {xp(x) | p(x) ∈ P2(R)} of P4(R)
(b) S2 = { [a1 a2; 0 a3] | a1 + a3 = 0 } of M2×2(R)
(c) S3 = {p(x) ∈ P2(R) | p(2) = 0} of P2(R)
(d) S4 = { [0 0; 0 0] } of M2×2(R)
(e) S5 = {a + bx + cx2 | a = c} of P2(R)
(f) S6 = {A ∈ M2×2(R) | AT = A} of M2×2(R)
(g) S7 = { [a1 a2; a3 a4] | a1 + a3 = 0, a2 = 2a4 } of M2×2(R)
(h) S8 = Span{1 + x, 1 + x + x2, x + 2x2, x + x2 + x3, 1 + x2 + x3} of P3(R)
(i) S9 = Span{ [1; −3; 2], [−1; 4; −1], [1; −1; 4], [0; 2; 2], [0; 1; 1] } of R3
3. State the standard basis for M2×3(R).
4. Prove Theorem 3.
5. Find a basis for P3(R) that includes the vectors
~v1 = 1 + x + x3 and ~v2 = 1 + x2.
6. Find a basis for M2×2(R) that includes the vectors ~v1 = [1 2; 2 1] and ~v2 = [1 2; 3 4].
7. Find a basis for the hyperplane in R4 defined by
x1+ x2+2x3+ x4 = 0, and then extend the basis
to obtain a basis for R4.
8. Prove that if S is a subspace of a vector space
V and dimV = dimS, then S = V.
9. Prove Theorem 4 and Corollary 5.
10. Prove or disprove the following statements.
(a) If V is an n-dimensional vector space and
{~v1, . . . ,~vk} is a linearly independent set in
V, then k ≤ n.
(b) Every basis for P2(R) has exactly 2 vectors
in it.
(c) If {~v1,~v2} is a basis for a 2-dimensional vector space V, then {a~v1 + b~v2, c~v1 + d~v2} is also a basis for V for any non-zero real numbers a, b, c, d.
(d) If B = {~v1, . . . ,~vk} spans a vector space V,
then some subset of B is a basis for V.
(e) If {~v1, . . . ,~vk} is a linearly independent set
in a vector space V, then dimV ≥ k.
(f) Every set of four non-zero matrices in
M2×2(R) is a basis for M2×2(R).
11. Prove that if S and T are both 3-dimensional subspaces of a 5-dimensional vector space V, then S ∩ T ≠ {~0}.
4.3 Coordinates
Recall that when we write ~x = [x1; . . . ; xn] in Rn we really mean ~x = x1~e1 + · · · + xn~en where
{~e1, . . . , ~en} is the standard basis for Rn. For example, when you originally learned to
plot the point (1, 2) in the xy-plane, you were taught that this means you move 1 in
the x-direction (~e1) and 2 in the y-direction (~e2). We can extend this idea to bases for
general vector spaces.
THEOREM 1 If B = {~v1, . . . ,~vn} is a basis for a vector space V, then every ~v ∈ V can be written as
a unique linear combination of the vectors in B.
Of course, the proof is the same as Theorem 1.2.4. Moreover, the theorem shows
us that we can think of any basis B = {~v1, . . . ,~vn} for a vector space V as forming
coordinate axes for V. That is, we can think of a vector ~v = b1~v1 + · · · + bn~vn in V
as being b1 in the ~v1 direction, b2 in the ~v2 direction, etc. Hence, just like in Rn, it is
really the coefficients of the linear combination that identify the vector.
The next example shows that once we have written out the vectors with respect to
a basis, performing operations on vectors in any finite dimensional vector space is
exactly the same as operations on vectors in Rn.
EXAMPLE 1 Let ~x = [1; −2; 3], ~y = [0; 5; −4], p(x) = 1 − 2x + 3x2, q(x) = 5x − 4x2. Also, let {~v1,~v2,~v3} be a basis for V = Span{~v1,~v2,~v3} and let ~v = ~v1 − 2~v2 + 3~v3, and ~w = 0~v1 + 5~v2 − 4~v3. We have
~x + ~y = [1 + 0; −2 + 5; 3 − 4] = [1; 3; −1]
p(x) + q(x) = (1 + 0) + (−2 + 5)x + (3 − 4)x2 = 1 + 3x − x2
~v + ~w = (1 + 0)~v1 + (−2 + 5)~v2 + (3 − 4)~v3 = 1~v1 + 3~v2 − ~v3
This motivates the following definition.
DEFINITION
B-Coordinates
B-CoordinateVector
Let B = {~v1, . . . ,~vn} be a basis for a vector space V. If ~v = b1~v1 + · · · + bn~vn, then b1, . . . , bn are called the B-coordinates of ~v, and we define the B-coordinate vector (also called the coordinate vector of ~v with respect to B) by
[~v]B = [b1; . . . ; bn]
EXAMPLE 2 If B = {~v1,~v2,~v3} is a basis for a vector space V and ~v = 3~v1 − 2~v2 + 4~v3, then the B-coordinate vector of ~v is
[~v]B = [3; −2; 4]
EXAMPLE 3 Given that B = { [2; 1], [3; 3] } is a basis for R2 and that [~x]B = [√2; −π], find ~x.
Solution: By definition of coordinates we get
~x = √2 [2; 1] − π [3; 3] = [2√2 − 3π; √2 − 3π]
EXERCISE 1 Given that B = { [1 1; 0 −1], [1 0; 1 1], [0 1; 1 2] } is a basis for the subspace of M2×2(R) which it spans, find A and B where [A]B = [2; 1; 3] and [B]B = [−4; 1; 5].
EXAMPLE 4 Find the coordinate vector of $\vec{x} = \begin{bmatrix} 3 \\ 1 \end{bmatrix}$ with respect to the basis $B = \left\{ \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ -1 \end{bmatrix} \right\}$ for R2.
Solution: We just need to write ~x as a linear combination of the basis vectors. That
is, we need to find c1 and c2 such that
$$\begin{bmatrix} 3 \\ 1 \end{bmatrix} = c_1\begin{bmatrix} 1 \\ 1 \end{bmatrix} + c_2\begin{bmatrix} 1 \\ -1 \end{bmatrix}$$
By inspection (or by solving the corresponding system) we get that c1 = 2 and c2 = 1.
Hence, we have
$$\begin{bmatrix} 3 \\ 1 \end{bmatrix} = 2\begin{bmatrix} 1 \\ 1 \end{bmatrix} + 1\begin{bmatrix} 1 \\ -1 \end{bmatrix}$$
So,
$$[\vec{x}]_B = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$$
Section 4.3 Coordinates 135
EXAMPLE 5 Given that $B = \left\{ \begin{bmatrix} 1 & 1 \\ 0 & -1 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 1 & 2 \end{bmatrix} \right\}$ is a basis for the subspace of M2×2(R)
which it spans, find the B-coordinate vector of $A = \begin{bmatrix} 1 & -3 \\ 2 & 3 \end{bmatrix}$ and of $B = \begin{bmatrix} -1 & 0 \\ 3 & 7 \end{bmatrix}$.
Solution: We need to find c1, c2, c3 and d1, d2, d3 such that
$$c_1\begin{bmatrix} 1 & 1 \\ 0 & -1 \end{bmatrix} + c_2\begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} + c_3\begin{bmatrix} 0 & 1 \\ 1 & 2 \end{bmatrix} = \begin{bmatrix} 1 & -3 \\ 2 & 3 \end{bmatrix}$$
$$d_1\begin{bmatrix} 1 & 1 \\ 0 & -1 \end{bmatrix} + d_2\begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} + d_3\begin{bmatrix} 0 & 1 \\ 1 & 2 \end{bmatrix} = \begin{bmatrix} -1 & 0 \\ 3 & 7 \end{bmatrix}$$
Observe that each corresponding system will have the same coefficient matrix, but a
different augmented part. Thus, we can solve both systems simultaneously by row
reducing a doubly augmented matrix. We get
$$\left[\begin{array}{ccc|cc} 1 & 1 & 0 & 1 & -1 \\ 1 & 0 & 1 & -3 & 0 \\ 0 & 1 & 1 & 2 & 3 \\ -1 & 1 & 2 & 3 & 7 \end{array}\right] \sim \left[\begin{array}{ccc|cc} 1 & 0 & 0 & -2 & -2 \\ 0 & 1 & 0 & 3 & 1 \\ 0 & 0 & 1 & -1 & 2 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right]$$
For the first system we have c1 = −2, c2 = 3, and c3 = −1, and for the second system
we have d1 = −2, d2 = 1, and d3 = 2. Thus,
$$[A]_B = \begin{bmatrix} -2 \\ 3 \\ -1 \end{bmatrix} \quad \text{and} \quad [B]_B = \begin{bmatrix} -2 \\ 1 \\ 2 \end{bmatrix}$$
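The multi-augmented computation in Example 5 is easy to check numerically. A minimal sketch (our own illustration, not part of the text; the helper names are made up) flattens each basis matrix into a column of a coefficient matrix and solves both systems at once with NumPy:

```python
import numpy as np

# Basis of the subspace of M2x2(R) from Example 5; each matrix is
# flattened into one column of the coefficient matrix.
basis = [np.array([[1, 1], [0, -1]]),
         np.array([[1, 0], [1, 1]]),
         np.array([[0, 1], [1, 2]])]
P = np.column_stack([M.flatten() for M in basis])  # 4x3 coefficient matrix

A = np.array([[1, -3], [2, 3]])
B = np.array([[-1, 0], [3, 7]])

# Solve both systems simultaneously; lstsq handles the 4x3 shape, and the
# residual is zero because A and B actually lie in the span of the basis.
rhs = np.column_stack([A.flatten(), B.flatten()])
coords, *_ = np.linalg.lstsq(P, rhs, rcond=None)
coords_A, coords_B = coords[:, 0], coords[:, 1]
```

Running this reproduces the coordinate vectors found by row reduction: `coords_A` is (−2, 3, −1) and `coords_B` is (−2, 1, 2).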
EXERCISE 2 Find the B-coordinate vectors of p(x) = 1 + x + x^2 and q(x) = 1 − 2x + 3x^2 where
B = {1, 1 + x, 1 + x + x^2}.
REMARK
It is important to notice that the coordinates of a vector depend on which basis
is being used for the vector space. Moreover, they also depend on the order of the
vectors in the basis. Thus, to prevent confusion, when we speak of a basis, we mean
an ordered basis.
As intended, coordinate vectors allow us to change from working with vectors in
some vector space with respect to some basis to working with vectors in Rn. In
particular, observe that taking coordinates of a vector ~v with respect to a basis B of a
vector space V can be thought of as a function that takes a vector in V and outputs a
vector in Rn. Not surprisingly, this function is linear.
THEOREM 2 If V is a vector space with basis B = {~v1, . . . , ~vn}, then for any ~v, ~w ∈ V and s, t ∈ R we have
$$[s\vec{v} + t\vec{w}]_B = s[\vec{v}]_B + t[\vec{w}]_B$$
Proof: Since B is a basis for V, there exist b1, . . . , bn, c1, . . . , cn ∈ R such
that ~v = b1~v1 + · · · + bn~vn and ~w = c1~v1 + · · · + cn~vn. Observe that
$$\begin{aligned}
s\vec{v} + t\vec{w} &= s(b_1\vec{v}_1 + \cdots + b_n\vec{v}_n) + t(c_1\vec{v}_1 + \cdots + c_n\vec{v}_n) \\
&= s(b_1\vec{v}_1) + \cdots + s(b_n\vec{v}_n) + t(c_1\vec{v}_1) + \cdots + t(c_n\vec{v}_n) && \text{by V9} \\
&= (sb_1)\vec{v}_1 + \cdots + (sb_n)\vec{v}_n + (tc_1)\vec{v}_1 + \cdots + (tc_n)\vec{v}_n && \text{by V7} \\
&= (sb_1)\vec{v}_1 + (tc_1)\vec{v}_1 + \cdots + (sb_n)\vec{v}_n + (tc_n)\vec{v}_n && \text{by V3} \\
&= (sb_1 + tc_1)\vec{v}_1 + \cdots + (sb_n + tc_n)\vec{v}_n && \text{by V8}
\end{aligned}$$
Therefore,
$$[s\vec{v} + t\vec{w}]_B = \begin{bmatrix} sb_1 + tc_1 \\ \vdots \\ sb_n + tc_n \end{bmatrix} = s\begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix} + t\begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix} = s[\vec{v}]_B + t[\vec{w}]_B$$
□
Change of Coordinates
We have seen how to find different bases for a vector space V and how to find the
coordinates of a vector ~v ∈ V with respect to any basis B of V. In some cases, it is
useful to have a quick way of determining the coordinates of ~v with respect to some
basis C for V if we are given the coordinates of ~v with respect to the basis B of V.
For example, in some applications it is useful to write polynomials in terms of powers
of x − c for some real number c. That is, given a polynomial p(x) = a_0 + a_1 x + · · · + a_n x^n,
we want to write it in the form
$$p(x) = b_0 + b_1(x - c) + b_2(x - c)^2 + \cdots + b_n(x - c)^n$$
Such a situation may arise if the values of x you are working with are very close to
c. If you are working with many polynomials, then it would be very helpful to have
a fast way of converting each polynomial. We can rephrase this problem in terms of
linear algebra.
Let S = {1, x, . . . , x^n} be the standard basis for Pn(R). Given
$$[p(x)]_S = \begin{bmatrix} a_0 \\ \vdots \\ a_n \end{bmatrix}$$
we want to determine [p(x)]_B where B is the basis B = {1, x − c, (x − c)^2, . . . , (x − c)^n} for
Pn(R).
Since taking coordinates is a linear operation by Theorem 2, we get
$$\begin{aligned}
[p(x)]_B &= [a_0 1 + a_1 x + \cdots + a_n x^n]_B \\
&= a_0[1]_B + a_1[x]_B + \cdots + a_n[x^n]_B \\
&= \begin{bmatrix} [1]_B & [x]_B & \cdots & [x^n]_B \end{bmatrix} \begin{bmatrix} a_0 \\ \vdots \\ a_n \end{bmatrix}
\end{aligned}$$
So, the multiplication of this matrix and the coordinate vector of p(x) with respect to
the standard basis gives us the coordinates of p(x) with respect to the new basis B.
We demonstrate how this works with an example.
EXAMPLE 6 Let B = {1, x − 3, (x − 3)^2}. Determine the matrix $\begin{bmatrix} [1]_B & [x]_B & [x^2]_B \end{bmatrix}$ and use it to
write the polynomials p(x) = 2 + x + 3x^2, q(x) = x − 4x^2, and r(x) = 3 + 5x − x^2 in
powers of x − 3.
Solution: To find the matrix, we first need to find the coordinates of 1, x, and x^2 with
respect to B. That is, we need to find a1, a2, a3, b1, b2, b3, c1, c2, c3 such that
$$\begin{aligned}
1 &= a_1(1) + a_2(x - 3) + a_3(x - 3)^2 = (a_1 - 3a_2 + 9a_3) + (a_2 - 6a_3)x + a_3 x^2 \\
x &= b_1(1) + b_2(x - 3) + b_3(x - 3)^2 = (b_1 - 3b_2 + 9b_3) + (b_2 - 6b_3)x + b_3 x^2 \\
x^2 &= c_1(1) + c_2(x - 3) + c_3(x - 3)^2 = (c_1 - 3c_2 + 9c_3) + (c_2 - 6c_3)x + c_3 x^2
\end{aligned}$$
Comparing like powers of x we get three systems of linear equations all with the
same coefficient matrix. Thus, to solve all three simultaneously we row reduce the
corresponding multi-augmented matrix. We get
$$\left[\begin{array}{ccc|ccc} 1 & -3 & 9 & 1 & 0 & 0 \\ 0 & 1 & -6 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 & 1 \end{array}\right] \sim \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 1 & 3 & 9 \\ 0 & 1 & 0 & 0 & 1 & 6 \\ 0 & 0 & 1 & 0 & 0 & 1 \end{array}\right]$$
Hence, we get that
$$\begin{bmatrix} [1]_B & [x]_B & [x^2]_B \end{bmatrix} = \begin{bmatrix} 1 & 3 & 9 \\ 0 & 1 & 6 \\ 0 & 0 & 1 \end{bmatrix}$$
Therefore, according to our work above, we get that
$$[2 + x + 3x^2]_B = \begin{bmatrix} 1 & 3 & 9 \\ 0 & 1 & 6 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix} = \begin{bmatrix} 32 \\ 19 \\ 3 \end{bmatrix}$$
Therefore, by definition of coordinates,
$$2 + x + 3x^2 = 32 + 19(x - 3) + 3(x - 3)^2$$
It is easy to verify that this is indeed correct.
It is easy to verify that this is indeed correct.
Similarly,
$$[x - 4x^2]_B = \begin{bmatrix} 1 & 3 & 9 \\ 0 & 1 & 6 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \\ -4 \end{bmatrix} = \begin{bmatrix} -33 \\ -23 \\ -4 \end{bmatrix}$$
$$[3 + 5x - x^2]_B = \begin{bmatrix} 1 & 3 & 9 \\ 0 & 1 & 6 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 3 \\ 5 \\ -1 \end{bmatrix} = \begin{bmatrix} 9 \\ -1 \\ -1 \end{bmatrix}$$
Hence,
$$x - 4x^2 = -33 - 23(x - 3) - 4(x - 3)^2$$
and
$$3 + 5x - x^2 = 9 - (x - 3) - (x - 3)^2$$
It is important to observe how easy it was to convert polynomials from standard
coordinates to B-coordinates once we had found the matrix.
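This conversion can be sketched numerically. Below is a small illustration of Example 6's matrix in action (the function name `to_powers_of` is our own, not the book's); the check at the end evaluates both forms of p(x) at a sample point:

```python
import numpy as np

# The matrix [[1]_B [x]_B [x^2]_B] from Example 6, where
# B = {1, x - 3, (x - 3)^2}.
P = np.array([[1, 3, 9],
              [0, 1, 6],
              [0, 0, 1]])

def to_powers_of(coeffs, P):
    """Map standard coefficients (a0, a1, a2) to B-coordinates."""
    return P @ np.asarray(coeffs)

b = to_powers_of([2, 1, 3], P)   # p(x) = 2 + x + 3x^2

# Both forms of p should agree at any point, e.g. x = 5:
x = 5.0
lhs = 2 + x + 3 * x**2
rhs = b[0] + b[1] * (x - 3) + b[2] * (x - 3) ** 2
```

Here `b` comes out as (32, 19, 3), matching the hand computation.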
Of course, we can repeat the argument above in the case where B and C are any two
bases of a vector space V. If B = {~v1, . . . , ~vn} and $[\vec{x}]_B = \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix}$, then
$$\begin{aligned}
[\vec{x}]_C &= [b_1\vec{v}_1 + \cdots + b_n\vec{v}_n]_C \\
&= b_1[\vec{v}_1]_C + \cdots + b_n[\vec{v}_n]_C \\
&= \begin{bmatrix} [\vec{v}_1]_C & \cdots & [\vec{v}_n]_C \end{bmatrix} \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix} \\
&= \begin{bmatrix} [\vec{v}_1]_C & \cdots & [\vec{v}_n]_C \end{bmatrix} [\vec{x}]_B
\end{aligned}$$
Hence, we make the following definition.
DEFINITION
Change of Coordinates Matrix
Let B = {~v1, . . . , ~vn} and C both be bases for a vector space V. The change of
coordinates matrix from B-coordinates to C-coordinates is defined by
$${}_C P_B = \begin{bmatrix} [\vec{v}_1]_C & \cdots & [\vec{v}_n]_C \end{bmatrix}$$
and for any ~x ∈ V we have
$$[\vec{x}]_C = {}_C P_B [\vec{x}]_B$$
REMARK
Observe that the notation CPB can help you remember which side has the C-
coordinate vector of ~x and which side has the B-coordinate vector of ~x.
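For bases of Rn, the columns of the change of coordinates matrix can be found by solving one linear system per basis vector. A minimal sketch (the helper name is ours, and the bases are the ones used later in Example 8):

```python
import numpy as np

def change_of_coords(B_vectors, C_vectors):
    """Return P with [x]_C = P @ [x]_B, for two bases of R^n given as
    lists of vectors."""
    C = np.column_stack(C_vectors)
    B = np.column_stack(B_vectors)
    # Column j of P solves C @ p_j = b_j, i.e. p_j = [b_j]_C.
    return np.linalg.solve(C, B)

B = [np.array([1, 3]), np.array([2, 1])]
C = [np.array([-1, 1]), np.array([5, -4])]
P = change_of_coords(B, C)
```

With these bases, `P` agrees with the matrix found by row reduction in Example 8: [[19, 13], [4, 3]].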
EXAMPLE 7 Consider the basis $B = \left\{ \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}, \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} -2 \\ 0 \\ -3 \end{bmatrix} \right\}$ for R3. Find the change of coordinates
matrix from B to the standard basis S.
Solution: The columns of the desired change of coordinates matrix are the S-coordinate
vectors of the vectors in B. We have
$$\left[\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}\right]_S = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}, \quad \left[\begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}\right]_S = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}, \quad \left[\begin{bmatrix} -2 \\ 0 \\ -3 \end{bmatrix}\right]_S = \begin{bmatrix} -2 \\ 0 \\ -3 \end{bmatrix}$$
So, the change of coordinates matrix ${}_S P_B$ is
$${}_S P_B = \begin{bmatrix} 1 & -1 & -2 \\ 2 & 0 & 0 \\ 3 & 1 & -3 \end{bmatrix}$$
EXAMPLE 8 Let $B = \left\{ \begin{bmatrix} 1 \\ 3 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \end{bmatrix} \right\}$ and $C = \left\{ \begin{bmatrix} -1 \\ 1 \end{bmatrix}, \begin{bmatrix} 5 \\ -4 \end{bmatrix} \right\}$ be bases for R2. Find ${}_C P_B$ and ${}_B P_C$.
Solution: To find ${}_C P_B$ we need to find the C-coordinate vectors of the vectors in B.
Thus, we need to solve the systems
$$c_1\begin{bmatrix} -1 \\ 1 \end{bmatrix} + c_2\begin{bmatrix} 5 \\ -4 \end{bmatrix} = \begin{bmatrix} 1 \\ 3 \end{bmatrix} \qquad d_1\begin{bmatrix} -1 \\ 1 \end{bmatrix} + d_2\begin{bmatrix} 5 \\ -4 \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$$
Observe that since each system has the same coefficient matrix, the same row operations
will solve each system. To make this easier, we form the multiple augmented matrix
$$\left[\begin{array}{cc|cc} -1 & 5 & 1 & 2 \\ 1 & -4 & 3 & 1 \end{array}\right]$$
Row reducing this matrix gives
$$\left[\begin{array}{cc|cc} -1 & 5 & 1 & 2 \\ 1 & -4 & 3 & 1 \end{array}\right] \sim \left[\begin{array}{cc|cc} 1 & 0 & 19 & 13 \\ 0 & 1 & 4 & 3 \end{array}\right]$$
Hence,
$${}_C P_B = \begin{bmatrix} 19 & 13 \\ 4 & 3 \end{bmatrix}$$
Similarly, for ${}_B P_C$ we need to find the B-coordinate vectors of the vectors in C. We
again form a multiple augmented matrix and row reduce to get
$$\left[\begin{array}{cc|cc} 1 & 2 & -1 & 5 \\ 3 & 1 & 1 & -4 \end{array}\right] \sim \left[\begin{array}{cc|cc} 1 & 0 & 3/5 & -13/5 \\ 0 & 1 & -4/5 & 19/5 \end{array}\right]$$
So,
$${}_B P_C = \begin{bmatrix} 3/5 & -13/5 \\ -4/5 & 19/5 \end{bmatrix}$$
EXAMPLE 9 Let B = {1 + x, 1 + 2x + x^2, x − x^2} be a basis for P2(R). Determine the change of
coordinates matrix from B to the standard basis S = {1, x, x^2} for P2(R).
Solution: We need to find the S-coordinate vectors of the vectors in B. We have
$$[1 + x]_S = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} \qquad [1 + 2x + x^2]_S = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} \qquad [x - x^2]_S = \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix}$$
Hence, the change of coordinates matrix is
$${}_S P_B = \begin{bmatrix} [1 + x]_S & [1 + 2x + x^2]_S & [x - x^2]_S \end{bmatrix} = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 2 & 1 \\ 0 & 1 & -1 \end{bmatrix}$$
EXAMPLE 10 Consider the basis $B = \left\{ \begin{bmatrix} 1 \\ 3 \\ -1 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 3 \\ 4 \\ 1 \end{bmatrix} \right\}$ for R3. Find the change of coordinates matrix
from the standard basis S to B and hence find $\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}_B$. Verify the answer is correct by
using the definition of coordinates.
Solution: The columns of the desired change of coordinates matrix are the B-coordinate
vectors of the standard basis vectors. To find these we need to solve the augmented systems
$$\left[\begin{array}{ccc|c} 1 & 2 & 3 & 1 \\ 3 & 1 & 4 & 0 \\ -1 & 1 & 1 & 0 \end{array}\right], \quad \left[\begin{array}{ccc|c} 1 & 2 & 3 & 0 \\ 3 & 1 & 4 & 1 \\ -1 & 1 & 1 & 0 \end{array}\right], \quad \left[\begin{array}{ccc|c} 1 & 2 & 3 & 0 \\ 3 & 1 & 4 & 0 \\ -1 & 1 & 1 & 1 \end{array}\right]$$
Again, we row reduce the multiple augmented matrix to get
$$\left[\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 3 & 1 & 4 & 0 & 1 & 0 \\ -1 & 1 & 1 & 0 & 0 & 1 \end{array}\right] \sim \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 3/5 & -1/5 & -1 \\ 0 & 1 & 0 & 7/5 & -4/5 & -1 \\ 0 & 0 & 1 & -4/5 & 3/5 & 1 \end{array}\right]$$
Thus, the change of coordinates matrix from S-coordinates to B-coordinates is
$${}_B P_S = \begin{bmatrix} 3/5 & -1/5 & -1 \\ 7/5 & -4/5 & -1 \\ -4/5 & 3/5 & 1 \end{bmatrix}$$
Therefore, we have
$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}_B = {}_B P_S [\vec{x}]_S = \begin{bmatrix} 3/5 & -1/5 & -1 \\ 7/5 & -4/5 & -1 \\ -4/5 & 3/5 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} \frac{3}{5}x_1 - \frac{1}{5}x_2 - x_3 \\ \frac{7}{5}x_1 - \frac{4}{5}x_2 - x_3 \\ -\frac{4}{5}x_1 + \frac{3}{5}x_2 + x_3 \end{bmatrix}$$
To verify that these are the B-coordinates of ~x we apply the definition of coordinates.
We have
$$\begin{aligned}
&\left(\tfrac{3}{5}x_1 - \tfrac{1}{5}x_2 - x_3\right)\begin{bmatrix} 1 \\ 3 \\ -1 \end{bmatrix} + \left(\tfrac{7}{5}x_1 - \tfrac{4}{5}x_2 - x_3\right)\begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix} + \left(-\tfrac{4}{5}x_1 + \tfrac{3}{5}x_2 + x_3\right)\begin{bmatrix} 3 \\ 4 \\ 1 \end{bmatrix} \\
&= \begin{bmatrix} 1 & 2 & 3 \\ 3 & 1 & 4 \\ -1 & 1 & 1 \end{bmatrix} \begin{bmatrix} \frac{3}{5}x_1 - \frac{1}{5}x_2 - x_3 \\ \frac{7}{5}x_1 - \frac{4}{5}x_2 - x_3 \\ -\frac{4}{5}x_1 + \frac{3}{5}x_2 + x_3 \end{bmatrix} \\
&= \begin{bmatrix} 1 & 2 & 3 \\ 3 & 1 & 4 \\ -1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 3/5 & -1/5 & -1 \\ 7/5 & -4/5 & -1 \\ -4/5 & 3/5 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \\
&= \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
\end{aligned}$$
as required.
Notice in Example 10 that the matrix
$$\begin{bmatrix} 1 & 2 & 3 \\ 3 & 1 & 4 \\ -1 & 1 & 1 \end{bmatrix}$$
is the change of coordinates matrix from B-coordinates to S-coordinates. So, the calculations in the example show that ${}_S P_B \, {}_B P_S = I$. Of course, this makes sense since the product should change from
S-coordinates to B-coordinates and then back to S-coordinates. This is exactly how
we will prove the following theorem.
THEOREM 3 If B and C are bases for an n-dimensional vector space V, then the change of coordinates matrices ${}_C P_B$ and ${}_B P_C$ satisfy
$${}_C P_B \, {}_B P_C = I = {}_B P_C \, {}_C P_B$$
Proof: By definition of the change of coordinates matrix we have $[\vec{x}]_C = {}_C P_B [\vec{x}]_B$ and $[\vec{x}]_B = {}_B P_C [\vec{x}]_C$. Combining these gives
$$[\vec{x}]_C = {}_C P_B [\vec{x}]_B = {}_C P_B \, {}_B P_C [\vec{x}]_C$$
Hence, $I[\vec{x}]_C = {}_C P_B \, {}_B P_C [\vec{x}]_C$ for all $[\vec{x}]_C \in \mathbb{R}^n$, so ${}_C P_B \, {}_B P_C = I$ by the Matrices Equal
Theorem. Similarly,
$$[\vec{x}]_B = {}_B P_C [\vec{x}]_C = {}_B P_C \, {}_C P_B [\vec{x}]_B$$
for all $[\vec{x}]_B \in \mathbb{R}^n$, hence ${}_B P_C \, {}_C P_B = I$. □
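Theorem 3 is easy to confirm numerically. A short sketch using the bases of Example 8 (our own check, not part of the text):

```python
import numpy as np

# Bases from Example 8, as columns.
B = np.column_stack([[1, 3], [2, 1]])
C = np.column_stack([[-1, 1], [5, -4]])

# Columns of P_CB are the C-coordinates of B's vectors, and vice versa.
P_CB = np.linalg.solve(C, B)
P_BC = np.linalg.solve(B, C)

prod1 = P_CB @ P_BC
prod2 = P_BC @ P_CB
```

Both products come out as the 2 × 2 identity matrix, as the theorem predicts.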
Section 4.3 Problems
1. Given that B is a basis for the subspace it spans, find ~x such that:
(a) $B = \left\{ \begin{bmatrix} 1 & 3 \\ -1 & 2 \end{bmatrix}, \begin{bmatrix} 3 & 6 \\ -4 & 3 \end{bmatrix} \right\}$, $[\vec{x}]_B = \begin{bmatrix} \sqrt{2} \\ -4 \end{bmatrix}$
(b) B = {1 + x, 1 + x + x^2, 1 + x^2}, $[\vec{x}]_B = \begin{bmatrix} 1 \\ 0 \\ 3 \end{bmatrix}$
2. Consider the basis $B = \left\{ \begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix}, \begin{bmatrix} 3 \\ 7 \\ 3 \end{bmatrix}, \begin{bmatrix} -2 \\ -3 \\ 5 \end{bmatrix} \right\}$ for R3.
Find $\begin{bmatrix} -2 \\ -5 \\ 1 \end{bmatrix}_B$ and $\begin{bmatrix} 4 \\ 8 \\ -3 \end{bmatrix}_B$.
3. In each case, given that B is a basis for the subspace it spans, determine the coordinate vectors
of A and B with respect to B.
(a) $B = \left\{ \begin{bmatrix} 1 & 1 \\ 0 & -1 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 1 & 2 \end{bmatrix} \right\}$, $A = \begin{bmatrix} 1 & -3 \\ 2 & 3 \end{bmatrix}$, $B = \begin{bmatrix} -1 & 0 \\ 3 & 7 \end{bmatrix}$.
(b) $B = \left\{ \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix}, \begin{bmatrix} 2 & 0 \\ 0 & -1 \end{bmatrix} \right\}$, $A = \begin{bmatrix} 0 & 1 \\ 1 & 2 \end{bmatrix}$, $B = \begin{bmatrix} -4 & 1 \\ 1 & 4 \end{bmatrix}$.
4. In each case, given that B is a basis for the set it spans, determine the coordinate vectors of p(x)
and q(x) with respect to B.
(a) B = {1 + x + x^2, 1 + 3x + 2x^2, 4 + x^2}, p(x) = −2 + 8x + 5x^2, q(x) = −4 + 8x + 4x^2.
(b) B = {1 + x − 4x^2, 2 + 3x − 3x^2, 3 + 6x + 4x^2}, p(x) = −3 − 5x − 3x^2, q(x) = x.
5. Given that $B = \left\{ \begin{bmatrix} 3 \\ 1 \end{bmatrix}, \begin{bmatrix} 5 \\ 3 \end{bmatrix} \right\}$ and $C = \left\{ \begin{bmatrix} 2 \\ 1 \end{bmatrix}, \begin{bmatrix} 5 \\ 2 \end{bmatrix} \right\}$ are both bases for R2, find ${}_B P_C$ and ${}_C P_B$.
6. Given the bases B = {1, x − 1, (x − 1)^2} and C = {1 + x + x^2, 1 + 3x − x^2, 1 − x − x^2} for P2(R),
find ${}_B P_C$ and ${}_C P_B$.
7. Let B = {1 − x + x^2, 1 − 2x + 3x^2, 1 − 2x + 4x^2} be a basis for P2(R) and let S denote the standard
basis for P2(R).
(a) Find ${}_S P_B$ and ${}_B P_S$.
(b) Find p(x) if $[p(x)]_B = \begin{bmatrix} 3 \\ 1 \\ -2 \end{bmatrix}$.
(c) Find $[q(x)]_B$ if q(x) = 1 + x^2.
(d) Find r(x) if $[r(x)]_B = \begin{bmatrix} 2 \\ -3 \\ 2 \end{bmatrix}$.
8. Let $T = \left\{ \begin{bmatrix} a_1 & a_2 \\ 0 & a_3 \end{bmatrix} \,\middle|\, a_1, a_2, a_3 \in \mathbb{R} \right\}$ be the vector space of upper triangular matrices with
standard addition and scalar multiplication of matrices. Given that
$$S = \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \right\} \quad \text{and} \quad B = \left\{ \begin{bmatrix} 3 & 0 \\ 0 & 3 \end{bmatrix}, \begin{bmatrix} -1 & 3 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 1 & -3 \\ 0 & 1 \end{bmatrix} \right\}$$
are both bases for T, find the change of coordinates matrix from B to S and the change of
coordinates matrix from S to B. Verify that the product of the change of coordinates matrices
is I.
9. Let $A = \begin{bmatrix} 1 & 2 & 2 \\ 2 & 4 & 1 \\ 3 & 8 & 0 \end{bmatrix}$. Find a matrix B such that AB = I.
10. If B = {~v1, . . . ,~vn} is a basis for V, show that
{[~v1]B, . . . , [~vn]B} is a basis for Rn.
11. Suppose that B = {~v1, . . . ,~vn} and C =
{~w1, . . . , ~wn} are both bases for a vector space
V such that [~x]B = [~x]C for all ~x ∈ V. Must it
be true that ~vi = ~wi for each 1 ≤ i ≤ n? Explain
or prove your conclusion.
Chapter 5
Inverses and Determinants
5.1 Matrix Inverses
In the last chapter, we saw that if B and C are bases for a finite dimensional vector
space V, then the change of coordinates matrix BPC and the change of coordinates
matrix CPB satisfy
$${}_B P_C \, {}_C P_B = I = {}_C P_B \, {}_B P_C$$
Since the products of these matrices give the identity matrix (the multiplicative iden-
tity), these matrices are multiplicative inverses of each other. We now develop and
explore the idea of inverse matrices.
DEFINITION
Left and Right Inverse
Let A be an m × n matrix. If B is an n × m matrix such that AB = Im, then B is called
a right inverse of A. If C is an n × m matrix such that CA = In, then C is called a left
inverse of A.
EXAMPLE 1 If $A = \begin{bmatrix} 1 & 2 & 2 \\ -1 & 3 & 1 \end{bmatrix}$ and $B = \begin{bmatrix} 3/5 & -2/5 \\ 1/5 & 1/5 \\ 0 & 0 \end{bmatrix}$, then B is a right inverse of A since
$$\begin{bmatrix} 1 & 2 & 2 \\ -1 & 3 & 1 \end{bmatrix} \begin{bmatrix} 3/5 & -2/5 \\ 1/5 & 1/5 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$
Note that this also shows that A is a left inverse of B.
How would we try to find a right inverse of an m × n matrix A? We want to find an
n × m matrix $B = \begin{bmatrix} \vec{b}_1 & \cdots & \vec{b}_m \end{bmatrix}$ such that
$$I_m = AB$$
$$\begin{bmatrix} \vec{e}_1 & \cdots & \vec{e}_m \end{bmatrix} = A\begin{bmatrix} \vec{b}_1 & \cdots & \vec{b}_m \end{bmatrix} = \begin{bmatrix} A\vec{b}_1 & \cdots & A\vec{b}_m \end{bmatrix}$$
Comparing columns, we see that we need to find ~bi such that $A\vec{b}_i = \vec{e}_i$ for 1 ≤ i ≤ m. But, this is just solving m systems of linear equations with the same coefficient
matrix. To do this, we row reduce the multiple augmented matrix $[\, A \mid I_m \,]$ to RREF.
For this to work, we require that all m systems are consistent. By the result of Exercise 3.1.7 (b) we get that rank(A) = m. Hence, by Theorem 2.2.4, we must have
n ≥ m.
Moreover, if n > m, then by the System Rank Theorem (2) we will have infinitely
many solutions and hence infinitely many right inverses.
On the other hand, if n < m, then the systems cannot all be consistent and hence A
cannot have a right inverse. We state this as a theorem.
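Under the assumption rank(A) = m, one particular right inverse can be written down directly: B = A^T(AA^T)^{-1} satisfies AB = AA^T(AA^T)^{-1} = I. A minimal NumPy sketch using the matrix from Example 1 (this generally produces a different right inverse from the one shown there, since with n > m there are infinitely many):

```python
import numpy as np

# A 2x3 matrix of rank 2, so rank(A) = m and right inverses exist.
A = np.array([[1.0, 2.0, 2.0],
              [-1.0, 3.0, 1.0]])

# One right inverse: B = A^T (A A^T)^{-1}; A A^T is invertible
# because the rows of A are linearly independent.
B = A.T @ np.linalg.inv(A @ A.T)
check = A @ B
```

The product `check` is the 2 × 2 identity matrix, confirming B is a right inverse.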
THEOREM 1 If A is an m × n matrix with m > n, then A cannot have a right inverse.
COROLLARY 2 If A is an m × n matrix with m < n, then A cannot have a left inverse.
Proof: If A has a left inverse C, so that CA = In, then C is an n × m matrix with n > m
that has a right inverse (namely A), which contradicts the previous theorem. □
We see that only an n × n matrix can have a left and a right inverse.
DEFINITION
Square Matrix
An n × n matrix is called a square matrix.
We now show that if a matrix has both a left and a right inverse, then the left and right
inverses must be equal.
THEOREM 3 If A, B, and C are n × n matrices such that AB = I = CA, then B = C.
Proof: We have
$$B = IB = (CA)B = C(AB) = CI = C$$
□
DEFINITION
Matrix Inverse
Invertible
Let A be an n × n matrix. If B is a matrix such that AB = I = BA, then B is called the
inverse of A. We write B = A−1 and we say that A is invertible.
REMARKS
1. Observe from the definition that if B = A−1, then B is invertible and A = B−1.
2. In the definition we said that B is ‘the’ inverse of A. The fact that the inverse
of a matrix is unique follows immediately from Theorem 3.
EXAMPLE 2 Observe that
$$\begin{bmatrix} 3 & -1 \\ -2 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 2 & 3 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 2 & 3 \end{bmatrix} \begin{bmatrix} 3 & -1 \\ -2 & 1 \end{bmatrix}$$
Hence
$$\begin{bmatrix} 3 & -1 \\ -2 & 1 \end{bmatrix}^{-1} = \begin{bmatrix} 1 & 1 \\ 2 & 3 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 1 & 1 \\ 2 & 3 \end{bmatrix}^{-1} = \begin{bmatrix} 3 & -1 \\ -2 & 1 \end{bmatrix}$$
EXERCISE 1 Is $A = \begin{bmatrix} 1 & 1 & 2 \\ 1 & 2 & 2 \\ 1 & -1 & 3 \end{bmatrix}$ the inverse of $B = \begin{bmatrix} 8 & -5 & -2 \\ -1 & 1 & 0 \\ -3 & 2 & 1 \end{bmatrix}$?
The following theorem is quite useful. It not only shows that to prove that B is the
inverse of a square matrix A we only need to show that B is the left or right inverse
of A, but it also shows that the rank of an invertible matrix is n.
THEOREM 4 If A and B are n × n matrices such that AB = I, then A and B are invertible and
rank A = rank B = n.
Proof: Assume that AB = I and consider the homogeneous system B~x = ~0. We get
$$\vec{0} = A\vec{0} = A(B\vec{x}) = (AB)\vec{x} = I\vec{x} = \vec{x}$$
So, the system has a unique solution and hence, by the System Rank Theorem, the
coefficient matrix B has rank n. But, rank B = n implies that the system B~x = ~y
is consistent for all ~y ∈ Rn. So, for any ~y ∈ Rn we can write ~y = B~x, which gives
$$BA\vec{y} = BA(B\vec{x}) = B(AB)\vec{x} = BI\vec{x} = B\vec{x} = \vec{y} = I\vec{y}$$
Thus, BA~y = I~y for every ~y ∈ Rn and hence BA = I by the Matrices Equal Theorem. Therefore, A and B are invertible. Moreover, since BA = I we can repeat the
argument above to prove that rank A = n. □
We now use this result to prove the following theorem.
THEOREM 5 If A and B are invertible matrices and c ∈ R with c ≠ 0, then
(1) $(cA)^{-1} = \frac{1}{c}A^{-1}$
(2) $(A^T)^{-1} = (A^{-1})^T$
(3) $(AB)^{-1} = B^{-1}A^{-1}$
Proof: We just prove (1) and leave (2) and (3) as exercises.
By Theorem 4, to show that B = A^{-1} we just need to show that AB = I. Thus, to
show that $(cA)^{-1} = \frac{1}{c}A^{-1}$ we just need to show that $(cA)\left(\frac{1}{c}A^{-1}\right) = I$. We have
$$(cA)\left(\frac{1}{c}A^{-1}\right) = \frac{c}{c}AA^{-1} = 1I = I$$
□
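All three identities in Theorem 5 can be spot-checked numerically. A quick sketch using the invertible matrices from Problem 8 of this section:

```python
import numpy as np

A = np.array([[2.0, 1.0], [3.0, 1.0]])  # det = -1, invertible
B = np.array([[1.0, 2.0], [3.0, 5.0]])  # det = -1, invertible
c = 2.0

# (1) (cA)^{-1} = (1/c) A^{-1}
lhs1, rhs1 = np.linalg.inv(c * A), (1 / c) * np.linalg.inv(A)
# (2) (A^T)^{-1} = (A^{-1})^T
lhs2, rhs2 = np.linalg.inv(A.T), np.linalg.inv(A).T
# (3) (AB)^{-1} = B^{-1} A^{-1}  (note the reversed order)
lhs3, rhs3 = np.linalg.inv(A @ B), np.linalg.inv(B) @ np.linalg.inv(A)
```

Each left-hand side agrees with its right-hand side to floating-point precision.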
The statement of Theorem 4 says that for an n × n matrix A to be invertible, it is
necessary that rank A = n. We now show that rank A = n is also sufficient for A to be
invertible.
THEOREM 6 If A is an n × n matrix such that rank A = n, then A is invertible.
Proof: If rank A = n, then the systems of equations $A\vec{b}_i = \vec{e}_i$, 1 ≤ i ≤ n, are all
consistent by the System Rank Theorem. Hence, if we let $B = \begin{bmatrix} \vec{b}_1 & \cdots & \vec{b}_n \end{bmatrix}$, then
$$AB = A\begin{bmatrix} \vec{b}_1 & \cdots & \vec{b}_n \end{bmatrix} = \begin{bmatrix} A\vec{b}_1 & \cdots & A\vec{b}_n \end{bmatrix} = \begin{bmatrix} \vec{e}_1 & \cdots & \vec{e}_n \end{bmatrix} = I$$
Thus, A is invertible by Theorem 4. □
Observe that the proof of this theorem teaches us one way of finding the inverse of
a matrix A. The columns of the inverse are the vectors ~bi such that $A\vec{b}_i = \vec{e}_i$ for
1 ≤ i ≤ n. Since each of these systems has the same coefficient matrix A, we
can solve them by making one multiple augmented matrix and row reducing. In
particular, assuming that A is invertible we get
$$[\, A \mid I \,] \sim [\, I \mid A^{-1} \,]$$
REMARK
Compare this with the method for finding the change of coordinates matrix from
standard coordinates in Rn to any other basis B of Rn.
EXAMPLE 3 Find the inverse of $A = \begin{bmatrix} 1 & -2 & 3 \\ -1 & 4 & 1 \\ 1 & -1 & 4 \end{bmatrix}$.
Solution: We row reduce the multiple augmented matrix [ A | I ].
$$\left[\begin{array}{ccc|ccc} 1 & -2 & 3 & 1 & 0 & 0 \\ -1 & 4 & 1 & 0 & 1 & 0 \\ 1 & -1 & 4 & 0 & 0 & 1 \end{array}\right] \sim \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & -17/2 & -5/2 & 7 \\ 0 & 1 & 0 & -5/2 & -1/2 & 2 \\ 0 & 0 & 1 & 3/2 & 1/2 & -1 \end{array}\right]$$
Thus, $A^{-1} = \begin{bmatrix} -17/2 & -5/2 & 7 \\ -5/2 & -1/2 & 2 \\ 3/2 & 1/2 & -1 \end{bmatrix}$.
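The row reduction in Example 3 can be double-checked with NumPy, which computes the same inverse by (essentially) the same elimination process:

```python
import numpy as np

A = np.array([[1.0, -2.0, 3.0],
              [-1.0, 4.0, 1.0],
              [1.0, -1.0, 4.0]])
A_inv = np.linalg.inv(A)

# The inverse found by hand in Example 3.
expected = np.array([[-17/2, -5/2, 7],
                     [-5/2, -1/2, 2],
                     [3/2, 1/2, -1]])
```

`A_inv` matches `expected`, and `A @ A_inv` is the identity.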
EXERCISE 2 Find the inverse of $A = \begin{bmatrix} 1 & 2 & -1 \\ 0 & 1 & 2 \\ -1 & 3 & 10 \end{bmatrix}$.
EXAMPLE 4 Let A ∈ M2×2(R). Find a condition on the entries of A that guarantees that A is
invertible and then, assuming that A is invertible, find A^{-1}.
Solution: Let $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$. For A to be invertible, we require that rank A = 2. Thus,
we cannot have a = 0 = c. Assume that a ≠ 0. Row reducing [ A | I ] we get
$$\left[\begin{array}{cc|cc} a & b & 1 & 0 \\ c & d & 0 & 1 \end{array}\right] \overset{R_2 - \frac{c}{a}R_1}{\sim} \left[\begin{array}{cc|cc} a & b & 1 & 0 \\ 0 & d - \frac{bc}{a} & -\frac{c}{a} & 1 \end{array}\right] \overset{aR_2}{\sim} \left[\begin{array}{cc|cc} a & b & 1 & 0 \\ 0 & ad - bc & -c & a \end{array}\right]$$
Since we need rank A = 2, we now require that ad − bc ≠ 0. Assuming this, we
continue row reducing to get
$$\left[\begin{array}{cc|cc} a & b & 1 & 0 \\ 0 & ad - bc & -c & a \end{array}\right] \sim \left[\begin{array}{cc|cc} 1 & 0 & \frac{d}{ad-bc} & \frac{-b}{ad-bc} \\ 0 & 1 & \frac{-c}{ad-bc} & \frac{a}{ad-bc} \end{array}\right]$$
Thus, we get that A is invertible if and only if ad − bc ≠ 0. Moreover, if A is invertible,
then
$$A^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$$
Repeating the argument in the case that c ≠ 0 gives the same result.
REMARK
This quantity ad − bc, which determines whether or not a 2 × 2 matrix is invertible,
is called the determinant of the matrix. We will look at this more closely later in the chapter.
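The 2 × 2 formula from Example 4 is short enough to implement directly. A minimal sketch (the function name is our own):

```python
import numpy as np

def inv2x2(M):
    """Inverse of a 2x2 matrix via the ad - bc formula from Example 4."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible: ad - bc = 0")
    return (1 / det) * np.array([[d, -b], [-c, a]])

# The matrix from Example 2, whose inverse we already know.
M = np.array([[1.0, 1.0], [2.0, 3.0]])
M_inv = inv2x2(M)
```

Here `M_inv` reproduces the inverse from Example 2, [[3, −1], [−2, 1]].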
EXERCISE 3 Let $A = \begin{bmatrix} a & b & c \\ 0 & d & e \\ 0 & 0 & f \end{bmatrix}$. Find a condition on the entries of A that guarantees that A is
invertible and then, assuming that A is invertible, find A^{-1}.
Invertible Matrix Theorem
The following theorem ties together many of the concepts we have looked at previ-
ously. Additionally, we will find it very useful throughout the rest of the book.
THEOREM 7 (Invertible Matrix Theorem)
For any n × n matrix A, the following are equivalent:
(1) A is invertible
(2) The RREF of A is I
(3) rank A = n
(4) The system of equations A~x = ~b is consistent with a unique solution for all ~b ∈ Rn
(5) The nullspace of A is {~0}
(6) The columns of A form a basis for Rn
(7) The rows of A form a basis for Rn
(8) AT is invertible
This theorem says that if any one of the eight items is true for an n × n matrix, then
all of the other items are also true. We leave the proof of this theorem as an important
exercise. It is an excellent way of testing your knowledge and understanding of
much of the material we have covered up to this point.
REMARK
The Invertible Matrix Theorem is not limited to these eight statements. At this point,
you should be able to add many more to the list. Additionally, we will add a few
more statements to it as we proceed through the text.
Parts (1) and (4) of the Invertible Matrix Theorem tell us that if A is invertible, then
the system of equations A~x = ~b is consistent with a unique solution. What is the
solution? Observe that since A is invertible, we have
$$\begin{aligned} A^{-1}A\vec{x} &= A^{-1}\vec{b} \\ I\vec{x} &= A^{-1}\vec{b} \\ \vec{x} &= A^{-1}\vec{b} \end{aligned}$$
Hence, if A is invertible, we can solve the system A~x = ~b by simply computing the
matrix-vector product A^{-1}~b.
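Numerically, both routes give the same answer; here is a short check using the system solved by hand in Example 5 below. (In numerical practice a direct solver is usually preferred over forming the inverse explicitly, but for exact small systems the two agree.)

```python
import numpy as np

A = np.array([[3.0, -2.0],
              [1.0, 5.0]])
b = np.array([4.0, 7.0])

x_via_inverse = np.linalg.inv(A) @ b   # x = A^{-1} b
x_via_solve = np.linalg.solve(A, b)    # elimination, no explicit inverse
```

Both computations return the solution (2, 1).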
EXAMPLE 5 Solve the system of linear equations
$$\begin{aligned} 3x_1 - 2x_2 &= 4 \\ x_1 + 5x_2 &= 7 \end{aligned}$$
Solution: We can represent the system as A~x = ~b where $A = \begin{bmatrix} 3 & -2 \\ 1 & 5 \end{bmatrix}$ and $\vec{b} = \begin{bmatrix} 4 \\ 7 \end{bmatrix}$.
From our work in Example 4, we see that A is invertible with $A^{-1} = \frac{1}{17}\begin{bmatrix} 5 & 2 \\ -1 & 3 \end{bmatrix}$.
Hence, the solution of the system is
$$\vec{x} = A^{-1}\vec{b} = \frac{1}{17}\begin{bmatrix} 5 & 2 \\ -1 & 3 \end{bmatrix}\begin{bmatrix} 4 \\ 7 \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$$
EXAMPLE 6 Let $A = \begin{bmatrix} 1 & -2 & 3 \\ -1 & 4 & 1 \\ 1 & -1 & 4 \end{bmatrix}$. Solve the systems $A\vec{x} = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}$ and $A\vec{y} = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}$.
Solution: In Example 3 we found that $A^{-1} = \begin{bmatrix} -17/2 & -5/2 & 7 \\ -5/2 & -1/2 & 2 \\ 3/2 & 1/2 & -1 \end{bmatrix}$. Hence, the solution of the first system is
$$\vec{x} = A^{-1}\vec{b} = \begin{bmatrix} -17/2 & -5/2 & 7 \\ -5/2 & -1/2 & 2 \\ 3/2 & 1/2 & -1 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} -3/2 \\ -1/2 \\ 1/2 \end{bmatrix}$$
and the solution of the second system is
$$\vec{y} = A^{-1}\vec{b} = \begin{bmatrix} -17/2 & -5/2 & 7 \\ -5/2 & -1/2 & 2 \\ 3/2 & 1/2 & -1 \end{bmatrix}\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} = \begin{bmatrix} -13/2 \\ -3/2 \\ 3/2 \end{bmatrix}$$
The example above shows that once we have found A−1 we can solve any system of
linear equations with coefficient matrix A very quickly. Moreover, at first glance, it
might seem like we are solving the system of linear equations without row reducing.
However, this is an illusion. The row operations are in fact “stored” inside of the
inverse of the matrix (we used row operations to find the inverse). In the next section
we look at how to discover those stored row operations.
Section 5.1 Problems
1. For each matrix, find the inverse or show that the matrix is not invertible.
(a) $\begin{bmatrix} 2 & 1 \\ -3 & 4 \end{bmatrix}$ (b) $\begin{bmatrix} 1 & 3 \\ 1 & -6 \end{bmatrix}$ (c) $\begin{bmatrix} 2 & 4 \\ 3 & 6 \end{bmatrix}$ (d) $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix}$
(e) $\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}$ (f) $\begin{bmatrix} 1 & 2 & 1 \\ 0 & -2 & 4 \\ 3 & 4 & 4 \end{bmatrix}$ (g) $\begin{bmatrix} 3 & 1 & -2 \\ 1 & 2 & 1 \\ 2 & 1 & 1 \end{bmatrix}$ (h) $\begin{bmatrix} 1 & 0 & 1 \\ 0 & 2 & -3 \\ 0 & 0 & 3 \end{bmatrix}$
(i) $\begin{bmatrix} -1 & 0 & -1 & 2 \\ 1 & -1 & -2 & 4 \\ -3 & -1 & 5 & 2 \\ -1 & -2 & 4 & 4 \end{bmatrix}$ (j) $\begin{bmatrix} 1 & 2 & 3 & 2 \\ 1 & 1 & 1 & 1 \\ -1 & -3 & 1 & 4 \\ 0 & 1 & -3 & -5 \end{bmatrix}$
2. Let $A = \begin{bmatrix} 1 & -2 & 2 \\ -1 & 1 & 0 \\ 3 & -2 & -3 \end{bmatrix}$. Find A^{-1} and use it to solve A~x = ~d where:
(a) $\vec{d} = \begin{bmatrix} 1 \\ -2 \\ 3 \end{bmatrix}$ (b) $\vec{d} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$ (c) $\vec{d} = \begin{bmatrix} 1/2 \\ -1/3 \\ \sqrt{2} \end{bmatrix}$
3. Let $A = \begin{bmatrix} \sqrt{2} & 3/2 \\ 1/2 & 0 \end{bmatrix}$. Find A^{-1} and use it to solve A~x = ~d where:
(a) $\vec{d} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ (b) $\vec{d} = \begin{bmatrix} \sqrt{2} \\ 1 \end{bmatrix}$ (c) $\vec{d} = \begin{bmatrix} \sqrt{3} \\ \pi \end{bmatrix}$
4. Find all right inverses of $A = \begin{bmatrix} 3 & 1 & 0 \\ 1 & 2 & 0 \end{bmatrix}$.
5. Find all right inverses of $A = \begin{bmatrix} 1 & -2 & 1 \\ 1 & -1 & 1 \end{bmatrix}$.
6. Find all left inverses of $B = \begin{bmatrix} 1 & 1 \\ -2 & -1 \\ 1 & 1 \end{bmatrix}$.
7. Find all left inverses of $B = \begin{bmatrix} 1 & 2 \\ 0 & -1 \\ 3 & 3 \end{bmatrix}$.
8. Let $A = \begin{bmatrix} 2 & 1 \\ 3 & 1 \end{bmatrix}$ and $B = \begin{bmatrix} 1 & 2 \\ 3 & 5 \end{bmatrix}$.
(a) Find A^{-1} and B^{-1}.
(b) Calculate AB and (AB)^{-1} and verify that (AB)^{-1} = B^{-1}A^{-1}.
(c) Calculate (2A)^{-1} and verify that it equals $\frac{1}{2}A^{-1}$.
(d) Calculate (A^T)^{-1} and verify that A^T(A^T)^{-1} = I.
9. Prove that $(A^T)^{-1} = (A^{-1})^T$.
10. Let {~v1, . . . ,~vk} be a linearly independent set in
Rn and let A be an invertible n×n matrix. Prove
that {A~v1, . . . , A~vk} is linearly independent.
11. (a) Prove that if A and B are n×n matrices such
that AB is invertible, then A and B are invert-
ible.
(b) Find non-invertible matrices C and D such
that CD is invertible.
12. Prove that if A is an m × n matrix with m > n,
then A cannot have a right inverse.
13. Prove or disprove the following statements. In
each case, A and B are both n × n matrices.
(a) If C is a 3 × 2 matrix, then C has a left
inverse.
(b) If Null(AT ) = {~0}, then A is invertible.
(c) If A and B are invertible matrices, then
A + B is invertible.
(d) If AA = A and A is not the zero matrix,
then A is invertible.
(e) If AA is invertible, then A is invertible.
(f) If A and B satisfy AB = On,n, then B is not
invertible.
(g) There exists a basis {A1, A2, A3, A4} for
M2×2(R) such that each Ai is invertible.
(h) If A has a column of zeroes, then A is not
invertible.
(i) If A~x = ~0 has a unique solution, then
Col(A) = Rn.
(j) If rank AT = n, then rank A = n.
(k) If A is invertible, then the columns of AT
are linearly independent.
(l) If A satisfies $A^2 - 2A + I = O_{n,n}$, then A is invertible.
14. (a) Find n × n matrices A and B such that $(AB)^{-1} \neq A^{-1}B^{-1}$.
(b) What condition is required on A and B so that we do have $(AB)^{-1} = A^{-1}B^{-1}$?
5.2 Elementary Matrices
Consider the system of linear equations A~x = ~b where A is invertible. We compare
our two methods of solving this system. First, we can solve this system using the
methods of Chapter 2 to row reduce the augmented matrix:
$$[\, A \mid \vec{b} \,] \sim [\, I \mid \vec{x} \,]$$
Alternately, we find A^{-1} using the method in the previous section. We row reduce
$$[\, A \mid I \,] \sim [\, I \mid A^{-1} \,]$$
and then solve the system by computing $\vec{x} = A^{-1}\vec{b}$.
Observe that in both methods we can use exactly the same row operations to row
reduce A to I. In the first method, we are applying those row operations directly
to ~b to determine ~x. Thus, in the second method the row operations used are being
“stored” inside of A−1 and the matrix vector product A−1~b is “performing” those row
operations on ~b so that we get the same answer as in the first method.
This not only shows us that A−1 is made of elementary row operations, but also that
there is a close connection between matrix multiplication and performing elementary
row operations.
DEFINITION
Elementary Matrix
An n × n matrix E is called an elementary matrix if it can be obtained from the n × n
identity matrix by performing exactly one elementary row operation.
EXAMPLE 1 Find the 3 × 3 elementary matrix corresponding to each row operation by performing
the row operation on the 3 × 3 identity matrix.
(a) R2 − 3R1
Solution: We have
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \overset{R_2 - 3R_1}{\sim} \begin{bmatrix} 1 & 0 & 0 \\ -3 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
Thus, the corresponding elementary matrix is $E_1 = \begin{bmatrix} 1 & 0 & 0 \\ -3 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$.
(b) R1 ↔ R2
Solution: We have
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \overset{R_1 \leftrightarrow R_2}{\sim} \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
Thus, the corresponding elementary matrix is $E_2 = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$.
(c) 3R3
Solution: We have
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \overset{3R_3}{\sim} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 3 \end{bmatrix}$$
Thus, the corresponding elementary matrix is $E_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 3 \end{bmatrix}$.
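Each type of elementary matrix in Example 1 is just the identity with one row operation applied, which makes them easy to construct programmatically. A minimal sketch (helper names are our own; rows are 0-indexed in the code):

```python
import numpy as np

def add_multiple(n, i, j, c):
    """Elementary matrix for R_i + c*R_j."""
    E = np.eye(n)
    E[i, j] = c
    return E

def swap(n, i, j):
    """Elementary matrix for R_i <-> R_j."""
    E = np.eye(n)
    E[[i, j]] = E[[j, i]]
    return E

def scale(n, i, c):
    """Elementary matrix for c*R_i."""
    E = np.eye(n)
    E[i, i] = c
    return E

E1 = add_multiple(3, 1, 0, -3)   # R2 - 3R1, as in Example 1(a)
E2 = swap(3, 0, 1)               # R1 <-> R2, as in Example 1(b)
E3 = scale(3, 2, 3)              # 3R3, as in Example 1(c)
```

The three results reproduce the matrices found in Example 1.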
EXAMPLE 2 Determine which of the following matrices are elementary. For each elementary
matrix, indicate the associated elementary row operation.
(a) $\begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$
Solution: This matrix is elementary as it can be obtained from the 3 × 3 identity
matrix by performing the row operation R1 + 2R3.
(b) $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$
Solution: This matrix is elementary as it can be obtained from the 2 × 2 identity
matrix by performing the row operation R1 ↔ R2.
(c) $\begin{bmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 3 \end{bmatrix}$
Solution: This matrix is not elementary as it would require three elementary row
operations to get this matrix from the 3 × 3 identity matrix.
(d) $\begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$
Solution: This matrix is elementary as it can be obtained from the 3 × 3 identity
matrix by performing the row operation (−1)R2.
EXERCISE 1 Determine which of the following matrices are elementary. For each elementary
matrix, indicate the associated elementary row operation.
$$E_1 = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}, \quad E_2 = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}, \quad E_3 = \begin{bmatrix} 2 & 3 \\ 0 & 1 \end{bmatrix}$$
It is clear that the RREF of every elementary matrix is I; we just need to perform
the inverse (opposite) row operation to turn the elementary matrix E back to I. Thus,
we not only have that every elementary matrix is invertible, but the inverse is the
elementary matrix associated with the reverse row operation.
THEOREM 1 If E is an elementary matrix, then E is invertible and E−1 is also an elementary matrix.
EXAMPLE 3 Find the inverse of each of the following elementary matrices. Check your answer
by multiplying the matrices together.
(a) $E_1 = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}$
Solution: The inverse matrix $E_1^{-1}$ is the elementary matrix associated with the row
operation required to bring E1 back to I. That is, R1 − 2R2. Therefore,
$$E_1^{-1} = \begin{bmatrix} 1 & -2 \\ 0 & 1 \end{bmatrix}$$
Checking, we get $E_1^{-1}E_1 = \begin{bmatrix} 1 & -2 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$.
(b) $E_2 = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}$
Solution: The inverse matrix $E_2^{-1}$ is the elementary matrix associated with the row
operation required to bring E2 back to I. That is, R1 ↔ R3. Therefore,
$$E_2^{-1} = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}$$
Checking, we get $E_2^{-1}E_2 = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}\begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$.
(c) $E_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -3 \end{bmatrix}$
Solution: The inverse matrix $E_3^{-1}$ is the elementary matrix associated with the row
operation required to bring E3 back to I. That is, (−1/3)R3. Therefore,
$$E_3^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1/3 \end{bmatrix}$$
Checking, we get $E_3^{-1}E_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1/3 \end{bmatrix}\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -3 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$.
If we look at our calculations in the example above we see something interesting.
Consider an elementary matrix E. We saw that the product EI is the matrix obtained
from I by performing the row operation associated with E on I, and that E^{-1}E is the
matrix obtained from E by performing the row operation associated with E^{-1} on E.
We demonstrate this further with another example.
EXAMPLE 4 Let $E_1 = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$, $E_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 3 \end{bmatrix}$, and $A = \begin{bmatrix} 1 & 1 & 3 \\ 0 & -1 & 2 \\ 0 & 5 & 6 \end{bmatrix}$. Calculate E1A and
E2E1A. Describe the products in terms of matrices obtained from A by elementary row
operations.
Solution: By matrix multiplication, we get
$$E_1 A = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 1 & 3 \\ 0 & -1 & 2 \\ 0 & 5 & 6 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 3 \\ 2 & 1 & 8 \\ 0 & 5 & 6 \end{bmatrix}$$
Observe that E1A is the matrix obtained from A by performing the row operation
R2 + 2R1. That is, by performing the row operation associated with E1.
$$E_2 E_1 A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 3 \end{bmatrix}\begin{bmatrix} 1 & 1 & 3 \\ 2 & 1 & 8 \\ 0 & 5 & 6 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 3 \\ 2 & 1 & 8 \\ 0 & 15 & 18 \end{bmatrix}$$
Thus, E2E1A is the matrix obtained from A by first performing the row operation
R2 + 2R1, and then performing the row operation associated with E2, namely 3R3.
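The key fact in Example 4 — left-multiplying by an elementary matrix performs the corresponding row operation — can be checked directly in code:

```python
import numpy as np

A = np.array([[1, 1, 3],
              [0, -1, 2],
              [0, 5, 6]])
E1 = np.array([[1, 0, 0],
               [2, 1, 0],
               [0, 0, 1]])   # elementary matrix for R2 + 2R1

# Multiplying by E1 on the left...
product = E1 @ A

# ...matches applying R2 + 2R1 directly to the rows of A.
row_op = A.copy()
row_op[1] = row_op[1] + 2 * row_op[0]
```

`product` and `row_op` are identical, matching the E1A computed in Example 4.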
The following theorems prove that this useful fact is true for all elementary matrices.
THEOREM 2 If A is an m × n matrix and E is the m × m elementary matrix corresponding to the row
operation Ri + cRj, for i ≠ j, then EA is the matrix obtained from A by performing
the row operation Ri + cRj on A.
Proof: Observe that the i-th row of E satisfies $e_{ii} = 1$, $e_{ij} = c$, and $e_{ik} = 0$ for all
k ≠ i, j. So, the entries in the i-th row of EA are
$$(EA)_{ik} = \sum_{\ell=1}^{m} e_{i\ell}\, a_{\ell k} = e_{ij}a_{jk} + e_{ii}a_{ik} = c\,a_{jk} + a_{ik}$$
for 1 ≤ k ≤ n. Therefore, the i-th row of EA is the result of applying Ri + cRj to A. The remaining rows
of E are rows from the identity matrix, and hence the remaining rows of EA are just
the corresponding rows of A. □
The proofs of the following two theorems are similar to that of Theorem 2.
THEOREM 3 If A is an m × n matrix and E is the m × m elementary matrix corresponding to the
row operation cRi, then EA is the matrix obtained from A by performing the row
operation cRi on A.
THEOREM 4 If A is an m × n matrix and E is the m × m elementary matrix corresponding to the row
operation Ri ↔ Rj, for i ≠ j, then EA is the matrix obtained from A by performing
the row operation Ri ↔ Rj on A.
Since elementary row operations do not change the rank of a matrix, we immediately
get the following result.
COROLLARY 5 If A is an m × n matrix and E is an m × m elementary matrix, then
rank(EA) = rank A
Matrix Decomposition into Elementary Matrices
We began this section by asking about how the inverse of a matrix stores elementary
row operations. From our work above, we guess that these are stored as elementary
matrices. We now look at how to decompose a matrix into its elementary factors.
THEOREM 6 If A is an m × n matrix with reduced row echelon form R, then there exists a sequence
E1, . . . , Ek of m × m elementary matrices such that Ek · · · E2E1A = R. In particular,

A = E1⁻¹E2⁻¹ · · · Ek⁻¹R
Proof: Using the methods of Chapter 2, we know that we can row reduce a matrix A
to its RREF R with a sequence of elementary row operations. Let E1 denote the ele-
mentary matrix corresponding to the first row operation performed, E2 the elementary
matrix corresponding to the second row operation performed, up to Ek corresponding
to the last row operation performed. Then, Theorems 2, 3, and 4 give us that

EkEk−1 · · · E2E1A = R (5.1)

Because each elementary matrix is invertible, we get that the product EkEk−1 · · · E1 is
invertible by Theorem 5.1.5 (3). In particular, multiplying by the inverse of Ek · · · E1
on both sides of (5.1) we get that

A = (Ek · · · E1)⁻¹R = E1⁻¹E2⁻¹ · · · Ek⁻¹R

as required. ∎
EXAMPLE 5 Write A = [1 3 0 1; −1 −3 3 2; 2 7 0 2] as a product of elementary matrices and its RREF R.

Solution: We first row reduce A to R keeping track of the row operations used.

[1 3 0 1; −1 −3 3 2; 2 7 0 2]  R2 + R1, R3 − 2R1  ∼ [1 3 0 1; 0 0 3 3; 0 1 0 0]  R2 ↔ R3  ∼
[1 3 0 1; 0 1 0 0; 0 0 3 3]  (1/3)R3  ∼ [1 3 0 1; 0 1 0 0; 0 0 1 1]  R1 − 3R2  ∼ [1 0 0 1; 0 1 0 0; 0 0 1 1]

Next, we write the elementary matrix for each row operation we used. Observe that
each elementary matrix will be 3 × 3 since A has three rows. We have

E1 = [1 0 0; 1 1 0; 0 0 1], E2 = [1 0 0; 0 1 0; −2 0 1], E3 = [1 0 0; 0 0 1; 0 1 0],
E4 = [1 0 0; 0 1 0; 0 0 1/3], E5 = [1 −3 0; 0 1 0; 0 0 1]

and E5E4E3E2E1A = R. Then

A = E1⁻¹E2⁻¹E3⁻¹E4⁻¹E5⁻¹R
= [1 0 0; −1 1 0; 0 0 1][1 0 0; 0 1 0; 2 0 1][1 0 0; 0 0 1; 0 1 0][1 0 0; 0 1 0; 0 0 3][1 3 0; 0 1 0; 0 0 1][1 0 0 1; 0 1 0 0; 0 0 1 1]
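As a sanity check, the factorization found in this example can be verified numerically. This sketch assumes NumPy and re-enters the five elementary matrices by hand.

```python
import numpy as np

# The matrix A and elementary matrices E1..E5 from Example 5.
A = np.array([[1.0, 3, 0, 1],
              [-1, -3, 3, 2],
              [2, 7, 0, 2]])
E = [np.array([[1.0, 0, 0], [1, 1, 0], [0, 0, 1]]),    # R2 + R1
     np.array([[1.0, 0, 0], [0, 1, 0], [-2, 0, 1]]),   # R3 - 2R1
     np.array([[1.0, 0, 0], [0, 0, 1], [0, 1, 0]]),    # R2 <-> R3
     np.array([[1.0, 0, 0], [0, 1, 0], [0, 0, 1/3]]),  # (1/3)R3
     np.array([[1.0, -3, 0], [0, 1, 0], [0, 0, 1]])]   # R1 - 3R2

R = A
for Ei in E:                      # E5 E4 E3 E2 E1 A = R
    R = Ei @ R

prod = np.eye(3)
for Ei in E:                      # builds E1^(-1) E2^(-1) ... E5^(-1)
    prod = prod @ np.linalg.inv(Ei)

print(np.allclose(prod @ R, A))   # True: the inverses rebuild A from R
```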
EXAMPLE 6 Write A = [1 −1 0; −1 2 1; 0 1 1; 1 −1 0] as a product of elementary matrices and its RREF R.

Solution: We first row reduce A to R keeping track of the row operations used.

[1 −1 0; −1 2 1; 0 1 1; 1 −1 0]  R2 + R1, R4 − R1  ∼ [1 −1 0; 0 1 1; 0 1 1; 0 0 0]  R1 + R2, R3 − R2  ∼ [1 0 1; 0 1 1; 0 0 0; 0 0 0]

Next, we write the elementary matrix for each row operation we used. Observe that
each elementary matrix will be 4 × 4 since A has four rows. We have

E1 = [1 0 0 0; 1 1 0 0; 0 0 1 0; 0 0 0 1], E2 = [1 0 0 0; 0 1 0 0; 0 0 1 0; −1 0 0 1],
E3 = [1 1 0 0; 0 1 0 0; 0 0 1 0; 0 0 0 1], E4 = [1 0 0 0; 0 1 0 0; 0 −1 1 0; 0 0 0 1]

and E4E3E2E1A = R. Then,

A = E1⁻¹E2⁻¹E3⁻¹E4⁻¹R
= [1 0 0 0; −1 1 0 0; 0 0 1 0; 0 0 0 1][1 0 0 0; 0 1 0 0; 0 0 1 0; 1 0 0 1][1 −1 0 0; 0 1 0 0; 0 0 1 0; 0 0 0 1][1 0 0 0; 0 1 0 0; 0 1 1 0; 0 0 0 1][1 0 1; 0 1 1; 0 0 0; 0 0 0]
EXERCISE 2 Write A = [0 1 2 4; −2 3 4 4] as a product of elementary matrices and its RREF R.
If A is invertible, then we get that the reduced row echelon form of A is I. So, we
can write A entirely as a product of elementary matrices. Moreover, by Theorem 6
we have
Ek · · · E1A = I
and
A⁻¹ = Ek · · · E1
Consequently, we can also write A−1 as a product of elementary matrices. In particu-
lar, we have proven the following corollary.
COROLLARY 7 If A is an n × n invertible matrix, then A and A−1 can be written as a product of
elementary matrices.
EXAMPLE 7 Let A = [1 3; −3 −11]. Write A and A⁻¹ as products of elementary matrices.

Solution: Row reducing A to RREF gives

[1 3; −3 −11]  R2 + 3R1  ∼ [1 3; 0 −2]  (−1/2)R2  ∼ [1 3; 0 1]  R1 − 3R2  ∼ [1 0; 0 1]

Hence,

E1 = [1 0; 3 1], E2 = [1 0; 0 −1/2], E3 = [1 −3; 0 1]

and E3E2E1A = I. Thus,

A⁻¹ = E3E2E1 = [1 −3; 0 1][1 0; 0 −1/2][1 0; 3 1]

and

A = E1⁻¹E2⁻¹E3⁻¹ = [1 0; −3 1][1 0; 0 −2][1 3; 0 1]
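A numerical check of this example, assuming NumPy: E3E2E1A should be the identity, so E3E2E1 should be A⁻¹.

```python
import numpy as np

# The matrices from Example 7: E1 is R2 + 3R1, E2 is (-1/2)R2, E3 is R1 - 3R2.
A  = np.array([[1.0, 3], [-3, -11]])
E1 = np.array([[1.0, 0], [3, 1]])
E2 = np.array([[1.0, 0], [0, -0.5]])
E3 = np.array([[1.0, -3], [0, 1]])

print(np.allclose(E3 @ E2 @ E1 @ A, np.eye(2)))     # True: reduces A to I
print(np.allclose(E3 @ E2 @ E1, np.linalg.inv(A)))  # True: the product is A^(-1)
```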
EXAMPLE 8 Let A = [1 0 1; 1 0 3; −1 1 −1]. Write A and A⁻¹ as products of elementary matrices.

Solution: Row reducing A to RREF gives

[1 0 1; 1 0 3; −1 1 −1]  R2 − R1, R3 + R1  ∼ [1 0 1; 0 0 2; 0 1 0]  (1/2)R2  ∼ [1 0 1; 0 0 1; 0 1 0]  R2 ↔ R3  ∼
[1 0 1; 0 1 0; 0 0 1]  R1 − R3  ∼ [1 0 0; 0 1 0; 0 0 1]

Hence,

E1 = [1 0 0; −1 1 0; 0 0 1], E2 = [1 0 0; 0 1 0; 1 0 1], E3 = [1 0 0; 0 1/2 0; 0 0 1],
E4 = [1 0 0; 0 0 1; 0 1 0], E5 = [1 0 −1; 0 1 0; 0 0 1]

and E5E4E3E2E1A = I. Therefore,

A⁻¹ = E5E4E3E2E1 = [1 0 −1; 0 1 0; 0 0 1][1 0 0; 0 0 1; 0 1 0][1 0 0; 0 1/2 0; 0 0 1][1 0 0; 0 1 0; 1 0 1][1 0 0; −1 1 0; 0 0 1]

and

A = E1⁻¹E2⁻¹E3⁻¹E4⁻¹E5⁻¹ = [1 0 0; 1 1 0; 0 0 1][1 0 0; 0 1 0; −1 0 1][1 0 0; 0 2 0; 0 0 1][1 0 0; 0 0 1; 0 1 0][1 0 1; 0 1 0; 0 0 1]
We end this section with one additional result which will be used in a couple of proofs
later in the course.
THEOREM 8 If E is an m × m elementary matrix, then ET is an elementary matrix.
Proof: If E is obtained from I by multiplying a row by a non-zero constant c, then E
is diagonal, so ET = E.
If E is obtained from I by swapping row i and row j, then eij = 1 = eji, so ET = E.
Finally, if E is obtained from I by adding c times row i to row j, then ET is obtained
from I by adding c times row j to row i.
Hence, in all cases, ET is elementary. ∎
Section 5.2 Problems
1. Determine which of the following are elementary matrices. For each elementary
matrix, state the corresponding row operation and the inverse of the matrix.
(a) [1 3; 0 1]  (b) [1 0; 0 0]  (c) [1 0; −4 1]
(d) [1 0 0; 0 0 1; 0 1 0]  (e) [1 2 0; 0 1 0; 1 2 1]  (f) [1 0 0; 0 1 0; 0 0 −4]
(g) [1 0 1; 0 1 0; 1 0 1]  (h) [0 0 0 1; 0 1 0 0; 0 0 1 0; 1 0 0 0]
(i) [2 0 0 0; 0 2 0 0; 0 0 2 0; 0 0 0 2]  (j) [2 0 0 1; 0 1 0 0; 0 0 1 0; 0 0 0 1]
2. Write the 3 × 3 elementary matrix and its inverse
associated with each elementary row operation.
(a) R1 − 2R3  (b) R1 ↔ R2
(c) √2 R2  (d) R3 + 2R2
(e) R2 ↔ R3  (f) 3R1
(g) R2 + 2R1  (h) (1/2)R3
3. Write each matrix A as a product of elementary
matrices and its reduced row echelon form.
(a) A = [1 2; −1 1; 2 1]  (b) A = [1 −1 2; 2 1 1]
(c) A = [0 1 3; 0 −1 −2; 2 2 4]  (d) A = [2 4 1; 1 2 2; 1 2 2]
(e) A = [3 0 1; 2 1 0; 1 0 1]  (f) A = [2 1 1; 2 3 −1; 2 1 1]
(g) A = [1 0 2; 0 1 3; −1 1 1; −2 2 5]  (h) A = [1 2 1 2; 0 0 1 2; −1 −1 3 6]
4. Write A and A⁻¹ as a product of elementary matrices.
(a) A = [2 1; 3 3]  (b) A = [1 1 3; 1 2 2; 1 1 1]
(c) A = [0 3; 1 −2]  (d) A = [0 2 6; 1 4 4; −1 2 8]
(e) A = [4 2; 2 2]  (f) A = [1 0 −1; −2 0 −2; −4 1 4]
(g) A = [1 3; −3 1]  (h) A = [1 −2 4 1; −1 3 −4 −1; 0 1 2 0; −2 4 −8 −1]
5. Without using matrix-matrix multiplication, calculate A where
A = [1 2; 0 1][1/2 0; 0 1][0 1; 1 0][1 0; −3 1][1 0; 0 3][1 1; 0 1]
6. Let A = [1 2; 2 6] and ~b = [3; 5].
(a) Determine elementary matrices E1, E2, and E3 such that E3E2E1A = I.
(b) Since A is invertible, we know that the system A~x = ~b has unique solution
~x = A⁻¹~b = E3E2E1~b
Calculate the solution ~x without using matrix-matrix multiplication.
(c) Solve the system A~x = ~b by row reducing [A | ~b]. Observe the operations
that you use on the augmented part of the system and compare with part (b).
7. Prove that if A is invertible and B is row equiv-
alent to A, then B is also invertible.
8. We have seen that multiplying an m × n matrix
A on the left by an elementary matrix has the
same effect as performing the elementary row
operation corresponding to the elementary ma-
trix on A. If E is an n × n elementary matrix,
then what effect does multiplying by A on the
right by E have? Justify your answer.
5.3 Determinants
In Example 5.1.4 we showed that a matrix A = [a b; c d] is invertible if and only if
ad − bc ≠ 0. Since this quantity determines whether the matrix is invertible or not,
we make the following definition.
DEFINITION
2 × 2 Determinant
Let A = [a b; c d]. The determinant of A is
det A = ad − bc
REMARK
The determinant is often denoted by vertical straight lines:
|a b; c d| = det [a b; c d] = ad − bc
EXAMPLE 1 Evaluate det [3 5; −1 2] and |3 6; 2 4|.

Solution: We have
det [3 5; −1 2] = 3(2) − 5(−1) = 11
|3 6; 2 4| = 3(4) − 6(2) = 0
We, of course, want to extend the definition of the determinant to n × n matrices. So,
we next consider the 3 × 3 case. Let A = [a11 a12 a13; a21 a22 a23; a31 a32 a33]. It can be shown (with some
effort) that A is invertible if and only if
a11a22a33 − a11a23a32 − a12a21a33 + a13a21a32 + a12a23a31 − a13a22a31 ≠ 0
Thus, we define
det A = a11a22a33 − a11a23a32 − a12a21a33 + a13a21a32 + a12a23a31 − a13a22a31
However, we would like to write the formula in a nicer form. Moreover, we would
like to find some sort of pattern with this and the 2 × 2 determinant so that we can
figure out how to define the determinant for general n × n matrices. We observe that
a11a22a33 − a11a23a32 − a12a21a33 + a13a21a32 + a12a23a31 − a13a22a31
= a11(a22a33 − a23a32) − a21(a12a33 − a13a32) + a31(a12a23 − a13a22)
= a11 |a22 a23; a32 a33| − a21 |a12 a13; a32 a33| + a31 |a12 a13; a22 a23|   (5.2)
Therefore, the 3×3 determinant can be written as a linear combination of 2×2 deter-
minants of submatrices of A where the coefficients are (plus or minus) the coefficients
from the first column of A.
To extend all of this to n × n matrices we will recursively use the same pattern. We
first make a definition.
DEFINITION
Cofactor
Let A be an n × n matrix with n ≥ 2. Let A(i, j) be the (n − 1) × (n − 1) matrix obtained
from A by deleting the i-th row and the j-th column. The cofactor of aij is
Cij = (−1)^{i+j} det A(i, j)
So far, we only have a definition of a determinant for up to 3 × 3 matrices. So,
we currently can only use this definition up to n = 4. We will rectify that problem
shortly.
EXAMPLE 2 Find all nine of the cofactors of A = [1 0 2; 0 −1 3; −2 −3 4].
Solution: The cofactor C11 of a11 is defined by C11 = (−1)^{1+1} det A(1, 1), where
A(1, 1) is the matrix obtained from A by deleting the first row and first column. That is
A(1, 1) = [−1 3; −3 4]
Hence, C11 = (−1)^{1+1} |−1 3; −3 4| = (1)[(−1)(4) − 3(−3)] = 5.
The cofactor C12 of a12 is defined by C12 = (−1)^{1+2} det A(1, 2), where A(1, 2) is the
matrix obtained from A by deleting the first row and second column. We have
A(1, 2) = [0 3; −2 4]
Thus, C12 = (−1)^{1+2} |0 3; −2 4| = (−1)[0(4) − 3(−2)] = −6.
Similarly, the cofactor C13 of a13 is defined by C13 = (−1)^{1+3} det A(1, 3), where A(1, 3)
is the matrix obtained from A by deleting the first row and third column. We have
A(1, 3) = [0 −1; −2 −3]
Thus, C13 = (−1)^{1+3} |0 −1; −2 −3| = (1)[0(−3) − (−1)(−2)] = −2.
Continuing in this way, we find that the other cofactors are:
C21 = (−1)^{2+1} |0 2; −3 4| = (−1)[0(4) − 2(−3)] = −6
C22 = (−1)^{2+2} |1 2; −2 4| = (1)[1(4) − 2(−2)] = 8
C23 = (−1)^{2+3} |1 0; −2 −3| = (−1)[1(−3) − 0(−2)] = 3
C31 = (−1)^{3+1} |0 2; −1 3| = (1)[0(3) − 2(−1)] = 2
C32 = (−1)^{3+2} |1 2; 0 3| = (−1)[1(3) − 2(0)] = −3
C33 = (−1)^{3+3} |1 0; 0 −1| = (1)[1(−1) − 0(0)] = −1
EXERCISE 1 Find all four of the cofactors of A = [−1 −2; 3 4].
Using cofactors we can rewrite equation (5.2) as
det A = a11C11 + a21C21 + a31C31 = Σ_{i=1}^{3} ai1Ci1
To define the determinant of an n × n matrix we repeat this pattern.
DEFINITION
n × n Determinant
Let A be an n × n matrix with n ≥ 2. The determinant of A is defined to be
det A = Σ_{i=1}^{n} ai1Ci1
where the determinant of a 1 × 1 matrix is defined by det [c] = c.
REMARKS
1. Notice that the definition of a determinant is recursive. The determinant of
an n × n matrix is defined in terms of cofactors which are determinants of
(n − 1) × (n − 1) matrices.
2. As we did with 2 × 2 matrices, we often represent the determinant of a matrix
with vertical straight lines. For example, we write
det [a11 a12 a13; a21 a22 a23; a31 a32 a33] = |a11 a12 a13; a21 a22 a23; a31 a32 a33|
EXAMPLE 3 Find the determinant of A = [1 0 2; 0 1 1; −2 2 3].

Solution: By definition, we have
det A = a11C11 + a21C21 + a31C31
= 1(−1)^{1+1} |1 1; 2 3| + 0(−1)^{2+1} |0 2; 2 3| + (−2)(−1)^{3+1} |0 2; 1 1|
= 1(1)(1(3) − 1(2)) + 0(−1)(0(3) − 2(2)) + (−2)(1)(0(1) − 2(1))
= 1 + 0 + 4 = 5
In the example above we did not actually need to evaluate the cofactor C21 since the
coefficient a21 = 0. Such steps are normally omitted.
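The recursive definition translates directly into code. The sketch below (plain Python, no libraries) expands along the first column exactly as the definition prescribes; it is fine for small matrices, but as noted later in this section, this approach is far too slow for large ones.

```python
# Recursive determinant via cofactor expansion along the first column,
# with det [c] = c as the base case. Rows are plain Python lists.
def det_cofactor(A):
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for i in range(n):
        # A(i, 0): delete row i and the first column
        minor = [row[1:] for k, row in enumerate(A) if k != i]
        # sign (-1)^(i+1+1) is (-1)^i with 0-based indexing
        total += A[i][0] * (-1) ** i * det_cofactor(minor)
    return total

A = [[1, 0, 2],
     [0, 1, 1],
     [-2, 2, 3]]
print(det_cofactor(A))   # 5, as in Example 3
```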
EXAMPLE 4 Calculate |0 2 −4; 1 3 2; 0 1 −2|.

Solution: By definition, we have
det A = a11C11 + a21C21 + a31C31
= 1(−1)^{2+1} |2 −4; 1 −2|
= (−1)[2(−2) − (−4)(1)]
= 0
EXERCISE 2 Find the determinant of A = [−1 −2 1; −2 4 0; 0 1 1].
EXAMPLE 5 Find the determinant of B = [0 1 0 0; 1 0 1 0; 0 1 0 1; −2 0 2 3].

Solution: By definition, we have
det B = b11C11 + b21C21 + b31C31 + b41C41
= 1(−1)^{2+1} |1 0 0; 1 0 1; 0 2 3| + (−2)(−1)^{4+1} |1 0 0; 0 1 0; 1 0 1|
= (−1) |1 0 0; 1 0 1; 0 2 3| + 2 |1 0 0; 0 1 0; 1 0 1|

We now need to evaluate these 3 × 3 determinants. By definition, we get
det B = (−1)( (−1)^{1+1} |0 1; 2 3| + (−1)^{2+1} |0 0; 2 3| ) + 2( (−1)^{1+1} |1 0; 0 1| + (−1)^{3+1} |0 0; 1 0| )
= (−1)[ (0(3) − 1(2)) + (−1)(0(3) − 0(2)) ] + 2[ (1(1) − 0(0)) + (0(0) − 0(1)) ]
= 2 + 2 = 4
This example clearly demonstrates the recursive nature of the definition of the de-
terminant. In particular, notice that we were quite lucky to have lots of zeros in the
matrix as otherwise we would have had many more cofactors to evaluate. In general,
if we just use a cofactor expansion to find the determinant of an n × n matrix, we
need to calculate the value of n cofactors. Each of these cofactors is in terms of
a determinant of an (n − 1) × (n − 1) matrix. To calculate the determinant of the
(n − 1) × (n − 1) matrices, we need to calculate the value of n − 1 determinants of
(n − 2) × (n − 2) matrices, and so on. For even a relatively small value of n this can
be a lot of work. Moreover, in some modern applications, it is necessary to calcu-
late determinants of very large matrices. Therefore, we want to find a faster way of
evaluating a determinant.
The Cofactor Expansion
Observe that our way of factoring the 3×3 determinant in (5.2) was not the only way
that equation could be factored. We also could get
det A = a12C12 + a22C22 + a32C32
det A = a13C13 + a23C23 + a33C33
det A = a11C11 + a12C12 + a13C13
det A = a21C21 + a22C22 + a23C23
det A = a31C31 + a32C32 + a33C33
Hence, we did not have to use the coefficients and cofactors from the first column of
the matrix. We can in fact use the coefficients and cofactors from any row or column
of the matrix. For n × n matrices we have the following theorem.
THEOREM 1 Let A be an n × n matrix. For any i with 1 ≤ i ≤ n,
det A = Σ_{k=1}^{n} aikCik
called the cofactor expansion across the i-th row, OR for any j with 1 ≤ j ≤ n,
det A = Σ_{k=1}^{n} akjCkj
called the cofactor expansion across the j-th column.
REMARK
The proof of this theorem is fairly lengthy and so is omitted. However, the proof is
actually quite interesting, so you may wish to search for it online. Alternatively, if
you are interested in a challenge, try proving it yourself.
This theorem allows us to expand the determinant along the row or column with the
most zeros.
EXAMPLE 6 Find the determinant of each of the following matrices.
(a) A = [1 2 0; −3 4 6; −2 5 0].
Solution: Using the cofactor expansion along the third column gives
det A = a13C13 + a23C23 + a33C33 = 6(−1)^{2+3} |1 2; −2 5| = −6(1(5) − 2(−2)) = −54
(b) B = [1 4 2 5; 0 1 0 0; −1 3 0 1; 0 −2 3 0]
Solution: Using the cofactor expansion along the second row gives
det B = b21C21 + b22C22 + b23C23 + b24C24 = 1(−1)^{2+2} |1 2 5; −1 0 1; 0 3 0|
We now expand the 3 × 3 determinant along the third row to get
det B = 1(3)(−1)^{3+2} |1 5; −1 1| = −3(1(1) − 5(−1)) = −18
(c) C = [1 51 −73 29; 0 2 −16 15; 0 0 3 −99; 0 0 0 4].
Solution: Using the cofactor expansion along the first column in each step gives
det C = 1(−1)^{1+1} |2 −16 15; 0 3 −99; 0 0 4| = 1(2)(−1)^{1+1} |3 −99; 0 4| = 1(2)(3(4) − (−99)(0)) = 1(2)(3)(4) = 24
(d) D = [3 −3 4 0 5; −6 8 9 0 3; −3 5 2 0 1; −5 −5 3 0 1; 3 7 3 0 1].
Solution: Using the cofactor expansion along the fourth column gives det D = 0.
EXERCISE 3 Find the determinant of each of the following matrices.
(a) A = [2 3 1; −4 2 −5; 0 −3 0]  (b) B = [4 0 0; −11 3 0; 15 17 8]
Part (c) in Example 6 and part (b) in Exercise 3 motivate the following definitions,
which lead to an important result.
DEFINITION
Upper Triangular
Lower Triangular
An m × n matrix U is said to be upper triangular if uij = 0 whenever i > j. An m × n
matrix L is said to be lower triangular if ℓij = 0 whenever i < j.
EXAMPLE 7 The matrices [1 2 3 1; 0 0 2 5; 0 0 −1 1], [1 1 1; 0 1 1; 0 0 1], and [−4]
are all upper triangular.
The matrices [3 0 0 0; 1 4 0 0; 0 0 −2 0], [1 0 0; 1 0 0; 1 0 0], and [0 0; 0 0]
are all lower triangular.
THEOREM 2 If an n × n matrix A is upper triangular or lower triangular, then
det A = a11a22 · · · ann
Proof: We prove this by induction on n for n × n upper triangular matrices. The
proof for lower triangular matrices is similar.
If A is a 1 × 1 matrix, then A is upper triangular and det A = a11 as required. Assume
that if B is an (n − 1) × (n − 1) upper triangular matrix, then det B = b11 · · · b(n−1)(n−1).
Let A be an n × n upper triangular matrix. Expanding the determinant along the first
column gives
det A = a11(−1)^{1+1} det A(1, 1) + 0 + · · · + 0 = a11 det A(1, 1)
But, A(1, 1) is the (n − 1) × (n − 1) upper triangular matrix formed by deleting the first
row and first column of A. Thus, det A(1, 1) = a22 · · · ann by the inductive hypothesis.
Hence, det A = a11a22 · · · ann as required. ∎
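A quick numerical illustration of this theorem, assuming NumPy and using the upper triangular matrix from Example 6(c):

```python
import numpy as np

# For a triangular matrix, the determinant is the product of the diagonal.
U = np.array([[1.0, 51, -73, 29],
              [0, 2, -16, 15],
              [0, 0, 3, -99],
              [0, 0, 0, 4]])

print(np.prod(np.diag(U)))            # 24.0, the product of the diagonal
print(np.isclose(np.linalg.det(U), 24))  # True (up to floating point)
```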
EXERCISE 4 Calculate the determinant of the following matrices.
A = [1 0 0; −√2 3 0; π 5 5], B = [2 −3 5; 0 2 3; 0 0 2], C = [3 0 1; 2 1 0; 1 0 0]
Determinants and Row Operations
We have seen that it is very difficult to calculate the determinant of a large matrix
with very few zeros, but very easy to calculate the determinant of an upper or lower
triangular matrix. Therefore, we now look at how applying elementary row oper-
ations to a matrix changes the determinant, so that we can row reduce a matrix to
upper (or lower) triangular form to make calculating the determinant much easier.
THEOREM 3 If A is an n × n matrix and B is the matrix obtained from A by multiplying one row
of A by c ∈ R, then det B = c det A.
Proof: We will prove the theorem by induction on n. For n = 1, the result is easily
verified. Assume the result holds for all (n − 1) × (n − 1) matrices with n ≥ 2. Let B
be the matrix obtained from A by multiplying the ℓ-th row of A by c. Then
bij = aij if i ≠ ℓ, and bij = caij if i = ℓ
Let B(x, y) denote the (n − 1) × (n − 1) submatrix of B obtained by deleting the x-th
row and y-th column of B where x ≠ ℓ. Then B(x, y) is the matrix obtained from
A(x, y) by multiplying some row of A(x, y) by c. Hence, by the inductive hypothesis,
det B(x, y) = c det A(x, y). Therefore, using the cofactor expansion along the x-th row
gives
det B = Σ_{k=1}^{n} bxk(−1)^{x+k} det B(x, k)
= Σ_{k=1}^{n} axk(−1)^{x+k}[c det A(x, k)]
= c Σ_{k=1}^{n} axk(−1)^{x+k} det A(x, k)
= c det A ∎
THEOREM 4 If A is an n× n matrix and B is the matrix obtained from A by swapping two rows of
A, then det B = − det A.
COROLLARY 5 If an n × n matrix A has two identical rows, then det A = 0.
Proof: If the i-th and j-th rows of A are identical, then let B be the matrix obtained
from A by swapping the i-th and j-th rows of A. Then, clearly A = B. But, by
Theorem 4, we have that det B = − det A. Hence, det A = − det A, so det A = 0. ∎
THEOREM 6 If A is an n × n matrix and B is the matrix obtained from A by adding a multiple of
one row of A to another row, then det B = det A.
Now we can find a determinant quite efficiently by row reducing the matrix to upper
triangular form, keeping track of how the row operations change the determinant.
Since adding a multiple of one row to another does not change the determinant, it
reduces the chance of errors if this is the only row operation used.
EXAMPLE 8 Evaluate the following determinants.
(a) |1 2 3 1; −1 −1 −1 2; 1 3 1 1; −2 −2 0 −1|.
Solution: Since adding a multiple of one row to another does not change the determinant, we get
|1 2 3 1; −1 −1 −1 2; 1 3 1 1; −2 −2 0 −1| = |1 2 3 1; 0 1 2 3; 0 1 −2 0; 0 2 6 1|
= |1 2 3 1; 0 1 2 3; 0 0 −4 −3; 0 0 2 −5| = |1 2 3 1; 0 1 2 3; 0 0 −4 −3; 0 0 0 −13/2|
= 1(1)(−4)(−13/2) = 26
(b) |3 2 −1 1; 2 −1 0 −3; 5 2 3 −2; 1 3 −1 4|
Solution: Using the row operation R4 + R2 and then R4 − R1, we get
|3 2 −1 1; 2 −1 0 −3; 5 2 3 −2; 1 3 −1 4| = |3 2 −1 1; 2 −1 0 −3; 5 2 3 −2; 3 2 −1 1|
= |3 2 −1 1; 2 −1 0 −3; 5 2 3 −2; 0 0 0 0| = 0
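The row-reduction strategy used above is easy to automate. The following sketch (assuming NumPy; this is a generic Gaussian-elimination routine, not code from the text) reduces to upper triangular form using only row swaps (each contributing a factor of −1) and Ri + cRj operations (which change nothing), then multiplies the diagonal.

```python
import numpy as np

def det_by_reduction(A):
    # Determinant by row reduction: track only sign flips from swaps.
    A = np.array(A, dtype=float)
    n = A.shape[0]
    sign = 1.0
    for j in range(n):
        # find a pivot row at or below the diagonal
        p = next((i for i in range(j, n) if A[i, j] != 0), None)
        if p is None:
            return 0.0                        # a zero column: det is 0
        if p != j:
            A[[j, p]] = A[[p, j]]             # Ri <-> Rj flips the sign
            sign = -sign
        for i in range(j + 1, n):
            A[i] -= (A[i, j] / A[j, j]) * A[j]  # Ri + cRj: no change
    return sign * np.prod(np.diag(A))

A = [[1, 2, 3, 1],
     [-1, -1, -1, 2],
     [1, 3, 1, 1],
     [-2, -2, 0, -1]]
print(det_by_reduction(A))   # 26.0, agreeing with Example 8(a)
```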
EXERCISE 5 Find the determinant of the following matrices.
A = [1 2 1 2; −1 3 −1 1; 2 3 3 2; 1 1 1 1], B = [1 4 2 3; −1 −4 −5 4; −1 −3 1 1; 0 0 1 1]
Determinants and Elementary Matrices
Since multiplying a matrix A on the left by an elementary matrix E performs the same
elementary row operation on A as the corresponding row operation for E, Theorems
3, 4, and 6 give the following result.
COROLLARY 7 If A is an n × n matrix and E is an n × n elementary matrix, then
det EA = det E det A
Proof: Observe from Theorems 3, 4, and 6, that performing an elementary row op-
eration multiplies the determinant of the original matrix by a non-zero real number
c (in the case of swapping rows or adding a multiple of one row to another we have
c = −1 or c = 1 respectively). Since E is obtained from I by performing a single row
operation, we get that det E = c, where c depends on the single row operation. Simi-
larly, since EA is the matrix obtained from A by performing the same row operation,
we have that det EA = c det A = det E det A. ∎
This corollary allows us to prove some other important results.
THEOREM 8 (addition to the Invertible Matrix Theorem)
An n × n matrix A is invertible if and only if det A ≠ 0.
Proof: By Theorem 5.2.6 there exists a sequence of elementary matrices E1, . . . , Ek
such that Ek · · · E1A = R where R is the reduced row echelon form of A. Then
det R = det(Ek · · · E1A) = det Ek · · · det E1 det A
Since the determinant of an elementary matrix is non-zero, we get that det A ≠ 0 if
and only if det R ≠ 0. But det R ≠ 0 if and only if R = I. Hence, det A ≠ 0 if and
only if A is invertible. ∎
THEOREM 9 If A and B are n × n matrices, then det(AB) = det A det B.
Proof: By Theorem 5.2.6 there exists a sequence of elementary matrices E1, . . . , Ek
such that A = E1 · · · EkR where R is the reduced row echelon form of A. If det A ≠ 0,
then A is invertible and R = I. Using Corollary 7, we get
det(AB) = det(E1 · · · EkB) = det(E1 · · · Ek) det B = det A det B
If det A = 0, then R ≠ I and so R contains at least one row of zeros. Corollary 7 then
gives
det(AB) = det(E1 · · · EkRB) = det E1 · · · det Ek det(RB)
Using block multiplication, we see that if R contains a row of zeros, then so does RB.
Thus, det(RB) = 0 and hence
det(AB) = 0 = det A det B ∎
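Theorem 9 is easy to spot-check numerically; this quick sketch (assuming NumPy) compares det(AB) with det A det B on random matrices.

```python
import numpy as np

# Random 4x4 matrices with a fixed seed for reproducibility.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

# det(AB) = det(A) det(B), up to floating-point error.
print(np.isclose(np.linalg.det(A @ B),
                 np.linalg.det(A) * np.linalg.det(B)))  # True
```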
COROLLARY 10 If A is an invertible matrix, then det A⁻¹ = 1/det A.
THEOREM 11 If A is an n × n matrix, then det A = det AT .
Since det A = det AT we can use column operations instead of row operations when
evaluating a determinant. In particular, adding a multiple of one column to another
does not change the determinant, swapping two columns multiplies the determinant
by (−1), and multiplying a column by a scalar c multiplies the determinant by c.
Furthermore, combining row operations, column operations, and cofactor expansions
can make evaluating determinants quite efficient.
EXAMPLE 9 Evaluate the determinant of D = [1 3 −1 1; −3 2 1 2; 2 −1 1 1; 2 −3 2 −3].

Solution: Using the row operation R3 + R1 gives
det D = |1 3 −1 1; −3 2 1 2; 3 2 0 2; 2 −3 2 −3|
We now use the column operation C2 − C4 to get
det D = |1 2 −1 1; −3 0 1 2; 3 0 0 2; 2 0 2 −3|
Using the cofactor expansion along the second column gives
det D = 2(−1)^{1+2} |−3 1 2; 3 0 2; 2 2 −3|
Applying R3 − 2R1 and then using the cofactor expansion along the second column
we get
det D = −2 |−3 1 2; 3 0 2; 8 0 −7| = (−2)(1)(−1)^{1+2} |3 2; 8 −7| = −74
Section 5.3 Problems
1. Find all the cofactors of the following matrices.
(a) [4 −1; 5 8]  (b) [1 0 3; 1 2 0; −1 1 3]
(c) [3 2 −1; 0 4 3; 2 −1 2]  (d) [−3 1 −4; 1 −5 9; −2 6 −5]
2. Evaluate the following determinants.
(a) |2 1; −3 4|  (b) |1 3; 1 −6|  (c) |4 2; 6 3|
(d) |1 2 −1; 0 2 7; 0 0 3|  (e) |3 0 −4; −2 0 5; 3 −1 8|  (f) |1 2 1; 5 2 0; 3 0 0|
(g) |1 2 3; −1 3 −8; 2 −5 2|  (h) |2 3 1; −2 2 0; 1 3 4|  (i) |6 8 −8; 7 5 8; −2 −4 8|
(j) |1 10 7 −9; 7 −7 7 7; 2 −2 6 2; −3 −3 4 1|
(k) |1 a a² a³; 1 b b² b³; 1 c c² c³; 1 d d² d³|
(l) |−1 2 6 4; 0 3 5 6; 1 −1 4 −2; 1 2 1 2|
3. Prove that if A is invertible, then det A⁻¹ = 1/det A.
4. Two n×n matrices A and B are said to be similar
if there exists an invertible matrix P such that
P−1AP = B. Prove that if A and B are similar,
then
det A = det B
5. An n × n matrix A is called orthogonal if
AT = A−1.
(a) If A is orthogonal prove that det A = ±1.
(b) Give an example of an orthogonal matrix
for which det A = −1.
6. Let A and B be n × n matrices. Determine whether each statement is true or
false. Justify your answer.
(a) If the columns of A are linearly independent, then det A ≠ 0.
(b) det(A + B) = det A + det B.
(c) The system of equations A~x = ~b is consistent only if det A ≠ 0.
(d) If det(AB) = 0, then det A = 0 or det B = 0.
7. Let A be an n × n matrix and let B be the matrix obtained from A by swapping
two rows of A. Prove that det B = − det A.
5.4 Determinants and Systems of Equations
The Invertible Matrix Theorem shows us that there is a close connection between
whether an n × n matrix A is invertible, the number of solutions of the system of
linear equations A~x = ~b, and the determinant of A. We now examine this relationship
a little further by looking at how to use determinants to find the inverse of a matrix
and to solve systems of linear equations.
Inverse by Cofactors
Observe that we can write the inverse of a matrix A = [a11 a12; a21 a22] in the form
A⁻¹ = (1/det A) [a22 −a12; −a21 a11] = (1/det A) [C11 C21; C12 C22]
That is, the entries of A⁻¹ are the cofactors of A. In particular,
(A⁻¹)ij = (1/det A) Cji   (5.3)
It is important to take careful notice of the change of order in the subscripts in the
line above.
It is instructive to verify this result by multiplying out A and A⁻¹.
AA⁻¹ = [a11 a12; a21 a22] (1/det A) [C11 C21; C12 C22]
= (1/det A) [a11C11 + a12C12  a11C21 + a12C22; a21C11 + a22C12  a21C21 + a22C22]
= (1/det A) [det A  a11(−a12) + a12a11; a21a22 + a22(−a21)  det A]
= [1 0; 0 1]
Our goal now is to prove that equation (5.3) holds in the n × n case. We begin by
proving a lemma.
Our goal now is to prove that equation (5.3) holds in the n × n case. We begin by
proving a lemma.
LEMMA 1 If A is an n × n matrix with cofactors Cij and i ≠ j, then
Σ_{k=1}^{n} (A)ikCjk = 0
Proof: Let B be the matrix obtained from A by replacing the j-th row of A by the
i-th row of A. That is, bℓk = aℓk for all ℓ ≠ j, and bjk = aik. Then B has two identical
rows, so det B = 0. Moreover, since the cofactors of the j-th row of B are the same
as the cofactors of the j-th row of A we get
0 = det B = Σ_{k=1}^{n} bjkCjk = Σ_{k=1}^{n} aikCjk ∎
In words, this lemma says that if we try to do a cofactor expansion of the determinant,
but use the coefficients from one row and the cofactors from another row, then we will
always get 0.
THEOREM 2 If A is an invertible n × n matrix, then (A⁻¹)ij = (1/det A) Cji.

Proof: Let B be the n × n matrix defined by (B)ij = (1/det A) Cji. Then, for 1 ≤ i ≤ n we
have that
(AB)ii = Σ_{k=1}^{n} (A)ik(B)ki = (1/det A) Σ_{k=1}^{n} (A)ikCik = 1
For (AB)ij with i ≠ j, we have
(AB)ij = Σ_{k=1}^{n} (A)ik(B)kj = (1/det A) Σ_{k=1}^{n} (A)ikCjk = 0
by Lemma 1. Therefore, AB = I so B = A⁻¹. ∎
Because of this result, we make the following two definitions.
DEFINITION
Cofactor Matrix
Let A be an n × n matrix. The cofactor matrix, cof A, of A is the matrix whose ij-th
entry is the ij-th cofactor of A. That is,
(cof A)ij = Cij

DEFINITION
Adjugate
Let A be an n × n matrix. The adjugate of A is the matrix defined by
(adj A)ij = Cji
In particular, adj A = (cof A)T.
REMARK
By Theorem 2, we have that A⁻¹ = (1/det A) adj A.
EXAMPLE 1 Determine the adjugate of A = [1 0 2; −2 1 0; 3 −1 1] and verify that A⁻¹ = (1/det A) adj A.

Solution: We find that the cofactor matrix of A is
cof A = [1 2 −1; −2 −5 1; −2 −4 1]
So
adj A = (cof A)T = [1 −2 −2; 2 −5 −4; −1 1 1]
Next, we find that
det A = |1 0 2; −2 1 0; 3 −1 1| = |1 0 2; 0 1 0; 1 −1 1| = −1
Multiplying we find
A( (1/det A) adj A ) = [1 0 2; −2 1 0; 3 −1 1] (1/(−1)) [1 −2 −2; 2 −5 −4; −1 1 1] = [1 0 0; 0 1 0; 0 0 1]
Hence,
A⁻¹ = (1/det A) adj A = [−1 2 2; −2 5 4; 1 −1 −1]
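The cofactor construction is mechanical enough to code directly. The sketch below (assuming NumPy; the helper name `adjugate` is ours, not the book's) builds cof A entry by entry from Theorem 2's formula and checks it against Example 1.

```python
import numpy as np

def adjugate(A):
    # Build the cofactor matrix entry by entry: C_ij = (-1)^(i+j) det A(i, j),
    # then transpose, since adj A = (cof A)^T.
    n = A.shape[0]
    cof = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            cof[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return cof.T

# The matrix from Example 1.
A = np.array([[1.0, 0, 2],
              [-2, 1, 0],
              [3, -1, 1]])

print(np.round(adjugate(A)))   # matches adj A found by hand above
print(np.allclose(adjugate(A) / np.linalg.det(A), np.linalg.inv(A)))  # True
```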
EXERCISE 1 Determine the adjugate of A = [3 4 0; −2 1 2; 1 3 1] and verify that A⁻¹ = (1/det A) adj A.
Notice that this is not a very efficient way of calculating the inverse. However, it can
be useful in some theoretical applications as it gives a formula for the entries of the
inverse.
Cramer’s Rule
Consider the system of linear equations A~x = ~b where A is an n × n matrix. If A is
invertible, then we know this has unique solution
~x = A⁻¹~b = (1/det A)(adj A)~b = (1/det A) [C11 · · · Cn1; ⋮ ⋮; C1n · · · Cnn][b1; ⋮; bn]
Moreover, we can find that the i-th component of ~x is
xi = (b1C1i + · · · + bnCni)/det A
Observe that the quantity b1C1i + · · · + bnCni is the determinant of the matrix where we
have replaced the i-th column of A by ~b. In particular, if we let Ai denote the matrix
where the i-th column of A is replaced by ~b,
Ai = [a11 · · · a1(i−1) b1 a1(i+1) · · · a1n; ⋮ ⋮ ⋮ ⋮; an1 · · · an(i−1) bn an(i+1) · · · ann]
then we get
xi = det Ai / det A

THEOREM 3 (Cramer’s Rule)
If A is an n × n invertible matrix, then the solution ~x of A~x = ~b is given by
xi = det Ai / det A, 1 ≤ i ≤ n
where Ai is the matrix obtained from A by replacing the i-th column of A by ~b.
EXAMPLE 2 Solve the system of linear equations using Cramer’s Rule.
x1 + 2x2 + x3 = 2
−4x1 − x2 + 3x3 = 6
−2x1 + x2 + 2x3 = 3

Solution: We have A = [1 2 1; −4 −1 3; −2 1 2] and ~b = [2; 6; 3]. We find that
det A = |1 2 1; −4 −1 3; −2 1 2| = |1 2 1; 0 7 7; 0 5 4| = −7
Therefore, A is invertible and we can apply Cramer's Rule to get

x_1 = \frac{\det A_1}{\det A} = -\frac{1}{7} \begin{vmatrix} 2 & 2 & 1 \\ 6 & -1 & 3 \\ 3 & 1 & 2 \end{vmatrix} = -\frac{1}{7} \begin{vmatrix} 0 & 2 & 1 \\ 0 & -1 & 3 \\ -1 & 1 & 2 \end{vmatrix} = \frac{-7}{-7} = 1

x_2 = \frac{\det A_2}{\det A} = -\frac{1}{7} \begin{vmatrix} 1 & 2 & 1 \\ -4 & 6 & 3 \\ -2 & 3 & 2 \end{vmatrix} = -\frac{1}{7} \begin{vmatrix} 1 & 0 & 1 \\ -4 & 0 & 3 \\ -2 & -1 & 2 \end{vmatrix} = \frac{7}{-7} = -1

x_3 = \frac{\det A_3}{\det A} = -\frac{1}{7} \begin{vmatrix} 1 & 2 & 2 \\ -4 & -1 & 6 \\ -2 & 1 & 3 \end{vmatrix} = -\frac{1}{7} \begin{vmatrix} 1 & 2 & 2 \\ 0 & 7 & 14 \\ 0 & 5 & 7 \end{vmatrix} = \frac{-21}{-7} = 3

So, the solution is ~x = \begin{bmatrix} 1 \\ -1 \\ 3 \end{bmatrix}.
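The rule x_i = det A_i / det A translates directly into a few lines of code. Below is a minimal sketch in Python with NumPy (the helper name `cramer_solve` is ours, not a library function), checked against the system of this example:

```python
import numpy as np

def cramer_solve(A, b):
    """Solve A x = b by Cramer's Rule: x_i = det(A_i) / det(A),
    where A_i is A with its i-th column replaced by b."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    d = np.linalg.det(A)
    x = np.empty(len(b))
    for i in range(len(b)):
        Ai = A.copy()
        Ai[:, i] = b              # replace the i-th column by b
        x[i] = np.linalg.det(Ai) / d
    return x

# The system from Example 2:
A = [[1, 2, 1], [-4, -1, 3], [-2, 1, 2]]
b = [2, 6, 3]
x = cramer_solve(A, b)
print(x)  # approximately [1, -1, 3]
```

As the text notes, this computes n + 1 determinants and is far slower than elimination; its value is the explicit formula, not efficiency.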
EXAMPLE 3 Let A = \begin{bmatrix} a & 1 & 2 \\ 0 & b & -1 \\ c & 1 & d \end{bmatrix}. Assuming that \det A ≠ 0, solve A~x = \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}.

Solution: We have

\det A = a(bd + 1) + c(-1 - 2b) = abd + a - c - 2bc

Thus,

x_1 = \frac{1}{\det A} \begin{vmatrix} 1 & 1 & 2 \\ -1 & b & -1 \\ 1 & 1 & d \end{vmatrix} = \frac{1}{\det A} \begin{vmatrix} 1 & 1 & 2 \\ 0 & b+1 & 1 \\ 0 & 0 & d-2 \end{vmatrix} = \frac{(b+1)(d-2)}{abd + a - c - 2bc}

x_2 = \frac{1}{\det A} \begin{vmatrix} a & 1 & 2 \\ 0 & -1 & -1 \\ c & 1 & d \end{vmatrix} = \frac{1}{\det A} \begin{vmatrix} a & 0 & 1 \\ 0 & -1 & -1 \\ c & 0 & d-1 \end{vmatrix} = \frac{-ad + a + c}{abd + a - c - 2bc}

x_3 = \frac{1}{\det A} \begin{vmatrix} a & 1 & 1 \\ 0 & b & -1 \\ c & 1 & 1 \end{vmatrix} = \frac{1}{\det A} \begin{vmatrix} a & 0 & 1 \\ 0 & b+1 & -1 \\ c & 0 & 1 \end{vmatrix} = \frac{(b+1)(a-c)}{abd + a - c - 2bc}

That is, the solution is ~x = \frac{1}{abd + a - c - 2bc} \begin{bmatrix} (b+1)(d-2) \\ -ad + a + c \\ (b+1)(a-c) \end{bmatrix}.
Section 5.4 Problems
1. Determine the adjugate of A and verify that A(\operatorname{adj} A) = (\det A)I.

(a) A = \begin{bmatrix} 1 & 3 \\ -2 & 5 \end{bmatrix}  (b) A = \begin{bmatrix} 4 & 5 \\ -1 & 3 \end{bmatrix}  (c) A = \begin{bmatrix} 1 & 3 \\ 4 & 10 \end{bmatrix}

(d) A = \begin{bmatrix} 1 & 2 & -1 \\ 0 & 2 & 7 \\ 0 & 0 & 3 \end{bmatrix}  (e) A = \begin{bmatrix} 1 & -2 & 2 \\ 3 & 0 & 1 \\ 5 & 1 & 1 \end{bmatrix}  (f) A = \begin{bmatrix} 4 & 0 & -4 \\ 0 & -1 & 1 \\ -2 & 2 & -1 \end{bmatrix}

2. Assuming that each matrix is invertible, use the adjugate to find A^{-1}.

(a) A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}  (b) A = \begin{bmatrix} 2 & -3 & t \\ -2 & 0 & 1 \\ 3 & 1 & 1 \end{bmatrix}  (c) A = \begin{bmatrix} 2 & 1 & 3 \\ 3 & t & -1 \\ -2 & 1 & 4 \end{bmatrix}  (d) A = \begin{bmatrix} a & 0 & b \\ 0 & c & 0 \\ b & 0 & d \end{bmatrix}
3. Solve the following systems of linear equations
using Cramer’s Rule.
(a) 3x1 − x2 = 2
4x1 + 7x2 = 5
(b) 2x1 + x2 = 1
3x1 + 7x2 = −2
(c) 2x1 + x2 = 3
3x1 + 7x2 = 5
(d) 2x1 + 3x2 + x3 = 1
x1 + x2 − x3 = −1
−2x1 + 2x3 = 1
(e) 5x1 + x2 − x3 = 4
9x1 + x2 − x3 = 1
x1 − x2 + 5x3 = 2
5.5 Area and Volume
Area of a Parallelogram in R2
Let ~u = \begin{bmatrix} u_1 \\ u_2 \end{bmatrix} and ~v = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} be two non-zero vectors in R². We can form a parallelogram
in R² with corner points (0, 0), (u_1, u_2), (v_1, v_2), and (u_1 + v_1, u_2 + v_2). This is called
the parallelogram induced by ~u and ~v. See Figure 5.5.1.
Figure 5.5.1: The parallelogram induced by vectors ~u and ~v
We know the area of a parallelogram is base × height. As in Figure 5.5.2, we get

Area(~u,~v) = ‖~u‖ ‖perp_{~u}(~v)‖

Also observe from Figure 5.5.2 that ‖perp_{~u}(~v)‖ = ‖~v‖ |\sin θ|. Now, recall from Section
1.4 that \cos θ = \frac{~u · ~v}{‖~u‖ ‖~v‖}. Using this gives

(Area(~u,~v))^2 = ‖~u‖^2 ‖~v‖^2 \sin^2 θ
= ‖~u‖^2 ‖~v‖^2 (1 − \cos^2 θ)
= ‖~u‖^2 ‖~v‖^2 − (~u · ~v)^2
= (u_1^2 + u_2^2)(v_1^2 + v_2^2) − (u_1 v_1 + u_2 v_2)^2
= u_1^2 v_2^2 + u_2^2 v_1^2 − 2 u_1 v_1 u_2 v_2
= (u_1 v_2 − u_2 v_1)^2
= \begin{vmatrix} u_1 & v_1 \\ u_2 & v_2 \end{vmatrix}^2

Hence, the area of the parallelogram is given by Area(~u,~v) = \left| \det \begin{bmatrix} ~u & ~v \end{bmatrix} \right|.
Figure 5.5.2: The area of the parallelogram induced by ~u and ~v
EXAMPLE 1 Determine the area of the parallelogram induced by ~u = \begin{bmatrix} 1 \\ 3 \end{bmatrix} and ~v = \begin{bmatrix} 2 \\ 1 \end{bmatrix}.

Solution: We have

Area(~u,~v) = \left| \det \begin{bmatrix} 1 & 2 \\ 3 & 1 \end{bmatrix} \right| = |-5| = 5
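The computation above is a one-liner numerically. A quick sketch with NumPy (illustrative only):

```python
import numpy as np

# Area of the parallelogram induced by u and v is |det[u v]|.
u = np.array([1.0, 3.0])
v = np.array([2.0, 1.0])
area = abs(np.linalg.det(np.column_stack([u, v])))
print(area)  # 5.0, up to floating-point rounding
```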
REMARK

Observe that the determinant in the example above is negative because the angle θ
from ~u to ~v is greater than π radians. In particular, if we swap ~u and ~v, that is, swap
the columns of the matrix \begin{bmatrix} ~u & ~v \end{bmatrix}, then we multiply the determinant by −1. In a
similar way, we can use the result above to get geometric confirmation of many of the
properties of determinants.
Volume of a Parallelepiped
We can repeat what we did above to find the volume of a parallelepiped induced by
three non-zero vectors ~u = \begin{bmatrix} u_1 \\ u_2 \\ u_3 \end{bmatrix}, ~v = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}, and ~w = \begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix} in R³.
Figure 5.5.3: The volume of a parallelepiped is determined by the area of its base times its height
The volume of a parallelepiped is (area of base) × height. The base is the parallelogram
induced by the vectors ~u and ~v. By Theorem 1.4.5, we have that

area of base = ‖~u‖ ‖~v‖ |\sin θ| = ‖~u × ~v‖

Also, the height is the length of the part of ~w perpendicular to the base. But this is
just the length of the projection of ~w onto the normal vector ~n of the base. Since ~u and ~v lie in the
base, we can take ~n = ~u × ~v. Therefore,

height = ‖proj_{~n}(~w)‖ = \frac{|~w · ~n|}{‖~n‖} = \frac{|~w · (~u × ~v)|}{‖~u × ~v‖}

Hence, the volume of the parallelepiped is given by

Volume(~u,~v,~w) = \frac{|~w · (~u × ~v)|}{‖~u × ~v‖} × ‖~u × ~v‖ = |~w · (~u × ~v)|
= |w_1(u_2 v_3 − u_3 v_2) − w_2(u_1 v_3 − u_3 v_1) + w_3(u_1 v_2 − u_2 v_1)|
= \left| \det \begin{bmatrix} u_1 & v_1 & w_1 \\ u_2 & v_2 & w_2 \\ u_3 & v_3 & w_3 \end{bmatrix} \right|
EXAMPLE 2 Find the volume of the parallelepiped determined by ~u = \begin{bmatrix} 1 \\ 1 \\ 4 \end{bmatrix}, ~v = \begin{bmatrix} 1 \\ 3 \\ 4 \end{bmatrix}, and ~w = \begin{bmatrix} -2 \\ 1 \\ -5 \end{bmatrix}.

Solution: The volume is

Volume(~u,~v,~w) = \left| \det \begin{bmatrix} ~u & ~v & ~w \end{bmatrix} \right| = \left| \det \begin{bmatrix} 1 & 1 & -2 \\ 1 & 3 & 1 \\ 4 & 4 & -5 \end{bmatrix} \right| = \left| \det \begin{bmatrix} 1 & 1 & -2 \\ 0 & 2 & 3 \\ 0 & 0 & 3 \end{bmatrix} \right| = 6
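The two formulas derived above, the scalar triple product |~w · (~u × ~v)| and the determinant form, can be checked against each other on this example. A sketch with NumPy (illustrative only):

```python
import numpy as np

# Volume of the parallelepiped: |w . (u x v)| = |det[u v w]|.
u = np.array([1.0, 1.0, 4.0])
v = np.array([1.0, 3.0, 4.0])
w = np.array([-2.0, 1.0, -5.0])

vol_triple = abs(np.dot(w, np.cross(u, v)))               # scalar triple product
vol_det = abs(np.linalg.det(np.column_stack([u, v, w])))  # determinant form
print(vol_triple, vol_det)  # both 6, up to rounding
```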
REMARK

Of course, this can be extended to higher dimensions. If ~v_1, . . . , ~v_n ∈ R^n, then we
say that they induce an n-dimensional parallelotope (the n-dimensional version of a
parallelogram or parallelepiped). The n-volume V of the parallelotope is

V = \left| \det \begin{bmatrix} ~v_1 & \cdots & ~v_n \end{bmatrix} \right|
Section 5.5 Problems
1. Find the area of the parallelogram induced by ~u and ~v.

(a) ~u = \begin{bmatrix} 2 \\ -3 \end{bmatrix}, ~v = \begin{bmatrix} 3 \\ 2 \end{bmatrix}  (b) ~u = \begin{bmatrix} 4 \\ 1 \end{bmatrix}, ~v = \begin{bmatrix} -1 \\ 1 \end{bmatrix}

(c) ~u = \begin{bmatrix} 4 \\ 1 \end{bmatrix}, ~v = \begin{bmatrix} -3 \\ 3 \end{bmatrix}  (d) ~u = \begin{bmatrix} 2 \\ 3 \end{bmatrix}, ~v = \begin{bmatrix} -2 \\ 2 \end{bmatrix}

2. Find the volume of the parallelepiped determined by ~u, ~v, and ~w.

(a) ~u = \begin{bmatrix} 5 \\ -1 \\ 1 \end{bmatrix}, ~v = \begin{bmatrix} 1 \\ 4 \\ -1 \end{bmatrix}, ~w = \begin{bmatrix} 1 \\ 2 \\ -6 \end{bmatrix}
(b) ~u = \begin{bmatrix} 1 \\ 1 \\ 4 \end{bmatrix}, ~v = \begin{bmatrix} 1 \\ 3 \\ 4 \end{bmatrix}, ~w = \begin{bmatrix} -2 \\ 1 \\ -5 \end{bmatrix}
(c) ~u = \begin{bmatrix} 2 \\ 3 \\ 4 \end{bmatrix}, ~v = \begin{bmatrix} 2 \\ -1 \\ -5 \end{bmatrix}, ~w = \begin{bmatrix} 1 \\ 5 \\ 2 \end{bmatrix}

3. Find the 4-volume of the parallelotope induced by ~v_1 = \begin{bmatrix} 1 \\ 1 \\ 2 \\ 1 \end{bmatrix}, ~v_2 = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 3 \end{bmatrix}, ~v_3 = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 0 \end{bmatrix}, ~v_4 = \begin{bmatrix} 1 \\ 0 \\ 5 \\ 7 \end{bmatrix}.
4. Let ~v_1, . . . , ~v_n ∈ R^n. Prove that the n-volume of the parallelotope induced by
~v_1, . . . , ~v_n is the same as the n-volume of the parallelotope induced by
~v_1, . . . , ~v_{n−1}, ~v_n + t~v_1.

5. (a) Let L : R² → R² be a linear mapping. Prove that

Area(L(~u), L(~v)) = |\det[L]| Area(~u,~v)

Thus, |\det[L]| is called the area scaling factor of the linear mapping.

(b) Calculate the area scaling factor of L(x_1, x_2) = (2x_1 + 8x_2, −x_1 + 5x_2).

(c) If L, M : R³ → R³ are linear mappings, then prove that the volume scaling factor
of M ◦ L is |\det([M][L])|.
Chapter 6
Eigenvectors and Diagonalization
6.1 Matrix of a Linear Mapping, Similar Matrices
Matrix of a Linear Operator
In Chapter 3, we saw how to find the standard matrix of a linear mapping by finding
the image of the standard basis vectors under the linear mapping. However, the
standard matrix of a linear mapping is not always nice to work with. For example,
consider the matrix

A = \begin{bmatrix} 1/5 & 2/5 \\ 2/5 & 4/5 \end{bmatrix}

By just looking at this matrix, can you give a geometric interpretation of the matrix
mapping L(~x) = A~x? Probably not. We found in Example 3.2.6 that this is the
standard matrix of proj_{~a} : R² → R² where ~a = \begin{bmatrix} 1 \\ 2 \end{bmatrix}. Even though we can use this
matrix to calculate the projection of a vector ~x onto ~a, the matrix itself does not seem
to provide us with any information about the action of the projection. This is because
we are trying to use the standard basis vectors to determine how a vector is projected
onto ~a. It would make more sense to instead use a basis for R² which contains ~a and
a vector orthogonal to ~a. Of course, to do this we first need to determine how to find
the matrix of a linear operator with respect to a basis other than the standard basis.
The standard matrix [L] of a linear operator L : R^n → R^n satisfies L(~x) = [L]~x.
Thus, to define the matrix [L]_B of L with respect to any basis B of R^n, we mimic this
formula in B-coordinates instead of standard coordinates. That is, we want

[L(~x)]_B = [L]_B [~x]_B

Let B = {~v_1, . . . , ~v_n}. Then, we can write ~x = b_1~v_1 + · · · + b_n~v_n and we get

[L(~x)]_B = [L(b_1~v_1 + · · · + b_n~v_n)]_B
= [b_1 L(~v_1) + · · · + b_n L(~v_n)]_B
= b_1 [L(~v_1)]_B + · · · + b_n [L(~v_n)]_B
= \begin{bmatrix} [L(~v_1)]_B & \cdots & [L(~v_n)]_B \end{bmatrix} \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix}
Therefore, we have [L(~x)]B as a product of a matrix and [~x]B as desired. We make
the following definition.
DEFINITION
B-Matrix

Let B = {~v_1, . . . , ~v_n} be a basis for R^n and let L : R^n → R^n be a linear operator. The
B-matrix of L is defined to be

[L]_B = \begin{bmatrix} [L(~v_1)]_B & \cdots & [L(~v_n)]_B \end{bmatrix}

It satisfies [L(~x)]_B = [L]_B [~x]_B.
EXAMPLE 1 Let L be a linear mapping with [L] = \begin{bmatrix} -6 & -2 & 9 \\ -5 & -1 & 7 \\ -7 & -2 & 10 \end{bmatrix} and let
B = \left\{ \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix} \right\}.
Find the B-matrix of L and use it to determine [L(~x)]_B where [~x]_B = \begin{bmatrix} 1 \\ -2 \\ 0 \end{bmatrix}.

Solution: By definition, the columns of the B-matrix are the B-coordinate vectors of
the images of the vectors in B under L. Thus, we first find the images of the vectors
in B under L and write them as linear combinations of the vectors in B. We get

\begin{bmatrix} -6 & -2 & 9 \\ -5 & -1 & 7 \\ -7 & -2 & 10 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = 1\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} + 0\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} + 0\begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix}

\begin{bmatrix} -6 & -2 & 9 \\ -5 & -1 & 7 \\ -7 & -2 & 10 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 3 \\ 2 \\ 3 \end{bmatrix} = 2\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} + 1\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} + 0\begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix}

\begin{bmatrix} -6 & -2 & 9 \\ -5 & -1 & 7 \\ -7 & -2 & 10 \end{bmatrix} \begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix} = \begin{bmatrix} 6 \\ 6 \\ 7 \end{bmatrix} = 3\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} + 2\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} + 1\begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix}

Hence, [L]_B = \begin{bmatrix} [L(1,1,1)]_B & [L(1,0,1)]_B & [L(1,3,2)]_B \end{bmatrix} = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix}.

Thus, [L(~x)]_B = [L]_B [~x]_B = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ -2 \\ 0 \end{bmatrix} = \begin{bmatrix} -3 \\ -2 \\ 0 \end{bmatrix}.
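The hand computation above can be checked numerically: column i of [L]_B is the B-coordinate vector of [L]~v_i, i.e. the solution c of Pc = [L]~v_i, where P has the basis vectors as its columns. A sketch with NumPy (illustrative; solving for all columns at once amounts to computing P^{-1}[L]P):

```python
import numpy as np

A = np.array([[-6, -2,  9],
              [-5, -1,  7],
              [-7, -2, 10]], dtype=float)

# Basis vectors of B as the columns of P.
P = np.array([[1, 1, 1],
              [1, 0, 3],
              [1, 1, 2]], dtype=float)

# Column i of [L]_B solves P c = A v_i; doing all columns at once:
L_B = np.linalg.solve(P, A @ P)
print(np.round(L_B))  # [[1 2 3], [0 1 2], [0 0 1]], as in the example
```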
EXERCISE 1 Let L : R² → R² be the linear mapping defined by L(x_1, x_2) = (2x_1 + 3x_2, 4x_1 − x_2)
and B = \left\{ \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 1 \\ -1 \end{bmatrix} \right\}. Find the B-matrix of L and use it to determine [L(~x)]_B where
[~x]_B = \begin{bmatrix} 1 \\ 1 \end{bmatrix}.
In the last example and the last exercise we used the methods of Section 4.3 to find
the coordinates of the image vectors with respect to the basis B. However, if we can
pick a basis for the linear mapping which is geometrically suited to the mapping, then
we can often find the coordinates of the image vectors with respect to the basis by
observation.
EXAMPLE 2 Let ~a = \begin{bmatrix} 1 \\ 2 \end{bmatrix}. Determine a geometrically natural basis B for proj_{~a} : R² → R², and
then determine [proj_{~a}]_B.

Solution: As discussed above, it makes sense to include ~a in our geometrically natural
basis as well as a vector orthogonal to ~a. So, we take the other basis vector to
be ~n = \begin{bmatrix} 2 \\ -1 \end{bmatrix} and let B = {~a, ~n}. By definition, the columns of the B-matrix are the
B-coordinate vectors of the images of the vectors in B under proj_{~a}. We have

proj_{~a}(~a) = ~a = 1\begin{bmatrix} 1 \\ 2 \end{bmatrix} + 0\begin{bmatrix} 2 \\ -1 \end{bmatrix}

proj_{~a}(~n) = ~0 = 0\begin{bmatrix} 1 \\ 2 \end{bmatrix} + 0\begin{bmatrix} 2 \\ -1 \end{bmatrix}

Therefore,

[proj_{~a}]_B = \begin{bmatrix} [proj_{~a}(~a)]_B & [proj_{~a}(~n)]_B \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}
Let us compare the standard matrix of proj_{~a} to its B-matrix found in the example
above. First, we already discussed that its standard matrix [proj_{~a}] = \begin{bmatrix} 1/5 & 2/5 \\ 2/5 & 4/5 \end{bmatrix}
does not provide any useful information about the mapping. It does show us that
the projection of \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} onto ~a is \begin{bmatrix} \frac{1}{5}x_1 + \frac{2}{5}x_2 \\ \frac{2}{5}x_1 + \frac{4}{5}x_2 \end{bmatrix}, but this does not tell us much about the
geometry of the mapping.

On the other hand, consider its B-matrix [proj_{~a}]_B = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}. What information does
this give us about the mapping? First, this matrix clearly shows the action of the
mapping. In particular, for any ~x = b_1~a + b_2~n ∈ R² we have

[proj_{~a}(~x)]_B = [proj_{~a}]_B [~x]_B = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} = \begin{bmatrix} b_1 \\ 0 \end{bmatrix}

That is, the image of ~x under proj_{~a} is the amount of ~x in the direction of ~a. Also, it is
clear from the form of the matrix that the range of the mapping is all scalar multiples
of the first vector in B and the kernel is all scalar multiples of the second vector in B.
The form of the matrix in Example 2 is obviously important. Therefore, we make the
following definition.
DEFINITION
Diagonal Matrix

An n × n matrix D is said to be a diagonal matrix if d_{ij} = 0 for all i ≠ j. We denote
a diagonal matrix by

diag(d_{11}, d_{22}, . . . , d_{nn})
EXAMPLE 3 The diagonal matrix D = diag(1, 2, 0, 1) is D = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.
Let's consider another example. Let L : R³ → R³ be the linear mapping with standard
matrix [L] = \begin{bmatrix} 1 & 0 & -1 \\ -1/3 & 2/3 & -1/3 \\ -2/3 & -2/3 & 4/3 \end{bmatrix}. Without doing any further work, what can you
determine about L from [L]? Not much. However, it can be shown that there exists a
basis B = {~v_1, ~v_2, ~v_3} for R³ such that [L]_B = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 0 \end{bmatrix}. What can you say about L by
looking at [L]_B? We see that the action of L is to map a vector ~x = b_1~v_1 + b_2~v_2 + b_3~v_3
onto b_1~v_1 + 2b_2~v_2 + 0b_3~v_3, the range of L is Span{~v_1, ~v_2}, and the kernel of L is Span{~v_3}.

Clearly it is very useful to have a B-matrix of a linear operator L : R^n → R^n in
diagonal form. This leads to a couple of questions. How do we determine if there
exists a basis B such that [L]_B has this form? If there is such a basis, how do we find
it? We start answering these questions by looking at our method for calculating the
B-matrix a little more closely.
Similar Matrices
Let L : R^n → R^n be a linear operator with standard matrix A = [L] and let
B = {~v_1, . . . , ~v_n} be a basis for R^n. Then, by definition, we have

[L]_B = \begin{bmatrix} [A~v_1]_B & \cdots & [A~v_n]_B \end{bmatrix}   (6.1)

From our work in Section 4.3, we know that the change of coordinates matrix from
B-coordinates to S-coordinates is P = \begin{bmatrix} ~v_1 & \cdots & ~v_n \end{bmatrix}. Hence, we have [~x]_B = P^{-1}~x
for any ~x ∈ R^n. Since A~v_i ∈ R^n for 1 ≤ i ≤ n, we can apply this to (6.1) to get

[L]_B = \begin{bmatrix} [A~v_1]_B & \cdots & [A~v_n]_B \end{bmatrix} = \begin{bmatrix} P^{-1}A~v_1 & \cdots & P^{-1}A~v_n \end{bmatrix} = P^{-1}A\begin{bmatrix} ~v_1 & \cdots & ~v_n \end{bmatrix} = P^{-1}AP
REMARK

You can also show that [L]_B = P^{-1}[L]P by showing that [L]_B[~x]_B = P^{-1}[L]P[~x]_B
for all ~x ∈ R^n, simplifying the right-hand side of the equation. This is an excellent
exercise.
Since [L]B and [L] represent the same linear mapping, we would expect that these
two matrices have many of the same properties.
THEOREM 1 If A and B are n × n matrices such that P^{-1}AP = B for some invertible matrix P, then

(1) rank A = rank B
(2) \det A = \det B
(3) tr A = tr B, where tr A = \sum_{i=1}^{n} a_{ii} is called the trace of the matrix.
We see that such matrices are similar in many ways. This motivates the following
definition.
DEFINITION
Similar Matrices

If A and B are n × n matrices such that P^{-1}AP = B for some invertible matrix P, then
A is said to be similar to B.
REMARK
Observe that if P−1AP = B, then taking Q = P−1 we get that Q is an invertible
matrix such that A = PBP−1 = Q−1BQ. So, the similarity property is symmetric. In
particular, we usually will just say that A and B are similar.
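The shared properties of Theorem 1 are easy to confirm numerically. A small sketch with NumPy (the particular A and P here are made up for illustration; any invertible P would do):

```python
import numpy as np

A = np.array([[1, 2, 0],
              [0, 3, 1],
              [2, 0, 1]], dtype=float)
P = np.array([[1, 1, 0],
              [0, 1, 1],
              [1, 0, 1]], dtype=float)   # det P = 2, so P is invertible
B = np.linalg.inv(P) @ A @ P             # B is similar to A

# Similar matrices share rank, determinant, and trace.
assert np.linalg.matrix_rank(A) == np.linalg.matrix_rank(B)
assert np.isclose(np.linalg.det(A), np.linalg.det(B))
assert np.isclose(np.trace(A), np.trace(B))
```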
We can now rephrase our earlier questions. Given a linear operator L : R^n → R^n, is the
standard matrix of L similar to a diagonal matrix? If so, how do we find an invertible
matrix P such that P^{-1}[L]P is diagonal? In the next section, we will continue to work
towards answering these questions.
Section 6.1 Problems
1. For each linear operator L and basis B, find the B-matrix of L.

(a) L(x_1, x_2) = (3x_1 + 2x_2, 5x_1 + 4x_2), B = \left\{ \begin{bmatrix} 1 \\ -2 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \end{bmatrix} \right\}

(b) L(x_1, x_2) = (2x_1 + 4x_2, 3x_1 − x_2), B = \left\{ \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \end{bmatrix} \right\}

(c) L(x_1, x_2) = (−7x_1 − 3x_2, 10x_1 + 4x_2), B = \left\{ \begin{bmatrix} -3 \\ 5 \end{bmatrix}, \begin{bmatrix} 1 \\ -2 \end{bmatrix} \right\}

(d) L(x_1, x_2, x_3) = (x_1 + 3x_3, x_2 − x_3, 2x_1 + 3x_2), B = \left\{ \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ -1 \\ -1 \end{bmatrix} \right\}

(e) L(x_1, x_2, x_3) = (5x_1 − 6x_3, 2x_2, 3x_1 − 4x_3), B = \left\{ \begin{bmatrix} 2 \\ -1 \\ -1 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ -1 \end{bmatrix} \right\}

(f) L(x_1, x_2, x_3) = (5x_1 − 6x_3, 2x_2, 3x_1 − 4x_3), B = \left\{ \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} \right\}

(g) L(x_1, x_2, x_3) = (2x_1 + x_3, x_1 + 2x_2 + x_3, x_3), B = \left\{ \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \right\}
2. Consider the basis B = \left\{ \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} \right\} for R³.
Let L : R³ → R³ be the linear mapping where [L]_B = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 4 & 2 \\ 3 & 3 & 0 \end{bmatrix}.

(a) Find L(~x) if [~x]_B = \begin{bmatrix} 1 \\ 4 \\ 3 \end{bmatrix}.
(b) Find L(~y) if [~y]_B = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}.
(c) Find [L].
3. Consider the basis B = \left\{ \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} \right\} for R³.
Let L : R³ → R³ be the linear mapping where [L]_B = \begin{bmatrix} 0 & 0 & 3 \\ 5 & 0 & -1 \\ 0 & 2 & 2 \end{bmatrix}.

(a) Find L(~x) if [~x]_B = \begin{bmatrix} 3 \\ -1 \\ 2 \end{bmatrix}.
(b) Find L(~y) if [~y]_B = \begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix}.
(c) Find [L].
4. Let ~a = \begin{bmatrix} 3 \\ -1 \end{bmatrix}, ~b = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, and let P be a plane with normal vector ~n = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}.
For each linear mapping, find a geometrically natural basis B and the B-matrix of the mapping.

(a) proj_{~a} : R² → R²  (b) proj_{~b} : R² → R²
(c) perp_{~b} : R² → R²  (d) refl_P : R³ → R³
5. Let A, B, and C be n × n matrices.
(a) Prove that A is similar to A.
(b) Prove that if A is similar to B and B is sim-
ilar to C, then A is similar to C.
6. Prove that A^T A is similar to I if and only if A^T = A^{-1}.

7. If A is similar to B, prove that A^n is similar to B^n for all positive integers n.
8. Prove Theorem 1(1).
9. Let A and B be n×n matrices. Prove or disprove
each statement.
(a) If rank A = rank B, then A and B are simi-
lar.
(b) If A and B are similar and A is invertible,
then B is invertible.
(c) A is similar to its reduced row echelon
form R.
(d) If A is similar to the diagonal matrix
D = diag(d11, d22, . . . , dnn), then
det A = d11d22 · · · dnn.
6.2 Eigenvalues and Eigenvectors
Let A = [L] be the standard matrix of a linear operator L : R^n → R^n. To determine
how to construct an invertible matrix P such that P^{-1}AP = D is diagonal, we work in
reverse. That is, we will assume that such a matrix P exists and use this to find what
properties P and D must have.

Let P = \begin{bmatrix} ~v_1 & \cdots & ~v_n \end{bmatrix} and let D = diag(λ_1, . . . , λ_n) = \begin{bmatrix} λ_1~e_1 & \cdots & λ_n~e_n \end{bmatrix} such that
P^{-1}AP = D, or alternately AP = PD. This gives

A\begin{bmatrix} ~v_1 & \cdots & ~v_n \end{bmatrix} = P\begin{bmatrix} λ_1~e_1 & \cdots & λ_n~e_n \end{bmatrix}
\begin{bmatrix} A~v_1 & \cdots & A~v_n \end{bmatrix} = \begin{bmatrix} λ_1 P~e_1 & \cdots & λ_n P~e_n \end{bmatrix}
\begin{bmatrix} A~v_1 & \cdots & A~v_n \end{bmatrix} = \begin{bmatrix} λ_1~v_1 & \cdots & λ_n~v_n \end{bmatrix}

Thus, we must have A~v_i = λ_i~v_i for 1 ≤ i ≤ n. Moreover, for P to be invertible, the
columns of P must be linearly independent. In particular, ~v_i ≠ ~0 for 1 ≤ i ≤ n.
DEFINITION
Eigenvalues, Eigenvectors, Eigenpair

Let A be an n × n matrix. If there exists a vector ~v ≠ ~0 such that A~v = λ~v, then λ
is called an eigenvalue of A and ~v is called an eigenvector of A corresponding to λ.
The pair (λ, ~v) is called an eigenpair.
If L : Rn → Rn is a linear operator with standard matrix A = [L] and (λ,~v) is an
eigenpair of A, then observe that we have
L(~v) = A~v = λ~v
Hence, it makes sense to also define eigenvalues and eigenvectors for linear operators.
DEFINITION
Eigenvalues, Eigenvectors

Let L : R^n → R^n be a linear operator. If there exists a vector ~v ≠ ~0 such that
L(~v) = λ~v, then λ is called an eigenvalue of L and ~v is called an eigenvector of L
corresponding to λ.
EXAMPLE 1 Let ~a = \begin{bmatrix} 1 \\ 2 \end{bmatrix} and ~n = \begin{bmatrix} 2 \\ -1 \end{bmatrix}. We have

proj_{~a}(~a) = 1~a
proj_{~a}(~n) = ~0 = 0~n

Hence, ~a is an eigenvector of proj_{~a} with eigenvalue 1, and ~n is an eigenvector of
proj_{~a} with eigenvalue 0.
Observe that it is easy to determine whether a given vector ~v is an eigenvector of a
given matrix A. We just need to determine whether the image of ~v under A is a scalar
multiple of ~v.
EXAMPLE 2 Determine which of the following vectors are eigenvectors of A = \begin{bmatrix} 2 & -3 & -1 \\ 1 & -2 & -1 \\ 1 & -3 & 0 \end{bmatrix}.

(a) ~v_1 = \begin{bmatrix} 3 \\ 1 \\ 0 \end{bmatrix}  (b) ~v_2 = \begin{bmatrix} -1 \\ 2 \\ -1 \end{bmatrix}  (c) ~v_3 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}

Solution: We have A~v_1 = \begin{bmatrix} 2 & -3 & -1 \\ 1 & -2 & -1 \\ 1 & -3 & 0 \end{bmatrix} \begin{bmatrix} 3 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 3 \\ 1 \\ 0 \end{bmatrix}.

So, ~v_1 is an eigenvector with eigenvalue λ_1 = 1.

For ~v_2 we get A~v_2 = \begin{bmatrix} 2 & -3 & -1 \\ 1 & -2 & -1 \\ 1 & -3 & 0 \end{bmatrix} \begin{bmatrix} -1 \\ 2 \\ -1 \end{bmatrix} = \begin{bmatrix} -7 \\ -4 \\ -7 \end{bmatrix}.

Since A~v_2 is not a scalar multiple of ~v_2, it is not an eigenvector.

For ~v_3 we get A~v_3 = \begin{bmatrix} 2 & -3 & -1 \\ 1 & -2 & -1 \\ 1 & -3 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} -2 \\ -2 \\ -2 \end{bmatrix} = (-2)\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}.

So, ~v_3 is an eigenvector with eigenvalue λ_2 = −2.
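This check mechanizes easily: compute A~v and test whether it is a scalar multiple of ~v. A sketch with NumPy (the helper `is_eigenvector` is ours, for illustration):

```python
import numpy as np

def is_eigenvector(A, v):
    """Return (True, lam) if A v = lam v for some scalar lam, else (False, None).
    Assumes v is nonzero."""
    v = np.asarray(v, dtype=float)
    Av = np.asarray(A, dtype=float) @ v
    i = int(np.argmax(v != 0))        # index of a nonzero component of v
    lam = Av[i] / v[i]
    return (True, lam) if np.allclose(Av, lam * v) else (False, None)

A = [[2, -3, -1], [1, -2, -1], [1, -3, 0]]
print(is_eigenvector(A, [3, 1, 0]))    # (True, 1.0)
print(is_eigenvector(A, [-1, 2, -1]))  # (False, None)
print(is_eigenvector(A, [1, 1, 1]))    # (True, -2.0)
```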
EXERCISE 1 Determine which of the following vectors are eigenvectors of A = \begin{bmatrix} 1 & 4 & 0 \\ -1 & -5 & 1 \\ -3 & -4 & -2 \end{bmatrix}.

(a) ~v_1 = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}  (b) ~v_2 = \begin{bmatrix} -4 \\ 3 \\ 5 \end{bmatrix}  (c) ~v_3 = \begin{bmatrix} -2 \\ 1 \\ 2 \end{bmatrix}
How do we determine whether a given scalar λ is an eigenvalue of a given matrix A?
We need to determine whether there exists a vector ~v ≠ ~0 such that A~v = λ~v. At first
glance this equation might look like a system of linear equations, but we observe that
the unknown vector ~v is actually on both sides of the equation. To turn this into a
system of linear equations, we need to move λ~v to the other side. We get

A~v − λ~v = ~0
(A − λI)~v = ~0

Hence, we have a homogeneous system of linear equations with coefficient matrix
A − λI. Since the definition requires that ~v ≠ ~0, we see that λ is an eigenvalue of A if
and only if this system has infinitely many solutions. Thus, since A − λI is n × n, we
know by the Invertible Matrix Theorem that we require \det(A − λI) = 0. Moreover,
we see that to find all the eigenvectors associated with the eigenvalue λ, we just need
to determine all non-zero vectors ~v such that (A − λI)~v = ~0.
EXAMPLE 3 Find all of the eigenvalues of A = \begin{bmatrix} 1 & 6 & 3 \\ 0 & -2 & 0 \\ 3 & 6 & 1 \end{bmatrix}. Determine all eigenvectors associated
with each eigenvalue.

Solution: Our work above shows that all eigenvalues of A must satisfy \det(A − λI) = 0.
Thus, to find all eigenvalues of A, we solve this equation for λ. We have

0 = \det(A − λI) = \begin{vmatrix} 1-λ & 6 & 3 \\ 0 & -2-λ & 0 \\ 3 & 6 & 1-λ \end{vmatrix} = (-2-λ)\begin{vmatrix} 1-λ & 3 \\ 3 & 1-λ \end{vmatrix} = -(λ+2)(λ^2 - 2λ - 8) = -(λ+2)(λ+2)(λ-4)

Hence, the eigenvalues are λ_1 = −2, λ_2 = −2, and λ_3 = 4.

To find the eigenvectors associated with the eigenvalue λ_1 = −2, we solve the homogeneous
system (A − λ_1 I)~v = ~0. Row reducing the coefficient matrix gives

A − λ_1 I = \begin{bmatrix} 3 & 6 & 3 \\ 0 & 0 & 0 \\ 3 & 6 & 3 \end{bmatrix} ∼ \begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}

The solution of the homogeneous system is ~v = s\begin{bmatrix} -2 \\ 1 \\ 0 \end{bmatrix} + t\begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}, s, t ∈ R. Hence,
the eigenvectors for λ_1 are all non-zero vectors of this form. Of course, these are also the
eigenvectors for λ_2.
Similarly, for λ_3 = 4 we row reduce the coefficient matrix of the homogeneous system
(A − λ_3 I)~v = ~0 to get

A − λ_3 I = \begin{bmatrix} -3 & 6 & 3 \\ 0 & -6 & 0 \\ 3 & 6 & -3 \end{bmatrix} ∼ \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}

The solution of the homogeneous system is ~v = t\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, t ∈ R. Hence, all eigenvectors
for λ_3 are ~v = t\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} with t ≠ 0.
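In practice one rarely factors C(λ) by hand; numerical routines return the eigenvalues and eigenvectors directly. A sketch checking this example with NumPy (illustrative; note that `eig` reports one eigenvector per column and does not report multiplicities):

```python
import numpy as np

A = np.array([[1,  6, 3],
              [0, -2, 0],
              [3,  6, 1]], dtype=float)

# Eigenvalues are the roots of det(A - lam I).
eigvals, eigvecs = np.linalg.eig(A)
print(np.sort(eigvals))  # close to [-2, -2, 4]

# Every returned pair satisfies A v = lam v:
for lam, v in zip(eigvals, eigvecs.T):
    assert np.allclose(A @ v, lam * v)
```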
Observe that for any n × n matrix A, \det(A − λI) is always an n-th degree polynomial
in λ, and the roots of this polynomial are the eigenvalues of A.
DEFINITION
Characteristic Polynomial

Let A be an n × n matrix. The characteristic polynomial of A is the n-th degree
polynomial

C(λ) = \det(A − λI)
THEOREM 1 A scalar λ is an eigenvalue of an n × n matrix A if and only if C(λ) = 0.
We also see that the set of all eigenvectors associated with an eigenvalue λ is the
nullspace of A − λI not including the zero vector.
DEFINITION
Eigenspace

Let A be an n × n matrix with eigenvalue λ. We call the nullspace of A − λI the
eigenspace of λ. The eigenspace is denoted E_λ.
REMARKS

1. It is important to remember that the set of all eigenvectors for the eigenvalue λ
of A is all vectors in E_λ excluding the zero vector.

2. Since the eigenspace is just the nullspace of A − λI, it is also a subspace of R^n.

3. The examples and exercises we will do are relatively simple. In real-world
applications, the matrices are generally much larger and most likely have non-integer
eigenvalues.
EXAMPLE 4 Find all eigenvalues and a basis for each eigenspace for A = \begin{bmatrix} -4 & 3 \\ 2 & 1 \end{bmatrix}.

Solution: To find all of the eigenvalues, we find the roots of the characteristic polynomial.
We have

0 = C(λ) = \det(A − λI) = \begin{vmatrix} -4-λ & 3 \\ 2 & 1-λ \end{vmatrix} = λ^2 + 3λ − 10 = (λ − 2)(λ + 5)

Hence, the eigenvalues of A are λ_1 = 2 and λ_2 = −5. For λ_1 = 2, we have

A − λ_1 I = \begin{bmatrix} -6 & 3 \\ 2 & -1 \end{bmatrix} ∼ \begin{bmatrix} 1 & -1/2 \\ 0 & 0 \end{bmatrix}

Hence, a basis for E_{λ_1} is \left\{ \begin{bmatrix} 1 \\ 2 \end{bmatrix} \right\}.

For λ_2 = −5, we have

A − λ_2 I = \begin{bmatrix} 1 & 3 \\ 2 & 6 \end{bmatrix} ∼ \begin{bmatrix} 1 & 3 \\ 0 & 0 \end{bmatrix}

Hence, a basis for E_{λ_2} is \left\{ \begin{bmatrix} -3 \\ 1 \end{bmatrix} \right\}.
EXAMPLE 5 Find all eigenvalues and a basis for each eigenspace for A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.

Solution: To find all of the eigenvalues, we find the roots of the characteristic polynomial.
We have

0 = C(λ) = \det(A − λI) = \begin{vmatrix} 1-λ & 0 & 1 \\ 0 & 1-λ & 0 \\ 0 & 0 & 1-λ \end{vmatrix} = (1 − λ)^3

Hence, all three eigenvalues of A are 1. For λ_1 = 1, we have A − λ_1 I = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.

Hence, a basis for E_{λ_1} is \left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \right\}.
EXERCISE 2 Find all eigenvalues and a basis for each eigenspace for A = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 1 & 2 \\ 0 & 0 & -2 \end{bmatrix}.
From our work in the last example and the last exercise, we see that finding the
eigenvalues of a triangular matrix is easy.
THEOREM 2 If A is an n × n upper or lower triangular matrix, then the eigenvalues of A are the
diagonal entries of A.
Observe in the problems above that the dimension of the eigenspace of an eigenvalue
is always less than or equal to the number of times the eigenvalue is repeated as a
root of the characteristic polynomial. Before we prove this is always the case, we
make the following definitions and prove a lemma.
DEFINITION
Algebraic Multiplicity, Geometric Multiplicity

Let A be an n × n matrix with eigenvalue λ_1. The algebraic multiplicity of λ_1,
denoted a_{λ_1}, is the number of times that λ_1 is a root of the characteristic polynomial
C(λ). That is, if C(λ) = (λ − λ_1)^k C_1(λ), where C_1(λ_1) ≠ 0, then a_{λ_1} = k. The
geometric multiplicity of λ_1, denoted g_{λ_1}, is the dimension of its eigenspace. So,
g_{λ_1} = \dim(E_{λ_1}).
EXAMPLE 6 In Example 5, we found that the matrix A has only one eigenvalue λ_1 = 1, which is a
triple root of the characteristic polynomial. Thus, the algebraic multiplicity of λ_1 is
a_{λ_1} = 3. The geometric multiplicity of λ_1 is g_{λ_1} = \dim E_{λ_1} = 2.
EXAMPLE 7 In Example 4, the matrix A has eigenvalues λ_1 = 2 and λ_2 = −5, each of which occurs
as a single root of the characteristic polynomial. Hence, the algebraic multiplicity of
both eigenvalues is 1. Moreover, the eigenspace of each eigenvalue has dimension 1,
so the geometric multiplicity of each eigenvalue is 1.
EXAMPLE 8 Find the geometric and algebraic multiplicity of all eigenvalues of A = \begin{bmatrix} -1 & 6 & 3 \\ 1 & 0 & -1 \\ -3 & 6 & 5 \end{bmatrix}.

Solution: We have

0 = C(λ) = \begin{vmatrix} -1-λ & 6 & 3 \\ 1 & -λ & -1 \\ -3 & 6 & 5-λ \end{vmatrix}
= (-1-λ)\begin{vmatrix} -λ & -1 \\ 6 & 5-λ \end{vmatrix} + 6(-1)\begin{vmatrix} 1 & -1 \\ -3 & 5-λ \end{vmatrix} + 3\begin{vmatrix} 1 & -λ \\ -3 & 6 \end{vmatrix}
= (-1-λ)(λ^2 - 5λ + 6) - 6(-λ + 2) + 3(6 - 3λ)
= -λ^3 + 4λ^2 - 4λ = -λ(λ^2 - 4λ + 4) = -λ(λ - 2)^2

Thus, the eigenvalues of A are λ_1 = 0 and λ_2 = 2. Since λ_1 is a single root of C(λ),
we have that a_{λ_1} = 1. We have a_{λ_2} = 2 since λ_2 is a double root of C(λ).

For λ_1 = 0, we have

A − λ_1 I = \begin{bmatrix} -1 & 6 & 3 \\ 1 & 0 & -1 \\ -3 & 6 & 5 \end{bmatrix} ∼ \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 1/3 \\ 0 & 0 & 0 \end{bmatrix}

Thus, a basis for E_{λ_1} is \left\{ \begin{bmatrix} 1 \\ -1/3 \\ 1 \end{bmatrix} \right\}. Hence, g_{λ_1} = \dim E_{λ_1} = 1.

For λ_2 = 2, we have

A − λ_2 I = \begin{bmatrix} -3 & 6 & 3 \\ 1 & -2 & -1 \\ -3 & 6 & 3 \end{bmatrix} ∼ \begin{bmatrix} 1 & -2 & -1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}

Thus, a basis for E_{λ_2} is \left\{ \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} \right\}. Hence, g_{λ_2} = \dim E_{λ_2} = 2.
EXERCISE 3 Find the geometric and algebraic multiplicity of all eigenvalues of A = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 1 & 2 \\ 0 & 0 & -2 \end{bmatrix}.
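For a matrix with known, well-separated eigenvalues, both multiplicities can be computed mechanically: count repeated roots of C(λ) for a_λ, and use g_λ = n − rank(A − λI). A sketch on the matrix of Example 8 (illustrative; robust multiplicity detection for general floating-point spectra is considerably harder than this):

```python
import numpy as np

A = np.array([[-1, 6,  3],
              [ 1, 0, -1],
              [-3, 6,  5]], dtype=float)
lam = 2.0

# Algebraic multiplicity: number of roots of C(lambda) equal to lam.
alg = int(np.sum(np.isclose(np.linalg.eigvals(A), lam)))

# Geometric multiplicity: dim Null(A - lam I) = n - rank(A - lam I).
geo = A.shape[0] - np.linalg.matrix_rank(A - lam * np.eye(3))

print(alg, geo)  # 2 2, matching Example 8
```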
Observe in Example 8 that we needed to factor a cubic polynomial. For purposes of
this course, we typically have two options to make this easier. First, we can try to
use row and/or column operations when finding det(A−λI) so that we do not have to
factor a cubic polynomial, or alternately we can use the Rational Roots Theorem and
synthetic division to factor the cubic. Of course, in the real world we generally have
polynomials of degree much larger than 3 and are not so lucky as to have rational
roots. In these cases, techniques for approximating roots of polynomials or more
specialized methods for approximating eigenvalues are employed. These are beyond
the scope of this book. However, you should keep in mind that the problems we do
here are to help you understand the theory.
EXAMPLE 9 Find the geometric and algebraic multiplicity of all eigenvalues of A = \begin{bmatrix} -8 & -3 & -1 \\ 9 & 4 & -2 \\ -8 & -8 & -1 \end{bmatrix}.

Solution: Since adding a multiple of one column to another or adding a multiple of
one row to another does not change the determinant, we get

0 = C(λ) = \begin{vmatrix} -8-λ & -3 & -1 \\ 9 & 4-λ & -2 \\ -8 & -8 & -1-λ \end{vmatrix} = \begin{vmatrix} -5-λ & -3 & -1 \\ 5+λ & 4-λ & -2 \\ 0 & -8 & -1-λ \end{vmatrix} = \begin{vmatrix} -5-λ & -3 & -1 \\ 0 & 1-λ & -3 \\ 0 & -8 & -1-λ \end{vmatrix}
= (-5-λ)(λ^2 - 25) = -(λ + 5)^2(λ - 5)

Hence, the eigenvalues of A are λ_1 = −5 and λ_2 = 5 with a_{λ_1} = 2 and a_{λ_2} = 1.

For λ_1 = −5, we have A − λ_1 I = \begin{bmatrix} -3 & -3 & -1 \\ 9 & 9 & -2 \\ -8 & -8 & 4 \end{bmatrix} ∼ \begin{bmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}

Thus, a basis for E_{λ_1} is \left\{ \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix} \right\}. Hence, g_{λ_1} = \dim E_{λ_1} = 1.

For λ_2 = 5, we have A − λ_2 I = \begin{bmatrix} -13 & -3 & -1 \\ 9 & -1 & -2 \\ -8 & -8 & -6 \end{bmatrix} ∼ \begin{bmatrix} 1 & 0 & -1/8 \\ 0 & 1 & 7/8 \\ 0 & 0 & 0 \end{bmatrix}

Thus, a basis for E_{λ_2} is \left\{ \begin{bmatrix} 1/8 \\ -7/8 \\ 1 \end{bmatrix} \right\}. Hence, g_{λ_2} = \dim E_{λ_2} = 1.
Recall that our goal so far in this chapter is to answer the following two questions.
Given a linear operator L : R^n → R^n, is the standard matrix of L similar to a diagonal
matrix? If so, how do we find an invertible matrix P such that P^{-1}[L]P is diagonal?
In this section, we have discovered that if [L] is similar to a diagonal matrix D, then
the columns in P must be eigenvectors of [L] and the diagonal entries in D must be
eigenvalues of [L]. Using this and the following two results, we will finally be able
to fully answer both of these questions in the next section.
LEMMA 3 If A and B are similar matrices, then A and B have the same characteristic polynomial,
and hence the same eigenvalues.

THEOREM 4 If A is an n × n matrix with eigenvalue λ_1, then 1 ≤ g_{λ_1} ≤ a_{λ_1}.
Proof: First, by definition of an eigenvalue we have that g_{λ_1} = \dim(E_{λ_1}) ≥ 1.

Let {~v_1, . . . , ~v_k} be a basis for E_{λ_1}. Extend this to a basis {~v_1, . . . , ~v_k, ~w_{k+1}, . . . , ~w_n} for
R^n and let P = \begin{bmatrix} ~v_1 & \cdots & ~v_k & ~w_{k+1} & \cdots & ~w_n \end{bmatrix}. Since the columns of P form a basis for
R^n, we have that P is invertible by the Invertible Matrix Theorem. Let F = \begin{bmatrix} ~v_1 & \cdots & ~v_k \end{bmatrix}
and G = \begin{bmatrix} ~w_{k+1} & \cdots & ~w_n \end{bmatrix} so that P can be written as the block matrix P = \begin{bmatrix} F & G \end{bmatrix}.
Similarly, write P^{-1} as the block matrix P^{-1} = \begin{bmatrix} H \\ J \end{bmatrix} so that

I = P^{-1}P = \begin{bmatrix} H \\ J \end{bmatrix} \begin{bmatrix} F & G \end{bmatrix} = \begin{bmatrix} HF & HG \\ JF & JG \end{bmatrix}

In particular, this gives that HF = I_k and JF = O_{n-k,k}, the (n − k) × k zero matrix.
Define the matrix B by B = P^{-1}AP, so that B is similar to A. Then

B = \begin{bmatrix} H \\ J \end{bmatrix} A \begin{bmatrix} F & G \end{bmatrix} = \begin{bmatrix} H \\ J \end{bmatrix} \begin{bmatrix} AF & AG \end{bmatrix} = \begin{bmatrix} H \\ J \end{bmatrix} \begin{bmatrix} λ_1 F & AG \end{bmatrix} = \begin{bmatrix} λ_1 HF & HAG \\ λ_1 JF & JAG \end{bmatrix} = \begin{bmatrix} λ_1 I_k & X \\ O_{n-k,k} & Y \end{bmatrix}

where X is k × (n − k) and Y is (n − k) × (n − k). Then, by Lemma 3, the
characteristic polynomial C(λ) of A equals the characteristic polynomial of B. In
particular,

C(λ) = \det(B − λI) = \begin{vmatrix} (λ_1 − λ)I_k & X \\ O_{n-k,k} & Y − λI \end{vmatrix}

Expanding the determinant, we get

C(λ) = (−1)^k (λ − λ_1)^k \det(Y − λI)

Since λ_1 may or may not be a root of \det(Y − λI), we get that a_{λ_1} ≥ k = g_{λ_1}. ∎
Section 6.2 Problems
1. Determine which of the following are eigenvectors of A = \begin{bmatrix} -1 & 1 & -4 \\ 5 & -5 & -4 \\ -5 & 11 & 10 \end{bmatrix}.

(a) ~v_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}  (b) ~v_2 = \begin{bmatrix} 2 \\ 2 \\ -3 \end{bmatrix}  (c) ~v_3 = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}

2. Determine which of the following are eigenvectors of B = \begin{bmatrix} 1 & 3 & 3 \\ 6 & 7 & 12 \\ -3 & -3 & -5 \end{bmatrix}.

(a) ~v_1 = \begin{bmatrix} -1 \\ -1 \\ 1 \end{bmatrix}  (b) ~v_2 = \begin{bmatrix} 3 \\ -6 \\ 3 \end{bmatrix}  (c) ~v_3 = \begin{bmatrix} 2 \\ -2 \\ 1 \end{bmatrix}
3. Find the geometric and algebraic multiplicity of all eigenvalues of each matrix.

(a) \begin{bmatrix} 3 & 2 \\ 4 & 5 \end{bmatrix}  (b) \begin{bmatrix} -2 & 0 \\ 5 & -2 \end{bmatrix}  (c) \begin{bmatrix} 4 & -3 \\ 3 & -4 \end{bmatrix}

(d) \begin{bmatrix} -3 & -2 & -2 \\ 2 & 2 & 1 \\ 0 & 0 & 1 \end{bmatrix}  (e) \begin{bmatrix} -3 & 6 & -2 \\ -1 & 2 & -1 \\ 1 & -3 & 0 \end{bmatrix}  (f) \begin{bmatrix} 4 & 2 & 2 \\ 2 & 4 & 2 \\ 2 & 2 & 4 \end{bmatrix}

(g) \begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 2 \\ 0 & 2 & 1 \end{bmatrix}  (h) \begin{bmatrix} 2 & 2 & -1 \\ -2 & 1 & 2 \\ 2 & 2 & -1 \end{bmatrix}  (i) \begin{bmatrix} 1 & 6 & -4 \\ 5 & -3 & 1 \\ -5 & 6 & 2 \end{bmatrix}  (j) \begin{bmatrix} 1 & 0 & 1 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{bmatrix}
4. Let ~v = \begin{bmatrix} 1 \\ 3 \end{bmatrix}. Use geometric reasoning to determine the eigenvalues and
eigenvectors of the projection proj_{~v} : R² → R² and the rotation R_{π/3} : R² → R².
5. Show that if A is invertible and ~v is an eigenvector of A, then ~v is also an
eigenvector of A^{-1}. How are the corresponding eigenvalues related?
6. Invent a 2 × 2 matrix A that has eigenvalues 2 and 3 with corresponding eigenvectors
\begin{bmatrix} 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 1 \\ 3 \end{bmatrix} respectively.
7. Invent a 4 × 4 matrix A that has an eigenvalue λ such that g_λ < a_λ. Justify your answer.

8. Invent a 4 × 4 matrix A that has 4 distinct eigenvalues. Justify your answer.
9. Suppose A and B are n × n matrices and that ~u and ~v are eigenvectors of both A
and B, where A~u = 6~u, B~u = 5~u, A~v = 10~v, and B~v = 3~v.

(a) If C = AB, show that 5~u + 3~v is an eigenvector of C. Find the corresponding eigenvalue.

(b) If n = 2 and ~w = \begin{bmatrix} 0.2 \\ 1.4 \end{bmatrix}, compute AB~w.
10. Let A and B be n × n matrices with eigenvalues
λ and µ respectively. Does this imply that λµ is
an eigenvalue of AB? Justify your answer.
11. Suppose that A is an n × n matrix such that the sum of the entries in each row is
the same. That is, \sum_{j=1}^{n} a_{ij} = c for 1 ≤ i ≤ n. Show that ~v = \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix}
is an eigenvector of A. (Such matrices arise in probability theory.)
12. Prove Theorem 2.
13. Prove Lemma 3.
14. Prove the following addition to the Invertible
Matrix Theorem. A matrix A is invertible if
and only if λ = 0 is not an eigenvalue of A.
6.3 Diagonalization
In this section, we will reveal how to determine when a matrix A is similar to a
diagonal matrix D, and if it is, how to find a matrix P such that P−1AP = D. We
begin with a definition.
DEFINITION
Diagonalizable
An n × n matrix A is said to be diagonalizable if A is similar to a diagonal matrix D.
If P−1AP = D, then we say that P diagonalizes A.
REMARK
For now, we will restrict ourselves to diagonalizing real matrices with real eigenval-
ues. That is, if A has a non-real eigenvalue, then we will say that A is not diagonaliz-
able over R. In Chapter 11, we will look at diagonalizing matrices over the complex
numbers.
Our work in the last section motivates the following theorem.
THEOREM 1 (Diagonalization Theorem)
An n × n matrix A is diagonalizable if and only if there exists a basis {~v1, . . . ,~vn} for
Rn of eigenvectors of A.
Proof: Assume that A is diagonalizable. By definition, this means that there exists
an invertible matrix P = [~v1 · · · ~vn] such that
P−1AP = D (6.2)
where D = diag(λ1, . . . , λn) is diagonal. Since D is diagonal, the eigenvalues of D
are λ1, . . . , λn. Therefore, by Lemma 6.2.3, λ1, . . . , λn are also the eigenvalues of A.
Equation (6.2) gives
AP = PD
A[~v1 · · · ~vn] = P[λ1~e1 · · · λn~en]
[A~v1 · · · A~vn] = [λ1P~e1 · · · λnP~en]
[A~v1 · · · A~vn] = [λ1~v1 · · · λn~vn]
Thus, A~vi = λi~vi for 1 ≤ i ≤ n. Moreover, since P is invertible, {~v1, . . . ,~vn} is a
basis for Rn by the Invertible Matrix Theorem, which also implies ~vi ≠ ~0 for
1 ≤ i ≤ n. Hence, {~v1, . . . ,~vn} is a basis for Rn of eigenvectors of A.
On the other hand, if {~v1, . . . ,~vn} is a basis for Rn of eigenvectors, then the matrix
P = [~v1 · · · ~vn] is invertible by the Invertible Matrix Theorem, and we have
P−1AP = P−1[A~v1 · · · A~vn]
= P−1[λ1~v1 · · · λn~vn]
= P−1[λ1P~e1 · · · λnP~en]
= P−1P[λ1~e1 · · · λn~en]
= diag(λ1, . . . , λn)
Therefore, A is diagonalizable. �
It is very important to understand the proof of the Diagonalization Theorem. It does
a lot more than just prove the theorem. It shows us how to diagonalize a matrix!
In particular, we just need to find a basis {~v1, . . . ,~vn} for Rn of eigenvectors of A
corresponding to eigenvalues λ1, . . . , λn and then take P = [~v1 · · · ~vn]. Moreover,
it shows that taking this matrix P gives
P−1AP = diag(λ1, . . . , λn)
The next two results show us how to construct a basis for Rn of eigenvectors of A if
such a basis exists.
LEMMA 2 If A is an n × n matrix with eigenpairs (λ1,~v1), (λ2,~v2), . . . , (λk,~vk) where λi ≠ λ j for
i ≠ j, then {~v1, . . . ,~vk} is linearly independent.
Proof: We will prove this by induction. If k = 1, then the result is trivial, since by
definition of an eigenvector ~v1 ≠ ~0. Assume that the result is true for some k ≥ 1. To
show {~v1, . . . ,~vk,~vk+1} is linearly independent we consider
c1~v1 + · · · + ck~vk + ck+1~vk+1 = ~0 (6.3)
Observe that since A~vi = λi~vi we have
(A − λiI)~v j = A~v j − λi~v j = λ j~v j − λi~v j = (λ j − λi)~v j
for all 1 ≤ i, j ≤ k + 1. Thus, multiplying both sides of (6.3) by A − λk+1I gives
~0 = c1(λ1 − λk+1)~v1 + · · · + ck(λk − λk+1)~vk + ck+1(λk+1 − λk+1)~vk+1
= c1(λ1 − λk+1)~v1 + · · · + ck(λk − λk+1)~vk + ~0
By our induction hypothesis, {~v1, . . . ,~vk} is linearly independent and hence all the
coefficients must be 0. But, λi ≠ λk+1 for all 1 ≤ i ≤ k, so we must have
c1 = · · · = ck = 0. Thus (6.3) becomes
~0 + ck+1~vk+1 = ~0
But ~vk+1 ≠ ~0 since it is an eigenvector, hence ck+1 = 0. Therefore, the set is linearly
independent. �
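As a quick numerical illustration of Lemma 2 (a NumPy sketch, not part of the text's development): a matrix with distinct eigenvalues yields a linearly independent set of computed eigenvectors.

```python
import numpy as np

# Illustrate Lemma 2 numerically: this matrix has the distinct
# eigenvalues 3 and -1, so its eigenvectors must be independent.
A = np.array([[1.0, 2.0],
              [2.0, 1.0]])
eigvals, eigvecs = np.linalg.eig(A)   # columns of eigvecs are eigenvectors

assert not np.isclose(eigvals[0], eigvals[1])   # eigenvalues are distinct
assert np.linalg.matrix_rank(eigvecs) == 2      # eigenvectors are independent
```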
THEOREM 3 If A is an n × n matrix with distinct eigenvalues λ1, . . . , λk and
Bi = {~vi,1, . . . ,~vi,gλi} is a basis for the eigenspace of λi for 1 ≤ i ≤ k, then B1 ∪ B2 ∪
· · · ∪ Bk is a linearly independent set.
Proof: We prove the result by induction on k. If k = 1, then B1 is linearly indepen-
dent since it is a basis for Eλ1. Assume the result is true for some j ≥ 1. That is,
assume that B1 ∪ · · · ∪ B j is linearly independent and consider B1 ∪ · · · ∪ B j ∪ B j+1.
For simplicity, we will denote the vectors in B1 ∪ · · · ∪ B j by ~w1, . . . , ~wℓ. Assume that
there exists a non-trivial solution to the equation
c1~w1 + · · · + cℓ~wℓ + d1~v j+1,1 + · · · + dgλ j+1~v j+1,gλ j+1 = ~0
Observe that c1, . . . , cℓ cannot all be zero, as otherwise this would contradict the
fact that B j+1 is a basis. Similarly, d1, . . . , dgλ j+1 cannot all be zero, as this would
contradict the inductive hypothesis. Thus, we must have at least one ci non-zero
and at least one di non-zero. But then we would have a linear combination of some
vectors in B j+1 equaling a linear combination of vectors in B1, . . . ,B j which would
contradict Lemma 2. Hence, the only solution must be the trivial solution. Therefore,
B1∪· · ·∪B j∪B j+1 is linearly independent. The result now follows by induction. �
Theorem 3 shows us how to form a linearly independent set of eigenvectors of a
matrix A. We first find a basis for the eigenspace of every eigenvalue of A. Then,
the set containing all the vectors in every basis is linearly independent by Theorem 3.
However, for the matrix P to be invertible, we actually need n linearly independent
eigenvectors of A. The next result gives us the condition required for there to exist n
linearly independent eigenvectors.
COROLLARY 4 If A is an n × n matrix with distinct eigenvalues λ1, . . . , λk, then A is diagonalizable
if and only if gλi = aλi for 1 ≤ i ≤ k.
If λ is an eigenvalue of A such that gλ < aλ, then λ is said to be deficient. Observe
that since gλ ≥ 1 for every eigenvalue λ, by Theorem 6.2.4 any eigenvalue λ such
that aλ = 1 cannot be deficient. This gives the following result.
COROLLARY 5 If A is an n × n matrix with n distinct eigenvalues, then A is diagonalizable.
We now have an algorithm for diagonalizing a matrix A or showing that it is not
diagonalizable.
ALGORITHM
To diagonalize an n × n matrix A, or show that A is not diagonalizable:
1. Find and factor the characteristic polynomial C(λ) = det(A − λI).
2. Let λ1, . . . , λn denote the n roots of C(λ) (repeated according to multiplicity).
If any of the eigenvalues λi are not real, then A is not diagonalizable over R.
3. Find a basis for the eigenspace of each λi by finding a basis for the nullspace
of A − λiI.
4. If gλi < aλi for any λi, then A is not diagonalizable. Otherwise, form
a basis {~v1, . . . ,~vn} for Rn of eigenvectors of A by using Theorem 3. Let
P = [~v1 · · · ~vn]. Then, P−1AP = diag(λ1, . . . , λn) where λi is an eigenvalue
corresponding to the eigenvector ~vi for 1 ≤ i ≤ n.
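The steps above can be sketched numerically with NumPy. This is an illustrative sketch, not the text's exact-arithmetic procedure: `diagonalize` is a hypothetical helper, and floating-point eigenvectors stand in for the exact eigenspace bases of Step 3.

```python
import numpy as np

def diagonalize(A, tol=1e-10):
    """Hypothetical helper: return (P, D) with P^(-1) A P = D, or None
    if A is not diagonalizable over R."""
    eigvals, eigvecs = np.linalg.eig(A)
    if np.max(np.abs(eigvals.imag)) > tol:
        return None            # Step 2: a non-real eigenvalue
    P = eigvecs.real           # candidate eigenvector basis (Step 3)
    if abs(np.linalg.det(P)) < tol:
        return None            # Step 4: some eigenvalue is deficient
    return P, np.diag(eigvals.real)

A = np.array([[1.0, 2.0], [2.0, 1.0]])
P, D = diagonalize(A)
assert np.allclose(np.linalg.inv(P) @ A @ P, D)                  # P diagonalizes A
assert diagonalize(np.array([[1.0, 1.0], [0.0, 1.0]])) is None   # deficient (Example 2)
assert diagonalize(np.array([[0.0, 1.0], [-1.0, 0.0]])) is None  # non-real (Example 4)
```

Floating-point tolerances make this a heuristic check only; the hand computation with exact eigenspace bases remains the reliable method.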
EXAMPLE 1 Show that A = [−1 6 3; 1 0 −1; −3 6 5] is diagonalizable and find an invertible
matrix P and a diagonal matrix D such that P−1AP = D.
Solution: In Example 6.2.8 we found that the eigenvalues of A are λ1 = 0 and
λ2 = 2, and that aλ1 = gλ1 and aλ2 = gλ2. Hence, A is diagonalizable by Corollary
4. Moreover, the columns of P are the basis vectors from each of the eigenspaces.
That is,
P = [1 2 1; −1/3 1 0; 1 0 1]
The diagonal entries in D are the eigenvalues of A corresponding column-wise to the
columns in P. Hence,
P−1AP = D = diag(0, 2, 2) = [0 0 0; 0 2 0; 0 0 2]
REMARK
It is important to remember that the answer is not unique. We can arrange the linearly
independent eigenvectors of A as the columns of P in any order. We only have to
ensure the entry dii in D is the eigenvalue of A corresponding to the eigenvector in the
i-th column of P.
EXERCISE 1 Find two other invertible matrices P which diagonalize
A = [−1 6 3; 1 0 −1; −3 6 5]. In each case, give the corresponding diagonal matrix D.
EXAMPLE 2 Show that A = [1 1; 0 1] is not diagonalizable.
Solution: We have
0 = C(λ) = |1 − λ 1; 0 1 − λ| = (λ − 1)^2.
Hence, λ1 = 1 is the only eigenvalue and aλ1 = 2. We then get
A − λ1I = [0 1; 0 0]
So, a basis for Eλ1 is {[1, 0]}. Thus, A is not diagonalizable, because gλ1 = 1 < aλ1.
EXAMPLE 3 Show that A = [1 2; 2 1] is diagonalizable and find an invertible matrix P
and a diagonal matrix D such that P−1AP = D.
Solution: We have
0 = C(λ) = |1 − λ 2; 2 1 − λ| = λ^2 − 2λ − 3 = (λ − 3)(λ + 1).
Thus, the eigenvalues of A are λ1 = 3 and λ2 = −1. Since the 2 × 2 matrix A has 2
distinct eigenvalues, it is diagonalizable by Corollary 5.
For λ1 = 3, we have
A − λ1I = [−2 2; 2 −2] ∼ [1 −1; 0 0]
Thus, a basis for Eλ1 is {[1, 1]}.
For λ2 = −1, we have
A − λ2I = [2 2; 2 2] ∼ [1 1; 0 0]
Thus, a basis for Eλ2 is {[−1, 1]}.
We now let P = [−1 1; 1 1] and get D = P−1AP = [−1 0; 0 3].
EXAMPLE 4 Show that A = [0 1; −1 0] is not diagonalizable over R.
Solution: We have
0 = C(λ) = |−λ 1; −1 −λ| = λ^2 + 1.
The eigenvalues of A are not real, so A is not diagonalizable over R.
REMARK
We will show in Chapter 11 that A = [0 1; −1 0] is diagonalizable over the complex
numbers.
EXAMPLE 5 Show that A = [1 2 2; 2 1 2; 2 2 1] is diagonalizable and find an invertible
matrix P and a diagonal matrix D such that P−1AP = D.
Solution: Using column and row operations on the determinant, we get
0 = C(λ) = |1 − λ 2 2; 2 1 − λ 2; 2 2 1 − λ|
= |1 − λ 2 0; 2 1 − λ 1 + λ; 2 2 −1 − λ|
= |1 − λ 2 0; 4 3 − λ 0; 2 2 −1 − λ|
= (−1 − λ)|1 − λ 2; 4 3 − λ|
= −(λ + 1)(λ^2 − 4λ − 5)
= −(λ + 1)^2(λ − 5)
Thus, the eigenvalues of A are λ1 = −1 and λ2 = 5, where aλ1 = 2 and aλ2 = 1.
For λ1 = −1, we have
A − λ1I = [2 2 2; 2 2 2; 2 2 2] ∼ [1 1 1; 0 0 0; 0 0 0]
Thus, a basis for Eλ1 is {[−1, 1, 0], [−1, 0, 1]}, so gλ1 = aλ1.
For λ2 = 5, we have
A − λ2I = [−4 2 2; 2 −4 2; 2 2 −4] ∼ [1 0 −1; 0 1 −1; 0 0 0]
Thus, a basis for Eλ2 is {[1, 1, 1]}, so gλ2 = aλ2.
Therefore, A is diagonalizable by Corollary 4. We can take P = [−1 −1 1; 1 0 1; 0 1 1]
and get D = P−1AP = diag(−1, −1, 5).
We conclude this section with a couple of remarks and a result about the eigenvalues
of a matrix. First, remember that by definition of an eigenvalue λ of a matrix A, the
RREF of A − λI must have at least one row of zeros. If you ever find that the RREF
of your A − λI is I, then you either made a mistake in your row reducing, or your
eigenvalue is not actually an eigenvalue. Secondly, a lot of mathematics is about
looking for patterns. As you do diagonalization examples, look for patterns in the
values of eigenvalues and eigenvectors. There are some tricks for finding eigenvalues
and/or eigenvectors in some special cases. The following theorem can help.
THEOREM 6 If λ1, . . . , λn are all the eigenvalues of an n × n matrix A, then
tr A = λ1 + · · · + λn
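Theorem 6 is easy to spot-check numerically (a NumPy illustration; the matrix is the one from Example 5, whose eigenvalues −1, −1, 5 sum to the trace 1 + 1 + 1 = 3):

```python
import numpy as np

# Matrix from Example 5: eigenvalues -1, -1, 5; trace 1 + 1 + 1 = 3.
A = np.array([[1.0, 2.0, 2.0],
              [2.0, 1.0, 2.0],
              [2.0, 2.0, 1.0]])
assert np.isclose(np.trace(A), np.sum(np.linalg.eigvals(A)))
```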
Section 6.3 Problems
1. For each of the following matrices, determine
the eigenvalues and a basis for each eigenspace
and hence determine if the matrix is diagonal-
izable. If it is, write a diagonalizing matrix P
and the resulting matrix D.
(a) [3 −1; −1 3]
(b) [1 2; 0 1]
(c) [4 −2; −2 7]
(d) [5 −1 1; −1 5 1; 1 1 5]
(e) [1 0 −1; 2 2 1; 4 0 5]
(f) [1 1 2; 2 0 1; 1 −1 −1]
(g) [1 −2 3; 2 6 −6; 1 2 −1]
(h) [3 1 1; −4 −2 −5; 2 2 5]
(i) [−3 −3 5; 13 10 −13; 3 2 −1]
(j) [8 6 −10; 5 1 −5; 5 3 −7]
(k) [2 1 0; 1 3 1; 3 1 4]
(l) [1 1 1 1; 1 1 1 1; 1 1 1 1; 1 1 1 1]
2. For each linear operator L, find a basis B such
that [L]B is diagonal.
(a) L(x1, x2) = (3x1 − x2,−x1 + 3x2)
(b) L(x1, x2) = (x1 + 3x2, 4x1 + 2x2)
(c) L(x1, x2, x3) = (−2x1 + 2x2 + 2x3,−3x1 +
3x2 + 2x3,−2x1 + 2x2 + 2x3)
3. Let A be an n × n diagonalizable matrix with
eigenvalues λ1, . . . , λn repeated according to
multiplicity.
(a) Prove that tr A = λ1 + · · · + λn.
(b) Prove that det A = λ1 · · ·λn.
4. Write an invertible matrix A that is not diago-
nalizable, an invertible matrix B that is diago-
nalizable, a non-invertible matrix C that is not
diagonalizable, and a non-invertible matrix E
that is diagonalizable.
5. Suppose that A is diagonalizable and that the
eigenvalues of A are λ1, . . . , λn. Show that
A − λ1I is also diagonalizable and its eigenval-
ues are 0, λ2 − λ1, λ3 − λ1, . . . , λn − λ1.
6. Prove that if A is diagonalizable, then AT is diagonalizable.
7. Let A = [a b; c d].
(a) Prove that A is diagonalizable over R if (a − d)^2 + 4bc > 0 and is not
diagonalizable over R if (a − d)^2 + 4bc < 0.
(b) Find two examples to demonstrate that if (a − d)^2 + 4bc = 0, then A may
or may not be diagonalizable.
(c) What condition is required on (a − d)^2 + 4bc to ensure that A has two
integer eigenvalues?
6.4 Powers of Matrices
We now briefly look at one of the many applications of diagonalization. In particular,
we will look at how to use diagonalization to easily compute powers of a diagonaliz-
able matrix.
Let A be an n × n matrix. We define powers of a matrix in the expected way. That is,
for any positive integer k, A^k is defined to be A multiplied by itself k times. Thus, for
A = [1 2; −1 4] we have
A^2 = [1 2; −1 4][1 2; −1 4] = [−1 10; −5 14]
A^3 = A^2 A = [−1 10; −5 14][1 2; −1 4] = [−11 38; −19 46]
A^1000 = [2^1001 − 3^1000  −2^1001 + 2·3^1000; 2^1000 − 3^1000  −2^1000 + 2·3^1000]
To calculate A^1000 we certainly did not multiply a thousand matrices A together. In-
stead, we used the power of the diagonal form of a matrix.
Observe that multiplying a diagonal matrix D by itself is easy. In particular, if
D = diag(d1, . . . , dn), then D^k = diag(d1^k, . . . , dn^k). The following theorem
shows us how we can use this fact with diagonalization to quickly take powers of a
diagonalizable matrix A.
THEOREM 1 Let A be an n × n matrix. If there exists an invertible matrix P and a diagonal
matrix D such that P−1AP = D, then
A^k = PD^k P−1
Proof: We prove the result by induction on k. If k = 1, then P−1AP = D implies
A = PDP−1 and so the result holds. Assume the result is true for some k ≥ 1. We
then have
A^(k+1) = A^k A = (PD^k P−1)(PDP−1) = PD^k P−1PDP−1 = PD^k IDP−1 = PD^(k+1) P−1
as required. �
EXAMPLE 1 Let A = [1 2; −1 4]. Show that
A^1000 = [2^1001 − 3^1000  −2^1001 + 2·3^1000; 2^1000 − 3^1000  −2^1000 + 2·3^1000].
Solution: We first diagonalize A. We have
0 = det(A − λI) = |1 − λ 2; −1 4 − λ| = λ^2 − 5λ + 6 = (λ − 2)(λ − 3)
Thus, the eigenvalues of A are λ1 = 2 and λ2 = 3.
For λ1 = 2 we have
A − λ1I = [−1 2; −1 2] ∼ [−1 2; 0 0]
so a basis for Eλ1 is {[2, 1]}.
For λ2 = 3 we have
A − λ2I = [−2 2; −1 1] ∼ [1 −1; 0 0]
so a basis for Eλ2 is {[1, 1]}.
Therefore, P = [2 1; 1 1] and D = [2 0; 0 3]. We find that P−1 = [1 −1; −1 2]. Hence,
A^1000 = PD^1000 P−1
= [2 1; 1 1][2^1000 0; 0 3^1000][1 −1; −1 2]
= [2^1001 3^1000; 2^1000 3^1000][1 −1; −1 2]
= [2^1001 − 3^1000  −2^1001 + 2·3^1000; 2^1000 − 3^1000  −2^1000 + 2·3^1000]
EXAMPLE 2 Let A = [−2 2; −3 5]. Calculate A^200.
Solution: We have
0 = det(A − λI) = |−2 − λ 2; −3 5 − λ| = λ^2 − 3λ − 4 = (λ + 1)(λ − 4).
Thus, the eigenvalues of A are λ1 = −1 and λ2 = 4.
For λ1 = −1 we have A − λ1I = [−1 2; −3 6] ∼ [1 −2; 0 0], so a basis for Eλ1 is {[2, 1]}.
For λ2 = 4 we have A − λ2I = [−6 2; −3 1] ∼ [1 −1/3; 0 0], so a basis for Eλ2 is {[1, 3]}.
Therefore, P = [2 1; 1 3] and D = [−1 0; 0 4]. We find that P−1 = (1/5)[3 −1; −1 2]. Hence,
A^200 = PD^200 P−1 = [2 1; 1 3][1 0; 0 4^200] (1/5)[3 −1; −1 2]
= (1/5)[6 − 4^200  −2 + 2·4^200; 3 − 3·4^200  −1 + 6·4^200]
Section 6.4 Problems
1. Let A = [4 −1; −2 5]. Use diagonalization to calculate A^3. Verify your answer
by computing A^3 directly.
2. Calculate A^100 where A = [−6 −10; 4 7].
3. Calculate A^100 where A = [2 2; −3 −5].
4. Calculate A^200 where A = [−2 2; −3 5].
5. Calculate A^100 where A = [−2 1 1; −1 0 1; −2 2 1].
6. Show that if λ is an eigenvalue of A, then λ^n is an eigenvalue of A^n. How are
the corresponding eigenvectors related?
7. An n × n matrix T is called a Markov matrix (or transition matrix) if ti j ≥ 0 for
all i, j, and t1 j + t2 j + · · · + tn j = 1 for each j. Prove that one eigenvalue of a
Markov matrix T is λ1 = 1.
8. Recall the population-migration system described in Problem 3.1.13 on page 86.
(a) Given initial populations of x0 and y0, use diagonalization to verify that the
populations in year t are given by
[xt; yt] = A^t [x0; y0] = (1/3)[2 + (7/10)^t  2 − 2(7/10)^t; 1 − (7/10)^t  1 + 2(7/10)^t][x0; y0]
(b) Use this formula to confirm that, if the initial populations sum to T (that is,
x0 + y0 = T ), then in the long term (i.e. as t gets very large), the populations
tend to
x = (2/3)T, y = (1/3)T
regardless of the separate values of x0 and y0. This is referred to as a stable
population profile. (Compare with Problem 3.1.13(c).)
9. Recall the competing suppliers of internet ser-
vices described in Problem 3.1.14 on page 83.
Use diagonalization to calculate the market
share of each company after six months and to
calculate how the market shares will be divided
in the long run.
10. Recall the transferring of cars between the
three locations of a car rental company de-
scribed in Problem 3.1.15 on page 83. Use di-
agonalization to determine how the cars will be
distributed in the long run.
11. Recall the one-step recursion definition of the Fibonacci sequence given in
Problem 3.1.16 on page 83. Use diagonalization to derive the following explicit
expression for the n-th term of the Fibonacci sequence:
an = (1/√5)[((1 + √5)/2)^n − ((1 − √5)/2)^n]
Chapter 7
Fundamental Subspaces
The main purpose of this chapter is to review a few important concepts from the first
six chapters. These concepts include subspaces, bases, and dimension. As you will
soon see, the rest of the book relies heavily on these and other concepts from the first
six chapters.
7.1 Bases of Fundamental Subspaces
Recall from Section 3.3 the four fundamental subspaces of a matrix.
DEFINITION
Fundamental Subspaces
Let A be an m × n matrix. The four fundamental subspaces of A are
1. Col(A) = {A~x ∈ Rm | ~x ∈ Rn}, called the column space,
2. Row(A) = {AT~x ∈ Rn | ~x ∈ Rm}, called the row space,
3. Null(A) = {~x ∈ Rn | A~x = ~0}, called the nullspace,
4. Null(AT ) = {~x ∈ Rm | AT~x = ~0}, called the left nullspace.
THEOREM 1 If A is an m × n matrix, then Col(A) and Null(AT ) are subspaces of Rm, and Row(A)
and Null(A) are subspaces of Rn.
Our goal now is to find an easy way to determine a basis for each of the four funda-
mental subspaces.
REMARK
To help you understand the following proof, you may wish to pick a simple 3 × 2
matrix A and follow the steps of the proof with your matrix A.
THEOREM 2 If A is an m × n matrix, then the columns of A which correspond to leading ones in
the reduced row echelon form of A form a basis for Col(A). Moreover,
dim Col(A) = rank A
Proof: We first observe that if A is the zero matrix, then the result is trivial. Other-
wise, A contains a non-zero entry and so rank A = r > 0.
Let the RREF of A be R = [~r1 · · · ~rn]. Since rank A = r, R contains r leading ones.
Let t1, . . . , tr denote the indices of the columns of R which contain leading ones. We
will first show that B = {~rt1 , . . . ,~rtr } is a basis for Col(R).
By definition of the RREF, the vectors ~rt1 , . . . ,~rtr contain a leading one and all other
entries must be zero. Thus, they are standard basis vectors. Moreover, they are
distinct standard basis vectors since each leading one must be in a different row.
Hence, B is linearly independent.
Also, by definition of the RREF, all columns of R without leading ones are in Span B,
as they cannot contain a non-zero entry in any row without a leading one. Therefore,
we also have Span B = Col(R) and hence B is a basis for Col(R) as claimed.
Let A = [~a1 · · · ~an]. We will show that C = {~at1 , . . . , ~atr } is a basis for Col(A)
by using the fact that B is a basis for Col(R). To do this, we first need to find a
relationship between the vectors in B and C.
Since R is the reduced row echelon form of A, by Theorem 5.2.6 there exists a
sequence of elementary matrices E1, . . . , Ek such that Ek · · · E1A = R. Let
E = Ek · · · E1. Recall that every elementary matrix is invertible, hence
E−1 = E1−1 · · · Ek−1 exists. Then
R = EA = [E~a1 · · · E~an]
Consequently, ~ri = E~ai, or ~ai = E−1~ri.
To prove that C is linearly independent, we use the definition of linear independence.
Consider
c1~at1 + · · · + cr~atr =~0
Multiply both sides by E to get
E(c1~at1 + · · · + cr~atr ) = E~0
c1E~at1 + · · · + crE~atr =~0
c1~rt1 + · · · + cr~rtr =~0
This implies that c1 = · · · = cr = 0 since {~rt1 , . . . ,~rtr } is linearly independent. Thus,
C is linearly independent.
To show that C spans Col(A), we need to show that if we pick any ~b ∈ Col(A), we
can write it as a linear combination of the vectors ~at1 , . . . , ~atr . Let ~b ∈ Col(A).
By definition of the column space, there exists a ~x ∈ Rn such that
~b = A~x
~b = E−1R~x
E~b = R~x
Therefore, E~b is in the column space of R and hence can be written as a linear com-
bination of the basis vectors {~rt1 , . . . ,~rtr }. That is, there exist d1, . . . , dr ∈ R such
that
E~b = d1~rt1 + · · · + dr~rtr
~b = E−1(d1~rt1 + · · · + dr~rtr )
~b = d1E−1~rt1 + · · · + drE−1~rtr
~b = d1~at1 + · · · + dr~atr
as required. Thus, we also have Span{~at1 , . . . , ~atr} = Col(A). Hence, {~at1 , . . . , ~atr} is a
basis for Col(A).
Recall that the dimension of a vector space is the number of vectors in any basis.
Thus, dim Col(A) = r = rank A. �
LEMMA 3 If R is an m × n matrix and E is an n × n invertible matrix, then
{RE~x | ~x ∈ Rn} = {R~y | ~y ∈ Rn}
THEOREM 4 If A is an m × n matrix, then the non-zero rows in the reduced row echelon form of
A form a basis for Row(A). Hence, dim Row(A) = rank A.
Proof: Let R be the RREF of A. By definition of RREF, we have that the set of all
non-zero rows of R form a basis for the row space of R. Moreover, if E is a matrix as
in the proof of Theorem 2, then we get that A = E−1R and
Row(A) = {AT~x | ~x ∈ Rm}
= {(E−1R)T~x | ~x ∈ Rm}
= {RT(E−1)T~x | ~x ∈ Rm}
= {RT~y | ~y ∈ Rm}  by Lemma 3
= Row(R)
Hence, the set of all non-zero rows of R also form a basis for the row space of A. �
Using Theorem 2 and Theorem 4 we get the following corollary.
COROLLARY 5 For any m × n matrix A we have rank A = rank AT .
REMARK
Due to its usefulness, it is very common to define the rank of a matrix as the dimen-
sion of the column space (or row space) and then prove that this is equivalent to the
number of leading ones in the reduced row echelon form.
Thanks to our previous experience in finding bases of nullspaces (which you prac-
ticed a lot in Chapter 6), once we have found the reduced row echelon form of a ma-
trix, we can easily find a basis for the nullspace, column space, and row space of
the matrix. To find a basis for the left nullspace of a matrix, we can use our method
for finding a basis for the nullspace on AT . We demonstrate this with a couple of
examples.
EXAMPLE 1 Let A = [1 2 −1; 1 1 2; 1 0 5]. Find a basis for the four fundamental subspaces.
Solution: Row reducing A we get
[1 2 −1; 1 1 2; 1 0 5] ∼ [1 0 5; 0 1 −3; 0 0 0] = R
A basis for Row(A) is the set of non-zero rows of R. Thus, a basis is {[1, 0, 5], [0, 1, −3]}
(remember that in linear algebra, vectors in Rn are always written as column vectors).
A basis for Col(A) is given by the columns of A which correspond to the columns of R
which contain leading ones. Hence, a basis for Col(A) is {[1, 1, 1], [2, 1, 0]}.
To find a basis for Null(A), we solve A~x = ~0 as normal. That is, we rewrite the system
R~x = ~0 as a system of equations:
x1 + 5x3 = 0
x2 − 3x3 = 0
Hence, x3 is a free variable and a vector equation of the solution space is
[x1, x2, x3] = [−5x3, 3x3, x3] = x3 [−5, 3, 1], x3 ∈ R
Consequently, a basis for Null(A) is {[−5, 3, 1]}.
To find a basis for Null(AT ), we can just use our method for finding the nullspace of
a matrix on AT . That is, we row reduce AT to get
[1 1 1; 2 1 0; −1 2 5] ∼ [1 0 −1; 0 1 2; 0 0 0]
Thus, x3 is a free variable and a vector equation for the solution space is
[x1, x2, x3] = [x3, −2x3, x3] = x3 [1, −2, 1], x3 ∈ R
Therefore, a basis for the left nullspace of A is {[1, −2, 1]}.
EXAMPLE 2 Let A = [2 4 −4 2; 1 2 2 −7; −1 −2 0 3]. Find a basis for the four fundamental
subspaces.
Solution: Row reducing A we get
[2 4 −4 2; 1 2 2 −7; −1 −2 0 3] ∼ [1 2 0 −3; 0 0 1 −2; 0 0 0 0] = R
The set of non-zero rows {[1, 2, 0, −3], [0, 0, 1, −2]} of R forms a basis for Row(A).
The columns of A which correspond to the columns of R which contain leading ones
form a basis for Col(A). Hence, a basis for Col(A) is {[2, 1, −1], [−4, 2, 0]}.
To find a basis for Null(A), we solve A~x = ~0 as normal. That is, we rewrite the system
R~x = ~0 as a system of equations:
x1 + 2x2 − 3x4 = 0
x3 − 2x4 = 0
Hence, x2 and x4 are free variables and a vector equation for the solution set is
[x1, x2, x3, x4] = [−2x2 + 3x4, x2, 2x4, x4] = x2 [−2, 1, 0, 0] + x4 [3, 0, 2, 1], x2, x4 ∈ R
Hence, a basis for Null(A) is {[−2, 1, 0, 0], [3, 0, 2, 1]}.
Now, we row reduce AT to get
[2 1 −1; 4 2 −2; −4 2 0; 2 −7 3] ∼ [1 0 −1/4; 0 1 −1/2; 0 0 0; 0 0 0]
Consequently, x3 is a free variable and the general solution is
[x1, x2, x3] = [(1/4)x3, (1/2)x3, x3] = x3 [1/4, 1/2, 1], x3 ∈ R
Therefore, a basis for the left nullspace of A is {[1/4, 1/2, 1]}.
REMARK
It is not actually necessary to row reduce AT to find a basis for the left nullspace of A.
We can use the fact that EA = R where E is the product of elementary matrices used
to bring A to its reduced row echelon form R to find a basis for the left nullspace. The
derivation of this procedure is left as an exercise.
Notice in each of the examples that the dimension of the column space plus the
dimension of the nullspace equals the number of columns of the matrix. This result
should not be surprising since Theorem 2 tells us that the dimension of the column
space equals the number of leading ones, and the Solution Theorem (Theorem 2.2.6)
tells us that the dimension of the nullspace equals the number of columns without
leading ones. We get the following theorem.
THEOREM 6 (Dimension Theorem)
If A is an m × n matrix, then rank A + dim Null(A) = n.
Proof: Let B = {~v1, . . . ,~vk} be a basis for Null(A) so that dim Null(A) = k. By
Theorem 4.2.4 we can extend B to a basis {~v1, . . . ,~vk,~vk+1, . . . ,~vn} for Rn. We will
prove that {A~vk+1, . . . , A~vn} is a basis for Col(A).
Consider
~0 = ck+1A~vk+1 + · · · + cnA~vn = A(ck+1~vk+1 + · · · + cn~vn)
This implies that ck+1~vk+1 + · · · + cn~vn ∈ Null(A) and hence we can write it as a linear
combination of the basis vectors for Null(A). That is, there exist d1, . . . , dk ∈ R such
that
ck+1~vk+1 + · · · + cn~vn = d1~v1 + · · · + dk~vk
−d1~v1 − · · · − dk~vk + ck+1~vk+1 + · · · + cn~vn = ~0
Thus, d1 = · · · = dk = ck+1 = · · · = cn = 0 since {~v1, . . . ,~vn} is linearly independent.
Therefore, {A~vk+1, . . . , A~vn} is linearly independent.
Let ~b ∈ Col(A). Then ~b = A~x for some ~x ∈ Rn. Since ~x ∈ Rn, there exist
s1, . . . , sn ∈ R such that
~b = A(s1~v1 + · · · + sk~vk + sk+1~vk+1 + · · · + sn~vn)
= s1A~v1 + · · · + skA~vk + sk+1A~vk+1 + · · · + snA~vn
But, ~vi ∈ Null(A) for 1 ≤ i ≤ k, so A~vi = ~0. Hence, we have
~b = ~0 + · · · + ~0 + sk+1A~vk+1 + · · · + snA~vn = sk+1A~vk+1 + · · · + snA~vn
Thus, Span{A~vk+1, . . . , A~vn} = Col(A).
Therefore, we have shown that {A~vk+1, . . . , A~vn} is a basis for Col(A) and hence
rank A = dim Col(A) = n − k = n − dim Null(A)
as required. �
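The theorem is easy to spot-check (a SymPy illustration using the matrix of Example 2, which has rank 2 and a 2-dimensional nullspace):

```python
from sympy import Matrix

# Rank-nullity on the matrix of Example 2: rank A + dim Null(A) = n.
A = Matrix([[2, 4, -4, 2],
            [1, 2, 2, -7],
            [-1, -2, 0, 3]])
assert A.rank() + len(A.nullspace()) == A.cols   # 2 + 2 = 4 columns
```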
Wait! Why did we just give a proof of this theorem when we already stated that
the result follows from Theorem 1 and the Solution Theorem? We did this for three
reasons. First, a proof can often teach us more than just that the statement of the
theorem is true. For example, this proof teaches us a way of finding a basis for the
column space if we are given a basis for the nullspace, and it teaches us a proof
technique (which we will use again soon!). Second, in Chapter 2 we omitted a proof
of the Solution Theorem, so it is nice to make up for that now. Lastly, because proving
theorems is fun.
EXERCISE 1 Let L : Rn → Rm be a linear mapping. Prove that
dim Range(L) + dim Ker(L) = dim(Rn)
in two ways. First, by mimicking the proof of the Dimension Theorem. Second, by
using the Dimension Theorem and the theorems in Section 3.3.
Section 7.1 Problems
1. Find a basis for the four fundamental subspaces of each matrix.
(a) [0 1 3; 1 2 3]
(b) [1 2 −1; 2 4 3]
(c) [2 −1 5; 3 4 2; 1 1 1]
(d) [1 −2 3 5; −2 4 0 −4; 3 −6 −5 1]
(e) [3 2 5 3; −1 0 −3 1; 1 1 1 1; 1 4 −5 1]
2. If {[1, 0, 0], [0, 1, 0]} is a basis for the nullspace of a 4 × 3 matrix
A = [~a1 ~a2 ~a3], then what is a basis for Col(A)?
3. Let A be an m × n matrix with rank(A) =
r < m and reduced row echelon form R. Let
E1, . . . , Ek be a sequence of elementary matri-
ces such that Ek · · ·E1A = R. Prove that the
bottom m − r rows of E = Ek · · ·E1 form a ba-
sis for the left nullspace of A.
4. Prove Lemma 3.
5. Let A be an n × n matrix. Prove that Null(A) = {~0} if and only if det A ≠ 0.
6. Let B be an m × n matrix.
(a) Prove that if ~x is any vector in the left
nullspace of B, then ~xT B = ~0T .
(b) Prove that if ~x is any vector in the left
nullspace of B and ~y is any vector in the
column space of B, then ~x · ~y = 0.
7. Invent a 2 × 2 matrix A such that
Null(A) = Col(A).
8. Prove that if n×n matrices A and B are similar,
then rank A = rank B.
9. Let A be an m × n matrix.
(a) Prove that Null(AT A) = Null(A).
(b) Prove that rank(AT A) = rank(A).
10. Prove that if AB = Om,n, then the column space of B is a subspace of the
nullspace of A.
11. Let S = Span{[1, 1, 1], [1, 0, 2]}.
(a) Find a matrix A such that Col(A) = S.
(b) Find a matrix B such that Row(B) = S.
(c) Find a matrix C such that Null(C) = S.
(d) Find a matrix D such that Null(DT ) = S.
(e) Is there a matrix F such that Col(F) = S
and Null(F) = S?
Chapter 8
Linear Mappings
8.1 General Linear Mappings
Linear Mappings L : V→ W
We now extend our definition of a linear mapping to the case where the domain and
codomain are general vector spaces instead of just Rn.
DEFINITION
Linear Mapping
Let V and W be vector spaces. A mapping L : V → W is called linear if
L(s~x + t~y) = sL(~x) + tL(~y)
for all ~x, ~y ∈ V and s, t ∈ R.
Two linear mappings L : V → W and M : V → W are said to be equal if L(~v) = M(~v)
for all ~v ∈ V.
REMARKS
1. The definition of a linear mapping above is well defined because of the closure
properties of vector spaces.
2. As before, we sometimes call a linear mapping L : V→ V a linear operator.
EXAMPLE 1 Let L : P3(R) → R2 be defined by L(a + bx + cx^2 + dx^3) = [a + d, b + c].
(a) Evaluate L(1 + 2x^2 − x^3).
Solution: L(1 + 2x^2 − x^3) = [1 + (−1), 0 + 2] = [0, 2].
.
(b) Prove that L is linear.
Solution: For any a1 + b1x + c1x^2 + d1x^3, a2 + b2x + c2x^2 + d2x^3 ∈ P3(R) and
s, t ∈ R we get
L(s(a1 + b1x + c1x^2 + d1x^3) + t(a2 + b2x + c2x^2 + d2x^3))
= L((sa1 + ta2) + (sb1 + tb2)x + (sc1 + tc2)x^2 + (sd1 + td2)x^3)
= [sa1 + ta2 + sd1 + td2, sb1 + tb2 + sc1 + tc2]
= s[a1 + d1, b1 + c1] + t[a2 + d2, b2 + c2]
= sL(a1 + b1x + c1x^2 + d1x^3) + tL(a2 + b2x + c2x^2 + d2x^3)
Thus, L is linear.
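The linearity check in part (b) can also be run numerically by encoding a + bx + cx^2 + dx^3 as the coefficient tuple (a, b, c, d). This is a hypothetical encoding chosen for illustration; the helpers below are not from the text:

```python
def L(p):
    """Apply the mapping of Example 1 to a coefficient tuple (a, b, c, d)."""
    a, b, c, d = p
    return (a + d, b + c)

def comb(s, p, t, q):
    """Form the linear combination s*p + t*q of coefficient tuples."""
    return tuple(s * pi + t * qi for pi, qi in zip(p, q))

assert L((1, 0, 2, -1)) == (0, 2)      # part (a): L(1 + 2x^2 - x^3)

# Spot-check linearity: L(s*p + t*q) == s*L(p) + t*L(q).
p, q, s, t = (1, 2, 3, 4), (0, -1, 5, 2), 2, -3
assert L(comb(s, p, t, q)) == comb(s, L(p), t, L(q))
```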
EXAMPLE 2 Let tr : Mn×n(R) → R be defined by tr A = (A)11 + (A)22 + · · · + (A)nn (called
the trace of a matrix). Prove that tr is linear.
Solution: Let A, B ∈ Mn×n(R) and s, t ∈ R. We get
tr(sA + tB) = (sA + tB)11 + · · · + (sA + tB)nn
= (s(A)11 + t(B)11) + · · · + (s(A)nn + t(B)nn)
= s((A)11 + · · · + (A)nn) + t((B)11 + · · · + (B)nn)
= s tr A + t tr B
Thus, tr is linear.
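The same computation can be spot-checked numerically (a NumPy illustration with a fixed random seed; it complements, but does not replace, the proof):

```python
import numpy as np

rng = np.random.default_rng(0)        # fixed seed for reproducibility
A = rng.integers(-5, 5, (3, 3))
B = rng.integers(-5, 5, (3, 3))
s, t = 2, -3
# Integer arithmetic, so the linearity identity holds exactly.
assert np.trace(s * A + t * B) == s * np.trace(A) + t * np.trace(B)
```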
EXAMPLE 3 Prove that the mapping L : P2(R) → M2×2(R) defined by
L(a + bx + cx^2) = [a bc; 0 abc] is not linear.
Solution: Observe that
L(1 + x + x^2) = [1 1; 0 1]
But,
L(2(1 + x + x^2)) = L(2 + 2x + 2x^2) = [2 4; 0 8] ≠ 2L(1 + x + x^2)
Of course, operations on linear mappings L : V→W are defined in the same way as
for linear mappings L : Rn → Rm.
DEFINITION
Addition
Scalar Multiplication
Let L : V → W and M : V → W be linear mappings. We define L + M : V → W by
(L + M)(~v) = L(~v) + M(~v)
and for any t ∈ R we define tL : V→W by
(tL)(~v) = tL(~v)
EXAMPLE 4 Let L : M2×2(R) → P2(R) be defined by L([a b; c d]) = a + bx + (a + c + d)x^2 and
M : M2×2(R) → P2(R) be defined by M([a b; c d]) = (b + c) + dx^2. Then, L and M are
both linear and L + M is the mapping defined by
(L + M)([a b; c d]) = L([a b; c d]) + M([a b; c d])
= [a + bx + (a + c + d)x^2] + [(b + c) + dx^2]
= (a + b + c) + bx + (a + c + 2d)x^2
Similarly, 4L is the mapping defined by
(4L)([a b; c d]) = 4L([a b; c d]) = 4a + 4bx + (4a + 4c + 4d)x^2
THEOREM 1 Let V and W be vector spaces. The set L of all linear mappings L : V → W with
standard addition and scalar multiplication of mappings is a vector space.
Proof: To prove that L is a vector space, we need to show that it satisfies all ten
vector space axioms. We will prove V1 and V2 and leave the rest as exercises.
Let L, M ∈ L. Then, L : V → W and M : V → W are both linear.
V1 To prove that L is closed under addition, we need to show that L + M : V → W is a linear mapping.
For any ~v1,~v2 ∈ V and s, t ∈ R we have
(L + M)(s~v1 + t~v2) = L(s~v1 + t~v2) + M(s~v1 + t~v2)
= sL(~v1) + tL(~v2) + sM(~v1) + tM(~v2)
= s[L(~v1) + M(~v1)] + t[L(~v2) + M(~v2)]
= s(L + M)(~v1) + t(L + M)(~v2)
Thus, L + M is linear and so L + M ∈ L.
V2 For any ~v ∈ V we have
(L + M)(~v) = L(~v) + M(~v) = M(~v) + L(~v) = (M + L)(~v)
since addition in W is commutative. Hence L + M = M + L.
�
We also define the composition of mappings in the expected way.
DEFINITION
Composition
Let L : V→W and M :W→ U be linear mappings. We define M ◦ L : V→ U by
(M ◦ L)(~v) = M(L(~v)), for all ~v ∈ V
EXAMPLE 5 If L : P2(R) → R3 is defined by L(a + bx + cx^2) = \begin{bmatrix} a + b \\ c \\ 0 \end{bmatrix} and M : R3 → M2×2(R) is
defined by M\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = \begin{bmatrix} x_1 & 0 \\ x_2 + x_3 & 0 \end{bmatrix}, then M ◦ L is the mapping defined by

(M ◦ L)(a + bx + cx^2) = M(L(a + bx + cx^2)) = M\left(\begin{bmatrix} a + b \\ c \\ 0 \end{bmatrix}\right) = \begin{bmatrix} a + b & 0 \\ c & 0 \end{bmatrix}

Observe that M ◦ L is in fact a linear mapping from P2(R) to M2×2(R).
THEOREM 2 If L : V → W and M : W → U are linear mappings, then M ◦ L : V → U is a linear
mapping.
DEFINITION
InvertibleMapping
Let L : V → W and M : W → V be linear mappings. If (M ◦ L)(~v) = ~v for all ~v ∈ V
and (L ◦ M)(~w) = ~w for all ~w ∈ W, then L and M are said to be invertible. We write
M = L^{−1} and L = M^{−1}.
EXAMPLE 6 Let L : P3(R) → M2×2(R) be the linear mapping defined by

L(a + bx + cx^2 + dx^3) = \begin{bmatrix} a + b + c & −a + b + c + d \\ a + c − d & a + b + d \end{bmatrix}

and M : M2×2(R) → P3(R) be the linear mapping defined by

M\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = (−a + c + d) + (4a − b − 3c − 2d)x + (−2a + b + 2c + d)x^2 + (−3a + b + 2c + 2d)x^3

Prove that L and M are inverses of each other.
Solution: Observe that

(M ◦ L)(a + bx + cx^2 + dx^3) = M\left(\begin{bmatrix} a + b + c & −a + b + c + d \\ a + c − d & a + b + d \end{bmatrix}\right) = a + bx + cx^2 + dx^3

(L ◦ M)\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = L((−a + c + d) + (4a − b − 3c − 2d)x + (−2a + b + 2c + d)x^2 + (−3a + b + 2c + 2d)x^3) = \begin{bmatrix} a & b \\ c & d \end{bmatrix}

Thus, L and M are inverses of each other.
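In coordinates, Example 6 reduces to a matrix computation. The sketch below (our own encoding, not part of the text: identify a + bx + cx^2 + dx^3 with (a, b, c, d) and a 2×2 matrix with its entries read row by row) represents L and M as 4×4 matrices and checks that these matrices are inverses:

```python
import numpy as np

A = np.array([[ 1,  1,  1,  0],    # a + b + c
              [-1,  1,  1,  1],    # -a + b + c + d
              [ 1,  0,  1, -1],    # a + c - d
              [ 1,  1,  0,  1]])   # a + b + d

B = np.array([[-1,  0,  1,  1],    # -a + c + d
              [ 4, -1, -3, -2],    # 4a - b - 3c - 2d
              [-2,  1,  2,  1],    # -2a + b + 2c + d
              [-3,  1,  2,  2]])   # -3a + b + 2c + 2d

# L and M are inverses exactly when their matrices multiply to I
assert np.array_equal(A @ B, np.eye(4))
assert np.array_equal(B @ A, np.eye(4))
```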
Section 8.1 Problems
1. Determine which of the following mappings are linear. If it is linear, prove it. If not, give a counterexample to show that it is not linear.
(a) L : R2 → M2×2(R) defined by L(a, b) = \begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix}
(b) T : P2(R) → P2(R) defined by T(a + bx + cx^2) = (a − b) + (bc)x^2.
(c) L : P2(R) → M2×2(R) defined by L(a + bx + cx^2) = \begin{bmatrix} 1 & 0 \\ a + c & b + c \end{bmatrix}.
(d) T : M2×2(R) → R3 defined by T\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = \begin{bmatrix} a + b \\ b + c \\ c − a \end{bmatrix}
(e) D : P3(R) → P2(R) defined by D(a + bx + cx^2 + dx^3) = b + 2cx + 3dx^2
(f) L : M2×2(R) → R defined by L(A) = det A.
2. For the given vector spaces V and W, invent a linear mapping L : V → W that satisfies the given properties.
(a) V = R3, W = P2(R); L(1, 0, 0) = 1 + x, L(0, 1, 0) = 1 − x^2, L(0, 0, 1) = 1 + x + x^2
(b) V = P2(R), W = M2×2(R), L(1) = \begin{bmatrix} 1 & 2 \\ 1 & 2 \end{bmatrix}, L(x) = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, L(1 + x + x^2) = \begin{bmatrix} 2 & 2 \\ 2 & 3 \end{bmatrix}.
3. Let B be a basis for an n-dimensional vector
space V. Prove that the mapping L : V → Rn
defined by L(~v) = [~v]B for all ~v ∈ V is linear.
4. Let L : V→W be a linear mapping. Prove that
L(~0) = ~0.
5. Let L : V → W and M : W → U be linear
mappings. Prove that M ◦L is a linear mapping
from V to U.
6. (a) Let L : V→W be a linear mapping. Prove
that if {L(~v1), . . . , L(~vk)} is a linearly inde-
pendent set in W, then {~v1, . . . ,~vk} is lin-
early independent in V.
(b) Give an example of a linear mapping L :
V → W, where {~v1, . . . ,~vk} is linearly in-
dependent in V, but {L(~v1), . . . , L(~vk)} is
linearly dependent in W.
7. Let L be the set of all linear mappings from V
to W with standard addition and scalar multi-
plication of linear mappings. Prove that
(a) tL ∈ L for all L ∈ L and t ∈ R.
(b) t(L + M) = tL + tM for all L,M ∈ L and
t ∈ R.
8. Prove that a linear mapping L : Rn → Rn is in-
vertible if and only if its standard matrix [L] is
invertible.
9. Find the inverse of the linear mapping
L : R2 → R2 defined by
L(x1, x2) = (2x1 + x2, 3x1 − 4x2)
10. Let S denote the vector space of infinite se-
quences with addition and scalar multiplication
defined component-wise (see Problem 4.1.8 pg
123). Define the left shift L : S → S by
L(x1, x2, x3, . . .) = (x2, x3, x4, . . .)
and the right shift R : S → S by
R(x1, x2, x3, . . .) = (0, x1, x2, x3, . . .)
It is easy to verify that L and R are linear.
Prove that for any sequence ~x ∈ S we have
(L ◦ R)(~x) = ~x but (R ◦ L)(~x) ≠ ~x. This shows
that L has a right inverse, but it does not have a
left inverse. It is important in this example that
S is infinite dimensional.
8.2 Rank-Nullity Theorem
Our goal in this section is not only to extend the definitions of the range and kernel
of a linear mapping to general linear mappings, but to also generalize the result of
Exercise 7.1.1 to general linear mappings. We will see that this generalization, called
the Rank-Nullity Theorem, is extremely useful.
DEFINITION
Range
Kernel
For a linear mapping L : V→W the kernel of L is defined to be
Ker(L) = {~v ∈ V | L(~v) = ~0W}
and the range of L is defined to be
Range(L) = {L(~v) ∈ W | ~v ∈ V}
THEOREM 1 If V and W are vector spaces and L : V → W is a linear mapping, then
L(~0) = ~0
THEOREM 2 If L : V → W is a linear mapping, then Ker(L) is a subspace of V and Range(L) is a
subspace of W.
The procedure for finding a basis for the range and kernel of a general linear mapping
is, of course, exactly the same as we saw for linear mappings from Rn to Rm.
EXAMPLE 1 Find a basis for the range and kernel of L : P3(R) → R2 defined by

L(a + bx + cx^2 + dx^3) = \begin{bmatrix} a + d \\ b + c \end{bmatrix}

Solution: If a + bx + cx^2 + dx^3 ∈ Ker(L), then

\begin{bmatrix} 0 \\ 0 \end{bmatrix} = L(a + bx + cx^2 + dx^3) = \begin{bmatrix} a + d \\ b + c \end{bmatrix}

Therefore, a = −d and b = −c. Thus, every vector in Ker(L) has the form

a + bx + cx^2 + dx^3 = −d − cx + cx^2 + dx^3 = d(−1 + x^3) + c(−x + x^2)

Since {−1 + x^3, −x + x^2} is clearly linearly independent, it is a basis for Ker(L).
The range of L contains all vectors of the form

\begin{bmatrix} a + d \\ b + c \end{bmatrix} = (a + d)\begin{bmatrix} 1 \\ 0 \end{bmatrix} + (b + c)\begin{bmatrix} 0 \\ 1 \end{bmatrix}

So, a basis for Range(L) is \left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right\}.
EXAMPLE 2 Determine the dimension of the range and kernel of L : P2(R) → M3×2(R) defined by

L(a + bx + cx^2) = \begin{bmatrix} a & b \\ c & b \\ a + b & a + c − b \end{bmatrix}

Solution: If a + bx + cx^2 ∈ Ker(L), then

\begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix} = L(a + bx + cx^2) = \begin{bmatrix} a & b \\ c & b \\ a + b & a + c − b \end{bmatrix}

Hence, we must have a = b = c = 0. Thus, Ker(L) = {~0}, so a basis for Ker(L) is the
empty set. Hence, dim(Ker(L)) = 0.
The range of L contains all vectors of the form

\begin{bmatrix} a & b \\ c & b \\ a + b & a + c − b \end{bmatrix} = a\begin{bmatrix} 1 & 0 \\ 0 & 0 \\ 1 & 1 \end{bmatrix} + b\begin{bmatrix} 0 & 1 \\ 0 & 1 \\ 1 & −1 \end{bmatrix} + c\begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}

Thus, \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 0 \\ 1 & 1 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 0 & 1 \\ 1 & −1 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} \right\} spans Range(L) and can be shown to be linearly
independent. Hence, it is a basis for Range(L). Consequently, dim(Range(L)) = 3.
EXERCISE 1 Find a basis for the range and kernel of L : R2 → M2×2(R) defined by

L\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \begin{bmatrix} x_1 & x_1 + x_2 \\ 0 & x_1 + x_2 \end{bmatrix}
Observe in all of the examples above that dim(Range(L)) + dim(Ker(L)) = dim V,
which matches the result we had for linear mappings L : Rn → Rm. Before we
extend this to the general case, we make a couple of definitions.
DEFINITION
Rank
Nullity
Let L : V→W be a linear mapping. We define the rank of L by
rank(L) = dim(Range(L))
We define the nullity of L to be
nullity(L) = dim(Ker(L))
THEOREM 3 (Rank-Nullity Theorem)
Let V be an n-dimensional vector space and let W be a vector space. If L : V → W is linear, then
rank(L) + nullity(L) = n
Proof: Assume that {~v_1, . . . ,~v_k} is a basis for Ker(L), which means nullity(L) = k.
By Theorem 4.2.4, we can extend {~v_1, . . . ,~v_k} to a basis {~v_1, . . . ,~v_k,~v_{k+1}, . . . ,~v_n} for
V. We claim that B = {L(~v_{k+1}), . . . , L(~v_n)} is a basis for Range(L).
Consider

c_{k+1}L(~v_{k+1}) + · · · + c_nL(~v_n) = ~0
L(c_{k+1}~v_{k+1} + · · · + c_n~v_n) = ~0

This implies that c_{k+1}~v_{k+1} + · · · + c_n~v_n ∈ Ker(L). Thus, we can write

c_{k+1}~v_{k+1} + · · · + c_n~v_n = d_1~v_1 + · · · + d_k~v_k

But then we have

−d_1~v_1 − · · · − d_k~v_k + c_{k+1}~v_{k+1} + · · · + c_n~v_n = ~0

Hence, d_1 = · · · = d_k = c_{k+1} = · · · = c_n = 0 since {~v_1, . . . ,~v_n} is linearly independent
as it is a basis for V. Thus, B is linearly independent.
By definition, for any ~y ∈ Range(L) there exists ~v ∈ V such that ~y = L(~v). Writing
~v = a_1~v_1 + · · · + a_n~v_n in terms of the basis for V, and using that ~v_1, . . . ,~v_k ∈ Ker(L), we get

~y = L(a_1~v_1 + · · · + a_k~v_k + a_{k+1}~v_{k+1} + · · · + a_n~v_n)
= a_1L(~v_1) + · · · + a_kL(~v_k) + a_{k+1}L(~v_{k+1}) + · · · + a_nL(~v_n)
= ~0 + · · · + ~0 + a_{k+1}L(~v_{k+1}) + · · · + a_nL(~v_n)
= a_{k+1}L(~v_{k+1}) + · · · + a_nL(~v_n)
Thus, ~y ∈ Span{L(~vk+1), . . . , L(~vn)} and so B is also a spanning set for Range(L).
Therefore, we have shown that {L(~vk+1), · · · , L(~vn)} is a basis for Range(L) and hence
rank(L) = dim(Range(L)) = n − k = n − nullity(L)
�
REMARK
The proof of the Rank-Nullity Theorem should seem very familiar. It is essentially
identical to that of the Dimension Theorem in Chapter 7. In this book there will be a
few times where proofs are essentially repeated, so spending time to make sure you
understand proofs when you first see them can have real long term benefits.
EXAMPLE 3 Let L : P2(R) → M2×3(R) be defined by

L(a + bx + cx^2) = \begin{bmatrix} a & a & c \\ a & a & c \end{bmatrix}

Find the rank and nullity of L.
Solution: If a + bx + cx^2 ∈ Ker(L), then

\begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} = L(a + bx + cx^2) = \begin{bmatrix} a & a & c \\ a & a & c \end{bmatrix}

Hence, a = 0 and c = 0. So,

Ker(L) = {0 + bx + 0x^2 | b ∈ R} = Span{x}

Therefore, a basis for Ker(L) is {x} and nullity(L) = 1. Then, by the Rank-Nullity
Theorem, we get that

rank(L) = dim P2(R) − nullity(L) = 3 − 1 = 2
In the example above, we could have instead found a basis for the range and then
used the Rank-Nullity Theorem to find the nullity of L.
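The Rank-Nullity bookkeeping in this example can also be verified numerically. In the sketch below (our own illustration), M is the 6×3 matrix that sends the coefficients (a, b, c) to the six entries of L(a + bx + cx^2):

```python
import numpy as np

# rows of M are the six entries of [[a, a, c], [a, a, c]], read row by row
M = np.array([[1, 0, 0],
              [1, 0, 0],
              [0, 0, 1],
              [1, 0, 0],
              [1, 0, 0],
              [0, 0, 1]])

rank = np.linalg.matrix_rank(M)
nullity = 3 - rank
assert rank == 2 and nullity == 1   # rank(L) + nullity(L) = dim P2(R) = 3
```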
Section 8.2 Problems
1. Find the rank and nullity of the following linear mappings.
(a) L : P2(R) → M2×2(R) defined by L(a + bx + cx^2) = \begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix}.
(b) L : P2(R) → P2(R) defined by L(a + bx + cx^2) = (a − b) + (b + c)x^2.
(c) T : P2(R) → M2×2(R) defined by T(a + bx + cx^2) = \begin{bmatrix} 0 & 0 \\ a + c & b + c \end{bmatrix}.
(d) L : R3 → P1(R) defined by L(a, b, c) = (a + b) + (a + b + c)x.
(e) T : M2×2(R) → M2×2(R) defined by T(A) = A^T.
(f) M : P2(R) → M2×2(R) defined by M(a + bx + cx^2) = \begin{bmatrix} −a − 2c & 2b − c \\ −2a + 2c & −2b − c \end{bmatrix}.
2. Let ~v_1, . . . ,~v_k be vectors in a vector space V and L : V → W be a linear mapping. Prove that if {L(~v_1), . . . , L(~v_k)} spans W, then dim V ≥ dim W.
3. Find a linear mapping L : V → W where dim V ≥ dim W, but Range(L) ≠ W.
4. Find a linear mapping L : V → V such that Ker(L) = Range(L).
5. Let V and W be n-dimensional vector spaces and let L : V → W be a linear mapping. Prove that Range(L) = W if and only if Ker(L) = {~0}.
6. Let U, V, W be finite dimensional vector spaces and let L : V → U and M : U → W be linear mappings.
(a) Prove that rank(M ◦ L) ≤ rank(M).
(b) Prove that rank(M ◦ L) ≤ rank(L).
8.3 Matrix of a Linear Mapping
We now show that every linear mapping L : V→W can also be represented as a ma-
trix mapping. However, we must be careful when dealing with general vector spaces
as our domain and codomain. For example, it is certainly impossible to represent a
linear mapping L : P2(R) → M2×2(R) as a matrix mapping L(~x) = A~x, since we
cannot multiply a matrix by a polynomial ~x ∈ P2(R).
Thus, if we are going to define a matrix representation of a general linear mapping,
we need to convert vectors in V to vectors in Rn. Recall that the coordinate vector of
~x ∈ V with respect to a basis B is a vector in Rn. In particular, if B = {~v_1, . . . ,~v_n} and
~x = b_1~v_1 + · · · + b_n~v_n, then the coordinate vector of ~x with respect to B is defined to
be

[~x]_B = \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix}
Using coordinates, we can write a matrix mapping representation for a linear map-
ping L : V→W. We want to find a matrix A such that
[L(~x)]C = A[~x]B
for every ~x ∈ V, where B is a basis for V and C is a basis forW.
Consider the left-hand side [L(~x)]_C. Using properties of linear mappings and coordinates, we get

[L(~x)]_C = [L(b_1~v_1 + · · · + b_n~v_n)]_C
= [b_1L(~v_1) + · · · + b_nL(~v_n)]_C
= b_1[L(~v_1)]_C + · · · + b_n[L(~v_n)]_C
= \begin{bmatrix} [L(~v_1)]_C & · · · & [L(~v_n)]_C \end{bmatrix} \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix}

Thus, we have the matrix \begin{bmatrix} [L(~v_1)]_C & · · · & [L(~v_n)]_C \end{bmatrix} being matrix-vector multiplied
by the vector [~x]_B, as desired. Consequently, this is the matrix we were trying to find.
DEFINITION
Matrix of a Linear Mapping
Suppose B = {~v1, . . . ,~vn} is any basis for a vector space V and C is any basis for a
finite dimensional vector spaceW. For a linear mapping L : V → W, the matrix of
L with respect to bases B and C is defined by
C[L]_B = \begin{bmatrix} [L(~v_1)]_C & · · · & [L(~v_n)]_C \end{bmatrix}
and satisfies
[L(~x)]C = C[L]B[~x]B
for all ~x ∈ V.
EXAMPLE 1 Let L : P2(R) → R2 be the linear mapping defined by L(a + bx + cx^2) = \begin{bmatrix} a + c \\ b − c \end{bmatrix},
B = {1 + x^2, 1 + x, −1 + x + x^2}, and C = \left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \end{bmatrix} \right\}. Find C[L]_B.
Solution: To find C[L]_B we need to determine the C-coordinate vectors of the images
of the vectors in B under L. We have

L(1 + x^2) = \begin{bmatrix} 2 \\ −1 \end{bmatrix} = 3\begin{bmatrix} 1 \\ 0 \end{bmatrix} + (−1)\begin{bmatrix} 1 \\ 1 \end{bmatrix}
L(1 + x) = \begin{bmatrix} 1 \\ 1 \end{bmatrix} = 0\begin{bmatrix} 1 \\ 0 \end{bmatrix} + 1\begin{bmatrix} 1 \\ 1 \end{bmatrix}
L(−1 + x + x^2) = \begin{bmatrix} 0 \\ 0 \end{bmatrix} = 0\begin{bmatrix} 1 \\ 0 \end{bmatrix} + 0\begin{bmatrix} 1 \\ 1 \end{bmatrix}

Hence,

C[L]_B = \begin{bmatrix} [L(1 + x^2)]_C & [L(1 + x)]_C & [L(−1 + x + x^2)]_C \end{bmatrix} = \begin{bmatrix} 3 & 0 & 0 \\ −1 & 1 & 0 \end{bmatrix}
We can check our answer. Let p(x) = a + bx + cx^2. Recall, to use C[L]_B we need
[p(x)]_B. Hence, we need to write p(x) as a linear combination of the vectors in B. In
particular, we need to solve

a + bx + cx^2 = t_1(1 + x^2) + t_2(1 + x) + t_3(−1 + x + x^2) = (t_1 + t_2 − t_3) + (t_2 + t_3)x + (t_1 + t_3)x^2

Row reducing the corresponding augmented matrix gives

\begin{bmatrix} 1 & 1 & −1 & a \\ 0 & 1 & 1 & b \\ 1 & 0 & 1 & c \end{bmatrix} ∼ \begin{bmatrix} 1 & 0 & 0 & (a − b + 2c)/3 \\ 0 & 1 & 0 & (a + 2b − c)/3 \\ 0 & 0 & 1 & (−a + b + c)/3 \end{bmatrix}

Thus,

[p(x)]_B = \begin{bmatrix} (a − b + 2c)/3 \\ (a + 2b − c)/3 \\ (−a + b + c)/3 \end{bmatrix}

So, we get

[L(p(x))]_C = C[L]_B[p(x)]_B = \begin{bmatrix} 3 & 0 & 0 \\ −1 & 1 & 0 \end{bmatrix} \begin{bmatrix} (a − b + 2c)/3 \\ (a + 2b − c)/3 \\ (−a + b + c)/3 \end{bmatrix} = \begin{bmatrix} a − b + 2c \\ b − c \end{bmatrix}

Therefore, by definition of C-coordinates,

L(p(x)) = (a − b + 2c)\begin{bmatrix} 1 \\ 0 \end{bmatrix} + (b − c)\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} a + c \\ b − c \end{bmatrix}

as required.
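The same check can be run numerically. In this sketch (an illustration only; the helper coords_B is our own name for the formula obtained from the row reduction in this example):

```python
import numpy as np

CLB = np.array([[ 3, 0, 0],
                [-1, 1, 0]])   # the matrix C[L]_B found above

def coords_B(a, b, c):
    """[p]_B for p = a + bx + cx^2, from the row reduction."""
    return np.array([a - b + 2*c, a + 2*b - c, -a + b + c]) / 3

a, b, c = 2.0, 5.0, -1.0
lhs = CLB @ coords_B(a, b, c)             # [L(p)]_C
Lp = np.array([a + c, b - c])             # L(p) computed directly

# C-coordinates (s, t) mean the vector s*[1,0] + t*[1,1]
recovered = lhs[0]*np.array([1.0, 0.0]) + lhs[1]*np.array([1.0, 1.0])
assert np.allclose(recovered, Lp)
```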
EXAMPLE 2 Let T : R2 → M2×2(R) be the linear mapping defined by T\left(\begin{bmatrix} a \\ b \end{bmatrix}\right) = \begin{bmatrix} a + b & 0 \\ 0 & a − b \end{bmatrix},
let B = \left\{ \begin{bmatrix} 2 \\ −1 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \end{bmatrix} \right\}, and let C = \left\{ \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} \right\}. Determine C[T]_B
and use it to calculate T(~v) where [~v]_B = \begin{bmatrix} 2 \\ −3 \end{bmatrix}.
Solution: To find C[T]_B we need to determine the C-coordinate vectors of the images
of the vectors in B under T. We have

T(2, −1) = \begin{bmatrix} 1 & 0 \\ 0 & 3 \end{bmatrix} = −2\begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix} + 1\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + 2\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} + 0\begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}

T(1, 2) = \begin{bmatrix} 3 & 0 \\ 0 & −1 \end{bmatrix} = 4\begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix} + 3\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + (−4)\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} + 0\begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}

Hence,

C[T]_B = \begin{bmatrix} [T(2, −1)]_C & [T(1, 2)]_C \end{bmatrix} = \begin{bmatrix} −2 & 4 \\ 1 & 3 \\ 2 & −4 \\ 0 & 0 \end{bmatrix}

Thus, we get

[T(~v)]_C = \begin{bmatrix} −2 & 4 \\ 1 & 3 \\ 2 & −4 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 2 \\ −3 \end{bmatrix} = \begin{bmatrix} −16 \\ −7 \\ 16 \\ 0 \end{bmatrix}

So,

T(~v) = (−16)\begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix} + (−7)\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + 16\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} + 0\begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} −7 & 0 \\ 0 & 9 \end{bmatrix}
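Both routes to T(~v), through C[T]_B and directly, can be compared numerically (a sketch, not part of the text):

```python
import numpy as np

def T(a, b):
    return np.array([[a + b, 0], [0, a - b]])

basis_B = [np.array([2, -1]), np.array([1, 2])]
basis_C = [np.array([[1, 1], [0, 0]]), np.array([[1, 0], [0, 1]]),
           np.array([[1, 1], [0, 1]]), np.array([[0, 0], [1, 0]])]

CTB = np.array([[-2, 4], [1, 3], [2, -4], [0, 0]])
v_B = np.array([2, -3])

# route 1: through the matrix of T
coords = CTB @ v_B
via_matrix = sum(c * M for c, M in zip(coords, basis_C))

# route 2: directly, since [v]_B = (2, -3) means v = 2*B[0] - 3*B[1]
v = 2*basis_B[0] - 3*basis_B[1]
direct = T(v[0], v[1])

assert np.array_equal(via_matrix, direct)
assert np.array_equal(direct, [[-7, 0], [0, 9]])
```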
In the special case of a linear operator L acting on a finite-dimensional vector space
V with basis B, we often wish to find the matrix B[L]B.
DEFINITION
Matrix of a Linear Operator
Suppose B = {~v1, . . . ,~vn} is any basis for a vector space V and let L : V → V be a
linear operator. The B-matrix of L (or the matrix of L with respect to the basis B) is
defined by
[L]_B = \begin{bmatrix} [L(~v_1)]_B & · · · & [L(~v_n)]_B \end{bmatrix}
and satisfies
[L(~x)]B = [L]B[~x]B
for all ~x ∈ V.
EXAMPLE 3 Let B = \left\{ \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix} \right\} and let A = \begin{bmatrix} −6 & −2 & 9 \\ −5 & −1 & 7 \\ −7 & −2 & 10 \end{bmatrix}. Find the B-matrix of the linear
mapping defined by L(~x) = A~x.
Solution: By definition, the columns of the B-matrix are the B-coordinate vectors of
the images of the vectors in B under L. We find that

A\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = 1\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} + 0\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} + 0\begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix}

A\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 3 \\ 2 \\ 3 \end{bmatrix} = 2\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} + 1\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} + 0\begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix}

A\begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix} = \begin{bmatrix} 6 \\ 6 \\ 7 \end{bmatrix} = 3\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} + 2\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} + 1\begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix}

Hence, [L]_B = \begin{bmatrix} [L(1, 1, 1)]_B & [L(1, 0, 1)]_B & [L(1, 3, 2)]_B \end{bmatrix} = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix}
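For a linear operator on Rn given by a matrix A, the B-matrix can also be computed as P^{-1}AP, where the columns of P are the vectors of B. A NumPy sketch for this example (an illustration of the change-of-basis idea, not the book's method):

```python
import numpy as np

A = np.array([[-6, -2,  9],
              [-5, -1,  7],
              [-7, -2, 10]])
P = np.array([[1, 1, 1],      # columns are the basis vectors in B
              [1, 0, 3],
              [1, 1, 2]])

L_B = np.linalg.inv(P) @ A @ P
assert np.allclose(L_B, [[1, 2, 3],
                         [0, 1, 2],
                         [0, 0, 1]])
```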
EXAMPLE 4 Let L : P2(R) → P2(R) be defined by L(a + bx + cx^2) = (2a + b) + (a + b + c)x + (b + 2c)x^2.
Find [L]_B where B = {−1 + x^2, 1 − 2x + x^2, 1 + x + x^2}, and find [L]_S where S = {1, x, x^2}.
Solution: To determine [L]_B we need to find the B-coordinate vectors of the images
of the vectors in B under L.

L(−1 + x^2) = −2 + 0x + 2x^2 = 2(−1 + x^2) + 0(1 − 2x + x^2) + 0(1 + x + x^2)
L(1 − 2x + x^2) = 0 + 0x + 0x^2 = 0(−1 + x^2) + 0(1 − 2x + x^2) + 0(1 + x + x^2)
L(1 + x + x^2) = 3 + 3x + 3x^2 = 0(−1 + x^2) + 0(1 − 2x + x^2) + 3(1 + x + x^2)

Thus,

[L]_B = \begin{bmatrix} [L(−1 + x^2)]_B & [L(1 − 2x + x^2)]_B & [L(1 + x + x^2)]_B \end{bmatrix} = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 3 \end{bmatrix}

Similarly, for [L]_S we calculate the S-coordinate vectors of the images of the vectors
in S under L.

L(1) = 2 + x = 2(1) + 1(x) + 0(x^2)
L(x) = 1 + x + x^2 = 1(1) + 1(x) + 1(x^2)
L(x^2) = x + 2x^2 = 0(1) + 1(x) + 2(x^2)

Hence,

[L]_S = \begin{bmatrix} 2 & 1 & 0 \\ 1 & 1 & 1 \\ 0 & 1 & 2 \end{bmatrix}
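The two matrices in Example 4 are similar: [L]_B = P^{-1}[L]_S P, where the columns of P hold the S-coordinates of the vectors in B. A NumPy sketch (illustration only):

```python
import numpy as np

L_S = np.array([[2, 1, 0],
                [1, 1, 1],
                [0, 1, 2]])
# columns: S-coordinates of -1 + x^2, 1 - 2x + x^2, 1 + x + x^2
P = np.array([[-1,  1, 1],
              [ 0, -2, 1],
              [ 1,  1, 1]])

L_B = np.linalg.inv(P) @ L_S @ P
assert np.allclose(L_B, np.diag([2, 0, 3]))
```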
In Example 4 we found that the matrix of L with respect to the basis B is diagonal while
the matrix of L with respect to the basis S is not. This is perhaps not surprising since
we see that the vectors in B are eigenvectors of L (that is, they satisfy L(~vi) = λi~vi),
while the vectors in S are not. Recall from Section 6.1 that we called such a basis a
geometrically natural basis.
REMARK
The relationship between diagonalization and the matrix of a linear operator is ex-
tremely important. We will review this a little later in the text. You may also find it
useful at this point to review Section 6.1.
Section 8.3 Problems
1. Find the matrix of each linear mapping with respect to the bases B and C.
(a) B = {1, x, x^2}, C = {1, x}, D : P2(R) → P1(R) defined by D(a + bx + cx^2) = b + 2cx,
(b) B = \left\{ \begin{bmatrix} 1 \\ −1 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \end{bmatrix} \right\}, C = {1 + x^2, 1 + x, −1 − x + x^2}, L : R2 → P2(R) defined by L(a_1, a_2) = (a_1 + a_2) + a_1x^2,
(c) B = \left\{ \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 1 \end{bmatrix} \right\}, C = \left\{ \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \begin{bmatrix} −1 \\ 1 \end{bmatrix} \right\}, T : M2×2(R) → R2 defined by T\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = \begin{bmatrix} a + c \\ b + c \end{bmatrix}.
(d) B = {1 + x^2, x − x^2, x^2, x^3}, C = \left\{ \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \end{bmatrix} \right\}, L : P3(R) → R2 defined by L(p) = \begin{bmatrix} p(0) \\ p(1) \end{bmatrix}.
(e) B = {1, x − 1, (x − 1)^2}, C = {1, x + 1, (x + 1)^2}, L : P2(R) → P2(R) defined by L(a + bx + cx^2) = a + bx + cx^2.
(f) B = \left\{ \begin{bmatrix} 1 \\ −1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ −1 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} \right\}, C = {1 − x, x}, L : R3 → P1(R) defined by L(a, b, c) = (a + c) + (a + 2b)x.
2. Find the B-matrix of each linear mapping with respect to the given basis B.
(a) B = {1, x, x^2}, D : P2(R) → P2(R) defined by D(a + bx + cx^2) = b + 2cx,
(b) B = \left\{ \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} \right\}, L : M2×2(R) → M2×2(R) defined by L\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = \begin{bmatrix} a & b + c \\ 0 & d \end{bmatrix}.
(c) B = \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix} \right\}, T : U → U, where U is the subspace of diagonal matrices in M2×2(R), defined by T\left(\begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix}\right) = \begin{bmatrix} a + b & 0 \\ 0 & 2a + b \end{bmatrix}.
(d) B = {1 + x^2, −1 + x, 1 − x + x^2}, L : P2(R) → P2(R) defined by L(a + bx + cx^2) = a + (b + c)x^2.
3. Let V be an n-dimensional vector space and let L : V → V be the linear operator defined by L(~v) = ~v. Prove that [L]_B = I for any basis B of V.
4. Invent a mapping L : R2 → R2 and a basis B for R2 such that Col([L]_B) ≠ Range(L). Justify your answer.
8.4 Isomorphisms
Recall that the ten vector space axioms define a “structure” for the set based on
the operations of addition and scalar multiplication. Since all vector spaces satisfy
the same ten properties, we would expect that all n-dimensional vector spaces should
have the same structure. This idea seems to be confirmed by our work with coordinate
vectors in Section 4.3. No matter what n-dimensional vector space V we use and
which basis we use for the vector space, we have a nice way of relating the vectors
in V to vectors in Rn. Moreover, the operations of addition and scalar multiplication
are preserved with respect to this basis. Consider the following calculations:
\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} + \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix} = \begin{bmatrix} 5 \\ 7 \\ 9 \end{bmatrix}
(1 + 2x + 3x^2) + (4 + 5x + 6x^2) = 5 + 7x + 9x^2
(~v_1 + 2~v_2 + 3~v_3) + (4~v_1 + 5~v_2 + 6~v_3) = 5~v_1 + 7~v_2 + 9~v_3
No matter which vector space we are using and which basis for that vector space,
any linear combination of vectors is really just performed on the coordinates of the
vectors with respect to the basis.
We will now look at how to use general linear mappings to mathematically prove
these observations. We begin with some familiar concepts.
One-To-One and Onto
DEFINITION
One-To-One
Onto
Let L : V → W be a linear mapping. L is called one-to-one (injective) if for every
~u,~v ∈ V such that L(~u) = L(~v), we must have ~u = ~v.
L is called onto (surjective) if for every ~w ∈W, there exists ~v ∈ V such that L(~v) = ~w.
EXAMPLE 1 Let L : R2 → P2(R) be the linear mapping defined by L(a_1, a_2) = a_1 + a_2x^2. Then
L is one-to-one, since if L(a_1, a_2) = L(b_1, b_2), then a_1 + a_2x^2 = b_1 + b_2x^2 and hence
a_1 = b_1 and a_2 = b_2. L is not onto, since there is no ~a ∈ R2 such that L(~a) = x.
EXAMPLE 2 Let L : M2×2(R) → R be the linear mapping defined by L(A) = tr A. L is not one-to-one
since L\left(\begin{bmatrix} 1 & 0 \\ 0 & −1 \end{bmatrix}\right) = L\left(\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}\right), but \begin{bmatrix} 1 & 0 \\ 0 & −1 \end{bmatrix} ≠ \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}. L is onto since if we
pick any x ∈ R, we can find a matrix A ∈ M2×2(R), for example A = \begin{bmatrix} x & 0 \\ 0 & 0 \end{bmatrix}, such that
L(A) = x.
Observe that a linear mapping L : V → W being onto means that Range(L) = W.
Hence, there is a direct connection between a mapping being onto and its range. On
the other hand, L being one-to-one means that for each ~w ∈ Range(L) there exists
exactly one ~v ∈ V such that L(~v) = ~w. We now establish a relationship between
a mapping being one-to-one and its kernel. These connections will be exploited in
some proofs below.
LEMMA 1 Let L : V→W be a linear mapping. L is one-to-one if and only if Ker(L) = {~0}.
Proof: Assume L is one-to-one. If ~x ∈ Ker(L), then L(~x) = ~0 = L(~0) which implies
that ~x = ~0 since L is one-to-one. Hence Ker(L) = {~0}.
Assume Ker(L) = {~0}. If L(~u) = L(~v), then
~0 = L(~u) − L(~v) = L(~u − ~v)
Hence, ~u − ~v ∈ Ker(L) which implies that ~u − ~v = ~0. Therefore, ~u = ~v and so L is
one-to-one. �
EXAMPLE 3 Let L : R3 → P5(R) be the linear mapping defined by

L(a, b, c) = a(1 + x^3) + b(1 + x^2 + x^5) + cx^4

Prove that L is one-to-one, but not onto.
Solution: Let ~a = \begin{bmatrix} a \\ b \\ c \end{bmatrix} ∈ Ker(L). Then,

0 = L(a, b, c) = a(1 + x^3) + b(1 + x^2 + x^5) + cx^4

This gives a = b = c = 0, so Ker(L) = {~0}. Hence, L is one-to-one by Lemma 1.
Observe that a basis for Range(L) is {1 + x^3, 1 + x^2 + x^5, x^4}. Hence, dim(Range(L)) =
3 < dim P5(R). Thus, Range(L) ≠ P5(R), so L is not onto.
EXAMPLE 4 Let L : M2×2(R) → R2 be the linear mapping defined by L\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = \begin{bmatrix} a \\ b \end{bmatrix}.
Prove that L is onto, but not one-to-one.
Solution: Observe that \begin{bmatrix} 0 & 0 \\ 1 & 1 \end{bmatrix} ∈ Ker(L). Thus, L is not one-to-one by Lemma 1.
Let \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} ∈ R2. Observe that L\left(\begin{bmatrix} x_1 & x_2 \\ 0 & 0 \end{bmatrix}\right) = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, so L is onto.
EXERCISE 1 Let L : R3 → P2(R) be the linear mapping defined by L(a, b, c) = a + bx + cx^2. Prove
that L is one-to-one and onto.
Isomorphisms
For two vector spaces V andW to have the same structure we will need each vector
~v ∈ V to be identified with a unique vector ~w ∈ W such that linear combinations
are preserved. To relate each ~v ∈ V with a unique ~w ∈ W we see that we just need
to find a mapping L from V to W which is both one-to-one and onto. For linear
combinations to be preserved we need the mapping to be linear.
DEFINITION
Isomorphism
Isomorphic
A vector space V is said to be isomorphic to a vector spaceW if there exists a linear
mapping L : V → W which is one-to-one and onto. L is called an isomorphism
from V toW.
EXAMPLE 5 Show that R4 is isomorphic to M2×2(R).
Solution: We need to invent a linear mapping L from R4 to M2×2(R) that is one-to-one
and onto. To define such a linear mapping, we think about how to relate vectors in
R4 to vectors in M2×2(R). Keeping in mind our work at the beginning of this section,
we define the mapping L : R4 → M2×2(R) by

L(a_0, a_1, a_2, a_3) = \begin{bmatrix} a_0 & a_1 \\ a_2 & a_3 \end{bmatrix}

Linear: Let ~a = (a_0, a_1, a_2, a_3), ~b = (b_0, b_1, b_2, b_3) ∈ R4 and s, t ∈ R. Then L is linear since

L(s~a + t~b) = L(sa_0 + tb_0, sa_1 + tb_1, sa_2 + tb_2, sa_3 + tb_3)
= \begin{bmatrix} sa_0 + tb_0 & sa_1 + tb_1 \\ sa_2 + tb_2 & sa_3 + tb_3 \end{bmatrix}
= s\begin{bmatrix} a_0 & a_1 \\ a_2 & a_3 \end{bmatrix} + t\begin{bmatrix} b_0 & b_1 \\ b_2 & b_3 \end{bmatrix}
= sL(~a) + tL(~b)

One-To-One: If L(~a) = L(~b), then \begin{bmatrix} a_0 & a_1 \\ a_2 & a_3 \end{bmatrix} = \begin{bmatrix} b_0 & b_1 \\ b_2 & b_3 \end{bmatrix}, which implies a_i = b_i for
i = 0, 1, 2, 3. Therefore, ~a = ~b and so L is one-to-one.

Onto: Pick \begin{bmatrix} a_0 & a_1 \\ a_2 & a_3 \end{bmatrix} ∈ M2×2(R). Then L(a_0, a_1, a_2, a_3) = \begin{bmatrix} a_0 & a_1 \\ a_2 & a_3 \end{bmatrix}, so L is onto.

Hence, L is an isomorphism and so R4 is isomorphic to M2×2(R).
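The isomorphism in Example 5 is essentially what NumPy's reshape does, which makes a concrete illustration (not part of the text):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
L_a = a.reshape(2, 2)      # L(a0, a1, a2, a3) = [[a0, a1], [a2, a3]]
assert np.array_equal(L_a, [[1.0, 2.0], [3.0, 4.0]])

# linear combinations are preserved (the linearity of L)
b = np.array([5.0, 6.0, 7.0, 8.0])
s, t = 2.0, -1.0
assert np.array_equal((s*a + t*b).reshape(2, 2),
                      s*a.reshape(2, 2) + t*b.reshape(2, 2))
```

Reshaping is invertible (flattening recovers the vector), which is exactly the one-to-one and onto property proved above.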
It is instructive to think carefully about the isomorphism in the example above. Ob-
serve that the images of the standard basis vectors for R4 are the standard basis vec-
tors for M2×2(R). That is, the isomorphism is mapping basis vectors to basis vectors.
We will keep this in mind when constructing an isomorphism in the next example.
EXAMPLE 6 Let T = {p(x) ∈ P2(R) | p(1) = 0} and S = \left\{ \begin{bmatrix} a \\ b \\ c \end{bmatrix} ∈ R3 \;\middle|\; a + b + c = 0 \right\}. Prove that T
is isomorphic to S.
Solution: By the Factor Theorem, every vector p(x) ∈ T has the form (x − 1)(a + bx).
Hence, a basis for T is B = {1 − x, x − x^2}. Also, every vector ~x ∈ S has the form

~x = \begin{bmatrix} a \\ b \\ −a − b \end{bmatrix} = a\begin{bmatrix} 1 \\ 0 \\ −1 \end{bmatrix} + b\begin{bmatrix} 0 \\ 1 \\ −1 \end{bmatrix}

Therefore, a basis for S is C = \left\{ \begin{bmatrix} 1 \\ 0 \\ −1 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ −1 \end{bmatrix} \right\}. To define an isomorphism we can map
the vectors in B to the vectors in C. In particular, we define L : T → S by

L(a(1 − x) + b(x − x^2)) = a\begin{bmatrix} 1 \\ 0 \\ −1 \end{bmatrix} + b\begin{bmatrix} 0 \\ 1 \\ −1 \end{bmatrix}

Linear: Let p(x), q(x) ∈ T with p(x) = a_1(1 − x) + b_1(x − x^2) and
q(x) = a_2(1 − x) + b_2(x − x^2). For any s, t ∈ R we get

L(sp(x) + tq(x)) = L((sa_1 + ta_2)(1 − x) + (sb_1 + tb_2)(x − x^2))
= (sa_1 + ta_2)\begin{bmatrix} 1 \\ 0 \\ −1 \end{bmatrix} + (sb_1 + tb_2)\begin{bmatrix} 0 \\ 1 \\ −1 \end{bmatrix}
= s\left( a_1\begin{bmatrix} 1 \\ 0 \\ −1 \end{bmatrix} + b_1\begin{bmatrix} 0 \\ 1 \\ −1 \end{bmatrix} \right) + t\left( a_2\begin{bmatrix} 1 \\ 0 \\ −1 \end{bmatrix} + b_2\begin{bmatrix} 0 \\ 1 \\ −1 \end{bmatrix} \right)
= sL(p(x)) + tL(q(x))

Thus, L is linear.
One-To-One: Let a(1 − x) + b(x − x^2) ∈ Ker(L). Then,

\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} = L(a(1 − x) + b(x − x^2)) = a\begin{bmatrix} 1 \\ 0 \\ −1 \end{bmatrix} + b\begin{bmatrix} 0 \\ 1 \\ −1 \end{bmatrix} = \begin{bmatrix} a \\ b \\ −a − b \end{bmatrix}

Hence, a = b = 0, so Ker(L) = {~0}. Therefore, by Lemma 1, L is one-to-one.
Onto: Let a\begin{bmatrix} 1 \\ 0 \\ −1 \end{bmatrix} + b\begin{bmatrix} 0 \\ 1 \\ −1 \end{bmatrix} ∈ S. We have

L(a(1 − x) + b(x − x^2)) = a\begin{bmatrix} 1 \\ 0 \\ −1 \end{bmatrix} + b\begin{bmatrix} 0 \\ 1 \\ −1 \end{bmatrix}

So, L is onto.
Thus, L is an isomorphism and T is isomorphic to S.
EXERCISE 2 Prove that P2(R) is isomorphic to R3.
Observe in the examples above that the isomorphic vector spaces have the same di-
mension. This makes sense since if we are mapping basis vectors to basis vectors,
then both vector spaces need to have the same number of basis vectors. We now
prove that this is indeed always the case.
THEOREM 2 Let V and W be finite dimensional vector spaces. V is isomorphic to W if and only
if dim V = dim W.
Proof: On one hand, assume that dimV = dimW = n. Let B = {~v1, . . . ,~vn} be a
basis for V and C = {~w1, . . . , ~wn} be a basis for W. As in our examples above, we
define a mapping L : V → W which maps the basis vectors in B to the basis vectors
in C. So, we define
L(t1~v1 + · · · + tn~vn) = t1~w1 + · · · + tn~wn
Linear: Let ~x = a_1~v_1 + · · · + a_n~v_n, ~y = b_1~v_1 + · · · + b_n~v_n ∈ V and s, t ∈ R. We have

L(s~x + t~y) = L((sa_1 + tb_1)~v_1 + · · · + (sa_n + tb_n)~v_n)
= (sa_1 + tb_1)~w_1 + · · · + (sa_n + tb_n)~w_n
= s(a_1~w_1 + · · · + a_n~w_n) + t(b_1~w_1 + · · · + b_n~w_n)
= sL(~x) + tL(~y)
Therefore, L is linear.
One-To-One: If ~v ∈ Ker(L), then
~0 = L(~v) = L(c1~v1 + · · · + cn~vn) = c1L(~v1) + · · · + cnL(~vn) = c1~w1 + · · · + cn~wn
C is linearly independent so we get c1 = · · · = cn = 0. Therefore, Ker(L) = {~0} and
hence L is one-to-one.
Onto: We have Ker(L) = {~0} since L is one-to-one. Thus, we get rank(L) = dim V − 0 = n
by the Rank-Nullity Theorem. Consequently, Range(L) is an n-dimensional
subspace of W which implies that Range(L) = W, so L is onto.
Thus, L is an isomorphism from V to W and so V is isomorphic to W.
On the other hand, assume that V is isomorphic to W. Then there exists an isomorphism
L from V to W. Since L is one-to-one and onto, we get Range(L) = W and
Ker(L) = {~0}. Thus, the Rank-Nullity Theorem gives

dim W = dim(Range(L)) = rank(L) = dim V − nullity(L) = dim V

as required. �
The proof not only proves the theorem, but it demonstrates a few additional facts.
1. It shows the intuitively obvious fact that if V is isomorphic to W, then W is
isomorphic to V. Typically we just say that V andW are isomorphic.
2. It confirms that we can make an isomorphism from V to W by mapping basis
vectors of V to basis vectors of W.
3. Finally, observe that once we had proven that L was one-to-one, we could
exploit the Rank-Nullity Theorem and that dimV = dimW to immediately get
that the mapping was onto. In particular, it shows us how to prove the following
theorem.
THEOREM 3 If V and W are both n-dimensional vector spaces and L : V → W is linear, then L is
one-to-one if and only if L is onto.
REMARK
Note that if V and W are both n-dimensional, then it does not mean every linear
mapping L : V → W must be an isomorphism. For example, L : R4 → M2×2(R)
defined by L(x_1, x_2, x_3, x_4) = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} is definitely not one-to-one nor onto!
We saw that we can make an isomorphism by mapping basis vectors to basis vectors.
The following theorem shows that this property actually characterizes isomorphisms.
THEOREM 4 Let V and W be isomorphic vector spaces and let {~v1, . . . ,~vn} be a basis for V. A
linear mapping L : V → W is an isomorphism if and only if {L(~v1), . . . , L(~vn)} is a
basis for W.
The following exercise is intended to be quite challenging. It requires complete un-
derstanding of what we have done in this chapter.
EXERCISE 3 Use the following steps to find a basis for the vector space L of all linear operators
L : V → V on a 2-dimensional vector space V.
1. Prove that L is isomorphic to M2×2(R) by constructing an isomorphism T .
2. Find an isomorphism T−1 from M2×2(R) to L. Let {~e1, ~e2, ~e3, ~e4} denote the stan-
dard basis vectors for M2×2(R). Calculate T−1(~e1), T−1(~e2), T−1(~e3), T−1(~e4).
By Theorem 4 these vectors form a basis for L.
3. Demonstrate that you are correct by proving that the set
{T−1(~e1), T−1(~e2), T−1(~e3), T−1(~e4)} is linearly independent and spans L.
Section 8.4 Problems
1. For each of the following pairs of vector
spaces, define an explicit isomorphism to es-
tablish that the spaces are isomorphic. Prove
that your map is an isomorphism.
(a) P1(R) and R2.
(b) P3(R) and M2×2(R).
(c) The subspace U of 2 × 2 upper triangular
matrices and the subspace
P = {p(x) ∈ P3(R) | p(1) = 0}.
(d) The subspace S = \left\{ \begin{bmatrix} x_1 \\ x_2 \\ 0 \\ x_3 \end{bmatrix} \;\middle|\; x_1 − x_2 + x_3 = 0 \right\}
of R4 and the subspace U = {a + bx + cx^2 | a = b} of P2(R).
(e) Any n-dimensional vector space V and Rn.
(f) Any n-dimensional vector space V and
Pn−1(R).
2. Let V and W be n-dimensional vector spaces
and let L : V → W be a linear mapping. Prove
that L is one-to-one if and only if L is onto.
3. Let L : V → W be an isomorphism and
let {~v1, . . . ,~vn} be a basis for V. Prove that
{L(~v1), . . . , L(~vn)} is a basis for W.
4. Let L : V → U and M : U → W be linear
mappings.
(a) Prove that if L and M are onto, then M ◦ L
is onto.
(b) Give an example where L is not onto, but
M ◦ L is onto.
(c) Is it possible to give an example where M
is not onto, but M ◦ L is onto? Explain.
5. Let V and W be vector spaces with dim V = n
and dimW = m, let L : V → W be a linear
mapping, and let A be the matrix of L with re-
spect to bases B for V and C forW.
(a) Define an explicit isomorphism from
Range(L) to Col(A). Prove that your map
is an isomorphism.
(b) Use (a) to prove that rank(L) = rank(A).
6. Let U and V be finite dimensional vector spaces
and L : U → V and M : V → U both be one-
to-one linear mappings. Prove that U and V are
isomorphic.
7. Prove or disprove the following statements. Assume that U, V, W are all finite dimensional vector spaces.
(a) If dim V = dim U and L : U → V is linear, then L is an isomorphism.
(b) If L : V → W is a linear mapping and dim V < dim W, then L cannot be onto.
(c) If L : V → W is a linear mapping and dim V > dim W, then L is onto.
(d) If L : V → W is a linear mapping and dim V > dim W, then L cannot be one-to-one.
Chapter 9
Inner Products
In Chapter 1, we looked at the dot product for vectors in Rn. We saw that the dot
product has an important relationship to the length of a vector and to the angle be-
tween two vectors. Moreover, the dot product gave us an easy way of finding the
projection of a vector onto a plane. In Chapter 8, we saw that every n-dimensional
vector space has the same structure as Rn in terms of being a vector space. Thus, we
expect that we can extend the concepts of length, orthogonality, and projections to
general vector spaces.
9.1 Inner Product Spaces
Recall that we defined a vector space so that the operations of addition and scalar
multiplication had the essential properties of addition and scalar multiplication of
vectors in Rn. Thus, to generalize the concept of the dot product to general vector
spaces, it makes sense to include the essential properties of the dot product in our
definition (see Theorem 1.4.2 on page 31).
DEFINITION
Inner Product
Inner Product Space
Let V be a vector space. An inner product on V is a function 〈 , 〉 : V × V→ R that
has the following properties: for every ~v, ~u, ~w ∈ V and s, t ∈ R we have
I1 〈~v,~v〉 ≥ 0, and 〈~v,~v〉 = 0 if and only if ~v = ~0 (Positive Definite)
I2 〈~v, ~w〉 = 〈~w,~v〉 (Symmetric)
I3 〈s~v + t~u, ~w〉 = s〈~v, ~w〉 + t〈~u, ~w〉 (Left linear)
A vector space V with an inner product 〈 , 〉 on V is called an inner product space.
REMARK
Notice that since an inner product is left linear and symmetric, then it is also right
linear:
〈~w, s~v + t~u〉 = s〈~w,~v〉 + t〈~w, ~u〉
Thus, we say that an inner product is bilinear.
In the same way that a vector space is dependent on the defined operations of addition
and scalar multiplication, an inner product space is dependent on the definitions of
addition, scalar multiplication, and the inner product. This will be demonstrated in
the examples below.
EXAMPLE 1 The dot product is an inner product on Rn, called the standard inner product on Rn.
EXAMPLE 2 Which of the following defines an inner product on R3?

(a) 〈~x, ~y〉 = x1y1 + 2x2y2 + 4x3y3.

Solution: We have
〈~x, ~x〉 = x1^2 + 2x2^2 + 4x3^2 ≥ 0
and 〈~x, ~x〉 = 0 if and only if ~x = ~0. Hence, it is positive definite. Also,
〈~x, ~y〉 = x1y1 + 2x2y2 + 4x3y3 = y1x1 + 2y2x2 + 4y3x3 = 〈~y, ~x〉
Therefore, it is symmetric. Also, it is bilinear since
〈s~x + t~y, ~z〉 = (sx1 + ty1)z1 + 2(sx2 + ty2)z2 + 4(sx3 + ty3)z3
= s(x1z1 + 2x2z2 + 4x3z3) + t(y1z1 + 2y2z2 + 4y3z3)
= s〈~x, ~z〉 + t〈~y, ~z〉
Thus 〈 , 〉 is an inner product.

(b) 〈~x, ~y〉 = x1y1 − x2y2 + x3y3.

Solution: Observe that if ~x = (0, 1, 0), then
〈~x, ~x〉 = 0(0) − 1(1) + 0(0) = −1 < 0
So, it is not positive definite and thus it is not an inner product.

(c) 〈~x, ~y〉 = x1y1 + x2y2.

Solution: If ~x = (0, 0, 1), then
〈~x, ~x〉 = 0(0) + 0(0) = 0
but ~x ≠ ~0. So, it is not positive definite and thus it is not an inner product.
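The verification in part (a) can also be spot-checked numerically. The sketch below is an addition to the text (it assumes NumPy, which the book does not use): it writes the candidate inner product as 〈~x, ~y〉 = ~x^T G~y with G = diag(1, 2, 4) and tests the three axioms on random vectors. A random spot-check illustrates the axioms; it does not replace the algebraic proof above.

```python
import numpy as np

# <x, y> = x1*y1 + 2*x2*y2 + 4*x3*y3, i.e. x^T G y with G = diag(1, 2, 4).
G = np.diag([1.0, 2.0, 4.0])

def ip(x, y):
    return x @ G @ y

rng = np.random.default_rng(0)
x, y, z = rng.standard_normal((3, 3))
s, t = 2.0, -3.0

assert np.isclose(ip(x, y), ip(y, x))                         # symmetric
assert np.isclose(ip(s*x + t*y, z), s*ip(x, z) + t*ip(y, z))  # left linear
assert ip(x, x) > 0                                           # positive for x != 0
```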
EXAMPLE 3 On the vector space M2×2(R), define 〈 , 〉 by
〈A, B〉 = tr(B^T A)
where tr is the trace of a matrix, defined as the sum of the diagonal entries. Show that 〈 , 〉 is an inner product on M2×2(R).

Solution: Let A, B, C ∈ M2×2(R) and s, t ∈ R. Then,
〈A, A〉 = tr(A^T A) = tr([a11 a21; a12 a22][a11 a12; a21 a22])
= tr([a11^2 + a21^2, a11a12 + a21a22; a11a12 + a21a22, a12^2 + a22^2])
= a11^2 + a21^2 + a12^2 + a22^2 ≥ 0
Moreover, 〈A, A〉 = 0 if and only if A is the zero matrix. Hence, 〈 , 〉 is positive definite.
〈A, B〉 = tr(B^T A) = tr([b11 b21; b12 b22][a11 a12; a21 a22])
= tr([a11b11 + a21b21, b11a12 + b21a22; a11b12 + a21b22, a12b12 + a22b22])
= a11b11 + a12b12 + a21b21 + a22b22
With a similar calculation we find that
〈B, A〉 = a11b11 + a12b12 + a21b21 + a22b22 = 〈A, B〉
Therefore, 〈 , 〉 is symmetric. In Example 8.1.2, we proved that tr is a linear operator. Hence, we get
〈sA + tB, C〉 = tr(C^T (sA + tB)) = tr(sC^T A + tC^T B) = s tr(C^T A) + t tr(C^T B) = s〈A, C〉 + t〈B, C〉
Thus, 〈 , 〉 is also bilinear, and so it is an inner product on M2×2(R).
REMARKS
1. Of course, this can be generalized to an inner product 〈A, B〉 = tr(BT A) on
Mm×n(R). This is called the standard inner product on Mm×n(R).
2. Observe that calculating 〈A, B〉 corresponds exactly to finding the dot product
on vectors in Rmn under the obvious isomorphism. As a result, when using this
inner product you do not actually have to compute BT A.
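Remark 2 can be seen directly in code. The following sketch (assuming NumPy; not from the text) checks that tr(B^T A) agrees with the entrywise dot product of A and B, so B^T A never needs to be formed.

```python
import numpy as np

# Standard inner product <A, B> = tr(B^T A) on M_{m x n}(R): the value equals
# the dot product of the entries under the "flatten to R^{mn}" isomorphism.
rng = np.random.default_rng(1)
A = rng.standard_normal((2, 2))
B = rng.standard_normal((2, 2))

via_trace = np.trace(B.T @ A)
via_entries = np.sum(A * B)        # entrywise product, then sum

assert np.isclose(via_trace, via_entries)
```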
EXAMPLE 4 Prove that the function 〈 , 〉 defined by
〈p(x), q(x)〉 = p(−1)q(−1) + p(0)q(0) + p(1)q(1)
is an inner product on P2(R).
Solution: Let p(x) = a + bx + cx2, q(x), r(x) ∈ P2(R). Then,
〈p(x), p(x)〉 = [p(−1)]^2 + [p(0)]^2 + [p(1)]^2 = [a − b + c]^2 + a^2 + [a + b + c]^2 ≥ 0
Moreover, 〈p(x), p(x)〉 = 0 if and only if a − b + c = 0, a = 0, and a + b + c = 0,
which implies a = b = c = 0. Hence, 〈 , 〉 is positive definite. We have
〈p(x), q(x)〉 = p(−1)q(−1) + p(0)q(0) + p(1)q(1)
= q(−1)p(−1) + q(0)p(0) + q(1)p(1)
= 〈q(x), p(x)〉
Hence, 〈 , 〉 is symmetric. Finally, we get
〈(sp + tq)(x), r(x)〉 = (sp + tq)(−1)r(−1) + (sp + tq)(0)r(0) + (sp + tq)(1)r(1)
= [sp(−1) + tq(−1)]r(−1) + [sp(0) + tq(0)]r(0) + [sp(1) + tq(1)]r(1)
= s[p(−1)r(−1) + p(0)r(0) + p(1)r(1)] + t[q(−1)r(−1) + q(0)r(0) + q(1)r(1)]
= s〈p(x), r(x)〉 + t〈q(x), r(x)〉
Therefore, 〈 , 〉 is also bilinear and hence is an inner product on P2(R).
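The evaluation inner product of Example 4 is easy to compute mechanically. Here is an illustrative Python sketch; the coefficient-triple representation and helper names are our own, not the book's.

```python
import numpy as np

# Evaluation inner product on P2(R): <p, q> = p(-1)q(-1) + p(0)q(0) + p(1)q(1).
# A polynomial a + bx + cx^2 is stored as the coefficient triple (a, b, c).
NODES = np.array([-1.0, 0.0, 1.0])

def evaluate(coeffs, x):
    a, b, c = coeffs
    return a + b * x + c * x**2

def ip(p, q):
    return float(np.sum(evaluate(p, NODES) * evaluate(q, NODES)))

# p(x) = 1 + x has values (0, 1, 2); q(x) = x^2 has values (1, 0, 1).
assert ip((1, 1, 0), (0, 0, 1)) == 2.0
# <x, x> = (-1)^2 + 0^2 + 1^2 = 2, so ||x|| = sqrt(2).
assert ip((0, 1, 0), (0, 1, 0)) == 2.0
```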
EXAMPLE 5 An extremely important inner product in applied mathematics, physics, and engineering is the inner product
〈 f (x), g(x)〉 = ∫_{−π}^{π} f (x)g(x) dx
on the vector space C[−π, π] of continuous functions defined on the closed interval from −π to π. This is the foundation for Fourier series.
THEOREM 1 If V is an inner product space with inner product 〈 , 〉, then for any ~v ∈ V we have
〈~v, ~0〉 = 0
REMARK
Whenever one considers an inner product space, one must specify which inner product is being used. However, in Rn and Mm×n(R) one generally uses the standard inner
product. Thus, whenever we are working in Rn or Mm×n(R), if no other inner product
is defined, we will take that to mean that the standard inner product is being used.
Section 9.1 Problems
1. Calculate the following inner products under
the standard inner product on M2×2(R).
(a) 〈[1 2; 0 −1], [1 2; 0 −1]〉
(b) 〈[3 −4; 2 5], [4 1; 2 −1]〉
(c) 〈[2 3; −4 1], [−5 3; 1 5]〉
(d) 〈[0 0; 0 0], [3 7; −4 3]〉
2. Calculate the following using the inner product
for P2(R) defined by
〈p(x), q(x)〉 = p(0)q(0) + p(1)q(1) + p(2)q(2)
(a) 〈x, 1 + x + x^2〉   (b) 〈1 + x^2, 1 + x^2〉
(c) 〈x + x^2, 1 + x + x^2〉   (d) 〈x, x〉
3. Determine, with proof, which of the following
functions defines an inner product on R3.
(a) 〈~x, ~y〉 = 2x1y1 + 3x2y2 + 4x3y3
(b) 〈~x, ~y〉 = 2x1y1 − x1y2 − x2y1 + 2x2y2 + x3y3
4. Determine, with proof, which of the following
functions defines an inner product on P2(R).
(a) 〈p(x), q(x)〉 = p(−1)q(−1) + p(1)q(1)
(b) 〈p(x), q(x)〉 = p(−2)q(−2) + p(1)q(1) +
p(2)q(2)
(c) 〈p(x), q(x)〉 = 2p(−1)q(−1) + 2p(0)q(0) +
2p(1)q(1) − p(−1)q(1) − p(1)q(−1)
5. Determine, with proof, which of the following
functions defines an inner product on M2×2(R).
(a) 〈A, B〉 = 2a1b1 + a2b2 + 2a3b3
(b) 〈A, B〉 = a1b1 + a2b2 − a3b3 − a4b4
6. Let V be an inner product space with inner
product 〈 , 〉. Prove that 〈~v, ~0〉 = 0 for any ~v ∈ V.
7. Let 〈 , 〉 denote the standard inner product on Rn
and let A ∈ Mn×n(R). Prove that for all ~x, ~y ∈ Rn
〈A~x, ~y〉 = 〈~x, AT~y〉
8. Let {~e1, ~e2} be the standard basis for R2. Prove
that for any inner product 〈 , 〉 on R2 we have
〈~x, ~y〉 = x1y1〈~e1, ~e1〉+x1y2〈~e1, ~e2〉+x2y1〈~e2, ~e1〉+x2y2〈~e2, ~e2〉
9. Verify that 〈 f (x), g(x)〉 = ∫_{−π}^{π} f (x)g(x) dx is an inner product on C[−π, π].
10. Determine if the following statements are true or false. Justify your answer.
(a) If there exist ~x, ~y ∈ V such that 〈~x, ~y〉 < 0, then 〈 , 〉 is not an inner product on V.
(b) If 〈 , 〉 is an inner product on an inner product space V, then for all ~x, ~y ∈ V we have
〈a~x + b~y, c~x + d~y〉 = ac〈~x, ~x〉 + 2bc〈~x, ~y〉 + bd〈~y, ~y〉
9.2 Orthogonality and Length
The concepts of length and orthogonality are fundamental in geometry and have
many real world applications. Thus, we now extend these concepts to general in-
ner product spaces.
Length
DEFINITION
Length
Let V be an inner product space with inner product 〈 , 〉. For any ~v ∈ V we define the
length (or norm) of ~v by
‖~v‖ = √〈~v, ~v〉
Observe that the length of a vector is well defined since an inner product must be
positive definite. Also note that the length of a vector in an inner product space is
dependent on the definition of the inner product being used in the same way that
the sum of two vectors is dependent on the definition of addition being used. This
might seem a little strange at first, but it is actually extremely useful. It means in
applications we have flexibility to define the lengths of some vectors as required by
the problem.
EXAMPLE 1 In R3 under the standard inner product, the length of ~e1 = (1, 0, 0) is 1. However, under the inner product 〈~x, ~y〉 = ax1y1 + x2y2 + x3y3 where a > 0, we get
‖~e1‖ = √〈~e1, ~e1〉 = √(a(1)^2 + 0^2 + 0^2) = √a
EXAMPLE 2 Let p(x) = x and q(x) = 2 − 3x2 be vectors in P2(R) with inner product
〈p(x), q(x)〉 = p(−1)q(−1) + p(0)q(0) + p(1)q(1)
Calculate the length of p(x) and the length of q(x).
Solution: We have
‖p(x)‖ = √〈p(x), p(x)〉 = √(p(−1)p(−1) + p(0)p(0) + p(1)p(1)) = √((−1)(−1) + 0(0) + 1(1)) = √2
‖q(x)‖ = √〈q(x), q(x)〉 = √(q(−1)q(−1) + q(0)q(0) + q(1)q(1)) = √((−1)(−1) + 2(2) + (−1)(−1)) = √6
EXAMPLE 3 Find the length of p(x) = x in P2(R) with inner product
〈p(x), q(x)〉 = p(0)q(0) + p(1)q(1) + p(2)q(2)
Solution: We have
‖p(x)‖ = √〈p(x), p(x)〉 = √(p(0)p(0) + p(1)p(1) + p(2)p(2)) = √(0(0) + 1(1) + 2(2)) = √5
EXAMPLE 4 On C[−π, π], find the length of f (x) = x under the inner product
〈 f (x), g(x)〉 = ∫_{−π}^{π} f (x)g(x) dx
Solution: We have
‖x‖^2 = ∫_{−π}^{π} x^2 dx = (1/3)x^3 |_{−π}^{π} = (2/3)π^3
Thus, ‖x‖ = √(2π^3/3).
EXAMPLE 5 Find the length of A = [1/2 1/2; −1/2 −1/2] in M2×2(R).
Solution: Using the standard inner product we get
‖A‖ = √〈A, A〉 = √((1/2)^2 + (1/2)^2 + (−1/2)^2 + (−1/2)^2) = 1
Of course, we get the same familiar properties of length that we saw in Chapter 1.
THEOREM 1 Let V be an inner product space with inner product 〈 , 〉. For any ~x, ~y ∈ V and t ∈ R we have
(1) ‖~x‖ ≥ 0, and ‖~x‖ = 0 if and only if ~x = ~0.
(2) ‖t~x‖ = |t| ‖~x‖
(3) 〈~x, ~y〉 ≤ ‖~x‖ ‖~y‖ (Cauchy-Schwarz-Bunyakovski Inequality)
(4) ‖~x + ~y‖ ≤ ‖~x‖ + ‖~y‖ (Triangle Inequality)
The proofs of these properties are essentially identical to the proofs of the corre-
sponding properties in Rn.
In many cases it is useful (and sometimes necessary) to use vectors which have length
one.
DEFINITION
Unit Vector
Let V be an inner product space with inner product 〈 , 〉. If ~v ∈ V is a vector such that
‖~v ‖ = 1, then ~v is called a unit vector.
In many situations, we will be given a general vector ~v ∈ V and need to find a unit vector v̂ in the direction of ~v. This is called normalizing the vector. Theorem 1 (2) shows us that we can take
v̂ = (1/‖~v‖) ~v
EXAMPLE 6 Find a unit vector in the direction of p(x) = x in P2(R) with inner product
〈p(x), q(x)〉 = p(0)q(0) + p(1)q(1) + p(2)q(2)
Solution: We have
‖p(x)‖ = √(p(0)p(0) + p(1)p(1) + p(2)p(2)) = √(0(0) + 1(1) + 2(2)) = √5
Thus, a unit vector in the direction of p(x) is
p̂(x) = (1/√5) x
Orthogonality
We now extend the idea of vectors being orthogonal to inner product spaces.
DEFINITION
Orthogonal Vectors
Let V be an inner product space with inner product 〈 , 〉. If ~x, ~y ∈ V are such that
〈~x, ~y〉 = 0
then ~x and ~y are said to be orthogonal.
Just like length, whether two vectors are orthogonal or not is dependent on the defi-
nition of the inner product.
EXAMPLE 7 The standard basis vectors for R2 are not orthogonal under the inner product 〈~x, ~y〉 = 2x1y1 + x1y2 + x2y1 + 2x2y2 on R2 because
〈(1, 0), (0, 1)〉 = 2(1)(0) + 1(1) + 0(0) + 2(0)(1) = 1
EXAMPLE 8 Show that [1 2; 3 −1] is orthogonal to [−1 2; −1 0] in M2×2(R).
Solution: We have
〈[1 2; 3 −1], [−1 2; −1 0]〉 = (−1)(1) + 2(2) + (−1)(3) + 0(−1) = 0
So, they are orthogonal.
DEFINITION
Orthogonal Set
If S = {~v1, . . . ,~vk} is a set of vectors in an inner product space V with inner product 〈 , 〉 such that 〈~vi, ~vj〉 = 0 for all i ≠ j, then S is called an orthogonal set.
EXAMPLE 9 Prove that {1, x, 3x^2 − 2} is an orthogonal set of vectors in P2(R) under the inner product 〈p(x), q(x)〉 = p(−1)q(−1) + p(0)q(0) + p(1)q(1).
Solution: We have
〈1, x〉 = 1(−1) + 1(0) + 1(1) = 0
〈1, 3x^2 − 2〉 = 1(1) + 1(−2) + 1(1) = 0
〈x, 3x^2 − 2〉 = (−1)(1) + 0(−2) + 1(1) = 0
Hence, each vector is orthogonal to the others, and so the set is orthogonal.
EXAMPLE 10 Prove that {1, x, 3x^2 − 2} is not an orthogonal set of vectors in P2(R) under the inner product 〈p(x), q(x)〉 = p(0)q(0) + p(1)q(1) + p(2)q(2).
Solution: We have 〈1, x〉 = 1(0) + 1(1) + 1(2) = 3, so 1 and x are not orthogonal. Thus, the set is not orthogonal.
EXAMPLE 11 Prove that {[1 2; 0 1], [0 1; 0 −2], [0 0; 1 0]} is an orthogonal set in M2×2(R).
Solution: We have
〈[1 2; 0 1], [0 1; 0 −2]〉 = 1(0) + 2(1) + 0(0) + 1(−2) = 0
〈[1 2; 0 1], [0 0; 1 0]〉 = 1(0) + 2(0) + 0(1) + 1(0) = 0
〈[0 0; 1 0], [0 1; 0 −2]〉 = 0(0) + 0(1) + 1(0) + 0(−2) = 0
Hence, the set is orthogonal.
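Since the standard inner product on M2×2(R) is just the entrywise dot product (see the Remark in Section 9.1), the computations in Example 11 can be confirmed in a few lines. This sketch assumes NumPy and is not part of the text:

```python
import numpy as np

# The three matrices of Example 11; <A, B> = tr(B^T A) equals the sum of
# entrywise products, so orthogonality is a pairwise zero-sum check.
S = [np.array([[1, 2], [0, 1]]),
     np.array([[0, 1], [0, -2]]),
     np.array([[0, 0], [1, 0]])]

for i in range(len(S)):
    for j in range(i + 1, len(S)):
        assert np.sum(S[i] * S[j]) == 0   # each distinct pair is orthogonal
```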
EXERCISE 1 Determine if {[3 1; −1 2], [1 −3; 2 1], [1 2; 3 −1]} is an orthogonal set in M2×2(R).
One important property of orthogonal vectors in R2 is the Pythagorean Theorem.
This extends to an orthogonal set in an inner product space.
THEOREM 2 If {~v1, . . . ,~vk} is an orthogonal set in an inner product space V, then
‖~v1 + · · · + ~vk‖^2 = ‖~v1‖^2 + · · · + ‖~vk‖^2
The following theorem gives us an important result about orthogonal sets which do
not contain the zero vector.
THEOREM 3 If S = {~v1, . . . ,~vk} is an orthogonal set in an inner product space V with inner product 〈 , 〉 such that ~vi ≠ ~0 for all 1 ≤ i ≤ k, then S is linearly independent.
Proof: Consider
~0 = c1~v1 + · · · + ck~vk
Take the inner product of both sides with ~vi to get
〈~vi, ~0〉 = 〈~vi, c1~v1 + · · · + ck~vk〉
0 = c1〈~vi, ~v1〉 + · · · + ci〈~vi, ~vi〉 + · · · + ck〈~vi, ~vk〉
= 0 + · · · + 0 + ci‖~vi‖^2 + 0 + · · · + 0
= ci‖~vi‖^2
Since ~vi ≠ ~0, we have that ‖~vi‖ ≠ 0, and so ci = 0 for 1 ≤ i ≤ k. Therefore, S is linearly independent. �
Consequently, if we have an orthogonal set of non-zero vectors which spans an inner
product space V, then it will be a basis for V.
DEFINITION
Orthogonal Basis
If B is an orthogonal set in an inner product space V that is a basis for V, then B is
called an orthogonal basis for V.
Since the vectors in the basis are all orthogonal to each other, our geometric intuition
tells us that it should be quite easy to find the coordinates of any vector with respect
to this basis. The following theorem demonstrates this.
THEOREM 4 If S = {~v1, . . . ,~vn} is an orthogonal basis for an inner product space V with inner product 〈 , 〉 and ~v ∈ V, then the coefficient of ~vi when ~v is written as a linear combination of the vectors in S is 〈~v, ~vi〉 / ‖~vi‖^2. In particular,
~v = (〈~v, ~v1〉 / ‖~v1‖^2) ~v1 + · · · + (〈~v, ~vn〉 / ‖~vn‖^2) ~vn
Proof: Since ~v ∈ V, we can write
~v = c1~v1 + · · · + cn~vn
Taking the inner product of both sides with ~vi gives
〈~v, ~vi〉 = 〈c1~v1 + · · · + cn~vn, ~vi〉 = c1〈~v1, ~vi〉 + · · · + ci〈~vi, ~vi〉 + · · · + cn〈~vn, ~vi〉 = 0 + · · · + 0 + ci‖~vi‖^2 + 0 + · · · + 0
since S is orthogonal. Also, ~vi ≠ ~0, which implies that ‖~vi‖^2 ≠ 0. Therefore, we get
ci = 〈~v, ~vi〉 / ‖~vi‖^2
This is valid for 1 ≤ i ≤ n and so the result follows. �
EXAMPLE 12 Find the coordinate vector of ~x = (1, 2, −3) in R3 with respect to the orthogonal basis B = {(1, 1, 1), (1, 0, −1), (1, −2, 1)} = {~v1, ~v2, ~v3}.
Solution: By Theorem 4, we have
c1 = 〈~x, ~v1〉 / ‖~v1‖^2 = 0/3 = 0
c2 = 〈~x, ~v2〉 / ‖~v2‖^2 = 4/2 = 2
c3 = 〈~x, ~v3〉 / ‖~v3‖^2 = −6/6 = −1
Thus, [~x]B = (0, 2, −1).
Note that it is easy to verify that
(1, 2, −3) = 0(1, 1, 1) + 2(1, 0, −1) + (−1)(1, −2, 1)
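Theorem 4 translates directly into a computation. The following sketch (NumPy, illustrative only) reproduces the coordinates found in Example 12 under the standard inner product:

```python
import numpy as np

# The i-th coordinate of x with respect to an orthogonal basis is
# <x, v_i> / ||v_i||^2 (Theorem 4), here for the standard inner product on R^3.
x = np.array([1.0, 2.0, -3.0])
basis = [np.array([1.0, 1.0, 1.0]),
         np.array([1.0, 0.0, -1.0]),
         np.array([1.0, -2.0, 1.0])]

coords = np.array([(x @ v) / (v @ v) for v in basis])
assert np.allclose(coords, [0.0, 2.0, -1.0])   # matches Example 12

# Reconstructing x from its coordinates verifies the expansion.
assert np.allclose(sum(c * v for c, v in zip(coords, basis)), x)
```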
EXAMPLE 13 Find the coordinate vector of A = [4 2; 3 −1] with respect to the orthogonal basis B = {[1 1; 1 1], [−2 0; 1 1], [0 0; 1 −1]} = {~v1, ~v2, ~v3} for a subspace S of M2×2(R).
Solution: We have
c1 = 〈A, ~v1〉 / ‖~v1‖^2 = (4(1) + 2(1) + 3(1) + (−1)(1)) / (1^2 + 1^2 + 1^2 + 1^2) = 8/4 = 2
c2 = 〈A, ~v2〉 / ‖~v2‖^2 = (4(−2) + 2(0) + 3(1) + (−1)(1)) / ((−2)^2 + 0^2 + 1^2 + 1^2) = −6/6 = −1
c3 = 〈A, ~v3〉 / ‖~v3‖^2 = (4(0) + 2(0) + 3(1) + (−1)(−1)) / (0^2 + 0^2 + 1^2 + (−1)^2) = 4/2 = 2
Thus, [A]B = (2, −1, 2).
EXERCISE 2 Consider P2(R) with inner product 〈p(x), q(x)〉 = p(−1)q(−1) + p(0)q(0) + p(1)q(1). Verify that B = {1 − x − 2x^2, 2 − x^2, 2 + 3x − 4x^2} is an orthogonal basis for P2(R) and determine the B-coordinate vector of 1 + x^2.
Orthonormal Bases
Observe that the formula for the coordinates with respect to an orthogonal basis
would be even easier if all the basis vectors were unit vectors.
DEFINITION
Orthonormal Set
If S = {~v1, . . . ,~vk} is an orthogonal set in an inner product space V such that ‖~vi ‖ = 1
for 1 ≤ i ≤ k, then S is called an orthonormal set.
By Theorem 3, an orthonormal set is necessarily linearly independent.
DEFINITION
Orthonormal Basis
A basis for an inner product space V which is an orthonormal set is called an
orthonormal basis of V.
EXAMPLE 14 The standard basis of Rn is an orthonormal basis under the standard inner product.
EXAMPLE 15 The standard basis for R3 is not an orthonormal basis under the inner product
〈~x, ~y〉 = x1y1 + 2x2y2 + x3y3
since ~e2 is not a unit vector under this inner product.
EXAMPLE 16 Let B = { (1/√3)(1, 1, −1), (1/√2)(1, 0, 1), (1/√6)(1, −2, −1) }. Verify that B is an orthonormal basis for R3.
Solution: We first verify that each vector is a unit vector. We have
〈(1/√3)(1, 1, −1), (1/√3)(1, 1, −1)〉 = 1/3 + 1/3 + 1/3 = 1
〈(1/√2)(1, 0, 1), (1/√2)(1, 0, 1)〉 = 1/2 + 0 + 1/2 = 1
〈(1/√6)(1, −2, −1), (1/√6)(1, −2, −1)〉 = 1/6 + 4/6 + 1/6 = 1
We now verify that B is orthogonal.
〈(1/√3)(1, 1, −1), (1/√2)(1, 0, 1)〉 = (1/√6)(1 + 0 − 1) = 0
〈(1/√3)(1, 1, −1), (1/√6)(1, −2, −1)〉 = (1/√18)(1 − 2 + 1) = 0
〈(1/√2)(1, 0, 1), (1/√6)(1, −2, −1)〉 = (1/√12)(1 + 0 − 1) = 0
Therefore, B is an orthonormal set and hence is a linearly independent set of three vectors in R3. Thus, B is a basis for R3.
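Anticipating Theorem 6 below, the whole verification in Example 16 collapses to one matrix product: placing the vectors as columns of a matrix Q, the set is orthonormal exactly when Q^T Q = I. A NumPy sketch (not from the text):

```python
import numpy as np

# Columns of Q are the three vectors of Example 16; orthonormality of the set
# is equivalent to Q^T Q being the identity matrix.
Q = np.column_stack([
    np.array([1.0, 1.0, -1.0]) / np.sqrt(3),
    np.array([1.0, 0.0, 1.0]) / np.sqrt(2),
    np.array([1.0, -2.0, -1.0]) / np.sqrt(6),
])

assert np.allclose(Q.T @ Q, np.eye(3))
```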
EXERCISE 3 Show that B = {[1/√2 1/√2; 0 0], [0 0; 1/√2 1/√2], [1/2 −1/2; 1/2 −1/2]} is an orthonormal set in M2×2(R).
EXERCISE 4 Show that B = {1, x, x2} is not an orthonormal set for P2(R) under the inner product
〈p(x), q(x)〉 = p(−2)q(−2) + p(0)q(0) + p(2)q(2)
EXAMPLE 17 Turn {1, x, 3x^2 − 2} into an orthonormal basis for P2(R) under the inner product
〈p(x), q(x)〉 = p(−1)q(−1) + p(0)q(0) + p(1)q(1)
Solution: In Example 9, we showed that {1, x, 3x^2 − 2} is orthogonal. Hence, it is a linearly independent set of three vectors in P2(R) and thus is a basis for P2(R). To turn it into an orthonormal basis, we just need to normalize each vector. We have
‖1‖ = √(1^2 + 1^2 + 1^2) = √3
‖x‖ = √((−1)^2 + 0^2 + 1^2) = √2
‖3x^2 − 2‖ = √(1^2 + (−2)^2 + 1^2) = √6
Hence, an orthonormal basis for P2(R) is
{ 1/√3, x/√2, (3x^2 − 2)/√6 }
EXAMPLE 18 Show that the set {1, sin x, cos x} is an orthogonal set in C[−π, π] under the inner product 〈 f (x), g(x)〉 = ∫_{−π}^{π} f (x)g(x) dx, and then normalize the vectors to make it an orthonormal set.
Solution:
〈1, sin x〉 = ∫_{−π}^{π} 1 · sin x dx = 0
〈1, cos x〉 = ∫_{−π}^{π} 1 · cos x dx = 0
〈sin x, cos x〉 = ∫_{−π}^{π} sin x · cos x dx = 0
Thus, the set is orthogonal. Next, we find that
〈1, 1〉 = ∫_{−π}^{π} 1^2 dx = 2π
〈sin x, sin x〉 = ∫_{−π}^{π} sin^2 x dx = π
〈cos x, cos x〉 = ∫_{−π}^{π} cos^2 x dx = π
Hence, ‖1‖ = √(2π), ‖sin x‖ = √π, and ‖cos x‖ = √π. Therefore,
{ 1/√(2π), (sin x)/√π, (cos x)/√π }
is an orthonormal set in C[−π, π].
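The integrals in Example 18 can be spot-checked with numerical quadrature. The sketch below (plain Python, not from the text) uses a composite midpoint rule; since quadrature is approximate, the checks use a tolerance.

```python
import math

# Approximate <f, g> = integral of f*g over [-pi, pi] by a midpoint rule.
def ip(f, g, n=20000):
    h = 2 * math.pi / n
    return sum(f(x) * g(x) for x in
               (-math.pi + (k + 0.5) * h for k in range(n))) * h

one = lambda x: 1.0
assert abs(ip(one, math.sin)) < 1e-6           # <1, sin x> = 0
assert abs(ip(one, math.cos)) < 1e-6           # <1, cos x> = 0
assert abs(ip(math.sin, math.cos)) < 1e-6      # <sin x, cos x> = 0
assert abs(ip(one, one) - 2 * math.pi) < 1e-6  # ||1||^2 = 2*pi
assert abs(ip(math.sin, math.sin) - math.pi) < 1e-6  # ||sin x||^2 = pi
```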
The following corollary to Theorem 4 shows that it is very easy to find coordinates
of a vector with respect to an orthonormal basis.
COROLLARY 5 If ~v is any vector in an inner product space V with inner product 〈 , 〉 and B = {~v1, . . . ,~vn} is an orthonormal basis for V, then
~v = 〈~v, ~v1〉~v1 + · · · + 〈~v, ~vn〉~vn
Proof: By Theorem 4 we have
~v = (〈~v, ~v1〉 / ‖~v1‖^2) ~v1 + · · · + (〈~v, ~vn〉 / ‖~vn‖^2) ~vn = 〈~v, ~v1〉~v1 + · · · + 〈~v, ~vn〉~vn
since ‖~vi‖ = 1 for 1 ≤ i ≤ n. �
REMARK
The formula for finding coordinates with respect to an orthonormal basis looks nicer than the one for an orthogonal basis. However, when doing problems by hand, it is not necessarily easier to use, since the vectors in an orthonormal basis often contain square roots. Compare the following example to Example 12. In proofs, however, it will generally be much nicer to have an orthonormal set.
EXAMPLE 19 Let ~x = (1, 2, 3). Find the coordinates of ~x with respect to the orthonormal basis
B = { (1/√3)(1, 1, −1), (1/√2)(1, 0, 1), (1/√6)(1, −2, −1) } = {~v1, ~v2, ~v3}
for R3.
Solution: By Corollary 5, the coordinates of ~x with respect to B are
c1 = 〈~x, ~v1〉 = 〈(1, 2, 3), (1/√3)(1, 1, −1)〉 = 0
c2 = 〈~x, ~v2〉 = 〈(1, 2, 3), (1/√2)(1, 0, 1)〉 = 4/√2 = 2√2
c3 = 〈~x, ~v3〉 = 〈(1, 2, 3), (1/√6)(1, −2, −1)〉 = −6/√6 = −√6
EXAMPLE 20 Consider P2(R) with inner product defined by
〈p(x), q(x)〉 = p(−1)q(−1) + p(0)q(0) + p(1)q(1)
Write f (x) = 1 + x + x^2 as a linear combination of the vectors in the orthonormal basis B = {~v1, ~v2, ~v3} = { 1/√3, x/√2, (3x^2 − 2)/√6 }.
Solution: By Corollary 5, the coordinates of f with respect to B are
c1 = 〈 f , ~v1〉 = 1(1/√3) + 1(1/√3) + 3(1/√3) = 5/√3
c2 = 〈 f , ~v2〉 = 1(−1/√2) + 1(0) + 3(1/√2) = √2
c3 = 〈 f , ~v3〉 = 1(1/√6) + 1(−2/√6) + 3(1/√6) = 2/√6
Thus,
1 + x + x^2 = (5/√3)(1/√3) + √2 (x/√2) + (2/√6)((3x^2 − 2)/√6)
Orthogonal Matrices
We end this section by looking at a very important application of orthonormal bases
of Rn.
THEOREM 6 For an n × n matrix P, the following are equivalent.
(1) The columns of P form an orthonormal basis for Rn
(2) P^T = P^(−1)
(3) The rows of P form an orthonormal basis for Rn
Proof: Let P = [~v1 · · · ~vn].
(1) ⇔ (2): By definition of matrix multiplication, the (i, j)-entry of P^T P is ~vi · ~vj, so
P^T P = [~v1 · ~v1, ~v1 · ~v2, . . . , ~v1 · ~vn; ~v2 · ~v1, ~v2 · ~v2, . . . , ~v2 · ~vn; . . . ; ~vn · ~v1, ~vn · ~v2, . . . , ~vn · ~vn]
Therefore, P^T P = I if and only if ~vi · ~vj = 0 whenever i ≠ j and ~vi · ~vi = 1 for all i. That is, P^T = P^(−1) if and only if {~v1, . . . ,~vn} is an orthonormal basis for Rn.
(2) ⇔ (3): The rows of P form an orthonormal basis for Rn if and only if the columns of P^T form an orthonormal basis for Rn. We proved above that this is true if and only if
(P^T)^T = (P^T)^(−1)
P = (P^(−1))^T
P^T = ((P^(−1))^T)^T
P^T = P^(−1)
�
DEFINITION
Orthogonal Matrix
If the columns of an n×n matrix P form an orthonormal basis for Rn, then P is called
an orthogonal matrix.
REMARKS
1. Be very careful to remember that an orthogonal matrix has orthonormal rows
and columns. A matrix whose columns form an orthogonal set but not an
orthonormal set is not orthogonal!
2. By Theorem 6, we could have defined an orthogonal matrix as a matrix P
such that PT = P−1 instead. We will use this property of orthogonal matrices
frequently.
EXAMPLE 21 Which of the following matrices are orthogonal?
A = [0 1 0; 0 0 1; 1 0 0],  B = [1/2 1/2; −1/2 1/2],  C = [1/√2 1/√2 0; 1/√6 −1/√6 2/√6; −1/√3 1/√3 1/√3]
Solution: The columns of A are the standard basis vectors for R3, which we know form an orthonormal basis for R3. Thus, A is orthogonal.
Although the columns of B are clearly orthogonal, we have that
‖(1/2, −1/2)‖ = √((1/2)^2 + (−1/2)^2) = √(1/2) ≠ 1
Thus, the first column is not a unit vector, so B is not orthogonal.
By matrix multiplication, we find that
CC^T = [1/√2 1/√2 0; 1/√6 −1/√6 2/√6; −1/√3 1/√3 1/√3][1/√2 1/√6 −1/√3; 1/√2 −1/√6 1/√3; 0 2/√6 1/√3] = I
Hence, C is orthogonal.
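The test P^T P = I from Theorem 6 gives a one-line orthogonality check. Here is an illustrative NumPy sketch applied to the three matrices of Example 21 (the helper name is ours, not the book's):

```python
import numpy as np

# A matrix is orthogonal iff P^T P = I (Theorem 6).
def is_orthogonal(P):
    P = np.asarray(P, dtype=float)
    return np.allclose(P.T @ P, np.eye(P.shape[0]))

A = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]])
B = np.array([[0.5, 0.5], [-0.5, 0.5]])
C = np.array([[1/np.sqrt(2), 1/np.sqrt(2), 0],
              [1/np.sqrt(6), -1/np.sqrt(6), 2/np.sqrt(6)],
              [-1/np.sqrt(3), 1/np.sqrt(3), 1/np.sqrt(3)]])

assert is_orthogonal(A)
assert not is_orthogonal(B)   # columns orthogonal but not unit length
assert is_orthogonal(C)
```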
We now look at some useful properties of orthogonal matrices.
THEOREM 7 If P and Q are n × n orthogonal matrices and ~x, ~y ∈ Rn, then
(1) (P~x) · (P~y) = ~x · ~y
(2) ‖P~x‖ = ‖~x‖
(3) det P = ±1
(4) All real eigenvalues of P are 1 or −1
(5) PQ is an orthogonal matrix
Proof: We will prove (1) and (2) and leave the others as exercises. We have
(P~x) · (P~y) = (P~x)^T (P~y) = ~x^T P^T P~y = ~x^T ~y = ~x · ~y
Thus,
‖P~x‖^2 = (P~x) · (P~x) = ~x · ~x = ‖~x‖^2
�
Section 9.2 Problems
1. Let B = {1 + x, −2 + 3x, 2 − 3x^2}. Consider the
inner product 〈 , 〉 on P2(R) defined by
〈p(x), q(x)〉 = p(−1)q(−1)+p(0)q(0)+p(1)q(1)
(a) Prove that B is an orthogonal basis for
P2(R).
(b) Find the B-coordinate vector of p(x) = 1.
(c) Find the B-coordinate vector of p(x) = x.
(d) Find the B-coordinate vector of p(x) = x2.
(e) Find the B-coordinate vector of
p(x) = 3 − 7x + x2 in three ways.
i. By row reducing an augmented
matrix.
ii. By using (b), (c), (d), and Theorem
4.3.2.
iii. By using the formula for coordinates
of vectors with respect to a basis
(Theorem 4).
2. Let ~x = (4, 3, 5), and let B = {(1, −2, 2), (2, 2, 1), (−2, 1, 2)} be a basis for R3.
(a) Prove that B is an orthogonal basis for R3.
(b) Turn B into an orthonormal basis C by normalizing the vectors in B.
(c) Find the B-coordinate vector of ~x.
(d) Find the C-coordinate vector of ~x.
3. Consider the inner product 〈 , 〉 on R3 defined by 〈~x, ~y〉 = x1y1 + 2x2y2 + x3y3, and let B = {(1, 2, −1), (3, −1, −1), (3, 1, 7)} and ~x = (1, 1, 1).
(a) Prove that B is an orthogonal basis for R3.
(b) Turn B into an orthonormal basis C by normalizing the vectors in B.
(c) Find the B-coordinate vector of ~x.
(d) Find the C-coordinate vector of ~x.
4. Consider the subspace S = Span B of M2×2(R), where B = {[1 1; −1 −1], [1 −1; 1 −1], [−1 1; 1 −1]}.
(a) Prove that B is an orthogonal basis for S.
(b) Turn B into an orthonormal basis C by normalizing the vectors in B.
(c) Find the C-coordinate vector of ~x = [−3 6; −2 −1].
5. Which of the following matrices are orthogonal?
(a) [cos θ − sin θ; sin θ cos θ]
(b) [1/√10 −3/√10; 3/√10 1/√10]
(c) [2 1; −1 2]
(d) [2/3 1/3 2/3; −1/√18 4/√18 −1/√18; −1/√2 0 1/√2]
(e) [1 1 1; 2 −1 0; 1 1 −1]
(f) [0 1 0; 0 0 1; 1 0 0]
(g) [2/√20 3/√11 0; 3/√20 −1/√11 −1/√2; 3/√20 −1/√11 1/√2]
6. Consider P2(R) with inner product
〈p(x), q(x)〉 = p(−1)q(−1) + p(0)q(0) + p(1)q(1)
Given that B = {1 − x^2, (1/2)(x − x^2)} is an orthonormal set, extend B to find an orthonormal basis for P2(R).
7. Let V be an inner product space with inner
product 〈 , 〉. Prove that for any ~v ∈ V and t ∈ R,
we have
‖t~v ‖ = |t| ‖~v ‖
8. Let P and Q be n × n orthogonal matrices
(a) Prove that det P = ±1.
(b) Prove that all real eigenvalues of P are ±1.
(c) Give an orthogonal matrix P whose eigen-
values are not ±1.
(d) Prove that PQ is an orthogonal matrix.
9. Let B = { (1/√6, −2/√6, 1/√6), (1/√3, 1/√3, 1/√3), (1/√2, 0, −1/√2) } be an orthonormal basis for R3. Using B, determine another orthonormal basis for R3 which includes the vector (1/√6, 1/√3, 1/√2), and briefly explain why your basis is orthonormal.
10. Let {~v1, . . . ,~vm} be a set of m orthonormal vectors in Rn with m < n. Let Q = [~v1 · · · ~vm]. Prove that Q^T Q = Im.
11. Let {~v1, . . . ,~vn} be an orthonormal basis for an inner product space V with inner product 〈 , 〉, and let ~x = c1~v1 + · · · + cn~vn and ~y = d1~v1 + · · · + dn~vn.
(a) Show that 〈~x, ~y〉 = c1d1 + · · · + cndn
(b) Show that ‖~x‖^2 = c1^2 + · · · + cn^2
12. Let V be an inner product space with inner
product 〈 , 〉. Define the distance between two
vectors ~x, ~y ∈ V by
d(~x, ~y) = ‖~x − ~y‖
Prove the following properties of the distance
function.
(a) d(~x, ~y) ≥ 0.
(b) d(~x, ~y) = 0 if and only if ~x = ~y.
(c) d(~x, ~y) = d(~y, ~x).
(d) d(~x,~z) ≤ d(~x, ~y) + d(~y,~z).
(e) d(~x, ~y) = d(~x +~z, ~y +~z).
(f) d(c~x, c~y) = |c|d(~x, ~y).
9.3 The Gram-Schmidt Procedure
We have now seen that orthogonal and orthonormal bases are very nice. However,
this is only useful if we can find such a basis for any inner product space we wish to
use. So, our goal in this section is to not only prove that we can find an orthogonal
basis for any finite dimensional inner product space, but to also find an algorithm
for finding such a basis. Note that once we have an orthogonal basis, it is easy to
normalize each vector to make it orthonormal.
Let W be a k-dimensional subspace of an inner product space V. We will prove that W has an orthogonal basis constructively. That is, we will derive a method for turning any basis {~w1, . . . , ~wk} for W into an orthogonal basis {~v1, . . . ,~vk} for W.
We will do this inductively.
First, consider the case where k = 1. That is, {~w1} is a basis for W. In this case, we can take ~v1 = ~w1 so that {~v1} is an orthogonal basis for W.
Now, consider the case k = 2 where {~w1, ~w2} is a basis for W. Starting as in the case above, we let ~v1 = ~w1. We then need to pick a non-zero vector ~v2 that is orthogonal to ~v1. To see how to find such a vector ~v2, we work backwards. If {~v1, ~v2} is an orthogonal basis for W, then ~w2 ∈ Span{~v1, ~v2}. If this is true, then, from our work with coordinates with respect to an orthogonal basis, we find that
~w2 = (〈~w2, ~v1〉 / ‖~v1‖^2) ~v1 + (〈~w2, ~v2〉 / ‖~v2‖^2) ~v2
where 〈~w2, ~v2〉 / ‖~v2‖^2 ≠ 0 since ~w2 is not a scalar multiple of ~w1 = ~v1. Rearranging gives
(〈~w2, ~v2〉 / ‖~v2‖^2) ~v2 = ~w2 − (〈~w2, ~v1〉 / ‖~v1‖^2) ~v1
Because multiplying by a non-zero scalar does not change orthogonality, this shows that for {~v1, ~v2} to be an orthogonal basis for W, it is necessary that ~v2 be a multiple of ~w2 − (〈~w2, ~v1〉 / ‖~v1‖^2) ~v1. We now need to prove that this is also sufficient.
If we take ~v2 = ~w2 − (〈~w2, ~v1〉 / ‖~v1‖^2) ~v1, then we have
〈~v1, ~v2〉 = 〈~v1, ~w2 − (〈~w2, ~v1〉 / ‖~v1‖^2) ~v1〉
= 〈~v1, ~w2〉 − (〈~w2, ~v1〉 / ‖~v1‖^2)〈~v1, ~v1〉
= 〈~v1, ~w2〉 − (〈~w2, ~v1〉 / ‖~v1‖^2) ‖~v1‖^2
= 0
Consequently, {~v1, ~v2} is an orthogonal set of two non-zero vectors in W and hence is an orthogonal basis for W by Theorem 9.2.3 and Theorem 4.2.3.
For k = 3, we can repeat the same procedure. We start by picking ~v1 and ~v2 as in the case k = 2. We then need to find ~v3 such that {~v1, ~v2, ~v3} is orthogonal. Using the same method as in the previous case, we find that taking
~v3 = ~w3 − (〈~w3, ~v1〉 / ‖~v1‖^2) ~v1 − (〈~w3, ~v2〉 / ‖~v2‖^2) ~v2
will give an orthogonal basis {~v1, ~v2, ~v3} for W.
Continuing this gives the following result.
THEOREM 1 (Gram-Schmidt Orthogonalization Theorem)
Let {~w1, . . . , ~wn} be a basis for an inner product space W. If we define ~v1, . . . ,~vn successively as follows:
~v1 = ~w1
~v2 = ~w2 − (〈~w2, ~v1〉 / ‖~v1‖^2) ~v1
~vi = ~wi − (〈~wi, ~v1〉 / ‖~v1‖^2) ~v1 − (〈~wi, ~v2〉 / ‖~v2‖^2) ~v2 − · · · − (〈~wi, ~vi−1〉 / ‖~vi−1‖^2) ~vi−1
for 3 ≤ i ≤ n, then {~v1, . . . ,~vk} is an orthogonal basis for Span{~w1, . . . , ~wk} for 1 ≤ k ≤ n.
EXERCISE 1 Prove Theorem 1.
Hint: Since the algorithm is recursive, what proof technique should you use? Also, make sure that you fully prove every part of the conclusion of the theorem.
Observe that Theorem 1 implies that every finite dimensional inner product space has an orthogonal basis. We call the process defined in Theorem 1 for finding an orthogonal basis {~v1, . . . ,~vn} for an inner product space W the Gram-Schmidt procedure.
EXAMPLE 1 Use the Gram-Schmidt procedure to transform B = {(1, 1, 1), (1, −1, 1), (1, 1, 0)} = {~w1, ~w2, ~w3} into an orthonormal basis for R3.
Solution: We first take ~v1 = ~w1 = (1, 1, 1). Next, we calculate
~w2 − (〈~w2, ~v1〉 / ‖~v1‖^2) ~v1 = (1, −1, 1) − (1/3)(1, 1, 1) = (2/3, −4/3, 2/3)
We can take any non-zero scalar multiple of this, so we take ~v2 = (1, −2, 1) to simplify the next calculation. Finally,
~v3 = ~w3 − (〈~w3, ~v1〉 / ‖~v1‖^2) ~v1 − (〈~w3, ~v2〉 / ‖~v2‖^2) ~v2
= (1, 1, 0) − (2/3)(1, 1, 1) + (1/6)(1, −2, 1)
= (1/2, 0, −1/2)
By Theorem 1, {~v1, ~v2, ~v3} is an orthogonal basis for R3. To get an orthonormal basis, we normalize each vector. We get
v̂1 = (1/√3)(1, 1, 1)
v̂2 = (1/√6)(1, −2, 1)
v̂3 = (1/√2)(1, 0, −1)
So, {v̂1, v̂2, v̂3} is an orthonormal basis for R3.
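The procedure of Theorem 1 is short to implement for the standard inner product on Rn. The following NumPy sketch (illustrative, not the book's code) reproduces the intermediate vectors of Example 1 before rescaling:

```python
import numpy as np

# Gram-Schmidt for the standard inner product on R^n: subtract from each w
# its projections onto the previously constructed orthogonal vectors.
def gram_schmidt(vectors):
    ortho = []
    for w in vectors:
        v = w - sum((w @ u) / (u @ u) * u for u in ortho)
        ortho.append(v)
    return ortho

W = [np.array([1.0, 1.0, 1.0]),
     np.array([1.0, -1.0, 1.0]),
     np.array([1.0, 1.0, 0.0])]
V = gram_schmidt(W)

# Pairwise orthogonal; v2 and v3 match the unscaled vectors of Example 1.
assert abs(V[0] @ V[1]) < 1e-12 and abs(V[0] @ V[2]) < 1e-12 and abs(V[1] @ V[2]) < 1e-12
assert np.allclose(V[1], [2/3, -4/3, 2/3])
assert np.allclose(V[2], [0.5, 0.0, -0.5])
```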
EXAMPLE 2 Use the Gram-Schmidt procedure to transform B =
1
1
1
,
1
−1
1
,
1
1
0
= {~w1, ~w2, ~w3}
into an orthogonal basis for R3 with inner product
⟨
~x, ~y⟩
= 2x1y1 + x2y2 + 3x3y3
Solution: Take ~v1 =
1
1
1
. Then,
~w2 −⟨
~w2,~v1
⟩
‖~v1‖2~v1 =
1
−1
1
− 4
6
1
1
1
=
1/3−5/31/3
To simplify calculations we take ~v2 =
1
−5
1
. Next,
~v3 = ~w3 − (⟨~w3, ~v1⟩/‖~v1‖²)~v1 − (⟨~w3, ~v2⟩/‖~v2‖²)~v2
    = (1, 1, 0) − (3/6)(1, 1, 1) + (3/30)(1, −5, 1)
    = (3/5, 0, −2/5)

Rescaling this by 5, an orthogonal basis is {(1, 1, 1), (1, −5, 1), (3, 0, −2)}.
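The same recursion works for any inner product once the dot products are replaced. A sketch for Example 2 (assuming NumPy; `gram_schmidt_ip` is our name), writing ⟨~x, ~y⟩ = ~x^T G ~y with G = diag(2, 1, 3):

```python
import numpy as np

# The inner product of Example 2, <x, y> = 2*x1*y1 + x2*y2 + 3*x3*y3,
# written as <x, y> = x^T G y for the diagonal matrix G below.
G = np.diag([2.0, 1.0, 3.0])
inner = lambda x, y: x @ G @ y

def gram_schmidt_ip(vectors, ip):
    """Gram-Schmidt with respect to an arbitrary inner product ip."""
    basis = []
    for w in vectors:
        v = np.asarray(w, dtype=float).copy()
        for u in basis:
            v = v - (ip(w, u) / ip(u, u)) * u
        basis.append(v)
    return basis

B = [np.array([1.0, 1.0, 1.0]), np.array([1.0, -1.0, 1.0]), np.array([1.0, 1.0, 0.0])]
v1, v2, v3 = gram_schmidt_ip(B, inner)
```

The text rescales v2 to (1, −5, 1) and v3 to (3, 0, −2); rescaling preserves orthogonality under any inner product.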
EXAMPLE 3 Find an orthogonal basis of P3(R) with inner product ⟨p, q⟩ = ∫_{−1}^{1} p(x)q(x) dx by applying the Gram-Schmidt procedure to the basis {1, x, x², x³}.

Solution: Take ~v1 = 1. Then

~v2 = x − (⟨x, 1⟩/‖1‖²)1 = x − (0/2)1 = x

~v3 = x² − (⟨x², 1⟩/‖1‖²)1 − (⟨x², x⟩/‖x‖²)x = x² − ((2/3)/2)1 − (0/(2/3))x = x² − 1/3
We instead take ~v3 = 3x² − 1 to make calculations easier. Finally, we get

~v4 = x³ − (⟨x³, 1⟩/‖1‖²)1 − (⟨x³, x⟩/‖x‖²)x − (⟨x³, 3x² − 1⟩/‖3x² − 1‖²)(3x² − 1) = x³ − (3/5)x

Hence, an orthogonal basis for this inner product space is {1, x, 3x² − 1, x³ − (3/5)x}.
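Example 3 can be reproduced symbolically. The sketch below (assuming SymPy is available) runs the same recursion with the integral inner product:

```python
import sympy as sp

x = sp.symbols('x')
# the inner product of Example 3: integrate p*q over [-1, 1]
ip = lambda p, q: sp.integrate(p * q, (x, -1, 1))

orth = []
for w in [sp.Integer(1), x, x**2, x**3]:
    v = w
    for u in orth:
        v -= (ip(w, u) / ip(u, u)) * u  # Gram-Schmidt step
    orth.append(sp.expand(v))
```

Up to the rescalings done by hand in the text, these are the first few Legendre polynomials.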
EXAMPLE 4 Use the Gram-Schmidt procedure to find an orthogonal basis for the subspace W of M2×2(R) spanned by

{~w1, ~w2, ~w3, ~w4} = { [1 1; 0 1], [1 0; 1 1], [2 1; 1 2], [1 0; 0 1] }

under the standard inner product. (2 × 2 matrices are written here as [row 1; row 2].)
Solution: Take ~v1 = ~w1. Next, we get

~w2 − (⟨~w2, ~v1⟩/‖~v1‖²)~v1 = [1 0; 1 1] − (2/3)[1 1; 0 1] = [1/3 −2/3; 1 1/3]

So, we take ~v2 = [1 −2; 3 1]. We then find that
~v3 = ~w3 − (⟨~w3, ~v1⟩/‖~v1‖²)~v1 − (⟨~w3, ~v2⟩/‖~v2‖²)~v2 = [2 1; 1 2] − (5/3)[1 1; 0 1] − (5/15)[1 −2; 3 1] = [0 0; 0 0]
At first glance, we may think something has gone wrong since the zero vector cannot
be a member of the basis. However, if we look closely, we see that this implies that
~w3 is a linear combination of ~w1 and ~w2. Hence, we have
Span{~w1, ~w2, ~w4} = Span{~w1, ~w2, ~w3, ~w4}
Therefore, we can ignore ~w3 and continue the procedure using ~w4.
~v3 = ~w4 − (⟨~w4, ~v1⟩/‖~v1‖²)~v1 − (⟨~w4, ~v2⟩/‖~v2‖²)~v2 = [1 0; 0 1] − (2/3)[1 1; 0 1] − (2/15)[1 −2; 3 1] = [1/5 −2/5; −2/5 1/5]

Hence, an orthogonal basis for W is {~v1, ~v2, ~v3}.
REMARK
It is important to realize that if you change the order in which you use the vectors of a basis in the Gram-Schmidt procedure, you will likely get a different result. The next exercise will demonstrate how big the difference can be. For the end of section problems below, you are expected to use the vectors in the order provided (from left to right).
EXERCISE 2 Use the Gram-Schmidt procedure to find an orthogonal basis for the subspace W of M2×2(R) from Example 4, using the vectors in the order

{~w4, ~w2, ~w1} = { [1 0; 0 1], [1 0; 1 1], [1 1; 0 1] }
QR-Decomposition (optional)
The Gram-Schmidt procedure can be used to generate an extremely useful decompo-
sition of an m × n matrix A.
THEOREM 2 (QR-Decomposition)
Let A = [~a1 · · · ~an] ∈ Mm×n(R) where dim Col(A) = n. Let ~q1, . . . , ~qn denote the vectors that result from applying the Gram-Schmidt procedure to the columns of A (in order) and then normalizing. If we define

Q = [~q1 · · · ~qn]

and

R = [ ~a1 · ~q1   ~a2 · ~q1   · · ·    ~an · ~q1
      0           ~a2 · ~q2   · · ·    ~an · ~q2
      ...                     . . .    ~an · ~qn−1
      0           · · ·       0        ~an · ~qn ]

then the columns of Q are orthonormal (so Q is orthogonal when m = n), R is invertible, and

A = QR
EXERCISE 3 Prove the QR-Decomposition Theorem.
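As a numerical sanity check on Theorem 2, the sketch below (assuming NumPy; `qr_by_gram_schmidt` is our name) builds Q and R exactly as the theorem prescribes and verifies A = QR on the matrix from Problem 9 at the end of this section:

```python
import numpy as np

def qr_by_gram_schmidt(A):
    """QR-decomposition per Theorem 2: Gram-Schmidt on the columns of A
    (then normalizing) gives Q; R holds the dot products a_j . q_i."""
    A = A.astype(float)
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].copy()
        for i in range(j):
            R[i, j] = A[:, j] @ Q[:, i]   # entry a_j . q_i of R
            v -= R[i, j] * Q[:, i]        # Gram-Schmidt subtraction
        R[j, j] = np.linalg.norm(v)       # equals a_j . q_j
        Q[:, j] = v / R[j, j]
    return Q, R

A = np.array([[1, 0, 0], [1, 1, 0], [1, 1, 1]])  # matrix from Problem 9
Q, R = qr_by_gram_schmidt(A)
```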
Section 9.3 Problems
1. Use the Gram-Schmidt procedure to transform B = {(1, 0, −1), (2, 2, 3), (1, 3, 1)} into an orthogonal basis for R3.
2. Use the Gram-Schmidt procedure to transform B = {(−3, 1, −1), (2, −2, 2), (2, 3, −1)} into an orthogonal basis for R3.
3. Use the Gram-Schmidt procedure to transform S = {1, x, x²} into an orthonormal basis for P2(R) under the inner product
⟨p(x), q(x)⟩ = p(−1)q(−1) + p(0)q(0) + p(1)q(1)
4. Use the Gram-Schmidt procedure to transform S = {1, x, x²} into an orthonormal basis for P2(R) under the inner product
⟨p(x), q(x)⟩ = p(0)q(0) + p(1)q(1) + p(2)q(2)
5. Find an orthogonal basis for the subspace S of R4 defined by S = Span{(−2, 2, 2, 2), (−1, 3, 1, 1), (1, 2, −2, 0)}
6. Let B = { [−1 1; 2 2], [1 0; −1 −1], [1 1; 0 0], [1 0; 0 1] }. Find an orthogonal basis for the subspace S = Span B of M2×2(R).
7. Suppose P2(R) has an inner product ⟨ , ⟩ which satisfies the following:
⟨1, 1⟩ = 2    ⟨1, x⟩ = 2    ⟨1, x²⟩ = −2
⟨x, x⟩ = 4    ⟨x, x²⟩ = −2    ⟨x², x²⟩ = 3
Given that B = {x, 2x² + x, 2} is a basis of P2(R), apply the Gram-Schmidt procedure to this basis to find an orthogonal basis of P2(R).
8. Let ~v1 = (1, −1, −1, 1) and ~v2 = (3, 1, 1, −1). Extend the set {~v1, ~v2} to an orthogonal basis for R4 by following the indicated steps.
(a) Use the methods of Chapter 4 to extend the
set {~v1,~v2} to be a basis {~v1,~v2, ~w3, ~w4} for
R4.
(b) Apply the Gram-Schmidt procedure to the
basis you found in (a) to get an orthogonal
basis {~v1,~v2,~v3,~v4} for R4.
9. (optional) Find a QR-Decomposition for
A =
1 0 0
1 1 0
1 1 1
10. (optional) Find a QR-Decomposition for
A =
3 −6
4 −8
0 1
9.4 General Projections
In Chapter 1, we saw how to calculate the projection of a vector onto a plane in R3.
Such projections have many important uses. So, our goal now is to extend projections
to general inner product spaces.
We will do this by mimicking what we did with projections in R3. We first recall
from Section 1.5 that given a vector ~x ∈ R3 and a plane P in R3 which passes through
the origin, we wanted to write ~x as the sum of a vector in P and a vector orthogonal
to every vector in P. Therefore, given a subspace W of an inner product space V and any vector ~v ∈ V, we want to find a vector projW(~v) in W and a vector perpW(~v) which is orthogonal to every vector in W such that

~v = projW(~v) + perpW(~v)

In the case of projecting ~v ∈ R3 onto a plane P, we knew that perpP(~v) was a normal vector of P. In the general case, we need to start by looking at the set of vectors which are orthogonal to every vector in W.
DEFINITION
Orthogonal Complement
Let W be a subspace of an inner product space V. The orthogonal complement W⊥ of W in V is defined by

W⊥ = {~v ∈ V | ⟨~w, ~v⟩ = 0 for all ~w ∈ W}
Instead of having to show a vector ~v ∈ V is orthogonal to every vector ~w ∈ W to prove that ~v ∈ W⊥, the following theorem tells us that we only need to show that ~v is orthogonal to every vector in a spanning set for W.

THEOREM 1 Let {~v1, . . . , ~vk} be a spanning set for a subspace W of an inner product space V, and let ~x ∈ V. We have that ~x ∈ W⊥ if and only if ⟨~x, ~vi⟩ = 0 for 1 ≤ i ≤ k.
Proof: If ~x ∈ W⊥, then by definition ⟨~w, ~x⟩ = 0 for all ~w ∈ W. Thus, since ~vi ∈ W, we have ⟨~x, ~vi⟩ = 0 for 1 ≤ i ≤ k.

On the other hand, let ~x ∈ V such that ⟨~x, ~vi⟩ = 0 for 1 ≤ i ≤ k. Let ~w ∈ W. Since {~v1, . . . , ~vk} spans W, there exist c1, . . . , ck ∈ R such that

~w = c1~v1 + · · · + ck~vk

Hence,

⟨~x, ~w⟩ = ⟨~x, c1~v1 + · · · + ck~vk⟩ = c1⟨~x, ~v1⟩ + · · · + ck⟨~x, ~vk⟩ = 0

Thus, ~x ∈ W⊥. �
EXAMPLE 1 Let W = Span{(1, 0, 0, 1), (1, 1, 0, 0)}. Find W⊥ in R4.

Solution: We want to find all ~v = (v1, v2, v3, v4) such that

0 = ⟨(v1, v2, v3, v4), (1, 0, 0, 1)⟩ = v1 + v4
0 = ⟨(v1, v2, v3, v4), (1, 1, 0, 0)⟩ = v1 + v2

Solving this homogeneous system we find that a vector equation for the solution space is ~v = s(0, 0, 1, 0) + t(−1, 1, 0, 1), s, t ∈ R. Thus,

W⊥ = Span{(0, 0, 1, 0), (−1, 1, 0, 1)}
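By Theorem 1, W⊥ is exactly the nullspace of the matrix whose rows are the spanning vectors of W. A numerical sketch for Example 1 (assuming NumPy; the SVD-based nullspace computation is a standard trick, not the book's method):

```python
import numpy as np

A = np.array([[1.0, 0.0, 0.0, 1.0],   # rows: the spanning vectors of W
              [1.0, 1.0, 0.0, 0.0]])

# Nullspace basis via the SVD: the right singular vectors belonging
# to zero singular values span Null(A) = W-perp.
_, s, Vt = np.linalg.svd(A)
rank = int(np.sum(s > 1e-12))
W_perp_basis = Vt[rank:]              # one basis vector per row
```

The basis computed this way is orthonormal, so it differs from the book's {(0, 0, 1, 0), (−1, 1, 0, 1)}, but it spans the same subspace.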
EXAMPLE 2 Let W = Span{x} in P2(R) with ⟨p(x), q(x)⟩ = ∫_0^1 p(x)q(x) dx. Find W⊥.

Solution: We want to find all p(x) = a + bx + cx² such that ⟨p, x⟩ = 0. This gives

0 = ∫_0^1 (ax + bx² + cx³) dx = (1/2)a + (1/3)b + (1/4)c

So, c = −2a − (4/3)b. Thus, every vector in the orthogonal complement has the form a + bx − (2a + (4/3)b)x² = a(1 − 2x²) + b(x − (4/3)x²). Hence, we get

W⊥ = Span{1 − 2x², 3x − 4x²}
The following theorem gives some important properties of the orthogonal comple-
ment which we will require later.
THEOREM 2 If W is a subspace of an inner product space V, then
(1) W⊥ is a subspace of V
(2) If dim V = n, then dim W⊥ = n − dim W
(3) If dim V = n, then (W⊥)⊥ = W
(4) W ∩ W⊥ = {~0}
(5) If dim V = n, {~v1, . . . , ~vk} is an orthogonal basis for W, and {~vk+1, . . . , ~vn} is an orthogonal basis for W⊥, then {~v1, . . . , ~vk, ~vk+1, . . . , ~vn} is an orthogonal basis for V
Proof: For (1), we apply the Subspace Test. By definition, W⊥ is a subset of V. Also, ~0 ∈ W⊥ since ⟨~0, ~w⟩ = 0 for all ~w ∈ W by Theorem 9.1.1. Let ~u, ~v ∈ W⊥ and t ∈ R. For any ~w ∈ W we have

⟨~u + ~v, ~w⟩ = ⟨~u, ~w⟩ + ⟨~v, ~w⟩ = 0 + 0 = 0

and

⟨t~u, ~w⟩ = t⟨~u, ~w⟩ = t(0) = 0

Thus, ~u + ~v ∈ W⊥ and t~u ∈ W⊥, so W⊥ is a subspace of V.
For (2), let {~v1, . . . , ~vk} be an orthonormal basis for W. We can extend this to a basis for V and then apply the Gram-Schmidt procedure to get an orthonormal basis {~v1, . . . , ~vn} for V. Then, for any ~x ∈ W⊥ we have ~x ∈ V so we can write

~x = ⟨~x, ~v1⟩~v1 + · · · + ⟨~x, ~vk⟩~vk + ⟨~x, ~vk+1⟩~vk+1 + · · · + ⟨~x, ~vn⟩~vn
   = ~0 + · · · + ~0 + ⟨~x, ~vk+1⟩~vk+1 + · · · + ⟨~x, ~vn⟩~vn

Hence, W⊥ = Span{~vk+1, . . . , ~vn}. Moreover, B = {~vk+1, . . . , ~vn} is linearly independent since it is orthonormal. Thus, B is a basis for W⊥ and hence dim W⊥ = n − k = n − dim W.
For (3), if ~w ∈ W, then ⟨~w, ~v⟩ = 0 for all ~v ∈ W⊥ by definition of W⊥. Hence, ~w ∈ (W⊥)⊥. So, W ⊆ (W⊥)⊥. Also, by (2), dim (W⊥)⊥ = n − dim W⊥ = n − (n − dim W) = dim W. Therefore, W = (W⊥)⊥.
For (4), if ~x ∈ W ∩ W⊥, then ⟨~x, ~x⟩ = 0 and hence ~x = ~0. Therefore, W ∩ W⊥ = {~0}.

For (5), we have that {~v1, . . . , ~vk, ~vk+1, . . . , ~vn} is an orthogonal set of non-zero vectors in V since every vector in W⊥ is orthogonal to every vector in W. Moreover, it contains n vectors by (2). Hence, it is a linearly independent set of n vectors. Thus, by Theorem 4.2.3, it is an orthogonal basis for V. �
Property (3) may seem obvious at first glance, which may make us wonder why the condition that dim V = n was necessary. This is because this property may not be true in an infinite dimensional inner product space! That is, we can have (W⊥)⊥ ≠ W. We demonstrate this with an example.
EXAMPLE 3 Let P be the inner product space of all real polynomials with inner product

⟨a0 + a1x + · · · + akx^k, b0 + b1x + · · · + blx^l⟩ = a0b0 + a1b1 + · · · + apbp

where p = min(k, l). Now, let U be the subspace

U = {g(x) ∈ P | g(1) = 0}

We claim that U⊥ = {0}. Let f(x) ∈ U⊥, say f(x) = a0 + a1x + · · · + anx^n. Note that ⟨f, x^{n+1}⟩ = 0. Define g(x) = f(x) − f(1)x^{n+1}. We get that g(1) = 0 and hence g ∈ U. Therefore, we have

0 = ⟨f, g⟩ = ⟨f, f − f(1)x^{n+1}⟩ = ⟨f, f⟩ − f(1)⟨f, x^{n+1}⟩ = ⟨f, f⟩

and hence f(x) = 0.

Of course (U⊥)⊥ = P, since ⟨0, p(x)⟩ = 0 for every p(x) ∈ P. Therefore, (U⊥)⊥ ≠ U.
REMARK
Notice that, as in the proof of property (3) above, we do have that U ⊂ (U⊥)⊥.
We now return to our purpose of looking at orthogonal complements: to define the projection of a vector ~v onto a subspace W of an inner product space V. We stated that we want to find projW(~v) and perpW(~v) such that ~v = projW(~v) + perpW(~v) with projW(~v) ∈ W and perpW(~v) ∈ W⊥.
Suppose that dim V = n, {~v1, . . . , ~vk} is an orthogonal basis for W, and {~vk+1, . . . , ~vn} is an orthogonal basis for W⊥. By Theorem 2, we have that {~v1, . . . , ~vk, ~vk+1, . . . , ~vn} is an orthogonal basis for V. So, for any ~v ∈ V we get

~v = (⟨~v, ~v1⟩/‖~v1‖²)~v1 + · · · + (⟨~v, ~vk⟩/‖~vk‖²)~vk + (⟨~v, ~vk+1⟩/‖~vk+1‖²)~vk+1 + · · · + (⟨~v, ~vn⟩/‖~vn‖²)~vn

This gives us what we want! Taking

projW(~v) = (⟨~v, ~v1⟩/‖~v1‖²)~v1 + · · · + (⟨~v, ~vk⟩/‖~vk‖²)~vk ∈ W

and

perpW(~v) = (⟨~v, ~vk+1⟩/‖~vk+1‖²)~vk+1 + · · · + (⟨~v, ~vn⟩/‖~vn‖²)~vn ∈ W⊥

we get ~v = projW(~v) + perpW(~v) with projW(~v) ∈ W and perpW(~v) ∈ W⊥.
At first glance it looks like these choices depend on the orthogonal basis for V being
used. However, it is easy to show that the projection and perpendicular are in fact
unique (independent of basis).
DEFINITION
Projection
Perpendicular
Suppose W is a k-dimensional subspace of an inner product space V and {~v1, . . . , ~vk} is an orthogonal basis for W. For any ~v ∈ V we define the projection of ~v onto W by

projW(~v) = (⟨~v, ~v1⟩/‖~v1‖²)~v1 + · · · + (⟨~v, ~vk⟩/‖~vk‖²)~vk

and the perpendicular of the projection by

perpW(~v) = ~v − projW(~v)
It is very important to notice that we require an orthogonal basis for W to calculate the projection. For this reason, these are often called orthogonal projections. To save ourselves some work, we have defined the perpendicular of the projection in such a way that we do not need an orthogonal basis (or any basis) for W⊥. To ensure this definition is valid, we need to verify that perpW(~v) ∈ W⊥.
THEOREM 3 If W is a k-dimensional subspace of an inner product space V, then for any ~v ∈ V we have

perpW(~v) = ~v − projW(~v) ∈ W⊥

Proof: Observe that

perpW(~v) = ~v − projW(~v) = ~v − (⟨~v, ~v1⟩/‖~v1‖²)~v1 − · · · − (⟨~v, ~vk⟩/‖~vk‖²)~vk

So, by the Gram-Schmidt Orthogonalization Theorem, {~v1, . . . , ~vk, perpW(~v)} is an orthogonal set. That is, perpW(~v) is orthogonal to ~v1, . . . , ~vk. Thus, perpW(~v) ∈ W⊥ by Theorem 1. �
Observe that this implies that projW(~v) and perpW(~v) are orthogonal for any ~v ∈ V as
we saw in Chapter 1. Additionally, these functions satisfy other properties which we
saw for projections in Chapter 1.
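In Rn with the dot product, the definitions of projW and perpW translate directly into code. A minimal sketch (assuming NumPy; the names `proj` and `perp` and the sample subspace below are ours, chosen for illustration):

```python
import numpy as np

def proj(v, orth_basis):
    """proj_W(v) for an ORTHOGONAL basis of W, per the definition above."""
    return sum((np.dot(v, u) / np.dot(u, u)) * u for u in orth_basis)

def perp(v, orth_basis):
    """perp_W(v) = v - proj_W(v)."""
    return v - proj(v, orth_basis)

# hypothetical example: W = Span{(1,1,1), (1,-2,1)} in R^3 (an orthogonal pair)
basis = [np.array([1.0, 1.0, 1.0]), np.array([1.0, -2.0, 1.0])]
v = np.array([3.0, 1.0, 2.0])
p, q = proj(v, basis), perp(v, basis)
```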
THEOREM 4 If W is a k-dimensional subspace of an inner product space V, then projW is a linear operator on V with kernel W⊥.

THEOREM 5 If W is a subspace of a finite dimensional inner product space V, then for any ~v ∈ V we have

projW⊥(~v) = perpW(~v)
EXAMPLE 4 Let W = Span{ [1 0; 0 1], [1 0; 0 −1] } be a subspace of M2×2(R). Find projW([1 −1; 2 3]).

Solution: Denote the vectors by ~v1 = [1 0; 0 1], ~v2 = [1 0; 0 −1], and ~x = [1 −1; 2 3]. Since {~v1, ~v2} is an orthogonal basis for W we get

projW(~x) = (⟨~x, ~v1⟩/‖~v1‖²)~v1 + (⟨~x, ~v2⟩/‖~v2‖²)~v2 = (4/2)[1 0; 0 1] + (−2/2)[1 0; 0 −1] = [1 0; 0 3]
EXAMPLE 5 Let B = {(1, 0, −1, 1), (1, 1, 0, 1), (−1, 1, 0, −1)} = {~b1, ~b2, ~b3} and let ~x = (4, 3, −2, 5). Find the projection of ~x onto the subspace S = Span B of R4.

Solution: To find the projection, we must have an orthogonal basis for S. Thus, our first step must be to perform the Gram-Schmidt procedure on B. Let ~w1 = ~b1. Next, we get

~b2 − (⟨~b2, ~w1⟩/‖~w1‖²)~w1 = (1, 1, 0, 1) − (2/3)(1, 0, −1, 1) = (1/3)(1, 3, 2, 1)

To simplify calculations we use ~w2 = (1, 3, 2, 1) instead. Then, we get

~b3 − (⟨~b3, ~w1⟩/‖~w1‖²)~w1 − (⟨~b3, ~w2⟩/‖~w2‖²)~w2 = (−1, 1, 0, −1) + (2/3)(1, 0, −1, 1) − (1/15)(1, 3, 2, 1) = (2/5)(−1, 2, −2, −1)

We take ~w3 = (−1, 2, −2, −1). Thus, the set {~w1, ~w2, ~w3} is an orthogonal basis for S. We can now determine the projection.

projS(~x) = (⟨~x, ~w1⟩/‖~w1‖²)~w1 + (⟨~x, ~w2⟩/‖~w2‖²)~w2 + (⟨~x, ~w3⟩/‖~w3‖²)~w3 = (11/3)~w1 + (14/15)~w2 + (1/10)~w3 = (9/2, 3, −2, 9/2)
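The two-stage computation of Example 5 (Gram-Schmidt first, then the projection) can be verified numerically (assuming NumPy):

```python
import numpy as np

b1, b2, b3 = (np.array([1.0, 0.0, -1.0, 1.0]),
              np.array([1.0, 1.0, 0.0, 1.0]),
              np.array([-1.0, 1.0, 0.0, -1.0]))
x = np.array([4.0, 3.0, -2.0, 5.0])

# Gram-Schmidt on {b1, b2, b3}, as in the solution above
w1 = b1
w2 = b2 - (b2 @ w1 / (w1 @ w1)) * w1
w3 = b3 - (b3 @ w1 / (w1 @ w1)) * w1 - (b3 @ w2 / (w2 @ w2)) * w2

# projection of x onto S using the orthogonal basis {w1, w2, w3}
proj_x = sum((x @ w / (w @ w)) * w for w in (w1, w2, w3))
```

Here `w2` is left unscaled as (1/3, 1, 2/3, 1/3) instead of the book's rescaled (1, 3, 2, 1); the projection is the same either way.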
Notice that if {~v1, . . . , ~vk} is an orthonormal basis for W, then the formula for calculating a projection simplifies to

projW(~v) = ⟨~v, ~v1⟩~v1 + · · · + ⟨~v, ~vk⟩~vk
EXAMPLE 6 Let W = Span{(0, 1, 0), (−4/5, 0, 3/5)} be a subspace of R3. Find projW((1, 1, 1)) and perpW((1, 1, 1)).

Solution: Observe that {(0, 1, 0), (−4/5, 0, 3/5)} is an orthonormal basis for W. Hence,

projW((1, 1, 1)) = ⟨(1, 1, 1), (0, 1, 0)⟩(0, 1, 0) + ⟨(1, 1, 1), (−4/5, 0, 3/5)⟩(−4/5, 0, 3/5)
                 = (0, 1, 0) + (−1/5)(−4/5, 0, 3/5)
                 = (4/25, 1, −3/25)

perpW((1, 1, 1)) = (1, 1, 1) − (4/25, 1, −3/25) = (21/25, 0, 28/25)
EXAMPLE 7 On C[−π, π] under the inner product ⟨f(x), g(x)⟩ = ∫_{−π}^{π} f(x)g(x) dx, the set B = {1/√(2π), (sin x)/√π, (cos x)/√π} is an orthonormal basis for the subspace S = Span B (see Example 9.2.18). Let f(x) = |x|. Find projS(f).

Solution: We have

⟨|x|, 1⟩ = ∫_{−π}^{π} |x| dx = π²

⟨|x|, sin x⟩ = ∫_{−π}^{π} |x| sin x dx = 0

⟨|x|, cos x⟩ = ∫_{−π}^{π} |x| cos x dx = −4

Hence,

projS(f) = ⟨|x|, 1/√(2π)⟩ (1/√(2π)) + ⟨|x|, (sin x)/√π⟩ (sin x)/√π + ⟨|x|, (cos x)/√π⟩ (cos x)/√π
         = (1/(2π))⟨|x|, 1⟩ 1 + (1/π)⟨|x|, sin x⟩ sin x + (1/π)⟨|x|, cos x⟩ cos x
         = π/2 − (4/π) cos x

This is called the first degree Fourier polynomial of f.
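The three inner product integrals above can be approximated numerically (assuming NumPy; the midpoint-rule quadrature here is our own crude substitute for exact integration):

```python
import numpy as np

# midpoint-rule quadrature on [-pi, pi]
N = 400000
dt = 2 * np.pi / N
t = -np.pi + (np.arange(N) + 0.5) * dt
ip = lambda F, G: np.sum(F(t) * G(t)) * dt

c1 = ip(np.abs, lambda s: np.ones_like(s))  # <|x|, 1>,     exactly pi^2
c2 = ip(np.abs, np.sin)                     # <|x|, sin x>, exactly 0
c3 = ip(np.abs, np.cos)                     # <|x|, cos x>, exactly -4
```

Plugging these values into the formula for projS(f) recovers the first degree Fourier polynomial computed above.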
Section 9.4 Problems
1. Find a basis for the orthogonal complement of
each subspace.
(a) S1 = Span{(3, 2, 1), (1, −2, 3)} in R3.
(b) S2 = Span{(1, −2, 0, 1)} in R4.
(c) S3 = {(x1, 0, x1) | x1 ∈ R} in R3.
(d) S4 = Span{ [1 0; 1 0], [2 −1; 1 3] } in M2×2(R).
(e) S5 = Span{ [2 1; 1 1], [−1 1; 3 1] } in M2×2(R).
(f) S6 = { [a b; c d] | a + b + d = 0 } in M2×2(R).
2. Find an orthogonal basis for the orthogonal
complement of each subspace of P2(R) under
the inner product
〈p(x), q(x)〉 = p(−1)q(−1)+p(0)q(0)+p(1)q(1)
(a) T1 = Span{x² + 1}
(b) T2 = Span{x²}
3. Let ~w1 = (1, 2, 0, 1), ~w2 = (2, 1, 1, 2), ~w3 = (2, 1, 3, 2), and let S = Span{~w1, ~w2, ~w3}.
(a) Find an orthonormal basis for S.
(b) Let ~y = (0, 0, 0, 12). Determine projS(~y).
(c) Let ~z = (1, 0, 0, 2). Determine perpS(~z).
4. In M2×2(R), consider the subspace

S = Span{ [1 0; −1 1], [1 1; 1 1], [2 0; 1 1] }

(a) Find an orthogonal basis for S.
(b) Let A = [4 3; −2 5]. Calculate projS(A) and perpS(A).
(c) Let B = [0 2; 1 1]. Calculate projS(B) and perpS(B).
(d) Without doing any further work, deter-
mine whether A and/or B are in S. Explain.
5. On P2(R) define the inner product
⟨p, q⟩ = p(−1)q(−1) + p(0)q(0) + p(1)q(1)
and let S = Span{1, x − x²}.
(a) Find an orthogonal basis for S.
(b) Determine projS(1 + x + x²) and perpS(1 + x + x²).
(c) Determine projS(2 + 2x + 2x²) and perpS(2 + 2x + 2x²).
6. Suppose W is a k-dimensional subspace of an
inner product space V. Prove that projW is a
linear operator on V with kernelW⊥.
7. Let W be a finite-dimensional subspace of an
inner product space V and let ~v ∈ V. Prove that
projW(~v) and perpW(~v) are unique.
8. In P3(R) using the inner product
⟨p(x), q(x)⟩ = p(−1)q(−1) + p(0)q(0) + p(1)q(1) + p(2)q(2)
the following four vectors are orthogonal:
p1(x) = 9 − 13x − 15x² + 10x³
p2(x) = 1 + x − x²
p3(x) = 1 − 2x
p4(x) = 1
Let S = Span{p1(x), p2(x), p3(x)}. Determine the projection of −1 + x + x² − x³ onto S. [HINT: If you do an excessive number of calculations, you have missed the point of the question.]
9.5 The Fundamental Theorem
Let W be a subspace of a finite dimensional inner product space V. In the last section we saw that any ~v ∈ V can be written as ~v = ~x + ~y where ~x ∈ W and ~y ∈ W⊥. We now invent some notation for this.
DEFINITION
Direct Sum
Let U and W be subspaces of a vector space V such that U ∩ W = {~0}. The direct sum of U and W is

U ⊕ W = {~u + ~w ∈ V | ~u ∈ U, ~w ∈ W}
EXAMPLE 1 Let U = Span{(1, 0, 1)} and W = Span{(1, 1, 1)}. What is U ⊕ W?

Solution: Since U ∩ W = {~0} we have

U ⊕ W = {~u + ~w ∈ V | ~u ∈ U, ~w ∈ W} = { s(1, 0, 1) + t(1, 1, 1) | s, t ∈ R } = Span{(1, 0, 1), (1, 1, 1)}
THEOREM 1 If U andW are subspaces of a vector space V such that U ∩W = {~0}, then U ⊕W is
a subspace of V. Moreover, if {~v1, . . . ,~vk} is a basis for U and {~w1, . . . , ~wℓ} is a basis
forW, then {~v1, . . . ,~vk, ~w1, . . . , ~wℓ} is a basis for U ⊕W.
COROLLARY 2 If U and W are subspaces of a vector space V such that U ∩ W = {~0} and ~v ∈ U ⊕ W, then there exist unique ~u ∈ U and ~w ∈ W such that ~v = ~u + ~w.
EXAMPLE 2 Let U = Span{x2 + 1, x4} andW = Span{x3 − x, x3 + x2 + 1}. Find a basis for U ⊕W.
Solution: Since U ∩W = {~0} a basis for U ⊕W is {x2 + 1, x4, x3 − x, x3 + x2 + 1}.
Theorem 9.4.2 tells us that for any subspace W of an n-dimensional inner product space V we have W ∩ W⊥ = {~0} and dim W⊥ = n − dim W. Combining this with Theorem 1 we get the following result.
THEOREM 3 If V is a finite dimensional inner product space andW is a subspace of V, then
W ⊕W⊥ = V
We now look at the related result for matrices. We start by looking at an example.
EXAMPLE 3 Find a basis for each of the four fundamental subspaces of A =
1 2 0 −1
3 6 1 −1
−2 −4 2 6
and
then determine the orthogonal complement of the row space of A and the orthogonal
complement of the column space of A.
Solution: To find a basis for each of the four fundamental subspaces, we use the
method derived in Section 7.1. Row reducing A to reduced row echelon form gives
R =
1 2 0 −1
0 0 1 2
0 0 0 0
Hence, a basis for Row(A) is {(1, 2, 0, −1), (0, 0, 1, 2)}, a basis for Null(A) is {(−2, 1, 0, 0), (1, 0, −2, 1)}, and a basis for Col(A) is {(1, 3, −2), (0, 1, 2)}.

Row reducing AT gives

1 0 −8
0 1 2
0 0 0
0 0 0

Thus, a basis for the left nullspace is {(8, −2, 1)}.
Observe that the basis vectors for Null(A) are orthogonal to the basis vectors for
Row(A). Additionally, we know that dim(Row(A))⊥ = 4 − 2 = 2 by Theorem 9.4.2
(2). Therefore, since dim(Null(A)) = 2 we have that (Row(A))⊥ = Null(A).
Similarly, we see that (Col(A))⊥ = Null(AT ).
The result of this example seems pretty amazing. Moreover, applying parts (4) and
(5) of Theorem 9.4.2 and our notation for direct sums, we get that
Rn = Row(A) ⊕ Null(A) and Rm = Col(A) ⊕ Null(AT )
This tells us that every vector in R4 is the sum of a vector in the row space of A and
a vector in the nullspace of A.
The generalization of this result to any m × n matrix is extremely important.
THEOREM 4 (The Fundamental Theorem of Linear Algebra)
If A is an m × n matrix, then Col(A)⊥ = Null(AT ) and Row(A)⊥ = Null(A). In
particular,
Rn = Row(A) ⊕ Null(A) and Rm = Col(A) ⊕ Null(AT )
Proof: Let A have rows ~v1^T, . . . , ~vm^T.

If ~x ∈ Row(A)⊥, then ~x is orthogonal to each column of AT. Hence, ~vi · ~x = 0 for 1 ≤ i ≤ m. Thus, we get

A~x = (~v1^T ~x, . . . , ~vm^T ~x) = (~v1 · ~x, . . . , ~vm · ~x) = ~0
Hence, ~x ∈ Null(A) and so Row(A)⊥ ⊆ Null(A).
On the other hand, let ~x ∈ Null(A), then A~x = ~0 so ~vi · ~x = 0 for 1 ≤ i ≤ m. Now, pick
any ~b ∈ Row(A). Then ~b = c1~v1 + · · · + cm~vm for some c1, . . . , cm ∈ R. Therefore,
~b · ~x = (c1~v1 + · · · + cm~vm) · ~x = c1(~v1 · ~x) + · · · + cm(~vm · ~x) = 0
Hence, ~x ∈ Row(A)⊥ and so Null(A) ⊆ Row(A)⊥. Thus, Row(A)⊥ = Null(A).
Applying what we proved above to AT we get
(Col(A))⊥ = (Row(AT ))⊥ = Null(AT )
�
Observe that the Fundamental Theorem of Linear Algebra tells us more than the relationship between the four fundamental subspaces. For example, if A is an m × n matrix, then we know that the rank and nullity of the linear mapping L(~x) = A~x are rank L = rank A = dim Row(A) and nullity L = dim Null(A). Since Rn = Row(A) ⊕ Null(A), we get by Theorem 9.4.2 (2) that

n = rank L + nullity L

That is, the Fundamental Theorem of Linear Algebra implies the Rank-Nullity Theorem.
It can also be shown that the Fundamental Theorem of Linear Algebra implies a lot
of our results about solving systems of linear equations. We will also find it useful in
a couple of proofs in future sections.
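The Fundamental Theorem can also be checked numerically on the matrix of Example 3 (assuming NumPy; `null_space` is our helper, built on the SVD):

```python
import numpy as np

def null_space(M, tol=1e-12):
    """Rows of the result form an orthonormal basis for Null(M)."""
    _, s, Vt = np.linalg.svd(M)
    rank = int(np.sum(s > tol))
    return Vt[rank:]

A = np.array([[1.0, 2.0, 0.0, -1.0],
              [3.0, 6.0, 1.0, -1.0],
              [-2.0, -4.0, 2.0, 6.0]])   # the matrix from Example 3

N_A = null_space(A)      # Null(A)   = Row(A)-perp
N_AT = null_space(A.T)   # Null(A^T) = Col(A)-perp
```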
Section 9.5 Problems
1. Let A =
1 1 3 1
2 2 6 2
−1 0 −2 −3
3 1 7 7
. Find a basis for
Row(A)⊥.
2. Prove that if U andW are subspaces of a vector
space V such that U∩W = {~0}, then U⊕W is a
subspace of V. Moreover, if {~v1, . . . ,~vk} is a ba-
sis for U and {~w1, · · · , ~wℓ} is a basis forW, then
{~v1, . . . ,~vk, ~w1, . . . , ~wℓ} is a basis for U ⊕W.
9.6 The Method of Least Squares
In the sciences one often tries to find a correlation between two quantities by collect-
ing data from repeated experimentation. Say, for example, a scientist is comparing
quantities x and y which are known to satisfy a quadratic relation y = a0 + a1x + a2x².
The scientist would like to find appropriate values for a0, a1, and a2.
To do this, the scientist performs some experiments involving the two quantities.
Every time they do the experiment they will get quantities yi and xi. Putting this into their equation, they get a linear equation

yi = a0 + a1xi + a2xi²

in the three unknowns a0, a1, and a2. To get as accurate a result as possible, the scientist will perform the experiment many times. We thus end up with a system of linear equations

y1 = a0 + a1x1 + a2x1²
...
ym = a0 + a1xm + a2xm²
which has more equations than unknowns. Such a system of equations is called an overdetermined system. Also notice that, due to experimentation error, the system is very likely to be inconsistent. So, the scientist needs to find the values of a0, a1, and a2 which best approximate the data they collected.
To solve this problem, we rephrase it in terms of linear algebra. Let A be an m × n
matrix with m > n and let the system A~x = ~b be inconsistent. We want to find a
vector ~x that minimizes the distance between A~x and ~b. Hence, we need to minimize
‖~b − A~x‖. The following theorem tells us how to do this.
THEOREM 1 (Approximation Theorem)
Let W be a finite dimensional subspace of an inner product space V. If ~v ∈ V, then the vector closest to ~v in W is projW(~v). That is,

‖~v − projW(~v)‖ < ‖~v − ~w‖

for all ~w ∈ W, ~w ≠ projW(~v).
Proof: Consider ~v − ~w = (~v − projW(~v)) + (projW(~v) − ~w). Now, observe that {~v − projW(~v), projW(~v) − ~w} is an orthogonal set since ~v − projW(~v) = perpW(~v) ∈ W⊥ and projW(~v) − ~w ∈ W. Therefore, by Theorem 9.2.2, we get

‖~v − ~w‖² = ‖(~v − projW(~v)) + (projW(~v) − ~w)‖²
          = ‖~v − projW(~v)‖² + ‖projW(~v) − ~w‖²
          > ‖~v − projW(~v)‖²

since ‖projW(~v) − ~w‖² > 0 if ~w ≠ projW(~v). The result now follows. �
Notice that A~x is in the column space of A. Thus, the Approximation Theorem tells
us that we can minimize ‖~b − A~x ‖ by finding the projection of ~b onto the column
space of A. Therefore, if we solve the consistent system
A~x = projCol A(~b)
we will find the desired vector ~x. This might seem quite simple, but we can make it
even easier. Subtracting both sides of this equation from ~b we get
~b − A~x = ~b − projCol A(~b) = perpCol A(~b)
Thus, ~b − A~x is in the orthogonal complement of the column space of A which, by
the Fundamental Theorem of Linear Algebra, means ~b−A~x is in the nullspace of AT .
Hence
AT (~b − A~x) = ~0
or equivalently
AT A~x = AT~b
This is called the normal system and the individual equations are called the normal
equations. This system will be consistent by construction. However, it need not have
a unique solution. If it does have infinitely many solutions, then each of the solutions
will minimize ‖~b − A~x‖.
EXAMPLE 1 Determine the vector ~x that minimizes ‖~b − A~x‖ for the system
3x1 − x2 = 4
x1 + 2x2 = 0
2x1 + x2 = 1
Solution: We have A = [3 −1; 1 2; 2 1] (rows (3, −1), (1, 2), (2, 1)) and ~b = (4, 0, 1), so

AT A = [14 1; 1 6]    and    AT~b = (14, −3)

Since AT A is invertible, we find that

~x = (AT A)−1 AT~b = (1/83)[6 −1; −1 14](14, −3) = (1/83)(87, −56)

Thus, x1 = 87/83 and x2 = −56/83.

Note that these give 3x1 − x2 ≈ 3.82, x1 + 2x2 ≈ −0.30, and 2x1 + x2 ≈ 1.42, so we have found a fairly good approximation.
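The arithmetic of Example 1 is easy to confirm (assuming NumPy); `np.linalg.lstsq` minimizes the same quantity directly:

```python
import numpy as np

A = np.array([[3.0, -1.0],
              [1.0, 2.0],
              [2.0, 1.0]])
b = np.array([4.0, 0.0, 1.0])

# solve the normal system  A^T A x = A^T b ...
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# ... and compare with NumPy's built-in least-squares solver
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
```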
EXAMPLE 2 Determine the vector ~x that minimizes ‖~b − A~x‖ for the system
2x1 + x2 = −5
−2x1 + x2 = 8
2x1 + 3x2 = 1
Solution: We have A = [2 1; −2 1; 2 3] (rows (2, 1), (−2, 1), (2, 3)), so

AT A = [12 6; 6 11],    AT~b = (−24, 6)

Since AT A is invertible, we find that

~x = (AT A)−1 AT~b = (1/96)[11 −6; −6 12](−24, 6) = (1/96)(−300, 216)

Thus, x1 = −25/8 and x2 = 9/4.
This method of finding an approximate solution is called the method of least squares.
It is called this because, writing ~v = ~b − A~x with components v1, . . . , vm, we are minimizing

‖~b − A~x‖ = √(v1² + · · · + vm²)

which is equivalent to minimizing v1² + · · · + vm².
We now return to our problem of finding the curve of best fit for a set of data points.
Say in an experiment we get a set of data points (x1, y1), . . . , (xm, ym) and we want to
find the values of a0, . . . , an such that p(x) = a0 + a1x + · · · + anxn is the polynomial
of best fit. That is, we want the values of a0, . . . , an such that the values of yi are
approximated as well as possible by p(xi).
Let ~y = (y1, . . . , ym) and define p(~x) = (p(x1), . . . , p(xm)). To make this look like the method of least squares we write

p(~x) = (p(x1), . . . , p(xm)) = (a0 + a1x1 + · · · + anx1^n, . . . , a0 + a1xm + · · · + anxm^n) = X~a

where X is the m × (n + 1) matrix with i-th row (1, xi, . . . , xi^n) and ~a = (a0, . . . , an). Thus, we are trying to minimize ‖X~a − ~y‖ and we can use the method of least squares above.
This can be summarized in the following theorem.
THEOREM 2 Let m data points (x1, y1), . . . , (xm, ym) be given and write
~y = (y1, . . . , ym),    X = [ 1 x1 x1² · · · x1^n
                                1 x2 x2² · · · x2^n
                                ...
                                1 xm xm² · · · xm^n ]

If ~a = (a0, . . . , an) is any solution to the normal system

XT X~a = XT~y

then the polynomial

p(x) = a0 + a1x + · · · + anx^n

is the best fitting polynomial of degree n for the given data. Moreover, if at least n + 1 of the numbers x1, . . . , xm are distinct, then the matrix XT X is invertible and thus ~a is unique with

~a = (XT X)−1 XT~y
Proof: We proved the first part above, so we just need to prove the second part.
Suppose that n + 1 of the xi are distinct and consider a linear combination of the columns of X:

c0(1, 1, . . . , 1) + c1(x1, x2, . . . , xm) + · · · + cn(x1^n, x2^n, . . . , xm^n) = (0, 0, . . . , 0)

Let q(x) = c0 + c1x + · · · + cnx^n. Comparing entries, we see that x1, x2, . . . , xm are all roots of q(x). Thus, q(x) is a polynomial of degree at most n with at least n + 1
distinct roots. Consequently, by the Fundamental Theorem of Algebra, q(x) must be
the zero polynomial, and so ci = 0 for 0 ≤ i ≤ n. Thus, the columns of X are linearly
independent.
To show this implies that XT X is invertible we consider XT X~v = ~0. We have
‖X~v ‖2 = (X~v)T X~v = ~vT XT X~v = ~vT~0 = 0
Hence, X~v = ~0, so ~v = ~0 since the columns of X are linearly independent. Thus, XT X
is invertible by the Invertible Matrix Theorem. �
EXAMPLE 3 Find a0 and a1 to obtain the best fitting equation of the form y = a0 + a1x for the given data.

x | 1 3 4 6 7
y | 1 2 3 4 5

Solution: We let X = [1 1; 1 3; 1 4; 1 6; 1 7] (rows (1, xi)) and ~y = (1, 2, 3, 4, 5). We then get

XT X = [5 21; 21 111],    XT~y = (15, 78)

By Theorem 2, we get that ~a = (a0, a1) is unique and satisfies

~a = (XT X)−1 XT~y = (1/114)[111 −21; −21 5](15, 78) = (1/38)(9, 25)

So, the line of best fit is y = 9/38 + (25/38)x.
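Example 3 in code (assuming NumPy), using the design matrix X of Theorem 2:

```python
import numpy as np

x = np.array([1.0, 3.0, 4.0, 6.0, 7.0])
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

X = np.column_stack([np.ones_like(x), x])   # rows (1, x_i)
a = np.linalg.solve(X.T @ X, X.T @ y)       # normal system of Theorem 2
```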
EXAMPLE 4 Find a0, a1, a2 to obtain the best fitting equation of the form y = a0 + a1x + a2x² for the given data.

x | −3 −1 0 1 3
y | 3 1 1 2 4

Solution: We have ~y = (3, 1, 1, 2, 4) and X = [1 −3 9; 1 −1 1; 1 0 0; 1 1 1; 1 3 9] (rows (1, xi, xi²)). Hence,

XT X = [5 0 20; 0 20 0; 20 0 164],    XT~y = (11, 4, 66)

By Theorem 2, we get that ~a = (a0, a1, a2) is unique and satisfies

~a = (XT X)−1 XT~y = (1/210)(242, 42, 55)

So the best fitting parabola is

p(x) = 121/105 + (1/5)x + (11/42)x²
Section 9.6 Problems
1. Find the vector ~x that minimizes ‖A~x − ~b‖ for
each of the following systems.
(a) x1 + 2x2 = 1
x1 + x2 = 2
x1 − x2 = 5
(b) x1 + 5x2 = 1
−2x1 − 7x2 = 1
x1 + 2x2 = 0
(c) 2x1 + x2 = 4
2x1 − x2 = −1
3x1 + 2x2 = 8
(d) x1 + 2x2 = 2
x1 + 2x2 = 3
x1 + 3x2 = 2
x1 + 3x2 = 3
(e) x1 − x2 = 2
2x1 − x2 + x3 = −2
2x1 + 2x2 + x3 = 3
x1 − x2 = −2
(f) 2x1 − 2x2 = 1
−x1 + x2 = 2
3x1 + x2 = 1
2x1 − x2 = 2
2. Let ~x = (1, 2, 1), ~y = (3, 1, −2), ~z = (1, −1, 1), and consider the plane P in R3 with scalar equation 2x1 + x2 − x3 = 0.
(a) Find the vector in P that is closest to ~x.
(b) Find the vector in P that is closest to ~y.
(c) Find the vector in P that is closest to ~z.
(d) Without doing any further calculations,
which vector ~x, ~y, or ~z is furthest from P.
Explain.
3. Find a0 and a1 to obtain the best fitting equation
of the form y = a0 + a1x for the given data.
(a) x | −1 0 1
    y | −3 2 2
(b) x | −1 0 1 2
    y | −3 −1 0 1
4. Find a0, a1, a2 to obtain the best fitting equation
of the form y = a0 + a1x + a2x2 for the given
data.
(a) x | −2 −1 1 2
    y | 0 −2 0 1
(b) x | −2 −1 0 1 2
    y | 0 1 −1 3 1
5. Find a1 and a2 to obtain the best fitting equation of the form y = a1x + a2x² for the given data.
x | −1 0 1
y | 4 1 1
6. (optional)
(a) Let A ∈ Mm×n(R) with rank(A) = n. Let
A = QR be a QR-Decomposition (see page
260) of A. Show that the normal equa-
tions AT A~x = AT~b can be rewritten as
R~x = QT~b.
(b) Let A = [3 −6; 4 −8; 0 1] and ~b = (−1, 7, 2). Solve the normal system AT A~x = AT~b by using the QR-Decomposition found in Problem 9.3.10.
Chapter 10
Applications of Orthogonal Matrices
10.1 Orthogonal Similarity
In Chapter 6, we saw that two matrices A and B are similar if there exists an invertible
matrix P such that P−1AP = B. Moreover, we saw that this corresponds to calculating
the matrix of a linear operator with respect to a basis. In particular, if
L : Rn → Rn is a linear operator and B = {~v1, . . . , ~vn} is a basis for Rn, then taking P = [~v1 · · · ~vn] we get

[L]B = P−1[L]P
We then looked for a basis B of Rn such that [L]B was diagonal. We found that such
a basis is made up of the eigenvectors of [L].
In the last chapter, we saw that orthonormal bases have some very nice properties. For
example, if the columns of P form an orthonormal basis for Rn, then P is orthogonal
and it is very easy to find P−1. In particular, we have P−1 = PT . Hence, we now turn
our attention to trying to find an orthonormal basis B of eigenvectors such that [L]B
is diagonal. Since the corresponding matrix P will be orthogonal, we will get
[L]B = PT [L]P
DEFINITION
Orthogonally Similar
Two matrices A and B are said to be orthogonally similar if there exists an orthogo-
nal matrix P such that
PT AP = B
Since PT = P−1 we have that if A and B are orthogonally similar, then they are similar.
Therefore, all the properties of similar matrices still hold. In particular, if A and B
are orthogonally similar, then rank A = rank B, tr A = tr B, det A = det B, and A and
B have the same eigenvalues.
We saw in Chapter 6 that not every square matrix A has a set of eigenvectors which
forms a basis for Rn. Since we are now going to require the additional condition that
the basis is orthonormal, we expect that even fewer matrices will have this property.
So, we first just look for matrices A such that there exists an orthogonal matrix P for
which PT AP is upper triangular.
280
Section 10.1 Orthogonal Similarity 281
THEOREM 1 (Triangularization Theorem)
If A is an n × n matrix with real eigenvalues, then A is orthogonally similar to an
upper triangular matrix T .
Proof: We prove the result by induction on n. If n = 1, then A is upper triangular.
Therefore, we can take P to be the orthogonal matrix P = [1] and the result follows.
Now assume the result holds for all (n − 1) × (n − 1) matrices and consider an n × n
matrix A with all real eigenvalues.
Let ~v1 be a unit eigenvector of one of A’s eigenvalues λ1. Extend the set {~v1} to a
basis for Rn and apply the Gram-Schmidt procedure to produce an orthonormal basis
{~v1, ~w2, . . . , ~wn} for Rn. Then, the matrix P1 = [ ~v1 ~w2 · · · ~wn ] is orthogonal and

          [ ~vT1 ]
PT1 AP1 = [ ~wT2 ] A [ ~v1 ~w2 · · · ~wn ]
          [  ... ]
          [ ~wTn ]

          [ ~v1 · A~v1   ~v1 · A~w2   · · ·   ~v1 · A~wn ]
        = [ ~w2 · A~v1   ~w2 · A~w2   · · ·   ~w2 · A~wn ]
          [     ...          ...       . . .      ...    ]
          [ ~wn · A~v1   ~wn · A~w2   · · ·   ~wn · A~wn ]
Consider the entries in the first column. Since A~v1 = λ1~v1 we get ~v1 · A~v1 = λ1, and
~wi · A~v1 = ~wi · λ1~v1 = λ1(~wi · ~v1) = 0
for 2 ≤ i ≤ n. Hence
PT1 AP1 = [ λ1   ~bT ]
          [ ~0    A1 ]
where A1 is an (n − 1) × (n − 1) matrix, ~b is a vector in Rn−1, and ~0 is the zero vector
in Rn−1. Because A is similar to [ λ1 ~bT ; ~0 A1 ], we get that all of the eigenvalues of A1 are
eigenvalues of A, and hence all the eigenvalues of A1 are real. So, by the inductive
hypothesis, there exists an (n−1)×(n−1) orthogonal matrix Q such that QT A1Q = T1
is upper triangular.
Let
P2 = [ 1   ~0T ]
     [ ~0   Q  ]
Then, the columns of P2 are orthonormal since the columns of
Q are orthonormal. Hence, P2 is an orthogonal matrix. Consequently, the matrix
P = P1P2 is also orthogonal by Theorem 9.2.7. By block multiplication we get
PT AP = (P1P2)T A(P1P2) = PT2 PT1 AP1P2

      = [ 1   ~0T ] [ λ1   ~bT ] [ 1   ~0T ]
        [ ~0   QT ] [ ~0    A1 ] [ ~0   Q  ]

      = [ λ1   ~bT Q   ]
        [ ~0    QT A1Q ]

      = [ λ1   ~bT Q ]
        [ ~0    T1   ]
is upper triangular. Thus, the result follows by induction. □
If A is orthogonally similar to an upper triangular matrix T , then A and T must share
the same eigenvalues. Thus, since T is upper triangular, the eigenvalues must appear
along the main diagonal of T .
Notice that the proof of the theorem gives us a method for finding an orthogonal
matrix P so that PT AP = T is upper triangular. However, since the proof is by
induction, this leads to a recursive algorithm. We demonstrate this with an example.
EXAMPLE 1 Let
A = [ 2  1   1 ]
    [ 0  3  −2 ]
    [ 0  2  −1 ]
Find an orthogonal matrix P such that PT AP is upper triangular.
Solution: As in the proof, we first need to find one of the real eigenvalues of A. By
inspection, we see that 2 is an eigenvalue of A with unit eigenvector ~v1 = (1, 0, 0).
Next, we extend {~v1} to an orthonormal basis for R3. This is very easy in this case as
we can take ~w2 = (0, 1, 0) and ~w3 = (0, 0, 1). Thus, we get P1 = I and

          [ 2  1   1 ]   [ 2   ~bT ]
PT1 AP1 = [ 0  3  −2 ] = [ ~0   A1 ]
          [ 0  2  −1 ]

where A1 = [ 3 −2 ; 2 −1 ] and ~bT = [ 1 1 ].
We now need to apply the inductive hypothesis on A1. To do this, we need to find
a 2 × 2 orthogonal matrix Q such that QT A1Q = T1 is upper triangular. We start
by finding a real eigenvalue of A1. Consider

A1 − λI = [ 3 − λ     −2   ]
          [   2     −1 − λ ]

We have C(λ) = det(A1 − λI) = (λ − 1)^2, which implies that the only eigenvalue of
A1 is λ = 1. We find that

A1 − I = [ 2  −2 ]
         [ 2  −2 ]

so a corresponding unit eigenvector is (1/√2, 1/√2). We extend {(1/√2, 1/√2)} to the
orthonormal basis {(1/√2, 1/√2), (−1/√2, 1/√2)} for R2. Therefore,

Q = (1/√2) [ 1  −1 ]
           [ 1   1 ]

is an orthogonal matrix and

QT A1Q = (1/√2) [  1  1 ] [ 3  −2 ] (1/√2) [ 1  −1 ]   [ 1  −4 ]
                [ −1  1 ] [ 2  −1 ]        [ 1   1 ] = [ 0   1 ] = T1
Now that we have Q and T1, the proof of the theorem tells us to choose

P2 = [ 1   ~0T ]   [ 1    0      0   ]
     [ ~0   Q  ] = [ 0  1/√2  −1/√2 ]
                   [ 0  1/√2   1/√2 ]

and P = P1P2 = P2. This gives

PT AP = [ 2   ~bT Q ]   [ 2  √2   0 ]
        [ ~0    T1  ] = [ 0   1  −4 ]
                        [ 0   0   1 ]

as required.
Since the procedure is recursive, it would be very inefficient for larger matrices.
There is a much better algorithm for doing this, although we will not cover it in
this book. The purpose of the example above is not to teach one how to find such an
orthogonal matrix, but to help one understand the proof.
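The recursion in the proof can nevertheless be turned into a short program. The following is a minimal sketch, assuming NumPy is available; `triangularize` is our own name for the helper, and this is the inefficient recursion of the proof, not the better algorithm alluded to above.

```python
import numpy as np

def triangularize(A):
    """Orthogonal P with P.T @ A @ P upper triangular, built by the
    recursion in the proof of the Triangularization Theorem.
    Assumes every eigenvalue of A is real."""
    n = A.shape[0]
    if n == 1:
        return np.eye(1)
    # A unit eigenvector v1 for a (real) eigenvalue of A.
    evals, evecs = np.linalg.eig(A)
    i = int(np.argmin(np.abs(evals.imag)))
    v1 = np.real(evecs[:, i]).copy()
    v1 /= np.linalg.norm(v1)
    # Extend {v1} to an orthonormal basis; QR plays the role of the
    # Gram-Schmidt procedure, and the first column of P1 is +/- v1.
    P1, _ = np.linalg.qr(np.column_stack([v1, np.eye(n)]))
    B = P1.T @ A @ P1              # block form [[lam1, b^T], [0, A1]]
    Q = triangularize(B[1:, 1:])   # recurse on the (n-1) x (n-1) block A1
    P2 = np.eye(n)
    P2[1:, 1:] = Q
    return P1 @ P2                 # product of orthogonal matrices
```

On the matrix of Example 1 this produces an orthogonal P with PT AP upper triangular, though not necessarily the same P found by hand, since computed eigenvectors can differ in sign and order.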
Section 10.1 Problems
1. By following the steps in the proof of the Triangularization Theorem, find an orthogonal matrix P and upper triangular matrix T such that PT AP = T for each of the following matrices.
(a) A = [ 2 4 ; −4 −6 ]
(b) A = [ 1 4 ; 7 4 ]
(c) A = [ −3 4 5 ; 0 1 4 ; 0 7 4 ]
(d) A = [ 7 −6 5 ; 0 1 4 ; 0 6 3 ]
(e) A = [ 1 0 −1 ; 2 2 1 ; 4 0 5 ]
(f) A = [ 0 1 1 ; 2 3 6 ; −1 −1 −2 ]
(g) A = [ 2 0 0 1 ; 0 2 0 −1 ; −1 1 2 0 ; 0 0 0 2 ]
2. Prove that if A is orthogonally similar to B and B is orthogonally similar to C, then A is orthogonally similar to C.
3. Let A and B be orthogonally similar matrices.
(a) Prove that if A and B are invertible, then
A−1 and B−1 are orthogonally similar.
(b) Prove that A2 and B2 are orthogonally sim-
ilar.
(c) Prove that if AT = A, then BT = B.
(d) Prove that A and B are orthogonally simi-
lar to the same upper triangular matrix T .
10.2 Orthogonal Diagonalization
We now know that for any linear operator L on Rn with real eigenvalues we can find
an orthonormal basis B such that the B-matrix of L is upper triangular. However, as
we saw in Chapter 6, having a diagonal matrix would be even better. So, we now
look at which matrices are orthogonally similar to a diagonal matrix.
DEFINITION
Orthogonally Diagonalizable
An n × n matrix A is said to be orthogonally diagonalizable if there exists an
orthogonal matrix P and diagonal matrix D such that
PT AP = D
that is, if A is orthogonally similar to a diagonal matrix.
Our first goal is to determine all matrices which are orthogonally diagonalizable. But,
how do we even start looking for non-trivial matrices which have an orthonormal
basis of eigenvectors for Rn?
A clever way to approach this problem is to work backwards. That is, start by looking
at what condition a matrix must have if it is orthogonally diagonalizable.
So, assume that A is orthogonally diagonalizable. Then, by definition, there exists
an orthogonal matrix P such that PT AP = D is diagonal. We are looking for some
special condition that the matrix A must have. Of course, since A and D are similar (in
fact, orthogonally similar), they must have the same properties. So, we can instead
ask ourselves what nice properties must D have. Since D is diagonal, we can see that
DT = D. Let’s try to prove that A has the same property.
Using the fact that P is orthogonal, we can rewrite PT AP = D as A = PDPT . Taking
transposes gives
AT = (PDPT )T = (PT )T DT PT = PDPT = A
Thus, we also have AT = A. We have proven the following theorem.
THEOREM 1 If A is orthogonally diagonalizable, then AT = A.
The condition AT = A is going to be very important, so we make the following
definition.
DEFINITION
Symmetric Matrix
A matrix A such that AT = A is said to be symmetric.
It is clear from the definition that a matrix must be square to be symmetric. Moreover,
observe that a matrix A is symmetric if and only if aij = aji for all 1 ≤ i, j ≤ n.
EXAMPLE 1 Determine which of the following matrices is symmetric.
(a) A = [ 1 2 1 ; 2 0 1 ; 1 1 −3 ], (b) B = [ 1 0 1 ; 1 1 1 ; 1 0 1 ], (c) C = [ 1 0 0 ; 0 2 0 ; 0 0 3 ]
Solution: It is easy to verify that AT = A and CT = C, so they are both symmetric.
However, we have BT ≠ B, so B is not symmetric.
Theorem 1 shows that every orthogonally diagonalizable matrix is symmetric. But,
is the converse true? Let's begin by considering the case of 2 × 2 symmetric matrices.
EXAMPLE 2 Prove that every 2 × 2 symmetric matrix A = [ a b ; b c ] is orthogonally diagonalizable.
Solution: We will show that we can find an orthonormal basis of eigenvectors for A
so that we can find an orthogonal matrix P which diagonalizes A. We have
C(λ) = det [ a − λ     b   ]
           [   b     c − λ ] = λ^2 − (a + c)λ + (ac − b^2)

Thus, by the quadratic formula, the eigenvalues of A are

λ+ = (a + c + √((a + c)^2 − 4(ac − b^2)))/2 = (a + c + √((a − c)^2 + 4b^2))/2

λ− = (a + c − √((a + c)^2 − 4(ac − b^2)))/2 = (a + c − √((a − c)^2 + 4b^2))/2
If a = c and b = 0, then A is already diagonal and thus can be orthogonally diagonal-
ized by I. Otherwise, A has two distinct real eigenvalues. We have

A − λ+I = [ (a − c − √((a − c)^2 + 4b^2))/2                  b                 ]
          [                 b                 (c − a − √((a − c)^2 + 4b^2))/2 ]

        ~ [ 1   2b/(a − c − √((a − c)^2 + 4b^2)) ]
          [ 0                  0                 ]

Hence, an eigenvector for λ+ is ~v1 = ( −2b/(a − c − √((a − c)^2 + 4b^2)), 1 ). Also,

A − λ−I = [ (a − c + √((a − c)^2 + 4b^2))/2                  b                 ]
          [                 b                 (c − a + √((a − c)^2 + 4b^2))/2 ]

        ~ [ 1   2b/(a − c + √((a − c)^2 + 4b^2)) ]
          [ 0                  0                 ]

so an eigenvector for λ− is ~v2 = ( −2b/(a − c + √((a − c)^2 + 4b^2)), 1 ).
Observe that

~v1 · ~v2 = ( −2b/(a − c − √((a − c)^2 + 4b^2)) )( −2b/(a − c + √((a − c)^2 + 4b^2)) ) + 1(1)
         = 4b^2/((a − c)^2 − ((a − c)^2 + 4b^2)) + 1 = −1 + 1 = 0

Thus, the eigenvectors are orthogonal and hence we can normalize them to get an
orthonormal basis for R2. So, A can be diagonalized by an orthogonal matrix.
Example 2 not only shows that every 2 × 2 symmetric matrix is orthogonally diag-
onalizable, it also establishes two other important facts: the eigenvalues of any 2 × 2
symmetric matrix must be real, and the eigenvectors corresponding to different
eigenvalues are naturally orthogonal.
LEMMA 2 If A is a symmetric matrix with real entries, then all of its eigenvalues are real.
The proof requires properties of vectors and matrices with complex entries and so
will be delayed until Chapter 11.
The wonderful part of Lemma 2 is that it implies that we can use the Triangularization
Theorem on any symmetric matrix. This makes it very easy to prove that every
symmetric matrix is orthogonally diagonalizable.
THEOREM 3 (Principal Axis Theorem)
Every symmetric matrix A is orthogonally diagonalizable.
Proof: By Lemma 2 all eigenvalues of A are real. Therefore, we can apply the
Triangularization Theorem to get that there exists an orthogonal matrix P such that
PT AP = T is upper triangular. Since A is symmetric, we have that AT = A and hence
T T = (PT AP)T = PT AT (PT )T = PT AP = T
Therefore, T is an upper triangular symmetric matrix. But, if T is upper triangular,
then T T is lower triangular, and so T is both upper and lower triangular and hence T
is diagonal. □
REMARK
We have now proven that a matrix is orthogonally diagonalizable if and only if it is
symmetric.
One disadvantage in using the Triangularization Theorem to prove the Principal Axis
Theorem is that it does not give us a nice way of orthogonally diagonalizing a sym-
metric matrix. So, we instead refer to our solution of Example 2. In the example, we
saw that the eigenvectors corresponding to different eigenvalues were naturally or-
thogonal. To prove that this is always the case, we first prove an important property
of symmetric matrices.
THEOREM 4 A matrix A is symmetric if and only if ~x · (A~y) = (A~x) · ~y for all ~x, ~y ∈ Rn.
Proof: Suppose that A is symmetric, then for any ~x, ~y ∈ Rn we have
~x · (A~y) = ~xT A~y = ~xT AT~y = (A~x)T~y = (A~x) · ~y
Conversely, if ~x · (A~y) = (A~x) · ~y for all ~x, ~y ∈ Rn, then
(~xT A)~y = (A~x)T~y = (~xT AT )~y
Since this is valid for all ~y ∈ Rn we get that ~xT A = ~xT AT by the Matrices Equal
Theorem. Taking transposes of both sides gives AT~x = A~x. This is still valid for all
~x ∈ Rn, so using the Matrices Equal Theorem again gives AT = A as required. □
THEOREM 5 If ~v1,~v2 are eigenvectors of a symmetric matrix A corresponding to distinct eigenval-
ues λ1, λ2, then ~v1 is orthogonal to ~v2.
Proof: We are assuming that A~v1 = λ1~v1 and A~v2 = λ2~v2 with λ1 ≠ λ2. Theorem 4 gives
λ1(~v1 · ~v2) = (λ1~v1) · ~v2 = (A~v1) · ~v2 = ~v1 · (A~v2) = ~v1 · (λ2~v2) = λ2(~v1 · ~v2)
Hence, (λ1 − λ2)(~v1 · ~v2) = 0. But, λ1 ≠ λ2, so ~v1 · ~v2 = 0 as required. □
Consequently, if a symmetric matrix A has n distinct eigenvalues, the basis of eigen-
vectors which diagonalizes A will naturally be orthogonal. Hence, to orthogonally
diagonalize such a matrix A, we just need to normalize these eigenvectors to form an
orthonormal basis for Rn of eigenvectors of A.
EXAMPLE 3 Find an orthogonal matrix P such that PT AP is diagonal, where
A = [  1  0  −1 ]
    [  0  1   2 ]
    [ −1  2   5 ]
Solution: The characteristic polynomial is

       | 1 − λ    0     −1   |
C(λ) = |   0    1 − λ    2   | = (1 − λ)[(1 − λ)(5 − λ) − 4] − (1 − λ) = −λ(λ − 1)(λ − 6)
       |  −1      2    5 − λ |
Thus, the eigenvalues of A are λ1 = 0, λ2 = 1, and λ3 = 6. For λ1 = 0 we get

         [  1  0  −1 ]   [ 1  0  −1 ]
A − 0I = [  0  1   2 ] ~ [ 0  1   2 ]
         [ −1  2   5 ]   [ 0  0   0 ]

Hence, a basis for Eλ1, the eigenspace of λ1, is {(1, −2, 1)}. For λ2 = 1, we get

        [ 1  −2  0 ]
A − I ~ [ 0   0  1 ]
        [ 0   0  0 ]

Hence, a basis for Eλ2 is {(2, 1, 0)}. For λ3 = 6, we get

         [ 1  0   1/5 ]
A − 6I ~ [ 0  1  −2/5 ]
         [ 0  0    0  ]

Hence, a basis for Eλ3 is {(−1, 2, 5)}.
As predicted by Theorem 5, we can see that these are orthogonal to each other. After
normalizing we find that an orthonormal basis of eigenvectors for A is

{ (1/√6, −2/√6, 1/√6), (2/√5, 1/√5, 0), (−1/√30, 2/√30, 5/√30) }

Thus, we take

    [  1/√6  2/√5  −1/√30 ]
P = [ −2/√6  1/√5   2/√30 ]
    [  1/√6   0     5/√30 ]

Since the columns of P form an orthonormal basis for R3 we get that P is orthogonal.
Moreover, since the columns of P form a basis of eigenvectors for A, we get that P
diagonalizes A. In particular,

        [ 0  0  0 ]
PT AP = [ 0  1  0 ]
        [ 0  0  6 ]
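In practice this computation is delegated to a library routine that performs both steps at once. As a quick check of Example 3 (assuming NumPy is available), `numpy.linalg.eigh`, the eigenvalue routine for symmetric matrices, returns the eigenvalues in ascending order together with an orthogonal matrix whose columns are unit eigenvectors:

```python
import numpy as np

# The symmetric matrix A of Example 3.
A = np.array([[ 1.0, 0.0, -1.0],
              [ 0.0, 1.0,  2.0],
              [-1.0, 2.0,  5.0]])

# eigh: ascending eigenvalues and an orthogonal matrix P of eigenvectors.
evals, P = np.linalg.eigh(A)

D = P.T @ A @ P    # diagonal with entries 0, 1, 6, as in the example
```

The columns of P agree with the hand-computed eigenvectors up to sign.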
Notice that Theorem 5 does not say anything about different eigenvectors correspond-
ing to the same eigenvalue. In particular, if an eigenvalue of a symmetric matrix A has
geometric multiplicity 2, then there is no guarantee that a basis for the eigenspace of
the eigenvalue will be orthogonal. Of course, this is easily fixed as we can just apply
the Gram-Schmidt procedure to find an orthonormal basis for the eigenspace.
EXAMPLE 4 Orthogonally diagonalize
A = [  8  −2  2 ]
    [ −2   5  4 ]
    [  2   4  5 ]
Solution: The characteristic polynomial of A is

       | 8 − λ   −2      2   |
C(λ) = |  −2    5 − λ    4   | = −λ(λ − 9)^2
       |   2      4    5 − λ |
Thus, the eigenvalues are λ1 = 0 and λ2 = 9, where the algebraic multiplicity of λ1 is
1 and the algebraic multiplicity of λ2 is 2. For λ1 = 0 we get

         [ 1  0  1/2 ]
A − 0I ~ [ 0  1   1  ]
         [ 0  0   0  ]

Hence, a basis for Eλ1 is {(1, 2, −2)} = {~v1}. For λ2 = 9 we get

         [ 1  2  −2 ]
A − 9I ~ [ 0  0   0 ]
         [ 0  0   0 ]

Hence, a basis for Eλ2 is {(−2, 1, 0), (2, 0, 1)} = {~v2, ~v3}.
As predicted by Theorem 5, ~v2 and ~v3 are both orthogonal to ~v1. But, since they
are not orthogonal to each other, we apply the Gram-Schmidt procedure on the basis
{~v2, ~v3} for Eλ2 to get an orthogonal basis for Eλ2. We take ~w2 = ~v2 and get

~v3 − (⟨~v3, ~w2⟩/‖~w2‖^2)~w2 = (2, 0, 1) − (−4/5)(−2, 1, 0) = (2/5, 4/5, 1)

To simplify calculations, we take ~w3 = (2, 4, 5). Thus, {~w2, ~w3} is an orthogonal basis for
the eigenspace of λ2 and hence {~v1, ~w2, ~w3} is an orthogonal basis for R3 of eigenvec-
tors of A. We normalize the vectors to get the orthonormal basis

{ (1/3, 2/3, −2/3), (−2/√5, 1/√5, 0), (2/√45, 4/√45, 5/√45) }

Hence, we get

    [  1/3  −2/√5  2/√45 ]                     [ 0  0  0 ]
P = [  2/3   1/√5  4/√45 ]   and   D = PT AP = [ 0  9  0 ]
    [ −2/3    0    5/√45 ]                     [ 0  0  9 ]
EXAMPLE 5 Orthogonally diagonalize
A = [ 1  1  1 ]
    [ 1  1  1 ]
    [ 1  1  1 ]
Solution: The characteristic polynomial of A is

       | 1 − λ    1      1   |
C(λ) = |   1    1 − λ    1   | = −λ^2(λ − 3)
       |   1      1    1 − λ |
Thus, the eigenvalues are λ1 = 0 with algebraic multiplicity 2 and λ2 = 3 with
algebraic multiplicity 1. For λ1 = 0 we get

         [ 1  1  1 ]
A − 0I ~ [ 0  0  0 ]
         [ 0  0  0 ]

Hence, a basis for Eλ1 is {(−1, 1, 0), (−1, 0, 1)} = {~v1, ~v2}.
Since this is not an orthogonal basis for Eλ1, we need to apply the Gram-Schmidt
procedure. We take ~w1 = ~v1 and get

(−1, 0, 1) − (1/2)(−1, 1, 0) = (−1/2, −1/2, 1)

So, we take ~w2 = (1, 1, −2). Then, {~w1, ~w2} is an orthogonal basis for Eλ1. For λ2 = 3 we get

         [ 1  0  −1 ]
A − 3I ~ [ 0  1  −1 ]
         [ 0  0   0 ]

Hence, a basis for Eλ2 is {(1, 1, 1)}.
We normalize the basis vectors to get the orthonormal basis

{ (−1/√2, 1/√2, 0), (1/√6, 1/√6, −2/√6), (1/√3, 1/√3, 1/√3) }

Hence, we take

    [ −1/√2  1/√6  1/√3 ]
P = [  1/√2  1/√6  1/√3 ]
    [   0   −2/√6  1/√3 ]

and get

        [ 0  0  0 ]
PT AP = [ 0  0  0 ]
        [ 0  0  3 ]
Section 10.2 Problems
1. Orthogonally diagonalize each of the following symmetric matrices.
(a) [ 2 2 ; 2 2 ]
(b) [ 0 2 −2 ; 2 1 0 ; −2 0 1 ]
(c) [ 4 −2 ; −2 7 ]
(d) [ 2 −2 −5 ; −2 −5 −2 ; −5 −2 2 ]
(e) [ 1 −2 ; −2 −2 ]
(f) [ 1 0 2 ; 0 1 4 ; 2 4 2 ]
(g) [ 1 1 1 ; 1 1 1 ; 1 1 1 ]
(h) [ 2 2 4 ; 2 4 2 ; 4 2 2 ]
(i) [ 2 −4 −4 ; −4 2 −4 ; −4 −4 2 ]
(j) [ 5 −1 1 ; −1 5 1 ; 1 1 5 ]
(k) [ 0 1 −1 ; 1 0 1 ; −1 1 0 ]
(l) [ 1 1 1 1 ; 1 1 1 1 ; 1 1 1 1 ; 1 1 1 1 ]
2. Invent a 3 × 3 orthogonally diagonalizable matrix that has eigenvalues λ1 = −3, λ2 = 1, and λ3 = 6, with corresponding eigenvectors ~v1 = (−1, −2, 2), ~v2 = (−2, 1, 0), and ~v3 = (2, 4, 5).
3. Prove that if A is invertible and orthogonally
diagonalizable, then A−1 is orthogonally diago-
nalizable.
4. Let A and B be n×n symmetric matrices. Deter-
mine which of the following is also symmetric.
(a) AT A (b) AAT
(c) AB (d) A2
5. Let A be a symmetric matrix. Prove that all
eigenvalues of A are non-negative if and only
if there exists a symmetric matrix B such that
A = B2.
6. Determine whether each statement is true or
false. Justify your answer with a proof or a
counter example.
(a) Every orthogonal matrix is orthogonally
diagonalizable.
(b) If A and B are orthogonally diagonaliz-
able, then AB is orthogonally diagonaliz-
able.
(c) If A is orthogonally similar to a symmetric
matrix B, then A is orthogonally diagonal-
izable.
(d) If λ is an eigenvalue of a symmetric matrix
A, then gλ = aλ.
10.3 Quadratic Forms
We now use the results of the last section to study a very important class of functions
called quadratic forms. Quadratic forms are not only important in linear algebra, but
also in number theory, group theory, differential geometry, and many other areas.
Simply put, a quadratic form is a function defined as a linear combination of all
possible terms xix j for 1 ≤ i ≤ j ≤ n. For example, for two variables x1, x2 we have
a1x1^2 + a2x1x2 + a3x2^2, and for three variables x1, x2, x3 we get a1x1^2 + a2x1x2 + a3x1x3 +
a4x2^2 + a5x2x3 + a6x3^2.
Notice that a quadratic form is certainly not a linear function. Instead, quadratic forms
are tied to linear algebra by matrix multiplication. For example, if B = [ 1 2 ; 0 −1 ]
and ~x = (x1, x2), then we have

Q(~x) = ~xT B~x = [ x1 x2 ] [ 1   2 ] [ x1 ]   [ x1 x2 ] [ x1 + 2x2 ]
                          [ 0  −1 ] [ x2 ] =           [   −x2    ]

      = x1(x1 + 2x2) + x2(−x2) = x1^2 + 2x1x2 − x2^2
which is a quadratic form. We now use this to precisely define a quadratic form.
DEFINITION
Quadratic Form
Let A be an n × n matrix. A function Q : Rn → R of the form
Q(~x) = ~xT A~x = Σ_{i=1}^n Σ_{j=1}^n aij xixj
is called a quadratic form.
There is a connection between the dot product and quadratic forms. Observe that
Q(~x) = ~xT A~x = ~x · (A~x) = (A~x) · ~x = (A~x)T~x = ~xT AT~x
Hence, A and AT give the same quadratic form (this does not imply that A = AT ).
Of course, this is most natural when A is symmetric. In particular, for any quadratic
form
Q(~x) = ~xT B~x = Σ_{i=1}^n Σ_{j=1}^n bij xixj

if we define a matrix A by

(A)ij = (A)ji = { bii              if i = j
                { (bij + bji)/2    if i ≠ j
then we have that A is symmetric and Q(~x) = ~xT A~x. Hence, there is a one-to-one
correspondence between symmetric matrices and quadratic forms. Thus, as we will
see, we often deal with a quadratic form and its corresponding symmetric matrix in
the same way.
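Since averaging bij with bji entry by entry is the same as forming (B + B^T)/2, the construction above condenses to one line of code. A minimal sketch, assuming NumPy is available (`symmetrize` is our own name for the helper):

```python
import numpy as np

def symmetrize(B):
    """Symmetric matrix A with x^T A x = x^T B x for every x.
    Entrywise a_ij = (b_ij + b_ji)/2, i.e. A = (B + B^T)/2."""
    B = np.asarray(B, dtype=float)
    return (B + B.T) / 2

# The matrix B used in the discussion above.
B = np.array([[1.0,  2.0],
              [0.0, -1.0]])
A = symmetrize(B)    # [[1, 1], [1, -1]]

# Both matrices give the same quadratic form.
x = np.array([3.0, -2.0])
same = np.isclose(x @ B @ x, x @ A @ x)
```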
EXAMPLE 1 Let B = [ 1 2 ; 0 −1 ]. Find the symmetric matrix corresponding to Q(~x) = ~xT B~x.
Solution: From our work above we have

Q(~x) = ~xT B~x = x1^2 + 2x1x2 − x2^2

Thus, we take (A)11 = b11 = 1, (A)12 = (A)21 = (1/2)(b12 + b21) = (1/2)(2) = 1, and (A)22 = b22 = −1.
So,

A = [ 1   1 ]
    [ 1  −1 ]
It is easy to verify that Q(~x) = ~xT A~x.
EXAMPLE 2 Let Q(x1, x2) = 2x1^2 + 3x1x2 − 4x2^2. Find the corresponding symmetric matrix A.
Solution: We get a11 = 2, 2a12 = 3, and a22 = −4. Hence,

A = [  2   3/2 ]
    [ 3/2  −4  ]
EXAMPLE 3 Let Q(x1, x2, x3) = x1^2 − 2x1x2 + 8x1x3 + 3x2^2 + x2x3 − 5x3^2. Find the corresponding
symmetric matrix A.
Solution: We have a11 = 1, 2a12 = −2, 2a13 = 8, a22 = 3, 2a23 = 1, and a33 = −5.
Hence,

A = [  1  −1   4  ]
    [ −1   3  1/2 ]
    [  4  1/2  −5 ]
EXERCISE 1 Let Q(x1, x2, x3) = 3x1^2 + (1/2)x1x2 − x2^2 + 2x2x3 + 2x3^2. Find the corresponding symmetric matrix A.
Of course, this also works in reverse. That is, if A is symmetric, then the coefficient
bij of xixj is given by

bij = { aii    if i = j
      { 2aij   if i < j
EXAMPLE 4 If A = [ 1 4 ; 4 3 ], then the quadratic form corresponding to A is

Q(x1, x2) = x1^2 + 2(4)x1x2 + 3x2^2 = x1^2 + 8x1x2 + 3x2^2
EXAMPLE 5 If A = [ 1 2 −3 ; 2 −4 0 ; −3 0 −1 ], then the quadratic form corresponding to A is

Q(x1, x2, x3) = x1^2 + 2(2)x1x2 + 2(−3)x1x3 − 4x2^2 + 2(0)x2x3 − x3^2
             = x1^2 + 4x1x2 − 6x1x3 − 4x2^2 − x3^2
EXERCISE 2 Let A = [ −3 1/2 1 ; 1/2 3/2 2 ; 1 2 0 ]. Find the quadratic form corresponding to A.
EXAMPLE 6 If A = [ 1 0 0 ; 0 −2 0 ; 0 0 −3 ], then the quadratic form corresponding to A is

Q(x1, x2, x3) = x1^2 − 2x2^2 − 3x3^2
The terms xixj with i ≠ j are called cross-terms. We see that a quadratic form Q(~x)
has no cross terms if and only if its corresponding symmetric matrix is diagonal. We
will call such a quadratic form a diagonal quadratic form.
Classifying Quadratic Forms
DEFINITION
Positive Definite
Negative Definite
Indefinite
Positive Semidefinite
Negative Semidefinite
Let Q(~x) be a quadratic form. We say that:
1. Q(~x) is positive definite if Q(~x) > 0 for all ~x ≠ ~0,
2. Q(~x) is negative definite if Q(~x) < 0 for all ~x ≠ ~0,
3. Q(~x) is indefinite if Q(~x) > 0 for some ~x and Q(~x) < 0 for some ~x,
4. Q(~x) is positive semidefinite if Q(~x) ≥ 0 for all ~x,
5. Q(~x) is negative semidefinite if Q(~x) ≤ 0 for all ~x.
EXAMPLE 7 Q(x1, x2) = x1^2 + x2^2 is positive definite, Q(x1, x2) = −x1^2 − x2^2 is negative definite,
Q(x1, x2) = x1^2 − x2^2 is indefinite, Q(x1, x2) = x1^2 is positive semidefinite, and
Q(x1, x2) = −x1^2 is negative semidefinite.
We classify symmetric matrices in the same way that we classify their corresponding
quadratic forms.
EXAMPLE 8 The symmetric matrix A = [ 1 0 ; 0 1 ] corresponds to the quadratic form
Q(x1, x2) = x1^2 + x2^2, so A is positive definite as Q is positive definite.
EXAMPLE 9 Classify the symmetric matrix A = [ 4 −2 ; −2 4 ] and the corresponding quadratic form
Q(~x) = ~xT A~x.
Solution: Observe that we have

Q(~x) = 4x1^2 − 4x1x2 + 4x2^2 = (2x1 − x2)^2 + 3x2^2 > 0

whenever ~x ≠ ~0. Thus, Q(~x) and A are both positive definite.
REMARK
These definitions can be generalized to other functions or matrices. For example, the
matrix A = [ 1 1 ; 0 1 ] could be called positive definite since

~xT A~x = x1^2 + x1x2 + x2^2 = (x1 + (1/2)x2)^2 + (3/4)x2^2 > 0

for all ~x ≠ ~0. In this book, we will only be dealing with quadratic forms and sym-
metric matrices. However, it is important to remember that “Assume A is positive
definite” does not imply that A is symmetric.
Each of the quadratic forms in the example above was simple to classify since it either
had no cross-terms or we were able to easily complete a square. To classify more
difficult quadratic forms, we use our theory from the last section and the connection
between quadratic forms and symmetric matrices.
THEOREM 1 Let A be the symmetric matrix corresponding to a quadratic form Q(~x) = ~xT A~x. If P
is an orthogonal matrix that diagonalizes A, then Q(~x) can be expressed as
λ1y1^2 + · · · + λnyn^2
where ~y = PT~x and where λ1, . . . , λn are the eigenvalues of A corresponding to the
columns of P.
Proof: Since P is orthogonal, if ~y = PT~x, then ~x = P~y. Moreover, we have that
PT AP = D is diagonal where the diagonal entries of D are the eigenvalues λ1, . . . , λn
of A. Hence,
Q(~x) = ~xT A~x = (P~y)T A(P~y) = ~yT PT AP~y = ~yT D~y = λ1y1^2 + · · · + λnyn^2
□
This theorem shows that every quadratic form Q(~x) is equivalent to a diagonal quadratic
form. Moreover, Q(~x) can be brought into this form by the change of variables
~y = PT~x where P is an orthogonal matrix which diagonalizes the corresponding
symmetric matrix A.
EXAMPLE 10 Find new variables y1, y2, y3, y4 such that

Q(x1, x2, x3, x4) = 3x1^2 + 2x1x2 − 10x1x3 + 10x1x4 + 3x2^2 + 10x2x3 − 10x2x4 + 3x3^2 + 2x3x4 + 3x4^2

has diagonal form. Use the diagonal form to classify Q(~x).
Solution: We have corresponding symmetric matrix

A = [  3   1  −5   5 ]
    [  1   3   5  −5 ]
    [ −5   5   3   1 ]
    [  5  −5   1   3 ]

We first find an orthogonal matrix P which diagonalizes A. Using the methods of
Section 10.2 we find that

P = (1/2) [  1  −1  1   1 ]
          [ −1  −1  1   1 ]
          [ −1   1  1  −1 ]
          [  1  −1  1  −1 ]

Wait — we find that

P = (1/2) [  1   1  1   1 ]
          [ −1  −1  1   1 ]
          [ −1   1  1  −1 ]
          [  1  −1  1  −1 ]

and

PT AP = [ 12   0  0  0 ]
        [  0  −8  0  0 ]
        [  0   0  4  0 ]
        [  0   0  0  4 ]

Thus, if

~y = (y1, y2, y3, y4) = PT~x = (1/2) (x1 − x2 − x3 + x4, x1 − x2 + x3 − x4, x1 + x2 + x3 + x4, x1 + x2 − x3 − x4)

then we get

Q = 12y1^2 − 8y2^2 + 4y3^2 + 4y4^2
Therefore, Q(~x) clearly takes positive and negative values, so Q(~x) is indefinite.
REMARKS
1. The eigenvectors we used to make up P are called the principal axes of A. We
will see a geometric interpretation of this in the next section.
2. By changing the order of the eigenvectors in P we also change the order of
the eigenvalues in D and hence the coefficients of the corresponding yi. For
example, in Example 10, if we took

P = (1/2) [  1  1   1   1 ]
          [ −1  1   1  −1 ]
          [  1  1  −1  −1 ]
          [ −1  1  −1   1 ]

then we would get Q = −8y1^2 + 4y2^2 + 4y3^2 + 12y4^2. Notice that, since we can pick
any vector ~y ∈ R4, this does in fact give us exactly the same set of values as
our choice above. Alternatively, you can think of it as doing another change of
variables z1 = y2, z2 = y3, z3 = y4, and z4 = y1.
Generalizing the method of the example above gives us the following theorem.
THEOREM 2 If A is a symmetric matrix, then the quadratic form Q(~x) = ~xT A~x is
(1) positive definite if and only if the eigenvalues of A are all positive
(2) negative definite if and only if the eigenvalues of A are all negative
(3) indefinite if and only if some of the eigenvalues of A are positive and some
are negative
(4) positive semidefinite if and only if all of the eigenvalues are non-negative
(5) negative semidefinite if and only if all of the eigenvalues are non-positive
Proof: We will prove (1). The proofs of the others are similar. By the Principal Axis
Theorem A is orthogonally diagonalizable, so there exists an orthogonal matrix P
such that
Q(~x) = ~xT A~x = ~yT D~y = λ1y1^2 + · · · + λnyn^2

where ~y = PT~x and λ1, . . . , λn are the eigenvalues of A. Consequently, Q(~x) > 0 for
all ~x ≠ ~0 if and only if all λi are positive. Thus, Q(~x) is positive definite if and only if
the eigenvalues of A are all positive. □
EXAMPLE 11 Classify the following:
(a) Q(x1, x2) = 4x1^2 − 6x1x2 + 2x2^2.
Solution: We have A = [ 4 −3 ; −3 2 ] so

det(A − λI) = | 4 − λ   −3   |
              |  −3    2 − λ | = λ^2 − 6λ − 1

Applying the quadratic formula we get λ = (6 ± √40)/2. So, we have both positive and
negative eigenvalues, and hence Q is indefinite.
(b) Q(x1, x2, x3) = 2x1^2 + 2x1x2 + 2x1x3 + 2x2^2 + 2x2x3 + 2x3^2.
Solution: We have A = [ 2 1 1 ; 1 2 1 ; 1 1 2 ]. We can find that the eigenvalues are 1, 1, and 4.
Hence all the eigenvalues of A are positive, so A is positive definite.
(c) A = [ 3 2 0 ; 2 2 2 ; 0 2 1 ].
Solution: The eigenvalues of A are 5, 2, and −1. Therefore, A is indefinite.
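Theorem 2 gives an immediate computational test: compute the eigenvalues of the corresponding symmetric matrix and inspect their signs. A minimal sketch, assuming NumPy is available (`classify` and the tolerance `tol` are our own choices), checked against the matrices of Example 11:

```python
import numpy as np

def classify(A, tol=1e-10):
    """Classify the quadratic form x^T A x of a symmetric matrix A
    by the signs of its eigenvalues (Theorem 2)."""
    evals = np.linalg.eigvalsh(A)   # real eigenvalues of a symmetric matrix
    if np.all(evals > tol):
        return "positive definite"
    if np.all(evals < -tol):
        return "negative definite"
    if np.any(evals > tol) and np.any(evals < -tol):
        return "indefinite"
    if np.all(evals >= -tol):
        return "positive semidefinite"
    return "negative semidefinite"

# The matrices of Example 11: (a) and (c) are indefinite, (b) is positive definite.
A_a = np.array([[4.0, -3.0], [-3.0, 2.0]])
A_b = np.array([[2.0, 1.0, 1.0], [1.0, 2.0, 1.0], [1.0, 1.0, 2.0]])
A_c = np.array([[3.0, 2.0, 0.0], [2.0, 2.0, 2.0], [0.0, 2.0, 1.0]])
```

The tolerance guards against eigenvalues that are zero in exact arithmetic but come out as tiny nonzero floats.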
Section 10.3 Problems
1. For each of the following quadratic forms Q(~x):
(i) Find the corresponding symmetric matrix.
(ii) Classify the quadratic form and its corre-
sponding symmetric matrix.
(iii) Find a corresponding diagonal form of
Q(~x) and the change of variables which
brings it into this form.
(a) Q(~x) = x21+ 3x1x2 + x2
2
(b) Q(~x) = 8x21+ 4x1x2 + 11x2
2
(c) Q(~x) = x21+ 8x1x2 − 5x2
2
(d) Q(~x) = −x21+ 4x1x2 + 2x2
2
(e) Q(~x) = 4x21+4x1x2+4x1x3+4x2
2+4x2x3+
4x23
(f) Q(~x) = 4x1x2 − 4x1x3 + x22 − x2
3
(g) Q(~x) = 4x21−2x1x2+2x1x3+4x2
2+2x2x3+
4x23
(h) Q(~x) = −4x21+2x1x2+2x1x3−4x2
2−2x2x3−
4x23
2. Let Q(~x) = ~xT A~x with A = [ a b ; b c ] and det A ≠ 0.
(a) Prove that Q is positive definite if det A > 0 and a > 0.
(b) Prove that Q is negative definite if det A > 0 and a < 0.
(c) Prove that Q is indefinite if det A < 0.
3. Let A be an n × n matrix and let ~x, ~y ∈ Rn. Define ⟨~x, ~y⟩ = ~xT A~y. Prove that ⟨ , ⟩ is an inner product on Rn if and only if A is a positive definite, symmetric matrix.
4. Let A be a positive definite symmetric matrix.
Prove that:
(a) the diagonal entries of A are all positive.
(b) A is invertible.
5. Let A and B be symmetric n×n matrices whose
eigenvalues are all positive. Show that the
eigenvalues of A + B are all positive.
10.4 Graphing Quadratic Forms
In many applications of quadratic forms it is important to be able to sketch the graph
of a quadratic form Q(~x) = k for some k. Observe that, in general, it is not easy to
identify the shape of the graph of an equation of the form ax1^2 + bx1x2 + cx2^2 = k by
inspection. On the other hand, if you are familiar with conic sections, it is easy to
determine the shape of the graph of an equation of the form λ1x1^2 + λ2x2^2 = k. We
demonstrate this in a table.
The graph of λ1x1^2 + λ2x2^2 = k looks like:

                   | k > 0          | k = 0                    | k < 0
λ1 > 0, λ2 > 0     | ellipse        | point (0, 0)             | does not exist
λ1 < 0, λ2 < 0     | does not exist | point (0, 0)             | ellipse
λ1λ2 < 0           | hyperbola      | asymptotes for hyperbola | hyperbola
λ1 = 0, λ2 > 0     | parallel lines | line x2 = 0              | does not exist
λ1 = 0, λ2 < 0     | does not exist | line x2 = 0              | parallel lines
In the last section, we saw that we could bring a quadratic form into diagonal form
by performing the change of variables ~y = PT~x, where P orthogonally diagonalizes
the corresponding symmetric matrix. Therefore, to sketch an equation of the form
ax1^2 + bx1x2 + cx2^2 = k we could first apply the change of variables to bring it into the
form λ1y1^2 + λ2y2^2 = k, which will be easy to sketch. However, we must determine how
performing the change of variables will affect the graph.
THEOREM 1 If Q(~x) = ax1^2 + bx1x2 + cx2^2 with a, b, c not all zero, then there exists an orthogonal
matrix P, which corresponds to a rotation in R2, such that the change of variables
~y = PT~x brings Q(~x) into diagonal form.

Proof: Let A = [ a b/2 ; b/2 c ]. Since A is symmetric we can apply the Principal Axis
Theorem to get that there exists an orthonormal basis {~v1, ~v2} of R2 of eigenvectors of
A. Let ~v1 = (a1, a2) and ~v2 = (b1, b2). Since ~v1 is a unit vector we must have a1^2 + a2^2 = 1.
Hence, the components lie on the unit circle and so there exists an angle θ such that
a1 = cos θ and a2 = sin θ. Moreover, since ~v2 is a unit vector orthogonal to ~v1 we can
pick b1 = − sin θ and b2 = cos θ. Hence, we have

P = [ cos θ  − sin θ ]
    [ sin θ    cos θ ]

which corresponds to a rotation by θ. Finally, from our work above, we know that
this change of coordinates matrix brings Q into diagonal form. □
REMARK
Observe that our choices for ~v1 and ~v2 in the proof above were not the only possibili-
ties. For example, we could have picked ~v1 = (cos θ, sin θ) and ~v2 = (sin θ, − cos θ).
In this case the matrix P = [ ~v1 ~v2 ] would correspond to a rotation and a reflection.
This does not contradict the theorem though. The theorem only says that we can
choose P to correspond to a rotation, not that a rotation is the only choice. We will
demonstrate this in an example below.
EXAMPLE 1  Consider the equation x1² + x1x2 + x2² = 1. Find a rotation so that the equation has no
cross terms.

Solution: We have

    A = [ 1    1/2 ]
        [ 1/2  1   ]

The eigenvalues are λ1 = 1/2 and λ2 = 3/2.
Therefore, the graph of x1² + x1x2 + x2² = 1 is an ellipse. We find that the corresponding
unit eigenvectors are

    ~v1 = (1/√2) [ −1 ]    and    ~v2 = (1/√2) [ −1 ]
                 [  1 ]                        [ −1 ]

So, from our work in the proof of the theorem, we can pick cos θ = −1/√2 and
sin θ = 1/√2. Hence, θ = 3π/4 and

    [ x1 ]   [ −1/√2  −1/√2 ] [ y1 ]   [ (−y1 − y2)/√2 ]
    [ x2 ] = [  1/√2  −1/√2 ] [ y2 ] = [ (y1 − y2)/√2  ]
Substituting these into the equation gives

    1 = x1² + x1x2 + x2²
      = ((−y1 − y2)/√2)² + ((−y1 − y2)/√2)((y1 − y2)/√2) + ((y1 − y2)/√2)²
      = (1/2)(y1² + 2y1y2 + y2²) + (1/2)(−y1² + y2²) + (1/2)(y1² − 2y1y2 + y2²)
      = (1/2)y1² + (3/2)y2²

Thus, the graph of x1² + x1x2 + x2² = 1 is the graph of (1/2)y1² + (3/2)y2² = 1 rotated
by 3π/4 radians. This is shown below.
Section 10.4 Graphing Quadratic Forms 301
Let us look at this procedure in general. Assume that we want to sketch Q(x1, x2) = k,
where Q(x1, x2) is a quadratic form with corresponding symmetric matrix A. We
first write Q(x1, x2) in diagonal form λ1y1² + λ2y2² by finding the orthogonal matrix
P = [ ~v1  ~v2 ] which diagonalizes A and performing the change of variables ~y = PT~x.
We can then easily sketch λ1y1² + λ2y2² = k (finding the equations of the asymptotes if
it is a hyperbola) in the y1y2-plane. Then, using the fact that

    ~x = P~y = [ ~v1  ~v2 ] [ y1 ]  = y1~v1 + y2~v2
                            [ y2 ]

we can find the y1-axis and the y2-axis in the x1x2-plane. In particular, the y1-axis in
the y1y2-plane is spanned by

    [ 1 ]
    [ 0 ]

thus the y1-axis in the x1x2-plane is spanned by

    ~x = 1~v1 + 0~v2 = ~v1

Hence, performing the change of variables rotates the y1-axis to be in the direction
of ~v1. Similarly, the y2-axis is rotated to the ~v2 direction. Therefore, we can sketch
the graph of Q(x1, x2) = k in the x1x2-plane by drawing the y1- and y2-axes in the
x1x2-plane and sketching the graph of λ1y1² + λ2y2² = k on these axes as we did in
the y1y2-plane. For this reason, the orthogonal eigenvectors ~v1, ~v2 of A are called the
principal axes for Q(x1, x2).
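This procedure is easy to check numerically. The sketch below (our own illustration, assuming NumPy is available; it is not part of the text) finds the principal axes of Q(x1, x2) = x1² + x1x2 + x2² from Example 1 and confirms that the change of variables ~y = PT~x eliminates the cross term.

```python
import numpy as np

# Symmetric matrix of Q(x1, x2) = x1^2 + x1*x2 + x2^2 (Example 1)
A = np.array([[1.0, 0.5],
              [0.5, 1.0]])

# eigh returns eigenvalues in ascending order with orthonormal eigenvectors;
# the columns of P are the principal axes of Q.
eigvals, P = np.linalg.eigh(A)

# In the new variables y = P^T x the form is lambda1*y1^2 + lambda2*y2^2,
# so P^T A P must be the diagonal matrix of eigenvalues: no cross term.
D = P.T @ A @ P
print(eigvals)   # eigenvalues 1/2 and 3/2, as in the example
```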
EXAMPLE 2  Sketch 6x1² + 6x1x2 − 2x2² = 5.

Solution: We start by finding an orthogonal matrix which diagonalizes the corresponding
symmetric matrix A. We have

    A = [ 6   3 ]
        [ 3  −2 ]

so det(A − λI) = | 6 − λ    3     |
                 | 3       −2 − λ | = (λ − 7)(λ + 3).

We find corresponding eigenvectors of A are

    [ 3 ]  for λ = 7    and    [ −1 ]  for λ = −3.
    [ 1 ]                      [  3 ]

Thus,

    P = [ 3/√10  −1/√10 ]
        [ 1/√10   3/√10 ]

orthogonally diagonalizes A.

Our work above shows that the change of variables ~y = PT~x changes
6x1² + 6x1x2 − 2x2² = 5 into 7y1² − 3y2² = 5.

To sketch the hyperbola 5 = 7y1² − 3y2² more accurately, we first find the equations
of its asymptotes. We get

    0 = 7y1² − 3y2²
    y2 = ±(√21/3) y1 ≈ ±1.528 y1
Now to sketch 6x1² + 6x1x2 − 2x2² = 5 in the x1x2-plane we first draw the y1-axis in the
direction of

    [ 3 ]
    [ 1 ]

and the y2-axis in the direction of

    [ −1 ]
    [  3 ]

Next, to be more precise, we use the change of variables ~y = PT~x to convert the
equations of the asymptotes of the hyperbola to equations in the x1x2-plane. We have

    [ y1 ]  =  (1/√10) [  3  1 ] [ x1 ]  =  (1/√10) [ 3x1 + x2  ]
    [ y2 ]             [ −1  3 ] [ x2 ]             [ −x1 + 3x2 ]

Hence, we get the equations of the asymptotes are

    y2 = ±(√21/3) y1
    (1/√10)(−x1 + 3x2) = ±(√21/3)(1/√10)(3x1 + x2)
    −3x1 + 9x2 = ±3√21 x1 ± √21 x2
    x2(9 ∓ √21) = (3 ± 3√21)x1
    x2 = (3 ± 3√21)/(9 ∓ √21) · x1

Plotting the asymptotes and copying the graph from above onto the y1 and y2 axes
gives the picture to the right.
REMARKS
1. The advantage of finding the principal axes over just calculating the angle of
rotation is that we actually get precise equations of the axes and the asymptotes
which are required in some applications.
2. Another way of converting the equations of the asymptotes is to find the direc-
tion vectors ~a1 and ~a2 of the asymptotes in the y1y2-plane and compute P~a1 and
P~a2 which will be the direction vectors of the asymptotes in the x1x2-plane.
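The direction-vector method of Remark 2 can be illustrated with the numbers from Example 2 (a sketch assuming NumPy; the direction vectors are read off from y2 = ±(√21/3)y1):

```python
import numpy as np

s10 = np.sqrt(10)
# P from Example 2; its columns are the principal axes [3,1]/sqrt(10), [-1,3]/sqrt(10)
P = np.array([[3/s10, -1/s10],
              [1/s10,  3/s10]])

# Direction vectors of the asymptotes y2 = ±(sqrt(21)/3) y1 in the y1y2-plane
a1 = np.array([3.0,  np.sqrt(21)])
a2 = np.array([3.0, -np.sqrt(21)])

# Remark 2: P maps them to the asymptote directions in the x1x2-plane
d1, d2 = P @ a1, P @ a2

# Their slopes match x2 = (3 ± 3*sqrt(21))/(9 ∓ sqrt(21)) x1 from Example 2
slope1 = d1[1] / d1[0]
slope2 = d2[1] / d2[0]
print(slope1, slope2)
```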
EXAMPLE 3  Sketch 2x1² + 4x1x2 + 5x2² = 1.

Solution: We have

    A = [ 2  2 ]
        [ 2  5 ]

so det(A − λI) = | 2 − λ   2     |
                 | 2       5 − λ | = (λ − 6)(λ − 1).

We find corresponding eigenvectors of A are

    [ 1 ]  for λ = 6    and    [ −2 ]  for λ = 1.
    [ 2 ]                      [  1 ]

To demonstrate a rotation and a reflection, we will pick

    P = [ −2/√5  1/√5 ]
        [  1/√5  2/√5 ]

The change of variables ~y = PT~x changes 2x1² + 4x1x2 + 5x2² = 1 into the ellipse
y1² + 6y2² = 1. Graphing gives the picture to the right.

To sketch 2x1² + 4x1x2 + 5x2² = 1 in the x1x2-plane we draw the y1-axis in the
direction of

    [ −2 ]
    [  1 ]

and the y2-axis in the direction of

    [ 1 ]
    [ 2 ]

and copy the picture above onto the axes appropriately to get the picture to the right.
EXAMPLE 4  Sketch x1² + 4x1x2 − 2x2² = 1.

Solution: We have

    A = [ 1   2 ]
        [ 2  −2 ]

so det(A − λI) = | 1 − λ   2     |
                 | 2      −2 − λ | = (λ + 3)(λ − 2).

We find corresponding eigenvectors of A are

    [ −1 ]  for λ = −3    and    [ 2 ]  for λ = 2.
    [  2 ]                       [ 1 ]

Thus,

    P = [ −1/√5  2/√5 ]
        [  2/√5  1/√5 ]

orthogonally diagonalizes A.

Then, the change of variables ~y = PT~x changes x1² + 4x1x2 − 2x2² = 1 into
−3y1² + 2y2² = 1. This is a hyperbola, so we need to find the equations of its
asymptotes. We get

    0 = −3y1² + 2y2²
    y2 = ±(√6/2) y1 ≈ ±1.225 y1
To sketch x1² + 4x1x2 − 2x2² = 1 in the x1x2-plane we first draw the y1-axis in the
direction of

    [ −1 ]
    [  2 ]

and the y2-axis in the direction of

    [ 2 ]
    [ 1 ]

Next, we use the change of variables ~y = PT~x to convert the equations of the
asymptotes of the hyperbola to equations in the x1x2-plane. We have

    [ y1 ]  =  (1/√5) [ −1  2 ] [ x1 ]  =  (1/√5) [ −x1 + 2x2 ]
    [ y2 ]            [  2  1 ] [ x2 ]            [ 2x1 + x2  ]

Hence, we get the equations of the asymptotes are

    y2 = ±(√6/2) y1
    (1/√5)(2x1 + x2) = ±(√6/2)(1/√5)(−x1 + 2x2)
    4x1 + 2x2 = ∓√6 x1 ± 2√6 x2
    x2(2 ∓ 2√6) = (−4 ∓ √6)x1
    x2 = (−4 ∓ √6)/(2 ∓ 2√6) · x1

Plotting the asymptotes and copying the graph from above onto the y1 and y2 axes
gives the picture to the right.
Notice that in the last two examples, the change of variables we used actually
corresponded to a reflection and a rotation.
Section 10.4 Problems

1. Sketch the graph of each of the following equations showing both the original
   and new axes. For any hyperbola, find the equations of the asymptotes.

   (a) x1² + 8x1x2 + x2² = 6          (b) 3x1² − 2x1x2 + 3x2² = 12
   (c) −4x1² + 4x1x2 − 7x2² = −8      (d) −3x1² − 4x1x2 = 4
   (e) −x1² + 4x1x2 + 2x2² = 6        (f) 4x1² + 4x1x2 + 4x2² = 12
   (g) 2x1² − 4x1x2 − x2² = 6         (h) 3x1² + 4x1x2 + 3x2² = 5
Section 10.5 Optimizing Quadratic Forms 305
10.5 Optimizing Quadratic Forms
We use quadratic forms in calculus to classify critical points as local minimums and
maximums. However, in many other applications of quadratic forms, we actually
want to find the maximum and/or minimum of the quadratic form subject to a constraint.
Most of the time, it is possible to use a change of variables so that the constraint is
‖~x‖ = 1. That is, given a quadratic form Q(~x) = ~xT A~x on Rn we want to find
the maximum and minimum value of Q(~x) subject to ‖~x‖ = 1. For ease, we instead
use the equivalent constraint

    1 = ‖~x‖² = x1² + · · · + xn²
To develop a procedure for solving this problem in general, we begin by looking at a
couple of examples.
EXAMPLE 1  Find the maximum and minimum of Q(x1, x2) = 2x1² + 3x2² subject to the constraint
x1² + x2² = 1.

Solution: Since we don't have a general method for solving this type of problem, we
resort to the basics. We will first try a few points (x1, x2) that satisfy the constraint.
We get

    Q(1, 0) = 2
    Q(√3/2, 1/2) = 9/4
    Q(1/√2, 1/√2) = 5/2
    Q(1/2, √3/2) = 11/4
    Q(0, 1) = 3

Although we have only tried a few points, we may guess that 3 is the maximum value.
Since we have already found that Q(0, 1) = 3, to prove that 3 is the maximum we just
need to show that 3 is an upper bound for Q(x1, x2) subject to x1² + x2² = 1. Indeed,
we have that

    Q(x1, x2) = 2x1² + 3x2² ≤ 3x1² + 3x2² = 3(x1² + x2²) = 3

Hence, 3 is the maximum.

Similarly, we have

    Q(x1, x2) = 2x1² + 3x2² ≥ 2x1² + 2x2² = 2(x1² + x2²) = 2

Hence, 2 is a lower bound for Q(x1, x2) subject to the constraint and Q(1, 0) = 2, so
2 is the minimum.
This example was extremely easy since we had no cross-terms in the quadratic form.
We now look at what happens if we have cross-terms.
EXAMPLE 2  Find the maximum and minimum of Q(x1, x2) = 7x1² + 8x1x2 + x2² subject to the
constraint x1² + x2² = 1.

Solution: We first try some values of (x1, x2) that satisfy the constraint. We have

    Q(1, 0) = 7
    Q(1/√2, 1/√2) = 8
    Q(0, 1) = 1
    Q(−1/√2, 1/√2) = 0
    Q(−1, 0) = 7

From this we might think that the maximum is 8 and the minimum is 0, although we
again realize that we have tested very few points. Indeed, if we try more points, we
quickly find that these are not the maximum and minimum. So, instead of testing
points, we need to try to think of where the maximum and minimum should occur.
In the previous example, they occurred at (1, 0) and (0, 1) which are on the principal
axes. Thus, it makes sense to consider the principal axes of this quadratic form as
well.

We find that

    A = [ 7  4 ]
        [ 4  1 ]

has eigenvalues λ1 = 9 and λ2 = −1 with corresponding unit eigenvectors

    ~v1 = [ 2/√5 ]    and    ~v2 = [  1/√5 ]
          [ 1/√5 ]                 [ −2/√5 ]

Taking these for (x1, x2) we get

    Q(2/√5, 1/√5) = 9,    Q(1/√5, −2/√5) = −1

Moreover, taking

    ~y = PT~x = (1/√5) [ 2   1 ] [ x1 ]  = (1/√5) [ 2x1 + x2 ]
                       [ 1  −2 ] [ x2 ]           [ x1 − 2x2 ]

we get

    Q(x1, x2) = ~yT D~y = 9((2/√5)x1 + (1/√5)x2)² − ((1/√5)x1 − (2/√5)x2)²
              ≤ 9((2/√5)x1 + (1/√5)x2)² + 9((1/√5)x1 − (2/√5)x2)²
              = 9[((2/√5)x1 + (1/√5)x2)² + ((1/√5)x1 − (2/√5)x2)²]
              = 9

since

    ((2/√5)x1 + (1/√5)x2)² + ((1/√5)x1 − (2/√5)x2)² = x1² + x2² = 1

Similarly, we can show that Q(x1, x2) ≥ −1. Thus, the maximum of Q(x1, x2) subject
to ‖~x‖ = 1 is 9 and the minimum is −1.
We generalize what we learned in the examples above to get the following theorem.
THEOREM 1 Let Q(~x) be a quadratic form on Rn with corresponding symmetric matrix A. The
maximum value and minimum value of Q(~x) subject to the constraint ‖~x ‖ = 1 are the
greatest and least eigenvalues of A respectively. Moreover, these values occur when
~x is taken to be a unit eigenvector corresponding to the eigenvalue.
Proof: Since A is symmetric, we can find an orthogonal matrix P = [ ~v1 · · · ~vn ]
which orthogonally diagonalizes A, where ~v1, . . . , ~vn are arranged so that the
corresponding eigenvalues λ1, . . . , λn satisfy λ1 ≥ λ2 ≥ · · · ≥ λn.

Hence, there exists a diagonal matrix D such that

    ~xT A~x = ~yT D~y

where ~x = P~y. Since P is orthogonal we have that

    ‖~y‖ = ‖P~y‖ = ‖~x‖ = 1

Thus, the quadratic form ~yT D~y subject to ‖~y‖ = 1 takes the same set of values as the
quadratic form ~xT A~x subject to ‖~x‖ = 1. Hence, we just need to find the maximum
and minimum of ~yT D~y subject to ‖~y‖ = 1.

Since y1² + · · · + yn² = ‖~y‖² = 1, we have

    ~yT D~y = λ1y1² + · · · + λnyn² ≤ λ1y1² + · · · + λ1yn² = λ1(y1² + · · · + yn²) = λ1

and ~yT D~y = λ1 with ~y = ~e1. Hence, λ1 is the maximum value, with ~x = P~e1 = ~v1.
Similarly,

    ~yT D~y = λ1y1² + · · · + λnyn² ≥ λny1² + · · · + λnyn² = λn(y1² + · · · + yn²) = λn

and ~yT D~y = λn with ~y = ~en. Hence, λn is the minimum value, with ~x = P~en = ~vn. ∎
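The theorem is easy to check numerically. The sketch below (our own illustration, assuming NumPy) computes the extreme values of the quadratic form from Example 2 of this section and verifies that Q at random unit vectors never escapes the interval they bound.

```python
import numpy as np

# Symmetric matrix of Q(x1, x2) = 7x1^2 + 8x1x2 + x2^2 (Example 2)
A = np.array([[7.0, 4.0],
              [4.0, 1.0]])

eigvals, P = np.linalg.eigh(A)           # ascending order
q_min, q_max = eigvals[0], eigvals[-1]   # least and greatest eigenvalues

# Q at random unit vectors stays within [q_min, q_max]
rng = np.random.default_rng(0)
for _ in range(1000):
    x = rng.standard_normal(2)
    x /= np.linalg.norm(x)
    q = x @ A @ x
    assert q_min - 1e-9 <= q <= q_max + 1e-9

print(q_min, q_max)   # approximately -1 and 9, as found in Example 2
```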
EXAMPLE 3  Let

    A = [ −3  0  −1 ]
        [  0  3   0 ]
        [ −1  0  −3 ]

Find the maximum and minimum value of the quadratic form Q(~x) = ~xT A~x subject
to x1² + x2² + x3² = 1.

Solution: The characteristic polynomial of A is

    C(λ) = | −3 − λ   0       −1     |
           |  0       3 − λ    0     |
           | −1       0       −3 − λ | = −(λ − 3)(λ + 4)(λ + 2)

By Theorem 1, the maximum of Q(~x) subject to the constraint is 3 and the minimum
is −4.
EXAMPLE 4  Find the maximum and minimum value of the quadratic form

    Q(x1, x2, x3) = 4x1² + 2x1x2 + 2x1x3 + 4x2² − 2x2x3 + 4x3²

subject to x1² + x2² + x3² = 1. Find vectors at which the maximum and minimum occur.

Solution: The characteristic polynomial of the corresponding symmetric matrix is

    C(λ) = | 4 − λ   1       1     |
           | 1       4 − λ  −1     |
           | 1      −1       4 − λ | = −(λ − 5)²(λ − 2)

We find that unit eigenvectors corresponding to λ1 = 5 and λ2 = 2 are

    ~v1 = [ 1/√2 ]    and    ~v2 = [ −1/√3 ]
          [ 1/√2 ]                 [  1/√3 ]
          [ 0    ]                 [  1/√3 ]

respectively. By Theorem 1, the maximum of Q(~x) subject to x1² + x2² + x3² = 1 is 5
and occurs when ~x = ~v1, and the minimum is 2 and occurs when ~x = ~v2.
Section 10.5 Problems

1. Find the maximum and minimum of each quadratic form subject to ‖~x‖ = 1.

   (a) Q(~x) = 4x1² − 2x1x2 + 4x2²
   (b) Q(~x) = 3x1² + 10x1x2 − 3x2²
   (c) Q(~x) = −x1² + 8x2² + 4x2x3 + 11x3²
   (d) Q(~x) = 3x1² − 4x1x2 + 8x1x3 + 6x2² + 4x2x3 + 3x3²
   (e) Q(~x) = 2x1² + 2x1x2 + 2x1x3 + 2x2² + 2x2x3 + 2x3²
Section 10.6 Singular Value Decomposition 309
10.6 Singular Value Decomposition
We have now seen that finding the maximum and minimum of a quadratic form sub-
ject to ‖~x ‖ = 1 is very easy. However, in many applications we are not lucky enough
to get a symmetric matrix; in fact, we often do not even get a square matrix. Thus, it
is very desirable to derive a similar method for finding the maximum and minimum
of ‖A~x ‖ for an m × n matrix A subject to ‖~x ‖ = 1. This derivation will lead us to a
very important matrix factorization known as the singular value decomposition. We
again begin by looking at an example.
EXAMPLE 1  Consider a linear mapping f : R3 → R2 defined by f(~x) = A~x with

    A = [ 1  0   1 ]
        [ 1  1  −1 ]

Since the range is now a subspace of R2, we want to find the maximum length of A~x
subject to ‖~x‖ = 1. That is, we want to maximize ‖A~x‖. Observe that

    ‖A~x‖² = (A~x)T(A~x) = ~xT AT A~x

Since AT A is symmetric, this is a quadratic form. Hence, using Theorem 10.5.1, to
maximize ‖A~x‖ we just need to find the square root of the largest eigenvalue of AT A.
We have

    AT A = [ 2   1   0 ]
           [ 1   1  −1 ]
           [ 0  −1   2 ]

The characteristic polynomial of AT A is C(λ) = −λ(λ − 3)(λ − 2). Thus, the eigenvalues
of AT A are 3, 2, and 0. Hence, the maximum of ‖A~x‖ subject to ‖~x‖ = 1 is √3
and the minimum is 0. Moreover, these occur at the corresponding eigenvectors

    [ −1/√3 ]         [ −1/√6 ]
    [ −1/√3 ]   and   [  2/√6 ]
    [  1/√3 ]         [  1/√6 ]

of AT A.
In the example, we found that the eigenvalues of AT A were all non-negative. This
was important as we had to take the square root of them to maximize/minimize ‖A~x‖.
We now prove that this is always the case.
THEOREM 1  If A is an m × n matrix and λ1, . . . , λn are the eigenvalues of AT A with corresponding
unit eigenvectors ~v1, . . . , ~vn, then λ1, . . . , λn are all non-negative. In particular,

    ‖A~vi‖ = √λi,   1 ≤ i ≤ n

Proof: For 1 ≤ i ≤ n we have AT A~vi = λi~vi and hence

    ‖A~vi‖² = (A~vi)T A~vi = ~viT AT A~vi = ~viT(λi~vi) = λi~viT~vi = λi‖~vi‖² = λi

since ‖~vi‖ = 1. ∎
Observe from the example above that the square roots of the largest and smallest
eigenvalues of AT A are the maximum and minimum values of ‖A~x‖ subject to
‖~x‖ = 1. So, these are behaving like the eigenvalues of a symmetric matrix. This
motivates the following definition.
DEFINITION
Singular Values
The singular values σ1, . . . , σn of an m × n matrix A are the square roots of the
eigenvalues of AT A arranged so that σ1 ≥ σ2 ≥ · · · ≥ σn ≥ 0.
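This definition can be checked against a library routine (a sketch assuming NumPy; the matrix is B from Example 3 below, whose singular values turn out to be √7 and √2):

```python
import numpy as np

B = np.array([[1.0, 1.0],
              [-1.0, 2.0],
              [-1.0, 1.0]])

# Singular values: square roots of the eigenvalues of B^T B, greatest first
eigvals = np.linalg.eigvalsh(B.T @ B)   # ascending order
sigmas = np.sqrt(eigvals[::-1])

print(sigmas)                                 # [sqrt(7), sqrt(2)]
print(np.linalg.svd(B, compute_uv=False))     # same values from NumPy's SVD
```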
EXAMPLE 2  Find the singular values of

    A = [ 1  1  1 ]
        [ 2  2  2 ]

Solution: We have

    AT A = [ 5  5  5 ]
           [ 5  5  5 ]
           [ 5  5  5 ]

The characteristic polynomial is C(λ) = −λ²(λ − 15), so the eigenvalues are (from
greatest to least) λ1 = 15, λ2 = 0, and λ3 = 0. Thus, the singular values of A are
σ1 = √15, σ2 = 0, and σ3 = 0.
EXAMPLE 3  Find the singular values of

    B = [  1  1 ]
        [ −1  2 ]
        [ −1  1 ]

Solution: We have

    BT B = [  3  −2 ]
           [ −2   6 ]

The characteristic polynomial is C(λ) = (λ − 2)(λ − 7). Hence, the eigenvalues of BT B
are λ1 = 7 and λ2 = 2, so the singular values of B are σ1 = √7 and σ2 = √2.
EXAMPLE 4  Find the singular values of

    A = [ 2  −1 ]              B = [  2  1  1 ]
        [ 1   0 ]     and          [ −1  0  2 ]
        [ 1   2 ]

Solution: We have

    AT A = [ 6  0 ]
           [ 0  5 ]

Hence, the eigenvalues of AT A are λ1 = 6 and λ2 = 5. Thus, the singular values of A
are σ1 = √6 and σ2 = √5.

We have

    BT B = [ 5  2  0 ]
           [ 2  1  1 ]
           [ 0  1  5 ]

The characteristic polynomial of BT B is

    C(λ) = | 5 − λ   2       0     |
           | 2       1 − λ   1     |
           | 0       1       5 − λ | = −λ(λ − 5)(λ − 6)

Hence, the eigenvalues of BT B are λ1 = 6, λ2 = 5, and λ3 = 0. Thus, the singular
values of B are σ1 = √6, σ2 = √5, and σ3 = 0.
EXERCISE 1  Find the singular values of

    A = [ 1  −1 ]              B = [ 1  −1  1 ]
        [ 3   1 ]     and          [ 0   2  2 ]
        [ 1   1 ]
It is easy to show that the number of non-zero eigenvalues of AT A is the rank of AT A.
Thus, it is natural to ask if there is a relationship between the number of non-zero
singular values of an m × n matrix A and the rank of A. In the problems above, we see
that the rank of A equals the number of non-zero singular values. To prove
this result in general, we use the following lemmas.
LEMMA 2  If A is an m × n matrix, then Null(AT A) = Null(A).

Proof: If A~x = ~0, then AT A~x = AT~0 = ~0. Hence, the nullspace of A is a subset of the
nullspace of AT A. On the other hand, consider AT A~x = ~0. Then,

    ‖A~x‖² = (A~x) · (A~x) = (A~x)T(A~x) = ~xT AT A~x = ~xT~0 = 0

Thus, A~x = ~0. Hence, the nullspace of AT A is also a subset of the nullspace of A, and
the result follows. ∎
LEMMA 3  If A is an m × n matrix, then rank(AT A) = rank(A).

Proof: Using Lemma 2, we get that dim(Null(AT A)) = dim(Null(A)). Thus, the
Dimension Theorem gives

    rank(AT A) = n − dim(Null(AT A)) = n − dim(Null(A)) = rank(A)

as required. ∎
COROLLARY 4 If A is an m × n matrix and rank(A) = r, then A has r non-zero singular values.
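Corollary 4 can be illustrated numerically (a sketch assuming NumPy; the tolerance cutoff is our own choice, since floating-point zero singular values are only approximate). The matrix is the rank-1 matrix of Example 2.

```python
import numpy as np

A = np.array([[1.0, 1.0, 1.0],
              [2.0, 2.0, 2.0]])   # rank 1: the rows are parallel

sigmas = np.linalg.svd(A, compute_uv=False)
nonzero = int(np.sum(sigmas > 1e-10))   # count singular values above tolerance

print(nonzero, np.linalg.matrix_rank(A))   # both equal 1
```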
We now want to extend the similarity of singular values and eigenvalues by defining
singular vectors. Observe that if A is an m × n matrix with m ≠ n, then we cannot
have A~v = σ~v since A~v ∈ Rm. Hence, the best we can do to match the definition of
eigenvalues and eigenvectors is to pick suitable non-zero vectors ~v ∈ Rn and ~u ∈ Rm
such that A~v = σ~u.

By definition, for any non-zero singular value σ of A there is a vector ~v ≠ ~0 such
that AT A~v = σ²~v. Thus, if we have A~v = σ~u, then we have AT A~v = AT(σ~u), so
σ²~v = σAT~u. Dividing by σ, we see that we must also have AT~u = σ~v. Moreover, by
Theorem 1, if ~v is a unit eigenvector of AT A, then we get that ~u = (1/σ)A~v is also a
unit vector.
Therefore, for a non-zero singular value σ of A, we will want unit vectors ~v and ~u
such that

    A~v = σ~u   and   AT~u = σ~v

However, our derivation does not work for σ = 0. In this case, we only require that
one of these conditions is satisfied.
DEFINITION
Singular Vectors

Let A be an m × n matrix. If ~v ∈ Rn and ~u ∈ Rm are unit vectors and σ ≠ 0 is a
singular value of A such that

    A~v = σ~u   and   AT~u = σ~v

then we say that ~u is a left singular vector of A corresponding to σ and ~v is a right
singular vector of A corresponding to σ. For σ = 0, if ~u is a unit vector such that
AT~u = ~0, then ~u is a left singular vector of A corresponding to σ = 0. If ~v is a
unit vector such that A~v = ~0, then ~v is a right singular vector of A corresponding to
σ = 0.
Our derivation above the definition proves the following theorem which we will make
frequent use of.
THEOREM 5  Let A be an m × n matrix. If ~v ∈ Rn is a unit eigenvector of AT A corresponding
to a non-zero singular value σ of A, then ~u = (1/σ)A~v is a left singular vector of A
corresponding to σ.
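Theorem 5 can be checked directly (a sketch assuming NumPy, using B from Example 3): the vector ~u = (1/σ)B~v is a unit vector and satisfies both conditions in the definition of singular vectors.

```python
import numpy as np

B = np.array([[1.0, 1.0],
              [-1.0, 2.0],
              [-1.0, 1.0]])

eigvals, V = np.linalg.eigh(B.T @ B)   # ascending order
sigma = np.sqrt(eigvals[-1])           # largest singular value, sqrt(7)
v = V[:, -1]                           # unit eigenvector for lambda = 7

u = B @ v / sigma                      # Theorem 5: a left singular vector

assert np.isclose(np.linalg.norm(u), 1.0)   # u is a unit vector
assert np.allclose(B @ v, sigma * u)        # B v = sigma u
assert np.allclose(B.T @ u, sigma * v)      # B^T u = sigma v
```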
The definition of right singular vectors not only preserves our relationship between
these vectors and the eigenvectors of AT A, but also gives us the corresponding result
for the left singular vectors and the eigenvectors of AAT.
THEOREM 6 Let A be an m × n matrix. A vector ~v ∈ Rn is a right singular vector of A if and only
if ~v is a unit eigenvector of AT A. A vector ~u ∈ Rm is a left singular vector of A if and
only if ~u is a unit eigenvector of AAT .
Now that we have singular values and singular vectors mimicking eigenvalues and
eigenvectors, we wish to try to mimic orthogonal diagonalization. Let A be an m × n
matrix. As was the case with singular vectors, if m ≠ n, then PT AP is not defined for
a square matrix P. Thus, we want to find an m × m orthogonal matrix U and an n × n
orthogonal matrix V such that UT AV = Σ, where Σ is an m × n "diagonal" matrix.

To do this, we require that for any m × n matrix A there exists an orthonormal basis for
Rn of right singular vectors and an orthonormal basis for Rm of left singular vectors.
We now prove that these always exist.
LEMMA 7  Let A be an m × n matrix with rank(A) = r and singular values σ1, . . . , σn. If
{~v1, . . . , ~vn} is an orthonormal basis for Rn consisting of eigenvectors of AT A
arranged so that the corresponding eigenvalues are arranged from greatest to least, then

    { (1/σ1)A~v1, . . . , (1/σr)A~vr }

is an orthonormal basis for Col(A).

Proof: Since A has rank r, we know that AT A has rank r and so AT A has r non-zero
eigenvalues. Thus, σ1, . . . , σr are the r non-zero singular values of A.

Now, observe that for 1 ≤ i, j ≤ r, i ≠ j, we have

    A~vi · A~vj = (A~vi)T A~vj = ~viT AT A~vj = ~viT λj~vj = λj ~viT ~vj = 0

because {~v1, . . . , ~vr} is orthogonal. So, {A~v1, . . . , A~vr} is also orthogonal. Theorem 1
then implies that

    { (1/σ1)A~v1, . . . , (1/σr)A~vr }

is orthonormal. Since dim Col(A) = r, Theorem 4.2.3 implies that this is an
orthonormal basis for Col(A). ∎
THEOREM 8  If A is an m × n matrix with rank r and singular values σ1, . . . , σn, then there exists an
orthonormal basis {~v1, . . . , ~vn} for Rn of right singular vectors of A and an orthonormal
basis {~u1, . . . , ~um} for Rm of left singular vectors of A such that

    A~vi = σi~ui,   1 ≤ i ≤ min(m, n)

Proof: By Theorem 6, unit eigenvectors of AT A are right singular vectors of A.
Hence, since AT A is symmetric, we can find an orthonormal basis {~v1, . . . , ~vn} of
right singular vectors of A.

Let

    ~ui = (1/σi)A~vi,   1 ≤ i ≤ r

By Lemma 7, {~u1, . . . , ~ur} is an orthonormal basis for Col(A). Also, by definition, a
left singular vector ~uj of A corresponding to σ = 0 lies in the nullspace of AT. Hence,
by the Fundamental Theorem of Linear Algebra, if {~ur+1, . . . , ~um} is an orthonormal
basis for the nullspace of AT, then {~u1, . . . , ~um} is an orthonormal basis for Rm of left
singular vectors of A. ∎
EXAMPLE 5  Let

    B = [  1  1 ]
        [ −1  2 ]
        [ −1  1 ]

Find an orthonormal basis for R2 of right singular vectors of B and an orthonormal
basis for R3 of left singular vectors of B.

Solution: By Theorem 6, we can find an orthonormal basis for R2 of right singular
vectors of B by finding an orthonormal basis for R2 of eigenvectors of BT B.

In Example 3, we found that

    BT B = [  3  −2 ]
           [ −2   6 ]

with eigenvalues λ1 = 7 and λ2 = 2.

We have

    BT B − 7I = [ −4  −2 ]  ~  [ 1  1/2 ]
                [ −2  −1 ]     [ 0  0   ]

Thus, a unit eigenvector corresponding to λ1 = 7 is

    ~v1 = [ −1/√5 ]
          [  2/√5 ]

We have

    BT B − 2I = [  1  −2 ]  ~  [ 1  −2 ]
                [ −2   4 ]     [ 0   0 ]

Thus, a unit eigenvector corresponding to λ2 = 2 is

    ~v2 = [ 2/√5 ]
          [ 1/√5 ]

Hence, an orthonormal basis for R2 of right singular vectors of B is {~v1, ~v2}.

To find an orthonormal basis for R3 of left singular vectors, we follow the steps in
the proof of Theorem 8.

We first find an orthonormal basis for Col(B) using Lemma 7. We get

    ~u1 = (1/σ1)B~v1 = (1/√7) [  1  1 ] [ −1/√5 ]  =  [ 1/√35 ]
                              [ −1  2 ] [  2/√5 ]     [ 5/√35 ]
                              [ −1  1 ]               [ 3/√35 ]

    ~u2 = (1/σ2)B~v2 = (1/√2) [  1  1 ] [ 2/√5 ]  =  [  3/√10 ]
                              [ −1  2 ] [ 1/√5 ]     [  0     ]
                              [ −1  1 ]              [ −1/√10 ]

So, {~u1, ~u2} is an orthonormal basis for Col(B). To complete the orthonormal basis
for R3, we add an orthonormal basis for Null(BT). We have

    BT ~ [ 1  0  −1/3 ]
         [ 0  1   2/3 ]

Thus, if we let

    ~u3 = [  1/√14 ]
          [ −2/√14 ]
          [  3/√14 ]

we get that {~u3} is an orthonormal basis for Null(BT) and {~u1, ~u2, ~u3} is an
orthonormal basis for R3 of left singular vectors of B.
EXERCISE 2  Let

    A = [ 1  −1  −1 ]
        [ 1   2   1 ]

Find an orthonormal basis for R3 of right singular vectors of A and an orthonormal
basis for R2 of left singular vectors of A. Compare to the solution of Example 5.
Let A ∈ Mm×n(R) with rank(A) = r. Let {~v1, . . . , ~vn} be an orthonormal basis for Rn of
right singular vectors of A, let σ1, . . . , σr be the non-zero singular values of A, and let
{~u1, . . . , ~um} be an orthonormal basis for Rm of left singular vectors of A constructed
as in Theorem 8. Observe that we have

    A[ ~v1 · · · ~vn ] = [ A~v1 · · · A~vr  A~vr+1 · · · A~vn ]
                       = [ σ1~u1 · · · σr~ur  ~0 · · · ~0 ]
                       = [ ~u1 · · · ~um ]Σ

where Σ is the m × n matrix with (Σ)ii = σi for 1 ≤ i ≤ r and all other entries 0.

Hence, we have that there exists an orthogonal matrix V and an orthogonal matrix U
such that AV = UΣ. Instead of writing this as UT AV = Σ, we typically write it as
A = UΣVT to get a matrix decomposition of A.
DEFINITION
Singular Value Decomposition (SVD)

Let A be an m × n matrix with rank(A) = r and with non-zero singular values
σ1, . . . , σr. A singular value decomposition of A is a factorization of the form

    A = UΣVT

where U is an orthogonal matrix containing left singular vectors of A, V is an
orthogonal matrix containing right singular vectors of A, and Σ is the m × n matrix
with (Σ)ii = σi for 1 ≤ i ≤ r and all other entries of Σ are 0.
ALGORITHM
To find a singular value decomposition of a matrix A with rank r:

(1) Find the eigenvalues λ1, . . . , λn of AT A arranged from greatest to least and a
    corresponding set of orthonormal eigenvectors {~v1, . . . , ~vn}. Define σi = √λi
    for 1 ≤ i ≤ r.

(2) Let V = [ ~v1 · · · ~vn ] and let Σ be the m × n matrix such that (Σ)ii = σi for
    1 ≤ i ≤ r and all other entries of Σ are 0.

(3) Find left singular vectors of A by computing ~ui = (1/σi)A~vi for 1 ≤ i ≤ r, and
    then extend the set {~u1, . . . , ~ur} to an orthonormal basis {~u1, . . . , ~um} for Rm.
    Take U = [ ~u1 · · · ~um ].

Then, A = UΣVT.
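The three steps of the algorithm can be sketched in code (our own helper, assuming NumPy; the QR-based extension in step (3) is just one convenient way to complete the basis, where the text instead uses a basis for Null(AT)):

```python
import numpy as np

def svd_via_eigenvectors(A, tol=1e-12):
    """Sketch of the three-step algorithm above for a real matrix A:
    (1) eigen-decompose A^T A, (2) build V and Sigma,
    (3) u_i = (1/sigma_i) A v_i, extended to an orthonormal basis of R^m."""
    m, n = A.shape
    eigvals, V = np.linalg.eigh(A.T @ A)      # ascending order
    order = np.argsort(eigvals)[::-1]         # rearrange greatest to least
    eigvals, V = eigvals[order], V[:, order]
    sigmas = np.sqrt(np.clip(eigvals, 0.0, None))
    r = int(np.sum(sigmas > tol))             # rank = # non-zero singular values

    Sigma = np.zeros((m, n))
    Sigma[:r, :r] = np.diag(sigmas[:r])

    U_r = A @ V[:, :r] / sigmas[:r]           # left singular vectors u_1..u_r
    # Extend {u_1, ..., u_r} to an orthonormal basis for R^m via QR; QR may
    # flip signs of the first r columns, so realign them with U_r afterwards.
    U, _ = np.linalg.qr(np.hstack([U_r, np.eye(m)]))
    U = U[:, :m]
    for i in range(r):
        if U[:, i] @ U_r[:, i] < 0:
            U[:, i] = -U[:, i]
    return U, Sigma, V

# The matrix of Example 6 below; its singular values are 4 and sqrt(6)
A = np.array([[1.0, -1.0, 3.0],
              [3.0,  1.0, 1.0]])
U, Sigma, V = svd_via_eigenvectors(A)
print(np.diag(Sigma))                    # 4 and sqrt(6) ≈ 2.449
print(np.allclose(U @ Sigma @ V.T, A))   # True
```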
REMARK
As indicated in the proof of Theorem 8, one way to extend {~u1, . . . , ~ur} to an
orthonormal basis for Rm is by finding an orthonormal basis for Null(AT).
EXAMPLE 6  Find a singular value decomposition of

    A = [ 1  −1  3 ]
        [ 3   1  1 ]

Solution: We find that

    AT A = [ 10   2   6 ]
           [  2   2  −2 ]
           [  6  −2  10 ]

has eigenvalues (ordered from greatest to least) λ1 = 16, λ2 = 6, and λ3 = 0.
Corresponding orthonormal eigenvectors are

    ~v1 = [ 1/√2 ]     ~v2 = [  1/√3 ]     ~v3 = [  1/√6 ]
          [ 0    ]           [  1/√3 ]           [ −2/√6 ]
          [ 1/√2 ]           [ −1/√3 ]           [ −1/√6 ]

So, the singular values of A are σ1 = √16 = 4 and σ2 = √6.

We define

    V = [ 1/√2   1/√3   1/√6 ]
        [ 0      1/√3  −2/√6 ]     and     Σ = [ 4  0   0 ]
        [ 1/√2  −1/√3  −1/√6 ]                 [ 0  √6  0 ]

To find the left singular vectors we compute

    ~u1 = (1/σ1)A~v1 = (1/4) [ 4/√2 ]  =  [ 1/√2 ]
                             [ 4/√2 ]     [ 1/√2 ]

    ~u2 = (1/σ2)A~v2 = (1/√6) [ −3/√3 ]  =  [ −1/√2 ]
                              [  3/√3 ]     [  1/√2 ]

Since this forms a basis for R2, we take

    U = [ 1/√2  −1/√2 ]
        [ 1/√2   1/√2 ]

Then, we have a singular value decomposition A = UΣVT of A.
EXAMPLE 7  Find a singular value decomposition of

    B = [  2  −4 ]
        [  2   2 ]
        [ −4   0 ]
        [  1   4 ]

Solution: We have

    BT B = [ 25   0 ]
           [  0  36 ]

The eigenvalues are λ1 = 36 and λ2 = 25 with corresponding orthonormal eigenvectors

    [ 0 ]    and    [ 1 ]
    [ 1 ]           [ 0 ]

Consequently, the singular values of B are σ1 = √36 = 6 and σ2 = √25 = 5.

We define

    V = [ 0  1 ]     and     Σ = [ 6  0 ]
        [ 1  0 ]                 [ 0  5 ]
                                 [ 0  0 ]
                                 [ 0  0 ]
Next, we compute

    ~u1 = (1/σ1)B~v1 = (1/6) [ −4 ]         ~u2 = (1/σ2)B~v2 = (1/5) [  2 ]
                             [  2 ]                                  [  2 ]
                             [  0 ]                                  [ −4 ]
                             [  4 ]                                  [  1 ]

But, we only have two orthogonal vectors in R4, so we need to extend {~u1, ~u2} to an
orthonormal basis {~u1, ~u2, ~u3, ~u4} for R4. We know that we can do this by finding an
orthonormal basis for Null(BT). We apply the Gram-Schmidt algorithm to a basis for
Null(BT) to get vectors

    ~u3 = (1/3) [  1 ]         ~u4 = (1/15) [ 8 ]
                [ −2 ]                      [ 8 ]
                [  0 ]                      [ 9 ]
                [  2 ]                      [ 4 ]

We let U = [ ~u1  ~u2  ~u3  ~u4 ] and we get B = UΣVT.
EXAMPLE 8  Find a singular value decomposition for

    A = [  1   1 ]
        [  2   2 ]
        [ −1  −1 ]

Solution: We have

    AT A = [ 6  6 ]
           [ 6  6 ]

Thus, by inspection, the eigenvalues of AT A are λ1 = 12 and λ2 = 0, with
corresponding unit eigenvectors

    ~v1 = [ 1/√2 ]    and    ~v2 = [ −1/√2 ]
          [ 1/√2 ]                 [  1/√2 ]

respectively. Hence, the singular values of A are σ1 = √12 and σ2 = 0. Thus, we have

    V = [ 1/√2  −1/√2 ]          Σ = [ √12  0 ]
        [ 1/√2   1/√2 ]              [ 0    0 ]
                                     [ 0    0 ]

We compute

    ~u1 = (1/σ1)A~v1 = (1/√24) [  2 ]
                               [  4 ]
                               [ −2 ]

We now need to extend {~u1} to an orthonormal basis {~u1, ~u2, ~u3} for R3. To do this, we
find an orthonormal basis for Null(AT).

We observe that a basis for Null(AT) is

    { [ −2 ]   [ 1 ] }
    { [  1 ] , [ 0 ] }
    { [  0 ]   [ 1 ] }

Applying the Gram-Schmidt algorithm to this basis and normalizing gives

    ~u2 = (1/√5) [ −2 ]         ~u3 = (1/√30) [ 1 ]
                 [  1 ]                       [ 2 ]
                 [  0 ]                       [ 5 ]

We let U = [ ~u1  ~u2  ~u3 ] and we have a singular value decomposition UΣVT of A.
Section 10.6 Problems

1. Find a singular value decomposition for the following matrices.

   (a) [  2  2 ]      (b) [ 3  −4 ]      (c) [ 2  3 ]
       [ −1  2 ]          [ 8   6 ]          [ 0  2 ]

   (d) [ 1  3 ]       (e) [ 2  −1 ]      (f) [  1  0   1 ]
       [ 2  1 ]           [ 2  −1 ]          [  1  1  −1 ]
       [ 1  1 ]           [ 2  −1 ]          [ −1  1   0 ]
                                             [  0  1   1 ]

   (g) [ 1  2 ]       (h) [ 1  1  1  1 ]     (i) [ 4  4  −2 ]
       [ 1  2 ]           [ 2  2  2  2 ]         [ 3  2   4 ]
       [ 1  2 ]
       [ 1  2 ]

2. Find the maximum of ‖A~x‖ subject to ‖~x‖ = 1 for each of the following
   matrices A.

   (a) [ 2   3 ]      (b) [ −1   3 ]     (c) [ 1  2   3 ]
       [ 6  −1 ]          [  2  −1 ]         [ 2  3  −1 ]
                          [  1   2 ]

3. Let A be an m × n matrix and let P be an m × m orthogonal matrix. Prove that
   PA has the same singular values as A.

4. Prove Theorem 6.

5. Let UΣVT be a singular value decomposition for an m × n matrix A with rank r.
   Find, with proof, an orthonormal basis for Row(A), Col(A), Null(A), and
   Null(AT) from the columns of U and V.

6. Let

       A = [ 2  1  1 ]
           [ 0  2  0 ]
           [ 0  0  2 ]

   be the standard matrix of a linear mapping L. (Observe that A is not
   diagonalizable.)

   (a) Find a basis B for R3 of right singular vectors of A.
   (b) Find a basis C for R3 of left singular vectors of A.
   (c) Determine C[L]B.

7. Show that a singular value decomposition A = UΣVT allows us to write A as

       A = σ1~u1~v1T + · · · + σr~ur~vrT

   where r = rank(A).
Chapter 11
Complex Vector Spaces
Our goal in this chapter is to extend everything we have done with real vector spaces
to vector spaces which use complex numbers for their scalars instead of just real
numbers. We begin with a very brief review of the basic operations on complex
numbers.
11.1 Complex Number Review
Recall that a complex number is a number of the form z = a + bi where a, b ∈ R and
i² = −1. We call a the real part of z and denote it by
Re z = a
and we call b the imaginary part of z and denote it by
Im z = b
The set of all complex numbers is denoted C.
It is important to notice that the real numbers are a subset of the complex numbers.
In particular, every real number a is the complex number a+0i. A non-zero complex
number whose real part is 0 is said to be imaginary. Any complex number which
has non-zero imaginary part is said to be non-real.
We define addition and subtraction of complex numbers by
(a + bi) ± (c + di) = (a ± c) + (b ± d)i
EXAMPLE 1 We have
(3 + 4i) + (2 − 3i) = (3 + 2) + (4 − 3)i = 5 + i
(1 − 3i) − (4 + 2i) = (1 − 4) + (−3 − 2)i = −3 − 5i
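The two sums above can be checked with Python's built-in complex type, which follows the same componentwise rules (a small illustration, not part of the text; Python writes i as j):

```python
# Python's built-in complex numbers add and subtract componentwise
z1 = 3 + 4j
z2 = 2 - 3j
print(z1 + z2)                # (5+1j)
print((1 - 3j) - (4 + 2j))    # (-3-5j)
```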
Observe that addition of complex numbers is defined by adding the real parts and
adding the imaginary parts of the complex numbers. That is, we add complex numbers
componentwise, just like vectors in Rn.

In fact, the set C with this rule for addition and with scalar multiplication by a real
scalar t defined by

    t(a + bi) = ta + tbi

forms a vector space over R. Let's find a basis for this real vector space. Notice that
every vector in C can be written as a + bi with a, b ∈ R, so {1, i} spans C. Also, {1, i}
is clearly linearly independent as the only solution to

    t1·1 + t2·i = 0 + 0i

is t1 = t2 = 0, since t1 and t2 are real scalars. Hence, C forms a two-dimensional
real vector space. Consequently, it is isomorphic to every other two-dimensional real
vector space. In particular, it is isomorphic to the plane R2. As a result, the set C is
often called the complex plane. We generally think of having an isomorphism which
maps the complex number z = a + bi to the point (a, b) in R2.
We can use this isomorphism to define the absolute value of a complex number z =
a + bi. Recall that the absolute value of a real number x measures the distance that
x is from the origin. So, we define the absolute value of z as the distance from the
point (a, b) in R2 to the origin.
DEFINITION
Absolute Value
The absolute value or modulus of a complex number z = a + bi is defined by

|z| = √(a² + b²)
EXAMPLE 2 We have

|2 − 3i| = √(2² + (−3)²) = √13

|1/2 + (1/2)i| = √((1/2)² + (1/2)²) = √(1/2)
Of course, this has the same properties as the real absolute value.
THEOREM 1 If w, z ∈ C, then
(1) |z| is a non-negative real number
(2) |z| = 0 if and only if z = 0
(3) |w + z| ≤ |w| + |z|
REMARK
If z = a + bi is real, then b = 0, so |z| = |a + 0i| = √(a² + 0²) = |a|. Hence, this is a
direct generalization of the real absolute value function.
The ability to represent complex numbers as points in the plane is very important
in the study of complex numbers. However, viewing C as a real vector space has
one serious limitation: it only defines the multiplication of a complex number by
a real scalar and does not immediately give us a way of multiplying two complex
numbers together. So, we return to the standard form of complex numbers and define
multiplication of complex numbers by using the normal distributive property and the
fact that i² = −1. We get

(a + bi)(c + di) = ac + adi + bci + bdi² = ac − bd + (ad + bc)i
EXAMPLE 3 We have
(3 + 4i)(2 − 3i) = 3(2) − 4(−3) + [4(2) + 3(−3)]i = 18 − i
(1 − 3i)(4 + 2i) = 1(4) − (−3)(2) + [(−3)(4) + 1(2)]i = 10 − 10i
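These products can be double-checked with Python's built-in complex type; a quick sketch (Python writes the imaginary unit as j):

```python
# Python's built-in complex type implements the rules above; i is written j.
z, w = 3 + 4j, 2 - 3j
print(z + w)                # componentwise addition: (3+2) + (4-3)i
print(z * w)                # (ac - bd) + (ad + bc)i = 18 - i
print((1 - 3j) * (4 + 2j))  # 10 - 10i
```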
We get the following familiar properties.
THEOREM 2 If z1, z2, z3 ∈ C, then
(1) z1 + z2 ∈ C
(2) z1 + z2 = z2 + z1
(3) z1z2 = z2z1
(4) z1 + (z2 + z3) = (z1 + z2) + z3
(5) z1(z2z3) = (z1z2)z3
(6) z1(z2 + z3) = z1z2 + z1z3
(7) z1 + 0 = z1
(8) 1z1 = z1
(9) For each z ∈ C there exists (−z) ∈ C such that z + (−z) = 0.
(10) For each z ∈ C with z ≠ 0, there exists (1/z) ∈ C such that z(1/z) = 1.
REMARK
Any set with two operations of addition and multiplication that satisfies properties
(1) to (10) is called a field.
EXERCISE 1 Prove that |z1z2| = |z1||z2| for z1, z2 ∈ C.
We look at one more important example of complex multiplication.
EXAMPLE 4 (a + bi)(a − bi) = a² + b² + [b(a) − a(b)]i = a² + b² = |a + bi|²
This example motivates the following definition.
DEFINITION
Complex Conjugate
The complex conjugate \overline{z} of z = a + bi ∈ C is defined to be

\overline{z} = a − bi
EXAMPLE 5 We have

\overline{4 − 3i} = 4 + 3i
\overline{3i} = −3i
\overline{−4} = −4
\overline{2 + 2i} = 2 − 2i
The complex conjugate has many important properties.
THEOREM 3 If w, z ∈ C with z = a + bi, then
(1) \overline{\overline{z}} = z
(2) z is real if and only if \overline{z} = z
(3) z is imaginary if and only if \overline{z} = −z and z ≠ 0
(4) \overline{z ± w} = \overline{z} ± \overline{w}
(5) \overline{zw} = \overline{z} \overline{w}
(6) z + \overline{z} = 2 Re(z) = 2a
(7) z − \overline{z} = 2i Im(z) = 2bi
(8) z\overline{z} = a² + b² = |z|²
Since the absolute value of a non-zero complex number is a positive real number,
we can use property (8) to divide complex numbers. We demonstrate this with some
examples.
EXAMPLE 6 Calculate (2 − 3i)/(1 + 2i).
Solution: To calculate this, we make the denominator real by multiplying the numerator
and the denominator by the complex conjugate of the denominator. We get

(2 − 3i)/(1 + 2i) = [(2 − 3i)/(1 + 2i)] · [(1 − 2i)/(1 − 2i)] = [2 − 6 + (−3 − 4)i]/(1² + 2²) = (−4 − 7i)/5 = −4/5 − (7/5)i

We can easily check this answer. We have

(1 + 2i)(−4/5 − (7/5)i) = −4/5 + 14/5 + [−8/5 − 7/5]i = 2 − 3i
EXAMPLE 7 Calculate i/(1 + i).
Solution: We have

i/(1 + i) = [i/(1 + i)] · [(1 − i)/(1 − i)] = (1 + i)/(1 + 1) = 1/2 + (1/2)i
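Division by a complex number can also be checked numerically; a small sketch using Python's complex type, which performs division via the conjugate exactly as above:

```python
z, w = 2 - 3j, 1 + 2j
q = z / w
print(q)               # approximately -0.8 - 1.4j, i.e. -4/5 - (7/5)i
print(w * q)           # recovers 2 - 3j (up to rounding)
print(w.conjugate())   # the conjugate used in the calculation: 1 - 2j
```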
EXERCISE 2 Calculate 5/(3 − 4i).
In some real world applications, such as problems involving electric circuits, one
needs to solve systems of linear equations involving complex numbers. The procedure
for solving systems of linear equations is the same, except that we can now
multiply a row by a non-zero complex number and we can add a complex multiple
of one row to another. We demonstrate this with a few examples.
EXAMPLE 8 Solve the system of linear equations

z1 + iz2 + (−1 + 2i)z3 = −1 + 2i
z2 + 2z3 = 2 + 2i
2z1 + (−1 + 2i)z2 + (−6 + 4i)z3 = −4

Solution: We row reduce the augmented matrix of the system to RREF.

[ 1    i       −1+2i | −1+2i ]
[ 0    1        2    |  2+2i ]
[ 2   −1+2i    −6+4i |  −4   ]  R3 − 2R1

~ [ 1    i    −1+2i | −1+2i ]  R1 − iR2
  [ 0    1     2    |  2+2i ]
  [ 0   −1    −4    | −2−4i ]  R3 + R2

~ [ 1   0   −1 |  1    ]
  [ 0   1    2 |  2+2i ]
  [ 0   0   −2 | −2i   ]  −(1/2)R3

~ [ 1   0   −1 | 1    ]  R1 + R3
  [ 0   1    2 | 2+2i ]  R2 − 2R3
  [ 0   0    1 | i    ]

~ [ 1   0   0 | 1+i ]
  [ 0   1   0 | 2   ]
  [ 0   0   1 | i   ]

Hence, the only solution is z1 = 1 + i, z2 = 2, and z3 = i.
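Systems like this can also be solved numerically; a sketch using NumPy, which handles complex coefficient matrices directly:

```python
import numpy as np

# Coefficient matrix A and right-hand side b of Example 8.
A = np.array([[1, 1j, -1 + 2j],
              [0, 1, 2],
              [2, -1 + 2j, -6 + 4j]])
b = np.array([-1 + 2j, 2 + 2j, -4])
z = np.linalg.solve(A, b)
print(z)   # approximately [1+1j, 2, 1j], matching the row reduction
```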
REMARK
It is very easy to make calculation mistakes when working with complex numbers.
Thus, it is highly recommended that you check your answer whenever possible.
EXAMPLE 9 Solve the system of linear equations

z1 + z2 + iz3 = 1
−2iz1 + z2 + (1 − 2i)z3 = −2i
(1 + i)z1 + (2 + 2i)z2 − 2z3 = 2

Solution: We row reduce the augmented matrix of the system to RREF.

[ 1     1      i    |  1  ]
[ −2i   1     1−2i  | −2i ]  R2 + 2iR1
[ 1+i   2+2i  −2    |  2  ]  R3 − (1 + i)R1

~ [ 1    1      i     | 1   ]
  [ 0    1+2i  −1−2i  | 0   ]  (1/(1 + 2i))R2
  [ 0    1+i   −1−i   | 1−i ]

~ [ 1    1     i    | 1   ]  R1 − R2
  [ 0    1    −1    | 0   ]
  [ 0    1+i  −1−i  | 1−i ]  R3 − (1 + i)R2

~ [ 1   0   1+i | 1   ]
  [ 0   1   −1  | 0   ]
  [ 0   0    0  | 1−i ]

Hence, the system is inconsistent.
EXAMPLE 10 Find the solution set of the system of linear equations

z1 − iz2 = 1 + i
(1 + i)z1 + (1 − i)z2 + (1 − i)z3 = 1 + 3i
2z1 − 2iz2 + (3 + i)z3 = 1 + 5i

Solution: We row reduce the augmented matrix of the system to RREF.

[ 1     −i    0    | 1+i  ]
[ 1+i   1−i   1−i  | 1+3i ]  R2 − (1 + i)R1
[ 2     −2i   3+i  | 1+5i ]  R3 − 2R1

~ [ 1   −i   0    | 1+i   ]
  [ 0    0   1−i  | 1+i   ]  (1/(1 − i))R2
  [ 0    0   3+i  | −1+3i ]

~ [ 1   −i   0    | 1+i   ]
  [ 0    0   1    | i     ]
  [ 0    0   3+i  | −1+3i ]  R3 − (3 + i)R2

~ [ 1   −i   0 | 1+i ]
  [ 0    0   1 | i   ]
  [ 0    0   0 | 0   ]

Thus, z2 is a free variable. Since we are solving this system over C, this means that
z2 can take any complex value. Hence, we let z2 = α ∈ C. Then, a vector equation for
the solution set is

~z = [1 + i; 0; i] + α[i; 1; 0],   α ∈ C
Section 11.1 Problems
1. Calculate the following
(a) (3 + 4i) − (−2 + 6i)
(b) (2 − 3i) + (1 + 3i)
(c) (−1 + 2i)(3 + 2i)
(d) −3i(−2 + 3i)
(e) \overline{3 − 7i}
(f) (1 + i)(1 − i)
(g) |(1 + i)(1 − 2i)(3 + 4i)|
(h) |1 + 6i|
(i) |2/3 − √2 i|
(j) 2/(1 − i)
(k) (4 − 3i)/(3 − 4i)
(l) (2 + 5i)/(−3 − 6i)
2. Solve each equation for z.
(a) iz = 4 − zi
(b) 1/(1 − z) = 1 − 5i
(c) (1 + i)z = 2 − (2 + i)z
3. Solve the following systems of linear equations over C.
(a) z1 + (2 + i)z2 + iz3 = 1 + i
iz1 + (−1 + 2i)z2 + 2iz4 = −i
z1 + (2 + i)z2 + (1 + i)z3 + 2iz4 = 2 − i
(b) iz1 + 2z2 − (3 + i)z3 = 1
(1 + i)z1 + (2 − 2i)z2 − 4z3 = i
iz1 + 2z2 − (3 + 3i)z3 = 1 + 2i
(c) iz1 + (1 + i)z2 + z3 = 2i
(1 − i)z1 + (1 − 2i)z2 + (−2 + i)z3 = −2 + i
2iz1 + 2iz2 + 2z3 = 4 + 2i
(d) z1 − z2 + iz3 = 2i
(1 + i)z1 − iz2 + iz3 = −2 + i
(1 − i)z1 + (−1 + 2i)z2 + (1 + 2i)z3 = 3 + 2i
4. Prove that for any z1, z2, z3 ∈ C we have
(a) z1(z2 + z3) = z1z2 + z1z3
(b) For each z ∈ C with z ≠ 0, there exists (1/z) ∈ C such that z(1/z) = 1.
(c) z1 is real if and only if \overline{z1} = z1.
(d) \overline{z1z2} = \overline{z1} \overline{z2}
(e) |z1 + z2| ≤ |z1| + |z2|
5. Let z1, z2 ∈ C such that z1 + z2 and z1z2 are each negative real numbers. Prove that z1 and z2 must be real numbers.
6. Prove that if |z| = 1, z ≠ 1, then Re(1/(1 − z)) = 1/2.
7. Prove that for any positive integer n and z ∈ C we have

\overline{z^n} = (\overline{z})^n
11.2 Complex Vector Spaces
We saw in the last section that the set of all complex numbers C can be thought of
as a two dimensional real vector space. However, we see this is inappropriate for
the solution space of Example 11.1.10 since it requires multiplication by complex
scalars. Thus, we need to extend the definition of a real vector space to a complex
vector space.
DEFINITION
Complex Vector Space
A set V with operations of addition and scalar multiplication is called a vector space
over C if for any ~v, ~z, ~w ∈ V and α, β ∈ C we have:
V1 ~z + ~w ∈ V
V2 (~z + ~w) + ~v = ~z + (~w + ~v)
V3 ~z + ~w = ~w + ~z
V4 There exists a vector ~0 ∈ V such that ~z + ~0 = ~z for all ~z ∈ V
V5 For every ~z ∈ V there exists (−~z) ∈ V such that ~z + (−~z) = ~0
V6 α~z ∈ V
V7 α(β~z) = (αβ)~z
V8 (α + β)~z = α~z + β~z
V9 α(~z + ~w) = α~z + α~w
This definition looks a lot like the definition of a real vector space. In fact, the only
difference is that we now allow the scalars to be any complex number, instead of just
any real number. In the rest of this section we will generalize most of our concepts
from real vector spaces to complex vector spaces.
EXAMPLE 1 The set Cn = { [z1; . . . ; zn] | zi ∈ C } of column vectors with complex entries is a vector space over C with addition defined by

[z1; . . . ; zn] + [w1; . . . ; wn] = [z1 + w1; . . . ; zn + wn]

and scalar multiplication defined by

α[z1; . . . ; zn] = [αz1; . . . ; αzn]

for any α ∈ C.
Just like a complex number z ∈ C, we can split a vector in Cn into a real and imaginary
part.
THEOREM 1 If ~z ∈ Cn, then there exist vectors ~x, ~y ∈ Rn such that

~z = ~x + i~y
Other familiar real vector spaces can be turned into complex vector spaces.
EXAMPLE 2 The set Mm×n(C) of all m×n matrices with complex entries is a complex vector space
with standard addition and complex scalar multiplication of matrices.
We now extend all of our theory from real vector spaces to complex vector spaces.
DEFINITION
Subspace
If S is a subset of a complex vector space V and S is a complex vector space under
the same operations as V, then S is said to be a subspace of V.
As in the real case, we use the Subspace Test to prove a non-empty subset S of a
complex vector space V is a subspace of V.
EXAMPLE 3 Prove that S = { [z1; z2; z3] | z1 + iz2 = −z3 } is a subspace of C3.
Solution: By definition S is a subset of C3. Moreover, S is non-empty since [0; 0; 0]
satisfies 0 + i(0) = −0. Let [z1; z2; z3], [w1; w2; w3] ∈ S. Then z1 + iz2 = −z3 and w1 + iw2 = −w3.
Hence,

[z1; z2; z3] + [w1; w2; w3] = [z1 + w1; z2 + w2; z3 + w3] ∈ S

since (z1 + w1) + i(z2 + w2) = z1 + iz2 + w1 + iw2 = −z3 − w3 = −(z3 + w3). Also,

α[z1; z2; z3] = [αz1; αz2; αz3] ∈ S

since αz1 + i(αz2) = α(z1 + iz2) = α(−z3) = −(αz3). Thus, S is a subspace of C3 by
the Subspace Test.
EXAMPLE 4 Is R a subspace of C as a vector space over C?
Solution: By definition R is a non-empty subset of C. Let x, y ∈ R. Then, x + y is
also a real number, so the set is closed under addition. But, the difference between a
real vector space and a complex vector space is that we have scalar multiplication by
complex numbers in a complex vector space. So, if we take 2 ∈ R and the scalar i ∈ C,
then we get i(2) = 2i ∉ R. Thus, R is not closed under complex scalar multiplication, and
hence it is not a subspace of C.
The concepts of linear independence, spanning, bases, and dimension are defined the
same as in the real case except that we now have complex scalar multiplication.
EXAMPLE 5 Find a basis for C as a vector space over C.
Solution: It may be tempting to think that a basis for C is {1, i}, but this would
be incorrect since this set is linearly dependent in C. In particular, the equation
α1(1) + α2(i) = 0 has solution α1 = 1 and α2 = i since 1(1) + i(i) = 1 − 1 = 0. A basis
for C is {1} since any complex number a + bi is just equal to (a + bi)(1).
EXERCISE 1 What is the standard basis for Cn?
EXAMPLE 6 Determine the dimension of the subspace S = { [z1; z2; z3] | z1 + iz2 = −z3 } of C3.
Solution: Every vector in S can be written in the form

[z1; z2; z3] = [z1; z2; −z1 − iz2] = z1[1; 0; −1] + z2[0; 1; −i]

Thus, { [1; 0; −1], [0; 1; −i] } spans S and is clearly linearly independent since neither vector is
a scalar multiple of the other, so it is a basis for S. Thus, dim S = 2.
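The dimension count can be confirmed numerically; a sketch (assuming NumPy) that puts the two spanning vectors into the columns of a matrix and computes its rank:

```python
import numpy as np

# Columns are the spanning vectors of S found above; the rank equals dim S.
A = np.array([[1, 0],
              [0, 1],
              [-1, -1j]])
print(np.linalg.matrix_rank(A))   # 2
```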
EXERCISE 2 Find a basis for the four fundamental subspaces of

A = [ 1    −1      −i    ]
    [ i     1     −1+2i  ]
    [ −i    1+2i  −3+2i  ]
    [ 2     0      2i    ]
REMARK
Be warned that it is possible for two complex vectors to be scalar multiples of each
other without looking like they are. For example, can you determine by inspection
which of the following vectors is a scalar multiple of [1 − i; −2i]?

[1; −1 − i],   [−i; −1 − i],   [7 − i; 8 + 6i]
Coordinates with respect to a basis, linear mappings, and matrices of linear mappings
are also defined in exactly the same way.
EXAMPLE 7 Given that B = { [1; 1 + i; 1 − i], [−1; −i; −1 + 2i], [i; i; 2 + 2i] } is a basis of C3, find the B-coordinate
vector of ~z = [2i; −2 + i; 3 + 2i].
Solution: We need to find complex numbers α1, α2, and α3 such that

[2i; −2 + i; 3 + 2i] = α1[1; 1 + i; 1 − i] + α2[−1; −i; −1 + 2i] + α3[i; i; 2 + 2i]

Row reducing the corresponding augmented matrix gives

[ 1     −1     i    | 2i   ]
[ 1+i   −i     i    | −2+i ]  R2 − (1 + i)R1
[ 1−i   −1+2i  2+2i | 3+2i ]  R3 − (1 − i)R1

~ [ 1   −1   i   | 2i ]  R1 + R2
  [ 0    1   1   | −i ]
  [ 0    i   1+i | 1  ]  R3 − iR2

~ [ 1   0   1+i | i  ]  R1 − (1 + i)R3
  [ 0   1   1   | −i ]  R2 − R3
  [ 0   0   1   | 0  ]

~ [ 1   0   0 | i  ]
  [ 0   1   0 | −i ]
  [ 0   0   1 | 0  ]

Thus, [~z]B = [i; −i; 0].
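Finding coordinates amounts to solving a linear system, so the result can be checked numerically; a sketch using NumPy with the basis vectors as the columns of P:

```python
import numpy as np

# Basis vectors of B as columns; the coordinates satisfy P @ alpha = z.
P = np.array([[1, -1, 1j],
              [1 + 1j, -1j, 1j],
              [1 - 1j, -1 + 2j, 2 + 2j]])
z = np.array([2j, -2 + 1j, 3 + 2j])
alpha = np.linalg.solve(P, z)
print(alpha)   # approximately [1j, -1j, 0]
```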
EXAMPLE 8 Find the standard matrix of L : C2 → C3 given by L(z1, z2) = [iz1; z2; z1 + z2].
Solution: The standard basis for C2 is { [1; 0], [0; 1] }. We have L(1, 0) = [i; 0; 1] and
L(0, 1) = [0; 1; 1]. Hence,

[L] = [ L(1, 0)   L(0, 1) ] = [ i  0 ]
                              [ 0  1 ]
                              [ 1  1 ]
Of course, the concepts of determinants and invertibility are also the same.
EXAMPLE 9 Calculate det [1 + i  2  3; 0  −2i  −2 + 3i; 0  0  1 − i] and det [i  i  2i; 1  1  2; 1 + i  1 − i  3i].
Solution: We have

det [ 1+i   2     3     ]
    [ 0    −2i   −2+3i  ] = (1 + i)(−2i)(1 − i) = −4i
    [ 0     0     1−i   ]

det [ i     i     2i ]       [ i     i     2i ]
    [ 1     1     2  ] = det [ 0     0     0  ] = 0
    [ 1+i   1−i   3i ]       [ 1+i   1−i   3i ]

since performing R2 + iR1 (which does not change the determinant) produces a row of zeros.
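Both determinants can be verified numerically; a quick NumPy sketch:

```python
import numpy as np

U = np.array([[1 + 1j, 2, 3],
              [0, -2j, -2 + 3j],
              [0, 0, 1 - 1j]])
print(np.linalg.det(U))   # product of the diagonal entries: -4i

V = np.array([[1j, 1j, 2j],
              [1, 1, 2],
              [1 + 1j, 1 - 1j, 3j]])
print(np.linalg.det(V))   # 0, since the second row is -i times the first
```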
EXAMPLE 10 Let A = [1 + i  1; 1  2 − i] and B = [i  0  1; 1  i  1 − i; 0  1  1 + i]. Show that A and B are invertible and
find their inverses.
Solution: We have det A = (1 + i)(2 − i) − 1(1) = 2 + i ≠ 0, so A is invertible. Using the
formula for the inverse of a 2 × 2 matrix we found in Section 5.1 gives

A⁻¹ = 1/(2 + i) [ 2−i   −1  ]
                [ −1    1+i ]

Row reducing [B | I] to RREF gives

[ i   0   1    | 1  0  0 ]     [ 1   0   0 | 2−2i    −1     i ]
[ 1   i   1−i  | 0  1  0 ]  ~  [ 0   1   0 | 3−i     −1−i   i ]
[ 0   1   1+i  | 0  0  1 ]     [ 0   0   1 | −1−2i    i     1 ]

Since the RREF of B is I, B is invertible. Moreover,

B⁻¹ = [ 2−2i    −1     i ]
      [ 3−i     −1−i   i ]
      [ −1−2i    i     1 ]
EXERCISE 3 Let A = [1 + i  2  i; 2  3 − 2i  1 + i; 1 − i  1 − 3i  1 + i]. Find the determinant of A and A⁻¹.
Complex Conjugates
We saw in Section 11.1 that the complex conjugate was a very important and useful
operation on complex numbers. Thus, we should expect that it will also be useful
and important in the study of general complex vector spaces. Therefore, we extend
the definition of the complex conjugate to vectors in Cn and matrices in Mm×n(C).
DEFINITION
Complex Conjugate in Cn
For ~z ∈ Cn with entries z1, . . . , zn, we define \overline{~z} to be the vector with entries \overline{z1}, . . . , \overline{zn}.

DEFINITION
Complex Conjugate in Mm×n(C)
For A ∈ Mm×n(C) with entries zℓj, we define \overline{A} to be the matrix with entries \overline{zℓj}.
THEOREM 2 If A ∈ Mm×n(C) and ~z ∈ Cn, then \overline{A~z} = \overline{A} \overline{~z}.
EXAMPLE 11 Let ~w = [1; 2i; 1 − i], ~z = [1; 0; 1], and A = [1  1 − 3i; −i  2i]. Then

\overline{~w} = [1; −2i; 1 + i],   \overline{~z} = [1; 0; 1],   \overline{A} = [1  1 + 3i; i  −2i]
EXAMPLE 12 Is the mapping L : C2 → C2 defined by L(~z) = \overline{~z} a linear mapping?
Solution: Consider L(α~z1 + ~z2) = \overline{α~z1 + ~z2} = \overline{α} \overline{~z1} + \overline{~z2}, but \overline{α} ≠ α if α is not real. So,
it is not linear. For example,

L((1 + i)[1; 0]) = L([1 + i; 0]) = [1 − i; 0] ≠ [1 + i; 0] = (1 + i)L([1; 0])
Even though the complex conjugate of a vector in Cn does not define a linear mapping,
we will see that complex conjugates of vectors in Cn and matrices in Mm×n(C)
occur naturally in the study of complex vector spaces.
Section 11.2 Problems
1. Determine which of the following sets is a subspace of C3. Find a basis and the dimension of each subspace.
(a) S1 = { [z1; z2; z3] ∈ C3 | iz1 = z3 }
(b) S2 = { [z1; z2; z3] ∈ C3 | z1z2 = 0 }
(c) S3 = { [z1; z2; z3] ∈ C3 | z1 + z2 + z3 = 0 }
2. Find a basis for the four fundamental subspaces
of the following matrices.
(a) A = [ 1     i     ]
        [ 1+i   −1+i  ]
        [ −1    i     ]

(b) B = [ 1   1+i   −1 ]
        [ i   −1+i   i ]

(c) C = [ 1   1+i   3    ]
        [ 0   2     2−2i ]
        [ i   1−i   −i   ]
3. Let ~z ∈ Cn. Prove that there exist vectors ~x, ~y ∈ Rn such that ~z = ~x + i~y.
4. Let L : C3 → C2 be defined by

L(z1, z2, z3) = [ −iz1 + (1 + i)z2 + (1 + 2i)z3 ]
                [ (−1 + i)z1 − 2iz2 − 3iz3      ]

(a) Prove that L is linear and find the standard matrix of L.
(b) Determine a basis for the range and kernel of L.
5. Let L : C2 → C2 be defined by

L(z1, z2) = [ z1 + (1 + i)z2    ]
            [ (1 + i)z1 + 2iz2  ]

(a) Prove that L is linear.
(b) Determine a basis for the range and kernel of L.
(c) Use the result of part (b) to define a basis B for C2 such that [L]B is diagonal.
6. Calculate the determinant of

A = [ 1     −1      i    ]
    [ 1+i   −i      i    ]
    [ 1−i   −1+2i   1+2i ]

7. Find the inverse of

A = [ 1   1     2    ]
    [ 0   1+i   1    ]
    [ i   1+i   1+2i ]

8. Prove that for any n × n matrix A, we have det \overline{A} = \overline{det A}.
11.3 Complex Diagonalization
In Chapter 6, we saw that some matrices were not diagonalizable over R because
they had complex eigenvalues. But, if we extend all of our concepts of eigenvalues,
eigenvectors, and diagonalization to complex vector spaces, then we may be able to
diagonalize these matrices.
DEFINITION
Eigenvalue
Eigenvector
Let A ∈ Mn×n(C). If there exists λ ∈ C and ~z ∈ Cn with ~z ≠ ~0 such that A~z = λ~z, then
λ is called an eigenvalue of A and ~z is called an eigenvector of A corresponding to
λ. We call (λ,~z) an eigenpair.
All of the theory of similar matrices, eigenvalues, eigenvectors, and diagonalization
we did in Chapter 6 still applies in the complex case, except that we now allow the
use of complex eigenvalues and eigenvectors.
EXAMPLE 1 Diagonalize A = [0  1; −1  0] over C.
Solution: The characteristic polynomial is

C(λ) = det [ −λ    1  ] = λ² + 1
           [ −1   −λ  ]

Therefore, the eigenvalues of A are λ1 = i and λ2 = −i. For λ1 = i we get

A − iI = [ −i    1  ] ~ [ 1   i ]
         [ −1   −i  ]   [ 0   0 ]

Hence, a basis for Eλ1 is { [−i; 1] }. For λ2 = −i we get

A + iI = [ i    1 ] ~ [ 1   −i ]
         [ −1   i ]   [ 0    0 ]

So, a basis for Eλ2 is { [i; 1] }. Thus, letting P = [−i  i; 1  1] gives

P⁻¹AP = [ i    0  ]
        [ 0   −i  ]
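This diagonalization can be reproduced numerically; a sketch with NumPy, whose eig routine returns complex eigenpairs for a real matrix with non-real eigenvalues:

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [-1.0, 0.0]])
vals, vecs = np.linalg.eig(A)
print(vals)   # i and -i, in some order
```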
EXAMPLE 2 Diagonalize B = [i  1 + i; 1 + i  1] over C.
Solution: The characteristic polynomial is

C(λ) = det [ i−λ    1+i  ] = λ² − (1 + i)λ − i
           [ 1+i    1−λ  ]

Using the quadratic formula we get

λ = [(1 + i) ± √(6i)]/2 = [(1 + i) ± √6 · (1 + i)/√2]/2 = [(1 ± √3)/2](1 + i)

So, the eigenvalues are λ1 = [(1 + √3)/2](1 + i) and λ2 = [(1 − √3)/2](1 + i). For λ1 we get

B − λ1I = [ i − [(1+√3)/2](1+i)    1+i                  ] ~ [ −√3+i    2     ] ~ [ 1   −2/(√3 − i) ]
          [ 1+i                    1 − [(1+√3)/2](1+i)  ]   [ 2       −√3−i  ]   [ 0    0          ]

Then, a basis for Eλ1 is { [2; √3 − i] }. For λ2 we get

B − λ2I = [ i − [(1−√3)/2](1+i)    1+i                  ] ~ [ √3+i   2    ] ~ [ 1   2/(√3 + i) ]
          [ 1+i                    1 − [(1−√3)/2](1+i)  ]   [ 2      √3−i ]   [ 0   0          ]

Hence, a basis for Eλ2 is { [−2; √3 + i] }. Thus, B is diagonalized by

P = [ 2       −2    ]    to    D = [ λ1   0  ]
    [ √3−i    √3+i  ]              [ 0    λ2 ]
EXAMPLE 3 Diagonalize F = [3  2 + i; 2 − i  7] over C.
Solution: The characteristic polynomial is

C(λ) = det [ 3−λ    2+i  ] = λ² − 10λ + 16 = (λ − 2)(λ − 8)
           [ 2−i    7−λ  ]

So the eigenvalues are λ1 = 2 and λ2 = 8. For λ1 we get

F − λ1I = [ 1     2+i ] ~ [ 1   2+i ]
          [ 2−i   5   ]   [ 0   0   ]

Then, a basis for Eλ1 is { [−2 − i; 1] }. For λ2 we get

F − λ2I = [ −5    2+i ] ~ [ 1   −1/(2 − i) ]
          [ 2−i   −1  ]   [ 0    0         ]

Hence, a basis for Eλ2 is { [−1; −2 + i] }. Thus, F is diagonalized by

P = [ −2−i   −1   ]    to    D = [ 2   0 ]
    [ 1      −2+i ]              [ 0   8 ]
EXERCISE 1 Diagonalize A = [ −3  −1  4; −5  3  5; −8  −2  9 ] over C.
As usual, these examples teach us more than just how to diagonalize complex matrices.
In the first example we see that when the matrix has only real entries, then
the eigenvalues come in complex conjugate pairs. The second example shows that
when working with matrices with non-real entries our theory of symmetric matrices
for real matrices does not apply. In particular, we had Bᵀ = B, but the eigenvalues of
B are not real. On the other hand, observe that the matrix F has non-real entries but
the eigenvalues of F are all real.
THEOREM 1 Let A ∈ Mn×n(R). If λ is a non-real eigenvalue of A with corresponding eigenvector
~z, then \overline{λ} is also an eigenvalue of A with eigenvector \overline{~z}.
Proof: We have A~z = λ~z. Hence,

\overline{λ} \overline{~z} = \overline{λ~z} = \overline{A~z} = \overline{A} \overline{~z} = A\overline{~z}

since A is a real matrix. Thus, \overline{λ} is an eigenvalue of A with eigenvector \overline{~z}. □
COROLLARY 2 If A ∈ Mn×n(R) with n odd, then A has at least one real eigenvalue.
EXAMPLE 4 Find all eigenvalues of

A = [ 0    2   1 ]
    [ −2   3   0 ]
    [ 1    0   2 ]

given that λ1 = 2 + i is an eigenvalue of A.
Solution: Since A is a real matrix and λ1 = 2 + i is an eigenvalue of A, Theorem 1
tells us that λ2 = \overline{λ1} = 2 − i is also an eigenvalue of A. Finally, by Theorem 6.3.6,
we know that the sum of the eigenvalues is the trace of the matrix, so the remaining
eigenvalue, which must be real, is
λ3 = tr(A) − λ1 − λ2 = 5 − (2 + i) − (2 − i) = 1
It is easy to verify this result by checking that det(A − I) = 0.
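Both the conjugate-pair behaviour and the trace identity are easy to confirm numerically; a sketch with NumPy:

```python
import numpy as np

A = np.array([[0, 2, 1],
              [-2, 3, 0],
              [1, 0, 2]], dtype=float)
vals = np.linalg.eigvals(A)
print(sorted(vals, key=lambda v: (v.real, v.imag)))   # approximately [1, 2-i, 2+i]
print(np.isclose(vals.sum(), np.trace(A)))            # True: eigenvalues sum to tr(A) = 5
```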
Section 11.3 Problems
1. For each of the following matrices, either diagonalize the matrix over C, or show that it is not diagonalizable.

(a) [ 2     1+i ]
    [ 1−i   1   ]

(b) [ 3    5  ]
    [ −5   −3 ]

(c) [ 2    2   −1 ]
    [ −4   1    2 ]
    [ 2    2   −1 ]

(d) [ 1   i  ]
    [ i   −1 ]

(e) [ 2    1   −1 ]
    [ 2    1    0 ]
    [ 3   −1    2 ]

(f) [ 1+i   1   0  ]
    [ 1     1   −i ]
    [ 1     0   1  ]

(g) [ 5       −1+i   2i  ]
    [ −2−2i    2     1−i ]
    [ 4i      −1−i   −1  ]

(h) [ −6−3i   −2   −3−2i ]
    [ 10       2    5    ]
    [ 8+6i     3    4+4i ]

2. For any θ ∈ R, let

Rθ = [ cos θ   −sin θ ]
     [ sin θ    cos θ ]

(a) Diagonalize Rθ over C.
(b) Verify your answer in (a) is correct, by showing the matrix P and diagonal matrix
D from part (a) satisfy P⁻¹RθP = D for θ = 0 and θ = π/4.
3. Let A ∈ Mn×n(C) and let ~z be an eigenvector of A. Prove that \overline{~z} is an eigenvector of \overline{A}. What is the corresponding eigenvalue?
4. Suppose that a real 2 × 2 matrix A has 2 + i as an eigenvalue with a corresponding eigenvector [1 + i; i]. Determine A.
5. Prove that if A is a real 3 × 3 matrix with a non-real eigenvalue λ, then A is diagonalizable over C.
11.4 Complex Inner Products
The concepts of orthogonality and orthonormal bases were very useful in real vector
spaces. Hence, it makes sense to extend these concepts to complex vector spaces. In
particular, we would like to be able to define the concepts of orthogonality and length
in complex vector spaces to match those in the real case.
REMARK
In this section we leave the proofs of the theorems to the reader (or homework assignments/tests)
as they are essentially the same as in the real case.
We start by looking at how to define an inner product for Cn.
We first observe that if we extend the standard dot product to vectors in Cn, then this
does not define an inner product. Indeed, if ~z = ~x + i~y, then we have

~z · ~z = z1² + · · · + zn² = (x1² + · · · + xn² − y1² − · · · − yn²) + 2i(x1y1 + · · · + xnyn)

But, to be an inner product we require that 〈~z, ~z〉 be a non-negative real number so
that we can define the length of a vector. Since ~z · ~z does not even have to be real, it
cannot define an inner product.
Thinking of properties of complex numbers, one way we can ensure that we get
a non-negative real number when multiplying complex numbers is to multiply the
complex number by its conjugate. In particular, if for ~z, ~w ∈ Cn we define

〈~z, ~w〉 = z1\overline{w1} + · · · + zn\overline{wn} = ~z · \overline{~w}

then we get

〈~z, ~z〉 = z1\overline{z1} + · · · + zn\overline{zn} = |z1|² + · · · + |zn|²

which is not only a non-negative real number, but it is 0 if and only if ~z = ~0. Thus,
this is what we are looking for.

DEFINITION
Standard Inner Product on Cn
The standard inner product 〈 , 〉 on Cn is defined by

〈~z, ~w〉 = ~z · \overline{~w} = z1\overline{w1} + · · · + zn\overline{wn}

for any ~z, ~w ∈ Cn.
REMARK
In some other areas, for example engineering, one instead uses

〈~z, ~w〉 = \overline{~z} · ~w

as the definition of the standard inner product for Cn. Be warned that many computer
programs use the engineering definition of the inner product for Cn.
EXAMPLE 1 Let ~z = [1; 2 + i; 1 − i] and ~w = [i; 2; 1 + 3i]. Calculate 〈~z, ~z〉, 〈~z, ~w〉, and 〈~w, ~z〉.
Solution: We have

〈~z, ~z〉 = ~z · \overline{~z} = 1(1) + (2 + i)(2 − i) + (1 − i)(1 + i) = 1 + 5 + 2 = 8
〈~z, ~w〉 = ~z · \overline{~w} = 1(−i) + (2 + i)(2) + (1 − i)(1 − 3i) = 2 − 3i
〈~w, ~z〉 = ~w · \overline{~z} = i(1) + 2(2 − i) + (1 + 3i)(1 + i) = 2 + 3i
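The convention warning in the Remark above matters in practice: NumPy's vdot conjugates its first argument (the engineering convention), opposite to this book. A sketch:

```python
import numpy as np

z = np.array([1, 2 + 1j, 1 - 1j])
w = np.array([1j, 2, 1 + 3j])

# Textbook convention: conjugate the SECOND argument.
print(np.sum(z * w.conj()))   # 2 - 3i, matching <z, w> above

# numpy.vdot conjugates its FIRST argument instead.
print(np.vdot(z, w))          # 2 + 3i, i.e. <w, z> in the book's convention
```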
This example shows that the standard inner product for Cn is not symmetric. The
next example shows that it is also not bilinear.
EXAMPLE 2 Let ~z, ~w ∈ Cn. Calculate 〈α~z, ~w〉 and 〈~z, α~w〉 for any α ∈ C.
Solution: We have

〈α~z, ~w〉 = (α~z) · \overline{~w} = α(~z · \overline{~w}) = α〈~z, ~w〉
〈~z, α~w〉 = ~z · \overline{α~w} = \overline{α}(~z · \overline{~w}) = \overline{α}〈~z, ~w〉
EXERCISE 1 Let ~z = [1 + i; 1 − i; 2] and ~w = [1 − i; 2i; −2]. Find 〈~z, ~w〉 and 〈~w, ~z〉.
THEOREM 1 If ~v, ~z, ~w ∈ Cn and α ∈ C, then
(1) 〈~z, ~z〉 ∈ R, 〈~z, ~z〉 ≥ 0, and 〈~z, ~z〉 = 0 if and only if ~z = ~0
(2) 〈~z, ~w〉 = \overline{〈~w, ~z〉}
(3) 〈~v + ~z, ~w〉 = 〈~v, ~w〉 + 〈~z, ~w〉
(4) 〈α~z, ~w〉 = α〈~z, ~w〉
As we saw in the examples, this theorem shows that the standard inner product on
Cn does not satisfy the same properties as a real inner product. So, to define an inner
product on a general complex vector space, we should use these properties instead of
those for a real inner product.
DEFINITION
Hermitian Inner Product
Let V be a complex vector space. A Hermitian inner product on V is a function
〈 , 〉 : V × V → C such that for all ~v, ~w, ~z ∈ V and α ∈ C we have
(1) 〈~z, ~z〉 ∈ R, 〈~z, ~z〉 ≥ 0 for all ~z ∈ V, and 〈~z, ~z〉 = 0 if and only if ~z = ~0
(2) 〈~z, ~w〉 = \overline{〈~w, ~z〉}
(3) 〈~v + ~z, ~w〉 = 〈~v, ~w〉 + 〈~z, ~w〉
(4) 〈α~z, ~w〉 = α〈~z, ~w〉
A complex vector space with a Hermitian inner product is called a Hermitian inner
product space.
THEOREM 2 If 〈 , 〉 is a Hermitian inner product on a complex vector space V, then for all ~v, ~w, ~z ∈ V and α ∈ C we have

〈~z, ~v + ~w〉 = 〈~z, ~v〉 + 〈~z, ~w〉
〈~z, α~w〉 = \overline{α}〈~z, ~w〉
It is important to observe that this is a true generalization of the real inner product.
That is, we could use this for the definition of the real inner product since if 〈~z, ~w〉
and α are strictly real, then this will satisfy the definition of a real inner product.
THEOREM 3 The function 〈A, B〉 = tr(\overline{B}ᵀA) defines a Hermitian inner product on Mm×n(C). It is
called the standard inner product for Mm×n(C).
EXAMPLE 3 Let A = [1  i; 2  2 + i] and B = [−3  3 + i; −2i  0] ∈ M2×2(C). Find 〈A, B〉.
Solution: We have

〈A, B〉 = tr(\overline{B}ᵀA) = tr( [ −3    2i ] [ 1   i   ] ) = tr [ −3+4i   −2+i  ] = −3 + 4i + 1 + 3i = −2 + 7i
                                  [ 3−i   0  ] [ 2   2+i ]        [ 3−i     1+3i  ]
EXAMPLE 4 If A, B ∈ Mm×n(C) have entries aℓj and bℓj respectively, then the ( j, j)-entry of \overline{B}ᵀA is Σ_{ℓ=1}^{m} \overline{bℓj} aℓj. Hence, we get

tr(\overline{B}ᵀA) = Σ_{ℓ=1}^{m} aℓ1\overline{bℓ1} + Σ_{ℓ=1}^{m} aℓ2\overline{bℓ2} + · · · + Σ_{ℓ=1}^{m} aℓn\overline{bℓn} = Σ_{j=1}^{n} Σ_{ℓ=1}^{m} aℓj\overline{bℓj}

which corresponds to the standard inner product of the corresponding vectors under
the obvious isomorphism with Cmn.
REMARK
As in the real case, whenever we are working with Cn or Mm×n(C), if no other inner
product is specified, we will assume we are working with the standard inner product.
EXERCISE 2 Let A, B ∈ M2×2(C) with A = [1  1 − i; 2 + 2i  i] and B = [2 + i  1; 2i  1 + i]. Determine
〈A, B〉 and 〈B, A〉.
Length and Orthogonality
We can now define length and orthogonality to match the definitions in the real case.
DEFINITION
Length, Unit Vector
Let V be a Hermitian inner product space with inner product 〈 , 〉. For any ~z ∈ V we
define the length of ~z by

‖~z‖ = √〈~z, ~z〉

If ‖~z‖ = 1, then ~z is called a unit vector.
DEFINITION
Orthogonality
Let V be a Hermitian inner product space with inner product 〈 , 〉. For any ~z, ~w ∈ V
we say that ~z and ~w are orthogonal if 〈~z, ~w〉 = 0.
EXAMPLE 5 If ~u = [i; 1 + i; 2 − i] ∈ C3, then

‖~u‖ = √〈~u, ~u〉 = √(~u · \overline{~u}) = √(i(−i) + (1 + i)(1 − i) + (2 − i)(2 + i)) = √(1 + 2 + 5) = √8
EXAMPLE 6 Let ~z = [1; i; 2 + 3i] ∈ C3. Which of the following vectors are orthogonal to ~z:

~u = [−i; 1; 0],   ~v = [i; 1; 0],   ~w = [1 − 2i; −7; 1 − i]

Solution: We have

〈~z, ~u〉 = ~z · \overline{~u} = 1(i) + i(1) + (2 + 3i)(0) = 2i
〈~z, ~v〉 = ~z · \overline{~v} = 1(−i) + i(1) + (2 + 3i)(0) = 0
〈~z, ~w〉 = ~z · \overline{~w} = 1(1 + 2i) + i(−7) + (2 + 3i)(1 + i) = (1 + 2i) − 7i + (−1 + 5i) = 0

Hence, ~v and ~w are orthogonal to ~z, and ~u is not orthogonal to ~z.
Of course, these satisfy all of our familiar properties of length and orthogonality.
THEOREM 4 If V is a Hermitian inner product space with inner product 〈 , 〉, then for any ~z, ~w ∈ V
and α ∈ C we have
(1) ‖α~z‖ = |α| ‖~z‖
(2) ‖~z + ~w‖ ≤ ‖~z‖ + ‖~w‖
(3) (1/‖~z‖)~z is a unit vector in the direction of ~z (for ~z ≠ ~0)
EXERCISE 3 Let V be a Hermitian inner product space with inner product 〈 , 〉. Prove that

‖α~z‖ = |α| ‖~z‖

for all ~z ∈ V and α ∈ C.
DEFINITION
Orthogonal, Orthonormal
Let S = {~z1, . . . , ~zk} be a set in a Hermitian inner product space with Hermitian inner
product 〈 , 〉. S is said to be orthogonal if 〈~zℓ, ~zj〉 = 0 for all ℓ ≠ j. S is said to be
orthonormal if it is orthogonal and ‖~zj‖ = 1 for 1 ≤ j ≤ k.
THEOREM 5 If S = {~z1, . . . , ~zk} is an orthogonal set in a Hermitian inner product space V, then

‖~z1 + · · · + ~zk‖² = ‖~z1‖² + · · · + ‖~zk‖²
THEOREM 6 If S = {~z1, . . . ,~zk} is an orthogonal set of non-zero vectors in a Hermitian inner
product space V, then S is linearly independent.
The formulas for finding coordinates with respect to an orthogonal basis, for performing
the Gram-Schmidt procedure, and for finding projections and perpendiculars are
still valid for Hermitian inner products. However, be careful to remember that a Hermitian
inner product is not symmetric, so you need to have the vectors in the inner products
in the correct order.
EXERCISE 4 Let {~v1, . . . , ~vn} be an orthogonal basis for a Hermitian inner product space V. Prove
that if ~z ∈ V, then

~z = (〈~z, ~v1〉/‖~v1‖²)~v1 + · · · + (〈~z, ~vn〉/‖~vn‖²)~vn
EXAMPLE 7 Use the Gram-Schmidt procedure to find an orthogonal basis for the subspace S of
C4 defined by

S = Span{ [1; 0; 1; 0], [i; 1; i; 1], [1 + i; −1; i; 1 + i] } = Span{~w1, ~w2, ~w3}

Solution: First, we let ~v1 = ~w1. Then we have

~v2 = ~w2 − (〈~w2, ~v1〉/‖~v1‖²)~v1 = [i; 1; i; 1] − (2i/2)[1; 0; 1; 0] = [0; 1; 0; 1]

~v3 = ~w3 − (〈~w3, ~v1〉/‖~v1‖²)~v1 − (〈~w3, ~v2〉/‖~v2‖²)~v2
    = [1 + i; −1; i; 1 + i] − ((1 + 2i)/2)[1; 0; 1; 0] − (i/2)[0; 1; 0; 1]
    = [1/2; −1 − i/2; −1/2; 1 + i/2]

Consequently, {~v1, ~v2, ~v3} is an orthogonal basis for S.
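The procedure translates directly into code, provided every inner product conjugates the second argument. A sketch (assuming NumPy; gram_schmidt is a hypothetical helper, not from the text):

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthogonalize complex vectors using <z, w> = sum z_k * conj(w_k)."""
    basis = []
    for w in vectors:
        v = np.asarray(w, dtype=complex)
        for u in basis:
            # Subtract the projection of v onto u; conjugate the second argument.
            v = v - (np.sum(v * u.conj()) / np.sum(u * u.conj())) * u
        basis.append(v)
    return basis

w1 = np.array([1, 0, 1, 0])
w2 = np.array([1j, 1, 1j, 1])
w3 = np.array([1 + 1j, -1, 1j, 1 + 1j])
v1, v2, v3 = gram_schmidt([w1, w2, w3])
print(v2)   # [0, 1, 0, 1]
print(v3)   # [1/2, -1 - i/2, -1/2, 1 + i/2]
```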
We recall that a real square matrix with orthonormal columns is an orthogonal matrix.
Since orthogonal matrices were so useful in the real case, we generalize them as well.
DEFINITION
Unitary Matrix
If the columns of a matrix U form an orthonormal basis for Cn, then U is called
unitary.

THEOREM 7 If U ∈ Mn×n(C), then the following are equivalent:
(1) the columns of U form an orthonormal basis for Cn
(2) \overline{U}ᵀ = U⁻¹
(3) the rows of U form an orthonormal basis for Cn
EXAMPLE 8 Determine which of the following matrices are unitary:

A = [ 1    1+2i ],   B = [ (1+i)/√2   0   0 ],   C = [ (1+i)/2   (1−i)/√6   1/√6        ]
    [ −1   1+2i ]        [ 0          0   1 ]        [ i/2       0          (−3−3i)/√24 ]
                         [ 0          i   0 ]        [ 1/2       2i/√6      (1−i)/√24   ]

Solution: The columns of A are not unit vectors, so A is not unitary.
The columns of B are clearly unit vectors and orthogonal to each other, so the
columns of B form an orthonormal basis for C3. Hence, B is unitary.
We find that \overline{C}ᵀC = I and hence C is unitary.
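The check on C can be reproduced numerically; a sketch that tests the defining property (conjugate-transpose times the matrix equals the identity):

```python
import numpy as np

s6, s24 = np.sqrt(6), np.sqrt(24)
C = np.array([[(1 + 1j) / 2, (1 - 1j) / s6, 1 / s6],
              [1j / 2, 0, (-3 - 3j) / s24],
              [1 / 2, 2j / s6, (1 - 1j) / s24]])
print(np.allclose(C.conj().T @ C, np.eye(3)))   # True: C is unitary
```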
REMARK
Observe that if the entries of A are all real, then A is unitary if and only if it is
orthogonal.
THEOREM 8 If U and W are unitary matrices, then UW is a unitary matrix.
Notice this is the second time we have seen the conjugate of the transpose of a matrix.
So, we make the following definition.
DEFINITION
Conjugate Transpose
For A ∈ Mm×n(C) the conjugate transpose A∗ of A is defined to be

A∗ = \overline{A}ᵀ
EXAMPLE 9 If A = [ 1  0  1+i  −2i; 1−3i  2  2+i  i; i  −3i  4−i  1−5i ], then

A∗ = [ 1      1+3i   −i   ]
     [ 0      2      3i   ]
     [ 1−i    2−i    4+i  ]
     [ 2i     −i     1+5i ]
REMARK
Observe that if A is a real matrix, then A∗ = AT . Hence, the conjugate transpose is
the complex version of the transpose.
THEOREM 9 If A, B are matrices of the appropriate size, ~z, ~w ∈ Cn, and α ∈ C, then
(1) 〈A~z, ~w〉 = 〈~z, A∗~w〉
(2) (A∗)∗ = A
(3) (A + B)∗ = A∗ + B∗
(4) (αA)∗ = \overline{α}A∗
(5) (AB)∗ = B∗A∗
Section 11.4 Problems
1. Let ~u = [1; 2i; 1 − i], ~z = [2; −2i; 1 − i], and ~w = [1 + i; 1 − 2i; i] be vectors in C3.
Calculate the following.
(a) ‖~u‖   (b) ‖~w‖
(c) 〈~u, ~z〉   (d) 〈~z, ~u〉
(e) 〈~u, (2 + i)~z〉   (f) 〈~z, ~w〉
(g) 〈~w, ~z〉   (h) 〈~u + ~z, 2i~w − i~z〉
2. Let A = [1  2i; −i  1 + i], B = [i  2; 1 − i  3] ∈ M2×2(C). Calculate the following.
(a) 〈A, B〉   (b) 〈B, A〉
(c) ‖A‖   (d) ‖B‖
3. Determine which of the following matrices are unitary.
(a) \begin{bmatrix} i/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & -i/\sqrt{2} \end{bmatrix}   (b) \begin{bmatrix} (1+i)/2 & (1-i)/\sqrt{6} & (1-i)/\sqrt{12} \\ 1/2 & 0 & 3i/\sqrt{12} \\ 1/2 & 2i/\sqrt{6} & -i/\sqrt{12} \end{bmatrix}
(c) \begin{bmatrix} 1/\sqrt{2} & -1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix}   (d) \begin{bmatrix} 0 & i & 0 \\ i & 0 & 0 \\ 0 & 0 & (1+i)/2 \end{bmatrix}
4. Let U ∈ Mn×n(C). Prove that if the columns
of U form an orthonormal basis for Cn, then
U∗ = U−1.
5. Consider C3 with its standard inner product. Let ~z = \begin{bmatrix} 1+i \\ 2-i \\ -1+i \end{bmatrix}, ~w = \begin{bmatrix} 1-i \\ -2-3i \\ -1 \end{bmatrix}.
(a) Evaluate 〈~z, ~w〉 and 〈~w, 2i~z〉.
(b) Find a non-zero vector in Span{~z, ~w} that is orthogonal to ~z.
(c) Let ~u ∈ C3. Write the formula for the projection of ~u onto S = Span{~z, ~w}.
6. Use the Gram-Schmidt procedure to produce
an orthogonal basis for the subspace S of M2×2(C) defined by
S = Span\left\{ \begin{bmatrix} 1 & i \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} i & 0 \\ 0 & i \end{bmatrix}, \begin{bmatrix} 0 & 2 \\ 0 & i \end{bmatrix} \right\}
7. Let A, B ∈ Mm×n(C).
(a) Prove that (A∗)∗ = A.
(b) Prove that (αA)^* = \overline{α}A^* for all α ∈ C.
(c) Prove that (AB)∗ = B∗A∗ (for matrices of appropriate sizes).
(d) Prove that 〈A~z, ~w〉 = 〈~z, A∗~w〉 for all ~z, ~w ∈ Cn.
8. Let U be an n × n unitary matrix.
(a) Show that 〈U~z,U~w〉 = 〈~z, ~w〉 for any ~z, ~w ∈ Cn.
(b) Suppose λ is an eigenvalue of U. Use part (a) to show that |λ| = 1.
(c) Find a unitary matrix U which has eigenvalue λ where λ ≠ ±1.
9. Let V be a complex inner product space, with
complex inner product 〈 , 〉. Prove that if
〈~u,~v〉 = 0, then ‖~u + ~v ‖2 = ‖~u ‖2 + ‖~v ‖2. Is
the converse true?
11.5 Unitary Diagonalization
We have seen that orthogonal diagonalization is amazingly useful. Thus, it makes
sense to try to extend this to the complex case as well.
DEFINITION
Unitarily Similar
Let A, B ∈ Mn×n(C). If there exists a unitary matrix U such that U∗AU = B, then we
say that A and B are unitarily similar.
If A and B are unitarily similar, then they are similar. Consequently, all of our prop-
erties of similarity still apply.
DEFINITION
Unitarily Diagonalizable
A matrix A ∈ Mn×n(C) is said to be unitarily diagonalizable if it is unitarily similar
to a diagonal matrix.
In Section 10.2, we saw that every real symmetric matrix is orthogonally diagonaliz-
able. It is natural to ask if there is a comparable result in the case of matrices with
complex entries.
We observe that if A is a real symmetric matrix, then the condition AT = A is equiv-
alent to A∗ = A. Hence, this condition should be our analogue of symmetric.
DEFINITION
Hermitian matrix
A matrix A ∈ Mn×n(C) such that A∗ = A is called Hermitian.
EXAMPLE 1 Which of the following matrices are Hermitian?
A = \begin{bmatrix} 2 & 3-i \\ 3+i & 4 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 2i \\ -2i & 3-i \end{bmatrix}, \quad C = \begin{bmatrix} 0 & i & i \\ -i & 0 & i \\ -i & i & 0 \end{bmatrix}
Solution: We have A^* = \begin{bmatrix} 2 & 3-i \\ 3+i & 4 \end{bmatrix} = A, so A is Hermitian.
B^* = \begin{bmatrix} 1 & 2i \\ -2i & 3+i \end{bmatrix} ≠ B, so B is not Hermitian.
C^* = \begin{bmatrix} 0 & i & i \\ -i & 0 & -i \\ -i & -i & 0 \end{bmatrix} ≠ C, so C is not Hermitian.
Observe that if A is Hermitian, then we have \overline{(A)_{ℓj}} = (A)_{jℓ}, so the diagonal entries of A
must be real, and for ℓ ≠ j the ℓj-th entry must be the complex conjugate of the jℓ-th entry. Moreover, we see that every real symmetric matrix is Hermitian.
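The definition tests itself in one line of NumPy. The helper name `is_hermitian` is ours, not the book's; the three matrices are those of Example 1.

```python
import numpy as np

def is_hermitian(A: np.ndarray) -> bool:
    """A matrix is Hermitian exactly when it equals its conjugate transpose."""
    return bool(np.allclose(A, A.conj().T))

# The three matrices of Example 1:
A = np.array([[2, 3-1j], [3+1j, 4]])
B = np.array([[1, 2j], [-2j, 3-1j]])                      # diagonal entry 3-i is not real
C = np.array([[0, 1j, 1j], [-1j, 0, 1j], [-1j, 1j, 0]])   # off-diagonal entries fail the conjugate test
```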
Our goal now is to prove that every Hermitian matrix is unitarily diagonalizable. We
will do this in exactly the same way that we did in Section 10.2 for symmetric
matrices: by first proving that every square matrix is unitarily similar to an upper
triangular matrix. The following extension of the Triangularization Theorem is
extremely important and useful.
THEOREM 1 (Schur’s Theorem)
If A ∈ Mn×n(C), then A is unitarily similar to an upper triangular matrix T . Moreover,
the diagonal entries of T are the eigenvalues of A.
Proof: We prove this by induction. If n = 1, then A is upper triangular, so we
can take U = I. Assume the result holds for all (n − 1) × (n − 1) matrices. Let
λ1 be an eigenvalue of A with corresponding unit eigenvector ~z1. Extend {~z1} to an
orthonormal basis {~z1, . . . ,~zn} of Cn and let U1 = [~z1 · · · ~zn]. Then, U1 is unitary
and we have

U_1^* A U_1 = \begin{bmatrix} \overline{\vec z_1}^T \\ \vdots \\ \overline{\vec z_n}^T \end{bmatrix} A \begin{bmatrix} \vec z_1 & \cdots & \vec z_n \end{bmatrix}
= \begin{bmatrix}
\overline{\vec z_1}^T \lambda_1 \vec z_1 & \overline{\vec z_1}^T A \vec z_2 & \cdots & \overline{\vec z_1}^T A \vec z_n \\
\overline{\vec z_2}^T \lambda_1 \vec z_1 & \overline{\vec z_2}^T A \vec z_2 & \cdots & \overline{\vec z_2}^T A \vec z_n \\
\vdots & \vdots & & \vdots \\
\overline{\vec z_n}^T \lambda_1 \vec z_1 & \overline{\vec z_n}^T A \vec z_2 & \cdots & \overline{\vec z_n}^T A \vec z_n
\end{bmatrix}

For 1 ≤ j ≤ n, we have

\overline{\vec z_j}^T \lambda_1 \vec z_1 = \lambda_1 \overline{\vec z_j}^T \vec z_1 = \lambda_1 \overline{\vec z_j} \cdot \vec z_1 = \lambda_1 \vec z_1 \cdot \overline{\vec z_j} = \lambda_1 \langle \vec z_1, \vec z_j \rangle
Since {~z1, . . . ,~zn} is orthonormal, we get that \overline{\vec z_1}^T \lambda_1 \vec z_1 = \lambda_1 and \overline{\vec z_j}^T \lambda_1 \vec z_1 = 0 for 2 ≤ j ≤ n. Hence, we can write

U_1^* A U_1 = \begin{bmatrix} \lambda_1 & \vec b^T \\ \vec 0 & A_1 \end{bmatrix}

where A1 is an (n−1)×(n−1) matrix and ~b ∈ Cn−1. Thus, by the inductive hypothesis
we get that there exists a unitary matrix Q such that Q∗A1Q = T1 is upper triangular.

Let U_2 = \begin{bmatrix} 1 & \vec 0^T \\ \vec 0 & Q \end{bmatrix}, then U2 is clearly unitary and hence U = U1U2 is unitary. Thus,

U^* A U = (U_1 U_2)^* A (U_1 U_2) = U_2^* U_1^* A U_1 U_2
= \begin{bmatrix} 1 & \vec 0^T \\ \vec 0 & Q^* \end{bmatrix} \begin{bmatrix} \lambda_1 & \vec b^T \\ \vec 0 & A_1 \end{bmatrix} \begin{bmatrix} 1 & \vec 0^T \\ \vec 0 & Q \end{bmatrix}
= \begin{bmatrix} \lambda_1 & \vec b^T Q \\ \vec 0 & T_1 \end{bmatrix}

which is upper triangular. Since unitarily similar matrices have the same eigenvalues
and the eigenvalues of a triangular matrix are its diagonal entries, we get that the
eigenvalues of A are along the main diagonal of U∗AU as required. �
REMARK
Schur’s Theorem shows that for every n × n matrix A there exists a unitary matrix U
such that U∗AU = T is upper triangular. Since U∗ = U−1 we can solve this for A to
get A = UTU∗. This is called a Schur decomposition of A.
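The induction in the proof translates directly into a short recursive program. The sketch below uses NumPy's `eig` and `qr` (all names are ours); it is a teaching illustration of the proof, not a production algorithm, since numerical libraries compute the Schur form with the shifted QR iteration instead.

```python
import numpy as np

def schur_triangularize(A):
    """Schur decomposition following the proof's induction: take a unit
    eigenvector, extend it to an orthonormal basis of C^n by QR, deflate,
    and recurse on the trailing (n-1)x(n-1) block.  Returns (U, T) with
    U unitary and T = U* A U upper triangular."""
    A = np.asarray(A, dtype=complex)
    n = A.shape[0]
    if n == 1:
        return np.eye(1, dtype=complex), A.copy()
    # A unit eigenvector z1 of A (for the eigenvalue evals[0]).
    evals, evecs = np.linalg.eig(A)
    z1 = evecs[:, 0] / np.linalg.norm(evecs[:, 0])
    # Extend {z1} to an orthonormal basis: QR preserves the span of the
    # leading columns, so the first column of Q is z1 up to a unit scalar.
    Q, _ = np.linalg.qr(np.column_stack([z1, np.eye(n)])[:, :n])
    B = Q.conj().T @ A @ Q            # block form [[lambda_1, b^T], [0, A_1]]
    Q2, _ = schur_triangularize(B[1:, 1:])
    U2 = np.eye(n, dtype=complex)
    U2[1:, 1:] = Q2
    U = Q @ U2                        # U = U_1 U_2 as in the proof
    return U, U.conj().T @ A @ U

# Demo on a random 4x4 complex matrix.
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
U, T = schur_triangularize(A)
```

Up to rounding error, U is unitary, T is upper triangular, and the diagonal of T lists the eigenvalues of A.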
We can now prove the result corresponding to the Principal Axis Theorem.
THEOREM 2 (Spectral Theorem for Hermitian Matrices)
If A is Hermitian, then A is unitarily diagonalizable.
Proof: By Schur’s Theorem, there exists a unitary matrix U and an upper triangular
matrix T such that U∗AU = T . Since A∗ = A we get that
T ∗ = (U∗AU)∗ = U∗A∗U∗∗ = U∗AU = T
Since T is upper triangular, we get that T ∗ is lower triangular. Thus, T is both upper
and lower triangular and hence is diagonal. Consequently, U unitarily diagonalizes
A. �
Just like the Principal Axis Theorem, this proof does not give us an effective way
of unitarily diagonalizing a Hermitian matrix. But, not surprisingly, we find that
Hermitian matrices have the same properties as symmetric matrices.
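In NumPy this decomposition is what `np.linalg.eigh` computes: given a Hermitian matrix it returns real eigenvalues (in ascending order) together with a unitary matrix of eigenvectors. A quick check on the Hermitian matrix A of Example 1 (the variable names are ours):

```python
import numpy as np

A = np.array([[2, 3-1j],
              [3+1j, 4]])
eigenvalues, U = np.linalg.eigh(A)   # real eigenvalues, unitary eigenvector matrix

D = np.diag(eigenvalues)
is_diagonalization = np.allclose(U.conj().T @ A @ U, D)
```

Here C(λ) = λ² − 6λ − 2, so the eigenvalues are 3 ± √11, and `eigenvalues` comes back as a real array even though A is complex.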
THEOREM 3 A matrix A ∈ Mn×n(C) is Hermitian if and only if for all ~z, ~w ∈ Cn we have
〈A~z, ~w〉 = 〈~z, A~w〉
Proof: If A is Hermitian, then A∗ = A and hence by Theorem 11.4.9 we get
〈A~z, ~w〉 = 〈~z, A∗~w〉 = 〈~z, A~w〉
On the other hand, if 〈~z, A~w〉 = 〈A~z, ~w〉, then we have
\vec z^T \overline{A}\,\overline{\vec w} = \vec z^T \overline{A \vec w} = 〈~z, A~w〉 = 〈A~z, ~w〉 = (A\vec z)^T \overline{\vec w} = \vec z^T A^T \overline{\vec w}
Since this is valid for all ~z, ~w ∈ Cn, we can apply the Matrices Equal Theorem twice
to get \overline{A} = A^T, and hence A = A∗ as required. �
THEOREM 4 If A is a Hermitian matrix, then
(1) All the eigenvalues of A are real.
(2) If λ1 and λ2 are distinct eigenvalues with corresponding eigenvectors ~v1 and ~v2
respectively, then ~v1 and ~v2 are orthogonal.
Proof: We will prove (1) and leave (2) as an exercise. Let λ be an eigenvalue of A
with corresponding eigenvector ~z. Then,
λ〈~z,~z〉 = 〈λ~z,~z〉 = 〈A~z,~z〉 = 〈~z, A∗~z〉 = 〈~z, A~z〉 = 〈~z, λ~z〉 = \overline{λ}〈~z,~z〉
Since 〈~z,~z〉 ≠ 0 we have λ = \overline{λ}, so λ is real. �
REMARK
Since every real symmetric matrix is Hermitian, this result implies Lemma 10.2.2.
Thus, we can unitarily diagonalize a Hermitian matrix in exactly the same way that
we orthogonally diagonalized symmetric matrices. The amazing thing though is that
we are guaranteed to get a real matrix D.
EXAMPLE 2 Let A = \begin{bmatrix} -4 & 1-3i \\ 1+3i & 5 \end{bmatrix}. Verify that A is Hermitian and show that A is unitarily diagonalizable.

Solution: We see that A^* = \begin{bmatrix} -4 & 1-3i \\ 1+3i & 5 \end{bmatrix} = A, so A is Hermitian. We have

C(λ) = \begin{vmatrix} -4-λ & 1-3i \\ 1+3i & 5-λ \end{vmatrix} = λ^2 - λ - 30 = (λ+5)(λ-6)

Hence, the eigenvalues are λ1 = −5 and λ2 = 6. We have

A + 5I = \begin{bmatrix} 1 & 1-3i \\ 1+3i & 10 \end{bmatrix} \sim \begin{bmatrix} 1 & 1-3i \\ 0 & 0 \end{bmatrix}

So, a basis for E_{λ_1} is \left\{ \begin{bmatrix} -1+3i \\ 1 \end{bmatrix} \right\}. We also have

A - 6I = \begin{bmatrix} -10 & 1-3i \\ 1+3i & -1 \end{bmatrix} \sim \begin{bmatrix} 1 & -(1-3i)/10 \\ 0 & 0 \end{bmatrix}

So, a basis for E_{λ_2} is \left\{ \begin{bmatrix} 1-3i \\ 10 \end{bmatrix} \right\}. Observe that

\left\langle \begin{bmatrix} -1+3i \\ 1 \end{bmatrix}, \begin{bmatrix} 1-3i \\ 10 \end{bmatrix} \right\rangle = (-1+3i)(1+3i) + 1(10) = 0

Thus, the eigenvectors are orthogonal. Normalizing them, we get that A is diagonalized by the unitary matrix

U = \begin{bmatrix} (-1+3i)/\sqrt{11} & (1-3i)/\sqrt{110} \\ 1/\sqrt{11} & 10/\sqrt{110} \end{bmatrix}

to D = \begin{bmatrix} -5 & 0 \\ 0 & 6 \end{bmatrix}.
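The answer can be verified numerically (a convenience check of ours, not part of the text):

```python
import numpy as np

# The matrix A of Example 2 and the unitary matrix U found there.
A = np.array([[-4, 1-3j],
              [1+3j, 5]])
U = np.column_stack([np.array([-1+3j, 1]) / np.sqrt(11),
                     np.array([1-3j, 10]) / np.sqrt(110)])
D = U.conj().T @ A @ U   # should be diag(-5, 6)
```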
In the real case, we also had that every orthogonally diagonalizable matrix was sym-
metric. Unfortunately, the corresponding result is not true in the complex case as the
next example demonstrates.
EXAMPLE 3 Prove that A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} is unitarily diagonalizable, but not Hermitian.

Solution: Observe that A^* = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} ≠ A, so A is not Hermitian.

In Example 11.3.1, we showed that A is diagonalized by P = \begin{bmatrix} -i & i \\ 1 & 1 \end{bmatrix}. Observe that

\left\langle \begin{bmatrix} -i \\ 1 \end{bmatrix}, \begin{bmatrix} i \\ 1 \end{bmatrix} \right\rangle = (-i)(-i) + 1(1) = 0

so the columns of P are orthogonal. Hence, if we normalize them, we get that A is unitarily diagonalized by U = \begin{bmatrix} -i/\sqrt{2} & i/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix}.
The matrix in Example 3 satisfies A∗ = −A and so is called skew-Hermitian. Thus,
we see that there are matrices which are not Hermitian but whose eigenvectors form
an orthonormal basis for Cn and hence are unitarily diagonalizable. In particular, in
the proof of the Spectral Theorem for Hermitian Matrices, we can observe that the
reverse direction fails because if A is not Hermitian, we cannot guarantee that T ∗ = T
since some of the eigenvalues may not be real.
So, as we did in the real case, we should look for a necessary condition for a matrix
to be unitarily diagonalizable.
Assume that the eigenvectors of A ∈ Mn×n(C) form an orthonormal basis {~z1, . . . ,~zn} for
Cn. Then, we know that U = [~z1 · · · ~zn] unitarily diagonalizes A. That is, we
have U∗AU = D, where D is diagonal. Hence, D∗ = U∗A∗U. Now, notice that
DD∗ = D∗D since D and D∗ are diagonal. This gives
(U∗AU)(U∗A∗U) = (U∗A∗U)(U∗AU) ⇒ U∗AA∗U = U∗A∗AU
Consequently, if A is unitarily diagonalizable, then we must have AA∗ = A∗A.
DEFINITION
Normal Matrix
A matrix A ∈ Mn×n(C) is called normal if AA∗ = A∗A.
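A direct NumPy check of the definition (the helper name `is_normal` is ours): a Hermitian matrix and the skew-Hermitian matrix of Example 3 are both normal, while a Jordan block is not.

```python
import numpy as np

def is_normal(A: np.ndarray) -> bool:
    """A matrix is normal exactly when it commutes with its conjugate transpose."""
    A_star = A.conj().T
    return bool(np.allclose(A @ A_star, A_star @ A))

hermitian = np.array([[2, 3-1j], [3+1j, 4]])   # Hermitian, hence normal
skew = np.array([[0., 1.], [-1., 0.]])         # skew-Hermitian, also normal
not_normal = np.array([[1., 1.], [0., 1.]])    # a Jordan block fails the test
```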
THEOREM 5 (Spectral Theorem for Normal Matrices)
A matrix A is normal if and only if it is unitarily diagonalizable.
Proof: We proved that every unitarily diagonalizable matrix is normal above. So,
we just need to prove that every normal matrix is unitarily diagonalizable. Of course,
we do this using Schur’s Theorem.
By Schur’s Theorem, there exists an upper triangular matrix T and a unitary matrix
U such that U∗AU = T . We just need to prove that T is in fact diagonal. Observe
that
TT ∗ = (U∗AU)(U∗A∗U) = U∗AA∗U = U∗A∗AU = (U∗A∗U)(U∗AU) = T ∗T
Hence, T is also normal.
We now just need to prove that every upper triangular normal matrix is diagonal.
Write T as

T = \begin{bmatrix} t_{11} & t_{12} & \cdots & t_{1n} \\ 0 & t_{22} & \cdots & t_{2n} \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & t_{nn} \end{bmatrix}

Comparing the (1, 1)-entries of TT∗ and T∗T gives

|t_{11}|^2 + |t_{12}|^2 + \cdots + |t_{1n}|^2 = |t_{11}|^2

Notice that this is only possible if t12 = · · · = t1n = 0.

Therefore, we have T = \begin{bmatrix} t_{11} & 0 & \cdots & 0 \\ 0 & t_{22} & \cdots & t_{2n} \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & t_{nn} \end{bmatrix}.
Next, we compare the (2, 2)-entries of TT∗ and T∗T. We get
|t22|2 + · · · + |t2n|2 = |t22|2
Hence, t23 = · · · = t2n = 0.
Continuing in this way we get that T is diagonal as required. �
As a result normal matrices are very important. We now look at some useful proper-
ties of normal matrices.
THEOREM 6 If A ∈ Mn×n(C) is normal, then
(1) ‖A~z ‖ = ‖A∗~z ‖, for all ~z ∈ Cn
(2) A − λI is normal for every λ ∈ C
(3) If A~z = λ~z, then A^*~z = \overline{λ}~z
(4) If ~z1 and ~z2 are eigenvectors of A corresponding to distinct eigenvalues λ1 and λ2 of A, then ~z1 and ~z2 are orthogonal.
Proof: For (1), we use Theorem 11.4.9. For any ~z ∈ Cn we get
‖A∗~z ‖2 = 〈A∗~z, A∗~z〉 = 〈~z, (A∗)∗A∗~z〉 = 〈~z, AA∗~z〉 = 〈~z, A∗A~z〉 = 〈A~z, A~z〉 = ‖A~z‖2
For (2), we observe that

(A - λI)(A - λI)^* = (A - λI)(A^* - \overline{λ}I) = AA^* - \overline{λ}A - λA^* + |λ|^2 I
= A^*A - \overline{λ}A - λA^* + |λ|^2 I = (A^* - \overline{λ}I)(A - λI)
= (A - λI)^*(A - λI)

For (3), suppose that A~z = λ~z for some ~z ∈ Cn and let B = A − λI. Then, B is normal
by (2) and
B~z = (A − λI)~z = A~z − λ~z = ~0.
So, by (1) we get
0 = ‖B~z ‖ = ‖B^*~z ‖ = ‖(A^* - \overline{λ}I)~z ‖ = ‖A^*~z - \overline{λ}~z ‖
Thus, A^*~z = \overline{λ}~z, as required.
The proof of (4) is left as an exercise. �
Notice that property (4) shows us that the procedure for unitarily diagonalizing a
normal matrix is exactly the same as the procedure for orthogonally diagonalizing a
real symmetric matrix, except that the calculations are a little more complex.
EXAMPLE 4 Unitarily diagonalize A = \begin{bmatrix} 4i & 1+3i \\ -1+3i & i \end{bmatrix}.

Solution: We have C(λ) = λ2 − 5iλ + 6. Using the quadratic formula we get eigenvalues λ1 = 6i and λ2 = −i. For λ1 = 6i we get

A - λ_1 I = \begin{bmatrix} -2i & 1+3i \\ -1+3i & -5i \end{bmatrix} \;\Rightarrow\; \vec z_1 = \begin{bmatrix} 3-i \\ 2 \end{bmatrix}

For λ2 = −i we get

A - λ_2 I = \begin{bmatrix} 5i & 1+3i \\ -1+3i & 2i \end{bmatrix} \;\Rightarrow\; \vec z_2 = \begin{bmatrix} 1+3i \\ -5i \end{bmatrix}

Hence, taking U = \begin{bmatrix} (3-i)/\sqrt{14} & (1+3i)/\sqrt{35} \\ 2/\sqrt{14} & -5i/\sqrt{35} \end{bmatrix} gives U^*AU = \begin{bmatrix} 6i & 0 \\ 0 & -i \end{bmatrix}.
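Again the result can be verified numerically (a convenience check of ours):

```python
import numpy as np

# The matrix of Example 4 together with the unitary U found there.
A = np.array([[4j, 1+3j],
              [-1+3j, 1j]])
U = np.column_stack([np.array([3-1j, 2]) / np.sqrt(14),
                     np.array([1+3j, -5j]) / np.sqrt(35)])
D = U.conj().T @ A @ U   # should be diag(6i, -i)
```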
Section 11.5 Problems
1. Prove that every skew-Hermitian matrix is unitarily diagonalizable by:
(a) mimicking the proof of the Spectral Theorem for Hermitian Matrices.
(b) showing directly that every skew-Hermitian matrix is normal.
2. Prove that all the eigenvalues of a skew-
Hermitian matrix A are purely imaginary.
3. Prove that every unitary matrix is unitarily diagonalizable by:
(a) mimicking the proof of the Spectral Theorem for Hermitian Matrices.
(b) showing directly that every unitary matrix is normal.
4. Determine if the matrix is Hermitian, skew-Hermitian, and/or normal.
(a) \begin{bmatrix} 3 & 1-i \\ 1+i & 5 \end{bmatrix}   (b) \begin{bmatrix} 2 & -i \\ i & 1+i \end{bmatrix}
(c) \begin{bmatrix} 0 & 1-i \\ -1-i & 0 \end{bmatrix}   (d) \begin{bmatrix} 1-i & 2i \\ 2 & 3 \end{bmatrix}
(e) \begin{bmatrix} i & 2i \\ 2 & i \end{bmatrix}   (f) \begin{bmatrix} 0 & 3i \\ -3i & 1 \end{bmatrix}
5. Unitarily diagonalize the following matrices.
(a) A = \begin{bmatrix} a & b \\ -b & a \end{bmatrix}   (b) A = \begin{bmatrix} 2 & 1+i \\ 1-i & 3 \end{bmatrix}
(c) A = \begin{bmatrix} 5 & \sqrt{2}-i \\ \sqrt{2}+i & 3 \end{bmatrix}   (d) A = \begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}
(e) A = \begin{bmatrix} 1 & 0 & 1+i \\ 0 & 2 & 0 \\ 1-i & 0 & 0 \end{bmatrix}   (f) A = \begin{bmatrix} 1 & i & -i \\ -i & -1 & i \\ i & -i & 0 \end{bmatrix}
6. Let A ∈ Mn×n(C) be normal and invertible.
Prove that B = A∗A−1 is unitary.
7. Let A and B be Hermitian matrices. Prove that
AB is Hermitian if and only if AB = BA.
8. Let A = \begin{bmatrix} 2i & -2+i \\ 1-i & 3 \end{bmatrix}.
(a) Prove that λ1 = 2+ i is an eigenvalue of A.
(b) Find a unitary matrix U such that U∗AU =
T is upper triangular.
9. Let A ∈ Mn×n(C) satisfy A∗ = iA.
(a) Prove that A is normal.
(b) Show that every eigenvalue λ of A must satisfy λ = -i\overline{λ}.
10. Let A ∈ Mn×n(C), and let λ1, . . . , λn denote the
eigenvalues of A (repeated according to multi-
plicity). Prove that
det A = λ1 · · · λn, and tr A = λ1 + · · · + λn
11.6 Cayley-Hamilton Theorem
We now use Schur’s theorem to prove a result about the relationship of a matrix with
its characteristic polynomial.
THEOREM 1 (Cayley-Hamilton Theorem)
If A ∈ Mn×n(C), then A is a root of its characteristic polynomial C(λ).
Proof: The characteristic polynomial of A has the form
C(λ) = (−1)nλn + cn−1λn−1 + · · · + c1λ + c0
We now consider the polynomial
C(X) = (−1)nXn + cn−1Xn−1 + · · · + c1X + c0I
whose argument is a matrix X.
Now, by Schur’s theorem, there exists a unitary matrix U and upper triangular matrix
T such that U∗AU = T .
Since the eigenvalues λ1, . . . , λn of A are the diagonal entries of T we have
T = \begin{bmatrix} \lambda_1 & t_{12} & \cdots & t_{1n} \\ 0 & \lambda_2 & \cdots & t_{2n} \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & \lambda_n \end{bmatrix}
Moreover, we know that the roots of the characteristic polynomial are the eigenvalues
of A. Thus, we can factor C(X) as
C(X) = (X − λ1I)(X − λ2I) · · · (X − λnI)
Therefore,
C(T ) = (T − λ1I)(T − λ2I) · · · (T − λnI) (11.1)
We will show that C(T ) is the zero matrix by showing that the first k columns of
(T − λ1I)(T − λ2I) · · · (T − λkI) only contain zeros for 1 ≤ k ≤ n.
Observe that the first column of T − λ1I is zero since the first column of T is
\begin{bmatrix} \lambda_1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}.
Assume that all entries of the first k columns of (T − λ1I)(T − λ2I) · · · (T − λkI) only
contain zeros. Then, since the bottom n − k entries of the (k + 1)-st column of T − λk+1I are all zero, we get that
the first k + 1 columns of (T − λ1I) · · · (T − λkI)(T − λk+1I) only contain zeros. Thus,
by induction, C(T ) = On,n.
We now observe that since A = UTU∗ we have
C(A) = C(UTU∗)
= (−1)n(UTU∗)n + cn−1(UTU∗)n−1 + · · · + c1(UTU∗) + c0I
= (−1)nUT nU∗ + cn−1UT n−1U∗ + · · · + c1UTU∗ + c0UU∗
= U[(−1)nT n + cn−1T n−1 + · · · + c1T + c0I]U∗
= UC(T )U∗ = On,n
as required.
�
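The theorem is easy to test numerically. NumPy's `np.poly` (assumed here) returns the coefficients of det(λI − A), highest power first, which differs from the book's C(λ) = det(A − λI) only by the factor (−1)ⁿ, so A annihilates it as well; the matrix polynomial is evaluated by Horner's rule.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

coeffs = np.poly(A)          # monic characteristic polynomial of A
C_A = np.zeros_like(A)
for c in coeffs:             # Horner's rule with a matrix argument
    C_A = C_A @ A + c * np.eye(n)

cayley_hamilton_holds = np.allclose(C_A, 0, atol=1e-8)
```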
EXAMPLE 1 Verify that A = \begin{bmatrix} a & b \\ 0 & c \end{bmatrix} is a root of its characteristic polynomial.

Solution: Since A is upper triangular, we have C(λ) = (λ−a)(λ−c) = λ2−(a+c)λ+ac.
Then,

C(A) = A^2 - (a+c)A + acI
= \begin{bmatrix} a^2 & ab+bc \\ 0 & c^2 \end{bmatrix} - \begin{bmatrix} a^2+ac & ab+cb \\ 0 & ac+c^2 \end{bmatrix} + \begin{bmatrix} ac & 0 \\ 0 & ac \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}
EXERCISE 1 Verify that A = \begin{bmatrix} a & b & c \\ 0 & d & e \\ 0 & 0 & f \end{bmatrix}
is a root of its characteristic polynomial in two ways. First,
show it directly by using the method of Example 1. Second, show it by evaluating
C(A) = −(A − aI)(A − dI)(A − f I)
by multiplying from left to right as in the proof.
We now briefly look at one of the uses of the Cayley-Hamilton Theorem.
THEOREM 2 If A ∈ Mn×n(C) is invertible, then A−1 can be written as a linear combination of
powers of A.
Proof: Assume that the characteristic polynomial of A is
C(λ) = (−1)nλn + cn−1λn−1 + · · · + c1λ + c0
Observe that if c0 = 0, then λ1 = 0 is a root of C(λ) and hence det A = 0. But, this
would contradict the fact that A is invertible. So, c0 ≠ 0.
Then, by the Cayley-Hamilton Theorem we get
(−1)nAn + cn−1An−1 + · · · + c1A + c0I = On,n
Rearranging this gives

I = -\frac{1}{c_0}\left( (-1)^n A^n + c_{n-1}A^{n-1} + \cdots + c_1 A \right)
= -\frac{1}{c_0}\left( (-1)^n A^{n-1} + c_{n-1}A^{n-2} + \cdots + c_1 I \right) A

Hence,

A^{-1} = -\frac{1}{c_0}\left( (-1)^n A^{n-1} + c_{n-1}A^{n-2} + \cdots + c_1 I \right)
as required. �
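The formula in the proof can be turned into a short function. This is a sketch of ours using `np.poly` for the characteristic coefficients; since `np.poly` returns the monic polynomial det(λI − A), the constant term is (−1)ⁿ det A and the same rearrangement applies.

```python
import numpy as np

def inverse_via_char_poly(A):
    """A^{-1} from the Cayley-Hamilton Theorem: with monic coefficients
    [1, c_{n-1}, ..., c_1, c_0] of det(lambda*I - A), we have
    A^{-1} = -(A^{n-1} + c_{n-1}A^{n-2} + ... + c_1 I)/c_0."""
    n = A.shape[0]
    coeffs = np.poly(A)
    c0 = coeffs[-1]              # equals (-1)^n det A, nonzero when A is invertible
    B = np.zeros_like(A, dtype=complex)
    for c in coeffs[:-1]:        # Horner's rule on the truncated polynomial
        B = B @ A + c * np.eye(n)
    return -B / c0

# Demo on a random invertible 3x3 complex matrix.
rng = np.random.default_rng(3)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
A_inv = inverse_via_char_poly(A)
```

The result agrees with `np.linalg.inv` up to rounding error, which is exactly the statement that A⁻¹ is a linear combination of I, A, ..., Aⁿ⁻¹.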
Section 11.6 Problems
1. Given that the characteristic polynomial of A = \begin{bmatrix} 1 & 1 & 1 \\ 0 & -1 & -1 \\ -1 & 2 & 0 \end{bmatrix} is
C(λ) = λ3 + 2λ − 2, write A−1 as a linear combination of powers of A.
2. Determine A3 given that the characteristic polynomial of A = \begin{bmatrix} 3 & -8 & -2 \\ -8 & -9 & -4 \\ -2 & -4 & 6 \end{bmatrix} is
C(λ) = λ3 − 147λ + 686.
3. Find a 3 × 3 matrix A that has characteristic polynomial λ3 − 3λ + 2.
INDEX
C(a, b), 112
k-flat, 21
L, 107
Mm×n(R), 72, 111
Pn(R), 111
P(R), 112
Rn, 1
operations on, 2
properties of, 4
subspaces of, 23
Adjugate, 174
Algebraic Multiplicity, 192
Angle in Rn, 33
Approximation Theorem, 273
Augmented Matrix, 54
B-Matrix, 183
Basis
algorithm for finding, 126
for subsets of Rn, 16, 26
for a vector space, 124
Block Matrix, 84
Block Multiplication, 84
Cartesian Product, 113
Cayley-Hamilton Theorem, 353
Change of Coordinates Matrix, 138
Characteristic Polynomial, 191
Coefficient Matrix, 54
Cofactor, 161
inverse by, 173
Cofactor Expansion, 165
Cofactor Matrix, 174
Complex Conjugates
of complex numbers, 322
of complex vectors, 331
of complex matrices, 331
Composition, 108, 218
Coordinate Vector, 133
Column Space, 102, 207
Consistent System, 49
Cramer’s Rule, 176
Cross Product, 35
properties of, 36
Determinant
2 × 2 matrix, 147, 160
3 × 3 matrix, 161
n × n matrix, 162
Diagonal Matrix, 185
Diagonalization, 197
algorithm, 200
orthogonal, 284
unitary, 345
Diagonalization Theorem, 197
Dimension, 130
Dimension Theorem, 212
Direct Sum, 270
Dot Product, 31
Eigenpair, 188, 333
Eigenspace, 191
Eigenvalue, 188, 333
Eigenvector, 188, 333
Elementary Matrix, 151
Elementary Row Operations (EROs), 57
Equivalent System, 54
Four Fundamental Subspaces, 103, 207
Free Variable, 62
Fundamental Theorem of Linear Algebra, 272
Geometric Multiplicity, 192
Gauss-Jordan Elimination, 59
Gram-Schmidt Procedure, 257
Hermitian Inner Product, 338
Hermitian Inner Product Space, 338
Hermitian Matrix, 345
Homogeneous System, 65
Hyperplane in Rn, 20
Identity Mapping, 109
Identity Matrix, 83
Inconsistent System, 49
Infinite Dimensional, 130
Inner Product, 237
Inner Product Space, 237
Invertible Matrix, 144
Invertible Matrix Theorem, 148, 170
Isomorphism, 232
Kernel, 99, 221
Least Squares Method, 275
Left Inverse, 143
Left Nullspace, 103, 207
Length, 32, 242, 340
properties of, 33
Line in Rn, 20
Linearity Property, 89
Linear Combination, 3
Linear Equation, 48
Linear Mappings L : Rn → Rm, 89
B-matrix, 183
eigenvalues, 188
kernel, 99
operations on, 106, 108
properties of, 107
range, 96
standard matrix, 91
Linear Mappings L : V→W, 215
matrix of, 225, 227
kernel, 221
one-to-one, 230
onto, 230
operations on, 217, 218
properties of, 218
range, 221
rank, 222
Linear Operator, 89, 215
Linearly Dependent/Independent, 13, 121
Lower Triangular, 167
Matrices Equal Theorem, 82
Matrix, 72
addition, 72
augmented, 54
coefficient, 54
column space, 102
conjugate transpose, 343
diagonalizable, 197
eigenvalues, 188
Hermitian, 345
inverse, 144
left nullspace, 103
mappings, 88
normal, 349
of a linear mapping, 183
orthogonal, 253
nullspace, 101
properties of, 73
row space, 103
scalar multiplication, 72
similar, 186
transpose, 74
unitary, 342
Matrix-Matrix Multiplication, 80
block multiplication, 84
identity, 83
properties of, 81
Matrix-Vector Multiplication, 75, 77
properties of, ??
Normal Equations, 274
Normal Matrix, 349
Normal System, 274
Normal Vector, 38
Normalizing, 244
Nullity, 222
Nullspace, 101, 207
One-to-one, 230
Onto, 230
Orthogonal Complement, 262
Orthogonal Matrix, 253
Orthogonal Vectors, 34, 244, 340
Orthogonal Basis, 246
Orthogonal Set, 34, 245, 341
Orthogonally Similar, 280
Orthonormal Basis, 248
Orthonormal Set, 248, 341
Parallelotope, 181
Perpendicular (of a projection), 43, 44, 266
Plane in Rn, 20
Projection
onto a line, 42
onto a plane, 44
onto a subspace, 266
QR-Decomposition, 260
Quadratic Form, 292
classifications, 294
diagonal, 294
principal axes, 301
Range, 96, 221
Rank
of a linear mapping, 222
of a matrix, 67
Rank-Nullity Theorem, 223
Reduced Row Echelon Form (RREF), 58
Reflections, 93
Right Inverse, 143
Rotation in R2, 93
Rotation Matrix, 93
Row Equivalent, 57
Row Reducing Algorithm, 62
Row Space, 103, 207
Scalar Equation of a Plane, 37, 38
Scalar Equation of a Hyperplane, 40
Scalar Product, 31
Schur’s Theorem, 346
Similar Matrices, 186
orthogonally, 280
unitarily, 345
Singular Value Decomposition, 315
Singular Values, 310
Singular Vectors, 312
Skew-Symmetric, 117
Solution/Solution Set, 49
Solution Space, 67
Solution Theorem, 69
Span
in Rn, 7
in a vector space, 118
Spectral Theorem
for Hermitian matrices, 347
for normal matrices, 350
Square Matrix, 144
Standard Basis
for Rn, 17
for M2×2(R), 124
for Pn(R), 124
Standard Inner Product
for Cn, 337
for Rn, 31
Standard Matrix, 91
Subspace
of Rn, 23
of a vector space over R, 115
bases of, 26
Subspace Test, 23, 115
Symmetric Matrix, 284
System of Linear Equations, 49
algorithm for solving, 62
System Rank Theorem, 68
Trace, 186, 216
Transpose, 74
properties of, 74
Triangularization Theorem, 281
Trivial Solution, 13
Trivial Vector Space, 112
Unit Vector, 33, 244, 340
Unitarily Diagonalizable, 345
Unitarily Similar, 345
Unitary Matrix, 342
properties of, 334
Upper Triangular, 167
Vector Equation, 7
Vector Space over C, 326
subspace of, 327
Vector Space over R, 110
subspace of, 115
bases of, 124