
10. Euclidean Spaces

Most of the spaces used in traditional consumer, producer, and general equilibrium theory will be Euclidean spaces—spaces where Euclid's geometry rules.

At this point, we have to start being a little more careful how we write things. We will start with the space R^n, the space of n-vectors, n-tuples of real numbers. When we are being picky, we write them vertically, as if they are n × 1 matrices:

\[ x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \]

just as we did when writing linear systems as matrix products, Ax = b. The x_i are referred to as the coordinates of the vector.


10.1 Vectors Vertical and Horizontal

Sometimes, especially in text, we will be informal and write vectors horizontally, but this is not strictly correct. I will sometimes attach a transpose symbol to remind you they should be vertical. Truly horizontal "vectors" are called co-vectors or covariant vectors. They differ from vectors in how they transform if you change coordinates. Ordinary vectors are sometimes called contravariant vectors.

We can see this distinction when thinking of commodity vectors and their associated price vectors. Suppose we decide to measure milk in quarts rather than gallons. The quantities of all the different types of milk (skim, 2% milkfat, whole, chocolate, etc.) would all have to be multiplied by 4 to reflect the change in measurement, because there are 4 quarts in a gallon.

This is not how this measurement change affects prices. No, no, no! Price "vectors" are actually covectors, and if milk costs $2.40 per gallon, that's $0.60 per quart. We have to divide the prices by 4 rather than multiplying by 4. This is the essence of the distinction between vectors and covectors.

If we were being really picky, which we won't, the vectors would use superscripts for their coordinates and the covectors would use subscripts. Although that is a useful convention in geometry and physics, it conflicts with other useful conventions in economics. One such is to use superscripts to indicate ownership—which consumer a commodity vector belongs to, or which firm is using inputs and producing outputs.


10.2 Vectors

In R^2, a vector x = (1, 2)^T looks like this:

[Figure 10.2.1: Left panel: x plotted as the point (1, 2) in the (x_1, x_2) plane. Right panel: the same x drawn as an arrow indicating a direction.]

Although we will mostly treat vectors as points in the plane, as in the left panel of Figure 10.2.1, it is often useful to think of them as indicating a direction, which x does in the right panel of Figure 10.2.1. In mathematics this is sometimes indicated by using vector bundles. A vector bundle indicates the starting point as well as the direction. We will not make explicit use of vector bundles.


10.3 Vector Addition

Algebraically, vector addition is just matrix addition. We add the components.

\[ \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} + \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_n + y_n \end{pmatrix}. \]

When we add vectors on a diagram, we add them nose to tail, placing the starting point of the second vector at the end of the first.

[Figure 10.3.1: Two ways to add vectors. Here we add a = (1, 2)^T to b = (2, 1)^T. The upper combination is a + b and the lower b + a. Of course, both end at the same point, (3, 3)^T, because matrix, and thus vector, addition is commutative. The red vector is the sum a + b. In fact, we can read off the diagram that a = (1, 2)^T and b = (2, 1)^T.]


10.4 Scalar Multiplication

We can also multiply vectors by scalars. We still use the rules of matrix algebra so

\[ \alpha \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} \alpha x_1 \\ \alpha x_2 \\ \vdots \\ \alpha x_n \end{pmatrix}. \]

The diagram illustrates this graphically.

[Figure 10.4.1: Scalar multiplication. Here a = (−1.5, 1)^T is multiplied by 2. Multiplying by a larger number would extend the line further. Multiplying by a smaller number would shrink the vector toward the origin. Finally, multiplying by a negative number reverses the direction, as illustrated by −a.]
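These componentwise rules are exactly what numerical libraries implement, so they are easy to check. A minimal sketch in Python; numpy is an assumption of the sketch, not something the notes rely on:

```python
import numpy as np

a = np.array([1.0, 2.0])    # a = (1, 2)^T from Figure 10.3.1
b = np.array([2.0, 1.0])    # b = (2, 1)^T

# Vector addition is componentwise, so it commutes, as in Figure 10.3.1.
assert np.allclose(a + b, b + a)      # both equal (3, 3)^T

# Scalar multiplication rescales every component (Figure 10.4.1).
c = np.array([-1.5, 1.0])
print(2 * c)     # [-3.  2.]   twice as long, same direction
print(-1 * c)    # [ 1.5 -1.]  same length, opposite direction
```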


10.5 Coordinate Vectors in R^n

The standard basis vectors in R^n, e_k, are defined by e_k = (δ_{ik}), so

\[ e_1 = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad e_2 = \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix}, \quad \cdots, \quad e_n = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}. \]

These vectors are also referred to as the canonical basis vectors. We can write any vector x as a sum of basis vectors,

\[ x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = x_1 e_1 + x_2 e_2 + \cdots + x_n e_n = \sum_{i=1}^n x_i e_i. \]

The sum can also be written as a matrix product.

\[ x = \begin{pmatrix} e_1 & e_2 & \cdots & e_n \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = I_n x. \]


10.6 Linear Transformations

We are now ready to show that any linear transformation T : R^n → R^m can be written in matrix form.

Theorem 10.6.1. Let T : R^n → R^m be a linear transformation. Then there is an m × n matrix A with T = T_A.

Proof. We can write x = Σ_{j=1}^n x_j e_j. Then T(x) = Σ_{j=1}^n x_j T(e_j) for any vector x.

The vector T(e_j) is in R^m. We denote its ith component by T(e_j)_i, i = 1, ..., m. Define the m × n matrix A by setting a_{ij} = T(e_j)_i. Thus A = [T(e_j)_i].

With this definition of A, T(x)_i = Σ_{j=1}^n a_{ij} x_j, so T(x) = Ax. This shows that T = T_A. □

This means that the matrix

\[ A = \begin{pmatrix} T(e_1) & T(e_2) & \cdots & T(e_n) \end{pmatrix} \]

represents T.


10.7 Matrix Representations of Linear Transformations

◮ Example 10.7.1: Linear Transformation as Matrix. Suppose T : R^2 → R^3 is a linear transformation with

\[ T(e_1) = \begin{pmatrix} 1 \\ 0 \\ 3 \end{pmatrix} \quad\text{and}\quad T(e_2) = \begin{pmatrix} -1 \\ 1 \\ 2 \end{pmatrix}. \]

Then it can be represented by the matrix

\[ A = \begin{pmatrix} 1 & -1 \\ 0 & 1 \\ 3 & 2 \end{pmatrix} \]

so that

\[ Ax = \begin{pmatrix} x_1 - x_2 \\ x_2 \\ 3x_1 + 2x_2 \end{pmatrix}. \] ◭
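The construction in the proof of Theorem 10.6.1 is easy to carry out numerically: the columns of A are the images of the basis vectors. A small Python sketch of Example 10.7.1 (numpy assumed):

```python
import numpy as np

def T(x):
    """The linear map of Example 10.7.1, written out componentwise."""
    x1, x2 = x
    return np.array([x1 - x2, x2, 3 * x1 + 2 * x2])

# Column j of A is T(e_j), exactly as in the proof of Theorem 10.6.1.
E = np.eye(2)
A = np.column_stack([T(E[:, j]) for j in range(2)])
print(A)    # [[ 1. -1.]
            #  [ 0.  1.]
            #  [ 3.  2.]]

x = np.array([2.0, 5.0])
assert np.allclose(A @ x, T(x))   # T = T_A at this sample point
```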


10.8 Row Vectors vs. Column Vectors

I'm insisting on writing vectors as columns. One reason I gave earlier was that we can then write our linear systems with coefficient matrix A as Ax = b.

But you might object that you could just take the transpose, obtaining x^T A^T = b^T, and redefine the coefficient matrix to be A^T. That may seem slightly unnatural because we would always put the variable on the left, but math is sometimes done that way, particularly in some areas of algebra.

The real issue is that we need to maintain a distinction between row vectors and column vectors.

Suppose we had a linear function f that maps R^n into R. Such a linear function is called a linear functional. As we just saw, that kind of linear function can be represented by a 1 × n matrix

\[ A = \begin{pmatrix} f(e_1) & f(e_2) & \cdots & f(e_n) \end{pmatrix}. \]

If we write the vectors as columns, the linear functionals become row vectors. Thus the row (covariant) vector (1, 3, 0, 1) defines a linear functional on (contravariant) vectors in R^4 by

\[ \begin{pmatrix} 1 & 3 & 0 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = x_1 + 3x_2 + x_4. \]

Which one is which is completely arbitrary. But the weight of mathematical tradition, stretching back at least to the 19th century, is to make the regular vectors columns and the linear functionals rows. It's implicit in the Ricci calculus that Einstein used to develop the general theory of relativity.

In economics, vectors of goods should be represented as columns, but price vectors are properly represented as rows. Thus each set of prices defines a linear functional on the space of goods. If we purchase two different commodity bundles, the cost of the pair is the sum of their costs, and if we purchase a multiple of a commodity bundle under the same price system, its cost is the same multiple of the original cost. The cost of a bundle is both additive and homogeneous in the bundle, so it is a linear functional on the commodity space.
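The quarts-versus-gallons story from Section 10.1 can be replayed numerically. The sketch below (Python with numpy assumed; the prices and quantities are made up for illustration) shows quantities and prices transforming oppositely, leaving the value of the linear functional unchanged:

```python
import numpy as np

p = np.array([2.40, 1.10])   # prices per gallon: the covector (row)
x = np.array([3.0, 2.0])     # quantities in gallons: the vector (column)
cost = p @ x                 # the linear functional p applied to x

# Re-measure in quarts: quantities multiply by 4, prices divide by 4.
x_quarts = 4.0 * x           # vectors transform one way...
p_quarts = p / 4.0           # ...covectors transform the opposite way
assert np.isclose(p_quarts @ x_quarts, cost)   # spending is unchanged
```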


10.9 Vector Spaces: Definition

Both R^n and C^n, the set of complex n-tuples or n-vectors, are examples of vector spaces. But what are vector spaces? They are sets of things called vectors that can be added together and multiplied by scalars (numbers). The addition and multiplication have to obey certain rules.

Vector Space. A vector space (V, +, ·) over a number field F (the set of scalars) is a set of vectors V with two operations: vector addition, which defines x + y ∈ V for all vectors x, y ∈ V; and scalar multiplication, which defines αx for any scalar α ∈ F and any vector x ∈ V. Scalar multiplication and vector addition obey the following properties:

1. For all x, y ∈ V, x + y = y + x (addition commutes).
2. For all x, y, z ∈ V, (x + y) + z = x + (y + z) (addition associates).
3. There exists a unique vector 0 ∈ V with x + 0 = 0 + x = x (additive identity).
4. For all x ∈ V, there is a unique vector −x ∈ V with x + (−x) = 0 (additive inverse).
5. For all x ∈ V, 1x = x (multiplicative identity).
6. For all α, β ∈ F and x ∈ V, α(βx) = (αβ)x (scalar multiplication associates).
7. For all α ∈ F and x, y ∈ V, α(x + y) = αx + αy (distributive law I).
8. For all α, β ∈ F and x ∈ V, (α + β)x = αx + βx (distributive law II).

A vector space can be defined over any number field F, such as the rational numbers, the real numbers, or the complex numbers.¹ Vector spaces over the real numbers are called real vector spaces and vector spaces over the complex numbers are called complex vector spaces. The vector space R^n is the set of all n-tuples of real numbers, x = (x_1, x_2, ..., x_n)^T. Here vector addition is defined componentwise, with x + y = (x_1 + y_1, x_2 + y_2, ..., x_n + y_n)^T, and scalar multiplication is given by αx = (αx_1, αx_2, ..., αx_n)^T.

1 There are many other number fields and many other types of vector spaces.


10.10 More Vector Spaces

Vector spaces do not have to be n-tuples of numbers. They can be anything that we can add and multiply by a scalar in a way that obeys the vector space axioms.

◮ Example 10.10.1: Spaces of Continuous Functions. If A is a set, the set of real-valued continuous functions on A is a vector space with the usual addition of functions and multiplication by real numbers: (f + g)(x) = f(x) + g(x) and (αf)(x) = αf(x).

Interestingly, the set of complex-valued functions on A can be regarded as either a real or a complex vector space.

If A is an open set, we can consider the vector space of continuously differentiable, or even k-times continuously differentiable, functions. These are denoted C^k(A). The space of continuous functions is denoted C^0 or sometimes just C. ◭

Sequence spaces are also commonly used in economics.

◮ Example 10.10.2: A Sequence Space. Another commonly used vector space is s, the space of sequences of real numbers. Elements of s are of the form (x_1, x_2, x_3, ...) where each x_i ∈ R. Vectors are added componentwise, and scalar multiplication is defined componentwise. This and similar spaces often appear in optimal growth models and the dynamic general equilibrium models used in macroeconomics. ◭


10.11 Vector Subspaces

We say that W is a vector subspace of V if W ⊂ V is closed under the vector space operations of vector addition and scalar multiplication (that is, x + y ∈ W when x, y ∈ W, and αx ∈ W when x ∈ W and α ∈ F).

When W is a vector subspace of V, it can also be regarded as a vector space in its own right, as it inherits all the vector space properties of V. Thus W = {x ∈ R^n : x_1 + 3x_2 = 0} is a real vector subspace of R^n and hence a real vector space. Many of our vector spaces will be subspaces of R^n.

Both requirements for being a subspace can be checked simultaneously.

Theorem 10.11.1. Let V be a vector space over F. If αx + y ∈ W for any x, y ∈ W ⊂ V and any α ∈ F, then W is a vector subspace of V.

Proof. We need only show that x + y ∈ W and αx ∈ W for any α ∈ F and x, y ∈ W. For the first, set α = 1 in the hypothesis. For the second, we must first show that 0 ∈ W. Set x = y and α = −1 to find that 0 = −x + x ∈ W. Then set y = 0 in the hypothesis. □


10.12 Subspaces of Sequence Spaces

◮ Example 10.12.1: Subspaces of Sequence Spaces. Some vector subspaces of s include c, the space of convergent sequences; c_0, the space of sequences that converge to 0 (lim_n x_n = 0); and c_00, the space of sequences that are zero except for finitely many terms (given x ∈ c_00, there is some N with x_n = 0 for n ≥ N). It is straightforward to verify these are all subspaces of s.

The vectors

\[ \{1, 8, 27, \ldots, n^3, \ldots\} \quad\text{and}\quad \{1, e, e^2, \ldots, e^n, \ldots\} \]

are in s, but not c. The possibility of arbitrary growth rates is why we cannot find a norm for s.

The sequences

\[ \{2, 3/2, 4/3, \ldots, 1 + 1/n, \ldots\} \]
\[ \{1, 1/4, 1/9, \ldots, 1/n^2, \ldots\} \]
\[ \{1, 2, 3, 4, 0, \ldots, 0, \ldots\} \]

are respectively in c but not c_0; in c_0 but not c_00; and in c_00. These spaces are nested, so

\[ c_{00} \subset c_0 \subset c \subset s. \]

The set s_b of sequences that are bounded, but not necessarily convergent, is another sequence space; it sits between c and s. All of the spaces except for s can be normed by ‖x‖_∞ = sup_n |x_n|, where sup denotes the supremum, the least upper bound. ◭


10.13 The Null Space or Kernel of a Linear System

We start by considering a homogeneous linear system defined by an m × n coefficient matrix A, Ax = 0. Recall that T_A : R^n → R^m is given by T_A(x) = Ax.

The solution set of the homogeneous system, {x : T_A(x) = 0}, is a vector subspace of R^n, called the null space or kernel of A. We denote it by ker A.²

Theorem 10.13.1. For any m × n coefficient matrix A, consider the homogeneous system Ax = 0. The set of solutions to this system forms a vector subspace of R^n, ker A.

Proof. Let the solution set be V. Suppose x, y ∈ V and α ∈ R. Then A(αx + y) = αAx + Ay = 0, so any αx + y ∈ V, showing that V ⊂ R^n is a vector subspace of R^n. □

Theorem 10.13.1 also gives us information about the solutions to the system Ax = b. If a solution exists, call it x_0. Then Ax = b = Ax_0. Subtracting, we find A(x − x_0) = 0. In other words, if x_0 is a solution to Ax = b, then every other solution can be written x_0 + x where x ∈ ker A. The solution set is {x_0} + ker A.

2 Simon and Blume prefer to call it the nullspace and use the symbol N(A).
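The decomposition {x_0} + ker A can be seen numerically. A sketch under stated assumptions: Python with numpy, a particular solution from least squares, and a kernel basis read off the singular value decomposition (the example matrix is hypothetical):

```python
import numpy as np

A = np.array([[1.0, 3.0, 0.0]])   # one equation in three unknowns
b = np.array([6.0])

# A particular solution x0 (least squares returns one of them).
x0, *_ = np.linalg.lstsq(A, b, rcond=None)

# ker A is spanned by the right singular vectors beyond rank A.
_, s, Vt = np.linalg.svd(A)
rank = int(np.sum(s > 1e-12))
K = Vt[rank:]                     # here two rows spanning the kernel

# Any x0 plus a kernel element also solves Ax = b.
t = np.array([2.5, -7.0])
x = x0 + t @ K
assert np.allclose(A @ x, b)
```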


10.14 The Range of a Linear Transformation

The kernel is not the only subspace associated with a linear transformation. Another is its range.

Theorem 10.14.1. Let T : R^n → R^m be a linear transformation. Then ran T is a vector subspace of R^m.

Proof. Clearly ran T ⊂ R^m. Let y, y′ ∈ ran T. Then there are x, x′ ∈ R^n with T(x) = y and T(x′) = y′. Let α be a scalar. Then T(αx + x′) = αT(x) + T(x′) = αy + y′, showing that αy + y′ ∈ ran T. This shows that ran T is a subspace of R^m. □

In particular, this applies to linear transformations defined by matrices, T_A(x) = Ax.

Corollary 10.14.2. Let A be an m × n matrix and define the linear function T_A : R^n → R^m by T_A(x) = Ax. Then ran T_A is a vector subspace of R^m.


10.15 The Geometry of Vector Spaces: Norms

Euclidean space is distinguished by its geometry. Part of geometry is measuring distances between points. In vector spaces this entails measuring vectors to determine their length. This can often be accomplished by a norm. There are three basic properties a norm must have.

Norm. A norm on a real or complex vector space V is a mapping from V to R, denoted ‖x‖, that has three properties:

1. Positive Definite. For all x ∈ V, ‖x‖ ≥ 0 and ‖x‖ = 0 if and only if x = 0.
2. Absolutely Homogeneous of Degree One. For all α ∈ F and x ∈ V, ‖αx‖ = |α| ‖x‖.
3. Triangle Inequality. For all x, y ∈ V, ‖x + y‖ ≤ ‖x‖ + ‖y‖.

Normed Vector Space. A normed vector space ((V, +, ·), ‖·‖) is a vector space (V, +, ·) together with a norm ‖·‖ defined on that space.

We will usually use the abbreviated notation (V, ‖·‖) for ((V, +, ·), ‖·‖).

We measure the distance between two points by the length of the vector between them, ‖x − y‖, which is the same as ‖y − x‖ by absolute homogeneity.

One thing we can do with norms is create unit vectors, vectors with norm one, in any given direction. If x ≠ 0, we define the unit vector u in direction x by u = x/‖x‖. We can compute

\[ \|u\| = \left\| \frac{x}{\|x\|} \right\| = \frac{\|x\|}{\|x\|} = 1 \]

by absolute homogeneity. Since it has norm one, u is indeed a unit vector. Once we define some more norms, Figure 10.17.1 will illustrate all the unit vectors for three different norms.


10.16 ℓ_p^n Spaces

One family of norms on R^n are the ℓ_p-norms.³ The ℓ_p norm on R^n is defined for 1 ≤ p < ∞ by

\[ \|x\|_p = \left( \sum_{i=1}^n |x_i|^p \right)^{1/p}, \]

while

\[ \|x\|_\infty = \max_i |x_i|. \]

We use the notation ℓ_p^n to denote R^n with the ℓ_p norm. Thus ℓ_p^n means (R^n, ‖·‖_p).

The ℓ_p norm can also be defined on sequences of real numbers, where ℓ_p is defined as the set of sequences x = {x_1, x_2, ...} such that Σ_i |x_i|^p converges. We use the norm

\[ \|x\|_p = \left( \sum_{i=1}^\infty |x_i|^p \right)^{1/p}. \]

To see that distances change under different norms, consider the distance between (1, 3) and (4, 7). It is 7 in ℓ_1^2, 5 in ℓ_2^2, approximately 4.17 in ℓ_5^2, and 4 in ℓ_∞^2. In contrast, the distances between (1, 3) and (1, 4) are 1 in each of the ℓ_p norms. The fact that some distances change while others don't tells us that the geometry itself has changed.

3 Pronounced “little el pee”.
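These distances are easy to reproduce. A minimal Python check (numpy assumed; numpy's vector norm accepts the order p directly):

```python
import numpy as np

u = np.array([1.0, 3.0])
v = np.array([4.0, 7.0])

for p in (1, 2, 5, np.inf):
    print(p, np.linalg.norm(u - v, ord=p))
# 1 -> 7.0,  2 -> 5.0,  5 -> 4.1745...,  inf -> 4.0

# Along a coordinate direction all the lp distances agree:
w = np.array([1.0, 4.0])
print([np.linalg.norm(u - w, ord=p) for p in (1, 2, 5, np.inf)])  # all 1.0
```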


10.17 Shapes in ℓ_p

We can get a clue about how ℓ_p geometry changes with p by considering the vectors of length one. Figure 10.17.1 shows all unit vectors x, vectors with ‖x‖ = 1, for three different values of p. From left to right, the norms used are ℓ_1, ℓ_2, and ℓ_∞.

Although the distances along the coordinate axes are the same in all cases, points in other directions get closer as p increases. As a result, the sets themselves expand in the off-axis directions as p gets larger.

[Figure 10.17.1: Three unit circles. The left diagram shows the unit vectors, the vectors of length 1, in the ℓ_1 norm (‖x‖_1 = 1, a diamond). The center diagram shows the unit vectors in the ℓ_2 norm (‖x‖_2 = 1, a circle). The right diagram shows the unit vectors in the ℓ_∞ norm (‖x‖_∞ = 1, a square). Although the distances along the axes stay the same, the other vectors reach farther out for the same ℓ_p length as p increases.]

The ℓ_1 norm is sometimes called the taxicab norm. When streets are laid out on a grid, it gives the distance via street between any two locations (no shortcuts through buildings!).

The ℓ_2 norm is the Euclidean norm. It measures distance according to Euclidean geometry, and the points at distance one from the origin form a circle.


10.18 More Normed Spaces

Norms can be defined on other vector spaces. Consider the set of bounded real-valued continuous functions on a space X, denoted C_b(X). This space is a vector space when vector addition and scalar multiplication are defined pointwise. That is, (f + g)(x) = f(x) + g(x) and (tf)(x) = tf(x) for all x ∈ X. Because the functions in C_b(X) are bounded, the supremum norm (or sup norm) defined by ‖f‖_∞ = sup_{x∈X} |f(x)| will always be finite. It is easy to show that the sup norm is positive definite, absolutely homogeneous of degree one, and obeys the triangle inequality on C_b(X).

Another function space with a norm is L^2(A), the vector space of square integrable functions on A, which has norm⁴

\[ \|f\|_2 = \left( \int_A |f(x)|^2 \, dx \right)^{1/2}. \]

It will turn out that this space is also Euclidean in the sense that the Pythagorean identity is true. In fact, its norm is a generalization of the ℓ_2 norm.

4 This space is called “big el two” when we need to distinguish it from one of the ℓ2 spaces.


10.19 L^p Spaces

This generalization can be extended to p, 1 ≤ p < ∞, by

\[ \|f\|_p = \left( \int_A |f(x)|^p \, dx \right)^{1/p}. \]

◮ Example 10.19.1: Vectors in L^p Spaces. The function f(x) = x^{−1/2} is in L^1 when A = [0, 1] because

\[ \int_0^1 x^{-1/2} \, dx = 2x^{1/2} \Big|_0^1 = 2, \]

but it is not in L^2(0, 1) because

\[ \int_0^1 \left| x^{-1/2} \right|^2 dx = \int_0^1 \frac{dx}{x} = \ln x \Big|_0^1 = +\infty. \] ◭


10.20 ℓ_p^n Norms and Pythagoras

The ℓ_2 norm is called the Euclidean norm because it measures vectors according to Euclidean geometry. The other ℓ_p norms are not Euclidean. The geometry is different.

We can see this by measuring a right triangle in R^2. Euclidean geometry requires that the Pythagorean identity hold.

We will use the right triangle defined by 0 = (0, 0), a = (a, 0), and b = (0, b) with a, b > 0 to check this. The two sides of the right angle are 0–a and 0–b, while the hypotenuse is a–b. In all of the ℓ_p norms the a side has length (|a|^p)^{1/p} = a for p < ∞, and max{a, 0} = a for p = ∞. Not surprisingly, the b side has length b. This is illustrated in the diagram.

[Figure: the right triangle with legs a = (a, 0) and b = (0, b), and hypotenuse a − b.]

For the Pythagorean identity to be true, the hypotenuse a − b = (a, −b) must have length √(a² + b²). It works fine in the ℓ_2 norm: ‖(a, −b)‖_2 = √(a² + b²). For the ℓ_∞ norm, we have ‖(a, −b)‖_∞ = max{a, b} < √(a² + b²), failing the test.

For p ≠ ∞, ‖(a, −b)‖_p = (a^p + b^p)^{1/p}, which will not be √(a² + b²) unless p = 2. For example, if a = b = 2, the two terms are 2^{(p+1)/p} and 2^{3/2}, which agree only when p = 2.

Similar calculations apply to all the ℓ_p^n for n ≥ 2 and 1 ≤ p ≤ ∞. We don't worry about n = 1 because there is no room for right triangles there. That means that ℓ_p^n is not Euclidean when p ≠ 2.


10.21 Inner Product Spaces

The ℓ_2 norm is special, and one thing that makes it special is that it is based on an inner product.

Inner Product Space. An inner product space (V, ·) is a real or complex vector space V together with an inner product x·y on V. The inner product is a mapping from V × V to R or C, denoted x·y, which is:

1. (Conjugate) Symmetric. For all x, y ∈ V, x·y = \overline{y·x}.⁵
2. Linear. For all vectors x, y, z and scalars α, x·(αy + z) = α(x·y) + (x·z).⁶
3. Positive Definite. For all x ∈ V, x·x ≥ 0 and x·x = 0 if and only if x = 0.

The inner product is also known as the dot product or scalar product. Various notations are used for the inner product, including x·y, (x, y), 〈x, y〉, or 〈x|y〉. We will generally use the dot notation, x·y.

When we use a notation such as 〈x|y〉 we are writing the inner product as a bilinear form or a sesquilinear form, where 〈x|y〉 is separately linear in both x and y (bilinear), or linear in one and conjugate linear in the other (sesquilinear). In R^n,

\[ (\alpha x + y)\cdot z = \alpha (x\cdot z) + y\cdot z, \]

but in C^n,

\[ (\alpha x + y)\cdot z = \bar{\alpha} (x\cdot z) + y\cdot z. \]

The presence of the conjugate is why the inner product is sesquilinear in C^n, not bilinear.

5 The conjugate has no effect when V is a real vector space. There we just have ordinary symmetry.
6 In the complex case, this implies conjugate linearity in the first term. Some authors use the opposite convention, with conjugate linearity in the second term.


10.22 Euclidean Inner Product on R^n

On R^n, we define the Euclidean inner product by

\[ x\cdot y = \sum_{i=1}^n x_i y_i. \]

Another way to write the inner product on R^n is as a matrix product, x·y = x^T y. In C^n, the transpose must be replaced by the Hermitian conjugate, so x·y = x^* y, which can also be written

\[ x\cdot y = \sum_{i=1}^n \bar{x}_i y_i. \]


10.23 More Inner Products

There are other inner products on R^n. In fact, whenever A is a symmetric positive definite matrix, we can define an inner product by x·y = x^T A y. Linearity in each argument is clear. The symmetry of A ensures that the resulting inner product is symmetric. The fact that A is positive definite makes the inner product positive definite. This also works in C^n, with x·y = x^* A y, provided that A is Hermitian and positive definite.


10.24 Creating a Norm from the Inner Product

Any time we have an inner product x·y on a vector space V, we can define an associated norm by

\[ \|x\| = \sqrt{x\cdot x}. \]

If we use the Euclidean inner product, this becomes the Euclidean norm

\[ \|x\| = \left( \sum_{i=1}^n |x_i|^2 \right)^{1/2}. \]

When we use the Euclidean norm on R^n, the resulting space is called n-dimensional Euclidean space, ℓ_2^n.


10.25 Cauchy-Schwarz Inequality

An inner product and its associated norm are closely related. One aspect of this is the Cauchy-Schwarz inequality, which will help us prove that the associated norm obeys the triangle inequality.

Cauchy-Schwarz Inequality. Let x and y be vectors in an inner product space (V, ·). Then

\[ |x\cdot y| \le \|x\| \, \|y\| \]

for all vectors x and y in V. Moreover, if |x·y| = ‖x‖ ‖y‖ for non-zero x and y, then x and y are proportional.


10.26 Proof of Cauchy-Schwarz Inequality


Proof. The inequality clearly holds if x = 0, as both sides are then zero. We restrict our attention to the case x ≠ 0. Since we wish to include the complex case, keep in mind that x·y = x^* y and y·x = y^* x = \overline{x^* y} = \overline{x·y}. Then

\begin{align*}
0 &\le \left\| y - \frac{x\cdot y}{\|x\|^2}\, x \right\|^2 \tag{10.26.1} \\
&= \left( y^* - \frac{\overline{x\cdot y}}{\|x\|^2}\, x^* \right)\left( y - \frac{x\cdot y}{\|x\|^2}\, x \right) \\
&= \|y\|^2 - \frac{(x\cdot y)\overline{(x\cdot y)}}{\|x\|^2} - \frac{(x\cdot y)\overline{(x\cdot y)}}{\|x\|^2} + \frac{(x\cdot y)\overline{(x\cdot y)}}{\|x\|^4}\,\|x\|^2 \\
&= \|y\|^2 - \frac{(x\cdot y)\overline{(x\cdot y)}}{\|x\|^2} \\
&= \|y\|^2 - \frac{|x\cdot y|^2}{\|x\|^2}. \tag{10.26.2}
\end{align*}

Since ‖x‖² > 0, multiplying through by it yields

\[ |x\cdot y|^2 \le \|x\|^2 \|y\|^2, \]

establishing the Cauchy-Schwarz inequality.

If |x·y| = ‖x‖ ‖y‖ for non-zero x and y, eq. (10.26.2) implies that the right-hand side of eq. (10.26.1) is zero, meaning that

\[ y = \left( \frac{x\cdot y}{\|x\|^2} \right) x. \]

But then y is proportional to x. Since both x and y are non-zero, they are proportional to each other. □


10.27 The Inner Product defines a Norm

Now consider the norm derived from the inner product, ‖x‖ = (x·x)^{1/2}. This is obviously absolutely homogeneous and positive definite. But is it a norm? Does the triangle inequality apply? The Cauchy-Schwarz inequality tells us it does.

Proposition 10.27.1. Let (V, ·) be an inner product space over F = C or F = R and set ‖x‖ = (x·x)^{1/2}. Then ‖·‖ obeys:

1. For all α ∈ F and x ∈ V, ‖αx‖ = |α| ‖x‖ (absolute homogeneity of degree one).
2. ‖x‖ ≥ 0 and ‖x‖ = 0 if and only if x = 0 (positive definite).
3. For all x, y ∈ V, ‖x + y‖ ≤ ‖x‖ + ‖y‖ (triangle inequality).

This shows that ‖·‖ is a norm.

Proof. Now (αx)·(αx) = (\bar{α}α)(x·x) = |α|² ‖x‖²; taking the positive square root shows ‖αx‖ = |α| ‖x‖, proving (1).

For (2), ‖x‖ ≥ 0 by definition. If ‖x‖ = 0, then x·x = 0. Since the inner product is positive definite, x = 0. This proves (2).

For (3),

\begin{align*}
\|x + y\|^2 &= \|x\|^2 + x\cdot y + y\cdot x + \|y\|^2 \\
&= \|x\|^2 + 2\,\mathrm{Re}\,(x\cdot y) + \|y\|^2 \\
&\le \|x\|^2 + 2|x\cdot y| + \|y\|^2 \\
&\le \|x\|^2 + 2\|x\|\,\|y\| + \|y\|^2 \\
&= \left( \|x\| + \|y\| \right)^2,
\end{align*}

where the second inequality uses the Cauchy-Schwarz inequality, |x·y| ≤ ‖x‖ ‖y‖. Taking square roots proves (3). □
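Both inequalities can be spot-checked numerically. A small randomized sketch in Python (numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(1000):
    x = rng.normal(size=5)
    y = rng.normal(size=5)
    # Cauchy-Schwarz: |x.y| <= ||x|| ||y||.
    assert abs(x @ y) <= np.linalg.norm(x) * np.linalg.norm(y) + 1e-12
    # Triangle inequality for the derived norm.
    assert np.linalg.norm(x + y) <= np.linalg.norm(x) + np.linalg.norm(y) + 1e-12

# Equality holds exactly when the vectors are proportional, e.g. y = -3x.
x = rng.normal(size=5)
y = -3.0 * x
assert np.isclose(abs(x @ y), np.linalg.norm(x) * np.linalg.norm(y))
```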


10.28 Polarization Identities

If you have a norm derived from an inner product on R^n, it is possible to reconstruct the inner product from the norm using the polarization identity

\[ x\cdot y = \frac{1}{4}\left( \|x + y\|^2 - \|x - y\|^2 \right). \]

Expanding the right-hand side gives

\[ \frac{1}{4}\left( \|x\|^2 + 2x\cdot y + \|y\|^2 - \|x\|^2 + 2x\cdot y - \|y\|^2 \right) = x\cdot y. \]

Although the expression defined by the polarization identity is always symmetric and positive definite, it will fail to be separately linear in x and y unless the norm obeys the parallelogram law

\[ 2\|x\|^2 + 2\|y\|^2 = \|x + y\|^2 + \|x - y\|^2, \]

which states that the sum of the squares of the lengths of the four sides of a parallelogram is equal to the sum of the squares of the lengths of the two diagonals. In Euclidean geometry, it follows from the law of cosines. In inner product spaces the parallelogram law follows immediately after expanding the right-hand terms.

In C^n, the polarization identity takes a somewhat more complicated form:

\[ x\cdot y = \frac{1}{4}\left( \|x + y\|^2 - \|x - y\|^2 \right) - \frac{i}{4}\left( \|ix - y\|^2 - \|ix + y\|^2 \right). \]

The first term is the real part of x·y, given by

\[ \mathrm{Re}\,(x\cdot y) = \frac{1}{2}\left( x\cdot y + \overline{x\cdot y} \right). \]

The second term is i times the imaginary part of x·y,

\[ i\,\mathrm{Im}\,(x\cdot y) = \frac{1}{2}\left( x\cdot y - \overline{x\cdot y} \right). \]


10.29 Perpendicular Vectors

One important fact about the Euclidean inner product is that perpendicular vectors have a dot product of zero.

Theorem 10.29.1. Let x and y be non-zero vectors in Euclidean R^n. Then x is perpendicular to y if and only if x·y = 0.

Proof. If case (⇐): Suppose x·y = 0. Then

\[ \|y - x\|^2 = \|x\|^2 - 2\,y\cdot x + \|y\|^2 = \|x\|^2 + \|y\|^2, \]

which is the Pythagorean identity. That means that x, y, and y − x form a right triangle. As y − x is the hypotenuse, x and y are perpendicular.

Only if case (⇒): Suppose x is perpendicular to y. Consider the right triangle with sides x, y, and y − x. By the Pythagorean identity,

\[ \|x\|^2 + \|y\|^2 = \|y - x\|^2 = \|y\|^2 - 2\,y\cdot x + \|x\|^2. \]

It follows immediately that x·y = 0. □


10.30 Orthogonal and Orthonormal Vectors

The standard basis vectors are perpendicular unit vectors because

\[ e_i\cdot e_j = e_j\cdot e_i = \sum_k \delta_{ki}\delta_{kj} = \delta_{ij}. \]

So e_i·e_j = 0 if i ≠ j. The basis vectors are perpendicular to one another. A set of vectors that are mutually perpendicular are referred to as orthogonal vectors.

Also, e_i·e_i = ‖e_i‖² = 1. The basis vectors are also unit vectors. A set of unit vectors that are also orthogonal are called orthonormal vectors.

The standard (or canonical) basis makes it easy to write the coordinates of x in terms of the inner product:

\[ x_i = x\cdot e_i = e_i\cdot x. \]

◮ Example 10.30.1: Orthonormal Basis for R^3. Orthonormal bases don't have to look anything like the standard basis vectors.

\[ b_1 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \quad b_2 = \frac{1}{\sqrt{3}} \begin{pmatrix} -1 \\ 1 \\ 1 \end{pmatrix}, \quad b_3 = \frac{1}{\sqrt{6}} \begin{pmatrix} 1 \\ -1 \\ 2 \end{pmatrix}. \]

Then {b_1, b_2, b_3} is an orthonormal basis for R^3. Easy calculations show the vectors are perpendicular to each other, and that they all have norm 1. ◭
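The "easy calculations" can be bundled into one matrix product: stacking the b_i as the columns of a matrix B, orthonormality is exactly B^T B = I. A Python sketch (numpy assumed):

```python
import numpy as np

B = np.column_stack([
    np.array([1.0, 1.0, 0.0]) / np.sqrt(2),   # b1
    np.array([-1.0, 1.0, 1.0]) / np.sqrt(3),  # b2
    np.array([1.0, -1.0, 2.0]) / np.sqrt(6),  # b3
])

# Orthonormality in one shot: the Gram matrix is the identity.
assert np.allclose(B.T @ B, np.eye(3))

# Coordinates in this basis come from inner products, as in x_i = b_i . x.
x = np.array([2.0, -1.0, 0.5])
coords = B.T @ x
assert np.allclose(B @ coords, x)   # x reassembles from the basis
```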


10.31 Angles and Inner Products I

Theorem 10.31.1. If x and y are non-zero vectors in Euclidean R^n, and θ is the angle between them, then

\[ \cos\theta = \frac{x\cdot y}{\|x\|\,\|y\|}, \tag{10.31.3} \]

where x·y is the Euclidean inner product.

Proof. We will write y = w + z with w a multiple of x and z perpendicular to x. The diagrams below show how that works when the angle θ between x and y is acute (Figure 10.31.2) and obtuse (Figure 10.31.3). Notice that we always measure the angle the short way round (or π for 180°).

[Figure 10.31.2: The acute case is shown in the left-hand diagram. In the right-hand diagram y = w + z, where z is perpendicular to x and w is parallel to x. Here ‖w‖ = ‖y‖ |cos θ|. As cos θ > 0, w points in the same direction as x.]

[Figure 10.31.3: The obtuse case is shown in the left-hand diagram. In the right-hand diagram y = w + z, where z is perpendicular to x and w is parallel to x. Here ‖w‖ = ‖y‖ |cos θ|. As cos θ < 0, w points in the opposite direction from x.]


10.32 Angles and Inner Products II

Proof continues. Now y = w + z where w is parallel to x and z is perpendicular to x. Euclidean geometry tells us that w has signed length ‖y‖ cos θ; we multiply that by the unit vector in the x direction to find

\[ w = \|y\| \cos\theta \, \frac{x}{\|x\|}. \]

Of course, cos θ is positive when the angle is acute and negative when it is obtuse. As a result, the formula works for any θ, 0 ≤ θ ≤ π. Since z is perpendicular to x, x·z = 0 by Theorem 10.29.1. Then

\begin{align*}
x\cdot y &= x\cdot w + x\cdot z = x\cdot w \\
&= x\cdot\left( \|y\|\cos\theta\,\frac{x}{\|x\|} \right) \\
&= \|y\|\cos\theta\,\frac{\|x\|^2}{\|x\|} \\
&= \|x\|\,\|y\|\cos\theta.
\end{align*}

Divide by ‖x‖ ‖y‖ to obtain equation (10.31.3). □

In any inner product space, we can use equation (10.31.3) to define the angle between any two non-zero vectors:

\[ \theta = \arccos\left( \frac{x\cdot y}{\|x\|\,\|y\|} \right). \]


10.33 Metric Spaces

A metric is a way of measuring the distance between two points that is more general than a norm. There are several basic criteria it must satisfy.

Metric. Given a set X, a metric d on X is a mapping from X × X into R_+ that satisfies:

1. Symmetry. d(x, y) = d(y, x).
2. Positive Definite. d(x, y) ≥ 0 and d(x, y) = 0 if and only if x = y.
3. Triangle Inequality. For all x, y, z ∈ X, d(x, z) ≤ d(x, y) + d(y, z).

Compared to a norm, we have lost absolute homogeneity. The geometry can change with distance.

A metric space is a set with a metric on it.

Metric Space. A metric space (X, d) is a set X together with a metric d defined on X.


10.34 Metrics for Normed Spaces

Normed vector spaces have a natural metric defined by d(x, y) = ‖x − y‖. It is easy to see that this metric is symmetric by absolute homogeneity of the norm. It is positive definite because the norm is positive definite. Finally, the triangle inequality for d follows from

\begin{align*}
d(x, z) = \|x - z\| &\le \|x - y\| + \|y - z\| \\
&= d(x, y) + d(y, z).
\end{align*}


10.35 Discrete Metric

◮ Example 10.35.1: Discrete Metric. A metric that does not require that X be a normed vector space, or a vector space, or that there be any structure at all on the space X, is the discrete metric. It is defined by

\[ d(x, y) = \begin{cases} 1 & \text{if } x \ne y \\ 0 & \text{if } x = y. \end{cases} \]

The discrete metric is defined on every set X. It is positive definite and symmetric. It also obeys the triangle inequality because the left-hand side, d(x, z), is either 0 or 1. If it is 1, then x ≠ z, so y must differ from at least one of x and z. Then either d(x, y) = 1 or d(y, z) = 1, or both, so the right-hand side is either 1 or 2, satisfying the triangle inequality. ◭

Unless otherwise noted, we will use the Euclidean norm on R^n and its subsets. If there is any ambiguity about which metric to use with a space, it should be specified.

Metrics will be important later on when we examine limits, open and closed sets,and continuity (Chapters 12, 29, and 13).


10.36 Bounded Metrics

Unlike a norm, a metric may be bounded. The following metric on the real line is bounded by one:

\[ d(x, y) = \frac{|x - y|}{1 + |x - y|} < 1. \]

That d is symmetric and positive definite is pretty obvious. It takes a little work to show the triangle inequality, but it is based on the fact that if a ≤ b + c for a, b, c ≥ 0, then

\[ \frac{a}{1 + a} \le \frac{b}{1 + b} + \frac{c}{1 + c}. \]

Here's how it works:

\begin{align*}
a &\le b + c \\
&\le b + c + 2bc + abc \\
a + ab + ac + abc &\le b + ab + bc + abc + c + ac + bc + abc \\
a(1 + b)(1 + c) &\le b(1 + a)(1 + c) + c(1 + a)(1 + b) \\
\frac{a}{1 + a} &\le \frac{b}{1 + b} + \frac{c}{1 + c}.
\end{align*}

We divided by (1 + a)(1 + b)(1 + c) in the last line.⁷

Finish by setting a = |x − z|, b = |x − y|, and c = |y − z| to obtain the triangle inequality.

7 If you're wondering how anyone came up with such a calculation, there's a trick. Work backwards from the end.


10.37 A Metric for the Sequence Space

Although it is not possible to define a norm on the sequence space s, we can define a metric on it by

\[ d(x, y) = \sum_{i=1}^{\infty} \frac{1}{2^i} \, \frac{|x_i - y_i|}{1 + |x_i - y_i|}. \]

Since each term is no more than 2^{−i}, the sum converges uniformly to a number less than one. Although the metric is bounded, this doesn't really translate into bounds on the x_i. The following diagram applies this to R^2, with the same weighting of 1/2^i. This means

\[ d(x, y) = \frac{1}{2}\,\frac{|x_1 - y_1|}{1 + |x_1 - y_1|} + \frac{1}{4}\,\frac{|x_2 - y_2|}{1 + |x_2 - y_2|}. \]

[Figure 10.37.1: Although the sequence metric is bounded, that cannot be said about the points at distance 1/4 from zero. The diagram illustrates points x with d(x, 0) = 1/4 in R^2. The entire vertical axis has d(x, 0) < 1/4, with d((0, x_2), 0) → 1/4 as x_2 → ±∞. The geometry is obviously quite different from any of the ℓ_p, some of which were illustrated in Figure 10.17.1.]

As noted in Figure 10.37.1, the entire vertical axis ends up at distance less than 1/4 from the origin. Here's the calculation:

\[ d\big( (0, 0), (0, y) \big) = \frac{1}{4}\,\frac{|y|}{1 + |y|} < \frac{1}{4}. \]

As a result, when applied to the sequence space, the set

\[ \{ x \in s : d(0, x) < r \} \]

is unbounded. Even for r < 1 it will typically include the coordinate axes for large values of i. When r ≥ 1, it is all of s.


10.38 Lines in Euclidean Space: Slope-Intercept Form

Now that we have measurement of distances and angles under control, via the inner product (angles and length), the ℓ_2^n norm (length), and the associated metric (distance function), we turn our attention to other aspects of the geometry of R^n. Until perpendicular angles become involved, this will apply to R^n in general, not just Euclidean R^n, ℓ_2^n.

We start with lines in R^2. As you well know, there's more than one way to write the equation of a line. We will start with the slope-intercept form, where y = mx + b. The coordinates are (x, y), the slope is m, and b is the vertical intercept. Writing the equation in terms of coordinates, we have

\[ \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x \\ mx + b \end{pmatrix} = x \begin{pmatrix} 1 \\ m \end{pmatrix} + \begin{pmatrix} 0 \\ b \end{pmatrix}. \]

We can think of this as a one-parameter family of coordinates. To remove the link between a particular coordinate system and the equation for the line, we rewrite this in terms of a parameter t ∈ R:

\[ x(t) = \begin{pmatrix} 0 \\ b \end{pmatrix} + t \begin{pmatrix} 1 \\ m \end{pmatrix}. \tag{10.38.4} \]

The line is specified by a point on the line, (0, b)^T, and a direction, (1, m)^T.


10.39 Lines in Euclidean Space: Parametric Form

We can easily generalize the form in equation (10.38.4) to R^n. We can write a line in R^n as the points of the form

\[ x(t) = x_0 + t x_1 \tag{10.39.5} \]

where x_0 is a point on the line and x_1 is the direction of the line. This is the parametric form of a line. Coordinates have been eliminated from the definition. When we need them, we can write the equation using any coordinates we wish.

If L is the line, we can now write

\[ L = \{ x(t) : x(t) = x_0 + t x_1, \; t \in \mathbb{R} \}. \]

One advantage of representing the line this way is that we can write equations for vertical lines. In R^2, just set x_1 = (0, 1)^T (or even (0, −12)^T) to get a vertical line through x_0. This is one advantage of a coordinate-free definition. We are not tied to writing y as a function of x; x can be a function of y, or both can be functions of some other variable, such as the parameter t.

Before moving on, let's consider lines through the origin, where 0 ∈ L. These can be written in a special form.

Theorem 10.39.1. A line L can be written L = {x : x = t x_1, t ∈ R} if and only if 0 ∈ L.

Proof. If case (⇐): Since 0 ∈ L, there is a t_0 with x_0 + t_0 x_1 = 0. Then x_0 + t x_1 = (t − t_0) x_1 for any t ∈ R. Since t′ = t − t_0 can be any real number, L = {x : x = t x_1, t ∈ R}.

Only if case (⇒): If the line L has the specified form, set t = 0 to find 0 ∈ L.


10.40 Perpendiculars, Lines, and Hyperplanes

We return to the equation y = mx + b, this time restricting our attention to 2-dimensional Euclidean space. We previously converted this to an equation involving the direction (1, m). Suppose we think of the line as being defined by its perpendicular direction. Since the line runs in the direction (1, m), we need (x_1, x_2) with 0 = (x_1, x_2)·(1, m) = x_1 + m x_2. One such vector is (x_1, x_2) = (−m, 1) (any scalar multiple would do).

Now rewrite y = mx + b as y − mx = b, or (−m, 1)·(x, y) = b. In ℓ_2^n, we can generalize this to a·x = b for a ≠ 0. Define H(a, b) = {x : a·x = b}.

Consider the equation defining H,

\[ \sum_{i=1}^n a_i x_i = b. \]

Since at least one a_i ≠ 0, the coefficient matrix of this linear system has rank one, so there are (n − 1) free variables. If n = 2, there is one free variable and we have a line. If n = 3, there are two free variables, and the equation a_1 x_1 + a_2 x_2 + a_3 x_3 = b defines a plane. When n = 4, we have a three-dimensional surface in 4-space. We refer to H(a, b) as a hyperplane. It has the most free variables possible without being the whole space R^n.


10.41 Hyperplanes and Half-Spaces

The hyperplane H(a, b) cuts R^n into two parts whose intersection is H(a, b): H^+(a, b) = {x ∈ R^n : a·x ≥ b} and H^−(a, b) = {x ∈ R^n : a·x ≤ b}. These two sets are referred to as closed half-spaces. The term closed means that the boundary, H(a, b) itself, is included (we will formalize terms such as closed and boundary in Chapter 12).

◮ Example 10.41.1: Hyperplane in ℓ_2^2. We examine H(e_1, 2) and its two closed half-spaces.

[Figure 10.41.2: Here the heavy line/hyperplane H(e_1, 2) separates R^2 into two half-spaces, H^+(e_1, 2) right of the hyperplane and H^−(e_1, 2) left of the hyperplane.] ◭


10.42 Hyperplanes, Half-Spaces, and the Budget Set

One place hyperplanes and half-spaces appear in economics is in the budget set.

◮ Example 10.42.1: The Budget Set. The budget set is defined by

\[ B(p, m) = \{ x \in \mathbb{R}^n_+ : p\cdot x \le m \} \]

for p ≥ 0 and m ≥ 0. Here R^n_+ = {x ∈ R^n : x_i ≥ 0 for all i = 1, ..., n}. We can think of this as an intersection of half-spaces. There is the half-space H^−(p, m), and there are also the half-spaces H^+(e_i, 0). Thus

\[ B(p, m) = H^-(p, m) \cap \left( \bigcap_{i=1}^n H^+(e_i, 0) \right). \]

[Figure 10.42.2: The figure illustrates a budget set. The price vector p is perpendicular to the budget line. The intercepts, where all income is spent either on good one or on good two, are m/p_1 and m/p_2, respectively.] ◭


10.43 Hyperplanes, Half-Spaces, and the Probability Simplex

A second example of hyperplanes and half-spaces in economics is the probability simplex.

◮ Example 10.43.1: Probability Simplex. The probability simplex ∆ is defined by

\[ \Delta = \{ p \in \mathbb{R}^n_+ : p\cdot e = 1 \} \]

where e = Σ_{i=1}^n e_i = (1, ..., 1)^T. Thus

\[ \Delta = \left\{ p \in \mathbb{R}^n_+ : \sum_{i=1}^n p_i = 1 \right\}. \]

The idea is that i = 1, ..., n are mutually exclusive possible events and that p_i is the probability of each event. Since 0 ≤ p_i ≤ 1, we can think of the p_i as percentages. The fact that Σ_i p_i = 1 tells us that the probabilities add up to 100%. One of the events must happen. ◭

[Figure 10.43.2: The probability simplex ∆ is the yellow triangle in R^3 with vertices (1, 0, 0), (0, 1, 0), and (0, 0, 1).]


11. Linear Independence

This section focuses on bases for vector spaces. We've seen one basis already, the standard basis for R^n. It allowed us to write any linear transformation from R^n to R^m in terms of matrix multiplication.

Bases allow us to define the dimension of a vector space. In the context of solutions to linear systems, the dimension is the number of free variables. It tells us what the solution set looks like.

Finally, the judicious choice of a basis can simplify linear systems, allowing easier interpretation of results. In a dynamic context, this allows us to better understand both short- and long-run dynamics of the system.


11.1 Linear Combinations

Let L be a line through the origin. We say L = {x : x = t x_1} is the line generated by x_1, or the line spanned by x_1. It's the set of all scalar multiples of x_1. What if we have more than one generator? What do we get?

Let's try it. Let x_1, ..., x_k be vectors in a vector space V. A sum of the form

\[ t_1 x_1 + t_2 x_2 + \cdots + t_k x_k = \sum_{j=1}^k t_j x_j \]

for t_j ∈ R is called a linear combination of x_1, ..., x_k.


11.2 The Span of a Set

We define the span of a set {x_1, ..., x_k}, written L[x_1, ..., x_k], as the set of linear combinations of x_1, ..., x_k. In other words,

\[ L[x_1, \ldots, x_k] = \left\{ x : x = \sum_{j=1}^k t_j x_j \text{ for some } t_j \in \mathbb{R} \right\}. \]

Theorem 11.2.1. Suppose x_1, ..., x_k are vectors in V and x, y ∈ L[x_1, ..., x_k]. Then for α ∈ F, αx + y ∈ L[x_1, ..., x_k], so the span is a vector subspace of V.

Proof. In this case there are s_i, t_i ∈ F with x = Σ_i s_i x_i and y = Σ_i t_i x_i. Then αx + y = Σ_i (αs_i + t_i) x_i is also in L[x_1, ..., x_k]. □

When V = R^n, we can write the span using a matrix. Form an n × k matrix X from the vectors x_1, ..., x_k ∈ R^n by taking x_j as the jth column of X. The linear combinations of the x_1, ..., x_k can be written

\[ Xt = \sum_{j=1}^k t_j x_j = t_1 \begin{pmatrix} x_{11} \\ x_{21} \\ \vdots \\ x_{n1} \end{pmatrix} + \cdots + t_k \begin{pmatrix} x_{1k} \\ x_{2k} \\ \vdots \\ x_{nk} \end{pmatrix}, \]

where x_{ij} is the ith component of x_j.

Then

\[ L[x_1, \ldots, x_k] = \{ Xt : t \in \mathbb{R}^k \}. \]

When writing x this way, we can think of the t_j as coordinates of x in the coordinate system X = [x_1, ..., x_k].


11.3 Spanning Examples

◮ Example 11.3.1: Standard Basis Vectors. If k = n and x_j = e_j, then the matrix formed from the standard basis vectors e_j is the identity matrix and the coordinates of x are Ix = x, meaning that the jth coordinate is just x_j. ◭

Spans need not resemble the standard basis vectors.

◮ Example 11.3.2: Span of Vectors. Consider the case of three vectors in R^4 given by the columns of

\[ X = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 0 & 0 \end{pmatrix}. \]

Then

\[ L[x_1, x_2, x_3] = \left\{ \begin{pmatrix} t_1 + t_2 \\ t_2 + t_3 \\ t_1 + t_2 + t_3 \\ t_1 \end{pmatrix} : t_j \in \mathbb{R} \right\}. \]

Since the rank of X is three, there are vectors in R^4 that cannot be written x = Xt. These vectors are not in the span, showing that L[x_1, x_2, x_3] is a proper subspace of R^4. ◭


11.4 When are Linear Combinations Unique?

Linear combinations allow us to write vectors in terms of a particular set of vectors. That lets us set up a coordinate system. Does a vector have a single set of coordinates? Or are there multiple ways to write it in terms of our set of vectors?

Let X be an n × k matrix whose columns define a coordinate system, and suppose x = Xt and x = Xt′, so t and t′ are both coordinates for x. When can we conclude that t = t′?

Alternately, when can we conclude that a vector has only one set of coordinates? By subtracting, we find 0 = X(t − t′), so the question is really whether this homogeneous linear system has a unique solution. By Corollary 7.28.1, it will have multiple solutions if and only if k > rank X, which is equivalent to saying there are free variables.

This is connected to the idea of linear dependence.

Linear Dependence. Non-zero vectors x_1, ..., x_k are linearly dependent if there are t_1, ..., t_k, not all zero, with Σ_{j=1}^k t_j x_j = 0.

In other words, the vectors are linearly dependent if and only if Xt = 0 has a non-zero solution t.


11.5 Theorem on Linear Dependence

Theorem 11.5.1. Suppose x_1, ..., x_k are linearly dependent vectors. Then there is an h so that

\[ x_h = -\frac{1}{t_h} \sum_{j \ne h} t_j x_j. \]

Proof. By linear dependence, we can find t_1, ..., t_k, not all zero, with Σ_{j=1}^k t_j x_j = 0. Take h with t_h ≠ 0. Then

\[ t_h x_h = -\sum_{j \ne h} t_j x_j, \]

implying that x_h is a linear combination of the other x_j's. Dividing by t_h, we obtain

\[ x_h = -\frac{1}{t_h} \sum_{j \ne h} t_j x_j. \qquad \square \]


11.6 Examples of Linear Dependence

The vectors

\[ x_1 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \quad x_2 = \begin{pmatrix} 6 \\ 6 \\ 0 \end{pmatrix}, \quad x_3 = \begin{pmatrix} 0 \\ 0 \\ 7 \end{pmatrix} \]

are linearly dependent as 7x_1 − (7/6)x_2 − x_3 = 0.

Another set of linearly dependent vectors is

\[ x_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad x_2 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \quad x_3 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix}. \]

Here

\[ x_1 - \frac{1}{\sqrt{2}}\left( x_2 + x_3 \right) = 0. \]


11.7 Linear Dependence and Independence

Linear Independence. We call non-zero vectors x_1, ..., x_k linearly independent if they are not linearly dependent.

Equivalently, a set of non-zero vectors X = {x_1, ..., x_k} is linearly independent if Σ_{j=1}^k t_j x_j = 0 implies t_1 = t_2 = ··· = t_k = 0. Linear independence implies there is at most one vector t with x = Xt, where X is the matrix formed by setting the jth column of X equal to x_j ∈ X.

The vectors

\[ x_1 = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \quad x_2 = \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}, \quad x_3 = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} \]

are linearly independent. Suppose

\[ Xt = t_1 x_1 + t_2 x_2 + t_3 x_3 = \begin{pmatrix} t_1 + t_3 \\ t_1 + t_2 \\ t_2 + t_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}. \]

Then t_1 = −t_3, t_1 = −t_2, and t_2 = −t_3. Combining these, we find t = 0. Since the only linear combination of x_1, x_2, and x_3 that is zero is the zero linear combination, the vectors are linearly independent.
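Numerically, independence of the columns is a rank computation. A one-line check in Python (numpy assumed):

```python
import numpy as np

X = np.array([[1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])    # columns are x1, x2, x3

# Full column rank means Xt = 0 forces t = 0: linear independence.
print(np.linalg.matrix_rank(X))    # 3
```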


11.8 Orthogonal Vectors are Linearly Independent

Orthogonal sets of vectors are automatically independent.

Theorem 11.8.1. Let B = {b_i}_{i=1}^k be a set of orthogonal vectors in an inner product space V. Then B is a linearly independent set.

Proof. Suppose there are real numbers t_i with

\[ z = \sum_{i=1}^k t_i b_i = 0. \]

Now consider z·b_i = t_i (b_i·b_i) = 0 for every i = 1, ..., k. Since each b_i·b_i > 0, every t_i = 0. The vectors must be linearly independent. □


11.9 Too Many Vectors Imply Linear Dependence

If there are too many vectors, they must be linearly dependent.

Theorem 11.9.1. Suppose x_1, ..., x_k are non-zero vectors in R^n with k > n. Then x_1, ..., x_k are linearly dependent.

Proof. Consider the equation Xt = 0. Since there are more variables than equations, there is at least one free variable. It follows that Xt = 0 has infinitely many solutions, establishing linear dependence. Alternatively, we could quote Corollary 7.25.1. □

◮ Example 11.9.2: More than n Vectors are Linearly Dependent in R^n. For example, suppose that in R^3 we have

\[ x_1 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \quad x_2 = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \quad x_3 = \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}, \quad\text{and}\quad x_4 = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}. \]

These vectors are linearly dependent because there are too many of them. In fact, x_1 is a linear combination of the others: x_1 = (1/2)(x_2 + x_3 + x_4). ◭


11.10 Spanning Sets

A second issue concerning coordinate systems is whether our standard set of vectors is big enough to encompass all possible vectors as linear combinations. If so, every vector will have coordinates in our system. If not, there will be vectors outside the coordinate system.

Span. A set of non-zero vectors X = {x_1, ..., x_k} ⊂ V spans a vector space V if Σ_{j=1}^k t_j x_j = x has a solution for every x ∈ V.

Equivalently, x_1, ..., x_k span V if every vector in V is a linear combination of x_1, ..., x_k. If the vectors we are using to build a coordinate system span V, then every vector can be written using our coordinate system.


11.11 Spanning Sets in R^n

Any set that spans Rn must contain at least n vectors.

Theorem 11.11.1. If X = {x_1, ..., x_k} is a set of non-zero vectors that spans R^n, then k ≥ n.

Proof. If X spans R^n, construct X as above. Corollary 7.29.1 tells us that rank X = n, which implies k ≥ n. □

More generally, if V is a vector subspace of R^n, then x_1, ..., x_k span V if Xt = x has a solution for every x ∈ V.

The set

\[ x_1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \quad x_2 = \begin{pmatrix} 1 \\ -1 \end{pmatrix}, \quad x_3 = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \]

spans R^2. To see it, suppose

\[ \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = t_1 x_1 + t_2 x_2 + t_3 x_3 = \begin{pmatrix} t_1 + t_2 + t_3 \\ t_1 - t_2 \end{pmatrix}. \]

This system has infinitely many solutions.

\[ t = \begin{pmatrix} x_2 \\ 0 \\ x_1 - x_2 \end{pmatrix} \quad\text{and}\quad t' = \begin{pmatrix} x_2 + 1 \\ 1 \\ x_1 - x_2 - 2 \end{pmatrix} \]

are two of them.

Larger sets that span will involve some redundancy. Theorem 11.9.1 says they will be linearly dependent, so by Theorem 11.5.1 at least one can be written as a linear combination of the others. So every time it appears in a linear combination, it can be replaced. It is redundant. It is not needed to span the set.


11.12 Basis of a Vector Space

This brings us to the concept of a basis: a set that both spans and is linearly independent.

Basis. A set of non-zero vectors {x_1, ..., x_k} ⊂ V is a basis for a vector space V if

1. x_1, ..., x_k are linearly independent, and
2. x_1, ..., x_k span V.

Bases are ideal for building coordinate systems. They are neither too big nor too small. A basis is just right. We can write any vector as a linear combination of the basis vectors, and there is only one way to do it, only one set of coordinates for each vector.

Theorem 11.12.1. Every basis for R^n has exactly n elements.

Proof. Suppose x_1, ..., x_k is a basis for R^n. By Theorem 11.11.1, k ≥ n, and by Theorem 11.9.1, k ≤ n. Thus k = n. □


11.13 Creating a Vector Space

We can make a basis for a vector space out of anything.

◮ Example 11.13.1: Free Vector Space. Although our vector spaces usually have meaningful scalar products, it is not necessary. Given a set B, we define the free vector space over B as the set of formal linear combinations of elements of B. Here "formal" means that we don't try to interpret what x = 1.5 red + 2.4 Einstein actually means. Sometimes, the scalar multiples are written in the form (α, red) to emphasize this. Moreover, we don't need to know what addition means either. We just do calculations concerning vectors according to the rules of vector arithmetic.

For example, if B = {red, water, Einstein}, then V = {x_1 red + x_2 water + x_3 Einstein : x_i ∈ R}. ◭

We can even make a basis out of nothing, or more precisely the empty set.

◮ Example 11.13.2: Free Vector Space Using the Empty Set. Let

\[ B' = \Big\{ \varnothing, \; \{\varnothing\}, \; \big\{\varnothing, \{\varnothing\}\big\} \Big\} \]

and form the free vector space over B′. ◭


11.14 Bases and Independent Sets

Theorem 11.14.1. Let B = {b_1, ..., b_n} be a basis for a vector space V. Suppose S ⊂ V has m > n elements. Then the vectors in S are linearly dependent.

Proof. Let S be as described. We can write S = {x_1, ..., x_m}. Since B is a basis for V, and S ⊂ V, we can write each x_i as a linear combination of the basis vectors B. That means there are a_{ij}, for i = 1, ..., m and j = 1, ..., n, with

\[ x_i = \sum_{j=1}^n a_{ij} b_j. \]

To examine linear independence of the x_i, we consider the equation

\[ \sum_{i=1}^m t_i x_i = 0. \]

We will show it has non-zero solutions. We start by rewriting it:

\[ 0 = \sum_{i=1}^m t_i \left( \sum_{j=1}^n a_{ij} b_j \right) = \sum_{j=1}^n \left( \sum_{i=1}^m t_i a_{ij} \right) b_j. \]

As the b_j are linearly independent, this implies their coefficients are zero. That means

\[ \sum_{i=1}^m t_i a_{i1} = 0, \quad \sum_{i=1}^m t_i a_{i2} = 0, \quad \ldots, \quad \sum_{i=1}^m t_i a_{in} = 0. \]

We have n equations in m unknowns, which we can write in matrix form as A^T t = 0. This homogeneous system not only has a solution, but must have infinitely many solutions because there are more unknowns (m) than equations (n). See Corollary 7.28.1. It follows that there are t_1, ..., t_m, not all zero, with

\[ \sum_{i=1}^m t_i x_i = 0. \]

In other words, the {x_i} must be linearly dependent. □


11.15 Testing for a Basis

We can use our various results on solving equations to construct a test of whether {b_1, ..., b_n} forms a basis for R^n. Item (4) of the theorem is the test.

Theorem 11.15.1. Let {b_1, ..., b_n} be a collection of vectors in R^n. Form the n × n matrix B whose columns are the b_j. Then the following are equivalent:

1. b_1, ..., b_n are linearly independent.
2. b_1, ..., b_n span R^n.
3. b_1, ..., b_n form a basis for R^n.
4. det B is non-zero.

Proof. (1) implies (2). Linear independence means Bx = 0 has at most one solution, so rank B = #cols = n by Corollary 7.29.1. As B is n × n, this is also the number of rows, so Bx = y always has a solution by Corollary 7.30.2, showing that the vectors span.

(2) implies (3). We do this by showing (2) implies (1). Just use the same arguments in the opposite order. Then the vectors {b_1, ..., b_n} span and are linearly independent, so they are a basis.

(3) clearly implies (1) and (2). So (1), (2), and (3) are equivalent.

(1)-(3) are equivalent to (4). As we saw above, (1), (2), and (3) are equivalent to rank B = n = #cols = #rows, which is equivalent to B being non-singular (Corollary 7.31.1). Finally, det B is non-zero if and only if B is non-singular, completing the proof. □
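Item (4) turns the basis test into a single determinant. A Python sketch (numpy assumed), reusing the independent vectors of Section 11.7:

```python
import numpy as np

B = np.array([[1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])     # columns are b1, b2, b3

# A non-zero determinant certifies a basis (Theorem 11.15.1, item 4).
print(np.linalg.det(B))             # 2.0, so the columns form a basis

# Coordinates of any x in this basis solve Bt = x, uniquely.
x = np.array([4.0, 2.0, 0.0])
t = np.linalg.solve(B, x)
assert np.allclose(B @ t, x)
```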


11.16 Finding an Orthonormal Basis

If we have a basis B for an inner product space, we can use it to construct an orthonormal basis using the Gram-Schmidt method.¹

Let B = {b_1, ..., b_n} be a basis for an inner product space V. The Gram-Schmidt method first constructs an orthogonal basis from B, and then normalizes it to obtain an orthonormal basis. Define

\begin{align*}
w_1 &= b_1, \\
w_2 &= b_2 - \frac{w_1\cdot b_2}{w_1\cdot w_1}\, w_1, \\
w_3 &= b_3 - \frac{w_1\cdot b_3}{w_1\cdot w_1}\, w_1 - \frac{w_2\cdot b_3}{w_2\cdot w_2}\, w_2, \quad\text{etc.,} \\
&\;\;\vdots \\
w_n &= b_n - \sum_{i=2}^{n} \left( \frac{w_{i-1}\cdot b_n}{w_{i-1}\cdot w_{i-1}} \right) w_{i-1}. \tag{11.16.1}
\end{align*}

We will show that the set W = {w_1, ..., w_n} is an orthogonal basis.

1 This can be found on page 624 of Simon and Blume.


11.17 The Gram-Schmidt Vectors are Orthogonal

Theorem 11.17.1. If B is a basis for the inner product space V, then W = {w_1, ..., w_n} as defined by equation (11.16.1) is also a basis for V.

Proof. Each of the w_i is defined in terms of the b_j for j = 1, ..., i, and is thus in their span. Alternatively, we can use equation (11.16.1) to write the b_i in terms of the w_j for j = 1, ..., i. This shows that B ⊂ L[W]. It follows that L[B] = L[W] = V.

By Theorem 11.8.1, orthogonal vectors are linearly independent. Since they span V, that will imply they are a basis for V. All that remains is to show that W is an orthogonal set of non-zero vectors.

Note that if any w_i were zero, it would contradict the linear independence of B.
We show that the w_i are orthogonal vectors by inductively showing all of the {w_1, . . . , w_I} are orthogonal for I = 1, . . . , n. In the I = 1 case this reduces to {w_1} = {b_1}, which is trivially an orthogonal set of vectors.

For the induction step, suppose the set {w_1, . . . , w_I} is orthogonal for some I < n. We must show adding w_{I+1} to the set maintains orthogonality. That means we need to show w_{I+1} \cdot w_j = 0 for all j = 1, . . . , I.

Now
\[
\begin{aligned}
w_{I+1} \cdot w_j &= b_{I+1} \cdot w_j - \sum_{i=2}^{I+1} \Bigl( \frac{w_{i-1} \cdot b_{I+1}}{w_{i-1} \cdot w_{i-1}} \Bigr) (w_{i-1} \cdot w_j) \\
&= b_{I+1} \cdot w_j - \sum_{i=2}^{I+1} \Bigl( \frac{w_{i-1} \cdot b_{I+1}}{w_{i-1} \cdot w_{i-1}} \Bigr) \delta_{i-1,j} (w_j \cdot w_j) \\
&= b_{I+1} \cdot w_j - w_j \cdot b_{I+1} \\
&= 0.
\end{aligned}
\]

Because of the Kronecker delta, only the term with i − 1 = j survives in the sum on the second line. We have proved the induction step that {w_1, . . . , w_{I+1}} is an orthogonal set of vectors. It follows that W is an orthogonal set of vectors, and hence a basis.

We can form an orthonormal basis from W by defining
\[ v_i = \frac{w_i}{\|w_i\|}. \]
Then {v_1, . . . , v_n} is an orthonormal basis derived from B.


11.18 The Dimension of a Vector Space

A consequence of Theorem 11.14.1 is that every basis of a vector space must be the same size, provided the size is finite. More precisely:

Basis Theorem. Suppose a vector space V has a basis B with n elements, where n is finite. Then every other basis of that vector space must also have n elements.

Proof. Any other basis must be a linearly independent set, so by Theorem 11.14.1, it cannot have more than n elements.

If there were a basis with fewer than n elements, we could apply Theorem 11.14.1 to determine that B is not a linearly independent set, and so not a basis. This contradicts our hypothesis, so it is impossible. We conclude that any basis has exactly n elements.

The Basis Theorem lets us define the dimension of a vector space, at least when the dimension is finite.2

Suppose a vector space V has a finite basis B. The Basis Theorem tells us that any basis for V will have the same number of elements as B.

Dimension. For vector spaces with a finite basis, we define the dimension of V as the number of elements of that basis.

By the Basis Theorem, the dimension does not depend on which basis we use. We denote the dimension of V by dim V.

2 When the vector space is infinite, we have two choices. We can continue to use finite linear combinations (Hamel basis), or we can allow infinite sums. If we use the metric we previously defined on s, it is possible to show that the partial sums \sum_{i=1}^{n} x_i e_i converge to x ∈ s for every x. This is an example of a Schauder basis.


11.19 Is the Dimension of a Vector Space Always Finite?

Although we started with a basis with a finite number of elements, there are vector spaces with infinite bases. The arguments become trickier then, and we will not consider that case further other than to give an example.

◮ Example 11.19.1: Attempted Basis for the Sequence Space. The sequence space s contains infinite linearly independent sets. For s, define the vectors e_j, j = 1, 2, 3, . . . by (e_j)_i = δ_{ij}. Then e_1 = (1, 0, 0, . . . ), e_2 = (0, 1, 0, 0, . . . ), etc. We now have an infinite set E = {e_1, e_2, e_3, . . . } that seems like a possible basis.

The set E is linearly independent in s. However, E is not a basis for s because it does not span s. The problem is that linear combinations involve finite sums, and vectors such as (1, 1, 1, . . . ) cannot be written as a finite sum of the e_j. ◭


27. Subspaces Attached to a Matrix

This chapter draws on Chapter 27 of Simon and Blume.

27.1 The Column Space of a Matrix

Let A be an m × n matrix. The column space of the matrix A, Col(A), is the set of linear combinations of the columns of A. Denote column j by a_j, so

\[ A = \begin{pmatrix} a_1 & \cdots & a_n \end{pmatrix}. \]

Any linear combination of the columns of A can be written
\[ \sum_{j=1}^{n} x_j a_j = Ax \]

where x = (x_1, . . . , x_n)^T ∈ R^n.
Alternatively, since the column space of A is the set of linear combinations of the columns of A, we have
\[ \mathrm{Col}(A) = L[a_1, . . . , a_n]. \]

There are multiple ways to see that Col(A) is a vector subspace of R^m. One is to invoke Theorem 11.2.1, which tells us the span of any set of vectors in R^m is a subspace of R^m. Another is to realize that Col(A) = ran T_A, where the linear transformation T_A is defined by T_A(x) = Ax. By Corollary 10.14.2, ran T_A is a vector subspace of R^m.


27.2 The Row Space of a Matrix

We define the row space of A, Row(A), as the set of all linear combinations of the rows of A. It is the span of the rows of A. Of course, these are row vectors in R^n, not column vectors. Now let a_i denote the ith row of A, so we can write

\[ A = \begin{pmatrix} a_1 \\ \vdots \\ a_m \end{pmatrix}. \]

Any linear combination of the rows of A can be written
\[ \sum_{i=1}^{m} x_i a_i = xA \]

where x = (x_1, . . . , x_m) is an m-dimensional row vector. We can turn it into a column vector by taking the transpose (which was conspicuously absent). That yields A^T x^T, so we can consider the row space as being the range of A^T.


27.3 Elementary Operations and Row and Column Spaces

We now have three vector spaces derived from matrices: the kernel (null space), row space, and column space of a matrix A. Each of these spaces has a well-defined dimension.

It’s not hard to show that elementary column operations do not affect dim Col(A) and elementary row operations do not affect dim Row(A).

Theorem 27.3.1. Let A be an m × n matrix. The three elementary row operations do not affect the row space of A and the three elementary column operations do not affect the column space of A.

Proof. We will prove the row case. The column case differs only in notation.
Let a_1, . . . , a_m be the rows of A. The row space is the set of their linear combinations, L[a_1, . . . , a_m]. We now consider the three elementary row operations in turn.
(1) If we interchange two rows, the set of vectors is completely unchanged, so the span is also unchanged.
(2) If we multiply row i by r ≠ 0, the linear combination

\[ t_1 a_1 + \cdots + t_i a_i + \cdots + t_m a_m \]

is the same as

\[ t_1 a_1 + \cdots + \frac{t_i}{r}(r a_i) + \cdots + t_m a_m, \]

so the span remains the same.
(3) If we add r ≠ 0 times row i to row j, we can rewrite the linear combination

\[ t_1 a_1 + \cdots + t_i a_i + \cdots + t_j a_j + \cdots + t_m a_m = t_1 a_1 + \cdots + (t_i - t_j r) a_i + \cdots + t_j (a_j + r a_i) + \cdots + t_m a_m. \]

Since this can be read either way, the span again remains the same.
None of the elementary row operations affect the span.


27.4 Dimensions of Row and Column Spaces

Recall that the rank of a matrix A is the number of basic variables, which is the number of leading ones in the reduced row-echelon form.

Theorem 27.4.1. Let A be an m × n matrix. Then dim Row(A) = rank A = dim Col(A).

Proof. Since each row in reduced row-echelon form has a leading one that is a pivot, the non-zero rows are linearly independent. They also span the row space, which is unchanged by row reduction, and thus form a basis. It follows that
\[ \dim \mathrm{Row}(A) = \mathrm{rank}\, A. \]

The column case is similar.

Since dim Row(A) (dim Col(A)) is unaffected by elementary row (column) operations, the rank is independent of the row (column) reduction used.
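Theorem 27.4.1 is easy to probe numerically. A small sketch of mine using numpy's matrix_rank: since dim Row(A) = dim Col(A^T), the ranks of A and A^T must agree.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.integers(-3, 4, size=(4, 6)).astype(float)

    # dim Col(A) = rank A, and dim Row(A) = dim Col(A^T) = rank A^T.
    print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(A.T))   # always equal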


27.5 Rank of Matrix Products

One consequence of Theorem 27.4.1 is that the rank of the product AB is no more than the rank of A, and that the ranks are equal when B is invertible.

Theorem 27.5.1. Let A and B be conformable matrices. Then rank(AB) ≤ rank A. Moreover, if B is invertible, rank(AB) = rank A.

Proof. We know that rank A = dim Col(A). Now the columns of AB are linear combinations of the columns of A. In fact, if a_1, . . . , a_n are the columns of A, then the jth column of AB is \sum_{i=1}^{n} b_{ij} a_i. But then
\[ \mathrm{rank}(AB) = \dim \mathrm{Col}(AB) \le \dim \mathrm{Col}(A) = \mathrm{rank}\, A. \]

Now suppose B is invertible. Apply the previous result to (AB)B^{-1}, obtaining
\[ \mathrm{rank}\bigl[ (AB)B^{-1} \bigr] \le \mathrm{rank}(AB). \]

But (AB)B^{-1} = A(BB^{-1}) = A, so rank A ≤ rank(AB). Combining the two results shows rank(AB) = rank A when B is invertible.
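Theorem 27.5.1 is also easy to see numerically. A small sketch (the particular matrices are arbitrary choices of mine):

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((4, 4))
    A[3] = A[0] + A[1]                  # force rank A = 3
    P = rng.standard_normal((4, 2))     # only 2 columns, so rank(AP) <= 2 < rank A
    Binv = rng.standard_normal((4, 4))  # generically invertible

    print(np.linalg.matrix_rank(A))          # 3
    print(np.linalg.matrix_rank(A @ P))      # at most 2: multiplying can lose rank
    print(np.linalg.matrix_rank(A @ Binv))   # 3: an invertible factor preserves rank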


27.6 The Kernel of a Matrix

Since the solution set of a linear system is not affected by row operations, neither is the dimension of the kernel.

We also saw in section 10.13 that the solution set to Ax = b is a translate of ker A, so the dimension of the kernel tells us what the solution set looks like in general. It is just a translate of the kernel.

So what is the dimension of the kernel?
Let’s look at an example. Suppose the reduced row-echelon form of the coefficient matrix is

\[ R = \begin{pmatrix} 1 & 1 & 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 & 0 & 1 \end{pmatrix}. \]

There are two basic variables (x_1 and x_3) and four free variables (x_2, x_4, x_5, and x_6).

The reduced matrix now gives us two equations that define the kernel:
\[ 0 = x_1 + x_2 + x_4 + x_5 + x_6, \qquad 0 = x_3 + x_4 + x_6. \]

We can find a basis for the kernel by successively setting all of the free variables but one to zero. The other can be anything we want. We choose +1. Here are the solutions.

\[ b_1 = \begin{pmatrix} -1 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \quad b_2 = \begin{pmatrix} -1 \\ 0 \\ -1 \\ 1 \\ 0 \\ 0 \end{pmatrix}, \quad b_3 = \begin{pmatrix} -1 \\ 0 \\ 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}, \quad b_4 = \begin{pmatrix} -1 \\ 0 \\ -1 \\ 0 \\ 0 \\ 1 \end{pmatrix}. \]

Each of the b_i is in the kernel, and they are all linearly independent. The point is that each coordinate i = 2, 4, 5, 6 is non-zero in only one of the vectors. This happens because each vector is generated by considering a case where only one of the free variables is non-zero, which forces the corresponding coordinate to be non-zero.

The result of all this is that
\[ \dim \ker A = \#\text{free vars}. \]
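We can check the example with sympy, which computes reduced row-echelon forms and kernels exactly. A minimal sketch of mine; sympy may order or scale its nullspace vectors differently than we did.

    import sympy as sp

    R = sp.Matrix([[1, 1, 0, 1, 1, 1],
                   [0, 0, 1, 1, 0, 1]])

    rref, pivots = R.rref()
    print(pivots)                 # (0, 2): the basic variables are x1 and x3
    null = R.nullspace()          # one basis vector per free variable
    print(len(null))              # 4 = number of free variables
    print([v.T for v in null])    # matches b1, ..., b4 above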


27.7 The Dimension of the Kernel

The example was all very nice, but it is no substitute for a proof. Fortunately, all the proof needs is to add some words of explanation.

Theorem 27.7.1. Let A be an m × n matrix. Then
\[ \dim \ker A = \#\text{free vars}. \]

Proof. Row reduce A to a reduced row-echelon form R. Both A and R will have the same kernel because elementary row operations do not affect the solution set.

Write down the homogeneous equations corresponding to the reduced row-echelon form R. For each free variable i, set x_i = 1, all the other free variables to zero, and solve for the basic variables. This defines a vector b_i for each free variable i.

Each b_i solves the homogeneous equations, so b_i ∈ ker A. Suppose x ∈ ker A. Then x solves the reduced row-echelon system for some values x_i of the free variables. We can write
\[ x = \sum_{i \in \text{free vars}} x_i b_i, \]

showing that the b_i span ker A.
As b_i is the only one of the b_j whose coordinate i is non-zero, the b_j are linearly independent. It follows that they form a basis for ker A, so
\[ \dim \ker A = \#\text{free vars}. \]


27.8 Fundamental Theorem of Linear Algebra

We know that the number of free variables and the number of basic variables add to the number of variables (n). Combining that with Theorem 27.7.1, we obtain
\[ \#\text{basic vars} = n - \dim \ker A. \]

Since rank A is the number of basic variables, we can sum this up as follows.
Fundamental Theorem of Linear Algebra. Let A be an m × n matrix. Then
\[ n = \mathrm{rank}\, A + \dim \ker A. \]
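A quick numerical check of the Fundamental Theorem, again a sketch using sympy on an arbitrary matrix of mine:

    import sympy as sp

    A = sp.Matrix([[1, 2, 3, 4],
                   [2, 4, 6, 8],
                   [1, 0, 1, 0]])
    n = A.cols
    print(A.rank() + len(A.nullspace()) == n)   # True: rank + dim ker = n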


31. Transformations and Coordinates

You’ll notice there’s no Chapter 31 in Simon and Blume. Some of this material is not in Simon and Blume, some is scattered in the text.

31.1 Isomorphic Vector Spaces

It’s often helpful to analyze vector space problems based on general considerations rather than being tied to a specific vector space characterized by a particular basis.

We can change bases of vector spaces regardless of whether we think of them as being subspaces of R^n or something else entirely. However, if two vector spaces have the same finite dimension there will always be a mapping that will allow us to treat them as identical, as far as all vector space constructions are concerned.

So when are two vector spaces the same? If only the vector space properties matter, the answer is that they are the same if they are isomorphic.

Isomorphic Vector Spaces. Two vector spaces V and W are isomorphic if there is a linear transformation T : V → W that is one-to-one and onto. Such a mapping is called a linear isomorphism.

The fact that the transformation is linear tells us that it preserves the vector space operations. The fact that it is bijective means it has an inverse.


31.2 The Inverse of a Linear Isomorphism

The inverse of a linear isomorphism is also a linear isomorphism.

Theorem 31.2.1. Suppose T : V → W is a linear isomorphism between vector spaces V and W. Then the inverse transformation T^{-1} exists and is also a linear isomorphism.

Proof. For any y ∈ W, let T^{-1}(y) denote the unique x ∈ V with T(x) = y. Here such x exist because T maps onto W, and x is unique because T is one-to-one.

Now if x, y ∈ W, then for any scalar α,
\[ T\bigl( \alpha T^{-1}(x) + T^{-1}(y) \bigr) = \alpha x + y, \]
showing that T^{-1}(\alpha x + y) = \alpha T^{-1}(x) + T^{-1}(y). In other words, T^{-1} is linear.
If x ∈ V, then T^{-1}\bigl( T(x) \bigr) = x, showing that T^{-1} is onto. And if v, w ∈ W and T^{-1}(v) = T^{-1}(w), then apply T to find v = w, showing that T^{-1} is one-to-one. Thus T^{-1} is a linear isomorphism from W to V.


31.3 Matrix Isomorphisms

One important result is that a linear isomorphism T from R^n to R^m can be written T(x) = Ax for an invertible matrix A.

Theorem 31.3.1. Let T : R^n → R^m be a linear isomorphism. Then m = n and there is an invertible matrix A with T(x) = Ax for every x ∈ R^n.

Proof. Let T be as above. Because the mapping is linear, we can use Theorem 10.6.1 to find an m × n matrix A with T(x) = Ax.

Since T is a linear isomorphism, it must be both one-to-one and onto. The first requires that rank A = n by Corollary 7.28.1. By Corollary 7.30.2, the fact that T is onto tells us that rank A = m.

Combining these shows m = n = rank A, which means that A is non-singular, so it must be invertible.


31.4 Isomorphic Vector Spaces have the Same Dimension

In fact, all isomorphic vector spaces, not just R^n, must have the same dimension. It’s also easy to show that two vector spaces of the same finite dimension are isomorphic.

Theorem 31.4.1. Let V and W be finite-dimensional vector spaces. Then dim V = dim W if and only if there is an isomorphism T : V → W.

Proof. Only if case (⇒): Let {v_1, . . . , v_n} be a basis for V and {w_1, . . . , w_n} be a basis for W, where n = dim V (which is also dim W in this part). Then set T(v_i) = w_i and use linearity to define T on all of V. The resulting transformation T maps V into W. We will show it is bijective.

(1) The linear mapping T maps onto W, because if x = \sum_j x_j w_j ∈ W, then x = T\bigl( \sum_j x_j v_j \bigr).
(2) The mapping T is one-to-one because if T(x) = T(y) for some x, y ∈ V, write x = \sum_j x_j v_j and y = \sum_j y_j v_j; then \sum_j x_j w_j = \sum_j y_j w_j. As {w_1, . . . , w_n} is linearly independent, x_j = y_j for all j = 1, . . . , n, showing that x = y. It follows that T is an isomorphism.
If case (⇐): Now let T be an isomorphism and keep the notation {v_1, . . . , v_n} for a basis of V. Since T maps onto W, T\bigl( L[v_1, . . . , v_n] \bigr) = W, so the images T(v_j) span W, showing that dim W ≤ dim V.
We next show that the image of the basis, {T(v_1), . . . , T(v_n)}, is a linearly independent set. Suppose there are t_j, j = 1, . . . , n with \sum_j t_j T(v_j) = 0. Then T\bigl( \sum_j t_j v_j \bigr) = \sum_j t_j T(v_j) = 0. The fact that T is an isomorphism, and so one-to-one, implies \sum_j t_j v_j = 0. But the v_j form a basis, so t_j = 0 for all j, showing that {T(v_j)} is linearly independent. Since it is a linearly independent set in W, dim W ≥ n. Combining the results shows dim W = n = dim V.

You can even use this to show that any n-dimensional vector space is isomorphic to any free vector space of dimension n.


31.5 Isomorphism: One and a half Examples

◮ Example 31.5.1: An Isomorphism. Let W = {x ∈ R^3 : x_1 + x_2 + x_3 = 0}. This is a two-dimensional subspace and should be isomorphic with R^2. We start by finding a basis for W. The vectors

\[ b_1 = \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix}, \quad b_2 = \begin{pmatrix} 0 \\ 1 \\ -1 \end{pmatrix} \]
will do (as a linearly independent set in a two-dimensional space, they must be a basis). Define

\[ T\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = x_1 b_1 + x_2 b_2 = x_1 \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix} + x_2 \begin{pmatrix} 0 \\ 1 \\ -1 \end{pmatrix} = \begin{pmatrix} x_1 \\ x_2 \\ -x_1 - x_2 \end{pmatrix}. \]

Because b_1 and b_2 are linearly independent, T is one-to-one, and because {b_1, b_2} spans W, it is onto. That makes it an isomorphism. The inverse map is

\[ T^{-1}\begin{pmatrix} y_1 \\ y_2 \\ -y_1 - y_2 \end{pmatrix} = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}. \]

Keep in mind that the components of y must sum to zero by the definition of W. So once we know y_1 and y_2, the value of y_3 is already determined by the need to be in W. ◭

We could just as easily use the free vector space with basis {red,water} and set

\[ T\bigl( x_1\, \text{red} + x_2\, \text{water} \bigr) = \begin{pmatrix} x_1 \\ x_2 \\ -x_1 - x_2 \end{pmatrix}. \]


31.6 Isomorphic Normed Spaces

Normed spaces and inner product spaces have to meet higher standards for isomorphism because they have structure beyond their vector space structure that must be preserved.

Isometric Normed Spaces. An isomorphism T between normed spaces (V, ‖·‖_1) and (W, ‖·‖_2) is a vector space isomorphism between V and W that preserves the norm, ‖T(x)‖_2 = ‖x‖_1. Such isomorphisms are also called linear isometries or isometric isomorphisms.

Isometries are often linear. To state this more precisely, define the midpoint of two vectors x and y as (x + y)/2.

We state the following theorem of Mazur and Ulam without proof.
Mazur-Ulam Theorem. Let V and W be real normed spaces and T a mapping of V onto W with ‖T(x) − T(y)‖ = ‖x − y‖ and T(0) = 0. Then T maps midpoints to midpoints and is linear as a map over R.

This means that T is an isometric isomorphism between V and W. The result can fail in complex vector spaces.

You’ll notice that although T maps midpoints to midpoints, that is not generally true of the norms ‖T(·)‖. What happens is that
\[ \Bigl\| T\Bigl( \frac{1}{2}(x + y) \Bigr) \Bigr\| = \Bigl\| \frac{1}{2}\bigl( T(x) + T(y) \bigr) \Bigr\| \le \frac{1}{2}\bigl( \|T(x)\| + \|T(y)\| \bigr). \]
However, the triangle inequality is often strict, so ‖T(·)‖ usually doesn’t map midpoints to midpoints.


31.7 Isomorphic Inner Product Spaces
Inner product spaces have an even higher standard to uphold for isomorphism. The inner product must be preserved, meaning that angles between vectors remain the same. We will use the notation 〈x, y〉_i to distinguish the inner products. Preserving the inner product means the norm is also preserved, so inner product space isomorphisms are always isometric.

Isomorphic Inner Product Spaces. An isomorphism T between inner product spaces (V, 〈·, ·〉_1) and (W, 〈·, ·〉_2) is a vector space isomorphism between V and W that preserves the inner product, 〈T(x), T(y)〉_2 = 〈x, y〉_1.

Using less precise notation, we may write the inner product condition as
\[ T(x) \cdot T(y) = x \cdot y. \]


31.8 Automorphisms

An automorphism on a vector space V is an isomorphism from V to itself, T : V → V. Here we will consider automorphisms that preserve the inner product on ℓ^n_2, Euclidean R^n.

Automorphisms are not quite as specialized as you might think. When we have a linear isomorphism, the two vector spaces involved have the same dimension, and the isometry makes them act like they are identical. Actually requiring they be identical is a small additional step.

Before studying automorphisms further, we prove a useful lemma.

Lemma 31.8.1. Let V be an inner product space and suppose x and x′ obey x·y = x′·y for every y ∈ V. Then x = x′.

Proof. Now (x − x′)·y = 0 for every y ∈ V. Set y = x − x′ to obtain ‖x − x′‖² = 0. It follows that x = x′.


31.9 Characterizing Automorphisms on ℓ^n_2
The main result is:

Theorem 31.9.1. Let T be an automorphism on ℓ^n_2. Use any orthonormal basis (such as the standard basis) to find a matrix representation A of T. Then A^T A = I.

Proof. Under these conditions (Ax)·(Ay) = x·y for all x, y ∈ R^n. Now
\[ x \cdot y = (Ax) \cdot (Ay) = (Ay)^T (Ax) = y^T \bigl( (A^T A) x \bigr) = \bigl( (A^T A) x \bigr) \cdot y. \]

Since this holds for every y ∈ R^n, Lemma 31.8.1 implies that A^T A x = x. This holds for every x, which implies A^T A = I.

The result can also be written as A^{-1} = A^T.
Consideration of the standard basis showed these are rotations and reflections. In fact, the pure rotations all have det A = +1, while those involving reflections have det A = −1. As in R^2, an even number of reflections amounts to a rotation. I think the main reason it is less clear in dimensions higher than 3 is that our intuition doesn’t work so well there.

When the vector space is C^n, we use x·y = x^* y = \sum_{j=1}^{n} \bar{x}_j y_j. In that case, we find that the inner product is preserved when the basis matrix U obeys U^* U = I. Such matrices are called unitary matrices and are the complex version of rotations and reflections. For unitary matrices, however, det U can be any complex number of modulus one, |det U| = 1.
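Both conditions are easy to verify numerically. A small sketch (the particular matrices are my choices): an orthogonal matrix satisfies A^T A = I with det A = ±1, while a unitary matrix satisfies U^*U = I with |det U| = 1.

    import numpy as np

    th = 0.7
    A = np.array([[np.cos(th), np.sin(th)],     # a rotation-type matrix
                  [-np.sin(th), np.cos(th)]])
    print(np.allclose(A.T @ A, np.eye(2)))      # True: A^T A = I
    print(np.linalg.det(A))                     # +1.0: a pure rotation

    U = np.array([[1, 1], [1j, -1j]]) / np.sqrt(2)   # a sample unitary matrix
    print(np.allclose(U.conj().T @ U, np.eye(2)))    # True: U*U = I
    print(abs(np.linalg.det(U)))                     # 1.0: |det U| = 1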


31.10 Matrices with A^T A = I

Reflections also obey A^T A = I, but the determinant is −1. For example, reflecting in the horizontal axis maps (e_1, e_2) ↦ (e_1, −e_2). The new basis matrix is

\[ B = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \]

which has determinant −1.
Notice that multiplying again by B returns us to our original coordinates. Also notice that this transformation can’t possibly be a rotation. We saw that both diagonal elements must be the same for a rotation, but they aren’t here. Also, rotations have determinant +1, while this has determinant −1.

It is easy to see that matrices obeying A^T A = I come in two types: those that are purely rotations, with det A = +1, and those that involve a reflection, with det A = −1. Nothing else is possible.


31.11 Characterizing Inner Product Isomorphisms

Suppose T : R^n → R^n is an inner product isomorphism when both R^n’s have the Euclidean inner product. Such a mapping is also an isometry. The image of the standard basis is not only a basis (guaranteed by the vector space isomorphism), but must be an orthonormal basis.

Theorem 31.11.1. Let T be an inner product isomorphism from R^m to R^m and suppose {b_1, . . . , b_m} is an orthonormal basis for R^m. Then {T(b_i)} is also an orthonormal basis for R^m.

Proof. To see this, we compute
\[ T(b_i) \cdot T(b_j) = b_i \cdot b_j = \delta_{ij}. \]
It is now clear that {T(b_i)} is an orthonormal set.

In particular, Theorem 31.11.1 applies when we use the standard basis vectors in R^m. They are mapped to an orthonormal set. This means that T is either a rotation, or a rotation together with a reflection.


31.12 Rotations and Reflections in Two Dimensions

Let’s start with R^2. The fact that T(e_1) is perpendicular to T(e_2) means that T(e_2) lies on the line perpendicular to T(e_1). Since T(e_2) is a unit vector, there are only two possible places to put it. One is a rotation. The other involves a reflection together with a rotation. This is illustrated in Figure 31.12.1.

Figure 31.12.1: Two panels, "Rotation" and "Rotation and Reflection". In the left panel the standard coordinate vectors are rotated counter-clockwise by 45°. The dashed line is perpendicular to b_1 and shows the line that b_2 must lie in. Then b_2 is the only unit vector on that line that is consistent with rotation. Using −b_2 would require a combination of both rotation and reflection. In the right panel, we make the opposite choice for b_2, pointing downward along the 45° line rather than upward. This amounts to making the rotation shown in the left panel, and then reflecting the result about the line through the origin defined by b_1. This leaves b_1 unchanged, but flips b_2 to b′_2.


31.13 General Rotations in R2

Consider a clockwise rotation of the canonical basis vectors in R^2 by an angle θ. This rotates the vectors e_i into the vectors b_i.

Figure 31.13.1: The standard coordinate axes are rotated clockwise by an angle θ. A little trigonometry gives us the coordinates in the old system. Since b_i is a unit vector, we can read the coordinates in terms of the sine and cosine of θ. Specifically, b_1 = (cos θ, −sin θ) and b_2 = (sin θ, cos θ).

As shown in the diagram, we have the following mapping:
\[ e_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \mapsto \begin{pmatrix} \cos\theta \\ -\sin\theta \end{pmatrix}, \qquad e_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix} \mapsto \begin{pmatrix} \sin\theta \\ \cos\theta \end{pmatrix}. \]

Note that the new vectors are still an orthonormal basis, as both vectors are rotated by the same angle. Also, the determinant of the new basis matrix is
\[ \det \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} = +1. \]

In fact, rotations always have a determinant of +1.
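Here is a minimal sketch of mine that builds the rotation matrix of this section and checks two facts: the determinant is +1, and composing rotations adds their angles.

    import numpy as np

    def rotation(theta):
        """Basis matrix for the clockwise rotation of section 31.13."""
        c, s = np.cos(theta), np.sin(theta)
        return np.array([[c, s],
                         [-s, c]])

    B = rotation(np.pi / 6)
    print(np.linalg.det(B))                                        # +1.0
    print(np.allclose(rotation(0.3) @ rotation(0.5), rotation(0.8)))
    # True: rotations compose by adding their angles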


31.14 Example: A Rotation and its Inverse

◮ Example 31.14.1: 45° Rotation: Done and Undone. The matrix
\[ B = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix} \]
rotates the coordinates of R^2 by 45°, taking (1, 0)^T to (1/\sqrt{2}, 1/\sqrt{2})^T and (0, 1)^T to (−1/\sqrt{2}, 1/\sqrt{2})^T. Since this is a rotation, its transpose is also its inverse, and we have
\[ B^{-1} = B^T = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}. \]

Figure 31.14.2: Two panels, "Effect of B" and "Effect of B^T". In the left panel the standard coordinate axes are rotated counter-clockwise by 45°; b_1 and b_2 are the columns of B. In the right panel, we have the inverse transformation, where a 45° clockwise rotation gives us the new coordinate axes; these new axes are the columns of the matrix B^T.


31.15 Rotations and Reflections in Three Dimensions

In R^3, T(e_1) determines a plane that the other T(e_i) lie in. Once we know where T(e_2) goes, there are only two choices for T(e_3). One involves a rotation of {e_1, e_2, e_3}, the other combines a reflection and rotation. That the latter case is possible can be seen by letting your thumb, forefinger, and middle finger represent the basis vectors. Your right hand cannot be rotated to be a left hand, and vice-versa. However, a mirror can turn a right hand into a left hand, which is why a reflection might be needed.

In any R^m, there are two orientations of orthogonal axes. One is a rotation of the standard axes; the other always involves a reflection.


31.16 Bases and Coordinates

We’ve used two sets of bases for several pages. It’s time to approach the different coordinate systems more systematically.

Let B = {b_1, . . . , b_n} be a basis for R^n. Define the basis matrix by lining up the basis vectors in order:
\[ B = \begin{pmatrix} b_1 & b_2 & \cdots & b_n \end{pmatrix}. \]

Since B both spans and is a linearly independent set of n vectors in R^n, B is an n × n matrix with rank n. That means it is invertible. Given a vector x ∈ R^n, we find its vector of coordinates t_B in the B basis by solving the equation
\[ x = B t_B. \]

Because B is invertible, t_B = B^{-1} x.
This all applies to the standard basis E = {e_1, . . . , e_n}. In that case, the basis matrix E is the n × n identity matrix. Nonetheless, we use a special name for it to emphasize that we are doing basis calculations. Given a vector x, expressed in the standard coordinates, we find that t_E = E^{-1} x = I x = x, meaning that the coordinates are what we think they are.


31.17 Example: Coordinates in R3

Let’s see how this works in R^3. The basis B = {b_1, b_2, b_3}, defined by
\[ b_1 = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}, \quad b_2 = \begin{pmatrix} 2 \\ 0 \\ 2 \end{pmatrix}, \quad b_3 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \]
gives us the basis matrix
\[ B = \begin{pmatrix} 1 & 2 & 1 \\ 2 & 0 & 1 \\ 3 & 2 & 1 \end{pmatrix}. \]

We now use the formula
\[ x = B t_B \]
to change from B coordinates to standard coordinates. The vectors with coordinates t_B = (1, 1, 0)^T and t′_B = (1, 0, 3)^T yield the vectors x = B t_B = (3, 2, 5)^T and x′ = B t′_B = (4, 5, 6)^T in standard coordinates.

Let’s see how it works the other way, going from standard basis E coordinates to B coordinates. For that, we use the formula
\[ t_B = B^{-1} x. \]

The inverse of B is
\[ B^{-1} = \begin{pmatrix} -1/2 & 0 & 1/2 \\ 1/4 & -1/2 & 1/4 \\ 1 & 1 & -1 \end{pmatrix}. \]

The vector x = (3, 2, 1)^T then has coordinates t_B = B^{-1} x = (−1, 0, 4)^T in the basis B, meaning that x = −b_1 + 4 b_3. The vector x′ = (−1, −1, +5)^T has coordinates t′_B = B^{-1} x′ = (3, 3/2, −7)^T in the B basis, so that x′ = 3 b_1 + (3/2) b_2 − 7 b_3 = (−1, −1, +5)^T in the standard basis.
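The whole example can be checked in a few lines of Python (a sketch of mine; numerically, solving B t = x is preferred to forming B^{-1} explicitly):

    import numpy as np

    B = np.array([[1., 2, 1],
                  [2., 0, 1],
                  [3., 2, 1]])

    # B coordinates -> standard coordinates: x = B t_B.
    print(B @ np.array([1., 1, 0]))          # [3. 2. 5.]
    print(B @ np.array([1., 0, 3]))          # [4. 5. 6.]

    # Standard -> B coordinates: t_B = B^{-1} x.
    print(np.linalg.solve(B, np.array([3., 2, 1])))     # [-1.  0.  4.]
    print(np.linalg.solve(B, np.array([-1., -1, 5])))   # [ 3.   1.5 -7. ]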


31.18 Changing Coordinate Systems

Given two different bases (aka coordinate systems), B = {b_1, . . . , b_n} and B′ = {b′_1, . . . , b′_n}, we form the corresponding basis matrices B and B′.

Given a vector x, we can write it in the two coordinate systems as x = B t_B and x = B′ t_{B′}. Then B t_B = B′ t_{B′}. Solving for t_B and t_{B′}, we derive the change of coordinates formulas:
\[ t_B = \bigl( B^{-1} B′ \bigr) t_{B′} \quad \text{and} \quad t_{B′} = \bigl( (B′)^{-1} B \bigr) t_B. \tag{31.18.1} \]

Starting with the B′ coordinates, we multiply by B′ to get the actual vector x, and then multiply by B^{-1} to put it into the B coordinate system. Conversely, to convert the B coordinates to B′ coordinates, we reverse the process, multiplying first by B, and then by (B′)^{-1}.
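Equation (31.18.1) becomes a one-line function. A minimal sketch (the function name is mine), using the bases from the worked example in the next section:

    import numpy as np

    def change_coords(t_from, B_from, B_to):
        """Convert coordinates from basis matrix B_from to basis matrix B_to,
        using equation (31.18.1): t_to = B_to^{-1} B_from t_from."""
        return np.linalg.solve(B_to, B_from @ t_from)

    B  = np.array([[1., 1], [0., 1]])
    Bp = np.array([[1., 2], [2., 1]])
    tBp = np.array([1., 4])
    tB = change_coords(tBp, Bp, B)
    print(tB)                        # [3. 6.]
    print(B @ tB, Bp @ tBp)          # both [9. 6.]: the same underlying vector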


31.19 Example: Changing Coordinates in R2

To see how this works, suppose
\[ B = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \quad \text{and} \quad B′ = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix} \]
are the basis matrices. Then
\[ B^{-1} = \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix} \quad \text{and} \quad (B′)^{-1} = \frac{1}{3} \begin{pmatrix} -1 & 2 \\ 2 & -1 \end{pmatrix}. \]

Consider the vector with B′ coordinates t_{B′} = (1, 4)^T. Using the formula, we obtain the B coordinates
\[ t_B = \bigl( B^{-1} B′ \bigr) t_{B′} = \begin{pmatrix} -1 & 1 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 4 \end{pmatrix} = \begin{pmatrix} 3 \\ 6 \end{pmatrix}. \]

Let’s check it. Now t_{B′} = (1, 4)^T corresponds to
\[ x = \begin{pmatrix} 1 \\ 2 \end{pmatrix} + 4 \begin{pmatrix} 2 \\ 1 \end{pmatrix} = \begin{pmatrix} 9 \\ 6 \end{pmatrix} \quad \text{and} \quad x = 3 \begin{pmatrix} 1 \\ 0 \end{pmatrix} + 6 \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 9 \\ 6 \end{pmatrix}. \]
This shows that both expressions refer to the same vector x, whose standard coordinates are x = (9, 6)^T.


31.20 Linear Transformations and Bases

So how do these coordinate changes affect linear transformations? Take a linear transformation on R^n, T : R^n → R^n. As we saw in section 10.6, we can use the standard basis to represent this in matrix form, so that T(x) = Ax.

So what happens if we want to use a different basis? One reason to do such a thing would be to write the transformation in a more convenient form, one that is easier to calculate or interpret.

Let B be the basis matrix for a basis B. We will write the matrix for T as A_E when it is in standard (E) coordinates and A_B when it is in B coordinates.

To find A_B from A_E, we start with B coordinates t_B, then convert them to standard coordinates, x = B t_B. Then we feed this to the matrix in standard coordinates, obtaining A_E B t_B.

As this is in standard coordinates, we have to convert the result back to the B coordinates. We do this by multiplying on the left by B^{-1}. That gives us (B^{-1} A_E B) t_B as the B coordinates of the transformed vector. Thus
\[ A_B = B^{-1} A_E B \quad \text{or} \quad B A_B B^{-1} = A_E \tag{31.20.2} \]
is the matrix for T in B coordinates. The type of transformation used in equation (31.20.2) is sometimes called a similarity transformation.


31.21 Coordinate Change with Arbitrary Bases

Things are a bit more complicated if we had originally used a basis other than the standard basis. If the transformation had been written in B′ coordinates, we would multiply by ((B′)^{-1} B) to convert B coordinates to B′ coordinates, apply A_{B′}, then convert back. The result is:
\[ A_B = \bigl( B^{-1} B′ \bigr) A_{B′} \bigl( (B′)^{-1} B \bigr). \]

Another way to write this, and one that may make the method clearer, is to transform both into standard coordinates:
\[ B A_B B^{-1} = A_E = B′ A_{B′} (B′)^{-1}. \]


31.22 Example: Linear Transformation Basis Change

Suppose our new basis is
\[ B = \Bigl\{ \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ -1 \end{pmatrix} \Bigr\} \]
with basis matrix
\[ B = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}. \]
This is an orthogonal basis since the vectors are perpendicular, but not orthonormal because they have length \sqrt{2}. This means that B^{-1} = (1/2) B^T.

Suppose our linear transformation T has matrix
\[ A = \frac{1}{2} \begin{pmatrix} 3 & 1 \\ 1 & 3 \end{pmatrix} \]
in the standard basis. To find its representation A_B in the B basis, we first compute
\[ B^{-1} = \frac{1}{2} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}. \]

Then our new transformation matrix is
\[ A_B = B^{-1} A B = \frac{1}{4} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} 3 & 1 \\ 1 & 3 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}. \]

As you can see, the transformation has taken a particularly simple form: the transformed matrix A_B is diagonal. This reflects the fact that T(b_1) = 2 b_1 and T(b_2) = b_2. In Chapter 23, you will learn how we can sometimes find such a basis from the original matrix.
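We can reproduce this similarity transformation numerically, a small sketch of mine:

    import numpy as np

    A = np.array([[1.5, 0.5],
                  [0.5, 1.5]])          # the standard-basis matrix (1/2)(3 1; 1 3)
    B = np.array([[1., 1],
                  [1., -1]])            # basis matrix for the new basis

    AB = np.linalg.inv(B) @ A @ B       # similarity transformation (31.20.2)
    print(AB)                           # diag(2, 1)

    # The diagonal entries confirm T(b1) = 2 b1 and T(b2) = b2.
    print(A @ B[:, 0], A @ B[:, 1])     # [2. 2.] and [1. -1.]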


31.23 Example: Transformations with Complex Numbers

Consider the matrix
\[ A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}. \]

In section 8.31, we saw that A is a square root of −I, the negative of the identity matrix. By using a complex basis, we can see just how true that is. There is a complex basis B where A_B is purely imaginary in the weak sense that all of its non-zero elements are purely imaginary. Indeed, the non-zero elements are square roots of −1.

Consider the basis with basis matrix
\[ B = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ i & -i \end{pmatrix}. \]
It is a unitary matrix. Its inverse is its Hermitian conjugate
\[ B^{-1} = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & -i \\ 1 & i \end{pmatrix}. \]

Now
\[ A_B = B^{-1} A B = \frac{1}{2} \begin{pmatrix} 1 & -i \\ 1 & i \end{pmatrix} \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ i & -i \end{pmatrix} = \frac{1}{2} \begin{pmatrix} 1 & -i \\ 1 & i \end{pmatrix} \begin{pmatrix} i & -i \\ -1 & -1 \end{pmatrix} = \begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix}, \]
revealing that the matrix A is more closely connected to the imaginary numbers than we first realized.

If you’ve had a comprehensive linear algebra course, you may have seen such transformations before. If you had a differential equations class that covered linear differential systems, you may have seen them there too. As was true of the previous example, you will learn how to find transformations like this in Chapter 23.
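A short numerical check of the complex example (a sketch of mine; numpy handles the complex arithmetic directly):

    import numpy as np

    A = np.array([[0., 1], [-1., 0]])
    B = np.array([[1, 1], [1j, -1j]]) / np.sqrt(2)

    AB = np.conj(B).T @ A @ B           # B^{-1} = B* because B is unitary
    print(np.round(AB, 12))             # diag(i, -i)
    print(np.allclose(AB @ AB, -np.eye(2)))   # True: A_B is a square root of -I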


31.24 The Dual Space

In section 10.8, we defined linear functionals, linear functions from a real vector space R^n to R. More generally, let F = R or C. A linear function f from a finite-dimensional vector space V over F to F is called a linear functional. We saw earlier that any linear functional on F^n can be represented by a 1 × n matrix, a horizontal vector, a covector.1

The dual space of V is the set of all linear functionals on V and is denoted V^*. Since the set of 1 × n matrices is a vector space of dimension n = dim V, dim V^* = dim V.

The most common duality in economics involves prices and quantities. We think of quantities x ∈ R^n and prices in (R^n)^*, writing px for cost. Some problems are better studied using functions of quantity (e.g., utility, production), while others are better studied using dual functions of price (cost, expenditure, indirect utility).

To see how duality relates to bases, we treat f as we do any other linear transformation. We express x using the standard basis and use the linearity of f to write:

\[ f(x) = f\Bigl( \sum_{j=1}^{n} x_j e_j \Bigr) = \sum_{j=1}^{n} x_j f(e_j) = \begin{pmatrix} f(e_1) & \cdots & f(e_n) \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \tag{31.24.3} \]

So any linear functional f on V defines a 1 × n matrix v_f by
\[ v_f = \begin{pmatrix} f(e_1) & \cdots & f(e_n) \end{pmatrix}. \]

Reading equation (31.24.3) up from the bottom makes it clear that any 1 × n matrix defines a linear functional on V, and vice-versa. We can think of the linear functionals as 1 × n matrices.

In fact, if V is an inner product space, we can identify x ∈ V with the dual element x^* since y ↦ x·y = x^* y is a linear functional on V. However, this mapping is only linear if V is a real vector space. If it is a complex vector space, the mapping is conjugate linear.

1 When dealing with infinite-dimensional spaces, a distinction is made between linear functions from V to R, sometimes called linear forms, and continuous linear functions from V to R, called linear functionals. We are avoiding these technical issues by restricting ourselves to finite-dimensional spaces.
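Equation (31.24.3) is easy to demonstrate in code. A minimal sketch; the functional f below is a hypothetical example of mine, not one from the text.

    import numpy as np

    # A linear functional f on R^3, represented by the covector of its
    # values on the standard basis: v_f = (f(e1) f(e2) f(e3)).
    def f(x):
        return 2 * x[0] - x[1] + 5 * x[2]   # a hypothetical functional

    e = np.eye(3)
    v_f = np.array([f(e[:, j]) for j in range(3)])   # the 1 x n matrix of (31.24.3)

    x = np.array([1., 2, 3])
    print(v_f @ x, f(x))     # 15.0 15.0: the covector reproduces f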


31.25 The Dual of a Basis

Given a basis B = {b_1, . . . , b_n} for V, we can define a corresponding dual basis B^* for V^* by setting b^*_i(b_j) = δ_{ij}.
It is easy to verify that this is a basis for V^*.

Theorem 31.25.1. Let V be a finite-dimensional vector space. The dual basis is a basis for V^*.

Proof. Let f be a linear functional on V. We write x in the basis B as x = \sum_j x_j b_j. Then
\[ b^*_i(x) = \sum_{j=1}^{n} x_j b^*_i(b_j) = \sum_{j=1}^{n} x_j \delta_{ij} = x_i. \]

Now expand v_f x = f(x):
\[ v_f x = f(x) = f\Bigl( \sum_{j=1}^{n} x_j b_j \Bigr) = \sum_{j=1}^{n} x_j f(b_j) = \sum_{j=1}^{n} f(b_j) b^*_j(x). \]
This shows that f = v_f = \sum_j f(b_j) b^*_j, meaning that B^* spans V^*.

Next, we consider linear independence. Suppose f = v_f = \sum_{j=1}^{n} x_j b^*_j = 0. Then for each b_i, i = 1, . . . , n,
\[ 0 = f(b_i) = v_f(b_i) = \sum_{j=1}^{n} x_j b^*_j(b_i) = \sum_{j=1}^{n} x_j \delta_{ij} = x_i. \]
But then x_i = 0 for i = 1, . . . , n, showing that B^* is a linearly independent set, and therefore a basis for V^*.


31.26 The Standard Dual Basis

We can now define the standard dual basis {e^*_1, . . . , e^*_n} for (R^n)^* by e^*_i(e_j) = δ_{ij}. Thus the dual basis vectors e^*_i are the row vectors e^*_1 = (1, 0, 0, . . . , 0), e^*_2 = (0, 1, 0, . . . , 0), . . . , e^*_n = (0, . . . , 0, 1). This allows us to write any f ∈ V^* as
\[ f = \sum_{i=1}^{n} f(e_i) e^*_i = \bigl( f(e_1), . . . , f(e_n) \bigr). \]

The following corollary is similar to Lemma 31.8.1, but applies to any dual space and doesn’t require that V be an inner product space.

Corollary 31.26.1. Let V be a finite-dimensional vector space. If f(x) = f(y) for all f ∈ V^*, then x = y.

Proof. Let B be a basis for V. We can write x = \sum_j x_j b_j and y = \sum_j y_j b_j. Since b^*_i ∈ V^*, x_i = b^*_i(x) = b^*_i(y) = y_i for every i = 1, . . . , n. Then x = y by linear independence of the b_i.


31.27 Coordinate Change in the Dual Space

Although we have a formula for the dual basis, we still need to fully identify it. Since the dual basis consists of covectors (row vectors), we form the dual basis matrix B̂ by stacking the rows.2
\[ \hat{B} = \begin{pmatrix} b^*_1 \\ b^*_2 \\ \vdots \\ b^*_n \end{pmatrix}. \]

Then the ij coordinate of B̂B is b^*_i b_j = δ_{ij}, meaning that B̂B = I. The dual basis matrix is simply B^{-1}, keeping in mind that we are using the rows of B^{-1}, not the columns. We formalize this as the following theorem.

Theorem 31.27.1. Let B be a basis for an n-dimensional real vector space V and B = (b_1, . . . , b_n) be the basis matrix. Then the dual basis B^* = {b^*_1, . . . , b^*_n} has basis matrix B^{-1}.

Of course, to change coordinates, t_B = B^{-1} x in V. The dual space works a little differently, as the basis matrix must multiply the coordinate vector on the right. Thus if we have a linear functional f defined by the covector v_f in the standard basis, the coordinates in the basis B^* are t^*_B = v_f (\hat{B})^{-1} = v_f B. Since the coordinates vary directly with the basis matrix, the vector is called covariant. With ordinary vectors, we use the inverse of the basis matrix, and call them contravariant as a result.

It follows that
\[ t^*_B(t_B) = \bigl( v_f B \bigr) \bigl( B^{-1} x \bigr) = v_f \bigl( B B^{-1} \bigr) x = v_f x = f(x), \]
showing that f(x) is unaffected by this double change of coordinates, which is what we need.

◮ Example 31.27.2: Gallons vs. Quarts. Going back to the beginning of the Chapter 10 notes, this means if we measure milk in quarts rather than gallons, and milk is good k, then the coordinate change for quantities is given by B^{-1} = diag(1, . . . , 1, 4, 1, . . . , 1) where 4 is in the kth row. Then B = diag(1, . . . , 1, 1/4, 1, . . . , 1), so the corresponding (dual) price vector must be multiplied by 1/4. ◭

2 I don’t use B∗ because of potential confusion with the Hermitian conjugate.
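The gallons-to-quarts example can be played out numerically. In this sketch the quantities and prices are invented for illustration: total cost p·x is unchanged because prices transform covariantly while quantities transform contravariantly.

    import numpy as np

    # Milk (good k = 2 of 3) is re-measured in quarts: quantities scale by 4,
    # so the coordinate-change matrix for quantities is B^{-1} = diag(1, 4, 1).
    Binv = np.diag([1., 4., 1.])
    B = np.linalg.inv(Binv)            # diag(1, 1/4, 1)

    x = np.array([2., 1.5, 3.])        # quantities, with milk in gallons
    p = np.array([1., 2.4, 5.])        # prices, per gallon of milk

    t = Binv @ x                       # quantities in quarts (contravariant)
    q = p @ B                          # prices per quart (covariant)
    print(q @ t, p @ x)                # both 20.6: total cost is basis-independent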


31.28 Another Duality Example

Rotations and reflections are a little different from the other transformations of bases. For starters, B^{-1} = B^T. Now when we apply B^{-1} = B^T to the coordinates of a vector, we are taking sums of the columns of B^T. When we apply B to a covector (multiplying on the right), we obtain sums of the rows of B, which are the columns of B^T. The action is the same on both the vectors and covectors. Let’s see how this works with an orthonormal basis.

◮ Example 31.28.1: 45° Rotation and Duality. The matrix
\[ B = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix} \]
rotates the coordinates of R^2 by 45°, mapping
\[ \begin{pmatrix} 1 \\ 0 \end{pmatrix} \mapsto \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 0 \\ 1 \end{pmatrix} \mapsto \frac{1}{\sqrt{2}} \begin{pmatrix} -1 \\ 1 \end{pmatrix}. \]
Since this is a rotation, its transpose is also its inverse, and we have
\[ B^{-1} = B^T = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}. \]

Now what happens to the dual basis? It is also rotated by 45°, as the covector (1, 0) ↦ (1/\sqrt{2}, 1/\sqrt{2}) and (0, 1) ↦ (−1/\sqrt{2}, 1/\sqrt{2}).

Figure 31.28.2: Here the standard coordinate axes are rotated counter-clockwise by 45°. The standard dual basis lines up with the standard basis itself. Because the new basis is an orthonormal basis, the dual basis must rotate to match.


31.29 Rn Geometry Puzzle

Consider a square with 2-foot sides. Divide that square into 4 quadrants, with sides 1-foot each. Then inscribe a circle into each quadrant. Finally, inscribe a small circle (the red circle) in the middle, tangent to the four quadrant circles.

Suppose we try an analogous construction in R^3, R^4, . . . . In R^3, we start with a cube with 2-foot sides. We bisect each side with planes, inscribe the 1-foot spheres, then inscribe the red 2-sphere in the middle. In R^4, we have a tesseract with 2-foot sides, we bisect each side with 3-d "planes", inscribe the 1-foot 3-spheres, then the red 3-sphere in the middle. We do this for each R^n with n ≥ 2.

Problem: What does the diameter of the red sphere converge to as n → ∞?
(A) 0. (B) 1. (C) 2. (D) ∞.


31.30 The Answer to the Puzzle

As shown in the diagram, we draw the main diagonal of the 2-foot square (cube, tesseract, etc.). We are in Euclidean space, so in R^2 the diagonal has length L_2 = \sqrt{2^2 + 2^2} = 2\sqrt{2}. In R^n, the length is L_n = 2\sqrt{n}.
Examining the diagonal more closely, we find that after subtracting the diameters of the two 1-foot circles it passes through, we have 2\sqrt{n} − 2 left over. This includes both the diameter of the red circle and the parts sticking out of the 1-foot circles at either end. By symmetry, the portion sticking out of each 1-foot circle has the same length as the red circle's radius, so the leftover portion 2\sqrt{n} − 2 is 4 times the radius, or twice the diameter, of the red circle.
That means the red circle has diameter d_n = \sqrt{n} − 1. When n = 2, d_2 ≈ 0.414. When n = 4, d_4 = 2 − 1 = 1. When n = 9, d_9 = 3 − 1 = 2. At that point the red hypersphere in the middle touches the sides of the large enclosing box. For n > 9, the red hypersphere actually pokes out. We can see now that the correct answer was (D) ∞. The inside gets much roomier as the number of dimensions increases, which allows the red hypersphere to partly escape the containment by the other hyperspheres.
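A few lines of Python make the growth of d_n = \sqrt{n} − 1 concrete (a sketch of mine):

    import numpy as np

    n = np.arange(2, 13)
    d = np.sqrt(n) - 1          # diameter of the inner hypersphere in R^n
    for k, dk in zip(n, d):
        print(k, round(dk, 3))  # d_2 ~ 0.414, d_4 = 1, d_9 = 2, then it pokes out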


32. Tensors and Tensor Products

Here we build up some of the basic facts concerning tensors. We will later use tensors to express the multidimensional version of Taylor’s Theorem. They are also useful background for exterior products (a special kind of tensor) and their relation to multidimensional integrals.

32.1 Outer Products

We’ve spent some time on inner products of vectors. So if vectors have an inner product, do they also have an outer product? Yes they do!

Given x ∈ R^m and y ∈ R^n, the outer product x ⊗ y is an m × n matrix defined by (x ⊗ y)_{ij} = x_i y_j. This can also be written x ⊗ y = x y^T. An immediate consequence is that (x ⊗ y)^T = y x^T = y ⊗ x.

The outer product is not to be confused with the exterior product, which we will encounter later.

Thus, if x ∈ R^3 and y ∈ R^2,
\[ x ⊗ y = \begin{pmatrix} x_1 y_1 & x_1 y_2 \\ x_2 y_1 & x_2 y_2 \\ x_3 y_1 & x_3 y_2 \end{pmatrix}. \]
More generally, x ⊗ y is an m × n matrix with ij element x_i y_j.


32.2 Examples of Outer Products

Suppose x = (1, 2, 3)^T and y = (10, −1)^T. Then
\[ x ⊗ y = \begin{pmatrix} 10 & -1 \\ 20 & -2 \\ 30 & -3 \end{pmatrix}. \]
Another example is x = (−1, 0, 3)^T and y = (−1, 5, 10)^T, when
\[ x ⊗ y = \begin{pmatrix} 1 & -5 & -10 \\ 0 & 0 & 0 \\ -3 & 15 & 30 \end{pmatrix}. \]
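numpy computes outer products directly with np.outer. A quick check of the first example and of the transpose rule (x ⊗ y)^T = y ⊗ x:

    import numpy as np

    x = np.array([1., 2, 3])
    y = np.array([10., -1])

    print(np.outer(x, y))               # the 3 x 2 matrix (x_i y_j)
    print(np.allclose(np.outer(x, y).T, np.outer(y, x)))   # True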


32.3 Some Properties of the Outer Product

There are two important relations:
\[ (\alpha x) ⊗ y = x ⊗ (\alpha y) = \alpha (x ⊗ y) \]
and
\[ x ⊗ (y + z) = x ⊗ y + x ⊗ z, \qquad (x + z) ⊗ y = x ⊗ y + z ⊗ y. \]

You can see that some sort of bilinearity has been built into the outer product. In fact, you can combine the above to show that for x, y ∈ R^m and u ∈ R^n,
\[ (\alpha x + y) ⊗ u = (\alpha x) ⊗ u + y ⊗ u = \alpha (x ⊗ u) + y ⊗ u, \]

showing that the outer product is linear in the first coordinate. Similarly, for x ∈ R^m and u, v ∈ R^n,
\[ x ⊗ (\alpha u + v) = \alpha (x ⊗ u) + x ⊗ v, \]

showing that the outer product is also linear in the second coordinate, and so is bilinear.
Outer products can be used to represent tensor products, but there is an important difference. The outer product obeys (x ⊗ y)^T = y ⊗ x, a formula that often doesn’t make sense for tensor products.


32.4 Outer Products and Bilinear Functions

If T is a linear map of m × n matrices (considered as a vector space) into R, T(x ⊗ y) is a bilinear function of (x, y).

Theorem 32.4.1. Let T be a linear function on the set of m × n matrices, x ∈ R^m and y ∈ R^n. Then the function f(x, y) = T(x ⊗ y) is a bilinear function on R^m × R^n.

Proof. To see this, note that
\[ T\bigl( (\alpha x) ⊗ y \bigr) = T\bigl( x ⊗ (\alpha y) \bigr) = T\bigl( \alpha (x ⊗ y) \bigr) = \alpha T(x ⊗ y) \]
and
\[ T\bigl( (x + y) ⊗ z \bigr) = T(x ⊗ z + y ⊗ z) = T(x ⊗ z) + T(y ⊗ z), \]
\[ T\bigl( x ⊗ (y + z) \bigr) = T(x ⊗ y + x ⊗ z) = T(x ⊗ y) + T(x ⊗ z). \]

Together, these imply that f(x, y) = T(x ⊗ y) is bilinear in (x, y).

According to Theorem 10.6.1, we can always represent a linear transformation between vector spaces by a matrix. Here T maps the vector space of m × n matrices into the real numbers. All we need is a basis for the matrices to help us write the transformation in matrix form.

We will take the basis of the m × n real matrices where b_{ij} is the m × n matrix with 1 in position ij, and zero elsewhere. It is easy to see this spans the m × n matrices and is linearly independent. In fact, the ij basis element is e_i ⊗ e_j.

Define an m × n matrix A by a_{ij} = T(e_i ⊗ e_j). We can represent f and T by the matrix A as
\[ f(x, y) = T(x ⊗ y) = \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij} x_i y_j, \]

which is a bilinear form (2-tensor).
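Here is a minimal sketch of mine of the representation just derived: a random matrix A plays the role of a_{ij} = T(e_i ⊗ e_j), and the bilinear form x·(Ay) agrees with T applied to the outer product.

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.integers(-2, 3, size=(3, 2)).astype(float)   # a_ij = T(e_i (x) e_j)

    def f(x, y):
        return x @ A @ y        # the bilinear form sum_ij a_ij x_i y_j

    def T(M):
        # T is linear on 3 x 2 matrices: T(M) = sum_ij a_ij M_ij.
        return np.sum(A * M)

    x, y = rng.standard_normal(3), rng.standard_normal(2)
    print(np.isclose(f(x, y), T(np.outer(x, y))))   # True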


32.5 The Vector Space of Outer Products

We can define a basis on the set of m × n outer products. One way is to use the standard basis of e_i. Then e_1 ⊗ e_1, e_1 ⊗ e_2, . . . , e_1 ⊗ e_n, e_2 ⊗ e_1, . . . , e_m ⊗ e_n is a basis for the space of m × n outer products.

The way this is normally done is to take the free vector space generated by the pairs (e_i, e_j) and then take equivalence classes so that the result obeys the rules:
\[ u ⊗ (v + w) = u ⊗ v + u ⊗ w, \]
\[ (u + v) ⊗ w = u ⊗ w + v ⊗ w, \]
\[ \alpha (v ⊗ w) = (\alpha v) ⊗ w = v ⊗ (\alpha w), \]
with x ⊗ y representing an equivalence class. Fortunately, we only need to know that we have a vector space with the extra operation, the outer or tensor product.1

The use of the free vector space has purged such notions as taking the transpose of a tensor product, as there is no rule for handling it.

However, it does allow us to define the tensor product of linear maps. Suppose V, W, X, Y are vector spaces and we have linear transformations T : V → X and S : W → Y. Then we can define T ⊗ S : V ⊗ W → X ⊗ Y by
\[ (T ⊗ S)(v ⊗ w) = T(v) ⊗ S(w). \]

1 The outer product is a way of representing the tensor product, but the two are isomorphic in this case.


32.6 Pure 2-Tensors

Tensors of the form v ⊗ w ∈ V ⊗ W are called pure tensors or simple tensors. Not all tensors can be written that way. Some must be written as linear combinations of pure tensors.

◮ Example 32.6.1: Not All Tensors are Pure. Let V = W = R^2, and consider
\[ e_1 ⊗ e_1 + e_2 ⊗ e_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}. \]
Suppose there are v and w with
\[ v ⊗ w = \begin{pmatrix} v_1 w_1 & v_1 w_2 \\ v_2 w_1 & v_2 w_2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}. \]
Then v_1 w_1 = 1, so neither v_1 nor w_1 is zero, and v_2 w_2 = 1, so neither v_2 nor w_2 is zero. But v_2 w_1 = 0 and v_1 w_2 = 0, so at least one of the v_i or w_j must be zero, which is impossible. We can only conclude that e_1 ⊗ e_1 + e_2 ⊗ e_2 is not a simple tensor. ◭


32.7 Compound Tensors

In fact, there is a rich supply of tensors that are not pure if the dimensions of both vector spaces are greater than one.

Theorem 32.7.1. Suppose both {v_1, v_2} and {w_1, w_2} are linearly independent sets. Then v_1 ⊗ w_1 + v_2 ⊗ w_2 is not a pure tensor.

Proof. We use the outer product to see this. Suppose there are v_3 and w_3 with
\[ v_1 ⊗ w_1 + v_2 ⊗ w_2 = v_3 ⊗ w_3. \]
Since each of the outer products is m × n, this equation can be written column by column as n vector equations. Let w_{ij} denote the jth component of w_i:
\[ w_{11} v_1 + w_{21} v_2 = w_{31} v_3, \quad w_{12} v_1 + w_{22} v_2 = w_{32} v_3, \quad \ldots, \quad w_{1n} v_1 + w_{2n} v_2 = w_{3n} v_3. \]
If any of the w_{3j} = 0, the corresponding w_{1j} and w_{2j} must be zero due to the linear independence of v_1 and v_2. If all of the w_{3j} = 0, then w_1 = w_2 = 0, contradicting the linear independence of the w_i. If only one of the w_{3j} ≠ 0, then w_1 and w_2 are both multiples of the same standard covector, so either one is zero or one is proportional to the other, again violating linear independence. Finally, if at least two of the w_{3j} ≠ 0, the corresponding pairs of equations imply a non-trivial relation of the form α w_1 + β w_2 = 0, once more contradicting linear independence.


32.8 The Rank of Pure Tensors

A simpler way to see that many tensors are not pure is to consider the rank of the matrix x ⊗ y. Since x ⊗ y = x y^T and the rank of the vector x is either zero or one, the rank of any pure tensor must be zero or one by the product rule for rank (Theorem 27.5.1). In fact, the rank of x ⊗ y is only zero when it is the zero tensor (aka the zero matrix). Summing up,

Theorem 32.8.1. Tensors with rank larger than one are not pure tensors.

Proof. As shown above, non-zero pure tensors have rank one, so tensors with rank larger than one are not pure tensors.
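The rank test is immediate in numpy (a sketch of mine): pure tensors have rank at most one, while the tensor from Example 32.6.1 has rank two.

    import numpy as np

    x = np.array([1., 2, 3])
    y = np.array([10., -1])
    print(np.linalg.matrix_rank(np.outer(x, y)))     # 1: a non-zero pure tensor

    M = np.outer([1., 0], [1., 0]) + np.outer([0., 1], [0., 1])   # e1(x)e1 + e2(x)e2
    print(np.linalg.matrix_rank(M))                  # 2: so M cannot be pure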


32.9 Tensor Products

To get a taste of how we can free tensors from the use of coordinates, suppose V is a vector space. We can write V^{⊗k} as a tensor product of vector spaces:
\[ \bigotimes_{i=1}^{k} V = V^{⊗k} = \underbrace{V ⊗ V ⊗ \cdots ⊗ V}_{k \text{ times}}. \]

The tensor product has a natural vector space structure.
We write the tensor product of vectors by
\[ x_1 ⊗ x_2 ⊗ \cdots ⊗ x_k \]
where each x_i ∈ R^n. The tensor product itself is k-linear, and elements of the tensor product are said to have order k. Now

\[
\begin{aligned}
(\alpha x_1 + y_1) ⊗ x_2 ⊗ \cdots ⊗ x_k &= \alpha (x_1 ⊗ x_2 ⊗ \cdots ⊗ x_k) + y_1 ⊗ x_2 ⊗ \cdots ⊗ x_k, \\
x_1 ⊗ (\alpha x_2 + y_2) ⊗ \cdots ⊗ x_k &= \alpha (x_1 ⊗ x_2 ⊗ \cdots ⊗ x_k) + x_1 ⊗ y_2 ⊗ \cdots ⊗ x_k, \\
&\;\;\vdots \\
x_1 ⊗ x_2 ⊗ \cdots ⊗ (\alpha x_k + y_k) &= \alpha (x_1 ⊗ x_2 ⊗ \cdots ⊗ x_k) + x_1 ⊗ x_2 ⊗ \cdots ⊗ y_k.
\end{aligned}
\]

This shows that the tensor product is k-linear.
Finally, if A is linear on \bigotimes_{i=1}^{k} R^n, then A(x_1 ⊗ \cdots ⊗ x_k) is a k-linear function of (x_1, . . . , x_k) because the tensor product is k-linear.
If we need to use coordinates, we can write x_i = \sum_{j=1}^{n} x_{ij} e_j. Expanding A, we obtain
\[ A(x_1 ⊗ \cdots ⊗ x_k) = \sum_{j_1 \cdots j_k} x_{1 j_1} \cdots x_{k j_k}\, A(e_{j_1} ⊗ \cdots ⊗ e_{j_k}) = \sum_{j_1 \cdots j_k} a_{j_1 \cdots j_k}\, x_{1 j_1} \cdots x_{k j_k} \]
where each a_{j_1 \cdots j_k} = A(e_{j_1} ⊗ \cdots ⊗ e_{j_k}).


32.10 Covariance and Contravariance

We will touch briefly on covariant and contravariant tensors, in case you encounter them in the future. Suppose V is a vector space and V^* its dual. A tensor of order (p, q) is in the tensor product

\[ \underbrace{V ⊗ \cdots ⊗ V}_{p \text{ times}} ⊗ \underbrace{V^* ⊗ \cdots ⊗ V^*}_{q \text{ times}} = V^{⊗p} ⊗ (V^*)^{⊗q}. \]

It is p times contravariant and q times covariant, with total order p + q.
Among the things that can be done with a (1, 1)-tensor is to form an evaluation map, mapping
\[ x ⊗ f \mapsto f(x). \]

It just evaluates the linear functional f at the vector x and is a special case of a more general operation called contraction, which reduces a (p, q)-tensor to a (p − 1, q − 1)-tensor.

Of course, tensors in V^{⊗p} ⊗ (V^*)^{⊗q} are often written with p superscripts (for contravariant coordinates) and q subscripts (for covariant coordinates), and summation occurs over all repeated indices. Thus x^i f_j denotes x ⊗ f while x^i f_i is the evaluation map. A more complicated (2, 3)-tensor might have the form T^{ij}_{kℓm}, and a contraction on it could be written T^{ij}_{jℓm}, meaning \sum_j T^{ij}_{jℓm}.

October 22, 2021

Copyright ©2021 by John H. Boyd III: Department of Economics, Florida International University, Miami, FL 33199