Mathematical Methods in Economics

T. S. Angell

Department of Mathematical Sciences

University of Delaware

Newark, Delaware

© September 21, 2015

Contents

A Basic Set Theory

A.1 Introduction

A.2 Specification of Sets, Equality, and Subsets

A.3 The Algebra of Sets

A.3.1 Unions and Intersections

A.3.2 Set Differences, Complements, and DeMorgan’s Laws

A.4 Ordered Pairs

A.5 Binary Relations and Equivalence Relations

A.6 Functions or Maps

A.7 Orderings on Sets

B Basic Analysis

B.1 Introduction

B.2 Norms and Inner Products

B.2.1 Inner Products of Vectors

B.2.2 Norms

B.2.3 Some Important Inequalities

B.3 Subsets of $\mathbb{R}^n$

B.3.1 Basic Definitions

B.3.2 Suprema and Infima

B.3.3 Connected Sets

B.3.4 The Bolzano-Weierstrass Theorem

B.3.5 Convergence

B.4 Functions on $\mathbb{R}^n$

B.4.1 Continuous Functions

B.4.2 Semicontinuous Functions

B.4.3 The Extended Reals

B.4.4 Epigraphs and Effective Domains

Appendix A

Basic Set Theory

A.1 Introduction

We assume that most readers are familiar with the use of set notation and understand

the basic operations of the algebra of sets. But a review can be helpful and there are

certain particular matters that are worth emphasizing. So we devote this Appendix to a

review of the basic ideas. Of course, whole books have been written on the subject and

scores of textbooks contain this basic information. While these works discuss the subject

with varying levels of sophistication, our point of view is the “naive” one. Thus we take, as

basic undefined concepts, that of element, set, and the relation of belonging to. This is the

point of view of the book of P. R. Halmos, Naive Set Theory [?] which is still probably the

best exposition for the aspiring student. Much of what is contained in this brief appendix

follows the early part of his exposition.

To quote Halmos,

A pack of wolves, a bunch of grapes, or a flock of pigeons are all examples of

sets. . . . An element of a set may be a wolf, a grape, or a pigeon.

If we denote the set of wolves by W and a particular wolf by w, then the statement

w ∈ W is the statement that w is a member of, or belongs to, the set W. Sometimes

the words collection, or class, or family are used synonymously with the word set. Some

authors reserve the word class to describe a set of sets and the word family to describe a

set of classes. Thus, to continue our example, we can speak of the set W as being an

element of the class of sets of different species of mammals, and of the class of mammals

as belonging to the family of vertebrates. Again, having pointed out this usage, we use

these terms with some fluidity in our exposition. What is important, really, is clarity.

There are some logical, or better, grammatical niceties that we need to discuss before

we begin. Following Halmos, we list seven “logical” operators which will be used

throughout to construct sentences describing sets. They are

and,

or (in the sense of “either—or—or both”),

not,

if—then—(or implies),

if and only if,

for some— (or there exists—),

for all— .

The rules of sentence formation then can be listed:

(i) Put “not” before a sentence and enclose the result in parentheses.¹

(ii) Put “and” or “or” or “if and only if” between two sentences and enclose the result

in parentheses.

(iii) Replace the dashes in “if—then—” by sentences and enclose the result in parenthe-

ses.

(iv) Replace the dash in “for some—” or in “for all—” by a letter, follow the result with

a sentence, and enclose the whole in parentheses.

The practice of “enclos[ing] the result in parentheses” is one that is used for clarity. Most

of the time, there is no lack of clarity if we omit the parentheses, and we shall seldom, if ever,

use them.

¹ The correct answer to the question “Are you going to go to the rock concert or do something else?”

is “Yes”.

A.2 Specification of Sets, Equality, and Subsets

A set is specified or defined when its elements are completely characterized. There are

two ways to do this; one either makes an exhaustive list (without regard to order!) of all

elements of the set, or one gives an explicit property or attribute that actually characterizes

the elements. That doing so indeed specifies a set is often stated as an axiom of set

theory. Formally:

Axiom: To every set A and to every condition (equivalently “sentence”)

S(x) there corresponds a set B whose elements are exactly those elements x

of A for which S(x) holds.

The set of all students registered for a particular college course at a given moment in

time is given by the traditional class list. If we look at the students whose names appear

in this list, then we can define a set, F , of all named students who are female. If we call

the set of students whose names appear on the class list, L, then the set F is given by

{s ∈ L | s is the name of a female student}. A more mathematical example is the set of

points in the plane $\mathbb{R}^2$ that lie on the unit circle

$$S^1 = \{(x, y) \in \mathbb{R}^2 \mid x^2 + y^2 = 1\} .$$

Notice that in both these examples a “generic” element is named (s in the first case and

(x, y) in the second) and they are required to be elements of some “universal” set (L in

the first case and $\mathbb{R}^2$ in the second). The specification of this so-called universe of discourse

makes clear what types of objects we are discussing and, in more theoretical expositions,

avoids certain well-known logical difficulties as, for example, the Russell Paradox ([?])

which involves only sets which do not have themselves as elements and the set U of all such

sets.
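The specification axiom has a close counterpart in the set comprehensions of many programming languages. The following Python sketch is only an illustration: the finite grid `A`, the condition `S`, and the radius 5 are assumptions chosen so that the resulting set can be enumerated, since the unit circle itself has infinitely many points.

```python
# Specification: B = {x in A | S(x)} written as a set comprehension.
# Assumption: the universe A is a small finite grid of integer points,
# standing in for the plane so that the set can be listed.
A = {(x, y) for x in range(-5, 6) for y in range(-5, 6)}

def S(point):
    """The condition S(x): the point lies on the circle of radius 5."""
    x, y = point
    return x * x + y * y == 25

B = {p for p in A if S(p)}

# An exhaustive list specifies a set without regard to order or repetition.
same_set = {1, 5, 9} == {9, 1, 5, 5}

print(sorted(B))
print(same_set)
```

Naming the universe `A` explicitly in the comprehension plays the same role as the universe of discourse in the text: the condition is only ever tested on elements of a set that already exists.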

Two sets A and B are said to be equal provided they consist of exactly the same

elements. In this case we write A = B; in the contrary case we write A ≠ B. We say

that a set A is a subset of B provided every element of A is also an element of B and, in

this case, we write A ⊂ B. Notice that this is quite different from the relation “belongs

to”; the relation “is a subset of” is, as defined, a reflexive relation in the sense that it is

always true that A ⊂ A. This is certainly not the case with the relation ∈.

The relation A ⊂ B may also be written in the reverse order as B ⊃ A. In the case

that A ⊂ B and A ≠ B we say that A is a proper subset of B. Note that some authors

use the notation A ⊆ B for the relation A ⊂ B and A ⊊ B in the case that A is a proper

subset. We do not use that notation in this book.

We will speak of relations with more specificity presently. To anticipate that discussion

we point out that the relation of being a subset has certain interesting properties. One we

have already mentioned, that of reflexivity. We collect the three most important properties

of the relation ⊂ here. Denoting the universe of discourse as U , we have

(1) A ⊂ A for every set A ⊂ U (reflexivity) ;

(2) For sets A and B, subsets of U , A ⊂ B and B ⊂ A implies

A = B (antisymmetry) ;

(3) For sets A,B,C ⊂ U , A ⊂ B and B ⊂ C implies A ⊂ C (transitivity) .

A relation on a class of sets, in this case all subsets of the universe U , with these three

properties defines what we will call a partial order (more on partial orderings presently).

In this case, we say that the class of all subsets of the set U, denoted by P(U), is partially

ordered by inclusion.² The set P(U) is called the power set of U.

It is important to note that in this example of a partially ordered set, the properties

of reflexivity and antisymmetry can be combined into a single statement:

A = B if and only if A ⊂ B and B ⊂ A . (A.1)

This statement embodies the basic strategy for showing that two sets are equal: show

every element of A is an element of B and that every element of B is also an element of

A, i.e., that each set is a subset of the other.
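For a small universe the three order properties and rule (A.1) can be checked exhaustively. The sketch below is one way to do this in Python; the helper names `power_set` and `equal_by_double_inclusion` and the choice U = {1, 2, 3} are illustrative assumptions, not part of the text.

```python
from itertools import combinations

def power_set(U):
    """Return P(U), the set of all subsets of U, as a set of frozensets."""
    elems = list(U)
    return {frozenset(c)
            for r in range(len(elems) + 1)
            for c in combinations(elems, r)}

U = {1, 2, 3}
P = power_set(U)

# For Python sets, A <= B tests inclusion (the text's A ⊂ B) and
# A < B tests proper inclusion.
reflexive = all(A <= A for A in P)
antisymmetric = all(A == B for A in P for B in P if A <= B and B <= A)
transitive = all(A <= C for A in P for B in P for C in P
                 if A <= B and B <= C)

def equal_by_double_inclusion(A, B):
    """Rule (A.1): A = B if and only if A ⊂ B and B ⊂ A."""
    return A <= B and B <= A

print(len(P), reflexive, antisymmetric, transitive)
```

Such a check is of course not a proof, since it only covers one finite universe, but it is a useful sanity test of the definitions.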

It is often convenient to have a subset of the universe of discourse, U , that contains no

elements of U . This set is called the empty set and is denoted by the symbol ∅. If we agree,

as we do, that a set is specified by characterizing the properties that the elements must

have, then we may use any false statement to specify the empty set. Thus, for example,

we may write

∅ = {u ∈ U | u ≠ u} .

The empty set is sometimes also called the null set. It is a subset of any given set.

In particular, in any collection of sets, there can be at most one empty set since, were

there more, each would be a subset of the other and hence, by (A.1), they would be equal.

² We remark that there are other kinds of relations that can be defined on P(U). For example, equality

is a relation which is certainly reflexive and transitive; however, rather than being antisymmetric, it is

symmetric since A = B implies that B = A. Such relations on a set are called equivalence relations.

A.3 The Algebra of Sets

In this section we consider useful ways of combining sets to form new ones. We assume

that we have a fixed universe of discourse, U , and that all sets are subsets of U . We will

not always mention this universe in stating definitions. The familiar operations which

we study in this section are: union, intersection, complementation, and powers. These

operations are all fundamentally related to the relation of inclusion.

A.3.1 Unions and Intersections

We begin with the operation of set union. Given sets A and B, their union, written A ∪ B, is defined by

A ∪ B = {x ∈ U | x ∈ A or x ∈ B} .³ (A.2)

For example, if A = {1, 5, 9, 7, 3} and B = {8, 4, 2, 6} then A ∪ B = {1, 2, 3, 4, 5, 6, 7, 8, 9}. (Note that the only things that matter here are the elements themselves, not the order in

which we happen to write them down!) Alternatively, using the symbol N for the natural

numbers (the set of positive integers), we can describe the union as A ∪ B = {x ∈ N | x is used in Sudoku puzzles}.

The operator ∪ is commutative, associative, and idempotent. That is, (i) A ∪ B =

B ∪ A, (ii) (A ∪ B) ∪ C = A ∪ (B ∪ C), and (iii) A ∪ A = A. Likewise it is true that

A ∪ ∅ = A and A ∪ U = U or, more generally, A ⊂ B if and only if A ∪ B = B. All of these

statements are statements about the equality of two sets. As such, they can be proved by

application of the definitions and our logical operators, although they are so elementary

that few would bother to write down a proof. But every serious student should prove

them once in a lifetime. Here is an example: we prove the statement

Proposition A.3.1 A ⊂ B if and only if A ∪B = B.

3REMEMBER: “or” means “either—or—or both”.

Proof: Suppose first that A ⊂ B. We then must show that A ∪ B = B. Following our

basic rule (A.1) we first check that A ∪ B ⊂ B and then prove the reverse inclusion. So,

if x ∈ A ∪ B then either x ∈ A or x ∈ B or both. But A ⊂ B means that for every x ∈ A we must have x ∈ B, so x ∈ A ∪ B implies x ∈ B. So the first inclusion is proved. Now we

establish the reverse inclusion B ⊂ A∪B. To do this, choose x ∈ B. Then, by definition

of union x ∈ A or x ∈ B hence x ∈ A ∪ B. This completes the proof of sufficiency: if

A ⊂ B then A ∪B = B.

To prove the reverse implication, that is, to prove the necessity of the left hand statement,

suppose that A ∪ B = B. We will be done if we can show that A ⊂ B. To this

end, choose x ∈ A. Then, x ∈ A ∪ B and since this latter set is, by hypothesis, just the

set B we have x ∈ B.

There are two things that are illustrated in this proof. First, the use of (A.1) to prove

that two sets are equal. The other is the structure of an “if and only if” proof. If the

reader is unsure of this reasoning, hopefully careful study of the arguments above will be

helpful.
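Proposition A.3.1 can also be tested mechanically on a small universe, which is a helpful complement to (though no substitute for) the proof. In the Python sketch below the helper `subsets` and the universe U = {1, 2, 3, 4} are assumptions made for illustration; the sets `A_text` and `B_text` are the ones used in the example following (A.2).

```python
from itertools import combinations

def subsets(U):
    """All subsets of the finite set U, as a list of Python sets."""
    elems = sorted(U)
    return [set(c) for r in range(len(elems) + 1)
            for c in combinations(elems, r)]

U = {1, 2, 3, 4}
checked = 0
for A in subsets(U):
    for B in subsets(U):
        # Proposition A.3.1: A ⊂ B if and only if A ∪ B = B.
        assert (A <= B) == ((A | B) == B)
        checked += 1

# The union example from the text; the order of listing is irrelevant.
A_text = {1, 5, 9, 7, 3}
B_text = {8, 4, 2, 6}
union_text = A_text | B_text

print(checked, sorted(union_text))
```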

The operation of intersection has many similarities with the operation of union. Given

two sets A and B, their intersection A ∩B is defined by

A ∩B = {x ∈ U |x ∈ A and x ∈ B} (A.3)

Notice that the definition is symmetric in A and B in the sense that A ∩ B = B ∩ A.

A list of elementary properties of the intersection operator is given here:

A ∩ ∅ = ∅ ,

A ∩B = B ∩ A ,

A ∩ (B ∩ C) = (A ∩B) ∩ C ,

A ∩ A = A ,

A ⊂ B if and only if A ∩B = A .

Notice that the last of these properties shows, along with Proposition A.3.1, that set

inclusion can either be described in terms of unions or intersections.

If two sets have no common elements, then they are said to be disjoint and we write

A ∩ B = ∅. In the case that we have a collection of sets, any two of which are disjoint,

then we say that the collection is pairwise disjoint.


We now have two operations defined on sets, union and intersection. The natural

question now is to ask how these two operations are related to each other. This question

is answered by the distributive laws:

A ∩ (B ∪ C) = (A ∩B) ∪ (A ∩ C) (A.4)

A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C) . (A.5)

As an exercise intended to illustrate, once more, a set-theoretic argument, let us prove

the first of these distributive laws.

Proposition A.3.2 For any sets A,B and C subsets of the set U ,

A ∩ (B ∪ C) = (A ∩B) ∪ (A ∩ C) .

Proof: If x belongs to the left-hand side of this equation, then x ∈ A and, in addition, x ∈ B or x ∈ C or both. If x ∈ B then x ∈ A ∩ B and so is an element of the set on the right. Likewise,

if x ∈ C then x ∈ A∩C and so, again, x belongs to the right hand side. This shows that

the set on the right-hand side includes the set on the left.

To prove the reverse inclusion, suppose x belongs to the set on the right. Then either

x ∈ A∩B or x ∈ A∩C or both. If x ∈ A∩B then x ∈ A and x ∈ B. So x ∈ B ∪C and

hence x ∈ A ∩ (B ∪ C). Likewise, if x ∈ A ∩ C then x ∈ C and thus x ∈ B ∪ C. So in

this case as well, x ∈ A ∩ (B ∪ C) and x belongs to the left-hand set. Hence the reverse

inclusion is satisfied. We conclude that the sets on each side of the equality are the same

set.
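Both distributive laws can be checked by brute force over all triples of subsets of a small universe. This is a sketch under the assumption that U = {0, 1, 2}; the helper `subsets` is an illustrative name, and Python's `&` and `|` stand for ∩ and ∪.

```python
from itertools import combinations, product

def subsets(U):
    """All subsets of the finite set U, as a list of Python sets."""
    elems = sorted(U)
    return [set(c) for r in range(len(elems) + 1)
            for c in combinations(elems, r)]

U = {0, 1, 2}
P = subsets(U)

# (A.4): A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
law_A4 = all(A & (B | C) == (A & B) | (A & C)
             for A, B, C in product(P, repeat=3))

# (A.5): A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
law_A5 = all(A | (B & C) == (A | B) & (A | C)
             for A, B, C in product(P, repeat=3))

print(law_A4, law_A5)
```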

A.3.2 Set Differences, Complements, and DeMorgan’s Laws

The set theoretic difference A\B (also written A−B) is defined by

$$A \setminus B = \{a \in A \mid a \notin B\} .$$

In many situations we are only interested in subsets of a given set X (the universe of

discourse). The complement $A^c$ of a set A with respect to X is defined by

$$A^c = X \setminus A = \{a \in X \mid a \notin A\} .$$

We can now formulate and prove De Morgan’s Laws. These are rules that relate complements

of unions to intersections of complements, and complements of intersections to

unions of complements. It is surprising how useful they are and how often they are used.⁴

In the case of just two sets A,B ⊂ X, these rules are simple to write down and

understand in terms of Venn diagrams and the reader is invited to do so. In this simple

case they read

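$$X \setminus (A \cup B) = (X \setminus A) \cap (X \setminus B) , \quad\text{or}\quad (A \cup B)^c = A^c \cap B^c , \tag{A.6}$$

and

$$X \setminus (A \cap B) = (X \setminus A) \cup (X \setminus B) , \quad\text{or}\quad (A \cap B)^c = A^c \cup B^c . \tag{A.7}$$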

Proposition A.3.3 Assume that $A_1, A_2, \ldots, A_n$ are subsets of the set X. Then

$$(A_1 \cup A_2 \cup \ldots \cup A_n)^c = A_1^c \cap A_2^c \cap \ldots \cap A_n^c ,$$

and

$$(A_1 \cap A_2 \cap \ldots \cap A_n)^c = A_1^c \cup A_2^c \cup \ldots \cup A_n^c .$$

Proof: For the first part, assume that $x \in (A_1 \cup A_2 \cup \ldots \cup A_n)^c$. Then $x \notin A_1 \cup A_2 \cup \ldots \cup A_n$,

and hence $x \notin A_i$ for any $i = 1, 2, \ldots, n$. This means that $x \in A_i^c$ for all $i$ and so

$x \in A_1^c \cap A_2^c \cap \ldots \cap A_n^c$. So we have shown that

$$(A_1 \cup A_2 \cup \ldots \cup A_n)^c \subset A_1^c \cap A_2^c \cap \ldots \cap A_n^c .$$

To prove the reverse inclusion, assume that $x \in A_1^c \cap A_2^c \cap \ldots \cap A_n^c$. This means that $x \in A_i^c$ for all $i$. So $x \notin A_i$ for all $i = 1, 2, \ldots, n$. It follows that $x \notin A_1 \cup A_2 \cup \ldots \cup A_n$, which

means that $x \in (A_1 \cup A_2 \cup \ldots \cup A_n)^c$. This completes the proof of the first equality. We

leave the (analogous) proof of the second equation as an exercise.
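As a concrete instance of Proposition A.3.3, the following Python sketch evaluates both sides of each law for three sets. The universe `X`, the list `sets`, and the helper `complement` are assumptions chosen for illustration; the last check is the propositional form of the laws for two statements s and t.

```python
from functools import reduce

X = set(range(10))                       # the universe of discourse
sets = [{0, 1, 2}, {2, 3, 4}, {2, 4, 6, 8}]

def complement(A):
    """Complement of A with respect to X."""
    return X - A

union_all = reduce(set.union, sets, set())
inter_all = reduce(set.intersection, sets, set(X))

# First law: the complement of a union is the intersection of the complements.
lhs_union = complement(union_all)
rhs_union = reduce(set.intersection, [complement(A) for A in sets], set(X))

# Second law: the complement of an intersection is the union of the complements.
lhs_inter = complement(inter_all)
rhs_inter = reduce(set.union, [complement(A) for A in sets], set())

# Logical form: "not (s and t)" agrees with "(not s) or (not t)".
logic_ok = all((not (s and t)) == ((not s) or (not t))
               for s in (False, True) for t in (False, True))

print(sorted(lhs_union), sorted(lhs_inter), logic_ok)
```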

De Morgan’s Laws extend to arbitrary families of sets. We first extend the

notions of union and intersection to families of sets in the following way: If $\mathcal{A}$ is a non-empty

family of sets, we define

$$\bigcup_{A \in \mathcal{A}} A = \{a \in U \mid a \text{ belongs to at least one set } A \in \mathcal{A}\}$$

and

$$\bigcap_{A \in \mathcal{A}} A = \{a \in U \mid a \text{ belongs to all sets } A \in \mathcal{A}\} .$$

The distributive laws and the Laws of De Morgan extend to this case in the obvious

ways, e.g.,

$$\left(\bigcup_{A \in \mathcal{A}} A\right)^c = \bigcap_{A \in \mathcal{A}} A^c .$$

⁴ De Morgan’s Laws are frequently used in programming, in particular in the construction of sorting

algorithms. From the point of view of logic, they allow the substitution of equivalent statements, e.g.,

“not S or not T” being equivalent to “not both S and T”.

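Families are often given as indexed sets. This means we have one basic set I (the index

set) and the family consists of one set $A_i$ for each element i ∈ I. We then write the family

as

$$\mathcal{A} = \{A_i \mid i \in I\} .$$

We may then write

$$\bigcup_{i \in I} A_i \quad\text{and}\quad \bigcap_{i \in I} A_i$$

for unions and intersections. In this setting De Morgan’s Laws become

$$\left(\bigcup_{i \in I} A_i\right)^c = \bigcap_{i \in I} A_i^c \quad\text{and}\quad \left(\bigcap_{i \in I} A_i\right)^c = \bigcup_{i \in I} A_i^c .$$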
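An indexed family is naturally modelled as a mapping from the index set to sets. The Python sketch below assumes, purely for illustration, the universe X = {1, ..., 30}, the index set I = {2, 3, 5}, and the sets A_i of multiples of i in X; it then forms the union and intersection of the family and checks De Morgan's Laws in the indexed form.

```python
X = set(range(1, 31))          # universe of discourse (an assumption)
I = [2, 3, 5]                  # index set (an assumption)

# The family {A_i | i in I}, where A_i is the set of multiples of i in X.
family = {i: {x for x in X if x % i == 0} for i in I}

union = set().union(*family.values())
intersection = set(X).intersection(*family.values())

def complement(A):
    """Complement of A with respect to X."""
    return X - A

de_morgan_union = complement(union) == set(X).intersection(
    *(complement(A) for A in family.values()))
de_morgan_inter = complement(intersection) == set().union(
    *(complement(A) for A in family.values()))

print(sorted(intersection), sorted(complement(union)))
```

Starting the intersection from a copy of `X` reflects the convention that all sets are subsets of the universe of discourse.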

Let us finish the section with a simple example.

Example A.3.4 For each rational number $q \in \mathbb{Q}$ we can take the set

$$C_q := \{(x, y) \in \mathbb{R}^2 \mid x^2 + y^2 = q^2\} ,$$

which is just the circle of rational radius $|q|$ centered at the origin (for q = 0 it reduces to the single point (0, 0)). Then we can consider

$$C(\mathbb{Q}) = \{C_q \mid q \in \mathbb{Q}\} .$$

The set $C(\mathbb{Q})$ is just the family of all circles in the plane $\mathbb{R}^2$ with center at the origin and

rational radius. Note that this family can be thought of as a family of sets indexed by the

rationals.

A.4 Ordered Pairs

In analytic geometry and elementary calculus it is common to introduce the coordinate, or

(x, y)-plane. The horizontal axis or axis of abscissae is associated with the value of x and

the vertical axis or axis of ordinates is associated with the y coordinate. The agreement

that the abscissa be listed first and the ordinate next in a certain sense gives a geometric

definition of the ordered pair (x, y). Likewise this construction gives a concrete example

of what is called the Cartesian product⁵ of two sets; in this case the two sets are two copies

of the real line.

In the theory of sets, we need a much more precise description of the notion of ordered

pair but its introduction makes things get very technical very quickly. We have given here

a quick summary for the sake of completeness.

The definition given here has the disadvantage of strangeness, but the decided advantage

of settling the problem of what we mean by a “first element” of an ordered pair. If

A = {a, b} and, in the desired order, a comes first, then we need a careful definition. Here

is the definition that we shall adopt.

Definition A.4.1 The ordered pair of a and b, with first coordinate a and second coordinate

b, is the set (a, b) defined by

(a, b) = {{a}, {a, b}} .

The definition clearly specifies the “first element”; it is the element that occurs in the

singleton set {a}. There are some technical difficulties that need to be addressed which

arise from the fact that the ordered pair is defined as a set of sets. Halmos ([?], pp.

23-24) deals with all of them. What is important is the statement that, given sets A and

B, there exists a set that contains all the ordered pairs (a, b) with a ∈ A and b ∈ B. This

set is called the Cartesian product of A and B, is written A×B, and is characterized by

the fact that

A×B = {x ∈ P [P(A ∪B)] |x = (a, b) for some a ∈ A and some b ∈ B} .

Remark: Note that we have written x ∈ P[P(A ∪ B)]. Each of the sets {a} and {a, b} is a subset of

A ∪ B, that is, an element of P(A ∪ B). The ordered pair (a, b) = {{a}, {a, b}} is therefore a set of subsets

of A ∪ B, i.e., an element of P[P(A ∪ B)], and a set of ordered pairs such as A × B is a subset of P[P(A ∪ B)].
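Definition A.4.1 can be tried out directly with immutable sets. The Python sketch below is an illustration only: the function names `pair`, `first`, and `second` are assumptions, and `frozenset` is used because ordinary Python sets cannot be elements of other sets.

```python
def pair(a, b):
    """The ordered pair (a, b) = {{a}, {a, b}}, built from frozensets."""
    return frozenset({frozenset({a}), frozenset({a, b})})

def first(p):
    """First coordinate: the one element common to every member of p."""
    (a,) = frozenset.intersection(*p)
    return a

def second(p):
    """Second coordinate: the remaining element, or the first if a == b."""
    rest = frozenset.union(*p) - {first(p)}
    return next(iter(rest)) if rest else first(p)

p = pair(1, 2)
q = pair(2, 1)
d = pair(3, 3)   # collapses to {{3}} because {3, 3} = {3}

print(p != q, first(p), second(p), first(d), second(d))
```

The point of the construction is visible in the first printed value: although the sets {1, 2} and {2, 1} are equal, the pairs (1, 2) and (2, 1) are different sets.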

⁵ After all, it was René Descartes who introduced numerical components into geometry.

If R ⊂ A×B then the sets

$$R_A = \{a \in A \mid \text{for some } b \in B,\ (a, b) \in R\}$$

and

$$R_B = \{b \in B \mid \text{for some } a \in A,\ (a, b) \in R\}$$

are called the projections of R onto the first and second coordinates respectively.

Having seen the rigorous definition, we will henceforth treat ordered pairs less formally

as is usually done. Again, there are several facts that can be easily checked and we leave

them as exercises:

Exercise A.4.2 If A,B,X, and Y are sets, then

(a) If either A = ∅ or B = ∅, then A×B = ∅. The converse is also true.

(b) A ⊂ X and B ⊂ Y implies A × B ⊂ X × Y. If A × B ≠ ∅ then A × B ⊂ X × Y implies A ⊂ X and B ⊂ Y.

(c) The following distributive laws hold:

(i) (A ∪B)×X = (A×X) ∪ (B ×X) ;

(ii) (A ∩B)× (X ∩ Y ) = (A×X) ∩ (B × Y ) ;

(iii) (A−B)×X = (A×X)− (B ×X).
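Before attempting the exercise it can help to test the claims on small sets. In the Python sketch below, ordered pairs are modelled informally as tuples; the helper `cartesian`, the particular sets, and the relation `R` are assumptions made for illustration.

```python
from itertools import product

def cartesian(A, B):
    """A × B as a set of Python tuples (a, b)."""
    return set(product(A, B))

A, B = {1, 2, 3}, {2, 3, 4}
X, Y = {"x", "y"}, {"y", "z"}

# Part (c) of Exercise A.4.2.
law_i = cartesian(A | B, X) == cartesian(A, X) | cartesian(B, X)
law_ii = cartesian(A & B, X & Y) == cartesian(A, X) & cartesian(B, Y)
law_iii = cartesian(A - B, X) == cartesian(A, X) - cartesian(B, X)

# Part (a): a product with an empty factor is empty.
empty_case = cartesian(A, set()) == set()

# Projections of a relation R ⊂ A × X onto its two coordinates.
R = {(1, "x"), (1, "y"), (3, "y")}
R_A = {a for (a, _) in R}
R_B = {b for (_, b) in R}

print(law_i, law_ii, law_iii, empty_case, sorted(R_A), sorted(R_B))
```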

A.5 Binary Relations and Equivalence Relations

We start with two sets A and B. Then a binary relation R on a set A×B is a proposition

such that, for every ordered pair (a, b) ∈ A×B, one can decide if a is related to b or not.

It is simply a restricted set of ordered pairs. Formally,

Definition A.5.1 A binary relation in a set A×B is a subset R ⊂ A×B. The statement

“(a, b) ∈ R” is written as a R b.

Example A.5.2 (a) For any set A × A⁶ the diagonal ∆ = {(a, a) | a ∈ A} is the relation

of equality. The relation [(A× A) \∆] is the relation of inequality.

(b) The relation ≤ between two real numbers is the set

{(x, y) ∈ R × R | x coincides with or lies to the left of y} ⊂ R × R .

(c) In P(X) the relation of set inclusion, A ⊂ B, is given by

{(A, B) ∈ P(X) × P(X) | every element of A is an element of B} .

(d) For any set X, let R be the relation on X×P(X) defined by (x,A) ∈ R if and only

if x ∈ A. This is the relation of membership in a set.

If A × B is a set with a binary relation R and C ⊂ A,D ⊂ B then the relation

R ∩ (C × D) is a binary relation on the set C × D. It is called the relation induced by R on C × D.

Of all the relations, one of the most important is the equivalence relation. We will denote

such a relation by the symbol ∼ and write a ∼ b when we mean that a is equivalent to b.

We will also say that “a is similar to b”.

Definition A.5.3 An equivalence relation on a set X is a binary relation on X which is

reflexive, symmetric and transitive, i.e.

(a) for all a ∈ X : a ∼ a (reflexive).

(b) a ∼ b implies b ∼ a (symmetric).

(c) a ∼ b and b ∼ c implies a ∼ c (transitive).

We begin with some simple examples.

Example A.5.4

(a) The relation ∆ is an equivalence relation.

(b) In N the relation {(x, y) ∈ N × N | x − y is divisible by 2} is an equivalence relation.

(c) Let f : X → Y be a function. Then $\{(x_1, x_2) \in X \times X \mid f(x_1) = f(x_2)\}$ is an

equivalence relation on X.

(d) Let T be the set of all triangles in the plane $\mathbb{R}^2$. Then the relation of congruence,

familiar from elementary Euclidean geometry, is an equivalence relation.

⁶ We will often say that a relation is defined on A to mean a relation on A × A, a slight abuse of

language that should cause no problem.

Let us check the assertion (b). First, reflexivity. For all x we have x − x = 0 and 0

is divisible by 2. Hence the relation is reflexive. Moreover, since y − x = (−1) (x − y)

it is clear that if 2|(x − y) then 2|(y − x). Hence the relation is symmetric. Finally, if

2|(x− y) and 2|(y− z), then since x− z = (x− y) + (y− z), it is clear that 2|(x− z). So

the relation is also transitive and hence is an equivalence relation.
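On a finite set the three defining properties can be tested directly from a relation given as a set of pairs. The following Python sketch assumes the finite stand-in X = {1, ..., 8} for N; the function names are illustrative. It confirms that the parity relation passes all three tests while the order relation ≤ fails symmetry.

```python
def is_reflexive(R, X):
    return all((x, x) in R for x in X)

def is_symmetric(R):
    return all((y, x) in R for (x, y) in R)

def is_transitive(R):
    return all((x, z) in R for (x, y) in R for (w, z) in R if y == w)

def is_equivalence(R, X):
    """True if R ⊂ X × X is reflexive, symmetric and transitive."""
    return is_reflexive(R, X) and is_symmetric(R) and is_transitive(R)

X = set(range(1, 9))   # finite stand-in for N (an assumption)
parity = {(x, y) for x in X for y in X if (x - y) % 2 == 0}
leq = {(x, y) for x in X for y in X if x <= y}

print(is_equivalence(parity, X), is_equivalence(leq, X))
```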

Suppose that ∼ is an equivalence relation on the set X. If x ∈ X let E(x;∼) denote the

set of all elements y ∈ X such that x ∼ y. The set E(x;∼) is called the equivalence class

of x for the equivalence relation ∼. Since ∼ is an equivalence relation, the equivalence

classes have the following properties:

1. Each E(x;∼) is non-empty for, since x ∼ x, x ∈ E(x;∼).

2. Let x and y be elements of X. Since ∼ is symmetric, y ∈ E(x;∼) if and only if

x ∈ E(y;∼).

3. If x, y ∈ X the equivalence classes E(x;∼) and E(y;∼) are either identical or they

have no members in common.

Indeed, suppose, first, that x ∼ y. Let z ∈ E(x;∼). Then x ∼ z and so, by symmetry,

z ∼ x. Hence, by transitivity, z ∼ y and so, by symmetry,

y ∼ z. This shows that E(x;∼) ⊂ E(y;∼). Since y ∼ x as well, the same argument with the roles of x and y exchanged shows that

E(y;∼) ⊂ E(x;∼). Hence E(x;∼) = E(y;∼).

Finally, notice that if the points x, y ∈ X are not related then E(x;∼) ∩ E(y;∼) = ∅. Indeed, if z ∈ E(x;∼) ∩ E(y;∼) then x ∼ z and y ∼ z and so x ∼ z and z ∼ y.

Therefore x ∼ y, which is a contradiction.

These facts lead to the following assertions concerning the family, F , of equivalence

classes of the equivalence relation ∼:

1. Every element of the family F is non-empty.

2. Each element x ∈ X belongs to one and only one of the sets in the family F .

3. x ∼ y if and only if x and y belong to the same set in the family F .

Otherwise said, an equivalence relation subdivides a set (or partitions the set) into the

union of a family of non-overlapping, non-empty subsets. Since, in most discussions, there

is only one equivalence relation that is relevant, we will often write simply E(x) instead

of E(x;∼) if no confusion can arise.
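The partition described above can be computed for a finite set by collecting the classes E(x) one element at a time. This Python sketch assumes that the supplied predicate `related` really is an equivalence relation (the code does not check this); the function name and the example, congruence modulo 3 on {1, ..., 10}, are illustrative choices.

```python
def equivalence_classes(X, related):
    """Return the distinct classes E(x) = {y in X | x ~ y} as frozensets.

    Assumes `related` is an equivalence relation on X.
    """
    classes = []
    for x in X:
        E_x = frozenset(y for y in X if related(x, y))
        if E_x not in classes:
            classes.append(E_x)
    return classes

X = range(1, 11)
F = equivalence_classes(X, lambda x, y: (x - y) % 3 == 0)

nonempty = all(len(E) > 0 for E in F)
pairwise_disjoint = all(E.isdisjoint(G) for E in F for G in F if E != G)
covers = set().union(*F) == set(X)

print([sorted(E) for E in F])
```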

Here is an example which is perhaps the first most students see when they discuss

number systems.

Example A.5.5 In the construction of the rational numbers, which we will denote by Q,

we first introduce ratios of integers p/q where p ∈ N and q ∈ N. If p/q represents a point

on the number line, then the ratios kp/kq must represent the same point and hence the

same rational number. Thus, two ratios p/q and r/s represent the same rational number

and can be treated as equal and can be substituted for one another in proofs involving

rational numbers whenever the equality

ps = rq

is true.

Now, let us define a relation on N × N by (p, q) ∼ (r, s) if and only if ps = rq. We

check that this is an equivalence relation as follows:

(a) (Reflexivity): pq = pq hence (p, q) ∼ (p, q).

(b) (Symmetry): If ps = rq then rq = ps and so

(p, q) ∼ (r, s) implies (r, s) ∼ (p, q) .

(c) (Transitivity): If ps = rq and rt = vs, then

(pt) · s = (ps) · t = (rq) · t = (rt) · q = (vs) · q = (vq) · s

and thus pt = vq since s ≠ 0. Hence (p, q) ∼ (r, s) and (r, s) ∼ (v, t) implies

(p, q) ∼ (v, t).


This argument shows that the rational numbers can be viewed as equivalence classes

of ratios of integers modulo the relation ∼ given in the example.
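The relation of Example A.5.5 is easy to experiment with. The Python sketch below tests pairs of positive integers with the cross-multiplication rule and compares the outcome with equality of `fractions.Fraction` values from the standard library; the function name `same_ratio` and the range 1 to 6 are assumptions made for illustration.

```python
from fractions import Fraction

def same_ratio(pq, rs):
    """(p, q) ~ (r, s) if and only if p*s == r*q."""
    (p, q), (r, s) = pq, rs
    return p * s == r * q

pairs = [(p, q) for p in range(1, 7) for q in range(1, 7)]

# The relation agrees with equality of the corresponding rational numbers.
agrees = all(same_ratio(a, b) == (Fraction(*a) == Fraction(*b))
             for a in pairs for b in pairs)

# The part of the equivalence class of (1, 2) that lies in `pairs`.
class_of_half = {pq for pq in pairs if same_ratio(pq, (1, 2))}

print(agrees, sorted(class_of_half))
```

Note that `Fraction` stores each rational in lowest terms, which amounts to choosing one representative from each equivalence class.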

As a final example consider the following:

Example A.5.6 Consider the set Z and let n be a fixed positive integer. Define a

relation $\sim_n$ by

$$x \sim_n y \quad \text{provided } (x - y) \text{ is divisible by } n .$$

This relation is called the relation of congruence modulo n. It is easy to check that

this is an equivalence relation on Z. (See the special case for n = 2 treated above.)

Moreover, there are n equivalence classes. Each integer x is uniquely expressible in the

form x = qn + r, where q and r are integers and 0 ≤ r ≤ n − 1. (The integers q and r are

called the quotient and the remainder respectively.) Hence each x is congruent modulo

n to one of the n integers 0, 1, . . . , n − 1. The equivalence classes are

$$\begin{aligned}
E_0 &= \{\ldots, -2n, -n, 0, n, 2n, \ldots\} \\
E_1 &= \{\ldots, 1 - 2n, 1 - n, 1, 1 + n, 1 + 2n, \ldots\} \\
&\;\;\vdots \\
E_{n-1} &= \{\ldots, n - 1 - 2n, n - 1 - n, n - 1, n - 1 + n, n - 1 + 2n, \ldots\}
\end{aligned}$$
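The quotient and remainder of the example correspond to Python's built-in `divmod`, which for a positive modulus returns a remainder between 0 and n − 1 even when x is negative. The sketch below assumes n = 4 and the window of integers from −8 to 8, both chosen only for illustration, and groups that window by remainder.

```python
n = 4
xs = range(-8, 9)   # a finite window of Z (an assumption)

# Each x is written as x = q*n + r with 0 <= r <= n - 1.
for x in xs:
    q, r = divmod(x, n)
    assert x == q * n + r and 0 <= r <= n - 1

# The part of each class E_r that falls inside the window.
classes = {r: [x for x in xs if x % n == r] for r in range(n)}

for r in range(n):
    print(r, classes[r])
```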

The domain of a relation R on X × Y is the set of all first coordinates of

the members of R while, in this context, the range is the set of all second coordinates.

Formally

dom (R) = {x ∈ X | for some y ∈ Y, (x, y) ∈ R} ,

while

rng (R) = {y ∈ Y | for some x ∈ X, (x, y) ∈ R} .

The inverse of a relation R, denoted $R^{-1}$, is obtained by reversing each of the pairs

belonging to R. Thus

$$R^{-1} = \{(y, x) \in Y \times X \mid (x, y) \in R\} .$$

Hence the domain of $R^{-1}$ is the range of R and the range of $R^{-1}$ is the

domain of R.

If S is a relation on X × Y and R is a relation on Y × Z, then the composition R ◦ S is defined as

{(x, z) ∈ X × Z | for some y, (x, y) ∈ S and (y, z) ∈ R} .

Example A.5.7 If R = {(1, 2)} and S = {(0, 1)} then R ◦ S = {(0, 2)} while S ◦ R = ∅.
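Composition and inversion of finite relations follow directly from the definitions. The Python sketch below applies them to the relations R and S of Example A.5.7; the function names `compose` and `inverse` and the larger relations `R2`, `S2`, `T2` are assumptions introduced for illustration.

```python
def compose(R, S):
    """R ∘ S = {(x, z) | for some y, (x, y) in S and (y, z) in R}."""
    return {(x, z) for (x, y) in S for (w, z) in R if y == w}

def inverse(R):
    """The inverse relation: every pair of R reversed."""
    return {(y, x) for (x, y) in R}

R = {(1, 2)}
S = {(0, 1)}

# (0, 1) in S followed by (1, 2) in R yields the pair (0, 2).
RS = compose(R, S)
# No pair of S starts where the pair of R ends, so S ∘ R is empty.
SR = compose(S, R)

# Spot checks of the identities for inverses and compositions.
R2 = {(1, 2), (2, 3), (2, 2)}
S2 = {(0, 1), (1, 1), (3, 2)}
T2 = {(5, 0), (5, 3)}
inverse_of_composition = inverse(compose(R2, S2)) == compose(inverse(S2), inverse(R2))
associative = compose(R2, compose(S2, T2)) == compose(compose(R2, S2), T2)

print(RS, SR, inverse_of_composition, associative)
```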

Concerning compositions and inverses we have the following result.

Proposition A.5.8 Let R, S, and T be relations. Then

(a) $(R^{-1})^{-1} = R$.

(b) $(R \circ S)^{-1} = S^{-1} \circ R^{-1}$.

(c) $R \circ (S \circ T) = (R \circ S) \circ T$.

Proof: (of (b)) We have

$$(z, x) \in (R \circ S)^{-1} \iff (x, z) \in R \circ S \iff \text{for some } y,\ (x, y) \in S \text{ and } (y, z) \in R .$$

Consequently, $(z, x) \in (R \circ S)^{-1}$ if and only if $(z, y) \in R^{-1}$ and $(y, x) \in S^{-1}$ for some y.

But this is the condition that $(z, x) \in S^{-1} \circ R^{-1}$.

A.6 Functions or Maps

We now define the idea of a function (or a mapping) in terms of sets. This is not so

unusual since we often think of a function in terms of its graph which consists of a set

of ordered pairs: given two sets X and Y a function is determined provided we specify

a set of ordered pairs (the graph of the function) in X × Y with the additional property

that no two distinct pairs have the same first element. Hence a function is a particular

example of a relation! Not every subset of ordered pairs will do however; to be a function

the ordered pairs must satisfy a particular condition.

Definition A.6.1 Let X and Y be two sets. A map f : X → Y (or a function with

domain X and range Y ) is a subset f ⊂ X × Y with the property: for each x ∈ X, there

is one, and only one, y ∈ Y satisfying (x, y) ∈ f .
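The defining property can be checked for any finite set of pairs. The Python sketch below is an illustration under assumed data: the sets `X` and `Y`, the three relations, and the name `is_function` are not from the text. A relation that passes the test can be converted to a dictionary, which is the usual way of storing a finite function.

```python
def is_function(f, X, Y):
    """True if f ⊂ X × Y and each x in X has exactly one y with (x, y) in f."""
    if not all(x in X and y in Y for (x, y) in f):
        return False
    return all(len({y for (a, y) in f if a == x}) == 1 for x in X)

X = {-2, -1, 0, 1, 2}
Y = {0, 1, 2, 3, 4}

square = {(x, x * x) for x in X}          # the graph of x ↦ x²
two_valued = {(0, 1), (0, 3), (1, 2)}     # 0 is paired with two values
partial = {(0, 0), (1, 1)}                # some x in X have no value

as_dict = dict(square)                    # y = f(x) becomes as_dict[x]

print(is_function(square, X, Y),
      is_function(two_valued, X, Y),
      is_function(partial, X, Y))
```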

It is usual to write y = f(x) instead of (x, y) ∈ f and say that “y is the value f assumes

at x”, or that “y is the image of x under f”, or that “f sends x to y”. The usual way to

define a map is to specify its domain X and the value of the function at each x ∈ X. We

often write x ↦ f(x). Here are some examples.

Example A.6.2

(a) Suppose k ∈ Y is fixed. Then the map defined for all x ∈ X by x ↦ k is called

a constant map. Note that a map need not send distinct points of X to distinct

points of Y , nor do we require it to take on all values in its range.

(b) The map x ↦ x of X onto itself is called the identity map on X. We will often

write this as $\mathrm{Id}_X$.

(c) If A ⊂ X the map i : A → X given by a ↦ a is called the inclusion map of A into

X.

(d) For any sets X, Y the map $p_1 : X \times Y \to X$ determined by (x, y) ↦ x is called

the “projection onto the first coordinate”. Similarly $p_2 : X \times Y \to Y$ given by

(x, y) ↦ y is called the “projection onto the second coordinate”.

We have stated above that it is not required that every point in the range be a value

that is taken on by a given function. That is the motivation for the following definition.

Definition A.6.3 Let f : X → Y . Then

(1) For each A ⊂ X , f(A) = {f(x) ∈ Y |x ∈ A} ⊂ Y is called the image of A in Y

under f .

(2) For each B ⊂ Y , $f^{-1}(B) = \{x \in X \mid f(x) \in B\}$ is called the inverse image of B in

X under f .

Again, it is a good idea to see some examples.

Example A.6.4 (a) Let X = [−1, 1], Y = [0, 2] and let f : X → Y be given by x ↦ x². Then f⁻¹({1/4}) = {−1/2, 1/2}. This shows that the inverse image of a single point may well be a set in the domain containing more than one point. This cannot happen, of course, if f is one-to-one (see definition below).⁷

(b) Let X = [−1, 1], Y = [0, 2], and f : X → Y be x ↦ x². Then f([0, 1/2]) = [0, 1/4] and f⁻¹([0, 1/4]) = [−1/2, 1/2].

(c) Let f : X → Y. If p₁, p₂ are the projections defined in the preceding example (A.6.2, part (d)), we have f(A) = p₂[f ∩ (A × Y)] and f⁻¹(B) = p₁[f ∩ (X × B)].

It is useful to think explicitly about how a function f : X → Y induces a map P(X) → P(Y). This induced map is defined by A ↦ f(A), and we call it f as well. Likewise, f : X → Y also induces a map f⁻¹ : P(Y) → P(X) by B ↦ f⁻¹(B), called the inverse map. Of these two maps, the inverse map f⁻¹ is the better behaved and, in some sense, the more important.

Proposition A.6.5 Let f : X → Y. Then the inverse map f⁻¹ : P(Y) → P(X) preserves union and intersection. Precisely, for all B₁, B₂ ⊂ Y,

(a) f⁻¹(B₁ ∪ B₂) = f⁻¹(B₁) ∪ f⁻¹(B₂).

(b) f⁻¹(B₁ ∩ B₂) = f⁻¹(B₁) ∩ f⁻¹(B₂).

Proof: We leave (a) as an exercise and prove (b).

x ∈ f−1(B1 ∩B2) if and only if f(x) ∈ (B1 ∩B2) if and only if f(x) ∈ B1 and f(x) ∈ B2

if and only if x ∈ f−1(B1) and x ∈ f−1(B2)

if and only if x ∈ [f−1(B1) ∩ f−1(B2)] .

As a corollary, we can restate the result for arbitrary intersections; the proof is analo-

gous.

⁷ While we usually make a careful distinction between a singleton set {x} ⊂ X and a point x ∈ X, we

often abuse notation and write simply f−1(y) instead of f−1({y}).


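Corollary A.6.6 Let f : X → Y and let f⁻¹ be the inverse map. If ℬ is a family of subsets of the set Y, then

\[ f^{-1}\Bigl(\bigcup_{B \in \mathcal{B}} B\Bigr) = \bigcup_{B \in \mathcal{B}} f^{-1}(B) \quad\text{and}\quad f^{-1}\Bigl(\bigcap_{B \in \mathcal{B}} B\Bigr) = \bigcap_{B \in \mathcal{B}} f^{-1}(B) . \]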

We have said that f−1 is better behaved because the last result is not true for the

induced map f . Indeed we have the counterexample:

Example A.6.7 Let f : R → R be the constant map x ↦ 1. Let A = [0, 1] and B = [2, 3]. Then A ∩ B = ∅ and so

∅ = f(A ∩ B) ≠ f(A) ∩ f(B) = {1} .

We do find, however, that f preserves unions as the next result shows.

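Proposition A.6.8 Let 𝒜 be a family of subsets of the set X. If f : X → Y, then for the induced map f : P(X) → P(Y) we have

\[ f\Bigl(\bigcup_{A \in \mathcal{A}} A\Bigr) = \bigcup_{A \in \mathcal{A}} f(A) , \quad\text{and}\quad f\Bigl(\bigcap_{A \in \mathcal{A}} A\Bigr) \subset \bigcap_{A \in \mathcal{A}} f(A) . \]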

We leave the proof of this last result to the reader.

Note that, in general, we do not have equality in the last case as the above example

shows. To further clarify matters let us look at another example.

Example A.6.9 Let X = {x₁, x₂} and Y = {y}. Define f : X → Y by f(x₁) = f(x₂) = y, and let A₁ = {x₁}, A₂ = {x₂}. Then A₁ ∩ A₂ = ∅ and consequently f(A₁ ∩ A₂) = ∅. On the other hand, f(A₁) = f(A₂) = {y} and so f(A₁) ∩ f(A₂) = {y}. This means that f(A₁ ∩ A₂) ≠ f(A₁) ∩ f(A₂).
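The two-point example can be replayed mechanically. This is only a sketch: the map is stored as a Python dict (its graph), and the strings "x1", "x2", "y" are stand-ins for the abstract elements of the example.

```python
def image(f, A):
    """Image of A under the finite map f, stored as a dict x -> f(x)."""
    return {f[x] for x in A}


def preimage(f, B):
    """Inverse image of B under the finite map f."""
    return {x for x in f if f[x] in B}


f = {"x1": "y", "x2": "y"}     # both points are sent to y
A1, A2 = {"x1"}, {"x2"}

lhs = image(f, A1 & A2)               # f(A1 ∩ A2)
rhs = image(f, A1) & image(f, A2)     # f(A1) ∩ f(A2)
print(lhs, rhs)                       # set() {'y'}
print(lhs <= rhs)                     # True: only the inclusion holds

# Unions are preserved by the image map ...
print(image(f, A1 | A2) == (image(f, A1) | image(f, A2)))        # True
# ... and the inverse image preserves intersections as well.
B1, B2 = {"y"}, set()
print(preimage(f, B1 & B2) == (preimage(f, B1) & preimage(f, B2)))  # True
```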

The problem here stems from the fact that y belongs to both f(A1) and f(A2) but only

as the image of two different elements x1 ∈ A1 or x2 ∈ A2; there is no common element

x ∈ A1 ∩ A2 which is mapped into y. This cannot happen if f is one-to-one.

As for compositions, we have the usual result familiar from calculus.


Proposition A.6.10 Let f : X → Y and g : Y → Z. Then (g ∘ f)⁻¹ = f⁻¹ ∘ g⁻¹ as maps from P(Z) to P(X).

Proof: Let C ⊂ Z. Then

x ∈ (g ∘ f)⁻¹(C) if and only if g(f(x)) ∈ C if and only if f(x) ∈ g⁻¹(C)

if and only if x ∈ f⁻¹[g⁻¹(C)] if and only if x ∈ (f⁻¹ ∘ g⁻¹)(C) .

If f : X → Y takes on every value in its range, f is called surjective (or a surjection, or onto). Note that, for a surjective f, we have f[f⁻¹(B)] = B for all B ⊂ Y.

If f sends distinct elements of X to distinct elements of Y, then f is called injective (or an injection, or one-to-one). Otherwise said, f is injective provided that x₁ ≠ x₂ implies f(x₁) ≠ f(x₂). This is equivalent to the statement that f(x₁) = f(x₂) if and only if x₁ = x₂.

A function that is both injective and surjective is called bijective or a bijection. Note that f is a bijection if and only if for all y ∈ Y, f⁻¹({y}) is a single point. In this case (f⁻¹)⁻¹ = f.

Example A.6.11 Consider the mapping f : R → R defined by x ↦ 2x + 3. Then f is certainly a bijection with inverse mapping y ↦ (1/2)y − 3/2. Indeed, (1/2)(2x + 3) − 3/2 = x.
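The definitions of injective and surjective, and the inverse computed in this example, can be spot-checked on finite samples. The sketch below uses exact rational arithmetic; the helper names and the sample points are assumptions made for illustration, and a finite check is of course not a proof for all of R.

```python
from fractions import Fraction


def is_injective(f, X):
    """True if f sends distinct points of the finite set X to distinct values."""
    values = [f(x) for x in X]
    return len(values) == len(set(values))


def is_surjective(f, X, Y):
    """True if every element of the finite set Y is a value f(x), x in X."""
    return {f(x) for x in X} == set(Y)


f = lambda x: 2 * x + 3
f_inv = lambda y: Fraction(1, 2) * y - Fraction(3, 2)

sample = [Fraction(n, 3) for n in range(-6, 7)]   # a few rational test points
print(all(f_inv(f(x)) == x for x in sample))      # True
print(all(f(f_inv(y)) == y for y in sample))      # True

# x -> x^2 on {-1, 0, 1} with range {0, 1}: onto but not one-to-one.
square = lambda x: x * x
print(is_injective(square, [-1, 0, 1]))           # False
print(is_surjective(square, [-1, 0, 1], [0, 1]))  # True
```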

Finally, we return to the notion of an indexed family of sets that we met in Section A.3. Let I be any set and F a family of subsets of a universal set U. Suppose, moreover, that f : I → F. Then we write f(i) = A_i. In this way the function f is said to index the sets {A_i ∈ F | f(i) = A_i, i ∈ I}. If the map f is surjective, then we say that the family F has been indexed by I.

In particular, if I = N, then specifying such a function defines what we usually mean by a sequence of sets, and we write {A₁, A₂, . . .}. In the case that f is surjective, we say that F is a countable family of subsets of U.

Example A.6.12 In Rⁿ, let B(0) be the set of all sets of the form

\[ \Bigl\{ x \in \mathbb{R}^n \;\Big|\; \sum_{i=1}^{n} x_i^2 < r^2 \Bigr\} , \qquad r \in \mathbb{R} . \]

For r > 0, such a set consists of the points in Rⁿ whose distance from the origin is less than r. Denote these sets by B_r(0). Then the sets B_n(0), n ∈ N, form a countable subset of B(0).


A.7 Orderings on Sets

We will frequently meet certain binary relations on various sets which are called orderings, of which there are several types. Such relations are used in economics, for example, to describe preferences of various agents. Thus suppose that an n-vector x = (x₁, x₂, . . . , xₙ) ∈ Rⁿ represents a “bundle” of goods available to consumers, x_i representing the amount of

good i in the bundle. Thus, for example, if the first component represents the number of

refrigerators measured in units while the second component represents wheat measured

in bushels then (2, 3.659, . . .) ∈ Rn is a bundle of goods consisting, among other things,

of two refrigerators and 3.659 bushels of wheat.

In describing consumer behavior, we generally make the assumption that one and only

one of the following alternatives holds:

1. a bundle x is preferred to the bundle y;

2. the consumer is indifferent in the choice of two bundles;

3. the bundle y is preferred to the bundle x.

Note that these alternatives, taken together, imply that the consumer can decide unam-

biguously between two bundles.

This situation suggests that we introduce some kind of structure that reflects preference,

i.e., some way of ordering bundles to reflect consumer desires.

Definition A.7.1 A binary relation R in a set A is said to be a preorder on A if it is

reflexive and transitive, i.e.,

(a) for all a ∈ A , aR a.

(b) If aR b and bR c then aR c.

A set, together with a definite preorder, is called a preordered set. It is traditional to write a preorder with the symbol ≺ or with ≼. Thus “a precedes b”, or “b is preceded by a”, or, in economics, “b is preferred to a”, is written a ≺ b. The symbol (A, ≺) denotes a preordered set. Notice that if B ⊂ A and if A is preordered by ≺ then, by default, this preorder induces a preorder on B.


Example A.7.2 (a) For any set A, the diagonal relation ∆ ⊂ A × A is a preorder, and a ≺ b means a = b. Note that we do not assume in the definition that any two elements can be compared. In other words, we do not require that either a ≺ b or b ≺ a for all a, b ∈ A.

(b) In the set R the relation {(x, y) ∈ R× R |x ≤ y} is a preorder. On the other hand

{(x, y) ∈ R× R |x < y} is not a preorder. (Why?)

(c) (IMPORTANT!) For any set X, consider the power set P(X). The relation A ≺ B

defined by

A ≺ B if and only if A ⊂ B

is a preordering of P(X). In this particular case, we say that P(X) is preordered by

inclusion.

By putting other conditions on a preordering, different types of orders can be obtained.

Definition A.7.3 If a preordering on A satisfies the additional property of antisymmetry,

i.e.,

a ≺ b and b ≺ a if and only if a = b ,

then it is called a partial ordering. In this case A is called a partially ordered set.

As another, and important, example we consider the following.

Example A.7.4 Consider the set Rn. Let x = (x1, x2, . . . , xn) and y = (y1, y2, . . . , yn).

Then we can introduce a partial order ≺ in Rn by

x ≺ y if and only if xi ≤ yi for all i = 1, 2, . . . , n.

Here ≤ is the usual ordering on the real line. Note that this is certainly a reflexive,

transitive, and anti-symmetric relation so that ≺ is indeed a partial ordering of Rn. Note

further that not every two elements can be compared. Thus, for example, in R2, the

vectors (1, 2)⊤ and (2, 1)⊤ are not comparable.
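The componentwise order of this example is easy to test numerically. The following sketch represents vectors as Python tuples; the function names `precedes` and `comparable` are illustrative, not notation from the text.

```python
def precedes(x, y):
    """x ≺ y in the componentwise order: x_i <= y_i for every i."""
    return len(x) == len(y) and all(xi <= yi for xi, yi in zip(x, y))


def comparable(x, y):
    """Two vectors are comparable when one precedes the other."""
    return precedes(x, y) or precedes(y, x)


print(precedes((1, 2), (2, 3)))     # True
print(comparable((1, 2), (2, 1)))   # False: neither bundle dominates the other

# Antisymmetry on a small grid: x ≺ y and y ≺ x force x == y.
grid = [(a, b) for a in range(3) for b in range(3)]
print(all(x == y for x in grid for y in grid
          if precedes(x, y) and precedes(y, x)))   # True
```

In economic terms, two bundles are incomparable in this order exactly when each contains strictly more of some good than the other.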


This last example contrasts with the usual ordering ≤ on the real line, R, where every

element can be compared. This leads to an important special case of a partial order.

Definition A.7.5 Let A be a set. A total or linear order on the set A is a partial order ≺ such that, for all x, y ∈ A with x ≠ y, either x ≺ y or y ≺ x whenever x and y are both in the domain and range of the order relation.

The usual order on the real line is the obvious example. We remark that, in this

terminology, a chain in a partially ordered set is a totally ordered family.

The set P(X) is partially ordered by inclusion since the preorder is also antisymmetric.

In general, the set P(X) is not a chain. In this context, what does a chain look like? One example is the following: for each n ∈ N, let A_n ∈ P(X) and suppose that, for each n, A_{n+1} ⊂ A_n and A_n ≠ A_{n+1}. Then the family of subsets {A_n}, n = 1, 2, . . . , constitutes a chain in P(X) with respect to the partial ordering of inclusion.

Example A.7.6 Let X = [−1, 1] ⊂ R and let A_n = [−1/n, 1/n], n = 1, 2, · · · . Then this family of sets forms a chain, namely

[−1, 1] ⊃ [−1/2, 1/2] ⊃ [−1/3, 1/3] ⊃ · · ·

Here is another example which is important in a number of applications including

integer programming algorithms and sorting algorithms in computer science.

Example A.7.7 (Lexicographical Order) Let X be the set of all infinite sequences of

real numbers. Define a relation ≺L on X by

a ≺_L b provided, for the smallest integer i_o such that a_{i_o} ≠ b_{i_o}, we have a_{i_o} < b_{i_o}.

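This order is called lexicographical order since it is the same kind of order used in common dictionaries. In fact, this order is a total, or linear, order. Indeed, it is clearly reflexive (if a = b there is no index at which the sequences differ, so the condition holds vacuously). To check transitivity, note that if a ≺_L b and b ≺_L c with a ≠ b and b ≠ c, then for the minimal indices i_o and j_o at which a, b and b, c first differ, we have a_{i_o} < b_{i_o} and b_{j_o} < c_{j_o}. If i_o < j_o then a_{i_o} < b_{i_o} = c_{i_o} and so a ≺_L c. If j_o < i_o then a_{j_o} = b_{j_o} < c_{j_o} and so, again, a ≺_L c. Finally, if i_o = j_o then a_{i_o} < b_{i_o} < c_{i_o}, and once more a ≺_L c.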

The property of anti-symmetry is easy to check. Finally, since any two sequences can

be compared, this partial ordering is indeed a total order.
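The rule "decide at the first index where the sequences differ" can be sketched for finite sequences of equal length (the example in the text concerns infinite sequences, so this is an illustration under that simplifying assumption). The name `lex_precedes` is hypothetical. Python's built-in comparison of tuples happens to use the same order, which gives a convenient cross-check.

```python
from itertools import product


def lex_precedes(a, b):
    """True when a precedes (or equals) b in lexicographic order.

    Assumes a and b are finite sequences of the same length.
    """
    for ai, bi in zip(a, b):
        if ai != bi:           # first index where the sequences differ
            return ai < bi
    return True                # no differing index: a == b (reflexive case)


points = list(product([0, 1, 2], repeat=3))

# Totality: any two sequences can be compared.
print(all(lex_precedes(a, b) or lex_precedes(b, a)
          for a in points for b in points))        # True

# Agreement with Python's built-in tuple ordering.
print(all(lex_precedes(a, b) == (a <= b)
          for a in points for b in points))        # True

print(lex_precedes((1, 5, 0), (2, 0, 0)))  # True: decided at the first coordinate
print(lex_precedes((1, 2, 3), (1, 2, 2)))  # False: decided at the third coordinate
```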


Let us finish this section by discussing some standard ideas pertaining to partially

ordered sets that will be of some use in discussions of some of the ideas of Pareto and

multi-criteria optimization.

Definition A.7.8 Let (A,≺) be a partially ordered set with partial order given by ≺.

Then

(a) m ∈ A is called a maximal element in A provided m ≺ a implies m = a. The set A

is said to have a greatest element m provided, for all a ∈ A , a ≺ m.

(b) ao ∈ A is called an upper bound for a subset B ⊂ A provided, for all b ∈ B , b ≺ ao.

(c) B ⊂ A is called a chain in A if each two elements in B are related.

It is important to distinguish between maximal and greatest elements.

Example A.7.9 (a) Consider the set consisting of the union of the sets A = {2, 4, 6} and B = {3, 9, 27}. Partially order the union with the relation a ≺ b provided a is a factor of b. Then there is no greatest element, but 4, 6, and 27 are all maximal.

(b) If we take the union of A = {2, 4, 6} and B = {1, 3, 9, 27}, then 1 is both a minimal

element and a least element.
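The difference between maximal and greatest elements can be checked by brute force on these small sets. The sketch below is generic in the order relation `leq`; the helper names are assumptions made for illustration. Minimal and least elements are obtained by passing the reversed relation.

```python
def divides(a, b):
    """a ≺ b when a is a factor of b."""
    return b % a == 0


def maximal_elements(S, leq):
    """All m in S such that leq(m, a) forces a == m."""
    return {m for m in S if all(a == m for a in S if leq(m, a))}


def greatest_element(S, leq):
    """The m in S with leq(a, m) for every a in S, or None if there is none."""
    for m in S:
        if all(leq(a, m) for a in S):
            return m
    return None


S = {2, 4, 6} | {3, 9, 27}
print(sorted(maximal_elements(S, divides)))   # [4, 6, 27]
print(greatest_element(S, divides))           # None

T = {2, 4, 6} | {1, 3, 9, 27}
is_multiple_of = lambda a, b: divides(b, a)   # the reversed relation
print(sorted(maximal_elements(T, is_multiple_of)))  # [1]: the only minimal element
print(greatest_element(T, is_multiple_of))          # 1: the least element
```

The output for S shows that 6 and 27 are maximal although neither is a greatest element; the search also reports 4, which is not a factor of any other member of the set.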

We end this appendix with some remarks on functions that preserve order when the

domain and range of the function are ordered in some way. This is an important question

in Economics where it arises in the context of the existence of a scalar-valued utility

function. We confine ourselves to some elementary remarks.

In what follows, we are given two partially ordered sets {X,≺} and {Y,<}.

Definition A.7.10 A function f : X −→ Y is said to be order preserving or sometimes

isotone relative to the orders on X and Y provided f(u) < f(v) or f(u) = f(v) whenever

u, v ∈ X are such that u ≺ v.

The situation of interest to us is the case in which ≺ is a partial order, while < is a linear (i.e., complete) order; in fact, Y = R and the order is the usual order on R. Unfortunately, some extra conditions must be put on the sets before an isotone mapping will exist, even in the case that the order relation ≺ is a total order.



We can see that this is the case by showing that lexicographic order does not admit

an order-preserving or isotone map into R with its usual order. The proof uses two

elementary facts about the system of real numbers, namely that any interval of positive

length contains a rational number, and that the cardinality of the set of rationals is less

than the cardinality of the set of reals. In particular, there cannot be an injective map of

R> into the rational numbers Q. The argument is by contradiction.

Proof: Suppose we are given a map u : R²_> → R that represents lexicographic order in R². Were this the case, then, given any x ∈ R_>, u(x, 0) < u(x, 1) since (x, 0) ≺_L (x, 1), the first components being the same and 0 < 1. Thus we can define an interval I(x) = [u(x, 0), u(x, 1)] ⊂ R of positive length.

Now, let x, y ∈ R_> with x ≠ y. Without loss of generality, we may assume that y < x. Then (y, 1) ≺_L (x, 0) in the lexicographic order. It follows that I(x) ∩ I(y) = ∅, for (y, 0) ≺_L (y, 1) ≺_L (x, 0) ≺_L (x, 1). Now let ℐ = {I(x) | x ∈ R_>} be the family of these intervals and define ϕ : R_> → ℐ by ϕ(x) = I(x). This map is injective since I(x) ∩ I(y) = ∅ whenever x ≠ y.

Now it is a property of R that every interval of positive length contains a rational number. To finish the proof, we pick a function that chooses one rational from each of the intervals⁸. Call this map ψ. Then ψ : ℐ → Q and ψ(I(x)) is, for each x, some rational number in the interval I(x). Since {I(x)}, x ∈ R_>, is a disjoint family, the map ψ is injective and hence ψ ∘ ϕ : R_> → Q is an injective map. Hence card(R_>) ≤ card(Q), which is false.

⁸ Here, we are using the Axiom of Choice, which we take as a fundamental axiom of Set Theory.


Appendix B

Basic Analysis

B.1 Introduction

In this appendix we collect a number of facts from Advanced Calculus that are necessary for subsequent work. Again, the material here is intended as a summary and is not a substitute for a course in Advanced Calculus/Beginning Analysis. Those who need a good reference for this material should consult a standard text, for example M. Rosenlicht, Introduction to Analysis [11], available through Dover, or W. Rudin, Principles of Mathematical Analysis, Third Ed. [12]. Here, as elsewhere in the text, we have printed some proofs in small type. These are proofs which can be skipped without loss of continuity by the beginning student.

B.2 Norms and Inner Products

Our work will be confined almost exclusively to problems in a real n-dimensional Euclidean vector space which we will denote by Rⁿ. However, much of what we say here can be considered in an arbitrary real vector space¹. Vectors in the vector space Rⁿ will always be written as column vectors so that

\[ x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} \]

and we write its transpose as x⊤. Thus x⊤ is a row vector. The distinction between the notationally coincident row vectors and points in Rⁿ will be clear from the context.

¹ This is true in the finite dimensional case only. Certain changes must be made in the case of an infinite dimensional vector space like C[0, 1].

Recall that there are two operations, the vector operations, that are defined on Rⁿ, namely vector addition and multiplication by scalars (real numbers). These are defined

componentwise so that if x and y have ith components xi and yi and if α and β are real

numbers, then the ith component of αx + β y is αxi + β yi. There are eight basic rules

of computation that we list here for convenience.

Addition                                          Scalar Multiplication

(1) x + y = y + x                                 (5) α(x + y) = αx + αy
(2) x + (y + z) = (x + y) + z                     (6) (α + β)x = αx + βx
(3) There is a vector 0 with x + 0 = x            (7) (αβ)x = α(βx)
(4) For each x there is a −x with x + (−x) = 0    (8) 1 · x = x

A linear combination of vectors is any sum of the form α1x1 + α2x2 + · · · + αkxk.

B.2.1 Inner Products of Vectors

The usual dot product x · y between two vectors in Rn can be written in terms of matrix

multiplication. Thus

\[ \langle x, y \rangle = y^{\top} x = x \cdot y := \sum_{i=1}^{n} x_i \, y_i . \]

We will almost always write the dot product as 〈x,y〉. This dot product has certain simple properties:

(i) 〈x,x〉 ≥ 0 and 〈x,x〉 = 0 if and only if x = 0. (positive definiteness)

(ii) 〈x,y〉 = 〈y,x〉 for all x,y ∈ Rn. (symmetry)

(iii) 〈αx,y〉 = α 〈x,y〉 for all α ∈ R and x,y ∈ Rn. (homogeneity)

(iv) 〈x + z,y〉 = 〈x,y〉+ 〈z,y〉 for all x,y, z ∈ Rn. (additivity)


Notice that we can combine (iii) and (iv) to write

〈αx + βz,y〉 = α 〈x,y〉+ β 〈z,y〉

so that the inner product is linear in the first entry. By symmetry, it is likewise linear in the second entry. This means that an inner product² is a positive definite, symmetric, bilinear form. Indeed, in any real vector space, such a form defines an inner product. If

an inner product is given, the space, together with the inner product, is called an inner

product space.

We recall that two vectors in Rn are said to be orthogonal in the case that their dot

product vanishes. It is not difficult to show, say in R², that orthogonal vectors must meet

at an angle of measure π/2. At this point, it is useful to recall a definition of linearly

independent vectors.

Definition B.2.1 A set of vectors {x1,x2, . . . ,xk} ⊂ Rn is said to be a linearly indepen-

dent set of vectors provided no one of them can be written as a linear combination of the

others.

Now we have the next result whose elementary proof we leave to the reader.

Lemma B.2.2 A set of vectors {x1, . . . ,xk} ⊂ Rn is a linearly independent set if and

only if the linear combination

α1x1 + · · · + αkxk = 0

implies that α1 = α2 = · · · = αk = 0.

It should be clear that, for example, a set of mutually orthogonal non-zero vectors must

be linearly independent. Indeed, suppose that

α1x1 + · · · + αixi + · · · + αkxk = 0

and take the inner product of both sides with xi. Then

\[ \sum_{j=1}^{k} \alpha_j \, \langle x_j, x_i \rangle = \langle 0, x_i \rangle = 0 , \]

and each of the summands except the ith vanishes because of orthogonality. Hence this last line reduces to αi〈xi,xi〉 = 0 and division by 〈xi,xi〉 shows that αi = 0. Since i is arbitrary, all the αi = 0 and the vectors are linearly independent according to the lemma.

² Here we restrict ourselves to the set of real scalars.

It is often useful to know that, given any set of linearly independent vectors, they span a

subspace of Rn and that the given set of k linearly independent vectors may be replaced

by a set of k mutually orthogonal vectors which span the same subspace. The method of

doing so is constructive and is known as the Gram-Schmidt Procedure.
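The text names the Gram-Schmidt Procedure without spelling it out. The following is a minimal sketch of one common variant (subtracting from each vector its components along the previously constructed ones), assuming the input vectors are linearly independent; the function names and the sample vectors are illustrative only. Exact rational arithmetic is used so that orthogonality can be verified without rounding error.

```python
from fractions import Fraction


def dot(x, y):
    """Euclidean inner product <x, y>."""
    return sum(xi * yi for xi, yi in zip(x, y))


def gram_schmidt(vectors):
    """Return mutually orthogonal vectors spanning the same subspace.

    Assumes the input vectors are linearly independent, so that no
    constructed vector is zero and the division below is defined.
    """
    basis = []
    for v in vectors:
        w = list(v)
        for u in basis:
            coeff = dot(w, u) / dot(u, u)          # component of w along u
            w = [wi - coeff * ui for wi, ui in zip(w, u)]
        basis.append(w)
    return basis


vectors = [[Fraction(c) for c in v] for v in [(1, 1, 0), (1, 0, 1), (0, 1, 1)]]
basis = gram_schmidt(vectors)

# Every pair of distinct output vectors has inner product exactly zero.
print(all(dot(basis[i], basis[j]) == 0
          for i in range(3) for j in range(i)))    # True
```

The output vectors are orthogonal but not normalized; dividing each by its norm would give an orthonormal set.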

B.2.2 Norms

The usual dot product, or inner product in Rn is associated with the Euclidean norm

\[ \|x\| = \sqrt{\sum_{i=1}^{n} x_i^2} . \]

Unless stated explicitly, the symbol ‖ · ‖ will always refer to this Euclidean norm although in some cases we will write ‖ · ‖₂. This distinction may become useful since it is possible,

and often very useful (particularly in numerical work), to choose a different norm. In

particular, if we are working with an inner product that is different from the standard

dot product, then that new inner product likewise defines a norm in the same way that

the usual dot product induces the Euclidean norm.

Example B.2.3 As an example, suppose that A is a symmetric, n × n positive definite matrix³. Then we can define a new inner product by

[x,y] = 〈Ax,y〉 .

The fact that the matrix is symmetric and positive definite ensures that the new form [·, ·] satisfies all the properties of an inner product on Rⁿ. This type of inner product is used

extensively in nonlinear programming where it arises in the so-called conjugate gradient

method. It is also useful, in more general settings, in studying elasticity where the norm

associated with the inner product is usually called the energy norm.
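As a small numerical illustration of the form [x, y] = 〈Ax, y〉, the sketch below uses a 2 × 2 symmetric positive definite matrix chosen purely for the example; the matrix, the vectors, and the function names are assumptions, not taken from the text.

```python
import math


def dot(x, y):
    """Euclidean inner product <x, y>."""
    return sum(xi * yi for xi, yi in zip(x, y))


def matvec(A, x):
    """Matrix-vector product Ax, with A given as a list of rows."""
    return [dot(row, x) for row in A]


def a_inner(A, x, y):
    """The form [x, y] = <Ax, y>."""
    return dot(matvec(A, x), y)


def energy_norm(A, x):
    """Norm induced by [., .]; requires A symmetric positive definite."""
    return math.sqrt(a_inner(A, x, x))


A = [[2, 1], [1, 2]]      # symmetric, with eigenvalues 1 and 3 (illustrative)
x, y = [1, -1], [3, 2]

print(a_inner(A, x, y), a_inner(A, y, x))   # 1 1   (symmetry)
print(a_inner(A, x, x))                     # 2     (positive for x != 0)
print(math.isclose(energy_norm(A, x), math.sqrt(2)))  # True
```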

In order to understand this situation more fully, we should first recognize that the

idea of a norm is independent from that of an inner product. There are norms on Rn (or

on any vector space for that matter) that are not associated with any inner product at

³ A matrix A is said to be positive definite provided, for all x ∈ Rⁿ, x ≠ 0, 〈Ax,x〉 > 0.


all; those that are must enjoy a particular extra property⁴. In general, a norm is just

a generalization of the familiar absolute value of a real number. Here is the concrete

definition.

Definition B.2.4 A norm on Rn (or on any real vector space) is a real-valued function

Rn −→ R, whose value is denoted by ‖x‖ which has the following three properties

(1) ‖x‖ ≥ 0 for all x ∈ Rn, and ‖x‖ = 0 if and only if x = 0.

(2) ‖x + y‖ ≤ ‖x‖+ ‖y‖ for each choice of x,y ∈ Rn.

(3) ‖αx‖ = |α| ‖x‖ for all α ∈ R ,x ∈ Rn.

In terms of these properties, we see that, again, the norm is a positive definite form which is positively homogeneous of degree 1 and which satisfies the triangle inequality (property (2)).

A crucial result that follows from the definition of a norm is the Cauchy-Schwarz-Bunyakovski inequality⁵. This is, without doubt, one of the most important inequalities in all of mathematics and physics.

|〈x,y〉| ≤ ‖x‖ ‖y‖ , with equality if and only if y = αx for some scalar α ≠ 0 .

Proof: Clearly the inequality is true in the case that y = 0 and it is likewise true if 〈x,y〉 = 0, i.e. if the

vectors are orthogonal. We assume, therefore, that neither is the case.

We observe that, if y = αx then

|〈x,y〉| = |〈x, (αx)〉| = |α| |〈x,x〉|

= |α| ‖x‖2 = ‖x‖ ‖αx‖ = ‖x‖ ‖y‖ .

So we have equality in this case.

⁴ This extra property is called the parallelogram law. Any norm that satisfies 2‖x‖² + 2‖y‖² = ‖x + y‖² + ‖x − y‖² must arise from an inner product.

⁵ Fair or not, in the Western European and English-speaking worlds, the name of Bunyakovski is left out and the inequality is simply called the Cauchy-Schwarz inequality. We will follow that tradition.

Now define scalars ξ := ‖y‖ and η := −〈x,y〉/‖y‖. Note that η is defined and non-zero by assumption. Then, by the binomial expansion,

\begin{align*}
\|\xi x + \eta y\|^2 &= \xi^2 \|x\|^2 + 2\,\xi\,\eta\,\langle x, y\rangle + \eta^2 \|y\|^2 \\
&= \|x\|^2 \|y\|^2 - 2\,\|y\| \Bigl(\frac{\langle x, y\rangle}{\|y\|}\Bigr) \langle x, y\rangle + \frac{\langle x, y\rangle^2}{\|y\|^2}\,\|y\|^2 \\
&= \|x\|^2 \|y\|^2 - \langle x, y\rangle^2 .
\end{align*}

From this identity we see that if equality holds in the Cauchy-Schwarz inequality, then ‖ξx + ηy‖ = 0. But then

y = −(ξ/η) x , with η ≠ 0 by assumption,

so that y is a non-zero multiple of x.

Finally, assuming that ‖ξx + ηy‖ > 0 we have 〈x,y〉² < (‖x‖ ‖y‖)² which implies, taking appropriate square roots, that |〈x,y〉| < ‖x‖ ‖y‖, and the Cauchy-Schwarz inequality is established.
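The identity at the heart of this proof, ‖ξx + ηy‖² = ‖x‖²‖y‖² − 〈x,y〉² with ξ = ‖y‖ and η = −〈x,y〉/‖y‖, can be checked numerically. The vectors below are arbitrary sample data chosen for the sketch, and the comparison uses a floating-point tolerance.

```python
import math


def dot(x, y):
    """Euclidean inner product <x, y>."""
    return sum(xi * yi for xi, yi in zip(x, y))


def norm(x):
    """Euclidean norm."""
    return math.sqrt(dot(x, x))


x = [1.0, 2.0, 3.0]       # sample vectors (assumption)
y = [4.0, -5.0, 6.0]

xi = norm(y)
eta = -dot(x, y) / norm(y)
combo = [xi * a + eta * b for a, b in zip(x, y)]

lhs = dot(combo, combo)                         # ||xi x + eta y||^2
rhs = dot(x, x) * dot(y, y) - dot(x, y) ** 2    # 14 * 77 - 12^2 = 934

print(math.isclose(lhs, rhs))                   # True
print(abs(dot(x, y)) <= norm(x) * norm(y))      # True: Cauchy-Schwarz
```

Since the left-hand side is a squared norm, it is non-negative, which is exactly the statement 〈x,y〉² ≤ ‖x‖²‖y‖².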

Our first use of the Cauchy-Schwarz inequality is to check that what we called the

Euclidean “norm” is, indeed, a norm.

Example B.2.5 Consider what we have called the norm associated with the Euclidean inner product. To check that

\[ \|x\| := \sqrt{\sum_{i=1}^{n} x_i^2} \]

is indeed a norm, we need to check the three properties listed in Definition B.2.4. Since √r ≥ 0 for any r ≥ 0, we have ‖x‖ ≥ 0. Moreover, ‖x‖ = 0 implies that x_i² = 0 for each component i = 1, . . . , n. Hence ‖x‖ = 0 implies x = 0. Since x = 0 implies x_i = 0 for all i, and hence ‖x‖ = 0, we see that property (1) holds. Since √(a · b) = √a √b for a, b ≥ 0, clearly

\[ \|\alpha x\| = \sqrt{\alpha^2 \sum_{i=1}^{n} x_i^2} = \sqrt{\alpha^2}\,\sqrt{\sum_{i=1}^{n} x_i^2} = |\alpha| \sqrt{\sum_{i=1}^{n} x_i^2} = |\alpha|\,\|x\| , \]

and we see that (3) is satisfied.

Finally, to check the triangle inequality, we use the Cauchy-Schwarz inequality. Start

with the relationship of the supposed norm to the inner product and expand the inner

product.

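\begin{align*}
\|x + y\|^2 &= \langle x + y, x + y\rangle = \langle x, x + y\rangle + \langle y, x + y\rangle \\
&= \langle x, x\rangle + \langle x, y\rangle + \langle y, x\rangle + \langle y, y\rangle = \|x\|^2 + \langle y, x\rangle + \langle x, y\rangle + \|y\|^2 \\
&= \|x\|^2 + 2\,\langle x, y\rangle + \|y\|^2 \le \|x\|^2 + 2\,|\langle x, y\rangle| + \|y\|^2 \\
&\le \|x\|^2 + 2\,\|x\|\,\|y\| + \|y\|^2 \quad \text{by Cauchy-Schwarz} \\
&= (\|x\| + \|y\|)^2 .
\end{align*}

Taking square roots establishes the triangle inequality, and the relation

\[ \|x\| = \sqrt{\sum_{i=1}^{n} x_i^2} \]

does, indeed, define a norm.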


The expansion of ‖x+y‖2 in the above calculation is a special case of what is known as

the binomial theorem. Given real scalars α, β and vectors x,y we have the usual binomial

expansion

‖αx + β y‖2 = α2‖x‖2 + 2αβ 〈x,y〉+ β2 ‖y‖2 ,

which can be easily checked by expanding the inner product ‖αx + βy‖² = 〈αx + βy, αx + βy〉.

As remarked previously, there are many situations in which the Euclidean norm is not

the most convenient. The following form a scale of useful norms:

Examples B.2.6 (a) The ℓ1-norm: ‖x‖₁ = |x₁| + |x₂| + · · · + |xₙ|,

(b) The ℓ2-norm: ‖x‖₂ = (|x₁|² + |x₂|² + · · · + |xₙ|²)^{1/2},

(c) The ℓp-norm: ‖x‖_p = (|x₁|^p + |x₂|^p + · · · + |xₙ|^p)^{1/p}, 1 ≤ p < ∞,

(d) The ℓ∞-norm: ‖x‖_∞ = max_{1≤i≤n} |x_i|.

Clearly, (a) and (b) are special cases of (c). In each case, the work of showing that the given relation does in fact define a norm is that of checking that the triangle inequality is satisfied. We have just shown that this is the case for the Euclidean or ℓ2-norm.
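The scale of norms listed above can be computed with a single function. This is a sketch in plain Python; the name `p_norm` and the convention of passing `math.inf` for the ℓ∞-norm are choices made here, not notation from the text.

```python
import math


def p_norm(x, p):
    """The l_p norm of x for 1 <= p < infinity, or the max norm for p = math.inf."""
    if p == math.inf:
        return max(abs(xi) for xi in x)
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)


x = [3, -4]
print(p_norm(x, 1))         # 7.0
print(p_norm(x, 2))         # 5.0
print(p_norm(x, math.inf))  # 4

# Spot check of the triangle inequality for a few values of p.
y = [1, 2]
s = [a + b for a, b in zip(x, y)]
print(all(p_norm(s, p) <= p_norm(x, p) + p_norm(y, p) + 1e-12
          for p in (1, 1.5, 2, 3, math.inf)))   # True
```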

B.2.3 Some Important Inequalities

While we call the expressions the ℓp-norms, for 1 ≤ p < ∞, to check that they are, in fact, norms requires that we check the triangle inequality, the other properties of norms being easy to check. In these cases, the triangle inequality is called Minkowski’s Inequality, which reads

\[ \Bigl(\sum_{i=1}^{n} |x_i + y_i|^p\Bigr)^{1/p} \le \Bigl(\sum_{i=1}^{n} |x_i|^p\Bigr)^{1/p} + \Bigl(\sum_{i=1}^{n} |y_i|^p\Bigr)^{1/p} . \]

Proving Minkowski’s inequality shows that these norms really are norms.

In order to prove Minkowski’s inequality, we need a generalization of the Cauchy-Schwarz inequality which is called Hölder’s Inequality. It is important in its own right and it is worthwhile to remember both of these inequalities, together with the all-important Cauchy-Schwarz inequality. Hölder’s inequality is

\[ \sum_{i=1}^{n} |x_i \, y_i| \le \|x\|_p \, \|y\|_q , \qquad \frac{1}{p} + \frac{1}{q} = 1 , \quad 1 \le p < \infty . \]


Below, we establish these important inequalities.

We begin with a generalization of the inequality of the arithmetic-geometric mean of two positive real numbers. This is the inequality √(xy) ≤ x/2 + y/2 and is easily established.

0 ≤ (x− y)2 = x2 − 2x y + y2 (B.1)

= x2 + 2x y + y2 − 4x y = (x+ y)2 − 4x y . (B.2)

The result follows by rearrangement and taking square roots.

The generalization that we will need is given next.

Lemma B.2.7 Let a, b ≥ 0 be real numbers and let 1 ≤ p < ∞ and q be such that 1/p + 1/q = 1⁶. Then

\[ a^{1/p} \, b^{1/q} \le \frac{a}{p} + \frac{b}{q} . \]

Proof: We may consider only the case that both a and b are positive since, if either were zero, the result would hold trivially. Now, for any fixed k, 0 < k < 1, and for t > 0 define a function f by

f(t) = k(t − 1) − t^k + 1 ,

which, for t ≥ 1, has derivative

\[ f'(t) = k - k\,t^{k-1} = k\,(1 - t^{k-1}) = k \Bigl(1 - \frac{1}{t^{1-k}}\Bigr) \ge 0 . \]

So f is an increasing function on t ≥ 1 and f(1) = 0, as can be easily checked. Hence, for t ≥ 1,

k(t − 1) − t^k + 1 ≥ 0 , or t^k ≤ k t + (1 − k) .

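Since we require t ≥ 1 we have two cases. If a ≥ b put t = a/b and k = 1/p. Then

\[ \Bigl(\frac{a}{b}\Bigr)^{1/p} \le \frac{1}{p}\Bigl(\frac{a}{b}\Bigr) + \frac{1}{q} , \quad\text{or}\quad b\,\Bigl(\frac{a}{b}\Bigr)^{1/p} \le b\,\frac{1}{p}\Bigl(\frac{a}{b}\Bigr) + \frac{b}{q} , \]

from which it follows that

\[ a^{1/p} \, b^{1 - 1/p} \le \frac{a}{p} + \frac{b}{q} . \]

The result follows from the fact that (1 − 1/p) = 1/q.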

If, on the other hand, b > a, then set t = b/a and k = 1/q. It follows that

\[ \Bigl(\frac{b}{a}\Bigr)^{1/q} \le \frac{1}{q}\Bigl(\frac{b}{a}\Bigr) + \Bigl(1 - \frac{1}{q}\Bigr) , \]

and the result follows as before.

Lemma B.2.8 (Hölder’s Inequality). For any x, y ∈ Rⁿ,

\[ \sum_{i=1}^{n} |x_i \, y_i| \le \|x\|_p \, \|y\|_q , \qquad \frac{1}{p} + \frac{1}{q} = 1 , \quad 1 \le p < \infty . \tag{B.3} \]

⁶ Such indices p and q are said to be conjugate.


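Proof: Again, if either x = 0 or y = 0 the inequality is trivially true. Hence we assume that they are both non-zero. In this case, set

\[ a_i = \Bigl(\frac{|x_i|}{\|x\|_p}\Bigr)^{p} \quad\text{and}\quad b_i = \Bigl(\frac{|y_i|}{\|y\|_q}\Bigr)^{q} , \]

and apply the preceding lemma. Thus

\[ a_i^{1/p} \, b_i^{1/q} = \frac{|x_i|}{\|x\|_p} \, \frac{|y_i|}{\|y\|_q} \le \frac{a_i}{p} + \frac{b_i}{q} . \]

Adding these results (and recalling that, e.g., ‖x‖_p^p = ∑ |x_i|^p), we have

\begin{align*}
\frac{1}{\|x\|_p \, \|y\|_q} \sum_{i=1}^{n} |x_i \, y_i| &\le \Bigl(\frac{1}{p}\Bigr) \frac{1}{\|x\|_p^p} \sum_{i=1}^{n} |x_i|^p + \Bigl(\frac{1}{q}\Bigr) \frac{1}{\|y\|_q^q} \sum_{i=1}^{n} |y_i|^q \tag{B.4} \\
&= \frac{1}{p} + \frac{1}{q} = 1 . \tag{B.5}
\end{align*}

Multiplying both sides of this last inequality by ‖x‖_p ‖y‖_q, the result follows.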

We are now ready for the main event.

Proposition B.2.9 (Minkowski’s Inequality) For any x, y ∈ Rⁿ, ‖x + y‖_p ≤ ‖x‖_p + ‖y‖_p, 1 ≤ p < ∞.

Proof: For p = 1 the inequality reduces to the well-known inequality for absolute value. For p > 1 and q conjugate to p, note that p/q = p − 1. Then

\begin{align*}
\|x + y\|_p^p = \sum_{i=1}^{n} |x_i + y_i|^p &= \sum_{i=1}^{n} |x_i + y_i| \, |x_i + y_i|^{p-1} \tag{B.6} \\
&\le \sum_{i=1}^{n} |x_i| \, |x_i + y_i|^{p-1} + \sum_{i=1}^{n} |y_i| \, |x_i + y_i|^{p-1} \tag{B.7} \\
&= \sum_{i=1}^{n} |x_i| \, |x_i + y_i|^{p/q} + \sum_{i=1}^{n} |y_i| \, |x_i + y_i|^{p/q} \tag{B.8} \\
&\le \|x\|_p \Bigl(\sum_{i=1}^{n} |x_i + y_i|^p\Bigr)^{1/q} + \|y\|_p \Bigl(\sum_{i=1}^{n} |x_i + y_i|^p\Bigr)^{1/q} \tag{B.9} \\
&= (\|x\|_p + \|y\|_p) \, \|x + y\|_p^{p/q} \tag{B.10}
\end{align*}

where the last inequality is the result of applying Hölder’s inequality. If x + y = 0 there is nothing to prove; otherwise, dividing both sides by ‖x + y‖_p^{p/q} we have, finally,

\[ \|x + y\|_p^{\,p - (p/q)} \le \|x\|_p + \|y\|_p , \]

and Minkowski’s inequality follows from the fact that p − p/q = p(1 − 1/q) = p(1/p) = 1.
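Both inequalities are easy to test on random data. The sketch below draws random vectors and exponents p > 1, computes the conjugate index q = p/(p − 1), and checks Hölder and Minkowski up to a small floating-point tolerance. The sampling ranges, the tolerance, and the function names are arbitrary choices for illustration; a random test is a sanity check, not a proof.

```python
import random


def p_norm(x, p):
    """The l_p norm for 1 <= p < infinity."""
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)


def check_inequalities(trials=1000, dim=5, seed=0):
    """Return True if Hölder and Minkowski hold on every random trial."""
    rng = random.Random(seed)
    tol = 1e-9
    for _ in range(trials):
        p = rng.uniform(1.1, 6.0)
        q = p / (p - 1.0)                      # conjugate index: 1/p + 1/q = 1
        x = [rng.uniform(-10, 10) for _ in range(dim)]
        y = [rng.uniform(-10, 10) for _ in range(dim)]

        holder_lhs = sum(abs(a * b) for a, b in zip(x, y))
        holder_rhs = p_norm(x, p) * p_norm(y, q)
        if holder_lhs > holder_rhs * (1 + tol):
            return False

        s = [a + b for a, b in zip(x, y)]
        if p_norm(s, p) > (p_norm(x, p) + p_norm(y, p)) * (1 + tol):
            return False
    return True


print(check_inequalities())   # True
```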

B.3 Subsets of Rn

Basic properties of sets in Rn and related notions of convergence and continuity depend

on the notion of neighborhoods and of open sets, which, in turn, depend on the norm

imposed on the space. These notions are direct generalizations of the notions of open

and closed intervals on the real line R. This section is devoted to an explanation of these

basic ideas.


B.3.1 Basic Definitions

We start with a tentative definition of neighborhood. The idea of neighborhood will be

expanded later.

By a δ-neighborhood of a point xo ∈ Rn is meant the set

Bδ(xo) := {x ∈ Rn | ‖x− xo‖ < δ} ,

which is also referred to as the open ball of radius δ centered at xo. Let S ⊂ Rⁿ. Then a point xo ∈ S is said to be an interior point of S provided there is a δ-neighborhood of xo contained entirely in S. A point xo is said to be an accumulation point or a limit point of S if every δ-neighborhood of xo contains a point x ≠ xo with x ∈ S. Note that a limit point of a set need not be in the set (take, e.g., S to be the open unit ball centered at the origin, namely B₁(0) = {x ∈ Rⁿ | ‖x‖ < 1}; then any unit vector is an accumulation point of S and yet is not in S itself). A point xo is an isolated point of S if xo is in S but is not an accumulation point of S. For example, if the set is {x ∈ R | x = −1 or 0 ≤ x ≤ 1} then the point x = −1 is an isolated point of the set.

A point xo is called a boundary point of S if every δ-neighborhood of xo contains points in S and points not in S. For the set B₁(0), all the vectors x with ‖x‖ = 1 are boundary points of the unit ball. This set of boundary points is usually called the unit sphere⁷. The boundary of a set A is just the set of all its boundary points, which we will write bd(A). So, for example, if D is the closed unit ball in R² centered at the origin, then the set of points on the unit circle constitutes the boundary, since every neighborhood of every point on the unit circle meets both the interior of D and R² \ D. Finally, a point xo is an exterior point of S provided there is a δ-neighborhood of xo which contains no points of S.

Of all the subsets S ⊂ Rn we distinguish several with particular properties.

Definition B.3.1 A set S ⊂ Rn is said to be

(a) open provided all its points are interior points of S,

(b) closed provided it contains all its limit points,

(c) bounded provided it is contained in some ball {x ∈ Rn | ‖x‖ < r}, where 0 < r <∞,

⁷ It is usually understood that the term unit sphere is reserved for the set of all x with ‖x‖ = 1. It is called a manifold or surface in Rⁿ and, as such, is a set of dimension n − 1.


(d) compact provided S is both closed and bounded⁸.

Note that these definitions imply that the entire space Rⁿ as well as the empty set, ∅, are both open and closed. Moreover, it is easy to check, using the definitions, that the union of an arbitrary number of open sets is open, while the intersection of any collection of closed sets is closed.


Here we must be careful! If we interchange unions and intersections in the state-

ments above, the results are false unless we restrict ourselves to finitely many sets. So we

can say that the intersection of finitely many open sets is open while the union of finitely

many closed sets is closed. Let us check this first statement.

Suppose that $\mathcal{G} = \{G_i\}_{i=1}^{k}$ is a finite family of open sets. Then the set $G = \bigcap_{i=1}^{k} G_i$ is
in fact open. If the intersection G is empty, for instance because Gi = ∅ for some i or because the
family is pairwise disjoint, then G is open since the empty
set is open. We may suppose, therefore, that G ≠ ∅.

In this case let x ∈ G. Then for each i = 1, 2, . . . , k, x ∈ Gi and there is an εi > 0 for
which Bεi(x) ⊂ Gi. Let $\varepsilon = \min_{1 \le i \le k} \varepsilon_i$. Then ε > 0, Bε(x) ⊂ Gi for all i, and so Bε(x) ⊂ ∩Gi = G.

So every point of G is the center of some ball completely contained in G. Hence G is

open.
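The choice of ε in this argument can be tried out on the real line. The following is only a minimal sketch, assuming each open set is an open interval given as a pair (lo, hi); the function name `ball_radius_in_intersection` is hypothetical and introduced purely for illustration.

```python
def ball_radius_in_intersection(x, intervals):
    """Return eps > 0 such that (x - eps, x + eps) lies in every open interval.

    `intervals` is a finite, non-empty list of open intervals (lo, hi),
    each of which is assumed to contain x.
    """
    radii = []
    for lo, hi in intervals:
        if not (lo < x < hi):
            raise ValueError("x must lie in every interval")
        # Largest radius that keeps the ball inside this one interval.
        radii.append(min(x - lo, hi - x))
    # The minimum of finitely many positive numbers is again positive.
    return min(radii)


# G_i = (-1/i, 1/i) for i = 1, ..., 5, and the point x = 0.1.
intervals = [(-1.0 / i, 1.0 / i) for i in range(1, 6)]
eps = ball_radius_in_intersection(0.1, intervals)
```

At the point x = 0 the same function returns 1/k for the first k intervals, a radius that shrinks to zero as k grows. This is the numerical shadow of the fact that the minimum is only guaranteed to be positive for finitely many sets.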

It is easy to give an example to show that if infinitely many sets are allowed, then

G may well not be open. Simply take Gi to be the open interval (−1/i, 1/i). Then

∩Gi = {0}, a singleton, which is closed but not open.

At this point, let us take a short, but important, detour. Above, when we talked

about norms, we gave examples of norms, other than the Euclidean norm, that can be

imposed on Rn (see B.2.6). Now we have discussed what we mean by open sets, closed

sets, accumulation points, etc., all defined in terms of neighborhoods that are, themselves,
defined in terms of the Euclidean norm. Moreover, we shall presently discuss convergence
of sequences, again in terms of the Euclidean, or ℓ2-norm of B.2.6. Here is an interesting,
and important, question: if we use a norm different from the ℓ2-norm, how, if at all, do

⁸ The definition of the term “compact” given here is specific to Rn. In more general contexts, another
definition is used and it becomes a theorem to be proved that a set in Rn is compact in this more general
sense provided it is closed and bounded.


these things change? The answer is that they do not change. This means, for example,

that open sets defined in terms of balls with respect to one norm are also open when

we use balls defined with one of the other norms, and convergence of a sequence in one

norm implies convergence with respect to all the other norms. The implication is, of

course, that we may use all of these ideas, choosing whatever norm is convenient to a

given situation.

Why is this the case? We first introduce a definition.

Definition B.3.2 Suppose that ‖ · ‖α and ‖ · ‖β are two norms on Rn. Then these
norms are said to be equivalent norms provided there are constants K1 > 0 and K2 > 0 such that
K1‖x‖α ≤ ‖x‖β ≤ K2‖x‖α for all x ∈ Rn.

It is clear, from this definition, that every ball in the α-norm contains a ball in the

β-norm and vice versa. So, for example, the interior points of a set can be described in

terms of either norm. And from this it follows that open sets and closed sets with respect

to one norm are open or closed with respect to the other. As for the norms that we

introduced earlier, they are equivalent norms.

Proposition B.3.3 Let 1 ≤ p ≤ ∞. Then the ℓp-norms given in B.2.6 are equivalent.
Indeed, we have the following inequalities:

(1) ‖x‖2 ≤ ‖x‖1 ≤ √n ‖x‖2;

(2) ‖x‖∞ ≤ ‖x‖2 ≤ √n ‖x‖∞;

(3) ‖x‖∞ ≤ ‖x‖1 ≤ n ‖x‖∞.

We leave the proof as an exercise.
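Before attempting the proof, it can be reassuring to test the three inequalities numerically. The sketch below is only a sanity check on randomly sampled vectors, not a proof; the helper names (`norm1`, `norm2`, `norm_inf`, `check_inequalities`) and the tolerance `tol` are assumptions made for this illustration.

```python
import math
import random


def norm1(x):
    return sum(abs(t) for t in x)


def norm2(x):
    return math.sqrt(sum(t * t for t in x))


def norm_inf(x):
    return max(abs(t) for t in x)


def check_inequalities(x, tol=1e-9):
    """Check inequalities (1)-(3) for one vector, up to rounding error tol."""
    n = len(x)
    a, b, c = norm1(x), norm2(x), norm_inf(x)
    return (
        b <= a + tol and a <= math.sqrt(n) * b + tol   # (1)
        and c <= b + tol and b <= math.sqrt(n) * c + tol   # (2)
        and c <= a + tol and a <= n * c + tol   # (3)
    )


random.seed(0)  # fixed seed so the sample is reproducible
samples = [[random.uniform(-10, 10) for _ in range(5)] for _ in range(1000)]
all_hold = all(check_inequalities(x) for x in samples)
```

The vector (1, 1, 1, 1) attains equality in the right-hand sides of all three inequalities, and a coordinate vector such as (1, 0, 0) attains equality in the left-hand sides, so the constants cannot be improved.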

We mention that much more is true. In fact, all norms on Rn are equivalent. The

proof requires the notion of compactness and we postpone it. You will find the result in

Proposition B.4.7.

Now, returning to the main thread of our discussion we can start with a set A that is

not closed, and take its union with the set of all of its accumulation points. The resulting

set is obviously closed. This new set is called the closure of the set A and we will write it
cℓ(A). Thus, if we take ℓp(A) to be the set of limit points of A, then A ∪ ℓp(A) = cℓ(A).
Intuitively, the closure of A is the set A itself together with all points arbitrarily close to
A. The basic example is that the closure of the set B1(0) = {x ∈ Rn | ‖x‖ < 1} is the set
cℓ[B1(0)] = {x ∈ Rn | ‖x‖ ≤ 1}. Note that, for any set A ⊂ Rn, cℓ(A) ⊃ A. We leave

the proof of the following facts to the reader.

Proposition B.3.4 Let A ⊂ Rn. Then

(a) A is closed if and only if A = cℓ(A).

(b) If F is a closed set such that F ⊃ A, then F ⊃ cℓ(A).

(c) If $\mathcal{F}$ denotes the family of all closed sets containing A, then $\bigcap_{F \in \mathcal{F}} F$ = cℓ(A).

To complete the classification of points related to a set, we introduce the concept of

boundary point.

Definition B.3.5 Given a set A ⊂ Rn, a point x ∈ Rn is called a boundary point of A

provided that every open ball Bε(x) intersects both A and Rn \A. The boundary of the set

A is the set of all boundary points of A.

In what follows, we will denote the boundary of a set A by bd(A). So, for example, the

closed unit disk centered at the origin in R2 has the unit circle as its boundary since every

neighborhood of every point of the unit circle meets both the interior and the exterior of

the unit disk.

Again, we have some simple results whose proofs we leave to the reader.

Proposition B.3.6 Let A ⊂ Rn. Then

(a) bd(A) = cℓ(A) ∩ cℓ(Rn \ A).

(b) The set bd(A) is a closed set.

(c) The set A is closed if and only if it contains its boundary.

The next result characterizes closed sets in terms of open sets. The proof is worth

studying since it will give good practice in handling complements, closed sets and open

sets.

Proposition B.3.7 A set F ⊂ Rn is closed if and only if its complement Rn \F is open.


Proof: Suppose, first, that the set F is closed. Then we show that O = Rn \ F is open.

If O is empty, then it is open and so we may assume that O ≠ ∅. Take an arbitrary point
x ∈ O. Since F is closed and hence contains all its limit points, x ∉ ℓp(F). So there
exists an open ball Bε(x) such that Bε(x) ∩ F = ∅. Hence, about any point x ∈ O we
can find an open ball, centered at x, completely contained in O. Hence, O is an open set.

Conversely, suppose that the set O is open, and let x ∈ ℓp(F). Then x ∈ F, since each
point of O is the center of an open ball that does not meet F and such a point cannot be
a limit point of F by definition. Hence F contains all its limit points and so is closed.

B.3.2 Suprema and Infima

We now turn to the case of sets in R and the notions of greatest lower bound or infimum

and least upper bound or supremum of a set of real numbers. Before proceeding, the reader

may wish to review Definition A.7.8 and the material immediately following that

definition.

A set S ⊂ R is said to be bounded above provided there is a constant M such that
x ≤ M for all x ∈ S. Such a number M is called an upper bound. It is a property of
the real numbers with the usual ordering that every non-empty set which is bounded above has a
least upper bound, that is, there exists a number s such that x ≤ s for all x ∈ S and, if
M is an upper bound for S then s ≤ M. In this case we write s := sup(S). If S is not
bounded above, then we write sup(S) = ∞. Likewise, if S is non-empty and bounded below, then there
exists a greatest lower bound or infimum, that is, a number i such that i is itself a lower
bound and, if m ≤ x for all x ∈ S then m ≤ i. We write i := inf(S). If S is not bounded
below, we write inf(S) = −∞. Moreover, we will adopt the convention that sup(∅) = −∞ and
inf(∅) = ∞. The numbers s and i may, or may not, belong to the set S. In the case that
they do, we write s = max(S) and i = min(S). In fact, we have the following result.

Proposition B.3.8 Let S be a non-empty subset of R which is bounded above. If s =
sup(S) then s ∈ cℓ(S). Hence s ∈ S if S is closed.

Proof: If s ∈ S then s ∈ cℓ(S). On the other hand, if s ∉ S then, for every ε > 0 there
is a point t ∈ S such that s − ε < t < s, for otherwise s − ε would be an upper bound for
S. Thus s is a limit point of S, hence s ∈ cℓ(S).

Before going more deeply into these ideas, we are going to use these basic definitions

to show that lying between any two real numbers there is both a rational number and


an irrational number. We start with a very simple fact that follows directly from the

definition of least upper bound.

Lemma B.3.9 Suppose that S ⊂ R is non-empty and bounded above and let x be any upper bound for
S. Then the following statements are equivalent:

(a) x = sup(S).

(b) For any ε > 0, S ∩ (x − ε, x] ≠ ∅.

Proof: To see that (a) implies (b), suppose not. Then there exists an ε > 0 with the
property that S ∩ (x − ε, x] = ∅. Since x is an upper bound for S, it follows that x − ε is an upper bound less than x, a contradiction
to the choice of x.

To see that (b) implies (a), suppose that (b) is true but that (a) is false, i.e., that x is
not the least upper bound for S. Then there exists a z < x such that y ≤ z for all y ∈ S.
Let εo = x − z > 0. Then, clearly, S ∩ (x − εo, x] = ∅ and (b) does not hold.

The next proposition is usually known as the Archimedean Property of the real numbers.
It says that the set of natural numbers has no upper bound in R and, consequently, that there is no
smallest positive real number. It is an essential fact necessary to actually prove that the
sequence $\{1/n\}_{n=1}^{\infty}$ converges to zero.

Proposition B.3.10 For any x ∈ R there exists an n ∈ N such that x < n.

Proof: Assume, on the contrary, that there exists an x ∈ R such that, for all n ∈ N, n ≤ x.
This means that the set N is bounded above and hence has a least upper bound, call it
y. Since y is the least upper bound, there must be an integer n in the interval (y − 1/2, y],
which exists by the previous lemma. But then y − 1/2 < n ≤ y, from which we deduce
by adding 1 that y + 1/2 < n + 1. But n + 1 is an integer and so y cannot be an upper
bound for N.

Remark: Note that a similar argument, using greatest lower bounds will show that

there cannot be a smallest integer.

Corollary B.3.11 The sequence $\{1/n\}_{n=1}^{\infty}$ converges to 0.

Proof: Let ε > 0 be given and consider the open ball Bε(0) ⊂ R. Choose any integer
N > 1/ε, whose existence is guaranteed by the proposition. Then for all n > N, we have
$$0 < \frac{1}{n} < \frac{1}{N} < \varepsilon ,$$
which implies that 1/n ∈ Bε(0) whenever n > N.
Hence every ball of the given form contains all but finitely many elements of the sequence,
which is exactly the statement that the sequence converges to 0.

These types of arguments can now be used to show that between any two real numbers

there is a rational.

Proposition B.3.12 Let a, b ∈ R with a < b. Then there is a rational number q with

a < q < b.

Proof: Choose an integer N such that N > 1/(b − a), or equivalently, 1/N < b − a.
Consider the subset $\mathcal{Q} \subset \mathbb{Q}$ given by
$$\mathcal{Q} = \left\{ \frac{m}{N} \;\Big|\; m \in \mathbb{Z} \right\}.$$
Then $\mathcal{Q} \cap (a, b) \neq \emptyset$. Indeed, there is a largest integer m such that m/N ≤ a;⁹ then (m + 1)/N > a.
If (m + 1)/N < b then it is a rational number between a and b and we are done by
construction. Hence we must assume that (m + 1)/N ≥ b. But then
$$b - a \le \frac{m+1}{N} - \frac{m}{N} = \frac{1}{N} < b - a ,$$
which is impossible. Hence $\mathcal{Q} \cap (a, b) \neq \emptyset$.
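The construction in this proof is completely explicit, and it can be carried out in exact arithmetic. The following is a minimal sketch, assuming the endpoints a and b are given as rationals or floats (a computer cannot hold an arbitrary real number); the function name `rational_between` is hypothetical.

```python
from fractions import Fraction
import math


def rational_between(a, b):
    """Return a Fraction q with a < q < b, mimicking the proof's construction."""
    a, b = Fraction(a), Fraction(b)  # exact values of the given endpoints
    if not a < b:
        raise ValueError("need a < b")
    # Archimedean step: an integer N with N > 1/(b - a), i.e. 1/N < b - a.
    N = math.floor(1 / (b - a)) + 1
    # m is the largest integer with m/N <= a, so (m + 1)/N > a.
    m = math.floor(a * N)
    # Since (m + 1)/N <= a + 1/N < b, this fraction lies strictly between a and b.
    return Fraction(m + 1, N)


q = rational_between(Fraction(1, 3), Fraction(1, 2))
```

For a = 1/3 and b = 1/2 the sketch takes N = 7 and m = 2, and so returns 3/7.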

In order to show that every interval contains an irrational number as well as a rational
one, we need only know that irrational numbers exist. There are plenty of choices, but
the traditional one is, of course, √2. In case the reader has never seen the proof that this
number is irrational, we give the traditional proof that already appeared in the book of
Euclid.

Proof: Suppose the contrary, that √2 is rational and hence can be written as a ratio of integers p/q where
p and q have no common factors. Then 2 = p²/q² and so p² = 2q², showing that p² must be divisible by
2 and hence by 4, since every prime factor of p² occurs an even number of times. Write p² = 4r². Then we have 4r² = 2q², so that
q² = 2r² is also divisible by 2, from which it follows that q is divisible by 2. Hence p and q have the factor 2 in
common, a contradiction.

We now have

Proposition B.3.13 If a, b ∈ R with a < b then there is an irrational number t with

a < t < b.

⁹ This follows from the remark above that there can be no smallest integer.


Proof: The interval (a/√2, b/√2) contains a rational number, call it q. If a < 0 < b,
choose a rational number from the interval (a/√2, 0) instead, so that q ≠ 0. Then √2 q ∈ (a, b) in either
case, and this is the required irrational number. In fact, were this number to be rational,
then √2 q = p, p rational, and hence √2 = p/q, which would mean that √2 is rational,
which it is not.

At this juncture it is useful to introduce a definition.

Definition B.3.14 Let X be a subset of Rn with the usual metric and let E ⊂ X. Then

E is said to be dense in X provided every point of X is an accumulation point of E or a

point of E or both.

We have just shown that both the set of rational numbers and the set of irrational

numbers are dense in R. In other words, given any point in R, any ball around that point

contains a rational and an irrational number. So that point is an accumulation point of

both the rationals and the irrationals.

It is not hard to see that points with rational coordinates in Rn are dense. This follows

by using the fact that balls in the max-norm, ℓ∞, are just “rectangles” whose sides are
determined by intervals of the form ai ≤ xi ≤ bi, so that a point with rational coordinates
can be found in any such neighborhood, and then recalling that the max-norm is equivalent to
the ℓ2-norm.

Now, let $\{x_k\}_{k=1}^{\infty}$ be a sequence of real numbers and let
$$y_m := \sup\{x_k \mid k \ge m\} , \quad \text{and} \quad z_m := \inf\{x_k \mid k \ge m\} .$$
Clearly, the sequence $\{y_m\}_{m=1}^{\infty}$ is nonincreasing while the sequence $\{z_m\}_{m=1}^{\infty}$ is nondecreasing.
It is a basic fact of real analysis that every bounded monotonically nonincreasing
or nondecreasing sequence converges. Here, if the original sequence is bounded above,
then the sequence $\{y_m\}_{m=1}^{\infty}$ converges (possibly to −∞), while if it is bounded below, the sequence $\{z_m\}_{m=1}^{\infty}$
converges (possibly to +∞). In the first case, the limit of the sequence $\{y_m\}_{m=1}^{\infty}$ is written $\limsup_{k\to\infty} x_k$, while,
in the second case, we write $\liminf_{k\to\infty} x_k$ for the limit of the sequence $\{z_m\}_{m=1}^{\infty}$. If the sequence $\{x_k\}_{k=1}^{\infty}$ is not bounded above, we
write $\limsup_{k\to\infty} x_k = \infty$ and if it is not bounded below, we write $\liminf_{k\to\infty} x_k = -\infty$.

We give some simple examples.


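(a) Let $x_k := (-1)^k + \frac{1}{k}$, k = 1, 2, . . .. So the sequence looks like $\{0, \tfrac{3}{2}, -\tfrac{2}{3}, \tfrac{5}{4}, -\tfrac{4}{5}, \cdots\}$.
Then, since the even-indexed terms $1 + \frac{1}{k}$ decrease and exceed all the odd-indexed terms,
$$y_m = \sup\{x_k \mid k \ge m\} = \max\{x_m, x_{m+1}\} = \begin{cases} \dfrac{m+1}{m} & \text{if } m \text{ is even} \\[1ex] \dfrac{m+2}{m+1} & \text{if } m \text{ is odd,} \end{cases}$$
so that $\lim_{m\to\infty} y_m = 1$, or $\limsup_{k\to\infty} x_k = 1$. Likewise, the odd-indexed terms $x_k = -1 + \frac{1}{k}$ decrease to −1 without ever reaching it, so that
$$z_m = \inf\{x_k \mid k \ge m\} = -1 \quad \text{for every } m ,$$
and hence $\lim_{m\to\infty} z_m = -1$, or $\liminf_{k\to\infty} x_k = -1$.

(b) Let $x_k = k^2 \sin^2\left(\tfrac{1}{2} k \pi\right)$. Then $x_k \ge 0$ and, for each k even, $x_k = 0$ while, for each k
odd, $x_k = k^2$. Hence, $\liminf_{k\to\infty} x_k = 0$ while $\limsup_{k\to\infty} x_k = \infty$.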
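The tail suprema and infima of example (a) can be approximated numerically. The sketch below is an illustration only: a finite window of terms stands in for the infinite tail, so the computed minimum merely approximates the infimum of the full tail, and the names `x`, `tail_sup_inf` and `horizon` are assumptions of this example.

```python
def x(k):
    """Example (a): x_k = (-1)^k + 1/k."""
    return (-1) ** k + 1.0 / k


def tail_sup_inf(seq, m, horizon):
    """Approximate sup and inf of {seq(k) : k >= m} using the terms m..horizon."""
    tail = [seq(k) for k in range(m, horizon + 1)]
    return max(tail), min(tail)


horizon = 10_000
for m in (1, 10, 100, 1000):
    y_m, z_m = tail_sup_inf(x, m, horizon)
    # y_m decreases towards 1; z_m stays close to -1 for every m.
    print(m, y_m, z_m)
```

The supremum of each tail is attained at its first even index, so the finite window reproduces it exactly. The infimum −1 is never attained, and the window only gets within 1/horizon of it.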

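There are certain basic facts about how these limits behave. Here are a few facts: Let
$\{x_k\}_{k=1}^{\infty}$ and $\{y_k\}_{k=1}^{\infty}$ be sequences in R. Then

(i) for every m,
$$\inf\{x_k \mid k \ge m\} \le \liminf_{k\to\infty} x_k \le \limsup_{k\to\infty} x_k \le \sup\{x_k \mid k \ge m\} ,$$

(ii) $\{x_k\}_{k=1}^{\infty}$ converges if and only if $-\infty < \liminf_{k\to\infty} x_k = \limsup_{k\to\infty} x_k < \infty$, in which case
$$\lim_{k\to\infty} x_k = \liminf_{k\to\infty} x_k = \limsup_{k\to\infty} x_k < \infty .$$

(iii) If $x_k \le y_k$ for all k = 1, 2, · · · , then
$$\liminf_{k\to\infty} x_k \le \liminf_{k\to\infty} y_k \quad \text{and} \quad \limsup_{k\to\infty} x_k \le \limsup_{k\to\infty} y_k .$$

(iv)
$$\liminf_{k\to\infty} x_k + \liminf_{k\to\infty} y_k \le \liminf_{k\to\infty} (x_k + y_k) ,$$
and
$$\limsup_{k\to\infty} x_k + \limsup_{k\to\infty} y_k \ge \limsup_{k\to\infty} (x_k + y_k) .$$

We will also use the notation $\overline{\lim}_{k\to\infty} x_k$ for $\limsup_{k\to\infty} x_k$ and $\underline{\lim}_{k\to\infty} x_k$ for $\liminf_{k\to\infty} x_k$.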

B.3.3 Connected Sets

Another useful property of a subset is that it be connected. Roughly speaking, this means

that the set is “in one piece”. We will need a preliminary definition.

Definition B.3.15 Two subsets A and B of Rn are said to be separated if A ∩ cℓ(B) and
cℓ(A) ∩ B are both empty.

Otherwise said, no point of A is in the closure of B and vice versa. With this definition

in hand, we can give a precise definition of a connected set in Rn.

Definition B.3.16 A set A ⊂ Rn is said to be connected provided it cannot be written as

the union of two non-empty separated sets.

We will have more to say about connected sets later. For now, the basic fact that

we want to establish is that the open and closed intervals of the real line are connected

sets. A little later, we will see how this fact is important in establishing the familiar

intermediate value theorem. In the proof of the next result we will use the notation

(−∞, u) = {x ∈ R |x < u} and (u,∞) = {x ∈ R |x > u}.

Proposition B.3.17 A subset I of R is connected if and only if it has the following property:
If a, b ∈ I and a < x < b then x ∈ I.

Proof: If a < x < b and x ∉ I then the sets A = I ∩ (−∞, x) and B = I ∩ (x, ∞) are
non-empty since a ∈ A and b ∈ B. They are separated since A ⊂ (−∞, x) and B ⊂ (x, ∞).
Moreover I = A ∪ B. So I is not connected.

Conversely, suppose I is not connected. Then there are non-empty separated sets A
and B such that A ∪ B = I. Choose a ∈ A and b ∈ B and assume, without loss of
generality, that a < b. Define x = sup(A ∩ [a, b]). Then by B.3.8 x ∈ cℓ(A) and hence
x ∉ B. In particular a ≤ x < b. If x ∉ A it follows that a < x < b and so x ∉ I. If x ∈ A
then x ∉ cℓ(B) and so there is an x1 such that x < x1 < b and x1 ∉ B. Since x1 > x, also x1 ∉ A. Then a < x1 < b and
x1 ∉ I.

Finally we prove a result about arbitrary unions of connected sets in Rn.


Proposition B.3.18 Let $\{S_i\}_{i \in I}$ be a family of connected subsets of Rn and suppose that, for
some index io, Sio ∩ Si ≠ ∅ for all i ∈ I. Then $S = \bigcup_{i \in I} S_i$ is connected.

Proof: Suppose that the union S is not connected. Then there exist non-empty separated sets A
and B such that A ∪ B = S. We first show that, for every index i, either A ∩ Si = Si or
A ∩ Si = ∅. To see this, note that A ∩ cℓ(B) = ∅ implies that

(A ∩ Si) ∩ cℓ(B ∩ Si) ⊂ A ∩ cℓ(B) = ∅.

Similarly cℓ(A ∩ Si) ∩ (B ∩ Si) = ∅. Thus A ∩ Si and B ∩ Si are separated sets whose union is Si.

Now Si is connected so either Si ∩ A = ∅ or Si ∩ A = Si. Likewise, we have that either
Si ∩ B = ∅ or Si ∩ B = Si. But neither A nor B is empty, so there must be some indices
m and n such that A ∩ Sm = Sm and B ∩ Sn = Sn. By hypothesis, the connected set
Sio meets each Si and so, for the index m, Sio ∩ Sm ≠ ∅. Hence A ∩ Sio ≠ ∅ and therefore
A ∩ Sio = Sio.

In a completely similar manner, we show that B ∩ Sio = Sio. But then A ∩ B ≠ ∅ and
we have a contradiction.

B.3.4 The Bolzano-Weierstrass Theorem

One basic result, which can be found in any advanced calculus text (see, for example [2]),

is the Bolzano-Weierstrass Theorem which says that

Theorem B.3.19 (Bolzano-Weierstrass) Every bounded infinite set in Rn has at least

one accumulation point.

We will make free use of this theorem in this book. For the sake of completeness, we
include here a proof in the case of the real line, R, with the usual notions of open and
closed sets. The idea of the proof (akin, according to Boas, to the process of finding a lion
in the Sahara Desert) can be easily generalized to Rn by using n-dimensional intervals¹⁰.

Proof: Let S ⊂ R be bounded and contain infinitely many points. Then S lies in some interval of the form
[−a, a]. Then at least one of the intervals [−a, 0] and [0, a] contains infinitely many points of S. Choose one
that does and call it I1 = [a1, b1]. Bisect this interval, to obtain a smaller interval, I2 = [a2, b2], containing
infinitely many points of the original set S.

¹⁰ In Rn we can define an interval to be a set consisting of points x = (x1, x2, . . . , xn) such that
ai ≤ xi ≤ bi , i = 1, . . . , n. Or any set with < replacing ≤. Moreover, we do not rule out that for some
i, ai = bi; in particular ∅ is an interval.


Now according to this construction b1 − a1 = a and b2 − a2 = a/2. Again, construct an interval I3 = [a3, b3]
by bisection, so that I3 contains infinitely many points of S. Then the length of I3 is |I3| = |I2|/2 =
a/2². Continuing in this manner we construct, at the nth step, the interval In with length $a/2^{n-1}$. Hence
$\lim_{n\to\infty} |I_n| = 0$ and so, since the an are nondecreasing and the bn are nonincreasing, the endpoints an and bn converge to a common point xo. This latter point is the required
accumulation point since, if ε > 0 is arbitrary, Bε(xo) ⊃ [an, bn] for n sufficiently large, specifically provided
bn − an < ε/2. In this case, Bε(xo) contains points of the original set S other than xo (in fact infinitely
many).
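The bisection in this proof can be imitated on a computer, with one important caveat: a program can only hold finitely many points, so "contains infinitely many points of S" has to be replaced by a stand-in. The sketch below keeps, at each step, the half holding more of a finite sample; this substitution, together with the function name `bisect_towards_accumulation`, is an assumption of the illustration and not part of the proof.

```python
def bisect_towards_accumulation(points, a, steps):
    """Repeatedly bisect [-a, a], keeping the half that holds more sample points."""
    lo, hi = -a, a
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        left = [p for p in points if lo <= p <= mid]
        right = [p for p in points if mid <= p <= hi]
        # Finite stand-in for "contains infinitely many points of S".
        if len(left) >= len(right):
            hi, points = mid, left
        else:
            lo, points = mid, right
    return lo, hi


# A finite sample of S = {1/k : k = 1, 2, ...}, whose accumulation point is 0.
sample = [1.0 / k for k in range(1, 100_001)]
lo, hi = bisect_towards_accumulation(sample, 1.0, 10)
```

After ten steps the surviving interval has length 2/2¹⁰ and still contains 0, the accumulation point of the full set, even though 0 itself does not belong to S.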

A related and quite useful result due to Cantor can be proved using the Bolzano-

Weierstrass Theorem. We state the theorem in Rn.

Theorem B.3.20 Let {S1, S2, . . .} be a sequence of closed, non-empty sets in Rn, nested
in the sense that Sk+1 ⊂ Sk, and assume that S1 is a bounded set. Then
$$S = \bigcap_{k=1}^{\infty} S_k \neq \emptyset ,$$
and the intersection S is closed.

Proof: Since each Sk is closed, their intersection, S, is closed as well. The goal is then to show that S ≠ ∅.
First observe that if any one of the sets has only finitely many points, all the subsequent ones do as well and the
existence of a common point is obvious. So, we assume that all the Sk have infinitely many points. Now,
let P = {x1, x2, . . .} where the xk ∈ Sk are chosen to be distinct. Since S1 is bounded by hypothesis, so is the set P, and hence, by
the Bolzano-Weierstrass Theorem, this set P has an accumulation point, say xo.

Now, let ε > 0 be given and consider the neighborhood Bε(xo). Then, if P(k) = {xk, xk+1, . . .}, on the one
hand P(k) ⊂ Sk while, on the other hand, every Bε(xo) contains infinitely many points of P and hence points of P(k) other than xo.
Thus xo is a limit point of Sk and, since Sk is closed, xo ∈ Sk. As k was arbitrary, xo ∈ S and so S ≠ ∅.

B.3.5 Convergence

It is convenient to extend the notion of neighborhood beyond that of the δ-neighborhood
introduced earlier. We will subsequently use the term neighborhood of the point xo to

mean any open set that contains the point xo. Similarly, we will find it useful to define a

neighborhood of a set S as any open set N containing S. In particular, a δ-neighborhood

of a set S, denoted by [S]δ, is the set of points in Rn each of which lies in a δ-neighborhood

of some point of the set S. In other words, if Bδ(x) denotes such a neighborhood of the

point x, then

$$[S]_\delta := \bigcup_{x \in S} B_\delta(x) .$$

As a simple example, we can see that if S is the closed unit disk in R2, then
$$[S]_{1/2} = \{x \in \mathbb{R}^2 \mid \|x\| < \tfrac{3}{2}\} .$$

The ℓ2-norm induces the usual Euclidean distance or Euclidean metric between points of
Rn via
$$d(x, y) = \|x - y\| = \left( \sum_{i=1}^{n} |x_i - y_i|^2 \right)^{1/2} .$$

Other norms induce other distance functions or metrics. Here is a definition of this

term. We state it for a general case, but remind ourselves that most of our work is in
Rn. We do this because we are often working on a subset of Rn and we need the idea of

a metric on the set.

Definition B.3.21 Let X be a non-empty set. Then a metric on X is a real-valued
function d : X × X → R which satisfies the following conditions:

1. d(x, y) ≥ 0, and d(x, y) = 0 if and only if x = y.

2. d(x, y) = d(y, x) (symmetry).

3. d(x, y) ≤ d(x, z) + d(z, y) (the triangle inequality).

Here are some examples.

Examples B.3.22

Example 1. Let X be any non-empty set. Define d by
$$d(x, y) = \begin{cases} 0 & \text{if } x = y \\ 1 & \text{if } x \neq y . \end{cases}$$
We leave it to the reader to check that this does, indeed, define a metric.

Example 2. Let X be any normed space. Then d(x, y) = ‖x − y‖ defines a metric
as is easily seen using the properties of a norm. So our usual Rn with the ℓ2-norm is a
metric space with the metric induced in this way by the ℓ2-norm. But then, so is the
same space, together with the distance functions defined by any of the ℓp-norms that we
have discussed. In the case of the ℓ1-norm, the distance is sometimes called the taxicab

distance.

Example 3. The notion of Hamming distance occurs in coding theory, in particular in the

discussion of error detection and the construction of error correcting codes. Here, we will


consider the set X to be the set of all strings of the symbols 0 and 1 of length k. Given

two such strings, the Hamming distance dH(x, y) is just the number of corresponding

entries that are different. Thus, in the case that k = 3 we have dH(111, 000) = 3 while

dH(110, 100) = 1.

To actually check that dH defines a metric, we need only check that dH has the appro-

priate properties.

It is clear that dH(x, y) ≥ 0 and dH(x, y) = 0 if and only if the components of x and y

do not differ at all, that is, if and only if x = y. Moreover, the order in which we check

the differences is irrelevant. Hence dH(x, y) = dH(y, x). To check the triangle inequality,
suppose that we are given three strings x, y, z of length k with dH(x, z) = a and that, among the a positions in which x and z differ, there are b
positions where the elements of y are the same as those of x but not the same as those
of z (so that in the remaining a − b of these positions y matches z but not x). Further, suppose that in the k − a positions in which x matches z there are c positions in
which the elements of y do not match either x or z. Then

dH(y, z) = b + c and dH(x, y) = a − b + c.

Then, adding these two distances,

dH(x, y) + dH(y, z) = (a − b + c) + (b + c) = a + 2c ≥ a = dH(x, z).
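For strings of a small fixed length, the three metric axioms can also be confirmed by brute force. The following is a minimal sketch for k = 3; the function names `hamming` and `is_metric_on` are hypothetical, and the exhaustive check is feasible only because there are just eight strings.

```python
from itertools import product


def hamming(x, y):
    """Number of positions in which two equal-length strings differ."""
    if len(x) != len(y):
        raise ValueError("strings must have the same length")
    return sum(1 for s, t in zip(x, y) if s != t)


def is_metric_on(points, d):
    """Check the three metric axioms for d on a finite collection of points."""
    for x in points:
        for y in points:
            # Non-negativity, and d(x, y) = 0 exactly when x = y.
            if d(x, y) < 0 or (d(x, y) == 0) != (x == y):
                return False
            # Symmetry.
            if d(x, y) != d(y, x):
                return False
            # Triangle inequality through every intermediate point z.
            for z in points:
                if d(x, y) > d(x, z) + d(z, y):
                    return False
    return True


# All 0-1 strings of length k = 3.
strings = ["".join(bits) for bits in product("01", repeat=3)]
hamming_is_metric = is_metric_on(strings, hamming)
```

The same `is_metric_on` helper can be applied to the discrete metric of Example 1 on any finite set.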

As remarked earlier, we will be using, for the most part, the Euclidean norm and we

now return to that particular case for the sake of definiteness.

Since the Euclidean norm induces a distance function, we can introduce concepts of

convergence of sequences and continuity of functions. We summarize some of these, as

well as related ideas and theorems, which will be crucial for our work. More detail may

be found in any book of advanced calculus or beginning analysis.

The metric structure on Rn leads us to the notion of sequential convergence. Let $\{x_k\}_{k=1}^{\infty}$
be a sequence of points in Rn. This sequence is said to converge to the point xo provided
$$\lim_{k\to\infty} \|x_k - x_o\| = 0 ,$$
that is, provided that, for every ε > 0 there is an integer ko such that ‖xk − xo‖ < ε
for all k > ko. We often write simply xk → xo to indicate convergence. A subsequence
$\{y_\ell\}_{\ell=1}^{\infty}$ of the sequence $\{x_k\}_{k=1}^{\infty}$ is a selection of terms of the original sequence, taken in their original order: given
an index ℓ there is an index k(ℓ) with $y_\ell = x_{k(\ell)}$, and if ℓi < ℓj then k(ℓi) < k(ℓj).


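For example, the sequence $\{\tfrac{1}{2}, 1, \tfrac{1}{3}, 1, \tfrac{1}{4}, 1, \cdots\}$ contains, among others, the subsequences
$\{\tfrac{1}{2}, \tfrac{1}{3}, \tfrac{1}{4}, \cdots, \tfrac{1}{n}, \cdots\}$, $\{1, \tfrac{1}{4}, \tfrac{1}{9}, 1, \tfrac{1}{16}, \tfrac{1}{25}, 1, \cdots\}$ and $\{1, 1, \ldots\}$.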

From the definition it is clear that if xk → xo, then yℓ → xo for any subsequence
$\{y_\ell\}_{\ell=1}^{\infty}$ of $\{x_k\}_{k=1}^{\infty}$. Obviously the convergence of a subsequence does not imply the
convergence of the original sequence. A point x* is said to be an accumulation point of
the sequence provided that x* is the limit of some subsequence of the original one¹¹. A
sequence is called a bounded sequence if there exists a number r such that ‖xk‖ < r for
all indices k.

Many arguments in analysis depend on our ability to know that a given sequence

contains a convergent subsequence. The basic result is:

Theorem B.3.23 Every bounded sequence in Rn has a convergent subsequence.

This fact is worth checking. Here is the proof that uses the Bolzano-Weierstrass Theorem.

Proof: There are two situations to consider and they are easiest to describe if we view a
sequence as a function on the integers k → xk. If the range of this function consists of
only finitely many points, then there must be a point x* such that xk = x* for infinitely
many values of the index k. Then clearly the subsequence yℓ = x* for all ℓ is a convergent
subsequence with limit x*. The second possibility is that the range S = {xk | k = 1, 2, · · · } is an infinite
but bounded set. Then, by the Bolzano-Weierstrass Theorem, this set of points has
an accumulation point, call it x̄. Since x̄ is an accumulation point of S, every
neighborhood of x̄ contains infinitely many terms of the sequence. Consider, for the sake
of concreteness, the decreasing sequence of neighborhoods
$$U_m := \left\{ x \;\Big|\; \|x - \bar{x}\| < \frac{1}{m} \right\} , \quad m = 1, 2, \cdots .$$
Choose indices k1 < k2 < · · · with $x_{k_m} \in U_m \cap S$, which is possible since each Um contains infinitely many terms of the sequence, and set $y_m = x_{k_m}$. Then, clearly, we have ym → x̄ as m → ∞.

It is interesting to see that the converse is true.

Proposition B.3.24 If a set, S, has the property that every sequence in it has a convergent
subsequence then the set S must be bounded.

¹¹ This terminology is in accord with the definition of accumulation point (or limit point) given earlier.

Proof: Indeed, assume the contrary. Then, if Bk denotes the ball of radius k centered at the origin, S \ Bk ≠ ∅ for every k.
Now, for each k = 1, 2, · · · , choose xk ∈ S \ Bk, so that ‖xk‖ ≥ k. Then this sequence is in S and every subsequence of it is unbounded, so that neither the sequence nor any of its subsequences can converge, contrary
to the hypothesis on S. Hence S must be bounded.

We can combine this last observation with the fact that a set is closed if and only if

it contains all the accumulation points of sequences of the set. We thereby establish the

following useful statement concerning compactness (see the definition B.3.1 above):

Theorem B.3.25 A set S ⊂ Rn is compact if and only if every sequence of points in S

has a subsequence which converges to a point in S.

Knowing this result suggests that we introduce the following definition.

Definition B.3.26 A subset S ⊂ Rn is called sequentially compact provided every se-

quence {xk}∞k=1 ⊂ S has a subsequence that converges to a point of S.

Using this term, the preceding result says that every compact set in Rn is also sequentially

compact and conversely. In other words, in Rn the notions of compact and sequentially

compact are equivalent.

As an obvious, but often useful corollary to this result we have the following.

Corollary B.3.27 A closed subset of a compact set is compact.

We leave the simple proof as an exercise.

It is important to have some criterion to determine when a sequence converges. There

is one class of sequences which always converge in Rn. These are the Cauchy sequences

which we can define as follows:

Definition B.3.28 A sequence $\{x_k\}_{k=1}^{\infty}$ is said to be a Cauchy sequence provided that for
any ε > 0 there is an index ko such that, if k, ℓ > ko then ‖xk − xℓ‖ < ε.

We can easily show that every convergent sequence is a Cauchy sequence: for suppose
that xk → xo. Choose ε > 0. Then there is a positive integer ko so that for k >
ko, ‖xk − xo‖ < ε/2. Then for any k, ℓ > ko we have

‖xk − xℓ‖ ≤ ‖xk − xo‖ + ‖xo − xℓ‖ < ε/2 + ε/2 = ε.


The converse statement is definitely not true in a general metric space. The simple example of X = (0, 1] ⊂ R
and the sequence $\{1/n\}_{n=1}^{\infty}$ shows that a sequence may be Cauchy and yet not
converge to a point in the space, since the point 0 is not in the space. Of perhaps more historical
interest is the metric space {Q, | · |} of the rational numbers with the usual distance
function on the line d(x, y) = |x − y|. There the sequence {1, 1.4, 1.41, 1.414, 1.4142, . . .}
converges in R to √2, which is not rational. This sequence is convergent in R, and hence
Cauchy, but it has no limit in Q.
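The decimal truncations of √2 can be generated in exact rational arithmetic, which makes the Cauchy property visible without any rounding. The sketch below is an illustration under the assumption that the k-th term is √2 truncated to k decimal places; the function name `sqrt2_truncation` is hypothetical.

```python
from fractions import Fraction
from math import isqrt


def sqrt2_truncation(k):
    """Return the rational number obtained by truncating sqrt(2) to k decimals."""
    # isqrt(2 * 10**(2k)) equals floor(sqrt(2) * 10**k), in exact integer arithmetic.
    return Fraction(isqrt(2 * 10 ** (2 * k)), 10 ** k)


# The terms 1, 1.4, 1.41, 1.414, ... as exact fractions.
terms = [sqrt2_truncation(k) for k in range(12)]

# Cauchy estimate: any two terms with indices at least k0 differ by less than 10**(-k0).
cauchy_ok = all(
    abs(terms[k] - terms[l]) < Fraction(1, 10 ** min(k, l))
    for k in range(len(terms))
    for l in range(len(terms))
)

# No term squares to exactly 2, although the squares approach 2.
never_exact = all(t * t != 2 for t in terms)
```

Every term is rational and the terms crowd together, yet the only candidate limit is a number whose square is 2, and no such number exists in Q.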

This kind of behavior cannot happen in Rn. Indeed, it is well known that a sequence

{xk}∞k=1 in Rn converges if and only if it is a Cauchy sequence. The fact that every Cauchy

sequence in Rn converges to a point in Rn is just what is meant by the completeness of

Rn.

This last assertion is easy to check, provided we know that the set R is a complete

space. That this is so is a fundamental fact of elementary analysis, and we will accept it

without question.

The situation in most applications is that we are working, not in all of Rn but rather

in some subset X. It is therefore absolutely crucial to know when a Cauchy sequence in

some subset X of Rn converges to a point in X itself. If this is the case, then we say that

the subset, X itself, is complete. The next result answers the question.

Proposition B.3.29 Let X ⊂ Rn. Then X is complete if and only if X is a closed subset

of Rn.

Proof: Suppose, first, that every Cauchy sequence in X converges to a point of X. To

show that X is closed, we need to show that it contains all of its limit points. So suppose

that xo is such a limit point. Then, for each positive integer k, the ball B1/k(xo) contains
a point xk ∈ X. Then xk → xo and so, since every convergent sequence
is Cauchy, {xk} is a Cauchy sequence in X. Since we have assumed that X is
complete, this sequence converges to a point of X, which by uniqueness of limits must be xo. So X must contain all its limit points and therefore is closed.

Conversely, suppose that X is closed and let {xk}∞k=1 ⊂ X be a Cauchy sequence. Then,

as a sequence in Rn, this Cauchy sequence must converge, in Rn to some point xo ∈ Rn.

Now this point is a limit point of the set consisting of the points of the sequence. Since

X is closed, this limit point must be in X. Hence X is complete.


In closing, we should point out that the space Rn now has two aspects to its structure.

On the one hand, we recognize it as a vector space with its attendant algebraic structure.

On the other hand it has a norm structure, in which it makes sense to talk about what

are called topological properties, e.g., of open and closed sets, limit points, convergent

sequences, and completeness. This gives us one example of what is called a normed linear

space. Moreover, it is a complete normed linear space. Such spaces, and there are many

other than just Rn which is, after all, finite dimensional, are called Banach Spaces. Here

is an interesting question: are the maps from Rn × Rn to Rn given by (x,y) ↦ x + y
and from R × Rn to Rn given by (α,x) ↦ αx continuous? In fact, what do we mean by

continuous maps from one set to another? That is the question that we take up next.

B.4 Functions on Rn

In this section we will begin with a discussion of continuous functions and some of their

properties. In the following section, we will discuss semicontinuous functions and, later,

introduce the fundamental notion of the epigraph of a function.

B.4.1 Continuous Functions

We start with a definition.

Definition B.4.1 Let X ⊂ Rn. Then a function f : X → Rm is said to be continuous at
a point xo ∈ X provided that f(xk) → f(xo) for all sequences {xk}∞k=1 ⊂ X with xk → xo. The
function f is said to be continuous on X provided it is continuous at each point of X.

This definition is usually called sequential continuity and is due to Heine. In the setting

of a metric space, sequential continuity is equivalent to the other notions of continuity,

and in particular, to the familiar “epsilon-delta” definition.

As usual, sums, products, compositions, and maxima of continuous functions are
continuous. Moreover we know from elementary calculus about the continuity of
polynomials, exponentials, sine, cosine, and many other functions. In the case that f :
X → Rm, continuity is defined in the same way. In terms of the representation
f(x) = (f1(x), f2(x), · · · , fm(x))⊤ it is easy to show that continuity of f is equivalent to

continuity of each of its components. Note that if X and Y are subsets of Rn then we can

treat the set of ordered pairs X × Y as a subset of R2n and so treat continuous functions


f : X×Y −→ Rm. With this observation, we can look at some of the operations between

vectors we have discussed and show that, in the sense that these basic operations are con-

tinuous, the algebraic structure of Rn considered as real vector space, and the topological

structure as a normed space are compatible.

Proposition B.4.2 The following functions are continuous:

(a) x ↦ ‖x‖.¹²

(b) (x,y) ↦ x + y of Rn × Rn → Rn.

(c) (x, α) ↦ αx of Rn × R → Rn.

Proof:

(a) To prove the first statement, let xo ∈ Rn and consider any sequence {xk}∞k=1 that

converges to it. We wish to show that ‖xk‖ → ‖xo‖ as k →∞. But

0 ≤ | ‖xk‖ − ‖xo‖ | ≤ ‖xk − xo‖ ,

and ‖xk − xo‖ → 0 is what we mean by the sequence converging to xo.

(b) For the second result, let (xo,yo) be an arbitrary point of Rn × Rn and suppose

xk → xo while yk → yo. We wish to show that xk + yk → xo + yo. Indeed, using
the triangle inequality, we have

0 ≤ ‖(xk + yk) − (xo + yo)‖ = ‖(xk − xo) + (yk − yo)‖ ≤ ‖xk − xo‖ + ‖yk − yo‖,

but each term on the right-hand side approaches 0 as k → ∞. Hence xk + yk → xo + yo.

And finally we have

(c) Suppose (xo, αo) ∈ Rn × R and that xk → xo while αk → αo. Then

0 ≤ ‖αk xk − αo xo‖ = ‖αk xk − αo xk + αo xk − αo xo‖

≤ ‖αk xk − αo xk‖ + ‖αo xk − αo xo‖ = ‖(αk − αo) xk‖ + ‖αo (xk − xo)‖

= |αk − αo| ‖xk‖ + |αo| ‖xk − xo‖.

But, by hypothesis and part (a), |αk − αo| ‖xk‖ → 0 · ‖xo‖ = 0 and |αo| ‖xk − xo‖ → |αo| · 0 = 0.

Hence the result.
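The three estimates used in this proof can be checked numerically. The sketch below is only an illustration under stated assumptions: the Euclidean norm, the dimension n = 4, the particular sequences xk = xo + (1/k)(1, …, 1), yk = yo − (1/k)(1, …, 1), αk = αo + 1/k, and the helper names `norm`, `add`, `sub`, `scale` are all hypothetical choices, and a small tolerance absorbs floating-point rounding.

```python
import math
import random

def norm(x):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(t * t for t in x))

def add(x, y):
    return [a + b for a, b in zip(x, y)]

def sub(x, y):
    return [a - b for a, b in zip(x, y)]

def scale(alpha, x):
    return [alpha * a for a in x]

random.seed(0)
n = 4
xo = [random.uniform(-1, 1) for _ in range(n)]
yo = [random.uniform(-1, 1) for _ in range(n)]
ao = 2.5
tol = 1e-12  # tolerance for floating-point rounding

all_hold = True
for k in range(1, 201):
    xk = [t + 1.0 / k for t in xo]   # x_k -> x_o
    yk = [t - 1.0 / k for t in yo]   # y_k -> y_o
    ak = ao + 1.0 / k                # alpha_k -> alpha_o
    dx, dy = norm(sub(xk, xo)), norm(sub(yk, yo))
    # (a) | ||x_k|| - ||x_o|| | <= ||x_k - x_o||
    bound_a = abs(norm(xk) - norm(xo)) <= dx + tol
    # (b) ||(x_k + y_k) - (x_o + y_o)|| <= ||x_k - x_o|| + ||y_k - y_o||
    bound_b = norm(sub(add(xk, yk), add(xo, yo))) <= dx + dy + tol
    # (c) ||a_k x_k - a_o x_o|| <= |a_k - a_o| ||x_k|| + |a_o| ||x_k - x_o||
    bound_c = (norm(sub(scale(ak, xk), scale(ao, xo)))
               <= abs(ak - ao) * norm(xk) + abs(ao) * dx + tol)
    all_hold = all_hold and bound_a and bound_b and bound_c

final_gap = norm(sub(xk, xo))  # distance at k = 200
print(all_hold, final_gap)
```

A finite check of this kind does not prove continuity, of course; it only illustrates that the right-hand sides of the three bounds shrink with 1/k.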

¹² This particular result is not surprising since convergence is defined in terms of the norm.


It is a good idea to have on hand an example of a function that is not continuous in

a very dramatic way, much more dramatic, in fact than simple step discontinuities or

examples like 1/x on the interval (0, 1).

Example B.4.3 Consider the function f : R −→ R given by

\[
f(x) = \begin{cases} 1 & \text{if } x \text{ is irrational,} \\ 0 & \text{if } x \text{ is rational.} \end{cases}
\]

This function is continuous nowhere! For any open interval in R contains both rational and
irrational points and so every point has sequences of rationals and sequences of irrationals
that converge to it. So if xo is rational, take a sequence {xn}∞n=1 of irrationals that
converges to xo and then f(xn) = 1 for all n while f(xo) = 0. Likewise, if xo is irrational,
then we take the sequence to be a sequence of rationals, along which the function has the

value 0 while f(xo) = 1.
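Floating-point numbers cannot distinguish rationals from irrationals, so the argument above cannot be tested directly on floats. As a rough sketch, we can restrict attention to numbers of the form a + b√2 with a, b rational, which are rational exactly when b = 0. The class `QSqrt2` below is a hypothetical helper introduced only for this illustration, and the choices xo = 1/3 and xn = 1/3 + √2/n are assumptions.

```python
from fractions import Fraction
import math

class QSqrt2:
    """A number a + b*sqrt(2) with rational a and b (hypothetical helper)."""
    def __init__(self, a, b=0):
        self.a = Fraction(a)
        self.b = Fraction(b)

    def is_rational(self):
        # a + b*sqrt(2) is rational exactly when b == 0.
        return self.b == 0

    def value(self):
        """Floating-point approximation, used only to measure distances."""
        return float(self.a) + float(self.b) * math.sqrt(2)

def f(x):
    """The function of Example B.4.3 restricted to numbers a + b*sqrt(2)."""
    return 0 if x.is_rational() else 1

xo = QSqrt2(Fraction(1, 3))                     # a rational point
seq = [QSqrt2(Fraction(1, 3), Fraction(1, n))   # irrationals x_n = 1/3 + sqrt(2)/n
       for n in range(1, 50)]

distances = [abs(x.value() - xo.value()) for x in seq]
values_along_seq = {f(x) for x in seq}

print(f(xo), values_along_seq, distances[-1])
```

Along the sequence the function is identically 1 while f(xo) = 0, even though the distances to xo shrink toward zero.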

We now turn to the behavior of continuous functions on compact sets in Rn. We
want to show, in particular, that such functions must be bounded, that the image must be
compact, and that a real-valued function takes on its maximum and minimum values. We

begin by combining the first two of these results. Since sequentially compact sets in Rn

are compact and vice versa we can use the sequential definitions in the proofs without

loss of generality.

Proposition B.4.4 Let S ⊂ Rn be (sequentially) compact and f : Rn → Rm be continu-

ous. Then the set f(S) ⊂ Rm is (sequentially) compact.

Proof: Let {yk}∞k=1 be a sequence of points in f(S). We need only show that this sequence
has a subsequence that converges to a point of f(S). To each yk there corresponds an xk ∈ S with f(xk) = yk.
Since {xk}∞k=1 ⊂ S and S is assumed to be sequentially compact, this sequence contains
a subsequence {x_{k_ℓ}}∞ℓ=1 which converges to some xo ∈ S. Since f is assumed continuous,

y_{k_ℓ} = f(x_{k_ℓ}) → f(xo) ∈ f(S),

and so the subsequence {y_{k_ℓ}}∞ℓ=1 converges to a limit, f(xo) ∈ f(S), which was to be
proved.

Examples show that inverses of continuous functions, even when defined, are not
necessarily continuous. The next result shows that the additional condition of compactness

of the domain is enough to guarantee continuity of the inverse.


Proposition B.4.5 Let S ⊂ Rn be sequentially compact and suppose f : S −→ Rm is

continuous and injective. Then f−1 : f(S) −→ S is continuous.

Proof: The function f : S → f(S) is bijective (one-to-one and onto) and so the inverse function f−1 is
well defined with f−1(f(x)) = x and f(f−1(y)) = y. To show the continuity of f−1 we must show that
if yk → yo in f(S), then f−1(yk) → f−1(yo) in S. To do this, look at the sequence in S given by
xk = f−1(yk). Since S is sequentially compact, every subsequence of {xk}∞k=1 has a further subsequence
that converges to a point of S. Let {x_{k_ℓ}}∞ℓ=1 be one such convergent subsequence, with x_{k_ℓ} → xo in
S. By the continuity of the function f, y_{k_ℓ} = f(x_{k_ℓ}) → f(xo) in f(S). But since the original sequence
{yk}∞k=1 converges to yo, so does every subsequence and hence yo = f(xo). Since the convergent
subsequence was arbitrary, the limit x̄ of any other convergent subsequence also satisfies yo = f(x̄). Since f
is injective (one-to-one), xo = x̄, i.e., all convergent subsequences of {xk}∞k=1 have the same limit. So the entire
sequence {xk}∞k=1 converges to xo = f−1(yo); otherwise some subsequence would stay a fixed distance away
from xo and yet contain a further subsequence converging to a point of S other than xo. Hence
f−1(yk) → f−1(yo) in S as was to be proved.

Now, we prove what is probably the most important theorem in the theory of
optimization. The result is due to Weierstrass. Note that here we are dealing with real-valued

functions.

Theorem B.4.6 (Weierstrass) Let f : S −→ R be continuous where S ⊂ Rn is compact.

Then there are points xM, xm ∈ S such that f(xm) ≤ f(x) ≤ f(xM) for all x ∈ S.

Proof: We know that, since S is compact and f is continuous, f is bounded on S and

so f(S) has a least upper bound α. By definition of the least upper bound, for every
k ∈ N, the number α − 1/k is not an upper bound for f(S). Hence we can find a corresponding point xk ∈ S for which
f(xk) > α − 1/k. But then

α ≥ f(xk) > α − 1/k,

so that f(xk)→ α as k →∞.

Since the set S is sequentially compact, there is a subsequence {xk`}∞`=1 which converges

to a point xM . Then continuity of f implies that f(xk`) → f(xM). But the original

sequence {f(xk)}∞k=1 converges to α and so f(xM) = α. Hence f takes on its maximum

at a point of the set S.

Finally, since min{f(x)} = −max{−f(x)}, the same argument applied to −f yields the

minimum point xm.
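A small numerical experiment may make the role of compactness concrete. The sketch below rests on illustrative assumptions that are not part of the text: f(x) = x(1 − x) on the compact interval [0, 1], a uniform grid search (the helper `grid_argmax` is hypothetical), and, for contrast, g(x) = x on the non-compact interval (0, 1), whose supremum 1 is approached but never attained.

```python
def f(x):
    """A continuous function on the compact interval [0, 1] (illustrative choice)."""
    return x * (1.0 - x)

def grid_argmax(func, a, b, steps):
    """Return (x, func(x)) maximizing func over a uniform grid on [a, b]."""
    pts = [a + (b - a) * i / steps for i in range(steps + 1)]
    best = max(pts, key=func)
    return best, func(best)

# On the compact set [0, 1] the maximum is attained (here at x = 1/2).
x_M, f_max = grid_argmax(f, 0.0, 1.0, 1000)

# Contrast: g(x) = x on the open interval (0, 1). The values g(1 - 1/k)
# approach the supremum 1, but no point of (0, 1) attains it.
g_values = [1.0 - 1.0 / k for k in range(2, 1000)]
sup_not_attained = all(v < 1.0 for v in g_values)

print(x_M, f_max, sup_not_attained)
```

The grid search is only a stand-in for the maximizing sequence in the proof; the theorem itself guarantees that a maximizer exists without telling us how to find it.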

As an immediate application we give a proof that we have promised concerning the

equivalence of all norms on Rn.


Proposition B.4.7 On Rn, all norms are equivalent.

Proof: Let ‖·‖α be any norm on Rn. For any x ∈ Rn write x = ∑_{i=1}^{n} xi ei, where
{e1, e2, . . . , en} is the standard basis. Then, since the triangle inequality is valid for
any norm,

\[
\|x\|_\alpha \le \sum_{i=1}^{n} |x_i|\,\|e_i\|_\alpha \le M \sum_{i=1}^{n} |x_i| = M\,\|x\|_1 ,
\]

where M = max_{1≤i≤n} ‖ei‖α.

Now consider the function ϕ : x ↦ ‖x‖α. This function is continuous with respect to
the ℓ1-norm. We can see this by starting with a sequence {xk}∞k=1 which converges to xo
in the sense that ‖xk − xo‖1 → 0 as k → ∞. Then

0 ≤ | ‖xk‖α − ‖xo‖α | ≤ ‖xk − xo‖α ≤ M ‖xk − xo‖1.

Hence ‖xk − xo‖1 → 0 implies that ‖xk‖α → ‖xo‖α as k → ∞, i.e., ϕ(xk) → ϕ(xo). So ϕ
is continuous in the metric space {Rn, d1} where d1 is the metric induced by the ℓ1-norm.

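Now in this same metric space, the unit sphere S1 = {x ∈ Rn | ‖x‖1 = 1} is a closed and
bounded set. Hence it is compact and so, by Weierstrass’s Theorem B.4.6, there exists a
point x̂ ∈ S1 such that ϕ(x̂) = min_{x∈S1} ϕ(x). So if m = ϕ(x̂) then ‖x‖α ≥ m for all
x ∈ S1; moreover m > 0, since x̂ ≠ 0. For any x ∈ Rn with x ≠ 0 we have x/‖x‖1 ∈ S1 and so

\[
\varphi\!\left(\frac{x}{\|x\|_1}\right) = \left\| \frac{x}{\|x\|_1} \right\|_\alpha = \frac{1}{\|x\|_1}\,\|x\|_\alpha \ge m .
\]

It follows that m ‖x‖1 ≤ ‖x‖α ≤ M ‖x‖1 for all x ∈ Rn, the case x = 0 being trivial. Thus every
norm on Rn is equivalent to the ℓ1-norm, and hence any two norms on Rn are equivalent to each other.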
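For a concrete instance of these constants, we can take ‖·‖α to be the Euclidean norm (an assumption made only for this illustration). Then M = max ‖ei‖2 = 1, and the minimum of ‖x‖2 over the ℓ1 unit sphere is m = 1/√n, attained at x̂ = (1/n, …, 1/n); this value of m follows from the Cauchy-Schwarz inequality and is not derived in the text. The sketch below checks the two-sided bound on random vectors, with hypothetical helper names `norm1` and `norm2`.

```python
import math
import random

def norm1(x):
    """The l1-norm: sum of absolute values."""
    return sum(abs(t) for t in x)

def norm2(x):
    """The Euclidean norm, playing the role of the norm with subscript alpha."""
    return math.sqrt(sum(t * t for t in x))

n = 5
M = 1.0                  # max over i of ||e_i||_2
m = 1.0 / math.sqrt(n)   # min of ||x||_2 on the l1 unit sphere (assumed, via Cauchy-Schwarz)

random.seed(1)
tol = 1e-9  # tolerance for floating-point rounding
bounds_hold = True
for _ in range(10000):
    x = [random.uniform(-10, 10) for _ in range(n)]
    bounds_hold = bounds_hold and (m * norm1(x) - tol <= norm2(x) <= M * norm1(x) + tol)

# The minimizer on the l1 unit sphere: all coordinates equal to 1/n.
x_hat = [1.0 / n] * n
print(bounds_hold, norm1(x_hat), norm2(x_hat), m)
```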

Now we want to say something about the Intermediate Value Theorem and some related

results. First we need a lemma.¹³

Lemma B.4.8 Let f : Rn −→ Rm be continuous. Then f−1(V ) ⊂ Rn is closed whenever

V ⊂ Rm is closed.

Proof: We show that f−1(V ) contains all of its limit points. Suppose xo is a limit point

of the set f−1(V ). Then there is a sequence {xk}∞k=1 ⊂ f−1(V ) such that xk → xo. By

continuity of f, yk = f(xk) → f(xo) = yo. Since, in general, f[f−1(V)] ⊂ V, the yk lie in
V for all k. But by hypothesis, V is closed so yo ∈ V. Hence f(xo) = yo ∈ V, that is, xo ∈ f−1(V). So
f−1(V) contains all its limit points and so is a closed set.

¹³ This lemma is only half of a general result that characterizes continuous functions.


The result of the lemma allows us to prove that the continuous image of a connected

set is connected. Before reading the proof, you should check the definitions ?? and B.3.16.

We also make use of properties of inverse functions.

Proposition B.4.9 If f is a continuous map from Rn to Rm, and if E ⊂ Rn is a connected

set, then f(E) is a connected subset of Rm.

Proof: Assume, on the contrary, that f(E) = A ∪ B where A and B are non-empty separated subsets of
Rm. Define the sets G = E ∩ f−1(A) and H = E ∩ f−1(B). Then neither of these sets is empty and
E = G ∪ H.

Now, A ⊂ cℓ(A) and this means that G ⊂ f−1(cℓ(A)), and the lemma tells us that f−1(cℓ(A)) is closed. So
cℓ(G) ⊂ f−1(cℓ(A)) and it follows, by applying f, that f(cℓ(G)) ⊂ cℓ(A). Since f(H) = B and cℓ(A) ∩ B = ∅,
we conclude that cℓ(G) ∩ H = ∅. The same argument shows that G ∩ cℓ(H) = ∅. Thus G and H are separated,
which is impossible if E is connected.

Reference to B.3.17 shows that the connected sets in R are just the intervals. This

leads to what is commonly called the Intermediate Value Theorem.

Proposition B.4.10 Let f be a continuous real-valued function defined on the interval

[a, b]. If f(a) < f(b) and if c ∈ R satisfies f(a) < c < f(b) then there is an x ∈ [a, b] with

f(x) = c.

Proof: By B.3.17, the interval [a, b] is connected. Hence the preceding proposition implies
that f([a, b]) is a connected subset of R. Then, again by B.3.17, this latter set is an
interval. Since it contains both f(a) and f(b), it contains [f(a), f(b)] and hence c. The result is therefore true in this case. The case

that f(b) < f(a) is handled similarly.
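The Intermediate Value Theorem is an existence statement, but it also underlies the bisection method for actually locating such a point. The following is a minimal sketch, assuming f is continuous on [a, b] with f(a) < c < f(b); the function name `bisect_value`, the tolerance, and the test function t³ + t are illustrative assumptions.

```python
def bisect_value(f, a, b, c, tol=1e-10):
    """Approximate an x in [a, b] with f(x) = c by bisection.

    Assumes f is continuous on [a, b] and f(a) < c < f(b).
    The loop keeps the invariant f(lo) < c <= f(hi).
    """
    if not (f(a) < c < f(b)):
        raise ValueError("need f(a) < c < f(b)")
    lo, hi = a, b
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < c:
            lo = mid   # a solution still lies in [mid, hi]
        else:
            hi = mid   # a solution still lies in [lo, mid]
    return 0.5 * (lo + hi)

# Illustrative use: solve t^3 + t = 1 on [0, 2], where f(0) = 0 < 1 < 10 = f(2).
g = lambda t: t ** 3 + t
x = bisect_value(g, 0.0, 2.0, 1.0)
print(x, g(x))
```

Note that the function need not be monotone: the invariant alone guarantees that each halved interval still contains a point where the value c is taken.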

Proposition B.4.9, in the case that n = 1 but m > 1 tells us that curves are connected.

Now we want to show that all open or closed balls in Rn are connected. If p and q are
points in Rn we define the line segment joining these two points to be the set of points

{x ∈ Rn | x = (1 − λ) p + λ q, 0 ≤ λ ≤ 1}.

Then a typical component is xi = (1 − λ) pi + λ qi = pi + λ (qi − pi), and thus each xi is a
continuous function of λ on the interval [0, 1]. So the line segment, being the continuous image of [0, 1], is a connected set. Now

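\[
\|p - x\| = \left( \sum_{i=1}^{n} (p_i - x_i)^2 \right)^{1/2}
= \left( \sum_{i=1}^{n} \bigl( p_i - [\,p_i + \lambda (q_i - p_i)\,] \bigr)^2 \right)^{1/2}
= \left( \sum_{i=1}^{n} \lambda^2 (q_i - p_i)^2 \right)^{1/2}
= \lambda \left( \sum_{i=1}^{n} (q_i - p_i)^2 \right)^{1/2}
= \lambda\, \|p - q\| .
\]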


So we see from this computation that the distance from any point x on the line segment to p is λ
times the distance between p and q. So the entire line segment between the center of any
ball and a point on the boundary of that ball lies entirely in the ball. The union of all these
radii, each of which contains the center, is therefore connected according to Proposition
B.3.18; hence the ball is connected. Note that since Rn itself can be considered the union
of line segments all containing the origin, this reasoning shows that Rn is connected.
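The identity ‖p − x‖ = λ‖p − q‖ for points x = (1 − λ)p + λq of the segment is easy to check numerically. In the sketch below the points p and q in R3, the grid of values of λ, and the helper names `dist` and `segment_point` are illustrative assumptions.

```python
import math

def dist(u, v):
    """Euclidean distance between two points given as lists."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def segment_point(p, q, lam):
    """The point (1 - lam) * p + lam * q on the segment from p to q."""
    return [(1 - lam) * a + lam * b for a, b in zip(p, q)]

p = [1.0, -2.0, 0.5]   # center of the ball (illustrative)
q = [4.0, 2.0, 0.5]    # a boundary point; here ||p - q|| = 5
radius = dist(p, q)

lams = [i / 10 for i in range(11)]
errors = [abs(dist(p, segment_point(p, q, lam)) - lam * radius) for lam in lams]

# Every point of the radius stays inside the closed ball of this radius around p.
inside = all(dist(p, segment_point(p, q, lam)) <= radius + 1e-12 for lam in lams)

print(radius, max(errors), inside)
```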

One useful way to study real-valued functions is to study their level sets. If the function
is continuous and c ∈ R it is easy to check that the sets {x | f(x) ≤ c} and {x | f(x) ≥ c}
are both closed. Likewise the set {x | f(x) = c} is closed. Indeed, for the first of these
three sets, if x⋆ is a limit point there is a sequence {xk}∞k=1 ⊂ {x | f(x) ≤ c} such that
xk → x⋆. The continuity of f implies that c ≥ f(xk) → f(x⋆). Hence f(x⋆) ≤ c and so
x⋆ ∈ {x ∈ Rn | f(x) ≤ c}. The proofs in the other cases are similar.

We note for future reference that if the function is a linear functional, i.e. if it has the

form f(x) = a1x1 + a2x2 + . . . + anxn = 〈x, a〉, it is a continuous function. In this case, the

third of these sets, namely {x | f(x) = c}, is called a hyperplane and the two other level

sets constitute two closed half spaces determined by this hyperplane as can be seen easily

by drawing a picture for the case n = 2. These structures will play a significant role in

our discussion of convex sets and optimization, particularly in the theory of duality.

As one would expect from earlier studies, the composition of continuous functions is

continuous. The composition is defined in the usual way. Suppose that X ⊂ Rn and that f : X → Rm
while Y ⊂ Rm contains f(X). Then, if g : Y → Rp we can define g ◦ f : X → Rp
by x ↦ g(f(x)). Then it is easy to see that the continuity of f and g implies the continuity

of g ◦ f . Indeed, let xo ∈ X and let {xk}∞k=1 be any sequence converging to xo. Then the

continuity of f implies that f(xk)→ f(xo) and since g is continuous g(f(xk))→ g(f(xo))

as k →∞. Hence the continuity of the composition.

It is important as well as often useful, to know how open and closed sets are related to

continuity.

B.4.2 Semicontinuous Functions

For optimization problems the notion of lower semicontinuous function (as well as upper

semicontinuous function) is crucial. Suppose that f : Rn → R and that {xk}∞k=1 ⊂ Rn is a

sequence. Then {f(xk)}∞k=1 is a sequence in R.


Definition B.4.11 A function f is lower semicontinuous at xo provided f(xo) ≤ lim inf_{k→∞} f(xk)
for all sequences {xk}∞k=1 which converge to xo. A function is called lower semicontinuous
on a set D ⊂ Rn provided that it is lower semicontinuous at every point of D. The
function is called upper semicontinuous at xo provided f(xo) ≥ lim sup_{k→∞} f(xk) for all sequences
{xk}∞k=1 which converge to xo.

We note that, from the relations between lim sup and lim inf it follows that a function

is continuous at a point if it is both lower and upper semicontinuous at that point. Indeed

f(xo) ≤ lim inf_{k→∞} f(xk) ≤ lim sup_{k→∞} f(xk) ≤ f(xo).

The converse statement is trivial.

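Example B.4.12 Simple examples of functions which are upper semicontinuous
everywhere but are not continuous at some x0 are

\[
f_1(x) = \begin{cases} 0, & \text{if } x < x_0 \\ 1, & \text{if } x \ge x_0 \end{cases}
\qquad \text{and} \qquad
f_2(x) = \begin{cases} 0, & \text{if } x \ne x_0 \\ 1, & \text{if } x = x_0 . \end{cases}
\]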

On the other hand −f1 and −f2 are examples of lower semicontinuous functions. A

more interesting example is χQ, the characteristic function of the set of rationals.¹⁴ The

function χQ is clearly upper semicontinuous at each rational and lower semicontinuous at

each irrational.
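The behavior of the step function f1 at x0 can be probed along sequences approaching x0 from either side. The sketch below takes x0 = 0 and the sequences x0 ± 1/k as illustrative assumptions, and uses the maximum and minimum over a finite tail as crude stand-ins for lim sup and lim inf (the helpers `tail_sup` and `tail_inf` are hypothetical).

```python
x0 = 0.0  # illustrative choice of the jump point

def f1(x):
    """The step function of Example B.4.12: 0 to the left of x0, 1 from x0 on."""
    return 1.0 if x >= x0 else 0.0

left = [x0 - 1.0 / k for k in range(1, 200)]    # approaches x0 from below
right = [x0 + 1.0 / k for k in range(1, 200)]   # approaches x0 from above

def tail_sup(values, start=100):
    """Maximum over a finite tail, a stand-in for lim sup."""
    return max(values[start:])

def tail_inf(values, start=100):
    """Minimum over a finite tail, a stand-in for lim inf."""
    return min(values[start:])

# Upper semicontinuity at x0: f1(x0) >= lim sup along both sequences.
usc_left = f1(x0) >= tail_sup([f1(x) for x in left])
usc_right = f1(x0) >= tail_sup([f1(x) for x in right])

# Lower semicontinuity fails from the left: f1(x0) = 1 but the values there are 0.
lsc_left = f1(x0) <= tail_inf([f1(x) for x in left])

print(usc_left, usc_right, lsc_left)
```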

It is interesting and important in optimization, to see that the upper and lower semi-

continuous functions can be characterized by the nature of the sets {x | f(x) ≥ a} and

{x | f(x) ≤ a}. This is the content of the next result.

Theorem B.4.13 A function, f , is upper semicontinuous on E ⊂ Rn if and only if

{x ∈ E|f(x) ≥ a} is closed for all a ∈ R and f is lower semicontinuous on E if and only

if {x ∈ E|f(x) ≤ a} is closed for all a ∈ R.

¹⁴ Given a set S, the characteristic function of S, χS, is the function that takes the value 1 on S and 0

at all other points.


Proof: We check only the first statement since it is equivalent to the second, f being

upper semicontinuous on E if and only if −f is lower semicontinuous there.

Suppose, first, that f is upper semicontinuous on E. Let a ∈ R and let xo ∈ E be
a limit point of {x ∈ E | f(x) ≥ a}. Then there exists a sequence xk → xo, xk ∈ E,
k = 1, 2, . . ., with f(xk) ≥ a. By upper semicontinuity, f(xo) ≥ lim sup_{k→∞} f(xk) ≥ a. Hence
the set is closed. Conversely, suppose that xo is a limit point of E which is in E and that
f is not upper semicontinuous at xo. Then f(xo) < ∞ and there is an M and a sequence
{xk}∞k=1 ⊂ E such that f(xo) < M, xk → xo, and f(xk) ≥ M. Hence xo is a limit point of
{x ∈ E | f(x) ≥ M} that does not belong to it, so this set is not closed.

Remark: By complementation, f is upper semicontinuous on E provided sets of the
form {x ∈ E | f(x) < a} are open (relative to E) and lower semicontinuous provided the sets
{x ∈ E | f(x) > a} are open (relative to E).

Exercise B.4.14 Let S ⊂ Rn and let χS be its characteristic function. Then χS is upper
semicontinuous on Rn if and only if S is closed.

We have seen in Theorem B.4.6 that if K ⊂ Rn is compact and if f : K → R is

continuous, then the function f assumes its least upper bound and greatest lower bound

at points of K. In fact, a careful look at the definitions and the proof of that result leads

to the conclusion that if f is upper semicontinuous on K, then it will achieve its least

upper bound and, if it is lower semicontinuous, it will achieve its greatest lower bound.

Let us prove the first statement.

Theorem B.4.15 If K ⊂ Rn is compact and if f : K → R is upper semicontinuous on

K, then there exists a point xo ∈ K for which f(xo) = supx∈K f(x).

Proof: Let L := supx∈K f(x). Then, by definition of the supremum, there exists a

sequence of points {xk}∞k=1 ⊂ K such that f(xk) → L as k → ∞. Since K is compact,

the sequence {xk}∞k=1 contains a convergent subsequence {x_{k_ℓ}}∞ℓ=1 which converges to
some point xo ∈ K. By the upper semicontinuity of f, L ≥ f(xo) ≥ lim sup_{ℓ→∞} f(x_{k_ℓ}) =
lim_{ℓ→∞} f(x_{k_ℓ}) = L. Hence f(xo) = L which was to be proved.

This result just hints at the importance that upper and lower semicontinuous real-

valued functions play in the theory and practice of optimization.


B.4.3 The Extended Reals

In our work, we will often find it useful to extend the usual set R of real numbers to a

set which allows us to represent unbounded numbers. In order to do this, we adjoin two

symbols −∞ and ∞ to R and denote by R∗ the set R ∪ {−∞} ∪ {∞}. This set is known
as the set of extended real numbers. We extend the usual order relation of R to the
set R∗ by defining −∞ < ∞ and, for all x ∈ R, −∞ < x < ∞. Thus, in particular,
intervals, whether open, closed, or half open (or half closed) are defined as usual, e.g.,

[a, b] := {x ∈ R∗ | a ≤ x ≤ b}.

If, in this last example, a, b ∈ R then the interval [a, b] is said to be bounded, otherwise,

we call it unbounded.

The usual arithmetic operations are likewise extended to R∗ with some exceptions. For x ∈ R,

∞ + x = x + ∞ = ∞ ; −∞ + x = x + (−∞) = x − ∞ = −∞ .

Moreover,

∞ + ∞ = ∞ , (−∞) + (−∞) = −∞ ; −(∞) = −∞ , and −(−∞) = ∞ ,

but we do not define ∞ + (−∞) or −∞ + ∞. Likewise, for x > 0,

∞ · x = x · ∞ = ∞

and

−∞ · x = x · (−∞) = −∞

while for x < 0,

∞ · x = x · ∞ = −∞

and

−∞ · x = x · (−∞) = ∞ .

The expressions ∞ · (−∞) and (−∞) · ∞ are not defined. Finally, we will find it

convenient to adopt the convention that

∞ · 0 = 0 · ∞ = −∞ · 0 = 0 · (−∞) = 0 .
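These conventions differ in one respect from IEEE floating-point arithmetic, where 0 · ∞ is "not a number". The sketch below models the rules above with Python floats; the helper names `ext_add` and `ext_mul` are hypothetical, and raising an error for the combinations the text leaves undefined is a design choice of this illustration rather than anything prescribed above.

```python
import math

INF = math.inf

def ext_add(a, b):
    """Sum in the extended reals; infinity plus minus infinity is left undefined."""
    if math.isinf(a) and math.isinf(b) and a != b:
        raise ValueError("infinity + (-infinity) is not defined")
    return a + b

def ext_mul(a, b):
    """Product in the extended reals with the convention 0 * (+-infinity) = 0."""
    if a == 0 or b == 0:
        return 0.0  # convention adopted in the text; plain floats would give nan
    if math.isinf(a) and math.isinf(b) and a != b:
        raise ValueError("infinity * (-infinity) is not defined in the text")
    return a * b

print(ext_add(INF, 3.0))        # infinity plus a real number
print(ext_mul(-INF, 2.0))       # minus infinity times a positive number
print(ext_mul(INF, -2.0))       # infinity times a negative number
print(ext_mul(INF, 0.0))        # the convention: zero
print(math.isnan(0.0 * INF))    # plain IEEE arithmetic disagrees with the convention
```

The product ∞ · ∞ is not addressed in the text; this sketch simply returns the floating-point result in that case, which is an assumption.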

Once we have an ordering on R∗ we can define suprema and infima.

Definition B.4.16 Let A ⊂ R∗. Then we define the supremum of A,
sup A, and the infimum of A, inf A, as follows.

If A ≠ ∅ then if

(i) A is bounded above then sup A is an element u ∈ R which is an upper
bound and is smaller than all other upper bounds for A;

(ii) A is bounded below then inf A is an element b ∈ R which is a lower bound
and is larger than all other lower bounds for A;

(iii) A is unbounded above, then sup A = ∞;

and if

(iv) A is unbounded below, then inf A = −∞.

On the other hand, if A = ∅ then

(i) sup A = −∞

and

(ii) inf A = ∞.

In the case that sup A ∈ A it will be called max A. Similarly, inf A will be
called min A if inf A ∈ A.
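For finite sets the conventions for the empty set are easy to mirror in code. The following sketch handles only finite collections of extended reals represented as Python floats; the names `ext_sup` and `ext_inf` are hypothetical.

```python
import math

def ext_sup(A):
    """Supremum of a finite collection of extended reals; sup of the empty set is -infinity."""
    return max(A, default=-math.inf)

def ext_inf(A):
    """Infimum of a finite collection of extended reals; inf of the empty set is +infinity."""
    return min(A, default=math.inf)

print(ext_sup([1.0, 3.0, 2.0]), ext_inf([1.0, 3.0, 2.0]))
print(ext_sup([1.0, math.inf]))
print(ext_sup([]), ext_inf([]))
```

Note that for the empty set the supremum is smaller than the infimum, the one case in which this happens.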

For some purposes it will be convenient to treat extended real valued functions. Sit-

uations arise where, for example, it is convenient to extend a function to all of Rn by

defining the value of the function as ∞ outside its original domain. In this context, it is

useful to introduce the definition of the indicator function of a set S.

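Definition B.4.17 Let S ⊂ Rn be a non-empty set. Then the indicator function of S is
defined as the function ψS given by

\[
\psi_S(x) = \begin{cases} 0 & \text{if } x \in S \\ +\infty & \text{if } x \notin S. \end{cases}
\]

Then, given a function f : S → R we can extend f to f̄ : Rn → R∗ by defining

\[
\bar f(x) := (f + \psi_S)(x) = \begin{cases} f(x), & \text{if } x \in S \\ +\infty, & \text{if } x \notin S. \end{cases}
\]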

If, on the other hand, we start with a function f : Rn → R and consider a non-empty set

X ⊂ Rn (which may, for example, represent the set of feasible points of an optimization

problem), then we can consider the restriction of f to X defined in terms of its graph

Gr (f |X) := {(x, y) |x ∈ X, y = f(x)} .

Then we can identify f|X with the extended real-valued function f̄ = f + ψX.
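The device of adding an indicator function turns a constrained minimization into an unconstrained one. The sketch below illustrates this on the real line; the feasible set X = [0, 2], the objective (x − 3)², the grid, and the helper names `indicator` and `extend` are all assumptions made for this example.

```python
import math

def indicator(in_S):
    """Return the indicator function of S, given a membership predicate in_S."""
    return lambda x: 0.0 if in_S(x) else math.inf

def extend(f, in_S):
    """Return the extension equal to f on S and +infinity off S."""
    return lambda x: f(x) if in_S(x) else math.inf

in_X = lambda x: 0.0 <= x <= 2.0   # illustrative feasible set X = [0, 2]
f = lambda x: (x - 3.0) ** 2       # illustrative objective
psi_X = indicator(in_X)
f_bar = extend(f, in_X)

# Minimizing the extended function over an unconstrained grid
# automatically respects the constraint x in X.
grid = [i / 100 for i in range(-500, 501)]
x_best = min(grid, key=f_bar)

print(x_best, f_bar(x_best), f_bar(2.5))
```

Infeasible points receive the value +∞ and so can never be selected as minimizers, which is the reason this extension is convenient in optimization.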

B.4.4 Epigraphs and Effective Domains

The notions of epigraph and effective domain will be crucial for our discussion.

Suppose that X ⊂ Rn and f : X → R∗. Then the epigraph of f is the subset of Rn+1

given by

epi (f) := {(x, z) ∈ Rn+1 |x ∈ X, z ≥ f(x)} ,

while the effective domain of f is defined to be the set

dom (f) := {x ∈ X | f(x) <∞} .

Since epi(f) ⊂ Rn+1, it is easy to see that

dom (f) = {x ∈ X | for some z <∞ , (x, z) ∈ epi (f)}

Thus, the effective domain is just the projection of epi(f) onto Rn. If the function f is

restricted to its effective domain, its epigraph is not affected. Likewise, if we extend f by

setting f(x) =∞ for all x ∈ Rn \X the epigraph remains the same.

We often exclude the degenerate case where f ≡ ∞ (in which case epi(f) = ∅) as well

as the case in which f takes on the value −∞ at some point of its domain (in which case

epi(f) contains a vertical line). We will say that f is proper if f(x) < ∞ for at least one

x ∈ X and f(x) > −∞ for all x ∈ X.
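Membership in the epigraph and in the effective domain can be written down directly from the definitions. In the sketch below the extended real-valued function (x² on [−1, 1] and +∞ elsewhere) and the helper names `in_epi` and `in_dom` are illustrative assumptions.

```python
import math

def in_epi(f, x, z):
    """(x, z) lies in epi(f) when z is a real number with z >= f(x)."""
    return math.isfinite(z) and z >= f(x)

def in_dom(f, x):
    """x lies in the effective domain when f(x) < +infinity."""
    return f(x) < math.inf

# Illustrative extended real-valued function: x^2 on [-1, 1], +infinity elsewhere.
f_bar = lambda x: x * x if abs(x) <= 1 else math.inf

print(in_epi(f_bar, 0.5, 0.25))   # on the graph, hence in the epigraph
print(in_epi(f_bar, 0.5, 0.2))    # below the graph
print(in_epi(f_bar, 2.0, 1e9))    # no real z lies above +infinity
print(in_dom(f_bar, 1.0), in_dom(f_bar, 2.0))
```

No point (x, z) with x outside [−1, 1] belongs to the epigraph, which reflects the remark that extending a function by +∞ leaves its epigraph unchanged.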


We have seen, above, that there is a close connection between a lower semicontinuous

function and properties of its level sets. We extend that result here.

Proposition B.4.18 For a function f : Rn → R∗, the following statements are equiva-

lent.

(a) The level set {x ∈ Rn | f(x) ≤ α} is closed for every real number α.

(b) The function f is lower semicontinuous on Rn.

(c) The set epi(f) is closed.

Proof: If f(x) ≡ ∞ the result is trivial. So we may assume that f(x) < ∞ for at least

one x ∈ Rn and, hence, that epi(f) 6= ∅. It follows from the definitions that there is at

least one non-empty level set.

Assume that the level set {x ∈ Rn | f(x) ≤ α} is closed for every choice of real

number α but that, for some x and some sequence {xk}∞k=1 with xk → x we have
f(x) > lim inf_{k→∞} f(xk). Choose γ such that f(x) > γ > lim inf_{k→∞} f(xk). Then there
exists a subsequence, call it, again, {xk}∞k=1, such that f(xk) ≤ γ for all k = 1, 2, · · · . Since
level sets are assumed closed, this implies that f(x) ≤ γ, which is a contradiction.

Now, suppose that f is lower semicontinuous and let (x , z) be a limit point of epi(f).

Then there exists a sequence {(xk, zk)}∞k=1 ⊂ epi(f) such that xk → x and zk → z and f(xk) ≤ zk
for all k. Hence, by lower semicontinuity, f(x) ≤ lim inf_{k→∞} f(xk) ≤ lim_{k→∞} zk = z. Hence (x, z) ∈ epi(f)
and so epi(f) is closed.

Finally, to see that a closed epigraph entails closed level sets, suppose {xk}∞k=1 ⊂ {x | f(x) ≤ α}
for some α ∈ R. Suppose, further, that xk → x. Then, for every
k, (xk, α) ∈ epi(f) and, since the epigraph is assumed closed and (xk, α) → (x, α), we
have (x, α) ∈ epi(f). Hence f(x) ≤ α so that x ∈ {x | f(x) ≤ α}.
