Mathematical Methods in Economics
T. S. Angell
Department of Mathematical Sciences
University of Delaware
Newark, Delaware
© September 21, 2015
Contents
A Basic Set Theory
  A.1 Introduction
  A.2 Specification of Sets, Equality, and Subsets
  A.3 The Algebra of Sets
    A.3.1 Unions and Intersections
    A.3.2 Set Differences, Complements, and DeMorgan’s Laws
  A.4 Ordered Pairs
  A.5 Binary Relations and Equivalence Relations
  A.6 Functions or Maps
  A.7 Orderings on Sets
B Basic Analysis
  B.1 Introduction
  B.2 Norms and Inner Products
    B.2.1 Inner Products of Vectors
    B.2.2 Norms
    B.2.3 Some Important Inequalities
  B.3 Subsets of R^n
    B.3.1 Basic Definitions
    B.3.2 Suprema and Infima
    B.3.3 Connected Sets
    B.3.4 The Bolzano-Weierstrass Theorem
    B.3.5 Convergence
  B.4 Functions on R^n
    B.4.1 Continuous Functions
    B.4.2 Semicontinuous Functions
    B.4.3 The Extended Reals
    B.4.4 Epigraphs and Effective Domains
Appendix A
Basic Set Theory
A.1 Introduction
We assume that most readers are familiar with the use of set notation and understand
the basic operations of the algebra of sets. But a review can be helpful and there are
certain particular matters that are worth emphasizing. So we devote this Appendix to a
review of the basic ideas. Of course, whole books have been written on the subject and
scores of textbooks contain this basic information. While these works discuss the subject
with varying levels of sophistication, our point of view is the “naive” one. Thus we take, as
basic undefined concepts, that of element, set, and the relation of belonging to. This is the
point of view of the book of P. R. Halmos, Naive Set Theory [?] which is still probably the
best exposition for the aspiring student. Much of what is contained in this brief appendix
follows the early part of his exposition.
To quote Halmos,
A pack of wolves, a bunch of grapes, or a flock of pigeons are all examples of
sets. . . . An element of a set may be a wolf, a grape, or a pigeon.
If we denote the set of wolves by W and a particular wolf by w, then the statement
w ∈ W is the statement that w is a member of, or belongs to, the set W. Sometimes
the words collection, or class, or family are used synonymously with the word set. Some
authors reserve the word class to describe a set of sets and the word family to describe a
set of classes. Thus, to continue our example, we can speak of the set W as being an
element of the class of sets of different species of mammals, and of the class of mammals
as belonging to the family of vertebrates. Again, having pointed out this usage, we use
these terms with some fluidity in our exposition. What is important, really, is clarity.
There are some logical, or better, grammatical niceties that we need to discuss before
we begin. Following Halmos, we list seven “logical” operators which will be used
throughout to construct sentences describing sets. They are
and,
or (in the sense of “either—or—or both”),
not,
if—then—(or implies),
if and only if,
for some— (or there exists—),
for all— .
The rules of sentence formation then can be listed:
(i) Put “not” before a sentence and enclose the result in parentheses.1
(ii) Put “and” or “or” or “if and only if” between two sentences and enclose the result
in parentheses.
(iii) Replace the dashes in “if—then—” by sentences and enclose the result in parenthe-
ses.
(iv) Replace the dash in “for some—” or in “for all—” by a letter, follow the result with
a sentence, and enclose the whole in parentheses.
The practice of “enclos[ing] the result in parentheses” is one that is used for clarity. Most
of the time there is no lack of clarity if we omit the parentheses, and we shall seldom, if ever,
use them.
1The correct answer to the question “Are you going to go to the rock concert or do something else?”
is “Yes”.
A.2 Specification of Sets, Equality, and Subsets
A set is specified or defined when its elements are completely characterized. There are
two ways to do this; one either makes an exhaustive list (without regard to order!) of all
elements of the set, or one gives an explicit property or attribute that actually characterizes
the elements. That doing so indeed specifies a set is often stated as an axiom of set
theory. Formally:
Axiom: To every set A and to every condition (equivalently “sentence”)
S(x) there corresponds a set B whose elements are exactly those elements x
of A for which S(x) holds.
The set of all students registered for a particular college course at a given moment in
time is given by the traditional class list. If we look at the students whose names appear
in this list, then we can define a set, F , of all named students who are female. If we call
the set of students whose names appear on the class list, L, then the set F is given by
{s ∈ L | s is the name of a female student}. A more mathematical example is the set of
points in the plane R^2 that lie on the unit circle
S^1 = {(x, y) ∈ R^2 | x^2 + y^2 = 1} .
Notice that in both these examples a “generic” element is named (s in the first case and
(x, y) in the second) and they are required to be elements of some “universal” set (L in
the first case and R^2 in the second). The specification of this so-called universe of discourse
makes clear what types of objects we are discussing and, in more theoretical expositions,
avoids certain well-known logical difficulties such as the Russell Paradox ([?]),
which concerns the sets that do not have themselves as elements and the set U of all such
sets.
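The set-builder pattern {x ∈ A | S(x)} has a close analogue in languages with set comprehensions. The following Python sketch is only an illustration: the class list `L`, its sample entries, and the finite `grid` standing in for the plane are hypothetical and not taken from the text.

```python
# Hypothetical class list L: each student is recorded as a (name, sex) pair.
L = {("Ana", "F"), ("Ben", "M"), ("Chloe", "F"), ("Dev", "M")}

# F = {s in L | s is a female student}: the condition selects a subset of L.
F = {s for s in L if s[1] == "F"}

# A finite stand-in for the unit circle: the integer points of a small grid
# (playing the role of the universe R^2) that satisfy x^2 + y^2 = 1.
grid = {(x, y) for x in range(-2, 3) for y in range(-2, 3)}
circle_points = {(x, y) for (x, y) in grid if x**2 + y**2 == 1}

print(sorted(F))
print(sorted(circle_points))
```

In both comprehensions the generic element is drawn from an explicitly named universe (`L` or `grid`), which mirrors the role of the universe of discourse.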
Two sets A and B are said to be equal provided they consist of exactly the same
elements. In this case we write A = B; in the contrary case we write A ≠ B. We say
that a set A is a subset of B provided every element of A is also an element of B and, in
this case, we write A ⊂ B. Notice that this is quite different from the relation “belongs
to”; the relation “is a subset of” is, as defined, a reflexive relation in the sense that it is
always true that A ⊂ A. This is certainly not the case with the relation ∈.
The relation A ⊂ B may also be written in the reverse order as B ⊃ A. In the case
that A ⊂ B and A ≠ B we say that A is a proper subset of B. Note that some authors
use the notation A ⊆ B for the relation A ⊂ B and A ⊊ B in the case that A is a proper
subset. We do not use that notation in this book.
We will speak of relations with more specificity presently. To anticipate that discussion
we point out that the relation of being a subset has certain interesting properties. One we
have already mentioned, that of reflexivity. We collect the three most important properties
of the relation ⊂ here. Denoting the universe of discourse as U , we have
(1) A ⊂ A for every set A ⊂ U (reflexivity) ;
(2) For sets A and B, subsets of U , A ⊂ B and B ⊂ A implies
A = B (antisymmetry) ;
(3) For sets A,B,C ⊂ U , A ⊂ B and B ⊂ C implies A ⊂ C (transitivity) .
A relation on a class of sets, in this case all subsets of the universe U , with these three
properties defines what we will call a partial order (more on partial orderings presently).
In this case, we say that the class of all subsets of the set U, denoted by P(U), is partially
ordered by inclusion.2 The set P(U) is called the power set of U.
It is important to note that in this example of a partially ordered set, the properties
of reflexivity and antisymmetry can be combined into a single statement:
A = B if and only if A ⊂ B and B ⊂ A . (A.1)
This statement embodies the basic strategy for showing that two sets are equal: show
every element of A is an element of B and that every element of B is also an element of
A, i.e., that each set is a subset of the other.
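For a small finite universe, the three order properties and statement (A.1) can be checked exhaustively. The sketch below is a sanity check, not a proof; the helper `power_set` and the universe `U = {1, 2, 3}` are illustrative choices, and Python's `<=` on sets plays the role of the inclusion written ⊂ in the text.

```python
from itertools import combinations

def power_set(universe):
    """Return the set of all subsets of a finite universe, as frozensets."""
    items = list(universe)
    return {frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)}

U = {1, 2, 3}
P = power_set(U)

# The three properties of inclusion listed above.
reflexive = all(A <= A for A in P)
antisymmetric = all(A == B for A in P for B in P if A <= B and B <= A)
transitive = all(A <= C for A in P for B in P for C in P
                 if A <= B and B <= C)

# Statement (A.1): A = B if and only if A ⊂ B and B ⊂ A.
double_inclusion = all((A == B) == (A <= B and B <= A)
                       for A in P for B in P)

print(len(P), reflexive, antisymmetric, transitive, double_inclusion)
```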
It is often convenient to have a subset of the universe of discourse, U , that contains no
elements of U . This set is called the empty set and is denoted by the symbol ∅. If we agree,
as we do, that a set is specified by characterizing the properties that the elements must
have, then we may use any false statement to specify the empty set. Thus, for example,
we may write
∅ = {u ∈ U | u ≠ u} .
The empty set is sometimes also called the null set. It is a subset of any given set.
In particular, in any collection of sets, there can be at most one empty set since, were
there more, each would be a subset of the other.
2 We remark that there are other kinds of relations that can be defined on P(U). For example, equality
is a relation which is certainly reflexive and transitive; however, unlike inclusion, it is also
symmetric since A = B implies that B = A. Relations on a set that are reflexive, symmetric, and transitive are called equivalence relations.
A.3 The Algebra of Sets
In this section we consider useful ways of combining sets to form new ones. We assume
that we have a fixed universe of discourse, U , and that all sets are subsets of U . We will
not always mention this universe in stating definitions. The familiar operations which
we study in this section are: union, intersection, complementation, and powers. These
operations are all fundamentally related to the relation of inclusion.
A.3.1 Unions and Intersections
We begin with the operation of set union. Given sets A and B, their union, written A ∪ B, is defined by
A ∪ B = {x ∈ U | x ∈ A or x ∈ B} .3 (A.2)
For example, if A = {1, 5, 9, 7, 3} and B = {8, 4, 2, 6} then A ∪ B = {1, 2, 3, 4, 5, 6, 7, 8, 9}.
(Note that the only things that matter here are the elements themselves, not the order in
which we happen to write them down!) Alternatively, using the symbol N for the natural
numbers (the set of positive integers), we can describe the union as
A ∪ B = {x ∈ N | x is used in Sudoku puzzles}.
The operator ∪ is commutative, associative, and idempotent. That is, (i) A ∪ B =
B ∪ A, (ii) (A ∪ B) ∪ C = A ∪ (B ∪ C), and (iii) A ∪ A = A. Likewise it is true that
A ∪ ∅ = A and A ∪ U = U or, more generally, A ⊂ B if and only if A ∪ B = B. All of these
statements are statements about the equality of two sets. As such, they can be proved by
application of the definitions and our logical operators, although they are so elementary
that few would bother to write down a proof. But every serious student should prove
them once in a lifetime. Here is an example: we prove the statement
Proposition A.3.1 A ⊂ B if and only if A ∪B = B.
3REMEMBER: “or” means “either—or—or both”.
Proof: Suppose first that A ⊂ B. We then must show that A ∪ B = B. Following our
basic rule (A.1) we first check that A ∪ B ⊂ B and then prove the reverse inclusion. So,
if x ∈ A ∪ B then either x ∈ A or x ∈ B or both. But A ⊂ B means that for every x ∈ A
we must have x ∈ B, so x ∈ A ∪ B implies x ∈ B. So the first inclusion is proved. Now we
establish the reverse inclusion B ⊂ A ∪ B. To do this, choose x ∈ B. Then the statement
“x ∈ A or x ∈ B” holds and hence, by the definition of union, x ∈ A ∪ B. This completes the proof of sufficiency: if
A ⊂ B then A ∪ B = B.
To prove the reverse implication, that is, to prove the necessity of the left-hand statement,
suppose that A ∪ B = B. We will be done if we can show that A ⊂ B. To this
end, choose x ∈ A. Then, x ∈ A ∪ B and since this latter set is, by hypothesis, just the
set B we have x ∈ B.
There are two things that are illustrated in this proof. First, the use of (A.1) to prove
that two sets are equal. The other is the structure of an “if and only if” proof. If the
reader is unsure of this reasoning, hopefully careful study of the arguments above will be
helpful.
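The union example and Proposition A.3.1 can also be checked mechanically on a small universe. This is a brute-force sketch under the assumption of a four-element universe; the helper `subsets` is a hypothetical name, and a finite check of this kind complements the proof above rather than replacing it.

```python
from itertools import combinations

def subsets(universe):
    """List every subset of a finite universe as a frozenset."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

# The example from the text; Python's | operator is set union.
A = {1, 5, 9, 7, 3}
B = {8, 4, 2, 6}
union_example = A | B

# Exhaustive check over all pairs of subsets of a four-element universe.
U = frozenset(range(4))
P = subsets(U)
commutative = all(X | Y == Y | X for X in P for Y in P)
idempotent = all(X | X == X for X in P)
absorbs = all(X | frozenset() == X and X | U == U for X in P)

# Proposition A.3.1: A ⊂ B if and only if A ∪ B = B.
prop_a31 = all((X <= Y) == (X | Y == Y) for X in P for Y in P)

print(sorted(union_example), commutative, idempotent, absorbs, prop_a31)
```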
The operation of intersection has many similarities with the operation of union. Given
two sets A and B, their intersection A ∩B is defined by
A ∩B = {x ∈ U |x ∈ A and x ∈ B} (A.3)
Notice that the definition is symmetric in A and B in the sense that A ∩ B = B ∩ A.
A list of elementary properties of the intersection operator is given here:
A ∩ ∅ = ∅ ,
A ∩B = B ∩ A ,
A ∩ (B ∩ C) = (A ∩B) ∩ C ,
A ∩ A = A ,
A ⊂ B if and only if A ∩B = A .
Notice that the last of these properties shows, along with Proposition A.3.1, that set
inclusion can either be described in terms of unions or intersections.
If two sets have no common elements, then they are said to be disjoint and we write
A ∩ B = ∅. In the case that we have a collection of sets, any two of which are disjoint,
then we say that the collection is pairwise disjoint.
We now have two operations defined on sets, union and intersection. The natural
question now is to ask how these two operations are related to each other. This question
is answered by the distributive laws:
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C) (A.4)
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C) . (A.5)
As an exercise intended to illustrate, once more, a set-theoretic argument, let us prove
the first of these distributive laws.
Proposition A.3.2 For any sets A,B and C subsets of the set U ,
A ∩ (B ∪ C) = (A ∩B) ∪ (A ∩ C) .
Proof: If x belongs to the left-hand side of this equation, then x ∈ A and either x ∈ B or x ∈ C
or both. If x ∈ B then x ∈ A ∩ B and so x is an element of the set on the right. Likewise,
if x ∈ C then x ∈ A ∩ C and so, again, x belongs to the right-hand side. This shows that
the set on the right-hand side includes the set on the left.
To prove the reverse inclusion, suppose x belongs to the set on the right. Then either
x ∈ A ∩ B or x ∈ A ∩ C or both. If x ∈ A ∩ B then x ∈ A and x ∈ B. So x ∈ B ∪ C and
hence x ∈ A ∩ (B ∪ C). Likewise, if x ∈ A ∩ C then x ∈ A and x ∈ C, and thus x ∈ B ∪ C. So in
this case as well, x ∈ A ∩ (B ∪ C) and x belongs to the left-hand set. Hence the reverse
inclusion is satisfied. We conclude that the sets on each side of the equality are the same
set.
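The element-chasing argument above amounts to a case analysis on whether a fixed element x lies in each of A, B, and C. A minimal sketch of that case analysis follows; encoding membership as Boolean triples is an illustrative device, not something the text itself does.

```python
from itertools import product

# Each triple (a, b, c) records whether a fixed element x lies in A, B, C.
cases = list(product([False, True], repeat=3))

# (A.4): x ∈ A ∩ (B ∪ C) exactly when x ∈ (A ∩ B) ∪ (A ∩ C).
law_a4 = all((a and (b or c)) == ((a and b) or (a and c))
             for a, b, c in cases)

# (A.5): x ∈ A ∪ (B ∩ C) exactly when x ∈ (A ∪ B) ∩ (A ∪ C).
law_a5 = all((a or (b and c)) == ((a or b) and (a or c))
             for a, b, c in cases)

print(len(cases), law_a4, law_a5)
```

Because membership in a union or intersection depends only on these three truth values, agreement in all eight cases is equivalent to the set identity.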
A.3.2 Set Differences, Complements, and DeMorgan’s Laws
The set-theoretic difference A \ B (also written A − B) is defined by
A \ B = {a ∈ A | a ∉ B} .
In many situations we are only interested in subsets of a given set X (the universe of
discourse). The complement A^c of a set A with respect to X is defined by
A^c = X \ A = {a ∈ X | a ∉ A} .
We can now formulate and prove De Morgan’s Laws. These are rules that relate complements
of unions to intersections of complements, and complements of intersections to
unions of complements. It is surprising how useful they are and how often they are used.4
In the case of just two sets A,B ⊂ X, these rules are simple to write down and
understand in terms of Venn diagrams and the reader is invited to do so. In this simple
case they read
X \ (A ∪ B) = (X \ A) ∩ (X \ B) , or (A ∪ B)^c = A^c ∩ B^c , (A.6)
and
X \ (A ∩ B) = (X \ A) ∪ (X \ B) , or (A ∩ B)^c = A^c ∪ B^c . (A.7)
Proposition A.3.3 Assume that A_1, A_2, ..., A_n are subsets of the set X. Then
(A_1 ∪ A_2 ∪ ... ∪ A_n)^c = A_1^c ∩ A_2^c ∩ ... ∩ A_n^c ,
and
(A_1 ∩ A_2 ∩ ... ∩ A_n)^c = A_1^c ∪ A_2^c ∪ ... ∪ A_n^c .
Proof: For the first part, assume that x ∈ (A_1 ∪ A_2 ∪ ... ∪ A_n)^c. Then x ∉ A_1 ∪ A_2 ∪ ... ∪ A_n,
and hence x ∉ A_i for every i = 1, 2, ..., n. This means that x ∈ A_i^c for all i and so
x ∈ A_1^c ∩ A_2^c ∩ ... ∩ A_n^c. So we have shown that
(A_1 ∪ A_2 ∪ ... ∪ A_n)^c ⊂ A_1^c ∩ A_2^c ∩ ... ∩ A_n^c .
To prove the reverse inclusion, assume that x ∈ A_1^c ∩ A_2^c ∩ ... ∩ A_n^c. This means that x ∈ A_i^c
for all i. So x ∉ A_i for all i = 1, 2, ..., n. It follows that x ∉ A_1 ∪ A_2 ∪ ... ∪ A_n which
means that x ∈ (A_1 ∪ A_2 ∪ ... ∪ A_n)^c. This completes the proof of the first equality. We
leave the (analogous) proof of the second equation as an exercise.
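A concrete instance may help fix the notation before attempting the exercise. In the sketch below the universe `X`, the three sets in `family`, and the helper `complement` are hypothetical choices made only for illustration.

```python
X = set(range(10))                               # universe of discourse
family = [{0, 1, 2}, {2, 3, 4}, {2, 4, 6, 8}]    # A_1, A_2, A_3

def complement(A, universe=X):
    """Complement of A with respect to the universe."""
    return universe - A

union_all = set().union(*family)
intersection_all = set(X).intersection(*family)

# First law: the complement of the union is the intersection of complements.
lhs_union = complement(union_all)
rhs_union = set(X).intersection(*(complement(A) for A in family))

# Second law: the complement of the intersection is the union of complements.
lhs_inter = complement(intersection_all)
rhs_inter = set().union(*(complement(A) for A in family))

print(sorted(lhs_union), sorted(rhs_union))
print(sorted(lhs_inter), sorted(rhs_inter))
```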
De Morgan’s Laws extend to arbitrary families of sets. We first extend the
notions of union and intersection to families of sets in the following way: if 𝒜 is a non-empty
family of sets, we define
⋃_{A ∈ 𝒜} A = {a ∈ U | a belongs to at least one set A ∈ 𝒜}
and
⋂_{A ∈ 𝒜} A = {a ∈ U | a belongs to all sets A ∈ 𝒜} .
The distributive laws and the Laws of De Morgan extend to this case in the obvious
ways, e.g.,
(⋃_{A ∈ 𝒜} A)^c = ⋂_{A ∈ 𝒜} A^c .
4 De Morgan’s Laws are frequently used in programming, in particular in the construction of sorting
algorithms. From the point of view of logic, they allow the substitution of equivalent statements, e.g.,
“not S or not T” being equivalent to “not both S and T”.
Families are often given as indexed sets. This means we have one basic set I (the index
set) and the family consists of one set A_i for each element i ∈ I. We then write the family
as
𝒜 = {A_i | i ∈ I} .
We may then write
⋃_{i ∈ I} A_i and ⋂_{i ∈ I} A_i
for unions and intersections. In this setting De Morgan’s Laws become
(⋃_{i ∈ I} A_i)^c = ⋂_{i ∈ I} A_i^c and (⋂_{i ∈ I} A_i)^c = ⋃_{i ∈ I} A_i^c .
Let us finish the section with a simple example.
Example A.3.4 For each rational number q ∈ Q we can take the set
C_q := {(x, y) ∈ R^2 | x^2 + y^2 = q^2}
which is just the circle of radius |q| centered at the origin. Then we can consider
C(Q) = {C_q | q ∈ Q} .
The set C(Q) is just the family of all circles in the plane R^2 with center at the origin and
rational radius (together with the degenerate circle C_0, which is the single point (0, 0)). Note that
this family can be thought of as a family of sets indexed by the rationals.
A.4 Ordered Pairs
In analytic geometry and elementary calculus it is common to introduce the coordinate, or
(x, y)-plane. The horizontal axis or axis of abscissae is associated with the value of x and
the vertical axis or axis of ordinates is associated with the y coordinate. The agreement
that the abscissa be listed first and the ordinate next in a certain sense gives a geometric
definition of the ordered pair (x, y). Likewise this construction gives a concrete example
of what is called the Cartesian product5 of two sets, in this case the two sets are two copies
of the real line.
In the theory of sets, we need a much more precise description of the notion of ordered
pair but its introduction makes things get very technical very quickly. We have given here
a quick summary for the sake of completeness.
The definition given here has the disadvantage of strangeness, but the decided advantage
of settling the problem of what we mean by a “first element” of an ordered pair. If
A = {a, b} and, in the desired order, a comes first, then we need a careful definition. Here
is the definition that we shall adopt.
Definition A.4.1 The ordered pair of a and b, with first coordinate a and second coordinate
b, is the set (a, b) defined by
(a, b) = {{a}, {a, b}} .
The definition clearly specifies the “first element”; it is the element that occurs in the
singleton set {a}. There are some technical difficulties that need to be addressed which
arise from the fact that the ordered pair is defined as a set of sets. Halmos ([?], pp.
23-24) deals with all of them. What is important is the statement that, given sets A and
B, there exists a set that contains all the ordered pairs (a, b) with a ∈ A and b ∈ B. This
set is called the Cartesian product of A and B, is written A×B, and is characterized by
the fact that
A × B = {x ∈ P[P(A ∪ B)] | x = (a, b) for some a ∈ A and some b ∈ B} .
Remark: Note that we have written x ∈ P[P(A ∪ B)] because (a, b) is a set of subsets of
A ∪ B. It follows that one ordered pair is a subset of P(A ∪ B), i.e., an element of
P[P(A ∪ B)], while a set of ordered pairs, such as A × B itself, is a subset of P[P(A ∪ B)].
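The set-of-sets definition can be experimented with directly, since Python's `frozenset` values may be nested. The following is a sketch under that assumption; the function names `kpair`, `first`, and `cartesian` are hypothetical and chosen only for this illustration.

```python
def kpair(a, b):
    """The ordered pair (a, b) = {{a}, {a, b}} built from nested frozensets."""
    return frozenset({frozenset({a}), frozenset({a, b})})

def first(p):
    """Recover the first coordinate: the element common to every member of p."""
    return next(iter(frozenset.intersection(*p)))

def cartesian(A, B):
    """The set of all pairs (a, b) with a in A and b in B."""
    return {kpair(a, b) for a in A for b in B}

# Order matters for pairs although it does not matter for sets.
print(kpair(1, 2) == kpair(2, 1))          # False
print({1, 2} == {2, 1})                    # True

# When a == b the pair collapses to {{a}}, yet it is still a valid pair.
print(kpair(1, 1) == frozenset({frozenset({1})}))   # True
print(first(kpair(1, 2)), first(kpair(2, 1)))       # 1 2
print(len(cartesian({1, 2}, {1, 2})))               # 4
```

Each pair built this way is a set of subsets of A ∪ B, which is why the defining formula for A × B draws its elements from P[P(A ∪ B)].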
5 After all, it was René Descartes who introduced numerical components into geometry.
If R ⊂ A × B then the sets
R_A = {a ∈ A | for some b ∈ B, (a, b) ∈ R}
and
R_B = {b ∈ B | for some a ∈ A, (a, b) ∈ R}
are called the projections of R onto the first and second coordinates respectively.
Having seen the rigorous definition, we will henceforth treat ordered pairs less formally
as is usually done. Again, there are several facts that can be easily checked and we leave
them as exercises:
Exercise A.4.2 If A,B,X, and Y are sets, then
(a) If either A = ∅ or B = ∅, then A×B = ∅. The converse is also true.
(b) A ⊂ X and B ⊂ Y implies A × B ⊂ X × Y. If A × B ≠ ∅ then A × B ⊂ X × Y
implies A ⊂ X and B ⊂ Y.
(c) The following distributive laws hold:
(i) (A ∪ B) × X = (A × X) ∪ (B × X) ;
(ii) (A ∩ B) × (X ∩ Y) = (A × X) ∩ (B × Y) ;
(iii) (A − B) × X = (A × X) − (B × X).
A.5 Binary Relations and Equivalence Relations
We start with two sets A and B. Then a binary relation R on a set A×B is a proposition
such that, for every ordered pair (a, b) ∈ A×B, one can decide if a is related to b or not.
It is simply a restricted set of ordered pairs. Formally,
Definition A.5.1 A binary relation in a set A×B is a subset R ⊂ A×B. The statement
“(a, b) ∈ R” is written as aR b.
Example A.5.2 (a) For any set A×A6 the diagonal ∆ = {(a, a) | a ∈ A} is the relation
of equality. The relation [(A× A) \∆] is the relation of inequality.
(b) The relation ≤ between two real numbers is the set
{(x, y) ∈ R × R | x coincides with or lies to the left of y} ⊂ R × R .
(c) For a set X, the relation of set inclusion B ⊂ A in P(X) is given by
{(A, B) ∈ P(X) × P(X) | every element of B is an element of A} .
(d) For any set X, let R be the relation on X×P(X) defined by (x,A) ∈ R if and only
if x ∈ A. This is the relation of membership in a set.
If A × B is a set with a binary relation R and C ⊂ A, D ⊂ B then the relation
R ∩ (C × D) is a binary relation on the set C × D. It is called the relation induced by R on C × D.
Of all the relations, one of the most important is the equivalence relation. We will denote
such a relation by the symbol ∼ and write a ∼ b when we mean that a is equivalent to b.
We will also say that “a is similar to b”.
Definition A.5.3 An equivalence relation on a set X is a binary relation on X which is
reflexive, symmetric and transitive, i.e.
(a) for all a ∈ X : a ∼ a (reflexive).
(b) a ∼ b implies b ∼ a (symmetric).
(c) a ∼ b and b ∼ c implies a ∼ c (transitive).
We begin with some simple examples.
Example A.5.4
(a) The relation ∆ is an equivalence relation.
(b) In N the relation {(x, y) ∈ N × N | x − y is divisible by 2} is an equivalence relation.
(c) Let f : X → Y be a function. Then {(x_1, x_2) ∈ X × X | f(x_1) = f(x_2)} is an
equivalence relation on X.
(d) Let T be the set of all triangles in the plane R^2. Then the relation of congruence,
familiar from elementary Euclidean geometry, is an equivalence relation.
6 We will often say that a relation is defined on A to mean a relation on A × A, a slight abuse of
language that should cause no problem.
Let us check the assertion (b). First, reflexivity. For all x we have x − x = 0 and 0
is divisible by 2. Hence the relation is reflexive. Moreover, since y − x = (−1) (x − y)
it is clear that if 2|(x − y) then 2|(y − x). Hence the relation is symmetric. Finally, if
2|(x− y) and 2|(y− z), then since x− z = (x− y) + (y− z), it is clear that 2|(x− z). So
the relation is also transitive and hence is an equivalence relation.
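On a finite set, the three defining properties can be tested directly from the set of ordered pairs. The sketch below assumes a finite stand-in `X` for N; the function names and the comparison relation `less_equal` are hypothetical additions for contrast.

```python
def is_reflexive(R, X):
    """Every x in X is related to itself."""
    return all((x, x) in R for x in X)

def is_symmetric(R):
    """Whenever (x, y) is in R, so is (y, x)."""
    return all((y, x) in R for (x, y) in R)

def is_transitive(R):
    """Whenever (x, y) and (y, w) are in R, so is (x, w)."""
    return all((x, w) in R for (x, y) in R for (z, w) in R if y == z)

X = range(1, 9)   # finite stand-in for the natural numbers
same_parity = {(x, y) for x in X for y in X if (x - y) % 2 == 0}
less_equal = {(x, y) for x in X for y in X if x <= y}

print(is_reflexive(same_parity, X), is_symmetric(same_parity),
      is_transitive(same_parity))
print(is_reflexive(less_equal, X), is_symmetric(less_equal),
      is_transitive(less_equal))
```

The relation ≤ is reflexive and transitive but not symmetric, so it is an order relation rather than an equivalence relation.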
Suppose that ∼ is an equivalence relation on the set X. If x ∈ X let E(x; ∼) denote the
set of all elements y ∈ X such that x ∼ y. The set E(x; ∼) is called the equivalence class
of x for the equivalence relation ∼. Since ∼ is an equivalence relation, the equivalence
classes have the following properties:
1. Each E(x; ∼) is non-empty for, since x ∼ x, x ∈ E(x; ∼).
2. Let x and y be elements of X. Since ∼ is symmetric, y ∈ E(x; ∼) if and only if
x ∈ E(y; ∼).
3. If x, y ∈ X the equivalence classes E(x; ∼) and E(y; ∼) are either identical or they
have no members in common.
Indeed, suppose, first, that x ∼ y. Let z ∈ E(x; ∼), so that x ∼ z. Then, by symmetry,
z ∼ x. Hence, by transitivity, z ∼ y and so, by symmetry,
y ∼ z, that is, z ∈ E(y; ∼). This shows that E(x; ∼) ⊂ E(y; ∼). Since y ∼ x by the
symmetry of ∼, the same argument with the roles of x and y interchanged shows that
E(y; ∼) ⊂ E(x; ∼). Hence E(x; ∼) = E(y; ∼).
Finally, notice that if the points x, y ∈ X are not related then E(x; ∼) ∩ E(y; ∼) = ∅.
Indeed, if z ∈ E(x; ∼) ∩ E(y; ∼) then x ∼ z and y ∼ z, and so x ∼ z and z ∼ y.
Therefore x ∼ y, which is a contradiction.
These facts lead to the following assertions concerning the family, F , of equivalence
classes of the equivalence relation ∼:
1. Every element of the family F is non-empty.
2. Each element x ∈ X belongs to one and only one of the sets in the family F .
3. x ∼ y if and only if x and y belong to the same set in the family F .
Otherwise said, an equivalence relation subdivides a set (or partitions the set) into the
union of a family of non-overlapping, non-empty subsets. Since, in most discussions, there
is only one equivalence relation that is relevant, we will often write simply E(x) instead
of E(x; ∼) if no confusion can arise.
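The partition described above can be computed for a finite set. The sketch below is one possible way to do it and assumes that the supplied predicate really is an equivalence relation; the function name `equivalence_classes` and the sample relation (difference divisible by 3) are hypothetical.

```python
def equivalence_classes(X, related):
    """Group the elements of a finite set X into classes E(x) = {y | x ~ y}.

    `related(a, b)` is assumed to be an equivalence relation on X.
    """
    classes = []
    for x in X:
        for cls in classes:
            # Classes are identical or disjoint, so comparing x with a
            # single representative of each class is enough.
            if related(next(iter(cls)), x):
                cls.add(x)
                break
        else:
            classes.append({x})
    return classes

X = set(range(10))
classes = equivalence_classes(X, lambda a, b: (a - b) % 3 == 0)
print(sorted(sorted(c) for c in classes))
```

Every element of `X` lands in exactly one class, and no class is empty, which matches the three assertions about the family of equivalence classes.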
Here is an example which is perhaps the first most students see when they discuss
number systems.
Example A.5.5 In the construction of the rational numbers, which we will denote by Q,
we first introduce ratios of integers p/q where p ∈ N and q ∈ N. If p/q represents a point
on the number line, then the ratios kp/kq must represent the same point and hence the
same rational number. Thus, two ratios p/q and r/s represent the same rational number
and can be treated as equal and can be substituted for one another in proofs involving
rational numbers whenever the equality
ps = rq
is true.
Now, let us define a relation on N × N by (p, q) ∼ (r, s) if and only if ps = rq. We
check that this is an equivalence relation as follows:
(a) (Reflexivity): pq = pq hence (p, q) ∼ (p, q).
(b) (Symmetry): If ps = rq then rq = ps and so
(p, q) ∼ (r, s) implies (r, s) ∼ (p, q) .
(c) (Transitivity): If ps = rq and rt = vs, then
(pt) · s = (ps) · t = (rq) · t = (rt) · q = (vs) · q = (vq) · s
and thus pt = vq since s ≠ 0. Hence (p, q) ∼ (r, s) and (r, s) ∼ (v, t) implies
(p, q) ∼ (v, t).
This argument shows that the rational numbers can be viewed as equivalence classes
of ratios of integers modulo the relation ∼ given in the example.
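As a small numerical companion to this example, the cross-multiplication test can be compared with Python's exact rational type. The function `same_ratio` and the range of sample pairs are hypothetical; `fractions.Fraction` is part of the standard library and reduces fractions to lowest terms.

```python
from fractions import Fraction

def same_ratio(pq, rs):
    """(p, q) ~ (r, s) if and only if p*s == r*q."""
    (p, q), (r, s) = pq, rs
    return p * s == r * q

# Sample pairs of positive integers.
pairs = [(p, q) for p in range(1, 7) for q in range(1, 7)]

# The relation agrees with equality of the corresponding reduced fractions.
agrees = all(same_ratio(a, b) == (Fraction(*a) == Fraction(*b))
             for a in pairs for b in pairs)

# The pairs in the sample that represent the rational number 1/2.
class_of_half = sorted(a for a in pairs if same_ratio(a, (1, 2)))

print(agrees, class_of_half)
```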
As a final example consider the following:
Example A.5.6 Consider the set Z and let n be a fixed positive integer. Define a
relation ∼_n by
x ∼_n y provided (x − y) is divisible by n .
This relation is called the relation of congruence modulo n. It is easy to check that
this is an equivalence relation on Z. (See the special case for n = 2 treated above.)
Moreover, there are n equivalence classes. Each integer x is uniquely expressible in the
form x = q n + r, where q and r are integers and 0 ≤ r ≤ n − 1. (The integers q and r are
called the quotient and the remainder respectively.) Hence each x is congruent modulo
n to one of the n integers 0, 1, ..., n − 1. The equivalence classes are
E_0 = {..., −2n, −n, 0, n, 2n, ...}
E_1 = {..., 1 − 2n, 1 − n, 1, 1 + n, 1 + 2n, ...}
⋮
E_{n−1} = {..., n − 1 − 2n, n − 1 − n, n − 1, n − 1 + n, n − 1 + 2n, ...}
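The quotient and remainder in this example correspond to Python's built-in `divmod`, which for a positive divisor returns a remainder between 0 and n − 1 even when x is negative. The sketch below uses that fact to sort a finite window of integers into residue classes; the function name `residue_classes` and the window bounds are hypothetical.

```python
def residue_classes(n, lo, hi):
    """Split the integers lo..hi (inclusive) into the classes E_0, ..., E_{n-1}."""
    classes = {r: [] for r in range(n)}
    for x in range(lo, hi + 1):
        q, r = divmod(x, n)      # x == q*n + r with 0 <= r <= n - 1 when n > 0
        classes[r].append(x)
    return classes

E = residue_classes(3, -6, 6)
for r in sorted(E):
    print(r, E[r])
```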
The domain of a relation R on X × Y is the set of all first coordinates of
the members of R while, in this context, the range is the set of all second coordinates.
Formally,
dom (R) = {x ∈ X | for some y ∈ Y, (x, y) ∈ R} ,
while
rng (R) = {y ∈ Y | for some x ∈ X, (x, y) ∈ R} .
The inverse of a relation R, denoted R^{-1}, is obtained by reversing each of the pairs
belonging to R. Thus
R^{-1} = {(y, x) ∈ Y × X | (x, y) ∈ R} .
Hence the domain of the inverse is the range of R and the range of R^{-1} is always the
domain of R.
If S is a relation on X × Y and R is a relation on Y × Z, then the composition R ◦ S is defined as
{(x, z) ∈ X × Z | for some y ∈ Y, (x, y) ∈ S and (y, z) ∈ R} .
Example A.5.7 If R = {(1, 2)} and S = {(0, 1)} then R ◦ S = {(0, 2)} while S ◦ R = ∅.
Concerning compositions and inverses we have the following result
Proposition A.5.8 Let R, S, and T be relations. Then
(a) (R^{-1})^{-1} = R.
(b) (R ◦ S)^{-1} = S^{-1} ◦ R^{-1} .
(c) R ◦ (S ◦ T) = (R ◦ S) ◦ T .
Proof: (of (b)) We have
(z, x) ∈ (R ◦ S)^{-1} ⇔ (x, z) ∈ R ◦ S ⇔ for some y,
(x, y) ∈ S and (y, z) ∈ R.
Consequently, (z, x) ∈ (R ◦ S)^{-1} if and only if (z, y) ∈ R^{-1} and (y, x) ∈ S^{-1} for some y.
But this is the condition that (z, x) ∈ S^{-1} ◦ R^{-1}.
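Composition and inversion of finite relations are short to write down, which makes it easy to test the order of composition on examples. The sketch below follows the definition of R ◦ S given above (S is applied first); the function names and the relations `R2`, `S2` are hypothetical.

```python
def compose(R, S):
    """R ∘ S = {(x, z) | for some y, (x, y) in S and (y, z) in R}."""
    return {(x, z) for (x, y) in S for (y2, z) in R if y == y2}

def inverse(R):
    """Reverse every pair of R."""
    return {(y, x) for (x, y) in R}

# The one-pair relations R = {(1, 2)} and S = {(0, 1)}.
R = {(1, 2)}
S = {(0, 1)}
print(compose(R, S))    # {(0, 2)}
print(compose(S, R))    # set()

# A slightly larger, hypothetical pair of relations for parts (a) and (b).
R2 = {(1, "a"), (2, "a"), (2, "b")}
S2 = {(0, 1), (0, 2), (5, 2)}
part_a = inverse(inverse(R2)) == R2
part_b = inverse(compose(R2, S2)) == compose(inverse(S2), inverse(R2))
print(part_a, part_b)
```

Note that composition is not commutative: the two orders of composing `R` and `S` give different relations.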
A.6 Functions or Maps
We now define the idea of a function (or a mapping) in terms of sets. This is not so
unusual since we often think of a function in terms of its graph which consists of a set
of ordered pairs: given two sets X and Y a function is determined provided we specify
a set of ordered pairs (the graph of the function) in X × Y with the additional property
that no two distinct pairs have the same first element. Hence a function is a particular
example of a relation! Not every subset of ordered pairs will do however; to be a function
the ordered pairs must satisfy a particular condition.
Definition A.6.1 Let X and Y be two sets. A map f : X −→ Y (or a function with
domain X and range Y ) is a subset f ⊂ X × Y with the property: for each x ∈ X, there
is one, and only one, y ∈ Y satisfying (x, y) ∈ f .
It is usual to write y = f(x) instead of (x, y) ∈ f and say that “y is the value f assumes
at x”, or that “y is the image of x under f”, or that “f sends x to y”. The usual way to
define a map is to specify its domain X and the value of the function at each x ∈ X. We
often write x ↦ f(x). Here are some examples.
Example A.6.2
(a) Suppose k ∈ Y is fixed. Then the map defined for all x ∈ X by x ↦ k is called
a constant map. Note that a map need not send distinct points of X to distinct
points of Y, nor do we require it to take on all values in its range.
(b) The map x ↦ x of X onto itself is called the identity map on X. We will often
write this as Id_X.
(c) If A ⊂ X the map i : A → X given by a ↦ a is called the inclusion map of A into
X.
(d) For any sets X, Y the map p_1 : X × Y → X determined by (x, y) ↦ x is called
the “projection onto the first coordinate”. Similarly p_2 : X × Y → Y given by
(x, y) ↦ y is called the “projection onto the second coordinate”.
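Since a map is defined as a particular kind of subset of X × Y, the defining property can be tested on finite examples. In the sketch below the checker `is_function` and all four sample relations are hypothetical illustrations.

```python
def is_function(f, X, Y):
    """True if f ⊂ X × Y and each x in X has exactly one y with (x, y) in f."""
    if not all(x in X and y in Y for (x, y) in f):
        return False
    return all(sum(1 for (a, _) in f if a == x) == 1 for x in X)

X = {1, 2, 3}
Y = {"a", "b"}

constant = {(x, "a") for x in X}          # the constant map x ↦ "a"
identity = {(x, x) for x in X}            # the identity map on X
two_values = {(1, "a"), (1, "b"), (2, "a"), (3, "a")}   # 1 has two values
undefined_at_3 = {(1, "a"), (2, "b")}                   # 3 has no value

print(is_function(constant, X, Y), is_function(identity, X, X))
print(is_function(two_values, X, Y), is_function(undefined_at_3, X, Y))
```

The constant map passes the test even though it never takes the value "b", which illustrates that a map need not take on every value in Y.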
We have stated above that it is not required that every point in the range be a value
that is taken on by a given function. That is the motivation for the following definition.
Definition A.6.3 Let f : X → Y. Then
(1) For each A ⊂ X, f(A) = {f(x) ∈ Y | x ∈ A} ⊂ Y is called the image of A in Y
under f.
(2) For each B ⊂ Y, f^{-1}(B) = {x ∈ X | f(x) ∈ B} is called the inverse image of B in
X under f.
Again, it is a good idea to see some examples.
Example A.6.4 (a) Let X = [−1, 1], Y = [0, 2] and f : X → Y be given by x ↦ x^2.
Then f^{-1}({1/4}) = {−1/2, 1/2}. This shows that the inverse image of a single point
may well be a set of several points in the domain. This cannot happen, of course, if f is one-to-one
(see definition below).7
(b) Let X = [−1, 1], Y = [0, 2], and f : X → Y be x ↦ x^2. Then f([0, 1/2]) = [0, 1/4] and
f^{-1}([0, 1/4]) = [−1/2, 1/2].
(c) Let f : X → Y. If p_1, p_2 are the projections defined in the preceding example
(A.6.2, part (d)), we have f(A) = p_2[f ∩ (A × Y)] and f^{-1}(B) = p_1[f ∩ (X × B)].
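Images and inverse images are easy to compute when the domain is finite. The sketch below replaces the interval [−1, 1] by a finite sample of quarter-integer points and uses exact rational arithmetic; the helper names `image` and `preimage` and the sample grid are assumptions made for illustration.

```python
from fractions import Fraction as Fr

def image(f, A):
    """f(A) = {f(x) | x in A}."""
    return {f(x) for x in A}

def preimage(f, X, B):
    """f^{-1}(B) = {x in X | f(x) in B}; the domain X is passed explicitly."""
    return {x for x in X if f(x) in B}

def square(x):
    return x * x

# Finite sample of [-1, 1]: the points k/4 for k = -4, ..., 4.
X = {Fr(k, 4) for k in range(-4, 5)}

# Inverse image of the single point 1/4.
print(sorted(preimage(square, X, {Fr(1, 4)})))

# Image of the sample points lying in [0, 1/2].
A = {x for x in X if 0 <= x <= Fr(1, 2)}
print(sorted(image(square, A)))
```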
It is useful to think, explicitly, about how a function f : X → Y induces a map from
P(X) to P(Y). This induced map is defined by A ↦ f(A); we call this induced map f
as well. Likewise, f : X → Y also induces a map f^{-1} : P(Y) → P(X) by B ↦ f^{-1}(B),
called the inverse map. Of these two maps, the better behaved is the inverse map f^{-1}
and, in some sense, it is the more important.
Proposition A.6.5 : Let f : X → Y . Then the inverse map f⁻¹ : P(Y ) → P(X)
preserves union and intersection. Precisely,
(a) f⁻¹(B₁ ∪ B₂) = f⁻¹(B₁) ∪ f⁻¹(B₂).
(b) f⁻¹(B₁ ∩ B₂) = f⁻¹(B₁) ∩ f⁻¹(B₂).
Proof: We leave (a) as an exercise and prove (b).
x ∈ f⁻¹(B₁ ∩ B₂) if and only if f(x) ∈ (B₁ ∩ B₂) if and only if f(x) ∈ B₁ and f(x) ∈ B₂
if and only if x ∈ f⁻¹(B₁) and x ∈ f⁻¹(B₂)
if and only if x ∈ [f⁻¹(B₁) ∩ f⁻¹(B₂)] .
As a corollary, we can restate the result for arbitrary intersections; the proof is analogous.
⁷While we usually make a careful distinction between a singleton set {x} ⊂ X and a point x ∈ X we
often abuse notation and write simply f⁻¹(y) instead of f⁻¹({y}).
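Corollary A.6.6 Let f : X → Y and let f⁻¹ be the inverse map. If $\mathcal{B}$ is a family
of subsets of the set Y , then
$$ f^{-1}\Big(\bigcup_{B \in \mathcal{B}} B\Big) = \bigcup_{B \in \mathcal{B}} f^{-1}(B) \quad\text{and}\quad f^{-1}\Big(\bigcap_{B \in \mathcal{B}} B\Big) = \bigcap_{B \in \mathcal{B}} f^{-1}(B) . $$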
We have said that f⁻¹ is better behaved because the last result is not true for the
induced map f . Indeed we have the counterexample:
Example A.6.7 : Let f : R → R be the constant map x ↦ 1. Let A = [0, 1] and
B = [2, 3]. Then A ∩ B = ∅ and so
∅ = f(A ∩ B) ≠ f(A) ∩ f(B) = {1} .
We do find, however, that f preserves unions as the next result shows.
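Proposition A.6.8 Let $\mathcal{A}$ be a family of subsets of the set X. If f : X → Y , then for
the induced map f : P(X) → P(Y ) we have
$$ f\Big(\bigcup_{A \in \mathcal{A}} A\Big) = \bigcup_{A \in \mathcal{A}} f(A) , \quad\text{and}\quad f\Big(\bigcap_{A \in \mathcal{A}} A\Big) \subset \bigcap_{A \in \mathcal{A}} f(A) . $$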
We leave the proof of this last result to the reader.
Note that, in general, we do not have equality in the last case as the above example
shows. To further clarify matters let us look at another example.
Example A.6.9 Let X = {x₁, x₂} and Y = {y}. Define f : X → Y by f(x₁) = f(x₂) =
y, and let A₁ = {x₁}, A₂ = {x₂}. Then A₁ ∩ A₂ = ∅ and consequently f(A₁ ∩ A₂) = ∅.
On the other hand, f(A₁) = f(A₂) = {y} and so f(A₁) ∩ f(A₂) = {y}. This means that
f(A₁ ∩ A₂) ≠ f(A₁) ∩ f(A₂).
The problem here stems from the fact that y belongs to both f(A₁) and f(A₂) but only
as the image of two different elements x₁ ∈ A₁ and x₂ ∈ A₂; there is no common element
x ∈ A₁ ∩ A₂ which is mapped into y. This cannot happen if f is one-to-one.
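The contrast between the induced map and the inverse map can be checked mechanically on small finite sets. The sketch below is illustrative only: finite maps are represented as Python dictionaries, the names `image` and `preimage` are hypothetical helpers, and the map `g` is made-up test data.

```python
def image(f, A):
    """Image of A under the finite map f (a dict from domain to codomain)."""
    return {f[x] for x in A}


def preimage(f, B):
    """Inverse image of B under the finite map f."""
    return {x for x in f if f[x] in B}


# The map of Example A.6.9: both points of X are sent to y.
f = {"x1": "y", "x2": "y"}
A1, A2 = {"x1"}, {"x2"}

image_of_meet = image(f, A1 & A2)              # f(A1 ∩ A2)
meet_of_images = image(f, A1) & image(f, A2)   # f(A1) ∩ f(A2)
print(image_of_meet, meet_of_images)

# A hypothetical map used to illustrate that inverse images preserve ∩ and ∪.
g = {1: "a", 2: "a", 3: "b", 4: "c"}
B1, B2 = {"a", "b"}, {"b", "c"}
print(preimage(g, B1 & B2) == preimage(g, B1) & preimage(g, B2))
print(preimage(g, B1 | B2) == preimage(g, B1) | preimage(g, B2))
```

Here the image of the intersection is empty while the intersection of the images is {y}, so only the inclusion of Proposition A.6.8 holds; both checks for the inverse map succeed.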
As for compositions, we have the usual result familiar from calculus.
Proposition A.6.10 Let f : X → Y and g : Y → Z. Then (g ◦ f)⁻¹ = f⁻¹ ◦ g⁻¹.
Proof:
x ∈ (g ◦ f)⁻¹(C) if and only if g ◦ f(x) ∈ C if and only if f(x) ∈ g⁻¹(C)
if and only if x ∈ f⁻¹[g⁻¹(C)] if and only if x ∈ f⁻¹ ◦ g⁻¹(C) .
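For finite maps the identity of Proposition A.6.10 can be verified by brute force over all subsets C of Z. This is a small sketch under stated assumptions: the dictionaries `f` and `g` are made-up data, and `compose`, `preimage` and `subsets` are hypothetical helper names.

```python
from itertools import combinations


def preimage(f, B):
    """Inverse image of B under the finite map f (a dict)."""
    return {x for x in f if f[x] in B}


def compose(g, f):
    """The composite map g ∘ f as a dict on the domain of f."""
    return {x: g[f[x]] for x in f}


def subsets(S):
    """All subsets of the finite set S."""
    items = list(S)
    return [set(c) for r in range(len(items) + 1) for c in combinations(items, r)]


f = {1: "a", 2: "b", 3: "b"}        # f : X -> Y
g = {"a": 0, "b": 1, "c": 1}        # g : Y -> Z
Z = {0, 1}

g_after_f = compose(g, f)
holds = all(
    preimage(g_after_f, C) == preimage(f, preimage(g, C)) for C in subsets(Z)
)
print(holds)
```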
If f : X → Y takes on every value in its range, f is called surjective (or a surjection or
onto). Note that, for a surjective f we have for all B ⊂ Y , f [f⁻¹(B)] = B.
If f sends distinct elements of X to distinct elements of Y , then f is called injective (or
an injection or one-to-one). Otherwise said, f is injective provided that x₁ ≠ x₂ implies
f(x₁) ≠ f(x₂). This is equivalent to the statement that f(x₁) = f(x₂) if and only if
x₁ = x₂.
A function that is both injective and surjective is called bijective or a bijection. Note
that f is a bijection if and only if for all y ∈ Y , f⁻¹({y}) is a single point. In this case
(f⁻¹)⁻¹ = f .
Example A.6.11 Consider the mapping f : R → R defined by x ↦ 2x + 3. Then f is
certainly a bijection with inverse mapping $y \mapsto \tfrac{1}{2}y - \tfrac{3}{2}$. Indeed $\tfrac{1}{2}(2x + 3) - \tfrac{3}{2} = x$.
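The definitions of injective, surjective and bijective maps are easy to test on finite maps, and the inverse in Example A.6.11 can be spot-checked with exact rational arithmetic. The sketch below is only illustrative: the helper names are hypothetical, the dictionaries are made-up data, and checking finitely many sample points does not of course prove bijectivity on all of R.

```python
from fractions import Fraction


def is_injective(m):
    """A finite map (dict) is injective when no value is repeated."""
    return len(set(m.values())) == len(m)


def is_surjective(m, Y):
    """A finite map is surjective onto Y when its values exhaust Y."""
    return set(m.values()) == set(Y)


def is_bijective(m, Y):
    return is_injective(m) and is_surjective(m, Y)


def f(x):
    return 2 * x + 3


def f_inverse(y):
    return Fraction(1, 2) * y - Fraction(3, 2)


# Spot check of Example A.6.11 on a few rational sample points.
samples = [Fraction(k, 3) for k in range(-6, 7)]
round_trip = all(f_inverse(f(x)) == x and f(f_inverse(x)) == x for x in samples)

constant = {1: "k", 2: "k"}     # a constant map: surjective onto {"k"}, not injective
relabel = {1: "a", 2: "b"}      # a bijection onto {"a", "b"}
print(round_trip, is_injective(constant), is_bijective(relabel, {"a", "b"}))
```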
Finally, we return to the notion of an indexed family of sets that we met in Section A.3.
Let I be any set and F a family of subsets of a universal set U . Suppose, moreover, that
f : I −→ F . Then we write f(i) = Aᵢ. In this way the function f is said to index the
sets {Aᵢ ∈ F | f(i) = Aᵢ, i ∈ I}. If the map f is surjective, then we say that the family
F has been indexed by I.
In particular, if I = N, then specifying such a function simply defines what we usually
mean by a sequence of sets, and we write {A₁, A₂, . . .}. In the case that f is surjective,
we say that F is a countable family of subsets of U .
Example A.6.12 In Rⁿ, let B(0) be the set of all sets of the form
$$ \Big\{ x \in \mathbb{R}^n \;\Big|\; \sum_{i=1}^{n} x_i^2 < r^2 \Big\} , \quad r \in \mathbb{R} . $$
Each such set consists of the points in Rⁿ whose distance from the origin is less than r.
Denote these sets by B_r(0). Then the sets B_k(0), k ∈ N, form a countable subset of B(0).
A.7 Orderings on Sets
We will frequently meet certain binary relations on various sets which are called orderings
of which there are several types. Such relations are used in economics, for example, to
describe preferences of various agents. Thus suppose that an n-vector x = (x₁, x₂, . . . , xₙ) ∈ Rⁿ
represents a “bundle” of goods available to consumers, xᵢ representing the amount of
good i in the bundle. Thus, for example, if the first component represents the number of
refrigerators measured in units while the second component represents wheat measured
in bushels then (2, 3.659, . . .) ∈ Rn is a bundle of goods consisting, among other things,
of two refrigerators and 3.659 bushels of wheat.
In describing consumer behavior, we generally make the assumption that one and only
one of the following alternatives holds:
1. a bundle x is preferred to the bundle y;
2. the consumer is indifferent in the choice of two bundles;
3. the bundle y is preferred to the bundle x.
Note that these alternatives, taken together, imply that the consumer can decide unam-
biguously between two bundles.
This situation suggests that we introduce some kind of structure that reflects preference,
i.e., some way of ordering bundles to reflect consumer desires.
Definition A.7.1 A binary relation R in a set A is said to be a preorder on A if it is
reflexive and transitive, i.e.,
(a) for all a ∈ A , aR a.
(b) If aR b and bR c then aR c.
A set, together with a definite preorder, is called a preordered set. It is traditional to write
a preorder with the symbol ≺ or with ≼. Thus “a precedes b”, or “b is preceded by a”,
or, in economics, “b is preferred to a” is written a ≺ b. The symbol (A,≺) denotes a
preordered set. Notice that if B ⊂ A and if A is preordered by ≺ then, by default, this
preorder induces a preorder on B.
Example A.7.2 (a) For any set A, the relation ∆ ⊂ A × A (the diagonal relation)
is a preorder and a ≺ b means a = b. Note that we do not assume in the definition
that any two elements can be compared. In other words, we do not require that
either a ≺ b or b ≺ a for all a, b ∈ A.
(b) In the set R the relation {(x, y) ∈ R× R |x ≤ y} is a preorder. On the other hand
{(x, y) ∈ R× R |x < y} is not a preorder. (Why?)
(c) (IMPORTANT!) For any set X, consider the power set P(X). The relation A ≺ B
defined by
A ≺ B if and only if A ⊂ B
is a preordering of P(X). In this particular case, we say that P(X) is preordered by
inclusion.
By putting other conditions on a preordering, different types of orders can be obtained.
Definition A.7.3 If a preordering on A satisfies the additional property of antisymmetry,
i.e.,
a ≺ b and b ≺ a if and only if a = b ,
then it is called a partial ordering. In this case A is called a partially ordered set.
As another, and important, example we consider the following.
Example A.7.4 Consider the set Rⁿ. Let x = (x₁, x₂, . . . , xₙ) and y = (y₁, y₂, . . . , yₙ).
Then we can introduce a partial order ≺ in Rⁿ by
x ≺ y if and only if xᵢ ≤ yᵢ for all i = 1, 2, . . . , n.
Here ≤ is the usual ordering on the real line. Note that this is certainly a reflexive,
transitive, and anti-symmetric relation so that ≺ is indeed a partial ordering of Rⁿ. Note
further that not every two elements can be compared. Thus, for example, in R², the
vectors (1, 2)⊤ and (2, 1)⊤ are not comparable.
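The componentwise order of Example A.7.4 is straightforward to express in code. The following is a minimal sketch, with `precedes` and `comparable` as hypothetical helper names and vectors represented as tuples of equal length.

```python
def precedes(x, y):
    """Componentwise order on R^n: x ≺ y iff x_i <= y_i for every i."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    return all(xi <= yi for xi, yi in zip(x, y))


def comparable(x, y):
    """Two vectors are comparable when one precedes the other."""
    return precedes(x, y) or precedes(y, x)


print(precedes((1, 2), (1, 3)))     # every coordinate of the first is <= the second
print(comparable((1, 2), (2, 1)))   # neither vector precedes the other
```

The second call illustrates why this order is partial rather than total: the bundles (1, 2) and (2, 1) each contain more of one good and less of the other.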
This last example contrasts with the usual ordering ≤ on the real line, R, where every
element can be compared. This leads to an important special case of a partial order.
Definition A.7.5 Let A be a set. A total or linear order on the set A is a partial order
≺ such that, for all x, y ∈ A with x ≠ y, either x ≺ y or y ≺ x whenever x and y are both in
the domain and range of the order relation.
The usual order on the real line is the obvious example. We remark that, in this
terminology, a chain in a partially ordered set is a totally ordered family.
The set P(X) is partially ordered by inclusion since the preorder is also antisymmetric.
In general, the set P(X) is not a chain. In this context, what does a chain look like? One
example is the following: for each n ∈ N, let Aₙ ∈ P(X) and suppose that, for each n,
Aₙ₊₁ ⊂ Aₙ and Aₙ ≠ Aₙ₊₁. Then the set of subsets $\{A_n\}_{n=1}^{\infty}$ constitutes a chain in P(X)
with respect to the partial ordering of inclusion.
Example A.7.6 Let X = [−1, 1] ⊂ R and let $A_n = [-\tfrac{1}{n}, \tfrac{1}{n}]$, n = 1, 2, · · · . Then this family
forms a chain, namely
$$ [-1, 1] \supset [-\tfrac{1}{2}, \tfrac{1}{2}] \supset [-\tfrac{1}{3}, \tfrac{1}{3}] \supset \cdots $$
Here is another example which is important in a number of applications including
integer programming algorithms and sorting algorithms in computer science.
Example A.7.7 (Lexicographical Order) Let X be the set of all infinite sequences of
real numbers. Define a relation ≺_L on X by
a ≺_L b provided, for the smallest integer $i_o$ such that $a_{i_o} \neq b_{i_o}$, we have $a_{i_o} < b_{i_o}$.
This order is called lexicographical order since it is the same kind of order used in
common dictionaries. In fact, this order is a total, or linear, order. Indeed, it is clearly
reflexive. To check transitivity, note that if a ≺_L b and b ≺_L c then for minimal $i_o$ and
$j_o$, we have $a_{i_o} < b_{i_o}$ and $b_{j_o} < c_{j_o}$. If $i_o < j_o$ then $a_{i_o} < b_{i_o} = c_{i_o}$ and so a ≺_L c. On the
other hand, if $j_o < i_o$ then $a_{j_o} = b_{j_o} < c_{j_o}$ and so, again, a ≺_L c. Finally, if $i_o = j_o$ then
$a_{i_o} < b_{i_o} < c_{i_o}$ and once more a ≺_L c.
The property of anti-symmetry is easy to check. Finally, since any two sequences can
be compared, this partial ordering is indeed a total order.
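Lexicographic comparison is easy to experiment with on finite tuples, which serve here as a stand-in for the infinite sequences of the example. The sketch below rests on two assumptions that are not in the text: the tuples have equal length, and the function implements the strict version of the order, returning False for equal tuples. The name `lex_precedes` is hypothetical.

```python
from itertools import product


def lex_precedes(a, b):
    """Strict lexicographic comparison of equal-length tuples.

    Look at the first coordinate where a and b differ and compare there.
    """
    for ai, bi in zip(a, b):
        if ai != bi:
            return ai < bi
    return False  # no differing coordinate: the tuples are equal


# Python's built-in tuple comparison is itself lexicographic, so the two agree.
points = list(product(range(3), repeat=2))
agrees = all(lex_precedes(a, b) == (a < b) for a in points for b in points)

# Sorting tuples therefore sorts them in dictionary order.
ordered = sorted([(1, 0), (0, 2), (0, 1), (1, -5)])
print(agrees, ordered)
```

Note that the first coordinate dominates completely: (0, 100) precedes (1, −100) no matter how the second coordinates compare.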
Let us finish this section by discussing some standard ideas pertaining to partially
ordered sets that will be of some use in discussions of some of the ideas of Pareto and
multi-criteria optimization.
Definition A.7.8 Let (A,≺) be a partially ordered set with partial order given by ≺.
Then
(a) m ∈ A is called a maximal element in A provided m ≺ a implies m = a. The set A
is said to have a greatest element m provided, for all a ∈ A , a ≺ m.
(b) ao ∈ A is called an upper bound for a subset B ⊂ A provided, for all b ∈ B , b ≺ ao.
(c) B ⊂ A is called a chain in A if each two elements in B are related.
It is important to distinguish between maximal and greatest elements.
Example A.7.9 (a) Consider the set consisting of the union of the sets A = {2, 4, 6}
and B = {3, 9, 27}. Partially order the union with the relation a ≺ b provided a is
a factor of b. Then there is no greatest element, but both 6 and 27 are maximal (as is 4).
(b) If we take the union of A = {2, 4, 6} and B = {1, 3, 9, 27}, then 1 is both a minimal
element and a least element.
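The distinction between maximal and greatest elements in Example A.7.9 can be checked by direct enumeration. In this sketch the helper names are hypothetical, and `least_element` is the evident dual of the greatest element of Definition A.7.8 (an element that precedes every element of the set).

```python
def divides(a, b):
    """The order of Example A.7.9: a ≺ b provided a is a factor of b."""
    return b % a == 0


def maximal_elements(S, precedes):
    """m is maximal when m ≺ a forces a = m."""
    return {m for m in S if all(a == m for a in S if precedes(m, a))}


def greatest_element(S, precedes):
    """Return m with a ≺ m for all a in S, or None if no such m exists."""
    for m in S:
        if all(precedes(a, m) for a in S):
            return m
    return None


def least_element(S, precedes):
    """Return m with m ≺ a for all a in S, or None if no such m exists."""
    for m in S:
        if all(precedes(m, a) for a in S):
            return m
    return None


S = {2, 4, 6} | {3, 9, 27}
T = S | {1}

print(sorted(maximal_elements(S, divides)))
print(greatest_element(S, divides))
print(least_element(T, divides))
```

For the set in part (a) the enumeration finds the maximal elements 4, 6 and 27 and no greatest element; once 1 is adjoined, as in part (b), it is the least element.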
We end this appendix with some remarks on functions that preserve order when the
domain and range of the function are ordered in some way. This is an important question
in Economics where it arises in the context of the existence of a scalar-valued utility
function. We confine ourselves to some elementary remarks.
In what follows, we are given two partially ordered sets {X,≺} and {Y,<}.
Definition A.7.10 A function f : X −→ Y is said to be order preserving or sometimes
isotone relative to the orders on X and Y provided f(u) < f(v) or f(u) = f(v) whenever
u, v ∈ X are such that u ≺ v.
The situation of interest to us is the case in which ≺ is a partial order, while < is a linear
(i.e., complete) order; in fact, Y = R and the order is the usual order on R. Unfortunately,
some extra conditions must be put on the sets before an isotone mapping will exist even
in the case that the order relation ≺ is a total order.
We can see that this is the case by showing that lexicographic order does not admit
an order-preserving or isotone map into R with its usual order. The proof uses two
elementary facts about the system of real numbers, namely that any interval of positive
length contains a rational number, and that the cardinality of the set of rationals is less
than the cardinality of the set of reals. In particular, there cannot be an injective map of
R_> into the rational numbers Q. The argument is by contradiction.
Proof: Suppose we are given a map u : R²_> −→ R that represents lexicographic order in
R². Were this the case, then, given any x ∈ R_>, u(x, 0) < u(x, 1) since (x, 0) ≺_L (x, 1),
the first components being the same and 0 < 1. Thus we can define an interval I(x) =
[u(x, 0), u(x, 1)] ⊂ R, of positive length.
Now, let x, y ∈ R_> with x ≠ y. Without loss of generality, we may assume that
y < x. Then (y, 1) ≺_L (x, 0) in the lexicographic order. It follows that I(x) ∩ I(y) = ∅,
for (y, 0) ≺_L (y, 1) ≺_L (x, 0) ≺_L (x, 1). Now let I = {I(x) | x ∈ R_>} be the family of
these intervals and define ϕ : R_> −→ I by
ϕ(x) = I(x). This map is injective since I(x) ∩ I(y) = ∅ whenever x ≠ y.
Now it is a property of R that every interval of positive length contains a rational number. To finish
the proof, we pick a function that chooses one rational from each of the intervals⁸. Call
this map ψ. Then ψ : I −→ Q and ψ(I(x)) is, for each x, some rational number in the
interval I(x). Since {I(x)}, x ∈ R_>, is a disjoint family, the map ψ is injective and hence
ψ ◦ ϕ : R_> −→ Q is an injective map. Hence card (R_>) ≤ card (Q) which is false.
8Here, we are using the Axiom of Choice which we take as a fundamental axiom of Set Theory.
Appendix B
Basic Analysis
B.1 Introduction
In this appendix we collect a number of facts from Advanced Calculus that are necessary
for subsequent work. Again, the material here is intended as a summary and is not
a substitute for a course in Advanced Calculus/Beginning Analysis. Those who need a
good reference for this material should consult a standard text, for example M. Rosenlicht,
Introduction to Analysis [11], available through Dover, or W. Rudin, Principles of Mathematical
Analysis, Third Ed. [12]. Here, as elsewhere in the text, we have printed some
proofs in small type. These are proofs which can be skipped without loss of continuity by
the beginning student.
B.2 Norms and Inner Products
Our work will be confined almost exclusively to problems in a real n-dimensional Euclidean
vector space which we will denote by Rⁿ. However much of what we say here can be
considered in an arbitrary real vector space¹. Vectors in the vector space Rⁿ will always
be written as column vectors so that
¹This is true in the finite dimensional case only. Certain changes must be made in the case of an
infinite dimensional vector space like C[0, 1].
$$ x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} $$
and we write its transpose as x⊤. Thus x⊤ is a row vector. The distinction between the
notationally coincident row vectors and points in Rⁿ will be clear from the context.
Recall that there are two operations, the vector operations, that are defined on Rⁿ,
namely vector addition and multiplication by scalars (real numbers). These are defined
componentwise so that if x and y have ith components xᵢ and yᵢ and if α and β are real
numbers, then the ith component of αx + β y is αxᵢ + β yᵢ. There are eight basic rules
of computation that we list here for convenience.
Addition                                          Scalar Multiplication
(1) x + y = y + x                                 (5) α(x + y) = αx + αy
(2) x + (y + z) = (x + y) + z                     (6) (α + β)x = αx + βx
(3) There is a vector 0 with x + 0 = x            (7) (αβ)x = α(βx)
(4) For each x there is a −x with x + (−x) = 0    (8) 1 · x = x
A linear combination of vectors is any sum of the form $\alpha_1 x^1 + \alpha_2 x^2 + \cdots + \alpha_k x^k$.
B.2.1 Inner Products of Vectors
The usual dot product x · y between two vectors in Rⁿ can be written in terms of matrix
multiplication. Thus
$$ \langle x, y\rangle = y^{\top} x = x \cdot y := \sum_{i=1}^{n} x_i\, y_i . $$
We will almost always write the dot product as 〈x,y〉.
This dot product has certain simple properties:
(i) 〈x,x〉 ≥ 0 and 〈x,x〉 = 0 if and only if x = 0. (positive definiteness)
(ii) 〈x,y〉 = 〈y,x〉 for all x,y ∈ Rn. (symmetry)
(iii) 〈αx,y〉 = α 〈x,y〉 for all α ∈ R and x,y ∈ Rn. (homogeneity)
(iv) 〈x + z,y〉 = 〈x,y〉+ 〈z,y〉 for all x,y, z ∈ Rn. (additivity)
Notice that we can combine (iii) and (iv) to write
〈αx + βz,y〉 = α 〈x,y〉+ β 〈z,y〉
so that the inner product is linear in the first entry. By symmetry, it is likewise linear in
the second entry. This means that an inner product² is a positive definite, symmetric,
bilinear form. Indeed, in any real vector space, such a form defines an inner product. If
an inner product is given, the space, together with the inner product, is called an inner
product space.
We recall that two vectors in Rn are said to be orthogonal in the case that their dot
product vanishes. It is not difficult to show, say in R2 that orthogonal vectors must meet
at an angle of measure π/2. At this point, it is useful to recall a definition of linearly
independent vectors.
Definition B.2.1 A set of vectors {x1,x2, . . . ,xk} ⊂ Rn is said to be a linearly indepen-
dent set of vectors provided no one of them can be written as a linear combination of the
others.
Now we have the next result whose elementary proof we leave to the reader.
Lemma B.2.2 A set of vectors $\{x^1, \ldots, x^k\} \subset \mathbb{R}^n$ is a linearly independent set if and
only if the linear combination
$$ \alpha_1 x^1 + \cdots + \alpha_k x^k = 0 $$
implies that α₁ = α₂ = · · · = αₖ = 0.
It should be clear that, for example, a set of mutually orthogonal non-zero vectors must
be linearly independent. Indeed, suppose that
$$ \alpha_1 x^1 + \cdots + \alpha_i x^i + \cdots + \alpha_k x^k = 0 $$
and take the inner product of both sides with $x^i$. Then
$$ \sum_{j=1}^{k} \alpha_j \langle x^j, x^i\rangle = \langle 0, x^i\rangle = 0 , $$
2Here we restrict ourselves to the set of real scalars.
and each of the summands except the ith vanishes because of orthogonality. Hence this
last line reduces to αi〈xi,xi〉 = 0 and division by 〈xi,xi〉 shows that αi = 0. Since i is
arbitrary, all the αi = 0 and the vectors are linearly independent according to the lemma.
It is often useful to know that, given any set of linearly independent vectors, they span a
subspace of Rn and that the given set of k linearly independent vectors may be replaced
by a set of k mutually orthogonal vectors which span the same subspace. The method of
doing so is constructive and is known as the Gram-Schmidt Procedure.
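The Gram-Schmidt Procedure mentioned above can be sketched in a few lines. The text does not spell out the steps, so the version below is the standard textbook recipe (subtract from each new vector its projections onto the orthogonal vectors already built), written without normalization and with exact rational arithmetic. The function names and the sample vectors are assumptions made for illustration.

```python
from fractions import Fraction


def dot(x, y):
    """Euclidean inner product of two vectors given as sequences."""
    return sum(a * b for a, b in zip(x, y))


def gram_schmidt(vectors):
    """Replace linearly independent vectors by mutually orthogonal ones.

    The output spans the same subspace. The input is assumed to be linearly
    independent; otherwise a zero vector appears and a later division fails.
    """
    basis = []
    for v in vectors:
        w = [Fraction(c) for c in v]
        for u in basis:
            coeff = dot(w, u) / dot(u, u)       # component of w along u
            w = [wi - coeff * ui for wi, ui in zip(w, u)]
        basis.append(w)
    return basis


vectors = [(1, 1, 0), (1, 0, 1), (0, 1, 1)]
basis = gram_schmidt(vectors)
for u in basis:
    print([str(c) for c in u])
```

All pairwise inner products of the resulting vectors vanish, so by the argument above they are linearly independent.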
B.2.2 Norms
The usual dot product, or inner product in Rⁿ is associated with the Euclidean norm
$$ \|x\| = \sqrt{\sum_{i=1}^{n} x_i^2} . $$
Unless stated explicitly, the symbol ‖ · ‖ will always refer to this Euclidean norm although
in some cases we will write ‖ · ‖₂. This distinction may become useful since it is possible,
and often very useful (particularly in numerical work), to choose a different norm. In
particular, if we are working with an inner product that is different from the standard
dot product, then that new inner product likewise defines a norm in the same way that
the usual dot product induces the Euclidean norm.
Example B.2.3 As an example, suppose that A is a symmetric, n × n positive definite
matrix³. Then we can define a new inner product by
[x,y] = 〈Ax,y〉 .
The fact that the matrix is symmetric and positive definite ensures that the new form [·, ·]
satisfies all the properties of an inner product on Rⁿ. This type of inner product is used
extensively in nonlinear programming where it arises in the so-called conjugate gradient
method. It is also useful, in more general settings, in studying elasticity where the norm
associated with the inner product is usually called the energy norm.
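A small numerical instance may make the construction of Example B.2.3 concrete. In the sketch below the 2 × 2 matrix is a made-up example (it is symmetric with eigenvalues 1 and 3, hence positive definite), and `a_inner` is a hypothetical name for the form [x, y] = 〈Ax, y〉.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def mat_vec(A, x):
    """Matrix-vector product Ax, with A given as a list of rows."""
    return [dot(row, x) for row in A]


def a_inner(A, x, y):
    """The form [x, y] = <Ax, y> of Example B.2.3."""
    return dot(mat_vec(A, x), y)


# Hypothetical symmetric positive definite matrix (eigenvalues 1 and 3).
A = [[2, 1], [1, 2]]
x, y = [1, -1], [3, 2]

print(a_inner(A, x, y), a_inner(A, y, x))   # symmetry: the two values agree
print(a_inner(A, x, x))                     # positive for x != 0
print(math.sqrt(a_inner(A, x, x)))          # the associated (energy) norm of x
```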
In order to understand this situation more fully, we should first recognize that the
idea of a norm is independent from that of an inner product. There are norms on Rn (or
on any vector space for that matter) that are not associated with any inner product at
3A matrix A is said to be positive definite provided, for all x ∈ Rn ,x 6= 0 , 〈Ax,x〉 > 0.
all; those that are must enjoy a particular extra property⁴. In general, a norm is just
a generalization of the familiar absolute value of a real number. Here is the concrete
definition.
Definition B.2.4 A norm on Rn (or on any real vector space) is a real-valued function
Rn −→ R, whose value is denoted by ‖x‖ which has the following three properties
(1) ‖x‖ ≥ 0 for all x ∈ Rn, and ‖x‖ = 0 if and only if x = 0.
(2) ‖x + y‖ ≤ ‖x‖+ ‖y‖ for each choice of x,y ∈ Rn.
(3) ‖αx‖ = |α| ‖x‖ for all α ∈ R ,x ∈ Rn.
In terms of these properties, we see that, again, the norm is a positive definite form which
is positively homogeneous of degree 1 and which satisfies the triangle inequality (property
(2)).
A crucial result relating an inner product to its associated norm is the Cauchy-Schwarz-Bunyakovski
inequality⁵. This is, without doubt, one of the most important inequalities in all of
mathematics and physics.
|〈x,y〉| ≤ ‖x‖ ‖y‖ , with equality if and only if y = αx for some scalar α ≠ 0 .
Proof: Clearly the inequality is true in the case that y = 0 and it is likewise true if 〈x,y〉 = 0, i.e. if the
vectors are orthogonal. We assume, therefore, that neither is the case.
We observe that, if y = αx then
$$ |\langle x, y\rangle| = |\langle x, \alpha x\rangle| = |\alpha|\,|\langle x, x\rangle| = |\alpha|\,\|x\|^2 = \|x\|\,\|\alpha x\| = \|x\|\,\|y\| . $$
So we have equality in this case.
Now define scalars ξ = ‖y‖ and $\eta := -\dfrac{\langle x, y\rangle}{\|y\|}$. Note that η is defined and non-zero by assumption. Then,
by the binomial expansion
⁴This extra property is called the parallelogram law. Any norm that satisfies 2‖x‖² + 2‖y‖² =
‖x + y‖² + ‖x − y‖² must arise from an inner product.
⁵Fair or not, in the Western European and English-speaking worlds, the name of Bunyakovski is left
out and the inequality is simply called the Cauchy-Schwarz inequality. We will follow that tradition.
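$$ \begin{aligned} \|\xi x + \eta y\|^2 &= \xi^2\|x\|^2 + 2\,\xi\,\eta\,\langle x, y\rangle + \eta^2\|y\|^2 \\ &= \|x\|^2\|y\|^2 - 2\,\|y\|\Big(\frac{\langle x, y\rangle}{\|y\|}\Big)\langle x, y\rangle + \frac{\langle x, y\rangle^2}{\|y\|^2}\,\|y\|^2 \\ &= \|x\|^2\,\|y\|^2 - \langle x, y\rangle^2 . \end{aligned} $$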
From this identity we see that if equality holds in the Cauchy-Schwarz inequality, then ‖ξ x + η y‖ = 0. But
then
$$ y = -\Big(\frac{\xi}{\eta}\Big)\, x , \quad\text{with } \eta \neq 0 \text{ by assumption,} $$
so that y is a non-zero multiple of x.
Finally, assuming that ‖ξ x + η y‖ > 0 we have 〈x,y〉² < (‖x‖ ‖y‖)² which implies, taking appropriate
square roots, that |〈x,y〉| < ‖x‖ ‖y‖ and the Cauchy-Schwarz inequality is established.
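The identity used in the proof shows that the quantity ‖x‖²‖y‖² − 〈x,y〉² is never negative and vanishes when y is a multiple of x. The following sketch checks this on integer vectors, so that the arithmetic is exact; the name `cs_gap` and the sample vectors are assumptions made for illustration.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def cs_gap(x, y):
    """The quantity ||x||^2 ||y||^2 - <x, y>^2, non-negative by Cauchy-Schwarz."""
    return dot(x, x) * dot(y, y) - dot(x, y) ** 2


x = (1, 2, 3)
y = (4, -5, 6)     # not a multiple of x: strict inequality
z = (3, 6, 9)      # z = 3x: equality

print(cs_gap(x, y))
print(cs_gap(x, z))
```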
Our first use of the Cauchy-Schwarz inequality is to check that what we called the
Euclidean “norm” is, indeed, a norm.
Example B.2.5 Consider what we have called the norm associated with the Euclidean
inner product. To check that $\|x\| := \sqrt{\sum_{i=1}^{n} x_i^2}$ is indeed a norm, we need to check the
three properties listed in [B.2.4]. Since, for any r ≥ 0 we have √r ≥ 0, we have ‖x‖ ≥ 0.
Moreover ‖x‖ = 0 implies that $x_i^2 = 0$ for each component i = 1, . . . , n. Hence ‖x‖ = 0
implies x = 0. Since x = 0 implies xᵢ = 0 for all i, we see that property (1) holds. Since
$\sqrt{a \cdot b} = \sqrt{a}\,\sqrt{b}$, clearly
$$ \|\alpha x\| = \sqrt{\alpha^2 \sum_{i=1}^{n} x_i^2} = \sqrt{\alpha^2}\,\sqrt{\sum_{i=1}^{n} x_i^2} = |\alpha| \sqrt{\sum_{i=1}^{n} x_i^2} = |\alpha|\,\|x\| , $$
and we see that (3) is satisfied.
Finally, to check the triangle inequality, we use the Cauchy-Schwarz inequality. Start
with the relationship of the supposed norm to the inner product and expand the inner
product.
$$ \begin{aligned} \|x + y\|^2 &= \langle x + y, x + y\rangle = \langle x, x + y\rangle + \langle y, x + y\rangle \\ &= \langle x, x\rangle + \langle x, y\rangle + \langle y, x\rangle + \langle y, y\rangle = \|x\|^2 + \langle y, x\rangle + \langle x, y\rangle + \|y\|^2 \\ &= \|x\|^2 + 2\,\langle x, y\rangle + \|y\|^2 \le \|x\|^2 + 2\,|\langle x, y\rangle| + \|y\|^2 \\ &\le \|x\|^2 + 2\,\|x\|\,\|y\| + \|y\|^2 \quad \text{by Cauchy-Schwarz} \\ &= (\|x\| + \|y\|)^2 . \end{aligned} $$
Taking square roots establishes the triangle inequality, and so the relation $\|x\| = \sqrt{\sum_{i=1}^{n} x_i^2}$ does, indeed, define
a norm.
The expansion of ‖x + y‖² in the above calculation is a special case of what is known as
the binomial theorem. Given real scalars α, β and vectors x,y we have the usual binomial
expansion
$$ \|\alpha x + \beta y\|^2 = \alpha^2\|x\|^2 + 2\,\alpha\beta\,\langle x, y\rangle + \beta^2\|y\|^2 , $$
which can be easily checked by expanding the inner product ‖αx + β y‖² = 〈αx + βy, αx + βy〉.
As remarked previously, there are many situations in which the Euclidean norm is not
the most convenient. The following form a scale of useful norms:
Examples B.2.6 (a) The $\ell_1$-norm: $\|x\|_1 = |x_1| + |x_2| + \cdots + |x_n|$,
(b) The $\ell_2$-norm: $\|x\|_2 = (|x_1|^2 + |x_2|^2 + \cdots + |x_n|^2)^{1/2}$,
(c) The $\ell_p$-norm: $\|x\|_p = (|x_1|^p + |x_2|^p + \cdots + |x_n|^p)^{1/p}$, 1 ≤ p < ∞,
(d) The $\ell_\infty$-norm: $\|x\|_\infty = \max_{1 \le i \le n} |x_i|$.
Clearly, (a) and (b) are special cases of (c). In each case, the work of showing that the
given relation does in fact define a norm is that of checking that the triangle inequality
is satisfied. We have just shown that this is the case for the Euclidean or $\ell_2$-norm.
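The scale of norms in Examples B.2.6 can be computed with a single function. This is a minimal sketch: `p_norm` is a hypothetical name, `math.inf` is used as the marker for the $\ell_\infty$-norm, and results for finite p are floating-point values.

```python
import math


def p_norm(x, p):
    """The l_p norm of x for 1 <= p < infinity, or the max norm for p = math.inf."""
    if p == math.inf:
        return max(abs(xi) for xi in x)
    if p < 1:
        raise ValueError("p must satisfy p >= 1")
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)


x = (3, -4)
for p in (1, 2, 3, math.inf):
    print(p, p_norm(x, p))
```

For this vector the values decrease as p grows, from 7 for p = 1 through 5 for p = 2 down to 4 for the maximum norm.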
B.2.3 Some Important Inequalities
While we call the expressions the $\ell_p$-norms, for 1 ≤ p < ∞, to check that they are, in
fact, norms requires that we check the triangle inequality, the other properties of norms
being easy to check. In these cases, the triangle inequality is called Minkowski’s Inequality
which reads
$$ \Big(\sum_{i=1}^{n} |x_i + y_i|^p\Big)^{1/p} \le \Big(\sum_{i=1}^{n} |x_i|^p\Big)^{1/p} + \Big(\sum_{i=1}^{n} |y_i|^p\Big)^{1/p} . $$
Proving Minkowski’s inequality therefore shows that these norms really are norms.
In order to prove Minkowski’s inequality, we need a generalization of the Cauchy-Schwarz
inequality which is called Hölder’s Inequality. It is important in its own right and
it is worthwhile to remember both of these inequalities, together with the all-important
Cauchy-Schwarz inequality. Hölder’s inequality is
$$ \sum_{i=1}^{n} |x_i\, y_i| \le \|x\|_p\, \|y\|_q , \quad \frac{1}{p} + \frac{1}{q} = 1 , \quad 1 \le p < \infty . $$
Below, we establish these important inequalities.
We begin with a generalization of the inequality of the arithmetic-geometric mean of two positive real
numbers. This is the inequality $\sqrt{xy} \le \tfrac{x}{2} + \tfrac{y}{2}$ and is easily established.
$$ \begin{aligned} 0 \le (x - y)^2 &= x^2 - 2xy + y^2 && \text{(B.1)} \\ &= x^2 + 2xy + y^2 - 4xy = (x + y)^2 - 4xy . && \text{(B.2)} \end{aligned} $$
The result follows by rearrangement and taking square roots.
The generalization that we will need is given next.
Lemma B.2.7 Let a, b ≥ 0 be real numbers and let 1 ≤ p < ∞ and q such that 1/p + 1/q = 1⁶. Then
$$ a^{1/p}\, b^{1/q} \le \frac{a}{p} + \frac{b}{q} . $$
Proof: We may consider only the case that both a and b are positive since, if either were zero, the result
would hold trivially. Now, for any fixed k, 0 < k < 1, and for t > 0 define a function f by
$$ f(t) = k\,(t - 1) - t^k + 1 , $$
which has derivative $f'(t) = k - k\,t^{k-1} = k(1 - t^{k-1}) = k\big(1 - \tfrac{1}{t^{1-k}}\big) \ge 0$ for t ≥ 1. So f is an increasing function on [1, ∞)
and f(1) = 0 as can be easily checked. Hence, for t ≥ 1,
$$ k(t - 1) - t^k + 1 \ge 0 , \quad\text{or}\quad t^k \le k\,t + (1 - k) . $$
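Since we require t ≥ 1 we have two cases. If a ≥ b put t = a/b and k = 1/p. Then
$$ \Big(\frac{a}{b}\Big)^{1/p} \le \frac{1}{p}\Big(\frac{a}{b}\Big) + \frac{1}{q} , \quad\text{or}\quad b\,\Big(\frac{a}{b}\Big)^{1/p} \le b\,\frac{1}{p}\Big(\frac{a}{b}\Big) + \frac{b}{q} , $$
from which it follows that
$$ a^{1/p}\, b^{1 - 1/p} \le \frac{a}{p} + \frac{b}{q} . $$
The result follows from the fact that (1 − 1/p) = 1/q.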
If, on the other hand, b > a, then set t = b/a and k = 1/q. It follows that
$$ \Big(\frac{b}{a}\Big)^{1/q} \le \frac{1}{q}\Big(\frac{b}{a}\Big) + \Big(1 - \frac{1}{q}\Big) , $$
and the result follows as before.
Lemma B.2.8 (Hölder’s Inequality). For any x,y ∈ Rⁿ
$$ \sum_{i=1}^{n} |x_i\, y_i| \le \|x\|_p\, \|y\|_q , \quad \frac{1}{p} + \frac{1}{q} = 1 , \quad 1 \le p < \infty . \qquad \text{(B.3)} $$
6Such indices p and q are said to be conjugate.
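Proof: Again, if either x = 0 or y = 0 the inequality is trivially true. Hence we assume that they are both
non-zero. In this case, set
$$ a_i = \Big(\frac{|x_i|}{\|x\|_p}\Big)^p \quad\text{and}\quad b_i = \Big(\frac{|y_i|}{\|y\|_q}\Big)^q $$
and apply the preceding lemma. Thus
$$ a_i^{1/p}\, b_i^{1/q} = \frac{|x_i|}{\|x\|_p}\, \frac{|y_i|}{\|y\|_q} \le \frac{a_i}{p} + \frac{b_i}{q} . $$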
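Adding these results, and recalling that, e.g., $\|x\|_p^p = \sum_{i=1}^{n} |x_i|^p$, we have
$$ \begin{aligned} \frac{1}{\|x\|_p\, \|y\|_q} \sum_{i=1}^{n} |x_i\, y_i| &\le \Big(\frac{1}{p}\Big) \frac{1}{\|x\|_p^p} \sum_{i=1}^{n} |x_i|^p + \Big(\frac{1}{q}\Big) \frac{1}{\|y\|_q^q} \sum_{i=1}^{n} |y_i|^q && \text{(B.4)} \\ &= \frac{1}{p} + \frac{1}{q} = 1 . && \text{(B.5)} \end{aligned} $$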
Multiplying both sides of this last inequality by ‖x‖p ‖y‖q the result follows.
We are now ready for the main event.
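Proposition B.2.9 (Minkowski’s Inequality) For any x,y ∈ Rⁿ, $\|x + y\|_p \le \|x\|_p + \|y\|_p$, 1 ≤ p < ∞.
Proof: For p = 1 the inequality reduces to the well-known inequality for absolute value. For p > 1 and q
conjugate to p, note that p/q = p − 1. Then
$$ \begin{aligned} \|x + y\|_p^p = \sum_{i=1}^{n} |x_i + y_i|^p &= \sum_{i=1}^{n} |x_i + y_i|\, |x_i + y_i|^{p-1} && \text{(B.6)} \\ &\le \sum_{i=1}^{n} |x_i|\, |x_i + y_i|^{p-1} + \sum_{i=1}^{n} |y_i|\, |x_i + y_i|^{p-1} && \text{(B.7)} \\ &= \sum_{i=1}^{n} |x_i|\, |x_i + y_i|^{p/q} + \sum_{i=1}^{n} |y_i|\, |x_i + y_i|^{p/q} && \text{(B.8)} \\ &\le \|x\|_p \Big(\sum_{i=1}^{n} |x_i + y_i|^p\Big)^{1/q} + \|y\|_p \Big(\sum_{i=1}^{n} |x_i + y_i|^p\Big)^{1/q} && \text{(B.9)} \\ &= (\|x\|_p + \|y\|_p)\, \|x + y\|_p^{p/q} && \text{(B.10)} \end{aligned} $$
where the last inequality is the result of applying Hölder’s inequality. Dividing both sides by $\|x + y\|_p^{p/q}$
(which we may assume to be non-zero, since otherwise there is nothing to prove) we have, finally
$$ \|x + y\|_p^{\,p - (p/q)} \le \|x\|_p + \|y\|_p , $$
and Minkowski’s inequality follows from the fact that p − p/q = p(1 − 1/q) = p(1/p) = 1.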
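Both inequalities can be spot-checked numerically. The sketch below draws random vectors and records the smallest observed slack in Hölder's and Minkowski's inequalities for a few exponents p > 1. It is an illustration, not a proof: the helper names, the seed, the sample sizes and the tolerance needed to absorb floating-point rounding are all assumptions of this example.

```python
import math
import random


def p_norm(x, p):
    """The l_p norm for finite p >= 1."""
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)


def holder_gap(x, y, p):
    """||x||_p ||y||_q - sum |x_i y_i| with q conjugate to p (requires p > 1)."""
    q = p / (p - 1.0)
    lhs = sum(abs(xi * yi) for xi, yi in zip(x, y))
    return p_norm(x, p) * p_norm(y, q) - lhs


def minkowski_gap(x, y, p):
    """||x||_p + ||y||_p - ||x + y||_p."""
    s = [xi + yi for xi, yi in zip(x, y)]
    return p_norm(x, p) + p_norm(y, p) - p_norm(s, p)


rng = random.Random(0)
worst_holder = math.inf
worst_minkowski = math.inf
for _ in range(200):
    n = rng.randint(1, 6)
    x = [rng.uniform(-5, 5) for _ in range(n)]
    y = [rng.uniform(-5, 5) for _ in range(n)]
    for p in (1.5, 2.0, 3.0, 7.0):
        worst_holder = min(worst_holder, holder_gap(x, y, p))
        worst_minkowski = min(worst_minkowski, minkowski_gap(x, y, p))

# Both values should be non-negative up to rounding error.
print(worst_holder, worst_minkowski)
```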
B.3 Subsets of Rn
Basic properties of sets in Rn and related notions of convergence and continuity depend
on the notion of neighborhoods and of open sets, which, in turn, depend on the norm
imposed on the space. These notions are direct generalizations of the notions of open
and closed intervals on the real line R. This section is devoted to an explanation of these
basic ideas.
B.3.1 Basic Definitions
We start with a tentative definition of neighborhood. The idea of neighborhood will be
expanded later.
By a δ-neighborhood of a point xo ∈ Rn is meant the set
Bδ(xo) := {x ∈ Rn | ‖x− xo‖ < δ} ,
which is also referred to as the open ball of radius δ centered at xo. Let S ⊂ Rn. Then a
point xo ∈ S is said to be an interior point of S provided there is a δ-neighborhood of xo
contained entirely in S. A point xo is said to be an accumulation point or a limit point of S if
every δ-neighborhood of xo contains a point x ≠ xo with x ∈ S. Note that a limit point
of a set need not be in the set (take, e.g., S to be the open unit ball centered at xo = 0,
namely B1(0) = {x ∈ Rn | ‖x‖ < 1}. Then any unit vector is an accumulation point of
S and yet is not in S itself). A point is an isolated point of S if xo is in S but is not an
accumulation point of S. For example, if the set is {x ∈ R |x = −1 or 0 ≤ x ≤ 1} then
the point x = −1 is an isolated point of the set.
A point xo is called a boundary point of S if every δ-neighborhood of xo contains
points in S and points not in S. For the set B₁(0), all the vectors x with ‖x‖ = 1 are
boundary points of the unit ball. This set of boundary points is usually called the unit
sphere⁷. The boundary of a set A is just the set of all boundary points which we will write
bd(A). So, for example, if D is the closed unit ball in R² centered at the origin, then the
set of points on the unit circle constitutes the boundary since every neighborhood of every
point on the unit circle meets both the interior of D and R² \ D. Finally, a point xo is
an exterior point of S provided there is a δ-neighborhood of xo which contains no points
of S.
Of all the subsets S ⊂ Rn we distinguish several with particular properties.
Definition B.3.1 A set S ⊂ Rn is said to be
(a) open provided all its points are interior points of S,
(b) closed provided it contains all its limit points,
(c) bounded provided it is contained in some ball {x ∈ Rn | ‖x‖ < r}, where 0 < r <∞,
⁷It is usually understood that the term unit sphere is reserved for the set of all x with ‖x‖ = 1. It is called
a manifold or surface in Rⁿ and, as such, is a set of dimension n − 1.
(d) compact provided S is both closed and bounded⁸.
Note that these definitions imply that the entire space Rⁿ as well as the empty set, ∅,
are both open and closed. Moreover, it is easy to check, using the definitions, that the
union of an arbitrary number of open sets is open, while the intersection of any collection
of closed sets is closed.
Here we must be careful! If we interchange unions and intersections in the state-
ments above, the results are false unless we restrict ourselves to finitely many sets. So we
can say that the intersection of finitely many open sets is open while the union of finitely
many closed sets is closed. Let us check this first statement.
Suppose that $\mathcal{G} = \{G_i\}_{i=1}^{k}$ is a finite family of open sets. Then the set $G = \bigcap_{i=1}^{k} G_i$ is
in fact open. In the cases that either $\mathcal{G}$ is empty, or that, for some i, Gᵢ = ∅, or that the
family is pairwise disjoint, then the intersection, G, is empty and so open since the empty
set is open. We may suppose, therefore, that G ≠ ∅.
In this case let x ∈ G. Then for each i = 1, 2, . . . , k, x ∈ Gᵢ and there is an εᵢ > 0 for
which $B_{\varepsilon_i}(x) \subset G_i$. Let $\varepsilon = \min_{1 \le i \le k} \{\varepsilon_i\}$. Then $B_\varepsilon(x) \subset G_i$ for all i and so $B_\varepsilon(x) \subset \cap\, G_i = G$.
So every point of G is the center of some ball completely contained in G. Hence G is
open.
It is easy to give an example to show that if infinitely many sets are allowed, then
G may well not be open. Simply take Gi to be the open interval (−1/i, 1/i). Then
∩Gi = {0}, a singleton, which is closed but not open.
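The failure for infinitely many sets can be checked concretely. The following is a minimal sketch, not part of the original text; the helper name `first_interval_missing` is our own. For a given x ≠ 0 it computes the first index i for which x ∉ (−1/i, 1/i). Since such an index always exists, no point other than 0 survives in the intersection, and so no ball about 0 can fit inside it.

```python
from fractions import Fraction
from math import ceil


def first_interval_missing(x):
    """Return the smallest i >= 1 with x not in (-1/i, 1/i), or None if x == 0.

    Hypothetical helper for illustration: x lies outside the open interval
    (-1/i, 1/i) exactly when 1/i <= |x|, that is, when i >= 1/|x|.
    """
    x = abs(Fraction(x))
    if x == 0:
        return None  # 0 belongs to every interval, hence to the intersection
    return ceil(1 / x)
```

For instance, the point 1/10 lies in (−1/i, 1/i) for i = 1, . . . , 9 but not for i = 10, because the intervals are open.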
At this point, let us take a short, but important, detour. Above, when we talked
about norms, we gave examples of norms, other than the Euclidean norm, that can be
imposed on Rn (see B.2.6). Now we have discussed what we mean by open sets, closed
sets, accumulation points, etc., all defined in terms of neighborhoods that are, themselves,
defined in terms of the Euclidean norm. Moreover, we shall presently discuss convergence
of sequences, again in terms of the Euclidean, or ℓ2-norm of B.2.6. Here is an interesting,
and important, question: if we use a norm different from the ℓ2-norm, how, if at all, do
8 The definition of the term “compact” given here is specific to Rn. In more general contexts, another
definition is used and it becomes a theorem to be proved that a set in Rn is compact in this more general
sense provided it is closed and bounded.
38 APPENDIX B. ANALYSIS
these things change? The answer is that they do not change. This means, for example,
that open sets defined in terms of balls with respect to one norm are also open when
we use balls defined with one of the other norms, and convergence of a sequence in one
norm implies convergence with respect to all the other norms. The implication is, of
course, that we may use all of these ideas, choosing whatever norm is convenient to a
given situation.
Why is this the case? We first introduce a definition.
Definition B.3.2 Suppose that ‖ · ‖α and ‖ · ‖β are two norms on Rn. Then these
norms are said to be equivalent norms provided there are positive constants K1 and K2 such that
K1‖x‖α ≤ ‖x‖β ≤ K2‖x‖α for all x ∈ Rn.
It is clear, from this definition, that every ball in the α-norm contains a ball in the
β-norm and vice versa. So, for example, the interior points of a set can be described in
terms of either norm. And from this it follows that open sets and closed sets with respect
to one norm are open or closed with respect to the other. As for the norms that we
introduced earlier, they are equivalent norms.
Proposition B.3.3 Let 1 ≤ p ≤ ∞. Then the ℓp-norms given in B.2.6 are equivalent.
Indeed, we have the following inequalities:
(1) ‖x‖2 ≤ ‖x‖1 ≤ √n ‖x‖2;
(2) ‖x‖∞ ≤ ‖x‖2 ≤ √n ‖x‖∞;
(3) ‖x‖∞ ≤ ‖x‖1 ≤ n ‖x‖∞.
We leave the proof as an exercise.
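Before attempting the proof, the reader may find it reassuring to test the three inequalities numerically. The sketch below is an illustration only and proves nothing; the function names and the small tolerance `tol`, which absorbs floating-point rounding, are our own assumptions.

```python
import math
import random


def norm_1(x):
    return sum(abs(t) for t in x)


def norm_2(x):
    return math.sqrt(sum(t * t for t in x))


def norm_inf(x):
    return max(abs(t) for t in x)


def inequalities_hold(x, tol=1e-9):
    """Check inequalities (1)-(3) for one vector x, up to rounding error tol."""
    n = len(x)
    n1, n2, ninf = norm_1(x), norm_2(x), norm_inf(x)
    root_n = math.sqrt(n)
    return (
        n2 <= n1 + tol and n1 <= root_n * n2 + tol        # (1)
        and ninf <= n2 + tol and n2 <= root_n * ninf + tol  # (2)
        and ninf <= n1 + tol and n1 <= n * ninf + tol       # (3)
    )


def spot_check(trials=1000, max_dim=6, seed=0):
    """Test the inequalities on randomly generated vectors (hypothetical helper)."""
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(1, max_dim)
        x = [rng.uniform(-10.0, 10.0) for _ in range(n)]
        if not inequalities_hold(x):
            return False
    return True
```

The vector (1, 1, . . . , 1) shows that the constants √n and n cannot be improved, since there ‖x‖1 = n, ‖x‖2 = √n and ‖x‖∞ = 1.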
We mention that much more is true. In fact, all norms on Rn are equivalent. The
proof requires the notion of compactness and we postpone it. You will find the result in
Proposition B.4.7.
Now, returning to the main thread of our discussion we can start with a set A that is
not closed, and take its union with the set of all of its accumulation points. The resulting
set is obviously closed. This new set is called the closure of the set A and we will write it
cℓ(A). Thus, if we take ℓp(A) to be the set of limit points of A, then A ∪ ℓp(A) = cℓ(A).
Intuitively, the closure of A is the set A itself together with all points arbitrarily close to
A. The basic example is the open unit ball B1(0) = {x ∈ Rn | ‖x‖ < 1}, whose closure is the set
cℓ[B1(0)] = {x ∈ Rn | ‖x‖ ≤ 1}. Note that, for any set A ⊂ Rn, cℓ(A) ⊃ A. We leave
the proof of the following facts to the reader.
Proposition B.3.4 Let A ⊂ Rn. Then
(a) A is closed if and only if A = cℓ(A).
(b) If F is a closed set such that F ⊃ A, then F ⊃ cℓ(A).
(c) If $\mathcal{F}$ denotes the set of all closed subsets containing A then $\bigcap_{F \in \mathcal{F}} F$ = cℓ(A).
To complete the classification of points related to a set, we introduce the concept of
boundary point.
Definition B.3.5 Given a set A ⊂ Rn, a point x ∈ Rn is called a boundary point of A
provided that every open ball Bε(x) intersects both A and Rn \A. The boundary of the set
A is the set of all boundary points of A.
In what follows, we will denote the boundary of a set A by bd(A). So, for example, the
closed unit disk centered at the origin in R2 has the unit circle as its boundary since every
neighborhood of every point of the unit circle meets both the interior and the exterior of
the unit disk.
Again, we have some simple results whose proofs we leave to the reader.
Proposition B.3.6 Let A ⊂ Rn. Then
(a) bd(A) = cℓ(A) ∩ cℓ(Rn \ A).
(b) The set bd(A) is a closed set.
(c) The set A is closed if and only if it contains its boundary.
The next result characterizes closed sets in terms of open sets. The proof is worth
studying since it will give good practice in handling complements, closed sets and open
sets.
Proposition B.3.7 A set F ⊂ Rn is closed if and only if its complement Rn \F is open.
Proof: Suppose, first, that the set F is closed. Then we show that O = Rn \ F is open.
If O is empty, then it is open and so we may assume that O ≠ ∅. Take an arbitrary point
x ∈ O. Since F is closed and hence contains all its limit points, x ∉ ℓp(F). So there
exists an open ball Bε(x) such that Bε(x) ∩ F = ∅. Hence, about any point x ∈ O we
can find an open ball, centered at x, completely contained in O. Hence, O is an open set.
Conversely, suppose that the set O is open, and let x ∈ ℓp(F). Then x ∈ F since each
point of O is the center of an open ball that does not meet F and such a point cannot be
a limit point of F by definition.
B.3.2 Suprema and Infima
We now turn to the case of sets in R and the notions of greatest lower bound or infimum
and least upper bound or supremum of a set of real numbers. Before proceeding, the reader
may wish to review Definition A.7.8 and the material immediately following that
definition.
A set S ⊂ R is said to be bounded above provided there is a constant M such that
x ≤ M for all x ∈ S. Such a number M is called an upper bound. It is a property of
the real numbers with the usual ordering that every set which is bounded above has a
least upper bound, that is, there exists a number s such that x ≤ s for all x ∈ S and, if
M is an upper bound for S then s ≤ M . In this case we write s := sup(S). If S is not
bounded above, then we write sup(S) = ∞. Likewise, if S is bounded below, then there
exists a greatest lower bound or infimum that is, a number i such that i is itself a lower
bound and, if m ≤ x for all x ∈ S then m ≤ i. We write i := inf(S). If S is not bounded
below, we write inf(S) = −∞. Moreover, we will adopt the convention that sup(∅) = −∞ and
inf(∅) = ∞. The numbers s and i may, or may not, belong to the set S. In the case that
they do, we write s = max(S) and i = min(S). In fact, we have the following result.
Proposition B.3.8 Let S be a non-empty subset of R which is bounded above. If s =
sup(S) then s ∈ cℓ(S). Hence s ∈ S if S is closed.
Proof: If s ∈ S then s ∈ cℓ(S). On the other hand, if s ∉ S then, for every ε > 0 there
is a point t ∈ S such that s − ε < t < s for otherwise, s − ε would be an upper bound for
S. Thus s is a limit point of S, hence s ∈ cℓ(S).
Before going more deeply into these ideas, we are going to use these basic definitions
to show that lying between any two real numbers there is both a rational number and
an irrational number. We start with a very simple fact that follows directly from the
definition of least upper bound.
Lemma B.3.9 Suppose that S ⊂ R is bounded above and let x be any upper bound for
S. Then the following statements are equivalent:
(a) x = sup(S).
(b) For any ε > 0, S ∩ (x − ε, x] ≠ ∅.
Proof: To see that (a) implies (b), suppose not. Then there exists an ε > 0 with the
property that S ∩ (x − ε, x] = ∅. Since no point of S exceeds x, the number x − ε is then an upper bound less than x, a contradiction
to the choice of x.
To see that (b) implies (a) suppose that (b) is true but that (a) is false, i.e., that x is
not the least upper bound for S. Then there exists a z < x such that y ≤ z for all y ∈ S.
Let εo = x − z. Then, clearly, S ∩ (x − εo, x] = ∅ and (b) does not hold.
The next proposition is usually known as the Archimedean Property of the real numbers.
It says that the natural numbers have no upper bound in R and, consequently, that there is no
smallest positive real number. It is an essential fact necessary to actually prove that the
sequence $\{1/n\}_{n=1}^{\infty}$ converges to zero.
Proposition B.3.10 For any x ∈ R there exists an n ∈ N such that x < n.
Proof: Assume, on the contrary, that there exists an x ∈ R such that, for all n ∈ N, n ≤ x. This means that the set N is bounded above and hence has a least upper bound, call it
y. Since y is the least upper bound, there must be an integer n in the interval (y − 1/2, y],
which exists by the previous lemma. But then y − 1/2 < n ≤ y, from which we deduce,
by adding 1, that y + 1/2 < n + 1. But n + 1 is an integer larger than y, and so y cannot be an upper
bound for N.
Remark: Note that a similar argument, using greatest lower bounds will show that
there cannot be a smallest integer.
Corollary B.3.11 The sequence $\{1/n\}_{n=1}^{\infty}$ converges to 0.
Proof: Let ε > 0 be given and consider the open ball Bε(0) ⊂ R. Choose any integer
N > 1/ε, whose existence is guaranteed by the proposition. Then for all n > N, we have
\[ 0 < \frac{1}{n} < \frac{1}{N} < \varepsilon , \]
which implies that 1/n ∈ Bε(0) whenever n > N.
Hence any ball of the given form contains all but finitely many elements of the sequence,
and so the sequence converges to 0.
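The choice of N in this proof is completely explicit. As a small illustration (a sketch of our own, with the hypothetical name `index_for`), the following computes one admissible N for a given rational ε, using exact arithmetic so that no rounding can interfere.

```python
from fractions import Fraction
from math import floor


def index_for(eps):
    """Return an integer N with N > 1/eps, as the Archimedean Property guarantees.

    eps is assumed to be a positive rational (anything Fraction accepts).
    """
    eps = Fraction(eps)
    if eps <= 0:
        raise ValueError("eps must be positive")
    return floor(1 / eps) + 1
```

With ε = 1/100 this gives N = 101, and every n > N then satisfies 1/n < 1/N < ε.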
These types of arguments can now be used to show that between any two real numbers
there is a rational.
Proposition B.3.12 Let a, b ∈ R with a < b. Then there is a rational number q with
a < q < b.
Proof: Choose an integer N such that N > 1/(b − a), or equivalently, 1/N < b − a.
Consider the subset $Q_N \subset \mathbb{Q}$ given by
\[ Q_N = \left\{ \frac{m}{N} \;\middle|\; m \in \mathbb{Z} \right\} . \]
Then $Q_N$ ∩ (a, b) ≠ ∅. Indeed, if not, let m be the largest integer such that m/N ≤ a.⁹
If (m + 1)/N < b then it is a rational number between a and b and we are done by
construction. Hence we must assume that (m + 1)/N ≥ b. But then
\[ b - a \le \frac{m+1}{N} - \frac{m}{N} = \frac{1}{N} < b - a , \]
which is impossible. Hence $Q_N$ ∩ (a, b) ≠ ∅.
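The proof is constructive, and the construction can be carried out mechanically. The sketch below is our own illustration (the name `rational_between` is hypothetical) and assumes that a and b are themselves rational, so that exact arithmetic is available; for general real a and b the argument is the same but cannot be run exactly on a computer.

```python
from fractions import Fraction
from math import floor


def rational_between(a, b):
    """Return a rational q of the form (m + 1)/N with a < q < b.

    Follows the proof: pick N > 1/(b - a), let m be the largest integer
    with m/N <= a, and take q = (m + 1)/N.
    """
    a, b = Fraction(a), Fraction(b)
    if not a < b:
        raise ValueError("need a < b")
    N = floor(1 / (b - a)) + 1   # guarantees 1/N < b - a
    m = floor(a * N)             # largest integer with m/N <= a
    return Fraction(m + 1, N)
```

For a = 1/3 and b = 1/2 the construction gives N = 7, m = 2 and q = 3/7.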
In order to show that every interval contains an irrational number as well as a rational
one, we need only know that irrational numbers exist. There are plenty of choices, but
the traditional one is, of course, √2. In case the reader has never seen the proof that this
number is irrational, we give the traditional proof that already appeared in the book of
Euclid.
Proof: Suppose the contrary, that √2 is rational and hence can be written as a ratio of integers p/q where
p and q have no common factors. Then 2 = p²/q² and so p² = 2q², showing that p² must be divisible by
2 and hence by 4, since every prime factor of p occurs twice in p². Write p² = 4r². Then we have 4r² = 2q², so that
q² = 2r² is also divisible by 2, from which it follows that q is divisible by 2. Hence p and q have the factor 2 in
common, a contradiction.
We now have
Proposition B.3.13 If a, b ∈ R with a < b then there is an irrational number t with
a < t < b.
9This follows from the remark above that there can be no smallest integer.
Proof: The interval (a/√2, b/√2) contains a rational number, call it q. If a < 0 < b,
choose a rational number from the interval (a/√2, 0) instead, so that q ≠ 0. Then √2 q ∈ (a, b) in either
case, and this is the required irrational number. In fact, were this number to be rational,
then √2 q = p, p rational, and hence √2 = p/q, which would mean that √2 is rational,
which it is not.
At this juncture it is useful to introduce a definition.
Definition B.3.14 Let X be a subset of Rn with the usual metric and let E ⊂ X. Then
E is said to be dense in X provided every point of X is an accumulation point of E or a
point of E or both.
We have just shown that both the set of rational numbers and the set of irrational
numbers are dense in R. In other words, given any point in R, any ball around that point
contains a rational and an irrational number. So that point is an accumulation point of
both the rationals and the irrationals.
It is not hard to see that points with rational coordinates in Rn are dense. This follows
by using the fact that balls in the max-norm, ℓ∞, are just “rectangles” whose sides are
determined by intervals of the form ai ≤ xi ≤ bi, so that a point with rational coordinates
can be found in any such neighborhood, and then recalling that this norm is equivalent to
the ℓ2-norm.
Now, let $\{x_k\}_{k=1}^{\infty}$ be a sequence of real numbers and let
\[ y_m := \sup\{x_k \mid k \ge m\} , \quad \text{and} \quad z_m := \inf\{x_k \mid k \ge m\} . \]
Clearly, the sequence $\{y_m\}_{m=1}^{\infty}$ is nonincreasing while the sequence $\{z_m\}_{m=1}^{\infty}$ is nondecreasing.
It is a basic fact of real analysis that every bounded monotonically nonincreasing
or nondecreasing sequence converges. Here, if the original sequence is bounded above,
then the sequence $\{y_m\}_{m=1}^{\infty}$ converges (possibly to −∞), while if it is bounded below, the sequence $\{z_m\}_{m=1}^{\infty}$
converges (possibly to +∞). In the first case, the limit of the sequence $\{y_m\}_{m=1}^{\infty}$ is written $\limsup_{k\to\infty} x_k$, while,
in the second case, we write $\liminf_{k\to\infty} x_k$ for the limit of $\{z_m\}_{m=1}^{\infty}$. If the sequence $\{x_k\}_{k=1}^{\infty}$ is not bounded above, we
write $\limsup_{k\to\infty} x_k = \infty$ and if it is not bounded below, we write $\liminf_{k\to\infty} x_k = -\infty$.
We give some simple examples.
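(a) Let $x_k := (-1)^k + \frac{1}{k}$, k = 1, 2, . . .. So the sequence looks like $\{0, \frac{3}{2}, -\frac{2}{3}, \frac{5}{4}, -\frac{4}{5}, \cdots\}$.
Then, since the even-indexed terms 1 + 1/k decrease to 1 and exceed all the odd-indexed terms,
\[ y_m = \sup\{x_k \mid k \ge m\} = \max\{x_m, x_{m+1}\} = \begin{cases} \dfrac{m+1}{m} & \text{if } m \text{ is even} \\[1ex] \dfrac{m+2}{m+1} & \text{if } m \text{ is odd,} \end{cases} \]
so that $\lim_{m\to\infty} y_m = 1$, or $\limsup_{k\to\infty} x_k = 1$. Likewise, since the odd-indexed terms $x_k = -1 + \frac{1}{k}$ decrease to −1 while the even-indexed terms exceed 1,
\[ z_m = \inf\{x_k \mid k \ge m\} = -1 \quad \text{for every } m , \]
so that $\lim_{m\to\infty} z_m = -1$, or $\liminf_{k\to\infty} x_k = -1$.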
(b) Let $x_k = k^2 \sin^2\!\left(\tfrac{1}{2} k \pi\right)$. Then $x_k \ge 0$ and, for each k even, $x_k = 0$ while, for each k
odd, $x_k = k^2$. Hence, $\liminf_{k\to\infty} x_k = 0$ while $\limsup_{k\to\infty} x_k = \infty$.
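Example (a) can be explored numerically. The sketch below is our own and rests on an assumption that should be kept in mind: a computer can only examine a finite stretch of each tail, so the functions take a cutoff index K, and `tail_inf` in particular only approximates the true infimum of the infinite tail from above.

```python
from fractions import Fraction


def x(k):
    """The k-th term of example (a): (-1)^k + 1/k, computed exactly."""
    return (-1) ** k + Fraction(1, k)


def tail_sup(m, K):
    """Largest term among x_m, ..., x_K (a finite stand-in for y_m)."""
    return max(x(k) for k in range(m, K + 1))


def tail_inf(m, K):
    """Smallest term among x_m, ..., x_K (a finite stand-in for z_m)."""
    return min(x(k) for k in range(m, K + 1))
```

Because the even-indexed terms decrease, the truncated supremum is already exact: it is attained at the first even index at or beyond m. The truncated infimum, by contrast, keeps creeping down toward −1 as K grows and never reaches it.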
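There are certain basic facts about how these limits behave. Here are a few facts: Let
$\{x_k\}_{k=1}^{\infty}$ and $\{y_k\}_{k=1}^{\infty}$ be sequences in R. Then
(i)
\[ \inf\{x_k \mid k \ge m\} \le \liminf_{k\to\infty} x_k \le \limsup_{k\to\infty} x_k \le \sup\{x_k \mid k \ge m\} , \]
(ii) $\{x_k\}_{k=1}^{\infty}$ converges if and only if $-\infty < \liminf_{k\to\infty} x_k = \limsup_{k\to\infty} x_k < \infty$. In which case
\[ \lim_{k\to\infty} x_k = \liminf_{k\to\infty} x_k = \limsup_{k\to\infty} x_k < \infty . \]
(iii) If $x_k \le y_k$ for all k = 1, 2, · · · , then
\[ \liminf_{k\to\infty} x_k \le \liminf_{k\to\infty} y_k \quad \text{and} \quad \limsup_{k\to\infty} x_k \le \limsup_{k\to\infty} y_k . \]
(iv)
\[ \liminf_{k\to\infty} x_k + \liminf_{k\to\infty} y_k \le \liminf_{k\to\infty} (x_k + y_k) , \]
and
\[ \limsup_{k\to\infty} x_k + \limsup_{k\to\infty} y_k \ge \limsup_{k\to\infty} (x_k + y_k) . \]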
We will also use the notation $\overline{\lim}_{k\to\infty}\, x_k$ for $\limsup_{k\to\infty} x_k$ and $\underline{\lim}_{k\to\infty}\, x_k$ for $\liminf_{k\to\infty} x_k$.
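The inequalities in (iv) can be strict. As a worked example of our own (it does not appear in the original text), take two sequences that oscillate out of phase, so that their sum is identically zero:

```latex
\[
x_k = (-1)^k , \qquad y_k = (-1)^{k+1} , \qquad x_k + y_k = 0 \quad (k = 1, 2, \dots).
\]
\[
\liminf_{k\to\infty} x_k + \liminf_{k\to\infty} y_k = (-1) + (-1) = -2
\;<\; 0 = \liminf_{k\to\infty} (x_k + y_k) ,
\]
\[
\limsup_{k\to\infty} x_k + \limsup_{k\to\infty} y_k = 1 + 1 = 2
\;>\; 0 = \limsup_{k\to\infty} (x_k + y_k) .
\]
```

Neither sequence converges, although their sum does; this is consistent with (ii), since for each of them the lim inf and lim sup differ.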
B.3.3 Connected Sets
Another useful property of a subset is that it be connected. Roughly speaking, this means
that the set is “in one piece”. We will need a preliminary definition.
Definition B.3.15 Two subsets A and B of Rn are said to be separated provided A ∩ cℓ(B) and
cℓ(A) ∩ B are both empty.
Otherwise said, no point of A is in the closure of B and vice versa. With this definition
in hand, we can give a precise definition of a connected set in Rn.
Definition B.3.16 A set A ⊂ Rn is said to be connected provided it cannot be written as
the union of two non-empty separated sets.
We will have more to say about connected sets later. For now, the basic fact that
we want to establish is that the open and closed intervals of the real line are connected
sets. A little later, we will see how this fact is important in establishing the familiar
intermediate value theorem. In the proof of the next result we will use the notation
(−∞, u) = {x ∈ R |x < u} and (u,∞) = {x ∈ R |x > u}.
Proposition B.3.17 A subset I of R is connected if and only if it has the following property:
If a, b ∈ I and a < x < b then x ∈ I.
Proof: If a, b ∈ I, a < x < b and x ∉ I, then the sets A = I ∩ (−∞, x) and B = I ∩ (x, ∞) are
non-empty since a ∈ A and b ∈ B. They are separated since A ⊂ (−∞, x) and B ⊂ (x, ∞).
Moreover I = A ∪ B. So I is not connected.
Conversely, suppose I is not connected. Then there are non-empty separated sets A
and B such that A ∪ B = I. Choose a ∈ A and b ∈ B and assume, without loss of
generality, that a < b. Define x = sup(A ∩ [a, b]). Then by B.3.8 x ∈ cℓ(A) and hence
x ∉ B. In particular a ≤ x < b. If x ∉ A it follows that a < x < b and so x ∉ I. If x ∈ A
then x ∉ cℓ(B) and so there is an x1 such that x < x1 < b and x1 ∉ B. Since x1 > x, also x1 ∉ A. Then a < x1 < b and
x1 ∉ I.
Finally we prove a result about arbitrary unions of connected sets in Rn.
Proposition B.3.18 Let $\{S_i\}_{i \in I}$ be a family of connected subsets of Rn and suppose that, for
some index $i_o$, $S_{i_o} \cap S_i \ne \emptyset$ for all i ∈ I. Then $S = \bigcup_{i \in I} S_i$ is connected.
Proof: Suppose that the union S is not connected. Then there exist non-empty separated sets, A
and B, such that A ∪ B = S. We first show that, for every index i, either A ∩ Si = Si or
A ∩ Si = ∅. To see this, note that A ∩ cℓ(B) = ∅ implies that
(A ∩ Si) ∩ cℓ(B ∩ Si) ⊂ A ∩ cℓ(B) = ∅.
Similarly cℓ(A ∩ Si) ∩ (B ∩ Si) = ∅.
Now Si is connected and Si = (A ∩ Si) ∪ (B ∩ Si), so either Si ∩ A = ∅ or Si ∩ A = Si. Likewise, we have that either
Si ∩ B = ∅ or Si ∩ B = Si. But neither A nor B is empty, so there must be some indices
m and n such that A ∩ Sm = Sm and B ∩ Sn = Sn. By hypothesis, the connected set
$S_{i_o}$ meets each Si and so for the index m, $S_{i_o}$ ∩ Sm ≠ ∅ and so A ∩ $S_{i_o}$ ≠ ∅ and therefore
A ∩ $S_{i_o}$ = $S_{i_o}$.
In a completely similar manner, we show that B ∩ $S_{i_o}$ = $S_{i_o}$. But then A ∩ B ≠ ∅ and
we have a contradiction.
B.3.4 The Bolzano-Weierstrass Theorem
One basic result, which can be found in any advanced calculus text (see, for example [2]),
is the Bolzano-Weierstrass Theorem which says that
Theorem B.3.19 (Bolzano-Weierstrass) Every bounded infinite set in Rn has at least
one accumulation point.
We will make free use of this theorem in this book. For the sake of completeness, we
include here a proof in the case of the real line, R, with the usual notions of open and
closed sets. The idea of the proof (akin, according to Boas, to the process of finding a lion
in the Sahara Desert) can be easily generalized to Rn by using n-dimensional intervals.10
Proof: Let S ⊂ R be bounded and contain infinitely many points. Then S lies in some interval of the form
[−a, a]. Then at least one of the intervals [−a, 0] and [0, a] contains infinitely many points of S. Choose one
that does and call it I1 = [a1, b1]. Bisect this interval to obtain a smaller interval, I2 = [a2, b2], containing
infinitely many points of the original set S.
10 In Rn we can define an interval to be a set consisting of points x = (x1, x2, . . . , xn) such that
ai ≤ xi ≤ bi , i = 1, . . . , n. Or any set with < replacing ≤. Moreover, we do not rule out that for some
i, ai = bi; in particular ∅ is an interval.
Now according to this construction b1 − a1 = a and b2 − a2 = a/2. Again, construct an interval I3 = [a3, b3]
by bisection, so that I3 contains infinitely many points of S. Then the length of I3 is |I3| = |I2|/2 =
a/2². Continuing in this manner we construct, at the nth step, the interval In with length $a/2^{n-1}$. Hence
$\lim_{n\to\infty} |I_n| = 0$ and so the endpoints an and bn converge to a common point xo. This latter point is the required
accumulation point since, if ε > 0 is arbitrary, Bε(xo) ⊃ [an, bn] for n sufficiently large, specifically provided
bn − an < ε/2. In this case, Bε(xo) contains points of the original set S other than xo (in fact infinitely
many).
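The bisection argument can be imitated on a computer, with one unavoidable compromise. A program cannot hold an infinite set, so the sketch below (ours, with the hypothetical name `bisect_cluster`) works with a finite sample of points and replaces "contains infinitely many points of S" by "contains at least as many sample points as the other half". This is a heuristic stand-in for the proof, not an implementation of it, and it is only meaningful while the number of bisections is small compared with the size of the sample.

```python
def bisect_cluster(points, a, steps):
    """Repeatedly halve [-a, a], keeping a half richest in sample points.

    points: a finite sample standing in for the infinite bounded set S
    a:      a bound such that the sample lies in [-a, a]
    steps:  number of bisections to perform
    Returns the endpoints (lo, hi) of the final interval.
    """
    lo, hi = -a, a
    pts = [p for p in points if lo <= p <= hi]
    for _ in range(steps):
        mid = (lo + hi) / 2
        left = [p for p in pts if p <= mid]    # sample points in [lo, mid]
        right = [p for p in pts if p >= mid]   # sample points in [mid, hi]
        if len(left) >= len(right):
            hi, pts = mid, left
        else:
            lo, pts = mid, right
    return lo, hi
```

Applied to the sample {1/k | k = 1, . . . , 1000} with a = 1, eight bisections close in on the interval [0, 1/128], in agreement with the fact that 0 is the only accumulation point of the set {1/k}.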
A related and quite useful result due to Cantor can be proved using the Bolzano-
Weierstrass Theorem. We state the theorem in Rn.
Theorem B.3.20 Let {S1, S2, . . .} be a sequence of closed, non-empty sets in Rn, nested
in the sense that Sk+1 ⊂ Sk, and assume that S1 is a bounded set. Then
\[ S = \bigcap_{k=1}^{\infty} S_k \ne \emptyset , \]
and the intersection S is closed.
Proof: Since each Sk is closed, their intersection, S, is closed as well. The goal is then to show that S ≠ ∅.
First observe that if any one of the sets has only finitely many points, all the rest do as well and the
existence of a common point is obvious. So, we assume that all the Sk have infinitely many points. Now,
let P = {x1, x2, . . .} where the xk ∈ Sk are chosen to be distinct. Since S1 is bounded by hypothesis, so is the set P, and hence, by
the Bolzano-Weierstrass Theorem, this set P has an accumulation point, say xo.
Now, let ε > 0 be given and consider the neighborhood Bε(xo). Then, if P(k) = {xk, xk+1, . . .}, on the one
hand P(k) ⊂ Sk while, on the other hand, every Bε(xo) contains infinitely many points of P, and hence points of P(k) other than xo.
So xo is a limit point of each Sk and, since each Sk is closed, xo ∈ Sk for every k. Hence xo ∈ S and S ≠ ∅.
B.3.5 Convergence
It is convenient to extend the notion of neighborhood beyond that of the δ-neighborhood
introduced earlier. We will subsequently use the term neighborhood of the point xo to
mean any open set that contains the point xo. Similarly, we will find it useful to define a
neighborhood of a set S as any open set N containing S. In particular, a δ-neighborhood
of a set S, denoted by [S]δ, is the set of points in Rn each of which lies in a δ-neighborhood
of some point of the set S. In other words, if Bδ(x) denotes such a neighborhood of the
point x, then
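\[ [S]_\delta := \bigcup_{x \in S} B_\delta(x) . \]
As a simple example, we can see that if S is the closed unit disk, then
\[ [S]_{1/2} = \{ x \in \mathbb{R}^n \mid \|x\| < \tfrac{3}{2} \} , \]
where the inequality is strict because [S]δ, being a union of open balls, is an open set.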
The ℓ2-norm induces the usual Euclidean distance or Euclidean metric between points of
Rn via
\[ d(x, y) = \|x - y\| = \left( \sum_{i=1}^{n} |x_i - y_i|^2 \right)^{1/2} . \]
Other norms induce other distance functions or metrics. Here is a definition of this
term. We state it for a general case, but remind ourselves that most of our work is in
Rn. We do this because we are often working on a subset of Rn and we need the idea of
a metric on the set.
Definition B.3.21 Let X be a non-empty set. Then a metric on X is a real-valued
function d : X × X −→ R which satisfies the following conditions:
1. d(x, y) ≥ 0, and d(x, y) = 0 if and only if x = y.
2. d(x, y) = d(y, x) (symmetry).
3. d(x, y) ≤ d(x, z) + d(z, y) (the triangle inequality).
Here are some examples.
Examples B.3.22
Example 1. Let X be any non-empty set. Define d by
\[ d(x, y) = \begin{cases} 0 & \text{if } x = y \\ 1 & \text{if } x \ne y . \end{cases} \]
We leave it to the reader to verify that this does, indeed, define a metric.
Example 2. Let X be any normed space. Then d(x, y) = ‖x − y‖ defines a metric
as is easily seen using the properties of a norm. So our usual Rn with the ℓ2-norm is a
metric space with the metric induced in this way by the ℓ2-norm. But then, so is the
same space, together with the distance functions defined by any of the ℓp-norms that we
have discussed. In the case of the ℓ1-norm, the distance is sometimes called the taxicab
distance.
Example 3. The notion of Hamming distance occurs in coding theory, in particular in the
discussion of error detection and the construction of error correcting codes. Here, we will
consider the set X to be the set of all strings of the symbols 0 and 1 of length k. Given
two such strings, the Hamming distance dH(x, y) is just the number of corresponding
entries that are different. Thus, in the case that k = 3 we have dH(111, 000) = 3 while
dH(110, 100) = 1.
To actually check that dH defines a metric, we need only check that dH has the appropriate
properties.
It is clear that dH(x, y) ≥ 0 and dH(x, y) = 0 if and only if the components of x and y
do not differ at all, that is, if and only if x = y. Moreover, the order in which we check
the differences is irrelevant. Hence dH(x, y) = dH(y, x). To check the triangle inequality,
suppose that we are given three strings x, y, z and that dH(x, z) = a and that, among the a positions in which x and z differ, there are b
positions where the elements of y are the same as those of x but not the same as those
of z. Further, suppose that in the k − a positions in which x matches z there are c positions in
which the elements of y do not match either x or z. Then
dH(y, z) = b + c and dH(x, y) = a − b + c.
Then, adding these two distances,
dH(x, y) + dH(y, z) = (a − b + c) + (b + c) = a + 2c ≥ a = dH(x, z) .
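For short strings the metric properties of dH can also be confirmed by brute force. The following sketch is our own illustration (the names `hamming` and `is_metric_on_strings` are hypothetical); it represents strings of 0s and 1s as Python strings and checks all three axioms over every triple of strings of a given length k.

```python
from itertools import product


def hamming(x, y):
    """Number of positions in which the equal-length strings x and y differ."""
    if len(x) != len(y):
        raise ValueError("strings must have the same length")
    return sum(1 for s, t in zip(x, y) if s != t)


def is_metric_on_strings(k):
    """Exhaustively check the metric axioms for dH on all 0/1 strings of length k."""
    strings = ["".join(bits) for bits in product("01", repeat=k)]
    for x in strings:
        for y in strings:
            d = hamming(x, y)
            if d < 0 or (d == 0) != (x == y):      # positivity and definiteness
                return False
            if d != hamming(y, x):                 # symmetry
                return False
            for z in strings:
                if hamming(x, z) > d + hamming(y, z):  # triangle inequality
                    return False
    return True
```

The check examines 8³ = 512 triples for k = 3. It is, of course, no substitute for the counting argument above, which works for every k at once.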
As remarked earlier, we will be using, for the most part, the Euclidean norm and we
now return to that particular case for the sake of definiteness.
Since the Euclidean norm induces a distance function, we can introduce concepts of
convergence of sequences and continuity of functions. We summarize some of these, as
well as related ideas and theorems, which will be crucial for our work. More detail may
be found in any book of advanced calculus or beginning analysis.
The metric structure on Rn leads us to the notion of sequential convergence. Let $\{x_k\}_{k=1}^{\infty}$
be a sequence of points in Rn. This sequence is said to converge to the point xo provided
\[ \lim_{k\to\infty} \|x_k - x_o\| = 0 , \]
that is, provided that, for every ε > 0 there is an integer ko such that ‖xk − xo‖ < ε
for all k > ko. We often write simply xk → xo to indicate convergence. A subsequence
$\{y_\ell\}_{\ell=1}^{\infty}$ of the sequence $\{x_k\}_{k=1}^{\infty}$ is a subset of the original such that, given
an index ℓ there is an index k = k(ℓ) with yℓ = xk, and if ℓi < ℓj then k(ℓi) < k(ℓj).
For example, the sequence $\{\frac{1}{2}, 1, \frac{1}{3}, 1, \frac{1}{4}, 1, \cdots\}$ contains, among others, the subsequences
$\{\frac{1}{2}, \frac{1}{3}, \frac{1}{4}, \cdots, \frac{1}{n}, \cdots\}$, $\{1, \frac{1}{4}, \frac{1}{9}, 1, \frac{1}{16}, \frac{1}{25}, 1, \cdots\}$ and $\{1, 1, \ldots\}$.
From the definition it is clear that if xk → xo, then yℓ → xo for any subsequence
$\{y_\ell\}_{\ell=1}^{\infty} \subset \{x_k\}_{k=1}^{\infty}$. Obviously the convergence of a subsequence does not imply the
convergence of the original sequence. A point x⋆ is said to be an accumulation point of
the sequence provided that x⋆ is the limit of some subsequence of the original one.11 A
sequence is called a bounded sequence if there exists a number r such that ‖xk‖ < r for
all indices k.
Many arguments in analysis depend on our ability to know that a given sequence
contains a convergent subsequence. The basic result is:
Theorem B.3.23 Every bounded sequence in Rn has a convergent subsequence.
This fact is worth checking. Here is the proof that uses the Bolzano-Weierstrass Theorem.
Proof: There are two situations to consider and they are easiest to describe if we view a
sequence as a function on the integers k ↦ xk. If the range of this function consists of
only finitely many points, then there must be a point x⋆ such that xk = x⋆ for infinitely
many values of the index k. Then clearly the subsequence yℓ = x⋆ for all ℓ is a convergent
subsequence with limit x⋆. The second possibility is that the range S = {xk | k = 1, 2, . . .} is an infinite
but bounded set. Then, by the Bolzano-Weierstrass Theorem, this set of points has
an accumulation point, call it x̂. Since x̂ is an accumulation point of S, every
neighborhood of x̂ contains infinitely many terms of the sequence. Consider, for the sake
of concreteness, the decreasing sequence of neighborhoods
\[ U_m := \left\{ x \;\middle|\; \|x - \hat{x}\| < \frac{1}{m} \right\} , \quad m = 1, 2, \cdots . \]
Choose points $y_m = x_{k_m} \in U_m \cap S$ with indices k1 < k2 < · · ·, which is possible because each Um contains infinitely many terms of the sequence. Then, clearly, we have ym → x̂ as m → ∞.
It is interesting to see that the converse is true.
Proposition B.3.24 If a set, S, has the property that every sequence in it has a convergent subsequence,
then the set S must be bounded.
11This terminology is in accord with the definition of accumulation point (or limit point) given earlier.
Proof: Indeed, assume the contrary. Then, if Bk denotes the ball of radius k centered at the origin, S \ Bk ≠ ∅ for every k. Now, for each k = 1, 2, · · · , choose xk ∈ S \ Bk. Then this sequence is in S and ‖xk‖ ≥ k, so neither the sequence nor any of its subsequences is bounded, and none of them can converge, contrary
to the hypothesis on S. Hence S must be bounded.
We can combine this last observation with the fact that a set is closed if and only if
it contains all the accumulation points of sequences of the set. We thereby establish the
following useful statement concerning compactness (see the definition B.3.1 above):
Theorem B.3.25 A set S ⊂ Rn is compact if and only if every sequence of points in S
has a subsequence which converges to a point in S.
Knowing this result suggests that we introduce the following definition.
Definition B.3.26 A subset S ⊂ Rn is called sequentially compact provided every se-
quence {xk}∞k=1 ⊂ S has a subsequence that converges to a point of S.
Using this term, the preceding result says that every compact set in Rn is also sequentially
compact and conversely. In other words, in Rn the notions of compact and sequentially
compact are equivalent.
As an obvious, but often useful corollary to this result we have the following.
Corollary B.3.27 A closed subset of a compact set is compact.
We leave the simple proof as an exercise.
It is important to have some criterion to determine when a sequence converges. There
is one class of sequences which always converge in Rn. These are the Cauchy sequences
which we can define as follows:
Definition B.3.28 A sequence {xk}∞k=1 is said to be a Cauchy sequence provided that for
any ε > 0 there is an index ko such that, if k, ℓ > ko then ‖xk − xℓ‖ < ε.
We can easily show that every convergent sequence is a Cauchy sequence: for suppose
that xk −→ xo. Choose ε > 0. Then there is a positive integer ko so that for k >
ko, ‖xk − xo‖ < ε/2. Then for any k, ℓ > ko we have
‖xk − xℓ‖ ≤ ‖xk − xo‖ + ‖xo − xℓ‖ < ε/2 + ε/2 = ε .
The converse statement is definitely not true in a general metric space. The simple example of X = (0, 1] ⊂ R and the sequence $\{1/n\}_{n=1}^{\infty}$ shows that a sequence may be Cauchy and yet not
converge to a point in the space, since the point 0 is not in the space. Of perhaps more historical
interest is the metric space {Q, | · |} of the rational numbers with the usual distance
function on the line d(x, y) = |x − y|. The sequence {1, 1.4, 1.41, 1.414, 1.4142, . . .} of rational numbers converges in R to
√2, which is not rational. This sequence is convergent in R, and hence
Cauchy, but it has no limit in Q.
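The decimal truncations of √2 can be generated exactly, which makes the Cauchy property visible without any appeal to floating-point arithmetic. The sketch below is our own; the name `sqrt2_truncation` is hypothetical, and it relies on the standard-library function `math.isqrt`, which returns the integer part of a square root.

```python
from fractions import Fraction
from math import isqrt


def sqrt2_truncation(n):
    """Return the truncation of sqrt(2) to n decimal places, as an exact rational.

    floor(sqrt(2) * 10^n) equals isqrt(2 * 10^(2n)), so the result is the
    rational number with n decimal digits lying just below sqrt(2).
    """
    return Fraction(isqrt(2 * 10 ** (2 * n)), 10 ** n)
```

Any two terms with indices at least m differ by less than 10^(−m), so the sequence is Cauchy in Q; yet the square of every term is strictly less than 2, and no rational number has square equal to 2, so the sequence has no limit in Q.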
This kind of behavior cannot happen in Rn. Indeed, it is well known that a sequence
{xk}∞k=1 in Rn converges if and only if it is a Cauchy sequence. The fact that every Cauchy
sequence in Rn converges to a point in Rn is just what is meant by the completeness of
Rn.
This last assertion is easy to check, provided we know that the set R is a complete
space. That this is so is a fundamental fact of elementary analysis, and we will accept it
without question.
The situation in most applications is that we are working, not in all of Rn but rather
in some subset X. It is therefore absolutely crucial to know when a Cauchy sequence in
some subset X of Rn converges to a point in X itself. If this is the case, then we say that
the subset, X itself, is complete. The next result answers the question.
Proposition B.3.29 Let X ⊂ Rn. Then X is complete if and only if X is a closed subset
of Rn.
Proof: Suppose, first, that every Cauchy sequence in X converges to a point of X. To
show that X is closed, we need to show that it contains all of its limit points. So suppose
that xo is such a limit point. Then, for each positive integer k, the ball B1/k(xo) contains
a point xk ∈ X. Then xk −→ xo and so, by the fact that every convergent sequence
is Cauchy, this sequence is a Cauchy sequence in X. Since we have assumed that X is
complete, it converges to a point of X, which by uniqueness of limits must be xo. So X must contain all its limit points and therefore is closed.
Conversely, suppose that X is closed and let {xk}∞k=1 ⊂ X be a Cauchy sequence. Then,
as a sequence in Rn, this Cauchy sequence must converge, in Rn to some point xo ∈ Rn.
Now this point is a limit point of the set consisting of the points of the sequence. Since
X is closed, this limit point must be in X. Hence X is complete.
B.4. FUNCTIONS ON RN 53
In closing, we should point out that the space Rn now has two aspects to its structure.
On the one hand, we recognize it as a vector space with its attendant algebraic structure.
On the other hand it has a norm structure, in which it makes sense to talk about what
are called topological properties, e.g., of open and closed sets, limit points, convergent
sequences, and completeness. This gives us one example of what is called a normed linear
space. Moreover, it is a complete normed linear space. Such spaces, and there are many
other than just Rn which is, after all, finite dimensional, are called Banach Spaces. Here
is an interesting question: are the maps from Rn × Rn to Rn given by (x, y) ↦ x + y
and from R × Rn to Rn given by (α, x) ↦ αx continuous? In fact, what do we mean by
continuous maps from one set to another? That is the question that we take up next.
B.4 Functions on Rn
In this section we will begin with a discussion of continuous functions and some of their
properties. In the following section, we will discuss semicontinuous functions and, later,
introduce the fundamental notion of the epigraph of a function.
B.4.1 Continuous Functions
We start with a definition.
Definition B.4.1 Let X ⊂ Rn. Then a function f : X → Rm is said to be continuous at
a point xo ∈ X provided that f(xk) → f(xo) for all sequences $\{x_k\}_{k=1}^{\infty}$ ⊂ X with xk → xo. The
function f is said to be continuous on X provided it is continuous at each point of X.
This definition is usually called sequential continuity and is due to Heine. In the setting
of a metric space, sequential continuity is equivalent to the other notions of continuity,
and in particular, to the familiar “epsilon-delta” definition.
As usual, sums, products, compositions, and maxima of continuous functions are
continuous. Moreover we know from elementary calculus about the continuity of
polynomials, exponentials, sine, cosine, and many other functions. For a vector-valued
function f : X → Rm, written in terms of its components as
f(x) = (f1(x), f2(x), · · · , fm(x))ᵀ, it is easy to show that continuity of f is equivalent to
continuity of each of its components. Note that if X and Y are subsets of Rn then we can
treat the set of ordered pairs X × Y as a subset of R2n and so treat continuous functions
f : X × Y → Rm. With this observation, we can look at some of the operations between
vectors we have discussed and show that, in the sense that these basic operations are
continuous, the algebraic structure of Rn considered as a real vector space and the
topological structure of Rn as a normed space are compatible.
Proposition B.4.2 The following functions are continuous:
(a) x ↦ ‖x‖.¹²
(b) (x, y) ↦ x + y of Rn × Rn → Rn.
(c) (x, α) ↦ αx of Rn × R → Rn.
Proof:
(a) To prove the first statement, let xo ∈ Rn and consider any sequence {xk}∞k=1 that
converges to it. We wish to show that ‖xk‖ → ‖xo‖ as k → ∞. But, by the reverse triangle inequality,
0 ≤ | ‖xk‖ − ‖xo‖ | ≤ ‖xk − xo‖ ,
and ‖xk − xo‖ → 0 is what we mean by the sequence converging to xo.
(b) For the second result, let (xo, yo) be an arbitrary point of Rn × Rn and suppose
xk → xo while yk → yo. We wish to show that xk + yk → xo + yo. Indeed, using
the triangle inequality, we have
0 ≤ ‖(xk + yk) − (xo + yo)‖ = ‖(xk − xo) + (yk − yo)‖ ≤ ‖xk − xo‖ + ‖yk − yo‖ ,
and each term on the right-hand side approaches 0 as k → ∞. Hence xk + yk → xo + yo.
And finally we have
(c) Suppose (xo, αo) ∈ Rn × R and that xk → xo while αk → αo. Then
0 ≤ ‖αk xk − αo xo‖ = ‖αk xk − αo xk + αo xk − αo xo‖
≤ ‖αk xk − αo xk‖ + ‖αo xk − αo xo‖ = ‖(αk − αo) xk‖ + ‖αo (xk − xo)‖
= |αk − αo| ‖xk‖ + |αo| ‖xk − xo‖.
But, by hypothesis and part (a), |αk − αo| ‖xk‖ → 0 · ‖xo‖ = 0 and |αo| ‖xk − xo‖ → |αo| · 0 = 0.
Hence the result.
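The three estimates used in this proof can be checked numerically. The following is only a sketch, assuming the Euclidean norm and randomly drawn data; the helper names norm, check_bounds and rand_vec are illustrative and do not come from the text, and a numerical check of this kind does not replace the proof.

```python
import math
import random

def norm(x):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(t * t for t in x))

def check_bounds(x, xo, y, yo, a, ao, tol=1e-9):
    """Return True if the estimates of parts (a), (b), (c) hold for this data."""
    diff_x = norm([s - t for s, t in zip(x, xo)])
    diff_y = norm([s - t for s, t in zip(y, yo)])
    # (a) | ||x|| - ||xo|| | <= ||x - xo||
    ok_a = abs(norm(x) - norm(xo)) <= diff_x + tol
    # (b) ||(x + y) - (xo + yo)|| <= ||x - xo|| + ||y - yo||
    lhs_b = norm([(s + u) - (t + v) for s, t, u, v in zip(x, xo, y, yo)])
    ok_b = lhs_b <= diff_x + diff_y + tol
    # (c) ||a x - ao xo|| <= |a - ao| ||x|| + |ao| ||x - xo||
    lhs_c = norm([a * s - ao * t for s, t in zip(x, xo)])
    ok_c = lhs_c <= abs(a - ao) * norm(x) + abs(ao) * diff_x + tol
    return ok_a and ok_b and ok_c

random.seed(0)

def rand_vec(n):
    """Random vector with entries in [-5, 5] (illustrative data only)."""
    return [random.uniform(-5, 5) for _ in range(n)]

all_ok = all(
    check_bounds(rand_vec(4), rand_vec(4), rand_vec(4), rand_vec(4),
                 random.uniform(-3, 3), random.uniform(-3, 3))
    for _ in range(1000)
)
print(all_ok)
```

Each right-hand side tends to 0 as the data approach (xo, yo, αo), which is exactly how the proof concludes.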
¹² This particular result is not surprising since convergence is defined in terms of the norm.
It is a good idea to have on hand an example of a function that is not continuous in
a very dramatic way, much more dramatic, in fact than simple step discontinuities or
examples like 1/x on the interval (0, 1).
Example B.4.3 Consider the function f : R → R given by
\[
f(x) = \begin{cases} 1 & \text{if } x \text{ is irrational,} \\ 0 & \text{if } x \text{ is rational.} \end{cases}
\]
This function is continuous nowhere! Every open interval in R contains both rational and
irrational points, and so every point is the limit of a sequence of rationals and also of a
sequence of irrationals. So if xo is rational, take a sequence {xn}∞n=1 of irrationals that
converges to xo; then f(xn) = 1 for all n while f(xo) = 0. Likewise, if xo is irrational,
take a sequence of rationals converging to xo, along which the function has the
value 0 while f(xo) = 1. In either case f(xn) does not converge to f(xo).
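Floating-point numbers are all rational, so this example cannot be tested directly with floats. The sketch below works instead with exact numbers of the form a + b√2 with rational a and b, which are irrational exactly when b ≠ 0. The class QSqrt2 is a hypothetical helper introduced only for this illustration.

```python
from fractions import Fraction

class QSqrt2:
    """Exact number a + b*sqrt(2) with rational a, b (illustrative helper)."""
    def __init__(self, a, b=0):
        self.a = Fraction(a)
        self.b = Fraction(b)

    def is_rational(self):
        # sqrt(2) is irrational, so a + b*sqrt(2) is rational exactly when b == 0.
        return self.b == 0

    def __float__(self):
        return float(self.a) + float(self.b) * 2 ** 0.5

def f(x):
    """The function of Example B.4.3: 1 on irrationals, 0 on rationals."""
    return 0 if x.is_rational() else 1

# Rational point xo = 1/3, approached by the irrationals 1/3 + sqrt(2)/n.
xo = QSqrt2(Fraction(1, 3))
irrationals = [QSqrt2(Fraction(1, 3), Fraction(1, n)) for n in range(1, 50)]
dist_irr = [abs(float(x) - float(xo)) for x in irrationals]

# Irrational point sqrt(2), approached by rationals p/q generated by
# (p, q) -> (p + 2q, p + q), starting from 1/1.
root2 = QSqrt2(0, 1)
rationals = []
p, q = 1, 1
for _ in range(10):
    rationals.append(QSqrt2(Fraction(p, q)))
    p, q = p + 2 * q, p + q

print(f(xo), {f(x) for x in irrationals})      # value at xo vs. values along the sequence
print(f(root2), {f(x) for x in rationals})
```

Along both sequences the function values stay constant and differ from the value at the limit point, so the sequential definition of continuity fails at each of the two points.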
We now turn to the behavior of continuous functions on compact sets in Rn. We
want to show, in particular, that such a function must be bounded, that its image must be
compact, and that a real-valued function takes on its maximum and minimum values. We
begin by combining the first two of these results. Since a subset of Rn is sequentially
compact if and only if it is compact, we can use the sequential definitions in the proofs without
loss of generality.
Proposition B.4.4 Let S ⊂ Rn be (sequentially) compact and f : Rn → Rm be continu-
ous. Then the set f(S) ⊂ Rm is (sequentially) compact.
Proof: Let {yk}∞k=1 be a sequence of points in f(S). We need only show that this sequence
has a subsequence that converges to a point of f(S). For each yk there corresponds an xk ∈ S with f(xk) = yk.
Since {xk}∞k=1 ⊂ S and S is assumed to be sequentially compact, this sequence contains
a subsequence {xkℓ}∞ℓ=1 which converges to some xo ∈ S. Since f is assumed continuous,
ykℓ = f(xkℓ) → f(xo) ∈ f(S) ,
and so the subsequence {ykℓ}∞ℓ=1 converges to a limit f(xo) ∈ f(S), which was to be
proved. In particular f(S), being compact, is bounded.
Examples show that inverses of continuous functions, even when defined, are not
necessarily continuous. The next result shows that the additional condition of compactness
of the domain is enough to guarantee continuity of the inverse.
Proposition B.4.5 Let S ⊂ Rn be sequentially compact and suppose f : S −→ Rm is
continuous and injective. Then f−1 : f(S) −→ S is continuous.
Proof: The function f : S → f(S) is bijective (one-to-one and onto) and so the inverse function f−1 is
well defined with f−1(f(x)) = x and f(f−1(y)) = y. To show the continuity of f−1 we must show that
if yk → yo in f(S), then f−1(yk) → f−1(yo) in S. To do this, look at the sequence in S given by
xk = f−1(yk). Since S is sequentially compact, every subsequence of {xk}∞k=1 has a further
subsequence converging to a point of S. Let xo be the limit of one such convergent subsequence
{xkℓ}∞ℓ=1, so that xkℓ → xo in S. By the continuity of the function f, ykℓ = f(xkℓ) → f(xo) in f(S).
But since the original sequence {yk}∞k=1 converges to yo, so does every subsequence and hence yo = f(xo).
Since xo was an arbitrary subsequential limit of {xk}∞k=1, any other subsequential limit, say x̄, also
satisfies yo = f(x̄). Since f is injective (one-to-one), xo = x̄, i.e., the sequence {xk}∞k=1 can have only
one subsequential limit. A sequence in a sequentially compact set with only one subsequential limit
converges to that limit, so the entire sequence {xk}∞k=1 converges to xo = f−1(yo). Hence
f−1(yk) → f−1(yo) in S, as was to be proved.
Now, we prove what is probably the most important theorem in the theory of optimization.
The result is due to Weierstrass. Note that here we are dealing with real-valued
functions.
Theorem B.4.6 (Weierstrass) Let f : S → R be continuous where S ⊂ Rn is compact.
Then there are points xM, xm ∈ S such that f(xm) ≤ f(x) ≤ f(xM) for all x ∈ S.
Proof: Since S is compact and f is continuous, Proposition B.4.4 shows that f(S) is compact.
Hence f is bounded on S and so f(S) has a least upper bound α. By definition of the least
upper bound, for every k ∈ N the number α − 1/k is not an upper bound for f(S). Hence we
can find a corresponding point xk ∈ S for which f(xk) > α − 1/k. But then
α ≥ f(xk) > α − 1/k ,
so that f(xk) → α as k → ∞.
Since the set S is sequentially compact, there is a subsequence {xkℓ}∞ℓ=1 which converges
to a point xM ∈ S. Then continuity of f implies that f(xkℓ) → f(xM). But the original
sequence {f(xk)}∞k=1 converges to α, and so does every subsequence; hence f(xM) = α. Thus f
takes on its maximum at a point of the set S.
Finally, since min f(x) = − max (−f(x)), the same argument applied to −f yields the
minimum point xm.
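The conclusion of the theorem can be illustrated numerically. The sketch below is not the proof's construction: it assumes the particular function f(x) = x e^(−x) on the compact interval [0, 3] and uses a simple grid search (the helper grid_argmax is hypothetical) to produce points whose values approach the least upper bound α = 1/e, which is attained at x = 1.

```python
import math

def f(x):
    """Sample continuous function on [0, 3]; its maximum 1/e is attained at x = 1."""
    return x * math.exp(-x)

def grid_argmax(func, a, b, n):
    """Return (x, func(x)) maximizing func over n + 1 equally spaced points of [a, b]."""
    pts = [a + (b - a) * i / n for i in range(n + 1)]
    best = max(pts, key=func)
    return best, func(best)

alpha = 1.0 / math.e  # least upper bound of f on [0, 3]

# Finer and finer grids; none of them contains the maximizer x = 1 exactly.
results = [grid_argmax(f, 0.0, 3.0, 7 * 10 ** k) for k in range(4)]
for x_k, val in results:
    print(f"x = {x_k:.6f}, f(x) = {val:.9f}, gap = {alpha - val:.2e}")
```

The grid maximizers play the role of the points xk in the proof: their values never exceed α and the gap α − f(xk) shrinks as the grid is refined.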
As an immediate application we give a proof that we have promised concerning the
equivalence of all norms on Rn.
Proposition B.4.7 On Rn, all norms are equivalent.
Proof: Let ‖ · ‖α be any norm on Rn. For any x ∈ Rn write
\[ x = \sum_{i=1}^{n} x_i e_i , \]
where {e1, e2, . . . , en} is the standard basis. Then, since the triangle inequality is valid for
any norm,
\[ \|x\|_\alpha \le \sum_{i=1}^{n} |x_i| \, \|e_i\|_\alpha \le M \sum_{i=1}^{n} |x_i| = M \, \|x\|_1 , \]
where $M = \max_{1 \le i \le n} \|e_i\|_\alpha$.
Now consider the function ϕ : x ↦ ‖x‖α. This function is continuous with respect to
the ℓ1-norm. We can see this by starting with a sequence {xk}∞k=1 which converges to xo
in the sense that ‖xk − xo‖1 → 0 as k → ∞. Then
0 ≤ | ‖xk‖α − ‖xo‖α | ≤ ‖xk − xo‖α ≤ M ‖xk − xo‖1 .
Hence ‖xk − xo‖1 → 0 implies that ‖xk‖α → ‖xo‖α as k → ∞, i.e., ϕ(xk) → ϕ(xo). So ϕ
is continuous in the metric space {Rn, d1} where d1 is the metric induced by the ℓ1-norm.
Now in this same metric space, the unit sphere S1 = {x ∈ Rn | ‖x‖1 = 1} is a closed and
bounded set. Hence it is compact and so, by Weierstrass's Theorem B.4.6, there exists a
point x̂ ∈ S1 at which ϕ attains its minimum over S1. So if m = ϕ(x̂) then ‖x‖α ≥ m for all
x ∈ S1, and m > 0 because x̂ ≠ 0. It follows that for any x ∈ Rn with x ≠ 0, x/‖x‖1 ∈ S1 and so
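\[ \varphi\!\left( \frac{x}{\|x\|_1} \right) = \left\| \frac{x}{\|x\|_1} \right\|_\alpha = \frac{1}{\|x\|_1} \, \|x\|_\alpha \ge m . \]
It follows that m ‖x‖1 ≤ ‖x‖α ≤ M ‖x‖1 for all x ∈ Rn. Since every norm is thus equivalent
to the ℓ1-norm, any two norms on Rn are equivalent to each other.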
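As a concrete instance, take ‖ · ‖α to be the Euclidean norm. Then M = max ‖ei‖2 = 1, and the minimum of ‖x‖2 over the ℓ1 unit sphere is m = 1/√n, attained at x̂ = (1/n, . . . , 1/n). The following sketch checks the two-sided bound on random vectors; the dimension n = 5, the sample size and the sampling range are arbitrary choices made for illustration.

```python
import math
import random

def norm1(x):
    """The l1-norm: sum of absolute values of the components."""
    return sum(abs(t) for t in x)

def norm2(x):
    """The Euclidean norm, playing the role of the norm with subscript alpha."""
    return math.sqrt(sum(t * t for t in x))

n = 5
M = 1.0                  # max over i of ||e_i||_2
m = 1.0 / math.sqrt(n)   # min of ||x||_2 over the sphere ||x||_1 = 1

x_hat = [1.0 / n] * n    # a point of the l1 unit sphere where the minimum is attained

random.seed(1)
samples = [[random.uniform(-10, 10) for _ in range(n)] for _ in range(2000)]
rel = 1 + 1e-12          # small relative slack for floating-point rounding
holds = all(
    m * norm1(x) <= norm2(x) * rel and norm2(x) <= M * norm1(x) * rel
    for x in samples
)
print(holds, norm1(x_hat), norm2(x_hat), m)
```

For this choice of norm the lower constant can also be obtained directly from the Cauchy-Schwarz inequality, which gives ‖x‖1 ≤ √n ‖x‖2.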
Now we want to say something about the Intermediate Value Theorem and some related
results. First we need a lemma.¹³
Lemma B.4.8 Let f : Rn −→ Rm be continuous. Then f−1(V ) ⊂ Rn is closed whenever
V ⊂ Rm is closed.
Proof: We show that f−1(V) contains all of its limit points. Suppose xo is a limit point
of the set f−1(V). Then there is a sequence {xk}∞k=1 ⊂ f−1(V) such that xk → xo. By
continuity of f, yk = f(xk) → f(xo) = yo. Since, in general, f(f−1(V)) ⊂ V, the yk lie in
V for all k. But by hypothesis V is closed, so yo ∈ V. Hence f(xo) = yo ∈ V, that is,
xo ∈ f−1(V). So f−1(V) contains all its limit points and so is a closed set.
¹³ This lemma is only half of a general result that characterizes continuous functions.
The result of the lemma allows us to prove that the continuous image of a connected
set is connected. Before reading the proof, you should check the definitions ?? and B.3.16.
We also make use of properties of inverse functions.
Proposition B.4.9 If f is a continuous map from Rn to Rm, and if E ⊂ Rn is a connected
set, then f(E) is a connected subset of Rm.
Proof: Assume, on the contrary, that f(E) = A ∪ B where A and B are non-empty separated subsets of
Rm. Define the sets G = E ∩ f−1(A) and H = E ∩ f−1(B). Then neither of these sets is empty and
E = G ∪ H.
Now, A ⊂ cl(A) and this means that G ⊂ f−1(cl(A)), and the lemma tells us that f−1(cl(A)) is closed. So
cl(G) ⊂ f−1(cl(A)) and it follows, by applying f, that f(cl(G)) ⊂ cl(A). Since f(H) = B and cl(A) ∩ B = ∅,
we conclude that cl(G) ∩ H = ∅. The same argument shows that G ∩ cl(H) = ∅. Thus G and H are separated,
which is impossible if E is connected.
Reference to B.3.17 shows that the connected sets in R are just the intervals. This
leads to what is commonly called the Intermediate Value Theorem.
Proposition B.4.10 Let f be a continuous real-valued function defined on the interval
[a, b]. If f(a) < f(b) and if c ∈ R satisfies f(a) < c < f(b) then there is an x ∈ [a, b] with
f(x) = c.
Proof: By B.3.17, the interval [a, b] is connected. Hence the preceding proposition implies
that f([a, b]) is a connected subset of R. Then, again by B.3.17, this latter set is an
interval. Since it contains both f(a) and f(b), it contains the whole interval [f(a), f(b)]
and, in particular, the point c. Hence c = f(x) for some x ∈ [a, b]. The case
that f(b) < f(a) is handled similarly.
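The proof above is an existence argument and does not locate the point x. A standard way to approximate such a point is bisection, sketched below; bisection is not the argument used in the text, and the function name bisect, the tolerance and the test function x³ + x are assumptions made for illustration.

```python
def bisect(f, a, b, c, tol=1e-12):
    """Approximate x in [a, b] with f(x) = c, assuming f is continuous
    on [a, b] and f(a) < c < f(b)."""
    if not (f(a) < c < f(b)):
        raise ValueError("need f(a) < c < f(b)")
    lo, hi = a, b
    # Invariant: f(lo) < c <= f(hi), so the interval [lo, hi] always
    # contains a point where f takes the value c.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < c:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def g(x):
    """Sample continuous function with g(0) = 0 and g(2) = 10."""
    return x ** 3 + x

x_c = bisect(g, 0.0, 2.0, 5.0)
print(x_c, g(x_c))
```

The intervals [lo, hi] are nested and shrink to a single point, and continuity of the function forces its value there to equal c.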
Proposition B.4.9, in the case that n = 1 but m > 1, tells us that curves are connected.
Now we want to show that all open or closed balls in Rn are connected. If p and q are
points in Rn we define the line segment joining these two points to be the set of points
{x ∈ Rn | x = (1 − λ) p + λ q , 0 ≤ λ ≤ 1} .
Then a typical component is xi = (1 − λ) pi + λ qi = pi + λ (qi − pi), and thus each xi is a
continuous function of λ on the interval [0, 1]. So the line segment is a continuous image
of [0, 1] and hence, by Proposition B.4.9, a connected set. Now
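\[
\|p - x\| = \left( \sum_{i=1}^{n} (p_i - x_i)^2 \right)^{1/2}
= \left( \sum_{i=1}^{n} \bigl( p_i - [\, p_i + \lambda (q_i - p_i) \,] \bigr)^2 \right)^{1/2}
= \left( \sum_{i=1}^{n} \lambda^2 (q_i - p_i)^2 \right)^{1/2}
= \lambda \left( \sum_{i=1}^{n} (q_i - p_i)^2 \right)^{1/2}
= \lambda \, \|p - q\| .
\]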
So we see from this computation that the distance from any point x on the line segment to p is λ
times the distance between p and q. So the entire line segment joining the center of any
ball to a point of that ball lies in the ball. The ball is the union of all these
segments, each of which contains the center, and is therefore connected according to Proposition
B.3.18. Note that since Rn itself can be considered the union
of line segments all containing the origin, this reasoning shows that Rn is connected.
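The identity ‖p − x‖ = λ ‖p − q‖ for points x on the segment is easy to check numerically. The sketch below uses the Euclidean distance and two sample points in R3 chosen only for illustration.

```python
import math

def dist(u, v):
    """Euclidean distance between two points given as lists of floats."""
    return math.sqrt(sum((s - t) ** 2 for s, t in zip(u, v)))

def point_on_segment(p, q, lam):
    """The point (1 - lam) p + lam q of the segment joining p and q."""
    return [(1 - lam) * pi + lam * qi for pi, qi in zip(p, q)]

p = [1.0, -2.0, 0.5]   # center of a ball (sample data)
q = [4.0, 2.0, 0.5]    # a point at distance 5 from p
radius = dist(p, q)

lams = [i / 10 for i in range(11)]
errors = [abs(dist(p, point_on_segment(p, q, lam)) - lam * radius) for lam in lams]

# Every point of the segment stays within the closed ball of this radius about p.
inside = all(dist(p, point_on_segment(p, q, lam)) <= radius + 1e-12 for lam in lams)
print(max(errors), inside)
```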
One useful way to study real-valued functions is to study their level sets. If the function
is continuous and c ∈ R it is easy to check that the sets {x | f(x) ≤ c} and {x | f(x) ≥ c}
are both closed. Likewise the set {x | f(x) = c} is closed. Indeed, for the first of these
three sets, if x∗ is a limit point there is a sequence {xk}∞k=1 ⊂ {x | f(x) ≤ c} such that
xk → x∗. The continuity of f implies that c ≥ f(xk) → f(x∗). Hence f(x∗) ≤ c and so
x∗ ∈ {x ∈ Rn | f(x) ≤ c}. The proofs in the other cases are similar.
We note for future reference that if the function is a linear functional, i.e. if it has the
form f(x) = a1x1 + a2x2 + . . . + anxn = 〈x, a〉, it is a continuous function. In this case, the
third of these sets, namely {x | f(x) = c}, is called a hyperplane and the two other level
sets constitute two closed half spaces determined by this hyperplane as can be seen easily
by drawing a picture for the case n = 2. These structures will play a significant role in
our discussion of convex sets and optimization, particularly in the theory of duality.
As one would expect from earlier studies, the composition of continuous functions is
continuous. The composition is defined in the usual way. Suppose that X ⊂ Rn and that
f : X → Rm, while Y ⊂ Rm contains f(X). Then, if g : Y → Rp, we can define g ◦ f : X → Rp
by x ↦ g(f(x)). Then it is easy to see that the continuity of f and g implies the continuity
of g ◦ f. Indeed, let xo ∈ X and let {xk}∞k=1 ⊂ X be any sequence converging to xo. Then the
continuity of f implies that f(xk) → f(xo) and, since g is continuous, g(f(xk)) → g(f(xo))
as k → ∞. Hence the continuity of the composition.
It is important as well as often useful, to know how open and closed sets are related to
continuity.
B.4.2 Semicontinuous Functions
For optimization problems the notion of lower semicontinuous function (as well as upper
semicontinuous function) is crucial. Suppose that f : Rn → R and that {xk}∞k=1 ⊂ Rn is a
sequence. Then {f(xk)}∞k=1 is a sequence in R.
Definition B.4.11 A function f is lower semicontinuous at xo provided
\[ f(x_o) \le \liminf_{k \to \infty} f(x_k) \]
for all sequences {xk}∞k=1 which converge to xo. A function is called lower semicontinuous
on a set D ⊂ Rn provided that it is lower semicontinuous at every point of D. The
function is called upper semicontinuous at xo provided
\[ f(x_o) \ge \limsup_{k \to \infty} f(x_k) \]
for all sequences {xk}∞k=1 which converge to xo.
We note that, from the relations between lim sup and lim inf it follows that a function
is continuous at a point if it is both lower and upper semicontinuous at that point. Indeed
\[ f(x_o) \le \liminf_{k \to \infty} f(x_k) \le \limsup_{k \to \infty} f(x_k) \le f(x_o) . \]
The converse statement is trivial.
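Example B.4.12 Simple examples of functions which are upper semicontinuous everywhere
but are not continuous at some x0 are
\[
f_1(x) = \begin{cases} 0, & \text{if } x < x_0 \\ 1, & \text{if } x \ge x_0 \end{cases}
\qquad \text{and} \qquad
f_2(x) = \begin{cases} 0, & \text{if } x \ne x_0 \\ 1, & \text{if } x = x_0 . \end{cases}
\]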
On the other hand −f1 and −f2 are examples of lower semicontinuous functions. A
more interesting example is χQ, the characteristic function of the set of rationals.¹⁴ The
function χQ is clearly upper semicontinuous at each rational and lower semicontinuous at
each irrational.
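The behavior of f1 and f2 at x0 can be seen along explicit sequences. In the sketch below x0 = 0, and the largest and smallest values over a late stretch of a sequence are used as stand-ins for lim sup and lim inf; that substitution is an assumption which happens to be exact for these two step functions but is not valid in general.

```python
def f1(x, x0=0.0):
    """Step function: 0 to the left of x0, 1 at x0 and to the right."""
    return 1.0 if x >= x0 else 0.0

def f2(x, x0=0.0):
    """Function equal to 1 at x0 and 0 elsewhere."""
    return 1.0 if x == x0 else 0.0

def tail_sup_inf(f, seq, tail=100):
    """Largest and smallest value of f over the last `tail` terms of seq,
    used here as a proxy for lim sup and lim inf along the sequence."""
    vals = [f(x) for x in seq[-tail:]]
    return max(vals), min(vals)

x0 = 0.0
# A sequence converging to x0 that alternates between the two sides of x0.
alternating = [x0 + (-1) ** k / k for k in range(1, 1001)]

sup1, inf1 = tail_sup_inf(f1, alternating)
sup2, inf2 = tail_sup_inf(f2, alternating)

print(f1(x0) >= sup1, f1(x0) <= inf1)   # upper semicontinuity test, lower semicontinuity test
print(f2(x0) >= sup2, f2(x0) <= inf2)
```

For both functions the upper inequality holds at x0 while the lower one fails, which matches the claim that they are upper semicontinuous but not continuous there.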
It is interesting and important in optimization, to see that the upper and lower semi-
continuous functions can be characterized by the nature of the sets {x | f(x) ≥ a} and
{x | f(x) ≤ a}. This is the content of the next result.
Theorem B.4.13 A function, f , is upper semicontinuous on E ⊂ Rn if and only if
{x ∈ E|f(x) ≥ a} is closed for all a ∈ R and f is lower semicontinuous on E if and only
if {x ∈ E|f(x) ≤ a} is closed for all a ∈ R.
¹⁴ Given a set S, the characteristic function of S, χS, is the function that takes the value 1 on S and 0
at all other points.
Proof: We check only the first statement since it is equivalent to the second, f being
upper semicontinuous on E if and only if −f is lower semicontinuous there.
Suppose, first, that f is upper semicontinuous on E. Let a ∈ R and let xo ∈ E be
a limit point of {x ∈ E | f(x) ≥ a}. Then there exists a sequence xk → xo with xk ∈ E
and f(xk) ≥ a for k = 1, 2, . . .. By upper semicontinuity, $f(x_o) \ge \limsup_{k \to \infty} f(x_k) \ge a$. Hence
the set is closed (relative to E). Conversely, suppose that xo is a limit point of E which is in E and that
f is not upper semicontinuous at xo. Then f(xo) < ∞ and there are a number M and a sequence
{xk}∞k=1 ⊂ E such that f(xo) < M, xk → xo, and f(xk) ≥ M. Hence {x ∈ E | f(x) ≥ M}
is not closed since it does not contain all of its limit points.
Remark: By complementation, f is upper semicontinuous on E provided sets of the
form {x ∈ E | f(x) < a} are open, and lower semicontinuous provided sets of the form
{x ∈ E | f(x) > a} are open.
Exercise B.4.14 Let S ⊂ Rn and let χS be its characteristic function. Then χS is upper
semicontinuous on Rn if and only if S is closed.
We have seen in Theorem B.4.6 that if K ⊂ Rn is compact and if f : K → R is
continuous, then the function f assumes its least upper bound and greatest lower bound
at points of K. In fact, a careful look at the definitions and the proof of that result leads
to the conclusion that if f is upper semicontinuous on K, then it will achieve its least
upper bound and, if it is lower semicontinuous, it will achieve its greatest lower bound.
Let us prove the first statement.
Theorem B.4.15 If K ⊂ Rn is compact and if f : K → R is upper semicontinuous on
K, then there exists a point xo ∈ K for which f(xo) = supx∈K f(x).
Proof: Let $L := \sup_{x \in K} f(x)$. Then, by definition of the supremum, there exists a
sequence of points {xk}∞k=1 ⊂ K such that f(xk) → L as k → ∞. Since K is compact,
the sequence {xk}∞k=1 contains a convergent subsequence {xkℓ}∞ℓ=1 which converges to
some point xo ∈ K. By the upper semicontinuity of f,
\[ L \ge f(x_o) \ge \limsup_{\ell \to \infty} f(x_{k_\ell}) = \lim_{\ell \to \infty} f(x_{k_\ell}) = L . \]
Hence f(xo) = L, which was to be proved.
This result just hints at the importance that upper and lower semicontinuous real-
valued functions play in the theory and practice of optimization.
B.4.3 The Extended Reals
In our work, we will often find it useful to extend the usual set R of real numbers to a
set which allows us to represent unbounded numbers. In order to do this, we adjoin two
symbols −∞ and ∞ to R and denote by R∗ the set R ∪ {−∞} ∪ {∞}. This set is known
as the set of extended real numbers. We extend the usual order relation of R to the
set R∗ by defining −∞ < ∞ and, for all x ∈ R, −∞ < x < ∞. Thus, in particular,
intervals, whether open, closed, or half open (or half closed) are defined as usual, e.g.,
[a, b] := {x ∈ R∗ | a ≤ x ≤ b}.
If, in this last example, a, b ∈ R then the interval [a, b] is said to be bounded; otherwise,
we call it unbounded.
The usual arithmetic operations are likewise extended to R∗, with some exceptions. For x ∈ R,
∞ + x = x + ∞ = ∞ ; −∞ + x = x + (−∞) = x − ∞ = −∞ ,
∞ + ∞ = ∞ , (−∞) + (−∞) = −∞ ; −(∞) = −∞ , and −(−∞) = ∞ ,
but we do not define ∞ + (−∞) or −∞ + ∞. Likewise, for x > 0,
∞ · x = x · ∞ = ∞
and
−∞ · x = x · (−∞) = −∞ ,
while for x < 0,
∞ · x = x · ∞ = −∞
and
−∞ · x = x · (−∞) = ∞ .
The expressions ∞ · (−∞) and (−∞) · ∞ are not defined. Finally, we will find it
convenient to adopt the convention that
∞ · 0 = 0 · ∞ = −∞ · 0 = 0 · (−∞) = 0 .
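These conventions differ in one respect from IEEE floating-point arithmetic, where the product of an infinity and zero is "not a number". The sketch below wraps Python's float infinities in two hypothetical helpers, ext_add and ext_mul, that follow the rules above. The text does not discuss the product of two infinities of the same sign; as an assumption, the sketch lets Python's own rule apply in that case.

```python
import math

INF = math.inf

def ext_add(a, b):
    """Sum in the extended reals; inf + (-inf) is left undefined."""
    if (a == INF and b == -INF) or (a == -INF and b == INF):
        raise ValueError("inf + (-inf) is not defined")
    return a + b

def ext_mul(a, b):
    """Product in the extended reals with the convention (+-inf) * 0 = 0."""
    if math.isinf(a) and math.isinf(b) and a != b:
        raise ValueError("inf * (-inf) is not defined in this convention")
    if a == 0 or b == 0:
        return 0.0   # convention: inf * 0 = 0 * inf = 0
    # Same-sign infinities fall through to Python's float rule (assumption).
    return a * b

print(ext_add(INF, 5.0), ext_add(-INF, 5.0))
print(ext_mul(INF, 2.0), ext_mul(INF, -2.0), ext_mul(-INF, -2.0))
print(ext_mul(INF, 0.0), math.isnan(INF * 0.0))  # convention vs. plain float product
```

The convention ∞ · 0 = 0 is the reason the helper is needed at all: plain float multiplication cannot be used as it stands.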
Once we have an ordering on R? we can define suprema and infima.
Definition B.4.16 Let A ⊂ R∗. Then we define the supremum of A,
sup A, and the infimum of A, inf A, as follows.
If A ≠ ∅ then if
(i) A is bounded above, then sup A is an element u ∈ R which is an upper
bound and is smaller than all other upper bounds for A;
(ii) A is bounded below, then inf A is an element b ∈ R which is a lower bound
and is larger than all other lower bounds for A;
(iii) A is unbounded above, then sup A = ∞;
and if
(iv) A is unbounded below, then inf A = −∞.
On the other hand, if A = ∅ then
(i) sup A = −∞
and
(ii) inf A = ∞.
In the case that sup A ∈ A it will be called max A. Similarly, inf A will be
called min A if inf A ∈ A.
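For finite collections of extended reals the definition reduces to taking a maximum or minimum, with the stated values for the empty set. The sketch below covers only that finite case; the helper names ext_sup and ext_inf are hypothetical.

```python
import math

def ext_sup(A):
    """Supremum of a finite collection of extended reals; sup of the empty set is -inf."""
    return max(A, default=-math.inf)

def ext_inf(A):
    """Infimum of a finite collection of extended reals; inf of the empty set is +inf."""
    return min(A, default=math.inf)

print(ext_sup([1.0, 3.0, 2.0]), ext_inf([1.0, 3.0, 2.0]))
print(ext_sup([1.0, math.inf]), ext_inf([-math.inf, 0.0]))
print(ext_sup([]), ext_inf([]))   # the empty set is the only case with sup < inf
```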
For some purposes it will be convenient to treat extended real valued functions. Sit-
uations arise where, for example, it is convenient to extend a function to all of Rn by
defining the value of the function as ∞ outside its original domain. In this context, it is
useful to introduce the definition of the indicator function of a set S.
Definition B.4.17 Let S be a non-empty set. Then the indicator function of S is
defined as the function ψS given by
\[ \psi_S(x) = \begin{cases} 0 & \text{if } x \in S \\ +\infty & \text{if } x \notin S . \end{cases} \]
Then, given a function f : S → R with S ⊂ Rn, we can extend f to f̃ : Rn → R∗ by defining
\[ \tilde f(x) := (f + \psi_S)(x) = \begin{cases} f(x), & \text{if } x \in S \\ +\infty, & \text{if } x \notin S . \end{cases} \]
If, on the other hand, we start with a function f : Rn → R and consider a non-empty set
X ⊂ Rn (which may, for example, represent the set of feasible points of an optimization
problem), then we can consider the restriction of f to X defined in terms of its graph
Gr (f |X) := {(x, y) |x ∈ X, y = f(x)} .
Then we can identify f|X with the extended real-valued function f̃ = f + ψX.
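The identification of a constrained problem with the extended function f + ψX can be sketched in a few lines. The function f(x) = (x − 1)², the feasible set X = [2, 3] and the search grid below are sample choices made for illustration, and the helper indicator is hypothetical.

```python
import math

def indicator(in_S):
    """Indicator function psi_S built from a membership test in_S(x) -> bool."""
    def psi(x):
        return 0.0 if in_S(x) else math.inf
    return psi

def f(x):
    """Sample function defined on all of R; its unconstrained minimizer is x = 1."""
    return (x - 1.0) ** 2

def in_X(x):
    """Membership test for the sample feasible set X = [2, 3]."""
    return 2.0 <= x <= 3.0

psi_X = indicator(in_X)

def f_ext(x):
    """Extended real-valued function f + psi_X: equals f on X and +inf off X."""
    return f(x) + psi_X(x)

# Minimizing f_ext over a grid on [-5, 5] only ever selects feasible points.
grid = [i / 100 for i in range(-500, 501)]
x_best = min(grid, key=f_ext)
print(x_best, f_ext(x_best), f_ext(1.0))
```

Minimizing the extended function over the whole line returns the constrained minimizer x = 2 rather than the unconstrained one, since every infeasible point carries the value +∞.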
B.4.4 Epigraphs and Effective Domains
The notions of epigraph and effective domain will be crucial for our discussion.
Suppose that X ⊂ Rn and f : X → R∗. Then the epigraph of f is the subset of Rn+1
given by
epi (f) := {(x, z) ∈ Rn+1 |x ∈ X, z ≥ f(x)} ,
while the effective domain of f is defined to be the set
dom (f) := {x ∈ X | f(x) <∞} .
Since epi(f) ⊂ Rn+1, it is easy to see that
dom (f) = {x ∈ X | for some z <∞ , (x, z) ∈ epi (f)}
Thus, the effective domain is just the projection of epi(f) onto Rn. If the function f is
restricted to its effective domain, its epigraph is not affected. Likewise, if we extend f by
setting f(x) =∞ for all x ∈ Rn \X the epigraph remains the same.
We often exclude the degenerate case where f ≡ ∞ (in which case epi(f) = ∅) as well
as the case in which f takes on the value −∞ at some point of its domain (in which case
epi(f) contains a vertical line.) We will say that f is proper if f(x) <∞ for at least one
x ∈ X and f(x) > −∞ for all x ∈ X.
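Membership in the epigraph and in the effective domain can be tested pointwise. The sketch below uses the sample function f(x) = 1/x for x > 0, extended by +∞ elsewhere; the helper names are hypothetical, and properness is checked only on a finite sample of points, which is an assumption and not a proof.

```python
import math

def f(x):
    """Sample extended real-valued function: 1/x for x > 0 and +inf otherwise."""
    return 1.0 / x if x > 0 else math.inf

def in_epi(func, x, z):
    """(x, z) lies in epi(func) when z is a real number with z >= func(x)."""
    return math.isfinite(z) and z >= func(x)

def in_dom(func, x):
    """x lies in the effective domain when func(x) < +inf."""
    return func(x) < math.inf

def is_proper_on(func, points):
    """Properness checked on a finite sample only: some value is < +inf
    and no value equals -inf."""
    vals = [func(x) for x in points]
    return any(v < math.inf for v in vals) and all(v > -math.inf for v in vals)

print(in_dom(f, 2.0), in_dom(f, -1.0))
print(in_epi(f, 2.0, 0.5), in_epi(f, 2.0, 0.4), in_epi(f, -1.0, 1e9))
print(is_proper_on(f, [-1.0, 1.0, 2.0]), is_proper_on(lambda x: math.inf, [0.0, 1.0]))
```

No real z puts (−1, z) in the epigraph, which is the projection statement above read at a single point: x lies in dom(f) exactly when some (x, z) lies in epi(f).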
We have seen, above, that there is a close connection between a lower semicontinuous
function and properties of its level sets. We extend that result here.
Proposition B.4.18 For a function f : Rn → R∗, the following statements are equiva-
lent.
(a) The level set {x ∈ Rn | f(x) ≤ α} is closed for every real number α.
(b) The function f is lower semicontinuous on Rn.
(c) The set epi(f) is closed.
Proof: If f(x) ≡ ∞ the result is trivial. So we may assume that f(x) < ∞ for at least
one x ∈ Rn and, hence, that epi(f) 6= ∅. It follows from the definitions that there is at
least one non-empty level set.
Assume that the level set {x ∈ Rn | f(x) ≤ α} is closed for every choice of real
number α but that, for some x and some sequence {xk}∞k=1 with xk → x, we have
$f(x) > \liminf_{k \to \infty} f(x_k)$. Choose γ such that $f(x) > \gamma > \liminf_{k \to \infty} f(x_k)$. Then there
exists a subsequence, call it, again, {xk}∞k=1, such that f(xk) ≤ γ for all k = 1, 2, · · · . Since
level sets are assumed closed, this implies that f(x) ≤ γ, which is a contradiction.
Now, suppose that f is lower semicontinuous and let (x, z) be a limit point of epi(f).
Then there exists a sequence {(xk, zk)}∞k=1 ⊂ epi(f) such that xk → x and zk → z and f(xk) ≤ zk
for all k. Hence, by lower semicontinuity, $f(x) \le \liminf_{k \to \infty} f(x_k) \le z$. Hence
(x, z) ∈ epi(f) and so epi(f) is closed.
Finally, to see that a closed epigraph entails closed level sets, suppose
{xk}∞k=1 ⊂ {x | f(x) ≤ α} for some α ∈ R. Suppose, further, that xk → x. Then, for every
k, (xk, α) ∈ epi(f) and, since the epigraph is assumed closed and (xk, α) → (x, α), we
have (x, α) ∈ epi(f). Hence f(x) ≤ α so that x ∈ {x | f(x) ≤ α}.
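The last step of the proof can be watched on two step functions on R, one lower semicontinuous at 0 and one not. The functions g_lsc and g_usc and the sequence xk = −1/k below are sample choices made for illustration.

```python
def g_lsc(x):
    """0 for x <= 0 and 1 for x > 0: lower semicontinuous at 0."""
    return 0.0 if x <= 0 else 1.0

def g_usc(x):
    """0 for x < 0 and 1 for x >= 0: upper but not lower semicontinuous at 0."""
    return 0.0 if x < 0 else 1.0

def in_epi(func, x, z):
    """(x, z) lies in epi(func) when z >= func(x)."""
    return z >= func(x)

alpha = 0.0
seq = [-1.0 / k for k in range(1, 200)]   # x_k -> 0 from the left
limit = 0.0

# Every (x_k, alpha) lies in both epigraphs ...
in_lsc = all(in_epi(g_lsc, x, alpha) for x in seq)
in_usc = all(in_epi(g_usc, x, alpha) for x in seq)

# ... but the limit point (0, alpha) lies only in the epigraph of g_lsc.
print(in_lsc, in_epi(g_lsc, limit, alpha))
print(in_usc, in_epi(g_usc, limit, alpha))
```

For g_usc the level set {x | g_usc(x) ≤ 0} is the open half-line of negative numbers, which is not closed, in agreement with the equivalence of (a), (b) and (c).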
66 APPENDIX B. ANALYSIS
Bibliography
[1] D. P. Bertsekas, Nonlinear Programming, Second Ed., Athena Scientific, Bell-
mont, MA, 1999.
[2] R. P. Boas,A Primer of Real Functions, Carus Mathematical Monographs vol. 13,
Fourth Ed., Mathematical Asooiciation of America, Washington, DC, 1996.
[3] F. H. Clarke, Optimization and Nonsmooth Analysis, John Wiley & Sons, New
York, NY, 1983.
[4] G. Danzig, Linear Programming and Extensions, Princeton University Press,
Princeton, NJ, 1963.
[5] G. Debreu, Theory of Value: an axiomatic analysis of economic equilibrium, Cowles
Foundation Monograph 17, Yale University Press, New Haven, CT, 1959.
[6] P. R. Halmos, Naive Set Theory, D. van Nostrand Company, Princeton, New
Jersey, 1961. (Rpt: Springer-Verlag, 1974).
[7] M. D. Intriligator, Mathematical Optimization and Economic Theory, SIAM
Publications, Philadelphia, PA, 2002.
[8] A. Marshall, Princples of Economics, Ninth (Variorum) Ed., McMillan, New York,
NY, 1961.
[9] J. von Neumann and O. Morgenstern, Theory of Games and Economic Be-
havior, John Wiley & Sons, New York, 1964.
[10] R. T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, NJ,
1970.
[11] M. Rosenlicht, Introduction to Analysis, Dover Publications, New York, NY, 1986.
67
68 BIBLIOGRAPHY
[12] W. Ruden, Principles of Mathematical Analysis, McGraw-Hill, New York, NY,
1976.
[13] P. A. Samuelson, Foundations of Economic Analysis, Harvard University Press,
Cambridge MA, 1947.
[14] P. A. Samuelson, Economics: an introductory analysis, Fifth Ed., McGraw-Hill,
New York, NY, 1961.
[15] A. Takayama, Mathematical Economics, Second Ed., Cambridge University Press,
Cambridge, UK, 1985.