
Stephen M. Robinson

Convexity in Finite-DimensionalSpaces

November 3, 2015

Springer


Preface

Sections with an asterisk before their titles contain material that is not required for the understanding of later sections, though there may be references to it and it is likely to be useful in providing perspective on the subject.

(Remainder of preface to be written)


Contents

Part I Convex Sets

1 Convex sets
   1.1 Convex sets
      1.1.1 Basic ideas
      1.1.2 Dimensionality results
      1.1.3 Simplices
      1.1.4 Exercises for Section 1.1
   1.2 Relative interiors of convex sets
      1.2.1 Nonemptiness of the relative interior
      1.2.2 Line segments and the relative interior
      1.2.3 Regularity conditions for set operations
      1.2.4 Exercises for Section 1.2
   1.3 Notes and references

2 Separation
   2.1 Definitions of separation
   2.2 Conditions for separation
      2.2.1 Strong separation
      2.2.2 Proper separation
      2.2.3 Exercises for Section 2.2

3 Cones
   3.1 Basic properties and some applications
      3.1.1 Definitions
      3.1.2 Relative interiors of cones
      3.1.3 Two theorems of the alternative
      3.1.4 Exercises for Section 3.1
   3.2 Cones and linear transformations
      3.2.1 Exercises for Section 3.2
   3.3 Tangent and normal cones
      3.3.1 Exercises for Section 3.3
   3.4 Notes and references

4 Structure
   4.1 Recession
      4.1.1 The recession cone
      4.1.2 The homogenizing cone
      4.1.3 When are linear images of convex sets closed?
      4.1.4 Exercises for Section 4.1
   4.2 Faces
      4.2.1 Exercises for Section 4.2
   4.3 Representation
      4.3.1 Exercises for Section 4.3
   4.4 Notes and references

5 Polyhedrality
   5.1 Polyhedral convex sets
   5.2 The Minkowski-Weyl theorem and consequences
      5.2.1 The Minkowski-Weyl theorem
      5.2.2 Applications
      5.2.3 Exercises for Section 5.2
   5.3 Polyhedrality and structure
   5.4 Polyhedral multifunctions
      5.4.1 Elementary properties
      5.4.2 Polyhedral convex multifunctions
      5.4.3 General polyhedral multifunctions
      5.4.4 Exercises for Section 5.4
   5.5 Polyhedrality and separation
      5.5.1 The CRF theorem
      5.5.2 Some applications of the CRF theorem
      5.5.3 Exercises for Section 5.5
   5.6 Notes and references

Part II Convex Functions

6 Basic properties
   6.1 Convexity
      6.1.1 Convex functions with extended real values
      6.1.2 Some special results for finite-valued convex functions
      6.1.3 *Example: Convexity and an intractable function
      6.1.4 Exercises for Section 6.1
   6.2 Lower semicontinuity and closure
      6.2.1 Semicontinuous functions
      6.2.2 Closed convex functions
      6.2.3 Exercises for Section 6.2

7 Conjugacy and recession
   7.1 Conjugate functions
      7.1.1 Definition and properties
      7.1.2 Adjoints
      7.1.3 Support functions
      7.1.4 Gauge functions
      7.1.5 Exercises for Section 7.1
   7.2 Recession functions
      7.2.1 Exercises for Section 7.2

8 Subdifferentiation
   8.1 Subgradients and the subdifferential
      8.1.1 Exercises for Section 8.1
   8.2 Directional derivatives
      8.2.1 Exercises for Section 8.2
   8.3 Structure of the subdifferential
      8.3.1 Exercises for Section 8.3
   8.4 The epsilon-subdifferential
      8.4.1 Definition and properties

9 Functional operations
   9.1 Convex functions and linear transformations
   9.2 Moreau envelopes and applications
   9.3 Systems of convex inequalities

10 Duality
   10.1 Conceptual model
   10.2 Existence of saddle points
   10.3 Conjugate duality: formulation
      10.3.1 Constructing the Lagrangian
      10.3.2 Symmetric duality
      10.3.3 Example: linear programming
      10.3.4 An economic interpretation
   10.4 Conjugate duality: existence of solutions
      10.4.1 Example: convex nonlinear programming
      10.4.2 The augmented Lagrangian
   10.5 Notes and references

A Topics from Linear Algebra
   A.1 Linear spaces
      A.1.1 Exercises for Section A.1
   A.2 Normed linear spaces
      A.2.1 Exercises for Section A.2
      A.2.2 Notes and references
   A.3 Linear spaces of matrices
      A.3.1 Exercises for Section A.3
      A.3.2 Notes and references
   A.4 The singular value decomposition of a matrix
      A.4.1 Exercises for Section A.4
      A.4.2 Notes and references
   A.5 Generalized inverses of linear operators
      A.5.1 Existence of generalized inverses
      A.5.2 The Moore-Penrose generalized inverse
      A.5.3 Notes and references
   A.6 Affine sets
      A.6.1 Basic ideas
      A.6.2 The affine hull
      A.6.3 General position
      A.6.4 Affine transformations
      A.6.5 The lineality space of a general set
      A.6.6 Exercises for Section A.6

B Topics from Analysis
   B.1 Three inequalities
      B.1.1 The arithmetic-geometric mean inequality
      B.1.2 The Hölder inequality
      B.1.3 Minkowski's inequality
   B.2 *Projectors and distance functions
      B.2.1 Notes and references

C Topics from Topology
   C.1 Connectedness

References

Index


Part I Convex Sets


Chapter 1 Convex sets

This chapter introduces convex sets and explains some basic constructions on them, such as the convex hull. A special case of this construction is the simplex, a kind of elementary convex set that is very useful for developing properties of convex sets. Two important theorems, due to Carathéodory and to Shapley and Folkman respectively, follow. By combining properties of the simplex with some elementary properties of interiors, one can define the relative interior of a convex set. In contrast to the situation with interiors, every nonempty convex set has a nonempty relative interior. Among other things, it leads to important regularity conditions for operations on convex sets.

To read this chapter you need to know several properties of affine sets (translated linear spaces). These are often included in linear algebra courses, but all of the necessary material is in Section A.6 in the appendix.

1.1 Convex sets

Most of what we do in this book will involve sets having a property called convexity. These sets turn out to be considerably more interesting than the simple affine sets appearing in Section A.6. This section gives basic definitions and results about convex sets that we use throughout the rest of the book.

1.1.1 Basic ideas

Definition 1.1.1. A subset C of Rn is convex if whenever c1 and c2 belong to C and λ ∈ (0,1), the point (1−λ)c1 + λc2 belongs to C. ⊓⊔

This says that C contains the line segment between each pair of its points, where we define the line segment [c1,c2] between c1 and c2 to be the set {(1−λ)c1 + λc2 | 0 ≤ λ ≤ 1}. We also use the notation (c1,c2], [c1,c2), and (c1,c2) for segments lacking one or both endpoints.

We call a collection λ1, . . . , λk of nonnegative numbers that sum to 1 convex coefficients, and we say a linear combination ∑_{i=1}^k λi xi of points xi ∈ Rn is a convex combination if the λi are convex coefficients.

Proposition 1.1.2. A subset C of Rn is convex if and only if it contains every convex combination of finitely many of its points.

The proof parallels that given for affine sets in Proposition A.6.2.

Proof. (if) This is immediate, since the hypothesis implies that C satisfies Definition 1.1.1.

(only if) Suppose that C is convex. If it is empty or a singleton, then the conclusion needs no proof. Therefore suppose C contains at least two points. We know that it contains all convex combinations of two-point subsets, because it is convex. Therefore let k > 2 and suppose C contains all convex combinations of k−1 or fewer points. Let x1, . . . , xk be points of C and let x := ∑_{i=1}^k λi xi be a convex combination of these. As k > 1 not all of the λi can be 1, and by renumbering if necessary we can assume that λk < 1. Then

x = (1−λk) ∑_{i=1}^{k−1} [(1−λk)^{−1} λi] xi + λk xk.

The summation in the first term is a convex combination of k−1 points of C; hence by the induction hypothesis it is in C. The definition of convexity then tells us that x ∈ C. ⊓⊔
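Proposition 1.1.2 can be checked numerically on a concrete convex set. The sketch below (standard-library Python only; the helper names are ours, not the book's) draws random points from the closed unit disk in R², forms random convex combinations of them, and confirms that every combination stays in the disk.

```python
import math
import random

def convex_combination(points, coeffs):
    """Form sum_i coeffs[i] * points[i]; the coeffs must be convex coefficients."""
    assert all(c >= 0 for c in coeffs) and abs(sum(coeffs) - 1.0) < 1e-12
    dim = len(points[0])
    return tuple(sum(c * p[j] for c, p in zip(coeffs, points)) for j in range(dim))

def in_unit_disk(x, tol=1e-9):
    return math.hypot(*x) <= 1.0 + tol

random.seed(0)
for _ in range(1000):
    k = random.randint(2, 6)
    pts = []
    while len(pts) < k:                       # rejection-sample points of the disk
        p = (random.uniform(-1, 1), random.uniform(-1, 1))
        if in_unit_disk(p):
            pts.append(p)
    raw = [random.random() for _ in range(k)] # random convex coefficients
    s = sum(raw)
    lam = [r / s for r in raw]
    assert in_unit_disk(convex_combination(pts, lam))
```

This is an illustration of the proposition on one convex set, not a proof; the disk could be replaced by any set for which membership is easy to test.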

Any convex set has an affine hull, and we define the dimension dimC and the parallel subspace parC of C to be those of affC. By using the definition of convexity one can show that the closure of a convex set is convex.

The definition of convexity shows that the intersection of an arbitrary collection of convex sets is convex. If we specify a subset of Rn and take the collection to consist of all convex sets containing that subset, we obtain the convex hull of the subset.

Definition 1.1.3. Let S be a subset of Rn. The convex hull of S, written convS, is the intersection of all convex subsets of Rn that contain S. ⊓⊔

As the intersection of convex sets is convex, the convex hull of a set S is itself a convex set. As it contains S, it is the smallest convex set containing S. It need not be closed, even if S is closed (Exercise 1.1.23), so the operators conv and cl need not commute.

An alternate way of obtaining convS is to build it up from S using the construction of Proposition 1.1.2, as we now show.

Proposition 1.1.4. If S is a subset of Rn, then convS is the set of all convex combinations of finitely many points of S.


Proof. Write C for the set of all such convex combinations. If T is a convex set containing S, then a finite collection of points from S is a subset of T, and by Proposition 1.1.2 any convex combination of such a collection must belong to T; therefore C ⊂ T. As T was arbitrary, we have proved that C ⊂ convS. On the other hand, a convex combination of two points of C is, by the definition of C, also a convex combination of some finite subset of S, and therefore it lies in C. This means that C is convex, and it certainly contains S; therefore C ⊃ convS. ⊓⊔

Using Proposition 1.1.4, one can show easily that the convex-hull operator commutes with an affine transformation: that is, if S is a subset of Rn and T is an affine map from Rn to Rm, then T(convS) = convT(S). The situation with inverse images is not quite so nice (Exercise 1.1.27).
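A one-instance numeric illustration (not a proof) of the identity T(convS) = convT(S): because the coefficients of a convex combination sum to 1, an affine map T(x) = Ax + b sends the combination ∑λi xi to the same combination ∑λi T(xi) of the images. The 2×2 map and the three points below are arbitrary choices made for this sketch.

```python
def affine_map(A, b, x):
    """Apply T(x) = Ax + b for a 2x2 matrix A and a vector b in R^2."""
    return (A[0][0]*x[0] + A[0][1]*x[1] + b[0],
            A[1][0]*x[0] + A[1][1]*x[1] + b[1])

A = [[2.0, 1.0], [0.0, 3.0]]
b = (1.0, -2.0)
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
lam = [0.2, 0.3, 0.5]

# T applied to the convex combination ...
x = tuple(sum(l * p[j] for l, p in zip(lam, pts)) for j in range(2))
lhs = affine_map(A, b, x)
# ... equals the same convex combination of the images, since sum(lam) == 1
images = [affine_map(A, b, p) for p in pts]
rhs = tuple(sum(l * q[j] for l, q in zip(lam, images)) for j in range(2))
assert all(abs(lhs[j] - rhs[j]) < 1e-12 for j in range(2))
```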

If S is a subset of Rn, then the diameter of S is diamS = sup{∥s − s′∥ | s, s′ ∈ S}; it may be +∞, and it is −∞ by convention if S is empty. Although taking the convex hull may enlarge a set, it does not increase its diameter.

Proposition 1.1.5. If S is a subset of Rn, then diam convS = diamS.

Proof. If S is empty then so is convS and there is nothing to prove. If S is nonempty, then as it is a subset of convS we have diamS ≤ diam convS. On the other hand, if x and x′ are two points of convS then by Proposition 1.1.4 we can write

x = ∑_{i=1}^I λi si,   x′ = ∑_{j=1}^J µj s′j,

where the λi and µj are convex coefficients and the points si and s′j belong to S. Then the products λi µj are also convex coefficients, so

∥x − x′∥ = ∥∑_{i=1}^I λi si − ∑_{j=1}^J µj s′j∥ = ∥∑_{i=1}^I ∑_{j=1}^J λi µj (si − s′j)∥ ≤ ∑_{i=1}^I ∑_{j=1}^J λi µj ∥si − s′j∥.

As ∥si − s′j∥ ≤ diamS for each i and j, we have diam convS ≤ diamS. ⊓⊔
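Proposition 1.1.5 is easy to probe numerically: sample random points of convS as convex combinations of a finite set S and check that no pairwise distance exceeds diamS. A standard-library sketch (helper names ours):

```python
import math
import random

def diameter(points):
    """diam S for a finite S: the largest pairwise distance."""
    return max(math.dist(p, q) for p in points for q in points)

def random_convex_combination(points, rng):
    """A random point of conv(points), via random convex coefficients."""
    raw = [rng.random() for _ in points]
    total = sum(raw)
    return tuple(sum((r / total) * p[j] for r, p in zip(raw, points))
                 for j in range(len(points[0])))

rng = random.Random(1)
S = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(8)]
d = diameter(S)
for _ in range(2000):
    x = random_convex_combination(S, rng)
    y = random_convex_combination(S, rng)
    assert math.dist(x, y) <= d + 1e-9   # diam conv S <= diam S
```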

1.1.2 Dimensionality results

Proposition 1.1.4 gives us a usable representation of the convex hull of a subset S of Rn in terms of points of S. However, it does not exclude the possibility that we might need to take convex combinations of arbitrarily large numbers of points to form elements of convS. In finite-dimensional spaces that possibility does not occur, as we see next.

Theorem 1.1.6. Let S be a subset of Rn. Then each element of convS is a convex combination, with strictly positive coefficients, of a finite set of points of S that are in general position.


The finite set of points will depend on the chosen element, and in general it will not be unique.

Proof. If S is empty, there is nothing to prove. Therefore suppose x ∈ convS, and use Proposition 1.1.4 to find an integer k ≥ 0, a set {s0, . . . , sk} of elements of S, and a set of convex coefficients λ0, . . . , λk with x = ∑_{i=0}^k λi si. We may suppose that k is the smallest integer for which x can be expressed as such a convex combination, and therefore in particular that all of the λi are positive. If the si are in general position we are finished; therefore suppose not. The λi satisfy the linear system

∑_{i=0}^k (si, 1)ᵀ λi = (x, 1)ᵀ,   (1.1)

and by Theorem A.6.12 the vectors (si, 1)ᵀ on the left side of (1.1) are linearly dependent. Let µ0, . . . , µk be numbers, not all zero, satisfying

∑_{i=0}^k (si, 1)ᵀ µi = 0,

where the element on the right-hand side is the origin of Rn+1. As the µi sum to zero but are not all zero, at least one is positive. Let ρ be the minimum of the ratios λi/µi for those i such that µi > 0, and for each i let τi = λi − ρµi. The τi are then convex coefficients, and x = ∑_{i=0}^k τi si. But at least one of the τi is zero (choose any i for which the minimum in the definition of ρ was attained), and this contradicts our minimality assumption about k. Therefore in fact the si are in general position. ⊓⊔
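The proof above is constructive, and one reduction step can be followed on a concrete example. Below, x is the average of the four corners of a square in R², a set of points not in general position; µ = (1, −1, −1, 1) is one affine dependence of the lifted vectors, chosen by hand for this sketch. Exact rational arithmetic keeps the check airtight.

```python
from fractions import Fraction as F

# Four points of S in R^2 (the corners of a square, not in general position)
# and x = their average, written as a convex combination of all four.
s = [(F(0), F(0)), (F(2), F(0)), (F(0), F(2)), (F(2), F(2))]
lam = [F(1, 4)] * 4
x = tuple(sum(l * p[j] for l, p in zip(lam, s)) for j in range(2))  # the point (1, 1)

# A hand-chosen affine dependence of the lifted vectors (s_i, 1):
# the mu_i sum to zero and sum_i mu_i * s_i = 0, with the mu_i not all zero.
mu = [F(1), F(-1), F(-1), F(1)]
assert sum(mu) == 0
assert all(sum(m * p[j] for m, p in zip(mu, s)) == 0 for j in range(2))

# The reduction step of the proof: rho = min{lam_i/mu_i | mu_i > 0},
# then tau_i = lam_i - rho * mu_i.
rho = min(l / m for l, m in zip(lam, mu) if m > 0)
tau = [l - rho * m for l, m in zip(lam, mu)]
assert all(t >= 0 for t in tau) and sum(tau) == 1  # still convex coefficients
assert F(0) in tau                                 # at least one point drops out
x2 = tuple(sum(t * p[j] for t, p in zip(tau, s)) for j in range(2))
assert x2 == x                                     # same point, fewer terms
```

Here the step eliminates two of the four corners at once, leaving x = (1,1) as the midpoint of (2,0) and (0,2).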

Corollary 1.1.7 (Carathéodory's theorem). If a subset S of Rn has dimension k, then each point of convS is a convex combination of no more than k + 1 points of S.

Proof. If S is empty there is nothing to prove. Otherwise, given a point of convS use Theorem 1.1.6 to write it as a convex combination of a subset of S that is in general position. No such subset can contain more than k + 1 elements. ⊓⊔

We noted earlier that the convex hull of a closed set need not be closed. As an example of the application of Carathéodory's theorem, we show that if the set is taken to be compact instead of just closed, then the convex hull is also compact.

Proposition 1.1.8. If S is a compact subset of Rn, then convS is compact.

Proof. If S is empty there is nothing to prove, so assume S ≠ ∅ and let

Λ = {(λ1, . . . , λn+1) | λi ≥ 0 for each i, ∑_{i=1}^{n+1} λi = 1},   K = ∏_{i=1}^{n+1} S.

Then Λ and K are compact subsets of R^{n+1} and R^{(n+1)n} respectively; the points of K are just the (n+1)-tuples (s1, . . . , sn+1) with each si ∈ S. Let f be the function defined on Λ × K by


f(λ1, . . . , λn+1, s1, . . . , sn+1) = ∑_{i=1}^{n+1} λi si.

Then any point of the set f(Λ × K) is a convex combination of elements of S, hence an element of convS. But Carathéodory's theorem tells us that in fact f(Λ × K) includes every point of convS. Therefore convS is the image of the compact set Λ × K under the continuous function f, so it is compact. ⊓⊔

For any subset S of Rn we have convS ⊃ S, so clconvS ⊃ clS. We already noted that the closure of a convex set is convex, so clconvS is a convex set containing clS, and therefore it contains convclS. As we noted above, the two sets may not be equal. However, suppose S is bounded. Then clS is compact, so convclS is compact by Proposition 1.1.8. As clS ⊃ S we have convclS ⊃ convS. The set on the left is compact, hence a fortiori closed; by applying the closure operator we have convclS ⊃ clconvS. Therefore in this case the operators conv and cl actually commute. The situation is much simpler for the convex-hull and affine-hull operators; see Exercise 1.1.25.

The following theorem, attributed to Lloyd Shapley and Jon Folkman, has a proof similar to that of Carathéodory's theorem.

Theorem 1.1.9 (Shapley-Folkman, 1966). Let {Sα | α ∈ A} be a finite family of subsets of Rn. For each x ∈ conv ∑_{α∈A} Sα there is a subset G of A, containing no more than n elements, such that

x ∈ ∑_{α∈G} convSα + ∑_{α∈A\G} Sα.

Proof. Fix x ∈ conv ∑_{α∈A} Sα. As conv ∑_{α∈A} Sα = ∑_{α∈A} convSα, for each α ∈ A we can find a point x^α ∈ convSα and a positive integer Nα so that

x = ∑_{α∈A} x^α,   x^α = ∑_{i=1}^{Nα} λ_i^α x_i^α,

with {λ_i^α | i = 1, . . . , Nα} being convex coefficients for each α and with x_i^α ∈ Sα for each α and i.

For each choice of the points x^α and of their representations as convex combinations of points x_i^α, the quantity ∑_{α∈A} Nα is a positive integer. We select the points x^α and their representations so that the sum attains its minimum value over all such choices; in particular, this will force each coefficient λ_i^α to be positive.

Now let G = {α ∈ A | Nα > 1}. If α ∈ G then x^α does not equal any of the x_i^α, and if α ∉ G then x^α ∈ Sα. Thus if G has no more than n elements we are finished.

Suppose then that G has more than n elements; we will produce a contradiction. In this case the set {x^α − x_1^α | α ∈ G} is linearly dependent in Rn, so that there are scalars {µα | α ∈ G}, not all zero, with ∑_{α∈G} µα(x_1^α − x^α) = 0. Then for any real number θ we have

x − ∑_{α∈A\G} x^α = ∑_{α∈G} {x^α + θµα[x_1^α − x^α]} = ∑_{α∈G} ∑_{i=1}^{Nα} λ_i^α(θ) x_i^α,   (1.2)

where

λ_i^α(θ) = (1−θµα)λ_i^α + θµα if i = 1,   (1−θµα)λ_i^α if i > 1.

For each α ∈ G and for i = 1, . . . , Nα we have λ_i^α(0) = λ_i^α > 0, and for each θ and each α ∈ G, ∑_{i=1}^{Nα} λ_i^α(θ) = 1. In addition, for each α ∈ G, x^α ≠ x_1^α so that no λ_1^α can be 1. Therefore, for those α ∈ G for which µα ≠ 0, the quantities λ_i^α(θ) will change as θ increases from 0. As their sum remains 1, some will increase and some will decrease. Let θ∗ be the smallest value of θ for which one of the λ_i^α(θ) takes the value zero. Then (1.2) still holds, but the total number of positive coefficients λ_i^α(θ) has decreased by at least one, which is impossible because of our earlier choice to minimize the total number of positive coefficients. This contradiction shows that the number of elements in G does not exceed n. ⊓⊔

The Shapley-Folkman theorem has an interesting consequence, described in Example 1.1.11 below. To state the example we need the following definition.

Definition 1.1.10. Let P and Q be subsets of Rn.

a. The distance from a point x ∈ Rn to Q is dQ(x) = inf_{q∈Q} ∥x − q∥ (+∞ by convention if Q is empty). We will sometimes write d(x,Q) instead of dQ(x).

b. If Q is nonempty, the excess e(P,Q) of P over Q is sup_{p∈P} dQ(p) (−∞ by convention if P is empty).

c. If P and Q are nonempty, the Pompeiu-Hausdorff distance ρ(P,Q) between P and Q is max{e(P,Q), e(Q,P)}. ⊓⊔

In the literature, the Pompeiu-Hausdorff distance is often called the Hausdorff metric. An equivalent way of writing it is

ρ(P,Q) = inf{µ ≥ 0 | P ⊂ Q + µBn, Q ⊂ P + µBn},

where Bn is the Euclidean unit ball in Rn: that is, the set of all x ∈ Rn with ∥x∥ ≤ 1.
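For finite point sets, Definition 1.1.10 translates directly into code. The following standard-library sketch (function names ours) computes dQ, the excess, and the Pompeiu-Hausdorff distance; note that the excess is not symmetric, while ρ is.

```python
import math

def d_point_set(x, Q):
    """d_Q(x) = inf over q in Q of ||x - q||, for a finite nonempty Q."""
    return min(math.dist(x, q) for q in Q)

def excess(P, Q):
    """e(P, Q) = sup over p in P of d_Q(p), for finite nonempty P and Q."""
    return max(d_point_set(p, Q) for p in P)

def pompeiu_hausdorff(P, Q):
    """rho(P, Q) = max{e(P, Q), e(Q, P)}."""
    return max(excess(P, Q), excess(Q, P))

P = [(0.0, 0.0), (1.0, 0.0)]
Q = [(0.0, 0.0), (0.0, 3.0)]
assert excess(P, Q) == 1.0          # farthest point of P from Q is (1, 0)
assert excess(Q, P) == 3.0          # farthest point of Q from P is (0, 3)
assert pompeiu_hausdorff(P, Q) == 3.0
```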

Example 1.1.11. Suppose that we have a sequence of nonempty subsets {Si | i = 1, 2, . . .} of some compact convex set K in Rn. For any positive integer k define

Ak = k^{−1} ∑_{i=1}^k Si,   Ck = convAk.

Thus the Ak are averages of the sets S1, . . . , Sk, and Ak ⊂ Ck. The quantity e(Ck, Ak) is a measure of the nonconvexity of Ak.

Let k be large compared to n, and let x be any point of Ck, so that kx ∈ conv ∑_{i=1}^k Si. According to the Shapley-Folkman theorem, there are points x1 ∈ convS1, . . . , xk ∈ convSk, of which all but n can actually be chosen with xi ∈ Si, such that s := ∑_{i=1}^k xi = kx. We now replace those points xi in the sum s that are not in Si by points x′i ∈ Si, and call the resulting sum s′. Then s′ ∈ kAk. Further, we have s = s′ + r, where r is a sum of no more than n terms of the form xi − x′i. As both xi and x′i are in K, we have ∥r∥ ≤ n diamK. Then

x = k^{−1}s = k^{−1}s′ + k^{−1}r ∈ Ak + k^{−1}r.

As x was any point of Ck, we have

e(Ck, Ak) ≤ ∥k^{−1}r∥ ≤ (n/k) diamK.   (1.3)

The right-hand side of (1.3) becomes as small as we please as k increases, so that the set averages Ak become almost convex as the number of sets being averaged increases. ⊓⊔
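A one-dimensional instance of this example can be computed directly. Take every Si = {0, 1} ⊂ R, so K = [0, 1], n = 1, and diamK = 1; then Ak is the grid {0, 1/k, . . . , 1} and Ck = [0, 1]. The sketch below (names ours) estimates e(Ck, Ak) by sampling Ck and checks the bound (1.3); in this instance the exact excess is 1/(2k).

```python
def average_set(k):
    """A_k = k^{-1}(S_1 + ... + S_k) with each S_i = {0, 1}:
    the Minkowski sum is {0, 1, ..., k}, so A_k is the grid {j/k}."""
    return [j / k for j in range(k + 1)]

def excess_estimate(k, samples=10001):
    """Estimate e(C_k, A_k) with C_k = conv A_k = [0, 1] by sampling C_k."""
    grid = average_set(k)
    return max(min(abs(x - g) for g in grid)
               for x in (i / (samples - 1) for i in range(samples)))

for k in (2, 5, 10, 50):
    e = excess_estimate(k)
    assert e <= 1.0 / k + 1e-9    # the bound (1.3): (n/k) diam K = 1/k
    assert e <= 0.5 / k + 1e-3    # the exact excess here is 1/(2k)
```

The averages visibly "fill in" toward the interval as k grows, exactly the almost-convexity the example describes.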

1.1.3 Simplices

This section introduces a class of convex sets, with very simple structure, that serve as building blocks for other convex sets. A set of this class is called a simplex (plural: simplices). Later in the section we define a class of generalized simplices useful for understanding the properties of unbounded convex sets.

Simplices

Definition 1.1.12. A k-simplex in Rn is the convex hull σ of a set {x0, . . . , xk} of k + 1 points of Rn that are in general position. The points xi are the vertices of σ. The barycenter of σ is the point in σ whose barycentric coordinates with respect to the vertices are ((k+1)^{−1}, . . . , (k+1)^{−1}): that is, the average of its vertices. ⊓⊔

A point in the affine hull of a simplex actually belongs to the simplex if and only if its barycentric coordinates with respect to the vertices are convex coefficients. Thus another way to express the content of Theorem 1.1.6 is to say that convS is a union of simplices whose vertices are points of S.

The simplex illustrates why the number of points required by Carathéodory's theorem (Corollary 1.1.7) is the best possible if we specify only the dimensionality k of the set S appearing in the theorem. For any k ≥ 0, take S to be the set of vertices of a k-simplex. As the barycentric coordinates are unique, no point whose barycentric coordinates are all positive can be written as a convex combination of k or fewer points of S.

Lemma 1.1.13. Let A be a k-dimensional affine subset of Rn; let c be a point of A and N be a neighborhood of c in A. Then N contains a k-simplex σ having barycenter c.


Proof. Choose k linearly independent vectors y1, . . . , yk in parA. For any point x0 ∈ A and for i = 1, . . . , k, define xi := x0 + yi. The points x0, . . . , xk are then in general position by Theorem A.6.12. The barycenter of the k-simplex conv{x0, . . . , xk} is then

b = (k+1)^{−1} ∑_{i=0}^k xi = x0 + (k+1)^{−1} s,

where s = y1 + · · · + yk. To make c be the barycenter we choose x0 = c − (k+1)^{−1} s and call the resulting simplex σ.

For ε ∈ (0,1] define

x_i^ε := (1−ε)c + εxi,   i = 0, . . . , k;

again Theorem A.6.12 shows that these points are in general position. Let σ^ε := conv{x_0^ε, . . . , x_k^ε}. The barycenter of σ^ε is c, and the distance of any of its points from c is no larger than ε max_{0≤i≤k} ∥xi − c∥.

There is a convex neighborhood M of c in A with M ⊂ N. For sufficiently small positive ε we have σ^ε ⊂ M, so that σ^ε is a k-simplex in N with barycenter c. ⊓⊔
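The shrinking construction in this proof is easy to verify numerically: replacing each vertex xi by (1−ε)c + εxi leaves the barycenter at c and pulls every vertex to within ε·max∥xi − c∥ of c. A small sketch with an arbitrary 2-simplex (names ours):

```python
import math

def barycenter(pts):
    """The average of a list of points in R^d."""
    m = len(pts)
    return tuple(sum(p[j] for p in pts) / m for j in range(len(pts[0])))

c = (0.0, 1.0)
verts = [(-1.0, 0.0), (1.0, 0.0), (0.0, 3.0)]   # a 2-simplex with barycenter c
assert all(abs(barycenter(verts)[j] - c[j]) < 1e-12 for j in range(2))

for eps in (1.0, 0.5, 0.1, 0.01):
    shrunk = [tuple((1 - eps) * c[j] + eps * v[j] for j in range(2))
              for v in verts]
    b = barycenter(shrunk)
    assert all(abs(b[j] - c[j]) < 1e-12 for j in range(2))    # barycenter fixed
    r = eps * max(math.dist(v, c) for v in verts)
    assert all(math.dist(v, c) <= r + 1e-12 for v in shrunk)  # radius scales by eps
```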

The next proposition tells us that any k-dimensional convex set has to contain a k-simplex. That fact will be important when we look at relative interiors of convex sets.

Proposition 1.1.14. Let C be a convex subset of Rn having dimension k ≥ 0, and let x0 be a point of C. Then C contains a k-simplex, one vertex of which is x0. Moreover, for any k-simplex σ that is contained in C, affσ = affC.

Proof. As x0 ∈ C, C contains a 0-simplex with vertex x0. If k = 0 we are finished. If k > 0 then suppose that for some j with 0 ≤ j < k we have found a j-simplex Sj with vertices x0, . . . , xj belonging to C. We show that the construction can be extended from dimension j to j + 1 and thereby establish the result by induction.

The set C cannot be contained in the affine set Aj := aff{x0, . . . , xj} because the dimension of Aj is j (the points x0, . . . , xj are in general position), whereas the dimension of C is k > j. Let xj+1 be a point of C not in Aj, and note that x1 − x0, . . . , xj+1 − x0 must be linearly independent because xj+1 ∉ Aj; to see this assume dependence, use the fact that x1 − x0, . . . , xj − x0 are linearly independent, and rearrange the terms to exhibit xj+1 as an affine combination of x0, . . . , xj. The set Sj+1 = conv{x0, . . . , xj+1} is a (j+1)-simplex in C with vertices x0, . . . , xj+1. We can now continue the induction to find a k-simplex Sk ⊂ C as required.

For the statement about affine hulls, recall that affC has dimension k, and as it contains C it certainly contains Sk. Then Corollary A.6.13 says that affC = affSk. ⊓⊔

Corollary 1.1.15. Let C be a convex subset of Rn containing a point x0. For each convex neighborhood Q of x0, dim(C ∩ Q) = dimC.

Proof. Let k = dimC and suppose Q is a convex neighborhood of x0. As C ∩ Q is a convex set contained in C, we have dim(C ∩ Q) ≤ dimC.

By Proposition 1.1.14, C contains a k-simplex σ whose vertices are x0,x1, . . . ,xk. For i = 0, . . . ,k and µ ∈ [0,1] let vi^µ = x0 + µ(xi − x0) and define σµ = conv{v0^µ, . . . ,vk^µ}.

If µ ∈ (0,1] then σµ is a k-simplex contained in C. If we take µ to be small enough then σµ ⊂ C ∩ Q. Then

dimC = k = dimσµ ≤ dim(C ∩ Q) ≤ dimC,

so C and C ∩ Q have the same dimension. ⊓⊔

Generalized simplices and convex hulls

It will be useful to extend the definition of convex hull to include what one might think of as points at infinity. For example, let x and v be elements of Rn, with v ≠ 0. Define the halfline from x in the direction of v to be the set x + vR+. We call v a generator of the halfline, and x its endpoint. Of course, any positive scalar multiple of v is also a generator.

This halfline is certainly a convex set, but it is not the convex hull of any finite set of points. However, even for very large µ the line segment [x, x+µv] is the convex hull of x and x+µv. The trouble is that to get the entire halfline we have to let µ become infinitely large, and then the right-hand endpoint of the segment escapes to infinity.

To extend the definition of convex hull we need a new operation. For a finite set T = {t1, . . . ,tk} we say that a point x is a nonnegative linear combination of points of T if there are nonnegative scalars µ1, . . . ,µk such that x = ∑_{i=1}^{k} µi ti.

Definition 1.1.16. Let S be a subset of Rn. The positive hull of S is the set posS consisting of the origin together with the set of all nonnegative linear combinations of finite subsets of S. ⊓⊔

There is an asymmetry in the treatment of the convex hull and the positive hull, as conv∅ = ∅ but pos∅ is the origin. The purpose of including the origin is to ensure that posS is never empty, even if S is.
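A small numerical sketch of my own (the generators below are an arbitrary choice, not from the text): for two linearly independent generators in R2, membership in the positive hull can be checked by solving for the combination coefficients and testing their signs.

```python
import numpy as np

# Generators of the positive hull; linearly independent, so coefficients are unique.
g1, g2 = np.array([1.0, 0.0]), np.array([1.0, 1.0])
G = np.column_stack([g1, g2])

def in_pos(p):
    # p belongs to pos{g1, g2} iff p = mu1*g1 + mu2*g2 with mu1, mu2 >= 0.
    mu = np.linalg.solve(G, p)
    return bool(np.all(mu >= -1e-12))

assert in_pos(np.array([3.0, 1.0]))       # 2*g1 + 1*g2
assert in_pos(np.array([0.0, 0.0]))       # the origin always belongs to pos S
assert not in_pos(np.array([0.0, -1.0]))  # outside the cone
```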

Definition 1.1.17. Let S and T be subsets of Rn, with S nonempty. The generalized convex hull of the pair (S,T) is the set

conv(S,T ) := convS+posT.

⊓⊔

It is easy to show using this definition that conv(S,T) is convex.

By using this new construction we can extend the notion of general position and the applicability of Theorem 1.1.6.

Definition 1.1.18. Given a nonempty finite set X = {x1, . . . ,xr} of points of Rn and a finite set Y = {y1, . . . ,ys} of points of Rn, we say that the pair (X, Y) is in general position if the vectors in Rn+1

(x1, 1), . . . , (xr, 1),  (y1, 0), . . . , (ys, 0)

are linearly independent. If (X, Y) is in general position then we call conv(X, Y) a generalized simplex. ⊓⊔

When the set Y is empty, this just reduces to the ordinary notions of general position and of simplex. If (X, Y) is in general position then the coefficients in the representation of a point of the generalized simplex convX + posY must be unique. The next theorem extends the representation of Theorem 1.1.6 to this more general situation.
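As a small numerical sketch (my own example, not from the text), general position of a pair (X, Y) can be checked by stacking the vectors (xi, 1) and (yj, 0) of Definition 1.1.18 as columns and testing linear independence via the matrix rank:

```python
import numpy as np

# X = {(0,0), (1,0)} and Y = {(0,1)}: two points and one direction in R^2.
X = [np.array([0.0, 0.0]), np.array([1.0, 0.0])]
Y = [np.array([0.0, 1.0])]

# Lift points of X with a trailing 1 and points of Y with a trailing 0.
lifted = [np.append(p, 1.0) for p in X] + [np.append(q, 0.0) for q in Y]
M = np.column_stack(lifted)

# (X, Y) is in general position iff these columns are linearly independent,
# i.e. the matrix has full column rank.
in_general_position = bool(np.linalg.matrix_rank(M) == M.shape[1])
assert in_general_position
```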

Theorem 1.1.19. Let X and Y be subsets of Rn, with X nonempty. For each point z of conv(X,Y) there are finite subsets X′ of X and Y′ of Y such that (X′, Y′) is in general position and z belongs to the generalized simplex conv(X′, Y′), with the coefficients in the representation of z being strictly positive.

Proof. If Y is empty then Theorem 1.1.6 yields the required conclusion. Therefore assume that each of X and Y is nonempty.

By hypothesis z ∈ convX + posY. Using Proposition 1.1.4 and the definition of posY we can find some finite subsets X′ = {x1, . . . ,xr} of X and Y′ = {y1, . . . ,ys} of Y such that with

ui = (xi, 1), i = 1, . . . ,r,   vj = (yj, 0), j = 1, . . . ,s,   w = (z, 1),

we can find nonnegative coefficients ρ1, . . . ,ρr and σ1, . . . ,σs such that

w = ∑_{i=1}^{r} ρi ui + ∑_{j=1}^{s} σj vj.

We can also suppose that we have chosen the coefficients so as to make the integer r + s as small as possible, which in particular implies that the coefficients ρi and σj are all positive. If (X′, Y′) were not in general position then the vectors u1, . . . ,ur, v1, . . . ,vs would be linearly dependent. We could then argue as we did in Theorem 1.1.6 to justify deleting one of the points. This would contradict the minimality of r + s; therefore (X′, Y′) must be in general position. ⊓⊔

Theorem 1.1.19 has an important consequence for sets that are generalized convex hulls of pairs (X, Y) in which both X and Y are finite. There is a special name for such sets.

Definition 1.1.20. A finitely generated convex set in Rn is the generalized convex hull C = convX + posY of finite subsets X and Y of Rn. ⊓⊔

As posY is always nonempty, the sum C is empty if and only if X is empty.

Proposition 1.1.21. Each finitely generated convex subset C of Rn is closed.

Proof. If C is empty then it is certainly closed, so we may assume that X is nonempty. The first step is to show that C is closed in the special case in which it is a generalized simplex: that is, in which the pair (X, Y) is in general position. For this case suppose that ck, k = 1,2, . . ., is a sequence in C that converges to c0. Let X = {x1, . . . ,xI} and, if Y is nonempty, Y = {y1, . . . ,yJ}. Then for each k there are convex coefficients ρi^k and nonnegative coefficients σj^k with

ck = ∑_{i=1}^{I} ρi^k xi + ∑_{j=1}^{J} σj^k yj.

For the case Y ≠ ∅ and for each k, define the (n+1) × (I+J) matrix

A = [ (x1, 1) . . . (xI, 1)  (y1, 0) . . . (yJ, 0) ],   ak = (ck, 1),

where the columns of A are the lifted vectors of Definition 1.1.18. Here and in what follows, we proceed as if Y were nonempty; if it is empty then the procedure is the same except that the elements pertaining to Y do not appear. We next define

zk = (ρ1^k, . . . ,ρI^k, σ1^k, . . . ,σJ^k).

Then for each k we have Azk = ak and zk ≥ 0, where we use the symbol ≥ to denote the partial order on RI+J that makes p ≥ q if pr ≥ qr for r = 1, . . . , I + J.

By assumption C is a generalized simplex, which means that the columns of A are linearly independent and therefore the kernel kerA is {0}. It follows that A is an invertible linear map of RI+J onto its image imA ⊂ Rn+1. Write τ : imA → RI+J for the inverse linear map. The ak belong to imA, so that zk = τ(ak). As τ is linear and therefore continuous, and imA is a subspace and therefore closed, the fact that the ak converge to the point (c0, 1) implies that (c0, 1) belongs to imA and that the zk converge to the point

z0 := τ(c0, 1).

As the limit of the zk, z0 is nonnegative with ∑_{i=1}^{I} (z0)i = 1, and therefore c0 ∈ conv(X, Y).

Now suppose that C = conv(X, Y) is finitely generated but not necessarily a generalized simplex. Each generalized simplex S := conv(X′, Y′) formed from subsets X′ ⊂ X and Y′ ⊂ Y is contained in C, so the union of all such S is a subset of C. But Theorem 1.1.19 says that each point of C is contained in such an S, so that in fact C is the union of all such generalized simplices. As C is finitely generated, there are only finitely many distinct generalized simplices in this union, so that C is a finite union of closed sets and is therefore closed. ⊓⊔


1.1.4 Exercises for Section 1.1

Exercise 1.1.22. Show that the inverse image of a convex set under an affine transformation is convex.

Exercise 1.1.23. Construct an example of a closed subset S of R2 whose convex hull is not closed. (You must prove that convS is not closed.)

Exercise 1.1.24. Let C and D be convex sets in Rn, and let µ and ν be real numbers. Show that µC + νD is convex. Further, show that if µ and ν are nonnegative, then µC + νC = (µ + ν)C. Give an example to show that this equation is generally false for nonconvex sets.

Exercise 1.1.25. Show that for any subset S of Rn, affS = affconvS. (In particular,the affine-hull and convex-hull operators commute.)

Exercise 1.1.26. Show that if E : Rk → Rn is an affine transformation and U is a subset of Rk, then convE(U) = E(convU). (In particular, if U is convex then so is E(U).) Give an example to show that E(U) need not be closed even if U is closed and convex.

Exercise 1.1.27. Show that if E : Rk → Rn is an affine transformation and S is a subset of Rn, then convE−1(S) ⊂ E−1(convS). Show by example that the reverse inclusion need not hold in general, but prove that it holds if either S is convex or S ⊂ imE.

Exercise 1.1.28. For i = 1, . . . , I let Si be a subset of Rni. Show that

conv ∏_{i=1}^{I} Si = ∏_{i=1}^{I} convSi.

Exercise 1.1.29. Use the results of Exercises 1.1.26 and 1.1.28 to show that if {Si | i = 1, . . . , I} is a finite collection of subsets of Rn, then

conv ∑_{i=1}^{I} Si = ∑_{i=1}^{I} convSi.

1.2 Relative interiors of convex sets

The definition of a convex set may seem almost like that of an affine set, and of course any affine set is convex. However, convex sets exhibit much more complex behavior than do affine sets. This section introduces some of that complexity.

For convex sets in Rn the affine hull is an extremely useful technical device. To see why, think about the usefulness of the interior of a set, convex or not. Obviously many convex sets will not have interiors (think of a line segment in R2), but we can define a substitute, called the relative interior, which is the interior of the set relative to its affine hull. This is the set's interior in the relative topology of the affine hull: that is, the topology induced on the affine hull by the topology of Rn. The open sets of the relative topology are just the intersections of the open sets of Rn with the affine hull.

For example, the relative interior of a line segment between two distinct points of R2 consists of the line segment with its endpoints deleted. It's possible that every point in the set may be in the relative interior (for example if the set is affine, or in the case of the line segment if it contains neither of its endpoints), and in that case we call the set relatively open.

In the remainder of this section we first show that any nonempty convex subset of Rn has a nonempty relative interior. This fact highlights the usefulness of the relative interior, since a nonconvex set S in Rn certainly need not have an interior, either relative to Rn itself or relative to affS. Then we introduce two constructions using line segments that will help us to compute relative interiors of sets. Finally, we use those results to develop basic properties of the relative interior, including its usefulness in developing regularity conditions for set operations.

1.2.1 Nonemptiness of the relative interior

Lemma A.6.17 showed that the barycentric coordinates are Lipschitz continuous. That fact has a useful consequence.

Lemma 1.2.1. Let σ be a simplex of dimension j ≥ 0 in Rn having vertices v0, . . . ,vj. The relative interior of σ is the set of points whose barycentric coordinates with respect to these vertices are all positive. Moreover, affσ = aff riσ.

Proof. Let P be the set of points of σ having all barycentric coordinates positive. Any point x ∈ riσ must be in σ, so its (unique) barycentric coordinates are convex coefficients. If x ∉ P then one of these coefficients, say βi, is zero; then j ≥ 1 and another coefficient βi′ is positive. For ε > 0 the point x(ε) whose barycentric coordinates β(ε) are

βk(ε) = βk − ε if k = i,   βk(ε) = βk + ε if k = i′,   βk(ε) = βk otherwise,

belongs to affσ but does not belong to σ, because β(ε) is not a convex combination of the vertices (βi(ε) < 0) and the barycentric coordinates are unique. But as x(ε) is a continuous function of its barycentric coordinates, by choosing ε to be small we can bring x(ε) as close to x as we wish. Therefore x ∉ riσ, so that riσ ⊂ P.

If x ∈ P then, as the barycentric coordinates are continuous functions of x, there is a neighborhood N of x in affσ such that whenever x′ ∈ N, the barycentric coordinates of x′ are all positive. Then N ⊂ P ⊂ σ, so that x ∈ riσ and therefore P = riσ.

As riσ ⊂ σ, we have aff riσ ⊂ affσ. Thus, for the final claim it is enough to show the reverse inclusion, and we can do that by exhibiting each vertex of σ as an affine combination of points of P.

Construct the barycenter b = ∑_{i=0}^{j} (j+1)−1vi of σ and, for ε ∈ (0,1] and i = 0, . . . ,j, let wi be the convex combination (1 − ε)vi + εb. The barycentric coordinates of b are all positive, and those of the vi are all nonnegative. Therefore each wi belongs to P. Then vi = (1 − ε)−1(wi − εb) is the required affine combination. ⊓⊔

Proposition 1.2.2. If C is a nonempty convex subset of Rn, then aff riC = affC. In particular, riC is nonempty.

Proof. Let the dimension of C be k ≥ 0. Use Proposition 1.1.14 to find a k-simplex σ in C. That proposition also tells us that the affine hull of σ coincides with affC. Lemma 1.2.1 shows that riσ is nonempty; as this is also the interior of σ relative to affC and as σ ⊂ C we have ∅ ≠ riσ ⊂ riC. Using Lemma 1.2.1 again, we have

affσ = aff riσ ⊂ aff riC ⊂ affC = affσ,

so that all of these sets are the same. ⊓⊔

One of our arguments in the above proof was that if D and E are subsets of Rn with D ⊂ E and affD = affE, then riD ⊂ riE. This follows from elementary properties of interiors, because we are taking the interior relative to the common affine hull of the two sets. However, we can only make this claim if the sets have the same affine hull. To see this, let E be a triangle and D be one of its sides; then D ⊂ E, but the relative interiors of D and E are actually disjoint. This problem does not occur with interiors, because the interior is taken with respect to an ambient space that contains both sets.

Any affine set is relatively open (it is simultaneously its own affine hull and relative interior), so Proposition 1.2.2 shows that for any convex set C we have aff riC = affC = ri affC. Therefore the affine-hull and relative-interior operators commute. If S is a nonconvex set then riS may be empty even if S is not, and in that case aff riS obviously cannot equal affS. However, as riS ⊂ S we always have aff riS ⊂ affS = ri affS.

Here is another consequence of nonemptiness of the relative interior.

Corollary 1.2.3. Let C be a nonempty convex subset of Rn. If H is any hyperplane containing C in one of its closed halfspaces, then H meets riC if and only if H contains clC.

Proof. (if) As C is nonempty, riC is nonempty by Proposition 1.2.2. Therefore if H contains clC it must meet riC.

(only if) For some z∗ ≠ 0 let H = {x | ⟨z∗,x⟩ = ζ}, and suppose that c0 ∈ H ∩ riC. With no loss of generality assume that C is in the lower closed halfspace of H, and let c be any point of C. For small positive µ, c0 + µ(c0 − c) ∈ C by the definition of relative interior. Then

ζ ≥ ⟨z∗, (1+µ)c0 − µc⟩ = (1+µ)ζ − µ⟨z∗,c⟩,

where we used the fact that c0 ∈ H. It follows that ⟨z∗,c⟩ ≥ ζ. But ⟨z∗,c⟩ ≤ ζ because c is in the lower closed halfspace of H, so in fact c ∈ H and therefore C ⊂ H. As H is closed, we also have clC ⊂ H. ⊓⊔

1.2.2 Line segments and the relative interior

Proofs in convexity often proceed by using simple geometric devices to develop information about a set or to manipulate it in some way. Among the most useful of these devices are some facts about the relationships between line segments and the relative interior. The following theorem and its corollaries develop these. It says that if x belongs to the closure of a convex set C and y belongs to the relative interior of C, then the entire line segment between x and y, except for the single point x, lies in riC.

The proof of the theorem uses the fact, following from the definition of affine transformation, that if T is an invertible affine transformation between two affine sets A and D, and if {a0, . . . ,ak} is a subset of A in general position, then for any x ∈ A the barycentric coordinates of T(x) with respect to T(a0), . . . ,T(ak) are the same as those of x with respect to a0, . . . ,ak.

Theorem 1.2.4. Let C be a convex subset of Rn; let x ∈ clC and r ∈ riC. Then (x, r] ⊂ riC.

Proof. If x = r there is nothing to prove. We can therefore assume x ≠ r, so that the dimension k of C is at least 1. Choose any λ ∈ (0,1); we will show that s := (1 − λ)r + λx ∈ riC by constructing a k-simplex τ ⊂ C such that the barycentric coordinates of s with respect to the vertices of τ are all positive. By Proposition 1.1.14 the affine hulls of τ and C must be the same set, say A, so we will then have s ∈ riτ ⊂ riC.

Use Lemma 1.1.13 to find a k-simplex σ contained in C having vertices v0, . . . ,vk and barycenter r. Its affine hull must then be A. The barycentric coordinates of any point a ∈ A with respect to the vi are continuous functions of a (Lemma A.6.17), and for a = r those coordinates all equal (k+1)−1. There is then some positive ρ such that for each point a ∈ A having ∥a − r∥ < ρ the barycentric coordinates of a are all positive. Further, as x ∈ clC we can find a point d ∈ C with

λ(1 − λ)−1∥x − d∥ < ρ. (1.4)

Define an invertible affine transformation T : A → A by T(a) := (1 − λ)a + λd; then T(C) ⊂ C by convexity. The set T(σ) is a k-simplex τ ⊂ C with vertices w0, . . . ,wk, where wi = T(vi) for each i, and with barycenter T(r). As T is affine, for any point a ∈ A the barycentric coordinates of T(a) with respect to the wi are the same as those of a with respect to the vi.

Now define a point t ∈ A by t := r + λ(1 − λ)−1(x − d); then

s = (1 − λ)r + λx = (1 − λ)t + λd = T(t). (1.5)

As ∥t − r∥ < ρ by (1.4), the barycentric coordinates of t with respect to the vi are all positive. But these are also the barycentric coordinates of s with respect to the wi, so we have s ∈ riτ ⊂ riC. ⊓⊔

Theorem 1.2.4 shows, for example, that the relative interior of a convex set is convex (take x ∈ riC). Here is another useful technical result.

Corollary 1.2.5. Let C be a nonempty convex subset of Rn and let c be a point of Rn. The following are equivalent:

a. c ∈ riC.
b. For each y ∈ affC there is a positive ε such that c + ε(c − y) ∈ C.
c. For each y ∈ C there is a positive ε such that c + ε(c − y) ∈ C.
d. There exist r ∈ riC and ε > 0 such that c + ε(c − r) ∈ clC.

Proof. (a) implies (b). As c ∈ riC there is a neighborhood N of c in affC such that N ⊂ C. Let y ∈ affC and choose ε > 0 small enough so that c + ε(c − y) ∈ N ⊂ C; then (b) holds.

(b) implies (c). affC ⊃ C.

(c) implies (d). As C is nonempty, so is riC by Proposition 1.2.2. Take y to be r ∈ riC and apply (c) to get (d).

(d) implies (a). If r and ε > 0 exist so that c + ε(c − r) =: x ∈ clC, then c = (1 + ε)−1εr + (1 + ε)−1x with r ∈ riC, x ∈ clC, and ε > 0. Theorem 1.2.4 says that then c ∈ riC. ⊓⊔
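Criterion (c) can be tried numerically. In the sketch below (an illustration of mine, with the unit square as an assumed test set), an interior point can be pushed slightly past itself away from any y ∈ C, while a relative-boundary point cannot:

```python
# C is the unit square [0,1]^2; membership is a simple coordinate check.
def in_C(p):
    return all(0.0 <= t <= 1.0 for t in p)

def extends_past(c, y, eps=1e-3):
    # Test whether c + eps*(c - y) still lies in C, as in Corollary 1.2.5(c).
    return in_C(tuple(ci + eps * (ci - yi) for ci, yi in zip(c, y)))

samples = [(0.0, 0.0), (1.0, 1.0), (0.3, 0.9), (1.0, 0.0)]

# (0.5, 0.5) is in ri C: the segment from any y can be prolonged past it.
assert all(extends_past((0.5, 0.5), y) for y in samples)

# (0.0, 0.5) lies on the relative boundary: prolonging past it away from
# the interior point (0.5, 0.5) leaves the square.
assert not extends_past((0.0, 0.5), (0.5, 0.5))
```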

We define the relative boundary of a convex set C to consist of those points of clC that do not belong to riC. Returning again to our line segment in R2, we see that the relative boundary consists of the two endpoints of the segment, whereas the boundary would be the entire segment. The relative boundary relates to the affine hull and the relative interior as the boundary does to the ambient space and the interior.

We commented earlier that Theorem 1.2.4 is a very useful device in proofs. Here is a proposition showing the particularly nice behavior of the relative interior and closure operators when applied to convex sets. Although one could not expect them to commute, they do almost the next best thing.

Proposition 1.2.6. If C is a convex subset of Rn, then

cl riC = clC, riclC = riC.

Proof. If C is empty there is nothing to prove, so assume C is nonempty. For the first formula, the inclusion ⊂ follows from elementary properties of the closure operation and the fact that riC ⊂ C. For the other inclusion, let x be any point of clC and use Proposition 1.2.2 to choose c ∈ riC. According to Theorem 1.2.4, the line segment (x, c] lies in riC. As x belongs to the closure of this segment it belongs to cl riC.

For the second formula, recall that any set and its closure have the same affine hull; therefore as C ⊂ clC we have riC ⊂ ri clC by elementary properties of the interior operation. For the opposite inclusion let z ∈ ri clC and choose x ∈ riC. This x also belongs to the convex set clC, so by Corollary 1.2.5 we can prolong the line segment from x to z slightly beyond z to obtain µ > 0 and a point w := z + µ(z − x) that still lies in clC. However, z lies in the relatively open line segment between x ∈ riC and w ∈ clC, so by Theorem 1.2.4 z belongs to riC. ⊓⊔

Note that in the last part of the above proof we used Theorem 1.2.4 twice in succession, applied to two different sets.
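A toy check of the two identities of Proposition 1.2.6 in R (my own encoding, not from the text; a nondegenerate convex subset of R is represented as an interval with endpoint-closure flags, using C = (0, 1] as the example):

```python
# Encode a nondegenerate interval in R as (lo, hi, lo_closed, hi_closed).
def cl(iv):
    lo, hi, _, _ = iv
    return (lo, hi, True, True)      # closure includes both endpoints

def ri(iv):
    lo, hi, _, _ = iv
    return (lo, hi, False, False)    # relative interior drops both endpoints

C = (0.0, 1.0, False, True)          # the half-open interval (0, 1]

# Proposition 1.2.6: cl ri C = cl C and ri cl C = ri C.
assert cl(ri(C)) == cl(C)
assert ri(cl(C)) == ri(C)
```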

Here is a result connecting forward affine images and relative interiors. Such a result certainly does not hold for interiors; what makes it work for convex sets is the fact that affine transformations commute with the affine-hull operator (Exercise A.6.31).

Proposition 1.2.7. Let C be a convex subset of Rn and T an affine transformation from Rn to Rm. Then T(riC) = riT(C).

Proof. If C is empty there is nothing to prove, so we may assume C is nonempty.

(⊂). Let a0 be a point of T(riC); then a0 = T(c0), where c0 ∈ riC. The set T(C) is nonempty and convex, so let a1 ∈ riT(C); then a1 = T(c1) for some c1 ∈ C. By part (c) of Corollary 1.2.5 there is some positive µ such that c2 := c0 + µ(c0 − c1) belongs to C. Then T(c2) = a0 + µ(a0 − a1) =: a2 belongs to T(C), so that a0 ∈ (a1, a2). Use Theorem 1.2.4 to conclude that a0 lies in riT(C), so that T(riC) ⊂ riT(C).

(⊃). For a set S and a continuous function f whose domain includes clS we always have f(clS) ⊂ cl f(S) (Exercise A.6.29). As clC = cl riC by Proposition 1.2.6, we find that

T(C) ⊂ T(clC) = T(cl riC) ⊂ clT(riC). (1.6)

As riC and C have the same affine hull, T(C) and T(riC) have the same affine hull; this is also the affine hull of clT(riC). Then we can take relative interiors in (1.6) to obtain

riT(C) ⊂ ri clT(riC) = riT(riC) ⊂ T(riC),

so that riT(C) ⊂ T(riC). ⊓⊔

We can now use the same device to deal with sums of relative interiors that we used previously with affine and convex hulls.

Corollary 1.2.8. If C and D are two convex subsets of Rn, then ri(C+D) = riC+riD.

Proof. We have ri(C × D) = riC × riD (Exercise 1.2.15). Now apply Proposition 1.2.7, using the linear operator L from R2n to Rn given by L(x, y) = x + y, to obtain

ri(C + D) = riL(C × D) = L[ri(C × D)] = L(riC × riD) = riC + riD. ⊓⊔

1.2.3 Regularity conditions for set operations

As an illustration of the usefulness of the relative interior, we use it to augment the formula of Exercise A.6.31. That formula related forward affine images to affine hulls; this one deals with inverse images. It is not hard to show that the inverse image of an affine set or a convex set under an affine transformation is affine or convex, respectively; see Exercises A.6.26 and 1.1.22.

Proposition 1.2.9. Let T : Rn → Rm be an affine transformation and let S be a nonempty subset of Rm. Then affT−1(S) ⊂ T−1(affS), and if imT meets riS then equality holds.

Proof. T−1(affS) is affine and it contains T−1(S), so it also contains affT−1(S) by definition of the affine hull. To prove equality, assume that imT meets riS, so that there is some x0 ∈ Rn with T(x0) =: s0 ∈ riS. Suppose that x ∈ T−1(affS), and let T(x) =: f ∈ affS. Then f − s0 ∈ parS, so for sufficiently small positive ε we have sε := s0 + ε(f − s0) ∈ S by the definition of relative interior. However, if we define xε := x0 + ε(x − x0) then sε = T(xε) and therefore xε ∈ T−1(S). Then we have

x = (1 − ε−1)x0 + ε−1xε ∈ affT−1(S),

which shows that in this case affT−1(S) = T−1(affS). ⊓⊔

The condition that the image of T should meet the relative interior of S is a fundamental regularity condition that one sees throughout the study of convexity. In this case it's really needed, as equality may not hold without it (Exercise 1.2.14).

Proposition 1.2.9 has a useful corollary dealing with intersections of sets.

Corollary 1.2.10. Let C1 and C2 be convex subsets of Rn. Then aff(C1 ∩ C2) ⊂ (affC1) ∩ (affC2), and if (riC1) ∩ (riC2) ≠ ∅ then equality holds.

Proof. Apply Proposition 1.2.9 with S = C1 × C2, taking T to be the linear transformation L : Rn → R2n given by L(x) = (x, x) and using the result of Exercise A.6.25. ⊓⊔

Although Proposition 1.2.7 showed that forward images behave very well under the relative interior operation, with inverse images one has to be more careful. The next proposition shows that the same regularity condition helps to deal with this case.

Proposition 1.2.11. Suppose D is a convex set in Rm, and let T be an affine transformation from Rn to Rm. Then

T−1(riD) ⊂ riT−1(D),   T−1(clD) ⊃ clT−1(D),

with equality in each case if imT meets riD.

Note that the inclusions go in opposite directions.

Proof. If D is empty there is nothing to prove, so we may assume D to be nonempty. We first show that T−1(riD) ⊂ riT−1(D). If x belongs to T−1(riD) and y is any point of riT−1(D), then T(x) ∈ riD and T(y) ∈ D. As T(x) − T(y) ∈ parD, by the definition of relative interior there is some positive ε with t := T(x) + ε(T(x) − T(y)) ∈ D. Writing z = x + ε(x − y), we have T(z) = t because T is affine; as t ∈ D we have z ∈ T−1(D). Applying Corollary 1.2.5, we find that x ∈ riT−1(D) as required.

To prove the inclusion involving closures, note that since T is continuous, T−1(clD) must be a closed set. This set contains T−1(D), so it must also contain clT−1(D) by elementary properties of closure.

Now assume that imT meets riD. We show that in this case the inclusions become equalities.

Let w ∈ riT−1(D); we prove w ∈ T−1(riD). Suppose that v ∈ T−1(riD), and use the definition of relative interior to find ε > 0 such that u := w + ε(w − v) ∈ T−1(D). Then T(u) ∈ D, and

T(w) = (1 + ε)−1T(u) + ε(1 + ε)−1T(v) ∈ riD,

where we used Theorem 1.2.4. Therefore w ∈ T−1(riD), as claimed.

Finally, let x0 ∈ T−1(clD) and x1 ∈ T−1(riD), and write di = T(xi) for i = 0, 1. Theorem 1.2.4 says that (d0, d1] ⊂ riD, so (x0, x1] ⊂ T−1(riD) ⊂ T−1(D). Then x0 ∈ cl(x0, x1] ⊂ clT−1(D). ⊓⊔

There are cases in which this regularity condition doesn't suffice. One of these is Exercise 1.1.27, in which the condition V ⊂ imE cannot be replaced by, for example, the requirement that imE meet ri convV (Exercise 1.2.17).

We can use Proposition 1.2.11 together with some calculation to prove very useful formulas about commutativity of closure with intersection.

Proposition 1.2.12. Let {Cα | α ∈ A} be a collection of convex subsets of Rn. Assume that ∩_{α∈A} riCα ≠ ∅. Then

cl ∩_{α∈A} Cα = ∩_{α∈A} clCα,

and

ri ∩_{α∈A} Cα ⊂ ∩_{α∈A} riCα, (1.7)

with equality when A is a finite set. In particular, the intersection of any finite collection of relatively open sets is relatively open.

Proof. Let y ∈ ∩_{α∈A} riCα and x ∈ ∩_{α∈A} clCα. For each α ∈ A the segment [y, x) lies in riCα by Theorem 1.2.4; therefore [y, x) ⊂ ∩_{α∈A} riCα. As x ∈ cl[y, x), it also belongs to cl ∩_{α∈A} riCα.

We then have

∩_{α∈A} clCα ⊂ cl ∩_{α∈A} riCα ⊂ cl ∩_{α∈A} Cα ⊂ ∩_{α∈A} clCα,

where the first inclusion came from the argument we just made and the last came from elementary properties of the closure operation. Since the same set appears at each end of the chain of inclusions, we have proved the first claim. We have also shown that

cl ∩_{α∈A} Cα = cl ∩_{α∈A} riCα.

From this, by recalling that riS = ri clS for convex sets S, we obtain

ri ∩_{α∈A} Cα = ri cl ∩_{α∈A} Cα = ri cl ∩_{α∈A} riCα = ri ∩_{α∈A} riCα ⊂ ∩_{α∈A} riCα,

and this proves the main part of the second claim.

For the rest of the second claim, suppose that A is finite, say {1, . . . ,k}. Let D = C1 × . . . × Ck, a subset of Rnk. If we define a linear transformation L from Rn to Rnk by Lx = (x, . . . ,x), then L−1(riD) = ∩_{i=1}^{k} riCi (assumed nonempty), and riL−1(D) = ri ∩_{i=1}^{k} Ci. By Proposition 1.2.11 these sets are the same.

The last claim is certainly true if the intersection is empty. If it is nonempty then the relative interiors of the sets being intersected have a point in common; then the second claim shows that the intersection is its own relative interior, so it is relatively open. ⊓⊔

The restriction to finite sets in the last claim is necessary (Exercise 1.2.18). Further, without the intersection condition the inclusion opposite to (1.7) may hold strictly (Exercise 1.2.19).

By combining some of the preceding results we can produce various useful formulas for dealing with sets encountered in practice. Here is an example.

Proposition 1.2.13. Let x0, . . . ,xk be points in Rn. Then

ri conv{x0, . . . ,xk} = { ∑_{i=0}^{k} λi xi | λi > 0, ∑_{i=0}^{k} λi = 1 },

and

ri pos{x0, . . . ,xk} = { ∑_{i=0}^{k} λi xi | λi > 0 }.

Proof. For the first formula, let X be an n × (k+1) matrix whose columns are x0, . . . ,xk and let H = {x ∈ R^{k+1} | ∑_{i=0}^{k} xi = 1}. Then conv{x0, . . . ,xk} = X(H ∩ R^{k+1}_+). Further, H (which is its own relative interior) meets int R^{k+1}_+. Applying Propositions 1.2.7 and 1.2.12, we find that

ri conv{x0, . . . ,xk} = ri X(H ∩ R^{k+1}_+) = X[ri(H ∩ R^{k+1}_+)] = X[H ∩ int R^{k+1}_+] = { ∑_{i=0}^{k} λi xi | λi > 0, ∑_{i=0}^{k} λi = 1 }.

The proof for pos{x0, . . . ,xk} is similar except that we only have to deal with R^{k+1}_+ and not with H. ⊓⊔

1.2.4 Exercises for Section 1.2

Exercise 1.2.14. Show by example that in the situation of Proposition 1.2.9, if imT does not meet riS then T−1(affS) may properly contain affT−1(S).

Exercise 1.2.15. Show that for two convex sets C ⊂ Rn and D ⊂ Rm one has ri(C × D) = (riC) × (riD). You may find Corollary 1.2.5 helpful.

Exercise 1.2.16. Exhibit a subset S of R2 and a point x ∈ S having the properties that for each y ∈ S and all sufficiently small positive µ we have x + µ(x − y) ∈ S, but x ∉ riS.

Exercise 1.2.17. Exhibit a linear transformation E from R to R2 and a subset V of R2 having no more than three points, with imE ∩ ri convV ≠ ∅, such that E−1(convV) and convE−1(V) are both nonempty but are not equal.

Exercise 1.2.18. Exhibit a family of intervals {Ci}, i = 1, 2, . . ., in R such that ri ∩_{i=1}^{∞} Ci and ∩_{i=1}^{∞} riCi are unequal.

Exercise 1.2.19. Exhibit two convex sets C1 and C2 such that ri(C1 ∩ C2) properly contains riC1 ∩ riC2.

Exercise 1.2.20. If C is a convex subset of Rn and clC is affine, then C is affine.

1.3 Notes and references

The first published version of the Shapley-Folkman theorem seems to have appeared in [47, Appendix 2].


Chapter 2
Separation

A rough description of separation would say that if two convex sets don't overlap too much then one can put a hyperplane between them, thereby separating them. Different definitions of "too much overlap" and "between" lead to different kinds of separation, and an important question is what properties a problem needs to have in order to accomplish these different sorts of separation.

Many important constructions in convexity rely on the information that a separating hyperplane can provide. Moreover, the process of separation is closely connected with notions of distance between points and sets or between sets and sets, and therefore also with methods for approximating sets by other sets.

2.1 Definitions of separation

Suppose C and C′ are two nonempty subsets of Rn and H is a hyperplane in Rn. Here are four different ways in which H can separate C and C′.

Definition 2.1.1. H separates C and C′ if each of C and C′ lies in a different closed halfspace of H. ⊓⊔

Intuitively this seems to mean that C and C′ should somehow be "apart," but this is not necessarily the case: for example, the hyperplane H separates itself from itself. We can eliminate this annoyance by introducing the stronger requirement of proper separation.

Definition 2.1.2. H properly separates C and C′ if H separates C and C′ with C ∪ C′ ⊄ H. ⊓⊔

Now H no longer separates itself from itself. However, one can properly separate an endpoint of the interval [0,1] in R from the interval itself (take H to be the endpoint: it's a hyperplane in R), so this kind of separation might seem too weak for some purposes. Therefore we introduce a third notion, that of strict separation.


Definition 2.1.3. H strictly separates C and C′ if each of these two sets lies in a different open halfspace of H. ⊓⊔

This seems stronger, because the hyperplane lies between the sets and cannot meet either set. But even this does not suffice for some purposes: for example it is possible for two convex sets C and C′ to be strictly separated, yet for points of C and of C′ to be as close together as we please. An example in R2 is C = {(x1,x2) | x2 ≥ expx1} and C′ = {(x1,x2) | x2 ≤ −expx1}, with H being the x1-axis. For that reason we define a fourth type of separation, called strong separation:
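The gap between strict and strong separation in this example can be checked numerically. The following sketch (plain Python, not from the text) takes one point from each of the two exponential regions and watches the distance between them shrink while both sets stay strictly on opposite sides of the x1-axis:

```python
import math

# C  = {(x1, x2) : x2 >=  exp(x1)} and C' = {(x1, x2) : x2 <= -exp(x1)}
# are strictly separated by the x1-axis, yet contain points that are
# arbitrarily close to each other as x1 -> -infinity.
def in_C(p):
    return p[1] >= math.exp(p[0])

def in_C_prime(p):
    return p[1] <= -math.exp(p[0])

for t in (0.0, -5.0, -20.0):
    p, q = (t, math.exp(t)), (t, -math.exp(t))
    assert in_C(p) and in_C_prime(q)   # one point from each set
    assert p[1] > 0 > q[1]             # both points avoid the x1-axis
    print("t = %6.1f   distance = %.3g" % (t, 2 * math.exp(t)))
```

At t = −20 the two points are about 4·10⁻⁹ apart, so no ε-thickening of the sets can stay separated: strong separation fails.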

Definition 2.1.4. H strongly separates C and C′ if there is a positive scalar ε such that H separates C + εBn and C′ + εBn, where Bn is the unit ball. ⊓⊔

Each of these four kinds of separation is stronger than those preceding it, but not all are equally useful. It turns out that in practice proper separation is the most useful, with strong separation in second place, while the others are less important.

The concept of supporting hyperplane is also often useful.

Definition 2.1.5. Let S be a subset of Rn and H be a hyperplane in Rn. We say H supports S if S lies in one of the closed halfspaces of H and at least one point of S belongs to H. If in addition S ⊄ H then we say that the support of S by H is proper, or that H properly supports S. ⊓⊔

The following section characterizes strong separation and then proper separation.

2.2 Conditions for separation

For any subset S of Rn and any x ∈ Rn we say that a point s is closest to x in S if ∥x − s∥ = dS(x). Such a point need not be unique: for example, if x is the origin of Rn and S is the sphere Sn−1, then every point of S is at the same distance from x and hence is a closest point to x in S.

The idea of separation is related to that of closeness, and one of the most useful facts about separation is the following characterization of the point in a convex set that is closest to a given point.

Lemma 2.2.1. Let S be a subset of Rn containing a point s0, and let x0 ∈ Rn. If

for each s ∈ S, ⟨x0 − s0,s− s0⟩ ≤ 0, (2.1)

then s0 is the unique closest point to x0 in S.
If S is convex and s0 is a closest point to x0 in S, then (2.1) holds and s0 is the unique closest point to x0 in S.

Proof. For any s ∈ Rn,

∥x0 − s∥2 = ∥(x0 − s0)− (s− s0)∥2 = ∥x0 − s0∥2 − 2⟨x0 − s0,s− s0⟩ + ∥s− s0∥2, (2.2)


so that

2⟨x0 − s0,s− s0⟩ − ∥s− s0∥2 = ∥x0 − s0∥2 − ∥x0 − s∥2. (2.3)

First suppose (2.1) holds. For each s ∈ S the first term on the left side of (2.3) is nonpositive. Then so is the entire left side and therefore the right side also, which shows that s0 minimizes ∥x0 − (·)∥2 on S. In that case, if for some s ∈ S the right side were actually zero then the two nonpositive terms on the left side would both have to be zero, implying s = s0. Therefore s0 is the unique minimizer of ∥x0 − (·)∥ on S and hence is the unique closest point.

Now suppose that S is convex and that s0 minimizes ∥x0 − (·)∥ on S. Let s1 be any point of S, and for each ε ∈ (0,1] define sε := s0 + ε(s1 − s0), so that sε ∈ S. Putting s = sε in (2.3) and using our assumption about s0 we obtain

0 ≥ ∥x0 − s0∥2 − ∥x0 − sε∥2 = 2⟨x0 − s0,sε − s0⟩ − ∥sε − s0∥2 = 2ε⟨x0 − s0,s1 − s0⟩ − ε2∥s1 − s0∥2.

Dividing by 2ε and letting ε approach 0, we find that ⟨x0 − s0,s1 − s0⟩ ≤ 0, so (2.1) holds. The first part of the proof now shows that s0 is the unique closest point to x0 in S. ⊓⊔
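The variational inequality (2.1) lends itself to a quick numerical sanity check. In the sketch below (a hypothetical instance, not from the text) S is the unit square, whose closest point is obtained by clipping each coordinate, and the inequality is tested against random points of S:

```python
import random

# For a box, the closest point is found by clipping each coordinate;
# we then verify the variational inequality (2.1):
#   <x0 - s0, s - s0> <= 0  for every s in S.
def project_box(x, lo=0.0, hi=1.0):
    return tuple(min(hi, max(lo, xi)) for xi in x)

x0 = (2.0, -0.5)
s0 = project_box(x0)          # closest point of [0,1]^2 to x0: (1.0, 0.0)
assert s0 == (1.0, 0.0)

random.seed(0)
for _ in range(1000):
    s = (random.random(), random.random())      # random point of the square
    inner = sum((a - b) * (c - b) for a, b, c in zip(x0, s0, s))
    assert inner <= 1e-12                       # (2.1) holds
```

By the lemma, satisfying (2.1) at s0 certifies that the clipped point really is the unique closest point, with no further search needed.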

2.2.1 Strong separation

Lemma 2.2.1 allows us to characterize those points that we can strongly separate from a convex set.

Proposition 2.2.2. Let C be a nonempty convex subset of Rn, and let x be a point of Rn. A hyperplane strongly separating x and C exists if and only if x /∈ clC.

Proof. (only if) If such a hyperplane exists, there is a nonzero element y∗ of Rn, together with real numbers η and ε with ε positive, such that for each c ∈ C and any elements v and v′ of Bn,

⟨y∗,x+ εv⟩ ≤ η ≤ ⟨y∗,c+ εv′⟩. (2.4)

There is no loss in assuming that we have scaled y∗ and η so that y∗ has norm 1. By subtracting the left side of (2.4) from the right side and rearranging terms we find that

ε⟨y∗,v− v′⟩ ≤ ⟨y∗,c− x⟩.

By choosing v = y∗ =−v′ and then using the Schwarz inequality, we obtain

2ε ≤ ⟨y∗,c− x⟩ ≤ ∥c− x∥.


This means that each point of C lies at a distance at least 2ε from x, so that x cannot belong to clC.

(if) If x /∈ clC then let c′ be any point of C and consider the intersection D of clC with the closed ball x + ∥c′ − x∥Bn. This D is compact and nonempty, so the continuous function ∥x − (·)∥ attains its minimum on D at some point c0, which then actually minimizes ∥x − (·)∥ on all of clC. We have ∥x − c0∥ > 0 because x /∈ clC. As clC is convex, Lemma 2.2.1 says that for each c ∈ clC,

0 ≥ ⟨x− c0,c− c0⟩= ∥x− c0∥2 −⟨x− c0,x− c⟩. (2.5)

Define y∗ = (x− c0)/∥x− c0∥ and ε = (1/2)∥x− c0∥; then divide (2.5) by ∥x− c0∥ and rewrite the result as ⟨y∗,x− c⟩ ≥ 2ε, which implies that for any c ∈ clC,

⟨y∗,x⟩− ε ≥ ⟨y∗,c⟩+ ε. (2.6)

Let η = ε + sup{⟨y∗,c⟩ : c ∈ clC}; by (2.6) the supremum is not greater than ⟨y∗,x⟩ − 2ε, so η is finite. We have then for any c ∈ clC and any v and v′ in Bn

⟨y∗,c+ εv⟩ ≤ ⟨y∗,c⟩+ ε ≤ η ≤ ⟨y∗,x⟩− ε ≤ ⟨y∗,x+ εv′⟩,

so that the hyperplane H0(y∗,η) separates x + εBn and (clC) + εBn. This means it strongly separates x and clC, and therefore also x and C. ⊓⊔
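The proof's recipe — take the closest point c0, set y∗ = (x−c0)/∥x−c0∥, ε = ∥x−c0∥/2, and η = ε plus the supremum of ⟨y∗,·⟩ over the set — can be traced on a toy instance. Here C is the unit square and x a point outside it (hypothetical choices for illustration); since ⟨y∗,·⟩ is linear, its supremum over the square is attained at a corner:

```python
from itertools import product

# Recipe from the proof of Proposition 2.2.2, traced on C = [0,1]^2
# and x = (2.0, 0.5).
x = (2.0, 0.5)
c0 = (1.0, 0.5)                                 # closest point of C to x
d = ((x[0]-c0[0])**2 + (x[1]-c0[1])**2) ** 0.5  # ||x - c0|| > 0
y = ((x[0]-c0[0]) / d, (x[1]-c0[1]) / d)        # unit normal y*
eps = d / 2

corners = list(product([0.0, 1.0], repeat=2))
sup_C = max(y[0]*c[0] + y[1]*c[1] for c in corners)  # sup over C (at a corner)
eta = eps + sup_C

# H0(y*, eta) separates x + eps*B and C + eps*B:
assert y[0]*x[0] + y[1]*x[1] - eps >= eta
assert all(y[0]*c[0] + y[1]*c[1] + eps <= eta for c in corners)
```

With these numbers y∗ = (1,0), ε = 1/2, and η = 3/2, so the vertical line x1 = 3/2 strongly separates the point from the square, exactly as the proof constructs it.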

Proposition 2.2.2 lets us prove an external representation for the closure of the convex hull, complementing the internal representation in Proposition 1.1.4.

Theorem 2.2.3. Let S be a subset of Rn. Then clconvS is the intersection of all closed halfspaces of Rn that contain S.

Proof. If S is empty the claim is evidently true, so we can suppose that S is nonempty. Write C for the intersection of all closed halfspaces containing S. Any closed halfspace containing S is closed and convex; therefore it contains clconvS as well, so C must also contain clconvS. For the opposite inclusion, suppose x does not belong to clconvS. Use Proposition 2.2.2 to find a nonzero point y∗ ∈ Rn and a real number η such that for each z ∈ clconvS, ⟨y∗,x⟩ > η ≥ ⟨y∗,z⟩. The closed halfspace consisting of all v with ⟨y∗,v⟩ ≤ η contains S but not x; accordingly, x does not belong to C, so clconvS contains C. ⊓⊔

For another application of this representation, we introduce the polarity operation, which plays an important part in many aspects of convexity.

Definition 2.2.4. Let C be a nonempty subset of Rn. The polar of C is the set C◦ of all x∗ in Rn such that for each x ∈ C, ⟨x∗,x⟩ ≤ 1. ⊓⊔

As C◦ is the intersection over x ∈ C of a collection of closed convex sets of the form {x∗ | ⟨x∗,x⟩ ≤ 1}, it is closed, convex, and nonempty (because it always contains the origin). Also, the definition shows that if C1 ⊃ C2 then C1◦ ⊂ C2◦. One can check that if C happens to be a subspace, then C◦ = C⊥.

Here is a fundamental property of this polarity operation.
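The definition is easy to make concrete. For the hypothetical example C = [−1,1]², a point x∗ lies in C◦ exactly when sup{⟨x∗,x⟩ : x ∈ C} ≤ 1; since that supremum of a linear function is attained at a corner of the square, membership reduces to a four-point check, and the polar works out to the "diamond" {x∗ : |x1∗| + |x2∗| ≤ 1}:

```python
from itertools import product

# Membership in the polar of C = [-1,1]^2: <x*, x> <= 1 for all x in C.
# The sup of a linear function over the square is attained at a corner,
# so testing the four corners suffices.
corners = list(product([-1.0, 1.0], repeat=2))

def in_polar(xs):
    return max(xs[0]*c[0] + xs[1]*c[1] for c in corners) <= 1.0

assert in_polar((0.0, 0.0))        # the polar always contains the origin
assert in_polar((0.5, 0.5))        # |0.5| + |0.5| = 1, boundary point
assert not in_polar((0.75, 0.5))   # 1.25 > 1
for xs in [(0.3, -0.6), (-1.0, 0.0)]:
    assert in_polar(xs) == (abs(xs[0]) + abs(xs[1]) <= 1.0)
```

This also illustrates the order reversal noted above: shrinking the square would enlarge its polar.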


Theorem 2.2.5. Let S be a nonempty subset of Rn. Then S◦◦ = clconv(S ∪ {0}).

Proof. Write T for clconv(S ∪ {0}).
(⊃). Let x ∈ T. If a point x∗ belongs to S◦ then by definition H−(x∗,1) contains S. It contains the origin too, so it contains S ∪ {0}. Then Theorem 2.2.3 says that x belongs to H−(x∗,1). Therefore

x ∈ ∩x∗∈S◦ H−(x∗,1) = S◦◦,

so that S◦◦ ⊃ T.
(⊂). Suppose that x /∈ T. By Proposition 2.2.2 there is a closed halfspace H−(x∗,ξ) containing S ∪ {0} that does not contain x. As this halfspace contains the origin, ⟨x∗,x⟩ > ξ ≥ 0. By scaling x∗ and ξ appropriately we can arrange that ξ ≤ 1 and that ⟨x∗,x⟩ > 1. The first of these inequalities shows that x∗ ∈ (S ∪ {0})◦ ⊂ S◦; then the second shows that x /∈ S◦◦. Therefore S◦◦ ⊂ T. ⊓⊔

Corollary 2.2.6. If C is a convex subset of Rn whose closure contains the origin, then C◦◦ = clC.

Proof. The hypothesis shows that clC ⊃ C ∪ {0}, and as clC is convex and closed we have the first inclusion in

clC ⊃ clconv(C ∪ {0}) ⊃ clC;

the second follows from conv(C ∪ {0}) ⊃ C. Then Theorem 2.2.5 shows that

C◦◦ = clconv(C ∪ {0}) = clC.

⊓⊔

Corollary 2.2.6 contains as a very special case the familiar equality of L and L⊥⊥ for a subspace L of Rn.

Our results so far have concerned strong separation of a point and a set. With very little additional effort we can characterize conditions under which we can strongly separate two nonempty convex sets.

Theorem 2.2.7. Let C and D be nonempty convex subsets of Rn. Then C and D can be strongly separated if and only if 0 /∈ cl(C−D).

Proof. (only if) Strong separation implies that for some nonzero y∗ ∈ Rn and some real η and positive ε, one has for each c ∈ C and d ∈ D,

⟨y∗,c⟩ ≥ η + ε, ⟨y∗,d⟩ ≤ η − ε .

Subtracting and using the Schwarz inequality, we find that

∥y∗∥∥c−d∥ ≥ ⟨y∗,c−d⟩ ≥ 2ε,

so that ∥c− d∥ ≥ 2ε/∥y∗∥. Therefore the origin cannot belong to the closure of C−D.


(if) Suppose that 0 /∈ cl(C−D). Proposition 2.2.2 then tells us that we can strongly separate the origin from cl(C−D). In particular, this means that there is a nonzero y∗ such that for some positive ε and all c ∈ C and d ∈ D,

⟨y∗,c−d⟩− ε ≥ ⟨y∗,0⟩+ ε;

that is,

⟨y∗,c⟩ − ε ≥ ⟨y∗,d⟩ + ε.

It follows that

sup{⟨y∗,d⟩ : d ∈ D} + ε ≤ inf{⟨y∗,c⟩ : c ∈ C} − ε. (2.7)

Let η be any real number between the left and right sides of (2.7); then if c ∈ C and d ∈ D and if u and v are any elements of Bn,

⟨y∗,d + εu⟩ ≤ η ≤ ⟨y∗,c+ εv⟩,

so that the hyperplane H0(y∗,η) strongly separates C and D. ⊓⊔

A possibly surprising feature of Theorem 2.2.7 is that it does not say that we can strongly separate two closed convex sets if and only if they are disjoint. The reason is that such a statement is not generally true (Exercise 2.2.14). However, it becomes true if we add the requirement that one of the sets be compact. The next proposition shows why.

Proposition 2.2.8. If S and T are subsets of Rn, with S closed and T compact, then S + T is closed.

Proof. If either set is empty there is nothing to prove, so assume each is nonempty. If x ∈ cl(S+T) then there are sequences {sk} ⊂ S and {tk} ⊂ T such that sk + tk converges to x. As T is compact, by thinning the sequence if necessary we can assume that the tk converge to t ∈ T; then the sk must also converge to some s, and as S is closed we have s ∈ S. Therefore x = s + t ∈ S+T, so S+T is closed. ⊓⊔

Corollary 2.2.9. Let C and D be nonempty convex sets in Rn, with C closed and D compact. Then C and D can be strongly separated if and only if they are disjoint.

Proof. Saying that C and D are disjoint is equivalent to saying that the origin does not belong to C−D. Under our hypotheses Proposition 2.2.8 guarantees that the set C−D is closed, so the assertion follows from Theorem 2.2.7. ⊓⊔

2.2.2 Proper separation

Having characterized strong separation, we now do the same for proper separation. The procedure is similar to that for strong separation: first we deal with a point and a set, then extend the result to two sets. However, the arguments are more delicate.


Proposition 2.2.10. Let C be a nonempty convex subset of Rn, and let x be a point of Rn. A hyperplane properly separating x and C exists if and only if x /∈ riC.

Proof. (only if) Suppose that H0(y∗,η) is a hyperplane properly separating x and C. If x /∈ H0(y∗,η) then x is not even contained in clC, much less in riC. Therefore suppose that x ∈ H0(y∗,η) and suppose that we have chosen y∗ so that C ⊂ H−(y∗,η). The hypothesis of proper separation says that C ⊄ H0(y∗,η), so by Corollary 1.2.3 riC does not meet this hyperplane and therefore cannot contain x.

(if) Suppose x /∈ riC. If x /∈ clC, then by Proposition 2.2.2 we can strongly separate x and clC, which certainly suffices. Therefore suppose x ∈ clC. We have riclC = riC, so x /∈ riclC and so x is in the relative boundary of clC. This means that there are points xn ∈ affC converging to x, none of which belongs to clC. Project each xn onto clC to obtain points cn; as x ∈ clC we have

∥cn − x∥ ≤ ∥cn − xn∥+∥xn − x∥ ≤ 2∥xn − x∥,

so the cn also converge to x. Apply Lemma 2.2.1 to show that for each n and each c ∈ clC,

⟨xn − cn,c− cn⟩ ≤ 0. (2.8)

Let zn := (xn − cn)/∥xn − cn∥; these points all belong to parC and have norm 1, so by thinning the sequence we may suppose they converge to z0 ∈ parC with norm 1. If we divide (2.8) by ∥xn − cn∥ and take the limit, we obtain for each c ∈ C the inequality ⟨z0,c− x⟩ ≤ 0. If we let ζ = ⟨z0,x⟩ then C ⊂ H−(z0,ζ) and x ∈ H0(z0,ζ).

To show that this separation is proper, let c ∈ riC; then as z0 ∈ parC we have for sufficiently small positive ε the inclusion c + εz0 ∈ C, so that

ζ ≥ ⟨z0,(c+ εz0)⟩= ⟨z0,c⟩+ ε.

Therefore ⟨z0,c⟩ < ζ, showing that riC is actually disjoint from H0(z0,ζ) and, in particular, that the separation is proper. ⊓⊔

Here is the characterization of proper separation.

Theorem 2.2.11. Let C and D be nonempty convex subsets of Rn. Then C and D can be properly separated if and only if riC ∩ riD = ∅.

Proof. As ri(C−D) = riC − riD (Corollary 1.2.8), riC fails to meet riD exactly when the origin does not belong to ri(C−D). By Proposition 2.2.10 this is equivalent to saying that the origin can be properly separated from C−D by a hyperplane. Therefore we only need to show that such separation occurs if and only if C and D can be properly separated.

(Only if). If H0(y∗,η) properly separates the origin from C−D, we can suppose that

C−D ⊂ H−(y∗,η), 0 ∈ H+(y∗,η), (C−D) ∪ {0} ⊄ H0(y∗,η). (2.9)

The first statement in (2.9) implies that for each c ∈ C and d ∈ D, ⟨y∗,c⟩ ≤ ⟨y∗,d⟩ + η, and the second statement shows that η ≤ 0. Taking the supremum on the left over


c ∈ C and then the infimum on the right over d ∈ D, we obtain

sup{⟨y∗,c⟩ : c ∈ C} ≤ inf{⟨y∗,d⟩ : d ∈ D} + η,

and if we take η0 to be any real number between the left and right sides of this inequality we find that

C ⊂ H−(y∗,η0), D ⊂ H+(y∗,η0 −η)⊂ H+(y∗,η0). (2.10)

Accordingly, H0(y∗,η0) separates C and D.

If η = 0, then if C and D were both contained in H0(y∗,η0) we would have C−D ⊂ H0(y∗,0). As the origin also belongs to that hyperplane, we would then have a contradiction to the third statement in (2.9). Therefore in this case H0(y∗,η0) properly separates C and D.

If η < 0, then (2.10) shows that for each c ∈C and d ∈ D,

⟨y∗,c⟩ ≤ η0 < η0 −η ≤ ⟨y∗,d⟩,

so that no point of D can belong to H0(y∗,η0) and therefore in this case also the separation is proper.

(If). Suppose that H0(y∗,η) properly separates C and D. Then for each c ∈ C and d ∈ D, ⟨y∗,c−d⟩ remains on the same side of the origin, and for at least one pair c̄ ∈ C and d̄ ∈ D we have ⟨y∗, c̄⟩ ≠ ⟨y∗, d̄⟩. The first statement shows that H0(y∗,0) separates C−D from the origin. The separation must also be proper because ⟨y∗, c̄ − d̄⟩ ≠ 0, so that c̄ − d̄ /∈ H0(y∗,0). ⊓⊔

Theorem 2.2.5 applied strong separation to show that the polar of S◦ was the closure of the convex hull of S ∪ {0}. We can now use proper separation to learn something about the structure of S◦.

Theorem 2.2.12. Let S be a nonempty subset of Rn. Then

a. If 0 /∈ riconvS then S◦ contains a halfline from the origin.
b. If 0 ∈ riconvS, let δ = d[0, rbconvS] and let L be the subspace parallel to convS. Then there is a subset D of B(0,δ−1) ∩ L such that S◦ = D + L⊥.

Proof. If 0 /∈ riconvS, separate 0 properly from convS to produce a nonzero s∗ and σ ∈ R such that for each s ∈ S,

0 = ⟨s∗,0⟩ ≥ σ ≥ ⟨s∗,s⟩,

and {0} ∪ convS ⊄ H0(s∗,σ). Then S◦ contains the halfline s∗R+, which proves (a).

For (b), suppose 0 ∈ riconvS; then affconvS is the parallel subspace L. For each

s∗ ∈ S◦ let s∗ = s∗L + s∗N be the unique decomposition of s∗ into s∗L ∈ L and s∗N ∈ L⊥, and define D = {s∗L | s∗ ∈ S◦}.

If s∗ ∈ S◦ then s∗ = s∗L + s∗N with s∗L ∈ D and s∗N ∈ L⊥, so s∗ ∈ D + L⊥ and therefore S◦ ⊂ D + L⊥. Now suppose t∗ ∈ D + L⊥. Then there are s∗ ∈ S◦ and n∗ ∈ L⊥ such that t∗ = s∗L + n∗. Let s ∈ S. Then


⟨t∗,s⟩ = ⟨s∗L + n∗,s⟩ = ⟨s∗,s⟩ + ⟨n∗ − s∗N,s⟩ ≤ 1 + 0 = 1,

because s ∈ L and therefore ⟨n∗ − s∗N,s⟩ = 0. So t∗ ∈ S◦ and therefore D + L⊥ ⊂ S◦. We then have S◦ = D + L⊥.

To establish the bound on D, let ε ∈ (0,δ); then εBL ⊂ convS, where BL = Bn ∩ L. The definition of D above shows that each element of D has the form s∗L where for some s∗ ∈ S◦ we have s∗ = s∗L + s∗N with s∗L ∈ L and s∗N ∈ L⊥. If s∗L = 0 then certainly ∥s∗L∥ ≤ ε−1. Otherwise, by choice of ε we have εs∗L/∥s∗L∥ ∈ convS, so there exist points s0, . . . ,sk of S and convex coefficients µ0, . . . ,µk with

εs∗L/∥s∗L∥ = ∑_{i=0}^{k} µisi.

As ⟨s∗,si⟩ ≤ 1 and ⟨s∗N ,si⟩= 0 for each i, we have

1 ≥ ∑_{i=0}^{k} µi⟨s∗,si⟩ = ∑_{i=0}^{k} µi⟨s∗L,si⟩ = ⟨s∗L, ∑_{i=0}^{k} µisi⟩ = ⟨s∗L, εs∗L/∥s∗L∥⟩ = ε∥s∗L∥,

and therefore ∥s∗L∥ ≤ ε−1. It follows that no element of D has norm greater than ε−1. As ε can be arbitrarily close to δ, we have D ⊂ B(0,δ−1) ∩ L. ⊓⊔

2.2.3 Exercises for Section 2.2

Exercise 2.2.13. Let C be a nonempty closed convex subset of Rn. Define the Euclidean projector πC on Rn to be, for each x ∈ Rn, the point c(x) ∈ C closest to x. Now define the normal component νC(x) to be x − πC(x).

1. Show that for each x and x′ in Rn the following inequalities hold.

⟨νC(x)−νC(x′),πC(x)−πC(x′)⟩ ≥ 0; (2.11)
∥πC(x)−πC(x′)∥2 + ∥νC(x)−νC(x′)∥2 ≤ ∥x− x′∥2; (2.12)
⟨πC(x)−πC(x′),x− x′⟩ ≥ ∥πC(x)−πC(x′)∥2; (2.13)
⟨νC(x)−νC(x′),x− x′⟩ ≥ ∥νC(x)−νC(x′)∥2. (2.14)

2. Show that each of πC and νC is Lipschitz continuous on Rn with Lipschitz modulus 1. For each of these functions show that this Lipschitz constant cannot be improved, by exhibiting a nonempty closed convex subset C of R2 and points x and x′ in R2 such that the norm of the difference of the function values at x and x′ is ∥x− x′∥.

Exercise 2.2.14. Give an example in R2 to show that there exist nonempty closed convex sets C and D that are disjoint but cannot be strongly separated.


Exercise 2.2.15. Let C be a convex subset of Rn and let x ∈ C. Show that C has a supporting hyperplane at x if and only if x is in the boundary of C, and that if the support is proper then x is in the relative boundary of C.


Chapter 3
Cones

3.1 Basic properties and some applications

This section defines cones and develops some elementary properties. It also illustrates some ways in which they occur in applications by providing two prominent members of the class of theorems of the alternative from linear algebra.

3.1.1 Definitions

Definition 3.1.1. A subset K of Rn is a cone if whenever x ∈ K and µ is a positive real number, µx ∈ K. ⊓⊔

A cone is thus a set that is closed under positive scalar multiplication. Some authors use a different definition, requiring a cone to be closed under nonnegative scalar multiplication, so it is a good idea to check what definition an author is using. One important reason for using positive scalar multiplication is Theorem 3.1.7 below.

A cone may, but need not, contain the origin; it may be open or closed or neither. It may or may not be convex; if it is, we call it a convex cone. The definition also implies that the forward and inverse images of a cone under a linear transformation are cones, and the Cartesian product of a finite collection of cones is a cone.

Any subspace is a cone, but many interesting cones are not subspaces. Three familiar convex cones are

• Rn+, the nonnegative orthant of the space Rn;
• Sn+, the semidefinite cone, consisting of the positive semidefinite symmetric matrices in Rn×n;
• Nm×n, the cone of matrices in Rm×n whose entries are all nonnegative.

We have already seen the polarity operation for sets. For cones it takes a special form.


Proposition 3.1.2. Let K be a nonempty cone in Rn. The polar K◦ is a nonempty closed convex cone, and we have

K◦ = {x∗ ∈ Rn | for each x ∈ K, ⟨x∗,x⟩ ≤ 0}. (3.1)

When K is convex we have

(parK)⊥ = lin(K◦). (3.2)

Proof. We already know that K◦ will be a closed convex set, and it must contain the set shown on the right in (3.1). To see that they are the same set, suppose that x∗ ∈ K◦, so that ⟨x∗,x⟩ ≤ 1 for each x ∈ K. Then there cannot be any x ∈ K with ⟨x∗,x⟩ > 0, as otherwise we could multiply x by a sufficiently large positive scalar α so that ⟨x∗,αx⟩ > 1. But αx ∈ K, so this contradicts x∗ ∈ K◦. Therefore (3.1) holds, and it shows that K◦ is a cone.

Suppose that K is convex and that v∗ ∈ (parK)⊥ and u∗ ∈ K◦. Then for any x ∈ K we have 2x ∈ K so x = 2x − x ∈ parK. Then for each α ∈ R,

⟨u∗+αv∗,x⟩= ⟨u∗,x⟩ ≤ 0,

which implies that v∗ ∈ lin(K◦). Therefore (parK)⊥ ⊂ lin(K◦).

To show the other inclusion, let v∗ be any element of lin(K◦) and u∗ be any element of K◦. For each x ∈ K and each α ∈ R we have ⟨u∗ + αv∗,x⟩ ≤ 0, which implies that ⟨v∗,x⟩ = 0. Let L be the linear space spanned by v∗; then K ⊂ L⊥ and therefore

parK = affK ⊂ L⊥.

But then L ⊂ (parK)⊥, and as L was the span of an arbitrary element of lin(K◦), we have lin(K◦) ⊂ (parK)⊥. ⊓⊔

Sometimes we have to add cones to other sets. If the other set is bounded and the result is a cone, the following proposition can be useful.

Proposition 3.1.3. Let P and K be nonempty subsets of Rn, with P bounded and K a closed cone. If P + K is a cone then P ⊂ K.

Proof. Suppose P+K is a cone. As 0 ∈ K because K is closed, we have P = P + 0 ⊂ P+K so that for any positive µ, µP ⊂ P+K. Choose p ∈ P: then for some p′ ∈ P and k′ ∈ K we have µp = p′ + k′, and so p − µ−1p′ = µ−1k′ ∈ K. Therefore dK(p) ≤ µ−1∥p′∥ ≤ µ−1π, where the ball B(0,π) contains P. As µ can be arbitrarily large and K is closed, this implies that p ∈ K and therefore P ⊂ K. ⊓⊔

Certain operations make sets into cones, such as the conical hull defined next.

Definition 3.1.4. If S is a subset of Rn, the conical hull of S, written coneS, is the intersection of all cones containing S. ⊓⊔

At least one cone (Rn) contains S, so the intersection in Definition 3.1.4 is not over an empty collection, though the intersection itself can be empty: the empty set is


also a cone. As the intersection of any collection of cones is a cone, this definition says that coneS is the smallest cone containing S. As the next proposition shows, we can also describe coneS as the cone generated by S: that is, the set ∪α>0 αS, where for a subset S of Rn and a real number α,

αS := {αs | s ∈ S}.

Under this definition, coneS contains the origin if and only if S does. Some definitions in the literature differ from this one.

Proposition 3.1.5. For any subset S of Rn, coneS = ∪α>0αS.

Proof. Write G for ∪α>0 αS. G is a cone (use the definition), and it contains S because α assumes the value 1 along with the other positive reals, so coneS ⊂ G. On the other hand, if K is any cone containing S then K must contain every positive multiple of any element of S, so K ⊃ G. Definition 3.1.4 then shows that coneS ⊃ G, so coneS = G. ⊓⊔

If C is a convex set then so is coneC, though the converse is generally false. To see this, suppose that x and x′ belong to coneC and that λ ∈ [0,1]. By Proposition 3.1.5, there are points c and c′ of C and positive real numbers µ and µ′ such that x = µc and x′ = µ′c′. Let γ = (1−λ)µ + λµ′; then

(1−λ )x+λx′ = γ [(1−λ )µγ−1c+λ µ ′γ−1c′].

The quantity in square brackets on the right is a convex combination of c and c′. It is therefore in C, so the left-hand side belongs to coneC.
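The computation above can be traced numerically. In the sketch below (a hypothetical instance), C is the box [1,2]×[0,1], which avoids the origin, and membership in coneC is tested by searching for a positive rescaling that lands back in C:

```python
# Sketch of the convexity argument for cone C, with C = [1,2] x [0,1]
# (a convex set chosen so that cone C is a proper "wedge").
def in_C(p):
    return 1.0 <= p[0] <= 2.0 and 0.0 <= p[1] <= 1.0

def in_cone_C(p):
    # p belongs to cone C iff p = mu*c for some mu > 0 and c in C,
    # i.e. some positive rescaling of p lands in C; search a coarse grid.
    return any(in_C((p[0]/m, p[1]/m)) for m in (i/100.0 for i in range(1, 1001)))

x, xp = (3.0, 1.5), (0.5, 0.25)        # x = 1.5*(2,1),  x' = 0.5*(1,0.5)
assert in_cone_C(x) and in_cone_C(xp)

lam = 0.5
mid = ((1-lam)*x[0] + lam*xp[0], (1-lam)*x[1] + lam*xp[1])   # (1.75, 0.875)
assert in_cone_C(mid)                  # the convex combination stays in cone C
```

Here the midpoint (1.75, 0.875) already lies in C itself (take µ = 1), which is exactly the bracketed convex combination γ⁻¹-rescaling produced by the argument above.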

3.1.2 Relative interiors of cones

The next theorem shows that in the convex case the conical hull and relative interior operators commute, and it also provides a good example of the proper separation theorem in action. Its proof uses the following lemma about separation of two sets, one of which is a cone.

Lemma 3.1.6. Let K and S be nonempty subsets of Rn, and suppose that K is a cone. If a hyperplane H0(y∗,η) separates K and S, then we may choose the orientation of (y∗,η) so that H0(y∗,0) separates K and S, y∗ ∈ K◦, and η ≥ 0. If the separation by H0(y∗,η) is proper, then so is the separation by H0(y∗,0).

Proof. By multiplying (y∗,η) by −1 if needed, we can arrange that K ⊂ H−(y∗,η) and S ⊂ H+(y∗,η). If any k ∈ K had ⟨y∗,k⟩ > 0 then by taking a large positive γ we could obtain ⟨y∗,γk⟩ > η, a contradiction; therefore K ⊂ H−(y∗,0), so that y∗ ∈ K◦. Moreover, as the origin belongs to clK we must have η ≥ 0; as S ⊂ H+(y∗,η) we also have S ⊂ H+(y∗,0) so that H0(y∗,0) separates K and S.


Now suppose that the separation by H0(y∗,η) is proper. If η > 0 then S is disjoint from H0(y∗,0) so the separation is proper, while if η = 0 then the separation is proper by hypothesis. ⊓⊔

Theorem 3.1.7. If C is a convex subset of Rn, then riconeC = coneriC.

Proof. If C is empty there is nothing to prove, so assume C is nonempty.
(⊂) If x /∈ coneriC, define X = cone{x}. Then X and riC are disjoint, so the proper separation theorem tells us there is a nonzero y∗ ∈ Rn and a real η such that H0(y∗,η) properly separates X and C. As X is a cone we can use Lemma 3.1.6 to choose the orientation of (y∗,η) so that H0(y∗,0) properly separates X and C, y∗ ∈ X◦, and η ≥ 0. Therefore in fact we have for each c ∈ C,

⟨y∗,c⟩ ≥ η ≥ 0 ≥ ⟨y∗,x⟩.

Moreover, proper separation guarantees that there is some c̄ ∈ C with ⟨y∗,x⟩ < ⟨y∗, c̄⟩.

Let ε be any positive number and define p = x + ε(x − c̄). Then

⟨y∗, p⟩ = ⟨y∗,x⟩ + ε(⟨y∗,x⟩ − ⟨y∗, c̄⟩) < ⟨y∗,x⟩ ≤ 0.

This implies that for each positive ξ we have ⟨y∗,ξ p⟩ < 0 ≤ η, so ξ p /∈ C, and therefore p /∈ coneC. Now Corollary 1.2.5 shows that x /∈ riconeC, so that riconeC ⊂ coneriC.

(⊃) Let x ∈ coneriC, and let z be an arbitrary point of coneC. We will find ε > 0 with x + ε(x−z) ∈ coneC, which by Corollary 1.2.5 will establish that x ∈ riconeC.

The assumptions about x and z mean that there are positive numbers σ and τ such that τx ∈ riC and σz ∈ C. As τx ∈ riC there is some positive δ0 such that τx + δ0(τx − σz) ∈ C, and since C is convex it is easy to see that the inclusion holds also if δ0 is replaced by any number δ ∈ (0,δ0).

Choose δ ∈ (0,δ0) so small that 1 + δ−1 > τ−1σ, and define positive numbers ε and α by

1 + δ−1 = (1 + ε−1)(τ−1σ), α = ε−1δσ.

Then δσ = εα, and

(1+δ)τ = δτ(1+δ−1) = δτ(1+ε−1)(τ−1σ) = εα(1+ε−1) = (1+ε)α.

We know that C contains the point y := τx + δ(τx − σz). However,

α−1y = α−1[τx + δ(τx − σz)] = α−1[(1+δ)τx − δσz] = α−1[(1+ε)αx − εαz] = x + ε(x − z).

Therefore x + ε(x−z) belongs to coneC, so that x ∈ riconeC and therefore riconeC ⊃ coneriC. ⊓⊔


Corollary 3.1.8. Let C be a nonempty convex cone in Rn. Then riC and clC are convex cones, and the three cones riC, C, and clC all have the same polar.

Proof. As C is convex, both riC and clC are convex. That clC is a cone is easy to show using the definition of a cone, and Theorem 3.1.7 shows that riC = riconeC = coneriC, so riC is a cone.

We have riC ⊂ C ⊂ clC, and by taking polars we find that

(clC)◦ ⊂ C◦ ⊂ (riC)◦. (3.3)

Let x∗ ∈ (riC)◦; then as clC = cl riC by Proposition 1.2.6, for any point x ∈ clC there is a sequence {ck} ⊂ riC converging to x. As ⟨x∗,ck⟩ ≤ 0 for each k, we have ⟨x∗,x⟩ ≤ 0 also, and therefore x∗ ∈ (clC)◦. Then (riC)◦ ⊂ (clC)◦, so the three sets in (3.3) are the same. ⊓⊔

Corollary 3.1.9. If C is a nonempty convex cone in Rn, then C◦◦ = clC.

Proof. As C◦ = (clC)◦ by Corollary 3.1.8, it suffices to show that for a nonempty closed convex cone K one has K◦◦ = K. Such a K must contain the origin, so the result follows from Corollary 2.2.6. ⊓⊔

3.1.3 Two theorems of the alternative

Theorems of the alternative are results about the solvability of systems of linear equations and/or inequalities, in which cones play important parts. They are often useful in the analysis of problems from various application areas. This section presents two of the best known theorems of the alternative, due to Farkas and to Gordan. After developing some additional tools we will show in Section 3.2 how to generalize each of these.

Proposition 3.1.10 (Farkas lemma, 1898). Let A be an m×n matrix and let a ∈ Rm. Exactly one of the following holds:

a. There is some x ∈ Rn+ with Ax = a.
b. There is some u∗ ∈ Rm with A∗u∗ ∈ Rn− and ⟨u∗,a⟩ > 0.

Proof. To show that (a) and (b) cannot both hold, suppose they did. Then

0 < ⟨u∗,a⟩= ⟨u∗,Ax⟩= ⟨A∗u∗,x⟩ ≤ 0,

where the final inequality holds because A∗u∗ ∈ Rn− and x ∈ Rn+. This is a contradiction, so both cannot hold.

To complete the proof it suffices to show that if (a) fails then (b) holds. Write the columns of A as a1, . . . ,an and let

C = A(Rn+) = pos{a1, . . . ,an} = conv{0} + pos{a1, . . . ,an}.


This C is a cone, and it is also finitely generated. By Proposition 1.1.21, C is closed. To say that (a) does not hold is to say that a /∈ C. Let c0 be the point in C closest to a, and let u∗ = a − c0. By construction, u∗ ≠ 0. Further, Lemma 2.2.1 says that for each c ∈ C,

0 ≥ ⟨a− c0,c− c0⟩= ⟨u∗,c− c0⟩. (3.4)

By taking c = (1/2)c0 and c = 2c0 in (3.4) we find that ⟨u∗,c0⟩ = 0. A point c belongs to C if and only if it is of the form c = Ax for x ∈ Rn+; therefore for each x ∈ Rn+ we have

0 ≥ ⟨u∗,Ax⟩ = ⟨A∗u∗,x⟩. (3.5)

Let i ∈ {1, . . . ,n} and take x in (3.5) to be ei, a point whose coordinates are 0 for every index except i and 1 for the index i. We obtain (A∗u∗)i ≤ 0, and as this holds for each i we have A∗u∗ ∈ Rn−. Also, as ⟨u∗,c0⟩ = 0 we have

⟨u∗,a⟩= ⟨u∗,a− c0⟩= ∥u∗∥2 > 0,

so that (b) holds. ⊓⊔
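The mutually exclusive alternatives can be exhibited concretely. The sketch below (a small hypothetical instance with A the 2×2 identity, so Ax = x) verifies an explicit certificate for each alternative:

```python
# Certificates for both alternatives of the Farkas lemma, with A = I_2.
def matvec(A, x):
    return [sum(a*b for a, b in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a*b for a, b in zip(u, v))

A = [[1.0, 0.0], [0.0, 1.0]]
At = [list(col) for col in zip(*A)]     # A* (here equal to A)

# a = (1, 1): alternative (a) holds, witnessed by x = (1, 1) in R^2_+.
x = [1.0, 1.0]
assert all(xi >= 0 for xi in x) and matvec(A, x) == [1.0, 1.0]

# a = (1, -1): alternative (a) is impossible (Ax = x is nonnegative),
# and (b) holds, witnessed by u* = (0, -1): A*u* in R^2_- and <u*, a> > 0.
a, u = [1.0, -1.0], [0.0, -1.0]
assert all(v <= 0 for v in matvec(At, u))
assert dot(u, a) > 0
```

In the second case u∗ = a − c0 with c0 = (1,0) the point of C = R²₊ closest to a, exactly as the proof constructs it.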

Instead of using strong separation and the closedness of finitely generated convex sets to prove the Farkas lemma, one can do it with an induction on the dimension of the column space of A; for such a proof, see [12, Theorem 2.6]. The inductive proof does not provide the geometric perspective that the proof by separation gives.

One can prove a much more general version of the Farkas lemma, replacing Rn+ by a general convex cone K; the generalization holds if and only if A(K) is closed. We will do this in Section 3.2, after establishing some results needed for the proof.

Another well known theorem of the alternative, slightly older than the Farkas lemma, is due to Gordan. In its statement we call an element of Rn semipositive if it belongs to Rn+ \ {0}. We also use the notation x > 0 for a positive vector in Rn: that is, an element of intRn+.

Proposition 3.1.11 (Gordan, 1873). Let A be an m×n matrix. Exactly one of the following is true:

a. There is an x ∈ Rn with Ax > 0.
b. There is a semipositive u∗ ∈ Rm with A∗u∗ = 0.

Proof. If (a) and (b) were both true, then we would have

0 = ⟨A∗u∗,x⟩= ⟨u∗,Ax⟩> 0,

so that these alternatives cannot both hold. We complete the proof by showing that if (b) does not hold, then (a) does.

Any u∗ appearing in (b) is semipositive, so we can scale it to make its components sum to 1 without affecting the truth or falsity of (b). Then saying that (b) does not hold is the same as saying that the origin does not belong to the convex hull C of the rows of A. We do not even need Proposition 1.1.21 to show that C is closed: as C is the convex hull of finitely many points, a simple compactness argument suffices. As


we are supposing that C does not contain the origin, we can project the origin onto C to obtain a point c0 of C. By Lemma 2.2.1 we then have for each c′ ∈ C,

⟨0− c0,c′− c0⟩ ≤ 0,

or equivalently,

⟨c0,c′⟩ ≥ ∥c0∥2 > 0. (3.6)

By setting x = c0 and taking each row of A to be c′ in (3.6), we obtain Ax > 0. ⊓⊔
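As with the Farkas lemma, explicit certificates for Gordan's two alternatives can be checked directly. The sketch below uses two small hypothetical matrices, one for each alternative:

```python
# Certificates for the two alternatives of Gordan's theorem.
def matvec(A, x):
    return [sum(a*b for a, b in zip(row, x)) for row in A]

# Rows (1,0) and (1,1): their convex hull misses the origin, and the
# projection of 0 onto that hull is c0 = (1,0); x = c0 gives Ax > 0.
A1 = [[1.0, 0.0], [1.0, 1.0]]
x = [1.0, 0.0]
assert all(v > 0 for v in matvec(A1, x))        # alternative (a)

# Rows (1,0) and (-1,0): the hull contains the origin; the semipositive
# u* = (1/2, 1/2) satisfies A*u* = 0, and (a) must fail, since
# (Ax)_1 = -(Ax)_2 for every x.
A2 = [[1.0, 0.0], [-1.0, 0.0]]
A2t = [list(col) for col in zip(*A2)]           # A2*
u = [0.5, 0.5]
assert all(ui >= 0 for ui in u) and sum(u) > 0  # u* is semipositive
assert matvec(A2t, u) == [0.0, 0.0]             # alternative (b)
```

The first instance follows the proof exactly: x is the projection of the origin onto the convex hull of the rows, and (3.6) then forces Ax > 0.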

The proofs of Propositions 3.1.10 and 3.1.11 are very similar; the only significant difference is that in the Gordan theorem we are dealing with projection onto the convex hull of a finite set, so we have less difficulty in determining that it is closed. As in the case of the Farkas lemma one can formulate a much more general version of the Gordan theorem. That version appears in Section 3.2.

3.1.4 Exercises for Section 3.1

Exercise 3.1.12. If C is a convex subset of Rn, then coneC is convex.

Exercise 3.1.13. If K is a nonempty cone in Rn, then

a. For any positive µ, µK = K;
b. convK is a cone;
c. K◦ is a closed convex cone;
d. K is convex if and only if K + K = K.

Exercise 3.1.14. If K is a convex cone in Rn then so is K ∪ {0}; one has K ⊂ K ∪ {0} ⊂ clK, and all three are convex cones.

Exercise 3.1.15. For any set S in Rn, coneconvS = convconeS.

Exercise 3.1.16. If C and D are nonempty cones in Rn and Rm respectively, then (C×D)° = C°×D°.

Exercise 3.1.17. If L is a subspace of Rn then L° = L⊥.

Exercise 3.1.18. Consider the set

C = {x ∈ R3 | x1 ≥ 0, x3 ≥ 0, 2x1x3 ≥ x2²}.

Show that C is the set of points in R3 making an angle not greater than π/4 with the halfline from the origin in the direction (1,0,1). Using this result, show that C is a closed convex cone. Show further that C° = −C. This cone is often useful for counterexamples.
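After squaring, the angle condition in this exercise reduces to x1 + x3 ≥ ∥x∥, which is equivalent to the defining inequalities of C. A small random sanity check of that equivalence (hypothetical sample points; a sample, not a proof):

```python
import math, random

def in_C(x):
    """Membership in C = {x | x1 >= 0, x3 >= 0, 2*x1*x3 >= x2**2}."""
    return x[0] >= 0 and x[2] >= 0 and 2 * x[0] * x[2] >= x[1] ** 2

def small_angle(x):
    """Angle between x and the ray through (1,0,1) is at most pi/4."""
    n = math.sqrt(sum(t * t for t in x))
    if n == 0:
        return True  # the origin belongs to every closed cone
    cos = (x[0] + x[2]) / (math.sqrt(2) * n)
    return cos >= math.cos(math.pi / 4) - 1e-12

random.seed(0)
pts = [tuple(random.uniform(-2, 2) for _ in range(3)) for _ in range(2000)]
agree = all(in_C(p) == small_angle(p) for p in pts)
print(agree)
```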


Exercise 3.1.19. If S and S′ are subsets of Rn then cone(S∩S′) ⊂ (cone S)∩(cone S′). If either (1) at least one of S and S′ is a cone, or (2) each of S and S′ is convex and contains the origin, then the reverse inclusion holds so that cone(S∩S′) = (cone S)∩(cone S′).

Exercise 3.1.20. If S is any subset of Rn, then cl cone S ⊃ cone cl S. If S is a nonempty bounded subset of Rn whose closure does not contain the origin, then cl cone S = (cone cl S) ∪ {0}.

Exercise 3.1.21. Exhibit an example of a closed convex subset S of R2 for which cl cone S properly contains {0} ∪ cone cl S.

3.2 Cones and linear transformations

It is often necessary to take forward or inverse images of cones under linear transformations. These images remain cones (Exercise 3.2.10); moreover, if the original cone was convex then so are the images. The first proposition below shows what happens if we take the polar of a forward image.

Proposition 3.2.1. Let T be a nonempty cone in Rn, and let L be a linear transformation from Rn to Rm. Then L(T)° = (L∗)⁻¹(T°).

Proof. An element y∗ of Rm belongs to L(T)° if and only if, for each t ∈ T,

0 ≥ ⟨y∗,Lt⟩= ⟨L∗y∗, t⟩.

This is equivalent to saying that L∗y∗ ∈ T° and also to y∗ ∈ (L∗)⁻¹(T°). ⊓⊔
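Proposition 3.2.1 can be sanity-checked numerically. The sketch below uses hypothetical data: T is the nonnegative orthant in R2 (so T° is the nonpositive orthant and L∗ is the transpose), and it compares the definition of L(T)° against the test Aᵀy ≤ 0 on a grid of points:

```python
# A is the matrix of L; L* is then the transpose.  For T = R^2_+,
# (L*)^{-1}(T°) = {y : A^T y <= 0}.
A = [[1.0, 2.0],
     [0.0, 1.0]]

def in_polar_of_image(y, samples=50):
    """Test <y, A t> <= 0 over a sample of t in T (definition of L(T)°)."""
    for i in range(samples + 1):
        for j in range(samples + 1):
            t = (i / samples, j / samples)
            At = (A[0][0]*t[0] + A[0][1]*t[1], A[1][0]*t[0] + A[1][1]*t[1])
            if y[0]*At[0] + y[1]*At[1] > 1e-9:
                return False
    return True

def in_preimage_of_polar(y):
    """Test A^T y <= 0, i.e. L*y in T°."""
    return (A[0][0]*y[0] + A[1][0]*y[1] <= 1e-9 and
            A[0][1]*y[0] + A[1][1]*y[1] <= 1e-9)

grid = [(-1 + 0.25*i, -1 + 0.25*j) for i in range(9) for j in range(9)]
agree = all(in_polar_of_image(y) == in_preimage_of_polar(y) for y in grid)
print(agree)
```

Sampling t over the unit square suffices here because the inequality ⟨y, At⟩ ≤ 0 is homogeneous in t.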

Corollary 3.2.2. Let S1, . . . ,Sk be nonempty cones in Rn. Then

(S1 + ⋯ + Sk)° = S1° ∩ ⋯ ∩ Sk°.

Proof. Let L be the linear transformation from Rkn to Rn given by L(x1, . . . ,xk) = x1 + ⋯ + xk. Apply Proposition 3.2.1 to L and the set U = S1 × ⋯ × Sk, using Exercise 3.1.16. ⊓⊔

In proving the following technical result we use the fact that if C is a convex set whose closure is affine, then in fact C is closed and therefore it is affine (Exercise 1.2.20).

Proposition 3.2.3. Let M and K be respectively a subspace and a convex cone in Rn, both nonempty. The following are equivalent:

a. M meets ri K.
b. M+K is a subspace.


c. M⊥ ∩ K° is a subspace.

Proof. Let L be a nonempty convex cone; then if L is a subspace it is relatively open, so the origin belongs to its relative interior. On the other hand, if the origin belongs to ri L then L must contain a neighborhood of the origin in aff L. As aff L is a subspace and L is a cone, L must coincide with aff L. Therefore the origin belongs to ri L if and only if L is a subspace.

Now take L = M+K and note that to say M meets ri K is the same as saying that the origin belongs to (ri K)−M. But as M is a subspace (hence equal to −M) we have

(ri K)−M = (ri K)+M = (ri K)+(ri M) = ri(K+M),

where we used Corollary 1.2.8. We conclude that ri K meets M if and only if the origin belongs to ri(K+M), and that in turn happens if and only if K+M is a subspace. Therefore (a) and (b) are equivalent.

Corollary 3.2.2 says that (M+K)° = M⊥ ∩ K°. Therefore (b) implies (c). On the other hand, if (c) holds then we have

cl(M+K) = (M+K)°° = [M⊥ ∩ K°]°,

where the last equality uses Corollary 3.2.2. It follows that cl(M+K) is a subspace. Our observation just before the statement of this proposition then shows that M+K is a subspace, so that (c) implies (b). ⊓⊔

Unfortunately, the linear image of a closed convex cone need not be closed, but Proposition 3.2.3 can help in finding conditions for it to be closed. Suppose K ⊂ Rn is such a cone and L : Rn → Rm is linear. Let y ∈ cl L(K). Then L(K) will fail to be closed if for the inverse images

N(ε) := {x ∈ K | Lx ∈ B(y,ε)} = K ∩ L⁻¹[B(y,ε)],

the distance from the origin to N(ε) converges to +∞ as ε converges to zero.

The cone C of Exercise 3.1.18 provides an example of this behavior. If we define the linear map L : R3 → R2 by L(x1,x2,x3) = (x1,x2), then L(C) = [(int R+)×R] ∪ {(0,0)}. The point (0,1) belongs to cl L(C) but not to L(C). If we let δ > 0 and ε be small numbers, then the point (δ, 1+ε) belongs to L(C) and is near (0,1). The inverse image of this point is

{(δ, 1+ε, x3) | x3 ≥ 0, 2δx3 ≥ (1+ε)²}.

Therefore x3 must satisfy

x3 ≥ (1/2)δ⁻¹(1+ε)², (3.7)

so as δ and ε become small, x3 increases without bound and so then does the distance from the origin to (δ, 1+ε, x3).
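The blow-up described by (3.7) is easy to tabulate. The following sketch (with hypothetical values of δ and ε) computes the smallest admissible x3 for shrinking δ and checks that the resulting boundary point still lies in C:

```python
# As delta shrinks, the smallest x3 allowed by (3.7) -- and hence the
# distance from the origin to the inverse image -- blows up.
eps = 0.01
points = []
for delta in (1.0, 0.1, 0.001):
    x3 = 0.5 * (1 + eps) ** 2 / delta   # smallest x3 allowed by (3.7)
    p = (delta, 1 + eps, x3)            # lies on the boundary of C
    points.append(p)
    print(p)
```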

The next result uses Proposition 3.2.3 to develop a condition preventing this bad behavior.


Proposition 3.2.4. Let K be a closed convex cone in Rn and let L be a linear transformation from Rn to Rm. If im L∗ meets ri K° then there is a nonnegative real number ρ such that for each z ∈ L(K) there is some x ∈ K with Lx = z and ∥x∥ ≤ ρ∥z∥.

Proof. Let V = (ker L)∩K. Under our hypotheses, Proposition 3.2.3 says that V is a subspace. Thus if v ∈ V then −v ∈ V too, so that in particular −v ∈ K.

We will first show that L(K) = L(K∩V⊥), for which we only need to show that the inclusion ⊂ holds. Let z ∈ L(K) and find x ∈ K with Lx = z. As Rn = V ⊕V⊥, we can write x uniquely as x = xV + xN, with xV ∈ V and xN ∈ V⊥. Then xN = x− xV, and as we saw that −xV ∈ K, xN is the sum of two elements of K and therefore it is in K. As xV ∈ V it belongs in particular to ker L, so that L(xN) = L(x−xV) = L(x) = z. But xN belongs to K ∩V⊥, so z ∈ L(K∩V⊥) and therefore L(K) = L(K∩V⊥).

If K∩V⊥ = {0} we are finished, because then by what we have just shown L(K) must consist only of the origin, so we may choose x = 0 and let ρ = 0. Therefore in the remainder of the proof suppose that K∩V⊥ ≠ {0}.

Let γ be the infimum of ∥Lx∥ for points x belonging to the set Q = K∩V⊥∩Sⁿ⁻¹, where Sⁿ⁻¹ is the unit sphere in Rn. As Q is nonempty and compact, this infimum must be attained at some x0. If Lx0 = 0 then x0 ∈ ker L, so that

x0 ∈ (ker L)∩Q = V ∩V⊥∩Sⁿ⁻¹ = ∅,

a contradiction; therefore γ > 0. Let ρ = γ⁻¹.

Now choose any z ∈ L(K). If z = 0 then choose x = 0 ∈ K; then ∥x∥ ≤ ρ∥z∥. If

z ≠ 0 then as we have shown that L(K) = L(K∩V⊥), there is some x ∈ K∩V⊥ with Lx = z. As x cannot be zero, we have L(x/∥x∥) = z/∥x∥, and x/∥x∥ ∈ Q. It follows that ∥z∥/∥x∥ ≥ ρ⁻¹, so that ∥x∥ ≤ ρ∥z∥ as required. ⊓⊔

In the hypothesis of Proposition 3.2.4 we could just as well have assumed either of the equivalent properties given in Proposition 3.2.3. Also, Proposition 3.2.4 does not by any means assert that its bound applies to all inverse images of z (it generally does not), but only that it applies to at least one.

Corollary 3.2.5. Under the hypotheses of Proposition 3.2.4, L(K) is closed.

Proof. Suppose {zj} is a sequence of elements of L(K) that converges to z. By choice of the zj, for each j there is an element xj ∈ K with Lxj = zj. Use Proposition 3.2.4 to show that for some ρ we can select the xj so that ∥xj∥ ≤ ρ∥zj∥. As the ∥zj∥ are bounded, the sequence {xj} is then also bounded; therefore it has a cluster point x ∈ K. Continuity then implies Lx = z, so that z ∈ L(K), showing that L(K) is closed. ⊓⊔

We will obtain in the next chapter a criterion for the linear image of a closed convex set (not necessarily a cone) to be closed.

Proposition 3.2.6. Let T be a nonempty closed convex cone in Rm and let L be a linear transformation from Rn to Rm. Then

[L⁻¹(T)]° = cl L∗(T°). (3.8)


If also im L meets ri T then L∗(T°) is closed, so that

[L⁻¹(T)]° = L∗(T°). (3.9)

Proof. For (3.8), we apply Proposition 3.2.1 to obtain

cl L∗(T°) = [L∗(T°)]°° = [L⁻¹(T°°)]° = [L⁻¹(T)]°. (3.10)

Now assume in addition that im L meets ri T, and apply Corollary 3.2.5 to T° and L∗ to show that L∗(T°) is closed. Use this in (3.10) to obtain (3.9). ⊓⊔

Corollary 3.2.7. Let S1, . . . ,Sk be nonempty closed convex cones in Rn. Then

(S1 ∩ ⋯ ∩ Sk)° = cl(S1° + ⋯ + Sk°). (3.11)

If also the relative interiors of S1, . . . ,Sk have a point in common, then S1° + ⋯ + Sk° is closed, so that

(S1 ∩ ⋯ ∩ Sk)° = S1° + ⋯ + Sk°.

Proof. Use Proposition 3.2.6, with the same technique used in the proof of Corollary 3.2.2. ⊓⊔

Here is the generalized version of the Farkas lemma mentioned in Section 3.1.

Proposition 3.2.8. Let A be an m×n matrix, let K be a nonempty convex cone in Rn, and let a ∈ Rm. If A(K) is closed, exactly one of the following is true:

a. There is an x ∈ K such that Ax = a.
b. There is a u∗ ∈ Rm such that A∗u∗ ∈ K° and ⟨u∗,a⟩ > 0.

If A(K) is not closed, then there is an a ∈ Rm for which neither alternative (a) nor alternative (b) holds.

Proof. Suppose A(K) is closed; then alternative (b) says that

a ∉ [(A∗)⁻¹(K°)]° = ([A(K)]°)° = cl A(K) = A(K),

where we used Proposition 3.2.6. As alternative (a) says that a ∈ A(K), exactly one of these alternatives holds.

If A(K) is not closed, take a to be a point of [cl A(K)]\A(K). Then there is no x ∈ K with Ax = a, but as a ∈ cl A(K) there is a (nonconvergent) sequence xi ∈ K with Axi converging to a. Then if A∗u∗ ∈ K°, we have for each i

⟨u∗,Axi⟩= ⟨A∗u∗,xi⟩ ≤ 0,

and as ⟨u∗, · ⟩ is continuous, this implies ⟨u∗,a⟩ ≤ 0. Therefore alternatives (a) and (b) both fail to hold for this a. ⊓⊔
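For the classical case K = Rⁿ₊ (so K° is the nonpositive orthant, and A(K), the cone spanned by the columns of A, is closed because it is finitely generated), the two alternatives can be certified directly. A sketch with hand-picked, hypothetical data:

```python
# Columns of A are (1,0) and (1,1); K = R^2_+, K° = -R^2_+.
A = [[1.0, 1.0],
     [0.0, 1.0]]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def transpose(M):
    return [list(row) for row in zip(*M)]

# a = (2,1): alternative (a) holds with x = (1,1) in K.
a1, x = [2.0, 1.0], [1.0, 1.0]
ok_a = all(xi >= 0 for xi in x) and matvec(A, x) == a1

# a = (0,1): alternative (b) holds with u = (-2,1):
# A*u lies in K° (componentwise <= 0) and <u, a> > 0.
a2, u = [0.0, 1.0], [-2.0, 1.0]
Atu = matvec(transpose(A), u)
ok_b = all(t <= 0 for t in Atu) and sum(ui * ai for ui, ai in zip(u, a2)) > 0
print(ok_a, ok_b)
```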


As mentioned earlier, the Gordan theorem has a similar generalization. The condition u∗ ∈ K°\(lin K°) is the appropriate generalization of a semipositivity condition on u∗.

Proposition 3.2.9. Let A be an m×n matrix and let K be a nonempty convex cone in Rm. Exactly one of the following is true:

a. There is an x ∈ Rn such that Ax ∈ ri K.
b. There is a u∗ ∈ K°\(lin K°) such that A∗u∗ = 0.

Proof. We have

(par K)⊥ = [par(cl K)]⊥ = lin[(cl K)°] = lin K°,

where the second equality comes from Proposition 3.1.2 and the third from Corollary 3.1.8. If both (a) and (b) were true, then the u∗ from (b) would belong to K°

but not to (par K)⊥, so we could find v ∈ par K with ⟨u∗,v⟩ > 0. If we chose v to be small enough, then as Ax ∈ ri K we would have Ax+v ∈ K. Then

0 ≥ ⟨u∗,Ax+ v⟩= ⟨A∗u∗,x⟩+ ⟨u∗,v⟩> 0,

and this contradiction shows that (a) and (b) cannot both be true.

Now suppose (a) does not hold; we prove (b). To say (a) is false is to say that

im A does not meet ri K, so we can separate im A and K properly by a hyperplane. From Lemma 3.1.6 we see that with no loss of generality we can suppose that the hyperplane passes through the origin and that its normal u∗ belongs to K°. As the subspace im A lies in the upper closed halfspace H+(u∗,0) it must in fact lie in the hyperplane H0(u∗,0). This implies that u∗ ∈ (im A)⊥ = ker A∗, so that A∗u∗ = 0 and u∗ ∈ K°. But as the separation is proper and as im A ⊂ H0(u∗,0) we cannot have K ⊂ H0(u∗,0). Therefore there is some k ∈ K with ⟨u∗,k⟩ < 0, so that u∗ cannot be in the lineality space of K°. ⊓⊔

We stated Proposition 3.2.9 for an arbitrary nonempty convex cone, which not only may not be polyhedral but may not even be closed, and we were able to prove it using only standard separation tools. This is a significant difference from the situation with the Farkas lemma, which required that the image A(K) be closed.

3.2.1 Exercises for Section 3.2

Exercise 3.2.10. Show that if L is a linear transformation from Rn to Rm and S and U are subsets of Rn and Rm respectively, then cone L(S) = L(cone S) and cone L⁻¹(U) = L⁻¹(cone U).

Exercise 3.2.11. Show that the closure operation in Corollary 3.2.7 is necessary by exhibiting two nonempty closed convex cones whose sum is not closed.


3.3 Tangent and normal cones

Two important cones associated with convex sets are the tangent and normal cones. Here are their definitions.

Definition 3.3.1. Let C be a convex set in Rn, and let x ∈ C. A point x∗ ∈ Rn is normal to C at x if for each c ∈ C, ⟨x∗,c−x⟩ ≤ 0. The normal cone NC(x) to C at a point x ∈ Rn is the set of all x∗ normal to C at x, if x ∈ C, and is empty otherwise. The tangent cone to C at x is TC(x) := NC(x)°, if x ∈ C, and is empty otherwise. ⊓⊔

That the normal cone is indeed a cone follows at once from the definition of a cone; further, it is easy to prove that this cone is closed and convex. The tangent cone is also closed and convex because it is a polar cone. We then have

NC(x) = NC(x)°° = TC(x)°. (3.12)

The origin belongs to NC(x) whenever x ∈ C, but in interesting cases NC(x) contains other points too. It is easy to show from the definition that NC(x) is always a closed convex cone, and that it is invariant under translation: that is, for any element y ∈ Rn, NC+y(x+y) = NC(x). Another useful property of NC is given in the following proposition. Here, as elsewhere, we denote the unit ball of Rn by Bn.
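Definition 3.3.1 is easy to work with for polytopes, since the inequality ⟨x∗, c−x⟩ ≤ 0 need only be checked at the vertices (every c in the set is a convex combination of them). A sketch for the unit square, with hypothetical test vectors:

```python
# Normal cones of the unit square C = [0,1]^2, checked from Definition 3.3.1.
verts = [(0, 0), (1, 0), (0, 1), (1, 1)]

def is_normal(xstar, x):
    """x* is normal to C at x iff <x*, v - x> <= 0 at every vertex v."""
    return all((v[0]-x[0])*xstar[0] + (v[1]-x[1])*xstar[1] <= 1e-12
               for v in verts)

print(is_normal((-1, -1), (0, 0)))      # corner: (-1,-1) is normal
print(is_normal((0, -1), (0.5, 0)))     # edge: downward normals work
print(is_normal((1, 0), (0.5, 0)))      # ...but nothing along the edge
print(is_normal((0.1, 0), (0.5, 0.5)))  # interior: only 0 is normal
```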

Proposition 3.3.2. Let C be a convex subset of Rn and let x ∈ C. The normal cone NC(x) is a local object in the sense that if Q is any convex neighborhood of x then NC(x) = NC∩Q(x) and TC(x) = TC∩Q(x).

Proof. Write D := C∩Q. D is a nonempty convex subset of C, so the definition of normal cone shows that NC(x) ⊂ ND(x). Suppose x∗ ∈ ND(x) and let c ∈ C. Let µ be a positive number less than 1 and small enough so that x+µ(c−x) ∈ Q. Then x+µ(c−x) also belongs to C by convexity, so it belongs to D. It follows that 0 ≥ ⟨x∗, [x+µ(c−x)]−x⟩ = µ⟨x∗,c−x⟩. As c was an arbitrary point of C and µ > 0, we have x∗ ∈ NC(x) and therefore NC(x) ⊃ ND(x). Therefore NC(x) = NC∩Q(x), and as the tangent cone is the polar of the normal cone the second claim follows. ⊓⊔

Proposition 3.3.2 shows that if two convex sets U and V contain a common point x and if they “agree near x” in the sense that for some neighborhood N of x one has U∩N = V∩N, then the tangent and normal cones of U and V at x are the same. This is so because we can choose a convex neighborhood Q of x with Q ⊂ N; then U∩Q = V∩Q and Proposition 3.3.2 says that the tangent and normal cones of U and of V are the same. This is often useful in allowing us to be sure that particular characteristics of the sets don’t affect the normal cone.

An operation that sends points of Rm to subsets of Rn is called a multifunction from Rm to Rn.

Definition 3.3.3. A multifunction T from Rm to Rn is closed if its graph Γ = {(x,y) ∈ Rm×Rn | y ∈ T(x)} is closed. ⊓⊔


Saying that the graph is closed means that whenever {(xk,yk)} is a sequence in Γ that converges to (x,y), then (x,y) ∈ Γ. If C is a closed convex subset of Rn then the normal-cone operator NC(x) is a multifunction from Rn to Rn that is always closed, and this is often useful in proofs. On the other hand, TC(x) is in general not closed.

Definition 3.3.4. If T is a multifunction from Rm to Rn, then the inverse of T is the multifunction T⁻¹ whose graph is the set of all pairs (y,x) ∈ Rn×Rm such that (x,y) belongs to the graph of T. ⊓⊔

Proposition 3.3.5. If K is a nonempty closed convex cone in Rn, then NK⁻¹ = NK°.

Proof. By Exercise 3.3.13, (x,x∗) belongs to the graph of NK if and only if x ∈ K, x∗ ∈ K°, and ⟨x∗,x⟩ = 0. Apply the exercise to K°, using the fact that K°° = K, to conclude that these properties hold if and only if (x∗,x) belongs to the graph of NK°.

⊓⊔

The following proposition develops two additional expressions for TC(x), one or the other of which will frequently be more useful than the definition.

Proposition 3.3.6. Let C be a nonempty convex subset of Rn, and let x be a point of C. Then

cl cone(C−x) = TC(x) = {z | for τ > 0, dC(x+τz) = o(τ)}, (3.13)

and

ri TC(x) = cone[(riC)−x]. (3.14)

Proof. NC(x) is the set of all y∗ that make a nonpositive inner product with c−x whenever c ∈ C. Hence NC(x) = [cone(C−x)]°, so by using Corollary 3.1.9 and the definition of TC(x) we find that

TC(x) = NC(x)° = [cone(C−x)]°° = cl cone(C−x).

The convex-hull operator and the union with the origin were not necessary because C contains x and the convexity of C implies that of cone(C−x). This proves the first equality.

For the second equality, suppose first that z ∈ TC(x); we have just shown that then z ∈ cl cone(C−x), so for any positive ε there exist c ∈ C and µ > 0 such that ∥z−µ⁻¹(c−x)∥ < ε. Rewriting this as ∥c−(x+µz)∥ < µε, we see that dC(x+µz) < µε. Now let τ ∈ (0,µ) and define λ = τ/µ ∈ (0,1). As dC(x) = 0 by choice of x, we have

τ⁻¹dC(x+τz) = τ⁻¹dC[(1−λ)x+λ(x+µz)]
            ≤ τ⁻¹[(1−λ)dC(x)+λdC(x+µz)]
            = τ⁻¹λdC(x+µz)
            = µ⁻¹dC(x+µz) < ε,


which shows that dC(x+τz) = o(τ), so that TC(x) is contained in the set on the right in (3.13).

Next, suppose z belongs to that set, and find positive numbers τn converging to zero and points cn ∈ C such that cn = x+τnz+rn and ∥τn⁻¹rn∥ converges to zero. Then z = τn⁻¹(cn−x)−τn⁻¹rn. The first term on the right belongs to cone(C−x) and the second converges to zero, so z ∈ cl cone(C−x) and therefore, by what we have already proved, z ∈ TC(x).

For (3.14), use what we have already proved to write

ri TC(x) = ri cl cone(C−x) = ri cone(C−x) = cone ri(C−x) = cone[(riC)−x],

where we used Theorem 3.1.7 as well as Proposition 1.2.6 and Corollary 1.2.8. ⊓⊔

Proper separation can also give us some insight into the structure of the normal cone, as the next theorem shows.

Theorem 3.3.7. Let C be a convex subset of Rn, and let x be a point of C. Then:

a. (parC)⊥ = NC(x)∩[−NC(x)];
b. NC(x) = (parC)⊥ if and only if x ∈ riC;
c. NC(x) = {0} if and only if x ∈ intC.

In particular, NC(x) is a subspace if and only if x ∈ riC.

Proof. Write L for parC. If x∗ ∈ L⊥ then as c− x ∈ L for each c ∈C, we have

⟨x∗,c− x⟩= 0. (3.15)

Therefore both x∗ and −x∗ belong to NC(x), so L⊥ ⊂ NC(x)∩[−NC(x)]. Conversely, if x∗ ∈ NC(x)∩[−NC(x)] then (3.15) holds for each c ∈ C, so x∗ is orthogonal to L, which is aff(C−x). Therefore NC(x)∩[−NC(x)] ⊂ L⊥, which proves (a).

For (b), suppose first that x ∈ riC and let x∗ ∈ NC(x). Write x∗ = u∗+v∗ with u∗ ∈ L and v∗ ∈ L⊥. For small positive ε we have x+εu∗ ∈ C, and therefore

0 ≥ ⟨x∗,(x+εu∗)−x⟩ = ⟨u∗+v∗,εu∗⟩ = ε∥u∗∥²,

implying that u∗ = 0 and therefore x∗ ∈ L⊥. This shows that NC(x) ⊂ L⊥, but as L⊥ ⊂ NC(x) by part (a) we actually have NC(x) = L⊥.

Next suppose x ∉ riC: then by Theorem 2.2.11 we can separate x properly from C. This means there is a y∗ such that for each c ∈ C, ⟨y∗,c⟩ ≤ ⟨y∗,x⟩ (so that ⟨y∗,c−x⟩ ≤ 0 and therefore y∗ ∈ NC(x)), but for some c̄ ∈ C, ⟨y∗,c̄⟩ < ⟨y∗,x⟩. Then ⟨−y∗,c̄−x⟩ > 0, so −y∗ ∉ L⊥. Therefore NC(x) ≠ L⊥, which proves (b).

To prove (c), note that if x ∈ intC then by (b) we have

NC(x) = L⊥ = (Rn)⊥ = {0}.

On the other hand, if x ∉ intC we have two possibilities: first, if dimC < n then by (a) NC(x) must contain the nonzero subspace L⊥; second, if dimC = n then x ∉ riC,


so by (b) NC(x) contains an element that is not in L⊥ = {0}. In either case NC(x) cannot be {0}.

For the final claim, if x ∈ riC then (b) shows that NC(x) is the subspace (parC)⊥. On the other hand, if NC(x) is a subspace then NC(x) = −NC(x) = NC(x)∩[−NC(x)]. In that case (a) shows that NC(x) = (parC)⊥, and then from (b) we have x ∈ riC. ⊓⊔

The next theorem has a great many applications in optimization, because constraints in optimization problems often involve the intersection of sets.

Theorem 3.3.8. Let C1, . . . ,Ck be nonempty convex subsets of Rn, let C be their intersection and suppose x ∈ C. If the sets riCi for i = 1, . . . ,k have a point in common, then

TC(x) = TC1(x) ∩ ⋯ ∩ TCk(x) (3.16)

and

NC(x) = NC1(x) + ⋯ + NCk(x), (3.17)

the sum on the right then being closed.

Proof. For the tangent cone, we have

TC(x) = cl cone(C−x)
      = cl cone[(C1 ∩ ⋯ ∩ Ck)−x]
      = cl cone[(C1−x) ∩ ⋯ ∩ (Ck−x)]
      = cl[cone(C1−x) ∩ ⋯ ∩ cone(Ck−x)]
      = cl cone(C1−x) ∩ ⋯ ∩ cl cone(Ck−x)
      = TC1(x) ∩ ⋯ ∩ TCk(x),

where the fourth equality uses Exercise 3.1.19 and the fifth uses Proposition 1.2.12 and the fact that ri(Ci−x) = (riCi)−x, so that the relative interiors of the sets Ci−x have a point in common.

The form of ri TC(x) shown in Proposition 3.3.6, together with the hypothesis, shows that the cones ri TCi(x) for i = 1, . . . ,k have a point in common. Using (3.12) and Corollary 3.2.7, we then find that

NC(x) = TC(x)° = [TC1(x) ∩ ⋯ ∩ TCk(x)]° = TC1(x)° + ⋯ + TCk(x)° = NC1(x) + ⋯ + NCk(x),

and that the sum on the right is closed. ⊓⊔

The formulas in (3.16) and (3.17) do not hold in general without the regularity condition, even if the sets involved are all closed: for an example, take C1 = {(x1,x2) | x2 ≥ (1/2)x1²} and C2 = −C1. They do, however, hold without that condition if all of the sets involved are polyhedral, as we will see later.


The normal cone is often used to express optimality conditions, as in the following proposition. We say a point x in a subset S of Rn is a local minimizer of a real-valued function f defined on some neighborhood N of x in S if there is a neighborhood N′ ⊂ N of x in S such that for each x′ ∈ N′, f(x) ≤ f(x′).

Proposition 3.3.9. Suppose that C is a nonempty convex subset of Rn, that x ∈ C, and that f is a real-valued function on a neighborhood N of x. Assume that f has a Fréchet derivative d f(x) at x. If x is a local minimizer of f |C, then

0 ∈ d f (x)+NC(x). (3.18)

Proof. Let c be any point of C and for τ ∈ (0,1) define cτ = (1−τ)x+τc; then cτ ∈ C by convexity. For all small positive values of τ the point cτ belongs to N, and for such τ we have

0 ≤ f (cτ)− f (x) = d f (x)(cτ − x)+o(τ) = τd f (x)(c− x)+o(τ).

Dividing by τ and letting τ ↓ 0, we find that d f(x)(c−x) ≥ 0. As c was arbitrary in C, we have (3.18). ⊓⊔
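Condition (3.18) can be checked concretely. In the sketch below (hypothetical data), f is a squared distance over the unit square, so the minimizer of f |C is a projection and −d f(x) should be normal to C at x:

```python
# f(x) = ||x - p||^2 over C = [0,1]^2 with p = (2, 0.5): the minimizer of
# f|C is the projection of p onto C, and -df(x) = 2(p - x) must lie in N_C(x).
verts = [(0, 0), (1, 0), (0, 1), (1, 1)]
p = (2.0, 0.5)
x = (min(max(p[0], 0.0), 1.0), min(max(p[1], 0.0), 1.0))  # projection onto C

grad = (2 * (x[0] - p[0]), 2 * (x[1] - p[1]))
neg_grad = (-grad[0], -grad[1])

# -df(x) in N_C(x) means <-df(x), c - x> <= 0 for all c in C; checking the
# vertices suffices because C is their convex hull.
in_normal_cone = all((v[0]-x[0])*neg_grad[0] + (v[1]-x[1])*neg_grad[1] <= 1e-12
                     for v in verts)
print(x, in_normal_cone)
```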

We can combine this basic optimality condition with Theorem 3.3.8 to obtain a necessary optimality condition that is very important in practice.

Corollary 3.3.10. Let C1, . . . ,Ck be nonempty convex subsets of Rn whose relative interiors have a point in common, and let C = C1 ∩ ⋯ ∩ Ck. Let x be a point of C and f be a real-valued function on a neighborhood N of x. Assume that f has a Fréchet derivative d f(x) at x. If x is a local minimizer of f |C, then

0 ∈ d f(x) + NC1(x) + ⋯ + NCk(x). (3.19)

Proof. Use Proposition 3.3.9 together with Theorem 3.3.8. ⊓⊔

Many results about normal cones also hold for nonconvex sets, with a suitable extension of the definition of normal cone and, often, with additional restrictions. An exposition for the finite-dimensional case is in [41, Chapter 6].

3.3.1 Exercises for Section 3.3

Exercise 3.3.11. Let C be a nonempty closed convex subset of Rn. Show that the normal-cone operator NC is always closed. Give an example in R to show that the tangent-cone operator TC may not be closed.

Exercise 3.3.12. If C is a nonempty closed convex set in Rn and x and y are two points of Rn, show that the following are equivalent: (i) x is the projection of y on C; (ii) for each positive τ, x is the projection on C of x+τ(y−x); (iii) y−x ∈ NC(x).


Exercise 3.3.13. Let K be a nonempty convex cone in Rn and suppose k ∈ K. Show that NK(k) = {k∗ ∈ K° | ⟨k∗,k⟩ = 0}.

Exercise 3.3.14. Let C be a nonempty closed convex set in Rn. Show that each point x of Rn has a unique decomposition x = c+n where c ∈ C and n ∈ NC(c). State the particular forms this decomposition takes when (a) C is a cone, (b) C is a subspace.

Exercise 3.3.15. Let C be a convex set in Rn, and let S be a nonempty relatively open convex subset of C. Show that the normal-cone operator NC( ·) is constant on S: that is, show that for any x and y in S, NC(x) = NC(y).

3.4 Notes and references

The Farkas lemma was apparently first published in 1896 [8], but with an incomplete proof. The first publication with a correct proof seems to have been in 1898 [9] in Hungarian. A paper of 1902 in German [10] is often cited. The names on the latter two papers appear to be different because the author’s given name Gyula is used for the first, but the translation Julius for the second. The Gordan theorem appeared in [15]. Much more information on the history of the mathematics underlying optimization is in the excellent survey paper of Prékopa [31].


Chapter 4
Structure

This chapter concentrates on the structure of convex sets. Section 4.1 develops a way to describe how a convex set is unbounded, rather than just saying that it is unbounded. Then Section 4.2 shows how we can break up a convex set into pieces, called faces of the set, that have important geometric properties especially useful in the study of variational problems. Finally, Section 4.3 applies results from the earlier sections to identify a set of minimal structural elements that combine to make up a convex set.

4.1 Recession

Even for unbounded sets, convexity has important structural consequences. This section develops a usable definition of “directions in which a set is unbounded,” and applies that information to find out more about the set’s behavior. The key tool for this purpose is the recession cone, developed in the next section. This cone then allows us to make a very useful technical construction to homogenize a set. That construction exposes important structural properties of sets, and we will use it extensively.

4.1.1 The recession cone

Recession directions are directions in which we can move without leaving a set.Here is the definition.

Definition 4.1.1. Let S be a nonempty subset of Rn. An element w ∈ Rn is a recession direction of S if S+w ⊂ S. We write rc S for the set of all recession directions of S. ⊓⊔


The definition implies that the origin is a recession direction of any nonempty set. Also, if w is a recession direction of S then for any s ∈ S we have s+w ∈ S, which implies that S contains (s+w)+w = s+2w, and therefore s+kw ∈ S for each nonnegative integer k. One can also show from the definition that the set of recession directions of a product S×S′ is the product of the sets of recession directions of S and of S′ respectively.

Suppose now that s ∈ cl S and w ∈ cl rc S. Then for any neighborhoods N of s and Q of w we can find points s′ ∈ N∩S and w′ ∈ Q∩rc S. Then s′+w′ ∈ S, implying that s+w ∈ cl S and therefore w ∈ rc cl S. It follows that cl rc S ⊂ rc cl S, and if S is closed then

rc cl S ⊂ cl rc cl S = cl rc S,

so that equality holds.

In general, the set of recession directions of S need not have any special properties

such as convexity or positive homogeneity: for example, the set of integers coincides with its set of recession directions. We can produce more satisfactory behavior by assuming that S is convex.

Theorem 4.1.2. Let C be a nonempty convex subset of Rn. The set rc C of recession directions of C is then a nonempty convex cone containing the origin; it is closed if C is closed. Further, of the following statements (a) is equivalent to (b), (b) implies (c), and if C is closed then all three statements are equivalent.

a. w ∈ rc C.
b. For each c ∈ C, c+wR+ ⊂ C.
c. There exists c ∈ C with c+wR+ ⊂ C.

Proof. We have already noted that the origin must be a recession direction of any nonempty set, so rc C is nonempty.

To show that (a) and (b) are equivalent, let w ∈ rc C and choose any c ∈ C. Then c+kw ∈ C for each nonnegative integer k, and convexity of C implies that c+wR+ ⊂ C. On the other hand, if (b) holds then as 1 ∈ R+ we also have (a). It is obvious that (b) implies (c).

The fact that c+wR+ ⊂ C for any c ∈ C and any recession direction w shows that rc C is a cone. It is also convex, as we can see by choosing recession directions w and w′ and µ ∈ [0,1]; then for c ∈ C each of c+w and c+w′ is in C, so by convexity we have

c+[(1−µ)w+µw′] = (1−µ)(c+w)+µ(c+w′) ∈C,

so that (1−µ)w+µw′ ∈ rc C.

Now assume C is closed. If v ∉ rc C then there is some x ∈ C with x+v ∉ C. Then

there is a neighborhood Q of v such that x+Q does not meet C, so that Q does not meet rc C, and this shows that rc C is closed.

Suppose that c ∈ C and let c+wR+ ⊂ C. If w = 0 then w ∈ rc C, so assume w ≠ 0. Choose any c′ ∈ C and let α ≥ 0. For k = 1,2, . . . let εk be positive numbers converging to zero and define

ck = (1−εk)c′ + εk(c+αεk⁻¹w).


Then ck ∈ C because C is convex. But the ck converge to c′+αw, which must then be in C because C is closed. As α was any nonnegative number, we have c′+wR+ ⊂ C, so that (c) is equivalent to (a) and to (b). ⊓⊔
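Statement (c) of Theorem 4.1.2 gives a practical membership test for rc C when C is closed: follow one halfline. The sketch below samples such halflines for the closed convex set C = {(x,y) : y ≥ x²} (hypothetical data, finitely many τ values, so this is only a sanity check, not a proof):

```python
# rc C should be the vertical ray {(0, t) : t >= 0}.
def in_C(p):
    return p[1] >= p[0] ** 2

c = (1.0, 1.0)  # a point of C

def stays_in_C(w, taus=(1.0, 10.0, 100.0, 1000.0)):
    """Sample the halfline c + w*R+ at a few parameter values."""
    return all(in_C((c[0] + t*w[0], c[1] + t*w[1])) for t in taus)

print(stays_in_C((0.0, 1.0)))   # recession direction
print(stays_in_C((1.0, 1.0)))   # escapes: quadratic growth beats linear
```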

An immediate consequence of Theorem 4.1.2 is that if C and C′ are two nonempty closed convex sets with C ⊂ C′, then rc C ⊂ rc C′. If the sets are not closed, then it is possible for rc C to contain rc C′ properly.

Remark 4.1.3. If C is convex and if c ∈ riC and w ∈ rc C, then the halfline c+wR+ also lies in riC. To see this, recall that we have already shown that c+wR+ ⊂ C. Choose any λ > 0 and let µ > λ. We have c ∈ riC and c+µw ∈ C, and c+λw lies in the relative interior of the segment [c, c+µw]. But then by Theorem 1.2.4, c+λw ∈ riC. ⊓⊔

By combining the technique used in this remark with Proposition 1.2.6, one can show that rc riC = rc clC (Exercise 4.1.22).

Theorem 4.1.2 has other very useful consequences. The next two corollaries illustrate some of these.

Corollary 4.1.4. Let L be a linear transformation from Rn to Rm, and let C be a convex subset of Rm with L⁻¹(C) ≠ ∅. Then L⁻¹(rc C) ⊂ rc L⁻¹(C), and if C is closed then L⁻¹(rc C) = rc L⁻¹(C).

Proof. Suppose x ∈ L⁻¹(rc C); then v := Lx ∈ rc C. If w is any point of L⁻¹(C), then c := Lw ∈ C. Therefore the halfline c+vR+ is contained in C. But

c+ vR+ = L(w+ xR+),

so w+xR+ ⊂ L⁻¹(C). As w was any point of L⁻¹(C), we have x ∈ rc L⁻¹(C) and therefore L⁻¹(rc C) ⊂ rc L⁻¹(C).

Now suppose that C is closed and let v ∈ rc L⁻¹(C); we will show that Lv ∈ rc C, so that v ∈ L⁻¹(rc C) and therefore rc L⁻¹(C) ⊂ L⁻¹(rc C). We assumed that L⁻¹(C) was nonempty; let x belong to it, so that y = Lx ∈ C. Then x+vR+ ⊂ L⁻¹(C), which implies that y+(Lv)R+ ⊂ C. Apply Theorem 4.1.2 to conclude that Lv ∈ rc C. ⊓⊔

The next corollary shows the form of recession cones of intersections.

Corollary 4.1.5. Let {Cα | α ∈ A} be a nonempty collection of convex subsets of Rn with ∩α∈A Cα ≠ ∅. Then ∩α∈A rc Cα ⊂ rc ∩α∈A Cα. If each Cα is closed, then

∩α∈A rc Cα = rc ∩α∈A Cα. (4.1)

Proof. Let w ∈ ∩α∈A rc Cα, and let x be any point of ∩α∈A Cα. For each α we have x+w ∈ Cα, so x+w ∈ ∩α∈A Cα and therefore w ∈ rc ∩α∈A Cα.

If each Cα is closed then let w ∈ rc ∩α∈A Cα. Choose some c ∈ ∩α∈A Cα; then ∩α∈A Cα contains the halfline c+wR+. But that means that this halfline is in each of the Cα, and by applying Theorem 4.1.2 we conclude that w ∈ ∩α∈A rc Cα, so that (4.1) holds. ⊓⊔


We can express the lineality space of a convex set in terms of its recession directions.

Proposition 4.1.6. If C is a nonempty convex subset of Rn, then linC = (rc C)∩(−rc C).

Proof. Let L = (rc C)∩(−rc C); then if v and v′ belong to L and γ and γ′ are real numbers, then each of γv and γ′v′ belongs to both rc C and −rc C. For any convex cone K we have K = K+K, so γv+γ′v′ ∈ L. Therefore L is a subspace of Rn.

If x ∈ L and c ∈ C then c+x ∈ C, so C+L ⊂ C. As L contains the origin, we actually have C+L = C. To complete the proof we let K be any subspace of Rn with C+K = C, and show K ⊂ L. If k ∈ K then C+k ⊂ C, so k ∈ rc C. As −k ∈ K, a similar argument shows that k ∈ −rc C; therefore k ∈ (rc C)∩(−rc C) = L. Accordingly, K ⊂ L, so L = linC. ⊓⊔

If C is closed and contains a line, that line must be of the form c+vR for some nonzero v. As then both v and −v belong to rc C by Theorem 4.1.2, we have v ∈ linC. Conversely, if v ∈ linC then for each c ∈ C the line c+vR is contained in C. Suppose that for nonzero z we say that a line x+zR in Rn has direction z. Then each nonzero element of linC indicates the direction of some line contained in C, and conversely the direction of every line contained in C belongs to linC. The latter fact need not be true if C is not closed.

Here is an alternate description of rc C that is valid when C is closed and convex. It shows how we can construct rc C from elements of C, and provides the justification for the notation 0⁺C used for rc C in, e.g., [39].

Proposition 4.1.7. Let C be a nonempty closed convex subset of Rn. Then rc C is the set of all limits of sequences of the form {αk ck} in which αk ↓ 0 and ck ∈ C.

Proof. Let Q be the set of limits described in the proposition. If w ∈ rc C then let c ∈ C; choose αk ↓ 0 and let ck = c+αk⁻¹w. These ck belong to C by hypothesis, and αk ck converges to w, so rc C ⊂ Q.

If w ∈ Q then there are elements ck ∈ C and αk ↓ 0 with αkck converging to w. The numbers αk are eventually in (0,1]; then if c is any point of C we have (1−αk)c + αkck ∈ C by convexity. Taking the limit and recalling that C is closed, we find that c + w ∈ C, so that w ∈ rcC and therefore Q ⊂ rcC. ⊓⊔
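Proposition 4.1.7 can be checked numerically. The sketch below is my own illustration, not from the text: it uses the closed convex set C = {(x1,x2) | x2 ≥ x1²}, whose recession cone is {0} × R+. The points ck = (k,k²) lie in C, and with αk = 1/k² the products αkck = (1/k, 1) converge to the recession direction (0,1).

```python
# Sketch (my own illustration) of Proposition 4.1.7 for the closed convex set
# C = {(x1, x2) : x2 >= x1**2}, whose recession cone is {0} x R+.

def in_C(x1, x2, tol=1e-12):
    """Membership test for C = {(x1, x2) : x2 >= x1**2}."""
    return x2 >= x1 ** 2 - tol

def scaled_point(k):
    """Return alpha_k * c_k for c_k = (k, k**2) in C and alpha_k = 1/k**2."""
    ck = (float(k), float(k) ** 2)
    assert in_C(*ck)
    ak = 1.0 / k ** 2
    return (ak * ck[0], ak * ck[1])

# alpha_k * c_k = (1/k, 1) converges to (0, 1), a direction in rc C.
print(scaled_point(1000))
```

Any other recession direction of this set can be reached the same way by tilting the boundary sequence, which is exactly the content of the proposition.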

The following corollary provides a very useful test for boundedness.

Corollary 4.1.8. Let C be a nonempty convex subset of Rn. For C to be bounded it is necessary that rcC = {0}, and if C is closed then this condition is also sufficient.

Proof. If c ∈ C and w ∈ rcC then Theorem 4.1.2 says that c + wR+ ⊂ C. If C is bounded then w must be zero, which proves necessity. For sufficiency, suppose that C is closed and unbounded. Let ck ∈ C and suppose that the sequence ∥ck∥ converges to +∞. For large k let wk = ck/∥ck∥; the norms of the wk are all 1, so a subsequence converges to some w0 with norm 1. Proposition 4.1.7 then shows that w0 ∈ rcC, so that rcC ≠ {0}. ⊓⊔

The hypothesis that C be closed is really needed here (Exercise 4.1.23).


4.1.2 The homogenizing cone

An application of recession in the convex case yields a homogenization construction, detailed in the following theorem, that provides the foundation for some key results in convex analysis. These include the fundamental conjugacy relation for convex functions.

Theorem 4.1.9. Let Q be a convex cone contained in Rn × R− but not in Rn × {0}. Define

C := {x ∈ Rn | (x,−1) ∈ Q}. (4.2)

Then C is nonempty and one has

riQ = cone[(riC) × {−1}], (4.3)

and

clQ = cone[(clC) × {−1}] ∪ [(rc clC) × {0}]. (4.4)

Proof. Throughout the proof we write A(ξ) for the hyperplane Rn × {−ξ}. By hypothesis there is some (x,−ξ) ∈ Q with ξ > 0. Then (ξ⁻¹x,−1) ∈ Q, so ξ⁻¹x belongs to C, which must therefore be nonempty. We have riQ = ri coneQ = cone riQ, so riQ is a convex cone. By Corollary 1.2.3, it cannot meet A(0) because Q ⊄ A(0); hence each point of riQ is of the form (x,−ξ) with ξ > 0. Therefore riQ consists of all positive scalar multiples of points in (riQ) ∩ A(1). But by Proposition 1.2.12,

(riQ) ∩ A(1) = ri[Q ∩ A(1)] = ri(C × {−1}) = (riC) × {−1},

so we have the formula in (4.3).

If ξ > 0 then A(ξ) meets riQ, so that

(clQ) ∩ A(ξ) = cl[Q ∩ A(ξ)] = cl{ξ[Q ∩ A(1)]} = ξ cl[C × {−1}] = ξ[(clC) × {−1}].

On the other hand, any point of (clQ) ∩ A(0) has the form (x,0) and is a limit of some sequence (xk,−ξk) in riQ in which ξk ↓ 0. We can write xk = ξk(ξk⁻¹xk), and the definition of C implies that ξk⁻¹xk ∈ C. As ξk converges to zero, the set of such x is contained in rc clC by Proposition 4.1.7. To show that each element x of rc clC can be obtained in this way, let c ∈ riC = ri clC; then for each positive integer k, c + kx ∈ ri clC = riC (Remark 4.1.3). It follows from (4.3) that (k⁻¹c + x,−k⁻¹) ∈ riQ, and by taking the limit we find that (x,0) ∈ clQ. Therefore (clQ) ∩ A(0) = (rc clC) × {0}, which establishes (4.4). ⊓⊔

In Theorem 4.1.9 we started with Q and derived C from it, because this allows us to consider a slightly more general situation. However, the usual procedure in using this construction is to start with C and then use the following definition.

Definition 4.1.10. Let S be a subset of Rn, and define a cone HS ⊂ Rn+1 by


HS = cone(S × {−1}) = cone{(s,−1) | s ∈ S}. (4.5)

This HS is the homogenizing cone of S. ⊓⊔

The homogenizing cone will be useful here mainly when the set S is a nonempty convex set C. Then the homogenizing cone HC fits the requirements of Theorem 4.1.9, and C satisfies (4.2), so the theorem applies.

There are important differences between the homogenizing cone HC and the cone, written coneC, that is generated by C. For one thing, they lie in different spaces, HC in Rn+1 and coneC in Rn. For another, they have very different structure. In particular, coneC generally loses much of the information that was contained in C. For example, if C = [1,2] then coneC = intR+, so one has no idea what the endpoints of the original interval were. On the other hand, one can always recover C from HC, so the homogenization loses no information. This is one of the reasons why the homogenization construction is often a much more useful tool than is coneC.
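The information-loss contrast can be made concrete. In the sketch below (an assumed illustration for C = [1,2], matching the example in the text), membership in HC = cone(C × {−1}) is tested directly; slicing HC at ξ = −1 recovers C exactly, while coneC = intR+ retains only positivity.

```python
# Sketch (assumed illustration for C = [1, 2]): membership in the
# homogenizing cone HC = cone(C x {-1}) = {t*(s, -1) : t > 0, s in [1, 2]},
# versus membership in cone C = int R+, which forgets the endpoints.

def in_HC(x, xi):
    """(x, xi) lies in HC exactly when xi < 0 and x/(-xi) lies in [1, 2]."""
    return xi < 0 and 1.0 <= x / (-xi) <= 2.0

def in_coneC(x):
    """cone C = int R+ when C = [1, 2]."""
    return x > 0

# Slicing HC at xi = -1 recovers C = [1, 2] exactly:
assert in_HC(1.0, -1.0) and in_HC(1.5, -1.0) and in_HC(2.0, -1.0)
assert not in_HC(0.99, -1.0) and not in_HC(2.01, -1.0)
# HC is a cone: scaling by t > 0 preserves membership (3/2 lies in [1, 2]).
assert in_HC(3.0, -2.0)
# cone C cannot distinguish [1, 2] from any other interval of positive reals.
assert in_coneC(0.5) and in_coneC(100.0)
```

The extra coordinate is what carries the "shape" of C up into the cone; dropping it, as coneC does, collapses all intervals of positive numbers to the same ray.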

The intersection of HC with the hyperplane Rn × {−1} is the set C × {−1}. If we look at the intersection of (HC)° with the hyperplane Rn × {1}, a familiar set appears.

Proposition 4.1.11. If C is a nonempty convex subset of Rn, then (HC)° ∩ (Rn × {1}) = C° × {1}.

Proof. (⊂) Suppose (x∗,1) ∈ (HC)° ∩ (Rn × {1}). If x ∈ C then (x,−1) ∈ HC, so

0 ≥ ⟨(x∗,1),(x,−1)⟩ = ⟨x∗,x⟩ − 1.

Therefore x∗ ∈ C°, so (x∗,1) ∈ C° × {1}.

(⊃) If x∗ ∈ C° and (h,−η) ∈ HC then η > 0 and (η⁻¹h,−1) ∈ HC, so that η⁻¹h ∈ C. We have

0 ≥ η[⟨x∗,η⁻¹h⟩ − 1] = ⟨(x∗,1),(h,−η)⟩,

and therefore (x∗,1) ∈ (HC)° ∩ (Rn × {1}). ⊓⊔

4.1.3 When are linear images of convex sets closed?

Next we show how one can use the recession cone to derive a closedness condition for (forward) linear images of sets. The proof provides an illustration of how to use Corollary 4.1.8.

In order to state the main theorem, we need to develop three contributing results. The first of these defines a new cone closely related to the recession cone and develops some of its properties. The second introduces a key condition involving a convex set and a linear operator, and applies Proposition 3.2.3 to derive some of its equivalent forms. The third applies that condition to extend Proposition 3.2.4 from closed convex cones to arbitrary closed convex sets.


Definition 4.1.12. Let S be a nonempty subset of Rn. The barrier cone of S, written bcS, is the set of all x∗ ∈ Rn for which sup_{s∈S}⟨x∗,s⟩ < +∞. ⊓⊔

Direct use of the definition shows that bcS is a convex cone containing the origin. However, it need not be closed: if S is the set of points (x1,x2) in R2 satisfying x2 ≥ x1², then bcS is the union of the origin and the open lower halfplane. By using Theorem 4.1.9 we can develop the relationship between the recession and barrier cones of a convex set.
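The parabola example can be checked numerically. The sketch below is my own check, not from the text: it approximates sup_{s∈S}⟨x∗,s⟩ for S = {(x1,x2) | x2 ≥ x1²} by sampling boundary points. For x∗ = (a,b) with b < 0 the supremum is finite and equals −a²/(4b), attained on the boundary x2 = x1², while for b > 0 the sampled values grow without bound.

```python
# Sketch (my own numeric check): for S = {(x1, x2) : x2 >= x1**2}, the value
# sup over S of <x*, s> is finite exactly when x* = (0, 0) or x* = (a, b)
# with b < 0; in the latter case the sup is -a**2/(4*b), attained on the
# boundary curve x2 = x1**2.

def sampled_sup(a, b, half_width=1000.0, steps=200001):
    """Crude supremum of a*t + b*t**2 over boundary points (t, t**2) of S."""
    best = float("-inf")
    for i in range(steps):
        t = -half_width + 2.0 * half_width * i / (steps - 1)
        best = max(best, a * t + b * t * t)
    return best

# b < 0: finite supremum matching the closed form -a**2/(4*b) = 1.125.
a, b = 3.0, -2.0
assert abs(sampled_sup(a, b) - (-a * a / (4.0 * b))) < 1e-4
# b > 0: the sampled values blow up, so (0, 1) is not in bc S.
assert sampled_sup(0.0, 1.0) > 1e5
```

Sampling only boundary points suffices here because for b < 0 the supremum over S is attained on the boundary, and for b > 0 the boundary already witnesses unboundedness.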

Corollary 4.1.13. Let C be a nonempty convex subset of Rn. Then (bcC)° = rc clC.

Proof. Construct from C the cone HC of Theorem 4.1.9. A point w∗ belongs to bcC exactly when there is some finite number ϕ such that for each c ∈ C, ⟨w∗,c⟩ ≤ ϕ. We can rewrite this inequality as ⟨(w∗,ϕ),(c,−1)⟩ ≤ 0, and this shows that w∗ belongs to bcC if and only if there is some finite ϕ with (w∗,ϕ) ∈ (HC)°. If we define a linear transformation L from Rn+1 to Rn by L(x,ξ) = x, then bcC = L((HC)°). Now Proposition 3.2.1 and Corollary 3.1.9 yield (bcC)° = (L∗)⁻¹(clHC), which means that (bcC)° is the set of w ∈ Rn such that (w,0) ∈ clHC. By Theorem 4.1.9, this set is rc clC. ⊓⊔

Next, we introduce a key condition and use Proposition 3.2.3 to derive some equivalent forms in which we can express it.

Definition 4.1.14. Suppose that C is a nonempty convex set in Rn and L is a linear transformation from Rn to Rm. We say that C satisfies the recession condition for L if C is closed and

(kerL)∩ (rcC)⊂ linC. (4.6)

⊓⊔

Here are some equivalent forms of the recession condition.

Proposition 4.1.15. Let C be a nonempty closed convex subset of Rn, and L be a linear transformation from Rn to Rm. The following three conditions are equivalent:

a. (kerL) ∩ (rcC) ⊂ linC.
b. The intersection on the left in (a) is a subspace.
c. imL∗ ∩ ri bcC ≠ ∅.

Proof. Denote by K the set on the left in (a). If (a) holds and if x belongs to K, then x must belong to −rcC also, and therefore −x ∈ K. As K is a convex cone, it must be a subspace, so (b) holds. Now suppose (b) holds. As K is a subspace, if x ∈ K then −x ∈ K also, which in particular implies that x ∈ −rcC and therefore x ∈ linC, so (a) holds. Therefore (a) and (b) are equivalent; the equivalence of (b) and (c) follows from Proposition 3.2.3 together with the facts that rcC = (bcC)° and that (kerL)⊥ = imL∗. ⊓⊔

Next, we use the recession condition to extend Proposition 3.2.4 from closed convex cones to arbitrary closed convex sets.


Theorem 4.1.16. Let L be a linear transformation from Rn to Rm, let C be a convex subset of Rn that satisfies the recession condition for L, and let V := (kerL) ∩ (rcC). Then

a. L(C) = L(C ∩ V⊥);
b. There are nonnegative real numbers α and β such that for each y ∈ L(C) there is x ∈ C ∩ V⊥ such that

Lx = y, ∥x∥ ≤ max{α, β∥y∥}. (4.7)

In particular, if L(C) is bounded then we may take β = 0.

Proof. If C is empty there is nothing to prove. Otherwise, by hypothesis V is a subspace of linC. If we define D := C ∩ V⊥ then L(D) ⊂ L(C). By Proposition A.6.22 we have C = V + D, so if c ∈ C then c = v + d with v ∈ V and d ∈ D. Then Lc = Ld and therefore L(C) = L(D), which proves (a).

For (b), let ρ be a positive number such that B(0,ρ) meets L(D). In the special case in which L(C) is bounded, we take ρ large enough so that B(0,ρ) ⊃ L(D). Let T := D ∩ L⁻¹[B(0,ρ)]. Corollary 4.1.5 shows that

rcT = rc{C ∩ V⊥ ∩ L⁻¹[B(0,ρ)]} = (rcC) ∩ V⊥ ∩ rcL⁻¹[B(0,ρ)],

and Corollaries 4.1.4 and 4.1.8 yield

rcL⁻¹[B(0,ρ)] = L⁻¹{rcB(0,ρ)} = L⁻¹(0) = kerL,

so that

rcT = V ∩ V⊥ = {0}.

Then Corollary 4.1.8 shows that T is bounded, so we can choose α with T ⊂ B(0,α). Then for any y ∈ L(C) ∩ B(0,ρ) there is x ∈ D with Lx = y and ∥x∥ ≤ α.

If L(C) ⊂ B(0,ρ), we set β = 0 and are finished; this includes the case in which L(C) is bounded. Otherwise, define a set S of real numbers by

S := {∥x∥∥y∥⁻¹ | x ∈ D, ∥y∥ > ρ, Lx = y}.

The set S is nonempty because L(C) = L(D) and we know L(C) ⊄ B(0,ρ). If it had no upper bound then there would be sequences {yk} ⊂ L(C)\B(0,ρ) and xk ∈ D ∩ L⁻¹(yk) such that for each k, ∥xk∥∥yk∥⁻¹ > k. Then ∥xk∥ ≥ k∥yk∥ ≥ kρ, so that the norms of the xk converge to +∞. By passing to a subsequence if needed, we can arrange that the elements zk := xk/∥xk∥ converge to some z0 having norm 1. By Proposition 4.1.7, z0 ∈ rcD. But for each k we have

∥Lzk∥ = ∥yk∥∥xk∥⁻¹ < k⁻¹,

so that by continuity Lz0 = 0. Then

z0 ∈ (kerL) ∩ rcD = V ∩ V⊥ = {0},


contradicting ∥z0∥ = 1. Therefore S has an upper bound β, so that whenever y ∈ L(C) with ∥y∥ > ρ there is x ∈ D with Lx = y and ∥x∥ ≤ β∥y∥.

By combining the two cases we have considered, we find that whenever y ∈ L(C) there is some x ∈ D with Lx = y and ∥x∥ ≤ max{α, β∥y∥}, which proves (b). ⊓⊔

The bound in Theorem 4.1.16 differs in form from that in Proposition 3.2.4 because for small values of ∥y∥ it is not positively homogeneous. To see why this is necessary when C is not a cone, let C be the closed line segment [(1,0),(1,1)] in R2 and L be the linear map from R2 to R whose matrix in the standard basis is [0,1]. The procedure in the first part of the proof could choose any positive ρ, and for small ρ > 0 we could choose α = ∥(1,ρ)∥, so α could be any number greater than 1. On the other hand, if we tried to use α = 0 then for the sequence {yk} with yk = k⁻¹, the only point xk in C with Lxk = yk is (1,k⁻¹). Therefore no fixed β could satisfy ∥xk∥ ≤ β∥yk∥ for all k.
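The failure of a purely homogeneous bound can be seen numerically. The following sketch is my own check of the example just described: C is the segment from (1,0) to (1,1) and L(x1,x2) = x2, so the unique preimage of y ∈ [0,1] is x = (1,y).

```python
# Sketch (my own check of the example above): C is the segment from (1, 0)
# to (1, 1) in R^2 and L(x1, x2) = x2, so the unique preimage of y in [0, 1]
# is x = (1, y).
import math

def preimage_norm(y):
    """||x|| for the unique x in C with L(x) = y, namely x = (1, y)."""
    assert 0.0 <= y <= 1.0
    return math.hypot(1.0, y)

# A purely homogeneous bound ||x|| <= beta*||y|| fails as y -> 0: the
# required beta = ||x_k|| / ||y_k|| grows without bound.
ratios = [preimage_norm(1.0 / k) * k for k in (1, 10, 100, 1000)]
assert ratios[-1] > 1000.0
# The mixed bound of (4.7) holds with alpha = sqrt(2) and beta = 0, since
# L(C) = [0, 1] is bounded.
assert all(preimage_norm(y) <= math.sqrt(2.0) + 1e-12
           for y in (0.0, 0.25, 0.5, 1.0))
```

The constant α in the mixed bound absorbs exactly the small-∥y∥ regime where homogeneity breaks down.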

Corollary 4.1.17. Let L be a linear transformation from Rn to Rm and let C be a convex subset of Rn that satisfies the recession condition for L. Then L(C) is closed.

Proof. Choose any y0 ∈ clL(C) and let {yk} be a sequence in L(C) converging to y0. As {yk} converges, it is bounded. Apply Theorem 4.1.16 to produce a sequence {xk} in C with Lxk = yk and constants α and β with ∥xk∥ ≤ max{α, β∥yk∥} for each k. Then {xk} is bounded, so a subsequence must converge to some x0 ∈ C. By continuity of L, y0 = Lx0 ∈ L(C), so L(C) = clL(C). ⊓⊔

Corollary 4.1.18. Let L be a linear transformation from Rn to Rm and let C be a nonempty convex subset of Rn. Then

a. L(rcC) ⊂ rcL(C);
b. If C satisfies the recession condition for L, then rcL(C) is closed and

L(rcC) = rcL(C). (4.8)

Proof. Let v ∈ rcC and let y be any point of L(C). There is then a point c ∈ C with Lc = y. By choice of v we have c + R+v ⊂ C, and therefore L(c + R+v) = y + R+Lv ⊂ L(C). As y was any point of L(C), Lv ∈ rcL(C), so (a) holds.

Now suppose that C satisfies the recession condition for L. Let w ∈ rcL(C). As L(C) is closed by Corollary 4.1.17, rcL(C) is also closed by Theorem 4.1.2. Proposition 4.1.7 shows that w is the limit of a sequence {γkwk} where the γk are positive numbers converging to 0 and the wk are elements of L(C). Apply Theorem 4.1.16 to produce a sequence {ck} of points in C and real numbers α and β such that for each k, Lck = wk and

∥ck∥ ≤ max{α, β∥wk∥} ≤ α + β∥wk∥.

Then for each k,

∥γkck∥ ≤ γkα + β∥γkwk∥. (4.9)

The first term on the right in (4.9) converges to 0, and the second is bounded because {γkwk} converges to w. Therefore {γkck} is bounded, and by taking a subsequence if necessary we may suppose it converges to some v which must, by Proposition 4.1.7, be in rcC. As Lck = wk for each k, by taking the limit in the equation L(γkck) = γkwk on the subsequence we obtain Lv = w. This shows that w ∈ L(rcC) and thereby proves (4.8). ⊓⊔

One might try to prove (4.8) under weaker hypotheses than those of Corollary 4.1.18. The following example from [39, p. 74] shows that at least one reasonable set of hypotheses—that both C and L(C) be closed—will not work.

Example 4.1.19. Let C = {x ∈ R2 | x2 ≥ x1²} and let L : R2 → R be the linear map taking (x1,x2) to x1 (the linear projector on R). Then C is closed, and L(C) = R is also closed. However, rcC = {0} × R+, so that

L(rcC) = {0} ≠ R = rcL(C).

The recession condition does not hold here, because

(kerL) ∩ (rcC) = {0} × R+ ⊄ {0} = linC.

⊓⊔

Corollary 4.1.20. Let C1, . . . ,Ck be nonempty convex subsets of Rn. Then

rc ∑_{i=1}^{k} Ci ⊃ ∑_{i=1}^{k} rcCi. (4.10)

If also the Ci are closed and for any elements v1, . . . ,vk with vi ∈ rcCi for each i and ∑_{i=1}^{k} vi = 0 one has vi ∈ linCi for each i, then ∑_{i=1}^{k} Ci and its recession cone are closed and

rc ∑_{i=1}^{k} Ci = ∑_{i=1}^{k} rcCi. (4.11)

Proof. Let D = Π_{i=1}^{k} Ci. Then rcD = Π_{i=1}^{k} rcCi (Exercise 4.1.24). Define L : Rkn → Rn by L(x1, . . . ,xk) = ∑_{i=1}^{k} xi, where each xi is in Rn. Then

L(D) = ∑_{i=1}^{k} Ci, L(rcD) = ∑_{i=1}^{k} rcCi,

so (4.10) follows from part (a) of Corollary 4.1.18.

Now assume that the Ci are closed and that for any elements v1, . . . ,vk with vi ∈ rcCi for each i and ∑_{i=1}^{k} vi = 0 one has vi ∈ linCi for each i. The kernel of L is the set of k-tuples (x1, . . . ,xk) where the xi belong to Rn and ∑_{i=1}^{k} xi = 0. By Proposition 4.1.6 the lineality space of D is

linD = (rcD) ∩ (−rcD) = Π_{i=1}^{k} [(rcCi) ∩ (−rcCi)] = Π_{i=1}^{k} linCi.


Our hypotheses then imply that (kerL) ∩ (rcD) ⊂ linD, so that D satisfies the recession condition for L. Now the remaining assertions follow from Corollaries 4.1.17 and 4.1.18. ⊓⊔

For a convex set C, the set coneC may or may not be closed even if C is closed. The following corollary exhibits the structure of cl coneC.

Corollary 4.1.21. If C is any convex subset of Rn whose closure does not contain the origin, then cl coneC = (cone clC) ∪ (rc clC).

Proof. This is clear if C is empty, so we suppose that C is nonempty. Let HC = cone(C × {−1}) and let L be the linear transformation from Rn+1 to Rn defined by L(x,ξ) = x; then cl coneC = clL(HC). By Theorem 4.1.9 we have

clHC = cone[(clC) × {−1}] ∪ [(rc clC) × {0}].

Also, if (z,ζ) ∈ kerL ∩ rc clHC we have z = 0 and (0,ζ) ∈ clHC. This means that there is a sequence of points of the form (τncn,−τn) ∈ HC converging to (0,ζ), with τn > 0 and cn ∈ C for each n. The norms of the cn are bounded away from zero by hypothesis, so the τn must converge to zero and therefore ζ = 0. Now Corollary 4.1.17 tells us that L(clHC) = clL(HC). Therefore

cl coneC = clL(HC) = L(clHC) = (cone clC) ∪ (rc clC),

as was to be shown. ⊓⊔

4.1.4 Exercises for Section 4.1

Exercise 4.1.22. Show that for any nonempty convex set C in Rn, rcC ⊂ rc clC = rc riC. Exhibit a convex set C such that rc clC strictly contains cl rcC, and prove the inclusion.

Exercise 4.1.23. Exhibit an unbounded convex subset C of R2 such that rcC = {0}, and prove the equality.

Exercise 4.1.24. For i = 1, . . . ,n let ni be a positive integer and let Ci be a nonempty convex subset of Rni. Show that rc Π_{i=1}^{n} Ci = Π_{i=1}^{n} rcCi.

Exercise 4.1.25. Let A be an m×n matrix, D be a p×n matrix, a ∈ Rm, and d ∈ Rp. Define a set P ⊂ Rn by

P = {x | Ax = a, Dx ≤ d},

where the symbol ≤ expresses the partial order on Rp that defines u ≤ v to mean that for each i, ui ≤ vi.

Show that if P is nonempty, then

rcP = {z | Az = 0, Dz ≤ 0}, linP = {z | Az = 0, Dz = 0},


so that these sets do not depend on a and d.
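The formula for rcP can be illustrated numerically. In the sketch below the matrix D and vector d are my own choice (an unbounded polyhedron in R2 with no equality constraints), not from the text: directions z with Dz ≤ 0 keep feasible points feasible for arbitrarily large steps, while other directions eventually leave P.

```python
# Sketch (my own illustration of Exercise 4.1.25): the sample polyhedron
#   P = {x in R^2 : Dx <= d},  D = [[-1,0],[0,-1],[1,-1]],  d = (0,0,1),
# has, by the exercise's formula,
#   rc P = {z : Dz <= 0} = {z : z1 >= 0, z2 >= 0, z1 <= z2}.

D = [(-1.0, 0.0), (0.0, -1.0), (1.0, -1.0)]
d = (0.0, 0.0, 1.0)

def in_P(x, tol=1e-9):
    return all(r[0] * x[0] + r[1] * x[1] <= b + tol for r, b in zip(D, d))

def in_rcP(z, tol=1e-9):
    return all(r[0] * z[0] + r[1] * z[1] <= tol for r in D)

# Directions with Dz <= 0 keep feasible points feasible for every t >= 0:
x0 = (0.5, 0.25)
assert in_P(x0)
for z in [(0.0, 1.0), (1.0, 1.0), (0.5, 2.0)]:
    assert in_rcP(z)
    assert all(in_P((x0[0] + t * z[0], x0[1] + t * z[1]))
               for t in (1.0, 10.0, 1e6))
# A direction violating Dz <= 0 eventually leaves P:
assert not in_rcP((1.0, 0.0))
assert not in_P((x0[0] + 10.0, x0[1]))
```

Note also that the recession cone computed this way indeed does not depend on d, exactly as the exercise asserts.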

Exercise 4.1.26. Let C be a nonempty convex cone in Rn. Show that C ⊂ rcC, and that if 0 ∈ C then C = rcC.

Exercise 4.1.27. Let C be a convex set in Rn whose affine hull is a subspace. Show that (parC)⊥ = lin(C°).

4.2 Faces

Convex sets have certain distinguished subsets, called faces, that appear frequently in convex analysis and especially in optimization. In this section we define faces and explain some of their special properties. We will use these properties in Section 4.3 to establish a fundamental representation theorem showing how a closed convex set can be written as a linear combination of certain simpler sets associated with it. One can think of that theorem as the internal analogue of the external representation given by Theorem 2.2.3.

Definition 4.2.1. Let C be a convex subset of Rn. A convex subset F of C is called a face of C if whenever x and y belong to C, λ ∈ (0,1), and (1−λ)x + λy ∈ F, both x and y also belong to F. ⊓⊔

By applying the definition twice one sees that a face of a face of C is a face of C. The empty set is a face of every convex set, and any convex set is a face of itself. Sometimes we exclude the latter case by referring to proper faces: that is, faces that are proper subsets of the set in question.

The definition also shows that the intersection of any collection of faces of C is a face of C. Therefore for each pair of faces of C, say F1 and F2, there is a unique largest face of C contained in both (namely F1 ∩ F2) and a unique smallest face that contains both (the intersection of all faces containing both; the collection over which that intersection is taken is not empty, because C is a face of itself).

Some faces of low dimension have special names. The following definition, which uses the operator pos introduced in Definition 1.1.16, explains these.

Definition 4.2.2. Let C be a convex subset of Rn. A face of C that is a singleton is called an extreme point of C. If some face of C is a halfline of the form x + pos y where y ≠ 0, then pos y is called an extreme ray of C and we say that y generates that extreme ray. ⊓⊔

We use the notation F(C) for the collection of all faces of a convex set C.

The definition of face said geometrically that if a face F met the relative interior of the line segment between x and y, then the whole segment had to lie in F. In fact, this property is not confined to line segments, as the following proposition shows. It gives structural properties of faces that are often very useful in proofs.

Proposition 4.2.3. Let C be a convex subset of Rn and let F be a face of C.

Page 73: 20151103 CFD

4.2 Faces 65

a. If S is a convex subset of C whose relative interior meets F, then S ⊂ F.
b. F = C ∩ clF. In particular, F is closed if C is closed.
c. C\F is convex.

Proof. For (a), let f ∈ F ∩ riS and let s be any point of S. We show that s ∈ F, which is immediate if s = f. If s ≠ f, then Corollary 1.2.5 tells us that we can find r ∈ S and λ ∈ (0,1) with f = (1−λ)r + λs. The definition of face now says that s ∈ F.

For (b), let S = C ∩ clF. Then F ⊂ S ⊂ clF, so S and F have the same affine hull. But then

riF ⊂ riS ⊂ ri clF = riF,

so F and S have the same relative interior. Then F ⊃ S by (a), so in fact F = S.

For (c), suppose that C\F were not convex. Then F would be nonempty and there would be points x and x′ of C\F and µ ∈ (0,1) such that xµ := (1−µ)x + µx′ ∉ C\F. As xµ ∈ C we must have xµ ∈ F. But F is a face of C, so we would then have x and x′ belonging to F as well as to C\F, a contradiction. Therefore C\F is convex. ⊓⊔

If F is a singleton subset of a convex set C and C\F is convex, then F is an extreme point of C. Indeed, as F is a singleton {f} we need only show that it is a face of C. Let c and c′ be points of C, and let µ ∈ (0,1) with f = (1−µ)c + µc′. Then 0 = (1−µ)(c−f) + µ(c′−f), so that c−f = −µ(1−µ)⁻¹(c′−f). If either of c−f or c′−f were nonzero then so would be the other, and then each of c and c′ would belong to the convex set C\F. That would imply f ∈ C\F, which is impossible. Therefore c = f = c′, so that F is an extreme point of C. However, for dimF > 0 the convexity of C\F does not necessarily imply that F is a face: e.g., take C = [0,1] and F = [0,1).
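The counterexample C = [0,1], F = [0,1) is easy to verify mechanically. The sketch below (my own check, not from the text) exhibits an interior convex combination of points of C that lands in F even though one endpoint does not, so F violates the definition of a face while C\F = {1} is trivially convex.

```python
# Sketch (my own check of the counterexample): C = [0, 1], F = [0, 1).
# C \ F = {1} is convex, yet F fails Definition 4.2.1.

def in_F(t):
    """Membership in F = [0, 1)."""
    return 0.0 <= t < 1.0

x, y, lam = 0.0, 1.0, 0.5            # both x and y lie in C = [0, 1]
combo = (1 - lam) * x + lam * y      # = 0.5, an interior combination
assert in_F(combo)                   # the combination lands in F ...
assert not in_F(y)                   # ... but the endpoint y = 1 is not in F,
# so F = [0, 1) is not a face of C even though C \ F = {1} is convex.
```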

Corollary 4.2.4. If K is a closed convex cone in Rn, then each face of K is a closed convex cone.

Proof. Let F be a face of K. If F = ∅ or F = {0}, no proof is necessary. Otherwise, let f ∈ F\{0}. Then f(intR+) is a relatively open halfline in K. Because it is relatively open and meets F, it must be contained in F. Therefore each positive multiple of a point in F belongs to F, so F is a cone. It is convex because it is a face of K, and closed by Proposition 4.2.3. ⊓⊔

Recession cones of faces of a set are faces of the set's recession cone, but the converse is not always true.

Proposition 4.2.5. Let C be a nonempty closed convex subset of Rn. If F is a nonempty face of C then rcF is a face of rcC.

Proof. Part (b) of Proposition 4.2.3 shows that F is closed, and as F is a convex subset of C, Theorem 4.1.2 says that rcF is a convex subset of rcC. Suppose v and v′ belong to rcC, let µ ∈ (0,1), and assume that w := (1−µ)v + µv′ ∈ rcF. Choose any f ∈ F: then f + w ∈ F. As F ⊂ C, the points f + v and f + v′ belong to C. But

f + w = (1−µ)(f + v) + µ(f + v′),


so as F is a face of C both f + v and f + v′ belong to F. As f was an arbitrary point of F, both v and v′ belong to rcF, and therefore rcF is a face of rcC. ⊓⊔

To see that the converse may not hold, let C be the subset of R2 defined by

C = {(x1,x2) | x2 ≥ 0 if x1 ≤ 0, and x2 ≥ x1² if x1 > 0}.

Then rcC = R− × R+, and G := {0} × R+ is a face of rcC. However, G is not the recession cone of any nonempty face of C.

Proposition 4.2.6. Let C be a convex subset of Rn and let F be any proper face of C. Then dimF < dimC, and F is entirely contained in the relative boundary of C.

Proof. C is nonempty because F is proper. Further, F must lie entirely in the relative boundary of C because if F met riC then we would have C ⊂ F by Proposition 4.2.3, contradicting the hypothesis.

If the dimensions of F and C were the same, then F would be nonempty and, as F ⊂ C, affF and affC would be the same. Then riF would be nonempty and we would have riF ⊂ riC, whereas we have already shown that F does not meet riC. Therefore dimF < dimC. ⊓⊔

It follows that any chain of faces of C, each properly contained in the preceding one, must be finite.

Proposition 4.2.7. Let H be a hyperplane in Rn and let C be a nonempty convex set contained in one of the closed halfspaces of H. Then C ∩ H is a face of C, and it is proper if and only if H does not meet riC.

Proof. If C and H are disjoint, we are finished. Therefore assume that H meets C. Write F = C ∩ H and let H = H0(y∗,η). If c1 and c2 are two points of C and λ ∈ (0,1) with (1−λ)c1 + λc2 ∈ F, then we must have ⟨y∗,c1⟩ = η = ⟨y∗,c2⟩, so that both c1 and c2 belong to C ∩ H = F. Therefore F is a face of C.

For the second claim, apply Corollary 1.2.3 to conclude that there are only two mutually exclusive possibilities: either clC ⊂ H or H does not meet riC. The first occurs if and only if F is improper, and therefore the second occurs if and only if F is proper. ⊓⊔

Definition 4.2.8. Let C be a nonempty convex set in Rn and let F be a face of C. The face F is exposed if there is some x∗ ∈ Rn such that F = N_C⁻¹(x∗). ⊓⊔

An exposed face is therefore the set of maximizers on C of the linear functional ⟨x∗, · ⟩. It may be C itself (for x∗ = 0), or the empty set, but otherwise it must be of the form F = C ∩ H0(x∗,µ), where x∗ ≠ 0 and µ is the finite maximum on C of ⟨x∗, · ⟩. As C ⊂ H−(x∗,µ), Proposition 4.2.7 shows that F is a face of C.

For a simple example of a face that is not exposed, start with the nonnegative orthant R2+ and obtain C by first deleting the closed square whose corners are (0,0), (1,0), (1,1), and (0,1) and then taking the union of the result with a closed ball of radius 1 centered at (1,1). The resulting C is closed and convex, but the points (1,0) and (0,1) are faces that are not exposed.

Exposed faces appear naturally in the following construction, which is very important in optimization.

Definition 4.2.9. Let C be a nonempty closed convex subset of Rn and let z0 ∈ Rn. Let x0 = ΠC(z0) and x∗0 = z0 − x0. The critical cone induced by C and z0 is

κC(z0) := N_{TC(x0)}⁻¹(x∗0).

⊓⊔

Our previous work with the projector shows that in this situation x∗0 belongs to NC(x0), which is the polar of TC(x0). Therefore each point y of TC(x0) satisfies ⟨x∗0,y⟩ ≤ 0, and as TC(x0) is a closed cone containing the origin, the maximum of ⟨x∗0, · ⟩ over TC(x0) is zero. The set of maximizers includes the origin and possibly additional points of TC(x0), and

κC(z0) = TC(x0) ∩ {v | ⟨x∗0,v⟩ = 0}.

The critical cone is a nonempty exposed face of TC(x0), but it need not be a face of C. A companion concept, defined next, is useful when we want to deal only with points in C.

Definition 4.2.10. Let C be a nonempty closed convex subset of Rn, and let z0 ∈ Rn. Let x0 = ΠC(z0) and x∗0 = z0 − x0. The critical face of C corresponding to z0 is the set

ϕC(z0) := N_C⁻¹(x∗0).

⊓⊔

ϕC(z0) is an exposed face of C by definition, and it is nonempty because (x0,x∗0) ∈ NC; if z0 ∈ C then ϕC(z0) is all of C.

As (x0,x∗0) ∈ NC we have x0 ∈ ϕC(z0) and ϕC(z0) = C ∩ {x | ⟨x∗0,x−x0⟩ = 0}. Also C − x0 ⊂ TC(x0), so

ϕC(z0) − x0 = (C−x0) ∩ {v | ⟨x∗0,v⟩ = 0} ⊂ TC(x0) ∩ {v | ⟨x∗0,v⟩ = 0} = κC(z0). (4.12)

However, we generally do not have equality because the tangent cone may be much larger than the set C. In fact, the critical cone and the critical face need not even have the same dimension. For example, let C be the Euclidean unit ball in R2 and let z0 = (0,2). The projection of z0 on C is x0 = (0,1) and x∗0 = (0,1). The critical face ϕC(z0) is the point (0,1), whereas the critical cone κC(z0) is the line R × {0}. When the set in question is polyhedral then the situation is much better, as we will show later.

The next theorem gives a fundamental structural property of convex sets.


Theorem 4.2.11. Let C be a convex set in Rn. The relative interiors of the faces of C form a partition of C: that is, they are disjoint and their union is C. They are maximal in the sense that if D is any relatively open convex subset of C then D is contained in the relative interior of one of the faces of C.

Proof. If C is empty there is nothing to prove. If C is nonempty and F1 and F2 are distinct faces of C, then riF1 is disjoint from riF2 by Proposition 4.2.3 (otherwise F1 = F2). Now let D be any relatively open convex subset of C. We show D is contained in the relative interior of some face of C. This implies the partition claim, because for any c ∈ C we could take D to be the singleton {c}.

As D is relatively open, it must be completely contained in each face of C that it meets (Proposition 4.2.3). Take the intersection of all such faces, of which there is at least one (C), and call it F. Then F is a face of C, D ⊂ F, and D meets no proper face of F. We will show that D ⊂ riF by assuming the contrary and producing a contradiction.

If riF does not contain D then let d be a point of D\riF. Separate d and F properly by a hyperplane H. As D ⊂ F, D must lie in one of the closed halfspaces of H and d in the other. But d ∈ D, so d must lie in H. As D is relatively open, d ∈ riD and therefore H meets riD. Corollary 1.2.3 now shows that D ⊂ H, which implies F ⊄ H because the separation is proper. Apply Proposition 4.2.7 to F to conclude not only that the set G = F ∩ H is a face of F but also, because F ⊄ H, that G is a proper face of F. But D ⊂ G, and this contradicts the construction of F. Therefore D ⊂ riF. ⊓⊔

Theorem 4.2.11 has many applications. One that frequently arises is the use of this theorem to associate particular faces of a set with particular relatively open subsets that are of special interest. The next three propositions use it in this way. They provide ways to infer the face structures of sets created by common operations from those of the original sets.

Proposition 4.2.12. Let C and D be convex subsets of Rn and Rm respectively. Then

F [C×D] = {F × G | F ∈ F(C), G ∈ F(D)}.

Proof. (⊃) Choose F ∈ F(C) and G ∈ F(D). If either F or G is empty, then the product is the empty face of C×D. Therefore suppose (f,g) ∈ F×G. For i = 1,2 let (ci,di) be points of C×D such that (f,g) belongs to the interval ((c1,d1),(c2,d2)). Then f is in the interval (c1,c2), so the definition of face ensures that each of c1 and c2 belongs to F. Similarly, each of d1 and d2 belongs to G. Hence for i = 1,2 the pair (ci,di) belongs to F×G, so F×G is a face of C×D.

(⊂) Suppose H is a face of C×D. If it is empty then it is the product of the empty faces of C and D, so we may assume it is nonempty. Choose a pair (c0,d0) ∈ riH. Use Theorem 4.2.11 to find unique faces F of C and G of D such that c0 ∈ riF and d0 ∈ riG. Now F×G is a convex subset of C×D whose relative interior contains (c0,d0). By Proposition 4.2.3 we have F×G ⊂ H. However, we already saw in the first part of the proof that F×G was a face of C×D, and as it meets the relative interior of H we must have F×G ⊃ H, so that in fact H = F×G. ⊓⊔


The second proposition deals with inverse images of convex sets under affine transformations.

Proposition 4.2.13. Let C be a convex subset of Rn and let T be an affine transformation from Rm to Rn. Then F [T⁻¹(C)] = T⁻¹[F(C)].

Proof. (⊃) Let F ∈ F(C) and define G to be T⁻¹(F). We want to show that G is a face of T⁻¹(C). If G is empty then it is the empty face of T⁻¹(C). Therefore suppose it is nonempty; suppose that d1 and d2 belong to T⁻¹(C) and that there is a point d0 belonging to G ∩ (d1,d2). We show that d1 and d2 must then belong to G, so that G will be a face of T⁻¹(C). For i = 0,1,2 let ci = T(di); then c0 ∈ F ∩ (c1,c2), and as F is a face of C the points c1 and c2 must belong to F. But then d1 and d2 belong to G, as required.

(⊂) Let G be a face of T⁻¹(C). If G is empty then it is the inverse image of the empty face of C. Therefore assume G is nonempty and let g0 be a point of its relative interior. Define f0 to be T(g0), and let F be the unique face of C whose relative interior contains f0. We show that G = T⁻¹(F).

First choose any g1 ∈ G. As g0 ∈ riG we can use Theorem 1.2.4 to find a point g2 of G with g0 ∈ (g1,g2). For i = 1,2 define ci to be T(gi); then f0 ∈ (c1,c2). As F is a face of C, the points c1 and c2 must belong to F. But this means that g1 ∈ T⁻¹(F), and therefore G ⊂ T⁻¹(F).

Next choose any point d1 of T⁻¹(F). We want to prove that d1 belongs to the face G of T⁻¹(C). Let f1 = T(d1); as f1 ∈ F and f0 ∈ riF we can apply Corollary 1.2.5 to produce f2 ∈ F with f0 ∈ (f1,f2). Therefore there is some λ ∈ (0,1) with f0 = (1−λ)f1 + λf2. Define d2 to be (1−λ⁻¹)d1 + λ⁻¹g0; then T(d2) = f2. Now g0 belongs to G, a face of T⁻¹(C), and each of d1 and d2 belongs to T⁻¹(F) and hence to T⁻¹(C). But also g0 = (1−λ)d1 + λd2, so each of d1 and d2 must belong to G. Therefore T⁻¹(F) ⊂ G, so in fact G = T⁻¹(F). ⊓⊔

The next result deals with forward images under affine transformations. Unlike the last proposition, this one has only a partial statement: it says that any face of the image of a set is itself the image, under the affine transformation, of some face of the original set. It does not say that the transformation carries any face of the original set to a face of the image, because that is not generally true. To see this, consider the 2-simplex S in R2 that is the convex hull of (0,0), (2,0), and (1,1). Let T be the linear projector taking the point (x,y) ∈ R2 to x ∈ R. The set T(S) has three nonempty faces; each of the two endpoints is the image under T of one of the vertices of S, while the entire set T(S) is the image of both S and one of its one-dimensional faces. However, T carries one of the vertices of S and two of its one-dimensional faces into subsets of T(S) that are not faces.

Proposition 4.2.14. Let C be a convex subset of Rn and let T be an affine transformation from Rn to Rm. Then F [T(C)] ⊂ T [F (C)].

Page 78: 20151103 CFD

70 4 Structure

Proof. Let G be a face of T(C). If G is empty then it is the image of the empty face of C. If it is nonempty, choose g0 ∈ riG; define S = C∩T−1(g0). This set S is nonempty; let f0 be any point in its relative interior. Use Theorem 4.2.11 to find the unique face F of C such that f0 ∈ riF. We show that G = T(F).

Let g1 be any point of G, and choose g2 ∈ G with g0 ∈ (g1,g2). For i = 1,2 let ci ∈ C with gi = T(ci). There is λ ∈ (0,1) with g0 = (1−λ)g1 +λg2; define a point fa ∈ C by fa = (1−λ)c1 +λc2. As T is affine we have T(fa) = g0, so that fa ∈ S. We chose f0 to be in the relative interior of S, so we can use Theorem 1.2.4 to find some fb ∈ S with f0 ∈ (fa, fb). Each of fa and fb belongs to C, so as f0 ∈ F we have fa and fb in F also. But fa ∈ (c1,c2), so c1 also belongs to F. Then g1 = T(c1) ∈ T(F), so G ⊂ T(F).

Next let d1 be any point of T(F) and let f1 ∈ F with d1 = T(f1). As f0 ∈ riF we can use Corollary 1.2.5 to find some f2 ∈ F with f0 ∈ (f1, f2); let d2 = T(f2). The points d1 and d2 are in T(C), and as T is affine we have g0 = T(f0) ∈ (d1,d2). As G is a face of T(C) it follows that d1 ∈ G, and therefore T(F) ⊂ G. ⊓⊔

Here are two consequences of the preceding three propositions. The first shows that the faces of an intersection of convex sets are precisely the intersections of faces of the individual sets.

Corollary 4.2.15. Let C1, . . . ,Ck be convex subsets of Rn, and define C = C1 ∩ ·· · ∩Ck. A subset F of Rn is a face of C if and only if F = F1 ∩ ·· · ∩Fk, where for each i the set Fi is a face of Ci.

Proof. It suffices to prove the result for k = 2, as we can then extend it by induction. Let T be the linear transformation from Rn to R2n defined by T(x) = (x,x). Then C1 ∩C2 = T−1(C1 ×C2). Accordingly, by Propositions 4.2.12 and 4.2.13 we have

F (C1 ∩C2) = F [T−1(C1 ×C2)]
= {T−1(F ×G) | F ∈ F (C1), G ∈ F (C2)}
= {F ∩G | F ∈ F (C1), G ∈ F (C2)}. ⊓⊔
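Corollary 4.2.15 lends itself to a quick numerical check. The following sketch (our illustration, not part of the text) works in R1, where the nonempty faces of a closed interval are its two endpoints and the interval itself; it verifies that the nonempty faces of C1 ∩C2 are exactly the nonempty intersections F1 ∩F2 of faces of C1 and C2.

```python
# Brute-force check of Corollary 4.2.15 for two closed intervals in R^1.
# An interval is a pair (lo, hi); the empty set is represented by None.

def faces(iv):
    """The faces of a closed interval [lo, hi]: empty set, endpoints, whole set."""
    lo, hi = iv
    return {None, (lo, lo), (hi, hi), (lo, hi)}

def intersect(u, v):
    """Intersection of two intervals (None = empty set)."""
    if u is None or v is None:
        return None
    lo, hi = max(u[0], v[0]), min(u[1], v[1])
    return (lo, hi) if lo <= hi else None

C1, C2 = (0.0, 2.0), (1.0, 3.0)
C = intersect(C1, C2)                         # C1 ∩ C2 = [1, 2]

lhs = {f for f in faces(C) if f is not None}  # nonempty faces of the intersection
rhs = {intersect(f1, f2) for f1 in faces(C1) for f2 in faces(C2)}
rhs.discard(None)                             # keep only nonempty intersections

assert lhs == rhs
```

Only the induction base k = 2 is exercised here; in higher dimensions a face enumeration would instead use the exact index sets introduced in Section 5.1.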

The second consequence is the fact that each face of a finite sum of convex sets is the sum of faces of the individual sets appearing in the sum.

Corollary 4.2.16. Let C1, . . . ,Ck be convex subsets of Rn, and define C = C1 + ·· ·+Ck. Each face F of C has the form F = F1 + ·· ·+Fk, where for each i the set Fi is a face of Ci.

Proof. Again it is enough to prove the result for k = 2. Let T∗ be the adjoint of the linear transformation T defined in the proof of Corollary 4.2.15: that is, T∗(x,y) = x+y. Then C1 +C2 = T∗(C1 ×C2). Applying Propositions 4.2.12 and 4.2.14 we have

F (C1 +C2) ⊂ T∗(F (C1 ×C2))
= T∗({F ×G | F ∈ F (C1), G ∈ F (C2)})
= {F +G | F ∈ F (C1), G ∈ F (C2)}
= F (C1)+F (C2). ⊓⊔

If C is a closed convex set in Rn, then the faces F of clHC fall into two disjoint classes: those contained in Rn ×{0}, and those meeting Rn ×(−∞,0). The following theorem shows that these two classes correspond to the classes F (rcC) and F (C) respectively.

Theorem 4.2.17. If C is a nonempty closed convex subset of Rn and HC is its homogenizing cone, then:

a. The nonempty faces U of clHC that are contained in Rn ×{0} are in one-to-one correspondence with the nonempty faces V of rcC, via the maps

σ(V) = V ×{0}, τ(U) = Π(U),

where Π(x1, . . . ,xn+1) = (x1, . . . ,xn).

b. The elements of the collection H of faces F of clHC that meet Rn ×(−∞,0) are in one-to-one correspondence with the nonempty faces G of C, via the maps

ϕ(G) = clcone(G×{−1}), γ(F) = Π [F ∩(Rn ×{−1})].

Proof. For (a), we apply Theorem 4.1.9 to show that (clHC)∩(Rn ×{0}) = (rcC)×{0}. Proposition 4.2.12 shows that the faces U of (rcC)×{0} are the sets of the form U = V ×{0} where V is a face of rcC. Therefore these faces of clHC are in one-to-one correspondence with the faces of rcC through the equations U = σ(V) and V = τ(U).

To prove (b), let G be a nonempty face of C. Define a subset F of Rn+1 by

F = ϕ(G) := clcone(G×{−1}).

This set F is closed and convex, and it meets Rn ×(−∞,0). As G ⊂ C, F is a subset of clHC. To show that F is a face of clHC, let (h,−α) and (h′,−α′) be points of clHC and suppose that µ ∈ (0,1) with

(hµ ,−αµ) := (1−µ)(h,−α)+µ(h′,−α ′) ∈ F. (4.13)

We show that both (h,−α) and (h′,−α′) belong to F. There are three cases, treated in separate paragraphs below.

Case 1 is that in which αµ = 0: then both α and α′ must be zero. Theorem 4.1.9 shows that h and h′ belong to rcC and, as G is closed by Proposition 4.2.3, it also shows that hµ belongs to rcG. But rcG is a face of rcC by Proposition 4.2.5, so that both h and h′ must belong to rcG. As Theorem 4.1.9 says that

F ∩(Rn ×{0}) = (rcG)×{0},

it follows that (h,−α) and (h′,−α′) are both in F.

The other possibility is that in which αµ > 0, and here there are two subcases: in Case 2, exactly one of α and α′ is zero, and with no loss of generality we can assume it is α′; in Case 3 both are positive.


Suppose next that we are in Case 2. Then α ′ = 0, and from (4.13) we have

(hµ ,−αµ) = ((1−µ)h+µh′,−(1−µ)α).

Then as αµ > 0, we obtain αµ = (1−µ)α and

α−1h+α−1(1−µ)−1µh′ = αµ−1hµ =: g ∈ G, (4.14)

where the inclusion comes from Theorem 4.1.9. As α′ = 0 and (h′,−α′) ∈ clHC we have h′ ∈ rcC. We also have α > 0 and (h,−α) ∈ clHC so that α−1h =: c ∈ C. Writing β for α−1(1−µ)−1µ, we obtain from (4.14) the equation

g = c+βh′. (4.15)

The halfline c+h′R+ is contained in C because c ∈ C and h′ ∈ rcC. Moreover, (4.15) shows that the relative interior of this halfline meets G, so by Proposition 4.2.3 the entire halfline is in G. This tells us first that c ∈ G and next, by Theorem 4.1.2, that h′ ∈ rcG. Then

(h,−α) = (αc,−α) ∈ F, (h′,−α ′) = (h′,0) ∈ F,

as required.

In Case 3, both α and α′ are positive. If we write

c = α−1h, c′ = α′−1h′, g = αµ−1hµ ,

then (4.13) yields

(1−µ)α(c,−1)+µα ′(c′,−1) = αµ(g,−1). (4.16)

Define λ = µα′αµ−1; then by using the second components of the quantities in (4.16) we find that 1−λ = (1−µ)ααµ−1 and that λ ∈ (0,1). Then (4.16) shows that

(1−λ )(c,−1)+λ (c′,−1) = (g,−1). (4.17)

As G is a face of C, (4.17) shows that both c and c′ belong to G, and this implies that both (h,−α) and (h′,−α′) belong to F as required. Therefore F ∈ H.

We have now shown that ϕ takes nonempty faces of C to elements of H. The next step is to show that ϕ is injective, and for that purpose let G and G′ belong to F (C) and suppose that ϕ(G) = F = F′ = ϕ(G′). As F ∈ H, by Theorem 4.1.9 there is a point in riF of the form (g,−1) where g ∈ (riG)∩(riG′). But as G and G′ are faces, this implies that G = G′, so that ϕ is injective.

Now suppose F is any element of H. As clHC is a closed convex cone, so is F by Corollary 4.2.4. Let G = γ(F) = Π [F ∩(Rn ×{−1})]. Then G is a closed convex subset of C. Let c and c′ belong to C and suppose that µ ∈ (0,1) is such that (1−µ)c+µc′ =: g ∈ G. Then (c,−1) and (c′,−1) belong to clHC, and we have


(1−µ)(c,−1)+µ(c′,−1) = (g,−1) ∈ F.

As F is a face, both (c,−1) and (c′,−1) are in F and so c and c′ belong to G. This shows that G is a face of C. Theorem 4.1.9 then says that F = ϕ(G), so that ϕ is surjective and ϕ ∘ γ = idH.

Let G ∈ F (C). By what we have just shown,

ϕ(G) = (ϕ ∘ γ)(ϕ(G)) = ϕ((γ ∘ ϕ)(G)).

If we let G′ = (γ ∘ ϕ)(G) then this shows that ϕ(G) = ϕ(G′). But ϕ is injective, so G = G′ and therefore γ ∘ ϕ = idF (C). This establishes the asserted one-to-one correspondence between H and F (C). ⊓⊔

Faces of a closed convex cone are closely related to faces of the polar cone. The following theorem explains this relationship.

Theorem 4.2.18. Let K be a nonempty closed convex cone in Rn, and let F be a nonempty face of K. Write L for parF. Then F = K ∩L and for each x ∈ riF, NK(x) has the constant value F∗ := K∘ ∩L⊥. This F∗ is a face of K∘, and for each x∗ ∈ F∗, F ⊂ NK∘(x∗). For x∗ ∈ riF∗ one has F = NK∘(x∗) if and only if the face F is exposed.

Proof. As K is a closed convex cone, so is F (Exercise 4.2.20). Then F contains the origin, so L = parF = affF ⊃ F. The hypothesis says that F ⊂ K and parF = L, so F ⊂ K ∩L.

As F is a face of K, to show that K ∩L ⊂ F we need only show that ri(K ∩L) meets F (part (a) of Proposition 4.2.3). If this did not happen, then as F and K ∩L are both convex cones that are subsets of L, we could separate them properly by a hyperplane H0(y∗,0) with y∗ ∈ L\{0}. (To see this, carry out the separation in L to produce y∗, then include y∗ in Rn and use it to define H0.) As F ⊂ K ∩L, F must be contained in the hyperplane. If x ∈ riF then for small positive ε we have x+εy∗ ∈ F because L = affF. Then as ⟨y∗,x⟩ = 0 we have

0 = ⟨y∗,x+ εy∗⟩= ε∥y∗∥2,

which contradicts y∗ ≠ 0. Therefore ri(K ∩L) meets F, so that K ∩L ⊂ F and therefore F = K ∩L.

As riF is a relatively open convex subset of K, the value of NK(x) for x ∈ riF is some constant set F∗ by Exercise 3.3.15. Let u∗ be a point of K∘ ∩L⊥ and let x ∈ riF. Then x−0 ∈ parF = L, so ⟨u∗,x⟩ = 0. For any x′ ∈ K, as u∗ ∈ K∘ we have

⟨u∗,x′− x⟩= ⟨u∗,x′⟩−⟨u∗,x⟩= ⟨u∗,x′⟩ ≤ 0,

so u∗ ∈ NK(x) = F∗, showing that K∘ ∩L⊥ ⊂ F∗. Next, suppose that x∗ ∈ F∗ and x ∈ riF, and let v ∈ L. For small positive ε we have x+εv ∈ riF. The constancy of NK on riF then implies that

0 ≥ ⟨x∗,(x+ εv)− x⟩, 0 ≥ ⟨x∗,x− (x+ εv)⟩,

Page 82: 20151103 CFD

74 4 Structure

so ⟨x∗,v⟩ = 0. As v was an arbitrary point of L, x∗ ∈ L⊥. But x∗ ∈ K∘ because NK(x) = K∘ ∩{v∗ | ⟨v∗,x⟩ = 0} (Exercise 3.3.13), so x∗ ∈ K∘ ∩L⊥. Therefore F∗ ⊂ K∘ ∩L⊥, so we have shown that F∗ = K∘ ∩L⊥.

To show that F∗ is a face of K∘, let k∗0 and k∗1 be points of K∘ and suppose that µ ∈ (0,1) is such that (1−µ)k∗0 +µk∗1 =: k∗ ∈ F∗. Choose x ∈ riF and v ∈ L. For small positive ε we have x+εv ∈ riF. As k∗0 and k∗1 belong to K∘, both ⟨k∗0,x⟩ and ⟨k∗1,x⟩ are nonpositive. However, k∗ ∈ F∗ ⊂ L⊥, so

0 = ⟨k∗,x⟩= (1−µ)⟨k∗0,x⟩+µ⟨k∗1,x⟩,

implying that both ⟨k∗0,x⟩ and ⟨k∗1,x⟩ are zero. A similar argument with x+εv in place of x shows that both ⟨k∗0,x+εv⟩ and ⟨k∗1,x+εv⟩ are zero, implying that ⟨k∗0,v⟩ = 0 = ⟨k∗1,v⟩. As v was any element of L, both k∗0 and k∗1 belong to L⊥ and therefore to F∗, which must then be a face of K∘.

For the last assertions, let x∗ ∈ F∗. The set NK∘(x∗) consists of those x ∈ K such that ⟨x∗,x⟩ = 0. As x∗ ∈ K∘ ∩L⊥, if x ∈ K ∩L then x ∈ NK∘(x∗), and therefore F ⊂ NK∘(x∗). If the face F is not exposed then there is no x∗ such that F = NK−1(x∗). But NK−1 = NK∘, so we cannot have F = NK∘(x∗). Now suppose F is exposed, and choose y∗ such that F = NK−1(y∗). Then for each x ∈ F we have y∗ ∈ NK(x), so that y∗ ∈ F∗. Let V be the constant value of NK∘ on riF∗; we have already shown that F ⊂ V. Let v be any point of V. Choose a sequence x∗k of points in riF∗ converging to y∗; then for each k we have v ∈ NK∘(x∗k). As NK∘ is a closed multifunction, v ∈ NK∘(y∗) = F. As v was any point of V we conclude that V ⊂ F and so F ⊂ V ⊂ F. Accordingly, F is the constant value of NK∘ on riF∗. ⊓⊔

If the cone K is polyhedral then parF∗ = L⊥, as we show in Theorem 5.3.2 below. However, this is false in general, as one can see by taking K to be the cone of Exercise 3.1.18 and L to be the z-axis. Then K∘ ∩L⊥ and L⊥ have dimensions 1 and 2 respectively.

4.2.1 Exercises for Section 4.2

Exercise 4.2.19. Show that if C is a nonempty closed convex subset of Rn then the lineality space of each nonempty face of C is linC.

Exercise 4.2.20. If K is a convex cone in Rn and F is a face of K, then F is a convex cone; further, F is closed if K is closed.

Exercise 4.2.21. Let C be a nonempty convex subset of Rn, and let L be a nonempty subspace contained in linC. Define C0 to be C ∩L⊥. Show that the faces F of C are in one-to-one correspondence with the faces F0 of C0 through the mappings ϕ(F0) = F0 +L, ψ(F) = F ∩L⊥.

Note. To establish a one-to-one correspondence between sets S and T one must show that there are functions σ : S → T and τ : T → S such that τ ∘ σ = idS and σ ∘ τ = idT, where idX denotes the identity map of X.


4.3 Representation

We are going to develop for closed convex sets a representation of the form convE + posD, where E and D are specified sets. To do so, we will use the generalized simplex introduced and discussed in Definitions 1.1.17 and 1.1.18, and in Theorem 1.1.19.

We begin with the following proposition, which shows that a face inherits some of the structure of the original set.

Proposition 4.3.1. Let X and Y be subsets of Rn, and let C = convX + posY. Let F be a nonempty face of C, and define XF = X ∩F and YF = Y ∩ rcF. Then F = convXF +posYF.

Proof. The equivalence of (a) and (b) in Theorem 4.1.2 shows that F ⊃ convXF + posYF. Let z ∈ F; we complete the proof by showing that z ∈ convXF +posYF.

As z ∈ C, by Theorem 1.1.19 we can write z = λ1x1 + ·· ·+λIxI +µ1y1 + ·· ·+µJyJ, with the xi and yj belonging to X and Y respectively, with the λi and µj strictly positive, and with λ1 + ·· ·+λI = 1. Let E = conv{x1, . . . ,xI}+pos{y1, . . . ,yJ} and define

U = {u ∈ RI | u ≥ 0, u1 + ·· ·+uI = 1}, V = RJ+.

Then E is the image of U ×V under the linear transformation represented by the matrix whose columns are x1, . . . ,xI ,y1, . . . ,yJ, and z is the image of the point (λ1, . . . ,λI ,µ1, . . . ,µJ). As this point belongs to ri(U ×V) we have z ∈ riE by Proposition 1.2.7. But z ∈ F, so by Proposition 4.2.3 E ⊂ F. It follows that x1, . . . ,xI ∈ XF, and that the points y1, . . . ,yJ belong to rcclF (apply part (c) of Theorem 4.1.2 to clF). However, these points also belong to rcC. By Proposition 4.2.3 we have F = C ∩ clF, so in fact y1, . . . ,yJ belong to rcF, and therefore to Y ∩ rcF = YF. Accordingly, z ∈ convXF +posYF as asserted. ⊓⊔

We now turn to the representation theorem mentioned at the beginning of this section. This theorem will permit us to represent a closed convex set containing no line as a certain combination of extreme points of the set and extreme rays of its recession cone. Corollary 4.3.5 extends this representation to general closed convex sets. Chapter 5 will show how to sharpen the version given here for a special class of sets called polyhedral.

As the theorem deals with sets that do not contain lines, we give first an equivalent way to describe such sets.

Lemma 4.3.2. Let S be a nonempty subset of Rn. If bcS has a nonempty interior then S contains no line. If S is closed and convex then the converse assertion holds also.

Proof. Suppose that S contains a line. Let x ∈ S and let y be a nonzero point of Rn such that {x+µy | µ ∈ R} ⊂ S. If x∗ ∈ Rn and ⟨x∗, ·⟩ is bounded above on S, then


⟨x∗,y⟩ = 0. Hence bcS lies in the (n−1)-dimensional subspace orthogonal to y, so intbcS is empty, and this proves the first assertion.

For the converse, assume that S is closed and convex, and that it contains no line. Corollary 4.1.13 says that (bcS)∘ = rcS. Taking polars, we find that clbcS = (rcS)∘. Let A be the affine hull of bcS. Then A is a subspace, and it is also the affine hull of clbcS. Therefore (rcS)∘ ⊂ A. If the dimension of A were n−1 or less, then by taking polars we could obtain

A⊥ = A∘ ⊂ (rcS)∘∘ = rcS.

Then for each x ∈ S we would have x+A⊥ ⊂ S. As the dimension of A⊥ is at least 1, this contradicts the fact that S contains no line. Therefore A has dimension n, so the interior of bcS is nonempty. ⊓⊔

The next lemma provides the key step in the proof of the main theorem.

Lemma 4.3.3. If C is a closed convex set in Rn having dimension at least 2 and containing no line, then each point of C is a convex combination of two points of rbC.

Proof. It suffices to show this for a point c ∈ riC, as any other point of C is already in rbC. The hypothesis that C contains no line, together with Lemma 4.3.2, ensures that bcC has a nonempty interior. Let x∗ be a nonzero point of this interior and define E = C ∩H, where H is the hyperplane consisting of all x with ⟨x∗,x⟩ = ⟨x∗,c⟩. Suppose that bcC contains a ball of radius ε > 0 around x∗. If z is any nonzero point of rcC then x∗ +εz/∥z∥ ∈ bcC, so that we must have ⟨x∗ +εz/∥z∥,z⟩ ≤ 0 because C is closed and therefore rcC = (bcC)∘ by Corollary 4.1.13. This implies ⟨x∗,z⟩ < 0. Accordingly the recession cone rcH of H, which is the subspace consisting of all z with ⟨x∗,z⟩ = 0, meets rcC only in the origin. Corollary 4.1.5 now yields rcE = rcC ∩ rcH = {0}, which implies that E is bounded and therefore compact.

We have

[(parC)∩(parH)]⊥ = (parC)⊥ +(parH)⊥. (4.18)

As parC has dimension at least 2 and parH has dimension n−1, the orthogonal complements on the right-hand side of (4.18) have dimensions γ ≤ n−2 and δ = 1 respectively. The dimension of their sum is then not greater than γ +δ ≤ n−1, so the subspace M = (parC)∩(parH) has dimension at least 1. Let w be any nonzero element of M. For each µ ∈ R the points c±µw belong to H. As c ∈ riC, for µ close to zero they belong to C also, hence to E. But as E is compact, there are largest values µ+ and µ− such that c+ := c+µ+w ∈ C and c− := c−µ−w ∈ C. The points c+ and c− belong to the relative boundary of C (because otherwise we could increase either µ+ or µ− without leaving C), and c is a convex combination of these points. ⊓⊔

Here is the representation theorem.


Theorem 4.3.4. Let C be a closed convex set in Rn containing no line. Let X be the set of extreme points of C, and let Y be the set of points y that generate extreme rays of C. Then C = convX +posY.

Proof. If the dimension of C is −1, 0, or 1 then the proof is immediate by enumeration of cases. Therefore assume that C has dimension k > 1 and, for induction, that the theorem holds for each closed convex set in Rn containing no line and having dimension less than k. We have Y ⊂ rcC, so C ⊃ convX + posY. To complete the proof we show that the opposite inclusion holds.

Lemma 4.3.3 shows that each point c of C can be written as a convex combination of two points of the relative boundary rbC. Thus we need only show that any point w of rbC belongs to convX + posY. To do so, apply Theorem 4.2.11 to show that w belongs to the relative interior of some proper face F of C. By Proposition 4.2.6, dimF ≤ k−1. Moreover, F is convex and by Proposition 4.2.3 it is closed. The induction hypothesis then says that F can be written as convX′ + posY′, where X′ consists of the extreme points of F and Y′ of the points that generate extreme rays of F. But X′ ⊂ X and Y′ ⊂ Y, so w ∈ convX +posY. ⊓⊔
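For a concrete instance of Theorem 4.3.4 (our own illustration; the particular set below is an assumption, not taken from the text), let C = {(x,y) | y ≥ |x|}. This closed set contains no line, its only extreme point is the origin, and its extreme rays are generated by (1,1) and (−1,1); the decomposition of any point of C in the form convX + posY can be computed directly.

```python
# C = {(x, y) : y >= |x|}: X = {(0, 0)} (extreme points),
# Y = {(1, 1), (-1, 1)} (generators of the extreme rays).

def decompose(p):
    """Write p in C as the extreme point (0,0) plus a*(1,1) + b*(-1,1), a, b >= 0."""
    x, y = p
    assert y >= abs(x), "p must lie in C"
    a = (y + x) / 2.0      # coefficient on the extreme ray through (1, 1)
    b = (y - x) / 2.0      # coefficient on the extreme ray through (-1, 1)
    return a, b

p = (0.5, 2.0)
a, b = decompose(p)
assert a >= 0 and b >= 0
assert (a - b, a + b) == p   # a*(1,1) + b*(-1,1) reconstructs p
```

Since the convex-hull part of the representation is the single point (0,0), the decomposition here is in fact unique; Exercise 4.3.8 below shows that uniqueness can fail in general.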

Although we stated this theorem for sets containing no lines, we can easily extend it to provide a structure result for general closed convex sets.

Corollary 4.3.5. Let C be a closed convex subset of Rn. Let C′ = C ∩(linC)⊥; define X to be the set of extreme points of C′ and Y to be the set of elements y that generate extreme rays of C′. Then C = convX +posY + linC.

Proof. Using Proposition A.6.22, we can show that C = C′ + linC. But by construction C′ contains no line, so Theorem 4.3.4 shows that C′ = convX +posY. ⊓⊔

Corollary 4.3.6. A compact convex subset of Rn is the convex hull of its extreme points.

Proof. A compact set contains no line, so Theorem 4.3.4 applies; such a set also has no extreme rays, so the representation in the theorem reduces to convX. ⊓⊔

In infinite-dimensional spaces the statement of Corollary 4.3.6 has to be modified: a compact convex subset of a locally convex topological vector space is the closed convex hull of its extreme points (that is, the intersection of all closed convex sets containing the set of extreme points). This is the Krein-Milman theorem.

4.3.1 Exercises for Section 4.3

Exercise 4.3.7. Let X and Y be subsets of Rn. Show that convX + posY is the smallest convex set C such that C contains X and the set of recession directions of C contains Y.


Exercise 4.3.8. Consider the sets

X = {(−1,−1), (1,−1)}, Y = {(−1,1), (1,1)}.

Let C = convX + posY. By considering the origin (a point of C), show that the representation of a point in the minimal expansion given by Theorem 4.3.4 need not be unique.

4.4 Notes and references

Chapter 5
Polyhedrality

This chapter analyzes a class of convex sets that is extremely important in applications, and that also has many nice properties not shared by general convex sets. It turns out that these sets can be described in two different ways, and the fact that these two descriptions really refer to the same class of sets is the central result of the chapter. The first section introduces these sets and examines some of their properties.

5.1 Polyhedral convex sets

Definition 5.1.1. A convex subset C of Rn is polyhedral if C is the intersection of a finite collection of closed halfspaces. ⊓⊔

A convenient way of representing a polyhedral convex set in Rn is as the solution set of some system of linear inequalities Ax ≤ a, where A is a linear transformation from Rn to Rm and a ∈ Rm. We use the symbol ≤ to express the partial order on Rm that defines u ≤ v to mean that for each i, ui ≤ vi. For a matrix A and a subset I of the set A indexing the rows of A, AI denotes the submatrix constructed of the rows of A whose indices belong to I. We write cI for the complement of I in A.

Remark 5.1.2. If we translate a polyhedral convex set by some amount, the result is still polyhedral. To see this, suppose that the set is P := {x | Ax ≤ a}, where A ∈ Rm×n. If v ∈ Rn then

P−v = {x−v | Ax ≤ a} = {y | Ay ≤ a−Av},

which is also polyhedral.

The same property holds for finitely generated sets, but it is easier to see. If X and Y are finite subsets of Rn with X nonempty, and if F = conv(X,Y), then for each v ∈ Rn we have F −v = conv(X −v,Y). ⊓⊔
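The translation identity in Remark 5.1.2 can be checked numerically; the sketch below (an illustration of ours, with an arbitrarily chosen polyhedron and shift) tests that x ∈ P exactly when x − v satisfies the translated system.

```python
# P = {x in R^2 : x1 <= 2, x2 <= 3, -x1 - x2 <= -1}, translated by v.
A = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
a = [2.0, 3.0, -1.0]

def mat_vec(M, x):
    return [sum(row[j] * x[j] for j in range(len(x))) for row in M]

def in_poly(M, rhs, x):
    """Membership test for {x | Mx <= rhs} (small tolerance for roundoff)."""
    return all(l <= r + 1e-12 for l, r in zip(mat_vec(M, x), rhs))

v = [1.0, -2.0]
a_shift = [ai - av for ai, av in zip(a, mat_vec(A, v))]   # right-hand side a - Av

for x in [[0.0, 1.0], [2.0, 3.0], [5.0, 5.0], [0.0, 0.0]]:
    y = [x[0] - v[0], x[1] - v[1]]                        # the translated point
    assert in_poly(A, a, x) == in_poly(A, a_shift, y)
```

The loop exercises both points inside and outside P, mirroring the algebraic identity P − v = {y | Ay ≤ a − Av}.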


Section 4.2 established general properties of faces, but we will need to use some special properties that for the most part hold only for faces of polyhedral convex sets. We first introduce some convenient tools for handling such faces.

Suppose we represent a nonempty polyhedral convex subset C of Rn by writing

C = {x | Ax ≤ a}, (5.1)

where A is an m×n matrix and a ∈ Rm. If I is a subset of {1, . . . ,m} then we write AI and aI for the matrix and vector formed from those rows of A and of a respectively, whose indices belong to I. We define cI to be {1, . . . ,m}\I and use the convention just introduced to define the matrix AcI and the vector acI. If either I or cI is empty then the associated matrix and vector are vacuous.

We will examine subsets of C having the form

CI := {x | AIx = aI , AcIx ≤ acI} (5.2)

for some index set I ⊂ {1, . . . ,m}. If for the set I there exists a point xI satisfying

AIxI = aI , and if cI ≠ ∅ then AcIxI < acI , (5.3)

then we call I an exact index set for C, and we write EC for the collection of index sets (with respect to the representation (5.1)) that are exact for C. If I is exact then (5.3) shows that xI ∈ CI.

Example 5.1.3. Suppose we take C = R+ and begin with the representation

C = {x | [−1]x ≤ 0}. (5.4)

For this representation there are two index sets, ∅ and {1}. If we take I = ∅ in (5.2) then cI = {1} and C∅ = {x | [−1]x ≤ 0}, which is R+. We can take any positive real number for x∅. On the other hand, if I = {1} then cI = ∅, so C{1} = {x | [−1]x = 0} = {0}, and x{1} = 0. Each of the two index sets is exact for C. In each case we used the vacuity of a system of equations or inequalities corresponding to the empty index set.

For additional perspective, we may represent the same set C = R+ in the form

C = {x | [−1; 0] x ≤ [0; 0]}. (5.5)

We now have four index sets: ∅, {1}, {2}, and {1,2}. These generate the following sets CI. Accompanying each CI is an example of xI if one exists.

• C∅ = R+; x∅ does not exist;
• C{1} = {0}; x{1} does not exist;
• C{2} = R+; x{2} = 1;
• C{1,2} = {0}; x{1,2} = 0.

In the second part of this example the set I is exact if it is either {2} or {1,2}, but not otherwise. We see in both parts that the exact index sets are in one-to-one correspondence with the nonempty faces of C. The next theorem shows that this will be the case for each nonempty polyhedral convex C. ⊓⊔
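The second representation in Example 5.1.3 is small enough to check by machine. The sketch below (our illustration; it uses 0-based row indices, so the text's rows 1 and 2 appear as 0 and 1, and a coarse grid of candidate points, which suffices in this one-dimensional example) enumerates the exact index sets.

```python
from itertools import combinations

# Representation (5.5) of C = R+: rows A = (-1, 0), right-hand side a = (0, 0).
A = [-1.0, 0.0]
a = [0.0, 0.0]
rows = range(2)

def is_exact(I, candidates):
    """I is exact iff some x satisfies A_I x = a_I and A_cI x < a_cI."""
    cI = [i for i in rows if i not in I]
    for x in candidates:
        if all(A[i] * x == a[i] for i in I) and all(A[i] * x < a[i] for i in cI):
            return True
    return False

grid = [k / 4.0 for k in range(-8, 9)]       # candidate points in [-2, 2]
exact = [set(I) for r in range(3) for I in combinations(rows, r)
         if is_exact(set(I), grid)]

assert exact == [{1}, {0, 1}]                # i.e. {2} and {1,2} in 1-based indexing
```

The two exact index sets found correspond to the two nonempty faces R+ and {0} of C, exactly as the example asserts.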

Theorem 5.1.4. Let C be a nonempty polyhedral convex subset of Rn, and suppose that A is an m×n matrix such that for some a ∈ Rm,

C = {x | Ax ≤ a}. (5.6)

Let E be the collection of exact index sets for the representation (5.6) of C. Define F′(C) = F (C)\{∅}. For F ∈ F′(C) let

γ(F) = {i | for each x ∈ F, Aix = ai}, (5.7)

and for I ∈ E let

ϕ(I) = {x | AIx = aI , AcIx ≤ acI}. (5.8)

Then the following results hold.

a. The functions γ and ϕ provide a one-to-one correspondence between the collections F′(C) of nonempty faces of C and E of exact index sets for its representation (5.6).

b. If F ∈ F′(C) and I = γ(F) then

F = {x | AIx = aI , AcIx ≤ acI}, (5.9)
affF = {x | AIx = aI}, (5.10)
parF = kerAI , (5.11)
riF = {x | AIx = aI , AcIx < acI}, (5.12)
rcF = {z | AIz = 0, AcIz ≤ 0}, (5.13)
linF = kerA. (5.14)

Thus each face of C is polyhedral, as is the recession cone of that face, and the lineality space of each face is that of C.

As each point of C belongs to the relative interior of some face of C (Theorem 4.2.11), part (c) of the theorem gives a complete description of the normal cone of C at each of its points as a finitely generated convex set.

The correspondence in part (a) of Theorem 5.1.4 applies to nonempty faces only. The empty face of C cannot be of the form ϕ(I) for I ∈ E : if it were, then no point x would satisfy

AIx = aI , AcIx ≤ acI ,

whereas the definition of E requires that there be a point xI satisfying the stronger condition

AIxI = aI , AcIxI < acI .

On the other hand, Example 5.1.3 showed that the empty index set may or may not be in E, depending upon the representation used.


Proof. We establish the one-to-one correspondence by proving first that

γ : F ′(C)→ E , ϕ : E → F ′(C) (5.15)

and then that

γ ∘ ϕ = idE , ϕ ∘ γ = idF′(C) . (5.16)

For the proof of (5.15) let F ∈ F′(C) and define I := γ(F). F is nonempty, and for each point x ∈ F and each i ∈ I we have Aix = ai. If I = {1, . . . ,m} then I ∈ E ; define xI to be any point of F. Otherwise, for each j ∈ cI there is a point xj ∈ F with Ajxj < aj. Define

xI = |cI|−1 ∑_{j∈cI} xj ;

that is, xI is the average of the points xj. Then AIxI = aI and AcIxI < acI , so that I ∈ E and therefore γ : F′(C) → E.

Note for future reference that we have shown that for each F ∈ F′(C) the set γ(F) is an exact index set, so that there is a point xγ(F) with

Aγ(F)xγ(F) = aγ(F), Acγ(F)xγ(F) < acγ(F). (5.17)

In particular xγ(F) belongs to F.

Now let I ∈ E and let F = ϕ(I). The form of ϕ(I) shows that F is a convex subset of C. As I is exact, some point xI satisfies (5.3), so that it belongs to ϕ(I) and therefore F is nonempty. If x and x′ belong to C and µ ∈ (0,1) with x′′ := (1−µ)x+µx′ ∈ F, then

0 = AIx′′−aI = (1−µ)[AIx−aI ]+µ [AIx′−aI ].

Each of the two quantities in square brackets on the right is nonpositive, so each must be zero: therefore x and x′ belong to F, which must then be a face of C. This shows that ϕ : E → F′(C), and completes the proof of (5.15).

To prove (5.16) we begin by showing that γ ∘ ϕ = idE. Let I ∈ E and define G := ϕ(I) and J := γ(G) = (γ ∘ ϕ)(I). By what we have already shown, G ∈ F′(C) and J ∈ E. We will show that J = I. As I was any element of E, this will prove that γ ∘ ϕ = idE.

For each i ∈ I and each x ∈ G we have Aix = ai, so i ∈ γ(G) = γ[ϕ(I)] and therefore I ⊂ (γ ∘ ϕ)(I). On the other hand, suppose k ∈ {1, . . . ,m}\I. We know that xI ∈ ϕ(I), but as k ∉ I, AkxI < ak by the definition of xI. Therefore k ∉ γ[ϕ(I)]. This shows that (γ ∘ ϕ)(I) ⊂ I, so that γ ∘ ϕ = idE.

For the second part of (5.16), let F ∈ F′(C) and write J := γ(F) and G := ϕ(J) = (ϕ ∘ γ)(F). We will show first that F ⊂ G and then that G ⊂ F.

For an arbitrary x ∈ F the definition of J shows that for each j ∈ J we have Ajx = aj, so that AJx = aJ and AcJx ≤ acJ. This shows that x ∈ ϕ(J) = G. It follows that F ⊂ G.

Now let x ∈ G = ϕ(J). By choice of x we have


AJx = aJ , AcJx ≤ acJ .

Write x′ for the point xγ(F), which belongs to F. As γ(F) = J, x′ satisfies

AJx′ = aJ , AcJx′ < acJ .

A calculation shows that for small positive ε the point x′′ = x′ +ε(x′ −x) belongs to C. As F is a face of C and

F ∋ x′ = (1+ε)−1x′′ +ε(1+ε)−1x,

we have x ∈ F and therefore G ⊂ F. This shows that ϕ ∘ γ = idF′(C) and completes the proof of (5.16). The representation (5.8) shows that each nonempty face of C is polyhedral, and the empty face is trivially polyhedral, so we have established part (a) of the theorem.

Now let F ∈ F′(C) and define I := γ(F). We showed in part (a) that ϕ(I) = F, so

F = {x | AIx = aI , AcIx ≤ acI}. (5.18)

The set S := {x | AIx = aI} is affine and contains F, so it contains affF. Next, let x ∈ S and write xI for the point xγ(F) constructed in the proof of part (a). We showed there that xI ∈ F. For small positive ε let x′ = xI −ε(x−xI). Using (5.17) and (5.18), we see that if ε is small enough then x′ ∈ F. But then

x = ε−1(1+ ε)xI − ε−1x′,

which shows that x is an affine combination of two points of F and therefore is in affF, so that S ⊂ affF. Then S = affF, which establishes (5.10).

For (5.12), let R = {x | AIx = aI , AcIx < acI}. Let x ∈ R and x′ ∈ affF. Then (5.10) shows that AIx′ = aI. If x′ is close enough to x then we have AcIx′ < acI , and then (5.18) shows that x′ ∈ F. Therefore x ∈ riF, and as x was any point of R we have R ⊂ riF. As riF ⊂ F, to show that riF ⊂ R it suffices to show that if x ∈ F\R then x ∉ riF. This choice of x implies that it satisfies (5.18) but that for some j ∈ cI we have Ajx = aj. Now for positive µ let x′ = x+µ(x−xI). This x′ is an affine combination of x and xI, so it is in affF. However,

(Ax′−a) j = (1+µ)(Ax−a) j −µ(AxI −a) j =−µ(AxI −a) j > 0,

so that x′ is not in C, hence certainly not in F. This shows that there are points of affF arbitrarily close to x that do not belong to F, so that x cannot be in riF. Therefore riF ⊂ R, so that riF = R and we have proved (5.12).

To prove (5.13) and (5.14), it suffices to use (5.18) together with the fact that the recession cone of a nonempty set defined by a finite set of linear equations and inequalities is the set obtained by replacing the right-hand sides of the equations and inequalities by zero, and the lineality space is the set obtained by additionally changing all the inequalities to equations (Exercise 4.1.25). In particular, (5.13) shows that rcF is polyhedral, and so is linF because any subspace is polyhedral. This establishes part (b) and completes the proof. ⊓⊔

Theorem 5.1.4 is very useful in establishing properties of polyhedral convex sets. The next theorem illustrates how one can use it to determine the form of the normal cone on faces of such a set.

Theorem 5.1.5. Let C be a nonempty polyhedral convex subset of Rn, and suppose that A is an m×n matrix such that for some a ∈ Rm,

C = {x | Ax ≤ a}.

Let F be a nonempty face of C and define I to be the exact index set associated with F by part (a) of Theorem 5.1.4. If I is empty then F = C and C has a nonempty interior, on which NC( ·) is identically zero. If I is nonempty then for each f ∈ F

A∗I (R|I|+ ) ⊂ NC( f ), (5.19)

with equality if and only if f ∈ riF.

Proof. If I is empty then there is a point x ∈ F with Ax < a. The strict inequality shows that x ∈ intC, so C has a nonempty interior containing x. As x ∈ F, part (a) of Proposition 4.2.3 says that C ⊂ F. But C ⊃ F so in fact F = C. The definition of the normal cone shows that it is zero on the interior of C.

Suppose then that I is nonempty, and write K for the cone A∗I (R|I|+ ). We show next that if f ∈ F then K ⊂ NC( f ), which will establish (5.19). Suppose that k ∈ K: then there is some v ∈ R|I|+ with k = vAI. For each c ∈ C we have

k(c− f ) = vAI(c− f ) = v(AIc−AI f )≤ 0, (5.20)

where we used the facts that AIc ≤ aI , AI f = aI , and v ≥ 0. As k was any element of K, (5.20) shows that K ⊂ NC( f ).

To prove the final assertion, let f ∈ F and suppose first that f ∈ riF. Part (b) of Theorem 5.1.4 shows that AcI f < acI. If also g ∉ K, then the Farkas lemma provides some w ∈ Rn with AIw ≤ 0 and ⟨g,w⟩ > 0. If ε is a sufficiently small positive number then

AI( f + εw) = aI + εAIw ≤ aI , AcI( f + εw)< acI ,

so that f + εw ∈C. However,

⟨g,( f + εw)− f ⟩= ε⟨g,w⟩> 0,

and this shows that g /∈ NC( f ). Accordingly, if f ∈ riF then K ⊃ NC( f ) so thatK = NC( f ). In the remaining case f belongs to the relative boundary of F , and thenby Part (b) of Theorem 5.1.4 there must be some index j ∈ cI with A j f = a j. Foreach c ∈ C we have A jc ≤ a j, so that ⟨A j,c− f ⟩ ≤ 0 and therefore A j ∈ NC( f ). Ifthere were some y ∈ R|I| with A j = yAI , then for each f ′ ∈ F we would have

Page 93: 20151103 CFD

5.1 Polyhedral convex sets 85

A j f ′ = yAI f ′ = ⟨y,aI⟩,

so that the value of A j f ′ would not depend on the choice of f ′ ∈ F . But we knowthat A j f = a j and Part (b) of Theorem 5.1.4 shows that whenever f ′ ∈ riF we haveA j f ′ < a j, so that no such y can exist. In particular, A j /∈ K so that K ⊃ NC( f ) andtherefore K = NC( f ). ⊓⊔
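The inclusion (5.19) is easy to test numerically on a small example. The sketch below, in Python with NumPy, uses the unit square (an illustrative choice of data, not an example from the text) and checks that every nonnegative combination of the active rows at a vertex is normal to C there.

```python
import numpy as np

# Sketch (assumed example data, not from the text): verify the inclusion
# (5.19), A_I^*(R^{|I|}_+) ⊂ N_C(f), for the unit square C = {x | Ax <= a}.
A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
a = np.array([1.0, 1.0, 0.0, 0.0])
vertices = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)

f = np.array([1.0, 1.0])              # a point of the face F = {(1,1)}
active = np.isclose(A @ f, a)         # the exact index set I at f
A_I = A[active]

rng = np.random.default_rng(0)
for _ in range(100):
    v = rng.uniform(0.0, 5.0, size=A_I.shape[0])   # arbitrary v >= 0
    k = A_I.T @ v                                  # k = A_I^* v
    # k ∈ N_C(f) means <k, c - f> <= 0 for all c ∈ C; since C is the
    # convex hull of its vertices, checking the vertices suffices.
    assert np.all(vertices @ k - f @ k <= 1e-9)
```

Here f is a vertex, so it lies in the relative interior of the face F = {f} and the theorem even gives equality in (5.19).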

The next proposition is useful in visualizing the polarity relationship for polyhedral sets. We will also need it for the proof of the Minkowski-Weyl theorem.

Proposition 5.1.6. Let X ∈ R^{n×p}, Y ∈ R^{n×q}, and e_p ∈ R^p with each component equal to 1, and let 0_n, 0_p and 0_q be zero vectors of dimensions n, p, and q respectively. Define

P := \left\{ u \,\middle|\, \begin{bmatrix} 0_n^* \\ X^* \\ Y^* \end{bmatrix} u \le \begin{bmatrix} 1 \\ e_p \\ 0_q \end{bmatrix} \right\}   (5.21)

and

F := \left\{ v \,\middle|\, \text{for some } s \in R^{p+1}_+,\ t \in R^q_+,\ \begin{bmatrix} 0_n & X & Y \\ -1 & -e_p^* & 0_q^* \end{bmatrix} \begin{bmatrix} s \\ t \end{bmatrix} = \begin{bmatrix} v \\ -1 \end{bmatrix} \right\}.   (5.22)

Then P and F are nonempty and mutually polar.

The set P is polyhedral, while F is finitely generated; therefore both are closed. The definitions of the two sets show that each contains the origin. Moreover, any polyhedral convex set containing the origin can be put in the form (5.21) by simple rearrangement and scaling of the defining inequalities, while any finitely generated set containing the origin can be expressed in the form (5.22). It causes no difficulty to have p or q, or both, equal to zero.

Proof. First suppose u ∈ P and v ∈ F, and write D for the matrix in (5.22). As u ∈ P we have from (5.21)

D^* \begin{bmatrix} u \\ 1 \end{bmatrix} \le 0.   (5.23)

Using this with (5.22) we find that

⟨u^*, v⟩ − 1 = \begin{bmatrix} u^* & 1 \end{bmatrix} \begin{bmatrix} v \\ -1 \end{bmatrix} = \begin{bmatrix} u^* & 1 \end{bmatrix} D \begin{bmatrix} s \\ t \end{bmatrix} = \left\langle D^* \begin{bmatrix} u \\ 1 \end{bmatrix}, \begin{bmatrix} s \\ t \end{bmatrix} \right\rangle \le 0,

because s and t are nonnegative. As u and v were arbitrary elements of the two sets, we have F ⊂ P°.

To show that P° ⊂ F, suppose that v ∉ F; then the equation in (5.22) has no solution in nonnegative variables s and t. By the Farkas lemma there exist r ∈ R^n and ρ ∈ R such that

D^* \begin{bmatrix} r \\ ρ \end{bmatrix} \le 0,   \begin{bmatrix} v^* & -1 \end{bmatrix} \begin{bmatrix} r \\ ρ \end{bmatrix} > 0.   (5.24)

The two inequalities in (5.24) yield

ρ ≥ 0,   \begin{bmatrix} X^* & -e_p \\ Y^* & 0_q \end{bmatrix} \begin{bmatrix} r \\ ρ \end{bmatrix} \le 0,   ⟨v^*, r⟩ > ρ.   (5.25)

If ρ = 0 then the second inequality in (5.25) shows that r ∈ rc P. As 0 ∈ P, the halfline {0 + αr | α ∈ R_+} lies in P. However, the third inequality in (5.25) shows that then the inner product ⟨v^*, αr⟩ takes arbitrarily large values for α ∈ R_+, so v ∉ P°. Therefore ρ > 0, and as (5.24) is positively homogeneous in the pair (r, ρ) we can suppose that ρ = 1. Putting this in (5.24) we find that r ∈ P but that ⟨v^*, r⟩ > 1, so that in this case also v ∉ P°. Therefore P° ⊂ F, so in fact F = P°. By taking polars we then find that F° = P°° = P, so that P and F are mutually polar. ⊓⊔
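The polarity pairing in Proposition 5.1.6 can be illustrated concretely: the square [−1,1]² is a polyhedral set of the form (5.21), and its polar is the finitely generated cross-polytope conv{±e₁, ±e₂}, a set of the form (5.22). The Python check below samples both sets and verifies the polarity inequality; the specific sets are our own illustrative choice, not the book's.

```python
import numpy as np

# Illustration (example sets chosen by us, not the book's): the square
# P = [-1,1]^2 is of the form (5.21); its polar F is the cross-polytope
# conv{±e1, ±e2}, a finitely generated set of the form (5.22).
gens = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])

rng = np.random.default_rng(1)
for _ in range(200):
    u = rng.uniform(-1.0, 1.0, size=2)     # u ∈ P
    w = rng.dirichlet(np.ones(4))          # convex coefficients
    v = w @ gens                           # v ∈ F = conv(gens)
    assert u @ v <= 1.0 + 1e-12            # the polarity inequality <u,v> <= 1

# A point just outside F fails the inequality against some u ∈ P,
# consistent with F being exactly the polar of P.
assert np.array([1.0, 1.0]) @ np.array([0.8, 0.8]) > 1.0
```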

5.2 The Minkowski-Weyl theorem and consequences

We have seen that polyhedral convex sets containing the origin have finitely generated polars, and vice versa. This section proves more: namely, that a convex set is polyhedral if and only if it is finitely generated. This is often called the Minkowski-Weyl theorem, though the terminology varies with different authors. We prove the theorem in the following subsection, and then present some consequences in Section 5.2.2.

5.2.1 The Minkowski-Weyl theorem

Here is the main theorem about polyhedral and finitely generated convex sets.

Theorem 5.2.1 (Minkowski-Weyl). If C is a convex subset of R^n, then C is polyhedral if and only if C is finitely generated.

Proof. If C is empty there is nothing to prove, so we assume that C is nonempty. If C does not contain the origin, then we translate it to a set that does, prove the equivalence for that set, and translate back to C. Remark 5.1.2 shows that the equivalence holds for C if and only if it holds for the translated set. Therefore there is no loss of generality in supposing that C contains the origin.

(Only if) Suppose C is polyhedral. Let L = lin C and C′ = C ∩ L^⊥; then Proposition A.6.22 shows that C = L ⊕ C′. Corollary 4.3.5 shows that C is the sum of L and the generalized convex hull of the extreme points and extreme rays of C′, and Exercise 4.2.21 showed that the faces of C′ are in one-to-one correspondence with those of C. Extreme points are faces of dimension 0, and extreme rays are faces of dimension 1. C is closed, and Theorem 5.1.4 shows that it has only finitely many faces. Therefore the sets of extreme points and extreme rays of C′ are finite, so that C′ is finitely generated and therefore so is C′ + L = C.

(If) Now suppose C is finitely generated. As it contains the origin, Proposition 5.1.6 shows that C° is polyhedral. As a polar set, it must contain the origin, and we have just shown that a polyhedral convex set containing the origin is finitely generated, so C° is finitely generated. Another application of Proposition 5.1.6 shows that C°° is polyhedral. But C°° = C, because C is closed (being finitely generated) and contains the origin, so C is polyhedral. ⊓⊔

5.2.2 Applications

Theorem 5.2.1 has many useful consequences. Here are a few of them.

Corollary 5.2.2. If C is a convex subset of R^n, then C is polyhedral if and only if it is closed and has finitely many faces.

Proof. (Only if) As C is polyhedral, the definition shows that it is closed and Theorem 5.1.4 shows that it has only finitely many faces.

(If) Suppose C is closed and has finitely many faces. Use the argument in the "only if" part of the proof of Theorem 5.2.1 to decompose C into C′ + L, where L is the lineality space, but in place of Theorem 5.1.4 use the hypothesis that C has finitely many faces to conclude that C′, and therefore also C, is finitely generated. Now Theorem 5.2.1 says it is also polyhedral. ⊓⊔

Corollary 5.2.3. Let L be a linear transformation from R^n to R^m. If C is a polyhedral convex subset of R^n then L(C) is polyhedral; if D is a polyhedral convex subset of R^m then L^{-1}(D) is polyhedral.

Proof. As C is polyhedral, Theorem 5.2.1 shows that it is finitely generated. It is immediate that the linear image of a finitely generated set is finitely generated. Therefore L(C) is finitely generated and therefore polyhedral.

As D is polyhedral we can represent it as {y | E(y) ≤ e} for some matrix E and vector e. Then L^{-1}(D) = {x | EL(x) ≤ e}, which is polyhedral. ⊓⊔

It follows in particular that the sum of a finite collection of polyhedral convex subsets of R^n is polyhedral, and that the projection of a polyhedral convex set onto a subspace is also polyhedral, because the projector on a subspace is linear.
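The projection statement can be made computational. One classical way to compute the projection of {x | Ax ≤ b} onto a coordinate subspace is Fourier–Motzkin elimination; the sketch below is a standard construction (not an algorithm taken from this text), with no attempt at redundancy removal.

```python
import numpy as np
from itertools import product

def fourier_motzkin(A, b, j):
    """Eliminate variable j from {x | Ax <= b}; return (A2, b2) describing
    the projection onto the remaining coordinates.  Standard construction;
    no redundancy removal is attempted."""
    pos = [i for i in range(len(b)) if A[i, j] > 1e-12]
    neg = [i for i in range(len(b)) if A[i, j] < -1e-12]
    zer = [i for i in range(len(b)) if abs(A[i, j]) <= 1e-12]
    rows, rhs = [], []
    for i in zer:                        # rows not involving x_j survive as-is
        rows.append(np.delete(A[i], j))
        rhs.append(b[i])
    for p, n in product(pos, neg):       # pair each upper bound with each lower bound
        r = A[p] / A[p, j] - A[n] / A[n, j]
        rows.append(np.delete(r, j))     # the coefficient of x_j cancels to zero
        rhs.append(b[p] / A[p, j] - b[n] / A[n, j])
    return np.array(rows), np.array(rhs)

# Project the triangle {(x,y) | x + y <= 1, x >= 0, y >= 0} onto the x-axis.
A = np.array([[1.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
b = np.array([1.0, 0.0, 0.0])
Ap, bp = fourier_motzkin(A, b, 1)
for x, inside in [(0.5, True), (1.5, False), (-0.1, False)]:
    assert bool(np.all(Ap @ np.array([x]) <= bp + 1e-9)) == inside  # interval [0,1]
```

Each elimination step keeps the description polyhedral, which is exactly the content of the projection remark above.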

Corollary 5.2.4. Let C be a nonempty polyhedral convex subset of R^n and let L : R^n → R^m be a linear transformation. Then rc L(C) = L(rc C).

Proof. Theorem 5.2.1 shows that C is finitely generated, so that C = conv X + pos Y for finite subsets X and Y of R^n with X ≠ ∅, and therefore rc C = pos Y by Proposition 4.1.7. But L(C) = conv L(X) + pos L(Y), so

rc L(C) = pos L(Y) = L(pos Y) = L(rc C). ⊓⊔

Corollary 5.2.5. Let C and D be polyhedral convex subsets of R^n. Then C and D can be strongly separated by a hyperplane if and only if they are disjoint.


Proof. For strong separation it is necessary that C and D be disjoint. To see that this is also sufficient, recall that by Theorem 2.2.7 strong separation is possible if and only if the origin does not belong to the closure of C − D. If C and D are disjoint then the set C − D does not contain the origin, and it is closed, being polyhedral by the remark just preceding Corollary 5.2.4. ⊓⊔

Corollary 5.2.6. If C is a polyhedral convex subset of R^n then C° is polyhedral.

Proof. Consider the sequence C, C°, C°°, and C°°° obtained by successive application of the polarity operation. We know that C°° contains the origin, and by Theorem 2.2.5 that C°° = cl conv(C ∪ {0}). As C is polyhedral it is also finitely generated. If C = conv(X, Y) then conv(C ∪ {0}) = conv(X ∪ {0}, Y), which is finitely generated and therefore closed, so C°° is finitely generated and contains the origin. By Proposition 5.1.6, C°°° is polyhedral and contains the origin. But this set is the same as C°. ⊓⊔

Corollary 5.2.7. If C is a nonempty polyhedral convex subset of R^n, then cl H_C is polyhedral.

Proof. Theorem 4.2.17 showed that F(cl H_C) is in one-to-one correspondence with F(C) ∪ F(rc C). As both C and rc C are polyhedral, Corollary 5.2.2 shows that each has only finitely many faces. Then F(cl H_C) is also a finite set, and an application of Corollary 5.2.2 shows that cl H_C is polyhedral. ⊓⊔

Corollary 5.2.8. If C is a polyhedral convex subset of R^n, then the barrier cone of C is polyhedral.

Proof. C is polyhedral, hence also finitely generated. Write C = P + K, where P is the convex hull of some finite set and K = pos Y for some finite set Y = {y_1, ..., y_k}. The set P is compact. Therefore for each x* ∈ R^n the linear functional ⟨x*, ·⟩ is bounded above on C if and only if it is bounded above on K, and that happens if and only if x* ∈ K°. Therefore bc C = K°. We have

K° = ∩_{i=1}^{k} {z* | ⟨z*, y_i⟩ ≤ 0},

and this is polyhedral. ⊓⊔

Next we look at a question that is very important for applications: given a nonzero z* for which ⟨z*, ·⟩ is bounded above on a polyhedral convex subset C of R^n, are there any maximizers and, if so, what can one say about the structure of the set of all such maximizers?

Polyhedrality makes this question interesting, because in general no such maximizers need exist even when the set C is closed and convex. For example, the problem of maximizing the function −x_2 over the closed convex subset C of R^2 defined by C = {x | x_1 > 0, x_1 x_2 ≥ 1} has no solutions.


A very nice property of polyhedral convex sets, and a central result in the theory of linear programming, is that there are always maximizers; in fact, the set of all these maximizers forms a nonempty face of C, as we now show.

Proposition 5.2.9. Let C be a nonempty polyhedral convex subset of R^n, and let z* be a nonzero element of R^n. If ⟨z*, ·⟩ is bounded above on C, then the set of maximizers of ⟨z*, ·⟩ on C is a nonempty face of C.

Proof. As C is polyhedral, by Theorem 5.2.1 it is finitely generated. Therefore there are finite subsets X = {x_1, ..., x_P} and Y = {y_1, ..., y_Q} of points of R^n, with X nonempty, such that C = conv X + pos Y. As ⟨z*, ·⟩ is bounded above on C, we must have

⟨z*, y_q⟩ ≤ 0,   q = 1, ..., Q,   (5.26)

because if that were not so then for some such q we could make ⟨z*, ·⟩ increase without bound along the halfline {x_1 + µy_q | µ ≥ 0}, which is a subset of C. In turn, (5.26) implies that ⟨z*, y⟩ ≤ 0 for each y ∈ pos Y.

We know that every element of C is the sum of an element x of conv X and an element y of pos Y. For any given x we can maximize ⟨z*, x + y⟩ by choosing y = 0, so under our boundedness hypothesis the supremum of ⟨z*, ·⟩ on C is the same as its supremum on conv X. The latter value, say ζ, is attained (in fact, at some point x_p ∈ X). Therefore ⟨z*, ·⟩ attains its maximum value of ζ on C. It follows that the hyperplane H := {x | ⟨z*, x⟩ = ζ} contains C in its lower closed halfspace, and moreover that the intersection F := C ∩ H is nonempty. By Proposition 4.2.7, F is a face of C. ⊓⊔
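The argument of Proposition 5.2.9 translates directly into a computation once C is given in the finitely generated form conv X + pos Y. The sketch below, with invented data, mirrors the proof: check (5.26) for the recession generators, then take the maximum over the finite set X.

```python
import numpy as np

# Sketch of Proposition 5.2.9 in coordinates (data invented for illustration):
# maximize <z*, .> over C = conv X + pos Y.  Boundedness forces (5.26), and
# the maximum is then attained at some generator x_p ∈ X.
X = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 1.0]])   # generators of conv X
Y = np.array([[-1.0, 0.0], [0.0, -1.0]])             # recession generators
z = np.array([3.0, 1.0])                             # the functional z*

assert np.all(Y @ z <= 0)        # (5.26): <z*, .> is bounded above on C
zeta = np.max(X @ z)             # sup over C = max over the finite set X
argmax = np.where(np.isclose(X @ z, zeta))[0]        # maximizing generators
```

The maximizers of ⟨z*, ·⟩ over C form the face determined by these maximizing generators together with the recession directions orthogonal to z*.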

5.2.3 Exercises for Section 5.2

Exercise 5.2.10. Let P and Q be nonempty convex subsets of R^m and R^n respectively. Show that P × Q is polyhedral if and only if each of P and Q is polyhedral.

Exercise 5.2.11. Show that if S is a convex subset of R^n and y ∈ R^n, then S + y is polyhedral if S is polyhedral, and finitely generated if S is finitely generated.

Exercise 5.2.12. Let P be a polyhedral convex subset of R^n that is representable as P = {x | Ax ≤ a} for some matrix A and vector a. Let x_0 ∈ P and let I be the active set at x_0: that is, the set of row indices such that A_I x_0 = a_I and A_cI x_0 < a_cI. Show that

a. N_P(x_0) = A^*_I(R^{|I|}_+) = {A^*_I u^* | u^* ≥ 0};
b. T_P(x_0) = (A_I)^{-1}(R^{|I|}_−) = {z | A_I z ≤ 0};
c. There is a neighborhood Q of the origin in R^n such that Q ∩ (P − x_0) = Q ∩ T_P(x_0).

Suggestion. Proposition 3.3.2 can save some work.


5.3 Polyhedrality and structure

For a polyhedral convex cone, we can sharpen the conclusions of Theorem 4.2.18.

Lemma 5.3.1. Let C be a nonempty polyhedral convex set. Then each nonempty face of C is exposed.

Proof. Let F be a nonempty face of C. Apply Theorem 5.1.4 to produce a representation of C as {x | Ax ≤ a} and an index set I such that F = {x | A_I x = a_I, A_cI x ≤ a_cI}. If |I| = 0 then F = C = N_C^{-1}(0), so that F is exposed. If |I| > 0 choose a strictly positive element y* ∈ R^{|I|} and set µ = ⟨y*, a_I⟩.

For any x′ ∈ C,

⟨A^*_I y*, x′⟩ = ⟨y*, A_I x′⟩ ≤ ⟨y*, a_I⟩ = µ,

while equality holds whenever x ∈ F because there ⟨A^*_I y*, x⟩ = ⟨y*, A_I x⟩ = µ. Therefore the maximum of ⟨A^*_I y*, ·⟩ on C is µ and it is attained everywhere on F. If x ∈ C\F then A_cI x ≤ a_cI, so for some i ∈ I we must have A_i x < a_i. As y* > 0 we then have ⟨A^*_I y*, x⟩ = ⟨y*, A_I x⟩ < µ, so x is not a maximizer of ⟨A^*_I y*, ·⟩ on C. Therefore F is exactly the set of maximizers; as this set is N_C^{-1}(A^*_I y*), F is exposed. ⊓⊔

Theorem 5.3.2. Let K be a nonempty polyhedral convex cone in R^n, and let F be a nonempty face of K. Write L for par F. Then F = K ∩ L and for each x ∈ ri F, N_K(x) has the constant value F* = K° ∩ L^⊥. This F* is a face of K° with par F* = L^⊥ and, for each x* ∈ ri F*, F = N_{K°}(x*).

Proof. We proved most of the assertions in Theorem 4.2.18. Two things remain, one of which is the final statement. Theorem 4.2.18 establishes this under the hypothesis that F is an exposed face of K. Given Lemma 5.3.1, that must be true here.

The other remaining assertion is that par F* = L^⊥. As F* = K° ∩ L^⊥, we have par F* ⊂ L^⊥. For the opposite inclusion, we show first that if x_0 ∈ ri F then lin T_K(x_0) = L. Represent K as {x | Ax ≤ a}, and find an index set I such that F = {x | A_I x = a_I, A_cI x ≤ a_cI}, with strict inequality holding for all indices in cI whenever x ∈ ri F. Then A_cI x_0 < a_cI, so T_K(x_0) = A_I^{-1}(R^{|I|}_−) and lin T_K(x_0) = ker A_I = par F = L.

Now Exercise 4.1.27 says that [par N_K(x_0)]^⊥ = lin[T_K(x_0)]. As we have shown that lin T_K(x_0) = L, we have par N_K(x_0) = L^⊥ as required. ⊓⊔

For a polyhedral set we can demonstrate a much closer relationship between the critical cone and critical face than that for general convex sets. In Equation (4.12) we showed that ϕ_C(z_0) − x_0 ⊂ κ_C(z_0), but noted that ϕ_C(z_0) might have smaller dimension than κ_C(z_0). The following proposition shows that in the polyhedral case, not only are the dimensions the same but in fact ϕ_C(z_0) − x_0 and κ_C(z_0) coincide on some neighborhood of the origin, and each is obtainable from the other.

Proposition 5.3.3. Let P be a nonempty polyhedral convex subset of R^n and let z_0 ∈ R^n. Let x_0 = Π_P(z_0).


a. We have

κ_P(z_0) = cone[ϕ_P(z_0) − x_0],   ϕ_P(z_0) = [x_0 + κ_P(z_0)] ∩ P.   (5.27)

b. There is a neighborhood Q of the origin such that

Q ∩ [ϕ_P(z_0) − x_0] = Q ∩ κ_P(z_0).   (5.28)

In particular, ϕ_P(z_0) and κ_P(z_0) have the same dimension.

Proof. We first prove (b). Part (c) of Exercise 5.2.12 shows that there is a neighborhood Q of the origin such that Q ∩ (P − x_0) = Q ∩ T_P(x_0). If we define x*_0 = z_0 − x_0 then

Q ∩ [ϕ_P(z_0) − x_0] = Q ∩ (P − x_0) ∩ {v | ⟨x*_0, v⟩ = 0} = Q ∩ T_P(x_0) ∩ {v | ⟨x*_0, v⟩ = 0} = Q ∩ κ_P(z_0),

which proves (5.28). For the assertion about dimension, apply Corollary 1.1.15 to obtain

dim[ϕ_P(z_0) − x_0] = dim(Q ∩ [ϕ_P(z_0) − x_0]) = dim[Q ∩ κ_P(z_0)] = dim κ_P(z_0).

As translation of a convex set does not affect its dimension, dim ϕ_P(z_0) = dim κ_P(z_0).

To prove (a), we first apply Proposition 3.3.6 to obtain cone(P − x_0) ⊂ T_P(x_0). Next, define L to be the subspace of elements v ∈ R^n such that ⟨z_0 − x_0, v⟩ = 0. The stronger part of Exercise 3.1.19 then yields [cone(P − x_0)] ∩ L = cone[(P − x_0) ∩ L]. Then

cone[ϕ_P(z_0) − x_0] = cone[(P − x_0) ∩ L] = [cone(P − x_0)] ∩ L ⊂ T_P(x_0) ∩ L = κ_P(z_0).

For the opposite inclusion, start with v ∈ κ_P(z_0); then v ∈ T_P(x_0), so part (b) says that for some positive µ, µv ∈ P − x_0. But v ∈ L, so µv ∈ (P − x_0) ∩ L = ϕ_P(z_0) − x_0, and then v ∈ cone[ϕ_P(z_0) − x_0], which establishes the first part of (a).

For the second part of (a), we use the first part to establish that ϕ_P(z_0) ⊂ x_0 + κ_P(z_0). But ϕ_P(z_0) ⊂ P, so in fact ϕ_P(z_0) ⊂ [x_0 + κ_P(z_0)] ∩ P. For the opposite inclusion, choose x ∈ [x_0 + κ_P(z_0)] ∩ P and for nonnegative µ let x_µ = x_0 + µ(x − x_0). For µ ∈ [0,1] we have x_µ ∈ P, and as x − x_0 ∈ κ_P(z_0), also x_µ ∈ x_0 + κ_P(z_0). If we choose a sufficiently small positive µ then

x_µ − x_0 ∈ κ_P(z_0) ∩ Q = [ϕ_P(z_0) − x_0] ∩ Q,

implying that x_µ ∈ ϕ_P(z_0). As x_µ is then a point of the face ϕ_P(z_0) lying properly between the points x and x_0 of P, both x and x_0 must belong to ϕ_P(z_0). Therefore [x_0 + κ_P(z_0)] ∩ P ⊂ ϕ_P(z_0), so the second part of (a) holds. ⊓⊔


Polyhedral convex sets have a useful property not shared by general closed convex sets: unless they are affine, they have facets. We use the following definition, which does not agree with some others in the literature.

Definition 5.3.4. Let C be a convex subset of R^n. A facet of C is a nonempty face of C having dimension (dim C) − 1. ⊓⊔

In general, a closed convex set need not have a facet: for example, the Euclidean unit ball in R^2 has faces of dimensions 0 and 2, but not 1. No affine set has a facet, because such sets have no nonempty proper faces. However, if a nonempty polyhedral convex set is not affine (that is, if it has a nonempty relative boundary), then it has a facet.

Theorem 5.3.5. Let P be a polyhedral convex subset of R^n. Then each point of the relative boundary of P belongs to some facet of P.

Proof. If P has no relative boundary then there is nothing to prove; this case arises if P is either affine or empty. Otherwise, suppose x_0 belongs to rb P; then the dimension k of P must be at least 1. If k < n then apply Theorem A.6.19 to find an affine homeomorphism that preserves inner products and that maps aff P onto R^k. We may therefore suppose that we are working in R^k and that the set P has full dimension. As then x_0 ∉ int P, Exercise 2.2.15 shows that P has a supporting hyperplane at x_0, so N_P(x_0) ≠ {0}.

Exercise 5.2.12 shows that the tangent and normal cone operators associated with P have polyhedral values. Therefore the cones T := T_P(x_0) and N := N_P(x_0) are polyhedral convex cones dual to each other. The cone N contains no line (otherwise P could not have full dimension) and it is not the origin. The fundamental representation theorem (Theorem 4.3.4) says that N is the convex hull of its extreme points and extreme rays. As N is a cone its only extreme point is the origin, and there are only finitely many extreme rays because each is a face of N, which is polyhedral. But as N is not the origin, there must be at least one extreme ray R; suppose it is generated by n*_0.

Now apply Theorem 5.3.2 with K = N, K° = T, and F = R, and let z_0 = x_0 + n*_0. The critical cone κ_P(z_0) is the face F* = T ∩ span(n*_0)^⊥ of T that is identified in the theorem. The critical face ϕ_P(z_0) is a face of P containing x_0 and having the same dimension as κ_P(z_0). But the theorem tells us that par κ_P(z_0) = span(n*_0)^⊥, so that ϕ_P(z_0) has dimension k − 1 and is therefore a facet of P containing x_0. ⊓⊔

We can use Theorem 5.3.5 recursively to show that P has faces of every dimension from dim lin P to dim P inclusive.

Corollary 5.3.6. Let P be a polyhedral convex subset of R^n having dimension k. If M := lin P has dimension m ∈ [0,k], then for each j ∈ [m,k] P has a nonempty face F_j with dim F_j = j, and no face of P has dimension outside the interval [m,k]. Moreover, F_j has the form M + G_{j−m}, where G_{j−m} is a face of P ∩ M^⊥ having dimension j − m; one has

P ∩ M^⊥ =: G_{k−m} ⊃ ... ⊃ G_0,   P =: F_k ⊃ ... ⊃ F_m.   (5.29)


Proof. The hypothesis requires k ≥ 0, so P cannot be empty. P is affine if and only if m = k, in which case we take F_m = P and G_0 to be the single point P ∩ M^⊥, and we are finished.

In the remaining case, m < k. As M = lin P is also the lineality space of each face of P (Exercise 4.2.19), no face of P can have dimension less than m, and certainly no face can have dimension greater than k.

We have P = M + Q with Q = P ∩ M^⊥, and Exercise 4.2.21 shows that the faces F of P are in one-to-one correspondence with the faces G of Q through the maps F = G + M and G = F ∩ M^⊥. As G ⊂ M^⊥, if we write q = dim Q then we have dim F = m + dim G for each such pair; taking F = P and G = Q, we have k = m + q.

By construction, Q contains no lines and has positive dimension, so rb Q is nonempty and Theorem 5.3.5 shows that Q has a facet G_{q−1} of dimension q − 1. As Q contains no lines, this facet can be affine only if q − 1 = 0, whereas if q − 1 > 0 we can apply the theorem to G_{q−1} to obtain a facet G_{q−2} of G_{q−1}. But a face of a face is a face, so G_{q−2} is a face of Q having dimension q − 2. We can continue this process until the face G_0 produced at the qth stage has dimension 0 (so that it is an extreme point of Q), and we then have a sequence G_0, ..., G_q of faces of Q with dim G_i = i. The first inclusion in (5.29) follows from this construction. Corresponding to this sequence is the sequence F_m, ..., F_k of faces of P obtained by setting F_{i+m} = M + G_i, and the second inclusion in (5.29) then follows from the first. ⊓⊔

5.4 Polyhedral multifunctions

Definition 5.4.1. A multifunction F from a set X to a set Y is an operator associating with each point x ∈ X a subset F(x) of Y. The graph of F is the subset gph F of X × Y defined by

gph F = {(x,y) ∈ X × Y | y ∈ F(x)}.

Very often we will omit the label gph and just write F for gph F, saying for example that (x,y) ∈ F instead of (x,y) ∈ gph F.

The value F(x) of such a multifunction could be empty for some (or every) x. The points of X where it is nonempty, if any, form a subset dom F of X called the effective domain of F, and the points of Y that belong to F(x) for some x form the image of F, written im F. Thus dom F and im F are the projections of gph F on X and Y respectively.

Definition 5.4.2. If F is a multifunction from X to Y, the inverse of F is the multifunction F^{-1} : Y → X defined by (y,x) ∈ F^{-1} if and only if (x,y) ∈ F.

This definition implies that (F^{-1})^{-1} = F. If S ⊂ X then the forward image of S under F is

F(S) := ∪_{x∈S} F(x),   (5.30)

and if T ⊂ Y then the inverse image of T under F is

F^{-1}(T) := ∪_{y∈T} F^{-1}(y),   (5.31)

so that the inverse image of T under F is the forward image of T under the multifunction F^{-1}. We always have (F ∘ F^{-1})(T) ⊃ T, and therefore by exchanging the roles of F and F^{-1} also (F^{-1} ∘ F)(S) ⊃ S, but in general we cannot replace ⊃ by =. For example, if X = R = Y and

F(x) = \begin{cases} [0,1] & \text{if } x ∈ [0,1], \\ ∅ & \text{otherwise,} \end{cases}

then for each x ∈ [0,1]

(F ∘ F^{-1})(x) = [0,1] = (F^{-1} ∘ F)(x).
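For finite sets, the definitions above can be sketched with a multifunction stored as a dict of sets; the data below are invented for illustration. The last assertion also shows how (F ∘ F^{-1})(T) ⊃ T can be strict.

```python
# Finite sketch of Definitions 5.4.1-5.4.2 (invented data): a multifunction
# between finite sets stored as a dict mapping each point to a set of values.
def inverse(F):
    """Inverse multifunction: (y, x) ∈ F^{-1} iff (x, y) ∈ F."""
    Finv = {}
    for x, ys in F.items():
        for y in ys:
            Finv.setdefault(y, set()).add(x)
    return Finv

def image(F, S):
    """Forward image F(S) = union of F(x) over x ∈ S, as in (5.30)."""
    out = set()
    for x in S:
        out |= F.get(x, set())
    return out

F = {0: {'a', 'b'}, 1: {'b'}, 2: set()}   # F(2) is empty, so 2 ∉ dom F
Finv = inverse(F)
assert Finv == {'a': {0}, 'b': {0, 1}}

# (F ∘ F^{-1})(T) ⊃ T, and the inclusion can be strict:
T = {'b'}
assert image(F, image(Finv, T)) == {'a', 'b'}   # strictly larger than T
```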

The sets F(x) for x ∈ X are the values of F, and if Y is a topological space we say that F has closed values if for each x, F(x) is a closed set.

In many applications one finds multifunctions whose graphs are either convex polyhedra or unions of such polyhedra. These multifunctions have particularly good continuity properties, some of which this section develops.

5.4.1 Elementary properties

Some properties of multifunctions can be easily visualized in terms of the graph. If X and Y are linear spaces we say that a multifunction is positively homogeneous if its graph is a (not necessarily convex) cone, and homogeneous if its graph is a union of lines through the origin.

Definition 5.4.3. If X and Y are topological spaces, a multifunction F : X → Y is closed if gph F is closed in X × Y. It is locally closed at a point (x,y) ∈ gph F if there is a neighborhood N of (x,y) in X × Y such that N ∩ gph F is closed in N.

As the definition of closedness involves only the graph, F is closed if and only if F^{-1} is closed.

If the space X is T1 (in particular, if it is R^n) then each closed multifunction has closed values, but the example of F : R → R with F(x) defined to be [0,1] if 0 < x < 1 and empty otherwise shows that the converse need not hold.

Definition 5.4.4. If X and Y are linear spaces, a multifunction F : X → Y is convex if its graph is convex.

Convex multifunctions are sometimes called graph-convex.

Definition 5.4.5. If X and Y are metric spaces, a multifunction F : X → Y is bounded if im F is a bounded set. It is locally bounded at a point x ∈ X if there is a neighborhood N of x such that F(N) is bounded.


Definition 5.4.6. A multifunction is polyhedral if its graph is the union of a finite collection of polyhedral convex sets, called components of the graph. It is polyhedral convex if it is polyhedral and the graph is convex.

The graph of a polyhedral convex multifunction is by definition convex. On the other hand, the graph of a polyhedral multifunction is a union of convex sets, so it need be neither convex nor even connected.

One very common example of a polyhedral convex multifunction occurs in the case of a feasible set S in R^n defined by a set of linear inequalities:

S = {x ∈ R^n | Ax ≤ a_0},

where A is a matrix of dimension k × n. For purposes of analysis, we could include linear equations in the definition of S simply by writing each equation as two opposite inequalities.

If we now want to consider such a feasible set not only for a fixed right-hand side a_0 but also for any a ∈ R^k, then we can write this dependence in the form of a multifunction:

S(a) = {x ∈ R^n | Ax ≤ a}.   (5.32)

The graph of the multifunction S defined by (5.32) is

gph S = \left\{ (a,x) \in R^k × R^n \,\middle|\, \begin{bmatrix} -I & A \end{bmatrix} \begin{bmatrix} a \\ x \end{bmatrix} \le 0 \right\},

which is a polyhedral convex cone. So this example is very special, because the multifunction is actually positively homogeneous as well as being polyhedral convex.
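The cone property of gph S can be confirmed directly: membership is defined by the homogeneous inequality −a + Ax ≤ 0, so it is preserved under nonnegative scaling. A small check with an arbitrary made-up matrix A:

```python
import numpy as np

# Check (with an arbitrary example matrix A) that gph S is a cone: the
# defining inequality -a + Ax <= 0 is homogeneous, so membership survives
# scaling by any nonnegative factor.
A = np.array([[1.0, 1.0], [-1.0, 2.0]])

def in_gph_S(a, x):
    return bool(np.all(A @ x - a <= 1e-12))

a = np.array([2.0, 3.0])
x = np.array([0.5, 0.5])
assert in_gph_S(a, x)
for lam in (0.0, 0.5, 7.0):
    assert in_gph_S(lam * a, lam * x)     # positive homogeneity of gph S
```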

Another familiar example of a polyhedral convex multifunction is a linear operator A: its graph, being a subspace, is also a convex cone, so that the operator is homogeneous. A third familiar example is an affine transformation.

The following proposition shows that the normal-cone operator associated with the values of a multifunction like S is polyhedral. Thus, it is an example of a nontrivial polyhedral multifunction that is not convex. If we are interested in parametric variational problems like

0 ∈ f(a,x) + N_{S(a)}(x),   (5.33)

then the polyhedrality of the normal-cone operator may be useful to us.

Proposition 5.4.7. Let S be a polyhedral convex multifunction from R^k to R^n. Then the multifunction G : R^k × R^n → R^n defined by

G(a,x) = N_{S(a)}(x)   (5.34)

is polyhedral.

Proof. The graph of G is

gph G = {(a,x,x*) | (a,x) ∈ gph S, x* ∈ N_{S(a)}(x)}.


As gph S is a polyhedral convex set there exist matrices V (m × k) and W (m × n), and an element s ∈ R^m, such that

gph S = {(a,x) | Va + Wx ≤ s}.

For any subset A of {1,...,m}, let cA be the complement of A in {1,...,m}. For Q ⊂ {1,...,m} let V_Q, W_Q, and s_Q consist of those rows of V and W, and elements of s, whose indices are in Q. Define a (possibly empty) subset P_A of R^k × R^n × R^n by

P_A = {(a,x) | V_A a + W_A x = s_A, V_cA a + W_cA x ≤ s_cA} × W_A^T(R^{|A|}_+),

where in the case A = ∅ we interpret the factor on the right to be the origin of R^n. Evidently P_A is a polyhedral convex set. If there is no pair (a,x) ∈ gph S with V_A a + W_A x = s_A, then P_A = ∅ ⊂ gph G. On the other hand, if such a pair (a,x) exists then

V_A a + W_A x = s_A,   V_cA a + W_cA x ≤ s_cA.

A calculation shows that W_A^T(R^{|A|}_+) ⊂ N_{S(a)}(x), from which it follows that P_A ⊂ gph G. Therefore

∪_{A⊂{1,...,m}} P_A ⊂ gph G.   (5.35)

On the other hand, for any point (a,x,x*) of gph G let A be the active set for S(a) at x: that is, the unique subset A (perhaps empty) of {1,...,m} such that

V_A a + W_A x = s_A,   V_cA a + W_cA x < s_cA.

Then [41, Theorem 6.46] shows that N_{S(a)}(x) = W_A^T(R^{|A|}_+), so that (a,x,x*) ∈ P_A. Accordingly,

∪_{A⊂{1,...,m}} P_A ⊃ gph G.   (5.36)

From (5.35) and (5.36) we obtain

gph G = ∪_{A⊂{1,...,m}} P_A,

which shows that G is a polyhedral multifunction. ⊓⊔
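A one-dimensional instance makes the decomposition of gph G into the pieces P_A concrete. Taking S(a) = {x ∈ R | x ≤ a} (our own example, so that V = [−1], W = [1], s = [0] in the proof's notation), the two pieces are P_∅ = {(a,x) | x ≤ a} × {0} and P_{1} = {(a,x) | x = a} × R_+, and their union is gph G:

```python
# One-dimensional instance of the proof's decomposition (our own example):
# S(a) = {x ∈ R | x <= a}, i.e. V = [-1], W = [1], s = [0].  The pieces are
#   P_∅   = {(a, x) | x <= a} × {0},
#   P_{1} = {(a, x) | x = a}  × R_+,
# and gph G is their union.
def in_gph_G(a, x, xstar):
    if x > a:
        return False          # x ∉ S(a), so N_{S(a)}(x) is empty
    if x < a:
        return xstar == 0     # constraint inactive: normal cone is {0}
    return xstar >= 0         # constraint active: normal cone is W^T R_+ = R_+

assert in_gph_G(1.0, 0.5, 0.0) and not in_gph_G(1.0, 0.5, 1.0)
assert in_gph_G(1.0, 1.0, 3.0) and not in_gph_G(1.0, 1.0, -1.0)
```

Note that the graph is a union of two convex sets but is not itself convex, matching the remark before the proposition.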

The class of polyhedral multifunctions is closed under several common operations.

Proposition 5.4.8. Let T : R^n → R^m, U : R^n → R^m, V : R^m → R^r, and W : R^n → R^r be polyhedral multifunctions. Then the following multifunctions are also polyhedral:

1. T^{-1};
2. T × W, where (T × W)(x) := T(x) × W(x);
3. T + U;
4. T ∩ U, where (T ∩ U)(x) := T(x) ∩ U(x);
5. T ∪ U, where (T ∪ U)(x) := T(x) ∪ U(x);
6. V ∘ T.


Proof. The fundamental tool for this proof is the double description theorem [37, Theorem 5.3.2], which says that any polyhedral convex subset P of R^n can be expressed in two ways: as

P = A^{-1}(a + R^k_−) = {x ∈ R^n | Ax ≤ a},

for some k × n matrix A and some a ∈ R^k, where A^{-1} denotes the inverse image under the map A, and also as

P = \begin{bmatrix} B & C \end{bmatrix}(∆ × R^s_+) = \{ x \in R^n \,\middle|\, x = ∑_{i=1}^{r} B_i δ_i + ∑_{j=1}^{s} C_j γ_j \},

with all δ_i and γ_j nonnegative and with ∑_{i=1}^{r} δ_i = 1. Here ∆ is the unit simplex in R^r consisting of elements δ ∈ R^r_+ whose coordinates δ_i sum to 1, and B and C are matrices of dimensions n × r and n × s respectively, with columns B_i and C_j. Either of ∆ and R^s_+, but not both, may be vacuous. A direct consequence of the double description theorem is that the forward or inverse image of a polyhedral convex set under an affine transformation is a polyhedral convex set.

Suppose that T is a polyhedral multifunction from R^n to R^m whose graph has components τ_k ⊂ R^n × R^m, for k = 1,...,K. Thus, the τ_k are polyhedral convex sets. Then the graph of T^{-1} has components

σ_k = \begin{bmatrix} 0 & I_m \\ I_n & 0 \end{bmatrix} τ_k,   k = 1,...,K,

where I_m and I_n are the identity matrices of R^m and R^n respectively. The σ_k are polyhedral convex sets, so T^{-1} is a polyhedral multifunction.

For the Cartesian product, let the components of gph W be β_i, i = 1,...,I. Then the components of gph(T × W) are the sets of the form

σ_{ki} = \begin{bmatrix} I_n & 0 & 0 \\ 0 & I_m & 0 \\ I_n & 0 & 0 \\ 0 & 0 & I_r \end{bmatrix}^{-1} (τ_k × β_i),   k = 1,...,K,  i = 1,...,I,

some of which may be empty, and these are polyhedral convex sets.

some of which may be empty, and these are polyhedral convex sets.For addition, let the components of U be ρ j, j = 1, . . . ,J. Then the components

of gphT +U are the sets of the form

πk j =

[In 0 00 Im Im

]In 0 00 Im 0In 0 00 0 Im

−1

(τk ×ρ j), k = 1, . . . ,K, j = 1, . . . ,J,

Page 106: 20151103 CFD

98 5 Polyhedrality

which are polyhedral convex sets. Here we used the second assertion, already estab-lished.

For intersection, let U be as above; then the components of gph(T ∩ U) are the sets of the form

ν_{kj} = \begin{bmatrix} I_n & 0 \\ 0 & I_m \\ I_n & 0 \\ 0 & I_m \end{bmatrix}^{-1} (τ_k × ρ_j),   k = 1,...,K,  j = 1,...,J,

which are polyhedral convex sets.

For union, write G for the graph of T ∪ U. Any point of G must belong to some component τ_k of T or to some component ρ_j of U. Accordingly, G ⊂ (gph T) ∪ (gph U). Conversely, if (x,y) ∈ (gph T) ∪ (gph U) then y ∈ T(x) or y ∈ U(x). Then y ∈ (T ∪ U)(x), so (gph T) ∪ (gph U) ⊂ G and therefore

gph(T ∪ U) = gph T ∪ gph U;

this is a union of finitely many polyhedral convex sets, so T ∪ U is polyhedral.

Finally, for composition suppose that the components of V are α_h, h = 1,...,H. Then the components of gph(V ∘ T) are the sets of the form

µ_{kh} = \begin{bmatrix} I_n & 0 & 0 \\ 0 & 0 & I_r \end{bmatrix} \begin{bmatrix} I_n & 0 & 0 \\ 0 & I_m & 0 \\ 0 & I_m & 0 \\ 0 & 0 & I_r \end{bmatrix}^{-1} (τ_k × α_h),   k = 1,...,K,  h = 1,...,H,

which are polyhedral convex sets. ⊓⊔

One can apply Proposition 5.4.8 to show polyhedrality of other useful combinations of multifunctions. For example, suppose T is as in that proposition and L_A and L_B are affine transformations from R^m to R^q and from R^p to R^n respectively. As all three of these are polyhedral multifunctions, we can apply the result on composition twice using the expression

L_A ∘ T ∘ L_B = L_A ∘ (T ∘ L_B),

to conclude that L_A ∘ T ∘ L_B is polyhedral.

5.4.2 Polyhedral convex multifunctions

We develop next several results about the behavior of polyhedral convex multifunctions. They are useful in applications, and in addition we will need some of them in establishing properties of the more general polyhedral multifunctions.


Lemma 5.4.9. Let P be a polyhedral convex multifunction from R^n to R^m. Then for any bounded subset S of dom P there is a real number σ such that for each x ∈ S, d_{P(x)}(0) ≤ σ.

The property asserted in this lemma doesn't hold for general multifunctions, as you can see by considering the multifunction C from R2 to R whose graph is

{(x, y, z) | x > 0, z ≥ y²/(2x)} ∪ {(0, 0, z) | z ≥ 0}.

Although its representation in this form looks complicated, the graph is simple: it is a right circular cone (ice-cream cone) centered on the 45° line in the positive quadrant of the xz-plane, and just touching the x and z axes.

This multifunction is often useful for finding counterexamples to plausible conjectures. In the present case, if for k = 1, 2, … we take xk = 0.5k^{-3} and yk = k^{-1}, then C(xk, yk) = [k, +∞). Therefore although the sequence (xk, yk) converges to the origin, the sequence of distances d_{C(xk,yk)}(0) equals k, which diverges to +∞.
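A short computation confirms the blow-up; the sketch below (not from the text) simply evaluates the least element of C(x, y):

```python
import math

def d_C_0(x, y):
    """Distance from the origin to C(x, y), where the graph of C is
    {(x, y, z) | x > 0, z >= y^2/(2x)} together with {(0, 0, z) | z >= 0}."""
    if x > 0:
        return max(0.0, y * y / (2.0 * x))   # least element of [y^2/(2x), +inf)
    if x == 0 and y == 0:
        return 0.0                           # C(0, 0) = [0, +inf) contains 0
    return math.inf                          # C(x, y) is empty

# Along x_k = 0.5 k^{-3}, y_k = k^{-1} the arguments tend to the origin,
# yet d_{C(x_k, y_k)}(0) = k grows without bound.
for k in (1, 10, 100):
    print(k, d_C_0(0.5 * k**-3, 1.0 / k))
```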

Proof. Let S be a bounded subset of dom P, and let Q be any bounded polyhedral convex subset of Rn containing S. If π1 is the canonical projector from Rn × Rm to Rn taking (x, y) to x, then since π1 is linear the set dom P, which is π1(P), is polyhedral convex, and therefore so is the set F = Q ∩ dom P, which contains S. But F is bounded; hence Theorem 4.3.4 says it is the convex hull of its (finite) set of extreme points f1, …, fK. For k = 1, …, K let pk be the projection of the origin on the polyhedral convex set P(fk), that is, the element of least norm in P(fk), and let σ be the maximum of the norms ∥pk∥ for k = 1, …, K.

Choose any point x of F; then there are convex coefficients µ1, …, µK such that x = ∑_{k=1}^{K} µk fk. Let y := ∑_{k=1}^{K} µk pk. As the pairs (fk, pk) for k = 1, …, K belong to P, we have

(x, y) = (∑_{k=1}^{K} µk fk, ∑_{k=1}^{K} µk pk) = ∑_{k=1}^{K} µk (fk, pk) ∈ P.

This shows that y ∈ P(x), so if z is the element of least norm in P(x) then ∥z∥ ≤ ∥y∥. However, we also have

∥y∥ = ∥∑_{k=1}^{K} µk pk∥ ≤ ∑_{k=1}^{K} µk ∥pk∥ ≤ σ,

so that d_{P(x)}(0) = ∥z∥ ≤ σ. ⊓⊔

It is sometimes useful to apply Lemma 5.4.9 to the inverse of the multifunction under consideration. Thus, suppose T is a polyhedral convex multifunction from Rn to Rm, and let Q be any bounded subset of im T. Apply the lemma to T^{-1} and Q to conclude that there is some µ such that for each q ∈ Q, d_{T^{-1}(q)}(0) ≤ µ. But this means that there is some p ∈ Rn with q ∈ T(p) and ∥p∥ ≤ µ: in other words, that Q ⊂ T(µBn).

The following theorem gives an important bound on the error in solutions of systems of linear inequalities. We use the notation x+, where x ∈ Rn, for the vector in Rn with


(x+)i = xi if xi ≥ 0, and (x+)i = 0 if xi < 0.

That is, x+ is the (Euclidean) projection of x on Rn+. We also define x− by x = x+ + x−.

Theorem 5.4.10 (Hoffman, 1952). Let A be an m × n matrix. Then there is a constant γA such that for each a ∈ A(Rn) + Rm+ and for each x ∈ Rn there exists an x′ ∈ Rn with Ax′ ≤ a and

∥x − x′∥ ≤ γA ∥(Ax − a)+∥. (5.37)

An interpretation of this theorem is that if the system of inequalities Ax ≤ a has any solution at all, then the distance of the point x from the solution set of the inequalities is bounded above by a constant multiple of the “residual” ∥(Ax − a)+∥; furthermore, this constant can depend on A but not on a.
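A minimal numerical illustration of Hoffman's bound, for an invented one-row example (not from the text): for a single inequality ⟨c, v⟩ ≤ b, the projection of x onto the solution set is x − ((⟨c, x⟩ − b)+/∥c∥²)c, so the distance from x to the solution set equals (1/∥c∥)·∥(Ax − a)+∥; that is, γA = 1/∥c∥ here, and the bound holds with equality:

```python
import numpy as np

c = np.array([3.0, -4.0])          # one inequality: 3 v1 - 4 v2 <= b
b = 2.0
gamma = 1.0 / np.linalg.norm(c)    # = 1/5 for this c

rng = np.random.default_rng(0)
for _ in range(100):
    x = 10.0 * rng.normal(size=2)
    residual = max(0.0, float(c @ x) - b)        # ||(Ax - a)_+|| for this 1-row A
    x_proj = x - (residual / float(c @ c)) * c   # projection onto the half-space
    assert float(c @ x_proj) <= b + 1e-9                      # x_proj is feasible
    assert abs(np.linalg.norm(x - x_proj) - gamma * residual) <= 1e-9
```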

To include systems of linear equations along with the inequalities in this result, write each equation as two opposite inequalities.

Proof. Given A, define a multifunction Q from Rm to Rn by

Q(a) := {x ∈ Rn | Ax − a ∈ Rm−}.

The graph of Q is the set

{(x, a) | [A −I](x, a) ∈ Rm−} = [A −I]^{-1}(Rm−).

As this is a polyhedral convex set, Q is a polyhedral convex multifunction.

For each nonempty subset I of {1, …, m} define a positively homogeneous polyhedral convex multifunction PI from Rn to Rm by

PI(w) = {y | A∗y = w, y ∈ Rm+, yi = 0 for each i ∉ I}.

By Lemma 5.4.9, the distance from the origin to PI(·) is bounded above on the intersection of the unit ball Bn with dom PI. This intersection is nonempty because the graph of PI is a polyhedral convex cone and therefore contains the origin. Let γA be the maximum of these bounds over all nonempty subsets I of {1, …, m}.

Now choose any a ∈ A(Rn) + Rm+; then Q(a) := {x | Ax ≤ a} is a nonempty polyhedral convex set. Let x ∈ Rn and let x′ be the unique point in Q(a) closest to x. If x′ = x then we certainly have (5.37).

If x′ ≠ x then x′ belongs to the relative interior of a unique face F of Q(a). Let I be the exact index set associated with that face by Theorem 5.1.4 applied to Q(a). The set I cannot be empty, because if it were then by Theorem 5.1.5 Q(a) would have a nonempty interior. But then x′ would have to lie in that interior, and as x′ ≠ x, x′ could not be the closest point in Q(a) to x. Therefore I is nonempty.


The difference x − x′ belongs to the normal cone of Q(a) at x′, and by Theorem 5.1.5 that normal cone is A∗_I(R^{|I|}_+). Therefore we can find y ∈ Rm+ such that A∗y = x − x′ and yi = 0 for each i ∉ I. The (nonempty) set of all such y is PI(x − x′). Define t := (x − x′)/∥x − x′∥; then t ∈ (dom PI) ∩ Bn. Choose z to be the element of least norm in PI(t), so that ∥z∥ ≤ γA, and define y′ := ∥x − x′∥z; then y′ ∈ PI(x − x′) and ∥y′∥ ≤ γA∥x − x′∥. Then x − x′ = A∗y′, and as y′i = 0 for each i ∉ I and (Ax′ − a)i = 0 for each i ∈ I, we have ⟨y′, Ax′ − a⟩ = 0, while as y′ ≥ 0 and (Ax − a)− ≤ 0 we have ⟨y′, (Ax − a)−⟩ ≤ 0. Therefore

∥x − x′∥² = ⟨x − x′, x − x′⟩ = ⟨A∗y′, x − x′⟩ = ⟨y′, A(x − x′)⟩
= ⟨y′, (Ax − a) − (Ax′ − a)⟩
= ⟨y′, Ax − a⟩
= ⟨y′, (Ax − a)+ + (Ax − a)−⟩
≤ ⟨y′, (Ax − a)+⟩
≤ γA ∥x − x′∥ ∥(Ax − a)+∥, (5.38)

where the last line follows from the Schwarz inequality and the bound ∥y′∥ ≤ γA∥x − x′∥. Now (5.37) follows from (5.38) upon dividing by ∥x − x′∥. ⊓⊔

One consequence of Hoffman's theorem is a very useful property of polyhedral convex multifunctions.

Corollary 5.4.11. If F is a nonempty polyhedral convex multifunction from Rn to Rm, then the restriction of F to dom F is Lipschitzian in the Hausdorff metric.

Proof. Let the graph of F be the polyhedral convex set

G := {(x, y) ∈ Rn × Rm | Cx + Dy ≤ a},

where C and D are matrices having n and m columns respectively, with a common number of rows equal to the dimension of a. Let γD be the constant associated with D by Hoffman's theorem and define λ := γD∥C∥. Choose any points x′ and x′′ in dom F.

If y′ ∈ F(x′) we have Dy′ ≤ a − Cx′. Hoffman's theorem applied to the system Dy ≤ a − Cx′′ says that there is a point y′′ with Dy′′ ≤ a − Cx′′, so that y′′ ∈ F(x′′), and with

∥y′ − y′′∥ ≤ γD ∥(Dy′ − [a − Cx′′])+∥. (5.39)

However,

Dy′ − [a − Cx′′] = (Cx′ + Dy′ − a) + C[x′′ − x′] ≤ C[x′′ − x′],

where the inequality holds because y′ ∈ F(x′). Then

0 ≤ (Dy′ − [a − Cx′′])+ ≤ (C[x′′ − x′])+,

and then (5.39) yields


∥y′ − y′′∥ ≤ γD ∥(Dy′ − [a − Cx′′])+∥ ≤ γD ∥(C[x′′ − x′])+∥,

so that
e[F(x′), F(x′′)] ≤ γD ∥(C[x′′ − x′])+∥. (5.40)

Reversing the roles of x′ and x′′ yields

e[F(x′′), F(x′)] ≤ γD ∥(C[x′ − x′′])+∥. (5.41)

For any vector z, (−z)+ = −(z−), so with z = C[x′′ − x′] we have

∥(C[x′ − x′′])+∥ = ∥−(C[x′′ − x′])−∥ = ∥(C[x′′ − x′])−∥,

and by using this in (5.41) we obtain

e[F(x′′), F(x′)] ≤ γD ∥(C[x′′ − x′])−∥. (5.42)

The Hausdorff distance ρ[F(x′), F(x′′)] is the maximum of the left-hand sides of (5.40) and (5.42). The right-hand side of each of those two inequalities is not greater than γD ∥C[x′′ − x′]∥, so

ρ[F(x′), F(x′′)] ≤ γD ∥C[x′′ − x′]∥ ≤ λ∥x′′ − x′∥.

As x′ and x′′ were arbitrary points of dom F, the multifunction F is Lipschitzian on dom F in the Hausdorff metric with modulus λ. ⊓⊔
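As a tiny numerical illustration of the corollary, consider the polyhedral convex multifunction F(x) = [x, x + 1] on R (an invented example, not from the text). For nonempty closed bounded intervals the Hausdorff distance is the larger of the two endpoint gaps, and here it equals |x′ − x′′| exactly, so F is Lipschitzian on dom F = R with modulus 1:

```python
def hausdorff(a, b):
    """Hausdorff distance between closed intervals a = [a0, a1] and b = [b0, b1]."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

def F(x):
    # F(x) = [x, x+1]; its graph {(x, y) | x <= y <= x + 1} is polyhedral convex
    return (x, x + 1.0)

for x1, x2 in [(0.0, 0.25), (-3.0, 5.0), (2.0, 2.0)]:
    # Lipschitz bound of Corollary 5.4.11 with modulus 1
    assert hausdorff(F(x1), F(x2)) <= 1.0 * abs(x1 - x2) + 1e-12
```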

5.4.3 General polyhedral multifunctions

Corollary 5.4.11 in the preceding section showed that polyhedral convex multifunctions are Lipschitzian in the Hausdorff metric on their effective domains. Such a strong property does not hold for general polyhedral multifunctions, but we can establish a kind of Lipschitz semicontinuity.

Definition 5.4.12. Let F be a multifunction from Rn to Rm and λ be a real number. F is locally outer Lipschitz continuous at a point x ∈ Rn with modulus λ if for some neighborhood N of x and each x′ ∈ N we have

F(x′)⊂ F(x)+λ∥x′− x∥Bm. (5.43)

For (5.43) to hold we must have either x ∈ dom F or x ∉ cl dom F. Also, (5.43) implies in particular that the excess e[F(x′), F(x)] is not more than λ∥x′ − x∥, though it says nothing about the excess e[F(x), F(x′)], which could be infinite.

Theorem 5.4.13. Let F be a polyhedral multifunction from Rn to Rm. Then there exists a constant λ such that F is everywhere locally outer Lipschitz continuous with modulus λ.


Proof. Let the components of the graph of F be G1, …, GI, and denote by Pi the multifunction from Rn to Rm whose graph is Gi. The Gi are polyhedral convex, and therefore so are their projections π1(Gi), the union of which is dom F. As there are only finitely many components, dom F is a closed set. If x ∉ dom F, then some neighborhood of x does not meet dom F, and on that neighborhood the values of F all equal ∅, so F is trivially locally outer Lipschitz continuous at x. Therefore suppose x ∈ dom F.

By Corollary 5.4.11, each Pi is Lipschitzian on its effective domain, with some modulus λi. Let λ be the maximum of λ1, …, λI and choose any x ∈ Rn. Let I be the set of indices i in {1, …, I} such that x ∈ π1(Gi). As the sets π1(Gi) are closed, for each i ∉ I there is a neighborhood Ni of x that does not meet π1(Gi). Let N be the intersection of these neighborhoods. If x′ ∈ N, then if x′ ∉ dom F the asserted bound holds because F(x′) = ∅.

If x′ ∈ dom F then for each y′ ∈ F(x′) the pair (x′, y′) belongs to some Gi for which π1(Gi) meets N, and hence for which i ∈ I. Then

ρ[Pi(x′), Pi(x)] ≤ λi∥x′ − x∥ ≤ λ∥x′ − x∥,

and as the values of Pi are closed sets we have

Pi(x′) ⊂ Pi(x) + λ∥x′ − x∥Bm.

Then as F(x) = ∪_{i∈I} Pi(x) by definition of I,

F(x′) = ∪{Pi(x′) | x′ ∈ π1(Gi)} ⊂ ∪_{i∈I} Pi(x′) ⊂ ∪_{i∈I} [Pi(x) + λ∥x′ − x∥Bm] = F(x) + λ∥x′ − x∥Bm,

as claimed. ⊓⊔

The constant λ in Theorem 5.4.13 depends only on F, so the same Lipschitz modulus works for every x ∈ Rn.

Theorem 5.4.13 implies that polyhedral multifunctions satisfy a property that is similar, but not identical, to what is called in the literature metric regularity. The following proposition explains this property.

Proposition 5.4.14. Let T be a polyhedral multifunction from Rn to Rm. Then there is some κ ∈ R+ such that for each y ∈ Rm there is a neighborhood Q of y with the property that if x ∈ T^{-1}(Q) then

d[x, T^{-1}(y)] ≤ κ d[y, T(x)]. (5.44)

Proof. Theorem 5.4.13 shows that there is a nonnegative modulus κ such that T^{-1} is everywhere locally outer Lipschitz continuous with modulus κ. Choose y ∈ Rm and let Q be a ball B(y, ε) such that for y′ ∈ Q we have


T−1(y′)⊂ T−1(y)+κ∥y′− y∥Bn. (5.45)

If T^{-1}(Q) = ∅ there is nothing to prove. Otherwise, let x ∈ T^{-1}(Q); as T(x) is closed we can project y onto T(x) to obtain y′ ∈ T(x) with ∥y′ − y∥ = d[y, T(x)]. Since x ∈ T^{-1}(Q), T(x) contains a point of Q, so d[y, T(x)] ≤ ε and hence y′ ∈ Q. Then x ∈ T^{-1}(y′), so (5.45) implies x ∈ T^{-1}(y) + κ∥y′ − y∥Bn, and therefore

d[x, T^{-1}(y)] ≤ κ∥y′ − y∥ = κ d[y, T(x)].

⊓⊔

In words, Proposition 5.4.14 says that if T(x) is close enough to the point y then the bound (5.44) holds. To see why this closeness condition is necessary, consider the polyhedral multifunction T from R to R given by

T(x) = −1 if x < −1,  T(x) = x if x ∈ [−1, 1],  T(x) = 1 if x > 1.

If we choose y = 0, then for Q ⊂ (−1, 1) the conclusions of the proposition hold with κ = 1. However, if Q contains either 1 or −1 then they fail with any κ, because for example T^{-1}(1) = [1, +∞) ⊄ T^{-1}(0) + κ∥1 − 0∥B1.
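A quick numerical check of this failure, representing the single-valued T above by one-point sets:

```python
def T(x):
    """The truncation multifunction of the text, as a one-point set."""
    return {max(-1.0, min(1.0, x))}

def dist(p, S):
    """Distance from the point p to the finite set S."""
    return min(abs(p - s) for s in S)

T_inv_0 = {0.0}                     # T^{-1}(0)
for x in (2.0, 10.0, 100.0):
    # For x > 1 we have T(x) = {1}, so d[0, T(x)] = 1 while d[x, T^{-1}(0)] = x:
    # the ratio in (5.44) grows without bound, so no single kappa can work
    # once Q contains the value 1.
    print(x, dist(x, T_inv_0) / dist(0.0, T(x)))
```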

Lemma 5.4.9 established a bound on the distance to values of a polyhedral convex multifunction for arguments in a bounded set. A bound of the same kind holds for general polyhedral multifunctions.

Proposition 5.4.15. Let T be a polyhedral multifunction from Rn to Rm. For each bounded subset S of dom T there is a nonnegative real number γS such that for each x ∈ S, d_{T(x)}(0) ≤ γS.

Proof. Choose a bounded subset S of dom T and let {Gi | i = 1, …, k} be the components of gph T. Let I be the subset of {1, …, k} consisting of those indices i for which S ∩ π1(Gi) ≠ ∅. If I = ∅ then S is empty and there is nothing to prove, so assume otherwise. Let γS be the maximum of the bounds γi provided by Lemma 5.4.9 for the bounded sets S ∩ π1(Gi) and the polyhedral convex multifunctions Fi whose graphs are the Gi, for i ∈ I. Choose x ∈ S; then for some i ∈ I we have x ∈ dom Fi, so that

d[0, T(x)] ≤ d[0, Fi(x)] ≤ γi ≤ γS.

⊓⊔

A consequence of Proposition 5.4.15 is that polyhedral multifunctions have the property that bounded subsets of their images are covered by the images of bounded subsets of their domains, as the following corollary shows.

Corollary 5.4.16. Let G be a polyhedral multifunction from Rn to Rm, and let Q be a bounded subset of im G. Then there is a bounded subset P of dom G such that G(P) ⊃ Q.


Proof. As G is polyhedral, the multifunction T := G^{-1} is also polyhedral. Let Q be a bounded subset of im G = dom T, and apply Proposition 5.4.15 to produce γQ ∈ R+ such that for each q ∈ Q, the distance from the origin to G^{-1}(q) is not more than γQ. This means that for any such q there is some xq ∈ γQBn with q ∈ G(xq), and therefore that q ∈ G(γQBn). Then G(γQBn) ⊃ Q, so we can take P = dom G ∩ (γQBn). ⊓⊔

5.4.4 Exercises for Section 5.4

Exercise 5.4.17. If P is a polyhedral convex subset of Rn, then NP is a polyhedral multifunction.

Exercise 5.4.18. Let A be an m × n matrix and c∗ ∈ Rn. For a ∈ Rm define

Q(a) = {x | Ax = a, x ≥ 0},  τ(a) = inf_{x∈Q(a)} ⟨c∗, x⟩.

Assume the conventions that the infimum of the empty subset of R is +∞, and the supremum of that set is −∞.

1. What kind of multifunction is Q? What is its effective domain?
2. What is the nature of the set on which τ(a) < +∞?
3. Could τ take the value −∞? If so, specify the weakest possible condition on A and c∗ that will ensure that τ > −∞, show that it is weakest, and assume that condition for the rest of this exercise.
4. Does τ have any convexity properties? If so, describe them.
5. What is the largest set on which you can prove that τ is Lipschitzian? Specify the set, and prove the Lipschitzian property.

Exercise 5.4.19. You are given a linear-programming model of a system that responds to exterior demands in the following way: a demand appears (a vector d ∈ Rm), and the system must then solve

Dx ≥ d,  Fx = f,  x ≥ 0,

where D (m × n), F (k × n), and f ∈ Rk are given quantities. The constraints Fx = f represent aspects of the system other than demand satisfaction. The columns of D and F represent activities of the system, whose use is costly. There is a given vector of costs c∗ ∈ Rn. For each demand vector d one then wants to solve the linear-programming problem

inf{⟨c∗, x⟩ | Dx ≥ d, Fx = f, x ≥ 0}. (5.46)

The system operators are concerned that small changes in demand might result in large increases in the cost of satisfying the demand. Show that there is a constant α such that given a base-case demand d0 and a cost c0 associated with that problem, for each demand d for which (5.46) has a solution the cost of satisfying demand d


will not be more than c0 + α∥(d − d0)+∥, and therefore will increase at most linearly with the change in demand.

5.5 Polyhedrality and separation

Stronger separation theorems hold for polyhedral convex sets than for convex sets in general. These separation theorems often present themselves in forms involving alternative statements about the solvability of systems of linear equations and inequalities, and for that reason they are called theorems of the alternative. They are very useful in, e.g., the analysis of optimization problems.

It is possible to prove most theorems of the alternative either by direct separation arguments, such as we used in Lemma 3.1.10, or by using another theorem of the alternative together with substitutions to change the form of the equations and/or inequalities involved. However, these proofs can sometimes be rather tedious. One can avoid them by establishing a single theorem tailored to the polyhedral case, and then deriving the various theorems of the alternative as simple corollaries. The theorem also provides information useful for dealing with problems other than separation.

The next section presents such a theorem, which we call the CRF theorem. The subsequent section shows how to use it to derive an array of theorems of the alternative.

5.5.1 The CRF theorem

The result proved here comes from work of Camion, Rockafellar, and Fulkerson; see for example [39, Theorem 22.6] and the bibliographic comments on p. 428 of that work. Rather than use the name “Camion–Rockafellar–Fulkerson theorem,” we just refer to it in what follows as the CRF theorem. The proof we give here is due to Minty [23].

The theorem uses elementary subspaces of a given subspace of Rn. The basic idea is that with each one-dimensional subspace one can unambiguously associate a support, which is simply the set of indices at which the elements of that subspace (except the origin) have nonzero entries. The elementary subspaces are those whose supports are minimal.

Definition 5.5.1. Let x be an element of Rn. The support of x, written supp(x), is the set of indices i for which xi ≠ 0. ⊓⊔

The support depends on the basis one uses for the space in question.

Definition 5.5.2. Let S be a one-dimensional subspace of Rn. The support of S, supp(S), is defined to be the set supp(x), where x is any nonzero element of S. ⊓⊔

This definition makes sense because all such x have the same support.


Definition 5.5.3. Let L be a subspace of Rn. An elementary subspace of L is a one-dimensional subspace S ⊂ L such that there is no one-dimensional subspace of L whose support is properly contained in that of S. ⊓⊔

The empty subspace and the origin have no elementary subspaces. However, any other subspace L ⊂ Rn has at least one elementary subspace. To find one, just list the supports corresponding to nonzero elements of L (a finite set), and find a support in that set that does not properly contain any other. Any element of L having that support will generate an elementary subspace.
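The search just described can be carried out mechanically for small n. The sketch below (the helper names are our own; a brute-force search over index sets, assuming the rows of B are linearly independent) lists the supports of the elementary subspaces of the row space of a matrix B:

```python
import itertools
import numpy as np

def null_space(M, tol=1e-10):
    """Orthonormal basis (as columns) for the null space of M."""
    k = M.shape[1]
    if M.shape[0] == 0:
        return np.eye(k)            # no constraints: the whole space
    _, s, Vt = np.linalg.svd(M)
    rank = int((s > tol).sum())
    return Vt[rank:].T

def elementary_supports(B):
    """Supports of the elementary subspaces of L = row space of B.

    Enumerates candidate supports by increasing size; a candidate S is minimal
    exactly when it does not contain an already-found minimal support and some
    nonzero x = B.T @ c in L vanishes off S.  Practical only for small n.
    """
    k, n = B.shape
    minimal = []
    for size in range(1, n + 1):
        for S in itertools.combinations(range(n), size):
            if any(set(m) <= set(S) for m in minimal):
                continue                            # contains a smaller support
            comp = [i for i in range(n) if i not in S]
            # nonzero x in L vanishing off S exists iff B[:, comp].T has a null vector
            if null_space(B[:, comp].T).shape[1] > 0:
                minimal.append(S)
    return minimal

# Example: L = row space of B in R^3 has dimension 2; its elementary supports
# are {1,2}, {1,3}, {2,3} (0-based below), each of size <= n - k + 1 = 2,
# consistent with Theorem 5.5.4(d).
print(elementary_supports(np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]])))
```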

Theorem 5.5.4. Let L be a nonempty subspace of Rn, of dimension k ≥ 1.

a. If S and T are elementary subspaces of L with supp(S) = supp(T), then S = T.
b. L has only finitely many elementary subspaces.
c. L is the sum of its elementary subspaces.
d. The support of any elementary subspace of L has at most n − k + 1 components.

Proof. To prove (a), let S and T be elementary subspaces having the same support, say S. Choose nonzero elements s ∈ S and t ∈ T, and let i be any index in S. Define u = s − (si/ti)t; then supp(u) ⊂ S, and moreover the containment is proper because i ∉ supp(u). If u were not zero, it would generate a one-dimensional subspace whose support was strictly contained in S, and this would contradict the fact that S and T are elementary. Therefore u = 0; hence s and t are proportional, so S = T.

Assertion (a) implies that there are no more elementary subspaces of L than there are possible supports. This is a finite number, which proves (b).

To prove (c) we have only to prove that L is contained in the sum of its elementary subspaces, because the sum of any finite collection of subspaces of L is contained in L. Call the sum of the elementary subspaces M; we have already observed that M is not empty.

Any vector in L whose support has only a single element must generate an elementary subspace; there may or may not be any such vectors. Now let 0 < J < n and assume that any vector in L whose support has no more than J elements belongs to M. We establish this assertion for elements of L whose supports have no more than J + 1 elements, and thereby complete the proof of (c) by induction.

Let x be a vector of L whose support has no more than J + 1 elements. If no nonzero element of L has support properly contained in supp(x), then x generates an elementary subspace of L, and then x ∈ M. Therefore suppose that y is a nonzero element of L whose support is properly contained in supp(x); then y ∈ M by the induction hypothesis. Choose an index i with yi ≠ 0 and let z = x − (xi/yi)y. Then supp(z) ⊂ supp(x), and the containment is proper since zi = 0. Therefore the induction hypothesis implies that z ∈ M. Since x = z + (xi/yi)y, x ∈ M.

To prove (d), let S be a one-dimensional subspace of L and let S be its support. If S has more than n − k + 1 components, form a set T consisting of the complement of S in {1, …, n}, together with exactly one element of S. Let v1, …, vn−k be a basis for L⊥, and consider the linear system

⟨vi, x⟩ = 0 (i = 1, …, n − k),  xj = 0 (j ∈ T).


This is a system of fewer than (n − k) + [n − (n − k + 1) + 1] = n linear equations in Rn. Therefore it has a nontrivial solution, which spans a one-dimensional subspace of L whose support is properly contained in S. Therefore S cannot be elementary, and it follows that the support of an elementary subspace has no more than n − k + 1 elements. ⊓⊔

We now apply the concept of elementary subspace to state and prove the CRF theorem. This theorem deals with two objects: a nonempty subspace L of Rn, and a collection of n real intervals; a real interval is any convex subset of the real line R. In particular, these intervals may be open, closed, half-open, bounded, or unbounded. The theorem asserts that L meets the Cartesian product I of the intervals if and only if, for each elementary subspace S of L⊥, S⊥ meets I. The “only if” part is immediate, since S⊥ ⊃ L, so the strength of the theorem lies in the “if” statement.

Theorem 5.5.5. Let L be a nonempty subspace of Rn and let I1, …, In be real intervals. Let I = ∏_{i=1}^{n} Ii. Then L meets I if and only if, for each elementary subspace S of L⊥, S⊥ meets I.

Proof. (Only if) If L meets I and S is a subspace of L⊥, then S⊥ ⊃ L⊥⊥ = L, so S⊥ meets I.

(If) This proof is easy in three special cases, which we consider first.

a. If any of the intervals is empty then so is I, and the theorem obviously holds. Therefore we assume I ≠ ∅.
b. If dim L = n then L = Rn. In this case L must meet I, so the first statement holds; so does the second, because L⊥ is {0}, which has no elementary subspaces.
c. If dim L = n − 1 then L⊥ has a single elementary subspace (itself). The orthogonal complement of this elementary subspace is L⊥⊥ = L. Therefore the second statement holds exactly when the first does.

If n = 1 then the dimension of L is either n or n − 1, so the theorem holds by observations (b) and (c) above. Therefore suppose that n ≥ 2, that the dimension of L is not more than n − 2, and that the theorem is true in spaces of dimension not more than n − 1. Suppose also that for each elementary subspace S of L⊥, S⊥ meets I. We will show that L meets I.

For i = 1, …, n define

Ji = I1 × I2 × ⋯ × Ii−1 × R × Ii+1 × ⋯ × In.

We are going to prove next that for each i, L meets Ji. The proof for each i is similar, so with no loss of generality we prove this only for i = n. Define a subspace L′ of Rn−1 to be the image of L under the linear operator Q that takes (x1, …, xn) ∈ Rn to (x1, …, xn−1) ∈ Rn−1. For future reference, note that the adjoint of Q is the mapping Q∗ from Rn−1 to Rn that takes (x1, …, xn−1) to (x1, …, xn−1, 0).

Suppose that S′ is any elementary subspace of (L′)⊥, and let S := S′ × {0}. We first prove that S is an elementary subspace of L⊥.


a. To see that S ⊂ L⊥, note that

(L′)⊥ = [Q(L)]⊥ = (Q∗)^{-1}(L⊥). (5.47)

We assumed that S′ ⊂ (L′)⊥, so it follows from (5.47) that S = S′ × {0} = Q∗(S′) is contained in L⊥.

b. The dimension of S must be that of S′, which is 1, so S is a one-dimensional subspace of L⊥.
c. Finally, if supp(S) were not minimal among the supports of one-dimensional subspaces of L⊥, then we could find a one-dimensional subspace T of L⊥ having strictly smaller support. Choose nonzero elements s ∈ S and t ∈ T. By construction, s must have the form (s′, 0) where s′ ∈ S′. As supp(t) is strictly contained in supp(s), t must be of the form (t′, 0), with the support of t′ strictly contained in that of s′. Then Q∗(t′) = t ∈ L⊥, and (5.47) says that t′ ∈ (L′)⊥. This contradicts the fact that S′ was an elementary subspace of (L′)⊥, so in fact supp(S) is minimal among the supports of one-dimensional subspaces of L⊥.

We now know that S is an elementary subspace of L⊥, so we can apply the hypothesis that for each elementary subspace S of L⊥, S⊥ meets I. But

S⊥ = (S′ × {0})⊥ = [(S′)⊥] × R,

and this means that (S′)⊥ meets the set I′ := I1 × ⋯ × In−1. Therefore for each elementary subspace S′ of (L′)⊥, (S′)⊥ meets I′.

Now apply the induction hypothesis in Rn−1 to conclude that L′ meets I′. Take any point x′ of this intersection and use the fact that L′ = Q(L) to find a point x ∈ L with Qx = x′. Then x ∈ L ∩ Jn as required.

To finish the proof, we use what we have already proved to find n points, say x1, …, xn, with xi ∈ L ∩ Ji for i = 1, …, n. For i = 2, …, n define vi = xi − x1. The vi are all in L, which has dimension not more than n − 2, so they are linearly dependent. Therefore there are µ2, …, µn, not all zero, with ∑_{i=2}^{n} µi(xi − x1) = 0. If we let µ1 := −∑_{i=2}^{n} µi, then we have

∑_{i=1}^{n} µi xi = 0,  ∑_{i=1}^{n} µi = 0.

Since the µi sum to zero yet are not all zero, they cannot all be of the same sign. By reordering the indices if necessary, we can suppose that µ1, …, µk are nonnegative and µk+1, …, µn are negative. We have ∑_{i=1}^{k} µi = ∑_{i=k+1}^{n} (−µi); call this positive number σ. Now for i = 1, …, k define πi = µi/σ, and for i = k + 1, …, n define νi = −µi/σ. We then have

∑_{i=1}^{k} πi xi = ∑_{i=k+1}^{n} νi xi. (5.48)


The left side of (5.48) is a convex combination of points, each of which belongs to

L ∩ [R × ⋯ × R × Ik+1 × ⋯ × In];

the right side is another convex combination of points, each of which belongs to

L ∩ [I1 × ⋯ × Ik × R × ⋯ × R].

Therefore the element of Rn expressed by (5.48) belongs to the intersection of these two sets, which is L ∩ I. ⊓⊔

5.5.2 Some applications of the CRF theorem

To illustrate a direct application of the CRF theorem we will use it to prove a complementarity theorem of A. W. Tucker. After that, we will rewrite it in the form of a theorem of the alternative, and illustrate how to prove other such theorems using that form.

Theorem 5.5.6 (Tucker, 1956). Let L be any nonempty subspace of Rn. There exist points x ∈ L and x∗ ∈ L⊥ with x ≥ 0, x∗ ≥ 0, and x + x∗ > 0.

The orthogonality of x and x∗, together with nonnegativity, requires that their supports be complementary, so this is sometimes referred to as Tucker's complementarity theorem.

Proof. Among the nonnegative elements of L⊥, select x∗ so that its support has maximum cardinality. If that support is {1, …, n} then take x = 0 ∈ L to finish the proof. Otherwise, for i = 1, …, n define Ii to be (0, +∞) if i ∉ supp(x∗) and Ii = {0} if i ∈ supp(x∗). Define I = I1 × ⋯ × In. If L ∩ I ≠ ∅ then let x be a point in this intersection. This x is nonnegative, belongs to L, and satisfies x + x∗ > 0 as required.

To complete the proof we must show that L and I in fact have a nonempty intersection. If they did not, then by the CRF theorem there would be an elementary vector v∗ of L⊥ (that is, a vector spanning an elementary subspace of L⊥) such that

∑_{i=1}^{n} (v∗)i Ii ⊂ (0, +∞). (5.49)

This implies that the coordinates of v∗ corresponding to indices not in supp(x∗) are nonnegative, and at least one of them is positive. Now by forming v∗ + αx∗ and taking α to be a sufficiently large positive number, we can produce a nonnegative element of L⊥ whose support strictly contains that of x∗, which is impossible by our choice of x∗. Therefore the intersection of L and I must be nonempty. ⊓⊔

For many applications of the CRF theorem it helps to use the following reformulated version, in the form of a theorem of the alternative.

Corollary 5.5.7. Let M be an m × n matrix of rank r, and let I1, …, Im be real intervals. Let I = I1 × ⋯ × Im. Then exactly one of the following statements holds.


a. There is an x ∈ Rn with Mx ∈ I.
b. There is a y∗ ∈ Rm with M∗y∗ = 0 and with

∑_{i=1}^{m} y∗i Ii ⊂ (0, +∞). (5.50)

Moreover, in case (a) the vector x can be chosen to have no more than r nonzero components, while in case (b) the vector y∗ can be chosen to have no more than r + 1 nonzero components and to span an elementary subspace of ker M∗.

Proof. Let L be the image of M; then L⊥ = ker M∗. Theorem 5.5.5 says that alternative (a) fails to hold if and only if there is an elementary subspace K of ker M∗ such that K⊥ does not meet I. Let y∗ be a vector spanning K; then ⟨y∗, x⟩ is nonzero for each x ∈ I. As I is a convex set, this means that these inner products all have the same sign, so by choosing y∗ ∈ K to have the correct direction we can satisfy (5.50).

The image of M is spanned by some set of not more than r columns of M, so in case (a) no more than r components of x need be nonzero. In case (b), by construction y∗ spans the elementary subspace K of ker M∗. By Theorem 5.5.4, y∗ has no more than m − k + 1 nonzero components, where k is the dimension of ker M∗. But we know from linear algebra that k = m − r, so y∗ has no more than r + 1 nonzero components. ⊓⊔

As an example of how one can use this reformulation, we prove a theorem of the alternative attributed to Gale. The exercises for this section ask for proofs of other theorems of the alternative.

Theorem 5.5.8. For any m × n matrix A and any a ∈ Rm, exactly one of the following systems is solvable:

a. Ax ≥ a.
b. A∗u∗ = 0, ⟨u∗, a⟩ > 0, u∗ ≥ 0.

Proof. It is immediate that (a) and (b) cannot both hold. We suppose (a) does not hold, and apply Corollary 5.5.7 with M = A and Ii = [ai, +∞) for i = 1, …, m to conclude that the second alternative in the corollary must hold. That means that there is some u∗ with A∗u∗ = 0 and such that

∑_{i=1}^{m} (u∗)i [ai, +∞) ⊂ (0, +∞). (5.51)

For each i, choosing an arbitrarily large positive number in Ii and the components of a in the other intervals shows from (5.51) that (u∗)i ≥ 0. Finally, taking ai ∈ Ii for every i, we conclude that ⟨u∗, a⟩ > 0. ⊓⊔
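Theorem 5.5.8 can be checked numerically with a linear-programming solver: system (a) is an LP feasibility problem, and when it is infeasible a certificate for (b) can be found by maximizing ⟨u∗, a⟩ over the normalized set {u ≥ 0 | A∗u = 0, ∑ui = 1}. The sketch below uses scipy.optimize.linprog; the function name and the normalization are our own choices, and floating-point tolerances apply:

```python
import numpy as np
from scipy.optimize import linprog

def gale(A, a):
    """Decide the alternative of Theorem 5.5.8 numerically.

    Returns ('a', x) with A x >= a, or ('b', u) with A^T u = 0, u >= 0,
    <u, a> > 0, the certificate normalized so that sum(u) = 1.
    """
    m, n = A.shape
    # System (a): A x >= a  <=>  -A x <= -a, posed as an LP feasibility check.
    res = linprog(np.zeros(n), A_ub=-A, b_ub=-a, bounds=[(None, None)] * n)
    if res.status == 0:
        return 'a', res.x
    # System (b): maximize <u, a> subject to A^T u = 0, u >= 0, sum(u) = 1.
    res = linprog(-a, A_eq=np.vstack([A.T, np.ones((1, m))]),
                  b_eq=np.r_[np.zeros(n), 1.0], bounds=[(0, None)] * m)
    return 'b', res.x
```

For A with rows (1) and (−1) and a = (1, 1), system (a) asks for x ≥ 1 and x ≤ −1 simultaneously, so the solver returns the certificate u = (1/2, 1/2) for (b).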


5.5.3 Exercises for Section 5.5

The following exercises ask for proofs of various theorems of the alternative. The names given here are often associated with the theorems in the literature.

Exercise 5.5.9. (Motzkin) If D, E, and F are matrices each having n columns, then exactly one of the following systems is solvable:

a. Dx > 0, Ex ≥ 0, Fx = 0.
b. D∗u∗ + E∗v∗ + F∗w∗ = 0, u∗ and v∗ nonnegative, and u∗ not zero.

Exercise 5.5.10. (Tucker) If D, E, and F are matrices each having n columns, then exactly one of the following systems is solvable:

a. Dx ≥ 0, Ex ≥ 0, Fx = 0, with Dx not zero.
b. D∗u∗ + E∗v∗ + F∗w∗ = 0, u∗ > 0, v∗ ≥ 0.

Exercise 5.5.11. (Ville) For any matrix A, exactly one of the following systems is solvable:

a. Ax < 0, x ≥ 0.
b. A∗w ≥ 0, w nonnegative and not zero.

Exercise 5.5.12. (Stiemke) For any matrix A, exactly one of the following systems is solvable:

a. Ax nonnegative and not zero.
b. A∗w = 0, w > 0.

Exercise 5.5.13. (A. Ben-Israel [3]) Let A and D be linear transformations from Rn to Rp and Rq respectively. Let K be a nonempty polyhedral convex cone in Rp and L a nonempty closed convex cone in Rq. Show that the following are equivalent:

a. Ax ∈ K implies Dx ∈ L.
b. A∗(K) ⊃ D∗(L).

5.6 Notes and references

Polyhedral multifunctions were introduced in [32], where Theorem 5.4.13, Corollary 5.4.16, and a version of Proposition 5.4.14 appeared. The Hoffman theorem is from [17]. Instead of proving Corollary 5.4.11 from Hoffman's theorem as we did here, one can derive it from a more general theorem [49, Theorem 1] of Walkup and Wets, which actually characterizes polyhedrality.

The proof of Theorem 5.5.6 is from [38].


Part II
Convex Functions


Chapter 6
Basic properties

Convex functions are very important in both theory and practice, especially in the field of optimization. One of the reasons for their importance in optimization is that each local minimizer of a convex function is a global minimizer. As numerical algorithms usually produce only local minimizers, it is of great value to be able to identify cases in which such a local minimizer is in fact global.

Continuity is a basic property important in many areas of mathematical work. In order to establish properties and behavior of functions, we often require that the function be continuous. Nonetheless, many very important and useful convex functions are not continuous, but only lower semicontinuous. The lower semicontinuity property turns out to be the appropriate “niceness” condition for most operations with convex functions, especially when combined with a restriction on infinite function values to yield the class of closed convex functions.

6.1 Convexity

This section shows some reasons why convex functions are important in optimization and develops some of their most elementary properties.

6.1.1 Convex functions with extended real values

In applications using convex functions—particularly in optimization—we often want to define a function for points in a linear space such as Rn, but not for all such points. Explicitly specifying the domain of definition will sometimes be inconvenient, so we will define functions on all of Rn but allow them to take values in the extended real numbers

R := R ∪ {+∞} ∪ {−∞},



and we call such functions extended-real-valued, which we sometimes abbreviate ERV. For such a function f : Rn → R the points of Rn at which f takes values strictly less than +∞ correspond to the traditional domain of definition, and we call the set of all such points the effective domain of f , written dom f . Points with values of +∞ would in the traditional situation be outside the domain of definition. Points with values of −∞ are in dom f , but we will often restrict f so that there will be no such points.

The presence of the two special elements of R does not cause any trouble as long as we observe three simple rules:

a. The product of any positive (negative) number with ±∞ is ±∞ (∓∞), and the product of zero with anything is zero.

b. The sum of any real number and ±∞ is ±∞.
c. The sum of either of the infinite elements with itself is itself.

We do not define the sums (+∞)+(−∞) and (−∞)+(+∞). Given rule (a), this implies that the differences (+∞)−(+∞) and (−∞)−(−∞) are also undefined. In calculating with ERV functions it is important to ensure that these undefined situations do not arise.
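These conventions differ slightly from IEEE floating-point arithmetic, in which both 0 · (±∞) and (+∞)+(−∞) yield NaN rather than, respectively, zero and an error. The following minimal sketch (the helper names `emul` and `eadd` are ours, not from the text) shows one way to encode rules (a)–(c) in Python:

```python
import math

def emul(s, x):
    """Extended-real product following rule (a): zero times anything is zero."""
    if s == 0:
        return 0.0
    return s * x  # for s != 0, IEEE arithmetic already agrees with rule (a)

def eadd(x, y):
    """Extended-real sum following rules (b) and (c).

    The combinations (+inf)+(-inf) and (-inf)+(+inf) are undefined, so we
    raise an error rather than silently produce NaN as IEEE floats would.
    """
    if math.isinf(x) and math.isinf(y) and (x > 0) != (y > 0):
        raise ValueError("(+inf) + (-inf) is undefined")
    return x + y

inf = math.inf
print(emul(0, inf))     # 0.0 by rule (a); the plain float product 0 * inf is nan
print(eadd(5.0, -inf))  # -inf by rule (b)
print(eadd(inf, inf))   # inf by rule (c)
```

Guarding the undefined sums explicitly, as `eadd` does, is one way to honor the warning above when computing with ERV functions.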

In order to have a concept of closeness for elements of R, we need to have a topology for it. We therefore define the open sets of R to include the open sets of R together with all sets of the form [−∞, µ) or (µ,+∞], where µ ∈ R. Now the usual operations of union and finite intersection applied to this collection produce a topology for R.

Next, we associate with each function from Rn to R a set called the epigraph of the function.

Definition 6.1.1. If f : Rn → R, its epigraph is the set epi f := {(x,µ) ∈ Rn+1 | f (x) ≤ µ}. ⊓⊔

The epigraph is exceedingly useful, as it permits us to translate properties of functions into geometric properties of convex sets and vice versa. For example, the following definition of an extended-real-valued convex function rests on a geometric property of its epigraph.

Definition 6.1.2. A function f : Rn → R is convex if epi f is convex; f is concave if − f is convex. ⊓⊔

A simple example of a convex function that takes infinite values is the indicator of a convex subset C of Rn; this is the function IC taking the value 0 on C and +∞ off C. If C is not all of Rn then IC has to take +∞ somewhere. The epigraph of IC is just C×R+, which will be convex exactly when C is convex.

In visualizing concave functions g one can use the hypograph, consisting of the pairs (x,µ) in Rn+1 having µ ≤ g(x). For example, one can define the concave indicator of a set C by specifying its hypograph to be C×R−. However, operations involving the hypograph turn out to be just reflections of the corresponding operations for epigraphs, so that in most cases one need not keep track of separate definitions.


Sometimes we can use the definition to determine whether a function is convex, but often it is more convenient to have a test. The following proposition gives a test applicable to any ERV function.

Proposition 6.1.3. Let f : Rn → R. The function f is convex if and only if for each pair x,y ∈ Rn, each pair of real numbers ξ and η with f (x) < ξ and f (y) < η, and each λ ∈ (0,1) one has

f [(1−λ )x+λy]< (1−λ )ξ +λη .

Proof. (only if). Suppose that f is convex, and choose x, y, ξ , η, and λ as described. Find real numbers ξ ′ ∈ ( f (x),ξ ) and η ′ ∈ ( f (y),η); then the pairs (x,ξ ′) and (y,η ′) belong to epi f , which is a convex set because f is convex. Then the pair ((1−λ )x+λy, (1−λ )ξ ′+λη ′) also belongs to epi f , and so

f [(1−λ )x+λy]≤ (1−λ )ξ ′+λη ′ < (1−λ )ξ +λη .

(if). Assume that f satisfies the hypothesis. Choose any points (x,µ) and (y,ν) in epi f , and let λ ∈ (0,1). For each positive ε we have f (x) < µ + ε and f (y) < ν + ε. Therefore

f [(1−λ )x+λy]< (1−λ )(µ + ε)+λ (ν + ε) = (1−λ )µ +λν + ε.

Now allowing ε to approach zero we obtain

f [(1−λ )x+λy]≤ (1−λ )µ +λν ,

which implies that (1−λ )(x,µ)+λ (y,ν) ∈ epi f , so that f must be convex. ⊓⊔

The presence of strict inequalities in Proposition 6.1.3 can be useful, for example in the proof of Theorem 9.3.1.

The epigraph of f consists of those pairs (x,ξ ) ∈ Rn+1 having f (x) ≤ ξ . For x belonging to such a pair, f (x) cannot be +∞; on the other hand, if f (x) < +∞ then there is a real ξ such that (x,ξ ) ∈ epi f . Therefore the projection of epi f onto Rn is the effective domain of f , so if f is convex then dom f is a convex set.

The next definition produces a class of ERV functions that do not have certain inconvenient features, such as values of −∞.

Definition 6.1.4. A function f : Rn → R is proper if f never takes −∞ and f is not identically +∞. If f is not proper it is improper. ⊓⊔

If f is a proper function then dom f is nonempty and f takes finite values everywhere on that set.

The next proposition simplifies the convexity criterion of Proposition 6.1.3 in the case of functions that do not take −∞. In particular, this includes proper functions.

Proposition 6.1.5. Let f : Rn → (−∞,+∞]. The function f is convex if and only if, for each pair x,y ∈ Rn and each λ ∈ (0,1),


f [(1−λ )x+λy]≤ (1−λ ) f (x)+λ f (y).

Proof. (only if). Choose x, y, and λ as described. If f takes +∞ at x or at y, the conclusion is obvious. In the remaining case f (x) and f (y) are finite. Choosing any positive ε and applying Proposition 6.1.3 to ξ := f (x)+ ε and η := f (y)+ ε we obtain

f [(1−λ )x+λy]< (1−λ ) f (x)+λ f (y)+ ε,

and by letting ε approach zero we obtain the desired result.

(if). If the inequality in the statement of the theorem holds for all x, y, and λ as described, then if µ and ν are real numbers with f (x) < µ and f (y) < ν we have

f [(1−λ )x+λy]≤ (1−λ ) f (x)+λ f (y)< (1−λ )µ +λν .

Then f is convex by Proposition 6.1.3. ⊓⊔

The next result is a useful inequality often called Jensen’s inequality, though that term also applies to more general forms: e.g., with integrals.

Proposition 6.1.6. Let f : Rn → (−∞,+∞]. Then f is convex if and only if for each integer k ≥ 1, each set of points x1, . . . ,xk ∈ Rn, and each set of convex coefficients λ1, . . . ,λk,

f (∑_{i=1}^{k} λixi) ≤ ∑_{i=1}^{k} λi f (xi).

Proof. For the “if” part we can just take k = 2 and use Proposition 6.1.5. For “only if,” suppose f is convex and note that we can suppose all of the λi to be positive (if not, they contribute nothing to the sums). If k is 1 or 2, we have the result already. Therefore suppose that k > 2 and, for an induction, that the theorem is true for integers from 1 to k−1 inclusive. We prove it for k.

If λk = 1 then we are in the previous case, since then there is only one nonzero λi. Therefore suppose λk < 1 and write x′ = ∑_{i=1}^{k−1} (1−λk)−1λixi. By induction we have

f (x′) ≤ ∑_{i=1}^{k−1} (1−λk)−1λi f (xi),

and also

f (∑_{i=1}^{k} λixi) = f [(1−λk)x′+λkxk] ≤ (1−λk) f (x′)+λk f (xk).

Combining these two inequalities we have the assertion. ⊓⊔
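Jensen's inequality is easy to sanity-check numerically. The sketch below (the function and the sampling scheme are illustrative choices, not from the text) verifies the inequality of Proposition 6.1.6 for the convex function f(x) = x² using random points and random convex coefficients:

```python
import random

random.seed(0)
f = lambda x: x * x  # a familiar convex function on R

for _ in range(1000):
    k = random.randint(1, 6)
    xs = [random.uniform(-10, 10) for _ in range(k)]
    raw = [random.random() + 1e-12 for _ in range(k)]
    lam = [r / sum(raw) for r in raw]  # convex coefficients: positive, summing to 1
    lhs = f(sum(l * x for l, x in zip(lam, xs)))
    rhs = sum(l * f(x) for l, x in zip(lam, xs))
    assert lhs <= rhs + 1e-9  # Jensen's inequality, Proposition 6.1.6
print("Jensen's inequality verified on 1000 random samples")
```

Of course such a check can only fail to refute convexity; the proposition itself is what guarantees the inequality for every choice of points and coefficients.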

Sometimes we are given a convex set C ⊂ Rn and a function g : C → R that is convex on C in the traditional sense:

For each c1 and c2 in C and each λ ∈ [0,1],
g[(1−λ )c1 +λc2] ≤ (1−λ )g(c1)+λg(c2). (6.1)


We can extend such a function from the set C where it is defined to an ERV function on all of Rn by defining the +∞ extension of g to be the function f that agrees with g on C and takes the value +∞ off C. In such a case we have dom f = C, and if C is nonempty then f is proper. The next proposition shows that the +∞ extension f is convex in the sense of Definition 6.1.2 exactly when g is convex in the sense of (6.1).

Proposition 6.1.7. Let C be a nonempty convex subset of Rn and let g : C → R. The function g is convex on C in the sense of (6.1) if and only if the +∞ extension f of g is convex in the sense of Definition 6.1.2.

Proof. The function f is proper, so the convexity criterion of Proposition 6.1.5 applies. If g is convex, then let x and y be points in Rn and fix λ ∈ (0,1). If either x or y is outside C then the corresponding value of f is +∞, so f trivially satisfies the inequality in Proposition 6.1.5. Otherwise f agrees with g at both x and y, which must then belong to C, so the inequality follows from that of (6.1). Therefore f is convex.

If f is convex, then the inequality of Proposition 6.1.5 holds for f everywhere on Rn; in particular it holds on C, where g and f agree. Therefore g is convex in the sense of (6.1). ⊓⊔

One of the nice properties of convexity is that several functional operations preserve it. The next proposition develops one of these.

Proposition 6.1.8. Let A be a nonempty set and let f be a function from Rn ×A to R such that for each a ∈ A, f ( · ,a) is convex. Then the function

g(x) := sup_{a∈A} f (x,a)

is convex on Rn.

Proof. epi g = ∩_{a∈A} epi f ( · ,a), and each of the sets in the intersection is convex by hypothesis. Therefore epi g is convex, so g is convex. ⊓⊔
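To illustrate Proposition 6.1.8: each member of the affine family f(x, a) = ax − a² is convex in x, so the pointwise supremum over a is convex (over all real a the supremum works out to x²/4, attained at a = x/2). The sketch below uses an illustrative finite sample of parameters and spot-checks midpoint convexity of the resulting supremum:

```python
# A family of affine functions f(x, a) = a*x - a**2, each trivially convex in x.
# Their pointwise supremum g is convex (here piecewise affine), even though it
# is not differentiable everywhere.
A = [a / 10 for a in range(-50, 51)]  # illustrative finite sample of parameters

def g(x):
    return max(a * x - a * a for a in A)

# midpoint-convexity spot check of g on a grid
pts = [x / 10 for x in range(-30, 31)]
for x in pts:
    for y in pts:
        assert g((x + y) / 2) <= (g(x) + g(y)) / 2 + 1e-12
print("pointwise supremum g passed a midpoint-convexity check")
```

This is exactly the epigraph argument of the proof in discrete form: the epigraph of g is the intersection of the (convex) epigraphs of the affine pieces.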

Here is a very important application of convexity to optimization. Although extremely simple, it yields one of the few available tests for global minimization. If f is an extended-real-valued function on Rn, we say a point x ∈ Rn is a local minimizer of f if x ∈ dom f and there is some neighborhood N of x such that whenever x′ ∈ N, f (x′) ≥ f (x); x is a global minimizer of f if x ∈ dom f and for each x′ ∈ Rn we have f (x′) ≥ f (x).

Proposition 6.1.9. Let f be a convex function from Rn to R. Then each local minimizer of f is a global minimizer.

Proof. If there are no local minimizers, the result is true. Therefore assume that x is a local minimizer of f . If f (x) = −∞ then x is a global minimizer of f . Otherwise f (x) is finite; let y be any point of Rn. If f (y) = +∞ then f (y) ≥ f (x). Thus suppose f (y) < +∞ and let η be a real number greater than f (y). By hypothesis, for each small positive t, f [x+ t(y− x)] ≥ f (x). Therefore

f (x)≤ f [(1− t)x+ ty]≤ (1− t) f (x)+ tη ,

implying η ≥ f (x). As η was any number greater than f (y) we must have f (y) ≥ f (x). Therefore x is a global minimizer of f . ⊓⊔
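Proposition 6.1.9 can be seen in action numerically: a local search on a smooth convex function reaches the same minimizer no matter where it starts. In the sketch below the function, step size, and starting points are all illustrative choices:

```python
def f(x, y):
    return (x - 1.0) ** 2 + 2.0 * (y + 3.0) ** 2  # smooth convex; minimum at (1, -3)

def grad(x, y):
    return (2.0 * (x - 1.0), 4.0 * (y + 3.0))

def descend(x, y, step=0.1, iters=500):
    """Plain gradient descent, a purely local method."""
    for _ in range(iters):
        gx, gy = grad(x, y)
        x, y = x - step * gx, y - step * gy
    return x, y

# every local search, wherever it starts, lands on the same (global) minimizer
starts = [(-8.0, 5.0), (0.0, 0.0), (9.0, -9.0), (3.0, 3.0)]
for x0, y0 in starts:
    x, y = descend(x0, y0)
    assert abs(x - 1.0) < 1e-6 and abs(y + 3.0) < 1e-6
print("all starting points converged to the global minimizer (1, -3)")
```

For a nonconvex function the same experiment would in general produce different local minimizers from different starts, which is exactly why the proposition is valuable.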

Convex functions have very strong boundedness and continuity properties. The next two results establish some of the most important of these.

Lemma 6.1.10. Let f : Rn → (−∞,+∞] be a convex function and let x be a point of ri dom f . Then f is bounded above on some neighborhood of x relative to dom f .

The qualification “relative to dom f ” means that we will construct a neighborhood of x in the relative topology that dom f inherits from Rn, on which f will have a finite upper bound. We need the relative topology here because f takes +∞ at all points not in dom f , so no finite upper bound could hold for such points.

Proof. By hypothesis dom f is nonempty and f never takes −∞, so it is proper. Let dom f have dimension k ≥ 0 and use Lemma 1.1.13 to construct a k-simplex σ contained in dom f with x as its barycenter, and therefore with x ∈ riσ . Then σ is a neighborhood of x in the relative topology on dom f . However, each point of σ is a convex combination of the k+1 vertices of σ , and therefore by Jensen’s inequality f is bounded above on σ by the maximum value it attains on those vertices. ⊓⊔

The following theorem establishes local Lipschitz continuity of a proper convex function f , on ri dom f , relative to dom f . Saying that a function is locally Lipschitz continuous (equivalently, locally Lipschitzian) at a point x means that it obeys a Lipschitz condition on some neighborhood of x. As f is finite only on dom f , the Lipschitz condition here only holds with respect to points near x that are in dom f : that is, on a relative neighborhood of x in dom f . This continuity means in particular that f is bounded, rather than just bounded above, on dom f near a point x ∈ ri dom f .

Theorem 6.1.11. Let f : Rn → R be a proper convex function. If x ∈ ri dom f then f is locally Lipschitzian at x relative to dom f .

Proof. If dom f is a singleton the result is true, so assume that dom f contains more than one point. Let x ∈ ri dom f , and let ε be a positive number such that N := B(x,2ε)∩ aff dom f is contained in dom f and such that, by Lemma 6.1.10, f has an upper bound β on N. If y is any point of N then the point w = 2x− y also belongs to N. We have x = (1/2)y+(1/2)w, so

f (x)≤ (1/2) f (y)+(1/2) f (w)≤ (1/2) f (y)+(1/2)β ,

and therefore f (y) ≥ 2 f (x)−β =: γ.

Now let M = B(x,ε)∩ aff dom f , and let x′ and x′′ be any two distinct points of M. Let t = ε/∥x′− x′′∥ and define w′ = x′− t(x′′− x′). We have


∥w′− x∥ ≤ ∥x′− x∥+ t∥x′′− x′∥ ≤ 2ε,

so that w′ ∈ N. Also, x′ = (1+ t)−1w′+ t(1+ t)−1x′′, so by convexity

f (x′)≤ (1+ t)−1 f (w′)+ t(1+ t)−1 f (x′′)

= f (x′′)+(1+ t)−1[ f (w′)− f (x′′)]

≤ f (x′′)+(1+ t)−1(β − γ).

By interchanging the roles of x′ and x′′ we find that

| f (x′)− f (x′′)| ≤ (1+ t)−1(β − γ)≤ ε−1(β − γ)∥x′− x′′∥.

Therefore f is Lipschitzian on M with modulus ε−1(β − γ). ⊓⊔

6.1.2 Some special results for finite-valued convex functions

Sometimes one has to deal with convex functions having the special properties that they are finite-valued and defined on some convex subset C of Rn. They may have additional properties such as varying degrees of differentiability. In that case one can prove results stronger than those obtainable for general convex functions, and this section collects some of those results.

Definition 6.1.12. Let f be a real-valued function defined on a convex subset C of Rn. A scalar ϕ is called a modulus of convexity for f on C if for each x and y in C and each λ ∈ (0,1),

f [(1−λ )x+λy]≤ (1−λ ) f (x)+λ f (y)− (1/2)ϕλ (1−λ )∥x− y∥2.

⊓⊔

Definition 6.1.13. Let f be as in Definition 6.1.12. We say f is strongly convex if it has a positive modulus of convexity, convex if it has a zero modulus of convexity, and weakly convex if it has a negative modulus of convexity. In addition, we say f is strictly convex if for each x and y in C with x ≠ y and for each λ ∈ (0,1), f [(1−λ )x+λy] < (1−λ ) f (x)+λ f (y). ⊓⊔

If ϕ is a modulus of convexity for f then so is any smaller scalar. Accordingly, a function belonging to any of the categories weakly convex, convex, strictly convex, or strongly convex also belongs to each of the preceding categories.

It is often inconvenient to use the definition to verify that a number ϕ is a modulus of convexity for a function. If the function is differentiable, one can often obtain information about its convexity by testing the derivatives instead. Here are two tests, one using the first derivative and one using the second.

Theorem 6.1.14. Let Ω be an open subset of Rn, f a C1 function from Ω to R, and C a convex subset of Ω . For any real scalar ϕ , the following are equivalent:


a. ϕ is a modulus of convexity for f on C.
b. For each x and y in C, f (y) ≥ f (x)+d f (x)(y− x)+(ϕ/2)∥y− x∥2.
c. For each x and y in C, [d f (y)−d f (x)](y− x) ≥ ϕ∥y− x∥2.

Proof. ((a) implies (b)). Assume that ϕ is a modulus of convexity for f on C, and let x and y be points of C. For each small positive t, we have

(1− t) f (x)+ t f (y)− f [(1− t)x+ ty]≥ (ϕ/2)t(1− t)∥y− x∥2.

Therefore

t[ f (y)− f (x)]≥ f [(1− t)x+ ty]− f (x)+(ϕ/2)t(1− t)∥y− x∥2,

and so

f (y)− f (x) ≥ t−1{ f [x+ t(y− x)]− f (x)}+(ϕ/2)(1− t)∥y− x∥2.

Taking the limit as t ↓ 0, we have (b).

((b) implies (a)). Suppose (b) holds. Let λ ∈ (0,1) and write w = (1−λ )x+λy, so that y = w+(1−λ )(y− x) and x = w+(−λ )(y− x). Using (b), we obtain

f (y)≥ f (w)+d f (w)(1−λ )(y− x)+(ϕ/2)(1−λ )2∥y− x∥2, (6.2)

and
f (x) ≥ f (w)+d f (w)(−λ )(y− x)+(ϕ/2)(−λ )2∥y− x∥2. (6.3)

Multiplying (6.2) by λ and (6.3) by (1−λ ) and adding, we find that

λ f (y)+(1−λ ) f (x)− f [(1−λ )x+λy]≥ (ϕ/2)λ (1−λ )∥y− x∥2,

so that ϕ is a modulus of convexity for f on C.

((b) implies (c)). Assume that (b) holds, and choose any x and y in C. We have

f (y)− f (x)≥ d f (x)(y− x)+(ϕ/2)∥y− x∥2,

and
f (x)− f (y) ≥ d f (y)(x− y)+(ϕ/2)∥y− x∥2,

so [d f (y)−d f (x)](y− x) ≥ ϕ∥y− x∥2, which is (c).

((c) implies (b)). Let (c) hold, and suppose that x and y belong to C. Then writing h = y− x, we have

f (y)− f (x) = ∫_0^1 d f (x+ th)h dt = d f (x)h+∫_0^1 [d f (x+ th)−d f (x)]h dt.

Now [d f (x+ th)−d f (x)](th) ≥ ϕ∥th∥2, so

∫_0^1 [d f (x+ th)−d f (x)]h dt ≥ ϕ∥h∥2 ∫_0^1 t dt = (ϕ/2)∥h∥2.


Therefore we have

f (y)≥ f (x)+d f (x)(y− x)+(ϕ/2)∥y− x∥2,

as required. ⊓⊔
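The three equivalent conditions of Theorem 6.1.14 are easy to check numerically for a concrete function. The sketch below uses the illustrative choice f(x) = x² on R, for which ϕ = 2 is a modulus of convexity; in this case every condition holds with equality, since f(y) − f(x) − f′(x)(y − x) = (y − x)² exactly:

```python
# Numeric check of conditions (a), (b), (c) of Theorem 6.1.14 for f(x) = x**2,
# with modulus of convexity phi = 2.
f = lambda x: x * x
df = lambda x: 2.0 * x
phi = 2.0

pts = [x / 4 for x in range(-20, 21)]
lams = [0.1, 0.25, 0.5, 0.75, 0.9]
for x in pts:
    for y in pts:
        # (b): first-order lower bound strengthened by (phi/2)*|y - x|^2
        assert f(y) >= f(x) + df(x) * (y - x) + (phi / 2) * (y - x) ** 2 - 1e-9
        # (c): strong monotonicity of the derivative
        assert (df(y) - df(x)) * (y - x) >= phi * (y - x) ** 2 - 1e-9
        for lam in lams:
            # (a): the defining inequality of Definition 6.1.12
            lhs = f((1 - lam) * x + lam * y)
            rhs = ((1 - lam) * f(x) + lam * f(y)
                   - 0.5 * phi * lam * (1 - lam) * (y - x) ** 2)
            assert lhs <= rhs + 1e-9
print("conditions (a), (b), (c) verified with phi = 2 for f(x) = x**2")
```

Such a grid check cannot replace the theorem, but it is a useful sanity test when proposing a modulus ϕ for a given function.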

Suppose f is a C2 function from a neighborhood N of x ∈ Rn to Rm. The second derivative d2 f , when evaluated at x ∈ Rn, produces a linear operator d2 f (x) : Rn → L (Rn,Rm), where L (Rn,Rm) denotes the class of linear transformations from Rn to Rm. We will write d2 f (x)yz for the result of applying d2 f (x) to y ∈ Rn to produce the linear operator J := d2 f (x)y : Rn → Rm, then applying J to z ∈ Rn to produce J(z) ∈ Rm.

When m = 1 it is common to represent d2 f (x) by an n×n symmetric matrix D, with the interpretation that d2 f (x)yz = ⟨Dy,z⟩. In this case we will say that d2 f (x) is positive semidefinite or positive definite if the representation D has these properties. When m > 1 one can still use matrix representatives, but with m matrices D1, . . . ,Dm. This can be somewhat awkward, and we will generally avoid this notation in favor of that explained in the preceding paragraph.

Theorem 6.1.15. Let Ω be an open subset of Rn; let C be a relatively open convex subset of Ω and L be the parallel subspace of C. Let f be a C2 function from Ω to R. A scalar ϕ is a modulus of convexity for f on C if and only if for each x ∈ C and each z ∈ L, d2 f (x)zz ≥ ϕ∥z∥2.

Proof. If C is empty or a singleton there is nothing to prove, so we can assume that L contains a nonzero point.

(only if). Suppose ϕ is a modulus of convexity for f on C, and choose z ∈ L. If z = 0 then the assertion is true, so assume z ≠ 0 and define w = z/∥z∥. Let x ∈ C and let t be a real number small enough that x+ tw and x− tw belong to C. We have

f (x) = f [(1/2)(x+ tw)+(1/2)(x− tw)]

≤ (1/2) f (x+ tw)+(1/2) f (x− tw)− (ϕ/2)(1/4)∥2tw∥2,

so that

(ϕ/2)t2 ≤ (1/2) f (x+ tw)+(1/2) f (x− tw)− f (x). (6.4)

We also have

f (x+ tw)− f (x) = td f (x)w+[(t2)/2]d2 f (x)ww+o(t2), (6.5)

and
f (x− tw)− f (x) = −td f (x)w+[(t2)/2]d2 f (x)ww+o(t2). (6.6)

Multiplying (6.5) and (6.6) by 1/2, adding, and combining the result with (6.4), we obtain

(ϕ/2)t2 ≤ [(t2)/2]d2 f (x)ww+o(t2),

so that


ϕ ≤ d2 f (x)ww+ t−2o(t2),

implying d2 f (x)ww ≥ ϕ. But then d2 f (x)zz ≥ ϕ∥z∥2.

(if). Suppose that for each x ∈ C and each z ∈ L we have d2 f (x)zz ≥ ϕ∥z∥2. Let x and y belong to C and suppose λ ∈ (0,1). For each z ∈ C, each v ∈ L, and any scalar t such that z+ tv ∈ C, we have for σ ∈ [0,1]

d f (z+σtv)−d f (z) = ∫_0^1 d2 f (z+ασtv)(σtv)dα, (6.7)

and

f (z+ tv)− f (z) = d f (z)tv+∫_0^1 [d f (z+σtv)−d f (z)](tv)dσ . (6.8)

Using (6.7) in (6.8), we find that

f (z+ tv)− f (z) = d f (z)tv+∫_0^1 [∫_0^1 d2 f (z+ασtv)(σtv)dα](tv)dσ

≥ d f (z)tv+∫_0^1 ∫_0^1 ϕσt2∥v∥2 dα dσ

= d f (z)tv+(ϕ/2)t2∥v∥2.

Now let z = (1−λ )x+λy, so that y = z+(1−λ )v and x = z−λv, where v = y− x. Then

f (y)− f (z)≥ (1−λ )d f (z)(y− x)+(ϕ/2)(1−λ )2∥y− x∥2,

f (x)− f (z)≥ (−λ )d f (z)(y− x)+(ϕ/2)(−λ )2∥y− x∥2,

and therefore

λ f (y)+(1−λ ) f (x)− f [(1−λ )x+λy]

≥ (ϕ/2)[λ (1−λ )2 +(1−λ )(−λ )2]∥y− x∥2

= (ϕ/2)λ (1−λ )∥y− x∥2.

Therefore ϕ is a modulus of convexity for f on C. ⊓⊔

Corollary 6.1.16. Let Ω , C, L, and f be as in Theorem 6.1.15. The function f is convex on C if and only if for each x ∈ C, d2 f (x) is positive semidefinite on L.

Proof. f is convex on C if and only if it has 0 as a modulus of convexity there. ⊓⊔

Corollary 6.1.16 shows, for example, that for an n×n, not necessarily symmetric, matrix A, an n-vector a, and a scalar α the quadratic function ⟨Ax,x⟩+ ⟨a,x⟩+α is convex on Rn if and only if the symmetric matrix A+A∗ is positive semidefinite.
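A quick numerical check of this remark, for an illustrative non-symmetric 2×2 matrix: we test A + A∗ for positive semidefiniteness (for a symmetric 2×2 matrix this reduces to nonnegative diagonal entries and nonnegative determinant) and then spot-check midpoint convexity of the quadratic on a grid:

```python
# Illustrative data: a non-symmetric matrix A, a vector a, and a scalar alpha.
A = [[1.0, 3.0], [-1.0, 2.0]]
a = [1.0, -2.0]
alpha = 0.5

def q(x):
    """The quadratic <Ax, x> + <a, x> + alpha on R^2."""
    quad = sum(A[i][j] * x[j] * x[i] for i in range(2) for j in range(2))
    return quad + a[0] * x[0] + a[1] * x[1] + alpha

# S = A + A^T = [[2, 2], [2, 4]]; for a symmetric 2x2 matrix, positive
# semidefiniteness is equivalent to nonnegative diagonal and determinant.
S = [[A[i][j] + A[j][i] for j in range(2)] for i in range(2)]
psd = S[0][0] >= 0 and S[1][1] >= 0 and S[0][0] * S[1][1] - S[0][1] ** 2 >= 0
assert psd

# midpoint-convexity spot check of q on a grid, consistent with Corollary 6.1.16
grid = [(u / 2, v / 2) for u in range(-8, 9) for v in range(-8, 9)]
for x in grid:
    for y in grid:
        m = ((x[0] + y[0]) / 2, (x[1] + y[1]) / 2)
        assert q(m) <= (q(x) + q(y)) / 2 + 1e-9
print("A + A^T is PSD and q passed a midpoint-convexity check")
```

Note that only the symmetric part of A matters here: replacing A by (A + A∗)/2 leaves q unchanged.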


6.1.3 *Example: Convexity and an intractable function

Here is an illustration of a function occurring naturally in an application, but not having a tractable closed-form representation. Nevertheless, we will see that this function is convex, so that all of the theory that we have already developed and will develop below applies to it. In fact, some of that theory is central to effective computing methods for dealing with such functions. To state the example we need some technical terms from probability, but these do not affect the aspects of the example with which we are concerned.

Suppose that customers arrive at a service center according to a proper renewal process with interarrival times λi: that is, the process begins at time zero; the first customer arrives at epoch a1 = λ1, the second at epoch a2 = λ1 +λ2, etc., where the λi have a common continuous distribution. A customer who arrives when no others are present is served immediately; if others are present the arriving customer joins a queue and is served after all prior customers have been served in the order of their arrival. The ith customer’s service requires a time µi that is of the form p+ γi, where p is a fixed positive real number and the γi are nonnegative, with a common continuous probability distribution, and are independent of each other and of the λi. A customer’s waiting time is the time from arrival until the beginning of service; thus it is the total time in the system less the service time.

The management of the service center wants to avoid long waiting times because they make customers unhappy. Under appropriate technical assumptions on p and on the λi and γi, the managers can be sure that the expected value of the waiting time is finite. Moreover, by reducing p they can reduce the service time, which should reduce waiting time, but at a cost. The contribution of p to the total daily cost of operating the system is given by a function π(p) that is convex, continuous, and, for values of p in the interval [pL, pU ], decreasing; here 0 < pL < pU < +∞ and the managers are able to set p at any value in [pL, pU ]. Thus if p decreases, the daily cost increases.

From past experience the managers believe that the contribution of the average waiting time w∗ to the average daily revenue from customers is given by a real-valued function δ (w∗) that is negative, concave, continuous, and decreasing: thus, if average waiting time increases, then the average daily revenue will decrease. Accordingly, the managers would like to find the best tradeoff of p against average waiting time. Specifically, if for a given value of p the average waiting time is w∗(p), then they would like to set p so as to minimize the function π(p)−δ [w∗(p)] over values of p in [pL, pU ]. However, in order to implement such a plan they need to know the relationship of w∗(p) to p.

To understand how w∗(p) (or an approximation to it) is related to p, we will investigate the way in which individual waiting times are determined. All these times depend on p, but as for the moment p is fixed, to simplify the notation we suppress p in the part of the analysis immediately following.

• Suppose that the system starts operating at time zero, and that the first customer arrives at time a1 = λ1 > 0. This customer finds the system empty, and immediately enters service, departing as soon as service is completed at time d1 = λ1 + p+ γ1. The waiting time w1 for this first customer is zero.

• Now suppose that k ≥ 1. The k+1st customer arrives at ak+1 = ak +λk+1. We have

dk −ak+1 = (ak +wk + p+ γk)− (ak +λk+1) = wk + p+ γk −λk+1. (6.9)

If dk −ak+1 > 0 then this customer has to wait to enter service until dk, departing at dk+1 = dk + p+ γk+1. The waiting time is then

wk+1 = dk+1 − p− γk+1 −ak+1
= (dk + p+ γk+1)− p− γk+1 −ak+1
= dk −ak+1
= wk + p+ γk −λk+1,

where the last line came from (6.9). On the other hand, if dk −ak+1 < 0 then this customer enters service immediately and wk+1 = 0. In this case, again from (6.9), we have wk + p+ γk −λk+1 < 0. Therefore

wk+1 = max{0, wk + p+ γk −λk+1}. (6.10)

We have w1 = 0, while (6.10) gives wk+1 for each k ≥ 1.

With this expression in hand, the managers decide to attack their problem by simulating wk for a long run of n arrivals, then estimating w∗(p) by the sample average

m∗(p) = n−1 ∑_{k=1}^{n} wk(p).

If n is chosen to be large, there is a priori a high probability that m∗(p) will be close to w∗(p). For a given sample path produced by simulation, the managers plan to regard m∗(p) as a deterministic function of p, then optimize it. This is the method of sample-path optimization, and it can be shown that under suitable assumptions, for large n an optimizer of m∗(p) will with probability 1 be close to an optimizer of w∗(p) [36].
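The recursion (6.10) and the sample average m∗(p) are straightforward to simulate. In the sketch below the λi and γi are drawn from exponential distributions, an illustrative choice of "common continuous distribution" (the text does not fix one), and a fixed random seed plays the role of the frozen sample path:

```python
import random

def sample_mean_wait(p, n, seed=42):
    """Simulate n arrivals via recursion (6.10) and return the sample
    average m*(p) of the waiting times.

    Exponential interarrival times lambda_i and service perturbations
    gamma_i are illustrative distributional choices.  Fixing the seed
    freezes one sample path, reused for every value of p.
    """
    rng = random.Random(seed)
    lam = [rng.expovariate(1.0) for _ in range(n + 1)]  # interarrival times
    gam = [rng.expovariate(2.0) for _ in range(n)]      # service perturbations
    w, total = 0.0, 0.0                                 # w_1 = 0
    for k in range(n - 1):
        total += w
        w = max(0.0, w + p + gam[k] - lam[k + 1])       # recursion (6.10)
    total += w
    return total / n

ps = [0.1, 0.3, 0.5, 0.7]
ms = [sample_mean_wait(p, n=10_000) for p in ps]
# along the frozen sample path, m*(p) is nondecreasing and convex in p
assert all(m1 <= m2 for m1, m2 in zip(ms, ms[1:]))
assert ms[1] <= (ms[0] + ms[2]) / 2 + 1e-9
print("sample averages m*(p):", [round(m, 3) for m in ms])
```

The two assertions hold deterministically once the sample path is fixed: each wk(p) is nondecreasing and convex in p, which is exactly the structural argument developed below.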

However, there is a difficulty: the preceding paragraph discussed an optimizer of m∗(p), but how are we to know that one exists? It certainly is infeasible to write out an expression for m∗(p) for n of the size (in the tens of thousands) typically used in sample-path optimization. If we cannot even write down the function, how are we to answer questions about the existence of an optimizer, or about whether it is a local or global optimizer?

Here the structure of wk(p), shown in (6.10), is of crucial importance. Recall that once the sample path is generated, the γk and λk are no longer random variables but are fixed, known numbers. Then beginning with k = 2 we argue that w2(p), as the maximum of a constant function and an affine function of p, is convex and continuous. Then for k ≥ 2, wk+1 will be the maximum of a constant and the convex,


continuous function wk(p), and therefore it will be convex and continuous too. In general we would be using lower semicontinuity instead of continuity, but in this case the expressions are simple enough that continuity holds.

Finally, since we are dealing with finite convex functions on the interval [pL, pU ], the sum of all the wk(p), and therefore the sample mean m∗(p), will be a convex continuous function of p. Then the tradeoff function π(p)−δ [m∗(p)] to be minimized will also be convex and continuous. As [pL, pU ] is compact and convex, this means that first, we are guaranteed that a minimizer exists, and second, we are guaranteed that any local minimizer will be a global minimizer.

This example is highly simplified in order to provide some insight into the methodology without introducing more complications and technical details than were necessary. However, the reasoning used here applies to much more complex problems as well. For example, the paper [30] reports computational solution of two classes of such problems: optimizing buffer sizes in manufacturing transfer lines with random breakdown and repair, and cost optimization in stochastic PERT networks. At the time that work was done, the problems reported were larger than any of their kind for which solutions had been reported. In those cases the tools of convexity were important not only for analyzing the minimization problem but also for providing some of the computational methods for obtaining the numerical solutions.

6.1.4 Exercises for Section 6.1

Exercise 6.1.17. Suppose that g is a convex function from Rn ×Rm to R. Show that the function

h(x) := inf_{y∈Rm} g(x,y)

is convex on Rn.

Exercise 6.1.18. Show that if f is a convex function from Rn to (−∞,+∞] and g is a nondecreasing convex function from (−∞,+∞] to (−∞,+∞] such that g(+∞) = +∞, then the composition g ◦ f is convex.

Exercise 6.1.19. Let f be a proper convex function on Rn. We say that f is positively homogeneous if for each τ > 0 and each x ∈ Rn, f (τx) = τ f (x). Show that f is positively homogeneous if and only if epi f is a convex cone in Rn+1.

Exercise 6.1.20. Let C be a convex subset of Rn+1 and for each x ∈ Rn define ϕ(x) := inf{ξ | (x,ξ ) ∈ C}. Then ϕ is a convex function from Rn to R.

6.2 Lower semicontinuity and closure

Sometimes convex functions may not look the way one might expect them to. Here is an example.


Example 6.2.1. Let ψ be any function from Sn−1 to the non-negative halfline R+. Define a function f : Rn → (−∞,+∞] by:

f (x) =
0 if ∥x∥ < 1,
ψ(x) if ∥x∥ = 1,
+∞ if ∥x∥ > 1.

This f is then a proper convex function (verify!).

Functions like f in Example 6.2.1 do not fit one’s mental image of what convex functions ought to be like. If we were to ask ourselves why this f seems bad, the answer would be that it takes arbitrary non-negative values on Sn−1.

What if we were to redefine it to be zero there, to match its values on the interior of the unit ball Bn? Then we would simply have the indicator IBn of the unit ball, which is a perfectly nice convex function. It is lower semicontinuous, not continuous, but no function g taking finite values on a closed proper subset F of Rn and values of +∞ on Rn\F can be continuous, because the inverse image under g of the open set R is F , which is not open. Lower semicontinuity is the best we can expect from a function like IBn . However, f as defined above is in general not lower semicontinuous.

Section 6.2.1 introduces semicontinuity and provides some useful technical tools. Following that, Section 6.2.2 combines lower semicontinuity with convexity.

6.2.1 Semicontinuous functions

The functions with which we work in this section are not necessarily convex.

Definition 6.2.2. Let S ⊂ Rn and let f : S → R. f is lower semicontinuous (lsc) at x0 ∈ S if for each real µ < f (x0) the set {x | f (x) > µ} is a neighborhood of x0 in S. f is upper semicontinuous (usc) at x0 if − f is lsc at x0. f is lsc (usc) on S if it is lsc (usc) at each point of S. When S = Rn we just say f is lsc (usc). ⊓⊔

An equivalent definition of lower semicontinuity says that f is lower semicontinuous at x0 if f (x0) ≤ liminf_{x→S x0} f (x) (and upper semicontinuous if f (x0) ≥ limsup_{x→S x0} f (x)). It will often be convenient for us to use this definition instead of Definition 6.2.2, but before doing so we need to clarify some differences in terminology used in the literature.

Suppose f is a function from a topological space X to the real line R. For any point x of X let N (x) be the neighborhood system at x: that is, the collection of open sets in X that contain x. The traditional definition of the limit inferior of f as x approaches a point x0 ∈ X is

liminf_{x→x0} f (x) = sup_{N∈N (x0)} inf_{x∈N\{x0}} f (x); (6.11)


see for example [42, p. 48] for the case in which X is a subspace of R. A different definition that has been used recently in variational analysis is

liminf_{x→x0} f (x) = sup_{N∈N (x0)} inf_{x∈N} f (x); (6.12)

see [39, p. 51] and [41, p. 8]. The exclusion of x0 in (6.11) matters, as one can see by defining f : R → R to be −1 at the point x0 := 0 and zero everywhere else. The following lemma shows the relationship between these two definitions.

Lemma 6.2.3. Let X be a topological space containing a point x0 and let f : X → R. Define

α := sup_{N∈N (x0)} inf_{x∈N} f (x), (6.13)

and

β := sup_{N∈N (x0)} inf_{x∈N\{x0}} f (x). (6.14)

Then α = min{ f (x0), β}.

Proof. If N ∈ N (x0) then x0 ∈ N, so that inf_{x∈N} f (x) ≤ f (x0). Take the supremum over N ∈ N (x0) in this inequality to get α ≤ f (x0). Considering N again we see that as N\{x0} ⊂ N, by taking the supremum over N ∈ N (x0) we obtain α ≤ β . We have thus proved that α ≤ min{ f (x0), β}. To complete the proof it suffices to show that if α < f (x0) then α = β .

If α < f (x0) then choose ε > 0 so that α < f (x0)− ε. The definition of α shows that then for each N ∈ N (x0) there must be some xN ∈ N with f (xN) < f (x0)− ε. But then removing x0 from N will not affect the infimum of f (x) on N, so inf_{x∈N} f (x) = inf_{x∈N\{x0}} f (x). Taking the supremum over N ∈ N (x0) yields α = β . ⊓⊔
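Lemma 6.2.3 can be illustrated with the example mentioned above (f equal to −1 at x0 = 0 and zero elsewhere) by approximating the two suprema of infima over shrinking symmetric neighborhoods sampled on a grid. The discretization is an illustrative approximation, not part of the lemma:

```python
# Discretized illustration of Lemma 6.2.3 for f(0) = -1, f(x) = 0 otherwise,
# with x0 = 0.  We approximate alpha and beta by infima over shrinking
# symmetric neighborhoods (-r, r) sampled on a grid.
def f(x):
    return -1.0 if x == 0.0 else 0.0

def inf_over(radius, exclude_x0):
    pts = [k * radius / 100 for k in range(-100, 101)]
    if exclude_x0:
        pts = [x for x in pts if x != 0.0]
    return min(f(x) for x in pts)

radii = [1.0 / 2 ** j for j in range(20)]
alpha = max(inf_over(r, exclude_x0=False) for r in radii)  # definition (6.12)
beta = max(inf_over(r, exclude_x0=True) for r in radii)    # definition (6.11)

assert alpha == -1.0     # x0 itself drags the infimum down to f(x0)
assert beta == 0.0       # excluding x0, f is identically zero nearby
assert alpha == min(f(0.0), beta)  # the conclusion of Lemma 6.2.3
print("alpha =", alpha, " beta =", beta)
```

Here α < β, showing concretely why the exclusion of x0 in (6.11) matters.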

With this lemma we can show that the choice between the definitions of limit inferior in (6.11) and in (6.12) has no effect on the determination of lower semicontinuity at x0.

Proposition 6.2.4. Let X, x0, and f be as in Lemma 6.2.3. The inequality

f(x0) ≤ liminf_{x→x0} f(x)   (6.15)

holds with the definition of limit inferior in (6.11) if and only if it holds with the definition of limit inferior in (6.12).

Proof. We use the definitions of α and β given in (6.13) and (6.14). Suppose that (6.15) holds using β for the limit inferior. Then f(x0) ≤ β, so Lemma 6.2.3 shows that f(x0) = α, and therefore (6.15) holds using α for the limit inferior. On the other hand, if (6.15) holds using α for the limit inferior then f(x0) ≤ α; but the lemma shows that α ≤ β, so that (6.15) holds using β for the limit inferior. ⊓⊔

As the choice between (6.11) and (6.12) does not affect the determination of lower semicontinuity, we are free to use either definition. In the remainder of this book we use (6.12).


If f is both lsc and usc at x0 then it is continuous there (Exercise 6.2.18). Therefore the idea of semicontinuity in effect splits the definition of continuity into two parts. That refinement is useful because, as we see next, some functional operations commonly occurring in optimization preserve semicontinuity although they do not preserve continuity.

Proposition 6.2.5. Let S ⊂ Rn, let x0 ∈ S, and let {fα | α ∈ A} be a collection of functions from S to R. If each fα is lsc at x0, then the function f defined by f(x) = sup_{α∈A} fα(x) is lsc at x0. The corresponding statement holds for usc functions, with “infimum” replacing “supremum.”

Proof. We prove only the first statement, as the second follows immediately from it. If A is empty then we are dealing with the constant function −∞, which is clearly lsc. Therefore suppose that A ≠ ∅.

If f(x0) = −∞ then f is certainly lsc at x0. Therefore suppose otherwise, and let µ be a real number with µ < f(x0). There must be some α ∈ A with fα(x0) > µ, and by hypothesis the set N = {x | fα(x) > µ} is a neighborhood of x0 in S. But {x | f(x) > µ} ⊃ N, so this set is also a neighborhood of x0 in S and therefore f is lsc at x0. ⊓⊔

Proposition 6.2.5 shows one reason why semicontinuity is useful, since the supremum and infimum operations certainly do not preserve continuity even when the index set is countable (Exercise 6.2.19). It also shows that statements about lsc functions have mirror images for usc functions. For simplicity, in the remainder of this section we deal only with lsc functions, leaving it to the reader to derive the corresponding statements for usc functions.
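As an illustration of both points, here is a minimal sketch (the particular family is ours) of a countable collection of continuous functions whose supremum is lsc but not continuous, in the spirit of Exercise 6.2.19.

```python
# f_k(x) = min(1, k*|x|) is continuous for each k; the pointwise supremum
# over k is 0 at x = 0 and 1 everywhere else -- lsc, but not continuous at 0.

def f_k(k, x):
    return min(1.0, k * abs(x))

def f_sup(x, kmax=10_000):
    # finite truncation of the supremum; exact as soon as k*|x| >= 1
    return max(f_k(k, x) for k in range(1, kmax + 1))

print(f_sup(0.0))   # 0.0
print(f_sup(1e-3))  # 1.0
print(f_sup(-0.5))  # 1.0
# Every lower level set {x | sup_k f_k(x) <= mu} with mu < 1 is {0}, which
# is closed, so the supremum is lsc despite the jump down at the origin.
```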

Sometimes it is simpler to deal with semicontinuity on Rn than with semicontinuity on a subset. If the subset in question is closed, we can easily pass back and forth between the two by using the +∞ extension.

Proposition 6.2.6. Let S be a closed subset of Rn and let f : S → R. Then f is lsc on S if and only if the +∞ extension of f is lsc on Rn.

Proof. Denote the +∞ extension by g, and let x0 ∈ Rn.

(only if) Assume that f is lsc on S. If g(x0) = −∞ then g is lsc at x0. Therefore suppose that g(x0) > −∞ and let µ be a real number with µ < g(x0). If x0 ∉ S then there is some neighborhood of x0 that is disjoint from S, and on that neighborhood g takes the value +∞, which certainly is greater than µ. On the other hand, if x0 ∈ S then by hypothesis there is a neighborhood NS of x0 in S on which f (hence g) takes values greater than µ. This NS is of the form S ∩ N, where N is a neighborhood of x0 in Rn. As g takes +∞ everywhere on N \ S, the values of g everywhere on N are greater than µ, so g is lsc at x0.

(if) Assume g is lsc on Rn and choose x0 ∈ S. If f(x0) = −∞ there is nothing more to prove, so let µ be any real number with µ < f(x0). Then µ < g(x0), since f and g agree on S. Find a neighborhood N of x0 in Rn such that g(x) > µ for each x ∈ N, and let NS = N ∩ S. Then NS is a neighborhood of x0 in S, and for each x ∈ NS, f(x) = g(x) > µ. Therefore f is lsc at x0. ⊓⊔


The assumption that S is closed is necessary in the “only if” part of Proposition 6.2.6 (Exercise 6.2.20).

It is not always easy to prove that a function is lsc using Definition 6.2.2, so it is helpful to have a general criterion for a function defined on Rn to be lsc. The next theorem develops such a criterion. In it we use the notation lev_α f to denote the lower level set {x | f(x) ≤ α}, where f is any extended-real-valued function on Rn and α ∈ R.

Theorem 6.2.7. Let f be an extended-real-valued function on Rn. The following are equivalent:

a. f is lsc.
b. epi f is closed.
c. For each real µ, lev_µ f is closed.

Proof. (a) implies (b): Suppose f is lsc, and let (x, µ) ∉ epi f. Then µ < f(x), so we can find a real number ν with µ < ν < f(x). By the lsc assumption there is a neighborhood N of x such that for each x′ ∈ N, f(x′) > ν. Let λ < µ; then N × (λ, ν) is a neighborhood of (x, µ) that is disjoint from epi f. Therefore epi f is closed.

(b) implies (c): The set (lev_µ f) × {µ} is the intersection of epi f with Rn × {µ}. The intersection is closed, so lev_µ f is closed.

(c) implies (a): Let x ∈ Rn and suppose µ is a real number with f(x) > µ. Then x ∉ lev_µ f, so by the closedness assumption there is a neighborhood N of x that is disjoint from lev_µ f. At each point of this neighborhood f must therefore take a value greater than µ, so f is lsc at x and therefore, as x was arbitrary, f is lsc. ⊓⊔
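The level-set criterion is easy to probe numerically. In this sketch (the two step functions are our own choices) a function that jumps upward as x crosses 0 has closed lower level sets, while one that jumps the other way does not; a sequence approaching the boundary point exposes the failure.

```python
# Probe of criterion (c): does a lower level set contain the limit of a
# sequence of its own members?

def f(x):  # lsc: f(0) = 0, so the jump at 0 is "upward" moving right
    return 1.0 if x > 0 else 0.0

def g(x):  # not lsc at 0: g(0) = 1 exceeds the limit from the left
    return 1.0 if x >= 0 else 0.0

mu = 0.5
seq = [-1.0 / k for k in range(1, 200)]   # x_k -> 0 from the left

print(all(f(x) <= mu for x in seq), f(0.0) <= mu)  # True True: 0 stays in lev_mu f
print(all(g(x) <= mu for x in seq), g(0.0) <= mu)  # True False: lev_mu g not closed
```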

Here is an important consequence of Theorem 6.2.7 for optimization.

Proposition 6.2.8. Let f be an lsc extended-real-valued function on Rn, and let S be a nonempty, compact subset of Rn. Then f attains a minimum on S.

Proof. For each real µ let Sµ = S ∩ lev_µ f. Then Sµ is compact. If all of the Sµ are empty then f is identically +∞ on S, so it attains a minimum of +∞ there. Suppose therefore that some Sµ is not empty; then the value I := inf_{x∈S} f(x) is not +∞, and for each µ > I the set Sµ is nonempty. As the Sµ are nested, the set S* := ∩_{µ>I} Sµ is nonempty and compact. On this set f cannot take any value greater than I; therefore it takes the constant value I, which must then be the minimum of f on S. ⊓⊔

6.2.2 Closed convex functions

As we have seen that lower semicontinuity can be helpful in optimization, we can try to find ways in which to regularize functions as we did in Example 6.2.1. For general functions this leads to the definition of the lower semicontinuous hull; for convex functions it is helpful to adjust that definition in cases where the function takes −∞, to obtain the closure operation.


Definition 6.2.9. Let f be an extended-real-valued function on Rn. The lower semicontinuous hull of f is the function lsc f whose epigraph is cl epi f. The closure of f is the function cl f that is the lower semicontinuous hull of f if f never takes −∞, and is identically −∞ otherwise. f is closed if f = cl f. ⊓⊔

The definition implies that epi cl f ⊃ epi f and therefore for each x, cl f(x) ≤ f(x). We say that a function g minorizes f, or equivalently that g is a minorant of f, if g ≤ f: that is, for each x ∈ Rn one has g(x) ≤ f(x). Accordingly, the closure of f always minorizes f.

We will see in Theorem 6.2.14 below that the two functions actually agree everywhere except possibly on the relative boundary of dom f. In Example 6.2.1 both lsc f and cl f equal the indicator IB. The closure operation removes all of the bad behavior of f on Sn−1, which in that case is the relative boundary of dom f.

We can see that in a certain sense the closure operation regularizes a convex function. In the remainder of this section we investigate the relationship between such a function and its closure, developing an internal representation for the closure in terms of limits of function values along line segments. We will later establish an external representation in terms of pointwise suprema of affine minorants.

To develop the first of these results we need two technical tools, which we prove next. The first shows that an improper convex function can assume finite values only on the relative boundary of its effective domain, though it is not guaranteed to be finite even there.

Proposition 6.2.10. Let f be an improper convex function on Rn. Then for each x ∈ ri dom f one has f(x) = −∞.

Proof. If f ≡ +∞ then the claim is true. Otherwise, dom f is nonempty and theremust (since f is improper) be some z with f (z) = −∞. Choose any x ∈ ridom f ;then by part (c) of Corollary1.2.5 there is some y ∈ dom f and some λ ∈ (0,1) withx = (1−λ )z+λy. Let α be a fixed real number with f (y)< α and let β be any realnumber. As β > f (z), Proposition 6.1.3 tells us that f (x) < (1−λ )β +λα . But βwas arbitrary, so we must have f (x) =−∞. ⊓⊔

Proposition 6.2.11. Let f be a convex function on Rn. Then

ri epi f = {(x, ξ) ∈ Rn+1 | x ∈ ri dom f, f(x) < ξ}.

Proof. If f ≡ +∞ the claim is surely true. Otherwise, define a linear transformation L : Rn+1 → Rn by L(x, ξ) = x. Using Proposition 1.2.7, we find that

L(ri epi f) = ri L(epi f) = ri dom f,

so that the points x of ri dom f are exactly those such that for some ξ one has (x, ξ) ∈ ri epi f. Let x be such a point and let V = {x} × R. V is relatively open because it is affine, and it must meet ri epi f since x ∈ L(ri epi f). By Proposition 1.2.12,

Page 141: 20151103 CFD

6.2 Lower semicontinuity and closure 133

V ∩ ri epi f = ri(V ∩ epi f) = ri {(x, ξ) | ξ ≥ f(x)} = {x} × (f(x), +∞),

which proves the assertion. ⊓⊔

Proposition 6.2.11, together with separation, yields an extremely important result about the existence of subgradients. We define these first, then prove the existence result.

Definition 6.2.12. Let f be a convex function from Rn to R, and let x0 ∈ Rn. An element x∗ ∈ Rn is a subgradient of f at x0 if for each x ∈ Rn,

f(x) ≥ f(x0) + ⟨x∗, x − x0⟩.   (6.16)

⊓⊔

If x∗ is a subgradient of f at x0 and f(x0) = +∞, then f ≡ +∞, whereas if f(x0) = −∞ then every point of Rn is a subgradient of f at x0. These cases are not very interesting compared to the case in which f(x0) is finite, when the existence of a subgradient yields the affine function on the right in (6.16). This function minorizes f and agrees with f at the point x0, so its graph is a hyperplane H[(x∗, −1), ⟨(x∗, −1), (x0, f(x0))⟩] in Rn+1 that supports epi f at the point (x0, f(x0)).
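For a concrete check of (6.16), the sketch below (the helper name is ours) tests candidate subgradients of f(x) = |x| at x0 = 0 on a sample grid; the subgradients there are exactly the points of [−1, 1].

```python
# Check the subgradient inequality (6.16) on a sample grid for f(x) = |x|.

def is_subgradient(xstar, x0, f, grid):
    # tests f(x) >= f(x0) + xstar * (x - x0) at every sampled x
    return all(f(x) >= f(x0) + xstar * (x - x0) for x in grid)

f = abs
grid = [k / 10.0 for k in range(-50, 51)]

print(is_subgradient(0.7, 0.0, f, grid))   # True: 0.7 lies in [-1, 1]
print(is_subgradient(-1.0, 0.0, f, grid))  # True: the endpoints work too
print(is_subgradient(1.2, 0.0, f, grid))   # False: fails for x > 0
```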

The next proposition gives the fundamental existence result for subgradients.

Proposition 6.2.13. Let f be a convex function from Rn to R. If x0 ∈ ri dom f then f has a subgradient at x0.

Proof. Let x0 ∈ ri dom f. If f is improper then f(x0) = −∞ (Proposition 6.2.10), and then each element of Rn is a subgradient of f at x0. Therefore suppose f is proper; then f(x0) is finite, and Proposition 6.2.11 tells us that the pair (x0, f(x0)) does not belong to ri epi f. By Theorem 2.2.11 we can then properly separate this point from epi f. This means that there exists some nonzero (x∗, ξ∗) such that whenever (x, ξ) ∈ epi f we have

⟨x∗, x⟩ + ξ∗ξ ≤ ⟨x∗, x0⟩ + ξ∗ f(x0),

with strict inequality for some (x̄, ξ̄) ∈ epi f.

Because of the form of epi f we cannot have ξ∗ > 0. If ξ∗ = 0 then for each x ∈ dom f we have the inequality ⟨x∗, x⟩ ≤ ⟨x∗, x0⟩, with strict inequality for x = x̄. So we have properly separated x0 from dom f, which is impossible (Theorem 2.2.11) because x0 ∈ ri dom f. Therefore ξ∗ < 0, and there is no loss of generality in scaling x∗ so that ξ∗ = −1. Then for each (x, ξ) ∈ epi f we have

ξ ≥ f(x0) + ⟨x∗, x − x0⟩,

which shows that x∗ is a subgradient of f at x0. ⊓⊔

The next theorem states several important facts about the relationship between a convex function not taking −∞ and its closure. If f ever takes −∞ then its closure


is identically −∞, so that case is uninteresting. As we do elsewhere, we use here the notation C ≅ D to mean that C and D have the same closure and the same relative interior.

Theorem 6.2.14. Let f be a convex function from Rn to (−∞, +∞]. Then cl f is a closed convex function, and it is proper if and only if f is proper. One has cl f(x) = f(x) everywhere except perhaps at points x on the relative boundary of dom f. Further,

cl dom f ⊃ dom cl f ⊃ dom f,

and therefore in particular dom f ≅ dom cl f.

Proof. If f is improper, then f and cl f are both identically +∞, so cl f is closed and improper, and the claims about domains are certainly true. Therefore suppose that f is proper, and note that epi cl f = cl epi f. As this set is convex, cl f must be a convex function. Let L be the linear transformation from Rn+1 to Rn taking (x, ξ) to x. Then we have

cl L(epi f) ⊃ L(cl epi f) ⊃ L(epi f);

that is,

cl dom f ⊃ dom cl f ⊃ dom f,

which establishes the inclusion claim about domains. This inclusion also implies that the two domains must have the same affine hull, so that in fact dom cl f ≅ dom f. Now let x be in the common relative interior of dom f and dom cl f, and let V = L−1(x). Then V is a line which, by Proposition 6.2.11, must meet ri epi f. Then

V ∩ epi cl f = V ∩ cl epi f = cl(V ∩ epi f) = cl[{x} × [f(x), +∞)] = {x} × [f(x), +∞) = V ∩ epi f,

where we used Proposition 1.2.12. This shows that f and cl f agree on ri dom f. As f is finite there, so is cl f; but as this set is also ri dom cl f, it follows from Proposition 6.2.10 that cl f is proper. But then cl f never takes −∞, and as its epigraph is closed it must be a closed convex function. We have shown that f and cl f agree on ri dom f and off cl dom f, so that they can differ (if at all) only at relative boundary points of dom f. ⊓⊔

Here is the first representation result for closures. It says that we can calculate the value of lsc f at any point by simply taking limits of the values of f along a line segment from the relative interior of dom f to that point, and it asserts that such limits will exist.

Proposition 6.2.15. Let f be a convex function on Rn. If x ∈ Rn and x′ ∈ ri dom f, then

(lsc f)(x) = lim_{λ↓0} f[(1 − λ)x + λx′].


Proof. Let x′ ∈ ri dom f and x ∈ Rn. For each λ ∈ (0, 1], define xλ = (1 − λ)x + λx′. Recall that epi lsc f = cl epi f. Therefore lsc f ≤ f, and we have

(lsc f)(x) ≤ liminf_{λ↓0} (lsc f)(xλ) ≤ liminf_{λ↓0} f(xλ).   (6.17)

If (lsc f)(x) = +∞ then this completes the proof. Therefore suppose that (lsc f)(x) < +∞, and choose any real number µ ≥ (lsc f)(x). Let β be a real number greater than f(x′). The pair (x′, β) then belongs to ri epi f by Proposition 6.2.11, and (x, µ) belongs to cl epi f. Using Theorem 1.2.4, we find that for each λ ∈ (0, 1], (1 − λ)(x, µ) + λ(x′, β) ∈ ri epi f, so that f(xλ) < (1 − λ)µ + λβ by Proposition 6.2.11. It follows that

limsup_{λ↓0} f(xλ) ≤ lim_{λ↓0} [(1 − λ)µ + λβ] = µ.

As µ was any real number greater than or equal to (lsc f)(x), limsup_{λ↓0} f(xλ) ≤ (lsc f)(x). Using this inequality together with (6.17), we find that lim_{λ↓0} f(xλ) exists and equals (lsc f)(x) as claimed. ⊓⊔

If in Proposition 6.2.15 we do not permit f to take −∞ then we obtain a formula for cl f.

Corollary 6.2.16. If f is a convex function from Rn to (−∞, +∞], then for each x ∈ Rn and each x′ ∈ ri dom f, one has

(cl f)(x) = lim_{λ↓0} f[(1 − λ)x + λx′].

Proof. For such f , cl f = lsc f . ⊓⊔
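A small numerical sketch (the particular f is our own choice) shows the formula at work: take f convex with dom f = [0, 1], zero on the open interval but equal to 1 at the endpoints. The limit along a segment from x′ = 0.5 ∈ ri dom f replaces the endpoint values, while interior values are untouched.

```python
import math

def f(x):
    # convex: 0 on (0, 1), 1 at the endpoints, +inf outside [0, 1]
    if 0.0 < x < 1.0:
        return 0.0
    if x in (0.0, 1.0):
        return 1.0
    return math.inf

def cl_f(x, xprime=0.5, lam=1e-6):
    # one small lambda suffices for this f; the limit exists by Prop. 6.2.15
    return f((1 - lam) * x + lam * xprime)

print(f(0.0), cl_f(0.0))  # 1.0 0.0: the closure lowers the boundary value
print(f(0.5), cl_f(0.5))  # 0.0 0.0: f and cl f agree on ri dom f
```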

Corollary 6.2.17. Let f1, . . . , fk be proper convex functions on Rn. If

∩_{i=1}^k ri dom fi ≠ ∅,   (6.18)

then

cl ∑_{i=1}^k fi = ∑_{i=1}^k cl fi.   (6.19)

Proof. Proposition 1.2.12 together with (6.18) yields

ri ∩_{i=1}^k dom fi = ∩_{i=1}^k ri dom fi.

As the fi are proper we have

dom ∑_{i=1}^k fi = ∩_{i=1}^k dom fi,

so


ri dom ∑_{i=1}^k fi = ∩_{i=1}^k ri dom fi.

Let x′ belong to this intersection. As the fi are proper, we can apply Corollary 6.2.16 to obtain for x ∈ Rn

(cl ∑_{i=1}^k fi)(x) = lim_{λ↓0} ∑_{i=1}^k fi[(1 − λ)x + λx′] = ∑_{i=1}^k lim_{λ↓0} fi[(1 − λ)x + λx′] = ∑_{i=1}^k (cl fi)(x),

which proves (6.19). ⊓⊔

6.2.3 Exercises for Section 6.2

Exercise 6.2.18. Let f be an extended-real-valued function on a subset S of Rn, and let x0 ∈ S. Show that if f is usc and lsc at x0, then it is continuous there.

Exercise 6.2.19. Exhibit a countable collection of continuous, uniformly bounded functions from [−1, 1] to R whose supremum is not continuous.

Exercise 6.2.20. Give an example in R to show that the “only if” part of Proposition 6.2.6 need not be true if the set S is not closed.

Exercise 6.2.21. Show that the function f of Definition 6.2.2 is lower semicontinuous at x0 if and only if f(x0) ≤ liminf_{x→x0} f(x).

Exercise 6.2.22. Consider the cone K of Exercise 3.1.18, namely

K = {x ∈ R3 | x1 ≥ 0, x3 ≥ 0, 2x1x3 ≥ x2^2}.

Show that K is the epigraph of the function

f(x1, x2) =
  x2^2/(2x1)   if x1 > 0,
  0            if (x1, x2) = (0, 0),
  +∞           otherwise.

(f is clearly proper and positively homogeneous, but this shows that it is also a closed convex function.)

Exercise 6.2.23. Consider the function f of Exercise 6.2.22. Let α > 0 and define three functions from R+ to R2 by

x0(t) = (0.5t, t),   xα(t) = ((2α)^{−1} t^2, t),   x∞(t) = (0.5t^3, t).

Compute the limits of the composite functions f ∘ x0, f ∘ xα, and f ∘ x∞ as t ↓ 0. What does this show about continuity properties of closed proper convex functions like f?


Exercise 6.2.24. The Cobb–Douglas production function, used in economics, is a function from int Rn+ to R defined by

f(x1, . . . , xn) = ∏_{i=1}^n xi^{λi},

where the λi are fixed, positive numbers with ∑_{i=1}^n λi = 1. This function is positively homogeneous. Show how to extend this function to a closed, proper, positively homogeneous concave function defined on all of Rn. (A function g is closed proper concave if −g is closed proper convex.)


Chapter 7
Conjugacy and recession

One of the most useful aspects of convex functions is the fact that each closed convex function f has a conjugate function f∗ that is also a closed convex function. If we take the conjugate of f∗, we recover f. The conjugacy operation provides substantial insight into properties of the functions involved, and it is also very useful for computation.

The relationship between a function and its conjugate depends strongly upon the behavior of the function “at infinity”: that is, its behavior as the argument becomes large. Recession functions permit us to describe and study that behavior.

7.1 Conjugate functions

This section introduces one of the principal tools of convex analysis, the conjugacy correspondence for functions. We define this correspondence and develop a few of its basic properties, as well as an extension of the conjugate called the adjoint. Later in the section we illustrate a powerful application of conjugacy using the support function as a vehicle.

7.1.1 Definition and properties

Definition 7.1.1. Let f be a function from Rn to R. The convex conjugate of f is the function f∗ : Rn → R defined by

f∗(x∗) = sup_x {⟨x∗, x⟩ − f(x)}.   (7.1)

⊓⊔

To understand geometrically what we are doing in computing the conjugate, note that (7.1) is equivalent to the expression


f∗(x∗) = sup_{(x,ξ)∈epi f} ⟨(x∗, −1), (x, ξ)⟩,

where the inner product in Rn+1 is defined by ⟨(a, α), (b, β)⟩ = ⟨a, b⟩ + αβ. The vector (x∗, −1) is the normal to a collection of parallel hyperplanes in Rn+1; the fact that the last component is −1 simply means that the hyperplanes are not “vertical” and that the normal vector points downward. Each hyperplane in the collection consists of all pairs (x, ξ) that yield a particular numerical value of the inner product ⟨(x∗, −1), (x, ξ)⟩; as this value increases, the hyperplanes move downward. If f is not identically +∞, the formula in (7.1) gives the least upper bound of all such values for which the hyperplane meets epi f; otherwise it gives −∞. The operation of computing the conjugate therefore amounts to moving the hyperplanes as far down as possible while still maintaining contact with epi f.

By contrast, the epigraph of f∗ consists of those (x∗, ξ∗) for which f(x) ≥ ⟨x∗, x⟩ − ξ∗ for each x ∈ Rn: that is, for which the affine function ⟨x∗, ·⟩ − ξ∗ minorizes the function f. Here the requirement that ξ∗ ≥ f∗(x∗) means that the affine function has to stay on or below the graph of f everywhere.

This geometric viewpoint suggests that, if f were convex and if the supremum in (7.1) were attained at some point x, f∗(x∗) should be associated with a hyperplane supporting the epigraph of f at the point (x, f(x)). This is a geometric statement of the basic relationship among functions, conjugates, and subgradients.
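The definition can be explored numerically. The sketch below (the grid and names are ours) approximates the supremum in (7.1) by brute force for f(x) = x²/2, whose conjugate is again x²/2, so f is self-conjugate; the maximizing x is the point where the supporting hyperplane with slope x∗ touches the graph.

```python
def conjugate(f, xstar, grid):
    # brute-force approximation of f*(x*) = sup_x { x*.x - f(x) }
    return max(xstar * x - f(x) for x in grid)

f = lambda x: 0.5 * x * x
grid = [k / 100.0 for k in range(-500, 501)]  # fine grid on [-5, 5]

for xs in (0.0, 1.0, -2.0):
    print(xs, conjugate(f, xs, grid), 0.5 * xs * xs)  # approximation vs exact
```

Because the maximizer x = x∗ happens to lie on the grid here, the approximation agrees with the exact value; in general a grid only bounds the supremum from below.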

Under normal circumstances we just refer to the convex conjugate as the “conjugate,” and no other kind of conjugate is required. However, for calculations involving duality relationships we sometimes need the concave conjugate, defined by substituting “inf” for “sup” in (7.1). By rearranging the formula in (7.1) we can then see that the concave conjugate of f takes the value −(−f)∗(−x∗) at the point x∗. This permits us to use properties of the convex conjugate to infer corresponding properties of the concave conjugate.

As the conjugate in (7.1) is the supremum of the collection (indexed by x) of affine functions ⟨·, x⟩ − f(x), its epigraph is empty (that is, f∗ ≡ +∞) if f ever takes −∞, it is Rn+1 (that is, f∗ ≡ −∞) if f is identically +∞, and it is the intersection of a collection of closed halfspaces otherwise. Therefore f∗ must be a lower semicontinuous convex function no matter what kind of function f was. The next result shows that we can say more.

Proposition 7.1.2. Let f be a function from Rn to R. Then f∗ is a closed convex function, and if f is convex then f∗ is proper if and only if f is proper.

Proof. We just observed that f∗ must be lower semicontinuous and convex. It can take −∞ only if f ≡ +∞, and in that case f∗ ≡ −∞. Therefore f∗ is closed.

If f is improper then either it is identically +∞, in which case f∗ ≡ −∞ as just noted, or else f takes −∞ somewhere, in which case f∗ ≡ +∞. In either case f∗ is improper. If f is convex and proper then it cannot be everywhere +∞, so f∗ can never take −∞. Also, there is a point x0 in ri dom f at which f is finite, and by Proposition 6.2.13 f has a subgradient x0∗ at x0. Therefore, for each x ∈ Rn one has f(x) ≥ f(x0) + ⟨x0∗, x − x0⟩. By rearranging this inequality we see that f∗(x0∗) =


⟨x0∗, x0⟩ − f(x0), a finite number. Then f∗ cannot be identically +∞, so it is proper. ⊓⊔

Much of the importance of conjugate functions rests on the fact that it is possible not only to pass from f to f∗, but also from f∗ back to f provided f is closed and convex. That is the content of the following theorem.

Theorem 7.1.3. Let f be a convex function from Rn to R. Then f ∗∗ = cl f .

Proof. The enumeration of cases in the proof of Proposition 7.1.2 shows that the assertion is true if f is improper. Therefore we can suppose f is proper. Define two sets in Rn+2 by

E = {(x, ξ, −1) | (x, ξ) ∈ epi f} = (epi f) × {−1},
E∗ = {(x∗, −1, ξ∗) | (x∗, ξ∗) ∈ epi f∗}.

Let Q = cone E, so that Q = H_{epi f}, the homogenizing cone of epi f. Then to say (x∗, ξ∗) belongs to epi f∗ is to say that for each (x, ξ) ∈ epi f, ξ ≥ ⟨x∗, x⟩ − ξ∗.

Another way to express this is to say that for each (x, ξ, −1) ∈ E,

⟨(x∗, −1, ξ∗), (x, ξ, −1)⟩ ≤ 0,

which is the statement that (x∗, −1, ξ∗) ∈ Q°. Therefore E∗ = Q° ∩ (Rn × {−1} × R). After transposition of the last two coordinates (which, for improved readability, we do not do explicitly), Q° satisfies the hypotheses of Theorem 4.1.9, with E∗ playing the role of C × {−1}. It follows that ri(Q°) = cone ri E∗ = ri cone E∗.

Using the same reasoning we used above for f and f∗, we see that a point (x, ξ) belongs to epi f∗∗ exactly when the point (x, ξ, −1) belongs to (cone E∗)°. However,

(cone E∗)° = (ri cone E∗)° = [ri(Q°)]° = [Q°]° = cl Q,

where the first and third equalities are from Corollary 3.1.8. By Theorem 4.1.9 the intersection of cl Q with Rn+1 × {−1} is (cl epi f) × {−1}, so we have epi f∗∗ = cl epi f. But f is proper, so cl epi f = epi cl f; it follows that f∗∗ = cl f. ⊓⊔

Some useful facts about the double conjugate f∗∗ follow immediately from Proposition 7.1.2 and Theorem 7.1.3.

Corollary 7.1.4. Let f be a function from Rn to R. With the conjugate taken in the convex sense, f∗∗ is the greatest closed convex minorant of f, and it is also the pointwise supremum of the affine minorants of f.

Proof. If f is an improper function, then if f ever takes −∞ we have f∗ ≡ +∞ and therefore f∗∗ ≡ −∞. Any closed convex minorant of such an f must be identically −∞, so the first claim is true. The second follows from the fact that the supremum of an empty collection of functions is −∞ by convention. If f ≡ +∞ then f∗∗ = f, and f is itself closed and convex, so the first claim is true. The second claim holds because


in this case every affine function is a minorant of f, and therefore the pointwise supremum is identically +∞ and is therefore equal to f.

Assume therefore that f is proper, and let (x, ξ) ∈ epi f. The definition of the conjugate shows that for each x∗ ∈ Rn we have f∗(x∗) ≥ ⟨x∗, x⟩ − ξ, and therefore ⟨x∗, x⟩ − f∗(x∗) ≤ ξ. Then

f∗∗(x) = sup_{x∗} {⟨x∗, x⟩ − f∗(x∗)} ≤ ξ,

so that (x, ξ) ∈ epi f∗∗. Therefore epi f ⊂ epi f∗∗, so that f∗∗ ≤ f. In addition, Proposition 7.1.2 shows that f∗∗ is closed and convex.

Now let g be any closed convex function minorizing f. Then g ≤ f implies g∗ ≥ f∗, and this in turn implies g = g∗∗ ≤ f∗∗, where the equality came from Theorem 7.1.3 because g is closed and convex. We have shown that f∗∗ ≤ f, so that in fact f∗∗ is the greatest closed convex minorant of f.

An affine function ⟨x∗, ·⟩ − ξ∗ minorizes f exactly when (x∗, ξ∗) ∈ epi f∗. But for such a function, we have for each x

f∗∗(x) = sup_{x∗} {⟨x∗, x⟩ − f∗(x∗)} = sup_{(x∗,ξ∗)∈epi f∗} {⟨x∗, x⟩ − ξ∗},

which is the second claim. ⊓⊔

By using the concave conjugate instead of the convex conjugate one can prove a result parallel to Corollary 7.1.4, but phrased in terms of concave majorants and infima of affine functions, rather than convex minorants and suprema.

7.1.2 Adjoints

For applications to duality in convex optimization it is useful to have an extension of the conjugacy concept to functions of two groups of variables.

Definition 7.1.5. Suppose that F is a function from Rn × Rm to R. The adjoint of F in the convex sense is the function F^A : Rm × Rn → R defined by

F^A(y∗, x∗) = inf_{(x,y)} {−⟨x∗, x⟩ + ⟨y∗, y⟩ + F(x, y)}.   (7.2)

The adjoint of F in the concave sense is defined by replacing infimum by supremum in (7.2). ⊓⊔

The hypograph of the function F^A defined by (7.2) is the intersection of the hypographs of a collection of affine functions of (x∗, y∗) (indexed by (x, y)). Therefore the adjoint of F in the convex sense is an upper semicontinuous concave function, and in fact it is closed because it can only take +∞ at a point if F ≡ +∞, and in that case F^A ≡ +∞ too. Similarly, the adjoint in the concave sense is a closed convex


function. We use the same notation for these two different operations because we will use the adjoint in the convex sense when the function F with which we begin is convex, and in the concave sense when F is concave. Therefore the only possible ambiguity might be when F is both convex and concave, and in such cases the context will make it clear which adjoint is intended.

Suppose that F is a closed convex function, and that we compute the adjoint F^A in the convex sense using (7.2). This F^A is then closed and concave, and we could take its adjoint in the concave sense to obtain a function F^{AA}. However, note that (7.2) expresses F^A(y∗, x∗) as −F∗(x∗, −y∗). Therefore

F^{AA}(x, y) = sup_{(y∗,x∗)} {−⟨y∗, y⟩ + ⟨x∗, x⟩ + F^A(y∗, x∗)}
            = sup_{(x∗,−y∗)} {⟨(x∗, −y∗), (x, y)⟩ − F∗(x∗, −y∗)}
            = F∗∗(x, y) = F(x, y),   (7.3)

where we used Theorem 7.1.3. Accordingly, if F is closed and convex then F^{AA} = F, and a similar argument shows that this formula holds also if F is closed and concave.

The term “adjoint” comes from the fact that if F is the convex indicator of the graph of a linear transformation A : Rn → Rm (so that F(x, y) is zero if y = Ax and +∞ otherwise), then F^A(y∗, x∗) is zero if x∗ = A∗y∗ and −∞ otherwise, so that F^A is the concave indicator of the graph of the adjoint transformation A∗.

7.1.3 Support functions

The support function provides a powerful tool for dealing with properties of closed convex sets.

Definition 7.1.6. For any subset S of Rn, the support function of S is the function I∗S taking the value sup_{s∈S} ⟨x∗, s⟩ at the point x∗. ⊓⊔

As

sup_{s∈S} ⟨x∗, s⟩ = sup_{s∈Rn} {⟨x∗, s⟩ − IS(s)},

where IS(s) is zero if s ∈ S and +∞ otherwise, the notation I∗S for the support function is compatible with the definition of conjugacy. The definition shows that I∗S is positively homogeneous (I∗S(σx∗) = σ I∗S(x∗) for each positive σ and each x∗ ∈ Rn), and Proposition 7.1.2 shows that I∗S is a closed convex function. Furthermore, if I∗S is improper then it is identically −∞, which occurs exactly when S is empty; I∗S cannot be identically +∞ because if S is nonempty then I∗S(0) = 0.

Like any other function, the support function has an epigraph. If S is a nonempty subset of Rn then


epi I∗S = {(x∗, ξ∗) ∈ Rn+1 | ξ∗ ≥ I∗S(x∗)}
        = {(x∗, ξ∗) | for each x ∈ S, ξ∗ ≥ ⟨x∗, x⟩}
        = {(x∗, ξ∗) | for each x ∈ S, ⟨(x∗, ξ∗), (x, −1)⟩ ≤ 0}
        = H°S,

where HS is the homogenizing cone of S and H°S its polar. Thus the epigraph of the support function is the polar of the homogenizing cone. This polar is often called the “dual cone” of S.

The support function in fact furnishes us with a one-to-one correspondence between the class C of nonempty closed convex subsets of Rn and the class F of positively homogeneous closed proper convex functions on Rn. We establish this correspondence in the following theorem. In its proof we use the notation f∗ for a generic member of F. This is reasonable because each element of F is closed and convex, so it is a conjugate.

If f∗ ∈ F then f∗(0) = 0. To see this, start with the fact that by definition of F, f∗ is proper, so ri dom f∗ is nonempty and f∗ is finite there, as it cannot take −∞ anywhere. Let x ∈ ri dom f∗; as f∗ is closed, Corollary 6.2.16 yields

f∗(0) = lim_{t↓0} f∗[(1 − t)0 + tx] = lim_{t↓0} t f∗(x) = 0.

Theorem 7.1.7. Let C be the class of nonempty closed convex subsets of Rn and F be the class of closed proper positively homogeneous convex functions on Rn. The formulas

ϕ(C) = I∗C,   γ(f∗) = {x | for each x∗, ⟨x∗, x⟩ ≤ f∗(x∗)}

establish a one-to-one correspondence between C and F.

Proof. We prove that ϕ : C → F, that γ : F → C, and that ϕ ∘ γ and γ ∘ ϕ are the identity maps of F and of C respectively.

If C ∈ C then ϕ(C) = I∗C. The comments following Definition 7.1.6 show that I∗C is closed, convex, and positively homogeneous; it is also proper because C ≠ ∅. Therefore ϕ : C → F.

If f∗ ∈ F then γ(f∗) is the intersection of a collection of sets S(x∗) = {x | ⟨x∗, x⟩ − f∗(x∗) ≤ 0}, indexed by x∗ ∈ Rn. As f∗ is proper it cannot take −∞ anywhere, so S(x∗) is Rn if f∗(x∗) = +∞ or if x∗ = 0, and it is a closed halfspace if x∗ ≠ 0 and f∗(x∗) is finite. Hence γ(f∗) is a closed convex set. To conclude that it is in C we have to show that it is nonempty.

The definition of γ shows that

γ(f∗) = {x | f∗∗(x) ≤ 0}.   (7.4)

We know f∗ is proper; by Proposition 7.1.2 so then is f∗∗, and therefore f∗∗ never takes −∞. Then its effective domain is a nonempty set on which f∗∗ takes only finite values. Fix any x ∈ dom f∗∗; then we have


f∗∗(x) = sup_{x∗} {⟨x∗, x⟩ − f∗(x∗)}.   (7.5)

By taking x∗ = 0 and using the fact that f∗(0) = 0 because f∗ ∈ F, we see that the supremum in (7.5) is at least 0. If there were any x∗ with ⟨x∗, x⟩ − f∗(x∗) > 0 then, as this expression is positively homogeneous in x∗, the supremum would be +∞, which would contradict x ∈ dom f∗∗. So there is no such x∗, implying f∗∗(x) = 0, and therefore x ∈ γ(f∗) by (7.4). It follows that γ(f∗) is nonempty and is therefore in C. This shows that γ : F → C.

The above proof shows that if f∗ ∈ F then f∗∗ takes the value zero on its effective domain; it takes +∞ at all other points. Then (7.4) shows that γ(f∗) = dom f∗∗, so that f∗∗ is the indicator I_{γ(f∗)} of γ(f∗). But as f∗∗∗ = f∗, f∗ is the support function I∗_{γ(f∗)} of γ(f∗): that is, f∗ = ϕ[γ(f∗)]. Therefore ϕ ∘ γ is the identity of F.

On the other hand, if C ∈ C then we have already seen that I∗C is convex, closed, and proper, so ϕ(C) ∈ F. Moreover, γ[ϕ(C)] = γ(I∗C), and by definition this is the set of x for which sup_{x∗} {⟨x∗,x⟩ − I∗C(x∗)} ≤ 0: that is, for which I∗∗C(x) ≤ 0. As C is closed and nonempty, Theorem 7.1.3 tells us that I∗∗C = IC, so the set γ[ϕ(C)] is just C. Therefore γ ∘ ϕ is the identity of C. ⊓⊔
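The correspondence of Theorem 7.1.7 can be checked numerically for simple sets. The sketch below is ours, not from the text: it takes C to be the unit square in R², computes I∗C(x∗) from the vertices (for a polytope the supremum is attained at a vertex), and verifies on a few test points that the set γ(I∗C) recovers C, with a finite sample of directions x∗ standing in for all of Rn.

```python
# Illustration (ours) of Theorem 7.1.7: C = unit square [0,1]^2 in R^2.
import itertools, math

vertices = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

def support(xstar):
    # I*_C(x*) = sup_{c in C} <x*, c>; attained at a vertex of the polytope.
    return max(xstar[0]*c[0] + xstar[1]*c[1] for c in vertices)

# gamma(I*_C) = {x | <x*, x> <= I*_C(x*) for every x*}; we test finitely many directions.
directions = [(math.cos(2*math.pi*k/360), math.sin(2*math.pi*k/360)) for k in range(360)]

def in_gamma(x, tol=1e-9):
    return all(d[0]*x[0] + d[1]*x[1] <= support(d) + tol for d in directions)

# Points of the square are recovered; points outside are rejected.
assert in_gamma((0.5, 0.5)) and in_gamma((1.0, 1.0))
assert not in_gamma((1.2, 0.5)) and not in_gamma((-0.1, -0.1))
```

The finite direction sample only approximates the intersection over all x∗, so a small tolerance is used on the boundary.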

This correspondence between nonempty closed convex sets and their support functions has very useful properties. The following proposition gives some of these. Here as elsewhere in this book, if f and g are functions from Rn to R, the notation f ≤ g means that f(x) ≤ g(x) for each x ∈ Rn, and similarly for f ≥ g. In part (c) we use the Pompeiu-Hausdorff distance ρ(C,D) from Definition 1.1.10.

Proposition 7.1.8. Let C and D be nonempty closed convex subsets of Rn.

a. If γ and δ are nonnegative real numbers, then I∗_{γC+δD} = γI∗C + δI∗D, where on the right-hand side we use the convention that 0·(+∞) = 0.
b. I∗C ≤ I∗D if and only if C ⊂ D.
c. ρ(C,D) = sup{|I∗C(y∗) − I∗D(y∗)| | ∥y∗∥ = 1}, where here we use the convention that (+∞) − (+∞) = 0.

Proof. Claim (a). If either or both of γ and δ are zero then the result is immediate; therefore we assume that both are positive. Let x∗ ∈ Rn and choose c ∈ C and d ∈ D. Then

⟨x∗, γc + δd⟩ = γ⟨x∗,c⟩ + δ⟨x∗,d⟩ ≤ γI∗C(x∗) + δI∗D(x∗).

Taking the supremum on the left over c ∈ C and d ∈ D yields I∗_{γC+δD}(x∗) ≤ γI∗C(x∗) + δI∗D(x∗), and as x∗ was arbitrary in Rn we have I∗_{γC+δD} ≤ γI∗C + δI∗D.

The opposite inequality holds whenever the left-hand side is +∞. Let x∗ ∈ Rn with I∗_{γC+δD}(x∗) < +∞. If either of I∗C(x∗) and I∗D(x∗) were +∞ then we could take a sequence in that set with the support function increasing without bound, and a fixed element of the other set, to contradict the finiteness of I∗_{γC+δD}(x∗). Therefore both are finite. For c ∈ C and d ∈ D we have

γ⟨x∗,c⟩ + δ⟨x∗,d⟩ = ⟨x∗, γc + δd⟩ ≤ I∗_{γC+δD}(x∗).


Now by taking the supremum in the left-hand side over c ∈ C and d ∈ D we obtain γI∗C(x∗) + δI∗D(x∗) ≤ I∗_{γC+δD}(x∗), and as x∗ is any point with I∗_{γC+δD}(x∗) < +∞ we have I∗_{γC+δD} ≥ γI∗C + δI∗D.

Claim (b). The “if” direction follows directly from the definition of the support function. For the “only if” direction, suppose that I∗C ≤ I∗D. Take conjugates to obtain

IC = I∗∗C ≥ I∗∗D = ID,

where the equalities follow from Theorem 7.1.3 because C ∈ C and therefore IC is closed, and similarly for D. The indicator inequality IC ≥ ID implies C ⊂ D.

Claim (c). First suppose that there is no µ ≥ 0 such that C ⊂ D + µBn and D ⊂ C + µBn, where Bn is the unit ball. Then ρ(C,D) = +∞. Also, in this case for any positive µ we have either C ⊄ D + µBn or D ⊄ C + µBn. Suppose it is the former; then as the support function of Bn is the norm ∥·∥, part (b) shows that the inequality I∗C ≤ I∗D + µ∥·∥ must fail. Therefore there is some x∗ with ∥x∗∥ = 1 and

I∗C(x∗) > I∗D(x∗) + µ,

so that

sup{|I∗C(y∗) − I∗D(y∗)| | ∥y∗∥ = 1} ≥ |I∗C(x∗) − I∗D(x∗)| ≥ I∗C(x∗) − I∗D(x∗) > µ.

In the case D ⊄ C + µBn we reason similarly to obtain the same inequality sup{|I∗C(y∗) − I∗D(y∗)| | ∥y∗∥ = 1} > µ. As µ was arbitrary, we have

sup{|I∗C(y∗) − I∗D(y∗)| | ∥y∗∥ = 1} = +∞ = ρ(C,D).

The remaining case is that in which there is µ ≥ 0 such that

C ⊂ D + µBn,   D ⊂ C + µBn. (7.6)

Then
I∗C ≤ I∗D + µ∥·∥,   I∗D ≤ I∗C + µ∥·∥.

For each such µ and for each x∗ with ∥x∗∥ = 1,

|I∗C(x∗) − I∗D(x∗)| ≤ µ,

so that
sup{|I∗C(y∗) − I∗D(y∗)| | ∥y∗∥ = 1} ≤ µ. (7.7)

Taking the infimum in (7.7) over all µ ≥ 0 satisfying (7.6), we find that

sup{|I∗C(y∗) − I∗D(y∗)| | ∥y∗∥ = 1} ≤ ρ(C,D). (7.8)

If the inequality in (7.8) were strict, then we could find µ with

sup{|I∗C(y∗) − I∗D(y∗)| | ∥y∗∥ = 1} < µ < ρ(C,D).


Then we would have for each y∗ with norm 1 the inequality I∗C(y∗) ≤ I∗D(y∗) + µ, and therefore for each z∗ ∈ Rn, I∗C(z∗) ≤ I∗D(z∗) + µ∥z∗∥. Then I∗C ≤ I∗D + µ∥·∥, which by part (b) would imply C ⊂ D + µBn. Reasoning in the same way we could also conclude that D ⊂ C + µBn. But then µ satisfies (7.6) while µ < ρ(C,D), a contradiction. Therefore (7.8) holds as an equation. ⊓⊔

It is often possible to use the information in Proposition 7.1.8 to establish facts about sets by manipulating the corresponding support functions, or vice versa, when a direct approach would have been much more difficult.
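As one illustration, part (c) of Proposition 7.1.8 turns the Pompeiu-Hausdorff distance into an optimization over unit vectors. The sketch below is ours, not from the text: it takes two squares, D being C translated by (2,0) (so the distance should be 2), and approximates the supremum over ∥y∗∥ = 1 on a grid of directions.

```python
# Illustration (ours) of Proposition 7.1.8(c): C = [0,1]^2 and D = C + (2,0).
import math

C = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
D = [(x + 2.0, y) for (x, y) in C]

def support(verts, y):
    # Support function of a polytope from its vertex list.
    return max(y[0]*v[0] + y[1]*v[1] for v in verts)

# rho(C,D) = sup over unit y* of |I*_C(y*) - I*_D(y*)|, approximated on a grid.
unit = [(math.cos(2*math.pi*k/3600), math.sin(2*math.pi*k/3600)) for k in range(3600)]
rho = max(abs(support(C, y) - support(D, y)) for y in unit)
assert abs(rho - 2.0) < 1e-3  # the true Pompeiu-Hausdorff distance is 2
```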

The next definition introduces an operation to produce from a given function f a positively homogeneous function cone f by applying the operation described in Exercise 6.1.20 to the cone generated by epi f. For that reason we call it the conical hull of f. As the function is explicitly mentioned in this description, there should be no confusion with the conical hull of a set.

Definition 7.1.9. Let f : Rn → R be a convex function. The function cone f from Rn to R is defined by

cone f(y) = inf_{τ>0} τ−1 f(τy). (7.9)

⊓⊔

As mentioned before the definition, cone f(y) is the infimum of those η for which (y,η) ∈ cone epi f, and therefore in particular cone f is a convex function. To see this, observe that (y,η) ∈ cone epi f if and only if there is some positive τ such that (τy,τη) ∈ epi f, which in turn is equivalent to

η ≥ τ−1 f(τy). (7.10)

Taking the infimum shown in (7.9) is equivalent to taking the infimum of η in (7.10), and thus also to taking the infimum of the η such that (y,η) ∈ cone epi f.

This function can behave in an unpleasant way; for example, suppose that the origin is in the interior of dom f and that f(0) < 0. Then f(x) is negative for all x near the origin (Theorem 6.1.11), so epi f contains a ball about the origin in Rn+1. It follows that cone f ≡ −∞. However, if f is closed and proper with f(0) being finite and positive, then cone f is closed and proper. We will prove this in Theorem 7.2.15 below. In the meantime, we show that this function lets us identify the support function of a level set in case the function involved is closed and proper.

Theorem 7.1.10. Let f : Rn → R be a closed proper convex function and let ϕ be a real number. The support function of levϕ f is cl cone(f∗ + ϕ).

Proof. It suffices to prove the result for ϕ = 0, because for any real ϕ we then have levϕ f = lev0(f − ϕ), so the support function will be cl cone(f − ϕ)∗ = cl cone(f∗ + ϕ), as required.

Corollary 7.1.4 says that cl cone f∗ is the pointwise supremum of the affine minorants of cone f∗. Therefore in particular a linear function of the form ⟨x∗, ·⟩ minorizes cl cone f∗ if and only if it minorizes cone f∗.


The function cl cone f∗ is closed, positively homogeneous, convex, and not identically +∞, so there are two cases: either it is identically −∞ (the support function of the empty set) or else it is proper.

In the first case cone f∗ takes −∞ somewhere, so that there is some y∗ such that for each real µ there is a positive τ with τ−1 f∗(τy∗) < µ. If x ∈ lev0 f, then since f = f∗∗ we see that for each x∗ we have ⟨x∗,x⟩ − f∗(x∗) ≤ f(x) ≤ 0. Setting µ = ⟨y∗,x⟩ and taking x∗ to be of the form τy∗ for positive τ, we have τ−1 f∗(τy∗) ≥ ⟨y∗,x⟩ = µ, a contradiction. Therefore the occurrence of the first case means that lev0 f = ∅, so the assertion of the theorem is true in that case.

In the second case cl cone f∗ is not identically −∞; then it is proper, and then by Theorem 7.1.7 it is the support function of the set S consisting of those x such that ⟨·,x⟩ minorizes cl cone f∗. These x are, by our previous comment, the x such that ⟨·,x⟩ minorizes cone f∗. But the latter statement means that for each x∗ and each positive τ, ⟨x∗,x⟩ ≤ τ−1 f∗(τx∗). Introducing a new variable y∗ = τx∗ we have the equivalent statement that x ∈ S if and only if for each y∗ ∈ Rn, ⟨y∗,x⟩ − f∗(y∗) ≤ 0. That inequality is equivalent to 0 ≥ f∗∗(x) = f(x), so that S = lev0 f. Therefore cl cone f∗ is the support function of lev0 f in the second case also. ⊓⊔
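Theorem 7.1.10 can be checked in one dimension. In the sketch below (ours, not from the text), f(x) = x²/2, so f∗(y) = y²/2 and lev1 f = [−√2, √2], whose support function is √2|y|. The conical hull of f∗ + 1 is approximated by minimizing over a grid of τ; at the nonzero test points used here the function is already closed, so taking cl changes nothing.

```python
# Illustration (ours) of Theorem 7.1.10 with f(x) = x^2/2 and phi = 1.
import math

def f_star(y):
    return y * y / 2.0

def cone_fstar_plus_1(y):
    # cone g(y) = inf_{tau > 0} g(tau*y)/tau, approximated on a log-spaced grid of tau.
    taus = [10 ** (k / 100.0) for k in range(-400, 401)]  # tau from 1e-4 to 1e4
    return min((f_star(t * y) + 1.0) / t for t in taus)

# The support function of lev_1 f = [-sqrt(2), sqrt(2)] is sqrt(2)*|y|.
for y in [-2.0, -0.5, 0.7, 3.0]:
    assert abs(cone_fstar_plus_1(y) - math.sqrt(2.0) * abs(y)) < 1e-3
```

Here the minimum over τ is a smooth one-dimensional problem (by AM-GM the exact value is √2|y|), so a modest grid suffices.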

By evaluating the support function of a convex set C we can recover information about the position of the origin with respect to C. In fact, we can do this for any designated point x0 of Rn: the support function of {x0} is ⟨·,x0⟩, and C and cl C have the same support function (Exercise 7.1.18). Using these facts together with Proposition 7.1.8, we find that I∗_{C−x0} = I∗_{(cl C)−x0} = I∗_{cl C} − ⟨·,x0⟩. So by investigating the position of the origin with respect to C − x0 we can obtain information about the position of x0 with respect to C.

Theorem 7.1.11. Let C be a convex set in Rn. Then:

a. 0 ∈ cl C if and only if I∗C is everywhere nonnegative.
b. 0 ∈ int C if and only if I∗C is positive everywhere except at x∗ = 0.
c. 0 ∈ ri C if and only if I∗C is positive everywhere except at points x∗ for which I∗C(x∗) = I∗C(−x∗) = 0.

Proof. Since the closure, interior, and relative interior are the same for C and for cl C, and since by Exercise 7.1.18 C and cl C have the same support function, we may assume in what follows that C is closed.

For (a), note that to say I∗_{cl C} is nonnegative is equivalent to saying that it majorizes the support function of the origin, and by our earlier remarks that is true if and only if C contains 0.

To establish (b) and (c), we start by observing that if the origin is in C then NC(0) consists of those points x∗ at which I∗C(x∗) = 0. Now for (b), note that if 0 ∈ int C then C contains a ball about the origin. The definition of the support function then shows it must be positive at any nonzero x∗. On the other hand, if I∗C is positive everywhere except at the origin, then

IC(0) = I∗∗C(0) = sup_{x∗} {⟨x∗,0⟩ − I∗C(x∗)} = 0,


so that 0 ∈ C; further, the observation that we just made shows that NC(0) = {0}. Now part (c) of Theorem 3.3.7 shows that 0 is actually in the interior of C.

For (c) note first that the exceptional points x∗ mentioned in (c) may be of two types. First, the origin always qualifies as long as C is nonempty. Second, if any such x∗ are nonzero then they are normals to hyperplanes through the origin that contain C, and the set of all such normals, together with the origin, is exactly (par C)⊥. To see this, note that C must be contained in the lower closed halfspaces associated with hyperplanes through the origin with normals x∗ and −x∗.

Now if the origin belongs to ri C then clearly I∗C is everywhere nonnegative. By part (b) of Theorem 3.3.7 we have NC(0) = (par C)⊥, which means that I∗C takes the value 0 at exactly those points that are mentioned in (c). On the other hand, if I∗C is positive everywhere but at the exceptional points, then the origin belongs to C by part (a), and then our observation shows that NC(0) = (par C)⊥. By part (b) of Theorem 3.3.7 this is equivalent to saying that 0 ∈ ri C. ⊓⊔
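The distinction between parts (b) and (c) of Theorem 7.1.11 shows up already for a segment. The sketch below (ours, not from the text) takes C = [−1,1] × {0} in R², whose support function is I∗C(y) = |y1|: that function vanishes at nonzero points y = (0, y2), so the test in (b) fails and 0 ∉ int C; but at every such y it also vanishes at −y, so the test in (c) passes and 0 ∈ ri C.

```python
# Illustration (ours) of Theorem 7.1.11 for the segment C = [-1,1] x {0} in R^2.
import math

def support_C(y):
    # I*_C(y) = sup over c in C of <y, c> = |y1| for this segment.
    return abs(y[0])

unit = [(math.cos(2*math.pi*k/360), math.sin(2*math.pi*k/360)) for k in range(360)]

# Part (b) test fails: some nonzero y* has I*_C(y*) = 0 (e.g. y* near (0,1)) ...
zeros = [y for y in unit if support_C(y) < 1e-12]
assert zeros
# ... but part (c) passes: wherever I*_C vanishes, it vanishes at -y* as well.
assert all(support_C((-y[0], -y[1])) < 1e-12 for y in zeros)
```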

7.1.4 Gauge functions

In looking at the support function of a convex set C we found that its epigraph consisted of the polar H∘C of the homogenizing cone of C. We know from Section 4.1.2 that this polar cone is closely connected to the polar set C∘ of C. That suggests looking for symmetry in this construction: suppose that we look at H∘∘C, which is the closure of the homogenizing cone HC. Suppose further that we turn the entire picture upside down by multiplying by the linear reflection operator J : Rn+1 → Rn+1 defined by J(x,ξ) = (x,−ξ). Now C×{1} is at the top and C∘×{−1} is at the bottom. If we assume, as we will in this section, that C is closed and contains the origin, then C∘∘ is C.

From what we saw in the previous section, the cone cl J(HC), which is the polar of J(H∘C), should be the epigraph of the support function of C∘. This support function will be a closed proper positively homogeneous convex function. It is called the gauge of C, and this section develops a few of its useful properties.

Suppose that C is nonempty. A point (x,ξ) belongs to J(HC) exactly when ξ > 0 and ξ−1x ∈ C. The function we want to examine is therefore

γC(x) := inf{ξ | (x,ξ) ∈ J(HC)} = inf{ξ > 0 | ξ−1x ∈ C}. (7.11)

This function is positively homogeneous because, for fixed α > 0 and for any x and ξ > 0, one has ξ−1x ∈ C if and only if (αξ)−1(αx) ∈ C, and therefore γC(αx) = αγC(x). As J(HC) is convex, γC is convex by Exercise 6.1.20. Its values are always nonnegative, so it cannot take −∞, and if x ∈ C then (1)−1x ∈ C and therefore γC(x) ≤ 1, so it is not identically +∞. Therefore it is proper. To investigate conditions for it to be closed we need a preliminary lemma.

Lemma 7.1.12. Let C be a closed convex subset of Rn containing the origin. Then

{x | γC(x) ≤ 0} = rc C,   {x | γC(x) ≤ 1} = C. (7.12)

Proof. First suppose x ∈ rc C. As 0 ∈ C, C must contain the ray xR+, so for each α > 0 we have α−1x ∈ C. The infimum of such α is zero, so γC(x) = 0. Conversely, if x ∉ rc C then by Theorem 4.1.2 C cannot contain the ray xR+. Accordingly, there is some positive β such that if α > 0 and α−1x ∈ C then α−1 ≤ β and therefore α ≥ β−1. Then γC(x) is at least as large as β−1, so it cannot be zero.

For the other assertion, let x ∈ C; then (1)−1x ∈ C, so γC(x) ≤ 1. Now suppose x ∉ C. Then x ≠ 0 and (1)−1x ∉ C. As C is closed there is β > 1 with β−1x ∉ C. But then if α−1x ∈ C we must have α−1 < β−1, so that α > β. Then γC(x) ≥ β > 1. ⊓⊔
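Lemma 7.1.12 is easy to test numerically. The sketch below is ours, not from the text: it takes C to be the ℓ1 unit ball in R², approximates γC directly from definition (7.11) by scanning a grid of values of the scaling parameter, and checks both that γC agrees with the ℓ1 norm (anticipating the discussion after Theorem 7.1.15) and that γC(x) ≤ 1 exactly on C.

```python
# Illustration (ours) of (7.11) and Lemma 7.1.12: C = {x : |x1| + |x2| <= 1} in R^2.

def in_C(p):
    return abs(p[0]) + abs(p[1]) <= 1.0 + 1e-12

def gauge(x, hi=10.0, steps=100000):
    # gamma_C(x) = inf{alpha > 0 | x/alpha in C}; since 0 is in C, the valid alphas
    # form an interval unbounded above, so the first grid hit approximates the inf.
    for k in range(1, steps + 1):
        alpha = hi * k / steps
        if in_C((x[0] / alpha, x[1] / alpha)):
            return alpha
    return float('inf')

for x in [(0.3, -0.2), (1.0, 1.0), (-2.0, 0.5)]:
    assert abs(gauge(x) - (abs(x[0]) + abs(x[1]))) < 1e-3   # gauge = l1 norm here
# Lemma 7.1.12: gamma_C(x) <= 1 exactly on C (C is compact, so rc C = {0}).
assert gauge((0.5, 0.4)) <= 1.0 and gauge((0.9, 0.4)) > 1.0
```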

Proposition 7.1.13. Let C be a closed convex subset of Rn containing the origin. Then epi γC = J(cl HC) = cl J(HC).

Proof. The second equality holds because J is nonsingular. To prove the first, write E for J(cl HC). E is convex and is closed by the second equality; it is a cone because HC is. As 0 ∈ C the ray {0n}×R− lies in HC and therefore {0n}×R+ ⊂ E. As E is a closed convex cone this ray belongs to rc E, and it follows that E is an epigraph.

If (x,ξ) ∈ E and ξ > 0 then (ξ−1x,−1) ∈ C×{−1} by Theorem 4.1.9. Then γC(x) ≤ ξ, which implies (x,ξ) ∈ epi γC. If (x,ξ) ∈ E and ξ = 0 then (x,0) ∈ cl HC, so x ∈ rc C by Theorem 4.1.9 and then γC(x) = 0 by Lemma 7.1.12, so again (x,ξ) ∈ epi γC. Therefore E ⊂ epi γC.

Now suppose that (x,ξ) ∉ E and consider two cases.

a. If ξ > 0 then (x,−ξ) ∉ cl HC, and therefore for sufficiently small positive ε and ρ = ξ + ε, also (x,−ρ) ∉ cl HC, so that (ρ−1x,−1) ∉ C×{−1} by Theorem 4.1.9. Then ρ−1x ∉ C and, as 0 ∈ C, whenever α ≤ ρ we also have α−1x ∉ C. Then γC(x) ≥ ρ > ξ, so that (x,ξ) ∉ epi γC.

b. If ξ = 0 then (x,0) ∉ cl HC and therefore x ∉ rc C by Theorem 4.1.9. Then Lemma 7.1.12 requires γC(x) > 0, so that (x,ξ) ∉ epi γC.

The results of these two cases show that epi γC ⊂ E. ⊓⊔

If we require that the origin lie in the relative interior of C rather than just in C, we can get better behavior from γC.

Proposition 7.1.14. Let C be a closed convex subset of Rn with 0 ∈ ri C. Then

a. dom γC = par C.
b. There is a positive β such that for each x ∈ par C, 0 ≤ γC(x) ≤ β∥x∥.
c. γC is locally Lipschitz continuous relative to par C.
d. If C is compact, then there is a positive δ such that for each x ∈ Rn, δ∥x∥ ≤ γC(x).

Proof. Let L = par C and write BL(x,ξ) = L ∩ B(x,ξ). Find a positive β such that BL(0, β−1) ⊂ C. If x ∉ L then, as C ⊂ L, there is no positive α with α−1x ∈ C, and therefore γC(x) = +∞ and x ∉ dom γC.

If x ∈ L and x ∈ rc C we have γC(x) = 0 (Lemma 7.1.12). If x ∈ L and x ∉ rc C then β−1(x/∥x∥) ∈ C by choice of β, and it follows that 0 ≤ γC(x) ≤ β∥x∥. In either case x ∈ dom γC. This proves (a) and (b).


For (c) we use the fact that dom γC is the subspace L, hence relatively open. Then Theorem 6.1.11 shows that γC is locally Lipschitz continuous relative to L.

To prove (d), suppose that C is compact and choose a positive δ small enough so that C ⊂ BL(0, δ−1). Let x ∈ Rn. If α is positive with α−1∥x∥ > δ−1 then α−1x ∉ C. It follows that if α < δ∥x∥ it cannot appear in the infimum defining γC(x), and therefore γC(x) ≥ δ∥x∥. ⊓⊔

Proposition 7.1.14 shows that if C is closed and 0 ∈ ri C then γC is a fairly nice function if we remain in L. We will give two examples of how a gauge can be useful under these conditions. The first shows that we have already been making heavy use of gauges even though they have not appeared before this section. To state it we need a definition: a subset S of Rn is symmetric if whenever x ∈ S then −x ∈ S also.

Theorem 7.1.15. Let C be a compact convex subset of Rn containing the origin in its interior. Then γC is a norm on Rn if and only if C is symmetric.

Proof. (Only if.) If C were not symmetric there would be some x ∈ C with −x ∉ C. Then γC(x) ≤ 1 and γC(−x) > 1 by Lemma 7.1.12, so γC could not be a norm.

(If.) Suppose C is symmetric. We show that γC satisfies the conditions given in Definition A.2.1, namely that for each x and y in Rn and each α ∈ R,

a. γC(x) ≥ 0, and γC(x) = 0 if and only if x = 0.
b. γC(αx) = |α|γC(x).
c. γC(x+y) ≤ γC(x) + γC(y).

For (a), we know that γC is nonnegative, and is finite everywhere on Rn by Proposition 7.1.14 because the hypotheses require dim C = n. As C is compact its recession cone is the origin, and then by Lemma 7.1.12 γC is zero at that point only.

To prove (b) we choose x ∈ Rn and α ∈ R. If α > 0 we have (b) by positive homogeneity. If α = 0 then (b) holds because γC(0) = 0. If α < 0 then we write

γC(αx) = γC(−|α|x) = γC(|α|x) = |α|γC(x),

where for the second and third equalities we used symmetry and positive homogeneity respectively.

Finally, let x and y be points of Rn; then

γC(x+y) = γC[2(.5x + .5y)]
        = 2γC(.5x + .5y)
        ≤ 2[.5γC(x) + .5γC(y)]
        = γC(x) + γC(y),

where we used in turn positive homogeneity, convexity, and again positive homogeneity. This proves (c). ⊓⊔
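The norm produced by Theorem 7.1.15 has a closed form when C is a symmetric polytope {x | ⟨ai,x⟩ ≤ 1 for all i} with 0 in its interior: x/α ∈ C exactly when every ⟨ai,x⟩ ≤ α, so γC(x) = maxi⟨ai,x⟩. The sketch below is ours (the hexagonal normal set is an arbitrary assumption) and verifies the three norm axioms of the proof at random points.

```python
# Illustration (ours) of Theorem 7.1.15 for a symmetric polygon in R^2 given by
# C = {x : <a_i, x> <= 1 for all i}, with the normals a_i closed under negation.
import random

normals = [(1.0, 0.0), (-1.0, 0.0), (0.5, 0.5), (-0.5, -0.5), (0.5, -0.5), (-0.5, 0.5)]

def gamma(x):
    # gamma_C(x) = max_i <a_i, x>; the max is >= 0 because the normals come in +- pairs.
    return max(a[0]*x[0] + a[1]*x[1] for a in normals)

random.seed(0)
for _ in range(1000):
    x = (random.uniform(-5, 5), random.uniform(-5, 5))
    y = (random.uniform(-5, 5), random.uniform(-5, 5))
    t = random.uniform(-3, 3)
    assert gamma(x) >= 0                                                   # axiom (a)
    assert abs(gamma((t*x[0], t*x[1])) - abs(t)*gamma(x)) < 1e-9           # axiom (b)
    assert gamma((x[0]+y[0], x[1]+y[1])) <= gamma(x) + gamma(y) + 1e-9     # axiom (c)
```

Homogeneity for negative t works exactly as in the proof: it uses the symmetry of the normal set.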

If |·| is a norm on Rn let Q = {x | |x| ≤ 1} be the unit ball of |·| (although it may not look like a ball). Then for x ∈ Rn,

γQ(x) = inf{α > 0 | α−1x ∈ Q} = inf{α > 0 | |x| ≤ α} = |x|,

so that |·| is the gauge of its unit ball. In particular, the Euclidean norm ∥·∥ is the gauge of the standard unit ball Bn. But Theorem 7.1.15 shows how one can create a norm from any symmetric compact convex set containing the origin in its interior, and Corollary A.2.9 shows that all of these norms are equivalent.

The following theorem establishes a fact that would be difficult to prove without the assistance of gauges.

Theorem 7.1.16. If C and D are compact convex subsets of Rn, each having dimension n, then they are homeomorphic.

Proof. We must show that there are continuous maps g : C → D and h : D → C such that h ∘ g = idC and g ∘ h = idD. Begin by choosing points c0 ∈ int C and d0 ∈ int D and defining translations τC : C → Rn and τD : D → Rn by τC(c) = c − c0 and τD(d) = d − d0. Let E = τC(C) and F = τD(D). We will show that E and F are homeomorphic via maps e : E → F and f : F → E, and it will then follow that C and D are homeomorphic via τD−1 ∘ e ∘ τC and τC−1 ∘ f ∘ τD respectively.

By construction, E and F are compact convex sets, each containing the origin in its interior, so that all of the conclusions of Proposition 7.1.14 apply to the gauges γE and γF. We define the maps e and f by

e(x) = [γE(x)/γF(x)]x  (0 ≠ x ∈ E),   e(0) = 0,
f(y) = [γF(y)/γE(y)]y  (0 ≠ y ∈ F),   f(0) = 0. (7.13)

For x ∈ E we have γF[e(x)] = γE(x) ≤ 1, so e(x) ∈ F and therefore e : E → F, and a similar calculation shows that f : F → E.

Also, e is surely continuous at each x ≠ 0, because parts (b) and (d) of Proposition 7.1.14 show that there are positive β and δ, and positive η and θ, such that

δ∥x∥ ≤ γE(x) ≤ β∥x∥,   θ∥x∥ ≤ γF(x) ≤ η∥x∥,

so for any nonzero x ∈ E we have

δη−1 ≤ γE(x)/γF(x) ≤ βθ−1

and therefore
∥e(x)∥ ≤ βθ−1∥x∥.

As e(0) = 0 this shows that e is also continuous at 0, and a similar argument shows that f is continuous on F.

We have for 0 ≠ x ∈ E

(f ∘ e)(x) = {γF[e(x)]/γE[e(x)]} e(x). (7.14)

However,


γF[e(x)] = [γE(x)/γF(x)]γF(x),
γE[e(x)] = [γE(x)/γF(x)]γE(x),
e(x) = [γE(x)/γF(x)]x,

and by substituting these values into (7.14) we find that (f ∘ e)(x) = x. This equation also holds if x = 0, so we conclude that f ∘ e = idE, and a similar calculation shows that e ∘ f = idF. ⊓⊔
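The maps (7.13) are completely concrete, so the homeomorphism of Theorem 7.1.16 can be exercised numerically. The sketch below is ours: E is the square [−1,1]² (gauge: the max norm) and F is the unit disk (gauge: the Euclidean norm), and we check that e maps E into F and that f inverts e.

```python
# Illustration (ours) of Theorem 7.1.16: E = [-1,1]^2, F = the Euclidean unit disk.
import math, random

def gE(x): return max(abs(x[0]), abs(x[1]))     # gauge of the square
def gF(x): return math.hypot(x[0], x[1])        # gauge of the disk

def e(x):
    if x == (0.0, 0.0): return (0.0, 0.0)
    r = gE(x) / gF(x)
    return (r * x[0], r * x[1])

def f(y):
    if y == (0.0, 0.0): return (0.0, 0.0)
    r = gF(y) / gE(y)
    return (r * y[0], r * y[1])

random.seed(1)
for _ in range(1000):
    x = (random.uniform(-1, 1), random.uniform(-1, 1))  # a point of E
    y = e(x)
    assert gF(y) <= 1 + 1e-12                           # e maps E into F
    z = f(y)                                            # f inverts e, as in (7.14)
    assert abs(z[0] - x[0]) < 1e-9 and abs(z[1] - x[1]) < 1e-9
```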

7.1.5 Exercises for Section 7.1

Exercise 7.1.17. Compute the closure and the conjugate of the function defined by

f(x) = x ln x if x > 0,   +∞ if x ≤ 0.

In computing the closure, explain how you know that the values you have prescribed on (−∞,0] are correct.

Exercise 7.1.18. Show that for any subset S of Rn, whether convex or not, I∗S = I∗_{cl conv S}.

Exercise 7.1.19. a. Let g(x∗) be the function of Exercise 6.2.22, namely

g(x∗1, x∗2) = (x∗2)²/(2x∗1) if x∗1 > 0;   0 if (x∗1, x∗2) = (0,0);   +∞ otherwise.

Identify a closed convex set C such that g is the support function of C, and show that your specification of C is correct. Suggestion. Start by using g to calculate some supporting hyperplanes to C, then sketch them in order to get an idea of the shape of that set.

b. If you were asked to prove that the cone K of Exercise 3.1.18 is closed and convex, what would be a relatively easy way to do so given the information that you now have?

7.2 Recession functions

This section develops several results dealing with the way in which convex functions vary as one moves along halflines in Rn. An intuitive, though oversimplified, way to summarize these results is to say that the asymptotic behavior of f along such a halfline depends only on the direction of the halfline and not on its location. This is a very strong property of convex functions.


A convenient way to present these results is through the recession function, whose epigraph is just the recession cone of the epigraph of the function under consideration.

Definition 7.2.1. Let f be a proper convex function on Rn. The recession function of f is the function rec f : Rn → R defined by epi rec f = rc epi f. ⊓⊔

The next proposition shows that this definition makes sense, and gives some elementary properties of rec f.

Proposition 7.2.2. If f is a proper convex function on Rn then so is rec f; it is positively homogeneous with rec f(0) = 0. For each y ∈ Rn one has

rec f(y) = sup_{x ∈ dom f} [f(x+y) − f(x)]. (7.15)

Proof. We first show that rec f is well defined (that is, rc epi f is an epigraph) and that its value is given by the formula in (7.15). As rc epi f is convex, this will show that rec f is a convex function.

Let y ∈ Rn. If the vertical line L = {y}×R through y fails to meet rc epi f, then for each real η the point (y,η) fails to belong to rc epi f, and therefore there is some (x,ξ) ∈ epi f with (x+y, ξ+η) ∉ epi f. This means x ∈ dom f, so f(x) is finite, and it also means that f(x+y) > ξ + η ≥ f(x) + η. As η was arbitrary we see that the supremum in (7.15) is +∞. On the other hand, if L meets rc epi f, then there is a real number η such that (y,η) belongs to rc epi f. This is equivalent to saying that for each x ∈ dom f, we have f(x+y) ≤ f(x) + η: that is, η ≥ f(x+y) − f(x). If this inequality holds for some real η then it holds for all η ∈ [σ, +∞), where σ is the supremum in (7.15). Therefore rc epi f is the epigraph of the function rec f given by (7.15).

If rec f took −∞ at some point y then rc epi f would contain the vertical line through y, so for x ∈ dom f and each real η we would have f(x+y) ≤ f(x) + η. This would imply f(x+y) = −∞, which is impossible since f is proper. Therefore rec f is proper.

The origin belongs to rc epi f, so rec f(0) ≤ 0. If rec f(0) were negative then for some positive α the point (0,−α) would belong to rc epi f. Then for x ∈ dom f we would have (x+0, f(x)−α) ∈ epi f, contradicting the definition of epigraph. Therefore rec f(0) = 0. Also, as epi rec f is a cone we know (Exercise 6.1.19) that rec f is positively homogeneous. ⊓⊔

If f is closed then we can say more. We need a small lemma about the behavior of a difference quotient.

Lemma 7.2.3. Let f : Rn → (−∞,+∞] be a proper convex function. If x ∈ dom f and y ∈ Rn, then the difference quotient

Q(x,y,τ) = τ−1[f(x+τy) − f(x)] (7.16)

is nondecreasing for τ > 0.


Proof. Suppose that 0 < τ1 < τ2; then

x + τ1y = [1 − (τ1/τ2)]x + (τ1/τ2)(x + τ2y),

so by convexity

f(x + τ1y) ≤ [1 − (τ1/τ2)]f(x) + (τ1/τ2)f(x + τ2y).

Rearranging this inequality yields Q(x,y,τ1) ≤ Q(x,y,τ2) as required. ⊓⊔

Proposition 7.2.4. Let f be a closed proper convex function on Rn. Then rec f is closed, proper, and convex. If we define Q(x,y,τ) by (7.16), then for any x ∈ dom f and any y ∈ Rn one has

rec f(y) = sup_{τ>0} Q(x,y,τ) = lim_{τ→+∞} Q(x,y,τ).

Proof. Lemma 7.2.3 shows that Q(x,y,τ) is nondecreasing for τ > 0, so the supremum and the limit in the assertion of the theorem are the same. Also, as f is closed its epigraph is closed, so rc epi f is closed and convex by Theorem 4.1.2. This means rec f is lower semicontinuous (Theorem 6.2.7). But rec f is proper (Proposition 7.2.2), so it is actually a closed proper convex function.

Now let x ∈ dom f and y ∈ Rn. The point (x, f(x)) belongs to epi f, which is a closed set. Therefore, for any real number η, saying that (y,η) belongs to epi rec f is equivalent to saying that epi f contains the halfline {(x+τy, f(x)+τη) | τ ≥ 0} (Theorem 4.1.2). In turn, this is the same as saying that η ≥ Q(x,y,τ) for all positive τ. Therefore rec f(y) is the minimum of the η for which the latter inequality holds: that is, it is the supremum in the assertion of the theorem. ⊓⊔
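Proposition 7.2.4 can be watched numerically. The sketch below is ours, not from the text: for f(x) = √(1 + x²) on R, whose graph has asymptotes of slope ±1, we have rec f(y) = |y|, and the difference quotient Q(x,y,τ) of (7.16) should increase to |y| as τ grows, whatever base point x ∈ dom f is used.

```python
# Illustration (ours) of Proposition 7.2.4 for f(x) = sqrt(1 + x^2) on R.
import math

def f(x): return math.sqrt(1.0 + x*x)

def Q(x, y, tau): return (f(x + tau*y) - f(x)) / tau   # difference quotient (7.16)

for x in [-3.0, 0.0, 2.5]:
    for y in [-2.0, 1.0]:
        taus = [10.0**k for k in range(0, 7)]
        vals = [Q(x, y, t) for t in taus]
        # Lemma 7.2.3: nondecreasing in tau; Proposition 7.2.4: limit = rec f(y) = |y|.
        assert all(vals[i] <= vals[i+1] + 1e-12 for i in range(len(vals)-1))
        assert abs(vals[-1] - abs(y)) < 1e-4
```

Note that the limit is the same for every x, illustrating the "direction, not location" principle stated at the start of the section.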

To illustrate the usefulness of recession functions we establish some properties of separable convex functions that we will need later.

Proposition 7.2.5. For i = 1,…,k let gi : Rni → (−∞,+∞] be a proper convex function. Let n = ∑_{i=1}^k ni; for i = 1,…,k and xi ∈ Rni define

g(x1,…,xk) = ∑_{i=1}^k gi(xi).

The function g is then a proper convex function on Rn, and one has for i = 1,…,k and xi ∈ Rni

cl g(x1,…,xk) = ∑_{i=1}^k cl gi(xi), (7.17)

and for yi ∈ Rni

rec cl g(y1,…,yk) = ∑_{i=1}^k rec cl gi(yi). (7.18)

This function g is not the usual sum of functions on a given space: rather, it is a separable function consisting of the sum of several functions on different spaces.


We will establish recession (and other) properties of the usual sum in a later result whose proof will use this proposition.

Proof. The function g is convex by Proposition 6.1.5. As each gi is proper, g cannot take −∞. Further, if for i = 1,…,k we take x̄i to be a point of ri dom gi, then gi is finite there; so then is g(x̄1,…,x̄k), and therefore g is proper.

For the closure formula, let xi ∈ Rni for i = 1,…,k. We have dom g = Π_{i=1}^k dom gi, and therefore ri dom g = Π_{i=1}^k ri dom gi by Exercise 1.2.15. Then Proposition 6.2.15 says that

cl g(x1,…,xk) = lim_{µ↓0} ∑_{i=1}^k gi[(1−µ)xi + µx̄i]
            = ∑_{i=1}^k lim_{µ↓0} gi[(1−µ)xi + µx̄i]
            = ∑_{i=1}^k cl gi(xi),

which proves (7.17).

The claim in (7.18) involves only closures, so for brevity let hi = cl gi and h = cl g. We have already shown that h(x1,…,xk) = ∑_{i=1}^k hi(xi). Let H = epi h and for i = 1,…,k let Hi = epi hi. If we define L to be the linear operator from Rn+k to Rn+1 given by

L(x1,ξ1,…,xk,ξk) = (x1,…,xk, ∑_{i=1}^k ξi),

where for each i, xi ∈ Rni and ξi ∈ R, then

H = L(Π_{i=1}^k Hi). (7.19)

The kernel of L consists of points z = (0,τ1,…,0,τk) ∈ Rn+k such that ∑_{i=1}^k τi = 0. The recession cone of Π_{i=1}^k Hi is Π_{i=1}^k rc Hi by Exercise 4.1.24, so if z ∈ (ker L) ∩ (rc Π_{i=1}^k Hi) then each τi is nonnegative, because otherwise cl gi would be identically −∞. As the τi must sum to zero, they are all zero and therefore z = 0. This shows that the closed set Π_{i=1}^k Hi satisfies the recession condition for L. By Corollary 4.1.18,

rc H = L(rc Π_{i=1}^k Hi) = L(Π_{i=1}^k rc Hi):

that is,
epi rec cl g = L(Π_{i=1}^k epi rec cl gi),

so that (7.18) holds. ⊓⊔

Points y at which rec f takes nonpositive values are of special interest, so they have special names.


Definition 7.2.6. Let f be a proper convex function on Rn. The recession cone of f is the set rc f of points y at which rec f(y) ≤ 0, and the constancy space of f is cs f = (rc f) ∩ (−rc f). ⊓⊔

Note that rec f(0) = 0 (Proposition 7.2.2), so that each of the sets in Definition 7.2.6 contains the origin. To see why they are important – and why they have the names given in that definition – we need to develop more information about the way in which f behaves.

Theorem 7.2.7. Let f be a proper convex function on Rn. A point y ∈ Rn belongs to rc f if and only if for each x ∈ Rn the function ϕx(τ) = f(x+τy) is nonincreasing on R, and y belongs to cs f if and only if for each x ∈ Rn, ϕx is constant on R.

Proof. If y = 0 the assertion is trivial, so we can assume that y ≠ 0. Also, we need only prove the first assertion, since then by applying it to y and −y we can establish the second.

First suppose that for each x ∈ Rn the function ϕx is nonincreasing. Let x ∈ dom f; then ϕx(1) ≤ ϕx(0), so f(x+y) − f(x) ≤ 0. Taking the supremum over x ∈ dom f and applying Proposition 7.2.2, we conclude that rec f(y) ≤ 0, so that y ∈ rc f.

Next, suppose that y ∈ rc f, and choose x ∈ Rn. If the line L = {x + τy | τ ∈ R} does not meet dom f, then ϕx ≡ +∞, so the assertion is true. If L meets dom f (a convex set), the set {τ ∈ R | x + τy ∈ dom f} is an interval T in R. This interval must be unbounded on the right, because y ∈ rc f implies f(x+y) ≤ f(x) by Proposition 7.2.2, and therefore f(x + ky) ≤ f(x) for each positive integer k. Any τ ∉ T must have ϕx(τ) = +∞ by definition of ϕx. Therefore it suffices to show that for each τ1 and τ2 in T with τ1 < τ2, ϕx(τ1) ≥ ϕx(τ2). As x + τ1y ∈ dom f, we have by Proposition 7.2.2 and the hypothesis

0 ≥ (τ2 − τ1) rec f(y) = rec f[(τ2 − τ1)y]
  ≥ f[(x + τ1y) + (τ2 − τ1)y] − f(x + τ1y)
  = ϕx(τ2) − ϕx(τ1),

which shows that ϕx is nonincreasing. ⊓⊔
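Theorem 7.2.7 is easy to observe in a small example. The sketch below is ours, not from the text: for f(x1,x2) = x1² + exp(x2) on R², the directions y with rec f(y) ≤ 0 are exactly y = (0, y2) with y2 ≤ 0, so along y = (0,−1) the function ϕx(τ) = f(x + τy) is nonincreasing on all of R for every base point x, while along y = (1,0) it is not.

```python
# Illustration (ours) of Theorem 7.2.7 for f(x1, x2) = x1^2 + exp(x2) on R^2.
import math

def f(x1, x2): return x1*x1 + math.exp(x2)

def phi(x, y, tau): return f(x[0] + tau*y[0], x[1] + tau*y[1])

taus = [-5.0 + 0.1*k for k in range(101)]   # tau from -5 to 5
for x in [(0.0, 0.0), (2.0, -1.0), (-1.5, 3.0)]:
    vals = [phi(x, (0.0, -1.0), t) for t in taus]
    assert all(vals[i] >= vals[i+1] for i in range(len(vals)-1))   # nonincreasing
bad = [phi((0.0, 0.0), (1.0, 0.0), t) for t in taus]
assert not all(bad[i] >= bad[i+1] for i in range(len(bad)-1))      # (1,0) not in rc f
```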

Remark 7.2.8. If f is closed and proper, and if for some particular x ∈ dom f and y ∈ Rn the quantity f(x+τy) does not converge to +∞ as τ → +∞, then the difference quotient Q(x,y,τ) must converge to a nonpositive value. This in turn means that rec f(y) ≤ 0 by Proposition 7.2.4, and so by Theorem 7.2.7 f(x+τy) is actually nonincreasing as a function of τ ∈ R.

The recession cone of a closed convex function f can be used to gain information about other quantities associated with f. We first look at the situation for level sets, which we write for ϕ ∈ R as levϕ f = {x | f(x) ≤ ϕ}.

Theorem 7.2.9. Let f be a closed proper convex function on Rn and let ϕ ∈ R. If levϕ f ≠ ∅ then rc levϕ f = rc f and lin levϕ f = cs f.


Proof. The second assertion follows from the first, so we prove only the first. The set levϕ f is closed, so a vector y belongs to its recession cone only if the level set contains a halfline from some x in the direction of y. Remark 7.2.8 then shows that y ∈ rc f. Conversely, if y ∈ rc f then let x ∈ levϕ f. By Theorem 7.2.7 the values of f along the halfline from x in the direction of y must be nonincreasing, so this halfline lies in levϕ f. Then Theorem 4.1.2 says that y ∈ rc levϕ f. ⊓⊔

Theorem 7.2.9 is a very strong statement about level sets: it says that, roughly speaking, all of the nonempty level sets have to be unbounded in the same way. Therefore we can use information about rc f, if we have it, to predict properties of the level sets. If the set of minimizers of f is nonempty, it is a level set, and so in particular we can get information about minimizers in this way.

To enhance our ability to compute rc f we now look at a way of determining the recession function rec f. It turns out to be easier to work with the conjugate, which is always closed and whose recession function is therefore closed. As this recession function is closed, proper, positively homogeneous, and convex, Theorem 7.1.7 tells us that it must then be the support function of some nonempty convex set. In fact the support function uniquely specifies the closure of this set, but if we do not demand that the set be closed then there may be many candidates. The following theorem identifies this set as the effective domain of the original function.

Theorem 7.2.10. Let f be a proper convex function on Rn. Then rec f ∗ = I∗dom f . If f is closed then rec f = I∗dom f ∗ .

Proof. The second statement follows from the first because f ∗∗ = cl f (Theorem 7.1.3). To prove the first statement, recall that epi f ∗ is the intersection over x ∈ dom f of the closed halfspaces {(x∗,ξ ∗) | ⟨(x∗,ξ ∗),(x,−1)⟩ ≤ f (x)}. By Exercise 4.1.25 the recession cones of these halfspaces are the halfspaces {(x∗,ξ ∗) | ⟨(x∗,ξ ∗),(x,−1)⟩ ≤ 0}. Because the halfspaces involved are closed, Corollary 4.1.5 tells us that the recession cone of epi f ∗ is the set of (x∗,ξ ∗) such that for each x ∈ dom f , ⟨x∗,x⟩ ≤ ξ ∗: that is, ξ ∗ ≥ I∗dom f (x∗). Therefore the epigraphs of rec f ∗ and of I∗dom f are the same, which proves the assertion. ⊓⊔
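The identity rec f ∗ = I∗dom f can be checked numerically. The sketch below (my own illustration under stated assumptions, not the author's) takes f (x) = x² on dom f = [−1,2] and +∞ elsewhere, and compares a large-τ difference quotient of f ∗ with the support function of dom f .

```python
# Hedged numeric check of Theorem 7.2.10 (rec f* = support function of
# dom f), for the assumed example f(x) = x**2 on dom f = [-1, 2].

def conjugate(y, xs):
    # f*(y) = sup_x <y, x> - f(x), approximated over a grid of dom f
    return max(y * x - x * x for x in xs)

def support(y, lo=-1.0, hi=2.0):
    # support function of the interval [lo, hi] = dom f
    return max(y * lo, y * hi)

xs = [-1.0 + 3.0 * k / 3000 for k in range(3001)]   # grid on [-1, 2]
for y in (-2.0, -0.5, 0.0, 1.0, 3.0):
    tau = 1e6
    # difference quotient tau**-1 [f*(tau*y) - f*(0)] at large tau,
    # which approximates rec f*(y)
    approx_rec = (conjugate(tau * y, xs) - conjugate(0.0, xs)) / tau
    assert abs(approx_rec - support(y)) < 1e-2
```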

By combining Theorems 7.2.9 and 7.2.10 we obtain very strong results about the level sets of a function f in terms of properties of the effective domain of f ∗.

Theorem 7.2.11. Let f be a closed proper convex function on Rn, and let ϕ be a real number with levϕ f ≠ ∅. Then:

a. levϕ f = cs f +Kϕ , where Kϕ = [(levϕ f )∩ (cs f )⊥].
b. Kϕ is compact if and only if 0 ∈ ridom f ∗.
c. levϕ f is compact if and only if 0 ∈ intdom f ∗.

Proof. By Theorem 7.2.9 the lineality space and the recession cone of levϕ f are cs f and rc f respectively. Then assertion (a) follows from Proposition A.6.22.

For assertion (b), we apply Proposition A.6.22 again, using the fact that cs f = lin rc f , to obtain


rc f = cs f +(rc f )∩ (cs f )⊥. (7.20)

As Kϕ is nonempty, we have

rcKϕ = rc[(levϕ f )∩ (cs f )⊥] = (rc levϕ f )∩ (cs f )⊥ = (rc f )∩ (cs f )⊥,

where we used Corollary 4.1.5 and Theorem 7.2.9. If Kϕ is compact, its recession cone is the origin (Corollary 4.1.8), which means that the second term in (7.20) is zero. Therefore rc f is just the constancy space cs f . This means that the support function I∗dom f ∗ can take nonpositive values only at those x where it is nonpositive also at −x. But since it takes the value zero at the origin, convexity then implies that it takes zero at both x and −x. Theorem 7.1.11 then tells us that 0 ∈ ridom f ∗. Conversely, if 0 ∈ ridom f ∗ then rec f is positive everywhere except on cs f , so rc f = cs f and therefore by the above analysis rcKϕ is the origin. By Corollary 4.1.8, Kϕ is then compact.

For assertion (c), note that levϕ f is compact if and only if its recession cone is the origin (Corollary 4.1.8), which is equivalent to saying that rc f is the origin (Theorem 7.2.9). In turn, by Theorem 7.2.10 this is equivalent to saying that I∗dom f ∗ is positive everywhere except at the origin, and by Theorem 7.1.11 this is equivalent to the statement that the origin belongs to the interior of dom f ∗. ⊓⊔

One of the consequences of Theorem 7.2.11 is an important criterion for attainment of the minimum of a closed proper convex function.

Corollary 7.2.12. Let f be a closed proper convex function on Rn. If 0 ∈ ridom f ∗ then f attains its minimum, which is finite.

Proof. The minimum, if attained, must be finite because f is proper. If the origin belongs to ridom f ∗ then, according to Theorem 7.2.11, each nonempty level set levϕ f is of the form cs f +Kϕ , with Kϕ = (levϕ f )∩ (cs f )⊥. Moreover, Kϕ is compact. This means that each element of levϕ f yields the same function value as some element of Kϕ (because f is constant along any line in the direction of an element of cs f ). Therefore a particular value of f is attained on levϕ f if and only if it is attained on Kϕ , and accordingly the minimum of f , which is also its minimum on levϕ f , is the same as its minimum on Kϕ . But Kϕ is a compact subset of levϕ f , so by Proposition 6.2.8 the minimum is attained on Kϕ , and hence on levϕ f . ⊓⊔
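A hedged numeric illustration of the criterion, with two example functions of my choosing: f1(x) = x² has f1∗(y) = y²/4, so the origin lies in ridom f1∗ = R and the minimum is attained; f2(x) = eˣ has dom f2∗ = [0,+∞) (its conjugate is y log y − y for y > 0), so the origin lies on the boundary rather than in the relative interior, and the infimum 0 is approached but never attained.

```python
# Hedged sketch: Corollary 7.2.12 in action for the assumed examples
# f1(x) = x**2 (minimum attained) and f2(x) = exp(x) (infimum 0 not
# attained, since 0 is only on the boundary of dom f2*).
import math

f1 = lambda x: x * x
f2 = math.exp

def min_on_grid(f, lo, hi, n=10001):
    return min(f(lo + (hi - lo) * k / (n - 1)) for k in range(n))

# f1: the minimum over ever larger boxes stabilizes at the attained value 0
assert min_on_grid(f1, -1, 1) == 0.0
assert min_on_grid(f1, -100, 100) == 0.0

# f2: the infimum keeps decreasing toward 0 but stays strictly positive
m10, m20 = min_on_grid(f2, -10, 0), min_on_grid(f2, -20, 0)
assert m20 < m10 and m20 > 0.0
```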

The criterion in Corollary 7.2.12 is sufficient, but not necessary (Exercise 7.2.22).

There is a close connection between the recession properties of a function and the effective domain of its conjugate. The next two results express some aspects of that connection.

Corollary 7.2.13. Let f be a proper convex function on Rn. Then rc f ∗ = (conedom f )◦.

Proof. As f is proper, dom f is nonempty and so the polar operation is well defined. The polar of conedom f consists of all y∗ ∈ Rn such that, for each x ∈ dom f , ⟨y∗,x⟩ ≤ 0, or equivalently the set of y∗ for which I∗dom f (y∗) ≤ 0. But I∗dom f = rec f ∗ by Theorem 7.2.10, so this set is rc f ∗. ⊓⊔


Corollary 7.2.14. Let f be a proper convex function on Rn and let L be a nonempty subspace of Rn. Then L∩ rc f ∗ is a subspace if and only if L⊥ meets ridom f .

Proof. By Proposition 3.2.3, L∩ rc f ∗ is a subspace if and only if L⊥ meets ri[(rc f ∗)◦]. Using Corollary 7.2.13, Corollary 3.1.9, Theorem 3.1.7, and Proposition 1.2.6, we have

ri[(rc f ∗)◦] = ri[(conedom f )◦◦] = ri[clconedom f ] = riconedom f = coneridom f .

But a subspace like L⊥ meets coneridom f if and only if it meets ridom f . ⊓⊔

As mentioned after Definition 7.1.9 above, the recession function lets us establish a criterion for the conical hull cone f to be closed and proper.

Theorem 7.2.15. Let f : Rn → R be a closed proper convex function and suppose that f (0) is finite and positive. Then cone f is a closed proper convex function.

Proof. We already know cone f is a convex function. Under our hypotheses its epigraph is nonempty, so it is not identically +∞. The origin in Rn+1 does not belong to the (closed) epigraph of f , so we can separate strongly to obtain a pair (x∗,ξ ∗) ∈ Rn+1 and a positive δ such that for each (x,ξ ) ∈ epi f , 0 > −δ > ⟨x∗,x⟩+ ξ ∗ξ . The form of epi f ensures that ξ ∗ ≤ 0, and if ξ ∗ = 0 then we obtain a contradiction because (0, f (0)) ∈ epi f . So ξ ∗ < 0, and we can scale (x∗,ξ ∗) so that ξ ∗ = −1. Then for each pair (y,η) ∈ cone epi f we have ⟨x∗,y⟩−η ≤ 0, so the affine function ⟨x∗, · ⟩ minorizes cone f . Then cone f cannot take −∞, so it is proper. Therefore epi cl cone f = cl epi cone f .

The comment following the definition of cone f implies that

cone epi f ⊂ epi cone f ⊂ cl cone epi f ,

so that

epi cl cone f = cl epi cone f = cl cone epi f .

Now Corollary 4.1.21 tells us that, since the origin does not belong to the (closed) epigraph of f , we have

epi cl cone f = cl cone epi f = (cone epi f )∪ (rc epi f ).

As rc epi f = epi rec f , we see that for each y, cl cone f (y) is the smaller of cone f (y) and rec f (y).

The rest of the proof consists of showing that rec f (y) is not required, and to do so we recall that as f is closed and f (0) is finite, we have

rec f (y) = supτ>0 τ−1[ f (0+ τy)− f (0)].

Now f (0) is finite and positive, so for any fixed positive δ we can find τ large enough so that


rec f (y)≥ τ−1 f (τy)− τ−1 f (0)≥ τ−1 f (τy)−δ ≥ cone f (y)−δ .

It follows that rec f (y) ≥ cone f (y), so in fact cl cone f (y) = cone f (y), and therefore cone f is closed. ⊓⊔

The next definition creates two new functions by a kind of composition operation with a scalar, using the recession function in the process. As we then show, if f is a closed proper convex function these compositions have interesting properties. In particular, they permit us to calculate the support function of an epigraph.

Definition 7.2.16. Suppose f : Rn → R is a proper convex function. For x ∈ Rn and µ ∈ R we define a function µ f by

(µ f )(x) :=   µ f (x)          if µ > 0,
              Icldom f (x)     if µ = 0,
              +∞               otherwise,          (7.21)

and a function f µ by

( f µ)(x) :=   µ f (µ−1x)      if µ > 0,
              (rec f )(x)      if µ = 0,
              +∞               otherwise.          (7.22)

As the notation for the function µ f is the same as that for the ordinary product of µ with f , there is a possibility of confusion. For that reason we usually write (µ f ) for the function of Definition 7.2.16 when there is danger of ambiguity.

We can regard these functions either as functions of x for fixed µ or as functions of (x,µ). In the first case, when f is closed we have the following relationship.

Proposition 7.2.17. Let f be a closed proper convex function from Rn to R and let µ ∈ [0,+∞). Then (µ f ) and ( f µ) are closed proper convex functions of x, and one has

(µ f )∗ = ( f ∗µ), ( f µ)∗ = (µ f ∗). (7.23)

Proof. Both (µ f ) and ( f µ) are closed proper convex for µ > 0 because f is closed proper convex. For µ = 0, (µ f ) is closed proper convex because cldom f is nonempty, closed, and convex, and ( f µ) is closed proper convex because rec f is closed proper convex by Proposition 7.2.4.

If µ = 0 then (7.23) says that I∗cldom f = rec f ∗ and that (rec f )∗ = Icldom f ∗ . These follow from Theorem 7.2.10 because f is closed and because the support functions of a set and of its closure are the same.

If µ > 0 then

(µ f )∗(x∗) = supx {⟨x∗,x⟩−µ f (x)}
= µ supx {⟨µ−1x∗,x⟩− f (x)}
= µ f ∗(µ−1x∗) = ( f ∗µ)(x∗),


establishing the first equation in (7.23). For the second, if we introduce the variable y = µ−1x then we have

( f µ)∗(x∗) = supx {⟨x∗,x⟩−µ f (µ−1x)}
= µ supy {⟨x∗,y⟩− f (y)}
= µ f ∗(x∗) = (µ f ∗)(x∗).

⊓⊔
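The conjugacy rules in (7.23) are easy to test numerically. The following sketch (my own, with the assumed example f (x) = x², whose conjugate is f ∗(y) = y²/4) checks (µ f )∗ = ( f ∗µ) for µ = 2 by brute-force maximization over a grid.

```python
# Hedged numeric check of (7.23) for the assumed choice f(x) = x**2,
# f*(y) = y**2 / 4, with mu = 2: the conjugate of (mu f) should be
# (f* mu)(y) = mu * f*(y / mu) = y**2 / 8.
mu = 2.0
f = lambda x: x * x
f_star = lambda y: y * y / 4.0

def num_conjugate(g, y, lo=-20.0, hi=20.0, n=20001):
    # numeric sup_x <y, x> - g(x) over a fine grid
    xs = [lo + (hi - lo) * k / (n - 1) for k in range(n)]
    return max(y * x - g(x) for x in xs)

mu_f = lambda x: mu * f(x)                  # (mu f)(x) for mu > 0
f_star_mu = lambda y: mu * f_star(y / mu)   # (f* mu)(y) for mu > 0

for y in (-3.0, 0.0, 1.0, 4.0):
    assert abs(num_conjugate(mu_f, y) - f_star_mu(y)) < 1e-4
```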

Suppose now that we were to consider (µ f ) and ( f µ) to be functions of (x,µ). Then even for f (x) = x on R and µ ≥ 0, the function g(x,µ) := (µ f )(x) would not be convex, because g(0,1) = 0 = g(1,0) but g(.5, .5) = .25. However, the function h(x,µ) := ( f µ)(x) is not only proper convex but is even closed if f is. That is a consequence of the next proposition, which establishes a very useful connection with the epigraph of f .

Proposition 7.2.18. Let f be a closed proper convex function from Rn to R. Then

( f ∗µ∗)(x∗) = I∗epi f (x∗,−µ∗). (7.24)

Proof. If µ∗ < 0 then (7.24) holds because then the structure of the epigraph requires its support function to have the value +∞. If µ∗ = 0 then the right side of (7.24) is the support function of dom f (equivalently, cldom f ) evaluated at x∗. This is (µ∗ f )∗(x∗) with µ∗ = 0, which by Proposition 7.2.17 is ( f ∗µ∗)(x∗), so (7.24) holds also in this case.

Now suppose µ∗ > 0 and write I∗epi f (x∗,−µ∗) as

supx,µ {⟨x∗,x⟩−µ∗µ − Iepi f (x,µ)} = supx,µ {⟨x∗,x⟩−µ∗µ | µ ≥ f (x)}. (7.25)

Because µ∗ > 0, the choice of µ in (7.25) that makes the supremum on the right side as large as possible is µ = f (x). Then we have

I∗epi f (x∗,−µ∗) = supx {⟨x∗,x⟩−µ∗ f (x)} = (µ∗ f )∗(x∗) = ( f ∗µ∗)(x∗), (7.26)

where we used Proposition 7.2.17 again. This proves (7.24). ⊓⊔

Corollary 7.2.19. If c is a closed proper convex function on Rn, then (cρ)(x) is a closed proper convex function of (x,ρ).

Proof. We have assumed c to be closed, so in Proposition 7.2.18 take f ∗ = c, f = c∗, x∗ = x, and µ∗ = ρ . Then the right side of (7.26) is (cρ)(x) and the left side is I∗epic∗(x,−ρ). Here c∗ is proper because c is proper, so domc∗ is nonempty and therefore its support function is closed proper convex. ⊓⊔


7.2.1 Exercises for Section 7.2

Exercise 7.2.20. Let f = I[R+×(0,+∞)]∪{(0,0)}. Find rec f , and determine whether or not it is closed.

Exercise 7.2.21. Let f be a convex function on Rn. Show that inf f = − f ∗(0). If inf f = −∞, then what can you say about the position of the origin with respect to dom f ∗?

Exercise 7.2.22. By considering the function f (x) = sup{0,−x} on R, show that the converse of Corollary 7.2.12 does not hold.

Exercise 7.2.23. (Stable minimum property.) Let f be a closed proper convex function on Rn. Show that the origin belongs to the interior of dom f ∗ if and only if there is a positive ε such that, for each x∗ having norm less than ε , the infimum of f (x)−⟨x∗,x⟩ is finite and attained.

Exercise 7.2.24. Let f be a closed proper convex function on Rn. Suppose you are told that there exist elements x ∈ dom f and y ∈ Rn, and a negative number µ , such that for all λ ≥ 0, f (x+λy) ≤ f (x)+ µλ . Give the best lower bound that you can determine for the distance from the origin to dom f ∗.


Chapter 8
Subdifferentiation

We have already seen subgradients of functions in Definition 6.2.12. They allow us to describe supporting hyperplanes to the epigraph of a convex function f defined on Rn. If x ∈ dom f , then a hyperplane with normal (z∗,ζ ∗) supports epi f at (x, f (x)) if for each (x′,ξ ′) ∈ epi f we have

⟨(z∗,ζ ∗),(x′,ξ ′)⟩ ≤ ⟨(z∗,ζ ∗),(x, f (x))⟩,

or equivalently,

⟨z∗,x′⟩+ζ ∗ξ ′ ≤ ⟨z∗,x⟩+ζ ∗ f (x). (8.1)

The properties of an epigraph prevent the ζ ∗ in (8.1) from being positive, because if it were then we could take a large positive value of ξ ′ and contradict the inequality. Therefore ζ ∗ can be negative or else zero; in the latter case we say this hyperplane is vertical.

If ζ ∗ < 0 then we can divide (8.1) by −ζ ∗, write x∗ = z∗/(−ζ ∗) and substitute f (x′) for ξ ′. In this way we scale the normal (z∗,ζ ∗) to (x∗,−1) and obtain for each x′ ∈ Rn,

f (x′)≥ f (x)+ ⟨x∗,x′− x⟩. (8.2)

The element x∗ is a subgradient of f at x if and only if it satisfies (8.2).

From this we can see that subgradients represent scaled normals of non-vertical supporting hyperplanes: x∗ is a subgradient of f at x exactly when (x∗,−1) is the normal to a hyperplane supporting epi f at (x, f (x)). For some f the form of the epigraph may also permit vertical supporting hyperplanes, but these do not correspond to subgradients.
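For a concrete picture (my own illustration, with the assumed example f (x) = |x| at x = 0), inequality (8.2) holds exactly for x∗ ∈ [−1,1]; each such x∗ gives a non-vertical hyperplane with normal (x∗,−1) supporting the epigraph at the origin.

```python
# Hedged sketch: for f(x) = |x| at x = 0, a number x_star satisfies
# inequality (8.2), f(z) >= f(0) + x_star * (z - 0) for all z, exactly
# when x_star lies in [-1, 1].  Checked on a grid of test points z.
f = abs

def is_subgradient(x_star, x=0.0, grid=None):
    grid = grid or [k / 10.0 for k in range(-50, 51)]
    return all(f(z) >= f(x) + x_star * (z - x) for z in grid)

assert is_subgradient(-1.0) and is_subgradient(0.3) and is_subgradient(1.0)
assert not is_subgradient(1.5) and not is_subgradient(-1.01)
```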


8.1 Subgradients and the subdifferential

To study the existence and properties of subgradients in a systematic way, it helps to introduce a multifunction (an operation whose values are sets) called the subdifferential.

A multifunction F from Rn to Rm is simply a correspondence assigning to each point x ∈ Rn a set F(x) ⊂ Rm. The graph of F , which we write as gphF or more often just as F , is the set of ordered pairs (x,y) in Rn ×Rm such that y ∈ F(x). We say that F is closed if its graph is closed, and monotone if m = n and for each pair of elements (x,y) and (x′,y′) in the graph of F one has ⟨x− x′,y− y′⟩ ≥ 0. The effective domain of F , written domF , is the set of x such that F(x) is not empty, and the image of F , written imF , is the set of y for which there exists an x with y ∈ F(x). The sets domF and imF are just the projections of the graph of F into Rn and Rm respectively.

With F we can associate another multifunction F−1, the inverse of F , defined from Rm to Rn by F−1(y) = {x | y ∈ F(x)}. Note that (x,y) ∈ F if and only if (y,x) ∈ F−1. Therefore domF−1 = imF and imF−1 = domF . Further, as the properties of closedness and monotonicity were defined in terms of the graph, they hold for F if and only if they hold for F−1.

Definition 8.1.1. Let f be a convex function from Rn to R. The subdifferential ∂ f (x) of the function f at a point x ∈ Rn is the set of all subgradients of f at x. The same definition holds for a concave function g on Rn, except that then a subgradient x∗ is defined to be a point such that for each z ∈ Rn,

g(z)≤ g(x)+ ⟨x∗,z− x⟩.

⊓⊔

One indication of the importance of the subdifferential is the fact that x is a global minimizer of a convex f , or a global maximizer of a concave f , if and only if the origin belongs to ∂ f (x). Another important case is that of the normal cone of a convex set C ⊂ Rn. If we take the convex function f to be the indicator IC, then for x ∈ C, ∂ IC(x) consists of those x∗ such that for each z ∈ Rn,

IC(z)≥ IC(x)+ ⟨x∗,z− x⟩. (8.3)

For z /∈ C, (8.3) holds for any x∗, so the important case is that of z ∈ C. Then (8.3) says that for each z ∈ C, ⟨x∗,z− x⟩ ≤ 0, so that ∂ IC(x) is the normal cone NC(x) of C at x.

Here are some equivalent ways of describing the subdifferential.

Proposition 8.1.2. Let f be a convex function from Rn to R. For each x and x∗ in Rn, ∂ f (x) is a closed convex set, and the following are equivalent:

a. x∗ ∈ ∂ f (x).
b. f ∗(x∗) = ⟨x∗,x⟩− f (x).


c. ⟨x∗, · ⟩− f ( ·) attains its maximum on Rn at x.

If cl f (x) = f (x) then the preceding statements are all equivalent to the following:

d. x ∈ ∂ f ∗(x∗).
e. ⟨ · ,x⟩− f ∗( ·) attains its maximum on Rn at x∗.

Proof. ∂ f (x) is the set of x∗ for which, for each z ∈ Rn, f (z) ≥ f (x)+ ⟨x∗,z− x⟩. A calculation then shows that the set ∂ f (x) is closed and convex.

((a) implies (b)). If (a) holds, then for each z ∈ Rn, f (z) ≥ f (x)+ ⟨x∗,z− x⟩. Rearranging this, we obtain ⟨x∗,z⟩− f (z) ≤ ⟨x∗,x⟩− f (x), and this is (b).

((b) implies (c)). The supremum of ⟨x∗, · ⟩− f ( ·) is f ∗(x∗) by definition. If (b) holds then this is a maximum, attained at x, which is the statement in (c).

((c) implies (a)). If (c) holds, then for each z ∈ Rn one has ⟨x∗,z⟩− f (z) ≤ ⟨x∗,x⟩− f (x). By rearranging this we obtain f (z) ≥ f (x)+ ⟨x∗,z− x⟩, which is (a).

Now suppose cl f (x) = f (x) and write statements (a) to (c) for f ∗ and x∗. By what we have already shown, these must be equivalent. The first and third of these will be (d) and (e) respectively, while the second statement is identical to (b) because f ∗∗(x) = cl f (x) = f (x). Therefore in this case (d) and (e) are equivalent to (a), (b), and (c). ⊓⊔
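The equivalence of (a) and (b) is easy to verify numerically. The sketch below (an illustration of mine, with the assumed example f (x) = x², f ∗(y) = y²/4) checks that at x = 1 the Fenchel equality in (b) picks out exactly the subgradient x∗ = 2.

```python
# Hedged check of (a) <=> (b) in Proposition 8.1.2 for the assumed
# example f(x) = x**2, whose conjugate is f*(y) = y**2 / 4.
f = lambda x: x * x
f_star = lambda y: y * y / 4.0

def is_subgradient(x_star, x, grid=None):
    # statement (a): the subgradient inequality on a grid of test points
    grid = grid or [k / 100.0 for k in range(-1000, 1001)]
    return all(f(z) >= f(x) + x_star * (z - x) for z in grid)

def fenchel_equality(x_star, x):
    # statement (b): f*(x*) = <x*, x> - f(x)
    return abs(f_star(x_star) - (x_star * x - f(x))) < 1e-12

# at x = 1 the unique subgradient is the derivative 2, and only it
# satisfies the Fenchel equality
assert is_subgradient(2.0, 1.0) and fenchel_equality(2.0, 1.0)
assert not is_subgradient(1.0, 1.0) and not fenchel_equality(1.0, 1.0)
```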

It is certainly possible for ∂ f (x) to be empty. We are particularly interested in points x for which it is nonempty: that is, points of dom∂ f .

Proposition 8.1.3. Let f be a proper convex function on Rn. Then ridom f ⊂ dom∂ f ⊂ dom f .

Proof. Proposition 6.2.13 says that f has a subgradient at each point of ridom f , and this proves the first inclusion. However, if f has a subgradient at a point x, then f (x) cannot be +∞: because f is not identically +∞, the subgradient inequality would otherwise fail at any z with f (z) < +∞. This proves the second inclusion. ⊓⊔

Next, we see that the regularizing operation that replaces f by cl f does not affect the subdifferential of f at any point x for which that subdifferential was nonempty. The symbol “⊂” in the last statement of the proposition denotes graph inclusion.

Proposition 8.1.4. Let f be a proper convex function on Rn. For each x ∈ dom∂ f , one has f (x) = (cl f )(x) and ∂ f (x) = ∂ (cl f )(x). In particular, one has ∂ f ⊂ ∂ cl f .

Proof. Let x ∈ dom∂ f and x∗ ∈ ∂ f (x); then we have

f (x)≥ (cl f )(x) = f ∗∗(x)≥ ⟨x∗,x⟩− f ∗(x∗) = f (x),

where the last equality came from Proposition 8.1.2. Therefore f (x) = (cl f )(x). If y∗ ∈ ∂ (cl f )(x) then for each z one has

f (z)≥ cl f (z)≥ cl f (x)+ ⟨y∗,z− x⟩= f (x)+ ⟨y∗,z− x⟩,

and therefore ∂ (cl f )(x) ⊂ ∂ f (x). Now let z0 ∈ ridom f . For each z ∈ Rn, we have by Corollary 6.2.16


(cl f )(z) = limτ↓0 f [(1− τ)z+ τz0].

However,

f [(1− τ)z+ τz0] ≥ f (x)+ ⟨x∗, [(1− τ)z+ τz0]− x⟩,

and by taking the limit we find that

(cl f )(z)≥ f (x)+ ⟨x∗,z− x⟩= (cl f )(x)+ ⟨x∗,z− x⟩.

It follows that ∂ f (x) ⊂ ∂ (cl f )(x), so the two sets are the same.

Finally, for any x ∈ Rn either x ∈ dom∂ f , in which case we have shown that ∂ f (x) = ∂ cl f (x), or else ∂ f (x) is empty, in which case it certainly is contained in ∂ cl f (x). Therefore the graph of ∂ f is contained in that of ∂ cl f . ⊓⊔

We now show that the subdifferentials of a closed convex function f and of its conjugate are inverses of each other, and when f is proper they are not only monotone operators but actually obey a stronger property called cyclic monotonicity, defined as follows.

Definition 8.1.5. A multifunction F defined from Rn to Rn is cyclically monotone if for any K and for any pairs (x0, f0), . . . ,(xK , fK) in the graph of F , one has

∑k=0,...,K ⟨ fk,xk+1 − xk⟩ ≤ 0,

where we use the convention that xK+1 = x0. ⊓⊔

Monotonicity is the special case of cyclic monotonicity with K = 1.

Proposition 8.1.6. Let f be a closed convex function on Rn. Then ∂ f and ∂ f ∗ are closed multifunctions with ∂ f ∗ = (∂ f )−1. If f is proper then ∂ f and ∂ f ∗ are cyclically monotone.

Proof. The equivalence of (a) and (d) in Proposition 8.1.2 shows that (x,x∗) ∈ ∂ f if and only if (x∗,x) ∈ ∂ f ∗, which is a rephrasing of ∂ f ∗ = (∂ f )−1. Knowing this, we need to prove closedness only for ∂ f ; as this is a graph property it will then hold also for ∂ f ∗.

First suppose that (xk,x∗k) belong to ∂ f and converge to (x,x∗). For each z ∈ Rn and each k we have f (z) ≥ f (xk)+ ⟨x∗k ,z− xk⟩. Letting k approach ∞ and recalling that f is closed, we have

f (z) ≥ [liminfk→∞ f (xk)]+ ⟨x∗,z− x⟩ ≥ f (x)+ ⟨x∗,z− x⟩,

so that (x,x∗) ∈ ∂ f and therefore ∂ f is closed.

Now assume f is proper, choose a positive integer K, and suppose that (xk,x∗k) ∈ ∂ f for k = 0, . . . ,K. By Proposition 8.1.3, each f (xk) must be finite. Write the K +1 subgradient inequalities


f (x1) ≥ f (x0)+ ⟨x∗0,x1 − x0⟩,
f (x2) ≥ f (x1)+ ⟨x∗1,x2 − x1⟩,
. . . ,
f (xK) ≥ f (xK−1)+ ⟨x∗K−1,xK − xK−1⟩,
f (x0) ≥ f (xK)+ ⟨x∗K ,x0 − xK⟩,

and add these. As the function values cancel, we have the inequality defining cyclic monotonicity. The same method of proof works for f ∗. ⊓⊔

As already noted, cyclic monotonicity implies monotonicity, so these subdifferentials are also monotone.
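A quick numeric sanity check of cyclic monotonicity (my own sketch, using the assumed smooth convex example f (x) = x², whose subdifferential is the gradient map x ↦ 2x): every closed cycle gives a nonpositive sum, exactly as in Definition 8.1.5.

```python
# Hedged check that the gradient map F(x) = {2x} of the assumed convex
# function f(x) = x**2 is cyclically monotone: for every cycle
# x_0, ..., x_K, x_{K+1} = x_0 we get sum_k <2 x_k, x_{k+1} - x_k> <= 0.
import random

random.seed(0)
grad = lambda x: 2.0 * x

for _ in range(200):
    K = random.randint(1, 6)
    xs = [random.uniform(-5, 5) for _ in range(K + 1)]
    cycle = xs + [xs[0]]                    # close the cycle: x_{K+1} = x_0
    total = sum(grad(cycle[k]) * (cycle[k + 1] - cycle[k])
                for k in range(K + 1))
    assert total <= 1e-9
```

This is the telescoping argument of the proof in miniature: each term is bounded by f (x_{k+1}) − f (x_k), and around a cycle those differences cancel.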

8.1.1 Exercises for Section 8.1

Exercise 8.1.7. Let A ∈ Rn×n be a nonzero skew matrix (that is, 0 ≠ A = −A∗). Show that the map F taking x ∈ Rn to Ax is a monotone operator but is not the subdifferential of any closed proper convex function on Rn.

8.2 Directional derivatives

It turns out that we can develop much information about ∂ f by studying the directional derivative of f . We define this directional derivative next, and study some of its properties. Then, later in this section, we return to ∂ f and use the properties of the directional derivative to find out more about the subdifferential.

Definition 8.2.1. Let f be a convex function from Rn to R, and let x be a point at which f is finite. For each z ∈ Rn we define the directional derivative f ′(x;z) to be

f ′(x;z) = limτ↓0 τ−1[ f (x+ τz)− f (x)] = infτ>0 τ−1[ f (x+ τz)− f (x)].

Note that the limit exists and is equal to the infimum because, by Lemma 7.2.3, the difference quotient in the definition is nondecreasing in τ . The limit may, however, be +∞ or −∞. A computation shows that the function f ′(x; ·) is positively homogeneous, and that f ′(x;0) = 0.
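The monotonicity of the difference quotient is easy to see numerically. The sketch below (an illustration of mine, with the assumed example f (x) = x² at x = 1) shows the quotient 2z + τz², nondecreasing in τ, whose infimum and limit as τ ↓ 0 is f ′(1; z) = 2z.

```python
# Hedged illustration of Definition 8.2.1 for the assumed example
# f(x) = x**2 at x = 1: the quotient tau**-1 [f(1 + tau*z) - f(1)]
# equals 2*z + tau*z**2, is nondecreasing in tau, and converges to the
# directional derivative f'(1; z) = 2*z as tau decreases to 0.
f = lambda x: x * x

def quotient(x, z, tau):
    return (f(x + tau * z) - f(x)) / tau

x, z = 1.0, 3.0
taus = [2.0 ** (-k) for k in range(20)]      # tau decreasing toward 0
qs = [quotient(x, z, t) for t in taus]

# the quotients shrink as tau does (monotone in tau) ...
assert all(qs[k] >= qs[k + 1] for k in range(len(qs) - 1))
# ... and converge to f'(1; 3) = 6, which is also their infimum
assert abs(qs[-1] - 6.0) < 1e-4
```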

Shapiro [44] provides an excellent review of directional derivatives for general (not necessarily convex) functions.

Proposition 8.2.2. Let f be a convex function from Rn to R, and let x be a point at which f is finite. Then f ′(x; ·) is convex, and for each h ∈ Rn, f ′(x;h) ≥ − f ′(x;−h). If x ∈ ridom f then f ′(x; ·) is closed and proper, and its effective domain is pardom f .


Proof. Choose h1 and h2 in Rn and let α1 and α2 be any real numbers such that f ′(x;hi) < αi for i = 1,2 (if any such exist). The fact that f ′(x;hi) is an infimum shows that for all sufficiently small positive τ , f (x+ τhi) < f (x)+ ταi for i = 1,2. For λ ∈ (0,1) the convexity of f then implies

f (x+ τ[(1−λ )h1 +λh2]) < (1−λ )[ f (x)+ τα1]+λ [ f (x)+ τα2] = f (x)+ τ[(1−λ )α1 +λα2],

so that f ′(x;(1−λ )h1 +λh2) < (1−λ )α1 +λα2. Therefore f ′(x; ·) is convex. A computation using Proposition 6.1.3 shows that the inequality g(h) ≥ −g(−h) holds for any convex function g having g(0) = 0.

Now suppose x ∈ ridom f . If h ∈ pardom f then for sufficiently small τ > 0 we have f (x+ τh) < +∞, so that x+ τh ∈ dom f ; therefore f ′(x;h) < +∞, so pardom f ⊂ dom f ′(x; ·). On the other hand, if h /∈ pardom f then for each positive τ we have x+ τh /∈ dom f . Then f ′(x;h) = +∞, so in fact dom f ′(x; ·) = pardom f . As pardom f is relatively open, f ′(x; ·) must be closed (Theorem 6.2.14), and as it takes the value zero at the origin, we find from Proposition 6.2.10 that it is proper.

⊓⊔

The next theorem develops an important connection between the directional derivative and the subdifferential. To understand what its significance is, remember that we have shown that if f is convex and f (x) is finite then the directional derivative f ′(x; ·) is a positively homogeneous convex function taking the value 0 at the origin. It is easy to show that the closure of such a function has the same properties (for the positive homogeneity, use Corollary 6.2.16). We know from Theorem 7.1.7 that this closure must then be the support function of some nonempty closed convex set. The theorem shows that this set is ∂ f (x).

Theorem 8.2.3. Let f be a convex function on Rn and let x be a point at which f is finite. A point x∗ ∈ Rn belongs to ∂ f (x) if and only if for each h ∈ Rn, f ′(x;h) ≥ ⟨x∗,h⟩. In fact, we have

f ′(x; ·)∗ = I∂ f (x), cl f ′(x; ·) = I∗∂ f (x).

Proof. x∗ belongs to ∂ f (x) if and only if for each z ∈ Rn, f (z) ≥ f (x)+ ⟨x∗,z− x⟩. Writing z = x+ τh for τ > 0 and h ∈ Rn, we can express this equivalently by saying that for each h ∈ Rn and each positive τ , ⟨x∗,h⟩ ≤ τ−1[ f (x+ τh)− f (x)]. This in turn is equivalent to

⟨x∗,h⟩ ≤ infτ>0 τ−1[ f (x+ τh)− f (x)] = f ′(x;h),

which is the first assertion of the theorem.

The assertion that we just proved can be rephrased to say that x∗ ∈ ∂ f (x) if and only if f ′(x; ·)∗(x∗) ≤ 0. However, we have 0 = f ′(x;0) = ⟨x∗,0⟩− f ′(x;0), so f ′(x; ·)∗ is nonnegative. Therefore x∗ ∈ ∂ f (x) if and only if f ′(x; ·)∗(x∗) = 0. As f ′(x; ·) is positively homogeneous and not identically +∞, its conjugate can take


only the values zero and +∞: that is, the conjugate is an indicator. Our work in the first part of the proof then shows that it is the indicator of ∂ f (x). The remaining assertion follows from its predecessor by taking conjugates and using Theorem 7.1.3.

⊓⊔

We can use these results to investigate the connection between local properties of the function and nonemptiness of the subdifferential at a given point.

Corollary 8.2.4. Let f be a convex function on Rn and let x be a point at which f is finite. If ∂ f (x) is nonempty then f is proper. If ∂ f (x) is empty then for each h in cone[(ridom f )− x] one has f ′(x;h) = −∞ and f ′(x;−h) = +∞.

Proof. If f (x) is finite and ∂ f (x) is nonempty then f cannot take −∞ anywhere, and since we know f is not always +∞ we see that f is proper.

Now suppose ∂ f (x) is empty. Then by Theorem 8.2.3, the closure of f ′(x; ·) is the support function of the empty set, namely the function that takes −∞ everywhere. Therefore f ′(x;h) = −∞ for some h, so f ′(x; ·) is improper and therefore it must take −∞ at each h in the relative interior of its effective domain. Further, for each such h, as f ′(x;h) ≥ − f ′(x;−h) we have f ′(x;−h) = +∞. Now the effective domain of f ′(x; ·) is the set of v such that for some positive τ , x+ τv ∈ dom f : that is, cone[(dom f )− x]. Therefore the relative interior of dom f ′(x; ·) is

ricone[(dom f )− x] = coneri[(dom f )− x] = cone[(ridom f )− x],

where we used Theorem 3.1.7 and Corollary 1.2.8. ⊓⊔

8.2.1 Exercises for Section 8.2

Exercise 8.2.5. Suppose that f is an extended-real-valued, closed proper convex function on Rn, and that x0 is a point in the relative interior of dom f . We will call a direction v ≠ 0 a descent direction for f at x0 if there is some µ < 0 such that for all small positive t we have f (x0 + tv) ≤ f (x0)+ µt∥v∥.

Assume that someone has proposed the following idea for finding a descent direction: “If x∗0 is any nonzero element of the subdifferential ∂ f (x0), the direction −x∗0 is a descent direction for f at x0.” This idea extends a known property of (Gateaux or Frechet) derivatives.

a. Show by counterexample that this proposal is wrong (that is, produce a function f and a point x0 for which it doesn’t work).

b. Prove that if x0 is not a minimizer of f and if, instead of any element of ∂ f (x0), one chooses the element x∗0 closest to the origin in the Euclidean norm, then −x∗0 is a descent direction for f at x0.

Exercise 8.2.6. Let g be the Euclidean norm ∥ · ∥ on Rn, defined by

∥x∥ = supx∗∈Bn ⟨x∗,x⟩,


where Bn is the unit ball of Rn. Show that g is a closed proper convex function. Determine its subdifferential ∂g, and prove that your answer is correct.

8.3 Structure of the subdifferential

We now have enough information to describe the subdifferential ∂ f (x) in terms of other quantities associated with f .

Theorem 8.3.1. Let f be a proper convex function on Rn, and let x ∈ dom∂ f . Then:

a. rc∂ f (x) = Ndom f (x).
b. lin∂ f (x) = (par dom f )⊥.
c. ∂ f (x) = (par dom f )⊥+K, where K = ∂ f (x)∩ (par dom f ).
d. K is compact if and only if x ∈ ridom f .
e. ∂ f (x) is compact if and only if x ∈ int dom f .

Proof. Proposition 8.1.2 says that ∂ f (x) is the set of x∗ for which f ∗(x∗)+ f (x)−⟨x∗,x⟩ = 0. Defining g(y) to be f (x+ y)− f (x), we find by computation that g∗(x∗) = f ∗(x∗)+ f (x)−⟨x∗,x⟩. Noting that we always have f ∗(x∗) ≥ ⟨x∗,x⟩− f (x), we can express ∂ f (x) as lev0 g∗. The hypothesis says that f is proper; therefore so is g. It follows that g∗ is closed proper convex. Applying Theorem 7.2.9 to g∗, we find that rc∂ f (x) = rcg∗ and lin∂ f (x) = csg∗. Now Corollary 7.2.13, applied to g, tells us that rcg∗ = (conedomg)◦. The effective domain of g is the set of y for which x+ y ∈ dom f : that is, it is dom f − x. This shows that rcg∗ = (cone[dom f − x])◦. But a set and its closure have the same polar, so rcg∗ is the polar of the set clcone(dom f − x). The latter set is Tdom f (x) by Proposition 3.3.6, so

rc∂ f (x) = rcg∗ = [Tdom f (x)]◦ = Ndom f (x),

and this is assertion (a).

Assertion (b) now follows because

lin∂ f (x) = rc∂ f (x)∩ [− rc∂ f (x)] = Ndom f (x)∩ [−Ndom f (x)] = (par dom f )⊥,

where the last equality comes from part (a) of Theorem 3.3.7. To obtain (c), (d), and (e) we apply Theorem 7.2.11 to g∗ and recall that g∗∗ = clg, so that the conditions in Theorem 7.2.11 amount to requirements that the origin lie in the relative interior or interior of domclg. But by Theorem 6.2.14 this set is the same as the relative interior or interior of domg, to which the origin belongs exactly when x belongs to the relative interior or interior of dom f . ⊓⊔

One might wonder how the subgradients and directional derivatives that we have introduced here relate to ordinary derivatives when the latter exist. For the case of Frechet derivatives there is a simple answer.


Theorem 8.3.2. Let f be a convex function on Rn and let x be a point at which f is finite. Then f is Frechet differentiable at x if and only if ∂ f (x) is a singleton {x∗}. In that case x∗ = d f (x).

Proof. (only if). Suppose that f is F-differentiable at x. Then for each nonzero h ∈ Rn we have for small positive τ ,

f (x+ τh) = f (x)+d f (x)(τh)+o(τ),

which implies that f ′(x;h) = d f (x)h. Therefore f ′(x; ·) is the linear functional d f (x)( ·); this is closed, so Theorem 8.2.3 tells us that it is the support function of ∂ f (x). Taking conjugates, we find that the indicator of ∂ f (x) is the indicator of {d f (x)}, which proves the assertion.

(if). Suppose that ∂ f (x) = {x∗}. Applying Theorem 8.2.3 again, we find that the closure of f ′(x; ·) is ⟨x∗, · ⟩. As the effective domain of this function is all of Rn (hence is relatively open) the closure operation is irrelevant (Theorem 6.2.14), so in fact we must have f ′(x; ·) = ⟨x∗, · ⟩. This is enough for Gateaux differentiability of f , but to establish Frechet differentiability we have to show that the quantity

∥v∥−1[ f (x+ v)− f (x)−⟨x∗,v⟩]

converges to zero as v does (not just for v restricted to be of the form τh for fixed h).

By part (e) of Theorem 8.3.1, x must lie in the interior of dom f . Therefore we can find an n-simplex S containing the origin in its interior and such that x+S ⊂ dom f . Then S ⊃ δB for some positive δ , where B is the unit ball. Let the vertices of S be hi for i = 0, . . . ,n, and for h ∈ S and for fixed τ ∈ (0,1] define

gτ(h) = τ−1[ f (x+ τh)− f (x)]−⟨x∗,h⟩.

This function gτ is convex, nonnegative (because x∗ ∈ ∂ f (x)), and finite. Moreover, as each point h ∈ S is a convex combination of the vertices, the value of gτ(h) is bounded above by the maximum of the values gτ(hi); call this maximum µ(τ). Note that for each i, gτ(hi) converges to zero as τ → 0+ because f ′(x;hi) = ⟨x∗,hi⟩. Therefore µ(τ) also converges to zero as τ → 0+.

Now let 0 ≠ v ∈ δB, and let

r(v) = ∥v∥−1[ f (x+ v)− f (x)−⟨x∗,v⟩].

Write h = δv/∥v∥, τ = δ−1∥v∥,

so that h ∈ S and v = τh. By substitution in the formula for r(v) we find that

r(v) = δ−1gτ(h)≤ δ−1µ(τ).

Further, as ∥v∥ converges to zero so does τ; therefore limv→0 r(v) = 0. It follows that f is in fact F-differentiable at x with d f (x) = x∗. ⊓⊔
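As a small numerical illustration of Theorem 8.3.2 (a sketch in Python; the test function f(x) = x**2 is an assumption of this sketch, not an example from the text): at a point where a convex function is differentiable, the only number passing the subgradient inequality is the derivative.

```python
# Check that for the smooth convex f(x) = x**2 the subdifferential at x
# is the singleton {f'(x)} = {2x}, as Theorem 8.3.2 predicts.

def is_subgradient(f, x, xstar, grid):
    """Test the subgradient inequality f(x') >= f(x) + xstar*(x' - x) on a grid."""
    return all(f(xp) >= f(x) + xstar * (xp - x) - 1e-12 for xp in grid)

f = lambda x: x * x
grid = [i / 10.0 for i in range(-50, 51)]

x = 1.5
assert is_subgradient(f, x, 2 * x, grid)            # 2x = f'(x) is a subgradient
assert not is_subgradient(f, x, 2 * x + 0.5, grid)  # nothing larger works
assert not is_subgradient(f, x, 2 * x - 0.5, grid)  # nothing smaller works
```

The grid test is only a necessary check at finitely many points, but for this quadratic it already separates the derivative from every other candidate.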


We showed in Proposition 8.1.6 that the subdifferential of a closed proper convex function is cyclically monotone. The following theorem shows that the graph of any cyclically monotone multifunction from Rn to Rn is contained in the graph of one of these subdifferentials.

Theorem 8.3.3. Let F be a multifunction from Rn to Rn. Then F is cyclically monotone if and only if there exists a closed proper convex function f on Rn with F ⊂ ∂ f .

Proof. (If). If F ⊂ ∂ f for some closed proper convex function f , then F must be cyclically monotone because ∂ f is.

(Only if). If F = ∅ the assertion is true with any choice of f . Suppose F ≠ ∅, choose any (x0,y0) ∈ F and for each x ∈ Rn define

f (x) = sup{⟨ym,x− xm⟩+ ⟨ym−1,xm − xm−1⟩+ . . .+ ⟨y0,x1 − x0⟩ | (x1,y1), . . . ,(xm,ym) ∈ F, m finite}.

This f is the supremum of a collection of affine functions; therefore it is closed and convex. None of the affine functions can take −∞ anywhere, and as f is the supremum of this collection it never takes −∞. As F is cyclically monotone we always have f (x0) ≤ 0, so f is not identically +∞. Accordingly, f is proper.

Now let (x,y) ∈ F . We must show that for each z ∈ Rn, f (z) ≥ f (x)+ ⟨y,z− x⟩, so that F ⊂ ∂ f . Another way of stating this inequality is to say that for each finite subset {(x1,y1), . . . ,(xm,ym)} of F we have

f (z)≥ ⟨y,z− x⟩+ ⟨ym,x− xm⟩+ . . .+ ⟨y0,x1 − x0⟩,

but this is true by the definition of f , because the chain

(x1,y1), . . . ,(xm,ym),(x,y)

will have been included in the supremum operation. Therefore F ⊂ ∂ f . ⊓⊔
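The construction in this proof can be carried out numerically for a small finite F. The sketch below is an illustration under the assumption that F samples the gradient of the convex function x**2 (so F is cyclically monotone); it builds f as the supremum of chain sums over repetition-free chains, which suffices because cyclic monotonicity lets one delete loops from a chain without decreasing its sum, and then checks the subgradient inequality at each point of F.

```python
# Build f(x) = sup of chain sums for a finite cyclically monotone F and
# verify F ⊂ ∂f on a grid (Theorem 8.3.3, "only if" direction).
from itertools import permutations

F = [(-1.0, -2.0), (0.0, 0.0), (2.0, 4.0)]  # pairs (x_i, y_i) with y_i = 2*x_i
x0, y0 = F[0]

def f(x):
    # supremum over repetition-free chains (x1,y1),...,(xm,ym) drawn from F
    best = y0 * (x - x0)  # the trivial chain consisting of (x0, y0) alone
    for m in range(1, len(F) + 1):
        for chain in permutations(F, m):
            s = y0 * (chain[0][0] - x0)                       # <y0, x1 - x0>
            for i in range(1, m):                             # <y_i, x_{i+1} - x_i>
                s += chain[i - 1][1] * (chain[i][0] - chain[i - 1][0])
            s += chain[-1][1] * (x - chain[-1][0])            # <y_m, x - x_m>
            best = max(best, s)
    return best

grid = [i / 4.0 for i in range(-12, 13)]
for (x, y) in F:
    assert all(f(z) >= f(x) + y * (z - x) - 1e-9 for z in grid)
```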

8.3.1 Exercises for Section 8.3

Exercise 8.3.4. Let f be a proper convex function, and let x be a point of the boundary of dom f at which f is subdifferentiable. Show that there is a nonzero v∗ ∈ Ndom f (x) such that for each x∗ ∈ ∂ f (x) the halfline x∗+ v∗R+ is contained in ∂ f (x). Show also that if x is in the relative boundary of dom f then v∗ can be chosen to lie in par dom f .


8.4 The epsilon-subdifferential

Subgradients are invaluable tools for working with, and determining the properties of, convex functions. However, they may be hard to calculate and, without special attention, they may not provide some of the useful features of derivatives. For example, the function f (x) = |x| on R has a subdifferential that equals {−1} on (−∞,0), [−1,1] at the origin, and {+1} on (0,+∞). Therefore at any nonzero point, even one very close to the origin, the subdifferential consists of a single element, either +1 or −1. It gives no indication that the function’s behavior may change violently in a very small neighborhood of that point, and for many purposes—particularly in computation—such a warning can be helpful.

For that reason it is useful to introduce another object, the ε-subdifferential, that extends the ordinary subdifferential. If we relax the requirements for a subgradient we can obtain ε-subgradients that are sometimes easier to handle and can also be very useful in applications. The ε-subdifferential of a function f at x is then the set of all ε-subgradients of f at x.

8.4.1 Definition and properties

Definition 8.4.1. Suppose f : Rn → R and let ε ≥ 0. An element x∗ of Rn is an ε-subgradient of f at x ∈ Rn if for each x′ ∈ Rn, f (x′) ≥ f (x)+ ⟨x∗,x′− x⟩− ε . The ε-subdifferential ∂ε f (x) of f at x is the set of all ε-subgradients of f at x. ⊓⊔

We have ∂ε f (x) ⊂ ∂η f (x) when ε ≤ η , and when ε is zero the definition of ε-subgradient reduces to that of the ordinary subgradient. If ε > 0 then an ε-subgradient gives information about minimization similar to what a subgradient provides: x ∈ Rn is a global ε-minimizer of f (that is, for each x′ ∈ Rn one has f (x′) ≥ f (x)− ε) if and only if the origin belongs to ∂ε f (x). Here are some properties of ∂ε f analogous to those shown for ∂ f in Theorem 8.3.1.
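For the function f(x) = |x| discussed above, one can compute directly from Definition 8.4.1 that ∂ε f(x) = [max(−1, 1 − ε/x), 1] for x > 0 (a routine calculation, not carried out in the text). The sketch below checks this interval numerically; note how it swells toward [−1, 1] as x approaches the origin, providing exactly the warning about nearby nonsmoothness that the ordinary subdifferential lacks.

```python
# Verify the ε-subgradient interval of f(x) = |x| at a point near the origin.

def in_eps_subdiff(f, x, xstar, eps, grid):
    """Test f(x') >= f(x) + xstar*(x' - x) - eps on a grid of points x'."""
    return all(f(xp) >= f(x) + xstar * (xp - x) - eps - 1e-12 for xp in grid)

f = abs
grid = [i / 100.0 for i in range(-500, 501)]
x, eps = 0.1, 0.15
lo = max(-1.0, 1.0 - eps / x)          # = -0.5 for these values

assert in_eps_subdiff(f, x, lo, eps, grid)          # left endpoint qualifies
assert in_eps_subdiff(f, x, 1.0, eps, grid)         # right endpoint qualifies
assert not in_eps_subdiff(f, x, lo - 0.05, eps, grid)  # below the interval: no
assert not in_eps_subdiff(f, x, 1.05, eps, grid)       # above the interval: no
```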

Theorem 8.4.2. Let f be a proper convex function on Rn, and let x be a point at which f is finite. If ε > f (x)− cl f (x), then

a. ∂ε f (x) is a nonempty closed convex set.
b. rc∂ε f (x) = Ndom f (x).
c. lin∂ε f (x) = (par dom f )⊥.
d. ∂ε f (x) = (par dom f )⊥+K, where K = ∂ε f (x)∩ (par dom f ).
e. K is compact if and only if x ∈ ridom f .
f. ∂ε f (x) is compact if and only if x ∈ intdom f .

Proof. If we define g(y) = f (x+y)− f (x) as we did in the proof of Theorem 8.3.1, then g∗(x∗) = f ∗(x∗)+ f (x)−⟨x∗,x⟩ and ∂ε f (x) is the set of x∗ such that, for each z,

[⟨x∗,z⟩− f (z)]+ f (x)−⟨x∗,x⟩ ≤ ε.


By taking the supremum over z we see that this is the set of x∗ for which g∗(x∗) ≤ ε: that is, levε g∗. It is therefore a closed convex set. By hypothesis we have

f (x) = [ f (x)− cl f (x)]+ f ∗∗(x) < ε + supx∗ {⟨x∗,x⟩− f ∗(x∗)},

so that for some x∗ we have f (x) < ⟨x∗,x⟩− f ∗(x∗)+ ε , and therefore ∂ε f (x) is nonempty, which proves the claim in (a). The proof of (b)–(f) is just the same as the proof for ∂ f (x) in Theorem 8.3.1, as we used there only the fact that we had a nonempty level set of g∗. ⊓⊔

The level-set representation also helps us to identify ∂ε f ∗.

Theorem 8.4.3. Let f : Rn → R be a closed proper convex function, and let ε ≥ 0. Then ∂ε f ∗ = (∂ε f )−1.

Proof. Choose any x∗ and x in Rn. We have to show that (x∗,x) ∈ ∂ε f ∗ if and only if (x,x∗) ∈ ∂ε f .

First suppose that f ∗(x∗) = +∞. Then as f ∗ is proper, ∂ε f ∗(x∗) must be empty, so (x∗,x) /∈ ∂ε f ∗. If x∗ were to belong to ∂ε f (x) then as f is proper f (x) would be finite, and for each x′ ∈ Rn we would have f (x′) ≥ f (x)+ ⟨x∗,x′− x⟩− ε . Rearranging this yields

⟨x∗,x⟩− f (x)+ ε ≥ ⟨x∗,x′⟩− f (x′),

so that the finite quantity on the left must be an upper bound for f ∗(x∗), contradicting f ∗(x∗) = +∞. Therefore (x,x∗) /∈ ∂ε f .

The remaining case is that in which f ∗(x∗) is finite. Applying the function g used in the proof of Theorem 8.4.2 to f ∗ instead of to f , we see that (x∗,x) ∈ ∂ε f ∗ if and only if

f ∗∗(x)+ f ∗(x∗)−⟨x∗,x⟩ ≤ ε.

As f ∗∗ = f under our hypotheses, this is equivalent to g∗(x∗)≤ ε and therefore alsoequivalent to

x∗ ∈ levε g∗ = ∂ε f (x),

which is the same as saying (x,x∗) ∈ ∂ε f . ⊓⊔
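Theorem 8.4.3 can be checked numerically for the self-conjugate function f(x) = x**2/2 (an assumed example, not one from the text): a short calculation shows ∂ε f(x) = [x − (2ε)**0.5, x + (2ε)**0.5], so membership is symmetric in x and x∗, which is exactly the inversion rule in dimension one.

```python
# Grid-based check of ∂_ε f* = (∂_ε f)^{-1} for f(x) = x**2/2, where f* = f.

def in_eps_sub(f, x, xstar, eps, grid):
    """Test f(z) >= f(x) + xstar*(z - x) - eps on a grid of points z."""
    return all(f(z) >= f(x) + xstar * (z - x) - eps - 1e-12 for z in grid)

f = lambda x: 0.5 * x * x          # self-conjugate: f* = f
grid = [i / 10.0 for i in range(-80, 81)]
eps = 0.08
for x, xstar in [(-1.0, -1.3), (0.5, 0.9), (2.0, 1.0), (1.0, 1.2)]:
    # x* ∈ ∂_ε f(x)  if and only if  x ∈ ∂_ε f*(x*)
    assert in_eps_sub(f, x, xstar, eps, grid) == in_eps_sub(f, xstar, x, eps, grid)
```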

As indicated earlier, one of the reasons for using the ε-subdifferential in practice is that it can give information about the behavior of the function at nearby points. The next theorem explains this.

Theorem 8.4.4. Let f : Rn → R be a proper convex function and let x ∈ ridom f . For each positive ε there is a neighborhood U of x in dom f such that

∂ε f (x)⊃ ∪w∈U ∂ f (w). (8.4)

Proof. Choose ε > 0 and write L for par dom f . Theorem 6.1.11 says that there exist a constant λ and a neighborhood N of x in dom f such that whenever w and w′ belong to N, | f (w)− f (w′)| ≤ λ∥w−w′∥. Let BL(x,ρ) be a ball {x+h | h ∈ L, ∥h∥ ≤ ρ} with center x and positive radius ρ ≤ ε/(2λ ), contained in intN. Let U = riBL(x,ρ), a neighborhood of x in dom f , choose w ∈ U , and suppose w∗ ∈ ∂ f (w). We will show that w∗ ∈ ∂ε f (x), and that will establish (8.4).

The first step is to estimate the size of the component of w∗ in L. Write w∗ = u∗+ v∗, with u∗ ∈ L and v∗ ∈ L⊥. Choose δ ∈ (0,ρ −∥w− x∥) and consider the ball DL := {h ∈ L | ∥h∥ ≤ δ} about the origin. If h ∈ DL then w+h ∈ intBL(x,ρ), and then by using the Lipschitz continuity together with the facts that w∗ ∈ ∂ f (w) and ⟨v∗,h⟩ = 0 we find that

λ∥h∥ ≥ f (w+h)− f (w)≥ ⟨w∗,h⟩= ⟨u∗,h⟩,

so that
λ ≥ ⟨u∗,h/∥h∥⟩. (8.5)

Taking the supremum in (8.5) over all nonzero h ∈ DL, we have ∥u∗∥ ≤ λ . Next, use the Lipschitz continuity again to obtain f (x)− f (w) ≤ λ∥x−w∥ ≤ ε/2, which we can rewrite as

f (w) ≥ f (x)− ε/2. (8.6)

As w− x ∈ L we also have

⟨w∗,w− x⟩= ⟨u∗,w− x⟩ ≤ ∥u∗∥∥w− x∥ ≤ λ [ε/(2λ )] = ε/2,

and we can rewrite this as

0 ≥ ⟨w∗,w− x⟩− ε/2. (8.7)

Adding (8.6) and (8.7), we obtain

f (w) ≥ f (x)+ ⟨w∗,w− x⟩− ε.

Combining this with the subgradient inequality f (x′) ≥ f (w)+ ⟨w∗,x′−w⟩, valid for every x′ ∈ Rn because w∗ ∈ ∂ f (w), we obtain f (x′) ≥ f (x)+ ⟨w∗,x′− x⟩− ε for every x′, which shows that w∗ ∈ ∂ε f (x). ⊓⊔

The support function of ∂ε f (x) turns out to be an object analogous to the ordinary directional derivative, which when closed was the support function of ∂ f (x).

Definition 8.4.5. Let f be a convex function on Rn, and let x be a point at which f is finite. Let ε ≥ 0. For any h ∈ Rn the ε-directional derivative of f at x in the direction of h is

f ′ε(x;h) = infτ>0 τ−1[ f (x+ τh)− f (x)+ ε].

⊓⊔

For ε = 0 this reduces to our previous definition. However, when ε > 0 the equivalent limit form used in Definition 8.2.1 no longer applies.

Theorem 8.4.6. Let f : Rn → R be a closed proper convex function that is finite at x ∈ Rn, and let ε > 0. Then f ′ε(x; ·) is the support function of ∂ε f (x).


Proof. Let g(y) = f (x+ y)− f (x); we showed in the proof of Theorem 8.4.2 that ∂ε f (x) = levε g∗. The support function of this set is, by Theorem 7.1.10, cl cone(g+ ε), since the fact that f is closed means that g is also closed. However, as g(0) = 0 we see that g+ ε is positive at the origin, so we can use Theorem 7.2.15 to remove the closure symbol. Then the support function of ∂ε f (x) takes, at any h ∈ Rn, the value

cone(g+ ε)(h) = infτ>0 τ−1[ f (x+ τh)− f (x)+ ε] = f ′ε(x;h),

as required. ⊓⊔

For a positive ε , a convex function f on Rn, and a point x ∈ Rn we always have ∂ f (x) ⊂ ∂ε f (x). However, the second set may be much larger than the first: for example, if g(y) = f (x+ y)− f (x) and if the function g∗ is rather flat, then the sets ∂ε f (x) = levε g∗ can be expected to grow rapidly as ε increases. However, if we are willing to consider distance in the graph space Rn ×Rn then we can bound the distance in terms of ε .

Theorem 8.4.7. Let f : Rn → R be a closed proper convex function and let ε ≥ 0. Suppose that (x,x∗) ∈ ∂ε f . Then for each positive β there is a unique y ∈ Rn such that

∥y∥ ≤ ε1/2, (x+βy,x∗−β−1y) ∈ ∂ f . (8.8)

In particular, if we use the Euclidean norm on Rn ×Rn then we have

e(∂ε f ,∂ f )≤ (2ε)1/2. (8.9)

Proof. Choose ε ≥ 0 and β > 0, and define h(z) = f (x+β z)+(1/2)∥z−βx∗∥2. This function h is closed, proper, and convex. Moreover, it has compact level sets: to see this, use Proposition 6.2.13 to find an affine function minorizing f ; then the sum of this affine function and the quadratic term in h has compact level sets and minorizes h. The level sets of h must then be compact too, and so since h is lower semicontinuous it has a minimizer y by Proposition 6.2.8. The minimizer is in fact unique because of the presence of the quadratic term in h.

As y minimizes h we have 0 ∈ ∂h(y) (use the definition). We are going to show that

0 ∈ β∂ f (x+βy)+ y−βx∗; (8.10)

this would be immediate from some calculus rules for subdifferentials developed in the next chapter, but in this case we can give a direct proof. It is easy to show from the definition that the subdifferential of f (x+βy) as a function of y is β∂ f (x+βy). Now as y minimizes h we have h′(y;z) ≥ 0 for each z. We can calculate directly that this directional derivative is

h′(y;z) = β f ′(x+βy;z)+ ⟨y−βx∗,z⟩.

Therefore the closed proper convex function −⟨y−βx∗, · ⟩ minorizes β f ′(x+βy; ·) and therefore also minorizes clβ f ′(x+βy; ·). This closure is the support function of the set ∂k(y), where k(y) = f (x+βy). We saw above that ∂k(y) = β∂ f (x+βy).


By part (b) of Proposition 7.1.8 we then have

β∂ f (x+βy) ∋ βx∗− y,

which is the claim in (8.10). From (8.10) we see that (x+βy,x∗−β−1y) ∈ ∂ f . Therefore

f (x)≥ f (x+βy)+ ⟨x∗−β−1y,x− (x+βy)⟩.

However, we also have by hypothesis

f (x+βy)≥ f (x)+ ⟨x∗,(x+βy)− x⟩− ε,

and by adding these two inequalities we find that ∥y∥2 ≤ ε , so that ∥y∥ ≤ ε1/2, and this proves (8.8). If we take β = 1 then the second assertion in (8.8) shows that

e(∂ε f ,∂ f )≤ (∥βy∥2 +∥β−1y∥2)1/2 ≤ (2ε)1/2,

which establishes (8.9). ⊓⊔
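A hedged numerical sketch of the bound (8.9) for f(x) = |x| (an assumed example): the ε-subgradients at x > 0 form the interval [max(−1, 1 − ε/x), 1], and each such pair (x, x∗) should lie within (2ε)**0.5 of the graph of ∂f in the Euclidean norm on R × R.

```python
# Check e(∂_ε f, ∂f) <= (2ε)**0.5 for f(x) = |x| by sampling ε-subgradients.

def dist_to_graph_subdiff_abs(x, xstar):
    """Euclidean distance from (x, xstar) to graph(∂|.|) in R x R."""
    cands = [(x * x + max(0.0, abs(xstar) - 1.0) ** 2) ** 0.5]  # {0} x [-1, 1]
    if x > 0:
        cands.append(abs(xstar - 1.0))                  # branch {(t, 1) : t > 0}
    else:
        cands.append((x * x + (xstar - 1.0) ** 2) ** 0.5)
    if x < 0:
        cands.append(abs(xstar + 1.0))                  # branch {(t, -1) : t < 0}
    else:
        cands.append((x * x + (xstar + 1.0) ** 2) ** 0.5)
    return min(cands)

eps = 0.2
bound = (2 * eps) ** 0.5
for x in [0.05, 0.1, 0.5, 1.0]:
    lo = max(-1.0, 1.0 - eps / x)          # ∂_ε f(x) = [lo, 1] for x > 0
    for k in range(11):
        xstar = lo + (1.0 - lo) * k / 10.0
        assert dist_to_graph_subdiff_abs(x, xstar) <= bound + 1e-9
```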


Chapter 9
Functional operations

Up to now we have developed numerous properties of individual convex functions, but with few exceptions we have not done anything about combining different functions. This chapter shows how to use functional calculus to create new convex functions from existing ones.

9.1 Convex functions and linear transformations

Definition 9.1.1. Let g and h be convex functions from Rn to R and let G and H be linear transformations, G from Rm to Rn and H from Rn to Rm. The functions gG and Hh are defined from Rm to R by

(gG)(y) = g(Gy), (Hh)(y) = inf{h(x) | Hx = y}.

A standard calculation using Proposition 6.1.3 shows that gG and Hh are convex functions. If g does not take −∞ then neither does gG, so gG is closed if g is closed and proper. However, gG need not be proper: for example, it might happen that the effective domain of g did not meet the image of G, in which case gG would be identically +∞. The function Hh, however, need be neither closed nor proper, even when h is both closed and proper. For example, if h is the indicator of the set of x in R2 having x1 > 0 and x1x2 ≥ 1, and if H(x1,x2) = x1, then Hh is the indicator of (0,+∞), which is not closed. Again, if h is defined on R2 by h(x) = x2 and if H(x1,x2) = x1, then Hh is identically −∞, which is improper.
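The first counterexample above can be made concrete in a few lines (a sketch that simply evaluates the infimum defining Hh for the indicator of {x : x1 > 0, x1 x2 ≥ 1} under the projection H(x1, x2) = x1):

```python
# (Hh)(y) = inf{h(x) : x1 = y}. A feasible x = (y, x2) with y > 0 and
# y*x2 >= 1 exists exactly when y > 0, so Hh is the indicator of (0, +inf).
import math

def Hh(y):
    return 0.0 if y > 0 else math.inf   # inf over an empty set is +inf

assert Hh(1.0) == 0.0
assert Hh(1e-9) == 0.0        # arbitrarily close to 0 the value is 0 ...
assert Hh(0.0) == math.inf    # ... but at 0 it jumps to +inf: Hh is not closed
```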

Even with no additional assumptions, we can establish a simple relationship between these two operations.

Proposition 9.1.2. Let f be a convex function on Rn and let A be a linear transformation from Rn to Rm. Then (A f )∗ = f ∗A∗.

Proof. Let y∗ ∈ Rm. Then



(A f )∗(y∗) = supy {⟨y∗,y⟩− (A f )(y)}
= supy {⟨y∗,y⟩− infx { f (x) | Ax = y}}
= supx,y {⟨y∗,y⟩− f (x) | Ax = y}
= supx {⟨y∗,Ax⟩− f (x)}
= supx {⟨A∗y∗,x⟩− f (x)}
= ( f ∗A∗)(y∗),

so that (A f )∗ = f ∗A∗. ⊓⊔

Proposition 9.1.2 explains the reason for using the notation Hh for the infimum operation shown in Definition 9.1.1. It also shows in particular that f A is closed whenever f is closed, because we can then write f A as (A∗ f ∗)∗. Still, it does not do anything to improve the possible bad behavior that we pointed out earlier. For that we have to assume a regularity condition.

To motivate this condition, remember that f A is identically +∞ (hence improper) whenever the image of A fails to meet dom f . Therefore, in order to show good behavior we certainly have to require that these sets meet. A slight strengthening of this requirement yields a regularity condition that in fact ensures much more. We first use it to establish a closure formula, then to prove a much more comprehensive result.
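Proposition 9.1.2 can be checked numerically. The sketch below assumes f(x) = ||x||**2/2 (so f* = f) and a concrete matrix A; both choices are assumptions of this sketch, not examples from the text. It evaluates (Af)* through the collapsed supremum sup_x {<y*, Ax> − f(x)} used in the proof and compares it with (f*A*)(y*) = ||A^T y*||**2/2.

```python
# Numerical check of (Af)* = f*A* for a quadratic f and a small matrix A.
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, -1.0]])          # A : R^3 -> R^2

ystar = np.array([0.7, -1.2])
lhs_exact = 0.5 * np.linalg.norm(A.T @ ystar) ** 2   # (f* A*)(y*)

# (Af)*(y*) = sup_x { <y*, Ax> - ||x||^2/2 }; the maximizer is x = A^T y*.
xhat = A.T @ ystar
lhs_sup = ystar @ (A @ xhat) - 0.5 * np.linalg.norm(xhat) ** 2
assert abs(lhs_exact - lhs_sup) < 1e-10

# random sample points never beat the maximizer
for _ in range(200):
    x = rng.standard_normal(3) * 3
    assert ystar @ (A @ x) - 0.5 * x @ x <= lhs_sup + 1e-9
```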

Theorem 9.1.3. Let f be a proper convex function on Rn and let A be a linear transformation from Rm to Rn. If imA meets ridom f , then f A is a proper convex function, and one has cl( f A) = (cl f )A and rec cl( f A) = (rec cl f )A.

Proof. We have already seen that f A is convex. It never takes −∞ because f is proper, and the hypothesis ensures that it is not always +∞, so f A is proper. Define a linear transformation D from Rm+1 to Rn+1 by D(y,η) = (Ay,η). A point (y,η) belongs to epi f A exactly when the point (Ay,η) = D(y,η) belongs to epi f . Therefore epi f A = D−1(epi f ).

The effective domain of f A consists of those y for which Ay ∈ dom f : that is, A−1(dom f ). Under our hypothesis we have ridom f A = A−1(ridom f ) by Proposition 1.2.11, and this set is nonempty. Let y′ be any point in ridom f A, so that Ay′ ∈ ridom f , and let y be any point of Rm. By Corollary 6.2.16 we have

cl( f A)(y) = limλ↓0 ( f A)[(1−λ )y+λy′]
= limλ↓0 f [(1−λ )Ay+λAy′]
= (cl f )(Ay)
= [(cl f )A](y),

which proves the assertion about the closure. Applying our earlier analysis to cl f instead of to f we find that epi[(cl f )A] = D−1(epicl f ), so by Corollary 4.1.4 we have

epi rec[(cl f )A] = rcepi[(cl f )A]
= rc[D−1(epicl f )]
= D−1(rcepicl f )
= D−1 epi[rec(cl f )]
= epi[(rec(cl f ))A].

Accordingly, rec[(cl f )A] = (reccl f )A. ⊓⊔

Theorem 9.1.4. Let f be a proper convex function on Rn, and let A be a linear transformation from Rm to Rn. Assume that imA meets ridom f . Then:

a. f A and A∗ f ∗ are proper convex functions and ( f A)∗ = A∗ f ∗; in particular, A∗ f ∗ is closed;
b. recA∗ f ∗ = A∗(rec f ∗), and this function is closed proper convex;
c. For each y∗ ∈ Rm the infimum in the definition of A∗ f ∗(y∗) is attained, but may be +∞;
d. For each y ∈ Rm,

∂ ( f A)(y) = A∗∂ f (Ay); (9.1)

e. For each y∗ in Rm,

∂ (A∗ f ∗)(y∗) = {y | for some (x∗,x) ∈ ∂ f ∗, A∗x∗ = y∗ and Ay = x}. (9.2)

Proof. Theorem 9.1.3 says that f A is a proper convex function whose closure is (cl f )A, which is therefore also a proper function. We have already seen that A∗ f ∗ is convex, and Proposition 9.1.2 says that its conjugate is f ∗∗A∗∗ = (cl f )A. As the conjugate is proper, A∗ f ∗ is proper too. Suppose now that we could prove that A∗ f ∗ is closed. Then we would have shown that

( f A)∗ = ( f A)∗∗∗ = [( f A)∗∗]∗ = [cl( f A)]∗ = [(cl f )A]∗ = [(A∗ f ∗)∗]∗ = cl(A∗ f ∗) = A∗ f ∗,

where the last equality would follow from the proof that A∗ f ∗ is closed. This would be enough to prove (a).

We know A∗ f ∗ is proper, so to show it is closed we need only show that epiA∗ f ∗ is closed. Define a linear transformation E∗ from Rn+1 to Rm+1 by E∗(x∗,ξ ) = (A∗x∗,ξ ).

If (y∗,ξ ∗) ∈ E∗(epi f ∗) then there is some (x∗,ξ ∗) ∈ epi f ∗ such that y∗ = A∗x∗. Then as ξ ∗ ≥ f ∗(x∗) we have ξ ∗ ≥ (A∗ f ∗)(y∗) and so (y∗,ξ ∗) ∈ epiA∗ f ∗.

Also, if (z∗,ζ ∗) ∈ epiA∗ f ∗ then for each ε > 0 there is some v∗ with A∗v∗ = z∗ and f ∗(v∗) ≤ ζ ∗+ ε . Then (v∗,ζ ∗+ ε) ∈ epi f ∗ so (z∗,ζ ∗)+(0,ε) ∈ E∗(epi f ∗). As ε was arbitrary we have

E∗(epi f ∗)⊂ epiA∗ f ∗ ⊂ clE∗(epi f ∗). (9.3)


We will prove that E∗(epi f ∗) is closed, which with (9.3) will show that

epi(A∗ f ∗) = E∗(epi f ∗). (9.4)

We can use Corollary 4.1.17 for that proof if we can show that epi f ∗ satisfies the recession condition for E∗. The first requirement of that condition is that epi f ∗ be closed, which we already have. The second is that (kerE∗)∩ (rcepi f ∗) be a subspace. To show this, suppose z∗ ∈ (kerE∗)∩ (rcepi f ∗). Then z∗ = (k∗,0) with k∗ ∈ kerA∗. Also

(k∗,0) ∈ rcepi f ∗ = epi rec f ∗,

so that (rec f ∗)(k∗) ≤ 0. Therefore k∗ ∈ rc f ∗, so that k∗ ∈ (kerA∗)∩ (rc f ∗). The hypothesis says that (imA)∩ (ridom f ) is nonempty, which by Corollary 7.2.14 is equivalent to saying that (kerA∗)∩ (rc f ∗) is a subspace. The latter statement implies that −k∗ ∈ (kerA∗)∩ (rc f ∗), which in particular lets us conclude that rec f ∗(−k∗) ≤ 0, or equivalently

−z∗ = (−k∗,0) ∈ epi rec f ∗ = rcepi f ∗.

We certainly have −z∗ ∈ kerE∗, so −z∗ belongs to (kerE∗)∩ (rcepi f ∗), which must therefore be a subspace. Therefore epi f ∗ satisfies the recession condition for E∗, so Corollary 4.1.17 shows that E∗(epi f ∗) is closed, and this proves (9.4) as well as showing that epiA∗ f ∗ is closed. The latter tells us that A∗ f ∗ is lower semicontinuous; however, we already know it is proper, so it is in fact closed. This proves (a).

For (b), we begin by invoking Theorem 7.2.10 to show that, as f ∗ is proper,

rec f ∗ = I∗dom f = I∗cldom f ,

and therefore (rec f ∗)∗ = Icldom f . Then by Proposition 9.1.2, [A∗(rec f ∗)]∗ = (Icldom f )A. The hypothesis says that imA meets the set

ridom f = ri(cldom f ) = ridom(Icldom f ).

Theorem 9.1.3 then shows that (Icldom f )A is proper, and by Proposition 7.1.2 so then is A∗(rec f ∗).

Applying the method of proof used for part (a) above to rec f ∗ instead of to f ∗, we obtain

E∗(epi rec f ∗)⊂ epiA∗(rec f ∗)⊂ clE∗(epi rec f ∗). (9.5)

We also established there the hypotheses of Corollary 4.1.18, which says that rcE∗(epi f ∗) is closed and that

E∗(rcepi f ∗) = rcE∗(epi f ∗). (9.6)

The left side of (9.6) is E∗(epi rec f ∗), which is then closed; therefore from (9.5) we obtain

epiA∗(rec f ∗) = E∗(epi rec f ∗). (9.7)


This shows that A∗(rec f ∗) is lower semicontinuous, but as it is proper it is also closed. Now by using successively (9.7), (9.6), and (9.4), we obtain

epiA∗(rec f ∗) = E∗(epi rec f ∗)
= E∗(rcepi f ∗)
= rcE∗(epi f ∗)
= rcepiA∗ f ∗
= epi rec(A∗ f ∗). (9.8)

As we already know that A∗(rec f ∗) is a closed proper convex function, this completes the proof of (b).

For (c), we find from (9.4) that a point of the form (y∗,(A∗ f ∗)(y∗)) in epiA∗ f ∗ must be the image under E∗ of some point (x∗,ξ ∗) ∈ epi f ∗. The finite value ξ ∗ cannot be greater than f ∗(x∗) because of the definition of A∗ f ∗, and it is at least f ∗(x∗) because (x∗,ξ ∗) ∈ epi f ∗; so it must be equal to f ∗(x∗), and the infimum is attained at x∗. The other possibility is that (A∗ f ∗)(y∗) = +∞, in which case there are no points (x∗,ξ ∗) ∈ epi f ∗ with A∗x∗ = y∗. Then the set over which the infimum is taken is empty, and the infimum is +∞ by convention.

To prove (d) let y ∈ Rm and x = Ay, and suppose that (x,x∗) ∈ ∂ f . Let y∗ = A∗x∗ and suppose that z is any point of Rm. Then

f A(z) = f (Az) ≥ f (Ay)+ ⟨x∗,Az−Ay⟩ = ( f A)(y)+ ⟨y∗,z− y⟩,

so that y∗ ∈ ∂ ( f A)(y). It follows that A∗∂ f (Ay) ⊂ ∂ ( f A)(y). We did not need the regularity condition for this conclusion.

When the regularity condition does hold, we know from the part of the proof already completed that the conjugate of f A is A∗ f ∗. Suppose that y∗ ∈ ∂ ( f A)(y); as y belongs to dom∂ ( f A)( ·) ⊂ dom( f A), ( f A)(y) must be finite. We then have (A∗ f ∗)(y∗) = ⟨y∗,y⟩− f (Ay), so (A∗ f ∗)(y∗) is finite too. From (c) we have the existence of some x∗ ∈ Rn with A∗x∗ = y∗ and f ∗(x∗) = A∗ f ∗(y∗). If we write x = Ay, then f ∗(x∗) = ⟨x∗,x⟩− f (x), and we see that (x,x∗) ∈ ∂ f . Therefore y∗ ∈ A∗∂ f (Ay), so ∂ ( f A)(y) ⊂ A∗∂ f (Ay). This proves (d).

For (e) we observe that the regularity condition in the hypothesis is satisfied for f if and only if it is satisfied for cl f . Apply (9.1) to the function cl f to obtain

(y,y∗) ∈ ∂ [(cl f )A] if and only if (x,x∗) ∈ ∂ (cl f ), A∗x∗ = y∗, x = Ay. (9.9)

Now use Proposition 8.1.6 to rewrite (9.9) as

(y∗,y) ∈ ∂ (A∗ f ∗) if and only if (x∗,x) ∈ ∂ f ∗, A∗x∗ = y∗, x = Ay. (9.10)

The content of (9.10) is that ∂ (A∗ f ∗)(y∗) consists of those y for which there is some (x∗,x) ∈ ∂ f ∗ such that A∗x∗ = y∗ and x = Ay. That is (9.2), so it completes the proof of (e). ⊓⊔


Theorem 9.1.4 has many applications, but one of the most useful is to sums of convex functions. We demonstrate this application after defining an additional functional operation that it requires.

Definition 9.1.5. Let f1, . . . , fk be convex functions from Rn to (−∞,+∞]. The infimal convolution of these functions is the function defined by

(⊎ki=1 fi)(x) = inf{ f1(x1)+ . . .+ fk(xk) | x1 + . . .+ xk = x}.

A computation using Proposition 6.1.3 shows that this new function is convex. However, it might not be well behaved: for example, the infimum might not be attained. Moreover, even the function ∑ki=1 fi can behave badly: it will be identically +∞ if there is no point common to all of the sets dom fi. However, under a simple regularity condition these functions are well behaved and, in fact, closely related to each other.

Theorem 9.1.6. Let f1, . . . , fk be proper convex functions on Rn. Assume that

∩ki=1 ridom fi ≠ ∅. (9.11)

Then

a. ∑ki=1 fi is a proper convex function whose closure is ∑ki=1 cl fi and whose conjugate is ⊎ki=1 f ∗i ;
b. ⊎ki=1 f ∗i is closed, proper, and convex, and the infimum in its definition is always attained;
c. The recession function of ∑ki=1 cl fi is ∑ki=1 reccl fi;
d. ∂ (∑ki=1 fi) = ∑ki=1 ∂ fi;
e. For each x∗ ∈ Rn, ∂ (⊎ki=1 f ∗i )(x∗) is the set of x ∈ Rn such that there exist x∗1, . . . ,x∗k with (x∗i ,x) ∈ ∂ f ∗i for each i and with ∑ki=1 x∗i = x∗.

Proof. Define a function F on Rnk by F(x1, . . . ,xk) = ∑ki=1 fi(xi). Then F is convex (Proposition 6.1.3) and proper, and (clF)(x1, . . . ,xk) = ∑ki=1 cl fi(xi) (Corollary 6.2.16). Define a linear transformation A from Rn to Rnk by Ax = (x, . . . ,x). Then ∑ki=1 fi = FA, and the condition (9.11) is the statement that imA meets ridomF . By Theorem 9.1.3 we then have cl(FA) = (clF)A: that is, cl∑ki=1 fi = ∑ki=1 cl fi. The assertion about the recession function also follows from Theorem 9.1.3. Routine computation shows that

F∗(x∗1, . . . ,x∗k) = ∑ki=1 f ∗i (x∗i ), A∗(x∗1, . . . ,x∗k) = ∑ki=1 x∗i , A∗F∗ = ⊎ki=1 f ∗i ,

and that

∂F(x1, . . . ,xk) = ×ki=1 ∂ fi(xi).

Applying Theorem 9.1.4 we find that the conjugate of FA is ⊎ki=1 f ∗i , that the latter function is closed proper convex and the infimum in its definition is always attained, and that the subdifferential formulas asserted in the statement of the theorem are valid. ⊓⊔
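Part (d) of the theorem can be illustrated numerically. The sketch below assumes f1(x) = |x| and f2(x) = |x − 1| (both finite everywhere, so (9.11) holds); at x = 0 the sum rule gives ∂f1(0) + ∂f2(0) = [−1, 1] + {−1} = [−2, 0], and a grid test confirms that this interval is exactly ∂(f1 + f2)(0).

```python
# Check the subdifferential sum rule for f1(x) = |x|, f2(x) = |x - 1| at x = 0.

def is_subgradient(f, x, xstar, grid):
    """Test the subgradient inequality on a grid of points z."""
    return all(f(z) >= f(x) + xstar * (z - x) - 1e-12 for z in grid)

f = lambda x: abs(x) + abs(x - 1)
grid = [i / 50.0 for i in range(-200, 201)]

for xstar in [-2.0, -1.5, -1.0, -0.5, 0.0]:   # inside [-2, 0]
    assert is_subgradient(f, 0.0, xstar, grid)
for xstar in [-2.2, 0.2]:                     # outside [-2, 0]
    assert not is_subgradient(f, 0.0, xstar, grid)
```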

In the proof of Theorem 9.1.4 we were able to obtain the formula ∂ ( f A)(y) ⊃ A∗∂ f (Ay) without any regularity assumptions. Therefore in the situation of Theorem 9.1.6 we always have

∂ (∑ki=1 fi)(x) ⊃ ∑ki=1 ∂ fi(x),

but we need the intersection condition (9.11) to conclude that equality holds.

With the results of Theorems 9.1.4 and 9.1.6 one can calculate the subdifferentials of many composite functions of practical interest for optimization. A typical result is the following characterization of optimality in the problem of minimizing a convex function on a convex set.

Proposition 9.1.7. Let f be a proper convex function on Rn and let C be a convex subset of Rn containing a point x0. For x0 to minimize f on C it suffices that

0 ∈ ∂ f (x0)+NC(x0), (9.12)

and if riC meets ridom f then (9.12) is also necessary.

Proof. If (9.12) holds then there is some x∗ ∈ ∂ f (x0) with −x∗ ∈ NC(x0). Suppose x ∈ C. Then

f (x) ≥ f (x0)+ ⟨x∗,x− x0⟩.

However, ⟨x∗,x− x0⟩ ≥ 0 because −x∗ ∈ NC(x0). Therefore f (x) ≥ f (x0), so x0 minimizes f on C.

Now assume that ridom f meets riC. The latter set is the relative interior of the effective domain of IC, and so by Theorem 9.1.6 we have for each x

∂ ( f + IC)(x) = ∂ f (x)+∂ IC(x) = ∂ f (x)+NC(x). (9.13)

Saying that x0 minimizes f on C is the same as saying that it minimizes f + IC, and then by the definition of subdifferential we have 0 ∈ ∂ ( f + IC)(x0). Referring to (9.13), we see that we can rewrite this in the form (9.12), as claimed. ⊓⊔

To obtain a sufficient optimality condition we needed no regularity condition, while the added hypothesis was required for a necessary condition. This situation is typical for local minimization. Here, however, because of convexity it held for global minimization.

9.2 Moreau envelopes and applications

Given a convex function f and a positive parameter λ , one can create a new convex function eλ , called the Moreau envelope of f . The envelope function minorizes f and is always continuously differentiable. Moreover, the construction of eλ provides additional useful information about the subdifferential ∂ f .

Definition 9.2.1. Let f : Rn → R be a closed proper convex function, and let λ > 0. The Moreau envelope eλ is the function from Rn to R defined by

eλ (x) = infy { f (y)+(2λ )−1∥y− x∥2}.

⊓⊔

Theorem 9.2.2. Let f : Rn → R be a closed proper convex function, and let λ > 0. Then

a. eλ is a closed proper convex function, and for each x ∈ Rn the infimum in its definition is attained at a unique point y = pλ (x) characterized by

x = y+λy∗, (y,y∗) ∈ ∂ f . (9.14)

b. For each x ∈ dom f one has eλ (x) ≤ f (x), and if x ∈ dom∂ f then also

f (x) ≤ eλ (x)+(1/2)λd[0,∂ f (x)]2. (9.15)

c. For fixed λ > 0 the functions pλ (x) and

dλ (x) := λ−1[x− pλ (x)]

are Lipschitz continuous on Rn with moduli 1 and λ−1 respectively.
d. eλ is Fréchet differentiable on Rn with derivative dλ .

Proof. Fix x ∈ Rn. As f is proper convex, it is finite on its effective domain, which is nonempty. Let x0 ∈ ridom f , so that f is subdifferentiable at x0, and let x∗0 ∈ ∂ f (x0). Then

f (y)+(2λ )−1∥y− x∥2 ≥ f (x0)+ ⟨x∗0,y− x0⟩+(2λ )−1∥y− x∥2 =: m(y).

The quadratic function m is closed proper convex, and for any v ∈ Rn and τ > 0 we have

τ−1[m(x+ τv)−m(x)] = ⟨x∗0,v⟩+ τ(2λ )−1∥v∥2;

if v ≠ 0 this difference quotient converges to +∞ as τ does. Therefore rcm = {0}, and by Theorem 7.2.9 the level sets of m are compact.

If we write g(y) = f (y)+(2λ )−1∥y− x∥2 then g is closed proper convex as the sum of f and (2λ )−1∥( ·)− x∥2, by Theorem 9.1.6. As m ≤ g, for each real ϕ we have levϕ g ⊂ levϕ m. The sets levϕ g are closed because g is closed, so they are compact. Applying Proposition 6.2.8 to any nonempty level set we conclude that g attains its minimum on that level set, and this is the global minimum of g.

However, g inherits the strong convexity of (2λ )−1∥y− x∥2, and this implies that any minimizer y of g is unique. For this y we must have

0 ∈ ∂g(y) = ∂ f (y)+λ−1(y− x),

where the equality comes from Theorem 9.1.6. Therefore there is some y∗ with (y,y∗) ∈ ∂ f and with x = y+λy∗. If we suppose that (z,z∗) ∈ ∂ f with x = z+λ z∗, then

0 = (y− z)+λ (y∗− z∗),

and as ∂ f is monotone, we have

0 = ∥y− z∥2 +λ ⟨y∗− z∗,y− z⟩ ≥ ∥y− z∥2,

so that z = y and therefore z∗ = y∗. Accordingly, for each x ∈ Rn (9.14) determines a unique pair (y,y∗) ∈ ∂ f .

If we observe that the conjugate of q(v) := (2λ )−1∥v∥2 is q∗(v∗) = (1/2)λ∥v∗∥2, whose effective domain is all of Rn, that f is closed, and that f ∗+q∗ is proper, then we can apply Theorem 9.1.6 to obtain

( f ∗+q∗)∗(x) = infy { f (y)+(2λ )−1∥y− x∥2} = eλ (x), (9.16)

which shows that eλ is a closed proper convex function. This proves (a).

For (b), we have eλ (x) ≤ g(x) = f (x) for each x ∈ Rn. Suppose x ∈ dom∂ f , and use (9.16) to write

f (x)− eλ (x) = f (x)− ( f ∗+q∗)∗(x).

It follows that for any x∗ ∈ Rn we have

f (x)− eλ (x)≤ f (x)− [⟨x∗,x⟩− f ∗(x∗)]+q∗(x∗).

We can make the difference f (x)− [⟨x∗,x⟩− f ∗(x∗)] as small as possible by choosing an x∗ in ∂ f (x), for which this difference will be zero. As ∂ f (x) is closed, we may further restrict x∗ by choosing it to be the smallest element of ∂ f (x), so that ∥x∗∥ = d[0,∂ f (x)]. Then

f (x)− eλ (x)≤ q∗(x∗) = (1/2)λd[0,∂ f (x)]2,

which establishes (b).

To prove (c), fix λ > 0 and consider two points x1 and x2. For i = 1,2 the points yi = pλ (xi) satisfy (yi,y∗i ) ∈ ∂ f and xi = yi +λy∗i . Therefore

∥x1 − x2∥2 = ∥y1 − y2∥2 +2λ ⟨y∗1 − y∗2,y1 − y2⟩+λ 2∥y∗1 − y∗2∥2 ≥ ∥y1 − y2∥2 +λ 2∥y∗1 − y∗2∥2. (9.17)

As we may discard either of the terms on the right side of (9.17), this shows in particular that pλ is Lipschitzian on Rn with modulus 1, and that if we define

dλ (x) = λ−1[x− pλ (x)],


then dλ is Lipschitzian on Rn with modulus λ−1, because dλ (x) is just y∗ in the equation x = y+λy∗.

For (d), fix x ∈ Rn and λ > 0 and define for h ∈ Rn

a(h) = eλ (x+h)− eλ (x)−⟨dλ (x),h⟩.

We will show that ∥h∥−1|a(h)| approaches zero as h does, which will establish that dλ (x) is the F-derivative of eλ at x.

Let y = pλ (x); then we have

eλ (x) = f (y)+(2λ )−1∥y− x∥2, eλ (x+h)≤ f (y)+(2λ )−1∥y− (x+h)∥2.

Subtracting, we obtain

eλ (x+h)− eλ (x) ≤ (2λ )−1[∥y− (x+h)∥2 −∥y− x∥2] = ⟨dλ (x),h⟩+(2λ )−1∥h∥2,

so that a(h) ≤ (2λ )−1∥h∥2. However, a(h) is a convex function of h that takes the value 0 at the origin. Accordingly,

−a(h)≤ a(−h)≤ (2λ )−1∥h∥2,

so that |a(h)| ≤ (2λ )−1∥h∥2. Therefore

∥h∥−1|a(h)| ≤ (2λ )−1∥h∥,

which approaches zero as ∥h∥ does. ⊓⊔

The conclusions of (c) and (d) together show that eλ actually has an F-derivative that is everywhere Lipschitz continuous with a uniform modulus.
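For f(x) = |x| the Moreau envelope and the proximal point pλ can be written in closed form: pλ is soft-thresholding and eλ is the Huber function (a standard computation, assumed here rather than carried out in the text). The sketch below checks the closed forms against the defining infimum on a grid and checks part (d) of Theorem 9.2.2, namely that the derivative of eλ is dλ(x) = (x − pλ(x))/λ.

```python
# Moreau envelope of f(x) = |x|: Huber function, with prox = soft-thresholding.

lam = 0.5
f = abs

def envelope_num(x, grid):
    """Defining infimum of the envelope, approximated on a grid."""
    return min(f(y) + (y - x) ** 2 / (2 * lam) for y in grid)

def prox(x):
    """p_lam(x): soft-thresholding."""
    return max(abs(x) - lam, 0.0) * (1 if x > 0 else -1)

def envelope(x):
    """Closed-form envelope (Huber)."""
    return x * x / (2 * lam) if abs(x) <= lam else abs(x) - lam / 2

grid = [i / 1000.0 for i in range(-4000, 4001)]
for x in [-2.0, -0.3, 0.0, 0.25, 1.5]:
    assert abs(envelope(x) - envelope_num(x, grid)) < 1e-5
    # part (d): d_lam(x) = (x - p_lam(x)) / lam should be the derivative
    h = 1e-6
    deriv = (envelope(x + h) - envelope(x - h)) / (2 * h)
    assert abs(deriv - (x - prox(x)) / lam) < 1e-4
```

The envelope is differentiable everywhere, including at the origin where |x| is not, which is the smoothing property the section describes.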

We can use part (a) of Theorem 9.2.2 to establish some very significant facts about subdifferentials of closed proper convex functions. We defined monotone operators at the beginning of Section 8.1 to be multifunctions F from Rn to Rn such that for each pair of elements (x,y) and (x′,y′) in the graph of F one has ⟨x− x′,y− y′⟩ ≥ 0. We also defined cyclically monotone operators in Definition 8.1.5, and showed in Proposition 8.1.6 that subdifferentials of closed proper convex functions are cyclically monotone. However, any monotone operator retains its monotonicity if we arbitrarily delete points of its graph. Thus, one might wonder whether one could augment the graphs of subdifferential operators to obtain larger operators—in the sense of graph inclusion—that were still monotone or cyclically monotone. We now show that this cannot be done.

Definition 9.2.3. Let F be a monotone operator from Rn to Rn. F is maximal monotone if its graph is not properly contained in the graph of any monotone operator. It is maximal cyclically monotone if its graph is not properly contained in the graph of any cyclically monotone operator. ⊓⊔

Thus, the maximal monotone operators are those whose graphs cannot be enlarged without their losing the property of monotonicity, and a similar statement


applies to maximal cyclically monotone operators and the property of cyclic monotonicity.

Part (a) of Theorem 9.2.2 established the Moreau decomposition: given a closed proper convex function f on Rn and a positive λ, we can express each point x of Rn uniquely in the form given by (9.14), namely

x = y + λy∗,  (y,y∗) ∈ ∂ f.

This is a far-reaching generalization of the decompositions that we have already seen, which express a point of Rn uniquely as a sum m + m∗, with m belonging to a subspace M and m∗ to M⊥, or as k + k∗, where k belongs to a closed convex cone K and k∗ to its polar cone K◦, with ⟨k,k∗⟩ = 0. Each of these is a special case of another decomposition, which uses a closed convex set C in Rn and expresses any point x of Rn uniquely as the projection c of x on C plus an element c∗ of NC(c). As NC = ∂IC, we can now see that this is the special case of the Moreau decomposition that uses f = IC.
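As a small numerical illustration of the cone case (our own sketch, with f = IK, λ = 1, and K the nonnegative orthant of Rn, whose polar cone is the nonpositive orthant), the decomposition x = k + k∗ with ⟨k,k∗⟩ = 0 is just the componentwise split into positive and negative parts:

```python
def decompose(x):
    # Moreau decomposition with f = I_K, lambda = 1, K = R^n_+ (polar cone: R^n_-):
    # x = k + k*, where k is the projection of x on K and k* lies in the polar cone.
    k = [max(t, 0.0) for t in x]
    k_star = [min(t, 0.0) for t in x]
    return k, k_star

x = [1.5, -2.0, 0.0, 3.25]
k, k_star = decompose(x)
assert all(a + b == c for a, b, c in zip(k, k_star, x))   # x = k + k*
assert sum(a * b for a, b in zip(k, k_star)) == 0.0       # <k, k*> = 0
assert all(a >= 0 for a in k) and all(b <= 0 for b in k_star)
```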

One of the consequences of the Moreau decomposition is the following theorem on maximal monotonicity.

Theorem 9.2.4. If f is a closed proper convex function on Rn, then ∂ f is a maximal monotone operator.

Proof. Suppose ∂ f ⊂ T , where T is a monotone operator. Let (t, t∗) be any point of T and write z = t + t∗. Apply the Moreau decomposition to z with λ = 1 to find (y,y∗) ∈ ∂ f with y + y∗ = z. Then

0 = (t + t∗)− (y+ y∗) = (t − y)+(t∗− y∗), (9.18)

and so

0 = ∥t − y∥2 + ⟨t∗ − y∗, t − y⟩.  (9.19)

As both (t, t∗) and (y,y∗) belong to T , the second term on the right in (9.19) is nonnegative, which implies t = y. Putting that result in (9.18) we obtain t∗ = y∗, so that (t, t∗) = (y,y∗) and therefore (t, t∗) ∈ ∂ f . As (t, t∗) was an arbitrary point of T , we have T = ∂ f so that ∂ f is maximal monotone. ⊓⊔
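A one-dimensional illustration of maximality (an informal sketch of ours, not a proof): sample the graph of ∂|·| on R, which consists of (x, sgn x) for x ≠ 0 together with (0,s) for s ∈ [−1,1], and observe that adjoining a point outside that graph, such as (0, 1.5), destroys monotonicity.

```python
def sgn_graph(n=200):
    # Sampled points of the graph of the subdifferential of |.| on R:
    # (x, sgn x) for x != 0, together with (0, s) for s in [-1, 1].
    pts = [(x / 10.0, 1.0 if x > 0 else -1.0) for x in range(-n, n + 1) if x != 0]
    pts += [(0.0, s / 10.0) for s in range(-10, 11)]
    return pts

def is_monotone_with(pts, extra):
    # Does the point `extra` preserve monotonicity against every sampled point?
    u, v = extra
    return all((x - u) * (g - v) >= 0 for x, g in pts)

pts = sgn_graph()
assert all(is_monotone_with(pts, p) for p in pts)   # the sampled graph is monotone
assert not is_monotone_with(pts, (0.0, 1.5))        # enlarging it breaks monotonicity
```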

Corollary 9.2.5. If f is a closed proper convex function on Rn, then ∂ f is a maximal cyclically monotone operator.

Proof. The cyclically monotone operators are a subclass of the monotone operators. As ∂ f is maximal monotone by Theorem 9.2.4 and is cyclically monotone, it is also maximal within the more restricted class of cyclically monotone operators. ⊓⊔

Here is another consequence of the Moreau decomposition. We use the term Lipschitz homeomorphism to denote a homeomorphism h of sets contained in normed linear spaces, such that both h and h−1 are Lipschitz continuous. In the following theorem we need to put a norm on the graph space Rn × Rn; we can use ∥(a,a∗)∥ = max{∥a∥,∥a∗∥}.


Theorem 9.2.6. If f is a closed proper convex function on Rn, then the graph of ∂ f is Lipschitz homeomorphic to Rn.

Proof. Write G for the graph of ∂ f and fix any λ > 0. The map ρ(g,g∗) = g + λg∗ takes G into Rn, and the map σ(z) = [pλ(z),dλ(z)] takes Rn into G because dλ(z) ∈ ∂ f (pλ(z)). In fact each of these maps is onto, because for any z ∈ Rn there is a unique (g,g∗) ∈ G with g + λg∗ = z so that z = ρ(g,g∗), and for each (g,g∗) ∈ G if we define z = g + λg∗ then g = pλ(z) and g∗ = dλ(z) so that σ(z) = (g,g∗). The map ρ is evidently Lipschitz continuous, and part (c) of Theorem 9.2.2 shows that σ is Lipschitz continuous. ⊓⊔

9.3 Systems of convex inequalities

This section proves two basic results about finite or infinite systems of convex inequalities. We will use them in the next chapter to prove the von Neumann min-max theorem, but they have other uses as well as being of interest in themselves.

The first theorem deals with a finite set of inequalities, and one can regard it as an extension of the Gordan theorem of the alternative (Proposition 3.1.11). We write Λm for the set of points (λ1, . . . ,λm) ∈ Rm+ whose components sum to 1.

Theorem 9.3.1. Let f1, . . . , fI be proper convex functions on Rn, and let C be a convex subset of Rn such that

riC ⊂ ∩_{i=1}^{I} dom fi.

Then sup_{1≤i≤I} fi(x) ≥ 0 for each x ∈ C if and only if there exists p∗ ∈ ΛI such that ∑_{i=1}^{I} p∗i fi(x) ≥ 0 for each x ∈ clC.

The reason for regarding this as an extension of the Gordan theorem is that the supremum of the fi will be nonnegative on C if and only if there is no x ∈ C such that fi(x) < 0 for i = 1, . . . , I. This is an extension of the matrix inequality Ax < 0 in the Gordan theorem, if one regards the rows Ai of A as generating convex functions fi(x) = ⟨Ai, x⟩.
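To make the connection concrete, the following sketch (ours; the data are a hypothetical toy example) takes A with rows (1) and (−1), so f1(x) = x and f2(x) = −x on C = R. There is no x with Ax < 0, and p∗ = (1/2, 1/2) ∈ Λ2 certifies this by making ∑ p∗i fi identically zero:

```python
f = [lambda x: x[0], lambda x: -x[0]]     # f_i(x) = <A_i, x> for rows A_1 = (1), A_2 = (-1)
samples = [(-3.0,), (-0.5,), (0.0,), (2.0,)]

# sup_i f_i(x) >= 0 at every sample: there is no x with Ax < 0...
assert all(max(fi(x) for fi in f) >= 0 for x in samples)

# ...and the certificate p* = (1/2, 1/2) in Lambda_2 makes sum_i p*_i f_i
# nonnegative (here identically zero) on C.
p = (0.5, 0.5)
assert all(abs(sum(pi * fi(x) for pi, fi in zip(p, f))) < 1e-12 for x in samples)
```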

Proof. (If.) The supremum of the fi majorizes ∑_{i=1}^{I} p∗i fi whenever p∗ ∈ ΛI.

(Only if.) If C is empty then any p∗ ∈ ΛI will work, so assume that C is nonempty.

Define

D = {w ∈ RI | for some x ∈ C and each i = 1, . . . , I, fi(x) < wi}.

The hypothesis says that D is disjoint from RI−. Moreover, D is convex: take two points in it, choose λ ∈ (0,1), and apply Proposition 6.1.3 to each fi. Therefore we can separate D and RI− properly by a hyperplane with normal p∗. Lemma 3.1.6 says that we can choose p∗ ∈ RI+ (the polar of RI−), and we can suppose that the hyperplane passes through the origin. Moreover, we can then scale p∗ so that its components sum to 1, so that p∗ ∈ ΛI. Then for each w ∈ D, ⟨p∗,w⟩ ≥ 0.


Let X = C ∩ [∩_{i=1}^{I} dom fi]. This X is a convex set on which every fi is finite. Moreover, by hypothesis riC ⊂ dom fi for each i, so we have

riC ⊂ X ⊂ C.  (9.20)

Therefore C and X have the same affine hull, and riX is a nonempty subset of riC. If x ∈ X then choose any positive ε and set wi = fi(x) + ε for i = 1, . . . , I; then w ∈ D so that ⟨p∗,w⟩ ≥ 0 and therefore ∑_{i=1}^{I} p∗i fi(x) ≥ −ε. As ε was arbitrary we have ∑_{i=1}^{I} p∗i fi(x) ≥ 0, so that the function f = ∑_{i=1}^{I} p∗i fi is nonnegative on X. As f is proper convex, if x is any point of clX then f(x) ≥ (cl f)(x), and (cl f)(x) is the limit of values of f along a line segment from a point in riX to x. These values are all nonnegative, so f(x) is nonnegative. Accordingly, f is nonnegative on clX. But by taking closures in (9.20) and using clC = cl riC we obtain clX = clC, so f is nonnegative on clC. ⊓⊔

The second theorem deals with a possibly infinite set of inequalities, but strengthens the requirement on the functions and on the set C.

Theorem 9.3.2. Let { fα | α ∈ A} be a collection of closed proper convex functions on Rn, and let C be a compact convex subset of Rn such that C ⊂ dom fα for each α ∈ A. Then supα∈A fα(x) > 0 for each x ∈ C if and only if there exist a positive integer K, functions f1, . . . , fK from the collection, p∗ ∈ ΛK, and ε > 0 such that ∑_{k=1}^{K} p∗k fk(x) ≥ ε for each x ∈ C.

Proof. (If.) When p∗ ∈ ΛK, supα∈A fα(x) majorizes any finite sum of the form ∑_{k=1}^{K} p∗k fk(x).

(Only if.) As before, if C is empty then any K and p∗ ∈ ΛK will work; therefore assume that C is nonempty. For each x ∈ C we are given that the supremum of the fα(x) is positive, so at least one of these values, say fαx(x), is greater than a positive number εx. Further, as fαx is lower semicontinuous there is a neighborhood Nx of x on which fαx remains greater than εx.

The neighborhoods {Nx | x ∈ C} form an open cover of the compact set C, so there is a finite subcover that we will enumerate as N1, . . . ,NK, associated with functions f1, . . . , fK and positive numbers ε1, . . . ,εK. Let ε be the (positive) minimum of ε1, . . . ,εK; then for each x ∈ C there is a k′ such that x ∈ Nk′, and therefore

sup_{1≤k≤K} ( fk(x) − ε) ≥ fk′(x) − ε > 0.

Therefore the supremum of the functions { fk − ε | k = 1, . . . ,K} is positive on C. Apply Theorem 9.3.1 to this finite collection of functions to produce p∗ ∈ ΛK such that the function ∑_{k=1}^{K} p∗k [ fk(x) − ε] is nonnegative for each x ∈ C. Then because the p∗k sum to 1, for each such x

∑_{k=1}^{K} p∗k fk(x) = ∑_{k=1}^{K} p∗k [ fk(x) − ε] + ε ≥ ε.

Accordingly, ∑_{k=1}^{K} p∗k fk is never less than ε on C. ⊓⊔


Chapter 10
Duality

A particularly useful application area for the techniques of convexity has been duality, in which one associates with an optimization problem, called the primal problem, a dual problem, also involving optimization but typically in a different space, and then exploits information that each of the two problems provides about the other. Duality actually is a very simple concept that does not require any convexity at all. However, the best known and probably most effective applications have been in areas where some convexity appears, because the methods of convexity facilitate the functional operations required, and they also provide symmetry in the relationship between the primal and dual problems.

The initial section of this chapter introduces the underlying conceptual model and the fundamental concept of a saddle point, and it establishes some basic definitions and notation. These require no convexity. The next section proves a min-max theorem asserting that under certain hypotheses saddle points will exist.

We then show how, given a convex optimization problem, one may embed that problem in a larger problem to which the conceptual model applies. This embedding is the essence of duality. We illustrate it with a duality analysis of a general linear programming problem, which shows the procedure and the key objects involved with minimal technical complications. We then employ the same embedding method to treat a more general convex optimization problem, and show how elements of each of the primal and dual problems provide insight into the behavior of the other problem.

Finally we develop criteria for the existence of optimal solutions. These involve subgradients, and therefore the information that we developed in Chapters 8 and 9 about the behavior of subgradients and their connection with directional differentiation gives additional information about the behavior of these solutions.


10.1 Conceptual model

To introduce the concept of duality, suppose that S and T are two sets and that ϕ is an extended real-valued function from S × T to R. One can think of function values ϕ(x,y) as the outcomes of actions by two players, Xavier and Yvette, who control the variables x ∈ S and y ∈ T respectively. Xavier wishes to make ϕ small, whereas Yvette wishes to make it large. Thus, one way to interpret ϕ would be as a money payment from Xavier to Yvette that can take on either positive or negative values, with negative values indicating a transfer of money from Yvette to Xavier. The fundamental model that we are using here is thus a two-person zero-sum game [11, 21, 27, 45], but with a general payoff function rather than the bilinear function resulting from the well known finite (or matrix) games.

Define two subsets of S and T respectively by

dom1 ϕ = {x ∈ S | for each y ∈ T , ϕ(x,y) < +∞},
dom2 ϕ = {y ∈ T | for each x ∈ S, ϕ(x,y) > −∞},  (10.1)

and define domϕ = dom1 ϕ × dom2 ϕ. Thus, if (x,y) ∈ domϕ then ϕ(x,y) is finite, though it may also be finite at points outside domϕ. We say that ϕ is proper if domϕ ≠ ∅.

Now use ϕ to define two functions ξ : S → R and η : T → R by

ξ(x) = sup_{y∈T} ϕ(x,y),   η(y) = inf_{x∈S} ϕ(x,y).  (10.2)

In terms of the game model, ξ provides for each strategy x available to Xavier an upper bound on what he will have to pay if he uses x, given that Yvette observes his choice and then does the best she can against that strategy, and η provides for each y available to Yvette a lower bound on what she can get if she employs y.

If x ∉ dom1 ϕ then there is a y′ ∈ T such that ϕ(x,y′) = +∞, and therefore ξ(x) = +∞. For that reason, Xavier would never want to choose x outside dom1 ϕ. Similarly, if y ∉ dom2 ϕ then η(y) = −∞ and Yvette would never want to choose y outside dom2 ϕ. We then have

inf_{x∈S} ξ(x) = inf_{x∈dom1 ϕ} ξ(x),   sup_{y∈T} η(y) = sup_{y∈dom2 ϕ} η(y).

The definitions of ξ and η show that for each x0 ∈ S and each y0 ∈ T ,

ξ(x0) = sup_{y∈T} ϕ(x0,y) ≥ ϕ(x0,y0) ≥ inf_{x∈S} ϕ(x,y0) = η(y0).  (10.3)

Thus ξ(x) ≥ η(y) whenever (x,y) ∈ S × T . This is the weak duality inequality. By taking the infimum over x ∈ S and the supremum over y ∈ T in (10.3) we find that

inf_{x∈S} sup_{y∈T} ϕ(x,y) ≥ sup_{y∈T} inf_{x∈S} ϕ(x,y).  (10.4)


The two sides of (10.4) may very well be unequal: for example, the problem in which S and T are each {0,1} and

ϕ(x,y) = 1 if x = y,  0 if x ≠ y,  (10.5)

has ξ(x) ≡ 1 and η(y) ≡ 0, so that inf_x sup_y ϕ(x,y) = 1 and sup_y inf_x ϕ(x,y) = 0.
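A brute-force evaluation of this two-by-two example (an illustration of ours) confirms the strict gap in (10.4), and also that no pair satisfies the saddle-point inequalities defined below:

```python
S = T = [0, 1]
phi = lambda x, y: 1 if x == y else 0      # the payoff (10.5)

xi = {x: max(phi(x, y) for y in T) for x in S}    # xi(x) = sup_y phi(x, y)
eta = {y: min(phi(x, y) for x in S) for y in T}   # eta(y) = inf_x phi(x, y)

assert all(v == 1 for v in xi.values()) and all(v == 0 for v in eta.values())
assert min(xi.values()) == 1     # inf_x sup_y phi = 1
assert max(eta.values()) == 0    # sup_y inf_x phi = 0

# No pair (x0, y0) satisfies phi(x, y0) >= phi(x0, y0) >= phi(x0, y) for all x, y:
assert not any(all(phi(x, y0) >= phi(x0, y0) >= phi(x0, y) for x in S for y in T)
               for x0 in S for y0 in T)
```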

Definition 10.1.1. A point (x0,y0) ∈ domϕ is a saddle point of ϕ if for each (x,y) ∈ domϕ one has

ϕ(x,y0) ≥ ϕ(x0,y0) ≥ ϕ(x0,y).  (10.6)

Inspection of (10.6) shows that a saddle point satisfies

η(y0) = min_{x∈dom1 ϕ} ϕ(x,y0) = ϕ(x0,y0) = max_{y∈dom2 ϕ} ϕ(x0,y) = ξ(x0).  (10.7)

Thus, by choosing x0 Xavier can guarantee that the payoff will be no larger than ϕ(x0,y0), though it may be smaller, and by choosing y0 Yvette can guarantee that the payoff will be no smaller than ϕ(x0,y0), though it may be larger. In defining a saddle point we restrict the variables to domϕ because, as observed above, players would never want to choose points outside domϕ.

A saddle point need not exist: e.g., the problem described in (10.5) has none. Moreover, if a saddle point exists it need not be unique, as one can see by amending (10.5) to make ϕ(x,y) ≡ 1, so that every pair (x,y) is a saddle point.

We saw in (10.3) that we always had ξ(x) ≥ ϕ(x,y) ≥ η(y), and the example in (10.5) showed that we might never have ξ(x) = η(y). The following theorem shows that a pair (x,y) achieves this equality precisely when it is a saddle point. It also develops some properties of the set of saddle points.

Theorem 10.1.2. Let S and T be sets, with ϕ : S × T → R, and define ξ : S → R and η : T → R by (10.2). If ϕ is proper, then the following are equivalent:

a. (x0,y0) ∈ domϕ and ξ(x0) = η(y0).
b. (x0,y0) is a saddle point of ϕ.

If these two conditions hold, then

ξ (x0) = ϕ(x0,y0) = η(y0). (10.8)

Further, there are subsets ΦS of S and ΦT of T such that the set Φ of saddle points has the form Φ = ΦS × ΦT, and every saddle point has the same value of ϕ.

Proof. (a) implies (b). If (a) holds then (x0,y0) ∈ domϕ. Moreover, by applying the weak duality inequality (10.3) to (x0,y0) and using (a), we find that for each (x,y) ∈ domϕ,

ϕ(x,y0)≥ η(y0) = ξ (x0)≥ ϕ(x0,y0)≥ η(y0) = ξ (x0)≥ ϕ(x0,y),


which shows that (x0,y0) is a saddle point and also shows that (a) implies (10.8).

(b) implies (a). If (b) holds then (x0,y0) ∈ domϕ, and (10.7) implies that η(y0) = ϕ(x0,y0) = ξ(x0).

The final claims are true if there are no saddle points, so we can suppose that Φ is nonempty. Let

ΦS = {x ∈ S | for some y′ ∈ T , (x,y′) ∈ Φ},
ΦT = {y ∈ T | for some x′ ∈ S, (x′,y) ∈ Φ}.

We have ΦS ⊂ dom1 ϕ, ΦT ⊂ dom2 ϕ, and Φ ⊂ ΦS × ΦT.

Now let (x,y) ∈ ΦS × ΦT; we will show that (x,y) ∈ Φ. By construction, there are points x′ ∈ dom1 ϕ and y′ ∈ dom2 ϕ such that (x,y′) and (x′,y) belong to Φ. Then

ϕ(x′,y)≥ ϕ(x′,y′)≥ ϕ(x,y′)≥ ϕ(x,y)≥ ϕ(x′,y), (10.9)

so that all of these function values are the same. Recalling that (x′,y) is a saddle point and using (10.7) we find that

ξ (x′) = ϕ(x′,y) = η(y), (10.10)

and using the same reasoning on (x,y′) yields

ξ (x) = ϕ(x,y′) = η(y′). (10.11)

Using the equality of the values in (10.9) we can rearrange (10.10) and (10.11) to obtain

ξ (x) = ϕ(x,y) = η(y), ξ (x′) = ϕ(x′,y′) = η(y′).

Now from the equivalence of (a) and (b) we conclude that both (x,y) and (x′,y′) are saddle points, so that in particular (x,y) ∈ Φ and therefore Φ = ΦS × ΦT. To see that any two saddle points have the same value of ϕ, let (u,v) and (u′,v′) be saddle points. Applying the argument that we just finished to these pairs instead of to (x,y′) and (x′,y) shows that both (u,v′) and (u′,v) are saddle points. These four pairs will then satisfy an inequality of the form (10.9), which shows in particular that ϕ(u,v) = ϕ(u′,v′). ⊓⊔
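The product structure of the saddle set can be seen by brute force on a small finite game (our own payoff matrix, chosen so that several saddle points occur):

```python
# Xavier (rows) minimizes and Yvette (columns) maximizes the payoff phi[x][y].
phi = [[0, 1, 1, 0],
       [0, 1, 1, 0],
       [-1, 2, 3, -2]]
S, T = range(3), range(4)

def is_saddle(x0, y0):
    # (10.6): phi(x, y0) >= phi(x0, y0) >= phi(x0, y) for all x, y
    return all(phi[x][y0] >= phi[x0][y0] for x in S) and \
           all(phi[x0][y0] >= phi[x0][y] for y in T)

saddles = {(x, y) for x in S for y in T if is_saddle(x, y)}
Phi_S = {x for x, _ in saddles}
Phi_T = {y for _, y in saddles}

assert saddles == {(x, y) for x in Phi_S for y in Phi_T}   # Phi = Phi_S x Phi_T
assert len({phi[x][y] for x, y in saddles}) == 1           # a single saddle value
```

Here the saddle set works out to {0,1} × {1,2}, all with payoff 1.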

The unique value that ϕ takes at each saddle point is called the saddle value of ϕ. The saddle value therefore represents the payoff resulting from a certain type of equilibrium in which neither player can do any better by unilaterally departing from the equilibrium choices represented by the saddle point. It is a very special case of the Nash equilibrium [11, Section 1.2]. However, although ϕ takes the saddle value at every saddle point, a point of S × T at which ϕ takes the saddle value may or may not be a saddle point.

One might have reservations about this model on the grounds that it begins with a function ϕ that takes extended real values, which might seem artificial. Could we start with a model involving a real-valued function ϕr and constraints x ∈ X ⊂ S and y ∈ Y ⊂ T , then recover ϕ in a natural way by using extended real values to remove


the constraints on x and y? We can do so, but in the process we will discover an indeterminacy about ϕ that might not have been evident from the initial discussion above.

Let X and Y be nonempty subsets of S and T respectively, and let ϕr : X × Y → R. Define ξr : X → R and ηr : Y → R by

ξr(x) = sup_{y∈Y} ϕr(x,y),   ηr(y) = inf_{x∈X} ϕr(x,y).  (10.12)

We can interpret these quantities in terms of a game just as we did in the extended-real-valued case.

Now define ϕ : S×T → R as follows:

ϕ(x,y) =
  ϕr(x,y)  if (x,y) ∈ X × Y ,
  +∞  if x ∉ X and y ∈ Y ,
  −∞  if x ∈ X and y ∉ Y ,
  arbitrary values  if x ∉ X and y ∉ Y .  (10.13)

If we construct from this ϕ the various elements of the extended-real-valued model above, we recover exactly the real-valued model, as we now see.

Proposition 10.1.3. Let X and Y be nonempty subsets of S and T respectively, and let ϕr : X × Y → R. Define ϕ : S × T → R by (10.13). Then ϕ is proper, with

dom1 ϕ = X , dom2 ϕ = Y, (10.14)

and with the functions ξ and η defined in (10.2) given by

ξ(x) =
  ξr(x) = sup_{y∈Y} ϕr(x,y)  if x ∈ X,
  +∞  if x ∉ X,  (10.15)

and

η(y) =
  ηr(y) = inf_{x∈X} ϕr(x,y)  if y ∈ Y ,
  −∞  if y ∉ Y .  (10.16)

Proof. From (10.13) we see that if x ∈ X then the value of ϕ(x,y) will be ϕr(x,y) if y ∈ Y and −∞ if y ∉ Y . Therefore ϕ(x,y) < +∞ for every y, so x ∈ dom1 ϕ. On the other hand, if x ∉ X then if we choose y ∈ Y , which is nonempty by hypothesis, we have ϕ(x,y) = +∞ so that x ∉ dom1 ϕ. Therefore dom1 ϕ = X. A similar argument shows that dom2 ϕ = Y , which establishes (10.14).

If x ∈ X , then as ϕ(x,y) =−∞ for y /∈ Y we have

ξ(x) = sup_{y∈T} ϕ(x,y) = sup_{y∈Y} ϕ(x,y) = sup_{y∈Y} ϕr(x,y) = ξr(x),


because ϕ agrees with ϕr on X × Y . If x ∉ X, then as Y is nonempty there is a point y for which ϕ(x,y) = +∞, so that ξ(x) = +∞. This establishes (10.15). A parallel argument for η establishes (10.16). ⊓⊔
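The following sketch (ours) builds the lower simple extension of a small real-valued ϕr and checks the conclusions (10.14) and (10.15) numerically, using float("inf") for ±∞:

```python
INF = float("inf")
S, T = {0, 1, 2}, {0, 1, 2}
X, Y = {0, 1}, {0, 1}
phi_r = lambda x, y: 1 if x == y else 0

def phi(x, y):
    # the lower simple extension of phi_r, following (10.13)
    if x in X and y in Y:
        return phi_r(x, y)
    if x not in X and y in Y:
        return INF
    return -INF   # y not in Y; the "arbitrary" values are also set to -inf

xi = {x: max(phi(x, y) for y in T) for x in S}
eta = {y: min(phi(x, y) for x in S) for y in T}

assert {x for x in S if xi[x] < INF} == X       # dom1 phi = X, as in (10.14)
assert {y for y in T if eta[y] > -INF} == Y     # dom2 phi = Y
assert all(xi[x] == max(phi_r(x, y) for y in Y) for x in X)   # xi = xi_r on X, as in (10.15)
```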

The indeterminacy mentioned just before (10.12) is in the assignment in (10.13) of arbitrary values to ϕ(x,y) when x ∉ X and y ∉ Y . Evidently these values do not matter for any of the work we have done up to this point. They do matter, however, for operations to regularize saddle functions by imposing semicontinuity requirements. However, for the purpose of this section we are free to use any values we want to, and two especially natural choices are to set ϕ(x,y) = −∞ for x ∉ X and y ∉ Y , or to use +∞ instead of −∞ in assigning these values. These two choices produce, respectively, the lower simple extension and the upper simple extension of ϕr.

The mention of semicontinuity might call to mind how little we have actually assumed about ϕ, X, Y , S, and T up to this point. We did not impose any structure on the spaces S and T , nor on X and Y except that they were nonempty subsets of S and T , and we assumed nothing about ϕ except that it was a function. Although even without such assumptions we were able to establish some relationships among various concepts and to give a game interpretation, it would be nice to have some idea of when saddle points can be expected to exist. In the next section we give an existence proof for saddle points in a setting with additional structure.

10.2 Existence of saddle points

Theorems asserting the existence of saddle points are usually called min-max theorems. One such theorem, proved by John von Neumann (1928), involves two compact convex sets X ⊂ Rn and Y ⊂ Rm and a function ϕ : X × Y → R such that for each x ∈ X, ϕ(x, ·) is concave and upper semicontinuous, and for each y ∈ Y , ϕ( · ,y) is convex and lower semicontinuous. It shows that any such ϕ must have a saddle point in X × Y .

More recently there have been proofs of various min-max theorems that require weaker assumptions than does the von Neumann theorem. We will prove one, due to Sion (1958), that has been very useful. Versions of this theorem are available for general spaces, but we will work with nonempty subsets X of Rn and Y of Rm. We will use a function ϕ : X × Y → R that is lower semicontinuous in x for each y ∈ Y and upper semicontinuous in y for each x ∈ X.

The semicontinuity assumptions on ϕ allow us to say a little more about the expressions inf_{x∈X} sup_{y∈Y} ϕ(x,y) and sup_{y∈Y} inf_{x∈X} ϕ(x,y). The function ξ : X → R given by ξ(x) := sup_{y∈Y} ϕ(x,y) is the supremum of a collection of functions {ϕ( · ,y) | y ∈ Y }, each of which is lower semicontinuous by hypothesis. By Proposition 6.2.5, ξ is then lower semicontinuous on X. A similar argument shows that η(y) := inf_{x∈X} ϕ(x,y) is upper semicontinuous on Y .

If either or both of the sets X and Y is compact, we can do more. First, if X is compact then for each y ∈ Y we have by Proposition 6.2.8 inf_{x∈X} ϕ(x,y) = min_{x∈X} ϕ(x,y)


and therefore

sup_{y∈Y} inf_{x∈X} ϕ(x,y) = sup_{y∈Y} min_{x∈X} ϕ(x,y).

Also, Proposition 6.2.8 then shows that for some x0 ∈ X,

inf_{x∈X} sup_{y∈Y} ϕ(x,y) = inf_{x∈X} ξ(x) = ξ(x0) = min_{x∈X} ξ(x) = min_{x∈X} sup_{y∈Y} ϕ(x,y).

Similarly, if Y is compact then

inf_{x∈X} sup_{y∈Y} ϕ(x,y) = inf_{x∈X} max_{y∈Y} ϕ(x,y),

and for some y0 ∈ Y ,

sup_{y∈Y} inf_{x∈X} ϕ(x,y) = sup_{y∈Y} η(y) = η(y0) = max_{y∈Y} η(y) = max_{y∈Y} inf_{x∈X} ϕ(x,y).

If both X and Y are compact then we can replace inf by min and sup by max everywhere, and write

min_{x∈X} max_{y∈Y} ϕ(x,y),   max_{y∈Y} min_{x∈X} ϕ(x,y).

The Sion theorem uses functions that are quasiconvex or quasiconcave. Here is a definition.

Definition 10.2.1. Let C be a convex subset of Rn. A function q : C → R is quasiconvex on C if for each real µ the set {x ∈ C | q(x) ≤ µ} is convex. q is quasiconcave on C if −q is quasiconvex on C.

Thus quasiconvexity amounts to convexity of the lower level sets of a function. It is a much weaker assumption than convexity. An equivalent property is that for each x and x′ in C and each λ ∈ [0,1], one has q[(1−λ)x + λx′] ≤ max{q(x),q(x′)}. For more on quasiconvexity and related topics, see e.g. [13, Section 2.4] or [22, Chapter 9].
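For instance (our own example), q(x) = √|x| is quasiconvex on R, being unimodal, but not convex; the sketch below tests the max characterization on a grid and exhibits a failure of the convexity inequality:

```python
import math

q = lambda x: math.sqrt(abs(x))   # unimodal, hence quasiconvex on R, but not convex

pts = [k / 10.0 for k in range(-30, 31)]
lams = [j / 10.0 for j in range(11)]

# Max characterization of quasiconvexity, checked on a grid:
assert all(q((1 - l) * x + l * z) <= max(q(x), q(z)) + 1e-12
           for x in pts for z in pts for l in lams)

# Convexity fails: at x = 0, z = 1, lambda = 1/2 the graph lies above the chord.
assert q(0.5 * 0 + 0.5 * 1) > 0.5 * q(0) + 0.5 * q(1)
```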

We will establish the theorem by first proving a technical lemma from which the theorem follows immediately. To simplify the statements, we list first the following hypotheses that will apply to both results.

Let X and Y be nonempty convex subsets of Rn and Rm respectively, with X compact. Let ϕ : X × Y → R be a function that is lower semicontinuous and quasiconvex in x for each y ∈ Y , and upper semicontinuous and quasiconcave in y for each x ∈ X.  (10.17)

Here is the lemma.

Lemma 10.2.2. Assume (10.17) and suppose in addition that {y1, . . . ,yk} is a finite subset of Y and that


0 < min_{x∈X} max_{1≤i≤k} ϕ(x,yi).  (10.18)

Then there exists y0 ∈ Y such that

0 < min_{x∈X} ϕ(x,y0).  (10.19)

Proof. For k = 1 there is nothing to prove. We will show that the lemma is true in the case k = 2, which requires most of the work, then establish it for general k by induction.

For the case k = 2, if the lemma is not true then

For each y ∈ Y ,  min_{x∈X} ϕ(x,y) ≤ 0.  (10.20)

Use (10.18) to find a real µ with

0 < µ < min_{x∈X} max_{i=1,2} ϕ(x,yi).  (10.21)

Write [y1,y2] for the closed line segment between y1 and y2, and for each y ∈ [y1,y2] define

C0(y) := {x ∈ X | ϕ(x,y) ≤ 0},   Cµ(y) := {x ∈ X | ϕ(x,y) ≤ µ}.  (10.22)

These two sets are nonempty (by (10.20)), closed, and convex. Write C1 for Cµ(y1) and C2 for Cµ(y2). We have C1 ∩ C2 = ∅ because otherwise for some x′ ∈ X we would have max{ϕ(x′,y1),ϕ(x′,y2)} ≤ µ, contradicting (10.21).

For each x ∈ X and each y ∈ [y1,y2] we have

ϕ(x,y) ≥ min{ϕ(x,y1),ϕ(x,y2)}  (10.23)

by quasiconcavity of ϕ(x, ·). Fix y′ ∈ [y1,y2]. If Cµ(y′) ⊄ C1 ∪ C2 then there is an x′ ∈ X with ϕ(x′,yi) > µ for i = 1,2 but ϕ(x′,y′) ≤ µ, contradicting (10.23). Therefore Cµ(y′) ⊂ C1 ∪ C2.

Now define Di = Cµ(y′) ∩ Ci for i = 1,2. These two sets are closed and convex; we have D1 ∩ D2 = ∅ and D1 ∪ D2 = Cµ(y′). If both sets were nonempty it would contradict the fact that by Proposition C.1.2 Cµ(y′) is connected. Therefore one is empty, so we have

C0(y′) ⊂ Cµ(y′) ⊂ C1  or  C0(y′) ⊂ Cµ(y′) ⊂ C2.  (10.24)

Now define Y1 to be the set of those y′ ∈ [y1,y2] for which the first alternative in (10.24) holds, and Y2 to be the set of those for which the second holds. These are nonempty, because yi ∈ Yi for i = 1,2, and we have

Y1 ∩ Y2 = ∅,   Y1 ∪ Y2 = [y1,y2].  (10.25)


We will show next that each Yi is closed in [y1,y2]. The proof is similar for each, so we prove only that Y1 is closed. Choose a sequence {wm} ⊂ Y1 that converges to some w0 belonging to the closed set [y1,y2]. To show that w0 ∈ Y1 we need to show that Cµ(w0) ⊂ C1. Choose any x0 ∈ C0(w0). We have ϕ(x0,w0) ≤ 0 < µ, so by upper semicontinuity of ϕ(x0, ·) there is a neighborhood N of w0 such that for each w ∈ N, ϕ(x0,w) < µ. Take m large enough so that wm ∈ N; then ϕ(x0,wm) < µ so x0 ∈ Cµ(wm). We took wm to be in Y1, so x0 ∈ Cµ(wm) ⊂ C1. But x0 was any point of C0(w0), so C0(w0) ⊂ C1 and then by (10.24) also Cµ(w0) ⊂ C1. This shows that w0 ∈ Y1 and therefore that Y1 is closed in [y1,y2].

As Y1 and Y2 are nonempty, closed in [y1,y2], and satisfy (10.25), we have a contradiction to the connectedness of [y1,y2]. This implies that (10.20) does not hold, and therefore that there exists y0 ∈ Y such that 0 < min_{x∈X} ϕ(x,y0), which proves the lemma in the case k = 2.

Now suppose that k > 2 and that we have established the lemma for all values smaller than k. Suppose that (10.18) holds; we will show that (10.19) holds.

Define Z to be {x ∈ X | ϕ(x,yk) ≤ 0}. This Z is compact and convex. If it is empty, take y0 = yk and we have (10.19). Suppose it is nonempty. From (10.18) we have

0 < min_{x∈X} max_{1≤i≤k} ϕ(x,yi).  (10.26)

For x ∈ Z, ϕ(x,yk) ≤ 0 and therefore the index k cannot contribute to the maximum in (10.26). Accordingly, we have

0 < min_{x∈Z} max_{1≤i≤k−1} ϕ(x,yi).  (10.27)

Apply the induction hypothesis to (10.27), with the compact convex set Z in place of X, to conclude that there is some y′ ∈ Y such that

0 < min_{x∈Z} ϕ(x,y′).

Now we have

If x ∈ Z, then 0 < ϕ(x,y′) ≤ max{ϕ(x,y′),ϕ(x,yk)};
if x ∈ X \ Z, then 0 < ϕ(x,yk) ≤ max{ϕ(x,y′),ϕ(x,yk)},

from which it follows that

0 < min_{x∈X} max{ϕ(x,y′),ϕ(x,yk)},

and by the first part of the proof there is y0 such that (10.19) holds. ⊓⊔

Next, the theorem.

Theorem 10.2.3 (Sion, 1958). Assume (10.17). Then

min_{x∈X} sup_{y∈Y} ϕ(x,y) = sup_{y∈Y} min_{x∈X} ϕ(x,y).  (10.28)


Proof. We already know from (10.4) that ≥ holds in (10.28), so we need only prove ≤. Let

α < min_{x∈X} sup_{y∈Y} ϕ(x,y);  (10.29)

we will show that

α < sup_{y∈Y} min_{x∈X} ϕ(x,y),  (10.30)

and that will complete the proof.

For y ∈ Y define S(y) := {x ∈ X | ϕ(x,y) ≤ α}. These sets are closed by lower semicontinuity of ϕ( · ,y) and compact because they are subsets of X. If their intersection is nonempty then there is some x0 ∈ X such that for each y ∈ Y , ϕ(x0,y) ≤ α. Then

min_{x∈X} sup_{y∈Y} ϕ(x,y) ≤ sup_{y∈Y} ϕ(x0,y) ≤ α,

which contradicts (10.29). Therefore ∩_{y∈Y} S(y) = ∅, and by the finite intersection principle there is a finite subset {y1, . . . ,yk} of Y such that ∩_{i=1}^{k} S(yi) = ∅. This means that for each x ∈ X we have max_{1≤i≤k} ϕ(x,yi) > α. The function on the left side of this inequality is lower semicontinuous in x by Proposition 6.2.5, and it attains its minimum for x ∈ X by Proposition 6.2.8. Therefore the minimum must be strictly greater than α, so that

α < min_{x∈X} max_{1≤i≤k} ϕ(x,yi).  (10.31)

We can rewrite (10.31) as

0 < min_{x∈X} max_{1≤i≤k} [ϕ(x,yi) − α],

and if we apply Lemma 10.2.2 we find that there exists y0 ∈ Y such that

0 < min_{x∈X} [ϕ(x,y0) − α],

so that

sup_{y∈Y} min_{x∈X} ϕ(x,y) ≥ min_{x∈X} ϕ(x,y0) > α,

and this proves (10.30). ⊓⊔
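A grid-based check of (10.28) for a simple function satisfying (10.17) (our own illustration: ϕ(x,y) = xy on X = Y = [−1,1], linear and hence lower semicontinuous and quasiconvex in x, upper semicontinuous and quasiconcave in y, with X compact):

```python
phi = lambda x, y: x * y
X = Y = [k / 20.0 for k in range(-20, 21)]   # grids that contain the saddle point 0

left = min(max(phi(x, y) for y in Y) for x in X)    # min_x sup_y phi = min_x |x|
right = max(min(phi(x, y) for x in X) for y in Y)   # sup_y min_x phi = max_y (-|y|)
assert left == right == 0.0                          # the two sides of (10.28) agree
```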

With an additional assumption we can get a saddle point.

Corollary 10.2.4. Assume (10.17), and in addition suppose that Y is compact. Then ϕ has a saddle point.

Proof. If Y is compact, then as shown in the discussion just before (10.17), the conclusion of the Sion theorem becomes

min_{x∈X} max_{y∈Y} ϕ(x,y) = max_{y∈Y} min_{x∈X} ϕ(x,y).

Using the notation of Section 10.1 we can rewrite this as


min_{x∈X} ξ(x) = max_{y∈Y} η(y).

There are then points x0 ∈ X and y0 ∈ Y such that

min_{x∈X} ξ(x) = ξ(x0),   max_{y∈Y} η(y) = η(y0),

and therefore ξ(x0) = η(y0). Now Theorem 10.1.2 shows that (x0,y0) is a saddle point of ϕ. ⊓⊔

The next result is the original von Neumann theorem. Its original statement referred only to a real-valued function on the product of two sets. The following version uses the extension of such a function developed in Section 10.1.

Theorem 10.2.5 (von Neumann, 1928). Let X and Y be nonempty compact convex subsets of Rn and Rm respectively. Let ϕr : X × Y → R, and suppose that for each y ∈ Y , ϕr( · ,y) is a lower semicontinuous convex function on X, and for each x ∈ X, ϕr(x, ·) is an upper semicontinuous concave function on Y . Let ϕ : Rn × Rm → R be an extension of ϕr satisfying (10.13). Then ϕ has a saddle point in X × Y .

Proof. We first prove that the functions ξ(x) and η(y) are closed proper convex and closed proper concave respectively, and that the extrema in their definitions are attained. For x ∈ X, ξ(x) is the supremum of ϕr(x,y) for y ∈ Y (Proposition 10.1.3), and by hypothesis ϕr(x, ·) is upper semicontinuous, real-valued and concave and Y is compact. Therefore the supremum is attained at a finite value, and in fact

ξ(x) = ξr(x) = max_{y∈Y} ϕr(x,y) = max_{y∈Y} ϕ(x,y).

For x ∉ X and y ∈ Y we have ϕ(x,y) = +∞, so that domξ = X. A similar argument shows that the infimum in the definition of η(y) is attained for each y ∈ Y ,

η(y) = ηr(y) = min_{x∈X} ϕr(x,y) = min_{x∈X} ϕ(x,y),

and domη = Y .

The epigraph of ξ is the intersection, over all y ∈ Y , of the sets {(x,α) | ϕ(x,y) ≤ α}. For y ∈ Y the functions ϕr( · ,y) are lower semicontinuous and convex by hypothesis, and as ϕ( · ,y) is the +∞ extension of ϕr( · ,y) and X is closed, Proposition 6.2.6 shows that ϕ( · ,y) is lower semicontinuous. The epigraph of ξ is then an intersection of closed convex sets, so ξ is lower semicontinuous and convex. It never takes −∞ because no ϕ( · ,y) does, and therefore ξ is a closed proper convex function. A similar proof shows that η is closed proper concave.

It now follows from the nonemptiness and compactness of X and Y that

inf_{x∈dom1 ϕ} ξ(x) = inf_{x∈X} ξr(x) = min_{x∈X} ξr(x) = min_{x∈X} max_{y∈Y} ϕ(x,y),

attained at some x0 ∈ dom1 ϕ , and that


sup_{y∈dom2 ϕ} η(y) = sup_{y∈Y} ηr(y) = max_{y∈Y} ηr(y) = max_{y∈Y} min_{x∈X} ϕ(x,y),

attained at some y0 ∈ dom2 ϕ.

We know from (10.3) that

max_{y∈Y} η(y) = max_{y∈Y} min_{x∈X} ϕ(x,y) ≤ min_{x∈X} max_{y∈Y} ϕ(x,y) = min_{x∈X} ξ(x),  (10.32)

and by our analysis above, the two sides of this inequality are finite. Write β for the value of the right-hand side. Choose a positive ε; we will show that the left-hand side is greater than β − ε, and this will show that the two sides are actually equal.

For each y ∈ Y the function ϕ( · ,y) is closed proper convex. As β is the right-hand side of (10.32), for each x ∈ X we have max_{y∈Y} ϕ(x,y) ≥ β > β − ε. Apply Theorem 9.3.2 to the collection {θ( · ,y) = ϕ( · ,y) − (β − ε) | y ∈ Y } with C = X to conclude that there are a positive integer K, an element p∗ ∈ ΛK, a positive δ, and a collection of points {y1, . . . ,yK} ⊂ Y such that for each x ∈ X, ∑_{k=1}^{K} p∗k θ(x,yk) ≥ δ. If we set yε = ∑_{k=1}^{K} p∗k yk, then for each x ∈ X the concavity of the function θ(x, ·) yields

θ(x,yε)≥K

∑k=1

p∗kθ(x,yk)≥ δ ,

and therefore η(yε)− (β − ε)≥ δ . Then

maxy∈Y

minx∈X

ϕ(x,y) = maxy∈Y

η(y)≥ η(yε)≥ β − ε +δ > β − ε.

Therefore the two sides of (10.32) are equal, so that

minx∈X

ξ (x) = ξ (x0) = η(y0) = maxy∈Y

η(y).

Applying Theorem 10.1.2, we find that (x0,y0) is a saddle point of ϕ . ⊓⊔
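As a concrete illustration of the conclusion just proved, the following sketch (an editorial aside with an invented function, not part of the formal development) checks numerically that min-max equals max-min for the convex-concave function ϕ(x, y) = x^2 + xy − y^2 on the compact sets X = Y = [−1, 1].

```python
import numpy as np

# phi(x, y) = x**2 + x*y - y**2 is convex in x and concave in y, so the
# theorem guarantees min-max = max-min on the compact sets X and Y.
xs = np.linspace(-1.0, 1.0, 201)     # grid for X = [-1, 1]
ys = np.linspace(-1.0, 1.0, 201)     # grid for Y = [-1, 1]
X, Y = np.meshgrid(xs, ys, indexing="ij")
phi = X**2 + X*Y - Y**2

xi = phi.max(axis=1)                 # xi(x)  = max_{y in Y} phi(x, y)
eta = phi.min(axis=0)                # eta(y) = min_{x in X} phi(x, y)

minimax = xi.min()                   # min_x max_y phi(x, y)
maximin = eta.max()                  # max_y min_x phi(x, y)
print(minimax, maximin)              # both close to the saddle value 0
```

The max-min value never exceeds the min-max value, and here the two coincide, in agreement with (10.32).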

10.3 Conjugate duality: formulation

The fundamental idea of duality is to start with a given optimization problem, called the primal problem, of finding the infimum of some given extended real-valued function f depending on variables x ∈ Rn. We then embed this problem as one of a family of problems indexed by a parameter p ∈ Rm. Specifically, we create a perturbation function F : Rn × Rm → R such that F( · , 0) = f and for each x the function F(x, ·) is closed and convex. Then for each p ∈ Rm we consider the optimization problem of finding

v(p) := inf_x F(x, p). (10.33)


The function v is called the value function, or marginal function, of the problem. For p = 0 we have the original problem, as F(x, 0) = f(x), so that v(0) is the infimum of f, which may be either finite or infinite.

The next step is to construct from F a saddle function in such a way that one half of the resulting saddle problem is an optimization problem that reduces to the original problem of minimizing f. The other half of the saddle problem turns out to be a new problem, invisible until now: the dual problem.

Section 10.3.1 shows how to construct such a saddle function. Section 10.3.2 extends the basic construction to produce a system that is symmetric in the sense that one can start from either the primal or the dual problem and then proceed through the duality procedure to obtain the other problem. Section 10.3.3 applies this system to some relatively simple problems, both to illustrate the calculations and to show how this structure unifies particular formulations that otherwise might not seem to be closely related.

10.3.1 Constructing the Lagrangian

A saddle function that fits the requirement outlined above is the Lagrangian L : Rn × Rm → R defined by

L(x, p∗) = [−F(x, ·)]∗(p∗) = inf_p {⟨p∗, p⟩ + F(x, p)}, (10.34)

which we obtain by taking the conjugate in the concave sense of −F in the second variable only. As under our conditions

sup_{p∗} L(x, p∗) = sup_{p∗} [−F(x, ·)]∗(p∗) = −[−F(x, ·)]∗∗(0) = F(x, 0) = f(x), (10.35)

the problem of finding the infimum of f becomes the problem of computing

inf_x f(x) = inf_x sup_{p∗} L(x, p∗). (10.36)

In (10.36) we have one half of a saddle point problem. For the other half, we can define a function of p∗ by

g(p∗) := inf_x L(x, p∗), (10.37)

so that

sup_{p∗} g(p∗) = sup_{p∗} inf_x L(x, p∗) ≤ inf_x sup_{p∗} L(x, p∗) = inf_x f(x). (10.38)

This g is the dual objective function associated with the particular duality structure induced by our choice of F. If we had chosen a different F satisfying the requirements above, we could have obtained a different dual objective. Thus, with the problem of finding the infimum of f we can associate as many different dual problems as we wish, and these may differ greatly in form.

For each x, (10.34) shows that L(x, p∗) = [−F(x, ·)]∗(p∗), so that L(x, p∗) is a closed concave function of p∗. The function g is the infimum of the collection of such functions, indexed by x, and therefore it is closed and concave no matter what f was.

In the above formulation, (10.35) shows that f(x) corresponds to the function ξ(x) of Sections 10.1 and 10.2, and (10.37) shows that g(p∗) corresponds to η(y). The general analysis of Section 10.1 then shows that

sup_{p∗} g(p∗) ≤ inf_x f(x), (10.39)

but that equality does not necessarily hold. When it fails to hold, the difference inf_x f(x) − sup_{p∗} g(p∗) is called a duality gap. On the other hand, Theorem 10.1.2 shows that equality does hold for points x_0 ∈ dom_1 L and p∗_0 ∈ dom_2 L exactly when (x_0, p∗_0) is a saddle point of L. It also shows that the set of saddle points has a product structure. Accordingly, if there are multiple solutions, any primal optimal solution x_0 can be paired with any dual optimal solution p∗_0 to form a saddle point of the Lagrangian L.
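To make the chain F → L → (f, g) concrete, here is a discretized sketch (an invented illustration: F encodes minimizing x^2 subject to 1 − x ≤ 0, and all grids and ranges are choices made for this example). It computes L by (10.34), recovers f via (10.35) and g via (10.37), and checks the weak duality inequality (10.38).

```python
import numpy as np

# F(x, p) = x**2 when the perturbed constraint 1 - x <= p holds, +inf
# otherwise, so F(x, 0) = f(x) represents: minimize x**2 subject to x >= 1.
xs = np.linspace(-2.0, 2.0, 401)       # grid for x
ps = np.linspace(-4.0, 4.0, 801)       # grid for the perturbation p
pstars = np.linspace(0.0, 4.0, 401)    # grid for the dual variable p*

F = np.where(1.0 - xs[:, None] <= ps[None, :], xs[:, None] ** 2, np.inf)

# L(x, p*) = inf_p { <p*, p> + F(x, p) }, as in (10.34)
L = np.array([(q * ps[None, :] + F).min(axis=1) for q in pstars]).T

f = L.max(axis=1)   # sup over the p* grid; approximates f(x) as in (10.35)
g = L.min(axis=0)   # g(p*) = inf_x L(x, p*), as in (10.37)

print(g.max(), f.min())   # sup g <= inf f; both are close to 1 here
```

For this instance the primal and dual optimal values agree, so the inequality in (10.38) happens to hold with equality.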

10.3.2 Symmetric duality

We would like to extend the basic construction to a duality structure that is symmetric in the sense that we can pass from a primal to a dual problem, then apply the same or an analogous technique to the dual problem and thereby recover the primal. We can do so by requiring that F be, not just closed and convex in p for fixed values of x, but closed and convex in (x, p).

Assuming that F is a closed convex function, define the dual perturbation function G by G = F^A, where the superscript A denotes the adjoint operation introduced in Section 7.1.2. As F is convex, we take the adjoint in the concave sense to obtain

G(p∗, x∗) = F^A(p∗, x∗)
= inf_{x,p} {−⟨x∗, x⟩ + ⟨p∗, p⟩ + F(x, p)}
= inf_x {−⟨x∗, x⟩ + L(x, p∗)}. (10.40)

This G is a closed concave function of (p∗, x∗), and by substituting x∗ = 0 in (10.40) we find that

G(p∗, 0) = inf_x L(x, p∗) = g(p∗),

so that G( · , 0) is the dual objective function g. Therefore we can regard G as a perturbation function playing the same role for the dual problem as F did for the primal. The last line of (10.40) shows that we could also have obtained G directly


from L, because

−G(p∗, x∗) = sup_x {⟨x∗, x⟩ − L(x, p∗)} = [L( · , p∗)]∗(x∗).

We can also define a Lagrangian for the dual problem; by analogy with the primal construction we take

L′(x, p∗) = [−G(p∗, ·)]∗(x) = sup_{x∗} {⟨x∗, x⟩ + G(p∗, x∗)}. (10.41)

Now suppose that we want to dualize the dual problem. We can try to do so by taking the adjoint of the dual perturbation function G, and as G is concave we take the adjoint in the concave sense. However, G^A = F^AA = F, so that dualizing the dual problem allows us to recover the original primal perturbation function and hence, if we wish, the original primal objective function f(x) = F(x, 0).
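The engine behind this symmetry is the fact that taking the conjugate twice returns the closed function one started from. A one-dimensional numeric sketch (illustrative only; the function h(p) = p^2 and all grids are invented for the example) shows conjugation applied twice recovering the original function on the region the grids resolve:

```python
import numpy as np

# h(p) = p**2 is closed and convex; computing the conjugate twice on grids
# should give back h (checked on |p| <= 1, where the bounded s-grid does
# not truncate the supremum).
p = np.linspace(-5.0, 5.0, 2001)
s = np.linspace(-4.0, 4.0, 1601)
h = p ** 2

hstar = (s[:, None] * p[None, :] - h[None, :]).max(axis=1)          # h*(s)
hstarstar = (p[:, None] * s[None, :] - hstar[None, :]).max(axis=1)  # h**(p)

err = np.abs(hstarstar - h)[np.abs(p) <= 1.0].max()
print(err)   # near 0: the biconjugate recovers h
```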

10.3.3 Example: linear programming

To put in perspective the various constructions that we just introduced, here is an example of what the duality transformations do when applied to a general linear programming problem. The formulation is general enough so that one can recover the dual of any linear programming problem by making particular choices of the cones involved. Later we will illustrate calculations for other problems.

Let J and K be polyhedral convex cones in Rn and Rm respectively, and consider the linear programming problem

inf_x {⟨c∗, x⟩ | a − Ax ∈ K, x ∈ J}. (10.42)

By replacing K and J by particular cones, one can express any finite-dimensional linear programming problem in this form. We will illustrate how to obtain the conventional dual problem associated with (10.42) from one case of the general duality structure discussed above. The condensed derivation (through the Lagrangian) comes first, then the longer version that yields the complete dual perturbation structure.

We need first to embed (10.42) in a perturbation structure. For that purpose we introduce a variable p ∈ Rm and define

F(x, p) = ⟨c∗, x⟩ if a − p − Ax ∈ K and x ∈ J; +∞ otherwise. (10.43)

Then

L(x, p∗) = inf_p {⟨p∗, p⟩ + F(x, p)}. (10.44)


To calculate the infimum in (10.44), as x is fixed we have only p to work with. We can write the requirement a − p − Ax ∈ K as

p = a − Ax − k∗, k∗ ∈ K, (10.45)

and then reason that if a − p − Ax = k∗ and x ∈ J, the quantity being minimized is ⟨p∗, p⟩ + ⟨c∗, x⟩, so that as we can do nothing for the moment with x, we should try to make ⟨p∗, p⟩ as small as possible. We can see from (10.45) that if p∗ ∈ K◦ (the polar cone of K), then by taking k∗ = 0 we can make ⟨p∗, p⟩ equal to ⟨p∗, a − Ax⟩, but no smaller. Therefore in this case the infimum is ⟨c∗, x⟩ + ⟨p∗, a⟩ − ⟨p∗, Ax⟩. On the other hand, if p∗ ∉ K◦ then we can find some k∗ ∈ K with ⟨p∗, k∗⟩ > 0, and then by taking large multiples of this k∗ we can make ⟨p∗, p⟩ + ⟨c∗, x⟩ as small as we wish. In this case the infimum is −∞. Finally, if x ∉ J then F(x, p) = +∞ for each p, so the infimum is +∞. Collecting the results of these three cases, we have

L(x, p∗) = ⟨c∗, x⟩ + ⟨p∗, a⟩ − ⟨p∗, Ax⟩ if p∗ ∈ K◦ and x ∈ J; −∞ if p∗ ∉ K◦ and x ∈ J; +∞ if x ∉ J. (10.46)

To obtain the dual objective function g(p∗) we calculate the infimum in x of L( · , p∗) by considering first the case when p∗ ∈ K◦ and then that of p∗ ∉ K◦. In the first case we are taking the infimum of

⟨c∗, x⟩ + ⟨p∗, a⟩ − ⟨p∗, Ax⟩ = ⟨p∗, a⟩ + ⟨c∗ − A∗p∗, x⟩

over x ∈ J (we do not want to leave J, because the function value is then +∞, which will not help the infimum). This shows us that if A∗p∗ − c∗ ∈ J◦ then the best infimum we can get is ⟨p∗, a⟩, at x = 0, whereas if A∗p∗ − c∗ ∉ J◦ then we will get −∞. In the second case, when p∗ ∉ K◦, adjusting x makes no difference and the infimum is −∞. We therefore have

g(p∗) = inf_x L(x, p∗) = ⟨p∗, a⟩ if A∗p∗ − c∗ ∈ J◦ and p∗ ∈ K◦; −∞ otherwise. (10.47)

The dual of (10.42) is then the problem of finding the supremum of g. In notation similar to that of (10.42), this is

sup_{p∗} {⟨p∗, a⟩ | A∗p∗ − c∗ ∈ J◦, p∗ ∈ K◦}. (10.48)

For a particular instance of this general formulation, if we choose K = {0} ⊂ Rm and J = Rn_+, then (10.42) and (10.48) become, respectively,

inf_x {⟨c∗, x⟩ | Ax = a, x ≥ 0} (10.49)

and


sup_{p∗} {⟨p∗, a⟩ | A∗p∗ ≤ c∗, p∗ free}. (10.50)
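The pair (10.49)-(10.50) can be checked on a small instance. The sketch below (the data A, a, c are invented for the illustration, and scipy's linprog is used as an off-the-shelf LP solver) solves both problems and confirms that the optimal values coincide.

```python
import numpy as np
from scipy.optimize import linprog

# Invented data for a small instance of the pair (10.49)-(10.50).
A = np.array([[1.0, 1.0, 1.0]])
a = np.array([1.0])
c = np.array([1.0, 2.0, 3.0])

# primal (10.49): min <c, x> s.t. Ax = a, x >= 0
primal = linprog(c, A_eq=A, b_eq=a, bounds=[(0, None)] * 3, method="highs")

# dual (10.50): max <p, a> s.t. A^T p <= c, p free
# (linprog minimizes, so we minimize -<a, p>)
dual = linprog(-a, A_ub=A.T, b_ub=c, bounds=[(None, None)], method="highs")

print(primal.fun, -dual.fun)   # equal optimal values (1.0 and 1.0 here)
```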

Next we use the same problems (10.42) and (10.48) to illustrate the full duality structure with both primal and dual perturbation functions. To do so we replace the calculation of the Lagrangian in (10.46) by calculation of the adjoint of F as in (10.40):

G(p∗, x∗) = F^A(p∗, x∗)
= inf_{x,p} {−⟨x∗, x⟩ + ⟨p∗, p⟩ + F(x, p)}
= inf_x inf_p {−⟨x∗, x⟩ + ⟨p∗, p⟩ + ⟨c∗, x⟩ | a − p − Ax ∈ K, x ∈ J}
= inf_x {⟨p∗, a⟩ + ⟨c∗ − x∗ − A∗p∗, x⟩ | x ∈ J} if p∗ ∈ K◦, and −∞ if p∗ ∉ K◦
= ⟨p∗, a⟩ if A∗p∗ + x∗ − c∗ ∈ J◦ and p∗ ∈ K◦; −∞ otherwise. (10.51)

If we set the dual perturbation variable x∗ to 0, then G(p∗, x∗) reduces to g(p∗), as we should expect.

We have now derived a dual perturbation structure G(p∗, x∗) from the original primal F(x, p) and, as we can check that F is closed proper convex, we know from the analysis of Section 10.3.2 that if we were to start with G then we could recover F. However, the symmetry in this general structure is not quite complete, and in fact it cannot be made complete.

To see this, use (10.41) to compute the dual Lagrangian:

L′(x, p∗) = [−G(p∗, ·)]∗(x)
= sup_{x∗} {⟨x∗, x⟩ + G(p∗, x∗)}
= sup_{x∗} {⟨x∗, x⟩ + ⟨p∗, a⟩ | A∗p∗ + x∗ − c∗ ∈ J◦} if p∗ ∈ K◦, and −∞ if p∗ ∉ K◦
= sup_{l∗} {⟨c∗ − A∗p∗ + l∗, x⟩ + ⟨p∗, a⟩ | l∗ ∈ J◦} if p∗ ∈ K◦, and −∞ if p∗ ∉ K◦
= ⟨c∗, x⟩ + ⟨p∗, a⟩ − ⟨p∗, Ax⟩ if x ∈ J and p∗ ∈ K◦; +∞ if x ∉ J and p∗ ∈ K◦; −∞ if p∗ ∉ K◦. (10.52)

Now if we compare (10.46) with (10.52) we see that there is a small but noticeable difference: L and L′ disagree in the region where x ∉ J and p∗ ∉ K◦, the primal Lagrangian L taking the value +∞ there while the dual Lagrangian L′ takes the value −∞. We saw a similar disagreement at the end of Section 10.1 when we looked at the lower and upper simple extensions of a saddle function.


In fact this disagreement expresses a general property of saddle functions like L and L′: namely, a full treatment of such functions has to deal not with individual functions but with equivalence classes. The construction of L and L′ has produced two members of such an equivalence class.

10.3.4 An economic interpretation

The dual linear programming problem in (10.48) may appear to have come out of nowhere as a result of the manipulations that we applied to the primal. However, dual problems arising in practice often have interpretations that help to illuminate aspects of the systems being studied. This is certainly the case with linear programming, where we can often assign a definite economic meaning to the dual.

To demonstrate how this happens, take the primal problem of (10.42) and make the particular choices K = −RI_+ and J = RH_+, to obtain the problem

inf_x {⟨c∗, x⟩ | Ax ≥ a, x ≥ 0}. (10.53)

The problem in (10.53) is an abstraction of the decision problem faced by a manager with H production activities (e.g., plants or departments within plants), each represented by a column of A: for the jth activity and for goods numbered 1, . . . , I, the element a_ij of A is the number of units of good i produced by activity j in one time period if the activity is operated at a reference level. In this interpretation we assume constant returns to scale, so for a nonnegative real number x_j the activity, if operated at level x_j, will produce a_ij x_j units of good i. The entries a_ij may be positive, zero, or negative, with a negative entry indicating net consumption of that good. On the other hand, the levels x_j cannot be negative: these processes are not reversible. Therefore, the net output of the entire production process, given a set of levels x ∈ RH_+, will be a vector Ax ∈ RI of goods.

The cost of operating production activity j for one time unit at the reference level is c∗_j, and we assume that the cost of operating at level x_j is c∗_j x_j. We will assume that the manager's task is to produce at least a_i units of good i in one time period, where the a_i also may be negative: producing at least −5 units means not consuming more than 5 units.

This is the simplest formulation; it assumes free disposal in the sense that one can produce more than is needed of any good, then give away or discard the excess. If there is no free disposal then we can add disposal activities, the most elementary of which will be columns with a −1 in some entry and zeros in all others, and with a positive unit operating cost. Operating such an activity disposes of excess, but at a cost. Then we take K = {0} instead of K = −RI_+, so that the constraint Ax ≥ a becomes Ax = a.

Faced with the above requirement, in the version with free disposal the manager will consider only production plans x ≥ 0 satisfying Ax ≥ a, and among those will prefer an x_0 that achieves the least value of the total cost ⟨c∗, x⟩. This is (10.53).


Now consider the problem that the manager has in allocating cost to products. There is a total production cost ⟨c∗, x_0⟩, and there is a vector (a_1, . . . , a_I) of products available for sale to bring in income to cover the cost. But it is not at all clear how to allocate the cost to the products, because the production costs appear to come from activities and not from products. We may know that a particular activity cost $128,000 to operate for one time unit, but that activity produced 560 units of one product, 11,600 of another, consumed 8,470 units of a third, and so on. It is very difficult to see how to make a realistic allocation of operating cost to the products rather than to the activities.

Yet we need such an allocation, because we need to know whether making 10,000 units of product number 5 will yield as much or more, when the product is sold in the market, as it cost us to make the 10,000 units. If the answer is that it will yield less, then we had better change our production plan, or perhaps discontinue product 5 altogether. Thus, we actually need to find a set of prices that not only allocates all of the cost to products, but does so in a way that lets us predict the effects of changing our production plan. This means that we need what economists call marginal costs of changes in the elements of a.

This is where the dual problem can help. With the particular choices of K and J that we made for (10.53), the dual becomes

sup_{p∗} {⟨p∗, a⟩ | A∗p∗ ≤ c∗, p∗ ≥ 0}. (10.54)

We will interpret this as a problem of assigning prices p∗_1, . . . , p∗_I to the I products. As we show next, the constraints in (10.54) amount to requiring these prices to meet certain requirements of realism derived from observed behavior of markets.

First, if there is free disposal then prices should not be negative. If, for example, p∗_1 < 0 then this means that people will pay us to take good 1 away from them. Given free disposal, we can take large quantities from those people, collect this subsidy of |p∗_1| per unit, and then throw away the goods. The people offering us these goods would be behaving stupidly, as under free disposal they could throw the goods away themselves at no cost instead of paying us |p∗_1| per unit to do so. It is not reasonable to assume that a price system depending on such behavior would last long, so we should impose the constraint that p∗ ≥ 0.

The argument for the second constraint is very similar. Suppose that for some j we were to have (A∗p∗)_j > c∗_j. As

(A∗p∗)_j = ∑_{i=1}^I p∗_i a_ij,

this is the value, at the prices p∗, of the package of goods produced by activity j operating for one time unit at the reference level. The total cost of that operation is c∗_j, so that we have an arbitrage opportunity, which is a chance to make a transaction leaving us with no change in product inventory but a guaranteed net monetary gain: something for nothing. To do so we operate activity j at some positive level ρ and produce a bundle ρ(a_1j, . . . , a_Ij) of goods. We have to pay a cost ρc∗_j. Then we


exchange the goods on the market for a sum of money equaling ρ(A∗p∗)_j. We are left with the same inventory of goods as before, but with our cash increased by the positive amount ρ[(A∗p∗)_j − c∗_j]. As there is no limit to the level ρ at which we are allowed to operate activity j, we can make as much free money as we want to by using a sufficiently large operating level ρ.

Experience with markets has shown that arbitrage opportunities tend not to persist for very long, because people try to exploit the opportunity, buying on one side and selling on the other. This activity drives up the prices of the products being bought and drives down the prices of the products being sold, so the arbitrage opportunity disappears. Thus in order for our prices p∗ to pass the reality test, they should not offer any arbitrage opportunities. That means that we should enforce the constraint A∗p∗ ≤ c∗.

In fact, if we look back at the argument by which we decided to constrain the prices to be nonnegative, we can see that the existence of a negative price in the presence of free disposal was actually another arbitrage opportunity. Therefore we can give a very simple verbal description of the dual constraints that we have derived: the prices should not offer any arbitrage opportunity, or in other words they should be arbitrage-free.

A price system p∗ satisfies the two requirements just discussed exactly when it satisfies the constraints of the dual programming problem (10.54). We will assume that the primal problem (10.53) has been properly formulated in the sense that it has a feasible solution and its objective function is bounded below on the set of feasible solutions. Then we know from the theory of linear programming that both the primal and dual problems are actually solvable, and that their optimal objective values are equal. This means that we can find an optimal solution x_0 of (10.53) and an optimal solution p∗_0 of (10.54) such that

⟨c∗, x_0⟩ = ⟨p∗_0, a⟩. (10.55)

But this means that we have solved our allocation problem: we can find arbitrage-free prices p∗_0 so that the total value ⟨p∗_0, a⟩ of the goods a that we have produced, at the prices p∗_0, exactly equals the total cost ⟨c∗, x_0⟩ of producing those goods.

In fact, we will show in Proposition 10.4.3 below that the negatives of these prices are actually subgradients of the primal value function v(p) at p = 0. For our choices of K and J, the perturbation function we used was

F(x, p) = ⟨c∗, x⟩ if Ax + p ≥ a and x ≥ 0; +∞ otherwise. (10.56)

Thus if we want to investigate the effect of changing a to a + h for some h ∈ RI, then we should set p = −h. The new optimal value will then be v(−h). Then for any dual optimal solution p∗_0,

v(−h) ≥ v(0) + ⟨−p∗_0, (−h) − 0⟩,


and therefore the change in total cost from this production change will be v(−h) − v(0), which is at least as large as ⟨p∗_0, h⟩.

If the dual optimal solution is unique, as is often the case, then that solution is actually a Fréchet derivative (Theorem 8.3.2), so that the change in optimal cost when we replace a by a + h will be ⟨p∗_0, h⟩ + o(h). In this case, the elements of p∗_0 are true marginal costs for the production of the goods 1, . . . , I. It is possible to use the polyhedrality of epigraphs associated with the linear programming problem to extend this result somewhat in this particular case, but we do not do so here.
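These marginal-cost claims can be observed numerically. In the sketch below (the production data are invented, and scipy's linprog again stands in for an LP solver), the dual optimal prices from (10.54) predict the cost of requiring slightly more of good 1; for this instance the dual solution is unique, so the prediction matches the recomputed cost.

```python
import numpy as np
from scipy.optimize import linprog

# Invented production data for (10.53): two activities, two goods.
A = np.array([[2.0, 1.0], [1.0, 3.0]])   # output a_ij of good i by activity j
a = np.array([4.0, 6.0])                 # required amounts of each good
c = np.array([3.0, 4.0])                 # operating cost of each activity

def total_cost(req):
    # primal (10.53): min <c, x> s.t. Ax >= req, x >= 0
    # (linprog uses <= constraints, hence the sign flips)
    res = linprog(c, A_ub=-A, b_ub=-req, bounds=[(0, None)] * 2, method="highs")
    return res.fun

# dual (10.54): max <p, a> s.t. A^T p <= c, p >= 0
dual = linprog(-a, A_ub=A.T, b_ub=c, bounds=[(0, None)] * 2, method="highs")
prices = dual.x                          # arbitrage-free prices p*

h = np.array([0.1, 0.0])                 # require 0.1 more of good 1
predicted = prices @ h                   # marginal-cost prediction <p*, h>
actual = total_cost(a + h) - total_cost(a)
print(prices, predicted, actual)         # prices ~ [1, 1]; both changes ~ 0.1
```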

10.4 Conjugate duality: existence of solutions

In this section we look at four questions:

1. When does the Lagrangian L have a saddle point?
2. When is there no duality gap?
3. How can one interpret the dual variables?
4. What conditions ensure that the extrema of the primal and/or dual problems are attained?

The answers to the first two questions are closely related, and we will look first at those, then at the rest.

Theorem 10.4.1. Suppose that for each x, F(x, ·) is closed and convex. Then the following are equivalent:

a. The Lagrangian L has a saddle point (x_0, p∗_0) with finite saddle value L(x_0, p∗_0).
b. x_0 ∈ dom f, p∗_0 ∈ dom g, and g(p∗_0) = f(x_0).

When these equivalent conditions hold,

sup g = g(p∗_0) = L(x_0, p∗_0) = f(x_0) = inf f, (10.57)

so that L(x_0, p∗_0) equals the common optimal value of the primal and dual problems.

Proof. From (10.35) and (10.37) we see that for each x and p∗,

g(p∗) = inf_x L(x, p∗), sup_{p∗} L(x, p∗) = f(x). (10.58)

If x ∈ dom f then the second assertion in (10.58) says that for each p∗ we have L(x, p∗) ≤ f(x) < +∞, so x ∈ dom_1 L and therefore dom f ⊂ dom_1 L. A similar argument using the first assertion of (10.58) shows that dom g ⊂ dom_2 L.

(a) implies (b). If (x_0, p∗_0) is a saddle point of L with finite saddle value, then

f(x_0) = sup_{p∗} L(x_0, p∗) = L(x_0, p∗_0) = inf_x L(x, p∗_0) = g(p∗_0). (10.59)


This shows that f(x_0) and g(p∗_0) are both finite, so that x_0 ∈ dom f and p∗_0 ∈ dom g. Therefore (b) holds.

(b) implies (a). If (b) holds then (x_0, p∗_0) ∈ dom f × dom g ⊂ dom L. Using (10.58) together with the assumption that f(x_0) = g(p∗_0), we have

f(x_0) = g(p∗_0) ≤ L(x_0, p∗_0) ≤ f(x_0), (10.60)

so that all of these values are equal. As f(x_0) < +∞ and g(p∗_0) > −∞, in fact the values are all finite. Using (10.58) once more, we find that for each x and p∗,

L(x_0, p∗) ≤ f(x_0) = L(x_0, p∗_0) = g(p∗_0) ≤ L(x, p∗_0), (10.61)

so that (x_0, p∗_0) is a saddle point with finite saddle value, which proves (a).

If (a) and (b) hold, then we know that f(x_0) = g(p∗_0). For each x and p∗, (10.39) showed that g(p∗) ≤ f(x), so we have

g(p∗) ≤ f(x_0) = g(p∗_0) ≤ f(x),

showing that x_0 solves the primal problem and p∗_0 solves the dual problem. We can then see from (10.60) that the optimal value of each problem equals L(x_0, p∗_0). ⊓⊔

Theorem 10.4.1 shows that if the Lagrangian has a saddle point with finite saddle value, then both primal and dual problems are solvable with a common optimal value, so in that case there certainly is no duality gap. But there may be no duality gap even if it is not the case that both problems have optimal solutions. To see this, it helps to look at the value functions. We have already seen the primal value function v(p) = inf_x F(x, p) in (10.33), and by analogy we can define the dual value function to be

w(x∗) = sup_{p∗} G(p∗, x∗). (10.62)

Proposition 10.4.2. Suppose that the perturbation function F is closed proper convex. Then

sup g = (cl v)(0) ≤ v(0) = inf f. (10.63)

Accordingly, there is no duality gap if and only if v(0) = (cl v)(0).

Proof. The infimum of f is v(0) by definition of v. Moreover, as F is convex, v is a convex function by Exercise 6.1.17, so that −v is concave. Then

(−v)∗(p∗) = inf_p {⟨p∗, p⟩ − (−v)(p)}
= inf_{x,p} {⟨p∗, p⟩ + F(x, p)}
= inf_x L(x, p∗)
= g(p∗), (10.64)

so that the dual objective g is (−v)∗, and therefore


g∗ = (−v)∗∗ = cl(−v) = −cl v.

This means that

sup g = −g∗(0) = (cl v)(0) ≤ v(0) = inf f,

which proves (10.63). ⊓⊔

Proposition 10.4.2 gives valuable insight into the possible difference between sup g and inf f by showing that any such difference must be exactly the difference between the values of v and of cl v at the origin. Evidently, there will be no difference if 0 ∈ ri dom v (Theorem 6.2.14), and there may be no difference even if this condition is not met. However, (10.64) can give us even more information, as we show next.

Proposition 10.4.3. Suppose that for each x, F(x, ·) is closed and convex, and that inf f is finite. Then

−∂v(0) = {p∗ | g(p∗) = max g}. (10.65)

If 0 ∈ ri dom v then this set is nonempty.

Proof. Let p be a point at which v(p) is finite. A point p∗ belongs to −∂v(p) exactly when p∗ ∈ ∂(−v)(p) (use Definition 8.1.1, remembering that the inequality goes in opposite directions for convex and for concave functions). In turn, this occurs exactly when

(−v)∗(p∗) = ⟨p∗, p⟩ − (−v)(p).

Rewriting this using (10.64), we see that p∗ ∈ −∂v(p) if and only if

g(p∗) = ⟨p∗, p⟩ + v(p).

Setting p = 0, where v is finite by hypothesis, and recalling that v(0) = inf f, we find that p∗ ∈ −∂v(0) if and only if g(p∗) = inf f. But as we know that sup g ≤ inf f, we see that −∂v(0) is exactly the set of maximizers of the dual objective function g.

For this set to be nonempty it suffices by Proposition 6.2.13 that the origin belong to the relative interior of dom v. ⊓⊔

Proposition 10.4.3 shows that the condition 0 ∈ ri dom v ensures that the dual problem has at least one optimal solution. Moreover, as we saw earlier, this condition also suffices to ensure that

sup g = (cl v)(0) = v(0) = inf f,

so that there will be no duality gap.

10.4.1 Example: convex nonlinear programming

Let f_i(x) for i = 0, . . . , m be closed proper convex functions on Rn, with ∩_{i=0}^m dom f_i ≠ ∅. We consider the constrained optimization problem


inf_x { f_0(x) | f_i(x) ≤ 0, i = 1, . . . , m}. (10.66)

This representation can accommodate a constraint of the form x ∈ C by including the indicator function I_C among the constraint functions f_i.

We use the perturbation structure

F(x, p) = f_0(x) if f_i(x) ≤ p_i for i = 1, . . . , m; +∞ otherwise. (10.67)

One can check directly that the epigraph of F is a closed convex subset of Rn+m+1. Further, F cannot take −∞ and it does not always take +∞ (let x ∈ ∩_{i=0}^m dom f_i and take the p_i to be large enough so that all constraints are satisfied), so F is a closed proper convex function.

Now a computation using the definition of the Lagrangian shows that

L(x, p∗) = f_0(x) + ∑_{i=1}^m p∗_i f_i(x) if p∗ ≥ 0; −∞ otherwise, (10.68)

and therefore the corresponding dual problem is that of finding the supremum over p∗ of

g(p∗) = inf_x L(x, p∗) = inf_x { f_0(x) + ∑_{i=1}^m p∗_i f_i(x) } if p∗ ≥ 0; −∞ otherwise. (10.69)

Depending on the form of the functions f_i, it may or may not be possible to find a simple representation of the dual problem (in particular, one in which the "inf" operator does not explicitly appear).
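For one case in which everything can be carried out explicitly, consider the invented instance f_0(x) = x_1^2 + x_2^2 with the single constraint f_1(x) = 1 − x_1 − x_2 ≤ 0. The sketch below evaluates the primal infimum and the dual function (10.69) on grids; for this instance both optimal values come out equal to 1/2, so there is no duality gap.

```python
import numpy as np

# Invented instance of (10.66): f0(x) = x1**2 + x2**2, one constraint
# f1(x) = 1 - x1 - x2 <= 0.  Everything is evaluated on grids.
grid = np.linspace(-1.0, 2.0, 301)
x1, x2 = np.meshgrid(grid, grid, indexing="ij")
f0 = x1 ** 2 + x2 ** 2
f1 = 1.0 - x1 - x2

primal = f0[f1 <= 0].min()               # inf { f0(x) | f1(x) <= 0 }

# dual objective (10.69): g(p*) = inf_x { f0(x) + p* f1(x) } for p* >= 0
pstars = np.linspace(0.0, 3.0, 301)
g = np.array([(f0 + q * f1).min() for q in pstars])
dual = g.max()                           # attained near p* = 1

print(primal, dual)                      # both 0.5: no duality gap here
```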

Here is a condition that will ensure that the dual problem has a solution and that there is no duality gap.

Definition 10.4.4. The problem (10.66) satisfies the Slater condition if there is a point x̄ ∈ dom f_0 such that

f_i(x̄) < 0, i = 1, . . . , m. ⊓⊔

To see why the Slater condition works, observe that if P is a sufficiently small neighborhood of the origin in Rm, then for each p ∈ P we will have

f_1(x̄) < p_1, . . . , f_m(x̄) < p_m,

and therefore

v(p) = inf_x F(x, p) ≤ F(x̄, p) = f_0(x̄) < +∞.

Accordingly, P ⊂ dom v, so that in fact 0 ∈ int dom v, which is stronger than the condition 0 ∈ ri dom v for applicability of Propositions 10.4.2 and 10.4.3.

One could set up conditions for the primal problem to have a minimizer by applying Propositions 10.4.2 and 10.4.3 to the dual, using the symmetry of the duality framework. However, the dual perturbation structure is frequently not easy to work with. For that reason it may be better simply to impose a condition directly on the primal problem: for example, compactness of a level set.

10.4.2 The augmented Lagrangian

The Lagrangian in (10.68) contains sign constraints on the dual variables p∗, and the dual objective function g inherits these constraints. In computing the Lagrangian one sees that these sign constraints arise, apparently inevitably, from the presence of inequality constraints in the original problem. One might conclude that whenever the problem contains inequality constraints, they will produce explicit sign restrictions on the dual variables. However this is not so, because the form of the Lagrangian (and so also the dual objective) depends not just on the form of the constraints but also on other aspects of the function F.

To illustrate this we take the same primal problem and produce quite a different Lagrangian by altering the form of F. In fact, the alteration will be apparently very slight: for some positive real number r we take

F_r(x, p) = f_0(x) + (r/2)∥p∥^2 if f_i(x) ≤ p_i for i = 1, . . . , m; +∞ otherwise. (10.70)

Now in calculating the Lagrangian we have to solve m minimization problems, each of the form

min_{p_i} { p∗_i p_i + (r/2)p_i^2 | f_i(x) ≤ p_i }. (10.71)

The unconstrained minimum of the quadratic objective function in (10.71) occurs at p_i = −r^{-1} p∗_i, and it has the value −(2r)^{-1}(p∗_i)^2. Clearly, we want to make this choice if it is feasible for (10.71): that is, if p∗_i + r f_i(x) ≤ 0. Otherwise (when f_i(x) is greater than the unconstrained minimizer −r^{-1} p∗_i) we want to make the quadratic as small as possible, which means setting p_i = f_i(x), so that the quadratic has the value p∗_i f_i(x) + (r/2) f_i(x)^2. Therefore we have

L(x, p∗) = f_0(x) + ∑_{i=1}^m θ_i( f_i(x), p∗_i, r), (10.72)

where

θ_i(ϕ_i, p∗_i, r) = −(2r)^{-1}(p∗_i)^2 if p∗_i + rϕ_i ≤ 0; p∗_i ϕ_i + (r/2)ϕ_i^2 otherwise. (10.73)


The quantity defined by (10.72) and (10.73) is the so-called augmented Lagrangian, which has received much attention because of its usefulness in the computational solution of optimization problems. As we can see from this development, it is just another of the infinitely many possibilities for constructing Lagrangians for the same primal problem, obtained by selecting different functional forms for F.
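The formulas (10.72)-(10.73) are straightforward to put into code. The sketch below (test values chosen arbitrarily) implements θ_i and verifies it against a brute-force minimization of (10.71) over a fine grid of feasible perturbations p.

```python
import numpy as np

def theta(phi, pstar, r):
    # theta_i from (10.73)
    if pstar + r * phi <= 0:
        return -pstar**2 / (2.0 * r)
    return pstar * phi + (r / 2.0) * phi**2

def theta_direct(phi, pstar, r):
    # brute-force version of (10.71): minimize over feasible p >= f_i(x)
    p = np.linspace(phi, phi + 10.0, 200001)
    return (pstar * p + (r / 2.0) * p**2).min()

for phi, pstar, r in [(-2.0, 1.0, 1.0), (0.5, 1.0, 1.0), (-0.3, 2.0, 4.0)]:
    assert abs(theta(phi, pstar, r) - theta_direct(phi, pstar, r)) < 1e-4
print("theta matches the direct minimization of (10.71)")
```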

10.5 Notes and references

The general setting of the problems outlined in Section 10.1 follows [39, Part VII], especially Sections 33 and 36.

For the saddle point theorems of Section 10.2, the theorem of Sion is adapted from [46]. The proof given here follows [20]. The first proof of Theorem 10.2.5 is adapted from [5], and the second follows Section I.1 of [6].

The duality formulation in Section 10.3 is due to Rockafellar: see [40], which also deals with problems in infinite-dimensional spaces. Many previous authors had dealt with duality in more restricted forms.

A fuller treatment of saddle functions, including equivalence classes, is in [39].


Appendix A
Topics from Linear Algebra

This appendix presents some useful results that are often not taught in elementary linear algebra courses. We give results only for the real field, as it is simpler and we do not need the extra generality.

The initial sections cover linear spaces, the concept of a norm, and a few basic properties of norms, particularly on finite-dimensional linear spaces, including the space of m×n matrices.

We then apply the basic results to derive some structural properties of matrices, including the extremely useful singular value decomposition. As an application we show how to compute the Moore-Penrose generalized inverse. The final section develops properties of affine sets, which are indispensable in the study of convexity. The Moore-Penrose inverse appears again here as a useful tool.

A.1 Linear spaces

A linear space, or vector space, is a pair V = (X ,F) where

• X is a group whose operation and identity element we denote by + and 0 respec-tively,

• F is a field whose elements we call scalars, and whose additive identity and unit element are 0 and 1 respectively,

• For each α ∈ F and each x ∈ X the product αx is defined and is an element of X ,

and where the following four properties hold for each x and y in X and each α and β in F:

a. α(x + y) = αx + αy;
b. (α + β)x = αx + βx;
c. α(βx) = (αβ)x;
d. 1x = x.


A real vector space is one in which F = R. Every vector space in this book is real. We usually take V to be the familiar vector space Rn in which the elements of X are n-tuples of real numbers (x₁, . . . ,xₙ) and F = R. Courses on basic linear algebra develop the most important properties of Rn, and we will use these. We will occasionally take a different X, such as the m×n real matrices.

If S = {x₁, . . . ,xₖ} is a finite subset of V, then a linear combination of elements of S is an element of the form µ₁x₁ + . . . + µₖxₖ, where µ₁, . . . ,µₖ are scalars. The span of S is the set

span S = {all linear combinations of elements of S} if S ≠ ∅, and span S = {0} otherwise.

V is finite-dimensional if there exists a finite subset S of V such that V = span S. Every vector space used in this book is finite-dimensional.

When F = R the linear space V is an inner product space if there is an operation ⟨ · , · ⟩ : V ×V → F obeying the following for each x, y, and z in X and each α and β in F:

1. ⟨x,x⟩ ≥ 0, and ⟨x,x⟩ = 0 if and only if x = 0;
2. ⟨x,y⟩ = ⟨y,x⟩;
3. ⟨αx + βy, z⟩ = α⟨x,z⟩ + β⟨y,z⟩.

If F were to consist of the complex numbers C, this definition would still work if the right-hand side of the equation in item (2) above were replaced by its complex conjugate.

For elements x = (x₁, . . . ,xₙ) and y = (y₁, . . . ,yₙ) of Rn the standard inner product is

⟨x,y⟩ = ∑_{i=1}^{n} xᵢyᵢ, (A.1)

but there are infinitely many others: for example, ∑_{i=1}^{n} xᵢ(Ay)ᵢ, where A is a symmetric positive definite matrix. We will use the notation in (A.1) for the standard inner product, and will introduce other notation if needed for different inner products.
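As a quick numerical illustration (a sketch, not part of the text), the standard inner product (A.1) and an A-weighted variant ⟨x, Ay⟩ can be computed side by side; the matrix A below is an arbitrarily chosen symmetric positive definite example:

```python
def dot(x, y):
    """Standard inner product (A.1)."""
    return sum(xi * yi for xi, yi in zip(x, y))

def weighted_dot(x, y, A):
    """The inner product <x, Ay> for a symmetric positive definite A."""
    Ay = [dot(row, y) for row in A]
    return dot(x, Ay)

x = [1.0, 2.0]
y = [3.0, -1.0]
A = [[2.0, 1.0],
     [1.0, 2.0]]   # symmetric, positive definite (eigenvalues 1 and 3)

print(dot(x, y))               # 1*3 + 2*(-1) = 1.0
print(weighted_dot(x, y, A))   # Ay = [5, 1], so <x, Ay> = 1*5 + 2*1 = 7.0
print(weighted_dot(y, x, A))   # symmetry of A gives the same value: 7.0
```

The equality of the last two values reflects the symmetry requirement in item (2) of the definition.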

A.1.1 Exercises for Section A.1

Exercise A.1.1. Let A be an n×n matrix. Prove the following:

1. If A is symmetric and positive definite, then the expression

[x,y] = ⟨x,Ay⟩ (A.2)

defines an inner product on Rn.
2. Under either or both of the following conditions, (A.2) does not define an inner product on Rn: (1) A is not symmetric; (2) A is not positive definite.


A.2 Normed linear spaces

As shown in Section A.1, points in a real vector space V satisfy axioms concerning addition as well as multiplication by scalars. However, without more structure we cannot speak about the size of an element x ∈ V, except to say whether it is or is not zero. One approach to the question of size is to define a norm on that space.

Definition A.2.1. Let V be a real linear space. A norm on V is a function ϕ( · ) : V → R having the following properties for each x and y in V and each α ∈ R:

1. ϕ(x) ≥ 0, and ϕ(x) = 0 if and only if x = 0.
2. ϕ(αx) = |α|ϕ(x).
3. (the triangle inequality) ϕ(x + y) ≤ ϕ(x) + ϕ(y).

⊓⊔

The pair (V,∥ · ∥) is a normed linear space. Norms are not unique: for example, multiplying a norm by any positive scalar produces another norm. If | · | is any norm for V, then d(x,y) := |x − y| is a metric on V, so (V, | · |) is a metric space.

Definition A.2.2. A function f between two normed linear spaces (X, | · |) and (Y,∥ · ∥) is Lipschitz continuous, or Lipschitzian, if there is some nonnegative real number λ such that for each x and x′ in X,

∥ f(x) − f(x′)∥ ≤ λ|x − x′|.

It is nonexpansive if it is Lipschitzian with λ = 1. ⊓⊔

Lemma A.2.3. If (V,∥ · ∥) is a real normed linear space then ∥ · ∥ is a nonexpansive function from (V,∥ · ∥) to R.

Proof. Let v and v′ be elements of V. Then the triangle inequality for ∥ · ∥ shows that ∥v∥ ≤ ∥v − v′∥ + ∥v′∥. Interchanging v and v′ we find that ∥v′∥ ≤ ∥v′ − v∥ + ∥v∥. Therefore

|∥v∥ − ∥v′∥| ≤ ∥v − v′∥,

which shows that ∥ · ∥ is nonexpansive. ⊓⊔

If an inner product [ · , · ] is available on V, then it yields a very useful norm defined by

∥v∥ = [v,v]^{1/2}. (A.3)

In order to verify the properties of that norm, we first establish the Schwarz inequality.

Lemma A.2.4. Let V be a real linear space having an inner product [ · , · ], and for v ∈ V let ∥v∥ be defined by (A.3). If x and y are any elements of V, then

|[x,y]| ≤ ∥x∥∥y∥, (A.4)

with equality if and only if one of x and y is a scalar multiple of the other.


Proof. Choose x and y in V; if y = 0 then each side of (A.4) is zero and y is a scalar multiple of x, so the result holds in that case. If y ≠ 0 then for each α ∈ R the square of the norm of x + αy is nonnegative:

0 ≤ ∥x + αy∥² = [x + αy, x + αy] = ∥x∥² + 2α[x,y] + α²∥y∥². (A.5)

If we take α = −[x,y]/∥y∥² then (A.5) becomes

0 ≤ ∥x + αy∥² = ∥x∥² − 2[x,y]²/∥y∥² + [x,y]²/∥y∥² = ∥y∥^{−2}(∥x∥²∥y∥² − [x,y]²), (A.6)

so that (A.4) holds. Moreover, if one of x and y is a scalar multiple of the other then the two sides of (A.4) are equal, but otherwise the right-hand side of (A.6) must be positive because ∥x + αy∥ > 0, and then strict inequality must hold in (A.4). ⊓⊔
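A numerical spot-check of the Schwarz inequality (A.4), using the standard inner product on R³ (the vectors below are illustrative values only):

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    # The norm induced by the inner product, as in (A.3).
    return math.sqrt(dot(x, x))

x = [1.0, 2.0, -1.0]
y = [3.0, 0.0, 4.0]

lhs = abs(dot(x, y))          # |[x, y]| = 1.0
rhs = norm(x) * norm(y)       # ||x|| ||y|| = sqrt(6) * 5
assert lhs <= rhs             # the Schwarz inequality (A.4)

# Equality holds when one vector is a scalar multiple of the other:
z = [2.0, 4.0, -2.0]          # z = 2x
assert math.isclose(abs(dot(x, z)), norm(x) * norm(z))
```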

Proposition A.2.5. If V is a real linear space having an inner product [ · , · ], then the function ∥ · ∥ defined in (A.3) is a norm on V.

Proof. The properties of the inner product show that ∥u∥ is nonnegative for each u ∈ V, and is zero if and only if u = 0. They also show that for any scalar α,

∥αu∥ = [αu,αu]^{1/2} = |α|∥u∥.

To verify the triangle inequality, apply the Schwarz inequality to obtain

∥u + v∥² = ∥u∥² + 2[u,v] + ∥v∥² ≤ ∥u∥² + 2∥u∥∥v∥ + ∥v∥² = (∥u∥ + ∥v∥)²,

and take the nonnegative square root on each side. This also shows that for the norm ∥ · ∥, the triangle inequality is strict unless one of u and v is a scalar multiple of the other. This does not hold for norms in general. ⊓⊔

If V = Rn and [ · , · ] is the standard inner product ⟨ · , · ⟩, then the norm ∥ · ∥ constructed above is the Euclidean norm, given by ∥x∥ = (∑_{i=1}^{n} xᵢ²)^{1/2}. It is also called the 2-norm or l₂ norm.

Two other norms commonly used on Rn are the maximum norm or l∞ norm, given by

∥x∥∞ = max_{1≤i≤n} |xᵢ|, (A.7)

and the l₁ norm, given by

∥x∥₁ = ∑_{i=1}^{n} |xᵢ|. (A.8)
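The three norms can be computed side by side; a short sketch (the sample vector is an arbitrary choice):

```python
import math

def norm2(x):
    """Euclidean (l2) norm."""
    return math.sqrt(sum(t * t for t in x))

def norm_inf(x):
    """Maximum (l-infinity) norm, as in (A.7)."""
    return max(abs(t) for t in x)

def norm1(x):
    """l1 norm, as in (A.8)."""
    return sum(abs(t) for t in x)

x = [3.0, -4.0, 0.0]
print(norm2(x), norm_inf(x), norm1(x))   # 5.0 4.0 7.0
```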

The presence of these other norms raises the question of whether one might define some inner products on Rn that would yield these norms via (A.3). The following theorem characterizes those real normed linear spaces in which an inner product generates the norm.

Theorem A.2.6. Let (V,∥ · ∥) be a real normed linear space. The following are equivalent.


a. An inner product [ · , · ] exists on V with the property that for each v ∈V

[v,v] = ∥v∥2. (A.9)

b. For each u and v in V ,

∥u+ v∥2 +∥u− v∥2 = 2(∥u∥2 +∥v∥2). (A.10)

If these equivalent properties hold, then the inner product in (a) is uniquely defined for each u and v in V by

[u,v] = (1/4)(∥u+ v∥2 −∥u− v∥2). (A.11)

Proof. (a) implies (b). Suppose [ · , · ] exists and satisfies (A.9). Then for each u and v in V,

∥u + v∥² + ∥u − v∥² = [u + v, u + v] + [u − v, u − v] = 2[u,u] + 2[v,v] = 2(∥u∥² + ∥v∥²). (A.12)

Therefore (A.10) holds. Further,

∥u + v∥² − ∥u − v∥² = [u + v, u + v] − [u − v, u − v] = 4[u,v], (A.13)

so (A.11) uniquely defines the inner product.

(b) implies (a). Now suppose that (A.10) holds for each u and v in V. Define [u,v] by (A.11); we show that it satisfies (A.9) and obeys the requirements for an inner product. By taking u = v in (A.11) we find that [u,u] = ∥u∥² for each u ∈ V, so that (A.9) holds. The properties of the norm then show that [u,u] is always nonnegative and is zero if and only if u = 0, and that [u,v] = [v,u] for each u and v.

Next, choose u, u′, and v in V , and let

s = (u+u′)/2, s′ = (u−u′)/2. (A.14)

Then (A.10) yields

∥(s + v) + s′∥² + ∥(s + v) − s′∥² = 2(∥s + v∥² + ∥s′∥²),
∥(s − v) + s′∥² + ∥(s − v) − s′∥² = 2(∥s − v∥² + ∥s′∥²). (A.15)

Subtract the second equation from the first to get

(∥(s + s′) + v∥² − ∥(s + s′) − v∥²) + (∥(s − s′) + v∥² − ∥(s − s′) − v∥²)
= 2(∥s + v∥² − ∥s − v∥²),

and then use (A.11) to rewrite this as

4([s+ s′,v]+ [s− s′,v]) = 8[s,v].

Finally, divide by 4 and use (A.14) to bring this to the form


[u,v] + [u′,v] = [u + u′, v], (A.16)

which shows that part of the third requirement for an inner product holds. To complete the proof we show that for each real α and each u and v in V, [αu,v] = α[u,v]. Let

C := {α ∈ R | for each u and v in V, [αu,v] = α[u,v]}.

We will show that C = R. First, the definition of C shows that it contains 1. From (A.11) we see that 0 = [0,v], and then (A.16) shows that

0 = [0,v] = [u+(−u),v] = [u,v]+ [−u,v],

so that [−u,v] =−[u,v] and therefore −1 ∈C. If α and β belong to C then

[(α +β )u,v] = [αu+βu,v] = [αu,v]+ [βu,v] = α [u,v]+β [u,v] = (α +β )[u,v],

so that α + β ∈ C, and an induction shows that any finite sum of elements of C is in C. As any integer is a finite sum of the numbers +1 and −1, all of the integers belong to C. Now if p and q are two integers with q ≠ 0, we have

[(p/q)u, v] = q^{−1}q[(p/q)u, v] = q^{−1}[pu, v] = (p/q)[u,v],

so that every rational number is in C.

Lemma A.2.3 shows that the norm is a Lipschitz continuous function from (V,∥ · ∥) to R. Then (A.11) shows that the function [ · , · ] is continuous from V ×V to R. Now let α ∈ R and let ρₖ be a sequence of rationals converging to α. Then for each k, [ρₖu,v] = ρₖ[u,v] by what we have already shown. Taking the limit and, on the left, using the continuity of [ · , · ] we obtain [αu,v] = α[u,v]. This shows that in fact C = R as required. ⊓⊔
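Theorem A.2.6 can be probed numerically: the parallelogram law (A.10) holds for the Euclidean norm but fails for the l₁ norm, so no inner product can generate the l₁ norm via (A.3). A sketch (vectors chosen for convenience):

```python
import math

def norm2(x):
    return math.sqrt(sum(t * t for t in x))

def norm1(x):
    return sum(abs(t) for t in x)

def parallelogram_gap(norm, u, v):
    """||u+v||^2 + ||u-v||^2 - 2(||u||^2 + ||v||^2); zero exactly when (A.10) holds."""
    s = [a + b for a, b in zip(u, v)]
    d = [a - b for a, b in zip(u, v)]
    return norm(s) ** 2 + norm(d) ** 2 - 2 * (norm(u) ** 2 + norm(v) ** 2)

u, v = [1.0, 0.0], [0.0, 1.0]
print(parallelogram_gap(norm2, u, v))   # ≈ 0: the Euclidean norm satisfies (A.10)
print(parallelogram_gap(norm1, u, v))   # 4.0: the l1 norm violates (A.10)

# When (A.10) holds, the polarization identity (A.11) recovers the inner product:
polar = 0.25 * (norm2([1.0, 1.0]) ** 2 - norm2([1.0, -1.0]) ** 2)
print(polar)   # equals <u, v> = 0 for these u, v, up to rounding
```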

Although any norm on a linear space V induces a metric, it is not obvious whether the metric topologies are all the same or whether they differ for different norms. The key point here is the dimensionality of the space: if the space is finite-dimensional then they are all the same, whereas if it is infinite-dimensional then they may differ. The following definition and theorem show why this is so.

Definition A.2.7. Let V be a linear space and let | · |a and | · |b be norms on V. These norms are equivalent if there exist positive constants α and β such that for each v ∈ V,

α|v|a ≤ |v|b ≤ β |v|a. (A.17)

⊓⊔

If the norms | · |a and | · |b on V are equivalent, let v be a point of V and X be a neighborhood of v in (V, | · |a). Then by taking a small enough positive ε one can always arrange for the ε-ball about v in (V, | · |b) to be contained in X. The equivalence of two norms thus implies that the metric topologies associated with those norms are the same.


The following theorem gives some fundamental properties of a finite-dimensional linear space; then a corollary shows that any two norms on such a space are equivalent. An isometry between two linear spaces is a map that preserves distance, and an isomorphism from one onto the other is a linear map with a single-valued linear inverse.

Theorem A.2.8. If V is a real linear space having finite dimension n, then there is an isomorphism x of Rn onto V. If ∥ · ∥ is the Euclidean norm on Rn, then the function

|u| := ∥x^{−1}(u)∥

is a norm on V, and x is then an isometry from (Rn,∥ · ∥) to (V, | · |). Further, the unit sphere

S := {u ∈ V | |u| = 1}

of (V, | · |) is compact.

Proof. Let dim V = n and choose a basis v₁, . . . ,vₙ. For ξ = (ξ₁, . . . ,ξₙ) ∈ Rn define x(ξ) = ∑_{i=1}^{n} ξᵢvᵢ. Then x( · ) is a linear map from Rn onto V, because every x ∈ V is a linear combination of the vᵢ. As the vᵢ are linearly independent the coefficients in such a linear combination are unique, so that the map x is injective and it has a single-valued linear inverse defined on V by

x^{−1}(∑_{i=1}^{n} ξᵢvᵢ) = (ξ₁, . . . ,ξₙ).

Therefore x is an isomorphism of Rn onto V.

Now define a real-valued function | · | on V by |v| = ∥x^{−1}(v)∥, where ∥ · ∥ is the Euclidean norm on Rn. This function satisfies the conditions for a norm on V, so V is a normed linear space. Moreover, if ξ and ξ′ are points of Rn then

|x(ξ) − x(ξ′)| = ∥x^{−1}[x(ξ)] − x^{−1}[x(ξ′)]∥ = ∥ξ − ξ′∥, (A.18)

so that x and hence also x^{−1} are isometries between (Rn,∥ · ∥) and (V, | · |), hence in particular Lipschitz continuous.

A point u ∈ V has |u| = 1 if and only if u = x(ξ) and ∥ξ∥ = 1. Therefore S is the image x(S^{n−1}) of the unit sphere

S^{n−1} = {ξ ∈ Rn | ∥ξ∥ = 1}

of Rn under the map x. But S^{n−1} is a compact subset of Rn and x is continuous, so the image S is compact. ⊓⊔

Without the finite dimensionality requirement the unit sphere of a normed linear space may not be compact.

Corollary A.2.9. If V is a finite-dimensional real linear space, then any two norms on V are equivalent.


Proof. Let dim V = n and choose a basis v₁, . . . ,vₙ for V; let the function x be as defined in Theorem A.2.8. That theorem shows that V has a norm | · |. Write ∥ · ∥ for the Euclidean norm on Rn. We know from Lemma A.2.3 that | · | is Lipschitz continuous (in fact, nonexpansive) on V. We will now show that any norm on V is Lipschitz continuous in the metric topology of (V, | · |).

Let | · |a be any norm on V and define µ₊ = (∑_{i=1}^{n} |vᵢ|a²)^{1/2}. Choose ξ and ξ′ in Rn and define u = x(ξ) and u′ = x(ξ′). Then

| |u|a − |u′|a | ≤ |x(ξ) − x(ξ′)|a = |∑_{i=1}^{n} (ξᵢ − ξ′ᵢ)vᵢ|a ≤ ∑_{i=1}^{n} |ξᵢ − ξ′ᵢ| |vᵢ|a ≤ µ₊∥ξ − ξ′∥ = µ₊|u − u′|, (A.19)

where the first inequality follows from Lemma A.2.3, the second from the triangle inequality for | · |a, and the third from the Schwarz inequality applied to the n-tuples of real numbers involved in the sum. The last equality is from Theorem A.2.8. Then (A.19) shows that | · |a is a (Lipschitz) continuous function on (V, | · |).

Theorem A.2.8 says that the unit sphere S := {u ∈ V | |u| = 1} of (V, | · |) is compact. The function |u|a/|u| is continuous on S in the topology of (V, | · |), so it attains a maximum α₊ and a minimum α₋ on S. Moreover, α₋ must be positive because otherwise the point u₋ ∈ S at which it is attained would have |u₋|a = 0 and would therefore be zero, contradicting |u₋| = 1. For any nonzero u ∈ V we have u/|u| ∈ S, so

α₋ ≤ |u/|u||a ≤ α₊,

and therefore

α₋|u| ≤ |u|a ≤ α₊|u|. (A.20)

However, (A.20) holds also for u = 0 and therefore it holds for every u ∈ V.

If | · |b is a norm on V then the same argument produces positive constants β₋ and β₊ such that

β₋|u| ≤ |u|b ≤ β₊|u|. (A.21)

Rearrangement of (A.20) and (A.21) yields

α₊^{−1}β₋|u|a ≤ |u|b ≤ α₋^{−1}β₊|u|a,

so that | · |a and | · |b are equivalent. ⊓⊔

Corollary A.2.9 shows that all of the norms on a finite-dimensional linear space define the same metric topology. Accordingly, we are free to use whichever norm is most convenient for a particular application, with no concern about the effect of the choice on the topology of the space.
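For a concrete instance of Definition A.2.7 on Rⁿ, the l₁ and l∞ norms satisfy ∥x∥∞ ≤ ∥x∥₁ ≤ n∥x∥∞ (a standard pair of bounds, stated here without proof); a sketch checking them on random vectors:

```python
import random

def norm1(x):
    return sum(abs(t) for t in x)

def norm_inf(x):
    return max(abs(t) for t in x)

# Equivalence constants alpha = 1 and beta = n in (A.17), taking |.|_a to be
# the l-infinity norm and |.|_b the l1 norm.
random.seed(0)
n = 5
for _ in range(100):
    x = [random.uniform(-10, 10) for _ in range(n)]
    assert norm_inf(x) <= norm1(x) <= n * norm_inf(x)
print("bounds verified on 100 random vectors")
```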


A.2.1 Exercises for Section A.2

Exercise A.2.10. For each of the norms ∥ · ∥∞ and ∥ · ∥₁ determine the dimensions n, if any, for which there is an inner product [ · , · ] on Rn such that for each x, [x,x] is the square of the norm of x. Prove your assertions.

Exercise A.2.11. Given a positive dimension n, find constants µ₋ and µ₊, which may depend on n, such that for each x ∈ Rn

µ₋∥x∥∞ ≤ ∥x∥ ≤ µ₊∥x∥∞, (A.22)

where ∥ · ∥ is the Euclidean norm. Show that your constants are sharp: that is, show that with your µ₋ and µ₊, for each of the two inequalities in (A.22) there is a nonzero point of Rn at which that inequality becomes an equality.

A.2.2 Notes and references

The characterization of inner product spaces in Theorem A.2.6 is due to Jordan and von Neumann [19, Theorem I]. Their characterization is for complex spaces; the proof we give here for real spaces is slightly simpler.

A.3 Linear spaces of matrices

Write Rm×n for the set of real m×n matrices, with each of m and n being at least 1. When equipped with the usual operations of matrix addition and of multiplication by scalars, this becomes a linear space in which the zero element is the zero matrix. The resulting space is of dimension mn because the matrices Aij having 1 in position (i, j) and zero elsewhere form a basis.

A useful inner product on Rm×n, defined below, employs the trace of a square matrix. The trace is the sum of the diagonal elements, but it is also the sum of the eigenvalues of the matrix. To see why, let the matrix be F ∈ Rn×n; then the eigenvalues λ₁, . . . ,λₙ are the zeros of the characteristic polynomial

P(λ) = det(λIₙ − F) = ∏_{i=1}^{n} (λ − λᵢ). (A.23)

The product on the right in (A.23) shows that the coefficient of λ^{n−1} in P(λ) must be

α := −∑_{i=1}^{n} λᵢ. (A.24)

However, the only way in which a product containing λ^{n−1} can occur in the expansion of the determinant in (A.23) is if n − 1 of the elements come from diagonal entries (λ − Fᵢᵢ) in which λ is chosen for inclusion in the product, and the nth element comes from the remaining diagonal entry but with −Fᵢᵢ selected instead of λ. This element must come from the remaining diagonal entry because no two elements in such a product can come from the same row or the same column, and all of the other rows and columns have already been used for the first n − 1 terms. Then the product is −Fᵢᵢλ^{n−1}, and as we can select any i for the element −Fᵢᵢ there are n of these products. Accordingly, we have α = −∑_{i=1}^{n} Fᵢᵢ, which together with (A.24) implies

tr F = ∑_{i=1}^{n} Fᵢᵢ = ∑_{i=1}^{n} λᵢ.
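For a 2×2 matrix the eigenvalues are available in closed form from the characteristic polynomial, so the identity tr F = ∑λᵢ can be checked directly (the example matrix is an arbitrary choice):

```python
import math

# For F in R^{2x2}, P(t) = t^2 - (tr F) t + det F, with roots
# ((tr F) ± sqrt((tr F)^2 - 4 det F)) / 2.
F = [[2.0, 1.0],
     [1.0, 2.0]]

trace = F[0][0] + F[1][1]
det = F[0][0] * F[1][1] - F[0][1] * F[1][0]
disc = math.sqrt(trace * trace - 4 * det)   # real, since this F is symmetric
eigs = [(trace + disc) / 2, (trace - disc) / 2]

print(eigs)                # [3.0, 1.0]
print(sum(eigs), trace)    # 4.0 4.0: the trace equals the sum of the eigenvalues
```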

Definition A.3.1. The standard inner product on Rm×n assigns to points A and B the number ⟨A,B⟩ = tr(AB*).

The trace of the m×m matrix AB* has the form

tr(AB*) = ∑_{i=1}^{m} (AB*)ᵢᵢ = ∑_{i=1}^{m} ∑_{j=1}^{n} AᵢⱼBᵢⱼ. (A.25)

Equation (A.25) shows that ⟨A,B⟩ = ⟨B,A⟩, and that the inner product is linear in the first argument with the second held constant (and vice versa). Setting B = A we see that

⟨A,A⟩ = ∑_{i=1}^{m} ∑_{j=1}^{n} Aᵢⱼ²; (A.26)

hence ⟨A,A⟩ is always nonnegative and is zero if and only if A = 0. Therefore the quantity in Definition A.3.1 really is an inner product on Rm×n.

As ⟨ · , · ⟩ is an inner product on Rm×n, Proposition A.2.5 says that it defines a norm on that space. From (A.26) we see that this norm must be

∥A∥F := ⟨A,A⟩^{1/2} = (∑_{i=1}^{m} ∑_{j=1}^{n} Aᵢⱼ²)^{1/2}, (A.27)

which is usually called the Frobenius norm. This definition shows that for any such A we have ∥A∥F = ∥A*∥F.

The Frobenius norm is submultiplicative: that is, it satisfies the property in (A.28) below.

Proposition A.3.2. Let A ∈ Rm×n and B ∈ Rn×k. Then

∥AB∥F ≤ ∥A∥F∥B∥F . (A.28)

The three Frobenius norms in (A.28) are in three different spaces: Rm×k, Rm×n, andRn×k.

Proof. Write aᵢ· for the ith row of A and b·ⱼ for the jth column of B. The element (AB)ᵢⱼ of the product is ⟨aᵢ·, b·ⱼ⟩, and by the Schwarz inequality (in Rn) its absolute value is not greater than the product ∥aᵢ·∥∥b·ⱼ∥ of the 2-norms of the two vectors. Therefore

∥AB∥F² = ∑_{i=1}^{m} ∑_{j=1}^{k} |(AB)ᵢⱼ|² ≤ ∑_{i=1}^{m} ∑_{j=1}^{k} ∥aᵢ·∥²∥b·ⱼ∥² = (∑_{i=1}^{m} ∥aᵢ·∥²)(∑_{j=1}^{k} ∥b·ⱼ∥²) = ∥A∥F²∥B∥F².

⊓⊔
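The Frobenius norm and the bound (A.28) can be checked on a small example (the matrices are arbitrary choices):

```python
import math

def frob(A):
    """Frobenius norm (A.27)."""
    return math.sqrt(sum(a * a for row in A for a in row))

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1.0, 2.0],
     [3.0, 4.0]]
B = [[0.0, 1.0],
     [1.0, 0.0]]

lhs = frob(matmul(A, B))
rhs = frob(A) * frob(B)
print(lhs, rhs)    # lhs <= rhs, as (A.28) asserts
assert lhs <= rhs
```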

Definition A.3.3. Let | · |M be a norm on Rm×n and let | · |m and | · |n be vector norms on Rm and Rn respectively. The triple (| · |m, | · |M, | · |n) is consistent if for each A ∈ Rm×n and each x ∈ Rn,

|Ax|m ≤ |A|M|x|n. (A.29)

If a norm | · | is defined on Rn for each n, as is the Euclidean norm, and a matrix norm ∥ · ∥ is defined on Rm×n for each pair (m,n), then if for each such pair the triple (| · |, ∥ · ∥, | · |) is consistent we say the matrix norm ∥ · ∥ is subordinate to the vector norm | · |.

Corollary A.3.4. The Frobenius norm is subordinate to the Euclidean vector norm.

Proof. Take k = 1 in Proposition A.3.2 and observe that the Frobenius norm of an n×1 matrix is the same as the Euclidean norm of the n-vector having the same components. ⊓⊔

The Frobenius norm is important because it comes from the standard inner product of Definition A.3.1, but it sometimes produces surprising results. For example, if we take m = n and A = Iₙ in (A.29) we obtain the bound ∥x∥ ≤ n^{1/2}∥x∥, which is surely true but not particularly useful. For that reason we will investigate some other possible norms for Rm×n. The first of these is the very important 2-norm of A, also called the spectral norm. We usually write ∥A∥ for this norm, but if multiple norms are in use we write ∥A∥₂.

Definition A.3.5. The 2-norm of A ∈ Rm×n is

∥A∥ := max_{x∈S^{n−1}} ∥Ax∥.

⊓⊔

We can write max instead of sup because S^{n−1} is compact and the Euclidean vector norm ∥ · ∥ is continuous.

Definition A.3.5 shows that ∥A∥ is nonnegative, that it equals zero if and only if A = 0, and that ∥αA∥ = |α|∥A∥ for any scalar α. A calculation using the definition along with the triangle inequality for the vector norm also shows that if C ∈ Rm×n then

∥A + C∥ ≤ ∥A∥ + ∥C∥.


Therefore (Rm×n,∥ · ∥) is a normed linear space.

Definition A.3.5 also shows that for each x ∈ Rn one has ∥Ax∥ ≤ ∥A∥∥x∥, so the 2-norm on Rm×n is subordinate to the Euclidean vector norm. Also, if D ∈ Rm×k and E ∈ Rk×n, then for each x ∈ S^{n−1},

∥(DE)x∥ = ∥D(Ex)∥ ≤ ∥D∥∥Ex∥ ≤ ∥D∥∥E∥∥x∥ = ∥D∥∥E∥,

so that ∥DE∥ ≤ ∥D∥∥E∥ and therefore ∥ · ∥ is submultiplicative.

To develop an additional characterization of the 2-norm, recall that the elements of a set {v₁, . . . ,vₘ} of points in Rn are orthonormal if for each i and j, ⟨vᵢ,vⱼ⟩ = δᵢⱼ, where the right-hand side is the Kronecker delta taking the values 0 if i ≠ j and 1 if i = j. If we let V be the element of Rn×m having columns v₁, . . . ,vₘ, then V*V is the identity Iₘ of Rm. The columns of V must be linearly independent, because if 0 = Vz then 0 = V*Vz = z.

If m = n then we call such a V an orthogonal matrix. In that case the columns of V form a basis of Rn. Given any x ∈ Rn we can then find y ∈ Rn with Vy = x; then

(VV*)x = V(V*V)y = Vy = x,

so that VV* = Iₙ = V*V. The first of these equalities shows that the rows of V are also orthonormal.

Matrices with orthonormal columns have the very convenient property that they do not change the Euclidean norm of a vector.

Lemma A.3.6. Let A ∈ Rk×n have orthonormal columns a₁, . . . ,aₙ. Then k ≥ n and for each x ∈ Rn,

∥Ax∥= ∥x∥. (A.30)

Proof. For each x ∈ Rn we have

∥Ax∥² = ⟨Ax,Ax⟩ = ⟨x, A*Ax⟩. (A.31)

However, the element (A*A)ᵢⱼ is ⟨aᵢ,aⱼ⟩ = δᵢⱼ, so that A*A = Iₙ, the identity of Rn. Then (A.31) yields ∥Ax∥² = ∥x∥² and therefore (A.30). But (A.30) also shows that A has full column rank, so we have k ≥ n. ⊓⊔

Proposition A.3.7. Let A ∈ Rm×n. Then ∥A∥² is the largest eigenvalue of A*A.

Proof. If x ∈ Rn then ⟨x, A*Ax⟩ = ∥Ax∥², so A*A is positive semidefinite. It is also symmetric, so there exist an orthogonal matrix Q ∈ Rn×n and a diagonal matrix Λ ∈ Rn×n with nonnegative diagonal elements λ₁ ≥ . . . ≥ λₙ (the eigenvalues of A*A) such that A*A = QΛQ*. Then for each x ∈ Rn,

∥Ax∥² = ⟨Ax,Ax⟩ = ⟨x, A*Ax⟩ = ⟨x, QΛQ*x⟩ = ⟨Q*x, Λ(Q*x)⟩ = ∑_{i=1}^{n} λᵢ(Q*x)ᵢ² ≤ λ₁ ∑_{i=1}^{n} (Q*x)ᵢ² = λ₁∥Q*x∥² = λ₁∥x∥², (A.32)


where the last equality is from Lemma A.3.6. It follows that the maximum of ∥Ax∥² for x ∈ S^{n−1} is not greater than λ₁. However, if the orthonormal columns of Q are q₁, . . . ,qₙ then q₁ ∈ S^{n−1} and

∥Aq₁∥² = ⟨q₁, A*Aq₁⟩ = ⟨q₁, λ₁q₁⟩ = λ₁,

showing that the maximum is exactly λ₁, the largest eigenvalue of A*A. ⊓⊔
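Proposition A.3.7 gives a direct way to compute ∥A∥ for a 2×2 matrix, since the largest eigenvalue of the symmetric 2×2 matrix A*A is available from the quadratic formula (the example matrix is an arbitrary choice):

```python
import math

A = [[3.0, 0.0],
     [4.0, 5.0]]

# Form A^T A (A is real, so A* = A^T here).
AtA = [[sum(A[k][i] * A[k][j] for k in range(2)) for j in range(2)]
       for i in range(2)]

# Largest eigenvalue of the symmetric 2x2 matrix A^T A via its
# characteristic polynomial t^2 - (tr) t + det.
tr = AtA[0][0] + AtA[1][1]
det = AtA[0][0] * AtA[1][1] - AtA[0][1] * AtA[1][0]
lam_max = (tr + math.sqrt(tr * tr - 4 * det)) / 2

spectral_norm = math.sqrt(lam_max)
print(spectral_norm)   # sqrt(45) ≈ 6.708 for this A
```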

Corollary A.3.8. If Q ∈ Rn×n is an orthogonal matrix, then ∥Q∥= 1.

Proof. Proposition A.3.7 says that ∥Q∥² is the largest eigenvalue of Q*Q, which is the identity and therefore has all of its eigenvalues equal to 1. ⊓⊔

We have seen already that an orthogonal matrix does not change the length of a vector. A somewhat similar invariance property holds for the 2-norms of matrices.

Proposition A.3.9. Let A ∈ Rm×n, and let P ∈ Rm×m and Q ∈ Rn×n be orthogonal matrices. Then ∥PAQ*∥ = ∥A∥.

Proof. Let x ∈ S^{n−1}; then

∥Ax∥ = ∥P(Ax)∥ = ∥(PAQ*)(Qx)∥ ≤ ∥PAQ*∥∥Qx∥ = ∥PAQ*∥,

where the first equality holds because P does not change the length of Ax and the last holds because Q does not change the length of x. Therefore ∥A∥ ≤ ∥PAQ*∥. But also

∥PAQ*∥ ≤ ∥P∥∥A∥∥Q*∥ = ∥A∥,

where the equality is from Corollary A.3.8. ⊓⊔

If we were to apply Proposition A.3.7 to A* then we would find that ∥A*∥² is the largest eigenvalue of the matrix AA*, which in general is not equal to A*A and is not even of the same dimensions unless m = n. The following result shows that this difference does not matter.

Theorem A.3.10. If A ∈ Rm×n and C ∈ Rn×m, then the nonzero eigenvalues of AC and of CA are the same in value and in multiplicity.

Proof. Let Iₖ be the identity matrix of Rk and define matrices F and G in R(n+m)×(n+m) by

F = [ τIₘ  A ; C  Iₙ ],   G = [ Iₘ  0 ; −C  τIₙ ],

where the semicolon separates the block rows. Then

FG = [ τIₘ − AC  τA ; 0  τIₙ ],   GF = [ τIₘ  A ; 0  τIₙ − CA ],

and taking determinants yields

τⁿ det(τIₘ − AC) = det FG = det GF = τᵐ det(τIₙ − CA). (A.33)


If λ is a nonzero eigenvalue of one of AC and CA with multiplicity k then (τ − λ)^k is a factor of the characteristic polynomial of that matrix. We see from (A.33) that it must also be a factor of the characteristic polynomial of the other matrix, so any nonzero eigenvalue of one matrix is a nonzero eigenvalue of the other with the same multiplicity. ⊓⊔

Corollary A.3.11. If A ∈ Rm×n then ∥A∗∥= ∥A∥.

Proof. Proposition A.3.7 says that ∥A*∥² is the largest eigenvalue of AA*, and that ∥A∥² is the largest eigenvalue of A*A. Theorem A.3.10 says that these are the same.

⊓⊔
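Theorem A.3.10 is easy to check when A is a single column and C a single row: then CA is a scalar, while AC is a rank-one square matrix whose only nonzero eigenvalue is that same scalar. A sketch with m = 2, n = 1:

```python
# A in R^{2x1} with column a, C in R^{1x2} with row c:
# CA is 1x1 and AC is 2x2.
a = [1.0, 2.0]
c = [3.0, 4.0]

CA = c[0] * a[0] + c[1] * a[1]       # the single entry of CA: 11.0
AC = [[a[i] * c[j] for j in range(2)] for i in range(2)]

# Eigenvalues of the 2x2 matrix AC come from t^2 - (tr AC) t + det AC;
# here det AC = 0, so the eigenvalues are 0 and tr AC.
tr_AC = AC[0][0] + AC[1][1]
det_AC = AC[0][0] * AC[1][1] - AC[0][1] * AC[1][0]
print(CA, tr_AC, det_AC)   # 11.0 11.0 0.0: the nonzero eigenvalues agree
```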

A.3.1 Exercises for Section A.3

Exercise A.3.12. Suppose we define a function on Rm×n by ∥A∥X := max_{1≤i≤m} max_{1≤j≤n} |Aᵢⱼ|.

a. Show that ∥ · ∥X is a matrix norm.
b. Exhibit matrices A and B conformable for multiplication, with ∥AB∥X > ∥A∥X∥B∥X.

A.3.2 Notes and references

The proof of Theorem A.3.10 given here is due to Josef Schmid [43].

A.4 The singular value decomposition of a matrix

The singular value decomposition (SVD) of a general real matrix is a versatile and useful tool both for analysis and for computation. This section establishes the existence of the SVD and some of its properties.

Theorem A.4.1. Let m and n be positive integers and A ∈ Rm×n. Then there are orthogonal matrices U ∈ Rm×m and V ∈ Rn×n such that

A = UΣV*, (A.34)

where the diagonal elements σ₁, . . . ,σ_{min{m,n}} of Σ, called the singular values of A, are nonnegative and nonincreasing, and all other elements of Σ are zero.

Proof. If A = 0 we can take U = Iₘ, V = Iₙ, and Σ = 0. Therefore suppose that A has at least one nonzero element. There is also no loss of generality in assuming that m ≤ n, as otherwise we can work with A*.

The matrix AA* is symmetric and positive semidefinite, so there exist an orthogonal matrix U ∈ Rm×m and a positive semidefinite diagonal matrix Λ ∈ Rm×m such that AA* = UΛU*, where the diagonal elements λ₁, . . . ,λₘ of Λ are nonincreasing. Of these, assume that λ₁, . . . ,λᵣ are positive, and let those elements form the diagonal of a positive definite matrix ∆ ∈ Rr×r; then

Λ = [ ∆  0 ; 0  0 ].

If r = m then some of the blocks in the following derivation disappear, but that does not affect the argument.

Let

T := A*U = [T₁ T₂],

where T₁ ∈ Rn×r and T₂ ∈ Rn×(m−r). Then

[ ∆  0 ; 0  0 ] = Λ = U*AA*U = T*T = [ T₁*T₁  T₁*T₂ ; T₂*T₁  T₂*T₂ ]. (A.35)

This shows that T₂*T₂ = 0, but as tr T₂*T₂ = ∥T₂∥F² this implies that T₂ = 0. Also, (A.35) shows that T₁*T₁ = ∆, so that

Iᵣ = (T₁∆^{−1/2})*(T₁∆^{−1/2}),

and therefore V₁ := T₁∆^{−1/2} has orthonormal columns. Here and later in the proof we always use the nonnegative square root.

Adjoin to V₁ a matrix V₂ ∈ Rn×(n−r) such that the resulting matrix V := [V₁ V₂] is orthogonal. Then

U*AV = T*V = [ T₁*T₁∆^{−1/2}  T₁*V₂ ; T₂*T₁∆^{−1/2}  T₂*V₂ ] = [ ∆^{1/2}  0 ; 0  0 ], (A.36)

where we used the relations

T₁*T₁ = ∆,  T₂ = 0,  T₁*V₂ = ∆^{1/2}V₁*V₂ = 0.

Defining Σ to be the matrix on the right in (A.36), we obtain U*AV = Σ, so that (A.34) holds. The singular values in Σ are nonnegative, and are nonincreasing because their squares λ₁, . . . ,λₘ were nonincreasing. ⊓⊔

The matrices U and V appearing in the SVD are generally not unique. For example, if A = Iₙ then for any orthogonal Q ∈ Rn×n, A = Q(Iₙ)Q*.

Although the method used in the proof of Theorem A.4.1 provides a fairly transparent derivation, in general one should not use it for computation because calculating AA* and A*A can introduce needless inaccuracy [14, Example 5.3.2]. For discussion of some SVD algorithms, see [14, Section 5.4.5]. Good software is available: one example is the routine GESVD of LAPACK, available through Netlib [26].

Corollary A.4.2. The singular values of A are unique, and are the nonnegative square roots of the min{m,n} largest eigenvalues of AA* or equivalently of A*A.


The columns of U and of V are complete sets of eigenvectors for AA* and for A*A respectively.

Proof. Use (A.34) to obtain AA* = UΛU*, where Λ := ΣΣ*. This shows that σ₁², . . . ,σₘ² are the eigenvalues of AA* in nonincreasing order, and that the columns of U form a complete set of eigenvectors of AA*. Uniqueness of the singular values follows, because they are nonnegative and because there is a unique list of eigenvalues in nonincreasing order. A similar calculation using A*A shows that all of its nonzero eigenvalues are included in σ₁², . . . ,σₘ², and that the columns of V form a complete set of eigenvectors. ⊓⊔

In the proof of Theorem A.4.1 we used U to compute V. In situations where the nonzero eigenvalues of AA* are not distinct it may not be possible to start with arbitrary choices of eigenvectors U of AA* and V of A*A and then to obtain (A.34) from these (Exercise A.4.4).

Corollary A.4.3. Let r be the maximum index i for which σᵢ > 0, or be zero if Σ = 0. This r is then the rank of A, and if r > 0 then im A is the span of the first r columns of U and ker A is the span of the last n − r columns of V.

Proof. If r = 0 then A = 0 and there is nothing to prove. If r > 0 use (A.34) to obtain im A ⊂ im UΣ. For the opposite inclusion, suppose t ∈ im UΣ, so that for some y,

t = UΣy = (UΣV*)(Vy) = A(Vy) ∈ im A.

Therefore

im A = im UΣ = im [u₁, . . . ,uᵣ],

where uᵢ denotes the ith column of U. Returning to (A.34) we see that ker A ⊃ ker ΣV*. However, if s ∈ ker A then 0 = As = (UΣV*)s. Multiplying by U* we see that s ∈ ker ΣV*, so ker A = ker ΣV*. The right-hand side is the set of elements orthogonal to the first r columns of V, which is the span of the last n − r columns of V. The assertion that r is the rank of A follows from im A = im UΣ and the fact that U is nonsingular. ⊓⊔

The SVD is an extremely useful tool that can clarify considerably the structure and properties of a matrix. For an example, let A ∈ Rm×n, and let

A = UΣV* (A.37)

be an SVD of A. If the diagonal elements of Σ are σ₁ ≥ . . . ≥ σ_{min{m,n}} and if of these σ₁, . . . ,σᵣ are positive, for some positive r ≤ min{m,n}, then we can partition U and V by taking the first r columns of each to create matrices U₊ and V₊, then writing

U = [U₊ U₀], V = [V₊ V₀]. (A.38)

The matrices U₊ and V₊ have orthonormal columns, but they will not be orthogonal matrices unless they are square.


As σₖ = 0 if k > r, only columns of U₊ and columns of V₊ (i.e., rows of (V₊)*) are involved in the product (A.37), so that

A = U₊Σ₊(V₊)*, (A.39)

where Σ₊ ∈ Rr×r contains those elements Σᵢⱼ with 1 ≤ i, j ≤ r and therefore has a strictly positive diagonal.

Corollary A.4.3 shows that

im A = im U₊ (A.40)

and

ker A = im V₀. (A.41)

If A ∈Rµ×ν and B ∈Rν×ρ and if we write a(· j) for the jth column of A and b( j ·)for the jth row of B, then

AB =ν

∑j=1

a(· j)b( j ·), (A.42)

where the right-hand side is a sum of ν matrices, each of rank not more than 1. Thisformula holds because if for fixed m and r we compute the element of the right sidein position (m,r) we obtain ∑ν

j=1 am j b jr, and this is the same as (AB)mr.Applying this to (A.39) and writing uk for the kth column of U+ and vk for the

kth column of V+, we obtain

A =r

∑k=1

ukσk(vk)∗, (A.43)

which expresses A as the sum of r rank-one matrices, where r is the rank of A.
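These structural facts are easy to check numerically. The sketch below (a hypothetical example using numpy, with a randomly generated rank-3 matrix) verifies the partition (A.38), the kernel description (A.41), and the rank-one expansion (A.43):

```python
import numpy as np

# Numerical check of (A.38)-(A.43) on a made-up rank-3 matrix.
# numpy returns A = U @ diag(s) @ Vh with s in nonincreasing order.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 4))  # rank 3
U, s, Vh = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))            # numerical rank

U_plus = U[:, :r]                      # first r columns of U, as in (A.38)
V0 = Vh[r:, :].T                       # last n - r columns of V

# (A.43): A is the sum of r rank-one matrices u_k sigma_k (v_k)*
A_sum = sum(s[k] * np.outer(U[:, k], Vh[k, :]) for k in range(r))
```

Here `A @ V0` should be (numerically) zero, reflecting kerA = imV0, and `A_sum` should reproduce `A`.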

A.4.1 Exercises for Section A.4

Exercise A.4.4. Give an example of a matrix A ∈ Rm×n, and orthogonal matrices U ∈ Rm×m and V ∈ Rn×n whose columns are eigenvectors of AA∗ and A∗A respectively, such that U∗AV is not an SVD of A.

A.4.2 Notes and references

The singular value decomposition appears to be due originally to Beltrami and Jordan in the latter part of the nineteenth century, but much more information has appeared since then. For an excellent discussion of its early history, see [48].

The method of proving existence follows [1, Theorem A.2.3].


A.5 Generalized inverses of linear operators

In general, linear operators don't have single-valued inverses. But if we relax the requirements slightly then for any linear operator we can define a generalized inverse that acts somewhat like an inverse, in a sense made precise below. It becomes the inverse of the operator if the latter is nonsingular.

A.5.1 Existence of generalized inverses

Definition A.5.1. Two subspaces U and V of Rk are independent if U ∩V = {0}. If U and V are independent and their dimensions sum to k then they are complementary to each other. ⊓⊔

Independence of U and V is equivalent to the requirement that U +V be the direct sum U ⊕V; complementarity is equivalent to having Rk = U ⊕V.

Definition A.5.2. A projector on Rk is a function f : Rk → Rk that is idempotent: i.e., f ◦ f = f . If a projector P is a linear transformation then we call it a linear projector. In the case of a linear projector P, if U and V are complementary subspaces of Rk such that U = imP and V = kerP, then we call P the linear projector on U along V . ⊓⊔

In the last sentence of Definition A.5.2, P is “the” linear projector on U along V because it is unique. To see that, observe first that P fixes any element u of U, because as U = imP we must have u = Py for some y, and then

Pu = P(Py) = Py = u.

Now let x ∈ Rk and let the unique expression of x as the sum of an element in U and an element in V be x = xU + xV. The linear operator P fixes U and annihilates V, so Px = PxU = xU. Thus the direct-sum decomposition completely determines the values of P, so P is unique. We will write πU,V for this projector.
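As an illustration (not part of the text), the matrix of πU,V can be computed from basis matrices of the two complementary subspaces: write x in the combined basis and keep only the U-part. A minimal numpy sketch, with a made-up pair of complementary subspaces of R3:

```python
import numpy as np

# Sketch: matrix of the linear projector on U along V, where the columns
# of Bu and Bv are bases of the complementary subspaces U and V of R^k.
def projector(Bu, Bv):
    B = np.hstack([Bu, Bv])            # nonsingular since U, V are complementary
    E = np.zeros((B.shape[1], B.shape[1]))
    E[:Bu.shape[1], :Bu.shape[1]] = np.eye(Bu.shape[1])
    # x = Bu cU + Bv cV  with  c = B^{-1} x;  pi_{U,V} x = Bu cU
    return B @ E @ np.linalg.inv(B)

Bu = np.array([[1.0], [0.0], [0.0]])                 # U = x-axis
Bv = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # a non-orthogonal complement
P = projector(Bu, Bv)
```

P is idempotent, fixes U, and annihilates V; it is not symmetric here because V is not the orthogonal complement of U.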

A linear projector is orthogonal if its kernel is the orthogonal complement of its image. Such projectors have a special property.

Proposition A.5.3. A linear projector P on Rn is orthogonal if and only if it is symmetric.

Proof. If a linear projector P is symmetric then kerP = (imP∗)⊥ = (imP)⊥, so that P is an orthogonal projector. Conversely, if P is an orthogonal projector then kerP = (imP)⊥, so that for each x and y in Rn,

0 = ⟨(I −P)x,Py⟩= ⟨P∗(I −P)x,y⟩.

This implies that P∗(I −P) is the zero operator, and therefore so also is its transpose (I −P)∗P. Expanding, we find that

P = P∗P = P∗,

so that P is symmetric. ⊓⊔

Let A : Rn → Rm be a linear transformation of rank r. Its image R ⊂ Rm then has dimension r, and its kernel K ⊂ Rn has dimension n−r. Choose any subspaces M ⊂ Rm of dimension m−r and N ⊂ Rn of dimension r such that Rm = M⊕R and Rn = N ⊕K. The proof of the following theorem shows how to construct a linear operator A−, the generalized inverse of A having image N and kernel M.

Theorem A.5.4. Let A : Rn → Rm be a linear transformation having image R and kernel K, and let M and N be subspaces of Rm and Rn respectively such that R⊕M = Rm and K ⊕N = Rn. There is then a unique linear transformation A− : Rm → Rn such that

imA− = N, kerA− = M, AA− = πR,M, A−A = πN,K. (A.44)

Proof. The restriction τ := A|N of A to N is a linear operator from N into Rm, and because K = kerA we have

A = τ ◦πN,K. (A.45)

This shows that R = imA ⊂ imτ, and the opposite inclusion also holds because τ is a restriction of A. Therefore τ takes N onto R. Moreover, its kernel is {0} because N and K are independent, so that τ is actually an isomorphism of N onto R. It follows that τ has a single-valued inverse τ−1 : R → Rn with image N.

Let

A− = τ−1 ◦πR,M. (A.46)

Then A− is a linear transformation from Rm to Rn with image N and kernel M. We have

AA− = τ ◦πN,K ◦ τ−1 ◦πR,M = πR,M, (A.47)

because the image of τ−1 is N, so that πN,K ◦ τ−1 = τ−1. Similarly,

A−A = τ−1 ◦πR,M ◦ τ ◦πN,K = πN,K, (A.48)

because πR,M ◦ τ = τ.

To show uniqueness, let B and C be linear operators from Rm to Rn, each having the properties claimed for A− in (A.44). Then

B = BπR,M = BAC = πN,KC = C,

so that A− is unique. ⊓⊔
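The construction in the proof can be traced numerically. The sketch below (a made-up 2×2 example, with deliberately non-orthogonal complements M and N) builds A− = τ−1 ◦πR,M and checks the projector identities in (A.44); the projector matrices are built from bases as described after Definition A.5.2.

```python
import numpy as np

# Sketch of Theorem A.5.4 on a made-up rank-1 matrix A, with
# non-orthogonal complements M (of R = imA) and N (of K = kerA).
def proj(Bu, Bv):
    """Matrix of the linear projector on im Bu along im Bv."""
    B = np.hstack([Bu, Bv])
    E = np.zeros((B.shape[1], B.shape[1]))
    E[:Bu.shape[1], :Bu.shape[1]] = np.eye(Bu.shape[1])
    return B @ E @ np.linalg.inv(B)

A = np.array([[1.0, 0.0], [0.0, 0.0]])  # R = x-axis, K = y-axis
Br = np.array([[1.0], [0.0]])           # basis of R
Bk = np.array([[0.0], [1.0]])           # basis of K
Bm = np.array([[1.0], [1.0]])           # M, a complement of R
Bn = np.array([[1.0], [1.0]])           # N, a complement of K

P_RM = proj(Br, Bm)                     # pi_{R,M}
P_NK = proj(Bn, Bk)                     # pi_{N,K}

# tau = A|_N acts on coordinates as A @ Bn; its left inverse carries R
# back into N, so as a matrix A^- = tau^{-1} o pi_{R,M} is:
A_minus = Bn @ np.linalg.pinv(A @ Bn) @ P_RM
```

With these choices one can check by hand that A− = [[1,−1],[1,−1]], which is not the Moore-Penrose inverse of A precisely because M and N are not orthogonal complements.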

For much more information on generalized inverses of various kinds, and many additional references, see [25].


A.5.2 The Moore-Penrose generalized inverse

If we specialize the construction of Section A.5.1 by requiring our two direct-sum decompositions to be orthogonal, then we have

N = K⊥ = imA∗, M = R⊥ = kerA∗. (A.49)

In that case the projectors

AA− = πR,M, A−A = πN,K (A.50)

will be orthogonal and so, by Proposition A.5.3, symmetric. Then it is customary to write A+ in place of A− for the generalized inverse; this is the Moore-Penrose generalized inverse of A. For more on this special case, including computational methods, see [18, Section 1.3] and [14, Section 5.5.4].

Some presentations of the Moore-Penrose inverse define it as a matrix X ∈ Rn×m satisfying the conditions

a. (AX)∗ = AX,
b. (XA)∗ = XA,
c. AXA = A,
d. XAX = X. (A.51)

We can put this into the general framework already developed.

Proposition A.5.5. Let A ∈ Rm×n with imA = R and kerA = K, and let X ∈ Rn×m. The following are equivalent.

1. X satisfies the conditions given for A− in Theorem A.5.4 with the choices N = K⊥ and M = R⊥.
2. X satisfies (A.51).

If these equivalent statements hold, then X is unique.

Proof. First suppose that assertion (1) holds. Then AX = πR,M = πR,R⊥, so AX is an orthogonal projector and therefore is symmetric. This establishes (a) in (A.51), and a similar argument proves (b). The proof of Theorem A.5.4 shows that kerX = M, so let y ∈ Rm and write it uniquely as y = yR + yM. Then

(XAX)y = X(AX)y = XπR,My = XyR = Xy,

where the last equality holds because yM ∈ kerX. Therefore (d) in (A.51) holds, and a similar argument establishes (c) and therefore assertion (2).

Next, suppose assertion (2) holds. Using (c) we obtain (AX)2 = (AXA)X = AX, so AX is a projector; then (a) together with Proposition A.5.3 shows that it is an orthogonal projector. We have imAX ⊂ imA, while if y ∈ imA then for some x ∈ Rn we have

y = Ax = (AXA)x = (AX)Ax = (AX)y,

which shows that imAX = imA. Also kerX ⊂ kerAX , while if (AX)z = 0 then

0 = X(AX)z = Xz,

showing that kerAX = kerX . We have thus shown that

AX = πimA,kerX , kerX = (imA)⊥, (A.52)

where the second equality comes from the fact that AX is an orthogonal projector with kernel kerX and image imA. A similar argument shows that

XA = πimX ,kerA, kerA = (imX)⊥. (A.53)

Now if we let R = imA and K = kerA, then (A.52) shows that AX = πR,R⊥ and kerX = R⊥, while (A.53) shows that XA = πK⊥,K and imX = K⊥. Therefore assertion (1) holds. The uniqueness claim follows because if assertion (1) holds then Theorem A.5.4 guarantees that A− is unique. ⊓⊔

Let the rank of A be r ≥ 1 and write the special form of the SVD given in (A.39):

A =U+Σ+(V+)∗.

Then one can verify directly that the Moore-Penrose generalized inverse of the linear operator represented by A is the operator A+ : Rm → Rn represented by

A+ = V+(Σ+)−1(U+)∗, (A.54)

by simply checking that with the choices in (A.49) the operator given in (A.54) satisfies the requirements in (A.44).
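A quick numerical confirmation (a hypothetical rank-2 example using numpy; `np.linalg.pinv` computes the Moore-Penrose inverse independently, and the four conditions checked are those of (A.51)):

```python
import numpy as np

# Check (A.54) and the Penrose conditions (A.51) on a made-up
# rank-deficient matrix.
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 5))  # rank 2
U, s, Vh = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))

# A+ = V+ (Sigma+)^{-1} (U+)*  as in (A.54)
A_plus = Vh[:r, :].T @ np.diag(1.0 / s[:r]) @ U[:, :r].T
```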

A.5.3 Notes and references

E. H. Moore originally developed what is now called the Moore-Penrose generalized inverse [24]. Some interesting historical background is in [4]. The idea was rediscovered and extended by Penrose [29].

A.6 Affine sets

Affine sets are very simple sets whose properties follow from basic facts of linear algebra. They can be points, lines, or flat surfaces (like subspaces but not necessarily passing through the origin). In this section we explain their basic properties and establish some ways of representing them. We also introduce three important concepts: the affine transformation, the affine hull of a set, and the property of general position, and develop some relationships among these objects.

A.6.1 Basic ideas

Definition A.6.1. A subset A of Rn is affine if for any two points x and y in A, and for any real µ, (1−µ)x+µy ∈ A. ⊓⊔

This means that A contains the whole line through any pair of its points. In particular, the empty set is affine, as is the whole space Rn. Some of the exercises for this section ask you to use the definition to show that the class of affine sets is closed with respect to the operations of translation, of addition or intersection (of finitely many sets), and of taking forward or inverse images under linear transformations.

It turns out that one is not restricted to forming combinations of just two points of such a set. Suppose that we use the term affine combination to refer to any linear combination of points of Rn of the form ∑_{i=1}^{k} µixi with ∑_{i=1}^{k} µi = 1. Then the class of affine sets is closed under the operation of taking affine combinations, as the following proposition shows.

Proposition A.6.2. A subset A of Rn is affine if and only if A contains all affine combinations of finitely many points of A.

Proof. (If): This follows directly from the definition of affine set.

(Only if): Suppose A is affine. The definition shows that the claim is true if such a combination consists of 1 or 2 points. For an induction, assume that k > 2 and that A contains all affine combinations of no more than k−1 of its points. Let x = ∑_{i=1}^{k} µixi, where the xi belong to A and the µi sum to 1. As k > 2 not all of the µi can be 1, so by renumbering if necessary we can assume that µk ≠ 1. Then

x = (1−µk) ∑_{i=1}^{k−1} (1−µk)−1µixi + µkxk.

The summation, say s, is an affine combination of k−1 points of A, so it belongs to A by the induction hypothesis. Then x is an affine combination of s and xk, so it belongs to A. ⊓⊔

The sum of two subsets A and B of Rn is the set A+B = {a+b | a ∈ A, b ∈ B}. If one of the sets is a singleton, say {a}, then we usually write a+B rather than {a}+B. This operation is associative, so we can extend this definition to any finite number of summands by setting A+B+C = (A+B)+C, etc.

Definition A.6.3. Two affine subsets A and B of Rn are parallel if there is some x ∈ Rn with A = B+x. In that case we say that A is a translate of B. ⊓⊔


If A is an affine subset of Rn and x ∈ Rn, then for two points a and a′ of A and µ ∈ R we have

(1−µ)(a+x)+µ(a′+x) = [(1−µ)a+µa′]+x.

The right-hand side belongs to A+x, so we see that any translate of an affine set is affine.

It turns out that affine sets are just translates of subspaces. This should not be surprising, as a subset L of Rn is a nonempty subspace if and only if L is affine and contains the origin. Indeed, if L is a nonempty subspace then it certainly contains the origin. It is also closed under linear combinations, hence certainly under affine combinations, so it is affine. On the other hand, if A is affine and contains the origin then let x and x′ belong to A and µ and µ′ be real numbers. Then

µx+µ′x′ = µx+µ′x′+(1−µ−µ′)0 ∈ A,

so that A is closed under linear combinations of its elements and therefore it is a nonempty subspace. The following proposition shows that any affine set has a unique subspace that is parallel to it.

Proposition A.6.4. Given any affine subset A of Rn, the set L := A−A is the unique subspace of Rn that is parallel to A. For any point a ∈ A one has A = L+a.

Proof. If A is empty then so is L, and there is nothing else to prove. Therefore let A be nonempty and let zi ∈ L for i = 1,2. Then zi = ai −a′i for points ai and a′i of A. For any real numbers α1 and α2,

α1z1 +α2z2 = [(1+α1)a1 −α1a′1 +α2a2 −α2a′2]−a1.

The quantity in square brackets belongs to A, so the left-hand side belongs to A−A = L and therefore L is a subspace.

Next, choose any a ∈ A; then L = A−A ⊃ A−a so that A ⊂ L+a. But if z ∈ L then there are elements a′ and a′′ of A with z = a′−a′′; then z+a = a′−a′′+a ∈ A, so that L+a ⊂ A and therefore A = L+a.

Finally, suppose L1 and L2 are subspaces of Rn that are parallel to A. Then there are points x1 and x2 of Rn with Li +xi = A for i = 1,2. As the subspaces Li contain the origin, the xi belong to A. Suppose z1 ∈ L1; then the point a1 := z1 +x1 belongs to A, and then so does a1 −x1 +x2 because A is affine. But as A = L2 +x2 there is some z2 ∈ L2 with z2 +x2 = a1 −x1 +x2, so that z2 = a1 −x1 = z1. As z1 was an arbitrary point of L1 we have L1 ⊂ L2, and reversing the roles of L1 and L2 leads to the opposite inclusion. Therefore L1 = L2, so the subspace parallel to A is unique. ⊓⊔

We write parA for the unique subspace parallel to an affine set A.

Definition A.6.5. The dimension of an affine subset A of Rn is the dimension of its parallel subspace (−1 if A = ∅). ⊓⊔


Corollary A.6.6. Suppose A and A′ are two affine subsets of Rn, each having dimension k. If A ⊂ A′, then A = A′.

Proof. If k = −1 there is nothing to prove, so assume that A is nonempty. Let LA and LA′ be the subspaces parallel to A and A′ respectively. If a ∈ A, then as a ∈ A′ also, Proposition A.6.4 yields

a+LA = A ⊂ A′ = a+LA′, (A.55)

so that LA ⊂ LA′. Any basis B for LA has k independent elements that also belong to LA′, but as dimLA′ = k by hypothesis, by linear algebra B also generates LA′. Then LA = LA′, so by (A.55) A = A′. ⊓⊔

Another corollary of Proposition A.6.4 extends the standard direct-sum decomposition of Rn.

Corollary A.6.7. If A is a nonempty affine subset of Rn with parallel subspace L,then

Rn = A⊕L⊥. (A.56)

Proof. Evidently A+L⊥ ⊂ Rn. To show that the reverse inclusion holds, let x ∈ A and use the standard decomposition

Rn = L⊕L⊥ (A.57)

to write x = xL + xP where xL ∈ L and xP ∈ L⊥. Proposition A.6.4 shows that

A = L+ x = L+ xP,

where the second equality holds because L+xL = L. Now we can write any y ∈ Rn as

y = yL +yP = (yL +xP)+(yP −xP), (A.58)

where yL ∈ L and yP ∈ L⊥. The first quantity in parentheses belongs to A and the second to L⊥, so that y ∈ A+L⊥ and hence Rn ⊂ A+L⊥.

For uniqueness, suppose that y = y′A +y′P with y′A ∈ A and y′P ∈ L⊥. By subtracting this equation from (A.58) we obtain

0 = [(yL + xP)− y′A]+ [(yP − xP)− y′P].

But the first quantity in square brackets belongs to A−A = L and the second to L⊥, so (A.57) shows that each is zero. Therefore y′A = yL +xP and y′P = yP −xP, so the decomposition in (A.58) is unique and therefore Rn = A⊕L⊥. ⊓⊔

The next proposition shows that affine sets are the solution sets of systems of linear equations.

Proposition A.6.8. A subset D of Rn is an affine set of dimension k ≥ 0 if and only if there exist an (n−k)×n matrix A of rank n−k and an element a ∈ Rn−k such that D = {x | Ax = a}.


Proof. Suppose that D is affine and has dimension k. Let d ∈ D and let L = D−d. We know that L has dimension k, so its orthogonal complement L⊥ has dimension n−k. Choose any basis for L⊥ and form an (n−k)×n matrix A whose rows are the chosen basis vectors. We have

L = L⊥⊥ = {x | Ax = 0}.

Writing a = Ad we have D = L+d = {x | Ax = a}, as required.

Conversely, if D is the solution set of a system of equations Ax = a with A having rank n−k, it follows immediately from the definition that D is affine. If d ∈ D then a = Ad, so that D−d = {x | Ax = 0}. As A has rank n−k, we see that the subspace D−d has dimension k, and therefore so does D. ⊓⊔
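The construction in the first half of the proof is directly computable. A sketch (a made-up line in R3; the basis of L⊥ is obtained from an SVD, using the kernel description from Section A.4):

```python
import numpy as np

# Sketch of Proposition A.6.8: represent the line D = d + span{v} in R^3
# as {x | Ax = a}, with the rows of A a basis of L-perp and a = A d.
d = np.array([1.0, 2.0, 3.0])
v = np.array([1.0, 1.0, 0.0])
L = v.reshape(3, 1)                    # basis matrix of parD, k = 1

_, _, Vh = np.linalg.svd(L.T)          # kernel of L* is L-perp
A = Vh[1:, :]                          # (n-k) x n matrix of rank n-k = 2
a = A @ d

on_D = d + 2.5 * v                     # a point of D
off_D = d + np.array([0.0, 0.0, 1.0])  # not a point of D
```

Membership in D then reduces to checking the linear system: x ∈ D exactly when Ax = a.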

The names point, line, and hyperplane denote affine sets in Rn of dimensions 0, 1, and n−1 respectively, the last two requiring that n ≥ 1. Since hyperplanes have dimension n−1, their parallel subspaces are the orthogonal complements of one-dimensional subspaces, each of which is determined by a nonzero vector that is unique up to nonzero scalar multiplication. This vector is called the normal to the hyperplane in question. The last proposition then shows that a hyperplane H can always be represented in the form H = {z | ⟨y∗,z⟩ = η} with y∗ ≠ 0, and since the subspace complementary to parH has dimension 1 it is easy to see that the pair (y∗,η) is unique up to multiplication by a nonzero scalar.

For this hyperplane H we define the lower closed halfspace of H to be {x | ⟨y∗,x⟩ ≤ η}, and the upper closed halfspace of H to be the corresponding set with ≤ replaced by ≥. There are also lower and upper open halfspaces associated with H, defined in the same way but using strict instead of weak inequalities. These halfspaces depend only on H and not on the particular y∗ and η used to represent it, though which halfspace is lower and which is upper will depend on y∗.

In the rest of this book we use standard notation for hyperplanes and for their closed halfspaces as follows: for y∗ ∈ Rn\{0} and η ∈ R, we let

H0(y∗,η) = {x ∈ Rn | ⟨y∗,x⟩ = η},
H−(y∗,η) = {x ∈ Rn | ⟨y∗,x⟩ ≤ η},
H+(y∗,η) = {x ∈ Rn | ⟨y∗,x⟩ ≥ η}. (A.59)

A.6.2 The affine hull

Definition A.6.9. Let S be a subset of Rn. The affine hull of S, written affS, is the intersection of all affine sets containing S. ⊓⊔

The set affS is itself affine. Definition A.6.9 gives what one can think of as an outer representation of affS, but we can also develop an inner representation using affine combinations.


Proposition A.6.10. Let S be a subset of Rn. Then affS is the set of all affine combinations of finite subsets of S.

Proof. The set of all such affine combinations is an affine set, say T (use the definition) which contains S, so T contains affS by definition. Now let A be any affine set containing S. Proposition A.6.2 shows that A contains all affine combinations of finite subsets of S, so A contains T. But affS is the intersection of all such A, so affS contains T. ⊓⊔

The affine-hull operator is the first of several special operators for dealing with sets, which we'll introduce in succeeding sections. Much of the technical content of convexity deals with using these operators, and one of the important questions about them is when and if they commute with other operators.

We'll establish various results about such commutativity as we proceed, but to begin we consider the affine-hull operator and the closure operator. As an affine set is closed, for any subset S of Rn the set affS is closed and contains S; hence it contains clS. The definition of affine hull then shows that in fact affS ⊃ aff clS. The opposite inclusion is trivial because S ⊂ clS. So we have affS = aff clS, and the fact that affine sets are closed means that affS = cl affS. Therefore the operators aff and cl commute.

A.6.3 General position

If we consider three points of R2 arranged in a triangle, then it is not hard to see that we can represent any point of R2 as an affine combination of these three points, and it turns out that the coefficients in such an affine combination are unique. However, if we move one of the points so that it lies on the line through the other two, then we can represent only points on that line as affine combinations of the three points, and moreover the coefficients are no longer unique. The following definition gives, for points in Rn, the crucial distinction between these two situations.

Definition A.6.11. Let S = {x0, . . . ,xk} be a set of k+1 points in Rn. The points of S are said to be in general position if affS has dimension k. ⊓⊔

Points in general position are sometimes called affinely independent, but we do not use that term here. If the points of a set in Rn are in general position, the set cannot contain more than n+1 points.

Because the concept of general position is very useful, we give three equivalent criteria, one or another of which is often easier to apply than is the definition.

Theorem A.6.12. Let S = {x0, . . . ,xk} be a subset of Rn. The following are equivalent:

a. The points of S are in general position.
b. The points x1 −x0, . . . ,xk −x0 are linearly independent in Rn.
c. The points (x0,1), . . . ,(xk,1) are linearly independent in Rn+1.
d. For any point x of affS, the coefficients µi in the representation

x = ∑_{i=0}^{k} µixi, ∑_{i=0}^{k} µi = 1,

of x as an affine combination of x0, . . . ,xk, are unique.

When the points of S are in general position, the coefficients µi in (d) are called the barycentric coordinates of x with respect to x0, . . . ,xk.

Proof. First let L be the subspace parallel to affS, and write yi = xi −x0 for i = 1, . . . ,k. The points yi belong to L. Further, if x = ∑_{i=0}^{k} µixi is any affine combination of the points of S, we must have µ0 = 1−∑_{i=1}^{k} µi, and therefore x = x0 +∑_{i=1}^{k} µiyi. Since µ1, . . . ,µk are arbitrary, we find that affS = x0 + span{y1, . . . ,yk}. Therefore L must be the span of the yi, so the dimension of L (and hence that of affS) is equal to the number of linearly independent yi.

(a) implies (b). To say affS has dimension k is to say that L has dimension k, so the set {y1, . . . ,yk} contains k linearly independent elements.

(b) implies (c). Write z0, . . . ,zk for the points shown in (c). If the zi are linearly dependent, then we have

µ0x0 + . . .+µkxk = 0, µ0 + . . .+µk = 0,

for some µ0, . . . ,µk that are not all zero. As the µi sum to zero, at least one of µ1, . . . ,µk must be nonzero. Further, µ0 = −(µ1 + . . .+µk), so that ∑_{i=1}^{k} µiyi = 0, and therefore the yi are linearly dependent.

(c) implies (d). If we write a point x as an affine combination of the xi then we have written (x,1) as a linear combination of the zi with the same coefficients. If (c) holds, these coefficients must be unique.

(d) implies (a). The dimension of affS is, by definition, that of L. If this dimension is less than k then we can find scalars µ1, . . . ,µk, not all zero, with ∑_{i=1}^{k} µi(xi −x0) = 0. Then

x0 = (1−∑_{i=1}^{k} µi)x0 + ∑_{i=1}^{k} µixi,

and the two sides of this equation display x0 written as an affine combination of the points of S in two different ways. ⊓⊔

As the numbering of the points is at our disposal, any of the points could have played the role of x0. This theorem therefore shows, for example, that if the points of some finite set are in general position then the points of any of its subsets are also in general position.
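Criterion (c) translates directly into code: stack each point with a trailing 1 and check linear independence; solving the same augmented system gives the barycentric coordinates in (d). A numpy sketch with a made-up triangle in R2:

```python
import numpy as np

# Sketch of Theorem A.6.12(c)-(d): general position via the augmented
# vectors (x_i, 1), and barycentric coordinates by solving Z mu = (x, 1).
def in_general_position(pts):
    Z = np.vstack([np.asarray(pts, float).T, np.ones(len(pts))])
    return np.linalg.matrix_rank(Z) == len(pts)

def barycentric(pts, x):
    Z = np.vstack([np.asarray(pts, float).T, np.ones(len(pts))])
    rhs = np.append(np.asarray(x, float), 1.0)
    mu, *_ = np.linalg.lstsq(Z, rhs, rcond=None)
    return mu

tri = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]   # vertices in general position
mu = barycentric(tri, [0.25, 0.25])          # coordinates sum to 1
```

Moving the third vertex onto the line through the other two makes the augmented matrix rank-deficient, which is the collinear situation described at the start of this subsection.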


Let A be a nonempty affine subset of Rn, and suppose that the dimension of A is k. To find k+1 points in A that are in general position, first choose any a0 ∈ A and note that if k = 0 we are finished. If k > 0, write the parallel subspace L of A as A−a0. As L has dimension k, it contains k linearly independent elements v1, . . . ,vk. For i = 1, . . . ,k let ai = a0 +vi. These points belong to A, and Theorem A.6.12 shows that a0, . . . ,ak are in general position. The following corollary shows that we can recover A from such a set of points.

Corollary A.6.13. Let A be an affine subset of Rn of dimension k ≥ 0, and let S = {a0, . . . ,am} be a finite subset of A whose points are in general position. Then m ≤ k, and if m = k then A = affS.

Proof. If the points of S are in general position, then by Theorem A.6.12 the points a1 −a0, . . . ,am −a0 are linearly independent. These points belong, by Proposition A.6.4, to the subspace parA, which has dimension k. Therefore m ≤ k. If m = k then the points just mentioned are in fact a basis for parA, so any point of parA can be written as a linear combination of them, say ∑_{i=1}^{k} µi(ai −a0). Again by Proposition A.6.4, we have A = a0 +parA, so any point of A can be written in the form

a0 + ∑_{i=1}^{k} µi(ai −a0) = ∑_{i=0}^{k} µiai,

where we define µ0 = 1−∑_{i=1}^{k} µi. Therefore any point of A is an affine combination of the points of S, so affS ⊃ A. The opposite inclusion is trivial because S ⊂ A. ⊓⊔

Another consequence of Theorem A.6.12 is that if points in general position are slightly perturbed, they remain in general position. To prove this we need a lemma about perturbed matrices.

Lemma A.6.14. Let A ∈ Rn×n be nonsingular. If ∆ ∈ Rn×n satisfies

∥A−1∆∥< 1, (A.60)

then A+∆ is nonsingular and

∥(A+∆)−1∥ ≤ (1−∥A−1∆∥)−1∥A−1∥. (A.61)

Proof. Fix A and ∆, and suppose x ∈ Rn with (A+∆)x = 0. Then −x = A−1∆x, so that

∥x∥ = ∥−x∥ = ∥A−1∆x∥ ≤ ∥A−1∆∥∥x∥.

As ∥A−1∆∥ < 1, this inequality can hold only if x = 0. Therefore A+∆ is nonsingular.

Let y be any point of Rn such that ∥y∥ = 1, and write x = (A+∆)−1y. Multiplying by A−1(A+∆), we obtain x = A−1y−A−1∆x. Then

∥x∥ ≤ ∥A−1y∥+∥A−1∆∥∥x∥,


so

∥x∥ ≤ (1−∥A−1∆∥)−1∥A−1y∥ ≤ [(1−∥A−1∆∥)−1∥A−1∥]∥y∥,

which proves (A.61). ⊓⊔

As ∥A−1∆∥ ≤ ∥A−1∥∥∆∥, (A.60) always holds if ∥∆∥ < ∥A−1∥−1, so the set of nonsingular matrices is open in (Rn×n,∥ · ∥).
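A numerical spot-check of the lemma (a made-up pair A and ∆, with spectral norms computed by numpy):

```python
import numpy as np

# Spot-check of Lemma A.6.14: if ||A^{-1} Delta|| < 1 then A + Delta is
# nonsingular and (A.61) bounds ||(A + Delta)^{-1}||.
rng = np.random.default_rng(2)
A = np.eye(3) + 0.1 * rng.standard_normal((3, 3))
Delta = 0.05 * rng.standard_normal((3, 3))

Ainv = np.linalg.inv(A)
t = np.linalg.norm(Ainv @ Delta, 2)                # ||A^{-1} Delta||
lhs = np.linalg.norm(np.linalg.inv(A + Delta), 2)  # ||(A + Delta)^{-1}||
rhs = np.linalg.norm(Ainv, 2) / (1.0 - t)          # bound in (A.61)
```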

Corollary A.6.15. If the points x0, . . . ,xk are in general position in Rn, then there is a positive ε such that whenever points x′0, . . . ,x′k of Rn satisfy ∥x′i −xi∥ < ε for i = 0, . . . ,k, the x′i are in general position.

Proof. Theorem A.6.12 says that the xi are in general position exactly when the vectors z0, . . . ,zk in Rn+1, formed by augmenting each xi with a last component of 1, are linearly independent. If k < n then choose additional points zk+1, . . . ,zn so that {z0, . . . ,zn} is a linearly independent set. The matrix A whose columns are z0, . . . ,zn is then nonsingular. Let

∆ = [(x′0 −x0, 0), . . . ,(x′k −xk, 0), 0, . . . ,0];

then the first k+1 columns of A+∆ are the vectors z′0, . . . ,z′k in Rn+1 formed by augmenting each x′i with a last component of 1. If ∥x′i −xi∥ < ε for i = 0, . . . ,k then ∥∆∥ ≤ (k+1)1/2ε. If we choose ε to be less than (k+1)−1/2∥A−1∥−1 then ∥∆∥∥A−1∥ < 1, and Lemma A.6.14 says that A+∆ is nonsingular. Then in particular its first k+1 columns, which are z′0, . . . ,z′k, are linearly independent. Another application of Theorem A.6.12 then shows that x′0, . . . ,x′k are in general position. ⊓⊔

A.6.4 Affine transformations

Definition A.6.16. An affine transformation from an affine subset A of Rn to Rm is a function from A to Rm whose graph is an affine subset of A×Rm. ⊓⊔

This means that for each a and a′ in A and each real µ one has T [(1−µ)a+µa′] = (1−µ)T (a)+µT (a′). For such a transformation the image T (A) ⊂ Rm is an affine set, and if T happens to be one-to-one on A, so that it has an inverse function defined on T (A), then the inverse function is also an affine transformation. Proposition A.6.2 shows that for an affine transformation T of an affine set A and for each set of points a0, . . . ,ak of A and each set µ0, . . . ,µk of affine coefficients, one has T (∑_{i=0}^{k} µiai) = ∑_{i=0}^{k} µiT (ai).

The following lemma shows the close association between affine transformations defined on an affine set A and linear transformations defined on the parallel subspace parA.

Lemma A.6.17. Let A be a nonempty affine subset of Rn, and let a0 ∈ A and v0 ∈ Rm.


a. If T : A → Rm is an affine transformation such that T (a0) = v0, then the function L : parA → Rm defined by

L(u) := T (u+a0)−v0 (A.62)

is a linear transformation.
b. If L : parA → Rm is a linear transformation, then the function T : A → Rm defined by

T (a) := L(a−a0)+v0 (A.63)

is an affine transformation with T (a0) = v0.
c. The linear transformation L defined in (a) is independent of the choice of (a0,v0) in the graph of T.

Proof. Suppose the hypotheses of (a) hold, and choose p and p′ in parA. For any α and β in R we have

L(α p+β p′) = T [(α p+β p′)+a0]− v0

= T [α(p+a0)+β (p′+a0)+(1−α −β )a0]− v0

= αT (p+a0)+βT (p′+a0)+(1−α −β )T (a0)− v0

= α[T (p+a0)− v0]+β [T (p′+a0)− v0]

= αL(p)+βL(p′),

so that L is a linear transformation.

Now let the hypotheses of (b) hold. The definition of T in (A.63) implies that T (a0) = v0. Choose a and a′ in A, and let µ ∈ R. Then

T [(1−µ)a+µa′] = L[(1−µ)a+µa′−a0]+ v0

= L[(1−µ)(a−a0)+µ(a′−a0)]+ v0

= (1−µ)[L(a−a0)+ v0]+µ [L(a′−a0)+ v0]

= (1−µ)T (a)+µT (a′),

so that T is an affine transformation.

For (c), choose two pairs (a0,v0) and (b0,w0) in the graph of T and for any u ∈ parA define L(u) by (A.62) and M(u) := T (u+b0)−w0. Then

L(u) = T (u+a0)− v0

= T (u+a0)−T (a0)+T (b0)−w0

= T [(u+a0)−a0 +b0]−w0

= M(u).

⊓⊔

Remark A.6.18. A consequence of (A.63) is that

∥T (a)−T (a′)∥= ∥L(a−a′)∥ ≤ ∥L∥∥a−a′∥,


so that in particular any affine transformation is Lipschitz continuous. We can apply this observation in the case of a k-dimensional affine set A ⊂ Rn by taking T (a) to be the function sending the point a ∈ A to the vector [µ0(a), . . . ,µk(a)] of barycentric coordinates of a with respect to some subset {a0, . . . ,ak} of A that is in general position. The definition of barycentric coordinates shows that this T is an affine transformation, so that the barycentric coordinates of a are Lipschitz continuous functions of a ∈ A. ⊓⊔

If X and Y are topological spaces, we say they are homeomorphic if there is an invertible function f from X onto Y such that both f and f−1 are continuous. We then call f and f−1 homeomorphisms. A Lipschitz homeomorphism is a homeomorphism f with the added property that f and f−1 are Lipschitz continuous. Homeomorphisms are very useful because if two spaces are homeomorphic, then for topological purposes we can regard each as a copy of the other.

If T is an affine map that is also a homeomorphism, then T has an inverse which, by our previous observation, is also affine. In that case we call T an affine homeomorphism. We use the term isometry for a function between two metric spaces that preserves distances.

The next theorem shows how to construct an affine Lipschitz homeomorphism between any two nonempty affine sets having the same dimension. Moreover, this construction preserves distances.

Theorem A.6.19. Let A and D be affine subsets of Rn and Rm respectively, each having dimension k ≥ 0; let a0 ∈ A and d0 ∈ D. Then there is an affine Lipschitz homeomorphism ϕ of A onto D having the property that ϕ(a0) = d0 and, for each a1, a2, a3, and a4 in A,

⟨ϕ(a1)−ϕ(a2),ϕ(a3)−ϕ(a4)⟩= ⟨a1 −a2,a3 −a4⟩. (A.64)

In particular, ϕ and its inverse are isometries.

Proof. Let A = a0+L and D = d0+M, where L and M are k-dimensional subspacesof Rn and Rm respectively. Let V ∈ Rn×k and W ∈ Rm×k with L = imV and M =imW , and suppose that each of V and W has orthonormal columns. For a ∈ A andd ∈ D define

ϕ(a) = d0 +WV ∗(a−a0), ψ(d) = a0 +VW ∗(d −d0). (A.65)

Then ϕ : A → D and ψ : D → A are affine, hence Lipschitz continuous by Remark A.6.18, with ϕ(a0) = d0 and ψ(d0) = a0. We have for a ∈ A

ψ ◦ϕ(a) = a0 +VW∗[WV∗(a−a0)] = a0 +VV∗(a−a0).

But VV∗ = πL,L⊥, the orthogonal projector on L along L⊥, and a−a0 ∈ L, so ψ ◦ϕ(a) = a and therefore ψ ◦ϕ = idA. A similar argument shows that ϕ ◦ψ = idD, so each is surjective and ψ = ϕ−1. They are thus affine Lipschitz homeomorphisms between A and D.


Let a1, . . . ,a4 be elements of A. Then

⟨ϕ(a1)−ϕ(a2),ϕ(a3)−ϕ(a4)⟩ = ⟨WV∗(a1 −a2),WV∗(a3 −a4)⟩
= ⟨a1 −a2,VW∗WV∗(a3 −a4)⟩
= ⟨a1 −a2,VV∗(a3 −a4)⟩
= ⟨a1 −a2,a3 −a4⟩, (A.66)

where the last equality follows because VV∗ = πL,L⊥ and a3 −a4 ∈ L. In particular, ϕ is an isometry because if we take a3 = a1 and a4 = a2 then (A.66) says that

∥ϕ(a1)−ϕ(a2)∥ = ∥a1 −a2∥.

Using ψ = ϕ−1 in (A.66) shows that the same statements apply to ψ. ⊓⊔

If for some reason one does not want to require V and W to have orthonormal columns, then the homeomorphism result of Theorem A.6.19 still holds (without the isometry assertion) provided one substitutes V+ and W+ for V∗ and W∗ in the proof. Here V+ and W+ are the Moore-Penrose inverses discussed in Section A.5.2.
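The map (A.65) is easy to realize in code. A sketch (hypothetical one-dimensional affine sets A ⊂ R2 and D ⊂ R3, with orthonormal basis matrices V and W chosen by hand):

```python
import numpy as np

# Sketch of Theorem A.6.19: phi(a) = d0 + W V*(a - a0) between two lines,
# A = a0 + im V in R^2 and D = d0 + im W in R^3 (V, W orthonormal columns).
a0 = np.array([1.0, 0.0])
d0 = np.array([0.0, 0.0, 5.0])
V = np.array([[1.0], [1.0]]) / np.sqrt(2.0)
W = np.array([[1.0], [0.0], [0.0]])

def phi(a):
    return d0 + W @ (V.T @ (a - a0))   # (A.65)

def psi(d):
    return a0 + V @ (W.T @ (d - d0))   # the inverse map

a1 = a0 + 3.0 * V[:, 0]
a2 = a0 - 1.0 * V[:, 0]
```

Here phi carries a0 to d0, psi inverts it on D, and distances within A are preserved, as the isometry assertion of the theorem requires.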

A.6.5 The lineality space of a general set

Sometimes one can simplify working with a subset of Rn by taking account of itslineality space,which one can think of as a subspace on which the set is uninterest-ing. As it is uninteresting, we can factor out that space and then deal with the rest ofthe set in a smaller space.

Proposition A.6.20. For any nonempty subset S of Rn there is a unique largest subspace L (in the sense of inclusion) satisfying the equation S + L = S.

Proof. Let Q be the collection of all subspaces L of Rn such that S + L = S. Q is nonempty because it contains the zero subspace {0}. If L and L′ are elements of Q then

S+(L+L′) = (S+L)+L′ = S+L′ = S,

so that L + L′ ∈ Q. Now let L be an element of Q having maximal dimension. If K is any element of Q then by what we have already shown, L + K ∈ Q. If K ⊄ L then dim(L + K) > dim L, which contradicts the choice of L. Therefore K ⊂ L, so the subspace L contains every element of Q and thus is the largest element of Q. It is unique because if elements L and L′ of Q each contained every element of Q, then they would contain each other, so that L = L′. ⊓⊔

Proposition A.6.20 shows that the following definition makes sense.


Definition A.6.21. Let S be a nonempty subset of Rn. The lineality space lin S of S is the unique largest subspace L (in the sense of inclusion) satisfying the equation S + L = S. ⊓⊔

Proposition A.6.22. Let S be a nonempty subset of Rn. If L is a subspace of lin S and M is any subspace such that Rn = L ⊕ M, then

S = L⊕ (S∩M). (A.67)

Proof. Choose any s ∈ S. There are unique points sL ∈ L and sM ∈ M with s = sL + sM. Then sM belongs to M and, as sM = s − sL ∈ S + lin S = S, also to S. As sL ∈ L, we have shown that s ∈ L + (S ∩ M) and therefore that S ⊂ L + (S ∩ M). We also have L + (S ∩ M) ⊂ lin S + S = S, so S = L + (S ∩ M). The direct-sum decomposition in (A.67) follows because Rn = L ⊕ M. ⊓⊔

Proposition A.6.22 shows why a set S is uninteresting in directions contained in lin S. We can see from (A.67) that S is a cylinder with cross-section S ∩ M. We can then factor out L and work with S ∩ M, which may have much smaller dimension. It is often convenient to take M = L⊥.

A.6.6 Exercises for Section A.6

We make the convention that the intersection of an empty collection of subsets of Rn is Rn.

Exercise A.6.23. The intersection of any collection of affine subsets of Rn is affine.

Exercise A.6.24. Let S be a subset of Rn. Show that S and its closure have the same affine hull and hence the same parallel subspace.

Exercise A.6.25. Use the definition of affine hull to show that for two sets S ⊂ Rn and T ⊂ Rm one has aff(S × T) = (aff S) × (aff T).

Exercise A.6.26. Show that if T is an affine transformation from an affine set A to an affine set D, then the image under T of an affine subset of A is an affine subset of D. Also show that the inverse image under T of an affine subset of D is an affine subset of A. Do not assume that T is one-to-one.

Exercise A.6.27. The sum of any finite collection of affine subsets of Rn is affine.

Exercise A.6.28. Give an example of a closed subset S of R2 and a linear transformation L : R2 → R such that L(S) is not closed.

Exercise A.6.29. Let S be a subset of Rn and suppose that f is a continuous function from Rn to Rm. Show that if f (cl S) is closed, then f (cl S) = cl f (S).


Exercise A.6.30. Suppose A is the affine hull of the points (1,1,1,1), (2,1,2,1), and (0,3,1,2) in R4. Construct explicitly an affine isometry of A onto R2 using the method of Theorem A.6.19, and give an example to show that your isometry preserves the inner product.

Exercise A.6.31. Show that the operator aff commutes with affine functions: that is, for any set S ⊂ Rn and any affine function f from aff S to another affine set, one has f (aff S) = aff f (S).


Appendix B
Topics from Analysis

This appendix develops techniques from analysis that we need in parts of this book. The initial section covers three basic inequalities.

B.1 Three inequalities

We start with a general form of the inequality between the arithmetic and geometric means, then prove the inequalities of Hölder and Minkowski as consequences.

B.1.1 The arithmetic-geometric mean inequality

Given real numbers x1, . . . , xI and positive weights µ1, . . . , µI summing to 1, the (weighted) arithmetic mean of the xi is ∑_{i=1}^{I} µi xi. If the xi are all positive then their (weighted) geometric mean is ∏_{i=1}^{I} x_i^{µi}. For positive xi the arithmetic mean is never less than the geometric mean, and it is strictly greater unless the xi are all equal. Particular cases of these means are the unweighted versions, in which each µi equals I^{−1}.

Theorem B.1.1. Let x1, . . . , xI and µ1, . . . , µI be positive real numbers such that ∑_{i=1}^{I} µi = 1. Then

∏_{i=1}^{I} x_i^{µi} ≤ ∑_{i=1}^{I} µi xi,   (B.1)

with equality if and only if the xi are all equal.

Proof. The assertion is obvious if I = 1. For I = 2, if x1 = x2 then (B.1) holds as an equality. If x1 ≠ x2 there is no loss of generality in assuming that x1 < x2; we prove that strict inequality then holds.

Define y = x2/x1. Then

(µ1 x1 + µ2 x2)/(x_1^{µ1} x_2^{µ2}) = µ1 y^{−µ2} + µ2 y^{µ1} =: ϕ(y).

The mean value theorem says that for some ξ ∈ (1, y),

ϕ(y) = ϕ(1) + ϕ′(ξ)(y − 1) = ϕ(1) + [µ1 µ2 ξ^{−µ2}(1 − ξ^{−1})](y − 1).

We have ϕ(1) = 1, y > 1, and the quantity in square brackets is positive, so ϕ(y) > 1. Then (B.1) holds with strict inequality for I = 2.

Suppose now that I = k > 2 and that we have proved the conclusion for each value of I less than k. Let µ0 = ∑_{i=1}^{k−1} µi, and for i = 1, . . . , k−1 define νi = µi/µ0; then the νi are positive and they sum to 1. Then µ0 + µk = 1 and

∏_{i=1}^{k} x_i^{µi} = (∏_{i=1}^{k−1} x_i^{νi})^{µ0} x_k^{µk}
≤ (∑_{i=1}^{k−1} νi xi)^{µ0} x_k^{µk}
≤ µ0 (∑_{i=1}^{k−1} νi xi) + µk xk = ∑_{i=1}^{k} µi xi,   (B.2)

where the first inequality comes from (B.1) for I = k − 1 (the induction hypothesis) and the second from (B.1) for I = 2. The first inequality is strict unless x1 = . . . = xk−1, and the second is strict unless xk = ∑_{i=1}^{k−1} νi xi. Therefore strict inequality holds in (B.2) unless x1 = . . . = xk, so the assertion of the theorem holds for I = k. ⊓⊔
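As a quick numerical illustration of (B.1) (the function name below is ours, not the text's):

```python
import math

def weighted_means(x, mu):
    # Weighted geometric and arithmetic means of positive numbers x
    # with positive weights mu summing to 1, as in Theorem B.1.1.
    geom = math.prod(xi ** mi for xi, mi in zip(x, mu))
    arith = sum(mi * xi for mi, xi in zip(mu, x))
    return geom, arith

# Strict inequality for unequal x_i; equality when they are all equal.
g, a = weighted_means([1.0, 4.0, 2.0], [0.5, 0.25, 0.25])
g_eq, a_eq = weighted_means([3.0, 3.0, 3.0], [0.2, 0.3, 0.5])
```

Here g = 8^{1/4} ≈ 1.68 while a = 2, and g_eq = a_eq = 3, matching the equality criterion of the theorem.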

The next two sections develop useful corollaries of Theorem B.1.1.

B.1.2 The Hölder inequality

Corollary B.1.2. Suppose that a1, . . . , aI and b1, . . . , bI are positive numbers, and that p > 1 and q > 1 with p^{−1} + q^{−1} = 1. Then

∑_{i=1}^{I} ai bi ≤ (∑_{i=1}^{I} a_i^p)^{1/p} (∑_{i=1}^{I} b_i^q)^{1/q}.   (B.3)

The inequality in (B.3) is strict unless there is some positive ρ such that for each i, a_i^p = ρ b_i^q.

Proof. Let γ = ∑_i a_i^p and δ = ∑_i b_i^q, and for each i let ci = γ^{−1} a_i^p and di = δ^{−1} b_i^q, so that the ci and the di each sum to 1. To simplify the notation, write α for p^{−1} and β for q^{−1}, so that α + β = 1. If we write ∑_i for ∑_{i=1}^{I}, then

∑_i ai bi = ∑_i (γ ci)^α (δ di)^β
= γ^α δ^β ∑_i c_i^α d_i^β
≤ γ^α δ^β ∑_i (α ci + β di)
= γ^α δ^β
= (∑_i a_i^p)^{1/p} (∑_i b_i^q)^{1/q},   (B.4)

where the inequality comes from applying Theorem B.1.1 to each term c_i^α d_i^β in turn. This proves (B.3). Moreover, the inequality will be strict unless for each i we have ci = di; that is, unless a_i^p = ρ b_i^q with ρ = γ δ^{−1}. ⊓⊔
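A numerical check of (B.3) and its equality case (an illustrative sketch; the function name is ours):

```python
def holder_gap(a, b, p):
    # Right-hand side minus left-hand side of (B.3); by Corollary B.1.2 this
    # is nonnegative, and zero only when a_i^p = rho * b_i^q for a fixed rho > 0.
    q = p / (p - 1.0)           # conjugate exponent: 1/p + 1/q = 1
    lhs = sum(ai * bi for ai, bi in zip(a, b))
    rhs = sum(ai ** p for ai in a) ** (1 / p) * sum(bi ** q for bi in b) ** (1 / q)
    return rhs - lhs

gap = holder_gap([1.0, 2.0, 3.0], [3.0, 1.0, 2.0], 2.0)      # generic data: strict
eq_gap = holder_gap([1.0, 2.0, 3.0], [2.0, 4.0, 6.0], 2.0)   # b = 2a, p = q = 2: equality
```

For p = q = 2 this is the Cauchy-Schwarz case: the first call gives 14 − 11 = 3, while the proportional data in the second call give a gap of zero up to rounding.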

B.1.3 Minkowski’s inequality

Text to be written.

B.2 *Projectors and distance functions

In Section 1.1.2 we introduced the point-to-set distance function

dS(x) = inf{∥x − s∥ | s ∈ S}.

This section introduces some additional properties of this distance function and ofprojectors.

The infimum in the definition of dS need not be attained in general, but if S is closed then the argument used in the proof of the "if" part of Proposition 2.2.2 shows that for any point x ∈ Rn there is always a point s of S that is closest to x, so that dS(x) = ∥x − s∥, although s may not be unique. If x and x′ are points of Rn then for any positive ε there are points s and s′ of S with

∥x− s∥ ≤ dS(x)+ ε, ∥x′− s′∥ ≤ dS(x′)+ ε,

so that

dS(x) − dS(x′) ≤ ∥x − s′∥ − dS(x′) ≤ ∥x − x′∥ + ∥x′ − s′∥ − dS(x′) ≤ ∥x − x′∥ + ε,

and by interchanging the roles of x and x′ we find that |dS(x) − dS(x′)| ≤ ∥x − x′∥, so that the distance function is Lipschitzian with modulus 1.


If a set C is not only closed but also convex, then dC has an additional useful property. Suppose that x and x′ are points of Rn, let µ ∈ (0,1), and let ε > 0. Find points s and s′ in C such that

∥x − s∥ < dC(x) + ε, ∥x′ − s′∥ < dC(x′) + ε,

and let sµ = (1 − µ)s + µ s′. If we define xµ = (1 − µ)x + µ x′, then

dC(xµ) ≤ ∥xµ − sµ∥ = ∥(1 − µ)(x − s) + µ(x′ − s′)∥
≤ (1 − µ)∥x − s∥ + µ∥x′ − s′∥
< (1 − µ)[dC(x) + ε] + µ[dC(x′) + ε]
= (1 − µ)dC(x) + µ dC(x′) + ε,

and therefore, as ε was an arbitrary positive number, for each λ ∈ [0,1] we have

dC[(1 − λ)x + λ x′] ≤ (1 − λ)dC(x) + λ dC(x′).

This means that dC belongs to the class of convex functions, about which we will see much more in future chapters.

Again for a closed convex set C, Lemma 2.2.1 shows that the closest point must be unique. In that case we can define a function ΠC : Rn → C taking x to the closest point in C. This function is the Euclidean projector on C. This projector has remarkable properties, two of which we will develop here; more properties will follow after we have the technical tools to prove them.

The first property concerns the relationship between ΠC and the function I − ΠC that takes x to x − ΠC(x), a residual quantity measuring the extent to which x cannot be approximated by points of C.

Proposition B.2.1. Let C be a nonempty closed convex subset of Rn. For any x and x′ in Rn one has

∥ΠC(x)−ΠC(x′)∥2 +∥(I −ΠC)(x)− (I −ΠC)(x′)∥2 ≤ ∥x− x′∥2. (B.5)

Proof. For brevity write p for ΠC(x) and r for (I − ΠC)(x), with p′ and r′ being the analogous quantities for x′. Then

∥x − x′∥² = ∥(r − r′) + (p − p′)∥²
= ∥r − r′∥² + 2⟨−r, p′ − p⟩ + 2⟨−r′, p − p′⟩ + ∥p − p′∥².   (B.6)

The four quantities in the last line of (B.6) are all nonnegative, the middle two being so because of Lemma 2.2.1. This proves the assertion, but we will shortly return to (B.6) for additional proofs. ⊓⊔

An immediate consequence of (B.5) is that each of ΠC and I − ΠC is nonexpansive: that is, Lipschitzian on Rn with Lipschitz constant 1. However, we can show considerably more. The next result uses information from the proof of Proposition B.2.1 to establish additional properties of ΠC and I − ΠC.


Proposition B.2.2. For a nonempty closed convex subset C of Rn and for any x and x′ in Rn one has

⟨x− x′,ΠC(x)−ΠC(x′)⟩ ≥ ∥ΠC(x)−ΠC(x′)∥2 (B.7)

and

⟨x− x′,(I −ΠC)(x)− (I −ΠC)(x′)⟩ ≥ ∥(I −ΠC)(x)− (I −ΠC)(x′)∥2. (B.8)

Proof. We use the same notation as in the proof of Proposition B.2.1. For (B.7), write

⟨x − x′, p − p′⟩ = ⟨(r − r′) + (p − p′), p − p′⟩
= ⟨−r, p′ − p⟩ + ⟨−r′, p − p′⟩ + ∥p − p′∥²
≥ ∥p − p′∥²,

where we again used Lemma 2.2.1. To obtain (B.8), interchange the roles of p and r. ⊓⊔

Proposition B.2.2 says in particular that both ΠC and I − ΠC are monotone functions: that is, functions f having the property that for any x and x′ in the domain of f, ⟨x − x′, f (x) − f (x′)⟩ ≥ 0. However, it says more because of the squared terms on the right-hand sides of the inequalities.
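These inequalities are easy to test numerically for a set whose projector is known in closed form. In the sketch below (illustrative only; the helper names are ours) C is the unit box in R³, whose Euclidean projector clips each coordinate, and we verify (B.5) and (B.7) at a pair of points:

```python
def proj_box(x, lo=0.0, hi=1.0):
    # Euclidean projector onto the box C = [lo, hi]^n, a nonempty closed
    # convex set whose projector is coordinatewise clipping.
    return [min(max(xi, lo), hi) for xi in x]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def sub(u, v):
    return [ui - vi for ui, vi in zip(u, v)]

x, xp = [1.7, -0.4, 0.3], [-0.2, 2.0, 0.9]
p, pp = proj_box(x), proj_box(xp)
r, rp = sub(x, p), sub(xp, pp)            # residuals (I - Pi_C)(x), (I - Pi_C)(x')

# (B.5): Pi_C and I - Pi_C cannot jointly expand distances.
lhs_b5 = dot(sub(p, pp), sub(p, pp)) + dot(sub(r, rp), sub(r, rp))
rhs_b5 = dot(sub(x, xp), sub(x, xp))

# (B.7): monotonicity of Pi_C with the extra squared term; slack must be >= 0.
slack_b7 = dot(sub(x, xp), sub(p, pp)) - dot(sub(p, pp), sub(p, pp))
```

With these points, ∥x − x′∥² = 9.73 while the left side of (B.5) is 5.13, and the slack in (B.7) is 2.30.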

Later in the book we will develop information about more general monotone operators, which may be multivalued (so that f (x) can be an entire set of points instead of a single point as it is here). However, we can use the information we now have about the projector to find out more about differentiability properties of dC. We write bd C for the boundary of C.

Proposition B.2.3. Let C be a nonempty closed convex subset of Rn. Then (1/2)dC(·)² is Fréchet differentiable on Rn with its derivative being I − ΠC, which has Lipschitz modulus 1 everywhere on Rn. The function dC is Fréchet differentiable on Rn \ bd C, with derivative at x given by (I − ΠC)(x)/∥(I − ΠC)(x)∥ on the complement of C and, if C has full dimension, by zero on the interior of C. This derivative is locally Lipschitzian wherever it exists.

Proof. Again using the notation from the proof of Proposition B.2.1, we have for x and x′ in Rn

(1/2)dC(x′)² − (1/2)dC(x)² − ⟨r, x′ − x⟩ = ∥r′∥²/2 − ∥r∥²/2 − ⟨r, x′ − x⟩
= [∥r′∥²/2 − ⟨r′, r⟩ + ∥r∥²/2] − [∥r∥² − ⟨r′, r⟩ + ⟨r, x′ − x⟩]
= ∥r′ − r∥²/2 + ⟨−r, p′ − p⟩.   (B.9)

The last line in (B.9) is one-half of the sum of the first two of the four terms in the last line of (B.6). Recalling that each of those four terms was nonnegative and that their sum was ∥x′ − x∥², we see that


0 ≤ (1/2)dC(x′)² − (1/2)dC(x)² − ⟨r, x′ − x⟩ ≤ (1/2)∥x′ − x∥²,

which shows that r is the derivative of (1/2)dC² at x. The assertions about Lipschitz continuity follow from the nonexpansivity proved in Proposition B.2.1.

For the second claim, observe that everywhere on the complement of C, (I − ΠC)(x) is nonzero and dC(x) = ∥(I − ΠC)(x)∥, which is positive. The form of the derivative of dC then follows from the chain rule applied to the function (dC²)^{1/2}, and local Lipschitz continuity holds because z/∥z∥ is locally Lipschitz continuous at each nonzero point and because the composition of Lipschitz continuous functions is Lipschitz continuous. If C has an interior, then dC is identically zero there, and so then is its derivative. ⊓⊔
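The gradient formula for (1/2)dC² can be checked by central finite differences. A small sketch (illustrative only; the unit box again serves as C, with its clipping projector):

```python
def proj_box(x, lo=0.0, hi=1.0):
    # Euclidean projector onto the box C = [0,1]^n: clip each coordinate.
    return [min(max(xi, lo), hi) for xi in x]

def half_dist_sq(x):
    # (1/2) d_C(x)^2 for the box C = [0,1]^n.
    return 0.5 * sum((xi - pi) ** 2 for xi, pi in zip(x, proj_box(x)))

x = [1.5, -0.3, 0.4]
grad = [xi - pi for xi, pi in zip(x, proj_box(x))]   # (I - Pi_C)(x)

# Central finite differences of (1/2) d_C^2 should reproduce grad.
h = 1e-6
fd = []
for i in range(len(x)):
    xp, xm = list(x), list(x)
    xp[i] += h
    xm[i] -= h
    fd.append((half_dist_sq(xp) - half_dist_sq(xm)) / (2 * h))
```

Here grad = [0.5, −0.3, 0.0]: the third coordinate lies inside [0,1], so the corresponding residual, and hence that component of the derivative, is zero, just as Proposition B.2.3 predicts on the interior.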

B.2.1 Notes and references

The method of proof in Theorem B.1.1 follows that of [16, Appendix I].


Appendix C
Topics from Topology

This appendix develops a few concepts from topology required in other parts of the book. We restrict attention to subsets of Rn, and we only present material that is needed elsewhere. A very good general reference for basic topology is [2].

C.1 Connectedness

Definition C.1.1. Let S be a subset of Rn. S is connected if for any two nonempty subsets P and Q of S such that P ∪ Q = S, at least one of P ∩ cl Q and (cl P) ∩ Q is nonempty.

Convex sets have the very useful property of always being connected.

Proposition C.1.2. A convex subset C of Rn is connected.

Proof. If C is empty or is a singleton, then it is connected. Otherwise, suppose that D and E are nonempty subsets of C with D ∪ E = C. If D and E intersect then the connectedness condition is satisfied. If they do not, then as each is nonempty we can find two points cD ∈ D and cE ∈ E, and these points must be distinct. For λ ∈ [0,1] let xλ = (1 − λ)cD + λ cE, and let µ = sup{λ ∈ [0,1] | xλ ∈ D}. If xµ ∈ D then µ < 1, as otherwise xµ = cE and D and E would intersect. Then there are points xλ with λ > µ but as close to µ as we wish; these points must be in E, so xµ ∈ D ∩ cl E, which is then nonempty. If xµ ∈ E then µ cannot be zero, as otherwise xµ = cD. As µ is a supremum, there are points xλ ∈ D with λ < µ but as close to it as we wish. Then xµ ∈ (cl D) ∩ E, showing that C is connected. ⊓⊔


References

1. Anderson, T.W.: Introduction to Multivariate Statistical Analysis, 2d edn. Wiley, New York (1984). ISBN-10: 0-471-88987-3
2. Armstrong, M.A.: Basic Topology. Undergraduate Texts in Mathematics. Springer, New York (1983). ISBN-13: 978-0-387-90839-7; originally published 1979 by McGraw-Hill (UK)
3. Ben-Israel, A.: Linear equations and inequalities on finite dimensional, real or complex, vector spaces: A unified theory. Journal of Mathematical Analysis and Applications 27, 367–389 (1969)
4. Ben-Israel, A.: The Moore of the Moore-Penrose inverse. Electronic Journal of Linear Algebra 9, 150–157 (2002)
5. Berge, C.: Topological Spaces. Macmillan, New York (1963)
6. Brezis, H.: Operateurs maximaux monotones et semi-groupes de contractions dans les espaces de Hilbert. No. 5 in North-Holland Mathematics Studies. North-Holland, Amsterdam (1973)
7. Dontchev, A.L., Rockafellar, R.T.: Characterizations of strong regularity for variational inequalities over polyhedral convex sets. SIAM Journal on Optimization 6, 1087–1105 (1996)
8. Farkas, G.: A Fourier-fele mechanikai elv alkalmazasainak algebrai alapjarol [On the algebraic basis of the applications of the mechanical principle of Fourier]. Mathematikai es Physikai Lapok 5, 49–54 (1896)
9. Farkas, G.: A Fourier-fele mechanikai elv alkalmazasanak algebrai alapja [The algebraic basis of the application of the mechanical principle of Fourier]. Mathematikai es Termeszettudomanyi Ertesito 16, 361–364 (1898)
10. Farkas, J.: Uber die Theorie der einfachen Ungleichungen. Journal fur die reine und angewandte Mathematik 124, 1–27 (1902)
11. Fudenberg, D., Tirole, J.: Game Theory. MIT Press, Cambridge, MA (1991). ISBN-10: 0-262-06141-4
12. Gale, D.: The Theory of Linear Economic Models. University of Chicago Press, Chicago, IL (1989). ISBN-13: 978-0226278841. Originally published 1960 by McGraw-Hill, New York
13. Giannessi, F.: Constrained Optimization and Image Space Analysis, vol. 1. Springer, New York (2005). ISBN-10: 0-387-24770-X
14. Golub, G.H., Van Loan, C.F.: Matrix Computations, 3d edn. The Johns Hopkins University Press, Baltimore, MD (1996)
15. Gordan, P.: Ueber die Auflosung linearer Gleichungen mit reellen Coefficienten. Mathematische Annalen 6, 23–28 (1873)
16. Hardy, G.H.: A Course of Pure Mathematics, 10th edn. Cambridge University Press, Cambridge, UK (1967). (First edition 1908.)
17. Hoffman, A.J.: On approximate solutions of systems of linear inequalities. Journal of Research of the National Bureau of Standards 49, 263–265 (1952)
18. Householder, A.S.: The Theory of Matrices in Numerical Analysis. Dover Publications, Inc., New York (1975). Originally published 1964 by Blaisdell Publishing Co.
19. Jordan, P., von Neumann, J.: On inner products in linear metric spaces. The Annals of Mathematics 36, 719–723 (1935)
20. Komiya, H.: Elementary proof for Sion's minimax theorem. Kodai Mathematical Journal 11, 5–7 (1988)
21. Luce, R.D., Raiffa, H.: Games and Decisions: Introduction and Critical Survey. Dover, New York (1989). ISBN-10: 0-486-65943-7; originally published 1957 by John Wiley & Sons
22. Mangasarian, O.L.: Nonlinear Programming. No. 10 in Classics in Applied Mathematics. Society for Industrial and Applied Mathematics (1994). ISBN-13: 978-0898713411. Originally published 1969 by McGraw-Hill, New York
23. Minty, G.J.: A "from-scratch" proof of a theorem of Rockafellar and Fulkerson. Mathematical Programming 7, 368–375 (1974)
24. Moore, E.H.: On the reciprocal of the general algebraic matrix. Bulletin of the American Mathematical Society 26, 385–396 (1920). [This is a report entitled "The fourteenth western meeting of the American Mathematical Society"; Moore's report appears on pp. 394–395.]
25. Nashed, M.Z. (ed.): Generalized Inverses and Applications. Academic Press, New York (1976)
26. Netlib: http://www.netlib.org/index.html
27. Neumann, J.v., Morgenstern, O.: Theory of Games and Economic Behavior. John Wiley & Sons, New York (1964). Originally published 1944 by Princeton University Press
28. Pang, J.S.: Newton's method for B-differentiable equations. Mathematics of Operations Research 15, 311–341 (1990)
29. Penrose, R.: A generalized inverse for matrices. Mathematical Proceedings of the Cambridge Philosophical Society 51, 406–413 (1955)
30. Plambeck, E.L., Fu, B.R., Robinson, S.M., Suri, R.: Sample-path optimization of convex stochastic performance functions. Mathematical Programming 75, 137–176 (1996)
31. Prekopa, A.: On the development of optimization theory. American Mathematical Monthly 87, 527–542 (1980)
32. Robinson, S.M.: Some continuity properties of polyhedral multifunctions. Mathematical Programming Studies 14, 206–214 (1981)
33. Robinson, S.M.: Local structure of feasible sets in nonlinear programming, Part II: Nondegeneracy. Mathematical Programming Studies 22, 217–230 (1984)
34. Robinson, S.M.: Implicit B-differentiability in generalized equations. Technical Summary Report 2854, Mathematics Research Center, University of Wisconsin-Madison, Madison, WI, USA (1985). NTIS Accession No. AD A160 980
35. Robinson, S.M.: An implicit-function theorem for a class of nonsmooth functions. Mathematics of Operations Research 16, 292–309 (1991)
36. Robinson, S.M.: Analysis of sample-path optimization. Mathematics of Operations Research 21, 513–528 (1996)
37. Robinson, S.M.: Convexity in finite-dimensional spaces (2008). Manuscript
38. Rockafellar, R.T.: The elementary vectors of a subspace of Rn. In: R.C. Bose, T.A. Dowling (eds.) Combinatorial Mathematics and Its Applications, pp. 104–127. University of North Carolina Press, Chapel Hill, NC (1969)
39. Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton, NJ (1970)
40. Rockafellar, R.T.: Conjugate Duality and Optimization. No. 16 in CBMS Regional Conference Series in Applied Mathematics. Society for Industrial and Applied Mathematics, Philadelphia, PA (1974)
41. Rockafellar, R.T., Wets, R.J.: Variational Analysis. No. 317 in Grundlehren der mathematischen Wissenschaften. Springer-Verlag, Berlin (1998). ISBN-13: 978-3-540-62772-3
42. Royden, H.L.: Real Analysis, 2d edn. Macmillan, New York (1968)
43. Schmid, J.: A remark on characteristic polynomials. American Mathematical Monthly 77, 998–999 (1970)
44. Shapiro, A.: On concepts of directional differentiability. Journal of Optimization Theory and Applications 66, 477–487 (1990)
45. Shubik, M.: Game Theory in the Social Sciences: Concepts and Solutions. MIT Press, Cambridge, MA (1982). ISBN-10: 0-262-19195-4
46. Sion, M.: On general minimax theorems. Pacific Journal of Mathematics 8, 171–176 (1958)
47. Starr, R.M.: Quasi-equilibria in markets with non-convex preferences. Econometrica 37, 25–38 (1969)
48. Stewart, G.W.: On the early history of the singular value decomposition. SIAM Review 35, 551–566 (1993). Stable URL: http://www.jstor.org/stable/2132388
49. Walkup, D.W., Wets, R.J.B.: A Lipschitzian characterization of convex polyhedra. Proceedings of the American Mathematical Society 23, 167–173 (1969)


Index

2-norm (vector), 234
l1 norm, 234

adjoint
  concave sense, 152
  convex sense, 152
affine combination, 252
affine hull, 255
affine independence, 256
affine sets
  parallel, 252
affine transformation, 259
arithmetic mean, 265

barrier cone, 58
barycenter, 9
barycentric coordinates, 257
  Lipschitz continuity of, 261
boundary
  relative, 18

Caratheodory's theorem, 6
  best bound in, 9
closed multifunction, 47
closest point
  to a set, 26
closure
  of a function, 142
combination
  affine, 252
complementary subspaces, 248
composition
  of a function and a scalar, 171
condition
  recession, 59
cone
  barrier, 58
  convex, 35
  critical, 67
  definition of, 35
  generated by a set, 37
  homogenizing, 58
  normal, 47
  semidefinite, 35
  tangent, 47
cone, recession, and barrier cone of polar, 59
conical hull, 36
  of a function, 157
  when closed and proper, 170
continuity
  Lipschitz, 233
convex coefficients, 4
convex combination, 4
convex hull, 4
  generalized, 11
convex set, 3
  polyhedral
    products, 89
    translation of, 89
coordinates
  barycentric, 257
critical cone, 67
critical face, 67

delta function, 242
diameter of a set, 5
dimension
  of an affine set, 253
  of convex set, 4
direction
  of a line, 56
  recession, 53
directional derivative
  epsilon-, 187
distance
  from a point to a set, 8
  Pompeiu-Hausdorff, 8
domain
  effective, 126

effective domain, 126
  of a multifunction, 176
epigraph, 126
  relative interior of, 142
  support function of, 172
epsilon-directional derivative, 187
epsilon-minimizer, 185
epsilon-subdifferential, 185
epsilon-subgradient, 185
ERV (extended-real-valued), 126
Euclidean norm, 234
Euclidean projector, 33
exact index set, 80
excess
  of one set over another, 8
extended real numbers, 125

face
  critical, 67
  of a sum of sets, 70
  of an intersection of sets, 70
  of forward affine image, 69
  of inverse affine image, 69
  of product set, 68
faces
  nonempty, notation for, 80
  notation for, 64
facet, 92
  existence of, 92
Farkas lemma, 39
  generalized, 45
finite-dimensional, 232
finitely generated convex set, 12
function
  +∞ extension of, 129
  extended-real-valued, 126
  locally Lipschitzian, 130
  lower semicontinuous, 138
  proper, 127
  recession, 164
  upper semicontinuous, 138

general position, 256
  characterization of, 256
  extended, 11
generalized inverse, 248
  Moore-Penrose, 250
geometric mean, 265
Gordan theorem
  generalized, 46
graph
  of a multifunction, 176
  of multifunction, 93

halfline, 11
  generator of, 11
halfspace
  closed
    standard notation for, 255
homeomorphism, 261
  affine, 261
  Lipschitz, 261
homogenizing cone, 58
hull
  lower semicontinuous, 142
hyperplane, 255
  closed halfspace of, 255
  open halfspace of, 255
  properly supporting, 26
  standard notation for, 255
  supporting, 26

image
  of a multifunction, 176
independent subspaces, 248
index set
  exact, 80
inner product space, 232
interior
  relative, 14
intersection
  of empty collection, 263
inverse
  of a multifunction, 48, 176
isometry, 261

Jensen's inequality, 128

level set
  lower, 141
  support function of, 157
limit inferior
  definitions of, 138
line, 255
line segment, 3
  notation for, 4
lineality space, 262
linear combination
  nonnegative, 11
linear image
  closedness of, 61
  recession cone of, 61
linear projector, 248
linear space, 231
  normed, 233
Lipschitz continuity, 233
  local, 130
locally Lipschitzian function, 130
lower level set, 141
lower semicontinuous hull, 142

matrix
  orthogonal, 242
maximum norm, 234
mean
  arithmetic, 265
  geometric, 265
minimizer
  epsilon-, 185
  local, 51
minorant, 142
minorization, 142
monotone
  function
    definition of, 269
  operator, 269
multifunction, 47, 176
  closed, 47, 176
  definition, 93
  effective domain, 93
  forward image of, 93
  graph, 93
  graph of, 176
  image, 93
  inverse, 93
  inverse image of, 93
  inverse of, 48
  monotone, 176
  values, 94
  with closed values, 94

nonexpansive, 233
norm
  l1, 234
  matrix
    consistence, 241
    submultiplicative, 240
    subordinate, 241
  maximum, 234
  vector, 233
    2-norm, 234
    Euclidean, 234
normal
  cone, 47
  definition, 47
  to a hyperplane, 255
normal component, 33
normal cone
  as subdifferential of indicator, 176
  local property of, 47
normed linear space, 233
norms
  equivalent, 236

operator
  monotone, 269
orthogonal matrix, 242
orthonormal, 242

parallel subspace
  of a convex set, 4
  of an affine set, 253
point, 255
points
  affinely independent, 256
polar
  and polar of homogenizing cone, 58
  double, 29
  of a nonempty set, 28
Pompeiu-Hausdorff distance, 8
positive hull, 11
positive vector, 40
projector, 248
  Euclidean, 33, 268
  is nonexpansive, 268
  linear, 248
  orthogonal linear, 248
proper function, 127

recession
  direction, 53
recession condition, 59
recession cone
  of intersection, 55
  of inverse linear image, 55
recession function, 164
reflection operator
  in Rn+1, 159
relative boundary, 18
relative interior, 14
relative topology, 14
relatively open set, 15

scalar, 231
  composition of a function with, 171
Schwarz inequality
  derivation, 233
semidefinite cone, 35
semipositive vector, 40
separation
  definition of, 25
  proper
    characterization of, 31
    definition of, 25
  strict
    definition of, 26
  strong
    characterization of, 29
    definition of, 26
    of a point and a set, 27
set
  affine, 251
  finitely generated convex, 12
  relatively open, 15
  symmetric, 161
Shapley-Folkman theorem, 7
simplex, 9
  generalized, 12
  vertices of, 9
singular values, 244
space
  inner product, 232
  lineality, 262
  linear, 231
  vector, 231
span, 232
subdifferential, 176
  epsilon-, 185
subgradient, 143
  and supporting hyperplane, 143, 175
  epsilon-, 185
subspaces
  complementary, 248
  independent, 248
support function, 153
  of a level set, 157
  of an epigraph, 172
supporting hyperplane, 26
  proper, 26
symmetric set, 161

tangent
  cone, 47
theorem of the alternative
  Gale's, 120
  Motzkin's, 120
  Stiemke's, 121
  Tucker's, 120
  Ville's, 121
theorems of the alternative, 39
topology
  relative, 14
transformation
  affine, 259
Tucker's complementarity theorem, 119

vector
  positive, 40
  semipositive, 40
vector space, 231
vertices
  of a simplex, 9