
MOSEK Modeling Cookbook
Release 3.2.2

MOSEK ApS

18 March 2021


Contents

1 Preface

2 Linear optimization
  2.1 Introduction
  2.2 Linear modeling
  2.3 Infeasibility in linear optimization
  2.4 Duality in linear optimization

3 Conic quadratic optimization
  3.1 Cones
  3.2 Conic quadratic modeling
  3.3 Conic quadratic case studies

4 The power cone
  4.1 The power cone(s)
  4.2 Sets representable using the power cone
  4.3 Power cone case studies

5 Exponential cone optimization
  5.1 Exponential cone
  5.2 Modeling with the exponential cone
  5.3 Geometric programming
  5.4 Exponential cone case studies

6 Semidefinite optimization
  6.1 Introduction to semidefinite matrices
  6.2 Semidefinite modeling
  6.3 Semidefinite optimization case studies

7 Practical optimization
  7.1 Conic reformulations
  7.2 Avoiding ill-posed problems
  7.3 Scaling
  7.4 The huge and the tiny
  7.5 Semidefinite variables
  7.6 The quality of a solution
  7.7 Distance to a cone

8 Duality in conic optimization
  8.1 Dual cone
  8.2 Infeasibility in conic optimization
  8.3 Lagrangian and the dual problem
  8.4 Weak and strong duality
  8.5 Applications of conic duality
  8.6 Semidefinite duality and LMIs

9 Mixed integer optimization
  9.1 Integer modeling
  9.2 Mixed integer conic case studies

10 Quadratic optimization
  10.1 Quadratic objective
  10.2 Quadratically constrained optimization
  10.3 Example: Factor model

11 Bibliographic notes

12 Notation and definitions

Bibliography

Index


Chapter 1

Preface

This cookbook is about model building using convex optimization. It is intended as a modeling guide for the MOSEK optimization package. However, the style is intentionally quite generic without specific MOSEK commands or API descriptions.

There are several excellent books available on this topic, for example the books by Ben-Tal and Nemirovski [BenTalN01] and Boyd and Vandenberghe [BV04], which have both been a great source of inspiration for this manual. The purpose of this manual is to collect the material which we consider most relevant to our users and to present it in a practical self-contained manner; however, we highly recommend the books as a supplement to this manual.

Some textbooks on building models using optimization (or mathematical programming) introduce various concepts through practical examples. In this manual we have chosen a different route, where we instead show the different sets and functions that can be modeled using convex optimization, which can subsequently be combined into realistic examples and applications. In other words, we present simple convex building blocks, which can then be combined into more elaborate convex models. We call this approach extremely disciplined modeling. With the advent of more expressive and sophisticated tools like conic optimization, we feel that this approach is better suited.

Content

We begin with a comprehensive chapter on linear optimization, including modeling examples, duality theory and infeasibility certificates for linear problems. Linear problems are optimization problems of the form

$$\begin{array}{ll} \text{minimize} & c^Tx \\ \text{subject to} & Ax = b, \\ & x \geq 0. \end{array}$$

Conic optimization is a generalization of linear optimization which handles problems of the form:

$$\begin{array}{ll} \text{minimize} & c^Tx \\ \text{subject to} & Ax = b, \\ & x \in K, \end{array}$$


where ๐พ is a convex cone. Various families of convex cones allow formulating different typesof nonlinear constraints. The following chapters present modeling with four types of convexcones:

โ€ข quadratic cones ,

โ€ข power cone,

โ€ข exponential cone,

โ€ข semidefinite cone.

It is โ€œwell-knownโ€ in the convex optimization community that this family of cones issufficient to express almost all convex optimization problems appearing in practice.

Next we discuss issues arising in practical optimization, and we wholeheartedly recom-mend this short chapter to all readers before moving on to implementing mathematicalmodels with real data.

Following that, we present a general duality and infeasibility theory for conic problems.Finally we diverge slightly from the topic of conic optimization and introduce the language ofmixed-integer optimization and we discuss the relation between convex quadratic optimizationand conic quadratic optimization.


Chapter 2

Linear optimization

In this chapter we discuss various aspects of linear optimization. We first introduce the basic concepts of linear optimization and discuss the underlying geometric interpretations. We then give examples of the most frequently used reformulations or modeling tricks used in linear optimization, and finally we discuss duality and infeasibility theory in some detail.

2.1 Introduction

2.1.1 Basic notions

The most basic type of optimization is linear optimization. In linear optimization we minimize a linear function given a set of linear constraints. For example, we may wish to minimize a linear function

$$x_1 + 2x_2 - x_3$$

under the constraints that

$$x_1 + x_2 + x_3 = 1, \qquad x_1, x_2, x_3 \geq 0.$$

The function we minimize is often called the objective function; in this case we have a linear objective function. The constraints are also linear and consist of both linear equalities and inequalities. We typically use more compact notation

$$\begin{array}{ll} \text{minimize} & x_1 + 2x_2 - x_3 \\ \text{subject to} & x_1 + x_2 + x_3 = 1, \\ & x_1, x_2, x_3 \geq 0, \end{array} \qquad (2.1)$$

and we call (2.1) a linear optimization problem. The domain where all constraints are satisfied is called the feasible set; the feasible set for (2.1) is shown in Fig. 2.1.

For this simple problem we see by inspection that the optimal value of the problem is $-1$ obtained by the optimal solution

$$(x_1^\star, x_2^\star, x_3^\star) = (0, 0, 1).$$


Fig. 2.1: Feasible set for $x_1 + x_2 + x_3 = 1$ and $x_1, x_2, x_3 \geq 0$.

Linear optimization problems are typically formulated using matrix notation. The standard form of a linear minimization problem is:

$$\begin{array}{ll} \text{minimize} & c^Tx \\ \text{subject to} & Ax = b, \\ & x \geq 0. \end{array} \qquad (2.2)$$

For example, we can pose (2.1) in this form with

$$x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \quad c = \begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix}, \quad A = \begin{bmatrix} 1 & 1 & 1 \end{bmatrix}.$$

There are many other formulations for linear optimization problems; we can have different types of constraints,

๐ด๐‘ฅ = ๐‘, ๐ด๐‘ฅ โ‰ฅ ๐‘, ๐ด๐‘ฅ โ‰ค ๐‘, ๐‘™๐‘ โ‰ค ๐ด๐‘ฅ โ‰ค ๐‘ข๐‘,

and different bounds on the variables

๐‘™๐‘ฅ โ‰ค ๐‘ฅ โ‰ค ๐‘ข๐‘ฅ

or we may have no bounds on some $x_i$, in which case we say that $x_i$ is a free variable. All these formulations are equivalent in the sense that by simple linear transformations and introduction of auxiliary variables they represent the same set of problems. The important feature is that the objective function and the constraints are all linear in $x$.

2.1.2 Geometry of linear optimization

A hyperplane is a subset of $\mathbb{R}^n$ defined as $\{x \mid a^T(x - x_0) = 0\}$ or equivalently $\{x \mid a^Tx = \gamma\}$ with $a^Tx_0 = \gamma$, see Fig. 2.2.


Fig. 2.2: The dashed line illustrates a hyperplane $\{x \mid a^Tx = \gamma\}$.

Thus a linear constraint

$$Ax = b$$

with $A \in \mathbb{R}^{m \times n}$ represents an intersection of $m$ hyperplanes.

Next, consider a point $x$ above the hyperplane in Fig. 2.3. Since $x - x_0$ forms an acute angle with $a$ we have that $a^T(x - x_0) \geq 0$, or $a^Tx \geq \gamma$. The set $\{x \mid a^Tx \geq \gamma\}$ is called a halfspace. Similarly the set $\{x \mid a^Tx \leq \gamma\}$ forms another halfspace; in Fig. 2.3 it corresponds to the area below the dashed line.

Fig. 2.3: The grey area is the halfspace $\{x \mid a^Tx \geq \gamma\}$.

A set of linear inequalities

๐ด๐‘ฅ โ‰ค ๐‘

corresponds to an intersection of halfspaces and forms a polyhedron, see Fig. 2.4.

The polyhedral description of the feasible set gives us a very intuitive interpretation of linear optimization, which is illustrated in Fig. 2.5. The dashed lines are normal to the objective $c = (-1, 1)$, and to minimize $c^Tx$ we move as far as possible in the opposite direction of $c$, to the furthest position where a dashed line intersects the polyhedron; an optimal solution is therefore always either a vertex of the polyhedron, or an entire facet of the polyhedron may be optimal.


Fig. 2.4: A polyhedron formed as an intersection of halfspaces.

Fig. 2.5: Geometric interpretation of linear optimization. The optimal solution $x^\star$ is at a point where the normals to $c$ (the dashed lines) intersect the polyhedron.


The polyhedron shown in the figure is nonempty and bounded, but this is not always the case for polyhedra arising from linear inequalities in optimization problems. In such cases the optimization problem may be infeasible or unbounded, which we will discuss in detail in Sec. 2.3.

2.2 Linear modeling

In this section we present useful reformulation techniques and standard tricks which allow constructing more complicated models using linear optimization. It is also a guide through the types of constraints which can be expressed using linear (in)equalities.

2.2.1 Maximum

The inequality ๐‘ก โ‰ฅ max{๐‘ฅ1, . . . , ๐‘ฅ๐‘›} is equivalent to a simultaneous sequence of ๐‘› inequalities

๐‘ก โ‰ฅ ๐‘ฅ๐‘–, ๐‘– = 1, . . . , ๐‘›

and similarly ๐‘ก โ‰ค min{๐‘ฅ1, . . . , ๐‘ฅ๐‘›} is the same as

๐‘ก โ‰ค ๐‘ฅ๐‘–, ๐‘– = 1, . . . , ๐‘›.

Of course the same reformulation applies if each $x_i$ is not a single variable but a linear expression. In particular, we can consider convex piecewise-linear functions $f : \mathbb{R}^n \mapsto \mathbb{R}$ defined as the maximum of affine functions (see Fig. 2.6):

$$f(x) := \max_{i=1,\dots,m} \{ a_i^Tx + b_i \}.$$

Fig. 2.6: A convex piecewise-linear function (solid lines) of a single variable $x$. The function is defined as the maximum of 3 affine functions.

The epigraph ๐‘“(๐‘ฅ) โ‰ค ๐‘ก (see Sec. 12) has an equivalent formulation with ๐‘š inequalities:

๐‘Ž๐‘‡๐‘– ๐‘ฅ + ๐‘๐‘– โ‰ค ๐‘ก, ๐‘– = 1, . . . ,๐‘š.

Piecewise-linear functions have many uses in linear optimization; either we have a convex piecewise-linear formulation from the onset, or we may approximate a more complicated (nonlinear) problem using piecewise-linear approximations, although with modern nonlinear optimization software it is becoming both easier and more efficient to directly formulate and solve nonlinear problems without piecewise-linear approximations.
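As an illustration, the epigraph formulation above can be handed to any LP solver. The following is a minimal sketch using SciPy's `linprog`; the affine pieces $a_i$, $b_i$ below are made-up data for illustration only.

```python
# Sketch: minimize f(x) = max_i (a_i^T x + b_i) via the epigraph LP
#   minimize t  subject to  a_i^T x + b_i <= t,  i = 1..m.
# Assumes NumPy/SciPy are available; the data is illustrative.
import numpy as np
from scipy.optimize import linprog

a = np.array([[1.0, 2.0], [-1.0, 0.5], [0.0, -1.0]])   # rows a_i^T (m=3, n=2)
b = np.array([0.0, 1.0, 2.0])

m, n = a.shape
c = np.concatenate([np.zeros(n), [1.0]])                # variables (x, t), objective t
A_ub = np.hstack([a, -np.ones((m, 1))])                 # a_i^T x - t <= -b_i
b_ub = -b
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * (n + 1))
print("optimal value of f:", res.fun, "at x =", res.x[:n])
```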


2.2.2 Absolute value

The absolute value of a scalar variable is a special case of maximum

|๐‘ฅ| := max{๐‘ฅ,โˆ’๐‘ฅ},

so we can model the epigraph |๐‘ฅ| โ‰ค ๐‘ก using two inequalities

โˆ’๐‘ก โ‰ค ๐‘ฅ โ‰ค ๐‘ก.

2.2.3 The โ„“1 norm

All norms are convex functions, but the $\ell_1$ and $\ell_\infty$ norms are of particular interest for linear optimization. The $\ell_1$ norm of vector $x \in \mathbb{R}^n$ is defined as

โ€–๐‘ฅโ€–1 := |๐‘ฅ1| + |๐‘ฅ2| + ยท ยท ยท + |๐‘ฅ๐‘›|.

To model the epigraph

โ€–๐‘ฅโ€–1 โ‰ค ๐‘ก, (2.3)

we introduce the following system

|๐‘ฅ๐‘–| โ‰ค ๐‘ง๐‘–, ๐‘– = 1, . . . , ๐‘›,๐‘›โˆ‘๏ธ

๐‘–=1

๐‘ง๐‘– = ๐‘ก, (2.4)

with additional (auxiliary) variable ๐‘ง โˆˆ R๐‘›. Clearly (2.3) and (2.4) are equivalent, in thesense that they have the same projection onto the space of ๐‘ฅ and ๐‘ก variables. Therefore, wecan model (2.3) using linear (in)equalities

โˆ’๐‘ง๐‘– โ‰ค ๐‘ฅ๐‘– โ‰ค ๐‘ง๐‘–,๐‘›โˆ‘๏ธ

๐‘–=1

๐‘ง๐‘– = ๐‘ก, (2.5)

with auxiliary variables ๐‘ง. Similarly, we can describe the epigraph of the norm of an affinefunction of ๐‘ฅ,

โ€–๐ด๐‘ฅโˆ’ ๐‘โ€–1 โ‰ค ๐‘ก

as

โˆ’๐‘ง๐‘– โ‰ค ๐‘Ž๐‘‡๐‘– ๐‘ฅโˆ’ ๐‘๐‘– โ‰ค ๐‘ง๐‘–,๐‘›โˆ‘๏ธ

๐‘–=1

๐‘ง๐‘– = ๐‘ก,

where ๐‘Ž๐‘– is the ๐‘–โˆ’th row of ๐ด (taken as a column-vector).


Example 2.1 (Basis pursuit). The $\ell_1$ norm is overwhelmingly popular as a convex approximation of the cardinality (i.e., number of nonzero elements) of a vector $x$. For example, suppose we are given an underdetermined linear system

๐ด๐‘ฅ = ๐‘

where ๐ด โˆˆ R๐‘šร—๐‘› and ๐‘š โ‰ช ๐‘›. The basis pursuit problem

minimize โ€–๐‘ฅโ€–1subject to ๐ด๐‘ฅ = ๐‘,

(2.6)

uses the โ„“1 norm of ๐‘ฅ as a heuristic for finding a sparse solution (one with many zeroelements) to ๐ด๐‘ฅ = ๐‘, i.e., it aims to represent ๐‘ as a linear combination of few columns of๐ด. Using (2.5) we can pose the problem as a linear optimization problem,

minimize ๐‘’๐‘‡ ๐‘งsubject to โˆ’๐‘ง โ‰ค ๐‘ฅ โ‰ค ๐‘ง,

๐ด๐‘ฅ = ๐‘,(2.7)

where ๐‘’ = (1, . . . , 1)๐‘‡ .

2.2.4 The โ„“โˆž norm

The โ„“โˆž norm of a vector ๐‘ฅ โˆˆ R๐‘› is defined as

โ€–๐‘ฅโ€–โˆž := max๐‘–=1,...,๐‘›

|๐‘ฅ๐‘–|,

which is another example of a simple piecewise-linear function. Using Sec. 2.2.2 we model

โ€–๐‘ฅโ€–โˆž โ‰ค ๐‘ก

as

โˆ’๐‘ก โ‰ค ๐‘ฅ๐‘– โ‰ค ๐‘ก, ๐‘– = 1, . . . , ๐‘›.

Again, we can also consider affine functions of ๐‘ฅ, i.e.,

โ€–๐ด๐‘ฅโˆ’ ๐‘โ€–โˆž โ‰ค ๐‘ก,

which can be described as

โˆ’๐‘ก โ‰ค ๐‘Ž๐‘‡๐‘– ๐‘ฅโˆ’ ๐‘ โ‰ค ๐‘ก, ๐‘– = 1, . . . , ๐‘›.


Example 2.2 (Dual norms). It is interesting to note that the $\ell_1$ and $\ell_\infty$ norms are dual. For any norm $\|\cdot\|$ on $\mathbb{R}^n$, the dual norm $\|\cdot\|_*$ is defined as

โ€–๐‘ฅโ€–* = max{๐‘ฅ๐‘‡๐‘ฃ | โ€–๐‘ฃโ€– โ‰ค 1}.

Let us verify that the dual of the โ„“โˆž norm is the โ„“1 norm. Consider

โ€–๐‘ฅโ€–*,โˆž = max{๐‘ฅ๐‘‡๐‘ฃ | โ€–๐‘ฃโ€–โˆž โ‰ค 1}.

Obviously the maximum is attained for

๐‘ฃ๐‘– =

{๏ธ‚+1, ๐‘ฅ๐‘– โ‰ฅ 0,โˆ’1, ๐‘ฅ๐‘– < 0,

i.e., โ€–๐‘ฅโ€–*,โˆž =โˆ‘๏ธ€

๐‘– |๐‘ฅ๐‘–| = โ€–๐‘ฅโ€–1. Similarly, consider the dual of the โ„“1 norm,

โ€–๐‘ฅโ€–*,1 = max{๐‘ฅ๐‘‡๐‘ฃ | โ€–๐‘ฃโ€–1 โ‰ค 1}.

To maximize ๐‘ฅ๐‘‡๐‘ฃ subject to |๐‘ฃ1| + ยท ยท ยท + |๐‘ฃ๐‘›| โ‰ค 1 we simply pick the element of ๐‘ฅ withlargest absolute value, say |๐‘ฅ๐‘˜|, and set ๐‘ฃ๐‘˜ = ยฑ1, so that โ€–๐‘ฅโ€–*,1 = |๐‘ฅ๐‘˜| = โ€–๐‘ฅโ€–โˆž. Thisillustrates a more general property of dual norms, namely that โ€–๐‘ฅโ€–** = โ€–๐‘ฅโ€–.

2.2.5 Homogenization

Consider the linear-fractional problem

minimize ๐‘Ž๐‘‡ ๐‘ฅ+๐‘๐‘๐‘‡ ๐‘ฅ+๐‘‘

subject to ๐‘๐‘‡๐‘ฅ + ๐‘‘ > 0,๐น๐‘ฅ = ๐‘”.

(2.8)

Perhaps surprisingly, it can be turned into a linear problem if we homogenize the linearconstraint, i.e. replace it with ๐น๐‘ฆ = ๐‘”๐‘ง for a single variable ๐‘ง โˆˆ R. The full new optimizationproblem is

minimize ๐‘Ž๐‘‡๐‘ฆ + ๐‘๐‘งsubject to ๐‘๐‘‡๐‘ฆ + ๐‘‘๐‘ง = 1,

๐น๐‘ฆ = ๐‘”๐‘ง,๐‘ง โ‰ฅ 0.

(2.9)

If ๐‘ฅ is a feasible point in (2.8) then ๐‘ง = (๐‘๐‘‡๐‘ฅ + ๐‘‘)โˆ’1, ๐‘ฆ = ๐‘ฅ๐‘ง is feasible for (2.9) with thesame objective value. Conversely, if (๐‘ฆ, ๐‘ง) is feasible for (2.9) then ๐‘ฅ = ๐‘ฆ/๐‘ง is feasible in (2.8)and has the same objective value, at least when ๐‘ง ฬธ= 0. If ๐‘ง = 0 and ๐‘ฅ is any feasible pointfor (2.8) then ๐‘ฅ + ๐‘ก๐‘ฆ, ๐‘ก โ†’ +โˆž is a sequence of solutions of (2.8) converging to the value of(2.9). We leave it for the reader to check those statements. In either case we showed anequivalence between the two problems.


Note that, as the sketch of proof above suggests, the optimal value in (2.8) may not be attained, even though the one in the linear problem (2.9) always is. For example, consider a pair of problems constructed as above:

$$\begin{array}{ll} \text{minimize} & x_1/x_2 \\ \text{subject to} & x_2 > 0, \\ & x_1 + x_2 = 1. \end{array} \qquad\qquad \begin{array}{ll} \text{minimize} & y_1 \\ \text{subject to} & y_1 + y_2 = z, \\ & y_2 = 1, \\ & z \geq 0. \end{array}$$

Both have an optimal value of โˆ’1, but on the left we can only approach it arbitrarily closely.

2.2.6 Sum of largest elements

Suppose ๐‘ฅ โˆˆ R๐‘› and that ๐‘š is a positive integer. Consider the problem

minimize ๐‘š๐‘ก +โˆ‘๏ธ€

๐‘– ๐‘ข๐‘–

subject to ๐‘ข๐‘– + ๐‘ก โ‰ฅ ๐‘ฅ๐‘–, ๐‘– = 1, . . . , ๐‘›,๐‘ข๐‘– โ‰ฅ 0, ๐‘– = 1, . . . , ๐‘›,

(2.10)

with new variables ๐‘ก โˆˆ R, ๐‘ข๐‘– โˆˆ R๐‘›. It is easy to see that fixing a value for ๐‘ก determines therest of the solution. For the sake of simplifying notation let us assume for a moment that ๐‘ฅis sorted:

๐‘ฅ1 โ‰ฅ ๐‘ฅ2 โ‰ฅ ยท ยท ยท โ‰ฅ ๐‘ฅ๐‘›.

If ๐‘ก โˆˆ [๐‘ฅ๐‘˜, ๐‘ฅ๐‘˜+1) then ๐‘ข๐‘™ = 0 for ๐‘™ โ‰ฅ ๐‘˜ + 1 and ๐‘ข๐‘™ = ๐‘ฅ๐‘™ โˆ’ ๐‘ก for ๐‘™ โ‰ค ๐‘˜ in the optimal solution.Therefore, the objective value under the assumption ๐‘ก โˆˆ [๐‘ฅ๐‘˜, ๐‘ฅ๐‘˜+1) is

obj๐‘ก = ๐‘ฅ1 + ยท ยท ยท + ๐‘ฅ๐‘˜ + ๐‘ก(๐‘šโˆ’ ๐‘˜)

which is a linear function minimized at one of the endpoints of [๐‘ฅ๐‘˜, ๐‘ฅ๐‘˜+1). Now we cancompute

obj๐‘ฅ๐‘˜+1โˆ’ obj๐‘ฅ๐‘˜

= (๐‘˜ โˆ’๐‘š)(๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ๐‘˜+1).

It follows that $\mathrm{obj}_{x_k}$ has a minimum for $k = m$, and therefore the optimum value of (2.10) is simply

$$x_1 + \cdots + x_m.$$

Since the assumption that $x$ is sorted was only a notational convenience, we conclude that in general the optimization model (2.10) computes the sum of the $m$ largest entries in $x$. In Sec. 2.4 we will show a conceptual way of deriving this model.
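As a quick numerical sanity check of this claim, the sketch below solves (2.10) with SciPy's `linprog` for a fixed illustrative $x$ and compares the optimal value with the sum of the $m$ largest entries computed directly.

```python
# Sketch: the LP (2.10) evaluates the sum of the m largest entries of a fixed x.
# Variables are (t, u_1..u_n); assumes NumPy/SciPy; the data is illustrative.
import numpy as np
from scipy.optimize import linprog

x = np.array([4.0, -1.0, 2.5, 0.3, 7.0])
n, m = len(x), 3

c = np.concatenate([[m], np.ones(n)])            # objective m*t + sum_i u_i
A_ub = np.hstack([-np.ones((n, 1)), -np.eye(n)]) # -t - u_i <= -x_i, i.e. u_i + t >= x_i
b_ub = -x
res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] + [(0, None)] * n)
print(res.fun, "==", np.sort(x)[-m:].sum())      # both give 13.5
```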


2.3 Infeasibility in linear optimization

In this section we discuss the basic theory of primal infeasibility certificates for linear problems. These ideas will be developed further after we have introduced duality in the next section.

One of the first issues one faces when presented with an optimization problem is whether it has any solutions at all. As we discussed previously, for a linear optimization problem

$$\begin{array}{ll} \text{minimize} & c^Tx \\ \text{subject to} & Ax = b, \\ & x \geq 0, \end{array} \qquad (2.11)$$

the feasible set

$$\mathcal{F}_p = \{x \in \mathbb{R}^n \mid Ax = b,\ x \geq 0\}$$

is a convex polytope. We say the problem is feasible if $\mathcal{F}_p \neq \emptyset$ and infeasible otherwise.

Example 2.3 (Linear infeasible problem). Consider the optimization problem:

minimize 2๐‘ฅ1 + 3๐‘ฅ2 โˆ’ ๐‘ฅ3

subject to ๐‘ฅ1 + ๐‘ฅ2 + 2๐‘ฅ3 = 1,โˆ’ 2๐‘ฅ1 โˆ’ ๐‘ฅ2 + ๐‘ฅ3 = โˆ’0.5,โˆ’ ๐‘ฅ1 + 5๐‘ฅ3 = โˆ’0.1,

๐‘ฅ๐‘– โ‰ฅ 0.

This problem is infeasible. We see it by taking a linear combination of the constraintswith coefficients ๐‘ฆ = (1, 2,โˆ’1)๐‘‡ :

๐‘ฅ1 + ๐‘ฅ2 + 2๐‘ฅ3 = 1, / ยท1โˆ’ 2๐‘ฅ1 โˆ’ ๐‘ฅ2 + ๐‘ฅ3 = โˆ’0.5, / ยท2โˆ’ ๐‘ฅ1 + 5๐‘ฅ3 = โˆ’0.1, / ยท(โˆ’1)โˆ’ 2๐‘ฅ1 โˆ’ ๐‘ฅ2 โˆ’ ๐‘ฅ3 = 0.1.

This clearly proves infeasibility: for any $x \geq 0$ the left-hand side is nonpositive while the right-hand side is positive, which is impossible.
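The certificate can be checked mechanically. The following NumPy sketch verifies that $y = (1, 2, -1)$ satisfies $A^Ty \leq 0$ and $b^Ty > 0$ for the data of this example (the matrix form of the condition is explained in the next section).

```python
# Sketch: checking the infeasibility certificate y = (1, 2, -1) from Example 2.3,
# i.e. A^T y <= 0 and b^T y > 0.  Assumes NumPy.
import numpy as np

A = np.array([[ 1.0,  1.0, 2.0],
              [-2.0, -1.0, 1.0],
              [-1.0,  0.0, 5.0]])
b = np.array([1.0, -0.5, -0.1])
y = np.array([1.0, 2.0, -1.0])

print(A.T @ y)   # [-2. -1. -1.]  -> all entries <= 0
print(b @ y)     # 0.1            -> > 0, so Ax = b, x >= 0 is infeasible
```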

2.3.1 Farkasโ€™ lemma

In the last example we proved infeasibility of the linear system by exhibiting an explicit linear combination of the equations, such that the right-hand side (constant) is positive while on the left-hand side all coefficients are negative or zero. In matrix notation, such a linear combination is given by a vector $y$ such that $A^Ty \leq 0$ and $b^Ty > 0$. The next lemma shows that infeasibility of (2.11) is equivalent to the existence of such a vector.


Lemma 2.1 (Farkasโ€™ lemma). Given $A$ and $b$ as in (2.11), exactly one of the two statements is true:

1. There exists ๐‘ฅ โ‰ฅ 0 such that ๐ด๐‘ฅ = ๐‘.

2. There exists ๐‘ฆ such that ๐ด๐‘‡๐‘ฆ โ‰ค 0 and ๐‘๐‘‡๐‘ฆ > 0.

Proof. Let ๐‘Ž1, . . . , ๐‘Ž๐‘› be the columns of ๐ด. The set {๐ด๐‘ฅ | ๐‘ฅ โ‰ฅ 0} is a closed convex conespanned by ๐‘Ž1, . . . , ๐‘Ž๐‘›. If this cone contains ๐‘ then we have the first alternative. Otherwisethe cone can be separated from the point ๐‘ by a hyperplane passing through 0, i.e. thereexists ๐‘ฆ such that ๐‘ฆ๐‘‡ ๐‘ > 0 and ๐‘ฆ๐‘‡๐‘Ž๐‘– โ‰ค 0 for all ๐‘–. This is equivalent to the second alternative.Finally, 1. and 2. are mutually exclusive, since otherwise we would have

0 < ๐‘ฆ๐‘‡ ๐‘ = ๐‘ฆ๐‘‡๐ด๐‘ฅ = (๐ด๐‘‡๐‘ฆ)๐‘‡๐‘ฅ โ‰ค 0.

Farkasโ€™ lemma implies that either the problem (2.11) is feasible or there is a certificate of infeasibility $y$. In other words, every time we classify a model as infeasible, we can certify this fact by providing an appropriate $y$, as in Example 2.3.

2.3.2 Locating infeasibility

As we already discussed, the infeasibility certificate $y$ gives coefficients of a linear combination of the constraints which is infeasible โ€œin an obvious wayโ€, that is positive on one side and negative on the other. In some cases, $y$ may be very sparse, i.e. it may have very few nonzeros, which means that already a very small subset of the constraints is the root cause of infeasibility. This may be interesting if, for example, we are debugging a large model which we expected to be feasible and infeasibility is caused by an error in the problem formulation. Then we only have to consider the sub-problem formed by constraints with index set $\{j \mid y_j \neq 0\}$.

Example 2.4 (All constraints involved in infeasibility). As a cautionary note consider the constraints

$$0 \leq x_1 \leq x_2 \leq \cdots \leq x_n \leq -1.$$

Any problem with those constraints is infeasible, but dropping any one of the inequalities creates a feasible subproblem.

2.4 Duality in linear optimization

Duality is a rich and powerful theory, central to understanding infeasibility and sensitivity issues in linear optimization. In this section we only discuss duality in linear optimization at a descriptive level suited for practitioners; we refer to Sec. 8 for a more in-depth discussion of duality for general conic problems.


2.4.1 The dual problem

Primal problem

We consider as always a linear optimization problem in standard form:

minimize ๐‘๐‘‡๐‘ฅsubject to ๐ด๐‘ฅ = ๐‘,

๐‘ฅ โ‰ฅ 0.(2.12)

We denote the optimal objective value in (2.12) by ๐‘โ‹†. There are three possibilities:

โ€ข The problem is infeasible. By convention ๐‘โ‹† = +โˆž.

โ€ข ๐‘โ‹† is finite, in which case the problem has an optimal solution.

โ€ข ๐‘โ‹† = โˆ’โˆž, meaning that there are feasible solutions with ๐‘๐‘‡๐‘ฅ decreasing to โˆ’โˆž, inwhich case we say the problem is unbounded.

Lagrange function

We associate with (2.12) a so-called Lagrangian function $L : \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^n_+ \to \mathbb{R}$ that augments the objective with a weighted combination of all the constraints,

$$L(x, y, s) = c^Tx + y^T(b - Ax) - s^Tx.$$

The variables $y \in \mathbb{R}^m$ and $s \in \mathbb{R}^n_+$ are called Lagrange multipliers or dual variables. For any feasible $x^* \in \mathcal{F}_p$ and any $(y^*, s^*) \in \mathbb{R}^m \times \mathbb{R}^n_+$ we have

$$L(x^*, y^*, s^*) = c^Tx^* + (y^*)^T \cdot 0 - (s^*)^Tx^* \leq c^Tx^*.$$

Note that we used the nonnegativity of $s^*$, or in general of any Lagrange multiplier associated with an inequality constraint. The dual function is defined as the minimum of $L(x, y, s)$ over $x$. Thus the dual function of (2.12) is

๐‘”(๐‘ฆ, ๐‘ ) = min๐‘ฅ

๐ฟ(๐‘ฅ, ๐‘ฆ, ๐‘ ) = min๐‘ฅ

๐‘ฅ๐‘‡ (๐‘โˆ’ ๐ด๐‘‡๐‘ฆ โˆ’ ๐‘ ) + ๐‘๐‘‡๐‘ฆ =

{๏ธ‚๐‘๐‘‡๐‘ฆ, ๐‘โˆ’ ๐ด๐‘‡๐‘ฆ โˆ’ ๐‘  = 0,โˆ’โˆž, otherwise.

Dual problem

For every (๐‘ฆ, ๐‘ ) the value of ๐‘”(๐‘ฆ, ๐‘ ) is a lower bound for ๐‘โ‹†. To get the best such boundwe maximize ๐‘”(๐‘ฆ, ๐‘ ) over all (๐‘ฆ, ๐‘ ) and get the dual problem:

maximize ๐‘๐‘‡๐‘ฆsubject to ๐‘โˆ’ ๐ด๐‘‡๐‘ฆ = ๐‘ ,

๐‘  โ‰ฅ 0.(2.13)

The optimal value of (2.13) will be denoted ๐‘‘โ‹†. As in the case of (2.12) (which from now onwe call the primal problem), the dual problem can be infeasible (๐‘‘โ‹† = โˆ’โˆž), have an optimalsolution (โˆ’โˆž < ๐‘‘โ‹† < +โˆž) or be unbounded (๐‘‘โ‹† = +โˆž). Note that the roles of โˆ’โˆž and+โˆž are now reversed because the dual is a maximization problem.


Example 2.5 (Dual of basis pursuit). As an example, let us derive the dual of the basis pursuit formulation (2.7). It would be possible to add auxiliary variables and constraints to force that problem into the standard form (2.12) and then just apply the dual transformation as a black box, but it is both easier and more instructive to directly write the Lagrangian:

$$L(x, z, y, u, v) = e^Tz + u^T(x - z) - v^T(x + z) + y^T(b - Ax)$$

where $e = (1,\dots,1)^T$, with Lagrange multipliers $y \in \mathbb{R}^m$ and $u, v \in \mathbb{R}^n_+$. The dual function

$$g(y, u, v) = \min_{x,z} L(x, z, y, u, v) = \min_{x,z}\ z^T(e - u - v) + x^T(u - v - A^Ty) + y^Tb$$

is only bounded below if $e = u + v$ and $A^Ty = u - v$, hence the dual problem is

$$\begin{array}{ll} \text{maximize} & b^Ty \\ \text{subject to} & e = u + v, \\ & A^Ty = u - v, \\ & u, v \geq 0. \end{array} \qquad (2.14)$$

It is not hard to observe that an equivalent formulation of (2.14) is simply

maximize ๐‘๐‘‡๐‘ฆsubject to โ€–๐ด๐‘‡๐‘ฆโ€–โˆž โ‰ค 1,

(2.15)

which should be associated with duality between norms discussed in Example 2.2.

Example 2.6 (Dual of a maximization problem). We can similarly derive the dual of problem (2.13). If we write it simply as

$$\begin{array}{ll} \text{maximize} & b^Ty \\ \text{subject to} & c - A^Ty \geq 0, \end{array}$$

then the Lagrangian is

$$L(y, u) = b^Ty + u^T(c - A^Ty) = y^T(b - Au) + c^Tu$$

with $u \in \mathbb{R}^n_+$, so that now $L(y, u) \geq b^Ty$ for any feasible $y$. Calculating $\min_u \max_y L(y, u)$ is now equivalent to the problem

$$\begin{array}{ll} \text{minimize} & c^Tu \\ \text{subject to} & Au = b, \\ & u \geq 0, \end{array}$$

so, as expected, the dual of the dual recovers the original primal problem.


2.4.2 Weak and strong duality

Suppose ๐‘ฅ* and (๐‘ฆ*, ๐‘ *) are feasible points for the primal and dual problems (2.12) and (2.13),respectively. Then we have

๐‘๐‘‡๐‘ฆ* = (๐ด๐‘ฅ*)๐‘‡๐‘ฆ* = (๐‘ฅ*)๐‘‡ (๐ด๐‘‡๐‘ฆ*) = (๐‘ฅ*)๐‘‡ (๐‘โˆ’ ๐‘ *) = ๐‘๐‘‡๐‘ฅ* โˆ’ (๐‘ *)๐‘‡๐‘ฅ* โ‰ค ๐‘๐‘‡๐‘ฅ*

so the dual objective value is a lower bound on the objective value of the primal. In particular,any dual feasible point (๐‘ฆ*, ๐‘ *) gives a lower bound:

๐‘๐‘‡๐‘ฆ* โ‰ค ๐‘โ‹†

and we immediately get the next lemma.

Lemma 2.2 (Weak duality). ๐‘‘โ‹† โ‰ค ๐‘โ‹†.

It follows that if $b^Ty^* = c^Tx^*$ then $x^*$ is optimal for the primal, $(y^*, s^*)$ is optimal for the dual and $b^Ty^* = c^Tx^*$ is the common optimal objective value. This way we can use the optimal dual solution to certify optimality of the primal solution and vice versa.

The remarkable property of linear optimization is that $d^\star = p^\star$ holds in the most interesting scenario when the primal problem is feasible and bounded. It means that the certificate of optimality mentioned in the previous paragraph always exists.

Lemma 2.3 (Strong duality). If at least one of ๐‘‘โ‹†, ๐‘โ‹† is finite then ๐‘‘โ‹† = ๐‘โ‹†.

Proof. Suppose $-\infty < p^\star < \infty$; the proof in the dual case is analogous. For any $\varepsilon > 0$ consider the feasibility problem with variable $x \geq 0$ and constraints

$$\begin{bmatrix} -c^T \\ A \end{bmatrix} x = \begin{bmatrix} -p^\star + \varepsilon \\ b \end{bmatrix} \quad \text{that is} \quad \begin{array}{rcl} c^Tx & = & p^\star - \varepsilon, \\ Ax & = & b. \end{array}$$

Optimality of $p^\star$ implies that the above problem is infeasible. By Lemma 2.1 there exists $\hat{y} = [y_0\ y]^T$ such that

$$[-c,\ A^T]\,\hat{y} \leq 0 \quad \text{and} \quad [-p^\star + \varepsilon,\ b^T]\,\hat{y} > 0.$$

If $y_0 = 0$ then $A^Ty \leq 0$ and $b^Ty > 0$, which by Lemma 2.1 again would mean that the original primal problem was infeasible, which is not the case. Hence we can rescale so that $y_0 = 1$ and then we get

$$c - A^Ty \geq 0 \quad \text{and} \quad b^Ty \geq p^\star - \varepsilon.$$

The first inequality above implies that $y$ is feasible for the dual problem. By letting $\varepsilon \to 0$ we obtain $d^\star \geq p^\star$.

We can exploit strong duality to freely choose between solving the primal or dual version of any linear problem.
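For example, the sketch below solves a small standard-form primal and its dual (2.13) with SciPy's `linprog` and confirms that the two optimal values coincide; the data is the toy problem (2.1).

```python
# Sketch: verifying d* = p* numerically by solving both (2.12) and (2.13).
# Assumes NumPy/SciPy; data is the small example (2.1).
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.0])
c = np.array([1.0, 2.0, -1.0])

# Primal: minimize c^T x s.t. Ax = b, x >= 0.
primal = linprog(c, A_eq=A, b_eq=b, bounds=[(0, None)] * 3)

# Dual: maximize b^T y s.t. A^T y <= c (so s = c - A^T y >= 0), y free.
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=[(None, None)])
print(primal.fun, -dual.fun)    # both equal -1.0
```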


Example 2.7 (Sum of largest elements). Suppose that $x$ is now a constant vector. Consider the following problem with variable $z$:

$$\begin{array}{ll} \text{maximize} & x^Tz \\ \text{subject to} & \sum_i z_i = m, \\ & 0 \leq z \leq 1. \end{array}$$

The maximum is attained when $z$ indicates the positions of the $m$ largest entries in $x$, and the objective value is then their sum. This formulation, however, cannot be used when $x$ is another variable, since then the objective function is no longer linear. Let us derive the dual problem. The Lagrangian is

$$L(z, s, t, u) = x^Tz + t(m - e^Tz) + s^Tz + u^T(e - z) = z^T(x - te + s - u) + tm + u^Te$$

with $u, s \geq 0$. Since $s_i \geq 0$ is arbitrary and not otherwise constrained, the equality $x_i - t + s_i - u_i = 0$ is the same as $u_i + t \geq x_i$ and for the dual problem we get

$$\begin{array}{lll} \text{minimize} & mt + \sum_i u_i & \\ \text{subject to} & u_i + t \geq x_i, & i = 1,\dots,n, \\ & u_i \geq 0, & i = 1,\dots,n, \end{array}$$

which is exactly the problem (2.10) we studied in Sec. 2.2.6. Strong duality now implies that (2.10) computes the sum of the $m$ biggest entries in $x$.

2.4.3 Duality and infeasibility: summary

We can now expand the discussion of infeasibility certificates in the context of duality. Farkasโ€™ lemma (Lemma 2.1) can be dualized and the two versions can be summarized as follows:

Lemma 2.4 (Primal and dual Farkasโ€™ lemma). For a primal-dual pair of linear problems we have the following equivalences:

1. The primal problem (2.12) is infeasible if and only if there is $y$ such that $A^Ty \leq 0$ and $b^Ty > 0$.

2. The dual problem (2.13) is infeasible if and only if there is $x \geq 0$ such that $Ax = 0$ and $c^Tx < 0$.

Weak and strong duality for linear optimization now lead to the following conclusions:

โ€ข If the problem is primal feasible and has finite objective value ($-\infty < p^\star < \infty$) then so is the dual and $d^\star = p^\star$. We sometimes refer to this case as primal and dual feasible. The dual solution certifies the optimality of the primal solution and vice versa.

โ€ข If the primal problem is feasible but unbounded ($p^\star = -\infty$) then the dual is infeasible ($d^\star = -\infty$). Part (ii) of Farkasโ€™ lemma provides a certificate of this fact, that is a vector $x$ with $x \geq 0$, $Ax = 0$ and $c^Tx < 0$. In fact it is easy to give this statement a geometric interpretation. If $x_0$ is any primal feasible point then the infinite ray

$$t \to x_0 + tx, \quad t \in [0, \infty)$$

belongs to the feasible set $\mathcal{F}_p$ because $A(x_0 + tx) = b$ and $x_0 + tx \geq 0$. Along this ray the objective value is unbounded below:

$$c^T(x_0 + tx) = c^Tx_0 + t(c^Tx) \to -\infty.$$

โ€ข If the primal problem is infeasible ($p^\star = \infty$) then a certificate of this fact is provided by part (i). The dual problem may be unbounded ($d^\star = \infty$) or infeasible ($d^\star = -\infty$).

Example 2.8 (Primal-dual infeasibility). Weak and strong duality imply that the only case when $d^\star \neq p^\star$ is when both primal and dual problem are infeasible ($d^\star = -\infty$, $p^\star = \infty$), for example:

$$\begin{array}{ll} \text{minimize} & x \\ \text{subject to} & 0 \cdot x = 1. \end{array}$$

2.4.4 Dual values as shadow prices

Dual values are related to shadow prices, as they measure, under some nondegeneracy assumption, the sensitivity of the objective value to a change in the constraint. Consider again the primal and dual problem pair (2.12) and (2.13) with feasible sets $\mathcal{F}_p$ and $\mathcal{F}_d$ and with a primal-dual optimal solution $(x^*, y^*, s^*)$.

Suppose we change one of the values in $b$ from $b_i$ to $b_i'$. This corresponds to moving one of the hyperplanes defining $\mathcal{F}_p$, and in consequence the optimal solution (and the objective value) may change. On the other hand, the dual feasible set $\mathcal{F}_d$ is not affected. Assuming that the solution $(y^*, s^*)$ was a unique vertex of $\mathcal{F}_d$ this point remains optimal for the dual after a sufficiently small change of $b$. But then the change of the dual objective is

$$y_i^*(b_i' - b_i)$$

and by strong duality the primal objective changes by the same amount.

Example 2.9 (Student diet). An optimization student wants to save money on the diet while remaining healthy. A healthy diet requires at least $P = 6$ units of protein, $C = 15$ units of carbohydrates, $F = 5$ units of fats and $V = 7$ units of vitamins. The student can choose from the following products:

             P    C    F    V    price
takeaway     3    3    2    1    5
vegetables   1    2    0    4    1
bread        0.5  4    1    0    2


The problem of minimizing cost while meeting dietary requirements is

minimize 5๐‘ฅ1 + ๐‘ฅ2 + 2๐‘ฅ3

subject to 3๐‘ฅ1 + ๐‘ฅ2 + 0.5๐‘ฅ3 โ‰ฅ 6,3๐‘ฅ1 + 2๐‘ฅ2 + 4๐‘ฅ3 โ‰ฅ 15,2๐‘ฅ1 + ๐‘ฅ3 โ‰ฅ 5,๐‘ฅ1 + 4๐‘ฅ2 โ‰ฅ 7,๐‘ฅ1, ๐‘ฅ2, ๐‘ฅ3 โ‰ฅ 0.

If ๐‘ฆ1, ๐‘ฆ2, ๐‘ฆ3, ๐‘ฆ4 are the dual variables associated with the four inequality constraints thenthe (unique) primal-dual optimal solution to this problem is approximately:

(๐‘ฅ, ๐‘ฆ) = ((1, 1.5, 3), (0.42, 0, 1.78, 0.14))

with optimal cost ๐‘โ‹† = 12.5. Note ๐‘ฆ2 = 0 indicates that the second constraint is notbinding. Indeed, we could increase ๐ถ to 18 without affecting the optimal solution. Theremaining constraints are binding.

Improving the intake of protein by 1 unit (increasing ๐‘ƒ to 7) will increase the costby 0.42, while doing the same for fat will cost an extra 1.78 per unit. If the student hadextra money to improve one of the parameters then the best choice would be to increasethe intake of vitamins, with shadow price of just 0.14.

If one month the student only had 12 units of money and was willing to relax one ofthe requirements then the best choice is to save on fats: the necessary reduction of ๐น issmallest, namely 0.5 ยท 1.78โˆ’1 = 0.28. Indeed, with the new value of ๐น = 4.72 the sameproblem solves to ๐‘โ‹† = 12 and ๐‘ฅ = (1.08, 1.48, 2.56).

We stress that a truly balanced diet problem should also include upper bounds.
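The shadow prices above can be reproduced with an off-the-shelf LP solver. The sketch below solves the diet problem with SciPy's `linprog` and reads the dual values from the HiGHS backend; the `ineqlin.marginals` attribute is assumed to be available in a reasonably recent SciPy release.

```python
# Sketch: solving the student diet LP and reading the dual values (shadow prices).
# Assumes NumPy/SciPy with the HiGHS backend.
import numpy as np
from scipy.optimize import linprog

c = np.array([5.0, 1.0, 2.0])
# linprog uses A_ub x <= b_ub, so flip the signs of the >= constraints.
A_ub = -np.array([[3.0, 1.0, 0.5],
                  [3.0, 2.0, 4.0],
                  [2.0, 0.0, 1.0],
                  [1.0, 4.0, 0.0]])
b_ub = -np.array([6.0, 15.0, 5.0, 7.0])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 3)
print("cost  =", res.fun)                  # about 12.5
print("x     =", res.x)                    # about (1, 1.5, 3)
print("duals =", -res.ineqlin.marginals)   # about (0.42, 0, 1.78, 0.14)
```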


Chapter 3

Conic quadratic optimization

This chapter extends the notion of linear optimization with quadratic cones. Conic quadratic optimization, also known as second-order cone optimization, is a straightforward generalization of linear optimization, in the sense that we optimize a linear function under linear (in)equalities with some variables belonging to one or more (rotated) quadratic cones. We discuss the basic concept of quadratic cones, and demonstrate the surprisingly large flexibility of conic quadratic modeling.

3.1 Cones

Since this is the first place where we introduce a non-linear cone, it seems suitable to make our most important definition:

A set ๐พ โŠ† R๐‘› is called a convex cone if

โ€ข for every ๐‘ฅ, ๐‘ฆ โˆˆ ๐พ we have ๐‘ฅ + ๐‘ฆ โˆˆ ๐พ,

โ€ข for every ๐‘ฅ โˆˆ ๐พ and ๐›ผ โ‰ฅ 0 we have ๐›ผ๐‘ฅ โˆˆ ๐พ.

For example a linear subspace of $\mathbb{R}^n$, the positive orthant $\mathbb{R}^n_{\geq 0}$ or any ray (half-line) starting at the origin are examples of convex cones. We leave it for the reader to check that the intersection of convex cones is a convex cone; this property enables us to assemble complicated optimization models from individual conic bricks.

3.1.1 Quadratic cones

We define the ๐‘›-dimensional quadratic cone as

๐’ฌ๐‘› =

{๏ธ‚๐‘ฅ โˆˆ R๐‘› | ๐‘ฅ1 โ‰ฅ

โˆš๏ธ๐‘ฅ22 + ๐‘ฅ2

3 + ยท ยท ยท + ๐‘ฅ2๐‘›

}๏ธ‚. (3.1)

The geometric interpretation of a quadratic (or second-order) cone is shown in Fig. 3.1 fora cone with three variables, and illustrates how the boundary of the cone resembles anice-cream cone. The 1-dimensional quadratic cone simply states nonnegativity ๐‘ฅ1 โ‰ฅ 0.


Fig. 3.1: Boundary of the quadratic cone $x_1 \geq \sqrt{x_2^2 + x_3^2}$ and the rotated quadratic cone $2x_1x_2 \geq x_3^2$, $x_1, x_2 \geq 0$.

3.1.2 Rotated quadratic cones

An ๐‘›โˆ’dimensional rotated quadratic cone is defined as

๐’ฌ๐‘›๐‘Ÿ =

{๏ธ€๐‘ฅ โˆˆ R๐‘› | 2๐‘ฅ1๐‘ฅ2 โ‰ฅ ๐‘ฅ2

3 + ยท ยท ยท + ๐‘ฅ2๐‘›, ๐‘ฅ1, ๐‘ฅ2 โ‰ฅ 0

}๏ธ€. (3.2)

As the name indicates, there is a simple relationship between quadratic and rotated quadratic cones. Define an orthogonal transformation

$$T_n := \begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} & 0 \\ 1/\sqrt{2} & -1/\sqrt{2} & 0 \\ 0 & 0 & I_{n-2} \end{bmatrix}. \qquad (3.3)$$

Then it is easy to verify that

$$x \in \mathcal{Q}^n \iff T_nx \in \mathcal{Q}_r^n,$$

and since $T_n$ is orthogonal we call $\mathcal{Q}_r^n$ a rotated cone; the transformation corresponds to a rotation of $\pi/4$ in the $(x_1, x_2)$ plane. For example if $x \in \mathcal{Q}^3$ and

$$\begin{bmatrix} z_1 \\ z_2 \\ z_3 \end{bmatrix} = \begin{bmatrix} \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}} & 0 \\ \tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{bmatrix} \cdot \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} \tfrac{1}{\sqrt{2}}(x_1 + x_2) \\ \tfrac{1}{\sqrt{2}}(x_1 - x_2) \\ x_3 \end{bmatrix}$$

then

$$2z_1z_2 \geq z_3^2,\ z_1, z_2 \geq 0 \implies (x_1^2 - x_2^2) \geq x_3^2,\ x_1 \geq 0,$$

and similarly we see that

$$x_1^2 \geq x_2^2 + x_3^2,\ x_1 \geq 0 \implies 2z_1z_2 \geq z_3^2,\ z_1, z_2 \geq 0.$$

Thus, one could argue that we only need quadratic cones $\mathcal{Q}^n$, but there are many examples where using an explicit rotated quadratic cone $\mathcal{Q}_r^n$ is more natural, as we will see next.
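The equivalence $x \in \mathcal{Q}^n \iff T_nx \in \mathcal{Q}_r^n$ is easy to check numerically. The NumPy sketch below implements the two membership tests straight from definitions (3.1) and (3.2) and compares them on random points (illustrative only; boundary cases have measure zero).

```python
# Sketch: checking x in Q^n  iff  T_n x in Q_r^n for the transformation (3.3).
# Assumes NumPy; membership tests are literal transcriptions of (3.1) and (3.2).
import numpy as np

def in_quadratic_cone(x):
    return x[0] >= np.linalg.norm(x[1:])

def in_rotated_cone(x):
    return x[0] >= 0 and x[1] >= 0 and 2 * x[0] * x[1] >= np.sum(x[2:] ** 2)

def T(n):
    M = np.eye(n)
    s = 1 / np.sqrt(2)
    M[:2, :2] = [[s, s], [s, -s]]
    return M

rng = np.random.default_rng(1)
for _ in range(1000):
    x = rng.standard_normal(4)
    assert in_quadratic_cone(x) == in_rotated_cone(T(4) @ x)
print("memberships agree on all sampled points")
```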


3.2 Conic quadratic modeling

In the following we describe several convex sets that can be modeled using conic quadratic formulations or, as we call them, are conic quadratic representable.

3.2.1 Absolute values

In Sec. 2.2.2 we saw how to model $|x| \leq t$ using two linear inequalities, but in fact the epigraph of the absolute value is just the definition of a two-dimensional quadratic cone, i.e.,

$$|x| \leq t \iff (t, x) \in \mathcal{Q}^2.$$

3.2.2 Euclidean norms

The Euclidean norm of $x \in \mathbb{R}^n$,

$$\|x\|_2 = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2},$$

essentially defines the quadratic cone, i.e.,

$$\|x\|_2 \leq t \iff (t, x) \in \mathcal{Q}^{n+1}.$$

The epigraph of the squared Euclidean norm can be described as the intersection of a rotated quadratic cone with an affine hyperplane,

$$x_1^2 + \cdots + x_n^2 = \|x\|_2^2 \leq t \iff (1/2, t, x) \in \mathcal{Q}_r^{n+2}.$$

3.2.3 Convex quadratic sets

Assume ๐‘„ โˆˆ R๐‘›ร—๐‘› is a symmetric positive semidefinite matrix. The convex inequality

(1/2)๐‘ฅ๐‘‡๐‘„๐‘ฅ + ๐‘๐‘‡๐‘ฅ + ๐‘Ÿ โ‰ค 0

may be rewritten as

๐‘ก + ๐‘๐‘‡๐‘ฅ + ๐‘Ÿ = 0,๐‘ฅ๐‘‡๐‘„๐‘ฅ โ‰ค 2๐‘ก.

(3.4)

Since ๐‘„ is symmetric positive semidefinite the epigraph

๐‘ฅ๐‘‡๐‘„๐‘ฅ โ‰ค 2๐‘ก (3.5)


is a convex set and there exists a matrix ๐น โˆˆ R๐‘˜ร—๐‘› such that

๐‘„ = ๐น ๐‘‡๐น (3.6)

(see Sec. 6 for properties of semidefinite matrices). For instance $F$ could be the Cholesky factorization of $Q$. Then

$$x^TQx = x^TF^TFx = \|Fx\|_2^2$$

and we have an equivalent characterization of (3.5) as

$$(1/2)x^TQx \leq t \iff (t, 1, Fx) \in \mathcal{Q}_r^{2+k}.$$

Frequently ๐‘„ has the structure

๐‘„ = ๐ผ + ๐น ๐‘‡๐น

where ๐ผ is the identity matrix, so

๐‘ฅ๐‘‡๐‘„๐‘ฅ = ๐‘ฅ๐‘‡๐‘ฅ + ๐‘ฅ๐‘‡๐น ๐‘‡๐น๐‘ฅ = โ€–๐‘ฅโ€–22 + โ€–๐น๐‘ฅโ€–22

and hence

(๐‘“, 1, ๐‘ฅ) โˆˆ ๐’ฌ2+๐‘›๐‘Ÿ , (โ„Ž, 1, ๐น๐‘ฅ) โˆˆ ๐’ฌ2+๐‘˜

๐‘Ÿ , ๐‘“ + โ„Ž = ๐‘ก

is a conic quadratic representation of (3.5) in this case.
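The following NumPy sketch illustrates the basic construction on random data: it factors $Q = F^TF$ with a Cholesky factorization and checks that a point satisfying $x^TQx \leq 2t$ indeed gives $(t, 1, Fx) \in \mathcal{Q}_r^{2+k}$ (illustrative data only).

```python
# Sketch: rewriting x^T Q x <= 2t with Q = F^T F as the rotated-cone membership
# (t, 1, Fx) in Q_r^{2+k}.  Assumes NumPy; random positive definite data.
import numpy as np

rng = np.random.default_rng(2)
n = 4
M = rng.standard_normal((n, n))
Q = M.T @ M                      # symmetric positive definite (almost surely)
F = np.linalg.cholesky(Q).T      # Q = F^T F with F upper triangular

x = rng.standard_normal(n)
t = 0.5 * x @ Q @ x + 0.1        # any t with x^T Q x <= 2t

y = np.concatenate([[t, 1.0], F @ x])
# rotated cone test: y1, y2 >= 0 and 2*y1*y2 >= sum of squares of the rest
print(y[0] >= 0 and y[1] >= 0 and 2 * y[0] * y[1] >= np.sum(y[2:] ** 2))  # True
print(np.allclose(x @ Q @ x, np.sum((F @ x) ** 2)))                       # True
```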

3.2.4 Second-order cones

A second-order cone is occasionally specified as

โ€–๐ด๐‘ฅ + ๐‘โ€–2 โ‰ค ๐‘๐‘‡๐‘ฅ + ๐‘‘ (3.7)

where ๐ด โˆˆ R๐‘šร—๐‘› and ๐‘ โˆˆ R๐‘›. The formulation (3.7) is simply

(๐‘๐‘‡๐‘ฅ + ๐‘‘,๐ด๐‘ฅ + ๐‘) โˆˆ ๐’ฌ๐‘š+1 (3.8)

or equivalently

๐‘  = ๐ด๐‘ฅ + ๐‘,๐‘ก = ๐‘๐‘‡๐‘ฅ + ๐‘‘,

(๐‘ก, ๐‘ ) โˆˆ ๐’ฌ๐‘š+1.(3.9)

As will be explained in Sec. 8, we refer to (3.8) as the dual form and (3.9) as the primalform. An alternative characterization of (3.7) is

โ€–๐ด๐‘ฅ + ๐‘โ€–22 โˆ’ (๐‘๐‘‡๐‘ฅ + ๐‘‘)2 โ‰ค 0, ๐‘๐‘‡๐‘ฅ + ๐‘‘ โ‰ฅ 0 (3.10)

which shows that certain quadratic inequalities are conic quadratic representable.


3.2.5 Simple sets involving power functions

Some power-like inequalities are conic quadratic representable, even though it need not be obvious at first glance. For example, we have

$$|t| \leq \sqrt{x},\ x \geq 0 \iff (x, 1/2, t) \in \mathcal{Q}_r^3,$$

or in a similar fashion

$$t \geq \frac{1}{x},\ x \geq 0 \iff (x, t, \sqrt{2}) \in \mathcal{Q}_r^3.$$

For a more complicated example, consider the constraint

$$t \geq x^{3/2}, \quad x \geq 0.$$

This is equivalent to a statement involving two cones and an extra variable

(๐‘ , ๐‘ก, ๐‘ฅ), (๐‘ฅ, 1/8, ๐‘ ) โˆˆ ๐’ฌ3๐‘Ÿ

because

2๐‘ ๐‘ก โ‰ฅ ๐‘ฅ2, 2 ยท 1

8๐‘ฅ โ‰ฅ ๐‘ 2, =โ‡’ 4๐‘ 2๐‘ก2 ยท 1

4๐‘ฅ โ‰ฅ ๐‘ฅ4 ยท ๐‘ 2 =โ‡’ ๐‘ก โ‰ฅ ๐‘ฅ3/2.

In practice power-like inequalities representable with similar tricks can often be expressedmuch more naturally using the power cone (see Sec. 4), so we will not dwell on these examplesmuch longer.
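As a small numerical illustration of the $t \geq x^{3/2}$ construction, the sketch below picks $s$ and $t$ on the boundaries of the two rotated cones for a given $x$ and recovers exactly $x^{3/2}$ (NumPy, made-up value of $x$).

```python
# Sketch: the two-cone model of t >= x^(3/2).  On the cone boundaries the
# intermediate variable s and the bound t reproduce x^(3/2) exactly.
import numpy as np

x = 2.7
s = np.sqrt(2 * x * (1 / 8))     # boundary of (x, 1/8, s):  2*x*(1/8) = s^2
t = x ** 2 / (2 * s)             # boundary of (s, t, x):    2*s*t = x^2
print(t, x ** 1.5)               # both equal x^(3/2)
```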

3.2.6 Rational functions

A general non-degenerate rational function of one variable $f(x) = \frac{ax+b}{cx+d}$ can always be written in the form $f(x) = p + \frac{q}{cx+d}$, and when $q > 0$ it is convex on the set where $cx + d > 0$. The conic quadratic model of

$$t \geq p + \frac{q}{cx + d}, \quad cx + d > 0, \quad q > 0$$

with variables $t, x$ can be written as:

$$(t - p,\ cx + d,\ \sqrt{2q}) \in \mathcal{Q}_r^3.$$

3.2.7 Harmonic mean

Consider next the hypograph of the harmonic mean,

$$\left( \frac{1}{n} \sum_{i=1}^n x_i^{-1} \right)^{-1} \geq t \geq 0, \quad x \geq 0.$$


It is not obvious that the inequality defines a convex set, nor that it is conic quadratic representable. However, we can rewrite it in the form

$$\sum_{i=1}^n \frac{t^2}{x_i} \leq nt,$$

which suggests the conic representation:

2๐‘ฅ๐‘–๐‘ง๐‘– โ‰ฅ ๐‘ก2, ๐‘ฅ๐‘–, ๐‘ง๐‘– โ‰ฅ 0, 2๐‘›โˆ‘๏ธ

๐‘–=1

๐‘ง๐‘– = ๐‘›๐‘ก. (3.11)

Alternatively, in case $n = 2$, the hypograph of the harmonic mean can be represented by a single quadratic cone:

$$(x_1 + x_2 - t/2,\ x_1,\ x_2,\ t/2) \in \mathcal{Q}^4. \qquad (3.12)$$

As this representation generalizes the harmonic mean to all points $x_1 + x_2 \geq 0$, as implied by (3.12), the variable bounds $x \geq 0$ and $t \geq 0$ have to be added separately if so desired.

3.2.8 Quadratic forms with one negative eigenvalue

Assume that ๐ด โˆˆ R๐‘›ร—๐‘› is a symmetric matrix with exactly one negative eigenvalue, i.e., ๐ดhas a spectral factorization (i.e., eigenvalue decomposition)

๐ด = ๐‘„ฮ›๐‘„๐‘‡ = โˆ’๐›ผ1๐‘ž1๐‘ž๐‘‡1 +

๐‘›โˆ‘๏ธ๐‘–=2

๐›ผ๐‘–๐‘ž๐‘–๐‘ž๐‘‡๐‘– ,

where ๐‘„๐‘‡๐‘„ = ๐ผ, ฮ› = Diag(โˆ’๐›ผ1, ๐›ผ2, . . . , ๐›ผ๐‘›), ๐›ผ๐‘– โ‰ฅ 0. Then

๐‘ฅ๐‘‡๐ด๐‘ฅ โ‰ค 0

is equivalent to๐‘›โˆ‘๏ธ

๐‘—=2

๐›ผ๐‘—(๐‘ž๐‘‡๐‘— ๐‘ฅ)2 โ‰ค ๐›ผ1(๐‘ž

๐‘‡1 ๐‘ฅ)2. (3.13)

This shows ๐‘ฅ๐‘‡๐ด๐‘ฅ โ‰ค 0 to be a union of two convex subsets, each representable by a quadraticcone. For example, assuming ๐‘ž๐‘‡1 ๐‘ฅ โ‰ฅ 0, we can characterize (3.13) as

(โˆš๐›ผ1๐‘ž

๐‘‡1 ๐‘ฅ,

โˆš๐›ผ2๐‘ž

๐‘‡2 ๐‘ฅ, . . . ,

โˆš๐›ผ๐‘›๐‘ž

๐‘‡๐‘›๐‘ฅ) โˆˆ ๐’ฌ๐‘›. (3.14)

3.2.9 Ellipsoidal sets

The set

โ„ฐ = {๐‘ฅ โˆˆ R๐‘› | โ€–๐‘ƒ (๐‘ฅโˆ’ ๐‘)โ€–2 โ‰ค 1}describes an ellipsoid centred at ๐‘. It has a natural conic quadratic representation, i.e., ๐‘ฅ โˆˆ โ„ฐif and only if

๐‘ฅ โˆˆ โ„ฐ โ‡โ‡’ (1, ๐‘ƒ (๐‘ฅโˆ’ ๐‘)) โˆˆ ๐’ฌ๐‘›+1.


3.3 Conic quadratic case studies

3.3.1 Quadratically constrained quadratic optimization

A general convex quadratically constrained quadratic optimization problem can be written as

$$\begin{array}{ll} \text{minimize} & (1/2)x^TQ_0x + c_0^Tx + r_0 \\ \text{subject to} & (1/2)x^TQ_ix + c_i^Tx + r_i \leq 0, \quad i = 1,\dots,p, \end{array} \qquad (3.15)$$

where all $Q_i \in \mathbb{R}^{n \times n}$ are symmetric positive semidefinite. Let

$$Q_i = F_i^TF_i, \quad i = 0,\dots,p,$$

where $F_i \in \mathbb{R}^{k_i \times n}$. Using the formulations in Sec. 3.2.3 we then get an equivalent conic quadratic problem

$$\begin{array}{ll} \text{minimize} & t_0 + c_0^Tx + r_0 \\ \text{subject to} & t_i + c_i^Tx + r_i = 0, \quad i = 1,\dots,p, \\ & (t_i, 1, F_ix) \in \mathcal{Q}_r^{k_i+2}, \quad i = 0,\dots,p. \end{array} \qquad (3.16)$$

Assume next that ๐‘˜๐‘–, the number of rows in ๐น๐‘–, is small compared to ๐‘›. Storing ๐‘„๐‘– requiresabout ๐‘›2/2 space whereas storing ๐น๐‘– then only requires ๐‘›๐‘˜๐‘– space. Moreover, the amountof work required to evaluate ๐‘ฅ๐‘‡๐‘„๐‘–๐‘ฅ is proportional to ๐‘›2 whereas the work required toevaluate ๐‘ฅ๐‘‡๐น ๐‘‡

๐‘– ๐น๐‘–๐‘ฅ = โ€–๐น๐‘–๐‘ฅโ€–2 is proportional to ๐‘›๐‘˜๐‘– only. In other words, if ๐‘„๐‘– have low rank,then (3.16) will require much less space and time to solve than (3.15). We will study thereformulation (3.16) in much more detail in Sec. 10.

3.3.2 Robust optimization with ellipsoidal uncertainties

Often in robust optimization some of the parameters in the model are assumed to be unknown exactly, but there is a simple set describing the uncertainty. For example, for a standard linear optimization problem we may wish to find a robust solution for all objective vectors $c$ in an ellipsoid

$$\mathcal{E} = \{c \in \mathbb{R}^n \mid c = Fy + g,\ \|y\|_2 \leq 1\}.$$

A common approach is then to optimize for the worst-case scenario for $c$, so we get a robust version

minimize sup๐‘โˆˆโ„ฐ ๐‘๐‘‡๐‘ฅ

subject to ๐ด๐‘ฅ = ๐‘,๐‘ฅ โ‰ฅ 0.

(3.17)

The worst-case objective can be evaluated as

sup๐‘โˆˆโ„ฐ

๐‘๐‘‡๐‘ฅ = ๐‘”๐‘‡๐‘ฅ + supโ€–๐‘ฆโ€–2โ‰ค1

๐‘ฆ๐‘‡๐น ๐‘‡๐‘ฅ = ๐‘”๐‘‡๐‘ฅ + โ€–๐น ๐‘‡๐‘ฅโ€–2


where we used that $\sup_{\|u\|_2 \leq 1} v^Tu = (v^Tv)/\|v\|_2 = \|v\|_2$. Thus the robust problem (3.17) is equivalent to

$$\begin{array}{ll} \text{minimize} & g^Tx + \|F^Tx\|_2 \\ \text{subject to} & Ax = b, \\ & x \geq 0, \end{array}$$

which can be posed as a conic quadratic problem

$$\begin{array}{ll} \text{minimize} & g^Tx + t \\ \text{subject to} & Ax = b, \\ & (t, F^Tx) \in \mathcal{Q}^{n+1}, \\ & x \geq 0. \end{array} \qquad (3.18)$$

3.3.3 Markowitz portfolio optimization

In classical Markowitz portfolio optimization we consider investment in $n$ stocks or assets held over a period of time. Let $x_i$ denote the amount we invest in asset $i$, and assume a stochastic model where the return of the assets is a random variable $r$ with known mean

๐œ‡ = E๐‘Ÿ

and covariance

ฮฃ = E(๐‘Ÿ โˆ’ ๐œ‡)(๐‘Ÿ โˆ’ ๐œ‡)๐‘‡ .

The return of our investment is also a random variable $y = r^Tx$ with mean (or expected return)

E๐‘ฆ = ๐œ‡๐‘‡๐‘ฅ

and variance (or risk)

$$\mathbf{E}(y - \mathbf{E}y)^2 = x^T\Sigma x.$$

We then wish to rebalance our portfolio to achieve a compromise between risk and expected return, e.g., we can maximize the expected return given an upper bound $\gamma$ on the tolerable risk and a constraint that our total investment is fixed,

$$\begin{array}{ll} \text{maximize} & \mu^Tx \\ \text{subject to} & x^T\Sigma x \leq \gamma, \\ & e^Tx = 1, \\ & x \geq 0. \end{array} \qquad (3.19)$$

Suppose we factor $\Sigma = GG^T$ (e.g., using a Cholesky or an eigenvalue decomposition). We then get a conic formulation

$$\begin{array}{ll} \text{maximize} & \mu^Tx \\ \text{subject to} & (\sqrt{\gamma}, G^Tx) \in \mathcal{Q}^{n+1}, \\ & e^Tx = 1, \\ & x \geq 0. \end{array} \qquad (3.20)$$


In practice both the average return and covariance are estimated using historical data. A recent trend is then to formulate a robust version of the portfolio optimization problem to combat the inherent uncertainty in those estimates, e.g., we can constrain $\mu$ to an ellipsoidal uncertainty set as in Sec. 3.3.2.

It is also common that the data for a portfolio optimization problem is already given in the form of a factor model $\Sigma = F^TF$ or $\Sigma = I + F^TF$, and a conic quadratic formulation as in Sec. 3.3.1 is most natural. For more details see Sec. 10.3.
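As an illustration of (3.20), the sketch below states the model with the open-source CVXPY package; this is an assumption for the example only (any conic modeling tool, including MOSEK's own APIs, would work), and $\mu$, $\Sigma$ and $\gamma$ are made-up data.

```python
# Sketch: the conic Markowitz model (3.20) with G a Cholesky factor of Sigma.
# Assumes NumPy and CVXPY are installed; data is illustrative.
import numpy as np
import cvxpy as cp

Sigma = np.array([[0.10, 0.02, 0.00, 0.01],
                  [0.02, 0.08, 0.01, 0.00],
                  [0.00, 0.01, 0.20, 0.03],
                  [0.01, 0.00, 0.03, 0.12]])
mu = np.array([0.05, 0.08, 0.12, 0.07])
gamma = 0.05
G = np.linalg.cholesky(Sigma)          # Sigma = G @ G.T
n = len(mu)

x = cp.Variable(n)
constraints = [cp.sum(x) == 1,
               x >= 0,
               cp.norm(G.T @ x, 2) <= np.sqrt(gamma)]   # (sqrt(gamma), G^T x) in Q^{n+1}
prob = cp.Problem(cp.Maximize(mu @ x), constraints)
prob.solve()
print("expected return:", prob.value, "weights:", x.value)
```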

3.3.4 Maximizing the Sharpe ratio

Continuing the previous example, the Sharpe ratio defines an efficiency metric of a portfolio as the expected return per unit risk, i.e.,

$$S(x) = \frac{\mu^Tx - r_f}{(x^T\Sigma x)^{1/2}},$$

where $r_f$ denotes the return of a risk-free asset. We assume that there is a portfolio with $\mu^Tx > r_f$, so maximizing the Sharpe ratio is equivalent to minimizing $1/S(x)$. In other words, we have the following problem

$$\begin{array}{ll} \text{minimize} & \dfrac{\|G^Tx\|}{\mu^Tx - r_f} \\ \text{subject to} & e^Tx = 1, \\ & x \geq 0. \end{array}$$

The objective has the same nature as a quotient of two affine functions we studied in Sec. 2.2.5. We reformulate the problem in a similar way, introducing a scalar variable $z \geq 0$ and a variable transformation

๐‘ฆ = ๐‘ง๐‘ฅ.

Since a positive ๐‘ง can be chosen arbitrarily and (๐œ‡ โˆ’ ๐‘Ÿ๐‘“๐‘’)๐‘‡๐‘ฅ > 0, we can without loss of

generality assume that

(๐œ‡โˆ’ ๐‘Ÿ๐‘“๐‘’)๐‘‡๐‘ฆ = 1.

Thus, we obtain the following conic problem for maximizing the Sharpe ratio,

$$\begin{array}{ll} \text{minimize} & t \\ \text{subject to} & (t, G^Ty) \in \mathcal{Q}^{k+1}, \\ & e^Ty = z, \\ & (\mu - r_fe)^Ty = 1, \\ & y, z \geq 0, \end{array}$$

and we recover ๐‘ฅ = ๐‘ฆ/๐‘ง.


3.3.5 A resource constrained production and inventory problem

The resource constrained production and inventory problem [Zie82] can be formulated as follows:

$$
\begin{array}{ll}
\text{minimize} & \sum_{j=1}^n (d_j x_j + e_j/x_j) \\
\text{subject to} & \sum_{j=1}^n r_j x_j \leq b, \\
& x_j \geq 0, \quad j = 1, \dots, n,
\end{array}
\tag{3.21}
$$

where ๐‘› denotes the number of items to be produced, ๐‘ denotes the amount of commonresource, and ๐‘Ÿ๐‘— is the consumption of the limited resource to produce one unit of item ๐‘—.The objective function represents inventory and ordering costs. Let ๐‘๐‘๐‘— denote the holdingcost per unit of product ๐‘— and ๐‘๐‘Ÿ๐‘— denote the rate of holding costs, respectively. Further, let

๐‘‘๐‘— =๐‘๐‘๐‘—๐‘

๐‘Ÿ๐‘—

2

so that

๐‘‘๐‘—๐‘ฅ๐‘—

is the average holding costs for product ๐‘—. If ๐ท๐‘— denotes the total demand for product ๐‘— and๐‘๐‘œ๐‘— the ordering cost per order of product ๐‘— then let

๐‘’๐‘— = ๐‘๐‘œ๐‘—๐ท๐‘—

and hence

๐‘’๐‘—๐‘ฅ๐‘—

=๐‘๐‘œ๐‘—๐ท๐‘—

๐‘ฅ๐‘—

is the average ordering costs for product ๐‘—. In summary, the problem finds the optimal batchsize such that the inventory and ordering cost are minimized while satisfying the constraintson the common resource. Given ๐‘‘๐‘—, ๐‘’๐‘— โ‰ฅ 0 problem (3.21) is equivalent to the conic quadraticproblem

minimizeโˆ‘๏ธ€๐‘›

๐‘—=1(๐‘‘๐‘—๐‘ฅ๐‘— + ๐‘’๐‘—๐‘ก๐‘—)

subject toโˆ‘๏ธ€๐‘›

๐‘—=1 ๐‘Ÿ๐‘—๐‘ฅ๐‘— โ‰ค ๐‘,

(๐‘ก๐‘—, ๐‘ฅ๐‘—,โˆš

2) โˆˆ ๐’ฌ3๐‘Ÿ, ๐‘— = 1, . . . , ๐‘›.

It is not always possible to produce a fractional number of items. In such cases $x_j$ should be constrained to be an integer. See Sec. 9.
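A minimal CVXPY sketch of (3.21) with made-up data $d$, $e$, $r$, $b$; the atom `inv_pos` plays the role of the rotated-cone bound $t_j \geq 1/x_j$, i.e. $(t_j, x_j, \sqrt{2}) \in \mathcal{Q}^3_r$.

```python
import numpy as np
import cvxpy as cp

d = np.array([1.0, 2.0, 0.5])     # holding cost coefficients (illustrative)
e = np.array([4.0, 1.0, 3.0])     # ordering cost coefficients (illustrative)
r = np.array([1.0, 1.0, 2.0])     # resource consumption per unit
b = 10.0                          # common resource budget

x = cp.Variable(3, pos=True)
objective = cp.Minimize(d @ x + e @ cp.inv_pos(x))
prob = cp.Problem(objective, [r @ x <= b])
prob.solve()
print(x.value)
```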


Chapter 4

The power cone

So far we studied quadratic cones and their applications in modeling problems involving, directly or indirectly, quadratic terms. In this part we expand the quadratic and rotated quadratic cone family with power cones, which provide a convenient language to express models involving powers other than 2. We must stress that although the power cones include the quadratic cones as special cases, at the current state-of-the-art they require more advanced and less efficient algorithms.

4.1 The power cone(s)

$n$-dimensional power cones form a family of convex cones parametrized by a real number $0 < \alpha < 1$:

$$
\mathcal{P}_n^{\alpha,1-\alpha} = \left\{ x \in \mathbb{R}^n : x_1^{\alpha} x_2^{1-\alpha} \geq \sqrt{x_3^2 + \cdots + x_n^2},\ x_1, x_2 \geq 0 \right\}.
\tag{4.1}
$$

The constraint in the definition of $\mathcal{P}_n^{\alpha,1-\alpha}$ can be expressed as a composition of two constraints, one of which is a quadratic cone:

$$
\begin{array}{l}
x_1^{\alpha} x_2^{1-\alpha} \geq |z|, \\
z \geq \sqrt{x_3^2 + \cdots + x_n^2},
\end{array}
\tag{4.2}
$$

which means that the basic building block we need to consider is the three-dimensional power cone

$$
\mathcal{P}_3^{\alpha,1-\alpha} = \left\{ x \in \mathbb{R}^3 : x_1^{\alpha} x_2^{1-\alpha} \geq |x_3|,\ x_1, x_2 \geq 0 \right\}.
\tag{4.3}
$$

More generally, we can also consider power cones with "long left-hand side". That is, for $m < n$ and a sequence of exponents $\alpha_1, \dots, \alpha_m$ with $\alpha_1 + \cdots + \alpha_m = 1$, we have the most general power cone object defined as

$$
\mathcal{P}_n^{\alpha_1,\cdots,\alpha_m} = \left\{ x \in \mathbb{R}^n : \prod_{i=1}^m x_i^{\alpha_i} \geq \sqrt{\textstyle\sum_{i=m+1}^n x_i^2},\ x_1, \dots, x_m \geq 0 \right\}.
\tag{4.4}
$$

The left-hand side is nothing but the weighted geometric mean of the $x_i$, $i = 1, \dots, m$ with weights $\alpha_i$. As we will see later, also this most general cone can be modeled as a composition of three-dimensional cones $\mathcal{P}_3^{\alpha,1-\alpha}$, so in a sense that is the basic object of interest.


There are some notable special cases we are familiar with. If we let $\alpha \to 0$ then in the limit we get $\mathcal{P}_n^{0,1} = \mathbb{R}_+ \times \mathcal{Q}^{n-1}$. If $\alpha = \frac12$ then we have a rescaled version of the rotated quadratic cone, precisely:

$$(x_1, x_2, \dots, x_n) \in \mathcal{P}_n^{\frac12,\frac12} \iff (x_1/\sqrt{2}, x_2/\sqrt{2}, x_3, \dots, x_n) \in \mathcal{Q}_r^n.$$

A gallery of three-dimensional power cones for varying ๐›ผ is shown in Fig. 4.1.

4.2 Sets representable using the power cone

In this section we give basic examples of constraints which can be expressed using power cones.

4.2.1 Powers

For all values of $p \neq 0, 1$ we can bound $x^p$ depending on the convexity of $f(x) = x^p$.

• For $p > 1$ the inequality $t \geq |x|^p$ is equivalent to $t^{1/p} \geq |x|$ and hence corresponds to

$$t \geq |x|^p \iff (t, 1, x) \in \mathcal{P}_3^{1/p,\,1-1/p}.$$

For instance $t \geq |x|^{1.5}$ is equivalent to $(t, 1, x) \in \mathcal{P}_3^{2/3,1/3}$.

• For $0 < p < 1$ the function $f(x) = x^p$ is concave for $x \geq 0$ and so we get

$$|t| \leq x^p,\ x \geq 0 \iff (x, 1, t) \in \mathcal{P}_3^{p,1-p}.$$

• For $p < 0$ the function $f(x) = x^p$ is convex for $x > 0$ and in this range the inequality $t \geq x^p$ is equivalent to

$$t \geq x^p \iff t^{1/(1-p)} x^{-p/(1-p)} \geq 1 \iff (t, x, 1) \in \mathcal{P}_3^{1/(1-p),\,-p/(1-p)}.$$

For example $t \geq \frac{1}{\sqrt{x}}$ is the same as $(t, x, 1) \in \mathcal{P}_3^{2/3,1/3}$.
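A quick numerical sanity check (plain numpy, no solver involved) of the first bullet for $p = 1.5$:

```python
import numpy as np

def in_pow3(x1, x2, x3, alpha):
    """Membership test for the 3D power cone P3^{alpha,1-alpha}."""
    return x1 >= 0 and x2 >= 0 and x1**alpha * x2**(1 - alpha) >= abs(x3)

p = 1.5
for x in np.linspace(-2, 2, 9):
    for t in (abs(x)**p - 0.1, abs(x)**p + 0.1):
        # t >= |x|^p  should agree with  (t, 1, x) in P3^{1/p, 1-1/p}
        assert (t >= abs(x)**p) == in_pow3(t, 1.0, x, 1.0 / p)
```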

4.2.2 ๐‘-norm cones

Let ๐‘ โ‰ฅ 1. The ๐‘-norm of a vector ๐‘ฅ โˆˆ R๐‘› is โ€–๐‘ฅโ€–๐‘ = (|๐‘ฅ1|๐‘ + ยท ยท ยท + |๐‘ฅ๐‘›|๐‘)1/๐‘ and the ๐‘-normball of radius ๐‘ก is defined by the inequality โ€–๐‘ฅโ€–๐‘ โ‰ค ๐‘ก. We take the ๐‘-norm cone (in dimension๐‘› + 1) to be the convex set {๏ธ€

(๐‘ก, ๐‘ฅ) โˆˆ R๐‘›+1 : ๐‘ก โ‰ฅ โ€–๐‘ฅโ€–๐‘}๏ธ€. (4.5)

For ๐‘ = 2 this is precisely the quadratic cone. We can model the ๐‘-norm cone by writing theinequality ๐‘ก โ‰ฅ โ€–๐‘ฅโ€–๐‘ as:

๐‘ก โ‰ฅโˆ‘๏ธ๐‘–

|๐‘ฅ๐‘–|๐‘/๐‘ก๐‘โˆ’1


Fig. 4.1: The boundary of $\mathcal{P}_3^{\alpha,1-\alpha}$ seen from a point inside the cone for $\alpha = 0.1, 0.2, 0.35, 0.5$.


and bounding each summand with a power cone. This leads to the following model:

$$
\begin{array}{l}
r_i t^{p-1} \geq |x_i|^p \quad \left( (r_i, t, x_i) \in \mathcal{P}_3^{1/p,\,1-1/p} \right), \\
\sum_i r_i = t.
\end{array}
\tag{4.6}
$$
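A tiny numpy check of the identity behind model (4.6): at $t = \|x\|_p$ the summands $|x_i|^p/t^{p-1}$ add up exactly to $t$.

```python
import numpy as np

x = np.array([1.0, -2.0, 0.5])
p = 3.0
t = np.sum(np.abs(x)**p) ** (1.0 / p)                    # t = ||x||_p
assert np.isclose(t, np.sum(np.abs(x)**p) / t**(p - 1))  # t = sum_i |x_i|^p / t^{p-1}
```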

When 0 < ๐‘ < 1 or ๐‘ < 0 the formula for โ€–๐‘ฅโ€–๐‘ gives a concave, rather than convex functionon R๐‘›

+ and in this case it is possible to model the set{๏ธ‚(๐‘ก, ๐‘ฅ) : 0 โ‰ค ๐‘ก โ‰ค

(๏ธโˆ‘๏ธ๐‘ฅ๐‘๐‘–

)๏ธ1/๐‘, ๐‘ฅ๐‘– โ‰ฅ 0

}๏ธ‚, ๐‘ < 1, ๐‘ ฬธ= 0.

We leave it as an exercise (see previous subsection). The case ๐‘ = โˆ’1 appears in Sec. 3.2.7.

4.2.3 The most general power cone

Consider the most general version of the power cone with "long left-hand side" defined in (4.4). We show that it can be expressed using the basic three-dimensional cones $\mathcal{P}_3^{\alpha,1-\alpha}$. Clearly it suffices to consider a short right-hand side, that is to model the cone

$$\mathcal{P}_{m+1}^{\alpha_1,\dots,\alpha_m} = \left\{ x_1^{\alpha_1} x_2^{\alpha_2} \cdots x_m^{\alpha_m} \geq |z|,\ x_1, \dots, x_m \geq 0 \right\}, \tag{4.7}$$

where $\sum_i \alpha_i = 1$. Denote $s = \alpha_1 + \cdots + \alpha_{m-1}$. We split (4.7) into two constraints

$$
\begin{array}{l}
x_1^{\alpha_1/s} \cdots x_{m-1}^{\alpha_{m-1}/s} \geq |t|, \quad x_1, \dots, x_{m-1} \geq 0, \\
t^{s} x_m^{\alpha_m} \geq |z|, \quad x_m \geq 0,
\end{array}
\tag{4.8}
$$

and this way we expressed $\mathcal{P}_{m+1}^{\alpha_1,\dots,\alpha_m}$ using two power cones $\mathcal{P}_m^{\alpha_1/s,\dots,\alpha_{m-1}/s}$ and $\mathcal{P}_3^{s,\alpha_m}$. Proceeding by induction gives the desired splitting.

4.2.4 Geometric mean

The power cone $\mathcal{P}_{n+1}^{1/n,\dots,1/n}$ is a direct way to model the inequality

$$|z| \leq (x_1 x_2 \cdots x_n)^{1/n}, \tag{4.9}$$

which corresponds to maximizing the geometric mean of the variables $x_i \geq 0$. In this special case tracing through the splitting (4.8) produces an equivalent representation of (4.9) using three-dimensional power cones as follows:

$$
\begin{array}{l}
x_1^{1/2} x_2^{1/2} \geq |t_3|, \\
t_3^{1-1/3} x_3^{1/3} \geq |t_4|, \\
\cdots \\
t_{n-1}^{1-1/(n-1)} x_{n-1}^{1/(n-1)} \geq |t_n|, \\
t_n^{1-1/n} x_n^{1/n} \geq |z|.
\end{array}
$$


4.2.5 Non-homogeneous constraints

Every constraint of the form

$$x_1^{\alpha_1} x_2^{\alpha_2} \cdots x_m^{\alpha_m} \geq |z|^{\beta}, \quad x_i \geq 0,$$

where $\sum_i \alpha_i < \beta$ and $\alpha_i > 0$, is equivalent to

$$(x_1, x_2, \dots, x_m, 1, z) \in \mathcal{P}_{m+2}^{\alpha_1/\beta,\dots,\alpha_m/\beta,\,s}$$

with $s = 1 - \sum_i \alpha_i/\beta$.

4.3 Power cone case studies

4.3.1 Portfolio optimization with market impact

Let us go back to the Markowitz portfolio optimization problem introduced in Sec. 3.3.3, where we now ask to maximize expected profit subject to bounded risk in a long-only portfolio:

$$
\begin{array}{ll}
\text{maximize} & \mu^T x \\
\text{subject to} & \sqrt{x^T \Sigma x} \leq \gamma, \\
& x_i \geq 0, \quad i = 1, \dots, n.
\end{array}
\tag{4.10}
$$

In a realistic model we would have to consider transaction costs which decrease the expected return. In particular if a really large volume is traded then the trade itself will affect the price of the asset, a phenomenon called market impact. It is typically modeled by decreasing the expected return of the $i$-th asset by a slippage cost proportional to $x^{\beta}$ for some $\beta > 1$, so that the objective function changes to

$$\text{maximize} \quad \mu^T x - \sum_i \delta_i x_i^{\beta}.$$

A popular choice is $\beta = 3/2$. This objective can easily be modeled with a power cone as in Sec. 4.2:

$$
\begin{array}{ll}
\text{maximize} & \mu^T x - \delta^T t \\
\text{subject to} & (t_i, 1, x_i) \in \mathcal{P}_3^{1/\beta,\,1-1/\beta} \quad (t_i \geq x_i^{\beta}), \\
& \cdots
\end{array}
$$

In particular if $\beta = 3/2$ the inequality $t_i \geq x_i^{3/2}$ has the conic representation $(t_i, 1, x_i) \in \mathcal{P}_3^{2/3,1/3}$.
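An illustrative CVXPY sketch of the market-impact model with $\beta = 3/2$; $\mu$, $\Sigma$, $\delta$ and $\gamma$ are made-up example data and `cp.power` supplies the slippage term.

```python
import numpy as np
import cvxpy as cp

np.random.seed(1)
n = 5
mu = np.array([0.06, 0.10, 0.08, 0.05, 0.12])
A = np.random.randn(n, n)
G = np.linalg.cholesky(A @ A.T / 20)       # Sigma = G G^T
delta = np.full(n, 0.01)                   # slippage coefficients
gamma = 0.08                               # risk bound

x = cp.Variable(n, nonneg=True)
objective = cp.Maximize(mu @ x - delta @ cp.power(x, 1.5))
constraints = [cp.norm(G.T @ x, 2) <= gamma]
cp.Problem(objective, constraints).solve()
print(x.value)
```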

4.3.2 Maximum volume cuboid

Suppose we have a convex, conic representable set $K \subseteq \mathbb{R}^n$ (for instance a polyhedron, a ball, intersections of those and so on). We would like to find a maximum volume axis-parallel


cuboid inscribed in $K$. If we denote by $p \in \mathbb{R}^n$ the leftmost corner and by $x_1, \dots, x_n$ the edge lengths, then this problem is equivalent to

$$
\begin{array}{ll}
\text{maximize} & t \\
\text{subject to} & t \leq (x_1 \cdots x_n)^{1/n}, \\
& x_i \geq 0, \\
& (p_1 + e_1 x_1, \dots, p_n + e_n x_n) \in K, \quad \forall e_1, \dots, e_n \in \{0, 1\},
\end{array}
\tag{4.11}
$$

where the last constraint states that all vertices of the cuboid are in $K$. The optimal volume is then $v = t^n$. Modeling the geometric mean with power cones was discussed in Sec. 4.2.4.
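A sketch of (4.11) in CVXPY for the illustrative choice $K = $ unit Euclidean ball; `cp.geo_mean` models the bound $t \leq (x_1 \cdots x_n)^{1/n}$.

```python
import itertools
import numpy as np
import cvxpy as cp

n = 3
p = cp.Variable(n)                 # leftmost corner
x = cp.Variable(n, nonneg=True)    # edge lengths

constraints = []
for e in itertools.product([0, 1], repeat=n):   # all 2^n vertices must lie in K
    vertex = p + cp.multiply(np.array(e, dtype=float), x)
    constraints.append(cp.norm(vertex, 2) <= 1)

prob = cp.Problem(cp.Maximize(cp.geo_mean(x)), constraints)
prob.solve()
print("edge lengths:", x.value, "volume:", np.prod(x.value))
```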

Maximizing the volume of an arbitrary (not necessarily axis-parallel) cuboid inscribed in $K$ is no longer a convex problem. However, it can be solved by maximizing the solution to (4.11) over all sets $T(K)$ where $T$ is a rotation (orthogonal matrix) in $\mathbb{R}^n$. In practice one can approximate the global solution by sampling sufficiently many rotations $T$ or using more advanced methods of optimization over the orthogonal group.

Fig. 4.2: The maximal volume cuboid inscribed in the regular icosahedron takes up approximately 0.388 of the volume of the icosahedron (the exact value is $3(1+\sqrt{5})/25$).

4.3.3 ๐‘-norm geometric median

The geometric median of a sequence of points $x_1, \dots, x_k \in \mathbb{R}^n$ is defined as

$$\operatorname*{argmin}_{y \in \mathbb{R}^n} \sum_{i=1}^k \|y - x_i\|,$$

that is, a point which minimizes the sum of distances to all the given points. Here $\|\cdot\|$ can be any norm on $\mathbb{R}^n$. The most classical case is the Euclidean norm, where the geometric median is the solution to the basic facility location problem minimizing total transportation cost from one depot to given destinations.


For a general $p$-norm $\|x\|_p$ with $1 \leq p < \infty$ (see Sec. 4.2.2) the geometric median is the solution to the obvious conic problem:

$$
\begin{array}{ll}
\text{minimize} & \sum_i t_i \\
\text{subject to} & t_i \geq \|y - x_i\|_p, \\
& y \in \mathbb{R}^n.
\end{array}
\tag{4.12}
$$

In Sec. 4.2.2 we showed how to model the $p$-norm bound using $n$ power cones.

The Fermat-Torricelli point of a triangle is the Euclidean geometric median of its vertices, and a classical theorem in planar geometry (due to Torricelli, posed by Fermat) states that it is the unique point inside the triangle from which each edge is visible at an angle of $120^{\circ}$ (or a vertex if the triangle has an angle of $120^{\circ}$ or more). Using (4.12) we can compute the $p$-norm analogues of the Fermat-Torricelli point. Some examples are shown in Fig. 4.3.
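A small CVXPY sketch of (4.12) for three sample points and $p = 1.5$ (both arbitrary illustrative choices):

```python
import numpy as np
import cvxpy as cp

pts = np.array([[0.0, 0.0], [4.0, 0.0], [1.0, 3.0]])   # triangle vertices
p = 1.5

y = cp.Variable(2)
objective = cp.Minimize(sum(cp.norm(y - pt, p) for pt in pts))
cp.Problem(objective).solve()
print("geometric median:", y.value)
```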

Fig. 4.3: The geometric median of three triangle vertices in various ๐‘-norms.

4.3.4 Maximum likelihood estimator of a convex density function

In [TV98] the problem of estimating a density function that is known in advance to be convex is considered. Here we will show that this problem can be posed as a conic optimization problem. Formally the problem is to estimate an unknown convex density function $g : \mathbb{R}_+ \to \mathbb{R}_+$ given an ordered sample $y_1 < y_2 < \dots < y_n$ of $n$ outcomes of a distribution with density $g$.

The estimator ๐‘” โ‰ฅ 0 is a piecewise linear function

๐‘” : [๐‘ฆ1, ๐‘ฆ๐‘›] โ†’ R+

with break points at (๐‘ฆ๐‘–, ๐‘ฅ๐‘–), ๐‘– = 1, . . . , ๐‘›, where the variables ๐‘ฅ๐‘– > 0 are estimators for ๐‘”(๐‘ฆ๐‘–).The slope of the ๐‘–-th linear segment of ๐‘” is

๐‘ฅ๐‘–+1 โˆ’ ๐‘ฅ๐‘–

๐‘ฆ๐‘–+1 โˆ’ ๐‘ฆ๐‘–.

Hence the convexity requirement leads to the constraints

๐‘ฅ๐‘–+1 โˆ’ ๐‘ฅ๐‘–

๐‘ฆ๐‘–+1 โˆ’ ๐‘ฆ๐‘–โ‰ค ๐‘ฅ๐‘–+2 โˆ’ ๐‘ฅ๐‘–+1

๐‘ฆ๐‘–+2 โˆ’ ๐‘ฆ๐‘–+1

, ๐‘– = 1, . . . , ๐‘›โˆ’ 2.


Recall the area under the density function must be 1. Hence,

$$\sum_{i=1}^{n-1} (y_{i+1} - y_i) \left( \frac{x_{i+1} + x_i}{2} \right) = 1$$

must hold. Therefore, the problem to be solved is

$$
\begin{array}{ll}
\text{maximize} & \prod_{i=1}^n x_i \\
\text{subject to} & \frac{x_{i+1} - x_i}{y_{i+1} - y_i} - \frac{x_{i+2} - x_{i+1}}{y_{i+2} - y_{i+1}} \leq 0, \quad i = 1, \dots, n-2, \\
& \sum_{i=1}^{n-1} (y_{i+1} - y_i)\left(\frac{x_{i+1} + x_i}{2}\right) = 1, \\
& x \geq 0.
\end{array}
$$

Maximizing $\prod_{i=1}^n x_i$ or the geometric mean $\left(\prod_{i=1}^n x_i\right)^{\frac1n}$ will produce the same optimal solutions. Using that observation we can model the objective as shown in Sec. 4.2.4. Alternatively, one can use as objective $\sum_i \log x_i$ and model it using the exponential cone as in Sec. 5.2.
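An illustrative CVXPY sketch of the estimator above, maximizing the geometric mean of the $x_i$ on a synthetic sample:

```python
import numpy as np
import cvxpy as cp

y = np.sort(np.random.default_rng(0).uniform(0, 1, 20))  # synthetic ordered sample
n = len(y)
x = cp.Variable(n, nonneg=True)

dy = np.diff(y)                                   # y_{i+1} - y_i
slopes = cp.multiply(1.0 / dy, x[1:] - x[:-1])    # slope of each linear segment
constraints = [slopes[:-1] <= slopes[1:],         # convexity of the estimator
               dy @ (x[1:] + x[:-1]) / 2 == 1]    # total area equals one

cp.Problem(cp.Maximize(cp.geo_mean(x)), constraints).solve()
print(x.value)
```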


Chapter 5

Exponential cone optimization

So far we discussed optimization problems involving the major "polynomial" families of cones: linear, quadratic and power cones. In this chapter we introduce a single new object, namely the three-dimensional exponential cone, together with examples and applications. The exponential cone can be used to model a variety of constraints involving exponentials and logarithms.

5.1 Exponential cone

The exponential cone is a convex subset of $\mathbb{R}^3$ defined as

$$
K_{\mathrm{exp}} = \left\{ (x_1, x_2, x_3) : x_1 \geq x_2 e^{x_3/x_2},\ x_2 > 0 \right\} \cup \left\{ (x_1, 0, x_3) : x_1 \geq 0,\ x_3 \leq 0 \right\}.
\tag{5.1}
$$

Thus the exponential cone is the closure in $\mathbb{R}^3$ of the set of points which satisfy

$$x_1 \geq x_2 e^{x_3/x_2}, \quad x_1, x_2 > 0. \tag{5.2}$$

When working with logarithms, a convenient reformulation of (5.2) is

$$x_3 \leq x_2 \log(x_1/x_2), \quad x_1, x_2 > 0. \tag{5.3}$$

Alternatively, one can write the same condition as

$$x_1/x_2 \geq e^{x_3/x_2}, \quad x_1, x_2 > 0,$$

which immediately shows that $K_{\mathrm{exp}}$ is in fact a cone, i.e. $\alpha x \in K_{\mathrm{exp}}$ for $x \in K_{\mathrm{exp}}$ and $\alpha \geq 0$. Convexity of $K_{\mathrm{exp}}$ follows from the fact that the Hessian of $f(x, y) = y \exp(x/y)$, namely

$$D^2(f) = e^{x/y} \begin{bmatrix} y^{-1} & -x y^{-2} \\ -x y^{-2} & x^2 y^{-3} \end{bmatrix},$$

is positive semidefinite for $y > 0$.


Fig. 5.1: The boundary of the exponential cone $K_{\mathrm{exp}}$. The red isolines are graphs of $x_2 \mapsto x_2 \log(x_1/x_2)$ for fixed $x_1$, see (5.3).

5.2 Modeling with the exponential cone

Extending the conic optimization toolbox with the exponential cone leads to new types of constraint building blocks and new types of representable sets. In this section we list the basic operations available using the exponential cone.

5.2.1 Exponential

The epigraph $t \geq e^x$ is a section of $K_{\mathrm{exp}}$:

$$t \geq e^x \iff (t, 1, x) \in K_{\mathrm{exp}}. \tag{5.4}$$

5.2.2 Logarithm

Similarly, we can express the hypograph $t \leq \log x$, $x \geq 0$:

$$t \leq \log x \iff (x, 1, t) \in K_{\mathrm{exp}}. \tag{5.5}$$

5.2.3 Entropy

The entropy function $H(x) = -x\log x$ can be maximized using the following representation which follows directly from (5.3):

$$t \leq -x \log x \iff t \leq x \log(1/x) \iff (1, x, t) \in K_{\mathrm{exp}}. \tag{5.6}$$


5.2.4 Relative entropy

The relative entropy or Kullback-Leibler divergence of two probability distributions is defined in terms of the function $D(x, y) = x\log(x/y)$. It is convex, and the minimization problem $t \geq D(x, y)$ is equivalent to

$$t \geq D(x, y) \iff -t \leq x \log(y/x) \iff (y, x, -t) \in K_{\mathrm{exp}}. \tag{5.7}$$

Because of this reparametrization the exponential cone is also referred to as the relative entropy cone, leading to a class of problems known as REPs (relative entropy problems). Having the relative entropy function available makes it possible to express epigraphs of other functions appearing in REPs, for instance:

$$x \log(1 + x/y) = D(x + y, y) + D(y, x + y).$$

5.2.5 Softplus function

In neural networks the function $f(x) = \log(1 + e^x)$, known as the softplus function, is used as an analytic approximation to the rectifier activation function $r(x) = x_+ = \max(0, x)$. The softplus function is convex and we can express its epigraph $t \geq \log(1 + e^x)$ by combining two exponential cones. Note that

$$t \geq \log(1 + e^x) \iff e^{x-t} + e^{-t} \leq 1$$

and therefore $t \geq \log(1 + e^x)$ is equivalent to the following set of conic constraints:

$$
\begin{array}{l}
u + v \leq 1, \\
(u, 1, x - t) \in K_{\mathrm{exp}}, \\
(v, 1, -t) \in K_{\mathrm{exp}}.
\end{array}
\tag{5.8}
$$

5.2.6 Log-sum-exp

We can generalize the previous example to a log-sum-exp (logarithm of a sum of exponentials) expression

$$t \geq \log(e^{x_1} + \cdots + e^{x_n}).$$

This is equivalent to the inequality

$$e^{x_1 - t} + \cdots + e^{x_n - t} \leq 1,$$

and so it can be modeled as follows:

$$
\begin{array}{l}
\sum_i u_i \leq 1, \\
(u_i, 1, x_i - t) \in K_{\mathrm{exp}}, \quad i = 1, \dots, n.
\end{array}
\tag{5.9}
$$


5.2.7 Log-sum-inv

The following type of bound has applications in capacity optimization for wireless network design:

$$t \geq \log\left( \frac{1}{x_1} + \cdots + \frac{1}{x_n} \right), \quad x_i > 0.$$

Since the logarithm is increasing, we can model this using a log-sum-exp and an exponential as:

$$
\begin{array}{l}
t \geq \log(e^{y_1} + \cdots + e^{y_n}), \\
x_i \geq e^{-y_i}, \quad i = 1, \dots, n.
\end{array}
$$

Alternatively, one can also rewrite the original constraint in the equivalent form:

$$e^{-t} \leq s \leq \left( \frac{1}{x_1} + \cdots + \frac{1}{x_n} \right)^{-1}, \quad x_i > 0,$$

and then model the right-hand side inequality using the technique from Sec. 3.2.7. This approach requires only one exponential cone.

5.2.8 Arbitrary exponential

The inequality

$$t \geq a_1^{x_1} a_2^{x_2} \cdots a_n^{x_n},$$

where $a_i$ are arbitrary positive constants, is of course equivalent to

$$t \geq \exp\left( \sum_i x_i \log a_i \right)$$

and therefore to $(t, 1, \sum_i x_i \log a_i) \in K_{\mathrm{exp}}$.

5.2.9 Lambert W-function

The Lambert function $W : \mathbb{R}_+ \to \mathbb{R}_+$ is the unique function satisfying the identity

$$W(x) e^{W(x)} = x.$$

It is the real branch of a more general function which appears in applications such as diode modeling. The $W$ function is concave. Although there is no explicit analytic formula for $W(x)$, the hypograph $\{(x, t) : 0 \leq x,\ 0 \leq t \leq W(x)\}$ has an equivalent description:

$$x \geq t e^{t} = t e^{t^2/t}$$

and so it can be modeled with a mix of exponential and quadratic cones (see Sec. 3.1.2):

$$
\begin{array}{ll}
(x, t, u) \in K_{\mathrm{exp}}, & (x \geq t \exp(u/t)), \\
(1/2, u, t) \in \mathcal{Q}_r^3, & (u \geq t^2).
\end{array}
\tag{5.10}
$$


5.2.10 Other simple sets

Here are a few more typical sets which can be expressed using the exponential and quadratic cones. The presentations should be self-explanatory; we leave the simple verifications to the reader.

Table 5.1: Sets representable with the exponential cone

| Set | Conic representation |
| --- | --- |
| $t \geq (\log x)^2$, $0 < x \leq 1$ | $(\frac12, t, u) \in \mathcal{Q}_r^3$, $(x, 1, u) \in K_{\mathrm{exp}}$, $x \leq 1$ |
| $t \leq \log\log x$, $x > 1$ | $(u, 1, t) \in K_{\mathrm{exp}}$, $(x, 1, u) \in K_{\mathrm{exp}}$ |
| $t \geq (\log x)^{-1}$, $x > 1$ | $(u, t, \sqrt{2}) \in \mathcal{Q}_r^3$, $(x, 1, u) \in K_{\mathrm{exp}}$ |
| $t \leq \sqrt{\log x}$, $x > 1$ | $(\frac12, u, t) \in \mathcal{Q}_r^3$, $(x, 1, u) \in K_{\mathrm{exp}}$ |
| $t \leq \sqrt{x \log x}$, $x > 1$ | $(x, u, \sqrt{2}\,t) \in \mathcal{Q}_r^3$, $(x, 1, u) \in K_{\mathrm{exp}}$ |
| $t \leq \log(1 - 1/x)$, $x > 1$ | $(x, u, \sqrt{2}) \in \mathcal{Q}_r^3$, $(1 - u, 1, t) \in K_{\mathrm{exp}}$ |
| $t \geq \log(1 + 1/x)$, $x > 0$ | $(x + 1, u, \sqrt{2}) \in \mathcal{Q}_r^3$, $(1 - u, 1, -t) \in K_{\mathrm{exp}}$ |

5.3 Geometric programming

Geometric optimization problems form a family of optimization problems with objective and constraints in special polynomial form. It is a rich class of problems solved by reformulating in logarithmic-exponential form, and thus a major area of applications for the exponential cone $K_{\mathrm{exp}}$. Geometric programming is used in circuit design, chemical engineering, mechanical engineering and finance, just to mention a few applications. We refer to [BKVH07] for a survey and extensive bibliography.

5.3.1 Definition and basic examples

A monomial is a real valued function of the form

$$f(x_1, \dots, x_n) = c\, x_1^{a_1} x_2^{a_2} \cdots x_n^{a_n}, \tag{5.11}$$

where the exponents $a_i$ are arbitrary real numbers and $c > 0$. A posynomial (positive polynomial) is a sum of monomials. Thus the difference between a posynomial and the standard notion of a multi-variate polynomial known from algebra or calculus is that (i) posynomials can have arbitrary exponents, not just integers, but (ii) they can only have positive coefficients.

For example, the following functions are monomials (in variables $x, y, z$):

$$xy, \quad 2x^{1.5} y^{-1} x^{0.3}, \quad 3\sqrt{xy/z}, \quad 1 \tag{5.12}$$

and the following are examples of posynomials:

$$2x + yz, \quad 1.5 x^3 z + 5/y, \quad (x^2 y^2 + 3 z^{-0.3})^4 + 1. \tag{5.13}$$


A geometric program (GP) is an optimization problem of the form

$$
\begin{array}{ll}
\text{minimize} & f_0(x) \\
\text{subject to} & f_i(x) \leq 1, \quad i = 1, \dots, m, \\
& x_j > 0, \quad j = 1, \dots, n,
\end{array}
\tag{5.14}
$$

where $f_0, \dots, f_m$ are posynomials and $x = (x_1, \dots, x_n)$ is the variable vector.

A geometric program (5.14) can be modeled in exponential conic form by making a substitution

$$x_j = e^{y_j}, \quad j = 1, \dots, n.$$

Under this substitution a monomial of the form (5.11) becomes

$$c\, e^{a_1 y_1} e^{a_2 y_2} \cdots e^{a_n y_n} = \exp(a_*^T y + \log c)$$

for $a_* = (a_1, \dots, a_n)$. Consequently, the optimization problem (5.14) takes the equivalent form

$$
\begin{array}{ll}
\text{minimize} & t \\
\text{subject to} & \log\left(\sum_k \exp(a_{0,k,*}^T y + \log c_{0,k})\right) \leq t, \\
& \log\left(\sum_k \exp(a_{i,k,*}^T y + \log c_{i,k})\right) \leq 0, \quad i = 1, \dots, m,
\end{array}
\tag{5.15}
$$

where $a_{i,k} \in \mathbb{R}^n$ and $c_{i,k} \in \mathbb{R}$ for all $i, k$. These are now log-sum-exp constraints we already discussed in Sec. 5.2.6. In particular, the problem (5.15) is convex, as opposed to the posynomial formulation (5.14).

Example

We demonstrate this reduction on a simple example. Take the geometric problem

$$
\begin{array}{ll}
\text{minimize} & x + y^2 z \\
\text{subject to} & 0.1\sqrt{x} + 2y^{-1} \leq 1, \\
& z^{-1} + y x^{-2} \leq 1.
\end{array}
$$

By substituting $x = e^u$, $y = e^v$, $z = e^w$ we get

$$
\begin{array}{ll}
\text{minimize} & t \\
\text{subject to} & \log(e^u + e^{2v + w}) \leq t, \\
& \log(e^{0.5u + \log 0.1} + e^{-v + \log 2}) \leq 0, \\
& \log(e^{-w} + e^{v - 2u}) \leq 0,
\end{array}
$$

and using the log-sum-exp reduction from Sec. 5.2.6 we write an explicit conic problem:

$$
\begin{array}{ll}
\text{minimize} & t \\
\text{subject to} & (p_1, 1, u - t),\ (q_1, 1, 2v + w - t) \in K_{\mathrm{exp}}, \quad p_1 + q_1 \leq 1, \\
& (p_2, 1, 0.5u + \log 0.1),\ (q_2, 1, -v + \log 2) \in K_{\mathrm{exp}}, \quad p_2 + q_2 \leq 1, \\
& (p_3, 1, -w),\ (q_3, 1, v - 2u) \in K_{\mathrm{exp}}, \quad p_3 + q_3 \leq 1.
\end{array}
$$

Solving this problem yields (๐‘ฅ, ๐‘ฆ, ๐‘ง) โ‰ˆ (3.14, 2.43, 1.32).
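This small GP can be cross-checked with CVXPY's geometric-programming mode (an assumption on tooling, not part of the conic reduction above); the reported optimum should be close to the values quoted.

```python
import cvxpy as cp

x, y, z = cp.Variable(pos=True), cp.Variable(pos=True), cp.Variable(pos=True)
objective = cp.Minimize(x + y**2 * z)
constraints = [0.1 * x**0.5 + 2 * y**-1 <= 1,
               z**-1 + y * x**-2 <= 1]
prob = cp.Problem(objective, constraints)
prob.solve(gp=True)
print(x.value, y.value, z.value)   # expected to be close to (3.14, 2.43, 1.32)
```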


5.3.2 Generalized geometric models

In this section we briefly discuss more general types of constraints which can be modeled with geometric programs.

Monomials

If ๐‘š(๐‘ฅ) is a monomial then the constraint ๐‘š(๐‘ฅ) = ๐‘ is equivalent to two posynomialinequalities ๐‘š(๐‘ฅ)๐‘โˆ’1 โ‰ค 1 and ๐‘š(๐‘ฅ)โˆ’1๐‘ โ‰ค 1, so it can be expressed in the language ofgeometric programs. In practice it should be added to the model (5.15) as a linear constraintโˆ‘๏ธ

๐‘˜

๐‘Ž๐‘˜๐‘ฆ๐‘˜ = log ๐‘.

Monomial inequalities ๐‘š(๐‘ฅ) โ‰ค ๐‘ and ๐‘š(๐‘ฅ) โ‰ฅ ๐‘ should similarly be modeled as linear inequal-ities in the ๐‘ฆ๐‘– variables.

In similar vein, if ๐‘“(๐‘ฅ) is a posynomial and ๐‘š(๐‘ฅ) is a monomial then ๐‘“(๐‘ฅ) โ‰ค ๐‘š(๐‘ฅ) is stilla posynomial inequality because it can be written as ๐‘“(๐‘ฅ)๐‘š(๐‘ฅ)โˆ’1 โ‰ค 1.

It also means that we can add lower and upper variable bounds: 0 < ๐‘1 โ‰ค ๐‘ฅ โ‰ค ๐‘2 isequivalent to ๐‘ฅโˆ’1๐‘1 โ‰ค 1 and ๐‘โˆ’1

2 ๐‘ฅ โ‰ค 1.

Products and positive powers

Expressions involving products and positive powers (possibly iterated) of posynomials can again be modeled with posynomials. For example, a constraint such as

$$\left((xy^2 + z)^{0.3} + y\right)\left(1/x + 1/y\right)^{2.2} \leq 1$$

can be replaced with

$$xy^2 + z \leq t, \quad t^{0.3} + y \leq u, \quad x^{-1} + y^{-1} \leq v, \quad uv^{2.2} \leq 1.$$

Other transformations and extensions

โ€ข If ๐‘“1, ๐‘“2 are already expressed by posynomials then max{๐‘“1(๐‘ฅ), ๐‘“2(๐‘ฅ)} โ‰ค ๐‘ก is clearlyequivalent to ๐‘“1(๐‘ฅ) โ‰ค ๐‘ก and ๐‘“2(๐‘ฅ) โ‰ค ๐‘ก. Hence we can add the maximum operator tothe list of building blocks for geometric programs.

โ€ข If ๐‘“, ๐‘” are posynomials, ๐‘š is a monomial and we know that ๐‘š(๐‘ฅ) โ‰ฅ ๐‘”(๐‘ฅ) then theconstraint ๐‘“(๐‘ฅ)

๐‘š(๐‘ฅ)โˆ’๐‘”(๐‘ฅ)โ‰ค ๐‘ก is equivalent to ๐‘กโˆ’1๐‘“(๐‘ฅ)๐‘š(๐‘ฅ)โˆ’1 + ๐‘”(๐‘ฅ)๐‘š(๐‘ฅ)โˆ’1 โ‰ค 1.

โ€ข The objective function of a geometric program (5.14) can be extended to include otherterms, for example:

minimize ๐‘“0(๐‘ฅ) +โˆ‘๏ธ๐‘˜

log๐‘š๐‘˜(๐‘ฅ)


where ๐‘š๐‘˜(๐‘ฅ) are monomials. After the change of variables ๐‘ฅ = ๐‘’๐‘ฆ we get a slightlymodified version of (5.15):

minimize ๐‘ก + ๐‘๐‘‡๐‘ฆsubject to

โˆ‘๏ธ€๐‘˜ exp(๐‘Ž๐‘‡0,๐‘˜,*๐‘ฆ + log ๐‘0,๐‘˜) โ‰ค ๐‘ก,

log(โˆ‘๏ธ€

๐‘˜ exp(๐‘Ž๐‘‡๐‘–,๐‘˜,*๐‘ฆ + log ๐‘๐‘–,๐‘˜)) โ‰ค 0, ๐‘– = 1, . . . ,๐‘š,

(note the lack of one logarithm) which can still be expressed with exponential cones.

5.3.3 Geometric programming case studies

Frobenius norm diagonal scaling

Suppose we have a matrix $M \in \mathbb{R}^{n \times n}$ and we want to rescale the coordinate system using a diagonal matrix $D = \mathrm{Diag}(d_1, \dots, d_n)$ with $d_i > 0$. In the new basis the linear transformation given by $M$ will now be described by the matrix $DMD^{-1} = (d_i M_{ij} d_j^{-1})_{i,j}$. To choose $D$ which leads to a "small" rescaling we can for example minimize the Frobenius norm

$$\|DMD^{-1}\|_F^2 = \sum_{ij}\left((DMD^{-1})_{ij}\right)^2 = \sum_{ij} M_{ij}^2\, d_i^2\, d_j^{-2}.$$

Minimizing the last sum is an example of a geometric program with variables $d_i$ (and without constraints).

Maximum likelihood estimation

Geometric programs appear naturally in connection with maximum likelihood estimation of parameters of random distributions. Consider a simple example. Suppose we have two biased coins, with head probabilities $p$ and $2p$, respectively. We toss both coins and count the total number of heads. Given that, in the long run, we observed $i$ heads $n_i$ times for $i = 0, 1, 2$, estimate the value of $p$.

The probability of obtaining the given outcome equals

$$\binom{n_0 + n_1 + n_2}{n_0}\binom{n_1 + n_2}{n_1} (p \cdot 2p)^{n_2}\left(p(1 - 2p) + 2p(1 - p)\right)^{n_1}\left((1 - p)(1 - 2p)\right)^{n_0},$$

and, up to constant factors, maximizing that expression is equivalent to solving the problem

$$
\begin{array}{ll}
\text{maximize} & p^{2n_2 + n_1} s^{n_1} q^{n_0} r^{n_0} \\
\text{subject to} & q \leq 1 - p, \\
& r \leq 1 - 2p, \\
& s \leq 3 - 4p,
\end{array}
$$

or, as a geometric problem:

$$
\begin{array}{ll}
\text{minimize} & p^{-2n_2 - n_1} s^{-n_1} q^{-n_0} r^{-n_0} \\
\text{subject to} & q + p \leq 1, \\
& r + 2p \leq 1, \\
& \frac13 s + \frac43 p \leq 1.
\end{array}
$$

For example, if (๐‘›0, ๐‘›1, ๐‘›2) = (30, 53, 16) then the above problem solves with ๐‘ = 0.29.
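This value is easy to cross-check by maximizing the log-likelihood directly over $p \in (0, \frac12)$ with scipy (a plain sanity check, not a GP solve):

```python
import numpy as np
from scipy.optimize import minimize_scalar

n0, n1, n2 = 30, 53, 16

def negloglik(p):
    # log of p^{2n2+n1} (3-4p)^{n1} ((1-p)(1-2p))^{n0}, negated for minimization
    return -((2 * n2 + n1) * np.log(p) + n1 * np.log(3 - 4 * p)
             + n0 * np.log((1 - p) * (1 - 2 * p)))

res = minimize_scalar(negloglik, bounds=(1e-6, 0.5 - 1e-6), method="bounded")
print(res.x)   # approximately 0.29
```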


An Olympiad problem

The 26th Vojtěch Jarník International Mathematical Competition, Ostrava 2016. Let $a, b, c$ be positive real numbers with $a + b + c = 1$. Prove that

$$\left(\frac1a + \frac{1}{bc}\right)\left(\frac1b + \frac{1}{ac}\right)\left(\frac1c + \frac{1}{ab}\right) \geq 1728.$$

Using the tricks introduced in Sec. 5.3.2 we formulate this problem as a geometric program:

$$
\begin{array}{ll}
\text{minimize} & pqr \\
\text{subject to} & p^{-1}a^{-1} + p^{-1}b^{-1}c^{-1} \leq 1, \\
& q^{-1}b^{-1} + q^{-1}a^{-1}c^{-1} \leq 1, \\
& r^{-1}c^{-1} + r^{-1}a^{-1}b^{-1} \leq 1, \\
& a + b + c \leq 1.
\end{array}
$$

Unsurprisingly, the optimal value of this program is 1728, achieved for $(a, b, c, p, q, r) = \left(\frac13, \frac13, \frac13, 12, 12, 12\right)$.

Power control and rate allocation in wireless networks

We consider a basic wireless network power control problem. In a wireless network with $n$ logical transmitter-receiver pairs, if the power output of transmitter $j$ is $p_j$ then the power received by receiver $i$ is $G_{ij}p_j$, where $G_{ij}$ models path gain and fading effects. If the $i$-th receiver's own noise is $\sigma_i$ then the signal-to-interference-plus-noise ratio (SINR) of receiver $i$ is given by

$$s_i = \frac{G_{ii} p_i}{\sigma_i + \sum_{j \neq i} G_{ij} p_j}. \tag{5.16}$$

Maximizing the minimal SINR over all receivers ($\max \min_i s_i$), subject to bounded power output of the transmitters, is equivalent to the geometric program with variables $p_1, \dots, p_n, t$:

$$
\begin{array}{ll}
\text{minimize} & t^{-1} \\
\text{subject to} & p_{\min} \leq p_j \leq p_{\max}, \quad j = 1, \dots, n, \\
& t\left(\sigma_i + \sum_{j \neq i} G_{ij} p_j\right) G_{ii}^{-1} p_i^{-1} \leq 1, \quad i = 1, \dots, n.
\end{array}
\tag{5.17}
$$

In the low-SNR regime the problem of system rate maximization is approximated by the problem of maximizing $\sum_i \log s_i$, or equivalently minimizing $\prod_i s_i^{-1}$. This is a geometric problem with variables $p_i, s_i$:

$$
\begin{array}{ll}
\text{minimize} & s_1^{-1} \cdots s_n^{-1} \\
\text{subject to} & p_{\min} \leq p_j \leq p_{\max}, \quad j = 1, \dots, n, \\
& s_i\left(\sigma_i + \sum_{j \neq i} G_{ij} p_j\right) G_{ii}^{-1} p_i^{-1} \leq 1, \quad i = 1, \dots, n.
\end{array}
\tag{5.18}
$$
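An illustrative sketch of the max-min SINR program (5.17) in CVXPY's GP mode (an assumption on tooling); $G$, $\sigma$ and the power bounds are small made-up values, and the SINR constraint is written as $t \cdot (\text{interference}) \leq G_{ii} p_i$.

```python
import numpy as np
import cvxpy as cp

G = np.array([[1.0, 0.1, 0.2],
              [0.1, 1.0, 0.1],
              [0.3, 0.1, 1.0]])
sigma = np.array([0.5, 0.5, 0.5])
p_min, p_max = 0.1, 5.0
n = 3

p = cp.Variable(n, pos=True)
t = cp.Variable(pos=True)
constraints = [p >= p_min, p <= p_max]
for i in range(n):
    # sigma_i + sum_{j != i} G_ij p_j, built term by term so it stays a posynomial
    interference = sum((G[i, j] * p[j] for j in range(n) if j != i), sigma[i])
    constraints.append(t * interference <= G[i, i] * p[i])   # SINR_i >= t

cp.Problem(cp.Maximize(t), constraints).solve(gp=True)
print("max worst-case SINR:", t.value, "powers:", p.value)
```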

For more information and examples see [BKVH07] .


5.4 Exponential cone case studies

In this section we introduce some practical optimization problems where the exponential cone comes in handy.

5.4.1 Risk parity portfolio

Consider a simple version of the Markowitz portfolio optimization problem introduced in Sec. 3.3.3, where we simply ask to minimize the risk $r(x) = \sqrt{x^T \Sigma x}$ of a fully-invested long-only portfolio:

$$
\begin{array}{ll}
\text{minimize} & \sqrt{x^T \Sigma x} \\
\text{subject to} & \sum_{i=1}^n x_i = 1, \\
& x_i \geq 0, \quad i = 1, \dots, n,
\end{array}
\tag{5.19}
$$

where $\Sigma$ is a symmetric positive definite covariance matrix. We can derive from the first-order optimality conditions that the solution to (5.19) satisfies $\frac{\partial r}{\partial x_i} = \frac{\partial r}{\partial x_j}$ whenever $x_i, x_j > 0$, i.e. marginal risk contributions of positively invested assets are equal. In practice this often leads to concentrated portfolios, whereas it would benefit diversification to consider portfolios where all assets have the same total contribution to risk:

$$x_i \frac{\partial r}{\partial x_i} = x_j \frac{\partial r}{\partial x_j}, \quad i, j = 1, \dots, n. \tag{5.20}$$

We call (5.20) the risk parity condition. It indeed models equal risk contribution from all the assets, because as one can easily check $\frac{\partial r}{\partial x_i} = \frac{(\Sigma x)_i}{\sqrt{x^T \Sigma x}}$ and

$$r(x) = \sum_i x_i \frac{\partial r}{\partial x_i}.$$

Risk parity portfolios satisfying condition (5.20) can be found with an auxiliary optimization problem:

$$
\begin{array}{ll}
\text{minimize} & \sqrt{x^T \Sigma x} - c \sum_i \log x_i \\
\text{subject to} & x_i > 0, \quad i = 1, \dots, n,
\end{array}
\tag{5.21}
$$

for any $c > 0$. More precisely, the gradient of the objective function in (5.21) is zero when $\frac{\partial r}{\partial x_i} = c/x_i$ for all $i$, implying the parity condition (5.20) holds. Since (5.20) is scale-invariant, we can rescale any solution of (5.21) and get a fully-invested risk parity portfolio with $\sum_i x_i = 1$.

The conic form of problem (5.21) is:

$$
\begin{array}{lll}
\text{minimize} & t - c\, e^T s & \\
\text{subject to} & (t, \Sigma^{1/2} x) \in \mathcal{Q}^{n+1}, & (t \geq \sqrt{x^T \Sigma x}), \\
& (x_i, 1, s_i) \in K_{\mathrm{exp}}, & (s_i \leq \log x_i).
\end{array}
\tag{5.22}
$$
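An illustrative CVXPY sketch of (5.21) with $c = 1$ and a random example covariance, followed by the rescaling and a check that the risk contributions $x_i (\Sigma x)_i$ come out equal:

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
Sigma = A @ A.T + 0.1 * np.eye(4)
G = np.linalg.cholesky(Sigma)                 # Sigma = G G^T

x = cp.Variable(4, pos=True)
objective = cp.Minimize(cp.norm(G.T @ x, 2) - cp.sum(cp.log(x)))
cp.Problem(objective).solve()

w = x.value / x.value.sum()                   # fully invested risk-parity weights
contrib = w * (Sigma @ w)                     # proportional to x_i * dr/dx_i
print(w, contrib / contrib.sum())             # contributions should be ~1/4 each
```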


5.4.2 Entropy maximization

A general entropy maximization problem has the form

$$
\begin{array}{ll}
\text{maximize} & -\sum_i p_i \log p_i \\
\text{subject to} & \sum_i p_i = 1, \\
& p_i \geq 0, \\
& p \in \mathcal{I},
\end{array}
\tag{5.23}
$$

where $\mathcal{I}$ defines additional constraints on the probability distribution $p$ (these are known as prior information). In the absence of complete information about $p$ the maximum entropy principle of Jaynes posits to choose the distribution which maximizes uncertainty, that is entropy, subject to what is known. Practitioners think of the solution to (5.23) as the most random or most conservative of distributions consistent with $\mathcal{I}$.

Maximization of the entropy function $H(x) = -x\log x$ was explained in Sec. 5.2.

Often one has an a priori distribution $q$, and one tries to minimize the distance between $p$ and $q$, while remaining consistent with $\mathcal{I}$. In this case it is standard to minimize the Kullback-Leibler divergence

$$\mathcal{D}_{KL}(p\,\|\,q) = \sum_i p_i \log(p_i/q_i),$$

which leads to an optimization problem

$$
\begin{array}{ll}
\text{minimize} & \sum_i p_i \log(p_i/q_i) \\
\text{subject to} & \sum_i p_i = 1, \\
& p_i \geq 0, \\
& p \in \mathcal{I}.
\end{array}
\tag{5.24}
$$
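A minimal CVXPY sketch of (5.23) and (5.24); the prior $q$ and the single moment constraint standing in for the set $\mathcal{I}$ are illustrative choices. Note that CVXPY's `kl_div(x, y)` is $x\log(x/y) - x + y$, which agrees with $\mathcal{D}_{KL}$ here because both distributions sum to one.

```python
import numpy as np
import cvxpy as cp

n = 5
q = np.full(n, 1.0 / n)                 # prior distribution
a = np.arange(1, n + 1)                 # scores; require the mean score to be 3.5

p = cp.Variable(n, nonneg=True)
constraints = [cp.sum(p) == 1, a @ p == 3.5]

# Maximum entropy distribution consistent with the constraints:
cp.Problem(cp.Maximize(cp.sum(cp.entr(p))), constraints).solve()
print("max-entropy p:", p.value)

# Minimum KL divergence to the prior q:
cp.Problem(cp.Minimize(cp.sum(cp.kl_div(p, q))), constraints).solve()
print("min-KL p:", p.value)
```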

5.4.3 Hitting time of a linear system

Consider a linear dynamical system

$$\mathbf{x}'(t) = A\mathbf{x}(t) \tag{5.25}$$

where we assume for simplicity that $A = \mathrm{Diag}(a_1, \dots, a_n)$ with $a_i < 0$, with initial condition $\mathbf{x}(0)_i = x_i$. The resulting dynamical system $\mathbf{x}(t) = \mathbf{x}(0)\exp(At)$ converges to 0 and one can ask, for instance, for the time it takes to approach the limit up to distance $\varepsilon$. The resulting optimization problem is

$$
\begin{array}{ll}
\text{minimize} & t \\
\text{subject to} & \sqrt{\sum_i \left(x_i \exp(a_i t)\right)^2} \leq \varepsilon,
\end{array}
$$

with the following conic form, where the variables are $t, q_1, \dots, q_n$:

$$
\begin{array}{ll}
\text{minimize} & t \\
\text{subject to} & (\varepsilon, x_1 q_1, \dots, x_n q_n) \in \mathcal{Q}^{n+1}, \\
& (q_i, 1, a_i t) \in K_{\mathrm{exp}}, \quad i = 1, \dots, n.
\end{array}
$$


See Fig. 5.2 for an example. Other criteria for the target set of the trajectories are also possible. For example, polyhedral constraints

$$c^T \mathbf{x} \leq d, \quad c \in \mathbb{R}^n_+,\ d \in \mathbb{R}_+$$

are also expressible in exponential conic form for starting points $\mathbf{x}(0) \in \mathbb{R}^n_+$, since they correspond to log-sum-exp constraints of the form

$$\log\left( \sum_i \exp(a_i t + \log(c_i x_i)) \right) \leq \log d.$$

For a robust version involving uncertainty on ๐ด and ๐‘ฅ(0) see [CS14] .

Fig. 5.2: With $A = \mathrm{Diag}(-0.3, -0.06)$ and starting point $\mathbf{x}(0) = (2.2, 1.3)$ the trajectory reaches distance $\varepsilon = 0.5$ from the origin at time $t \approx 15.936$.
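The hitting time reported in the caption can be cross-checked numerically without any conic modeling, by root-finding on $\|\mathbf{x}(t)\| - \varepsilon$:

```python
import numpy as np
from scipy.optimize import brentq

a = np.array([-0.3, -0.06])
x0 = np.array([2.2, 1.3])
eps = 0.5

f = lambda t: np.linalg.norm(x0 * np.exp(a * t)) - eps
print(brentq(f, 0.0, 100.0))   # approximately 15.936
```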

5.4.4 Logistic regression

Logistic regression is a technique of training a binary classifier. We are given a training set of examples $x_1, \dots, x_n \in \mathbb{R}^d$ together with labels $y_i \in \{0, 1\}$. The goal is to train a classifier capable of assigning new data points to either class 0 or 1. Specifically, the labels should be assigned by computing

$$h_\theta(x) = \frac{1}{1 + \exp(-\theta^T x)} \tag{5.26}$$

and choosing label $y = 0$ if $h_\theta(x) < \frac12$ and $y = 1$ for $h_\theta(x) \geq \frac12$. Here $h_\theta(x)$ is interpreted as the probability that $x$ belongs to class 1. The optimal parameter vector $\theta$ should be learned from the training set, so as to maximize the likelihood function:

$$\prod_i h_\theta(x_i)^{y_i} \left(1 - h_\theta(x_i)\right)^{1 - y_i}.$$

By taking logarithms and adding regularization with respect to $\theta$ we reach an unconstrained optimization problem

$$\text{minimize}_{\theta \in \mathbb{R}^d} \quad \lambda\|\theta\|_2 + \sum_i -y_i \log(h_\theta(x_i)) - (1 - y_i)\log(1 - h_\theta(x_i)). \tag{5.27}$$


Problem (5.27) is convex, and can be more explicitly written as

$$
\begin{array}{lll}
\text{minimize} & \sum_i t_i + \lambda r & \\
\text{subject to} & t_i \geq -\log(h_\theta(x_i)) = \log(1 + \exp(-\theta^T x_i)) & \text{if } y_i = 1, \\
& t_i \geq -\log(1 - h_\theta(x_i)) = \log(1 + \exp(\theta^T x_i)) & \text{if } y_i = 0, \\
& r \geq \|\theta\|_2,
\end{array}
$$

involving softplus type constraints (see Sec. 5.2) and a quadratic cone. See Fig. 5.3 for an example.
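An illustrative CVXPY sketch of (5.27) on synthetic data; `cp.logistic` is exactly the softplus $\log(1 + e^z)$ used above, and the loss is rewritten as $\log(1+e^{z_i}) - y_i z_i$ with $z_i = \theta^T x_i$.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, d = 100, 3
X = rng.standard_normal((n, d))
true_theta = np.array([1.5, -2.0, 0.5])                       # hypothetical ground truth
y = (X @ true_theta + 0.3 * rng.standard_normal(n) > 0).astype(float)

theta = cp.Variable(d)
lam = 0.1
z = X @ theta
# -y_i log h - (1-y_i) log(1-h)  equals  log(1 + exp(z_i)) - y_i z_i
loss = cp.sum(cp.logistic(z) - cp.multiply(y, z))
cp.Problem(cp.Minimize(loss + lam * cp.norm(theta, 2))).solve()
print(theta.value)
```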

Fig. 5.3: Logistic regression example with no, medium and strong regularization (small, medium, large $\lambda$). The two-dimensional dataset was converted into a feature vector $x \in \mathbb{R}^{28}$ using monomial coordinates of degrees at most 6. Without regularization we get obvious overfitting.


Chapter 6

Semidefinite optimization

In this chapter we extend the conic optimization framework introduced before with symmetric positive semidefinite matrix variables.

6.1 Introduction to semidefinite matrices

6.1.1 Semidefinite matrices and cones

A symmetric matrix $X \in \mathcal{S}^n$ is called symmetric positive semidefinite if

$$z^T X z \geq 0, \quad \forall z \in \mathbb{R}^n.$$

We then define the cone of symmetric positive semidefinite matrices as

$$\mathcal{S}^n_+ = \{ X \in \mathcal{S}^n \mid z^T X z \geq 0,\ \forall z \in \mathbb{R}^n \}. \tag{6.1}$$

For brevity we will often use the shorter notion semidefinite instead of symmetric positive semidefinite, and we will write $X \succeq Y$ ($X \preceq Y$) as shorthand notation for $(X - Y) \in \mathcal{S}^n_+$ ($(Y - X) \in \mathcal{S}^n_+$). As inner product for semidefinite matrices, we use the standard trace inner product for general matrices, i.e.,

$$\langle A, B \rangle := \mathrm{tr}(A^T B) = \sum_{ij} a_{ij} b_{ij}.$$

It is easy to see that (6.1) indeed specifies a convex cone; it is pointed (with origin $X = 0$), and $X, Y \in \mathcal{S}^n_+$ implies that $(\alpha X + \beta Y) \in \mathcal{S}^n_+$, $\alpha, \beta \geq 0$. Let us review a few equivalent definitions of $\mathcal{S}^n_+$. It is well-known that every symmetric matrix $A$ has a spectral factorization

$$A = \sum_{i=1}^n \lambda_i q_i q_i^T,$$

where $q_i \in \mathbb{R}^n$ are the (orthogonal) eigenvectors and $\lambda_i$ are eigenvalues of $A$. Using the spectral factorization of $A$ we have

$$x^T A x = \sum_{i=1}^n \lambda_i (x^T q_i)^2,$$


which shows that $x^T A x \geq 0 \Leftrightarrow \lambda_i \geq 0$, $i = 1, \dots, n$. In other words,

$$\mathcal{S}^n_+ = \{ X \in \mathcal{S}^n \mid \lambda_i(X) \geq 0,\ i = 1, \dots, n \}. \tag{6.2}$$

Another useful characterization is that $A \in \mathcal{S}^n_+$ if and only if it is a Grammian matrix $A = V^T V$. Here $V$ is called the Cholesky factor of $A$. Using the Grammian representation we have

$$x^T A x = x^T V^T V x = \|Vx\|_2^2,$$

i.e., if $A = V^T V$ then $x^T A x \geq 0$ for all $x$. On the other hand, from the positive spectral factorization $A = Q\Lambda Q^T$ we have $A = V^T V$ with $V = \Lambda^{1/2} Q^T$, where $\Lambda^{1/2} = \mathrm{Diag}(\sqrt{\lambda_1}, \dots, \sqrt{\lambda_n})$. We thus have the equivalent characterization

$$\mathcal{S}^n_+ = \{ X \in \mathcal{S}^n \mid X = V^T V \ \text{for some} \ V \in \mathbb{R}^{n \times n} \}. \tag{6.3}$$

In a completely analogous way we define the cone of symmetric positive definite matrices as

$$
\begin{array}{rl}
\mathcal{S}^n_{++} & = \{ X \in \mathcal{S}^n \mid z^T X z > 0,\ \forall z \in \mathbb{R}^n \} \\
& = \{ X \in \mathcal{S}^n \mid \lambda_i(X) > 0,\ i = 1, \dots, n \} \\
& = \{ X \in \mathcal{S}^n \mid X = V^T V \ \text{for some} \ V \in \mathbb{R}^{n \times n},\ \mathrm{rank}(V) = n \},
\end{array}
$$

and we write $X \succ Y$ ($X \prec Y$) as shorthand notation for $(X - Y) \in \mathcal{S}^n_{++}$ ($(Y - X) \in \mathcal{S}^n_{++}$).

The one-dimensional cone $\mathcal{S}^1_+$ simply corresponds to $\mathbb{R}_+$. Similarly consider

$$X = \begin{bmatrix} x_1 & x_3 \\ x_3 & x_2 \end{bmatrix}$$

with determinant $\det(X) = x_1 x_2 - x_3^2 = \lambda_1\lambda_2$ and trace $\mathrm{tr}(X) = x_1 + x_2 = \lambda_1 + \lambda_2$. Therefore $X$ has nonnegative eigenvalues if and only if

$$x_1 x_2 \geq x_3^2, \quad x_1, x_2 \geq 0,$$

which characterizes a three-dimensional scaled rotated cone, i.e.,

$$\begin{bmatrix} x_1 & x_3 \\ x_3 & x_2 \end{bmatrix} \in \mathcal{S}^2_+ \iff (x_1, x_2, x_3\sqrt{2}) \in \mathcal{Q}_r^3.$$

More generally we have

$$x \in \mathbb{R}^n_+ \iff \mathrm{Diag}(x) \in \mathcal{S}^n_+$$

and

$$(t, x) \in \mathcal{Q}^{n+1} \iff \begin{bmatrix} t & x^T \\ x & tI \end{bmatrix} \in \mathcal{S}^{n+1}_+,$$

where the latter equivalence follows immediately from Lemma 6.1. Thus both the linear and quadratic cones are embedded in the semidefinite cone. In practice, however, linear and quadratic cones should never be described using semidefinite constraints, which would result in a large performance penalty by squaring the number of variables.
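A quick numpy check of the $2\times 2$ correspondence above (no modeling involved): $X$ is PSD exactly when $x_1 x_2 \geq x_3^2$ with $x_1, x_2 \geq 0$, i.e. $(x_1, x_2, x_3\sqrt{2}) \in \mathcal{Q}_r^3$.

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(1000):
    x1, x2, x3 = rng.uniform(-1, 1, 3)
    X = np.array([[x1, x3], [x3, x2]])
    psd = np.all(np.linalg.eigvalsh(X) >= -1e-12)
    rotated = (x1 >= 0) and (x2 >= 0) and (x1 * x2 >= x3**2 - 1e-12)
    assert psd == rotated
```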


Example 6.1. As a more interesting example, consider the symmetric matrix

$$A(x, y, z) = \begin{bmatrix} 1 & x & y \\ x & 1 & z \\ y & z & 1 \end{bmatrix} \tag{6.4}$$

parametrized by $(x, y, z)$. The set

$$S = \{ (x, y, z) \in \mathbb{R}^3 \mid A(x, y, z) \in \mathcal{S}^3_+ \}$$

(shown in Fig. 6.1) is called a spectrahedron and is perhaps the simplest bounded semidefinite representable set which cannot be represented using (finitely many) linear or quadratic cones. To gain a geometric intuition of $S$, we note that

$$\det(A(x, y, z)) = -(x^2 + y^2 + z^2 - 2xyz - 1),$$

so the boundary of $S$ can be characterized as

$$x^2 + y^2 + z^2 - 2xyz = 1,$$

or equivalently as

$$\begin{bmatrix} x \\ y \end{bmatrix}^T \begin{bmatrix} 1 & -z \\ -z & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = 1 - z^2.$$

For $z = 0$ this describes a circle in the $(x, y)$-plane, and for $-1 \leq z \leq 1$ it characterizes an ellipse (for a fixed $z$).

6.1.2 Properties of semidefinite matrices

Many useful properties of (semi)definite matrices follow directly from the definitions (6.1)-(6.3) and their definite counterparts.

• The diagonal elements of $A \in \mathcal{S}^n_+$ are nonnegative. Let $e_i$ denote the $i$-th standard basis vector (i.e., $[e_i]_j = 0$, $j \neq i$, $[e_i]_i = 1$). Then $A_{ii} = e_i^T A e_i$, so (6.1) implies that $A_{ii} \geq 0$.

• A block-diagonal matrix $A = \mathrm{Diag}(A_1, \dots, A_p)$ is (semi)definite if and only if each diagonal block $A_i$ is (semi)definite.

• Given a quadratic transformation $M := B^T A B$, $M \succ 0$ if and only if $A \succ 0$ and $B$ has full rank. This follows directly from the Grammian characterization $M = (VB)^T(VB)$. For $M \succeq 0$ we only require that $A \succeq 0$. As an example, if $A$ is (semi)definite then so is any permutation $P^T A P$.


Fig. 6.1: Plot of the spectrahedron $S = \{ (x, y, z) \in \mathbb{R}^3 \mid A(x, y, z) \succeq 0 \}$.

• Any principal submatrix of $A \in \mathcal{S}^n_+$ ($A$ restricted to the same set of rows as columns) is positive semidefinite; this follows by restricting the Grammian characterization $A = V^T V$ to a submatrix of $V$.

• The inner product of positive (semi)definite matrices is positive (nonnegative). For any $A, B \in \mathcal{S}^n_{++}$ let $A = U^T U$ and $B = V^T V$ where $U$ and $V$ have full rank. Then

$$\langle A, B \rangle = \mathrm{tr}(U^T U V^T V) = \|UV^T\|_F^2 > 0,$$

where strict positivity follows from the assumption that $U$ has full column-rank, i.e., $UV^T \neq 0$.

• The inverse of a positive definite matrix is positive definite. This follows from the positive spectral factorization $A = Q\Lambda Q^T$, which gives us

$$A^{-1} = Q\Lambda^{-1}Q^T$$

where $\Lambda_{ii} > 0$. If $A$ is semidefinite then the pseudo-inverse $A^\dagger$ of $A$ is semidefinite.

• Consider a matrix $X \in \mathcal{S}^n$ partitioned as

$$X = \begin{bmatrix} A & B^T \\ B & C \end{bmatrix}.$$

Let us find necessary and sufficient conditions for $X \succ 0$. We know that $A \succ 0$ and $C \succ 0$ (since any principal submatrix must be positive definite). Furthermore, we can


simplify the analysis using a nonsingular transformation

$$L = \begin{bmatrix} I & 0 \\ F & I \end{bmatrix}$$

to diagonalize $X$ as $LXL^T = D$, where $D$ is block-diagonal. Note that $\det(L) = 1$, so $L$ is indeed nonsingular. Then $X \succ 0$ if and only if $D \succ 0$. Expanding $LXL^T = D$, we get

$$\begin{bmatrix} A & AF^T + B^T \\ FA + B & FAF^T + FB^T + BF^T + C \end{bmatrix} = \begin{bmatrix} D_1 & 0 \\ 0 & D_2 \end{bmatrix}.$$

Since $\det(A) \neq 0$ (by assuming that $A \succ 0$) we see that $F = -BA^{-1}$ and direct substitution gives us

$$\begin{bmatrix} A & 0 \\ 0 & C - BA^{-1}B^T \end{bmatrix} = \begin{bmatrix} D_1 & 0 \\ 0 & D_2 \end{bmatrix}.$$

In the last part we have thus established the following useful result.

Lemma 6.1 (Schur complement lemma). A symmetric matrix

$$X = \begin{bmatrix} A & B^T \\ B & C \end{bmatrix}$$

is positive definite if and only if

$$A \succ 0, \quad C - BA^{-1}B^T \succ 0.$$
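A short numerical illustration of Lemma 6.1 on random symmetric matrices:

```python
import numpy as np

rng = np.random.default_rng(1)

def is_pd(M):
    return np.all(np.linalg.eigvalsh(M) > 0)

for _ in range(200):
    W = rng.standard_normal((4, 4))
    X = W @ W.T + rng.uniform(-0.5, 0.5) * np.eye(4)   # sometimes PD, sometimes not
    X = (X + X.T) / 2
    A, B, C = X[:2, :2], X[2:, :2], X[2:, 2:]
    lhs = is_pd(X)
    # short-circuit: the Schur complement is only formed when A is PD
    rhs = is_pd(A) and is_pd(C - B @ np.linalg.inv(A) @ B.T)
    assert lhs == rhs
```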

6.2 Semidefinite modeling

Having discussed different characterizations and properties of semidefinite matrices, we next turn to different functions and sets that can be modeled using semidefinite cones and variables. Most of those representations involve semidefinite matrix-valued affine functions, which we discuss next.

6.2.1 Linear matrix inequalities

Consider an affine matrix-valued mapping $A : \mathbb{R}^n \mapsto \mathcal{S}^m$:

$$A(x) = A_0 + x_1 A_1 + \cdots + x_n A_n. \tag{6.5}$$

A linear matrix inequality (LMI) is a constraint of the form

$$A_0 + x_1 A_1 + \cdots + x_n A_n \succeq 0 \tag{6.6}$$


in the variable $x \in \mathbb{R}^n$ with symmetric coefficients $A_i \in \mathcal{S}^m$, $i = 0, \dots, n$. As a simple example consider the matrix in (6.4),

$$A(x, y, z) = A_0 + xA_1 + yA_2 + zA_3 \succeq 0$$

with

$$A_0 = I, \quad A_1 = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \quad A_2 = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix}, \quad A_3 = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}.$$

Alternatively, we can describe the linear matrix inequality $A(x, y, z) \succeq 0$ as

$$X \in \mathcal{S}^3_+, \quad x_{11} = x_{22} = x_{33} = 1,$$

i.e., as a semidefinite variable with fixed diagonal; these two alternative formulations illustrate the difference between primal and dual forms of semidefinite problems, see Sec. 8.6.
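An illustrative CVXPY sketch of the second (fixed-diagonal) formulation, optimizing the made-up objective $x + y + z$ over the spectrahedron $S$ of Example 6.1:

```python
import cvxpy as cp

X = cp.Variable((3, 3), symmetric=True)
constraints = [X >> 0, cp.diag(X) == 1]            # semidefinite variable, unit diagonal
objective = cp.Maximize(X[0, 1] + X[0, 2] + X[1, 2])   # i.e. x + y + z
cp.Problem(objective, constraints).solve()
print(X.value)   # expected optimum at (x, y, z) = (1, 1, 1), the all-ones matrix
```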

6.2.2 Eigenvalue optimization

Consider a symmetric matrix $A \in \mathcal{S}^m$ and let its eigenvalues be denoted by

$$\lambda_1(A) \geq \lambda_2(A) \geq \cdots \geq \lambda_m(A).$$

A number of different functions of $\lambda_i$ can then be described using a mix of linear and semidefinite constraints.

Sum of eigenvalues

The sum of the eigenvalues corresponds to

$$\sum_{i=1}^m \lambda_i(A) = \mathrm{tr}(A).$$

Largest eigenvalue

The largest eigenvalue can be characterized in epigraph form $\lambda_1(A) \leq t$ as

$$tI - A \succeq 0. \tag{6.7}$$

To verify this, suppose we have a spectral factorization $A = Q\Lambda Q^T$ where $Q$ is orthogonal and $\Lambda$ is diagonal. Then $t$ is an upper bound on the largest eigenvalue if and only if

$$Q^T(tI - A)Q = tI - \Lambda \succeq 0.$$

Thus we can minimize the largest eigenvalue of $A$.
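A quick numpy check of (6.7), with no SDP solve needed: $tI - A$ is PSD exactly when $t \geq \lambda_1(A)$.

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2
lmax = np.linalg.eigvalsh(A)[-1]        # eigvalsh returns eigenvalues in ascending order

for t in (lmax - 0.1, lmax + 0.1):
    psd = np.all(np.linalg.eigvalsh(t * np.eye(5) - A) >= 0)
    assert psd == (t >= lmax)
```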


Smallest eigenvalue

The smallest eigenvalue can be described in hypograph form $\lambda_m(A) \geq t$ as

$$A \succeq tI, \tag{6.8}$$

i.e., we can maximize the smallest eigenvalue of $A$.

Eigenvalue spread

The eigenvalue spread can be modeled in epigraph form

$$\lambda_1(A) - \lambda_m(A) \leq t$$

by combining the two linear matrix inequalities in (6.7) and (6.8), i.e.,

$$
\begin{array}{l}
zI \preceq A \preceq sI, \\
s - z \leq t.
\end{array}
\tag{6.9}
$$

Spectral radius

The spectral radius $\rho(A) := \max_i |\lambda_i(A)|$ can be modeled in epigraph form $\rho(A) \leq t$ using two linear matrix inequalities

$$-tI \preceq A \preceq tI.$$

Condition number of a positive definite matrix

Suppose now that $A \in \mathcal{S}^m_+$. The condition number of a positive definite matrix can be minimized by noting that $\lambda_1(A)/\lambda_m(A) \leq t$ if and only if there exists a $\mu > 0$ such that

$$\mu I \preceq A \preceq \mu t I,$$

or equivalently if and only if $I \preceq \mu^{-1}A \preceq tI$. If $A = A(x)$ is represented as in (6.5) then a change of variables $z := x/\mu$, $\nu := 1/\mu$ leads to a problem of the form

$$
\begin{array}{ll}
\text{minimize} & t \\
\text{subject to} & I \preceq \nu A_0 + \sum_{i=1}^n z_i A_i \preceq tI,
\end{array}
\tag{6.10}
$$

from which we recover the solution $x = z/\nu$. In essence, we first normalize the spectrum by the smallest eigenvalue, and then minimize the largest eigenvalue of the normalized linear matrix inequality. Compare Sec. 2.2.5.


6.2.3 Log-determinant

Consider again a symmetric positive-definite matrix $A \in \mathcal{S}^m_+$. The determinant

$$\det(A) = \prod_{i=1}^m \lambda_i(A)$$

is neither convex nor concave, but $\log\det(A)$ is concave and we can write the inequality

$$t \leq \log\det(A)$$

in the form of the following problem:

$$
\begin{array}{l}
\begin{bmatrix} A & Z \\ Z^T & \mathrm{diag}(Z) \end{bmatrix} \succeq 0, \\
Z \ \text{is lower triangular}, \\
t \leq \sum_i \log Z_{ii}.
\end{array}
\tag{6.11}
$$

The equivalence of the two problems follows from Lemma 6.1 and subadditivity of the determinant for semidefinite matrices. That is:

$$0 \leq \det(A - Z\,\mathrm{diag}(Z)^{-1}Z^T) \leq \det(A) - \det(Z\,\mathrm{diag}(Z)^{-1}Z^T) = \det(A) - \det(Z).$$

On the other hand the optimal value $\det(A)$ is attained for $Z = LD$ if $A = LDL^T$ is the LDL factorization of $A$.

The last inequality in problem (6.11) can of course be modeled using the exponential cone as in Sec. 5.2. Note that we can replace that bound with $t \leq \left(\prod_i Z_{ii}\right)^{1/m}$ to get instead the model of $t \leq \det(A)^{1/m}$ using Sec. 4.2.4.

6.2.4 Singular value optimization

We next consider a non-square matrix $A \in \mathbb{R}^{m \times p}$. Assume $p \leq m$ and denote the singular values of $A$ by

$$\sigma_1(A) \geq \sigma_2(A) \geq \cdots \geq \sigma_p(A) \geq 0.$$

The singular values are connected to the eigenvalues of $A^T A$ via

$$\sigma_i(A) = \sqrt{\lambda_i(A^T A)}, \tag{6.12}$$

and if $A$ is square and symmetric then $\sigma_i(A) = |\lambda_i(A)|$. We show next how to optimize several functions of the singular values.


Largest singular value

The epigraph $\sigma_1(A) \leq t$ can be characterized using (6.12) as

$$A^T A \preceq t^2 I,$$

which from Schur's lemma is equivalent to

$$\begin{bmatrix} tI & A \\ A^T & tI \end{bmatrix} \succeq 0. \tag{6.13}$$

The largest singular value $\sigma_1(A)$ is also called the spectral norm or the $\ell_2$-norm of $A$, $\|A\|_2 := \sigma_1(A)$.

Sum of singular values

The trace norm or the nuclear norm of X is the dual of the ℓ_2-norm:

‖X‖_* = sup_{‖Z‖_2 ≤ 1} tr(X^T Z).                               (6.14)

It turns out that the nuclear norm corresponds to the sum of the singular values,

‖X‖_* = Σ_{i=1}^m σ_i(X) = Σ_{i=1}^n √(λ_i(X^T X)),              (6.15)

which is easy to verify using the singular value decomposition X = UΣV^T. We have

sup_{‖Z‖_2 ≤ 1} tr(X^T Z) = sup_{‖Z‖_2 ≤ 1} tr(Σ^T U^T Z V)
                          = sup_{‖Y‖_2 ≤ 1} tr(Σ^T Y)
                          = sup_{|y_i| ≤ 1} Σ_{i=1}^p σ_i y_i = Σ_{i=1}^p σ_i,

which shows (6.15). Alternatively, we can express (6.14) as the solution to

maximize    tr(X^T Z)
subject to  [ I   Z^T ]
            [ Z   I   ] ⪰ 0,                                     (6.16)

with the dual problem (see Example 8.8)

minimize    tr(U + V)/2
subject to  [ U   X^T ]
            [ X   V   ] ⪰ 0.                                     (6.17)

In other words, using strong duality we can characterize the epigraph ‖A‖_* ≤ t with

[ U   A^T ]
[ A   V   ] ⪰ 0,   tr(U + V)/2 ≤ t.                              (6.18)

For a symmetric matrix the nuclear norm corresponds to the sum of absolute values of eigenvalues, and for a semidefinite matrix it simply corresponds to the trace of the matrix.
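These identities are easy to confirm numerically without solving (6.16)-(6.18). The following numpy sketch (our illustration) checks that the nuclear norm equals the sum of singular values, the sum of absolute eigenvalues for a symmetric matrix, and the trace for a semidefinite matrix:

import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 3))

nuc = np.linalg.norm(X, "nuc")                                       # nuclear norm
print(np.isclose(nuc, np.linalg.svd(X, compute_uv=False).sum()))     # sum of singular values

S = rng.standard_normal((4, 4)); S = (S + S.T) / 2                   # symmetric matrix
print(np.isclose(np.linalg.norm(S, "nuc"),
                 np.abs(np.linalg.eigvalsh(S)).sum()))               # sum of |eigenvalues|

P = S @ S.T                                                          # positive semidefinite
print(np.isclose(np.linalg.norm(P, "nuc"), np.trace(P)))             # equals the trace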


6.2.5 Matrix inequalities from Schurโ€™s Lemma

Several quadratic or quadratic-over-linear matrix inequalities follow immediately from Schur's lemma. Suppose A ∈ R^{m×p} and B ∈ S^m are matrix variables with B ≻ 0. Then

A^T B^{−1} A ⪯ C

if and only if

[ C   A^T ]
[ A   B   ] ⪰ 0.

6.2.6 Nonnegative polynomials

We next consider characterizations of polynomials constrained to be nonnegative on the real axis. To that end, consider a polynomial basis function

v(t) = (1, t, . . . , t^{2n}).

It is then well-known that a polynomial f : R ↦ R of even degree 2n is nonnegative on the entire real axis

f(t) := x^T v(t) = x_0 + x_1 t + · · · + x_{2n} t^{2n} ≥ 0,  ∀t                 (6.19)

if and only if it can be written as a sum of squared polynomials of degree n (or less), i.e., for some q_1, q_2 ∈ R^{n+1}

f(t) = (q_1^T u(t))^2 + (q_2^T u(t))^2,   u(t) := (1, t, . . . , t^n).          (6.20)

It turns out that an equivalent characterization of {x | x^T v(t) ≥ 0, ∀t} can be given in terms of a semidefinite variable X,

x_i = ⟨X, H_i⟩,  i = 0, . . . , 2n,   X ∈ S^{n+1}_+,                            (6.21)

where H_i^{n+1} ∈ R^{(n+1)×(n+1)} are Hankel matrices

[H_i]_{kl} = { 1,  k + l = i,
             { 0,  otherwise.

When there is no ambiguity, we drop the superscript on H_i. For example, for n = 2 we have

H_0 = [ 1 0 0 ]     H_1 = [ 0 1 0 ]     . . .     H_4 = [ 0 0 0 ]
      [ 0 0 0 ],          [ 1 0 0 ],                    [ 0 0 0 ]
      [ 0 0 0 ]           [ 0 0 0 ]                     [ 0 0 1 ].

To verify that (6.19) and (6.21) are equivalent, we first note that

u(t)u(t)^T = Σ_{i=0}^{2n} H_i v_i(t),


i.e.,

[ 1   ] [ 1   ]^T     [ 1     t        . . .   t^n     ]
[ t   ] [ t   ]    =  [ t     t^2      . . .   t^{n+1} ]
[ ... ] [ ... ]       [ ...   ...      . . .   ...     ]
[ t^n ] [ t^n ]       [ t^n   t^{n+1}  . . .   t^{2n}  ].

Assume next that f(t) ≥ 0. Then from (6.20) we have

f(t) = (q_1^T u(t))^2 + (q_2^T u(t))^2
     = ⟨q_1 q_1^T + q_2 q_2^T, u(t)u(t)^T⟩
     = Σ_{i=0}^{2n} ⟨q_1 q_1^T + q_2 q_2^T, H_i⟩ v_i(t),

i.e., we have f(t) = x^T v(t) with x_i = ⟨X, H_i⟩, X = (q_1 q_1^T + q_2 q_2^T) ⪰ 0. Conversely, assume that (6.21) holds. Then

f(t) = Σ_{i=0}^{2n} ⟨H_i, X⟩ v_i(t) = ⟨X, Σ_{i=0}^{2n} H_i v_i(t)⟩ = ⟨X, u(t)u(t)^T⟩ ≥ 0

since X ⪰ 0. In summary, we can characterize the cone of nonnegative polynomials over the real axis as

K^n_∞ = {x ∈ R^{n+1} | x_i = ⟨X, H_i⟩, i = 0, . . . , 2n, X ∈ S^{n+1}_+}.        (6.22)

Checking nonnegativity of a univariate polynomial thus corresponds to a semidefinite feasibility problem.
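A small numpy sketch (our addition, not from the original text) makes the construction concrete for n = 2: it builds the Hankel matrices, checks the identity u(t)u(t)^T = Σ_i H_i t^i, and evaluates the polynomial recovered from a rank-one certificate X = qq^T:

import numpy as np

n = 2
H = [np.array([[1.0 if k + l == i else 0.0 for l in range(n + 1)]
               for k in range(n + 1)]) for i in range(2 * n + 1)]

t = 1.7                                        # any test point
u = np.array([t**k for k in range(n + 1)])     # u(t) = (1, t, ..., t^n)
lhs = np.outer(u, u)
rhs = sum(t**i * H[i] for i in range(2 * n + 1))
print(np.allclose(lhs, rhs))                   # True

# A PSD certificate X with x_i = <X, H_i> encodes a nonnegative polynomial:
q = np.array([1.0, -2.0, 0.5])
X = np.outer(q, q)                             # rank-1 PSD certificate
x = np.array([np.sum(X * H[i]) for i in range(2 * n + 1)])
print(np.isclose(np.polyval(x[::-1], t), (q @ u)**2))   # f(t) = (q^T u(t))^2 >= 0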

Nonnegativity on a finite interval

As an extension we consider a basis function of degree n,

v(t) = (1, t, . . . , t^n).

A polynomial f(t) := x^T v(t) is then nonnegative on a subinterval I = [a, b] ⊂ R if and only if f(t) can be written as a sum of weighted squares,

f(t) = w_1(t)(q_1^T u_1(t))^2 + w_2(t)(q_2^T u_2(t))^2

where w_i(t) are polynomials nonnegative on [a, b]. To describe the cone

K^n_{a,b} = {x ∈ R^{n+1} | f(t) = x^T v(t) ≥ 0, ∀t ∈ [a, b]}

we distinguish between polynomials of odd and even degree.

• Even degree. Let n = 2m and denote

  u_1(t) = (1, t, . . . , t^m),   u_2(t) = (1, t, . . . , t^{m−1}).


  We choose w_1(t) = 1 and w_2(t) = (t − a)(b − t) and note that w_2(t) ≥ 0 on [a, b]. Then f(t) ≥ 0, ∀t ∈ [a, b] if and only if

  f(t) = (q_1^T u_1(t))^2 + w_2(t)(q_2^T u_2(t))^2

  for some q_1, q_2, and an equivalent semidefinite characterization can be found as

  K^n_{a,b} = {x ∈ R^{n+1} | x_i = ⟨X_1, H^m_i⟩ + ⟨X_2, (a + b)H^{m−1}_{i−1} − abH^{m−1}_i − H^{m−1}_{i−2}⟩,
               i = 0, . . . , n,  X_1 ∈ S^m_+,  X_2 ∈ S^{m−1}_+}.               (6.23)

• Odd degree. Let n = 2m + 1 and denote u(t) = (1, t, . . . , t^m). We choose w_1(t) = (t − a) and w_2(t) = (b − t). We then have that f(t) = x^T v(t) ≥ 0, ∀t ∈ [a, b] if and only if

  f(t) = (t − a)(q_1^T u(t))^2 + (b − t)(q_2^T u(t))^2

  for some q_1, q_2, and an equivalent semidefinite characterization can be found as

  K^n_{a,b} = {x ∈ R^{n+1} | x_i = ⟨X_1, H^m_{i−1} − aH^m_i⟩ + ⟨X_2, bH^m_i − H^m_{i−1}⟩,
               i = 0, . . . , n,  X_1, X_2 ∈ S^m_+}.                            (6.24)

6.2.7 Hermitian matrices

Semidefinite optimization can be extended to complex-valued matrices. To that end, let H^n denote the cone of Hermitian matrices of order n, i.e.,

X ∈ H^n  ⟺  X ∈ C^{n×n},  X^H = X,                                              (6.25)

where superscript 'H' denotes Hermitian (or complex) transposition. Then X ∈ H^n_+ if and only if

z^H X z = (ℜz − iℑz)^T (ℜX + iℑX)(ℜz + iℑz)

        = [ ℜz ]^T [ ℜX   −ℑX ] [ ℜz ]
          [ ℑz ]   [ ℑX    ℜX ] [ ℑz ]  ≥ 0,   ∀z ∈ C^n.

In other words,

X ∈ H^n_+  ⟺  [ ℜX   −ℑX ]
               [ ℑX    ℜX ]  ∈ S^{2n}_+.                                         (6.26)

Note that (6.25) implies skew-symmetry of ℑX, i.e., ℑX = −ℑX^T.
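As a quick sanity check of (6.26), the following numpy sketch (our addition) embeds a random Hermitian PSD matrix into the real symmetric cone and confirms that both matrices are positive semidefinite:

import numpy as np

rng = np.random.default_rng(2)
n = 3
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
X = B @ B.conj().T                          # Hermitian positive semidefinite

R, I = X.real, X.imag
emb = np.block([[R, -I],
                [I,  R]])                   # the real 2n x 2n embedding from (6.26)

print(np.allclose(I, -I.T))                          # Im(X) is skew-symmetric
print(np.min(np.linalg.eigvalsh(X)) >= -1e-9)        # X is PSD
print(np.min(np.linalg.eigvalsh(emb)) >= -1e-9)      # and so is its embedding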

6.2.8 Nonnegative trigonometric polynomials

As a complex-valued variation of the sum-of-squares representation we consider trigonometricpolynomials; optimization over cones of nonnegative trigonometric polynomials has several


important engineering applications. Consider a trigonometric polynomial evaluated on the complex unit-circle

f(z) = x_0 + 2ℜ(Σ_{i=1}^n x_i z^{−i}),   |z| = 1                                (6.27)

parametrized by x ∈ R × C^n. We are interested in characterizing the cone of trigonometric polynomials that are nonnegative on the angular interval [0, π],

K^n_{0,π} = {x ∈ R × C^n | x_0 + 2ℜ(Σ_{i=1}^n x_i z^{−i}) ≥ 0,  ∀z = e^{jt}, t ∈ [0, π]}.

Consider a complex-valued basis function

v(z) = (1, z, . . . , z^n).

The Riesz-Fejer Theorem states that a trigonometric polynomial f(z) in (6.27) is nonnegative (i.e., x ∈ K^n_{0,π}) if and only if for some q ∈ C^{n+1}

f(z) = |q^H v(z)|^2.                                                            (6.28)

Analogously to Sec. 6.2.6 we have a semidefinite characterization of the sum-of-squares representation, i.e., f(z) ≥ 0, ∀z = e^{jt}, t ∈ [0, 2π] if and only if

x_i = ⟨X, T_i⟩,  i = 0, . . . , n,   X ∈ H^{n+1}_+                               (6.29)

where T_i^{n+1} ∈ R^{(n+1)×(n+1)} are Toeplitz matrices

[T_i]_{kl} = { 1,  k − l = i,
             { 0,  otherwise,       i = 0, . . . , n.

When there is no ambiguity, we drop the superscript on T_i. For example, for n = 2 we have

T_0 = [ 1 0 0 ]     T_1 = [ 0 0 0 ]     T_2 = [ 0 0 0 ]
      [ 0 1 0 ],          [ 1 0 0 ],          [ 0 0 0 ]
      [ 0 0 1 ]           [ 0 1 0 ]           [ 1 0 0 ].

To prove correctness of the semidefinite characterization, we first note that

v(z)v(z)^H = I + Σ_{i=1}^n (T_i v_i(z)) + Σ_{i=1}^n (T_i v_i(z))^H,

i.e.,

[ 1   ] [ 1   ]^H     [ 1     z^{−1}   . . .   z^{−n}  ]
[ z   ] [ z   ]    =  [ z     1        . . .   z^{1−n} ]
[ ... ] [ ... ]       [ ...   ...      . . .   ...     ]
[ z^n ] [ z^n ]       [ z^n   z^{n−1}  . . .   1       ].


Next assume that (6.28) is satisfied. Then

f(z) = ⟨qq^H, v(z)v(z)^H⟩
     = ⟨qq^H, I⟩ + ⟨qq^H, Σ_{i=1}^n T_i v_i(z)⟩ + ⟨qq^H, Σ_{i=1}^n T_i^T v_i(z)⟩
     = ⟨qq^H, I⟩ + Σ_{i=1}^n ⟨qq^H, T_i⟩ v_i(z) + Σ_{i=1}^n ⟨qq^H, T_i^T⟩ v_i(z)
     = x_0 + 2ℜ(Σ_{i=1}^n x_i v_i(z))

with x_i = ⟨qq^H, T_i⟩. Conversely, assume that (6.29) holds. Then

f(z) = ⟨X, I⟩ + Σ_{i=1}^n ⟨X, T_i⟩ v_i(z) + Σ_{i=1}^n ⟨X, T_i^T⟩ v_i(z) = ⟨X, v(z)v(z)^H⟩ ≥ 0.

In other words, we have shown that

K^n_{0,π} = {x ∈ R × C^n | x_i = ⟨X, T_i⟩, i = 0, . . . , n, X ∈ H^{n+1}_+}.     (6.30)

Nonnegativity on a subinterval

We next sketch a few useful extensions. An extension of the Riesz-Fejer Theorem states that a trigonometric polynomial f(z) of degree n is nonnegative on I(a, b) = {z | z = e^{jt}, t ∈ [a, b] ⊆ [0, π]} if and only if it can be written as a weighted sum of squared trigonometric polynomials

f(z) = |f_1(z)|^2 + g(z)|f_2(z)|^2

where f_1, f_2, g are trigonometric polynomials of degree n, n − d and d, respectively, and g(z) ≥ 0 on I(a, b). For example g(z) = z + z^{−1} − 2 cos α is nonnegative on I(0, α), and it can be verified that f(z) ≥ 0, ∀z ∈ I(0, α) if and only if

x_i = ⟨X_1, T^{n+1}_i⟩ + ⟨X_2, T^n_{i+1}⟩ + ⟨X_2, T^n_{i−1}⟩ − 2 cos(α)⟨X_2, T^n_i⟩,

for X_1 ∈ H^{n+1}_+, X_2 ∈ H^n_+, i.e.,

K^n_{0,α} = {x ∈ R × C^n | x_i = ⟨X_1, T^{n+1}_i⟩ + ⟨X_2, T^n_{i+1}⟩ + ⟨X_2, T^n_{i−1}⟩
                                  − 2 cos(α)⟨X_2, T^n_i⟩,  X_1 ∈ H^{n+1}_+, X_2 ∈ H^n_+}.    (6.31)

Similarly, using g(z) = 2 cos α − z − z^{−1} ≥ 0 on I(α, π), we have f(z) ≥ 0, ∀z ∈ I(α, π) if and only if

x_i = ⟨X_1, T^{n+1}_i⟩ − ⟨X_2, T^n_{i+1}⟩ − ⟨X_2, T^n_{i−1}⟩ + 2 cos(α)⟨X_2, T^n_i⟩,

i.e.,

K^n_{α,π} = {x ∈ R × C^n | x_i = ⟨X_1, T^{n+1}_i⟩ − ⟨X_2, T^n_{i+1}⟩ − ⟨X_2, T^n_{i−1}⟩
                                  + 2 cos(α)⟨X_2, T^n_i⟩,  X_1 ∈ H^{n+1}_+, X_2 ∈ H^n_+}.    (6.32)


6.3 Semidefinite optimization case studies

6.3.1 Nearest correlation matrix

We consider the set

S = {X ∈ S^n_+ | X_{ii} = 1, i = 1, . . . , n}

(shown in Fig. 6.1 for n = 3). For A ∈ S^n the nearest correlation matrix is

X^⋆ = arg min_{X∈S} ‖A − X‖_F,

i.e., the projection of A onto the set S. To pose this as a conic optimization problem we define the linear operator

svec(U) = (U_{11}, √2 U_{21}, . . . , √2 U_{n1}, U_{22}, √2 U_{32}, . . . , √2 U_{n2}, . . . , U_{nn}),

which extracts and scales the lower-triangular part of U. We then get a conic formulation of the nearest correlation problem exploiting symmetry of A − X,

minimize    t
subject to  ‖z‖_2 ≤ t,
            svec(A − X) = z,
            diag(X) = e,
            X ⪰ 0.                                                              (6.33)

This is an example of a problem with both conic quadratic and semidefinite constraints in primal form. We can add different constraints to the problem, for example a bound γ on the smallest eigenvalue by replacing X ⪰ 0 with X ⪰ γI.
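As a concrete illustration, here is a minimal cvxpy sketch of the nearest correlation problem (our addition; it assumes cvxpy with an SDP-capable solver is installed, and it states the Frobenius-norm objective directly rather than through svec and an explicit quadratic cone as in (6.33)):

import numpy as np
import cvxpy as cp

n = 4
rng = np.random.default_rng(3)
B = rng.standard_normal((n, n))
A = (B + B.T) / 2                            # a symmetric "data" matrix

X = cp.Variable((n, n), PSD=True)            # X >= 0
constraints = [cp.diag(X) == np.ones(n)]     # unit diagonal
prob = cp.Problem(cp.Minimize(cp.norm(A - X, "fro")), constraints)
prob.solve()

print("distance:", prob.value)
print("nearest correlation matrix:\n", np.round(X.value, 3))

# Variant with a lower bound gamma on the smallest eigenvalue (X >= gamma*I):
# constraints = [cp.diag(X) == np.ones(n), X - 0.1 * np.eye(n) >> 0]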

6.3.2 Extremal ellipsoids

Given a polytope we can find the largest ellipsoid contained in the polytope, or the smallestellipsoid containing the polytope (for certain representations of the polytope).

Maximal inscribed ellipsoid

Consider a polytope

S = {x ∈ R^n | a_i^T x ≤ b_i, i = 1, . . . , m}.

The ellipsoid

ℰ := {x | x = Cu + d, ‖u‖ ≤ 1}

is contained in S if and only if

max_{‖u‖_2 ≤ 1} a_i^T (Cu + d) ≤ b_i,   i = 1, . . . , m


or equivalently, if and only if

‖Ca_i‖_2 + a_i^T d ≤ b_i,   i = 1, . . . , m.

Since Vol(ℰ) ≈ det(C)^{1/n} the maximum-volume inscribed ellipsoid is the solution to

maximize    det(C)
subject to  ‖Ca_i‖_2 + a_i^T d ≤ b_i,   i = 1, . . . , m,
            C ⪰ 0.

In Sec. 6.2.2 we show how to maximize the determinant of a positive definite matrix.

Minimal enclosing ellipsoid

Next consider a polytope given as the convex hull of a set of points,

๐‘† โ€ฒ = conv{๐‘ฅ1, ๐‘ฅ2, . . . , ๐‘ฅ๐‘š}, ๐‘ฅ๐‘– โˆˆ R๐‘›.

The ellipsoid

โ„ฐ โ€ฒ := {๐‘ฅ | โ€–๐‘ƒ (๐‘ฅโˆ’ ๐‘)โ€–2 โ‰ค 1}

has Vol(ℰ′) ≈ det(P)^{−1/n}, so the minimum-volume enclosing ellipsoid is the solution to

maximize    det(P)
subject to  ‖P(x_i − c)‖_2 ≤ 1,   i = 1, . . . , m,
            P ⪰ 0.

Fig. 6.2: Example of inner and outer ellipsoidal approximations of a pentagon in R2.

6.3.3 Polynomial curve-fitting

Consider a univariate polynomial of degree n,

f(t) = x_0 + x_1 t + x_2 t^2 + · · · + x_n t^n.


Often we wish to fit such a polynomial to a given set of measurements or control points

{(๐‘ก1, ๐‘ฆ1), (๐‘ก2, ๐‘ฆ2), . . . , (๐‘ก๐‘š, ๐‘ฆ๐‘š)},

i.e., we wish to determine coefficients ๐‘ฅ๐‘–, ๐‘– = 0, . . . , ๐‘› such that

๐‘“(๐‘ก๐‘—) โ‰ˆ ๐‘ฆ๐‘—, ๐‘— = 1, . . . ,๐‘š.

To that end, define the Vandermonde matrix

    [ 1   t_1   t_1^2   . . .   t_1^n ]
A = [ 1   t_2   t_2^2   . . .   t_2^n ]
    [ ...  ...   ...             ...  ]
    [ 1   t_m   t_m^2   . . .   t_m^n ].

We can then express the desired curve-fit compactly as

Ax ≈ y,

i.e., as a linear expression in the coefficients x. When the degree of the polynomial equals the number of measurements, n = m, the matrix A is square and non-singular (provided there are no duplicate rows), so we can solve

Ax = y

to find a polynomial that passes through all the control points (t_i, y_i). Similarly, if n > m there are infinitely many solutions satisfying the underdetermined system Ax = y. A typical choice in that case is the least-norm solution

x_ln = arg min_{Ax=y} ‖x‖_2

which (assuming again there are no duplicate rows) equals

x_ln = A^T (AA^T)^{−1} y.

On the other hand, if n < m we generally cannot find a solution to the overdetermined system Ax = y, and we typically resort to a least-squares solution

x_ls = arg min ‖Ax − y‖_2

which is given by the formula (see Sec. 8.5.1)

x_ls = (A^T A)^{−1} A^T y.

In the following we discuss how the semidefinite characterizations of nonnegative polynomials(see Sec. 6.2.6) lead to more advanced and useful polynomial curve-fitting constraints.


• Nonnegativity. One possible constraint is nonnegativity on an interval,

  f(t) := x_0 + x_1 t + · · · + x_n t^n ≥ 0,   ∀t ∈ [a, b]

  with a semidefinite characterization embedded in x ∈ K^n_{a,b}, see (6.23).

• Monotonicity. We can ensure monotonicity of f(t) by requiring that f′(t) ≥ 0 (or f′(t) ≤ 0), i.e.,

  (x_1, 2x_2, . . . , nx_n) ∈ K^{n−1}_{a,b},

  or

  −(x_1, 2x_2, . . . , nx_n) ∈ K^{n−1}_{a,b},

  respectively.

• Convexity or concavity. Convexity (or concavity) of f(t) corresponds to f′′(t) ≥ 0 (or f′′(t) ≤ 0), i.e.,

  (2x_2, 6x_3, . . . , (n−1)nx_n) ∈ K^{n−2}_{a,b},

  or

  −(2x_2, 6x_3, . . . , (n−1)nx_n) ∈ K^{n−2}_{a,b},

  respectively.

As an example, we consider fitting a smooth polynomial

๐‘“๐‘›(๐‘ก) = ๐‘ฅ0 + ๐‘ฅ1๐‘ก + ยท ยท ยท + ๐‘ฅ๐‘›๐‘ก๐‘›

to the points {(โˆ’1, 1), (0, 0), (1, 1)}, where smoothness is implied by bounding |๐‘“ โ€ฒ๐‘›(๐‘ก)|. More

specifically, we wish to solve the problem

minimize    z
subject to  |f′_n(t)| ≤ z,   ∀t ∈ [−1, 1],
            f_n(−1) = 1,  f_n(0) = 0,  f_n(1) = 1,

or equivalently

minimize    z
subject to  z − f′_n(t) ≥ 0,   ∀t ∈ [−1, 1],
            f′_n(t) + z ≥ 0,   ∀t ∈ [−1, 1],
            f_n(−1) = 1,  f_n(0) = 0,  f_n(1) = 1.

Finally, we use the characterizations K^n_{a,b} to get a conic problem

minimize    z
subject to  (z − x_1, −2x_2, . . . , −nx_n) ∈ K^{n−1}_{−1,1},
            (x_1 + z, 2x_2, . . . , nx_n) ∈ K^{n−1}_{−1,1},
            f_n(−1) = 1,  f_n(0) = 0,  f_n(1) = 1.


Fig. 6.3: Graph of univariate polynomials of degree 2, 4, and 8, respectively, passing through {(−1, 1), (0, 0), (1, 1)}. The higher-degree polynomials are increasingly smoother on [−1, 1].

In Fig. 6.3 we show the graphs of the resulting polynomials of degree 2, 4 and 8, respectively. The second degree polynomial is uniquely determined by the three constraints f_2(−1) = 1, f_2(0) = 0, f_2(1) = 1, i.e., f_2(t) = t^2. Also, we obviously have a lower bound on the largest derivative max_{t∈[−1,1]} |f′_n(t)| ≥ 1. The computed fourth degree polynomial is given by

f_4(t) = (3/2)t^2 − (1/2)t^4

after rounding coefficients to rational numbers. Furthermore, the largest derivative is given by

f′_4(1/√2) = √2,

and f′′_4(t) < 0 on (1/√2, 1], so, although not visibly clear, the polynomial is nonconvex on [−1, 1]. In Fig. 6.4 we show the graphs of the corresponding polynomials where we added a convexity constraint f′′_n(t) ≥ 0, i.e.,

(2x_2, 6x_3, . . . , (n−1)nx_n) ∈ K^{n−2}_{−1,1}.

In this case, we get

f_4(t) = (6/5)t^2 − (1/5)t^4

and the largest derivative increases to 8/5.

6.3.4 Filter design problems

Filter design is an important application of optimization over trigonometric polynomials in signal processing. We consider a trigonometric polynomial

H(ω) = x_0 + 2ℜ(Σ_{k=1}^n x_k e^{−jωk})
     = a_0 + 2Σ_{k=1}^n (a_k cos(ωk) + b_k sin(ωk))


Fig. 6.4: Graph of univariate polynomials of degree 2, 4, and 8, respectively, passing through {(−1, 1), (0, 0), (1, 1)}. The polynomials all have positive second derivative (i.e., they are convex) on [−1, 1] and the higher-degree polynomials are increasingly smoother on that interval.

where ๐‘Ž๐‘˜ := โ„œ(๐‘ฅ๐‘˜), ๐‘๐‘˜ := โ„‘(๐‘ฅ๐‘˜). If the function ๐ป(๐œ”) is nonnegative we call it a transferfunction, and it describes how different harmonic components of a periodic discrete signalare attenuated when a filter with transfer function ๐ป(๐œ”) is applied to the signal.

We often wish a transfer function where ๐ป(๐œ”) โ‰ˆ 1 for 0 โ‰ค ๐œ” โ‰ค ๐œ”๐‘ and ๐ป(๐œ”) โ‰ˆ 0 for๐œ”๐‘  โ‰ค ๐œ” โ‰ค ๐œ‹ for given constants ๐œ”๐‘, ๐œ”๐‘ . One possible formulation for achieving this is

minimize ๐‘กsubject to 0 โ‰ค ๐ป(๐œ”) โˆ€๐œ” โˆˆ [0, ๐œ‹]

1 โˆ’ ๐›ฟ โ‰ค ๐ป(๐œ”) โ‰ค 1 + ๐›ฟ โˆ€๐œ” โˆˆ [0, ๐œ”๐‘]๐ป(๐œ”) โ‰ค ๐‘ก โˆ€๐œ” โˆˆ [๐œ”๐‘ , ๐œ‹],

which corresponds to minimizing ๐ป(๐‘ค) on the interval [๐œ”๐‘ , ๐œ‹] while allowing ๐ป(๐‘ค) to departfrom unity by a small amount ๐›ฟ on the interval [0, ๐œ”๐‘]. Using the results from Sec. 6.2.8 (inparticular (6.30), (6.31) and (6.32)), we can pose this as a conic optimization problem

minimize ๐‘กsubject to ๐‘ฅ โˆˆ ๐พ๐‘›

0,๐œ‹,(๐‘ฅ0 โˆ’ (1 โˆ’ ๐›ฟ), ๐‘ฅ1:๐‘›) โˆˆ ๐พ๐‘›

0,๐œ”๐‘,

โˆ’(๐‘ฅ0 โˆ’ (1 + ๐›ฟ), ๐‘ฅ1:๐‘›) โˆˆ ๐พ๐‘›0,๐œ”๐‘

,

โˆ’(๐‘ฅ0 โˆ’ ๐‘ก, ๐‘ฅ1:๐‘›) โˆˆ ๐พ๐‘›๐œ”๐‘ ,๐œ‹,

(6.34)

which is a semidefinite optimization problem. In Fig. 6.5 we show ๐ป(๐œ”) obtained by solving(6.34) for ๐‘› = 10, ๐›ฟ = 0.05, ๐œ”๐‘ = ๐œ‹/4 and ๐œ”๐‘  = ๐œ”๐‘ + ๐œ‹/8.

6.3.5 Relaxations of binary optimization

Semidefinite optimization is also useful for computing bounds on difficult non-convex or combinatorial optimization problems. For example consider the binary linear optimization


Fig. 6.5: Plot of the lowpass filter transfer function H(ω) versus ω, with levels 1 ± δ on [0, ω_p] and level t^⋆ on [ω_s, π].

problem

minimize    c^T x
subject to  Ax = b,
            x ∈ {0, 1}^n.                                                       (6.35)

In general, problem (6.35) is a very difficult non-convex problem where, in the worst case, we have to explore 2^n different combinations of the binary variables. Alternatively, we can use semidefinite optimization to get a lower bound on the optimal solution with polynomial complexity. We first note that

x_i ∈ {0, 1}  ⟺  x_i^2 = x_i,

which is, in fact, equivalent to a rank constraint on a semidefinite variable,

X = xx^T,   diag(X) = x.

By relaxing the rank 1 constraint on X we get a semidefinite relaxation of (6.35),

minimize    c^T x
subject to  Ax = b,
            diag(X) = x,
            X ⪰ xx^T,                                                           (6.36)

where we note that

X ⪰ xx^T  ⟺  [ 1   x^T ]
              [ x   X   ] ⪰ 0.

Since (6.36) is a semidefinite optimization problem, it can be solved very efficiently. Suppose x^⋆ is an optimal solution for (6.35); then (x^⋆, x^⋆(x^⋆)^T) is also feasible for (6.36), but the feasible set for (6.36) is larger than the feasible set for (6.35), so in general the optimal solution of (6.36) serves as a lower bound. However, if the optimal solution X^⋆ of (6.36) has


rank 1 we have found a solution to (6.35) also. The semidefinite relaxation can also be used in a branch-and-bound algorithm for computing an exact solution of the mixed-integer problem (6.35).

We can tighten (or improve) the relaxation (6.36) by adding other constraints that cut away parts of the feasible set, without excluding rank 1 solutions. By tightening the relaxation, we reduce the gap between the optimal values of the original problem and the relaxation. For example, we can add the constraints

0 ≤ X_{ij} ≤ 1,    i = 1, . . . , n,  j = 1, . . . , n,
X_{ii} ≥ X_{ij},   i = 1, . . . , n,  j = 1, . . . , n,

and so on. This will usually have a dramatic impact on solution times and memory requirements. Already constraining a semidefinite matrix to be doubly nonnegative (X_{ij} ≥ 0) introduces n^2 additional linear inequality constraints.

6.3.6 Relaxations of boolean optimization

Similarly to Sec. 6.3.5 we can use semidefinite relaxations of boolean constraints x ∈ {−1, +1}^n. To that end, we note that

x ∈ {−1, +1}^n  ⟺  X = xx^T,  diag(X) = e,                                      (6.37)

with a semidefinite relaxation X ⪰ xx^T of the rank-1 constraint.

As a (standard) example of a combinatorial problem with boolean constraints, we consider an undirected graph G = (V, E) described by a set of vertices V = {v_1, . . . , v_n} and a set of edges E = {(v_i, v_j) | v_i, v_j ∈ V, i ≠ j}, and we wish to find the cut of maximum capacity. A cut C partitions the nodes V into two disjoint sets S and T = V ∖ S, and the cut-set I is the set of edges with one node in S and another in T, i.e.,

I = {(u, v) ∈ E | u ∈ S, v ∈ T}.

The capacity of a cut is then defined as the number of edges in the cut-set, |I|.

Fig. 6.6: Undirected graph on vertices v_0, . . . , v_6. The cut {v_2, v_4, v_5} has capacity 9 (thick edges).

To maximize the capacity of a cut we define the symmetric adjacency matrix A ∈ S^n,

[A]_{ij} = { 1,  (v_i, v_j) ∈ E,
           { 0,  otherwise,


where n = |V|, and let

x_i = { +1,  v_i ∈ S,
      { −1,  v_i ∉ S.

Suppose v_i ∈ S. Then 1 − x_i x_j = 0 if v_j ∈ S and 1 − x_i x_j = 2 if v_j ∉ S, so we get an expression for the capacity as

c(x) = (1/4) Σ_{ij} (1 − x_i x_j) A_{ij} = (1/4) e^T A e − (1/4) x^T A x,

and discarding the constant term e^T A e gives us the MAX-CUT problem

minimize    x^T A x
subject to  x ∈ {−1, +1}^n,                                                     (6.38)

with a semidefinite relaxation

minimize    ⟨A, X⟩
subject to  diag(X) = e,
            X ⪰ 0.                                                              (6.39)
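The relaxation (6.39) is small enough to try directly. The sketch below (our addition, assuming cvxpy with an SDP-capable solver is installed) solves it for a small hypothetical graph and recovers a cut by a simple sign-rounding heuristic on the leading eigenvector of X:

import numpy as np
import cvxpy as cp

edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 4), (3, 4)]   # hypothetical graph
n = 5
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

X = cp.Variable((n, n), PSD=True)
prob = cp.Problem(cp.Minimize(cp.trace(A @ X)), [cp.diag(X) == np.ones(n)])
prob.solve()

# The relaxation under-estimates min x^T A x, so this is an upper bound
# on the maximum cut capacity.
upper_bound = (np.sum(A) - prob.value) / 4

w, V = np.linalg.eigh(X.value)
x = np.sign(V[:, -1])                              # heuristic rounding
cut = sum(A[i, j] * (1 - x[i] * x[j]) / 2
          for i in range(n) for j in range(i + 1, n))
print("relaxation bound:", upper_bound, " rounded cut capacity:", cut)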


Chapter 7

Practical optimization

In this chapter we discuss various practical aspects of creating optimization models.

7.1 Conic reformulations

Many nonlinear constraint families were reformulated to conic form in previous chapters, and these examples are often sufficient to construct conic models for the optimization problems at hand. In case you do need to go beyond the cookbook examples, however, a few composition rules are worth knowing to guarantee correctness of your reformulation.

7.1.1 Perspective functions

The perspective of a function f(x) is defined as sf(x/s) on s > 0. From any conic representation of t ≥ f(x) one can reach a representation of t ≥ sf(x/s) by exchanging all constants k with their homogenized counterpart ks.

Example 7.1. In Sec. 5.2.6 it was shown that the logarithm of sum of exponentials

t ≥ log(e^{x_1} + · · · + e^{x_n}),

can be modeled as:

Σ u_i ≤ 1,
(u_i, 1, x_i − t) ∈ K_exp,   i = 1, . . . , n.

It follows immediately that its perspective

t ≥ s log(e^{x_1/s} + · · · + e^{x_n/s}),

can be modeled as:

Σ u_i ≤ s,
(u_i, s, x_i − t) ∈ K_exp,   i = 1, . . . , n.


Proof. If [t ≥ f(x)] ⇔ [t ≥ c(x, r) + c_0, A(x, r) + b ∈ K] for some linear functions c and A, a cone K, and artificial variable r, then [t ≥ sf(x/s)] ⇔ [t ≥ s(c(x/s, r) + c_0), A(x/s, r) + b ∈ K] ⇔ [t ≥ c(x, r̄) + c_0 s, A(x, r̄) + bs ∈ K], where r̄ is taken as a new artificial variable in place of rs.

7.1.2 Composite functions

In representing composites f(g(x)), it is often helpful to move the inner function g(x) to a separate constraint. That is, we would like t ≥ f(g(x)) to be reformulated as t ≥ f(r) for a new variable r constrained as r = g(x). This reformulation works directly as stated if g is affine; that is, g(x) = a^T x + b. Otherwise, one of two possible relaxations may be considered:

1. If f is convex and g is convex, then t ≥ f(r), r ≥ g(x) is equivalent to

   t ≥ { f(g(x))   if g(x) ∈ F_+,
       { f_min     otherwise,

2. If f is convex and g is concave, then t ≥ f(r), r ≤ g(x) is equivalent to

   t ≥ { f(g(x))   if g(x) ∈ F_−,
       { f_min     otherwise,

where F_+ (resp. F_−) is the domain on which f is nondecreasing (resp. nonincreasing), and f_min is the minimum of f, if finite, and −∞ otherwise. Note that Relaxation 1 provides an exact representation if g(x) ∈ F_+ for all x in domain, and likewise for Relaxation 2 if g(x) ∈ F_− for all x in domain.

Proof. Consider Relaxation 1. Intuitively, if r ∈ F_+, you can expect clever solvers to drive r towards −∞ in an attempt to relax t ≥ f(r), and this push continues until r ≥ g(x) becomes active or else (in case g(x) ∉ F_+) until f_min is reached. On the other hand, if r ∈ F_− (and thus r ≥ g(x) ∈ F_−) then solvers will drive r towards +∞ until f_min is reached.

Example 7.2. Here are some simple substitutions.

• The inequality t ≥ exp(g(x)) can be rewritten as t ≥ exp(r) and r ≥ g(x) for all convex functions g(x). This is because f(r) = exp(r) is nondecreasing everywhere.

• The inequality 2st ≥ g(x)^2 can be rewritten as 2st ≥ r^2 and r ≥ g(x) for all nonnegative convex functions g(x). This is because f(r) = r^2 is nondecreasing on r ≥ 0.

• The nonconvex inequality t ≥ (x^2 − 1)^2 can be relaxed as t ≥ r^2 and r ≥ x^2 − 1. This relaxation is exact on x^2 − 1 ≥ 0 where f(r) = r^2 is nondecreasing, i.e., on the disjoint domain |x| ≥ 1. On the domain |x| ≤ 1 it represents the strongest possible convex cut given by t ≥ f_min = 0.


Example 7.3. Some amount of work is generally required to find the correct reformulation of a nested nonlinear constraint. For instance, suppose that for x, y, z ≥ 0 with xyz > 1 we want to write

t ≥ 1/(xyz − 1).

A natural first attempt is:

t ≥ 1/r,   r ≤ xyz − 1,

which corresponds to Relaxation 2 with f(r) = 1/r and g(x, y, z) = xyz − 1 > 0. The function f is indeed convex and nonincreasing on all of g(x, y, z), and the inequality tr ≥ 1 is moreover representable with a rotated quadratic cone. Unfortunately g is not concave. We know that a monomial like xyz appears in connection with the power cone, but that requires a homogeneous constraint such as xyz ≥ u^3. This gives us an idea to try

t ≥ 1/r^3,   r^3 ≤ xyz − 1,

which is f(r) = 1/r^3 and g(x, y, z) = (xyz − 1)^{1/3} > 0. This provides the right balance: all conditions for validity and exactness of Relaxation 2 are satisfied. Introducing another variable u we get the following model:

tr^3 ≥ 1,   xyz ≥ u^3,   u ≥ (r^3 + 1)^{1/3}.

We refer to Sec. 4 to verify that all the constraints above are representable using power cones. We leave it as an exercise to find other conic representations, based on other transformations of the original inequality.

7.1.3 Convex univariate piecewise-defined functions

Consider a univariate function with k pieces:

f(x) = { f_1(x)   if x ≤ α_1,
       { f_i(x)   if α_{i−1} ≤ x ≤ α_i, for i = 2, . . . , k − 1,
       { f_k(x)   if α_{k−1} ≤ x.

Suppose f(x) is convex (in particular each piece is convex by itself) and f_i(α_i) = f_{i+1}(α_i). In representing the epigraph t ≥ f(x) it is helpful to proceed via the equivalent representation:

t = Σ_{i=1}^k t_i − Σ_{i=1}^{k−1} f_i(α_i),
x = Σ_{i=1}^k x_i − Σ_{i=1}^{k−1} α_i,
t_i ≥ f_i(x_i)   for i = 1, . . . , k,
x_1 ≤ α_1,   α_{i−1} ≤ x_i ≤ α_i for i = 2, . . . , k − 1,   α_{k−1} ≤ x_k.


Proof. In the special case when k = 2 and α_1 = f_1(α_1) = f_2(α_1) = 0 the epigraph of f is equal to the Minkowski sum {(t_1 + t_2, x_1 + x_2) : t_i ≥ f_i(x_i)}. In general the epigraph over two consecutive pieces is obtained by shifting them to (0, 0), computing the Minkowski sum and shifting back. Finally more than two pieces can be joined by continuing this argument by induction.

As the reformulation grows in size with the number of pieces, it is preferable to keep this number low. Trivially, if f_i(x) = f_{i+1}(x), these two pieces can be merged. Substituting f_i(x) and f_{i+1}(x) for max(f_i(x), f_{i+1}(x)) is sometimes an invariant change to facilitate this merge. For instance, it always works for affine functions f_i(x) and f_{i+1}(x). Finally, if f(x) is symmetric around some point α, we can represent its epigraph via a piecewise function, with only half the number of pieces, defined in terms of a new variable z = |x − α| + α.

Example 7.4 (Huber loss function). The Huber loss function

f(x) = { −2x − 1   if x ≤ −1,
       { x^2       if −1 ≤ x ≤ 1,
       { 2x − 1    if 1 ≤ x,

is convex and its epigraph t ≥ f(x) has an equivalent representation

t ≥ t_1 + t_2 + t_3 − 2,
x = x_1 + x_2 + x_3,
t_1 = −2x_1 − 1,   t_2 ≥ x_2^2,   t_3 = 2x_3 − 1,
x_1 ≤ −1,   −1 ≤ x_2 ≤ 1,   1 ≤ x_3,

where t_2 ≥ x_2^2 is representable by a rotated quadratic cone. No two pieces of f(x) can be merged to reduce the size of this formulation, but the loss function does satisfy a simple symmetry; namely f(x) = f(−x). We can thus represent its epigraph by t ≥ f(z) and z ≥ |x|, where

f(z) = { z^2      if z ≤ 1,
       { 2z − 1   if 1 ≤ z.

In this particular example, however, unless the absolute value z from z ≥ |x| is used elsewhere, the cost of introducing it does not outweigh the savings achieved by going from three pieces in x to two pieces in z.
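To see the three-piece representation at work, the following cvxpy sketch (our addition, assuming cvxpy is installed) evaluates the Huber loss at a few fixed points by minimizing t over the epigraph model above, and compares against the closed-form value:

import numpy as np
import cvxpy as cp

def huber_via_epigraph(x0):
    t = cp.Variable()
    t1, t2, t3 = cp.Variable(), cp.Variable(), cp.Variable()
    x1, x2, x3 = cp.Variable(), cp.Variable(), cp.Variable()
    cons = [t >= t1 + t2 + t3 - 2,
            x1 + x2 + x3 == x0,
            t1 == -2 * x1 - 1, t2 >= cp.square(x2), t3 == 2 * x3 - 1,
            x1 <= -1, x2 >= -1, x2 <= 1, x3 >= 1]
    cp.Problem(cp.Minimize(t), cons).solve()
    return float(t.value)

for x0 in [-3.0, -0.5, 0.0, 0.7, 2.5]:
    direct = x0**2 if abs(x0) <= 1 else 2 * abs(x0) - 1
    print(x0, huber_via_epigraph(x0), direct)    # the two values agree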

7.2 Avoiding ill-posed problems

For a well-posed continuous problem a small change in the input data should induce a small change of the optimal solution. A problem is ill-posed if small perturbations of the problem data result in arbitrarily large perturbations of the solution, or even change the feasibility


status of the problem. In such cases small rounding errors, or solving the problem on a different computer, can result in different or wrong solutions.

In fact, from an algorithmic point of view, even computing a wrong solution is numerically difficult for ill-posed problems. A rigorous definition of the degree of ill-posedness is possible by defining a condition number. This is an attractive, but not very practical metric, as its evaluation requires solving several auxiliary optimization problems. Therefore we only make the modest recommendations to avoid problems

• that are nearly infeasible,

• with constraints that are linearly dependent, or nearly linearly dependent.

Example 7.5 (Near linear dependencies). To give some idea about the perils of near linear dependence, consider this pair of optimization problems:

maximize    x
subject to  2x − y ≥ −1,
            2.0001x − y ≤ 0,

and

maximize    x
subject to  2x − y ≥ −1,
            1.9999x − y ≤ 0.

The first problem has a unique optimal solution (x, y) = (10^4, 2·10^4 + 1), while the second problem is unbounded. This is caused by the fact that the hyperplanes (here: straight lines) defining the constraints in each problem are almost parallel. Moreover, we can consider another modification:

maximize    x
subject to  2x − y ≥ −1,
            2.001x − y ≤ 0,

with optimal solution (x, y) = (10^3, 2·10^3 + 1), which shows how a small perturbation of the coefficients induces a large change of the solution.

Typical examples of problems with nearly linearly dependent constraints are discretizations of continuous processes, where the constraints invariably become more correlated as we make the discretization finer; as such there may be nothing wrong with the discretization or problem formulation, but we should expect numerical difficulties for sufficiently fine discretizations.

One should also be careful not to specify problems whose optimal value is only achieved in the limit. A trivial example is

minimize e^x,   x ∈ R.


The infimum value 0 is not attained for any finite x, which can lead to unspecified behaviour of the solver. More examples of ill-posed conic problems are discussed in Fig. 7.1 and Sec. 8. In particular, Fig. 7.1 depicts some troublesome problems in two dimensions. In (a) minimizing y subject to y ≥ 1/x on the nonnegative orthant is unattained, approaching zero for x → ∞. In (b) the only feasible point is (0, 0), but objectives maximizing x are not blocked by the purely vertical normal vectors of active constraints at this point, falsely suggesting local progress could be made. In (c) the intersection of the two subsets is empty, but the distance between them is zero. Finally in (d) minimizing x on y ≥ x^2 is unbounded at minus infinity for y → ∞, but there is no improving ray to follow. Although seemingly unrelated, the four cases are actually primal-dual pairs; (a) with (b) and (c) with (d). In fact, the missing normal vector property in (b), desired to certify optimality, can be attributed to (a) not attaining the best among objective values at distance zero to feasibility, and the missing positive distance property in (c), desired to certify infeasibility, is because (d) has no improving ray.

Fig. 7.1: Geometric representations of some ill-posed situations.

7.3 Scaling

Another difficulty encountered in practice involves models that are badly scaled. Loosely speaking, we consider a model to be badly scaled if

โ€ข variables are measured on very different scales,

โ€ข constraints or bounds are measured on very different scales.

For example if one variable x_1 has units of molecules and another variable x_2 measures temperature, we might expect an objective function such as

x_1 + 10^{12} x_2


and in finite precision the second term dominates the objective and renders the contribution from x_1 insignificant and unreliable.

A similar situation (from a numerical point of view) is encountered when using penalization or big-M strategies. Assume we have a standard linear optimization problem (2.12) with the additional constraint that x_1 = 0. We may eliminate x_1 completely from the model, or we might add an additional constraint, but suppose we choose to formulate a penalized problem instead

minimize    c^T x + 10^{12} x_1
subject to  Ax = b,
            x ≥ 0,

reasoning that the large penalty term will force x_1 = 0. However, if ‖c‖ is small we have the exact same problem, namely that in finite precision the penalty term will completely dominate the objective and render the contribution c^T x insignificant or unreliable. Therefore, the penalty term should be chosen carefully.

Example 7.6 (Risk-return tradeoff). Suppose we want to solve an efficient frontier variant of the optimal portfolio problem from Sec. 3.3.3 with an objective of the form

maximize    μ^T x − (γ/2) x^T Σx

where the parameter γ controls the tradeoff between return and risk. The user might choose a very small γ to discount the risk term. This, however, may lead to numerical issues caused by a very large solution. To see why, consider a simple one-dimensional, unconstrained analogue:

maximize    x − (γ/2) x^2,   x ∈ R

and note that the maximum is attained for x = 1/γ, which may be very large.

Example 7.7 (Explicit redundant bounds). Consider again the problem (2.12), but with additional (redundant) constraints x_i ≤ γ. This is a common approach for some optimization practitioners. The problem we solve is then

minimize    c^T x
subject to  Ax = b,
            x ≥ 0,
            x ≤ γe,

with a dual problem

maximize    b^T y − γe^T z
subject to  A^T y + s − z = c,
            s, z ≥ 0.


Suppose we do not know a priori an upper bound on ‖x‖_∞, so we choose a very large γ = 10^{12}, reasoning that this will not change the optimal solution. Note that the large variable bound becomes a penalty term in the dual problem; in finite precision such a large bound will effectively destroy accuracy of the solution.

Example 7.8 (Big-M). Suppose we have a mixed integer model which involves a big-M constraint (see Sec. 9), for instance

a^T x ≤ b + M(1 − z)

as in Sec. 9.1.3. The constant M must be big enough to guarantee that the constraint becomes redundant when z = 0, or we risk shrinking the feasible set, perhaps to the point of creating an infeasible model. On the other hand, the user might set M to a huge, safe value, say M = 10^{12}. While this is mathematically correct, it will lead to a model with poorer linear relaxations and most likely increase solution time (sometimes dramatically). It is therefore very important to put some effort into estimating a reasonable value of M which is safe, but not too big.

7.4 The huge and the tiny

Some types of problems and constraints are especially prone to behave badly with very large or very small numbers. We discuss such examples briefly in this section.

Sum of squares

The sum of squares x_1^2 + · · · + x_n^2 can be bounded above using the rotated quadratic cone:

t ≥ x_1^2 + · · · + x_n^2  ⟺  (1/2, t, x) ∈ Q^n_r,

but in most cases it is better to bound the square root with a quadratic cone:

t′ ≥ √(x_1^2 + · · · + x_n^2)  ⟺  (t′, x) ∈ Q^n.

The latter has slightly better numerical properties: t′ is roughly comparable with x and is measured on the same scale, while t grows as the square of x which will sooner lead to numerical instabilities.

Exponential cone

Using the function e^x for x ≤ −30 or x ≥ 30 would be rather questionable if e^x is supposed to represent any realistic value. Note that e^{30} ≈ 10^{13} and that e^{30.0000001} − e^{30} ≈ 10^6, so a


tiny perturbation of the exponent produces a huge change of the result. Ideally, x should have a single-digit absolute value. For similar reasons, it is even more important to have well-scaled data. For instance, in floating point arithmetic we have the "equality"

e^{20} + e^{−20} = e^{20}

since the smaller summand disappears in comparison to the other one, 10^{17} times bigger.

For the same reason it is advised to replace explicit inequalities involving exp(x) with log-sum-exp variants (see Sec. 5.2.6). For example, suppose we have a constraint such as

t ≥ e^{x_1} + e^{x_2} + e^{x_3}.

Then after a substitution t = e^{t′} we have

1 ≥ e^{x_1−t′} + e^{x_2−t′} + e^{x_3−t′},

which has the advantage that t′ is of the same order of magnitude as x_i, and that the exponents x_i − t′ are likely to have much smaller absolute values than simply x_i.
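These scale effects are easy to demonstrate in double precision. The short numpy sketch below (our addition) shows the absorbed summand, and a shifted evaluation of log-sum-exp, which is the same rescaling idea as the substitution t = e^{t′}:

import numpy as np

print(np.exp(20.0) + np.exp(-20.0) == np.exp(20.0))   # True: the small term is lost

def logsumexp(x):
    # Shift by max(x) before exponentiating, exactly the t = exp(t') rescaling.
    x = np.asarray(x, dtype=float)
    m = x.max()
    return m + np.log(np.sum(np.exp(x - m)))

x = np.array([710.0, 709.0, 1.0])
print(np.log(np.sum(np.exp(x))))   # inf: exp(710) overflows (numpy warns)
print(logsumexp(x))                # finite, approximately 710.31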

Power cone

The power cone is not reliable when one of the exponents is very small. For example, consider the function f(x) = x^{0.01}, which behaves almost like the indicator function of x > 0 in the sense that

f(0) = 0,  and  f(x) > 0.5 for x > 10^{−30}.

Now suppose x = 10^{−20}. Is the constraint f(x) > 0.5 satisfied? In principle yes, but in practice x could have been obtained as a solution to an optimization problem and it may in fact represent 0 up to numerical error. The function f(x) is sensitive to changes in x well below standard numerical accuracy. The point x = 0 does not satisfy f(x) > 0.5 but it is only 10^{−30} from doing so.

7.5 Semidefinite variables

Special care should be given to models involving semidefinite matrix variables (see Sec. 6), otherwise it is easy to produce an unnecessarily inefficient model. The general rule of thumb is:

• having many small matrix variables is more efficient than one big matrix variable,

• efficiency drops with a growing number of semidefinite terms involved in linear constraints. This can have a much bigger impact on the solution time than increasing the dimension of the semidefinite variable.

Let us consider a few examples.


Block matrices

Given two matrix variables X, Y ⪰ 0 do not assemble them in a block matrix and write the constraints as

[ X   0 ]
[ 0   Y ] ⪰ 0.

This increases the dimension of the problem and, even worse, introduces unnecessary constraints for a large portion of entries of the block matrix.

Schur complement

Suppose we want to model a relaxation of a rank-one constraint:

[ X     x ]
[ x^T   1 ] ⪰ 0,

where x ∈ R^n and X ∈ S^n_+. The correct way to do this is to set up a matrix variable Y ∈ S^{n+1}_+ with only a linear number of constraints:

Y_{i,n+1} = x_i,  i = 1, . . . , n,
Y_{n+1,n+1} = 1,
Y ⪰ 0,

and use the upper-left n × n part of Y as the original X. Going the other way around, i.e. starting with a variable X and aligning it with the corner of another, bigger semidefinite matrix Y introduces n(n + 1)/2 equality constraints and will quickly have formidable solution times.

Sparse LMIs

Suppose we want to model a problem with a sparse linear matrix inequality (see Sec. 6.2.1) such as:

minimize    c^T x
subject to  A_0 + Σ_{i=1}^k A_i x_i ⪰ 0,

where x ∈ R^k and A_i are symmetric n × n matrices. The representation of this problem in primal form is:

minimize    c^T x
subject to  A_0 + Σ_{i=1}^k A_i x_i = X,
            X ⪰ 0,

and the linear constraint requires a full set of n(n + 1)/2 equalities, many of which are just X_{k,l} = 0, regardless of the sparsity of A_i. However the dual problem (see Sec. 8.6) is:

maximize    −⟨A_0, Z⟩
subject to  ⟨A_i, Z⟩ = c_i,  i = 1, . . . , k,
            Z ⪰ 0,


and the number of nonzeros in the linear constraints is just the joint number of nonzeros in the A_i. It means that large, sparse LMIs should almost always be dualized and entered in that form for efficiency reasons.

7.6 The quality of a solution

In this section we will discuss how to validate an obtained solution. Assume we have a conic model with continuous variables only and that the optimization software has reported an optimal primal and dual solution. Given such a solution, we might ask how to verify that it is indeed feasible and optimal.

To that end, consider a simple model

minimize    −x_2
subject to  x_1 + x_2 ≤ 3,
            x_2^2 ≤ 2,
            x_1, x_2 ≥ 0,                                                       (7.1)

where a solver might approximate the solution as

x_1 = 0.0000000000000000  and  x_2 = 1.4142135623730951

and therefore the approximate optimal objective value is

−1.4142135623730951.

The true objective value is −√2, so the approximate objective value is wrong by the amount

1.4142135623730951 − √2 ≈ 10^{−16}.

Most likely this difference is irrelevant for all practical purposes. Nevertheless, in general a solution obtained using floating point arithmetic is only an approximation. Most (if not all) commercial optimization software uses double precision floating point arithmetic, implying that about 16 digits are correct in the computations performed internally by the software. This also means that irrational numbers such as √2 and π can only be stored accurately within 16 digits.

Verifying feasibility

A good practice after solving an optimization problem is to evaluate the reported solution. At the very least this process should be carried out during the initial phase of building a model, or if the reported solution is unexpected in some way. The first step in that process is to check that the solution is feasible; in case of the small example (7.1) this amounts to checking that:

x_1 + x_2 = 0.0000000000000000 + 1.4142135623730951 ≤ 3,
x_2^2     = 2.0000000000000004 ≤ 2,   (?)
x_1       = 0.0000000000000000 ≥ 0,
x_2       = 1.4142135623730951 ≥ 0,


which demonstrates that one constraint is slightly violated due to computations in finite precision. It is up to the user to assess the significance of a specific violation; for example a violation of one unit in

x_1 + x_2 ≤ 1

may or may not be more significant than a violation of one unit in

x_1 + x_2 ≤ 10^9.

The right-hand side of 10^9 may itself be the result of a computation in finite precision, and may only be known with, say, 3 digits of accuracy. Therefore, a violation of 1 unit is not significant since the true right-hand side could just as well be 1.001·10^9. Certainly a violation of 4·10^{−16} as in our example is within the numerical tolerance for zero and we would accept the solution as feasible. In practice it would be standard to see violations of order 10^{−8}.

Verifying optimality

Another question is how to verify that a feasible approximate solution is actually optimal. This is answered by duality theory, which is discussed in Sec. 2.4 for linear problems and in Sec. 8 in full generality. Following Sec. 8 we can derive an equivalent version of the dual problem as:

maximize    −3y_1 − y_2 − y_3
subject to  2y_2 y_3 ≥ (y_1 + 1)^2,
            y_1 ≥ 0.

This problem has a feasible point (y_1, y_2, y_3) = (0, 1/√2, 1/√2) with objective value −√2, matching the objective value of the primal problem. By duality theory this proves that both the primal and dual solutions are optimal. Again, in finite precision the dual objective value may be reported as, for example,

−1.4142135623730950

leading to a duality gap of about 10^{−16} between the approximate primal and dual objective values. It is up to the user to assess the significance of any duality gap and accept or reject the solution as a sufficiently good approximation of the optimal value.
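This kind of post-solve check is a few lines of arithmetic. The sketch below (our addition) evaluates constraint violations and the primal-dual gap for (7.1), using the reported primal solution and the dual feasible point mentioned above:

import numpy as np

x1, x2 = 0.0, 1.4142135623730951                    # reported primal solution
y = np.array([0.0, 1 / np.sqrt(2), 1 / np.sqrt(2)]) # dual feasible point from the text

violations = {
    "x1 + x2 <= 3": max(0.0, x1 + x2 - 3),
    "x2^2 <= 2":    max(0.0, x2**2 - 2),
    "x1 >= 0":      max(0.0, -x1),
    "x2 >= 0":      max(0.0, -x2),
}
primal_obj = -x2
dual_obj = -3 * y[0] - y[1] - y[2]
print(violations)
print("duality gap:", primal_obj - dual_obj)        # essentially zero (about 1e-16)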

7.7 Distance to a cone

The violation of a linear constraint a^T x ≤ b under a given solution x* is obviously

max(0, a^T x* − b).


It is less obvious how to assess violations of conic constraints. To this end suppose we have a convex cone K. The (Euclidean) projection of a point x onto K is defined as the solution of the problem

minimize    ‖p − x‖_2
subject to  p ∈ K.                                                              (7.2)

We denote the projection by proj_K(x). The distance of x to K,

dist(x, K) = ‖x − proj_K(x)‖_2,

i.e. the objective value of (7.2), measures the violation of a conic constraint involving K. Obviously

dist(x, K) = 0  ⟺  x ∈ K.

This distance measure is attractive because it depends only on the set K and not on any particular representation of K using (in)equalities.

Example 7.9 (Surprisingly small distance). Let K = P_3^{0.1,0.9} be the power cone defined by the inequality x^{0.1} y^{0.9} ≥ |z|. The point

    x^* = (x, y, z) = (0, 10000, 500)

is clearly not in the cone, in fact x^{0.1} y^{0.9} = 0 while |z| = 500, so the violation of the conic inequality is seemingly large. However, the point

    p = (10^{-8}, 10000, 500)

belongs to P_3^{0.1,0.9} (since (10^{-8})^{0.1} · 10000^{0.9} ≈ 630), hence

    dist(x^*, P_3^{0.1,0.9}) ≤ ‖x^* − p‖_2 = 10^{-8}.

Therefore the distance of x^* to the cone is actually very small.
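This bound is easy to confirm numerically; the short Python sketch below (the perturbation 10^{-8} is taken directly from Example 7.9, everything else is standard arithmetic) checks that the perturbed point satisfies the power cone inequality and measures its distance to x^*:

```python
import math

# Point reportedly violating the power cone P_3^{0.1,0.9}: x^0.1 * y^0.9 >= |z|
x_star = (0.0, 10000.0, 500.0)

# Nearby point obtained by nudging the first coordinate to 1e-8
p = (1e-8, 10000.0, 500.0)

lhs = p[0] ** 0.1 * p[1] ** 0.9          # approximately 631
print("cone inequality holds:", lhs >= abs(p[2]))

dist_bound = math.dist(x_star, p)        # Euclidean distance, equals 1e-8
print("distance bound:", dist_bound)
```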

Distance to certain cones

For some basic cone types the projection problem (7.2) can be solved analytically. Denoting [x]_+ = max(0, x) we have that:

• For x ∈ R the projection onto the nonnegative half-line is proj_{R_+}(x) = [x]_+.

• For (t, x) ∈ R × R^{n−1} the projection onto the quadratic cone is

    proj_{Q^n}(t, x) =  (t, x)                               if t ≥ ‖x‖_2,
                        (1/2)(t/‖x‖_2 + 1)(‖x‖_2, x)         if −‖x‖_2 < t < ‖x‖_2,
                        0                                    if t ≤ −‖x‖_2.


โ€ข For ๐‘‹ =โˆ‘๏ธ€๐‘›

๐‘–=1 ๐œ†๐‘–๐‘ž๐‘–๐‘ž๐‘‡๐‘– the projection onto the semidefinite cone is

proj๐’ฎ๐‘›+

(๐‘‹) =๐‘›โˆ‘๏ธ

๐‘–=1

[๐œ†๐‘–]+๐‘ž๐‘–๐‘ž๐‘‡๐‘– .


Chapter 8

Duality in conic optimization

In Sec. 2 we introduced duality and related concepts for linear optimization. Here we present a more general version of this theory for conic optimization and we illustrate it with examples. Although this chapter is self-contained, we recommend familiarity with Sec. 2, of which some of this material is a natural extension.

Duality theory is a rich and powerful area of convex optimization, and central to understanding sensitivity analysis and infeasibility issues. Furthermore, it provides a simple and systematic way of obtaining non-trivial lower bounds on the optimal value for many difficult non-convex problems.

From now on we consider a conic problem in standard form

minimize ๐‘๐‘‡๐‘ฅsubject to ๐ด๐‘ฅ = ๐‘,

๐‘ฅ โˆˆ ๐พ,(8.1)

where ๐พ is a convex cone. In practice ๐พ would likely be defined as a product

๐พ = ๐พ1 ร— ยท ยท ยท ร—๐พ๐‘š

of smaller cones corresponding to actual constraints in the problem formulation. The abstractformulation (8.1) encompasses such special cases as ๐พ = R๐‘›

+ (linear optimization), ๐พ๐‘– = R(unconstrained variable) or ๐พ๐‘– = ๐’ฎ๐‘›

+ (12๐‘›(๐‘› + 1) variables representing the upper-triangular

part of a matrix).The feasible set for (8.1):

โ„ฑ๐‘ = {๐‘ฅ โˆˆ R๐‘› | ๐ด๐‘ฅ = ๐‘} โˆฉ๐พ

is a section of ๐พ. We say the problem is feasible if โ„ฑ๐‘ ฬธ= โˆ… and infeasible otherwise. Thevalue of (8.1) is defined as

๐‘โ‹† = inf{๐‘๐‘‡๐‘ฅ : ๐‘ฅ โˆˆ โ„ฑ๐‘},

allowing the special cases of ๐‘โ‹† = +โˆž (the problem is infeasible) and ๐‘โ‹† = โˆ’โˆž (the problemis unbounded). Note that ๐‘โ‹† may not be attained, i.e. the infimum is not necessarily aminimum, although this sort of problems are ill-posed from a practical point of view (seeSec. 7).


Example 8.1 (Unattained problem value). The conic quadratic problem

minimize ๐‘ฅsubject to (๐‘ฅ, ๐‘ฆ, 1) โˆˆ ๐’ฌ3

๐‘Ÿ,

with constraint equivalent to 2๐‘ฅ๐‘ฆ โ‰ฅ 1, ๐‘ฅ, ๐‘ฆ โ‰ฅ 0 has ๐‘โ‹† = 0 but this optimal value is notattained by any point in โ„ฑ๐‘.

8.1 Dual cone

Suppose ๐พ โŠ† R๐‘› is a closed convex cone. We define the dual cone ๐พ* as

๐พ* = {๐‘ฆ โˆˆ R๐‘› : ๐‘ฆ๐‘‡๐‘ฅ โ‰ฅ 0 โˆ€๐‘ฅ โˆˆ ๐พ}. (8.2)

For simplicity we write ๐‘ฆ๐‘‡๐‘ฅ, denoting the standard Euclidean inner product, but everythingin this section applies verbatim to the inner product of matrices in the semidefinite context.In other words ๐พ* consists of vectors which form an acute (or right) angle with every vectorin ๐พ. We see easily that ๐พ* is in fact a convex cone. The main example, associated withlinear optimization, is the dual of the positive orthant:

Lemma 8.1 (Dual of linear cone).

    (R_+^n)^* = R_+^n.

Proof. This is obvious for n = 1, and then we use the simple fact

    (K_1 × ··· × K_m)^* = K_1^* × ··· × K_m^*,

which we invite the reader to prove.

All cones studied in this cookbook can be explicitly dualized as we show next.

Lemma 8.2 (Duals of nonlinear cones). We have the following:

• The quadratic, rotated quadratic and semidefinite cones are self-dual:

    (Q^n)^* = Q^n,   (Q_r^n)^* = Q_r^n,   (S_+^n)^* = S_+^n.

• The dual of the power cone (4.4) is

    (P_n^{α_1,...,α_m})^* = { y ∈ R^n : (y_1/α_1, ..., y_m/α_m, y_{m+1}, ..., y_n) ∈ P_n^{α_1,...,α_m} }.

• The dual of the exponential cone (5.2) is the closure

    (K_exp)^* = cl{ y ∈ R^3 : y_1 ≥ −y_3 e^{y_2/y_3 − 1}, y_1 > 0, y_3 < 0 }.


Proof. We only sketch some parts of the proof and indicate geometric intuitions. First, let us show (Q^n)^* = Q^n. For (t, x), (s, y) ∈ Q^n we have

    st + y^T x ≥ ‖y‖_2 ‖x‖_2 + y^T x ≥ 0

by the Cauchy-Schwarz inequality |u^T v| ≤ ‖u‖_2 ‖v‖_2, therefore Q^n ⊆ (Q^n)^*. Geometrically we are just saying that the quadratic cone Q^n has a right angle at the apex, so any two vectors inside the cone form at most a right angle. Now suppose (s, y) is a point outside Q^n. If (−s, −y) ∈ Q^n then −s^2 − y^T y < 0 and we showed (s, y) ∉ (Q^n)^*. Otherwise let (t, x) = proj_{Q^n}(s, y) be the projection (see Sec. 7.7); note that (s, y) and (t, −x) ∈ Q^n form an obtuse angle.

For the semidefinite cone we use the property ⟨X, Y⟩ ≥ 0 for X, Y ∈ S_+^n (see Sec. 6.1.2). Conversely, assume that Z ∉ S_+^n. Then there exists a w satisfying ⟨ww^T, Z⟩ = w^T Z w < 0, so Z ∉ (S_+^n)^*.

As our last exercise let us check that vectors in K_exp and (K_exp)^* form acute angles. By definition of K_exp and (K_exp)^* we take x, y such that:

    x_1 ≥ x_2 exp(x_3/x_2),   y_1 ≥ −y_3 exp(y_2/y_3 − 1),   x_1, x_2, y_1 > 0,   y_3 < 0,

and then we have

    y^T x = x_1 y_1 + x_2 y_2 + x_3 y_3 ≥ −x_2 y_3 exp(x_3/x_2 + y_2/y_3 − 1) + x_2 y_2 + x_3 y_3
          ≥ −x_2 y_3 (x_3/x_2 + y_2/y_3) + x_2 y_2 + x_3 y_3 = 0,

using the inequality exp(t) ≥ 1 + t. We refer to the literature for full proofs of all the statements.

Finally it is nice to realize that (K^*)^* = K and that the cones {0} and R are each other's duals. We leave it to the reader to check these facts.

8.2 Infeasibility in conic optimization

We can now discuss infeasibility certificates for conic problems. Given an optimization problem, the first basic question we are interested in is its feasibility status. The theory of infeasibility certificates for linear problems, including the Farkas lemma (see Sec. 2.3), extends almost verbatim to the conic case.

Lemma 8.3 (Farkas' lemma, conic version). Suppose we have the conic optimization problem (8.1). Exactly one of the following statements is true:

1. Problem (8.1) is feasible.

2. Problem (8.1) is infeasible, but there is a sequence x_n ∈ K such that ‖Ax_n − b‖ → 0.

3. There exists y such that −A^T y ∈ K^* and b^T y > 0.


Problems which fall under option 2. (limit-feasible) are ill-posed: an arbitrarily small perturbation of input data puts the problem in either category 1. or 3. This fringe case should therefore not appear in practice, and if it does, it signals issues with the optimization model which should be addressed.

Example 8.2 (Limit-feasible model). Here is an example of an ill-posed limit-feasible model created by fixing one of the root variables of a rotated quadratic cone to 0.

    minimize    u
    subject to  (u, v, w) ∈ Q_r^3,
                v = 0,
                w ≥ 1.

The problem is clearly infeasible, but the sequence (u_n, v_n, w_n) = (n, 1/n, 1) ∈ Q_r^3 with (v_n, w_n) → (0, 1) makes it limit-feasible as in alternative 2. of Lemma 8.3. There is no infeasibility certificate as in alternative 3.

Having cleared this detail we continue with the proof and example for the actually useful part of conic Farkas' lemma.

Proof. Consider the set

๐ด(๐พ) = {๐ด๐‘ฅ : ๐‘ฅ โˆˆ ๐พ}.

It is a convex cone. Feasibility is equivalent to ๐‘ โˆˆ ๐ด(๐พ). If ๐‘ ฬธโˆˆ ๐ด(๐พ) but ๐‘ โˆˆ cl(๐ด(๐พ))then we have the second alternative. Finally, if ๐‘ ฬธโˆˆ cl(๐ด(๐พ)) then the closed convex conecl(๐ด(๐พ)) and the point ๐‘ can be strictly separated by a hyperplane passing through theorigin, i.e. there exists ๐‘ฆ such that

๐‘๐‘‡๐‘ฆ > 0, (๐ด๐‘ฅ)๐‘‡๐‘ฆ โ‰ค 0 โˆ€๐‘ฅ โˆˆ ๐พ.

But then 0 โ‰ค โˆ’(๐ด๐‘ฅ)๐‘‡๐‘ฆ = (โˆ’๐ด๐‘‡๐‘ฆ)๐‘‡๐‘ฅ for all ๐‘ฅ โˆˆ ๐พ, so by definition of ๐พ* we have โˆ’๐ด๐‘‡๐‘ฆ โˆˆ๐พ*, and we showed ๐‘ฆ satisfies the third alternative. Finally, 1. and 3. are mutually exclusive,since otherwise we would have

0 โ‰ค (โˆ’๐ด๐‘‡๐‘ฆ)๐‘‡๐‘ฅ = โˆ’๐‘ฆ๐‘‡๐ด๐‘ฅ = โˆ’๐‘ฆ๐‘‡ ๐‘ < 0

and the same argument works in the limit if ๐‘ฅ๐‘› is a sequence as in 2. That proves thelemma.

Therefore Farkasโ€™ lemma implies that (up to certain ill-posed cases) either the problem(8.1) is feasible (first alternative) or there is a certificate of infeasibility ๐‘ฆ (last alternative).In other words, every time we classify a model as infeasible, we can certify this fact byproviding an appropriate ๐‘ฆ. Note that when ๐พ = R๐‘›

+ we recover precisely the linear version,Lemma 2.1.


Example 8.3 (Infeasible conic problem). Consider a minimization problem:

minimize ๐‘ฅ1

subject to โˆ’๐‘ฅ1 + ๐‘ฅ2 โˆ’ ๐‘ฅ4 = 1,2๐‘ฅ3 โˆ’ 3๐‘ฅ4 = โˆ’1,

๐‘ฅ1 โ‰ฅโˆš๏ธ€

๐‘ฅ22 + ๐‘ฅ2

3,๐‘ฅ4 โ‰ฅ 0.

It can be expressed in the standard form (8.1) with

๐ด =

[๏ธ‚โˆ’1 1 0 โˆ’10 0 2 โˆ’3

]๏ธ‚, ๐‘ =

[๏ธ‚1โˆ’1

]๏ธ‚, ๐พ = ๐’ฌ3 ร— R+, ๐‘ = [1, 0, 0, 0]๐‘‡ .

A certificate of infeasibility is ๐‘ฆ = [1, 0]๐‘‡ . Indeed, ๐‘๐‘‡๐‘ฆ = 1 > 0 and โˆ’๐ด๐‘‡๐‘ฆ = [1,โˆ’1, 0, 1] โˆˆ๐พ* = ๐’ฌ3 ร— R+. The certificate indicates that the first linear constraint alone causesinfeasibility, which is indeed the case: the first equality together with the conic constraintsyield a contradiction:

๐‘ฅ2 โˆ’ 1 โ‰ฅ ๐‘ฅ2 โˆ’ 1 โˆ’ ๐‘ฅ4 = ๐‘ฅ1 โ‰ฅ |๐‘ฅ2|.
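Checking such a certificate requires only elementary linear algebra. The following numpy sketch verifies the two conditions of Lemma 8.3 for the data of Example 8.3 (the membership test for K^* = Q^3 × R_+ is written out by hand):

```python
import numpy as np

A = np.array([[-1.0, 1.0, 0.0, -1.0],
              [ 0.0, 0.0, 2.0, -3.0]])
b = np.array([1.0, -1.0])
y = np.array([1.0, 0.0])           # candidate Farkas certificate

s = -A.T @ y                       # should lie in K^* = Q^3 x R_+
in_quadratic_cone = s[0] >= np.linalg.norm(s[1:3])
in_nonneg_halfline = s[3] >= 0

print("b^T y > 0:", b @ y > 0)                                       # True
print("-A^T y in K^*:", in_quadratic_cone and in_nonneg_halfline)    # True
```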

8.3 Lagrangian and the dual problem

Classical Lagrangian

In general constrained optimization we consider an optimization problem of the form

minimize ๐‘“0(๐‘ฅ)subject to ๐‘“(๐‘ฅ) โ‰ค 0,

โ„Ž(๐‘ฅ) = 0,(8.3)

where ๐‘“0 : R๐‘› โ†ฆโ†’ R is the objective function, ๐‘“ : R๐‘› โ†ฆโ†’ R๐‘š encodes inequality constraints,and โ„Ž : R๐‘› โ†ฆโ†’ R๐‘ encodes equality constraints. Readers familiar with the method of Lagrangemultipliers or penalization will recognize the Lagrangian for (8.3), a function ๐ฟ : R๐‘› ร—R๐‘š ร—R๐‘ โ†ฆโ†’ R that augments the objective with a weighted combination of all the constraints,

๐ฟ(๐‘ฅ, ๐‘ฆ, ๐‘ ) = ๐‘“0(๐‘ฅ) + ๐‘ฆ๐‘‡โ„Ž(๐‘ฅ) + ๐‘ ๐‘‡๐‘“(๐‘ฅ). (8.4)

The variables y ∈ R^p and s ∈ R_+^m are called Lagrange multipliers or dual variables. The Lagrangian has the property that L(x, y, s) ≤ f_0(x) whenever x is feasible for (8.3) and s ∈ R_+^m. The optimal point satisfies the first-order optimality condition ∇_x L(x, y, s) = 0. Moreover, the dual function g(y, s) = inf_x L(x, y, s) provides a lower bound for the optimal value of (8.3) for any y ∈ R^p and s ∈ R_+^m, which leads to considering the dual problem of maximizing g(y, s).


Lagrangian for a conic problem

We next set up an analogue of the Lagrangian theory for the conic problem (8.1)

minimize ๐‘๐‘‡๐‘ฅsubject to ๐ด๐‘ฅ = ๐‘,

๐‘ฅ โˆˆ ๐พ,

where ๐‘ฅ, ๐‘ โˆˆ R๐‘›, ๐‘ โˆˆ R๐‘š, ๐ด โˆˆ R๐‘šร—๐‘› and ๐พ โŠ† R๐‘› is a convex cone. We associate with (8.1)a Lagrangian of the form ๐ฟ : R๐‘› ร— R๐‘š ร—๐พ* โ†’ R

๐ฟ(๐‘ฅ, ๐‘ฆ, ๐‘ ) = ๐‘๐‘‡๐‘ฅ + ๐‘ฆ๐‘‡ (๐‘โˆ’ ๐ด๐‘ฅ) โˆ’ ๐‘ ๐‘‡๐‘ฅ.

For any feasible ๐‘ฅ* โˆˆ โ„ฑ๐‘ and any (๐‘ฆ*, ๐‘ *) โˆˆ R๐‘š ร—๐พ* we have

๐ฟ(๐‘ฅ*, ๐‘ฆ*, ๐‘ *) = ๐‘๐‘‡๐‘ฅ* + (๐‘ฆ*)๐‘‡ ยท 0 โˆ’ (๐‘ *)๐‘‡๐‘ฅ* โ‰ค ๐‘๐‘‡๐‘ฅ*. (8.5)

Note that we used the definition of the dual cone to conclude that (s^*)^T x^* ≥ 0. The dual function is defined as the minimum of L(x, y, s) over x. Thus the dual function of (8.1) is

๐‘”(๐‘ฆ, ๐‘ ) = min๐‘ฅ

๐ฟ(๐‘ฅ, ๐‘ฆ, ๐‘ ) = min๐‘ฅ

๐‘ฅ๐‘‡ (๐‘โˆ’ ๐ด๐‘‡๐‘ฆ โˆ’ ๐‘ ) + ๐‘๐‘‡๐‘ฆ =

{๏ธ‚๐‘๐‘‡๐‘ฆ, ๐‘โˆ’ ๐ด๐‘‡๐‘ฆ โˆ’ ๐‘  = 0,โˆ’โˆž, otherwise.

Dual problem

From (8.5) every (y, s) ∈ R^m × K^* satisfies g(y, s) ≤ p^⋆, i.e. g(y, s) is a lower bound for p^⋆. To get the best such bound we maximize g(y, s) over all (y, s) ∈ R^m × K^* and get the dual problem:

    maximize    b^T y
    subject to  c − A^T y = s,
                s ∈ K^*,        (8.6)

or simply:

    maximize    b^T y
    subject to  c − A^T y ∈ K^*.        (8.7)

The optimal value of (8.6) will be denoted d^⋆. As in the case of (8.1) (which from now on we call the primal problem), the dual problem can be infeasible (d^⋆ = −∞), have an optimal solution (−∞ < d^⋆ < +∞) or be unbounded (d^⋆ = +∞). As before, the value d^⋆ is defined as a supremum of b^T y and may not be attained. Note that the roles of −∞ and +∞ are now reversed because the dual is a maximization problem.

Example 8.4 (More general constraints). We can just as easily derive the dual of a problem with more general constraints, without necessarily having to transform the problem to


standard form beforehand. Imagine, for example, that some solver accepts conic problems of the form:

    minimize    c^T x + c^f
    subject to  l^c ≤ Ax ≤ u^c,
                l^x ≤ x ≤ u^x,
                x ∈ K.        (8.8)

Then we define a Lagrangian with one set of dual variables for each constraint appearing in the problem:

๐ฟ(๐‘ฅ, ๐‘ ๐‘๐‘™ , ๐‘ ๐‘๐‘ข, ๐‘ 

๐‘ฅ๐‘™ , ๐‘ 

๐‘ฅ๐‘ข, ๐‘ 

๐‘ฅ๐‘›) = ๐‘๐‘‡๐‘ฅ + ๐‘๐‘“ โˆ’ (๐‘ ๐‘๐‘™ )

๐‘‡ (๐ด๐‘ฅโˆ’ ๐‘™๐‘) โˆ’ (๐‘ ๐‘๐‘ข)๐‘‡ (๐‘ข๐‘ โˆ’ ๐ด๐‘ฅ)โˆ’(๐‘ ๐‘ฅ๐‘™ )๐‘‡ (๐‘ฅโˆ’ ๐‘™๐‘ฅ) โˆ’ (๐‘ ๐‘ฅ๐‘ข)๐‘‡ (๐‘ข๐‘ฅ โˆ’ ๐‘ฅ) โˆ’ (๐‘ ๐‘ฅ๐‘›)๐‘‡๐‘ฅ

= ๐‘ฅ๐‘‡ (๐‘โˆ’ ๐ด๐‘‡ ๐‘ ๐‘๐‘™ + ๐ด๐‘‡ ๐‘ ๐‘๐‘ข โˆ’ ๐‘ ๐‘ฅ๐‘™ + ๐‘ ๐‘ฅ๐‘ข โˆ’ ๐‘ ๐‘ฅ๐‘›)+(๐‘™๐‘)๐‘‡ ๐‘ ๐‘๐‘™ โˆ’ (๐‘ข๐‘)๐‘‡ ๐‘ ๐‘๐‘ข + (๐‘™๐‘ฅ)๐‘‡ ๐‘ ๐‘ฅ๐‘™ โˆ’ (๐‘ข๐‘)๐‘‡ ๐‘ ๐‘ฅ๐‘ข + ๐‘๐‘“

and that gives a dual problem

maximize (๐‘™๐‘)๐‘‡ ๐‘ ๐‘๐‘™ โˆ’ (๐‘ข๐‘)๐‘‡ ๐‘ ๐‘๐‘ข + (๐‘™๐‘ฅ)๐‘‡ ๐‘ ๐‘ฅ๐‘™ โˆ’ (๐‘ข๐‘)๐‘‡ ๐‘ ๐‘ฅ๐‘ข + ๐‘๐‘“

subject to ๐‘ + ๐ด๐‘‡ (โˆ’๐‘ ๐‘๐‘™ + ๐‘ ๐‘๐‘ข) โˆ’ ๐‘ ๐‘ฅ๐‘™ + ๐‘ ๐‘ฅ๐‘ข โˆ’ ๐‘ ๐‘ฅ๐‘› = 0,๐‘ ๐‘๐‘™ , ๐‘ 

๐‘๐‘ข, ๐‘ 

๐‘ฅ๐‘™ , ๐‘ 

๐‘ฅ๐‘ข โ‰ฅ 0,

๐‘ ๐‘ฅ๐‘› โˆˆ ๐พ*.

Example 8.5 (Dual of simple portfolio). Consider a simplified version of the portfolio optimization problem, where we maximize expected return subject to an upper bound on the risk and no other constraints:

    maximize    μ^T x
    subject to  ‖Fx‖_2 ≤ γ,

where x ∈ R^n and F ∈ R^{m×n}. The conic formulation is

    maximize    μ^T x
    subject to  (γ, Fx) ∈ Q^{m+1},        (8.9)

and we can directly write the Lagrangian

๐ฟ(๐‘ฅ, ๐‘ฃ, ๐‘ค) = ๐œ‡๐‘‡๐‘ฅ + ๐‘ฃ๐›พ + ๐‘ค๐‘‡๐น๐‘ฅ = ๐‘ฅ๐‘‡ (๐œ‡ + ๐น ๐‘‡๐‘ค) + ๐‘ฃ๐›พ

with (๐‘ฃ, ๐‘ค) โˆˆ (๐’ฌ๐‘š+1)* = ๐’ฌ๐‘š+1. Note that we chose signs to have ๐ฟ(๐‘ฅ, ๐‘ฃ, ๐‘ค) โ‰ฅ ๐œ‡๐‘‡๐‘ฅsince we are dealing with a maximization problem. The dual is now determined by theconditions ๐น ๐‘‡๐‘ค + ๐œ‡ = 0 and ๐‘ฃ โ‰ฅ โ€–๐‘คโ€–2, so it can be formulated as

minimize ๐›พโ€–๐‘คโ€–2subject to ๐น ๐‘‡๐‘ค = โˆ’๐œ‡.

(8.10)


Note that it is actually more natural to view problem (8.9) as the dual form and problem (8.10) as the primal. Indeed we can write the constraint in (8.9) as

    [γ; 0] − [0; −F] x ∈ (Q^{m+1})^*

which fits naturally into the form (8.7). Having done this we can recover the dual as a minimization problem in the standard form (8.1). We leave it to the reader to check that we get the same answer as above.

8.4 Weak and strong duality

Weak duality

Suppose ๐‘ฅ* and (๐‘ฆ*, ๐‘ *) are feasible points for the primal and dual problems (8.1) and(8.6), respectively. Then we have

๐‘๐‘‡๐‘ฆ* = (๐ด๐‘ฅ*)๐‘‡๐‘ฆ* = (๐‘ฅ*)๐‘‡ (๐ด๐‘‡๐‘ฆ*) = (๐‘ฅ*)๐‘‡ (๐‘โˆ’ ๐‘ *) = ๐‘๐‘‡๐‘ฅ* โˆ’ (๐‘ *)๐‘‡๐‘ฅ* โ‰ค ๐‘๐‘‡๐‘ฅ* (8.11)

so the dual objective value is a lower bound on the objective value of the primal. In particular,any dual feasible point (๐‘ฆ*, ๐‘ *) gives a lower bound:

๐‘๐‘‡๐‘ฆ* โ‰ค ๐‘โ‹†

and we immediately get the next lemma.

Lemma 8.4 (Weak duality). d^⋆ ≤ p^⋆.

It follows that if b^T y^* = c^T x^* then x^* is optimal for the primal, (y^*, s^*) is optimal for the dual and b^T y^* = c^T x^* is the common optimal objective value. This way we can use the optimal dual solution to certify optimality of the primal solution and vice versa.
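The chain of (in)equalities in (8.11) also gives a convenient numerical sanity check on a solver's primal-dual answer. A small numpy sketch with K = R_+^n (the random data is only an illustration; the points are constructed to be feasible by design):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 6
A = rng.standard_normal((m, n))

x = rng.random(n)                 # primal point in K = R_+^n
b = A @ x                         # choose b so that Ax = b holds exactly
y = rng.standard_normal(m)
s = rng.random(n)                 # dual cone point, s in K^* = R_+^n
c = A.T @ y + s                   # choose c so that c - A^T y = s

gap = c @ x - b @ y               # equals s^T x by (8.11)
print("duality gap c^T x - b^T y =", gap)
print("equals s^T x:", np.isclose(gap, s @ x))
print("nonnegative:", gap >= 0)
```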

Complementary slackness

Moreover, (8.11) asserts that b^T y^* = c^T x^* is equivalent to orthogonality

    (s^*)^T x^* = 0

i.e. complementary slackness. It is not hard to verify what complementary slackness means for particular types of cones, for example

• for s, x ∈ R_+ we have sx = 0 iff s = 0 or x = 0,

• vectors (s_1, s̃), (x_1, x̃) ∈ Q^{n+1} are orthogonal iff (s_1, s̃) and (x_1, −x̃) are parallel,

• vectors (s_1, s_2, s̃), (x_1, x_2, x̃) ∈ Q_r^{n+2} are orthogonal iff (s_1, s_2, s̃) and (x_2, x_1, −x̃) are parallel.


One implicit case is worth special attention: complementary slackness for a linear inequality constraint a^T x ≤ b with a non-negative dual variable y asserts that (a^T x^* − b) y^* = 0. This can be seen by directly writing down the appropriate Lagrangian for this type of constraint. Alternatively, we can introduce a slack variable u = b − a^T x with a conic constraint u ≥ 0 and let y be the dual conic variable. In particular, if a constraint is non-binding in the optimal solution (a^T x^* < b) then the corresponding dual variable y^* = 0. If y^* > 0 then it can be related to a shadow price, see Sec. 2.4.4 and Sec. 8.5.2.

Strong duality

The obvious question is now whether d^⋆ = p^⋆, that is if optimality of a primal solution can always be certified by a dual solution with matching objective value, as for linear programming. This turns out not to be the case for general conic problems.

Example 8.6 (Positive duality gap). Consider the problem

minimize ๐‘ฅ3

subject to ๐‘ฅ1 โ‰ฅโˆš๏ธ€๐‘ฅ22 + ๐‘ฅ2

3,๐‘ฅ2 โ‰ฅ ๐‘ฅ1, ๐‘ฅ3 โ‰ฅ โˆ’1.

The only feasible points are (๐‘ฅ, ๐‘ฅ, 0), so ๐‘โ‹† = 0. The dual problem is

maximize โˆ’๐‘ฆ2subject to ๐‘ฆ1 โ‰ฅ

โˆš๏ธ€๐‘ฆ21 + (1 โˆ’ ๐‘ฆ2)2,

with feasible points (๐‘ฆ, 1), hence ๐‘‘โ‹† = โˆ’1.Similarly, we consider a problem

minimize ๐‘ฅ1

subject to

โŽกโŽฃ 0 ๐‘ฅ1 0๐‘ฅ1 ๐‘ฅ2 00 0 1 + ๐‘ฅ1

โŽคโŽฆ โˆˆ ๐’ฎ3+,

with feasible set {๐‘ฅ1 = 0, ๐‘ฅ2 โ‰ฅ 0} and optimal value ๐‘โ‹† = 0. The dual problem can beformulated as

maximize โˆ’๐‘ง2

subject to

โŽกโŽฃ ๐‘ง1 (1 โˆ’ ๐‘ง2)/2 0(1 โˆ’ ๐‘ง2)/2 0 0

0 0 ๐‘ง2

โŽคโŽฆ โˆˆ ๐’ฎ3+,

which has a feasible set {๐‘ง1 โ‰ฅ 0, ๐‘ง2 = 1} and dual optimal value ๐‘‘โ‹† = โˆ’1.

To ensure strong duality for conic problems we need an additional regularity assumption. As with the conic version of Farkas' lemma, Lemma 8.3, we stress that this is a technical condition to eliminate ill-posed problems which should not be formed in practice. In particular,


we invite the reader to think that strong duality holds for all well formed conic problems one is likely to come across in applications, and that having a duality gap signals issues with the model formulation.

We say problem (8.1) is very nicely posed if for all values of c_0 the feasibility problem

    c^T x = c_0,   Ax = b,   x ∈ K

satisfies either the first or third alternative in Lemma 8.3.

Lemma 8.5 (Strong duality). Suppose that (8.1) is very nicely posed and p^⋆ is finite. Then d^⋆ = p^⋆.

Proof. For any ε > 0 consider the feasibility problem with variable x and constraints

    [ −c^T ; A ] x = [ −p^⋆ + ε ; b ],   x ∈ K,

that is

    c^T x = p^⋆ − ε,   Ax = b,   x ∈ K.

Optimality of p^⋆ implies that the above problem is infeasible. By Lemma 8.3 and because we assumed very-nice-posedness there exists ŷ = [y_0, y]^T such that

    [c, −A^T] ŷ ∈ K^*   and   [−p^⋆ + ε, b^T] ŷ > 0.

If y_0 = 0 then −A^T y ∈ K^* and b^T y > 0, which by Lemma 8.3 again would mean that the original problem was infeasible, which is not the case. Hence we can rescale so that y_0 = 1 and then we get

    c − A^T y ∈ K^*   and   b^T y ≥ p^⋆ − ε.

The first constraint means that y is feasible for the dual problem. By letting ε → 0 we obtain d^⋆ ≥ p^⋆.

There are more direct conditions which guarantee strong duality, such as the one below.

Lemma 8.6 (Slater constraint qualification). Suppose that (8.1) is strictly feasible: there exists a point x ∈ int(K) in the interior of K such that Ax = b. Then strong duality holds if p^⋆ is finite. Moreover, if both primal and dual problem are strictly feasible then p^⋆ and d^⋆ are attained.

We omit the proof which can be found in standard texts. Note that the first problem from Example 8.6 does not satisfy the Slater constraint qualification: the only feasible points lie on the boundary of the cone (we say the problem has no interior). That problem is not very nicely posed either: the point (x_1, x_2, x_3) = (0.5 c_0^2 ε^{-1} + ε, 0.5 c_0^2 ε^{-1}, c_0) ∈ Q^3 violates the inequality x_2 ≥ x_1 by an arbitrarily small ε, so the problem is infeasible but limit-feasible (second alternative in Lemma 8.3).


8.5 Applications of conic duality

8.5.1 Linear regression and the normal equation

Least-squares linear regression is the problem of minimizing ‖Ax − b‖_2^2 over x ∈ R^n, where A ∈ R^{m×n} and b ∈ R^m are fixed. This problem can be posed in conic form as

    minimize    t
    subject to  (t, Ax − b) ∈ Q^{m+1},

and we can write the Lagrangian

๐ฟ(๐‘ก, ๐‘ฅ, ๐‘ข, ๐‘ฃ) = ๐‘กโˆ’ ๐‘ก๐‘ขโˆ’ ๐‘ฃ๐‘‡ (๐ด๐‘ฅโˆ’ ๐‘) = ๐‘ก(1 โˆ’ ๐‘ข) โˆ’ ๐‘ฅ๐‘‡๐ด๐‘‡๐‘ฃ + ๐‘๐‘‡๐‘ฃ

so the constraints in the dual problem are:

    u = 1,   A^T v = 0,   (u, v) ∈ Q^{m+1}.

The problem exhibits strong duality with both the primal and dual values attained in the optimal solution. The primal solution clearly satisfies t = ‖Ax − b‖_2, and so complementary slackness for the quadratic cone implies that the vectors (u, −v) = (1, −v) and (t, Ax − b) are parallel. As a consequence the constraint A^T v = 0 becomes A^T (Ax − b) = 0 or simply

    A^T A x = A^T b

so if A^T A is invertible then x = (A^T A)^{-1} A^T b. This is the so-called normal equation for least-squares regression, which we now obtained as a consequence of strong duality.

8.5.2 Constraint attribution

Consider again a portfolio optimization problem with mean-variance utility function, vector of expected returns α and covariance matrix Σ = F^T F:

    maximize    α^T x − (1/2) c x^T Σ x
    subject to  Ax ≤ b,        (8.12)

where the linear part represents any set of additional constraints: total budget, sector constraints, diversification constraints, individual relations between positions etc. In the absence of additional constraints the solution to the unconstrained maximization problem is easy to derive using basic calculus and equals

    x̂ = c^{-1} Σ^{-1} α.

We would like to understand the difference x^* − x̂, where x^* is the solution of (8.12), and in particular to measure which of the linear constraints actually cause x^* to deviate from x̂ and to what degree. This can be quantified using the dual variables.


The conic version of problem (8.12) is

maximize ๐›ผ๐‘‡๐‘ฅโˆ’ ๐‘๐‘Ÿsubject to ๐ด๐‘ฅ โ‰ค ๐‘,

(1, ๐‘Ÿ, ๐น๐‘ฅ) โˆˆ ๐’ฌr

with dual

minimize ๐‘๐‘‡๐‘ฆ + ๐‘ subject to ๐ด๐‘‡๐‘ฆ = ๐›ผ + ๐น ๐‘‡๐‘ข,

(๐‘ , ๐‘, ๐‘ข) โˆˆ ๐’ฌr ,๐‘ฆ โ‰ฅ 0.

Suppose we have a primal-dual optimal solution (๐‘ฅ*, ๐‘Ÿ*, ๐‘ฆ*, ๐‘ *, ๐‘ข*). Complementary slacknessfor the rotated quadratic cone implies

(๐‘ *, ๐‘, ๐‘ข*) = ๐›ฝ(๐‘Ÿ*, 1,โˆ’๐น๐‘ฅ*)

which leads to ๐›ฝ = ๐‘ and

๐ด๐‘‡๐‘ฆ* = ๐›ผโˆ’ ๐‘๐น ๐‘‡๐น๐‘ฅ*

or equivalently

    x^* = x̂ − c^{-1} Σ^{-1} A^T y^*.

This equation splits the difference between the constrained and unconstrained solutions into contributions from individual constraints, where the weights are precisely the dual variables y^*. For example, if a constraint is not binding (a_i^T x^* − b_i < 0) then by complementary slackness y_i^* = 0 and, indeed, a non-binding constraint has no effect on the change in solution.

8.6 Semidefinite duality and LMIs

The general theory of conic duality applies in particular to problems with semidefinite variables so here we just state it in the language familiar to SDP practitioners. Consider for simplicity a primal semidefinite optimization problem with one matrix variable

    minimize    ⟨C, X⟩
    subject to  ⟨A_i, X⟩ = b_i,   i = 1, ..., m,
                X ∈ S_+^n.        (8.13)

We can quickly repeat the derivation of the dual problem in this notation. The Lagrangian is

    L(X, y, S) = ⟨C, X⟩ − Σ_i y_i (⟨A_i, X⟩ − b_i) − ⟨S, X⟩ = ⟨C − Σ_i y_i A_i − S, X⟩ + b^T y


and we get the dual problem

maximize ๐‘๐‘‡๐‘ฆsubject to ๐ถ โˆ’โˆ‘๏ธ€๐‘š

๐‘–=1 ๐‘ฆ๐‘–๐ด๐‘– โˆˆ ๐’ฎ๐‘›+.

(8.14)

The dual contains an affine matrix-valued function with coefficients ๐ถ,๐ด๐‘– โˆˆ ๐’ฎ๐‘› and variable๐‘ฆ โˆˆ R๐‘š. Such a matrix-valued affine inequality is called a linear matrix inequality (LMI). InSec. 6 we formulated many problems as LMIs, that is in the form more naturally matchingthe dual.

From a modeling perspective it does not matter whether constraints are given as linear matrix inequalities or as an intersection of affine hyperplanes; one formulation is easily converted to the other using auxiliary variables and constraints, and this transformation is often done transparently by optimization software. Nevertheless, it is instructive to study an explicit example of how to carry out this transformation. A linear matrix inequality

๐ด0 + ๐‘ฅ1๐ด1 + ยท ยท ยท + ๐‘ฅ๐‘›๐ด๐‘› โชฐ 0

where ๐ด๐‘– โˆˆ ๐’ฎ๐‘š+ is converted to a set of linear equality constraints using a slack variable

๐ด0 + ๐‘ฅ1๐ด1 + ยท ยท ยท + ๐‘ฅ๐‘›๐ด๐‘› = ๐‘†, ๐‘† โชฐ 0.

Apart from introducing an explicit semidefinite variable ๐‘† โˆˆ ๐’ฎ๐‘š+ we also added ๐‘š(๐‘š + 1)/2

equality constraints. On the other hand, a semidefinite variable ๐‘‹ โˆˆ ๐’ฎ๐‘›+ can be rewritten as

a linear matrix inequality with ๐‘›(๐‘› + 1)/2 scalar variables

๐‘‹ =๐‘›โˆ‘๏ธ

๐‘–=1

๐‘’๐‘–๐‘’๐‘‡๐‘– ๐‘ฅ๐‘–๐‘– +

๐‘›โˆ‘๏ธ๐‘–=1

๐‘›โˆ‘๏ธ๐‘—=๐‘–+1

(๐‘’๐‘–๐‘’๐‘‡๐‘— + ๐‘’๐‘—๐‘’

๐‘‡๐‘– )๐‘ฅ๐‘–๐‘— โชฐ 0.

Obviously we should only use these transformations when necessary; if we have a problemthat is more naturally interpreted in either primal or dual form, we should be careful torecognize that structure.

Example 8.7 (Dualization and efficiency). Consider the problem:

minimize ๐‘’๐‘‡ ๐‘งsubject to ๐ด + Diag(๐‘ง) = ๐‘‹,

๐‘‹ โชฐ 0.

with the variables ๐‘‹ โˆˆ ๐’ฎ๐‘›+ and ๐‘ง โˆˆ R๐‘›. This is a problem in primal form with ๐‘›(๐‘›+ 1)/2

equality constraints, but they are more naturally interpreted as a linear matrix inequality

๐ด +โˆ‘๏ธ๐‘–

๐‘’๐‘–๐‘’๐‘‡๐‘– ๐‘ง๐‘– โชฐ 0.

The dual problem is

    maximize    −⟨A, Z⟩
    subject to  diag(Z) = e,
                Z ⪰ 0,

in the variable Z ∈ S_+^n. The dual problem has only n equality constraints, which is a vast improvement over the n(n + 1)/2 constraints in the primal problem. See also Sec. 7.5.


Example 8.8 (Sum of singular values revisited). In Sec. 6.2.4, and specifically in (6.16), we expressed the problem of minimizing the sum of singular values of a nonsymmetric matrix X. Problem (6.16) can be written as an LMI:

    maximize    Σ_{i,j} X_{ij} z_{ij}
    subject to  I − Σ_{i,j} z_{ij} [ 0          e_j e_i^T
                                     e_i e_j^T  0         ] ⪰ 0.

Treating this as the dual and going back to the primal form we get:

    minimize    Tr(U) + Tr(V)
    subject to  S = −(1/2) X,
                [ U    S^T
                  S    V   ] ⪰ 0,

which is equivalent to the claimed (6.17). The dual formulation has the advantage of being linear in X.


Chapter 9

Mixed integer optimization

In other chapters of this cookbook we have considered different classes of convex problems with continuous variables. In this chapter we consider a much wider range of non-convex problems by allowing integer variables. This technique is extremely useful in practice, and already for linear programming it covers a vast range of problems. We introduce different building blocks for integer optimization, which make it possible to model useful non-convex dependencies between variables in conic problems. It should be noted that mixed integer optimization problems are very hard (technically NP-hard), and for many practical cases an exact solution may not be found in reasonable time.

9.1 Integer modeling

A general mixed integer conic optimization problem has the form

minimize ๐‘๐‘‡๐‘ฅsubject to ๐ด๐‘ฅ = ๐‘,

๐‘ฅ โˆˆ ๐พ,๐‘ฅ๐‘– โˆˆ Z, โˆ€๐‘– โˆˆ โ„,

(9.1)

where ๐พ is a cone and โ„ โŠ† {1, . . . , ๐‘›} denotes the set of variables that are constrained to beintegers.

Two major techniques are typical for mixed integer optimization. The first one is theuse of binary variables, also known as indicator variables, which only take values 0 and 1,and indicate the absence or presence of a particular event or choice. This restriction can ofcourse be modeled in the form (9.1) by writing:

0 โ‰ค ๐‘ฅ โ‰ค 1 and ๐‘ฅ โˆˆ Z.

The other, known as big-M, refers to the fact that some relations can only be modeled linearlyif one assumes some fixed bound ๐‘€ on the quantities involved, and this constant enters themodel formulation. The choice of ๐‘€ can affect the performance of the model, see Example7.8.


9.1.1 Implication of positivity

Often we have a real-valued variable x ∈ R satisfying 0 ≤ x < M for a known upper bound M, and we wish to model the implication

    x > 0 ⟹ z = 1.        (9.2)

Making z a binary variable we can write (9.2) as a linear inequality

    x ≤ Mz,   z ∈ {0, 1}.        (9.3)

Indeed x > 0 excludes the possibility of z = 0, hence forces z = 1. Since a priori x ≤ M, there is no danger that the constraint accidentally makes the problem infeasible. A typical use of this trick is to model fixed setup costs.

Example 9.1 (Fixed setup cost). Assume that production of a specific item i costs u_i per unit, but there is an additional fixed charge of w_i if we produce item i at all. For instance, w_i could be the cost of setting up a production plant, initial cost of equipment etc. Then the cost of producing x_i units of product i is given by the discontinuous function

    c_i(x_i) = { w_i + u_i x_i,   x_i > 0,
                 0,               x_i = 0.

If we let M denote an upper bound on the quantities we can produce, we can then minimize the total production cost of n products under some affine constraint Ax = b with

    minimize    u^T x + w^T z
    subject to  Ax = b,
                x_i ≤ M z_i,   i = 1, ..., n,
                x ≥ 0,
                z ∈ {0, 1}^n,

which is a linear mixed-integer optimization problem. Note that by minimizing the production cost, we drive z_i to 0 when x_i = 0, so setup costs are indeed included only for products with x_i > 0.

9.1.2 Semi-continuous variables

We can also model semi-continuity of a variable x ∈ R,

    x ∈ {0} ∪ [a, b],        (9.4)

where 0 < a ≤ b, using a double inequality

    az ≤ x ≤ bz,   z ∈ {0, 1}.


Fig. 9.1: Production cost with fixed setup cost w_i.

9.1.3 Indicator constraints

Suppose we want to model the fact that a certain linear inequality must be satisfied when some other event occurs. In other words, for a binary variable z we want to model the implication

    z = 1 ⟹ a^T x ≤ b.

Suppose we know in advance an upper bound a^T x − b ≤ M. Then we can write the above as a linear inequality

    a^T x ≤ b + M(1 − z).

Now if z = 1 then we forced a^T x ≤ b, while for z = 0 the inequality is trivially satisfied and does not enforce any additional constraint on x.

9.1.4 Disjunctive constraints

With a disjunctive constraint we require that at least one of the given linear constraints is satisfied, that is

    (a_1^T x ≤ b_1) ∨ (a_2^T x ≤ b_2) ∨ ··· ∨ (a_k^T x ≤ b_k).

Introducing binary variables z_1, ..., z_k, we can use Sec. 9.1.3 to write a linear model

    z_1 + ··· + z_k ≥ 1,
    z_1, ..., z_k ∈ {0, 1},
    a_i^T x ≤ b_i + M(1 − z_i),   i = 1, ..., k.

Note that z_j = 1 implies that the j-th constraint is satisfied, but not vice-versa. Achieving that effect is described in the next section.


9.1.5 Constraint satisfaction

Say we want to define an optimization model that will behave differently depending on which of the inequalities

    a^T x ≤ b   or   a^T x ≥ b

is satisfied. Suppose we have lower and upper bounds for a^T x − b in the form of m ≤ a^T x − b ≤ M. Then we can write a model

    b + mz ≤ a^T x ≤ b + M(1 − z),   z ∈ {0, 1}.        (9.5)

Now observe that z = 0 implies b ≤ a^T x ≤ b + M, of which the right-hand inequality is redundant, i.e. always satisfied. Similarly, z = 1 implies b + m ≤ a^T x ≤ b. In other words z is an indicator of whether a^T x ≤ b.

In practice we would relax one inequality using a small amount of slack, i.e.,

    b + (m − ε)z + ε ≤ a^T x        (9.6)

to avoid issues with classifying the equality a^T x = b.

9.1.6 Exact absolute value

In Sec. 2.2.2 we showed how to model |x| ≤ t as two linear inequalities. Now suppose we need to model an exact equality

    |x| = t.        (9.7)

It defines a non-convex set, hence it is not conic representable. If we split x into positive and negative part x = x^+ − x^−, where x^+, x^− ≥ 0, then |x| = x^+ + x^− as long as either x^+ = 0 or x^− = 0. That last alternative can be modeled with a binary variable, and we get a model of (9.7):

    x = x^+ − x^−,
    t = x^+ + x^−,
    0 ≤ x^+, x^−,
    x^+ ≤ Mz,
    x^− ≤ M(1 − z),
    z ∈ {0, 1},        (9.8)

where the constant M is an a priori known upper bound on |x| in the problem.
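A quick way to convince oneself that (9.8) captures exactly |x| = t is to search over the binary variable explicitly; the sketch below (M = 10 is just an illustrative bound) reports, for a given pair (x, t), whether some choice of x^+, x^-, z satisfies all the constraints:

```python
M = 10.0   # assumed a priori bound on |x|

def representable(x, t, tol=1e-9):
    """Is (x, t) feasible for the mixed-integer model (9.8)?"""
    for z in (0, 1):
        # For fixed z the split is forced: x^+ - x^- = x and x^+ + x^- = t
        xp, xm = (t + x) / 2, (t - x) / 2
        if min(xp, xm) >= -tol and xp <= M * z + tol and xm <= M * (1 - z) + tol:
            return True
    return False

print(representable( 3.0, 3.0))   # True,  since |3| = 3
print(representable(-2.0, 2.0))   # True,  since |-2| = 2
print(representable( 3.0, 4.0))   # False, since |3| != 4
```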

9.1.7 Exact 1-norm

We can use the technique above to model the exact ℓ_1-norm equality constraint

    Σ_{i=1}^n |x_i| = c,        (9.9)


where ๐‘ฅ โˆˆ R๐‘› is a decision variable and ๐‘ is a constant. Such constraints arise for instancein fully invested portfolio optimizations scenarios (with short-selling). As before, we split ๐‘ฅinto a positive and negative part, using a sequence of binary variables to guarantee that atmost one of them is nonzero:

๐‘ฅ = ๐‘ฅ+ โˆ’ ๐‘ฅโˆ’,0 โ‰ค ๐‘ฅ+, ๐‘ฅโˆ’,

๐‘ฅ+ โ‰ค ๐‘๐‘ง,๐‘ฅโˆ’ โ‰ค ๐‘(๐‘’โˆ’ ๐‘ง),โˆ‘๏ธ€

๐‘– ๐‘ฅ+๐‘– +

โˆ‘๏ธ€๐‘– ๐‘ฅ

โˆ’๐‘– = ๐‘,๐‘ง โˆˆ {0, 1}๐‘›, ๐‘ฅ+, ๐‘ฅโˆ’ โˆˆ R๐‘›.

(9.10)

9.1.8 Maximum

The exact equality t = max{x_1, ..., x_n} can be expressed by introducing a sequence of mutually exclusive indicator variables z_1, ..., z_n, with the intention that z_i = 1 picks the variable x_i which actually achieves maximum. Choosing a safe bound M we get a model:

    x_i ≤ t ≤ x_i + M(1 − z_i),   i = 1, ..., n,
    z_1 + ··· + z_n = 1,
    z ∈ {0, 1}^n.        (9.11)

9.1.9 Boolean operators

Typically an indicator variable z ∈ {0, 1} represents a boolean value (true/false). In this case the standard boolean operators can be implemented as linear inequalities. In the table below we assume all variables are binary.

Table 9.1: Boolean operators

    Boolean                                                   Linear
    z = x OR y                                                x ≤ z,  y ≤ z,  z ≤ x + y
    z = x AND y                                               x ≥ z,  y ≥ z,  z + 1 ≥ x + y
    z = NOT x                                                 z = 1 − x
    x ⟹ y                                                     x ≤ y
    At most one of z_1, ..., z_n holds (SOS1, set-packing)    Σ_i z_i ≤ 1
    Exactly one of z_1, ..., z_n holds (set-partitioning)     Σ_i z_i = 1
    At least one of z_1, ..., z_n holds (set-covering)        Σ_i z_i ≥ 1
    At most k of z_1, ..., z_n hold (cardinality)             Σ_i z_i ≤ k

9.1.10 Fixed set of values

We can restrict a variable x to take on only values from a specified finite set {a_1, ..., a_n} by writing

    x = Σ_i z_i a_i,
    z ∈ {0, 1}^n,
    Σ_i z_i = 1.        (9.12)


In (9.12) we essentially defined z_i to be the indicator variable of whether x = a_i. In some circumstances there may be more efficient representations of a restricted set of values, for example:

• (sign) x ∈ {−1, 1} ⟺ x = 2z − 1, z ∈ {0, 1},

• (modulo) x ∈ {1, 4, 7, 10} ⟺ x = 3z + 1, 0 ≤ z ≤ 3, z ∈ Z,

• (fraction) x ∈ {0, 1/3, 2/3, 1} ⟺ 3x = z, 0 ≤ z ≤ 3, z ∈ Z,

• (gap) x ∈ (−∞, a] ∪ [b, ∞) ⟺ b − M(1 − z) ≤ x ≤ a + Mz, z ∈ {0, 1} for sufficiently large M.

In a very similar fashion we can restrict a variable x to a finite union of intervals ⋃_i [a_i, b_i]:

    Σ_i z_i a_i ≤ x ≤ Σ_i z_i b_i,
    z ∈ {0, 1}^n,
    Σ_i z_i = 1.        (9.13)

9.1.11 Alternative as a bilinear equality

In the literature one often finds models containing a bilinear constraint

๐‘ฅ๐‘ฆ = 0.

This constraint is just an algebraic formulation of the alternative ๐‘ฅ = 0 or ๐‘ฆ = 0 and thereforeit can be written as a mixed-integer linear problem:

|๐‘ฅ| โ‰ค ๐‘€๐‘ง,|๐‘ฆ| โ‰ค ๐‘€(1 โˆ’ ๐‘ง),๐‘ง โˆˆ {0, 1},

for a suitable constant ๐‘€ . The absolute value can be omitted if both ๐‘ฅ, ๐‘ฆ are nonnegative.Otherwise it can be modeled as in Sec. 2.2.2.

9.1.12 Equal signs

Suppose we want to say that the variables x_1, ..., x_n must either be all nonnegative or all nonpositive. This can be achieved by introducing a single binary variable z ∈ {0, 1} which picks one of these two alternatives:

    −M(1 − z) ≤ x_i ≤ Mz,   i = 1, ..., n.

Indeed, if z = 0 then we have −M ≤ x_i ≤ 0 and if z = 1 then 0 ≤ x_i ≤ M.


9.1.13 Continuous piecewise-linear functions

Consider a continuous, univariate, piecewise-linear, non-convex function f : [α_1, α_5] → R shown in Fig. 9.2. On the interval [α_j, α_{j+1}], j = 1, 2, 3, 4 we can describe the function as

    f(x) = λ_j f(α_j) + λ_{j+1} f(α_{j+1})

where λ_j α_j + λ_{j+1} α_{j+1} = x and λ_j + λ_{j+1} = 1. If we add a constraint that only two (adjacent) variables λ_j, λ_{j+1} can be nonzero, we can characterize every value f(x) over the entire interval [α_1, α_5] as some convex combination,

    f(x) = Σ_{j=1}^5 λ_j f(α_j).

Fig. 9.2: A univariate piecewise-linear non-convex function.

The condition that only two adjacent variables can be nonzero is sometimes called an SOS2 constraint. If we introduce indicator variables z_i for each pair of adjacent variables (λ_i, λ_{i+1}), we can model an SOS2 constraint as:

    λ_1 ≤ z_1,   λ_2 ≤ z_1 + z_2,   λ_3 ≤ z_2 + z_3,   λ_4 ≤ z_3 + z_4,   λ_5 ≤ z_4,
    z_1 + z_2 + z_3 + z_4 = 1,   z ∈ {0, 1}^4,

so that we have z_j = 1 ⟹ λ_i = 0, i ∉ {j, j + 1}. Collectively, we can then model the epigraph f(x) ≤ t as

    x = Σ_{j=1}^n λ_j α_j,
    Σ_{j=1}^n λ_j f(α_j) ≤ t,
    λ_1 ≤ z_1,   λ_j ≤ z_j + z_{j−1},  j = 2, ..., n − 1,   λ_n ≤ z_{n−1},
    λ ≥ 0,   Σ_{j=1}^n λ_j = 1,
    Σ_{j=1}^{n−1} z_j = 1,   z ∈ {0, 1}^{n−1},        (9.14)

for a piecewise-linear function f(x) with n terms. This approach is often called the lambda-method.
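Outside of the MIP model, the lambda-representation is also a handy way to evaluate such a function: for a given x one locates the segment, sets the two adjacent λ's and obtains f(x) as the convex combination. A small Python sketch (with made-up breakpoints and values) illustrates this:

```python
import numpy as np

alpha = np.array([0.0, 1.0, 2.0, 3.0, 4.0])     # breakpoints alpha_1..alpha_5 (illustrative)
fval  = np.array([1.0, 0.0, 2.0, 1.5, 3.0])     # f(alpha_j) values (illustrative, non-convex)

def f_lambda(x):
    """Evaluate the piecewise-linear interpolant via the lambda-method weights."""
    j = np.searchsorted(alpha, x, side="right") - 1
    j = min(max(j, 0), len(alpha) - 2)            # clamp to a valid segment
    lam_next = (x - alpha[j]) / (alpha[j + 1] - alpha[j])
    lam = np.zeros_like(alpha)
    lam[j], lam[j + 1] = 1.0 - lam_next, lam_next # only two adjacent weights are nonzero (SOS2)
    return lam @ fval

print(f_lambda(0.5), f_lambda(2.5))               # 0.5 and 1.75
```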

For the function in Fig. 9.2 we can reduce the number of integer variables by using a Gray encoding of the intervals [α_j, α_{j+1}] and an indicator variable y ∈ {0, 1}^2 to represent the four different values of the Gray code (the intervals [α_1, α_2], [α_2, α_3], [α_3, α_4], [α_4, α_5] receive the codes 00, 10, 11, 01, respectively). We can then describe the constraints on λ using only two



indicator variables,

    (y_1 = 0) → λ_3 = 0,
    (y_1 = 1) → λ_1 = λ_5 = 0,
    (y_2 = 0) → λ_4 = λ_5 = 0,
    (y_2 = 1) → λ_1 = λ_2 = 0,

which leads to a more efficient characterization of the epigraph f(x) ≤ t,

    x = Σ_{j=1}^5 λ_j α_j,
    Σ_{j=1}^5 λ_j f(α_j) ≤ t,
    λ_3 ≤ y_1,   λ_1 + λ_5 ≤ (1 − y_1),   λ_4 + λ_5 ≤ y_2,   λ_1 + λ_2 ≤ (1 − y_2),
    λ ≥ 0,   Σ_{j=1}^5 λ_j = 1,   y ∈ {0, 1}^2.

The lambda-method can also be used to model multivariate continuous piecewise-linear non-convex functions, specified on a set of polyhedra P_k. For example, for the function shown in Fig. 9.3 we can model the epigraph f(x) ≤ t as

    x = Σ_{i=1}^6 λ_i v_i,
    Σ_{i=1}^6 λ_i f(v_i) ≤ t,
    λ_1 ≤ z_1 + z_2,   λ_2 ≤ z_1,   λ_3 ≤ z_2 + z_3,
    λ_4 ≤ z_1 + z_2 + z_3 + z_4,   λ_5 ≤ z_3 + z_4,   λ_6 ≤ z_4,
    λ ≥ 0,   Σ_{i=1}^6 λ_i = 1,
    Σ_{i=1}^4 z_i = 1,   z ∈ {0, 1}^4.        (9.15)

Note, for example, that z_2 = 1 implies that λ_2 = λ_5 = λ_6 = 0 and x = λ_1 v_1 + λ_3 v_3 + λ_4 v_4.

Fig. 9.3: A multivariate continuous piecewise-linear non-convex function.

9.1.14 Lower semicontinuous piecewise-linear functions

The ideas in Sec. 9.1.13 can be applied to lower semicontinuous piecewise-linear functions as well. For example, consider the function shown in Fig. 9.4. If we denote the one-sided limits


by ๐‘“โˆ’(๐‘) := lim๐‘ฅโ†‘๐‘ ๐‘“(๐‘ฅ) and ๐‘“+(๐‘) := lim๐‘ฅโ†“๐‘ ๐‘“(๐‘ฅ), respectively, the one-sided limits, then wecan describe the epigraph ๐‘“(๐‘ฅ) โ‰ค ๐‘ก for the function in Fig. 9.4 as

๐‘ฅ = ๐œ†1๐›ผ1 + (๐œ†2 + ๐œ†3 + ๐œ†4)๐›ผ2 + ๐œ†5๐›ผ3,๐œ†1๐‘“(๐›ผ1) + ๐œ†2๐‘“โˆ’(๐›ผ2) + ๐œ†3๐‘“(๐›ผ2) + ๐œ†4๐‘“+(๐›ผ2) + ๐œ†5๐‘“(๐›ผ3) โ‰ค ๐‘ก,

๐œ†1 + ๐œ†2 โ‰ค ๐‘ง1, ๐œ†3 โ‰ค ๐‘ง2, ๐œ†4 + ๐œ†5 โ‰ค ๐‘ง3,

๐œ† โ‰ฅ 0,โˆ‘๏ธ€5

๐‘–=1 ๐œ†๐‘– = 1,โˆ‘๏ธ€3

๐‘–=1 ๐‘ง๐‘– = 1, ๐‘ง โˆˆ {0, 1}3,(9.16)

where we have a different decision variable for the intervals [๐›ผ1, ๐›ผ2), [๐›ผ2, ๐›ผ2], and (๐›ผ2, ๐›ผ3]. Asa special case this gives us an alternative characterization of fixed charge models consideredin Sec. 9.1.1.

Fig. 9.4: A univariate lower semicontinuous piecewise-linear function.

9.2 Mixed integer conic case studies

9.2.1 Wireless network design

The following problem arises in wireless network design and some other applications. We want to serve n clients with k transmitters. A transmitter with range r_j can serve all clients within Euclidean distance r_j from its position. The power consumption of a transmitter with range r is proportional to r^α for some fixed 1 ≤ α < ∞. The goal is to assign locations and ranges to transmitters minimizing the total power consumption while providing coverage of all n clients.

Denoting by x_1, ..., x_n ∈ R^2 locations of the clients, we can model this problem using binary decision variables z_{ij} indicating if the i-th client is covered by the j-th transmitter. This leads to a mixed-integer conic problem:

    minimize    Σ_j r_j^α
    subject to  r_j ≥ ‖p_j − x_i‖_2 − M(1 − z_{ij}),   i = 1, ..., n,   j = 1, ..., k,
                Σ_j z_{ij} ≥ 1,   i = 1, ..., n,
                p_j ∈ R^2,   z_{ij} ∈ {0, 1}.        (9.17)

The objective can be realized by summing power bounds t_j ≥ r_j^α or by directly bounding the α-norm of (r_1, ..., r_k). The latter approach would be recommended for large α.


This is a type of clustering problem. For α = 1, 2, respectively, we are minimizing the perimeter and area of the covered region. In practical applications the power exponent α can be as large as 6 depending on various factors (for instance terrain). In the linear cost model (α = 1) typical solutions contain a small number of huge disks covering most of the clients. For increasing α large ranges are penalized more heavily and the disks tend to be more balanced.

9.2.2 Avoiding small trades

The standard portfolio optimization model admits a number of mixed-integer extensions aimed at avoiding solutions with very small trades. To fix attention consider the model

    maximize    μ^T x
    subject to  (γ, G^T x) ∈ Q^{n+1},
                e^T x = w + e^T x^0,
                x ≥ 0,        (9.18)

with initial holdings x^0, initial cash amount w, expected returns μ, risk bound γ and decision variable x. Here e is the all-ones vector. Let Δx_j = x_j − x_j^0 denote the change of position in asset j.

Transaction costs

A transaction cost involved with nonzero Δx_j could be modeled as

    T_j(x_j) = { 0,                  Δx_j = 0,
                 α_j Δx_j + β_j,     Δx_j ≠ 0,

similarly to the problem from Example 9.1. Including transaction costs will now lead to the model:

    maximize    μ^T x
    subject to  (γ, G^T x) ∈ Q^{n+1},
                e^T x + α^T x + β^T z = w + e^T x^0,
                x − x^0 ≤ Mz,   x^0 − x ≤ Mz,
                x ≥ 0,   z ∈ {0, 1}^n,        (9.19)

where the binary variable z_j is an indicator of Δx_j ≠ 0. Here M is a sufficiently large constant, for instance M = w + e^T x^0 will do.


Cardinality constraints

Another option is to fix an upper bound k on the number of nonzero trades. The meaning of z is the same as before:

    maximize    μ^T x
    subject to  (γ, G^T x) ∈ Q^{n+1},
                e^T x = w + e^T x^0,
                x − x^0 ≤ Mz,   x^0 − x ≤ Mz,
                e^T z ≤ k,
                x ≥ 0,   z ∈ {0, 1}^n.        (9.20)

Trading size constraints

We can also demand a lower bound on nonzero trades, that is |Δx_j| ∈ {0} ∪ [a, b] for all j. To this end we combine the techniques from Sec. 9.1.6 and Sec. 9.1.2, writing p_j, q_j for the indicators of Δx_j > 0 and Δx_j < 0, respectively:

    maximize    μ^T x
    subject to  (γ, G^T x) ∈ Q^{n+1},
                e^T x = w + e^T x^0,
                x − x^0 = x^+ − x^−,
                x^+ ≤ Mp,   x^− ≤ Mq,
                a(p + q) ≤ x^+ + x^− ≤ b(p + q),
                p + q ≤ e,
                x, x^+, x^− ≥ 0,   p, q ∈ {0, 1}^n.        (9.21)

9.2.3 Convex piecewise linear regression

Consider the problem of approximating the data (x_i, y_i), i = 1, ..., n by a piecewise linear convex function of the form

    f(x) = max{a_j x + b_j, j = 1, ..., k},

where k is the number of segments we want to consider. The quality of the fit is measured with least squares as

    Σ_i (f(x_i) − y_i)^2.

Note that we do not specify the locations of nodes (breakpoints), i.e. points where f(x) changes slope. Finding them is part of the fitting problem.

As in Sec. 9.1.8 we introduce binary variables z_{ij} indicating that f(x_i) = a_j x_i + b_j, i.e. it is the j-th linear function that achieves maximum at the point x_i. Following Sec. 9.1.8 we


now have a mixed integer conic quadratic problem

minimize โ€–๐‘ฆ โˆ’ ๐‘“โ€–2subject to ๐‘Ž๐‘—๐‘ฅ๐‘– + ๐‘๐‘— โ‰ค ๐‘“๐‘–, ๐‘– = 1, . . . , ๐‘›, ๐‘— = 1, . . . , ๐‘˜,

๐‘“๐‘– โ‰ค ๐‘Ž๐‘—๐‘ฅ๐‘– + ๐‘๐‘— + ๐‘€(1 โˆ’ ๐‘ง๐‘–๐‘—), ๐‘– = 1, . . . , ๐‘›, ๐‘— = 1, . . . , ๐‘˜,โˆ‘๏ธ€๐‘— ๐‘ง๐‘–๐‘— = 1, ๐‘– = 1 . . . , ๐‘›,

๐‘ง๐‘–๐‘— โˆˆ {0, 1}, ๐‘– = 1, . . . , ๐‘›, ๐‘— = 1, . . . , ๐‘˜,

(9.22)

with variables ๐‘Ž, ๐‘, ๐‘“, ๐‘ง, where ๐‘€ is a sufficiently large constant.

Fig. 9.5: Convex piecewise linear fit with ๐‘˜ = 2, 3, 4 segments.

Frequently an integer model will have properties which formally follow from the problemโ€™sconstraints, but may be very hard or impossible for a mixed-integer solver to automaticallydeduce. It may dramatically improve efficiency to explicitly add some of them to the model.For example, we can enhance (9.22) with all inequalities of the form

๐‘ง๐‘–,๐‘—+1 + ๐‘ง๐‘–+๐‘–โ€ฒ,๐‘— โ‰ค 1,

which indicate that each linear segment covers a contiguous subset of the sample and addi-tionally force these segments to come in the order of increasing ๐‘— as ๐‘– increases from left toright. The last statement is an example of symmetry breaking.


Chapter 10

Quadratic optimization

In this chapter we discuss convex quadratic and quadratically constrained optimization. Our discussion is fairly brief compared to the previous chapters for three reasons: (i) convex quadratic optimization is a special case of conic quadratic optimization, (ii) for most convex problems it is actually more computationally efficient to pose the problem in conic form, and (iii) duality theory (including infeasibility certificates) is much simpler for conic quadratic optimization. Therefore, we generally recommend a conic quadratic formulation, see Sec. 3 and especially Sec. 3.2.3.

10.1 Quadratic objective

A standard (convex) quadratic optimization problem

\[
\begin{array}{ll}
\text{minimize} & \frac{1}{2}x^TQx + c^Tx\\
\text{subject to} & Ax = b,\\
& x \geq 0,
\end{array}
\tag{10.1}
\]

with $Q \in \mathcal{S}^n_+$ is conceptually a simple extension of a standard linear optimization problem with a quadratic term $x^TQx$. Note the important requirement that $Q$ is symmetric positive semidefinite; otherwise the objective function would not be convex.
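For concreteness, a tiny instance of (10.1) could be posed as follows (a sketch with made-up data; the CVXPY modeling layer is used here only for illustration and is not part of the cookbook):

```python
# Minimal sketch of the standard convex QP (10.1) with made-up data.
import cvxpy as cp
import numpy as np

Q = np.array([[2.0, 0.5], [0.5, 1.0]])   # symmetric positive semidefinite
c = np.array([-1.0, -2.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

x = cp.Variable(2, nonneg=True)
prob = cp.Problem(cp.Minimize(0.5 * cp.quad_form(x, Q) + c @ x), [A @ x == b])
prob.solve()
print(x.value)
```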

10.1.1 Geometry of quadratic optimization

Quadratic optimization has a simple geometric interpretation: we minimize a convex quadratic function over a polyhedron, see Fig. 10.1. It is intuitively clear that the following different cases can occur:

• The optimal solution $x^\star$ is at the boundary of the polyhedron (as shown in Fig. 10.1). At $x^\star$ one of the hyperplanes is tangential to an ellipsoidal level curve.

• The optimal solution is inside the polyhedron; this occurs if the unconstrained minimizer $\arg\min_x \frac{1}{2}x^TQx + c^Tx = -Q^\dagger c$ (i.e., the center of the ellipsoidal level curves) is inside the polyhedron. From now on $Q^\dagger$ denotes the pseudoinverse of $Q$; in particular $Q^\dagger = Q^{-1}$ if $Q$ is positive definite.


• If the polyhedron is unbounded in the opposite direction of $c$, and if the ellipsoidal level curves are degenerate in that direction (i.e., $Qc = 0$), then the problem is unbounded. If $Q \in \mathcal{S}^n_{++}$, then the problem cannot be unbounded.

• The problem is infeasible, i.e., $\{x \mid Ax = b,\ x \geq 0\} = \emptyset$.

Fig. 10.1: Geometric interpretation of quadratic optimization. At the optimal point $x^\star$ the hyperplane $\{x \mid a_1^Tx = b\}$ is tangential to an ellipsoidal level curve.

Possibly because of its simple geometric interpretation and similarity to linear optimization, quadratic optimization has been more widely adopted by optimization practitioners than conic quadratic optimization.

10.1.2 Duality in quadratic optimization

The Lagrangian function (Sec. 8.3) for (10.1) is

\[
L(x, y, s) = \frac{1}{2}x^TQx + x^T(c - A^Ty - s) + b^Ty
\tag{10.2}
\]

with Lagrange multipliers $s \in \mathbb{R}^n_+$, and from $\nabla_x L(x, y, s) = 0$ we get the necessary first-order optimality condition

\[
Qx = A^Ty + s - c,
\]

i.e. $(A^Ty + s - c) \in \mathcal{R}(Q)$. Then

\[
\arg\min_x L(x, y, s) = Q^\dagger(A^Ty + s - c),
\]

which can be substituted into (10.2) to give a dual function

\[
g(y, s) = \begin{cases}
b^Ty - \frac{1}{2}(A^Ty + s - c)^TQ^\dagger(A^Ty + s - c), & (A^Ty + s - c) \in \mathcal{R}(Q),\\
-\infty, & \text{otherwise}.
\end{cases}
\]


Thus we get a dual problem

\[
\begin{array}{ll}
\text{maximize} & b^Ty - \frac{1}{2}(A^Ty + s - c)^TQ^\dagger(A^Ty + s - c)\\
\text{subject to} & (A^Ty + s - c) \in \mathcal{R}(Q),\\
& s \geq 0,
\end{array}
\tag{10.3}
\]

or alternatively, using the optimality condition $Qx = A^Ty + s - c$ we can write

\[
\begin{array}{ll}
\text{maximize} & b^Ty - \frac{1}{2}x^TQx\\
\text{subject to} & Qx = A^Ty - c + s,\\
& s \geq 0.
\end{array}
\tag{10.4}
\]

Note that this is an unusual dual problem in the sense that it involves both primal and dual variables.

Weak duality, strong duality under the Slater constraint qualification and Farkas infeasibility certificates work similarly as in Sec. 8. In particular, note that the constraints in both (10.1) and (10.4) are linear, so Lemma 2.4 applies and we have:

1. The primal problem (10.1) is infeasible if and only if there is $y$ such that $A^Ty \leq 0$ and $b^Ty > 0$.

2. The dual problem (10.4) is infeasible if and only if there is $x \geq 0$ such that $Ax = 0$, $Qx = 0$ and $c^Tx < 0$.

10.1.3 Conic reformulation

Suppose we have a factorization $Q = F^TF$ where $F \in \mathbb{R}^{k\times n}$, which is most interesting when $k \ll n$. Then $x^TQx = x^TF^TFx = \|Fx\|_2^2$ and the conic quadratic reformulation of (10.1) is

\[
\begin{array}{ll}
\text{minimize} & r + c^Tx\\
\text{subject to} & Ax = b,\\
& x \geq 0,\\
& (1, r, Fx) \in \mathcal{Q}_r^{k+2},
\end{array}
\tag{10.5}
\]

with dual problem

\[
\begin{array}{ll}
\text{maximize} & b^Ty - u\\
\text{subject to} & -F^Tv = A^Ty - c + s,\\
& s \geq 0,\\
& (u, 1, v) \in \mathcal{Q}_r^{k+2}.
\end{array}
\tag{10.6}
\]

Note that in an optimal primal-dual solution we have $r = \frac{1}{2}\|Fx\|_2^2$, hence complementary slackness for $\mathcal{Q}_r^{k+2}$ demands $v = -Fx$ and $-F^Tv = Qx$, as well as $u = \frac{1}{2}\|v\|_2^2 = \frac{1}{2}x^TQx$. This justifies why some of the dual variables in (10.6) and (10.4) have the same names: they are in fact equal, and so both the primal and dual solution to the original quadratic problem can be recovered from the primal-dual solution to the conic reformulation.


10.2 Quadratically constrained optimization

A general convex quadratically constrained quadratic optimization problem is

\[
\begin{array}{ll}
\text{minimize} & \frac{1}{2}x^TQ_0x + c_0^Tx + r_0\\
\text{subject to} & \frac{1}{2}x^TQ_ix + c_i^Tx + r_i \leq 0, \quad i = 1,\ldots,m,
\end{array}
\tag{10.7}
\]

where $Q_i \in \mathcal{S}^n_+$. This corresponds to minimizing a convex quadratic function over an intersection of convex quadratic sets such as ellipsoids or affine halfspaces. Note the important requirement $Q_i \succeq 0$ for all $i = 0,\ldots,m$, so that the objective function is convex and the constraints characterize convex sets. For example, neither of the constraints

\[
\frac{1}{2}x^TQ_ix + c_i^Tx + r_i = 0, \qquad \frac{1}{2}x^TQ_ix + c_i^Tx + r_i \geq 0
\]

characterizes a convex set, and therefore they cannot be included.
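A small instance of (10.7), minimizing a convex quadratic over a single ellipsoid, might look as follows (a sketch with made-up data, not from the cookbook):

```python
# Sketch of a convex QCQP of the form (10.7) with one ellipsoidal constraint.
import cvxpy as cp
import numpy as np

Q0 = np.array([[2.0, 0.0], [0.0, 1.0]]); c0 = np.array([-1.0, -1.0]); r0 = 0.0
Q1 = np.eye(2);                          c1 = np.zeros(2);            r1 = -1.0

x = cp.Variable(2)
prob = cp.Problem(
    cp.Minimize(0.5 * cp.quad_form(x, Q0) + c0 @ x + r0),
    [0.5 * cp.quad_form(x, Q1) + c1 @ x + r1 <= 0])   # i.e. 0.5*||x||^2 <= 1
prob.solve()
```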

10.2.1 Duality in quadratically constrained optimization

The Lagrangian function for (10.7) is

\[
L(x, \lambda) = \frac{1}{2}x^TQ_0x + c_0^Tx + r_0 + \sum_{i=1}^m \lambda_i \left( \frac{1}{2}x^TQ_ix + c_i^Tx + r_i \right)
             = \frac{1}{2}x^TQ(\lambda)x + c(\lambda)^Tx + r(\lambda),
\]

where

\[
Q(\lambda) = Q_0 + \sum_{i=1}^m \lambda_iQ_i, \qquad
c(\lambda) = c_0 + \sum_{i=1}^m \lambda_ic_i, \qquad
r(\lambda) = r_0 + \sum_{i=1}^m \lambda_ir_i.
\]

From the Lagrangian we get the first-order optimality conditions

\[
Q(\lambda)x = -c(\lambda),
\tag{10.8}
\]

and similarly to the case of quadratic optimization we get a dual problem

\[
\begin{array}{ll}
\text{maximize} & -\frac{1}{2}c(\lambda)^TQ(\lambda)^\dagger c(\lambda) + r(\lambda)\\
\text{subject to} & c(\lambda) \in \mathcal{R}(Q(\lambda)),\\
& \lambda \geq 0,
\end{array}
\tag{10.9}
\]

or equivalently

\[
\begin{array}{ll}
\text{maximize} & -\frac{1}{2}w^TQ(\lambda)w + r(\lambda)\\
\text{subject to} & Q(\lambda)w = -c(\lambda),\\
& \lambda \geq 0.
\end{array}
\tag{10.10}
\]

Using a general version of the Schur Lemma for singular matrices, we can also write (10.9) as an equivalent semidefinite optimization problem,

\[
\begin{array}{ll}
\text{maximize} & t\\
\text{subject to} & \begin{bmatrix} 2(r(\lambda) - t) & c(\lambda)^T\\ c(\lambda) & Q(\lambda) \end{bmatrix} \succeq 0,\\
& \lambda \geq 0.
\end{array}
\tag{10.11}
\]


Feasibility in quadratically constrained optimization is characterized by the following conditions (assuming the Slater constraint qualification or other conditions to exclude ill-posed problems):

• Either the primal problem (10.7) is feasible or there exists $\lambda \geq 0$, $\lambda \neq 0$ satisfying

\[
\sum_{i=1}^m \lambda_i \begin{bmatrix} 2r_i & c_i^T\\ c_i & Q_i \end{bmatrix} \succ 0.
\tag{10.12}
\]

• Either the dual problem (10.10) is feasible or there exists $x \in \mathbb{R}^n$ satisfying

\[
Q_0x = 0, \quad c_0^Tx < 0, \quad Q_ix = 0, \quad c_i^Tx = 0, \quad i = 1,\ldots,m.
\tag{10.13}
\]

To see why the certificate proves infeasibility, suppose for instance that (10.12) and (10.7) are simultaneously satisfied. Then

\[
0 < \sum_i \lambda_i \begin{bmatrix} 1\\ x \end{bmatrix}^T \begin{bmatrix} 2r_i & c_i^T\\ c_i & Q_i \end{bmatrix} \begin{bmatrix} 1\\ x \end{bmatrix}
  = 2\sum_i \lambda_i \left( \frac{1}{2}x^TQ_ix + c_i^Tx + r_i \right) \leq 0
\]

and we have a contradiction, so (10.12) certifies infeasibility.

10.2.2 Conic reformulation

If $Q_i = F_i^TF_i$ for $i = 0,\ldots,m$, where $F_i \in \mathbb{R}^{k_i\times n}$, then we get a conic quadratic reformulation of (10.7):

\[
\begin{array}{ll}
\text{minimize} & t_0 + c_0^Tx + r_0\\
\text{subject to} & (1, t_0, F_0x) \in \mathcal{Q}_r^{k_0+2},\\
& (1, -c_i^Tx - r_i, F_ix) \in \mathcal{Q}_r^{k_i+2}, \quad i = 1,\ldots,m.
\end{array}
\tag{10.14}
\]

The primal and dual solution of (10.14) recovers the primal and dual solution of (10.7) similarly as for quadratic optimization in Sec. 10.1.3. Let us see, for example, how a (conic) primal infeasibility certificate for (10.14) implies an infeasibility certificate in the form (10.12). Infeasibility in (10.14) involves only the last set of conic constraints. We can derive the infeasibility certificate for those constraints from Lemma 8.3 or by directly writing the Lagrangian

\[
L = -\sum_i \left( u_i + v_i(-c_i^Tx - r_i) + w_i^TF_ix \right)
  = x^T\Bigl( \sum_i v_ic_i - \sum_i F_i^Tw_i \Bigr) + \Bigl( \sum_i v_ir_i - \sum_i u_i \Bigr).
\]

The dual maximization problem is unbounded (i.e. the primal problem is infeasible) if we have

\[
\sum_i v_ic_i = \sum_i F_i^Tw_i, \qquad \sum_i v_ir_i > \sum_i u_i, \qquad 2u_iv_i \geq \|w_i\|^2, \quad u_i, v_i \geq 0.
\]


We claim that $\lambda = v$ is an infeasibility certificate in the sense of (10.12). We can assume $v_i > 0$ for all $i$, as otherwise $w_i = 0$ and we can take $u_i = 0$ and skip the $i$-th coordinate. Let $M$ denote the matrix appearing in (10.12) with $\lambda = v$. We show that $M$ is positive definite:

\[
\begin{array}{rcl}
[y, x]^TM[y, x] & = & \displaystyle\sum_i \left( v_ix^TQ_ix + 2v_iyc_i^Tx + 2v_ir_iy^2 \right)\\[2mm]
& > & \displaystyle\sum_i \left( v_i\|F_ix\|^2 + 2yw_i^TF_ix + 2u_iy^2 \right)\\[2mm]
& \geq & \displaystyle\sum_i v_i^{-1}\|v_iF_ix + yw_i\|^2 \;\geq\; 0.
\end{array}
\]

10.3 Example: Factor model

Recall from Sec. 3.3.3 that the standard Markowitz portfolio optimization problem is

\[
\begin{array}{ll}
\text{maximize} & \mu^Tx\\
\text{subject to} & x^T\Sigma x \leq \gamma,\\
& e^Tx = 1,\\
& x \geq 0,
\end{array}
\tag{10.15}
\]

where $\mu \in \mathbb{R}^n$ is a vector of expected returns for $n$ different assets and $\Sigma \in \mathcal{S}^n_+$ denotes the corresponding covariance matrix. Problem (10.15) maximizes the expected return of an investment given a budget constraint and an upper bound $\gamma$ on the allowed risk. Alternatively, we can minimize the risk given a lower bound $\delta$ on the expected return of the investment, i.e., we can solve the problem

\[
\begin{array}{ll}
\text{minimize} & x^T\Sigma x\\
\text{subject to} & \mu^Tx \geq \delta,\\
& e^Tx = 1,\\
& x \geq 0.
\end{array}
\tag{10.16}
\]

Both problems (10.15) and (10.16) are equivalent in the sense that they describe the same Pareto-optimal trade-off curve by varying $\delta$ and $\gamma$.

Next consider a factorization

\[
\Sigma = V^TV
\tag{10.17}
\]

for some $V \in \mathbb{R}^{k\times n}$. We can then rewrite both problems (10.15) and (10.16) in conic quadratic form as

\[
\begin{array}{ll}
\text{maximize} & \mu^Tx\\
\text{subject to} & (1/2, \gamma, Vx) \in \mathcal{Q}_r^{k+2},\\
& e^Tx = 1,\\
& x \geq 0,
\end{array}
\tag{10.18}
\]

and

\[
\begin{array}{ll}
\text{minimize} & t\\
\text{subject to} & (1/2, t, Vx) \in \mathcal{Q}_r^{k+2},\\
& \mu^Tx \geq \delta,\\
& e^Tx = 1,\\
& x \geq 0,
\end{array}
\tag{10.19}
\]


respectively. Given $\Sigma \succ 0$, we may always compute a factorization (10.17) where $V$ is upper-triangular (a Cholesky factor). In this case $k = n$, i.e., $V \in \mathbb{R}^{n\times n}$, so there is little difference in complexity between the conic and quadratic formulations. However, in practice, better choices of $V$ are either known or readily available. We mention two examples.

Data matrix

$\Sigma$ might be specified directly in the form (10.17), where $V$ is a normalized data matrix with $k$ observations of market data (for example daily returns) of the $n$ assets. When the observation horizon $k$ is shorter than $n$, which is typically the case, the conic representation is both more parsimonious and has better numerical properties.
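For example, with a short observation history ($k \ll n$) one can pass the data matrix $V$ directly to the risk constraint instead of forming $\Sigma = V^TV$; the sketch below (made-up data, illustrative only) does this for the risk-constrained problem (10.18):

```python
# Sketch (assumed data) of (10.18) when Sigma is given implicitly by a
# normalized data matrix V with k observations of n assets.
import cvxpy as cp
import numpy as np

np.random.seed(0)
k, n = 20, 200                           # few observations, many assets
V = np.random.randn(k, n) / np.sqrt(k)   # normalized return observations
mu = 0.01 + 0.02 * np.random.rand(n)
gamma = 0.05

x = cp.Variable(n, nonneg=True)
prob = cp.Problem(cp.Maximize(mu @ x),
                  [cp.sum_squares(V @ x) <= gamma,   # x^T Sigma x <= gamma
                   cp.sum(x) == 1])
prob.solve()
```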

Factor model

For a factor model we have

\[
\Sigma = D + U^TRU
\]

where $D = \mathbf{Diag}(w)$ is a diagonal matrix, $U \in \mathbb{R}^{k\times n}$ represents the exposure of assets to risk factors and $R \in \mathbb{R}^{k\times k}$ is the covariance matrix of factors. Importantly, we normally have a small number of factors ($k \ll n$), so it is computationally much cheaper to find a Cholesky decomposition $R = F^TF$ with $F \in \mathbb{R}^{k\times k}$. Combined, this gives us $\Sigma = V^TV$ for

\[
V = \begin{bmatrix} D^{1/2}\\ FU \end{bmatrix}
\]

of dimensions $(n + k)\times n$. The dimensions of $V$ are larger than the dimensions of the Cholesky factors of $\Sigma$, but $V$ is very sparse, which usually results in a significant reduction of solution time. The resulting risk minimization conic problem can ultimately be written as:

\[
\begin{array}{ll}
\text{minimize} & d + t\\
\text{subject to} & (1/2, t, FUx) \in \mathcal{Q}_r^{k+2},\\
& (1/2, d, w_1^{1/2}x_1, \ldots, w_n^{1/2}x_n) \in \mathcal{Q}_r^{n+2},\\
& \mu^Tx \geq \delta,\\
& e^Tx = 1,\\
& x \geq 0.
\end{array}
\tag{10.20}
\]


Chapter 11

Bibliographic notes

The material on linear optimization is very basic, and can be found in any textbook. For further details we suggest a few standard references, [Chv83], [BT97] and [PS98], which all cover much more than is discussed here. [NW06] gives a more modern treatment of both theory and algorithmic aspects of linear optimization.

Material on conic quadratic optimization is based on the paper [LVBL98] and the books [BenTalN01], [BV04]. The papers [AG03], [ART03] contain additional theoretical and algorithmic aspects.

For more theory behind the power cone and the exponential cone we recommend the thesis [Cha09].

Much of the material about semidefinite optimization is based on the paper [VB96] and the books [BenTalN01], [BKVH07]. The section on optimization over nonnegative polynomials is based on [Nes99], [Hac03]. We refer to [LR05] for a comprehensive survey on semidefinite optimization and relaxations in combinatorial optimization.

The chapter on conic duality follows the exposition in [GartnerM12].

Mixed-integer optimization is based on the books [NW88], [Wil93]. Modeling of piecewise linear functions is described in the survey paper [VAN10].


Chapter 12

Notation and definitions

$\mathbb{R}$ and $\mathbb{Z}$ denote the sets of real numbers and integers, respectively. $\mathbb{R}^n$ denotes the set of $n$-dimensional vectors of real numbers (and similarly for $\mathbb{Z}^n$ and $\{0,1\}^n$); in most cases we denote such vectors by lower case letters, e.g., $a \in \mathbb{R}^n$. A subscripted value $a_i$ then refers to the $i$-th entry in $a$, i.e.,

\[
a = (a_1, a_2, \ldots, a_n).
\]

The symbol $e$ denotes the all-ones vector $e = (1, \ldots, 1)^T$, whose length always follows from the context.

All vectors are interpreted as column vectors. For $a, b \in \mathbb{R}^n$ we use the standard inner product,

\[
\langle a, b\rangle := a_1b_1 + a_2b_2 + \cdots + a_nb_n,
\]

which we also write as $a^Tb := \langle a, b\rangle$. We let $\mathbb{R}^{m\times n}$ denote the set of $m\times n$ matrices, and we use upper case letters to represent them, e.g., $B \in \mathbb{R}^{m\times n}$ is organized as

\[
B = \begin{bmatrix}
b_{11} & b_{12} & \ldots & b_{1n}\\
b_{21} & b_{22} & \ldots & b_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
b_{m1} & b_{m2} & \ldots & b_{mn}
\end{bmatrix}.
\]

For matrices $A, B \in \mathbb{R}^{m\times n}$ we use the inner product

\[
\langle A, B\rangle := \sum_{i=1}^m \sum_{j=1}^n a_{ij}b_{ij}.
\]

For a vector ๐‘ฅ โˆˆ R๐‘› we have

\[
\mathbf{Diag}(x) := \begin{bmatrix}
x_1 & 0 & \ldots & 0\\
0 & x_2 & \ldots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \ldots & x_n
\end{bmatrix},
\]


i.e., a square matrix with $x$ on the diagonal and zero elsewhere. Similarly, for a square matrix $X \in \mathbb{R}^{n\times n}$ we have

\[
\mathrm{diag}(X) := (x_{11}, x_{22}, \ldots, x_{nn}).
\]

A set $S \subseteq \mathbb{R}^n$ is convex if and only if for any $x, y \in S$ and $\theta \in [0, 1]$ we have

\[
\theta x + (1 - \theta)y \in S.
\]

A function $f : \mathbb{R}^n \mapsto \mathbb{R}$ is convex if and only if its domain $\mathrm{dom}(f)$ is convex and for all $x, y \in \mathrm{dom}(f)$ and $\theta \in [0, 1]$ we have

\[
f(\theta x + (1 - \theta)y) \leq \theta f(x) + (1 - \theta)f(y).
\]

A function $f : \mathbb{R}^n \mapsto \mathbb{R}$ is concave if and only if $-f$ is convex. The epigraph of a function $f : \mathbb{R}^n \mapsto \mathbb{R}$ is the set

\[
\mathrm{epi}(f) := \{(x, t) \mid x \in \mathrm{dom}(f),\ f(x) \leq t\},
\]

shown in Fig. 12.1.

Fig. 12.1: The shaded region is the epigraph of the function $f(x) = -\log(x)$.

Thus, minimizing over the epigraph,

\[
\begin{array}{ll}
\text{minimize} & t\\
\text{subject to} & f(x) \leq t,
\end{array}
\]

is equivalent to minimizing $f(x)$. Furthermore, $f$ is convex if and only if $\mathrm{epi}(f)$ is a convex set. Similarly, the hypograph of a function $f : \mathbb{R}^n \mapsto \mathbb{R}$ is the set

\[
\mathrm{hypo}(f) := \{(x, t) \mid x \in \mathrm{dom}(f),\ f(x) \geq t\}.
\]

Maximizing $f$ is equivalent to maximizing over the hypograph,

\[
\begin{array}{ll}
\text{maximize} & t\\
\text{subject to} & f(x) \geq t,
\end{array}
\]

and $f$ is concave if and only if $\mathrm{hypo}(f)$ is a convex set.


Bibliography

[AG03] F. Alizadeh and D. Goldfarb. Second-order cone programming. Math. Programming, 95(1):3–51, 2003.

[ART03] E. D. Andersen, C. Roos, and T. Terlaky. On implementing a primal-dual interior-point method for conic quadratic optimization. Math. Programming, February 2003.

[BT97] D. Bertsimas and J. N. Tsitsiklis. Introduction to linear optimization. Athena Scientific, 1997.

[BKVH07] S. Boyd, S. J. Kim, L. Vandenberghe, and A. Hassibi. A Tutorial on Geometric Programming. Optimization and Engineering, 8(1):67–127, 2007. Available at http://www.stanford.edu/~boyd/gp_tutorial.html.

[BV04] S. Boyd and L. Vandenberghe. Convex optimization. Cambridge University Press, 2004. http://www.stanford.edu/~boyd/cvxbook/.

[CS14] Venkat Chandrasekaran and Parikshit Shah. Conic geometric programming. In Information Sciences and Systems (CISS), 2014 48th Annual Conference on, 1–4. IEEE, 2014.

[Cha09] Peter Robert Chares. Cones and interior-point algorithms for structured convex optimization involving powers and exponentials. PhD thesis, Ecole polytechnique de Louvain, Université catholique de Louvain, 2009.

[Chv83] V. Chvátal. Linear programming. W.H. Freeman and Company, 1983.

[GartnerM12] B. Gärtner and J. Matousek. Approximation algorithms and semidefinite programming. Springer Science & Business Media, 2012.

[Hac03] Y. Hachez. Convex optimization over non-negative polynomials: structured algorithms and applications. PhD thesis, Université Catholique de Louvain, 2003.

[LR05] M. Laurent and F. Rendl. Semidefinite programming and integer programming. Handbooks in Operations Research and Management Science, 12:393–514, 2005.

[LVBL98] M. S. Lobo, L. Vandenberghe, S. Boyd, and H. Lebret. Applications of second-order cone programming. Linear Algebra Appl., 284:193–228, November 1998.

[NW88] G. L. Nemhauser and L. A. Wolsey. Integer programming and combinatorial optimization. John Wiley and Sons, New York, 1988.

[Nes99] Yu. Nesterov. Squared functional systems and optimization problems. In H. Frenk, K. Roos, T. Terlaky, and S. Zhang, editors, High Performance Optimization. Kluwer Academic Publishers, 1999.

[NW06] J. Nocedal and S. Wright. Numerical optimization. Springer Science, 2nd edition, 2006.

[PS98] C. H. Papadimitriou and K. Steiglitz. Combinatorial optimization: algorithms and complexity. Dover Publications, 1998.

[TV98] T. Terlaky and J.-Ph. Vial. Computing maximum likelihood estimators of convex density functions. SIAM J. Sci. Statist. Comput., 19(2):675–694, 1998.

[VB96] L. Vandenberghe and S. Boyd. Semidefinite programming. SIAM Rev., 38(1):49–95, March 1996.

[VAN10] J. P. Vielma, S. Ahmed, and G. Nemhauser. Mixed-integer models for nonseparable piecewise-linear optimization: unifying framework and extensions. Operations Research, 58(2):303–315, 2010.

[Wil93] H. P. Williams. Model building in mathematical programming. John Wiley and Sons, 3rd edition, 1993.

[Zie82] H. Ziegler. Solving certain singly constrained convex optimization problems in production planning. Operations Research Letters, 1982. URL: http://www.sciencedirect.com/science/article/pii/016763778290030X.

[BenTalN01] A. Ben-Tal and A. Nemirovski. Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications. MPS/SIAM Series on Optimization. SIAM, 2001.


Index

A
absolute value, 8, 22, 105
adjacency matrix, 72

B
basis pursuit, 9, 15
big-M, 81, 102
binary optimization, 70, 72
binary variable, 102
Boolean operator, 106

C
cardinality constraint, 112
Cholesky factorization, 23, 27, 52
complementary slackness, 95
condition number, 57
cone
  convex, 20
  distance to, 85
  dual, 89
  exponential, 37
  p-order, 31
  power, 29
  product, 88
  projection, 85
  quadratic, 19, 52
  rotated quadratic, 20, 52
  second-order, 19
  self-dual, 89
  semidefinite, 51
conic quadratic optimization, 19
constraint, 3
  disjunctive, 104
  indicator, 104
  redundant, 80
constraint attribution, 98
constraint satisfaction, 105
convex
  cone, 20
  function, 123
  set, 123
covariance matrix, 27, 119
CQO, 19
curve fitting, 66
cut, 72

D
determinant, 58
disjunctive constraint, 104
dual
  cone, 89
  function, 14, 93
  norm, 10
  problem, 14, 93, 116
duality
  conic, 87
  gap, 96
  linear, 13, 18
  quadratic, 115, 117
  strong, 16, 96
  weak, 16, 95
dynamical system, 48

E
eigenvalue optimization, 56
ellipsoid, 25, 26, 65, 114
entropy, 39
  maximization, 48
  relative, 40, 48
epigraph, 123
exponential
  cone, 37
  function, 39
  optimization, 37

F
factor model, 23, 28, 120
Farkas lemma, 12, 17, 90, 116, 118
feasibility, 88
feasible set, 3, 12, 88
Fermat-Torricelli point, 36
filter design, 69
function
  concave, 123
  convex, 123
  dual, 14, 93
  entropy, 39
  exponential, 39, 41
  Lagrange, 14, 92, 93, 115, 117
  Lambert W, 41
  logarithm, 39, 42
  lower semicontinuous, 109
  piecewise linear, 7, 108
  power, 23, 31, 44, 110
  sigmoid, 49
  softplus, 40

G
geometric mean, 33, 34, 36
geometric median, 35
geometric programming, 42
GP, 42
Grammian matrix, 52

H
halfspace, 5
harmonic mean, 24
Hermitian matrix, 62
hitting time, 48
homogenization, 10, 28
Huber loss, 77
hyperplane, 4
hypograph, 123

I
ill-posed, 77, 79, 88, 91
indicator
  constraint, 104
  variable, 102
infeasibility, 88
  certificate, 12, 90, 92, 116, 118
  locating, 13
inner product, 122
  matrix, 51
integer variable, 102
inventory, 29

K
Kullback-Leibler divergence, 40, 48

L
Lagrange function, 14, 92, 93, 115, 117
Lambert W function, 41
limit-feasibility, 91
linear
  matrix inequality, 55, 83, 99
  near dependence, 78
  optimization, 2
linear matrix inequality, 55, 83, 99
linear-fractional problem, 10
LMI, 55, 83, 99
LO, 2
log-determinant, 58
log-sum-exp, 40
log-sum-inv, 41
logarithm, 39, 42
logistic regression, 49
lowpass filter, 69

M
Markowitz model, 27
matrix
  adjacency, 72
  correlation, 65
  covariance, 27, 119
  Grammian, 52
  Hermitian, 62
  inner product, 51
  positive definite, 51
  pseudo-inverse, 53, 114
  semidefinite, 51, 114, 117
  variable, 51
MAX-CUT, 73
maximum, 7, 9, 44, 106
maximum likelihood, 36, 45
mean
  geometric, 33
  harmonic, 24
MIO, 101
monomial, 42

N
nearest correlation matrix, 65
network
  design, 110
  wireless, 46, 110
norm
  1-norm, 8, 105
  2-norm, 22
  dual, 10
  Euclidean, 22
  Frobenius, 45
  infinity norm, 9
  nuclear, 59
  p-norm, 31, 35
normal equation, 98

O
objective, 114
objective function, 3
optimal value, 3, 88
  unattainment, 10, 88
optimization
  binary, 70
  boolean, 72
  eigenvalue, 56
  exponential, 37
  linear, 2
  mixed integer, 101
  p-order cone, 31
  power cone, 29
  practical, 73
  quadratic, 113
  robust, 26
  semidefinite, 50
overflow, 81

P
penalization, 80
perturbation, 77
piecewise linear
  function, 7
  regression, 112
pOCO, 31
polyhedron, 5, 35, 114
polynomial
  curve fitting, 66
  nonnegative, 60, 61, 67
  trigonometric, 62, 64, 69
portfolio optimization
  cardinality constraint, 112
  constraint attribution, 98
  covariance matrix, 27, 119
  duality, 94
  factor model, 23, 28, 120
  fully invested, 47, 105
  market impact, 34
  Markowitz model, 27
  risk factor exposure, 120
  risk parity, 47
  Sharpe ratio, 28
  trading size, 112
  transaction costs, 111
posynomial, 42
power, 23, 44
power cone optimization, 29
power control, 46
precision, 82, 84
principal submatrix, 53
pseudo-inverse, 53, 54

Q
QCQO, 26, 117
QO, 22, 113
quadratic
  cone, 19
  duality, 115, 117
  optimization, 113
  rotated cone, 20
quadratic optimization, 22, 26

R
rate allocation, 46
redundant constraints, 80
regression
  linear, 98
  logistic, 49
  piecewise linear, 112
  regularization, 49
regularization, 49
relative entropy, 40, 48
relaxation
  semidefinite, 70, 72, 83
Riesz-Fejer Theorem, 63
risk parity, 47
robust optimization, 26
rotated quadratic cone, 20

S
scaling, 79
Schur complement, 54, 60, 83, 117
SDO, 50
second-order cone, 19, 23
semicontinuous variable, 103
semidefinite
  cone, 51
  optimization, 50
  relaxation, 70, 72, 83
set
  covering, 106
  packing, 106
  partitioning, 106
setup cost, 103
shadow price, 18
Sharpe ratio, 28
sigmoid, 49
signal processing, 69
signal-to-noise, 46
singular value, 58, 101
Slater constraint qualification, 97, 116, 118
SOCO, 19
softplus, 40
SOS1, 106
SOS2, 108
spectrahedron, 53
spectral factorization, 25, 51, 54
spectral radius, 57
sum of squares, 60, 63
symmetry breaking, 113

T
trading size, 112
transaction costs, 111
trigonometric polynomial, 62, 64, 69

U
unattainment, 78

V
variable
  binary, 102
  indicator, 102
  integer, 102
  matrix, 51
  semicontinuous, 103
verification, 84
violation, 84
volume, 34, 66