
  • 7/21/2019 JohnLee First Course in Linear Optimization

    1/172

A First Course in Linear Optimization: a dynamic book

    by

    Jon Lee

    First Edition

    ReEx PrEsS

https://sites.google.com/site/jonleewebpage/

    Jon Lee

    2013

    ReEx PrEsS

https://sites.google.com/site/jonleewebpage/

    This work is licensed under the

    Creative Commons Attribution 3.0 Unported License

(CC BY 3.0).

To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/

where you will see the summary information below and can click through to the full license information.


    Go Forward

This is a book on linear optimization, written in LaTeX. I started it, aiming it at the course IOE 510, a masters-level course at the University of Michigan. Use it as is, or adapt it to your course! It is an ongoing project. It is alive! It can be used, modified (the LaTeX source is available) and redistributed as anyone pleases, subject to the terms of the Creative Commons Attribution 3.0 Unported License (CC BY 3.0). Please take special note that you can share (copy and redistribute in any medium or format) and adapt (remix, transform, and build upon for any purpose, even commercially) this material, but you must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests that I endorse you or your use. If you are interested in endorsements, speak to my agent.

I started this material, but I don't control so much what you do with it. Control is sometimes overrated, and I am a control freak, so I should know!

I hope that you find this material useful. If not, I am happy to refund what you paid me.

    Jon Lee

Ann Arbor, Michigan
Started March 2013


    Preface

This book is a treatment of linear optimization meant for students who are reasonably comfortable with matrix algebra (or willing to get comfortable rapidly). It is not a goal of mine to teach anyone how to solve small problems by hand. My goals are to introduce: (i) the mathematics and algorithmics of the subject at a beginning mathematical level, (ii) algorithmically-aware modeling techniques, and (iii) high-level computational tools for studying and developing optimization algorithms (in particular, MATLAB and AMPL).

Proofs are given when they are important in understanding the algorithmics. I make free use of the inverse of a matrix. But it should be understood, for example, that B^{-1}b is meant as a mathematical expression for the solution of the linear system of equations Bx = b. We are not in any way suggesting that an efficient way to calculate the solution of a large (often sparse) linear system is to calculate an inverse! Also, I avoid the dual simplex algorithm (e.g., rather deftly in describing branch-and-bound), preferring to just think about the ordinary simplex algorithm applied to the dual problem. Again, my goal is not to describe the most efficient way to do matrix algebra!

Illustrations are woefully few. Though if Lagrange could not be bothered¹, who am I to aim higher? Still, some of the algorithms are illustrated in the modern way, with computer code.

The material that I present was mostly well known by the 1960s. As a student at Cornell in the late '70s and early '80s, I learned and got excited about linear optimization from Bob Bland, Les Trotter and Lou Billera, using [1] and [3]. The present book is a treatment of some of that material, with additional material on integer-linear optimization, most of which I originally learned from George Nemhauser and Les. There is nothing here on interior-point algorithms and the ellipsoid algorithm; don't tell Mike Todd!

    Jon Lee

Ann Arbor, Michigan
Started March 2013 (or maybe really in Ithaca, NY in 1979)


    Serious Acknowledgments

    Throw me some funding for this project, and I will acknowledge you. Seriously!

Many of the pictures in this book were found floating around on the web. I am making fair use of them as they float through this document. Of course, I gratefully acknowledge those who own them.

http://www.copyright.gov/fls/fl102.html

    Dedication

For students (even Ohio students). Not for publishers; not this time. Maybe next time.


    The Nitty Gritty

You can always get all released editions of this book (in .pdf format) and the materials to produce them (LaTeX source, etc.) from my web page.

If you decide that you need to recompile the LaTeX to change something, then you should be aware of two things. First, there is no user's manual nor help desk. Secondly, I use a lot of include files, for incorporating code listings and output produced by software like MATLAB and AMPL. Make sure that you understand where everything is pulled from if you recompile the LaTeX. In particular, if you change anything in the directory that the LaTeX file is in, something can well change in the .pdf output. For example, MATLAB and AMPL scripts are pulled into the book, and likewise for the output of those scripts. If you want to play, and don't want to change those parts of the book, play in another directory.

    I make significant use of software. Everything seems to work with:

    MATLAB R2013a (8.1.0.04)

    AMPL 20100429

    CPLEX 12.4.0.0

    Mathematica 9.0.1.0

WinEdt 6.0

MiKTeX 2.9

Use of older versions is inexcusable. Newer versions will surely break things. I use lots of LaTeX packages. I couldn't possibly gather the version numbers of those; I do have a day job!


6 Sensitivity Analysis
  6.1 Right-Hand Side Changes
    6.1.1 Local analysis
    6.1.2 Global analysis
    6.1.3 A brief detour: the column geometry for the Simplex Algorithm
  6.2 Objective Changes
    6.2.1 Local analysis
    6.2.2 Global analysis
    6.2.3 Local sensitivity analysis with a modeling language
  6.3 Exercises

7 Large-Scale Linear Optimization
  7.1 Decomposition
    7.1.1 The master reformulation
    7.1.2 Solution of the Master via the Simplex Algorithm
  7.2 Lagrangian Relaxation
    7.2.1 Lagrangian bounds
    7.2.2 Solving the Lagrangian Dual
  7.3 The Cutting-Stock Problem
    7.3.1 Formulation via cutting patterns
    7.3.2 Solution via continuous relaxation
    7.3.3 The knapsack subproblem
    7.3.4 Applying the Simplex Algorithm
    7.3.5 A demonstration implementation
  7.4 Exercises

8 Integer-Linear Optimization
  8.1 Integrality for Free
    8.1.1 Some structured models
    8.1.2 Unimodular basis matrices and total unimodularity
    8.1.3 Consequences of total unimodularity
  8.2 Modeling Techniques
    8.2.1 Disjunctions
    8.2.2 Forcing constraints
    8.2.3 Piecewise-linear univariate functions
  8.3 Cutting Planes
    8.3.1 Pure-integer problems
    8.3.2 Mixed-integer problems
  8.4 Branch-and-Bound
  8.5 Exercises

Appendices
  A.1 LaTeX template
  A.2 MATLAB for deconstructing the Simplex Algorithm
  A.3 AMPL for uncapacitated facility-location problem

End Notes

Bibliography

Index of definitions, and only definitions (lazy and maintainable)


    Chapter 1

Let's Get Started

Our main goals in this chapter are as follows:

• Introduce some terminology associated with linear optimization.

• Describe elementary techniques for transforming any linear-optimization problem to one in a standard form.

• Introduce the Weak Duality Theorem.

• Review ideas from linear algebra that we will make use of later.

    1.1 Linear Optimization and Standard Form

Linear optimization is the study of the mathematics and algorithms associated with minimizing or maximizing a real linear objective function of a finite number of real variables, subject to a finite number of linear constraints, each being that a real linear function on these variables be ≤, =, or ≥ a constant. A polyhedron is the solution set of a finite number of linear constraints; so we are studying optimization of a linear function on a polyhedron.

    Definition 1.1

A solution of a linear-optimization problem is an assignment of real values to the variables. A solution is feasible if it satisfies the linear constraints. A solution is optimal if there is no feasible solution with better objective value. The set of feasible solutions (which is a polyhedron) is the feasible region.



number of columns of A_{ik} agrees with the number of rows of B_{kj}, then

\[
AB = \begin{bmatrix}
\sum_{k=1}^{p} A_{1k}B_{k1} & \cdots & \sum_{k=1}^{p} A_{1k}B_{kn} \\
\sum_{k=1}^{p} A_{2k}B_{k1} & \cdots & \sum_{k=1}^{p} A_{2k}B_{kn} \\
\vdots & \ddots & \vdots \\
\sum_{k=1}^{p} A_{mk}B_{k1} & \cdots & \sum_{k=1}^{p} A_{mk}B_{kn}
\end{bmatrix}.
\]

That is, block i,j of the product is \(\sum_{k=1}^{p} A_{ik}B_{kj}\), and A_{ik}B_{kj} is understood as ordinary matrix multiplication.

For vectors x¹, x², ..., x^p ∈ ℝⁿ and scalars λ₁, λ₂, ..., λ_p, the vector \(\sum_{i=1}^{p} \lambda_i x^i\) is a linear combination of x¹, x², ..., x^p. The linear combination is trivial if all λᵢ = 0. The vectors x¹, x², ..., x^p ∈ ℝⁿ are linearly independent if the only representation of the zero vector in ℝⁿ as a linear combination of x¹, x², ..., x^p is trivial. The set of all linear combinations of x¹, x², ..., x^p is the vector-space span of {x¹, x², ..., x^p}. The dimension of a vector space V, denoted dim(V), is the maximum number of linearly-independent vectors in it. Equivalently, it is the minimum number of vectors needed to span the space.

A set of dim(V) linearly-independent vectors that spans a vector space V is a basis for V. If V is the vector-space span of {x¹, x², ..., x^p}, then there is a subset of {x¹, x², ..., x^p} that is a basis for V. Moreover, every linearly-independent subset of such a V can be extended to a basis for V using vectors from x¹, x², ..., x^p.

The span of the rows of a matrix A ∈ ℝ^{m×n} is the row space of A, denoted r.s.(A) := {yᵀA : y ∈ ℝᵐ}. Similarly, the span of the columns of a matrix A is the column space of A, denoted c.s.(A) := {Ax : x ∈ ℝⁿ}. It is a simple fact that, for a matrix A, the dimension of its row space and the dimension of its column space are identical, this common number being called the rank of A. The matrix A has full row rank if its number of rows is equal to its rank; that is, if its rows are linearly independent. Similarly, the matrix A has full column rank if its number of columns is equal to its rank; that is, if its columns are linearly independent.

Besides the row and column spaces of a matrix A ∈ ℝ^{m×n}, there is another very important vector space associated with A. The null space of A is the set of vectors having 0 dot product with all rows of A, denoted n.s.(A) := {x ∈ ℝⁿ : Ax = 0}.

    An important result is the following theorem relating the dimensions of the row

    and null spaces of a matrix.

    Theorem 1.3 (Rank-Nullity Theorem)

If A is a matrix with n columns, then

\[ \dim(\text{r.s.}(A)) + \dim(\text{n.s.}(A)) = n. \]
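As a small numerical illustration of the theorem (a sketch of mine, not the book's; the matrix and the rank helper are made up), we can compute dim(r.s.(A)) by Gaussian elimination, and the theorem then hands us dim(n.s.(A)) for free:

```python
from fractions import Fraction

def rank(A):
    """dim(r.s.(A)) via Gaussian elimination, with exact rational arithmetic."""
    M = [[Fraction(x) for x in row] for row in A]
    rk = 0
    for col in range(len(M[0])):
        # look for a pivot in this column, at or below row rk
        piv = next((i for i in range(rk, len(M)) if M[i][col] != 0), None)
        if piv is None:
            continue  # no pivot in this column
        M[rk], M[piv] = M[piv], M[rk]  # interchange rows (an elementary row operation)
        for i in range(rk + 1, len(M)):
            f = M[i][col] / M[rk][col]
            M[i] = [a - f * b for a, b in zip(M[i], M[rk])]
        rk += 1
    return rk

A = [[1, 2, 0, 1],
     [2, 4, 1, 3],
     [3, 6, 1, 4]]   # row 3 = row 1 + row 2, so the rank is 2
n = 4
r = rank(A)
print(r, n - r)   # dim(r.s.(A)) = 2, so dim(n.s.(A)) = n - r = 2
```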


1.3. Linear-Algebra Review

There are some simple operations on a matrix that preserve its row and null spaces. The following operations are elementary row operations:

1. multiply a row by a nonzero scalar;

2. interchange a pair of rows;

3. add a scalar multiple of a row to another row;

4. delete a row that is identically zero.

There is one more operation that we allow, which is really one of convenience rather than mathematics. It is convenient to be able to permute columns while also permuting the corresponding column indices. That is, if A ∈ ℝ^{m×n}, we regard the columns as labeled, in left-to-right order: 1, 2, ..., n. So we have

A = [A₁, A₂, ..., Aₙ].

It can be convenient to have a permutation σ₁, σ₂, ..., σₙ of 1, 2, ..., n, and then write

[A_{σ₁}, A_{σ₂}, ..., A_{σₙ}].

This matrix is really equivalent to A, because we regard its columns as labeled by σ₁, σ₂, ..., σₙ rather than 1, 2, ..., n. Put another way, when we write a matrix, the order of the columns is at our convenience, but the labels of the columns are determined by the order that we choose for placing the columns.

The identity matrix I_r in ℝ^{r×r} is the matrix having 1 as every diagonal element and 0 as every off-diagonal element. Via elementary row operations, any matrix A that is not all zero can be transformed into one of the form

[I_r, M].

Using corresponding operations on the associated system of equations, this is known as Gauss-Jordan elimination.

For an r × r matrix B of rank r, there is a unique r × r matrix B^{-1} such that B^{-1}B = I_r. For this reason, such a matrix B is called invertible, and B^{-1} is called the inverse of B. According to the definition, B^{-1}B = I_r, but we also have BB^{-1} = I_r. Also, (Bᵀ)^{-1} = (B^{-1})ᵀ, and if A and B are both invertible, then (AB)^{-1} = B^{-1}A^{-1}.

Noting that

\[ B^{-1}[B, I_r] = [I_r, B^{-1}], \]

we see that there is a nice way to compute the inverse of a matrix B using elementary row operations. That is, we perform elementary row operations on

[B, I_r]

so that we have the form

[I_r, M],

and the resulting matrix M is B^{-1}.
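That procedure is short enough to spell out in code. The following Python sketch (mine, not the book's; the example matrix is made up) row-reduces [B, I_r] to [I_r, M] with exact rational arithmetic and returns M = B^{-1}:

```python
from fractions import Fraction

def inverse(B):
    """Invert B by row-reducing [B, I_r] to [I_r, B^{-1}] (Gauss-Jordan)."""
    r = len(B)
    # build the augmented matrix [B, I_r] over the rationals
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(r)]
         for i, row in enumerate(B)]
    for col in range(r):
        # find a pivot row and swap it up (an elementary row operation)
        piv = next(i for i in range(col, r) if M[i][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        # scale the pivot row so the pivot becomes 1
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        # eliminate the pivot column from every other row
        for i in range(r):
            if i != col and M[i][col] != 0:
                f = M[i][col]
                M[i] = [a - f * b for a, b in zip(M[i], M[col])]
    return [row[r:] for row in M]   # the right half is B^{-1}

B = [[2, 1], [5, 3]]
Binv = inverse(B)
assert Binv == [[3, -1], [-5, 2]]
```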


Next, we define the determinant of a square r × r matrix B, which we denote det(B). We define the determinant in a nonstandard but useful manner, via a recursive formula known as Laplace expansion.²

If r = 1, then B = (b₁₁), and we define det(B) := b₁₁. For r > 1, choose any fixed column j of B, and we define

\[ \det(B) = \sum_{i=1}^{r} (-1)^{i+j}\, b_{ij} \det(B^{ij}), \]

where B^{ij} is the (r − 1) × (r − 1) matrix obtained by deleting row i and column j of B. It is a fact that this is well defined; that is, the value of det(B) does not depend on the choice of j (taken at each step of the recursion). Moreover, we have det(B) = det(Bᵀ), so we could as well choose any fixed row i of B, and we have

\[ \det(B) = \sum_{j=1}^{r} (-1)^{i+j}\, b_{ij} \det(B^{ij}), \]

resulting in the same value for det(B).

An interesting observation links det(B) with elementary row operations. Consider performing elementary row operations on

[B, I_r]

to obtain

[I_r, B^{-1}].

As we carry out the elementary row operations, we sometimes multiply a row by a nonzero scalar. If we accumulate the product of all of these multipliers, the result is det(B^{-1}); equivalently, the reciprocal is det(B).
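The recursion translates directly into code. A minimal Python sketch (mine, not the book's), expanding along the first column:

```python
def det(B):
    """Determinant via Laplace expansion along column 1.

    Illustration only: this does on the order of r! work, so it is for
    small matrices and proofs of concept, not serious computation."""
    r = len(B)
    if r == 1:
        return B[0][0]
    total = 0
    for i in range(r):
        # delete row i and column 1 to get the minor B^{i1}
        minor = [row[1:] for k, row in enumerate(B) if k != i]
        total += (-1) ** i * B[i][0] * det(minor)
    return total

print(det([[1, 2], [3, 4]]))                      # 1*4 - 2*3 = -2
print(det([[2, 0, 0], [0, 3, 0], [0, 0, 4]]))     # 24
```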

Finally, for an invertible n × n matrix B and a vector b, we can express the unique solution x of the system Bx = b via a formula involving determinants. Cramer's rule is the following formula:

\[ x_j = \frac{\det(B(j))}{\det(B)}, \quad \text{for } j = 1, 2, \ldots, n, \]

where B(j) is defined to be the matrix B with its j-th column replaced by b. It is worth emphasizing that direct application of Cramer's rule is not to be thought of as a useful algorithm for computing the solution of a system of equations. But it can be very useful to have in the proof toolbox.³
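A tiny Python sketch of Cramer's rule (mine, not the book's; the 2 × 2 system is made up, and det is a Laplace-expansion helper included so the snippet is self-contained):

```python
from fractions import Fraction

def det(B):
    """Determinant via Laplace expansion along the first column."""
    if len(B) == 1:
        return B[0][0]
    return sum((-1) ** i * B[i][0] * det([r[1:] for k, r in enumerate(B) if k != i])
               for i in range(len(B)))

def cramer(B, b):
    """Solve Bx = b by Cramer's rule (illustration only; far too slow in general)."""
    d = det(B)
    n = len(B)
    xs = []
    for j in range(n):
        # B(j): the matrix B with its j-th column replaced by b
        Bj = [row[:j] + [b[i]] + row[j + 1:] for i, row in enumerate(B)]
        xs.append(Fraction(det(Bj), d))
    return xs

B = [[2, 1], [1, 3]]
b = [5, 10]
x = cramer(B, b)   # solves 2x1 + x2 = 5, x1 + 3x2 = 10
# check that Bx really equals b
assert [sum(B[i][j] * x[j] for j in range(2)) for i in range(2)] == b
```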

1.4 Exercises

Exercise 1.0 (Learn LaTeX)

Learn to use LaTeX for writing all of your homework solutions. Personally, I use MiKTeX, which is an implementation of LaTeX for Windows. Specifically, within MiKTeX, I am using pdfLaTeX (it only matters for certain things like including graphics and also pdf into a document). I find it convenient to use the editor WinEdt, which is very LaTeX friendly. A good book on LaTeX is


In A.1 there is a template to get started. Also, there are plenty of tutorials and beginner's guides on the web.

    Exercise 1.1 (Convert to standard form)

Give an original example (i.e., with actual numbers) to demonstrate that you know how to transform a general linear-optimization problem to one in standard form.

    Exercise 1.2 (Weak Duality example)

    Give an original example to demonstrate the Weak Duality Theorem.

Exercise 1.3 (Convert to ≤ form)

Describe a general recipe for transforming an arbitrary linear-optimization problem into one in which all of the linear constraints are of ≤ type.

Exercise 1.4 (m + 1 inequalities)

Prove that the system of m equations in n variables Ax = b is equivalent to the system Ax ≤ b augmented by only one additional linear inequality; that is, a total of only m + 1 inequalities.

    Exercise 1.5 (Weak duality for another form)

Give and prove a Weak Duality Theorem for

max cᵀx
    Ax ≤ b;      (P)
    x ≥ 0.

HINT: Convert (P) to a standard-form problem, and then apply the ordinary Weak Duality Theorem for standard-form problems.

    Exercise 1.6 (Weak duality for a complicated form)

Give and prove a Weak Duality Theorem for

min cᵀx + fᵀw
    Ax + Bw ≥ b;
    Dx = g;      (P)
    x ≥ 0, w ≤ 0.

HINT: Convert (P) to a standard-form problem, and then apply the ordinary Weak Duality Theorem for standard-form problems.


    Exercise 1.7 (Weak duality for a complicated form with MATLAB)

The MATLAB code below makes and solves an instance of (P) from Exercise 1.6. Study the code to see how it works. Now, extend the code to solve the dual of (P). Also, after converting (P) to standard form (as indicated in the HINT for Exercise 1.6), use MATLAB to solve that problem and its dual. Make sure that you get the same optimal value for all of these problems.

    % DualityWithMatlab1.m // Jon Lee

    %

    n1=7

    n2=15

    m1=2

    m2=4

    rng('default');

    rng(1); % set seed

    A = rand(m1,n1);

    B = rand(m1,n2);

    D = rand(m2,n1);

    % Organize the situation

    % so that the problem has a feasible solution

    x = rand(n1,1);

w = -rand(n2,1);

b = A*x + B*w - 0.01 * rand(m1,1);

    g = D*x;

    % Organize the situation

    % so that the dual problem has a feasible solution

    y = rand(m1,1);

pi = rand(m2,1) - rand(m2,1);

c = A'*y + D'*pi + 0.01 * rand(n1,1);

f = B'*y - 0.01 * rand(n2,1);

    % Here is how the 'linprog' function works:

    %

    % [v,z,exitflag] = linprog(c,A,b,Aeq,beq,lb,ub)

% minimizes c'v,

% with constraints A*v <= b, Aeq*v = beq, and bounds lb <= v <= ub.


% Study the following part carefully. This is how we set up

% a block problem. [c ; f] stacks vertically the vectors c and f.

% [A B] concatenates A and B horizontally. zeros(m2,n2) is an

% m2-by-n2 matrix of 0's. ones(n1,1) is an n1-vector of 1's, and

% Inf is +infinity, so Inf*ones(n1,1) is a vector of length n1

% with all components equal to +infinity. In this way we have

% infinite upper bounds on some variables.

[v,z,exitflag] = linprog([c ; f],-[A B],-b,[D zeros(m2,n2)],g, ...

    [zeros(n1,1);-Inf*ones(n2,1)],[Inf*ones(n1,1);zeros(n2,1)])

    if (exitflag < 1)

    disp('fail 1: LP did not have an optimal solution');

    return;

    end;

    % Extract the parts of the solution v to the formulation vectors

    x = v(1:n1,1)

    w = v(n1+1:n1+n2,1)


    Chapter 2

    Modeling

    Our goals in this chapter are as follows:

• Learn some basic linear-optimization modeling techniques.

• Learn how to use an optimization modeling language like AMPL.

    2.1 A Production Problem

We suppose that a company has m resources, available in quantities bᵢ, i = 1, 2, ..., m, and n production activities, with per-unit profits c_j, j = 1, 2, ..., n. Each unit of activity j consumes a_{ij} units of resource i. Each production activity can be carried out at any nonnegative level, as long as the resource availabilities are respected. We assume that any unused resource quantities have no value and can be disposed of at no cost. The problem is to find a profit-maximizing production plan. We can formulate this problem as the linear-optimization problem

max cᵀx
    Ax ≤ b;      (P)
    x ≥ 0,

where b := (b₁, b₂, ..., b_m)ᵀ, c := (c₁, c₂, ..., c_n)ᵀ, A ∈ ℝ^{m×n} is the matrix of a_{ij}, and x is a vector of variables in ℝⁿ.
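Before handing (P) to any solver, it can be instructive to check a candidate plan directly. The following Python sketch (mine, not the book's; all of the data are made up for illustration) tests feasibility (x ≥ 0 and Ax ≤ b) and evaluates the profit cᵀx:

```python
# Hypothetical data: m = 2 resources, n = 3 activities.
c = [3, 2, 4]          # per-unit profits c_j
A = [[1, 2, 1],        # a_ij = units of resource i consumed per unit of activity j
     [2, 1, 3]]
b = [10, 12]           # resource availabilities b_i

def feasible(x):
    """Check that the plan x satisfies x >= 0 and Ax <= b."""
    return all(xj >= 0 for xj in x) and all(
        sum(aij * xj for aij, xj in zip(row, x)) <= bi
        for row, bi in zip(A, b))

def profit(x):
    """The objective value c'x of the plan x."""
    return sum(cj * xj for cj, xj in zip(c, x))

x = [2, 3, 1]          # a candidate production plan
print(feasible(x))     # True: A*x = [9, 10] <= [10, 12]
print(profit(x))       # 16
```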


2.4. An Optimization Modeling Language

    3 35 ;

param n := 2 ; # NUMBER OF COLUMNS

    param c := # OBJECTIVE VECTOR

    # j c(j)

    1 3

    2 2 ;

    param a: 1 2 := # CONSTRAINT MATRIX

    # a(i,j):

    1 8 5

    2 8 6

    3 8 7 ;

Finally, we can have production.run:

    # production.run // Jon Lee

    #

    reset; # CLEAR FROM ANY PREVIOUS SOLVE

    option show_stats 1; # SET AMPL OPTION TO CONTROL OUTPUT

    option presolve 0; # COMMENT OUT TO SOLVE BIGGER MODELS

    option solver cplex; # AMPL OPTION TO CHOOSE AN AVAILABLE SOLVER

    option cplex_options 'display=2 presolve=0'; # SETTING SOLVER OPTIONS

    # DELETING "presolve=0" MAY HELP TO SOLVE BIGGER MODELS

    model production.mod; # READ IN THE MODEL

    data production.dat; # READ IN THE DATA AND INSTANTIATE THE MODEL

    solve; # PASS THE INSTANCE TO THE SOLVER

    display z > production.out; # WRITE THE OPTIMAL OBJECTIVE VALUE TO A FILE

    # NOTE THAT '>' DELETES THE FILE FIRST, IF IT EXISTS

    for {j in COLS}

{ # USING C-STYLE COMMANDS TO OUTPUT OPTIMAL SOLUTION TO FILE

    printf "x(%i) = %f\n", j, x[j] >> production.out;

    # NOTE THAT '>>' APPENDS TO THE FILE, IF IT EXISTS

    };

At the University of Michigan, College of Engineering, we have AMPL with the solver CPLEX installed on the CAEN (Computer-Aided Engineering Network: caen.engin.umich.edu) machines running Red Hat Linux.

    We check that we have the three files needed to run our Production Problem:

    caen-vnc02% ls

    production.dat production.mod production.run

Next, we invoke AMPL from a command prompt, and invoke the production.run file:


There is a subtlety concerning the summations in the constraints. In each constraint, the node i is fixed. So sum{(i,j) in ARCS} is really summing over j in NODES such that (i,j) is in ARCS; that is, the arcs whose tail is i. Similarly, sum{(j,i) in ARCS} is really summing over j in NODES such that (j,i) is in ARCS; that is, the arcs whose head is i.
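The same tail/head bookkeeping can be spelled out in a few lines of Python (an illustration of the set logic only, not of AMPL's semantics), using the arc set from flow.dat:

```python
# The arc set from flow.dat, as (tail, head) pairs.
ARCS = [(1, 2), (1, 3), (2, 4), (2, 5), (3, 5), (4, 5), (4, 6), (5, 6)]

i = 2
out_arcs = [(u, v) for (u, v) in ARCS if u == i]  # arcs whose tail is i
in_arcs = [(u, v) for (u, v) in ARCS if v == i]   # arcs whose head is i
print(out_arcs)  # [(2, 4), (2, 5)]
print(in_arcs)   # [(1, 2)]
```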

The flow.dat file can be:

    # flow.dat // Jon Lee

    #

param: NODES: b :=

1 12

2 6

3 -2

4 0

5 -9

6 -7 ;

    param: ARCS: upper c :=

    1,2 6 2

    1,3 8 5

    2,4 5 3

    2,5 7 12

    3,5 5 9

    4,5 8 2

    4,6 5 0

    5,6 5 4 ;

We leave it to the gentle reader to devise an appropriate file flow.run (the optimal objective function value is 25).

    2.5 Exercises

    Exercise 2.1 (Dual in AMPL)

Without changing the file production.dat, use AMPL to solve the dual of the Production Problem example, as described in Section 2.1. You will need to modify production.mod and production.run.

    Exercise 2.2 (Sparse solution for equations with AMPL)

In some application areas, it is interesting to find a sparsest solution to a system of equations; that is, one with as few non-zeros as possible. It is well known that a 1-norm minimizing solution is a good heuristic for finding a sparse solution. Using AMPL, try this idea out on some large examples.

Exercise 2.3 (Bloody AMPL)

A transportation problem is a special kind of (single-commodity min-cost) network-flow problem. There are certain nodes v, called supply nodes, which have net supply


3.1. Basic Feasible Solutions and Extreme Points

We can calculate

\[
A_\beta^{-1} A_\eta = \begin{bmatrix} 1 & 4/3 \\ 1 & 2/3 \\ 2 & 10/3 \\ 1 & 2/3 \end{bmatrix},
\qquad A_\beta^{-1} b = (1, 3, 3, 3/10)^{\mathsf T},
\]

and then we have plotted this in Figure 3.1. The plot has x_{η₁} = x₃ as the abscissa, and x_{η₂} = x₅ as the ordinate. In the plot, besides the nonnegativity of the variables x₃ and x₅, the four inequalities of [A_β^{-1}A_η] x_η ≤ A_β^{-1}b are labeled with their slack variables; these are the basic variables x₁, x₂, x₄, x₆. The correct matching of the basic variables to the inequalities of [A_β^{-1}A_η] x_η ≤ A_β^{-1}b is simply achieved by seeing that the i-th inequality has slack variable x_{βᵢ}.

The feasible region is colored cyan, while basic feasible solutions project to green points and basic infeasible solutions project to red points. We can see that the basic solution associated with the current basis is feasible, because the origin (corresponding to the nonbasic variables being set to 0) is feasible.

A set S ⊆ ℝⁿ is a convex set if it contains the entire line segment between every pair of points in S. That is,

λx¹ + (1 − λ)x² ∈ S, whenever x¹, x² ∈ S and 0 < λ < 1.


That is, η is the list of indices for the zero variables of x̄. Together, β and η partition {1, 2, ..., n}.

Our goal is to construct a basic partition β, η, so that the associated basic solution is precisely x̄.

The first thing that we will establish is that the columns of A_β are linearly independent. We will do that by contradiction. Suppose that they are linearly dependent. That is, there exists z̄_β ∈ ℝ^{|β|}, different from the zero vector, such that A_β z̄_β = 0. Next we extend z̄_β to a vector z̄ ∈ ℝⁿ, by letting z̄_η = 0. Clearly Az̄ = A_β z̄_β + A_η z̄_η = 0 + A_η 0 = 0; that is, z̄ is in the null space of A. Next, let

x¹ := x̄ + εz̄

and

x² := x̄ − εz̄,

with ε chosen to be sufficiently small so that x¹ and x² are nonnegative. Because z̄ is only nonzero on the β coordinates (where x̄ is positive), we can choose an appropriate ε. Notice that x¹ ≠ x², because z̄_β, and hence z̄, is not the zero vector. Now, it is easy to verify that Ax¹ = A(x̄ + εz̄) = Ax̄ + εAz̄ = b + 0 = b, and similarly Ax² = b. Therefore, x¹ and x² are feasible solutions of (P). Also, ½x¹ + ½x² = ½(x̄ + εz̄) + ½(x̄ − εz̄) = x̄. So x̄ is on the interior (actually it is the midpoint) of the line segment between x¹ and x², in contradiction to x̄ being an extreme point of the feasible region of (P). Therefore, it must be that the columns of A_β are linearly independent.

In particular, we can conclude that |β| ≤ m, since we assume that A ∈ ℝ^{m×n} has full row rank. If |β| < m, we choose arbitrarily m − |β| columns of A_η to append to A_β, to form a matrix A_β̂ having m linearly-independent columns. As usual, we let η̂ be a list of the n − m indices not in β̂. By definition, the associated basic solution x̂ has x̂_η̂ = 0, and we observe that it is the unique solution to the system of equations Ax = b having x_η̂ = 0. But x̄_η̂ = 0, because x̄_η̂ is a subvector of x̄_η = 0. Therefore, x̄ = x̂. That is, x̄ is a basic solution of (P).

    Taken together, these last two results give us the main result of this section.

Corollary 3.4

For a feasible point x̄ of standard-form (P), x̄ is extreme if and only if x̄ is a basic solution.


CHAPTER 3. ALGEBRA VERSUS GEOMETRY

If the basic direction $\bar{z}$ is a ray, then we call it a basic feasible ray. We have already seen that $A\bar{z} = \mathbf{0}$. Furthermore, $\bar{z} \ge \mathbf{0}$ if and only if $\bar{A}_{\eta_j} := A_\beta^{-1} A_{\eta_j} \le \mathbf{0}$.

Therefore, we have the following result:

Theorem 3.6
The basic direction $\bar{z}$ is a ray of the feasible region of (P) if and only if $\bar{A}_{\eta_j} \le \mathbf{0}$.

Recall, further, that $\bar{z}$ is a basic feasible direction relative to the basic feasible solution $\bar{x}$ if $\bar{x} + \epsilon\bar{z}$ is feasible, for sufficiently small positive $\epsilon \in \mathbb{R}$. Therefore, if $\bar{z}$ is a basic feasible ray, relative to the basic partition, and $\bar{x}$ is the basic feasible solution relative to the same basic partition, then $\bar{z}$ is a basic feasible direction relative to $\bar{x}$.
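The ray test of Theorem 3.6 is easy to carry out computationally. In the sketch below (a made-up instance in plain Python), the basis matrix is the identity, so $\bar{A}_{\eta_j}$ is just the nonbasic column itself; since it is nonpositive, the corresponding basic direction is a ray.

```python
from fractions import Fraction as F

# Made-up standard-form data with A_beta = I (columns 0 and 1 basic):
A = [[F(1), F(0), F(-1)],
     [F(0), F(1), F(-2)]]
m, n = 2, 3
eta_j = 2                               # the single nonbasic index

# With A_beta = I, Abar_j = A_beta^{-1} A_{eta_j} is just column eta_j:
Abar_j = [A[i][eta_j] for i in range(m)]

# Basic direction: component 1 on the entering variable, -Abar_j on the basics.
z = [-Abar_j[0], -Abar_j[1], F(1)]

Az = [sum(A[i][j] * z[j] for j in range(n)) for i in range(m)]
is_ray = all(v <= 0 for v in Abar_j)    # Theorem 3.6 test
```

Here `Az` comes out to the zero vector and `z` is nonnegative, so the direction is indeed a ray of the feasible region.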

A ray $\bar{z}$ of a convex set $S$ is an extreme ray if we cannot write

$\bar{z} = z^1 + z^2$, with $z^1$ and $z^2$ being rays of $S$ satisfying $z^1 \neq \lambda z^2$ for all $\lambda > 0$.

Similarly to the correspondence between basic feasible solutions and extreme points for standard-form problems, we have the following two results.

    Theorem 3.7

    Every basic feasible ray of standard-form (P) is an extreme ray of its feasible region.

    Theorem 3.8

Every extreme ray of the feasible region of standard-form (P) is a positive multiple of a basic feasible ray.

    3.4 Exercises

    Exercise 3.1 (Illustrate algebraic and geometric concepts)

Make a small example, say with six variables and four equations, to fully illustrate all of the concepts in this chapter. In A.2, there are MATLAB scripts and functions that could be very helpful here.

Exercise 3.2 (Basic feasible rays are extreme rays)

Prove Theorem 3.7.

Exercise 3.3 (Extreme rays are positive multiples of basic feasible rays)

If you are feeling very ambitious, prove Theorem 3.8.


CHAPTER 4. THE SIMPLEX ALGORITHM

    Definition 4.3

The vector of reduced costs associated with basis $\beta$ is

$\bar{c}' := c' - c'_\beta A_\beta^{-1} A = c' - \bar{y}'A$.

    Lemma 4.4

The dual solution $\bar{y}$ of (D) associated with basis $\beta$ is feasible for (D) if

$\bar{c}_\eta \ge \mathbf{0}$.

Proof. Using the definitions of $\bar{y}$ and $\bar{c}$, the condition $\bar{c}_\eta \ge \mathbf{0}$ is equivalent to

$\bar{y}'A_\eta \le c'_\eta$.

The definition of $\bar{y}$ gives us $\bar{y}'A_\beta = c'_\beta$ (equivalently, $\bar{c}_\beta = \mathbf{0}$).

So we have $\bar{y}'A = \bar{y}'[A_\beta, A_\eta] \le (c'_\beta, c'_\eta) = c'$.

Hence $\bar{y}$ is feasible for (D).

    Theorem 4.5 (Weak Optimal Basis Theorem)

If $\beta$ is a feasible basis and $\bar{c}_\eta \ge \mathbf{0}$, then the primal solution $\bar{x}$ and the dual solution $\bar{y}$ associated with $\beta$ are optimal.

Proof. We have already observed that $c'\bar{x} = \bar{y}'b$ for the pair of primal and dual solutions associated with the basis $\beta$. If these solutions $\bar{x}$ and $\bar{y}$ are feasible for (P) and (D), respectively, then by weak duality these solutions are optimal.

We can also take (P) and transform it into an equivalent form that is quite revealing. Clearly, (P) is equivalent to

min $c'_\beta x_\beta + c'_\eta x_\eta$
$A_\beta x_\beta + A_\eta x_\eta = b$;
$x_\beta \ge \mathbf{0}$, $x_\eta \ge \mathbf{0}$.

Next, multiplying the equations on the left by $A_\beta^{-1}$, we see that they are equivalent to

$x_\beta + A_\beta^{-1} A_\eta x_\eta = A_\beta^{-1} b$.

We can also see this as

$x_\beta = A_\beta^{-1} b - A_\beta^{-1} A_\eta x_\eta$.


Using this equation to substitute for $x_\beta$ in the objective function, we are led to the linear objective function

$c'_\beta A_\beta^{-1} b + \min\, (c'_\eta - c'_\beta A_\beta^{-1} A_\eta)\, x_\eta = c'_\beta A_\beta^{-1} b + \min\, \bar{c}'_\eta x_\eta$,

which is equivalent to the original one on the set of points satisfying $Ax = b$. In this equivalent form, it is now solely expressed in terms of $x_\eta$. Now, if $\bar{c}_\eta \ge \mathbf{0}$, the best we could hope for in minimizing is to set $x_\eta = \mathbf{0}$. But the unique solution having $x_\eta = \mathbf{0}$ is the basic feasible solution $\bar{x}$. So $\bar{x}$ is optimal.
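The sufficient optimality criterion is mechanical to check. The sketch below (a made-up instance in plain Python; $A_\beta$ is the identity to keep the linear algebra trivial) computes $\bar{y}' = c'_\beta A_\beta^{-1}$ and the nonbasic reduced costs, and confirms both optimality and the equality $c'\bar{x} = \bar{y}'b$.

```python
from fractions import Fraction as F

# Made-up instance: min c'x, Ax = b, x >= 0, with basis beta = (1, 2)
# chosen so that A_beta is the identity.
A = [[F(1), F(0), F(1)],
     [F(0), F(1), F(1)]]
b = [F(1), F(1)]
c = [F(1), F(1), F(3)]
beta, eta = [0, 1], [2]

# A_beta = I, so x_beta = b and y' = c_beta' A_beta^{-1} = c_beta':
x_bar = [b[0], b[1], F(0)]
y = [c[0], c[1]]

# Reduced costs c_bar_j = c_j - y'A_j for the nonbasic indices:
c_bar = {j: c[j] - sum(y[i] * A[i][j] for i in range(2)) for j in eta}

optimal = all(v >= 0 for v in c_bar.values())
primal_obj = sum(c[j] * x_bar[j] for j in range(3))
dual_obj = sum(y[i] * b[i] for i in range(2))
```

The single nonbasic reduced cost is nonnegative, so the basis certifies optimality, and the primal and dual objective values agree.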

    Example 4.6

This is a continuation of Example 3.1. In Figure 4.1, we have depicted the sufficient optimality criterion, in the space of a particular choice of nonbasic variables (not the choice previously depicted). Specifically, we consider the equivalent problem

min $c'x$
$Ax \le b$;
$x \ge \mathbf{0}$.

This plot demonstrates the optimality of $\beta := (5, 4, 6, 3)$ ($\eta := (2, 1)$). The basic directions available from the basic feasible solution $\bar{x}$ appear as standard unit vectors in the space of the nonbasic variables. The solution $\bar{x}$ is optimal because $\bar{c}_\eta \ge \mathbf{0}$; we can also think of this as $\bar{c}_\eta$ having a nonnegative dot product with each of the standard unit vectors, hence neither direction is improving.

    4.2 The Simplex Algorithm with No Worries

Improving direction. Often it is helpful to directly refer to individual elements of the vector $\bar{c}_\eta$; namely,

$\bar{c}_{\eta_j} = c_{\eta_j} - c'_\beta A_\beta^{-1} A_{\eta_j} = c_{\eta_j} - \bar{y}'A_{\eta_j}$, for $j = 1, 2, \ldots, n - m$.

If the sufficient optimality criterion is not satisfied, then we choose an $\eta_j$ such that $\bar{c}_{\eta_j}$ is negative, and we consider solutions that increase the value of $x_{\eta_j}$ up from $\bar{x}_{\eta_j} = 0$, changing the values of the basic variables to ensure that we still satisfy the equations $Ax = b$, while holding the other nonbasic variables at zero.


Notice that for $i$ such that $\bar{a}_{i,j} \le 0$, there is no limit on how large $\lambda$ can be. In fact, it can well happen that $\bar{A}_{\eta_j} \le \mathbf{0}$. In this case, $\bar{x} + \lambda\bar{z}$ is feasible for all $\lambda > 0$ and $c'(\bar{x} + \lambda\bar{z}) \to -\infty$ as $\lambda \to +\infty$, so the problem is unbounded.

Otherwise, to ensure that $\bar{x}_\beta + \lambda\bar{z}_\beta \ge \mathbf{0}$, we just enforce

$\lambda \le \bar{b}_i / \bar{a}_{i,j}$, for $i$ such that $\bar{a}_{i,j} > 0$.

Finally, to get the best improvement in the direction $\bar{z}$ from $\bar{x}$, we let $\lambda$ equal

$\bar\lambda := \min_{i\,:\,\bar{a}_{i,j} > 0} \left\{ \bar{b}_i / \bar{a}_{i,j} \right\}$.
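The ratio test is a one-liner in code. The sketch below (plain Python, with made-up values for $\bar{b} = A_\beta^{-1}b$ and the updated entering column) computes $\bar\lambda$ and the row attaining it.

```python
from fractions import Fraction as F

# Made-up data for one iteration: b_bar = A_beta^{-1} b, and the
# updated entering column Abar_j = A_beta^{-1} A_{eta_j}.
b_bar = [F(4), F(2), F(3)]
Abar_j = [F(2), F(-1), F(3)]

# Ratio test: only rows with Abar_j[i] > 0 limit the step length.
ratios = [(b_bar[i] / Abar_j[i], i) for i in range(3) if Abar_j[i] > 0]
lam_bar, i_star = min(ratios)
```

Row 1 (with $\bar{a}_{i,j} = -1$) imposes no limit; the minimum ratio is attained by the last row, which identifies the leaving basic variable.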

Nondegeneracy. There is a significant issue in even carrying out one iteration of this algorithm. If $\bar{b}_i = 0$ for some $i$ such that $\bar{a}_{i,j} > 0$, then $\bar\lambda = 0$, and we are not able to make any change from $\bar{x}$ in the direction $\bar{z}$. Just for now, we will simply assume away this problem, using the following hypothesis that every basic variable of every basic feasible solution is positive.

Definition 4.7

The problem (P) satisfies the nondegeneracy hypothesis if for every feasible basis $\beta$, we have $\bar{x}_{\beta_i} := \bar{b}_i > 0$ for $i = 1, 2, \ldots, m$.

Under the nondegeneracy hypothesis, $\bar\lambda > 0$.

Another basic feasible solution. By our construction, the new solution $\bar{x} + \bar\lambda\bar{z}$ is feasible and has lesser objective value than that of $\bar{x}$. We can repeat the construction as long as the new solution is basic. If it is basic, there is a natural guess as to what an appropriate basis may be. The variable $x_{\eta_j}$, formerly nonbasic at value $0$, has increased to $\bar\lambda$, so clearly it must become basic. Also, at least one variable that was basic now has value $0$. In fact, under our nondegeneracy hypothesis, once we establish that the new solution is basic, we observe that exactly one variable that was basic now has value $0$. Let

$i^* := \mathrm{argmin}_{i\,:\,\bar{a}_{i,j} > 0} \left\{ \bar{b}_i / \bar{a}_{i,j} \right\}$.

If there is more than one $i$ that achieves the minimum (which can happen if we do not assume the nondegeneracy hypothesis), then we will see that the choice of $i^*$ can be any of these. We can see that $x_{\beta_{i^*}}$ has value $0$ in $\bar{x} + \bar\lambda\bar{z}$. So it is natural to hope we can replace $x_{\beta_{i^*}}$ as a basic variable with $x_{\eta_j}$.

Let $\tilde\beta := (\beta_1, \beta_2, \ldots, \beta_{i^*-1}, \eta_j, \beta_{i^*+1}, \ldots, \beta_m)$

and $\tilde\eta := (\eta_1, \eta_2, \ldots, \eta_{j-1}, \beta_{i^*}, \eta_{j+1}, \ldots, \eta_{n-m})$.


    (Pivot.mp4)

Figure 4.2: If you are using the Acrobat Reader, click above to see the clip. Or just see it on YouTube by clicking here.

    4.4 Obtaining a Basic Feasible Solution

Next, we will deal with the problem of finding an initial basic feasible solution for the standard-form problem

min $c'x$
$Ax = b$;    (P)
$x \ge \mathbf{0}$.

At first, we ignore the degeneracy issue (why worry about two things at once?!). The idea is rather simple. First, we choose any basic partition $\beta, \eta$. If we are lucky, then $A_\beta^{-1} b \ge \mathbf{0}$. Otherwise, we have some work to do. We define a new nonnegative variable $x_{n+1}$, which we temporarily adjoin as an additional nonbasic variable. So our basic indices remain as

$\beta = (\beta_1, \beta_2, \ldots, \beta_m)$,

while our nonbasic indices are extended to

$\eta = (\eta_1, \eta_2, \ldots, \eta_{n-m}, \eta_{n-m+1} := n+1)$.

The column of the constraint matrix associated with this new variable is defined as $A_{n+1} := -A_\beta \mathbf{1}$. Hence $\bar{A}_{n+1} = -\mathbf{1}$. Finally, we temporarily put aside the objective
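The defining property of the artificial column, $\bar{A}_{n+1} = -\mathbf{1}$, is easy to verify for any nonsingular $A_\beta$. A sketch with a made-up $2 \times 2$ basis matrix (plain Python, exact arithmetic):

```python
from fractions import Fraction as F

# Made-up nonsingular basis matrix A_beta:
A_beta = [[F(2), F(1)],
          [F(1), F(3)]]

# Artificial column A_{n+1} := -A_beta * 1, where 1 is the all-ones vector:
A_art = [-(A_beta[i][0] + A_beta[i][1]) for i in range(2)]

# Abar_{n+1} = A_beta^{-1} A_{n+1}, via the explicit 2x2 inverse:
det = A_beta[0][0] * A_beta[1][1] - A_beta[0][1] * A_beta[1][0]
inv = [[A_beta[1][1] / det, -A_beta[0][1] / det],
       [-A_beta[1][0] / det, A_beta[0][0] / det]]
Abar_art = [inv[i][0] * A_art[0] + inv[i][1] * A_art[1] for i in range(2)]
```

Regardless of the particular nonsingular $A_\beta$ chosen, `Abar_art` comes out as the all-minus-ones vector, which is exactly what the phase-one construction exploits.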

http://www.youtube.com/watch?v=w5DHU8SwYJ0

Anticycling for ($\Phi$). There is one lingering issue remaining. We have not discussed anticycling for ($\Phi$). But this is relatively simple. We define an $\epsilon$-perturbed version

min $x_{n+1}$
$Ax + A_{n+1}x_{n+1} = b(\epsilon)$;    ($\Phi_\epsilon$)
$x, x_{n+1} \ge \mathbf{0}$,

where $b(\epsilon) := b + \bar\epsilon$, and $\bar\epsilon := (\epsilon, \epsilon^2, \ldots, \epsilon^m)'$. Then we choose $i^*$ so that $\bar{x}_{\beta_{i^*}}$ is most negative with respect to the ordering


Be patient. Perhaps a more elegant way to handle the situation is to fully solve ($\Phi$). In doing so, if (P) has a feasible solution, then the minimum objective value of ($\Phi$) will be 0 (i.e., the zero polynomial), and $x_{n+1}$ will necessarily be nonbasic. That is because, at every iteration, every basic variable in ($\Phi$) is positive. Because $x_{n+1}$ legally left the basis for ($\Phi$) at the final iteration, the resulting basis is feasible for (P). So we do not re-perturb (P), and we simply revert to solving (P) from the final basis of ($\Phi$).

    4.5 The Simplex Algorithm

    Putting everything together, we get a mathematically complete algorithm for linearoptimization. That is:


1. Apply an algebraic perturbation to the phase-one problem;

2. Solve the phase-one problem using the Worry-Free Simplex Algorithm, adapted to algebraically perturbed problems, but always giving preference to $x_{n+1}$ for leaving the basis whenever it is eligible to leave for the unperturbed problem. Go to the next step, as soon as $x_{n+1}$ leaves the basis;

3. Starting from the feasible basis obtained for the original standard-form problem, apply an algebraic perturbation (note that the previous step may have left us with a basis that is feasible for the original unperturbed problem, but infeasible for the original perturbed problem; this is why we apply a perturbation anew);

4. Solve the problem using the Worry-Free Simplex Algorithm, adapted to algebraically perturbed problems.

It is important to know that the Simplex Algorithm will be used, later, to prove the celebrated Strong Duality Theorem. For that reason, it is important that our algorithm be mathematically complete. But from a practical computational viewpoint, there is substantial overhead in working with the $\epsilon$-perturbed problems. Therefore, in practice, no computer code that is routinely applied to large instances worries about the potential for cycling associated with degeneracy.

    4.6 Exercises

    Exercise 4.1 (Illustrate aspects of the Simplex Algorithm)

Make a small example, and some variations on it, to fully illustrate all of the concepts in this chapter. In A.2, there are MATLAB scripts and functions that could be very helpful here.

Exercise 4.2 (Worry-Free Simplex Algorithm can cycle)

Let $\theta = 2\pi/k$, with integer $k \ge 5$. The idea is to use the symmetry of the geometric circle, and complete a cycle of the Worry-Free Simplex Algorithm in $2k$ pivots. Choose


a constant satisfying $0 < \cdots$. Feel free to use MATLAB, MATHEMATICA, etc.

If you are feeling ambitious, check that for all $k \ge 5$, we get a cycle of the Worry-Free Simplex Algorithm.

Note that it may seem hard to grasp the picture at all.

Some Mathematica code to play with this is as follows:

    (* circle.m // Jon Lee *)
    (* *)
    t = 2 Pi/5; c = Cos[t]; s = Sin[t]; n = 69; e = .75;
    m = {{c, -s, 0}, {s, c, 0}, {(c - 1)/c, s/c, 1}};
    x = {{1}, {0}, {0}};
    y = {{0}, {Cot[t]}, {0}};
    T = Partition[
      Flatten[{x, y, m.x, m.y, m.m.x, m.m.y, m.m.m.x, m.m.m.y, m.m.m.m.x,
        m.m.m.m.y, x}], {3}, {3}];
    ListAnimate[
      Table[Graphics3D[
        Table[{GrayLevel[e], Polygon[{{0, 0, 0}, T[[i]], T[[i + 1]]}]},
         {i, 10}], Boxed -> False, PlotRangePadding -> 2.5,
        BoxRatios -> {2, 2, 2}, SphericalRegion -> True,
        ViewPoint -> {Sin[2 Pi*t/n], Cos[2 Pi*t/n], 0},
        Lighting -> False], {t, 0, n}]]


Figure 4.3: A picture of the cycle with $k = 5$


CHAPTER 5. DUALITY

    5.1 The Strong Duality Theorem

We have already seen two simple duality theorems:

Weak Duality Theorem. If $\bar{x}$ is feasible in (P) and $\bar{y}$ is feasible in (D), then $c'\bar{x} \ge \bar{y}'b$.

Weak Optimal Basis Theorem. If $\beta$ is a feasible basis and $\bar{c}_\eta \ge \mathbf{0}$, then the primal solution $\bar{x}$ and the dual solution $\bar{y}$ associated with $\beta$ are optimal.

The Weak Duality Theorem directly implies that if $\bar{x}$ is feasible in (P) and $\bar{y}$ is feasible in (D), and $c'\bar{x} = \bar{y}'b$, then $\bar{x}$ and $\bar{y}$ are optimal. Thinking about it this way, we see that both the Weak Duality Theorem and the Weak Optimal Basis Theorem assert conditions that are sufficient for establishing optimality.

    Theorem 5.1 (Strong Optimal Basis Theorem)

If (P) has a feasible solution, and (P) is not unbounded, then there exists a basis $\beta$ such that the associated basic solution $\bar{x}$ and the associated dual solution $\bar{y}$ are optimal. Moreover, $c'\bar{x} = \bar{y}'b$.

Proof. If (P) has a feasible solution and (P) is not unbounded, then the Simplex Algorithm will terminate with a basis $\beta$ such that the associated basic solution $\bar{x}$ and the associated dual solution $\bar{y}$ are optimal.

    As a direct consequence, we have a celebrated theorem.

    Theorem 5.2 (Strong Duality Theorem)

If (P) has a feasible solution, and (P) is not unbounded, then there exist feasible solutions $\bar{x}$ for (P) and $\bar{y}$ for (D) that are optimal. Moreover, $c'\bar{x} = \bar{y}'b$.

It is important to realize that the Strong Optimal Basis Theorem and the Strong Duality Theorem depend on the correctness of the Simplex Algorithm; this includes: (i) the correctness of the phase-one procedure to find an initial feasible basis of (P), and (ii) the anti-cycling methodology.


    5.2 Complementary Slackness

    Definition 5.3

With respect to the standard-form problem (P) and its dual (D), the solutions $\bar{x}$ and $\bar{y}$ are complementary if

$(c_j - \bar{y}'A_j)\,\bar{x}_j = 0$, for $j = 1, 2, \ldots, n$;
$\bar{y}_i\,(A_{i\cdot}\bar{x} - b_i) = 0$, for $i = 1, 2, \ldots, m$.

    Theorem 5.4

If $\bar{x}$ is a basic solution (feasible or not) of standard-form (P), and $\bar{y}$ is the associated dual solution, then $\bar{x}$ and $\bar{y}$ are complementary.

Proof. Notice that if $\bar{x}$ is a basic solution, then $A\bar{x} = b$. Then we can see that complementarity of $\bar{x}$ and $\bar{y}$ amounts to

$\bar{c}_j \bar{x}_j = 0$, for $j = 1, 2, \ldots, n$.

It is clear then that $\bar{x}$ and $\bar{y}$ are complementary, because if $\bar{x}_j > 0$, then $j$ is a basic index, and $\bar{c}_j = 0$ for basic indices.
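Theorem 5.4 can be checked mechanically: with a basis in hand, every nonbasic $\bar{x}_j$ is 0 and every basic reduced cost is 0, so each complementarity product vanishes. A sketch on a made-up instance (plain Python; $A_\beta$ is again the identity for simplicity):

```python
from fractions import Fraction as F

# Made-up standard-form instance with basis beta = (1, 2), A_beta = I:
A = [[F(1), F(0), F(2)],
     [F(0), F(1), F(1)]]
b = [F(3), F(1)]
c = [F(2), F(1), F(1)]

# Associated basic solution and dual solution:
x_bar = [F(3), F(1), F(0)]            # x_beta = b, x_eta = 0
y = [F(2), F(1)]                      # y' = c_beta' A_beta^{-1} = c_beta'

# Complementarity products (c_j - y'A_j) * x_bar_j:
products = [(c[j] - sum(y[i] * A[i][j] for i in range(2))) * x_bar[j]
            for j in range(3)]
```

In this instance the nonbasic reduced cost is actually negative, so $\bar{y}$ is not dual feasible; complementarity holds anyway, matching the "(feasible or not)" in the statement of the theorem.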

    Theorem 5.5

If $\bar{x}$ and $\bar{y}$ are complementary with respect to (P) and (D), then $c'\bar{x} = \bar{y}'b$.

Proof. $c'\bar{x} - \bar{y}'b = (c' - \bar{y}'A)\bar{x} + \bar{y}'(A\bar{x} - b)$,

which is 0 by complementarity.

    Corollary 5.6 (Weak Complementary Slackness Theorem)

If $\bar{x}$ and $\bar{y}$ are feasible and complementary with respect to (P) and (D), then $\bar{x}$ and $\bar{y}$ are optimal.


    Proof. This immediately follows from Theorem 5.5 and the Weak Duality Theorem.

    Theorem 5.7 (Strong Complementary Slackness Theorem)

If $\bar{x}$ and $\bar{y}$ are optimal for (P) and (D), then $\bar{x}$ and $\bar{y}$ are complementary (with respect to (P) and (D)).

Proof. If $\bar{x}$ and $\bar{y}$ are optimal, then by the Strong Duality Theorem, we have $c'\bar{x} - \bar{y}'b = 0$. Therefore, we have

$0 = (c' - \bar{y}'A)\bar{x} + \bar{y}'(A\bar{x} - b) = \sum_{j=1}^{n} (c_j - \bar{y}'A_j)\bar{x}_j + \sum_{i=1}^{m} \bar{y}_i (A_{i\cdot}\bar{x} - b_i)$.

Next, observing that $\bar{x}$ and $\bar{y}$ are feasible, we have

$\sum_{j=1}^{n} \underbrace{(c_j - \bar{y}'A_j)}_{\ge 0}\, \underbrace{\bar{x}_j}_{\ge 0} + \sum_{i=1}^{m} \underbrace{\bar{y}_i (A_{i\cdot}\bar{x} - b_i)}_{= 0}$.

Clearly this expression is equal to a nonnegative number. Finally, we observe that this expression can only be equal to 0 if

$(c_j - \bar{y}'A_j)\bar{x}_j = 0$, for $j = 1, 2, \ldots, n$.

    5.3 Duality for General Linear-Optimization Problems

Thus far, we have focused on duality for the standard-form problem (P). But we will see that every linear-optimization problem has a natural dual. Consider the rather general linear minimization problem

min $c'_P x_P + c'_N x_N + c'_U x_U$
$A_{GP}x_P + A_{GN}x_N + A_{GU}x_U \ge b_G$;
$A_{LP}x_P + A_{LN}x_N + A_{LU}x_U \le b_L$;    (G)
$A_{EP}x_P + A_{EN}x_N + A_{EU}x_U = b_E$;
$x_P \ge \mathbf{0}$, $x_N \le \mathbf{0}$.

We will see in the next result that a natural dual for it is

max $y'_G b_G + y'_L b_L + y'_E b_E$
$y'_G A_{GP} + y'_L A_{LP} + y'_E A_{EP} \le c'_P$;
$y'_G A_{GN} + y'_L A_{LN} + y'_E A_{EN} \ge c'_N$;    (H)
$y'_G A_{GU} + y'_L A_{LU} + y'_E A_{EU} = c'_U$;
$y_G \ge \mathbf{0}$, $y_L \le \mathbf{0}$.


Strong Complementary Slackness Theorem: If $(\bar{x}_P, \bar{x}_N, \bar{x}_U)$ and $(\bar{y}_G, \bar{y}_L, \bar{y}_E)$ are optimal for (G) and (H), then $(\bar{x}_P, \bar{x}_N, \bar{x}_U)$ and $(\bar{y}_G, \bar{y}_L, \bar{y}_E)$ are complementary (with respect to (G) and (H)).

Proof. Similarly to the proof for standard-form (P) and its dual (D), we consider the following expression:

$0 = \sum_{j \in P} \underbrace{(c_j - \bar{y}'_G A_{Gj} - \bar{y}'_L A_{Lj} - \bar{y}'_E A_{Ej})}_{\ge 0}\, \underbrace{\bar{x}_j}_{\ge 0} + \sum_{j \in N} \underbrace{(c_j - \bar{y}'_G A_{Gj} - \bar{y}'_L A_{Lj} - \bar{y}'_E A_{Ej})}_{\le 0}\, \underbrace{\bar{x}_j}_{\le 0} + \sum_{j \in U} \underbrace{(c_j - \bar{y}'_G A_{Gj} - \bar{y}'_L A_{Lj} - \bar{y}'_E A_{Ej})}_{= 0}\, \bar{x}_j$

$+ \sum_{i \in G} \underbrace{\bar{y}_i}_{\ge 0}\, \underbrace{(A_{iP}\bar{x}_P + A_{iN}\bar{x}_N + A_{iU}\bar{x}_U - b_i)}_{\ge 0} + \sum_{i \in L} \underbrace{\bar{y}_i}_{\le 0}\, \underbrace{(A_{iP}\bar{x}_P + A_{iN}\bar{x}_N + A_{iU}\bar{x}_U - b_i)}_{\le 0} + \sum_{i \in E} \bar{y}_i\, \underbrace{(A_{iP}\bar{x}_P + A_{iN}\bar{x}_N + A_{iU}\bar{x}_U - b_i)}_{= 0}$.

The result follows easily using the Weak and Strong Duality Theorems for (G) and (H).

The table below summarizes the duality relationships between the type of each primal constraint and the type of each associated dual variable. Highlighted in yellow are the relationships for the standard-form (P) and its dual (D). It is important to note that the columns are labeled min and max, rather than primal and dual; the table is not correct if min and max are interchanged.

    min                       max
    $\ge$ constraints         variables $\ge 0$
    $\le$ constraints         variables $\le 0$
    $=$ constraints           variables unrestricted
    variables $\ge 0$         constraints $\le$
    variables $\le 0$         constraints $\ge$
    variables unrestricted    constraints $=$


1. $b$ is in the cone generated by the columns of $A$ (i.e., $b$ is a nonnegative linear combination of the columns of $A$), or

2. there is $y \in \mathbb{R}^m$ that makes an acute angle with $b$ and a non-acute (i.e., right or obtuse) angle with every column of $A$.

In the case of (2), considering the hyperplane $H$ containing the origin having $y$ as its normal vector, this $H$ separates $b$ from the cone generated by the columns of $A$. So, the Farkas Lemma has the geometric interpretation as a Separating-Hyperplane Theorem.

    In a similar fashion to the Farkas Lemma, we can develop theorems of this type forfeasible regions of other linear-optimization problems.

    Theorem 5.13 (Theorem of the Alternative for Linear Inequalities)

Let $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$ be given. Then exactly one of the following two systems has a solution.

$Ax \le b$.    (I)

$y'b > 0$;
$y'A = \mathbf{0}'$;    (II)
$y \le \mathbf{0}$.

Proof. It is easy to see that there cannot simultaneously be a solution $x$ to (I) and $y$ to (II). Otherwise we would have

$0 = (\underbrace{y'A}_{=\mathbf{0}'})\,x = y'(Ax) \ge y'b > 0$,

which is a clear inconsistency.

Next, suppose that (I) has no solution. Then the following problem is infeasible:

min $\mathbf{0}'x$
$Ax \le b$.    (P)

Its dual is

max $y'b$
$y'A = \mathbf{0}'$;    (D)
$y \le \mathbf{0}$.

Because (P) is infeasible, (D) is either infeasible or unbounded. But $\bar{y} := \mathbf{0}$ is a feasible solution to (D), therefore (D) must be unbounded. Therefore, there exists a feasible solution $\hat{y}$ to (D) having objective value greater than zero (or even greater than any fixed constant). Such a $\hat{y}$ is a solution to (II).
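A tiny made-up instance (plain Python) illustrates the two systems: here (I) asks for an $x$ with $x \le -1$ and $-x \le 0$, which is infeasible, and $y = (-1, -1)'$ is a certificate solving (II).

```python
# System (I): A x <= b with A = [[1], [-1]], b = (-1, 0)'.
# The two inequalities say x <= -1 and x >= 0, so (I) is infeasible.
A = [[1], [-1]]
b = [-1, 0]

# Candidate certificate for system (II): y'b > 0, y'A = 0', y <= 0.
y = [-1, -1]

yTb = sum(y[i] * b[i] for i in range(2))
yTA = [sum(y[i] * A[i][j] for i in range(2)) for j in range(1)]
is_certificate = yTb > 0 and yTA == [0] and all(v <= 0 for v in y)
```

The checks confirm that $y$ solves (II), so by the theorem no $x$ can solve (I), which matches the direct argument that $x \ge 0$ and $x \le -1$ cannot hold together.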


    5.5 Exercises

    Exercise 5.1 (Duality and complementarity with AMPL)

After optimization using AMPL, it is easy to get more information regarding primal and dual problems. If the variables are x, then the AMPL command display x.rc displays the associated reduced costs. If the constraints are Constraints, then the AMPL command display Constraints.dual displays the associated dual variables, and display Constraints.slack displays the associated slackness. Using these commands, develop an example to verify the concepts of duality and complementarity developed in this chapter.

    Exercise 5.2 (Complementary slackness)

Construct an example where we are given $\bar{x}$ and $\bar{y}$ and asked to check whether $\bar{x}$ is optimal using complementary slackness. I want your example to have the property that $\bar{x}$ is optimal, $\bar{x}$ and $\bar{y}$ are complementary, but $\bar{y}$ is not feasible.

The idea is to see an example where there is not a unique dual solution complementary to $\bar{x}$, and so $\bar{x}$ is optimal, but we only verify it with another choice of $\hat{y}$.

Exercise 5.3 (Over complementarity)

With respect to the standard-form problem (P) and its dual (D), complementary solutions $\bar{x}$ and $\bar{y}$ are overly complementary if exactly one of

$c_j - \bar{y}'A_j$ and $\bar{x}_j$ is 0, for $j = 1, 2, \ldots, n$.

Prove that if (P) has an optimal solution, then there are always optimal solutions for (P) and (D) that are overly complementary.

HINT: Let $v$ be the optimal objective value of (P). For each $j = 1, 2, \ldots, n$, consider

max $x_j$
$c'x \le v$;
$Ax = b$;    (P$_j$)
$x \ge \mathbf{0}$.

(P$_j$) seeks an optimal solution of (P) that has $x_j$ positive. Using the dual of (P$_j$), show that if no optimal solution $\bar{x}$ of (P) has $x_j$ positive, then there is an optimal solution $\bar{y}$ of (D) with $c_j - \bar{y}'A_j$ positive. Once you do this, you can conclude that, for any fixed $j$, there are optimal solutions $\bar{x}$ and $\bar{y}$ with the property that

exactly one of $c_j - \bar{y}'A_j$ and $\bar{x}_j$ is 0.

Take all of these $n$ pairs of solutions $\bar{x}$ and $\bar{y}$ and combine them appropriately to construct optimal $\bar{x}$ and $\bar{y}$ that are overly complementary.

    Exercise 5.4 (Another proof of a Theorem of the Alternative)

Prove the Theorem of the Alternative for Linear Inequalities directly from the Farkas Lemma, without appealing to linear-optimization duality. HINT: Transform (I) of the Theorem of the Alternative for Linear Inequalities to a system of the form of (I) of the Farkas Lemma.


    Exercise 5.5 (A general Theorem of the Alternative)

State and prove a Theorem of the Alternative for the system:

$A_{GP}x_P + A_{GN}x_N + A_{GU}x_U \ge b_G$;
$A_{LP}x_P + A_{LN}x_N + A_{LU}x_U \le b_L$;    (I)
$A_{EP}x_P + A_{EN}x_N + A_{EU}x_U = b_E$;
$x_P \ge \mathbf{0}$, $x_N \le \mathbf{0}$.

    Exercise 5.6 (Dual ray)

Consider the linear-optimization problem

min $c'x$
$Ax \ge b$;    (P)
$x \ge \mathbf{0}$.

a) Suppose that (P) is infeasible. Then, by a Theorem of the Alternative, there is a solution to what system?

b) Suppose, further, that the dual (D) of (P) is feasible. Take a feasible solution $\bar{y}$ of (D) and a solution $\hat{y}$ to your system of part (a), and combine them appropriately to prove that (D) is unbounded.


    Chapter 6

    Sensitivity Analysis

    Our goal in this chapter is as follows:

Learn how the optimal value of a linear-optimization problem behaves when the right-hand side vector and objective vector are varied.

    6.1 Right-Hand Side Changes

We define a function $f : \mathbb{R}^m \to \mathbb{R}$ via

$f(b) :=$ min $c'x$
$Ax = b$;    (P$_b$)
$x \ge \mathbf{0}$.

That is, (P$_b$) is simply (P) with the optimal objective value viewed as a function of its right-hand side vector $b$.

    6.1.1 Local analysis


Consider a fixed basis $\beta$ for (P$_b$). Associated with that basis is the basic solution $\bar{x}_\beta = A_\beta^{-1} b$ and the corresponding dual solution $\bar{y}' = c'_\beta A_\beta^{-1}$. Let us assume that $\bar{y}$ is feasible for the dual of (P$_b$) or, equivalently, $c' - \bar{y}'A \ge \mathbf{0}'$. Considering the set $B$ of $b \in \mathbb{R}^m$ such that $\beta$ is an optimal basis, it is easy to see that $B$ is just the set of $b$ such that $\bar{b} := A_\beta^{-1} b \ge \mathbf{0}$. That is, $B \subset \mathbb{R}^m$ is the solution set of $m$ linear inequalities (in fact, it is a simplicial cone; we will return to this point in Section 6.1.3). Now, for $b \in B$, we have $f(b) = \bar{y}'b$. Therefore, $f$ is a linear function on $b \in B$. Moreover, as long as $b$ is in the interior of $B$, we have $\partial f / \partial b_i = \bar{y}_i$. So we have that $\bar{y}$ is the gradient of $f$, as long as $b$ is in the interior of $B$. Now what does it mean for $b$ to be in the interior of $B$? It just means that $\bar{b}_i > 0$ for $i = 1, 2, \ldots, m$.

Let us focus our attention on changes to a single right-hand side element $b_i$. Suppose that $\beta$ is an optimal basis of (P), and consider the problem

min $c'x$
$Ax = b + \Delta_i e^i$;    (P$_{\Delta_i}$)
$x \ge \mathbf{0}$.

The basis $\beta$ is feasible (and hence still optimal) for (P$_{\Delta_i}$) if $A_\beta^{-1}(b + \Delta_i e^i) \ge \mathbf{0}$. Let $h^i := A_\beta^{-1} e^i$. So

$[h^1, h^2, \ldots, h^m] = A_\beta^{-1}$.

Then, the condition $A_\beta^{-1}(b + \Delta_i e^i) \ge \mathbf{0}$ can be re-expressed as $\bar{b} + \Delta_i h^i \ge \mathbf{0}$. It is straightforward to check that $\beta$ is feasible (and hence still optimal) for (P$_{\Delta_i}$) as long as $\Delta_i$ is in the interval $[L_i, U_i]$, where

$L_i := \max_{k\,:\,h^i_k > 0} \left\{ -\bar{b}_k / h^i_k \right\}$,

and

$U_i := \min_{k\,:\,h^i_k < 0} \left\{ -\bar{b}_k / h^i_k \right\}$.
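The ranging interval $[L_i, U_i]$ is computable directly from $\bar{b}$ and the columns $h^i$ of $A_\beta^{-1}$. A sketch on a made-up basis matrix (plain Python, exact arithmetic; `None` stands in for an unbounded end of the interval):

```python
from fractions import Fraction as F

# Made-up basis matrix and right-hand side:
A_beta = [[F(1), F(1)],
          [F(0), F(2)]]
b = [F(3), F(2)]

# A_beta^{-1}, from the explicit 2x2 formula:
det = A_beta[0][0] * A_beta[1][1] - A_beta[0][1] * A_beta[1][0]
inv = [[A_beta[1][1] / det, -A_beta[0][1] / det],
       [-A_beta[1][0] / det, A_beta[0][0] / det]]

b_bar = [inv[i][0] * b[0] + inv[i][1] * b[1] for i in range(2)]

def ranging(i):
    """[L_i, U_i] for b -> b + Delta_i e^i, keeping b_bar + Delta_i h^i >= 0."""
    h = [inv[0][i], inv[1][i]]          # column i of A_beta^{-1}
    lows = [-b_bar[k] / h[k] for k in range(2) if h[k] > 0]
    highs = [-b_bar[k] / h[k] for k in range(2) if h[k] < 0]
    L = max(lows) if lows else None     # None: no lower limit
    U = min(highs) if highs else None   # None: no upper limit
    return L, U
```

For this instance, $\Delta_1$ may decrease by up to 2 with no upper limit, while $\Delta_2$ is confined to $[-2, 4]$.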


6.1.2 Global analysis

Theorem 6.1

The domain of $f$ is a convex set.

Proof. Suppose that $b^j$ is in the domain of $f$, for $j = 1, 2$. Therefore, there exist $x^j$ that are feasible for (P$_{b^j}$), for $j = 1, 2$. For any $0 < \lambda < 1$, let $b := \lambda b^1 + (1 - \lambda)b^2$, and consider $x := \lambda x^1 + (1 - \lambda)x^2$. It is easy to check that $x$ is feasible for (P$_b$), so we can conclude that $b$ is in the domain of $f$.

A function $f : \mathbb{R}^k \to \mathbb{R}$ having a convex set as its domain is a convex piecewise-linear function if, on its domain, it is the pointwise maximum of a finite number of affine functions.

It would be odd to refer to a function as being convex piecewise-linear if it were not convex! The next result justifies the moniker.

    Theorem 6.2

If $f$ is a convex piecewise-linear function, then it is a convex function.

Proof. Let

$f(u) := \max_{1 \le i \le p} \{f_i(u)\}$,

for $u$ in the domain of $f$, where each $f_i$ is an affine function. Then, for $0 < \lambda < 1$ and $u^1, u^2 \in \mathbb{R}^k$,

$f(\lambda u^1 + (1-\lambda)u^2) = \max_{1 \le i \le p} \left\{ f_i(\lambda u^1 + (1-\lambda)u^2) \right\}$
$= \max_{1 \le i \le p} \left\{ \lambda f_i(u^1) + (1-\lambda) f_i(u^2) \right\}$
$\le \max_{1 \le i \le p} \left\{ \lambda f_i(u^1) \right\} + \max_{1 \le i \le p} \left\{ (1-\lambda) f_i(u^2) \right\}$
$= \lambda \max_{1 \le i \le p} \left\{ f_i(u^1) \right\} + (1-\lambda) \max_{1 \le i \le p} \left\{ f_i(u^2) \right\}$
$= \lambda f(u^1) + (1-\lambda) f(u^2)$.
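The inequality established in the proof can be spot-checked numerically for a pointwise maximum of a few made-up affine functions (plain Python, exact arithmetic over a grid of rational points):

```python
from fractions import Fraction as F

# Three made-up affine functions on R:
affines = [lambda u: 2 * u + 1, lambda u: -u + 3, lambda u: F(1, 2) * u]

def f(u):
    """Pointwise maximum of the affine functions."""
    return max(g(u) for g in affines)

# Check f(lam*u1 + (1-lam)*u2) <= lam*f(u1) + (1-lam)*f(u2) on a grid:
points = [F(k, 2) for k in range(-8, 9)]
lam = F(1, 3)
convex_ok = all(
    f(lam * u1 + (1 - lam) * u2) <= lam * f(u1) + (1 - lam) * f(u2)
    for u1 in points for u2 in points)
```

This is only a sanity check on sample points, of course; the proof above covers all of $\mathbb{R}^k$.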


    Theorem 6.3

$f$ is a convex piecewise-linear function on its domain.

Proof. We refer to the dual

$f(b) :=$ max $y'b$
$y'A \le c'$;    (D$_b$)

of (P$_b$).

A basis $\beta$ is feasible or not for (D$_b$), independent of $b$. Thinking about it this way, we can see that

$f(b) = \max \left\{ c'_\beta A_\beta^{-1} b \;:\; \beta \text{ is a dual feasible basis} \right\}$,

and so $f$ is a convex piecewise-linear function, because it is the pointwise maximum of a finite number of affine (even linear) functions.

    6.1.3 A brief detour: the column geometry for the Simplex Algorithm

In this section, we will describe a geometry for visualizing the Simplex Algorithm.

The ordinary geometry for a standard-form problem, in the space of the nonbasic variables for some choice of basis, can be visualized when $n - m = 2$ or 3. The column geometry that we will describe is in $\mathbb{R}^{m+1}$, so it can be visualized when $m + 1 = 2$ or 3. Note that the graph of the function $f(b)$ (introduced at the start of this chapter) is also in $\mathbb{R}^{m+1}$, which is why we take the present detour.

We think of the $n$ points

$\binom{c_j}{A_j}$, for $j = 1, 2, \ldots, n$,

and the additional so-called requirement line

$\left\{ \binom{z}{b} : z \in \mathbb{R} \right\}$.


We think of the first component of these points and the line as the vertical dimension, so the requirement line is thought of as vertical. It is of particular interest to think of the cone generated by the $n$ points. That is,

$K := \left\{ \binom{c'x}{Ax} \in \mathbb{R}^{m+1} : x \ge \mathbf{0} \right\}$.

Notice how the top coordinate of a point in the cone gives the objective value of the associated $x$ for (P). So the goal of solving (P) can be thought of as that of finding a point in the cone that is as low as possible.

Restricting ourselves to a basis $\beta$, we have the cone

$K_\beta := \left\{ \binom{c'_\beta x_\beta}{A_\beta x_\beta} \in \mathbb{R}^{m+1} : x_\beta \ge \mathbf{0} \right\}$.

The cone $K_\beta$ is an $m$-dimensional simplicial cone. Next, we observe that if $\beta$ is a feasible basis, then $K_\beta$ intersects the requirement line uniquely at the point

$\binom{c'_\beta \bar{x}_\beta}{A_\beta \bar{x}_\beta}$,

where $\bar{x}$ is the basic solution associated with $\beta$.

In a pivot of the Simplex Algorithm from basis $\beta$ to basis $\tilde\beta$, we do so with the goal of having $K_{\tilde\beta}$ intersect the requirement line at a lower point than did $K_\beta$. So at each iteration of the Simplex Algorithm, we exchange a single generator of the simplicial cone $K_\beta$ associated with our basis, to descend along the requirement line, ultimately finding a point of $K$ that meets the requirement line at its lowest point.

    6.2 Objective Changes

Here is what is needed for Occupy Wall Street to become a force for change: a clear, and clearly expressed, objective. Or two. (Elayne Boosler)

We define a function $g : \mathbb{R}^n \to \mathbb{R}$ via

$g(c) :=$ min $c'x$
$Ax = b$;    (P$_c$)
$x \ge \mathbf{0}$.

That is, (P$_c$) is simply (P) with the optimal objective value viewed as a function of its objective vector $c$.


    6.2.1 Local analysis

Consider a fixed basis $\beta$ for (P$_c$). Associated with that basis is the basic solution $\bar{x}_\beta = A_\beta^{-1} b$ and the corresponding dual solution $\bar{y}' = c'_\beta A_\beta^{-1}$. Let us assume that $\bar{x}$ is feasible for (P$_c$) or, equivalently, $A_\beta^{-1} b \ge \mathbf{0}$. Considering the set $C$ of $c \in \mathbb{R}^n$ such that $\beta$ is an optimal basis, it is easy to see that this is just the set of $c$ such that $c'_\eta - c'_\beta A_\beta^{-1} A_\eta \ge \mathbf{0}'$. That is, $C \subset \mathbb{R}^n$ is the solution set of $n - m$ linear inequalities (in fact, it is a cone). Now, for $c \in C$, we have $g(c) = c'\bar{x}$. Therefore, $g$ is a linear function on $c \in C$.

    6.2.2 Global analysis

The domain of $g$ is the set of $c$ for which (P$_c$) has an optimal solution. Assuming that (P$_c$) is feasible, then the domain of $g$ is just the set of $c \in \mathbb{R}^n$ such that (P$_c$) is not unbounded.

Similarly to the case of variations in the right-hand side vector $b$, we have the following two results.

Theorem 6.4

The domain of $g$ is a convex set.

A function $g$ is a concave piecewise-linear function if it is the pointwise minimum of a finite number of affine functions.

    Theorem 6.5

$g$ is a concave piecewise-linear function on its domain.

    6.2.3 Local sensitivity analysis with a modeling language

To carry out some local sensitivity analysis with AMPL, use the following options:

option solver cplex;
option cplex_options 'sensitivity';


Then use the AMPL command solve to solve the problem (in particular, to determine an optimal basis).

AMPL has suffixes to retrieve some local sensitivity information from the solver. For a constraint named ⟨name⟩, ⟨name⟩.current gives the current right-hand side value, while ⟨name⟩.down and ⟨name⟩.up give lower and upper limits on that right-hand side value, such that the optimal basis remains optimal. Changing a particular right-hand side value, in this way, assumes that the right-hand side values for the other constraints are held constant (though see Exercise 6.3 for the case of simultaneous changes). In our notation from Section 6.1.1, if ⟨name⟩ corresponds to Σ_{j=1}^n a_ij x_j = b_i (that is, the i-th constraint of (P_b)), then

b_i = ⟨name⟩.current,
L_i = ⟨name⟩.current − ⟨name⟩.down,
U_i = ⟨name⟩.up − ⟨name⟩.current.

In a similar way, for a variable named ⟨name⟩, ⟨name⟩.down and ⟨name⟩.up give lower and upper limits on the coefficient of ⟨name⟩ in the objective function, such that the optimal basis remains optimal.

    6.3 Exercises

Exercise 6.1 (Illustrate local sensitivity analysis)

Make an original example to illustrate the local-analysis concepts of this chapter.

Exercise 6.2 (Illustrate global sensitivity analysis)

Using AMPL, make an original example, with at least three constraints, graphing the objective value of (P_b), as a single b[i] is varied from −∞ to +∞. As you work on this, bear in mind Theorem 6.3.

Exercise 6.3 (I feel that I know the change that is needed. Mahatma Gandhi)

We are given 2m numbers satisfying L_i ≤ 0 ≤ U_i, i = 1, 2, ..., m. Let β be an optimal basis for all of the m problems

min c'x
Ax = b + Δ_i e^i;
x ≥ 0,     (P_i)

for all Δ_i satisfying L_i ≤ Δ_i ≤ U_i. Let's be clear on what this means: For each i individually, the basis β is optimal when the i-th right-hand side component is changed from b_i to b_i + Δ_i, as long as Δ_i is in the interval [L_i, U_i].

The point of this problem is to be able to say something about simultaneously changing all of the b_i. Prove that we can simultaneously change b_i to

b̂_i := b_i + λ_i δ_i, where each δ_i is chosen as either L_i or U_i,

and λ_i ≥ 0, when Σ_{i=1}^m λ_i ≤ 1. [Note that in the formula above, for each i we can pick either L_i (a decrease) or U_i (an increase).]


Exercise 6.4 (Domain for objective variations)

Prove Theorem 6.4.

Exercise 6.5 (Concave piecewise-linear function)

Prove Theorem 6.5.


    Chapter 7

    Large-Scale Linear Optimization

    Our goals in this chapter are as follows:

• To see some approaches to large-scale linear-optimization problems.

• In particular, to learn about decomposition, Lagrangian relaxation and column generation.

• Also, via a study of the cutting-stock problem, we will have a first glimpse at some issues associated with integer-linear optimization.

    7.1 Decomposition


In this section we describe what is usually known as Dantzig-Wolfe Decomposition. It is an algorithm aimed at efficiently solving certain kinds of structured linear-optimization problems. The general viewpoint is that we might have a very efficient way to solve a certain type of structured linear-optimization problem, if it were not for a small number of constraints that break the structure. For example, the constraint matrix might have the form in Figure 7.1, where if it were not for the top constraints, the optimization problem would separate into many small problems.

Figure 7.1: Nearly separates

    7.1.1 The master reformulation

    Theorem 7.1 (The Representation Theorem)

Let

min c'x
Ax = b;
x ≥ 0.     (P)

Suppose that (P) has a nonempty feasible region. Let X := {x^j : j ∈ J} be the set of basic-feasible solutions of (P), and let Z := {z^k : k ∈ K} be the set of basic-feasible rays of (P). Then the feasible region of (P) is equal to

{ Σ_{j∈J} λ_j x^j + Σ_{k∈K} μ_k z^k : Σ_{j∈J} λ_j = 1; λ_j ≥ 0, j ∈ J; μ_k ≥ 0, k ∈ K }.

Proof. Let S be the feasible region of (P). Let

S' = { Σ_{j∈J} λ_j x^j + Σ_{k∈K} μ_k z^k : Σ_{j∈J} λ_j = 1; λ_j ≥ 0, j ∈ J; μ_k ≥ 0, k ∈ K }.

We will demonstrate that S = S'. It is very easy to check that S' ⊆ S, and we leave that to the reader. For the other direction, suppose that x̂ ∈ S, and consider the system

Σ_{j∈J} λ_j x^j + Σ_{k∈K} μ_k z^k = x̂;
Σ_{j∈J} λ_j = 1;
λ_j ≥ 0, j ∈ J; μ_k ≥ 0, k ∈ K.     (I)


Keep in mind that in (I), x̂ is fixed, as are the x^j and the z^k; the variables are the λ_j and the μ_k. By way of establishing that S ⊆ S', suppose that x̂ ∉ S'; that is, suppose that (I) has no solution. Applying the Farkas Lemma to (I), we see that the system

w'x̂ + t > 0;
w'x^j + t ≤ 0, j ∈ J;
w'z^k ≤ 0, k ∈ K     (II)

has a solution, say ŵ, t̂. Now, consider the linear-optimization problem

min −ŵ'x
Ax = b;
x ≥ 0.     (P̂)

(P̂) cannot be unbounded, because −ŵ'z^k ≥ 0, for all k ∈ K. In addition, every basic-feasible solution of (P̂) has objective value at least t̂. By Theorem 5.1 (the Strong Optimal Basis Theorem), this implies that the optimal value of (P̂) is at least t̂. But the objective value −ŵ'x̂ of x̂ is less than t̂. Therefore, x̂ cannot be feasible. That is, x̂ ∉ S.

    Corollary 7.2 (The Decomposition Theorem)

Let

min c'x
Ex = h;
Ax = b;
x ≥ 0.     (Q)

Let S := {x ∈ R^n : Ax = b, x ≥ 0}, let X := {x^j : j ∈ J} be the set of basic-feasible solutions of S, and let Z := {z^k : k ∈ K} be the set of basic-feasible rays of S. Then (Q) is equivalent to the Master Problem

min Σ_{j∈J} (c'x^j) λ_j + Σ_{k∈K} (c'z^k) μ_k
Σ_{j∈J} (Ex^j) λ_j + Σ_{k∈K} (Ez^k) μ_k = h;
Σ_{j∈J} λ_j = 1;
λ_j ≥ 0, j ∈ J; μ_k ≥ 0, k ∈ K.     (M)

Proof. Using the Representation Theorem, we just substitute the expression

Σ_{j∈J} λ_j x^j + Σ_{k∈K} μ_k z^k

for x in c'x and in Ex = h of (Q), and it is easy to see that (M) is equivalent to (Q).


Decomposition is typically applied in a way such that the constraints defining S are somehow relatively "nice," and the constraints Ex = h are somehow "complicating" the situation. For example, we may have a problem where the overall constraint matrix has the form depicted in Figure 7.1. In such a scenario, we would let E be the few dense rows at the top of the matrix, and let A be the block-diagonal matrix formed by the remaining rows.

    7.1.2 Solution of the Master via the Simplex Algorithm

Next, we describe how to solve (M) using the Simplex Algorithm. Our viewpoint is that we cannot write out (M) explicitly; there are typically far too many variables. But we can reasonably maintain a basic solution of (M), because the number of constraints is just one more than the number of constraints in Ex = h.

The only part of the Simplex Algorithm that is sensitive to the total number of variables is the step in which we check whether there is a variable with a negative reduced cost. So rather than checking this directly, we will find an indirect way to carry it out. Toward this end, we define dual variables y and σ for (M):

min Σ_{j∈J} (c'x^j) λ_j + Σ_{k∈K} (c'z^k) μ_k          dual variables
Σ_{j∈J} (Ex^j) λ_j + Σ_{k∈K} (Ez^k) μ_k = h;           y
Σ_{j∈J} λ_j = 1;                                       σ
λ_j ≥ 0, j ∈ J; μ_k ≥ 0, k ∈ K.     (M)

While σ is a scalar variable, y is a vector with a component for each row of E. We will temporarily put aside how we calculate values for y and σ, but for now we suppose that we have a basic solution of (M) and an associated dual solution ȳ and σ̄.

Entering variable. With respect to such a dual solution, the reduced cost of a variable λ_j is

(c'x^j) − ȳ'(Ex^j) − σ̄ = −σ̄ + (c' − ȳ'E)x^j.

It is noteworthy that with the dual solution fixed (at ȳ and σ̄), the reduced cost of λ_j is a constant (−σ̄) plus a linear function of x^j. A variable λ_j is eligible to enter the basis if its reduced cost is negative. So we formulate the following optimization problem:

−σ̄ + min (c' − ȳ'E)x
Ax = b;
x ≥ 0.     (SUB)
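A single pricing step can be sketched as follows. The instance is hypothetical, S is a segment whose two vertices we enumerate directly, and `price` is our own helper: given dual values (ȳ, σ̄) from the restricted master, it minimizes (c' − ȳ'E)x over S and reports the reduced cost −σ̄ plus that minimum.

```python
# Sketch of the pricing step: the reduced cost of lambda_j is
# -sigma_bar + (c' - y_bar' E) x^j, so we minimize over x in S.
import numpy as np

c = np.array([1.0, 2.0])
E = np.array([[3.0, 1.0]])          # one 'complicating' row
vertices = [np.array([1.0, 0.0]),   # basic-feasible solutions of
            np.array([0.0, 1.0])]   # S = {x >= 0 : x_1 + x_2 = 1}

def price(y_bar, sigma_bar):
    # Solve (SUB) over the vertices of S and report the best column.
    costs = [(c - E.T @ y_bar) @ v for v in vertices]
    j = int(np.argmin(costs))
    reduced_cost = -sigma_bar + costs[j]
    return reduced_cost, vertices[j]

rc, v = price(np.array([0.0]), 0.0)
assert abs(rc - 1.0) < 1e-12        # min is +1: no eligible entering column
rc, v = price(np.array([0.0]), 2.0)
assert rc < 0                       # now the lambda for vertex (1,0) is eligible
print("entering column generated from", v)
```

The point of the indirection is exactly as in the text: one subproblem solve prices all (exponentially many) columns of (M) at once.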


That is,

ζ̄ = B^{-1} [h; 1].

Finally, organizing the costs (c'x^j) and (c'z^k) of the basic variables λ_j and μ_k into a vector c̄, with their order appropriately matched with the columns of B, the associated dual solution (y', σ) is precisely the solution of

(y', σ) B = c̄'.

That is, (y', σ) = c̄' B^{-1}.

Starting basis. It is not obvious how to construct a feasible starting basis for (M); after all, we may not have at hand any basic-feasible solutions and rays of S. Next, we give a simple recipe. We assume that the problem is first in a slightly different form, where the complicating constraints are inequalities:

min c'x
Ex ≤ h;
Ax = b;
x ≥ 0,     (Q)

where x is a vector of n − m variables, E ∈ R^{m×(n−m)}, A ∈ R^{p×(n−m)}, and all of the other data has dimensions that conform.

Introducing m slack variables for the Ex ≤ h constraints, we have the equivalent problem

min c'x
Ex + s = h;
Ax = b;
x, s ≥ 0.

Now, it is convenient to define x̃ ∈ R^n by

x̃_j := x_j, for j = 1, ..., n − m;
x̃_j := s_{j−(n−m)}, for j = n − m + 1, ..., n,

and to define Ẽ ∈ R^{m×n} by

Ẽ_j := E_j, for j = 1, ..., n − m;
Ẽ_j := e^{j−(n−m)}, for j = n − m + 1, ..., n,

and Ã ∈ R^{p×n} by

Ã_j := A_j, for j = 1, ..., n − m;
Ã_j := 0, for j = n − m + 1, ..., n.

That is, x̃ = (x; s), Ẽ = [E, I_m], Ã = [A, 0], so now our system has the familiar equivalent form

min c̃'x̃ (where c̃ := (c; 0))
Ẽx̃ = h;
Ãx̃ = b;
x̃ ≥ 0.


Now, we have everything organized to specify how to get an initial basis for (M). First, we take as x^1 any basic-feasible solution of S. Such a solution can be readily obtained using our usual (phase-one) methodology of the Simplex Algorithm. Next, we observe that for k = 1, 2, ..., m, z^k := e^{n−m+k} is a basic-feasible ray of S. The ray z^k corresponds to increasing the slack variable s_k (arbitrarily); because the slack variable s_k = x̃_{n−m+k} does not truly enter into the Ãx̃ = b constraints (i.e., the coefficient of s_k is zero in those constraints), we do not have to alter the values of any other variables when s_k is increased.

So, we have as an initial set of basic variables μ_1, ..., μ_m (corresponding to z^1, ..., z^m) and λ_1 (corresponding to x^1). Notice that Ẽz^k = e^k, for k = 1, 2, ..., m. Organizing our basic variables in the order μ_1, μ_2, ..., μ_m, λ_1, we have the initial basis matrix

B = [ I_m   Ẽx^1
      0'    1    ].

It is very easy to see that this is an invertible matrix.

Finally, it is very important to realize that we have given a recipe for finding an initial basic solution of (M). [It happens that this basic solution is feasible precisely when x^1 satisfies the Ex ≤ h constraints.] But this solution may not be feasible. If the associated basic solution of (M) is not feasible (i.e., has any negative components), then we would introduce an artificial variable and do a phase-one procedure.

A demonstration implementation. It is not completely trivial to write a small MATLAB code for the Decomposition Algorithm. First of all, we solve the subproblems (SUB) using functionality of MATLAB. With this, if a linear-optimization problem is unbounded, MATLAB does not give us access to the basic-feasible ray on which the objective is decreasing. Because of this, our MATLAB code quits if it encounters an unbounded subproblem.

Another point is that rather than carry out the simplex method at a detailed level on the Master Problem (M), we just accumulate all columns of (M) that we generate, and always solve linear-optimization problems, using functionality of MATLAB, with all of the columns generated thus far. In this way, we do not maintain bases ourselves, and we do not carry out the detailed pivots of the Simplex Algorithm. Note that the linear-optimization functionality of MATLAB does give us a dual solution, so we do not compute that ourselves.

% Decomp.m // Jon Lee
%
% Apply Decomposition to
%   z = min c'x
%       Ex + Is = h
%       Ax      = b
%       x , s >= 0
%
% NOTE: WE ONLY HANDLE PROBLEMS WITH BOUNDED SUBPROBLEM LPs.
% IF WE ENCOUNTER AN UNBOUNDED SUBPROBLEM, THEN WE QUIT.

% generate a random example
n = 50 % number of x variables


m1 = 15 % number of 'complicating' equations = number of s variables
m2 = 10 % number of 'nice' equations
rng('default');
rng(1); % set seed
E = rand(m1,n); % random m1-by-n matrix
A = rand(m2,n); % random m2-by-n matrix
w = rand(n,1); % generate rhs's so problem is feasible
h = E*w + 0.01*rand(m1,1);
b = A*w;
c = rand(n,1); % generate random objective

% first we will calculate the optimal value z of the original LP, just to
% compare later.
disp('Optimizing the original LP without decomposition');
%options = optimoptions(@linprog,'Algorithm','simplex');
%[x,z,exitflag,output,lambda]=linprog([c;zeros(m1,1)],[],[], ...
%  [horzcat(E,eye(m1)); horzcat(A,zeros(m2,m1))],[h; b],zeros(n+m1,1), ...
%  [],[],options);
[x,z,exitflag,output,lambda]=linprog([c;zeros(m1,1)],[],[], ...
  [horzcat(E,eye(m1)); horzcat(A,zeros(m2,m1))],[h; b], ...
  zeros(n+m1,1),[],[]);
if (exitflag < 1)
  disp('fail 1: original LP without decomposition did not solve correctly');
  return;
end;
disp('Optimal value of the original LP:');
z
%disp('dual solution:');
%disp(-lambda.eqlin); % Need to flip the signs due to a Matlab convention

disp('Starting Decomposition Algorithm');
MAXIT = 50 % number of iterations

% Set up the initial master basis
% ... First, we need any extreme point of the subproblem
x1 = linprog(c',[],[],A,b,zeros(n,1),[]);
X = x1; % start accumulating the extreme points
cX = c'*x1; % and their costs
numx = 1;

% Next, we build out the initial master basis
B = [horzcat(eye(m1), E*x1) ; horzcat(zeros(1,m1), 1)];
numcol = m1+1;

% OK, now we have a basis matrix for the master.
% Let's see if this basis is feasible.
zeta = linsolve(B,[h ; 1])

% If it is not feasible, we need to adjoin an artificial column
if (min(zeta) < -0.00000001)
  disp('*** Phase-one needed');
  % adjoin an artificial column
  numcol = numcol + 1;
  artind = numcol;
  B(:,artind) = -B*ones(m1+1,1);


  xi = zeros(numcol,1);
  xi(artind) = 1;

  % OK, now we are going to solve a phase-one problem to drive the artificial
  % variable to zero. Of course, we need to do this with decomposition.
  k = 1; % iteration counter
  subval = Inf;
  sigma = 0.0;
  while sigma + subval < -0.00000001 && k <= MAXIT


end;
%
% Now time for phase-two
disp('*** Starting phase-two');
results1 = zeros(MAXIT,1);
results2 = zeros(MAXIT,1);
k = 0; % iteration counter
subval = Inf;
sigma = 0.0;
while sigma + subval < -0.00000001 && k <= MAXIT


    Figure 7.2: Example of upper bounds generated by Decomposition

    7.2 Lagrangian Relaxation


Again, we consider

z := min c'x
Ex = h;
Ax = b;
x ≥ 0,     (Q)

but our focus now is on efficiently getting a good lower bound on z, with again the view that we are able to quickly solve many linear-optimization problems having only the constraints Ax = b, x ≥ 0.

    7.2.1 Lagrangian bounds

For any fixed choice of ŷ, consider the following Lagrangian optimization problem:

v(ŷ) := ŷ'h + min (c' − ŷ'E)x
Ax = b;
x ≥ 0.     (L_ŷ)

Note that the only variables in the minimization are x; we consider ŷ to be fixed.

Theorem 7.3

v(ŷ) ≤ z, for all ŷ in the domain of v.

Proof. Let x* be an optimal solution for (Q). Clearly x* is feasible for (L_ŷ). Therefore

v(ŷ) ≤ ŷ'h + (c' − ŷ'E)x*
     = c'x* + ŷ'(h − Ex*)
     = z.

The last equation uses the fact that x* is optimal for (Q), so z = c'x* and Ex* = h.
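The bound is easy to observe on a tiny instance (sketch; the data is hypothetical, and the Lagrangian is solved by enumerating the vertices of the "nice" feasible set, here the standard simplex):

```python
# Sketch: v(y_hat) <= z for any y_hat (Theorem 7.3), on a toy instance with
# S = {x >= 0 : x_1 + x_2 + x_3 = 1} and one complicating row Ex = h.
import numpy as np

c = np.array([0.0, 1.0, 3.0])
E = np.array([[1.0, 2.0, 3.0]]); h = np.array([2.0])

def v(y_hat):
    # Lagrangian value: y'h + min over S of (c' - y'E) x; the minimum
    # over the simplex is attained at a vertex, i.e. at a coordinate.
    return float(y_hat @ h + np.min(c - E.T @ y_hat))

z = 1.0   # optimal value of (Q) here, attained at x* = (0, 1, 0)
for y_hat in (np.array([0.0]), np.array([-1.0]), np.array([3.0])):
    assert v(y_hat) <= z + 1e-12
print("v(y_hat) <= z for every y_hat tried")
```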

From what we learned in studying sensitivity analysis, it can be seen that v is a concave (piecewise-linear) function on its domain (see Theorem 6.5). Because of this nice behavior, it is plausible that we could calculate the maximum of v as a means of getting a good lower bound on z. Before doing that, we examine the precise relationship between primal and dual solutions of (Q), maximizers of v, and primal and dual solutions of the Lagrangian.

    Theorem 7.4

Suppose that x* is optimal for (Q), and suppose that y* and π* are optimal for the dual of (Q). Then x* is optimal for (L_{y*}), π* is optimal for the dual of (L_{y*}), y* is a maximizer of v, and the maximum value of v is z.


Proof. x* is clearly feasible for (L_{y*}). Because y* and π* are feasible for the dual of (Q), we have π*'A + y*'E ≤ c', and so π* is feasible for the dual of (L_{y*}).

Using the Strong Duality Theorem for (Q) implies that c'x* = y*'h + π*'b. Using that Ex* = h (feasibility of x* in (Q)), we then have that (c' − y*'E)x* = π*'b. Finally, using the Weak Duality Theorem for (L_{y*}), we have that x* is optimal for (L_{y*}) and π* is optimal for the dual of (L_{y*}).

Next,

z ≥ v(y*)  (by Theorem 7.3)
  = y*'h + (c' − y*'E)x*  (because x* is optimal for (L_{y*}))
  = y*'h + c'x* − y*'h  (because Ex* = h)
  = c'x*
  = z.

Therefore the first inequality is an equation, and so y* is a maximizer of v and the maximum value is z.

Theorem 7.5

Suppose that ŷ is a maximizer of v, and suppose that π̂ is optimal for the dual of (L_ŷ). Then ŷ and π̂ are optimal for the dual of (Q), and the optimal value of (Q) is v(ŷ).

Proof.

v(ŷ) = max_y { v(y) }
     = max_y { y'h + min_x { (c' − y'E)x : Ax = b, x ≥ 0 } }
     = max_y { y'h + max_π { π'b : π'A ≤ c' − y'E } }
     = max_{y,π} { y'h + π'b : y'E + π'A ≤ c' }
     = z.

The third equation follows from taking the dual of the inner (minimization) problem. The last equation follows from seeing that the final maximization (over y and π simultaneously) is just the dual of (Q).

So, we have established that the optimal value z of (Q) is v(ŷ). Looking a bit more closely, we have established that z = ŷ'h + π̂'b, and because π̂'A ≤ c' − ŷ'E, we have that ŷ and π̂ are optimal for the dual of (Q).

    7.2.2 Solving the Lagrangian Dual

Theorem 7.3 gives us a simple way to calculate a lower bound on z, by solving a potentially much-easier linear-optimization problem. But the bound depends on the choice of ŷ ∈ R^m. Can we find the best such ŷ? This would entail solving the so-called Lagrangian Dual problem of maximizing v(y) over all y ∈ R^m. It should seem that there is hope for doing this because v is a concave function. But v is not a smooth function (it is piecewise linear), so we cannot rely on calculus-based techniques.

Theorem 7.6

Suppose that we fix ŷ, and solve for v(ŷ). Let x̂ be the solution of (L_ŷ). Let γ̂ := h − Ex̂. Then

v(y) ≤ v(ŷ) + γ̂'(y − ŷ),

for all y in the domain of v.

Proof.

v(ŷ) + γ̂'(y − ŷ) = ŷ'h + (c' − ŷ'E)x̂ + (y − ŷ)'(h − Ex̂)
                  = y'h + (c' − y'E)x̂
                  ≥ v(y).

The last inequality follows from the fact that x̂ is feasible (but possibly not optimal) for (L_y).

Subgradient. What is v(ŷ) + γ̂'(y − ŷ)? It is a linear estimation of v(y), starting from the actual value of v at ŷ. The direction y − ŷ is what we add to ŷ to move to y. The choice of γ̂ := h − Ex̂ is made so that Theorem 7.6 holds. That is, γ̂ is chosen in such a way that the linear estimation is always an upper bound on the value v(y) of the function, for all y. The nice property of γ̂ demonstrated with Theorem 7.6 has a name: we say that γ̂ := h − Ex̂ is a subgradient of (the concave function) v at ŷ (because it satisfies the inequality of Theorem 7.6).

Subgradient Optimization. Next, we describe a simple Subgradient Optimization Algorithm for solving the Lagrangian Dual. The general idea is to iteratively move in the direction of a subgradient.

Subgradient Optimization Algorithm

0. Start with any y^1 ∈ R^m. Let k = 1.

1. Solve (L_{y^k}) to get x^k.

2. Calculate the subgradient γ^k := h − Ex^k.

3. Let y^{k+1} ← y^k + λ_k γ^k.

4. Let k ← k + 1, and GOTO 1.
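The steps above can be sketched end-to-end on a toy instance (hypothetical data; step sizes λ_k = 1/k, a square-summable but not summable choice). The best Lagrangian bound found climbs to the optimal value z:

```python
# Sketch: Subgradient Optimization for the Lagrangian Dual, with step
# sizes lambda_k = 1/k, on min c'x s.t. Ex = h, x in the standard simplex.
import numpy as np

c = np.array([0.0, 1.0, 3.0])
E = np.array([[1.0, 2.0, 3.0]]); h = np.array([2.0])
z = 1.0                      # optimal value of (Q), attained at x = (0, 1, 0)

def solve_lagrangian(y):
    # (L_y): minimize (c' - y'E) x over the simplex; a vertex is optimal.
    costs = c - E.T @ y
    x = np.zeros(3); x[int(np.argmin(costs))] = 1.0
    return float(y @ h + costs @ x), x

y = np.array([0.0])          # y^1
best = -np.inf
for k in range(1, 31):
    val, x = solve_lagrangian(y)    # step 1
    best = max(best, val)
    gamma = h - E @ x               # step 2: subgradient of v at y^k
    y = y + (1.0 / k) * gamma       # step 3
assert abs(best - z) < 1e-9  # the Lagrangian Dual closes the gap on this LP
print("best Lagrangian bound:", best)
```

Note that v(y^k) need not increase monotonically; in practice one tracks the best bound seen so far, as done here.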


Convergence. We have neglected, thus far, to fully specify the Subgradient Optimization Algorithm. To prove that the method converges, in a certain sense, it suffices to take a sequence of λ_k > 0 such that Σ_{k=1}^∞ λ_k² < +∞ and Σ_{k=1}^∞ λ_k = +∞. That is, square summable, but not summable.

For example, taking λ_k := α/(β + k), with α > 0 and β ≥ 0, we get a sequence of step sizes satisfying this property; in particular, for α = 1 and β = 0 we have the harmonic series Σ_{k=1}^∞ 1/k, which satisfies Σ_{i=1}^k 1/i ≥ ln(k + 1), for all k.

Let y* be a maximizer of v, and suppose that λ_k > 0, for all k. Then

‖y* − y^{k+1}‖² ≤ ‖y* − y^1‖² − 2 Σ_{i=1}^k λ_i (v(y*) − v(y^i)) + Σ_{i=1}^k λ_i² ‖γ^i‖².

Proof. The proof is by induction on k. The inequality is trivially true for the base case of k = 0. So, consider general k > 0.

‖y* − y^{k+1}‖² = ‖y* − y^k − λ_k γ^k‖²
= (y* − y^k − λ_k γ^k)'(y* − y^k − λ_k γ^k)
= ‖y* − y^k‖² + λ_k² ‖γ^k‖² − 2 λ_k (y* − y^k)'γ^k
≤ ‖y* − y^k‖² + λ_k² ‖γ^k‖² − 2 λ_k (v(y*) − v(y^k))
≤ ‖y* − y^1‖² − 2 Σ_{i=1}^{k−1} λ_i (v(y*) − v(y^i)) + Σ_{i=1}^{k−1} λ_i² ‖γ^i‖² + λ_k² ‖γ^k‖² − 2 λ_k (v(y*) − v(y^k)).

The penultimate inequality uses the assumption that λ_k > 0 and the subgradient inequality

v(y) ≤ v(y^k) + (y − y^k)'γ^k,

plugging in y* for y. The final inequality uses the inductive hypothesis. The f