1 basics optimization
TRANSCRIPT
-
7/29/2019 1 Basics Optimization
1/73
Model Predictive Control for Linear and Hybrid Systems. Basics on Optimization
Francesco Borrelli, Department of Mechanical Engineering,
University of California at Berkeley, USA
February 3, 2011
-
References
From my book:
Main Concepts (Chapter 2)
Optimality Conditions: Lagrange duality theory and KKT Conditions (Chapter 3)
Polyhedra, Polytopes and Simplices (Chapter 4)
Linear and Quadratic Programming (Chapter 5)
Note: The following notes have been extracted from
Stephen Boyd's lecture notes for his course on convex optimization. The full notes can be downloaded from http://www.stanford.edu/~boyd
Nonlinear Programming Theory and Algorithms by Bazaraa, Sherali, Shetty
LMIs in Control by Scherer and Weiland
Lectures on Polytopes by Ziegler
Francesco Borrelli (UC Berkeley) Basics on Optimization February 3, 2011 2 / 73
-
Outline
1 Main concepts: Optimization problems; Continuous problems; Integer and Mixed-Integer Problems; Convexity
2 Optimality conditions: Lagrange duality theory and KKT conditions
3 Polyhedra, polytopes and simplices
4 Linear and quadratic programming
-
Abstract optimization problems
Optimization problems are abundant in economics and daily life (wherever a decision problem is).
General abstract features of problem formulation:
Decision set Z.
Constraints on the decision define a subset S ⊆ Z of feasible decisions.
Assign to each decision a cost f(z) ∈ R. The goal is to minimize the cost by choosing a suitable decision.
Concrete features of problem formulation:
Z is a real vector space: Continuous problem.
Z is a discrete set: Discrete or combinatorial problem.
-
Concrete Optimization Problems
Generally formulated as
inf_z f(z)
subj. to z ∈ S ⊆ Z
(1)
The vector z collects the decision variables.
Z is the domain of the decision variables; S ⊆ Z is the set of feasible or admissible decisions.
The function f : Z → R assigns to each decision z a cost f(z) ∈ R.
Shorter form of problem (1):
inf_{z ∈ S ⊆ Z} f(z)   (2)
Problem (2) is called a nonlinear mathematical program or simply nonlinear program.
-
Concrete Optimization Problems
Solving problem (2) means:
Compute the least possible cost J* (called the optimal value),
J* ≜ inf_{z ∈ S} f(z)   (3)
J* is the greatest lower bound of f(z) over the set S:
f(z) ≥ J*, ∀ z ∈ S
AND
1 ∃ z* ∈ S : f(z*) = J*
OR
2 ∀ ε > 0 ∃ z ∈ S : f(z) ≤ J* + ε
Compute the optimizer, z* ∈ S with f(z*) = J*. If z* exists, then rewrite (3) as
J* = min_{z ∈ S} f(z)   (4)
-
Concrete Optimization Problems
Consider the Nonlinear Program (NLP)
J* = min_{z ∈ S} f(z)
Notation:
If J* = -∞ the problem is unbounded below.
If the set S is empty then the problem is said to be infeasible (we set J* = +∞).
If S = Z the problem is said to be unconstrained.
The set of all optimal solutions is denoted by
argmin_{z ∈ S} f(z) ≜ {z ∈ S : f(z) = J*}
-
Continuous problems
Consider the problem
inf_z f(z)
subj. to g_i(z) ≤ 0 for i = 1, . . . , m
h_i(z) = 0 for i = 1, . . . , p
z ∈ Z
(5)
The domain Z of problem (5) is a subset of R^s (the finite-dimensional Euclidean vector space), defined as:
Z = {z ∈ R^s : z ∈ dom f, z ∈ dom g_i, i = 1, . . . , m, z ∈ dom h_i, i = 1, . . . , p}
A point z̄ ∈ R^s is feasible for problem (5) if z̄ ∈ Z and g_i(z̄) ≤ 0 for i = 1, . . . , m, h_i(z̄) = 0 for i = 1, . . . , p.
The set of feasible vectors is
S = {z ∈ R^s : z ∈ Z, g_i(z) ≤ 0, i = 1, . . . , m, h_i(z) = 0, i = 1, . . . , p}.
-
Local and Global Optimizer
Let J* be the optimal value of problem (5). A global optimizer, if it exists, is a feasible vector z* with f(z*) = J*.
A feasible point z* is a local optimizer for problem (5) if there exists an R > 0 such that
f(z*) = inf_z f(z)
subj. to g_i(z) ≤ 0 for i = 1, . . . , m
h_i(z) = 0 for i = 1, . . . , p
‖z - z*‖ ≤ R
z ∈ Z
(6)
-
Active, Inactive and Redundant Constraints
Consider the problem
inf_z f(z)
subj. to g_i(z) ≤ 0 for i = 1, . . . , m
h_i(z) = 0 for i = 1, . . . , p
z ∈ Z
The i-th inequality constraint g_i(z) ≤ 0 is active at z̄ if g_i(z̄) = 0; otherwise it is inactive.
Equality constraints are active at all feasible points.
Removing a redundant constraint does not change the feasible set S; consequently, removing a redundant constraint from the optimization problem does not change its solution.
-
Problem Description
The functions f, g_i and h_i can be available in analytical form or can be described through an oracle model (also called black-box or subroutine model).
In an oracle model f, g_i and h_i are not known explicitly but can be evaluated by querying the oracle. Often the oracle consists of subroutines which, called with the argument z, return f(z), g_i(z) and h_i(z) and their gradients ∇f(z), ∇g_i(z), ∇h_i(z).
-
Integer and Mixed-Integer Problems
If the decision set Z in the optimization problem is finite, then the optimization problem is called combinatorial or discrete.
If Z ⊆ {0, 1}^s, then the problem is said to be integer.
If Z is a subset of the Cartesian product of an integer set and a real Euclidean space, i.e., Z ⊆ {[z_c, z_b] : z_c ∈ R^{s_c}, z_b ∈ {0, 1}^{s_b}}, then the problem is said to be mixed-integer.
The standard formulation of a mixed-integer nonlinear program is
inf_{[z_c, z_b]} f(z_c, z_b)
subj. to g_i(z_c, z_b) ≤ 0 for i = 1, . . . , m
h_i(z_c, z_b) = 0 for i = 1, . . . , p
z_c ∈ R^{s_c}, z_b ∈ {0, 1}^{s_b}
[z_c, z_b] ∈ Z
(7)
-
Convexity
A set S ⊆ R^s is convex if
λ z_1 + (1 - λ) z_2 ∈ S for all z_1, z_2 ∈ S, λ ∈ [0, 1].
A function f : S → R is convex if S is convex and
f(λ z_1 + (1 - λ) z_2) ≤ λ f(z_1) + (1 - λ) f(z_2)
for all z_1, z_2 ∈ S, λ ∈ [0, 1].
A function f : S → R is strictly convex if S is convex and
f(λ z_1 + (1 - λ) z_2) < λ f(z_1) + (1 - λ) f(z_2)
for all z_1, z_2 ∈ S with z_1 ≠ z_2, λ ∈ (0, 1).
A function f : S → R is concave if S is convex and -f is convex.
-
Operations preserving convexity
1 The intersection of an arbitrary number of convex sets is a convex set:
if S_n is convex for all n ∈ N+, then ∩_{n ∈ N+} S_n is convex.
The empty set is convex.
2 The sub-level sets of a convex function f on S are convex:
if f(z) is convex, then S_α ≜ {z ∈ S : f(z) ≤ α} is convex.
3 If f_1, . . . , f_N are convex, then ∑_{i=1}^N α_i f_i is a convex function for all α_i ≥ 0, i = 1, . . . , N.
4 The composition of a convex function f(z) with an affine map z = Ax + b generates a convex function f(Ax + b) of x.
5 A linear function f(z) = c'z + d is both convex and concave.
6 A quadratic function f(z) = z'Qz + 2s'z + r is convex if and only if Q ∈ R^{s×s} is positive semidefinite, and strictly convex if Q is positive definite.
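Item 6 can be illustrated numerically. The sketch below (data chosen arbitrarily for illustration, not from the slides) checks that a quadratic function with positive definite Q satisfies the convexity inequality on random pairs of points.

```python
import numpy as np

# Quadratic f(z) = z'Qz + 2 s'z + r is convex iff Q is positive semidefinite.
# Q, s, r are arbitrary illustrative data.
Q = np.array([[2.0, 0.5], [0.5, 1.0]])   # symmetric, positive definite
s = np.array([1.0, -1.0])
r = 0.5

def f(z):
    return z @ Q @ z + 2 * s @ z + r

# Q is positive definite: all eigenvalues strictly positive.
assert np.all(np.linalg.eigvalsh(Q) > 0)

# Spot-check the convexity inequality
#   f(l z1 + (1-l) z2) <= l f(z1) + (1-l) f(z2)
rng = np.random.default_rng(0)
for _ in range(100):
    z1, z2 = rng.normal(size=2), rng.normal(size=2)
    lam = rng.uniform()
    lhs = f(lam * z1 + (1 - lam) * z2)
    rhs = lam * f(z1) + (1 - lam) * f(z2)
    assert lhs <= rhs + 1e-12
```

A random spot-check of course does not prove convexity; the eigenvalue test on Q is the actual certificate here.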
-
Operations preserving convexity
Suppose f(x) = h(g(x)) = h(g_1(x), . . . , g_k(x)) with h : R^k → R, g_i : R^s → R. Then,
1 f is convex if h is convex, h is nondecreasing in each argument, and the g_i are convex,
2 f is convex if h is convex, h is nonincreasing in each argument, and the g_i are concave,
3 f is concave if h is concave, h is nondecreasing in each argument, and the g_i are concave.
The pointwise maximum of a set of convex functions is a convex function:
f_1(z), . . . , f_k(z) convex functions ⇒ f(z) = max{f_1(z), . . . , f_k(z)} is a convex function.
-
Convex optimization problems
Problem (5) is convex if the cost f is convex on Z and S is convex.
In convex optimization problems local optimizers are also global optimizers.
Non-convex optimization problems are often solved by iterating between the solutions of convex sub-problems.
Convexity of the feasible set S is difficult to prove except in special cases (see the operations preserving convexity).
Non-convex problems exist which can be transformed into convex problems through a change of variables and manipulations on cost and constraints.
-
Outline
1 Main concepts
2 Optimality conditions: Lagrange duality theory and KKT conditions: Optimality Conditions; Duality theory; Certificate of optimality; Complementary slackness; KKT conditions
3 Polyhedra, polytopes and simplices
4 Linear and quadratic programming
-
Optimality Conditions
Consider the problem
inf_z f(z)
subj. to g_i(z) ≤ 0 for i = 1, . . . , m
h_i(z) = 0 for i = 1, . . . , p
z ∈ Z
In general, an analytical solution does not exist.
Solutions are usually computed by recursive algorithms which start from an initial guess z_0 and at step k generate a point z_k such that {f(z_k)}_{k=0,1,2,...} converges to J*.
These algorithms recursively use and/or solve analytical conditions for optimality.
-
Optimality Conditions for Unconstrained Optimization Problems
Theorem (Necessary condition*)
Suppose f : R^s → R is differentiable at z̄. If there exists a vector d such that ∇f(z̄)'d < 0, then there exists a δ > 0 such that f(z̄ + λd) < f(z̄) for all λ ∈ (0, δ).
The vector d in the theorem above is called a descent direction.
The direction of steepest descent d_s at z̄ is the normalized direction that minimizes ∇f(z̄)'d_s.
The direction of steepest descent is d_s = -∇f(z̄)/‖∇f(z̄)‖.
Corollary
Suppose f : R^s → R is differentiable at z̄. If z̄ is a local minimizer, then ∇f(z̄) = 0.
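A minimal gradient-descent sketch illustrates the corollary: repeatedly stepping along -∇f drives the gradient to zero on a smooth convex function. The function and step size below are our own illustrative choices, not from the slides.

```python
import numpy as np

# Gradient descent along the (unnormalized) steepest-descent direction -grad f.
def grad_f(z):
    # f(z) = (z[0]-1)^2 + 2*(z[1]+3)^2, a smooth convex function
    return np.array([2 * (z[0] - 1), 4 * (z[1] + 3)])

z = np.zeros(2)
for _ in range(500):
    z = z - 0.1 * grad_f(z)   # fixed step, small enough for convergence here

# At the (unconstrained, convex) minimizer the gradient vanishes.
assert np.linalg.norm(grad_f(z)) < 1e-8
```

For this f the iterates contract geometrically toward the minimizer (1, -3), consistent with the necessary condition ∇f(z̄) = 0.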
-
Optimality Conditions for Unconstrained Optimization Problems
Theorem (Sufficient condition*)
Suppose that f : R^s → R is twice differentiable at z̄. If ∇f(z̄) = 0 and the Hessian of f at z̄ is positive definite, then z̄ is a local minimizer.
Theorem (Necessary and sufficient condition*)
Suppose that f : R^s → R is differentiable at z̄. If f is convex, then z̄ is a global minimizer if and only if ∇f(z̄) = 0.
Proofs are available in Chapter 4 of M.S. Bazaraa, H.D. Sherali, and C.M. Shetty. Nonlinear Programming Theory and Algorithms. John Wiley & Sons, Inc., New York, second edition, 1993.
-
Duality Theory. The Lagrange Function
Consider the primal optimization problem
inf_z f(z)
subj. to g_i(z) ≤ 0 for i = 1, . . . , m
h_i(z) = 0 for i = 1, . . . , p
z ∈ Z
Any feasible point gives an upper bound on the optimal value.
The Lagrange dual problem gives a lower bound on the optimal value.
Construct the Lagrange function
L(z, u, v) = f(z) + u_1 g_1(z) + . . . + u_m g_m(z) + v_1 h_1(z) + . . . + v_p h_p(z)
More compactly,
L(z, u, v) ≜ f(z) + u'g(z) + v'h(z)
u_i and v_i are called Lagrange multipliers or dual variables.
The objective is augmented with a weighted sum of the constraint functions.
-
Duality Theory. The Lagrange Function
Consider the Lagrange function
L(z, u, v) ≜ f(z) + u'g(z) + v'h(z)
Let z ∈ S be feasible. For arbitrary vectors u ≥ 0 and v we trivially have
L(z, u, v) ≤ f(z)
After infimization we infer
inf_{z ∈ Z} L(z, u, v) ≤ inf_{z ∈ Z, g(z) ≤ 0, h(z) = 0} f(z)
Best lower bound: since u ≥ 0 and v are arbitrary,
sup_{(u,v), u ≥ 0} inf_{z ∈ Z} L(z, u, v) ≤ inf_{z ∈ Z, g(z) ≤ 0, h(z) = 0} f(z)
-
Duality Theory. The Dual Problem
Let
Φ(u, v) ≜ inf_{z ∈ Z} L(z, u, v) ∈ [-∞, +∞]   (8)
Lagrangian dual problem:
sup_{(u,v), u ≥ 0} Φ(u, v)   (9)
Remarks
Problem (8) (the Lagrangian dual subproblem) is an unconstrained optimization problem. Only points (u, v) with Φ(u, v) > -∞ are interesting.
Φ(u, v) is always concave, so (9) is a concave maximization problem, much easier to solve than the primal (which is non-convex in general).
Weak duality always holds:
sup_{(u,v), u ≥ 0} Φ(u, v) ≤ inf_{z ∈ Z, g(z) ≤ 0, h(z) = 0} f(z)
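Weak duality can be checked by hand on a toy problem of our own choosing (not from the slides): minimize z^2 subject to 1 - z <= 0. Its Lagrangian L(z, u) = z^2 + u(1 - z) has the closed-form dual function Phi(u) = u - u^2/4, attained at z = u/2.

```python
import numpy as np

# Dual function of: minimize z^2 s.t. g(z) = 1 - z <= 0 (toy instance).
def phi(u):
    return u - u ** 2 / 4.0

J_star = 1.0  # primal optimum, attained at z* = 1

# Weak duality: Phi(u) <= J* for every dual feasible u >= 0.
for u in np.linspace(0.0, 10.0, 101):
    assert phi(u) <= J_star + 1e-12

# This convex problem also satisfies Slater's condition (e.g. z = 2 gives
# g(2) < 0), so strong duality holds: d* = Phi(2) = 1 = J*.
assert abs(phi(2.0) - J_star) < 1e-12
```

The gap f(z) - Phi(u) closing to zero at (z*, u*) = (1, 2) is exactly the certificate-of-optimality idea developed on the following slides.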
-
Duality Theory. Duality Gap and Certificate of Optimality
Let:
d* = max_{(u,v), u ≥ 0} Φ(u, v)
J* = min_{z ∈ Z, g(z) ≤ 0, h(z) = 0} f(z)
then
we always have d* ≤ J*
J* - d* is called the optimal duality gap
Strong duality holds if d* = J*
In case of strong duality, u* and v* serve as a certificate of optimality
-
Duality Theory. Constraint Qualifications: Slater's Condition
When do we have
Dual optimal value = Primal optimal value?
In general, strong duality does not hold, even for convex primal problems.
Constraint qualifications: conditions on the constraint functions implying strong duality for convex problems.
Slater's condition (a well-known constraint qualification):
Consider the primal problem. There exists z̄ ∈ R^s which belongs to the relative interior of the problem domain Z, which is feasible (g(z̄) ≤ 0, h(z̄) = 0) and for which g_j(z̄) < 0 for all j for which g_j is not an affine function.
Other constraint qualifications exist: Linear Independence, Cottle's, Zangwill's, Kuhn-Tucker's constraint qualifications.
-
Duality Theory. Slater's Theorem
When do we have
Dual optimal value = Primal optimal value?
Theorem (Slater's theorem)
Consider the primal problem and its dual problem. If the primal problem is convex and Slater's condition holds, then d* > -∞ and d* = J*.
Then
max_{(u,v), u ≥ 0} Φ(u, v) = min_{z ∈ Z, g(z) ≤ 0, h(z) = 0} f(z)
Slater's condition reduces to feasibility when all inequality constraints are linear. Strong duality holds for (feasible) convex QP and LP.
-
Certificate of Optimality
Let z̄ be a feasible point. Then f(z̄) is an upper bound on the optimal cost, i.e., J* ≤ f(z̄).
Let (ū, v̄) be a dual feasible point. Then Φ(ū, v̄) is a lower bound on the optimal cost, i.e., Φ(ū, v̄) ≤ J*.
If z̄ and (ū, v̄) are primal and dual feasible, respectively, then Φ(ū, v̄) ≤ J* ≤ f(z̄). Therefore z̄ is ε-suboptimal, with ε equal to the primal-dual gap, i.e., ε = f(z̄) - Φ(ū, v̄).
The optimal values of the primal and dual problems lie in the same interval:
J*, d* ∈ [Φ(ū, v̄), f(z̄)].
(ū, v̄) is a certificate that proves the (sub)optimality of z̄.
At iteration k, algorithms produce a primal feasible z_k and a dual feasible (u_k, v_k) with f(z_k) - Φ(u_k, v_k) → 0 as k → ∞; hence at iteration k we know J* ∈ [Φ(u_k, v_k), f(z_k)] (a useful stopping criterion).
-
Complementary slackness
Suppose that z*, u*, v* are primal and dual feasible with zero duality gap (hence they are primal and dual optimal). Then
f(z*) = Φ(u*, v*)
= inf_z ( f(z) + u*'g(z) + v*'h(z) )
≤ f(z*) + u*'g(z*) + v*'h(z*)
≤ f(z*)
hence we have ∑_{i=1}^m u_i* g_i(z*) = 0, and since each term is nonpositive,
u_i* g_i(z*) = 0, i = 1, . . . , m
called the complementary slackness condition:
i-th constraint inactive at the optimum ⇒ u_i* = 0
u_i* > 0 at the optimum ⇒ i-th constraint active at the optimum
-
KKT optimality conditions
Suppose
f, g_i, h_i are differentiable
z*, u*, v* are (primal, dual) optimal, with zero duality gap
By complementary slackness we have
f(z*) + ∑_i u_i* g_i(z*) + ∑_j v_j* h_j(z*) = min_z ( f(z) + ∑_i u_i* g_i(z) + ∑_j v_j* h_j(z) )   (10)
i.e., z* minimizes L(z, u*, v*), therefore
∇f(z*) + ∑_i u_i* ∇g_i(z*) + ∑_j v_j* ∇h_j(z*) = 0
-
KKT optimality conditions
The primal and dual optimizers z*, (u*, v*) of an optimization problem with differentiable cost and constraints and zero duality gap have to satisfy the following conditions:
0 = ∇f(z*) + ∑_{i=1}^m u_i* ∇g_i(z*) + ∑_{j=1}^p v_j* ∇h_j(z*)   (11a)
0 = u_i* g_i(z*), i = 1, . . . , m   (11b)
0 ≤ u_i*, i = 1, . . . , m   (11c)
0 ≥ g_i(z*), i = 1, . . . , m   (11d)
0 = h_j(z*), j = 1, . . . , p   (11e)
Conditions (11a)-(11e) are called the Karush-Kuhn-Tucker (KKT) conditions.
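The conditions can be verified by hand on a scalar toy problem of our own choosing: minimize f(z) = (z - 2)^2 subject to g(z) = z - 1 <= 0, whose solution is z* = 1 with multiplier u* = 2.

```python
# KKT check for: minimize (z-2)^2  s.t.  z - 1 <= 0  (toy instance).
z_star, u_star = 1.0, 2.0

grad_f = 2 * (z_star - 2)   # f'(z*) = -2
grad_g = 1.0                # g'(z*) = 1

# (11a) stationarity: grad f + u* grad g = 0
assert grad_f + u_star * grad_g == 0
# (11b) complementary slackness: u* g(z*) = 0 (constraint is active)
assert u_star * (z_star - 1) == 0
# (11c) dual feasibility and (11d) primal feasibility
assert u_star >= 0 and z_star - 1 <= 0
```

Note how the multiplier u* = 2 measures how hard the constraint "pushes back" against the unconstrained minimizer z = 2.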
-
KKT optimality conditions
Consider the primal problem (5).
Theorem (Necessary and sufficient condition)
Suppose that problem (5) is convex and that cost and constraints f, g_i and h_i are differentiable at a feasible z*. If problem (5) satisfies Slater's condition, then z* is optimal if and only if there are (u*, v*) that, together with z*, satisfy the KKT conditions.
Theorem (Necessary and sufficient condition)
Let z* be a feasible solution and A = {i : g_i(z*) = 0} the set of active constraints at z*. Suppose that f and the g_i are differentiable at z* and that the h_i are continuously differentiable at z*. Further, suppose that the gradients ∇g_i(z*) for i ∈ A and ∇h_i(z*) for i = 1, . . . , p are linearly independent. If z*, (u*, v*) are optimal, then they satisfy the KKT conditions. In addition, if problem (5) is convex, then z* is optimal if and only if there are (u*, v*) that, together with z*, satisfy the KKT conditions.
-
KKT geometric interpretation
Figure: at the optimizer z*, the negative cost gradient -∇f(z*) lies in the cone of the gradients ∇g_i(z*) of the active constraints.
Rewrite (11a) as
-∇f(z*) = ∑_{i ∈ I} u_i* ∇g_i(z*), u_i* ≥ 0,
i.e., the direction of cost steepest descent belongs to the convex cone spanned by the ∇g_i's.
KKT conditions. Example
-
KKT conditions under only the convexity assumption are not necessary. Consider the problem:
min z_1
subj. to (z_1 - 1)^2 + (z_2 - 1)^2 ≤ 1
(z_1 - 1)^2 + (z_2 + 1)^2 ≤ 1
(12)
Figure: the two unit disks centered at (1, 1) and (1, -1) touch only at the point (1, 0), where ∇f(1, 0), ∇g_1(1, 0) and ∇g_2(1, 0) are drawn. The feasible set is the single point (1, 0), and condition (11a) cannot be satisfied there.
KKT conditions are necessary if a constraint qualification is satisfied:
∃ z̄ ∈ Z such that g(z̄) ≤ 0, h(z̄) = 0, g_j(z̄) < 0 if g_j is not affine, and z̄ ∈ int Z
Outline
-
1 Main concepts
2 Optimality conditions: Lagrange duality theory and KKT conditions
3 Polyhedra, polytopes and simplices: General Set Definitions and Operations; Polyhedra Definitions and Representations; Basic Operations on Polytopes
4 Linear and quadratic programming
General Set Definitions and Operations
-
An n-dimensional ball B(x_0, ε) is the set B(x_0, ε) = {x ∈ R^n | ‖x - x_0‖_2 ≤ ε}. x_0 and ε are the center and the radius of the ball, respectively.
Affine sets are sets described by the solutions of a system of linear equations:
F = {x ∈ R^n : Ax = b, with A ∈ R^{m×n}, b ∈ R^m}.
The affine combination of x_1, . . . , x_k is defined as the point λ_1 x_1 + . . . + λ_k x_k where ∑_{i=1}^k λ_i = 1.
The affine hull of K ⊆ R^n is the set of all affine combinations of points in K:
aff(K) = {λ_1 x_1 + . . . + λ_k x_k | x_i ∈ K, i = 1, . . . , k, ∑_{i=1}^k λ_i = 1}
General Set Definitions and Operations
-
The dimension of an affine set is the dimension of the largest ball of radius ε > 0 included in the set.
The convex combination of x_1, . . . , x_k is defined as the point λ_1 x_1 + . . . + λ_k x_k where ∑_{i=1}^k λ_i = 1 and λ_i ≥ 0, i = 1, . . . , k.
The convex hull of a set K ⊆ R^n is the set of all convex combinations of points in K and it is denoted as conv(K):
conv(K) ≜ {λ_1 x_1 + . . . + λ_k x_k | x_i ∈ K, λ_i ≥ 0, i = 1, . . . , k, ∑_{i=1}^k λ_i = 1}.
A cone spanned by a finite set of points K = {x_1, . . . , x_k} is defined as
cone(K) = {∑_{i=1}^k λ_i x_i, λ_i ≥ 0, i = 1, . . . , k}.
The Minkowski sum of two sets P, Q ⊆ R^n is defined as
P ⊕ Q ≜ {x + y | x ∈ P, y ∈ Q}.
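The Minkowski sum of two finite point sets is a direct reading of the definition; the sketch below (toy data of our own choosing) sums the unit segments along x_1 and x_2, producing the corners of the unit square.

```python
import numpy as np

# Minkowski sum of two finite point sets: {x + y : x in P, y in Q}.
P = np.array([[0.0, 0.0], [1.0, 0.0]])   # segment along x1 (its endpoints)
Q = np.array([[0.0, 0.0], [0.0, 1.0]])   # segment along x2 (its endpoints)

msum = np.array([p + q for p in P for q in Q])

# The pairwise sums are the four corners of the unit square; the Minkowski
# sum of the full segments would be the square itself (their convex hull).
corners = {(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)}
assert {tuple(v) for v in msum} == corners
```

For polytopes in V-representation, summing the vertex sets pairwise and taking the convex hull gives the Minkowski sum; the H-representation route appears later in these notes.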
Polyhedra Definitions and Representations
-
An H-polyhedron P in R^n denotes an intersection of a finite set of closed halfspaces in R^n:
P = {x ∈ R^n : Ax ≤ b}
Figure: a two-dimensional H-polyhedron bounded by the halfspaces a_i'x ≤ b_i, i = 1, . . . , 5.
Inequalities which can be removed without changing the polyhedron are called redundant. The representation of an H-polyhedron is minimal if it does not contain redundant inequalities.
Polyhedra Definitions and Representations
-
A V-polyhedron P in R^n denotes the Minkowski sum:
P = conv(V) ⊕ cone(Y)
for some V = [V_1, . . . , V_k] ∈ R^{n×k}, Y = [y_1, . . . , y_{k'}] ∈ R^{n×k'}.
Any H-polyhedron is a V-polyhedron.
An H-polytope (V-polytope) is a bounded H-polyhedron (V-polyhedron). Any H-polytope is a V-polytope.
The dimension of a polytope (polyhedron) P is the dimension of its affine hull and is denoted by dim(P).
A polytope P ⊆ R^n is full-dimensional if dim(P) = n or, equivalently, if it is possible to fit a non-empty n-dimensional ball in P. Otherwise, we say that polytope P is lower-dimensional.
If ‖P^x_i‖_2 = 1, where P^x_i denotes the i-th row of a matrix P^x, we say that the polytope P is normalized.
Polyhedra Definitions and Representations
-
A linear inequality c'z ≤ c_0 is said to be valid for P if it is satisfied for all points z ∈ P.
A face of P is any nonempty set of the form
F = P ∩ {z ∈ R^s | c'z = c_0}
where c'z ≤ c_0 is a valid inequality for P.
The faces of dimension 0, 1, dim(P)-2 and dim(P)-1 are called vertices, edges, ridges, and facets, respectively.
A d-simplex is a polytope of R^d with d + 1 vertices.
Figure: (a) V-representation and (b) H-representation of the same polytope in the (x_1, x_2) plane.
Polytopal Complexes
-
A set C ⊆ R^n is called a P-collection (in R^n) if it is a collection of a finite number of n-dimensional polytopes, i.e.,
C = {C_i}_{i=1}^{N_C}, where C_i ≜ {x ∈ R^n : C_i^x x ≤ C_i^c}, dim(C_i) = n, i = 1, . . . , N_C, with N_C < ∞.
The underlying set of a P-collection C = {C_i}_{i=1}^{N_C} is the point set
C̄ ≜ ∪_{P ∈ C} P = ∪_{i=1}^{N_C} C_i.
Special Classes
A collection of sets {C_i}_{i=1}^{N_C} is a strict partition of a set C if (i) ∪_{i=1}^{N_C} C_i = C and (ii) C_i ∩ C_j = ∅, i ≠ j.
{C_i}_{i=1}^{N_C} is a strict polyhedral partition of a polyhedral set C if {C_i}_{i=1}^{N_C} is a strict partition of C and C̄_i is a polyhedron for all i, where C̄_i denotes the closure of the set C_i.
A collection of sets {C_i}_{i=1}^{N_C} is a partition of a set C if (i) ∪_{i=1}^{N_C} C_i = C and (ii) (C_i \ ∂C_i) ∩ (C_j \ ∂C_j) = ∅, i ≠ j.
Functions on Polytopal Complexes
-
A function h(θ) : Θ → R^k, where Θ ⊆ R^s, is piecewise affine (PWA) if there exists a strict partition R_1, . . . , R_N of Θ and h(θ) = H^i θ + k^i, ∀ θ ∈ R_i, i = 1, . . . , N.
A function h(θ) : Θ → R^k, where Θ ⊆ R^s, is piecewise affine on polyhedra (PPWA) if there exists a strict polyhedral partition R_1, . . . , R_N of Θ and h(θ) = H^i θ + k^i, ∀ θ ∈ R_i, i = 1, . . . , N.
A function h(θ) : Θ → R, where Θ ⊆ R^s, is piecewise quadratic (PWQ) if there exists a strict partition R_1, . . . , R_N of Θ and h(θ) = θ'H^i θ + k^i θ + l^i, ∀ θ ∈ R_i, i = 1, . . . , N.
A function h(θ) : Θ → R, where Θ ⊆ R^s, is piecewise quadratic on polyhedra (PPWQ) if there exists a strict polyhedral partition R_1, . . . , R_N of Θ and h(θ) = θ'H^i θ + k^i θ + l^i, ∀ θ ∈ R_i, i = 1, . . . , N.
Basic Operations on Polytopes
-
Convex Hull of a set of points V = {V_i}_{i=1}^{N_V}, with V_i ∈ R^n:
conv(V) = {x ∈ R^n : x = ∑_{i=1}^{N_V} λ_i V_i, 0 ≤ λ_i ≤ 1, ∑_{i=1}^{N_V} λ_i = 1}.   (13)
Used to switch from a V-representation of a polytope to an H-representation.
Vertex Enumeration of a polytope P given in H-representation (the dual of the convex hull operation).
Basic Operations on Polytopes
-
Polytope reduction is the computation of the minimal representation of a polytope. A polytope P ⊆ R^n, P = {x ∈ R^n : P^x x ≤ P^c}, is in a minimal representation if the removal of any row of P^x x ≤ P^c would change it (i.e., if there are no redundant constraints).
The Chebyshev Ball of a polytope P = {x ∈ R^n | P^x x ≤ P^c}, with P^x ∈ R^{n_P × n}, P^c ∈ R^{n_P}, corresponds to the largest-radius ball B(x_c, R) with center x_c such that B(x_c, R) ⊆ P.
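The Chebyshev ball can be computed with a single LP: maximize R subject to A_i x_c + ‖A_i‖_2 R ≤ b_i for each row. A sketch using `scipy.optimize.linprog` on a toy box (our own instance):

```python
import numpy as np
from scipy.optimize import linprog

# Chebyshev ball of P = {x : A x <= b}, here the box [-1, 1]^2.
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.ones(4)

norms = np.linalg.norm(A, axis=1)
# Decision variables (x_c[0], x_c[1], R); linprog minimizes, so use -R.
c = np.array([0.0, 0.0, -1.0])
A_ub = np.hstack([A, norms[:, None]])   # rows: A_i x_c + ||A_i|| R <= b_i
res = linprog(c, A_ub=A_ub, b_ub=b,
              bounds=[(None, None)] * 2 + [(0, None)])

x_c, R = res.x[:2], res.x[2]
assert res.status == 0
# For the unit box: center at the origin, radius 1.
assert abs(R - 1.0) < 1e-7 and np.allclose(x_c, 0.0, atol=1e-7)
```

For non-symmetric polytopes the radius is unique but the center need not be; any LP optimizer is a valid Chebyshev center.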
Figure: the Chebyshev ball of a polytope in the (x_1, x_2) plane.
Basic Operations on Polytopes
-
Projection Given a polytope
P = {[x' y']' ∈ R^{n+m} : P^x x + P^y y ≤ P^c} ⊆ R^{n+m},
the projection onto the x-space R^n is defined as
proj_x(P) ≜ {x ∈ R^n | ∃ y ∈ R^m : P^x x + P^y y ≤ P^c}.
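When the target space is one-dimensional, the projection of a polytope is an interval, so it can be recovered with two LPs (minimize and maximize x over P). A sketch on a toy polytope of our own choosing; general projection requires dedicated algorithms (e.g. Fourier-Motzkin elimination), not shown here.

```python
import numpy as np
from scipy.optimize import linprog

# Toy polytope P = {(x, y) : x >= 0, y >= 0, 0.5 x + y <= 1} in R^(1+1).
A = np.array([[-1.0, 0.0], [0.0, -1.0], [0.5, 1.0]])
b = np.array([0.0, 0.0, 1.0])
bounds = [(None, None), (None, None)]

# proj_x(P) is the interval [min x, max x] over P.
lo = linprog(np.array([1.0, 0.0]), A_ub=A, b_ub=b, bounds=bounds)
hi = linprog(np.array([-1.0, 0.0]), A_ub=A, b_ub=b, bounds=bounds)

assert lo.status == 0 and hi.status == 0
# proj_x(P) = [0, 2]: for every x in this interval some feasible y exists.
assert abs(lo.fun - 0.0) < 1e-9 and abs(-hi.fun - 2.0) < 1e-9
```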
Basic Operations on Polytopes
-
Set-Difference The set-difference of two polytopes Y and R_0,
R = Y \ R_0 ≜ {x ∈ R^n : x ∈ Y, x ∉ R_0},
in general can be a nonconvex and disconnected set, and can be described as a P-collection R = ∪_{i=1}^m R_i, where Y = (∪_{i=1}^m R_i) ∪ (R_0 ∩ Y).
The P-collection R = ∪_{i=1}^m R_i can be computed by consecutively inverting the half-spaces defining R_0, as described in the following theorem.
Theorem
Let Y ⊆ R^n be a polyhedron, R̄_0 ≜ {x ∈ R^n : Ax ≤ b}, and R_0 ≜ {x ∈ Y : Ax ≤ b} = R̄_0 ∩ Y, where b ∈ R^m, R_0 ≠ ∅ and Ax ≤ b is a minimal representation of R̄_0. Also let
R_i = {x ∈ Y : A_i x > b_i, A_j x ≤ b_j, j < i}, i = 1, . . . , m.
Let R ≜ ∪_{i=1}^m R_i. Then R is a P-collection and {R_0, R_1, . . . , R_m} is a strict polyhedral partition of Y.
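The half-space inversion in the theorem assigns every point of Y \ R_0 to exactly one region: x lands in R_i for the first index i whose inequality it violates. A membership sketch on a toy box (our own data):

```python
import numpy as np

# Index of the region containing x: 0 for R0, i for R_i (first violated row).
def region_index(A, b, x, tol=1e-9):
    for i in range(A.shape[0]):
        if A[i] @ x > b[i] + tol:
            return i + 1        # A_i x > b_i and A_j x <= b_j for j < i
    return 0                     # all inequalities hold: x is in R0

# R0 = unit box [-1, 1]^2 (Y taken large enough to be irrelevant here).
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.ones(4)

assert region_index(A, b, np.array([0.0, 0.0])) == 0   # inside R0
assert region_index(A, b, np.array([2.0, 0.0])) == 1   # first row violated
assert region_index(A, b, np.array([-2.0, 2.0])) == 2  # row 1 holds, row 2 fails
```

Because ties only occur on shared facets, the regions R_1, . . . , R_m intersect at most on their boundaries, matching the strict polyhedral partition claim.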
Basic Operations on Polytopes
-
Figure: Two-dimensional example: partition of the rest of the space X \ R_0 into regions R_1, . . . , R_5 obtained by consecutively inverting the half-spaces defining R_0.
Basic Operations on Polytopes
-
Pontryagin Difference The Pontryagin difference (also known as the Minkowski difference) of two polytopes P and Q is a polytope
P ⊖ Q ≜ {x ∈ R^n : x + q ∈ P, ∀ q ∈ Q}.
The Minkowski sum of two polytopes P and Q is a polytope
P ⊕ Q ≜ {x ∈ R^n : ∃ y ∈ P, ∃ z ∈ Q, x = y + z}.
Figure: (a) Pontryagin difference P ⊖ Q; (b) Minkowski sum P ⊕ Q, shown in the (x_1, x_2) plane.
Minkowski Sum of Polytopes
-
The Minkowski sum is computationally expensive. Consider
P = {y ∈ R^n | P^y y ≤ P^c}, Q = {z ∈ R^n | Q^z z ≤ Q^c};
it holds that
W = P ⊕ Q
= {x ∈ R^n | ∃ y, z ∈ R^n : P^y y ≤ P^c, Q^z z ≤ Q^c, x = y + z}
= {x ∈ R^n | ∃ y ∈ R^n s.t. P^y y ≤ P^c, Q^z (x - y) ≤ Q^c}
= proj_x ( { [x' y']' ∈ R^{n+n} | [0 P^y; Q^z -Q^z] [x; y] ≤ [P^c; Q^c] } ).
Pontryagin Difference of Polytopes
-
The Pontryagin difference is not computationally expensive. Consider
P = {y ∈ R^n s.t. P^y y ≤ P^b}, Q = {z ∈ R^n s.t. Q^z z ≤ Q^b}.
Then:
P ⊖ Q = {x ∈ R^n s.t. P^y x ≤ P^b - H(P^y, Q)}
where the i-th element of H(P^y, Q) is
H_i(P^y, Q) ≜ max_{x ∈ Q} P^y_i x
and P^y_i is the i-th row of the matrix P^y. For special cases (e.g. when Q is a hypercube), more efficient computational methods exist.
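Each entry H_i is one small LP (a support function evaluation of Q). A sketch on two toy boxes, using `scipy.optimize.linprog` for the per-row maximization:

```python
import numpy as np
from scipy.optimize import linprog

# P (-) Q = {x : Py x <= Pb - H(Py, Q)}, with H_i = max_{x in Q} Py_i x.
Py = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
Pb = np.ones(4)                        # P = [-1, 1]^2
Qz, Qb = Py.copy(), 0.5 * np.ones(4)   # Q = [-0.5, 0.5]^2

H = np.empty(4)
for i in range(4):
    # maximize Py_i x over Q  <=>  minimize -Py_i x
    res = linprog(-Py[i], A_ub=Qz, b_ub=Qb, bounds=[(None, None)] * 2)
    H[i] = -res.fun

# P (-) Q is the box [-0.5, 0.5]^2: every tightened offset equals 0.5.
assert np.allclose(Pb - H, 0.5)
```

This is why the Pontryagin difference is cheap relative to the Minkowski sum: it needs one LP per facet of P rather than a projection.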
Basic Operations on Polytopes
-
Note that (P ⊖ Q) ⊕ Q ⊆ P.
Figure: Illustration that (P ⊖ Q) ⊕ Q ⊆ P: (c) two polytopes P and Q; (d) polytope P and the Pontryagin difference P ⊖ Q; (e) polytope P ⊖ Q and the set (P ⊖ Q) ⊕ Q.
Affine Mappings and Polyhedra
-
Consider a polyhedron P = {x ∈ R^n | P^x x ≤ P^c}, with P^x ∈ R^{n_P × n}, and an affine mapping f(z)
f : z ∈ R^n ↦ Az + b, A ∈ R^{n×n}, b ∈ R^n
Define the composition of P and f as the following polyhedron
P ∘ f ≜ {z ∈ R^n | P^x f(z) ≤ P^c} = {z ∈ R^n | P^x A z ≤ P^c - P^x b}
Useful for backward reachability.
Affine Mappings and Polyhedra
-
Consider a polyhedron P = {x ∈ R^n | P^x x ≤ P^c}, with P^x ∈ R^{n_P × n}, and an affine mapping f(z)
f : z ∈ R^n ↦ Az + b, A ∈ R^{n×n}, b ∈ R^n
Define the composition of f and P as the following polyhedron
f ∘ P ≜ {y ∈ R^n | y = Ax + b, x ∈ R^n, P^x x ≤ P^c}
The polyhedron f ∘ P can be computed as follows. Write P in V-representation, P = conv(V), and map the vertices V = {V_1, . . . , V_k} through the transformation f. Because the transformation is affine, the set f ∘ P is the convex hull of the transformed vertices:
f ∘ P = conv(F), F = {AV_1 + b, . . . , AV_k + b}.
If f is invertible, x = A^{-1}y - A^{-1}b and therefore
f ∘ P = {y ∈ R^n | P^x A^{-1} y ≤ P^c + P^x A^{-1} b}
Useful for forward reachability.
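The invertible case can be checked numerically: every mapped vertex of P must satisfy the transformed H-representation. A sketch with toy data of our own choosing (P the unit box, A a scaling, b a shift):

```python
import numpy as np

# Forward image f o P = {y : Px A^{-1} y <= Pc + Px A^{-1} b} for y = Ax + b.
Px = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
Pc = np.ones(4)                         # P = [-1, 1]^2
A = np.array([[2.0, 0.0], [0.0, 3.0]])  # invertible scaling
b = np.array([1.0, -1.0])               # shift

Ainv = np.linalg.inv(A)
Hy, hy = Px @ Ainv, Pc + Px @ Ainv @ b  # transformed H-representation

# Map each vertex of the box and verify it lies in the transformed polytope.
verts = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], dtype=float)
for v in verts:
    y = A @ v + b
    assert np.all(Hy @ y <= hy + 1e-9)
```

Mapped vertices in fact land on the boundary (some rows hold with equality), consistent with the image being exactly the transformed box.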
Outline
-
1 Main concepts
2 Optimality conditions: Lagrange duality theory and KKT conditions
3 Polyhedra, polytopes and simplices
4 Linear and quadratic programming
Linear Programming
-
inf_z c'z
subj. to Gz ≤ W
where z ∈ R^s.
These are convex optimization problems.
Other common forms:
inf_z c'z
subj. to Gz ≤ W
G_eq z = W_eq
or
inf_z c'z
subj. to G_eq z = W_eq
z ≥ 0
It is always possible to convert one of the three forms into the others.
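The inequality form is the native input of `scipy.optimize.linprog`, so a small instance can be solved directly. Toy data of our own choosing:

```python
import numpy as np
from scipy.optimize import linprog

# inf_z c'z  subj. to  G z <= W, with the box 0 <= z <= 1 written as rows of G.
c = np.array([-1.0, -1.0])
G = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
W = np.array([1.0, 1.0, 0.0, 0.0])

res = linprog(c, A_ub=G, b_ub=W, bounds=[(None, None)] * 2)

assert res.status == 0
# Optimum at the vertex z* = (1, 1) with J* = -2.
assert abs(res.fun - (-2.0)) < 1e-9
assert np.allclose(res.x, [1.0, 1.0], atol=1e-9)
```

Note that the optimizer sits at a vertex of the feasible polyhedron, anticipating the graphical discussion on the next slide.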
Graphical Interpretation and Solutions Properties
Let P be the feasible set. P is a polyhedron.
-
If P is empty, then the problem is infeasible.
Denote by J* the optimal value and by Z* the set of optimizers,
Z* = argmin_{z ∈ P} c'z.
Case 1. The LP solution is unbounded, i.e., J* = -∞.
Case 2. The LP solution is bounded, i.e., J* > -∞, and the optimizer is unique. Z* is a singleton.
Case 3. The LP solution is bounded and there are multiple optima. Z* is an uncountable subset of R^s which can be bounded or unbounded.
Figure: Graphical interpretation of the linear program solution in the three cases (level sets c'z = k_i, k_1 < k_4).
Dual of LP
-
Consider the LP

$\inf_z\ c'z \quad \text{subj. to } Gz \le W \qquad (14)$

with $z \in \mathbb{R}^s$ and $G \in \mathbb{R}^{m \times s}$. The Lagrange function is

$L(z, u) = c'z + u'(Gz - W).$

The dual cost is

$\Theta(u) = \inf_z\ L(z, u) = \inf_z\ (c' + u'G)z - u'W = \begin{cases} -u'W & \text{if } G'u = -c \\ -\infty & \text{otherwise.} \end{cases}$

Since we are interested only in cases where $\Theta$ is finite, the dual problem is

$\sup_u\ -u'W \quad \text{subj. to } G'u = -c,\ u \ge 0,$

which can be rewritten as the minimization

$\inf_u\ W'u \quad \text{subj. to } G'u = -c,\ u \ge 0.$
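A one-variable numeric sanity check of this primal-dual pair (the example LP, $\min z$ subject to $z \ge 1$, is an assumed illustration, not from the slides):

```python
# Primal:  min c*z  s.t.  G*z <= W   with c = 1, G = -1, W = -1,
# i.e. min z subject to z >= 1; the optimum is z* = 1, J* = 1.
# Dual:    sup -u*W  s.t.  G'u = -c, u >= 0.

c, G, W = 1.0, -1.0, -1.0

z_star = 1.0                # primal optimizer (constraint active)
J_star = c * z_star         # primal optimal value

u = -c / G                  # unique solution of G'u = -c  -> u = 1
assert u >= 0               # dual feasibility
dual_value = -u * W         # dual cost -u'W

print(J_star, dual_value)   # strong duality: both equal 1.0
```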
KKT Conditions for LP
The KKT conditions for the LP (14) become

$G'u = -c, \qquad (15a)$
$(G_j z - W_j) u_j = 0,\ j = 1, \ldots, m, \qquad (15b)$
$u \ge 0, \qquad (15c)$
$Gz \le W, \qquad (15d)$

which are: primal feasibility (15d), dual feasibility (15a), (15c), and the complementary slackness conditions (15b).
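Conditions (15a)–(15d) can be checked mechanically at a candidate primal-dual pair; a sketch for the assumed one-variable LP $\min z$ subject to $z \ge 1$, whose solution $(z^*, u^*) = (1, 1)$ is computed by hand:

```python
# Verify the LP KKT conditions at (z*, u*) = (1, 1) for
# min z  s.t.  -z <= -1   (c = 1, G = -1, W = -1).

c, G, W = 1.0, -1.0, -1.0
z, u = 1.0, 1.0

stationarity    = (G * u == -c)            # (15a)  G'u = -c
complementarity = ((G * z - W) * u == 0)   # (15b)  (G_j z - W_j) u_j = 0
dual_feas       = (u >= 0)                 # (15c)  u >= 0
primal_feas     = (G * z <= W)             # (15d)  G z <= W

print(all([stationarity, complementarity, dual_feas, primal_feas]))  # True
```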
Active Constraints and Degeneracies

Let $J \triangleq \{1, \ldots, m\}$ be the set of constraint indices and consider the sets
$A(z) \triangleq \{j \in J : G_j z = W_j\}$, the set of active constraints at a feasible $z$;
$NA(z) \triangleq \{j \in J : G_j z < W_j\}$, the set of inactive constraints at a feasible $z$.

With reference to the three LP cases above:

Case 1. $A(z^*)$ is undefined, since $z^*$ is undefined.
Case 2. The cardinality of $A(z^*)$ can be any number between $s$ and $m$; it can be higher than $s$ regardless of whether the representation of $P$ is minimal (see figure below).
Case 3. If $P$ is in a minimal representation, then the cardinality of $A(z^*)$ is $s - p$, with $p$ the dimension of the optimal face, at all points $z^* \in Z^*$ contained in the relative interior of the optimal face; otherwise it can be any number between $s - p$ and $m$.
Figure: Primal degeneracy in a linear program; more than $s$ constraints (among those labeled $1, \ldots, 6$) are active at $z^*$.
Active Constraints and Degeneracies
Definition (Primal degenerate)

The LP is said to be primal degenerate if there exists a $z^* \in Z^*$ such that the number of active constraints at $z^*$ is greater than the number of variables $s$.

Definition (Dual degenerate)

The LP is said to be dual degenerate if its dual problem is primal degenerate.

Homework. Show that if the primal problem has multiple optima, then its dual problem is primal degenerate (i.e., the primal problem is dual degenerate). The converse is not always true. Details in the book.
Convex Piecewise Linear Optimization

Consider
$J^* = \min_z\ J(z) \quad \text{subj. to } Gz \le W \qquad (16)$

where the cost function has the form

$J(z) = \max_{i=1,\ldots,k}\ \{c_i' z + d_i\} \qquad (17)$

with $c_i \in \mathbb{R}^s$ and $d_i \in \mathbb{R}$.
Figure: A convex piecewise affine cost function $f(x) = \max_i f_i(x)$, built from the affine pieces $f_1(x), \ldots, f_4(x)$ defined over the regions $P_1, \ldots, P_4$.
Convex Piecewise Linear Optimization
The cost function $J(z)$ in (17) is a convex PWA function. The optimization problem (16)–(17) can be solved by the following linear program:

$J^* = \min_{z, \varepsilon}\ \varepsilon \quad \text{subj. to } Gz \le W,\quad c_i' z + d_i \le \varepsilon,\ i = 1, \ldots, k.$
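The reformulation is exact because, for any fixed $z$, the smallest $\varepsilon$ satisfying all $k$ constraints is precisely $\max_i \{c_i'z + d_i\}$. A quick illustration in Python (the two affine pieces are an assumed example, and the minimum is brute-forced on a grid instead of calling an LP solver):

```python
# J(z) = max(z, -z + 2): a convex PWA function of one variable whose
# minimum, J* = 1 at z = 1, is where the two affine pieces intersect.
pieces = [(1.0, 0.0), (-1.0, 2.0)]   # (c_i, d_i)

def J(z):
    # Smallest epsilon with c_i*z + d_i <= epsilon for all i.
    return max(c * z + d for c, d in pieces)

grid = [i / 100.0 for i in range(-300, 301)]   # z in [-3, 3]
z_best = min(grid, key=J)
print(z_best, J(z_best))  # -> 1.0 1.0
```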
Convex Piecewise Linear Optimization
Consider

$J^* = \min_z\ J^1(z^1) + J^2(z^2) \quad \text{subj. to } G^1 z^1 + G^2 z^2 \le W \qquad (18)$

where the cost function has the form

$J^1(z^1) = \max_{i=1,\ldots,k}\ \{c_i' z^1 + d_i\}, \qquad J^2(z^2) = \max_{i=1,\ldots,j}\ \{m_i' z^2 + n_i\}. \qquad (19)$

The optimization problem (18)–(19) can be solved by the following linear program:

$J^* = \min_{z, \varepsilon^1, \varepsilon^2}\ \varepsilon^1 + \varepsilon^2 \quad \text{subj. to } G^1 z^1 + G^2 z^2 \le W,\quad c_i' z^1 + d_i \le \varepsilon^1,\ i = 1, \ldots, k,\quad m_i' z^2 + n_i \le \varepsilon^2,\ i = 1, \ldots, j.$
Example
The optimization problem

$\min_{z_1, z_2}\ |z_1 + 5| + |z_2 - 3|$

subject to $2.5 \le z_1 \le 5$, $-1 \le z_2 \le 1$,

can be solved in Matlab by using v = linprog(f,A,b), where $v = [\varepsilon_1,\ \varepsilon_2,\ z_1,\ z_2]'$, $f = [1\ 1\ 0\ 0]'$, and

A = [ 0  0  1  0
      0  0 -1  0
      0  0  0  1
      0  0  0 -1
     -1  0  1  0
     -1  0 -1  0
      0 -1  0  1
      0 -1  0 -1 ],   b = [5; -2.5; 1; 1; -5; 5; 3; -3].

The solution is $v^* = [7.5\ \ 2\ \ 2.5\ \ 1]'$.
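The reported optimizer can be sanity-checked by brute force (a Python sketch assuming the bounds $2.5 \le z_1 \le 5$ and $-1 \le z_2 \le 1$; since the cost is separable, each term is minimized on its own grid):

```python
# Minimize |z1 + 5| + |z2 - 3| over 2.5 <= z1 <= 5, -1 <= z2 <= 1.

def cost(z1, z2):
    return abs(z1 + 5) + abs(z2 - 3)

z1_grid = [2.5 + i / 40 for i in range(101)]   # z1 in [2.5, 5]
z2_grid = [-1.0 + i / 50 for i in range(101)]  # z2 in [-1, 1]

z1_best = min(z1_grid, key=lambda z: abs(z + 5))
z2_best = min(z2_grid, key=lambda z: abs(z - 3))
print(z1_best, z2_best, cost(z1_best, z2_best))  # -> 2.5 1.0 9.5
```

This matches the LP result: $\varepsilon_1 = 7.5$ at $z_1 = 2.5$ and $\varepsilon_2 = 2$ at $z_2 = 1$.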
Quadratic Programming
$\min_z\ \tfrac{1}{2} z' H z + q' z + r \quad \text{subj. to } Gz \le W \qquad (20)$

where $z \in \mathbb{R}^s$ and $H = H' \succ 0$, $H \in \mathbb{R}^{s \times s}$.
Other QP forms often include equality and inequality constraints.
Let P be the feasible set. Two cases can occur if P is not empty:

Case 1. The optimizer lies strictly inside the feasible polyhedron.
Case 2. The optimizer lies on the boundary of the feasible polyhedron.
Figure: Level curves $z'Hz + q'z + r = k_i$ of the QP cost over the feasible polyhedron $P$: (a) Case 1, the optimizer $z^*$ lies strictly inside $P$; (b) Case 2, the optimizer $z^*$ lies on the boundary of $P$.
Dual of QP
$\min_z\ \tfrac{1}{2} z' H z + q' z \quad \text{subj. to } Gz \le W$
The Lagrange function is

$L(z, u) = \tfrac{1}{2} z' H z + q' z + u'(Gz - W).$

The dual cost is

$\Theta(u) = \min_z\ \tfrac{1}{2} z' H z + q' z + u'(Gz - W) \qquad (21)$

and the dual problem is

$\max_{u \ge 0}\ \min_z\ \tfrac{1}{2} z' H z + q' z + u'(Gz - W).$

For a given $u$ the Lagrange function $\tfrac{1}{2} z' H z + q' z + u'(Gz - W)$ is convex in $z$. Therefore it is necessary and sufficient for optimality that the gradient be zero:

$Hz + q + G'u = 0.$
Dual of QP
From the previous equation we can derive $z = -H^{-1}(q + G'u)$ and, substituting this into equation (21), we obtain:

$\Theta(u) = -\tfrac{1}{2} u'(G H^{-1} G') u - u'(W + G H^{-1} q) - \tfrac{1}{2} q' H^{-1} q \qquad (22)$

By using (22) the dual problem can be rewritten as:

$\min_u\ \tfrac{1}{2} u'(G H^{-1} G') u + u'(W + G H^{-1} q) + \tfrac{1}{2} q' H^{-1} q \quad \text{subj. to } u \ge 0.$
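A one-dimensional numeric check of the dual cost (22) (the scalar QP $\min \tfrac{1}{2}z^2$ subject to $z \le -1$ is an assumed example; with $h = 1$, $q = 0$ the primal optimum is $z^* = -1$, $J^* = 0.5$):

```python
# Scalar QP:  min 0.5*h*z^2 + q*z  s.t.  g*z <= w,
# with h = 1, q = 0, g = 1, w = -1  (i.e. z <= -1).
h, q, g, w = 1.0, 0.0, 1.0, -1.0

def theta(u):
    # Dual cost (22) specialized to scalars:
    # -0.5*u*(g*h^-1*g)*u - u*(w + g*h^-1*q) - 0.5*q*h^-1*q
    return -0.5 * (g * g / h) * u ** 2 - u * (w + g * q / h) - 0.5 * q * q / h

us = [i / 100.0 for i in range(201)]   # u in [0, 2]
u_star = max(us, key=theta)
print(u_star, theta(u_star))  # -> 1.0 0.5  (equals the primal optimum J*)
```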
KKT Conditions for QP
Consider the QP (20); here $\nabla f(z) = Hz + q$, $g_i(z) = G_i z - W_i$, and $\nabla g_i(z) = G_i'$. The KKT conditions become

$Hz + q + G'u = 0 \qquad (23a)$
$u_i (G_i z - W_i) = 0, \quad i = 1, \ldots, m \qquad (23b)$
$u \ge 0 \qquad (23c)$
$Gz - W \le 0 \qquad (23d)$
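As with the LP case, (23a)–(23d) can be verified at a hand-computed candidate; a sketch for the assumed scalar QP $\min \tfrac{1}{2}z^2$ subject to $z \le -1$, with $(z^*, u^*) = (-1, 1)$:

```python
# Check the QP KKT conditions at (z*, u*) = (-1, 1) for
# min 0.5*z^2  s.t.  z <= -1   (h = 1, q = 0, g = 1, w = -1).
h, q, g, w = 1.0, 0.0, 1.0, -1.0
z, u = -1.0, 1.0

stationarity    = (h * z + q + g * u == 0)   # (23a)  Hz + q + G'u = 0
complementarity = (u * (g * z - w) == 0)     # (23b)  u_i (G_i z - W_i) = 0
dual_feas       = (u >= 0)                   # (23c)
primal_feas     = (g * z - w <= 0)           # (23d)

print(all([stationarity, complementarity, dual_feas, primal_feas]))  # True
```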
Active Constraints and Degeneracies
Let $J \triangleq \{1, \ldots, m\}$ be the set of constraint indices. Consider the sets of active and inactive constraints at a feasible $z$:

$A(z) \triangleq \{j \in J : G_j z = W_j\}$
$NA(z) \triangleq \{j \in J : G_j z < W_j\}.$

We have two cases:

Case 1. $A(z^*) = \emptyset$: no constraint is active at the optimizer.
Case 2. $A(z^*)$ is a nonempty subset of $\{1, \ldots, m\}$.

The QP is said to be primal degenerate if there exists a $z^* \in Z^*$ such that the number of active constraints at $z^*$ is greater than the number of variables $s$.

The QP is said to be dual degenerate if its dual problem is primal degenerate.
Constrained Least-Squares Problems
The problem of minimizing the convex quadratic function

$\|Az - b\|_2^2 = z' A' A z - 2 b' A z + b' b$

arises in many fields and has many names, e.g., linear regression or least-squares approximation. The minimizer is

$z^* = (A'A)^{-1} A' b \triangleq A^{\dagger} b.$
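A small worked instance of the normal equations (the 3×2 matrix A is an assumed example; exact rational arithmetic from the standard library avoids floating-point noise, and the final check uses the fact that the residual of the least-squares minimizer is orthogonal to the columns of A):

```python
from fractions import Fraction as F

# Least squares for A z ~ b with a 3x2 A:  z* = (A'A)^{-1} A' b.
A = [[1, 0], [0, 1], [1, 1]]
b = [1, 2, 2]

AtA = [[2, 1], [1, 2]]   # A'A, computed by hand for this small A
Atb = [3, 4]             # A'b

# Solve the 2x2 normal equations exactly (det(A'A) = 3).
det = F(AtA[0][0] * AtA[1][1] - AtA[0][1] * AtA[1][0])
z = [(AtA[1][1] * Atb[0] - AtA[0][1] * Atb[1]) / det,
     (AtA[0][0] * Atb[1] - AtA[1][0] * Atb[0]) / det]
print(z)  # -> [Fraction(2, 3), Fraction(5, 3)]

# Optimality check: A'(A z* - b) = 0.
residual = [sum(F(A[i][j]) * z[j] for j in range(2)) - b[i] for i in range(3)]
At_r = [sum(F(A[i][j]) * residual[i] for i in range(3)) for j in range(2)]
print(At_r)  # -> [Fraction(0, 1), Fraction(0, 1)]
```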
When linear inequality constraints are added, the problem is called constrained linear regression or constrained least-squares, and there is no longer a simple analytical solution. As an example, consider regression with lower and upper bounds on the variables, i.e.,

$\min_z\ \|Az - b\|_2^2 \quad \text{subj. to } l_i \le z_i \le u_i,\ i = 1, \ldots, n,$

which is a QP.
Example
Consider

$\min_z\ \|Az - b\|_2^2$

where

A = [ 0.7513  0.5472  0.8143
      0.2551  0.1386  0.2435
      0.5060  0.1493  0.9293
      0.6991  0.2575  0.3500
      0.8909  0.8407  0.1966
      0.9593  0.2543  0.2511 ],   b = [ 0.6160; 0.4733; 0.3517; 0.8308; 0.5853; 0.5497 ].

Unconstrained least-squares in Matlab: z = A\b or z = quadprog(A'*A, -A'*b), giving $z^* = [0.7166,\ -0.0205,\ 0.1180]'$.

Now assume $z_2 \ge 0$. Constrained least-squares in Matlab: z = quadprog(A'*A, -A'*b, [0 -1 0], 0), giving $z^* = [0.7045,\ 0,\ 0.1194]'$.
Nonlinear Programming
Consider

$\min_z\ f(z) \quad \text{subj. to } g_i(z) \le 0 \text{ for } i = 1, \ldots, m;\ \ h_i(z) = 0 \text{ for } i = 1, \ldots, p;\ \ z \in Z.$

A variety of software packages exists.

In general, global optimality is not guaranteed. Solutions are usually computed by recursive algorithms which start from an initial guess $z_0$ and at step $k$ generate a point $z_k$ such that $\{f(z_k)\}_{k=0,1,2,\ldots}$ converges to $J^*$.

These algorithms recursively use and/or solve analytical conditions for optimality.

In this class we will use NPSOL.
Nonlinear Programming - NPSOL
Possible syntax:

[INFORM,ITER,ISTATE,C,CJAC,CLAMDA,OBJF,OBJGRAD,R,X] = npsol(X0,A,L,U,funobj,funcon,OPTION);

solving

$\min_z\ \texttt{funobj}(z) \quad \text{subj. to } L \le \begin{bmatrix} z \\ Az \\ \texttt{funcon}(z) \end{bmatrix} \le U.$

The NPSOL manual and an example are on bSpace. Note that it is a mex-function.
Nonlinear Programming - Matlab Optimization toolbox
Possible syntax:

[X,FVAL,EXITFLAG] = fmincon(funobj,X0,A,B,Aeq,Beq,LB,UB,NONLCON);

solving

$\min_z\ \texttt{funobj}(z) \quad \text{subj. to } Az \le B,\ A_{eq} z = B_{eq},\ LB \le z \le UB,\ C(z) \le 0,\ C_{eq}(z) = 0.$

The function NONLCON accepts $z$ and returns the vectors $C(z)$ and $C_{eq}(z)$. See help fmincon in Matlab.