1 basics optimization
TRANSCRIPT
-
7/29/2019 1 Basics Optimization
1/73
Model Predictive Control for Linear and Hybrid Systems. Basics on Optimization
Francesco Borrelli, Department of Mechanical Engineering,
University of California at Berkeley, USA
February 3, 2011
-
References
From my book:
Main Concepts (Chapter 2)
Optimality Conditions: Lagrange duality theory and KKT Conditions (Chapter 3)
Polyhedra, Polytopes and Simplices (Chapter 4)
Linear and Quadratic Programming (Chapter 5)
Note: The following notes have been extracted from
Stephen Boyd's lecture notes for his course on convex optimization. The full notes can be downloaded from http://www.stanford.edu/~boyd
Nonlinear Programming Theory and Algorithms by Bazaraa, Sherali, Shetty
LMIs in Control by Scherer and Weiland
Lectures on Polytopes by Ziegler
Francesco Borrelli (UC Berkeley) Basics on Optimization February 3, 2011 2 / 73
-
Outline
1 Main concepts: Optimization problems; Continuous problems; Integer and Mixed-Integer Problems; Convexity
2 Optimality conditions: Lagrange duality theory and KKT conditions
3 Polyhedra, polytopes and simplices
4 Linear and quadratic programming
-
Abstract optimization problems
Optimization problems are abundant in economics and daily life (wherever a decision problem is).
General abstract features of problem formulation:
Decision set Z.
Constraints on the decision define a subset S ⊆ Z of feasible decisions.
Assign to each decision a cost f(z) ∈ R. The goal is to minimize the cost by choosing a suitable decision.
Concrete features of problem formulation:
Z is a real vector space: Continuous problem.
Z is a discrete set: Discrete or combinatorial problem.
-
Concrete Optimization Problems
Generally formulated as
inf_z f(z)
subj. to z ∈ S ⊆ Z
(1)
The vector z collects the decision variables.
Z is the domain of the decision variables; S ⊆ Z is the set of feasible or admissible decisions.
The function f : Z → R assigns to each decision z a cost f(z) ∈ R.
Shorter form of problem (1):
inf_{z ∈ S ⊆ Z} f(z)   (2)
Problem (2) is called a nonlinear mathematical program or simply nonlinear program.
-
Concrete Optimization Problems
Solving problem (2) means:
Compute the least possible cost J* (called the optimal value),
J* ≜ inf_{z ∈ S} f(z)   (3)
J* is the greatest lower bound of f(z) over the set S:
f(z) ≥ J*, ∀ z ∈ S
AND
1 ∃ z* ∈ S : f(z*) = J*
OR
2 ∀ ε > 0 ∃ z ∈ S : f(z) ≤ J* + ε
Compute the optimizer, z* ∈ S with f(z*) = J*. If z* exists, then rewrite (3) as
J* = min_{z ∈ S} f(z)   (4)
-
Concrete Optimization Problems
Consider the Nonlinear Program (NLP)
J* = min_{z ∈ S} f(z)
Notation:
If J* = -∞ the problem is unbounded below.
If the set S is empty then the problem is said to be infeasible (we set J* = +∞).
If S = Z the problem is said to be unconstrained.
The set of all optimal solutions is denoted by
argmin_{z ∈ S} f(z) ≜ {z ∈ S : f(z) = J*}
-
Continuous problems
Consider the problem
inf_z f(z)
subj. to g_i(z) ≤ 0 for i = 1, . . . , m
h_i(z) = 0 for i = 1, . . . , p
z ∈ Z
(5)
The domain Z of problem (5) is a subset of R^s (the finite-dimensional Euclidean vector space), defined as:
Z = {z ∈ R^s : z ∈ dom f, z ∈ dom g_i, i = 1, . . . , m, z ∈ dom h_i, i = 1, . . . , p}
A point z̄ ∈ R^s is feasible for problem (5) if z̄ ∈ Z and g_i(z̄) ≤ 0 for i = 1, . . . , m, h_i(z̄) = 0 for i = 1, . . . , p.
The set of feasible vectors is
S = {z ∈ R^s : z ∈ Z, g_i(z) ≤ 0, i = 1, . . . , m, h_i(z) = 0, i = 1, . . . , p}.
-
Local and Global Optimizer
Let J* be the optimal value of problem (5). A global optimizer, if it exists, is a feasible vector z* with f(z*) = J*.
A feasible point z* is a local optimizer for problem (5) if there exists an R > 0 such that
f(z*) = inf_z f(z)
subj. to g_i(z) ≤ 0 for i = 1, . . . , m
h_i(z) = 0 for i = 1, . . . , p
‖z - z*‖ ≤ R
z ∈ Z
(6)
-
Active, Inactive and Redundant Constraints
Consider the problem
inf_z f(z)
subj. to g_i(z) ≤ 0 for i = 1, . . . , m
h_i(z) = 0 for i = 1, . . . , p
z ∈ Z
The i-th inequality constraint g_i(z) ≤ 0 is active at z̄ if g_i(z̄) = 0; otherwise it is inactive.
Equality constraints are active at all feasible points.
Removing a redundant constraint does not change the feasible set S; consequently, removing a redundant constraint from the optimization problem does not change its solution.
-
Problem Description
The functions f, g_i and h_i can be available in analytical form or can be described through an oracle model (also called black-box or subroutine model).
In an oracle model f, g_i and h_i are not known explicitly but can be evaluated by querying the oracle. Often the oracle consists of subroutines which, called with the argument z, return f(z), g_i(z) and h_i(z) and their gradients ∇f(z), ∇g_i(z), ∇h_i(z).
-
Integer and Mixed-Integer Problems
If the decision set Z in the optimization problem is finite, then the optimization problem is called combinatorial or discrete.
If Z ⊆ {0, 1}^s, then the problem is said to be integer.
If Z is a subset of the Cartesian product of an integer set and a real Euclidean space, i.e., Z ⊆ {[z_c, z_b] : z_c ∈ R^{s_c}, z_b ∈ {0, 1}^{s_b}}, then the problem is said to be mixed-integer.
The standard formulation of a mixed-integer nonlinear program is
inf_{[z_c, z_b]} f(z_c, z_b)
subj. to g_i(z_c, z_b) ≤ 0 for i = 1, . . . , m
h_i(z_c, z_b) = 0 for i = 1, . . . , p
z_c ∈ R^{s_c}, z_b ∈ {0, 1}^{s_b}
[z_c, z_b] ∈ Z
(7)
-
Convexity
A set S ⊆ R^s is convex if
λ z_1 + (1 - λ) z_2 ∈ S for all z_1, z_2 ∈ S, λ ∈ [0, 1].
A function f : S → R is convex if S is convex and
f(λ z_1 + (1 - λ) z_2) ≤ λ f(z_1) + (1 - λ) f(z_2)
for all z_1, z_2 ∈ S, λ ∈ [0, 1].
A function f : S → R is strictly convex if S is convex and
f(λ z_1 + (1 - λ) z_2) < λ f(z_1) + (1 - λ) f(z_2)
for all z_1, z_2 ∈ S with z_1 ≠ z_2, λ ∈ (0, 1).
A function f : S → R is concave if S is convex and -f is convex.
-
Operations preserving convexity
1 The intersection of an arbitrary number of convex sets is a convex set:
if S_n is convex for all n ∈ N+, then ∩_{n ∈ N+} S_n is convex.
The empty set is convex.
2 The sub-level sets of a convex function f on S are convex:
if f(z) is convex, then S_α ≜ {z ∈ S : f(z) ≤ α} is convex.
3 If f_1, . . . , f_N are convex, then ∑_{i=1}^N α_i f_i is a convex function for all α_i ≥ 0, i = 1, . . . , N.
4 The composition of a convex function f(z) with an affine map z = Ax + b generates a convex function f(Ax + b) of x.
5 A linear function f(z) = c'z + d is both convex and concave.
6 A quadratic function f(z) = z'Qz + 2s'z + r is convex if and only if Q ∈ R^{s×s} is positive semidefinite, and strictly convex if Q is positive definite.
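Item 6 can be illustrated numerically. The sketch below (data chosen arbitrarily for illustration, not from the slides) checks that a quadratic function with positive definite Q satisfies the convexity inequality on random pairs of points.

```python
import numpy as np

# Quadratic f(z) = z'Qz + 2 s'z + r is convex iff Q is positive semidefinite.
# Q, s, r are arbitrary illustrative data.
Q = np.array([[2.0, 0.5], [0.5, 1.0]])   # symmetric, positive definite
s = np.array([1.0, -1.0])
r = 0.5

def f(z):
    return z @ Q @ z + 2 * s @ z + r

# Q is positive definite: all eigenvalues strictly positive.
assert np.all(np.linalg.eigvalsh(Q) > 0)

# Spot-check the convexity inequality
#   f(l z1 + (1-l) z2) <= l f(z1) + (1-l) f(z2)
rng = np.random.default_rng(0)
for _ in range(100):
    z1, z2 = rng.normal(size=2), rng.normal(size=2)
    lam = rng.uniform()
    lhs = f(lam * z1 + (1 - lam) * z2)
    rhs = lam * f(z1) + (1 - lam) * f(z2)
    assert lhs <= rhs + 1e-12
```

A random spot-check of course does not prove convexity; the eigenvalue test on Q is the actual certificate here.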
-
Operations preserving convexity
Suppose f(x) = h(g(x)) = h(g_1(x), . . . , g_k(x)) with h : R^k → R, g_i : R^s → R. Then,
1 f is convex if h is convex, h is nondecreasing in each argument, and the g_i are convex,
2 f is convex if h is convex, h is nonincreasing in each argument, and the g_i are concave,
3 f is concave if h is concave, h is nondecreasing in each argument, and the g_i are concave.
The pointwise maximum of a set of convex functions is a convex function:
f_1(z), . . . , f_k(z) convex functions ⇒ f(z) = max{f_1(z), . . . , f_k(z)} is a convex function.
-
Convex optimization problems
Problem (5) is convex if the cost f is convex on Z and S is convex.
In convex optimization problems local optimizers are also global optimizers.
Non-convex optimization problems are often solved by iterating between the solutions of convex sub-problems.
Convexity of the feasible set S is difficult to prove except in special cases (see the operations preserving convexity).
Non-convex problems exist which can be transformed into convex problems through a change of variables and manipulations on cost and constraints.
-
Outline
1 Main concepts
2 Optimality conditions: Lagrange duality theory and KKT conditions: Optimality Conditions; Duality theory; Certificate of optimality; Complementary slackness; KKT conditions
3 Polyhedra, polytopes and simplices
4 Linear and quadratic programming
-
Optimality Conditions
Consider the problem
inf_z f(z)
subj. to g_i(z) ≤ 0 for i = 1, . . . , m
h_i(z) = 0 for i = 1, . . . , p
z ∈ Z
In general, an analytical solution does not exist.
Solutions are usually computed by recursive algorithms which start from an initial guess z_0 and at step k generate a point z_k such that {f(z_k)}_{k=0,1,2,...} converges to J*.
These algorithms recursively use and/or solve analytical conditions for optimality.
-
Optimality Conditions for Unconstrained Optimization Problems
Theorem (Necessary condition*)
Suppose f : R^s → R is differentiable at z̄. If there exists a vector d such that ∇f(z̄)'d < 0, then there exists a δ > 0 such that f(z̄ + λd) < f(z̄) for all λ ∈ (0, δ).
The vector d in the theorem above is called a descent direction.
The direction of steepest descent d_s at z̄ is the normalized direction that minimizes ∇f(z̄)'d_s.
The direction of steepest descent is d_s = -∇f(z̄)/‖∇f(z̄)‖.
Corollary
Suppose f : R^s → R is differentiable at z̄. If z̄ is a local minimizer, then ∇f(z̄) = 0.
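A minimal gradient-descent sketch illustrates the corollary: repeatedly stepping along -∇f drives the gradient to zero on a smooth convex function. The function and step size below are our own illustrative choices, not from the slides.

```python
import numpy as np

# Gradient descent along the (unnormalized) steepest-descent direction -grad f.
def grad_f(z):
    # f(z) = (z[0]-1)^2 + 2*(z[1]+3)^2, a smooth convex function
    return np.array([2 * (z[0] - 1), 4 * (z[1] + 3)])

z = np.zeros(2)
for _ in range(500):
    z = z - 0.1 * grad_f(z)   # fixed step, small enough for convergence here

# At the (unconstrained, convex) minimizer the gradient vanishes.
assert np.linalg.norm(grad_f(z)) < 1e-8
```

For this f the iterates contract geometrically toward the minimizer (1, -3), consistent with the necessary condition ∇f(z̄) = 0.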
-
Optimality Conditions for Unconstrained Optimization Problems
Theorem (Sufficient condition*)
Suppose that f : R^s → R is twice differentiable at z̄. If ∇f(z̄) = 0 and the Hessian of f at z̄ is positive definite, then z̄ is a local minimizer.
Theorem (Necessary and sufficient condition*)
Suppose that f : R^s → R is differentiable at z̄. If f is convex, then z̄ is a global minimizer if and only if ∇f(z̄) = 0.
Proofs are available in Chapter 4 of M.S. Bazaraa, H.D. Sherali, and C.M. Shetty. Nonlinear Programming Theory and Algorithms. John Wiley & Sons, Inc., New York, second edition, 1993.
-
Duality Theory. The Lagrange Function
Consider the primal optimization problem
inf_z f(z)
subj. to g_i(z) ≤ 0 for i = 1, . . . , m
h_i(z) = 0 for i = 1, . . . , p
z ∈ Z
Any feasible point gives an upper bound on the optimal value.
The Lagrange dual problem gives a lower bound on the optimal value.
Construct the Lagrange function
L(z, u, v) = f(z) + u_1 g_1(z) + . . . + u_m g_m(z) + v_1 h_1(z) + . . . + v_p h_p(z)
More compactly,
L(z, u, v) ≜ f(z) + u'g(z) + v'h(z)
u_i and v_i are called Lagrange multipliers or dual variables.
The objective is augmented with a weighted sum of the constraint functions.
-
Duality Theory. The Lagrange Function
Consider the Lagrange function
L(z, u, v) ≜ f(z) + u'g(z) + v'h(z)
Let z ∈ S be feasible. For arbitrary vectors u ≥ 0 and v we trivially have
L(z, u, v) ≤ f(z)
After infimization we infer
inf_{z ∈ Z} L(z, u, v) ≤ inf_{z ∈ Z, g(z) ≤ 0, h(z) = 0} f(z)
Best lower bound: since u ≥ 0 and v are arbitrary,
sup_{(u,v), u ≥ 0} inf_{z ∈ Z} L(z, u, v) ≤ inf_{z ∈ Z, g(z) ≤ 0, h(z) = 0} f(z)
-
Duality Theory. The Dual Problem
Let
Φ(u, v) ≜ inf_{z ∈ Z} L(z, u, v) ∈ [-∞, +∞]   (8)
Lagrangian dual problem:
sup_{(u,v), u ≥ 0} Φ(u, v)   (9)
Remarks
Problem (8) (the Lagrangian dual subproblem) is an unconstrained optimization problem. Only points (u, v) with Φ(u, v) > -∞ are interesting.
Φ(u, v) is always concave, so (9) is a concave maximization problem, much easier to solve than the primal (which is non-convex in general).
Weak duality always holds:
sup_{(u,v), u ≥ 0} Φ(u, v) ≤ inf_{z ∈ Z, g(z) ≤ 0, h(z) = 0} f(z)
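Weak duality can be checked by hand on a toy problem of our own choosing (not from the slides): minimize z^2 subject to 1 - z <= 0. Its Lagrangian L(z, u) = z^2 + u(1 - z) has the closed-form dual function Phi(u) = u - u^2/4, attained at z = u/2.

```python
import numpy as np

# Dual function of: minimize z^2 s.t. g(z) = 1 - z <= 0 (toy instance).
def phi(u):
    return u - u ** 2 / 4.0

J_star = 1.0  # primal optimum, attained at z* = 1

# Weak duality: Phi(u) <= J* for every dual feasible u >= 0.
for u in np.linspace(0.0, 10.0, 101):
    assert phi(u) <= J_star + 1e-12

# This convex problem also satisfies Slater's condition (e.g. z = 2 gives
# g(2) < 0), so strong duality holds: d* = Phi(2) = 1 = J*.
assert abs(phi(2.0) - J_star) < 1e-12
```

The gap f(z) - Phi(u) closing to zero at (z*, u*) = (1, 2) is exactly the certificate-of-optimality idea developed on the following slides.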
-
Duality Theory. Duality Gap and Certificate of Optimality
Let:
d* = max_{(u,v), u ≥ 0} Φ(u, v)
J* = min_{z ∈ Z, g(z) ≤ 0, h(z) = 0} f(z)
then
we always have d* ≤ J*
J* - d* is called the optimal duality gap
Strong duality holds if d* = J*
In case of strong duality, u* and v* serve as a certificate of optimality
-
Duality Theory. Constraint Qualifications: Slater's Condition
When do we have
Dual optimal value = Primal optimal value?
In general, strong duality does not hold, even for convex primal problems.
Constraint qualifications: conditions on the constraint functions implying strong duality for convex problems.
Slater's condition (a well-known constraint qualification):
Consider the primal problem. There exists z̄ ∈ R^s which belongs to the relative interior of the problem domain Z, which is feasible (g(z̄) ≤ 0, h(z̄) = 0) and for which g_j(z̄) < 0 for all j for which g_j is not an affine function.
Other constraint qualifications exist: Linear Independence, Cottle's, Zangwill's, Kuhn-Tucker's constraint qualifications.
-
Duality Theory. Slater's Theorem
When do we have
Dual optimal value = Primal optimal value?
Theorem (Slater's theorem)
Consider the primal problem and its dual problem. If the primal problem is convex and Slater's condition holds, then d* > -∞ and d* = J*.
Then
max_{(u,v), u ≥ 0} Φ(u, v) = min_{z ∈ Z, g(z) ≤ 0, h(z) = 0} f(z)
Slater's condition reduces to feasibility when all inequality constraints are linear. Strong duality holds for (feasible) convex QP and LP.
-
Certificate of Optimality
Let z̄ be a feasible point. Then f(z̄) is an upper bound on the optimal cost, i.e., J* ≤ f(z̄).
Let (ū, v̄) be a dual feasible point. Then Φ(ū, v̄) is a lower bound on the optimal cost, i.e., Φ(ū, v̄) ≤ J*.
If z̄ and (ū, v̄) are primal and dual feasible, respectively, then Φ(ū, v̄) ≤ J* ≤ f(z̄). Therefore z̄ is ε-suboptimal, with ε equal to the primal-dual gap, i.e., ε = f(z̄) - Φ(ū, v̄).
The optimal values of the primal and dual problems lie in the same interval:
J*, d* ∈ [Φ(ū, v̄), f(z̄)].
(ū, v̄) is a certificate that proves the (sub)optimality of z̄.
At iteration k, algorithms produce a primal feasible z_k and a dual feasible (u_k, v_k) with f(z_k) - Φ(u_k, v_k) → 0 as k → ∞; hence at iteration k we know J* ∈ [Φ(u_k, v_k), f(z_k)] (a useful stopping criterion).
-
Complementary slackness
Suppose that z*, u*, v* are primal and dual feasible with zero duality gap (hence they are primal and dual optimal). Then
f(z*) = Φ(u*, v*)
= inf_z ( f(z) + u*'g(z) + v*'h(z) )
≤ f(z*) + u*'g(z*) + v*'h(z*)
≤ f(z*)
hence we have ∑_{i=1}^m u_i* g_i(z*) = 0, and since each term is nonpositive,
u_i* g_i(z*) = 0, i = 1, . . . , m
called the complementary slackness condition:
i-th constraint inactive at the optimum ⇒ u_i* = 0
u_i* > 0 at the optimum ⇒ i-th constraint active at the optimum
-
KKT optimality conditions
Suppose
f, g_i, h_i are differentiable
z*, u*, v* are (primal, dual) optimal, with zero duality gap
By complementary slackness we have
f(z*) + ∑_i u_i* g_i(z*) + ∑_j v_j* h_j(z*) = min_z ( f(z) + ∑_i u_i* g_i(z) + ∑_j v_j* h_j(z) )   (10)
i.e., z* minimizes L(z, u*, v*), therefore
∇f(z*) + ∑_i u_i* ∇g_i(z*) + ∑_j v_j* ∇h_j(z*) = 0
-
KKT optimality conditions
The primal and dual optimizers z*, (u*, v*) of an optimization problem with differentiable cost and constraints and zero duality gap have to satisfy the following conditions:
0 = ∇f(z*) + ∑_{i=1}^m u_i* ∇g_i(z*) + ∑_{j=1}^p v_j* ∇h_j(z*)   (11a)
0 = u_i* g_i(z*), i = 1, . . . , m   (11b)
0 ≤ u_i*, i = 1, . . . , m   (11c)
0 ≥ g_i(z*), i = 1, . . . , m   (11d)
0 = h_j(z*), j = 1, . . . , p   (11e)
Conditions (11a)-(11e) are called the Karush-Kuhn-Tucker (KKT) conditions.
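The conditions can be verified by hand on a scalar toy problem of our own choosing: minimize f(z) = (z - 2)^2 subject to g(z) = z - 1 <= 0, whose solution is z* = 1 with multiplier u* = 2.

```python
# KKT check for: minimize (z-2)^2  s.t.  z - 1 <= 0  (toy instance).
z_star, u_star = 1.0, 2.0

grad_f = 2 * (z_star - 2)   # f'(z*) = -2
grad_g = 1.0                # g'(z*) = 1

# (11a) stationarity: grad f + u* grad g = 0
assert grad_f + u_star * grad_g == 0
# (11b) complementary slackness: u* g(z*) = 0 (constraint is active)
assert u_star * (z_star - 1) == 0
# (11c) dual feasibility and (11d) primal feasibility
assert u_star >= 0 and z_star - 1 <= 0
```

Note how the multiplier u* = 2 measures how hard the constraint "pushes back" against the unconstrained minimizer z = 2.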
-
KKT optimality conditions
Consider the primal problem (5).
Theorem (Necessary and sufficient condition)
Suppose that problem (5) is convex and that cost and constraints f, g_i and h_i are differentiable at a feasible z*. If problem (5) satisfies Slater's condition, then z* is optimal if and only if there are (u*, v*) that, together with z*, satisfy the KKT conditions.
Theorem (Necessary and sufficient condition)
Let z* be a feasible solution and A = {i : g_i(z*) = 0} the set of active constraints at z*. Suppose that f and the g_i are differentiable at z* and that the h_i are continuously differentiable at z*. Further, suppose that the gradients ∇g_i(z*) for i ∈ A and ∇h_i(z*) for i = 1, . . . , p are linearly independent. If z*, (u*, v*) are optimal, then they satisfy the KKT conditions. In addition, if problem (5) is convex, then z* is optimal if and only if there are (u*, v*) that, together with z*, satisfy the KKT conditions.
-
KKT geometric interpretation
Figure: at the optimizer z*, the negative cost gradient -∇f(z*) lies in the cone of the gradients ∇g_i(z*) of the active constraints.
Rewrite (11a) as
-∇f(z*) = ∑_{i ∈ I} u_i* ∇g_i(z*), u_i* ≥ 0,
i.e., the direction of cost steepest descent belongs to the convex cone spanned by the ∇g_i's.
KKT conditions. Example
-
KKT conditions under only the convexity assumption are not necessary. Consider the problem:
min z_1
subj. to (z_1 - 1)^2 + (z_2 - 1)^2 ≤ 1
(z_1 - 1)^2 + (z_2 + 1)^2 ≤ 1
(12)
Figure: the two unit disks centered at (1, 1) and (1, -1) touch only at the point (1, 0), where ∇f(1, 0), ∇g_1(1, 0) and ∇g_2(1, 0) are drawn. The feasible set is the single point (1, 0), and condition (11a) cannot be satisfied there.
KKT conditions are necessary if a constraint qualification is satisfied:
∃ z̄ ∈ Z such that g(z̄) ≤ 0, h(z̄) = 0, g_j(z̄) < 0 if g_j is not affine, and z̄ ∈ int Z
Outline
-
1 Main concepts
2 Optimality conditions: Lagrange duality theory and KKT conditions
3 Polyhedra, polytopes and simplices: General Set Definitions and Operations; Polyhedra Definitions and Representations; Basic Operations on Polytopes
4 Linear and quadratic programming
General Set Definitions and Operations
-
An n-dimensional ball B(x_0, ε) is the set B(x_0, ε) = {x ∈ R^n | ‖x - x_0‖_2 ≤ ε}. x_0 and ε are the center and the radius of the ball, respectively.
Affine sets are sets described by the solutions of a system of linear equations:
F = {x ∈ R^n : Ax = b, with A ∈ R^{m×n}, b ∈ R^m}.
The affine combination of x_1, . . . , x_k is defined as the point λ_1 x_1 + . . . + λ_k x_k where ∑_{i=1}^k λ_i = 1.
The affine hull of K ⊆ R^n is the set of all affine combinations of points in K:
aff(K) = {λ_1 x_1 + . . . + λ_k x_k | x_i ∈ K, i = 1, . . . , k, ∑_{i=1}^k λ_i = 1}
General Set Definitions and Operations
-
The dimension of an affine set is the dimension of the largest ball of radius ε > 0 included in the set.
The convex combination of x_1, . . . , x_k is defined as the point λ_1 x_1 + . . . + λ_k x_k where ∑_{i=1}^k λ_i = 1 and λ_i ≥ 0, i = 1, . . . , k.
The convex hull of a set K ⊆ R^n is the set of all convex combinations of points in K and it is denoted as conv(K):
conv(K) ≜ {λ_1 x_1 + . . . + λ_k x_k | x_i ∈ K, λ_i ≥ 0, i = 1, . . . , k, ∑_{i=1}^k λ_i = 1}.
A cone spanned by a finite set of points K = {x_1, . . . , x_k} is defined as
cone(K) = {∑_{i=1}^k λ_i x_i, λ_i ≥ 0, i = 1, . . . , k}.
The Minkowski sum of two sets P, Q ⊆ R^n is defined as
P ⊕ Q ≜ {x + y | x ∈ P, y ∈ Q}.
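The Minkowski sum of two finite point sets is a direct reading of the definition; the sketch below (toy data of our own choosing) sums the unit segments along x_1 and x_2, producing the corners of the unit square.

```python
import numpy as np

# Minkowski sum of two finite point sets: {x + y : x in P, y in Q}.
P = np.array([[0.0, 0.0], [1.0, 0.0]])   # segment along x1 (its endpoints)
Q = np.array([[0.0, 0.0], [0.0, 1.0]])   # segment along x2 (its endpoints)

msum = np.array([p + q for p in P for q in Q])

# The pairwise sums are the four corners of the unit square; the Minkowski
# sum of the full segments would be the square itself (their convex hull).
corners = {(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)}
assert {tuple(v) for v in msum} == corners
```

For polytopes in V-representation, summing the vertex sets pairwise and taking the convex hull gives the Minkowski sum; the H-representation route appears later in these notes.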
Polyhedra Definitions and Representations
-
An H-polyhedron P in R^n denotes an intersection of a finite set of closed halfspaces in R^n:
P = {x ∈ R^n : Ax ≤ b}
Figure: a two-dimensional H-polyhedron bounded by the halfspaces a_i'x ≤ b_i, i = 1, . . . , 5.
Inequalities which can be removed without changing the polyhedron are called redundant. The representation of an H-polyhedron is minimal if it does not contain redundant inequalities.
Polyhedra Definitions and Representations
-
A V-polyhedron P in R^n denotes the Minkowski sum:
P = conv(V) ⊕ cone(Y)
for some V = [V_1, . . . , V_k] ∈ R^{n×k}, Y = [y_1, . . . , y_{k'}] ∈ R^{n×k'}.
Any H-polyhedron is a V-polyhedron.
An H-polytope (V-polytope) is a bounded H-polyhedron (V-polyhedron). Any H-polytope is a V-polytope.
The dimension of a polytope (polyhedron) P is the dimension of its affine hull and is denoted by dim(P).
A polytope P ⊆ R^n is full-dimensional if dim(P) = n or, equivalently, if it is possible to fit a non-empty n-dimensional ball in P. Otherwise, we say that polytope P is lower-dimensional.
If ‖P^x_i‖_2 = 1, where P^x_i denotes the i-th row of a matrix P^x, we say that the polytope P is normalized.
Polyhedra Definitions and Representations
-
A linear inequality c'z ≤ c_0 is said to be valid for P if it is satisfied for all points z ∈ P.
A face of P is any nonempty set of the form
F = P ∩ {z ∈ R^s | c'z = c_0}
where c'z ≤ c_0 is a valid inequality for P.
The faces of dimension 0, 1, dim(P)-2 and dim(P)-1 are called vertices, edges, ridges, and facets, respectively.
A d-simplex is a polytope of R^d with d + 1 vertices.
Figure: (a) V-representation and (b) H-representation of the same polytope in the (x_1, x_2) plane.
Polytopal Complexes
-
A set C ⊆ R^n is called a P-collection (in R^n) if it is a collection of a finite number of n-dimensional polytopes, i.e.,
C = {C_i}_{i=1}^{N_C}, where C_i ≜ {x ∈ R^n : C_i^x x ≤ C_i^c}, dim(C_i) = n, i = 1, . . . , N_C, with N_C < ∞.
The underlying set of a P-collection C = {C_i}_{i=1}^{N_C} is the point set
C̄ ≜ ∪_{P ∈ C} P = ∪_{i=1}^{N_C} C_i.
Special Classes
A collection of sets {C_i}_{i=1}^{N_C} is a strict partition of a set C if (i) ∪_{i=1}^{N_C} C_i = C and (ii) C_i ∩ C_j = ∅, i ≠ j.
{C_i}_{i=1}^{N_C} is a strict polyhedral partition of a polyhedral set C if {C_i}_{i=1}^{N_C} is a strict partition of C and C̄_i is a polyhedron for all i, where C̄_i denotes the closure of the set C_i.
A collection of sets {C_i}_{i=1}^{N_C} is a partition of a set C if (i) ∪_{i=1}^{N_C} C_i = C and (ii) (C_i \ ∂C_i) ∩ (C_j \ ∂C_j) = ∅, i ≠ j.
Functions on Polytopal Complexes
-
A function h(θ) : Θ → R^k, where Θ ⊆ R^s, is piecewise affine (PWA) if there exists a strict partition R_1, . . . , R_N of Θ and h(θ) = H^i θ + k^i, ∀ θ ∈ R_i, i = 1, . . . , N.
A function h(θ) : Θ → R^k, where Θ ⊆ R^s, is piecewise affine on polyhedra (PPWA) if there exists a strict polyhedral partition R_1, . . . , R_N of Θ and h(θ) = H^i θ + k^i, ∀ θ ∈ R_i, i = 1, . . . , N.
A function h(θ) : Θ → R, where Θ ⊆ R^s, is piecewise quadratic (PWQ) if there exists a strict partition R_1, . . . , R_N of Θ and h(θ) = θ'H^i θ + k^i θ + l^i, ∀ θ ∈ R_i, i = 1, . . . , N.
A function h(θ) : Θ → R, where Θ ⊆ R^s, is piecewise quadratic on polyhedra (PPWQ) if there exists a strict polyhedral partition R_1, . . . , R_N of Θ and h(θ) = θ'H^i θ + k^i θ + l^i, ∀ θ ∈ R_i, i = 1, . . . , N.
Basic Operations on Polytopes
-
Convex Hull of a set of points V = {V_i}_{i=1}^{N_V}, with V_i ∈ R^n:
conv(V) = {x ∈ R^n : x = ∑_{i=1}^{N_V} λ_i V_i, 0 ≤ λ_i ≤ 1, ∑_{i=1}^{N_V} λ_i = 1}.   (13)
Used to switch from a V-representation of a polytope to an H-representation.
Vertex Enumeration of a polytope P given in H-representation (the dual of the convex hull operation).
Basic Operations on Polytopes
-
Polytope reduction is the computation of the minimal representation of a polytope. A polytope P ⊆ R^n, P = {x ∈ R^n : P^x x ≤ P^c}, is in a minimal representation if the removal of any row of P^x x ≤ P^c would change it (i.e., if there are no redundant constraints).
The Chebyshev Ball of a polytope P = {x ∈ R^n | P^x x ≤ P^c}, with P^x ∈ R^{n_P × n}, P^c ∈ R^{n_P}, corresponds to the largest-radius ball B(x_c, R) with center x_c such that B(x_c, R) ⊆ P.
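The Chebyshev ball can be computed with a single LP: maximize R subject to A_i x_c + ‖A_i‖_2 R ≤ b_i for each row. A sketch using `scipy.optimize.linprog` on a toy box (our own instance):

```python
import numpy as np
from scipy.optimize import linprog

# Chebyshev ball of P = {x : A x <= b}, here the box [-1, 1]^2.
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.ones(4)

norms = np.linalg.norm(A, axis=1)
# Decision variables (x_c[0], x_c[1], R); linprog minimizes, so use -R.
c = np.array([0.0, 0.0, -1.0])
A_ub = np.hstack([A, norms[:, None]])   # rows: A_i x_c + ||A_i|| R <= b_i
res = linprog(c, A_ub=A_ub, b_ub=b,
              bounds=[(None, None)] * 2 + [(0, None)])

x_c, R = res.x[:2], res.x[2]
assert res.status == 0
# For the unit box: center at the origin, radius 1.
assert abs(R - 1.0) < 1e-7 and np.allclose(x_c, 0.0, atol=1e-7)
```

For non-symmetric polytopes the radius is unique but the center need not be; any LP optimizer is a valid Chebyshev center.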
Figure: the Chebyshev ball of a polytope in the (x_1, x_2) plane.
Basic Operations on Polytopes
-
Projection Given a polytope
P = {[x' y']' ∈ R^{n+m} : P^x x + P^y y ≤ P^c} ⊆ R^{n+m},
the projection onto the x-space R^n is defined as
proj_x(P) ≜ {x ∈ R^n | ∃ y ∈ R^m : P^x x + P^y y ≤ P^c}.
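When the target space is one-dimensional, the projection of a polytope is an interval, so it can be recovered with two LPs (minimize and maximize x over P). A sketch on a toy polytope of our own choosing; general projection requires dedicated algorithms (e.g. Fourier-Motzkin elimination), not shown here.

```python
import numpy as np
from scipy.optimize import linprog

# Toy polytope P = {(x, y) : x >= 0, y >= 0, 0.5 x + y <= 1} in R^(1+1).
A = np.array([[-1.0, 0.0], [0.0, -1.0], [0.5, 1.0]])
b = np.array([0.0, 0.0, 1.0])
bounds = [(None, None), (None, None)]

# proj_x(P) is the interval [min x, max x] over P.
lo = linprog(np.array([1.0, 0.0]), A_ub=A, b_ub=b, bounds=bounds)
hi = linprog(np.array([-1.0, 0.0]), A_ub=A, b_ub=b, bounds=bounds)

assert lo.status == 0 and hi.status == 0
# proj_x(P) = [0, 2]: for every x in this interval some feasible y exists.
assert abs(lo.fun - 0.0) < 1e-9 and abs(-hi.fun - 2.0) < 1e-9
```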
Basic Operations on Polytopes
-
Set-Difference The set-difference of two polytopes Y and R_0,
R = Y \ R_0 ≜ {x ∈ R^n : x ∈ Y, x ∉ R_0},
in general can be a nonconvex and disconnected set, and can be described as a P-collection R = ∪_{i=1}^m R_i, where Y = (∪_{i=1}^m R_i) ∪ (R_0 ∩ Y).
The P-collection R = ∪_{i=1}^m R_i can be computed by consecutively inverting the half-spaces defining R_0, as described in the following theorem.
Theorem
Let Y ⊆ R^n be a polyhedron, R̄_0 ≜ {x ∈ R^n : Ax ≤ b}, and R_0 ≜ {x ∈ Y : Ax ≤ b} = R̄_0 ∩ Y, where b ∈ R^m, R_0 ≠ ∅ and Ax ≤ b is a minimal representation of R̄_0. Also let
R_i = {x ∈ Y : A_i x > b_i, A_j x ≤ b_j, j < i}, i = 1, . . . , m.
Let R ≜ ∪_{i=1}^m R_i. Then R is a P-collection and {R_0, R_1, . . . , R_m} is a strict polyhedral partition of Y.
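The half-space inversion in the theorem assigns every point of Y \ R_0 to exactly one region: x lands in R_i for the first index i whose inequality it violates. A membership sketch on a toy box (our own data):

```python
import numpy as np

# Index of the region containing x: 0 for R0, i for R_i (first violated row).
def region_index(A, b, x, tol=1e-9):
    for i in range(A.shape[0]):
        if A[i] @ x > b[i] + tol:
            return i + 1        # A_i x > b_i and A_j x <= b_j for j < i
    return 0                     # all inequalities hold: x is in R0

# R0 = unit box [-1, 1]^2 (Y taken large enough to be irrelevant here).
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.ones(4)

assert region_index(A, b, np.array([0.0, 0.0])) == 0   # inside R0
assert region_index(A, b, np.array([2.0, 0.0])) == 1   # first row violated
assert region_index(A, b, np.array([-2.0, 2.0])) == 2  # row 1 holds, row 2 fails
```

Because ties only occur on shared facets, the regions R_1, . . . , R_m intersect at most on their boundaries, matching the strict polyhedral partition claim.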
Basic Operations on Polytopes
-
Figure: Two-dimensional example: partition of the rest of the space X \ R_0 into regions R_1, . . . , R_5 obtained by consecutively inverting the half-spaces defining R_0.
Basic Operations on Polytopes
-
Pontryagin Difference The Pontryagin difference (also known as the Minkowski difference) of two polytopes P and Q is a polytope
P ⊖ Q ≜ {x ∈ R^n : x + q ∈ P, ∀ q ∈ Q}.
The Minkowski sum of two polytopes P and Q is a polytope
P ⊕ Q ≜ {x ∈ R^n : ∃ y ∈ P, ∃ z ∈ Q, x = y + z}.
Figure: (a) Pontryagin difference P ⊖ Q; (b) Minkowski sum P ⊕ Q, shown in the (x_1, x_2) plane.
Minkowski Sum of Polytopes
-
The Minkowski sum is computationally expensive. Consider
P = {y ∈ R^n | P^y y ≤ P^c}, Q = {z ∈ R^n | Q^z z ≤ Q^c};
it holds that
W = P ⊕ Q
= {x ∈ R^n | ∃ y, z ∈ R^n : P^y y ≤ P^c, Q^z z ≤ Q^c, x = y + z}
= {x ∈ R^n | ∃ y ∈ R^n s.t. P^y y ≤ P^c, Q^z (x - y) ≤ Q^c}
= proj_x ( { [x' y']' ∈ R^{n+n} | [0 P^y; Q^z -Q^z] [x; y] ≤ [P^c; Q^c] } ).
Pontryagin Difference of Polytopes
-
The Pontryagin difference is not computationally expensive. Consider
P = {y ∈ R^n s.t. P^y y ≤ P^b}, Q = {z ∈ R^n s.t. Q^z z ≤ Q^b}.
Then:
P ⊖ Q = {x ∈ R^n s.t. P^y x ≤ P^b - H(P^y, Q)}
where the i-th element of H(P^y, Q) is
H_i(P^y, Q) ≜ max_{x ∈ Q} P^y_i x
and P^y_i is the i-th row of the matrix P^y. For special cases (e.g. when Q is a hypercube), more efficient computational methods exist.
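Each entry H_i is one small LP (a support function evaluation of Q). A sketch on two toy boxes, using `scipy.optimize.linprog` for the per-row maximization:

```python
import numpy as np
from scipy.optimize import linprog

# P (-) Q = {x : Py x <= Pb - H(Py, Q)}, with H_i = max_{x in Q} Py_i x.
Py = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
Pb = np.ones(4)                        # P = [-1, 1]^2
Qz, Qb = Py.copy(), 0.5 * np.ones(4)   # Q = [-0.5, 0.5]^2

H = np.empty(4)
for i in range(4):
    # maximize Py_i x over Q  <=>  minimize -Py_i x
    res = linprog(-Py[i], A_ub=Qz, b_ub=Qb, bounds=[(None, None)] * 2)
    H[i] = -res.fun

# P (-) Q is the box [-0.5, 0.5]^2: every tightened offset equals 0.5.
assert np.allclose(Pb - H, 0.5)
```

This is why the Pontryagin difference is cheap relative to the Minkowski sum: it needs one LP per facet of P rather than a projection.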
Basic Operations on Polytopes
-
Note that (P ⊖ Q) ⊕ Q ⊆ P.
Figure: Illustration that (P ⊖ Q) ⊕ Q ⊆ P: (c) two polytopes P and Q; (d) polytope P and the Pontryagin difference P ⊖ Q; (e) polytope P ⊖ Q and the set (P ⊖ Q) ⊕ Q.
Affine Mappings and Polyhedra
-
Consider a polyhedron P = {x ∈ R^n | P^x x ≤ P^c}, with P^x ∈ R^{n_P × n}, and an affine mapping f(z)
f : z ∈ R^n ↦ Az + b, A ∈ R^{n×n}, b ∈ R^n
Define the composition of P and f as the following polyhedron
P ∘ f ≜ {z ∈ R^n | P^x f(z) ≤ P^c} = {z ∈ R^n | P^x A z ≤ P^c - P^x b}
Useful for backward reachability.
Affine Mappings and Polyhedra
-
Consider a polyhedron P = {x ∈ R^n | P^x x ≤ P^c}, with P^x ∈ R^{n_P × n}, and an affine mapping f(z)
f : z ∈ R^n ↦ Az + b, A ∈ R^{n×n}, b ∈ R^n
Define the composition of f and P as the following polyhedron
f ∘ P ≜ {y ∈ R^n | y = Ax + b, x ∈ R^n, P^x x ≤ P^c}
The polyhedron f ∘ P can be computed as follows. Write P in V-representation, P = conv(V), and map the vertices V = {V_1, . . . , V_k} through the transformation f. Because the transformation is affine, the set f ∘ P is the convex hull of the transformed vertices:
f ∘ P = conv(F), F = {AV_1 + b, . . . , AV_k + b}.
If f is invertible, x = A^{-1}y - A^{-1}b and therefore
f ∘ P = {y ∈ R^n | P^x A^{-1} y ≤ P^c + P^x A^{-1} b}
Useful for forward reachability.
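The invertible case can be checked numerically: every mapped vertex of P must satisfy the transformed H-representation. A sketch with toy data of our own choosing (P the unit box, A a scaling, b a shift):

```python
import numpy as np

# Forward image f o P = {y : Px A^{-1} y <= Pc + Px A^{-1} b} for y = Ax + b.
Px = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
Pc = np.ones(4)                         # P = [-1, 1]^2
A = np.array([[2.0, 0.0], [0.0, 3.0]])  # invertible scaling
b = np.array([1.0, -1.0])               # shift

Ainv = np.linalg.inv(A)
Hy, hy = Px @ Ainv, Pc + Px @ Ainv @ b  # transformed H-representation

# Map each vertex of the box and verify it lies in the transformed polytope.
verts = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], dtype=float)
for v in verts:
    y = A @ v + b
    assert np.all(Hy @ y <= hy + 1e-9)
```

Mapped vertices in fact land on the boundary (some rows hold with equality), consistent with the image being exactly the transformed box.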
Outline
-
1 Main concepts
2 Optimality conditions: Lagrange duality theory and KKT conditions
3 Polyhedra, polytopes and simplices
4 Linear and quadratic programming
Linear Programming
-
inf_z c'z
subj. to Gz ≤ W
where z ∈ R^s.
These are convex optimization problems.
Other common forms:
inf_z c'z
subj. to Gz ≤ W
G_eq z = W_eq
or
inf_z c'z
subj. to G_eq z = W_eq
z ≥ 0
It is always possible to convert one of the three forms into the others.
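The inequality form is the native input of `scipy.optimize.linprog`, so a small instance can be solved directly. Toy data of our own choosing:

```python
import numpy as np
from scipy.optimize import linprog

# inf_z c'z  subj. to  G z <= W, with the box 0 <= z <= 1 written as rows of G.
c = np.array([-1.0, -1.0])
G = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
W = np.array([1.0, 1.0, 0.0, 0.0])

res = linprog(c, A_ub=G, b_ub=W, bounds=[(None, None)] * 2)

assert res.status == 0
# Optimum at the vertex z* = (1, 1) with J* = -2.
assert abs(res.fun - (-2.0)) < 1e-9
assert np.allclose(res.x, [1.0, 1.0], atol=1e-9)
```

Note that the optimizer sits at a vertex of the feasible polyhedron, anticipating the graphical discussion on the next slide.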
Graphical Interpretation and Solutions Properties
Let P be the feasible set. P is a polyhedron.
-
If P is empty, then the problem is infeasible.
Denote by J* the optimal value and by Z* the set of optimizers,
Z* = argmin_{z ∈ P} c'z.
Case 1. The LP solution is unbounded, i.e., J* = -∞.
Case 2. The LP solution is bounded, i.e., J* > -∞, and the optimizer is unique. Z* is a singleton.
Case 3. The LP solution is bounded and there are multiple optima. Z* is an uncountable subset of R^s which can be bounded or unbounded.
Figure: Graphical interpretation of the linear program solution in the three cases (level sets c'z = k_i, k_1 < k_4).
Dual of LP
-
Consider the LP

$\inf_z\ c'z \quad \text{subj. to } Gz \le W \qquad (14)$

with $z \in \mathbb{R}^s$ and $G \in \mathbb{R}^{m \times s}$. The Lagrange function is

$L(z, u) = c'z + u'(Gz - W).$

The dual cost is

$\Theta(u) = \inf_z\ L(z, u) = \inf_z\ (c' + u'G)z - u'W = \begin{cases} -u'W & \text{if } G'u = -c \\ -\infty & \text{otherwise.} \end{cases}$

Since we are interested only in cases where $\Theta$ is finite, the dual problem is

$\sup_u\ -u'W \quad \text{subj. to } G'u = -c,\ u \ge 0,$

which can be rewritten as the minimization

$\inf_u\ W'u \quad \text{subj. to } G'u = -c,\ u \ge 0.$
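A one-variable numeric sanity check of this primal-dual pair (the example LP, $\min z$ subject to $z \ge 1$, is an assumed illustration, not from the slides):

```python
# Primal:  min c*z  s.t.  G*z <= W   with c = 1, G = -1, W = -1,
# i.e. min z subject to z >= 1; the optimum is z* = 1, J* = 1.
# Dual:    sup -u*W  s.t.  G'u = -c, u >= 0.

c, G, W = 1.0, -1.0, -1.0

z_star = 1.0                # primal optimizer (constraint active)
J_star = c * z_star         # primal optimal value

u = -c / G                  # unique solution of G'u = -c  -> u = 1
assert u >= 0               # dual feasibility
dual_value = -u * W         # dual cost -u'W

print(J_star, dual_value)   # strong duality: both equal 1.0
```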
KKT Conditions for LP
The KKT conditions for the LP (14) become

$G'u = -c, \qquad (15a)$
$(G_j z - W_j) u_j = 0,\ j = 1, \ldots, m, \qquad (15b)$
$u \ge 0, \qquad (15c)$
$Gz \le W, \qquad (15d)$

which are: primal feasibility (15d), dual feasibility (15a), (15c), and the complementary slackness conditions (15b).
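Conditions (15a)–(15d) can be checked mechanically at a candidate primal-dual pair; a sketch for the assumed one-variable LP $\min z$ subject to $z \ge 1$, whose solution $(z^*, u^*) = (1, 1)$ is computed by hand:

```python
# Verify the LP KKT conditions at (z*, u*) = (1, 1) for
# min z  s.t.  -z <= -1   (c = 1, G = -1, W = -1).

c, G, W = 1.0, -1.0, -1.0
z, u = 1.0, 1.0

stationarity    = (G * u == -c)            # (15a)  G'u = -c
complementarity = ((G * z - W) * u == 0)   # (15b)  (G_j z - W_j) u_j = 0
dual_feas       = (u >= 0)                 # (15c)  u >= 0
primal_feas     = (G * z <= W)             # (15d)  G z <= W

print(all([stationarity, complementarity, dual_feas, primal_feas]))  # True
```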
Active Constraints and Degeneracies

Let $J \triangleq \{1, \ldots, m\}$ be the set of constraint indices and consider the sets
$A(z) \triangleq \{j \in J : G_j z = W_j\}$, the set of active constraints at a feasible $z$;
$NA(z) \triangleq \{j \in J : G_j z < W_j\}$, the set of inactive constraints at a feasible $z$.

With reference to the three LP cases above:

Case 1. $A(z^*)$ is undefined, since $z^*$ is undefined.
Case 2. The cardinality of $A(z^*)$ can be any number between $s$ and $m$; it can be higher than $s$ regardless of whether the representation of $P$ is minimal (see figure below).
Case 3. If $P$ is in a minimal representation, then the cardinality of $A(z^*)$ is $s - p$, with $p$ the dimension of the optimal face, at all points $z^* \in Z^*$ contained in the relative interior of the optimal face; otherwise it can be any number between $s - p$ and $m$.
Figure: Primal degeneracy in a linear program; more than $s$ constraints (among those labeled $1, \ldots, 6$) are active at $z^*$.
Active Constraints and Degeneracies
Definition (Primal degenerate)

The LP is said to be primal degenerate if there exists a $z^* \in Z^*$ such that the number of active constraints at $z^*$ is greater than the number of variables $s$.

Definition (Dual degenerate)

The LP is said to be dual degenerate if its dual problem is primal degenerate.

Homework. Show that if the primal problem has multiple optima, then its dual problem is primal degenerate (i.e., the primal problem is dual degenerate). The converse is not always true. Details in the book.
Convex Piecewise Linear Optimization

Consider
$J^* = \min_z\ J(z) \quad \text{subj. to } Gz \le W \qquad (16)$

where the cost function has the form

$J(z) = \max_{i=1,\ldots,k}\ \{c_i' z + d_i\} \qquad (17)$

with $c_i \in \mathbb{R}^s$ and $d_i \in \mathbb{R}$.
Figure: A convex piecewise affine cost function $f(x) = \max_i f_i(x)$, built from the affine pieces $f_1(x), \ldots, f_4(x)$ defined over the regions $P_1, \ldots, P_4$.
Convex Piecewise Linear Optimization
The cost function $J(z)$ in (17) is a convex PWA function. The optimization problem (16)–(17) can be solved by the following linear program:

$J^* = \min_{z, \varepsilon}\ \varepsilon \quad \text{subj. to } Gz \le W,\quad c_i' z + d_i \le \varepsilon,\ i = 1, \ldots, k.$
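The reformulation is exact because, for any fixed $z$, the smallest $\varepsilon$ satisfying all $k$ constraints is precisely $\max_i \{c_i'z + d_i\}$. A quick illustration in Python (the two affine pieces are an assumed example, and the minimum is brute-forced on a grid instead of calling an LP solver):

```python
# J(z) = max(z, -z + 2): a convex PWA function of one variable whose
# minimum, J* = 1 at z = 1, is where the two affine pieces intersect.
pieces = [(1.0, 0.0), (-1.0, 2.0)]   # (c_i, d_i)

def J(z):
    # Smallest epsilon with c_i*z + d_i <= epsilon for all i.
    return max(c * z + d for c, d in pieces)

grid = [i / 100.0 for i in range(-300, 301)]   # z in [-3, 3]
z_best = min(grid, key=J)
print(z_best, J(z_best))  # -> 1.0 1.0
```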
Convex Piecewise Linear Optimization
Consider

$J^* = \min_z\ J^1(z^1) + J^2(z^2) \quad \text{subj. to } G^1 z^1 + G^2 z^2 \le W \qquad (18)$

where the cost function has the form

$J^1(z^1) = \max_{i=1,\ldots,k}\ \{c_i' z^1 + d_i\}, \qquad J^2(z^2) = \max_{i=1,\ldots,j}\ \{m_i' z^2 + n_i\}. \qquad (19)$

The optimization problem (18)–(19) can be solved by the following linear program:

$J^* = \min_{z, \varepsilon^1, \varepsilon^2}\ \varepsilon^1 + \varepsilon^2 \quad \text{subj. to } G^1 z^1 + G^2 z^2 \le W,\quad c_i' z^1 + d_i \le \varepsilon^1,\ i = 1, \ldots, k,\quad m_i' z^2 + n_i \le \varepsilon^2,\ i = 1, \ldots, j.$
Example
The optimization problem

$\min_{z_1, z_2}\ |z_1 + 5| + |z_2 - 3|$

subject to $2.5 \le z_1 \le 5$, $-1 \le z_2 \le 1$,

can be solved in Matlab by using v = linprog(f,A,b), where $v = [\varepsilon_1,\ \varepsilon_2,\ z_1,\ z_2]'$, $f = [1\ 1\ 0\ 0]'$, and

A = [ 0  0  1  0
      0  0 -1  0
      0  0  0  1
      0  0  0 -1
     -1  0  1  0
     -1  0 -1  0
      0 -1  0  1
      0 -1  0 -1 ],   b = [5; -2.5; 1; 1; -5; 5; 3; -3].

The solution is $v^* = [7.5\ \ 2\ \ 2.5\ \ 1]'$.
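The reported optimizer can be sanity-checked by brute force (a Python sketch assuming the bounds $2.5 \le z_1 \le 5$ and $-1 \le z_2 \le 1$; since the cost is separable, each term is minimized on its own grid):

```python
# Minimize |z1 + 5| + |z2 - 3| over 2.5 <= z1 <= 5, -1 <= z2 <= 1.

def cost(z1, z2):
    return abs(z1 + 5) + abs(z2 - 3)

z1_grid = [2.5 + i / 40 for i in range(101)]   # z1 in [2.5, 5]
z2_grid = [-1.0 + i / 50 for i in range(101)]  # z2 in [-1, 1]

z1_best = min(z1_grid, key=lambda z: abs(z + 5))
z2_best = min(z2_grid, key=lambda z: abs(z - 3))
print(z1_best, z2_best, cost(z1_best, z2_best))  # -> 2.5 1.0 9.5
```

This matches the LP result: $\varepsilon_1 = 7.5$ at $z_1 = 2.5$ and $\varepsilon_2 = 2$ at $z_2 = 1$.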
Quadratic Programming
$\min_z\ \tfrac{1}{2} z' H z + q' z + r \quad \text{subj. to } Gz \le W \qquad (20)$

where $z \in \mathbb{R}^s$ and $H = H' \succ 0$, $H \in \mathbb{R}^{s \times s}$.
Other QP forms often include equality and inequality constraints.
Let P be the feasible set. Two cases can occur if P is not empty:

Case 1. The optimizer lies strictly inside the feasible polyhedron.
Case 2. The optimizer lies on the boundary of the feasible polyhedron.
Figure: Level curves $z'Hz + q'z + r = k_i$ of the QP cost over the feasible polyhedron $P$: (a) Case 1, the optimizer $z^*$ lies strictly inside $P$; (b) Case 2, the optimizer $z^*$ lies on the boundary of $P$.
Dual of QP
$\min_z\ \tfrac{1}{2} z' H z + q' z \quad \text{subj. to } Gz \le W$
The Lagrange function is

$L(z, u) = \tfrac{1}{2} z' H z + q' z + u'(Gz - W).$

The dual cost is

$\Theta(u) = \min_z\ \tfrac{1}{2} z' H z + q' z + u'(Gz - W) \qquad (21)$

and the dual problem is

$\max_{u \ge 0}\ \min_z\ \tfrac{1}{2} z' H z + q' z + u'(Gz - W).$

For a given $u$ the Lagrange function $\tfrac{1}{2} z' H z + q' z + u'(Gz - W)$ is convex in $z$. Therefore it is necessary and sufficient for optimality that the gradient be zero:

$Hz + q + G'u = 0.$
Dual of QP
From the previous equation we can derive $z = -H^{-1}(q + G'u)$ and, substituting this into equation (21), we obtain:

$\Theta(u) = -\tfrac{1}{2} u'(G H^{-1} G') u - u'(W + G H^{-1} q) - \tfrac{1}{2} q' H^{-1} q \qquad (22)$

By using (22) the dual problem can be rewritten as:

$\min_u\ \tfrac{1}{2} u'(G H^{-1} G') u + u'(W + G H^{-1} q) + \tfrac{1}{2} q' H^{-1} q \quad \text{subj. to } u \ge 0.$
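A one-dimensional numeric check of the dual cost (22) (the scalar QP $\min \tfrac{1}{2}z^2$ subject to $z \le -1$ is an assumed example; with $h = 1$, $q = 0$ the primal optimum is $z^* = -1$, $J^* = 0.5$):

```python
# Scalar QP:  min 0.5*h*z^2 + q*z  s.t.  g*z <= w,
# with h = 1, q = 0, g = 1, w = -1  (i.e. z <= -1).
h, q, g, w = 1.0, 0.0, 1.0, -1.0

def theta(u):
    # Dual cost (22) specialized to scalars:
    # -0.5*u*(g*h^-1*g)*u - u*(w + g*h^-1*q) - 0.5*q*h^-1*q
    return -0.5 * (g * g / h) * u ** 2 - u * (w + g * q / h) - 0.5 * q * q / h

us = [i / 100.0 for i in range(201)]   # u in [0, 2]
u_star = max(us, key=theta)
print(u_star, theta(u_star))  # -> 1.0 0.5  (equals the primal optimum J*)
```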
KKT Conditions for QP
Consider the QP (20); here $\nabla f(z) = Hz + q$, $g_i(z) = G_i z - W_i$, and $\nabla g_i(z) = G_i'$. The KKT conditions become

$Hz + q + G'u = 0 \qquad (23a)$
$u_i (G_i z - W_i) = 0, \quad i = 1, \ldots, m \qquad (23b)$
$u \ge 0 \qquad (23c)$
$Gz - W \le 0 \qquad (23d)$
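As with the LP case, (23a)–(23d) can be verified at a hand-computed candidate; a sketch for the assumed scalar QP $\min \tfrac{1}{2}z^2$ subject to $z \le -1$, with $(z^*, u^*) = (-1, 1)$:

```python
# Check the QP KKT conditions at (z*, u*) = (-1, 1) for
# min 0.5*z^2  s.t.  z <= -1   (h = 1, q = 0, g = 1, w = -1).
h, q, g, w = 1.0, 0.0, 1.0, -1.0
z, u = -1.0, 1.0

stationarity    = (h * z + q + g * u == 0)   # (23a)  Hz + q + G'u = 0
complementarity = (u * (g * z - w) == 0)     # (23b)  u_i (G_i z - W_i) = 0
dual_feas       = (u >= 0)                   # (23c)
primal_feas     = (g * z - w <= 0)           # (23d)

print(all([stationarity, complementarity, dual_feas, primal_feas]))  # True
```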
Active Constraints and Degeneracies
Let $J \triangleq \{1, \ldots, m\}$ be the set of constraint indices. Consider the sets of active and inactive constraints at a feasible $z$:

$A(z) \triangleq \{j \in J : G_j z = W_j\}$
$NA(z) \triangleq \{j \in J : G_j z < W_j\}.$

We have two cases:

Case 1. $A(z^*) = \emptyset$: no constraint is active at the optimizer.
Case 2. $A(z^*)$ is a nonempty subset of $\{1, \ldots, m\}$.

The QP is said to be primal degenerate if there exists a $z^* \in Z^*$ such that the number of active constraints at $z^*$ is greater than the number of variables $s$.

The QP is said to be dual degenerate if its dual problem is primal degenerate.
Constrained Least-Squares Problems
The problem of minimizing the convex quadratic function

$\|Az - b\|_2^2 = z' A' A z - 2 b' A z + b' b$

arises in many fields and has many names, e.g., linear regression or least-squares approximation. The minimizer is

$z^* = (A'A)^{-1} A' b \triangleq A^{\dagger} b.$
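A small worked instance of the normal equations (the 3×2 matrix A is an assumed example; exact rational arithmetic from the standard library avoids floating-point noise, and the final check uses the fact that the residual of the least-squares minimizer is orthogonal to the columns of A):

```python
from fractions import Fraction as F

# Least squares for A z ~ b with a 3x2 A:  z* = (A'A)^{-1} A' b.
A = [[1, 0], [0, 1], [1, 1]]
b = [1, 2, 2]

AtA = [[2, 1], [1, 2]]   # A'A, computed by hand for this small A
Atb = [3, 4]             # A'b

# Solve the 2x2 normal equations exactly (det(A'A) = 3).
det = F(AtA[0][0] * AtA[1][1] - AtA[0][1] * AtA[1][0])
z = [(AtA[1][1] * Atb[0] - AtA[0][1] * Atb[1]) / det,
     (AtA[0][0] * Atb[1] - AtA[1][0] * Atb[0]) / det]
print(z)  # -> [Fraction(2, 3), Fraction(5, 3)]

# Optimality check: A'(A z* - b) = 0.
residual = [sum(F(A[i][j]) * z[j] for j in range(2)) - b[i] for i in range(3)]
At_r = [sum(F(A[i][j]) * residual[i] for i in range(3)) for j in range(2)]
print(At_r)  # -> [Fraction(0, 1), Fraction(0, 1)]
```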
When linear inequality constraints are added, the problem is called constrained linear regression or constrained least-squares, and there is no longer a simple analytical solution. As an example, consider regression with lower and upper bounds on the variables, i.e.,

$\min_z\ \|Az - b\|_2^2 \quad \text{subj. to } l_i \le z_i \le u_i,\ i = 1, \ldots, n,$

which is a QP.
Example
Consider

$\min_z\ \|Az - b\|_2^2$

where

A = [ 0.7513  0.5472  0.8143
      0.2551  0.1386  0.2435
      0.5060  0.1493  0.9293
      0.6991  0.2575  0.3500
      0.8909  0.8407  0.1966
      0.9593  0.2543  0.2511 ],   b = [ 0.6160; 0.4733; 0.3517; 0.8308; 0.5853; 0.5497 ].

Unconstrained least-squares in Matlab: z = A\b or z = quadprog(A'*A, -A'*b), giving $z^* = [0.7166,\ -0.0205,\ 0.1180]'$.

Now assume $z_2 \ge 0$. Constrained least-squares in Matlab: z = quadprog(A'*A, -A'*b, [0 -1 0], 0), giving $z^* = [0.7045,\ 0,\ 0.1194]'$.
Nonlinear Programming
Consider

$\min_z\ f(z) \quad \text{subj. to } g_i(z) \le 0 \text{ for } i = 1, \ldots, m;\ \ h_i(z) = 0 \text{ for } i = 1, \ldots, p;\ \ z \in Z.$

A variety of software packages exists.

In general, global optimality is not guaranteed. Solutions are usually computed by recursive algorithms which start from an initial guess $z_0$ and at step $k$ generate a point $z_k$ such that $\{f(z_k)\}_{k=0,1,2,\ldots}$ converges to $J^*$.

These algorithms recursively use and/or solve analytical conditions for optimality.

In this class we will use NPSOL.
Nonlinear Programming - NPSOL
Possible syntax:

[INFORM,ITER,ISTATE,C,CJAC,CLAMDA,OBJF,OBJGRAD,R,X] = npsol(X0,A,L,U,funobj,funcon,OPTION);

solving

$\min_z\ \texttt{funobj}(z) \quad \text{subj. to } L \le \begin{bmatrix} z \\ Az \\ \texttt{funcon}(z) \end{bmatrix} \le U.$

The NPSOL manual and an example are on bSpace. Note that it is a mex-function.
Nonlinear Programming - Matlab Optimization toolbox
Possible syntax:

[X,FVAL,EXITFLAG] = fmincon(funobj,X0,A,B,Aeq,Beq,LB,UB,NONLCON);

solving

$\min_z\ \texttt{funobj}(z) \quad \text{subj. to } Az \le B,\ A_{eq} z = B_{eq},\ LB \le z \le UB,\ C(z) \le 0,\ C_{eq}(z) = 0.$

The function NONLCON accepts $z$ and returns the vectors $C(z)$ and $C_{eq}(z)$. See help fmincon in Matlab.