ELE604/ELE704 Optimization
Constrained Optimization
http://www.ee.hacettepe.edu.tr/∼usezen/ele604/
Dr. Umut Sezen & Dr. Cenk Toker
Department of Electrical and Electronic Engineering
Hacettepe University
Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization 27-Dec-2016 1 / 118
Contents
Duality
Duality
Lagrange Dual Function
The Lagrange Dual Problem
Dual Problem
Weak and Strong Duality
Weak Duality
Strong Duality
Slater's Condition
Saddle-point Interpretation
Optimality Conditions
Certificate of Suboptimality and Stopping Criterion
Complementary Slackness
KKT Optimality Conditions
Solving The Primal Problem via The Dual
Perturbation and Sensitivity Analysis
Constrained Optimization Algorithms
Introduction
Primal Methods
Feasible Direction Methods
Active Set Methods
Gradient Projection Method
Equality Constrained Optimization
Quadratic Minimization
Eliminating Equality Constraints
Newton's Method with Equality Constraints
Newton's Method with Equality Constraint Elimination
Penalty and Barrier Methods
Penalty Methods
Barrier Methods
Properties of the Penalty & Barrier Methods
Interior-Point Methods
Logarithmic Barrier
Central Path
Dual Points from Central Path
KKT Interpretation
Newton Step for Modified KKT Equations
The Interior-Point Algorithm
How to start from a feasible point?
Duality

I Consider the standard minimization problem (it will be referred to as the primal problem)

min f(x)
s.t. g(x) ≤ 0
     h(x) = 0

where

g(x) = [g1(x) g2(x) · · · gi(x) · · · gL(x)]T
h(x) = [h1(x) h2(x) · · · hj(x) · · · hM(x)]T

represent the L inequality and M equality constraints, respectively.
I The domain of the optimization problem, D, is defined by

D = dom f(x) ∩ ⋂_{i=1}^{L} dom gi(x) ∩ ⋂_{j=1}^{M} dom hj(x)

I Any point x ∈ D satisfying the constraints, i.e., g(x) ≤ 0 and h(x) = 0, is called a feasible point.
Lagrange Dual Function

I Define the Lagrangian L : RN × RL × RM → R as

L(x,λ,ν) = f(x) + λTg(x) + νTh(x)
         = f(x) + ∑_{i=1}^{L} λi gi(x) + ∑_{j=1}^{M} νj hj(x)

where λi ≥ 0 and νj are called the Lagrange multipliers, and

λ = [λ1 λ2 · · · λi · · · λL]T
ν = [ν1 ν2 · · · νj · · · νM]T

are called the Lagrange multiplier vectors.
I On the feasible set F, where

F = {x | x ∈ D ∧ g(x) ≤ 0 ∧ h(x) = 0},

the Lagrangian has a value less than or equal to the cost function, i.e.,

L(x,λ,ν) ≤ f(x), ∀x ∈ F, ∀λi ≥ 0.
I Then the Lagrange dual function, ℓ(λ,ν), is defined as

ℓ(λ,ν) = inf_{x∈D} L(x,λ,ν)
       = inf_{x∈D} ( f(x) + λTg(x) + νTh(x) )
       = inf_{x∈D} ( f(x) + ∑_{i=1}^{L} λi gi(x) + ∑_{j=1}^{M} νj hj(x) )

I The dual function ℓ(λ,ν) is the pointwise infimum of a set of affine functions of λ and ν, hence it is concave even if f(x), gi(x) and hj(x) are not convex.
I Proposition: The dual function constitutes a lower bound on p∗ = f(x∗), i.e.,

ℓ(λ,ν) ≤ p∗, ∀λ ≥ 0

I Proof: Let x be a feasible point, that is x ∈ F (i.e., gi(x) ≤ 0 and hj(x) = 0), and let λ ≥ 0. Then

λTg(x) + νTh(x) ≤ 0.

Hence, the Lagrangian satisfies

L(x,λ,ν) = f(x) + λTg(x) + νTh(x) ≤ f(x)

So,

ℓ(λ,ν) = inf_{x∈D} L(x,λ,ν) ≤ L(x,λ,ν) ≤ f(x), ∀x ∈ F

and since this holds for every feasible x, ℓ(λ,ν) ≤ p∗. ∎

I The pair (λ,ν) is called dual feasible when λ ≥ 0.
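As a quick numerical sanity check of this lower bound (a minimal sketch; the one-dimensional problem min x² s.t. x ≥ 1 is chosen here only for illustration, and its dual function ℓ(λ) = λ − λ²/4 follows by minimizing x² + λ(1 − x) over x):

```python
import numpy as np

def dual(lmbda):
    # l(λ) = inf_x x^2 + λ(1 - x); the minimizer is x = λ/2
    return lmbda - lmbda**2 / 4.0

p_star = 1.0  # minimizer of x^2 subject to x >= 1 is x* = 1

lams = np.linspace(0.0, 10.0, 101)
# weak duality: l(λ) <= p* for every λ >= 0
assert np.all(p_star - dual(lams) >= -1e-12)
# for this problem the bound is tight at λ* = 2
assert abs(dual(2.0) - p_star) < 1e-12
```

The tightness at λ∗ = 2 anticipates strong duality, which holds here because the problem is convex and strictly feasible.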
Examples:

I Least Squares (LS) solution of linear equations

min xTx
s.t. Ax = b

- The Lagrangian is

L(x,ν) = xTx + νT(Ax − b)

then the Lagrange dual function is

ℓ(ν) = inf_x L(x,ν)

Since L(x,ν) is quadratic in x, it is convex, so the infimum is attained where the gradient vanishes:

∇x L(x,ν) = 2x + ATν = 0 ⇒ x∗ = −(1/2) ATν

Hence the Lagrange dual function is given by

ℓ(ν) = L(−(1/2)ATν, ν) = −(1/4) νTAATν − bTν

which is obviously concave. So p∗ ≥ ℓ(ν) = −(1/4) νTAATν − bTν.
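This dual bound can be checked numerically (a minimal sketch with made-up random problem data; p∗ is computed from the KKT system 2x + ATν = 0, Ax = b):

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 3, 6
A = rng.standard_normal((M, N))
b = rng.standard_normal(M)

def ell(nu):
    # Lagrange dual of min x^T x s.t. Ax = b
    return -0.25 * nu @ (A @ A.T) @ nu - b @ nu

# primal optimum from the KKT system: 2x + A^T nu = 0, Ax = b
K = np.block([[2 * np.eye(N), A.T], [A, np.zeros((M, M))]])
sol = np.linalg.solve(K, np.concatenate([np.zeros(N), b]))
x_star, nu_star = sol[:N], sol[N:]
p_star = x_star @ x_star

# weak duality at random dual points, strong duality at nu*
for _ in range(100):
    nu = rng.standard_normal(M)
    assert ell(nu) <= p_star + 1e-9
assert abs(ell(nu_star) - p_star) < 1e-9
```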
I Linear Programming (LP)

min cTx
s.t. Ax = b
     −x ≤ 0

- The Lagrangian is

L(x,λ,ν) = cTx − λTx + νT(Ax − b)

then the Lagrange dual function is

ℓ(λ,ν) = −bTν + inf_x { (c + ATν − λ)Tx }

For ℓ(λ,ν) to be bounded, we must have c + ATν − λ = 0, i.e.,

ℓ(λ,ν) = { −bTν,  if c + ATν − λ = 0
         { −∞,    otherwise

It is affine, i.e., concave, when c + ATν − λ = 0 with λ ≥ 0.

So, p∗ ≥ −bTν when c + ATν − λ = 0 with λ ≥ 0.
I Two-way partitioning (a non-convex problem)

min xTWx
s.t. xj² = 1, j = 1, . . . , N

- This is a discrete problem since xj ∈ {−1,+1}, and it is very difficult to solve for large N.

The Lagrange dual function is

ℓ(ν) = inf_x { xTWx + ∑_{j=1}^{N} νj (xj² − 1) }
     = inf_x { xT(W + diag ν)x } − 1Tν
     = { −1Tν,  if W + diag ν ⪰ 0
       { −∞,    otherwise

We may take ν = −λmin(W) 1, which yields

p∗ ≥ −1Tν = N λmin(W)

where λmin(W) is the minimum eigenvalue of W.
- If we relax the constraint to ‖x‖² = N, i.e., ∑_{j=1}^{N} xj² = N, then the problem

min xTWx
s.t. ‖x‖² = N

becomes easy to solve, with the exact solution

p∗ = N λmin(W)

where λmin(W) is the minimum eigenvalue of W.
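For small N the dual bound N λmin(W) can be compared against the exact partitioning optimum by brute force (a minimal sketch; the random symmetric W is assumed data for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 8
W = rng.standard_normal((N, N))
W = (W + W.T) / 2  # symmetric cost matrix

lam_min = np.linalg.eigvalsh(W)[0]
bound = N * lam_min  # p* >= N λmin(W), from ν = -λmin(W) 1

# enumerate all 2^N sign vectors to get the exact p*
best = np.inf
for k in range(2 ** N):
    x = np.array([1.0 if (k >> j) & 1 else -1.0 for j in range(N)])
    best = min(best, x @ W @ x)

assert best >= bound - 1e-9  # the dual bound holds
```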
The Lagrange Dual Problem

max ℓ(λ,ν)
s.t. λ ≥ 0

gives the best lower bound for p∗, i.e., p∗ ≥ ℓ(λ,ν).

I The pairs (λ,ν) with λ ≥ 0, ν ∈ RM and ℓ(λ,ν) > −∞ are dual feasible.

I The solution of the above problem over the dual feasible set is called the dual optimal point (λ∗,ν∗) (i.e., the optimal Lagrange multipliers), with the dual optimal value

d∗ = ℓ(λ∗,ν∗) ≤ p∗

I Some hidden (implicit) constraints can be made explicit in the dual problem, e.g., consider the LP problems on the next slides.
I Linear problem (LP)

min cTx
s.t. Ax = b
     x ≥ 0

has the dual function

ℓ(λ,ν) = { −bTν,  if c + ATν − λ = 0
         { −∞,    otherwise

- So, the dual problem can be given by

max −bTν
s.t. ATν − λ + c = 0
     λ ≥ 0

- The dual problem can be further simplified (eliminating λ = ATν + c) to

max −bTν
s.t. ATν + c ≥ 0
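A tiny numerical instance of this primal-dual pair (a sketch with made-up data: min x1 + 2x2 s.t. x1 + x2 = 1, x ≥ 0, whose optimum p∗ = 1 is attained at x∗ = (1, 0), while the simplified dual gives d∗ = 1 at ν∗ = −1):

```python
import numpy as np

# primal: min c^T x  s.t. Ax = b, x >= 0
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
c = np.array([1.0, 2.0])

# simplified dual feasibility: A^T ν + c >= 0  <=>  ν >= -1
def dual_feasible(nu):
    return np.all(A.T @ nu + c >= -1e-12)

nu_star = np.array([-1.0])
assert dual_feasible(nu_star)
d_star = -b @ nu_star                 # dual optimal value
p_star = c @ np.array([1.0, 0.0])     # primal optimal value
assert abs(d_star - p_star) < 1e-12   # strong duality (feasible LP)

# weak duality: every dual feasible ν lower-bounds every feasible x
rng = np.random.default_rng(2)
for _ in range(100):
    t = rng.uniform(0.0, 1.0)
    x = np.array([t, 1.0 - t])               # primal feasible point
    nu = np.array([rng.uniform(-1.0, 5.0)])  # dual feasible (ν >= -1)
    assert -b @ nu <= c @ x + 1e-12
```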
I Linear problem (LP) with inequality constraints

min cTx
s.t. Ax ≤ b

has the dual function

ℓ(λ) = { −bTλ,  if ATλ + c = 0
       { −∞,    otherwise

- So, the dual problem can be given by

max −bTλ
s.t. ATλ + c = 0
     λ ≥ 0
Weak Duality

I We know that

sup_{λ≥0, ν} ℓ(λ,ν) = d∗ ≤ p∗ = inf_{x∈D} f(x)

I This inequality is known as weak duality.

I Here (p∗ − d∗) is called the duality gap, which is always nonnegative.

I Weak duality always holds, even when the primal problem is non-convex.
Strong Duality

I Strong duality refers to the case where the duality gap is zero, i.e.,

d∗ = p∗

I In general it does not hold.

I It may hold if the primal problem is convex.

I The conditions under which strong duality holds are called constraint qualifications; one of them is Slater's condition.
Slater's Condition

I Strong duality holds for a convex problem

min f(x)
s.t. g(x) ≤ 0
     Ax = b

if it is strictly feasible, i.e., if there exists x ∈ interior D with

g(x) < 0
Ax = b

i.e., the inequality constraints hold with strict inequality.
Saddle-point Interpretation

I First consider the following problem with no equality constraints,

sup_{λ≥0} L(x,λ) = sup_{λ≥0} ( f(x) + λTg(x) )
                 = sup_{λ≥0} ( f(x) + ∑_{i=1}^{L} λi gi(x) )
                 = { f(x),  if gi(x) ≤ 0, ∀i
                   { ∞,     otherwise

I If gi(x) ≤ 0, then the optimum choice is λi = 0.
I Hence, in terms of the duality gap, we have

d∗ ≤ p∗

with weak duality as the inequality

sup_{λ≥0} inf_x L(x,λ) ≤ inf_x sup_{λ≥0} L(x,λ)

and strong duality as the equality

sup_{λ≥0} inf_x L(x,λ) = inf_x sup_{λ≥0} L(x,λ)

I With strong duality we can switch inf and sup for λ ≥ 0.
I In general, for any f(w,z) : RN × RL → R, with W ⊆ RN and Z ⊆ RL,

sup_{z∈Z} inf_{w∈W} f(w,z) ≤ inf_{w∈W} sup_{z∈Z} f(w,z)

which is called the max-min inequality.

I f(w,z) satisfies the strong max-min property or the saddle-point property if the above inequality holds with equality.

I A point (w̃, z̃) with w̃ ∈ W and z̃ ∈ Z is called a saddle-point for f(w,z) if

f(w̃, z) ≤ f(w̃, z̃) ≤ f(w, z̃), ∀w ∈ W, ∀z ∈ Z

I i.e., when

f(w̃, z̃) = inf_{w∈W} f(w, z̃) = sup_{z∈Z} f(w̃, z)

the strong max-min property holds with the value f(w̃, z̃).
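For finite index sets the max-min inequality can be observed directly by tabulating f(w, z) as a matrix, with rows indexed by w and columns by z (a minimal sketch; the random matrix stands in for an arbitrary f):

```python
import numpy as np

rng = np.random.default_rng(3)
# F[i, j] = f(w_i, z_j) on finite grids W (rows) and Z (columns)
F = rng.standard_normal((50, 40))

sup_inf = F.min(axis=0).max()   # sup_z inf_w f(w, z)
inf_sup = F.max(axis=1).min()   # inf_w sup_z f(w, z)
assert sup_inf <= inf_sup + 1e-12  # max-min inequality always holds
```

Equality generally fails for a random matrix; it holds exactly when the matrix has a saddle-point.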
I For Lagrange duality, if x∗ and λ∗ are optimal points for the primal and dual problems with strong duality (zero duality gap), then they form a saddle-point for the Lagrangian, and vice versa (i.e., the converse is also true).
Certificate of Suboptimality

I We know that a dual feasible point satisfies

ℓ(λ,ν) ≤ p∗

i.e., the point (λ,ν) is a proof or certificate of this bound. Then,

f(x) − p∗ ≤ f(x) − ℓ(λ,ν)

for a primal feasible point x and a dual feasible point (λ,ν), where the duality gap associated with these points is

f(x) − ℓ(λ,ν)

in other words

p∗ ∈ [ℓ(λ,ν), f(x)] and d∗ ∈ [ℓ(λ,ν), f(x)]

- If the duality gap is zero, i.e., f(x) = ℓ(λ,ν), then x∗ = x and (λ∗,ν∗) = (λ,ν) are the primal and dual optimal points.
Stopping Criterion

I If an algorithm produces the sequences x(k) and (λ(k),ν(k)), check whether

f(x(k)) − ℓ(λ(k),ν(k)) < ε

to guarantee ε-suboptimality. ε can approach zero, i.e., ε → 0, for strong duality.
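This criterion can be sketched with dual gradient ascent on the LS problem min xTx s.t. Ax = b from the earlier example (an illustrative sketch with made-up data; the intermediate x(ν) is projected back onto Ax = b so that the gap f(x) − ℓ(ν) is a valid certificate at every iteration):

```python
import numpy as np

rng = np.random.default_rng(4)
M, N = 3, 6
A = rng.standard_normal((M, N))
b = rng.standard_normal(M)
AAt = A @ A.T

f = lambda x: x @ x
ell = lambda nu: -0.25 * nu @ AAt @ nu - b @ nu

def make_feasible(x):
    # project x onto {x : Ax = b} to get a valid primal certificate
    return x + A.T @ np.linalg.solve(AAt, b - A @ x)

eps = 1e-6
nu = np.zeros(M)
t = 1.0 / np.linalg.eigvalsh(AAt).max()  # conservative ascent step
for k in range(50000):
    x = -0.5 * A.T @ nu                  # minimizer of L(x, ν)
    x_feas = make_feasible(x)
    gap = f(x_feas) - ell(nu)            # certified suboptimality
    if gap < eps:
        break                            # ε-suboptimality guaranteed
    nu = nu + t * (A @ x - b)            # dual gradient ascent

assert gap < eps
assert np.allclose(A @ x_feas, b)
```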
Complementary Slackness

I Assume that x∗ and (λ∗,ν∗) satisfy strong duality:

f(x∗) = ℓ(λ∗,ν∗)
      = inf_x ( f(x) + λ∗Tg(x) + ν∗Th(x) )
      = inf_x ( f(x) + ∑_i λ∗i gi(x) + ∑_j ν∗j hj(x) )
      ≤ f(x∗) + ∑_i λ∗i gi(x∗) + ∑_j ν∗j hj(x∗)    (where ∑_i λ∗i gi(x∗) ≤ 0 and ∑_j ν∗j hj(x∗) = 0)
      ≤ f(x∗)
I Observations (due to strong duality):

1. The inequality in the third line always holds with equality, i.e., x∗ minimizes L(x,λ∗,ν∗).

2. From the fourth line we have ∑_i λ∗i gi(x∗) = 0, and since every term satisfies λ∗i gi(x∗) ≤ 0,

λ∗i gi(x∗) = 0, ∀i

which is known as complementary slackness.

I In other words

λ∗i > 0 only if gi(x∗) = 0
λ∗i = 0 if gi(x∗) < 0

i.e., the i-th optimal Lagrange multiplier is

- positive only if the constraint gi(x) is active at x∗, and
- zero if gi(x) is inactive at x∗.
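A two-variable example makes the pattern concrete (a minimal sketch with made-up data): min (x1 − 1)² + (x2 − 1)² s.t. x1 ≤ 0.5 (active) and x2 ≤ 2 (inactive), where stationarity 2(x1 − 1) + λ1 = 0 gives λ1 = 1 and the inactive constraint gets λ2 = 0.

```python
import numpy as np

# min (x1-1)^2 + (x2-1)^2  s.t.  x1 <= 0.5 (active), x2 <= 2 (inactive)
x_star = np.array([0.5, 1.0])
g = np.array([x_star[0] - 0.5, x_star[1] - 2.0])   # g(x*) = (0, -1)
lam = np.array([1.0, 0.0])   # λ1 from stationarity, λ2 = 0 (inactive)

grad_f = 2 * (x_star - 1.0)
grad_g = np.eye(2)           # ∇g1 = e1, ∇g2 = e2

# stationarity, primal/dual feasibility, complementary slackness
assert np.allclose(grad_f + grad_g @ lam, 0)
assert np.all(g <= 0) and np.all(lam >= 0)
assert np.allclose(lam * g, 0)   # λ1 g1(x*) = λ2 g2(x*) = 0
```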
KKT Optimality Conditions

I x∗ minimizes L(x,λ∗,ν∗), thus ∇x L(x∗,λ∗,ν∗) = 0, i.e.,

∇f(x∗) + ∑_i λ∗i ∇gi(x∗) + ∑_j ν∗j ∇hj(x∗) = 0

I Then the Karush-Kuhn-Tucker (KKT) conditions for x∗ and (λ∗,ν∗) being primal and dual optimal points with zero duality gap (i.e., with strong duality) are

gi(x∗) ≤ 0   (primal constraint)
hj(x∗) = 0   (primal constraint)
λ∗i ≥ 0      (dual constraint)
λ∗i gi(x∗) = 0   (complementary slackness)
∇f(x∗) + ∑_i λ∗i ∇gi(x∗) + ∑_j ν∗j ∇hj(x∗) = 0   (stationarity)
I For any optimization problem with differentiable objective (cost) and constraint functions for which strong duality holds, any pair of primal and dual optimal points satisfies the KKT conditions.

I For convex problems:

If f(x), gi(x) and hj(x) are convex and x̃, λ̃ and ν̃ satisfy the KKT conditions, then they are optimal points, i.e.,

- from complementary slackness: f(x̃) = L(x̃, λ̃, ν̃)
- from the last condition: ℓ(λ̃, ν̃) = L(x̃, λ̃, ν̃) ( = inf_x L(x, λ̃, ν̃) ); note that L(x, λ̃, ν̃) is convex in x.

Thus, f(x̃) = ℓ(λ̃, ν̃).
Example 18:

min (1/2) xTQx + cTx + r   (Q : SPD)
s.t. Ax = b

Solution: From the KKT conditions

Ax∗ = b
Qx∗ + c + ATν∗ = 0

which can be written as the linear system

[ Q  AT ] [ x∗ ]   [ −c ]
[ A   0 ] [ ν∗ ] = [  b ]
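A sketch of this solve in NumPy (the random SPD Q, A, b, c are stand-in data), which also checks that x∗ is no worse than nearby feasible points x∗ + d with Ad = 0:

```python
import numpy as np

rng = np.random.default_rng(5)
N, M = 5, 2
G = rng.standard_normal((N, N))
Q = G @ G.T + N * np.eye(N)     # symmetric positive definite
c = rng.standard_normal(N)
A = rng.standard_normal((M, N))
b = rng.standard_normal(M)

# KKT system:  [Q  A^T; A  0] [x*; ν*] = [-c; b]
K = np.block([[Q, A.T], [A, np.zeros((M, M))]])
sol = np.linalg.solve(K, np.concatenate([-c, b]))
x_star, nu_star = sol[:N], sol[N:]

assert np.allclose(A @ x_star, b)                       # feasibility
assert np.allclose(Q @ x_star + c + A.T @ nu_star, 0)   # stationarity

# x* beats other feasible points x* + d with Ad = 0
f = lambda x: 0.5 * x @ Q @ x + c @ x
_, _, Vt = np.linalg.svd(A)
null = Vt[M:].T                  # columns span null(A)
for _ in range(20):
    d = null @ rng.standard_normal(N - M)
    assert f(x_star) <= f(x_star + d) + 1e-9
```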
Example 19:
Solution:
Solving The Primal Problem via The Dual
I If strong duality holds and a dual optimal solution (λ∗,ν∗) exists, then we can compute a primal optimal solution from the dual solution.

I When strong duality holds and (λ∗,ν∗) is given, if the minimizer of L(x,λ∗,ν∗), i.e., the solution of

min f(x) + ∑_i λ∗i gi(x) + ∑_j ν∗j hj(x)

is unique and primal feasible, then it is also primal optimal.

I If the dual problem is easier to solve (e.g., it has fewer dimensions or has an analytical solution), then solving the dual problem first to find the optimal dual parameters (λ∗,ν∗), and then solving

x∗ = argmin_x L(x,λ∗,ν∗)

is an acceptable method for solving constrained minimization problems.
I Example 20: Consider the following problem

min f(x) = ∑_{i=1}^{N} fi(xi)
s.t. aTx = b

where each fi(x) is strictly convex and differentiable, a ∈ RN and b ∈ R. Assume that the problem has a unique finite solution and is dual feasible.

- f(x) is separable because each fi(xi) is a function of xi only. Now the Lagrangian is given by

L(x, ν) = ∑_{i=1}^{N} fi(xi) + ν(aTx − b)
        = −bν + ∑_{i=1}^{N} ( fi(xi) + ν ai xi )
Then the dual function is given by

ℓ(ν) = inf_x L(x, ν)
     = −bν + inf_x ∑_{i=1}^{N} ( fi(xi) + ν ai xi )
     = −bν + ∑_{i=1}^{N} inf_{xi} ( fi(xi) + ν ai xi )
     = −bν − ∑_{i=1}^{N} f∗i(−ν ai)

where f∗i(y) is the conjugate function of fi(x), since inf_{xi} ( fi(xi) + ν ai xi ) = −f∗i(−ν ai).

NOTE: The conjugate function

f∗(y) = sup_{x ∈ dom f(x)} ( yTx − f(x) )

is the maximum gap between the linear function yTx and f(x) (see Boyd Section 3.3). If f(x) is differentiable, this maximum occurs at a point x where ∇f(x) = y. Note that f∗(y) is a convex function.
Then the dual problem is a function of a scalar ν ∈ R:

max_ν ( −bν − ∑_{i=1}^{N} f∗i(−ν ai) )

- Once we find ν∗, we know that L(x, ν∗) is strictly convex in x as each fi(x) is strictly convex. So, we can find x∗ by solving ∇x L(x, ν∗) = 0, i.e., componentwise,

∂fi(xi)/∂xi = −ν∗ ai.
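A concrete instance of this recipe (a sketch, assuming fi(x) = x²/2 so that f∗i(y) = y²/2): the scalar dual −bν − (ν²/2)‖a‖² is maximized in closed form at ν∗ = −b/‖a‖², and x∗ is recovered componentwise from fi′(xi) = −ν∗ ai.

```python
import numpy as np

rng = np.random.default_rng(6)
N = 5
a = rng.standard_normal(N)
b = 2.0

# dual: max over scalar ν of  -bν - Σ f_i*(-ν a_i) = -bν - ν² ||a||² / 2
nu_star = -b / (a @ a)          # stationary point of the concave dual

# recover the primal point from f_i'(x_i) = -ν* a_i  =>  x_i = -ν* a_i
x_star = -nu_star * a

assert np.isclose(a @ x_star, b)           # primal feasible
p_star = 0.5 * x_star @ x_star
d_star = -b * nu_star - 0.5 * nu_star**2 * (a @ a)
assert np.isclose(p_star, d_star)          # zero duality gap

# x* beats a few random feasible points (projected onto a^T x = b)
for _ in range(50):
    z = rng.standard_normal(N)
    z = z + (b - a @ z) / (a @ a) * a
    assert p_star <= 0.5 * z @ z + 1e-12
```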
Perturbation and Sensitivity Analysis

I Original problem

primal:              dual:
min f(x)             max ℓ(λ,ν)
s.t. g(x) ≤ 0        s.t. λ ≥ 0
     h(x) = 0

I Perturbed problem

primal:              dual:
min f(x)             max ℓ(λ,ν) − λTu − νTv
s.t. g(x) ≤ u        s.t. λ ≥ 0
     h(x) = v
I Here u = [ui]L×1 and v = [vj]M×1 are called the perturbations. When u = 0 and v = 0, the problem reduces to the original problem. If ui > 0, we have relaxed the i-th inequality constraint; if ui < 0, we have tightened it.

I Let us use the notation p∗(u,v) to denote the optimal value of the perturbed problem. Thus, the optimal value of the original problem is p∗ = p∗(0,0).

I Assume that strong duality holds and an optimal dual solution exists, i.e., p∗ = ℓ(λ∗,ν∗). Then, we can show that

p∗(u,v) ≥ p∗(0,0) − λ∗Tu − ν∗Tv
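The inequality can be traced on a one-dimensional example (a sketch; for min x² s.t. 1 − x ≤ u one has p∗(u) = max(0, 1 − u)² and λ∗ = 2 at u = 0):

```python
import numpy as np

# min x^2  s.t.  1 - x <= u;  p*(u) = max(0, 1-u)^2, λ* = 2
def p_star(u):
    return max(0.0, 1.0 - u) ** 2

lam_star = 2.0
# perturbation inequality: p*(u) >= p*(0) - λ* u for all u
for u in np.linspace(-2.0, 4.0, 121):
    assert p_star(u) >= p_star(0.0) - lam_star * u - 1e-12
```

For u ≤ 1 the gap between the two sides is exactly u², so the lower bound is tight only near u = 0, matching the local-sensitivity interpretation of λ∗.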
I If λ∗i is large and ui < 0, then p∗(u,v) is guaranteed to increase greatly.

I If λ∗i is small and ui > 0, then p∗(u,v) will not decrease much.

I If |ν∗j| is large:
- If ν∗j > 0 and vj < 0, then p∗(u,v) is guaranteed to increase greatly.
- If ν∗j < 0 and vj > 0, then p∗(u,v) is guaranteed to increase greatly.

I If |ν∗j| is small:
- If ν∗j > 0 and vj > 0, then p∗(u,v) will not decrease much.
- If ν∗j < 0 and vj < 0, then p∗(u,v) will not decrease much.
I The perturbation inequality, p∗(u,v) ≥ p∗(0,0) − λ∗Tu − ν∗Tv, gives a lower bound on the perturbed optimal value, but no upper bound. For this reason the results are not symmetric with respect to loosening or tightening a constraint. For example, if λ∗i is large and we loosen the i-th constraint a little (i.e., take ui small and positive, 0 < ui < ε), then the perturbation inequality is not useful: it does not imply that the optimal value will decrease considerably.
Introduction

I The general constrained optimization problem is

min f(x)
s.t. g(x) ≤ 0
     h(x) = 0

I Defn (Active Constraint): Given a feasible point x(k), if gi(x(k)) ≤ 0 is satisfied with equality, i.e., gi(x(k)) = 0, then constraint i is said to be active at x(k). Otherwise it is inactive at x(k).
Primal Methods

See Luenberger Chapter 12.

I A primal method is a search method that works on the original problem directly, searching through the feasible region for the optimal solution. Each point in the process is feasible and the value of the objective function decreases monotonically.

For a problem with N variables and M equality constraints, primal methods work in the feasible space of dimension N − M.

Advantages:
- The iterates x(k) are all feasible points.
- If x(k) is a convergent sequence, it converges at least to a local minimum.
- They do not rely on special problem structure, e.g. convexity; in other words, primal methods are applicable to general non-linear problems.

Disadvantages:
- They must start from a feasible initial point.
- They may fail to converge for inequality constraints if precautions are not taken.
Feasible Direction Methods

I The update equation is

x(k+1) = x(k) + α(k)d(k)

where d(k) must be a descent direction and x(k) + α(k)d(k) must be contained in the feasible region, i.e., d(k) must be a feasible direction for some α(k) > 0.

This is very similar to unconstrained descent methods, but now the line search is constrained so that x(k+1) is also a feasible point.
I Example 21: Consider the following problem

min f(x)
s.t. aTi x ≤ bi, i = 1, . . . , M

- Let A(k) be the set of indices of the constraints active at x(k), i.e., aTi x(k) = bi for i ∈ A(k). Then the direction vector d(k) is calculated by solving

min_d ∇Tf(x(k))d
s.t. aTi d ≤ 0, i ∈ A(k)
     ∑_{i=1}^{N} |di| = 1

The last constraint ensures a bounded solution. The other constraints ensure that vectors of the form x(k) + α(k)d(k) are feasible for sufficiently small α(k) > 0, and subject to these conditions, d(k) is chosen to line up as closely as possible with the negative gradient of f(x(k)). In some sense this yields the locally best direction in which to proceed. The overall procedure progresses by generating feasible directions in this manner, and moving along them to decrease the objective.
There are two major shortcomings of feasible direction methods that require them to be modified in most cases.

I The first shortcoming is that for general problems there may not exist any feasible directions. If, for example, a problem has nonlinear equality constraints, we might find ourselves in the situation where no straight line from x(k) has a feasible segment. For such problems it is necessary either to relax the requirement of feasibility, allowing points to deviate slightly from the constraint surface, or to introduce the concept of moving along curves rather than straight lines.

I A second shortcoming is that, in their simplest form, most feasible direction methods are not globally convergent. They are subject to jamming (sometimes referred to as zigzagging), where the sequence of points generated by the process converges to a point that is not even a constrained local minimum.
Active Set Methods

I The idea underlying active set methods is to partition the inequality constraints into two groups: those that are to be treated as active and those that are to be treated as inactive. The constraints treated as inactive are essentially ignored.

I Consider the following problem

min f(x)
s.t. g(x) ≤ 0

For simplicity there are no equality constraints; their inclusion is straightforward.

The necessary conditions for an optimum x∗ are

∇f(x∗) + ∑_i λi ∇gi(x∗) = 0
g(x∗) ≤ 0
λTg(x∗) = 0
λ ≥ 0
I Let A be the set of indices of the active constraints (i.e., gi(x∗) = 0 for i ∈ A). Then

∇f(x∗) + ∑_{i∈A} λi ∇gi(x∗) = 0
gi(x∗) = 0, i ∈ A
gi(x∗) < 0, i ∉ A
λi ≥ 0, i ∈ A
λi = 0, i ∉ A

The inactive constraints are inhibited (i.e., λi = 0).

I Keeping only the active constraints in A, the problem is converted to an equality-constrained-only problem.
I Active set method:

- At each step, find the working set W and treat it as the active set (it can be a subset of the actual active set, i.e., W ⊆ A).
- Move to a lower point on the surface of the working set.
- Find a new working set W for this point.
- Repeat.

I The surface defined by the working set W will be called the working surface.
I Given any working set W ⊆ A, assume that xW is a solution to the problem PW

min f(x)
s.t. gi(x) = 0, i ∈ W

also satisfying gi(xW) < 0 for i ∉ W. If such an xW cannot be found, change the working set W until a solution xW is obtained.

I Once xW is found, solve

∇f(xW) + ∑_{i∈W} λi ∇gi(xW) = 0

to find the λi.

I If λi ≥ 0, ∀i ∈ W, then xW is a local optimal solution of the original problem.
I If ∃i ∈ W such that λi < 0, then dropping constraint i (while staying feasible) will decrease the objective value, due to the sensitivity theorem (relaxing constraint i to gi(x) = −c changes the optimal value by approximately λi c, a decrease when λi < 0).
I By dropping i from W and moving on the new working surface, toward the interior of the feasible region F, we move to an improved solution.

I The movement is monitored to avoid infeasibility: when one or more constraints become active, they are added to the working set W. Then the changed problem PW is solved again to find a new xW, and the previous steps are repeated.
I If we can ensure that the objective (cost) function value is monotonically decreasing, then no working set will appear twice in the process. Hence the active set method terminates in a finite number of iterations.
Active Set Algorithm: For a given working set W

repeat
    repeat
        - minimize f(x) over the working set W using x(k+1) = x(k) + α(k)d(k)
        - check whether a new constraint becomes active;
          if so, add the new constraint to the working set W
    until some stopping criterion is satisfied
    check the Lagrange multipliers λi
    - drop constraints with λi < 0 from the working set W
until some stopping criterion is satisfied

I Note that f(x) strictly decreases at each step.
I Disadvantage: The inner loop must terminate at a global optimum over the working surface in order to determine the correct λi and to guarantee that the same working set is not encountered in later iterations.

I Discuss: How can we integrate equality constraints into the original problem?
Gradient Projection Method

I The Gradient Projection Method is an extension of the Gradient (or Steepest) Descent Method (GD or SD) to the constrained case.

I Let us first consider linear constraints:

min f(x)
s.t. aTi x ≤ bi, i ∈ I1
     aTi x = bi, i ∈ I2

Let us take the active constraints, i.e., those with aTi x = bi, as the working set W and seek a feasible descent direction:

find d with ∇Tf(x)d < 0
satisfying aTi d = 0, i ∈ W

i.e., d must lie in the tangent plane defined by aTi d = 0, i ∈ W.

Hence, the above problem is the projection of −∇f(x) onto this tangent plane.
I Another perspective is to let the equality Ax = b represent all the active constraints aTi x = bi, i ∈ W. Thus

min f(x)
s.t. Ax = b

I If we use the first-order Taylor approximation around the point x(k), i.e., f(x) = f(x(k) + d) ≈ f(x(k)) + ∇Tf(x(k))d for small enough d, then we will have

min f(x(k)) + ∇Tf(x(k))d
s.t. A(x(k) + d) = b
     dT I d ≤ 1

I As Ax(k) = b and Ax(k+1) = b, this implies that Ad = 0.
I Thus, the problem simplifies to

min ∇Tf(x(k))d
s.t. Ad = 0
     dT I d ≤ 1

- Ad = 0 defines the tangent plane M ⊆ RN and ensures that x(k+1) is still feasible.
- dT I d ≤ 1 is the Euclidean unit ball (‖d‖² ≤ 1); thus this is the projected GD algorithm.
- We may also use the constraint dTQd ≤ 1, where Q is an SPD matrix, to obtain the projected SD algorithm.
Projected Steepest Descent Algorithm (PSDA):

1. Start from a feasible initial point x(0) (i.e., x(0) ∈ F).
2. Solve the Direction Finding Problem (DFP)

   d(k) = argmin ∇Tf(x(k))d
          s.t. Ad = 0
               dTQd ≤ 1, (Q : SPD)

3. If ∇Tf(x(k))d(k) = 0, stop; x(k) is a KKT point.
4. Solve α(k) = argmin_α f(x(k) + αd(k)) using a line search algorithm, e.g. exact or backtracking line search.
5. x(k+1) = x(k) + α(k)d(k)
6. Go to Step 2 with x(k) = x(k+1).
Projection:

I The DFP is itself a constrained optimization problem and must satisfy its own KKT conditions, i.e.,

Ad(k) = 0
d(k)TQd(k) = 1
∇f(x(k)) + 2βk Qd(k) + ATλk = 0
βk ≥ 0
βk (1 − d(k)TQd(k)) = 0

- Here, Ad(k) = 0 defines the tangent plane M(k) ⊆ RN.

If we set d̄(k) = 2βk d(k), then from the third condition we find that

d̄(k) = −Q−1∇f(x(k)) − Q−1ATλk

Now, substituting this value into the first condition, we obtain

λk = −(AQ−1AT)−1 AQ−1∇f(x(k))
Thus, d̄(k) is obtained as

d̄(k) = −[ Q−1 − Q−1AT(AQ−1AT)−1AQ−1 ] ∇f(x(k))
      = −P(k)∇f(x(k))

where

P(k) = Q−1 − Q−1AT(AQ−1AT)−1AQ−1

is called the projection matrix.

I Note that if Q = I, then

P(k) = I − AT(AAT)−1A

I As 2βk is just a positive scaling factor, we can safely take the descent direction d(k) to be

d(k) = −P(k)∇f(x(k))
PSDA with DFP Algorithm: Given a feasible point x(k) (i.e., x(k) ∈ F)

1. Find the active constraint set W(k) and form the A (actually A(k)) matrix.
2. Calculate

   P(k) = Q−1 − Q−1AT(AQ−1AT)−1AQ−1
   d(k) = −P(k)∇f(x(k))

3. If d(k) ≠ 0:

   a) Find α:

      α1 = max { α : x(k) + αd(k) is feasible }
      α2 = argmin { f(x(k) + αd(k)) : 0 ≤ α ≤ α1 }

   b) x(k+1) = x(k) + α2 d(k)
   c) Go to Step 1 (with x(k) = x(k+1))
4. If d(k) = 0:

   a) Find λ:

      λ = −(AQ−1AT)−1 AQ−1∇f(x(k))

   b) If λi ≥ 0, ∀i, stop; x(k) satisfies the KKT conditions.
   c) If ∃λi < 0, delete the row of A corresponding to the inequality with the most negative component λi, drop its index from W, and go to Step 2.
I Example 22: Consider the following problem

min x1² + x2² + x3² + x4² − 2x1 − 3x4
s.t. 2x1 + x2 + x3 + 4x4 = 7
     x1 + x2 + 2x3 + x4 = 6
     xi ≥ 0, i = 1, 2, 3, 4

Given the initial point x(0) = [2 2 1 0]T, find the initial direction d(0) for the projected gradient descent algorithm (PGDA).

- The active constraints are the two equalities and the inequality x4 ≥ 0, thus

A = [ 2 1 1 4
      1 1 2 1
      0 0 0 1 ]

Also, ∇f(x(0)) is given by

∇f(x(0)) = [2 4 2 −3]T
So,

AAT = [ 22 9 4
         9 7 1
         4 1 1 ]

(AAT)−1 = (1/11) [  6  −5 −19
                   −5   6  14
                  −19  14  73 ]

Hence, the projection matrix P(0) is given by

P(0) = I − AT(AAT)−1A = (1/11) [  1 −3  1 0
                                 −3  9 −3 0
                                  1 −3  1 0
                                  0  0  0 0 ]

Finally, the direction d(0) is given by

d(0) = −P(0)∇f(x(0)) = (1/11) [8 −24 8 0]T.
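These hand calculations can be replayed numerically (a minimal NumPy sketch of the Q = I projection; note the leading minus sign in d(0) = −P(0)∇f(x(0))):

```python
import numpy as np

# data of Example 22, at x(0) = (2, 2, 1, 0)
A = np.array([[2.0, 1.0, 1.0, 4.0],
              [1.0, 1.0, 2.0, 1.0],
              [0.0, 0.0, 0.0, 1.0]])   # active set: 2 equalities + x4 >= 0
grad = np.array([2.0, 4.0, 2.0, -3.0]) # ∇f at x(0)

# projection onto the null space of A (Q = I case)
P = np.eye(4) - A.T @ np.linalg.solve(A @ A.T, A)
d0 = -P @ grad

assert np.allclose(A @ d0, 0)    # d0 stays in the working surface
assert grad @ d0 < 0             # d0 is a descent direction
assert np.allclose(11 * d0, [8.0, -24.0, 8.0, 0.0])
```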
Nonlinear constraints:
I Consider the problem which defines only the active constraints

min f(x)

s.t. h(x) = 0

I With linear constraints, x(k) lies on the tangent plane M. With nonlinear constraints, however, the surface defined by the constraints and the tangent plane M touch at only a single point.
I In this case the updated point must be projected onto the constraint surface. For the projected gradient descent method, the projection matrix is given by

P(k) = I − JTh(x(k)) (Jh(x(k)) JTh(x(k)))−1 Jh(x(k))

where Jh(x) is the Jacobian of h(x), i.e., Jh(x) = (∇hT(x))T.
Constrained Optimization Algorithms Equality Constrained Optimization
Equality Constrained Optimization
See Boyd Chapter 10.
min f(x)
s.t. Ax = b
where f(x) : RN → R is convex and twice differentiable, A ∈ RM×N , M < N .
Minimum occurs at
p∗ = inf {f(x) |Ax = b} = f(x∗)
where x∗ satisfies the KKT conditions

Ax∗ = b

∇f(x∗) + ATν∗ = 0
Quadratic Minimization
min (1/2) xTQx + cTx + r

s.t. Ax = b

where Q ∈ RN×N is a symmetric positive semidefinite (SPSD) matrix, A ∈ RM×N and b ∈ RM .
I Using the KKT conditions

Ax∗ = b
Qx∗ + c + ATν∗ = 0

which can be written in matrix form as

[Q AT] [x∗]   [−c]
[A  0] [ν∗] = [ b]

where the left-hand coefficient matrix is called the KKT matrix.

I The above equations define a KKT system for an equality constrained quadratic optimization problem with (N + M) linear equations in the (N + M) variables (x∗, ν∗).
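As a concrete illustration, the KKT system can be assembled and solved directly; a minimal NumPy sketch, where Q, c, A, b are made-up toy data (not from the notes):

```python
import numpy as np

# Toy equality-constrained QP: min (1/2) x'Qx + c'x  s.t.  Ax = b
Q = np.array([[2., 0.], [0., 4.]])   # symmetric positive definite here
c = np.array([-1., -2.])
A = np.array([[1., 1.]])             # a single equality constraint
b = np.array([1.])
N, M = 2, 1

# Assemble the (N+M) x (N+M) KKT matrix and the right-hand side [-c; b]
KKT = np.block([[Q, A.T], [A, np.zeros((M, M))]])
sol = np.linalg.solve(KKT, np.concatenate([-c, b]))
x_opt, nu_opt = sol[:N], sol[N:]

assert np.allclose(A @ x_opt, b)                     # primal feasibility
assert np.allclose(Q @ x_opt + c + A.T @ nu_opt, 0)  # stationarity
```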
Eliminating Equality Constraints
I One general approach to solving the equality constrained problem is to eliminate the equality constraints and then solve the resulting unconstrained problem using methods for unconstrained minimization.

I We first find a matrix F ∈ RN×(N−M) and a vector x̂ that parametrize the (affine) feasible set:

{x | Ax = b} = {Fz + x̂ | z ∈ RN−M}.

Here x̂ can be chosen as any particular solution of Ax = b, and F ∈ RN×(N−M) is any matrix whose range (column space) is the nullspace of A, i.e., R(F) = N (A).

I We then form the reduced or eliminated optimization problem

min f̃(z) = f(Fz + x̂)

which is an unconstrained problem with variable z ∈ RN−M . From its solution z∗, we can find the solution of the equality constrained problem as

x∗ = Fz∗ + x̂.
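A minimal sketch of the elimination approach on an assumed toy QP (values are illustrative); F is built from the SVD so that its columns span the nullspace of A:

```python
import numpy as np

# Toy problem: min x1^2 + 2*x2^2 - x1  s.t.  x1 + x2 = 1  (optimum (5/6, 1/6))
Q = np.array([[2., 0.], [0., 4.]])
c = np.array([-1., 0.])
A = np.array([[1., 1.]])
b = np.array([1.])

x_hat = np.linalg.lstsq(A, b, rcond=None)[0]  # a particular solution of Ax = b
_, _, Vt = np.linalg.svd(A)
F = Vt[A.shape[0]:].T                         # columns span N(A)

# The reduced problem is an unconstrained quadratic in z, solved in closed form:
#   min (1/2) z'(F'QF)z + (F'(Q x_hat + c))'z
z_opt = np.linalg.solve(F.T @ Q @ F, -F.T @ (Q @ x_hat + c))
x_opt = F @ z_opt + x_hat                     # recover x* = F z* + x_hat

assert np.allclose(A @ x_opt, b)              # feasible by construction
assert np.allclose(x_opt, [5/6, 1/6])
```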
I There are, of course, many possible choices for the elimination matrix F, which can be chosen as any matrix in RN×(N−M) whose range (column space) equals the nullspace of A, i.e., R(F) = N (A). If F is one such matrix and T ∈ R(N−M)×(N−M) is nonsingular, then F̃ = FT is also a suitable elimination matrix, since

R(F̃) = R(F) = N (A).

I Conversely, if F and F̃ are any two suitable elimination matrices, then there is some nonsingular T such that F̃ = FT. If we eliminate the equality constraints using F, we solve the unconstrained problem

min f(Fz + x̂)

while if F̃ is used, we solve the unconstrained problem

min f(F̃z̃ + x̂) = f(F(Tz̃) + x̂)

This problem is equivalent to the one above, and is simply obtained by the change of coordinates z = Tz̃. In other words, changing the elimination matrix can be thought of as changing variables in the reduced problem.
Example 23:
Newton's Method Equality Constraints
I It is the same as unconstrained Newton's Method except
- initial point is feasible, x(0) ∈ F, i.e., Ax(0) = b.
- Newton step ∆xnt is a feasible direction, i.e., A∆xnt = 0.
I In order to use Newton's Method on the problem
min f(x)
s.t. Ax = b
we can use the second-order Taylor approximation around x (actually x(k)) to obtain a quadratic minimization problem

min f̂(x + ∆x) = f(x) + ∇Tf(x)∆x + (1/2)∆xTH(x)∆x

s.t. A(x + ∆x) = b

This problem is convex if H(x) ⪰ 0.
I Using the results of quadratic minimization with equality constraints

[H(x) AT] [∆xnt]   [−∇f(x)]
[A     0] [ ν∗ ] = [   0   ]

where the left-hand coefficient matrix is the KKT matrix. A solution exists when the KKT matrix is non-singular.

I The same solution can also be obtained by setting x∗ = x + ∆xnt in the optimality condition equations of the original problem

Ax∗ = b

∇f(x∗) + ATν∗ = 0

as ∆xnt and ν∗ should satisfy the optimality conditions.
I The Newton decrement λ(x) is the same as the one used for the unconstrained problem, i.e.,

λ(x) = (∆xTnt H(x) ∆xnt)1/2 = ‖∆xnt‖H(x)

being the norm of the Newton step in the norm defined by H(x).

I λ2(x)/2, being a good estimate of f(x) − p∗ (i.e., f(x) − p∗ ≈ λ2(x)/2), can be used as the stopping criterion λ2(x)/2 ≤ ε.

I It appears in the line search as

d f(x + α∆xnt)/dα |α=0 = ∇Tf(x)∆xnt = −λ2(x)

I In order for the Newton step ∆xnt to be a feasible descent direction, the following two conditions need to be satisfied

∇Tf(x)∆xnt = −λ2(x) < 0

A∆xnt = 0

I The solution for equality constrained Newton's Method is invariant to affine transformations.
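A minimal sketch of the resulting algorithm on an assumed toy problem (min e^x1 + e^x2 subject to x1 + x2 = 0, whose optimum is the origin); the full Newton step is taken without a line search, and λ²(x)/2 serves as the stopping criterion:

```python
import numpy as np

# Equality-constrained Newton's method (toy problem, full steps)
A = np.array([[1., 1.]])
b = np.array([0.])

grad = lambda x: np.exp(x)           # f(x) = exp(x1) + exp(x2)
hess = lambda x: np.diag(np.exp(x))

x = np.array([2., -2.])              # feasible start: Ax = b
for _ in range(20):
    H = hess(x)
    KKT = np.block([[H, A.T], [A, np.zeros((1, 1))]])
    rhs = np.concatenate([-grad(x), np.zeros(1)])
    dx = np.linalg.solve(KKT, rhs)[:2]   # Newton step (drop the multiplier)
    lam2 = dx @ H @ dx                   # squared Newton decrement
    if lam2 / 2 <= 1e-10:                # stopping criterion: lambda^2/2 <= eps
        break
    x = x + dx                           # full step (no line search in this sketch)

assert np.allclose(x, [0., 0.], atol=1e-6)   # optimum at the origin
assert np.allclose(A @ x, b)                 # every iterate stays feasible
```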
Newton's Method with Equality Constraint Elimination

min f(x)
s.t. Ax = b
≡ min f̃(z) = f(Fz + x̂)

with R(F) = N (A) and Ax̂ = b.

I The gradient and Hessian of f̃(z) are

∇f̃(z) = FT∇f(Fz + x̂)

H̃(z) = FTH(Fz + x̂)F

I Then, the KKT matrix is invertible iff H̃(z) is invertible.

I The Newton step of the reduced problem is

∆znt = −H̃−1(z)∇f̃(z) = −(FTH(x)F)−1 FT∇f(x)

where x = Fz + x̂.
I It can be shown that

∆xnt = F∆znt

where ∆xnt is the Newton step of the original problem with equality constraints.

I The Newton decrement λ(z) is the same as the Newton decrement of the original problem

λ2(z) = ∆zTnt H̃(z) ∆znt
      = ∆zTnt FTH(x)F ∆znt
      = ∆xTnt H(x) ∆xnt
      = λ2(x)
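The identity ∆xnt = F∆znt and the equality of the decrements can be checked numerically; the objective and constraint below are assumed toy choices:

```python
import numpy as np

# Toy problem: min exp(x1) + exp(2*x2)  s.t.  x1 + x2 = 1
A = np.array([[1., 1.]])
grad = lambda x: np.array([np.exp(x[0]), 2 * np.exp(2 * x[1])])
hess = lambda x: np.diag([np.exp(x[0]), 4 * np.exp(2 * x[1])])

x = np.array([1., 0.])                       # a feasible point
H, g = hess(x), grad(x)

# Newton step from the KKT system of the original problem
KKT = np.block([[H, A.T], [A, np.zeros((1, 1))]])
dx = np.linalg.solve(KKT, np.concatenate([-g, [0.]]))[:2]

# Newton step of the reduced problem, with R(F) = N(A)
F = np.array([[1.], [-1.]])
dz = np.linalg.solve(F.T @ H @ F, -F.T @ g)

assert np.allclose(F @ dz, dx)                             # dx_nt = F dz_nt
assert np.isclose(dz @ (F.T @ H @ F) @ dz, dx @ H @ dx)    # same Newton decrement
```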
Constrained Optimization Algorithms Penalty and Barrier Methods
Penalty and Barrier Methods
See Luenberger Chapter 13.
I They approximate constrained problems by unconstrained problems.

I The approximation is obtained by adding

- a term with a high cost for violating the constraints (penalty methods)

- a term that favors points interior to the feasible region over those near the boundary (barrier methods)

to the objective function (cost function).

I The penalty and barrier methods work directly in the original N-dimensional space RN rather than the (N − M)-dimensional space RN−M as in the case of the primal methods.
Penalty Methods
min f(x)
s.t. x ∈ F

I The idea is to replace this problem by the following penalty problem

min f(x) + cP(x)

where c ∈ R++ is a constant and P(x) : RN → R is the penalty function.

I Here

- P(x) is continuous
- P(x) ≥ 0, ∀x ∈ RN
- P(x) = 0 iff x ∈ F

I In general, the following quadratic penalty function is used

P(x) = (1/2) ∑_{i=1}^{L} (max {0, gi(x)})² + (1/2) ∑_{j=1}^{M} (hj(x))²

where gi(x) ≤ 0 are the inequality constraints and hj(x) = 0 are the equality constraints.
I For example, consider P(x) = (1/2) ∑_{i=1}^{2} (max {0, gi(x)})² with g1(x) = x − b and g2(x) = a − x.

I For large c, the minimum point of the penalty problem will be in a region where P(x) is small.

I For increasing c, the solution will approach the feasible region F, and as c → ∞ the penalty problem will converge to a solution of the constrained problem.
Penalty Method:
I Let {c(k)}, k = 1, 2, . . ., be a sequence tending to ∞ such that c(k) ≥ 0, ∀k, and c(k+1) > c(k).

I Let

q(c, x) = f(x) + cP(x)

and for each k solve the penalty problem

min q(c(k), x)

obtaining a solution point x(k).
I Lemma: As k → ∞, c(k+1) > c(k) (e.g., start from an exterior point)

i. q(c(k), x(k)) ≤ q(c(k+1), x(k+1))
ii. P(x(k)) ≥ P(x(k+1))
iii. f(x(k)) ≤ f(x(k+1))
iv. f(x∗) ≥ q(c(k), x(k)) ≥ f(x(k))

I Theorem: Let {x(k)}, k = 1, 2, . . ., be a sequence generated by the penalty method. Then, any limit point of the sequence is a solution of the original constrained problem.
Barrier Methods

Also known as interior methods.

min f(x)
s.t. x ∈ F

I The idea is to replace this problem by the following barrier problem

min f(x) + (1/c) B(x)
s.t. x ∈ interior of F

where c ∈ R++ is a constant and B(x) : RN → R is the barrier function.

I Here, the barrier function B(x) is defined on the interior of F such that

- B(x) is continuous
- B(x) ≥ 0
- B(x) → ∞ as x approaches the boundary of F

I Ideally,

B(x) = { 0, x ∈ interior of F
       { ∞, x ∉ interior of F
I There are several approximations. Two common barrier functions for the inequality constraints are given below

Log barrier:     B(x) = −∑_{i=1}^{L} log (−gi(x))

Inverse barrier: B(x) = −∑_{i=1}^{L} 1/gi(x)
I For example, consider B(x) = −1/g1(x) − 1/g2(x), with g1(x) = x − b and g2(x) = a − x.
Barrier Method:
I Let {c(k)}, k = 1, 2, . . ., be a sequence tending to ∞ such that c(k) ≥ 0, ∀k, and c(k+1) > c(k).

I Let

r(c, x) = f(x) + (1/c) B(x)

and for each k solve the barrier problem

min r(c(k), x)
s.t. x ∈ interior of F

obtaining a solution point x(k).
I Lemma: As k →∞, c(k+1) > c(k) (start from an interior point)
i. r(c(k),x(k)) ≥ r(c(k+1),x(k+1))
ii. B(x(k)) ≤ B(x(k+1))
iii. f(x(k)) ≥ f(x(k+1))
iv. f(x∗) ≤ f(x(k)) ≤ r(c(k), x(k))

I Theorem: Any limit point of a sequence {x(k)}, k = 1, 2, . . ., generated by the barrier method is a solution of the original constrained problem.
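The same one-dimensional sketch for the barrier method (assumed example: min x subject to x ≥ 1, with the log barrier B(x) = −log(x − 1); here x∗(c) = 1 + 1/c, so the iterates approach x∗ = 1 from the interior):

```python
import math

# Barrier method on: min x  s.t.  x >= 1  (g(x) = 1 - x <= 0), log barrier
def r(c, x):                            # r(c, x) = f(x) + (1/c) * B(x)
    return x - (1.0 / c) * math.log(x - 1.0)

def argmin_interior(fun, lo=1.0, hi=10.0, iters=200):
    """Ternary search; evaluates fun only strictly inside (lo, hi)."""
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if fun(m1) < fun(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

xs = [argmin_interior(lambda x, c=c: r(c, x)) for c in [1.0, 10.0, 100.0, 1000.0]]

assert all(x > 1.0 for x in xs)                            # iterates stay interior
assert all(xs[i] > xs[i + 1] for i in range(len(xs) - 1))  # f(x(k)) decreasing
assert abs(xs[-1] - 1.0) < 1e-2                            # x*(c) = 1 + 1/c -> 1
```

The monotone decrease of f(x(k)) and the strict interiority of every iterate match items ii and iii of the lemma above.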
Properties of the Penalty & Barrier Methods
I The penalty method solves the unconstrained problem
min f(x) + cP (x)
I Let pi(x) = max {0, gi(x)} and p(x) = [pi]L×1.

I Let the penalty function γ(y), where P(x) = γ(p(x)), be the Euclidean norm function

γ(y) = yTy ⇒ P(x) = pT(x)p(x) = ∑_{i=1}^{L} (pi(x))²

or more generally the quadratic norm function

γ(y) = yTΓy ⇒ P(x) = pT(x) Γ p(x)

I The Hessian of the above problem becomes more and more ill-conditioned as c → ∞.
I Defining

q(c, x) = f(x) + c γ(p(x))

the Hessian Q(c, x) is given by

Q(c, x) = F(x) + c ∇Tγ(p(x)) G(x) + c JTp(x) Γ(p(x)) Jp(x)

where F(x), G(x) and Γ(x) are the Hessians of f(x), g(x) and γ(x) respectively, and Jp(x) is the Jacobian of p(x).

I If at x∗ there are r active constraints, then the Hessian matrices Q(c(k), x(k)) have r eigenvalues tending to ∞ as c(k) → ∞ and (N − r) eigenvalues tending to some finite value. In other words, the condition number goes to infinity (κ → ∞) as c(k) → ∞.

I Gradient Descent may not be directly applicable; instead, Newton's Method is preferred!

I The same observation also applies to the barrier method.
Constrained Optimization Algorithms Interior-Point Methods
Interior-Point Methods

See Boyd Chapter 11.

min f(x) + (1/c) (−∑_{i=1}^{L} log (−gi(x)))
s.t. Ax = b

Minimum occurs at

p∗ = inf {f(x) | Ax = b} = f(x∗)
I The function

φ(x) = −∑_{i=1}^{L} log (−gi(x))

with dom φ(x) = {x ∈ RN | gi(x) < 0, ∀i} is called the logarithmic barrier function.

I We will modify Newton's algorithm to solve the above problem. So, we will need

∇φ(x) = −∑_{i=1}^{L} (1/gi(x)) ∇gi(x)

Hφ(x) = ∑_{i=1}^{L} (1/g²i(x)) ∇gi(x)∇Tgi(x) − ∑_{i=1}^{L} (1/gi(x)) Hgi(x)

where Hφ(x) = ∇²φ(x) and Hgi(x) = ∇²gi(x) are the Hessians of φ(x) and gi(x), respectively.
Central Path

I Consider the equivalent problem (c > 0)

min cf(x) + φ(x)
s.t. Ax = b

I The point x∗(c) is the solution. The trajectory of x∗(c) as a function of c is called the central path. Points on the central path satisfy the centrality conditions

Ax∗(c) = b
gi(x∗(c)) < 0, ∀i
c∇f(x∗(c)) + ∇φ(x∗(c)) + ATν = 0

for some ν ∈ RM .

I The last line can be rewritten as

c∇f(x∗(c)) − ∑_{i=1}^{L} (1/gi(x∗(c))) ∇gi(x∗(c)) + ATν = 0
I Example 24: Inequality form linear programming. The logarithmic barrier function for an LP in inequality form

min eTx
s.t. Ax ≤ b

(where e ∈ RN , A ∈ RL×N and b ∈ RL are constants) is given by

φ(x) = −∑_{i=1}^{L} log (bi − aTi x),   dom φ(x) = {x | Ax < b}

where aT1 , . . . , aTL are the rows of A.
The gradient and Hessian of the barrier function are

∇φ(x) = ∑_{i=1}^{L} (1/(bi − aTi x)) ai = ATd

Hφ(x) = ∑_{i=1}^{L} (1/(bi − aTi x)²) ai aTi = AT (diag d)² A

where di = 1/(bi − aTi x).

Since x is strictly feasible, we have d > 0, so the Hessian of φ(x), Hφ(x), is nonsingular if and only if A has rank N , i.e., full rank.
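These formulas are easy to sanity-check with finite differences; the LP data below are assumed toy values:

```python
import numpy as np

# Toy inequality data Ax <= b (L = 3 inequalities, N = 2 variables)
A = np.array([[1., 0.], [0., 1.], [1., 1.]])
b = np.array([1., 1., 1.5])

phi = lambda y: -np.log(b - A @ y).sum()      # logarithmic barrier

x = np.array([0.2, 0.3])                      # strictly feasible: Ax < b
d = 1.0 / (b - A @ x)

grad = A.T @ d                                # grad phi(x) = A'd
hess = A.T @ np.diag(d**2) @ A                # Hess phi(x) = A' diag(d)^2 A

# Compare against central differences
eps = 1e-6
fd_grad = np.array([(phi(x + eps * e) - phi(x - eps * e)) / (2 * eps)
                    for e in np.eye(2)])
agrad = lambda y: A.T @ (1.0 / (b - A @ y))
fd_hess = np.column_stack([(agrad(x + eps * e) - agrad(x - eps * e)) / (2 * eps)
                           for e in np.eye(2)])

assert np.allclose(fd_grad, grad, atol=1e-5)
assert np.allclose(fd_hess, hess, atol=1e-4)
assert np.all(np.linalg.eigvalsh(hess) > 0)   # A has rank N, so Hess is PD
```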
The centrality condition is

c e + ATd = 0

We can give a simple geometric interpretation of the centrality condition. At a point x∗(c) on the central path the gradient ∇φ(x∗(c)), which is normal to the level set of φ(x) through x∗(c), must be parallel to e. In other words, the hyperplane eTx = eTx∗(c) is tangent to the level set of φ(x) through x∗(c). The figure below shows an example with L = 6 and N = 2.
The dashed curves in the previous �gure show three contour lines of thelogarithmic barrier function φ(x). The central path converges to the optimal pointx∗ as c→∞. Also shown is the point on the central path with c = 10. Theoptimality condition at this point can be veri�ed geometrically: The lineeTx = eTx∗(10) is tangent to the contour line of φ(x) through x∗(10).
Dual Points from Central Path
I From the centrality condition, let

λ∗i(c) = −1/(c gi(x∗(c))), ∀i

ν∗(c) = ν/c

then

∇f(x∗(c)) + ∑_{i=1}^{L} λ∗i(c) ∇gi(x∗(c)) + ATν∗(c) = 0

Hence, from KKT, x∗(c) minimizes

L(x, λ, ν) = f(x) + λTg(x) + νT(Ax − b)

for λ = λ∗(c) and ν = ν∗(c) for a particular c, which means that (λ∗(c), ν∗(c)) is a dual feasible pair.
So, the dual function value ℓ(λ∗(c), ν∗(c)) is finite and given as

ℓ(λ∗(c), ν∗(c)) = f(x∗(c)) + ∑_{i=1}^{L} λ∗i(c) gi(x∗(c)) + ν∗T(c) (Ax∗(c) − b)
                = f(x∗(c)) − L/c

since each term λ∗i(c) gi(x∗(c)) = −1/c and Ax∗(c) − b = 0.

I The duality gap is L/c, and

f(x∗(c)) − p∗ ≤ L/c

goes to zero as c → ∞.
KKT Interpretation
I The central path (centrality) conditions can be seen as deformed KKT conditions, i.e.,

Ax∗ = b
gi(x∗) ≤ 0, ∀i
λ∗i ≥ 0, ∀i
−λ∗i gi(x∗) = 1/c, ∀i
∇f(x∗) + ∑_{i=1}^{L} λ∗i ∇gi(x∗) + ATν∗ = 0

satisfied by x∗(c), λ∗(c) and ν∗(c).

I Complementary slackness λi gi(x) = 0 is replaced by −λ∗i gi(x∗) = 1/c.

I As c → ∞, x∗(c), λ∗(c) and ν∗(c) almost satisfy the KKT optimality conditions.
Newton Step for Modified KKT Equations

I Let λi = −1/(c gi(x)), then

∇f(x) − ∑_{i=1}^{L} (1/(c gi(x))) ∇gi(x) + ATν = 0

Ax = b

I To solve this set of (N + M) equations (in the N + M variables x and ν), consider the nonlinear part of the first set of equations and linearize it around x:

∇f(x + d) − ∑_{i=1}^{L} (1/(c gi(x + d))) ∇gi(x + d)

≅ [∇f(x) − ∑_{i=1}^{L} (1/(c gi(x))) ∇gi(x)] + [H(x) − ∑_{i=1}^{L} (1/(c gi(x))) Hgi(x) + ∑_{i=1}^{L} (1/(c g²i(x))) ∇gi(x)∇Tgi(x)] d

= g + Hd
I Substituting back, we obtain

Hd + ATν = −g
Ad = 0

where

H = H(x) − ∑_{i=1}^{L} (1/(c gi(x))) Hgi(x) + ∑_{i=1}^{L} (1/(c g²i(x))) ∇gi(x)∇Tgi(x)

g = ∇f(x) − ∑_{i=1}^{L} (1/(c gi(x))) ∇gi(x)

I Using the derivations of ∇φ(x) and Hφ(x)

H = H(x) + (1/c) Hφ(x)

g = ∇f(x) + (1/c) ∇φ(x)
I Let us represent the previous modified KKT equations in matrix form

[H AT] [d]   [−g]
[A  0] [ν] = [ 0]

whose solution gives the modified Newton step ∆xnt and ν∗nt

[cH(x) + Hφ(x)  AT] [∆xnt]   [−c∇f(x) − ∇φ(x)]
[A               0] [ν∗nt] = [        0       ]

where ν∗nt = cν∗.

I Using this Newton step, the Interior-Point Method (i.e., the Barrier Method) can be constructed.
The Interior-Point Method (Barrier Method):

Given a strictly feasible x ∈ F, c = c(0) > 0, µ > 1 and tolerance ε > 0

repeat

1. Centering step: Compute x∗(c) by minimizing the modified barrier problem

   x∗(c) = argmin (c f(x) + φ(x))
   s.t. Ax = b

   starting at x, using the modified Newton's Method.

2. Update: x = x∗(c)

3. Stopping criterion: quit if L/c < ε

4. Increase c: c = µc
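A compact sketch of this algorithm for an inequality-form LP (assumed toy data: min −x1 − x2 over the unit box, so p∗ = −2 at (1, 1)). With no equality constraints the centering step reduces to plain Newton on c f(x) + φ(x), with backtracking to keep strict feasibility:

```python
import numpy as np

# Toy LP: min e'x  s.t.  Gx <= h   (the box 0 <= x1, x2 <= 1)
e = np.array([-1., -1.])
G = np.array([[1., 0.], [0., 1.], [-1., 0.], [0., -1.]])
h = np.array([1., 1., 0., 0.])
L = len(h)                                    # number of inequalities

def psi(x, c):                                # c * f(x) + phi(x)
    return c * (e @ x) - np.log(h - G @ x).sum()

def centering(x, c, tol=1e-9):
    """Newton's method on psi(., c), started at the previous center."""
    for _ in range(50):
        d = 1.0 / (h - G @ x)
        g = c * e + G.T @ d                   # gradient of psi
        H = G.T @ np.diag(d**2) @ G           # Hessian of psi
        dx = np.linalg.solve(H, -g)
        lam2 = -g @ dx                        # squared Newton decrement
        if lam2 / 2 <= tol:                   # inner stopping criterion
            break
        t = 1.0                               # backtracking line search:
        while np.any(h - G @ (x + t * dx) <= 0):
            t *= 0.5                          # stay strictly feasible
        while psi(x + t * dx, c) > psi(x, c) - 0.25 * t * lam2:
            t *= 0.5                          # Armijo-type decrease
        x = x + t * dx
    return x

x, c, mu, eps = np.array([0.5, 0.5]), 1.0, 10.0, 1e-6
while L / c >= eps:                           # outer stopping criterion: L/c < eps
    x = centering(x, c)                       # centering step + update
    c *= mu                                   # increase c
assert abs(e @ x + 2.0) < 1e-4                # p* = -2 at x* = (1, 1)
```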
I Accuracy of centering:
- Computing x∗(c) exactly is not necessary, since the central path has no significance beyond the fact that it leads to a solution of the original problem as c → ∞; inexact centering will still yield a sequence of points x(k) that converges to an optimal point. Inexact centering, however, means that the points λ∗(c) and ν∗(c), computed from the first two equations given in the section titled "Dual Points from Central Path", are not exactly dual feasible. This can be corrected by adding a correction term to these formulae, which yields a dual feasible point provided the computed x is near the central path, i.e., near x∗(c).

- On the other hand, the cost of computing an extremely accurate minimizer of c f(x) + φ(x), as compared to the cost of computing a good minimizer, is only marginally more, i.e., a few Newton steps at most. For this reason it is not unreasonable to assume exact centering.
I Choice of µ:
- µ provides a trade-off in the number of iterations for the inner and outer loops.

- small µ: at each step the inner loop starts from a very good point, so few inner loop iterations are required, but too many outer loop iterations may be required.

- large µ: at each step c increases by a large amount, so the current starting point may not be a good point for the inner loop. Too many inner loop iterations may be required, but few outer loop iterations are required.

- In practice, small values of µ (i.e., near one) result in many outer iterations, with just a few Newton steps for each outer iteration. For µ in a fairly large range, from around 3 to 100 or so, the two effects nearly cancel, so the total number of Newton steps remains approximately constant. This means that the choice of µ is not particularly critical; values from around 10 to 20 or so seem to work well.
I Choice of c(0):

- large c(0): the first run of the inner loop may require too many iterations.

- small c(0): more outer-loop iterations are required.

- One reasonable choice is to choose c(0) so that L/c(0) is approximately of the same order as f(x(0)) − p∗, or µ times this amount. For example, if a dual feasible point (λ, ν) is known, with duality gap η = f(x(0)) − ℓ(λ, ν), then we can take c(0) = L/η. Thus, in the first outer iteration we simply compute a pair with the same duality gap as the initial primal and dual feasible points.

- Another possibility is to find the c(0) which minimizes

  inf_ν ‖c∇f(x(0)) + ∇φ(x(0)) + ATν‖2
Example 25:
Solution:
Example 26:
Solution:
How to start from a feasible point?
I The interior-point method requires a strictly feasible starting point x(0).

I If such a point is not known, a preliminary stage, Phase I, is run first.
Basic Phase I Method:
I Consider gi(x) ≤ 0, i = 1, 2, . . . , L and Ax = b.

I We always assume that we are given a point x(0) ∈ ∩_{i=1}^{L} dom gi(x) satisfying Ax(0) = b.

I Then, we form the following optimization problem

min s
s.t. gi(x) ≤ s, i = 1, 2, . . . , L
     Ax = b

in the variables x ∈ RN and s ∈ R.

s is a bound on the maximum infeasibility of the inequalities, and it is to be driven below zero.

I The problem is always feasible when we select s(0) ≥ max_i gi(x(0)) together with the given x(0) ∈ ∩_{i=1}^{L} dom gi(x) with Ax(0) = b.
I Then, apply the interior-point method to solve the above problem. There are three cases depending on the optimal value p∗:

1. If p∗ < 0, then a strictly feasible solution is reached. Moreover, if (x, s) is feasible with s < 0, then x satisfies gi(x) < 0, ∀i. This means we do not need to solve the optimization problem with high accuracy; we can terminate when s < 0.

2. If p∗ > 0, then there is no feasible solution.

3. If p∗ = 0 and the minimum is attained at x∗ with s∗ = 0, then the set of inequalities is feasible but not strictly feasible. However, if p∗ = 0 and the minimum is not attained, then the inequalities are infeasible.
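The Phase I idea can be sketched in one dimension (pure Python, assumed examples): minimizing s subject to gi(x) ≤ s is the same as minimizing max_i gi(x) over x, so the sign of the optimal value decides feasibility:

```python
# Phase I sketch: drive the maximum infeasibility s = max_i g_i(x) below zero.
def max_infeasibility(gs, lo=-10.0, hi=10.0, iters=200):
    """min over x of s(x) = max_i g_i(x) via ternary search (s is convex here)."""
    s = lambda x: max(g(x) for g in gs)
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if s(m1) < s(m2):
            hi = m2
        else:
            lo = m1
    x = 0.5 * (lo + hi)
    return x, s(x)

# Feasible case: x - 1 <= 0 and -x <= 0  (i.e., 0 <= x <= 1)
x_f, p_f = max_infeasibility([lambda x: x - 1.0, lambda x: -x])
assert p_f < 0                   # p* < 0: a strictly feasible point was found

# Infeasible case: x - 1 <= 0 and 2 - x <= 0  (x <= 1 and x >= 2)
x_i, p_i = max_infeasibility([lambda x: x - 1.0, lambda x: 2.0 - x])
assert p_i > 0                   # p* > 0: the inequalities are infeasible
```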
Example 27:
Solution: