
ELE604/ELE704 Optimization

Constrained Optimization

http://www.ee.hacettepe.edu.tr/~usezen/ele604/

Dr. Umut Sezen & Dr. Cenk Toker

Department of Electrical and Electronic Engineering

Hacettepe University


Contents

Duality
  Duality
  Lagrange Dual Function
The Lagrange Dual Problem
  Dual Problem
Weak and Strong Duality
  Weak Duality
  Strong Duality
  Slater's Condition
  Saddle-point Interpretation
Optimality Conditions
  Certificate of Suboptimality and Stopping Criterion
  Complementary Slackness
  KKT Optimality Conditions
  Solving The Primal Problem via The Dual
  Perturbation and Sensitivity Analysis
Constrained Optimization Algorithms
  Introduction
  Primal Methods
    Feasible Direction Methods
    Active Set Methods
    Gradient Projection Method
  Equality Constrained Optimization
    Quadratic Minimization
    Eliminating Equality Constraints
    Newton's Method with Equality Constraints
    Newton's Method with Equality Constraint Elimination
  Penalty and Barrier Methods
    Penalty Methods
    Barrier Methods
    Properties of the Penalty & Barrier Methods
  Interior-Point Methods
    Logarithmic Barrier
    Central Path
    Dual Points from Central Path
    KKT Interpretation
    Newton Step for Modified KKT Equations
    The Interior-Point Algorithm
    How to start from a feasible point?

Duality

• Consider the standard minimization problem (it will be referred to as the primal problem)

    min  f(x)
    s.t. g(x) ≤ 0
         h(x) = 0

  where

    g(x) = [g_1(x) g_2(x) ⋯ g_i(x) ⋯ g_L(x)]^T
    h(x) = [h_1(x) h_2(x) ⋯ h_j(x) ⋯ h_M(x)]^T

  represent the L inequality and M equality constraints, respectively.

• The domain of the optimization problem, D, is defined by

    D = dom f(x) ∩ (⋂_{i=1}^{L} dom g_i(x)) ∩ (⋂_{j=1}^{M} dom h_j(x))

• Any point x ∈ D satisfying the constraints is called a feasible point, i.e., g(x) ≤ 0 and h(x) = 0.

Lagrange Dual Function

• Define the Lagrangian L : R^N × R^L × R^M → R as

    L(x, λ, ν) = f(x) + λ^T g(x) + ν^T h(x)
               = f(x) + ∑_{i=1}^{L} λ_i g_i(x) + ∑_{j=1}^{M} ν_j h_j(x)

  where λ_i ≥ 0 and ν_j are called the Lagrange multipliers, and

    λ = [λ_1 λ_2 ⋯ λ_i ⋯ λ_L]^T
    ν = [ν_1 ν_2 ⋯ ν_j ⋯ ν_M]^T

  are called the Lagrange multiplier vectors.

• On the feasible set F, where

    F = {x | x ∈ D ∧ g(x) ≤ 0 ∧ h(x) = 0},

  the Lagrangian has a value less than or equal to the cost function, i.e.,

    L(x, λ, ν) ≤ f(x),  ∀x ∈ F, ∀λ_i ≥ 0.

• Then the Lagrange dual function, ℓ(λ, ν), is defined as

    ℓ(λ, ν) = inf_{x∈D} L(x, λ, ν)
            = inf_{x∈D} ( f(x) + λ^T g(x) + ν^T h(x) )
            = inf_{x∈D} ( f(x) + ∑_{i=1}^{L} λ_i g_i(x) + ∑_{j=1}^{M} ν_j h_j(x) )

• The dual function ℓ(λ, ν) is the pointwise infimum of a set of affine functions of λ and ν; hence it is concave even if f(x), g_i(x) and h_j(x) are not convex.

• Proposition: The dual function constitutes a lower bound on p* = f(x*), i.e.,

    ℓ(λ, ν) ≤ p*,  ∀λ ≥ 0

• Proof: Let x be a feasible point, that is x ∈ F (i.e., g_i(x) ≤ 0 and h_j(x) = 0), and let λ ≥ 0. Then

    λ^T g(x) + ν^T h(x) ≤ 0.

  Hence, the Lagrangian satisfies

    L(x, λ, ν) = f(x) + λ^T g(x) + ν^T h(x) ≤ f(x)

  since the added term is ≤ 0. So,

    ℓ(λ, ν) = inf_{x∈D} L(x, λ, ν) ≤ L(x, λ, ν) ≤ f(x)  for every feasible x.  ∎

• The pair (λ, ν) is called dual feasible when λ ≥ 0.


Examples:

• Least Squares (LS) solution of linear equations

    min  x^T x
    s.t. Ax = b

  - The Lagrangian is

        L(x, ν) = x^T x + ν^T (Ax − b)

    then the Lagrange dual function is

        ℓ(ν) = inf_x L(x, ν)

    Since L(x, ν) is quadratic in x, it is convex:

        ∇_x L(x, ν) = 2x + A^T ν = 0  ⇒  x* = −(1/2) A^T ν

    Hence the Lagrange dual function is given by

        ℓ(ν) = L(−(1/2) A^T ν, ν) = −(1/4) ν^T A A^T ν − b^T ν

    which is obviously concave, and p* ≥ ℓ(ν) = −(1/4) ν^T A A^T ν − b^T ν.
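A quick numpy check of this bound (a sketch with made-up A and b, not from the slides): for any ν, ℓ(ν) must lie below p* = min{ x^T x | Ax = b }, and the bound is tight at ν* = −2 (A A^T)^{-1} b since strong duality holds for this problem.

    import numpy as np

    rng = np.random.default_rng(0)
    M, N = 3, 6
    A = rng.standard_normal((M, N))
    b = rng.standard_normal(M)

    # Primal optimum: minimum-norm solution of Ax = b
    x_star = A.T @ np.linalg.solve(A @ A.T, b)
    p_star = x_star @ x_star

    # Dual function l(nu) = -(1/4) nu^T A A^T nu - b^T nu: a lower bound for any nu
    ell = lambda nu: -0.25 * nu @ (A @ A.T) @ nu - b @ nu
    for _ in range(3):
        nu = rng.standard_normal(M)
        assert ell(nu) <= p_star + 1e-12

    # Tight at nu* = -2 (A A^T)^{-1} b
    nu_star = -2.0 * np.linalg.solve(A @ A.T, b)
    print(p_star, ell(nu_star))   # equal up to round-off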


• Linear Programming (LP)

    min  c^T x
    s.t. Ax = b
         −x ≤ 0

  - The Lagrangian is

        L(x, λ, ν) = c^T x − λ^T x + ν^T (Ax − b)

    then the Lagrange dual function is

        ℓ(λ, ν) = −b^T ν + inf_x { (c + A^T ν − λ)^T x }

    In order for ℓ(λ, ν) to be bounded, we must have c + A^T ν − λ = 0, i.e.,

        ℓ(λ, ν) = { −b^T ν,  c + A^T ν − λ = 0
                  { −∞,      otherwise

    This is affine, i.e., concave, when c + A^T ν − λ = 0 with λ ≥ 0.

    So, p* ≥ −b^T ν when c + A^T ν − λ = 0 with λ ≥ 0.

• Two-way partitioning (a non-convex problem)

    min  x^T W x
    s.t. x_j² = 1,  j = 1, …, N

  - This is a discrete problem since x_j ∈ {−1, +1}, and it is very difficult to solve for large N.

    The Lagrange dual function is

        ℓ(ν) = inf_x { x^T W x + ∑_{j=1}^{N} ν_j (x_j² − 1) }
             = inf_x { x^T (W + diag ν) x } − 1^T ν
             = { −1^T ν,  W + diag ν ⪰ 0
               { −∞,      else

    We may take ν = −λ_min(W) 1, which yields

        p* ≥ −1^T ν = N λ_min(W)

    where λ_min(W) is the minimum eigenvalue of W.
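As a numerical illustration (a sketch, not from the slides), the bound N λ_min(W) can be checked against a brute-force search over all 2^N sign vectors for a small random W:

    import numpy as np

    rng = np.random.default_rng(0)
    N = 8
    W = rng.standard_normal((N, N))
    W = (W + W.T) / 2                      # symmetric cost matrix

    # Dual bound with nu = -lambda_min(W) * 1:  p* >= N * lambda_min(W)
    lower_bound = N * np.linalg.eigvalsh(W).min()

    # Brute-force primal optimum over all x in {-1,+1}^N (feasible for small N only)
    best = np.inf
    for bits in range(2 ** N):
        x = np.array([1.0 if (bits >> k) & 1 else -1.0 for k in range(N)])
        best = min(best, x @ W @ x)

    print(lower_bound, "<=", best)         # the bound always holds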

  - If we relax the constraint to ‖x‖² = N, i.e., ∑_{j=1}^{N} x_j² = N, then the problem becomes easy to solve:

        min  x^T W x
        s.t. ‖x‖² = N

    with the exact solution

        p* = N λ_min(W)

    where λ_min(W) is the minimum eigenvalue of W.

The Lagrange Dual Problem

The Lagrange dual problem

    max  ℓ(λ, ν)
    s.t. λ ≥ 0

gives the best lower bound for p*, i.e., p* ≥ ℓ(λ, ν).

• The pairs (λ, ν) with λ ≥ 0, ν ∈ R^M and ℓ(λ, ν) > −∞ are dual feasible.

• The solution of the above problem over the dual feasible set is called the dual optimal point (λ*, ν*) (i.e., the optimal Lagrange multipliers), with dual optimal value

    d* = ℓ(λ*, ν*) ≤ p*

• Some hidden (implicit) constraints can be made explicit in the dual problem; e.g., consider the LP problems on the next slides.

• Linear problem (LP)

    min  c^T x
    s.t. Ax = b
         x ≥ 0

  has the dual function

    ℓ(λ, ν) = { −b^T ν,  c + A^T ν − λ = 0
              { −∞,      otherwise

  - So, the dual problem can be given by

        max  −b^T ν
        s.t. A^T ν − λ + c = 0
             λ ≥ 0

  - The dual problem can be further simplified to

        max  −b^T ν
        s.t. A^T ν + c ≥ 0

• Linear problem (LP) with inequality constraints

    min  c^T x
    s.t. Ax ≤ b

  has the dual function

    ℓ(λ) = { −b^T λ,  A^T λ + c = 0
           { −∞,      otherwise

  - So, the dual problem can be given by

        max  −b^T λ
        s.t. A^T λ + c = 0
             λ ≥ 0

Weak Duality

• We know that

    sup_{λ≥0, ν} ℓ(λ, ν) = d* ≤ p* = inf_{x∈D} f(x)

• This inequality is what is meant by weak duality.

• Here (p* − d*) is called the duality gap, which is always nonnegative.

• Weak duality always holds, even when the primal problem is non-convex.

Strong Duality

• Strong duality refers to the case where the duality gap is zero, i.e.,

    d* = p*

• In general it does not hold.

• It may hold if the primal problem is convex.

• The conditions under which strong duality holds are called constraint qualifications; one of them is Slater's condition.

Slater's Condition

• Strong duality holds for a convex problem

    min  f(x)
    s.t. g(x) ≤ 0
         Ax = b

  if it is strictly feasible, i.e., if ∃x ∈ interior D with

    g(x) < 0
    Ax = b

  i.e., the inequality constraints hold with strict inequality.

Saddle-point Interpretation

• First consider the following problem, with no equality constraints:

    sup_{λ≥0} L(x, λ) = sup_{λ≥0} ( f(x) + λ^T g(x) )
                      = sup_{λ≥0} ( f(x) + ∑_{i=1}^{L} λ_i g_i(x) )
                      = { f(x),  g_i(x) ≤ 0, ∀i
                        { ∞,     otherwise

• If g_i(x) ≤ 0, then the optimum choice is λ_i = 0.

• Hence, from the duality gap we have

    d* ≤ p*

  with weak duality as the inequality

    sup_{λ≥0} inf_x L(x, λ) ≤ inf_x sup_{λ≥0} L(x, λ)

  and strong duality as the equality

    sup_{λ≥0} inf_x L(x, λ) = inf_x sup_{λ≥0} L(x, λ)

• With strong duality we can switch inf and sup for λ ≥ 0.

• In general, for any f(w, z) : R^N × R^L → R,

    sup_{z∈Z} inf_{w∈W} f(w, z) ≤ inf_{w∈W} sup_{z∈Z} f(w, z)

  with W ⊆ R^N and Z ⊆ R^L; this is called the max-min inequality.

• f(w, z) satisfies the strong max-min property or the saddle-point property if the above inequality holds with equality.

• A point (w̃, z̃) with w̃ ∈ W, z̃ ∈ Z is called a saddle-point for f(w, z) if

    f(w̃, z) ≤ f(w̃, z̃) ≤ f(w, z̃),  ∀w ∈ W, ∀z ∈ Z

• i.e.,

    f(w̃, z̃) = inf_{w∈W} f(w, z̃) = sup_{z∈Z} f(w̃, z)

  and the strong max-min property holds with the value f(w̃, z̃).

• For Lagrange duality: if x* and λ* are optimal points for the primal and dual problems with strong duality (zero duality gap), then they form a saddle-point for the Lagrangian, and vice versa (the converse is also true).

Certificate of Suboptimality

• We know that a dual feasible point satisfies

    ℓ(λ, ν) ≤ p*

  i.e., the point (λ, ν) is a proof or certificate of this bound. Then,

    f(x) − p* ≤ f(x) − ℓ(λ, ν)

  for a primal feasible point x and a dual feasible point (λ, ν), where the duality gap associated with these points is

    f(x) − ℓ(λ, ν)

  in other words,

    p* ∈ [ℓ(λ, ν), f(x)]  and  d* ∈ [ℓ(λ, ν), f(x)]

  - If the duality gap is zero, i.e., f(x) = ℓ(λ, ν), then x* = x and (λ*, ν*) = (λ, ν) are the primal and dual optimal points.

Stopping Criterion

• If an algorithm produces the sequences x^(k) and (λ^(k), ν^(k)), check whether

    f(x^(k)) − ℓ(λ^(k), ν^(k)) < ε

  to guarantee ε-suboptimality. ε can approach zero, i.e., ε → 0, for strong duality. A minimal sketch of this test follows.
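As a sketch (the function and variable names are illustrative, not from the slides), the test is a one-liner inside any primal-dual iteration:

    def eps_suboptimal(f_xk: float, ell_k: float, eps: float = 1e-6) -> bool:
        """Duality-gap stopping test: f(x^(k)) - l(lambda^(k), nu^(k)) < eps.

        Since l(lambda, nu) <= p* <= f(x) for feasible points, a gap below
        eps certifies that x^(k) is within eps of the primal optimum p*.
        """
        return f_xk - ell_k < eps

    # e.g., inside an iteration loop:
    # if eps_suboptimal(f(x_k), ell(lam_k, nu_k), eps=1e-8):
    #     break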

Complementary Slackness

• Assume that x* and (λ*, ν*) satisfy strong duality:

    f(x*) = ℓ(λ*, ν*)
          = inf_x ( f(x) + λ*^T g(x) + ν*^T h(x) )
          = inf_x ( f(x) + ∑_i λ*_i g_i(x) + ∑_j ν*_j h_j(x) )
          ≤ f(x*) + ∑_i λ*_i g_i(x*) + ∑_j ν*_j h_j(x*)
          ≤ f(x*)

  where in the fourth line the first sum is ≤ 0 and the second sum is = 0.

• Observations (due to strong duality):

  1. The inequality in the third line always holds with equality, i.e., x* minimizes L(x, λ*, ν*).

  2. From the fourth line we have ∑_i λ*_i g_i(x*) = 0, and since each term λ*_i g_i(x*) ≤ 0,

         λ*_i g_i(x*) = 0,  ∀i

     which is known as complementary slackness.

• In other words,

    λ*_i > 0 if g_i(x*) = 0
    λ*_i = 0 if g_i(x*) < 0

  i.e., the i-th optimal Lagrange multiplier is
  - positive only if g_i is active at x*,
  - zero if g_i is not active at x*.

KKT Optimality Conditions

• x* minimizes L(x, λ*, ν*), thus ∇_x L(x*, λ*, ν*) = 0, i.e.,

    ∇f(x*) + ∑_i λ*_i ∇g_i(x*) + ∑_j ν*_j ∇h_j(x*) = 0

• Then the Karush-Kuhn-Tucker (KKT) conditions for x* and (λ*, ν*) being primal and dual optimal points with zero duality gap (i.e., with strong duality) are

    g_i(x*) ≤ 0                          (constraint)
    h_j(x*) = 0                          (constraint)
    λ*_i ≥ 0                             (constraint)
    λ*_i g_i(x*) = 0                     (complementary slackness)
    ∇f(x*) + ∑_i λ*_i ∇g_i(x*) + ∑_j ν*_j ∇h_j(x*) = 0

• For any optimization problem with differentiable objective (cost) and constraint functions for which strong duality holds, any pair of primal and dual optimal points must satisfy the KKT conditions.

• For convex problems: if f(x), g_i(x) and h_j(x) are convex and x̃, λ̃ and ν̃ satisfy the KKT conditions, then they are optimal points, i.e.,

  - from complementary slackness: f(x̃) = L(x̃, λ̃, ν̃)
  - from the last condition: ℓ(λ̃, ν̃) = L(x̃, λ̃, ν̃) (= inf_x L(x, λ̃, ν̃)); note that L(x, λ̃, ν̃) is convex in x.

  Thus,

    f(x̃) = ℓ(λ̃, ν̃).

Example 18:

    min  (1/2) x^T Q x + c^T x + r    (Q: SPD)
    s.t. Ax = b

Solution: From the KKT conditions,

    Ax* = b
    Qx* + c + A^T ν* = 0

which can be written as the linear system

    [ Q   A^T ] [ x* ]   [ −c ]
    [ A   0   ] [ ν* ] = [  b ]
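A minimal numpy sketch of this KKT solve; the data Q, A, b, c below are made up, and the printed residuals check primal feasibility and stationarity:

    import numpy as np

    # Made-up problem data: Q SPD (2x2), one equality constraint
    Q = np.array([[2.0, 0.5], [0.5, 1.0]])
    c = np.array([1.0, -1.0])
    A = np.array([[1.0, 1.0]])
    b = np.array([2.0])

    N, M = Q.shape[0], A.shape[0]

    # Assemble the KKT matrix [[Q, A^T], [A, 0]] and right-hand side [-c; b]
    kkt = np.block([[Q, A.T], [A, np.zeros((M, M))]])
    rhs = np.concatenate([-c, b])

    sol = np.linalg.solve(kkt, rhs)
    x_star, nu_star = sol[:N], sol[N:]

    print("A x* - b =", A @ x_star - b)                            # ~0: feasibility
    print("Q x* + c + A^T nu* =", Q @ x_star + c + A.T @ nu_star)  # ~0: stationarity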


Example 19: (the example statement and its solution are given as figures in the original slides)

Solving The Primal Problem via The Dual

• If strong duality holds and a dual optimal solution (λ*, ν*) exists, then we can compute a primal optimal solution from the dual solution.

• When strong duality holds and (λ*, ν*) is given, if the minimizer of L(x, λ*, ν*), i.e., the solution of

    min  f(x) + ∑_i λ*_i g_i(x) + ∑_j ν*_j h_j(x),

  is unique and primal feasible, then it is also primal optimal.

• If the dual problem is easier to solve (e.g., it has fewer dimensions or admits an analytical solution), then first solving the dual problem for the optimal dual parameters (λ*, ν*) and then solving

    x* = argmin_x L(x, λ*, ν*)

  is an acceptable way to solve constrained minimization problems.

• Example 20: Consider the following problem

    min  f(x) = ∑_{i=1}^{N} f_i(x_i)
    s.t. a^T x = b

  where each f_i is strictly convex and differentiable, a ∈ R^N and b ∈ R. Assume that the problem has a unique, finite solution and is dual feasible.

  - f(x) is separable because each f_i(x_i) is a function of x_i only. The Lagrangian is

        L(x, ν) = ∑_{i=1}^{N} f_i(x_i) + ν (a^T x − b)
                = −bν + ∑_{i=1}^{N} ( f_i(x_i) + ν a_i x_i )

    Then the dual function is given by

        ℓ(ν) = inf_x L(x, ν)
             = −bν + inf_x ∑_{i=1}^{N} ( f_i(x_i) + ν a_i x_i )
             = −bν + ∑_{i=1}^{N} inf_{x_i} ( f_i(x_i) + ν a_i x_i )
             = −bν − ∑_{i=1}^{N} f_i*(−ν a_i)

    where f_i*(y) is the conjugate function of f_i(x), since inf_{x_i} ( f_i(x_i) + ν a_i x_i ) = −f_i*(−ν a_i).

    NOTE: The conjugate function

        f*(y) = sup_{x ∈ dom f} ( y^T x − f(x) )

    is the maximum gap between the linear function y^T x and f(x) (see Boyd, Section 3.3). If f(x) is differentiable, this occurs at a point x where ∇f(x) = y. Note that f*(y) is a convex function.

    Then the dual problem is a function of a scalar ν ∈ R:

        max_ν ( −bν − ∑_{i=1}^{N} f_i*(−ν a_i) )

  - Once we find ν*, we know that L(x, ν*) is strictly convex as each f_i(x) is strictly convex. So, we can find x* by solving ∇_x L(x, ν*) = 0, i.e., component-wise,

        ∂f_i(x_i)/∂x_i = −ν* a_i.
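A sketch with the concrete (assumed) choice f_i(x_i) = (1/2)(x_i − c_i)²: the per-coordinate condition ∂f_i/∂x_i = −ν a_i gives x_i = c_i − ν a_i, and feasibility a^T x = b then pins down the scalar ν* in closed form:

    import numpy as np

    rng = np.random.default_rng(1)
    N = 5
    a = rng.standard_normal(N)
    c = rng.standard_normal(N)        # assumed f_i(x_i) = 0.5*(x_i - c_i)^2
    b = 1.0

    # Stationarity: x_i - c_i = -nu*a_i  =>  x = c - nu*a
    # Feasibility:  a^T x = b            =>  nu* = (a^T c - b) / ||a||^2
    nu_star = (a @ c - b) / (a @ a)
    x_star = c - nu_star * a

    print("a^T x* - b =", a @ x_star - b)                     # ~0: feasible
    print("per-coordinate KKT:", (x_star - c) + nu_star * a)  # ~0: stationary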

Perturbation and Sensitivity Analysis

• Original problem:

    primal:  min  f(x)                 dual:  max  ℓ(λ, ν)
             s.t. g(x) ≤ 0                    s.t. λ ≥ 0
                  h(x) = 0

• Perturbed problem:

    primal:  min  f(x)                 dual:  max  ℓ(λ, ν) − λ^T u − ν^T v
             s.t. g(x) ≤ u                    s.t. λ ≥ 0
                  h(x) = v

• Here u = [u_i]_{L×1} and v = [v_j]_{M×1} are called the perturbations. When u = 0 and v = 0, the problem becomes the original problem. If u_i > 0, it means we have relaxed the i-th inequality constraint; if u_i < 0, it means we have tightened the i-th inequality constraint.

• Let us use the notation p*(u, v) to denote the optimal value of the perturbed problem. Thus, the optimal value of the original problem is p* = p*(0, 0) = ℓ(λ*, ν*).

• Assume that strong duality holds and an optimal dual solution exists, i.e., p* = ℓ(λ*, ν*). Then, we can show that

    p*(u, v) ≥ p*(0, 0) − λ*^T u − ν*^T v

• If λ*_i is large and u_i < 0, then p*(u, v) is guaranteed to increase greatly.

• If λ*_i is small and u_i > 0, then p*(u, v) will not decrease too much.

• If |ν*_j| is large:
  - If ν*_j > 0 and v_j < 0, then p*(u, v) is guaranteed to increase greatly.
  - If ν*_j < 0 and v_j > 0, then p*(u, v) is guaranteed to increase greatly.

• If |ν*_j| is small:
  - If ν*_j > 0 and v_j > 0, then p*(u, v) will not decrease too much.
  - If ν*_j < 0 and v_j < 0, then p*(u, v) will not decrease too much.

• The perturbation inequality, p*(u, v) ≥ p*(0, 0) − λ*^T u − ν*^T v, gives a lower bound on the perturbed optimal value, but no upper bound. For this reason the results are not symmetric with respect to loosening or tightening a constraint. For example, if λ*_i is large and we loosen the i-th constraint a bit (i.e., take u_i small and positive, 0 < u_i < ε), then the perturbation inequality is not useful; it does not imply that the optimal value will decrease considerably.

Constrained Optimization Algorithms

Introduction

• The general constrained optimization problem is

    min  f(x)
    s.t. g(x) ≤ 0
         h(x) = 0

• Defn (Active Constraint): Given a feasible point x^(k), if g_i(x^(k)) ≤ 0 is satisfied with equality, i.e., g_i(x^(k)) = 0, then constraint i is said to be active at x^(k). Otherwise it is inactive at x^(k).


Primal Methods

See Luenberger, Chapter 12.

• A primal method is a search method that works on the original problem directly by searching through the feasible region for the optimal solution. Each point in the process is feasible and the value of the objective function constantly decreases.

  For a problem with N variables and M equality constraints, primal methods work in the feasible space of dimension N − M.

  Advantages:
  - The iterates x^(k) are all feasible points.
  - If x^(k) is a convergent sequence, it converges at least to a local minimum.
  - They do not rely on special problem structure, e.g. convexity; in other words, primal methods are applicable to general non-linear problems.

  Disadvantages:
  - They must start from a feasible initial point.
  - They may fail to converge for inequality constraints if precautions are not taken.

Feasible Direction Methods

• The update equation is

    x^(k+1) = x^(k) + α^(k) d^(k)

  d^(k) must be a descent direction, and x^(k) + α^(k) d^(k) must be contained in the feasible region, i.e., d^(k) must be a feasible direction for some α^(k) > 0.

  This is very similar to unconstrained descent methods, but now the line search is constrained so that x^(k+1) is also a feasible point.

• Example 21: Consider the following problem

    min  f(x)
    s.t. a_i^T x ≤ b_i,  i = 1, …, M

  - Let A(k) be the set of indices representing the active constraints at x^(k), i.e., a_i^T x^(k) = b_i for i ∈ A(k). Then the direction vector d^(k) is found by solving

        min_d  ∇^T f(x^(k)) d
        s.t.   a_i^T d ≤ 0,  i ∈ A(k)
               ∑_{i=1}^{N} |d_i| = 1

    The last constraint ensures a bounded solution. The other constraints assure that vectors of the form x^(k) + α^(k) d^(k) will be feasible for sufficiently small α^(k) > 0; subject to these conditions, d^(k) is chosen to line up as closely as possible with the negative gradient of f at x^(k). In some sense this will result in the locally best direction in which to proceed. The overall procedure progresses by generating feasible directions in this manner, and moving along them to decrease the objective.

There are two major shortcomings of feasible direction methods that require that they be modified in most cases.

• The first shortcoming is that for general problems there may not exist any feasible directions. If, for example, a problem has nonlinear equality constraints, we might find ourselves in the situation where no straight line from x^(k) has a feasible segment. For such problems it is necessary either to relax our requirement of feasibility by allowing points to deviate slightly from the constraint surface, or to introduce the concept of moving along curves rather than straight lines.

• A second shortcoming is that, in their simplest form, most feasible direction methods are not globally convergent. They are subject to jamming (sometimes referred to as zigzagging), where the sequence of points generated by the process converges to a point that is not even a constrained local minimum point.

Active Set Methods

• The idea underlying active set methods is to partition the inequality constraints into two groups: those that are to be treated as active and those that are to be treated as inactive. The constraints treated as inactive are essentially ignored.

• Consider the following problem

    min  f(x)
    s.t. g(x) ≤ 0

  (for simplicity there are no equality constraints; their inclusion is straightforward).

  Necessary conditions for the optimum x* are

    ∇f(x*) + λ^T ∇g(x*) = 0
    g(x*) ≤ 0
    λ^T g(x*) = 0
    λ ≥ 0

• Let A be the set of indices of the active constraints (i.e., g_i(x*) = 0 for i ∈ A). Then

    ∇f(x*) + ∑_{i∈A} λ_i ∇g_i(x*) = 0
    g_i(x*) = 0,  i ∈ A
    g_i(x*) < 0,  i ∉ A
    λ_i ≥ 0,      i ∈ A
    λ_i = 0,      i ∉ A

  Inactive constraints are inhibited (i.e., λ_i = 0).

• Taking only the active constraints in A, the problem is converted to an equality-constrained-only problem.

• Active set method:
  - At each step, find the working set W and treat it as the active set (it can be a subset of the actual active set, i.e., W ⊆ A).
  - Move to a lower point on the surface of the working set.
  - Find a new working set W for this point.
  - Repeat.

• The surface defined by the working set W will be called the working surface.

• Given any working set W ⊆ A, assume that x_W is a solution to the problem P_W:

    min  f(x)
    s.t. g_i(x) = 0,  i ∈ W

  also satisfying g_i(x_W) < 0, i ∉ W. If such an x_W cannot be found, change the working set W until a solution x_W is obtained.

• Once x_W is found, solve

    ∇f(x_W) + ∑_{i∈W} λ_i ∇g_i(x_W) = 0

  to find the λ_i.

• If λ_i ≥ 0 ∀i ∈ W, then x_W is a local optimal solution of the original problem.

• If ∃i ∈ W such that λ_i < 0, then dropping constraint i (while staying feasible) will decrease the objective value, due to the sensitivity theorem (relaxing constraint i to g_i(x) = −c reduces f(x) by λ_i c).

• By dropping i from W and moving on the new working surface (toward the interior of the feasible region F), we move to an improved solution.

• Monitor the movement to avoid infeasibility: when one or more constraints become active, add them to the working set W. Then solve the changed problem P_W again to find a new x_W and repeat the previous steps.

• If we can ensure that the objective (cost) function value is monotonically decreasing, then no working set will appear twice in the process. Hence the active set method terminates in a finite number of iterations.

Active Set Algorithm: For a given working set W,

    repeat
        repeat
            - minimize f(x) over the working set W using x^(k+1) = x^(k) + α^(k) d^(k)
            - check whether a new constraint becomes active;
              if so, add the new constraint to the working set W
        until some stopping criterion is satisfied
        check the Lagrange multipliers λ_i
        - drop constraints with λ_i < 0 from the working set W
    until some stopping criterion is satisfied

• Note that f(x) strictly decreases at each step.

• Disadvantage: The inner loop must terminate at a global optimum in order to determine the correct λ_i, so that the same working set is not encountered in the following iterations.

• Discuss: How can we integrate equality constraints into the original problem?

Gradient Projection Method

• The Gradient Projection Method is an extension of the Gradient (or Steepest) Descent Method (GD or SD) to the constrained case.

• Let us first consider linear constraints:

    min  f(x)
    s.t. a_i^T x ≤ b_i,  i ∈ I_1
         a_i^T x = b_i,  i ∈ I_2

  Let us take the active constraints, i.e., those with a_i^T x = b_i, as the working set W and seek a feasible descent direction:

    find d such that ∇^T f(x) d < 0
    while satisfying a_i^T d = 0,  i ∈ W

  i.e., d must lie in the tangent plane defined by a_i^T d = 0, i ∈ W.

  Hence, the above problem is the projection of −∇f(x) onto this tangent plane.

• Another perspective is to let the equality Ax = b represent all the active constraints a_i^T x = b_i, i ∈ W. Thus

    min  f(x)
    s.t. Ax = b

• If we use a first-order Taylor approximation around the point x^(k), such that f(x) = f(x^(k) + d) ≅ f(x^(k)) + ∇^T f(x^(k)) d for small enough d, then we will have

    min  f(x^(k)) + ∇^T f(x^(k)) d
    s.t. A (x^(k) + d) = b
         d^T I d ≤ 1

• As Ax^(k) = b and Ax^(k+1) = b, this implies that Ad = 0.

• Thus, the problem simplifies to

    min  ∇^T f(x^(k)) d
    s.t. Ad = 0
         d^T I d ≤ 1

  - Ad = 0 defines the tangent plane M ⊆ R^N and ensures that x^(k+1) is still feasible.
  - d^T I d ≤ 1 is the Euclidean unit ball (‖d‖₂² ≤ 1); thus this is the projected GD algorithm.
  - We may also use the constraint d^T Q d ≤ 1, where Q is an SPD matrix, to obtain the projected SD algorithm.

Projected Steepest Descent Algorithm (PSDA):

1. Start from a feasible initial point x^(0) (i.e., x^(0) ∈ F).
2. Solve the Direction Finding Problem (DFP):

       d^(k) = argmin_d  ∇^T f(x^(k)) d
               s.t.      Ad = 0
                         d^T Q d ≤ 1    (Q: SPD)

3. If ∇^T f(x^(k)) d^(k) = 0, stop; x^(k) is a KKT point.
4. Solve α^(k) = argmin_α f(x^(k) + α d^(k)) using a line search algorithm, e.g. exact or backtracking line search.
5. x^(k+1) = x^(k) + α^(k) d^(k)
6. Go to Step 1 with x^(0) = x^(k+1).

Projection:

• The DFP is itself a constrained optimization problem, so it should satisfy its own KKT conditions, i.e.,

    A d^(k) = 0
    d^(k)T Q d^(k) = 1
    ∇f(x^(k)) + 2β_k Q d^(k) + A^T λ_k = 0
    β_k ≥ 0
    β_k ( 1 − d^(k)T Q d^(k) ) = 0

  - Here, A d^(k) = 0 defines the tangent plane M^(k) ⊆ R^N.

  If we set d̃^(k) = 2β_k d^(k), from the third condition we find that

      d̃^(k) = −Q^{-1} ∇f(x^(k)) − Q^{-1} A^T λ_k

  Now, if we put this value into the first condition, we obtain

      λ_k = −( A Q^{-1} A^T )^{-1} A Q^{-1} ∇f(x^(k))

  Thus, d̃^(k) is obtained as

      d̃^(k) = −[ Q^{-1} − Q^{-1} A^T ( A Q^{-1} A^T )^{-1} A Q^{-1} ] ∇f(x^(k))
             = −P^(k) ∇f(x^(k))

  where

      P^(k) = Q^{-1} − Q^{-1} A^T ( A Q^{-1} A^T )^{-1} A Q^{-1}

  is called the projection matrix.

• Note that if Q = I, then

      P^(k) = I − A^T ( A A^T )^{-1} A

• As 2β_k is just a positive scaling factor, we can safely write that the descent direction d^(k) is given by

      d^(k) = −P^(k) ∇f(x^(k))

PSDA with DFP Algorithm: Given a feasible point x^(k) (i.e., x^(k) ∈ F),

1. Find the active constraint set W^(k) and form the matrix A (actually A^(k)).
2. Calculate

       P^(k) = Q^{-1} − Q^{-1} A^T ( A Q^{-1} A^T )^{-1} A Q^{-1}
       d^(k) = −P^(k) ∇f(x^(k))

3. If d^(k) ≠ 0:
   a) Find α:

          α_1 = max { α : x^(k) + α d^(k) is feasible }
          α_2 = argmin { f(x^(k) + α d^(k)) : 0 ≤ α ≤ α_1 }

   b) x^(k+1) = x^(k) + α_2 d^(k)
   c) Go to Step 1 (with x^(k) = x^(k+1)).

4. If d^(k) = 0:
   a) Find λ:

          λ = −( A Q^{-1} A^T )^{-1} A Q^{-1} ∇f(x^(k))

   b) If λ_i ≥ 0 ∀i, stop; x^(k) satisfies the KKT conditions.
   c) If ∃λ_i < 0, delete the row of A corresponding to the inequality with the most negative component λ_i and drop its index from W. Go to Step 2.

• Example 22: Consider the following problem

    min  x_1² + x_2² + x_3² + x_4² − 2x_1 − 3x_4
    s.t. 2x_1 + x_2 + x_3 + 4x_4 = 7
         x_1 + x_2 + 2x_3 + x_4 = 6
         x_i ≥ 0,  i = 1, 2, 3, 4

  Given the initial point x^(0) = [2 2 1 0]^T, find the initial direction d^(0) for the projected gradient descent algorithm (PGDA).

  - The active constraints are the two equalities and the inequality x_4 ≥ 0; thus

        A = [ 2  1  1  4
              1  1  2  1
              0  0  0  1 ]

    Also, ∇f(x^(0)) is given by

        ∇f(x^(0)) = [2  4  2  −3]^T

    So,

        A A^T = [ 22  9  4
                   9  7  1
                   4  1  1 ]

        (A A^T)^{-1} = (1/11) [  6  −5 −19
                                −5   6  14
                               −19  14  73 ]

    Hence, the projection matrix P^(0) is given by

        P^(0) = I − A^T (A A^T)^{-1} A = (1/11) [  1 −3  1  0
                                                  −3  9 −3  0
                                                   1 −3  1  0
                                                   0  0  0  0 ]

    Finally, the direction d^(0) is given by

        d^(0) = −P^(0) ∇f(x^(0)) = (1/11) [8  −24  8  0]^T

    (note that ∇^T f(x^(0)) d^(0) = −64/11 < 0, so d^(0) is indeed a descent direction).
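A numpy check of these numbers (a sketch; it reproduces P^(0) and d^(0) above and confirms that d^(0) is a feasible descent direction):

    import numpy as np

    A = np.array([[2., 1., 1., 4.],
                  [1., 1., 2., 1.],
                  [0., 0., 0., 1.]])       # active constraints at x^(0)
    grad = np.array([2., 4., 2., -3.])     # grad f at x^(0) = (2, 2, 1, 0)

    # Projection onto the nullspace of A (the Q = I case)
    P = np.eye(4) - A.T @ np.linalg.inv(A @ A.T) @ A
    d0 = -P @ grad

    print(np.round(11 * P))     # matches the matrix above
    print(np.round(11 * d0))    # -> [  8. -24.   8.   0.]
    print(A @ d0)               # ~0: d0 lies in the tangent plane
    print(grad @ d0)            # -64/11 < 0: descent direction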

Nonlinear constraints:

• Consider the problem containing only the active constraints:

    min  f(x)
    s.t. h(x) = 0

• With linear constraints, the iterates x^(k) lie on the tangent plane M. With nonlinear constraints, however, the surface defined by the constraints and the tangent plane M touch only at a single point.

• In this case the updated point must be projected onto the constraint surface. For the projected gradient descent method, the projection matrix is given by

    P^(k) = I − J_h^T(x^(k)) ( J_h(x^(k)) J_h^T(x^(k)) )^{-1} J_h(x^(k))

  where J_h(x) is the Jacobian of h(x), i.e.,

    J_h(x) = ( ∇h^T(x) )^T.

Equality Constrained Optimization

See Boyd, Chapter 10.

    min  f(x)
    s.t. Ax = b

where f(x) : R^N → R is convex and twice differentiable, and A ∈ R^{M×N} with M < N.

The minimum occurs at

    p* = inf { f(x) | Ax = b } = f(x*)

where x* satisfies the KKT conditions

    Ax* = b
    ∇f(x*) + A^T ν* = 0

Quadratic Minimization

    min  (1/2) x^T Q x + c^T x + r
    s.t. Ax = b

where Q ∈ R^{N×N} is a symmetric positive semidefinite (SPSD) matrix, A ∈ R^{M×N} and b ∈ R^M.

• Using the KKT conditions,

    Ax* = b
    Qx* + c + A^T ν* = 0

  which can be written as

    [ Q   A^T ] [ x* ]   [ −c ]
    [ A   0   ] [ ν* ] = [  b ]

  where the block matrix on the left is called the KKT matrix.

• The above equations define a KKT system for an equality constrained quadratic optimization problem: (N + M) linear equations in the (N + M) variables (x*, ν*).

Eliminating Equality Constraints

• One general approach to solving the equality constrained problem is to eliminate the equality constraints and then solve the resulting unconstrained problem using methods for unconstrained minimization.

• We first find a matrix F ∈ R^{N×(N−M)} and a vector x̂ that parametrize the (affine) feasible set:

    { x | Ax = b } = { Fz + x̂ | z ∈ R^{N−M} }.

  Here x̂ can be chosen as any particular solution of Ax = b, and F ∈ R^{N×(N−M)} is any matrix whose range (column space) is the nullspace of A, i.e., R(F) = N(A).

• We then form the reduced or eliminated optimization problem

    min  f̃(z) = f(Fz + x̂)

  which is an unconstrained problem with variable z ∈ R^{N−M}. From its solution z*, we can find the solution of the equality constrained problem as

    x* = Fz* + x̂.
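A sketch of the elimination step using scipy's null_space (here the particular solution x̂ is taken as the least-squares solution; any solution of Ax = b works):

    import numpy as np
    from scipy.linalg import null_space

    rng = np.random.default_rng(2)
    M, N = 2, 5
    A = rng.standard_normal((M, N))
    b = rng.standard_normal(M)

    x_hat = np.linalg.lstsq(A, b, rcond=None)[0]   # particular solution of Ax = b
    F = null_space(A)                              # columns span N(A), shape (N, N-M)

    # Any x = F z + x_hat is feasible:
    z = rng.standard_normal(N - M)
    print(np.allclose(A @ (F @ z + x_hat), b))     # True

    # e.g., minimizing f(x) = ||x||^2 via the reduced problem:
    # grad_z ||F z + x_hat||^2 = 2 F^T (F z + x_hat) = 0
    z_star = -np.linalg.solve(F.T @ F, F.T @ x_hat)
    print(F @ z_star + x_hat)                      # minimum-norm feasible point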

• There are, of course, many possible choices for the elimination matrix F; it can be chosen as any matrix in R^{N×(N−M)} whose range (column space) equals the nullspace of A, i.e., R(F) = N(A). If F is one such matrix and T ∈ R^{(N−M)×(N−M)} is nonsingular, then F̃ = FT is also a suitable elimination matrix, since

    R(F̃) = R(F) = N(A).

• Conversely, if F and F̃ are any two suitable elimination matrices, then there is some nonsingular T such that F̃ = FT. If we eliminate the equality constraints using F, we solve the unconstrained problem

    min  f(Fz + x̂)

  while if F̃ is used, we solve the unconstrained problem

    min  f(F̃z̃ + x̂) = f(F(Tz̃) + x̂)

  This problem is equivalent to the one above, and is simply obtained by the change of coordinates z = Tz̃. In other words, changing the elimination matrix can be thought of as changing variables in the reduced problem.

Example 23: (the example and its solution are given as figures in the original slides)

Newton's Method with Equality Constraints

• It is the same as the unconstrained Newton's Method except that
  - the initial point is feasible, x^(0) ∈ F, i.e., Ax^(0) = b;
  - the Newton step ∆x_nt is a feasible direction, i.e., A∆x_nt = 0.

• In order to use Newton's Method on the problem

    min  f(x)
    s.t. Ax = b

  we can use a second-order Taylor approximation around x (actually x^(k)) to obtain a quadratic minimization problem:

    min  f(x + ∆x) ≅ f(x) + ∇^T f(x) ∆x + (1/2) ∆x^T H(x) ∆x
    s.t. A (x + ∆x) = b

  This problem is convex if H(x) ⪰ 0.

• Using the results of quadratic minimization with equality constraints,

    [ H(x)  A^T ] [ ∆x_nt ]   [ −∇f(x) ]
    [ A     0   ] [  ν*   ] = [    0   ]

  where the block matrix on the left is the KKT matrix. A solution exists when the KKT matrix is non-singular.

• The same solution can also be obtained by setting x* = x + ∆x_nt in the optimality condition equations of the original problem,

    Ax* = b
    ∇f(x*) + A^T ν* = 0

  as ∆x_nt and ν* should satisfy the optimality conditions.
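A sketch of one equality-constrained Newton step; the objective f(x) = ∑_i exp(x_i) and the data are illustrative assumptions, not from the slides:

    import numpy as np

    f_grad = lambda x: np.exp(x)              # gradient of f(x) = sum(exp(x_i))
    f_hess = lambda x: np.diag(np.exp(x))     # Hessian (diagonal here)

    A = np.array([[1., 1., 1.]])
    b = np.array([1.5])
    x = np.array([0.2, 0.5, 0.8])             # feasible: A x = b

    # Solve [[H, A^T], [A, 0]] [dx_nt; nu] = [-grad; 0]
    H = f_hess(x)
    N, M = x.size, A.shape[0]
    kkt = np.block([[H, A.T], [A, np.zeros((M, M))]])
    rhs = np.concatenate([-f_grad(x), np.zeros(M)])
    sol = np.linalg.solve(kkt, rhs)
    dx_nt, nu = sol[:N], sol[N:]

    print(A @ dx_nt)                          # ~0: the step keeps Ax = b
    lam_sq = dx_nt @ H @ dx_nt                # squared Newton decrement
    print(lam_sq / 2)                         # estimate of f(x) - p*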

• The Newton decrement λ(x) is the same as the one used for the unconstrained problem, i.e.,

    λ(x) = ( ∆x_nt^T H(x) ∆x_nt )^{1/2} = ‖∆x_nt‖_{H(x)}

  i.e., the norm of the Newton step in the norm defined by H(x).

• λ²(x)/2, being a good estimate of f(x) − p* (i.e., f(x) − p* ≈ λ²(x)/2), can be used in the stopping criterion λ²(x)/2 ≤ ε.

• It appears in the line search:

    d/dα f(x + α ∆x_nt) |_{α=0} = ∇^T f(x) ∆x_nt = −λ²(x)

• In order for the Newton step ∆x_nt to be a feasible descent direction, the following two conditions need to be satisfied:

    ∇^T f(x) ∆x_nt = −λ²(x)
    A ∆x_nt = 0

• The solution for the equality constrained Newton's Method is invariant to affine transformations.

Newton's Method with Equality Constraint Elimination

    min  f(x)          ≡    min  f̃(z) = f(Fz + x̂)
    s.t. Ax = b

with R(F) = N(A) and Ax̂ = b.

• The gradient and Hessian of f̃(z) are

    ∇f̃(z) = F^T ∇f(Fz + x̂)
    H̃(z) = F^T H(Fz + x̂) F

• The KKT matrix is invertible iff H̃(z) is invertible.

• The Newton step of the reduced problem is

    ∆z_nt = −H̃^{-1}(z) ∇f̃(z) = −( F^T H(x) F )^{-1} F^T ∇f(x)

  where x = Fz + x̂.

• It can be shown that

    ∆x_nt = F ∆z_nt

  where ∆x_nt is the Newton step of the original problem with equality constraints.

• The Newton decrement λ(z) is the same as the Newton decrement of the original problem:

    λ²(z) = ∆z_nt^T H̃(z) ∆z_nt
          = ∆z_nt^T F^T H(x) F ∆z_nt
          = ∆x_nt^T H(x) ∆x_nt
          = λ²(x)

Penalty and Barrier Methods

See Luenberger, Chapter 13.

• These methods approximate constrained problems by unconstrained problems.

• The approximation is obtained by adding to the objective (cost) function either
  - a term with a high cost for violating the constraints (penalty methods), or
  - a term that favors points interior to the feasible region over those near the boundary (barrier methods).

• The penalty and barrier methods work directly in the original N-dimensional space R^N, rather than the (N−M)-dimensional space R^{N−M} as in the case of the primal methods.

Penalty Methods

    min  f(x)
    s.t. x ∈ F

• The idea is to replace this problem by the following penalty problem:

    min  f(x) + c P(x)

  where c ∈ R₊₊ is a constant and P(x) : R^N → R is the penalty function.

• Here,
  - P(x) is continuous,
  - P(x) ≥ 0 ∀x ∈ R^N,
  - P(x) = 0 iff x ∈ F.

• In general, the following quadratic penalty function is used (a Python sketch follows below):

    P(x) = (1/2) ∑_{i=1}^{L} ( max{0, g_i(x)} )² + (1/2) ∑_{j=1}^{M} ( h_j(x) )²

  where g_i(x) ≤ 0 are the inequality constraints and h_j(x) = 0 are the equality constraints.
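A direct transcription of this quadratic penalty into Python (a sketch: the box constraint and the objective are made up, and the inner minimization is a crude grid search, for illustration only):

    import numpy as np

    def quadratic_penalty(x, g_list, h_list):
        """P(x) = 0.5*sum(max(0, g_i(x))^2) + 0.5*sum(h_j(x)^2)."""
        ineq = sum(max(0.0, g(x)) ** 2 for g in g_list)
        eq = sum(h(x) ** 2 for h in h_list)
        return 0.5 * ineq + 0.5 * eq

    # Feasible set 0 <= x <= 1 written as g_1 = -x <= 0, g_2 = x - 1 <= 0
    g_list = [lambda x: -x, lambda x: x - 1.0]
    f = lambda x: (x - 2.0) ** 2        # unconstrained minimum at x = 2 (infeasible)

    xs = np.linspace(-1.0, 3.0, 40001)
    for c in [1.0, 10.0, 100.0, 1000.0]:
        qs = [f(x) + c * quadratic_penalty(x, g_list, []) for x in xs]
        print(c, xs[int(np.argmin(qs))])  # minimizer approaches x* = 1 as c grows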

• For example, consider P(x) = (1/2) ∑_{i=1}^{2} ( max{0, g_i(x)} )² with g_1(x) = x − b and g_2(x) = a − x.

• For large c, the minimum point of the penalty problem will be in a region where P(x) is small.

• As c increases, the solution approaches the feasible region F, and as c → ∞ the solution of the penalty problem converges to a solution of the constrained problem.

Penalty Method:

• Let {c^(k)}, k = 1, 2, …, be a sequence tending to ∞ such that c^(k) ≥ 0 and c^(k+1) > c^(k) for all k.

• Let

    q(c, x) = f(x) + c P(x)

  and for each k solve the penalty problem

    min  q(c^(k), x)

  obtaining a solution point x^(k).

• Lemma: For c^(k+1) > c^(k) (e.g., starting from an exterior point),

    i.   q(c^(k), x^(k)) ≤ q(c^(k+1), x^(k+1))
    ii.  P(x^(k)) ≥ P(x^(k+1))
    iii. f(x^(k)) ≤ f(x^(k+1))
    iv.  f(x*) ≥ q(c^(k), x^(k)) ≥ f(x^(k))

• Theorem: Let {x^(k)}, k = 1, 2, …, be a sequence generated by the penalty method. Then, any limit point of the sequence is a solution of the original constrained problem.

Barrier Methods

Also known as interior methods.

    min  f(x)
    s.t. x ∈ F

• The idea is to replace this problem by the following barrier problem:

    min  f(x) + (1/c) B(x)
    s.t. x ∈ interior of F

  where c ∈ R₊₊ is a constant and B(x) : R^N → R is the barrier function.

• Here, the barrier function B(x) is defined on the interior of F such that
  - B(x) is continuous,
  - B(x) ≥ 0,
  - B(x) → ∞ as x approaches the boundary of F.

• Ideally,

    B(x) = { 0,  x ∈ interior F
           { ∞,  x ∉ interior F

• There are several approximations. Two common barrier functions for the inequality constraints are given below (see the sketch after the formulas):

    Log barrier:     B(x) = −∑_{i=1}^{L} log(−g_i(x))

    Inverse barrier: B(x) = −∑_{i=1}^{L} 1/g_i(x)
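Both barrier functions in a few lines of Python (a sketch; they are defined only at strictly feasible points, where every g_i(x) < 0):

    import numpy as np

    def log_barrier(gs):
        """B(x) = -sum(log(-g_i)) for an array gs of values g_i(x) < 0."""
        gs = np.asarray(gs, dtype=float)
        return np.inf if np.any(gs >= 0) else -np.sum(np.log(-gs))

    def inverse_barrier(gs):
        """B(x) = -sum(1/g_i) for g_i(x) < 0."""
        gs = np.asarray(gs, dtype=float)
        return np.inf if np.any(gs >= 0) else -np.sum(1.0 / gs)

    # Feasible set 0 <= x <= 1 via g_1 = -x, g_2 = x - 1: both blow up at the boundary
    for x in [0.5, 0.9, 0.99, 0.999]:
        print(x, log_barrier([-x, x - 1.0]), inverse_barrier([-x, x - 1.0]))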

• For example, consider B(x) = −1/g_1(x) − 1/g_2(x) with g_1(x) = x − b and g_2(x) = a − x.

Barrier Method:

• Let {c^(k)}, k = 1, 2, …, be a sequence tending to ∞ such that c^(k) ≥ 0 and c^(k+1) > c^(k) for all k.

• Let

    r(c, x) = f(x) + (1/c) B(x)

  and for each k solve the barrier problem

    min  r(c^(k), x)
    s.t. x ∈ interior of F

  obtaining a solution point x^(k).

• Lemma: For c^(k+1) > c^(k) (starting from an interior point),

    i.   r(c^(k), x^(k)) ≥ r(c^(k+1), x^(k+1))
    ii.  B(x^(k)) ≤ B(x^(k+1))
    iii. f(x^(k)) ≥ f(x^(k+1))
    iv.  f(x*) ≤ f(x^(k)) ≤ r(c^(k), x^(k))

• Theorem: Any limit point of a sequence {x^(k)}, k = 1, 2, …, generated by the barrier method is a solution of the original constrained problem.

Properties of the Penalty & Barrier Methods

• The penalty method solves the unconstrained problem

    min  f(x) + c P(x)

• Let p_i(x) = max{0, g_i(x)} and p(x) = [p_i]_{L×1}.

  Let the function γ(·), where P(x) = γ(p(x)), be the Euclidean norm function

    γ(y) = y^T y  ⇒  P(x) = p^T(x) p(x) = ∑_{i=1}^{L} ( p_i(x) )²

  or, more generally, the quadratic norm function

    γ(y) = y^T Γ y  ⇒  P(x) = p^T(x) Γ p(x)

• The Hessian of the above problem becomes more and more ill-conditioned as c → ∞.

• Defining

    q(c, x) = f(x) + c γ(p(x)),

  the Hessian Q(c, x) is given by

    Q(c, x) = F(x) + c ∇^T γ(p(x)) G(x) + c J_p^T(x) Γ(p(x)) J_p(x)

  where F(x), G(x) and Γ(x) are the Hessians of f(x), g(x) and γ(x) respectively, and J_p(x) is the Jacobian of p(x).

• If at x* there are r active constraints, then the Hessian matrices Q(c^(k), x^(k)) have r eigenvalues tending to ∞ as c^(k) → ∞, and (N − r) eigenvalues tending to some finite value. In other words, the condition number goes to infinity (κ → ∞) as c^(k) → ∞.

• Gradient Descent may not be directly applicable; instead, Newton's Method is preferred!

• The same observation also applies to the barrier method.

Interior-Point Methods

See Boyd, Chapter 11.

    min  f(x) + (1/c) ( −∑_{i=1}^{L} log(−g_i(x)) )
    s.t. Ax = b

The minimum occurs at

    p* = inf { f(x) | Ax = b } = f(x*)

• The function

    φ(x) = −∑_{i=1}^{L} log(−g_i(x))

  with dom φ(x) = { x ∈ R^N | g_i(x) < 0, ∀i } is called the logarithmic barrier function.

• We will modify Newton's algorithm to solve the above problem. So, we will need

    ∇φ(x) = −∑_{i=1}^{L} ( 1/g_i(x) ) ∇g_i(x)

    H_φ(x) = ∑_{i=1}^{L} ( 1/g_i²(x) ) ∇g_i(x) ∇^T g_i(x) − ∑_{i=1}^{L} ( 1/g_i(x) ) H_{g_i}(x)

  where H_φ(x) = ∇²φ(x) and H_{g_i}(x) = ∇²g_i(x) are the Hessians of φ(x) and g_i(x) respectively.

Central Path

• Consider the equivalent problem (c > 0):

    min  c f(x) + φ(x)
    s.t. Ax = b

• Let x*(c) denote its solution. The trajectory of x*(c) as a function of c is called the central path. Points on the central path satisfy the centrality conditions

    A x*(c) = b
    g_i(x*(c)) < 0,  ∀i
    c ∇f(x*(c)) + ∇φ(x*(c)) + A^T ν̂ = 0

  for some ν̂ ∈ R^M.

• The last line can be rewritten as

    c ∇f(x*(c)) − ∑_{i=1}^{L} ( 1/g_i(x*(c)) ) ∇g_i(x*(c)) + A^T ν̂ = 0

• Example 24: Inequality-form linear programming. The logarithmic barrier function for an LP in inequality form,

    min  e^T x
    s.t. Ax ≤ b

  (where e ∈ R^N, A ∈ R^{L×N} and b ∈ R^L are constants), is given by

    φ(x) = −∑_{i=1}^{L} log( b_i − a_i^T x ),   dom φ(x) = { x | Ax < b }

  where a_1^T, …, a_L^T are the rows of A.


The gradient and Hessian of the barrier function are

∇φ(x) = ∑_{i=1}^{L} (1/(b_i − a_i^T x)) a_i = A^T d

Hφ(x) = ∑_{i=1}^{L} (1/(b_i − a_i^T x)²) a_i a_i^T = A^T (diag d)² A

where d_i = 1/(b_i − a_i^T x).

Since x is strictly feasible, we have d > 0, so the Hessian of φ(x), Hφ(x), is nonsingular if and only if A has rank N, i.e., full rank.
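A small numerical sketch (ours) of these LP-barrier formulas:

import numpy as np

def lp_barrier_grad_hess(A, b, x):
    # grad phi = A^T d and H phi = A^T diag(d)^2 A with d_i = 1/(b_i - a_i^T x)
    slack = b - A @ x
    assert np.all(slack > 0), "x must be strictly feasible (Ax < b)"
    d = 1.0 / slack
    return A.T @ d, A.T @ np.diag(d**2) @ A

A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])   # L = 3, N = 2
b = np.array([1.0, 1.0, 1.0])
print(lp_barrier_grad_hess(A, b, np.array([0.0, 0.0])))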


The centrality condition is

c e + A^T d = 0

We can give a simple geometric interpretation of the centrality condition. At a point x∗(c) on the central path the gradient ∇φ(x∗(c)), which is normal to the level set of φ(x) through x∗(c), must be parallel to e. In other words, the hyperplane e^T x = e^T x∗(c) is tangent to the level set of φ(x) through x∗(c). The figure below shows an example with L = 6 and N = 2.


The dashed curves in the previous figure show three contour lines of the logarithmic barrier function φ(x). The central path converges to the optimal point x∗ as c → ∞. Also shown is the point on the central path with c = 10. The optimality condition at this point can be verified geometrically: the line e^T x = e^T x∗(10) is tangent to the contour line of φ(x) through x∗(10).


Dual Points from Central Path

I From the centrality condition, let

λ∗_i(c) = −1/(c g_i(x∗(c))), ∀i

ν∗(c) = ν/c

then

∇f(x∗(c)) + ∑_{i=1}^{L} λ∗_i(c) ∇g_i(x∗(c)) + A^T ν∗(c) = 0

Hence, from the KKT conditions, x∗(c) minimizes

L(x, λ, ν) = f(x) + λ^T g(x) + ν^T (Ax − b)

for λ = λ∗(c) and ν = ν∗(c) for a particular c, which means that (λ∗(c), ν∗(c)) is a dual feasible pair.


So the dual function value p∗(c) = ℓ(λ∗(c), ν∗(c)) is finite and given by

ℓ(λ∗(c), ν∗(c)) = f(x∗(c)) + ∑_{i=1}^{L} λ∗_i(c) g_i(x∗(c)) + ν∗^T(c) (Ax∗(c) − b) = f(x∗(c)) − L/c

since each term λ∗_i(c) g_i(x∗(c)) equals −1/c and Ax∗(c) − b = 0.

I The duality gap is L/c, and

f(x∗(c)) − p∗ ≤ L/c

goes to zero as c → ∞.
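A sketch (ours) that recovers the dual variables and the duality-gap bound L/c from a point x∗(c), assumed already computed by the centering step:

import numpy as np

def dual_points_and_gap(c, x_star, constraints, nu=None):
    # lambda_i*(c) = -1/(c g_i(x*(c))) > 0 on the central path; gap = L/c
    lam = np.array([-1.0 / (c * g(x_star)) for g in constraints])
    nu_star = None if nu is None else nu / c
    return lam, nu_star, len(constraints) / c

g = [lambda x: x[0] - 2.0, lambda x: -x[0]]        # constraints 0 <= x <= 2
lam, _, gap = dual_points_and_gap(100.0, np.array([1.0]), g)  # illustrative point
print(lam, gap)                                    # [0.01 0.01] 0.02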


KKT Interpretation

I The central path (centrality) conditions can be seen as deformed KKT conditions, i.e.,

Ax∗ = b

g_i(x∗) ≤ 0, ∀i

λ∗_i ≥ 0, ∀i

−λ∗_i g_i(x∗) = 1/c, ∀i

∇f(x∗) + ∑_{i=1}^{L} λ∗_i ∇g_i(x∗) + A^T ν∗ = 0

which are satisfied by x∗(c), λ∗(c) and ν∗(c).

I Complementary slackness, λ_i g_i(x) = 0, is replaced by −λ∗_i g_i(x∗) = 1/c.

I As c→∞, x∗(c), λ∗(c) and ν∗(c) almost satisfy the KKT optimality conditions.


Newton Step for Modified KKT Equations

I Let λ_i = −1/(c g_i(x)), then

∇f(x) − ∑_{i=1}^{L} (1/(c g_i(x))) ∇g_i(x) + A^T ν = 0

Ax = b

I To solve this set of (N + M) linearly independent equations (in the N + M variables x and ν), consider the linearization of the nonlinear part of the first set of equations:

∇f(x + d) − ∑_{i=1}^{L} (1/(c g_i(x + d))) ∇g_i(x + d) ≅ g + H d

where

g = ∇f(x) − ∑_{i=1}^{L} (1/(c g_i(x))) ∇g_i(x)

H d = H(x) d − ∑_{i=1}^{L} (1/(c g_i(x))) Hg_i(x) d + ∑_{i=1}^{L} (1/(c g_i²(x))) ∇g_i(x) ∇^T g_i(x) d


I Substituting back, we obtain

H d + A^T ν = −g

A d = 0

where

H = H(x) − ∑_{i=1}^{L} (1/(c g_i(x))) Hg_i(x) + ∑_{i=1}^{L} (1/(c g_i²(x))) ∇g_i(x) ∇^T g_i(x)

g = ∇f(x) − ∑_{i=1}^{L} (1/(c g_i(x))) ∇g_i(x)

I Using the derivations of ∇φ(x) and Hφ(x),

H = H(x) + (1/c) Hφ(x)

g = ∇f(x) + (1/c) ∇φ(x)


I Let us represent the previous modified KKT equations in matrix form:

[ H  A^T ] [ d ]   [ −g ]
[ A   0  ] [ ν ] = [  0 ]

whose solution gives the modified Newton step ∆x_nt and ν∗_nt:

[ cH(x) + Hφ(x)  A^T ] [ ∆x_nt ]   [ −c∇f(x) − ∇φ(x) ]
[ A               0  ] [ ν∗_nt ] = [         0        ]

where ν∗_nt = c ν∗.
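A sketch (ours) of one modified Newton step, solving the block system above with numpy (∇φ and Hφ computed as in the earlier barrier sketch):

import numpy as np

def modified_newton_step(c, grad_f, H_f, grad_phi, H_phi, A):
    # Solve [cH(x)+Hphi, A^T; A, 0] [dx; nu] = [-c grad_f - grad_phi; 0]
    N, M = len(grad_f), A.shape[0]
    KKT = np.block([[c * H_f + H_phi, A.T],
                    [A, np.zeros((M, M))]])
    rhs = np.concatenate([-c * grad_f - grad_phi, np.zeros(M)])
    sol = np.linalg.solve(KKT, rhs)
    return sol[:N], sol[N:]          # Delta x_nt and nu*_nt = c nu*

# Tiny example: H_f = I, no barrier curvature, one equality constraint
dx, nu = modified_newton_step(10.0, np.array([1.0, -1.0]), np.eye(2),
                              np.zeros(2), np.zeros((2, 2)),
                              np.array([[1.0, 1.0]]))
print(dx, nu)                        # dx sums to zero, i.e., A dx = 0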

I Using this Newton step, the Interior-Point Method (i.e., Barrier Method) can be constructed.


The Interior-Point Method (Barrier Method):

Given a strictly feasible x ∈ F, c = c^(0) > 0, µ > 1 and tolerance ε > 0,

repeat

1. Centering step: compute x∗(c) by minimizing the modified barrier problem

x∗(c) = argmin (c f(x) + φ(x))

s.t. Ax = b

starting at x, using the modified Newton's Method.

2. Update: x = x∗(c)

3. Stopping criterion: quit if L/c < ε

4. Increase c: c = µ c
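A compact sketch (ours) of this outer loop; centering_step stands for the inner Newton solve of min c f(x) + φ(x) s.t. Ax = b and is assumed to be given:

def barrier_method(x0, L, centering_step, c0=1.0, mu=10.0, eps=1e-8):
    # x0 must be strictly feasible; L is the number of inequality constraints
    x, c = x0, c0
    while True:
        x = centering_step(c, x)     # steps 1 and 2: centering and update
        if L / c < eps:              # step 3: duality gap L/c below tolerance
            return x
        c *= mu                      # step 4: increase c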


I Accuracy of centering:

- Computing x∗(c) exactly is not necessary, since the central path has no significance beyond the fact that it leads to a solution of the original problem as c → ∞; inexact centering will still yield a sequence of points x^(k) that converges to an optimal point. Inexact centering, however, means that the points λ∗(c) and ν∗(c), computed from the first two equations given in the section titled "Dual Points from Central Path", are not exactly dual feasible. This can be corrected by adding a correction term to these formulae, which yields a dual feasible point provided the computed x is near the central path, i.e., near x∗(c).

- On the other hand, the cost of computing an extremely accurate minimizer of c f(x) + φ(x), as compared to the cost of computing a good minimizer of c f(x) + φ(x), is only marginally more, i.e., a few Newton steps at most. For this reason it is not unreasonable to assume exact centering.


I Choice of µ:

- µ provides a trade-off in the number of iterations for the inner and outer loops.

- Small µ: at each step the inner loop starts from a very good point, so few inner-loop iterations are required, but too many outer-loop iterations may be required.

- Large µ: at each step c increases by a large amount, so the current starting point may not be a good point for the inner loop. Too many inner-loop iterations may be required, but few outer-loop iterations are required.

- In practice, small values of µ (i.e., near one) result in many outer iterations, with just a few Newton steps for each outer iteration. For µ in a fairly large range, from around 3 to 100 or so, the two effects nearly cancel, so the total number of Newton steps remains approximately constant. This means that the choice of µ is not particularly critical; values from around 10 to 20 or so seem to work well.


I Choice of c^(0):

- Large c^(0): the first run of the inner loop may require too many iterations.

- Small c^(0): more outer-loop iterations are required.

- One reasonable choice is to choose c^(0) so that L/c^(0) is approximately of the same order as f(x^(0)) − p∗, or µ times this amount. For example, if a dual feasible point (λ, ν) is known, with duality gap η = f(x^(0)) − ℓ(λ, ν), then we can take c^(0) = L/η. Thus, in the first outer iteration we simply compute a pair with the same duality gap as the initial primal and dual feasible points.

- Another possibility is to find the c^(0) which minimizes

inf_ν ‖c ∇f(x^(0)) + ∇φ(x^(0)) + A^T ν‖₂
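A sketch (ours) of this second choice: since the residual is linear in both c and ν, minimizing over (c, ν) jointly is a linear least-squares problem:

import numpy as np

def initial_c(grad_f0, grad_phi0, A):
    # minimize || [grad_f0, A^T] [c; nu] + grad_phi0 ||_2 over c and nu
    B = np.column_stack([grad_f0, A.T])
    z, *_ = np.linalg.lstsq(B, -grad_phi0, rcond=None)
    return max(z[0], 1e-3)           # guard: c(0) must be positive

print(initial_c(np.array([1.0, 2.0]), np.array([-3.0, -4.0]),
                np.array([[1.0, 1.0]])))   # prints 1.0 for this toy data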


Example 25:

Solution:


Example 26:

Solution:


How to start from a feasible point?

I The interior-point method requires a strictly feasible starting point x^(0).

I If such a point is not known, a preliminary stage, Phase I, is run first.


Basic Phase I Method:

I Consider g_i(x) ≤ 0, i = 1, 2, . . . , L and Ax = b.

I We always assume that we are given a point x^(0) ∈ ∩_{i=1}^{L} dom g_i(x) satisfying Ax^(0) = b.

I Then, we form the following optimization problem

min s

s.t. g_i(x) ≤ s, i = 1, 2, . . . , L

Ax = b

in the variables x ∈ R^N and s ∈ R. Here s is a bound on the maximum infeasibility of the inequalities, and it is to be driven below zero.

I The problem is always feasible when we select s^(0) ≥ max_i g_i(x^(0)) together with the given x^(0) ∈ ∩_{i=1}^{L} dom g_i(x) with Ax^(0) = b.
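A sketch (ours) of this construction: given any x^(0) with Ax^(0) = b, the pair (x^(0), s^(0)) with s^(0) > max_i g_i(x^(0)) is strictly feasible for the Phase I problem, whose inequalities g_i(x) − s ≤ 0 can be handled by the barrier machinery above:

import numpy as np

def phase1_start(x0, constraints, margin=1.0):
    # s0 strictly above the maximum infeasibility max_i g_i(x0)
    s0 = max(g(x0) for g in constraints) + margin
    return x0, s0

def phase1_constraints(constraints):
    # Phase I inequalities in the variables z = (x, s):  g_i(x) - s <= 0
    return [lambda z, g=g: g(z[:-1]) - z[-1] for g in constraints]

g = [lambda x: x[0] - 1.0, lambda x: -x[0] - 1.0]     # -1 <= x <= 1
z0 = np.append(*phase1_start(np.array([3.0]), g))     # infeasible x0 = 3
print([gi(z0) for gi in phase1_constraints(g)])       # all < 0: strictly feasible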


I Then, apply the interior-point method to solve the above problem. There are three cases, depending on the optimal value p∗:

1. If p∗ < 0, then a strictly feasible solution is reached. Moreover, if (x, s) is feasible with s < 0, then x satisfies g_i(x) < 0. This means we do not need to solve the optimization problem with high accuracy; we can terminate when s < 0.

2. If p∗ > 0, then there is no feasible solution.

3. If p∗ = 0 and the minimum is attained at x∗ and s∗ = 0, then the set of inequalities is feasible, but not strictly feasible. However, if p∗ = 0 and the minimum is not attained, then the inequalities are infeasible.


Example 27:

Solution:
