ELE604/ELE704 Optimization
Constrained Optimization
http://www.ee.hacettepe.edu.tr/∼usezen/ele604/
Dr. Umut Sezen & Dr. Cenk Toker
Department of Electrical and Electronic Engineering
Hacettepe University
Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization 27-Dec-2016 1 / 118
Contents
Duality
Duality
Lagrange Dual Function
The Lagrange Dual Problem
Dual Problem
Weak and Strong Duality
Weak Duality
Strong Duality
Slater's Condition
Saddle-point Interpretation
Optimality Conditions
Certificate of Suboptimality and Stopping Criterion
Complementary Slackness
KKT Optimality Conditions
Solving The Primal Problem via The Dual
Perturbation and Sensitivity Analysis
Constrained Optimization Algorithms
Introduction
Primal Methods
Feasible Direction Methods
Active Set Methods
Gradient Projection Method
Equality Constrained Optimization
Quadratic Minimization
Eliminating Equality Constraints
Newton's Method with Equality Constraints
Newton's Method with Equality Constraint Elimination
Penalty and Barrier Methods
Penalty Methods
Barrier Methods
Properties of the Penalty & Barrier Methods
Interior-Point Methods
Logarithmic Barrier
Central Path
Dual Points from Central Path
KKT Interpretation
Newton Step for Modified KKT Equations
The Interior-Point Algorithm
How to start from a feasible point?
Duality

I Consider the standard minimization problem (it will be referred to as the primal problem)

min f(x)
s.t. g(x) ≤ 0
     h(x) = 0

where

g(x) = [g1(x) g2(x) · · · gi(x) · · · gL(x)]T
h(x) = [h1(x) h2(x) · · · hj(x) · · · hM(x)]T

represent the L inequality and M equality constraints, respectively.
I The domain of the optimization problem, D, is defined by

D = dom f(x) ∩ ⋂_{i=1}^{L} dom gi(x) ∩ ⋂_{j=1}^{M} dom hj(x)

I Any point x ∈ D satisfying the constraints, i.e., g(x) ≤ 0 and h(x) = 0, is called a feasible point.
Lagrange Dual Function

I Define the Lagrangian L : RN × RL × RM → R as

L(x,λ,ν) = f(x) + λTg(x) + νTh(x)
         = f(x) + ∑_{i=1}^{L} λi gi(x) + ∑_{j=1}^{M} νj hj(x)

where λi ≥ 0 and νj are called the Lagrange multipliers, and

λ = [λ1 λ2 · · · λi · · · λL]T
ν = [ν1 ν2 · · · νj · · · νM]T

are called the Lagrange multiplier vectors.
I On the feasible set F, where

F = {x | x ∈ D ∧ g(x) ≤ 0 ∧ h(x) = 0},

the Lagrangian has a value less than or equal to the cost function, i.e.,

L(x,λ,ν) ≤ f(x), ∀x ∈ F, ∀λi ≥ 0.
I Then the Lagrange dual function, ℓ(λ,ν), is defined as

ℓ(λ,ν) = inf_{x∈D} L(x,λ,ν)
       = inf_{x∈D} ( f(x) + λTg(x) + νTh(x) )
       = inf_{x∈D} ( f(x) + ∑_{i=1}^{L} λi gi(x) + ∑_{j=1}^{M} νj hj(x) )

I The dual function ℓ(λ,ν) is the pointwise infimum of a set of affine functions of λ and ν, hence it is concave even if f(x), gi(x) and hj(x) are not convex.
I Proposition: The dual function constitutes a lower bound on p∗ = f(x∗), i.e.,

ℓ(λ,ν) ≤ p∗, ∀λ ≥ 0

I Proof: Let x be a feasible point, that is x ∈ F (i.e., gi(x) ≤ 0 and hj(x) = 0), and let λ ≥ 0. Then

λTg(x) + νTh(x) ≤ 0.

Hence, the Lagrangian satisfies

L(x,λ,ν) = f(x) + λTg(x) + νTh(x) ≤ f(x)

So,

ℓ(λ,ν) = inf_{x∈D} L(x,λ,ν) ≤ L(x,λ,ν) ≤ f(x), ∀x ∈ F

and since this holds for every feasible x, ℓ(λ,ν) ≤ p∗. ∎

I The pair (λ,ν) is called dual feasible when λ ≥ 0.
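As a quick numerical sanity check of this lower bound (a minimal sketch; the one-dimensional problem min x² s.t. x ≥ 1 is chosen here only for illustration, and its dual function ℓ(λ) = λ − λ²/4 follows by minimizing x² + λ(1 − x) over x):

```python
import numpy as np

def dual(lmbda):
    # l(λ) = inf_x x^2 + λ(1 - x); the minimizer is x = λ/2
    return lmbda - lmbda**2 / 4.0

p_star = 1.0  # minimizer of x^2 subject to x >= 1 is x* = 1

lams = np.linspace(0.0, 10.0, 101)
# weak duality: l(λ) <= p* for every λ >= 0
assert np.all(p_star - dual(lams) >= -1e-12)
# for this problem the bound is tight at λ* = 2
assert abs(dual(2.0) - p_star) < 1e-12
```

The tightness at λ∗ = 2 anticipates strong duality, which holds here because the problem is convex and strictly feasible.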
Examples:

I Least Squares (LS) solution of linear equations

min xTx
s.t. Ax = b

- The Lagrangian is

L(x,ν) = xTx + νT(Ax − b)

then the Lagrange dual function is

ℓ(ν) = inf_x L(x,ν)

Since L(x,ν) is quadratic in x, it is convex, so the infimum is attained where the gradient vanishes:

∇x L(x,ν) = 2x + ATν = 0 ⇒ x∗ = −(1/2) ATν

Hence the Lagrange dual function is given by

ℓ(ν) = L(−(1/2)ATν, ν) = −(1/4) νTAATν − bTν

which is obviously concave. So p∗ ≥ ℓ(ν) = −(1/4) νTAATν − bTν.
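This dual bound can be checked numerically (a minimal sketch with made-up random problem data; p∗ is computed from the KKT system 2x + ATν = 0, Ax = b):

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 3, 6
A = rng.standard_normal((M, N))
b = rng.standard_normal(M)

def ell(nu):
    # Lagrange dual of min x^T x s.t. Ax = b
    return -0.25 * nu @ (A @ A.T) @ nu - b @ nu

# primal optimum from the KKT system: 2x + A^T nu = 0, Ax = b
K = np.block([[2 * np.eye(N), A.T], [A, np.zeros((M, M))]])
sol = np.linalg.solve(K, np.concatenate([np.zeros(N), b]))
x_star, nu_star = sol[:N], sol[N:]
p_star = x_star @ x_star

# weak duality at random dual points, strong duality at nu*
for _ in range(100):
    nu = rng.standard_normal(M)
    assert ell(nu) <= p_star + 1e-9
assert abs(ell(nu_star) - p_star) < 1e-9
```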
I Linear Programming (LP)

min cTx
s.t. Ax = b
     −x ≤ 0

- The Lagrangian is

L(x,λ,ν) = cTx − λTx + νT(Ax − b)

then the Lagrange dual function is

ℓ(λ,ν) = −bTν + inf_x { (c + ATν − λ)Tx }

For ℓ(λ,ν) to be bounded, we must have c + ATν − λ = 0, i.e.,

ℓ(λ,ν) = { −bTν,  if c + ATν − λ = 0
         { −∞,    otherwise

It is affine, i.e., concave, when c + ATν − λ = 0 with λ ≥ 0.

So, p∗ ≥ −bTν when c + ATν − λ = 0 with λ ≥ 0.
I Two-way partitioning (a non-convex problem)

min xTWx
s.t. xj² = 1, j = 1, . . . , N

- This is a discrete problem since xj ∈ {−1,+1}, and it is very difficult to solve for large N.

The Lagrange dual function is

ℓ(ν) = inf_x { xTWx + ∑_{j=1}^{N} νj (xj² − 1) }
     = inf_x { xT(W + diag ν)x } − 1Tν
     = { −1Tν,  if W + diag ν ⪰ 0
       { −∞,    otherwise

We may take ν = −λmin(W) 1, which yields

p∗ ≥ −1Tν = N λmin(W)

where λmin(W) is the minimum eigenvalue of W.
- If we relax the constraint to ‖x‖² = N, i.e., ∑_{j=1}^{N} xj² = N, then the problem

min xTWx
s.t. ‖x‖² = N

becomes easy to solve, with the exact solution

p∗ = N λmin(W)

where λmin(W) is the minimum eigenvalue of W.
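For small N the dual bound N λmin(W) can be compared against the exact partitioning optimum by brute force (a minimal sketch; the random symmetric W is assumed data for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 8
W = rng.standard_normal((N, N))
W = (W + W.T) / 2  # symmetric cost matrix

lam_min = np.linalg.eigvalsh(W)[0]
bound = N * lam_min  # p* >= N λmin(W), from ν = -λmin(W) 1

# enumerate all 2^N sign vectors to get the exact p*
best = np.inf
for k in range(2 ** N):
    x = np.array([1.0 if (k >> j) & 1 else -1.0 for j in range(N)])
    best = min(best, x @ W @ x)

assert best >= bound - 1e-9  # the dual bound holds
```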
The Lagrange Dual Problem

max ℓ(λ,ν)
s.t. λ ≥ 0

gives the best lower bound for p∗, i.e., p∗ ≥ ℓ(λ,ν).

I The pairs (λ,ν) with λ ≥ 0, ν ∈ RM and ℓ(λ,ν) > −∞ are dual feasible.

I The solution of the above problem over the dual feasible set is called the dual optimal point (λ∗,ν∗) (i.e., the optimal Lagrange multipliers), with the dual optimal value

d∗ = ℓ(λ∗,ν∗) ≤ p∗

I Some hidden (implicit) constraints can be made explicit in the dual problem, e.g., consider the LP problems on the next slides.
I Linear problem (LP)

min cTx
s.t. Ax = b
     x ≥ 0

has the dual function

ℓ(λ,ν) = { −bTν,  if c + ATν − λ = 0
         { −∞,    otherwise

- So, the dual problem can be given by

max −bTν
s.t. ATν − λ + c = 0
     λ ≥ 0

- The dual problem can be further simplified (eliminating λ = ATν + c) to

max −bTν
s.t. ATν + c ≥ 0
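A tiny numerical instance of this primal-dual pair (a sketch with made-up data: min x1 + 2x2 s.t. x1 + x2 = 1, x ≥ 0, whose optimum p∗ = 1 is attained at x∗ = (1, 0), while the simplified dual gives d∗ = 1 at ν∗ = −1):

```python
import numpy as np

# primal: min c^T x  s.t. Ax = b, x >= 0
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
c = np.array([1.0, 2.0])

# simplified dual feasibility: A^T ν + c >= 0  <=>  ν >= -1
def dual_feasible(nu):
    return np.all(A.T @ nu + c >= -1e-12)

nu_star = np.array([-1.0])
assert dual_feasible(nu_star)
d_star = -b @ nu_star                 # dual optimal value
p_star = c @ np.array([1.0, 0.0])     # primal optimal value
assert abs(d_star - p_star) < 1e-12   # strong duality (feasible LP)

# weak duality: every dual feasible ν lower-bounds every feasible x
rng = np.random.default_rng(2)
for _ in range(100):
    t = rng.uniform(0.0, 1.0)
    x = np.array([t, 1.0 - t])               # primal feasible point
    nu = np.array([rng.uniform(-1.0, 5.0)])  # dual feasible (ν >= -1)
    assert -b @ nu <= c @ x + 1e-12
```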
I Linear problem (LP) with inequality constraints

min cTx
s.t. Ax ≤ b

has the dual function

ℓ(λ) = { −bTλ,  if ATλ + c = 0
       { −∞,    otherwise

- So, the dual problem can be given by

max −bTλ
s.t. ATλ + c = 0
     λ ≥ 0
Weak Duality

I We know that

sup_{λ≥0, ν} ℓ(λ,ν) = d∗ ≤ p∗ = inf_{x∈D} f(x)

I This inequality is known as weak duality.

I Here (p∗ − d∗) is called the duality gap, which is always nonnegative.

I Weak duality always holds, even when the primal problem is non-convex.
Strong Duality

I Strong duality refers to the case where the duality gap is zero, i.e.,

d∗ = p∗

I In general it does not hold.

I It may hold if the primal problem is convex.

I The conditions under which strong duality holds are called constraint qualifications; one of them is Slater's condition.
Slater's Condition

I Strong duality holds for a convex problem

min f(x)
s.t. g(x) ≤ 0
     Ax = b

if it is strictly feasible, i.e., if there exists x ∈ interior D with

g(x) < 0
Ax = b

i.e., the inequality constraints hold with strict inequality.
Saddle-point Interpretation

I First consider the following problem with no equality constraints,

sup_{λ≥0} L(x,λ) = sup_{λ≥0} ( f(x) + λTg(x) )
                 = sup_{λ≥0} ( f(x) + ∑_{i=1}^{L} λi gi(x) )
                 = { f(x),  if gi(x) ≤ 0, ∀i
                   { ∞,     otherwise

I If gi(x) ≤ 0, then the optimum choice is λi = 0.
I Hence, in terms of the duality gap, we have

d∗ ≤ p∗

with weak duality as the inequality

sup_{λ≥0} inf_x L(x,λ) ≤ inf_x sup_{λ≥0} L(x,λ)

and strong duality as the equality

sup_{λ≥0} inf_x L(x,λ) = inf_x sup_{λ≥0} L(x,λ)

I With strong duality we can switch inf and sup for λ ≥ 0.
I In general, for any f(w,z) : RN × RL → R, with W ⊆ RN and Z ⊆ RL,

sup_{z∈Z} inf_{w∈W} f(w,z) ≤ inf_{w∈W} sup_{z∈Z} f(w,z)

which is called the max-min inequality.

I f(w,z) satisfies the strong max-min property or the saddle-point property if the above inequality holds with equality.

I A point (w̃, z̃) with w̃ ∈ W and z̃ ∈ Z is called a saddle-point for f(w,z) if

f(w̃, z) ≤ f(w̃, z̃) ≤ f(w, z̃), ∀w ∈ W, ∀z ∈ Z

I i.e., when

f(w̃, z̃) = inf_{w∈W} f(w, z̃) = sup_{z∈Z} f(w̃, z)

the strong max-min property holds with the value f(w̃, z̃).
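For finite index sets the max-min inequality can be observed directly by tabulating f(w, z) as a matrix, with rows indexed by w and columns by z (a minimal sketch; the random matrix stands in for an arbitrary f):

```python
import numpy as np

rng = np.random.default_rng(3)
# F[i, j] = f(w_i, z_j) on finite grids W (rows) and Z (columns)
F = rng.standard_normal((50, 40))

sup_inf = F.min(axis=0).max()   # sup_z inf_w f(w, z)
inf_sup = F.max(axis=1).min()   # inf_w sup_z f(w, z)
assert sup_inf <= inf_sup + 1e-12  # max-min inequality always holds
```

Equality generally fails for a random matrix; it holds exactly when the matrix has a saddle-point.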
I For Lagrange duality, if x∗ and λ∗ are optimal points for the primal and dual problems with strong duality (zero duality gap), then they form a saddle-point for the Lagrangian, and vice versa (i.e., the converse is also true).
Certificate of Suboptimality

I We know that a dual feasible point satisfies

ℓ(λ,ν) ≤ p∗

i.e., the point (λ,ν) is a proof or certificate of this bound. Then,

f(x) − p∗ ≤ f(x) − ℓ(λ,ν)

for a primal feasible point x and a dual feasible point (λ,ν), where the duality gap associated with these points is

f(x) − ℓ(λ,ν)

in other words

p∗ ∈ [ℓ(λ,ν), f(x)] and d∗ ∈ [ℓ(λ,ν), f(x)]

- If the duality gap is zero, i.e., f(x) = ℓ(λ,ν), then x∗ = x and (λ∗,ν∗) = (λ,ν) are the primal and dual optimal points.
Stopping Criterion

I If an algorithm produces the sequences x(k) and (λ(k),ν(k)), check whether

f(x(k)) − ℓ(λ(k),ν(k)) < ε

to guarantee ε-suboptimality. ε can approach zero, i.e., ε → 0, for strong duality.
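This criterion can be sketched with dual gradient ascent on the LS problem min xTx s.t. Ax = b from the earlier example (an illustrative sketch with made-up data; the intermediate x(ν) is projected back onto Ax = b so that the gap f(x) − ℓ(ν) is a valid certificate at every iteration):

```python
import numpy as np

rng = np.random.default_rng(4)
M, N = 3, 6
A = rng.standard_normal((M, N))
b = rng.standard_normal(M)
AAt = A @ A.T

f = lambda x: x @ x
ell = lambda nu: -0.25 * nu @ AAt @ nu - b @ nu

def make_feasible(x):
    # project x onto {x : Ax = b} to get a valid primal certificate
    return x + A.T @ np.linalg.solve(AAt, b - A @ x)

eps = 1e-6
nu = np.zeros(M)
t = 1.0 / np.linalg.eigvalsh(AAt).max()  # conservative ascent step
for k in range(50000):
    x = -0.5 * A.T @ nu                  # minimizer of L(x, ν)
    x_feas = make_feasible(x)
    gap = f(x_feas) - ell(nu)            # certified suboptimality
    if gap < eps:
        break                            # ε-suboptimality guaranteed
    nu = nu + t * (A @ x - b)            # dual gradient ascent

assert gap < eps
assert np.allclose(A @ x_feas, b)
```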
Complementary Slackness

I Assume that x∗ and (λ∗,ν∗) satisfy strong duality:

f(x∗) = ℓ(λ∗,ν∗)
      = inf_x ( f(x) + λ∗Tg(x) + ν∗Th(x) )
      = inf_x ( f(x) + ∑_i λ∗i gi(x) + ∑_j ν∗j hj(x) )
      ≤ f(x∗) + ∑_i λ∗i gi(x∗) + ∑_j ν∗j hj(x∗)    (where ∑_i λ∗i gi(x∗) ≤ 0 and ∑_j ν∗j hj(x∗) = 0)
      ≤ f(x∗)
I Observations (due to strong duality):

1. The inequality in the third line always holds with equality, i.e., x∗ minimizes L(x,λ∗,ν∗).

2. From the fourth line we have ∑_i λ∗i gi(x∗) = 0, and since every term satisfies λ∗i gi(x∗) ≤ 0,

λ∗i gi(x∗) = 0, ∀i

which is known as complementary slackness.

I In other words

λ∗i > 0 only if gi(x∗) = 0
λ∗i = 0 if gi(x∗) < 0

i.e., the i-th optimal Lagrange multiplier is

- positive only if the constraint gi(x) is active at x∗, and
- zero if gi(x) is inactive at x∗.
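A two-variable example makes the pattern concrete (a minimal sketch with made-up data): min (x1 − 1)² + (x2 − 1)² s.t. x1 ≤ 0.5 (active) and x2 ≤ 2 (inactive), where stationarity 2(x1 − 1) + λ1 = 0 gives λ1 = 1 and the inactive constraint gets λ2 = 0.

```python
import numpy as np

# min (x1-1)^2 + (x2-1)^2  s.t.  x1 <= 0.5 (active), x2 <= 2 (inactive)
x_star = np.array([0.5, 1.0])
g = np.array([x_star[0] - 0.5, x_star[1] - 2.0])   # g(x*) = (0, -1)
lam = np.array([1.0, 0.0])   # λ1 from stationarity, λ2 = 0 (inactive)

grad_f = 2 * (x_star - 1.0)
grad_g = np.eye(2)           # ∇g1 = e1, ∇g2 = e2

# stationarity, primal/dual feasibility, complementary slackness
assert np.allclose(grad_f + grad_g @ lam, 0)
assert np.all(g <= 0) and np.all(lam >= 0)
assert np.allclose(lam * g, 0)   # λ1 g1(x*) = λ2 g2(x*) = 0
```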
KKT Optimality Conditions

I x∗ minimizes L(x,λ∗,ν∗), thus ∇x L(x∗,λ∗,ν∗) = 0, i.e.,

∇f(x∗) + ∑_i λ∗i ∇gi(x∗) + ∑_j ν∗j ∇hj(x∗) = 0

I Then the Karush-Kuhn-Tucker (KKT) conditions for x∗ and (λ∗,ν∗) being primal and dual optimal points with zero duality gap (i.e., with strong duality) are

gi(x∗) ≤ 0   (primal constraint)
hj(x∗) = 0   (primal constraint)
λ∗i ≥ 0      (dual constraint)
λ∗i gi(x∗) = 0   (complementary slackness)
∇f(x∗) + ∑_i λ∗i ∇gi(x∗) + ∑_j ν∗j ∇hj(x∗) = 0   (stationarity)
I For any optimization problem with differentiable objective (cost) and constraint functions for which strong duality holds, any pair of primal and dual optimal points satisfies the KKT conditions.

I For convex problems:

If f(x), gi(x) and hj(x) are convex and x̃, λ̃ and ν̃ satisfy the KKT conditions, then they are optimal points, i.e.,

- from complementary slackness: f(x̃) = L(x̃, λ̃, ν̃)
- from the last condition: ℓ(λ̃, ν̃) = L(x̃, λ̃, ν̃) ( = inf_x L(x, λ̃, ν̃) ); note that L(x, λ̃, ν̃) is convex in x.

Thus, f(x̃) = ℓ(λ̃, ν̃).
Example 18:

min (1/2) xTQx + cTx + r   (Q : SPD)
s.t. Ax = b

Solution: From the KKT conditions

Ax∗ = b
Qx∗ + c + ATν∗ = 0

which can be written as the linear system

[ Q  AT ] [ x∗ ]   [ −c ]
[ A   0 ] [ ν∗ ] = [  b ]
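A sketch of this solve in NumPy (the random SPD Q, A, b, c are stand-in data), which also checks that x∗ is no worse than nearby feasible points x∗ + d with Ad = 0:

```python
import numpy as np

rng = np.random.default_rng(5)
N, M = 5, 2
G = rng.standard_normal((N, N))
Q = G @ G.T + N * np.eye(N)     # symmetric positive definite
c = rng.standard_normal(N)
A = rng.standard_normal((M, N))
b = rng.standard_normal(M)

# KKT system:  [Q  A^T; A  0] [x*; ν*] = [-c; b]
K = np.block([[Q, A.T], [A, np.zeros((M, M))]])
sol = np.linalg.solve(K, np.concatenate([-c, b]))
x_star, nu_star = sol[:N], sol[N:]

assert np.allclose(A @ x_star, b)                       # feasibility
assert np.allclose(Q @ x_star + c + A.T @ nu_star, 0)   # stationarity

# x* beats other feasible points x* + d with Ad = 0
f = lambda x: 0.5 * x @ Q @ x + c @ x
_, _, Vt = np.linalg.svd(A)
null = Vt[M:].T                  # columns span null(A)
for _ in range(20):
    d = null @ rng.standard_normal(N - M)
    assert f(x_star) <= f(x_star + d) + 1e-9
```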
Example 19:
Solution:
Solving The Primal Problem via The Dual
I If strong duality holds and a dual optimal solution (λ∗,ν∗) exists, then we can compute a primal optimal solution from the dual solution.

I When strong duality holds and (λ∗,ν∗) is given, if the minimizer of L(x,λ∗,ν∗), i.e., the solution of

min f(x) + ∑_i λ∗i gi(x) + ∑_j ν∗j hj(x)

is unique and primal feasible, then it is also primal optimal.

I If the dual problem is easier to solve (e.g., it has fewer dimensions or has an analytical solution), then solving the dual problem first to find the optimal dual parameters (λ∗,ν∗), and then solving

x∗ = argmin_x L(x,λ∗,ν∗)

is an acceptable method for solving constrained minimization problems.
I Example 20: Consider the following problem

min f(x) = ∑_{i=1}^{N} fi(xi)
s.t. aTx = b

where each fi(x) is strictly convex and differentiable, a ∈ RN and b ∈ R. Assume that the problem has a unique finite solution and is dual feasible.

- f(x) is separable because each fi(xi) is a function of xi only. Now the Lagrangian is given by

L(x, ν) = ∑_{i=1}^{N} fi(xi) + ν(aTx − b)
        = −bν + ∑_{i=1}^{N} ( fi(xi) + ν ai xi )
Then the dual function is given by

ℓ(ν) = inf_x L(x, ν)
     = −bν + inf_x ∑_{i=1}^{N} ( fi(xi) + ν ai xi )
     = −bν + ∑_{i=1}^{N} inf_{xi} ( fi(xi) + ν ai xi )
     = −bν − ∑_{i=1}^{N} f∗i(−ν ai)

where f∗i(y) is the conjugate function of fi(x), since inf_{xi} ( fi(xi) + ν ai xi ) = −f∗i(−ν ai).

NOTE: The conjugate function

f∗(y) = sup_{x ∈ dom f(x)} ( yTx − f(x) )

is the maximum gap between the linear function yTx and f(x) (see Boyd Section 3.3). If f(x) is differentiable, this maximum occurs at a point x where ∇f(x) = y. Note that f∗(y) is a convex function.
Then the dual problem is a function of a scalar ν ∈ R:

max_ν ( −bν − ∑_{i=1}^{N} f∗i(−ν ai) )

- Once we find ν∗, we know that L(x, ν∗) is strictly convex in x as each fi(x) is strictly convex. So, we can find x∗ by solving ∇x L(x, ν∗) = 0, i.e., componentwise,

∂fi(xi)/∂xi = −ν∗ ai.
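A concrete instance of this recipe (a sketch, assuming fi(x) = x²/2 so that f∗i(y) = y²/2): the scalar dual −bν − (ν²/2)‖a‖² is maximized in closed form at ν∗ = −b/‖a‖², and x∗ is recovered componentwise from fi′(xi) = −ν∗ ai.

```python
import numpy as np

rng = np.random.default_rng(6)
N = 5
a = rng.standard_normal(N)
b = 2.0

# dual: max over scalar ν of  -bν - Σ f_i*(-ν a_i) = -bν - ν² ||a||² / 2
nu_star = -b / (a @ a)          # stationary point of the concave dual

# recover the primal point from f_i'(x_i) = -ν* a_i  =>  x_i = -ν* a_i
x_star = -nu_star * a

assert np.isclose(a @ x_star, b)           # primal feasible
p_star = 0.5 * x_star @ x_star
d_star = -b * nu_star - 0.5 * nu_star**2 * (a @ a)
assert np.isclose(p_star, d_star)          # zero duality gap

# x* beats a few random feasible points (projected onto a^T x = b)
for _ in range(50):
    z = rng.standard_normal(N)
    z = z + (b - a @ z) / (a @ a) * a
    assert p_star <= 0.5 * z @ z + 1e-12
```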
Perturbation and Sensitivity Analysis

I Original problem

primal:              dual:
min f(x)             max ℓ(λ,ν)
s.t. g(x) ≤ 0        s.t. λ ≥ 0
     h(x) = 0

I Perturbed problem

primal:              dual:
min f(x)             max ℓ(λ,ν) − λTu − νTv
s.t. g(x) ≤ u        s.t. λ ≥ 0
     h(x) = v
I Here u = [ui]L×1 and v = [vj]M×1 are called the perturbations. When u = 0 and v = 0, the problem reduces to the original problem. If ui > 0, we have relaxed the i-th inequality constraint; if ui < 0, we have tightened it.

I Let us use the notation p∗(u,v) to denote the optimal value of the perturbed problem. Thus, the optimal value of the original problem is p∗ = p∗(0,0).

I Assume that strong duality holds and an optimal dual solution exists, i.e., p∗ = ℓ(λ∗,ν∗). Then, we can show that

p∗(u,v) ≥ p∗(0,0) − λ∗Tu − ν∗Tv
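The inequality can be traced on a one-dimensional example (a sketch; for min x² s.t. 1 − x ≤ u one has p∗(u) = max(0, 1 − u)² and λ∗ = 2 at u = 0):

```python
import numpy as np

# min x^2  s.t.  1 - x <= u;  p*(u) = max(0, 1-u)^2, λ* = 2
def p_star(u):
    return max(0.0, 1.0 - u) ** 2

lam_star = 2.0
# perturbation inequality: p*(u) >= p*(0) - λ* u for all u
for u in np.linspace(-2.0, 4.0, 121):
    assert p_star(u) >= p_star(0.0) - lam_star * u - 1e-12
```

For u ≤ 1 the gap between the two sides is exactly u², so the lower bound is tight only near u = 0, matching the local-sensitivity interpretation of λ∗.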
I If λ∗i is large and ui < 0, then p∗(u,v) is guaranteed to increase greatly.

I If λ∗i is small and ui > 0, then p∗(u,v) will not decrease much.

I If |ν∗j| is large:
- If ν∗j > 0 and vj < 0, then p∗(u,v) is guaranteed to increase greatly.
- If ν∗j < 0 and vj > 0, then p∗(u,v) is guaranteed to increase greatly.

I If |ν∗j| is small:
- If ν∗j > 0 and vj > 0, then p∗(u,v) will not decrease much.
- If ν∗j < 0 and vj < 0, then p∗(u,v) will not decrease much.
I The perturbation inequality, p∗(u,v) ≥ p∗(0,0) − λ∗Tu − ν∗Tv, gives a lower bound on the perturbed optimal value, but no upper bound. For this reason the results are not symmetric with respect to loosening or tightening a constraint. For example, if λ∗i is large and we loosen the i-th constraint a little (i.e., take ui small and positive, 0 < ui < ε), then the perturbation inequality is not useful: it does not imply that the optimal value will decrease considerably.
Introduction

I The general constrained optimization problem is

min f(x)
s.t. g(x) ≤ 0
     h(x) = 0

I Defn (Active Constraint): Given a feasible point x(k), if gi(x(k)) ≤ 0 is satisfied with equality, i.e., gi(x(k)) = 0, then constraint i is said to be active at x(k). Otherwise it is inactive at x(k).
Primal Methods

See Luenberger Chapter 12.

I A primal method is a search method that works on the original problem directly, searching through the feasible region for the optimal solution. Each point in the process is feasible and the value of the objective function decreases monotonically.

For a problem with N variables and M equality constraints, primal methods work in the feasible space of dimension N − M.

Advantages:
- The iterates x(k) are all feasible points.
- If x(k) is a convergent sequence, it converges at least to a local minimum.
- They do not rely on special problem structure, e.g. convexity; in other words, primal methods are applicable to general non-linear problems.

Disadvantages:
- They must start from a feasible initial point.
- They may fail to converge for inequality constraints if precautions are not taken.
Feasible Direction Methods

I The update equation is

x(k+1) = x(k) + α(k)d(k)

where d(k) must be a descent direction and x(k) + α(k)d(k) must be contained in the feasible region, i.e., d(k) must be a feasible direction for some α(k) > 0.

This is very similar to unconstrained descent methods, but now the line search is constrained so that x(k+1) is also a feasible point.
I Example 21: Consider the following problem

min f(x)
s.t. aTi x ≤ bi, i = 1, . . . , M

- Let A(k) be the set of indices of the constraints active at x(k), i.e., aTi x(k) = bi for i ∈ A(k). Then the direction vector d(k) is calculated by solving

min_d ∇Tf(x(k))d
s.t. aTi d ≤ 0, i ∈ A(k)
     ∑_{i=1}^{N} |di| = 1

The last constraint ensures a bounded solution. The other constraints ensure that vectors of the form x(k) + α(k)d(k) are feasible for sufficiently small α(k) > 0, and subject to these conditions, d(k) is chosen to line up as closely as possible with the negative gradient of f(x(k)). In some sense this yields the locally best direction in which to proceed. The overall procedure progresses by generating feasible directions in this manner, and moving along them to decrease the objective.
There are two major shortcomings of feasible direction methods that require them to be modified in most cases.

I The first shortcoming is that for general problems there may not exist any feasible directions. If, for example, a problem has nonlinear equality constraints, we might find ourselves in the situation where no straight line from x(k) has a feasible segment. For such problems it is necessary either to relax the requirement of feasibility, allowing points to deviate slightly from the constraint surface, or to introduce the concept of moving along curves rather than straight lines.

I A second shortcoming is that, in their simplest form, most feasible direction methods are not globally convergent. They are subject to jamming (sometimes referred to as zigzagging), where the sequence of points generated by the process converges to a point that is not even a constrained local minimum.
Active Set Methods

I The idea underlying active set methods is to partition the inequality constraints into two groups: those that are to be treated as active and those that are to be treated as inactive. The constraints treated as inactive are essentially ignored.

I Consider the following problem

min f(x)
s.t. g(x) ≤ 0

For simplicity there are no equality constraints; their inclusion is straightforward.

The necessary conditions for an optimum x∗ are

∇f(x∗) + ∑_i λi ∇gi(x∗) = 0
g(x∗) ≤ 0
λTg(x∗) = 0
λ ≥ 0
I Let A be the set of indices of the active constraints (i.e., gi(x∗) = 0 for i ∈ A). Then

∇f(x∗) + ∑_{i∈A} λi ∇gi(x∗) = 0
gi(x∗) = 0, i ∈ A
gi(x∗) < 0, i ∉ A
λi ≥ 0, i ∈ A
λi = 0, i ∉ A

The inactive constraints are inhibited (i.e., λi = 0).

I Keeping only the active constraints in A, the problem is converted to an equality-constrained-only problem.
I Active set method:

- At each step, find the working set W and treat it as the active set (it can be a subset of the actual active set, i.e., W ⊆ A).
- Move to a lower point on the surface of the working set.
- Find a new working set W for this point.
- Repeat.

I The surface defined by the working set W will be called the working surface.
I Given any working set W ⊆ A, assume that xW is a solution to the problem PW

min f(x)
s.t. gi(x) = 0, i ∈ W

also satisfying gi(xW) < 0 for i ∉ W. If such an xW cannot be found, change the working set W until a solution xW is obtained.

I Once xW is found, solve

∇f(xW) + ∑_{i∈W} λi ∇gi(xW) = 0

to find the λi.

I If λi ≥ 0, ∀i ∈ W, then xW is a local optimal solution of the original problem.
I If ∃i ∈ W such that λi < 0, then dropping constraint i (while staying feasible) will decrease the objective value, due to the sensitivity theorem (relaxing constraint i to gi(x) = −c changes the optimal value by approximately λi c, a decrease when λi < 0).
I By dropping i from W and moving on the new working surface, toward the interior of the feasible region F, we move to an improved solution.

I The movement is monitored to avoid infeasibility: when one or more constraints become active, they are added to the working set W. Then the changed problem PW is solved again to find a new xW, and the previous steps are repeated.
I If we can ensure that the objective (cost) function value is monotonically decreasing, then no working set will appear twice in the process. Hence the active set method terminates in a finite number of iterations.
Active Set Algorithm: For a given working set W

repeat
    repeat
        - minimize f(x) over the working set W using x(k+1) = x(k) + α(k)d(k)
        - check whether a new constraint becomes active;
          if so, add the new constraint to the working set W
    until some stopping criterion is satisfied
    check the Lagrange multipliers λi
    - drop constraints with λi < 0 from the working set W
until some stopping criterion is satisfied

I Note that f(x) strictly decreases at each step.
I Disadvantage: The inner loop must terminate at a global optimum over the working surface in order to determine the correct λi and to guarantee that the same working set is not encountered in later iterations.

I Discuss: How can we integrate equality constraints into the original problem?
Gradient Projection Method

I The Gradient Projection Method is an extension of the Gradient (or Steepest) Descent Method (GD or SD) to the constrained case.

I Let us first consider linear constraints:

min f(x)
s.t. aTi x ≤ bi, i ∈ I1
     aTi x = bi, i ∈ I2

Let us take the active constraints, i.e., those with aTi x = bi, as the working set W and seek a feasible descent direction:

find d with ∇Tf(x)d < 0
satisfying aTi d = 0, i ∈ W

i.e., d must lie in the tangent plane defined by aTi d = 0, i ∈ W.

Hence, the above problem is the projection of −∇f(x) onto this tangent plane.
I Another perspective is to let the equality Ax = b represent all the active constraints aTi x = bi, i ∈ W. Thus

min f(x)
s.t. Ax = b

I If we use the first-order Taylor approximation around the point x(k), i.e., f(x) = f(x(k) + d) ≈ f(x(k)) + ∇Tf(x(k))d for small enough d, then we will have

min f(x(k)) + ∇Tf(x(k))d
s.t. A(x(k) + d) = b
     dT I d ≤ 1

I As Ax(k) = b and Ax(k+1) = b, this implies that Ad = 0.
I Thus, the problem simplifies to

min ∇Tf(x(k))d
s.t. Ad = 0
     dT I d ≤ 1

- Ad = 0 defines the tangent plane M ⊆ RN and ensures that x(k+1) is still feasible.
- dT I d ≤ 1 is the Euclidean unit ball (‖d‖² ≤ 1); thus this is the projected GD algorithm.
- We may also use the constraint dTQd ≤ 1, where Q is an SPD matrix, to obtain the projected SD algorithm.
Projected Steepest Descent Algorithm (PSDA):

1. Start from a feasible initial point x(0) (i.e., x(0) ∈ F).
2. Solve the Direction Finding Problem (DFP)

   d(k) = argmin ∇Tf(x(k))d
          s.t. Ad = 0
               dTQd ≤ 1, (Q : SPD)

3. If ∇Tf(x(k))d(k) = 0, stop; x(k) is a KKT point.
4. Solve α(k) = argmin_α f(x(k) + αd(k)) using a line search algorithm, e.g. exact or backtracking line search.
5. x(k+1) = x(k) + α(k)d(k)
6. Go to Step 2 with x(k) = x(k+1).
Projection:

I The DFP is itself a constrained optimization problem and must satisfy its own KKT conditions, i.e.,

Ad(k) = 0
d(k)TQd(k) = 1
∇f(x(k)) + 2βk Qd(k) + ATλk = 0
βk ≥ 0
βk (1 − d(k)TQd(k)) = 0

- Here, Ad(k) = 0 defines the tangent plane M(k) ⊆ RN.

If we set d̄(k) = 2βk d(k), then from the third condition we find that

d̄(k) = −Q−1∇f(x(k)) − Q−1ATλk

Now, substituting this value into the first condition, we obtain

λk = −(AQ−1AT)−1 AQ−1∇f(x(k))
Thus, d̄(k) is obtained as

d̄(k) = −[ Q−1 − Q−1AT(AQ−1AT)−1AQ−1 ] ∇f(x(k))
      = −P(k)∇f(x(k))

where

P(k) = Q−1 − Q−1AT(AQ−1AT)−1AQ−1

is called the projection matrix.

I Note that if Q = I, then

P(k) = I − AT(AAT)−1A

I As 2βk is just a positive scaling factor, we can safely take the descent direction d(k) to be

d(k) = −P(k)∇f(x(k))
PSDA with DFP Algorithm: Given a feasible point x(k) (i.e., x(k) ∈ F)

1. Find the active constraint set W(k) and form the A (actually A(k)) matrix.
2. Calculate

   P(k) = Q−1 − Q−1AT(AQ−1AT)−1AQ−1
   d(k) = −P(k)∇f(x(k))

3. If d(k) ≠ 0:

   a) Find α:

      α1 = max { α : x(k) + αd(k) is feasible }
      α2 = argmin { f(x(k) + αd(k)) : 0 ≤ α ≤ α1 }

   b) x(k+1) = x(k) + α2 d(k)
   c) Go to Step 1 (with x(k) = x(k+1))
4. If d(k) = 0:

   a) Find λ:

      λ = −(AQ−1AT)−1 AQ−1∇f(x(k))

   b) If λi ≥ 0, ∀i, stop; x(k) satisfies the KKT conditions.
   c) If ∃λi < 0, delete the row of A corresponding to the inequality with the most negative component λi, drop its index from W, and go to Step 2.
I Example 22: Consider the following problem

min x1² + x2² + x3² + x4² − 2x1 − 3x4
s.t. 2x1 + x2 + x3 + 4x4 = 7
     x1 + x2 + 2x3 + x4 = 6
     xi ≥ 0, i = 1, 2, 3, 4

Given the initial point x(0) = [2 2 1 0]T, find the initial direction d(0) for the projected gradient descent algorithm (PGDA).

- The active constraints are the two equalities and the inequality x4 ≥ 0, thus

A = [ 2 1 1 4
      1 1 2 1
      0 0 0 1 ]

Also, ∇f(x(0)) is given by

∇f(x(0)) = [2 4 2 −3]T
So,

AAT = [ 22 9 4
         9 7 1
         4 1 1 ]

(AAT)−1 = (1/11) [  6  −5 −19
                   −5   6  14
                  −19  14  73 ]

Hence, the projection matrix P(0) is given by

P(0) = I − AT(AAT)−1A = (1/11) [  1 −3  1 0
                                 −3  9 −3 0
                                  1 −3  1 0
                                  0  0  0 0 ]

Finally, the direction d(0) is given by

d(0) = −P(0)∇f(x(0)) = (1/11) [8 −24 8 0]T.
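These hand calculations can be replayed numerically (a minimal NumPy sketch of the Q = I projection; note the leading minus sign in d(0) = −P(0)∇f(x(0))):

```python
import numpy as np

# data of Example 22, at x(0) = (2, 2, 1, 0)
A = np.array([[2.0, 1.0, 1.0, 4.0],
              [1.0, 1.0, 2.0, 1.0],
              [0.0, 0.0, 0.0, 1.0]])   # active set: 2 equalities + x4 >= 0
grad = np.array([2.0, 4.0, 2.0, -3.0]) # ∇f at x(0)

# projection onto the null space of A (Q = I case)
P = np.eye(4) - A.T @ np.linalg.solve(A @ A.T, A)
d0 = -P @ grad

assert np.allclose(A @ d0, 0)    # d0 stays in the working surface
assert grad @ d0 < 0             # d0 is a descent direction
assert np.allclose(11 * d0, [8.0, -24.0, 8.0, 0.0])
```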
Nonlinear constraints:
I Consider the problem which defines only the active constraints

min f(x)

s.t. h(x) = 0

I With linear constraints, x(k) lies on the tangent plane M. With nonlinear constraints, however, the surface defined by the constraints and the tangent plane M touch at only a single point.
I In this case the updated point must be projected onto the constraint surface. For the projected gradient descent method, the projection matrix is given by

P(k) = I − JTh(x(k)) (Jh(x(k)) JTh(x(k)))−1 Jh(x(k))

where Jh(x) is the Jacobian of h(x), i.e., Jh(x) = (∇hT(x))T.
Constrained Optimization Algorithms Equality Constrained Optimization
Equality Constrained Optimization
See Boyd Chapter 10.
min f(x)
s.t. Ax = b
where f(x) : RN → R is convex and twice differentiable, A ∈ RM×N , M < N .
Minimum occurs at
p∗ = inf {f(x) |Ax = b} = f(x∗)
where x∗ satisfies the KKT conditions

Ax∗ = b

∇f(x∗) + ATν∗ = 0
Quadratic Minimization
min (1/2) xTQx + cTx + r

s.t. Ax = b

where Q ∈ RN×N is a symmetric positive semidefinite (SPSD) matrix, A ∈ RM×N and b ∈ RM .
I Using the KKT conditions

Ax∗ = b
Qx∗ + c + ATν∗ = 0

which can be written in matrix form as

[Q AT] [x∗]   [−c]
[A  0] [ν∗] = [ b]

where the left-hand coefficient matrix is called the KKT matrix.

I The above equations define a KKT system for an equality constrained quadratic optimization problem with (N + M) linear equations in the (N + M) variables (x∗, ν∗).
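As a concrete illustration, the KKT system can be assembled and solved directly; a minimal NumPy sketch, where Q, c, A, b are made-up toy data (not from the notes):

```python
import numpy as np

# Toy equality-constrained QP: min (1/2) x'Qx + c'x  s.t.  Ax = b
Q = np.array([[2., 0.], [0., 4.]])   # symmetric positive definite here
c = np.array([-1., -2.])
A = np.array([[1., 1.]])             # a single equality constraint
b = np.array([1.])
N, M = 2, 1

# Assemble the (N+M) x (N+M) KKT matrix and the right-hand side [-c; b]
KKT = np.block([[Q, A.T], [A, np.zeros((M, M))]])
sol = np.linalg.solve(KKT, np.concatenate([-c, b]))
x_opt, nu_opt = sol[:N], sol[N:]

assert np.allclose(A @ x_opt, b)                     # primal feasibility
assert np.allclose(Q @ x_opt + c + A.T @ nu_opt, 0)  # stationarity
```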
Eliminating Equality Constraints
I One general approach to solving the equality constrained problem is to eliminate the equality constraints and then solve the resulting unconstrained problem using methods for unconstrained minimization.

I We first find a matrix F ∈ RN×(N−M) and a vector x̂ that parametrize the (affine) feasible set:

{x | Ax = b} = {Fz + x̂ | z ∈ RN−M}.

Here x̂ can be chosen as any particular solution of Ax = b, and F ∈ RN×(N−M) is any matrix whose range (column space) is the nullspace of A, i.e., R(F) = N (A).

I We then form the reduced or eliminated optimization problem

min f̃(z) = f(Fz + x̂)

which is an unconstrained problem with variable z ∈ RN−M . From its solution z∗, we can find the solution of the equality constrained problem as

x∗ = Fz∗ + x̂.
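A minimal sketch of the elimination approach on an assumed toy QP (values are illustrative); F is built from the SVD so that its columns span the nullspace of A:

```python
import numpy as np

# Toy problem: min x1^2 + 2*x2^2 - x1  s.t.  x1 + x2 = 1  (optimum (5/6, 1/6))
Q = np.array([[2., 0.], [0., 4.]])
c = np.array([-1., 0.])
A = np.array([[1., 1.]])
b = np.array([1.])

x_hat = np.linalg.lstsq(A, b, rcond=None)[0]  # a particular solution of Ax = b
_, _, Vt = np.linalg.svd(A)
F = Vt[A.shape[0]:].T                         # columns span N(A)

# The reduced problem is an unconstrained quadratic in z, solved in closed form:
#   min (1/2) z'(F'QF)z + (F'(Q x_hat + c))'z
z_opt = np.linalg.solve(F.T @ Q @ F, -F.T @ (Q @ x_hat + c))
x_opt = F @ z_opt + x_hat                     # recover x* = F z* + x_hat

assert np.allclose(A @ x_opt, b)              # feasible by construction
assert np.allclose(x_opt, [5/6, 1/6])
```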
I There are, of course, many possible choices for the elimination matrix F, which can be chosen as any matrix in RN×(N−M) whose range (column space) equals the nullspace of A, i.e., R(F) = N (A). If F is one such matrix and T ∈ R(N−M)×(N−M) is nonsingular, then F̃ = FT is also a suitable elimination matrix, since

R(F̃) = R(F) = N (A).

I Conversely, if F and F̃ are any two suitable elimination matrices, then there is some nonsingular T such that F̃ = FT. If we eliminate the equality constraints using F, we solve the unconstrained problem

min f(Fz + x̂)

while if F̃ is used, we solve the unconstrained problem

min f(F̃z̃ + x̂) = f(F(Tz̃) + x̂)

This problem is equivalent to the one above, and is simply obtained by the change of coordinates z = Tz̃. In other words, changing the elimination matrix can be thought of as changing variables in the reduced problem.
Example 23:
Newton's Method Equality Constraints
I It is the same as unconstrained Newton's Method except
- initial point is feasible, x(0) ∈ F, i.e., Ax(0) = b.
- Newton step ∆xnt is a feasible direction, i.e., A∆xnt = 0.
I In order to use Newton's Method on the problem
min f(x)
s.t. Ax = b
we can use the second-order Taylor approximation around x (actually x(k)) to obtain a quadratic minimization problem

min f̂(x + ∆x) = f(x) + ∇Tf(x)∆x + (1/2)∆xTH(x)∆x

s.t. A(x + ∆x) = b

This problem is convex if H(x) ⪰ 0.
I Using the results of quadratic minimization with equality constraints

[H(x) AT] [∆xnt]   [−∇f(x)]
[A     0] [ ν∗ ] = [   0   ]

where the left-hand coefficient matrix is the KKT matrix. A solution exists when the KKT matrix is non-singular.

I The same solution can also be obtained by setting x∗ = x + ∆xnt in the optimality condition equations of the original problem

Ax∗ = b

∇f(x∗) + ATν∗ = 0

as ∆xnt and ν∗ should satisfy the optimality conditions.
I The Newton decrement λ(x) is the same as the one used for the unconstrained problem, i.e.,

λ(x) = (∆xTnt H(x) ∆xnt)1/2 = ‖∆xnt‖H(x)

being the norm of the Newton step in the norm defined by H(x).

I λ2(x)/2, being a good estimate of f(x) − p∗ (i.e., f(x) − p∗ ≈ λ2(x)/2), can be used as the stopping criterion λ2(x)/2 ≤ ε.

I It appears in the line search as

d f(x + α∆xnt)/dα |α=0 = ∇Tf(x)∆xnt = −λ2(x)

I In order for the Newton step ∆xnt to be a feasible descent direction, the following two conditions need to be satisfied

∇Tf(x)∆xnt = −λ2(x) < 0

A∆xnt = 0

I The solution for equality constrained Newton's Method is invariant to affine transformations.
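A minimal sketch of the resulting algorithm on an assumed toy problem (min e^x1 + e^x2 subject to x1 + x2 = 0, whose optimum is the origin); the full Newton step is taken without a line search, and λ²(x)/2 serves as the stopping criterion:

```python
import numpy as np

# Equality-constrained Newton's method (toy problem, full steps)
A = np.array([[1., 1.]])
b = np.array([0.])

grad = lambda x: np.exp(x)           # f(x) = exp(x1) + exp(x2)
hess = lambda x: np.diag(np.exp(x))

x = np.array([2., -2.])              # feasible start: Ax = b
for _ in range(20):
    H = hess(x)
    KKT = np.block([[H, A.T], [A, np.zeros((1, 1))]])
    rhs = np.concatenate([-grad(x), np.zeros(1)])
    dx = np.linalg.solve(KKT, rhs)[:2]   # Newton step (drop the multiplier)
    lam2 = dx @ H @ dx                   # squared Newton decrement
    if lam2 / 2 <= 1e-10:                # stopping criterion: lambda^2/2 <= eps
        break
    x = x + dx                           # full step (no line search in this sketch)

assert np.allclose(x, [0., 0.], atol=1e-6)   # optimum at the origin
assert np.allclose(A @ x, b)                 # every iterate stays feasible
```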
Newton's Method with Equality Constraint Elimination

min f(x)
s.t. Ax = b
≡ min f̃(z) = f(Fz + x̂)

with R(F) = N (A) and Ax̂ = b.

I The gradient and Hessian of f̃(z) are

∇f̃(z) = FT∇f(Fz + x̂)

H̃(z) = FTH(Fz + x̂)F

I Then, the KKT matrix is invertible iff H̃(z) is invertible.

I The Newton step of the reduced problem is

∆znt = −H̃−1(z)∇f̃(z) = −(FTH(x)F)−1 FT∇f(x)

where x = Fz + x̂.
I It can be shown that

∆xnt = F∆znt

where ∆xnt is the Newton step of the original problem with equality constraints.

I The Newton decrement λ(z) is the same as the Newton decrement of the original problem

λ2(z) = ∆zTnt H̃(z) ∆znt
      = ∆zTnt FTH(x)F ∆znt
      = ∆xTnt H(x) ∆xnt
      = λ2(x)
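The identity ∆xnt = F∆znt and the equality of the decrements can be checked numerically; the objective and constraint below are assumed toy choices:

```python
import numpy as np

# Toy problem: min exp(x1) + exp(2*x2)  s.t.  x1 + x2 = 1
A = np.array([[1., 1.]])
grad = lambda x: np.array([np.exp(x[0]), 2 * np.exp(2 * x[1])])
hess = lambda x: np.diag([np.exp(x[0]), 4 * np.exp(2 * x[1])])

x = np.array([1., 0.])                       # a feasible point
H, g = hess(x), grad(x)

# Newton step from the KKT system of the original problem
KKT = np.block([[H, A.T], [A, np.zeros((1, 1))]])
dx = np.linalg.solve(KKT, np.concatenate([-g, [0.]]))[:2]

# Newton step of the reduced problem, with R(F) = N(A)
F = np.array([[1.], [-1.]])
dz = np.linalg.solve(F.T @ H @ F, -F.T @ g)

assert np.allclose(F @ dz, dx)                             # dx_nt = F dz_nt
assert np.isclose(dz @ (F.T @ H @ F) @ dz, dx @ H @ dx)    # same Newton decrement
```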
Constrained Optimization Algorithms Penalty and Barrier Methods
Penalty and Barrier Methods
See Luenberger Chapter 13.
I They approximate constrained problems by unconstrained problems.

I The approximation is obtained by adding

- a term with a high cost for violating the constraints (penalty methods)

- a term that favors points interior to the feasible region over those near the boundary (barrier methods)

to the objective function (cost function).

I The penalty and barrier methods work directly in the original N-dimensional space RN rather than the (N − M)-dimensional space RN−M as in the case of the primal methods.
Penalty Methods
min f(x)
s.t. x ∈ F

I The idea is to replace this problem by the following penalty problem

min f(x) + cP(x)

where c ∈ R++ is a constant and P(x) : RN → R is the penalty function.

I Here

- P(x) is continuous
- P(x) ≥ 0, ∀x ∈ RN
- P(x) = 0 iff x ∈ F

I In general, the following quadratic penalty function is used

P(x) = (1/2) ∑_{i=1}^{L} (max {0, gi(x)})² + (1/2) ∑_{j=1}^{M} (hj(x))²

where gi(x) ≤ 0 are the inequality constraints and hj(x) = 0 are the equality constraints.
I For example, consider P(x) = (1/2) ∑_{i=1}^{2} (max {0, gi(x)})² with g1(x) = x − b and g2(x) = a − x.

I For large c, the minimum point of the penalty problem will be in a region where P(x) is small.

I For increasing c, the solution will approach the feasible region F, and as c → ∞ the penalty problem will converge to a solution of the constrained problem.
Penalty Method:
I Let {c(k)}, k = 1, 2, . . ., be a sequence tending to ∞ such that c(k) ≥ 0, ∀k, and c(k+1) > c(k).

I Let

q(c, x) = f(x) + cP(x)

and for each k solve the penalty problem

min q(c(k), x)

obtaining a solution point x(k).
I Lemma: As k → ∞, c(k+1) > c(k) (e.g., start from an exterior point)

i. q(c(k), x(k)) ≤ q(c(k+1), x(k+1))
ii. P(x(k)) ≥ P(x(k+1))
iii. f(x(k)) ≤ f(x(k+1))
iv. f(x∗) ≥ q(c(k), x(k)) ≥ f(x(k))

I Theorem: Let {x(k)}, k = 1, 2, . . ., be a sequence generated by the penalty method. Then, any limit point of the sequence is a solution of the original constrained problem.
Barrier Methods

Also known as interior methods.

min f(x)
s.t. x ∈ F

I The idea is to replace this problem by the following barrier problem

min f(x) + (1/c) B(x)
s.t. x ∈ interior of F

where c ∈ R++ is a constant and B(x) : RN → R is the barrier function.

I Here, the barrier function B(x) is defined on the interior of F such that

- B(x) is continuous
- B(x) ≥ 0
- B(x) → ∞ as x approaches the boundary of F

I Ideally,

B(x) = { 0, x ∈ interior of F
       { ∞, x ∉ interior of F
I There are several approximations. Two common barrier functions for the inequality constraints are given below

Log barrier:     B(x) = −∑_{i=1}^{L} log (−gi(x))

Inverse barrier: B(x) = −∑_{i=1}^{L} 1/gi(x)
I For example, consider B(x) = −1/g1(x) − 1/g2(x), with g1(x) = x − b and g2(x) = a − x.
Barrier Method:
I Let {c(k)}, k = 1, 2, . . ., be a sequence tending to ∞ such that c(k) ≥ 0, ∀k, and c(k+1) > c(k).

I Let

r(c, x) = f(x) + (1/c) B(x)

and for each k solve the barrier problem

min r(c(k), x)
s.t. x ∈ interior of F

obtaining a solution point x(k).
I Lemma: As k →∞, c(k+1) > c(k) (start from an interior point)
i. r(c(k),x(k)) ≥ r(c(k+1),x(k+1))
ii. B(x(k)) ≤ B(x(k+1))
iii. f(x(k)) ≥ f(x(k+1))
iv. f(x∗) ≤ f(x(k)) ≤ r(c(k), x(k))

I Theorem: Any limit point of a sequence {x(k)}, k = 1, 2, . . ., generated by the barrier method is a solution of the original constrained problem.
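The same one-dimensional sketch for the barrier method (assumed example: min x subject to x ≥ 1, with the log barrier B(x) = −log(x − 1); here x∗(c) = 1 + 1/c, so the iterates approach x∗ = 1 from the interior):

```python
import math

# Barrier method on: min x  s.t.  x >= 1  (g(x) = 1 - x <= 0), log barrier
def r(c, x):                            # r(c, x) = f(x) + (1/c) * B(x)
    return x - (1.0 / c) * math.log(x - 1.0)

def argmin_interior(fun, lo=1.0, hi=10.0, iters=200):
    """Ternary search; evaluates fun only strictly inside (lo, hi)."""
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if fun(m1) < fun(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

xs = [argmin_interior(lambda x, c=c: r(c, x)) for c in [1.0, 10.0, 100.0, 1000.0]]

assert all(x > 1.0 for x in xs)                            # iterates stay interior
assert all(xs[i] > xs[i + 1] for i in range(len(xs) - 1))  # f(x(k)) decreasing
assert abs(xs[-1] - 1.0) < 1e-2                            # x*(c) = 1 + 1/c -> 1
```

The monotone decrease of f(x(k)) and the strict interiority of every iterate match items ii and iii of the lemma above.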
Properties of the Penalty & Barrier Methods
I The penalty method solves the unconstrained problem
min f(x) + cP (x)
I Let pi(x) = max {0, gi(x)} and p(x) = [pi]L×1.

I Let the penalty function γ(y), where P(x) = γ(p(x)), be the Euclidean norm function

γ(y) = yTy ⇒ P(x) = pT(x)p(x) = ∑_{i=1}^{L} (pi(x))²

or more generally the quadratic norm function

γ(y) = yTΓy ⇒ P(x) = pT(x) Γ p(x)

I The Hessian of the above problem becomes more and more ill-conditioned as c → ∞.
I Defining

q(c, x) = f(x) + c γ(p(x))

the Hessian Q(c, x) is given by

Q(c, x) = F(x) + c ∇Tγ(p(x)) G(x) + c JTp(x) Γ(p(x)) Jp(x)

where F(x), G(x) and Γ(x) are the Hessians of f(x), g(x) and γ(x) respectively, and Jp(x) is the Jacobian of p(x).

I If at x∗ there are r active constraints, then the Hessian matrices Q(c(k), x(k)) have r eigenvalues tending to ∞ as c(k) → ∞ and (N − r) eigenvalues tending to some finite value. In other words, the condition number goes to infinity (κ → ∞) as c(k) → ∞.

I Gradient Descent may not be directly applicable; instead, Newton's Method is preferred!

I The same observation also applies to the barrier method.
Constrained Optimization Algorithms Interior-Point Methods
Interior-Point Methods

See Boyd Chapter 11.

min f(x) + (1/c) (−∑_{i=1}^{L} log (−gi(x)))
s.t. Ax = b

Minimum occurs at

p∗ = inf {f(x) | Ax = b} = f(x∗)
I The function

φ(x) = −∑_{i=1}^{L} log (−gi(x))

with dom φ(x) = {x ∈ RN | gi(x) < 0, ∀i} is called the logarithmic barrier function.

I We will modify Newton's algorithm to solve the above problem. So, we will need

∇φ(x) = −∑_{i=1}^{L} (1/gi(x)) ∇gi(x)

Hφ(x) = ∑_{i=1}^{L} (1/g²i(x)) ∇gi(x)∇Tgi(x) − ∑_{i=1}^{L} (1/gi(x)) Hgi(x)

where Hφ(x) = ∇²φ(x) and Hgi(x) = ∇²gi(x) are the Hessians of φ(x) and gi(x), respectively.
Central Path

I Consider the equivalent problem (c > 0)

min cf(x) + φ(x)
s.t. Ax = b

I The point x∗(c) is the solution. The trajectory of x∗(c) as a function of c is called the central path. Points on the central path satisfy the centrality conditions

Ax∗(c) = b
gi(x∗(c)) < 0, ∀i
c∇f(x∗(c)) + ∇φ(x∗(c)) + ATν = 0

for some ν ∈ RM .

I The last line can be rewritten as

c∇f(x∗(c)) − ∑_{i=1}^{L} (1/gi(x∗(c))) ∇gi(x∗(c)) + ATν = 0
I Example 24: Inequality form linear programming. The logarithmic barrier function for an LP in inequality form

min eTx
s.t. Ax ≤ b

(where e ∈ RN , A ∈ RL×N and b ∈ RL are constants) is given by

φ(x) = −∑_{i=1}^{L} log (bi − aTi x),   dom φ(x) = {x | Ax < b}

where aT1 , . . . , aTL are the rows of A.
The gradient and Hessian of the barrier function are

∇φ(x) = ∑_{i=1}^{L} (1/(bi − aTi x)) ai = ATd

Hφ(x) = ∑_{i=1}^{L} (1/(bi − aTi x)²) ai aTi = AT (diag d)² A

where di = 1/(bi − aTi x).

Since x is strictly feasible, we have d > 0, so the Hessian of φ(x), Hφ(x), is nonsingular if and only if A has rank N , i.e., full rank.
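These formulas are easy to sanity-check with finite differences; the LP data below are assumed toy values:

```python
import numpy as np

# Toy inequality data Ax <= b (L = 3 inequalities, N = 2 variables)
A = np.array([[1., 0.], [0., 1.], [1., 1.]])
b = np.array([1., 1., 1.5])

phi = lambda y: -np.log(b - A @ y).sum()      # logarithmic barrier

x = np.array([0.2, 0.3])                      # strictly feasible: Ax < b
d = 1.0 / (b - A @ x)

grad = A.T @ d                                # grad phi(x) = A'd
hess = A.T @ np.diag(d**2) @ A                # Hess phi(x) = A' diag(d)^2 A

# Compare against central differences
eps = 1e-6
fd_grad = np.array([(phi(x + eps * e) - phi(x - eps * e)) / (2 * eps)
                    for e in np.eye(2)])
agrad = lambda y: A.T @ (1.0 / (b - A @ y))
fd_hess = np.column_stack([(agrad(x + eps * e) - agrad(x - eps * e)) / (2 * eps)
                           for e in np.eye(2)])

assert np.allclose(fd_grad, grad, atol=1e-5)
assert np.allclose(fd_hess, hess, atol=1e-4)
assert np.all(np.linalg.eigvalsh(hess) > 0)   # A has rank N, so Hess is PD
```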
The centrality condition is

c e + ATd = 0

We can give a simple geometric interpretation of the centrality condition. At a point x∗(c) on the central path the gradient ∇φ(x∗(c)), which is normal to the level set of φ(x) through x∗(c), must be parallel to e. In other words, the hyperplane eTx = eTx∗(c) is tangent to the level set of φ(x) through x∗(c). The figure below shows an example with L = 6 and N = 2.
The dashed curves in the previous �gure show three contour lines of thelogarithmic barrier function φ(x). The central path converges to the optimal pointx∗ as c→∞. Also shown is the point on the central path with c = 10. Theoptimality condition at this point can be veri�ed geometrically: The lineeTx = eTx∗(10) is tangent to the contour line of φ(x) through x∗(10).
Dual Points from Central Path
I From the centrality condition, let

λ∗i(c) = −1/(c gi(x∗(c))), ∀i

ν∗(c) = ν/c

then

∇f(x∗(c)) + ∑_{i=1}^{L} λ∗i(c) ∇gi(x∗(c)) + ATν∗(c) = 0

Hence, from KKT, x∗(c) minimizes

L(x, λ, ν) = f(x) + λTg(x) + νT(Ax − b)

for λ = λ∗(c) and ν = ν∗(c) for a particular c, which means that (λ∗(c), ν∗(c)) is a dual feasible pair.
So, the dual function value ℓ(λ∗(c), ν∗(c)) is finite and given as

ℓ(λ∗(c), ν∗(c)) = f(x∗(c)) + ∑_{i=1}^{L} λ∗i(c) gi(x∗(c)) + ν∗T(c) (Ax∗(c) − b)
                = f(x∗(c)) − L/c

since each term λ∗i(c) gi(x∗(c)) = −1/c and Ax∗(c) − b = 0.

I The duality gap is L/c, and

f(x∗(c)) − p∗ ≤ L/c

goes to zero as c → ∞.
KKT Interpretation
I The central path (centrality) conditions can be seen as deformed KKT conditions, i.e.,

Ax∗ = b
gi(x∗) ≤ 0, ∀i
λ∗i ≥ 0, ∀i
−λ∗i gi(x∗) = 1/c, ∀i
∇f(x∗) + ∑_{i=1}^{L} λ∗i ∇gi(x∗) + ATν∗ = 0

satisfied by x∗(c), λ∗(c) and ν∗(c).

I Complementary slackness λi gi(x) = 0 is replaced by −λ∗i gi(x∗) = 1/c.

I As c → ∞, x∗(c), λ∗(c) and ν∗(c) almost satisfy the KKT optimality conditions.
Newton Step for Modified KKT Equations

I Let λi = −1/(c gi(x)), then

∇f(x) − ∑_{i=1}^{L} (1/(c gi(x))) ∇gi(x) + ATν = 0

Ax = b

I To solve this set of (N + M) equations (in the N + M variables x and ν), consider the nonlinear part of the first set of equations and linearize it around x:

∇f(x + d) − ∑_{i=1}^{L} (1/(c gi(x + d))) ∇gi(x + d)

≅ [∇f(x) − ∑_{i=1}^{L} (1/(c gi(x))) ∇gi(x)] + [H(x) − ∑_{i=1}^{L} (1/(c gi(x))) Hgi(x) + ∑_{i=1}^{L} (1/(c g²i(x))) ∇gi(x)∇Tgi(x)] d

= g + Hd
I Substituting back, we obtain

Hd + ATν = −g
Ad = 0

where

H = H(x) − ∑_{i=1}^{L} (1/(c gi(x))) Hgi(x) + ∑_{i=1}^{L} (1/(c g²i(x))) ∇gi(x)∇Tgi(x)

g = ∇f(x) − ∑_{i=1}^{L} (1/(c gi(x))) ∇gi(x)

I Using the derivations of ∇φ(x) and Hφ(x)

H = H(x) + (1/c) Hφ(x)

g = ∇f(x) + (1/c) ∇φ(x)
I Let us represent the previous modified KKT equations in matrix form

[H AT] [d]   [−g]
[A  0] [ν] = [ 0]

whose solution gives the modified Newton step ∆xnt and ν∗nt

[cH(x) + Hφ(x)  AT] [∆xnt]   [−c∇f(x) − ∇φ(x)]
[A               0] [ν∗nt] = [        0       ]

where ν∗nt = cν∗.

I Using this Newton step, the Interior-Point Method (i.e., the Barrier Method) can be constructed.
The Interior-Point Method (Barrier Method):

Given a strictly feasible x ∈ F, c = c(0) > 0, µ > 1 and tolerance ε > 0

repeat

1. Centering step: Compute x∗(c) by minimizing the modified barrier problem

   x∗(c) = argmin (c f(x) + φ(x))
   s.t. Ax = b

   starting at x, using the modified Newton's Method.

2. Update: x = x∗(c)

3. Stopping criterion: quit if L/c < ε

4. Increase c: c = µc
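A compact sketch of this algorithm for an inequality-form LP (assumed toy data: min −x1 − x2 over the unit box, so p∗ = −2 at (1, 1)). With no equality constraints the centering step reduces to plain Newton on c f(x) + φ(x), with backtracking to keep strict feasibility:

```python
import numpy as np

# Toy LP: min e'x  s.t.  Gx <= h   (the box 0 <= x1, x2 <= 1)
e = np.array([-1., -1.])
G = np.array([[1., 0.], [0., 1.], [-1., 0.], [0., -1.]])
h = np.array([1., 1., 0., 0.])
L = len(h)                                    # number of inequalities

def psi(x, c):                                # c * f(x) + phi(x)
    return c * (e @ x) - np.log(h - G @ x).sum()

def centering(x, c, tol=1e-9):
    """Newton's method on psi(., c), started at the previous center."""
    for _ in range(50):
        d = 1.0 / (h - G @ x)
        g = c * e + G.T @ d                   # gradient of psi
        H = G.T @ np.diag(d**2) @ G           # Hessian of psi
        dx = np.linalg.solve(H, -g)
        lam2 = -g @ dx                        # squared Newton decrement
        if lam2 / 2 <= tol:                   # inner stopping criterion
            break
        t = 1.0                               # backtracking line search:
        while np.any(h - G @ (x + t * dx) <= 0):
            t *= 0.5                          # stay strictly feasible
        while psi(x + t * dx, c) > psi(x, c) - 0.25 * t * lam2:
            t *= 0.5                          # Armijo-type decrease
        x = x + t * dx
    return x

x, c, mu, eps = np.array([0.5, 0.5]), 1.0, 10.0, 1e-6
while L / c >= eps:                           # outer stopping criterion: L/c < eps
    x = centering(x, c)                       # centering step + update
    c *= mu                                   # increase c
assert abs(e @ x + 2.0) < 1e-4                # p* = -2 at x* = (1, 1)
```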
I Accuracy of centering:
- Computing x∗(c) exactly is not necessary, since the central path has no significance beyond the fact that it leads to a solution of the original problem as c → ∞; inexact centering will still yield a sequence of points x(k) that converges to an optimal point. Inexact centering, however, means that the points λ∗(c) and ν∗(c), computed from the first two equations given in the section titled "Dual Points from Central Path", are not exactly dual feasible. This can be corrected by adding a correction term to these formulae, which yields a dual feasible point provided the computed x is near the central path, i.e., near x∗(c).

- On the other hand, the cost of computing an extremely accurate minimizer of c f(x) + φ(x), as compared to the cost of computing a good minimizer, is only marginally more, i.e., a few Newton steps at most. For this reason it is not unreasonable to assume exact centering.
I Choice of µ:
- µ provides a trade-off in the number of iterations for the inner and outer loops.

- small µ: at each step the inner loop starts from a very good point, so few inner loop iterations are required, but too many outer loop iterations may be required.

- large µ: at each step c increases by a large amount, so the current starting point may not be a good point for the inner loop. Too many inner loop iterations may be required, but few outer loop iterations are required.

- In practice, small values of µ (i.e., near one) result in many outer iterations, with just a few Newton steps for each outer iteration. For µ in a fairly large range, from around 3 to 100 or so, the two effects nearly cancel, so the total number of Newton steps remains approximately constant. This means that the choice of µ is not particularly critical; values from around 10 to 20 or so seem to work well.
I Choice of c(0):

- large c(0): the first run of the inner loop may require too many iterations.

- small c(0): more outer-loop iterations are required.

- One reasonable choice is to choose c(0) so that L/c(0) is approximately of the same order as f(x(0)) − p∗, or µ times this amount. For example, if a dual feasible point (λ, ν) is known, with duality gap η = f(x(0)) − ℓ(λ, ν), then we can take c(0) = L/η. Thus, in the first outer iteration we simply compute a pair with the same duality gap as the initial primal and dual feasible points.

- Another possibility is to find the c(0) which minimizes

  inf_ν ‖c∇f(x(0)) + ∇φ(x(0)) + ATν‖2
Example 25:
Solution:
Example 26:
Solution:
How to start from a feasible point?
I The interior-point method requires a strictly feasible starting point x(0).

I If such a point is not known, a preliminary stage, Phase I, is run first.
Basic Phase I Method:
I Consider gi(x) ≤ 0, i = 1, 2, . . . , L and Ax = b.

I We always assume that we are given a point x(0) ∈ ∩_{i=1}^{L} dom gi(x) satisfying Ax(0) = b.

I Then, we form the following optimization problem

min s
s.t. gi(x) ≤ s, i = 1, 2, . . . , L
     Ax = b

in the variables x ∈ RN and s ∈ R.

s is a bound on the maximum infeasibility of the inequalities, and it is to be driven below zero.

I The problem is always feasible when we select s(0) ≥ max_i gi(x(0)) together with the given x(0) ∈ ∩_{i=1}^{L} dom gi(x) with Ax(0) = b.
I Then, apply the interior-point method to solve the above problem. There are three cases depending on the optimal value p∗:

1. If p∗ < 0, then a strictly feasible solution is reached. Moreover, if (x, s) is feasible with s < 0, then x satisfies gi(x) < 0, ∀i. This means we do not need to solve the optimization problem with high accuracy; we can terminate when s < 0.

2. If p∗ > 0, then there is no feasible solution.

3. If p∗ = 0 and the minimum is attained at x∗ with s∗ = 0, then the set of inequalities is feasible but not strictly feasible. However, if p∗ = 0 and the minimum is not attained, then the inequalities are infeasible.
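The Phase I idea can be sketched in one dimension (pure Python, assumed examples): minimizing s subject to gi(x) ≤ s is the same as minimizing max_i gi(x) over x, so the sign of the optimal value decides feasibility:

```python
# Phase I sketch: drive the maximum infeasibility s = max_i g_i(x) below zero.
def max_infeasibility(gs, lo=-10.0, hi=10.0, iters=200):
    """min over x of s(x) = max_i g_i(x) via ternary search (s is convex here)."""
    s = lambda x: max(g(x) for g in gs)
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if s(m1) < s(m2):
            hi = m2
        else:
            lo = m1
    x = 0.5 * (lo + hi)
    return x, s(x)

# Feasible case: x - 1 <= 0 and -x <= 0  (i.e., 0 <= x <= 1)
x_f, p_f = max_infeasibility([lambda x: x - 1.0, lambda x: -x])
assert p_f < 0                   # p* < 0: a strictly feasible point was found

# Infeasible case: x - 1 <= 0 and 2 - x <= 0  (x <= 1 and x >= 2)
x_i, p_i = max_infeasibility([lambda x: x - 1.0, lambda x: 2.0 - x])
assert p_i > 0                   # p* > 0: the inequalities are infeasible
```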
Example 27:
Solution: