Lecture 3
Optimization Problems
and Iterative Algorithms
August 27, 2008
A. Nedich and U. Shanbhag Lecture 3
Outline
• Special Functions: Linear, Quadratic, Convex
• Criteria for Convexity of a Function
• Operations Preserving Convexity
• Unconstrained Optimization
• First-Order Necessary Optimality Conditions
• Constrained Optimization
• First-Order Necessary Optimality Conditions
• KKT Conditions
• Iterative Algorithms
Game Theory: Models, Algorithms and Applications 1
Convex Function

f is convex when dom(f) is a convex set and there holds

f(αx + (1 − α)y) ≤ αf(x) + (1 − α)f(y)

for all x, y ∈ dom(f) and α ∈ [0, 1]

f is strictly convex if the inequality is strict for all x ≠ y in dom(f) and α ∈ (0, 1)
[Figure: a convex function; the chord through (x, f(x)) and (y, f(y)) lies above the graph of f]
f is concave when −f is convex
f is strictly concave when −f is strictly convex
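The defining inequality can be spot-checked numerically on sample points. A minimal sketch (the function name, tolerance, and sample grid are illustrative, not from the lecture):

```python
# Sampled check of the defining inequality
# f(a x + (1 - a) y) <= a f(x) + (1 - a) f(y)
# (function names and sample grids are illustrative, not from the lecture).
def is_convex_on_samples(f, points, alphas, tol=1e-12):
    """True if the convexity inequality holds on every sampled pair/alpha."""
    for x in points:
        for y in points:
            for a in alphas:
                if f(a * x + (1 - a) * y) > a * f(x) + (1 - a) * f(y) + tol:
                    return False
    return True

alphas = [i / 10 for i in range(11)]
pts = [-2.0, -1.0, 0.0, 0.5, 3.0]
print(is_convex_on_samples(lambda x: x * x, pts, alphas))    # True: x^2 is convex
print(is_convex_on_samples(lambda x: -abs(x), pts, alphas))  # False: -|x| is concave
```

Passing such a sampled test does not prove convexity, but a single failing triple (x, y, α) disproves it.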
Examples of Convex/Concave Functions

Examples on R

Convex:
• Affine: ax + b over R for any a, b ∈ R
• Exponential: e^(ax) over R for any a ∈ R
• Power: x^p over (0, +∞) for p ≥ 1 or p ≤ 0
• Powers of absolute value: |x|^p over R for p ≥ 1
• Negative entropy: x ln x over (0, +∞)
Concave:
• Affine: ax + b over R for any a, b ∈ R
• Powers: x^p over (0, +∞) for 0 ≤ p ≤ 1
• Logarithm: ln x over (0, +∞)
Examples on Rn
• Affine functions are both convex and concave
• Norms ‖x‖, ‖x‖1, ‖x‖∞ are convex
Second-Order Conditions for Convexity
Let f be twice differentiable and let dom(f) be the domain of f
[In general, when differentiability is considered, it is required that dom(f)
is open]
The Hessian ∇2f(x) is a symmetric n × n matrix whose entries are the
second-order partial derivatives of f at x:

[∇2f(x)]_ij = ∂^2 f(x) / (∂xi ∂xj) for i, j = 1, . . . , n
2nd-order conditions:
• f is convex if and only if dom(f) is a convex set and
∇2f(x) ⪰ 0 for all x ∈ dom(f)
• f is strictly convex if dom(f) is a convex set and
∇2f(x) ≻ 0 for all x ∈ dom(f)
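For a quadratic, the Hessian is a constant matrix, so the second-order test reduces to checking that matrix. A hedged numeric sketch via eigenvalues (helper name and example matrices are illustrative):

```python
# Sketch of the second-order test for quadratics f(x) = (1/2) x'Qx + q'x:
# the Hessian is the constant matrix Q, so f is convex iff Q is PSD.
# (Helper name and example matrices are illustrative.)
import numpy as np

def hessian_is_psd(H, tol=1e-10):
    """PSD check via eigenvalues of a symmetric matrix."""
    return bool(np.linalg.eigvalsh(H).min() >= -tol)

print(hessian_is_psd(np.array([[2.0, 0.0], [0.0, 1.0]])))   # True: convex quadratic
print(hessian_is_psd(np.array([[1.0, 0.0], [0.0, -1.0]])))  # False: indefinite
```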
Examples
• Quadratic function: f(x) = (1/2)x′Qx + q′x + r with a symmetric
n× n matrix Q
∇f(x) = Qx + q, ∇2f(x) = Q
Convex for Q ⪰ 0 (positive semidefinite)
• Least-squares objective: f(x) = ‖Ax− b‖2 with an m× n matrix A
∇f(x) = 2A^T(Ax − b), ∇2f(x) = 2A^T A
Convex for any A
• Quadratic-over-linear: f(x, y) = x^2/y

∇2f(x, y) = (2/y^3) vv^T ⪰ 0, where v = (y, −x)^T

Convex for y > 0
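The quadratic-over-linear Hessian formula can be verified at a sample point against the entrywise second partials. A sketch (function name and test point are illustrative):

```python
# Check the slide's quadratic-over-linear Hessian at a sample point:
# for f(x, y) = x^2 / y, the Hessian is (2/y^3) v v^T with v = (y, -x)^T,
# which is PSD whenever y > 0.
import numpy as np

def qol_hessian(x, y):
    v = np.array([[y], [-x]])            # column vector v = (y, -x)^T
    return (2.0 / y**3) * (v @ v.T)

H = qol_hessian(1.0, 2.0)
# Entrywise second partials of x^2/y: f_xx = 2/y, f_xy = -2x/y^2, f_yy = 2x^2/y^3
expected = np.array([[1.0, -0.5], [-0.5, 0.25]])
print(np.allclose(H, expected))                     # True
print(bool(np.linalg.eigvalsh(H).min() >= -1e-12))  # True: PSD since y > 0
```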
First-Order Condition for Convexity
Let f be differentiable and let dom(f) be its domain. Then, the gradient
∇f(x) = ( ∂f(x)/∂x1, ∂f(x)/∂x2, . . . , ∂f(x)/∂xn )

exists at each x ∈ dom(f)
• 1st-order condition: f is convex if and only if dom(f) is convex and
f(x) + ∇f(x)^T(z − x) ≤ f(z) for all x, z ∈ dom(f)
• Note: the first-order (tangent) approximation of f at any point is a global underestimator of f
• This is a very important property, used in convex optimization for algorithm design and performance analysis
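The global-underestimator property can be illustrated numerically on a familiar convex function. A sketch, assuming f(x) = e^x (so f′ = e^x; names and sample points are illustrative):

```python
# Numeric illustration of the first-order condition: for convex f, the
# tangent at any x underestimates f everywhere. Here f(x) = e^x, whose
# derivative is also e^x (names and sample points are illustrative).
import math

def tangent_underestimates(f, df, xs, zs, tol=1e-12):
    """Check f(x) + f'(x)(z - x) <= f(z) over all sampled pairs."""
    return all(f(x) + df(x) * (z - x) <= f(z) + tol
               for x in xs for z in zs)

pts = [-2.0, -0.5, 0.0, 1.0, 3.0]
print(tangent_underestimates(math.exp, math.exp, pts, pts))  # True
```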
Operations Preserving Convexity

Let f and g be convex functions over Rn
• Positive Scaling: λf is convex for λ > 0; (λf)(x) = λf(x) for all x
• Sum: f + g is convex; (f + g)(x) = f(x) + g(x) for all x
• Composition with affine function: for g affine [i.e., g(x) = Ax + b],
the composition f ◦ g is convex, where
(f ◦ g)(x) = f(Ax + b) for all x
• Pointwise maximum: For convex functions f1, . . . , fm, the pointwise-max function
h(x) = max {f1(x), . . . , fm(x)} is convex
• Polyhedral function: f(x) = max_{i=1,...,m} (ai^T x + bi) is convex
• Pointwise supremum: Let Y ⊆ Rm and f : Rn × Rm → R. Let
f(x, y) be convex in x for each y ∈ Y . Then, the supremum function
over the set Y ,
h(x) = sup_{y∈Y} f(x, y), is convex
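The pointwise-max rule can be spot-checked on a polyhedral function. A small sample-based sketch (the affine coefficients and grid are arbitrary illustrations):

```python
# Sketch (illustrative coefficients): the pointwise max of affine
# functions, h(x) = max_i (a_i * x + b_i), should pass a sampled
# version of the convexity inequality.
def sample_check_convex(f, pts, tol=1e-12):
    """Test f(a*x + (1-a)*y) <= a*f(x) + (1-a)*f(y) on a grid."""
    for x in pts:
        for y in pts:
            for k in range(11):
                a = k / 10
                if f(a * x + (1 - a) * y) > a * f(x) + (1 - a) * f(y) + tol:
                    return False
    return True

pieces = [(1.0, 0.0), (-2.0, 1.0), (0.5, -3.0)]   # arbitrary (a_i, b_i)

def h(x):
    return max(a * x + b for a, b in pieces)

print(sample_check_convex(h, [-3.0, -1.0, 0.0, 2.0, 4.0]))  # True
```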
Optimization Terminology

Let C ⊆ Rn and f : C → R. Consider the following optimization problem
minimize f(x)
subject to x ∈ C
Example: C = {x ∈ Rn | g(x) ≤ 0, x ∈ X}

Terminology:
• The set C is referred to as the feasible set
• We say that the problem is feasible when C is nonempty
• The problem is unconstrained when C = Rn, and it is constrained
otherwise
• We say that a vector x∗ is an optimal solution or a global minimum when
x∗ is feasible and f(x∗) does not exceed f(x) at any x ∈ C, i.e.,

x∗ ∈ C and f(x∗) ≤ f(x) for all x ∈ C
Local Minimum
minimize f(x)
subject to x ∈ C
• A vector x ∈ C is a local minimum for the problem if there is a
ball B(x, r) such that

f(x) ≤ f(y) for all y ∈ C with ‖y − x‖ ≤ r
• Every global minimum is also a local minimum
• When the set C is convex and the function f is convex then a local
minimum is also global
First-Order Necessary Optimality Condition:
Unconstrained Problem

Let f be a differentiable function with dom(f) = Rn and let C = Rn.
• If x is a local minimum of f over Rn, then the following holds:
∇f(x) = 0
• The gradient relation can be equivalently given as:
(y − x)′∇f(x) ≥ 0 for all y ∈ Rn
This is a variational inequality V I(K, F ) with the set K and the
mapping F given by
K = Rn, F (x) = ∇f(x)
• Solving a minimization problem can thus be reduced to solving a
corresponding variational inequality
First-Order Necessary Optimality Condition:
Constrained Problem
Let f be a differentiable function with dom(f) = Rn and let C ⊆ Rn be a
closed convex set.
• If x is a local minimum of f over C, then the following holds:
(y − x)′∇f(x) ≥ 0 for all y ∈ C (1)
Again, this is a variational inequality V I(K, F ) with the set K and
the mapping F given by
K = C, F (x) = ∇f(x)
• Recall that when f is convex, then a local minimum is also global
• When f is convex, the preceding relation is also sufficient for x to be
a global minimum, i.e.,
if x satisfies relation (1), then x is a (global) minimum
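The variational-inequality condition (1) can be tested numerically at a candidate point. A sketch on a toy problem (the problem data, function names, and sampling of C are illustrative):

```python
# Sketch of checking the variational-inequality condition at a candidate
# point: minimize f(x) = x^2 over C = [1, 2]. The minimizer is x* = 1,
# and (y - x*) f'(x*) = 2(y - 1) >= 0 for every y in C.
def satisfies_vi(x_star, grad, C_samples, tol=1e-12):
    """Sampled test of (y - x)' grad f(x) >= 0 for all y in C."""
    return all((y - x_star) * grad(x_star) >= -tol for y in C_samples)

C = [1.0 + 0.1 * k for k in range(11)]         # sample points of C = [1, 2]
print(satisfies_vi(1.0, lambda x: 2 * x, C))   # True: x* = 1 passes
print(satisfies_vi(1.5, lambda x: 2 * x, C))   # False: non-stationary interior point
```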
Equality and Inequality Constrained Problem
Consider the following problem
minimize f(x)
subject to h1(x) = 0, . . . , hp(x) = 0
g1(x) ≤ 0, . . . , gm(x) ≤ 0
where f , hi and gj are continuously differentiable over Rn.
Def. For a feasible vector x, an active set of (inequality) constraints is
the set given by
A(x) = {j | gj(x) = 0}
If j ∉ A(x), we say that the j-th constraint is inactive at x
Def. We say that a vector x is regular if the gradients
∇h1(x), . . . ,∇hp(x), and ∇gj(x) for j ∈ A(x)
are linearly independent

NOTE: x is trivially regular when there are no equality constraints and all the
inequality constraints are inactive [p = 0 and A(x) = ∅]
Lagrangian Function
With the problem
minimize f(x)
subject to h1(x) = 0, . . . , hp(x) = 0
g1(x) ≤ 0, . . . , gm(x) ≤ 0
(2)
we associate the Lagrangian function L(x, λ, µ) defined by
L(x, λ, µ) = f(x) + ∑_{i=1}^p λi hi(x) + ∑_{j=1}^m µj gj(x)
where λi ∈ R for all i, and µj ∈ R+ for all j
First-Order Karush-Kuhn-Tucker (KKT)
Necessary Conditions
Th. Let x be a local minimum of the equality/inequality constrained
problem (2). Also, assume that x is regular. Then, there exist unique
multipliers λ and µ such that
• ∇xL(x, λ, µ) = 0 [L is the Lagrangian function]
• µj ≥ 0 for all j
• µj = 0 for all j ∉ A(x)

The last condition is referred to as the complementarity condition.
Together with µ ≥ 0 and g(x) ≤ 0, it can be written compactly as:

g(x) ⊥ µ
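The KKT conditions can be checked numerically at a candidate pair (x, µ). A hedged sketch on a toy one-dimensional problem (problem data, helper name, and tolerance are illustrative):

```python
# Hedged numeric check of the KKT conditions on a toy problem:
# minimize x^2 subject to g(x) = 1 - x <= 0 (i.e., x >= 1).
# At x* = 1: stationarity 2x - mu = 0 gives mu = 2 >= 0, g(x*) = 0,
# so complementarity mu * g(x*) = 0 holds as well.
def kkt_holds(x, mu, tol=1e-9):
    stationarity = abs(2 * x - mu) <= tol      # d/dx [x^2 + mu (1 - x)] = 0
    dual_feasibility = mu >= -tol
    primal_feasibility = 1 - x <= tol
    complementarity = abs(mu * (1 - x)) <= tol
    return stationarity and dual_feasibility and primal_feasibility and complementarity

print(kkt_holds(1.0, 2.0))  # True
print(kkt_holds(2.0, 4.0))  # False: mu > 0 but g(2) = -1 (complementarity fails)
```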
Second-Order KKT Necessary Conditions
Th. Let x be a local minimum of the equality/inequality constrained
problem (2). Also, assume that x is regular and that f, hi, gj are twice
continuously differentiable. Then, there exist unique multipliers λ and µ
such that
• ∇xL(x, λ, µ) = 0
• µj ≥ 0 for all j
• µj = 0 for all j ∉ A(x)
• For any vector y such that ∇hi(x)′y = 0 for all i and ∇gj(x)′y = 0
for all j ∈ A(x), the following relation holds:
y′∇2xxL(x, λ, µ)y ≥ 0
Solution Procedures: Iterative Algorithms
For solving problems, we will consider iterative algorithms
• Given an initial iterate x0
• We generate a new iterate
xk+1 = Gk(xk)
where Gk is a mapping that depends on the optimization problem
Objectives:
• Provide conditions on the mappings Gk under which the sequence
{xk} converges to a solution of the problem of interest
• Study how fast the sequence {xk} converges:
• Global convergence rate (when far from optimal points)
• Local convergence rate (when near an optimal point)
Gradient Descent Method

Consider a continuously differentiable function f. We want to

minimize f(x) over x ∈ Rn

Gradient descent method:

xk+1 = xk − αk ∇f(xk)

• The scalar αk > 0 is a stepsize
• Stepsize choices: a constant αk = α, a line search, or another stepsize rule
chosen so that f(xk+1) < f(xk)
Convergence Rate:
• Consider the tail of the error sequence e(xk) = dist(xk, X∗):
Local convergence is at best linear,

lim sup_{k→∞} e(xk+1)/e(xk) ≤ q for some q ∈ (0, 1)

• Global convergence is also at best linear
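The iteration above can be sketched in a few lines; on a simple quadratic with a constant stepsize the linear rate is visible directly. All names and the test function are illustrative:

```python
# Minimal gradient descent sketch with a constant stepsize alpha,
# applied to f(x) = (x - 3)^2 (so grad f(x) = 2(x - 3), minimizer x* = 3).
def gradient_descent(grad, x0, alpha, iters):
    x = x0
    for _ in range(iters):
        x = x - alpha * grad(x)     # x_{k+1} = x_k - alpha * grad f(x_k)
    return x

x_final = gradient_descent(lambda x: 2 * (x - 3), x0=0.0, alpha=0.1, iters=200)
print(abs(x_final - 3.0) < 1e-6)  # True: the error shrinks by factor 0.8 per step
```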
Newton’s Method

Consider a twice continuously differentiable function f with Hessian
∇2f(x) ≻ 0 for all x. We want to

minimize f(x) over x ∈ Rn

Newton’s method:

xk+1 = xk − αk [∇2f(xk)]^(−1) ∇f(xk)

Global Convergence Rate (when far from x∗)
• f(x) decreases by at least some γ > 0 at each iteration; therefore, there can be at
most (f(x0) − f∗)/γ such iterations [under some additional conditions on f]
• The globally convergent (damped) phase thus lasts only a finite number of iterations

Local Convergence Rate (near x∗)
• ‖∇f(x)‖ converges to zero quadratically:

‖∇f(xk)‖ ≤ C q^(2^k) for all large enough k

where C > 0 and q ∈ (0, 1)
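A one-dimensional sketch of the iteration, using pure Newton steps (αk = 1) on an illustrative strictly convex function; the error roughly squares each iteration near x∗:

```python
# One-dimensional sketch of Newton's method with pure steps (alpha_k = 1)
# on f(x) = e^x - 2x: f'(x) = e^x - 2, f''(x) = e^x > 0, minimizer x* = ln 2.
import math

def newton(d1, d2, x0, iters):
    """Iterate x_{k+1} = x_k - f'(x_k)/f''(x_k)."""
    x = x0
    for _ in range(iters):
        x = x - d1(x) / d2(x)
    return x

x_star = newton(lambda x: math.exp(x) - 2, math.exp, x0=0.0, iters=6)
print(abs(x_star - math.log(2)) < 1e-12)  # True: quadratic local convergence
```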
Penalty Methods
For solving inequality constrained problems:
minimize f(x)
subject to gj(x) ≤ 0, j = 1, . . . , m
Penalty Approach: Remove the constraints but penalize their violation
Pc: minimize F(x, c) = f(x) + c P(g1(x), . . . , gm(x)) over x ∈ Rn
where c > 0 is a penalty parameter and P is some penalty function
Penalty methods operate in two stages for c and x, respectively
• Choose initial value c0
(1) Having ck, solve the problem Pck to obtain its optimal solution x∗(ck)
(2) Using x∗(ck), update ck to obtain ck+1 and go to step 1
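The two-stage scheme can be sketched on a toy problem where step (1) has a closed form. All problem data and names are illustrative, with the quadratic penalty P(u) = max(0, u)^2 as an example choice:

```python
# Quadratic-penalty sketch for: minimize x^2 subject to g(x) = 1 - x <= 0,
# with P(u) = max(0, u)^2. For x < 1 the penalized problem minimizes
# x^2 + c (1 - x)^2, and setting the derivative to zero gives the
# closed form x*(c) = c / (1 + c), which tends to the true solution 1.
def penalized_min(c):
    return c / (1.0 + c)

for c in [1.0, 10.0, 100.0, 1000.0]:         # stage (2): grow c, re-solve
    print(c, penalized_min(c))
print(abs(penalized_min(1e9) - 1.0) < 1e-6)  # True: x*(c) -> x* = 1
```

Note that for any finite c the penalized minimizer is slightly infeasible (x∗(c) < 1); the constraint is recovered only in the limit.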
Interior-Point Methods
Solve inequality (and more generally) constrained problem:
minimize f(x)
subject to gj(x) ≤ 0, j = 1, . . . , m
The IPM solves a sequence of problems parametrized by t > 0:
minimize f(x) − (1/t) ∑_{j=1}^m ln(−gj(x)) over x ∈ Rn
• Can be viewed as a penalty method with
• Penalty parameter c = 1/t
• Penalty function
P(u1, . . . , um) = − ∑_{j=1}^m ln(−uj)
This function is known as logarithmic barrier or log barrier function
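A toy sketch of the barrier subproblem where the minimizer has a closed form, showing the solutions x(t) approach the constrained optimum as t grows. The problem data and names are illustrative:

```python
# Log-barrier sketch for: minimize f(x) = x subject to g(x) = 1 - x <= 0.
# The barrier subproblem minimizes x - (1/t) ln(x - 1); setting its
# derivative 1 - 1/(t (x - 1)) to zero gives x(t) = 1 + 1/t -> 1 as t grows.
import math

def barrier_obj(x, t):
    return x - (1.0 / t) * math.log(x - 1.0)   # requires -g(x) = x - 1 > 0

def barrier_min(t):
    return 1.0 + 1.0 / t                       # closed-form subproblem minimizer

for t in [1.0, 10.0, 100.0]:
    print(t, barrier_min(t))
print(abs(barrier_min(1e8) - 1.0) < 1e-6)  # True: the solutions tend to x* = 1
```

Unlike the quadratic penalty, the barrier keeps every iterate strictly feasible (x(t) > 1), which is why these are called interior-point methods.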
References for this lecture
The material for this lecture:
• (B) Bertsekas, D.P., Nonlinear Programming
• Chapter 1 and Chapter 3 (descent and Newton’s methods, KKT
conditions)
• (BNO) Bertsekas, Nedic, Ozdaglar, Convex Analysis and Optimization
• Chapter 1 (convex functions)
• Lecture slides for Convex Optimization at
https://netfiles.uiuc.edu/angelia/www/angelia.html
• Lectures 14 and 17 (descent and interior point methods)