Lecture 3
Optimization Problems
and Iterative Algorithms
August 27, 2008
A. Nedich and U. Shanbhag Lecture 3
Outline
• Special Functions: Linear, Quadratic, Convex
• Criteria for Convexity of a Function
• Operations Preserving Convexity
• Unconstrained Optimization
• First-Order Necessary Optimality Conditions
• Constrained Optimization
• First-Order Necessary Optimality Conditions
• KKT Conditions
• Iterative Algorithms
Game Theory: Models, Algorithms and Applications 1
Convex Function

f is convex when dom(f) is a convex set and there holds

f(αx + (1 − α)y) ≤ αf(x) + (1 − α)f(y)

for all x, y ∈ dom(f) and α ∈ [0, 1]

f is strictly convex if the inequality is strict for all x ≠ y in dom(f) and α ∈ (0, 1)
[Figure: a convex function; the chord through (x, f(x)) and (y, f(y)) lies above the graph of f]
f is concave when −f is convex
f is strictly concave when −f is strictly convex
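The defining inequality can be spot-checked numerically on sample points. A minimal sketch (the function name, tolerance, and sample grid are illustrative, not from the lecture):

```python
# Sampled check of the defining inequality
# f(a x + (1 - a) y) <= a f(x) + (1 - a) f(y)
# (function names and sample grids are illustrative, not from the lecture).
def is_convex_on_samples(f, points, alphas, tol=1e-12):
    """True if the convexity inequality holds on every sampled pair/alpha."""
    for x in points:
        for y in points:
            for a in alphas:
                if f(a * x + (1 - a) * y) > a * f(x) + (1 - a) * f(y) + tol:
                    return False
    return True

alphas = [i / 10 for i in range(11)]
pts = [-2.0, -1.0, 0.0, 0.5, 3.0]
print(is_convex_on_samples(lambda x: x * x, pts, alphas))    # True: x^2 is convex
print(is_convex_on_samples(lambda x: -abs(x), pts, alphas))  # False: -|x| is concave
```

Passing such a sampled test does not prove convexity, but a single failing triple (x, y, α) disproves it.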
Examples of Convex/Concave Functions

Examples on R

Convex:
• Affine: ax + b over R for any a, b ∈ R
• Exponential: e^(ax) over R for any a ∈ R
• Power: x^p over (0, +∞) for p ≥ 1 or p ≤ 0
• Powers of absolute value: |x|^p over R for p ≥ 1
• Negative entropy: x ln x over (0, +∞)
Concave:
• Affine: ax + b over R for any a, b ∈ R
• Powers: x^p over (0, +∞) for 0 ≤ p ≤ 1
• Logarithm: ln x over (0, +∞)
Examples on Rn
• Affine functions are both convex and concave
• Norms ‖x‖, ‖x‖1, ‖x‖∞ are convex
Second-Order Conditions for Convexity
Let f be twice differentiable and let dom(f) be the domain of f
[In general, when differentiability is considered, it is required that dom(f)
is open]
The Hessian ∇2f(x) is a symmetric n × n matrix whose entries are the
second-order partial derivatives of f at x:

[∇2f(x)]_ij = ∂^2 f(x) / (∂xi ∂xj) for i, j = 1, . . . , n
2nd-order conditions:
• f is convex if and only if dom(f) is a convex set and
∇2f(x) ⪰ 0 for all x ∈ dom(f)
• f is strictly convex if dom(f) is a convex set and
∇2f(x) ≻ 0 for all x ∈ dom(f)
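For a quadratic, the Hessian is a constant matrix, so the second-order test reduces to checking that matrix. A hedged numeric sketch via eigenvalues (helper name and example matrices are illustrative):

```python
# Sketch of the second-order test for quadratics f(x) = (1/2) x'Qx + q'x:
# the Hessian is the constant matrix Q, so f is convex iff Q is PSD.
# (Helper name and example matrices are illustrative.)
import numpy as np

def hessian_is_psd(H, tol=1e-10):
    """PSD check via eigenvalues of a symmetric matrix."""
    return bool(np.linalg.eigvalsh(H).min() >= -tol)

print(hessian_is_psd(np.array([[2.0, 0.0], [0.0, 1.0]])))   # True: convex quadratic
print(hessian_is_psd(np.array([[1.0, 0.0], [0.0, -1.0]])))  # False: indefinite
```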
Examples
• Quadratic function: f(x) = (1/2)x′Qx + q′x + r with a symmetric
n× n matrix Q
∇f(x) = Qx + q, ∇2f(x) = Q
Convex for Q ⪰ 0 (positive semidefinite)
• Least-squares objective: f(x) = ‖Ax− b‖2 with an m× n matrix A
∇f(x) = 2A^T(Ax − b), ∇2f(x) = 2A^T A
Convex for any A
• Quadratic-over-linear: f(x, y) = x^2/y

∇2f(x, y) = (2/y^3) vv^T ⪰ 0, where v = (y, −x)^T

Convex for y > 0
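The quadratic-over-linear Hessian formula can be verified at a sample point against the entrywise second partials. A sketch (function name and test point are illustrative):

```python
# Check the slide's quadratic-over-linear Hessian at a sample point:
# for f(x, y) = x^2 / y, the Hessian is (2/y^3) v v^T with v = (y, -x)^T,
# which is PSD whenever y > 0.
import numpy as np

def qol_hessian(x, y):
    v = np.array([[y], [-x]])            # column vector v = (y, -x)^T
    return (2.0 / y**3) * (v @ v.T)

H = qol_hessian(1.0, 2.0)
# Entrywise second partials of x^2/y: f_xx = 2/y, f_xy = -2x/y^2, f_yy = 2x^2/y^3
expected = np.array([[1.0, -0.5], [-0.5, 0.25]])
print(np.allclose(H, expected))                     # True
print(bool(np.linalg.eigvalsh(H).min() >= -1e-12))  # True: PSD since y > 0
```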
First-Order Condition for Convexity
Let f be differentiable and let dom(f) be its domain. Then, the gradient
∇f(x) = ( ∂f(x)/∂x1, ∂f(x)/∂x2, . . . , ∂f(x)/∂xn )

exists at each x ∈ dom(f)
• 1st-order condition: f is convex if and only if dom(f) is convex and
f(x) + ∇f(x)^T(z − x) ≤ f(z) for all x, z ∈ dom(f)
• Note: the first-order (tangent) approximation of f at any point is a global underestimator of f
• This is a very important property, used in convex optimization for algorithm design and performance analysis
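The global-underestimator property can be illustrated numerically on a familiar convex function. A sketch, assuming f(x) = e^x (so f′ = e^x; names and sample points are illustrative):

```python
# Numeric illustration of the first-order condition: for convex f, the
# tangent at any x underestimates f everywhere. Here f(x) = e^x, whose
# derivative is also e^x (names and sample points are illustrative).
import math

def tangent_underestimates(f, df, xs, zs, tol=1e-12):
    """Check f(x) + f'(x)(z - x) <= f(z) over all sampled pairs."""
    return all(f(x) + df(x) * (z - x) <= f(z) + tol
               for x in xs for z in zs)

pts = [-2.0, -0.5, 0.0, 1.0, 3.0]
print(tangent_underestimates(math.exp, math.exp, pts, pts))  # True
```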
Operations Preserving Convexity

Let f and g be convex functions over Rn
• Positive Scaling: λf is convex for λ > 0; (λf)(x) = λf(x) for all x
• Sum: f + g is convex; (f + g)(x) = f(x) + g(x) for all x
• Composition with affine function: for g affine [i.e., g(x) = Ax + b],
the composition f ◦ g is convex, where
(f ◦ g)(x) = f(Ax + b) for all x
• Pointwise maximum: For convex functions f1, . . . , fm, the pointwise-max function
h(x) = max {f1(x), . . . , fm(x)} is convex
• Polyhedral function: f(x) = max_{i=1,...,m} (ai^T x + bi) is convex
• Pointwise supremum: Let Y ⊆ Rm and f : Rn × Rm → R. Let
f(x, y) be convex in x for each y ∈ Y . Then, the supremum function
over the set Y ,
h(x) = sup_{y∈Y} f(x, y), is convex
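The pointwise-max rule can be spot-checked on a polyhedral function. A small sample-based sketch (the affine coefficients and grid are arbitrary illustrations):

```python
# Sketch (illustrative coefficients): the pointwise max of affine
# functions, h(x) = max_i (a_i * x + b_i), should pass a sampled
# version of the convexity inequality.
def sample_check_convex(f, pts, tol=1e-12):
    """Test f(a*x + (1-a)*y) <= a*f(x) + (1-a)*f(y) on a grid."""
    for x in pts:
        for y in pts:
            for k in range(11):
                a = k / 10
                if f(a * x + (1 - a) * y) > a * f(x) + (1 - a) * f(y) + tol:
                    return False
    return True

pieces = [(1.0, 0.0), (-2.0, 1.0), (0.5, -3.0)]   # arbitrary (a_i, b_i)

def h(x):
    return max(a * x + b for a, b in pieces)

print(sample_check_convex(h, [-3.0, -1.0, 0.0, 2.0, 4.0]))  # True
```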
Optimization Terminology

Let C ⊆ Rn and f : C → R. Consider the following optimization problem
minimize f(x)
subject to x ∈ C
Example: C = {x ∈ Rn | g(x) ≤ 0, x ∈ X}

Terminology:
• The set C is referred to as the feasible set
• We say that the problem is feasible when C is nonempty
• The problem is unconstrained when C = Rn, and it is constrained
otherwise
• We say that a vector x∗ is an optimal solution or a global minimum when
x∗ is feasible and f(x∗) does not exceed f(x) at any x ∈ C, i.e.,

x∗ ∈ C and f(x∗) ≤ f(x) for all x ∈ C
Local Minimum
minimize f(x)
subject to x ∈ C
• A vector x ∈ C is a local minimum for the problem if there is a
ball B(x, r) such that

f(x) ≤ f(y) for all y ∈ C with ‖y − x‖ ≤ r
• Every global minimum is also a local minimum
• When the set C is convex and the function f is convex then a local
minimum is also global
First-Order Necessary Optimality Condition:
Unconstrained Problem

Let f be a differentiable function with dom(f) = Rn and let C = Rn.
• If x is a local minimum of f over Rn, then the following holds:
∇f(x) = 0
• The gradient relation can be equivalently given as:
(y − x)′∇f(x) ≥ 0 for all y ∈ Rn
This is a variational inequality V I(K, F ) with the set K and the
mapping F given by
K = Rn, F (x) = ∇f(x)
• Solving a minimization problem can thus be reduced to solving a
corresponding variational inequality
First-Order Necessary Optimality Condition:
Constrained Problem
Let f be a differentiable function with dom(f) = Rn and let C ⊆ Rn be a
closed convex set.
• If x is a local minimum of f over C, then the following holds:
(y − x)′∇f(x) ≥ 0 for all y ∈ C (1)
Again, this is a variational inequality V I(K, F ) with the set K and
the mapping F given by
K = C, F (x) = ∇f(x)
• Recall that when f is convex, then a local minimum is also global
• When f is convex, the preceding relation is also sufficient for x to be
a global minimum, i.e.,
if x satisfies relation (1), then x is a (global) minimum
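The variational-inequality condition (1) can be tested numerically at a candidate point. A sketch on a toy problem (the problem data, function names, and sampling of C are illustrative):

```python
# Sketch of checking the variational-inequality condition at a candidate
# point: minimize f(x) = x^2 over C = [1, 2]. The minimizer is x* = 1,
# and (y - x*) f'(x*) = 2(y - 1) >= 0 for every y in C.
def satisfies_vi(x_star, grad, C_samples, tol=1e-12):
    """Sampled test of (y - x)' grad f(x) >= 0 for all y in C."""
    return all((y - x_star) * grad(x_star) >= -tol for y in C_samples)

C = [1.0 + 0.1 * k for k in range(11)]         # sample points of C = [1, 2]
print(satisfies_vi(1.0, lambda x: 2 * x, C))   # True: x* = 1 passes
print(satisfies_vi(1.5, lambda x: 2 * x, C))   # False: non-stationary interior point
```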
Equality and Inequality Constrained Problem
Consider the following problem
minimize f(x)
subject to h1(x) = 0, . . . , hp(x) = 0
g1(x) ≤ 0, . . . , gm(x) ≤ 0
where f , hi and gj are continuously differentiable over Rn.
Def. For a feasible vector x, an active set of (inequality) constraints is
the set given by
A(x) = {j | gj(x) = 0}
If j ∉ A(x), we say that the j-th constraint is inactive at x
Def. We say that a vector x is regular if the gradients
∇h1(x), . . . ,∇hp(x), and ∇gj(x) for j ∈ A(x)
are linearly independent

NOTE: x is trivially regular when there are no equality constraints and all the
inequality constraints are inactive [p = 0 and A(x) = ∅]
Lagrangian Function
With the problem
minimize f(x)
subject to h1(x) = 0, . . . , hp(x) = 0
g1(x) ≤ 0, . . . , gm(x) ≤ 0
(2)
we associate the Lagrangian function L(x, λ, µ) defined by
L(x, λ, µ) = f(x) + ∑_{i=1}^p λi hi(x) + ∑_{j=1}^m µj gj(x)
where λi ∈ R for all i, and µj ∈ R+ for all j
First-Order Karush-Kuhn-Tucker (KKT)
Necessary Conditions
Th. Let x be a local minimum of the equality/inequality constrained
problem (2). Also, assume that x is regular. Then, there exist unique
multipliers λ and µ such that
• ∇xL(x, λ, µ) = 0 [L is the Lagrangian function]
• µj ≥ 0 for all j
• µj = 0 for all j ∉ A(x)

The last condition is referred to as the complementarity condition.
Together with µ ≥ 0 and g(x) ≤ 0, it can be written compactly as:

g(x) ⊥ µ
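The KKT conditions can be checked numerically at a candidate pair (x, µ). A hedged sketch on a toy one-dimensional problem (problem data, helper name, and tolerance are illustrative):

```python
# Hedged numeric check of the KKT conditions on a toy problem:
# minimize x^2 subject to g(x) = 1 - x <= 0 (i.e., x >= 1).
# At x* = 1: stationarity 2x - mu = 0 gives mu = 2 >= 0, g(x*) = 0,
# so complementarity mu * g(x*) = 0 holds as well.
def kkt_holds(x, mu, tol=1e-9):
    stationarity = abs(2 * x - mu) <= tol      # d/dx [x^2 + mu (1 - x)] = 0
    dual_feasibility = mu >= -tol
    primal_feasibility = 1 - x <= tol
    complementarity = abs(mu * (1 - x)) <= tol
    return stationarity and dual_feasibility and primal_feasibility and complementarity

print(kkt_holds(1.0, 2.0))  # True
print(kkt_holds(2.0, 4.0))  # False: mu > 0 but g(2) = -1 (complementarity fails)
```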
Second-Order KKT Necessary Conditions
Th. Let x be a local minimum of the equality/inequality constrained
problem (2). Also, assume that x is regular and that f, hi, gj are twice
continuously differentiable. Then, there exist unique multipliers λ and µ
such that
• ∇xL(x, λ, µ) = 0
• µj ≥ 0 for all j
• µj = 0 for all j ∉ A(x)
• For any vector y such that ∇hi(x)′y = 0 for all i and ∇gj(x)′y = 0
for all j ∈ A(x), the following relation holds:
y′∇2xxL(x, λ, µ)y ≥ 0
Solution Procedures: Iterative Algorithms
For solving problems, we will consider iterative algorithms
• Given an initial iterate x0
• We generate a new iterate
xk+1 = Gk(xk)
where Gk is a mapping that depends on the optimization problem
Objectives:
• Provide conditions on the mappings Gk under which the sequence
{xk} converges to a solution of the problem of interest
• Study how fast the sequence {xk} converges:
• Global convergence rate (when far from optimal points)
• Local convergence rate (when near an optimal point)
Gradient Descent Method

Consider a continuously differentiable function f. We want to

minimize f(x) over x ∈ Rn

Gradient descent method:

xk+1 = xk − αk ∇f(xk)

• The scalar αk > 0 is a stepsize
• Stepsize choices: a constant αk = α, a line search, or another stepsize rule
chosen so that f(xk+1) < f(xk)
Convergence Rate:
• Consider the tail of the error sequence e(xk) = dist(xk, X∗):
Local convergence is at best linear,

lim sup_{k→∞} e(xk+1)/e(xk) ≤ q for some q ∈ (0, 1)

• Global convergence is also at best linear
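The iteration above can be sketched in a few lines; on a simple quadratic with a constant stepsize the linear rate is visible directly. All names and the test function are illustrative:

```python
# Minimal gradient descent sketch with a constant stepsize alpha,
# applied to f(x) = (x - 3)^2 (so grad f(x) = 2(x - 3), minimizer x* = 3).
def gradient_descent(grad, x0, alpha, iters):
    x = x0
    for _ in range(iters):
        x = x - alpha * grad(x)     # x_{k+1} = x_k - alpha * grad f(x_k)
    return x

x_final = gradient_descent(lambda x: 2 * (x - 3), x0=0.0, alpha=0.1, iters=200)
print(abs(x_final - 3.0) < 1e-6)  # True: the error shrinks by factor 0.8 per step
```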
Newton’s Method

Consider a twice continuously differentiable function f with Hessian
∇2f(x) ≻ 0 for all x. We want to

minimize f(x) over x ∈ Rn

Newton’s method:

xk+1 = xk − αk [∇2f(xk)]^(−1) ∇f(xk)

Global Convergence Rate (when far from x∗)
• f(x) decreases by at least some γ > 0 at each iteration; therefore, there can be at
most (f(x0) − f∗)/γ such iterations [under some additional conditions on f]
• The globally convergent (damped) phase thus lasts only a finite number of iterations

Local Convergence Rate (near x∗)
• ‖∇f(x)‖ converges to zero quadratically:

‖∇f(xk)‖ ≤ C q^(2^k) for all large enough k

where C > 0 and q ∈ (0, 1)
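A one-dimensional sketch of the iteration, using pure Newton steps (αk = 1) on an illustrative strictly convex function; the error roughly squares each iteration near x∗:

```python
# One-dimensional sketch of Newton's method with pure steps (alpha_k = 1)
# on f(x) = e^x - 2x: f'(x) = e^x - 2, f''(x) = e^x > 0, minimizer x* = ln 2.
import math

def newton(d1, d2, x0, iters):
    """Iterate x_{k+1} = x_k - f'(x_k)/f''(x_k)."""
    x = x0
    for _ in range(iters):
        x = x - d1(x) / d2(x)
    return x

x_star = newton(lambda x: math.exp(x) - 2, math.exp, x0=0.0, iters=6)
print(abs(x_star - math.log(2)) < 1e-12)  # True: quadratic local convergence
```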
Penalty Methods
For solving inequality constrained problems:
minimize f(x)
subject to gj(x) ≤ 0, j = 1, . . . , m
Penalty Approach: Remove the constraints but penalize their violation
Pc: minimize F(x, c) = f(x) + c P(g1(x), . . . , gm(x)) over x ∈ Rn
where c > 0 is a penalty parameter and P is some penalty function
Penalty methods operate in two stages for c and x, respectively
• Choose initial value c0
(1) Having ck, solve the problem Pck to obtain its optimal solution x∗(ck)
(2) Using x∗(ck), update ck to obtain ck+1 and go to step 1
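The two-stage scheme can be sketched on a toy problem where step (1) has a closed form. All problem data and names are illustrative, with the quadratic penalty P(u) = max(0, u)^2 as an example choice:

```python
# Quadratic-penalty sketch for: minimize x^2 subject to g(x) = 1 - x <= 0,
# with P(u) = max(0, u)^2. For x < 1 the penalized problem minimizes
# x^2 + c (1 - x)^2, and setting the derivative to zero gives the
# closed form x*(c) = c / (1 + c), which tends to the true solution 1.
def penalized_min(c):
    return c / (1.0 + c)

for c in [1.0, 10.0, 100.0, 1000.0]:         # stage (2): grow c, re-solve
    print(c, penalized_min(c))
print(abs(penalized_min(1e9) - 1.0) < 1e-6)  # True: x*(c) -> x* = 1
```

Note that for any finite c the penalized minimizer is slightly infeasible (x∗(c) < 1); the constraint is recovered only in the limit.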
Interior-Point Methods
Solve inequality (and more generally) constrained problem:
minimize f(x)
subject to gj(x) ≤ 0, j = 1, . . . , m
The IPM solves a sequence of problems parametrized by t > 0:
minimize f(x) − (1/t) ∑_{j=1}^m ln(−gj(x)) over x ∈ Rn
• Can be viewed as a penalty method with
• Penalty parameter c = 1/t
• Penalty function
P(u1, . . . , um) = − ∑_{j=1}^m ln(−uj)
This function is known as logarithmic barrier or log barrier function
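A toy sketch of the barrier subproblem where the minimizer has a closed form, showing the solutions x(t) approach the constrained optimum as t grows. The problem data and names are illustrative:

```python
# Log-barrier sketch for: minimize f(x) = x subject to g(x) = 1 - x <= 0.
# The barrier subproblem minimizes x - (1/t) ln(x - 1); setting its
# derivative 1 - 1/(t (x - 1)) to zero gives x(t) = 1 + 1/t -> 1 as t grows.
import math

def barrier_obj(x, t):
    return x - (1.0 / t) * math.log(x - 1.0)   # requires -g(x) = x - 1 > 0

def barrier_min(t):
    return 1.0 + 1.0 / t                       # closed-form subproblem minimizer

for t in [1.0, 10.0, 100.0]:
    print(t, barrier_min(t))
print(abs(barrier_min(1e8) - 1.0) < 1e-6)  # True: the solutions tend to x* = 1
```

Unlike the quadratic penalty, the barrier keeps every iterate strictly feasible (x(t) > 1), which is why these are called interior-point methods.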
References for this lecture
The material for this lecture:
• (B) Bertsekas, D.P., Nonlinear Programming
• Chapter 1 and Chapter 3 (descent and Newton’s methods, KKT
conditions)
• (BNO) Bertsekas, Nedic, Ozdaglar, Convex Analysis and Optimization
• Chapter 1 (convex functions)
• Lecture slides for Convex Optimization at
https://netfiles.uiuc.edu/angelia/www/angelia.html
• Lectures 14 and 17 (descent and interior point methods)