Mathematical Programming Study 3 (1975) 1-25. North-Holland Publishing Company

NONDIFFERENTIABLE OPTIMIZATION VIA APPROXIMATION*

Dimitri P. BERTSEKAS, University of Illinois, Urbana, Ill., U.S.A.

Received 8 November 1974; revised manuscript received 11 April 1975

This paper presents a systematic approach for minimization of a wide class of nondifferentiable functions. The technique is based on approximation of the nondifferentiable function by a smooth function and is related to penalty and multiplier methods for constrained minimization. Some convergence results are given and the method is illustrated by means of examples from nonlinear programming.

1. Introduction

Optimization problems with nondifferentiable cost functionals, particularly minimax problems, have received considerable attention recently since they arise naturally in a variety of contexts. Optimality conditions for such problems have been derived by several authors while a number of computational methods have been proposed for their solution (the reader is referred to [1] for a fairly complete list of references up to 1973). Among the computational algorithms currently available are the subgradient methods of [10, 15, 19], the ε-subgradient method [1, 2] coupled with an interesting implementation of the direction finding step given in [12], the minimax methods of [6, 7, 9, 17] which were among the first proposed in the nondifferentiable area, and the recent interesting methods proposed in [5, 20]. While the advances in the area of computational algorithms have been significant, the methods mentioned above are by no means capable of handling all problems encountered in practice since they are often limited in their scope by assumptions such as convexity, cannot handle

* This work was supported in part by the Joint Services Electronics Program (U.S. Army, U.S. Navy and U.S. Air Force) under Contract DAAB-07-72-C-0259, and in part by the U.S. Air Force under Grant AFOSR-73-2570.


nonlinear constraints, or are applicable only to a special class of problems such as minimax problems of particular form. Furthermore, several of these methods are similar in their behavior to either the method of steepest descent or first order methods of feasible directions and converge slowly when faced with relatively ill-conditioned problems. Thus there is considerable room for new methods and approaches for the solution of nondifferentiable problems, and the purpose of this paper is to provide a class of methods which is simple to implement, is quite broad in its scope, and relies on an entirely different philosophy than those underlying the methods already available.

We consider minimization problems of the form

$$\text{minimize } g(x), \qquad \text{subject to } x \in Q \subset R^n, \tag{1}$$

where g is a real-valued function on R^n (n-dimensional Euclidean space). We consider the case where the objective function g is nondifferentiable

exclusively due to the presence of several terms of the form

$$\gamma[f_i(x)] = \max\{0, f_i(x)\}, \qquad i \in I, \tag{2}$$

where {f_i : i ∈ I} is an arbitrary collection of real-valued functions on R^n. By this we mean that if the terms γ(·) in the functional expression of g were replaced by some continuously differentiable functions γ̃(·), then the resulting function would also be everywhere continuously differentiable.

For purposes of easy reference we shall call a term of the form (2) a simple kink. It should be emphasized that while we concentrate attention on simple kinks, the approach is quite general since we do not necessarily require that the functions f_i in (2) are differentiable but rather we allow them to contain in their functional expressions other simple kinks. In this way some other kinds of nondifferentiable terms, such as for example terms of the form

$$\max\{f_1(x), \ldots, f_m(x)\} \tag{3}$$

can be expressed in terms of simple kinks by writing

$$\max\{f_1, \ldots, f_m\} = f_1 + \gamma\bigl[f_2 - f_1 + \gamma\bigl[\cdots \gamma[f_{m-1} - f_{m-2} + \gamma[f_m - f_{m-1}]]\cdots\bigr]\bigr]. \tag{4}$$
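The identity (4) is easy to verify numerically. The following small check (a Python sketch, not part of the original paper; the test values are arbitrary) compares the nested-kink expression with a direct evaluation of the maximum.

```python
# Numerical check of identity (4): max{f1,...,fm} expressed through nested simple kinks.
import random

def gamma(t):
    """Simple kink: gamma(t) = max{0, t}."""
    return max(0.0, t)

def nested_max(values):
    """Right-hand side of (4): f1 + gamma[f2 - f1 + gamma[... + gamma[fm - f_{m-1}]...]]."""
    inner = 0.0
    for i in range(len(values) - 1, 0, -1):   # build the nesting from the innermost term outward
        inner = gamma(values[i] - values[i - 1] + inner)
    return values[0] + inner

random.seed(0)
for _ in range(1000):
    vals = [random.uniform(-5.0, 5.0) for _ in range(4)]
    assert abs(nested_max(vals) - max(vals)) < 1e-12
```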

Since there are no restrictions on the manner in which simple kinks enter in the functional expression of g, a little reflection should convince the


reader that the class of nondifferentiable problems that we are considering is indeed quite broad.

The basic idea of our approach for numerical solution of problems of the form (1) is to approximate every simple kink in the functional expression of g by a smooth function and solve the resulting differentiable problem by conventional methods. In this way an approximate solution of problem (1) will be obtained which hopefully converges to an exact solution as the approximation of the simple kinks becomes more and more accurate.

While, as will be explained shortly, other approximation methods are possible, we shall concentrate on the following two-parameter approximation γ̃[f(x), y, c] of a simple kink γ[f(x)]:

$$\tilde\gamma[f(x), y, c] = \begin{cases} f(x) - (1-y)^2/2c & \text{if } (1-y)/c \le f(x), \\[2pt] y f(x) + \tfrac{1}{2} c [f(x)]^2 & \text{if } -y/c \le f(x) \le (1-y)/c, \\[2pt] -y^2/2c & \text{if } f(x) \le -y/c, \end{cases} \tag{5}$$

where y and c are parameters with

$$0 \le y \le 1, \qquad 0 < c. \tag{6}$$

If the function f is differentiable, then the function γ̃[f(x), y, c] above is also differentiable with respect to x. Its gradient is given by

$$\nabla\tilde\gamma[f(x), y, c] = \begin{cases} \nabla f(x) & \text{if } (1-y)/c \le f(x), \\[2pt] [y + c f(x)]\,\nabla f(x) & \text{if } -y/c \le f(x) \le (1-y)/c, \\[2pt] 0 & \text{if } f(x) \le -y/c. \end{cases} \tag{7}$$

The functional form of γ̃ is depicted in Fig. 1. It may be seen that

$$\tilde\gamma(t, y, c) \le \gamma(t) \le \tilde\gamma(t, y, c) + (1/2c)\max\{y^2, (1-y)^2\} \le \tilde\gamma(t, y, c) + 1/2c \qquad \text{for all } t \in R. \tag{8}$$

Thus the parameter c controls the accuracy of the approximation. The parameter y determines whether the approximation is more accurate for positive or negative values of the argument t. Thus for the extreme case y = 1, the approximation to γ(t) is exact for 0 ≤ t, while for y = 0 the approximation is exact for t ≤ 0.
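As an illustration, the sketch below (Python, not part of the paper) implements γ̃(t, y, c) of (5) and its derivative with respect to t (cf. (7) with f(x) = t), and checks the bound (8) on a grid of points.

```python
# The two-parameter approximation (5) of gamma(t) = max{0, t}, its derivative (cf. (7)),
# and a grid check of the uniform error bound (8).
import numpy as np

def gamma(t):
    return np.maximum(0.0, t)

def gamma_tilde(t, y, c):
    """Smooth approximation (5); exact for t >= 0 when y = 1 and for t <= 0 when y = 0."""
    return np.where(t >= (1.0 - y) / c, t - (1.0 - y) ** 2 / (2.0 * c),
                    np.where(t <= -y / c, -(y ** 2) / (2.0 * c),
                             y * t + 0.5 * c * t ** 2))

def gamma_tilde_prime(t, y, c):
    """Derivative of (5) with respect to t."""
    return np.where(t >= (1.0 - y) / c, 1.0,
                    np.where(t <= -y / c, 0.0, y + c * t))

t = np.linspace(-3.0, 3.0, 20001)
for y in (0.0, 0.3, 1.0):
    for c in (1.0, 10.0, 100.0):
        err = gamma(t) - gamma_tilde(t, y, c)
        # Bound (8): 0 <= gamma - gamma_tilde <= max{y^2, (1-y)^2}/(2c) <= 1/(2c).
        assert err.min() >= -1e-12
        assert err.max() <= max(y ** 2, (1.0 - y) ** 2) / (2.0 * c) + 1e-12
```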

Let us now formally describe the approximation procedure for solving problem (1), where we assume that the nondifferentiability of g is exclusively due to the presence of terms

$$\gamma[f_i(x)] = \max\{0, f_i(x)\}, \qquad i \in I, \tag{9}$$

and I is an arbitrary index set.


Fig. 1. The simple kink γ(t) and the two-parameter approximation γ̃(t, y, c).

Given parameters c_k, y_k^i, i ∈ I with c_k > 0, 0 ≤ y_k^i ≤ 1, replace each term γ[f_i(x)], i ∈ I in the functional expression of g by γ̃[f_i(x), y_k^i, c_k] to obtain a function g_k and solve the problem

$$\text{minimize } g_k(x), \qquad \text{subject to } x \in Q \subset R^n. \tag{10}$$

If x_k is a solution of the above problem, update c by setting c_{k+1} = βc_k, where β > 1, and update the multipliers y_k^i in some fashion to obtain y_{k+1}^i, with 0 ≤ y_{k+1}^i ≤ 1, i ∈ I. Solve the problem

$$\text{minimize } g_{k+1}(x), \qquad \text{subject to } x \in Q \subset R^n \tag{11}$$

to obtain a solution x_{k+1} and repeat the procedure; a minimal sketch of this outer loop is given below.
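A minimal implementation of this loop might look as follows (Python; SciPy's BFGS is used here as a stand-in for whatever differentiable-minimization routine one prefers, and the particular g and f_i are an arbitrary illustration, not an example from the paper).

```python
# Sketch of the sequential approximation procedure (10)-(11): replace each simple kink
# gamma[f_i(x)] by gamma_tilde[f_i(x), y^i, c_k], minimize the smooth function, increase c_k.
import numpy as np
from scipy.optimize import minimize

def gamma_tilde(t, y, c):
    if t >= (1.0 - y) / c:
        return t - (1.0 - y) ** 2 / (2.0 * c)
    if t <= -y / c:
        return -(y ** 2) / (2.0 * c)
    return y * t + 0.5 * c * t * t

# Illustrative problem: minimize (x1 - 1)^2 + (x2 + 2)^2 + 3*gamma[x1] + 3*gamma[x2].
f = [lambda x: x[0], lambda x: x[1]]

def g_k(x, y, c):
    """The smoothed objective g_k of (10), with each kink replaced by gamma_tilde."""
    return ((x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2
            + 3.0 * gamma_tilde(f[0](x), y[0], c)
            + 3.0 * gamma_tilde(f[1](x), y[1], c))

x = np.array([5.0, 5.0])      # starting point
y = np.array([0.0, 0.0])      # multipliers y^i, held fixed here (any updating rule is allowed)
c = 1.0
for k in range(8):
    res = minimize(lambda z: g_k(z, y, c), x, method="BFGS")
    x = res.x                 # x_k, used to start the next minimization
    c *= 5.0                  # c_{k+1} = beta * c_k with beta > 1
    print(k, x, res.fun)
```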

It is important to note that the choice of the approximation (5) is by no means arbitrary; in fact it is closely related to penalty and multiplier methods for constrained minimization (see e.g. [3, 4, 8, 11, 16, 18]). By


introducing an auxiliary variable z, a simple kink may be written as

$$\gamma[f(x)] = \min_{f(x) \le z,\; 0 \le z} z. \tag{12}$$

By using a quadratic penalty function, the minimization problem above may be approximated by

$$\min_{0 \le z}\; \bigl[z + \tfrac{1}{2} c\,(\max\{0, f(x) - z\})^2\bigr], \qquad c > 0. \tag{13}$$

Carrying out the minimization with respect to z, the expression above is equal to

$$\tilde\gamma[f(x), 0, c] = \begin{cases} f(x) - 1/2c & \text{if } 1/c \le f(x), \\[2pt] \tfrac{1}{2} c [f(x)]^2 & \text{if } 0 \le f(x) \le 1/c, \\[2pt] 0 & \text{if } f(x) \le 0. \end{cases}$$

If we use the generalized quadratic penalty function of the method of multipliers [4, 18], the minimization problem in (12) may be approximated by the problem

$$\min_{0 \le z}\; \bigl[z + (1/2c)\bigl[(\max\{0,\, y + c[f(x) - z]\})^2 - y^2\bigr]\bigr], \qquad 0 < c,\; 0 \le y \le 1. \tag{14}$$

Again by carrying out the minimization explicitly, the expression above is equal to γ̃[f(x), y, c] as given by (5). Notice that we limit the range of the multiplier y to the interval [0, 1] since one may prove that the Lagrange multiplier for problem (12) lies in that interval.
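The equality of (14) and (5) can also be checked numerically; the sketch below (Python, not part of the paper) minimizes the expression in (14) over z ≥ 0 on a fine grid and compares the result with the closed form (5).

```python
# Check that min over z >= 0 of  z + (1/2c)[(max{0, y + c(f - z)})^2 - y^2]  (expression (14))
# coincides with the closed form gamma_tilde(f, y, c) of (5).
import numpy as np

def gamma_tilde(t, y, c):
    if t >= (1.0 - y) / c:
        return t - (1.0 - y) ** 2 / (2.0 * c)
    if t <= -y / c:
        return -(y ** 2) / (2.0 * c)
    return y * t + 0.5 * c * t * t

zgrid = np.linspace(0.0, 10.0, 400001)      # crude grid minimization over z >= 0
for y in (0.0, 0.25, 0.5, 1.0):
    for c in (0.5, 2.0, 10.0):
        for fval in (-3.0, -0.1, 0.0, 0.2, 4.0):
            vals = zgrid + (1.0 / (2.0 * c)) * (
                np.maximum(0.0, y + c * (fval - zgrid)) ** 2 - y ** 2)
            assert abs(vals.min() - gamma_tilde(fval, y, c)) < 1e-3
```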

The interpretation of the approximation procedure in terms of penalty and multiplier methods is very valuable for a number of reasons. First it provides guidelines for approximation of simple kinks by a wide variety of functions. Every penalty function suitable for a penalty or multiplier method yields an approximating function via the procedure described above. A wide class of such functions is given in [13, 14]. Many of these functions yield twice differentiable approximating functions, a property which may be desirable from the computational point of view. Second, our interpretation reveals that we may expect that the basic attributes of the behavior of penalty and multiplier methods will also be present in our approximation procedures. Thus we may expect ill-conditioning for large values of the parameter c. This fact necessitates sequential operation of the approximation method, i.e., repetitive solution of the approximate problem for ever increasing values of the parameter c. Third, the interpretation motivates us to consider an updating procedure for the parameters y which


is analogous to the one used in the method of multipliers. This updating procedure results in most cases in significant improvements in computational efficiency.

The updating formula for the multipliers y_k^i, which closely parallels the one of the method of multipliers and which will be discussed in some detail in this paper, is given by

$$y_{k+1}^i = \begin{cases} 1 & \text{if } 1 \le y_k^i + c_k f_i(x_k), \\[2pt] y_k^i + c_k f_i(x_k) & \text{if } 0 \le y_k^i + c_k f_i(x_k) \le 1, \\[2pt] 0 & \text{if } y_k^i + c_k f_i(x_k) \le 0. \end{cases} \tag{15}$$

One heuristic way to justify (15) is based on the following observation. If we assume for a moment that x_k → x̄, where x̄ is an optimal solution of (1), and for some k we have (1 − y_k^i)/c_k < f_i(x_k), this indicates that likely there holds 0 < f_i(x̄) and therefore it is better to approximate the simple kink γ[f_i(x)] more accurately for positive values of f_i(x) rather than for negative values. The iteration (15) accomplishes precisely that by setting y_{k+1}^i = 1 (cf. (5)). Similarly, for the other cases f_i(x_k) ≤ −y_k^i/c_k and −y_k^i/c_k ≤ f_i(x_k) ≤ (1 − y_k^i)/c_k, the iteration (15) may be viewed as adaptively adjusting the accuracy of approximation of the simple kink γ(t) from positive to negative values of the scalar argument t and vice versa as the circumstances dictate. A more rigorous and satisfying justification for the employment of (15), together with a clarification of the connection with the method of multipliers, is provided in Section 4.
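In code, (15) is simply a clipping of y_k^i + c_k f_i(x_k) to the interval [0, 1]; the sketch below (Python, not part of the paper, with made-up numbers) is one way to write it.

```python
# Multiplier update (15): clip y_k^i + c_k * f_i(x_k) to the interval [0, 1].
import numpy as np

def update_multipliers(y_k, c_k, f_vals):
    """y_k and f_vals hold y_k^i and f_i(x_k); returns y_{k+1} according to (15)."""
    return np.clip(np.asarray(y_k) + c_k * np.asarray(f_vals), 0.0, 1.0)

# A kink whose argument is clearly positive at x_k drives its multiplier toward 1,
# a clearly negative one drives it toward 0, tightening the approximation on that side.
print(update_multipliers(y_k=[0.5, 0.5], c_k=4.0, f_vals=[0.3, -0.3]))   # -> [1.0, 0.0]
```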

Since the class of problems of the form (1) is extremely rich in variety, it is very difficult to provide a convergence and rate of convergence analysis for the general case. The utter impossibility of providing a unifying notational description for the general case of problem (1) is one of the main obstacles here. For this reason we shall restrict ourselves to the specific class of problems

$$\text{minimize } g\bigl[x, \gamma[f_1(x)], \ldots, \gamma[f_m(x)]\bigr], \qquad \text{subject to } x \in Q \subset R^n, \tag{16}$$

where g, f_1, ..., f_m are continuously differentiable functions. Notice that this class of problems includes the problem of solving systems of equations of the form

$$h_i\bigl[x, \gamma[f_1(x)], \ldots, \gamma[f_m(x)]\bigr] = 0, \qquad i = 1, \ldots, n$$

by means of the minimization problem


$$\text{minimize } \sum_{i=1}^{n} \bigl| h_i\bigl[x, \gamma[f_1(x)], \ldots, \gamma[f_m(x)]\bigr]\bigr|.$$

Our results also apply with simple modifications in statement and proof to the class of problems

$$\text{minimize } g\bigl[x, \max\{f_1^1(x), \ldots, f_1^{s_1}(x)\}, \ldots, \max\{f_m^1(x), \ldots, f_m^{s_m}(x)\}\bigr], \qquad \text{subject to } x \in Q \subset R^n. \tag{17}$$

On the other hand, our analysis can serve as the prototype for the analysis of other different or more general cases and provides a measure of what kind of behavior one may expect from the approximation methods that we propose.

The paper is organized as follows: In the next section we prove the basic convergence results for our approximation methods. Section 3 shows that one may obtain, as a byproduct of the computation, quantities which play a role analogous to Lagrange multipliers in constrained minimization. We also show that our convergence results may be used to obtain some optimality conditions for the problem that we are considering. Section 4 examines the possibility of acceleration of convergence by using iteration (15). The connection with the method of multipliers is clarified and some convergence and rate of convergence results are inferred. Finally in Section 5 we present some computational results.

2. Some convergence results

Consider problem (16), where we adopt the standing assumption that the set Q is nonempty and that the functions

g :R"+m~ R, f : R " ~ R, i = 1,. . . ,m

are everywhere continuously differentiable. We denote by ∇_x g the (column) vector of the first n partial derivatives of g, while we denote by ∂g/∂t_i, i = 1, ..., m, the partial derivative of g with respect to its (n + i)th argument (i.e., the argument occupied by γ[f_i(x)]).

Consider now the kth approximate minimization problem

$$\min_{x \in Q}\; g\bigl[x, \tilde\gamma[f_1(x), y_k^1, c_k], \ldots, \tilde\gamma[f_m(x), y_k^m, c_k]\bigr], \tag{18}$$

where

$$0 < c_k \le c_{k+1}, \qquad c_k \to \infty, \qquad 0 \le y_k^i \le 1, \quad i = 1, \ldots, m, \quad k = 0, 1, \ldots$$


and the approximate kink γ̃[f_i(x), y_k^i, c_k] is given by (5). Any rule may be used for updating y_k^i; for example, y_k^i may be left constant. Let x_k be an optimal solution of problem (18) (assuming one exists). We have the following basic proposition:

Proposition 2.1. There holds

$$\bigl| g^* - g[x_k, \gamma[f_1(x_k)], \ldots, \gamma[f_m(x_k)]] \bigr| \le L/c_k, \qquad k = 0, 1, \ldots, \tag{19}$$

where

$$g^* = \inf_{x \in Q}\; g\bigl[x, \gamma[f_1(x)], \ldots, \gamma[f_m(x)]\bigr], \tag{20}$$

$$L = \sup_{(x, t_1, \ldots, t_m) \in M}\; \sum_{i=1}^{m} \Bigl|\frac{\partial g}{\partial t_i}(x, t_1, \ldots, t_m)\Bigr|, \tag{21}$$

$$M = \bigl\{(x, t_1, \ldots, t_m): x \in Q,\; \gamma[f_i(x)] - 1/2c_0 \le t_i \le \gamma[f_i(x)],\; i = 1, \ldots, m\bigr\}, \tag{22}$$

provided L above is finite.

Proof. By Taylor's formula, (8), and (21), (22) we have for every x ∈ Q,

$$\bigl| g[x, \gamma[f_1(x)], \ldots, \gamma[f_m(x)]] - g[x, \tilde\gamma[f_1(x), y_k^1, c_k], \ldots, \tilde\gamma[f_m(x), y_k^m, c_k]] \bigr| \le L/2c_k, \tag{23}$$

from which the result follows.

As a direct consequence of the proposition above we have the following convergence results.

Corollary 2.2. Let Q be a bounded set. Then

$$\lim_{k \to \infty}\; g\bigl[x_k, \gamma[f_1(x_k)], \ldots, \gamma[f_m(x_k)]\bigr] = g^*. \tag{24}$$

Proof. The boundedness of Q implies that L as given by (21) is finite. Hence by (19) and the fact c_k → ∞ the result follows.

Corollary 2.3. Let g have the particular form

$$g\bigl[x, \gamma[f_1(x)], \ldots, \gamma[f_m(x)]\bigr] = g_0(x) + \sum_{i=1}^{m} \gamma[f_i(x)], \qquad g_0: R^n \to R.$$

Then

$$\bigl| g^* - g[x_k, \gamma[f_1(x_k)], \ldots, \gamma[f_m(x_k)]] \bigr| \le m/c_k.$$

Proof. Immediate from (23), (21) and (19).

The above corollary is interesting from a computational point of view since it shows that for the problem above there are available a priori bounds on the approximation error.

Corollary 2.4. Let x̄ be any limit point of the sequence {x_k} and let Q be a closed set. Then

$$g\bigl[\bar x, \gamma[f_1(\bar x)], \ldots, \gamma[f_m(\bar x)]\bigr] = \min_{x \in Q}\; g\bigl[x, \gamma[f_1(x)], \ldots, \gamma[f_m(x)]\bigr],$$

i.e., x̄ is an optimal solution of problem (16).

Proof. Without loss of generality assume that the whole sequence {x_k} converges to x̄ and let S be any closed sphere containing the sequence {x_k}. Clearly each vector x_k is an optimal solution of the problem

$$\min_{x \in Q \cap S}\; g\bigl[x, \tilde\gamma[f_1(x), y_k^1, c_k], \ldots, \tilde\gamma[f_m(x), y_k^m, c_k]\bigr]$$

and since Q ∩ S is bounded, by Corollary 2.2 and (23) we have

$$\lim_{k \to \infty}\; g\bigl[x_k, \tilde\gamma[f_1(x_k), y_k^1, c_k], \ldots, \tilde\gamma[f_m(x_k), y_k^m, c_k]\bigr] = g\bigl[\bar x, \gamma[f_1(\bar x)], \ldots, \gamma[f_m(\bar x)]\bigr] = \min_{x \in Q \cap S}\; g\bigl[x, \gamma[f_1(x)], \ldots, \gamma[f_m(x)]\bigr].$$

It remains to show that the minimum of g over Q ∩ S above is equal to the minimum of g over Q. But this follows from the fact that S can be any closed sphere containing {x_k} and hence it can have an arbitrarily large radius.

Notice that the proposition states that every limit point of {x_k} is a minimizing point of problem (16), but does not guarantee the existence of at least one limit point. This is to be expected since problem (16) may not have a solution. On the other hand the existence of a solution, as well as of at least one limit point of {x_k}, is guaranteed when Q is a compact set.

The above proposition and corollaries establish the validity of the approximation procedure. One may notice that the proofs are very simple and rest on the fact that the function γ̃[f_i(x), y_k^i, c_k] approximates uniformly the simple kink γ[f_i(x)] with an approximation error at most equal to 1/2c_k, as shown by (8). It is interesting to observe that convergence does not


depend on the particular values y_k^i, i ∈ I employed. This allows a great deal of freedom in adjusting y_k^i for the purpose of accelerating convergence.

Since our procedures are related to penalty methods one expects that they must yield, as a by-product of the computation, quantities which may be viewed as Lagrange multipliers. In the next section we show that such multipliers may indeed be obtained. Furthermore we show that these multipliers enter in optimality conditions which, aside from their analytical value, may serve as a basis for termination of our approximation procedure.

3. Multiplier convergence and conditions for optimality

Let us assume throughout this section that Q is a closed convex set. Using (9), the gradient with respect to x of the objective function in problem (18) may be calculated to be

$$\nabla g\bigl[x, \tilde\gamma[f_1(x), y_k^1, c_k], \ldots, \tilde\gamma[f_m(x), y_k^m, c_k]\bigr] = \nabla_x g + \sum_{i=1}^{m} \frac{\partial g}{\partial t_i}\, \tilde y_k^i(x)\, \nabla f_i, \tag{25}$$

where ỹ_k^i(x), i = 1, ..., m is given by

$$\tilde y_k^i(x) = \begin{cases} 1 & \text{if } 1 \le y_k^i + c_k f_i(x), \\[2pt] y_k^i + c_k f_i(x) & \text{if } 0 \le y_k^i + c_k f_i(x) \le 1, \\[2pt] 0 & \text{if } y_k^i + c_k f_i(x) \le 0, \end{cases} \tag{26}$$

and all gradients in the right-hand side of (25) are evaluated at the point x. Since Q is a convex set and x_k is an optimal solution of problem (18)

we have the necessary condition

$$\Bigl\langle \Bigl(\nabla_x g + \sum_{i=1}^{m} \frac{\partial g}{\partial t_i}\, \tilde y_k^i(x_k)\, \nabla f_i\Bigr)\Big|_{x = x_k},\; (x - x_k) \Bigr\rangle \ge 0 \qquad \text{for all } x \in Q, \tag{27}$$

where ⟨·, ·⟩ denotes the usual inner product in R^n. Let now {x_k}_{k ∈ K} be a subsequence of {x_k} which converges to x̄. By Corollary 2.4, x̄ is an optimal solution of problem (16). In addition, the subsequence {ỹ_k} = {(ỹ_k^1(x_k), ..., ỹ_k^m(x_k))}_{k ∈ K} defined by (26) has at least one limit point and, by taking limits in (27), each of its limit points ȳ = (ȳ^1, ..., ȳ^m) must satisfy


$$\Bigl\langle \Bigl(\nabla_x g + \sum_{i=1}^{m} \frac{\partial g}{\partial t_i}\, \bar y^i\, \nabla f_i\Bigr)\Big|_{x = \bar x},\; (x - \bar x) \Bigr\rangle \ge 0 \qquad \text{for all } x \in Q.$$

Combining the above observations with Corollary 2.4 we obtain the following proposition:

Proposition 3.1. Let Q be a closed convex set, and let (x̄, ȳ) be any limit point of the sequence {x_k, ỹ_k}, where ỹ_k = (ỹ_k^1(x_k), ..., ỹ_k^m(x_k)) is defined by (26). Then x̄ is an optimal solution of problem (16), and ȳ is a multiplier vector satisfying

$$\Bigl\langle \Bigl(\nabla_x g + \sum_{i=1}^{m} \frac{\partial g}{\partial t_i}\, \bar y^i\, \nabla f_i\Bigr)\Big|_{x = \bar x},\; (x - \bar x) \Bigr\rangle \ge 0 \qquad \text{for all } x \in Q; \tag{28}$$

$$\bar y^i = 0 \ \text{ if } f_i(\bar x) < 0, \qquad \bar y^i = 1 \ \text{ if } f_i(\bar x) > 0, \qquad 0 \le \bar y^i \le 1 \ \text{ if } f_i(\bar x) = 0.$$

Proposition 3.1 together with Proposition 2.1 and its corollaries may be used to provide a simple proof of an optimality condition for problem (16).

We shall say that x̄ is a local minimum for problem (16) if x̄ is a minimizing point of g[x, γ[f_1(x)], ..., γ[f_m(x)]] over a set of the form Q ∩ {x: |x − x̄| ≤ ε}, where |·| denotes the Euclidean norm and ε > 0 is some positive scalar. If x̄ is a unique minimizing point over Q ∩ {x: |x − x̄| ≤ ε}, we shall say that x̄ is an isolated local minimum.

Proposition 3.2. Let Q be a closed convex set and x̄ be an isolated local minimum for problem (16). Then there exists a multiplier vector ȳ = (ȳ^1, ..., ȳ^m) satisfying

$$\Bigl\langle \Bigl(\nabla_x g + \sum_{i=1}^{m} \frac{\partial g}{\partial t_i}\, \bar y^i\, \nabla f_i\Bigr)\Big|_{x = \bar x},\; (x - \bar x) \Bigr\rangle \ge 0 \qquad \text{for all } x \in Q, \tag{29}$$

$$\bar y^i = 0 \quad \text{if } f_i(\bar x) < 0, \qquad i = 1, \ldots, m, \tag{30}$$

$$\bar y^i = 1 \quad \text{if } f_i(\bar x) > 0, \qquad i = 1, \ldots, m, \tag{31}$$

$$0 \le \bar y^i \le 1 \quad \text{if } f_i(\bar x) = 0, \qquad i = 1, \ldots, m. \tag{32}$$

Proof. Let Q̄ be a set of the form Q ∩ {x: |x − x̄| ≤ ε} within which x̄ is a unique minimum of g. Consider the approximation procedure for the problem

$$\min_{x \in \bar Q}\; g\bigl[x, \gamma[f_1(x)], \ldots, \gamma[f_m(x)]\bigr].$$


The generated sequence {x_k} is well defined since Q̄ is compact. Furthermore, since x̄ is the unique minimizing point of g within Q̄, we have x_k → x̄ and {ỹ_k}_{k ∈ K} → ȳ for some vector ȳ ∈ R^m and some subsequence {ỹ_k}_{k ∈ K}, where ỹ_k is defined as in Proposition 3.1. Then (29) follows from (28). The relations (30), (31) and (32) follow directly from the definition of ỹ_k and the fact x_k → x̄.

We note that when Q = R", the above proposition yields the stationarity condition

$$\Bigl(\nabla_x g + \sum_{i=1}^{m} \frac{\partial g}{\partial t_i}\, \bar y^i\, \nabla f_i\Bigr)\Big|_{x = \bar x} = 0. \tag{33}$$

The above condition and in some cases the more general condition of Proposition 3.2 may be used as a basis for termination criteria of the approximation procedure.
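For Q = R^n such a termination test can be assembled from quantities already available during the kth minimization. The sketch below (Python, not part of the paper; the numerical values are hypothetical and the gradients are assumed to be supplied by the user) forms the multiplier estimates (26) and the residual of the stationarity condition (33).

```python
# Multiplier estimates (26) and the residual of the stationarity condition (33) for Q = R^n.
import numpy as np

def multiplier_estimates(y_k, c_k, f_vals):
    """y_tilde_k^i(x) of (26): clip(y_k^i + c_k * f_i(x), 0, 1)."""
    return np.clip(np.asarray(y_k) + c_k * np.asarray(f_vals), 0.0, 1.0)

def stationarity_residual(grad_x_g, dg_dt, grad_f, y_tilde):
    """Norm of grad_x g + sum_i (dg/dt_i) * y_tilde^i * grad f_i, cf. (25) and (33)."""
    r = np.asarray(grad_x_g, dtype=float).copy()
    for dgi, yi, gfi in zip(dg_dt, y_tilde, grad_f):
        r += dgi * yi * np.asarray(gfi, dtype=float)
    return np.linalg.norm(r)

# Hypothetical numbers, for illustration only:
y_tilde = multiplier_estimates(y_k=[0.2, 0.9], c_k=10.0, f_vals=[0.05, -0.2])
res = stationarity_residual(grad_x_g=[0.1, -0.3], dg_dt=[1.0, 2.0],
                            grad_f=[[0.0, 0.4], [0.05, 0.0]], y_tilde=y_tilde)
print(y_tilde, res)   # stop the outer iteration once res falls below a tolerance
```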

We note that necessary conditions similar to the one of Proposition 3.2 may also be proved in an analogous manner for problem (17) as well as for many other problems which are similar in nature and are amenable to the same type of analysis as the one presented for problem (16).

4. Acceleration by multiplier iterations

In this section we examine an updating procedure for the multiplier vectors y_k which in many cases can greatly improve the computational efficiency of the approximation method. We consider the case where the approximation procedure is operated sequentially and the multipliers y_k^i, i = 1, ..., m used in the approximations are updated by means of the iteration

$$y_{k+1}^i = \begin{cases} 1 & \text{if } 1 \le y_k^i + c_k f_i(x_k), \\[2pt] y_k^i + c_k f_i(x_k) & \text{if } 0 \le y_k^i + c_k f_i(x_k) \le 1, \\[2pt] 0 & \text{if } y_k^i + c_k f_i(x_k) \le 0. \end{cases} \tag{34}$$

A heuristic justification for this iteration was given in the introduction, where we also mentioned its connection with iterations for the method of multipliers. We now concentrate on clarifying this latter connection further. Some familiarity with the method of multipliers is required on the part of the reader for the purpose of following the discussion.

Consider the following simple special case of problem (16)


$$\text{minimize}\; \Bigl\{ g_0(x) + \sum_{i=1}^{m} \alpha_i\, \gamma[f_i(x)] \Bigr\},$$

where α_i, i = 1, ..., m are some positive scalars. By introducing additional variables z_1, ..., z_m this problem is equivalent to the problem

$$\min_{\substack{\alpha_i f_i(x) \le z_i \\ 0 \le z_i}}\; \Bigl\{ g_0(x) + \sum_{i=1}^{m} z_i \Bigr\}.$$

The problem above may be solved by the method of multipliers with a quadratic penalty function as described for example in [4, 13, 14, 18]. One may either eliminate only the constraints α_i f_i(x) ≤ z_i by means of a generalized penalty or eliminate both the constraints α_i f_i(x) ≤ z_i and 0 ≤ z_i. It is possible to show that both approaches lead to identical results in our case. The reader may easily verify that our approximation procedure coupled with iteration (34) is in fact equivalent to solving the minimization problem above by the method of multipliers mentioned earlier. Our approximation procedure however is not equivalent to the method of multipliers for problems which do not have the simple form above, although there is a certain relation which we now proceed to discuss.

Let Q = R" and let 2 be a unique (isolated) local minimum of problem (16) within some open sphere S(2; e) = {x: Ix - 21 < 5}. Let us use the notation

$$I^+ = \{i: f_i(\bar x) > 0,\; i = 1, 2, \ldots, m\}, \qquad I^- = \{i: f_i(\bar x) < 0,\; i = 1, 2, \ldots, m\}, \qquad I^0 = \{i: f_i(\bar x) = 0,\; i = 1, 2, \ldots, m\}.$$

Assume that ε > 0 is taken sufficiently small to guarantee that

$$f_i(x) > 0 \ \text{ for all } x \in S(\bar x; \varepsilon),\; i \in I^+, \qquad f_i(x) < 0 \ \text{ for all } x \in S(\bar x; \varepsilon),\; i \in I^-.$$

Let us first consider the case where the objective function g in problem (16) has the particular form

$$g\bigl[x, \gamma[f_1(x)], \ldots, \gamma[f_m(x)]\bigr] = g_0(x) + \sum_{i=1}^{m} g_i(x)\, \gamma[f_i(x)], \tag{35}$$

where g~: R" ~ R, i = 0 . . . . , m, are continuously differentiable functions. Now if we make the assumption

$$g_i(\bar x) \ne 0 \qquad \text{for all } i \in I^0, \tag{36}$$


we have that, when g has the form (35), problem (16), locally within a neighborhood of x̄, may be written as

$$\min\; \Bigl\{ g_0(x) + \sum_{i \in I^+} g_i(x) f_i(x) + \sum_{i \in I^{0+}} \max\bigl[0,\, g_i(x) f_i(x)\bigr] + \sum_{i \in I^{0-}} \min\bigl[0,\, g_i(x) f_i(x)\bigr] \Bigr\}, \tag{37}$$

where

$$I^{0+} = \{i: g_i(\bar x) > 0,\; f_i(\bar x) = 0\}, \qquad I^{0-} = \{i: g_i(\bar x) < 0,\; f_i(\bar x) = 0\}.$$

Since x̄ is an isolated local minimum of the above problem, it follows under the mild assumption

$$\nabla\bigl[g_i(x) f_i(x)\bigr]\Big|_{x = \bar x}, \quad i \in I^0, \ \text{ are linearly independent vectors} \tag{38}$$

that the set I^{0-} is empty and we have

$$g_i(\bar x) > 0 \qquad \text{for all } i \in I^0. \tag{39}$$

Notice that the previous assumption (36) is implied by assumption (39). This fact can be verified by noting that x̄ is an optimal solution of the problem

$$\min_{\substack{g_i(x) f_i(x) \le z_i,\; 0 \le z_i, \\ i \in I^{0+}}}\; \Bigl\{ g_0(x) + \sum_{i \in I^+} g_i(x) f_i(x) + \sum_{i \in I^{0+}} z_i + \sum_{i \in \tilde I^{0-}} g_i(x) f_i(x) \Bigr\} \tag{40}$$

for every subset Ĩ^{0-} of the set I^{0-}. By applying the Kuhn-Tucker theorem to problem (40) and using (38) it follows that the set I^{0-} must be empty, i.e., (39) holds.

The basic conclusion from the preceding analysis is that, assuming (38), the problem of minimizing (35) is equivalent, locally around x̄, to the nonlinear programming problem

$$\min_{\substack{g_i(x) f_i(x) \le z_i,\; 0 \le z_i, \\ i \in I^0}}\; \Bigl\{ g_0(x) + \sum_{i \in I^+} g_i(x) f_i(x) + \sum_{i \in I^0} z_i \Bigr\}. \tag{41}$$

At this point we deviate somewhat from our main subject and discuss briefly a constrained minimization method which is identical to the method of multipliers as described for example in [4, 18] except for the fact that the penalty parameter may depend on the vector x. It turns out that this method is closely related to our approximation procedure.


Consider a constrained minimization problem of the form

$$\min_{\substack{p_i(x) \le 0 \\ i = 1, \ldots, m}}\; p_0(x), \tag{42}$$

where Po, Pi :R"--' R. The method of multipliers consists of sequential unconstrained minimization of the function

$$p_0(x) + \sum_{i=1}^{m} \frac{1}{2c_k}\Bigl\{\bigl[\max\bigl(0,\, y_k^i + c_k\, p_i(x)\bigr)\bigr]^2 - (y_k^i)^2\Bigr\}, \tag{43}$$

where {c_k} is a sequence of positive numbers and the multipliers y_k^i are updated at the end of each unconstrained minimization by means of the iteration

$$y_{k+1}^i = \max\bigl[0,\, y_k^i + c_k\, p_i(x_k)\bigr], \qquad i = 1, \ldots, m, \tag{44}$$

where x_k minimizes (43). The same updating formula may be used even if x_k is only an approximate minimizing point of (43). The method maintains its basic convergence characteristics provided the unconstrained minimization is asymptotically exact. We refer for a detailed discussion of this point, as well as for supporting analysis, to [3, 4].

Now consider a variation of the method above whereby x-dependent penalty parameters ĉ_k^i(x), ĉ_k^i: R^n → R, are used in (43), (44) in place of c_k, i.e., we minimize

$$P(x, y_k) = p_0(x) + \sum_{i=1}^{m} \frac{1}{2\hat c_k^i(x)}\Bigl\{\bigl[\max\bigl(0,\, y_k^i + \hat c_k^i(x)\, p_i(x)\bigr)\bigr]^2 - (y_k^i)^2\Bigr\} \tag{45}$$

and update the multipliers by means of

$$y_{k+1}^i = \max\bigl[0,\, y_k^i + \hat c_k^i(x_k)\, p_i(x_k)\bigr], \qquad i = 1, \ldots, m, \tag{46}$$

where x_k minimizes (45). Here we assume that ĉ_k^i(x) is positive over a region of interest and that there is some form of control over the magnitude of ĉ_k^i(x) so that it can be uniformly increased if desired. For example we may have ĉ_k^i(x) = c_k^i r_i(x), where c_k^i is a scalar penalty parameter that may be increased from one minimization to the next and r_i(x) is a positive function of x over the region of interest which does not depend on the index k.

It is not difficult to see that a method of multipliers with an x-dependent penalty parameter of the type described above should behave similarly to a method of multipliers of the ordinary type. The reason is that if x_k is a minimizing point of the function P(x, y_k) of (45), then x_k is also an approximate minimizing point of the function


$$\bar P(x, y_k) = p_0(x) + \sum_{i=1}^{m} \frac{1}{2\hat c_k^i(x_k)}\Bigl\{\bigl[\max\bigl(0,\, y_k^i + \hat c_k^i(x_k)\, p_i(x)\bigr)\bigr]^2 - (y_k^i)^2\Bigr\} \tag{47}$$

in the sense that

$$\nabla \bar P(x_k, y_k) = -\tfrac{1}{2} \sum_{i=1}^{m} \bigl[\max\bigl(-y_k^i/\hat c_k^i(x_k),\, p_i(x_k)\bigr)\bigr]^2\, \nabla \hat c_k^i(x_k).$$

Now if x_k is close to an optimal solution x̄ and y_k is close to a corresponding Lagrange multiplier vector ȳ of problem (42), then ∇P̄(x_k, y_k) is small, and in the limit as x_k → x̄, y_k → ȳ we have ∇P̄(x_k, y_k) → 0. In other words, x_k is an asymptotically exact minimizing point of (47). As a result, the multiplier method with x-dependent penalty parameters is equivalent to an ordinary multiplier method with penalty parameter sequences {ĉ_k^i(x_k)} where asymptotically exact minimization is employed. It follows, under suitable assumptions that the reader may easily provide using the analysis of [3, 4], that such methods of multipliers employing x-dependent parameters possess the well-known advantages of multiplier methods over penalty methods. In particular the multiplier iteration (46) accelerates convergence and the penalty parameters need not be increased to infinity in order for the method to converge.

Now it may be verified that our approximation method, when iteration (34) is employed, is equivalent, within a neighborhood of x̄, to a multiplier method for solving the constrained minimization problem (41) where the penalty parameter is x-dependent as described above. To be precise, let {c_k} be a parameter sequence used in the approximation method for minimizing (35) and let {x_k}, {y_k} be the corresponding generated sequences. Then the vectors x_k, y_k, for k ≥ k̄, where k̄ is sufficiently large, are identical to the ones that would be generated by a method of multipliers for problem (41) for which:

(a) Only the constraints g_i(x) f_i(x) ≤ z_i, i ∈ I^0 are eliminated by means of a generalized quadratic penalty.

(b) The penalty parameter for the (k + 1)th minimization corresponding to the ith constraint, i ∈ I^0, depends continuously on x and is given by ĉ_k^i(x) = c_k/g_i(x).

(c) The multiplier vector at the beginning of the (k + 1)th minimization is equal to y_k. Alternatively, the vectors x_k, y_k for k ≥ k̄, where k̄ is sufficiently large, are identical to the ones that would be generated by the method of multipliers for problem (41) for which:

(a) Both constraints g_i(x) f_i(x) ≤ z_i, i ∈ I^0 and 0 ≤ z_i, i ∈ I^0 are eliminated by means of a generalized quadratic penalty.


(b) The penalty parameter for the (k + 1)th minimization corresponding to the ith constraint depends continuously on x and is given by ĉ_k^i(x) = 2c_k/g_i(x).

(c) The multiplier vectors ŷ_k, ŵ_k at the beginning of the (k + 1)th minimization (where ŷ_k corresponds to the constraints g_i(x) f_i(x) ≤ z_i, i ∈ I^0 and ŵ_k corresponds to the constraints 0 ≤ z_i, i ∈ I^0) satisfy ŷ_k^i = y_k^i and ŵ_k^i = 1 − y_k^i.

The equivalence described above may be seen by verifying the following relations, which hold for all scalars y ∈ [0, 1], c > 0, g > 0, f:

$$g\,\tilde\gamma(f, y, c) = \min_{0 \le z}\Bigl[z + \frac{g}{2c}\Bigl\{\bigl[\max\bigl(0,\, y + \tfrac{c}{g}(g f - z)\bigr)\bigr]^2 - y^2\Bigr\}\Bigr]$$

$$\phantom{g\,\tilde\gamma(f, y, c)} = \min_{z}\Bigl[z + \frac{g}{4c}\Bigl\{\bigl[\max\bigl(0,\, y + \tfrac{2c}{g}(g f - z)\bigr)\bigr]^2 - y^2 + \bigl[\max\bigl(0,\, 1 - y - \tfrac{2c}{g}z\bigr)\bigr]^2 - (1 - y)^2\Bigr\}\Bigr].$$

The above relations show that after a certain index (when c_k is sufficiently large and the multipliers y_k^i, i ∈ I^+, and y_k^i, i ∈ I^-, have converged to one and zero respectively) the kth unconstrained minimization in our approximation method is equivalent to the kth unconstrained minimization in the multiplier methods mentioned above.
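A numerical sanity check of these relations, as reconstructed above (the first minimization taken over z ≥ 0 and the second over all z), is straightforward; the Python sketch below (not part of the paper) compares grid minimizations with g·γ̃(f, y, c).

```python
# Grid check of the two relations above: both minimizations should reproduce
# g * gamma_tilde(f, y, c); the first is over z >= 0, the second over all z.
import numpy as np

def gamma_tilde(t, y, c):
    if t >= (1.0 - y) / c:
        return t - (1.0 - y) ** 2 / (2.0 * c)
    if t <= -y / c:
        return -(y ** 2) / (2.0 * c)
    return y * t + 0.5 * c * t * t

def first_form(fv, y, c, g):
    z = np.linspace(0.0, 20.0, 800001)                     # z >= 0
    return (z + (g / (2.0 * c)) *
            (np.maximum(0.0, y + (c / g) * (g * fv - z)) ** 2 - y ** 2)).min()

def second_form(fv, y, c, g):
    z = np.linspace(-20.0, 20.0, 1600001)                  # both constraints penalized
    return (z + (g / (4.0 * c)) *
            (np.maximum(0.0, y + (2.0 * c / g) * (g * fv - z)) ** 2 - y ** 2
             + np.maximum(0.0, 1.0 - y - (2.0 * c / g) * z) ** 2 - (1.0 - y) ** 2)).min()

for y in (0.0, 0.4, 1.0):
    for c in (1.0, 5.0):
        for g in (0.5, 3.0):
            for fv in (-2.0, -0.1, 0.0, 0.3, 2.0):
                target = g * gamma_tilde(fv, y, c)
                assert abs(first_form(fv, y, c, g) - target) < 1e-3
                assert abs(second_form(fv, y, c, g) - target) < 1e-3
```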

The conclusion from the above analysis is that our approximation procedure with the iteration (34), when applied to minimization of a function of the form (35), behaves like a method of multipliers with x-dependent penalty parameter applied to problem (41). Thus we can conclude that results concerning convergence and rate of convergence for the method of multipliers carry over to our case. In particular we expect that under assumptions which hold in most cases of interest the iteration (34) will accelerate convergence and will avoid the need to increase c_k to infinity, i.e., the approximation procedure will converge even for c_k constant but sufficiently large.

We turn now to the general case of problem (16) where the cost function g does not have the form (35). Let us assume for convenience and without loss of generality that I^0 = {1, ..., m}, and consider the following Taylor series expansion around x̄:

$$G(x) = g\bigl[x, \gamma[f_1(x)], \ldots, \gamma[f_m(x)]\bigr] = g\bigl[x, \gamma[f_1(\bar x)], \ldots, \gamma[f_m(\bar x)]\bigr] + \sum_{i=1}^{m} \frac{\partial g}{\partial t_i}\bigl[x, \gamma[f_1(\bar x)], \ldots, \gamma[f_m(\bar x)]\bigr]\, \gamma[f_i(x)] + o_x\Bigl(\sum_{i=1}^{m} \gamma[f_i(x)]\Bigr), \tag{48}$$

where o_x is a function of x and γ[f_i(x)], i = 1, ..., m, such that for every x,

$$\lim_{\sum_{i=1}^{m} \gamma[f_i(x)] \to 0}\; \frac{o_x\bigl(\sum_{i=1}^{m} \gamma[f_i(x)]\bigr)}{\sum_{i=1}^{m} \gamma[f_i(x)]} = 0.$$

Now the function (48) is of the form (35) except for terms of order higher than one. Thus we expect that in the limit and close to x̄, where the term o_x(Σ_{i=1}^m γ[f_i(x)]) is negligible relative to first order terms, the approximation method will behave as in the case of a function of the form (35), and the iteration (34) can be viewed as asymptotically equivalent to the iteration of a method of multipliers.

A more precise justification of the point above can be given by considering the function (48) up to first order

$$\hat G(x) = g\bigl[x, \gamma[f_1(\bar x)], \ldots, \gamma[f_m(\bar x)]\bigr] + \sum_{i=1}^{m} \frac{\partial g}{\partial t_i}\bigl[x, \gamma[f_1(\bar x)], \ldots, \gamma[f_m(\bar x)]\bigr]\, \gamma[f_i(x)]. \tag{49}$$

Now the point x̄ satisfies the necessary condition of the previous section for an isolated local minimum of the function above. Let us assume that x̄ is indeed an isolated local minimum of (49). This assumption is not, of course, always satisfied, but it is likely to be true in many cases of interest. Then the approximation procedure for minimizing G(x) is equivalent to the approximation procedure for minimizing Ĝ(x), except for the fact that in the latter case we terminate the unconstrained minimizations at points x_k where the gradient of the approximate objective function equals the negative of the gradient contributed by the approximation of the higher order term o_x. Now when o_x contains terms of second order and higher and x_k → x̄, y_k → ȳ (this is guaranteed for example if c_k → ∞), this term tends to the zero vector and the minimization of the approximation of Ĝ(x) is asymptotically exact.

Our discussion above has been somewhat brief since a detailed analysis of our viewpoint would be extremely long and tedious. However it is felt that enough explanation has been provided to the interested reader in


order for him to supply the necessary assumptions and analysis and firmly establish the conclusions reached.

We mention that for practical purposes it may be computationally efficient to update the multipliers prior to completing the minimization in the approximate optimization problems. One may use analogs of termination criteria used for penalty and multiplier methods with inexact minimization [3, 4]. While it seems that the employment of such termination criteria should result in more efficient computation for many problems, our computational experiments were inconclusive in this respect.

Finally we note that when the constraint set Q in problem (16) is specified by equality and inequality constraints, it is convenient to eliminate these constraints by means of penalty functions while solving the approximate minimization problems. In this way the approximation method is combined with the penalty function method in a natural way. One may use the same parameter c_k to control both the accuracy of the approximation and the severity of the penalty. Assuming that c_k → ∞, one may prove various convergence theorems for methods of this type simply by combining standard arguments of convergence proofs of penalty methods [8] together with the convergence arguments of this paper. As an example consider the problem

$$\text{minimize } g_0\bigl[x, \gamma[f_1(x)], \ldots, \gamma[f_m(x)]\bigr],$$

$$\text{subject to } g_1\bigl[x, \gamma[h_1(x)], \ldots, \gamma[h_p(x)]\bigr] = 0,$$

where g_0, g_1 are real-valued functions. One may consider sequential minimization of the function

$$g_0\bigl[x, \tilde\gamma[f_1(x), y_k^1, c_k], \ldots, \tilde\gamma[f_m(x), y_k^m, c_k]\bigr] + \lambda_k\, g_1\bigl[x, \tilde\gamma[h_1(x), w_k^1, c_k], \ldots, \tilde\gamma[h_p(x), w_k^p, c_k]\bigr] + \tfrac{1}{2} c_k \bigl[g_1\bigl[x, \tilde\gamma[h_1(x), w_k^1, c_k], \ldots, \tilde\gamma[h_p(x), w_k^p, c_k]\bigr]\bigr]^2,$$

where {c_k} is an increasing sequence tending to infinity with c_k > 0, and {y_k^i}, {w_k^i} satisfy

$$0 \le y_k^i \le 1, \qquad 0 \le w_k^i \le 1 \qquad \text{for all } k, i,$$

and {λ_k} is a bounded scalar sequence. The updating procedure for the multipliers which corresponds to the iteration of the method of multipliers is given by


$$y_{k+1}^i = \begin{cases} 1 & \text{if } 1 \le y_k^i + c_k f_i(x_k), \\[2pt] y_k^i + c_k f_i(x_k) & \text{if } 0 \le y_k^i + c_k f_i(x_k) \le 1, \\[2pt] 0 & \text{if } y_k^i + c_k f_i(x_k) \le 0; \end{cases} \tag{50}$$

$$w_{k+1}^i = \begin{cases} 1 & \text{if } 1 \le w_k^i + c_k h_i(x_k), \\[2pt] w_k^i + c_k h_i(x_k) & \text{if } 0 \le w_k^i + c_k h_i(x_k) \le 1, \\[2pt] 0 & \text{if } w_k^i + c_k h_i(x_k) \le 0; \end{cases} \tag{51}$$

$$\lambda_{k+1} = \lambda_k + c_k\, g_1\bigl[x_k, \tilde\gamma[h_1(x_k), w_k^1, c_k], \ldots, \tilde\gamma[h_p(x_k), w_k^p, c_k]\bigr]. \tag{52}$$

The updating procedure (50)-(52) appears to be a reasonable one and, when employed, it improved a great deal the speed of convergence in our computational experiments. However we offer here no theoretical analysis which supports the conjecture that it accelerates convergence for any broad class of problems.
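A compact version of the updates (50)-(52) (a Python sketch, not from the paper; the value g1_smoothed is assumed to be computed by the caller as g_1 evaluated at x_k with each γ[h_i] replaced by γ̃[h_i(x_k), w_k^i, c_k]):

```python
# One pass of the multiplier updates (50)-(52) for the combined approximation / penalty method.
import numpy as np

def combined_update(y_k, w_k, lam_k, c_k, f_vals, h_vals, g1_smoothed):
    """f_vals = (f_i(x_k)), h_vals = (h_i(x_k)); g1_smoothed is g_1 at x_k with smoothed kinks."""
    y_next = np.clip(np.asarray(y_k) + c_k * np.asarray(f_vals), 0.0, 1.0)   # (50)
    w_next = np.clip(np.asarray(w_k) + c_k * np.asarray(h_vals), 0.0, 1.0)   # (51)
    lam_next = lam_k + c_k * g1_smoothed                                     # (52)
    return y_next, w_next, lam_next

print(combined_update(y_k=[0.0], w_k=[0.0], lam_k=0.0, c_k=5.0,
                      f_vals=[0.1], h_vals=[-0.4], g1_smoothed=0.02))
```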

5. Computational results

We have performed, with the assistance of L. Berman, a number of computational experiments to test the analysis of this paper. We present here some of the results related to two test problems. In both problems we performed the unconstrained minimizations by using the Davidon-Fletcher-Powell method available on the IBM-360 and referred to as the FMFP Scientific Subroutine. The value of the parameter ε which controls accuracy of minimization in this method was taken to be ε = 10^{-5}. Double precision was used throughout. The starting point for each unconstrained minimization, except the first one in each problem, was the final point obtained from the previous minimization. The computational results are reported in terms of the number of iterations required (the number of function evaluations not being readily available). These results, naturally, are highly dependent upon the efficiency of the particular unconstrained minimization subroutine employed. It is possible that much better (or worse) results may be obtained by employing a different unconstrained minimization method such as for example Newton's method.

Test problem 1. The problem is

$$\min_x\; \Bigl\{1 + \sum_{i=1}^{n} i\,|x^i|\Bigr\},$$


where x^i denotes the ith coordinate of x ∈ R^n. We represented |x^i| by x^i + γ[−2x^i] and used our approximation procedure with and without updating of the multipliers, starting with x^i = −1, y^i = 0, i = 1, ..., n. We solved the problem for n = 5 and n = 50 with the penalty parameter sequence c_k = 5^k. We also solved the problem by using a constant value of the penalty parameter, c_k = 10 for all k, in conjunction with iteration (34). Table 1 shows the results of the computation.
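The representation of |x^i| used here is easy to verify, and its smooth counterpart is obtained by replacing γ with γ̃; the short check below (Python, not part of the paper) also confirms that the smoothing error obeys the bound (8).

```python
# Check the representation |x| = x + gamma[-2x] used in Test problem 1 and form its
# smooth counterpart x + gamma_tilde[-2x, y, c].
import numpy as np

def gamma_tilde(t, y, c):
    return np.where(t >= (1.0 - y) / c, t - (1.0 - y) ** 2 / (2.0 * c),
                    np.where(t <= -y / c, -(y ** 2) / (2.0 * c),
                             y * t + 0.5 * c * t ** 2))

x = np.linspace(-2.0, 2.0, 4001)
assert np.allclose(np.abs(x), x + np.maximum(0.0, -2.0 * x))   # exact representation

smooth_abs = x + gamma_tilde(-2.0 * x, 0.0, 10.0)              # y = 0, c = 10
err = np.abs(x) - smooth_abs
assert err.min() >= -1e-12 and err.max() <= 1.0 / (2.0 * 10.0) + 1e-12   # bound (8)
```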

Table 1
(Value of objective at minimizing point x_k) / (Number of iterations required)

        c_k = 5^k, y_k ≡ 0          c_k = 5^k, y_k updated        c_k = 10, y_k updated
 k      n = 5         n = 50        n = 5         n = 50          n = 5         n = 50
 0      32.6820/10    14753.3/6     32.6820/10    14753.3/6       1.89062/16    2028.52/56
 1      3.06250/19    6420.92/13    2.00493/18    5807.64/73      1.00000/10    131.120/86
 2      1.32250/16    441.704/72    1.00000/13    72.7005/60                    50.7946/84
 3      1.06090/13    11.9509/93                  1.93668/139                   18.6044/115
 4      1.00240/13    2.28010/132                 1.00000/100                   1.00091/71
 5      1.00048/34    1.21441/165                                               1.00090/50
 6      1.00010/24    1.04122/131                                               1.00000/36
 7      1.00002/29    1.00818/180
 8      1.00000/30    1.00163/103
 9                    1.00033/186
10                    1.00007/143
11                    1.00001/168
12                    1.00000/140

Total number of iterations
        188           1532          41            378             26            498

We also solved a constrained version of the problem

$$\text{subject to } |x^1 - 2| + |x^2| + \cdots + |x^n| = 1$$

by using the combination of the penalty or multiplier method and the approximation method described at the end of the previous section. The starting points were x^i = −1, y^i = 0, w^i = 0, i = 1, 2, ..., n and λ = 0. The results are summarized in Table 2. In each case the constraint equation was satisfied within six significant digits at the final solution point.


Table 2
(Value of objective at minimizing point x_k) / (Number of iterations required)

        c_k = 5^k, y_k, w_k, λ_k ≡ 0   c_k = 5^k, y_k, w_k, λ_k updated   c_k = 10, y_k, w_k, λ_k updated
 k      n = 5         n = 50           n = 5         n = 50               n = 5         n = 50
 0      23.6249/13    164410./61       23.6249/13    164410./61           3.85141/15    2611.20/92
 1      4.35020/19    7095.79/93       3.17223/16    4400.94/103          3.63379/16    161.944/133
 2      3.84160/15    432.923/102      3.80147/15    99.9155/114          3.93774/5     52.6251/131
 3      3.95553/19    20.1098/108      3.99683/5     5.73819/133          3.98959/5     3.22301/111
 4      3.99054/16    6.21738/137      3.99999/5     3.99912/109          3.99826/5     3.86473/84
 5      3.99809/26    4.40393/222      4.00000/6     4.00002/60           3.99971/5     3.97907/51
 6      3.99962/44    4.07921/132                    4.00000/50           3.99995/5     3.99800/54
 7      3.99992/36    4.01578/167                                         3.99999/5     3.99937/50
 8      4.00000/36    4.00315/139                                         4.00000/5     3.99989/50
 9                    4.00063/149                                                       3.99998/50
10                    4.00013/186                                                       4.00000/50
11                    4.00003/131
12                    4.00001/500*
13                    4.00000/500*

Total number of iterations
        224           2,627            60            630                  66            856

* Limit on number of iterations.

Test problem 2. This is a minimax problem suggested to us by Claude Lemarechal:

$$\min_x\; \max\{f_1(x), f_2(x), \ldots, f_5(x)\},$$

where x ∈ R^{10} and

$$f_i(x) = \langle x, A_i x\rangle - \langle b_i, x\rangle, \qquad i = 1, 2, \ldots, 5.$$

The elements a_i(m, n), m, n = 1, ..., 10 of the matrices A_i and the coordinates b_i(m) of the vectors b_i are given by:

$$a_i(m, n) = e^{m/n} \cos(m \cdot n)\, \sin(i) \qquad \text{for } m < n,$$

$$a_i(m, n) = a_i(n, m) \qquad \text{for } m > n,$$

$$a_i(m, m) = 2\,|\sin(i)|\, i/m + \sum_{j \ne m} |a_i(m, j)|,$$

$$b_i(m) = e^{m/i} \sin(i \cdot m), \qquad i = 1, \ldots, 5,\; m = 1, \ldots, 10.$$

We represented max{f_1, f_2, ..., f_5} by



$$\max\{f_1, \ldots, f_5\} = f_1 + \gamma\bigl[f_2 - f_1 + \gamma\bigl[f_3 - f_2 + \gamma\bigl[f_4 - f_3 + \gamma[f_5 - f_4]\bigr]\bigr]\bigr]$$

and used our approximation procedure in conjunction with iteration (34). The starting points were x^1 = x^2 = ⋯ = x^{10} = 0 and y_0^1 = y_0^2 = y_0^3 = y_0^4 = 0. The optimal value obtained is −0.51800 and the corresponding multiplier vector was

$$\bar y = (1.00000,\; 0.99550,\; 0.89262,\; 0.58783).$$

It is worth noting that for minimax problems of this type the optimal values of the approximate objective obtained during the computation constitute useful lower bounds for the optimal value of the problem. Table 3 shows the results of the computation for the case where the unconstrained minimization was "exact" (i.e., ε = 10^{-5} in the DFP routine). It also shows the results of the computation when the unconstrained minimization was inexact, in the sense that the kth minimization was terminated when the norm of the direction vector in the DFP method was less than max[10^{-5}, 10^{-k}].

References

[1] D.P. Bertsekas and S.K. Mitter, "A descent numerical method for optimization problems with nondifferentiable cost functionals", SIAM Journal on Control 11 (1973) 637-652.

[2] D.P. Bertsekas and S.K. Mitter, "Steepest descent for optimization problems with nondifferentiable cost functionals", Proceedings of the 5th annual Princeton conference on information sciences and systems, Princeton, N.J., March 1971, pp. 347-351.

[3] D.P. Bertsekas, "Combined primal-dual and penalty methods for constrained minimization", SIAM Journal on Control, to appear.

[4] D.P. Bertsekas, "On penalty and multiplier methods for constrained minimization", SIAM Journal on Control 13 (1975) 521-544.

[5] J. Cullum, W.E. Donath and P. Wolfe, "An algorithm for minimizing certain nondifferentiable convex functions", RC 4611, IBM Research, Yorktown Heights, N.Y. (November 1973).

[6] V.F. Demyanov, "On the solution of certain minimax problems", Kibernetica 2 (1966).

[7] V.F. Demyanov and A.M. Rubinov, Approximate methods in optimization problems (American Elsevier, New York, 1970).

[8] A.V. Fiacco and G.P. McCormick, Nonlinear programming: sequential unconstrained minimization techniques (Wiley, New York, 1968).

[9] A.A. Goldstein, Constructive real analysis (Harper & Row, New York, 1967).

[10] M. Held, P. Wolfe and H.P. Crowder, "Validation of subgradient optimization", Mathematical Programming 6 (1) (1974) 62-88.

[11] M.R. Hestenes, "Multiplier and gradient methods", Journal of Optimization Theory and Applications 4 (5) (1969) 303-320.

[12] C. Lemarechal, "An algorithm for minimizing convex functions", in: J.L. Rosenfeld, ed., Proceedings of the IFIP Congress 74 (North-Holland, Amsterdam, 1974) pp. 552-556.

[13] B.W. Kort and D.P. Bertsekas, "Combined primal-dual and penalty methods for convex programming", SIAM Journal on Control, to appear.

[14] B.W. Kort and D.P. Bertsekas, "Multiplier methods for convex programming", Proceedings of the 1973 IEEE conference on decision and control, San Diego, Calif., December 1973, pp. 428-432.

[15] B.T. Polyak, "Minimization of unsmooth functionals", Žurnal Vyčislitel'noi Matematiki i Matematičeskoi Fiziki 9 (3) (1969) 509-521.

[16] M.J.D. Powell, "A method for nonlinear constraints in minimization problems", in: R. Fletcher, ed., Optimization (Academic Press, New York, 1969) pp. 283-298.

[17] B.N. Pschenishnyi, "Dual methods in extremum problems", Kibernetica 1 (3) (1965) 89-95.

[18] R.T. Rockafellar, "The multiplier method of Hestenes and Powell applied to convex programming", Journal of Optimization Theory and Applications 12 (6) (1973).

[19] N.Z. Shor, "On the structure of algorithms for the numerical solution of planning and design problems", Dissertation, Kiev (1964).

[20] P. Wolfe, "A method of conjugate subgradients for minimizing nondifferentiable functions", Mathematical Programming Study 3 (1975) 145-173 (this volume).