
A LINEARLY CONVERGENT INEXACT PROXIMAL POINT ALGORITHM FOR MINIMIZATION

Ciyou Zhu*

Mathematics and Computer Science Division, Argonne National Laboratory
9700 South Cass Avenue, Argonne, IL 60439

August, 1993; revised April, 1994

Abstract. In this paper, we propose a linearly convergent inexact PPA for minimization, where the inner loop stops when the relative reduction on the residue (defined as the objective value minus the optimal value) of the inner loop subproblem meets some preassigned constant. This inner loop stopping criterion can be achieved in a fixed number of iterations if the inner loop algorithm has a linear rate on the regularized subproblems. Therefore the algorithm is able to avoid the computationally expensive process of solving the inner loop subproblems exactly or asymptotically accurately, a process required by most of the other linearly convergent PPAs.

As applications of this inexact PPA, we develop linearly convergent iteration schemes for minimizing functions with singular Hessian matrices, and for solving hemiquadratic extended linear-quadratic programming problems. We also prove that Correa-Lemaréchal's "implementable form" of PPA [2, Algorithm 3.3] converges linearly under mild conditions.

Keywords. Convex programming, proximal point algorithm, extended linear-quadratic programming, singular Hessian matrix.

* This research was supported by the Office of Scientific Computing, U.S. Department of Energy, under Contract W-31-109-Eng-38, and by the National Science Foundation under Contract ASC-9213149.

The submitted manuscript has been authored by a contractor of the U.S. Government under Contract No. W-31-109-ENG-38. Accordingly, the U.S. Government retains a nonexclusive, royalty-free license to publish or reproduce the published form of this contribution, or allow others to do so, for U.S. Government purposes.


1. Introduction.

Consider the convex programming problem

(P)    minimize f(x) over all x ∈ X

in the n-dimensional Euclidean space, where X ⊂ R^n is a closed convex set and f : R^n → R is a convex function. Denote the Euclidean norm in R^n by | · |. A general iteration scheme for solving (P) is the proximal point algorithm (PPA)

Proximal Point Algorithm (Exact PPA) [5]. Choose a positive constant c (proximal parameter). Specify starting point x⁰ ∈ X. For k = 0, 1, 2, ..., generate the sequence {x^k} as

(1.1)    x^{k+1} := argmin_{x∈X} { f(x) + (1/(2c))|x − x^k|² }.

The PPA is especially useful as a "regularization" procedure for problems where the objective function is not strongly convex. Some commonly used minimization algorithms may lose their convergence properties or even become undefined when f is not strongly convex. With the PPA as an "outer loop" of the iteration, however, these algorithms can still be used to solve the "inner loop" subproblems of calculating the argmin on the right-hand side of (1.1). The objective functions in these subproblems are strongly convex because of the augmented term (1/(2c))|x − x^k|².
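As a concrete illustration of iteration (1.1) (not part of the paper's development), consider the one-dimensional nonsmooth convex function f(x) = |x|, which is not strongly convex; its proximal step has the well-known closed form of soft-thresholding. The function and constants below are our own illustrative choices.

```python
def prox_abs(x, c):
    # Closed-form proximal step for f(x) = |x| (soft-thresholding):
    # argmin_y { |y| + (1/(2c)) * (y - x)**2 }.
    if x > c:
        return x - c
    if x < -c:
        return x + c
    return 0.0

def exact_ppa(x0, c, iters):
    # Exact proximal point iteration (1.1): x_{k+1} = proximal step at x_k.
    x = x0
    history = [x]
    for _ in range(iters):
        x = prox_abs(x, c)
        history.append(x)
    return history

traj = exact_ppa(x0=5.0, c=1.0, iters=8)   # 5, 4, 3, 2, 1, 0, 0, 0, 0
```

Each subproblem here is strongly convex even though f itself is not, which is precisely the regularization effect described above.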

In a more general sense, the PPA can be formulated as a procedure for solving the multivalued equation 0 ∈ T(x) related to a maximal monotone operator T : R^n ⇉ R^n [5]. The operator

(1.2)    J_c = (I + cT)^{-1}

is single-valued from all of R^n into R^n [7]. Hence the iteration

(1.3)    x^{k+1} := J_c(x^k)

is well defined. Moreover, J_c(x) = x if and only if 0 ∈ T(x). The iteration (1.3) converges to a solution of 0 ∈ T(x) [5, 6]. For the minimization problem (P), define the extended-valued convex function f̄ : R^n → R ∪ {+∞} as

(1.4)    f̄(x) = f(x) + δ_X(x),

where δ_X is the indicator function of the set X. Then the subdifferential mapping ∂f̄ is maximal monotone [10]. The problem of minimizing f̄ is equivalent to the multivalued equation 0 ∈ T(x) with T = ∂f̄, and iteration (1.3) reduces precisely to (1.1). See [12] for detailed discussions.

One problem in the implementation of the PPA is the calculation of the argmin on the right-hand side of (1.1). Usually, this is itself an iterative process. Since this subproblem has to be solved at each outer-loop iteration, it is computationally expensive to take x^{k+1} as the exact argmin in (1.1). In [12], Rockafellar proposed a more realistic version of the PPA, where an approximation, instead of the exact argmin in (1.1), is taken as the next iterate x^{k+1}.

Asymptotically Accurate Proximal Point Algorithm [12]. Choose a positive constant c (proximal parameter), and a positive sequence {δ_k} such that Σ_{k=0}^∞ δ_k < +∞. Specify starting point x⁰ ∈ X. For k = 0, 1, 2, ..., generate the sequence {x^k} in such a way that

(1.6)    |x^{k+1} − z̄^{k+1}| ≤ δ_k,

where

z̄^{k+1} := argmin_{x∈X} { f(x) + (1/(2c))|x − x^k|² }.

This iteration scheme converges to an optimal solution of (P) [12]. If, in addition, x^{k+1} satisfies

(1.7)    |x^{k+1} − z̄^{k+1}| ≤ δ_k |x^{k+1} − x^k|,

then the convergence will be linear under the assumptions that the optimal solution set of (P) is a singleton and that the inverse of the mapping T = ∂f̄ is Lipschitz continuous in some neighborhood of 0. Other convergence results regarding the asymptotically accurate PPA may be found in Luque [4], where conditions similar to (1.7) are also imposed.

Unfortunately, the criterion (1.7) with Σ δ_k < +∞ will eventually drive

|x^{k+1} − z̄^{k+1}| / |x^{k+1} − x^k| → 0 as k → ∞.

Hence, the subproblem of finding the argmin in (1.6) will still have to be solved to high accuracy, thus posing a potential difficulty for the inner loop. Intuitively, it might not be wise to spend much computational effort solving a single subproblem to very high accuracy while leaving the "base point" x^k of the inner-loop iteration unchanged.

In [2], Correa and Lemaréchal proposed an "implementable form" of the PPA, where they use an ε-subdifferential relation to stop the inner loop iteration. (See Section 6.) However, this stopping criterion will still be tightened asymptotically as ε → 0 in the algorithm. Besides that, the rate of convergence of the algorithm was not discussed there.

Another question in the implementation of the PPA is the choice of the proximal parameter c. According to the analysis of Rockafellar [12] and Luque [4], a larger c will in general lead to faster convergence for the outer-loop iteration. However, a larger c often results in numerically more difficult subproblems and, therefore, slower progress in the inner loop. Hence, a reasonable choice of c should be based on an analysis of the combined effects of c on both the inner and the outer loops.

In this paper, we develop some linearly convergent variants of the PPA, where the inner-loop subproblems need not be solved exactly or asymptotically accurately, but only with limited relative precision during the whole process. We call these new variants inexact proximal point algorithms, in contrast to the PPA in its exact form or in its asymptotically accurate forms. In Section 2, we formulate two versions of inexact PPAs and prove their convergence. In Section 3, we prove the linear convergence of the sequence {f(x^k)} of objective values generated by the inexact PPA under mild assumptions. In other words, we get rid of any exactly or asymptotically accurately solved subproblems, while still achieving linear convergence for the objective values. In Section 4, we derive linearly convergent iteration schemes in the framework of the inexact PPA for minimizing objective functions with singular Hessian matrices. In Section 5, we apply the inexact PPA to solve hemiquadratic extended linear-quadratic programming problems. We also derive certain "suboptimal" choices for the proximal parameter c in these applications. We conclude the paper by showing, in Section 6, that Correa-Lemaréchal's inner loop stopping criterion in their "implementable form" of PPA [2, Algorithm 3.3] implies the stopping criterion in our algorithm. Thus the linear convergence of Correa-Lemaréchal's algorithm follows immediately from the results of this paper.


2. Linearly Convergent Inexact Proximal Point Algorithms.

In this section, we formulate two linearly convergent versions of inexact PPAs and prove their convergence. The first of these two versions applies to general convex objective functions, while the second takes advantage of differentiability when the objective function is continuously differentiable.

A prominent feature of these inexact PPAs is the fixed relative accuracy in the inner loop stopping criterion. For fixed c > 0 and z ∈ R^n, let the function f(·; z) : R^n → R be defined as

(2.1)    f(x; z) = f(x) + (1/(2c))|x − z|².

For convenience, denote

(2.2)    f_k(·) = f(·; x^k)  and  f_k* = min_{x∈X} f_k(x).

At the base point x^k, the minimization of the regularized objective function f_k will be stopped as soon as a point x̃^{k+1} satisfying

(2.3)    f_k(x̃^{k+1}) − f_k* ≤ η (f_k(x^k) − f_k*)

is generated in the inner loop, where η is a constant in (0,1). Obviously, starting from x^k, this stopping criterion will be satisfied within a fixed number of iterations if the inner loop algorithm itself has a linear rate on the regularized function f_k.

Of course, stopping criterion (2.3) in the inner loop does not suffice to guarantee the convergence of the outer loop in general. A novel idea in our algorithm is to use a benchmark point in the updating to secure the convergence. With the combined effects of the benchmark point and the inner loop stopping criterion (2.3), we are able to obtain both the convergence and a linear rate for the whole process, without requiring the proximal calculation to be carried out exactly or asymptotically accurately.

We denote the Euclidean inner product by ⟨·, ·⟩, and denote the line segment between two points w¹ and w² by [w¹, w²]. The optimal solution set X̄ of (P) is a closed convex set in R^n. Let f* be the corresponding optimal value. For given z and [w¹, w²] ⊂ X, define

f*_{z,[w¹,w²]} := min_{x∈[w¹,w²]} f(x; z).


Inexact Proximal Point Algorithm 1 (for general convex f).
(i) Choose an inner loop algorithm for minimizing the regularized objective function f(·; z) on X for fixed z, as well as a constant η ∈ (0,1) for the stopping criterion.
(ii) Choose a positive constant μ ∈ (0,1], and a mapping D : X → X such that for all z ∈ X \ X̄, D(z) ≠ z and d(z) = D(z) − z is a descent direction of f(·; z) at z. For any x ∈ X, let

(2.4)    A(x) = { y ∈ [x, D(x)] : f(y; x) ≤ (1 − μ) f(x; x) + μ f*_{x,[x,D(x)]} }.

Specify starting point x⁰ ∈ X. For k = 0, 1, 2, ..., generate the sequence {x^k} as follows: if x^k ∈ X̄, then stop; otherwise
Step 1. generate a benchmark point y^{k+1} ∈ A(x^k);
Step 2. start the minimization of f_k(x) at x^k to get some x̃^{k+1} satisfying the inner loop stopping criterion (2.3);
Step 3. take

x^{k+1} = x̃^{k+1} if f(x̃^{k+1}) ≤ f(y^{k+1}), and x^{k+1} = y^{k+1} otherwise;

and repeat.
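As an illustrative sketch (our own choices, not from the paper), the three steps of the inexact PPA-1 can be implemented in Python for the one-dimensional objective f(x) = x⁴, whose second derivative vanishes at the minimizer. The benchmark point is found by dense sampling of the segment [x, D(x)], and the inner loop is a fixed number of backtracking gradient steps on the regularized function.

```python
def f(x):          # objective: convex, with f''(0) = 0 (singular at the minimizer)
    return x**4

def grad_f(x):
    return 4.0 * x**3

C = 1.0            # proximal parameter c

def f_reg(x, z):   # regularized objective f(x; z) = f(x) + (1/(2c))|x - z|^2
    return f(x) + (x - z)**2 / (2.0 * C)

def grad_f_reg(x, z):
    return grad_f(x) + (x - z) / C

def benchmark_point(x):
    # Step 1: pick y on the segment [x, D(x)], D(x) = x - c*grad f(x), that
    # roughly minimizes f(.; x) on the segment (dense sampling stands in for (2.4)).
    d = -C * grad_f(x)
    candidates = [x + t / 50.0 * d for t in range(51)]
    return min(candidates, key=lambda y: f_reg(y, x))

def inner_loop(x, steps=5):
    # Step 2: a few gradient steps with simple backtracking on f(.; x).
    w = x
    for _ in range(steps):
        g = grad_f_reg(w, x)
        t = 1.0
        while f_reg(w - t * g, x) > f_reg(w, x) - 0.5 * t * g * g:
            t *= 0.5
        w = w - t * g
    return w

def inexact_ppa1(x0, outer_iters=30):
    x = x0
    values = [f(x)]
    for _ in range(outer_iters):
        y = benchmark_point(x)                       # Step 1
        x_tilde = inner_loop(x)                      # Step 2
        x = x_tilde if f(x_tilde) <= f(y) else y     # Step 3
        values.append(f(x))
    return x, values

x_final, vals = inexact_ppa1(2.0)
```

Both candidates in Step 3 decrease the regularized objective from its value at x^k, so the sequence of objective values is nonincreasing by construction.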

Observe that no exact line search is required in the generation of the benchmark points. By (2.4), y^{k+1} needs only to reduce the regularized objective function to a certain extent (determined by μ) on the line segment [x^k, D(x^k)]. Moreover, we will show later that some inner loop algorithms provide such a benchmark point automatically in their first iteration. In those cases, there would be no extra computation involved in Step 1.

Obviously, stopping criterion (2.3) could be replaced by any other conditions, as long as those conditions imply (2.3). One possibility is to impose an upper bound on the number of inner loop iterations if the inner loop algorithm itself has a linear rate on the regularized function f_k. In Section 6, we will prove that Correa-Lemaréchal's stopping criterion in their "implementable form" of PPA [2, Algorithm 3.3] also implies (2.3) with a certain η ∈ (0,1).


If the objective function f in (P) is differentiable in an open neighborhood containing X, then the formula in Step 1 of the above algorithm for benchmark points can be replaced by the Armijo step length strategy.

Inexact Proximal Point Algorithm 2 (for C¹ objective function).
(i) Choose an inner loop algorithm for minimizing the regularized objective function f(·; z) on X for fixed z, as well as a constant η ∈ (0,1) for the stopping criterion.
(ii) Choose θ ∈ (0,1), τ ∈ (0,1), and a mapping D : X → X such that for all z ∈ X \ X̄, D(z) ≠ z and ⟨∇f(z; z), D(z) − z⟩ < 0, where ∇f(z; z) is the gradient of f(·; z) at z. Let

(2.5)    A'(x) = x + λ(x) d(x),

where d(x) = D(x) − x and

λ(x) = max { θ^j : f(x + θ^j d(x); x) − f(x; x) ≤ τ θ^j ⟨∇f(x; x), d(x)⟩, j ∈ {0,1,2,...} }

for x ∈ X.

Specify starting point x⁰ ∈ X. For k = 0, 1, 2, ..., generate the sequence {x^k} as follows: if x^k ∈ X̄, then stop; otherwise
Step 1. generate the benchmark point y^{k+1} = A'(x^k);
Step 2. start the minimization of f_k(·) at x^k to get some x̃^{k+1} satisfying the inner loop stopping criterion (2.3);
Step 3. take

x^{k+1} = x̃^{k+1} if f(x̃^{k+1}) ≤ f(y^{k+1}), and x^{k+1} = y^{k+1} otherwise;

and repeat.
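The Armijo step length λ(x) of Step 1 above can be sketched as follows; the regularized function, base point, and the constants θ, τ are our own illustrative choices.

```python
THETA, TAU = 0.5, 0.25   # theta, tau in (0,1), as in step (ii) of PPA-2

def armijo_lambda(f_z, grad_f_z, x, d):
    # lambda(x) = max{ theta^j : f(x + theta^j d; x) - f(x; x)
    #                            <= tau * theta^j * <grad f(x; x), d>, j = 0,1,2,... }
    g_dot_d = grad_f_z(x) * d          # one-dimensional inner product
    assert g_dot_d < 0, "d must be a descent direction"
    lam = 1.0
    while f_z(x + lam * d) - f_z(x) > TAU * lam * g_dot_d:
        lam *= THETA
    return lam

# Example with f(x; z) = x**4 + (x - z)**2 / 2 at the base point z = 1 (c = 1).
z = 1.0
f_z = lambda x: x**4 + (x - z)**2 / 2.0
grad_f_z = lambda x: 4.0 * x**3 + (x - z)
d = -grad_f_z(z)                       # d(z) = D(z) - z with D(z) = z - grad
lam = armijo_lambda(f_z, grad_f_z, z, d)
y = z + lam * d                        # benchmark point y = A'(z)
```

The loop terminates after finitely many halvings because the sufficient-decrease inequality always holds for small enough steps along a descent direction.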

Proposition 1 (convergence of inexact PPA-1). Suppose the optimal solution set X̄ is nonempty and bounded. If the mapping D : X → X in the inexact PPA-1 is continuous on X \ X̄, and if for all z ∈ X \ X̄, d(z) = D(z) − z is a descent direction of f(·; z) at z, then the sequence {x^k} ⊂ X generated by the inexact PPA-1 either terminates at a point in X̄ or satisfies

dist(x^k, X̄) → 0 and f(x^k) → f* as k → ∞.


Proof. Let

(2.6)    B(y) = { x ∈ X : f(x) ≤ f(y) }.

Then x^{k+1} ∈ B(y^{k+1}) ⊂ (B∘A)(x^k) by Step 3 of the algorithm. In the following, we prove the proposition by verifying that for the algorithmic mappings A and B defined in (2.4) and (2.6), respectively, all the conditions of [1, Theorem 7.3.4] on the convergence of the iteration x^{k+1} ∈ (B∘A)(x^k) are satisfied.

First, we claim that A is closed on X \ X̄ in the sense that w^i ∈ X \ X̄, w^i → w ∈ X \ X̄, and y^i ∈ A(w^i), y^i → y, imply y ∈ A(w) [1]. Now y^i ∈ A(w^i) means

(2.7)    y^i ∈ [w^i, D(w^i)]  and  f(y^i; w^i) ≤ (1 − μ) f(w^i; w^i) + μ f*_{w^i,[w^i,D(w^i)]}.

Let λ ∈ [0,1] be a real number such that

f((1 − λ)w + λD(w); w) = f*_{w,[w,D(w)]}.

Then, by the continuity of a convex function in its effective domain [11, Theorem 10.1],

f((1 − λ)w^i + λD(w^i); w^i) → f*_{w,[w,D(w)]} = min_{x∈[w,D(w)]} f(x; w) as w^i → w.

But f*_{w^i,[w^i,D(w^i)]} ≤ f((1 − λ)w^i + λD(w^i); w^i). Therefore (2.7) yields

(2.8)    f(y^i; w^i) ≤ (1 − μ) f(w^i; w^i) + μ f((1 − λ)w^i + λD(w^i); w^i).

Taking the limit on both sides of (2.8), we obtain

f(y; w) ≤ (1 − μ) f(w; w) + μ f*_{w,[w,D(w)]},

which means that y ∈ A(w). Hence the algorithmic mapping A is closed on X \ X̄.

Second, we claim that

(2.9)    f(y) < f(x) ∀ y ∈ A(x), x ∈ X \ X̄.

Recall that D(z) − z is a descent direction of f(·; z) at z for all z ∈ X \ X̄. Hence

f*_{z,[z,D(z)]} < f(z; z) ∀ z ∈ X \ X̄,

which implies

f(y; z) < f(z; z) ∀ y ∈ A(z), z ∈ X \ X̄,

by (2.4). But f(z; z) = f(z) and f(y; z) ≥ f(y). Hence (2.9) follows.

Third, we claim that for any x⁰ ∈ X, the sequence {x^k} generated by the inexact PPA-1 is contained in a compact subset of X. Note that X̄ is a level set of f̄ defined in (1.4). But all the level sets of a proper, lower semicontinuous convex function have the same recession cone [11, Theorem 8.7]. Hence, the boundedness of X̄ implies the boundedness of

(2.10)    { x ∈ X : f(x) ≤ f(x⁰) }.

Obviously, the set in (2.10) is a closed subset of X containing the sequence {x^k}. This completes the proof. □

Proposition 2 (convergence of inexact PPA-2). Suppose that the optimal solution set X̄ is nonempty and bounded and that the objective function f is continuously differentiable in some open neighborhood of X. If the mapping D : X → X in the inexact PPA-2 is continuous on X \ X̄, and ⟨∇f(z; z), D(z) − z⟩ < 0 for all z ∈ X \ X̄, where ∇f(z; z) is the gradient of f(·; z) at z, then the sequence {x^k} ⊂ X generated by the inexact PPA-2 either terminates at a point in X̄ or satisfies

dist(x^k, X̄) → 0 and f(x^k) → f* as k → ∞.

Proof. Let B be defined as in (2.6). Then

x^{k+1} ∈ B(y^{k+1}) ⊂ (B∘A')(x^k)

by Step 3 of the algorithm. Note that by (2.5), A' is a descent algorithmic mapping:

(2.11)    f(A'(x); x) − f(x; x) ≤ τ λ(x) ⟨∇f(x; x), d(x)⟩ < 0 ∀ x ∈ X \ X̄,

and B does not increase the objective value. Let {x^k} be an infinite sequence generated by the algorithm. Then {x^k} is contained in the set

{ x ∈ X : f(x) ≤ f(x⁰) },

which is compact, as has been shown in the proof of Proposition 1. Let {x^k}_{k∈K} be a convergent subsequence of {x^k},

x^k → w as k → ∞, k ∈ K,

where K is a subset of {0, 1, 2, ...}. Then the nonincreasing sequence of objective values {f(x^k)} converges to f(w) as k → ∞. To prove the proposition, we need only to prove that w ∈ X̄.

By contradiction, suppose w ∉ X̄. Then x^k ∉ X̄ for sufficiently large k ∈ K, since X̄ is a closed set. Let

λ* := max { θ^j : f(w + θ^j d(w); w) − f(w; w) ≤ τ θ^j ⟨∇f(w; w), d(w)⟩, j ∈ {0,1,2,...} },
λ_k := max { θ^j : f_k(x^k + θ^j d(x^k)) − f_k(x^k) ≤ τ θ^j ⟨∇f_k(x^k), d(x^k)⟩, j ∈ {0,1,2,...} },

where the simplified notation f_k(·) is used for the function f(·; x^k). Suppose λ* = θ^{j'}. Then

f_k(x^k + θ^j d(x^k)) − f_k(x^k) < τ θ^j ⟨∇f_k(x^k), d(x^k)⟩ ∀ j > j',

f_k(x^k + θ^j d(x^k)) − f_k(x^k) > τ θ^j ⟨∇f_k(x^k), d(x^k)⟩ ∀ j < j',

for sufficiently large k ∈ K. Hence λ_k ∈ {λ*, θλ*} and

(2.12)    f_k(x^k + λ_k d(x^k)) − f_k(x^k) ≤ τ θ λ* ⟨∇f_k(x^k), d(x^k)⟩

for sufficiently large k ∈ K. By (2.11) and the equality ∇f_k(x^k) = ∇f(x^k), it follows from (2.12) that

f(A'(x^k)) ≤ f_k(A'(x^k)) ≤ f(x^k) + τ θ λ* ⟨∇f(x^k), d(x^k)⟩.

Taking lim sup as k → ∞ with k ∈ K, we get

lim sup_{k→∞, k∈K} f(A'(x^k)) − f(w) ≤ τ θ λ* ⟨∇f(w), d(w)⟩.

Let ε = −τ θ λ* ⟨∇f(w), d(w)⟩. Then ε > 0 by the assumption in the inexact PPA-2. But x^{k+1} ∈ (B∘A')(x^k) implies f(x^{k+1}) ≤ f(A'(x^k)). Hence

f(x^{k+1}) ≤ f(w) − ε/2 for sufficiently large k ∈ K.

This contradicts the fact that the nonincreasing sequence {f(x^k)} converges to f(w) as k → ∞. □


3. Rate of Convergence for the Inexact PPA.

In this section, we investigate the rate of convergence for the sequence {f(x^k)} generated by the inexact PPAs. We show that the rate is related to the progress of the inner loop minimization, as well as to the growth condition on the inverse of the subdifferential mapping near 0.

The results in this section will also be useful in deriving convergence rates for some other variants of PPAs. In order to facilitate such applications in Section 6, we here consider a slightly more general problem

(PP)    minimize φ(x) over all x ∈ R^n,

where φ : R^n → R ∪ {+∞} is a proper lower semicontinuous convex function. Suppose the optimal solution set X̂ = argmin_{x∈R^n} φ(x) is nonempty. For fixed z ∈ R^n, let

ψ(x; z) = φ(x) + (1/(2c))|x − z|².

Obviously, problem (PP) includes (P) as a special case, and the regularized function ψ(x; z) coincides with f(x; z) for x ∈ X when φ = f̄ as defined in (1.4). We denote

ψ_k(·) = ψ(·; x^k)  and  ψ_k* = min_{x∈R^n} ψ_k(x).

Let X be the effective domain [11] of φ,

X = { x ∈ R^n : φ(x) < +∞ }.

The following lemma gives a general result on the relationship between the outer-loop residue φ(x^k) − φ* and the inner-loop residue ψ_k(x^k) − ψ_k*.

Lemma 3 (relation between inner and outer loop residues). Suppose there exist α > 0, δ > 0 and r ≥ 1 such that the inverse of the subdifferential mapping T = ∂φ satisfies the growth condition

(3.1)    ∀ w ∈ B(0, δ), ∀ x ∈ T^{-1}(w), dist(X̂, x) ≤ α|w|^r,

where B(0, δ) is the closed ball centered at 0 with radius δ in R^n. Then for any sequence {x^k} ⊂ X \ X̂ such that dist(X̂, x^k) → 0 as k → ∞,

(3.2)    ψ_k(x^k) − ψ_k* ≤ φ(x^k) − φ* ≤ (2 + (2α/c^r) dist^{r−1}(X̂, x^k)) (ψ_k(x^k) − ψ_k*)

for sufficiently large k. Hence for any ε > 0, there exists k₁ such that for all k ≥ k₁,

(3.3)    φ(x^k) − φ* ≤ (2 + 2α/c + ε) (ψ_k(x^k) − ψ_k*).

Proof. The first half of (3.2) follows directly from ψ_k(x^k) = φ(x^k) and ψ_k* ≥ φ*. Now we prove the second half of (3.2). Let z̄^{k+1} = argmin_x ψ_k(x). Then z̄^{k+1} = J_c(x^k), and z̄^{k+1} ≠ x^k unless x^k ∈ X̂, where J_c = (I + cT)^{-1} is the proximal mapping related to the operator T. According to Güler [3, Lemma 2.2],

(3.4)    φ(z̄^{k+1}) − φ(z) ≤ (1/(2c))|z − x^k|² − (1/(2c))|x^k − z̄^{k+1}|² − (1/(2c))|z − z̄^{k+1}|² ∀ z ∈ X.

For any x̂ ∈ X̂, z = x̂ in (3.4) yields

(3.5)    φ(z̄^{k+1}) − φ* ≤ (1/(2c))|x̂ − x^k|² − (1/(2c))|x^k − z̄^{k+1}|² − (1/(2c))|x̂ − z̄^{k+1}|².

On the other hand, z = x^k in (3.4) yields

(3.6)    ψ_k(x^k) − ψ_k(z̄^{k+1}) ≥ (1/(2c))|x^k − z̄^{k+1}|².

Since ψ_k(z̄^{k+1}) − φ* = φ(z̄^{k+1}) − φ* + (1/(2c))|z̄^{k+1} − x^k|², combining (3.5) and (3.6), we get

(3.7)    (ψ_k(z̄^{k+1}) − φ*) / (ψ_k(x^k) − ψ_k(z̄^{k+1})) ≤ (|x̂ − x^k|² − |x̂ − z̄^{k+1}|²) / |x^k − z̄^{k+1}|².

But

(3.8)    |x̂ − x^k|² − |x̂ − z̄^{k+1}|² ≤ (2|x̂ − z̄^{k+1}| + |x^k − z̄^{k+1}|) |x^k − z̄^{k+1}|.

Substituting (3.8) in (3.7), we get

(ψ_k(z̄^{k+1}) − φ*) / (ψ_k(x^k) − ψ_k(z̄^{k+1})) ≤ 1 + 2|x̂ − z̄^{k+1}| / |x^k − z̄^{k+1}|

for any x̂ ∈ X̂. Hence

(3.9)    (ψ_k(z̄^{k+1}) − φ*) / (ψ_k(x^k) − ψ_k(z̄^{k+1})) ≤ 1 + 2 dist(X̂, z̄^{k+1}) / |x^k − z̄^{k+1}|,

where φ* is the optimal value of (PP).

Now, according to [12, Proposition 1],

(3.10)    |J_c(z) − J_c(z')|² + |(I − J_c)(z) − (I − J_c)(z')|² ≤ |z − z'|²,

(3.11)    (I − J_c)(z) ∈ cT(J_c(z)),

for all z and z' in R^n. Let z = x^k and z' = x̂ in (3.10) and (3.11). By (3.10),

|z̄^{k+1} − x̂|² + |x^k − z̄^{k+1}|² ≤ |x^k − x̂|² ∀ x̂ ∈ X̂.

Hence |x^k − z̄^{k+1}| ≤ dist(X̂, x^k). Therefore

|x^k − z̄^{k+1}| → 0 as dist(X̂, x^k) → 0.

But x^k − z̄^{k+1} ∈ cT(z̄^{k+1}) by (3.11), or, in other words, z̄^{k+1} ∈ T^{-1}((x^k − z̄^{k+1})/c). Hence it follows from (3.1) that

(3.12)    dist(X̂, z̄^{k+1}) ≤ (α/c^r)|x^k − z̄^{k+1}|^r

for sufficiently large k. Substituting (3.12) in (3.9), we obtain

(ψ_k(z̄^{k+1}) − φ*) / (ψ_k(x^k) − ψ_k(z̄^{k+1})) ≤ 1 + (2α/c^r)|x^k − z̄^{k+1}|^{r−1},

from which (3.2) follows, since ψ_k(x^k) = φ(x^k), ψ_k(z̄^{k+1}) = ψ_k*, and |x^k − z̄^{k+1}| ≤ dist(X̂, x^k). □

Lemma 3 may be viewed as a quantification of the relationship

J_c(x) = x ⟺ 0 ∈ T(x).

The result can also be used in the optimality test of the algorithm. If an estimation of ψ_k(x^k) − ψ_k* for the subproblem is available, then an estimation of φ(x^k) − φ* for the original problem can be obtained via (3.3).

The growth condition (3.1) is in fact a local upper Lipschitz condition with modulus α and order r on T^{-1}. (For example, if φ(x) = ½|x|² on R^n, then T^{-1}(w) = w and (3.1) holds with α = 1 and r = 1.) It is a common requirement when convergence rates are under consideration. Both Rockafellar [12] and Luque [4] used similar conditions in their analysis of PPAs. Actually, the condition in [12] on the convergence of the asymptotically accurate PPA implies the growth condition with r = 1. See Zhu [18] for more discussion of the growth conditions.


Theorem 4 (relation between inner and outer loop rates). Suppose {x^k} ⊂ X \ X̂ is a sequence converging to X̂ in the sense that dist(X̂, x^k) → 0. Assume the growth condition (3.1) on the inverse of the subdifferential mapping T = ∂φ is satisfied with constants α > 0, δ > 0 and r ≥ 1. Let {x̃^k} ⊂ X be a sequence satisfying

(3.13)    ψ_k(x̃^{k+1}) − ψ_k* ≤ η (ψ_k(x^k) − ψ_k*)

with some η ∈ [0,1) for all sufficiently large k. Then

(3.14)    (φ(x̃^{k+1}) − φ*) / (φ(x^k) − φ*) ≤ 1 − (1 − η) / (2(1 + (α/c^r) dist^{r−1}(X̂, x^k)))

for all sufficiently large k.

Proof. It follows from (3.13) that

(3.15)    ψ_k(x^k) − ψ_k(x̃^{k+1}) ≥ (1 − η)(ψ_k(x^k) − ψ_k*)

for sufficiently large k. However, by the second half of (3.2),

(3.16)    φ(x^k) − φ* ≤ (2 + (2α/c^r) dist^{r−1}(X̂, x^k)) (ψ_k(x^k) − ψ_k*)

for sufficiently large k. Observe that φ(x^k) − φ* > 0 for x^k ∉ X̂. Hence, dividing (3.15) by (3.16), we have

(ψ_k(x^k) − ψ_k(x̃^{k+1})) / (φ(x^k) − φ*) ≥ (1 − η) / (2(1 + (α/c^r) dist^{r−1}(X̂, x^k)))

for sufficiently large k. But ψ_k(x^k) = φ(x^k) and ψ_k(x̃^{k+1}) ≥ φ(x̃^{k+1}). Therefore

(φ(x^k) − φ(x̃^{k+1})) / (φ(x^k) − φ*) ≥ (1 − η) / (2(1 + (α/c^r) dist^{r−1}(X̂, x^k)))

for sufficiently large k, from which (3.14) follows. □

Corollary 5 (rate of convergence of inexact PPA-1 and PPA-2). Suppose there exist constants α > 0, δ > 0 and r ≥ 1 such that the inverse of the subdifferential mapping T = ∂f̄ related to f̄ in (1.4) satisfies the growth condition

(3.17)    ∀ w ∈ B(0, δ), ∀ x ∈ T^{-1}(w), dist(X̄, x) ≤ α|w|^r,

where B(0, δ) is the closed ball centered at 0 with radius δ in R^n. Let {x^k} ⊂ X \ X̄ be a sequence generated by PPA-1 (or PPA-2). If the conditions in Proposition 1 (or Proposition 2) for the convergence of PPA-1 (or PPA-2, respectively) are satisfied, then the sequence {f(x^k)} of objective values converges to the optimal value of (P) linearly in the sense that

(3.18)    (f(x^{k+1}) − f*) / (f(x^k) − f*) ≤ 1 − (1 − η) / (2(1 + (α/c^r) dist^{r−1}(X̄, x^k)))

for all sufficiently large k, and

(3.19)    lim sup_{k→∞} (f(x^{k+1}) − f*) / (f(x^k) − f*) ≤ (1 + η)/2 if r > 1,

where η ∈ (0,1) is the constant in the stopping criterion of the algorithm.

Proof. By Proposition 1 (or Proposition 2, respectively), we have dist(X̄, x^k) → 0 as k → ∞. Now invoke Theorem 4 for φ = f̄(x) = f(x) + δ_X(x). (3.18) and (3.19) follow directly from (3.14), since f(x^{k+1}) ≤ f(x̃^{k+1}) in the algorithm. □

According to Corollary 5, a linear rate of the inner-loop iteration on the "regularized" subproblems will induce a linear rate on the outer-loop iteration, provided the growth condition (3.17) is satisfied. Hence algorithms possessing a linear or superlinear rate of convergence on "regular" problems can be extended to solve "singular" problems via the inexact PPA.

Corollary 5 also sheds light on the choice of the proximal parameter c. For r = 1, which is the common case corresponding to an upper Lipschitz continuous inverse of the subdifferential mapping ∂f̄ in some neighborhood of 0, the rate given by (3.18) depends on the proximal parameter c, as well as on the inner loop rate represented by η. Once the inner loop algorithm is chosen, the relationship between η and c can be determined. Then the choice of c can be optimized by minimizing the right-hand side of (3.18).
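This optimization of c can be sketched numerically. The inner-loop model η(c) below is the m-step steepest-descent factor derived in Section 4; the constants α, σ, m and the grid are hypothetical illustrative choices, not values from the paper.

```python
# Sketch: pick c by minimizing the r = 1 outer-loop rate bound of (3.18),
#   rate(c) = 1 - (1 - eta(c)) / (2 * (1 + alpha / c)),
# under the model eta(c) = (c*sigma / (1 + c*sigma))**m for the inner loop.
ALPHA, SIGMA, M = 1.0, 10.0, 5     # illustrative constants

def eta(c):
    # Inner-loop relative accuracy after m steepest-descent steps (Section 4 model).
    return (c * SIGMA / (1.0 + c * SIGMA)) ** M

def rate_bound(c):
    # Right-hand side of (3.18) with r = 1 and dist^0 = 1.
    return 1.0 - (1.0 - eta(c)) / (2.0 * (1.0 + ALPHA / c))

# Crude grid search over c: a "suboptimal" proximal parameter.
grid = [0.01 * i for i in range(1, 2001)]   # c in (0, 20]
c_best = min(grid, key=rate_bound)
```

The trade-off described in the text is visible in the model: as c → 0 the outer factor 1 + α/c blows up, and as c → ∞ the inner factor η(c) approaches 1, so the bound is minimized at an intermediate c.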


4. Applications in Unconstrained Minimization.

Consider the unconstrained minimization of a convex C² function with a possibly singular Hessian matrix:

(P_S)    minimize f(x) over all x ∈ R^n.

Most commonly used minimization algorithms with a linear (or superlinear) rate require that the Hessian matrix be uniformly positive definite and bounded in some neighborhood N of X̄, that is, that there exist σ_n ≥ σ_1 > 0 such that

(4.1)    σ_1|d|² ≤ ⟨d, ∇²f(x) d⟩ ≤ σ_n|d|² ∀ x ∈ N, d ∈ R^n.

Their convergence properties are usually not guaranteed if the Hessian matrix is singular at some x̄ ∈ X̄, a situation that occurs, for instance, when the optimal solution is not unique. Now, by putting these algorithms in the inner loop of the inexact PPA, we can extend them to solve problems with singular Hessian matrices, while still retaining a linear rate of convergence.

Take the steepest descent method as an example. The algorithm searches along the direction d^k = −∇f(x^k) to generate the new iterate

x^{k+1} := x^k + λ_k d^k, where λ_k = argmin_{λ∈[0,+∞)} f(x^k + λ d^k).

If (4.1) is satisfied (without loss of generality, suppose N is convex), then

(4.2)    f(x^{k+1}) ≤ f(x^k) − (1/(2σ_n))|∇f(x^k)|²

for x^k sufficiently close to X̄, by the Taylor expansion at an arbitrary step length α,

f(x^k + α d^k) = f(x^k) + α⟨∇f(x^k), d^k⟩ + (α²/2)⟨d^k, ∇²f(w) d^k⟩,

where w ∈ [x^k, x^k + α d^k]. On the other hand, the optimal solution x̄ ∈ X̄ gives

(4.3)    f(x^k) − f* ≤ (1/(2σ_1))|∇f(x^k)|².

Combining (4.2) and (4.3), we have

f(x^{k+1}) − f* ≤ (1 − σ_1/σ_n)(f(x^k) − f*).

Therefore the conditions in Proposition 1 are satisfied, and the algorithm converges in the sense that dist(x^k, X̄) → 0 and f(x^k) → f* as k → ∞, unless the iteration terminates with an x^k ∈ X̄. But

|∇f(x^k)||x^k − x̄| ≥ ⟨∇f(x^k), x^k − x̄⟩ ≥ σ_1|x^k − x̄|²

(cf. [8, p. 86]). Hence the sequence of objective values will eventually converge with a linear rate

(4.4)    (f(x^{k+1}) − f*) / (f(x^k) − f*) ≤ 1 − σ_1/σ_n.

Now consider problems with singular Hessian matrices. Suppose the optimal solution set X̄ of (P_S) is nonempty and bounded. Let f(·; z) be the regularized objective function defined in (2.1). One possible iteration scheme based on the Inexact PPA-1 goes as follows.

A Linearly Convergent Inexact PPA for Unconstrained Minimization. Choose an integer m ≥ 1. Specify starting point x⁰ ∈ R^n. For k = 0, 1, 2, ..., generate the sequence {x^k} as follows: if x^k ∈ X̄, then stop; otherwise
(i) minimize f(w; x^k) starting from w^{k,0} = x^k to generate w^{k,1}, w^{k,2}, ..., w^{k,m} by using the steepest descent algorithm;
(ii) define x̃^{k+1} := w^{k,m}, y^{k+1} := w^{k,1}; take

x^{k+1} = x̃^{k+1} if f(x̃^{k+1}) ≤ f(y^{k+1}), and x^{k+1} = y^{k+1} otherwise;

and repeat.

Obviously, the point w^{k,1} is taken as the benchmark in Step 1 of PPA-1, and the inner loop minimization in Step 2 is stopped when the number of iterations exceeds m. Now ∇f(x; x) = ∇f(x), and ∇f(x; x) = 0 if and only if x ∈ X̄. Define

D(x) = x − c∇f(x; x) = x − c∇f(x).

The first step of the steepest descent method in the inner loop of minimizing f_k(w) yields

(4.5)    w^{k,1} = x^k − λ ∇f(x^k), where λ = argmin_{t≥0} f(x^k − t∇f(x^k); x^k).


Now for any bounded neighborhood N of X̄, there exists σ_N > 0 such that the objective function f(·; z) of the inner-loop subproblem satisfies

(4.6)    (1/c)|d|² ≤ ⟨d, ∇²f(x; z) d⟩ ≤ (σ_N + 1/c)|d|² ∀ x ∈ N, d ∈ R^n.

Without loss of generality, suppose N is convex. By taking limits on both sides of the inequality

f(w^{k,t}) ≤ f(w^{k,t}; x^k) ≤ f(x^k; x^k) = f(x^k)

as k → ∞, it is easy to see that, for any t ∈ {0, 1, ..., m}, f(w^{k,t}) → f*. Hence all the points w^{k,1}, w^{k,2}, ..., w^{k,m} will be in N for sufficiently large k. Therefore, similar to (4.4) for f, we have

(4.7)    (f_k(w^{k,t+1}) − f_k*) / (f_k(w^{k,t}) − f_k*) ≤ 1 − (1/c)/(σ_N + 1/c) = cσ_N/(1 + cσ_N)

for f_k by (4.6), where t = 0, 1, ..., m−1. Note that x̃^{k+1} = w^{k,m} and x^k = w^{k,0} in the algorithm. Multiplying (4.7) for t = 0, 1, ..., m−1, we obtain

(4.8)    (f_k(x̃^{k+1}) − f_k*) / (f_k(x^k) − f_k*) ≤ (cσ_N/(1 + cσ_N))^m.

Hence the stopping criterion in PPA-1 is satisfied with an η equal to the right-hand side of (4.8). According to Corollary 5, the outer iteration converges with a linear rate

(4.9)    (f(x^{k+1}) − f*) / (f(x^k) − f*) ≤ 1 − (1 − (cσ_N/(1 + cσ_N))^m) / (2(1 + α/c))

(taking r = 1 in (3.18)), provided the growth condition (3.17) is satisfied. A "suboptimal" choice of c could be obtained by minimizing the right-hand side of (4.9).
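A minimal sketch of this scheme on our own test problem (not from the paper): f(x) = (a·x − b)² has the singular Hessian 2aaᵀ and a whole line of minimizers, and the inner loop performs m exact-line-search steepest-descent steps on the strongly convex quadratic f_k, with w^{k,1} serving as the benchmark point.

```python
import numpy as np

a = np.array([1.0, 1.0])
b = 1.0
C, M = 1.0, 3                      # proximal parameter c, inner step count m

def f(x):                          # Hessian 2*a*a^T is singular: minimizers form a line
    return (a @ x - b) ** 2

def grad_f(x):
    return 2.0 * (a @ x - b) * a

def inner_sd(z):
    # m steepest-descent steps with exact line search on the quadratic
    # f_k(w) = f(w) + |w - z|^2 / (2c), which is strongly convex.
    H = 2.0 * np.outer(a, a) + np.eye(2) / C   # constant Hessian of f_k
    w = z.copy()
    first = None
    for _ in range(M):
        g = grad_f(w) + (w - z) / C
        if g @ g < 1e-30:                      # already (numerically) optimal
            break
        lam = (g @ g) / (g @ (H @ g))          # exact step for a quadratic
        w = w - lam * g
        if first is None:
            first = w.copy()                   # w^{k,1}: the benchmark point
    if first is None:
        first = w.copy()
    return w, first

def inexact_ppa(x0, outer=60):
    x = x0
    vals = [f(x)]
    for _ in range(outer):
        x_tilde, y = inner_sd(x)               # steps (i)-(ii)
        x = x_tilde if f(x_tilde) <= f(y) else y
        vals.append(f(x))
    return x, vals

x_fin, vals = inexact_ppa(np.array([2.0, 3.0]))
```

Ordinary steepest descent has no uniform rate guarantee here because (4.1) fails, while each regularized subproblem has condition number bounded by 1 + 2c|a|², so the inner loop makes fixed relative progress per outer iteration.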

Note that instead of the steepest descent method, one can use the conjugate gradient method or the BFGS method in the inner loop iteration as well, to get the "inexact PPA-1 version of the CG method" or the "inexact PPA-1 version of the BFGS method" correspondingly. These iteration schemes can be used on singular problems where the corresponding ordinary methods may fail. The convergence of these schemes follows immediately from the analysis in this section, since the first step of either the CG or the BFGS method is identical to the first step of the steepest descent method. The rate of convergence will depend on the inner loop progress resulting from those individual algorithms.

5. Application in Hemiquadratic Extended Linear-Quadratic Problems.

Let

(5.1)    L(u, v) = p·u + ½u·Pu + q·v − ½v·Qv − v·Ru,

where p ∈ R^n, q ∈ R^m, and R ∈ R^{m×n}. The matrices P ∈ R^{n×n} and Q ∈ R^{m×m} are symmetric and positive semidefinite. (In this section, the Euclidean inner product is indicated simply by "·" for convenience.)

The associated primal and dual extended linear-quadratic problems (ELQPs) are

(P_L)    minimize f(u) over all u ∈ U, where f(u) := sup_{v∈V} L(u, v),

(Q_L)    maximize g(v) over all v ∈ V, where g(v) := inf_{u∈U} L(u, v),

where U and V are nonempty polyhedral (convex) sets in R^n and R^m, respectively. Problems (P_L) and (Q_L) have a common optimal value if either has a finite optimal value; in this case, the problems are equivalent to the saddle point problem of the Lagrangian L on U × V [15, Theorem 1]. The objective functions f and g in these problems are piecewise linear-quadratic, and may in general have discontinuities in their first or second order derivatives.

The primal-dual steepest descent algorithm is a newly developed method for large-scale ELQPs with special structure [19, 16]. It has a linear rate on fully quadratic problems, that is, problems with positive definite P and Q matrices [16]. Now, with the aid of the inexact PPA, we can extend this algorithm to solve problems where one of P and Q is singular (the hemiquadratic case). The resulting iteration scheme will retain a linear rate of convergence.

Suppose that in (5.1) the matrix P is singular, while the matrix Q is positive definite. Then dom f = R^n, and we can apply the inexact PPA-1 to the primal problem (P_L). (The variable u corresponds to the variable x in Sections 2 and 3, and the sets U and Ū correspond to the sets X and X̄, respectively.) The objective function of the inner-loop subproblem is

(5.2)    f(u; z) = f(u) + (1/(2c))|u − z|² = sup_{v∈V} L̄(u, v; z),

where L̄ is the augmented Lagrangian

(5.3)    L̄(u, v; z) = L(u, v) + (1/(2c))|u − z|².

Let F(·) : U → V and G(·; ·) : V × U → U be the mappings

(5.4)    F(u) = argmax_{v∈V} L(u, v)  and  G(v; z) = argmin_{u∈U} L̄(u, v; z),

respectively. For fixed z, define the mapping D̄(·; z) : U → U as

(5.5)    D̄(u; z) = G(F(u); z).

The primal part of the algorithm [16], applied to the subproblem of minimizing f̃(u; z) over U, goes as follows. Specify some starting point u^0 ∈ U. At u^k, generate u^{k+1} by

u^{k+1} := u^k + λ_k (D̄(u^k; z) − u^k),   (5.6)

where there are three possible rules for determining the step length λ_k [16]:

(i) Perfect line search;

(ii) Fixed step lengths:

λ_k := min{1, 1/(2γ)},   where γ = |Q^{−1/2} R (P + (1/c) I)^{−1/2}|;


(iii) Adaptive step lengths

with δ ∈ (0, 1) and g̃(v; z) = min_{u∈U} L̃(u, v; z). The corresponding single-step rates are [16, Theorems 3.1, 3.2]

for rules (i) and (ii), and

for rule (iii), respectively. A possible iteration scheme based on the inexact PPA-1 goes as follows.

A Linearly Convergent Inexact PPA for Hemiquadratic ELQP. Choose an integer m ≥ 1. Specify the starting point u^0 ∈ U. For k = 0, 1, 2, …, generate the sequence {u^k} as follows: if u^k ∈ Ū, then stop; otherwise

(i) minimize f̃(w; u^k) starting from w^{k,0} = u^k to generate w^{k,1}, w^{k,2}, …, w^{k,m} by using the iteration (5.6);

(ii) define u^{k+1} := w^{k,m}, y^{k+1} := w^{k,1}; take

In this scheme, the point w^{k,1} is taken as the benchmark in Step 1 of PPA-1, and the inner loop minimization in Step 2 is stopped when the number of iterations reaches m. Define

D(u) = D̄(u; u).

It follows from Rockafellar [14, p. 469] that D is continuous. Moreover, by [19, Proposition 3.1], d(u) = D(u) − u is a descent direction of f̃(·; u) at u for u ∉ Ū. Hence the conditions in Proposition 1 are satisfied under any of the step length rules (i),


or (ii), or (iii). Therefore the algorithm converges in the sense that dist(u^k, Ū) → 0 and f(u^k) → f* as k → ∞, unless the iteration terminates with some u^k ∈ Ū.
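The outer/inner structure of the scheme can be sketched concretely. In the minimal illustration below, plain gradient descent on the regularized subproblem stands in for the primal-dual inner iteration (5.6) (an assumption made for simplicity; all names are hypothetical). The objective f(u) = (1/2)(u_1 + u_2)² has a singular Hessian, which is exactly the situation the scheme is designed for: f itself is not strongly convex, but each regularized subproblem is.

```python
# Inexact PPA sketch: a fixed number m of inner descent steps per outer
# step, applied to the regularized subproblem
#   f~(u; z) = f(u) + (1/2c)|u - z|^2,
# which is strongly convex even though f(u) = (1/2)(u1 + u2)^2 is not.
c, m = 1.0, 5
step = 1.0 / (2.0 + 1.0/c)          # 1/L for this quadratic subproblem

def grad_reg(u, z):
    """Gradient of f~(u; z) for f(u) = (1/2)(u1 + u2)^2."""
    s = u[0] + u[1]
    return (s + (u[0] - z[0])/c, s + (u[1] - z[1])/c)

u = (3.0, 2.0)                      # starting point u^0
for k in range(40):                 # outer PPA loop
    z = u
    w = u                           # w^{k,0} = u^k
    for j in range(m):              # inner loop: exactly m iterations
        g = grad_reg(w, z)
        w = (w[0] - step*g[0], w[1] - step*g[1])
    u = w                           # u^{k+1} := w^{k,m}

residual = 0.5 * (u[0] + u[1])**2   # f(u^k) - f*, with f* = 0 here
```

Even with only a few inner steps per outer iteration, the residual decreases by a roughly constant factor per outer step, which is the behavior the fixed-effort inner loop stopping rule is designed to deliver.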

To verify that the growth condition in Corollary 5 is satisfied, observe that the subdifferential ∂f is a polyhedral multifunction in the sense of Robinson [9] (cf. [13, p. 787]). Hence the mapping T = ∂f + N_U is also a polyhedral multifunction, where N_U(u) is the normal cone of U at u. (The equality follows from [11, Theorem 23.8].) Then, by [9, Corollary], there exist α > 0 and δ > 0 such that the growth condition (3.17) is satisfied with r = 1.

The rate of the outer loop can be obtained by first raising the single-step ratio in (5.7) or (5.8) to the m-th power to get the constant η in the inner loop stopping criterion, and then substituting this η in Corollary 5. For instance, if step rule (i) or (ii) is used in the inner loop, we have

(5.9)

where η_(i),(ii) is given by (5.7). Note that u^{k+1} = w^{k,m} and u^k = w^{k,0} in the algorithm. Hence, multiplying (5.9) for t = 0, 1, …, m − 1, we obtain

According to Corollary 5, the outer iteration converges with a linear rate

(5.10)

A "suboptimal" choice of c could be obtained by minimizing the right-hand side of (5.10).

The inner loop algorithm used above has been extended in [17] to solve the minimax problems represented by a more general Lagrangian than (5.1). Therefore, it is not difficult to derive a linearly convergent inexact PPA for those minimax problems by following the patterns in this section.


6. Linear Convergence of Correa-Lemaréchal's Implementable PPA.

In this section, we prove the linear convergence of Correa-Lemaréchal's "implementable form" of PPA [2, Algorithm 3.3]. Let φ: R^n → R ∪ {∞} be a proper lower semicontinuous convex function, and let ∂_ε φ denote the ε-subdifferential of φ. Recall that

g ∈ ∂_ε φ(z)  ⟺  φ(z′) ≥ φ(z) + ⟨g, z′ − z⟩ − ε   for all z′.

For simplicity of notation, we will still use a fixed proximal constant c; the extension of the results to an indexed sequence {c_k} is straightforward. With the notation developed in Section 3, Correa-Lemaréchal's implementable form of PPA for minimizing φ(z) goes as follows.
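Before stating the algorithm, here is a quick numerical sanity check of the ε-subdifferential definition above. For φ(z) = z²/2 the gap φ(z′) − φ(z) − g(z′ − z) is minimized at z′ = g with value −(g − z)²/2, so g is an ε-subgradient at z exactly when (g − z)² ≤ 2ε. The helper name and grid are hypothetical illustration choices.

```python
# Test the defining inequality  phi(z') >= phi(z) + g*(z' - z) - eps
# over a grid of z' values, for phi(z) = z^2/2.  For this phi,
# g is an eps-subgradient at z  iff  (g - z)^2 <= 2*eps.
def is_eps_subgrad(g, z, eps, grid):
    phi = lambda t: 0.5 * t * t
    return all(phi(zp) >= phi(z) + g * (zp - z) - eps for zp in grid)

grid = [x / 10.0 for x in range(-100, 101)]   # z' ranging over [-10, 10]
```

Note that the grid check is only a finite proxy for the "for all z′" quantifier, but for this quadratic φ the worst case z′ = g lies inside the grid.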

Correa-Lemaréchal's implementable form of PPA [2]. Choose an algorithm that generates a minimizing sequence {w^j} of φ̃(·; z) for fixed z. Fix κ > 1 and μ ∈ (0, 1). Start from z^1 ∈ X; set k = 1.

Step 1. Set j = 1; start from some w^{k,1}.

Step 2. If the test (6.2), with ε(z^k, w^{k,j}) given by (6.1), holds, then go to Step 3; otherwise compute w^{k,j+1}, increase j by 1, and execute Step 2 again.

Step 3. Set z^{k+1} = w^{k,j}; increase k by 1 and loop to Step 1.
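The loop structure of Steps 1-3 can be sketched as follows. Since this summary does not reproduce the statements of (6.1)-(6.2), the acceptance test below is a hedged stand-in: it accepts w once the descent inequality φ(z^k) − φ(w) ≥ (μ/c)|w − z^k|² holds, which is a condition of the type that (6.1)-(6.2) guarantee, not the exact test of [2]. The model objective and all names are illustrative assumptions.

```python
# Implementable-PPA-style loop for the model objective phi(z) = z^2/2,
# which is strongly coercive.  The inner solver is gradient descent on
# phi~(w; z) = phi(w) + (1/2c)(w - z)^2; the acceptance test is a
# stand-in descent condition, NOT the eps-subgradient test (6.2) of [2].
phi  = lambda t: 0.5 * t * t
dphi = lambda t: t
c, mu = 1.0, 0.75

def inner_step(w, z):
    step = 1.0 / (1.0 + 1.0/c)      # 1/L for this quadratic subproblem
    return w - step * (dphi(w) + (w - z)/c)

z = 4.0
for k in range(60):                 # outer loop
    w = inner_step(z, z)            # w^{k,1}
    for j in range(50):             # Step 2: iterate until the test holds
        if phi(z) - phi(w) >= (mu/c) * (w - z)**2:
            break
        w = inner_step(w, z)
    z = w                           # Step 3: z^{k+1} := w^{k,j}
```

On this example each outer step contracts |z^k| by a constant factor, illustrating the linear rate established below.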

Correa and Lemaréchal pointed out in [2] that (6.1) and (6.2) imply

They also proved that if φ is strongly coercive, i.e.,

φ(z)/|z| → +∞ as |z| → ∞,   (6.4)

then


(i) for each k, Step 2 eventually exits to Step 3, unless z^k is already optimal [2, Theorem 3.2];

(ii) lim inf_{k→∞} φ(z^k) = φ* [2, Proposition 3.2].

Now we prove further that, when the parameters κ and μ fall in certain ranges, the algorithm converges at least linearly on problems satisfying the growth condition (3.1).

Proposition 6 (rate of convergence of Correa-Lemaréchal's form of PPA). Suppose the optimal solution set X̄ is nonempty and bounded, and z^k ∉ X̄ in Correa-Lemaréchal's implementable PPA for all k. If φ is strongly coercive in the sense of (6.4), and there exist constants α > 0, δ > 0 and r ≥ 1 such that the inverse of the subdifferential mapping T = ∂φ satisfies the growth condition (3.1) in Lemma 3, then the sequence {z^k} ⊂ X generated by Correa-Lemaréchal's implementable PPA satisfies

φ(z^{k+1}) − φ* ≤ [1 − 1/((2κ + 2/(2μ − 1))(1 + (α/c) dist^{r−1}(z^k, X̄)))] (φ(z^k) − φ*)   (6.5)

for all sufficiently large k. Hence the sequence {φ(z^k)} of objective values converges to the optimal value φ* linearly,

(φ(z^{k+1}) − φ*) / (φ(z^k) − φ*) ≤ 1 − 1/((2κ + 2/(2μ − 1))(1 + α/c))   (6.6)

for all sufficiently large k.

Proof. We prove the proposition by showing that dist(z^k, X̄) → 0 as k → ∞, and that (3.13) holds with

q = 1 − 1/(κ + 1/(2μ − 1)).   (6.8)

Then the conclusions of the proposition follow directly from Theorem 4.

First observe that lim inf_{k→∞} φ(z^k) = φ* implies lim_{k→∞} φ(z^k) = φ*, since the sequence {φ(z^k)} is monotonically decreasing by (6.3). Arguing by contradiction, suppose


dist(z^k, X̄) ↛ 0 as k → ∞. Then there exists a subsequence {z^k}_K, where K is an infinite subset of the set of all positive integers, such that

z^k → z̄ ∉ X̄ as k → ∞, k ∈ K,

for some z̄ ∈ X. Hence lim_{k→∞, k∈K} φ(z^k) = φ(z̄) > φ*, which contradicts the fact that lim_{k→∞} φ(z^k) = φ*. Therefore, under the assumptions of the proposition, we have

dist(z^k, X̄) → 0 as k → ∞.

Next observe that (6.2) in the algorithm, with ε(z^k, w^j) given by (6.1), yields

φ(z) ≥ φ(z^k) + (1/c)⟨z^k − z^{k+1}, z − z^k⟩ − ε(z^k, z^{k+1})   for all z

= φ(z^k) + (1/c)⟨z^k − z^{k+1}, z − z^k⟩ − κ[φ(z^k) − φ(z^{k+1}) − (1/c)|z^{k+1} − z^k|²]   for all z,

or equivalently,

Let z = ẑ^{k+1} := argmin_{z∈X} φ̃_k(z). We have

Therefore, with the notation φ̃_k* = min_{z∈X} φ̃_k(z),


Hence, for μ ∈ (1/2, 1), there holds

φ̃_k(z^k) − φ̃_k* ≤ κ[φ̃_k(z^k) − φ̃_k(z^{k+1})] + (1/(2c))|z^{k+1} − z^k|².   (6.11)

On the other hand,

φ̃_k(z^k) − φ̃_k(z^{k+1}) = φ(z^k) − φ(z^{k+1}) − (1/(2c))|z^{k+1} − z^k|² ≥ ((2μ − 1)/(2c))|z^{k+1} − z^k|² ≥ 0

by (6.3). Hence

(1/(2c))|z^{k+1} − z^k|² ≤ (1/(2μ − 1))[φ̃_k(z^k) − φ̃_k(z^{k+1})].   (6.12)

Substituting (6.12) in (6.11), we obtain

φ̃_k(z^k) − φ̃_k* ≤ (κ + 1/(2μ − 1))[φ̃_k(z^k) − φ̃_k(z^{k+1})].

Then

φ̃_k(z^k) − φ̃_k(z^{k+1}) ≥ (1/(κ + 1/(2μ − 1)))[φ̃_k(z^k) − φ̃_k*],

or equivalently,

φ̃_k(z^{k+1}) − φ̃_k* ≤ [1 − 1/(κ + 1/(2μ − 1))][φ̃_k(z^k) − φ̃_k*].

Therefore the condition (3.13) in Theorem 4 holds with q given by (6.8). ∎

Acknowledgments. After this paper was written, the author became aware of A. N. Iusem and B. F. Svaiter's paper, in press with RAIRO, titled "A proximal regularization of the steepest descent method," where a regularized objective function is used in the line search of the steepest descent method. Iusem and Svaiter prove the convergence of their method for pseudo-convex objective functions without requiring boundedness of the level sets. However, their paper presents no rate of convergence results. In contrast, the example in Section 4 of the present paper with m = 1 shows that their method has a linear rate when the objective function satisfies the conditions in this paper.


References.

[1] M. S. Bazaraa and C. M. Shetty, Nonlinear Programming: Theory and Algorithms, Wiley, 1979.

[2] R. Correa and C. Lemaréchal, "Convergence of some algorithms for convex minimization," Math. Programming 62 (1993), pp. 261-275.

[3] O. Güler, "On the convergence of the proximal point algorithm for convex minimization," SIAM J. Control Opt. 29 (1991), pp. 403-419.

[4] F. J. Luque, “Asymptotic convergence analysis of the proximal point algorithm,” SIAM J. Control Opt. 22 (1984), pp. 277-293.

[5] B. Martinet, "Régularisation d'inéquations variationnelles par approximations successives," Rev. Française Inf. Rech. Opér., 1970, pp. 154-159.

[6] B. Martinet, "Détermination approchée d'un point fixe d'une application pseudo-contractante," C.R. Acad. Sci. Paris 274 (1972), pp. 163-165.

[7] G. J. Minty, “Monotone (nonlinear) operators in Hilbert space,” Duke Math. J. 29 (1962), pp. 341-346.

[8] J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, New York, 1970.

[9] S. M. Robinson, "Some continuity properties of polyhedral multifunctions," Math. Programming Studies 14 (1981), pp. 205-214.

[10] R. T. Rockafellar, "On the maximality of sums of nonlinear monotone operators," Trans. Amer. Math. Soc. 149 (1970), pp. 75-88.

[11] R. T. Rockafellar, Convex Analysis, Princeton Univ. Press, Princeton, N.J., 1970.

[12] R. T. Rockafellar, "Monotone operators and the proximal point algorithm," SIAM J. Control Opt. 14 (1976), pp. 877-898.

[13] R. T. Rockafellar, "Linear-quadratic programming and optimal control," SIAM J. Control Opt. 25 (1987), pp. 781-814.

[14] R. T. Rockafellar, "Computational schemes for solving large-scale problems in extended linear-quadratic programming," Math. Programming 48 (1990), pp. 447-474.

[15] R. T. Rockafellar and R. J-B Wets, "A Lagrangian finite generation technique for solving linear-quadratic problems in stochastic programming," Math. Programming Studies 28 (1986), pp. 63-93.

[16] C. Zhu, "On the primal-dual steepest descent algorithm for extended linear-quadratic programming," SIAM J. Optimization, accepted.

[17] C. Zhu, “Solving large scale minimax problems with the primal-dual steepest descent algorithm,” Mathematical Programming, accepted.

[18] C. Zhu, "Asymptotic convergence analysis of the forward-backward splitting algorithm," Mathematics of Operations Research, accepted.

[19] C. Zhu and R. T. Rockafellar, "Primal-dual projected gradient algorithms for extended linear-quadratic programming," SIAM J. Optimization 3 (1993), pp. 751-783.
