filled functions for unconstrained global optimization



Journal of Global Optimization 20: 49–65, 2001. © 2001 Kluwer Academic Publishers. Printed in the Netherlands.

Filled functions for unconstrained global optimization

ZHENG XU1,∗, HONG-XUAN HUANG2,†, PANOS M. PARDALOS1,∗ and CHENG-XIAN XU3,‡

1 Department of Industrial and Systems Engineering, University of Florida, Gainesville, FL 32611, USA; 2 Department of Mathematical Sciences, Tsinghua University, Beijing 100084, P. R. China; 3 Faculty of Science, Xi’an Jiaotong University, Xi’an 710049, P. R. China

(Received 28 August 1998; accepted in revised form 23 February 2001)

Abstract. This paper is concerned with filled function techniques for unconstrained global minimization of a continuous function of several variables. More general forms of filled functions are presented for smooth and non-smooth optimization problems. These functions have either one or two adjustable parameters. Conditions on functions and on the values of parameters are given so that the constructed functions have the desired properties of filled functions.

Key words: Global optimization, Local minimizer, Filled function, Basin, Hill

1. Introduction

Global optimization problems arise in many diverse fields of science and technology. Many methods have been proposed to search for a globally optimal solution of a given function (Ge, 1990; Horst and Pardalos, 1995; Horst et al., 2000; Levy and Gómez, 1985; Wales and Scheraga, 1999). Many deterministic methods, including Filled Function (Ge, 1990), Tunneling (Levy and Gómez, 1985) and Basin-Hopping (Wales and Scheraga, 1999), use a transformed objective function strategy to construct a path from one of the local minimizers of a given function to another local minimizer with lower function value (if the objective function has many minimizers).

This paper is concerned with filled functions for unconstrained global minimization of a continuous function $F(x)$, $x \in \mathbb{R}^n$. Let $X$ be a closed and bounded nonempty set which contains a finite number of minimizers of the function $F(x)$, and let $x_k^* \in X$ be a known local minimizer of $F(x)$ with $F(x_k^*) > F^* = \min\{F(x) \mid x \in X\}$. The basic idea of the filled function methods is to construct an auxiliary function, called a filled function of $F(x)$, such that minimizing the filled function will generate a point $x_{k+1}$ in a basin (a particular connected domain around a local minimizer, see Definition 2.2 in Section 2) of $F(x)$ lower than the basin $B_k^*$ of $F(x)$ at $x_k^*$. Then the minimization of the function $F(x)$ can be restarted at the point $x_{k+1}$ to generate a new minimizer $x_{k+1}^*$ of $F(x)$ with $F(x_{k+1}^*) < F(x_k^*)$. The process is repeated until a global minimizer of $F(x)$ is found. The filled function is updated at successive local minimizers of $F(x)$. The filled function at a local minimizer $x_k^*$ of $F(x)$ is required to reach its maximum at $x_k^*$, to have neither a minimizer nor a saddle point in the basin $B_k^*$ and in any basin of $F(x)$ higher than $B_k^*$, and to have minimizers or saddle points in basins of $F(x)$ lower than $B_k^*$.

∗ The author was partially supported by NSF under Grants DBI 9808210 and EIA 9872509.
† The author was partially supported by the China Scholarship Council while the author was visiting the Department of Industrial and Systems Engineering, University of Florida, Gainesville, USA.
‡ The author was supported by Natural Science Foundation of China.

The first filled function with two adjustable parameters was proposed for smooth optimization by Ge in 1983 and finally published in 1990. Theoretical analyses and numerical experiments show that the filled function method is promising. However, the filled function with two parameters has some disadvantages, in particular the excessive restriction on the choice of the parameter values. Modifications to the filled function have been made to avoid the restricted choice of the parameter values (Ge and Qin, 1987), and to extend the method to non-smooth optimization (Ge, 1987; Kong and Zhuang, 1996). In this paper we present extensions of Ge's type filled functions in (Ge, 1987, 1990; Ge and Qin, 1987, 1990; Kong and Zhuang, 1996; Zhuang, 1994) to more general forms for smooth optimization, and present some filled functions for non-smooth optimization.

The paper is organized as follows: In Section 2, some definitions and assumptions related to the filled functions are presented. Sections 3 and 4 are devoted to general forms of the filled functions with two parameters and with one parameter, respectively, for smooth optimization. In Section 5, we will present some filled functions for non-smooth optimization.
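The restart scheme described in this Introduction can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the authors' implementation: the one-variable test function, the gradient-descent local solver, and the names `local_minimize`, `filled_function_search` and `scan_escape` are all hypothetical, and the escape phase is a crude grid scan standing in for the minimization of a filled function.

```python
from typing import Callable, Optional


def F(x: float) -> float:
    # Hypothetical test function with two minimizers; the left one is global.
    return (x * x - 1.0) ** 2 + 0.3 * x


def dF(x: float) -> float:
    return 4.0 * x * (x * x - 1.0) + 0.3


def local_minimize(x: float, step: float = 0.01, iters: int = 20000) -> float:
    """Plain gradient descent (a stand-in for any local descent method)."""
    for _ in range(iters):
        x -= step * dF(x)
    return x


def scan_escape(x_star: float, lo: float = -2.0, hi: float = 2.0,
                n: int = 400) -> Optional[float]:
    """Stand-in for the filled-function phase: grid scan for F(x) < F(x_star)."""
    for i in range(n + 1):
        x = lo + (hi - lo) * i / n
        if F(x) < F(x_star) - 1e-9:
            return x
    return None


def filled_function_search(escape: Callable[[float], Optional[float]],
                           x0: float, max_rounds: int = 10) -> float:
    """Alternate local minimization of F with an escape phase."""
    x_star = local_minimize(x0)
    for _ in range(max_rounds):
        x_next = escape(x_star)          # point in a lower basin, or None
        if x_next is None:
            break
        candidate = local_minimize(x_next)
        if F(candidate) >= F(x_star):    # no improvement: stop
            break
        x_star = candidate
    return x_star


x_first = local_minimize(1.5)            # local minimizer near x = 0.96
x_global = filled_function_search(scan_escape, x0=1.5)
```

In the paper the escape phase is carried out by minimizing a filled function constructed at the current minimizer; the grid scan above only marks where that step plugs in.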

2. Definitions and assumptions

In this section we give some definitions and assumptions that will be used in the paper. It is assumed that the function $F(x)$ is globally convex (or coercive) in the sense that $F(x) \to +\infty$ as $\|x\| \to +\infty$. Global convexity implies that there exists a closed and bounded set $X \subset \mathbb{R}^n$ such that $X$ contains all global minimizers of $F(x)$, and that all minimizers (local or global) of $F(x)$ in $X$ are interior points of $X$. It is also assumed that $F(x)$ has a finite number of minimizers in $X$ and hence, each minimizer is isolated.

DEFINITION 2.1. Let $x_1, x_2 \in \mathbb{R}^n$, $x_1 \neq x_2$. $x_1 - x_2$ is called a descent (ascent) segment of $F(x)$ if the function $f(\lambda) = F(x_2 + \lambda(x_1 - x_2))$ is monotonically decreasing (increasing) for $\lambda \in [0, 1]$. $x_1 - x_2$ is called a strong descent (ascent) segment of $F(x)$ if one of the following conditions holds:

(i) When $F(x)$ is continuously differentiable,

$$f'(\lambda) = \nabla F(x_2 + \lambda(x_1 - x_2))^T (x_1 - x_2) < 0 \ (> 0), \quad \forall \lambda \in [0, 1].$$

(ii) When $F(x)$ is directionally differentiable,

$$\max\{(x_1 - x_2)^T \xi \mid \xi \in \partial F(x(\lambda))\} < 0 \ (> 0),$$

where $\partial F(x)$ is the subdifferential (the set of sub-gradients) of $F(x)$ at $x$ and $x(\lambda) = x_2 + \lambda(x_1 - x_2)$, $\lambda \in [0, 1]$.
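Definition 2.1 can be probed numerically by sampling $f(\lambda)$ along the segment. The sketch below is only a heuristic check under assumptions: the helper `segment_kind` and the test function `sphere` are hypothetical names, monotonicity is tested in the strict sense, and sampling can suggest but never prove that a segment is a descent or ascent segment.

```python
from typing import Callable, Sequence


def segment_kind(F: Callable[[Sequence[float]], float],
                 x1: Sequence[float], x2: Sequence[float],
                 samples: int = 200) -> str:
    """Classify x1 - x2 by sampling f(lam) = F(x2 + lam * (x1 - x2)) on [0, 1]."""
    values = []
    for i in range(samples + 1):
        lam = i / samples
        point = [b + lam * (a - b) for a, b in zip(x1, x2)]
        values.append(F(point))
    diffs = [later - earlier for earlier, later in zip(values, values[1:])]
    if all(d < 0 for d in diffs):
        return "descent"
    if all(d > 0 for d in diffs):
        return "ascent"
    return "neither"


def sphere(x: Sequence[float]) -> float:
    # Hypothetical test function with its only minimizer at the origin.
    return sum(t * t for t in x)


kind_in = segment_kind(sphere, [0.0, 0.0], [1.0, 2.0])       # toward the minimizer
kind_out = segment_kind(sphere, [1.0, 2.0], [0.0, 0.0])      # away from the minimizer
kind_across = segment_kind(sphere, [-1.0, 0.0], [1.0, 0.0])  # passes through it
```

Note the convention in the definition: the segment $x_1 - x_2$ starts at $x_2$ ($\lambda = 0$) and ends at $x_1$ ($\lambda = 1$).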

DEFINITION 2.2. The basin $B_k^*$ at a local minimizer $x_k^*$ of $F(x)$ is a connected domain with the following properties:

(i) $x_k^* \in B_k^*$;
(ii) for any $x \in B_k^*$ such that $x \neq x_k^*$ and $F(x) > F(x_k^*)$, there exists a descent trajectory from $x$ to $x_k^*$.

The hill of $F(x)$ at its maximizer $x_k^*$ is the basin of $-F(x)$ at $x_k^*$.

The idea of the basin appeared in the 1970s (Dixon et al., 1976). If the function $F(x)$ is continuously differentiable, the descent trajectory is a smooth curve, for example, the steepest descent trajectory. If the function $F(x)$ is non-smooth, the trajectory may consist of a finite number of descent segments of $F(x)$.

DEFINITION 2.3. Let $x_k^*$ and $x_{k+1}^*$ be two distinct minimizers of $F(x)$. If $F(x_k^*) > F(x_{k+1}^*)$, we say that the basin $B_{k+1}^*$ at $x_{k+1}^*$ is lower than the basin $B_k^*$ at $x_k^*$, or that the basin $B_k^*$ is higher than the basin $B_{k+1}^*$.

Let $U_k^*$ denote the union of all basins higher than $B_k^*$. It is clear that $F(x) > F(x_k^*)$ holds for any point $x \in U_k^*$.

DEFINITION 2.4. The simple basin $S_k^*$ at a local minimizer $x_k^*$ of $F(x)$ is a connected domain with the following properties:

(i) $x_k^* \in S_k^* \subset B_k^*$;
(ii) $(x - x_k^*)$ is a strong ascent segment of $F(x)$ for any $x \in S_k^*$, $x \neq x_k^*$.

By Definition 2.1, we know that

$$(x - x_k^*)^T \nabla F(x) > 0, \quad \forall x \in S_k^*,\ x \neq x_k^*,$$

if $F(x)$ is continuously differentiable. In addition, if $x_k^*$ is an isolated minimizer of $F(x)$, then

$$D_k = \min\{\|x - x_k^*\| \mid x \notin S_k^*\} > 0, \tag{2.1}$$

that is, the minimal radius of the simple basin $S_k^*$ is not zero. If $F(x)$ is twice continuously differentiable and $\nabla^2 F(x)$ is positive definite at $x_k^*$, then $x_k^*$ is an isolated minimizer.

DEFINITION 2.5. $P(x)$ is called a filled function of the function $F(x)$ at a local minimizer $x_k^*$ if $P(x)$ has the following properties:

(i) $x_k^*$ is a local maximizer of $P(x)$;
(ii) $P(x)$ has neither a minimizer nor a saddle point in $W_k^* \setminus \{x_k^*\}$ and the set $W_k^*$ becomes a part of a hill of $P(x)$ at $x_k^*$, where $W_k^* = B_k^* \cup U_k^*$;
(iii) if $F(x)$ has a basin, e.g., $B_{k+1}^*$, lower than $B_k^*$, then there exists a point $x' \in B_{k+1}^*$ that minimizes $P(x)$ along the ray $x_k^* + \lambda(x' - x_k^*)$, $\lambda > 0$.

These properties of a filled function ensure that when a descent method, for example a quasi-Newton method, is employed to minimize the constructed filled function, the sequence of iterate points will not terminate at any point in $W_k^*$, and when there exist basins of $F(x)$ lower than $B_k^*$, the sequence will either terminate at a point in a basin lower than $B_k^*$ or generate a point such that the value of $F(x)$ is less than $F(x_k^*)$.

In the following sections, it is assumed that $x_k^*$ is an available local minimizer of $F(x)$ and $F^*$ is the global minimum of $F(x)$.

3. Filled functions with two parameters

In this section we study filled functions with two parameters for smooth optimization. It is assumed that the function $F(x)$ is twice continuously differentiable and has a finite number of minimizers in the compact set $X$. Furthermore, suppose that $\nabla^2 F(x)$ is positive definite at every minimizer. In this case, each minimizer of $F(x)$ is an isolated minimizer and is contained in a certain simple basin.

The first filled function proposed by Ge (1990) has the form

$$p(x, r, \rho) = \frac{1}{r + F(x)} \exp\left(-\frac{\|x - x_k^*\|^2}{\rho^2}\right),$$

where $r$ and $\rho$ are two adjustable parameters. Under some conditions on the function $F(x)$ and on the values of the parameters $r$ and $\rho$, the function $p(x, r, \rho)$ is a filled function of $F(x)$.
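As a small illustration of the form of $p(x, r, \rho)$, the sketch below evaluates it for a one-variable test function and checks that the known local minimizer of $F$ is a local maximizer of $p$. The test function, the Newton helper, and the values $r = \rho = 1$ are assumptions chosen only so that $r + F(x) > 0$ on the sampled points; they are not taken from the paper.

```python
import math


def F(x: float) -> float:
    # Hypothetical test function: local minimizer near 0.96, global one near -1.04.
    return (x * x - 1.0) ** 2 + 0.3 * x


def newton_minimizer(x: float, iters: int = 50) -> float:
    """Newton iteration on F'(x) = 0, started close to the minimizer of interest."""
    for _ in range(iters):
        g = 4.0 * x * (x * x - 1.0) + 0.3
        h = 12.0 * x * x - 4.0
        x -= g / h
    return x


def ge_filled(x: float, x_star: float, r: float, rho: float) -> float:
    """p(x, r, rho) = exp(-|x - x_star|^2 / rho^2) / (r + F(x))."""
    return math.exp(-((x - x_star) ** 2) / rho ** 2) / (r + F(x))


x_star = newton_minimizer(1.0)
r, rho = 1.0, 1.0   # illustrative values only
peak = ge_filled(x_star, x_star, r, rho)
nearby = [ge_filled(x_star + s, x_star, r, rho) for s in (-0.1, -0.01, 0.01, 0.1)]
```

Both factors of $p$ decrease when moving away from $x_k^*$ inside its basin: $1/(r + F(x))$ because $F$ grows, and the exponential because the distance grows.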

A more general form of filled functions with two parameters can be expressed as

$$P(x, r, A) = \psi(r + F(x)) \exp(-A w(\|x - x_k^*\|^\beta)), \quad \beta \geq 1,\ A > 0, \tag{3.2}$$

where the value of $r$ is chosen so that $r + F(x) > 0$ for all $x \in X$, and the functions $\psi(t)$, $w(t)$ have the following properties:

(i) $\psi(t)$ and $w(t)$ are continuously differentiable for $t \in (0, +\infty)$;
(ii) for $t \in (0, +\infty)$, $\psi(t) > 0$, $\psi'(t) < 0$ and $\psi'(t)/\psi(t)$ is monotonically increasing;
(iii) $w(0) = 0$ and for any $t \in (0, +\infty)$, $w(t) > 0$, $w'(t) \geq c > 0$.

Choices for these two functions can be $1/t^a$ $(a > 0)$, $\operatorname{csch}(t)$, $\exp(1/t) - 1$, $\cdots$ for $\psi(t)$ and $t$, $\sinh(t)$, $e^t - 1$, $\cdots$ for $w(t)$.
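The listed candidates can be checked against properties (ii) and (iii) on a sample grid. This is a numerical spot check under assumptions (the grid $t \in (0, 5]$, the exponent $a = 2$, the bound $c = 1$, and the helper names `psi_ok` and `w_ok` are all chosen for illustration), not a proof of the properties.

```python
import math

# Each entry maps a label to (function, derivative).
PSI = {
    "1/t^a (a=2)": (lambda t: t ** -2.0, lambda t: -2.0 * t ** -3.0),
    "csch(t)": (lambda t: 1.0 / math.sinh(t),
                lambda t: -math.cosh(t) / math.sinh(t) ** 2),
    "exp(1/t)-1": (lambda t: math.expm1(1.0 / t),
                   lambda t: -math.exp(1.0 / t) / t ** 2),
}
W = {
    "t": (lambda t: t, lambda t: 1.0),
    "sinh(t)": (math.sinh, math.cosh),
    "e^t-1": (math.expm1, math.exp),
}
GRID = [0.1 * i for i in range(1, 51)]   # sample points in (0, 5]


def psi_ok(psi, dpsi) -> bool:
    """psi > 0, psi' < 0, and psi'/psi increasing on the grid."""
    ratios = [dpsi(t) / psi(t) for t in GRID]
    signs = all(psi(t) > 0 and dpsi(t) < 0 for t in GRID)
    return signs and all(b > a for a, b in zip(ratios, ratios[1:]))


def w_ok(w, dw, c: float = 1.0) -> bool:
    """w(0) = 0, w > 0 and w' >= c on the grid."""
    return w(0.0) == 0.0 and all(w(t) > 0 and dw(t) >= c for t in GRID)


psi_results = {name: psi_ok(*pair) for name, pair in PSI.items()}
w_results = {name: w_ok(*pair) for name, pair in W.items()}
```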

The following theorems give the conditions on the values of the parameters $r$ and $A$ so that $P(x, r, A)$ is a filled function of $F(x)$.

THEOREM 3.1. $x_k^*$ is a local maximizer of $P(x, r, A)$ for any $A > 0$.

Proof. Since $x_k^*$ is an isolated local minimizer of $F(x)$ and is contained in a simple basin $S_k^*$, $F(x) > F(x_k^*)$ for all $x \in S_k^*$, $x \neq x_k^*$. From the properties of the functions $\psi$ and $w$, it follows that

$$P(x_k^*, r, A) = \psi(r + F(x_k^*)) > \psi(r + F(x)) > \psi(r + F(x)) \exp(-A w(\|x - x_k^*\|^\beta)) = P(x, r, A)$$

for all $x \in S_k^*$, $x \neq x_k^*$. Thus, $x_k^*$ is a local maximizer of $P(x, r, A)$.

THEOREM 3.2. If $r$ and $A$ satisfy the inequality

$$A(r + F(x_k^*)) \geq \frac{L}{c \alpha \beta D_k^{\beta-1}}, \tag{3.3}$$

then any $x \in W_k^* \setminus \{x_k^*\}$ is not a stationary point of $P(x, r, A)$, where

$$L = \max\{\|\nabla F(x)\| \mid x \in X\}, \qquad \theta(t) = -\frac{\psi(t)}{t \psi'(t)},\ t \in (0, +\infty), \qquad \alpha = \min\{\theta(r + F(x)) \mid x \in X\},$$

$D_k$ is defined in (2.1) and $W_k^*$ has the same meaning as in Definition 2.5.

Note that since both $\theta(t)$ and $F(x)$ are continuous, $X$ is compact and $r + F(x) > 0$ for all $x \in X$, the minimum of the function $\theta(r + F(x))$ exists on the set $X$.

Proof. The gradient of the function $P(x, r, A)$ with respect to $x$ is given by

$$\nabla P(x, r, A) = -\exp(-A w(\|x - x_k^*\|^\beta)) \left[ -\psi'(r + F(x)) \nabla F(x) + A \beta \psi(r + F(x)) w'(\|x - x_k^*\|^\beta) \|x - x_k^*\|^{\beta-2} (x - x_k^*) \right]. \tag{3.4}$$

For $x \in S_k^*$, $x \neq x_k^*$, it follows from $(x - x_k^*)^T \nabla F(x) > 0$ that $(x - x_k^*)^T \nabla P(x, r, A) < 0$. Hence, $\nabla P(x, r, A) \neq 0$, $x$ is not a stationary point of $P(x, r, A)$ and $x - x_k^*$ is a descent direction of the function $P(x, r, A)$ at $x$.

For any $x \in V_k^* = W_k^* \setminus S_k^*$, it follows from (3.4) that if $x$ is a stationary point of $P(x, r, A)$, then the equation

$$\frac{-\psi'(r + F(x))}{A \psi(r + F(x))} \nabla F(x) + \beta w'(\|x - x_k^*\|^\beta) \|x - x_k^*\|^{\beta-2} (x - x_k^*) = 0$$

must hold. Hence, a necessary condition for $x$ to be a stationary point of $P(x, r, A)$ is

$$\frac{1}{A(r + F(x))} = \frac{\beta w'(\|x - x_k^*\|^\beta) \|x - x_k^*\|^{\beta-1}}{\|\nabla F(x)\|} \theta(r + F(x)).$$

On one hand, from the property of the function $w(t)$ and the definitions of $D_k$, $\alpha$ and $L$, we have

$$\frac{\beta w'(\|x - x_k^*\|^\beta) \|x - x_k^*\|^{\beta-1}}{\|\nabla F(x)\|} \theta(r + F(x)) \geq \frac{c \alpha \beta D_k^{\beta-1}}{L}. \tag{3.5}$$

On the other hand, by using the hypothesis (3.3) and the fact that $x \in V_k^*$, we have

$$\frac{1}{A(r + F(x))} < \frac{1}{A(r + F(x_k^*))} \leq \frac{c \alpha \beta D_k^{\beta-1}}{L}. \tag{3.6}$$

Therefore, (3.5), (3.6) and the above necessary condition show that when the values of $r$ and $A$ satisfy (3.3), $x \in V_k^*$ cannot be a stationary point of $P(x, r, A)$.
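The gradient formula (3.4) used in the proof above can be verified against central finite differences. The sketch below does this for one assumed instance, $\psi(t) = 1/t$, $w(t) = t$, $\beta = 2$, with a hypothetical two-variable test function and parameter values; the reference point `X_STAR` is only an approximate minimizer, which does not matter for checking the formula itself.

```python
import math

A, R, BETA = 0.5, 1.0, 2.0   # illustrative parameter values
X_STAR = (0.96, 0.0)         # approximate local minimizer of the test function


def F(x):
    return (x[0] ** 2 - 1.0) ** 2 + 0.3 * x[0] + x[1] ** 2


def grad_F(x):
    return (4.0 * x[0] * (x[0] ** 2 - 1.0) + 0.3, 2.0 * x[1])


def psi(t):
    return 1.0 / t


def dpsi(t):
    return -1.0 / t ** 2


def w(t):
    return t


def dw(t):
    return 1.0


def P(x):
    dist = math.dist(x, X_STAR)
    return psi(R + F(x)) * math.exp(-A * w(dist ** BETA))


def grad_P(x):
    """Gradient of P according to formula (3.4)."""
    dist = math.dist(x, X_STAR)
    t = R + F(x)
    g = grad_F(x)
    scale = A * BETA * psi(t) * dw(dist ** BETA) * dist ** (BETA - 2.0)
    bracket = [-dpsi(t) * gi + scale * (xi - si)
               for gi, xi, si in zip(g, x, X_STAR)]
    return [-math.exp(-A * w(dist ** BETA)) * b for b in bracket]


def numeric_grad(f, x, h=1e-6):
    """Central finite differences, one coordinate at a time."""
    out = []
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        out.append((f(xp) - f(xm)) / (2.0 * h))
    return out


x = (0.3, -0.4)
analytic = grad_P(x)
numeric = numeric_grad(P, x)
```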

From Theorem 3.2 and its proof, we can obtain the following conclusions:

(i) for any $x \in S_k^*$, $x \neq x_k^*$, $x - x_k^*$ is a strong descent segment of $P(x, r, A)$ at $x_k^*$;
(ii) for $x \in V_k^*$, appropriate choices for the values of $r$ and $A$ can make $x - x_k^*$ a descent direction of $P(x, r, A)$ at $x$. Thus, $W_k^*$ becomes a part of a hill of $P(x, r, A)$ with top $x_k^*$. In fact, it follows from (3.4) that $(x - x_k^*)^T \nabla P(x, r, A) < 0$ is equivalent to the inequality

$$\frac{-\psi'(r + F(x))}{A \psi(r + F(x))} (x - x_k^*)^T \nabla F(x) + \beta w'(\|x - x_k^*\|^\beta) \|x - x_k^*\|^\beta > 0. \tag{3.7}$$

If $(x - x_k^*)^T \nabla F(x) \geq 0$, the inequality (3.7) certainly holds. If $(x - x_k^*)^T \nabla F(x) < 0$, then the inequality (3.7) also holds when

$$\frac{1}{A(r + F(x))} < \frac{1}{A(r + F(x_k^*))} \leq \frac{-\beta w'(\|x - x_k^*\|^\beta) \|x - x_k^*\|^\beta}{(x - x_k^*)^T \nabla F(x)} \theta(r + F(x)). \tag{3.8}$$

It can be seen that condition (3.8) is implied by condition (3.3).

The following two theorems have proofs similar to that of Theorem 3.2, and hence their proofs are omitted.

THEOREM 3.3. If the values of $r$ and $A$ satisfy the inequality

$$A(r + F^*) \geq \frac{L}{c \alpha \beta D_k^{\beta-1}},$$

then $x - x_k^*$ is a descent direction of $P(x, r, A)$ for any $x \in X$, $x \neq x_k^*$, and $P(x, r, A)$ has neither a minimizer nor a saddle point in $X$.

THEOREM 3.4. Let $x \notin W_k^*$. If $(x - x_k^*)^T \nabla F(x) \geq 0$, then $x - x_k^*$ is a descent direction of $P(x, r, A)$ at $x$. If $(x - x_k^*)^T \nabla F(x) < 0$, then $x - x_k^*$ is an ascent direction of $P(x, r, A)$ at $x$ when

$$\frac{1}{A(r + F(x))} > \frac{-\beta w'(\|x - x_k^*\|^\beta) \|x - x_k^*\|^\beta}{(x - x_k^*)^T \nabla F(x)} \theta(r + F(x)). \tag{3.9}$$

Theorem 3.4 indicates that if $F(x)$ has basins lower than $B_k^*$, the function $P(x, r, A)$ may have either minimizers or saddle points in a basin of $F(x)$ lower than $B_k^*$ when the values of $r$, $A$ and $F(x)$ satisfy condition (3.9). If $x$ is a point in such a basin with $(x - x_k^*)^T \nabla F(x) < 0$, then $P(x, r, A)$ has a minimizer on the line segment $x(\lambda) = x_k^* + \lambda(x - x_k^*)$, $\lambda \in [0, 1]$. This is because $P(x(\lambda), r, A)$ is decreasing for small $\lambda$ and increasing for $\lambda$ close to 1.

The following theorem shows that for appropriately chosen values of $r$ and $A$, any minimizer or saddle point $x$ of $P(x, r, A)$ really lies in a basin of $F(x)$ lower than the basin $B_k^*$, and $x - x_k^*$ is a descent direction of $F(x)$ at $x$.

THEOREM 3.5. Let the values of $r$ and $A$ satisfy

$$\frac{1}{A(r + F(x_k^*))} < \frac{c \beta D_k^{\beta-1} \theta(r + F(x_k^*))}{L}. \tag{3.10}$$

If $x$ is either a minimizer or a saddle point of $P(x, r, A)$, then

$$(x - x_k^*)^T \nabla F(x) < 0 \quad \text{and} \quad F(x) < F(x_k^*).$$

Proof. It follows from $\nabla P(x, r, A) = 0$ and (3.4) that

$$\frac{-\psi'(r + F(x))}{A \psi(r + F(x))} (x - x_k^*)^T \nabla F(x) + \beta w'(\|x - x_k^*\|^\beta) \|x - x_k^*\|^\beta = 0. \tag{3.11}$$

Thus we have

$$(x - x_k^*)^T \nabla F(x) < 0.$$

Since $(x - x_k^*)^T \nabla F(x) > 0$ for any $x \in S_k^* \setminus \{x_k^*\}$, $x$ cannot be a point in $S_k^*$.

Suppose $F(x) \geq F(x_k^*)$. It follows from (3.11) and the monotonic property of the function $\psi'(t)/\psi(t)$ that

$$\frac{c \beta D_k^{\beta-1}}{L} \leq \frac{\beta \|x - x_k^*\|^{\beta-1} w'(\|x - x_k^*\|^\beta)}{\|\nabla F(x)\|} \leq \frac{-\psi'(r + F(x))}{A \psi(r + F(x))} \leq \frac{-\psi'(r + F(x_k^*))}{A \psi(r + F(x_k^*))} = \frac{1}{A(r + F(x_k^*)) \theta(r + F(x_k^*))}.$$

This contradicts (3.10). Thus $F(x) < F(x_k^*)$ must hold.

The above analyses indicate that the function $P(x, r, A)$ is a desired filled function of the function $F(x)$ provided that the values of $r$ and $A$ are properly chosen. Hence, $P(x, r, A)$ can be used in the process of finding global minimizers of $F(x)$. However, the filled function $P(x, r, A)$ has some disadvantages:

(i) The evaluations of the function $P(x, r, A)$ and the gradient $\nabla P(x, r, A)$ are easily affected by the factor $\exp(-A w(\|x - x_k^*\|^\beta))$. Large values of $A w(\|x - x_k^*\|^\beta)$ will either lead to an overflow of calculations or make very small changes in $P(x, r, A)$ and $\nabla P(x, r, A)$.
(ii) The choices for the values of the parameters $r$ and $A$ are very restricted. It follows from Theorems 3.2 and 3.3 that $r$ and $A$ must satisfy the inequalities

$$A(r + F(x_k^*)) \geq \frac{L}{c \alpha \beta D_k^{\beta-1}} > A(r + F^*),$$

when $r + F^* > 0$, in order that $P(x, r, A)$ becomes a desired filled function.
(iii) An undesirable property of the filled function $P(x, r, A)$ is indicated by Theorem 3.3, that is, global minimizers of the function $F(x)$ may be lost if the values of $r$ and $A$ are not properly chosen.
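Disadvantage (i) is easy to reproduce in double-precision arithmetic. The following sketch assumes $w(t) = t$, $\beta = 2$ and the illustrative values $A = 10$, $\|x - x_k^*\| = 10$ (none of them from the paper): the decaying factor silently underflows to zero, which flattens $P$ and $\nabla P$, while the corresponding growing exponential raises an overflow error in Python.

```python
import math


def exp_factor(A: float, dist: float, beta: float = 2.0) -> float:
    """The factor exp(-A * w(dist^beta)) with the assumed choice w(t) = t."""
    return math.exp(-A * dist ** beta)


moderate = exp_factor(A=1.0, dist=1.0)     # exp(-1), well inside the float range
vanished = exp_factor(A=10.0, dist=10.0)   # exp(-1000) underflows to 0.0

try:
    math.exp(10.0 * 10.0 ** 2)             # exp(+1000) exceeds the float range
    overflowed = False
except OverflowError:
    overflowed = True
```

Once the factor is exactly zero, both $P$ and its gradient evaluate to zero regardless of $F(x)$, so a descent method receives no usable information there.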

4. Filled functions with one parameter

Attempts have been made to improve the properties of the filled functions. The basic idea of the modification is to cancel the parameter $r$ by taking $r = -F(x_k^*)$. The first modification was proposed by Ge and Qin (1987) and has the form

$$q(x, A) = -[F(x) - F(x_k^*)] \exp(A \|x - x_k^*\|^2), \quad A > 0.$$

A more general form for filled functions with one parameter can be expressed as

$$Q(x, A) = -\phi(F(x) - F(x_k^*)) \exp(A w(\|x - x_k^*\|^\beta)), \quad \beta \geq 1,\ A > 0,$$

where the function $w(t)$ has the same properties as in Section 3 and $\phi(t)$ has the following properties:

(i) $\phi(t)$ is continuously differentiable for $t \geq 0$;
(ii) $\phi(0) = 0$, $\phi'(t) > 0$, $\forall t \geq 0$;
(iii) $\phi'(t)/\phi(t)$ is monotonically decreasing for $t \in (0, +\infty)$.

Choices for the function $\phi(t)$ can be $t$, $a^t - 1$ $(a > 1)$, $\sinh(t)$ and so on. The filled function properties of $Q(x, A)$ will be proved in the following theorems.

THEOREM 4.1. $x_k^*$ is a local maximizer of $Q(x, A)$ for any $A > 0$.

Proof. It follows from the properties of the function $\phi(t)$ that

$$Q(x_k^*, A) = 0 > Q(x, A), \quad \forall x \in S_k^*,\ x \neq x_k^*.$$

Thus, $x_k^*$ is a local maximizer of $Q(x, A)$.

THEOREM 4.2. For any $x \in W_k^*$, $x \neq x_k^*$, if

$$A > \max\left\{0,\ \frac{-\phi'(F(x) - F(x_k^*)) \nabla F(x)^T (x - x_k^*)}{\phi(F(x) - F(x_k^*)) \beta w'(\|x - x_k^*\|^\beta) \|x - x_k^*\|^\beta}\right\}, \tag{4.12}$$

then $x - x_k^*$ is a descent direction of $Q(x, A)$ at $x$ and $x$ is not a stationary point of $Q(x, A)$.

Proof. The gradient of the function $Q(x, A)$ with respect to $x$ is given by

$$\nabla Q(x, A) = -\exp(A w(\|x - x_k^*\|^\beta)) \left[ \phi'(F(x) - F(x_k^*)) \nabla F(x) + A \beta \phi(F(x) - F(x_k^*)) w'(\|x - x_k^*\|^\beta) \|x - x_k^*\|^{\beta-2} (x - x_k^*) \right]. \tag{4.13}$$

If $x \in S_k^*$, $x \neq x_k^*$, then it follows from $(x - x_k^*)^T \nabla F(x) > 0$ that $(x - x_k^*)^T \nabla Q(x, A) < 0$. Therefore, $x - x_k^*$ is a descent direction of $Q(x, A)$ at $x$ for any $A > 0$ and $x$ is not a stationary point of $Q(x, A)$ since $\nabla Q(x, A) \neq 0$.

Now, we consider points $x \in V_k^* = W_k^* \setminus S_k^*$. If $(x - x_k^*)^T \nabla F(x) \geq 0$, we still have $(x - x_k^*)^T \nabla Q(x, A) < 0$ for any $A > 0$. If $(x - x_k^*)^T \nabla F(x) < 0$, then $(x - x_k^*)^T \nabla Q(x, A) < 0$ is equivalent to

$$\phi'(F(x) - F(x_k^*)) (x - x_k^*)^T \nabla F(x) + A \beta \phi(F(x) - F(x_k^*)) w'(\|x - x_k^*\|^\beta) \|x - x_k^*\|^\beta > 0.$$

This inequality will hold when

$$A > \frac{-\phi'(F(x) - F(x_k^*)) (x - x_k^*)^T \nabla F(x)}{\beta \phi(F(x) - F(x_k^*)) w'(\|x - x_k^*\|^\beta) \|x - x_k^*\|^\beta}.$$

This completes the proof.

Let $x_j^*$, $j = 1, 2, \cdots, \ell$, be the local minimizers of $F(x)$ with $F(x_j^*) > F(x_k^*)$ and

$$m = \frac{\phi'(d)}{\phi(d)}, \qquad d = \min\{F(x_j^*) - F(x_k^*) \mid j = 1, 2, \cdots, \ell\} > 0. \tag{4.14}$$

If

$$A > \frac{mL}{c \beta D_k^{\beta-1}},$$

then (4.12) is implied and $x - x_k^*$ is a descent direction of $Q(x, A)$ at any point $x \in W_k^*$, $x \neq x_k^*$. Therefore, for $A$ large enough, the set $W_k^*$ of $F(x)$ becomes a part of a hill of $Q(x, A)$ with top $x_k^*$.

The following theorem is a direct consequence of $(x - x_k^*)^T \nabla Q(x, A) > 0$ and (4.13), and shows that if $F(x)$ has basins lower than $B_k^*$, $Q(x, A)$ has either minimizers or saddle points in one of such basins.

THEOREM 4.3. Let $x \notin W_k^*$, $F(x) > F(x_k^*)$. If $(x - x_k^*)^T \nabla F(x) < 0$ and

$$A < \frac{-\phi'(F(x) - F(x_k^*)) (x - x_k^*)^T \nabla F(x)}{\beta \phi(F(x) - F(x_k^*)) w'(\|x - x_k^*\|^\beta) \|x - x_k^*\|^\beta}, \tag{4.15}$$

then $x - x_k^*$ is an ascent direction of $Q(x, A)$ at $x$.

When $F(x) - F(x_k^*) \to 0^+$ for $x$ in a basin of $F(x)$ lower than $B_k^*$ (if it exists), $\phi(F(x) - F(x_k^*)) \to 0^+$ and the right hand side of (4.15) tends to infinity. Thus the inequality (4.15) will hold for points in such a basin with $F(x) - F(x_k^*)$ close to $0^+$ no matter how large $A$ is. This property of the filled function $Q(x, A)$ ensures that global minimizers of $F(x)$ will not be lost, and that if the minimization of the filled function $Q(x, A)$ finds a point $x$ with either $F(x) < F(x_k^*)$ or $(x - x_k^*)^T \nabla Q(x, A) > 0$, then the point $x$ is already in a lower basin. Hence the minimization of $Q(x, A)$ can be terminated and the minimization of $F(x)$ can be restarted at the point $x$.
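The termination rule just described can be illustrated in one variable with the assumed choices $\phi(t) = t$, $w(t) = t$, $\beta = 2$, which give the Ge and Qin form $q(x, A)$. Everything else in this sketch is hypothetical: the test function, the value $A = 1$, the fixed-step descent on $Q$, the search box $[-2, 2]$, and the decision to leave $x_k^*$ toward the left.

```python
import math


def F(x: float) -> float:
    # Hypothetical test function: local minimizer near 0.96, global one near -1.04.
    return (x * x - 1.0) ** 2 + 0.3 * x


def dF(x: float) -> float:
    return 4.0 * x * (x * x - 1.0) + 0.3


def local_minimize(x: float, step: float = 0.01, iters: int = 20000) -> float:
    """Plain gradient descent on F (stand-in for any local solver)."""
    for _ in range(iters):
        x -= step * dF(x)
    return x


def dQ(x: float, x_star: float, A: float) -> float:
    """Derivative of Q from (4.13) with phi(t) = t, w(t) = t, beta = 2."""
    gap = F(x) - F(x_star)
    return -math.exp(A * (x - x_star) ** 2) * (dF(x) + 2.0 * A * gap * (x - x_star))


def escape(x_star: float, A: float = 1.0, step: float = 0.01,
           direction: float = -1.0, lo: float = -2.0, hi: float = 2.0):
    """Fixed-step descent on Q; stop as soon as either termination test fires."""
    x = x_star + direction * step
    while lo <= x <= hi:
        if F(x) < F(x_star):
            return x                      # lower function value found
        slope = dQ(x, x_star, A)
        if (x - x_star) * slope > 0.0:
            return x                      # Q now ascends away from x_star
        x -= step if slope > 0.0 else -step
    return None                           # left the search box without success


x_star = local_minimize(1.5)
x_new = escape(x_star)
x_lower = local_minimize(x_new)
```

In this run the second test fires at a point where $F$ is still larger than $F(x_k^*)$ but already slopes down toward the lower minimizer, so restarting the local minimization of $F$ there reaches the lower basin.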

The filled function $Q(x, A)$ removes the excessive restriction on the choice of the parameter values. However, the factor $\exp(A w(\|x - x_k^*\|^\beta))$ still exists and influences the evaluation of $Q(x, A)$ and $\nabla Q(x, A)$. A further modification can be motivated from the fact that if $f(x) > 0$, then $f(x)$ and $\ln(f(x))$ have the same extreme points. Thus, a new filled function of $F(x)$ can be obtained from

$$-\ln\left[(a + \phi(F(x) - F(x_k^*))) \exp(A w(\|x - x_k^*\|^\beta))\right] = -\ln(a + \phi(F(x) - F(x_k^*))) - A w(\|x - x_k^*\|^\beta),$$

where $a > 0$ is a constant.

A more general form for this kind of filled functions is

$$U(x, A) = -\eta(F(x) - F(x_k^*)) - A w(\|x - x_k^*\|^\beta),$$

where the function $w(t)$ has the same properties as in Section 3 and $\eta(t)$ is required to have the following properties:

(i) $\eta(0) = 0$, $\eta(t) > 0$, $\forall t \in (0, +\infty)$;
(ii) $\eta'(t) > 0$ is monotonically decreasing for $t \in (0, +\infty)$;
(iii) $\lim_{t \to 0^+} \eta'(t) = +\infty$.

It can be similarly proved that when the functions $\eta(t)$ and $w(t)$ have the desired properties, the function $U(x, A)$ is a filled function of $F(x)$ without the factor $\exp(A w(\|x - x_k^*\|^\beta))$. That is, $U(x, A)$ has the following desired properties:

(i) the local minimizer $x_k^*$ of $F(x)$ is a local maximizer of $U(x, A)$;
(ii) when

$$A > \frac{L \eta'(d)}{c \beta D_k^{\beta-1}},$$

any point $x \in W_k^*$, $x \neq x_k^*$, is not a stationary point of $U(x, A)$, and $x - x_k^*$ is a descent direction of $U(x, A)$ at $x$, where $d$ is defined in (4.14);
(iii) for a point $x$ in a basin of $F(x)$ lower than $B_k^*$ with $F(x) > F(x_k^*)$, when $(x - x_k^*)^T \nabla F(x) < 0$ and

$$A < \frac{-(x - x_k^*)^T \nabla F(x) \, \eta'(F(x) - F(x_k^*))}{\beta w'(\|x - x_k^*\|^\beta) \|x - x_k^*\|^\beta}, \tag{4.16}$$

$x - x_k^*$ is an ascent direction of $U(x, A)$ at $x$. Hence, $U(x, A)$ has either minimizers or saddle points in basins lower than $B_k^*$.

The third property of the function $\eta(t)$ ensures the satisfaction of (4.16) for points in such a basin with $F(x) - F(x_k^*) \to 0^+$.
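A short numerical comparison shows what removing the exponential factor buys. The sketch assumes $\eta(t) = \sqrt{t}$ (which satisfies the three properties of $\eta$ listed above), $\phi(t) = t$, $w(t) = t$, $\beta = 2$, and a hypothetical one-variable test function with $A = 10$; it evaluates $U$ only at points with $F(x) \geq F(x_k^*)$, where $\sqrt{\cdot}$ is defined.

```python
import math


def F(x: float) -> float:
    # Hypothetical test function with a local minimizer near x = 0.96.
    return (x * x - 1.0) ** 2 + 0.3 * x


def newton_minimizer(x: float, iters: int = 50) -> float:
    for _ in range(iters):
        x -= (4.0 * x * (x * x - 1.0) + 0.3) / (12.0 * x * x - 4.0)
    return x


def eta(t: float) -> float:
    # eta(0) = 0, eta'(t) = 1 / (2 sqrt(t)) is positive, decreasing, unbounded at 0+.
    return math.sqrt(t)


def U(x: float, x_star: float, A: float) -> float:
    return -eta(F(x) - F(x_star)) - A * (x - x_star) ** 2


def Q(x: float, x_star: float, A: float) -> float:
    return -(F(x) - F(x_star)) * math.exp(A * (x - x_star) ** 2)


x_star = newton_minimizer(1.0)
A = 10.0
far = x_star + 10.0                      # A * dist^2 = 1000

u_far = U(far, x_star, A)                # finite: no exponential involved
try:
    Q(far, x_star, A)
    q_overflowed = False
except OverflowError:
    q_overflowed = True                  # exp(1000) is out of float range

u_peak = U(x_star, x_star, A)
u_near = [U(x_star + s, x_star, A) for s in (-0.01, 0.01)]
```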

5. Filled functions for non-smooth optimization

This section is devoted to the filled functions for non-smooth unconstrained optimization. It is assumed that the function $F(x)$ is a composite function of the form $F(x) = f(x) + h(c(x))$, where $f(x)$ and $c(x)^T = (c_1(x) \cdots c_m(x))$ are smooth functions and $h : \mathbb{R}^m \to \mathbb{R}$ is convex but non-smooth [3]. Examples of these functions occur when solving a system of nonlinear equations, finding a feasible point of a system of nonlinear inequalities, in penalty functions for constrained optimization and so on.

The filled functions for non-smooth optimization can be expressed in the same forms as those for smooth optimization. However, for simplicity of discussion, here we will only consider the filled functions with the function $w(t) = t$ and $\beta = 2$.

For the filled functions in the form

$$P(x, r, A) = \psi(r + F(x)) \exp(-A \|x - x_k^*\|^2),$$

the function $\psi(t)$ is required to have the properties:

(i) $\psi(t) > 0$ for $t \geq 0$;
(ii) $\psi(t)$ is monotonically decreasing for $t \geq 0$;
(iii) $\psi(t_1) - \psi(t_2) \leq c_2 (t_2 - t_1)$ for $t_2 > t_1 \geq 0$, where $c_2 > 0$ is a constant.

Because of the disadvantages of the filled functions in this form, we just point out here that, if

$$A(r + F(x_k^*)) > L/(2 \alpha D_k),$$

then the local minimizer $x_k^*$ of $F(x)$ is a local maximizer of $P(x, r, A)$, the set $W_k^*$ of $F(x)$ becomes a part of a hill of $P(x, r, A)$ at $x_k^*$, and $P(x, r, A)$ has either minimizers or saddle points in basins of $F(x)$ lower than $B_k^*$ (if they exist). If

$$A(r + F^*) \geq L/(2 \alpha D_k),$$

then global minimizers of $F(x)$ can be lost, where

$$\alpha = \min\left\{\frac{\psi(r + F(x))}{c_2 (r + F(x))} \,\middle|\, x \in X\right\}, \qquad \|\xi\| \leq L, \ \forall \xi \in \partial F(x),\ x \in X.$$

For the filled functions in the form

$$Q(x, A) = -\phi(F(x) - F(x_k^*)) \exp(A \|x - x_k^*\|^2),$$

the function $\phi(t)$ is required to satisfy:

(i) $\phi(0) = 0$, $\phi(t)$ is monotonically increasing for $t \geq 0$;
(ii) $c_1 (t_2 - t_1) \leq \phi(t_2) - \phi(t_1) \leq c_2 (t_2 - t_1)$ for $t_2 > t_1 \geq 0$, where $0 < c_1 \leq c_2$ are constants.

THEOREM 5.1. $x_k^*$ is a local maximizer of $Q(x, A)$.

Proof. From the fact that $x_k^*$ is a local minimizer of $F(x)$, we have

$$Q(x_k^*, A) = 0 > Q(x, A), \quad \forall x \in S_k^*,\ x \neq x_k^*.$$

Therefore, $x_k^*$ is a local maximizer of $Q(x, A)$.

The following lemmas are required to get the result that the set $W_k^*$ of $F(x)$ becomes a part of a hill of $Q(x, A)$ at $x_k^*$.

LEMMA 5.2. Let $x(\lambda) = x_2 + \lambda(x_1 - x_2)$, $\lambda \in [0, 1]$. If

$$(x_1 - x_2)^T (x_2 - x_k^*) \geq 0, \tag{5.17}$$

then $\|x(\lambda) - x_k^*\|^2$ and $(x_1 - x_2)^T (x(\lambda) - x_k^*)$ are increasing functions for $\lambda \in [0, 1]$ and hence

$$(x_1 - x_2)^T (x(\lambda) - x_k^*) \geq (x_1 - x_2)^T (x_2 - x_k^*) \geq 0, \quad \forall \lambda \in [0, 1].$$

Proof. The conclusions can be obtained directly from condition (5.17) and the following expansions:

$$\|x(\lambda) - x_k^*\|^2 = \|\lambda(x_1 - x_2) + (x_2 - x_k^*)\|^2 = \|x_2 - x_k^*\|^2 + \lambda^2 \|x_1 - x_2\|^2 + 2\lambda (x_1 - x_2)^T (x_2 - x_k^*),$$

$$(x_1 - x_2)^T (x(\lambda) - x_k^*) = (x_1 - x_2)^T [\lambda(x_1 - x_2) + (x_2 - x_k^*)] = \lambda \|x_1 - x_2\|^2 + (x_1 - x_2)^T (x_2 - x_k^*).$$

LEMMA 5.3. Let $x_1 - x_2$ be an ascent segment of $F(x)$. If (5.17) holds, then $x_1 - x_2$ is a descent segment of $Q(x, A)$ for any $A > 0$.

Proof. Let $f(\lambda) = F(x(\lambda)) = F(x_2 + \lambda(x_1 - x_2))$ and $g(\lambda) = Q(x(\lambda), A) = -\phi(f(\lambda) - F(x_k^*)) \exp(A \|x(\lambda) - x_k^*\|^2)$. Then $f(\lambda)$, $\phi(f(\lambda) - F(x_k^*))$, $\exp(A \|x(\lambda) - x_k^*\|^2)$ and hence $\phi(f(\lambda) - F(x_k^*)) \exp(A \|x(\lambda) - x_k^*\|^2)$ are increasing functions for $\lambda \in [0, 1]$. Therefore, $g(\lambda)$ is decreasing for $\lambda \in [0, 1]$, and $x_1 - x_2$ is a descent segment of $Q(x, A)$.

LEMMA 5.4. Let $x_2 \in V_k^* = W_k^* \setminus S_k^*$ and let $x_1 - x_2$ be a descent segment of $F(x)$. If

$$\frac{(x_1 - x_2)^T (x_2 - x_k^*)}{\|x_1 - x_2\| \cdot \|x_2 - x_k^*\|} \geq c > 0, \tag{5.18}$$

$$A \geq \frac{c_2 L}{2 c_1 c \, d \, D_k}, \tag{5.19}$$

then $x_1 - x_2$ is also a descent segment of $Q(x, A)$, where $d$ is defined in (4.14) and $D_k$ is given in (2.1).

Proof. Since $x_1 - x_2$ is a descent segment of $F(x)$, $f(\lambda)$ is decreasing for $\lambda \in [0, 1]$. It is clear that if

$$g(\lambda_2) := -\phi(f(\lambda_2) - F(x_k^*)) \exp(A \|x(\lambda_2) - x_k^*\|^2) < g(\lambda_1) := -\phi(f(\lambda_1) - F(x_k^*)) \exp(A \|x(\lambda_1) - x_k^*\|^2) \tag{5.20}$$

holds for any $1 \geq \lambda_2 > \lambda_1 \geq 0$, then $x_1 - x_2$ is a descent segment of $Q(x, A)$. From (5.20), we have

$$\frac{\phi(f(\lambda_1) - F(x_k^*))}{\phi(f(\lambda_2) - F(x_k^*))} < \exp\left(A (\|x(\lambda_2) - x_k^*\|^2 - \|x(\lambda_1) - x_k^*\|^2)\right). \tag{5.21}$$

Condition (5.18) indicates that $\|x(\lambda) - x_k^*\|^2$ is increasing for $\lambda \in [0, 1]$ (see Lemma 5.2). It follows from the Taylor expansion of $e^t$ that if

$$\frac{\phi(f(\lambda_1) - F(x_k^*)) - \phi(f(\lambda_2) - F(x_k^*))}{\phi(f(\lambda_2) - F(x_k^*))} < A (\|x(\lambda_2) - x_k^*\|^2 - \|x(\lambda_1) - x_k^*\|^2) = A (\lambda_2 - \lambda_1)(x_1 - x_2)^T [2(x_2 - x_k^*) + (\lambda_2 + \lambda_1)(x_1 - x_2)] \tag{5.22}$$

holds, then (5.21) will be satisfied. Under the condition (5.18), we know that (5.22) is equivalent to

$$A > \frac{\phi(f(\lambda_1) - F(x_k^*)) - \phi(f(\lambda_2) - F(x_k^*))}{\phi(f(\lambda_2) - F(x_k^*))} \times \frac{1}{(\lambda_2 - \lambda_1)(x_1 - x_2)^T [2(x_2 - x_k^*) + (\lambda_2 + \lambda_1)(x_1 - x_2)]}. \tag{5.23}$$

From the properties of the function $\phi(t)$ and the definition of $d$, (5.23) is implied by

$$A > \frac{c_2 (F(x_2 + \lambda_1(x_1 - x_2)) - F(x_2 + \lambda_2(x_1 - x_2)))}{c_1 d (\lambda_2 - \lambda_1)(x_1 - x_2)^T [2(x_2 - x_k^*) + (\lambda_2 + \lambda_1)(x_1 - x_2)]}. \tag{5.24}$$

Since $x_1 - x_2$ is a descent segment of $F(x)$, it follows from non-smooth analysis (see [1, 3]) that

$$\lim_{\lambda_2 \to \lambda_1} \frac{F(x_2 + \lambda_2(x_1 - x_2)) - F(x_2 + \lambda_1(x_1 - x_2))}{\lambda_2 - \lambda_1} = \max\{(x_1 - x_2)^T \xi \mid \xi \in \partial F(x(\lambda_1))\} = (x_1 - x_2)^T \bar{\xi} \leq 0,$$

where $\bar{\xi}$ is the element of $\partial F(x(\lambda_1))$ that maximizes the value $(x_1 - x_2)^T \xi$ over all $\xi \in \partial F(x(\lambda_1))$. Taking the limit in the right hand side of (5.24) generates

$$A \geq \frac{-c_2 (x_1 - x_2)^T \bar{\xi}}{2 c_1 d (x_1 - x_2)^T (x(\lambda_1) - x_k^*)}. \tag{5.25}$$

Since $(x_1 - x_2)^T (x(\lambda) - x_k^*) \geq (x_1 - x_2)^T (x_2 - x_k^*) \geq c \|x_1 - x_2\| \cdot \|x_2 - x_k^*\| \geq c D_k \|x_1 - x_2\|$ and $|(x_1 - x_2)^T \bar{\xi}| \leq \|x_1 - x_2\| \cdot \|\bar{\xi}\| \leq L \|x_1 - x_2\|$, (5.25) is implied by condition (5.19). Therefore, when the value of $A$ satisfies condition (5.19), the inequality (5.20) holds, and $x_1 - x_2$ is a descent segment of $Q(x, A)$.

THEOREM 5.5. When the value of $A$ satisfies inequality (5.19), $x - x_k^*$ is a descent segment of $Q(x, A)$ for any $x \in W_k^*$, $x \neq x_k^*$, that is, $W_k^*$ becomes a part of a hill of $Q(x, A)$ at $x_k^*$. Therefore, there is neither a minimizer nor a saddle point of $Q(x, A)$ in the set $W_k^* \setminus \{x_k^*\}$.

Proof. If $x \in S_k^*$, $x \neq x_k^*$, then $x - x_k^*$ is an ascent segment of $F(x)$. Using the same method as in the proof of Lemma 5.3, we can determine that $x - x_k^*$ is a descent segment of $Q(x, A)$.

If $x \in V_k^* = W_k^* \setminus S_k^*$, then there exists some $\lambda \in (0, 1)$ such that $x(\lambda) = x_k^* + \lambda(x - x_k^*) \in \partial S_k^*$, where $\partial S_k^*$ is the boundary of the set $S_k^*$. The result in the previous paragraph shows that $x(\lambda) - x_k^*$ is a descent segment of $Q(x, A)$. The segment $x - x(\lambda)$ can be divided into a number of subsegments such that each subsegment is either a descent or an ascent segment of $F(x)$. It is clear that for such subsegments, the first inequality of condition (5.18) becomes an equality with $c = 1$. Then it follows from Lemmas 5.3 and 5.4 that these subsegments are descent segments of $Q(x, A)$, and hence $x - x_k^*$ is a descent segment of $Q(x, A)$. This completes the proof.

THEOREM 5.6. Let x2 �∈ W ∗k , F(x2) > F(x∗

k ) and x1 − x2 be a strong descentsegment of F(x). If (x1 − x2)

T (x2 − x∗k ) > 0 and

A <−c1(x1 − x2)

T ξ

2(x1 − x2)T (x(λ) − x∗k )φ(F (x(λ)) − F(x∗

k )), ∀λ ∈ [0, 1], (5.26)

where ξ ∈ ∂F (x(λ)) such that (x1 − x2)T ξ = max{(x1 − x2)

T ξ | ξ ∈ ∂F (x(λ))},then x1 − x2 is an ascent segment of Q(x,A).

Proof. If

$$-\varphi(f(\lambda_2) - F(x_k^*)) \exp(A\|x(\lambda_2) - x_k^*\|^2) > -\varphi(f(\lambda_1) - F(x_k^*)) \exp(A\|x(\lambda_1) - x_k^*\|^2) \tag{5.27}$$

holds for any $1 \ge \lambda_2 > \lambda_1 \ge 0$, then $x_1 - x_2$ is an ascent segment of $Q(x,A)$. From (5.27) we obtain

$$\frac{\exp(A\|x(\lambda_2) - x_k^*\|^2) - \exp(A\|x(\lambda_1) - x_k^*\|^2)}{\exp(A\|x(\lambda_1) - x_k^*\|^2)} < \frac{\varphi(f(\lambda_1) - F(x_k^*)) - \varphi(f(\lambda_2) - F(x_k^*))}{\varphi(f(\lambda_2) - F(x_k^*))}. \tag{5.28}$$

The property of the function $\varphi(t)$ implies that if

$$\frac{\exp(A\|x(\lambda_2) - x_k^*\|^2) - \exp(A\|x(\lambda_1) - x_k^*\|^2)}{\exp(A\|x(\lambda_1) - x_k^*\|^2)} \le \frac{c_1 (f(\lambda_1) - f(\lambda_2))}{\varphi(f(\lambda_2) - F(x_k^*))} = \frac{c_1 (F(x_2 + \lambda_1(x_1 - x_2)) - F(x_2 + \lambda_2(x_1 - x_2)))}{\varphi(F(x_2 + \lambda_2(x_1 - x_2)) - F(x_k^*))}, \tag{5.29}$$

then inequality (5.28) is satisfied. Dividing both sides of (5.29) by $\lambda_2 - \lambda_1$ and then taking limits as $\lambda_2 \to \lambda_1$, we obtain

$$2A (x_1 - x_2)^T (x(\lambda_1) - x_k^*) \le \frac{-c_1 (x_1 - x_2)^T \xi}{\varphi(F(x(\lambda_1)) - F(x_k^*))}.$$

Therefore, when (5.26) holds, $x_1 - x_2$ is an ascent segment of $Q(x,A)$.
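The limiting step applied to (5.29) can be written out term by term. The sketch below assumes that $F$ is differentiable at $x(\lambda_1)$, so that $\xi = \nabla F(x(\lambda_1))$, that $\varphi$ is continuous, and that $\varphi(f(\lambda_1) - F(x_k^*)) \neq 0$; the shorthand $d$ and $r(\lambda)$ is introduced here for convenience and is not the paper's notation.

```latex
\begin{align*}
d &:= x_1 - x_2, \qquad x(\lambda) = x_2 + \lambda d, \qquad
r(\lambda) := \|x(\lambda) - x_k^*\|^2, \\
\lim_{\lambda_2 \to \lambda_1}
\frac{\exp(A r(\lambda_2)) - \exp(A r(\lambda_1))}
     {(\lambda_2 - \lambda_1)\exp(A r(\lambda_1))}
  &= A\, r'(\lambda_1) = 2A\, d^T (x(\lambda_1) - x_k^*), \\
\lim_{\lambda_2 \to \lambda_1}
\frac{c_1 (f(\lambda_1) - f(\lambda_2))}
     {(\lambda_2 - \lambda_1)\,\varphi(f(\lambda_2) - F(x_k^*))}
  &= \frac{-c_1 f'(\lambda_1)}{\varphi(f(\lambda_1) - F(x_k^*))}
   = \frac{-c_1\, d^T \xi}{\varphi(F(x(\lambda_1)) - F(x_k^*))}.
\end{align*}
```

Because $d^T (x(\lambda) - x_k^*) = d^T (x_2 - x_k^*) + \lambda \|d\|^2 > 0$ for every $\lambda \in [0,1]$ under the hypothesis $(x_1 - x_2)^T (x_2 - x_k^*) > 0$, dividing by $2 d^T (x(\lambda_1) - x_k^*)$ keeps the direction of the inequality and yields a bound on $A$ of the form used in (5.26).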

Since $x_2 \notin W_k^*$, i.e. $x_2$ is in a basin of $F(x)$ lower than $B_k^*$, the inequality (5.26) will hold when $F(x(\lambda)) - F(x_k^*) \to 0^+$ as $\lambda$ increases.

Finally, consider filled functions of the form

$$U(x,A) = -\eta(F(x) - F(x_k^*)) - A\|x - x_k^*\|^2.$$

When the function $\eta(t)$ satisfies properties (i) and (ii) of the function $\varphi(t)$, $U(x,A)$ is a desirable filled function and has the following properties:

(i) the local minimizer $x_k^*$ of $F(x)$ is a local maximizer of $U(x,A)$;

(ii) if $A \ge c_2 L/(2 c D_k)$, then the set $W_k^*$ of the function $F(x)$ becomes a part of a hill of the function $U(x,A)$ with peak $x_k^*$, where $c$ is defined in (5.18);

(iii) if $x_1 - x_2$ is a descent segment of $F(x)$ in a basin lower than $B_k^*$ and $(x_1 - x_2)^T (x_2 - x_k^*) > 0$, and if

$$A < \frac{c_1 (F(x_2 + \lambda_1(x_1 - x_2)) - F(x_2 + \lambda_2(x_1 - x_2)))}{(\lambda_2 - \lambda_1)(x_1 - x_2)^T [2(x_2 - x_k^*) + (\lambda_2 + \lambda_1)(x_1 - x_2)]}$$

holds for all $1 \ge \lambda_2 > \lambda_1 \ge 0$, then $x_1 - x_2$ is an ascent segment of $U(x,A)$.
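Properties (i) and (iii) of $U(x,A)$ can be illustrated on a small example. This is a minimal sketch under stated assumptions: $\eta(t) = t$, the constant $c_1 = 1$ (natural for the identity, but (5.18) itself is not reproduced here), and a hypothetical one-dimensional objective that is not from the paper. The helper names `ascent_bound` and `is_ascent` are introduced only for this illustration.

```python
import math

def F(x):
    # Hypothetical 1-D objective (an assumption, not from the paper).
    return (x * x - 1.0) ** 2 + 0.3 * x

def dF(x):
    return 4.0 * x * (x * x - 1.0) + 0.3

def d2F(x):
    return 12.0 * x * x - 4.0

def newton_stationary(x, steps=50):
    # Newton iteration on F'(x) = 0, started close to the minimizer.
    for _ in range(steps):
        x -= dF(x) / d2F(x)
    return x

XK = newton_stationary(1.0)  # higher local minimizer x_k^*, near 0.96

def eta(t):
    return t  # assumed choice of eta

def U(x, A):
    return -eta(F(x) - F(XK)) - A * (x - XK) ** 2

def ascent_bound(x1, x2, c1=1.0, n=50):
    """Smallest right-hand side of the bound in (iii) over grid pairs."""
    d = x1 - x2
    lam = [i / n for i in range(n + 1)]
    best = math.inf
    for i in range(n + 1):
        for j in range(i + 1, n + 1):
            l1, l2 = lam[i], lam[j]
            num = c1 * (F(x2 + l1 * d) - F(x2 + l2 * d))
            den = (l2 - l1) * d * (2.0 * (x2 - XK) + (l2 + l1) * d)
            best = min(best, num / den)
    return best

def is_ascent(x1, x2, A, n=50):
    """True if U increases strictly on a grid running from x2 to x1."""
    d = x1 - x2
    vals = [U(x2 + (i / n) * d, A) for i in range(n + 1)]
    return all(b > a for a, b in zip(vals, vals[1:]))

if __name__ == "__main__":
    # Segment in the lower basin, pointing away from XK, with F descending.
    x2, x1 = -0.6, -0.7
    bound = ascent_bound(x1, x2)
    A = 0.9 * bound
    print(bound)
    print(is_ascent(x1, x2, A))            # A below the bound
    print(is_ascent(x1, x2, 2.0 * bound))  # A well above the bound
    # Property (i): U at XK exceeds U at nearby points.
    print(U(XK, A) > U(XK - 1e-3, A) and U(XK, A) > U(XK + 1e-3, A))
```

With $\eta$ the identity and $c_1 = 1$, the bound in (iii) is algebraically equivalent to $U(x(\lambda_2)) > U(x(\lambda_1))$, so the grid check agrees with the bound exactly; for a general $\eta$ the bound is only a sufficient condition.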

6. Concluding remarks

In this paper, we are concerned with some general forms of filled functions used for unconstrained global minimization of a continuous function (smooth or non-smooth) of several variables. These filled functions have either one or two adjustable parameters. Conditions on the objective function and on the values of the parameters are given so that the constructed functions have the desired properties of filled functions. Note that the forms of filled functions considered use only one or two static parameters. From a practical point of view, it will be useful to extend the filled functions to other types in which the parameters are dynamically adjusted.

References

1. Clarke, F.H. (1983), Optimization and Non-smooth Analysis, John Wiley & Sons, New York.

2. Dixon, L.C.W., Gomulka, J. and Herson, S.E. (1976), Reflections on Global Optimization Problem, in L.C.W. Dixon (ed), Optimization in Action, Academic Press, New York, pp. 398–435.

3. Fletcher, R. (1981), Practical Methods of Optimization, Vol. 2, Constrained Optimization, John Wiley & Sons, New York.

4. Ge, R.P. (1987), The Theory of Filled Function Methods for Finding Global Minimizers of Nonlinearly Constrained Minimization Problems, Journal of Computational Mathematics 5(1): 1–9.

5. Ge, R.P. (1990), A Filled Function Method for Finding a Global Minimizer of a Function of Several Variables, Mathematical Programming 46: 191–204.

6. Ge, R.P. and Qin, Y.F. (1987), A Class of Filled Functions for Finding a Global Minimizer of a Function of Several Variables, Journal of Optimization Theory and Applications 54(2): 241–252.

7. Ge, R.P. and Qin, Y.F. (1990), The Globally Convexized Filled Functions for Globally Optimization, Applied Mathematics and Computations 35: 131–158.

8. Horst, R. and Pardalos, P.M. (eds) (1995), Handbook of Global Optimization, Kluwer Academic Publishers, Dordrecht, The Netherlands.

9. Horst, R., Pardalos, P.M. and Thoai, N.V. (2000), Introduction to Global Optimization (2nd edition), Kluwer Academic Publishers, Dordrecht, The Netherlands.

10. Kong, M. and Zhuang, J.N. (1996), A Modified Filled Function Method for Finding a Global Minimizer of a Non-smooth Function of Several Variables, Numerical Mathematics: A Journal of Chinese Universities 18(2): 165–174.

11. Levy, A.V. and Gómez, S. (1985), The Tunneling Method Applied to Global Optimization, in: P.T. Boggs, R.H. Byrd and R.B. Schnabel (eds), Numerical Optimization, SIAM, pp. 213–244.

12. Wales, D.J. and Scheraga, H.A. (1999), Global Optimization of Clusters, Crystals and Biomolecules, Science 285: 1368–1372.

13. Zhuang, J.N. (1994), A Generalized Filled Function Method for Finding the Global Minimizer of a Function of Several Variables, Numerical Mathematics: A Journal of Chinese Universities 16(3): 279–287.