
eTSSO: Adaptive Search Method for Stochastic Global Optimization Under Finite Budget

Chenwei Liu, Department of Industrial & Systems Engineering, National University of Singapore.

Giulia Pedrielli, Centre for Maritime Studies, National University of Singapore.

Szu Hui Ng, Department of Industrial & Systems Engineering, National University of Singapore.

August 15, 2014

Abstract

TSSO is a recently proposed search algorithm based on the modified nugget effect kriging metamodel, which solves unconstrained stochastic simulation optimization problems with heterogeneous variances, when the total simulation budget is fixed, through a budget allocation followed by a two–stage sequential procedure. Despite the efficiency and performance of TSSO, no analysis of the structural properties of the algorithm in terms of asymptotic behaviour has been provided. Furthermore, the algorithm arbitrarily assigns the budget per iteration when the procedure starts. However, given a finite budget, the choice of the number of replications per iteration is particularly critical, and a fixed a–priori assignment can affect the ability to control the algorithm, making it particularly sensitive to the initial settings. This work tries to fill these gaps. An analysis of the convergence properties of a modified version of TSSO is provided. A new allocation of the budget per iteration is proposed, leading to the Extended TSSO (eTSSO) algorithm, which mitigates the TSSO sensitivity to the parameter settings and increases its effectiveness. Specifically, the new budget allocation adaptively increases the available budget at each iteration based on the updated information coming from the simulation. Numerical examples give evidence of the algorithm performance.

Keywords: Modified Nugget Effect Kriging; Simulation–Optimization; Convergence

1 Introduction and Motivation

In the last decade, simulation has been adopted as a means to iteratively evaluate the performance of the solutions generated by search algorithms (Law and Kelton, 1991; Shim et al., 2002). In the literature, this coupling is referred to as simulation–optimization. This approach has been proven particularly effective when complex functions are of concern, and the search algorithm has to rely upon the performance estimation produced by the simulation (typically treated as a black box oracle) at selected points (candidate solutions) (Fu et al., 2008, 2005; Swisher et al., 2000; Carson and Maria, 1997).


In this work, we refer to the particular case in which only noisy observations can be obtained from the simulation model and we want to find the point x* satisfying x* ∈ arg min_{x ∈ X} E[Y(x)], where X represents the design space (i.e., the feasible region of the problem), and Y(x) is the function whose noisy measurements can be obtained through simulation.

Two families of approaches, in both deterministic and stochastic settings, can be identified: (1) direct methods, calling the simulator at each iteration to obtain an estimate of the response, and (2) surrogate methods, which use simulation to estimate the corresponding meta–model which guides the search (Tekin and Sabuncuoglu, 2004; Fu et al., 2008).

Popular direct algorithms include COMPASS (Xu et al., 2010; Hong and Nelson, 2006), Discrete Stochastic Approximation (Lim, 2012), R-SPLINE (Wang et al., 2013), Ranking and Selection methods (Kim and Nelson, 2007; Qu et al., 2012), stochastic approximation methods (Clark and Kushner, 1978; Yin and Kushner, 2003), optimization based techniques (Kleijnen, 2008) and the Nested Partitions Method (Shi and Olafsson, 2000).

A major drawback of this family of approaches is their cost when simulation runs require high computational effort.

In such cases, meta–modelling based (surrogate) search offers the possibility to use the information coming from the simulation runs more effectively, and computing budget allocation can be adopted to properly allocate the finite simulation budget to specific candidate solutions in the design space.

Surrogate methods use a few simulation runs to estimate a model of the response (i.e., the meta–model), which can be used, in the search procedure, to quickly evaluate the performance at any given location in the design space without the need to run the simulator (Nakayama et al., 2006; Wan et al., 2005).
Response Surface Methodology (Myers and Anderson-Cook, 2009) is among the most popular techniques in this class, especially for its ease of implementation. It alternately uses first order and second order linear regression models to guide the search towards the optimum.
A more complex and commonly adopted model form is Kriging, which has been particularly successful in deterministic computer experiments (Cressie and Cassie, 1993).
Recently, a noticeable effort has been dedicated to extending the kriging model structure to the case of stochastic simulations, including homoscedastic (homogeneous random noise in the design space) and heteroscedastic (heterogeneous random noise in the design space) cases. Specifically, for heteroscedastic simulations, Ankenman et al. (2010) and Yin et al. (2011) proposed the Stochastic Kriging (SK) model and the Modified Nugget Effect Kriging (MNEK) model, respectively.

Quan et al. (2013) proposed the Two Stage Sequential Optimization (TSSO) algorithm to solve unconstrained stochastic simulation–optimization problems with heterogeneous variance through a budget allocation followed by a two–stage sequential procedure. At the first stage, for the search, TSSO uses the MNEK model to explore the region and to define new candidate solutions. The evaluation stage runs simulation experiments at each point according to the Optimal Computing Budget Allocation (OCBA) rule (Chen et al., 2000), and the kriging model is subsequently updated.

With respect to the current state of the art, this work provides two main contributions. First, we aim at studying the structural properties of the TSSO algorithm, delving into the analysis of its asymptotic behaviour. The formal analysis is provided for the one–dimensional case, and empirical evidence of the asymptotic behaviour is also provided for higher dimensions. Second, we aim to mitigate the sensitivity of the TSSO algorithm to the initial parameter settings. In particular, TSSO performance is quite influenced by the decision on the number of simulation replications (i.e., the budget) allocated to each iteration (B) given the total available budget T, i.e., by the pair (T, B). We propose a new adaptive budget allocation scheme which mitigates the effect of the initial TSSO settings and spares the user from choosing an appropriate value of B.

The remainder of the paper is structured as follows: section 2 presents the background to the presented work. In section 3, the original TSSO is modified in order to analyse its asymptotic properties. In section 3.1, the theoretical infinite budget analysis is presented for the one–dimensional case. The effects of the proposed modifications on the convergence performance of TSSO with finite budget are then investigated in section 3.2. The new eTSSO algorithm, which further extends TSSO, is presented in section 4, where the new adaptive budget allocation and the algorithm steps are described. In section 5, the new algorithm is tested over multi–dimensional functions to assess its performance. Finally, section 6 draws the conclusions of the paper.

2 Meta–modelling background and stochastic kriging

We consider a single objective, unconstrained minimization problem over a compact set X. The deterministic d–dimensional objective function y : x ∈ X ⊂ ℝ^d → y(x) ∈ ℝ is here observed with noise through a black box simulation oracle running at point x. Our goal is to find a global minimum of y : X → ℝ, where X ⊆ ℝ^d, solving:

P :   min y(x)   subject to x ∈ X    (1)

In order to find a solution to P, we use a search procedure based on the Modified Nugget Effect Kriging model (MNEK, Yin et al. (2011)), whose parameters are updated, as the search progresses, based on the simulation results. The model guides the search by predicting the function values at points where no simulation has been conducted yet. Section 2.1 provides more details on the adopted kriging model, whereas some of the main contributions on kriging based optimization are explored in section 2.2.

2.1 MNEK Model Formulation

Let Y(x_i) be the output from the stochastic simulation at x_i, where x_i = (x_{i1}, x_{i2}, ..., x_{id})^T is a point in a d–dimensional input space. We assume that the Y(x_i) are realizations of a random process that can be described by the model:

Y(x_i) = Z(x_i) + ξ(x_i),   i = 1, ..., k,    (2)

where Z describes the mean and ξ describes the random noise of the process. We further assume that Z(x_i) can be modelled as a Gaussian process with covariance function τ²R_z, where τ² is the process variance and R_z the matrix of process correlations; formally, Z(x) is a GP(μ(x), τ²R_z).


A commonly adopted correlation function R_z is the d–dimensional separable version of the power exponential family of functional forms, which is characterized by a smooth response (equation (3)).

R_z(Z(x_i), Z(x_j)) = \prod_{l=1}^{d} \exp\left(-\phi_{z,l}\,(x_{i,l} - x_{j,l})^t\right)    (3)

The scale correlation parameter φ_{z,l} is the sensitivity parameter that controls how fast the correlation decays with distance in the l–th dimension, and the parameter t controls the smoothness of the response. When t = 1, equation (3) is known as the exponential correlation function, and when t = 2, it is known as the Gaussian correlation function (Cressie and Cassie, 1993; Picheny et al., 2013).
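
For concreteness, the separable power exponential correlation in equation (3) takes only a few lines of code. The sketch below is a minimal illustration, not the authors' implementation; the absolute value of the difference is used so that odd exponents such as t = 1 reproduce the exponential correlation.

import numpy as np

def power_exp_corr(xi, xj, phi, t=2.0):
    # Separable power exponential correlation of equation (3):
    #   prod_l exp(-phi_l * |xi_l - xj_l|^t)
    # xi, xj : design points (length-d arrays); phi : per-dimension scale parameters
    xi, xj, phi = np.asarray(xi, float), np.asarray(xj, float), np.asarray(phi, float)
    return float(np.exp(-np.sum(phi * np.abs(xi - xj) ** t)))

With t = 2 this gives the Gaussian correlation used for the MNEK model in the remainder of the paper.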

The noise ξ(x) is assumed to be distributed with zero mean and covariance function σ²_ξ R_ξ, where R_ξ denotes the matrix of sampling correlations. Error variances are generally not constant and they may depend on x. With independent sampling (i.e., no CRN), R_ξ is diagonal, and equation (2) reduces to the independent sampling noise model (Yin et al., 2011; Ng and Yin, 2012). The general form of equation (2) is similar to the form proposed in Ankenman et al. (2010).

As shown in Yin et al. (2011), the MSE optimal predictor for (2) at the point x_0, given that k points have already been sampled, is:

Y(x_0) = \sum_{i=1}^{k}\left(\mathbf{c}^T(\mathbf{R}_z+\mathbf{R}_\xi)^{-1}\mathbf{e}_i + \frac{\left[1-\mathbf{1}^T(\mathbf{R}_z+\mathbf{R}_\xi)^{-1}\mathbf{c}\right]}{\mathbf{1}^T(\mathbf{R}_z+\mathbf{R}_\xi)^{-1}\mathbf{1}}\;\mathbf{1}^T(\mathbf{R}_z+\mathbf{R}_\xi)^{-1}\mathbf{e}_i\right)\bar{Y}_k.    (4)

where Ȳ_k represents the average of the function values at the already sampled points, the correlation vector, assuming a Gaussian correlation form, is c(x, ·; φ_z)^T = (e^{-φ_z d²_{x,x_1}}, ..., e^{-φ_z d²_{x,x_k}}), and e_i is a vector of size k (with k the number of sampled points) having all elements equal to 0 except the i–th element, which is equal to 1. The optimal MSE is (Yin et al., 2011):

MSE^* = c_0 + \tau^2\left(1 - \left[\mathbf{c} + \mathbf{1}\,\frac{(1-\mathbf{1}^T\mathbf{R}'^{-1}\mathbf{c})}{\mathbf{1}^T\mathbf{R}'^{-1}\mathbf{1}}\right]^T \mathbf{R}'^{-1}\mathbf{c} + \frac{(1-\mathbf{1}^T\mathbf{R}'^{-1}\mathbf{c})}{\mathbf{1}^T\mathbf{R}'^{-1}\mathbf{1}}\right)    (5)

where R' = R_z + R_ξ, and c_0 is the nugget effect value, which can usually be estimated from the sample variance as c_0 = s²/n.
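
The following sketch shows how the predictor (4) and the MSE (5) can be evaluated numerically once the correlation matrices and vector have been formed. It is a minimal illustration under our own variable names, not the authors' code; in practice a Cholesky solve would be preferable to an explicit matrix inverse.

import numpy as np

def mnek_predict(Rz, Rxi, c, y_bar, tau2, c0):
    # Rz, Rxi : k x k process and noise correlation matrices
    # c       : length-k correlation vector between x0 and the sampled points
    # y_bar   : length-k vector of sample means at the sampled points
    # tau2    : process variance; c0 : nugget term (e.g. s^2/n)
    R = Rz + Rxi                                        # R' = Rz + R_xi
    Rinv = np.linalg.inv(R)
    one = np.ones(len(y_bar))
    lam = (1.0 - one @ Rinv @ c) / (one @ Rinv @ one)   # (1 - 1'R'^-1 c) / (1'R'^-1 1)
    weights = Rinv @ c + lam * (Rinv @ one)             # weights on the sample means, equation (4)
    y_hat = weights @ y_bar
    mse = c0 + tau2 * (1.0 - (c + lam * one) @ Rinv @ c + lam)   # equation (5)
    return y_hat, mse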

2.2 TSSO Algorithm

The Efficient Global Optimization (EGO) method, derived from the Bayesian framework, represents the reference for most kriging based search algorithms. Non–sampled points are iteratively selected based on the Expected Improvement (EI) criterion, which assigns a score to each candidate solution based on the related kriging model prediction (Jones et al., 1998).


In order to extend kriging based optimization to the case of noisy measurements, Huang et al. (2006) proposed the Stochastic Kriging Optimization (SKO) method. SKO uses the nugget effect kriging model (which assumes constant variances throughout the sample space), replacing the EI criterion with the Augmented Expected Improvement (AEI) in order to deal with noisy function measurements.

To account for heterogeneous variance and a finite simulation budget, Quan et al. (2013) proposed the Two-Stage Sequential Optimization (TSSO) algorithm. Differently from EGO and SKO, TSSO is a two–stage algorithm which uses the first stage to balance the effort between exploration and exploitation when the total number of available replications is limited. Specifically, it allocates the budget between exploration and exploitation according to the following rule:

r_{S,k} = r_{S,k-1} − ∆r,   r_{A,k} = r_{A,k-1} + ∆r,    (6)

where r_{S,k} is the number of replications to assign to the search stage at iteration k, r_{A,k} is the number of replications for the evaluation stage at iteration k, ∆r = ⌊(B − r_min)/K⌋ represents the fixed rate of decay (increase) of r_{S,k} (r_{A,k}), and r_min is the minimum number of replications required to sample a new point. Given that T is the total number of replications available (i.e., the entire simulation budget) and N_0 is the size of the initial space filling design, the maximum number of iterations that can be performed by TSSO is K = ⌊(T − N_0 B)/B⌋.
Once the budget allocation has been performed, the algorithm uses the MNEK model (section 2.1) to estimate the function values at potential infill points x ∈ X \ S, where S represents the set of already sampled locations. Subsequently, the point x_k is added to the set S if it maximizes the modified expected improvement function T_k(x) (Quan et al., 2013):

x_k ∈ arg max_{x ∈ X \ S} T_k(x) = E[max{y*_{k-1} − Y_k(x), 0}].    (7)

where y*_{k-1} represents the response at the sampled point with the lowest mean up to iteration k, and Y_k(x) is a random variable whose mean is given by the kriging mean function and whose variance is given by the spatial prediction uncertainty s²_z(x). The point x_k is sampled with r_{S,k} replications and added to the set S.
In the second stage, TSSO uses the OCBA technique to assign the available replications, r_{A,k} = B − r_{S,k}, to the sampled points x ∈ S. In particular, equation (8) determines the relative allocation between non-best designs, and equation (9) is used to derive the relative allocation between the best design and the non-best designs (Chen et al., 2000):

\frac{n_i}{n_j} = \left(\frac{\sigma_{x_i}(x_i)/\delta_{b,i}}{\sigma_{x_j}(x_j)/\delta_{b,j}}\right)^2,    (8)

n_b = \sigma_{x_b}(x_b)\sqrt{\sum_{x_i \in \mathbb{X}:\, x_i \neq x_b} \frac{n_i^2}{\sigma_{x_i}^2}}.    (9)

where x_b is the design point with the lowest sample mean, n_b is the related number of replications and σ_{x_b}(x_b) is the related standard deviation; n_i is the number of replications at location i and σ_{x_i} is the estimated standard deviation at that point; δ_{b,i} is the difference between the sample mean at point i and the lowest sample mean.
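
A compact way to read (8)–(9) is as a set of proportions that are then scaled to whatever evaluation budget is available. The sketch below follows that reading; it is our own minimal illustration (crude flooring, distinct sample means assumed), not the sequential OCBA procedure of Chen et al. (2000).

import numpy as np

def ocba_allocation(means, stds, total):
    # means, stds : sample means and standard deviations at the sampled points
    # total       : number of replications to distribute in this evaluation stage
    means, stds = np.asarray(means, float), np.asarray(stds, float)
    b = int(np.argmin(means))                    # current best design x_b
    delta = means - means[b]                     # delta_{b,i}
    ratio = np.empty_like(means)
    nonbest = np.arange(len(means)) != b
    ratio[nonbest] = (stds[nonbest] / delta[nonbest]) ** 2                           # proportions from (8)
    ratio[b] = stds[b] * np.sqrt(np.sum(ratio[nonbest] ** 2 / stds[nonbest] ** 2))   # proportion from (9)
    return np.floor(total * ratio / ratio.sum()).astype(int)

For example, ocba_allocation([1.0, 1.5, 3.0], [0.5, 0.5, 0.5], 100) concentrates most of the 100 replications on the best design and on its closest competitor.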


Algorithm 1 Sequential TSSO algorithm with OCBA sampling

1: k = 0. Run a size-N_0 space filling design, with r_{S,0} = B replications allocated to each point, and set r_{A,0} = 0. Fit the MNEK model to the set of sample means. Cross validate to check whether the initial model fit is satisfactory.
2: While k ≤ K:
   Find the point x_k that maximizes the function T_k(x).
   Allocation Stage: set r_{S,k} = r_{S,k-1} − ∆r and r_{A,k} = r_{A,k-1} + ∆r.
   Search Stage: if r_{S,k} ≥ r_min, sample the point x_k with r_{S,k} replications; otherwise set r_{A,k} = r_{A,k} + r_{S,k}, then r_{S,k} = 0, and no point is added to S.
   Evaluation Stage: apply equations (8)–(9) to allocate r_{A,k} to the sampled points.
   Model Fit: fit the MNEK model to the set of sample means.
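
The fixed split of equation (6), as used in the Allocation Stage above, can be tabulated ahead of time. The short sketch below is illustrative only (the function name and the assumption that K ≥ 1 are ours); it returns, for every iteration, the pair (r_{S,k}, r_{A,k}).

def tsso_budget_schedule(T, B, N0, r_min):
    # T : total budget, B : budget per iteration, N0 : initial design size
    K = (T - N0 * B) // B            # maximum number of iterations, K = floor((T - N0*B)/B)
    dr = (B - r_min) // K            # fixed decay (increase) rate of r_S (r_A), equation (6)
    schedule = []
    for k in range(1, K + 1):
        r_S = B - k * dr             # replications left for the search stage
        if r_S < r_min:              # too few to sample a new point: skip the search
            r_S = 0
        schedule.append((r_S, B - r_S))
    return schedule

For instance, tsso_budget_schedule(T=600, B=30, N0=6, r_min=10) yields K = 14 iterations, with the search share shrinking by ∆r = 1 replication per iteration.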

3 TSSO Asymptotic behavior

Although TSSO has been applied effectively to several real world problems (Quan et al., 2013), there is no formal analysis of the algorithm's convergence properties, which are the subject of this section. For tractability and ease of exposition, we first simplify the algorithm by assuming the following:

1. Infinite budget is available (i.e., T =∞);

2. One–dimensional functions are considered (i.e., d = 1);

3. The budget assigned at each iteration increases as the algorithm progresses. Hence, the available budget is a function of the iteration; formally, B ← B_k and B_{k-1} < B_k.

4. At each algorithm iteration, a uniform allocation scheme is adopted to assign the replications to the sampled points.

We refer to the modified TSSO as TSSO^{∆B}_{U}, where ∆B recalls the budget increase and U refers to the uniform budget allocation.

The first modification is a requirement for the asymptotic study. The restriction to the class of 1–d functions was made for tractability and ease of exposition. According to the third modification, the budget increases at each iteration to improve the evaluation and the fit of the model. Under infinite budget, this is desirable as the improvement can speed up the search. Hence, the main difference between the original TSSO algorithm and its modified version lies in the allocation scheme. The uniform allocation simplifies the OCBA used in TSSO, and this is done to ease the analysis. Nonetheless, we expect (as will be empirically studied) OCBA to perform better than the uniform allocation scheme. In section 3.2, we subsequently relax the assumptions made in the simplified algorithm, in order to provide an empirical convergence analysis.

The TSSO^{∆B}_{U} algorithm starts with a user–defined number of initial points x_1, x_2, ..., x_{N_0}, assigns each of them an initial number of replications B_0, and fits the initial kriging response model according to equations (2) and (4)–(5). The initial simulation budget T_0 = N_0 B_0 is uniformly distributed among the points in the design space, i.e., if we refer to n_0(x_i), i = 1, ..., N_0, as the number of replications assigned to each location at iteration k = 0, we have n_0(x_i) = ⌊T_0/N_0⌋ = B_0, i = 1, ..., N_0.

At each iteration k > 0, a new point (infill point) is selected from the design space as the maximizer of the modified expected improvement performance function:

T_k(x; D_k) = E[max{y*_{k-1} − Y_k(x), 0} | D_k],    (10)

Equation (10) is the same as (7); we only highlight the set D_k = [x_k, ȳ_k, S²_k, n_k] of information available at the design locations (x_1, x_2, ..., x_k)^T sampled up to iteration k. Specifically, ȳ_k = (ȳ_1, ȳ_2, ..., ȳ_k)^T are the sample means, S²_k = (s²_1, s²_2, ..., s²_k)^T the estimated variances, and n_k = (n_1, n_2, ..., n_k) the numbers of replications assigned at each location.
Once the new point has been chosen, it receives r_min replications and the set S is reordered based on the location of the new infill point.
The budget B_k is increased according to B_k = B_{k-1} + ∆_B · k², where ∆_B is an input parameter defining the budget increase rate. B_k is then uniformly allocated among the sampled points, i.e., n_k(x_i) = ⌊B_k/k⌋, i = 1, ..., k.
At this point, the kriging model is fitted and the information set D_k is updated accordingly, i.e., D_k = D_{k-1} ∪ {x_k, ȳ_k, s²_k, n_k}.

If a finite number of replications T (total budget) is given, the algorithm stops once the budget is exhausted. The detailed steps are summarized in Algorithm 2.

Algorithm 2 Sequential TSSO algorithm with budget increase and uniform sampling (1–d case)

1: k = 0. Define the size-N_0 space filling design, the initial available budget B_0, and the replication increase coefficient ∆_B. Fit the MNEK model to the set of sample means. Cross validate to check whether the initial model fit is satisfactory. Set k = N_0.
2: Choose the new point x_k ∉ S solving x_k ∈ arg max_{x ∈ X} T_k(x; D_k).
   Reorder the set of sampled locations S: if there exists i, 2 ≤ i ≤ k, such that x_k ∈ (x_{i-1}, x_i), insert x_k at position i and shift the subsequent points by one; otherwise x_k becomes x_1 and all points shift by one.
   Assign to the new point n_i = r_min replications.
   Increase the budget according to B_k = B_{k-1} + ∆_B · k².
   Allocate to each sampled point n_k = ⌊B_k/k⌋ replications.
   k = k + 1. Fit the MNEK model according to the updated information.
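
The per-iteration budget rule used in Algorithm 2 is easy to trace by hand or in code. The following fragment is a sketch under our own naming, not part of the original algorithm listing.

def uniform_budget_update(B_prev, k, delta_B, n_sampled):
    # Budget increase of Algorithm 2: B_k = B_{k-1} + delta_B * k^2,
    # followed by the uniform split n_k(x_i) = floor(B_k / n_sampled).
    B_k = B_prev + delta_B * k ** 2
    return B_k, B_k // n_sampled

With B_0 = 10 and ∆_B = 10, for example, the sequence B_1, B_2, B_3, ... grows as 20, 60, 150, ..., which illustrates how quickly the evaluation effort ramps up under this rule.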

Section 3.1 studies the asymptotic behaviour of TSSO^{∆B}_{U}. In section 3.2, we empirically analyse the TSSO behaviour by relaxing the restrictions made on TSSO^{∆B}_{U} (i.e., uniform allocation, one-dimensional functions, infinite budget). Specifically, we consider the effect of the OCBA allocation on the convergence and convergence speed, still assuming a large total budget. We refer to this version of TSSO^{∆B}_{U} as TSSO^{∆B}_{OCBA}. Afterwards, we relax the assumption on one–dimensional functions and investigate the convergence when higher dimensions are considered. Finally, we study the performance of TSSO^{∆B}_{OCBA} under a finite total budget T, in order to evaluate the sensitivity of the algorithm to the noise magnitude and the size of the initial design N_0.

3.1 Proof of Asymptotic Convergence

In the following, we first introduce the main assumptions and notation adopted throughout the proof (section 3.1.1). Section 3.1.2 presents the detailed proof of asymptotic convergence for the TSSO^{∆B}_{U} algorithm.

3.1.1 Assumptions and Notation

In the TSSO^{∆B}_{U} algorithm, we restricted the class of functions under study to the 1–d class and modified the budget allocation scheme for tractability and ease of exposition (section 3).

Assumption 1. The function to be optimized is a 1–d real valued function Y(x), x ∈ [a, b], where both a, b ∈ ℝ. We further assume that the parameters of the MNEK model (τ² and φ_z) are known.

According to the results in Yin et al. (2011) and Ankenman et al. (2010), the prediction Y(x) has a normal distribution with mean μ_{|D_k}(x) = Y(x) (equation (4)) and variance σ²_{|D_k}(x) = MSE* (equation (5)). In order to guarantee that the global optimum will be sampled infinitely often as the budget goes to infinity, we impose the budget to increase at each iteration. Moreover, to progressively drive the error to 0, we have to appropriately characterize the increase rate applied to the budget per iteration B_k (Spall, 2003; Hu et al., 2008).

Assumption 2. The number of replications B_k assigned at each iteration satisfies B_k ≥ B_{k-1}, ∀k = 1, 2, ..., and B_k → ∞ as k → ∞. Moreover, for any ε > 0 there exist a δ_ε ∈ (0, 1) and a k_ε > 0 such that ψ_{2k} L(B_{k-1}, ε) ≤ (δ_ε)^k, ∀k ≥ k_ε, where L(·,·) is strictly decreasing in B_{k-1} and non–increasing in ε. At each iteration of the algorithm, the available replications B_k are uniformly allocated among the sampled points.

The following assumption characterizes the correlation form adopted in (4)–(5).

Assumption 3. The Gaussian correlation function is chosen from the exponential family form.

Assumption 3 is a sufficient condition for the existence of the derivative processes, and it ensures that the various variance–covariance matrices are positive definite, i.e., non singular. These will be used in Property 1.

As empirically shown in Picheny et al. (2013), there is no evidence that restricting the correlation form worsens the optimization performance, while it does provide desirable properties in terms of convergence.


3.1.2 Proof of Convergence

The proof of asymptotic convergence for the TSSO^{∆B}_{U} algorithm consists of three main parts: (1) we first prove that, as the number of replications at each iteration grows to ∞, the MNEK kriging model converges to its deterministic counterpart as the noise is driven to 0; (2) we characterize the function T_k(x; D_k) as the number of iterations k → ∞, proving its Lipschitz continuity; (3) we exploit the results in (1)–(2) to prove that the iterates of TSSO^{∆B}_{U} are dense in the design space X. Finally, we use the result in Calvin and Zilinskas (1999) that convergence to the global optimum is guaranteed with dense iterates. This is a rather intuitive condition for convergence; in essence, it says that, in order to guarantee the global optimum is reached, we must converge to every point in the domain. Practically, a globally convergent method must have a feature that forces it to pay attention to regions of the design space that have been relatively unexplored and, from time to time, to go back and sample in these regions (Jones, 2001).

Lemma 1. As the number of iterations k → ∞, under Assumptions 1–2, the MNEK model approaches its deterministic counterpart.

The detailed proof is in Appendix A1. The basic idea is to prove that R_ξ → 0 and R → R_z uniformly over the design space. Notice that the same cannot be said for the MNEK model correlation matrix R_z. Indeed, R_z is independent of the error correlation matrix R_ξ and is not generally decreasing in the number of replications at each point; thus it cannot be treated as an L function in the sense of Assumption 2.

Lemma 2. As the number of iterations k → ∞, Y(x|D_k) ~ N(μ_{|D_k}(x), σ²_{|D_k}(x)).

Proof. As the number of iterations k → ∞, we can apply Lemma 1 and the result in Stein (1999) (Appendix A), and Y(x|D_k) can be treated as a normal distribution with the following mean and variance:

\mu_{|\mathcal{D}_k}(x) = \sum_{i=1}^{k}\left(\mathbf{c}^T\mathbf{R}_z^{-1}\mathbf{e}_i + \frac{\left[1-\mathbf{1}^T\mathbf{R}_z^{-1}\mathbf{c}\right]}{\mathbf{1}^T\mathbf{R}_z^{-1}\mathbf{1}}\;\mathbf{1}^T\mathbf{R}_z^{-1}\mathbf{e}_i\right)\bar{Y}_k    (11)

\sigma^2_{|\mathcal{D}_k}(x) = \tau^2\left(1 - \left[\mathbf{c} + \mathbf{1}\,\frac{(1-\mathbf{1}^T\mathbf{R}_z^{-1}\mathbf{c})}{\mathbf{1}^T\mathbf{R}_z^{-1}\mathbf{1}}\right]^T\mathbf{R}_z^{-1}\mathbf{c} + \frac{(1-\mathbf{1}^T\mathbf{R}_z^{-1}\mathbf{c})}{\mathbf{1}^T\mathbf{R}_z^{-1}\mathbf{1}}\right)    (12)

Hence, we obtain the deterministic predictor for the function and MSE in Yin et al. (2011).

We exploit Lemma 2 to characterize the function Tk (x;Dk) when k →∞.

Property 1. The function Tk (xi;Dk) is Lipschitz continuous over the domain [a, b].

The details of the proof can be found in Appendix A1. We can exploit Property 1 to prove global convergence of the TSSO^{∆B}_{U} algorithm.

Theorem 1. The TSSO^{∆B}_{U} algorithm converges uniformly to the global optimum.


The proof is reported in Appendix A1. The intuition behind global convergence can be seen as follows: if the search is captured in a specific region (either a local or the global optimum), then, as the algorithm progresses, the estimation of the kriging model will worsen in the rest of the design space. As a result, the extrinsic variance will lead the expected improvement T_k(x) to increase in these regions, and the search will escape the current location. A stable condition is then, ideally, reached when all the points have been sampled, i.e., |x_k − x_{k-1}| → 0.

Remark 1. In case the parameters φ_z and τ² are unknown, the convergence is affected by the quality of the estimators. If MLEs are adopted, a bias in the response estimation is introduced (Kleijnen et al., 2012). However, there is no evidence that such a bias negatively affects the effectiveness of the search algorithm. Intuitively, when the bias is consistent, the optimal location should still be identified, as the bias will only affect the optimal function value.

3.2 Empirical Results on convergence

In this section, the assumptions made for the theoretical proof of convergence are sequentially relaxed, in order to numerically study their impact on the convergence properties of the original TSSO.
First, we relax the assumption of uniform allocation and adopt, instead, OCBA to allocate the budget (as in the original TSSO). We refer to TSSO with budget increase and OCBA sampling as TSSO^{∆B}_{OCBA}.

Afterwards, we empirically study the convergence of TSSO^{∆B}_{OCBA} for a 2–d function, hence relaxing the 1–d assumption made in the scope of the proof. Finally, we consider the effect of the finite budget on the performance of TSSO^{∆B}_{OCBA}.

Effect of the allocation scheme In this part, we analyse the convergence behaviour of TSSO^{∆B}_{U} when the OCBA allocation scheme is in place (i.e., TSSO^{∆B}_{OCBA}), as in the original TSSO, instead of uniform allocation. We propose the study of the following 1–d test function:

Y (x) = (2x+ 9.96) cos(13x− 0.26) + ξ (x) (13)

This function has a global minimum at x* = 0.746 with function value y* = −11.45 and a local minimum at x = 0.2628. We adopted a Gaussian random field as noise function, with variance:

σ²_ξ(x) = δ · (x + 1)²    (14)
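
For readers who wish to reproduce the setting, the test function (13) with the heteroscedastic noise (14) can be simulated as follows; the sketch and its names are ours, and the Gaussian-random-field noise is approximated here by independent normal draws with the variance in (14).

import numpy as np

def noisy_test_function(x, delta=1.0, n_reps=1, rng=None):
    # Equation (13): Y(x) = (2x + 9.96) cos(13x - 0.26) + xi(x),
    # with Var[xi(x)] = delta * (x + 1)^2 as in equation (14).
    rng = rng or np.random.default_rng()
    mean = (2.0 * x + 9.96) * np.cos(13.0 * x - 0.26)
    sd = np.sqrt(delta) * (x + 1.0)
    return mean + sd * rng.standard_normal(n_reps)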

We used the following settings:

• Minimum number of replications rmin = 10;

• Initial Budget B0 = 10;

• Size of initial space filling design N0 = 6;


• Noise level δ = 1 (low noise);

• Budget increase rate ∆B = 10.

Figure 1: TSSO^{∆B}_{U} and TSSO^{∆B}_{OCBA} convergence, ∆_B = 10

Each plot in Figure 1 reports a single sample path, showing the behaviour of the response error and location error as the budget used increases. In particular, the y–axis reports the performance and the x–axis the corresponding budget (i.e., the total budget used up to that point). The first two plots refer to the difference y − y* at each iteration of the algorithm; the first is the result from TSSO^{∆B}_{U}, whereas the second refers to TSSO^{∆B}_{OCBA}. The third and the fourth plot report the performance x − x* at each iteration of the algorithm.

Comparing the uniform and the OCBA sampling in Figure 1, it is clear that the OCBA scheme improves the speed of convergence of the algorithm. Observing x − x*, if we consider the same budget (for example 0.4, corresponding to 4000 replications), we can see that while TSSO^{∆B}_{OCBA} has already reached convergence, TSSO^{∆B}_{U} still deviates from 0.0. Although the effect on the convergence rate is mitigated if the function value is observed, TSSO^{∆B}_{OCBA} is more stable than TSSO^{∆B}_{U}. The reason for the better performance of TSSO^{∆B}_{OCBA} resides in the fact that this algorithm focuses on the promising regions. As more effort is spent in these locations, TSSO^{∆B}_{OCBA} converges faster (Chen et al., 2000). On the contrary, uniform allocation requires a large number of iterations to approach the global optimum, as it distributes the effort evenly among the points.


Effect of the function dimensions We relax the assumption on the class of functions and numerically study the convergence in a two–dimensional case. Specifically, we propose the results obtained for the Hump 6 function (Kleijnen et al., 2012):

Y(x_1, x_2) = 4x_1² − 2.1x_1⁴ + x_1⁶/3 + x_1 x_2 − 4x_2² + 4x_2⁴,    (15)

with −2 ≤ x_1 ≤ 2 and −1 ≤ x_2 ≤ 1. In the continuous domain, this function has two global minima, namely (0.089842, −0.712656) and (−0.089842, 0.712656), with function value y* = −1.031628, and two additional local minima. We used normal random noise with variance σ²_ξ = δ · (|x_1| + |x_2|) and δ = 1.0. Furthermore, we set ∆_B = 10, B_0 = 10, r_min = 10, N_0 = 20. For the two–dimensional case, we show the results obtained from the algorithm TSSO^{∆B}_{OCBA}.
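
A sketch of a noisy Hump 6 oracle, in the same spirit as the 1–d example above, is given below; again the names are ours and the noise is drawn as independent normals with the variance stated in the text.

import numpy as np

def noisy_hump6(x1, x2, delta=1.0, n_reps=1, rng=None):
    # Equation (15) plus heteroscedastic noise with Var = delta * (|x1| + |x2|).
    rng = rng or np.random.default_rng()
    mean = (4.0 * x1**2 - 2.1 * x1**4 + x1**6 / 3.0
            + x1 * x2 - 4.0 * x2**2 + 4.0 * x2**4)
    sd = np.sqrt(delta * (abs(x1) + abs(x2)))
    return mean + sd * rng.standard_normal(n_reps)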

Figure 2 shows the performance obtained from TSSO^{∆B}_{OCBA} in terms of function evaluation (difference y_k − y* in the first sub–figure) and location (x_{1,k} − x*_1 and x_{2,k} − x*_2 for the first and second dimension in the second and third sub–plot, respectively). The x–axis reports the iteration index k. From the 10 sample paths shown in Figure 2, we can observe a convergent behaviour also in the two–dimensional case, suggesting the validity of the theoretical results when higher dimensions are considered.

Figure 2: Response error y*(TSSO^{∆B}_{OCBA}) − y*, location errors x*_1(TSSO^{∆B}_{OCBA}) − x*_1 and x*_2(TSSO^{∆B}_{OCBA}) − x*_2.

Effect of the finite budget In the TSSO algorithm, the budget available at each iteration, B, is chosen at the algorithm start and is not modified as the search progresses. Then, the TSSO allocation rule is such that the constant budget B is dynamically divided between the search (r_{S,k}) and the evaluation (r_{A,k}) (equation (6)). From equation (6), we observe that B must satisfy a condition to obtain positive values of r_{A,k}. Indeed, the following must hold:

B \ge B_{min} = \left\lceil \frac{r_{min} - N_0 + \sqrt{(N_0 - r_{min})^2 + 4T}}{2} \right\rceil    (16)

When the assigned budget B is below this limit, TSSO will only perform search, i.e., r_{A,k} = 0 ∀k. The value in (16) implies a bound on the maximum number of iterations that TSSO will perform:

K_{max} = \left\lfloor \frac{T - B_{min} N_0}{B_{min}} \right\rfloor    (17)
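
As a quick worked instance of (16)–(17), consider the 1–d setting later used in section 4.2, T = 600, N_0 = 6, r_min = 10 (the K_{max} value below is our own arithmetic and is not reported explicitly in the text):

B_{min} = \left\lceil \frac{10 - 6 + \sqrt{(6-10)^2 + 4 \cdot 600}}{2} \right\rceil = \left\lceil \frac{4 + \sqrt{2416}}{2} \right\rceil = \lceil 26.58 \rceil = 27, \qquad K_{max} = \left\lfloor \frac{600 - 27 \cdot 6}{27} \right\rfloor = 16.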

Fixing the budget at each iteration and, consequently, the number of iterations that the algorithm performs can lead to different drawbacks, as the following examples show.

When the initial MNEK model is fitted, B replications are assigned to each point in the space filling design. Subsequent points will have to share the same budget B with the already sampled points. As a result, the performance of the algorithm will be strongly influenced by the choice of the N_0 starting points.

Moreover, at the beginning of the search, values of T_k(x) are generally large and the extrinsic variance is predominant with respect to the intrinsic variance: if this is the case, we might want to assign a lower budget in order to maximize the number of iterations that can be performed (equation (17)). On the contrary, once the extrinsic variance has been mitigated and the search procedure has identified the region where the optimum resides, we might wish to increase the budget (hence decreasing the number of remaining iterations) in order to improve the evaluation. This is not possible when B and the number of iterations are both fixed.

The budget increase ∆_B in TSSO^{∆B}_{OCBA} might improve the performance of the original TSSO, and it has the additional desirable characteristic of favouring convergence (at least when a large budget is available).

To empirically investigate such advantages, we compared the performance of the original TSSO against TSSO^{∆B}_{OCBA} on the 1–d function in equation (13). Table 1 shows the results obtained for the original TSSO and for TSSO^{∆B}_{OCBA}, where ∆_B = 1 was adopted for TSSO^{∆B}_{OCBA} (the standard deviation of the results is shown in parentheses). In addition to the already presented performance measures, the number of iterations (K*) until the best point is identified and the number of replications (B*) adopted to identify the best point were recorded. These were computed as the observations required, and the budget used, when the closest point to the known global optimum is reached. We also recorded the percentage of times in which the identified best is the global instead of the local minimum of (13) (γ in Table 1). In the original TSSO, the effect of the noise is prominent and a high sensitivity to B can be noticed from the results. This effect is mitigated in TSSO^{∆B}_{OCBA}. The results also suggest that both algorithms are influenced by the number of initial points N_0 when the noise is low. More specifically, although N_0 does not appear significant from the results (this confirms the outcomes in Picheny et al. (2013)), a strong interaction between B_0 and N_0 affects the performance of both methods.
As will be further investigated, in the case of low noise and finite budget, we should focus on the search and maximize the number of sampled points, since a small number of replications is required to evaluate the function at each location. As a result, in the case where a large initial value of B is assigned, better results can be reached with lower N_0, as the algorithm has more budget to perform the search.
The noise level negatively impacts the TSSO performance. However, with the budget increase, in TSSO^{∆B}_{OCBA} we observe less sensitivity (i.e., the decrease in performance is lower).

Finally, the introduction of the budget increase leads to better performance in terms of γ. This result is particularly relevant, as γ also represents a measure of convergence for the finite budget case. In fact, convergence was observed, on average over the presented cases, in less than three iterations after the initial fit (K* − N_0 in Table 1).

Table 1: TSSO and TSSO^{∆B}_{OCBA} comparison under finite budget

Algorithm          B_0  N_0  δ  T    |x − x*| (std err)  |y − y*| (std err)  K*    B*      γ
TSSO               10   3    1  180  0.077 (0.154)       0.718 (0.604)       6.5   111     86
TSSO               20   3    1  360  0.019 (0.007)       0.391 (0.243)       6.66  226.4   100
TSSO               10   6    1  180  0.027 (0.016)       0.593 (0.445)       8.43  148.6   100
TSSO               20   6    1  360  0.035 (0.048)       0.821 (0.639)       8.47  298.8   99
TSSO               10   3    5  180  0.209 (0.235)       2.62 (1.562)        8.32  146.4   60
TSSO               20   3    5  360  0.155 (0.217)       2.193 (1.22)        7.95  278     70
TSSO               10   6    5  180  0.207 (0.227)       2.891 (1.713)       8.89  157.8   58
TSSO               20   6    5  360  0.17 (0.225)        2.373 (1.414)       8.83  313.2   67
TSSO^{∆B}_{OCBA}   10   3    1  180  0.062 (0.141)       0.758 (0.456)       6.61  117.04  89
TSSO^{∆B}_{OCBA}   20   3    1  360  0.023 (0.064)       0.481 (0.321)       6.5   228.72  98
TSSO^{∆B}_{OCBA}   10   6    1  180  0.013 (0.012)       0.436 (0.335)       8.37  154.8   100
TSSO^{∆B}_{OCBA}   20   6    1  360  0.01 (0.009)        0.295 (0.408)       8.27  301.6   100
TSSO^{∆B}_{OCBA}   10   3    5  180  0.131 (0.202)       2.036 (1.3)         6.94  125.66  77
TSSO^{∆B}_{OCBA}   20   3    5  360  0.127 (0.198)       1.745 (1.24)        6.96  252.44  76
TSSO^{∆B}_{OCBA}   10   6    5  180  0.18 (0.227)        2.624 (1.868)       8.49  159.6   65
TSSO^{∆B}_{OCBA}   20   6    5  360  0.145 (0.216)       1.802 (1.106)       8.43  314.4   72

Despite the desirable effects just observed, an increasing budget can endanger the search when only a finite number of replications is available, forcing the algorithm to stop early due to budget exhaustion. The performance of TSSO^{∆B}_{OCBA} is strongly influenced by the choice of ∆_B, which is as complicated as selecting a correct value of B in the original TSSO. We show, in the following, how critical the assignment of ∆_B is for the performance of TSSO^{∆B}_{OCBA}. In particular, we propose the case in which the noise is not particularly high, but the available budget is low.

The function adopted is the 2–d Hump 6 (equation (15)), and we study the case δ = 1.0, T = 600, N_0 = 20. We observe the performance of TSSO^{∆B}_{OCBA} when ∆_B takes values {1, 10, 30}. Figure 3 reports the error in the function estimation from a single replication through the algorithm iterations. In particular, each point in the graph is related to an algorithm step, and the value on the x–axis refers to the budget allocated up to iteration k, i.e., Budget = Σ_{i=1}^{k} B_i (ranging from 1 to T − N_0 · r_min = 600 − 20 · 10 = 400). Observing the graph, we notice that the effect of different ∆_B is not particularly critical in the function estimation. This is due to the positive effect of the parameter in increasing the quality of the evaluation (i.e., decreasing the noise level). Nevertheless, the parameter reveals its criticality in the location of the optimum. Observing Figure 4, we can notice how the decrease in the number of iterations can lead to an inefficient use of the available budget and large errors (Figures 4(e) and 4(f)). A wrong choice of ∆_B adversely affects the rate of convergence of the algorithm. Indeed, in the case ∆_B = 30, we need more budget to reach convergence.


Figure 3: TSSO^{∆B}_{OCBA} empirical convergence: response error y*(TSSO^{∆B}_{OCBA}) − y* for (a) ∆_B = 1, (b) ∆_B = 10, (c) ∆_B = 30.

Table 2 reports the average and standard deviation of the performance of the algorithm over 100 macro–replications using the settings already presented, confirming the behaviour observed for the single replication in Figures 3 and 4.

Table 2: TSSO^{∆B}_{OCBA} results under different values of ∆_B, T = 600

∆_B   |x − x*| average   |x − x*| stderr   |y − y*| average   |y − y*| stderr
1     0.06340522         0.03727166        0.04381201         0.04071357
10    0.07616418         0.0634372         0.05834296         0.04496012
30    0.1031872          0.08128569        0.06189078         0.04859752

Although the results suggest that low values of ∆_B should be adopted, the positive effect of large values of the parameter on noise mitigation should be carefully considered. Consequently, to balance the need to explore the design space against the noise conditions of the model, ∆_B should be adaptively modified to identify the optimum location more effectively.

Summary remarks The empirical analysis has revealed several interesting aspects of the studied algorithms:

• The OCBA scheme contributes to the speed of convergence: we observed that TSSO^{∆B}_{OCBA} converges faster than TSSO^{∆B}_{U};

• The budget increase has the positive effect of decreasing the influence of the budget per iteration B adopted in the original TSSO and of the noise level. Nonetheless, the interaction between the initial budget B and the size of the sample used for the initial fit, N_0, affects both algorithms;

• TSSO^{∆B}_{OCBA} is highly dependent on ∆_B, and the algorithm can show poor performance under finite budget conditions. Nonetheless, the selection of this parameter is not a trivial task, as it should balance the need to reduce the noise, which leads to larger values of ∆_B, against the need to have enough iterations to explore the design space, which leads to small values of ∆_B.

Figure 4: TSSO^{∆B}_{OCBA} performance: errors x*_1(TSSO^{∆B}_{OCBA}) − x*_1 and x*_2(TSSO^{∆B}_{OCBA}) − x*_2 for (a)–(b) ∆_B = 1, (c)–(d) ∆_B = 10, (e)–(f) ∆_B = 30.

When a finite total budget is considered, an appropriate allocation should satisfy two main desirable characteristics:

1. In the case of low noise, the budget should not be increased to a great extent, especially at the first iterations, in order to favour the search, since only a few replications are needed to provide a good point estimate of the function value. The increase should become prominent once the estimate of the model is good, in order to speed up the search.

2. In the case of high noise, a larger budget should be allocated, since concentrating the effort on the search might lead to the divergence of the algorithm as a result of extremely poor function estimations.

In this work, we further propose a new allocation scheme to adaptively increase the available number of simulation replications at each algorithm iteration.

4 eTSSO Algorithm

In order to mitigate the effect of choosing a fixed budget B as in TSSO, as well as of choosing a budget increase rate as suggested in TSSO^{∆B}_{OCBA}, we propose to modify TSSO with a new allocation scheme which adaptively increases B at each iteration according to the information coming from the simulation. Specifically, we use the estimate of the intrinsic variance at each sampled point and the extrinsic variance to update the number of replications B_k to allocate at each iteration k. The resulting B_k is non–decreasing in k, providing a suitable characteristic of the adopted allocation scheme in terms of convergence properties:

B_k = \left\lfloor B_{k-1}\left(1 + \frac{\sigma^2_{\xi,k}}{\sigma^2_{\xi,k} + s^2_{z,k}}\right)\right\rfloor    (18)

The budget at the first iteration is set to the minimum number of replications to sample a new point, B_0 = r_min. In applying (18), the problem of how to estimate the variance component σ²_{ξ,k} arises. We propose to use, as σ²_{ξ,k}, the estimate of the intrinsic variance at the point to which the OCBA procedure assigns the largest number of replications. The motivation behind the proposed approach is that the Probability of Correct Selection (PCS) is a valuable criterion to choose how much budget to allocate at a specific algorithm iteration (Chen et al., 2000). Indeed, OCBA chooses a point by considering both the function value and the variance of the estimate at each point.

If the most relevant point, according to OCBA, has low variance (i.e., it is selected because of its good function value), the kriging model may have either a large or a small extrinsic variance. In the first case, the allocation rule will keep the budget low in order to favour the search (i.e., increase the number of points to sample). In the second case, the budget is greatly increased. This is desirable as, in such a situation, we are likely to be in proximity of the optimum point; hence, we want to reduce the number of search points and allocate more replications to the most promising among the already sampled locations.

If the point chosen by OCBA has large associated variance, we would increase the budget assigned to the iteration in order to drive down the noise. Nevertheless, when the algorithm is at its early stages and large extrinsic variance characterizes the search, the budget increase rate is mitigated.
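
The adaptive rule (18), together with the budget-exhaustion behaviour described with Table 3 below, can be sketched as follows. The helper is ours; in particular, the way the leftover budget is spent on the final iteration reflects our reading of the exhaustion rule stated later in this section.

import math

def etsso_budget_update(B_prev, sigma2_xi, s2_z, remaining):
    # B_prev    : budget used at the previous iteration
    # sigma2_xi : intrinsic-variance estimate at the point OCBA favours most
    # s2_z      : extrinsic (spatial prediction) variance of the kriging model
    # remaining : replications still available out of the total budget T
    B_k = math.floor(B_prev * (1.0 + sigma2_xi / (sigma2_xi + s2_z)))   # equation (18)
    if remaining - B_k < B_k:      # what would be left cannot fund another, larger iteration
        B_k = remaining            # so spend everything now
    return min(B_k, remaining)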

Figure 5 shows the budget allocated at each iteration (a single sample path is considered) by the proposed allocation rule under different noise magnitudes δ. The underlying function is the Hump 6 studied in section 3.2, with the same noise function form. The total budget was fixed at T = 1340 replications, the initial design size was N_0 = 20, and we set B_0 = r_min = 10.

We can observe that the allocation has the desirable properties stated at the end of section 3.2: when the noise level is low, the budget increment is low as well. For example, in the case δ = 1.2, the budget is slightly increased at the beginning and then remains constant throughout the algorithm until the end, when it is increased to terminate the search as the budget is exhausted. In this case, 19 iterations are performed by the algorithm, i.e., 19 new points are sampled. On the contrary, when δ = 9.6, only 9 new points are sampled and more effort is devoted to noise reduction, i.e., more replications are assigned at each iteration. Table 3 shows the detailed allocation performed by eTSSO at each iteration of a single macro–replication for the two extreme cases in Figure 5, i.e., δ = 1.2 and δ = 9.6. The column σ²_{ξ,k}/(σ²_{ξ,k} + s²_{z,k}) reports the coefficient for computing the budget increase (equation (18)), and the resulting number of replications is reported in column B_k.
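
As a check of how the entries of Table 3 are produced, take the δ = 1.2 path: at k = 4 the coefficient is 0.9984405, so B_4 = ⌊B_3 · (1 + 0.9984405)⌋ = ⌊10 · 1.9984405⌋ = 19, and at k = 5 the coefficient 0.9993599 gives B_5 = ⌊19 · 1.9993599⌋ = 37, exactly as reported in the table.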


Figure 5: eTSSO adaptive budget allocation under different noise levels δ: budget at each iteration B_k

In both noise magnitude cases, we observe that, at the beginning of the search, the extrinsic variance s²_{z,k}, whose role is to keep the increase rate of the budget low, is predominant.

In the low noise case, the budget is never increased after the 7–th iteration, because the intrinsic variance is never predominant. At the last iteration, B_19 increases from 73 to 141 because the budget is exhausted: since B_k is an increasing sequence, when the remaining budget at iteration k is less than the current budget, the remaining budget is added to the current iteration.

When the noise is large, we observe a different behaviour. The increase rate of the budget is much larger, as the intrinsic noise is predominant.

We also notice that the location error increases with the noise, as already observed in the previous experiments. Concerning the estimation error, for the four noise cases we obtained |y − y*| = {0.0125, 0.0126, 0.111, 0.07} for increasing noise values, respectively.

4.1 eTSSO algorithm overview

With the modifications to the budget allocation for each iteration (equation (18)), the resulting eTSSO is proposed in Algorithm 3.

4.2 Numerical Example

In this section, to evaluate the performance of the proposed eTSSO with respect to the original TSSO, we compare the algorithms for different values of the noise magnitude δ, the total available budget T, and the function complexity (dimensions). A one–dimensional case is studied, followed by a two–dimensional case.

Effect of the budget per iteration In the following, we propose a study of the 1–d function already used for the convergence analysis (equation (13)), using the same noise signal (equation (14)). We set the total budget T = 600, r_min = 10 and N_0 = 6. For this case, the minimum budget per iteration, according to (16), is B_min = 27. The maximum, B_max, i.e., the budget such that all the available replications are used for the evaluation of the initial design, is B_max = 100. Finally, the minimum “feasible” budget is B_f = r_min = 10. Figure 6(a) shows the performance of the original TSSO algorithm, in terms of optimum location and optimal function value estimation, for values of the budget r_min ≤ B ≤ B_max, against eTSSO.


Table 3: eTSSO adaptive budget allocation

δ     k    σ²_{ξ,k}     s²_{z,k}      σ²_{ξ,k}/(σ²_{ξ,k}+s²_{z,k})   B_k
1.2   1    0.0327169    18.251774     0.0017893                      10
1.2   2    0.0327169    0.944056      0.0334949                      10
1.2   3    0.0471193    18.920306     0.0024842                      10
1.2   4    0.086458     0.000135      0.9984405                      19
1.2   5    0.1818287    0.0001165     0.9993599                      37
1.2   6    0.039963     10.485177     0.0037969                      37
1.2   7    0.1005435    0.0001801     0.9982124                      73
1.2   8    0.0198574    2.0741556     0.0094829                      73
...   ...  ...          ...           ...                            ...
1.2   19   0.1032024    1944.0066     5.308E-05                      141
9.6   1    2.0938835    20.121369     0.0942543                      10
9.6   2    2.0938835    44.100773     0.0453274                      10
9.6   3    2.0938835    56.961903     0.035456                       10
9.6   4    5.5333143    0.0158354     0.9971463                      19
9.6   5    3.9648416    25.370515     0.1351557                      21
9.6   6    3.4245438    0.1661451     0.9537289                      41
9.6   7    2.5011538    0.0001001     0.99996                        81
9.6   8    1.1283042    0.0001593     0.9998588                      161
9.6   9    4.7734296    0.6235668     0.8844604                      303
9.6   10   0.3351774    37354.546     8.973E-06                      484

In particular, each point represents the average performance obtained over 100 macro–replications of the algorithms. Notice that eTSSO does not require setting B; hence it corresponds to the horizontal lines in the graph. Figure 6(a) reveals several interesting aspects. If TSSO is considered, the performance in terms of location |x − x*| oscillates with respect to the assigned budget B, whereas the performance on the function value estimation, |y − y*|, shows a smoother behaviour. The response |y − y*| is smoother because of two main factors: (1) close points have similar function values; as a result, even if the error in the location oscillates, the one on the objective function is stable; (2) the algorithm chooses the best point based on the value of the function, i.e., TSSO (and eTSSO) directly controls the response Y, and the convergence on the location x is a result of the convergence of the function.

As B increases, the noise effect is mitigated and the number of iterations that can be performed decreases. While the first effect leads to an improvement of the performance, the second can be critical in terms of convergence to the global optimum, as fewer new points are sampled.

Focusing on the eTSSO results, we notice that eTSSO does not return the best performance in terms of |x − x*| for every budget allocation B of TSSO. Nevertheless, the error in the location resulting from the new algorithm is far below the average error obtained when TSSO is applied over all B's (av|x − x*| in Figure 6(a)).


Algorithm 3 eTSSO Algorithm

k = 0. Define the total budget T, the minimum number of replications for sampling a point r_min, and the size N_0 of the space filling design.
Set the initial available budget B_0 = r_min and collect the initial data.
Fit the MNEK model to the set of sample means.
Apply cross–validation to evaluate the quality of the model.
Update the total budget according to T = T − B_0 N_0.
Set the available budget A = T. Set k = 1.
While A > 0:
   Search: find the point x_k ∈ arg max_{x ∈ X \ S} T(x) and run r_min replications to evaluate the function at that point.
   Budget Computation: if k > 1, use OCBA to determine σ²_{ξ,k−1} and update the budget B_k according to (18).
   Evaluation Stage: set B_k = B_k − r_min and apply equations (8)–(9) to allocate B_k to the sampled points.
   Fit the kriging model according to the updated information.
   Update A = A − B_k; k = k + 1.

It is important to consider that, even if TSSO can lead to better performance, different values of B have to be tested in order to obtain a good configuration. However, since no structural properties are defined, running the algorithm is the only way to determine a suitable value for B.

Focusing on the |y − y*| performance, we observe an expectedly good result from eTSSO. In this specific case, the extended algorithm is always better than the original TSSO. Since eTSSO explicitly considers the response noise, by increasing the budget when the noise is particularly large, we can expect better performance in terms of function value estimation, especially with large noise levels.

This aspect is central, as it is reflected in the location performance |x − x*| (which is the most important). Indeed, as the algorithm progresses, convergence to the optimal location is guaranteed only if the function value is correctly estimated (Vogel and Lachout, 2003).

Effect of the noise We performed a further set of experiments to compare the robustness of the two algorithms against different noise levels. Specifically, we increased δ from the value 1.0 adopted in the previous experiment to δ = 10. We set the total budget to T = 3000 because of the increased noise. Figure 6(b) shows the average performance over 100 macro–replications. Despite a decrease in the overall algorithm performance (for both TSSO and eTSSO) due to the increased noise level, it is possible to observe a similar behaviour as in the lower noise case (Figure 6(a)).


Figure 6: Effect of the budget per iteration. (a) eTSSO and TSSO performance comparison under different values of budget per iteration Bk, δ = 1.2, 1–dimensional case. (b) eTSSO and TSSO performance comparison under different values of budget per iteration Bk, δ = 10, 1–dimensional case.

Observing the results in Figure 6(b), it is noteworthy that for B = 390 and B = 400, i.e., when only one point and no points are added to the initial design, respectively, the location performance of TSSO is particularly good. However, this is due to the specific initial design, which contains a point particularly close to the global optimum; moreover, the function estimate remains far from the minimum.

Effect of the total budget We performed an additional study to investigate the effect of the total budget T on the performance of the eTSSO algorithm. In these experiments, the noise is set to δ = 3.6 and we set B = rmin = 10. In principle, B should be tuned for each T by running several experiments, as shown in the previous sections; here, we keep B constant across all conditions and only increase the total budget T.

Figure 7 shows the results obtained by both algorithms. As expected, increasing the total budget improves the function estimation performance (|y − y∗|), especially for eTSSO. Indeed, as T increases, the budget at each iteration B has more room to grow and more replications are assigned, improving the evaluation. The same holds for the location error |x − x∗|. This effect is weaker for TSSO: since B is fixed throughout the algorithm, an increased total budget T translates into more iterations (equation (17)), but it does not imply an improved evaluation at each iteration because the budget available per iteration does not change. As a result, we observe in Figure 7 that |y − y∗| remains large. Concerning the location error, |x − x∗| shows an oscillating behaviour: as already stated, the increase of the budget translates into more iterations, thus favouring the search, but the larger error in the function estimation might lead to the wrong location.

Figure 7: Effect of the total budget T on eTSSO and TSSO performance.

Effect of the function dimensions We studied the effect of the noise on the two–dimensional function Hump 6 in order to evaluate the performance of the proposed rule with respect to higher dimensions. The noise function adopted was σ²_ξ(x1, x2) = δ · (|x1| + |x2|). The first experiment set was performed with T = 2400, rmin = 15, δ = 1.2, resulting in Bmin = 45, whereas the second used T = 9600, rmin = 60, δ = 4.8, resulting in Bmin = 120. For both conditions we performed 100 macro–replications. We observed the same behaviour as in the 1–d case, thus confirming the structural properties of the proposed allocation scheme. Detailed results are reported in the Appendix (A2).

eTSSO always outperforms TSSO in the low–noise case. The difference between the two algorithms shrinks as the noise level increases. Nonetheless, as in the 1–d case, eTSSO is never significantly worse than TSSO.

Summary remarks Summarizing, the following main points emerge from the presented numerical analysis:

• eTSSO performs well with respect to TSSO and is always better than the average performance of TSSO (where the average is taken over the different values of the budget per iteration B). A fundamental advantage of the algorithm is that it does not require the user to define any arbitrary value for B;

• The adaptiveness of the allocation scheme in eTSSO leads to better function estimation than TSSO when the noise is larger, since the budget increase enhances the evaluation. This contributes to improving the identification of the optimal location, which is always satisfactory;

• The adaptiveness of the new allocation leads to a more effective use of the available budget: while TSSO would require sizing an appropriate B for each specific total available budget, eTSSO adapts to T and its performance consistently improves as the total number of replications that can be performed increases;

• The study of the two–dimensional case shows that the performance, and its relationship to TSSO, are consistent in higher dimensions.

5 Empirical Results

In this section, we focus on eTSSO, analysing its behaviour under different total budgets T and noise levels δ, in order to evaluate its robustness and performance. In particular, we observe the following measures: (1) the absolute location error |x − x∗|, computed as the Euclidean distance between the location identified by eTSSO and the known global optimum, and (2) the absolute function estimation error |y − y∗|.

We perform this study on several functions of increasing complexity. In particular, we consider a two–dimensional tetra–modal test function, the three–dimensional Hartmann 3 test function, and the six–dimensional Hartmann 6 test function.

Two dimensional case We tested the following tetra–modal function:

Y(x1, x2) = −5 (1 − (2x1 − 1)²)(1 − (2x2 − 1)²)(4 + 2x1 − 1)(0.05^((2x1−1)²) − 0.05^((2x2−1)²))².      (19)

Both dimensions of the test function, x1 and x2, are scaled to [0, 1]. The global minimum is located at [0.85, 0.5] and has the value −7.098. We applied to the function a normal random noise with variance σ²_ξ(x1, x2) = δ · (|x1| + |x2|). We tested the noise magnitude levels δ = {0.6, 1.2, 2.4, 3.6, 4.8, 9.6}, with corresponding total budgets T = {1200, 2400, 4800, 7200, 9600, 19200} and minimum numbers of replications rmin = {10, 20, 40, 60, 80, 160}; the initial budget was set to the minimum number of replications, B0 = rmin. We compared the obtained results with those from the Genetic Algorithm (Matlab ToolBox) under the same available budget, setting the initial population for the GA equal to the size of the initial space filling design, i.e., N0 = 20 (Davis, 1991). In line with the settings adopted for eTSSO, we fixed the number of replications used to evaluate each individual of the GA population to rmin. Finally, the number of generations was set to ⌊(T − N0·rmin)/(N0·rmin)⌋, in order to keep the same budget for both search methods.
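To illustrate this setup, the following sketch encodes the tetra–modal function of equation (19), one noisy observation (assuming, as the σ² notation suggests, that δ·(|x1| + |x2|) is the noise variance), and the generation count used to give the GA the same budget; all names are ours and the GA itself is not reproduced.

```python
import numpy as np

def tetramodal(x1, x2):
    """Deterministic tetra-modal test function of equation (19), with x1, x2 in [0, 1]."""
    u, v = 2.0 * x1 - 1.0, 2.0 * x2 - 1.0
    return (-5.0 * (1.0 - u**2) * (1.0 - v**2) * (4.0 + 2.0 * x1 - 1.0)
            * (0.05**(u**2) - 0.05**(v**2))**2)

def noisy_tetramodal(x1, x2, delta, rng):
    """One stochastic observation, assuming delta*(|x1|+|x2|) is the noise variance."""
    return tetramodal(x1, x2) + rng.normal(0.0, np.sqrt(delta * (abs(x1) + abs(x2))))

rng = np.random.default_rng(0)
print(tetramodal(0.85, 0.5))                       # approx. -7.098, the reported minimum value
print(noisy_tetramodal(0.85, 0.5, delta=1.2, rng=rng))

# GA budget matching: generations = floor((T - N0*rmin) / (N0*rmin))
T, N0, rmin = 2400, 20, 20                          # the delta = 1.2 configuration
generations = (T - N0 * rmin) // (N0 * rmin)
print(generations)                                  # 5 generations of N0 individuals, rmin reps each
```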

Results from 100 independent macro–replications of the two algorithms are shown in Table 4. For larger noise levels, the number of points sampled by eTSSO decreases (from an average of 26.59 when δ = 0.6 to an average of 13.18 when δ = 9.6), even though the total budget T is larger. This is reasonable: as the intrinsic variance increases, the budget per iteration grows at a faster rate. As long as this growth is more than linear in the noise level, the budget is exhausted faster when the noise is larger. As already stated, this can endanger the search because the algorithm samples fewer points; nonetheless, these fewer sampled points are subject to lower intrinsic noise because of the larger number of allocated replications.

The Genetic Algorithm does not converge to the global optimum in most of the replications, and eTSSO is statistically better (α = 10%) under all experimental conditions. The behaviour of the GA can be traced back to the multi–modal nature of the function under analysis: the heuristic fails to converge to the global optimum and is instead trapped in local minima most of the time.

Table 4: Results for the Tetramodal function

Algorithm    δ      |x − x∗|               |y − y∗|
                    average    std err     average    std err
eTSSO        0.6    0.0054     0.0038      0.0331     0.0301
             1.2    0.0072     0.005       0.0462     0.0405
             2.4    0.0072     0.0047      0.061      0.0476
             3.6    0.01       0.0069      0.0746     0.0622
             4.8    0.0094     0.007       0.087      0.0791
             9.6    0.014      0.0096      0.1512     0.1258
GA           0.6    0.2714     0.223       1.3979     1.15
             1.2    0.3421     0.2212      1.1702     0.8644
             2.4    0.2648     0.237       1.0001     0.8174
             3.6    0.2987     0.2358      1.1784     0.9165
             4.8    0.2861     0.2357      0.9942     0.6865
             9.6    0.3078     0.2383      0.9529     0.725

Three dimensional case We consider the Hartmann–3 test function:

Y(x1, x2, x3) = − Σ_{i=1}^{4} αi · exp( − Σ_{j=1}^{3} Aij (xj − Pij)² )

with 0 ≤ xi ≤ 1 for i = 1, 2, 3; parameters α = (1.0, 1.2, 3.0, 3.2), and Aij and Pij given in Table 5. The function has a global minimum at x∗ = (0.114614, 0.555649, 0.852547) with Y(x∗) = −3.86278; the function has three additional local minima. We applied a normal random noise with variance σ²_ξ = δ · ( Σ_{i=1}^{3} |xi| ).

Table 5: Parameters Aij and Pij of the Hartmann–3 function

        Aij                     Pij
  3     10    30        0.3689    0.117     0.2673
  0.1   10    35        0.4699    0.4387    0.747
  3     10    30        0.1091    0.8732    0.5547
  0.1   10    35        0.03815   0.5743    0.8828
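For reference, a direct transcription of the Hartmann–3 function with the parameters of Table 5; again, treating the σ²_ξ expression as a noise variance is our assumption.

```python
import numpy as np

ALPHA = np.array([1.0, 1.2, 3.0, 3.2])
A = np.array([[3.0, 10.0, 30.0],
              [0.1, 10.0, 35.0],
              [3.0, 10.0, 30.0],
              [0.1, 10.0, 35.0]])
P = np.array([[0.3689, 0.117, 0.2673],
              [0.4699, 0.4387, 0.747],
              [0.1091, 0.8732, 0.5547],
              [0.03815, 0.5743, 0.8828]])

def hartmann3(x):
    """Hartmann-3 test function, x in [0, 1]^3."""
    x = np.asarray(x)
    return -np.sum(ALPHA * np.exp(-np.sum(A * (x - P)**2, axis=1)))

def noisy_hartmann3(x, delta, rng):
    """One observation, assuming delta * sum(|x_i|) is the noise variance."""
    x = np.asarray(x)
    return hartmann3(x) + rng.normal(0.0, np.sqrt(delta * np.sum(np.abs(x))))

print(hartmann3([0.114614, 0.555649, 0.852547]))   # approx. -3.86278 at the global minimum
```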

We tested the noise magnitude levels δ = {0.6, 1.2, 2.4, 3.6, 4.8, 9.6}, with corresponding total budgets T = {1600, 3200, 6400, 9600, 12800, 25600} and minimum numbers of replications rmin = {10, 20, 40, 60, 80, 160}; the initial budget was set to the minimum number of replications, B0 = rmin. We compared the obtained results with those from the Genetic Algorithm (Matlab ToolBox) under the same available budget, setting the initial population for the GA equal to the size of the initial space filling design, which was N0 = 30.

Results from the two algorithms are shown in Table 6. From the results, we first notice the impact of the noise on the performance of the algorithm, confirming the results obtained in the previous case and in Section 4.2. Moreover, we notice an increase in the error as the dimension of the function increases. Nevertheless, given the same number of replications, the GA cannot converge to the global optimum most of the time, and eTSSO is statistically better (α = 10%) than the genetic algorithm under all experimental conditions, as observed for the Tetramodal function. The justification for the heuristic's behaviour resides, again, in the multi–modal character of the function under study: the Hartmann 3, like the Tetramodal, has three local minima and the GA, given the budget T, is almost always trapped in a local minimum.


Table 6: Results for the Hartmann–3 function

Algorithm    δ      |x − x∗|               |y − y∗|
                    average    std err     average    std err
eTSSO        0.6    0.0721     0.15        0.0462     0.0506
             1.2    0.1889     0.2694      0.0943     0.075
             2.4    0.2166     0.2838      0.1254     0.0936
             3.6    0.2884     0.2963      0.1733     0.122
             4.8    0.3303     0.3057      0.2356     0.1782
             9.6    0.4109     0.333       0.4596     0.2621
GA           0.6    1.0134     0.4951      4.165      1.119
             1.2    1.0283     0.4674      4.2734     1.0481
             2.4    1.0175     0.3998      4.3204     1.011
             3.6    1.049      0.4396      4.3294     0.9443
             4.8    0.9872     0.4254      4.206      0.9781
             9.6    0.9793     0.4067      4.3318     1.0179

Six dimensional case The Hartmann–6 test function is as follows:

Y(x1, x2, x3, x4, x5, x6) = − Σ_{i=1}^{4} ci · exp( − Σ_{j=1}^{6} αij (xj − pij)² )

with 0 ≤ xi ≤ 1 for i = 1, . . . , 6; coefficients c = (1.0, 1.2, 3.0, 3.2), and αij and pij given in Table 7. This function has a global minimum at x∗ = (0.20169, 0.150011, 0.476874, 0.275332, 0.311652, 0.6573) with Y(x∗) = −3.32237; the function also has five additional local minima. We applied to the function a normal random noise with variance σ²_ξ = δ · ( Σ_{i=1}^{3} |xi| ), i.e., the same noise applied to the Hartmann 3. Hence, the difference in performance in this case can be entirely attributed to the function dimension, as the two forms are very similar.

Table 7: Parameters αij and pij of the Hartmann–6 function

αij:
  10.0    3.0    17.0    3.5    1.7    8.0
  0.05   10.0    17.0    0.1    8.0   14.0
  3.0     3.5     1.7   10.0   17.0    8.0
  17.0    8.0    0.05   10.0    0.1   14.0

pij:
  0.1312   0.1696   0.5569   0.0124   0.8283   0.5886
  0.2329   0.4135   0.8307   0.3736   0.1004   0.9991
  0.2348   0.1451   0.3522   0.2883   0.3047   0.6650
  0.4047   0.8828   0.8732   0.5743   0.1091   0.0381

We tested the noise magnitude levels δ = {0.6, 1.2, 2.4, 3.6, 4.8, 9.6}, with corresponding total budgets T = {4800, 9600, 19200, 28800, 38400, 76800} and minimum numbers of replications rmin = {40, 80, 160, 240, 320, 640}. We compared the obtained results with those from the Genetic Algorithm (Matlab ToolBox) under the same available budget, setting the initial population for the GA equal to the size of the initial space filling design, which was N0 = 60.
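Analogously, a transcription of the Hartmann–6 function with the parameters of Table 7; as stated above, the noise deliberately uses only the first three coordinates, and the variance interpretation is again our assumption.

```python
import numpy as np

C = np.array([1.0, 1.2, 3.0, 3.2])
ALPHA6 = np.array([[10.0, 3.0, 17.0, 3.5, 1.7, 8.0],
                   [0.05, 10.0, 17.0, 0.1, 8.0, 14.0],
                   [3.0, 3.5, 1.7, 10.0, 17.0, 8.0],
                   [17.0, 8.0, 0.05, 10.0, 0.1, 14.0]])
P6 = np.array([[0.1312, 0.1696, 0.5569, 0.0124, 0.8283, 0.5886],
               [0.2329, 0.4135, 0.8307, 0.3736, 0.1004, 0.9991],
               [0.2348, 0.1451, 0.3522, 0.2883, 0.3047, 0.6650],
               [0.4047, 0.8828, 0.8732, 0.5743, 0.1091, 0.0381]])

def hartmann6(x):
    """Hartmann-6 test function, x in [0, 1]^6."""
    x = np.asarray(x)
    return -np.sum(C * np.exp(-np.sum(ALPHA6 * (x - P6)**2, axis=1)))

def noisy_hartmann6(x, delta, rng):
    """One observation; the noise variance uses only the first three coordinates."""
    x = np.asarray(x)
    return hartmann6(x) + rng.normal(0.0, np.sqrt(delta * np.sum(np.abs(x[:3]))))

x_star = [0.20169, 0.150011, 0.476874, 0.275332, 0.311652, 0.6573]
print(hartmann6(x_star))   # approx. -3.32237 at the global minimum
```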

Results from the two algorithms are shown in Table 8.

Table 8: Results for the Hartmann–6 function

Algorithm    δ      |x − x∗|                  |y − y∗|
                    average    std err        average    std err
eTSSO        0.6    0.44559    0.27392        0.38654    0.32598505
             1.2    0.4883     0.1598637      0.711      0.26988286
             2.4    0.5327     0.1396394      0.8839     0.26079345
             3.6    0.5365     0.1039856      0.9961     0.25807426
             4.8    0.5612     0.1360967      1.0729     0.27950964
             9.6    0.5392     0.1042693      1.0348     0.27350492
GA           0.6    1.1606     0.3564509      4.5527     0.84954938
             1.2    1.2365     0.3812312      4.7721     0.79668504
             2.4    1.1983     0.4152014      4.4998     0.85774761
             3.6    1.2103     0.3953533      4.6236     0.90652949
             4.8    1.1656     0.3691149      4.5448     0.93546593
             9.6    1.1827     0.37272        4.772      0.95452731

The Hartmann–6 case confirms all the previous findings. It is noteworthy that, as the dimension of the function increases, the choice of the initial design becomes increasingly critical and the size of the design must increase significantly.

6 Conclusions

In this paper, we address two main open issues related to the recently proposed stochastic optimization algorithm TSSO: the analysis of its convergence properties and the sensitivity of the algorithm to the budget allocated at each iteration.

In light of the first objective, we performed the asymptotic convergence analysis for the TSSO^{ΔB}_{U} algorithm, obtained from TSSO under several assumptions and restrictions. An empirical analysis showed the effect of relaxing these restrictions and assumptions. As a result of this study, the algorithm TSSO^{ΔB}_{OCBA} was proposed, which differs from TSSO in that the budget increases at each iteration. We observed that TSSO^{ΔB}_{OCBA} converges faster than TSSO^{ΔB}_{U} and that the budget increase has the positive effect of decreasing the influence of both the budget per iteration B, adopted in the original TSSO, and the noise magnitude. However, the performance of TSSO^{ΔB}_{OCBA} is highly dependent on ΔB, i.e., the fixed a–priori budget increase rate, and the algorithm shows poor performance under the wrong parameter setting when the total budget T is finite.

We proposed eTSSO, which, similarly to TSSO^{ΔB}_{OCBA}, increases the budget at each iteration, but uses information coming from the simulation to adaptively compute the increase rate instead of fixing it, mitigating the effect of the available budget T.

The performance of eTSSO has been tested on functions of increasing dimension and the results have been compared with the Genetic Algorithm. In the proposed examples, eTSSO outperforms the GA given the same total available budget T. The performance of eTSSO is sensitive to the function dimension; nonetheless, the algorithm behaviour is consistent with the lower–dimensional cases, supporting the generality of the proposed approach and of the empirical results.

Future research will further investigate the impact of the new budget allocation scheme when applied to different stochastic search algorithms, in order to assess its general validity.

Acknowledgements

This research was supported in part by the research project grant (R-SMI-2013-MA-11) funded by the Singapore Maritime Institute.

References

Bruce Ankenman, Barry L. Nelson, and Jeremy Staum. 2010. Stochastic kriging for simulation metamodeling. Operations Research 58, 2 (2010), 371–382.

James Calvin and Antanas Zilinskas. 1999. On the Convergence of the P-Algorithm for One-Dimensional Global Optimization of Smooth Functions. Journal of Optimization Theory and Applications 102, 3 (1999), 479–495.

Yolanda Carson and Anu Maria. 1997. Simulation optimization: methods and applications. In Proceedings of the 29th conference on Winter simulation. IEEE Computer Society, 118–126.

Chun-Hung Chen, Jianwu Lin, Enver Yucesan, and Stephen E. Chick. 2000. Simulation budget allocation for further enhancing the efficiency of ordinal optimization. Discrete Event Dynamic Systems 10, 3 (2000), 251–270.

Dean Clark and Harold J. Kushner. 1978. Stochastic approximation methods for constrained and unconstrained systems. Springer.

Noel A.C. Cressie and Noel A. Cassie. 1993. Statistics for spatial data. Vol. 900. Wiley, New York.

Lawrence Davis. 1991. Handbook of genetic algorithms. Vol. 115. Van Nostrand Reinhold, New York.

Michael C. Fu, Chun-Hung Chen, and Leyuan Shi. 2008. Some topics for simulation optimization. In Proceedings of the 40th Conference on Winter Simulation. Winter Simulation Conference, 27–38.

Michael C. Fu, Fred W. Glover, and Jay April. 2005. Simulation optimization: a review, new developments, and applications. In Proceedings of the 37th conference on Winter simulation. Winter Simulation Conference, 83–95.

Jeff L. Hong and Barry L. Nelson. 2006. Discrete optimization via simulation using COMPASS. Operations Research 54, 1 (2006), 115–129.


Jiaqiao Hu, Michael C. Fu, and Steven I. Marcus. 2008. A Model Reference Adaptive Search Method for Stochastic Global Optimization. Communications in Information & Systems 8, 3 (2008), 245–276.

Deng Huang, Theodore T. Allen, William I. Notz, and Ning Zeng. 2006. Global optimization of stochastic black-box systems via sequential kriging meta-models. Journal of Global Optimization 34, 3 (2006), 441–466.

Donald R. Jones. 2001. A taxonomy of global optimization methods based on response surfaces. Journal of Global Optimization 21, 4 (2001), 345–383.

Donald R. Jones, Matthias Schonlau, and William J. Welch. 1998. Efficient global optimization of expensive black-box functions. Journal of Global Optimization 13, 4 (1998), 455–492.

Seong-Hee Kim and Barry L. Nelson. 2007. Recent advances in ranking and selection. In Proceedings of the 39th conference on Winter simulation: 40 years! The best is yet to come. IEEE Press, 162–172.

Jack P.C. Kleijnen. 2008. Response surface methodology for constrained simulation optimization: An overview. Simulation Modelling Practice and Theory 16, 1 (2008), 50–64.

Jack P.C. Kleijnen, Wim van Beers, and Inneke Van Nieuwenhuyse. 2012. Expected improvement in efficient global optimization through bootstrapped kriging. Journal of Global Optimization 54, 1 (2012), 59–73.

Averill M. Law and W. David Kelton. 1991. Simulation modeling and analysis. Vol. 2. McGraw-Hill, New York.

Eunji Lim. 2012. Stochastic approximation over multidimensional discrete sets with applications to inventory systems and admission control of queueing networks. ACM Transactions on Modeling and Computer Simulation (TOMACS) 22, 4 (2012), 19.

Marco Locatelli. 1997. Bayesian algorithms for one-dimensional global optimization. Journal of Global Optimization 10, 1 (1997), 57–76.

Raymond H. Myers and Christine M. Anderson-Cook. 2009. Response surface methodology: process and product optimization using designed experiments. Vol. 705. John Wiley & Sons.

Hirotaka Nakayama, Koichi Inoue, and Yukihiro Yoshimori. 2006. Approximate optimization using computational intelligence and its application to reinforcement of cable-stayed bridges. In Proceedings of the 2006 conference on Integrated Intelligent Systems for Engineering Design. IOS Press, 289–304.

Szu Hui Ng and Jun Yin. 2012. Bayesian kriging analysis and design for stochastic simulations. ACM Transactions on Modeling and Computer Simulation (TOMACS) 22, 3 (2012), 17.

Victor Picheny, Tobias Wagner, and David Ginsbourger. 2013. A benchmark of kriging-based infill criteria for noisy optimization. Structural and Multidisciplinary Optimization 48, 3 (2013), 607–626.


Huashuai Qu, Ilya O. Ryzhov, and Michael C. Fu. 2012. Ranking and selection with unknown correlation structures. In Proceedings of the Winter Simulation Conference. Winter Simulation Conference, 12.

Ning Quan, Jun Yin, Szu Hui Ng, and Loo Hay Lee. 2013. Simulation optimization via kriging: a sequential search using expected improvement with computing budget constraints. IIE Transactions 45, 7 (2013), 763–780.

Leyuan Shi and Sigurdur Olafsson. 2000. Nested partitions method for global optimization. Operations Research 48, 3 (2000), 390–407.

Jung P. Shim, Merrill Warkentin, James F. Courtney, Daniel J. Power, Ramesh Sharda, and Christer Carlsson. 2002. Past, present, and future of decision support technology. Decision Support Systems 33, 2 (2002), 111–126.

James C. Spall. 2003. Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control. Wiley.

Michael L. Stein. 1999. Interpolation of spatial data: some theory for kriging. Springer.

James R. Swisher, Paul D. Hyden, Sheldon H. Jacobson, and Lee W. Schruben. 2000. A survey of simulation optimization techniques and procedures. In Simulation Conference, 2000. Proceedings. Winter, Vol. 1. IEEE, 119–128.

Eylem Tekin and Ihsan Sabuncuoglu. 2004. Simulation optimization: A comprehensive review on theory and applications. IIE Transactions 36, 11 (2004), 1067–1081.

Silvia Vogel and Petr Lachout. 2003. On continuous convergence and epi-convergence of random functions. Part I: Theory and relations. Kybernetika 39, 1 (2003), 75–98.

Xiaotao Wan, Joseph F. Pekny, and Gintaras V. Reklaitis. 2005. Simulation-based optimization with surrogate models: Application to supply chain management. Computers & Chemical Engineering 29, 6 (2005), 1317–1328.

Honggang Wang, Raghu Pasupathy, and Bruce W. Schmeiser. 2013. Integer-Ordered Simulation Optimization using R-SPLINE: Retrospective Search with Piecewise-Linear Interpolation and Neighborhood Enumeration. ACM Transactions on Modeling and Computer Simulation (TOMACS) 23, 3 (2013), 17.

Jie Xu, Barry L. Nelson, and Jeff Hong. 2010. Industrial strength COMPASS: A comprehensive algorithm and software for optimization via simulation. ACM Transactions on Modeling and Computer Simulation (TOMACS) 20, 1 (2010), 3.

G. George Yin and Harold J. Kushner. 2003. Stochastic approximation and recursive algorithms and applications. Springer.

Jun Yin, Szu Hui Ng, and Kien Ming Ng. 2011. Kriging metamodel with modified nugget-effect: The heteroscedastic variance case. Computers & Industrial Engineering 61, 3 (2011), 760–777.


APPENDIX

A1: Proofs for Asymptotic convergence

In this Appendix we report the proofs of the results stated in Section 3.1.2.

Proof of Lemma 1.

Proof. Let us refer to Assumption 2 and consider the function L = σ²_ξ Rξ, and let δε be a Gaussian function of the minimum distance between two sampled locations at each iteration k. The function σ²_ξ Rξ satisfies the properties required of L: being an error function, it is strictly decreasing in Bk−1 and it cannot increase as the distance between points increases (smoothing effect of the correlation function). The assumption states that there exists a finite number of iterations such that L ≤ (δε/ψ²)^k. This means that, at each iteration, the algorithm produces estimates of the function L which decrease with δε/ψ². Let us consider a Gaussian correlation function with known parameter φz; the correlation matrix Rz can then be written as:

Rz = [ 1                e^(−φz·d12²)    ···    e^(−φz·d1k²) ]
     [ e^(−φz·d21²)     1               ···    e^(−φz·d2k²) ]
     [ ⋮                ⋮               ⋱      ⋮            ]
     [ e^(−φz·dk1²)     ···             ···    1            ]

The correlation vector at the point of interest is c(x, ·; φz)ᵀ = ( e^(−φz·d²_{x,x1})  ···  e^(−φz·d²_{x,xk}) ). Since points are ordered by the algorithm, the subscript i corresponds to the ordered location of the point (Section 3). We use this fact to rewrite the correlation matrix at the k–th algorithm iteration:

Rz = [ 1                     e^(−φz(x2−x1)²)    ···                      e^(−φz(xk−x1)²) ]
     [ e^(−φz(x2−x1)²)       1                  ···                      e^(−φz(xk−x2)²) ]
     [ ···                   ···                e^(−φz(xi+1−xi−1)²)      ···             ]
     [ e^(−φz(xk−x1)²)       ···                ···                      1               ]

The element exp(−φz(xi+1 − xi−1)²) refers to the pair of locations immediately before and after the considered candidate point. Notice that, according to the algorithm, we re–index the observations to account for the infill point; for this reason the indices of the correlation matrix run up to k even though, at iteration k, only k − 1 points have already been sampled. The correlation vector for the candidate point can be rewritten as:

c(x, ·; φz)ᵀ = ( e^(−φz(xi−x1)²)  ···  e^(−φz(xi−xi−1)²)  e^(−φz(xi+1−xi)²)  ···  e^(−φz(xk−xi)²) ).


Let us rewrite the covariance matrix of the kriging model, R = Rz + Rξ, where Rz is the Gaussian correlation matrix given above and Rξ is the diagonal noise matrix

Rξ = [ σ²ξ(x1)/(n1·τ²)    0      ···    0                ]
     [ ⋮                  ⋱             ⋮                ]
     [ 0                  ···    ···    σ²ξ(xk)/(nk·τ²)  ]

where ni represents the number of replications performed at location i according to the adopted uniform allocation. Having assumed the kriging variance known, the diagonal elements of the matrix Rξ are bounded by decreasing values of δε/ψ², where δε = min_{j∈S} exp(−(xj+1 − xj)²)/nj and ψ² = τ² (S refers to the set of sampled points). Clearly, δε ∈ (0, 1) and δε/ψ² → 0 as k → ∞, since nk → ∞ for all k. Since the budget is uniformly allocated to the candidate points, Rξ → 0 uniformly over the design space and hence R → Rz uniformly over the design space.
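The following small sketch illustrates the covariance structure used in the proof: a Gaussian correlation matrix Rz plus the diagonal noise term Rξ with entries σ²ξ(xi)/(ni·τ²). Increasing the replication counts ni drives Rξ toward zero, so R approaches Rz, which is the mechanism exploited by Lemma 1; all numerical values below are arbitrary illustrative choices.

```python
import numpy as np

def gaussian_corr(x, phi_z):
    """Correlation matrix R_z with entries exp(-phi_z * d_ij^2)."""
    d = x[:, None] - x[None, :]
    return np.exp(-phi_z * d**2)

def mnek_cov(x, phi_z, tau2, sigma2_xi, n):
    """R = R_z + R_xi, with R_xi = diag(sigma2_xi(x_i) / (n_i * tau2))."""
    return gaussian_corr(x, phi_z) + np.diag(sigma2_xi / (n * tau2))

x = np.array([0.1, 0.35, 0.6, 0.9])          # ordered sampled locations (illustrative)
sigma2_xi = 1.0 + np.abs(x)                   # some heteroscedastic intrinsic variance
for n_rep in (5, 50, 5000):                   # more replications per point ...
    n = np.full_like(x, n_rep)
    gap = np.abs(mnek_cov(x, phi_z=10.0, tau2=2.0, sigma2_xi=sigma2_xi, n=n)
                 - gaussian_corr(x, phi_z=10.0)).max()
    print(n_rep, gap)                         # ... shrinks the nugget term, so R -> R_z
```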

Proof of Property 1.

Proof. According to Locatelli (1997) the following holds:

Tk(xi; Dk) = E[ max{ y∗k − Zk(x), 0 } | Dk ] = ∫_{−∞}^{(y∗k − µ|Dk)/σ|Dk} ( (y∗k − µ|Dk)/σ|Dk − t ) dF(t).

Given that Lemma 1 holds, we use Lemma 2 to write the following:

Tk(xi; Dk) = σ|Dk · φ( (y∗k − µ|Dk)/σ|Dk ) − (y∗k − µ|Dk) · ( 1 − Φ( (y∗k − µ|Dk)/σ|Dk ) ),      (20)

where φ and Φ represent the pdf and cdf of the normal distribution, respectively. Such a function is globally Lipschitz continuous when µ|Dk and σ|Dk are defined according to (11) and (12), respectively.

To show Lipschitz continuity we have to guarantee that dTk(xi; Dk)/dx < ∞ for all x ∈ [a, b]. Under the normality assumption, the components φ( (y∗k − µ|Dk)/σ|Dk ) and Φ( (y∗k − µ|Dk)/σ|Dk ) are the pdf and cdf of a normal distribution with finite mean and variance, 0 < µ|Dk, σ|Dk < ∞; therefore they have finite derivatives. More attention needs to be paid to the derivatives of µ|Dk and σ|Dk. We rewrite equation (12) in explicit form:

σ|Dk = τ [ 1 − Σ_{h=1}^{k} Σ_{g=1}^{k} e^(−φz(xi−xh)²) · e^(−φz(xg−xh)²) · r⁻¹_{z,hg} ]^{1/2}      (21)


In equation (21), the elements r⁻¹_{z,hg} represent the entries of the inverse correlation matrix Rz⁻¹ (h–th row, g–th column). We can highlight the weighting process by rewriting the predicted variance as follows:

σ|Dk = τ [ 1 − Σ_{h=1}^{k} Σ_{g=1}^{k} w_{ih} · w_{hg} · r⁻¹_{z,hg} ]^{1/2}      (22)

where the functions w_{ij} = exp(−φz(xi − xj)²) are interpreted as the Gaussian weighting of the observations. According to Assumption 3, equation (22) is infinitely differentiable with respect to xi. The finiteness of the derivative, however, depends on the values of r⁻¹_{z,hg} for all (h, g); in particular, the derivative is finite as long as the matrix Rz is non–singular, i.e., if Assumption 3 holds. The same can be proven for the mean component. We then conclude that the function Tk(xi; Dk) is Lipschitz continuous.
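For completeness, equation (20) can be transcribed directly; the sketch below assumes SciPy for φ and Φ, and the names y_star, mu and sigma stand for y∗k, µ|Dk and σ|Dk. It is a transcription of the formula as stated, not of the authors' implementation.

```python
import numpy as np
from scipy.stats import norm

def T_criterion(y_star, mu, sigma):
    """Infill criterion of equation (20): sigma*phi(u) - (y_star - mu)*(1 - Phi(u))."""
    u = (y_star - mu) / sigma
    return sigma * norm.pdf(u) - (y_star - mu) * (1.0 - norm.cdf(u))

# Example: current best value y* = -1.0, MNEK predictions at two candidate points
print(T_criterion(-1.0, mu=-0.5, sigma=0.3))
print(T_criterion(-1.0, mu=-0.9, sigma=0.8))
```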

Proof of Theorem 1.

Proof. As the number of iterations k → ∞, the stochastic model converges to its deterministic counterpart, and the prediction has a normal distribution with moments that are independent of the noise ξ(x). We can bound the term σ|Dk from above as follows:

σ|Dk = τ [ 1 − Σ_{h=1}^{k} Σ_{g=1}^{k} e^(−φz(xi−xh)²) · e^(−φz(xg−xh)²) · r⁻¹_{z,hg} ]^{1/2} ≤ τ      (23)

From Property 1, Tk(xi; Dk) is globally Lipschitz continuous. Hence, Tk(xi; Dk) is finite over the design space and, given two consecutive points, |Tk(xi−1; Dk) − Tk(xi; Dk)| is finite as well. We can then consider the following:

Tk(xi; Dk) ≤ σ|Dk · φ( (y∗k − µ|Dk)/σ|Dk ) ≤ σ|Dk/√(2π) ≤ τ/√(2π),      (24)

where the last two inequalities are derived from the normality property and from equation (23), respectively. With (23) and (24), we can then use the results in Locatelli (1997). As seen from (23), the variance is a weighted sum of the estimates, where the weights are a function of the distance between the candidate point and the previous and subsequent points. With this and the Lipschitz continuity from Property 1, we can apply Lemma 1 in Locatelli (1997) (page 60), using c = τ as the Lipschitz constant. Lemma 1 leads to the following result (Locatelli (1997), page 61):

lim_{k→∞} max_{i=1,...,k} (xi − xi−1) = 0.      (25)

Hence, the set of points at which the function is observed, if the algorithm is never stopped, is dense in [a, b]. We can then refer to Calvin and Zilinskas (1999), where convergence to the global optimum is guaranteed with dense iterates.


A2: Additional numerical results from section 4.2

Herein, we report the results for the experiment in Section 4.2. In particular, we studied the effect of the noise on the two–dimensional Hump 6 in order to evaluate the performance of the proposed rule with respect to higher dimensions. The noise function adopted was σ²_ξ(x1, x2) = δ · (|x1| + |x2|). The first experiment set was performed with T = 2400, rmin = 15, δ = 1.2, resulting in Bmin = 45, whereas the second with T = 9600, rmin = 60, δ = 4.8, resulting in Bmin = 120. For both conditions we performed 100 macro–replications. Figures 8(a) and 8(b) report the results for the lower and larger noise level, respectively. Also in the 2–d case we observe a non–monotonic behaviour of the TSSO performance for increasing values of the budget per iteration B, and eTSSO proves to perform statistically on par with the best, a priori unknown, configuration of B for TSSO.

Figure 8: eTSSO and TSSO performance comparison under different values of budget per iteration Bk. (a) δ = 1.2, 2–dimensional case. (b) δ = 4.8, 2–dimensional case.

When the noise level is low, rmin replications are enough to compensate for the effect of the noise (Figure 8(a)) and TSSO returns satisfactory results in terms of both function and location prediction even for B ≤ Bmin, i.e., when only the search stage is performed by the algorithm. On the contrary, large values of B lead to poor performance (in both the low– and large–noise cases) as no search is performed.
