International Journal for Uncertainty Quantification, 3 (2): 119–141 (2013)

ADAPTIVE SAMPLING WITH TOPOLOGICAL SCORES

Dan Maljovec,1 Bei Wang,1,* Ana Kupresanin,2 Gardar Johannesson,2 Valerio Pascucci,1 & Peer-Timo Bremer1,2

1 Scientific Computing and Imaging Institute, University of Utah, 72 South Central Campus Drive, Salt Lake City, Utah 84112, USA

2 Lawrence Livermore National Laboratory, 7000 East Avenue, Livermore, California 94550-9234, USA

Original Manuscript Submitted: 01/07/2012; Final Draft Received: 07/18/2012

Understanding and describing expensive black box functions such as physical simulations is a common problem in many application areas. One example is the recent interest in uncertainty quantification with the goal of discovering the relationship between a potentially large number of input parameters and the output of a simulation. Typically, the simulation of interest is expensive to evaluate and thus the sampling of the parameter space is necessarily small. As a result, choosing a "good" set of samples at which to evaluate is crucial to glean as much information as possible from the fewest samples. While space-filling sampling designs such as Latin hypercubes provide a good initial cover of the entire domain, more detailed studies typically rely on adaptive sampling: given an initial set of samples, these techniques construct a surrogate model and use it to evaluate a scoring function which aims to predict the expected gain from evaluating a potential new sample. There exist a large number of different surrogate models as well as different scoring functions, each with their own advantages and disadvantages. In this paper we present an extensive comparative study of adaptive sampling using four popular regression models combined with six traditional scoring functions, compared against a space-filling design. Furthermore, for a single high-dimensional output function, we introduce a new class of scoring functions based on global topological rather than local geometric information. The new scoring functions are competitive in terms of the root mean squared prediction error but are expected to better recover the global topological structure. Our experiments suggest that the most common point of failure of adaptive sampling schemes is an ill-suited regression model. Nevertheless, even given well-fitted surrogate models, many scoring functions fail to outperform a space-filling design.

KEY WORDS: adaptive sampling, experimental design, topological techniques

1. INTRODUCTION

As the accuracy and availability of computer simulations improve, their results are increasingly used to inform far-reaching decisions. Experts in areas ranging from building and automobile design to national energy policy regularly use predictive simulations to evaluate alternative approaches. However, virtually all simulations are based on approximate models and an incomplete knowledge of the underlying physics and thus do not accurately predict the phenomena of interest. Furthermore, there typically exist a large number of parameters, e.g., material properties, boundary conditions, subscale parameters, etc., that influence the outcome and are used to tune the simulation to match experiments or observations. Nevertheless, usually no perfect set of parameters exists, nor is this type of fitting possible for the most interesting case of truly predictive simulations, e.g., weather or climate forecasts. In such scenarios the simulation parameters represent a significant source of uncertainty, and a single best guess even by an expert is not very reliable. Consequently, understanding the uncertainty involved in a prediction, the range of possible outcomes, and the confidence in the results is of interest.

* Correspond to Bei Wang, E-mail: [email protected], URL: http://www.sci.utah.edu/~beiwang/

2152–5080/13/$35.00 © 2013 by Begell House, Inc.


One common approach to address these issues is to create not one but an ensemble of simulations, each with slightly different parameter settings. The resulting collection of outcomes is then analyzed to determine the likelihood of various scenarios and to assess the confidence in the prediction. The challenge lies in the fact that the dimension of the parameter space can be large, e.g., tens or hundreds of parameters, and each simulation might be expensive, e.g., taking hundreds or thousands of CPU hours. Therefore, the space of possible solutions can only be sampled very sparsely, and each simulation must be carefully chosen to provide the maximal amount of information. A first-order solution is to sample the parameter space as evenly as possible, and there exist a number of approaches, such as the Latin hypercube design [1], orthogonal arrays [2], and related techniques [3], that address this issue.

However, in practice much of the parameter space might be invalid, producing clearly unphysical results, or it might simply be uninteresting and easy to predict. In these situations a space-filling sampling would waste a large amount of resources. Instead, adaptive sampling is used to iteratively guide the choice of future computer runs by repeatedly analyzing the model based on the existing samples. The most common approach to adaptive sampling is based on the use of statistical prediction models (also known as surrogate models, response functions, statistical emulators, or metamodels) such as Gaussian process models (GPMs) [4] or multivariate adaptive regression splines (MARS) [5]. The basic concept of adaptive sampling is relatively simple and well established: first, one constructs a prediction model based on an initial set of samples (training points); second, a large set of candidate points is chosen in the parameter space and the prediction model is evaluated at these points; third, each candidate point is assigned a score based on, for example, the estimated prediction error; finally, the candidate(s) with the highest score are selected and evaluated by running the corresponding simulation.

While the basic pipeline of adaptive sampling is universally accepted, combining different regression models with different scoring functions often leads to drastically different results. Here, we present an extensive experimental study combining four popular statistical models, a GPM [4], two implementations of MARS [5] (available in the mda and earth libraries of the statistical programming language R), and a neural network (NNET) [6] (from the nnet library in R), with six traditional as well as three new topology-based scoring functions. Using a variety of test functions in two to five dimensions, each combination is compared against a Latin hypercube design of the same sample size to evaluate the potential advantage of adaptive sampling in general. As discussed in more detail in Section 4, the most dominant factor in the results is the choice of the statistical prediction model, as some models appear to be ill suited for several test functions and produce unsatisfactory results for all scoring functions. Given an appropriate prediction model, some trends appear that favor information-theoretic scoring functions, even though for several experiments all scoring functions fail to outperform the space-filling design.
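To make the setup concrete, the following is a minimal R sketch of fitting three of the four prediction models with their default settings on hypothetical training data; the specific GPM implementation used in the study is not named in the text and is therefore omitted here.

```r
# Minimal sketch: fitting three of the four models in R with default
# settings on hypothetical data (the GPM implementation used in the paper
# is not named in the text and is omitted here).
library(earth)   # MARS, "EARTH" implementation
library(mda)     # MARS, "MDA" implementation
library(nnet)    # single-hidden-layer neural network

set.seed(1)
X <- matrix(runif(40), ncol = 2)   # 20 hypothetical training points in 2D
y <- apply(X, 1, function(x) sum(sin(2 * pi * x)))

fit_earth <- earth::earth(X, y)
fit_mda   <- mda::mars(X, y)
fit_nnet  <- nnet::nnet(X, y, size = 8, linout = TRUE, trace = FALSE)

Xc <- matrix(runif(100), ncol = 2) # candidate points for prediction
preds <- cbind(earth = as.numeric(predict(fit_earth, Xc)),
               mda   = as.numeric(predict(fit_mda, Xc)),
               nnet  = as.numeric(predict(fit_nnet, Xc)))
```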

In addition to the experimental results, we also introduce a new class of scoring functions based on global topological rather than local geometric information. In particular, we define three new scoring functions primarily aimed at recovering the global structure of a function. Nevertheless, our results show that the new scoring functions remain competitive in terms of mean squared prediction error.

2. BACKGROUND AND RELATED WORK

Here we introduce some of the necessary background and discuss related work in both adaptive sampling and topological analysis and visualization.

2.1 Statistical Prediction Models

Since computer simulations are generally computationally expensive, a standard approach to analyzing them is to build a statistical prediction model (PM) and use it in place of the actual simulation code in further analysis, to guide the selection of the upcoming ensemble of simulations. We now give a brief overview of some commonly used statistical PMs; see Fang et al. [7] for further details.

Let $y(x)$ denote the output of the simulation code, where the vector input variable, $x$, is restricted to the $d$-dimensional unit cube $[0,1]^d$. This assumption is easily relaxed to any "rectangular" input space. Let $x_1, \ldots, x_n \in [0,1]^d$ be an ensemble of $n$ input vectors and denote the resulting simulation data of interest by $y_i = y(x_i)$, $i = 1, \ldots, n$.

2.1.1 Gaussian Process Model (GPM)

One of the statistical models most frequently used for prediction is the Gaussian process model (GPM), first used in the context of computer experiments by Sacks [8] in 1989. A common practice is to place a homogeneous GP prior on the family of possible output functions. Then the predictor is given by the posterior mean conditional on the output of the computer experiment.

The GPM places a prior on the class of possible functions $y(x)$. We denote by $Y(x)$ the random function whose distribution is determined by the prior. Suppose that $Y(x) = \mu + Z(x)$, where $\mu$ is a mean parameter and $Z(x)$ is a Gaussian stochastic process with mean $0$, constant variance $\sigma^2$, and an assumed (parametric) correlation function. An example of a popular correlation model is the power exponential correlation function, given by $R(x, x') = \exp[-h(x, x')]$, where $h(x, x') = \sum_{j=1}^{d} \theta_j |x_j - x'_j|^{p_j}$ with $\theta_j \ge 0$ and $1 \le p_j \le 2$. Here, $\mu$, $\sigma^2$, $\theta_j$, $p_j$ are the parameters of the prior model. The values of these parameters are estimated using the ensemble data, $(x_i, y_i)$, $i = 1, \ldots, n$, typically using a likelihood-based objective function. In situations where the output is quite smooth, it is often the case that $p_j = 2$ holds for all $j$, resulting in a Gaussian correlation function.

The predictor $\hat{Y}(x)$ for $Y(x)$ is the posterior mean of $Y(x)$ given $\{(x_i, y_i)\}$ and is available in a closed-form expression. Similarly, the mean squared prediction error (MSPE) of $\hat{Y}(x)$ (the prediction error variance), taking into account the uncertainty from estimating $\mu$ by maximum likelihood (but not from estimating the correlation parameters), is also available in closed form. For more details, see Rasmussen et al. [4].
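For reference, one standard form of these expressions for a constant-mean GPM, with the correlation parameters treated as known (a sketch of the generic formulas, not necessarily the exact variant used in the experiments), is

$$\hat{Y}(x) = \hat{\mu} + r(x)^T R^{-1} (y - \hat{\mu}\,\mathbf{1}), \qquad \hat{\mu} = \frac{\mathbf{1}^T R^{-1} y}{\mathbf{1}^T R^{-1} \mathbf{1}},$$

$$\mathrm{MSPE}(x) = \sigma^2 \left[ 1 - r(x)^T R^{-1} r(x) + \frac{\left(1 - \mathbf{1}^T R^{-1} r(x)\right)^2}{\mathbf{1}^T R^{-1} \mathbf{1}} \right],$$

where $R$ is the $n \times n$ matrix with entries $R(x_i, x_j)$, $r(x)$ is the vector with entries $R(x, x_i)$, $y = (y_1, \ldots, y_n)^T$, and $\mathbf{1}$ is the all-ones vector; the last term in the MSPE accounts for the uncertainty from estimating $\mu$.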

2.1.2 Multivariate Adaptive Regression Splines (MARS)

Another popular class of PMs is Friedman's multivariate adaptive regression splines [5] (see also Hastie et al. [9] for a gentle introduction).

Multivariate adaptive regression splines (MARS) is a nonparametric regression method that uses a collection of simple basis functions to build a complex and flexible response function. At the core of MARS are piecewise linear basis functions, given by $\max(0, x - k)$ and $\max(0, k - x)$, where $k$ is the knot (the break point). These simple functions are used to build a collection of basis functions, $\{\max(0, x_j - k), \max(0, k - x_j)\}$, where $k \in \{x_{1j}, \ldots, x_{nj}\}$ for $j = 1, \ldots, d$. If all the input values in the training data are distinct, this yields $2nd$ basis functions. The MARS predictor is then given by $y(x) = \beta_0 + \sum_{m=1}^{M} \beta_m h_m(x)$, where the $\beta_m$'s are coefficients and the $h_m$'s are basis functions, each of which is either a single function from the collection of simple basis functions or a product of two or more such functions.

For a given set of basis functions, $h_1, \ldots, h_M$, the $\beta$ coefficients are estimated using least squares. The main novelty of MARS is how the basis functions are selected, using a forward selection process followed by a backward elimination process. The forward pass starts with a model that includes only the constant term $\beta_0$ and then adds simple basis functions in pairs in a greedy fashion. This yields a model that overfits the training data (i.e., has too many terms). The backward elimination process drops terms using the generalized cross-validation (GCV) criterion, which trades off the fidelity of the model against its complexity (size).

As with most nonparametric regression methods, there is no direct method to assess the prediction error. However, the prediction error can be estimated by bootstrap methods [10]. In short, an ensemble of MARS models is created by repeatedly resampling, with replacement, the original training data $\{(x_i, y_i)\}$, and a MARS model is fitted to each batch of resampled data. An estimate of the prediction error is then given by the standard deviation of the predictions provided by the bootstrap ensemble.
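A minimal R sketch of this bootstrap estimate, using the earth implementation and hypothetical data:

```r
# Minimal sketch of the bootstrap prediction-error estimate for MARS:
# refit on B resamples (with replacement) of the training data and take
# the pointwise standard deviation of the ensemble predictions.
library(earth)
set.seed(1)
X  <- matrix(runif(60), ncol = 2)                 # hypothetical training data
y  <- apply(X, 1, function(x) sum(sin(2 * pi * x)))
Xc <- matrix(runif(40), ncol = 2)                 # candidate points

bootstrap_sd <- function(X, y, Xc, B = 50) {
  preds <- sapply(seq_len(B), function(b) {
    idx <- sample(nrow(X), replace = TRUE)        # resample pairs with replacement
    fit <- earth::earth(X[idx, , drop = FALSE], y[idx])
    as.numeric(predict(fit, Xc))
  })
  apply(preds, 1, sd)                             # pointwise sd over the ensemble
}
err <- bootstrap_sd(X, y, Xc)
```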

2.1.3 Neural Networks (NNET)

Another popular class of PMs is given by neural networks (NNETs), in particular simple feed-forward neural networks with a single hidden layer (see, e.g., Ripley [6] and Hastie et al. [9] for further details).

The single-layer NNET is simply a nonlinear statistical model, where the response of interest is modeled as a linear combination of $M$ derived features (the hidden units of the NNET): $y(x) = \beta_0 + \beta^T z$, $z = (z_1, \ldots, z_M)^T$, where $z_m = \sigma(\alpha_{0m} + \alpha_m^T x)$. The activation function $\sigma(v)$ is usually taken to be the sigmoid function, $\sigma(v) = 1/(1 + e^{-v})$. The unknown parameters of the NNET, which we denote by $\theta$, are often referred to as weights and consist of the $M(d+1)$ $\alpha$'s of the hidden layer and the $M+1$ $\beta$'s of the response model. The weights are estimated by minimizing the sum of squares, $R(\theta) = \sum_{i=1}^{n} [y_i - y(x_i)]^2$. However, a large NNET will overfit the data (i.e., the NNET fits the training data well but performs poorly on independent validation data). A common solution is to add a regularization term that shrinks the weights toward zero, such as $J(\theta) = \sum_{m=0}^{M} \beta_m^2 + \sum_{m=0,j=1}^{M,d} \alpha_{mj}^2$, and then minimize $R(\theta) + \lambda J(\theta)$. The minimization can be carried out efficiently using gradient-based methods, as the gradient of the NNET is easily computed via the chain rule for differentiation. As in the case of MARS, the prediction error of the NNET can be estimated using bootstrap methods.
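In the R nnet package, the regularization weight $\lambda$ corresponds to the decay argument; a minimal, hedged example follows (nnet penalizes the sum of squared weights, which plays the role of $J(\theta)$ above):

```r
library(nnet)
set.seed(1)
X <- matrix(runif(60), ncol = 2)   # hypothetical training data
y <- apply(X, 1, function(x) sum(sin(2 * pi * x)))
# "decay" plays the role of lambda in R(theta) + lambda * J(theta);
# linout = TRUE requests a linear output unit for regression.
fit <- nnet::nnet(X, y, size = 10, decay = 1e-3, linout = TRUE,
                  maxit = 500, trace = FALSE)
```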

2.2 Design of Computer Experiments

Experimental designs relevant to computer experiments (i.e., sampling) are often broadly categorized into two classes: space-filling and criterion-based designs. Intuitively, it is natural to consider a space-filling design strategy to minimize the overall prediction error, and a number of approaches have been studied. These include methods based on selecting random samples, e.g., Latin hypercube designs, distance-based designs, and uniform designs. A thorough discussion of the various strategies may be found in Santner et al. [11], Koehler and Owen [12], and Bates et al. [13]. Intuitively, the goal is to ensure that the input points are uniformly distributed over the range of each input dimension. In our setting, space-filling designs are attractive for an initial exploratory analysis. However, their applicability for detailed studies is limited by their very construction rationale, that is, the assumption that the features of the response model are equally likely to be found anywhere in the input space. More specifically, following a space-filling design, we have no freedom to adapt our selection of input points to information gathered as simulations are completed. Instead, using the knowledge contained in a partially constructed model may allow us to adaptively select samples in regions of interest. If samples are chosen correctly, this strategy will greatly improve prediction accuracy and efficiency compared to a pure space-filling design.

This leads to the second class of experimental designs: those constructed by adding one or several points at a time to the initial model. New points are selected from a large candidate set based on various statistical criteria and the prediction of the existing (partial) model. Due to their iterative nature these designs are usually referred to as sequential, or adaptive. Sequential designs based on optimizing statistical criteria such as the mean squared prediction error or the notion of entropy have also been used to construct designs for computer experiments [11]. Here we use a simple sampling of the range space (see Section 3.1), as well as the mean squared prediction error (MSPE) and the expected improvement as criteria.

2.2.1 Maximum Mean Squared Prediction Error

The mean squared prediction error (MSPE) is simply the prediction error of the PM at a given new input point. In the case of the GPM, the prediction error is available in closed form, but it is estimated using the bootstrap for MARS and NNET. The MSPE criterion selects the point from the candidate set that has the largest MSPE.

In the case of a stationary GPM with a constant mean, this results in a criterion that spreads points out, but at different densities along each axis, and typically starts by populating points near the boundary of the input space. In the case of MARS and NNET, which can capture very nonstationary behavior, the criterion typically populates points in the region of the input space that is most sensitive to bootstrap resampling, that is, where there is large variation in the response.

2.2.2 Maximum Expected Improvement

The expected improvement (EI) criterion was proposed by [14] and originally developed in the context of global optimization [15]. Lam [16] considered a modification of this criterion with the goal of obtaining a good global fit of the GPM instead of locating the global optimum or optima. Intuitively, the objective here is to search for "informative" regions in the domain that will help improve the global fit of the model, where "informative" means regions with significant variation in the response variable.

In the case of the GPM, for each potential input point $x$, Lam defined the improvement as $I(x) = [Y(x) - y(x_{i^*})]^2$, where $y(x_{i^*})$ is the observed output at the sampled point $x_{i^*}$ closest (in Euclidean distance) to the candidate point $x$. The maximum expected improvement criterion selects as the next point the one that maximizes the expected improvement $EI(x) = [\hat{Y}(x) - y(x_{i^*})]^2 + \sigma^2(x)$. One typically works with the square root of $E(I)$, which is on the same scale as the response. For the details of the derivation of $E(I)$ using a GPM, we refer to [16]. The EI criterion can be extended to MARS and NNET by simply replacing $\hat{Y}(x)$ with the bootstrap average and the prediction variance $\sigma^2(x)$ with the bootstrap variance.

The estimate of expected improvement combines two search components, one local and one global. The first (local) component tends to be large at points where the increase in response over the nearest sampled point is large. The second (global) component is large at points with the largest prediction error.

2.3 Morse-Smale Complex and Its Approximation

To quantify the expected topological impact of a point during adaptive sampling, we introduce a key topological structure, the Morse-Smale complex, which forms the basis for the new scoring functions introduced in Section 3.2.

Topological structures, such as contour trees [17], Reeb graphs [17–20], and Morse-Smale complexes [21, 22], provide abstract representations of scalar functions. These structures can be used to define a wide variety of features in various applications, ranging from medical [23] to physical [24, 25] and material science [26]. To analyze and visualize high-dimensional data, several topological approaches have been proposed, in particular [27, 28].

Let $\mathbb{M}$ be a smooth manifold without boundary and $f : \mathbb{M} \to \mathbb{R}$ a smooth function with gradient $\nabla f$. A point $x \in \mathbb{M}$ is called critical if $\nabla f(x) = 0$; otherwise it is regular. If the Hessian matrix at a critical point is nonsingular, the critical point is called nondegenerate. At any regular point $x$ the gradient is well defined, and integrating it in both directions traces out an integral line, $\gamma : \mathbb{R} \to \mathbb{M}$, which is a maximal path whose tangent vectors agree with the gradient [21]. Each integral line begins and ends at critical points of $f$. The ascending/descending manifolds of a critical point $p$ are defined as all the points whose integral lines start/end at $p$. The descending manifolds form a complex called the Morse complex of $f$, and the ascending manifolds define the Morse complex of $-f$. The set of intersections of ascending and descending manifolds creates the Morse-Smale complex of $f$. Each cell (crystal) of the Morse-Smale complex is a union of integral lines that all share the same origin and the same destination. In other words, all the points inside a single crystal have uniform gradient flow behavior. These crystals yield a decomposition of the domain into monotonic, nonoverlapping regions, as shown in Figs. 1(a)–(c) for a two-dimensional height function.

FIG. 1: Left: (a) ascending manifolds, (b) descending manifolds, and (c) Morse-Smale complex. Right: a 2D Morse-Smale complex before (d) and after (e) persistence simplification. Maxima are red, minima are blue, and saddles are green.

In two and three dimensions, discrete Morse-Smale complexes can be constructed on piecewise linear functions [17, 21, 22, 29]. For high-dimensional point cloud data, the Morse-Smale complex can only be approximated [30, 31]. Here, we give an overview of the high-dimensional approximation approach detailed in [31, 32], which is crucial in our algorithm pipeline. First, the domain is approximated by a $k$ nearest neighbor (kNN) graph. The algorithm uses a discrete approximation of the integral line by following the paths of steepest ascent and descent among neighboring points in the graph, based on a quick-shift algorithm [33]. The neighborhood of a point $x_i$ is defined as $\mathrm{adj}(x_i) = \{x_j \mid x_j \in \mathrm{knn}(x_i) \text{ or } x_i \in \mathrm{knn}(x_j)\}$. Its steepest ascent is $\arg\max_{x_j \in \mathrm{adj}(x_i)} \|f(x_j) - f(x_i)\| / \|x_i - x_j\|$, while its steepest descent is $\arg\max_{x_j \in \mathrm{adj}(x_i)} \|f(x_i) - f(x_j)\| / \|x_i - x_j\|$. Each point $x_i$ is then assigned to a crystal of the Morse-Smale complex, which is a union of approximated integral lines that all share the same origin and the same destination. The domain is thus partitioned into regions $\{C_1, C_2, \ldots, C_l\}$ where $\bigcup_i C_i = \{x_i\}_{i=1}^{n}$. In this approximated Morse-Smale complex, a maximum/minimum has no ascending/descending neighbors, respectively.
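The following is a minimal R sketch of this approximation, assuming a point matrix X with function values f and using the FNN package for the kNN graph; it follows signed steepest-ascent/descent steps (a practical variant of the formulas above) and labels each point by its (minimum, maximum) pair, i.e., its crystal.

```r
# Minimal sketch of the kNN-based Morse-Smale approximation. Each point
# follows steepest ascent to a local maximum and steepest descent to a
# local minimum; points sharing both endpoints form one crystal.
library(FNN)

ms_crystals <- function(X, f, k = 15) {
  n  <- nrow(X)
  nn <- FNN::get.knn(X, k = k)$nn.index
  # Symmetrized neighborhood: adj(i) = { j : j in knn(i) or i in knn(j) }
  adj <- lapply(seq_len(n), function(i)
    union(nn[i, ], which(apply(nn, 1, function(r) i %in% r))))
  step <- function(i, up) {
    j <- adj[[i]]
    d <- sqrt(rowSums((X[j, , drop = FALSE] -
           matrix(X[i, ], length(j), ncol(X), byrow = TRUE))^2))
    slope <- (f[j] - f[i]) / d
    if (!up) slope <- -slope
    if (max(slope) <= 0) return(i)    # no improving neighbor: local extremum
    j[which.max(slope)]
  }
  climb <- function(i, up) repeat { j <- step(i, up); if (j == i) return(i); i <- j }
  maxima <- vapply(seq_len(n), climb, integer(1), up = TRUE)
  minima <- vapply(seq_len(n), climb, integer(1), up = FALSE)
  paste(minima, maxima)               # crystal label (min, max) per point
}
```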

2.4 Persistence

One advantage of the Morse-Smale complex is that it can be used to associate a notion of significance with the critical points. For example, as shown in Fig. 1(d), the left peak (the circled red maximum) is considered less important topologically than the nearby peak (the uncircled red maximum to its right) because it is lower. Therefore, at a certain scale, we would like to represent this feature as a single peak instead of two separate peaks, as shown in Fig. 1(e). This simplification procedure and the notion of scale are defined through the concept of persistence.

The theory of persistence was first introduced in [34, 35], but borrows from the conventional notion of the saliency of watersheds in image segmentation. It has since been applied to a number of problems, including sensor networks [36], surface description and reconstruction [37], protein shapes [38], image analysis [39], and topological denoising [40]. In visualization, it has been used to simplify Morse-Smale complexes [41, 42], Reeb graphs [43], and contour trees [23]. Here we introduce persistence for a 1D (single variable) function [38] and refer to [34, 35, 44] for the general setting.

For a one-dimensional smooth function $f : \mathbb{R} \to \mathbb{R}$, persistence can be described through the number of connected components in the sublevel sets, by tracking the birth and death of these components. In particular, components are created and destroyed only at sublevel sets containing critical points. Pairing the critical point that creates a component with the one that destroys it thus creates a pairing of critical points. Suppose $f$ has nondegenerate critical points with distinct function values. We have two types of critical points, (local) maxima and (local) minima. We consider the sublevel sets of $f$, $F_t = f^{-1}((-\infty, t])$, and track the connectivity of $F_t$ as we increase $t$ from $-\infty$. As shown in Fig. 2(a), when we pass the minimum $a$, a new component appears in the sublevel sets, which is represented by the minimum $a$, with birth time $f(a)$. Similarly, when we pass the minima $b$ and $c$, two new components are born with birth times $f(b)$ and $f(c)$, respectively. When we pass the maximum $d$, the two components represented by $a$ and $c$ are merged, and the maximum is paired with the younger (higher) of the two minima that represent the two components; that is, $d$ and $c$ are paired, where $f(d)$ is the death time of the component represented by $c$. We define the persistence of the pair to be $f(d) - f(c)$, which corresponds to the significance of a topological feature. We then encode persistence in the persistence diagram, $\mathrm{Dgm}(f)$, by mapping the critical point pair to a point $[f(c), f(d)]$ in the 2D plane. Similarly, we pair $e$ with $b$ and $f$ with $a$, resulting in two more points in $\mathrm{Dgm}(f)$. For technical reasons, the diagonal is considered part of the persistence diagram and contains an infinite number of points.
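A minimal R sketch of this sweep for a function sampled on a 1D grid, using a simple union-find and the elder rule (the younger component dies at a merge); the essential component containing the global minimum is never paired and is omitted from the output.

```r
# Minimal sketch: 0-dimensional persistence pairs of a function sampled on
# a 1D grid (vector f in x order), via a sublevel-set sweep with union-find
# and the elder rule.
persistence_pairs <- function(f) {
  n <- length(f)
  parent <- rep(NA_integer_, n)   # union-find parent; NA = not yet in sublevel set
  birth  <- rep(NA_real_, n)      # birth value stored at each component root
  pairs  <- list()
  find <- function(i) { while (parent[i] != i) i <- parent[i]; i }
  for (i in order(f)) {           # add grid points in increasing function value
    nb <- c(i - 1L, i + 1L)
    nb <- nb[nb >= 1L & nb <= n]
    nb <- nb[!is.na(parent[nb])]
    if (length(nb) == 0L) {                 # local minimum: component born
      parent[i] <- i; birth[i] <- f[i]
    } else if (length(nb) == 1L) {          # regular point: join the component
      parent[i] <- find(nb)
    } else {                                # local maximum: two components meet
      r1 <- find(nb[1L]); r2 <- find(nb[2L])
      if (r1 == r2) { parent[i] <- r1; next }
      young <- if (birth[r1] > birth[r2]) r1 else r2
      old   <- if (young == r1) r2 else r1
      pairs[[length(pairs) + 1L]] <- c(birth = birth[young], death = f[i])
      parent[young] <- old; parent[i] <- old
    }
  }
  do.call(rbind, pairs)           # one (birth, death) row per critical-point pair
}

f <- sin(seq(0, 4 * pi, length.out = 200))
persistence_pairs(f)              # finite (birth, death) pairs of the sampled function
```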


FIG. 2: (a) A 1D function with three local minima and three local maxima. The critical points are paired, and each pair is encoded as a point in the persistence diagram on the right. (b) Left: two functions $f, g : \mathbb{R} \to \mathbb{R}$ with small $L_\infty$-distance. Right: their corresponding persistence diagrams $\mathrm{Dgm}(f)$ (circles) and $\mathrm{Dgm}(g)$ (squares) have small bottleneck distance.


Recent results show that persistence diagrams are stable under small perturbations of the functions [44, 45]. Let $p = (p_1, p_2)$, $q = (q_1, q_2)$ be two points in a persistence diagram, and let $\|p - q\|_\infty = \max\{|p_1 - q_1|, |p_2 - q_2|\}$. For functions $f, g : \mathbb{R} \to \mathbb{R}$, $\|f - g\|_\infty = \sup_x |f(x) - g(x)|$. The bottleneck distance between the two multisets of points $\mathrm{Dgm}(f)$ and $\mathrm{Dgm}(g)$ is $d_B[\mathrm{Dgm}(f), \mathrm{Dgm}(g)] = \inf_\gamma \sup_x \|x - \gamma(x)\|_\infty$, where $x$ ranges over all points of $\mathrm{Dgm}(f)$ and $\gamma$ ranges over all bijections from $\mathrm{Dgm}(f)$ to $\mathrm{Dgm}(g)$ [45]. The Stability Theorem states that the persistence diagrams satisfy $d_B[\mathrm{Dgm}(f), \mathrm{Dgm}(g)] \le \|f - g\|_\infty$. This is illustrated in Fig. 2(b).

Using the Morse-Smale complex, the persistence pairing can be created by successively canceling the two critical points connected in the complex with minimal persistence while avoiding certain degenerate situations. This assigns a persistence to each critical point in the complex which, intuitively, describes the scale at which the critical point would disappear through simplification. Note that in the approximate Morse-Smale complexes created from high-dimensional point clouds [32], only a subset of the theoretically possible cancellations can be performed, which changes the persistences slightly. However, we have not observed any negative effects of the approximation.

3. ADAPTIVE SAMPLING

Two main families of prediction models (PMs) are used in uncertainty quantification (UQ): regression models such as MARS and stochastic models such as Gaussian processes. Our general pipeline, as illustrated in Fig. 3, is applicable to both families.

We begin by selecting some initial training data, running the simulation, and obtaining a collection of true responses at these data points. Second, we fit a prediction model (PM), e.g., a Gaussian process model, to the initial set of training data. Third, a large set of candidate points is chosen in the parameter space using Latin hypercube sampling (LHS), and the PM is evaluated at these points. It is important to note that we use the PM to approximate values at these candidate points, which is highly efficient. Fourth, each candidate point is assigned a score based on an adaptive sampling scoring function. Finally, the candidates with the highest scores are selected and added to the set of training data to begin a new cycle; a sketch of this loop is given below.
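The following is a minimal R sketch of one run of this loop. It assumes a hypothetical stand-in run_sim() for the expensive simulation, uses the lhs package for the space-filling designs, EARTH as the PM, and a Delta-style score (Section 3.1.1); it illustrates the four steps rather than the exact experimental setup.

```r
library(lhs)
library(earth)

run_sim <- function(X) apply(X, 1, function(x) sum(sin(2 * pi * x)))  # stand-in

d <- 2
X <- lhs::randomLHS(20, d)                     # 1. initial training design
y <- run_sim(X)
for (iter in 1:30) {
  fit  <- earth::earth(X, y)                   # 2. fit the prediction model
  cand <- lhs::randomLHS(200 * d, d)           # 3. candidates, |S| = 200d
  pred <- as.numeric(predict(fit, cand))
  nearest <- apply(cand, 1, function(x) which.min(colSums((t(X) - x)^2)))
  score <- abs(pred - y[nearest])^2            # 4. score (Delta with q = 2)
  pick  <- which.max(score)
  X <- rbind(X, cand[pick, , drop = FALSE])    # evaluate winner, grow training set
  y <- c(y, run_sim(cand[pick, , drop = FALSE]))
}
```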

FIG. 3: The iterative pipeline for adaptive sampling. Starting with a set of training points (top), a prediction model is created (right). The model is evaluated at a large number of candidate points (bottom), typically created through a space-filling design. Each candidate point is assigned a score indicating the expected information gain were this point to be evaluated. The point with the highest score is then selected and evaluated using the simulation (left). Finally, the new sample is added to the training data and the process is repeated.

Traditional scoring functions are either based on point density, where candidate points that are farther away from the existing training data get higher scores, or on prediction accuracy, where candidates with higher prediction uncertainty get higher scores. We propose a third class of scoring functions based on topological information, where candidate points in areas of larger topological change are assigned higher scores.

3.1 Traditional Scoring Functions

We first review several traditional scoring functions, against which we compare our topological scoring functions. There are three main scoring functions, namely Delta, ALM, and EI, as detailed below. Each of them is further augmented with a distance penalization factor, creating three additional scoring functions.

Let $T = \{z_i\}_{i=1}^{n}$ be the set of $n$ training points in dimension $d$. The true response at a point $z \in T$ is denoted $y(z)$. Let $S = \{x_i\}_{i=1}^{m}$ be the set of $m$ candidate points in dimension $d$. The predicted response at a point $x \in S$ is denoted $\hat{y}(x)$.

3.1.1 Delta

This criterion can be seen as a way to evenly sample the range space of the function. It is defined as the absolute value of the difference between the predicted response at a candidate point and the response at the nearest training point. Points are chosen wherever the response model predicts a large gap in function value. Note that while the Delta criterion is very intuitive, it considers neither gradient magnitudes nor "predictability." For example, a point in the middle of a steep but linear ramp is easily predicted even though it may have a large difference in function value. Similarly, a point with a large difference in function value far away from the nearest sample may not be as interesting as a slightly smaller difference in a highly sampled region. Formally, for a point $x \in S$, $\mathrm{Delta}(x)$ is the absolute difference, raised to the $q$th power, between the predicted response and the response observed at the closest point in the training sample, as measured by the $L_p$ distance metric, $d(x, x') = (\sum_{i=1}^{d} |x_i - x'_i|^p)^{1/p}$. That is, for some fixed parameters $p$ and $q$, let $x^* = \arg\min_{z \in T} d(x, z)$; then $\mathrm{Delta}(x) = |\hat{y}(x) - y(x^*)|^q$. In the default setting, $p = 2$ and $q = 2$.
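A minimal sketch of this criterion in R, assuming training points Xt with observed responses yt and candidates Xc with model predictions yhat (all hypothetical names):

```r
# Minimal sketch of the Delta score with Lp distance and exponent q
# (defaults p = 2, q = 2).
delta_score <- function(Xt, yt, Xc, yhat, p = 2, q = 2) {
  vapply(seq_len(nrow(Xc)), function(i) {
    dists <- colSums(abs(t(Xt) - Xc[i, ])^p)^(1 / p)   # Lp distances to training set
    abs(yhat[i] - yt[which.min(dists)])^q              # gap to nearest training response
  }, numeric(1))
}
```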

3.1.2 ALM

This is the active learning MacKay criterion described in [46], which attempts to optimize the predictive variance. The idea is that the variance represents a notion of uncertainty in the prediction, and new samples should be evaluated in the least well understood regions of the parameter space. For the GPM the variance can be computed directly from the model, which appears to be a significant advantage (see Section 4). For the other prediction models we use bootstrapping (as described in Section 2.1 and Appendix B) to estimate the variance.

3.1.3 EI

This is the expected improvement criterion. As discussed in Section 2.2, it can be seen as a combination of the expected prediction error used in the ALM method and the Delta criterion. Points are chosen that either show a large uncertainty in their current prediction or have a large discrepancy with the closest existing sample. Our prediction model uses $EI(x) = [|\hat{y}(x) - y(x^*)|^2 + \mathrm{ALM}(x)]^{1/2}$.

3.1.4 Distance Penalization

Each of the above three scoring functions can be augmented with a distance penalization factor, creating three additional scoring functions, namely DeltaDP, ALMDP, and EIDP. With DeltaDP, the Delta criterion with an additional penalty term, we can prevent samples from lying too close to the training set. The scaling attempts to balance the goal of sampling in areas of large function variance with the ability to detect as-yet unknown features by preferring undersampled areas. For a point $x \in S$, $\mathrm{DeltaDP}(x) = \mathrm{Delta}(x) \cdot \rho_x$, where $\rho_x$ is the distance scaling factor. Recall that $d_x = d(x, x^*)$ is the distance from $x$ to the closest point in the training data, and let $D$ be the vector of $d_x$ for all $x \in S$. We set $\rho_x = \rho_x(d_x, d_0, q_0)$, where $d_0$ is the range and $q_0$ is the quantile (by default, $d_0$ is the $q_0$ quantile of $D$). If $d_x > d_0$, we set $\rho_x = 1$; otherwise $\rho_x = 1.5 d_x - 0.5 d_x^3$, where the coefficients are taken from the spherical semivariogram. Similarly, we define $\mathrm{ALMDP}(x) = \mathrm{ALM}(x) \cdot \rho_x$, which approaches a more space-filling point selection, and $\mathrm{EIDP}(x) = EI(x) \cdot \rho_x$. By default we use $q_0 = 0.5$.
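A minimal sketch of the penalization factor in R, reproducing the formula as stated above; D is assumed to be the vector of distances $d_x$ from each candidate to its nearest training point:

```r
# Minimal sketch of the distance penalization factor rho_x.
dp_factor <- function(D, q0 = 0.5) {
  d0 <- quantile(D, probs = q0)            # default: d0 is the q0 quantile of D
  ifelse(D > d0, 1, 1.5 * D - 0.5 * D^3)   # spherical-semivariogram coefficients
}
# e.g., DeltaDP <- delta_score(Xt, yt, Xc, yhat) * dp_factor(D)
```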

3.2 Topological Scoring Functions

All of the scoring functions discussed above pick sample points more or less directly based on the idea of globally improving the prediction accuracy. However, these points are not necessarily the optimal candidates. Imagine, for example, a steep mountain that (by random chance) has already been sampled both close to its peak and somewhere near its base. For points on the slope of the mountain, the prediction will show a large difference in function value, thus making them attractive for most standard techniques. However, evaluating the prediction in more detail would show that, even taking a sizable prediction error into account, the global structure of the mountain would not change by adding a point on its slope. More specifically, the single mountain would remain a single mountain for a wide range of potential new values, even considering errors and uncertainty. This rationale leads to topology-based scoring functions aimed at discovering the global structure (the topology) of a function rather than its detailed geometry. In particular, we propose three different topology-based scoring functions, named TopoHP, TopoP, and TopoB, as detailed below and illustrated in Fig. 4.

3.2.1 TopoHP

The first strategy aims at sampling at or near predicted critical points with significant influence on the topology. It is defined as the persistence of a candidate point within an (approximated) Morse-Smale complex constructed by oversampling the current response model. Given the current response model, we evaluate its prediction at all candidate points and compute the Morse-Smale complex of the point set combining both training points and candidate points. We then assign to all critical points of the complex that are part of the candidate set their persistence as score, and assign a zero score to all regular points within the candidate set. Referring to Fig. 4(a), the silver points illustrate the training points $T$ and the purple points correspond to the candidate points $S$. We construct a Morse-Smale complex over $T \cup S$ and return the persistence of the critical points within the candidates. Here, point $x$ is selected as having the highest persistence and, therefore, the highest $\mathrm{TopoHP}(x)$.

3.2.2 TopoP

Similar to a bootstrapping approach, this strategy aims to evaluate how much the topology (as represented by the persistences) would change if a new candidate point were added. It is defined as the average change of persistence over all current extrema when a given candidate point, with its predicted response, is inserted into the Morse-Smale complex. As shown in Fig. 4(b), we first construct the Morse-Smale complex of all training data $T$ (silver points). Then, for each candidate point $x \in S$, we construct a new Morse-Smale complex consisting of $T \cup \{x\}$ [Fig. 4(b), right]. To score the candidate $x$, we compute the change in persistence for each training point $z \in T$ that remains an extremum between the original and the enhanced Morse-Smale complex, and average these changes to obtain a single, non-negative value.

FIG. 4: (a) TopoHP: construct a Morse-Smale complex from the training data (silver) as well as all candidates with predicted responses (purple), and return the persistence of the critical points within the candidates. Point $x$ is selected with the highest $\mathrm{TopoHP}(x)$. (b) TopoP: average change in persistence for all extrema before (left) and after (right) inserting a candidate $x$ into the Morse-Smale complex. (c) Morse-Smale complexes before (left) and after (middle) inserting a candidate point $x$; TopoB (right): bottleneck distance between the corresponding persistence diagrams before (circles) and after (squares) insertion.

3.2.3 TopoB

For each point $x \in S$, $\mathrm{TopoB}(x)$ is defined as the bottleneck distance between the persistence diagram of the Morse-Smale complex over $T$ and that of the Morse-Smale complex over $T \cup \{x\}$. This strategy is similar to the TopoP scoring except that the bottleneck distance takes into account not only the persistence values but also the order and nesting of the corresponding simplifications. This is shown in Fig. 4(c), where the left and middle images illustrate the (approximated) Morse-Smale complexes before and after inserting a candidate point $x$, and the right image displays the corresponding persistence diagrams of these complexes. $\mathrm{TopoB}(x)$ is the bottleneck distance between them.

4. EXPERIMENTS

This section summarizes the different experiments and highlights apparent trends and interesting behaviors.

4.1 Example Data Sets

To evaluate the different scoring functions and understand their behavior in different scenarios, we have conducted a series of experiments with well-known analytic functions that can be generalized to high dimensions, for example, Ackley [47], a widely used multimodal test function from the optimization literature, as well as easily controlled test functions such as Gaussian mixtures and the Diagonal function. The Diagonal function consists of a sine curve aligned with the main diagonal of the unit (hyper-)cube, convolved with a Gaussian kernel in the hyperplane orthogonal to the diagonal (see [31]). The Diagonal function is attractive for testing as it is not axis aligned, its topological structure is well understood and can be computed analytically, and its complexity is easily controlled. All functions, with their closed forms and their 2D contour plots, are shown in Appendix A.

4.2 Plots and Specifications

All graphs show the root mean squared error (RMSE) of a given regression model versus the number of samples used for training. The RMSE is computed using points evaluated on a grid. For the 2D case, the total number of points is 2601, evenly spaced at an interval of 0.02. The 3D case uses a total of 9261 points with a grid spacing of 0.05, the 4D case a total of 14,641 points with a spacing of 0.1, and the 5D case a total of 7776 validation points with a spacing of 0.2. All plots show the median RMSE of 10 trial runs. We use the median value since, as discussed below, several regression techniques seem to fail for particular sets of samples, and the median is more robust against the resulting outliers in the RMSE. For a fixed prediction model, all trials start from the same LHS initial training sample $T$, with $|T| = 20$, $30$, $100$, and $200$ in 2D, 3D, 4D, and 5D, respectively. Curves are colored by scoring function, with the thicker black line indicating an LHS sample of the given size. During each step of the adaptive sampling, the highest-scoring point is chosen among $|S| = 200d$ candidate points selected using LHS. We have opted not to show variances or percentiles alongside the medians as the resulting plots become too cluttered. Nevertheless, as discussed below, even without an explicit representation several of the plots suggest drastic differences in the stability of the regression.
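As a concrete check of these counts, a grid with spacing 0.02 on $[0,1]^2$ has $51^2 = 2601$ points; a minimal R sketch of the validation grid and the RMSE computation follows (fit and ytrue are hypothetical placeholders):

```r
# Minimal sketch of the 2D validation grid and the RMSE computation.
grid2d <- as.matrix(expand.grid(x1 = seq(0, 1, by = 0.02),
                                x2 = seq(0, 1, by = 0.02)))  # 51^2 = 2601 points
rmse <- function(fit, Xgrid, ytrue) {
  sqrt(mean((as.numeric(predict(fit, Xgrid)) - ytrue)^2))
}
```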

4.3 Discussion of Results

Unsurprisingly, the most significant factor in the success of any scoring function is the quality of the regression model. Unfortunately, many experiments, even in lower dimensions, failed in the sense that the regression based on the space-filling design did not converge within the number of samples tried. In fact, for some functions some techniques did not show signs of improvement with an increasing number of samples. While it is possible that these results could be improved through manual parameter tuning, this would likely not be feasible in a real-world application where no ground truth is known and the number of samples is typically severely limited. Furthermore, any manual or semi-automatic parameter tuning would make comparing models even more challenging and potentially bias the results. Therefore, we have elected to run all experiments with the default values provided with the various regression packages listed in Appendix B. Subsequently, we only consider experiments in which the space-filling design indicated a valid surrogate model.

In general, the GPM-based regression produces the smoothest and most distinctive plots, showing clear differences between scoring functions. Consider the 2D Ackley function shown in Fig. 5(a): the ALM criterion significantly outperforms all other scoring functions, with TopoHP the only other criterion that beats the space-filling design. Compare this to the EARTH model shown in Fig. 5(b): most scoring functions except TopoHP perform qualitatively similarly and barely achieve the same quality as the LHS sampling. Furthermore, all curves are less smooth, suggesting a high variance among the different trials. The other MARS implementation, MDA, performs slightly worse [Fig. 5(c)] but qualitatively similarly, with very rough curves without clear trends that fail to achieve the same performance as the space-filling sample. Finally, the NNET implementation shown in Fig. 5(d) barely shows any improvement in RMSE with increasing samples, and all sampling strategies including the LHS sample seem to perform similarly with large fluctuations. Overall, NNET achieves by far the worst fit, at an RMSE almost an order of magnitude larger than that of the GPM model.

FIG. 5: Adaptive sampling of the 2D Ackley function using different regression techniques and different scoring functions. (a) GPM; (b) EARTH; (c) MDA; (d) NNET.

The general differences between regression models seen for the Ackley function are present for all test functions. Where the GPM models tend to converge for some reasonable number of samples, even in higher dimensions, both MARS implementations have difficulties with non-axis-aligned structures such as the Diagonal and the Salomon functions. The NNET with the standard parameters consistently performs worse than all other models, and among all techniques only the GPM shows the smoothly converging curves one would traditionally expect. Furthermore, only the GPM model shows significant differences between scoring functions. For GPM-based adaptive sampling, the one trend observed in most 2D test functions is that the ALM criterion seems to outperform all others. A possible explanation for this behavior is that the predicted variance on which the ALM criterion is based is an integral part of the GPM model itself. This makes the ALM criterion especially well suited to a GPM model and could explain the consistently better performance.

Nevertheless, the performance of the scoring functions, even for the well-behaved GPM regression, is far from consistent across test functions. For example, on the Diagonal function with two maxima [Fig. 6(a)], the ALM and EI criteria perform best, closely followed by TopoHP. The Delta criterion is on par with the LHS sampling, while the other two topological criteria do not perform well. The four-maxima version of the Diagonal function [Fig. 6(b)] shows a similar behavior, even though now small differences appear between the ALM and EI criteria and the Delta criterion fails to achieve the RMSE of the space-filling sample. The mixture of five Gaussians [Fig. 6(c)] again shows a clear preference for the error-driven criteria, even though now the Delta criterion performs worse than the topological scoring functions.

FIG. 6: Adaptive sampling of various 2D test functions using the GPM. (a) Two-maxima Diagonal; (b) four-maxima Diagonal; (c) mixture of five Gaussians; (d) MGR; (e) Salomon; (f) Whitley.

The remaining functions show a very similar pattern, with the noticeable exception that the EI criterion no longer performs as well as ALM. For the MGR function [Fig. 6(d)], TopoHP performs second best, with EI and Delta mirroring the LHS line. For the Salomon function [Fig. 6(e)], EI again performs better than Delta, but both fail to beat the space-filling sample. Finally, the Whitley function [Fig. 6(f)] shows an overall better performance, with EI and Delta both outperforming the topological scoring functions.

The 3D experiments using the GPM show largely similar results, as shown in Fig. 7. The ALM criterion generally performs best, with the remaining scoring functions in different combinations behind. However, a notable and interesting exception is the mixtures of two and five Gaussians: for both functions the Delta criterion performs significantly better than the rest. One explanation is that these functions are characterized by a few large mountains which, once found, almost entirely determine the function. In such cases the Delta criterion may perform well, as it picks values purely based on the observed range. It is unclear, however, why such a behavior is not present in the 2D versions.

A similar behavior can be seen in higher dimensions: the 4D Ackley [Fig. 7(f)] and the 5D Salomon function [Fig. 7(g)] show the expected advantage of the ALM criterion, while the 5D two-maxima Diagonal function [Fig. 7(h)] again prefers the Delta criterion. However, in this case all three topological functions outperform the remaining scoring functions, even though they only perform on par with the LHS samples.

As mentioned above, the performance of the other regression techniques is rather spotty, and the results of the adaptive scoring experiments are correspondingly inconsistent. The EARTH model produces results similar to the ones shown in Fig. 5(b) when the model itself converges, for example, for the two-maxima Diagonal function [Fig. 8(a)] and the mixture of five Gaussians [Fig. 8(b)]. However, other experiments show rather large fluctuations in either some adaptive sampling curves, e.g., the TopoHP curve for the MGR function [Fig. 8(c)], or the LHS curve, e.g., for the Whitley function [Fig. 8(d)]. Overall, the 2D Whitley function shows a good performance, with nearly all adaptive criteria outperforming the LHS curve. Interestingly, in its 3D incarnation [Fig. 8(f)] the adaptive sampling does not work nearly as well and furthermore shows excessive variation in several curves.

In general, there appears to be no significant advantage of one scoring function over another, even though for the EARTH model TopoHP in many cases seems to perform especially badly, for example, for the 3D Whitley and 3D MGR functions [Figs. 8(f) and 8(g)]. This is rather surprising since for the GPM model TopoHP performed rather well and typically better than the other topological scoring functions. The fact that there exists virtually no difference among the other scoring functions is likely due to the error computation. For non-GPM models such as MARS, the expected prediction error that forms the basis for both the ALM and the EI criterion is constructed through bootstrapping and thus is probably less reliable. This may negate the differences between error-based criteria and the others, and result in across-the-board worse performance. Unfortunately, we have not been able to obtain acceptable results for dimensions beyond three, as even for simple models such as the mixture of two Gaussians the non-GPM models did not converge. A failed but nevertheless interesting result comes from the mixture of two Gaussians [Fig. 8(h)], where the model in general shows no real improvement for higher numbers of points and both the Delta and the EI criteria at first decrease the quality of the fit.

The other MARS implementation, MDA, unsurprisingly shows largely the same behavior as EARTH. Figure 9 shows four examples each in two and three dimensions. Compared with the EARTH model, we typically see comparable RMSE and higher variability [e.g., Figs. 9(a) and 9(b)], with the exception of the 3D Whitley function [Fig. 9(f)]. Again, the 3D mixture of two Gaussians [Fig. 9(h)] shows the initial increase in RMSE for several scoring functions.

Finally, the NNET implementation in its default setting performs clearly worse, with excessive variability in all plots, which makes the interpretation of any possible trends questionable. Apart from the Ackley function of Fig. 5(d), some models that performed reasonably in two dimensions are the two-maxima Diagonal function [Fig. 10(a)], the mixture of five Gaussians [Fig. 10(b)], and the Salomon function [Fig. 10(d)]. The MGR function [Fig. 10(c)] shows a competitive RMSE as a baseline but excessive variation for virtually all adaptive scoring functions as well as the LHS line. The 3D MGR function [Fig. 11(b)] shows a similar behavior except that the LHS line now appears to be stable.

Volume 3, Number 2, 2013

Page 14: ADAPTIVE SAMPLING WITH TOPOLOGICAL SCORES · functions each with their own advantages and disadvantages. In this paper we present an extensive comparative study of adaptive sampling

132 Maljovec et al.

(a) (b)

(c) (d)

(e) (f)

(g) (h)

FIG. 7: Adaptive sampling of various test functions using the GPM. (a) 3D Ackley; (b) 3D two-maxima Diagonal;(c) 3D mixture of two Gaussians; (d) 3D mixture of five Gaussians; (e) 3D MGR; (f) 4D Ackley; (g) 5D Salomon; (h)5D two-maxima Diagonal function.


FIG. 8: Adaptive sampling of various test functions using the EARTH regression model. (a) 2D two-maxima Diagonal function; (b) 2D mixture of five Gaussians; (c) 2D MGR function; (d) 2D Whitley function; (e) 3D Ackley function; (f) 3D Whitley function; (g) 3D MGR function; (h) 3D mixture of two Gaussians.


FIG. 9: Adaptive sampling of various test functions using the MDA model. (a) 2D two-maxima Diagonal; (b) 2D mixture of five Gaussians; (c) 2D MGR function; (d) 2D Whitley function; (e) 3D Ackley function; (f) 3D Whitley function; (g) 3D MGR function; (h) 3D mixture of two Gaussians.


FIG. 10: Adaptive sampling of various 2D test functions using the NNET model. (a) Two-maxima Diagonal function; (b) mixture of five Gaussians; (c) MGR function; (d) Salomon function.

The 3D Ackley [Fig. 11(a)] and Salomon functions [Fig. 11(c)] both show high variability and high RMSE, with no adaptive scoring function able to outperform the space-filling sampling. Just as with the two MARS implementations, the 3D mixture of two Gaussians [Fig. 11(d)] does not improve with an increasing number of samples and shows the initial increase in RMSE for some scoring functions.

5. DISCUSSION

After running an extensive set of experiments in two to five dimensions with nine different scoring functions, some general trends appear, even though the fundamental question of which particular scoring function to use remains largely inconclusive. The results given in this paper are only small steps in this direction. The first, not necessarily surprising, result is that the quality of the underlying regression model plays a key role in the performance of any adaptive sampling technique. In this study the GPM performs best and is the only model showing significant differences between scoring functions. Overall, it seems that the combination of ALM and GPM is the preferred choice, even though some functions perform well with the Delta criterion.

The remaining regression models all show problems fitting many of the test functions, and the large variability suggests that they are sensitive to specific sample locations. In the cases where reasonable fits have been achieved, all scoring functions perform equally well (or poorly).

The new topological scoring functions are largely competitive in terms of RMSE and often perform among the top scoring functions. Some results suggest that detailed fits with a high number of samples are less well suited to topological scoring functions, as these are designed to recover larger-scale features. Nevertheless, topological scoring functions are expected to better recover the global structure of a function, and finding quantitative metrics to test this hypothesis, as well as expanding the class of such functions, will be the focus of future research.


FIG. 11: Adaptive sampling of various 3D test functions using the NNET model. (a) Ackley function; (b) MGR function; (c) Salomon function; (d) mixture of two Gaussians.

ACKNOWLEDGMENTS

This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract No. DE-AC52-07NA27344, LLNL-JRNL-519591, in particular the Uncertainty Quantification Strategic Initiative Laboratory Directed Research and Development Project at LLNL under project tracking code 10-SI-013.

REFERENCES

1. McKay, M. D., Beckman, R., and Conover, W. J., A comparison of three methods for selecting values of input variables in the analysis of output from a computer code, Technometrics, 21:239–245, 1979.

2. Tang, B., Orthogonal array-based Latin hypercubes, J. Am. Stat. Assoc., 88(424):1392–1397, 1993.

3. van Dam, E. R., Two-dimensional minimax Latin hypercube designs, Discrete Appl. Math., 156:3483–3493, 2008.

4. Rasmussen, C. E. and Williams, C. K. I., Gaussian Processes for Machine Learning, Cambridge, MA, MIT Press, 2006.

5. Friedman, J., Multivariate adaptive regression splines, Ann. Stat., 19(1):1–67, 1991.

6. Ripley, B. D., Pattern Recognition and Neural Networks, Cambridge, UK, Cambridge University Press, 1996.

7. Fang, K.-T., Li, R., and Sudjianto, A., Design and Modeling of Computer Experiments, Boca Raton, FL, Chapman and Hall/CRC, 2005.

8. Sacks, J., Schiller, S. B., and Welch, W., Design for computer experiments, Technometrics, 31:41–47, 1989.

9. Hastie, T., Tibshirani, R., and Friedman, J., The Elements of Statistical Learning, Berlin, Springer-Verlag, 2009.


10. Davison, A. C. and Hinkley, D. V., Bootstrap Methods and their Application, Cambridge, UK, Cambridge University Press, 1997.

11. Santner, T. J., Williams, B., and Notz, W. I., The Design and Analysis of Computer Experiments, Berlin, Springer-Verlag, 2003.

12. Koehler, J. R. and Owen, A. B., Computer experiments, In: Ghosh, S. and Rao, C. R. (Eds.), Handbook of Statistics, Amsterdam, Elsevier Science, 1996.

13. Bates, R. A., Buck, R. J., Riccomagno, E., and Wynn, H. P., Experimental design and observation for large systems, J. R. Stat. Soc., Ser. B, 58:77–94, 1996.

14. Schonlau, M., Computer Experiments and Global Optimization, PhD Thesis, University of Waterloo, 1997.

15. Jones, D., Schonlau, M., and Welch, W., Efficient global optimization of expensive black-box functions, J. Global Optimization, 13:455–492, 1998.

16. Lam, C. Q., Sequential adaptive designs in computer experiments for response surface model fit, http://etd.ohiolink.edu/, 2008.

17. Carr, H., Snoeyink, J., and Axen, U., Computing contour trees in all dimensions, Comput. Geometry Theory Appl., 24(3):75–94, 2003.

18. Reeb, G., Sur les points singuliers d'une forme de Pfaff complètement intégrable ou d'une fonction numérique, C. R. Séances Acad. Sci. Paris, 222:847–849, 1946.

19. Pascucci, V., Scorzelli, G., Bremer, P.-T., and Mascarenhas, A., Robust on-line computation of Reeb graphs: Simplicity and speed, ACM Trans. Graphics, 26(3):58, 2007.

20. Boyell, R. L. and Ruston, H., Hybrid techniques for real-time radar simulation, In Proceedings of the Fall Joint Computer Conference, pp. 445–458, 1963.

21. Edelsbrunner, H., Harer, J., and Zomorodian, A. J., Hierarchical Morse-Smale complexes for piecewise linear 2-manifolds, Discrete Comput. Geometry, 30:87–107, 2003.

22. Gyulassy, A., Natarajan, V., Pascucci, V., and Hamann, B., Efficient computation of Morse-Smale complexes for three-dimensional scalar functions, IEEE Trans. Visualization Comput. Graphics, 13:1440–1447, 2007.

23. Carr, H., Snoeyink, J., and van de Panne, M., Simplifying flexible isosurfaces using local geometric measures, In Proceedings of the 15th IEEE Visualization Conference, pp. 497–504, 2004.

24. Laney, D., Bremer, P.-T., Mascarenhas, A., Miller, P., and Pascucci, V., Understanding the structure of the turbulent mixing layer in hydrodynamic instabilities, IEEE Trans. Visualization Comput. Graphics, 12:1052–1060, 2006.

25. Bremer, P.-T., Weber, G., Pascucci, V., Day, M., and Bell, J., Analyzing and tracking burning structures in lean premixed hydrogen flames, IEEE Trans. Visualization Comput. Graphics, 16(2):248–260, 2010.

26. Gyulassy, A., Duchaineau, M., Natarajan, V., Pascucci, V., Bringa, E., Higginbotham, A., and Hamann, B., Topologically clean distance fields, IEEE Trans. Visualization Comput. Graphics, 13(6):1432–1439, 2007.

27. Oesterling, P., Heine, C., Jaenicke, H., Scheuermann, G., and Heyer, G., Visualization of high-dimensional point clouds using their density distribution's topology, IEEE Trans. Visualization Comput. Graphics, 17(11):1547–1559, 2011.

28. Correa, C. D. and Lindstrom, P., Towards robust topology of sparsely sampled data, IEEE Trans. Visualization Comput. Graphics, 17(12):1852–1861, 2011.

29. Edelsbrunner, H., Harer, J., Natarajan, V., and Pascucci, V., Morse-Smale complexes for piecewise linear 3-manifolds, In Proceedings of the 19th Annual Symposium on Computational Geometry, pp. 361–370, 2003.

30. Chazal, F., Guibas, L. J., Oudot, S. Y., and Skraba, P., Analysis of scalar fields over point cloud data, In Proceedings of the 20th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1021–1030, 2009.

31. Gerber, S., Bremer, P.-T., Pascucci, V., and Whitaker, R. T., Visual exploration of high dimensional scalar functions, IEEE Trans. Visualization Comput. Graphics, 16:1271–1280, 2010.

32. Gerber, S., Rubel, O., Bremer, P.-T., Pascucci, V., and Whitaker, R. T., Morse-Smale regression, J. Comput. Graphical Stat., 2012, in press.

33. Vedaldi, A. and Soatto, S., Quick shift and kernel methods for mode seeking, In Proceedings of the European Conference on Computer Vision, pp. 705–718, 2008.


34. Edelsbrunner, H., Letscher, D., and Zomorodian, A. J., Topological persistence and simplification, Discrete Comput. Geometry, 28:511–533, 2002.

35. Carlsson, G., Zomorodian, A. J., Collins, A., and Guibas, L. J., Persistence barcodes for shapes, In Proceedings of the Eurographics/ACM SIGGRAPH Symposium on Geometry Processing, pp. 124–135, 2004.

36. de Silva, V. and Ghrist, R., Coverage in sensor networks via persistent homology, Algebraic Geometric Topology, 7:339–358, 2007.

37. Carlsson, E., Carlsson, G., and de Silva, V., An algebraic topological method for feature identification, Int. J. Comput. Geometry Appl., 16:291–314, 2003.

38. Edelsbrunner, H. and Harer, J., Persistent homology—A survey, Contemporary Math., 453:257–282, 2008.

39. Carlsson, G., Ishkhanov, T., de Silva, V., and Zomorodian, A., On the local behavior of spaces of natural images, Int. J. Comput. Vision, 76:1–12, 2008.

40. Kloke, J. and Carlsson, G., Topological de-noising: Strengthening the topological signal, Comput. Geometry, arXiv:0910.5947, 2010.

41. Bremer, P.-T., Edelsbrunner, H., Hamann, B., and Pascucci, V., A topological hierarchy for functions on triangulated surfaces, IEEE Trans. Visualization Comput. Graphics, 10(4):385–396, 2004.

42. Gyulassy, A., Natarajan, V., Pascucci, V., Bremer, P.-T., and Hamann, B., Topology-based simplification for feature extraction from 3D scalar fields, In Proceedings of the 16th IEEE Visualization Conference, pp. 535–542, 2005.

43. Cole-McLaughlin, K., Edelsbrunner, H., Harer, J., Natarajan, V., and Pascucci, V., Loops in Reeb graphs of 2-manifolds, In Proceedings of the 19th Annual Symposium on Computational Geometry, pp. 344–350, 2003.

44. Chazal, F., Cohen-Steiner, D., Glisse, M., Guibas, L. J., and Oudot, S. Y., Proximity of persistence modules and their diagrams, In Proceedings of the 25th Annual Symposium on Computational Geometry, pp. 237–246, 2009.

45. Cohen-Steiner, D., Edelsbrunner, H., and Harer, J., Stability of persistence diagrams, Discrete Comput. Geometry, 37:103–120, 2007.

46. MacKay, D., Information-based objective functions for active data selection, Neural Comput., 4(4):589–603, 1992.

47. Ackley, D. H., A Connectionist Machine for Genetic Hillclimbing, Boston, Kluwer Academic Publishers, 1987.

48. Grollman, D., Sparse online Gaussian process, http://lasa.epfl.ch/~dang/code.shtml, 2011.

49. Csato, L. and Opper, M., Sparse online Gaussian processes, Neural Comput., 14:641–668, 2002.

50. Csato, L., Gaussian processes—Iterative sparse approximations, PhD Thesis, Aston University, Birmingham, UK, 2002.

51. Milborrow, S., EARTH: Multivariate Adaptive Regression Spline Model, R package version 3.2-1, derived from MDA:MARS by T. Hastie and R. Tibshirani, 2011.

52. Hastie, T., Tibshirani, R., Leisch, F., Hornik, K., and Ripley, B. D., MDA: Mixture and Flexible Discriminant Analysis, R package version 0.4-2, 2011.

53. Venables, W. N. and Ripley, B. D., Modern Applied Statistics with S, Berlin, Springer, 2002.

54. Carnell, R., LHS: Latin Hypercube Samples, R package version 0.5, 2009.

APPENDIX A. EXAMPLE TEST FUNCTIONS: CLOSED FORMS AND PLOTS

We use several test functions with closed forms described below. All functions can be generalized to higher dimensions, and their two-dimensional contour plots are shown in Fig. 12. Let $D$ be the dimension of the input vector $\vec{\theta}$.

Ackley

$$f(\vec{\theta}) = \sum_{i=1}^{D-1} \left\{ e^{-0.2\sqrt{\theta_i^2 + \theta_{i+1}^2}} + 3\left[\cos(2\theta_i) + \sin(2\theta_{i+1})\right] \right\},$$

where $\{\vec{\theta} \mid \theta_i \in [-3, 3]\}$.
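As a sanity check on the reconstructed formula, a direct R transcription (our own sketch, not code from the study) could read:

ackley <- function(theta) {
  # theta: numeric vector with entries in [-3, 3]
  D <- length(theta)
  i <- 1:(D - 1)
  sum(exp(-0.2 * sqrt(theta[i]^2 + theta[i + 1]^2)) +
        3 * (cos(2 * theta[i]) + sin(2 * theta[i + 1])))
}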


FIG. 12: Pseudo-colored contour plots of the 2D versions of the test functions. From left to right: Ackley, Diagonal function with 2 maxima (Diag2Max), Diagonal function with 4 maxima (Diag4Max), Gaussian mixture model with 2 (GMM2) and 5 Gaussians (GMM5), Mis-scaled Generalized Rastrigin (MGR), Salomon, and Whitley, respectively.

Diagonal

Let $m$ be the number of maxima along the main diagonal,

$$f(\vec{\theta}) = \frac{1}{2} \sin\left( \pi \left[ \frac{1}{2} + \left((m+1) \bmod 2\right) + \frac{m \sum_{i=1}^{D} \theta_i}{D} \right] \right) \times e^{\left( \sum_{i=1}^{D} \theta_i^2 - \frac{\left(\sum_{i=1}^{D} \theta_i\right)^2}{D} \right) \frac{\log(0.001)}{\sqrt{D}}},$$

where $\{\vec{\theta} \mid \theta_i \in [-1, 1]\}$.
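A corresponding R sketch (ours; the grouping of the phase term reflects our reading of the original layout and should be treated as an assumption):

diagonal <- function(theta, m) {
  # theta: numeric vector with entries in [-1, 1]; m: number of maxima
  D <- length(theta)
  s <- sum(theta)
  # sinusoid with m maxima along the diagonal, damped away from it
  0.5 * sin(pi * (0.5 + (m + 1) %% 2 + m * s / D)) *
    exp((sum(theta^2) - s^2 / D) * log(0.001) / sqrt(D))
}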


Random Gaussian Mixture Model

Let $m$ be the number of extrema in the domain. Let $a_i$ be the amplitude of the $i$th extremum. Let $c_{i,j}$ be the $j$th coordinate of the $i$th extremum. Let $\sigma_{i,j}$ be the standard deviation of the $i$th extremum with respect to the $j$th coordinate.

$$f(\vec{\theta}) = \sum_{i=1}^{m} a_i \, e^{-\sum_{j=1}^{D} \frac{(\theta_j - c_{i,j})^2}{2\sigma_{i,j}^2}},$$

where $\{\vec{\theta} \mid \theta_i \in [0, 1]\}$.
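A possible R transcription (ours; the argument names a, centers, and sds are illustrative):

gmm <- function(theta, a, centers, sds) {
  # theta: numeric vector in [0, 1]^D
  # a: length-m amplitudes; centers, sds: m x D matrices (c_{i,j} and sigma_{i,j})
  sum(vapply(seq_along(a), function(i) {
    a[i] * exp(-sum((theta - centers[i, ])^2 / (2 * sds[i, ]^2)))
  }, numeric(1)))
}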

Mis-scaled Generalized Rastrigin

$$f(\vec{\theta}) = 10D + \sum_{i=1}^{D} \left\{ \left(10^{\frac{i-1}{D-1}} \theta_i\right)^2 - 10 \cos\left[2\pi \left(10^{\frac{i-1}{D-1}} \theta_i\right)\right] \right\},$$

where $\{\vec{\theta} \mid \theta_i \in [-2, 2]\}$.
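In R, the per-coordinate mis-scaling can be written as follows (our sketch):

mgr <- function(theta) {
  # theta: numeric vector in [-2, 2]^D, with D >= 2
  D <- length(theta)
  scaled <- 10^((seq_len(D) - 1) / (D - 1)) * theta  # stretch later coordinates
  10 * D + sum(scaled^2 - 10 * cos(2 * pi * scaled))
}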

Salomon

$$f(\vec{\theta}) = -\cos\left(2\pi \sqrt{\sum_{i=1}^{D} \theta_i^2}\right) + 0.1 \sqrt{\sum_{i=1}^{D} \theta_i^2} + 1,$$

where $\{\vec{\theta} \mid \theta_i \in [-1, 1]\}$.

Whitley

$$f(\vec{\theta}) = \sum_{i=1}^{D} \sum_{j=1}^{D} \left[ \frac{k_{ij}^2}{4000} - \cos(k_{ij}) + 1 \right],$$

where $\{\vec{\theta} \mid \theta_i \in [-1, 2]\}$ and $k_{ij} = 100(\theta_i^2 - \theta_j)^2 + (1 - \theta_j)^2$.
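A direct R transcription of the double sum (our sketch):

whitley <- function(theta) {
  # theta: numeric vector in [-1, 2]^D
  D <- length(theta)
  total <- 0
  for (i in 1:D) {
    for (j in 1:D) {
      k <- 100 * (theta[i]^2 - theta[j])^2 + (1 - theta[j])^2
      total <- total + k^2 / 4000 - cos(k) + 1
    }
  }
  total
}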

APPENDIX B. SOFTWARE PACKAGES AND PARAMETER SETTINGS

Now we give details on the packages and parameter settings used in our experiments.

For GPM, we use the Sparse Online Gaussian Process (SOGP) C++ library [48], which is based on work in [49, 50]. The default parameters are employed (see [50] for details), i.e., we use the radial basis kernel with a spherical covariance set to $\sigma_0^2 = 0.1$, the widths uniformly set to 0.1, and the amplitude $A = 1$.

For EARTH, MDA, and NNET, we use nonparametric bootstrapping with 250 samples without cross-validation.

For EARTH, we use the earth library in R [51], with the following parameter settings (see [51] for details). If a parameter is not listed, the default value given by the package is used.

degree=3     // Maximum degree of interaction (Friedman's mi)
nk=63        // Maximum number of model terms before pruning
minspan=1    // Min. dist. between knots, 1 for non-noisy data
thresh=1e-8  // Forward stepping threshold
penalty=3    // Generalized Cross Validation penalty per knot
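For reference, these settings correspond to a call of the following form (our illustration; x and y stand for the training inputs and responses):

library(earth)
fit <- earth(x, y, degree = 3, nk = 63, minspan = 1,
             thresh = 1e-8, penalty = 3)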

For MDA, we use the mda library in R, with the following parameters (see [52] for details; nonlisted parameters use package defaults):

degree=3     // Maximum degree of interaction (Friedman's mi)
nk=63        // Maximum number of model terms
thresh=1e-8  // Forward stepping threshold
penalty=3    // The cost per degree of freedom charge
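The corresponding call (again our illustration) uses the mars function of the mda package:

library(mda)
fit <- mars(x, y, degree = 3, nk = 63, thresh = 1e-8, penalty = 3)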

For NNET, we use the nnet library in R, with the following parameters (see [53] for details; nonlisted parameters use package defaults):


size=1+ceiling(sqrt(D))  // Number of units in the hidden layer
                         // D is the dimensionality of the input

decay=1e-3   // Parameter for weight decay
skip=TRUE    // Add skip-layer connections from input to output
linout=TRUE  // Linear output units, as opposed to logistic
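Put together (our illustration; x and y are placeholders for the training data, and D is the input dimensionality):

library(nnet)
D <- ncol(x)  # input dimensionality
fit <- nnet(x, y, size = 1 + ceiling(sqrt(D)),
            decay = 1e-3, skip = TRUE, linout = TRUE)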

To create the space-filling samplings, the lhs library in R [54] is used. More specifically, we use the randomLHS function, which chooses uniform random samples without any attempt to optimize the design, to construct the training data, candidate data, and the LHS samples in the plots shown.
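For example (our illustration), a 40-point design in three dimensions, rescaled from the unit cube to the MGR domain:

library(lhs)
X01 <- randomLHS(n = 40, k = 3)  # 40 points in [0, 1]^3, no design optimization
theta <- -2 + 4 * X01            # rescale to the MGR domain [-2, 2]^3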
