
Complements on Surrogate Based Optimization for Engineering Design

Lecture Series AVT-167
Strategies for Optimization and Automated Design of Gas Turbine Engines

Ingrid Lepot

Cenaero

October 26, 2009

Surrogate based optimization

What is a surrogate model ?

Low cost replacement of the original function for a wide variety of purposes

Educated guess as to what an engineering function might look like, based on a few points in space where one can afford to measure the function values

Basic idea: Avoid the temptation to invest one’s computation budget in answering the question at hand and, instead, invest in developing fast mathematical approximations to the long running computer codes ⇒ explore trade-offs and gain insight

AVT167 - SBO complements (Lecture 4) copyright@cenaero 2009 2 / 26

Surrogate based optimization typical workflow


Some important topics

In a surrogate model approach, the devil’s in the details:

What points do you sample in building the approximation ?

What approximation method do you employ ?

How do you manage the approximation model(s) ?

How do you use the approximation to suggest new, improved designs ?

How do you use the approximations to explore tradeoffs between objectives ?

(What to do if your simulation has numerical “noise” in it ?)


Sampling the design space

Design of Experiments (DoE)

Classical DoE techniques (e.g. full factorial, fractional factorial, Central Composite, Box-Behnken, ...)

Optimal DoE techniques (e.g. Taguchi methods, ...)

Space-filling techniques (random and quasi-random sequences, Latin Hypercube Sampling, ...) appropriate for computer experiments and when there is no a priori knowledge of the considered cost functions

Adaptive or so-called capture/recapture techniques, which incorporate function knowledge
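
As an illustration of the space-filling family, here is a minimal sketch of Latin Hypercube Sampling using NumPy only. The function name `latin_hypercube`, the sample size and the design-variable bounds are assumptions made for this example, not part of the lecture.

```python
import numpy as np

def latin_hypercube(n_samples, n_dims, rng):
    """Return an (n_samples, n_dims) Latin hypercube sample in [0, 1)^n_dims."""
    # One random point inside each of the n_samples equal-width strata, per dimension.
    u = rng.random((n_samples, n_dims))
    strata = np.arange(n_samples)[:, None]
    points = (strata + u) / n_samples
    # Shuffle each column independently so strata are paired at random across dimensions.
    for j in range(n_dims):
        points[:, j] = rng.permutation(points[:, j])
    return points

rng = np.random.default_rng(0)
X = latin_hypercube(10, 2, rng)

# Scale the unit-cube sample to hypothetical design-variable bounds.
lower, upper = np.array([-2.0, -1.0]), np.array([2.0, 3.0])
X_design = lower + X * (upper - lower)
```

Each design variable is sampled exactly once in each of its 10 strata, which is what gives the plan its space-filling character with few points.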


Surrogate models

Two main categories of low-fidelity surrogate models :

Data-fitting models:

non-physics-based approximations

typically involve interpolation or regression of a set of data generated from the original expensive model

Global models
- provide information about the global behaviour of the system

Local models
- construct a local model around the current design point
- use in local search to move in a downhill direction

characterized by the number of data points used in the fit
- local approximations use data from a single point
- multipoint approximations use a small number of data points
- global approximations use a set of data points distributed over the whole design space


Surrogate models

Physics-based surrogate models:

Hierarchical models
- also known as multi-fidelity, variable fidelity or variable complexity models. Such models use corrected results from a low-fidelity model (e.g. coarser mesh discretization, looser convergence tolerances, simpler model that neglects some physics: Navier-Stokes vs. Euler equations, ...) as an approximation to the results of a high-fidelity model.

Reduced-order models
- models with fewer unknowns than the original high-fidelity model
- generated directly from a high-fidelity model through the use of a reduced basis (modal analysis or Proper Orthogonal Decomposition) and projection of the original high-dimensional system down to a small number of generalized coordinates
- do not require multiple models of varying fidelity


Physics-based surrogates - POD techniques

Proper Orthogonal Decomposition (POD)

Also known as Karhunen-Loeve Decomposition and Principal Component Analysis

Standard tool in data analysis to reduce a large, complex data set to a lower dimensional one
⇒ identifies the most meaningful basis, removes as much redundancy as possible, and gives a compact representation
⇒ helps reveal the sometimes hidden, simple structures underlying complex data

Widely used in CFD (low dimensional description of turbulent flows), image processing, signal analysis, data compression, etc.


POD techniques

An efficient method for computing PODs for large dimensional problems is the method of snapshots (Sirovich).

Consider a computationally expensive simulation $S(x) \in \mathbb{R}^n$, depending on $n_p$ parameters $(x_1, \dots, x_{n_p})$.

Generate observations or “snapshots” $\{s_k\}_{k=1,\dots,m}$ of $S$ at $m$ locations in the design space.

Construct the snapshot deviation matrix as $D = \big((s_1 - \bar{s}) \; \dots \; (s_m - \bar{s})\big)$, where $\bar{s}$ is the mean vector.

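Compute the Singular Value Decomposition (SVD) of $D$ :

$$D = U \Sigma V^* = U \begin{pmatrix} \sigma_1 & & 0 \\ & \ddots & \\ 0 & & \sigma_r \end{pmatrix} V^*$$

where $\sigma_1 \geq \dots \geq \sigma_r > 0$, $r = \operatorname{rank}(D)$.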


POD techniques

Then

$$s_k = \bar{s} + \sum_{i=1}^{r} \alpha_i^{(k)} \Phi_i \qquad (k = 1, \dots, m)$$

where $\bar{s}$ is the mean vector and the $i$-th mode is $\Phi_i = U(:, i)$.

The most important part of the “energy” contribution is concentrated in the first modes.

POD truncated approximation ($p < r$)

$$s_k \approx \bar{s} + \sum_{i=1}^{p} \alpha_i^{(k)} \Phi_i \qquad (k = 1, \dots, m)$$

POD basis = optimal basis: for a given number of modes, it captures more of the snapshot “energy” than any other orthonormal basis of the same size.
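
The steps above (deviation matrix, SVD, truncation) can be sketched with NumPy as follows. The snapshot data are synthetic and purely hypothetical, and the 99.9% energy threshold used to choose p is an assumption for this example. The coefficients are obtained by projection onto the orthonormal modes.

```python
import numpy as np

# Hypothetical snapshot set: m = 20 snapshots of a field of size n = 200,
# built from two smooth spatial patterns so that the data are low-rank.
rng = np.random.default_rng(0)
n, m = 200, 20
z = np.linspace(0.0, 1.0, n)
params = rng.uniform(0.5, 2.0, size=m)
snapshots = np.column_stack(
    [a * np.sin(np.pi * z) + a**2 * np.cos(2 * np.pi * z) for a in params]
)                                                # shape (n, m)

s_mean = snapshots.mean(axis=1, keepdims=True)   # mean vector
D = snapshots - s_mean                           # snapshot deviation matrix

# Thin SVD: columns of U are the POD modes, sigma holds the singular values.
U, sigma, Vt = np.linalg.svd(D, full_matrices=False)

# Smallest p whose modes capture 99.9% of the "energy" (sum of sigma_i^2).
energy = np.cumsum(sigma**2) / np.sum(sigma**2)
p = int(np.searchsorted(energy, 0.999) + 1)

Phi = U[:, :p]                                   # truncated POD basis
alpha = Phi.T @ D                                # coefficients alpha_i^(k), shape (p, m)
reconstruction = s_mean + Phi @ alpha            # truncated approximation of all snapshots

rel_error = np.linalg.norm(snapshots - reconstruction) / np.linalg.norm(snapshots)
```

Because the synthetic field is a combination of two spatial patterns, at most two modes are retained here; real simulation data would typically need more.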


Non intrusive POD surrogate

Derivation of low-order models by combining

POD techniques
- to perform the space reduction of the model
- to obtain the basis functions for the low-order model

Generic data-fitting techniques like RBF or Kriging
- to perform the low-dimensional reconstruction in the design space
- to approximate the POD coefficients for cases not included in the snapshot set, i.e. to express the POD coefficients directly as functions of the design variables

For any $x$, one can estimate $S(x)$ :

$$x \;\xrightarrow[\text{Kriging or RBF}]{\text{data-fitting}}\; \big(\alpha_i^{(x)}\big)_{i=1,\dots,p} \;\xrightarrow[\text{POD}]{\text{reconstruction}}\; S(x)$$

Does not require an intrusive or code-specific implementation
- ... but is only as good as the observed data set used for its training!
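
A minimal sketch of this non-intrusive chain, under stated assumptions: `expensive_simulation` is a hypothetical stand-in for the real code, the design space is one-dimensional, and a hand-written Gaussian RBF interpolant (kernel width chosen arbitrarily) plays the role of the data-fitting model. Kriging could be used in the same place.

```python
import numpy as np

def expensive_simulation(x, z):
    """Hypothetical stand-in for S(x): a 1-D field on grid z for a scalar design x."""
    return np.sin(np.pi * z) * x + np.cos(2 * np.pi * z) * x**2

def rbf_fit(X, Y, width):
    """Gaussian RBF interpolation weights so that the model passes through (X, Y)."""
    K = np.exp(-((X[:, None] - X[None, :]) / width) ** 2)
    return np.linalg.solve(K, Y)

def rbf_predict(X, weights, width, x_new):
    """Evaluate the RBF interpolant at a scalar location x_new."""
    k = np.exp(-((x_new - X) / width) ** 2)
    return k @ weights

z = np.linspace(0.0, 1.0, 100)
x_train = np.linspace(0.5, 2.0, 8)               # snapshot locations in design space
S = np.column_stack([expensive_simulation(x, z) for x in x_train])

# POD step: space reduction and basis functions.
s_mean = S.mean(axis=1, keepdims=True)
U, sigma, _ = np.linalg.svd(S - s_mean, full_matrices=False)
p = 2
Phi = U[:, :p]
alpha = (Phi.T @ (S - s_mean)).T                 # shape (m, p): one row per snapshot

# Data-fitting step: POD coefficients expressed as functions of the design variable.
width = 0.4
weights = rbf_fit(x_train, alpha, width)

def surrogate(x):
    """Estimate S(x): predict the POD coefficients, then reconstruct the field."""
    a = rbf_predict(x_train, weights, width, x)
    return s_mean[:, 0] + Phi @ a

x_test = 1.3                                     # a design not in the snapshot set
error = np.max(np.abs(surrogate(x_test) - expensive_simulation(x_test, z)))
```

The simulation code is only called to produce snapshots; nothing inside it is modified, which is the sense in which the approach is non-intrusive.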


Illustration of POD approach

Optimal rotary control of the cylinder wake using POD Reduced Order Model


Model Assessment

The accuracy of the metamodel depends on the number and location of sample points in the design space.

The leave-one-out (LOO) procedure is a way to estimate the accuracy of a metamodel without the need for creating extra data for validation.

LOO procedure

Create a data set with k sample points.

k − 1 samples are used to build a metamodel and the i-th sample is left out of the fitting process.

The output value at the i-th sample is estimated with the metamodel.

- k − 1 samples → training data
- i-th sample → validation data

Repeat for all the k sample points (each sample point is used once as the validation data).
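
The procedure above can be sketched as follows. The metamodel here is a simple polynomial regression (`np.polyfit`), chosen only to keep the example short; the data set and the helper name `loo_errors` are hypothetical.

```python
import numpy as np

def loo_errors(x, y, fit, predict):
    """Leave-one-out prediction errors for a metamodel given by fit/predict callables."""
    k = len(x)
    errors = np.empty(k)
    for i in range(k):
        keep = np.arange(k) != i                 # k - 1 training samples
        model = fit(x[keep], y[keep])
        errors[i] = predict(model, x[i]) - y[i]  # i-th sample is the validation point
    return errors

# Hypothetical data: noise-free samples of a cubic, fitted by polynomials.
x = np.linspace(-1.0, 1.0, 12)
y = x**3 - 0.5 * x

for degree in (1, 3):
    errs = loo_errors(
        x, y,
        fit=lambda xt, yt, d=degree: np.polyfit(xt, yt, d),
        predict=np.polyval,
    )
    rmse = np.sqrt(np.mean(errs**2))
    print(f"degree {degree}: LOO RMSE = {rmse:.3e}")
```

The linear model shows a clearly larger LOO error than the cubic one, which is the kind of comparison the procedure is meant to support without any extra validation runs.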


Model Assessment

[Figure: two panels, 20 samples and 50 samples]


Surrogate model(s) management

Key issues

Offline (i.e. a priori trained) / Online (i.e. adaptively improved) models management
- search infill criteria

Global/Local models management (move limit strategies, trust region techniques, ...)

Holy Grail: optimum exploitation/exploration balance


Search infill criteria

Enhance the accuracy of the model using further function calls : infill or update points

Online model management

Improve the accuracy only in the region of the optimum predicted by the surrogate → local exploitation
- quickly converge to an optimum value
- possibly get stuck at a local optimum

Ability to search away from the current optimum and explore other regions → global exploration

Instead of either exploiting or exploring the surrogate model, use infill criteria which balance these options.



Kriging provides not only a predictor, y(x), but also a measure of the possible error in the predictor, s(x).

There are 2 “zones” where it is desirable to add new sample points :

where the model is minimized
- min y(x)

where there is a significant error in the prediction
- max s(x)

- Updating approaches based on explicit measures of uncertainty



The simplest way of balancing exploitation of y(x) and exploration using s(x) is to minimize the lower confidence bound (LCB) function

$$LCB(x) = y(x) - \rho\, s(x)$$

small ρ leads rapidly to an optimum, but possibly only a local one

large ρ explores the potential regions of improvement, in which the uncertainty is high

- Difficult to choose the user defined parameter ρ to obtain a good balance between exploration and exploitation.
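
A small numerical illustration of the role of ρ. The predictor and error values below are made-up numbers on five candidate designs, not the output of an actual Kriging model.

```python
import numpy as np

def lower_confidence_bound(y_hat, s, rho):
    """LCB(x) = y(x) - rho * s(x), evaluated elementwise on candidate points."""
    return y_hat - rho * s

# Hypothetical surrogate output on five candidate designs:
# predicted objective y(x) and estimated prediction error s(x).
candidates = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
y_hat = np.array([1.0, 0.4, 0.6, 0.9, 0.8])
s = np.array([0.0, 0.05, 0.1, 0.6, 0.2])

chosen = {}
for rho in (0.5, 2.0):
    lcb = lower_confidence_bound(y_hat, s, rho)
    chosen[rho] = candidates[np.argmin(lcb)]
    print(f"rho = {rho}: next infill point x = {chosen[rho]}")
```

With ρ = 0.5 the minimizer is x = 0.25, where the predictor is lowest; with ρ = 2.0 it moves to x = 0.75, where the predicted error is largest.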



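The amount of improvement we expect may also be evaluated
- Maximizing the Expected Improvement (EI) criterion

$$EI(x) = \begin{cases} (y_{\min} - y)\,\Phi\!\left(\dfrac{y_{\min} - y}{s}\right) + s\,\Psi\!\left(\dfrac{y_{\min} - y}{s}\right) & \text{if } s > 0 \\ 0 & \text{if } s = 0 \end{cases}$$

where $y = y(x)$ and $s = s(x)$ are the predictor and its error estimate at $x$, $y_{\min}$ is the best objective value sampled so far, and $\Phi$ and $\Psi$ are the standard normal cumulative distribution and probability density functions.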
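
The EI formula can be evaluated with the Python standard library alone. This is a minimal sketch for a single candidate point; the function name and the numerical values in the usage lines are illustrative assumptions.

```python
import math

def expected_improvement(y_min, y_hat, s):
    """Expected improvement over the best sampled value y_min (minimization)."""
    if s <= 0.0:
        return 0.0                                           # no uncertainty, no expected gain
    u = (y_min - y_hat) / s
    cdf = 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))         # standard normal CDF
    pdf = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)  # standard normal density
    return (y_min - y_hat) * cdf + s * pdf

# Hypothetical candidates, with best sampled value y_min = 0.4.
ei_sampled = expected_improvement(0.4, 0.4, 0.0)    # already sampled point: s = 0
ei_exploit = expected_improvement(0.4, 0.3, 0.01)   # slightly better, almost certain
ei_explore = expected_improvement(0.4, 0.5, 0.3)    # predicted worse, but uncertain
```

The third case shows why EI balances the two goals: the predictor is worse than the current best, yet the large error estimate still gives a positive expected improvement.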



The concept of merit function can be employed to maintain diversity in the solution space.

Search both in regions where the surrogate model indicates there might be a minimizer of the objective and where we realize that we know very little about the problem (few samples).

Take into account the distance between an individual and the other individuals and favor the solutions far away from their neighbours :

$$\mathrm{merit}(x) = y(x) - \rho\, d(x)$$

with $\rho \geq 0$ and $d(x) = \min_i \|x - x_i\|_2$.
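
A short sketch of the merit function on a set of candidate points. The sample locations, candidate locations and predictor values are hypothetical.

```python
import numpy as np

def merit(y_hat, candidates, samples, rho):
    """merit(x) = y(x) - rho * d(x), with d(x) the distance to the nearest existing sample."""
    diff = candidates[:, None, :] - samples[None, :, :]
    d = np.linalg.norm(diff, axis=2).min(axis=1)
    return y_hat - rho * d

samples = np.array([[0.0, 0.0], [1.0, 0.0]])        # already evaluated designs
candidates = np.array([[0.1, 0.0], [0.5, 1.0]])     # candidate designs
y_hat = np.array([0.2, 0.5])                        # surrogate prediction at candidates

pick_exploit = int(np.argmin(merit(y_hat, candidates, samples, rho=0.0)))
pick_diverse = int(np.argmin(merit(y_hat, candidates, samples, rho=1.0)))
```

With ρ = 0 the candidate next to an existing sample wins on predicted value alone; with ρ = 1 the isolated candidate is preferred despite its worse prediction.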


Parallel infill points

Parallel updating approaches : multiple designs are selected for sampling during each iteration
- make effective use of parallel computing resources

Different approaches :

Some infill criteria (e.g. EI, PI) exhibit multi-modal behaviour
- identify several local optima and select them as infill points

Search the infill criterion, temporarily add the surrogate predicted value at this point (assume the model is correct at this location), rebuild the surrogate, search again the infill criterion, . . .

Focus on one target at a time rather than compromise by, for instance, adding two new design points per iteration, one coming from the exploration search and the other one from the exploitation search
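
The second approach (temporarily trusting the surrogate prediction, rebuilding, and searching again) can be sketched as below. Several choices are assumptions made to keep the example self-contained: a small Gaussian RBF interpolant stands in for the surrogate, the distance-based merit function is used as the infill criterion, and the search is a plain enumeration of a candidate grid.

```python
import numpy as np

def fit_rbf(X, y, width=0.3):
    """Return a predictor built from a Gaussian RBF interpolant (tiny ridge for stability)."""
    K = np.exp(-np.sum((X[:, None] - X[None, :]) ** 2, axis=2) / width**2)
    w = np.linalg.solve(K + 1e-10 * np.eye(len(X)), y)
    return lambda Z: np.exp(-np.sum((Z[:, None] - X[None, :]) ** 2, axis=2) / width**2) @ w

def select_batch(X, y, candidates, q, rho):
    """Pick q infill points, adding the predicted value at each pick before re-searching."""
    X_aug, y_aug = X.copy(), y.copy()
    batch = []
    for _ in range(q):
        predict = fit_rbf(X_aug, y_aug)              # rebuild the surrogate
        y_hat = predict(candidates)
        d = np.linalg.norm(candidates[:, None] - X_aug[None, :], axis=2).min(axis=1)
        score = y_hat - rho * d                      # merit-type infill criterion
        score[d < 1e-12] = np.inf                    # never re-select an existing point
        j = int(np.argmin(score))
        batch.append(candidates[j])
        # Assume the model is correct here: add the predicted value as a pseudo-sample.
        X_aug = np.vstack([X_aug, candidates[j]])
        y_aug = np.append(y_aug, y_hat[j])
    return np.array(batch)

# Hypothetical 1-D problem with three evaluated designs.
X = np.array([[0.0], [0.5], [1.0]])
y = (X[:, 0] - 0.3) ** 2
candidates = np.linspace(0.0, 1.0, 21)[:, None]
batch = select_batch(X, y, candidates, q=3, rho=0.5)
```

The three returned designs can then be evaluated in parallel with the true simulation, after which the pseudo-samples are discarded and the surrogate is rebuilt with the real values.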



A move limit strategy makes it possible to

adapt the search range of the variables along the design process based on the effectiveness of the approximations

focus the optimization search on smaller regions of the design space and exploit local models

ensure that the inner optimization does not produce design points outside the region where the surrogate model is valid

As the optimization proceeds, the idea is to enlarge or restrict the search space based on a heuristic rule in order to refine the candidate optimal region (new additional points are chosen within these move limits).

If improvement ⇒ we can trust the model and the search region is enlarged.

If no improvement ⇒ the search region is contracted.
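
A minimal sketch of such a heuristic on the Rosenbrock function. Everything specific here is an assumption for illustration: the local quadratic least-squares surrogate, the random search used as inner optimization, the start point, and the factors 2 and 0.5 used to enlarge or contract the move limits.

```python
import numpy as np

def rosenbrock(x):
    return (1.0 - x[0]) ** 2 + 100.0 * (x[1] - x[0] ** 2) ** 2

def quadratic_features(X):
    """Full quadratic basis in two variables."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2, x1**2, x1 * x2, x2**2])

rng = np.random.default_rng(0)
lower, upper = np.array([-2.0, -2.0]), np.array([2.0, 2.0])   # global bounds
center = np.array([-1.0, 1.5])                                # hypothetical start design
f_start = f_best = rosenbrock(center)
half_width = 0.5                                              # current move limit

for _ in range(30):
    lo = np.maximum(center - half_width, lower)
    hi = np.minimum(center + half_width, upper)
    # Local quadratic surrogate fitted on new samples inside the move limits.
    X = rng.uniform(lo, hi, size=(12, 2))
    y = np.array([rosenbrock(x) for x in X])
    coef, *_ = np.linalg.lstsq(quadratic_features(X), y, rcond=None)
    # Inner optimization restricted to the move limits (random search on the surrogate).
    trial = rng.uniform(lo, hi, size=(2000, 2))
    x_new = trial[np.argmin(quadratic_features(trial) @ coef)]
    f_new = rosenbrock(x_new)
    if f_new < f_best:          # improvement: trust the model, enlarge the region
        center, f_best = x_new, f_new
        half_width = min(2.0 * half_width, 2.0)
    else:                       # no improvement: contract the region
        half_width *= 0.5
```

The inner search never leaves the box defined by the current move limits, so the surrogate is only queried where it was fitted.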



Example for the Rosenbrock function (locations of one additional random sample point at each design iteration)

with move limit: $f^* = 1.7 \times 10^{-8}$

without move limit: $f^* = 1.4 \times 10^{-6}$


ANOVA

Analysis of Variance

Gives information on which design variables have more influence over the outcome, e.g. performance.

Quantifies first order sensitivities, cross terms, higher order interaction volume ...

Based on multi-dimensional integration of the model with given variable(s) held constant. This process is repeated a number of times with different values of the fixed variable(s).
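
The integration idea can be illustrated on a cheap two-variable model. The model below is hypothetical, and the integrals are approximated with a midpoint rule on a regular grid; with a real surrogate the same averages would be computed on the surrogate predictions.

```python
import numpy as np

def model(x0, x1):
    """Hypothetical cheap surrogate on [0, 1]^2, used only for illustration."""
    return 4.0 * x0 + x1

n = 50
grid = (np.arange(n) + 0.5) / n                  # midpoint-rule nodes on [0, 1]
X0, X1 = np.meshgrid(grid, grid, indexing="ij")
Y = model(X0, X1)

total_variance = Y.var()
# Hold x0 fixed at each grid value and integrate over x1 (and vice versa);
# the variance of these conditional means is the first-order contribution.
main_effect_x0 = Y.mean(axis=1).var()
main_effect_x1 = Y.mean(axis=0).var()

S0 = main_effect_x0 / total_variance             # first-order sensitivity of x0
S1 = main_effect_x1 / total_variance             # first-order sensitivity of x1
interaction = 1.0 - S0 - S1                      # share left for the cross term
```

For this additive model the two first-order shares are 16/17 and 1/17 and the cross term vanishes; a model with an x0·x1 term would leave a non-zero interaction share.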


Conclusion

Multidisciplinary design optimization (MDO) has considerable impact on the design by increasing performance, lowering lifecycle cost and shortening design time for complex products

However, the objective is not only to eke out a “5%” performance improvement in the design solution, but most importantly to

Gain insight into the design space

Assess key factors

Quantify trades

Point out innovative design options


Conclusion

No one optimization method works well for all problems

“Chicken and egg dilemma ...”
The best search algorithm to exploit depends upon the type of design space that has been defined. But the characteristics of the design space are typically not known until it has been explored, ... which is the primary role of the search method.

⇒ Trend towards hybrid and adaptive search strategies

Fundamental role of the optimization specification: parameterization, bounds definition, choice of simulation models, cost function and constraint definitions, ...

