-
Statistical methods Heuristics CMA-ES References
Heuristics and statistical methods
MTH8418
S. Le Digabel, Polytechnique Montréal
Winter 2020 (v2)
MTH8418: Heuristics & stat. methods 1/49
-
Statistical methods Heuristics CMA-ES References
Plan
Statistical methods
Heuristics
CMA-ES
References
-
Deterministic vs stochastic
I Deterministic method: The execution of the method always gives the same execution and the same solution
I Stochastic method: Some randomness is involved
I Use a good Random Number Generator (RNG)
I Use the process identification number (PID) as the random seed, not the time
I Change the seed to obtain different results
I Remember the seed for reproducibility
I Is reproducibility possible across different platforms and with parallelism?
-
Statistical methods
-
Sampling
I Design of Experiments: The problem of choosing the inputs at which to evaluate f in order to construct a model
I Main objective: Reduce the noise
I Examples of techniques: Latin Hypercube Sampling [McKay et al., 1979], Orthogonal Arrays [Hedayat et al., 1999]
I Multi-Variate Normal Sampling MVNS(m, σ^2 C): Generate λ points in R^n where each point corresponds to a random vector:

X_k ∼ N(mean = m ∈ R^n, cov = σ^2 C ∈ R^{n×n}) = m + σ N(0, C),   k ∈ {1, 2, . . . , λ}
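A minimal NumPy sketch of MVNS(m, σ^2 C); the numerical values are purely illustrative:

```python
import numpy as np

def mvns(m, sigma, C, lam, rng):
    """Multi-Variate Normal Sampling: draw lam points X_k ~ N(m, sigma^2 C),
    i.e. m + sigma * N(0, C), following the definition on this slide."""
    return rng.multivariate_normal(mean=m, cov=sigma**2 * C, size=lam)

rng = np.random.default_rng(42)
m = np.array([1.0, -2.0])
C = np.array([[2.0, 0.5],
              [0.5, 1.0]])          # symmetric positive definite covariance
X = mvns(m, sigma=0.3, C=C, lam=5, rng=rng)
print(X.shape)                      # lam points in R^n
```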
http://en.wikipedia.org/wiki/Design_of_experiments
http://en.wikipedia.org/wiki/Latin_hypercube_sampling
http://en.wikipedia.org/wiki/Orthogonal_array
-
Surrogates
I A surrogate model is an approximation of the objective function, but is cheaper to evaluate
I It can be extended to the functions defining the constraints
I Static vs dynamic surrogates
I Dynamic surrogates and the Surrogate Management Framework [Booker et al., 1999]
I Surrogates will be studied in Lesson #11
https://www.gerad.ca/Sebastien.Le.Digabel/MTH8418/11_surrogates.pdf
-
Response Surface Methods (RSM): 1/2
I Response Surface Methods [Box and Wilson, 1951]
I Initial sampling
I ANOVA to select the most important variables (optional)
I Build the surrogate
I Optimize the surrogate in order to find promising new points at which to evaluate the true function
I Or use the surrogate to identify a good descent direction, then perform a line search with the true function
I Update the surrogate and iterate
-
Response Surface Methods (RSM): 2/2
I How to select/filter the points for constructing the model?
I Formulation of the surrogate subproblem?
I Evaluations for objective improvement vs surrogate improvement?
I Intensification vs diversification?
I How to deal with constraints?
I RSM example: DACE [Lophaven et al., 2002]:
I Design and Analysis of Computer Experiments
I Based on Kriging surrogates (Lesson #11)
I MATLAB toolbox: Provides sampling tools and Kriging
https://www.gerad.ca/Sebastien.Le.Digabel/MTH8418/11_surrogates.pdf
http://www.imm.dtu.dk/~hbni/dace/
-
The EGO method
I Efficient Global Optimization ∈ RSM
I [Jones et al., 1998]
I From the abstract: “The key to using response surfaces for global optimization lies in balancing the need to exploit the approximating surface (by sampling where it is minimized) with the need to improve the approximation (by sampling where prediction error may be high)”
I Based on the ability of Kriging, or more generally of Gaussian processes, to provide model uncertainty
I Introduction of the Expected Improvement (EI)
-
EGO: Improvement function
I Improvement random variable at x ∈ X :
I(x) = max{fmin − Z(x), 0} ≥ 0
I fmin is the value of the current best (feasible) iterate
I Z(x) is a random variable whose distribution depends on the surrogate model fit given the data
I Z(x) ∼ N(ẑ(x), σ̂^2(x))
I ẑ(x), σ̂^2(x): Predicted mean and variance at x
-
EGO: Expected Improvement (EI)
I Expected Improvement at x:

EI(x) = E[I(x)] = (fmin − ẑ(x)) Φ((fmin − ẑ(x))/σ̂(x)) + σ̂(x) φ((fmin − ẑ(x))/σ̂(x))
I φ, Φ: Standard normal density and distribution function
I Measures how much improvement is expected by sampling at the unknown location x
I Tradeoff between promising and uncertain regions by simultaneously considering information about the predictor and the uncertainty measure
I Constrained version presented in Lesson #11
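The closed form above can be evaluated with the standard library alone; this is a sketch, and the function name is ours:

```python
import math

def expected_improvement(fmin, zhat, shat):
    """EI(x) for a Gaussian predictor Z(x) ~ N(zhat, shat^2), using the
    closed form on this slide. Degenerates to max(fmin - zhat, 0) when
    the predicted variance vanishes."""
    if shat <= 0.0:
        return max(fmin - zhat, 0.0)
    u = (fmin - zhat) / shat
    Phi = 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))          # standard normal cdf
    phi = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)   # standard normal density
    return (fmin - zhat) * Phi + shat * phi

# A point whose prediction equals the incumbent still has positive EI
# as soon as it is uncertain (second term of the formula):
print(expected_improvement(fmin=1.0, zhat=1.0, shat=0.5))
```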
https://www.gerad.ca/Sebastien.Le.Digabel/MTH8418/11_surrogates.pdf
-
EGO: EI illustration
Plots by R.B. Gramacy. IECI: Integrated Expected Conditional Improvement (generalization of the EI)
http://bobby.gramacy.com
-
R tools to compute the EI
The tgp and dynaTree packages by R.B. Gramacy:

[Figure: 1-D tgp prediction showing the data points, the partition, the predictive f, and the predictive f ± σ]
I The predictive mean f̂(x) and standard deviation σ̂(x)
I The predictive cumulative distribution P[f(x) ≤ f0]
I The expected improvement EI
I Classification model available for discrete inputs (dynaTree)
https://cran.r-project.org/web/packages/tgp/index.html
https://cran.r-project.org/web/packages/dynaTree/index.html
-
EGO: Algorithm
1 Initial Sampling
2 Construct Kriging model
3 Surrogate subproblem: max EI(x) over x ∈ X
4 Evaluate blackbox at the solution of the surrogate subproblem
5 Update the model
6 Go to Step 3 unless a stopping criterion is met
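A toy Python sketch of the loop above. A real implementation needs the Kriging model of Step 2; here it is replaced, purely for illustration, by a crude nearest-neighbour predictor (mean = value of the nearest data point, uncertainty = distance to it), and the EI subproblem of Step 3 is solved by brute-force random candidate search. All names and parameter values are ours:

```python
import math
import random

def ei(fmin, zhat, shat):
    """Expected Improvement for a Gaussian predictor N(zhat, shat^2)."""
    if shat <= 0.0:
        return max(fmin - zhat, 0.0)
    u = (fmin - zhat) / shat
    Phi = 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))
    phi = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    return (fmin - zhat) * Phi + shat * phi

def ego(f, lo, hi, n_init=5, budget=20, n_cand=200, seed=0):
    rng = random.Random(seed)
    X = [rng.uniform(lo, hi) for _ in range(n_init)]   # Step 1: initial sampling
    F = [f(x) for x in X]
    while len(X) < budget:
        fmin = min(F)
        # Toy stand-in for the Kriging predictor of Step 2:
        # nearest-neighbour mean, uncertainty = distance to the data.
        def predict(x):
            d, fx = min((abs(x - xi), fi) for xi, fi in zip(X, F))
            return fx, d
        # Step 3: "max EI" solved over random candidates.
        x_next = max((rng.uniform(lo, hi) for _ in range(n_cand)),
                     key=lambda x: ei(fmin, *predict(x)))
        X.append(x_next)                               # Step 4: evaluate blackbox
        F.append(f(x_next))                            # Step 5: data (hence model) updated
    return X[F.index(min(F))], min(F)                  # Step 6: loop until budget spent

x_best, f_best = ego(lambda x: (x - 2.0) ** 2, lo=0.0, hi=5.0)
print(x_best, f_best)
```

Even with this deliberately poor surrogate, the EI trade-off alternates between filling unexplored gaps (large uncertainty) and refining around the incumbent (low predicted mean).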
-
Heuristics
-
Introduction
I Heuristics are methods that do not guarantee convergence
I They are typically developed for combinatorial optimization, but can be applied to blackbox problems. This implies defining neighbourhoods in R^n, denoted by y ∈ N(x)
I Heuristics may be:
I Trajectory methods: One solution is considered at each iteration
I Population-based methods: A set of solutions, called individuals, is considered at each iteration, or generation
I Other heuristics are applied to handle constraints. For example: Repair
-
Intensification and diversification
I Intensification: Moves favouring local exploration and quick convergence to a local optimum
I Diversification: Moves favouring global exploration. Perturbations or mutations that temporarily increase the objective value, or generate infeasible solutions, in order to escape from local optima
-
Simulated annealing
I SA: Simulated Annealing [Kirkpatrick et al., 1983]
I ASA: Adaptive Simulated Annealing [Ingber, 1993]
I Trajectory method inspired by the metallurgy industry, where heating and cooling of a material are alternated in order to minimize the thermodynamic free energy of the material
-
Simulated annealing: Ingredients
I Temperature parameter T and a function g to update it
I Random number r in [0; 1]
I Function p(T, x, x′) of T and two feasible solutions, usually taken as exp((f(x) − f(x′))/T) and such that:
I If f(x′) < f(x) or if T is large, then almost every time r < p(T, x, x′)
I If f(x′) > f(x) and T is small, then almost every time r > p(T, x, x′)
-
Simulated annealing: Algorithm
Simulated annealing
[1] Choose x feasible (x ∈ Ω)
    Choose initial temperature T1
    k ← 1
[2] While no stopping condition is observed
    Randomly choose x′ ∈ N(x) ∩ Ω
    Randomly choose r in [0; 1]
    If r < p(Tk, x, x′)
        x ← x′
    Tk+1 ← g(Tk)
    k ← k + 1
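A Python sketch of this algorithm with the usual choices p(T, x, x′) = exp((f(x) − f(x′))/T) and g(T) = αT; the test function and the Gaussian neighbour structure are illustrative:

```python
import math
import random

def simulated_annealing(f, x0, neighbour, T0=1.0, alpha=0.95,
                        n_iter=2000, seed=0):
    """SA sketch with geometric cooling g(T) = alpha*T and acceptance
    probability p(T, x, x') = exp((f(x) - f(x'))/T)."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best, fbest = x, fx
    T = T0
    for _ in range(n_iter):
        xp = neighbour(x, rng)                 # random x' in N(x)
        fxp = f(xp)
        # Improvements are always accepted; deteriorations are accepted
        # with probability exp(-(f(x') - f(x))/T), which shrinks with T.
        if fxp < fx or rng.random() < math.exp((fx - fxp) / T):
            x, fx = xp, fxp
            if fx < fbest:
                best, fbest = x, fx
        T = alpha * T                          # cooling: intensification as T -> 0
    return best, fbest

# Minimize a 1-D multimodal function with Gaussian neighbours:
f = lambda x: x * x + 10.0 * math.sin(3.0 * x)
x_best, f_best = simulated_annealing(
    f, x0=4.0, neighbour=lambda x, r: x + r.gauss(0.0, 1.0))
print(x_best, f_best)
```

Tracking the best-so-far point separately from the current point costs nothing and protects against losing a good solution to a late uphill move.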
-
Simulated annealing: Temperature update (1/2)
I The smaller T is, the less probable it is to accept a move that increases f
I We begin with a large T value for exploration (diversification)
I Then T → 0 in order to forbid f to increase
I Decreasing T = intensification
-
Simulated annealing: Temperature update (2/2)
I Theoretical convergence with g(T) = L/log(T + c) with L, c ∈ R, but practical convergence is slow
I Popular choice: g(T) = αT with 0 < α < 1 (typically α = 0.95)
I The update function g may be non-monotone, since increasing T favours diversification
-
Generic Evolution Strategy (ES)
Evolution Strategy
[1] Generate an initial population
[2] While no stopping condition is satisfied
    Cooperation, mutations
    Individual adaptation

In practice, these methods will spend a lot of evaluations and will require intense parameter tuning. They should only be considered in desperate situations
-
Population
I Typically, one individual = one solution. But it may also be a part of a solution
I The current population is made of parents and of new offspring
I Generational replacement: All the parents are replaced
I Steady state: Only a fraction of the population can change
I Most of the ES methods consider a fixed population of size λ
-
Infeasibility
Each new individual may be infeasible. In this case, we have four options:
I Reject the offspring
I Repair the offspring to make it feasible
I Accept the offspring in the population if a penalty is used in f
I Do not generate infeasible offspring
-
Genetic algorithms (GAs)
I Wikipedia
I [Holland, 1992]
I “Inspired” by the theory of evolution and by the biological processes that allow organisms to adapt to their environment
http://en.wikipedia.org/wiki/Genetic_algorithm
-
GA with generational replacement
GA
[1] Generate initial population X0 of size λ
    k ← 0
[2] While no stopping condition is satisfied
    k ← k + 1
    Xk ← ∅
    Repeat λ times
        Create o from 2 individuals in Xk−1
        Mutate o and add it to Xk
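A real-coded Python sketch of this generational scheme; the truncation selection, uniform crossover, and Gaussian mutation operators are our own illustrative choices:

```python
import random

def ga(f, n, lo, hi, lam=20, n_gen=50, seed=0):
    """Generational GA as on the slide: each generation, lam offspring are
    each created from 2 parents and mutated; all parents are replaced."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for _ in range(n)] for _ in range(lam)]  # X0
    for _ in range(n_gen):
        ranked = sorted(pop, key=f)
        new_pop = []
        for _ in range(lam):
            # Truncation selection: parents drawn from the better half,
            # then uniform crossover and bounded Gaussian mutation.
            p1, p2 = rng.sample(ranked[:lam // 2], 2)
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            child = [min(hi, max(lo, c + rng.gauss(0.0, 0.1))) for c in child]
            new_pop.append(child)
        pop = new_pop                      # generational replacement
    best = min(pop, key=f)
    return best, f(best)

x_best, f_best = ga(lambda x: sum(t * t for t in x), n=3, lo=-5.0, hi=5.0)
print(x_best, f_best)
```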
-
Genetic algorithms: LAST RESORT ONLY
From [Audet and Hare, 2017]:
I “These methods are popular, in part because of the nice metaphor that they invoke. However, they fail to be supported by a useful convergence analysis”
I “Convergence” with f_best^k the best f value at iteration k:

P[ lim_{k→+∞} f_best^k = f* ] = 1

boils down to “if you try enough points, eventually you should get lucky”
-
Ant Colony Optimization (ACO)
I [Dorigo, 1992]
I Wikipedia, M. Dorigo page
I One individual = one ant. Each ant constructs a solution by taking a sequence of decisions (moves or parts of a solution)
I Greedy force η_d: Value that represents the interest that an ant has in taking the decision d
I Trace τ_d: Historical interest
I Given parameters α and β, one ant takes the decision d with probability:

(η_d)^α (τ_d)^β / ∑_{r∈D} (η_r)^α (τ_r)^β
http://en.wikipedia.org/wiki/Ant_colony_optimization_algorithms
http://iridia.ulb.ac.be/~mdorigo/ACO/publications.html
-
ACO algorithm
I When a solution is complete, the ant leaves a trace proportional to the quality of the solution. This trace fades with time: Evaporation
I ρ: Evaporation parameter in ]0; 1[
I A: Set of ants; f(a): Value of the solution generated by Ant a
I Trace update for decision d:

τ_d = (1 − ρ) τ_d + c ∑_{a∈A} ∆_d(a)

where c is a constant, and

∆_d(a) = 1/f(a) if Ant a made the choice d, and 0 otherwise
-
ACO: Algorithm
ACO
[1] Initialize traces τ_d to 0 for all possible decisions d
[2] While no stopping condition is satisfied
    Construct |A| solutions considering the greedy force and the trace
    Local search from these solutions
    Update traces τ_d
    Update best solution
    i ← i + 1
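A compact Python sketch of ACO on a toy layered problem: each ant picks one option per stage, f(a) is the sum of the chosen costs (to be minimized), and the traces follow the update rule of the previous slide with c = 1. The problem and the parameter values are illustrative, and the traces are initialized to 1 rather than 0 so that the first decision probabilities are well defined:

```python
import random

def aco(costs, n_ants=10, n_iter=30, alpha=1.0, beta=2.0, rho=0.1, seed=0):
    """ACO sketch: greedy force eta_d = 1/cost_d, trace tau_d,
    evaporation rate rho, deposit Delta_d(a) = 1/f(a)."""
    rng = random.Random(seed)
    tau = [[1.0] * len(stage) for stage in costs]   # traces (init 1, not 0)
    best, f_best = None, float("inf")
    for _ in range(n_iter):
        ants = []
        for _ in range(n_ants):
            sol = []
            for s, stage in enumerate(costs):
                # P(d) proportional to (eta_d)^alpha * (tau_d)^beta
                w = [(1.0 / c) ** alpha * tau[s][d] ** beta
                     for d, c in enumerate(stage)]
                sol.append(rng.choices(range(len(stage)), weights=w)[0])
            ants.append((sol, sum(costs[s][d] for s, d in enumerate(sol))))
        # Evaporation, then deposit 1/f(a) on each decision taken by ant a
        for s in range(len(costs)):
            for d in range(len(costs[s])):
                dep = sum(1.0 / fa for sol, fa in ants if sol[s] == d)
                tau[s][d] = (1.0 - rho) * tau[s][d] + dep
        sol, fa = min(ants, key=lambda t: t[1])
        if fa < f_best:
            best, f_best = sol, fa
    return best, f_best

# 3 stages, 3 options each; the cheapest choices are options 1, 2, then 0 or 2
costs = [[3.0, 1.0, 2.0], [2.0, 2.0, 0.5], [1.0, 4.0, 1.0]]
best, f_best = aco(costs)
print(best, f_best)
```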
-
Other heuristics
I Differential Evolution (DE) [Rocca et al., 2011]
I Particle Swarm Optimization (PSO) [Kennedy and Eberhart, 1995]
I Artificial Bee Colony [Karaboga, 2005]
I Scatter Search [Glover, 1998]
I Tabu Search [Glover and Laguna, 1997]
I Variable Neighbourhood Search (VNS) [Mladenović and Hansen, 1997]
I GRASP (Greedy Randomized Adaptive Search Procedure) [Hart and Shogan, 1987]
I Other “metaphoric” metaheuristics: Wikipedia. Extract: “metaphor-inspired metaheuristics in general have attracted criticism in the research community for hiding their lack of effectiveness or novelty behind an elaborate metaphor”
https://en.wikipedia.org/wiki/List_of_metaphor-based_metaheuristics
-
CMA-ES
-
CMA-ES: Software
I Covariance Matrix Adaptation Evolution Strategy [Hansen, 2006]
I Nikolaus Hansen
I Multiple implementations: C, C++, FORTRAN, JAVA, PYTHON, MATLAB, OCTAVE, R, SCILAB
I Bounds indirectly handled. No other constraints
I Population-based stochastic method
I Global convergence with probability one
I Contrary to other population-based methods, there are few parameters to decide:
I Starting point x0 ∈ R^n
I Initial step size σ0 > 0
I Population size λ ≥ 2
http://en.wikipedia.org/wiki/CMA-ES
http://www.cmap.polytechnique.fr/~nikolaus.hansen/
http://cma.gforge.inria.fr/cmaes_sourcecode_page.html
-
CMA-ES: Idea
I At each iteration (or generation), MVNS is used to generate new candidates in R^n. This corresponds to mutation
I CMA consists in updating the covariance of the distribution. Objective: Fit the distribution to the contour lines of f
I The mean of the multi-variate normal distribution corresponds to the current solution
I The mean update corresponds to recombination
I The standard deviation of the distribution corresponds to the step-size
I Selection is deterministic and based on the objective rankings
-
CMA-ES: Illustration
Taken from [Hansen, 2006]
-
CMA-ES: Parameters
I m ∈ R^n: Multi-Variate Normal mean, and current best solution
I σ > 0: Step-size
I C ∈ Rn×n: Symmetric and positive definite covariance matrix
I pσ ∈ Rn, pc ∈ Rn: Isotropic and anisotropic evolution paths
I Population size, sample size, number of offspring: λ ≥ 2. Constant. λ > 4 is recommended
I µ ≤ λ: Parent population size
-
CMA-ES: Algorithm
[1] λ ≥ 2, µ ≤ λ, m ← x0, σ ← σ0
    C ← In, pσ = pc = 0
[2] Repeat until stopping condition
    X = {x1, x2, . . . , xλ} ← MVNS(m, σ^2 C)
    Evaluate f(x) for all x ∈ X
    Sort the points of X so that f(x1) ≤ f(x2) ≤ . . . ≤ f(xλ)
    Update mean m
    Update evolution paths pσ and pc
    Update covariance matrix C
    Update step-size σ
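A deliberately stripped-down Python sketch of this loop that keeps only the MVNS sampling, the ranking, and a weighted mean update; the covariance, evolution-path, and step-size updates of the following slides are replaced by C = In and a crude geometric decay. It illustrates the structure of the iteration, not CMA-ES itself:

```python
import numpy as np

def es_mu_lambda(f, x0, sigma0, lam=12, n_iter=60, seed=0):
    """(mu/mu_w, lambda)-ES skeleton: sample lam points from
    MVNS(m, sigma^2 I), rank them by f, recombine the mu best with
    positive decreasing weights summing to 1. NOT full CMA-ES:
    C stays the identity and sigma simply shrinks geometrically."""
    rng = np.random.default_rng(seed)
    n, m, sigma = len(x0), np.asarray(x0, dtype=float), sigma0
    mu = lam // 2
    w = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))  # w1 >= ... >= w_mu > 0
    w /= w.sum()                                         # sum of weights = 1
    for _ in range(n_iter):
        X = m + sigma * rng.standard_normal((lam, n))    # MVNS(m, sigma^2 I)
        X = X[np.argsort([f(x) for x in X])]             # sort by objective
        m = w @ X[:mu]                                   # recombination (mean update)
        sigma *= 0.97                                    # crude step-size decay
    return m, f(m)

x_best, f_best = es_mu_lambda(lambda x: float(np.sum(x * x)), [3.0, -2.0], 1.0)
print(x_best, f_best)
```

What the covariance and path updates of the next slides add on top of this skeleton is the adaptation of the sampling ellipsoid to the contour lines of f, which is what makes CMA-ES efficient on ill-conditioned problems.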
-
CMA-ES: Mean update (recombination)
I µ ≤ λ: Parent population size
I Weighted average of the µ best points in X:

m ← ∑_{i=1}^{µ} w_i x_i ∈ R^n

I With ∑_{i=1}^{µ} w_i = 1 and w1 ≥ w2 ≥ . . . ≥ wµ > 0
I Recommended: w_i ∝ µ − i + 1 and µ ≈ λ/2
I Variance effective selection mass: µ_eff = (∑_{i=1}^{µ} w_i^2)^{−1}. We aim at µ_eff ≈ λ/4
I Remember the “old” vector m with ∆m = m − m_old (displacement)
-
CMA-ES: Evolution paths update
I pσ ← (1 − cσ) pσ + √(cσ(2 − cσ) µ_eff) C^{−1/2} (∆m/σ) ∈ R^n
I pc ← (1 − cc) pc + √(cc(2 − cc) µ_eff) (∆m/σ) ∈ R^n
I cσ, cc < 1
I Definition: C^{−1/2} = B D^{−1} B^⊤, where C = B D^2 B^⊤ is an eigen-decomposition of C (and D is diagonal)
-
CMA-ES: Covariance and step-size updates
I C ← (1 − c_cov) C + (c_cov/µ_eff) pc pc^⊤ + c_cov (1 − 1/µ_eff) [ ∑_{i=1}^{µ} w_i ((x_i − m)/σ) ((x_i − m)/σ)^⊤ ]
I c_cov ≈ min{µ_eff, n^2}/n^2
I σ ← σ exp( (cσ/dσ) ( ‖pσ‖ / (√2 Γ((n+1)/2)/Γ(n/2)) − 1 ) ), where √2 Γ((n+1)/2)/Γ(n/2) = E‖N(0, I)‖
I dσ ≈ 1
-
References
-
References I
Audet, C. and Hare, W. (2017).
Derivative-Free and Blackbox Optimization.
Springer Series in Operations Research and Financial Engineering. Springer International Publishing, Berlin.
Booker, A., Dennis, Jr., J., Frank, P., Serafini, D., Torczon, V., and Trosset, M. (1999).
A Rigorous Framework for Optimization of Expensive Functions by Surrogates.
Structural and Multidisciplinary Optimization, 17(1):1–13.
(SMF).
Box, G. and Wilson, K. (1951).
On the experimental attainment of optimum conditions.
J. Roy. Statist. Soc. Ser. B., XIII:1–45.
(RSM).
-
References II
Dorigo, M. (1992).
Optimization, Learning and Natural Algorithms.
PhD thesis, Politecnico di Milano.
(ACO).
Glover, F. (1998).
A template for scatter search and path relinking.
In Hao, J.-K., Lutton, E., Ronald, E., Schoenauer, M., and Snyers, D., editors, Artificial Evolution, volume 1363 of Lecture Notes in Computer Science, pages 1–51. Springer Berlin Heidelberg.
(Scatter Search).
Glover, F. and Laguna, M. (1997).
Tabu Search.
Kluwer Academic Publishers.
-
References III
Hansen, N. (2006).
The CMA Evolution Strategy: A Comparing Review.
In Lozano, J., Larrañaga, P., Inza, I., and Bengoetxea, E., editors, Towards a New Evolutionary Computation, volume 192 of Studies in Fuzziness and Soft Computing, pages 75–102. Springer Berlin Heidelberg.
Hart, J. P. and Shogan, A. (1987).
Semi-greedy heuristics: An empirical study.
Operations Research Letters, 6(3):107–114.
(GRASP).
Hedayat, A., Sloane, N., and Stufken, J. (1999).
Orthogonal Arrays: Theory and Applications.
Springer.
-
References IV
Holland, J. (1992).
Adaptation in Natural and Artificial Systems: Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence.
MIT Press, Cambridge.
(GAs).
Ingber, L. (1993).
Simulated annealing: Practice versus theory.
Mathematical and Computer Modelling, 18(11):29–57.
(ASA).
Jones, D., Schonlau, M., and Welch, W. (1998).
Efficient Global Optimization of Expensive Black Box Functions.
Journal of Global Optimization, 13(4):455–492.
(EGO).
-
References V
Karaboga, D. D. (2005).
An Idea Based On Honey Bee Swarm for Numerical Optimization.
Technical Report TR06, Erciyes University, Engineering Faculty, Computer Engineering Department.
(Artificial Bee Colony).
Kennedy, J. and Eberhart, R. (1995).
Particle Swarm Optimization.
In Proceedings of the 1995 IEEE International Conference on Neural Networks, pages 1942–1948, Perth, Australia. IEEE Service Center, Piscataway.
Kirkpatrick, S., Gelatt, Jr., C. D., and Vecchi, M. (1983).
Optimization by Simulated Annealing.
Science, 220(4598):671–680.
-
References VI
Lophaven, S., Nielsen, H., and Søndergaard, J. (2002).
DACE: A MATLAB Kriging toolbox version 2.0.
Technical Report IMM-REP-2002-12, Informatics and Mathematical Modelling, Technical University of Denmark.
Matheron, G. (1963).
Principles of geostatistics.
Economic Geology, 58:1246–1266.
(Kriging).
McKay, M., Beckman, R., and Conover, W. (1979).
A comparison of three methods for selecting values of input variables in the analysis of output from a computer code.
Technometrics, 21(2):239–245.
(LHS).
-
References VII
Mladenović, N. and Hansen, P. (1997).
Variable neighborhood search.
Computers and Operations Research, 24(11):1097–1100.
(VNS).
Rocca, P., Oliveri, G., and Massa, A. (2011).
Differential Evolution as Applied to Electromagnetics.
IEEE Antennas and Propagation Magazine, 53(1):38–49.