8. Estimation of Distribution Algorithms. Continuous Domain.
Petr Pošík
Czech Technical University in Prague
Faculty of Electrical Engineering
Department of Cybernetics
Soft Computing, © 2007
Contents
Last week. . .
Features of continuous spaces
Real-valued EDAs
Optimization using Gaussian distribution
Last week. . .
Intro to EDAs
Black-box optimization
GA vs. EDA
✔ GA approach: select — crossover — mutate
✔ EDA approach: select — model — sample
EDA with binary representation
✔ the best possible (general, flexible) model: joint probability
✘ determine the probability of each possible combination of bits
✘ 2^D − 1 parameters, exponential complexity
✔ less precise (less flexible), but simpler probabilistic models (see the sketch below)
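To make the select — model — sample loop concrete, here is a minimal Python sketch with the simplest binary model, independent per-bit probabilities (UMDA-style); the function and parameter values are illustrative, not taken from the lecture:

    import numpy as np

    def binary_umda(f, dim, pop_size=100, tau=0.5, generations=50, seed=0):
        # Select - model - sample with independent per-bit probabilities p(x_d = 1)
        # instead of the full (exponentially large) joint distribution.
        rng = np.random.default_rng(seed)
        p = np.full(dim, 0.5)                                    # initial marginal probabilities
        for _ in range(generations):
            pop = (rng.random((pop_size, dim)) < p).astype(int)  # sample a new population
            best = np.argsort([-f(x) for x in pop])[:int(tau * pop_size)]  # select (maximization)
            p = pop[best].mean(axis=0)                           # model: refit the marginals
            p = np.clip(p, 1.0 / dim, 1.0 - 1.0 / dim)           # keep bits from fixating too early
        return (p > 0.5).astype(int)

    # OneMax example: the marginals typically converge towards the all-ones string.
    print(binary_umda(lambda x: x.sum(), dim=20))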
Content of the lectures
Binary EDAs
✔ Without interactions
✘ 1-dimensional marginal probabilities p(X = x)
✘ PBIL, UMDA, cGA
✔ Pairwise interactions
✘ conditional probabilities p(X = x|Y = y)
✘ sequences (MIMIC), trees (COMIT), forests (BMDA)
✔ Multivariate interactions
✘ conditional probabilities p(X = x|Y = y, Z = z, . . .)
✘ Bayesian networks (BOA, EBNA, LFDA)
Continuous EDAs
✔ Histograms
✔ Gaussian distribution
✔ Evolutionary strategies, CMA-ES
Features of continuous spaces
The difference of binary and real space
Binary space
✔ Each possible solution is placed in one of the corners of a D-dimensional hypercube
✔ No values lying between them
✔ Finite number of elements
[Figure: the 4-bit strings shown as the 16 corners of a 4-dimensional hypercube]
Real space
✔ The space in each dimension need not be bounded
✔ Even when bounded by a hypercube, there are infinitely many points between the bounds (theoretically; in practice we are limited by the numerical precision of the given machine)
✔ Infinitely many (even uncountably many) candidate solutions
Local neighborhood
How do you define a local neighborhood?
✔ . . . as a set of points whose distance to a reference point is not larger than a threshold?
✘ The volume of such a neighborhood relative to the volume of the whole space drops exponentially
✘ With increasing dimensionality the neighborhood becomes increasingly more local
✔ . . . as a set of points closest to the reference point whose union covers a part of the search space of a certain (constant) size?
✘ The size of such a neighborhood rises with the dimensionality of the search space
✘ With increasing dimensionality of the search space the neighborhood is increasingly less local
Curse of dimensionality!
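A quick numeric illustration of the first definition (my addition, not from the slides): the volume of a ball of fixed radius, relative to the hypercube that encloses it, shrinks essentially exponentially with the dimension D:

    import math

    def ball_to_cube_volume_ratio(D, r=1.0):
        # Volume of the D-ball of radius r divided by the volume of the enclosing
        # hypercube of side 2r; the ratio is independent of r.
        log_ball = (D / 2) * math.log(math.pi) + D * math.log(r) - math.lgamma(D / 2 + 1)
        log_cube = D * math.log(2 * r)
        return math.exp(log_ball - log_cube)

    for D in (1, 2, 3, 5, 10, 20):
        print(D, ball_to_cube_volume_ratio(D))
    # Drops from 1.0 (D = 1) through ~0.0025 (D = 10) to ~2.5e-8 (D = 20):
    # a fixed-radius neighborhood covers a vanishing fraction of the space.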
Real-valued EDAs
By analogy to discrete EDAs. . .
Without interactions
✔ UMDA: the same principle, only the marginal probability model is of a different type
✔ Univariate histograms?
✔ Univariate Gaussian distribution?
✔ Univariate mixture of Gaussians?
Pairwise and higher-order interactions:
✔ Many different types of interactions!
✔ A model which would describe all possible kinds of interaction is virtually impossible to find!
No Interactions Among Variables
UMDA: EDA with the marginal product model:
p(x) = ∏_{d=1}^{D} p(x_d)    (1)
The following univariate models were compared:
✔ Equi-width histogram
✔ Equi-height histogram
✔ Max-diff histogram
✔ Univariate mixture of Gaussians
Equi-width histogram features:
✔ the most straightforward analogy with discrete histograms
✔ if any bin is empty, there is no way to create a new individual in that bin
[Figure: equi-width histogram fitted to a sample]
Equi-height histogram features:
✔ instead of fixing the bin width, fix the number of points in each bin
✔ no empty bins, always possible to generate any point in the hyperrectangle
[Figure: equi-height histogram fitted to a sample]
Max-diff histogram features:
✔ place the bin boundaries in the largest gaps between the points
✔ no empty bins, always possible to generate any point in the hyperrectangle
[Figure: max-diff histogram fitted to a sample]
Univariate mixture of Gaussians features:
✔ built by the EM algorithm (a probabilistic version of k-means clustering)
✔ more suitable for unbounded spaces
[Figure: univariate mixture of Gaussians fitted to a sample]
The winner of the comparison: the equi-height histogram
✔ precise
✔ non-parametric
[Figure: equi-height histogram fitted to a sample]
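A possible sketch (illustrative, not the lecture's implementation) of fitting and sampling the winning equi-height histogram, one marginal per dimension as in Eq. (1):

    import numpy as np

    def equi_height_edges(points, n_bins):
        # Bin edges chosen so that each bin contains (roughly) the same number of points.
        return np.quantile(points, np.linspace(0.0, 1.0, n_bins + 1))

    def sample_equi_height(edges, n_samples, rng):
        # Pick a bin uniformly at random (every bin carries the same probability mass),
        # then sample uniformly inside it.
        bins = rng.integers(0, len(edges) - 1, size=n_samples)
        return rng.uniform(edges[bins], edges[bins + 1])

    # UMDA-style use: fit every coordinate independently, then sample the product model.
    rng = np.random.default_rng(0)
    selected = rng.normal(loc=5.0, scale=1.0, size=(100, 3))   # selected individuals (N x D)
    edges = [equi_height_edges(selected[:, d], n_bins=10) for d in range(selected.shape[1])]
    offspring = np.column_stack([sample_equi_height(e, 200, rng) for e in edges])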
Results: Two Peaks Function
✔ optimum in (1, 1, . . . , 1)
✔ 2^D local optima
[Figure: the Two Peaks function]
Evolution of bin boundaries (component centers for MOG):
[Figures: evolution of bin boundaries for the HEH (equi-height) and HMD (max-diff) models, and of component centers for the MOG model, over 100 generations]
Histogram UMDA: Summary
Suitable when
✔ the search space is bounded by a hyperrectangle
✔ there are no strong interactions among variables
Possible extension:
✔ Rotation of the coordinate system → UMDA is then able to work with linear interactions
Optimization using Gaussian distribution
Case study: Quadratic function
Consider a simple EDA with the following settings:
✔ Truncation selection: use the τ · N best individuals to build the model
✔ Gaussian distribution: fit the Gaussian using the maximum likelihood (ML) estimate
Two situations:
Population centered around the optimum (population in the valley):
[Figure: selected individuals and the ML-fitted Gaussian for a population in the valley, τ = 0.8]
Population far away from the optimum (population on the slope):
[Figure: selected individuals and the ML-fitted Gaussian for a population on the slope, τ = 0.2]
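A minimal sketch of the EDA just described, run on a 1-D quadratic function; the population size, τ and starting points are illustrative choices, not the lecture's settings:

    import numpy as np

    def gaussian_eda(f, mu0, sigma0=1.0, pop_size=100, tau=0.3, generations=50, seed=0):
        # Sample from N(mu, sigma^2), truncation-select the tau*N best (minimization),
        # and refit mu and sigma by maximum likelihood (mean and std of the selected set).
        rng = np.random.default_rng(seed)
        mu, sigma = mu0, sigma0
        for _ in range(generations):
            pop = rng.normal(mu, sigma, size=pop_size)
            selected = pop[np.argsort(f(pop))[:int(tau * pop_size)]]
            mu, sigma = selected.mean(), selected.std()
        return mu, sigma

    # Population in the valley: the ML fit works and mu ends up close to the optimum at 0.
    print(gaussian_eda(lambda x: x**2, mu0=0.0))
    # Population on the slope: sigma shrinks every generation and mu stalls
    # roughly 2.4 units from the start, far from the optimum (premature convergence).
    print(gaussian_eda(lambda x: x**2, mu0=-100.0))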
What happens on the slope?
The change of the population statistics in 1 generation:
Expected value:
µ_{t+1} = E(X | X > x_min) = µ_t + σ_t · d(τ),   where   d(τ) = φ(Φ⁻¹(τ)) / τ
Variance:
(σ_{t+1})² = Var(X | X > x_min) = (σ_t)² · c(τ),   where   c(τ) = 1 + Φ⁻¹(1 − τ) · φ(Φ⁻¹(τ)) / τ − d(τ)²
[Figures: d(τ) and c(τ) as functions of τ, on the slope and in the valley]
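The factors d(τ) and c(τ) are easy to evaluate and to verify by simulation; a small sketch (assuming NumPy and SciPy are available):

    import numpy as np
    from scipy.stats import norm

    def d(tau):
        # Expected standardized shift of the mean under truncation selection
        # of the best tau fraction (population on a slope).
        return norm.pdf(norm.ppf(tau)) / tau

    def c(tau):
        # Factor by which the variance shrinks in one generation on the slope.
        return 1.0 + norm.ppf(1.0 - tau) * norm.pdf(norm.ppf(tau)) / tau - d(tau) ** 2

    # Monte Carlo check: keep the largest tau*N samples of a standard normal.
    tau, rng = 0.3, np.random.default_rng(0)
    x = rng.standard_normal(1_000_000)
    sel = np.sort(x)[-int(tau * len(x)):]
    print(d(tau), sel.mean())   # both approximately 1.16
    print(c(tau), sel.var())    # both approximately 0.26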
What happens on the slope (cont.)
The population statistics in generation t:
µ_t = µ_0 + σ_0 · d(τ) · Σ_{i=1}^{t} (√c(τ))^{i−1}    (a geometric series)
σ_t = σ_0 · (√c(τ))^t
Convergence of the population statistics:
lim_{t→∞} µ_t = µ_0 + σ_0 · d(τ) / (1 − √c(τ))
lim_{t→∞} σ_t = 0
The distance the population can “travel” with this algorithm is bounded!
Premature convergence!
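Iterating the two recursions numerically (an illustrative check, assuming SciPy) reproduces the bound: the mean stops at the geometric-series limit while σ_t goes to zero:

    import numpy as np
    from scipy.stats import norm

    def mean_travel(mu0, sigma0, tau, generations):
        # Iterate mu_{t+1} = mu_t + sigma_t * d(tau), sigma_{t+1} = sigma_t * sqrt(c(tau))
        # and compare with the closed-form limit of the geometric series.
        d = norm.pdf(norm.ppf(tau)) / tau
        c = 1.0 + norm.ppf(1.0 - tau) * norm.pdf(norm.ppf(tau)) / tau - d ** 2
        mu, sigma = mu0, sigma0
        for _ in range(generations):
            mu, sigma = mu + sigma * d, sigma * np.sqrt(c)
        return mu, mu0 + sigma0 * d / (1.0 - np.sqrt(c))

    print(mean_travel(mu0=0.0, sigma0=1.0, tau=0.3, generations=1000))
    # Both numbers are about 2.39: after 1000 generations the mean has effectively stopped.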
Preventing premature convergence
Artificially enlarge the ML estimate of variance:
✔ keep the variance above a specified threshold (see the sketch after this list)
✔ use self-adaptation (let the variance be part of the chromosome)
✔ adaptive variance scaling when the population is on the slope, ML estimate of the variance when the population is in the valley
✔ . . .
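A minimal sketch of the first remedy, a fixed lower bound on the ML variance estimate; the threshold value and names below are illustrative:

    import numpy as np

    SIGMA_MIN = 0.1   # illustrative lower bound on the standard deviation

    def fit_gaussian_with_floor(selected):
        # ML estimates of mu and sigma, but sigma is kept above a threshold so the
        # model cannot collapse while the population is still travelling along a slope.
        mu = selected.mean()
        sigma = max(selected.std(), SIGMA_MIN)
        return mu, sigma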
Conclusions:
✔ Maximum likelihood estimates are suitable in situations where the model fits the fitness function well (at least in a local neighborhood)
✘ the Gaussian distribution is suitable in the neighborhood of the optimum
✘ the Gaussian distribution is not suitable on the slope of the fitness function!
Evolutionary Strategies
✔ (µ, λ)-ES or (µ + λ)-ES (µ parents, λ offspring)
✘ (µ, λ)-ES: offspring completely replace parents
✘ (µ + λ)-ES: parents are joined with offspring, both fight for survival
✔ offspring individuals are created by mutation as
x′ = x + N_D(0, σ),
where x is the parent, x′ is the offspring, and N_D(0, σ) is a D-dimensional isotropic Gaussian distribution
Increasing the flexibility of ES: σ is not constant during the ES run
✔ decrease σ deterministically
✔ feedback control of σ (the 1/5 success rule)
✔ self-adaptation of σ
✘ σ is part of the chromosome:
σ′ = σ · exp(N_1(0, ∆σ))
x′ = x + N_D(0, σ′)
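A minimal (µ, λ)-ES sketch with a self-adapted global step size, as described above; the learning rate and parameter values are illustrative, not the lecture's:

    import numpy as np

    def comma_es(f, dim, mu=5, lam=35, generations=200, seed=0):
        # Each individual carries its own sigma; sigma is mutated log-normally first
        # and then used to mutate the coordinates. Comma selection: offspring only.
        rng = np.random.default_rng(seed)
        tau = 1.0 / np.sqrt(2.0 * dim)                 # common learning-rate heuristic
        xs, sigmas = rng.normal(size=(mu, dim)), np.ones(mu)
        for _ in range(generations):
            parents = rng.integers(0, mu, size=lam)
            new_sigmas = sigmas[parents] * np.exp(tau * rng.standard_normal(lam))
            offspring = xs[parents] + new_sigmas[:, None] * rng.standard_normal((lam, dim))
            best = np.argsort([f(x) for x in offspring])[:mu]
            xs, sigmas = offspring[best], new_sigmas[best]
        return xs[np.argmin([f(x) for x in xs])]

    print(comma_es(lambda x: np.sum(x**2), dim=5))   # moves close to the optimum at the origin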
ES: Increasing model flexibility
Use a different σ in each dimension
✔ Gaussian with a diagonal covariance matrix
σ = (σ_1, σ_2, . . . , σ_D)
x′ = x + N_D(0, I · σ)
✔ Gaussian with a full covariance matrix
x′ = x + N_D(0, C)
✔ Self-adaptation of σ and C
✔ Changes in the covariance structure are still very random!
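A short illustration (illustrative values) of the two mutation variants, a per-coordinate σ versus a full covariance matrix:

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.zeros(3)                                    # parent

    # Diagonal covariance: an independent step size per coordinate.
    sigma = np.array([0.1, 1.0, 10.0])
    x_diag = x + sigma * rng.standard_normal(3)

    # Full covariance: correlated mutations, here elongated along the x0 = x1 direction.
    C = np.array([[1.0, 0.9, 0.0],
                  [0.9, 1.0, 0.0],
                  [0.0, 0.0, 0.1]])
    x_full = x + rng.multivariate_normal(np.zeros(3), C)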
CMA-ES
✔ Derandomized evolutionary strategy
✔ (1, λ)-ES with covariance matrix adaptation:
1. Generate λ offspring:
x_t = µ_t + σ_t · N_D(0, C_t),   where µ_t ∈ R^D, σ_t ∈ R^+, C_t ∈ R^{D×D}
2. Based on the offspring, adapt the model parameters:
Adapt the Gaussian center µ and the covariance matrix C using maximum likelihood estimation:
µ_{t+1} = argmax_µ P(x_sel | µ) = x̄_sel   (the mean of the selected offspring)
C_{t+1} = argmax_C P((x_sel − µ_t) / σ_t | C) = Cov((x_sel − µ_t) / σ_t)
Adapt the global step size σ so that two consecutive steps, µ_t → µ_{t+1} and µ_{t+1} → µ_{t+2}, are conjugate, i.e. conceptually
(µ_{t+2} − µ_{t+1})ᵀ · C⁻¹ · (µ_{t+1} − µ_t) / (σ_{t+1})² = 0
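A didactic sketch of step 2 only, the ML re-estimation of µ and C from the selected offspring. This is not the actual CMA-ES update (it has no evolution paths, no rank-one update, no weighted recombination and no step-size control), just the idea sketched above:

    import numpy as np

    def simplified_cma_step(f, mu_t, sigma_t, C_t, lam=20, sel=5, rng=None):
        # Sample lambda offspring, select the best, and re-estimate the center and the
        # covariance of the selected (standardized) steps by maximum likelihood.
        rng = rng or np.random.default_rng()
        y = rng.multivariate_normal(np.zeros(len(mu_t)), C_t, size=lam)
        offspring = mu_t + sigma_t * y
        best = np.argsort([f(x) for x in offspring])[:sel]
        mu_next = offspring[best].mean(axis=0)                       # ML estimate of the center
        steps = (offspring[best] - mu_t) / sigma_t
        C_next = np.cov(steps, rowvar=False, bias=True)              # ML estimate of C
        return mu_next, sigma_t, C_next

    # Without the real step-size control this simplified update still shrinks its model
    # on a slope, just like the plain Gaussian EDA above; CMA-ES fixes that part.
    mu, sigma, C = np.array([3.0, 2.0]), 1.0, np.eye(2)
    for _ in range(10):
        mu, sigma, C = simplified_cma_step(lambda z: z @ z, mu, sigma, C)
    print(mu)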
CMA-ES: Summary
✔ CMA-ES has its roots in ES, but it can be seen as an instance of EDA (it learns a probabilistic model)
✔ behaves like a local optimizer, but often does not get stuck in a local optimum
✔ state-of-the-art in real-valued black-box optimization; its advantages are significant already in spaces of 5–10 dimensions
✔ it has been used to solve many real-world problems (tuning of electronic filters, aerodynamic and hydrodynamic design, etc.)
Real-valued EDAs: Summary
✔ Much less developed than EDAs for binary representation
✔ The difficulties are caused mainly by
✘ much more severe effects of the curse of dimensionality
✘ many different types of interactions among variables
✔ Despite that, EDAs (and EAs in general) are able to achieve better results than conventional optimization techniques (line search, Nelder–Mead search, . . . )