metaheuristic optimization: algorithm analysis and open problems


Metaheuristic Optimization: Algorithm Analysis and Open Problems

Xin-She Yang

National Physical Laboratory, UK

@ SEA 2011

Xin-She Yang 2011

Metaheuristics and Optimization


Intro

Computational science is now the third paradigm of science, complementing theory and experiment.

- Ken Wilson (Cornell University), Nobel Laureate.




All models are wrong, but some are useful.

- George Box, Statistician




All models are inaccurate, but some are useful.


All algorithms perform equally well on average over all possible functions.

- No-free-lunch theorems (Wolpert & Macready)

How so? Not quite! (more later)



Overview


Introduction

Metaheuristic Algorithms

Applications

Markov Chains and Convergence Analysis

Exploration and Exploitation

Free Lunch or No Free Lunch?

Open Problems

Essence of an Optimization Algorithm

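To move to a new, better point $x_{i+1}$ from an existing known location $x_i$.

[Figure: a search path through $x_1, x_2, \ldots, x_i$, with the next point $x_{i+1}$ still to be determined.]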

Population-based algorithms use multiple, interacting paths.

Different algorithms

Different strategies/approaches in generating these moves!


Optimization Algorithms

Deterministic

Newton’s method (1669, published in 1711), Newton-Raphson (1690), hill-climbing/steepest descent (Cauchy 1847), least-squares (Gauss 1795),

linear programming (Dantzig 1947), conjugate gradient (Lanczos et al. 1952), interior-point method (Karmarkar 1984), etc.


Stochastic/Metaheuristic

Genetic algorithms (1960s/1970s), evolutionary strategy (Rechenberg & Schwefel 1960s), evolutionary programming (Fogel et al. 1960s).

Simulated annealing (Kirkpatrick et al. 1983), Tabu search (Glover 1980s), ant colony optimization (Dorigo 1992), genetic programming (Koza 1992), particle swarm optimization (Kennedy & Eberhart 1995), differential evolution (Storn & Price 1996/1997),

harmony search (Geem et al. 2001), honeybee algorithm (Nakrani & Tovey 2004), ..., firefly algorithm (Yang 2008), cuckoo search (Yang & Deb 2009), ...


Steepest Descent/Hill Climbing

Gradient-Based Methods

Use gradient/derivative information – very efficient for local search.

Newton’s Method

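$$x_{n+1} = x_n - H^{-1}\nabla f, \qquad H = \begin{pmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f}{\partial x_n \partial x_1} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2} \end{pmatrix}.$$

Quasi-Newton

If $H$ is replaced by $I$, we have

$$x_{n+1} = x_n - \alpha I \nabla f(x_n).$$

Here $\alpha$ controls the step length.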

Generation of new moves by gradient.
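As a minimal sketch of the two gradient-based moves, the following Python compares one Newton step with repeated fixed-step gradient moves. The quadratic test function, the step length `alpha = 0.1` and the function names are assumptions made for illustration; they do not come from the slides.

```python
import numpy as np

def newton_step(x, grad, hess):
    """One Newton move: x - H^{-1} grad f(x), solved without forming the inverse."""
    return x - np.linalg.solve(hess(x), grad(x))

def gradient_step(x, grad, alpha=0.1):
    """One move with H replaced by the identity: x - alpha * grad f(x)."""
    return x - alpha * grad(x)

# Hypothetical test function f(x) = x1^2 + 3*x2^2 (an assumption, not from the talk).
grad = lambda x: np.array([2.0 * x[0], 6.0 * x[1]])
hess = lambda x: np.array([[2.0, 0.0], [0.0, 6.0]])

x0 = np.array([1.0, -2.0])
x_newton = newton_step(x0, grad, hess)   # a quadratic is minimized in a single Newton move

x_grad = x0.copy()
for _ in range(100):                     # the identity-Hessian variant needs many small moves
    x_grad = gradient_step(x_grad, grad)
```

On a quadratic the Newton move lands on the minimum directly, while the fixed-step variant approaches it geometrically at a rate set by `alpha`.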

Simulated Annealing

Metal annealing to increase strength ⟹ simulated annealing.

Probabilistic move: $p \propto \exp[-E/(k_B T)]$.

$k_B$ = Boltzmann constant (e.g., $k_B = 1$), $T$ = temperature, $E$ = energy.

$E \propto f(x)$, $T = T_0 \alpha^t$ (cooling schedule), $0 < \alpha < 1$.

$T \to 0 \implies p \to 0 \implies$ hill climbing.

This is essentially a Markov chain. Generation of new moves by Markov chain.
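The scheme above can be sketched in a few lines of Python. This is an illustrative sketch only: it assumes the usual Metropolis acceptance rule applied to the energy change $\Delta E = f(x_{new}) - f(x)$ with $k_B = 1$, Gaussian trial moves, and hypothetical parameter values; none of these specifics are stated on the slide.

```python
import math
import random

def simulated_annealing(f, x0, T0=1.0, alpha=0.95, steps=2000, step_size=0.5, seed=0):
    """Minimize a 1-D function f with geometric cooling T = T0 * alpha**t."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best_x, best_f = x, fx
    for t in range(steps):
        T = T0 * alpha ** t                     # cooling schedule from the slide
        x_new = x + rng.gauss(0.0, step_size)   # assumed Gaussian trial move
        f_new = f(x_new)
        dE = f_new - fx                         # energy change, with E taken equal to f
        # Improvements are always accepted; worse moves with probability exp(-dE / T).
        if dE <= 0 or rng.random() < math.exp(-dE / T):
            x, fx = x_new, f_new
            if fx < best_f:
                best_x, best_f = x, fx
    return best_x, best_f

# Hypothetical test problem: a parabola with its minimum at x = 2.
best_x, best_f = simulated_annealing(lambda x: (x - 2.0) ** 2, x0=-5.0)
```

Each new state depends only on the current state and the current temperature, which is why the search can be read as a Markov chain. As $T$ shrinks, worse moves are almost never accepted and the loop behaves like stochastic hill climbing.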


An Example


Genetic Algorithms

[Figure: crossover and mutation.]

Generation of new solutions by crossover, mutation and elitism.
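A compact sketch of these three operators on bit strings is given below. The selection rule (parents drawn from the better half), one-point crossover, the OneMax fitness and all parameter values are assumptions chosen for illustration, not details taken from the talk.

```python
import random

def evolve(fitness, n_bits=20, pop_size=30, generations=60,
           p_mut=0.02, n_elite=2, seed=1):
    """Maximize `fitness` over bit strings; returns the best string and its history."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        next_pop = [ind[:] for ind in pop[:n_elite]]     # elitism: best strings copied unchanged
        while len(next_pop) < pop_size:
            p1, p2 = rng.sample(pop[:pop_size // 2], 2)  # assumed truncation selection
            cut = rng.randrange(1, n_bits)               # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [b ^ 1 if rng.random() < p_mut else b for b in child]  # bit-flip mutation
            next_pop.append(child)
        pop = next_pop
    pop.sort(key=fitness, reverse=True)
    history.append(fitness(pop[0]))
    return pop[0], history

best, history = evolve(sum)   # OneMax: fitness is the number of ones
```

Because the elite strings are carried over unchanged, the best fitness recorded in `history` can never decrease from one generation to the next.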


Swarm Intelligence

Ants, bees, birds, fish ...

Simple rules lead to complex behaviour.

Swarming Starlings


PSO

[Figure: a particle $x_i$, another particle $x_j$, and the current global best $g^*$.]

Particle swarm optimization (Kennedy and Eberhart 1995)

$$v_i^{t+1} = v_i^t + \alpha \epsilon_1 (g^* - x_i^t) + \beta \epsilon_2 (x_i^* - x_i^t),$$

$$x_i^{t+1} = x_i^t + v_i^{t+1}.$$

$\alpha, \beta$ = learning parameters; $\epsilon_1, \epsilon_2$ = random numbers.

Without randomness, generation of new moves by weighted average or pattern search. Adding randomization to increase the diversity of new solutions.
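The update equations can be sketched with NumPy as follows. One assumption is added that is not on the slide: an inertia weight `theta` multiplying the old velocity, a common stabilizing choice (setting `theta = 1` recovers the update exactly as written above). The sphere test function, bounds and parameter values are likewise hypothetical.

```python
import numpy as np

def pso(f, dim=2, n_particles=20, iters=200, alpha=1.5, beta=1.5,
        theta=0.7, bounds=(-5.0, 5.0), seed=0):
    """Minimize f; returns the best position found and its value."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(bounds[0], bounds[1], size=(n_particles, dim))
    v = np.zeros_like(x)
    p_best = x.copy()                              # x_i^*: best position of each particle
    p_val = np.apply_along_axis(f, 1, x)
    g_best = p_best[np.argmin(p_val)].copy()       # g^*: best position of the whole swarm
    for _ in range(iters):
        eps1 = rng.random(x.shape)                 # random numbers in [0, 1)
        eps2 = rng.random(x.shape)
        v = theta * v + alpha * eps1 * (g_best - x) + beta * eps2 * (p_best - x)
        x = x + v
        val = np.apply_along_axis(f, 1, x)
        improved = val < p_val
        p_best[improved] = x[improved]
        p_val[improved] = val[improved]
        g_best = p_best[np.argmin(p_val)].copy()
    return g_best, float(p_val.min())

# Hypothetical test problem: the 2-D sphere function.
best_x, best_val = pso(lambda z: float(np.sum(z ** 2)))
```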

PSO Convergence

Consider a 1D system without randomness (Clerc & Kennedy 2002):

$$v_i^{t+1} = v_i^t + \alpha (x_i^* - x_i^t) + \beta (g - x_i^t), \qquad x_i^{t+1} = x_i^t + v_i^{t+1}.$$

Considering only one particle and defining $p = \dfrac{\alpha x_i^* + \beta g}{\alpha + \beta}$, $\phi = \alpha + \beta$, and setting $y^t = p - x_i^t$, we have

$$\begin{cases} v^{t+1} = v^t + \phi y^t, \\ y^{t+1} = -v^t + (1 - \phi) y^t. \end{cases}$$

This can be written as

$$U_t = \begin{pmatrix} v^t \\ y^t \end{pmatrix}, \qquad A = \begin{pmatrix} 1 & \phi \\ -1 & 1 - \phi \end{pmatrix}, \implies U_{t+1} = A U_t,$$

a simple dynamical system whose eigenvalues are

$$\lambda_\pm = 1 - \frac{\phi}{2} \pm \frac{\sqrt{\phi^2 - 4\phi}}{2}.$$

Periodic, quasi-periodic depending on $\phi$. Convergence for $\phi \approx 4$.
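As a quick numerical check of the eigenvalue formula, the sketch below builds $A$ for a given $\phi$ and compares NumPy's eigenvalues with the closed form. The function names are illustrative only.

```python
import numpy as np

def pso_matrix(phi):
    """The 2x2 matrix A of the deterministic one-particle system."""
    return np.array([[1.0, phi], [-1.0, 1.0 - phi]])

def closed_form(phi):
    """lambda_(+/-) = 1 - phi/2 +/- sqrt(phi^2 - 4*phi)/2, evaluated in complex arithmetic."""
    root = np.sqrt(complex(phi * phi - 4.0 * phi))
    return np.array([1.0 - phi / 2.0 + root / 2.0, 1.0 - phi / 2.0 - root / 2.0])

# Trace and determinant fix the characteristic polynomial: tr A = 2 - phi, det A = 1.
eig_inside = np.linalg.eigvals(pso_matrix(2.0))    # 0 < phi < 4: complex pair
eig_outside = np.linalg.eigvals(pso_matrix(5.0))   # phi > 4: two real eigenvalues
```

Since $\det A = 1$, for $0 < \phi < 4$ the two eigenvalues form a complex-conjugate pair of modulus one, which matches the periodic or quasi-periodic behaviour noted above.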

Ant and Bee Algorithms

Ant Colony Optimization (Dorigo 1992)

Bee algorithms & many variants (Nakrani & Tovey 2004, Karaboga 2005, Yang 2005, Afshar et al. 2007, ..., others).

Advantages

Very promising for combinatorial optimization, but for continuous problems, they may not be the best choice.


Ant & Bee Algorithms

Pheromone based

Each agent follows paths with higher pheromone concentration (quasi-randomly)

Pheromone evaporates (exponentially) with time


Firefly Algorithm

Firefly Algorithm by Xin-She Yang (2008)
(Xin-She Yang, Nature-Inspired Metaheuristic Algorithms, Luniver Press, (2008).)

Firefly Behaviour and Idealization

Fireflies are unisex and brightness varies with distance.

Less bright ones will be attracted to bright ones.

If no brighter firefly can be seen, a firefly will move randomly.

$$x_i^{t+1} = x_i^t + \beta_0 e^{-\gamma r_{ij}^2} (x_j^t - x_i^t) + \alpha \epsilon_i^t.$$

Generation of new solutions by random walk and attraction.



FA Convergence

For the firefly motion without the randomness term, we focus on a single agent and replace $x_j^t$ by $g$:

$$x_i^{t+1} = x_i^t + \beta_0 e^{-\gamma r_i^2} (g - x_i^t),$$

where the distance $r_i = \|g - x_i^t\|_2$.

In the 1-D case, we set $y_t = g - x_i^t$ and $u_t = \sqrt{\gamma}\, y_t$, so that

$$u_{t+1} = u_t \left[1 - \beta_0 e^{-u_t^2}\right].$$

Analyzing this using the same methodology as for the logistic map $u_{t+1} = \lambda u_t (1 - u_t)$, we have a corresponding chaotic map, focusing on the transition from periodic multiple states to chaotic behaviour.

Convergence can be achieved for $\beta_0 < 2$. There is a transition from periodic to chaos at $\beta_0 \approx 4$.

Chaotic characteristics can often be used as an efficient mixing technique for generating diverse solutions.

Too much attraction may cause chaos :)
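The 1-D map $u_{t+1} = u_t[1 - \beta_0 e^{-u_t^2}]$ is easy to iterate directly. The sketch below is illustrative only; the starting point `u0 = 1.0`, the number of steps and the two values of $\beta_0$ are arbitrary choices, not values from the talk.

```python
import math

def firefly_map(u0, beta0, steps=200):
    """Iterate u_{t+1} = u_t * (1 - beta0 * exp(-u_t^2)) and return the whole orbit."""
    u = u0
    orbit = [u]
    for _ in range(steps):
        u = u * (1.0 - beta0 * math.exp(-u * u))
        orbit.append(u)
    return orbit

converging = firefly_map(1.0, beta0=1.5)   # beta0 < 2: orbit shrinks towards u = 0
oscillating = firefly_map(1.0, beta0=3.0)  # beta0 > 2: u = 0 is no longer attracting
```

Near $u = 0$ the map multiplies $u$ by roughly $1 - \beta_0$, so the fixed point attracts only when $|1 - \beta_0| < 1$, which is the $\beta_0 < 2$ condition. For $\beta_0 = 3$ the orbit instead settles into a period-2 oscillation between $\pm\sqrt{\ln(\beta_0/2)}$, the points where the multiplier equals $-1$.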


Cuckoo Breeding Behaviour

Evolutionary Advantages

Cuckoos dump their eggs in the nests of host birds and let these host birds raise their chicks.

Cuckoo Video (BBC)


Cuckoo Search

Cuckoo Search by Xin-She Yang and Suash Deb (2009)
(Xin-She Yang and Suash Deb, Cuckoo search via Lévy flights, in: Proceedings of World Congress on Nature & Biologically Inspired Computing (NaBIC 2009, India), IEEE Publications, USA, pp. 210-214 (2009). Also, Xin-She Yang and Suash Deb, Engineering Optimization by Cuckoo Search, Int. J. Mathematical Modelling and Numerical Optimisation, Vol. 1, No. 4, 330-343 (2010).)

Cuckoo Behaviour and Idealization

Each cuckoo lays one egg (solution) at a time, and dumps its egg in a randomly chosen nest.

The best nests with high-quality eggs (solutions) will carry over to the next generation.

The egg laid by a cuckoo can be discovered by the host bird with a probability $p_a$, and a new nest will then be built.

Local random walk:

$$x_i^{t+1} = x_i^t + s \otimes H(p_a - \epsilon) \otimes (x_j^t - x_k^t).$$

[$x_i$, $x_j$, $x_k$ are 3 different solutions, $H(u)$ is a Heaviside function, $\epsilon$ is a random number drawn from a uniform distribution, and $s$ is the step size.]

Global random walk via Lévy flights:

$$x_i^{t+1} = x_i^t + \alpha L(s, \lambda), \qquad L(s, \lambda) = \frac{\lambda \Gamma(\lambda) \sin(\pi \lambda / 2)}{\pi} \frac{1}{s^{1+\lambda}}, \quad (s \gg s_0).$$

Generation of new moves by Lévy flights, random walk and elitism.
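The two moves can be sketched as below. Two assumptions go beyond the slide: the heavy-tailed steps are drawn with Mantegna's algorithm (a common way to approximate Lévy-distributed step lengths, swapped in here because the slide gives only the tail of $L(s, \lambda)$), and $x_j$, $x_k$ are picked by random permutations of the population without enforcing that they differ from $x_i$. All function names are hypothetical.

```python
import math
import numpy as np

def mantegna_sigma(lam):
    """Scale of the numerator Gaussian in Mantegna's algorithm for exponent lam."""
    num = math.gamma(1.0 + lam) * math.sin(math.pi * lam / 2.0)
    den = math.gamma((1.0 + lam) / 2.0) * lam * 2.0 ** ((lam - 1.0) / 2.0)
    return (num / den) ** (1.0 / lam)

def levy_steps(lam, size, rng):
    """Heavy-tailed steps u / |v|^(1/lam) with u ~ N(0, sigma^2), v ~ N(0, 1)."""
    u = rng.normal(0.0, mantegna_sigma(lam), size)
    v = rng.normal(0.0, 1.0, size)
    return u / np.abs(v) ** (1.0 / lam)

def local_walk(nests, pa, s, rng):
    """x_i + s * H(pa - eps) * (x_j - x_k), applied entry-wise."""
    j = rng.permutation(len(nests))
    k = rng.permutation(len(nests))
    mask = rng.random(nests.shape) < pa      # H(pa - eps) with eps ~ U(0, 1)
    return nests + s * mask * (nests[j] - nests[k])

def global_walk(nests, alpha, lam, rng):
    """x_i + alpha * L(s, lam): a Lévy-flight move for every nest."""
    return nests + alpha * levy_steps(lam, nests.shape, rng)

rng = np.random.default_rng(0)
nests = rng.uniform(-1.0, 1.0, size=(5, 2))
moved = global_walk(nests, alpha=0.01, lam=1.5, rng=rng)
```

In a full cuckoo search, each candidate produced by these moves would be compared with the nest it replaces and the better one kept, which is where the elitism enters.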


Applications

Design optimization: structural engineering, product design ...

Scheduling, routing and planning: often discrete, combinatorial problems ...

Applications in almost all areas (e.g., finance, economics, engineering, industry, ...)

Pressure Vessel Design Optimization

[Figure: pressure vessel schematic with labels $r$, $L$, $d_1$, $d_2$.]

Optimization

This is a well-known test problem for optimization (e.g., see Cagnina et al. 2008) and it can be written as

$$\text{minimize } f(x) = 0.6224\, d_1 r L + 1.7781\, d_2 r^2 + 3.1661\, d_1^2 L + 19.84\, d_1^2 r,$$

subject to

$$\begin{aligned} g_1(x) &= -d_1 + 0.0193\, r \le 0, \\ g_2(x) &= -d_2 + 0.00954\, r \le 0, \\ g_3(x) &= -\pi r^2 L - \tfrac{4\pi}{3} r^3 + 1296000 \le 0, \\ g_4(x) &= L - 240 \le 0. \end{aligned}$$

The simple bounds are

$$0.0625 \le d_1, d_2 \le 99 \times 0.0625, \qquad 10.0 \le r, L \le 200.0.$$

The best solution found so far

$$f_* = 6059.714, \qquad x_* = (0.8125, 0.4375, 42.0984, 176.6366).$$
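As a sanity check, the objective and constraints can be evaluated at the reported solution. The sketch assumes the component order $x = (d_1, d_2, r, L)$, which is the order consistent with the stated bounds and constraints; the function names are illustrative.

```python
import math

def cost(d1, d2, r, L):
    """Objective f(x) of the pressure vessel problem."""
    return (0.6224 * d1 * r * L + 1.7781 * d2 * r ** 2
            + 3.1661 * d1 ** 2 * L + 19.84 * d1 ** 2 * r)

def constraints(d1, d2, r, L):
    """Values of g1..g4; each should be <= 0 for a feasible design."""
    return (
        -d1 + 0.0193 * r,
        -d2 + 0.00954 * r,
        -math.pi * r ** 2 * L - 4.0 * math.pi / 3.0 * r ** 3 + 1296000.0,
        L - 240.0,
    )

x_star = (0.8125, 0.4375, 42.0984, 176.6366)   # assumed order: (d1, d2, r, L)
f_star = cost(*x_star)
g_star = constraints(*x_star)
```

The evaluated cost agrees with the quoted $f_*$ to within the rounding of $x_*$. The constraints $g_1$ and $g_3$ are active at this design, so with only four printed decimals they evaluate very close to zero and their sign is sensitive to rounding.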


Dome Design

120-bar dome: divided into 7 groups, 120 design elements, about 200 constraints (Kaveh and Talatahari 2010; Gandomi and Yang 2011).

Tower Design

26-storey tower: 942 design elements, 244 nodal links, 59 groups/types, > 4000 nonlinear constraints (Kaveh & Talatahari 2010; Gandomi & Yang 2011).


Monte Carlo Methods

Random walk (a drunkard’s walk):

$$u_{t+1} = \mu + u_t + w_t,$$

where $w_t$ is a random variable, and $\mu$ is the drift.

For example, $w_t \sim N(0, \sigma^2)$ (Gaussian).
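A minimal sketch of this walk with Gaussian increments is shown below; the function name, the seed and the default parameters are assumptions for illustration.

```python
import numpy as np

def random_walk(steps, mu=0.0, sigma=1.0, u0=0.0, seed=0):
    """Return u_0, ..., u_steps for u_{t+1} = mu + u_t + w_t with w_t ~ N(0, sigma^2)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, sigma, steps)
    # Summing the increments (mu + w_t) gives the displacement from u0 after each step.
    return u0 + np.concatenate(([0.0], np.cumsum(mu + w)))

path = random_walk(500)                              # driftless walk over 500 steps
drift_only = random_walk(10, mu=0.5, sigma=0.0)      # no noise: pure drift
```

With `sigma = 0` the walk reduces to a straight line of slope `mu`, which makes the role of the drift term easy to see.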

[Figure: a sample path of the 1D random walk over 500 steps, and a random-walk trajectory in the plane.]

Markov Chains

Markov chain: the next state only depends on the current state and the transition probability.

$$P(i, j) \equiv P(V_{t+1} = S_j \mid V_0 = S_p, \ldots, V_t = S_i) = P(V_{t+1} = S_j \mid V_t = S_i),$$

$$\implies P_{ij} \pi_i^* = P_{ji} \pi_j^*, \qquad \pi^* = \text{stationary probability distribution}.$$

Examples: Brownian motion

$$u_{i+1} = \mu + u_i + \epsilon_i, \qquad \epsilon_i \sim N(0, \sigma^2).$$


Monopoly (board games)

Monopoly Animation


Markov Chain Monte Carlo

Landmarks: Monte Carlo method (1930s, 1945, from 1950s), e.g., Metropolis Algorithm (1953), Metropolis-Hastings (1970).

Markov Chain Monte Carlo (MCMC) methods: a class of methods.

Really took off in 1990s, now applied to a wide range of areas: physics, Bayesian statistics, climate changes, machine learning, finance, economy, medicine, biology, materials and engineering ...




Convergence Behaviour


As the MCMC runs, convergence may be reached

When does a chain converge? When to stop the chain ... ?

Are multiple chains better than a single chain?

[Figure: convergence trace of a chain; only the axis ticks survive.]

[Figure: several chains (1, 2, 3) started at times from t = −n up to t = 2, reaching a converged state.]

Multiple, interacting chains

Multiple agents trace multiple, interacting Markov chains during the Monte Carlo process.


Analysis

Classifications of Algorithms

Trajectory-based: hill-climbing, simulated annealing, pattern search ...

Population-based: genetic algorithms, ant & bee algorithms, artificial immune systems, differential evolution, PSO, HS, FA, CS, ...

Ways of Generating New Moves/Solutions

Markov chains with different transition probability.

Trajectory-based ⟹ a single Markov chain; population-based ⟹ multiple, interacting chains.

Tabu search (with memory) ⟹ self-avoiding Markov chains.




Ergodicity

Markov Chains & Markov Processes

Most theoretical studies use Markov chains/processes as a framework for convergence analysis.

A Markov chain is said to be regular if some positive power $k$ of the transition matrix $P$ has only positive elements.

A chain is called time-homogeneous if its transition matrix $P$ is the same at each step, so that the transition probability after $k$ steps becomes $P^k$.

A chain is irreducible if it is possible to reach every state from any state; it is ergodic if it is also aperiodic and positive recurrent.

As $k \to \infty$, we have the stationary probability distribution $\pi$:

$$\pi = \pi P \implies \text{the first eigenvalue is always 1}.$$

Asymptotic convergence to optimality:

$$\lim_{k \to \infty} \theta_k \to \theta_*, \quad \text{(with probability one)}.$$

The rate of convergence is usually determined by the second eigenvalue $0 < \lambda_2 < 1$.

An algorithm can converge, but may not be necessarily efficient, as the rate of convergence is typically low.
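These statements can be checked on a small example. The three-state transition matrix below is a hypothetical chain chosen for illustration (its square has only positive entries, so it is regular); it does not come from the talk.

```python
import numpy as np

# Hypothetical 3-state chain; each row sums to 1.
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])

def stationary(P, k=200):
    """Approximate pi = pi P by applying P to an initial distribution k times."""
    pi = np.zeros(len(P))
    pi[0] = 1.0                      # start with all mass on the first state
    for _ in range(k):
        pi = pi @ P
    return pi

pi = stationary(P)
# Eigenvalue moduli sorted in decreasing order: the first is 1, the second sets the rate.
moduli = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
```

For this chain the iteration settles on $\pi = (0.25, 0.5, 0.25)$, and the distance to $\pi$ shrinks by roughly a factor $\lambda_2 = 0.5$ per step. A second eigenvalue close to 1 would mean slow convergence even though the limit exists.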


Convergence of GA

Important studies by Aytug et al. (1996)[1], Aytug and Koehler (2000)[2], Greenhalgh and Marshall (2000)[3], Gutjahr (2010)[4], etc.

The number of iterations $t(\zeta)$ in GA with a convergence probability of $\zeta$ can be estimated by

$$t(\zeta) \le \left\lceil \frac{\ln(1 - \zeta)}{\ln\left\{1 - \min\left[(1 - \mu)^{Ln}, \mu^{Ln}\right]\right\}} \right\rceil,$$

where $\mu$ = mutation rate, $L$ = string length, and $n$ = population size.
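To get a feel for the size of this estimate, the bound can be evaluated directly. The function name and the parameter values below are illustrative assumptions.

```python
import math

def ga_iteration_bound(zeta, mu, L, n):
    """ceil( ln(1 - zeta) / ln(1 - min[(1 - mu)^(L n), mu^(L n)]) )."""
    q = min((1.0 - mu) ** (L * n), mu ** (L * n))
    # log1p(-q) evaluates ln(1 - q) accurately even when q is tiny.
    return math.ceil(math.log(1.0 - zeta) / math.log1p(-q))

small = ga_iteration_bound(zeta=0.95, mu=0.5, L=4, n=2)    # q = 0.5**8 = 1/256
larger = ga_iteration_bound(zeta=0.95, mu=0.1, L=4, n=2)   # q = 0.1**8
```

Even for a tiny problem (string length 4, population 2) the estimate is in the hundreds for $\mu = 0.5$, and it grows by orders of magnitude as $\mu$ moves away from 0.5, since $\min[(1-\mu)^{Ln}, \mu^{Ln}]$ shrinks exponentially in $Ln$.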

[1] H. Aytug, S. Bhattacharyya and G. J. Koehler, A Markov chain analysis of genetic algorithms with power of 2 cardinality alphabets, Euro. J. Operational Research, 96, 195-201 (1996).
[2] H. Aytug and G. J. Koehler, New stopping criterion for genetic algorithms, Euro. J. Operational Research, 126, 662-674 (2000).
[3] D. Greenhalgh and S. Marshall, Convergence criteria for genetic algorithms, SIAM J. Computing, 30, 269-282 (2000).
[4] W. J. Gutjahr, Convergence Analysis of Metaheuristics, Annals of Information Systems, 10, 159-187 (2010).



Multiobjective Metaheuristics


Asymptotic convergence of metaheuristics for multiobjective optimization (Villalobos-Arias et al. 2005)[6]

The transition matrix $P$ of a metaheuristic algorithm has a stationary distribution $\pi$ such that

$$|P_{ij}^k - \pi_j| \le (1 - \zeta)^{k-1}, \quad \forall i, j, \quad (k = 1, 2, \ldots),$$

where $\zeta$ is a function of mutation probability $\mu$, string length $L$ and population size $n$. For example, $\zeta = 2^{nL} \mu^{nL}$, so $\mu < 0.5$.

Note: An algorithm satisfying this condition may not converge (for multiobjective optimization). However, an algorithm with elitism, obeying the above condition, does converge!

[6] M. Villalobos-Arias, C. A. Coello Coello and O. Hernandez-Lerma, Asymptotic convergence of metaheuristics for multiobjective optimization problems, Soft Computing, 10, 1001-1005 (2005).




Other results

Limited results on convergence analysis exist, concerning (finite states/domains)

ant colony optimization,

generalized hill-climbers and simulated annealing,

best-so-far convergence of cross-entropy optimization,

nested partition method, Tabu search, and

of course, combinatorial optimization.

However, there are more challenging tasks for infinite states/domains and continuous problems.

Many, many open problems need satisfactory answers.


Converged?

Converged: often the ‘best-so-far’ convergence, not necessarily at the global optimality.

In theory, a Markov chain can converge, but the number of iterations tends to be large.

In practice, with a finite (hopefully small) number of generations, even if the algorithm converges, it may not reach the global optimum.

How to avoid premature convergence

Equip an algorithm with the ability to escape a local optimum

Increase diversity of the solutions

Enough randomization at the right stage

....(unknown, new) ....


All

So many algorithms – what are the common characteristics?

What are the key components?

How to use and balance different components?

What controls the overall behaviour of an algorithm?






Characteristics of Metaheuristics

Exploration and Exploitation, or Diversification and Intensification.

Exploitation/Intensification

Intensive local search, exploiting local information. E.g., hill-climbing.

Exploration/Diversification

Exploratory global search, using randomization/stochastic components. E.g., hill-climbing with random restart.

Summary

[Figure: algorithms placed on a diagram with exploitation on the horizontal axis and exploration on the vertical axis: uniform search, steepest descent, Newton-Raphson, Tabu, Nelder-Mead, CS, PSO/FA, EP/ES, SA, Ant/Bee, genetic algorithms.]

Best?

Free lunch?



No-Free-Lunch (NFL) Theorems


Algorithm Performance

Any algorithm is as good/bad as random search, when averaged over all possible problems/functions.








Finite domains

No universally efficient algorithm!

Any free taster or dessert?

Yes and no. (more later)

NFL Theorems (Wolpert and Macready 1997)

Search space is finite (though quite large), thus the space of possible “cost” values is also finite. Objective function $f: \mathcal{X} \mapsto \mathcal{Y}$, with $\mathcal{F} = \mathcal{Y}^{\mathcal{X}}$ (space of all possible problems). Assumptions: finite domain, closed under permutation (c.u.p).

For $m$ iterations, $m$ distinct visited points form a time-ordered set

$$d_m = \left\{ \left(d_m^x(1), d_m^y(1)\right), \ldots, \left(d_m^x(m), d_m^y(m)\right) \right\}.$$

The performance of an algorithm $a$ iterated $m$ times on a cost function $f$ is denoted by $P(d_m^y \mid f, m, a)$.

For any pair of algorithms $a$ and $b$, the NFL theorem states

$$\sum_f P(d_m^y \mid f, m, a) = \sum_f P(d_m^y \mid f, m, b).$$

Any algorithm is as good (bad) as a random search!
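On a tiny finite domain the theorem can be checked by brute force. The sketch below enumerates every function $f: \mathcal{X} \to \mathcal{Y}$ for a three-point search space and two cost values, runs two deterministic non-revisiting algorithms, and counts how often each cost sequence $d_m^y$ occurs. The two algorithms and the domain sizes are hypothetical choices for illustration.

```python
from collections import Counter
from itertools import product

X = [0, 1, 2]   # tiny finite search space
Y = [0, 1]      # finite set of cost values

def run(algorithm, f, m):
    """Run a deterministic, non-revisiting search for m steps; return the cost sequence."""
    visited, costs = [], []
    for _ in range(m):
        x = algorithm(visited, costs)
        assert x not in visited          # NFL assumes distinct visited points
        visited.append(x)
        costs.append(f[x])
    return tuple(costs)

def sweep(visited, costs):
    """Algorithm a: visit the points in increasing order, ignoring the costs."""
    return X[len(visited)]

def adaptive(visited, costs):
    """Algorithm b: start in the middle, then choose based on the last cost seen."""
    if not visited:
        return 1
    remaining = [x for x in X if x not in visited]
    return remaining[-1] if costs[-1] == 0 else remaining[0]

def histogram(algorithm, m):
    """Count each cost sequence over ALL |Y|^|X| functions f: X -> Y."""
    return Counter(run(algorithm, dict(zip(X, values)), m)
                   for values in product(Y, repeat=len(X)))

same = all(histogram(sweep, m) == histogram(adaptive, m) for m in (1, 2, 3))
```

Summed over all eight functions, both algorithms produce every cost sequence equally often, so any performance measure that depends only on $d_m^y$ averages to the same value for the two of them.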



Proof Sketch

Wolpert and Macready’s original proof is by induction.

For $m = 1$, $d_1 = \{d_1^x, d_1^y\}$, so the only possible value of $d_1^y$ is $f(d_1^x)$, and thus $\delta(d_1^y, f(d_1^x))$. This means

$$\sum_f P(d_1^y \mid f, m = 1, a) = \sum_f \delta(d_1^y, f(d_1^x)) = |\mathcal{Y}|^{|\mathcal{X}| - 1},$$

which is independent of algorithm $a$. [$|\mathcal{Y}|$ is the size of $\mathcal{Y}$.]

If it is true for $m$, that is, $\sum_f P(d_m^y \mid f, m, a)$ is independent of $a$, then for $m + 1$ we have $d_{m+1} = d_m \cup \{x, f(x)\}$ with $d_{m+1}^x(m+1) = x$ and $d_{m+1}^y(m+1) = f(x)$.

Thus, we get (Bayesian approach)

$$P(d_{m+1}^y \mid f, m+1, a) = P(d_{m+1}^y(m+1) \mid d_m, f, m+1, a)\, P(d_m^y \mid f, m+1, a).$$

So

$$\sum_f P(d_{m+1}^y \mid f, m+1, a) = \sum_{f, x} \delta(d_{m+1}^y(m+1), f(x))\, P(x \mid d_m^y, f, m+1, a)\, P(d_m^y \mid f, m+1, a).$$

Using $P(x \mid d_m, a) = \delta(x, a(d_m))$ and $P(d_m \mid f, m+1, a) = P(d_m \mid f, m, a)$, this leads to

$$\sum_f P(d_{m+1}^y \mid f, m+1, a) = \frac{1}{|\mathcal{Y}|} \sum_f P(d_m^y \mid f, m, a),$$

which is also independent of $a$.




Free Lunches

NFL: not true for continuous domains (Auger and Teytaud 2009)

Continuous free lunches ⟹ some algorithms are better than others!

For example, for a 2D sphere function, an efficient algorithm only needs 4 iterations/steps to reach the optimality (global minimum).[7]

Revisiting algorithms

NFL assumes that the time-ordered set has m distinct points (non-revisiting). Revisiting points breaks closure under permutation, so NFL does not hold (Marshall and Hinton 2010).[8]

[7] A. Auger and O. Teytaud, Continuous lunches are free plus the design of optimal optimization algorithms, Algorithmica, 57, 121-146 (2010).
[8] J. A. Marshall and T. G. Hinton, Beyond no free lunch: realistic algorithms for arbitrary problem classes, WCCI 2010 IEEE World Congress on Computational Intelligence, July 18-23, Barcelona, Spain, pp. 1319-1324.




More Free Lunches

Coevolutionary algorithms

A set of players (agents?) in self-play problems work together to produce a champion, like training a chess champion; free lunches exist (Wolpert and Macready 2005).[9]

[A single player tries to pursue the best next move, or for two players, the fitness function depends on the moves of both players.]

Multiobjective

“Some multiobjective optimizers are better than others” (Corne and Knowles 2003).[10] [results for finite domains only]

Free lunches due to archiver and generator.

[9] D. H. Wolpert and W. G. Macready, Coevolutionary free lunches, IEEE Trans. Evolutionary Computation, 9, 721-735 (2005).
[10] D. Corne and J. Knowles, Some multiobjective optimizers are better than others, Evolutionary Computation, CEC’03, 4, 2506-2512 (2003).




Open Problems

Framework: Need to develop a unified framework for algorithmic analysis (e.g., convergence).

Exploration and exploitation: What is the optimal balance between these two components? (50-50 or what?)

Performance measure: What are the best performance measures? Statistically? Why?

Convergence: Does convergence analysis of algorithms for infinite, continuous domains require systematic approaches?


More Open Problems

Free lunches: Unproved for infinite or continuous domains for multiobjective optimization (possible free lunches!). What are the implications of NFL theorems in practice? If free lunches exist, how to find the best algorithm(s)?

Knowledge: Does problem-specific knowledge always help to find appropriate solutions? How to quantify such knowledge?

Intelligent algorithms: Any practical way to design truly intelligent, self-evolving algorithms?


Thanks

Yang X. S., Engineering Optimization: An Introduction with Metaheuristic Applications, Wiley, (2010).
Yang X. S., Introduction to Computational Mathematics, World Scientific, (2008).
Yang X. S., Nature-Inspired Metaheuristic Algorithms, Luniver Press, (2008).
Yang X. S., Introduction to Mathematical Optimization: From Linear Programming to Metaheuristics, Cambridge Int. Science Publishing, (2008).
Yang X. S., Applied Engineering Optimization, Cambridge Int. Science Publishing, (2007).


IJMMNO

International Journal of Mathematical Modelling and Numerical Optimisation (IJMMNO)

http://www.inderscience.com/ijmmno

Thank you!


Questions ?
