Natural Computing

Lecture 15

Michael Herrmann, [email protected], phone: 0131 6 517177, Informatics Forum 1.42

12/10/2010

12/11/2010 M. Herrmann

Overview

Discrete PSO

Bees, frogs, fireflies, bats, cuckoos, eagles

Comparison of metaheuristic algorithms


Discrete Particle Swarm Optimization

A particle in a swarm

has a position and a velocity

knows its position & objective function value for this position

knows its neighbours, their best previous positions and objective function values (or: their current positions & objective function values)

remembers its best previous position

Its behaviour is determined by a compromise between 3 possible choices

To follow its own way (self-confidence)

To go towards its best previous position (experience)

To go towards the best neighbour's best previous position, or towards the best neighbour ("peer pressure")

(see Maurice Clerc ([email protected]) http://www.mauriceclerc.net)


Canonical PSO

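$x_i, v_i \in \mathbb{R}^d$, $1 \le i \le n$; $r_1, r_2 \in \mathbb{R}^d$; $\omega, \alpha_1, \alpha_2 \in \mathbb{R}^+$

$f : \mathbb{R}^d \to \mathbb{R}^+$ is to be minimized

For all members of the swarm, repeat

$v_i := \omega v_i + \alpha_1 r_1 \circ (\hat{x}_i - x_i) + \alpha_2 r_2 \circ (g - x_i)$, where $\circ$ is component-wise multiplication, $\hat{x}_i$ is the best previous position of particle $i$ and $g$ is the best position found by the swarm

$x_i := x_i + v_i$

$\hat{x}_i := x_i$ if $f(x_i) < f(\hat{x}_i)$

$g := x_i$ if $f(x_i) < f(g)$

until the termination criterion is met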
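The update rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the components of r1 and r2 are drawn uniformly from [0, 1], velocities start at zero, and the parameter values, the search bounds and the `sphere` test function are hypothetical choices that do not come from the lecture.

```python
import random

def canonical_pso(f, dim, n_particles=20, n_iter=200,
                  omega=0.7, alpha1=1.5, alpha2=1.5,
                  bounds=(-5.0, 5.0), seed=0):
    """Minimise f over R^dim; returns (g, f(g))."""
    rng = random.Random(seed)
    lo, hi = bounds
    x = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    x_hat = [xi[:] for xi in x]              # personal bests
    f_hat = [f(xi) for xi in x]
    best = min(range(n_particles), key=f_hat.__getitem__)
    g, f_g = x_hat[best][:], f_hat[best]     # global best

    for _ in range(n_iter):
        for i in range(n_particles):
            for k in range(dim):
                # Assumption: fresh uniform random numbers per component.
                r1, r2 = rng.random(), rng.random()
                v[i][k] = (omega * v[i][k]
                           + alpha1 * r1 * (x_hat[i][k] - x[i][k])
                           + alpha2 * r2 * (g[k] - x[i][k]))
                x[i][k] += v[i][k]
            fx = f(x[i])
            if fx < f_hat[i]:                # update personal best
                x_hat[i], f_hat[i] = x[i][:], fx
            if fx < f_g:                     # update global best
                g, f_g = x[i][:], fx
    return g, f_g

def sphere(x):
    """Hypothetical test objective with its minimum at the origin."""
    return sum(xk * xk for xk in x)

best_x, best_f = canonical_pso(sphere, dim=3)
```

The global best is updated immediately after each particle moves, as in the listing above; updating it only once per sweep over the swarm is an equally common variant.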


Discrete PSO

States are implied by the optimisation problem, e.g. states $s \in \mathbb{Z}^d$

Option 1: Run the algorithm for continuous states x and discretize [s = (int)x] after a solution has been found

Option 2: If the objective function does not accept continuous values, then discretize before the evaluation of the swarm members

Option 3: Use discrete states s = x. The velocities are still continuous but are incremented by discrete steps. Updating s with a small velocity has no effect; s actually changes only once the velocity exceeds a certain threshold. This could be advisable if continuous values of the state have no meaning

Option 4: Use discrete states s = x and continuous velocities, but smooth the effect of the states on the velocities. This could be advisable for binary states.

Option 5: Use a more systematic approach (cf. below)
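As an illustration of Option 2, the following sketch wraps an objective that only accepts integer states so that a continuous swarm can still call it. The names `discretize`, `discretized_objective` and `f_int` are hypothetical, and truncation toward zero is used only because it mirrors the s = (int)x notation; rounding to the nearest integer would be an equally reasonable choice.

```python
def discretize(x):
    """Map a continuous position to integer states, s = (int)x (truncation)."""
    return [int(xk) for xk in x]

def discretized_objective(f_discrete):
    """Wrap an integer-only objective so it accepts continuous positions."""
    def f(x):
        return f_discrete(discretize(x))
    return f

# Hypothetical integer objective: squared distance to the state (3, -2).
def f_int(s):
    if not all(isinstance(sk, int) for sk in s):
        raise TypeError("f_int only accepts integer states")
    return (s[0] - 3) ** 2 + (s[1] + 2) ** 2

f = discretized_objective(f_int)
print(f([3.7, -2.4]))   # evaluates f_int at (3, -2) and prints 0
```

The swarm itself stays continuous here; only the evaluation sees discrete states, so many positions share the same objective value and the landscape becomes piecewise constant.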


Discrete PSO

For all options, adaptive discretisation schemes might be useful.

The parameters ω, α1, α2 may have optimal values different from the standard values for the continuous case.

Theoretical predictions about the behaviour of the algorithm can hardly be made.

In practice, dPSO competes well with genuinely discrete algorithms (ACO, GA)


Example: Sequence alignment

Time-warped sequences $q_m(t)$, $m = 1, \dots, M$, $t \in [0, T]$

If we had the correct warping functions $w_m(t)$ for each sequence, then for all $t$

$q_1(t + w_1(t)) = \dots = q_M(t + w_M(t))$

More generally, we cannot assume exact equality, so we minimise

$f[w] = \sum_{i,j=1}^{M} \int \left( q_i(t + w_i(t)) - q_j(t + w_j(t)) \right)^2 dt$

by choosing appropriate $w_m(t)$, subject to a simultaneous minimization of $\|w_m\|^2$.

This is an infinite-dimensional problem.


Example: Sequence alignment

Choose a discretization $t = 1, \dots, T$ (or use the natural discretization of the data)

$f[w] = \sum_{i,j=1}^{M} \sum_{t=1}^{T} \left( q_i(t + w_i(t)) - q_j(t + w_j(t)) \right)^2$

w is an M × T dimensional vector that can be used as state x in PSO.

However, having discretized t, only discrete values of w are meaningful. Nevertheless, the above options 1 - 4 are applicable.

If the fitness function is evaluated with respect to given data, then options 1 - 3 are applicable.
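The discretized objective can be written down directly for integer warps. The sketch below is not from the lecture: it uses 0-based time indices, and it clips warped indices to the valid range, which is an assumption about boundary handling that the slides leave open. The two ramp sequences are made-up data.

```python
def alignment_cost(q, w):
    """f[w] = sum_{i,j} sum_t (q_i(t + w_i(t)) - q_j(t + w_j(t)))^2.

    q: list of M sequences, each a list of T samples.
    w: list of M integer warps, each a list of T offsets.
    """
    M, T = len(q), len(q[0])

    def warped(m, t):
        # Assumption: indices outside [0, T-1] are clipped to the boundary.
        idx = min(max(t + w[m][t], 0), T - 1)
        return q[m][idx]

    total = 0.0
    for i in range(M):
        for j in range(M):
            for t in range(T):
                total += (warped(i, t) - warped(j, t)) ** 2
    return total

# Made-up data: a ramp and a copy of it delayed by one sample.
q = [[0, 1, 2, 3, 4, 5],
     [0, 0, 1, 2, 3, 4]]
no_warp = [[0] * 6, [0] * 6]
shift = [[0] * 6, [1] * 6]   # read the second sequence one step ahead

print(alignment_cost(q, no_warp), alignment_cost(q, shift))
```

Shifting the second sequence by one step lowers the cost, and the remaining mismatch comes only from the clipped last sample. For PSO, the M × T offsets in `w` would be flattened into one state vector; the sketch omits the additional penalty on the size of the warps.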


An algorithm for binary states

Initialize the velocities v and the discrete particles x; choose ω, α1, α2, a, b

For the discrete particles x, calculate the fitness f(x)

Calculate $v_{pb}$ and $v_{gb}$ from the personal best $x_{pb}$ and the global best $x_{gb}$ (× is standard multiplication)

$v_{pb} = a \times x_{pb} + b \times (1 - x_{pb})$

$v_{gb} = a \times x_{gb} + b \times (1 - x_{gb})$

Update v (usually with relatively small $\alpha_i$)

$v = \omega \times v + \alpha_1 v_{pb} + \alpha_2 v_{gb}$

If rand > $v_{ki}$ then $x_{ki} = 1$ else $x_{ki} = 0$ (i: particle, k: dimension) [e.g. rand = U[0, 1] for a = 0.3, b = 0.7]

Repeat until the termination criterion is satisfied.

Yang, S.Y., Wang, M., Jiao, L.C., 2004. A Quantum Particle Swarm Optimization. Proc. 2004 IEEE Congress on Evolutionary Computation, 1:320-324.
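A minimal sketch of the binary algorithm above, under several assumptions that are not stated on the slide: fitness is maximised, velocities are initialised to 0.5, and ω + α1 + α2 = 1 so that every velocity stays in [0, 1]. With a = 0.3 and b = 0.7, a velocity near 0.3 means the bit is set to 1 with probability of about 0.7. The OneMax objective (count the ones) is a hypothetical test problem.

```python
import random

def binary_pso(fitness, n_bits, n_particles=20, n_iter=100,
               omega=0.8, alpha1=0.1, alpha2=0.1, a=0.3, b=0.7, seed=0):
    """Maximise `fitness` over bit strings; returns (best_x, best_fitness)."""
    rng = random.Random(seed)
    # v[i][k] acts as the probability that bit k of particle i becomes 0.
    v = [[0.5] * n_bits for _ in range(n_particles)]
    x = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_particles)]
    x_pb = [xi[:] for xi in x]                   # personal bests
    f_pb = [fitness(xi) for xi in x]
    best = max(range(n_particles), key=f_pb.__getitem__)
    x_gb, f_gb = x_pb[best][:], f_pb[best]       # global best

    for _ in range(n_iter):
        for i in range(n_particles):
            for k in range(n_bits):
                v_pb = a * x_pb[i][k] + b * (1 - x_pb[i][k])
                v_gb = a * x_gb[k] + b * (1 - x_gb[k])
                v[i][k] = omega * v[i][k] + alpha1 * v_pb + alpha2 * v_gb
                x[i][k] = 1 if rng.random() > v[i][k] else 0
            fx = fitness(x[i])
            if fx > f_pb[i]:
                x_pb[i], f_pb[i] = x[i][:], fx
            if fx > f_gb:
                x_gb, f_gb = x[i][:], fx
    return x_gb, f_gb

best_x, best_f = binary_pso(sum, n_bits=16)      # OneMax: count the ones
```

Because a and b are bounded away from 0 and 1, every bit keeps a residual flip probability, so the swarm never freezes completely.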


An algorithm for binary states

Note: The velocities represent a tendency to move either to 0 or to 1, i.e. an estimate of a probability.

Analogous to the "compact GA" (see GA as MBS, lecture 12)

It appears to be less "dynamic": Induce diversity (exploration) by combination with GA operators

Exploitation can be improved by local search, e.g. simulated annealing



General case: Operator formalism

What if velocities also need to be discrete? 'Overload' the required operations.

Subtraction (position − position) operator: two positions x1 and x2: x2 − x1 = v (velocity)

Addition (position + velocity) operator: position x and velocity v: x + v = x1 (position)

Addition (velocity + velocity) operator: two velocities v1 and v2: v1 + v2 (velocity)

Multiplication (coefficient × velocity) operator: learning coefficient α, velocity v: α × v (velocity)

M. Clerc: Discrete Particle Swarm Optimization, illustrated by the Traveling Salesman Problem. In: Godfrey C. Onwubolu, B. V. Babu (eds.) New optimization techniques in engineering, p. 219-239.


Discrete PSO for TSP

Search space of positions/states $S = \{s_i\}$ → graph:

Hamilton cycles in a weighted graph $G = \{E_G, V_G\}$

Cost/objective function f on S maps into a set of values: $S \to C = \{c_i\}$

For TSP: $f(s) = \sum_{i=1}^{N} w_{n_i, n_{i+1}}$ with $n_{N+1} \equiv n_1$, w denoting distances.

Order on C, or, more generally, a semi-order: either $c_i < c_j$ or $c_i \ge c_j$ (if comparable)

Enumerate the elements of $E_G$ and search for sequences of N + 1 nodes whose first and last nodes are identical and whose other nodes are all different.
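The TSP objective is simple to state in code. In this sketch a state is a list of N + 1 node indices with identical first and last entries, and `w` is a distance matrix; the 4-node instance is made up for illustration.

```python
def tour_cost(tour, w):
    """Sum of w[n_i][n_{i+1}] along a closed tour of N + 1 node indices."""
    if tour[0] != tour[-1]:
        raise ValueError("tour must start and end at the same node")
    if len(set(tour[:-1])) != len(tour) - 1:
        raise ValueError("all nodes except the last must be different")
    return sum(w[a][b] for a, b in zip(tour, tour[1:]))

# Hypothetical 4-node instance with symmetric distances.
w = [[0, 2, 9, 10],
     [2, 0, 6, 4],
     [9, 6, 0, 3],
     [10, 4, 3, 0]]

print(tour_cost([0, 1, 3, 2, 0], w))   # 2 + 4 + 3 + 9 = 18
```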


TSP: Discrete velocities

What is a state?

A vector containing N nodes

What is a velocity?

Define it as a permutation, but only using pairs:

Simplest case: the exchange of two nodes: (..., i, ..., j, ...) → (..., j, ..., i, ...), i.e. the cycle (ij).

More generally $\{(i_k, j_k)\}_{k=1,\dots,|v|}$: a sequence of pairwise exchanges.


TSP: Discrete velocities

A negative velocity?

Define $-v = \{(i_{|v|-k+1}, j_{|v|-k+1})\}_{k=1,\dots,|v|}$, i.e. the same exchanges in reverse order

Adding a velocity to a state

applying a permutation (v) to a set of objects (x)

Di�erence between states?

The permutation that transforms x1 into x2

Sum of velocities?

$v_1 \oplus v_2$: perform first the pair exchanges of v1, then those of v2 (not commutative; may be contracted into fewer pairs)

Multiplication by a scalar?

ω = 0: ωv = Id

ω ∈ (0, 1]: remove all pairs from v above position $\lfloor \omega |v| \rfloor$

ω > 1: concatenate v $\lfloor \omega \rfloor$ times and add $\lfloor (\omega - \lfloor \omega \rfloor) |v| \rfloor$ pairs from the beginning of v
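The operators above can be sketched as plain list manipulations. Some choices here are assumptions rather than part of the lecture: a state is a list of N distinct node labels, a pair (i, j) exchanges the nodes labelled i and j wherever they sit, the sum is a plain concatenation without contraction, and for ω > 1 the fractional part of ω decides how many extra pairs are appended. All function names are hypothetical.

```python
import math

def move(x, v):
    """x + v: apply the pair exchanges of v to state x, in order."""
    x = list(x)
    for i, j in v:
        a, b = x.index(i), x.index(j)
        x[a], x[b] = x[b], x[a]
    return x

def difference(x2, x1):
    """x2 - x1: a velocity v such that move(x1, v) == x2."""
    cur, v = list(x1), []
    for k, target in enumerate(x2):
        if cur[k] != target:
            v.append((cur[k], target))
            m = cur.index(target)
            cur[k], cur[m] = cur[m], cur[k]
    return v

def negate(v):
    """-v: the same exchanges in reverse order."""
    return list(v)[::-1]

def add(v1, v2):
    """v1 (+) v2: first the exchanges of v1, then those of v2."""
    return list(v1) + list(v2)

def scale(omega, v):
    """omega * v: truncate for omega <= 1, repeat and truncate for omega > 1."""
    if omega <= 0 or not v:
        return []                                  # identity
    whole = math.floor(omega)
    frac = omega - whole
    return list(v) * whole + list(v[:math.floor(frac * len(v))])

x1 = [0, 1, 2, 3]
x2 = [2, 0, 3, 1]
v = difference(x2, x1)
print(v, move(x1, v))
```

With these definitions, `move(x1, add(v, negate(v)))` returns `x1` again, which is the property the negative velocity is meant to provide.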


Algorithm for discrete velocities

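$v_{t+1} = \omega v_t \oplus \alpha_1 (\hat{x} - x_t) \oplus \alpha_2 (g - x_t)$, where $\hat{x}$ is the particle's personal best and $g$ the global best

$x_{t+1} = x_t + v_{t+1}$

Fitness evaluation and the updates of the personal best and global best are standard.

Performance is not great unless the parameters are adapted in order to revive the swarm when diversity is too small.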

A GA in the disguise of a PS?


Bees, frogs, fireflies, bats, cuckoos, eagles

Honey bee algorithm: A bee directs others to nectar sources depending on its previous success (cf. ACO)

Firefly algorithm: Fireflies attract others by an inverse-square law of the "light intensity" (i.e. fitness) (cf. ACO)

Bat algorithm: Bats fly with a velocity that depends on their "wave length" (i.e. fitness), but can also change loudness, duration of the pulse, etc. (cf. PSO)

Frog leaping algorithm: Out of several subgroups of frogs, the best ones are allowed to "jump", i.e. to exchange difference vectors (cf. DE)

see X-S Yang: Nature-inspired metaheuristic algorithms. Luniver Press 2010.


Comparison among ME algorithms

Global bests in a standard set of benchmark problems, based on a standard solution quality metric (neither is agreed upon)

Comparisons are not always meaningful

Standard data sets are simple

Data sets are pragmatically selected

Even with the best intentions, one's own algorithm will be better tuned than the algorithm of a competitor

Open competitions are an option

Preparation: Parameter adaptation on a given data set

Competition: Test on a similar but unknown data set with manual readjustment of parameters

Asymptotic space and time complexity (e.g. runtime growth rate)

Dimension and sensitivity of the parameter space

J. Silberholz and B. Golden: Comparison of Metaheuristics. In: Handbook of Metaheuristics 2010, Vol. 146, 625-640.


Principles for comparisons

First experimental principle: The problems used for assessing the performance of an algorithm cannot be used in the development of the algorithm itself.

Second experimental principle: The designer can take into account any available domain-specific knowledge as well as make use of pilot studies on similar problems.

Third experimental principle: When comparing several algorithms, all the algorithms should make use of the available domain-specific knowledge, and equal computational effort should be invested in all the pilot studies. Similarly, in the test phase, all the algorithms should be compared on an equal computing time basis.

Mauro Birattari, Mark Zlochin and Marco Dorigo: Toward a theory of practice in metaheuristic design: A machine learning perspective. RAIRO-Inf. Theor. Appl. 40 (2006) 353-369.


Advanced Comparison

Emad Elbeltagi, Tarek Hegazy, Donald Grierson (2005) Comparison among five evolutionary-based optimization algorithms. Advanced Engineering Informatics 19, 43-53.


Result of the comparison

. . . and the winner is

(check back for the next competition)


Example: Power Generation Expansion Planning

Long-term behaviour of electricity markets

Minimize the total investment and the operating cost of the generating units

Meet the demand criteria, fuel mix ratio, and the reliability criteria

Highly constrained, nonlinear, discrete optimization problem

Solution through complete enumeration in the entire planning horizon

System dynamics models for system behaviour: Detailed relationships between the main variables of the system with explicit recognition of feedbacks and delays.

S. Kannan, S. Mary Raja Slochanal, and Narayana Prasad Padhy: Application and Comparison of Metaheuristic Techniques to Generation Expansion Planning Problem. IEEE Transactions on Power Systems 20:1, 2005.


Another competition

Algorithms:

Genetic algorithm

Differential evolution

Evolutionary programming

Evolution strategies

Ant colony optimization

Particle swarms

Tabu search

Simulated annealing

Hybrid approach (GA + direct search in linear span)


Medium term results

(Cost, # fitness evaluations, # generations, error range, success rate, execution time)


Long term results


Conclusions

Tuning of all algorithms by generic methods

virtual mapping procedure (e.g. n1 × type A power station, n2 × type B: use a variable n that Cantor-enumerates the array formed by the pairs (n1, n2))

intelligent initial population generation (does not observe constraints but meets the demand plus a reserve margin)

penalty factor approach (constraint violations are penalised, but the violating solutions are not discarded)

Dynamic programming (DP) is optimal when computable

Hybrid approach wins!

Among the others: DE is best (perhaps because 4 vectors are "crossed over" instead of 2 in GA etc.)
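Two of the generic tuning methods listed above lend themselves to a short sketch. The first part shows one way to read the virtual mapping: the Cantor pairing function enumerates all pairs (n1, n2) of non-negative integers by a single variable n. Whether the cited study uses exactly this enumeration is an assumption. The second part shows a penalty factor in its simplest additive form; the function name, the penalty weight and the sign convention (a positive entry means a violated constraint) are hypothetical.

```python
import math

def cantor_pair(n1, n2):
    """Enumerate the pair (n1, n2) of non-negative integers as a single n."""
    return (n1 + n2) * (n1 + n2 + 1) // 2 + n2

def cantor_unpair(n):
    """Inverse of cantor_pair: recover (n1, n2) from n."""
    s = (math.isqrt(8 * n + 1) - 1) // 2    # diagonal index, s = n1 + n2
    n2 = n - s * (s + 1) // 2
    return s - n2, n2

def penalised_cost(cost, violations, penalty=1000.0):
    """Add a penalty for violated constraints instead of rejecting the solution.

    violations: constraint values g_k, with g_k > 0 meaning "violated".
    """
    return cost + penalty * sum(max(0.0, g) for g in violations)

n = cantor_pair(2, 3)         # e.g. 2 units of type A, 3 units of type B
print(n, cantor_unpair(n))
```

With such a mapping the optimiser can work on one integer per unit-type pair instead of two, while infeasible candidates stay in the population at a higher cost.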

