sintetizando los algoritmos evolutivos: algoritmos de ... › aepia › files › evia ›...

Sintetizando los Algoritmos Evolutivos: Algoritmos de Estimación de Distribuciones

Sintetizando los Algoritmos Evolutivos:Algoritmos de Estimación de Distribuciones

Jose A. Lozano

Intelligent Systems GroupThe University of the Basque Country

EVIA 2014(A Coruña, September 3rd, 2014)

Outline of the presentation

1 Estimation of Distribution Algorithms: Introduction

2 Protein Folding with EDAs

3 Miscellaneous

4 Conclusions

EDAs: Introduction

3 Miscellaneous

4 Conclusions

EDAs: Introduction

Evolutionary Computation

Genetic Algorithms (GAs)Based on mimicking natural evolutionKeep a population of solutions at each stepUse of reproduction operators: crossover and mutationMathematically modeled as a Markov chain

EDAs: Introduction

Estimation of Distribution Algorithms (EDAs): Thebasics

Motivation: Drawbacks of Genetic AlgorithmsToo much heuristic algorithmsMany parameters to set upDifficult to define good crossover and mutation operatorsBad results in some trivial problemsLack of rigorous mathematical analysis

Basic CharacteristicsDelete the reproduction operators of GAs:

Learn a probability distribution from the selected individualsSample the probability distribution to obtain the newpopulation

EDAs: Introduction

Estimation of Distribution Algorithms (EDAs): Thebasics

Motivation: Drawbacks of Genetic AlgorithmsToo much heuristic algorithmsMany parameters to set upDifficult to define good crossover and mutation operatorsBad results in some trivial problemsLack of rigorous mathematical analysis

Basic CharacteristicsDelete the reproduction operators of GAs:

Learn a probability distribution from the selected individualsSample the probability distribution to obtain the newpopulation

EDAs: Introduction

Optimization of OneMax with EDAs

max h(x) =6∑

with xi = 0,1

EDAs: Introduction

max h(x) =6∑

with xi = 0,1

X1 X2 X3 X4 X5 X6 h(x)1 1 0 1 0 1 0 32 0 1 0 0 1 0 23 0 0 0 1 0 0 14 1 1 1 0 0 1 45 0 0 0 0 0 1 16 1 1 0 0 1 1 47 0 1 1 1 1 1 58 0 0 0 1 0 0 19 1 1 0 1 0 0 3

10 1 0 1 0 0 0 211 1 0 0 1 1 1 412 1 1 0 0 0 1 313 1 0 1 0 0 0 214 0 0 0 0 1 1 215 0 1 1 1 1 1 516 0 0 0 1 0 0 117 1 1 1 1 1 0 518 0 1 0 1 1 0 319 1 0 1 1 1 1 520 1 0 1 1 0 0 3

EDAs: Introduction

max h(x) =6∑

with xi = 0,1

X1 X2 X3 X4 X5 X6 h(x)1 1 0 1 0 1 0 32 0 1 0 0 1 0 23 0 0 0 1 0 0 14 1 1 1 0 0 1 45 0 0 0 0 0 1 16 1 1 0 0 1 1 47 0 1 1 1 1 1 58 0 0 0 1 0 0 19 1 1 0 1 0 0 3

10 1 0 1 0 0 0 211 1 0 0 1 1 1 412 1 1 0 0 0 1 313 1 0 1 0 0 0 214 0 0 0 0 1 1 215 0 1 1 1 1 1 516 0 0 0 1 0 0 117 1 1 1 1 1 0 518 0 1 0 1 1 0 319 1 0 1 1 1 1 520 1 0 1 1 0 0 3

EDAs: Introduction

Learning the probability distribution

X1 X2 X3 X4 X5 X61 1 0 1 0 1 04 1 1 1 0 0 16 1 1 0 0 1 17 0 1 1 1 1 1

11 1 0 0 1 1 112 1 1 0 0 0 115 0 1 1 1 1 117 1 1 1 1 1 018 0 1 0 1 1 019 1 0 1 1 1 1

EDAs: Introduction

X1 X2 X3 X4 X5 X61 1 0 1 0 1 04 1 1 1 0 0 16 1 1 0 0 1 17 0 1 1 1 1 1

11 1 0 0 1 1 112 1 1 0 0 0 115 0 1 1 1 1 117 1 1 1 1 1 018 0 1 0 1 1 019 1 0 1 1 1 1

p(x) = p(x1, . . . , x6) = p(x1) · p(x2) · p(x3) · p(x4) · p(x5) · p(x6)

EDAs: Introduction

X1 X2 X3 X4 X5 X61 1 0 1 0 1 04 1 1 1 0 0 16 1 1 0 0 1 17 0 1 1 1 1 1

11 1 0 0 1 1 112 1 1 0 0 0 115 0 1 1 1 1 117 1 1 1 1 1 018 0 1 0 1 1 019 1 0 1 1 1 1

p(X1 = 1) =7

EDAs: Introduction

X1 X2 X3 X4 X5 X61 1 0 1 0 1 04 1 1 1 0 0 16 1 1 0 0 1 17 0 1 1 1 1 1

11 1 0 0 1 1 112 1 1 0 0 0 115 0 1 1 1 1 117 1 1 1 1 1 018 0 1 0 1 1 019 1 0 1 1 1 1

p(X1 = 1) =7

10p(X2 = 1) =

10p(X3 = 1) =

10p(X4 = 1) =

10p(X5 = 1) =

10p(X6 = 1) =

EDAs: Introduction

X1 X2 X3 X4 X5 X61 1 0 1 0 1 04 1 1 1 0 0 16 1 1 0 0 1 17 0 1 1 1 1 1

11 1 0 0 1 1 112 1 1 0 0 0 115 0 1 1 1 1 117 1 1 1 1 1 018 0 1 0 1 1 019 1 0 1 1 1 1

p(X1 = 1) =7

10p(X2 = 1) =

10p(X3 = 1) =

10p(X4 = 1) =

10p(X5 = 1) =

10p(X6 = 1) =

EDAs: Introduction

p(X1 = 1) =7

10p(X2 = 1) =

10p(X3 = 1) =

10p(X4 = 1) =

10p(X5 = 1) =

10p(X6 = 1) =

EDAs: Introduction

p(X1 = 1) =7

10p(X2 = 1) =

10p(X3 = 1) =

10p(X4 = 1) =

10p(X5 = 1) =

10p(X6 = 1) =

p(X1 = 1, X2 = 0, X3 = 1, X4 = 0, X5 = 1, X6 = 1) =

EDAs: Introduction

p(X1 = 1) =7

10p(X2 = 1) =

10p(X3 = 1) =

10p(X4 = 1) =

10p(X5 = 1) =

10p(X6 = 1) =

p(X1 = 1, X2 = 0, X3 = 1, X4 = 0, X5 = 1, X6 = 1) =

p(X1 = 1) · p(X2 = 0) · p(X3 = 1) · p(X4 = 0) · p(X5 = 1) · p(X6 = 1) =

EDAs: Introduction

p(X1 = 1) =7

10p(X2 = 1) =

10p(X3 = 1) =

10p(X4 = 1) =

10p(X5 = 1) =

10p(X6 = 1) =

p(X1 = 1, X2 = 0, X3 = 1, X4 = 0, X5 = 1, X6 = 1) =

p(X1 = 1) · p(X2 = 0) · p(X3 = 1) · p(X4 = 0) · p(X5 = 1) · p(X6 = 1) =

EDAs: Introduction

p(X1 = 1) =7

10p(X2 = 1) =

10p(X3 = 1) =

10p(X4 = 1) =

10p(X5 = 1) =

10p(X6 = 1) =

p(X1 = 1, X2 = 0, X3 = 1, X4 = 0, X5 = 1, X6 = 1) =

p(X1 = 1) · p(X2 = 0) · p(X3 = 1) · p(X4 = 0) · p(X5 = 1) · p(X6 = 1) =

↓ ↓ ↓ ↓ ↓ ↓

710 · 3

10 · 610 · 4

10 · 810 · 7

EDAs: Introduction

Obtaining the new population

0.23 p(X1 = 1) = 710 −→ 1

0.45 p(X2 = 1) = 710 −→ 1

0.89 p(X3 = 1) = 610 −→ 0

0.12 p(X4 = 1) = 610 −→ 1

0.98 p(X5 = 1) = 810 −→ 0

0.54 p(X6 = 1) = 710 −→ 1

EDAs: Introduction

0.23 p(X1 = 1) = 710 −→ 1

0.45 p(X2 = 1) = 710 −→ 1

0.89 p(X3 = 1) = 610 −→ 0

0.12 p(X4 = 1) = 610 −→ 1

0.98 p(X5 = 1) = 810 −→ 0

0.54 p(X6 = 1) = 710 −→ 1

EDAs: Introduction

0.23 p(X1 = 1) = 710 −→ 1

0.45 p(X2 = 1) = 710 −→ 1

0.89 p(X3 = 1) = 610 −→ 0

0.12 p(X4 = 1) = 610 −→ 1

0.98 p(X5 = 1) = 810 −→ 0

0.54 p(X6 = 1) = 710 −→ 1

EDAs: Introduction

0.23 p(X1 = 1) = 710 −→ 1

0.45 p(X2 = 1) = 710 −→ 1

0.89 p(X3 = 1) = 610 −→ 0

0.12 p(X4 = 1) = 610 −→ 1

0.98 p(X5 = 1) = 810 −→ 0

0.54 p(X6 = 1) = 710 −→ 1

EDAs: Introduction

0.23 p(X1 = 1) = 710 −→ 1

0.45 p(X2 = 1) = 710 −→ 1

0.89 p(X3 = 1) = 610 −→ 0

0.12 p(X4 = 1) = 610 −→ 1

0.98 p(X5 = 1) = 810 −→ 0

0.54 p(X6 = 1) = 710 −→ 1

EDAs: Introduction

X1 X2 X3 X4 X5 X6 h(x)1 1 1 0 1 0 1 52 1 0 1 0 1 1 43 1 1 1 1 1 0 54 0 1 0 1 1 1 45 1 1 1 1 0 1 56 1 0 0 1 1 1 47 0 1 0 1 1 0 38 1 1 1 0 1 0 49 1 1 1 0 0 1 4

10 1 0 0 1 1 1 411 1 1 0 0 1 1 412 1 0 1 1 1 0 413 0 1 1 0 1 1 414 0 1 1 1 1 0 415 1 1 1 1 1 1 616 0 1 1 0 1 1 417 1 1 1 1 1 0 518 0 1 0 0 1 0 219 0 0 1 1 0 1 320 1 1 0 1 1 1 5

EDAs: Introduction

Pseudocode for EDAs

Obtain an initial population of individuals D0

Repeat until a stopping criterion is met

Select from Di a subset of individuals DSi

Learn a probability distribution pi(x) from DSi

Sample pi(x) to obtain Di+1/2

Create the new population Di+1 from Di and Di+1/2

EDAs: Introduction

What components should be set in EDAs?

Population sizeSelection operatorProbabilistic modelLearning algorithmSampling algorithm

EDAs: Introduction

Classification of EDAs

Two criteria to classify EDAsKeep the structure of the probabilistic model fixedClassification based on the complexity of the modelstructure

EDAs: Introduction

EDAs with fixed structure

The structure of the probabilistic model keeps fixed duringthe searchAt each generation only parametric learning is carried outThe fixed probabilistic model tries to mimic the structure ofthe function to optimizeMost of the work has been done in the optimization ofadditive decomposable functions

EDAs: Introduction

EDAs that change the structure

Classification based on the model structureModify the structure ((in)dependences) of the probabilitymodel at each iteration:

1 Univariate model: all the variables are independent (UMDA,PBIL, CGA,...)(*)

2 Bivariate model: consider second-order statistics (TREE,MIMIC)

3 Unrestricted model: use of Bayesian networks (EBNA,LFDA, BOA)

EDAs: Introduction

No dependencies

Univariate Marginal Distribution Algorithm (UMDA)

Probabilistic model: pl(x) =∏n

i=1 pl(xi)

Parameter learning: maximum likelihood (frequencycounts)

Probabilistic Based Incremental Learning (PBIL)Probabilistic model: same as UMDAParameter learning: maximum likelihood (frequencycounts)Rule to modify the probabilistic model:

pl+1(x) = (1− α)pl(x) + α1N

N∑k=1

x lk :M

EDAs: Introduction

Bivariate model

Probabilistic model: p(x) =∏n

i=1 p(xi |xj(i))

Structural learning: Chow and Liu algorithmParametric learning: Maximum likelihoodSampling: Probabilistic logic sampling

EDAs: Introduction

Chow and Liu algorithm

i) For each Xi , Xj compute:

MI(Xi ,Xj) =∑xi ,xj

p(xi , xj) logp(xi , xj)

p(xi)p(xj)X1

EDAs: Introduction

p(xi)p(xj)X1

EDAs: Introduction

p(xi)p(xj)

ii) Assign the two edges with the highestMI to the tree

EDAs: Introduction

p(xi)p(xj)

EDAs: Introduction

p(xi)p(xj)

EDAs: Introduction

p(xi)p(xj)

iii) Repeat until there are n − 1 edges inthe tree:Assign the edge with the highest MIvalue to the tree if it does not form aloop, otherwise delete it

EDAs: Introduction

p(xi)p(xj)

EDAs: Introduction

p(xi)p(xj)

EDAs: Introduction

p(xi)p(xj)

EDAs: Introduction

p(xi)p(xj)

iv) Give orientation to the edges bychoosing a node as the root

EDAs: Introduction

p(xi)p(xj)

iv) Give orientation to the edges bychoosing a node as the root

EDAs: Introduction

p(x1, x2, x3, x4) = p(x1|x3) · p(x2|x3) · p(x3) · p(x4|x3)

EDAs: Introduction

Parametric learning in TREE