

Recombinative EMCMC algorithms

Madalina M. Drugan
Computer Science Dept., Utrecht University
Padualaan 14, 3584CH Utrecht, The Netherlands
[email protected]

Dirk Thierens
Computer Science Dept., Utrecht University
Padualaan 14, 3584CH Utrecht, The Netherlands
[email protected]

Abstract- Evolutionary Markov Chain Monte Carlo (EMCMC) is a class of algorithms obtained by merging Markov chain Monte Carlo algorithms with evolutionary computation methods. EMCMC integrates techniques from the EC framework (population, recombination and selection) into the MCMC framework to increase the performance of the standard MCMC algorithms. In this paper, we show how to use recombination operators in EMCMC and how to combine them with other existing MCMC techniques (e.g. mutation and selection). We illustrate these principles by means of an example.

1 Introduction

Markov Chain Monte Carlo (MCMC) algorithms provide a framework for sampling from complicated target distributions that cannot be sampled with simpler, distribution-specific, methods. In its standard form, MCMC uses a single chain which runs for a long time. MCMCs that converge fast to the target distribution are said to mix well (note that this is unrelated to the mixing of building blocks in the EC literature). In an attempt to improve the convergence rate, some MCMC variants work with a population of MCMCs. In this paper, we investigate how to use recombination operators in population-based MCMCs, such that the algorithm converges to the target distribution.

The Markov Chain Monte Carlo Framework. MCMC is a general framework to generate samples X_t from a probability distribution P(·) while exploring its search space Ω(X) using a Markov chain. MCMC does not sample directly from P(·), but only requires that the density P(X) can be evaluated within a multiplicative constant, P(X) = P'(X)/Z, where Z is a normalization constant and P'(·) is the unnormalized target distribution. A Markov chain is a discrete-time stochastic process {X_0, X_1, ...} with the property that the state X_t, given all previous values {X_0, X_1, ..., X_{t-1}}, only depends on X_{t-1}: P(X_t | X_0, X_1, ..., X_{t-1}) = P(X_t | X_{t-1}). We call P(·|·) the transition matrix of the Markov chain. P(·|·) is a stationary - that is, independent of time t - transition matrix (or distribution) with the following properties: (i) all the entries are non-negative, and (ii) the entries in each row sum to 1. We assume that P(·) > 0.

MCMC converges, in infinite time, to the probability distribution P(·); thus it samples with higher probability from the more likely states of P(·). A finite-state MCMC which has an irreducible and aperiodic stationary transition matrix converges to a unique stationary distribution [1]. An MCMC chain is irreducible if, and only if, every state of the chain can be reached from every other state in a finite number of steps. An MCMC is aperiodic if, and only if, there exist no cycles to be trapped in. A sufficient, but not necessary, condition to ensure that P(·) is the stationary target distribution is that the MCMC satisfies the detailed balance condition [1]. An MCMC satisfies the detailed balance condition if, and only if, the probability to move from X_t to X_n multiplied by the probability to be in X_t equals the probability to move from X_n to X_t multiplied by the probability to be in X_n: P(X_n | X_t) · P(X_t) = P(X_t | X_n) · P(X_n).

Metropolis-Hastings algorithms. Many MCMC algorithms with detailed balance are Metropolis-Hastings (MH) algorithms [4, 8]. Since we cannot sample directly from P(·), MH algorithms consider a simpler stationary distribution S(·|·), called the proposal distribution, for sampling the next state of the chain. S(X_n | X_t) generates the candidate state X_n from the current state X_t, and the new state X_n is accepted with probability:

A(X_n | X_t) = min( 1, [P'(X_n) · S(X_t | X_n)] / [P'(X_t) · S(X_n | X_t)] )

If the candidate state is accepted, the next state becomes X_{t+1} = X_n; otherwise, X_{t+1} = X_t. For finite search spaces, the transition probability for arriving in X_n when the current state is X_t ≠ X_n is T(X_n | X_t) = S(X_n | X_t) · A(X_n | X_t), and T(X_t | X_t) = 1 - Σ_{Y ≠ X_t} S(Y | X_t) · A(Y | X_t) otherwise. A MH algorithm is aperiodic, since the chain can remain in the same state with a probability greater than 0, and by construction it satisfies the detailed balance condition, P(X_t) · T(X_n | X_t) = P(X_n) · T(X_t | X_n). If, in addition, the chain is irreducible, then it converges to the stationary distribution P(·). The rate of convergence depends on the relationship between the proposal distribution and the target distribution: the closer the proposal distribution is to the stationary distribution, the faster the chain converges. A popular Metropolis-Hastings algorithm is the Metropolis algorithm. For the Metropolis algorithm, the proposal distribution is symmetrical, S(X_n | X_t) = S(X_t | X_n), and the acceptance rule becomes A(X_n | X_t) = min(1, P'(X_n)/P'(X_t)). Note that, to find these acceptance probabilities, we do not have to compute the proposal probabilities. Because it accepts the candidate states often - and thus the state space is well sampled - the Metropolis rule generally performs well.
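The single-chain Metropolis algorithm described above can be sketched in a few lines of Python. This is our own illustration, not code from the paper; the bit-flip proposal and the names `metropolis_step` and `run_chain` are assumptions for concreteness.

```python
import random

def metropolis_step(x, p_unnorm, p_m=0.1):
    # Propose by flipping each bit independently with probability p_m
    # (a symmetric proposal), then apply the Metropolis acceptance rule
    # min(1, P'(x_n) / P'(x_t)); the proposal probabilities cancel.
    x_n = tuple(b ^ 1 if random.random() < p_m else b for b in x)
    if random.random() < min(1.0, p_unnorm(x_n) / p_unnorm(x)):
        return x_n
    return x

def run_chain(x0, p_unnorm, steps, p_m=0.1, seed=0):
    # Run a single chain and collect the visited states.
    random.seed(seed)
    x, samples = x0, []
    for _ in range(steps):
        x = metropolis_step(x, p_unnorm, p_m)
        samples.append(x)
    return samples
```

For example, with the unnormalized target P'(x) = 2^(number of ones), the chain drifts toward strings with many ones without ever computing the normalization constant Z.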

0-7803-9363-5/05/$20.00 ©2005 IEEE.

Mutation. A popular irreducible proposal distribution is defined with the mutation operator [6, 7, 12], S_m. In a discrete space, S_m randomly changes every value of each variable of the current state with a non-zero probability, called the mutation rate. Assuming a binary space with string length l and a mutation rate p_m > 0, the probability to generate a candidate state X_n from X_t is:

S_m(X_n | X_t) = (1 - p_m)^{l - Δ(X_n, X_t)} · p_m^{Δ(X_n, X_t)}

where Δ(X_n, X_t) is the Hamming distance. The bigger the mutation rate, the bigger the jump in the search space from the parent state to the child state.

Theorem 1 The mutation operator defines an irreducible, symmetric and stationary proposal distribution, S_m.

Proof. S_m is irreducible because from any string there is a non-zero probability to go to any other string, S_m(X_n | X_t) > 0. S_m is also stationary: it is invariant in time, and the sum of the elements on one row is 1, Σ_{Y ∈ Ω(X)} S_m(Y | X_t) = 1. S_m is symmetrical: the probability to generate an individual from another one depends only on the number of positions in which the two individuals coincide. □

In the rest of the paper, we call S_m the mutation proposal distribution. The mutation transition matrix, T_m, proposes candidate individuals with S_m and accepts them with A.

Corollary 1 The mutation transition matrix, T_m, defines an irreducible MH algorithm which converges to the stationary distribution.

Multiple independent chains (MICs). In an attempt to improve the mixing behavior of MCMCs, one could make use of multiple chains that run independently (MICs). The chains are started at different initial states and their output is observed at the same time. It is hoped that this way a more reliable sampling of the target distribution P(·) is obtained. It is important to note that no information exchange between the chains takes place. Recommendations in the literature are conflicting regarding the efficiency of multiple independent chains. Yet there are at least theoretical advantages of multiple independent chains MCMC for establishing its convergence to P(·) [3]. Since the chains do not interact, a MIC is, at population level, an MCMC with transition probabilities equal to the product of the component chains' transition probabilities. Calling X_t = (X_t[1], ..., X_t[N]) the population at time t of N states (or individuals) X_t[·], then

T(X_{t+1} | X_t) = ∏_{i=1}^{N} T(X_{t+1}[i] | X_t[i])

where X_{t+1} is the next population. If the MCMCs have detailed balance, are irreducible and aperiodic, then the MIC inherits these properties and it converges, at population level, to the product of their target distributions, ∏_{i=1}^{N} P_i(·), where P_i(·) is the target distribution of the i-th chain. MICs using the mutation operator are used in EMCMCs [6, 12].
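A MIC step, advancing N chains with no information exchange, might be sketched as follows (our own illustration, not code from the paper; each per-chain kernel here is a Metropolis step with a bit-flip proposal):

```python
import random

def flip_proposal(x, p_m):
    # Symmetric bit-flip mutation proposal.
    return tuple(b ^ 1 if random.random() < p_m else b for b in x)

def mic_step(pop, p_unnorm, p_m=0.1):
    # Advance every chain independently with one Metropolis step; the
    # population-level transition probability is then the product of
    # the per-chain transition probabilities.
    new_pop = []
    for x in pop:
        x_n = flip_proposal(x, p_m)
        if random.random() < min(1.0, p_unnorm(x_n) / p_unnorm(x)):
            new_pop.append(x_n)
        else:
            new_pop.append(x)
    return new_pop
```

Because the loop body never reads another chain's state, detailed balance, irreducibility and aperiodicity of the component chains carry over to the population level.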

2 Recombinative EMCMCs

MICs are population MCMCs with non-interacting chains; each chain in a MIC is itself an MCMC. EMCMCs, by contrast, allow interactions between the individual chains.

Definition 1 An evolutionary Markov chain Monte Carlo (EMCMC) algorithm is a population MCMC that exchanges information between individual states such that, at population level, the EMCMC is an MCMC.

Note that, in EMCMCs, the population is a multi-set of individual states rather than a collection of MCMCs: the current individual states depend on several states from the previous population. We call the i-th chain the sequence of the i-th individual states in the populations of an EMCMC, X_0[i], ..., X_t[i], ...

In this paper, we focus on EMCMCs that, like MICs, converge to the target distribution ∏_{i=1}^{N} P_i(·). Detailed balance is a sufficient, but not necessary, condition for an irreducible aperiodic EMCMC to converge to this target distribution. Most of the existing EMCMCs [2] are irreducible MH algorithms - by use of mutation - and apply recombination in the proposal distribution. We call EMCMCs that use recombination to exchange information between individuals recombinative EMCMCs. We consider recombination operators that are respectful - that is, the common substructures of the parents are inherited by the offspring [9]. In Section 4, we show that to obtain detailed balance, the individuals that interact through recombination also need to interact in the acceptance rule.

Definition 1 is similar to, but more general than, the definition of population MCMCs (popMCMC) from Laskey and Myers [5] and adaptive direction sampling (ADS) from Gilks and Roberts [3]. The main differences with the EMCMC framework are: (i) EMCMCs allow proposing more than one individual for the next generation, and (ii) popMCMC and ADS, like MICs, always converge to N independent sample points from the target distributions, whereas in EMCMC we can consider any target distribution. We describe the popMCMC algorithm in Section 5.

After introducing the basic concepts and algorithms of MCMC in the previous section, in the next section we discuss the properties of different types of recombination operators in the EMCMC framework. Recombination operators can be either symmetrical or non-symmetrical. For symmetrical proposal distributions, we can use a Metropolis algorithm, which is computationally advantageous because the proposal probabilities need not be calculated. Note that some non-symmetrical proposal distributions, however, can speed up the mixing of an EMCMC (e.g. the popMCMC [5] described in Section 5). Section 4 shows how to combine the recombination operator with several acceptance rules to obtain detailed balance. Section 5 outlines how to design various recombinative EMCMCs. In general, not every state (or individual) can be generated by recombination alone; therefore an EMCMC needs the addition of a mutation operator. We study possible combinations of recombination and mutation: how to combine them (e.g. adding or multiplying the effect of these operators) and when to combine them (e.g. before or after the acceptance rule). Using the MCMC formalism and the mathematical expressions of the operators, we formalize a number of EMCMC algorithms we have found in the literature [6, 5, 7, 12]. In Section 6 we define a small example function and show analytically that recombination improves the efficiency of MCMC but also the time to arrive at the optimum. Section 7 concludes the paper.

3 Recombination in EMCMCs

In the following, we investigate properties of recombination operators in EMCMCs. We restrict our attention to respectful recombinations [9]. Here, we consider an individual as a string of l characters, where the position of X[·][j] in X[·] is called the locus of X[·][j]. We use X[·][j]_a to denote the a-th value in Ω(X[·][j]), the set of all possible values of X[·][j]. We call X[·][j]_a an allele of X[·][j].

Recombination is the interaction between two or more parents in a search space that generates individuals in the same search space such that, if at a certain position the parents have identical alleles, then the children have the same alleles at that position. Thus, with recombination, the common parts of strings are protected against disruption.

Theorem 2 The respectful recombination operators are reducible.

Proof. From a population of individuals with identical alleles on a locus, the probability to go to some population with another allele on that locus is 0. □

In Table 1 we present the operators composed from mutation and/or recombination, their irreducibility, their symmetry, and their number of parents compared with the number of children.

Given the number of chains which interact, we distinguish between family and population recombination. Recombining few chains (e.g. two or three chains) is an example of the first approach, while in the latter all chains from the population exchange information. We assume that, for family recombination, in each generation the population is grouped into families such that each individual belongs to exactly one family. All the chains of an EMCMC will eventually recombine. At individual and at family level, the proposal probabilities of recombination are not stationary, since they depend on the family members with which the individuals are grouped. At population level, the probability to generate one population from another with recombination is stationary. We call the distribution defined by the recombination probabilities at population level the recombination proposal distribution, and denote it by S_r. Recombination operators generate one or more children.

It is important to establish whether a recombination operator is symmetrical or not: for non-symmetrical recombinations, we have to compute the proposal probabilities, whereas for symmetrical operators we can use the Metropolis algorithm. In the existing EMCMCs, symmetry is obtained by preserving the Hamming distance between the parents and their children. For example, parts of individuals are exchanged between parents, as in the uniform crossover [6, 12], or the distance between a parent and its child is constant compared with the distance between two other individuals in the population, as in the xor crossover [12]. We show two types of non-symmetric

Table 1: Several mutation/recombination operators

type      operator                          irreducibility  symmetry   par/child
mut       S_m                               irred           symm       1/1
recomb    unif      S_r,u                   red             symm       2/2
          xor       S_r,x                   red             symm       3/1
          masked    S_r,s                   red             non-symm   2/1
          pop freq  S_r,p                   red             non-symm   N/1
mixture   S_{m+r} = 0.5 S_m + 0.5 S_r       irred           symm       2/2
cycle     S_{rm,u} = S_m × S_{r,u}          irred           symm       2/2
          S_{rm,x}                          irred           non-symm   3/3
          S_{rm,s}                          irred           non-symm   2/2
          S_{mr,p}                          irred           non-symm   N/1

recombinations, the masked crossover and the population-frequencies crossover [5], closely related to the symmetric ones, but which do not preserve the Hamming distance. All the presented recombination proposal distributions are stationary. In the next paragraphs, we discuss examples of symmetric and non-symmetric recombination types.

Swapping alleles. This type of recombination is often used in EMCMCs, similar to standard GAs.

Proposition 1 The recombination proposal distributions which swap parts of individuals between chains are reducible, symmetrical and stationary. Furthermore, the Hamming distance between parents and children is constant.

Proof. Since there are equal probabilities to swap alleles (parts) between the parents and between the children, this recombination is symmetrical and the Hamming distance remains equal. □

There are recombinations which exchange non-common alleles, e.g. uniform crossover [7, 6], or parts of individuals, e.g. 1- and 2-point crossover [6, 7]. In the literature, this recombination is often used as family recombination.

An example of such a recombination is parameterized uniform crossover, S_r,u, which generates two candidate individuals by swapping alleles between two parents with a uniform probability p_m. Thus, it is impossible to generate children that have common alleles other than those of their parents. Where the two parents differ, an allele is swapped with probability p_m and is not swapped with probability 1 - p_m. The probability to generate X_n[i] and X_n[j] is p_m^{Δ(X_n[i], X_t[i])} · (1 - p_m)^{Δ(X_t[i], X_t[j]) - Δ(X_n[i], X_t[i])}, where Δ(X_t[i], X_t[j]) - Δ(X_n[i], X_t[i]) is the number of alleles common to X_n[i] and X_t[i] but not common to X_t[i] and X_t[j]. For p_m = 1/2, the operator is called uniform crossover and is used with all codings: for strings of bits [6] and for strings of real numbers [3].
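Parameterized uniform crossover, and the invariants claimed above (respectfulness and preservation of the Hamming distance between the two children), can be sketched as follows (our own illustration; the name `uniform_crossover` is ours):

```python
import random

def uniform_crossover(x, y, p_m=0.5):
    # Where the parents differ, swap the pair of alleles with
    # probability p_m. Common alleles are never touched (respectful),
    # so the Hamming distance between the two children equals the
    # Hamming distance between the two parents.
    c1, c2 = list(x), list(y)
    for j in range(len(x)):
        if x[j] != y[j] and random.random() < p_m:
            c1[j], c2[j] = c2[j], c1[j]
    return tuple(c1), tuple(c2)
```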

Differential recombination. This type of recombination is imported from real-coded EAs [11] and EMCMCs [13] for binary strings by the xor crossover [12].

Proposition 2 Consider recombination operators where a candidate individual, X_n[i], is generated from three parents from the current population, X_t[i], X_t[j], X_t[o], such that the total distance between the parents, Δ(X_t[i], X_t[j]) + Δ(X_t[i], X_t[o]) + Δ(X_t[j], X_t[o]), is equal to the total distance between the candidate individual and two of the parents. If all parents are selected from the population without replacement, then this operator is reducible, symmetrical and stationary.

Proof. Parent X_t[i] and child X_n[i] are interchangeable; they have the same total distance to the other two parents. Thus, this recombination is symmetrical. □

We propose the total difference crossover, S_r,x. The new individual X_n[i] has the same alleles as X_t[i] on the positions where the two other parents coincide. On the other positions, we mutate the alleles of X_t[i] with probability p_m. We assume that i ≠ j ≠ o. The probability to generate X_n[i] is p_m^{Δ(X_n[i], X_t[i])} · (1 - p_m)^{Δ(X_t[j], X_t[o]) - Δ(X_n[i], X_t[i])}. The xor crossover [12] is a special case of S_r,x where the mutation probability is 1 for the bits of X_t[i] where X_t[j] and X_t[o] disagree.
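A sketch of the total difference crossover on bit tuples (our own code, not the paper's; with p_m = 1 it reduces to the xor crossover, which can be computed directly as x XOR (y XOR z)):

```python
import random

def total_difference_crossover(x, y, z, p_m):
    # Keep x's allele wherever the other two parents y and z agree
    # (respectful); where y and z disagree, flip x's bit with
    # probability p_m. With p_m = 1 this is the xor crossover.
    child = list(x)
    for j in range(len(x)):
        if y[j] != z[j] and random.random() < p_m:
            child[j] ^= 1
    return tuple(child)
```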

The main difference between S_r,x and S_r,u is that S_r,x preserves the sum of Hamming distances between the three parents when generating a child, whereas S_r,u preserves the Hamming distance between two parents when generating two children.

Snooker recombination. We import the snooker recombination from the real-coded EMCMCs [3] into, what we call, the masked crossover for binary strings, S_r,s. A child X_n[i] is generated from a parent, X_t[i], and a mask, X_t[j]. The common alleles of X_t[i] and X_t[j] are passed to X_n[i], and the non-common alleles are flipped in X_t[i] with rate p_m. If the candidate individual does not coincide with the parent in the positions where X_t[i] and X_t[j] coincide, the proposal probability is 0; otherwise the probability to generate X_n[i] is p_m^{Δ(X_n[i], X_t[i])} · (1 - p_m)^{Δ(X_t[i], X_t[j]) - Δ(X_n[i], X_t[i])}. This crossover and the parameterized uniform crossover have the same probability to generate one child. But, since S_r,s generates only one child, in general the Hamming distance is not preserved and the symmetry condition does not hold. Thus, we have to compute the probabilities to generate a candidate individual with S_r,s in the acceptance rule of the MH algorithm. S_r,s also resembles S_r,x where two parents are identical. However, by replacing the identical parents with the child in the candidate generation, the symmetry condition does not hold.

Proposition 3 S_r,s is reducible and stationary. Consider that from a parent X_t[i] and a mask X_t[j] we generate a child X_n[i] with S_r,s, such that X_n[i] ≠ X_t[i]. Since it is impossible to generate X_t[i] from the child and the mask, S_r,s is non-symmetrical.

Proof. Consider that X_t[i] ≠ X_n[i] because bits are flipped on some positions. In those positions, the mask X_t[j] and the child X_n[i] have the same values, whereas X_t[i] and X_t[j] do not. Then, it is impossible to generate X_t[i] from X_n[i] and X_t[j]. □
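The masked crossover and its proposal probability can be sketched as follows (our own illustration; names `masked_crossover` and `s_masked` are ours). The zero-probability reverse move of Proposition 3 shows up directly:

```python
import random

def masked_crossover(x, mask, p_m):
    # Keep x's allele where x and the mask agree; where they differ,
    # move x's bit toward the mask with probability p_m. One child.
    child = list(x)
    for j in range(len(x)):
        if x[j] != mask[j] and random.random() < p_m:
            child[j] = mask[j]
    return tuple(child)

def s_masked(child, x, mask, p_m):
    # Proposal probability of generating `child` from parent `x` and
    # the mask: 0 if the child disagrees with x where x and the mask
    # agree; otherwise p_m^(#bits moved toward the mask) times
    # (1 - p_m)^(#differing bits left unchanged).
    moved = stayed = 0
    for cj, xj, mj in zip(child, x, mask):
        if xj == mj:
            if cj != xj:
                return 0.0
        elif cj == mj:
            moved += 1
        else:
            stayed += 1
    return p_m ** moved * (1 - p_m) ** stayed
```

With x = (0, 0), mask = (1, 1) and child = (1, 0), the forward probability is positive but the reverse probability is 0, so the operator is non-symmetrical.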

Recombination using probabilistic models. This recombination builds a probabilistic model of the parents to generate their children. It is used at population level in EMCMCs [5], but also in EAs, by estimation-of-distribution algorithms.

Laskey and Myers [5] propose a frequencies probabilistic crossover, S_r,p, which generates alleles based on their frequencies in the population. Using S_r,p, an allele X_n[i][j]_a on the j-th locus of the candidate individual X_n[i] is generated with probability N(X_n[i][j]_a)/N, where N(X_n[i][j]_a) is the number of individuals that have the allele X_n[i][j]_a on the j-th locus.

Proposition 4 S_r,p is reducible, non-symmetrical and stationary. Furthermore, eventually, after several steps without an acceptance step, all individuals become the same.

Proof. When an individual is generated with S_r,p and replaces a parent, some allele frequencies can increase to the detriment of others. Then, the probability to generate the current population from the next one is smaller than vice versa, and the probability that the same alleles are generated next increases. □

Laskey and Myers approximate, for infinite populations, a proposal distribution almost independent of the current state and arbitrarily close to the target distribution.
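A population-frequencies crossover in the spirit of S_r,p might look like this (our own sketch; `random.choice` over a column draws each allele with probability equal to its relative frequency at that locus):

```python
import random

def freq_crossover(pop):
    # Generate one candidate: each allele is drawn independently from
    # the distribution of alleles at that locus in the population.
    l = len(pop[0])
    child = []
    for j in range(l):
        column = [ind[j] for ind in pop]
        child.append(random.choice(column))  # frequency-proportional draw
    return tuple(child)
```

If every individual already carries the same allele at some locus, the child is certain to carry it too, which illustrates both reducibility and the fixation behavior of Proposition 4.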

We conclude that, in the existing EMCMCs, only the recombination operators which preserve the Hamming distance when generating candidate individuals are symmetrical. Furthermore, the above non-symmetrical recombination distributions cannot be used directly in MH algorithms: the probability to generate the candidate population from the current one can be 0 even when the reverse probability is non-zero, and then the probability to accept the candidate population is 0. In Section 5, we show how to use them by adding mutation.

4 Detailed balance in recombinative EMCMCs

In the previous section we studied the properties of recombination in EMCMCs. In the following, we investigate how to obtain EMCMCs with detailed balance.

All children accepted/rejected. A common characteristic of the acceptance rules with detailed balance is that the children are all accepted or all rejected. For example, with the coupled acceptance rule A_c proposed by Liang and Wong [6], two parents and two children are considered for acceptance; both children are accepted or both rejected:

A_c(X_n[i], X_n[j] | X_t[i], X_t[j]) =
min( 1, [P'_i(X_n[i]) · P'_j(X_n[j]) · S(X_t[i], X_t[j] | X_n[i], X_n[j])] / [P'_i(X_t[i]) · P'_j(X_t[j]) · S(X_n[i], X_n[j] | X_t[i], X_t[j])] )

where P'_i(·) is the unnormalized target distribution of the i-th chain.

Theorem 3 A MH algorithm where all children generated by the parents are accepted or all rejected has detailed balance.

Proof. Without loss of generality, we assume that from 2 parents, X_t = {X_t[1], X_t[2]}, we generate 2 children, X_n = {X_n[1], X_n[2]}, and that [P'_1(X_n[1]) · P'_2(X_n[2])] / [P'_1(X_t[1]) · P'_2(X_t[2])] < 1. For simplicity, we assume that the proposal distribution is symmetrical. If the target distribution of the two chains, P(·,·), is the product of the target distributions of each chain, P_1(·) · P_2(·), we obtain the coupled acceptance rule, A_c. By rewriting A_c, the transition probability to go from X_t to X_n is T(X_n[1], X_n[2] | X_t[1], X_t[2]) = S(X_n[1], X_n[2] | X_t[1], X_t[2]) · [P'_1(X_n[1]) · P'_2(X_n[2])] / [P'_1(X_t[1]) · P'_2(X_t[2])]. Furthermore, T(X_t[1], X_t[2] | X_n[1], X_n[2]) = S(X_t[1], X_t[2] | X_n[1], X_n[2]) · 1. By replacing these transition probabilities in the detailed balance equation, P(X_t[1], X_t[2]) · T(X_n[1], X_n[2] | X_t[1], X_t[2]) = P(X_n[1], X_n[2]) · T(X_t[1], X_t[2] | X_n[1], X_n[2]), we obtain P(X_t[1], X_t[2]) / [P'_1(X_t[1]) · P'_2(X_t[2])] = P(X_n[1], X_n[2]) / [P'_1(X_n[1]) · P'_2(X_n[2])], which holds since the population target distribution is proportional to P'_1(·) · P'_2(·). □

For example, when A_c is associated with the uniform crossover - which generates two children from two parents - the algorithm has detailed balance [6, 12]. We denote this transition matrix by T_r,u.

Another example is the population acceptance rule A_p [3], where the current population competes against the proposed population. With all recombinations, A_p has detailed balance, because all individuals from the candidate population are accepted or all rejected. However, in practice, such an acceptance rule is not always desired, since it is not selective at the individual level. For example, usually individuals with higher and lower probabilities are proposed together, and they are all accepted or all rejected.

Sometimes the standard acceptance rule is equivalent to the coupled and population acceptance rules.

Proposition 5 A symmetrical recombination distribution which generates one child can use the standard MH acceptance rule A to replace one of the parents with the child.

Proof. Such an acceptance is equivalent to the MH acceptance A(X_n[i] | X_t[i]), where X_n[i] is the generated child, and X_t[i] is the parent candidate for replacement. □

For example, when the total difference recombination is associated with the standard acceptance rule, A, the resulting transition matrix, T_r,x, has detailed balance.

Laskey and Myers [5] use an acceptance rule which re-sembles the A acceptance rule - one generated child againstthe parent that might be replaced - but which computes theproposal probabilities for their non-symmetrical crossoverover the whole population. We consider this a populationacceptance rule.

Some children accepted/rejected. Even though the MH acceptance rules resemble the selection rules from EAs, the first ones are restricted by the detailed balance condition. When only some children are accepted, the resulting transition probability is added to the transition where only the accepted children were generated from the same parents.

Theorem 4 A MH algorithm where some children - not necessarily all - are accepted with MH acceptance rules has detailed balance only for the proposal distributions where the probability to generate all children from all parents is equal with the probability to generate, from the accepted children and the parents of the rejected ones, the rejected children and the parents of the accepted ones.

Proof. Consider, without loss of generality, N = 2 and four individuals Y1, Y2, Y3, Y4 ∈ Ω(X), such that P1(Y3)/P1(Y1) < 1 and P2(Y4)/P2(Y2) < 1, and a MH acceptance rule where each parent competes against a child. For simplicity, we assume that the proposal distribution S(·|·) is symmetrical. We have

AC,1.1(Y3, Y4 | Y1, Y2) = min{1, P1(Y3)/P1(Y1)} · min{1, P2(Y4)/P2(Y2)}

Note that the probabilities to accept or reject both children are equal for AC,1.1 and AC. However, the transition probability to go from Xt = {Y1, Y2} to Xn = {Y3, Y2} is a sum between the probability to accept Xn when generated from Xt, S(Xn|Xt) · AC,1.1(Xn|Xt), and the probability to generate {Y3, Y4} from Xt but accept only Y3, S(Y3, Y4 | Y1, Y2) · (P1(Y3)/P1(Y1)) · (1 − P2(Y4)/P2(Y2)), where 1 − P2(Y4)/P2(Y2) is the rejection probability of Y4 when it competes against Y2. In the sequel, the transition probability to go from Xn to Xt is a sum between the probability to accept Xt when generated from Xn, S(Xt|Xn) · AC,1.1(Xt|Xn), and the probability to generate {Y1, Y4} from Xn but accept only Y1, S(Y1, Y4 | Y3, Y2) · 1 · (1 − P2(Y4)/P2(Y2)). But AC,1.1(Xn|Xt) = AC(Xn|Xt) and AC,1.1(Xt|Xn) = AC(Xt|Xn). Thus, the extra terms of these transitions, as compared with the transitions accepting with AC, have to be equal in the detailed balance equation:

P(Y1, Y2) · (P1(Y3)/P1(Y1)) · (1 − P2(Y4)/P2(Y2)) · S(Y3, Y4 | Y1, Y2) = P(Y3, Y2) · (1 − P2(Y4)/P2(Y2)) · S(Y1, Y4 | Y3, Y2)

Since P(Y1, Y2) · P1(Y3)/P1(Y1) = P(Y3, Y2), we have S(Y3, Y4 | Y1, Y2) = S(Y1, Y4 | Y3, Y2). □

The above condition holds for mutation distributions and for symmetrical recombination distributions that generate one child, situations for which we already established detailed balance. According to Theorem 4, if we generate individuals with uniform recombination and each child competes against one parent in the standard acceptance rule, the resulting algorithm does not have detailed balance: for four individuals Y1, Y2, Y3, Y4 ∈ Ω(X), in general Sr,u(Y3, Y4 | Y1, Y2) ≠ Sr,u(Y1, Y4 | Y3, Y2).

Nested transition distributions. In the following, we propose a method to integrate the transitions without detailed balance in MH algorithms with detailed balance.

Definition 2 A nested MH algorithm is a MH algorithm where individuals are proposed using a transition distribution and then are all accepted or rejected using a MH acceptance rule. A nested transition distribution is the transition distribution used as proposal distribution by the nested MH algorithm.

Proposition 6 The nested MH algorithm has detailed balance.

Proof. The proof is immediate if we consider the nested transition distribution as a proposal distribution. □

Furthermore, if the nested transition distribution uses a recombination operator, then it is itself a recombination distribution.

Proposition 7 The nested transition distribution composed of a recombination proposal distribution and an acceptance rule is itself a recombination proposal distribution.

Proof. By definition, if the parents have identical values on some positions, the individuals generated with recombination have the same values on those positions. An acceptance rule just selects from parents and children, so the accepted individuals also have the same values on those positions. □

Nested transitions are usually non-symmetrical; thus, we need to compute their probabilities.

Detailed balance at population level. Most existing EMCMCs use family recombinations where all individuals are randomly grouped - each individual belongs to exactly one group - each generation. If the children generated with recombination are all accepted or all rejected with an acceptance rule as in Theorem 3, we obtain family transition probabilities with detailed balance. At individual or family level, these transitions are not MCMCs, since their proposal probabilities are not stationary - they depend on how the individuals are grouped. At population level, for all possible groupings of the current population, the transition distribution is stationary. Then, the population transition probabilities obtained by combining the family transitions have detailed balance and define a MCMC. In the sequel, to population recombinations we associate the population acceptance rule Ap, to obtain EMCMCs with detailed balance that converge, at population level, to the target distribution.
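A minimal sketch of a population-level acceptance step in the spirit of Ap, assuming a symmetrical proposal so that the proposal probabilities cancel. The factorized toy target and the function names are illustrative assumptions, not the paper's implementation.

```python
import random

def p_unnorm(x):
    # toy unnormalized target for one individual: favours ones
    return 2.0 ** sum(x)

def pop_accept(current, proposed, rng):
    # population acceptance rule Ap (sketch): the candidate population
    # competes against the current population as a whole, so either every
    # individual is accepted or every one is rejected; the acceptance
    # ratio is the product of the per-individual target ratios
    ratio = 1.0
    for x_new, x_old in zip(proposed, current):
        ratio *= p_unnorm(x_new) / p_unnorm(x_old)
    return ratio >= 1.0 or rng.random() < ratio

current = [[0, 0], [0, 1]]
proposed = [[1, 1], [0, 0]]  # one better and one worse individual
accept_all = pop_accept(current, proposed, random.Random(0))
next_pop = proposed if accept_all else current  # all-or-nothing update
```

Note the behaviour criticized above: a worse individual can ride along with a better one, because the rule is not selective at the individual level.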

5 Recombination in irreducible EMCMCs

In the previous sections we have investigated whether various recombination operators, in combination with some acceptance rule, result in algorithms with detailed balance. Since recombination by definition is reducible, in the following we study how to combine it with mutation to obtain irreducible EMCMCs. Depending on how the mutation and recombination operators are interlaced with acceptance rules, we have successions between: (i) mutation, acceptance and recombination, acceptance - that is, the mutation and recombination transition distributions are combined - or (ii) mutation and recombination, acceptance - that is, the mutation and recombination proposal distributions are combined. We assume that the mutation and recombination transition distributions have the same target distribution. In the existing EMCMCs, distributions are combined following simple mathematical rules, in mixtures and cycles. In Table 2, we write the transition matrices of some irreducible EMCMCs with detailed balance.

Mixtures. Mixtures of transition distributions are common in EMCMCs [6, 12] and were defined by Tierney [14].

Definition 3 A mixture of transition (or proposal) distributions is a sum of transition (or proposal) distributions where at each step one distribution is selected according to some constant positive probability.

Mixtures of transitions have some remarkable properties.

Theorem 5 (Tierney [14]) In a mixture of MH algorithms, if one of the algorithms is irreducible, then the mixture is irreducible and converges to the target distribution.

Proof. Since the irreducible MH algorithm goes in several steps from any individual to any other, so does the mixture. □

For example, Strens, in his discrete EMCMC (DeMCMC) [12], uses a mixture with three components: uniform crossover with probability Pr, xor crossover with probability Px, and mutation with probability 1 − Pr − Px:

TDeMCMC = Pr · Tr,u + Px · Tr,x + (1 − Pr − Px) · Tm

For 1 − Pr − Px > 0, DeMCMC is irreducible and converges to the target distribution.

The mixtures of proposal distributions have similar properties with the mixtures of transition distributions.

Theorem 6 Consider a mixture of proposal distributions. If one distribution is irreducible, so is the mixture. If all distributions are symmetrical and stationary, so is the mixture.

Proof. Same reasons as for the previous theorem. □

For example, the following mixture

Sm+r = Pr · Sr + (1 − Pr) · Sm

is irreducible when Pr < 1, and symmetrical when the recombination is symmetrical. The above operator is equivalent to recombination for Pr = 1, Sm+r = Sr - then it is reducible. For Pr < 1, an MH algorithm that accepts with the coupled acceptance rule the individuals generated by Sm+r is irreducible, has detailed balance, and converges to the target distribution.
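The mixture Sm+r can be sketched as follows; the bit-flip mutation, the choice of uniform crossover, and the mate-selection scheme are illustrative assumptions.

```python
import random

def mutate(x, pm, rng):
    # bit-flip mutation: an irreducible proposal for any 0 < pm < 1
    return [1 - b if rng.random() < pm else b for b in x]

def uniform_crossover(a, b, rng):
    # symmetrical recombination generating one child
    return [ai if rng.random() < 0.5 else bi for ai, bi in zip(a, b)]

def mixture_proposal(pop, i, pr, pm, rng):
    # Sm+r = pr * Sr + (1 - pr) * Sm: one component is chosen per step
    # with a constant probability; the mixture stays irreducible as long
    # as pr < 1, because the mutation component can reach any state
    if rng.random() < pr:
        j = rng.randrange(len(pop))  # pick a mate for recombination
        return uniform_crossover(pop[i], pop[j], rng)
    return mutate(pop[i], pm, rng)
```

For pr = 1 the mutation component disappears and the proposal is reducible, matching the remark above.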

Cycles. Cycles are common techniques to combine distributions in EMCMCs [6, 5] but also in EAs. Cycles of transition distributions were defined by Tierney [14].

Definition 4 A cycle of transition (or proposal) distributions is the product of transition (or proposal) distributions where in each step one distribution is used in turn, and when the last distribution is used, the cycle is restarted.

Unlike for mixtures, for cycles there are no general rules for irreducibility and detailed balance (or symmetry). They have to be checked for each cycle.

Theorem 7 (Tierney [14]) For finite spaces, if one of the transition distributions in a cycle is an irreducible MH algorithm, then the cycle converges to its target distribution.

Cycles of proposal distributions are common for the standard GAs, where one considers first mutation and then recombination, Smr, or first recombination and then mutation, Srm:

Smr = Sr x Sm ; Srm = Sm x Sr

In general, since two matrices usually do not commute, Smr and Srm are non-symmetrical.

Proposition 8 Smr and Srm are irreducible, stationary, and equivalent if recombination is symmetrical. They are symmetrical for any recombination that swaps alleles [7]. They are non-symmetrical for the total difference crossover.

Proof. They are irreducible because they have non-zero probabilities to go from any population to any other, Smr > 0 and Srm > 0. If recombination is symmetrical then

Smr(Xn | Xt) = Σ_{X' ∈ Ω(X)} Sm(X' | Xt) · Sr(Xn | X') = Σ_{X' ∈ Ω(X)} Sm(Xt | X') · Sr(X' | Xn) = Srm(Xt | Xn)

where Ω(X) is the search space. Srm and Smr are symmetrical for recombinations that swap alleles: mutation generates the alleles which differ in the two populations and recombination swaps them, or vice-versa.
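A sketch of the cycle Smr under these assumptions: bit-flip mutation followed by an allele-swapping recombination at a single uniformly chosen position. The operator and function names are illustrative.

```python
import random

def mutate(x, pm, rng):
    # per-bit flip with probability pm
    return [1 - b if rng.random() < pm else b for b in x]

def swap_alleles(a, b, rng):
    # symmetrical recombination: swap the alleles of the two parents at
    # one uniformly chosen position
    k = rng.randrange(len(a))
    a2, b2 = list(a), list(b)
    a2[k], b2[k] = b[k], a[k]
    return a2, b2

def cycle_mr(pair, pm, rng):
    # Smr = Sr x Sm: apply mutation first, then recombination; applying
    # them in the opposite order gives Srm, and in general the two differ
    # because the component operators do not commute
    a, b = pair
    a, b = mutate(a, pm, rng), mutate(b, pm, rng)
    return swap_alleles(a, b, rng)
```

Because the swap only exchanges alleles between the two parents, with the mutation rate set to zero the multiset of alleles at every position is preserved, which is the intuition behind the symmetry claim above.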

By means of an example, we prove that Srm,x is non-symmetrical. Consider Xt = {0, 1, 0}, Xn = {1, 1, 1}, a mutation rate of 1/3 and, for simplicity, the xor crossover. With the xor crossover, given the distance Δ(0, 1), we generate 1 from 0 - the intermediate population is Y = {1, 1, 0} - and given Δ(0, 0), we generate 1 from 1. We have Sm(Y | Xt) = (2/3)^2 · (1/3). Summing over all possible parent choices for the xor crossover gives Srm,x(Xn | Xt) ≠ Srm,x(Xt | Xn); thus Srm,x is non-symmetrical. □


Table 2: Several EMCMCs with detailed balance

algorithm       transition
PRSA [7]        Srm,u · Ac
popMCMC [5]     Smr,p · Ap
EMC [6]         Tr,S x ((1 − Pr) · Tm + Pr · Tr,u)
DeMCMC [12]     (1 − Pr − Px) · Tm + Pr · Tr,u + Px · Tr,x

Parallel Recombinative Simulated Annealing (PRSA) [7] uses recombination that swaps alleles followed by mutation, Srm,u(·). When the candidate individuals are accepted/rejected with the coupled acceptance rule Ac, PRSA has detailed balance. When a child competes against one parent in a standard acceptance rule, the resulting algorithm does not have detailed balance; see Theorem 4.

To use the masked crossover with a MH algorithm, we need to add mutation, such that if the probability to generate a child Xn[i] from a parent Xt[i] and a mask Xt[j] is non-zero, the reverse probability to generate the parent from the child with the same mask is also non-zero. For the common parts of Xt[i] and Xt[j], we use mutation with a probability of 1/2 to generate Xn[i]. For the non-common values, like in the masked crossover, the values of Xt[i] are flipped with the probability pm. We denote this operator with Smr,s. We can consider it a cycle of proposal distributions that generates with mutation and masked crossover disjunct positions of an individual. Note that for pm = 1/2, this operator is equivalent with the mutation operator, Sm, with mutation rate 1/2.

Proposition 9 Smr,s is irreducible. For pm ≠ 1/2, Smr,s is non-symmetrical.

Proof. Smr,s is irreducible, since the probability to arrive from any parent to any child is greater than 0. A bit in the non-common parts of Xt[i] and Xt[j] is flipped with probability pm. This bit is then common for Xn[i] and Xt[j]; thus, it is flipped back with probability 1/2. If pm ≠ 1/2, then Smr,s(Xn[i] | Xt[i], Xt[j]) ≠ Smr,s(Xt[i] | Xn[i], Xt[j]). Then Smr,s is non-symmetrical. □

Laskey and Myers [5] propose Smr,p, which is a cycle of proposal distributions, but this time directly in each locus of an individual. This operator chooses a locus Xt[i][j] uniformly at random from the population and proposes an allele Xn[i][j] = a on that locus with probability (Na + 1)/(N + |Ω(X[·][·])|), where Na is the number of individuals in the population with allele a on that locus and |Ω(X[·][·])| is the cardinality of the allele space. In popMCMC [5], the candidate individual is accepted using the standard MH acceptance rule A. We can consider that Smr,p is composed from the crossover Sr,p to which a constant probability, 1/N, is added as mutation, and the result is normalized with (1 + |Ω(X[·][·])|/N). Since Sr,p is non-symmetrical, so is Smr,p.

Combining mixtures and cycles. Based on the mixture and cycle operators, one can create various other ones. For example, the components of a mixture can be cycles:

Smr+rm = (1 − Pr) · Smr + Pr · Srm

Or we can use a cycle inside a cycle:

Smrm = Srm x Sm ; Srmr = Smr x Sr

They are symmetrical when recombination is symmetrical.

Proposition 10 Smr+rm is irreducible if 0 < Pr < 1 and Smr and/or Srm are irreducible. It is also symmetrical if recombination is symmetrical and Pr = 0.5. Smrm and Srmr are irreducible; they are symmetrical if recombination is symmetrical.

Proof. If recombination is symmetrical, then Smr = Srm. If Pr = 0.5, then

Smr+rm(Xn | Xt) = 0.5 · Smr(Xn | Xt) + 0.5 · Srm(Xn | Xt) = Smr+rm(Xt | Xn)

And

Smrm(Xn | Xt) = Σ_{Y,V ∈ Ω(X)} Sm(Y | Xt) · Sr(V | Y) · Sm(V | Xn) = Σ_{Y,V ∈ Ω(X)} Sm(V | Xt) · Sr(Y | V) · Sm(Y | Xn) = Smrm(Xt | Xn)

The other properties follow directly. □

EMC [6] uses a mixture and a cycle. The mixture has two components: uniform mutation with probability 1 − Pr > 0 and uniform crossover with probability Pr. The second step in the cycle swaps states between chains with different target distributions, Tr,S. EMC is irreducible - because TEMC(·|·) > 0 - and has detailed balance:

TEMC = Tr,S x ((1 − Pr) · Tm + Pr · Tr,u)

6 Example

In this section we look at a simple example for which we can formally compute the effect of the coupled acceptance rule Ac by itself and when combined with recombination. We consider three algorithms: multiple independent MCMCs (MIC), which propose new states by mutation and accept them using the standard Metropolis acceptance rule A; an EMCMC using mutation and the coupled acceptance rule (mut+Ac); and an EMCMC using a cycle of parameterized uniform crossover and mutation to propose new states, accepting them with the coupled acceptance rule Ac (rEMCMC). From the discussion in the previous sections we know that all three algorithms converge to the target distribution Π_{i=1}^{N} Pi(·). We are interested in how fast these algorithms converge - that is, how well the Markov chains mix - and also in how quickly they reach the most likely state. We compute the former property by calculating the second largest eigenvalue λ2 of the transition matrices of the Markov processes: the smaller λ2, the faster the chain mixes [3]. The latter property is computed by calculating the mean hitting time of generating the most likely state [10].

The hyper-geometrical function. To each individual X[·] we associate a fitness function f : Ω(X) → R, where P(X[·]) = g(f(X[·])) (g : R → R+ \ {0}). The test function is a trap function which has a hyper-geometric distribution (Hyper), where all bits 0 - we denote this individual with X0 - is the second largest peak h2, and the optimum h1 is at all bits one:

P'(X) = h2 · (w − Δ(X, X0)) / w          if Δ(X, X0) < w
P'(X) = 0.01                             if Δ(X, X0) = w
P'(X) = h1 · (Δ(X, X0) − w) / (b − w)    otherwise

where b is the building block size and w is the number of ones of the individuals with the lowest value, 0.01. For a multi-building-block function, we add the values of the component building blocks. We set b = 4 and h1 = b. In the first experiment, we set w = 3 and we vary the value h2 of the local optimum over {0.01, 1, 2, 3}. In the second experiment,
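The piecewise definition above can be sketched in Python; the defaults mirror the first experiment (b = 4, h1 = b, w = 3) and the function names are mine.

```python
def hyper_block(x, h1, h2, w, b):
    # unnormalized value P'(X) of one building block, following the
    # piecewise trap definition: second peak h2 at the all-zeros string
    # X0, valley floor 0.01 at Hamming distance w from X0, and the
    # optimum h1 at all ones
    d = sum(x)  # Hamming distance to X0 equals the number of ones
    if d < w:
        return h2 * (w - d) / w
    if d == w:
        return 0.01
    return h1 * (d - w) / (b - w)

def hyper(x, h1=4.0, h2=3.0, w=3, b=4):
    # multi-building-block value: sum over the component blocks
    return sum(hyper_block(x[i:i + b], h1, h2, w, b)
               for i in range(0, len(x), b))
```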


Figure 1: Analytical results for Hyper with 2 blocks on 4 bits, where (a, b, c) w = 3 and the peak height is varied, h2 = {0.01, 1, 2, 3}, or (d, e, f) h2 = 3 and the distance to the local optimum is varied, w = {1, 2, 3}. [Plots not reproduced; the panels compare rEMCMC, mut+Ac and MIC (mut) over the peak height and the distance to the local optimum.]
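The mixing criterion used in these experiments - the second largest eigenvalue modulus of the transition matrix - can be illustrated with a small numerical sketch. The two toy two-state chains below are my own examples, not the paper's actual transition matrices.

```python
import numpy as np

def second_largest_eigenvalue(T):
    # for a stochastic matrix the largest eigenvalue modulus is 1; the
    # second largest governs how quickly the chain converges to its
    # stationary distribution (smaller means faster mixing)
    moduli = np.sort(np.abs(np.linalg.eigvals(T)))[::-1]
    return moduli[1]

# toy two-state chains: the lazier chain has a larger second eigenvalue
fast = np.array([[0.5, 0.5], [0.5, 0.5]])   # mixes in one step
slow = np.array([[0.9, 0.1], [0.1, 0.9]])   # strongly sticky states
```

Here `fast` has second eigenvalue 0 and `slow` has 0.8, matching the intuition that sticky chains mix slowly; the paper applies the same measure to the much larger transition matrices of MIC, mut+Ac and rEMCMC.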

we set h2 = 3 and w = {1, 2, 3}, thus varying the basin of attraction of the optima.

The computational cost of calculating the eigenvalues and hitting times scales exponentially with the total string size l. Therefore, we have to limit the analysis to a Hyper function with 2 blocks of size 4 (l = 8) and population size N = 2. Note that the computational cost is also reduced by grouping individuals with the same number of ones and zeros in one block [10]. These individuals have the same fitness value. For a building block of size b, after reduction, we obtain strings of size b which can take (b + 1) values.

Figure 1 shows the results for high (0.5) and low (0.125) mutation rates for MIC and mut+Ac; rEMCMC uses the same swap probabilities for the parameterized uniform crossover, 0.5 and 0.125, and a mutation rate of 0.125. It can be noticed that the mutation rate greatly influences the performance of MIC, whereas rEMCMC does not vary much with the swapping probabilities. It is also clear that for this problem the mixing or sampling performance of MIC is diminished by the coupled acceptance rule Ac in mut+Ac. Adding recombination to the coupled acceptance rule Ac improves the sampling performance, but stays less efficient than MIC when high mutation rates are applied. rEMCMC has slightly better hitting times than MIC, while both rEMCMC and MIC have significantly lower hitting times than mut+Ac.

7 Conclusion

We have investigated how to integrate recombination in population MCMCs to obtain irreducible EMCMCs with detailed balance. First, we have studied the properties of various recombination proposal distributions. We have investigated how to use the MH acceptance rules with recombination to obtain EMCMCs with detailed balance. We have studied how to integrate mutation in these EMCMCs for ensuring irreducibility.

Bibliography

[1] C. Andrieu, N. de Freitas, A. Doucet, and M. Jordan. An introduction to MCMC for Machine Learning. Machine Learning, pages 5-43, 2003.
[2] M.M. Drugan and D. Thierens. Evolutionary Markov chain Monte Carlo. In Proc. Artificial Evolution '03, LNCS 2936, pages 63-76, 2004.
[3] W.R. Gilks, S. Richardson, and D.J. Spiegelhalter, editors. Markov Chain Monte Carlo in Practice. Chapman & Hall, 1996.
[4] W.K. Hastings. Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57:97-109, 1970.
[5] K.B. Laskey and J.W. Myers. Population Markov Chain Monte Carlo. Machine Learning, pages 175-196, 2003.
[6] F. Liang and W.H. Wong. Evolutionary Monte Carlo: Applications to Cp model sampling and change point problem. Statistica Sinica, pages 317-342, 2000.
[7] S.W. Mahfoud and D.E. Goldberg. Parallel Recombinative Simulated Annealing: a Genetic Algorithm. Parallel Computing, pages 1-28, 1995.
[8] N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller, and E. Teller. Equation of state calculations by fast computing machines. Journal of Chemical Physics, 21:1087-1092, 1953.
[9] N.J. Radcliffe. Forma analysis and random respectful recombination. In Proc. of the Fourth Inter. Conf. on Genetic Algorithms, pages 222-229, 1992.
[10] G. Rudolph. Convergence Properties of Evolutionary Algorithms. Verlag Dr. Kovac, 1997.
[11] R. Storn and K. Price. Differential evolution - a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization, 11:341-359, 1997.
[12] M.J.A. Strens. Evolutionary MCMC sampling and optimization in discrete spaces. In Proc. ICML, 2003.
[13] C.J.F. ter Braak. Genetic algorithms and Markov chain Monte Carlo: Differential Evolution Markov Chain makes Bayesian computing easy. Technical report, Biometris, Wageningen UR, 2004.
[14] L. Tierney. Markov chains for exploring posterior distributions. The Annals of Statistics, pages 1701-1762, 1994.
