A new crossover technique in Genetic Programming
Janet Clegg
Intelligent Systems Group
Electronics Department
This presentation
Describe basic evolutionary optimisation
Overview of failed attempts at crossover methods
Describe the new crossover technique
Results from testing on two regression problems
Evolutionary optimisation
Start by choosing a set of random trial solutions
(population)
Each trial solution is evaluated
(fitness / cost)
[Diagram: a population of trial solutions, each labelled with its fitness, e.g. 1.2, 0.9, 0.8, 1.4, …]
Parent selection
[Diagram: a mother (fitness 2.1) and a father (fitness 1.7) are selected from the population; crossover produces child 1 and child 2]
Mutation
Probability of mutation small (say 0.1)
This provides a new population of solutions –
the next generation
Repeat generation after generation
1 select parents
2 perform crossover
3 mutate
until converged
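The loop above can be sketched in Python. This is a hypothetical minimal example, not the author's code: it minimises (x − 3)² over a population of floats, with illustrative choices of tournament selection, interpolating crossover and Gaussian mutation.

```python
import random

def evolve(fitness, pop_size=20, generations=100, seed=0):
    rng = random.Random(seed)
    # start by choosing a set of random trial solutions (the population)
    pop = [rng.uniform(-10.0, 10.0) for _ in range(pop_size)]

    def select():
        # tournament of two: keep the fitter (lower-cost) solution
        a, b = rng.choice(pop), rng.choice(pop)
        return a if fitness(a) < fitness(b) else b

    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            mother, father = select(), select()                # 1 select parents
            child = mother + rng.random() * (father - mother)  # 2 perform crossover
            if rng.random() < 0.1:                             # 3 mutate, small probability
                child += rng.gauss(0.0, 1.0)
            children.append(child)
        pop = children  # the next generation
    return min(pop, key=fitness)

best = evolve(lambda x: (x - 3.0) ** 2)
```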
Two types of evolutionary optimisation
Genetic Algorithms (GA) –
optimise some quantity by varying parameters which have numerical values
Genetic Programming (GP) –
optimise some quantity by varying parameters which are functions, parts of computer code, or circuit components
Representation
Traditional GAs – binary representation
e.g. 1011100001111
Floating point GA – performs better than binary
e.g. 7.2674554
Genetic Program (GP) – nodes in a tree represent functions whose inputs come from the branches below the node
Some crossover methods
Crossover in a binary GA
Mother
1 0 0 0 0 0 1 0 = 130
Father
0 1 1 1 1 0 1 0 = 122
Child 1
1 1 1 1 1 0 1 0 = 250
Child 2
0 0 0 0 0 0 1 0 = 2
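This single-point crossover can be sketched as follows (a hypothetical illustration, with bit strings as Python lists; it reproduces the mother/father example above):

```python
def single_point_crossover(mother, father, point):
    """Swap the tails of two bit strings after the crossover point."""
    child1 = mother[:point] + father[point:]
    child2 = father[:point] + mother[point:]
    return child1, child2

def to_int(bits):
    return int("".join(map(str, bits)), 2)

mother = [1, 0, 0, 0, 0, 0, 1, 0]   # 130
father = [0, 1, 1, 1, 1, 0, 1, 0]   # 122
child1, child2 = single_point_crossover(mother, father, 1)
# child1 = 11111010 = 250, child2 = 00000010 = 2
```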
Crossover in a floating point GA
[Diagram: for each parameter, between its min and max values, the offspring is chosen as a random point between mother and father]
Traditional method of crossover in a GP
[Diagram: randomly chosen sub-trees of mother and father are swapped to produce child 1 and child 2]
Motivation for this work
Tree crossover in a GP does not always perform well
Angeline, and Luke and Spencer, compared:
(1) the performance of tree crossover
(2) simple mutation of the branches
The difference in performance was statistically insignificant
Consequently some people implement GPs with no crossover – mutation only
Motivation for this work
In a GP many people do not use crossover, so mutation is the more important operator
In a GA the crossover operator contributes a great deal to performance – mutation is a secondary operator
AIM: find a crossover technique for GP which works as well as the crossover in a GA
Cartesian Genetic Programming
Cartesian Genetic Programming (CGP)
Julian Miller introduced CGP
Replaces tree representation with directed graphs – represented by a string of integers
The CGP representation will be explained within the first test problem
CGP uses mutation only – no crossover
First simple test problem
A simple regression problem:
finding the function which best fits data taken from
y = x² + 2x + 1
The GP should find this exact function as the optimal fit
The traditional GP method for this problem
Set of functions and terminals
Functions: +  −  *
Terminals: 1  x
[Diagram: example tree computing (1-x) * (x*x)]
Initial population created by randomly choosing functions and terminals within the tree structures
Crossover by randomly swapping sub-branches of the parent trees
[Diagram: sub-branches of the mother and father trees are swapped to give child 1 and child 2]
Set of functions – each function identified by an integer
+ → 0    − → 1    * → 2
Set of terminals – each identified by an integer
1 → 0    x → 1
CGP representation
Creating the initial population
The genotype is a random string of integers, built node by node:
The first integer of each node is a random choice of function: 0 (+), 1 (−) or 2 (*)
For the first node (node 2 – the terminals occupy indices 0 and 1), the next two integers are a random choice of terminals 0 (1) or 1 (x), e.g. 2 0 1
For each later node the two input integers are a random choice from the terminals and any earlier node, e.g. node 3 may use 0 (1), 1 (x) or node 2; here node 3 is 0 1 1
Continuing gives node 4 as 1 3 1, node 5 as 0 2 3 and node 6 as 2 4 1
The final integer is a random choice of output from the terminals and all nodes (0–6), e.g. 5
Complete genotype (nodes 2 3 4 5 6 and output):
2 0 1 0 1 1 1 3 1 0 2 3 2 4 1 5
[Diagram: decoded graph for this genotype; the output is node 5, which computes (1*x) + (x+x) = 3x]
Run the CGP with test data taken from the function x² + 2x + 1
Population size 30
28 offspring created at each generation
Mutation only to begin with
Fitness is the sum of squared differences between data and function
Result
0 0 1 2 2 1 1 2 2 0 3 2 0 5 1 5
(nodes 2 3 4 5 6, output = node 5)
[Diagram: decoded graph; node 2 = 1 + x, node 3 = (1 + x) * x, node 5 = (1 + x) * x + (1 + x) = x² + 2x + 1]
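A sketch of how such a genotype can be decoded and scored (an illustrative reconstruction, not the author's implementation; it uses the function and terminal indices defined above and checks the evolved result against data from x² + 2x + 1):

```python
def eval_cgp(genome, x, functions):
    """Decode a CGP integer string: the terminals 1 and x occupy
    indices 0 and 1, each node is a (function, input, input) triple,
    and the final integer selects the output."""
    values = [1.0, float(x)]
    for i in range(0, len(genome) - 1, 3):
        f, a, b = genome[i:i + 3]
        values.append(functions[f](values[a], values[b]))
    return values[genome[-1]]

functions = {0: lambda a, b: a + b,   # +
             1: lambda a, b: a - b,   # -
             2: lambda a, b: a * b}   # *

# the evolved genotype from the result slide
genome = [0, 0, 1, 2, 2, 1, 1, 2, 2, 0, 3, 2, 0, 5, 1, 5]

# fitness: sum of squared differences against data from x^2 + 2x + 1
data = [(x / 10.0, (x / 10.0) ** 2 + 2 * (x / 10.0) + 1) for x in range(-10, 11)]
error = sum((eval_cgp(genome, x, functions) - y) ** 2 for x, y in data)
```

The error comes out numerically zero, confirming that this genotype encodes x² + 2x + 1 exactly.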
Statistical analysis of GP
Any two runs of a GP (or GA) will not be exactly the same
To analyse the convergence of the GP we need lots of runs
All the following graphs depict the average convergence over 4000 runs
Introducing some crossover
Parent 1
0 0 1 2 2 1 1 2 2 0 3 2 0 5 1 5
Parent 2
2 0 1 0 1 1 1 3 1 0 2 3 2 4 1 5
Child 1
0 0 1 2 2 1 1 3 1 0 2 3 2 4 1 5
Child 2
2 0 1 0 1 1 1 2 2 0 3 2 0 5 1 5
Pick random crossover point
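A sketch of this crossover on the integer strings (illustrative only; it reproduces the two children shown above, with the crossover point after the seventh integer):

```python
def single_point_crossover(p1, p2, point):
    """Swap the tails of two genotype strings after the chosen point."""
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

parent1 = [0, 0, 1, 2, 2, 1, 1, 2, 2, 0, 3, 2, 0, 5, 1, 5]
parent2 = [2, 0, 1, 0, 1, 1, 1, 3, 1, 0, 2, 3, 2, 4, 1, 5]
child1, child2 = single_point_crossover(parent1, parent2, 7)
# child1 = [0,0,1,2,2,1,1,3,1,0,2,3,2,4,1,5]
# child2 = [2,0,1,0,1,1,1,2,2,0,3,2,0,5,1,5]
```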
GP with and without crossover
[Graph: average cost vs generation number (0–100) for the GP, without and with crossover]
GA with and without crossover
[Graph: average cost vs generation number (0–1000) for the GA, without and with crossover]
Parent 1
0 0 1 2 2 1 1 2 2 0 3 2 0 5 1 5
Parent 2
2 0 1 0 1 1 1 3 1 0 2 3 2 4 1 5
Child 1
0 0 1 2 2 1 1 3 1 0 2 3 2 4 1 5
Child 2
2 0 1 0 1 1 1 2 2 0 3 2 0 5 1 5
Random crossover point but must be between the nodes
[Diagram: example genotype 0 0 1 1 1 2 2 3 2 2 4 1 0 3 5 6 (nodes 2 3 4 5 6, output 6) and its decoded tree]
GP crossover at nodes
[Graph: average cost vs generation number (0–100), without and with crossover at nodes]
Parent 1
0 0 1 2 2 1 1 2 2 0 3 2 0 5 1 5
Parent 2
2 0 1 0 1 1 1 3 1 0 2 3 2 4 1 5
Child 1
0 0 1 2 2 1 1 3 1 0 2 3 2 4 1 5
Child 2
2 0 1 0 1 1 1 2 2 0 3 2 0 5 1 5
Pick a random node along the string and swap this single node
Crossover only one node
[Graph: average cost vs generation number (0–100), without and with single-node crossover]
Parent 1
0 0 1 2 2 1 1 2 2 0 3 2 0 5 1 5
Parent 2
2 0 1 0 1 1 1 3 1 0 2 3 2 4 1 5
Child 1
2 0 1 0 2 1 1 2 1 0 2 2 2 5 1 5
Child 2
0 0 1 2 1 1 1 3 2 0 3 3 0 4 1 5
Each integer in child randomly takes value from either mother or father
Random swap crossover
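This random swap (uniform) crossover can be sketched as follows (a hypothetical illustration, not the author's code; with a random seed the exact children will vary, but each position always comes from one parent or the other):

```python
import random

def random_swap_crossover(p1, p2, rng=random):
    """Each integer in the child randomly takes its value from
    either the mother or the father (uniform crossover)."""
    child1, child2 = [], []
    for a, b in zip(p1, p2):
        if rng.random() < 0.5:
            child1.append(a); child2.append(b)
        else:
            child1.append(b); child2.append(a)
    return child1, child2

parent1 = [0, 0, 1, 2, 2, 1, 1, 2, 2, 0, 3, 2, 0, 5, 1, 5]
parent2 = [2, 0, 1, 0, 1, 1, 1, 3, 1, 0, 2, 3, 2, 4, 1, 5]
child1, child2 = random_swap_crossover(parent1, parent2)
```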
[Graph: average cost vs generation number (0–100), without and with random swap crossover]
Comparison with random search
GP with no crossover performs better than any of the trial crossover methods here
How much better than a completely random search is it?
The only means by which it improves on a random search are
parent selection
mutation
Comparison with a random search
[Graph: average cost vs generation number (0–100), GP without crossover vs random search]
Comparison with a random search
[Graph: detail of the latter generations (cost 0–0.01, generations 30–80), GP without crossover vs random search]
GP converges in 58 generations
Random search 73 generations
GA performance compared with a completely random search
GA tested on a large problem –
A random search would have involved searching through
150,000,000 data points
The GA reached the solution after testing
27,000 data points
( average convergence of 5000 GA runs)
The probability of a random search reaching the solution in 27,000 trials is 0.00018
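That figure can be checked directly, assuming a single optimal point among the 150,000,000 and independent uniform trials:

```python
# probability that 27,000 independent uniform random trials out of
# 150,000,000 points hit the single optimum at least once
p = 1 - (1 - 1 / 150_000_000) ** 27_000
# p ≈ 0.00018
```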
Why does GP tree crossover not always work well?
[Diagram: tree with root f1, internal function nodes f2–f7 and leaves x1–x8]
f1 { f2 [ f4( x1,x2 ), f5( x3,x4 ) ] , f3 [ f6( x5,x6 ), f7( x7,x8 ) ] }
g1 { g2 [ g4( y1,y2 ), g5( y3,y4 ) ] , g3 [ g6( y5,y6 ), g7( y7,y8 ) ] }
f1 { f2 [ f4( x1,x2 ), f5( x3,x4 ) ] , f3 [ f6( x5,x6 ), g7( y7,y8 ) ] }
[Diagram: comparing f(x1), f(x2), g(x1) and g(x2) for two parent functions f and g; depending on context the swapped sub-expressions may combine well or badly]
Suppose we are trying to find a function to fit a set of data looking like this
[Surface plot of exp( -(0.5-x)*(0.5-x) - (0.5-y)*(0.5-y) ) over the unit square]
[Surface plot of 1/( 0.85 + ( (0.5-x)*(0.5-x) + (0.5-y)*(0.5-y) )**0.5 ) over the unit square]
Suppose we have 2 parents which fit the data fairly well
Parent 1: exp( -(0.5-x)² - (0.5-y)² )
[Tree diagram of parent 1 with a chosen crossover point]
Parent 2: 1/( 0.85 + ( (0.5-x)² + (0.5-y)² )^0.5 )
[Tree diagram of parent 2 with a chosen crossover point]
[Surface plots: the parents exp( -(0.5-x)*(0.5-x)-(0.5-y)*(0.5-y) ) and 1/( 0.85+( (0.5-x)*(0.5-x)+(0.5-y)*(0.5-y) )**0.5 ), and an offspring 1/( 0.85+( (x-0.5)**2+(x-0.5)**2 )**0.5 ) produced by the tree crossover]
Introducing the new technique
Based on Julian Miller’s
Cartesian Genetic Program (CGP)
The CGP representation is changed
Integer values are replaced by floating point values
Crossover is performed as in a floating point GA
CGP representation
0 0 1 2 2 1 1 2 2 0 3 2 0 5 1 5
(nodes 2 3 4 5 6 and output)
New representation – replace the integers with floating point variables
x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13 x14 x15 x16
where each xi lies in a defined range, say xi ∈ [0, 1]
(nodes 2 3 4 5 6 and output, as before)
Interpretation of the new representation
For the variables xi which represent the choice of function:
if the set of functions is { +, −, * }, the interval [0, 1] is divided equally:
[0, 0.33) → +    [0.33, 0.66) → −    [0.66, 1] → *
For the variables xi which represent the choice of input, the interval is divided equally among the available terminals and nodes, e.g. with the five choices 1, x, node 2, node 3, node 4:
[0, 0.2) → 1    [0.2, 0.4) → x    [0.4, 0.6) → node 2    [0.6, 0.8) → node 3    [0.8, 1] → node 4
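The interval subdivision just described can be sketched as (an illustrative helper; the name is hypothetical):

```python
def decode_choice(xi, choices):
    """Map xi in [0, 1] to one of the choices by dividing the
    interval into equal sub-ranges (xi == 1.0 maps to the last)."""
    index = min(int(xi * len(choices)), len(choices) - 1)
    return choices[index]

# function genes: three equal sub-ranges with boundaries near 0.33 and 0.66
# e.g. decode_choice(0.5, ['+', '-', '*']) -> '-'
```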
The crossover operator
Crossover is performed as in a floating point GA
Two parents, p1 and p2 are chosen
Offspring o1 and o2 are created by choosing, for each variable, a uniformly generated random number 0 < ri < 1 and setting
oi = p1 + ri (p2 − p1)
[Diagram: on each parameter axis the offspring is chosen as a random point between mother and father]
Crossover in the new representation
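A sketch of this floating point crossover (illustrative, not the author's code; each offspring gene lands at a random point between the corresponding parent genes):

```python
import random

def float_crossover(p1, p2, rng=random):
    """Each offspring gene is a random point between the corresponding
    mother and father genes: o = a + r * (b - a), 0 < r < 1."""
    o1 = [a + rng.random() * (b - a) for a, b in zip(p1, p2)]
    o2 = [a + rng.random() * (b - a) for a, b in zip(p1, p2)]
    return o1, o2

parent1 = [0.10, 0.80, 0.40]
parent2 = [0.60, 0.20, 0.90]
off1, off2 = float_crossover(parent1, parent2)
```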
Why is this crossover likely to work better than tree crossover?
[Diagram: tree with root f1, internal function nodes f2–f7 and leaves x1–x8]
f1 { f2 [ f4( x1,x2 ), f5( x3,x4 ) ] , f3 [ f6( x5,x6 ), f7( x7,x8 ) ] }
g1 { g2 [ g4( y1,y2 ), g5( y3,y4 ) ] , g3 [ g6( y5,y6 ), g7( y7,y8 ) ] }
f1 { f2 [ f4( x1,x2 ), f5( x3,x4 ) ] , f3 [ f6( x5,x6 ), g7( y7,y8 ) ] }
Mathematical interpretation of tree crossover
The fitness can be thought of as a function of the tree's functions and terminals, say f( x1, x2, x3, …, xn )
x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13 x14 x15 x16
Mathematical interpretation of the new method
(nodes 2 3 4 5 6 and output)
The fitness can be thought of as a function of these 16 variables, say f( x1, x2, …, x16 )
and the optimisation becomes that of finding the values of these 16 variables which give the best fitness
The new crossover guides each variable towards its optimal value in a continuous manner
The new technique has been tested on two problems
Two regression problems studied by Koza
(1) x⁶ − 2x⁴ + x²
(2) x⁵ − 2x³ + x
Test data – 50 points in the interval [-1,1]
Fitness is the sum of the absolute errors over 50 points
Population size 50 – 48 offspring each generation
Tournament selection used to select parents
Various rates of crossover tested 0% 25% 50% 75%
Number of nodes in representation = 10
The following results are based on the average convergence of the new method out of 1000 runs
Considered converged when the absolute error is less than 0.01 at all of the 50 data points (same as Koza)
Results based on
(1) average convergence graphs
(2) the average number of generations to converge
(3) Koza’s computational effort figure
Statistical analysis of new method
[Graph: cost function vs generation number (0–500) for crossover rates 0%, 25%, 50%, 75%]
Average convergence for x⁶ − 2x⁴ + x²
[Graph: cost function vs generation number (200–1000) for crossover rates 0%, 25%, 50%, 75%]
Convergence for the latter generations
Introduce variable crossover
At generation 1, crossover is performed 90% of the time
The rate of crossover then decreases linearly until generation 180, where it reaches 0%
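The linear schedule can be sketched as (an illustrative helper under the stated assumptions: 90% at generation 1, 0% from generation 180 onwards; the name and defaults are hypothetical):

```python
def crossover_rate(generation, start=0.9, zero_at=180):
    """Linearly decreasing crossover rate: `start` at generation 1,
    falling to 0 at generation `zero_at` and beyond."""
    if generation >= zero_at:
        return 0.0
    return start * ((zero_at - generation) / (zero_at - 1))
```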
[Graph: cost function vs generation number (0–500) for crossover rates 0%, 25%, 50%, 75% and variable crossover]
Variable crossover
Average number of generations to converge and computational effort
Percentage crossover    Average generations to converge    Koza's computational effort
0%                      168                                30,000
25%                     84                                 9,000
50%                     57                                 8,000
75%                     71                                 6,000
Variable crossover      47                                 10,000
[Graph: cost function vs generation number (0–500) for crossover rates 0%, 25%, 50%, 75%]
Average convergence for x⁵ − 2x³ + x
[Graph: cost function vs generation number (0–500) for crossover rates 0%, 25%, 50%, 75% and variable crossover]
Variable crossover
Average number of generations to converge and computational effort
Percentage crossover    Average generations to converge    Koza's computational effort
0%                      516                                44,000
25%                     735                                24,000
50%                     691                                14,000
75%                     655                                11,000
Variable crossover      278                                13,000
[Graph: number of generations to converge per run for problems (1) and (2)]
Number of generations to converge for both problems over 100 runs
[Graph: cost function vs generation number (0–1000) for crossover rates 0%, 25%, 50%, 75% and variable crossover]
Average convergence ignoring runs which take over 1000 generations to converge
The new technique has reduced the average number of generations to converge
From 168 down to 47 for the first problem tested
From 516 down to 278 for the second problem
Conclusions
When crossover is 0% this new method is equivalent to the traditional CGP - mutation only
The computational effort figures for 0% crossover here are similar to those reported for the traditional CGP
Although a larger mutation rate and population size have been used here
Conclusions
Future work
Investigate the effects of varying the GP parameters
population size
mutation rate
selection strategies
Test the new technique on other problems
larger problems
other types of problems
Thank you for listening!
[Graph: cost function vs generation number (0–500) for crossover rates 0%, 25%, 50%, 75% and variable crossover]
Average convergence for x⁶ − 2x⁴ + x² using 50 nodes instead of 10
Average number of generations to converge and computational effort
Percentage crossover    Average generations to converge    Koza's computational effort
0%                      78                                 18,000
25%                     85                                 13,000
50%                     71                                 11,000
75%                     104                                13,000
Variable crossover      45                                 14,000
[Graph: cost function vs generation number (0–500) for crossover rates 0%, 25%, 50%, 75% and variable crossover]
Average convergence for x⁵ − 2x³ + x using 50 nodes instead of 10
Average number of generations to converge and computational effort
Percentage crossover    Average generations to converge    Koza's computational effort
0%                      131                                18,000
25%                     193                                17,000
50%                     224                                12,000
75%                     152                                19,000
Variable crossover      58                                 16,000