GA Applications
• Peaks function: C code, GOAT package for MATLAB, minimization and maximization
• Traveling Salesman Problem: genotype and phenotype encoding, customizing operators, rank scaling
• Hillis Sorting Problem
• Sequence Alignment
• Floating point GAs
• Constraint optimization
• Multi-objective optimization
• The schemata theorem
Components of binary GA in Feature Selection
Problem: max R2 (R2 = goodness of fit)

Population    Fitness
110101        f1 = 0.60
111111        f2 = 0.30
000000        f3 = 0.10

Selection (shares 0.6, 0.3, 0.1)
Selected population: 110101, 110101, 000000

Crossover (crossover point after bit 4)
111111, 000000 → 111100, 000011

Mutation
111111 → 111110 (the selected gene, here the last bit, becomes the mutated gene)
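The crossover and mutation examples on this slide (111111 and 000000 giving 111100 and 000011, and 111111 giving 111110) can be reproduced with a minimal sketch like the one below. The function names are hypothetical and the bit strings are stored as C strings of '0' and '1' purely for readability; the slides do not show how the GA package represents them.

```c
#include <string.h>

/* One-point crossover on two equal-length bit strings (hypothetical helper).
   Genes before `point` stay with their parent; the tails are exchanged.
   c1 and c2 must have room for strlen(p1) + 1 characters. */
void one_point_crossover(const char *p1, const char *p2, size_t point,
                         char *c1, char *c2)
{
    size_t n = strlen(p1);
    for (size_t i = 0; i < n; i++) {
        c1[i] = (i < point) ? p1[i] : p2[i];
        c2[i] = (i < point) ? p2[i] : p1[i];
    }
    c1[n] = '\0';
    c2[n] = '\0';
}

/* Bit-flip mutation: invert the gene at 0-based position `pos` in place. */
void flip_bit(char *s, size_t pos)
{
    s[pos] = (s[pos] == '1') ? '0' : '1';
}
```

Calling `one_point_crossover("111111", "000000", 4, c1, c2)` corresponds to the crossover point after bit 4, and `flip_bit(s, 5)` to mutating the last gene of a six-bit string.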
Genetic Operators
• Uniform Crossover
Parent 1: 5 24 131 534 603
Parent 2: 19 33 255 334 508
Child 1: 19 33 131 534 603
Child 2: 5 24 255 334 508
• Mutation
Parent: 5 24 131 534 603
Child: 5 24 344 534 603
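A minimal sketch of uniform crossover on integer genes (feature indices), assuming the exchange is driven by a per-gene mask. The function names and the mask representation are illustrative assumptions, not taken from the slides. With the mask 1 1 0 0 0, the parents 5 24 131 534 603 and 19 33 255 334 508 give the two children listed on the slide.

```c
#include <stddef.h>
#include <stdlib.h>

/* Uniform crossover (hypothetical helper): where mask[i] is non-zero the
   two parents exchange gene i, otherwise each child keeps its own parent's gene. */
void uniform_crossover(const int *p1, const int *p2, const int *mask,
                       size_t n, int *c1, int *c2)
{
    for (size_t i = 0; i < n; i++) {
        if (mask[i]) {
            c1[i] = p2[i];
            c2[i] = p1[i];
        } else {
            c1[i] = p1[i];
            c2[i] = p2[i];
        }
    }
}

/* Draw a random mask; each gene is exchanged with probability of about 0.5.
   Seed the generator with srand() before use. */
void random_mask(int *mask, size_t n)
{
    for (size_t i = 0; i < n; i++)
        mask[i] = (rand() > RAND_MAX / 2);
}
```

Passing the mask explicitly keeps the operator deterministic and easy to check; in a GA run the mask would come from `random_mask`.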
Peaks function

$$f(x,y) = 3(1-x)^2 e^{-x^2-(y+1)^2} - 10\left(\frac{x}{5} - x^3 - y^5\right) e^{-x^2-y^2} - \frac{1}{3}\, e^{-(x+1)^2-y^2}$$
Traveling Salesman Problem
[Figures: five tour plots, each titled "52 CITY TSP BENCHMARK", with axes X-coordinate and Y-coordinate and the cities labeled by number along the tour.]
• Two plots on the small coordinate range (X from 33 to 42, Y from -10 to 30): tour length = 73.9876 and tour length = 73.9998
• Three plots on the large coordinate range (X from 0 to 1800, Y from 0 to 1200): tour length = 8386.4347, tour length = 8231.2415 and tour length = 7992.4585
#include <string.h>   /* strcpy; the GA package headers are not shown on the slide */

int main(int argc, char *argv[])
{
    char mombassa[80], root[80];
    data b;
    double alpha, beta;                 // user data
    int num_cities;
    MATRIX distances;
    Container box;                      // user data for the objective function, kept in box
    double (*fptr)(data *, VECTOR);     // function pointer to the objective function
    genotype pop;

    fptr = Salesman3;
    MatrixAllocate(&distances, 500, 500);
    userData(&b, &box);                 // stores the pointer to the user data in the data struct b
    Read_User_Data(&alpha, &beta, &num_cities, distances);

    box.pop = &pop;
    box.alpha = alpha;
    box.beta = beta;
    box.num_cities = num_cities;
    box.distances = distances;

    if (argc == 2)
        strcpy(mombassa, argv[1]);

    Allocate_GA(&pop, &b, argc, mombassa, root, fptr);
    b.print_flag = 0;
    Loop_GA(&b, &pop, root, fptr);
    Write_User_Data(&b, &pop, root, fptr);
    De_Allocate_GA(&pop, &b, root, fptr);
    MatrixFree(distances, 500);
    return 0;
}
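double Salesman2(data *a, VECTOR x)
{
    int i, isum = 0;
    double tour = 0, pen1 = 0, pen2 = 0;
    double alpha, beta;
    int num_cities, one, two, help;

    Container *box = (Container *)(a->ud);   // user data attached in main
    alpha = box->alpha;
    beta = box->beta;
    num_cities = box->num_cities;

    // expected sum of the city indices 0 .. num_cities-1 in a valid tour
    // (the slide's expression gave this value only for even num_cities)
    help = num_cities * (num_cities - 1) / 2;

    // length of the path x[0] -> x[1] -> ... -> x[num_cities-1]
    for (i = 0; i < num_cities - 1; i++) {
        one = (int) x[i];
        two = (int) x[i + 1];
        tour = tour + box->distances[one][two];
    }

    // close the tour: last city back to the first
    one = (int) x[num_cities - 1];
    two = (int) x[0];
    tour = tour + box->distances[one][two];

    // checksum penalty: the city indices must add up to the expected sum
    for (i = 0; i < num_cities; i++)
        isum += (int) x[i];
    if (isum != help)
        pen1 = alpha;
    getche();   // waits for a key press on every evaluation, as on the slide (conio.h)

    box->penn1 = pen1;
    box->penn2 = pen2;
    return tour + pen1;
}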
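The checksum in `Salesman2` compares only the sum of the city indices, so a tour that repeats some cities and skips others can still pass: for four cities, 0 0 3 3 has the same sum as 0 1 2 3. A stricter test marks each city as it is visited. The sketch below assumes 0-based city indices stored in a plain `int` array; the helper name `is_valid_tour` is hypothetical and not part of the GA package.

```c
#include <stdlib.h>

/* Return 1 if tour[0..n-1] contains every city index 0..n-1 exactly once,
   0 otherwise (also 0 if the scratch array cannot be allocated). */
int is_valid_tour(const int *tour, int n)
{
    int ok = 1;
    unsigned char *seen;

    if (n <= 0)
        return 0;
    seen = calloc((size_t) n, 1);
    if (seen == NULL)
        return 0;

    for (int i = 0; i < n; i++) {
        int c = tour[i];
        if (c < 0 || c >= n || seen[c]) {   /* out of range or visited twice */
            ok = 0;
            break;
        }
        seen[c] = 1;
    }
    free(seen);
    return ok;
}
```

In an objective function of this kind, the penalty `alpha` could be added whenever `is_valid_tour` returns 0 instead of relying on the sum alone.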
SCHEMATA THEOREM (Holland)
• h(i): raw fitness of population sample i
• f(i): normalized fitness, f(i) = h(i)/Σh(i)
• A schema denotes a set of substrings that have identical values at certain loci: 1#101 = {10101, 11101}
• m(S,t): number of exemplars of schema S in the population at generation t
• The number of exemplars of schema S in the next generation is proportional to the chance of picking an individual that contains the schema (n = population size, f(S) = average fitness of the exemplars of S):
$$m(S,t+1) = m(S,t)\,\frac{n\,f(S)}{\sum_i f(i)} = m(S,t)\,\frac{f(S)}{f_{\mathrm{ave}}}$$
• If $f(S) = f_{\mathrm{ave}}(1+c)$ with constant c, then $m(S,t+1) = m(S,t)(1+c)$, and therefore $m(S,t) = m(S,0)(1+c)^t$
• Better than average schemata (c > 0) grow exponentially
Partially Mapped Crossover
Step 1. Select parents
Parent 1: 37 16 52 44 61 3 57 7 99 71
Parent 2: 32 99 77 70 16 35 49 12 48 89

Step 2. Select crossover points (I and II mark the ends of the mapping section)
Parent 1: 37 16 52 | 44 61 3 | 57 7 99 71
Parent 2: 32 99 77 | 70 16 35 | 49 12 48 89

Step 3. Swap mapping sections
37 16 52 | 70 16 35 | 57 7 99 71 (duplicated feature: 16)
32 99 77 | 44 61 3 | 49 12 48 89

Step 4. Determine mapping relationship
44 ↔ 70, 61 ↔ 16, 3 ↔ 35

Step 5. Correct offspring based on mapping relationship
Offspring 1: 37 61 52 70 16 35 57 7 99 71
Offspring 2: 32 99 77 44 61 3 49 12 48 89
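The five steps above can be sketched in C as follows. This is an illustrative version, not the code behind the slides: the function names are hypothetical, the mapping section is given by 0-based positions `lo` to `hi` inclusive, and each parent is assumed to contain no repeated feature. A duplicate outside the section is replaced by following the mapping until the value no longer collides with the section.

```c
#include <stddef.h>

/* Position of value v inside child[lo..hi], or -1 if it is not there. */
static int find_in_section(const int *child, int lo, int hi, int v)
{
    for (int i = lo; i <= hi; i++)
        if (child[i] == v)
            return i;
    return -1;
}

/* Build one PMX offspring: the mapping section lo..hi comes from `donor`,
   every other position comes from `keep`, with duplicates repaired through
   the mapping donor[j] <-> keep[j] for j in the section. */
void pmx_child(const int *keep, const int *donor, int n,
               int lo, int hi, int *child)
{
    for (int i = lo; i <= hi; i++)        /* Step 3: take the donor's section */
        child[i] = donor[i];

    for (int i = 0; i < n; i++) {
        if (i >= lo && i <= hi)
            continue;
        int v = keep[i];
        int j;
        /* Steps 4 and 5: while v duplicates a section gene, map it back */
        while ((j = find_in_section(child, lo, hi, v)) >= 0)
            v = keep[j];
        child[i] = v;
    }
}
```

With the slide's parents and the section at positions 3 to 5, `pmx_child(parent1, parent2, 10, 3, 5, child)` replaces the duplicated 16 by 61 and yields Offspring 1; swapping the roles of the parents yields Offspring 2.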
Genetic Algorithm cycle
Initial Population → Evaluation → Selection → Selected Population (Parents) → Crossover → Mutation → Offspring + Parents → Evaluation → Elitist strategy → Next Generation

Selection methods:
• Rank selection: $p(i) = q(1-q)^{i-1}$
• Fitness proportional
• Tournament

Elitist strategy: make sure that the best individual survives.
Note: In the plot, fitness is plotted as (1 - R2), so the problem can be treated as a minimization.
Source: A. Yasri and D. Hartsough, "Toward an Optimal Procedure for Variable Selection and QSAR Model Building," J. Chem. Inf. Comput. Sci. 2001, Vol. 41, No. 5, pp. 1218-1227.
Search space in feature selection
A data set with 10 features

Number of features to be selected (N)    Search space    Share of total
0                                        1               0.1%
1                                        10              1.0%
2                                        45              4.4%
3                                        120             11.7%
4                                        210             20.5%
5                                        252             24.6%
6                                        210             20.5%
7                                        120             11.7%
8                                        45              4.4%
9                                        10              1.0%
10                                       1               0.1%
Total                                    1024
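The search-space sizes for a data set with 10 features are the binomial coefficients C(10, N), which add up to 2^10 = 1024. A minimal sketch for computing them (the function name `choose` is an assumption for illustration):

```c
/* Number of ways to select k features out of n: the binomial coefficient.
   After step i the running value equals C(n-k+i, i), so each division is exact. */
unsigned long choose(unsigned n, unsigned k)
{
    unsigned long r = 1;

    if (k > n)
        return 0;
    if (k > n - k)          /* use symmetry C(n,k) = C(n,n-k) to shorten the loop */
        k = n - k;
    for (unsigned i = 1; i <= k; i++)
        r = r * (n - k + i) / i;
    return r;
}
```

For example, `choose(10, 5)` gives 252, the largest single slice of the search space at 252/1024, or about 24.6%.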