ga applications peaks function c- code goat package for matlab minimization and maximization...

GA Applications

• Peaks function: C code, GOAT package for MATLAB, minimization and maximization
• Traveling Salesman Problem: genotype and phenotype, encoding, customizing operators, rank scaling
• Hillis Sorting Problem
• Sequence Alignment
• Floating point GAs
• Constraint optimization
• Multi-objective optimization
• The schemata theorem

Components of binary GA in Feature Selection

Problem: max R² (R² = goodness of fit)

[Figure: components of a binary GA, 6-bit chromosomes]

Population and fitness: 110101 (f1 = 0.60), 111111 (f2 = 0.30), 000000 (f3 = 0.10)

Selection → selected population: 110101, 110101, 000000

Crossover (at the crossover point): 111111, 000000 → 111100, 000011

Mutation (mutated gene): 111111 → 111110
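The crossover and mutation steps in the figure above can be sketched in C. This is a minimal illustration, not code from the slides: chromosomes are assumed to be '0'/'1' character strings of equal length, and the names `one_point_crossover` and `flip_bit` are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* One-point crossover: each child keeps one parent's bits before `point`
   and takes the other parent's bits from `point` onward.
   c1 and c2 must have room for strlen(p1) + 1 characters. */
void one_point_crossover(const char *p1, const char *p2, size_t point,
                         char *c1, char *c2)
{
    size_t len = strlen(p1);
    assert(strlen(p2) == len && point <= len);
    memcpy(c1, p1, point);
    memcpy(c1 + point, p2 + point, len - point);
    memcpy(c2, p2, point);
    memcpy(c2 + point, p1 + point, len - point);
    c1[len] = '\0';
    c2[len] = '\0';
}

/* Bit-flip mutation of the gene at 0-based index `pos`. */
void flip_bit(char *chrom, size_t pos)
{
    assert(pos < strlen(chrom));
    chrom[pos] = (chrom[pos] == '0') ? '1' : '0';
}
```

With parents 111111 and 000000 and a crossover point after the fourth bit, this reproduces the children 111100 and 000011 from the figure; flipping the last bit of 111111 gives 111110.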

Genetic Operators

• Uniform Crossover

Parent 1: 5 24 131 534 603
Parent 2: 19 33 255 334 508
Child 1: 19 33 131 534 603
Child 2: 5 24 255 334 508

• Mutation

Parent: 5 24 131 534 603
Child: 5 24 344 534 603
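The two operators above can be sketched in C as follows. This is an illustrative sketch under stated assumptions, not the slides' implementation: the swap mask is passed in explicitly so the result is reproducible (in a GA it would normally be drawn at random per gene), and the names `uniform_crossover` and `mutate_gene` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Uniform crossover: where swap_mask[i] is non-zero the children
   exchange gene i, otherwise each child keeps its own parent's gene. */
void uniform_crossover(const int *p1, const int *p2, const int *swap_mask,
                       size_t n, int *c1, int *c2)
{
    assert(p1 && p2 && swap_mask && c1 && c2);
    for (size_t i = 0; i < n; i++) {
        if (swap_mask[i]) {
            c1[i] = p2[i];
            c2[i] = p1[i];
        } else {
            c1[i] = p1[i];
            c2[i] = p2[i];
        }
    }
}

/* Mutation: replace the gene at 0-based index `pos` with a new value. */
void mutate_gene(int *chrom, size_t n, size_t pos, int new_value)
{
    assert(chrom && pos < n);
    chrom[pos] = new_value;
}
```

A mask of 1 1 0 0 0 applied to the two parents above yields the two children shown; replacing the third gene of 5 24 131 534 603 with 344 yields the mutated child.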

Peaks function:

$f(x,y) = 3(1-x)^2 e^{-x^2-(y+1)^2} - 10\left(\frac{x}{5} - x^3 - y^5\right) e^{-x^2-y^2} - \frac{1}{3} e^{-(x+1)^2-y^2}$

Traveling Salesman Problem

[Figure: 16-city tour; axes: X-coordinate, Y-coordinate; tour length = 73.9876]

[Figure: 16-city tour; axes: X-coordinate, Y-coordinate; tour length = 73.9998]

52 CITY TSP BENCHMARK

[Figure: 52-city tour; axes: X-coordinate, Y-coordinate; tour length = 8386.4347]

[Figure: 52-city tour; axes: X-coordinate, Y-coordinate]

[Figure: 52-city tour; axes: X-coordinate, Y-coordinate]

#include <string.h>  /* strcpy */
#include <conio.h>   /* getche (DOS/Windows compilers) */
/* The GA library headers that declare data, VECTOR, MATRIX, genotype,
   Container and the *_GA routines are not shown on the slide. */

int main(int argc, char *argv[])
{
    char mombassa[80], root[80];
    data b;
    double alpha, beta;               /* user data */
    int num_cities;
    MATRIX distances;
    Container box;                    /* user data passed to the objective function */
    double (*fptr)(data *, VECTOR);   /* function pointer to the objective function */
    genotype pop;

    fptr = Salesman3;                 /* the slide shows only Salesman2 below */
    MatrixAllocate(&distances, 500, 500);
    userData(&b, &box);               /* stores the pointer to the user data in b */
    Read_User_Data(&alpha, &beta, &num_cities, distances);
    box.pop = &pop;
    box.alpha = alpha;
    box.beta = beta;
    box.num_cities = num_cities;
    box.distances = distances;
    if (argc == 2)
        strcpy(mombassa, argv[1]);
    Allocate_GA(&pop, &b, argc, mombassa, root, fptr);
    b.print_flag = 0;
    Loop_GA(&b, &pop, root, fptr);
    Write_User_Data(&b, &pop, root, fptr);
    De_Allocate_GA(&pop, &b, root, fptr);
    MatrixFree(distances, 500);
    return 0;
}

double Salesman2(data *a, VECTOR x)
{
    int i, isum = 0;
    double tour = 0, pen1 = 0, pen2 = 0;
    double alpha, beta;
    int num_cities, one, two, help;

    Container *box = (Container *)(a->ud);
    alpha = box->alpha;
    beta = box->beta;
    num_cities = box->num_cities;

    /* Expected sum of the city indices 0 + 1 + ... + (num_cities - 1).
       The slide computes num_cities/2*(num_cities-1) with an extra
       correction for odd counts, which is wrong for odd num_cities > 3. */
    help = num_cities * (num_cities - 1) / 2;

    /* Length of the closed tour, including the edge back to the start. */
    for (i = 0; i < num_cities - 1; i++) {
        one = (int) x[i];
        two = (int) x[i + 1];
        tour = tour + box->distances[one][two];
    }
    one = (int) x[num_cities - 1];
    two = (int) x[0];
    tour = tour + box->distances[one][two];

    /* Penalize tours whose city indices do not sum to the expected value. */
    for (i = 0; i < num_cities; i++)
        isum += (int) x[i];
    if (isum != help)
        pen1 = alpha;
    getche();                         /* waits for a key press, as on the slide */
    box->penn1 = pen1;
    box->penn2 = pen2;
    return tour + pen1;
}
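The objective function above depends on library types that the slides do not show. A self-contained variant of the same idea can be sketched as follows, assuming a row-major n-by-n distance array and 0-based city indices; `tour_cost` is a hypothetical name. One deliberate difference, labeled here as a substitution: instead of comparing the index sum, the sketch checks for repeated cities, because an invalid tour such as 0 0 3 3 has the same index sum (6) as the valid tour 0 1 2 3.

```c
#include <assert.h>
#include <stddef.h>

/* Closed-tour length plus a penalty `alpha` when a city is repeated.
   dist is an n*n row-major array; tour holds n city indices in 0..n-1. */
double tour_cost(const double *dist, const int *tour, int n, double alpha)
{
    double length = 0.0;
    int valid = 1;

    assert(dist != NULL && tour != NULL && n > 0);

    /* Duplicate check, O(n^2) but allocation-free. */
    for (int i = 0; i < n; i++) {
        assert(tour[i] >= 0 && tour[i] < n);  /* indices must be in range */
        for (int j = 0; j < i; j++)
            if (tour[j] == tour[i])
                valid = 0;
    }

    /* Sum the edges, wrapping from the last city back to the first. */
    for (int i = 0; i < n; i++) {
        int from = tour[i];
        int to = tour[(i + 1) % n];
        length += dist[(size_t)from * (size_t)n + (size_t)to];
    }

    return valid ? length : length + alpha;
}
```

As in `Salesman2`, the penalty is added to the tour length rather than replacing it, so invalid tours are still ranked relative to each other.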

SCHEMATA THEOREM (Holland)

• h(i): raw fitness of population sample i
• f(i): normalized fitness, f(i) = h(i)/Σh(i)
• A schema denotes a set of substrings that have identical values at certain loci: 1#101 = {10101, 11101}
• m(S,t): number of exemplars of schema S in the population at generation t
• The number of exemplars of schema S present in the next generation is proportional to the chance that an individual carrying the schema is picked, where n is the population size, f(S) is the average fitness of the exemplars of S, and f(S) = f_ave (1+c):

$m(S,t+1) = m(S,t)\, n\, f(S) / \sum f = m(S,t)\, f(S) / f_{ave} = m(S,t)\,(1+c)$

• Applying this over t generations: $m(S,t) = m(S,0)\,(1+c)^t$

• Better than average schemata grow exponentially
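The schema notation and the growth estimate can be made concrete with a short C sketch. It assumes schemata and individuals are strings over '0', '1' and the wildcard '#', and that the fitness advantage c stays constant across generations (the simplification behind the exponential estimate; crossover and mutation disruption are ignored). Both function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if string s is an exemplar of the schema, 0 otherwise.
   '#' in the schema matches either bit value. */
int matches_schema(const char *schema, const char *s)
{
    size_t len = strlen(schema);
    if (strlen(s) != len)
        return 0;
    for (size_t i = 0; i < len; i++)
        if (schema[i] != '#' && schema[i] != s[i])
            return 0;
    return 1;
}

/* Expected number of exemplars after t generations:
   m(S,t) = m(S,0) * (1 + c)^t, computed by repeated multiplication. */
double expected_schema_count(double m0, double c, int t)
{
    assert(t >= 0);
    double m = m0;
    for (int g = 0; g < t; g++)
        m *= 1.0 + c;
    return m;
}
```

For example, a schema with 2 exemplars and a 50% fitness advantage (c = 0.5) is expected to have 2 × 1.5 × 1.5 = 4.5 exemplars two generations later.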

Partially Mapped Crossover

Step 1. Select parents

Parent 1: 37 16 52 44 61 3 57 7 99 71
Parent 2: 32 99 77 70 16 35 49 12 48 89

Step 2. Select crossover points (crossover point I after the third gene, crossover point II after the sixth)

Parent 1: 37 16 52 | 44 61 3 | 57 7 99 71
Parent 2: 32 99 77 | 70 16 35 | 49 12 48 89

Step 3. Swap mapping sections

37 16 52 | 70 16 35 | 57 7 99 71 (16 is now a duplicated feature)
32 99 77 | 44 61 3 | 49 12 48 89

Step 4. Determine mapping relationship

44 ↔ 70, 61 ↔ 16, 3 ↔ 35

Step 5. Correct offspring based on mapping relationship

Offspring 1: 37 61 52 70 16 35 57 7 99 71
Offspring 2: 32 99 77 44 61 3 49 12 48 89
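The five steps above can be sketched in C. This is a minimal illustration under stated assumptions, not the slides' code: the mapping section is the half-open, 0-based index range [cut1, cut2), each parent contains no repeated values, and the names `pmx`, `repair` and `find_in_section` are hypothetical.

```c
#include <assert.h>

/* Index of v within a[lo..hi), or -1 if it is not there. */
static int find_in_section(const int *a, int lo, int hi, int v)
{
    for (int i = lo; i < hi; i++)
        if (a[i] == v)
            return i;
    return -1;
}

/* Step 5: outside the mapping section, replace any value that also occurs
   in the received section by its partner from the section given away,
   following the mapping until no conflict remains. */
static void repair(int *child, const int *received, const int *given,
                   int n, int cut1, int cut2)
{
    for (int i = 0; i < n; i++) {
        if (i >= cut1 && i < cut2)
            continue;
        int v = child[i];
        int k;
        int steps = 0;  /* guard against malformed input */
        while ((k = find_in_section(received, cut1, cut2, v)) >= 0 && steps++ < n)
            v = given[k];
        child[i] = v;
    }
}

/* Partially mapped crossover of p1 and p2 into c1 and c2 (each of length n). */
void pmx(const int *p1, const int *p2, int n, int cut1, int cut2,
         int *c1, int *c2)
{
    assert(0 <= cut1 && cut1 <= cut2 && cut2 <= n);
    /* Step 3: swap the mapping sections. */
    for (int i = 0; i < n; i++) {
        int inside = (i >= cut1 && i < cut2);
        c1[i] = inside ? p2[i] : p1[i];
        c2[i] = inside ? p1[i] : p2[i];
    }
    /* Steps 4 and 5: the mapping is implied by matching positions. */
    repair(c1, p2, p1, n, cut1, cut2);
    repair(c2, p1, p2, n, cut1, cut2);
}
```

Called with the two parents above and cut1 = 3, cut2 = 6, the duplicated 16 in the first child is replaced by its partner 61, and the second child needs no correction.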

Genetic Algorithm cycle

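[Diagram: Initial Population → Evaluation → Selection → Selected Population (Parents) → Crossover → Mutation → Offspring and Parents → Evaluation → Elitist strategy → Next Generation]

Selection:

• Rank selection: $p(i) = q(1-q)^{i-1}$, where i is the rank of the individual (i = 1 for the best) and q is the probability of selecting the best individual
• Fitness proportional
• Tournament

Elitist strategy: make sure that the best individual survives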
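Rank selection with the geometric formula p(i) = q(1-q)^(i-1) can be sketched in C as follows. The assumptions are labeled: ranks are stored 0-based (index 0 is the best individual), the probabilities are used unnormalized as on the slide, and the uniform random number u in [0, 1) is passed in so the result is reproducible. Both function names are hypothetical.

```c
#include <assert.h>

/* Fill p[0..n-1] with q(1-q)^i, where index 0 is the best-ranked individual. */
void rank_probabilities(double q, int n, double *p)
{
    assert(q > 0.0 && q < 1.0 && n > 0 && p);
    double term = q;
    for (int i = 0; i < n; i++) {
        p[i] = term;
        term *= 1.0 - q;
    }
}

/* Roulette-wheel pick over the rank probabilities. u is a uniform random
   number in [0, 1); it is scaled by the total so p need not sum to 1. */
int pick_by_rank(const double *p, int n, double u)
{
    assert(p && n > 0 && u >= 0.0 && u < 1.0);
    double total = 0.0;
    for (int i = 0; i < n; i++)
        total += p[i];
    double cumulative = 0.0;
    for (int i = 0; i < n; i++) {
        cumulative += p[i];
        if (u * total < cumulative)
            return i;
    }
    return n - 1;  /* fallback for rounding at the upper edge */
}
```

With q = 0.5 and three individuals the weights are 0.5, 0.25 and 0.125, so the best-ranked individual is picked more than half of the time while the worst still has a chance.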

Note: In the plot, fitnesses are plotted as (1 − R²), so the problem can be treated as a minimization.

Source: A. Yasri and D. Hartsough, "Toward an Optimal Procedure for Variable Selection and QSAR Model Building," J. Chem. Inf. Comput. Sci. 2001, Vol. 41, No. 5, pp. 1218-1227.

Search space in feature selection

A data set with 10 features

Number of features to be selected (N)    Search space    Share
0                                        1               0.1%
1                                        10              1.0%
2                                        45              4.4%
3                                        120             11.7%
4                                        210             20.5%
5                                        252             24.6%
6                                        210             20.5%
7                                        120             11.7%
8                                        45              4.4%
9                                        10              1.0%
10                                       1               0.1%
Total                                    1024
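The search-space sizes above are the binomial coefficients C(10, N), and their total is 2^10 = 1024. A short C sketch that reproduces them (function names are hypothetical; the multiplicative formula is used so intermediate values stay small):

```c
#include <assert.h>

/* Number of ways to choose k features out of n. The running product is
   always an exact binomial coefficient, so each division is exact. */
unsigned long binomial(unsigned n, unsigned k)
{
    assert(k <= n);
    if (k > n - k)
        k = n - k;  /* use symmetry C(n, k) = C(n, n - k) */
    unsigned long result = 1;
    for (unsigned i = 1; i <= k; i++)
        result = result * (n - k + i) / i;
    return result;
}

/* Total number of feature subsets of any size, i.e. the sum over all k. */
unsigned long search_space_size(unsigned n)
{
    unsigned long total = 0;
    for (unsigned k = 0; k <= n; k++)
        total += binomial(n, k);
    return total;
}
```

Because the total doubles with every added feature, exhaustive search is feasible for 10 features but quickly becomes impractical, which is the motivation for using a GA for feature selection.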
