Source: developer.download.nvidia.com/GTC/...poster_Memetic_UCinar_ATemizelv3.pdf


• Genome size is the parameter that determines the length of the genetic information carried by each individual.

• Increasing the genome size lengthens execution on both the CPU and the GPU, since the amount of data processed is directly proportional to the genome size.

ABSTRACT

This paper describes a parallel memetic algorithm implementation for the CUDA platform. The conventional genetic operators are adapted to the GPU architecture. In this population-based optimization technique, there are one or more islands, and each island consists of a constant number of individuals. Each CUDA thread is responsible for the evolution of one individual, and islands are mapped to CUDA blocks to benefit from shared memory. The results show up to 38x speedup compared to the CPU implementation.

GENERAL DESIGN

Populations reside in thread blocks, so shared memory is used effectively by the genetic operators, including crossover, mutation and local search. In each thread block, each thread is responsible for one individual.

At the beginning of the kernel, each thread reads its genome data from global memory and copies it into shared memory. From that point on, all calculations done by the threads operate on shared memory, except for the migration operator. Since migration requires communication between distinct islands (thread blocks), global memory has to be used for this operation.
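The island-to-block mapping can be illustrated with a little index arithmetic. The Python sketch below is a host-side illustration only, not the poster's kernel: the function names, the row-major genome layout and the 4-byte gene type are assumptions made for this example.

```python
# Illustrative index arithmetic for the island-to-block mapping.
# Assumptions (not stated on the poster): genomes are stored row-major
# in global memory and every gene is a 4-byte float.

BYTES_PER_GENE = 4  # assumed float32 genes


def global_offset(block_id: int, thread_id: int,
                  population_size: int, genome_size: int) -> int:
    """Index of the first gene of one individual in global memory.

    block_id selects the island, thread_id selects the individual
    inside that island.
    """
    individual = block_id * population_size + thread_id
    return individual * genome_size


def shared_bytes_per_island(population_size: int, genome_size: int) -> int:
    """Shared memory needed to hold the genomes of one island."""
    return population_size * genome_size * BYTES_PER_GENE


if __name__ == "__main__":
    # Individual 5 of island 2, with 64 individuals per island and 4 genes each.
    print(global_offset(2, 5, 64, 4))
    # Largest tested island (512 individuals) with a genome size of 4.
    print(shared_bytes_per_island(512, 4))
```

Under these assumptions, even the largest tested island needs only a few kilobytes for its genomes, which is why keeping a whole island in the shared memory of one block is practical.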

KERNEL PSEUDO CODE

EXPERIMENTS

TESTING CONFIGURATION

The performance of our implementation was evaluated on a PC with an Intel Core i7 870 @ 2.93 GHz processor and an NVIDIA GeForce GTX 460 (336 cores) GPU. All experiments were repeated 20 times using the same configuration.

The parallel memetic algorithm configuration:

• population size: 16 to 512

• number of islands: 16 to 1024

• genome size: 4

• tournament selection with winning probability of 0.8

• mutation probability: 0.05

• local search iteration number: 10

• local search step size: 0.001

• migration interval: 10
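For readers who want to reproduce the setup, the configuration above could be captured in a single immutable record. This is a sketch only: the class name `MemeticConfig` and its field names are hypothetical, and the defaults simply use the lower end of the tested ranges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemeticConfig:
    """Hypothetical container for the parameters listed on the poster."""
    population_size: int = 16          # individuals per island (tested: 16 to 512)
    num_islands: int = 16              # islands, i.e. thread blocks (tested: 16 to 1024)
    genome_size: int = 4
    tournament_win_prob: float = 0.8   # probability that the fitter contestant wins
    mutation_prob: float = 0.05
    local_search_iters: int = 10
    local_search_step: float = 0.001
    migration_interval: int = 10

    def total_individuals(self) -> int:
        """Number of individuals (one thread each) across all islands."""
        return self.population_size * self.num_islands


if __name__ == "__main__":
    largest = MemeticConfig(population_size=512, num_islands=1024)
    print(largest.total_individuals())
```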

CONCLUSION

• Although the speedup depends on the fitness function, the GPU implementation outperforms the CPU implementation for all tested fitness functions.

• The best speedup, 38x, is achieved with the Griewangk function.

• Future work will focus on more complex optimization problems that provide solutions to real-world problems.

• Moreover, different genetic operators (local search, mutation and crossover) are planned to be implemented on CUDA to achieve better optimization results.

Parallel Memetic Algorithm Implementation on CUDA

Umut Cinar, Alptekin Temizel

Graduate School of Informatics, Middle East Technical University, Ankara, Turkey

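Function | Formula | Global Minimum | Properties

Rosenbrock | $f_{Ros}(x) = \sum_{i=1}^{p-1} \left[ 100\,(x_{i+1} - x_i^2)^2 + (x_i - 1)^2 \right]$ | $x^* = (1, 1, \ldots, 1)$; $f_{Ros}(x^*) = 0$ | Not separable

Rastrigin | $f_{Ras}(x) = 10p + \sum_{i=1}^{p} \left[ x_i^2 - 10\cos(2\pi x_i) \right]$ | $x^* = (0, 0, \ldots, 0)$; $f_{Ras}(x^*) = 0$ | Separable and multimodal

Griewangk | $f_{Gri}(x) = 1 + \sum_{i=1}^{p} \frac{x_i^2}{4000} - \prod_{i=1}^{p} \cos\left(\frac{x_i}{\sqrt{i}}\right)$ | $x^* = (0, 0, \ldots, 0)$; $f_{Gri}(x^*) = 0$ | Not separable and multimodal

Schwefel | $f_{Sch}(x) = \sum_{i=1}^{p} \left( \sum_{j=1}^{i} x_j \right)^2$ | $x^* = (0, 0, \ldots, 0)$; $f_{Sch}(x^*) = 0$ | Not separable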
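The four benchmark functions in the table can be written down directly. The following Python versions are a plain CPU reference sketch using the standard definitions of these functions; the function names are chosen for this example and are not taken from the poster's code.

```python
import math
from typing import Sequence


def rosenbrock(x: Sequence[float]) -> float:
    """Sum over consecutive gene pairs; minimum 0 at (1, ..., 1)."""
    return sum(100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (x[i] - 1.0) ** 2
               for i in range(len(x) - 1))


def rastrigin(x: Sequence[float]) -> float:
    """Separable and multimodal; minimum 0 at the origin."""
    return 10.0 * len(x) + sum(xi ** 2 - 10.0 * math.cos(2.0 * math.pi * xi)
                               for xi in x)


def griewangk(x: Sequence[float]) -> float:
    """Quadratic term minus a product of cosines; minimum 0 at the origin."""
    quadratic = sum(xi ** 2 for xi in x) / 4000.0
    product = math.prod(math.cos(xi / math.sqrt(i))
                        for i, xi in enumerate(x, start=1))
    return 1.0 + quadratic - product


def schwefel(x: Sequence[float]) -> float:
    """Sum of squared prefix sums; minimum 0 at the origin."""
    total = 0.0
    prefix = 0.0
    for xj in x:
        prefix += xj          # running sum x_1 + ... + x_i
        total += prefix ** 2
    return total


if __name__ == "__main__":
    # Each function should evaluate to (approximately) zero at its global minimum.
    print(rosenbrock([1.0] * 4), rastrigin([0.0] * 4),
          griewangk([0.0] * 4), schwefel([0.0] * 4))
```

The cost per evaluation differs noticeably between these functions (trigonometric calls, products, nested sums), which is relevant because the fitness function dominates the run time of a memetic algorithm.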

Initialize shared memory buffers
WHILE termination criterion is not met DO
    evaluate individual
    local search
    tournament selection
    IF thread_id < warp_size
        offspring = crossover(parents)
        offspring = mutation(offspring)
        replace worst_individuals with offspring
    ELSE IF thread_id < warp_size*2 AND (MIGRATION_INTERVAL is true)
        migrate(best_individuals)
    END IF
END WHILE
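To make the control flow of the pseudo code concrete, here is a sequential Python sketch of the evolution loop for a single island. It is not the CUDA kernel. Several pieces are assumptions made for illustration: the `sphere` placeholder fitness, hill climbing as the local search, uniform crossover, Gaussian mutation, and a fixed generation count as the termination criterion. Migration is omitted because it involves several islands.

```python
import random


def sphere(x):
    """Placeholder fitness to minimise (assumption, not from the poster)."""
    return sum(v * v for v in x)


def local_search(ind, fitness, iters=10, step=0.001, rng=random):
    """Hill climbing: nudge one gene by +/- step and keep improvements."""
    best, best_fit = list(ind), fitness(ind)
    for _ in range(iters):
        cand = list(best)
        gene = rng.randrange(len(cand))
        cand[gene] += rng.choice((-step, step))
        cand_fit = fitness(cand)
        if cand_fit < best_fit:
            best, best_fit = cand, cand_fit
    return best, best_fit


def tournament(pop, fits, win_prob=0.8, rng=random):
    """Binary tournament: the fitter contestant wins with probability win_prob."""
    a, b = rng.sample(range(len(pop)), 2)
    better, worse = (a, b) if fits[a] <= fits[b] else (b, a)
    return pop[better] if rng.random() < win_prob else pop[worse]


def crossover(p1, p2, rng=random):
    """Uniform crossover (assumed operator)."""
    return [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]


def mutate(ind, prob=0.05, scale=0.1, rng=random):
    """Gene-wise Gaussian mutation with probability prob (assumed operator)."""
    return [g + rng.gauss(0.0, scale) if rng.random() < prob else g for g in ind]


def evolve_island(pop, fitness, generations, n_offspring, rng=random):
    """Run the loop from the pseudo code on one island, without migration."""
    for _generation in range(generations):
        # Evaluate and refine every individual (one thread each on the GPU).
        improved = [local_search(ind, fitness, rng=rng) for ind in pop]
        pop = [ind for ind, _fit in improved]
        fits = [fit for _ind, fit in improved]
        # Variation: tournament selection, crossover, mutation.
        offspring = []
        for _child in range(n_offspring):
            p1 = tournament(pop, fits, rng=rng)
            p2 = tournament(pop, fits, rng=rng)
            offspring.append(mutate(crossover(p1, p2, rng=rng), rng=rng))
        # Replace the worst individuals with the offspring.
        order = sorted(range(len(pop)), key=lambda i: fits[i])  # best first
        survivors = [pop[i] for i in order[:len(pop) - n_offspring]]
        pop = survivors + offspring
    return pop


if __name__ == "__main__":
    rng = random.Random(0)
    island = [[rng.uniform(-5.0, 5.0) for _ in range(4)] for _ in range(16)]
    final = evolve_island(island, sphere, generations=20, n_offspring=4, rng=rng)
    print(min(sphere(ind) for ind in final))
```

Because the best individuals always survive replacement and the local search only accepts improvements, the best fitness on the island never gets worse from one generation to the next in this sketch.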

• Threads within a block continue in parallel until the variation and migration operators take place.

• At this point, two warps of threads execute the crossover/mutation and migration operations. The first warp is assigned to crossover/mutation, while the second warp is assigned to migration (when a migration interval is reached).

• Hence, this part of the implementation is completely unrolled by using one warp for each operation. This approach significantly decreases the cost of warp divergence.
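The warp-level role assignment can be sketched as a small function of the thread index. This Python snippet is illustrative only: the name `thread_role`, the role labels, and the reading of "MIGRATION_INTERVAL is true" as "the generation counter is a multiple of the migration interval" are assumptions.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def thread_role(thread_id: int, generation: int,
                migration_interval: int = 10,
                warp_size: int = WARP_SIZE) -> str:
    """Return the operation a thread performs in the branching part of the kernel."""
    if thread_id < warp_size:
        return "variation"      # first warp: crossover, mutation, replacement
    if thread_id < 2 * warp_size and generation % migration_interval == 0:
        return "migration"      # second warp: migrate best individuals
    return "idle"               # remaining threads skip this part


if __name__ == "__main__":
    # Every thread of a given warp takes the same branch, so no warp diverges.
    for warp in range(3):
        roles = {thread_role(t, generation=20)
                 for t in range(warp * WARP_SIZE, (warp + 1) * WARP_SIZE)}
        print(warp, roles)
```

Since the branch condition depends only on which warp a thread belongs to, all 32 threads of a warp follow the same path, which is the property that keeps divergence cost low.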

• The experiments show that the results depend heavily on the type of fitness function. In a memetic algorithm, the fitness function is the most computationally complex routine, and this is reflected in how strongly performance depends on the fitness function type.

• The maximum speedup, 38x, is achieved with the Griewangk function.

[Figure: GPU speedup versus Population Size and Number of Islands, one panel per benchmark function. Panel titles: Rosenbrock function on GPU (max speedup is x13); Rastrigin function on GPU (max speedup is x25); Schwefel function on GPU (max speedup is x9); Griewangk function on GPU (max speedup is x38).]

Speedup was investigated using several benchmark functions, including Rosenbrock, Rastrigin, Schwefel and Griewangk, under various parameter settings.

INFLUENCE OF GENOME SIZE ON EXECUTION TIME

[Figure: Genome Size vs Execution Time, CPU and GPU series, logarithmic Execution Time (ms) axis. Data labels recovered from the chart:]

Genome Size | CPU (ms) | GPU (ms)

2 | 1989 | 53

4 | 3408 | 97

6 | 5027 | 141

8 | 6626 | 185

10 | 8536 | 229
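The data labels of the genome size chart allow a quick speedup calculation. The sketch below assumes that the smaller series (53 to 229 ms) belongs to the GPU and the larger series (1989 to 8536 ms) to the CPU, which matches the reported result that the GPU is faster. The variable and function names are chosen for this example.

```python
# Execution times read from the chart's data labels (milliseconds).
# Assumption: the smaller values are the GPU series, the larger the CPU series.
genome_sizes = [2, 4, 6, 8, 10]
cpu_ms = [1989, 3408, 5027, 6626, 8536]
gpu_ms = [53, 97, 141, 185, 229]


def speedups(cpu, gpu):
    """Element-wise CPU time divided by GPU time."""
    return [c / g for c, g in zip(cpu, gpu)]


def increments(series):
    """Differences between consecutive measurements."""
    return [b - a for a, b in zip(series, series[1:])]


if __name__ == "__main__":
    for size, s in zip(genome_sizes, speedups(cpu_ms, gpu_ms)):
        print(f"genome size {size}: speedup {s:.1f}x")
    # Constant increments indicate linear growth with genome size.
    print(increments(gpu_ms))
```

Under that assumption the speedup stays roughly between 35x and 38x over the tested genome sizes, and the GPU time grows by the same amount for every two additional genes, in line with the statement that execution time is proportional to genome size.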
