Cracow Grid Workshop – 15-18 October 2006 - 1
Large-Scale Evolutionary Optimization on the Grid: Multiple-Deme Genetic Algorithm in the Globus-Based Environment
Adam Padee ([email protected]), Wojciech Padee ([email protected]), Krzysztof Zaremba ([email protected])
Goals of the project
• Create a tool for numerical optimization of complex problems that are:
  • Computationally very expensive
  • Impossible to solve using classical, gradient-based methods (too many local optima)
• Utilize evolutionary algorithms, as they don’t rely directly on the gradient vector
• The objective function calls an external program and parses its output
  • Easy adaptation to new tasks
• Example application: track reconstruction optimization in HEP experiments (shown at the end of this presentation)
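An objective function that wraps an external program might look like the sketch below. The output convention (a line of the form `FITNESS <value>`), the function name `external_fitness` and the demo command are assumptions made for illustration; the slides do not specify how the real program reports its result.

```python
import subprocess
import sys

def external_fitness(command, params, timeout=900):
    """Run an external program with the parameters as arguments and parse its output."""
    args = list(command) + [repr(float(p)) for p in params]
    result = subprocess.run(args, capture_output=True, text=True,
                            check=True, timeout=timeout)
    # Assumption: the program prints one line of the form "FITNESS <value>"
    for line in result.stdout.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == "FITNESS":
            return float(parts[1])
    raise ValueError("no FITNESS line found in the program output")

# Hypothetical stand-in for the real external program: prints the sum of squares
DEMO_COMMAND = [
    sys.executable, "-c",
    "import sys; print('FITNESS', sum(float(a) ** 2 for a in sys.argv[1:]))",
]

if __name__ == "__main__":
    print(external_fitness(DEMO_COMMAND, [1.0, 2.0]))
```

Adapting the optimizer to a new task then only requires changing the command and the parsing rule, which is the "easy adaptation" the slide refers to.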
Common architectures of the parallel evolutionary algorithms (1)
• Master-slave
  • One population is stored on a server (master node); calculation of the fitness function values is distributed among the worker nodes (slaves)
  • Synchronous
  • Asynchronous (split population or SSGA – Steady State Genetic Algorithm)
• Multiple population algorithms (also called coarse-grained)
  • They consist of multiple independent populations, exchanging only selected individuals. Frequency of the exchanges, migration channels and the operators applied to the individuals depend on the model, e.g.:
    • Fully connected topology (suitable especially for parallel supercomputers)
    • Island Model (arbitrary topology, simple migrations but less frequent)
    • Pollen transmission
    • Social
    • ...
Common architectures of the parallel evolutionary algorithms (2)
• Cellular (also called fine-grained)
  • One population divided spatially among neighboring processors. Each of them can process one or more individuals. Selection and crossing-over take place only among neighbors. Most popular implementations:
    • Hardware (dedicated integrated circuits)
    • Software: usually on SIMD processors, although there are also very efficient implementations on ccNUMA architecture (Cache Coherent Non-Uniform Memory Access)
• Hierarchical
  • Coarse-grained algorithms consisting of multiple cellular or master-slave algorithms. This is the most advanced and also the most flexible architecture.
Asynchronous master-slave SSGA on a single cluster
1. Create an empty population.
2. Create an empty execution queue (this is an internal object mapped to one of the physical queues in the batch system).
3. If there are not at least two free places in the execution queue, go to step 5.
4. Check if there are free places in the population:
  a) If there are, create two random individuals, place them in the execution queue and return to step 3.
  b) If not, select two individuals using the reproduction operator. Apply crossing-over and mutation. Place the offspring in the execution queue. Return to step 3.
5. Wait until one of the clients finishes its work. Collect the results.
6. If there are free places in the population, place the newcomer in one of them. If not, select the individual to replace using the reverse reproduction operator (tournament, proportional or random).
7. Check if the stop criterion has been reached. If yes, terminate the program; otherwise return to step 3.
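The seven steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a thread pool stands in for the batch system, `sphere` is a hypothetical cheap objective to minimize, the stop criterion is a fixed evaluation budget, and the operators (one-point crossover, Gaussian mutation, binary tournament) are generic choices rather than the authors' actual ones. It also assumes `pop_size >= queue_size` and `dim >= 2`.

```python
import random
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def sphere(x):
    # Hypothetical objective; the real one would run as a batch job
    return sum(v * v for v in x)

def tournament(pop, rng, best=True, k=2):
    """Index of the best (or, for replacement, the worst) of k random individuals."""
    idx = [rng.randrange(len(pop)) for _ in range(k)]
    pick = min if best else max
    return pick(idx, key=lambda i: pop[i][1])

def ssga(fitness, dim=3, pop_size=20, queue_size=4, max_evals=400, sigma=0.3, seed=0):
    rng = random.Random(seed)
    population = []                  # step 1: (genome, fitness) pairs
    running = {}                     # step 2: execution queue, future -> genome
    submitted = evaluated = 0
    best = None
    with ThreadPoolExecutor(max_workers=queue_size) as pool:
        while evaluated < max_evals:                     # step 7: stop criterion
            # step 3: submit only while two queue places are free
            while queue_size - len(running) >= 2 and submitted < max_evals:
                if len(population) + len(running) < pop_size:
                    # step 4a: free places in the population, create random individuals
                    kids = [[rng.uniform(-5.0, 5.0) for _ in range(dim)]
                            for _ in range(2)]
                else:
                    # step 4b: reproduction, crossing-over and mutation
                    a = population[tournament(population, rng)][0]
                    b = population[tournament(population, rng)][0]
                    cut = rng.randrange(1, dim)
                    kids = [a[:cut] + b[cut:], b[:cut] + a[cut:]]
                    kids = [[g + rng.gauss(0.0, sigma) for g in kid] for kid in kids]
                for kid in kids:
                    running[pool.submit(fitness, kid)] = kid
                    submitted += 1
            # step 5: wait until at least one slave finishes, collect the results
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in done:
                entry = (running.pop(fut), fut.result())
                evaluated += 1
                if best is None or entry[1] < best[1]:
                    best = entry
                # step 6: fill a free place or replace via reverse tournament
                if len(population) < pop_size:
                    population.append(entry)
                else:
                    population[tournament(population, rng, best=False)] = entry
    return best, population

if __name__ == "__main__":
    (genome, value), _ = ssga(sphere)
    print(genome, value)
```

Because results are collected in completion order, the run is not deterministic even with a fixed seed; that is inherent to the asynchronous scheme.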
Implementation details (LSF and OpenPBS)
• Master process runs on the batch system server (or, for LSF, on the designated UI machine):
  • Creates new individuals and applies the genetic operators
  • Registers input data in MSS
  • Runs and monitors the slave processes
  • Collects the results using batch system mechanisms and assigns the fitness values to the individuals in the execution queue (the execution queue is the program’s internal object; mapping to the batch system queue is done via the appropriate API)
• Batch system introduces a delay of a couple of seconds:
  • Registration in the queue
  • Selection of the free CPU
  • Transfer of the parameters
  • Monitoring
  • Gathering results
• With job flow around 50-100 jobs/sec, the failure rate doesn’t exceed 10% (in a real-life application – RECON 2000)
Flat master-slave on the Grid
• At first glance, implementation is relatively easy
  • Convenient API/CLI functions for job submission
  • Single sign-on allows the master process to operate autonomously
  • Global file systems (e.g. LFC) facilitate data access
But ...
• Approximately 100 more processors (100 more slaves, network bandwidth requirements are very high)
• Complicated task monitoring and error analysis
• Job submission overhead can reach the order of minutes for a single job
• RB + L&B is not prepared for a massive submission of short jobs (frequent failures, disturbance for other users)
Island Model GA: basic concept
[Diagram: six islands, GA 1 to GA 6]
Growth phase: the population on each of the islands is developed independently
[Diagram: the same six islands, GA 1 to GA 6, connected by migration channels]
Migration phase: each population selects one or more individuals (usually the best ones) and sends them to the neighboring island, where the immigrants are introduced into the local population
Island model: parameters
• Size of the member populations
• Migration topology (directed graph)
• Frequency of the migrations
• Selection of the migrants and adoption of the immigrants
These parameters have a big influence on the convergence speed, but the optimal choice of their values depends strongly on the optimized function and on the infrastructure used (type of computer, cost of CPU cycles vs. communication). There are models based on Markov chains that allow these values to be calculated for a given probability of reaching the global optimum, but their applicability is limited to very simple cases.
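The four parameters listed above map directly onto the arguments of a small island-model simulation. The sketch below runs sequentially in one process and is meant only to show where each parameter enters; the objective (`sphere`), the truncation selection, the averaging crossover and the directed-ring topology are illustrative assumptions, not the configuration used by the authors.

```python
import random

def sphere(x):
    # Hypothetical objective to minimize
    return sum(v * v for v in x)

def evolve(pop, fitness, rng, generations, sigma=0.2):
    """Growth phase: a simple elitist GA on one island."""
    for _ in range(generations):
        pop.sort(key=fitness)
        parents = pop[: len(pop) // 2]            # truncation selection
        children = []
        while len(parents) + len(children) < len(pop):
            a, b = rng.sample(parents, 2)
            children.append([(u + v) / 2 + rng.gauss(0.0, sigma)
                             for u, v in zip(a, b)])
        pop[:] = parents + children

def migrate(islands, fitness, topology, n_migrants=1):
    """Migration phase: copy the best individuals along directed edges (src, dst)."""
    outgoing = {src: sorted(islands[src], key=fitness)[:n_migrants]
                for src, _ in topology}
    for src, dst in topology:
        islands[dst].sort(key=fitness)
        # immigrants replace the worst local individuals
        islands[dst][-n_migrants:] = [list(m) for m in outgoing[src]]

def island_model(n_islands=4, island_size=10, dim=3,
                 epochs=10, epoch_length=5, seed=1):
    rng = random.Random(seed)
    islands = [[[rng.uniform(-5.0, 5.0) for _ in range(dim)]
                for _ in range(island_size)] for _ in range(n_islands)]
    ring = [(i, (i + 1) % n_islands) for i in range(n_islands)]   # directed ring
    for _ in range(epochs):
        for pop in islands:
            evolve(pop, sphere, rng, epoch_length)
        migrate(islands, sphere, ring)
    return min((ind for pop in islands for ind in pop), key=sphere)

if __name__ == "__main__":
    best = island_model()
    print(best, sphere(best))
```

Here `island_size` is the size of the member populations, `ring` is the migration topology, `epoch_length` sets the migration frequency, and `migrate` encodes the selection of migrants and the adoption of immigrants.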
Flat Island Model GA on the Grid
• One deme per CPU requires high migration rates (bandwidth problems)
• Flat model of communication is hard to implement across sites
  • Grid-wide MPI is not available in big production grids like EGEE
  • Introduction of a dedicated service is not flexible
  • Possible exchange of information via replicas in LFC: a very slow and inefficient solution
Hybrid algorithm: Islands with master-slave SSGA populations
• One island is formed on each cluster
  • Thanks to fast internal communication, the master-slave algorithm for clusters can be used with only slight modifications
• Master runs on the gatekeeper (CE), which is usually a batch system server or at least has proper rights to run batch jobs directly (via qsub or bsub)
• This machine has outbound and inbound IP connectivity with other sites (at least GLOBUS_TCP_PORT_RANGE ports are open)
• Communication with other islands is possible in any topology
• Relatively big population size at each island allows lower migration rates
• Migrants can also be exchanged via files in LFC
Hybrid algorithm (LFC variant)
[Diagram sequence. Labeled components: UI, JDL with task, Resource Broker, CondorG, CEs each running a GA master (PBS, passwordless SSH), Worker Nodes, Logical File Catalog]
1. Start of the master processes
2. Calculations: running slave processes via PBS/LSF
3. Migration: registration of the migrants
4. Migration: readout of the immigrants
5. Calculations: running slave processes via PBS/LSF
6. Registration of the results: saving the best individuals
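The file-based migration of the LFC variant (registration of the migrants, then readout of the immigrants) can be imitated with a shared directory. In the sketch below a local directory stands in for the Logical File Catalog; the file naming scheme, the JSON format and both function names are assumptions for illustration, and no real grid data-management call is shown.

```python
import json
import os
import tempfile

def register_migrants(catalog_dir, island_id, epoch, migrants):
    """Publish migrants as a file; the atomic rename hides partially written files."""
    final = os.path.join(catalog_dir, f"migrants_{island_id}_{epoch}.json")
    fd, tmp = tempfile.mkstemp(dir=catalog_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        json.dump(migrants, fh)
    os.replace(tmp, final)
    return final

def read_immigrants(catalog_dir, island_id, seen):
    """Return migrants published by other islands that were not read before."""
    immigrants = []
    for name in sorted(os.listdir(catalog_dir)):
        if not name.endswith(".json") or name in seen:
            continue
        sender = name.split("_")[1]          # assumes integer island ids
        if sender == str(island_id):
            continue                          # skip our own emigrants
        with open(os.path.join(catalog_dir, name)) as fh:
            immigrants.extend(json.load(fh))
        seen.add(name)
    return immigrants

if __name__ == "__main__":
    catalog = tempfile.mkdtemp()
    register_migrants(catalog, 1, 0, [[0.1, 0.2]])
    register_migrants(catalog, 2, 0, [[0.3, 0.4]])
    print(read_immigrants(catalog, 1, set()))
```

Both calls return immediately, so an island never blocks waiting for its neighbors; it simply picks up whatever has been registered since its last readout.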
Hybrid algorithm (MPI variant)
[Diagram sequence. Labeled components: UI, JDL with task, MPI-enabled Resource Broker, CondorG, CEs each running a GA master (PBS), Worker Nodes, Logical File Catalog]
1. Start of the master processes
2. Calculations: running slave processes via PBS/LSF
3. Migration: communication through MPI
4. Calculations: running slave processes via PBS/LSF
5. Registration of the results: saving the best individuals
Hybrid algorithm (TCP variant)
[Diagram sequence. Labeled components: UI, JDL with task, Resource Broker, CondorG, CEs each running a GA master (PBS, passwordless SSH), Worker Nodes, Logical File Catalog]
1. Start of the master processes: registration of IP / port
2. Start of the master processes: readout of other machines’ addresses
3. Calculations: running slave processes via PBS/LSF
4. Migration: communication through TCP/IP
5. Calculations: running slave processes via PBS/LSF
6. Registration of the results: saving the best individuals
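For the TCP variant, each master needs to receive migrants without interrupting population development. A minimal sketch, assuming newline-delimited JSON as the wire format and a background thread in place of a separate migration process, is given below; the function names and the format are hypothetical, and address registration in the LFC is replaced by reading the port directly from the socket.

```python
import json
import queue
import socket
import threading

def start_migrant_listener(inbox, host="127.0.0.1", port=0):
    """Accept connections in a background thread and queue the received migrants."""
    server = socket.create_server((host, port))

    def serve():
        while True:
            try:
                conn, _ = server.accept()
            except OSError:          # listening socket was closed
                return
            with conn, conn.makefile("r") as stream:
                for line in stream:
                    inbox.put(json.loads(line))

    threading.Thread(target=serve, daemon=True).start()
    return server

def send_migrants(address, migrants):
    """Send each migrant as one JSON line to a neighboring island."""
    with socket.create_connection(address, timeout=5) as conn:
        for genome in migrants:
            conn.sendall((json.dumps(genome) + "\n").encode())

if __name__ == "__main__":
    inbox = queue.Queue()
    server = start_migrant_listener(inbox)
    send_migrants(server.getsockname(), [[1.0, 2.0], [3.0, 4.0]])
    print(inbox.get(timeout=5), inbox.get(timeout=5))
    server.close()
```

The GA loop would call `inbox.get_nowait()` between batch-system polls and adopt whatever immigrants have arrived, so migrations stay asynchronous.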
Hybrid algorithm on the Grid – conclusions and problems
• Size of each deme should reflect the number of CPUs available at a site (so demes have different sizes)
  • To avoid differences in the convergence speed, it is necessary to differentiate the intensities and ranges of the genetic operators. For example, smaller demes have a lower Gaussian mutation range, thus performing their search more locally.
• Length of the epoch at each deme should be variable and adapted to local population development
  • Migrations that come too early may lead all the islands to a suboptimal solution
• Migrations have to be done asynchronously
  • This is a problem especially with the MPI or plain TCP versions. To overcome that difficulty, two independent processes are needed (one for migration control, one for population development and batch system management)
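One simple way to differentiate the Gaussian mutation range by deme size is a linear scaling rule, sketched below. The slides state only that smaller demes use a lower range; the linear form, the reference values and the lower bound are assumptions chosen for illustration.

```python
import random

def mutation_sigma(deme_size, reference_size=100, reference_sigma=0.5, min_sigma=0.01):
    """Gaussian mutation range proportional to deme size (hypothetical rule)."""
    return max(min_sigma, reference_sigma * deme_size / reference_size)

def mutate(genome, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma to every gene."""
    return [g + rng.gauss(0.0, sigma) for g in genome]

if __name__ == "__main__":
    rng = random.Random(0)
    for size in (20, 50, 100, 200):
        sigma = mutation_sigma(size)
        print(size, sigma, mutate([0.0, 0.0], sigma, rng))
```

With this rule a 20-slot site searches more locally than a 200-slot site, which is the behavior described in the first bullet.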
Hybrid algorithm on the Grid – conclusions and problems
• MPI and TCP variants are not yet implemented
  • For the most demanding application – particle track reconstruction optimization in HEP – LFC seems to be enough
  • Inter-cluster MPI is not available (at least not in EGEE)
  • Manual TCP/IP communication is troublesome
• It is hard to guess how many job slots are really available for a given VO (different batch system configurations, not always reflected in the information index)
  • This does not affect the SSGA directly, but may eventually lead to improper adaptation of the operator ranges and intensities. Therefore, some islands may lag behind the others due to slower convergence
Test results – simulated behavior on well-known deceptive functions
Rosenbrock function:
$f(x) = 100\,(x_2 - x_1^2)^2 + (1 - x_1)^2$
Griewank function: [formula not recoverable from the transcript]
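For reference, both benchmark functions are easy to code. The Rosenbrock function below is the standard two-variable form, 100(x2 − x1²)² + (1 − x1)². The Griewank function is given in its textbook form, which is an assumption: the variant used in the slides may differ in scaling or offset.

```python
import math

def rosenbrock(x1, x2):
    """Standard two-variable Rosenbrock function; global minimum 0 at (1, 1)."""
    return 100.0 * (x2 - x1 ** 2) ** 2 + (1.0 - x1) ** 2

def griewank(x):
    """Textbook Griewank: 1 + sum(x_i^2)/4000 - prod(cos(x_i / sqrt(i)))."""
    total = sum(v * v for v in x) / 4000.0
    product = math.prod(math.cos(v / math.sqrt(i))
                        for i, v in enumerate(x, start=1))
    return 1.0 + total - product

if __name__ == "__main__":
    print(rosenbrock(1.0, 1.0), rosenbrock(-1.0, 1.0), griewank([0.0, 0.0]))
```

Rosenbrock is hard because of its narrow curved valley, while Griewank has many regularly spaced local minima around a single global one at the origin.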
[Plots, Griewank function: 1x100, 2x100 and 10x100 individuals, fully connected topologies]
[Plots, Rosenbrock function: 1x50, 2x50 and 4x50 individuals, fully connected topologies]
"Real life" application – optimization of particle track reconstruction
[Figure: target, bending magnet, detector planes]
• Input data: set of hits from the detector planes
• Output data: momenta of the charged particles
• To get the momentum we need to reconstruct the whole track first.
Problems
• Particles don’t leave traces in all the detector planes.
• Many hits originate from the background noise.
• Mathematical models used in reconstruction are simplified.
• In one trigger there are tracks from many particles.
Optimized parameters
• Geometrical tolerances on the straight parts of the tracks (areas 1 and 3)
• Number of missing planes allowed in each track
• Precision of the crossing point in the area of the magnet
• Precision of the primary interaction vertex in the target
• ...
And
• Everything in 3 dimensions + angles where applicable
• Each step consists of several iterations controlled by different parameters
• In total, about 70 parameters should be optimized simultaneously
• Evaluation of one set takes about 10 minutes
Results
[Plots: mean number of properly and improperly reconstructed tracks (synchronous, total population size 40, 100 physical events used for fitness calculation)]
[Plots: mean number of properly and improperly reconstructed tracks (asynchronous, total population size 100, 500 physical events used for fitness calculation)]