![Page 1: Parallelisation of Random Number Generation in PLACET Approaches of parallelisation in PLACET Martin Blaha University of Vienna AT CERN 25.09.2013](https://reader035.vdocument.in/reader035/viewer/2022062308/56649ec15503460f94bcc622/html5/thumbnails/1.jpg)
Parallelisation of Random Number Generation in PLACET
Approaches of parallelisation in PLACET
Martin Blaha, University of Vienna, AT
CERN 25.09.2013
Additions to centralised RNG
• Tcl command RandomReset
  o sets the seeds of all streams individually, e.g. RandomReset -stream Misalignments -seed 1234
  o sets default seeds (= reset) if called without arguments
  o sets generators
  o replaces redundancy in Tcl commands that set seeds (e.g. Groundmotion_init)
  o help that lists all streams
• Benchmarks on non-parallelised code
  o gsl causes a slowdown of max. 3% depending on generator
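The per-stream seeding behaviour of RandomReset can be pictured with a minimal C++ sketch of a registry of named streams. The class name `StreamRegistry`, its methods, the default seeds and the use of `std::mt19937` in place of the gsl generators are assumptions for illustration only, not the PLACET implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <random>
#include <string>
#include <vector>

// Hypothetical registry of named RNG streams sketching RandomReset-like
// behaviour. Names, default seeds and std::mt19937 are assumptions.
class StreamRegistry {
public:
    // Registers a stream and seeds it with its default seed.
    void add(const std::string& name, std::uint32_t default_seed) {
        defaults_[name] = default_seed;
        streams_[name].seed(default_seed);
    }
    // Like "RandomReset -stream <name> -seed <seed>": reseeds one stream.
    // at() throws std::out_of_range for an unknown stream name.
    void reset(const std::string& name, std::uint32_t seed) {
        streams_.at(name).seed(seed);
    }
    // Like RandomReset without arguments: every stream back to its default.
    void reset_all() {
        for (const auto& entry : defaults_) streams_.at(entry.first).seed(entry.second);
    }
    std::mt19937& get(const std::string& name) { return streams_.at(name); }
    // Names of all registered streams, e.g. for a help listing.
    std::vector<std::string> names() const {
        std::vector<std::string> out;
        for (const auto& entry : streams_) out.push_back(entry.first);
        return out;
    }
private:
    std::map<std::string, std::uint32_t> defaults_;
    std::map<std::string, std::mt19937> streams_;
};
```

Reseeding one stream leaves every other stream untouched, which is what keeps per-stream seeding reproducible.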
Motivation for parallel execution
Simulation runtimes are long!
“Low-performance” functions:
● SBEND
● QUADRUPOLE
● MULTIPOLE
● ELEMENT
→ they call the RNGs for synchrotron radiation emission
Profile by Yngve Levinsen, Feb. 2013
Parallel Random Number Generation
Problems
Requesting random numbers from one sequential stream for parallel use is uncontrollable
→ need a scheme that is controllable and reproducible
gsl random number generators do not support parallel generation by themselves
Methods for parallel random number generation
● centralized generation
● replicated generation
● distributed generation
● existing Libraries
Centralized RNG: one generator produces all numbers
Advantages:
only one RNG with good sequence
easy implementation
Disadvantage:
race conditions occur (without locking)
fair play not guaranteed, or crashes (programme not stable)
slow if calls are queued (even slower than a single thread)
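A sketch of the centralised approach, assuming `std::mutex` as the lock and `std::mt19937` as the shared generator (both stand-ins, not PLACET code; `CentralRng` and `draw_for_particles` are hypothetical names). The lock removes the race condition, but every draw is serialised, and with OpenMP enabled the particle that receives a given number depends on thread scheduling.

```cpp
#include <cassert>
#include <mutex>
#include <random>
#include <vector>

// One generator shared by all threads. The mutex prevents concurrent access
// to the generator state, so all callers queue up behind the lock.
class CentralRng {
public:
    explicit CentralRng(unsigned seed) : gen_(seed) {}
    double uniform() {
        std::lock_guard<std::mutex> lock(mutex_);  // one caller at a time
        return dist_(gen_);
    }
private:
    std::mutex mutex_;
    std::mt19937 gen_;
    std::uniform_real_distribution<double> dist_{0.0, 1.0};
};

// One number per particle. With OpenMP the loop runs in parallel and the
// mapping "particle -> number" changes from run to run; without OpenMP the
// pragma is ignored and the loop runs serially.
void draw_for_particles(CentralRng& rng, std::vector<double>& out) {
    const long n = static_cast<long>(out.size());
#pragma omp parallel for
    for (long i = 0; i < n; ++i) out[i] = rng.uniform();
}
```

The same set of numbers is always drawn, only their assignment to particles is uncontrolled, which is the reproducibility problem listed above.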
Replicated RNG: initial RNG is copied for each thread
Advantages:
more efficient
easy implementation
Disadvantage:
can suffer from correlations between threads
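The correlation problem of replicated generation shows up directly in a small sketch (the generator type and the function name `replicate` are assumptions): copies of one generator state emit identical sequences unless each copy is reseeded or advanced differently.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Replicated generation: the initial generator is copied once per thread.
// Copying is cheap and needs no locking, but identical copies produce
// identical sequences, the most extreme form of inter-thread correlation.
std::vector<std::mt19937> replicate(const std::mt19937& initial, int n_threads) {
    assert(n_threads > 0);
    return std::vector<std::mt19937>(static_cast<std::size_t>(n_threads), initial);
}
```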
Distributed RNG: each thread has its own generator
Advantages:
efficient: each thread can work stand-alone
thread-safe
reproducible
Disadvantage:
can suffer from correlations
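A sketch of distributed generation with one generator per OpenMP thread. It compiles with or without OpenMP (the `_OPENMP` guard falls back to a single thread). The struct `PerThreadRng`, the generator type and the seeding rule "base seed + thread index" are illustrative assumptions; seeds that differ only slightly are a typical source of the correlations listed above.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// One independent generator per thread, indexed by the OpenMP thread
// number, so no locking is needed.
struct PerThreadRng {
    std::vector<std::mt19937> gens;
    PerThreadRng(unsigned base_seed, int n_threads) {
        assert(n_threads > 0);
        for (int t = 0; t < n_threads; ++t) gens.emplace_back(base_seed + t);  // assumed seeding rule
    }
    std::mt19937& local() {
#ifdef _OPENMP
        return gens.at(static_cast<std::size_t>(omp_get_thread_num()));
#else
        return gens.at(0);  // serial build: thread 0 only
#endif
    }
};

// Each thread fills its iterations using only its own generator.
void fill_uniform(PerThreadRng& rng, std::vector<double>& out) {
    const long n = static_cast<long>(out.size());
#pragma omp parallel for num_threads(static_cast<int>(rng.gens.size()))
    for (long i = 0; i < n; ++i) {
        std::uniform_real_distribution<double> dist(0.0, 1.0);
        out[i] = dist(rng.local());
    }
}
```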
Existing Libraries
SPRNG - Florida State University
hard to find good documentation on how to combine it with parallel code, e.g. OpenMP
PRAND
for CUDA environment on GPU and CPU
good documentation on RNGs in general
Disadvantage: yet another library
Distributed RNG
Summary:
distributed generation is considered the best fit for our needs
Common methods known to produce satisfactory results:
1. Random Tree Method
2. Block Splitting
3. Leapfrog Method
Random Tree Method
• Global RNG for seeding
• Standalone RNG per thread
• Reproducible for known number of threads
new Tcl command to set the number of threads
→ only runs fair for the same number of threads, not for dynamic thread assignment
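A sketch of the random tree method, assuming `std::mt19937` generators and a static partition of particles into one fixed chunk per logical thread (`seed_tree` and `kicks` are hypothetical names). It shows why results are reproducible only for a known number of threads.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// A global seeding generator draws one seed per thread; each thread then
// runs its own stand-alone generator.
std::vector<std::mt19937> seed_tree(unsigned global_seed, int n_threads) {
    assert(n_threads > 0);
    std::mt19937 global(global_seed);
    std::vector<std::mt19937> per_thread;
    for (int t = 0; t < n_threads; ++t) per_thread.emplace_back(global());
    return per_thread;
}

// Gaussian kicks for n_particles. Logical thread t always handles the same
// chunk of particles with its own generator, so the result depends only on
// global_seed and n_threads, not on OpenMP scheduling. A different
// n_threads (or dynamic work assignment) gives different numbers.
std::vector<double> kicks(unsigned global_seed, int n_threads, int n_particles) {
    std::vector<std::mt19937> gens = seed_tree(global_seed, n_threads);
    std::vector<double> out(static_cast<std::size_t>(n_particles));
    const int chunk = (n_particles + n_threads - 1) / n_threads;
#pragma omp parallel for
    for (int t = 0; t < n_threads; ++t) {
        std::normal_distribution<double> gauss(0.0, 1.0);
        for (int i = t * chunk; i < n_particles && i < (t + 1) * chunk; ++i)
            out[static_cast<std::size_t>(i)] = gauss(gens[static_cast<std::size_t>(t)]);
    }
    return out;
}
```

Calling `kicks(seed, 4, n)` twice gives identical vectors, while `kicks(seed, 2, n)` assigns different numbers to some of the same particles.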
Block Splitting
Splits a sequence of random numbers (RNs) into blocks, one per thread
Advantages:
no overlap in random numbers
plays fair
Disadvantages:
allocates a huge array of numbers
number of RNs has to be known in advance
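A block-splitting sketch that follows the slide literally: the whole sequence is pre-generated into one array (the memory cost listed above) and thread `t` reads only its own contiguous block. `BlockSplit` and the `std::mt19937` generator are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// The total count n_total must be known in advance. Thread t owns the
// contiguous index range [begin(t), begin(t + 1)), so blocks never overlap.
struct BlockSplit {
    std::vector<double> numbers;  // the pre-generated array of numbers
    std::size_t n_threads;

    BlockSplit(unsigned seed, std::size_t n_total, std::size_t threads)
        : numbers(n_total), n_threads(threads) {
        assert(threads > 0);
        std::mt19937 gen(seed);
        std::uniform_real_distribution<double> dist(0.0, 1.0);
        for (double& x : numbers) x = dist(gen);  // one sequential stream
    }
    std::size_t begin(std::size_t t) const { return t * numbers.size() / n_threads; }
    // The k-th number of thread t's block.
    double get(std::size_t t, std::size_t k) const {
        assert(begin(t) + k < begin(t + 1));  // stay inside the block
        return numbers[begin(t) + k];
    }
};
```

A generator with fast jump-ahead could avoid the array by starting each thread at the first index of its block; this sketch does not assume one.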
Leapfrog Method
Distributes a sequence of RNs over several threads one by one (round-robin)
Advantages:
number of RNs need not be known in advance
guarantees no overlap of RNs
plays fair, although the order of calls can still be permuted
Disadvantage:
costly calls of random numbers (each thread must skip the numbers of all other threads)
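A leapfrog sketch, assuming each thread holds its own copy of a `std::mt19937` and skips the other threads' numbers with `discard()`. For this generator `discard(z)` costs no more than `z` draws and is linear in common implementations, so each number a thread uses costs about one draw per thread, which illustrates the cost listed above. `LeapfrogRng` is a hypothetical name.

```cpp
#include <cassert>
#include <random>

// With T threads, thread t uses elements t, t+T, t+2T, ... of one sequence.
class LeapfrogRng {
public:
    LeapfrogRng(unsigned seed, unsigned thread, unsigned n_threads)
        : gen_(seed), stride_(n_threads) {
        assert(n_threads > 0 && thread < n_threads);
        gen_.discard(thread);  // move to this thread's first element
    }
    std::mt19937::result_type next() {
        std::mt19937::result_type x = gen_();
        gen_.discard(stride_ - 1);  // skip the elements of the other threads
        return x;
    }
private:
    std::mt19937 gen_;
    unsigned stride_;
};
```

Interleaving `next()` from threads 0, 1, ..., T-1 reproduces the single sequential stream exactly, independent of how many numbers are eventually needed.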
Block splitting vs. Leapfrog
Block splitting and leapfrog both run fair with dynamic thread assignment
Problem: implementation in a distributed, non-centralised way
Period per thread = period of RNG / number of threads
Testing parallel RNG methods
SPEEDUP: -33.3% in runtime for the random tree method
only overhead without synchrotron radiation (nosynrad) and for small numbers of particles
SLOWDOWN: +120% in runtime for the leapfrog method
due to drawing more numbers than needed
Testing via test-bds-track for 300 000 particles, with quadrupoles and multipoles
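Read as speedup factors, and assuming both percentages are relative changes of the same reference runtime $t_0$ (the reference is not stated explicitly):

$$
t_{\mathrm{tree}} = (1 - 0.333)\,t_0 \approx 0.667\,t_0 \;\Rightarrow\; \frac{t_0}{t_{\mathrm{tree}}} \approx 1.5,
\qquad
t_{\mathrm{leapfrog}} = (1 + 1.20)\,t_0 = 2.2\,t_0 .
$$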
Preparation: Tool for Parallelisation - OpenMP
easy implementation
control of variable scope, assignment schedule, critical sections
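A small illustration of the three OpenMP controls named above, written as a hypothetical loop that counts particles outside an aperture (`count_lost_particles` is not a PLACET function). When compiled without OpenMP support the pragmas are ignored and the loop runs serially with the same result.

```cpp
#include <cassert>
#include <vector>

// - variable scope: shared(lost); "excess" is declared inside the loop,
//   so every iteration gets its own private copy
// - assignment schedule: schedule(dynamic, 64) hands out chunks of 64
//   iterations to whichever thread is free
// - critical section: only one thread at a time increments "lost"
long count_lost_particles(const std::vector<double>& amplitude, double aperture) {
    long lost = 0;
    long n = static_cast<long>(amplitude.size());
#pragma omp parallel for shared(lost) schedule(dynamic, 64)
    for (long i = 0; i < n; ++i) {
        double excess = amplitude[i] - aperture;  // private to this iteration
        if (excess > 0.0) {
#pragma omp critical(lost_counter)
            ++lost;
        }
    }
    assert(lost >= 0 && lost <= n);
    return lost;
}
```

For a plain counter, `reduction(+ : lost)` would usually be cheaper than a critical section; the critical section is used here only to show the construct.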
Preparation: Centralising synrad functions
2 functions calculate synrad emission:
synrad.cc
photon_spectrum.cc
Centralised for easier and reproducible use of parallel RNG
synrad.cc has been removed
Tested via test-bds-track for 3e5 particles, same outcome
Implementation of new class
New class PARALLEL_RNG
Inherits all methods from RANDOM_NEW
Always initialises the parallel RNG for the maximum number of available threads
New Tcl command ParallelThreads -num val to choose the number of threads
The RNG stream Radiation now runs fully in parallel by default
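A sketch of how a PARALLEL_RNG-style class might derive from a RANDOM_NEW-style base. The real interfaces of both classes are not shown in this talk, so `RandomNew`, `ParallelRng`, `Uniform()` and `set_threads()` are hypothetical stand-ins; `set_threads()` plays the role of `ParallelThreads -num`. The per-thread generators are sized for `omp_get_max_threads()` at construction and reseeded from the base stream, so the same seed and thread count give the same streams.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical stand-in for RANDOM_NEW: one seeded stream.
class RandomNew {
public:
    explicit RandomNew(unsigned seed = 0) : gen_(seed) {}
    virtual ~RandomNew() = default;
    virtual double Uniform() { return dist_(gen_); }
protected:
    std::mt19937 gen_;
    std::uniform_real_distribution<double> dist_{0.0, 1.0};
};

// Hypothetical PARALLEL_RNG-style class: same interface, one generator per thread.
class ParallelRng : public RandomNew {
public:
    explicit ParallelRng(unsigned seed) : RandomNew(seed) { set_threads(max_threads()); }
    void set_threads(int n) {
        assert(n > 0);
        per_thread_.clear();
        std::mt19937 seeder(gen_);  // copy of the base stream, used only for seeding
        for (int t = 0; t < n; ++t) per_thread_.emplace_back(seeder());
#ifdef _OPENMP
        omp_set_num_threads(n);
#endif
    }
    int threads() const { return static_cast<int>(per_thread_.size()); }
    double Uniform() override {
        std::uniform_real_distribution<double> dist(0.0, 1.0);  // stateless, local per call
        return dist(per_thread_.at(thread_id()));
    }
    static int max_threads() {
#ifdef _OPENMP
        return omp_get_max_threads();
#else
        return 1;
#endif
    }
private:
    static std::size_t thread_id() {
#ifdef _OPENMP
        return static_cast<std::size_t>(omp_get_thread_num());
#else
        return 0;
#endif
    }
    std::vector<std::mt19937> per_thread_;
};
```

Code that only knows the base-class interface keeps working, because `Uniform()` is called through a `RandomNew&`.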
Testing – BDS tracking
Covariance Matrix of test-bds-track
Testing via test-bds-track for 300 000 particles, with quadrupoles and multipoles
Testing – CLIC beam tracking
[Figures: beam tracking with no correction; beam tracking with simple correction]
Testing test-clic-3 for 3500 machines
Time Profile
Total runtime on 32 threads (2x 8 cores with Hyper-Threading): 27 sec
Total runtime on 1 core: 1 min 21 sec
Total runtime with PLACET: 58 sec
BDS tracking:
PLACET: 39 sec
PLACET-NEW:~9 sec
BDS TRACKING 3.5 times faster
Time profile for BDS tracking for 300 000 particles on 2x Intel Xeon E5-2650 2.00 GHz 8-core (16 threads w/ Hyper-Threading) (95 W, 20 MB, 2.8 GHz Turbo, Sandy Bridge EP)
Profiling
BDS:
SBEND, ELEMENT, MULTIPOLE and QUADRUPOLE are still the most time-consuming functions
Linac: the OpenMP library causes a slowdown in simple-correction routines
(e.g. test-clic-4)
76% of the time is spent by OpenMP in wait_sleep
It was necessary to find a compromise!
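One possible compromise, offered as an assumption rather than the one chosen here: open a parallel region only when a loop has enough work, using the OpenMP `if` clause, so that small loops run on a single thread and do not pay for waking and synchronising the thread team. The threshold `kMinParallelWork` and the function `scale_coordinates` are hypothetical and would need benchmarking.

```cpp
#include <cassert>
#include <vector>

// Hypothetical threshold below which a loop is not worth a parallel region;
// the value is a placeholder to be tuned by benchmarks.
const long kMinParallelWork = 10000;

// With the "if" clause false, the region is executed by a team of one
// thread, so no other threads are woken up for a small amount of work.
void scale_coordinates(std::vector<double>& x, double factor) {
    long n = static_cast<long>(x.size());
#pragma omp parallel for if(n >= kMinParallelWork) schedule(static)
    for (long i = 0; i < n; ++i) x[i] *= factor;
}
```

The OpenMP specification also defines the `OMP_WAIT_POLICY` environment variable (`ACTIVE` or `PASSIVE`), which hints whether waiting threads should keep consuming processor time or stay passive.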
Conclusion
BDS runs ~30% faster (total runtime)
CLIC 4 runs ~13% faster
compared to the current PLACET in the trunk
OpenMP is a quick and easy way to parallelise existing functions.
Future Plan
• Need to understand the overhead when running sequentially
• Benchmark performance of quick functions e.g. dipoles, drifts, step-in, BPMs
• Adjust automatically to current configuration
• Write technical/user documentation
• Merge into trunk