![Page 1: Ilija Vukotic vukotic@lal.in2p3.fr GP using GP GPU my experience with OpenCL Future computing in particle physics 15. Jun. 2011](https://reader035.vdocument.in/reader035/viewer/2022081514/56649f225503460f94c3ac38/html5/thumbnails/1.jpg)
Ilija Vukotic vukotic@lal.in2p3.fr
GP using GP GPU
my experience with OpenCL
Future computing in particle physics
15. Jun. 2011
Long time ago …
20/04/23 Ilija Vukotic 2
Nucleon interactions:
• Strong force
• Electromagnetic
Liquid drop model – Gamow, Bohr, Wheeler
1935 – Carl Friedrich von Weizsäcker SEMF
Long time ago …
Terms: Volume, Surface, Coulomb, Asymmetry, Pairing
Magic numbers: 2, 8, 20, 28, 50, 82, 126
Weizsäcker Semi-Empirical Mass Formula
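The five terms named above can be put into a short numerical sketch. The coefficients below are an assumption (one commonly quoted least-squares fit, in MeV; published values differ between sources), and the function names are made up for this illustration.

```python
import math

# Assumed coefficients in MeV (one common fit; values vary by source).
A_VOLUME, A_SURFACE, A_COULOMB, A_ASYMMETRY, A_PAIRING = 15.8, 18.3, 0.714, 23.2, 12.0


def pairing_term(a: int, z: int) -> float:
    """Pairing correction: positive for even-even, negative for odd-odd, zero for odd A."""
    n = a - z
    if a % 2 == 1:
        return 0.0
    sign = 1.0 if (z % 2 == 0 and n % 2 == 0) else -1.0
    return sign * A_PAIRING / math.sqrt(a)


def binding_energy(a: int, z: int) -> float:
    """Semi-empirical binding energy in MeV for mass number a and proton number z."""
    volume = A_VOLUME * a
    surface = A_SURFACE * a ** (2 / 3)
    coulomb = A_COULOMB * z * (z - 1) / a ** (1 / 3)
    asymmetry = A_ASYMMETRY * (a - 2 * z) ** 2 / a
    return volume - surface - coulomb - asymmetry + pairing_term(a, z)


if __name__ == "__main__":
    for name, a, z in [("Fe-56", 56, 26), ("Pb-208", 208, 82)]:
        print(f"{name}: B/A = {binding_energy(a, z) / a:.2f} MeV")
```

The formula has no shell term, so it cannot reproduce the extra stability at the magic numbers listed above.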
Long time ago...
These days
• Nuclei don’t look like you imagine them
• Diameter 1.75 – 15 fm
• 37 different models* – from 3 to hundreds of parameters.
*N.D. Cook (2010). Models of the Atomic Nucleus (2nd ed.) Springer
2009 - Be11 GSI - ISOLDE
These days
2010 – Borromean C22 – RIKEN, Tokyo
2008 – Argon - GANIL
These days
Why?
Goals
• Test bounds
• Nuclear Structure
• Phases of Nuclear Matter
• Quantum Chromodynamics
• Nuclei in the Universe
• Fundamental Interactions
• Applications
Experiments
• CERN ISOLDE
• FAIR – GSI
• EURISOL
• SPIRAL2, GANIL – Caen
• RIKEN – Japan
• MSU, ISAAC – USA
Genetic Algorithm
Def.: a heuristic based on the rules of natural evolution.
Ingredients
• Genes
• Individuals
• Population
Used for difficult optimization or search problems.
Operations
• Selection
• Crossover
• Mutation
Flow: initialization → evaluation → selection → cross-over → mutation
Example 1
Example 2
Example 3
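The flow on this slide (initialization, evaluation, selection, cross-over, mutation) can be sketched as a toy loop. This is only an illustration under assumed choices: bit-list individuals, keep-the-better-half selection, single-point cross-over and a hypothetical "count the 1-bits" fitness. None of these settings come from the talk.

```python
import random


def run_ga(fitness, n_bits=16, pop_size=20, generations=30, p_mut=0.05, seed=1):
    """Toy genetic algorithm on bit lists; returns the best fitness per generation."""
    rng = random.Random(seed)
    # initialization: random individuals, each a list of n_bits genes (0 or 1)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # evaluation: rank the population by fitness, best first
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        # selection: keep the better half as parents (elitist, so the best survives)
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            # cross-over: single cut point, head from one parent, tail from the other
            cut = rng.randrange(1, n_bits)
            child = mother[:cut] + father[cut:]
            # mutation: flip each bit with a small probability
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = parents + children
    return history


if __name__ == "__main__":
    # Hypothetical fitness: the number of 1-bits ("OneMax")
    print(run_ga(sum))
```

Because the parents are carried over unchanged, the best fitness can never drop from one generation to the next. Each fitness evaluation is independent of the others, which is the part that parallelizes.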
Genetic Algorithm
Deceptively simple
Only some aspects are theoretically explained. Only experience will help you get an optimal algorithm.
Infinite number of ways to set it up*.
Important decisions:
• Representation (binary, real, multiple sexes, …)
• Crossover (single-point, two-point, continuous, …)
• Selection (elitist strategy, weighted, …)
• Tunings: number of populations, population size, mutation rate, …
* There are even human-based genetic algorithms
Genetic Algorithm
Pros
• Applicability
• Speed
• Embarrassingly parallel
• Robust to local minima
Cons
• Needs full understanding of both problem and method
• Needs tuning for optimal performance
• Speed (in case of very expensive fitness function)
Genetic programming
• Usually a genetic algorithm evolving a computer program optimal for a given task.
• Recent breakthroughs in theoretical explanations
• Important results in the last few years (electronic design, game playing, evolvable hardware)
• Even more complex to set up
• Very computationally intensive
• Usually done in Lisp. Genes are often assembler commands.
Genetic programming
Example:
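[Figure: four expression trees built from the operators +, /, sin, mod, the constant 1 and the variables x, y, z; the later trees differ from the first by exchanged subtrees such as (mod z y) and (sin x).]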
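An expression tree like the ones in this example can be held as nested tuples, close to a Lisp S-expression. The sketch below is an assumption-laden illustration, not the GenetiX representation: the tree shape, the protected division and modulo, and all function names are hypothetical.

```python
import math
import random

# Node = ("op", child, ...) for operators, a str for a variable, a float for a constant.
OPS = {
    "+": lambda a, b: a + b,
    "/": lambda a, b: a / b if b != 0 else 1.0,  # protected division (assumption)
    "mod": lambda a, b: math.fmod(a, b) if b != 0 else 0.0,  # protected (assumption)
    "sin": math.sin,
}


def evaluate(node, env):
    """Recursively evaluate an expression tree for the variable values in env."""
    if isinstance(node, tuple):
        op, *children = node
        return OPS[op](*(evaluate(c, env) for c in children))
    return env[node] if isinstance(node, str) else node


def subtrees(node, path=()):
    """Yield (path, subtree) for every node; a path is a tuple of child indices."""
    yield path, node
    if isinstance(node, tuple):
        for i, child in enumerate(node[1:], start=1):
            yield from subtrees(child, path + (i,))


def replace(node, path, new):
    """Return a copy of node with the subtree at path replaced by new."""
    if not path:
        return new
    i = path[0]
    return node[:i] + (replace(node[i], path[1:], new),) + node[i + 1:]


def crossover(a, b, rng):
    """Subtree crossover: graft a random subtree of b onto a random point of a."""
    path, _ = rng.choice(list(subtrees(a)))
    _, donor = rng.choice(list(subtrees(b)))
    return replace(a, path, donor)


if __name__ == "__main__":
    parent_a = ("+", 1.0, ("+", ("/", ("sin", "x"), "y"), ("mod", "z", "y")))
    parent_b = ("sin", "x")
    env = {"x": 0.5, "y": 2.0, "z": 7.0}
    print(evaluate(parent_a, env))
    print(crossover(parent_a, parent_b, random.Random(0)))
```

Mutation can reuse `replace` with a freshly generated random subtree as the donor.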
GenetiX
Requirements
• Any platform
• Use all CPUs and GPUs
• As simple as possible
• As extensible as possible
Real work
• Started with ARTS in mind
– 4 servers – 16 cores + 4 nVidia GPUs
– Unfortunately of compute capability 1.0
• Decided on OpenCL
– A bit more complex to use than CUDA
– Similar performance expected
• All the genetic operations on CPU only
• Graphics based on Qt (with qwt)
OpenCL part 1
• Usage rather simple
– clGetDeviceIDs
– clCreateContext
– clCreateCommandQueue
– clCreateBuffer
– clEnqueueWriteBuffer / clEnqueueMapBuffer
– clCreateProgramWithSource
– clBuildProgram
– clCreateKernel
– clGetKernelWorkGroupInfo
– clSetKernelArg
– clEnqueueNDRangeKernel
– clFinish
– clEnqueueReadBuffer
OpenCL part 2
• Usage is rather simple, but getting good performance is complex
– Need new tools to measure performance
– Need to know the hardware in detail
• Even the differences between compute capability 1.0 and 1.3 cards are huge
– Need parallel algorithms
Real work part 2
First idea: let OpenCL parse the equation string.
– Fast to build for the CPU; 100x slower for the GPU, even without aggressive optimization.

// The whole equation compiled as a single kernel
__kernel void FF(__global float* A, __global float* B, __global float* R) {
    int i = get_global_id(0);
    R[i] = A[i] + B[i] * sin(A[i]) / pow(A[i], B[i]);
}

Solution:
• equation in postfix format
• operations as separate kernels, uploaded once
• parsed by myself

// One small kernel per operation
__kernel void DIV(__global float* A, __global float* B, __global float* C) {
    int i = get_global_id(0);
    C[i] = native_divide(A[i], B[i]);
}

__kernel void ADD(__global float* A, __global float* B, __global float* C) {
    int i = get_global_id(0);
    C[i] = A[i] + B[i];
}
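The postfix scheme can be illustrated on the CPU. In this sketch each operation is an elementwise routine standing in for one pre-built kernel, and the host walks the postfix tokens with a stack. The token spelling, the dictionaries and `eval_postfix` are assumptions made for the illustration, not the GenetiX code.

```python
import math

# One elementwise routine per operation, standing in for the separate kernels.
BINARY = {
    "+": lambda a, b: a + b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
    "pow": math.pow,
}
UNARY = {"sin": math.sin}


def eval_postfix(tokens, columns):
    """Evaluate a postfix token list over equal-length data columns."""
    stack = []
    for tok in tokens:
        if tok in BINARY:
            right, left = stack.pop(), stack.pop()
            # One "kernel launch": apply the operation to every element
            stack.append([BINARY[tok](a, b) for a, b in zip(left, right)])
        elif tok in UNARY:
            arg = stack.pop()
            stack.append([UNARY[tok](a) for a in arg])
        else:
            stack.append(columns[tok])  # operand: push the named input column
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]


if __name__ == "__main__":
    data = {"A": [1.0, 2.0, 3.0], "B": [0.5, 1.5, 2.5]}
    # A + B * sin(A) / pow(A, B) in postfix, the expression of the FF kernel
    tokens = "A B A sin * A B pow / +".split()
    print(eval_postfix(tokens, data))
```

With this layout a new candidate equation only changes the token list, so nothing has to be recompiled for the GPU between individuals.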
Real work part 3
Idea: Sum elements of fitness function on CPU
Getting results back is way too expensive
• Non-power-of-2 size problems are greatly penalized
• Do one transfer per population and not per individual
• Use page-locked (pinned) memory
Solution:
• Do parallel reduction on the GPU
• Optimal reduction quite complex
[Plot: reduction time (ms, log axis from 0.01 to 10) versus # elements, for seven kernel versions]
1: Interleaved addressing: divergent branches
2: Interleaved addressing: bank conflicts
3: Sequential addressing
4: First add during global load
5: Unroll last warp
6: Completely unroll
7: Multiple elements per thread (max 64 blocks)
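The access pattern of the "sequential addressing" version can be simulated on the CPU. This is a plain Python sketch of the pattern only, not GPU code: the inner loop stands in for the work-items of one step, and the zero padding to a power of two is an assumed way to handle other sizes.

```python
def tree_reduce_sum(values):
    """Sum values with the sequential-addressing pattern of a work-group reduction."""
    # Pad with zeros to the next power of two so every step halves cleanly
    n = 1
    while n < len(values):
        n *= 2
    scratch = list(values) + [0.0] * (n - len(values))
    stride = n // 2
    while stride > 0:
        # On a GPU each index i below would be one work-item, with a barrier
        # between consecutive steps. Here the steps simply run in sequence.
        for i in range(stride):
            scratch[i] += scratch[i + stride]
        stride //= 2
    return scratch[0]


if __name__ == "__main__":
    print(tree_reduce_sum([1.0, 2.0, 3.0, 4.0, 5.0]))
```

The number of steps grows with log2 of the padded size, and only one number per population has to travel back to the host.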
Performance
• MacBook Pro
• CPU
– i5 M520
– 2.40 GHz
– 2 cores / 4 threads
– L2 256 kB
– L3 3 MB
• GPU
– GeForce GT 330M
– CUDA compute capability 1.2
– 6 multiprocessors × 8 cores
– MAX_WORK_GROUP_SIZE: 512
– MAX_CLOCK_FREQUENCY: 1100
• Mac Pro
• CPU
– Quad-Core Xeon
– 2.26 GHz
– 2 processors / 8 cores / 16 threads
– L2 256 kB
– L3 8 MB (per processor)
• GPU
– GeForce GT 120
– CUDA compute capability 1.1
– 30 cores
– MAX_WORK_GROUP_SIZE: 512
– MAX_CLOCK_FREQUENCY: 550
Performance
MacBook Pro
[Plot: equation calculations/s]
Performance
MacPro
[Plot: equation calculations/s]
Doing a very bad job on this CPU!
Problems
• Compute profiler on Mac not well supported by nVidia
• On laptops need to warm up GPU
• Even in simple cases there is no analytical way to pre-calculate the optimal localWorkSize (there is an Excel spreadsheet …)
• Difficult to estimate the influence of non-ECC memory
OpenCL experience
• Against current CPUs (4 cores), no more than a factor of 2–5 can be obtained with compute capability 1.2 cards
• And that only with a very well-suited problem (code)
• Problems smaller than 64k elements shouldn’t be considered
• Problems with large I/O
• Problems with unpredictable branching
To do
• Move project storage to cloud (Google)
• Add OpenMPI
• Move from qwt to ROOT
• Add symbolic reduction
• Add free fit parameters
• Fine GA tuning
• Move from tree to node representation (?)
• “Discover” better description of inter-nucleon interactions.
Disclaimer
No physicist will lose their job because of this or any other similar system.
Physics laws are expressed by equations, but further advancement is made by humans forming a mental picture of what an equation means.
Still, having the equation would greatly help.
Simple search
[Figure: axes X and Y]
Simulated annealing
Hill climbing
Blind kangaroos looking for Mount Everest
Gene: 64-bit number in Gray representation
Individual: two genes concatenated (128 bits)
Mutation: toggle of one random bit
Crossover: with 20% probability take a bit from the other individual
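The encoding described here can be sketched with Python integers as bit strings. The function names are hypothetical, and reading "20% probability" as a per-bit choice is an assumption about what the slide means.

```python
import random

GENE_BITS = 64


def to_gray(n: int) -> int:
    """Binary-reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def from_gray(g: int) -> int:
    """Inverse of to_gray."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def make_individual(x: int, y: int) -> int:
    """Concatenate two Gray-coded 64-bit genes into one 128-bit individual."""
    return (to_gray(x) << GENE_BITS) | to_gray(y)


def decode(individual: int):
    """Split a 128-bit individual back into its two integer coordinates."""
    mask = (1 << GENE_BITS) - 1
    return from_gray(individual >> GENE_BITS), from_gray(individual & mask)


def mutate(individual: int, rng: random.Random) -> int:
    """Toggle one randomly chosen bit."""
    return individual ^ (1 << rng.randrange(2 * GENE_BITS))


def crossover(a: int, b: int, rng: random.Random, p: float = 0.2) -> int:
    """Take each bit from b with probability p, otherwise keep the bit of a."""
    take = 0
    for bit in range(2 * GENE_BITS):
        if rng.random() < p:
            take |= 1 << bit
    return (a & ~take) | (b & take)
```

In Gray code, neighbouring integers differ in exactly one bit, so stepping to an adjacent value never needs more than a single bit toggle.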
Physics systems
HEP analysis cut optimization
Music & Art industry