beam dynamic calculation by nvidia® cuda technology e. perepelkin, v. smirnov, and s. vorozhtsov...
TRANSCRIPT
Beam Dynamic Calculation by
NVIDIA® CUDA Technology
E. Perepelkin, V. Smirnov, and S. Vorozhtsov JINR, Dubna
7 July 2009
Introduction• Cyclotron beam dynamic problems [1]:
• Losses on geometry• Space Charge effects• Optimization of the central region [2]
• CBDA [3] code calculations:• OpenMP ( by CPU )• CUDA ( by GPU )
__________________________________________________________________[1] Beam injection and extraction of RIKEN AVF cyclotron, A. Goto, CNS-RIKEN Workshop on Upgrade
of AVF Cyclotron, CNS Wako Campus, 3-4 March 2008[2] SPIRAL INFLECTORS AND ELECTRODES IN THE CENTRAL REGION OF THE VINCY
CYCLOTRON, E. Perepelkin, A. Vorozhtsov, S. Vorozhtsov, P. Beličev, V. Jocić, N. Nešković, etc., Cyclotrons and Their Applications 2007, Eighteenth International Conference
[3] CBDA - CYCLOTRON BEAM DYNAMICS ANALYSIS CODE, E. Perepelkin, S. Vorozhtsov, RuPAC 2008, Zvenigorod, Russia
Computer model of the cyclotron
Injection line
ESD
Dee
Magnet sectors
Regions of the field maps
Inflector Electric
field
Axial channel Magnetic
field
G1 Magnetic
field
Axial injection line
Cyclotron
Central region optimization
φRF = 13°
φRF = 15°
φRF = 28°
φRF = 10°
Particle losses
Bunch acceleration
Optimization process
S0 S1 S2
S3 S4
Acceleration field map
Very time consuming problem• About 5 different variants – minimum• Many ion species – accelerated• Very complicated structure• Multi macro particle simulations for
SC dominated beams
One run requires ~ several days of computer time
Open Multi-Processing
( Open MP )
Spiral inflector
Beam phase space projections at the inflector entrance
Beam phase space projections
at the inflector exit
Blue points – PIC by FFT (Grid: 25 x 25 x 25 )Red points – PP
MethodWithout OpenMP
With OpenMP
Computerplatform
PP4 h. 53 min. 2 h. 34 min. AMD Turion
64×2, 1.60 GHz
4 h. 38min 1 h. 25 min.Intel Core Quad
2.4 GHz
PIC25 x 25 x 25
~11 min. ~6 min. AMD Turion 64×2, 1.60 GHz
7 min. ~2 min.Intel Core Quad
2.4 GHz
Calculation time
10,000 particlesNo geometry losses
Compute Unified Device Architecture
( CUDA )
GeForce 8800 GTX ( price ~ $300 )
GPU structure
128 SP ( Streaming Processors )
Kernel functions• __global__ void Track ( field maps, particles
coordinates )• Calculate particle motion in electromagnetic
field maps
• __global__ void Losses ( geometry, particles coordinates )• Calculate particle losses on the structure
• __global__ void Rho ( particles coordinates )• Produce charge density for SC effects
Kernel functions• __global__ FFT ( charge density )
• FFT method ( analysis / synthesis )
• __global__ PoissonSolver ( Fourie’s coefficients )• Find solution of Poisson equation
• __global__ E_SC ( electric potential )• Calculate electric field by E = -
grad( U )
__global__ void Track ( )• Function with many parameters. Use
variable type __constant__:• __device__ __constant__ float d_float[200];• __device__ __constant__ int d_int[80];
• Particle number corresponds • int n = threadIdx.x+blockIdx.x*blockDim.x;
• Number of “if, goto, for” should be decreased;
__global__ void Losses ( )• Geometry structure consists from triangles.
Triangles coordinates stored in __shared__ variables. This feature gave drastically increase performance• int tid = threadIdx.x; - used for parallel
copying data to shared memory
• Particle number corresponds to • int n = threadIdx.x+blockIdx.x*blockDim.x; • Check particles and triangle match
__global__ void Rho• Calculate charge impact in the nodes of
mesh from particle with number int n = threadIdx.x+blockIdx.x*blockDim.x;
Cell 7
Cell 1
Cell 8
Cell 3
Cell 2
Cell 5
Cell 6 Node
__global__ FFT ( )• Used real FFT for sin(πn/N) basis functions;• 3D transform consist from three 1D FFT for
each axis: X, Y, Z• int n = threadIdx.x+blockIdx.x*blockDim.x;
k=(int)(n/(NY+1));j=n-k*(NY+1);
m=j*(NX+1)+k*(NX+1)*(NY+1);FFT_X[i+1]=Rho[i+m];
n = j +
k*(NY+1)
NZ
NY
__global__ PoissonSolver ( )• int n =
threadIdx.x+blockIdx.x*blockDim.x;
• Uind(i,j,k) = Uind(i,j,k) / ( kxi2 + kyj
2 + kzk2 )
ind(i,j,k)=i+j*(NX+1)+k*(NX+1)*(NY+1);
k=(int)(n/(NX+1)*(NY+1));j=(int)(n-k*(NX+1)*(NY+1))/(NX+1);i=n-j*(NX+1)-k*(NX+1)*(NY+1);
__global__ E_SC ( )• int n = threadIdx.x+blockIdx.x*blockDim.x+st_ind
Un
Un + 1
Un - 1
Un - ( NX + 1
)
Un + ( NX + 1
)
Un - ( NX + 1 )( NY +
1 )
Un + ( NX + 1 )( NY +
1 )
Performance
Functions*
Time, [msec] Ratio,[x]CPU GPU
Track 486 30 16
Losses 6997 75 93
Rho 79 6 14
Poisson/FFT 35 3 13
E_SC 1.2 0.8 1.4
Total 7598 114 67* Mesh size: 25 x 25 x 25. Particles: 100,000.
Triangles: 2054
Comparison
Number of
particles
Calculation time
Rate,[x]CPU GPU
1,000 3 min 19 sec 12 sec 17
10,000 34 min 14 sec 42 sec 49
100,000 5h 41 min ~6 min 56
1,000,000
2 days 8h 53 min
1h 60
SC effect
no SCLosses 24%
SCLosses 94%
I = 4 mA
Conclusions• Very chipper technology• Increasing of performance at
power 1.5 gave chance to produce the complex cyclotron modeling
• Careful programming• Expand this method for
calculation of beam halo and etc.