This work was funded by the European Fusion Development Agreement
(EFDA) under WP13-SOL-01-01/CCFE/PS.
Greater than 10x Acceleration of fusion plasma edge simulations using the Parareal algorithm
Debasmita Samaddar (1), David P. Coster (2), Xavier Bonnin (3,5), Christoph Bergmeister (1), Eva Havlickova (1), Wael R. Elwasif (4), Lee A. Berry (4), Donald B. Batchelor (4)
1 - CCFE, Culham Science Centre, UK
2 - Max-Planck-Institut für Plasmaphysik, Germany
3 - CNRS-LSPM, Université Paris 13, France
4 - ORNL, USA
5 - Now at ITER Organization, Route de Vinon sur Verdon, 13115 Saint Paul Lez Durance, France
Introduction :
Fusion power / Input power = Q ≥ 10
Event-based parareal implementation using the IPS framework
[4, 5] greatly improves resource utilization as well as gain.
•Relatively environmentally friendly: waste remains radiotoxic for < 100 years
•Virtually unlimited – Deuterium abundant in sea water. Tritium produced from lithium - also abundant in nature.
Temperature - Ti: 1-2×10^8 K (10-20 keV)
~10× temperature of the Sun's core
Density - ni: 1×10^20 m^-3
~10^-6 of atmospheric density
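The ion temperature quoted above in both kelvin and keV can be cross-checked with T = E / k_B. The short sketch below does that conversion; the function name is illustrative and the constant is the CODATA value of the Boltzmann constant in eV/K.

```python
# Boltzmann constant in eV per kelvin (CODATA 2018 value).
BOLTZMANN_EV_PER_K = 8.617333262e-5


def kev_to_kelvin(energy_kev):
    """Convert a temperature expressed in keV to kelvin via T = E / k_B."""
    return energy_kev * 1.0e3 / BOLTZMANN_EV_PER_K


if __name__ == "__main__":
    # Print the kelvin equivalents of the 10-20 keV range quoted above.
    for e_kev in (10.0, 20.0):
        print(f"{e_kev:.0f} keV = {kev_to_kelvin(e_kev):.3e} K")
```

Both ends of the range come out at the order of 10^8 K, consistent with the rounded figure on the poster.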
[Figure: D-T fusion reaction, D + T → He-4 (3.5 MeV) + n (14 MeV)]
•Possible solution to our search for an alternative energy source
Energy confinement time - τE: 3-4 s
Plasma pulse duration ~1000 s
Fusion :
ITER :
SOLPS :
Geometry:
- toroidal symmetry
- equations written in curvilinear coordinates coinciding with magnetic geometry
[Figure: physical plane and computational plane]
Package consists of 2 codes: B2 (plasma fluid transport) & Eirene (neutral particle transport).
Parallel & perpendicular transport described in 2D system.
Equations :
- Continuity
- Parallel momentum (ions)
- Energy conservation, electrons (and ions)
- Current continuity
Results – Coarse model 1: Monte-Carlo model replaced by fluid neutrals model in G:
Results – Coarse model 2: Reduced grid size and bigger dt in G:
Parareal Algorithm :
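F : a propagator evolving the function (energy(t)) from initial time, t0, to a later time.
G : a faster but inaccurate propagator.
Iteration index k = 0 to (N-1); time slices i = 0 to (N-1).
[Flowchart: sweep over slices i = 0 to (N-1); check convergence using the metric for convergence; if not converged, set k = k + 1 and sweep again over i = 0 to (N-1).]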
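The interplay of the F and G propagators described above can be illustrated with a minimal sketch of the standard parareal correction from [1], new value = G(new) + F(old) - G(old), on a toy problem. Everything specific here is an assumption made for illustration: the ODE dy/dt = -y, the explicit Euler propagators, the function names, and the convergence metric (largest change between successive iterates). None of it is taken from SOLPS or from the metric used on the poster.

```python
def propagate(y, t0, t1, nsteps):
    """Explicit Euler for the toy ODE dy/dt = -y from t0 to t1."""
    dt = (t1 - t0) / nsteps
    for _ in range(nsteps):
        y = y + dt * (-y)
    return y


def fine(y, t0, t1):
    """F: accurate but expensive propagator (many small steps)."""
    return propagate(y, t0, t1, 200)


def coarse(y, t0, t1):
    """G: fast but inaccurate propagator (a single large step)."""
    return propagate(y, t0, t1, 1)


def parareal(y0, t_end, n_slices, tol=1e-10):
    """Return (values at the slice boundaries, iterations used)."""
    ts = [t_end * i / n_slices for i in range(n_slices + 1)]

    # k = 0: serial coarse sweep gives the first guess on every slice.
    lam = [y0]
    for i in range(n_slices):
        lam.append(coarse(lam[i], ts[i], ts[i + 1]))

    for k in range(1, n_slices + 1):
        # These solves are independent per slice; in a real run each
        # slice would be handled by its own processor.
        f_old = [fine(lam[i], ts[i], ts[i + 1]) for i in range(n_slices)]
        g_old = [coarse(lam[i], ts[i], ts[i + 1]) for i in range(n_slices)]

        # Serial correction sweep: G(new) + F(old) - G(old).
        new = [y0]
        for i in range(n_slices):
            g_new = coarse(new[i], ts[i], ts[i + 1])
            new.append(g_new + f_old[i] - g_old[i])

        # Assumed convergence metric: largest change between iterates.
        change = max(abs(a - b) for a, b in zip(new, lam))
        lam = new
        if change < tol:
            return lam, k
    return lam, n_slices


if __name__ == "__main__":
    values, iterations = parareal(1.0, 4.0, 8)
    print("iterations used:", iterations)
    print("final value:", values[-1])
```

In this sketch the fine solves run in a plain loop; the parallelism in time comes from distributing that loop over processors, one slice each. Gain is only obtained when convergence is reached in far fewer than N iterations.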
Electron density :
[Figure: coarse (G) solution, fine (F) solution and parareal solution]
Size of time slice solved per processor = 10
Parareal convergence was sensitive to size of time slice (or number of
timesteps solved per processor).
Computational gain = 12.58 with 240
processors.
Coarse model 2 – Reduced grid model in G:
Fine grid: 150 X 36, Coarse: 150 X 18, dt_G = 50 dt_F. Gain = 12.53 for 32 processors
Results – Event-based parareal significantly improves performance:
Understanding the physics of the edge plasma is crucial
for the design and operation of magnetic fusion devices
like ITER, JET, MAST and DIII-D. However, these simulations are
computationally intensive. This work explores the
parareal approach to temporally parallelize the SOLPS
code and demonstrates a computational gain > 10, which
is noteworthy for systems as complex as these.
References:
1) Lions J. L. et al., C. R. Acad. Sci. Paris, Series I, 332 (2001), pp. 661-668
2) Samaddar D. et al., J. Comp. Phys. 229 (18) (2010)
3) Schneider R. et al., Contrib. Plasma Phys. 46, No. 1-2, 3-191 (2006)
4) Berry L. A. et al., Journal of Computational Physics 231 (2012) 5945-5954
5) Elwasif W. R. et al., 4th IEEE Workshop on Many-Task Computing on Grids and Supercomputers, MTAGS 2011 (2011)
In this case, the ‘coarse’ solution diverges away from the ‘fine’ solution.
[Figure: coarse solution and parareal solution]
G is faster than F by > 50 times. After
parareal convergence is achieved, the
parareal solution matches the serial
solution (or F solution).
Parareal fails for time slice size > 10
Coarse model 1 – Monte-Carlo model replaced by
fluid neutrals model in G:
In F, neutral transport is modelled using a Monte-Carlo treatment solving the Boltzmann equation (Eirene
package). Replacing it by a fluid neutrals model for G
speeds up computations.
Coarsening in space: 2 cases were explored for this purpose:
a) Fine grid 150 X 36 for MAST plasma. Coarse grids: 150 X 18, 76 X 36, 76 X 18.
b) Fine grid 96 X 36 for DIII-D plasma. Coarse grids: 48 X 36, 32 X 36.
The CFL condition allows a bigger dt with reduced grid sizes.
Fine grid: 150 X 36, Coarse: 150 X 18, dt_G = 64 dt_F. Gain = 32 for 64 processors
Fine grid: 96 X 36, Coarse: 48 X 36, dt_G = 30 dt_F. Gain = 21.8 for 96 processors
Size of time slice solved per processor
affects the gain (as also seen in
Coarse model 1).
For each processor:
NTIMF = Number of timesteps in Fine
NTIMG = Number of timesteps in Coarse.
(DIII-D case)
Performance may be improved by increasing NTIMF for the same NTIMG.
MAST case: for NTIMG = 4 and 16 processors, the gain varies with NTIMF :
Weak scaling: Gain may be maximized by optimising NTIMF / NTIMG
Strong scaling: Gain will decrease at high processor counts as NTIMF & NTIMG
reduce significantly
Gain = T(serial) / T(parareal)
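To make the gain definition above concrete, here is a hedged sketch of a commonly used idealised cost estimate for parareal. It is not the timing model used for the results on this poster. It assumes one time slice per processor, a serial coarse sweep before the first iteration and again in every iteration, fine solves that run fully in parallel, and zero communication cost. The function name and the numbers in the usage example are hypothetical.

```python
def ideal_parareal_gain(n_proc, k_iters, t_fine_slice, t_coarse_slice):
    """Idealised Gain = T(serial) / T(parareal), ignoring communication.

    n_proc         : number of processors (one time slice each, assumed)
    k_iters        : parareal iterations needed for convergence
    t_fine_slice   : wall time of F on one slice
    t_coarse_slice : wall time of G on one slice
    """
    # Serial reference: F applied to every slice, one after another.
    t_serial = n_proc * t_fine_slice
    # Parareal: initial coarse sweep, then per iteration one parallel
    # fine solve plus one serial coarse correction sweep.
    t_initial = n_proc * t_coarse_slice
    t_per_iter = t_fine_slice + n_proc * t_coarse_slice
    return t_serial / (t_initial + k_iters * t_per_iter)


if __name__ == "__main__":
    # Hypothetical timings: G is 50 times cheaper than F per slice.
    for k in (1, 2, 4, 8):
        print(k, ideal_parareal_gain(32, k, 50.0, 1.0))
```

Under these assumptions the gain can never exceed n_proc / k_iters, and it drops further as the serial coarse sweeps take up a larger share of the run. This is one way to read the weak and strong scaling remarks above.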
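[Figure: schematic of the parareal iterations. Axes: energy(t) versus time, with slice boundaries t0 to t6 assigned to processors P0 to P5. Curves: actual solution, 1st to 3rd G, 1st to 3rd F; annotations "New G" and "Old G".]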