greater than 10x acceleration of fusion plasma edge simulations using...

1
This work was funded by European Fusion Development Agreement (EFDA) under WP13-SOL-01-01/CCFE/PS. Greater than 10x Acceleration of fusion plasma edge simulations using the Parareal algorithm Debasmita Samaddar 1 , David P. Coster 2 , Xavier Bonnin 3,5 , Christoph Bergmeister 1 , Eva Havlickova 1 , Wael R. Elwasif 4 , Lee A. Berry 4 , Donald B. Batchelor 4 1 CCFE, Culham Science Centre, UK , 2 - Max-Planck-Institut für Plasmaphysik, Germany, 3 CNRS-LSPM, Université Paris 13, France, 4 ORNL, USA, 5 Now at ITER Organization, Route de Vinon sur Verdon, 13115 Saint Paul Lez Durance, France Introduction : Fusion power / Input power = Q ≥ 10 Event based parareal implementation using the IPS framework [4, 5] greatly improves resource utilization as well as gain. Relatively environmentally friendly : Waste with radioactive toxicity < 100 years Virtually unlimited Deuterium abundant in sea water. Tritium produced from lithium - also abundant in nature. Temperature - T i : 1-210 8 K (10-20 keV) ~10 temperature of suns core Density - n i :110 20 m -3 ~10 -6 of atmospheric density 3.5 MeV 14 MeV Possible solution to our search for an alternate energy source Energy confinement time - E : 3-4s Plasma pulse duration~1000s Fusion : ITER : SOLPS : Geometry: - toroidal symmetry - equations written in curvlinear coordinates coinciding with magnetic geometry Physical plane Computational plane Package consists of 2 codes: B2(plasma fluid transport) & Eirene(neutral particle transport). Parallel & perpendicular transport described in 2D system. Equations : Results Coarse model 1: Monte-Carlo model replaced by fluid neutrals model in G: Results Coarse model 2: Reduced grid size and bigger dt in G: Current continuity Energy conservation electrons (and ions) Parallel momentum ions Continuity Parareal Algorithm : Metric for convergence: i=0 to (N-1) Check Convergence k = k + 1 i = 0 to (N-1) k = 0 to (N-1), F : a propagator evolving the function (energy(t)) from initial time, t0, to a later time . G - faster but inaccurate propagator Electron density : Coarse (G) solution Fine (F) solution Parareal solution Size of time slice solved per prpcessors = 10 Parareal convergence was sensitive to size of time slice (or number of timesteps solved per processor). Computational gain = 12.58 with 240 processors. Coarse model 2 Reduced grid model in G: Fine grid: 150 X 36, Coarse: 150 X 18, dt_G = 50dt_F. Gain=12.53 for 32 processors Results Event based parareal significantly improves performance: Understanding the physics of the edge plasma is crucial for the design and operation of magnetic fusion devices like ITER, JET, MAST, D|||-D. But, these simulations are computationally intensive. This work explores the parareal approach to temporally parallelize the SOLPS code and demonstrates a computational gain > 10 which is noteworthy for systems as complex as these. References: 1) Lions J. L et al, C.R. Acad. Sci. Paris, Series I, 332 (2001), pp.661668 2) Samaddar D. et al, J. Comp Phys,18 (229) (2010) 3) Schneider, R. et al, Contrib. Plasma Phys. 46, No. 1-2,3-191 (2006) 4) Berry, L. A. et. al, Journal of Computational Physics 231(2012) 59455954 5) Elwasif W.R et al, 4th IEEE Workshop on Many-Task Computing on Grids and Supercomputers, MTAGS 2011, (2011) In this case, ‘coarse’ solution diverges away from ‘fine’ solution. Coarse solution G is faster than F by > 50 times. After parareal convergence is achieved, the parareal solution matches the serial solution (or F solution). Parareal fails for slice > 10 Parareal solution Coarse model 1 Monte-Carlo model replaced by fluid neutrals model in G: In F, neutral transport is modelled using Monte-Carlo treatment solving the Boltzman equation (Eirene package). Replacing it by a fluid neutrals model for G speeds up computations. a) Fine grid 150X36 for MAST plasma. Coarse grids: 150X18, 76X36, 76X18 CFL condition allows bigger dt with reduced grid sizes. b) Fine grid 96X36 for DIIID plasma. Coarse grids: 48X36, 32X36 Coarsening in space : 2 cases were explored for this purpose: Fine grid: 150 X 36, Coarse: 150 X 18, dt_G = 64dt_F. Gain=32 for 64 processors Fine grid: 96 X 36, Coarse: 48 X 36, dt_G = 30dt_F. Gain=21.8 for 96 processors Size of time slice solved per processor affects the gain (as also seen in Coarse model 1). For each processor: NTIM F = Number of timesteps in Fine NTIM G = Number of timesteps in Coarse. (DIIID case) Performance may be improved by increasing NTIM F for the same NTIM G MAST case: for NTIM G = 4 and 16 processors, the gain varies with NTIM F : Weak scaling: Gain may be maximized by optimising NTIM F / NTIM G Strong scaling: Gain will reduce for high processor number as NTIM F & NTIM G reduce significantly Gain = T(serial) / T(parareal) Actual 1st G 1st F 2nd G 2nd F 3rd G 3rd F t0 t1 t2 t3 t4 t5 t6 energy (t) New G Old G Time } } } } } } P0 P4 P3 P2 P1 P5

Upload: others

Post on 03-Feb-2021

0 views

Category:

Documents


0 download

TRANSCRIPT

  • This work was funded by European Fusion Development Agreement

    (EFDA) under WP13-SOL-01-01/CCFE/PS.

    Greater than 10x Acceleration of fusion plasma edge simulations using the Parareal algorithm

    Debasmita Samaddar1, David P. Coster2, Xavier Bonnin3,5, Christoph Bergmeister1, Eva Havlickova1, Wael R. Elwasif4, Lee A. Berry4, Donald B. Batchelor4 1 –CCFE, Culham Science Centre, UK , 2 - Max-Planck-Institut für Plasmaphysik, Germany, 3 – CNRS-LSPM, Université Paris 13, France, 4 – ORNL, USA, 5 – Now at ITER Organization, Route de Vinon sur Verdon, 13115 Saint Paul Lez Durance, France

    Introduction :

    Fusion power / Input power = Q ≥

    10

    Event based parareal implementation using the IPS framework

    [4, 5] greatly improves resource utilization as well as gain.

    •Relatively environmentally friendly : Waste with

    radioactive toxicity < 100 years

    •Virtually unlimited – Deuterium abundant in sea water. Tritium produced from lithium - also abundant in nature.

    Temperature - Ti:

    1-2108 K (10-20 keV)

    ~10 temperature of sun’s core

    Density - ni:11020 m-3

    ~10-6 of atmospheric density

    3.5 MeV

    14 MeV

    •Possible solution to our search for an alternate energy

    source

    Energy confinement time - E : 3-4s

    Plasma pulse duration~1000s

    Fusion :

    ITER :

    SOLPS :

    Geometry:

    - toroidal symmetry

    - equations written in curvlinear

    coordinates coinciding with

    magnetic geometry

    Physical plane Computational

    plane

    Package consists of 2 codes:

    B2(plasma fluid transport) &

    Eirene(neutral particle

    transport).

    Parallel & perpendicular

    transport described in 2D

    system.

    Equations :

    Results – Coarse model 1: Monte-Carlo model replaced by fluid neutrals model in G:

    Results – Coarse model 2: Reduced grid size and bigger dt in G:

    Current continuity

    Energy conservation electrons (and ions)

    Parallel momentum ions

    Continuity

    Parareal Algorithm :

    Metric for

    convergence:

    i=0 to (N-1)

    Check Convergence

    k = k + 1

    i = 0 to (N-1)

    k = 0 to (N-1), F : a propagator evolving

    the function (energy(t))

    from initial time, t0, to a

    later time .

    G - faster but inaccurate

    propagator

    Electron density :

    Coarse (G) solution Fine (F) solution

    Parareal solution

    Size of time slice solved per prpcessors =

    10

    Parareal convergence was sensitive to size of time slice (or number of

    timesteps solved per processor).

    Computational gain = 12.58 with 240

    processors.

    Coarse model 2 – Reduced grid model in G:

    Fine grid: 150 X 36, Coarse: 150 X 18, dt_G =

    50dt_F. Gain=12.53 for 32 processors

    Results – Event based parareal significantly improves performance:

    Understanding the physics of the edge plasma is crucial

    for the design and operation of magnetic fusion devices

    like ITER, JET, MAST, D|||-D. But, these simulations are

    computationally intensive. This work explores the

    parareal approach to temporally parallelize the SOLPS

    code and demonstrates a computational gain > 10 which

    is noteworthy for systems as complex as these.

    References:

    1) Lions J. L et al, C.R. Acad. Sci. Paris, Series I, 332 (2001), pp.661668 2) Samaddar D. et al, J. Comp Phys,18 (229) (2010)

    3) Schneider, R. et al, Contrib. Plasma Phys. 46, No. 1-2,3-191 (2006)

    4) Berry, L. A. et. al, Journal of Computational Physics 231(2012) 59455954

    5) Elwasif W.R et al, 4th IEEE Workshop on Many-Task Computing on Grids and

    Supercomputers, MTAGS 2011, (2011)

    In this case, ‘coarse’ solution diverges away from ‘fine’ solution.

    Coarse solution

    G is faster than F by > 50 times. After

    parareal convergence is achieved, the

    parareal solution matches the serial

    solution (or F solution).

    Parareal fails for slice > 10

    Parareal solution

    Coarse model 1 – Monte-Carlo model replaced by

    fluid neutrals model in G:

    In F, neutral transport is modelled using Monte-Carlo treatment solving the Boltzman equation (Eirene

    package). Replacing it by a fluid neutrals model for G

    speeds up computations.

    a) Fine grid 150X36 for MAST

    plasma.

    Coarse grids: 150X18, 76X36,

    76X18 CFL condition allows bigger dt with reduced grid sizes.

    b) Fine grid 96X36 for DIIID

    plasma.

    Coarse grids: 48X36, 32X36

    Coarsening in space : 2 cases were explored for this purpose:

    Fine grid: 150 X 36, Coarse: 150 X 18, dt_G =

    64dt_F. Gain=32 for 64 processors

    Fine grid: 96 X 36, Coarse: 48 X 36, dt_G =

    30dt_F. Gain=21.8 for 96 processors

    Size of time slice solved per processor

    affects the gain (as also seen in

    Coarse model 1).

    For each processor:

    NTIMF = Number of timesteps in Fine

    NTIMG = Number of timesteps in

    Coarse.

    (DIIID case)

    Performance may be improved by increasing NTIMF for the same NTIMG

    MAST case: for NTIMG = 4 and 16 processors, the gain varies with

    NTIMF :

    Weak scaling: Gain may be maximized

    by optimising NTIMF / NTIMG

    Strong scaling: Gain will reduce for high

    processor number as NTIMF & NTIMG

    reduce significantly

    Gain = T(serial) / T(parareal)

    Actual 1st G 1st F 2nd G 2nd F 3rd G 3rd F

    t0 t1 t2 t3 t4 t5 t6

    ener

    gy

    (t)

    New G Old G

    Time

    } } } } } } P0 P4 P3 P2 P1 P5