TRANSCRIPT
Terascale Computations of Multiscale Magnetohydrodynamics for Fusion Plasmas:
ORNL LDRD Project
D. A. Spong*, E. F. D’Azevedo†, D. B. Batchelor*,
D. del-Castillo-Negrete*, M. Fahey‡, S. P. Hirshman*, R. T. Mills‡
*Fusion Energy Division, †Computer Science and Mathematics Division, ‡National Center for Computational Sciences
Oak Ridge National Laboratory
S. Jardin, W. Park, G. Y. Fu, J. Breslau, J. Chen,
S. Ethier, S. Klasky
Princeton Plasma Physics Laboratory
Our project identified six tasks to work on over two years. Our focus during the past year has
been on the first three.
1. Adapt/optimize M3D for the Cray
2. Adapt/optimize DELTA5D for the Cray
3. Develop particle closure relations
4. Couple M3D and particle closures
5. Demonstrate code on realistic examples
6. Final report, documentation
• Our focus will shift to tasks 3 - 6 for next year
• Access to ORNL X1E/XT3 computers required - more significant as project progresses
Significant progress on tasks that are critical to this LDRD project:
• DELTA5D porting and optimization - vectorized particle stepping + rapid field evaluation (3D spline, SVD compression)
• M3D porting and optimization - new vectorized sparse matrix-vector multiply algorithm
• Particle-based closure relations - improved compatibility with MHD (flows isolated from particle model), benchmarked with other codes (DKES)
• PPPL subcontract added (May - September, 2005) - addresses issues for fusion scientific end station applications
DELTA5D and M3D optimization
Particle simulation performance has been significantly improved by optimizing the
magnetic field evaluation routines:
• DELTA5D converted to cylindrical geometry for compatibility with M3D
• 3D B-spline routine optimized by vectorization
• For larger problems, spline memory requirements will limit number of particles per processor
• New data compression techniques developed
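The vectorized field evaluation can be sketched as follows: build one interpolant and evaluate it for all particles in a single call, rather than one scalar spline call per particle per step. This is a minimal illustration, assuming a toy ~1/R field on an (R, Z, φ) grid; the grid, field shape, and SciPy `RegularGridInterpolator` (standing in for the bspllib 3D B-spline) are not DELTA5D's actual data or routines.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Toy |B| data on an (R, Z, phi) grid - an assumed stand-in for the
# VMEC/M3D field, chosen only to exercise the interpolation path.
R = np.linspace(1.0, 2.0, 32)
Z = np.linspace(-0.5, 0.5, 32)
phi = np.linspace(0.0, 2.0 * np.pi, 33)
RR, ZZ, PP = np.meshgrid(R, Z, phi, indexing="ij")
Bmag = 1.0 / RR + 0.05 * ZZ**2 + 0.1 * np.cos(PP)   # ~1/R field + ripple

# One cubic interpolant, built once ('cubic' stands in for the 3D B-spline).
field = RegularGridInterpolator((R, Z, phi), Bmag, method="cubic")

# Vectorized evaluation: one call for all particles per time step,
# instead of a scalar spline call per particle.
rng = np.random.default_rng(0)
npart = 10_000
pts = np.column_stack([
    rng.uniform(1.0, 2.0, npart),
    rng.uniform(-0.5, 0.5, npart),
    rng.uniform(0.0, 2.0 * np.pi, npart),
])
B_at_particles = field(pts)
```

The memory point in the bullets above also shows up here: the interpolant stores the full 3D coefficient array per processor, which is what motivates the compression described next.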
[Figure: Initial 3D spline algorithm tests - timing vs. number of particles per processor (10 to 10^4) on Cheetah (IBM Power4), Cray X1, and Cray XT3, showing the impact of optimization and vectorization.]
DELTA5D
High performance + small memory footprint SVD* fits of magnetic/electric field data have been developed.
Strategy: combine a 2-D SVD* fit in (R,Z) with a 1-D Fourier series in φ. An N x N matrix is replaced by a rank-r approximation,
$$ B_{ij} = B(\theta_i,\phi_j) \approx \sum_{k=1}^{r} w_k\, g_k(\theta_i)\, f_k(\phi_j), $$
requiring only 2Nr terms << N^2.
*SVD: Singular value decomposition
DELTA5D
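The rank-r compression above can be sketched with NumPy's SVD. The toy angular field below is an assumption standing in for the fitted field data; it only illustrates the storage argument (r·(2N+1) numbers instead of N·N).

```python
import numpy as np

# Toy angular field B(theta_i, phi_j) on an N x N grid (an illustrative
# stand-in for the spline-fit field data, not actual VMEC output).
N = 64
theta = np.linspace(0.0, 2.0 * np.pi, N, endpoint=False)
phi = np.linspace(0.0, 2.0 * np.pi, N, endpoint=False)
B = (1.0
     + 0.30 * np.outer(np.cos(theta), np.cos(phi))
     + 0.05 * np.outer(np.sin(2.0 * theta), np.sin(3.0 * phi)))

# Truncated SVD: B_ij ~ sum_{k=1}^r w_k g_k(theta_i) f_k(phi_j).
U, w, Vt = np.linalg.svd(B, full_matrices=False)
r = 4
B_r = U[:, :r] * w[:r] @ Vt[:r, :]    # rank-r reconstruction

# Storage drops from N*N values to r*(2N + 1).
rel_err = np.linalg.norm(B - B_r) / np.linalg.norm(B)
```

For this (exactly low-rank) toy field the rank-4 reconstruction recovers B to machine precision; for real field data r is chosen to meet an error tolerance.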
New sparse matrix-vector multiply routine (vectorized) developed: 10 times faster
• M3D spends much time in sparse elliptic equation solvers
• PETSc matrix storage - CSR (Compressed Sparse Row): unit stride is good for scalar but poor for vector processors
• CSRP (CSR with Permutation) - reorders/groups rows to match the vector register size; "strip-mining" breaks up longer loops
• CSRP algorithm encapsulated into a new PETSc matrix object
M3D
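The CSRP idea - permute rows into groups with equal nonzero counts so the inner loop has a fixed trip count that maps onto vector registers - can be sketched in NumPy. This is an illustration of the concept only, not the PETSc CSRP matrix object described above.

```python
import numpy as np
import scipy.sparse as sp

def csrp_matvec(A, x):
    """CSRP-style SpMV sketch: permute rows into groups with equal nonzero
    counts so each group's inner loop has a fixed trip count (the property
    that maps onto vector registers on the X1).  Illustrative only."""
    A = A.tocsr()
    nnz_per_row = np.diff(A.indptr)
    perm = np.argsort(nnz_per_row, kind="stable")   # the row permutation
    y = np.zeros(A.shape[0])
    for L in np.unique(nnz_per_row):
        if L == 0:
            continue
        rows = perm[nnz_per_row[perm] == L]
        # Gather the group's indices/values into dense (len(rows), L) blocks:
        # the uniform inner length L allows a single vectorized reduction.
        idx = A.indptr[rows][:, None] + np.arange(L)[None, :]
        y[rows] = (A.data[idx] * x[A.indices[idx]]).sum(axis=1)
    return y

# Verify against the reference CSR product.
A = sp.random(300, 300, density=0.03, format="csr", random_state=1)
x = np.linspace(0.0, 1.0, 300)
y_csrp = csrp_matvec(A, x)
y_ref = A @ x
```

A production version would also strip-mine very long groups into vector-register-sized chunks; that bookkeeping is omitted here.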
[Figure: Timings for matrix-vector multiplies and the ILU preconditioner (forward/backward solves) on test problems ASTRO, BCSSTK18, 7PT, and 77PTB, comparing PETSc CSR with the vectorized CSRP routine in SSP and MSP modes.]
This substantial improvement in matrix-vector operations has exposed the next layer of opportunity for M3D performance gains:
• For small test problems, up to 15% improvement is observed with the vectorized CSRP routine
• Incomplete LU factorization uses sparse triangular solves that are not easily vectorizable
• Other preconditioning techniques offer better vectorization - tradeoff between effectiveness and cost of iterations
 – Multi-color ordering has been used on the Earth Simulator to vectorize sparse triangular solves; it forces diagonal blocks in the triangular matrices to be diagonal
• Other novel vectorizable preconditioners
 – Sparse approximate inverse (SPAI)
 – Sparse direct solvers
M3D
Neoclassical Closure Relations
Our goal is to couple kinetic transport effects with an MHD model - important for long collisional
path length plasmas such as ITER
• Closure relations enter through the momentum balance equation and Ohm's law:
$$ nm\left(\frac{\partial}{\partial t} + \mathbf{V}\cdot\nabla\right)\mathbf{V} = -\nabla p - \nabla\cdot\Pi + \mathbf{J}\times\mathbf{B} $$
$$ \left(\frac{\partial}{\partial t} + \mathbf{V}\cdot\nabla\right) p = -\gamma p\,\nabla\cdot\mathbf{V} + (\gamma-1)\left(Q - \nabla\cdot\mathbf{q}\right) $$
$$ \mathbf{E} = -\mathbf{V}\times\mathbf{B} + \eta\mathbf{J} + \frac{1}{ne}\left(\mathbf{J}\times\mathbf{B} - \nabla p_e - \nabla\cdot\Pi_{\|e}\right) $$
$$ \frac{\partial\mathbf{B}}{\partial t} = -\nabla\times\mathbf{E}, \qquad \nabla\cdot\mathbf{B} = 0, \qquad \mu_0\mathbf{J} = \nabla\times\mathbf{B} $$
• Moments hierarchy closed by Π = function of n, T, V, B, E
• Requires solution of the Boltzmann equation: f = f(x,v,t)
• High dimensionality: 3 coordinate + 2 velocity + time
Neoclassical transport closures introduce new challenges:
• Collisions introduce new timescales
 – Lengthy evolution to steady state, especially at low collisionalities
 – Our particle model (DELTA5D) assigns particles to processors, assumes global data can be gathered at each MHD step
 – Multi-layer time-stepping
  • Orbit time step Δt1: v‖Δt1 << L‖ - determined by LSODE
  • Collisional time step Δt2: νΔt2 << 1
  • For low collisionality, Δt1 << Δt2
 – Want to avoid calculating quantities (flows, macroscopic gradients) that are already evolved by the MHD model
• New δf partitioning
Main components of DELTA5D Monte Carlo code:
• Set initial conditions and assign groups of particles to processors
• N x (time step followed by collisional step)
• Collect diagnostic data; check for particles that exit the solution domain (i.e., last closed surface) - reseed or remove; check for collisions that scatter particles outside of |v‖/v| ≤ 1 or energy ≥ 0; for beams or alphas, check for thermalized particles
• Merge diagnostic data from different processors - post-process depending on physics problem, write out from PE 0: neutral beam heating/slowing down, global energy confinement, orbit dispersion/diffusion, PDF data for nonlocal transport, bootstrap current, ICRF heating, alpha particle/runaway electron confinement
DELTA5D equations were converted from magnetic to cylindrical coordinates
Uses bspllib 3D cubic B-spline fit to data from VMEC
$$ \frac{d\mathbf{R}}{dt} = \frac{1}{B_\|^*}\left[ v_\|\,\mathbf{B}^* - \hat{b}\times\left(\mathbf{E}^* - \frac{\mu}{Ze}\nabla B\right)\right] $$
$$ m\frac{dv_\|}{dt} = \frac{\mathbf{B}^*}{B_\|^*}\cdot\left(Ze\,\mathbf{E}^* - \mu\nabla B\right) $$
where
$$ B_\|^* = \hat{b}\cdot\mathbf{B}^*, \qquad \hat{b} = \mathbf{B}/B, \qquad \mu = \frac{m v_\perp^2}{2B} $$
$$ \mathbf{B}^* = \mathbf{B} + \frac{m v_\|}{Ze}\nabla\times\hat{b} = \mathbf{B} - \frac{m v_\|}{Ze}\,\hat{b}\times\left(\hat{b}\cdot\nabla\hat{b}\right) $$
$$ \mathbf{E}^* = \mathbf{E} - \frac{m v_\|}{Ze}\frac{\partial\hat{b}}{\partial t} \approx \mathbf{E} \quad \text{(if $\partial B/\partial t$ is slow compared with $\Omega_c$)} $$
In M3D variables,
$$ \mathbf{B} = \nabla\psi\times\nabla\phi + \frac{1}{F}\nabla_\perp F + \left(R_0 + \tilde{I}\right)\nabla\phi $$
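The stiff orbit integration (LSODE in DELTA5D) can be illustrated on a drastically simplified problem: bounce motion along a field line with a model mirror field, stepped with SciPy's LSODA (a descendant of LSODE). The field B(s), the fixed magnetic moment, and all parameter values are assumptions for the sketch, not DELTA5D's equations above; the conserved energy provides the accuracy check.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Toy bounce motion along a field line with B(s) = B0*(1 + eps*cos s):
#   ds/dt = v_par,   m dv_par/dt = -mu dB/ds,
# with mu held fixed (adiabatic invariant).  LSODA stands in for the
# LSODE stepper used by DELTA5D; all parameter values are arbitrary.
m, mu, B0, eps = 1.0, 0.5, 1.0, 0.2

def rhs(t, y):
    s, v_par = y
    dBds = -B0 * eps * np.sin(s)      # derivative of the model field
    return [v_par, -mu * dBds / m]

sol = solve_ivp(rhs, (0.0, 50.0), [0.0, 0.3], method="LSODA",
                rtol=1.0e-8, atol=1.0e-10)

# The conserved energy E = m v_par^2/2 + mu*B(s) checks the integration.
def energy(s, v_par):
    return 0.5 * m * v_par**2 + mu * B0 * (1.0 + eps * np.cos(s))

E0 = energy(0.0, 0.3)
E_end = energy(sol.y[0, -1], sol.y[1, -1])
```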
A model perturbed field has been added to mock up tearing modes, B = B_VMEC + ∇×(αB_VMEC), with
$$ \alpha = \sum_{m,n}\left[\alpha_{mn}^{c}(\psi)\cos(m\theta - n\phi) + \alpha_{mn}^{s}(\psi)\sin(m\theta - n\phi)\right] $$
Coulomb collision operator for collisions of test particles (species a) with a background plasma (species b):
$$ C_{ab}f_a = \frac{\nu_D^{ab}}{2}\frac{\partial}{\partial\lambda}\left[(1-\lambda^2)\frac{\partial f_a}{\partial\lambda}\right] + \frac{1}{v^2}\frac{\partial}{\partial v}\left\{ v^2\,\nu_\varepsilon\left[\frac{2v}{\alpha_{ab}^2}\,f_a + \frac{\partial f_a}{\partial v}\right]\right\} $$
where
$$ \nu_D^{ab} = \frac{\nu_0^{ab}}{(v/\alpha_{ab})^3}\left[\phi\!\left(\frac{v}{\alpha_b}\right) - G\!\left(\frac{v}{\alpha_b}\right)\right], \qquad \nu_\varepsilon = \nu_0^{ab}\,G\!\left(\frac{v}{\alpha_b}\right) $$
$$ \phi(x) = \frac{2}{\sqrt{\pi}}\int_0^x t^{1/2}e^{-t}\,dt, \qquad G(x) = \frac{1}{2x^2}\left[\phi(x) - x\,\phi'(x)\right] $$
$$ \alpha_{ab} = \sqrt{\frac{2T_{b0}}{m_a}}, \qquad \alpha_b = \sqrt{\frac{2T_{b0}}{m_b}}, \qquad \nu_0^{ab} = \frac{4\pi n_b \ln\Lambda_{ab}\,(e_a e_b)^2}{(2T_b)^{3/2}\,m_a^{1/2}} $$
Monte Carlo (Langevin) Equivalent of the Fokker-Planck Operator
[A. Boozer, G. Kuo-Petravic, Phys. Fluids 24 (1981)]
$$ \lambda_n = \lambda_{n-1}\left(1 - \nu_d\Delta t\right) \pm \left[\left(1-\lambda_{n-1}^2\right)\nu_d\Delta t\right]^{1/2} $$
$$ E_n = E_{n-1} - 2\nu_\varepsilon\Delta t\left[E_{n-1} - \left(\frac{3}{2} + \frac{E_{n-1}}{\nu_\varepsilon}\frac{d\nu_\varepsilon}{dE}\right)T_b\right] \pm 2\left[T_b E_{n-1}\,\nu_\varepsilon\Delta t\right]^{1/2} $$
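The pitch-angle update can be exercised directly on an ensemble: the mean pitch should relax toward isotropy like (1 − ν_d Δt)^n ≈ exp(−ν_d t). The ensemble size, ν_d, and Δt below are arbitrary test values, not DELTA5D settings.

```python
import numpy as np

rng = np.random.default_rng(42)

def pitch_angle_step(lam, nu_d, dt, rng):
    """One Boozer/Kuo-Petravic Monte Carlo pitch-angle scattering step:
    lam_n = lam_{n-1}*(1 - nu_d*dt) +/- sqrt((1 - lam_{n-1}^2)*nu_d*dt),
    with the sign chosen at random for each particle."""
    sign = rng.choice([-1.0, 1.0], size=lam.shape)
    lam_new = lam * (1.0 - nu_d * dt) + sign * np.sqrt((1.0 - lam**2) * nu_d * dt)
    # Guard the |v_par/v| <= 1 boundary (DELTA5D also checks this).
    return np.clip(lam_new, -1.0, 1.0)

# Ensemble relaxation test: <lam> decays like (1 - nu_d*dt)^n ~ exp(-nu_d*t).
nu_d, dt, nsteps = 1.0, 1.0e-3, 2000
lam = np.full(20000, 0.9)
for _ in range(nsteps):
    lam = pitch_angle_step(lam, nu_d, dt, rng)
mean_lam = lam.mean()
expected = 0.9 * (1.0 - nu_d * dt) ** nsteps   # ~ 0.9*exp(-2)
```

Note the vectorized form: one NumPy operation advances all particles, matching the vectorized particle stepping discussed earlier.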
Local Monte-Carlo equivalent quasilinear ICRF operator (developed by J. Carlsson)
$$ E^+ = E^- + \mu_E + \zeta\,\sigma_{EE}, \qquad \lambda^+ = \lambda^- + \mu_\lambda + \zeta\,\sigma_{\lambda\lambda} $$
$$ \zeta = \text{a zero-mean, unit-variance random number (i.e., } \mu_\zeta = 0 \text{ and } \sigma_\zeta = 1\text{)} $$
$$ \sigma_{EE} = 2m^2 v_\perp^2\,\Delta v_0, \qquad \sigma_{\lambda\lambda} = 2\left(\frac{k_\|}{\omega} - \frac{v_\|}{v^2}\right)^2\frac{v_\perp^3\,\Delta v_0}{v^2} $$
$$ \mu_E = 2\left(1 - \frac{k_\| v_\|}{\omega}\right) m v_\perp\,\Delta v_0 $$
$$ \mu_\lambda = 2\left\{\left[\left(1 - \frac{k_\| v_\|}{\omega}\right) - \frac{v_\perp^2}{v^2}\right]\left(\frac{k_\|}{\omega} - \frac{v_\|}{v^2}\right) + \frac{v_\|}{v^2}\frac{v_\perp^2}{v^2}\right\}\frac{v_\perp\,\Delta v_0}{v} $$
where
$$ \Delta v_0 = \frac{1}{v_\perp}\left|\frac{eZ}{2m}\left(E_+ J_{n-1}(k_\perp\rho) + E_- J_{n+1}(k_\perp\rho)\right)\right|^2 \frac{2\pi}{n\dot\Omega} $$
and, as $\dot\Omega \to 0$,
$$ \frac{2\pi}{n\dot\Omega} \to 2\pi^2\left|\frac{2}{n\ddot\Omega}\right|^{2/3} \mathrm{Ai}^2\!\left(-\frac{n^2\dot\Omega^2}{4}\left|\frac{2}{n\ddot\Omega}\right|^{4/3}\right) $$
A δf method is used that follows the distribution function partitioning used in H. Sugama, S. Nishimura, Phys. Plasmas 9, 4637 (2002):
$$ f = f_M\left[1 + \frac{e}{T}\int\frac{dl}{B}\left(BE_\| - \frac{B^2\,\langle BE_\|\rangle}{\langle B^2\rangle}\right)\right] + \frac{2}{v_{th}}\frac{v_\|}{v}\,x f_M\left[u_\| + \left(x^2 - \frac{5}{2}\right)\frac{2q_\|}{5p}\right] $$
$$ \quad + \frac{f_M}{T}\left\{ f_U\left[\frac{\langle u_\|B\rangle}{\langle B^2\rangle} + \left(x^2 - \frac{5}{2}\right)\frac{2\langle q_\|B\rangle}{5p\,\langle B^2\rangle}\right] + f_X\left[X_1 + X_2\left(x^2 - \frac{5}{2}\right)\right]\right\} + \left(f_U + m v_\| B\right)\left[\cdots\right] $$
$$ (V - C)\,f_U = \sigma_U = -m v^2 P_2(v_\|/v)\,\mathbf{B}\cdot\nabla\ln B $$
$$ (V - C)\,f_X = \sigma_X = -\frac{v^2}{2\Omega}P_2(v_\|/v)\,\mathbf{B}\cdot\nabla\!\left(B\tilde{U}\right) $$
where $x = v/v_{th}$, $\tilde{U}$ = Pfirsch-Schlüter flow $= \dfrac{B_\zeta}{B}\left[1 - \dfrac{B^2}{\langle B^2\rangle}\right]$ for a tokamak, and $P_2(y) = \dfrac{3}{2}y^2 - \dfrac{1}{2}$.
Transport/viscosity coefficients: $\langle f_U,\sigma_U\rangle$; $\langle f_X,\sigma_X\rangle$; $\langle f_U,\sigma_X\rangle$, with
$$ \langle L, L'\rangle = \frac{1}{2}\int_{-1}^{1} d(v_\|/v) \oint\!\!\oint_{\theta,\zeta} L\,L'\,\sqrt{g}\,/\,V' $$
New MHD viscosity-based closure relations are more consistent with the MHD model
μ‖ benchmark with DKES*
[Figure: μ‖ from DELTA5D - running time averages on one flux surface (3200 electrons) vs. time, for collisionalities ν/v from 0.0009 to 0.16 m⁻¹; DELTA5D Monte Carlo results compared with the viscosity coefficient from the DKES code over ν/v = 0.0001 to 0.1 m⁻¹.]
*DKES: Drift Kinetic Equation Solver
DELTA5D
$$ \begin{bmatrix} \langle \mathbf{B}\cdot\nabla\cdot\Pi\rangle \\ \langle \mathbf{B}\cdot\nabla\cdot\Theta\rangle \end{bmatrix} = \begin{bmatrix} M_1 & M_2 \\ M_2 & M_3 \end{bmatrix} \begin{bmatrix} V_\| \\ Q_\| \end{bmatrix} + \begin{bmatrix} N_1 & N_2 \\ N_2 & N_3 \end{bmatrix} \begin{bmatrix} \dfrac{1}{n}\dfrac{\partial p}{\partial s} - e\dfrac{\partial\phi}{\partial s} \\ -\dfrac{\partial T}{\partial s} \end{bmatrix} $$
where
$$ M_j,\, N_j \propto \int_0^\infty dE\; e^{-E/kT}\, E\left(E - \frac{5}{2}kT\right)^{j-1} \mu_{\|,N}(E) $$
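Once μ‖,N(E) is known (e.g., from the Monte Carlo runs), the energy moments above are weighted Laguerre-type integrals, well suited to Gauss-Laguerre quadrature. A sketch, using a hypothetical μ(E) whose functional form is an assumption for illustration only:

```python
import numpy as np
from numpy.polynomial.laguerre import laggauss

kT = 1.0

def mu_par(E):
    """Hypothetical energy-dependent viscosity coefficient, a stand-in
    for the Monte Carlo result mu_||,N(E)."""
    return 1.0 / (1.0 + E / kT)

# Energy moments M_j ~ int_0^inf dE e^{-E/kT} E (E - 5/2 kT)^{j-1} mu(E).
# Substituting x = E/kT absorbs the e^{-x} weight into Gauss-Laguerre.
x, wts = laggauss(64)
E = kT * x

def moment(j):
    return kT * np.sum(wts * E * (E - 2.5 * kT) ** (j - 1) * mu_par(E))

M = [moment(j) for j in (1, 2, 3)]
```

For this toy μ(E) the j=1 moment has a closed form, 1 − e·E₁(1) ≈ 0.4037, which the 64-point rule reproduces.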
In the next phase, kinetic closure relations will be further developed and coupled with the MHD model:
• Closure relations
 – Collisionality scans, convergence studies
 – Examine/check 2D/3D variation of stress tensor
 – Accelerate slow collisional time evolution of viscosity coefficients
  • Test pre-converged restarts
  • Equation-free projective integration extrapolation methods
 – Extension to more general magnetic field models: B = ∇α × ∇β
 – Green-Kubo molecular dynamics methods - direct viscosity calculation
• DELTA5D/M3D coupling
 – Interface, numerical stability, data locality issues
DELTA5D
For the next step we need data from several saturated M3D tearing mode test cases at different helicities (2/1, 3/2, 4/3, etc.)
Δ′ and island width evolution from my earlier calculations with the FAR/KITE reduced MHD codes
$$ q(r) = q(0)\left[1 + \left(\frac{r}{r_0}\right)^{2\lambda}\right]^{1/\lambda}, \qquad r = \sqrt{\chi/\chi_{edge}} $$
$$ \Delta'_{mn} = \lim_{\epsilon\to 0}\left[\left(\frac{d\psi_{mn}}{dr}\right)_{r_s^+} - \left(\frac{d\psi_{mn}}{dr}\right)_{r_s^-}\right]\left[\psi_{mn}(r_s)\right]^{-1} \quad \text{where } q(r_s) = m/n $$
e.g., for r₀ = 0.56, λ = 1.5 [D. A. Spong, et al., ORNL/TM-1097]:
Δ′₂₁ < 0 for 0.8 < q(0) < 0.9 and Δ′₂₁ > 0 for 0.9 < q(0) < 1.
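Numerically, Δ′ can be estimated from one-sided derivatives of ψ_mn across the rational surface. A sketch on a test profile with a known derivative jump (the piecewise-linear ψ is an assumed shape for verification, not an M3D eigenfunction):

```python
import numpy as np

def delta_prime(r, psi, r_s):
    """Estimate Delta' = [psi'(r_s+) - psi'(r_s-)] / psi(r_s) from
    one-sided finite differences on a radial grid (the eps -> 0 limit
    is taken analytically above)."""
    i = int(np.argmin(np.abs(r - r_s)))          # grid point at the surface
    dpsi_minus = (psi[i] - psi[i - 1]) / (r[i] - r[i - 1])
    dpsi_plus = (psi[i + 1] - psi[i]) / (r[i + 1] - r[i])
    return (dpsi_plus - dpsi_minus) / psi[i]

# Test profile with a known jump at r_s = 0.5:
# psi = r/r_s inside, (1-r)/(1-r_s) outside => Delta' = -1/(1-r_s) - 1/r_s = -4.
r_s = 0.5
r = np.linspace(0.0, 1.0, 2001)
psi = np.where(r <= r_s, r / r_s, (1.0 - r) / (1.0 - r_s))
dp = delta_prime(r, psi, r_s)
```

Applied to the saturated M3D eigenfunctions requested above, the same one-sided differencing gives Δ′ for each helicity.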
Neoclassical island width evolution from the FAR/KITE reduced MHD codes
M3D positive Δ′ runs (based on case from W. Park)
[Figure: Nonlinear M3D 2/1 tearing mode - growth rate γ and mode amplitude vs. t/τ_A (0 to ~140), with a reference curve 0.0025·(0.1)^(-3/5) and restart times marked.]
Strategy/tasks for Terascale MHD LDRD:
• Port and optimize DELTA5D for Cray X1
• Port and optimize M3D for Cray X1
• Adapt particle model to cylindrical geometry, M3D fields, parallelize (by processor or region?)
• Develop Δ′ > 0 tearing mode test cases
• Develop efficient representation for E, B; broadcast to all processors; optimize/test
• Develop closure relations: current or pressure tensor; neoclassical ion polarization drift effects; generalized Sugama Ohm's law; delta-f or full-f; new methods: viscosity from dispersion vs. time of momentum (?), track parallel friction between particles/background
• Couple currents, pressure tensor, … back into M3D - need equations from flows/viscosities to J_BS; optimize/test numerical stability
• Monte Carlo issues: collision operator conservation properties; test particle/background consistency; smoothed particles or bin and then smooth; resolution of trapped/passing boundary layers