Gyrokinetic particle-in-cell simulations of plasma microturbulence on advanced computing platforms
Stephane Ethier
Princeton Plasma Physics Laboratory
SCIDAC 2005 Conference
San Francisco, CA
Work performed under the DOE SCIDAC Center for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas
The Ultimate Burning Plasma
Fusion Powers the Sun and Stars
Can we harness Fusion power on Earth?
The Case for Fusion Energy
• Worldwide demand for energy continues to increase
– Due to population increases and economic development
• Worldwide oil and gas production is near or past peak
– Need for alternative sources: coal, fission, fusion
• Increasing evidence that release of greenhouse gases is causing global climate change
– This makes nuclear (fission or fusion) preferable to fossil (coal)
• Fusion has clear advantages over fission
– Inherently safe (no China syndrome)
– No weapons proliferation considerations (security)
– Greatly reduced waste disposal problems (no Yucca Mountain)
• Can produce electricity and hydrogen
• Abundant fuel, available to all nations
– Deuterium and lithium supply will last thousands of years
Fusion Reaction
• The two ions need to overcome the Coulomb barrier to get close enough to fuse together
Putting the Sun in a Bottle
The Most Successful Magnetic Confinement Configuration is the Tokamak
[Figure: tokamak schematic showing the magnets, the plasma, and the magnetic field.]
Fusion Experiments in the World
[Figure: fusion experiments around the world (J. B. Lister).]
We know we can make Fusion Energy –The Challenge now is to make it Practical!
Fusion: DOE OFES #1 Item on List of Priorities
November 10, 2003: Energy Secretary Spencer Abraham announces the Department of Energy 20-Year Science Facility Plan, setting priorities for 28 new, major science research facilities
#1 on the list of priorities is ITER, an unprecedented international collaboration on the next major step for the development of fusion
#2 is UltraScale Scientific Computing Capability
Plasma Science Challenges
• Macroscopic Stability
– What limits the pressure in plasmas?
• Wave-particle Interactions
– How do particles and plasma waves interact?
• Microturbulence & Transport
– What causes plasma transport?
• Plasma-material Interactions
– How can high-temperature plasma and material surfaces co-exist?
Challenge to Theory & Simulations
• Huge range of spatial and temporal scales
• Overlap in scales often means that a strong, simplifying ordering is not possible
[Figure: range of scales in a fusion plasma.
Spatial scales (m), 10⁻⁶ to 10²: electron gyroradius, Debye length, ion gyroradius, tearing length, skin depth, atomic mean free path, electron-ion mean free path, system size.
Temporal scales (s), 10⁻¹⁰ to 10⁵: electron gyroperiod, inverse electron plasma frequency, ion gyroperiod, inverse ion plasma frequency, electron collision time, ion collision time, confinement time, current diffusion time, pulse length.]
Major Fusion Codes
Importance of Turbulence in Fusion Plasmas
• Turbulence is believed to be the mechanism for cross-field transport in magnetically confined plasmas:
– Size and cost of a fusion reactor are determined by the particle and energy confinement time and fusion self-heating
• Plasma turbulence is a complex nonlinear phenomenon:
– Large time and spatial scale separations, similar to fluid turbulence
– Self-consistent electromagnetic fields: many-body problem
– Strong nonlinear wave-particle interactions: kinetic effects
– Importance of plasma spatial inhomogeneities, coupled with complex confining magnetic fields, as drivers for microinstabilities and the ensuing plasma turbulence
The Fundamental Equations for Plasma Physics: Boltzmann+Maxwell
• Complete but impractical: cannot solve on all time and length scales
• Can eliminate dimensions by integrating over velocity space (assuming a Maxwellian)

The Boltzmann equation evolves the distribution function f(x, u, t) of each species in 6D + time:

$$\frac{\partial f}{\partial t} + \mathbf{u}\cdot\nabla f + \frac{q}{m}\left(\mathbf{E} + \mathbf{u}\times\mathbf{B}\right)\cdot\frac{\partial f}{\partial \mathbf{u}} = C(f)$$

coupled self-consistently to Maxwell's equations:

$$\nabla\times\mathbf{E} = -\frac{\partial\mathbf{B}}{\partial t}, \qquad \nabla\times\mathbf{B} = \mu_0\mathbf{J} + \frac{1}{c^2}\frac{\partial\mathbf{E}}{\partial t}, \qquad \nabla\cdot\mathbf{E} = \frac{\rho}{\epsilon_0}$$

with the charge and current densities taken from the particles:

$$\rho = \sum_s q_s\int f_s\,d\mathbf{u}, \qquad \mathbf{J} = \sum_s q_s\int \mathbf{u}\,f_s\,d\mathbf{u}$$
Gyrokinetic Approximation for Low Frequency Modes
• Gyrokinetic ordering
• Gyro-motion: guiding center drifts + charged ring
– Parallel to B: mirror force, magnetically trapped
– Perpendicular: E x B, polarization, gradient, and curvature drifts
• Gyrophase-averaged 5D gyrokinetic equation
– Suppress plasma oscillation and gyro-motion
– Larger time step and grid size, smaller number of particles
$$\frac{\omega}{\Omega_i} \sim \frac{k_\parallel}{k_\perp} \sim \frac{\rho_i}{L} \sim \frac{e\phi}{T_e} \ll 1, \qquad k_\perp\rho_i \sim 1$$
The Gyrokinetic Toroidal Code (GTC)
• Description:
– Particle-in-cell (PIC) code
– Developed by Zhihong Lin (now at UC Irvine)
– Non-linear gyrokinetic simulation of microturbulence [Lee, 1983]
– Particle-electric field interaction treated self-consistently
– Uses magnetic field-line-following coordinates (ψ, α, ζ)
– Guiding center Hamiltonian [White and Chance, 1984]
– Non-spectral Poisson solver [Lin and Lee, 1995]
– Low numerical noise algorithm (δf method)
– Full torus (global) simulation
The Particle-in-cell Method
• Particles sample the distribution function
• Interactions occur via the grid, on which the potential is calculated (from the deposited charges)
The PIC Steps
• “SCATTER”, or deposit, charges on the grid (nearest neighbors)
• Solve the Poisson equation
• “GATHER” forces on each particle from the potential
• Move particles (“PUSH”)
• Repeat…
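A minimal sketch of one full PIC cycle, assuming a 1D periodic electrostatic plasma in normalized units (ε0 = 1, uniform neutralizing ion background); all names and parameters below are illustrative, not taken from GTC:

```python
# Minimal 1D electrostatic PIC cycle: SCATTER -> SOLVE -> GATHER -> PUSH.
# Sketch only, in normalized units; not GTC's algorithm or geometry.
import numpy as np

ng, npart, L, dt = 64, 10000, 2 * np.pi, 0.05
dx = L / ng
rng = np.random.default_rng(0)
x = rng.uniform(0.0, L, npart)            # electron positions
v = rng.normal(0.0, 1.0, npart)           # electron velocities
q_over_m = -1.0                           # electrons, normalized units

for step in range(200):
    # SCATTER: deposit charge on the two nearest grid points (linear weighting)
    j = np.floor(x / dx).astype(int) % ng
    w = x / dx - np.floor(x / dx)         # fractional distance past left point
    dens = np.zeros(ng)
    np.add.at(dens, j, 1.0 - w)
    np.add.at(dens, (j + 1) % ng, w)
    rho = 1.0 - dens * ng / npart         # electron charge + ion background

    # SOLVE the Poisson equation -phi'' = rho with FFTs, then E = -phi'
    k = 2 * np.pi * np.fft.fftfreq(ng, d=dx)
    rho_k = np.fft.fft(rho)
    phi_k = np.zeros_like(rho_k)
    phi_k[1:] = rho_k[1:] / k[1:] ** 2
    E_grid = np.fft.ifft(-1j * k * phi_k).real

    # GATHER: interpolate the field back to each particle
    E_part = (1.0 - w) * E_grid[j] + w * E_grid[(j + 1) % ng]

    # PUSH particles, then repeat
    v += q_over_m * E_part * dt
    x = (x + v * dt) % L
```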
Charge Deposition: 4-Point Average Method
[Figure: the charge deposition step (SCATTER operation), comparing classic PIC deposition with the 4-point average gyrokinetic method of W. W. Lee.]
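The gyrokinetic twist on the classic scatter can be sketched as below, under simplifying assumptions (2D periodic grid, bilinear weighting, four fixed ring angles); the routine and its arguments are hypothetical, not GTC's:

```python
# Sketch of W. W. Lee's 4-point average: each guiding center deposits a
# quarter of its charge at four points on its gyro-ring, and each ring
# point is then scattered to the grid with the usual bilinear weights.
import numpy as np

def deposit_4point(xgc, ygc, rho_gyro, charge, nx, ny, dx, dy):
    """xgc, ygc: guiding-center coordinates; rho_gyro: gyroradius per particle."""
    grid = np.zeros((nx, ny))
    for theta in (0.0, 0.5 * np.pi, np.pi, 1.5 * np.pi):   # 4 ring points
        xp = (xgc + rho_gyro * np.cos(theta)) % (nx * dx)  # ring point position
        yp = (ygc + rho_gyro * np.sin(theta)) % (ny * dy)
        i = np.floor(xp / dx).astype(int)
        jj = np.floor(yp / dy).astype(int)
        fx, fy = xp / dx - i, yp / dy - jj                 # fractional offsets
        i1, j1 = (i + 1) % nx, (jj + 1) % ny
        # bilinear scatter of one quarter of the charge
        for ix, iy, w in ((i, jj, (1 - fx) * (1 - fy)), (i1, jj, fx * (1 - fy)),
                          (i, j1, (1 - fx) * fy), (i1, j1, fx * fy)):
            np.add.at(grid, (ix, iy), 0.25 * charge * w)
    return grid
```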
GTC: Quasi-2D Structure of the Electrostatic Potential
Global Field-aligned Mesh (α = θ − ζ/q)
Saves a factor of about 100 in CPU time
Domain Decomposition
• Domain decomposition:
– Each MPI process holds a toroidal section
– Each particle is assigned to a processor according to its position
• Initial memory allocation is done locally on each processor to maximize efficiency
• Communication between domains is done with MPI calls (runs on most parallel computers)
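As a rough sketch of how this decomposition works (illustrative mpi4py rather than GTC's Fortran, and only the toroidal angle is shifted here, whereas a real code moves all particle attributes):

```python
# Toroidal 1D domain decomposition: each MPI rank owns one wedge in the
# toroidal angle zeta and, after each push, forwards particles that have
# crossed into a neighboring wedge. Sketch assumes >= 3 ranks and that a
# particle moves at most one wedge per time step.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
dzeta = 2.0 * np.pi / size                 # toroidal width of each wedge

def shift_particles(zeta):
    zeta = zeta % (2.0 * np.pi)
    owner = (zeta // dzeta).astype(int)    # rank that should own each particle
    right, left = (rank + 1) % size, (rank - 1) % size
    keep = zeta[owner == rank]
    out_r = zeta[owner == right]           # crossed the right boundary
    out_l = zeta[owner == left]            # crossed the left boundary
    in_l = comm.sendrecv(out_r, dest=right, source=left)
    in_r = comm.sendrecv(out_l, dest=left, source=right)
    return np.concatenate([keep, in_l, in_r])
```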
2nd Level of Parallelism: Loop-Level with OpenMP
[Diagram: between MPI_init and MPI_finalize, each MPI process starts OpenMP threads at the top of each parallel loop and merges them at the end, repeating for every OpenMP loop.]
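A loose Python analogue of this hybrid structure (threads standing in for OpenMP; the function names are invented, and GTC itself uses Fortran OpenMP directives):

```python
# Each MPI process parallelizes its big particle loops over threads:
# "start threads" before the loop, "merge threads" when the pool exits.
from concurrent.futures import ThreadPoolExecutor

def push_chunk(x, v, E_part, dt, sl):
    """Advance one disjoint slice of the particle arrays in place."""
    v[sl] += E_part[sl] * dt
    x[sl] += v[sl] * dt

def threaded_push(x, v, E_part, dt, nthreads=4):
    n = len(x)
    chunks = [slice(i * n // nthreads, (i + 1) * n // nthreads)
              for i in range(nthreads)]
    with ThreadPoolExecutor(nthreads) as pool:   # start threads
        for sl in chunks:
            pool.submit(push_chunk, x, v, E_part, dt, sl)
    # leaving the "with" block waits for all chunks (merge threads)
```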
New MPI-based particle decomposition
• Each domain in the 1D (and soon 2D) domain decomposition can have more than 1 processor associated with it.
• Each processor holds a fraction of the total number of particles in that domain.
• Scales well when using a large number of particles
[Figure: the particles of each toroidal domain split among several processors (Processor 0 through Processor 3).]
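Schematically, again in illustrative mpi4py (the communicator layout is an assumption, not GTC's actual one): the ranks sharing a domain each scatter their own particles, then sum their partial charge grids:

```python
# Particle decomposition: several ranks share one toroidal domain, each
# holding a fraction of its particles; the partial charge depositions
# are merged with an Allreduce over a per-domain sub-communicator.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
ranks_per_domain = 2                                   # illustrative value
domain = comm.Get_rank() // ranks_per_domain
domain_comm = comm.Split(color=domain, key=comm.Get_rank())

def merge_charge(rho_partial):
    """Sum the partial charge grids from all ranks in this domain."""
    rho_total = np.empty_like(rho_partial)
    domain_comm.Allreduce(rho_partial, rho_total, op=MPI.SUM)
    return rho_total
```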
Main Computing Platform: NERSC's IBM SP Seaborg
• 416 x 16-processor SMP nodes (with 64G, 32G, or 16G memory)
• 380 compute nodes (6,080 processors)
• 375 MHz POWER 3+ processors with 1.5 GFlops/sec/proc peak
CRAY X1 at ORNL
• 512 multi-streaming vector processors (MSPs)
• 12.8 Gflops/sec peak performance per MSP
• Currently being upgraded to X1E (1,024 MSPs at 18 Gflops/sec each)
Earth Simulator
• Collaboration with Dr. Leonid Oliker of LBL/NERSC
• Only US team doing performance study on ES
• Many thanks to Dr. Sato
• 5,120 vector processors
• 8 Gflops/sec per processor
• 40 Tflops/sec peak
Optimization Challenges
• “Gather-Scatter” operation in PIC codes:
– The particles are randomly distributed in the simulation volume (grid)
– Particle charge deposition on the grid leads to indirect addressing in memory
– Not cache friendly
– Needs to be tuned differently depending on the architecture
[Figure: the scatter operation from the particle array to the grid array.]
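In array terms the problem looks like the sketch below: the deposit is an indirectly addressed accumulation, so consecutive particles write to scattered memory locations. Sorting particles by cell index (one common remedy, offered here only as an illustration) restores locality at the cost of a sort:

```python
# Indirect addressing in the scatter step: grid[cell[p]] += weight[p].
# The random particle -> cell mapping makes the writes jump around
# memory, which is what defeats the cache.
import numpy as np

ngrid, npart = 100_000, 1_000_000
rng = np.random.default_rng(1)
cell = rng.integers(0, ngrid, npart)     # particle -> grid-cell indices
weight = rng.uniform(0.0, 1.0, npart)

grid = np.zeros(ngrid)
np.add.at(grid, cell, weight)            # scattered, cache-unfriendly writes

order = np.argsort(cell)                 # sort particles by cell...
grid2 = np.zeros(ngrid)
np.add.at(grid2, cell[order], weight[order])  # ...so writes are mostly sequential
```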
Vectorization Work
• Main challenge: charge deposition (scatter)
– Need to avoid memory dependencies
– Solved with the work-vector method
– Each element in the processor register has a private copy of the local grid
• ES: minimize memory bank conflicts
– Use the “duplicate” directive (thanks to David Parks…)
• X1: streaming + vector
– Straightforward since GTC already had loop-level parallelism
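A numpy analogue of the work-vector idea (the lane count and layout are illustrative; on the vector machines this is done with compiler directives on the Fortran loops): each vector lane owns a private copy of the local grid, so the scatter loop has no write dependencies, and the copies are merged afterwards:

```python
# Work-vector scatter: replicate the grid once per vector "lane" so no
# two elements of a vector operation update the same memory location,
# then reduce the private copies into the real grid.
import numpy as np

def scatter_workvector(cell, weight, ngrid, nlanes=256):
    npart = len(cell)
    private = np.zeros((nlanes, ngrid))   # one private grid copy per lane
    lane = np.arange(npart) % nlanes      # fixed particle -> lane assignment
    np.add.at(private, (lane, cell), weight)
    return private.sum(axis=0)            # merge the private copies
```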
GTC Performance
# of    # part.    IBM SP (Seaborg)   Itanium 2 + Quadrics   CRAY X1          Earth Simulator
proc.   (billion)  Gflops   %peak     Gflops   %peak         Gflops   %peak   Gflops   %peak
  64    0.207         9.0     9.3       25.0     6.9           82.6    10.1    102.4    20.0
 128    0.414        17.9     9.3       49.9     6.9          156.2     9.6    199.7    19.5
 256    0.828        35.8     9.3       97.3     6.9          299.5     9.1    396.8    19.4
 512    1.657        71.7     9.4      194.6     6.8            -        -     783.4    19.1
1024    3.314       143.4     8.7      378.9     6.7            -        -     1,925    23.5
2048    6.627       266.2     8.4      757.8     6.7            -        -     3,727    22.7
3.7 Teraflops achieved on the Earth Simulator with 2,048 processors using 6.6 billion particles!
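For reference, the %peak columns are simply the measured rate divided by the aggregate peak of the processors used; for example, the last Earth Simulator entry:

$$\frac{3727\ \mathrm{Gflops}}{2048\ \mathrm{proc} \times 8\ \mathrm{Gflops/proc}} \approx 22.7\%\ \mathrm{of\ peak}$$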
Performance Results
[Figure: compute power, defined as the number of particles (in millions) that move one step in one second (log scale, 1 to 10,000), versus number of processors (64 to 2,048), for the Earth Simulator, NEC SX8, CRAY X1, Thunder, Seaborg, and Blue Gene/L.]
Device-size Scans: ITER-size Simulations
• ITER-size simulation using 1 billion particles (guiding centers), 125 million spatial grid points, and 7,000 time steps, leading to important (previously inaccessible) new results
• Made possible by mixed-model MPI-OpenMP on Seaborg
[Figure: ion thermal diffusivity χ_i/χ_GB versus device size a/ρ_i (0 to 1000), comparing simulation results with a formula.]
Continuous Improvements in GTC Bring New Computational Challenges
• Recent full kinetic electron simulations of electron temperature gradient instability required 8 billion particles!
• Electron-wave interaction has sharp resonances that require higher phase space resolution
• Fully electromagnetic version requires new solver (multi-grid)
Look for the many GPS-related presentations during this conference
• Scientific accomplishments with enhanced versions of GTC (Z. Lin et al., presented by G. Rewoldt)
• Shaped plasma device simulations with general geometry GTC (W. Wang)
• New electromagnetic solver for the kinetic electron capability (M. Adams)
• Visualization techniques (K.-L. Ma)
• Data management and workflows (S. Klasky)
Conclusions
• Simulating fusion experiments is very challenging: it involves multiscale physics
• Gyrokinetic particle-in-cell simulation is a very powerful method to study plasma microturbulence
• The GTC code can efficiently use the available computing power
• New and exciting discoveries are continuously being made with GTC through advanced computing