terra-neo a two-scale approach for e cient on-the- y
TRANSCRIPT
Motivation Stencils Method Numerical Study Performance Mantle Convection
Terra-NeoA two-scale approach for efficient on-the-fly operator
assembly on curved geometries
Simon Bauer1 Markus Huber2 Marcus Mohr1 Ulrich Rude3
Jens Weismuller1,5 Markus Wittmann4 Barbara Wohlmuth2
1Dept. of Earth and Environmental Sciences, Ludwig-Maximilians-Universitat Munchen2Institute for Numerical Mathematics (M2), Technische Universitat Munchen
3Computer Science 10, FAU Erlangen-Nurnberg4Erlangen Regional Computing Centre (RRZE), FAU Erlangen-Nurnberg
5Leibniz Supercomputing Centre (LRZ)
SPPEXA Annual Plenary Meeting21.03.2017
Bauer et al. Terra-Neo 1/25
Motivation Stencils Method Numerical Study Performance Mantle Convection
Geophysicist
Earth Mantle represented as athick spherical shell
Bauer et al. Terra-Neo 2/25
Motivation Stencils Method Numerical Study Performance Mantle Convection
Geophysicist
Earth Mantle represented as athick spherical shell
Mathematician/ComputerScientist
+ well-suited formatrix-free FE implementations
+ regular grid →constant access patterns
Bauer et al. Terra-Neo 2/25
Motivation Stencils Method Numerical Study Performance Mantle Convection
Geophysicist
Earth Mantle represented as athick spherical shell
Mathematician/ComputerScientist
+ well-suited formatrix-free FE implementations
+ regular grid →constant access patterns
Bauer et al. Terra-Neo 2/25
Motivation Stencils Method Numerical Study Performance Mantle Convection
Hierarchical Hybrid Grids (HHG)
Hybrid approach
• initial coarse unstructured mesh (macroelements)
• regular refinement of macro elements
• constant access patterns (FE stencils)within each macro element for therefined grids
• matrix-free implementation
• compute stencils on-the-fly (only ONCEper macro element), reduces memorytraffic
• excellent parallel scalability and nodeperformance [Gmeiner et al., 2015]
Projection of fine grid nodes
• accurate representation of geometry onall levels
• FE stencils are not constant withinmacro elements
• need to assemble stencil for EACH finegrid node → SLOW
Bauer et al. Terra-Neo 3/25
Motivation Stencils Method Numerical Study Performance Mantle Convection
Sequence of Mappings (2D sketch)
reference domain original HHG domain projected HHG domain
Bauer et al. Terra-Neo 4/25
Motivation Stencils Method Numerical Study Performance Mantle Convection
Sequence of Mappings (2D sketch)
reference domain original HHG domain projected HHG domain
Can we get the performance of the original HHGalso on the projected domain?
Bauer et al. Terra-Neo 4/25
Motivation Stencils Method Numerical Study Performance Mantle Convection
Outline
1) Introduce novel approach
2) Numerical study and performance results
3) Geophysical application
−∆u = f
Bauer et al. Terra-Neo 5/25
Motivation Stencils Method Numerical Study Performance Mantle Convection
Novel Approach forEfficient Stencil Assembly on Curved
Geometries
Bauer et al. Terra-Neo 6/25
Motivation Stencils Method Numerical Study Performance Mantle Convection
HHG Stencil Layout
Layout of the 15pt stencil in HHG Selected interior points of a macrotetrahedron in 3D reference domain
Bauer et al. Terra-Neo 7/25
Motivation Stencils Method Numerical Study Performance Mantle Convection
Original (blue) vs. projected (red) stencil weights
weights (TC, . . . ) of 15pt stencil for nodes in gray plane (previous slide)
TC TS
MW MNBauer et al. Terra-Neo 8/25
Motivation Stencils Method Numerical Study Performance Mantle Convection
Novel Approach
Idea: Approximate stencils by low order polynomials
Replace expensive stencil assembly via evaluation of local elementmatrices by low-cost approximation
• Employ quadratic or cubic (or even higher order) polynomials
• Pre-compute polynomial coefficients in setup phase and store them
• Define one polynomial for each stencil weight at each level and foreach macro element
• Fit polynomial coefficients by interpolation (IPOLY) or least-squaresapproach (LSQP)
Bauer et al. Terra-Neo 9/25
Motivation Stencils Method Numerical Study Performance Mantle Convection
Sampling Points for Determination of PolynomialCoefficients
IPOLY: (quadratic) 10 samplingpoints → interpolation polynomial
LSQP: employ more sampling pointsand solve least-squares problem
Bauer et al. Terra-Neo 10/25
Motivation Stencils Method Numerical Study Performance Mantle Convection
Polynomial Evaluation
1D: a quadratic polynomialpi := p(zi ) = a2z
2i + a1zi + a0
can be evaluated incrementallywith only two flops
pi+1 = pi+δpi , δpi+1 = δpi+δk
where
δp(zi ) := p′(zi )h +p′′(zi )
2h2
δk := 2ah2
Polynomial degree q > 2: Formulacan be generalized to higher poly-nomial degrees (based on Taylorseries expansion)
3D: ansatz transfers directly tohigher dimensions
Bauer et al. Terra-Neo 11/25
Motivation Stencils Method Numerical Study Performance Mantle Convection
Numerical Study and Performance Results
Bauer et al. Terra-Neo 12/25
Motivation Stencils Method Numerical Study Performance Mantle Convection
Model Problem
−∆u = f in Ω, u = 0 on ∂Ω
Exact solution:
u(x , y , z) = g(x , y , z)·h(x , y , z)
with
g(x , y , z) = (r − rmin)(r − rmax)
h(x , y , z) = sin(10x) sin(4y) sin(7z)
Bauer et al. Terra-Neo 13/25
Motivation Stencils Method Numerical Study Performance Mantle Convection
Numerical Experiments (1/2)
• Employ geometric multigrid solver with V(3,3)-cycle andGauss-Seidel (GS) smoother
• Test different numerical strategies:
CONS original HHG (non-projected)
IFEM projected HHG (assemble FE stencils exactly)
IPOLY employ interpolation polynomial(quadratic)
LSQP employ approximation polynomial (quadratic)
LSQP (q=3) employ approximation polynomial (cubic)
DD use IFEM for residual and(Double Discretization) IPOLY/LSQP for the smoother
Bauer et al. Terra-Neo 14/25
Motivation Stencils Method Numerical Study Performance Mantle Convection
Numerical Experiments (2/2)
Discretization error for differentlevels of refinement
• LSQP better than IPOLY, butboth deteriorate after some levels
• cubic more accurate thanquadratic polynomials (but moreexpensive)
• DD (Double Discretization)schemes are exact for all levels
• critical level LC depends onmacro element size H, forsmaller H we get larger LC
• Disc. error is in O(h2) +O(Hq+1)q : polynomial degree
Bauer et al. Terra-Neo 15/25
Motivation Stencils Method Numerical Study Performance Mantle Convection
Numerical Experiments - Take Home Message
Disc. error is in O(h2) +O(Hq+1)q : polynomial degree
Task: Identify (h,H)-parameter regime such that ”O(Hq+1) 6 O(h2)”Large Scale: O(104) MPI-processes → lower bound for number of macroelements → small H → O(Hq+1) is also small
LSQP (q=2) is accurate for at least 6 levels ofrefinement.
This allows a global resolution of the Earth mantleof 1km.
Bauer et al. Terra-Neo 16/25
Motivation Stencils Method Numerical Study Performance Mantle Convection
Node Level OptimizationExecution-Cache-Memory (ECM) model (single core)
Reported cycles from IACA for the 3 loops normalized to 8 stencil updates. (maximum values of each port of a certain category)
loop total arithmetic data type commentLoop 1 22 20 22 throughput vectorizedLoop 2 56 56 36 throughput vectorizedLoop 3 40 4 12 throughput scalar w/ recursion
measurement (data in memory)
mul/fma
ld
add/fma
div (56 cy)
add/fma
ld
loop latency (40 cy)
st
ld
mul/fma
st st
36 cy 92 cy 132 cy
0 20 40 60 80 100 120 140 160cy
L1/L2 L2/L3 L3/mem @ 2.3 GHz
loop 1 (36 cy) loop 2 (56 cy) loop 3 (40 cy)
132 cy
2 x
2 c
y/cl
2 x
2 c
y/cl
1 x
5.7
cy/
cl
2 x
2 c
y/cl
2 x
2 c
y/cl
1 x
5.7
cy/
cl
model 132 cy
155 cy
Bauer et al. Terra-Neo 17/25
Motivation Stencils Method Numerical Study Performance Mantle Convection
Weak Scaling on SuperMUC Phase 2
Minimal CPU time [s] for one V(3,3)-cycle. 16 macro elements areassigned per core with 3.3 · 105 DOFs per macro element.
Bauer et al. Terra-Neo 18/25
Motivation Stencils Method Numerical Study Performance Mantle Convection
Geophysical Application
Bauer et al. Terra-Neo 19/25
Motivation Stencils Method Numerical Study Performance Mantle Convection
Mantle Convection: Physical Model
Conservation of momentum, mass and energy:
divσ + %g = 0 (1a)
∂t%+ div(%u) = 0 (1b)
∂t(%e) + div(%eu) + div q− H − σ : ε = 0 (1c)
with
σ = 2µ
(ε− tr ε
3I
)− pI , ε =
1
2
(∇u + (∇u)T
)and an equation of state % = %(p,T ).
% density u velocity p pressureg gravitational acceleration µ dynamic viscosity σ Cauchy stress tensorε rate of strain tensor T temperature e internal energyq heat flux per unit area H heat production rate
Bauer et al. Terra-Neo 20/25
Motivation Stencils Method Numerical Study Performance Mantle Convection
Mantle Convection: Generalised Stokes System
Re-casting in terms of key quantities velocity, pressure and temperature,we get a generalised Stokes problem for the quasi-stationary flow (1a),(1b)(with an elliptic viscous operator L)
L(u;µ)−∇p = F (T )
div(u) = 0µ = µ(x,T ,u)
coupled to the energy equation (1c)
∂tT + u · ∇T + div(κ∇T )− 1
cp%(H − σ : ε) = 0
only via the buoyancy term F (T ).
cp specific heat capacity κ heat conductivity
Bauer et al. Terra-Neo 21/25
Motivation Stencils Method Numerical Study Performance Mantle Convection
LSQP for Stokes System
Mixed Finite Element discretisation:[A(µ) BT
B −C
][u
q
]=
[f
g
]
Challenges:
1) Curved geometry → employ LSQP approachI system of PDEs → approximate stencils of each operator block
individually with a quadratic polynomial
2) How do we deal with non-constant µ?
Bauer et al. Terra-Neo 22/25
Motivation Stencils Method Numerical Study Performance Mantle Convection
LSQP with Variable Coefficients
Remember: stencil weight aI ,Jk,m is computed via:
aI ,Jk,m =∑
t∈N (I )∩N (J)
µtEI ,Jt (2)
• Approximate stencil weight including µ (”classical” LSQP approach)→ only for ”sufficiently smooth” µ
• Approximate local element matrices → replace expensive E I ,Jt with
polynomial approximation (LSQP LocEl)
+ works for general µ– more expensive, but still much faster than computation of E I ,J
t
I , J node indices N (I ) neighbourhood of node I
k,m operator component, 1 6 k,m 6 3 EI,Jt contribution of local element matrix
t local element µt averaged viscosity on element t
Bauer et al. Terra-Neo 23/25
Motivation Stencils Method Numerical Study Performance Mantle Convection
Application - Velocity Streamlines
classic FE assembly LSQP LocEl
Viscosity model [Davies et al. 2012] with jump at da = 660/6371
µ(x,T ) = exp
(4.61
1− ‖x‖2
1− rcmb− 2.99T
)1/10 · 6.3713d3
a for ‖x‖2 > 1− da,
1 else.
Present day temperature field T [Grand et. al 1997, Stixrude and Lithgow-Bertelloni 2005],plate velocities [GPlates] on surface and free-slip BC’s on core-mantle boundary (cmb)
Bauer et al. Terra-Neo 24/25
Motivation Stencils Method Numerical Study Performance Mantle Convection
Application - Velocity Streamlines
classic FE assembly
=⇒speedup
5.9x(excl. coarsegrid solver)
LSQP LocEl
Viscosity model [Davies et al. 2012] with jump at da = 660/6371
µ(x,T ) = exp
(4.61
1− ‖x‖2
1− rcmb− 2.99T
)1/10 · 6.3713d3
a for ‖x‖2 > 1− da,
1 else.
Present day temperature field T [Grand et. al 1997, Stixrude and Lithgow-Bertelloni 2005],plate velocities [GPlates] on surface and free-slip BC’s on core-mantle boundary (cmb)
Bauer et al. Terra-Neo 24/25
Motivation Stencils Method Numerical Study Performance Mantle Convection
Thank you for your attention
Bauer et al. Terra-Neo 25/25
Bauer et al. Terra-Neo 26/25
Original (blue) vs. projected (red) stencil weights
weights (MC, . . . ) of 15pt stencil for nodes in gray plane (previous slide)
MC MSE
BNW BEBauer et al. Terra-Neo 27/25
Numerical Experiments
Residual in discrete L2-norm Discretization error in discreteL2-norm
Bauer et al. Terra-Neo 28/25