s3d direct numerical simulation — preparation for the 10 ... · national renewable energy...

44
NATIONAL RENEWABLE ENERGY LABORATORY 1 / 44 S3D Direct Numerical Simulation — Preparation for the 10–100 PF era NREL is a national laboratory of the U.S. Department of Energy, Office of Energy Efficiency and Renewable Energy, operated by the Alliance for Sustainable Energy, LLC. Ray W. Grout Scientific Computing ACSS, 29 March 2012 Ramanan Sankaran ORNL John Levesque Cray Cliff Woolley, Stan Posey nVidia J.H. Chen SNL

Upload: dinhthu

Post on 21-Jul-2018

213 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: S3D Direct Numerical Simulation — Preparation for the 10 ... · NATIONAL RENEWABLE ENERGY LABORATORY 1 / 44 S3D Direct Numerical Simulation — Preparation for the 10–100 PF era

NATIONAL RENEWABLE ENERGY LABORATORY 1 / 44

S3D Direct Numerical Simulation — Preparation for the10–100 PF era

NREL is a national laboratory of the U.S. Department of Energy, Office of Energy Efficiency and Renewable Energy, operated by the Alliance for Sustainable Energy, LLC.

Ray W. GroutScientific ComputingACSS, 29 March 2012

Ramanan Sankaran ORNL

John Levesque Cray

Cliff Woolley, Stan Posey nVidia

J.H. Chen SNL

Page 2: S3D Direct Numerical Simulation — Preparation for the 10 ... · NATIONAL RENEWABLE ENERGY LABORATORY 1 / 44 S3D Direct Numerical Simulation — Preparation for the 10–100 PF era

NATIONAL RENEWABLE ENERGY LABORATORY 2 / 44

Key Questions

1. How S3D (DNS) can address the science challenges Jackieidentified

2. Performance requirements of the science and how we can meetthem

3. Optimizations and refactoring

4. What we can do on Titan

5. Future work

Page 3: S3D Direct Numerical Simulation — Preparation for the 10 ... · NATIONAL RENEWABLE ENERGY LABORATORY 1 / 44 S3D Direct Numerical Simulation — Preparation for the 10–100 PF era

NATIONAL RENEWABLE ENERGY LABORATORY 3 / 44

The governing physics

Compressible Navier-Stokes for Reacting Flows

— PDEs for conservation of momentum, mas, energy and composition

— Chemical reaction network governing composition changes

— Mixture averaged transport model

— Flexible thermochemical state description (IGL)

— Modular inclusion of case-specific physics– Optically thin radiation– Compression heating model– Lagrangian particle tracking

Page 4: S3D Direct Numerical Simulation — Preparation for the 10 ... · NATIONAL RENEWABLE ENERGY LABORATORY 1 / 44 S3D Direct Numerical Simulation — Preparation for the 10–100 PF era

NATIONAL RENEWABLE ENERGY LABORATORY 4 / 44

Solution Algorithm (What does S3D do?)

— Method of lines solution:– Replace spatial derivatives with finite-difference approximations

to obtain coupled set of ODEs– 8th order centered approximations to first derivative– Second derivative evaluated by repeated application of first

derivative operator∂2φ

∂x2≈ ∂

∂x

[

∂xφ

]

— Integrate explicitly in time

— Thermochemical state and transport coefficients evaluatedpointwise

— Chemical reaction rates evaluated point-wise

— Block spatial parallel decomposition between MPI ranks

Page 5: S3D Direct Numerical Simulation — Preparation for the 10 ... · NATIONAL RENEWABLE ENERGY LABORATORY 1 / 44 S3D Direct Numerical Simulation — Preparation for the 10–100 PF era

NATIONAL RENEWABLE ENERGY LABORATORY 5 / 44

Solution Algorithm

— Fully compressible formulation– Fully coupled acoustic/thermochemical/chemical interaction

— No subgrid model: fully resolve turbulence-chemistry interaction

— Total integration time limited by large scale (acoustic, bulk velocity,chemical) residence time

— Grid must resolve smallest mechanical, scalar, chemicallength-scale

— Time-step limited by smaller of chemical timescale or acoustic CFL

Page 6: S3D Direct Numerical Simulation — Preparation for the 10 ... · NATIONAL RENEWABLE ENERGY LABORATORY 1 / 44 S3D Direct Numerical Simulation — Preparation for the 10–100 PF era

NATIONAL RENEWABLE ENERGY LABORATORY 6 / 44

Resolution requirements in detail

1. Kolmogorov lengthscales

η ≈ Λ

Re(3/4)t

; Λ = k1L L = N∆x

η > k2∆x ⇒ Re(3/4)t <

k1

k2N ⇒ Ret <

(

k1

k2

)4/3

N4/9

2. Batchelor lengthscales

λβ =η√Sc

Sc =ν

D≈ O(1)

Hydrogen-air, Sc ≈ 0.2; n-heptane-air, Sc ≈ 2.4

3. Chemical lengthscales:

∆x <δ

Q⇒ L

δ<

N

QQ ≈ 20

Page 7: S3D Direct Numerical Simulation — Preparation for the 10 ... · NATIONAL RENEWABLE ENERGY LABORATORY 1 / 44 S3D Direct Numerical Simulation — Preparation for the 10–100 PF era

NATIONAL RENEWABLE ENERGY LABORATORY 7 / 44

Resolution requirements (temporal)

1. Acoustic CFL

∆t <∆x

a

∆ta

∆tu′

= Ma

2. Advective CFL∆ta

∆tu′

= Ma

3. Chemical timescale

— Flame timescale(

τc ∼ δsL

)

— Species creation rates max(

S−1i

)

— Reaction rates max(

ω−1j

)

— Eigenvalues of reaction rate jacobian

Page 8: S3D Direct Numerical Simulation — Preparation for the 10 ... · NATIONAL RENEWABLE ENERGY LABORATORY 1 / 44 S3D Direct Numerical Simulation — Preparation for the 10–100 PF era

NATIONAL RENEWABLE ENERGY LABORATORY 8 / 44

Combustion regimes

Thickened-

wrinkled flame,

distributed

reaction

Wrinkled

flamelets

Thickened flame /

Well stirred

reactor

Wrinkled

flamelets,

Corrugated

flamelets

Integral length scale / flame thickness

RMS velocity

/ flame speed

Page 9: S3D Direct Numerical Simulation — Preparation for the 10 ... · NATIONAL RENEWABLE ENERGY LABORATORY 1 / 44 S3D Direct Numerical Simulation — Preparation for the 10–100 PF era

NATIONAL RENEWABLE ENERGY LABORATORY 9 / 44

Chemistry reduction and stiffness removal

— Reduce species and reaction count through extensive staticanalysis and manipulation of reaction mechanism

— Literature from T. Lu, C.K. Law et al.– DRG analysis of reaction network– Quasi-steady state approximations– Partial equilibrium approximations

— Dynamic analysis to adjust reactions that are assumed ‘fast’relative to diffusion at runtime (implications later)

Page 10: S3D Direct Numerical Simulation — Preparation for the 10 ... · NATIONAL RENEWABLE ENERGY LABORATORY 1 / 44 S3D Direct Numerical Simulation — Preparation for the 10–100 PF era

NATIONAL RENEWABLE ENERGY LABORATORY 10 / 44

Benchmark problem for development

— HCCI study of stratified configuration

— Periodic

— 52 species n-heptane/air reaction mechanism (with dynamicstiffness removal)

— Mixture average transport model

— Based on target problem sized for 2B gridpoints

— 483 points per node (hybridized)

— 203 points per core (MPI-everywhere)

— Used to determine strategy, benchmarks, memory footprint

— Alternate chemistry (22 species Ethylene-air mechanism) used assurrogate for ‘small’ chemistry

Page 11: S3D Direct Numerical Simulation — Preparation for the 10 ... · NATIONAL RENEWABLE ENERGY LABORATORY 1 / 44 S3D Direct Numerical Simulation — Preparation for the 10–100 PF era

NATIONAL RENEWABLE ENERGY LABORATORY 11 / 44

Evolving chemical mechanism

— 73 species bio-diesel mechanism now available; 99 speciesiso-octane mechanism upcomming

— Revisions to target late in process as state of science advances

— ‘Bigger’ (next section) and ‘more costly’ (last section)

— Continue with initial benchmark (acceptance) problem– Keeping in mind that all along we’ve planned on chemistry

flexibility– Work should transfer– Might need smaller grid to control total simulation time

Page 12: S3D Direct Numerical Simulation — Preparation for the 10 ... · NATIONAL RENEWABLE ENERGY LABORATORY 1 / 44 S3D Direct Numerical Simulation — Preparation for the 10–100 PF era

NATIONAL RENEWABLE ENERGY LABORATORY 12 / 44

Target Science Problem

— Target simulation: 3D HCCI study

— Outer timescale: 2.5ms

— Inner timescale: 5ns ⇒ 500 000 timesteps

— As ‘large’ as possible for realism:– Large in terms of chemistry: 73 species bio-diesel or 99 species

iso-octane mechanism preferred, 52 species n-Heptanemechanism alternate

– Large in terms of grid size: 9003, 6503 alternate

Page 13: S3D Direct Numerical Simulation — Preparation for the 10 ... · NATIONAL RENEWABLE ENERGY LABORATORY 1 / 44 S3D Direct Numerical Simulation — Preparation for the 10–100 PF era

NATIONAL RENEWABLE ENERGY LABORATORY 13 / 44

Summary (I)

— Provide solutions in regime targeted for model development andfundamental understanding needs

— Turbulent regime weakly sensitive to grid size: need a large changeto alter Ret significantly

— Chemical mechanism is significantly reduced in size from the fullmechanism by external, static analysis to O(50) species

Page 14: S3D Direct Numerical Simulation — Preparation for the 10 ... · NATIONAL RENEWABLE ENERGY LABORATORY 1 / 44 S3D Direct Numerical Simulation — Preparation for the 10–100 PF era

NATIONAL RENEWABLE ENERGY LABORATORY 14 / 44

Performance profile for legacy S3D

Where we started (n-heptane)

242 × 16, 720 nodes 5.6s242 × 16, 7200 nodes 7.9s483, 8 nodes 28.7s483, 18 000 nodes 30.4s

Page 15: S3D Direct Numerical Simulation — Preparation for the 10 ... · NATIONAL RENEWABLE ENERGY LABORATORY 1 / 44 S3D Direct Numerical Simulation — Preparation for the 10–100 PF era

NATIONAL RENEWABLE ENERGY LABORATORY 15 / 44

S3D RHS

u =qu(~x, t)

ρ; Yn =

qn(~x, t)

ρT, Cp, h

Spatial Derivative Terms

AccumulateDifferentiate

Assemble transport equation operands

Reaction Rates

Fluxes

Transport coefficients

Thermodynamic Properties

Gradients

Primary variables

Page 16: S3D Direct Numerical Simulation — Preparation for the 10 ... · NATIONAL RENEWABLE ENERGY LABORATORY 1 / 44 S3D Direct Numerical Simulation — Preparation for the 10–100 PF era

NATIONAL RENEWABLE ENERGY LABORATORY 16 / 44

S3D RHS

Cp = p4(T ); h = p4(T )

(Polynomials tabulated andlinearly interpolated)

T, Cp, h

Spatial Derivative Terms

AccumulateDifferentiate

Assemble transport equation operands

Reaction Rates

Fluxes

Transport coefficients

Thermodynamic Properties

Gradients

Primary variables

Page 17: S3D Direct Numerical Simulation — Preparation for the 10 ... · NATIONAL RENEWABLE ENERGY LABORATORY 1 / 44 S3D Direct Numerical Simulation — Preparation for the 10–100 PF era

NATIONAL RENEWABLE ENERGY LABORATORY 17 / 44

S3D RHS

p = ρRTT, Cp, h

Spatial Derivative Terms

AccumulateDifferentiate

Assemble transport equation operands

Reaction Rates

Fluxes

Transport coefficients

Thermodynamic Properties

Gradients

Primary variables

Page 18: S3D Direct Numerical Simulation — Preparation for the 10 ... · NATIONAL RENEWABLE ENERGY LABORATORY 1 / 44 S3D Direct Numerical Simulation — Preparation for the 10–100 PF era

NATIONAL RENEWABLE ENERGY LABORATORY 18 / 44

S3D RHS

∂Yn

∂xi

;∂uj

∂xj

;∂T

∂xi

— Historically computedusing sequential 1Dderivatives

T, Cp, h

Spatial Derivative Terms

AccumulateDifferentiate

Assemble transport equation operands

Reaction Rates

Fluxes

Transport coefficients

Thermodynamic Properties

Gradients

Primary variables

Page 19: S3D Direct Numerical Simulation — Preparation for the 10 ... · NATIONAL RENEWABLE ENERGY LABORATORY 1 / 44 S3D Direct Numerical Simulation — Preparation for the 10–100 PF era

NATIONAL RENEWABLE ENERGY LABORATORY 19 / 44

S3D RHS

λ = f(

T, ~X)

µ = f(

T, ~X)

Dn = f(

T, ~X, p)

(these polynomialsevaluated directly)

T, Cp, h

Spatial Derivative Terms

AccumulateDifferentiate

Assemble transport equation operands

Reaction Rates

Fluxes

Transport coefficients

Thermodynamic Properties

Gradients

Primary variables

Page 20: S3D Direct Numerical Simulation — Preparation for the 10 ... · NATIONAL RENEWABLE ENERGY LABORATORY 1 / 44 S3D Direct Numerical Simulation — Preparation for the 10–100 PF era

NATIONAL RENEWABLE ENERGY LABORATORY 20 / 44

S3D RHS

τij = 2µ

[

∂uk

∂xk

(

1

3

∂uk

∂xk

)]

δij

(

∂ui

∂xj

+∂uj

∂xi

)

T, Cp, h

Spatial Derivative Terms

AccumulateDifferentiate

Assemble transport equation operands

Reaction Rates

Fluxes

Transport coefficients

Thermodynamic Properties

Gradients

Primary variables

Page 21: S3D Direct Numerical Simulation — Preparation for the 10 ... · NATIONAL RENEWABLE ENERGY LABORATORY 1 / 44 S3D Direct Numerical Simulation — Preparation for the 10–100 PF era

NATIONAL RENEWABLE ENERGY LABORATORY 21 / 44

S3D RHS

∂ρu∂t

= ∂∂x

[−ρuu− p + τxx]−

∂∂y

[ρuv − τxy]

∂∂z

[ρuw − τxz]

T, Cp, h

Spatial Derivative Terms

AccumulateDifferentiate

Assemble transport equation operands

Reaction Rates

Fluxes

Transport coefficients

Thermodynamic Properties

Gradients

Primary variables

Page 22: S3D Direct Numerical Simulation — Preparation for the 10 ... · NATIONAL RENEWABLE ENERGY LABORATORY 1 / 44 S3D Direct Numerical Simulation — Preparation for the 10–100 PF era

NATIONAL RENEWABLE ENERGY LABORATORY 22 / 44

S3D RHS

A + B ⇔ C + D

kf = AfjTβj exp

(

−Taj

T

)

Rf = [A][B]kf

Sn = Wn

M∑

j=1

νkjRj

T, Cp, h

Spatial Derivative Terms

AccumulateDifferentiate

Assemble transport equation operands

Reaction Rates

Fluxes

Transport coefficients

Thermodynamic Properties

Gradients

Primary variables

Page 23: S3D Direct Numerical Simulation — Preparation for the 10 ... · NATIONAL RENEWABLE ENERGY LABORATORY 1 / 44 S3D Direct Numerical Simulation — Preparation for the 10–100 PF era

NATIONAL RENEWABLE ENERGY LABORATORY 23 / 44

Communication in Chemical Mechanisms

— Need diffusion term separately from advective term to facilitatedynamic stiffness removal– See T. Lu et al., Combustion and Flame 2009.– Application of quasi-steady state (QSS) assumption in situ– Applied to species that are transported, so applied by correcting

reaction rates (traditional QSS doesn’t conserve mass if speciestransported)

— Diffusive contribution usually lumped with advective term:

∂x(ρuY − Jx)

— We need to break it out separately to correct Rf , Rb

Page 24: S3D Direct Numerical Simulation — Preparation for the 10 ... · NATIONAL RENEWABLE ENERGY LABORATORY 1 / 44 S3D Direct Numerical Simulation — Preparation for the 10–100 PF era

NATIONAL RENEWABLE ENERGY LABORATORY 24 / 44

Readying S3D for Titan

— Migration strategy:1. Requirements for host/accelerator work distribution2. Profile legacy code (previous slides)3. Identify key kernels for optimization

– Chemistry, transport coefficients, thermochemical state(pointwise)

– Derivatives (reuse)4. Prototype and explore performance bounds using cuda5. “Hybridize” legacy code: MPI for inter-node, OpenMP intra-node6. OpenACC for GPU execution7. Restructure to balance compute effort between accelerator and

host

Page 25: S3D Direct Numerical Simulation — Preparation for the 10 ... · NATIONAL RENEWABLE ENERGY LABORATORY 1 / 44 S3D Direct Numerical Simulation — Preparation for the 10–100 PF era

NATIONAL RENEWABLE ENERGY LABORATORY 25 / 44

Chemistry

— Reaction rate — temperature dependence– Need to store rates: temporary storage for Rf , Rb

— Reverse rates from equilibrium constants or separate set ofconstants

— Multiply forward/reverse rates by concentrations

— Number of algebraic relationships involving non-contiguous accessto rates scales with number of QSS species

— Species source term is algebraic combination of reaction rates(non-contiguous access to temporary array)

— Extracted as a ‘self-contained’ kernel; analysis by nVidia suggestedseveral optimizations

— Captured as improvements in code generation tools (seeSankaran, AIAA 2012)

Page 26: S3D Direct Numerical Simulation — Preparation for the 10 ... · NATIONAL RENEWABLE ENERGY LABORATORY 1 / 44 S3D Direct Numerical Simulation — Preparation for the 10–100 PF era

NATIONAL RENEWABLE ENERGY LABORATORY 26 / 44

Move everything over . . .

— Memory footprint for 483 gridpoints per node52 species n-Heptane 73 species bio-diesel

Primary variables 57 78Primitive variables 58 79Work Variables 280 385Chemistry Scratch a 1059 1375RK Carryover 114 153RK Error control 171 234Total 1739 2307MB for 483 points 1467 1945

aFor evaluating all gridpoints together

Page 27: S3D Direct Numerical Simulation — Preparation for the 10 ... · NATIONAL RENEWABLE ENERGY LABORATORY 1 / 44 S3D Direct Numerical Simulation — Preparation for the 10–100 PF era

NATIONAL RENEWABLE ENERGY LABORATORY 27 / 44

Communication aggregationfor all species do

MPI_IRecvsnd_left( 1:4,:,:) = f(1:4,:,:,i)snd_right( 1:4,:,:) = f(nx-3:nx,:,:,i)MPI_ISendevaluate interior derivativeMPI_Waitevaluate edge derivative

end for

for all species doMPI_IRecv

end forfor all species do

snd_left( 1:4,:,:,i) = f(1:nx,:,:,i)snd_right( 1:4,:,:,i) = f(nx-3:nx,:,:,i)

end forfor all species do

MPI_ISendend forMPI_Waitfor all species do

evaluate interior derivativeevaluate edge derivative

end for

Page 28: S3D Direct Numerical Simulation — Preparation for the 10 ... · NATIONAL RENEWABLE ENERGY LABORATORY 1 / 44 S3D Direct Numerical Simulation — Preparation for the 10–100 PF era

NATIONAL RENEWABLE ENERGY LABORATORY 28 / 44

RHS reorganization

T, Cp, h

Spatial Derivative Terms

AccumulateDifferentiate

Assemble transport equation operands

Reaction Rates

Fluxes

Transport coefficients

Thermodynamic Properties

Gradients

Reaction Rates

Compute diffusion timescale

Accumulate

Compute Derivatives

Pack and send

Assemble operands

Fluxes

Transport Coeffieicents

Gradients

Thermodynamic Properties

Pack and send for gradients

Primary variables; T, Cp, h

Primary variables

Page 29: S3D Direct Numerical Simulation — Preparation for the 10 ... · NATIONAL RENEWABLE ENERGY LABORATORY 1 / 44 S3D Direct Numerical Simulation — Preparation for the 10–100 PF era

NATIONAL RENEWABLE ENERGY LABORATORY 29 / 44

RHS reorganization

T, Cp, h

Spatial Derivative Terms

AccumulateDifferentiate

Assemble transport equation operands

Reaction Rates

Fluxes

Transport coefficients

Thermodynamic Properties

Gradients

Reaction Rates

Compute diffusion timescale

Accumulate

Compute Derivatives

Pack and send

Assemble operands

Fluxes

Transport Coeffieicents

Gradients

Thermodynamic Properties

Pack and send for gradients

Primary variables; T, Cp, h

Primary variables

Page 30: S3D Direct Numerical Simulation — Preparation for the 10 ... · NATIONAL RENEWABLE ENERGY LABORATORY 1 / 44 S3D Direct Numerical Simulation — Preparation for the 10–100 PF era

NATIONAL RENEWABLE ENERGY LABORATORY 30 / 44

RHS reorganization

T, Cp, h

Spatial Derivative Terms

AccumulateDifferentiate

Assemble transport equation operands

Reaction Rates

Fluxes

Transport coefficients

Thermodynamic Properties

Gradients

Reaction Rates

Compute diffusion timescale

Accumulate

Compute Derivatives

Pack and send

Assemble operands

Fluxes

Transport Coeffieicents

Gradients

Thermodynamic Properties

Pack and send for gradients

Primary variables; T, Cp, h

Primary variables

Page 31: S3D Direct Numerical Simulation — Preparation for the 10 ... · NATIONAL RENEWABLE ENERGY LABORATORY 1 / 44 S3D Direct Numerical Simulation — Preparation for the 10–100 PF era

NATIONAL RENEWABLE ENERGY LABORATORY 31 / 44

Optimize ∇Y for reuse

— Legacy approach: compute components sequentially:for all interior i, j, k do

∂Y∂x =

∑4l=1 cl

(

Yi+l,j,k − Yi−l,j,k

)

sxi

end forfor all i, interior j, k do

∂Y∂y =

∑4l=1 cl

(

Yi,j+l,k − Yl,j−l,k

)

syj

end forfor all i, j, interior k do

∂Y∂z =

∑4l=1 cl

(

Yi,j,k+l − Yi,j,k−l

)

szk

end for

— Points requiring halo data handled in separate loops

Page 32: S3D Direct Numerical Simulation — Preparation for the 10 ... · NATIONAL RENEWABLE ENERGY LABORATORY 1 / 44 S3D Direct Numerical Simulation — Preparation for the 10–100 PF era

NATIONAL RENEWABLE ENERGY LABORATORY 32 / 44

Optimize ∇Y for reuse

— Combine evaluation for interior of gridfor all ijk do

if interior i then∂Y∂x =

∑4l=1 cl

(

Yi+l,j,k − Yi−l,j,k

)

sxi

end ifif interior j then

∂Y∂y =

∑4l=1 cl

(

Yi,j+l,k − Yl,j−l,k

)

syj

end ifif interior k then

∂Y∂z =

∑4l=1 cl

(

Yi,j,k+l − Yi,j,k−l

)

szk

end ifend for

— Writing interior without conditionals requires 55 loops– 43, 42(N − 8), 4(N − 8)2, (N − 8)3 points

Page 33: S3D Direct Numerical Simulation — Preparation for the 10 ... · NATIONAL RENEWABLE ENERGY LABORATORY 1 / 44 S3D Direct Numerical Simulation — Preparation for the 10–100 PF era

NATIONAL RENEWABLE ENERGY LABORATORY 33 / 44

Restructure to rebalance

All GPU CPU + GPUHalo→host nx3 − (nx − 8)3 nx3 − (nx − 8)3

Halo→device nx3 − (nx − 8)3 0Interior buffer→host 0 (nx − 8)3 − (nx − 16)3

Result→device 0 nx3 − (nx − 8)3

— Compute derivative “inner-halos” on host

— Data traffic (∼ 30%), but move work to host

Page 34: S3D Direct Numerical Simulation — Preparation for the 10 ... · NATIONAL RENEWABLE ENERGY LABORATORY 1 / 44 S3D Direct Numerical Simulation — Preparation for the 10–100 PF era

NATIONAL RENEWABLE ENERGY LABORATORY 34 / 44

New cost profile

Overall: GPU vs original vs hybrid performance

Page 35: S3D Direct Numerical Simulation — Preparation for the 10 ... · NATIONAL RENEWABLE ENERGY LABORATORY 1 / 44 S3D Direct Numerical Simulation — Preparation for the 10–100 PF era

NATIONAL RENEWABLE ENERGY LABORATORY 35 / 44

Summary (2)

1. Significant restructuring to expose node-level parallelism

2. Resulting code is hybrid MPI+OpenMP and MPI+OpenACC(-DGPU only changes directives)

3. Optimizations to overlap communication and computation

4. Changed balance of effort

5. For small per-rank sizes, accept degraded cache utilization in favorof improved scalability

Page 36: S3D Direct Numerical Simulation — Preparation for the 10 ... · NATIONAL RENEWABLE ENERGY LABORATORY 1 / 44 S3D Direct Numerical Simulation — Preparation for the 10–100 PF era

NATIONAL RENEWABLE ENERGY LABORATORY 36 / 44

Reminder: Target Science Problem

— Target simulation: 3D HCCI study

— Outer timescale: 2.5ms

— Inner timescale: 5ns ⇒ 500 000 timesteps

— As ‘large’ as possible for realism:– Large in terms of chemistry: 73 species bio-diesel or 99 species

iso-octane mechanism preferred, 52 species n-Heptanemechanism alternate

– Large in terms of grid size: 9003, 6503 alternate

Page 37: S3D Direct Numerical Simulation — Preparation for the 10 ... · NATIONAL RENEWABLE ENERGY LABORATORY 1 / 44 S3D Direct Numerical Simulation — Preparation for the 10–100 PF era

NATIONAL RENEWABLE ENERGY LABORATORY 37 / 44

Benchmark problem

— 12003, 52 species n-Heptane mechanism

7200 nodes 18000 nodes

XK6 XK6 XE6 XK6 XK6 XE6

(no GPU) (GPU) (2 CPU) (no GPU) (GPU) (2 CPU)

Adjustment 3.23 2.2 2.4 1.5 1.0 1.1

Size per node 623

483

WC per timestep 8.4 5.6 6 3.9 2.58 2.78

Total WC time (days) 48.6 32.4 34.7 22.6 15 16.1

— Very large by last years’ standards — 225M core-hours

Page 38: S3D Direct Numerical Simulation — Preparation for the 10 ... · NATIONAL RENEWABLE ENERGY LABORATORY 1 / 44 S3D Direct Numerical Simulation — Preparation for the 10–100 PF era

NATIONAL RENEWABLE ENERGY LABORATORY 38 / 44

Time to solutionProblem 7200 nodes 18000 nodes

CPU CPU+GPU CPU CPU+GPU

6503, 52 spc Size per node 35

325

3

(6859, 6653) (17576, 650

3)

WC per timestep 1.5 1.0 0.55 0.36

Total WC time 8.8 5.8 3.2 2.1

9003, 52 spc Size per node 46

335

3

(8000, 9203) (17576, 910

3)

WC per timestep 3.4 2.3 1.5 1.0

Total WC time 20 13 8.8 5.8

Page 39: S3D Direct Numerical Simulation — Preparation for the 10 ... · NATIONAL RENEWABLE ENERGY LABORATORY 1 / 44 S3D Direct Numerical Simulation — Preparation for the 10–100 PF era

NATIONAL RENEWABLE ENERGY LABORATORY 39 / 44

Time to solutionProblem 7200 nodes 18000 nodes

CPU CPU+GPU CPU CPU+GPU

6503, 73 spc Size per node 35

325

3

(6859, 6653) (17576, 650

3)

WC per timestep 2.1 1.4 0.77 0.51

Total WC time 12.3 8.1 4.5 3

9003,73 spc Size per node 46

335

3

(8000, 9203) (17576, 910

3)

WC per timestep 4.8 3.2 2.1 1.4

Total WC time 28 18 12.3 8.1

Page 40: S3D Direct Numerical Simulation — Preparation for the 10 ... · NATIONAL RENEWABLE ENERGY LABORATORY 1 / 44 S3D Direct Numerical Simulation — Preparation for the 10–100 PF era

NATIONAL RENEWABLE ENERGY LABORATORY 40 / 44

Further optimization potential

Page 41: S3D Direct Numerical Simulation — Preparation for the 10 ... · NATIONAL RENEWABLE ENERGY LABORATORY 1 / 44 S3D Direct Numerical Simulation — Preparation for the 10–100 PF era

NATIONAL RENEWABLE ENERGY LABORATORY 41 / 44

Temporary storage

— Two of the main time consuming kernels (reaction rates, transportcoefficients) generate significant intermediate results

— Reaction rate ‘spill’

!"! #$

%&'"()* +,$

-(.( /+$

012345678024189138:3;8%<+=8< $>4642?>%<>402@@53 $>4642?>45A1>BC60;D $>4642?>45A1>BC254D $>4642?>45A1

E?6F2?>?62GH <+$ <$ </$ <I$

E?6F2?>H4601H J$ <$ K$ <$

?632?>?62GH I#$ I$ <K$ <I$

?632?>H4601H I+$ I$ /,$ IK$

01L?2MH ! N$ ! N$

G1L17G174>57H7>?24173M ! ! <K$ <K$

367406?>@?6C ! <$ #$ J$

5741E10>6LH ! <N$ ! <N$

@LI/>6LH ! I$ ! I$

@L+N>6LH ! N$ ! N$

H@O>6LH ! K$ ! K$

4642? <K<$

%5A5457E>P23460>&72?MH5H

.IQ>R06@5?1>&72?MH5H

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

xxxxxxxxxxx

xxxxxxxxxxxxxxxxxxxxxx

xxxxxxxxxxxxxxxxxxxxxx

xxxxxxxxxxx

xxxxxxxxxxx

xxxxxxxxxxx

xxxxxxxxxxx

xxxxxxxxxxx

— We are working to expose another dimension of parallelism toimprove this and permit evaluating much large reactionmechanisms.

Page 42: S3D Direct Numerical Simulation — Preparation for the 10 ... · NATIONAL RENEWABLE ENERGY LABORATORY 1 / 44 S3D Direct Numerical Simulation — Preparation for the 10–100 PF era

NATIONAL RENEWABLE ENERGY LABORATORY 42 / 44

Future algorithmic improvements

— Second Derivative approximation

— Chemistry network optimization to minimize working set size

— Replace algebraic relations with in place solve

— Time integration schemes - coupling, semi-implicit chemistry

— Several of these are being looked at by ExaCT co-design center,where the impacts on future architectures are being evaluated– Algorithmic advances can be back-ported to this project

Page 43: S3D Direct Numerical Simulation — Preparation for the 10 ... · NATIONAL RENEWABLE ENERGY LABORATORY 1 / 44 S3D Direct Numerical Simulation — Preparation for the 10–100 PF era

NATIONAL RENEWABLE ENERGY LABORATORY 43 / 44

Outcomes

— Reworked code is ‘better’: more flexible, well suited to bothmanycore and accelerated– GPU version required minimal overhead using OpenACC

approach– Potential for reuse in derivatives favors optimization (chemistry

not easiest target despite exps

— We already have ‘Opteron + GPU’ performance exceeding 2Opteron performance– Majority of work is done by GPU: extra cycles on CPU for new

physics (including those that are not well suited to GPU)– We have the ‘hard’ performance– Specifically moved work back to the CPU

Page 44: S3D Direct Numerical Simulation — Preparation for the 10 ... · NATIONAL RENEWABLE ENERGY LABORATORY 1 / 44 S3D Direct Numerical Simulation — Preparation for the 10–100 PF era

NATIONAL RENEWABLE ENERGY LABORATORY 44 / 44

Outcomes

— Significant scope for further optimization– Performance tuning– Algorithmic– Toolchain– Future hardware

— Broadly useful outcomes

— Software is ready to meet the needs of scientific research now and to be aplatform for future research

– We can run as soon as the Titan build-out is complete . . .