Lattice Boltzmann simulations on heterogeneous CPU-GPU clusters


H. Köstler

2nd International Symposium “Computer Simulations on GPU”

Freudenstadt, 29.05.2013


Contents

Motivation

waLBerla software concepts

LBM simulations on Tsubame

Future Work


Computational Science and Engineering @ LSS


Applications • multiphysics • fluid, structure • medical imaging • laser

Applied Math • LBM • multigrid • FEM • numerics

Computer Science • HPC / hardware • performance engineering • software engineering

Sweep registration code shown on the slide:

USE_SweepSection( getLBMsweepUID() )
  USE_Sweep()
    swUseFunction( "LBM", sweep::LBMsweep, FSUIDSet::all(), hsCPU, BSUIDSet::all() );
  USE_After()
    // Communication


Problems

Hardware: modern HPC clusters are massively parallel at the intra-core, intra-node, and inter-node levels

Software: applications become more complex as computational power increases

More complex (physical) models

Code development in interdisciplinary teams

Algorithm: many variants exist; components and parameters depend on the computational domain or grid, the type of problem, …


waLBerla Applications


waLBerla: parallel block-structured grid framework


waLBerla @ GPU


Geometric multigrid solver on Tsubame

Computational Steering (VIPER)

CFD, fluid-structure interaction

[Plot: runtime in ms vs. unknowns in millions, 0 to 3500]


Boltzmann equation

Mesoscopic approach to solving the Navier-Stokes equations

The Boltzmann equation describes the statistical distribution of a single particle in a fluid

f is the probability distribution function (PDF), ζ is the particle velocity, and Ω(f) is the change due to collisions

It models the behavior of fluids in statistical physics

The lattice Boltzmann method (LBM) solves the discrete Boltzmann equation

\[ \partial_t f + \zeta \cdot \nabla f = \Omega(f) \]
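For reference, the standard discrete form with the common BGK single-relaxation-time collision operator (stated here as general background, not as this talk's specific model) reads

\[
f_i(\mathbf{x} + \mathbf{c}_i \Delta t,\, t + \Delta t)
  = f_i(\mathbf{x}, t) - \frac{\Delta t}{\tau}\left(f_i(\mathbf{x}, t) - f_i^{\mathrm{eq}}(\mathbf{x}, t)\right),
\]

where the right-hand side is the local collide step and shifting the result to the neighbor in direction \(\mathbf{c}_i\) is the stream step.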


Particulate Flow Simulation


D3Q19 LBM cell

Collide and Stream

→ simulation done by Ch. Feichtinger

→ rigid body dynamics by K. Iglberger: \( F = m a, \quad M = J \alpha \)
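To make the collide-and-stream update concrete, here is a minimal, self-contained C++ sketch of one BGK time step on a D3Q19 lattice. The two-array layout, the indexing scheme, and the fused loop structure are illustrative assumptions for exposition; this is not the waLBerla kernel.

#include <cstddef>
#include <vector>

// D3Q19 discrete velocities: 1 rest, 6 axis-aligned, 12 planar diagonals.
constexpr int Q = 19;
constexpr int cx[Q] = {0, 1,-1, 0, 0, 0, 0, 1,-1, 1,-1, 1,-1, 1,-1, 0, 0, 0, 0};
constexpr int cy[Q] = {0, 0, 0, 1,-1, 0, 0, 1, 1,-1,-1, 0, 0, 0, 0, 1,-1, 1,-1};
constexpr int cz[Q] = {0, 0, 0, 0, 0, 1,-1, 0, 0, 0, 0, 1, 1,-1,-1, 1, 1,-1,-1};
// Lattice weights: 1/3 (rest), 1/18 (axis), 1/36 (diagonal).
constexpr double w[Q] = {1.0/3,
    1.0/18, 1.0/18, 1.0/18, 1.0/18, 1.0/18, 1.0/18,
    1.0/36, 1.0/36, 1.0/36, 1.0/36, 1.0/36, 1.0/36,
    1.0/36, 1.0/36, 1.0/36, 1.0/36, 1.0/36, 1.0/36};

// Linear index into a structure-of-arrays PDF field f[q][z][y][x];
// src and dst must each hold Q * nx * ny * nz doubles.
inline std::size_t idx(int q, int x, int y, int z, int nx, int ny, int nz) {
    return (((std::size_t)q * nz + z) * ny + y) * nx + x;
}

// One fused collide-and-stream step over the interior (1-cell halo assumed):
// relax each PDF toward equilibrium, then push it to the neighbor cell.
void collideStream(const std::vector<double>& src, std::vector<double>& dst,
                   int nx, int ny, int nz, double omega) {
    for (int z = 1; z < nz - 1; ++z)
      for (int y = 1; y < ny - 1; ++y)
        for (int x = 1; x < nx - 1; ++x) {
            double f[Q], rho = 0, ux = 0, uy = 0, uz = 0;
            for (int q = 0; q < Q; ++q) {            // zeroth and first moments
                f[q] = src[idx(q, x, y, z, nx, ny, nz)];
                rho += f[q];
                ux += cx[q] * f[q];  uy += cy[q] * f[q];  uz += cz[q] * f[q];
            }
            ux /= rho;  uy /= rho;  uz /= rho;
            const double usq = ux*ux + uy*uy + uz*uz;
            for (int q = 0; q < Q; ++q) {
                const double cu  = cx[q]*ux + cy[q]*uy + cz[q]*uz;
                const double feq = w[q] * rho * (1 + 3*cu + 4.5*cu*cu - 1.5*usq);
                dst[idx(q, x + cx[q], y + cy[q], z + cz[q], nx, ny, nz)] =
                    f[q] - omega * (f[q] - feq);     // BGK collide, then stream
            }
        }
}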


waLBerla CPU-GPU cluster software concepts


waLBerla: Block concept


waLBerla: Sweep concept


waLBerla: Communication concept


Overlapping of work and communication
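The idea is to hide the halo exchange behind the update of inner cells, which need no ghost data. A hedged C++/MPI sketch of this pattern, assuming a 1D decomposition with a ghost layer of doubles on each side; the kernels are passed in as callables, and none of the names are part of the waLBerla API:

#include <mpi.h>
#include <vector>

// Layout: [left ghost | nLocal interior cells | right ghost], ghost width = halo.
// Boundary ranks can pass MPI_PROC_NULL for a missing neighbor.
template <class InnerKernel, class OuterKernel>
void overlappedStep(std::vector<double>& field, int nLocal, int halo,
                    int left, int right, MPI_Comm comm,
                    InnerKernel computeInner, OuterKernel computeOuter) {
    MPI_Request req[4];
    // Post receives for both ghost layers first.
    MPI_Irecv(field.data(), halo, MPI_DOUBLE, left, 0, comm, &req[0]);
    MPI_Irecv(field.data() + halo + nLocal, halo, MPI_DOUBLE, right, 1, comm, &req[1]);
    // Send own boundary layers to the neighbors.
    MPI_Isend(field.data() + halo, halo, MPI_DOUBLE, left, 1, comm, &req[2]);
    MPI_Isend(field.data() + nLocal, halo, MPI_DOUBLE, right, 0, comm, &req[3]);
    computeInner(field);                       // overlaps with the messages in flight
    MPI_Waitall(4, req, MPI_STATUSES_IGNORE);  // ghost layers are now current
    computeOuter(field);                       // boundary cells that read ghost data
}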


waLBerla: Subblocks

Assumption: A block corresponds to a (shared-memory) compute node

The node can be heterogeneous (CPU + GPU)

Distributed memory communication (via MPI) is not required within one block

Divide one block into subblocks of different sizes for (static) load balancing

Subblocks map to (local) devices (see the sketch below)
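A hypothetical sketch of how such a static split might be computed, cutting a block's z-extent into slabs proportional to each device's measured throughput. The struct names, the 1D cut, and the throughput metric are illustrative assumptions, not waLBerla's API:

#include <cstddef>
#include <vector>

struct Device   { bool isGPU; double throughput; };  // e.g. MLUPS from a micro-benchmark
struct Subblock { int zBegin, zEnd, device; };       // slab [zBegin, zEnd) of the block

// Split nz layers among the node's devices in proportion to their throughput,
// so a GPU that is 4x faster than a CPU socket receives a 4x thicker slab.
std::vector<Subblock> partitionBlock(int nz, const std::vector<Device>& devices) {
    double total = 0.0;
    for (const Device& d : devices) total += d.throughput;
    std::vector<Subblock> parts;
    int z = 0;
    for (std::size_t i = 0; i < devices.size(); ++i) {
        const bool last = (i + 1 == devices.size());
        // The last device absorbs rounding so the slabs cover the block exactly.
        const int thickness = last ? nz - z
                                   : (int)(nz * devices[i].throughput / total);
        parts.push_back({z, z + thickness, (int)i});
        z += thickness;
    }
    return parts;
}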


Domain decomposition on one compute node


Results: LBM simulations on Tsubame 2.0



Tsubame 2.0 in Japan

Compute nodes: 1442

Processor: Intel Xeon X5670

GPUs: 3 × Nvidia Tesla M2050 per node

LINPACK performance: 1.2 Petaflops

Power consumption: 1.4 MW

Interconnect: QDR Infiniband


Performance Model I

Input: the LBM kernel algorithm, a generic implementation, and hardware information (bandwidth, peak performance)

Assumptions: computation time is limited by memory bandwidth and instruction throughput; communication time is limited by network bandwidth and latency (for direct and collective communication)

\[
t_{\mathrm{total}} = t_{\mathrm{comp,outer}} + \max\!\left( t_{\mathrm{comp,inner}},\; t_{\mathrm{buffer}} + t_{\mathrm{comm,CPU\text{-}GPU}} + t_{\mathrm{comm,MPI}} \right)
\]
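Transcribed directly into code, a helper of this kind could estimate the time per step from measured or estimated components; the function and parameter names are illustrative, not from the talk:

#include <algorithm>

// Time per step: inner computation overlaps buffering and communication;
// outer (boundary) computation runs afterwards. All times in seconds.
double estimateStepTime(double tCompOuter, double tCompInner,
                        double tBuffer, double tCommCpuGpu, double tCommMpi) {
    return tCompOuter + std::max(tCompInner, tBuffer + tCommCpuGpu + tCommMpi);
}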


Performance Model II: single-node performance on Tsubame

Machine balance: \( B_m = \dfrac{\text{sustainable bandwidth}}{\text{peak performance}} \)

Code balance: \( B_c = \dfrac{\text{no. of bytes loaded and stored}}{\text{no. of FLOPs executed}} = \dfrac{304}{200} \) (for the D3Q19 LBM kernel in double precision)

Lightspeed estimate: \( l = \min\!\left(1, \dfrac{B_m}{B_c}\right) \)
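As a worked example (the device numbers are illustrative assumptions, not figures from the talk): for a double-precision D3Q19 kernel on a Tesla M2050 with roughly 100 GB/s sustainable bandwidth and 515 GFLOP/s double-precision peak,

\[
B_c = \frac{304\,\mathrm{B}}{200\,\mathrm{FLOPs}} \approx 1.5\,\mathrm{B/FLOP},
\qquad
B_m \approx \frac{100\,\mathrm{GB/s}}{515\,\mathrm{GFLOP/s}} \approx 0.19\,\mathrm{B/FLOP},
\qquad
l = \min\!\left(1, \frac{B_m}{B_c}\right) \approx 0.13,
\]

i.e. the kernel is bandwidth-bound and the lightspeed throughput is about \( 100\,\mathrm{GB/s} / 304\,\mathrm{B} \approx 330 \) million lattice-cell updates per second.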


Single Compute Node Performance I


Single Compute Node Performance II


Single Compute Node Performance III


Single Compute Node Performance IV


Weak scaling, 3 GPUs per node


Strong scaling, 3 GPUs per node


Test case: Packed bed of hollow cylinders


Porous media: 100x100x1536, 1D domain decomposition


Porous media: 100x100x1536, 1D/2D/3D domain decomposition


Porous media: 256x256x3600, 1D/2D domain decomposition


Future Work

Tests on an Nvidia Kepler cluster

Main focus in waLBerla is currently on JUQUEEN and SuperMUC

Programming paradigms on future HPC clusters?

Code generation techniques to improve portability

Dynamic load balancing
