Computer Science X - System Simulation Group Harald Köstler (harald.koestler@fau.de)
Lattice Boltzmann simulations on heterogeneous CPU-GPU clusters
H. Köstler
2nd International Symposium “Computer Simulations on GPU”
Freudenstadt, 29.05.2013
Contents
Motivation
waLBerla software concepts
LBM simulations on Tsubame
Future Work
Computational Science and Engineering @ LSS
Applications: multiphysics (fluid, structure), medical imaging, laser
Applied Math: LBM, multigrid, FEM, numerics
Computer Science: HPC / hardware, performance engineering, software engineering
Example of a waLBerla sweep registration:
    USE_SweepSection( getLBMsweepUID() )
    USE_Sweep()
    swUseFunction("LBM", sweep::LBMsweep, FSUIDSet::all(), hsCPU, BSUIDSet::all());
    USE_After()
    // Communication
Problems
Hardware: modern HPC clusters are massively parallel (intra-core, intra-node, and inter-node)
Software: applications grow more complex along with the available computational power, with more complex (physical) models and code development in interdisciplinary teams
Algorithm: many variants exist; components and parameters depend on the computational domain or grid, the type of problem, ...
waLBerla Applications
waLBerla: parallel block-structured grid framework
waLBerla @ GPU
Geometric multigrid solver on Tsubame
Computational Steering (VIPER)
CFD, fluid-structure interaction
[Plot: runtime in ms vs. number of unknowns in millions]
Boltzmann equation
Mesoscopic approach to solving the Navier-Stokes equations
The Boltzmann equation describes the statistical distribution of one particle in a fluid:
$\partial_t f + \zeta \cdot \nabla f = \Omega(f)$
where $f$ is the probability distribution function (PDF), $\zeta$ is the particle velocity, and $\Omega(f)$ is the change due to collisions
Models the behavior of fluids in statistical physics
The Lattice Boltzmann Method (LBM) solves the discrete Boltzmann equation
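For reference, a common discrete form that LBM implementations solve is the lattice Boltzmann equation with the BGK single-relaxation-time collision operator; the BGK choice is an assumption here, since the talk does not name the collision model:

    % Lattice Boltzmann equation with BGK collision (assumed model):
    % f_i are the PDFs along the discrete velocities c_i, tau is the
    % relaxation time, f_i^eq is the local equilibrium distribution.
    f_i(\mathbf{x} + \mathbf{c}_i \Delta t,\ t + \Delta t) - f_i(\mathbf{x}, t)
      = -\frac{\Delta t}{\tau}\left( f_i(\mathbf{x}, t) - f_i^{\mathrm{eq}}(\mathbf{x}, t) \right)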
Particulate Flow Simulation
D3Q19 LBM cell
Collide and stream
→ simulation by Ch. Feichtinger
→ rigid-body dynamics by K. Iglberger: $F = m \cdot a$, $M = J \cdot \alpha$
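A minimal single-core sketch of what one fused collide-and-stream step does for D3Q19. BGK collision, the two-grid (src/dst) layout, and the omitted boundary handling are illustrative assumptions, not waLBerla's actual implementation:

    // Fused D3Q19 collide-and-stream step (BGK), single-threaded sketch.
    #include <vector>

    constexpr int Q = 19;
    // D3Q19 discrete velocities c_i and lattice weights w_i.
    constexpr int c[Q][3] = {
      { 0, 0, 0},
      { 1, 0, 0}, {-1, 0, 0}, { 0, 1, 0}, { 0,-1, 0}, { 0, 0, 1}, { 0, 0,-1},
      { 1, 1, 0}, {-1,-1, 0}, { 1,-1, 0}, {-1, 1, 0},
      { 1, 0, 1}, {-1, 0,-1}, { 1, 0,-1}, {-1, 0, 1},
      { 0, 1, 1}, { 0,-1,-1}, { 0, 1,-1}, { 0,-1, 1}
    };
    constexpr double w[Q] = {
      1.0/3,
      1.0/18, 1.0/18, 1.0/18, 1.0/18, 1.0/18, 1.0/18,
      1.0/36, 1.0/36, 1.0/36, 1.0/36, 1.0/36, 1.0/36,
      1.0/36, 1.0/36, 1.0/36, 1.0/36, 1.0/36, 1.0/36
    };

    void collideAndStream(const std::vector<double>& src, std::vector<double>& dst,
                          int nx, int ny, int nz, double omega) {
      auto idx = [=](int x, int y, int z, int i) {
        return ((z * ny + y) * nx + x) * Q + i;  // array-of-structures layout
      };
      for (int z = 1; z < nz - 1; ++z)
        for (int y = 1; y < ny - 1; ++y)
          for (int x = 1; x < nx - 1; ++x) {
            // Collide: macroscopic density and velocity from the 19 PDFs.
            double rho = 0.0, ux = 0.0, uy = 0.0, uz = 0.0;
            for (int i = 0; i < Q; ++i) {
              const double fi = src[idx(x, y, z, i)];
              rho += fi;
              ux += fi * c[i][0]; uy += fi * c[i][1]; uz += fi * c[i][2];
            }
            ux /= rho; uy /= rho; uz /= rho;
            const double u2 = ux*ux + uy*uy + uz*uz;
            for (int i = 0; i < Q; ++i) {
              // BGK relaxation toward the local equilibrium distribution.
              const double cu  = c[i][0]*ux + c[i][1]*uy + c[i][2]*uz;
              const double feq = w[i] * rho * (1.0 + 3.0*cu + 4.5*cu*cu - 1.5*u2);
              const double fpost = src[idx(x, y, z, i)] * (1.0 - omega) + omega * feq;
              // Stream: push the post-collision value to the neighbor cell.
              dst[idx(x + c[i][0], y + c[i][1], z + c[i][2], i)] = fpost;
            }
          }
    }

Fusing the two steps means each PDF is loaded and stored only once per time step, which is why the kernel is bandwidth-bound (compare the code balance of 304 bytes per 200 FLOPs later in the talk).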
waLBerla CPU-GPU cluster software concepts
waLBerla: Block concept
waLBerla: Sweep concept
waLBerla: Communication concept
Overlapping of work and communication
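A hedged sketch of the overlap idea: the bulk (inner) cells are updated on one CUDA stream while the boundary (outer) data is staged and exchanged. The kernel names, buffers, two-stream split, and host-staged MPI are illustrative assumptions, not waLBerla's code:

    // One overlapped LBM time step on a GPU subblock (hypothetical helpers).
    #include <mpi.h>
    #include <cuda_runtime.h>

    extern void launchOuterKernel(cudaStream_t s);     // update boundary-layer cells
    extern void launchInnerKernel(cudaStream_t s);     // update bulk cells
    extern double *d_sendBuf, *h_sendBuf, *h_recvBuf;  // halo buffers (assumed)
    extern int bufBytes, neighborRank;

    void timeStepOverlapped(cudaStream_t commStream, cudaStream_t compStream) {
      // 1. Update the outer cells first; their post-collision PDFs must be sent.
      launchOuterKernel(commStream);
      // 2. Concurrently start the bulk update on a second stream.
      launchInnerKernel(compStream);
      // 3. Stage the halo to the host and exchange it with the neighbor rank
      //    (copying h_recvBuf back to the device is omitted for brevity).
      cudaMemcpyAsync(h_sendBuf, d_sendBuf, bufBytes,
                      cudaMemcpyDeviceToHost, commStream);
      cudaStreamSynchronize(commStream);
      MPI_Sendrecv(h_sendBuf, bufBytes, MPI_BYTE, neighborRank, 0,
                   h_recvBuf, bufBytes, MPI_BYTE, neighborRank, 0,
                   MPI_COMM_WORLD, MPI_STATUS_IGNORE);
      // 4. Wait for the bulk update; communication time was hidden behind it.
      cudaStreamSynchronize(compStream);
    }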
waLBerla: Subblocks
Assumption: a block corresponds to a (shared-memory) compute node
A node can be heterogeneous (CPU + GPU)
Distributed-memory communication (via MPI) is not required within one block
Divide one block into subblocks of different sizes for (static) load balancing
Subblocks map to (local) devices (see the sketch below)
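A minimal sketch of the static split, assuming a 1D slab decomposition along z and per-device throughputs measured beforehand; the struct, the function, and the MLUP/s figures in the comment are hypothetical, not waLBerla's API:

    // Split a block's cells along z in proportion to each device's throughput.
    #include <vector>
    #include <cmath>

    struct Subblock { int zBegin, zEnd; };  // slab of cells owned by one device

    std::vector<Subblock> splitBlock(int nz, const std::vector<double>& throughput) {
      double total = 0.0;
      for (double t : throughput) total += t;
      std::vector<Subblock> subblocks;
      int z = 0;
      for (size_t d = 0; d < throughput.size(); ++d) {
        // The last device takes the remainder so the slabs exactly cover the block.
        int size = (d + 1 == throughput.size())
                     ? nz - z
                     : (int)std::round(nz * throughput[d] / total);
        subblocks.push_back({z, z + size});
        z += size;
      }
      return subblocks;
    }

    // Example: a node with 1 CPU (~70 MLUP/s, assumed) and 3 GPUs (~300 MLUP/s
    // each, assumed) splits nz = 512 into slabs of 37 / 158 / 158 / 159 cells.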
Domain decomposition on one compute node
Results: LBM simulations on Tsubame 2.0
Tsubame 2.0 in Japan
Compute nodes: 1442
Processor: Intel Xeon X5670
GPU: 3 x Nvidia Tesla M2050
LINPACK performance: 1.2 Petaflops
Power consumption: 1.4 MW
Interconnect: QDR Infiniband
Performance Model I
Input:
Algorithm: LBM kernel, generic implementation
Hardware information (bandwidth, peak performance)
Assumptions:
Computation time is limited by memory bandwidth and instruction throughput
Communication time is limited by network bandwidth and latency (for direct and collective communication)
$t_\mathrm{total} = t_\mathrm{comp,outer} + \max\left(t_\mathrm{comp,inner},\ t_\mathrm{buffer} + t_\mathrm{comm,CPU\text{-}GPU} + t_\mathrm{comm,MPI}\right)$
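A small sketch of how the model's terms combine. That the inner-cell computation overlaps with the entire communication chain follows the formula above (itself a reconstruction from the slide); the struct and field names are illustrative:

    // Evaluate the per-time-step performance model; the five inputs would come
    // from the kernel and bandwidth estimates of Performance Model II.
    #include <algorithm>

    struct StepTimes {
      double compOuter, compInner;  // kernel time: boundary / bulk cells [s]
      double buffer;                // halo (un)packing and staging [s]
      double commCpuGpu, commMpi;   // PCIe transfer and network exchange [s]
    };

    // t_total = t_comp,outer + max(t_comp,inner, t_buffer + t_comm,CPU-GPU + t_comm,MPI)
    double totalTime(const StepTimes& t) {
      return t.compOuter +
             std::max(t.compInner, t.buffer + t.commCpuGpu + t.commMpi);
    }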
Performance Model II
Single node performance on Tsubame
Machine balance: $B_m = \dfrac{\text{sustainable bandwidth}}{\text{peak performance}}$
Code balance: $B_c = \dfrac{\text{no. bytes loaded and stored}}{\text{no. FLOPS executed}} = \dfrac{304}{200}$ (per D3Q19 cell update, double precision)
Lightspeed estimate: $l = \min\left(1,\ \dfrac{B_m}{B_c}\right)$
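A worked instance of the lightspeed estimate with assumed single-GPU numbers (about 90 GB/s sustainable bandwidth and 515 GFLOP/s double-precision peak for one Tesla M2050; both figures are assumptions, not from the talk):

    % Lightspeed estimate with assumed Tesla M2050 numbers:
    B_m = \frac{90\ \mathrm{GB/s}}{515\ \mathrm{GFLOP/s}} \approx 0.175\ \mathrm{B/FLOP},
    \qquad
    B_c = \frac{304\ \mathrm{B}}{200\ \mathrm{FLOP}} = 1.52\ \mathrm{B/FLOP}
    % The kernel is strongly memory bound:
    l = \min\!\left(1,\ \frac{B_m}{B_c}\right) \approx 0.115
    % Attainable rate: 0.115 x 515 GFLOP/s ~ 59 GFLOP/s, equivalently
    % 90 GB/s / 304 B per cell ~ 296 million lattice-cell updates per second.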
Single Compute Node Performance I
Single Compute Node Performance II
Single Compute Node Performance III
Single Compute Node Performance IV
Weak scaling, 3 GPUs per node
Strong scaling, 3 GPUs per node
Test case: Packed bed of hollow cylinders
Porous media: 100x100x1536, 1D domain decomposition
Porous media: 100x100x1536, 1D domain decomposition
Porous media: 100x100x1536, 1D/2D/3D
Porous media: 256x256x3600, 1D/2D
Future Work
Tests on Nvidia Kepler cluster
The main focus in waLBerla is currently on Juqueen and SuperMUC
Programming paradigms on future HPC clusters?
Code generation techniques to improve portability
Dynamic load balancing