HPC @ T2i
What an integration partner can do for you: an example with EPFL-LMH
Lionel Clavien & Siamak Alimirzazadeh / 2017-04-12
Agenda
Groupe T2i
• About
• Skills & Services
EPFL-LMH’s GPU-SPHEROS
• Numerical Method
• Algorithms
• Results
Quick Facts
Software Editor & IT Services Provider
• > 30 years on the market
• ~ 200 Employees
• > 250’000 Users
Markets & Territories
• Public, Real Estate, HR, Financial Services, Academia, Retail, Insurance, Logistics, …
• From SMEs to international accounts
• Switzerland, France & Canada
Solutions
• EDM, CMS, Workflows, HR, IT Modernization, …
Infrastructure & Cloud Expertise
• Reselling, Hosting, Managed Services, SaaS, Services VAR
Skills & Interests… in the HPC World
Strong relationships with vendors & deep technical knowledge of products
• IBM & Lenovo Business Partner
• NVIDIA Preferred Partner
• OpenPOWER Foundation Member
HPC reach
• Academic / Research
• Commercial
Heterogeneous Computing
• Acceleration through GPGPUs and FPGAs (*CAPI)
• High-speed / High-throughput Data Acquisition
Cognitive Computing
• AI / Machine Learning / Deep Learning
• IBM Watson
• Predictive analysis
Provided Services – Pre-sales
Exploration
• Presentations
• Workshops
• Brainstorming sessions
Testing
• Own test machines
• Access to vendors’ labs
Design
• Networking
• Storage
• Rack layout
Pricing
• Optimizations
• Support from vendors
Provided Services – Implementation
Monitoring
• Infrastructure: Nagios, Icinga, …
• Performance: Ganglia, …
Cluster and storage configuration
• Schedulers, including parameterization: Platform LSF, SGE, …
• Deep expertise with IBM Spectrum Scale (aka GPFS)
Software deployment
• Operating system and base toolchain, libraries
• Through xCAT or similar tools
Physical installation
• Expertise with HPC solutions cabling
The OpenPOWER Foundation
Chip / SOC
Boards / Systems
I/O / Storage / Acceleration
System / Integration
Software
Implementation / HPC / Research
OPF: an example (Minsky)
NVIDIA: GPU Accelerator
Ubuntu by Canonical: Launch OS, supporting NVLink and Page Migration Engine
Wistron: Platform co-design
Mellanox: InfiniBand / Ethernet connectivity in and out of server
HGST: NVMe adapters
Broadcom: PCIe adapters
QLogic: Fiber Channel adapters
Samsung: SSDs
Hynix, Samsung, Micron: Memory
IBM: CPU
LMH Laboratory for Hydraulic Machines
Introduction
Free surface and large boundary deformation problems
Finite Volume Particle Method (FVPM)
SPHEROS: an in-house MPI-based parallel solver
GPU-SPHEROS: a GPU-accelerated version of SPHEROS
Speedups
Simulation of Pelton turbines using particle-based methods
Free surface and splashing (water jet)
Large deformation of boundaries (rotating bucket and free surface)
Finite Volume Particle Method
Christian Vessaz, EPFL PhD thesis n° 6470 (2015)
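Finite Volume Particle Method (FVPM): Governing equations
Conservative and consistent
Robust in handling free surface and moving boundaries
High computational cost
Mass and momentum conservation:
$$\frac{d\rho}{dt} = -\rho\,\nabla\cdot\mathbf{C}$$
$$\rho\,\frac{d\mathbf{C}}{dt} = \nabla\cdot(\mathbf{s} - p\,\mathbf{I}) + \rho\,\mathbf{g}$$
The conservation law can be written as:
$$\frac{\partial \mathbf{U}}{\partial t} + \nabla\cdot\mathbf{F}(\mathbf{U}) = \begin{pmatrix} 0 \\ \rho\,\mathbf{g} \end{pmatrix}$$
where:
$$\mathbf{F} = \begin{pmatrix} \rho\,\mathbf{C} \\ \rho\,\mathbf{C}\otimes\mathbf{C} - \mathbf{s} + p\,\mathbf{I} \end{pmatrix} \quad\text{and}\quad \mathbf{U} = \begin{pmatrix} \rho \\ \rho\,\mathbf{C} \end{pmatrix}$$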
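As a consistency check, the conservative law can be expanded component by component and compared with the mass and momentum equations. The sketch below assumes that gravity acts on the momentum component only and uses the material derivative $d/dt = \partial/\partial t + \mathbf{C}\cdot\nabla$; the momentum line also uses the mass line to cancel the terms multiplying $\mathbf{C}$.

```latex
\begin{aligned}
\text{mass:}\quad
  & \frac{\partial \rho}{\partial t} + \nabla\cdot(\rho\,\mathbf{C}) = 0
  \;\Longleftrightarrow\;
  \frac{d\rho}{dt} = -\rho\,\nabla\cdot\mathbf{C}, \\
\text{momentum:}\quad
  & \frac{\partial (\rho\,\mathbf{C})}{\partial t}
    + \nabla\cdot\left(\rho\,\mathbf{C}\otimes\mathbf{C} - \mathbf{s} + p\,\mathbf{I}\right)
    = \rho\,\mathbf{g}
  \;\Longleftrightarrow\;
  \rho\,\frac{d\mathbf{C}}{dt} = \nabla\cdot(\mathbf{s} - p\,\mathbf{I}) + \rho\,\mathbf{g}.
\end{aligned}
```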
SPHEROS
SPHEROS is an in-house parallel FVPM solver (using MPI)
Able to simulate interaction between fluid, solid and silt
Mainly developed for free surface and erosion modeling in hydraulic turbines
1. École Polytechnique Fédérale de Lausanne (EPFL) thesis n° 6470 (2015)
2. Sebastian Leguizamon, PhD candidate at EPFL-LMH since 2015
[2] [1]
GPU-SPHEROS overall algorithm
for each time step t
    for each particle i
        find the neighbor particles j
    end for
    for each particle i
        for each neighbor j
            compute the interaction vectors
        end for
    end for
    for each particle i
        for each neighbor j
            if i is silt
                compute contact forces $f_{ij}^{c}$ from Hertz theory for spherical particles
                and hydrodynamic force $f_{ij}^{h}$:
                $f_i = \sum_{j \in \text{silt, solid}}^{n} f_{ij}^{c} + \sum_{j \in \text{fluid}}^{n} f_{ij}^{h}$
            else
                compute momentum flux from pressure P and deviatoric stress G,
                where G is the Newtonian viscous stress in the fluid and the hypo-elastic
                stress in the solid:
                $f_i = \sum_j \left[ (\rho\,\mathbf{C}\,\dot{\mathbf{x}} - \rho\,\mathbf{C}\,\mathbf{C})_{ij} - \mathbf{P}_{ij} + \mathbf{G}_{ij} \right] \cdot \boldsymbol{\Delta}_{ij} - p_b\,\mathbf{B}_i$
                compute mass flux including the smoothing mass term:
                $m_i = \sum_j \left[ (\rho\,\dot{\mathbf{x}} - \rho\,\mathbf{C})_{ij} + \mathbf{G}_{ij} \right] \cdot \boldsymbol{\Delta}_{ij}$
                compute volume flux:
                $\dot{V}_i = \sum_j \dot{\mathbf{x}}_{ij} \cdot \boldsymbol{\Delta}_{ij} + \dot{\mathbf{x}}_i \cdot \mathbf{B}_i$
            end if
        end for
    end for
    for each particle i (using a 2nd-order Runge-Kutta predictor-corrector scheme)
        update volume, mass and momentum
        update density and compute pressure from equation of state
        compute velocity correction and update particle velocity
        update particle position
    end for
    $t \leftarrow t + \Delta t$
end for
Computing interaction vectors (67.5%)
Computing forces and fluxes + time integration (5%)
Octree-based neighbor search (27.5%)
GPU-SPHEROS
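The time-integration stage of the algorithm can be illustrated with a generic second-order Runge-Kutta predictor-corrector. The slides do not say which second-order variant GPU-SPHEROS uses, so this minimal sketch assumes Heun's method (explicit trapezoidal rule). The names `rk2_step`, `integrate` and `rhs` are hypothetical, and a scalar decay equation stands in for the per-particle volume, mass and momentum rates.

```python
import math

def rk2_step(rhs, t, y, dt):
    """One Heun (RK2 predictor-corrector) step for dy/dt = rhs(t, y)."""
    k1 = rhs(t, y)                    # slope at the start of the step
    y_pred = y + dt * k1              # predictor: explicit Euler estimate
    k2 = rhs(t + dt, y_pred)          # slope at the predicted end state
    return y + 0.5 * dt * (k1 + k2)   # corrector: average of both slopes

def integrate(rhs, y0, t_end, n_steps):
    """Advance y from t = 0 to t_end with n_steps equal RK2 steps."""
    dt = t_end / n_steps
    t, y = 0.0, y0
    for _ in range(n_steps):
        y = rk2_step(rhs, t, y, dt)
        t += dt
    return y

# Stand-in problem (assumption): dy/dt = -y with y(0) = 1, exact solution exp(-t).
y_num = integrate(lambda t, y: -y, 1.0, 1.0, 100)
error = abs(y_num - math.exp(-1.0))
```

Halving the time step should cut the error by roughly a factor of four, which is a quick way to check that an implementation really is second order.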
Octree-based fixed-radius neighbor search based on space-filling curves (SFC)
The fixed-radius particle neighbor search is based on space-filling curves (Morton curve)
The particles are sorted with a parallel radix sort (Thrust parallel algorithms) so that memory accesses are coalesced
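The idea of ordering particles along a Morton curve can be sketched as follows. This is a CPU illustration under stated assumptions, not the GPU-SPHEROS code: `morton3d` and `sort_by_morton` are hypothetical names, the grid uses 10 bits per axis with a uniform `cell_size`, coordinates are assumed non-negative, and Python's `sorted` stands in for the parallel radix sort.

```python
def morton3d(ix, iy, iz, bits=10):
    """Interleave the low `bits` bits of three cell indices into one Morton key."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (3 * b)       # x bit goes to position 3b
        key |= ((iy >> b) & 1) << (3 * b + 1)   # y bit goes to position 3b + 1
        key |= ((iz >> b) & 1) << (3 * b + 2)   # z bit goes to position 3b + 2
    return key

def sort_by_morton(positions, cell_size):
    """Return particle indices ordered along the Morton curve."""
    keys = []
    for x, y, z in positions:
        # Map each coordinate to an integer cell index (non-negative input assumed).
        cell = (int(x / cell_size), int(y / cell_size), int(z / cell_size))
        keys.append(morton3d(*cell))
    # sorted() stands in for the parallel radix sort used on the GPU.
    return sorted(range(len(positions)), key=lambda i: keys[i])

# Four illustrative particles (made-up coordinates).
positions = [(3.5, 0.2, 0.1), (0.1, 0.1, 0.1), (1.2, 1.7, 0.3), (0.4, 1.1, 0.0)]
order = sort_by_morton(positions, cell_size=1.0)
```

Particles that are close in space tend to receive nearby keys, so after reordering the particle arrays by `order`, neighboring threads mostly read neighboring memory locations.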
[2] [1]
1. E. Jahanbakhsh, PhD thesis, EPFL, n° 6284 (2014)
2. E. Jahanbakhsh et al., Exact finite volume particle method with spherical-support kernels, Computer Methods in Applied Mechanics and Engineering (ISSN: 0045-7825), vol. 317, p. 102-127, Elsevier, 2017
Computing forces and fluxes (Re = 50k)
WLS (compute-bound) and VIG (memory-bound)
Intel Xeon E5649 vs. NVIDIA Tesla K40
[Figure: Global Speedup [-] (for ~132k particles)]
Reference: 2 x Intel® Xeon® CPU E5-2660 v2
Accelerated: Tesla P100 with NVLink + 2 x POWER8 (10 cores, 2.86 GHz)
Bar labels: Jan. 2016, Oct. 2016, Feb. 2017, Mar. 2017
Speedup values: 1.13x, 1.32x, 2.5x, 7.7x
Annotations: Optimization of neighbor search; Optimization of interaction vectors; Interaction vectors based on spherical-support kernel
Pie charts (Octree neighbor search / Interaction vectors / Forces and fluxes): 27.5% / 67.5% / 5% and 13% / 85% / 2%
Speedup
Summary
Finite Volume Particle Method (FVPM) is very compute-intensive
GPU-SPHEROS: a GPU-accelerated version of SPHEROS
GPU-SPHEROS is faster than 8 nodes with 2 Intel® Xeon® CPU E5-2660 v2
2 Tesla P100 GPUs with NVLink between GPUs and CPU (T2i)
A further significant speedup is expected once the interaction vectors computation is optimized further
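How much an optimized interaction-vector stage could contribute overall can be estimated with Amdahl's law. The sketch below is only an illustration: it assumes the 85% interaction-vector share from one of the two breakdowns in the deck applies to the current code, and the stage speedups passed in are hypothetical values, not measurements. `overall_speedup` is a name introduced here.

```python
def overall_speedup(fraction, stage_speedup):
    """Amdahl's law: overall gain when `fraction` of the runtime gets `stage_speedup` times faster."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    if stage_speedup <= 0.0:
        raise ValueError("stage_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / stage_speedup)

# Assumed input: interaction vectors take 85% of the runtime.
# The stage speedups below are hypothetical.
for s in (2.0, 5.0, 10.0):
    print(f"stage speedup {s:4.1f}x -> overall {overall_speedup(0.85, s):.2f}x")

# Limit for an infinitely fast interaction-vector stage under the same assumption.
upper_bound = 1.0 / (1.0 - 0.85)
```

Under these assumptions the remaining 15% of the runtime caps the additional gain at about 6.7x, whatever the interaction-vector kernel achieves on its own.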
Realistic simulations
Single-jet with three buckets takes more than 8 hours on one node
Computing forces and fluxes
Lid-driven cavity benchmark
Computing forces and fluxes
Validation for lid-driven cavity benchmark (Re = 400)
Kernel Optimization
Optimization level | Technique | Time [ms] | Improvement [-]
0 | Atomic operations / Thrust sequential reduction | ≈100 | > 4x slower
1 | for loop | 23.69 | 1x
2 | Unrolling loops | 5.27 | 4.49x
3 | Using Structure of Arrays | 3.23 | 7.33x
4 | __restrict__ pointers | 2.89 | 8.19x
5 | __launch_bounds__ | 2.51 | 9.43x
6 | Optimized number of threads per block | 2.36 | 10.03x
Memory-bound kernels
Compute-bound kernels
Example: Optimization procedure of a volume integral gradient kernel
$$\nabla p_i = \frac{1}{V_i} \sum_j \frac{p_i + p_j}{2}\,\boldsymbol{\Delta}_{ij}$$
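The volume integral gradient can be sketched on the CPU as follows. This is a minimal pure-Python illustration, not the GPU-SPHEROS kernel. The CSR-style neighbor list (`nbr_start`, `nbr_index`) and the separate component arrays (`dx`, `dy`, `dz`), laid out as a structure of arrays, are assumptions made for the example, as are all the numbers in the test data.

```python
def pressure_gradient(p, vol, nbr_start, nbr_index, dx, dy, dz):
    """Return (gx, gy, gz) with grad p_i = (1/V_i) * sum_j (p_i + p_j)/2 * Delta_ij.

    nbr_start[i]:nbr_start[i + 1] is the slice of nbr_index (and of dx, dy, dz)
    holding the neighbors j of particle i and the components of Delta_ij.
    """
    n = len(p)
    gx, gy, gz = [0.0] * n, [0.0] * n, [0.0] * n
    for i in range(n):          # would map to one thread per particle on a GPU
        sx = sy = sz = 0.0      # private accumulators, so no atomic operations
        for k in range(nbr_start[i], nbr_start[i + 1]):
            p_face = 0.5 * (p[i] + p[nbr_index[k]])   # average pressure of the pair i, j
            sx += p_face * dx[k]
            sy += p_face * dy[k]
            sz += p_face * dz[k]
        gx[i], gy[i], gz[i] = sx / vol[i], sy / vol[i], sz / vol[i]
    return gx, gy, gz

# Three particles on a line; particle 1 has neighbors 0 and 2 (made-up data).
p = [1.0, 1.0, 3.0]
vol = [2.0, 2.0, 2.0]
nbr_start = [0, 1, 3, 4]
nbr_index = [1, 0, 2, 1]
dx = [1.0, -1.0, 1.0, -1.0]   # Delta_ij = -Delta_ji
dy = [0.0, 0.0, 0.0, 0.0]
dz = [0.0, 0.0, 0.0, 0.0]
gx, gy, gz = pressure_gradient(p, vol, nbr_start, nbr_index, dx, dy, dz)
```

Accumulating into per-particle local sums, rather than scattering contributions with atomic operations, mirrors the step from optimization level 0 to level 1 in the table, and the separate `dx`, `dy`, `dz` arrays mirror the structure-of-arrays layout of level 3.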