::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: :::::
A dynamic load-balancing strategy for large scale CFD-applications
Philipp Offenhäuser
10.10.2017
1/20 :: A dynamic load-balancing strategy for large scale CFD-applications :: 10.10.2017 ::
::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: :::::
Outline
Motivation and Background
Load-Balancing strategy
Results
Further Challenges
Conclusion
::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: :::::
Motivation - Top500 from 1993 to 2017
[Figure: Performance in GFlop/s (Rmax Rank 1, Rmax Rank 10) and #Cores (#Cores Rank 1, #Cores Rank 10) versus year, 1994 to 2016, both on logarithmic axes from 10^0 to 10^9. Source: top500.org]
::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: :::::
Motivation
- HPC systems draw their performance from hundreds of thousands of cores
- Massively parallel algorithms are implemented
- Computational effort must be distributed evenly across all cores
- Initial domain decomposition techniques are well known
- Well-balanced applications for scientific use cases are developed
::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: :::::
Definitions
Element-weight: Specific computational effort of an element.
Load of an MPI-domain: Sum of the element-weights of an MPI-domain.
Load-imbalance:
$\text{Load-imbalance}_L = \dfrac{\text{maximum load}}{\text{average load}}$
$\text{Load-imbalance}_T = \dfrac{\text{maximum computation time}}{\text{average computation time}}$
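As a minimal sketch of the two ratios defined above, the following Python snippet computes a load-imbalance from per-domain values. The function name `load_imbalance` and all sample numbers are illustrative assumptions, not values from the talk.

```python
def load_imbalance(values):
    """Return max(values) / mean(values); 1.0 means perfectly balanced."""
    values = list(values)
    if not values:
        raise ValueError("need at least one MPI-domain")
    average = sum(values) / len(values)
    if average == 0:
        raise ValueError("average is zero, the ratio is undefined")
    return max(values) / average


# Hypothetical element-weights, grouped by MPI-domain.
element_weights = [[1.0, 1.0, 1.0], [1.0, 2.0, 2.0], [1.0, 1.0]]

# Load of an MPI-domain = sum of its element-weights.
loads = [sum(domain) for domain in element_weights]
imbalance_l = load_imbalance(loads)

# Hypothetical measured computation times per MPI-domain, in seconds.
times = [0.9, 1.5, 0.6]
imbalance_t = load_imbalance(times)
```

Both ratios are at least 1.0, and the excess over 1.0 is the fraction of the average that the slowest domain makes every other domain wait for.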
::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: :::::
Example 1 - Finite-Volume-Code FS3D
References:
Institute of Aerospace Thermodynamics - University of Stuttgart
http://www.uni-stuttgart.de/itlr/forschung/tropfen/fs3d/index.php?lang=en&lang=en:w
::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: :::::
Examples for load-imbalance - Volume of Fluid Method
[Figure: volume fraction f on a 5 x 5 grid with axes x and y. Source: ITLR]
1    1    1    0.7  0
1    1    1    0.3  0
1    1    0.6  0    0
0.7  0.3  0    0    0
0    0    0    0    0
Treatment of multiple phases
$f(x, t) = \begin{cases} 0 & \text{liquid} \\ (0, 1) & \text{interface} \\ 1 & \text{solid} \end{cases}$
- Reconstruction of the interface
- Additional computational effort for the interface elements
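To illustrate how interface elements raise the local effort, here is a small sketch that assigns an element-weight to every cell of the 5 x 5 grid from the slide. The two weight values and the helper names are assumptions chosen for illustration; the talk does not state actual weights.

```python
# Volume fractions f from the 5 x 5 grid on the slide
# (decimal points instead of decimal commas).
F = [
    [1.0, 1.0, 1.0, 0.7, 0.0],
    [1.0, 1.0, 1.0, 0.3, 0.0],
    [1.0, 1.0, 0.6, 0.0, 0.0],
    [0.7, 0.3, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]

W_PLAIN = 1.0      # assumed weight of a cell with f == 0 or f == 1
W_INTERFACE = 3.0  # assumed weight of an interface cell (reconstruction cost)


def is_interface(f):
    """A cell holds a piece of the interface when 0 < f < 1."""
    return 0.0 < f < 1.0


def element_weight(f):
    """Interface cells cost more because the interface is reconstructed there."""
    return W_INTERFACE if is_interface(f) else W_PLAIN


weights = [[element_weight(f) for f in row] for row in F]
n_interface = sum(is_interface(f) for row in F for f in row)
row_loads = [sum(row) for row in weights]  # load if each row were one domain
```

With these assumed weights the row containing two interface cells carries the largest load, although every row has the same number of elements.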
::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: :::::
Example 2 - Discontinuous Galerkin Spectral Element Method (DG SEM) based code FLEXI
References:
Institute of Aerodynamics and Gas Dynamics - University of Stuttgart
Numerical research group: https://nrg.iag.uni-stuttgart.de/
FLEXI on github: https://github.com/flexi-framework/flexi
::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: :::::
FLEXI - a high-order numerical framework
- High-order numerical approach
- Massively parallel CFD code designed for HPC systems
- DG SEM enables an efficient communication pattern (communication hiding)
- FLEXI tested on different HPC systems: Hornet, HazelHen, JUQueen, Marconi
- FLEXI is used for very large academic use cases
::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: :::::
Example - gas injection
- The numerical approach was extended to simulate complex fluid flow
- The FV-Sub-Cell-Method is used for shock capturing
- The code is used for industrial use cases
- Depending on the local solution, the FV-Sub-Cell-Method is turned on or off
- The FV-Sub-Cell-Method adds local computational effort
::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: :::::
Local computational effort
[Figure: local computational effort at T = 0.00 ms, T = 0.25 ms, and T = 0.50 ms]
::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: :::::
Motivation for dynamic load-balancing
[Figure: load / average load (axis range 0.95 to 1.30) versus core id. Load distribution of a test case on 96 cores.]
- A small number of cores are overloaded
- A large number of cores are underloaded
- The performance of the application suffers from the overload on a small number of cores
::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: :::::
A dynamic load-balancing strategy
Step 1: Calculate a new optimized load distribution
Challenges:
- Global optimization problem
- Simple and fast calculation of the optimized load distribution
- Keep the communication overhead small
Step 2: Distribute the load between MPI-processes
Challenges:
- Communication structure originates during run-time
- Communication structure changes during run-time
- Keep the communication overhead small
::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: :::::
A dynamic load-balancing strategy - approach
[Figure: two panels joined by ⇒, each showing 64 elements numbered 0 to 63 along a space-filling curve and assigned to cores 0 to 7. Legend: element-weight w1, w2.]
- Using a space-filling curve for the domain decomposition
- Mapping the 3D problem to a 1D problem
- Minimal communication (only one time)
- Simple communication pattern
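The mapping from a 3D problem to a 1D problem can be sketched with a Morton (Z-order) curve. This curve type is an assumption made for brevity; the slides do not say which space-filling curve is used. The names `morton_key` and `decompose` are hypothetical.

```python
def morton_key(i, j, k, bits=10):
    """Interleave the bits of (i, j, k) into a single Z-order key."""
    key = 0
    for b in range(bits):
        key |= ((i >> b) & 1) << (3 * b)
        key |= ((j >> b) & 1) << (3 * b + 1)
        key |= ((k >> b) & 1) << (3 * b + 2)
    return key


def decompose(n, n_cores):
    """Order the n**3 elements of a cube along the curve and cut the
    resulting 1D sequence into equally sized contiguous chunks.
    Assumes that n**3 is divisible by n_cores."""
    cells = [(i, j, k) for k in range(n) for j in range(n) for i in range(n)]
    cells.sort(key=lambda c: morton_key(*c))
    chunk = len(cells) // n_cores
    return [cells[c * chunk:(c + 1) * chunk] for c in range(n_cores)]


# 64 elements on 8 cores, the same counts as in the figure.
domains = decompose(4, 8)
```

Because neighbouring keys tend to be neighbouring cells, each contiguous chunk of the sequence forms a compact block (here a 2 x 2 x 2 sub-cube), and a domain is fully described by two indices on the curve.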
::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: :::::
Step 1: Calculate a new optimized load distribution
An optimization algorithm based on the element-specific computational effort is used for the load distribution.
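$$\Delta c^{P_n}_{\mathrm{avg}} = \min\left( \left| \Delta c^{P_{n-1}}_{\mathrm{avg}} + \Delta^{P_n}_1 \right|,\ \left| \Delta c^{P_{n-1}}_{\mathrm{avg}} + \Delta^{P_n}_2 \right| \right)$$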
[Figure: load / average load (axis range 0.95 to 1.30) versus core id, in two panels: "Load distribution without load-balancing" and "Load distribution with load-balancing". Load distribution of a test case on 96 cores.]
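One plausible reading of this step is a greedy cut of the 1D curve: for every process boundary there are two candidate cut positions, one just below and one just above the ideal cumulative load, and the one with the smaller deviation is kept. The sketch below follows that reading. It is an assumption for illustration, not the algorithm from the talk, and `partition_curve` is a hypothetical name.

```python
from bisect import bisect_left
from itertools import accumulate


def partition_curve(weights, n_procs):
    """Split non-negative element-weights, ordered along the curve, into
    n_procs contiguous chunks. Returns cut indices: process p owns the
    elements cuts[p] .. cuts[p + 1] - 1."""
    prefix = [0.0] + list(accumulate(weights))  # cumulative load
    total = prefix[-1]
    cuts = [0]
    for p in range(1, n_procs):
        target = total * p / n_procs       # ideal cumulative load after p processes
        hi = bisect_left(prefix, target)   # first boundary at or above the target
        lo = max(hi - 1, 0)                # last boundary below the target
        # Keep the candidate whose cumulative load deviates less from the target.
        if abs(prefix[lo] - target) <= abs(prefix[hi] - target):
            best = lo
        else:
            best = hi
        cuts.append(max(best, cuts[-1]))   # cuts must not move backwards
    cuts.append(len(weights))
    return cuts


# Hypothetical weights: six cheap elements followed by two expensive ones.
weights = [1, 1, 1, 1, 1, 1, 3, 3]
cuts = partition_curve(weights, 4)
loads = [sum(weights[a:b]) for a, b in zip(cuts, cuts[1:])]
```

For this input an equal element count per process (two each) would give loads of 2, 2, 2, 6, whereas the weighted cut gives every process a load of 3. Each process only needs the prefix sums, so the calculation stays simple and fast.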
::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: :::::
Step 2: Distributing the load between MPI-processes
- Reusing the 1D structure of the space-filling curve
- Intra-node communication via an MPI shared-memory window
- Shared memory is used for communication and to store the results during the load-balancing
- MPI communication only between nodes
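Because every domain is a contiguous range on the curve, the elements that must move follow directly from the old and new cut indices. The sketch below derives such a transfer plan and labels each transfer as intra-node or inter-node. It assumes a partition is given as a list `cuts` in which rank r owns elements `cuts[r]` to `cuts[r + 1] - 1`, and that ranks are placed on nodes in consecutive blocks of `ranks_per_node`. It contains no actual MPI calls, and `transfer_plan` is a hypothetical name.

```python
def transfer_plan(old_cuts, new_cuts, ranks_per_node):
    """Return (src, dst, first, last_exclusive, kind) for every element
    range on the curve that changes its owner."""
    plan = []
    n_ranks = len(old_cuts) - 1
    for src in range(n_ranks):
        for dst in range(n_ranks):
            if src == dst:
                continue
            # Overlap of the old range of src with the new range of dst.
            first = max(old_cuts[src], new_cuts[dst])
            last = min(old_cuts[src + 1], new_cuts[dst + 1])
            if first < last:
                same_node = src // ranks_per_node == dst // ranks_per_node
                kind = "shared-memory" if same_node else "mpi"
                plan.append((src, dst, first, last, kind))
    return plan


# Hypothetical example: 8 elements, 4 ranks, 2 ranks per node.
plan = transfer_plan([0, 2, 4, 6, 8], [0, 3, 6, 7, 8], ranks_per_node=2)
```

In this example only one of the three transfers crosses a node boundary; the other two could be served from a window shared within the node.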
::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: :::::
Results - Element distribution
[Figure: element distribution at T0 = 0.00 ms, T1 = 0.25 ms, and T2 = 0.50 ms]
- Element distribution depends on the computational effort
- Redistribution of the elements every 1,000 timesteps
- 10% reduction of the wall-time
::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: :::::
Dynamic load distribution
::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: :::::
Further Challenges - detection of load-imbalance at run-time
State of the art:
- Several powerful performance measurement tools are available
- All tools require instrumentation and produce overhead
- Measurement results are only available after the simulation
- No feedback loop to the application
Challenges:
- Measurement has to be integrated into the application
- Feedback loop to the application
- Measurement of the load-imbalance with minimal overhead
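A measurement integrated into the application could look like the following sketch: each process times only its compute phase, and the ratio of maximum to average time decides whether a rebalancing step is triggered. The class name, the threshold of 1.1, and the way the per-process times are collected are all assumptions. In an MPI code the list passed to `should_rebalance` would have to be gathered from all processes.

```python
import time


class ImbalanceMonitor:
    """Accumulate the compute time of one process and decide, from the
    times of all processes, whether rebalancing looks worthwhile."""

    def __init__(self, threshold=1.1):
        self.threshold = threshold  # assumed trigger for max time / average time
        self.elapsed = 0.0
        self._start = None

    def start(self):
        """Call right before the compute phase of a timestep."""
        self._start = time.perf_counter()

    def stop(self):
        """Call right after the compute phase of a timestep."""
        self.elapsed += time.perf_counter() - self._start
        self._start = None

    def should_rebalance(self, all_elapsed):
        """all_elapsed: accumulated compute time of every process."""
        average = sum(all_elapsed) / len(all_elapsed)
        return average > 0 and max(all_elapsed) / average > self.threshold


monitor = ImbalanceMonitor(threshold=1.1)
monitor.start()
sum(x * x for x in range(1000))  # stand-in for the compute phase
monitor.stop()
```

Two timer calls per timestep keep the overhead small, and the decision is available during the run rather than after the simulation.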
::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: :::::
Conclusion
- HPC systems draw their performance from hundreds of thousands of cores
  - Massively parallel CFD codes are implemented
  - Initial domain decomposition techniques are well known
  - Well-balanced CFD applications
- Simulation of complex fluid flow
  - Varying load distribution during run-time
  - The load distribution is hard to predict in space and time
  - Simulation-specific load-imbalance occurs
- For efficient application execution, dynamic load-balancing strategies are indispensable
  - Efficient redistribution of the computational load
  - Detection of load-imbalance with very low overhead
::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: ::::: :::::
Thank you for your attention!