Page 1

Aerodynamics of a hi-performance vehicle: a parallel computing application inside the Hi-ZEV project

Workshop “HPC enabling of OpenFOAM® for CFD applications”

26-28 November, CINECA, Casalecchio di Reno (BO), Italy

A. De Maio(1), V. Krastev(2), P. Lanucara(3), F. Salvadore(3)

(1) Nu.m.i.d.i.a. S.r.l.
(2) Dept. of Industrial Engineering, University of Rome “Tor Vergata”
(3) CINECA Roma, Dipartimento SCAI

Page 2

Summary

• Hi-ZEV project outline

• Preliminary evaluation of the OpenFOAM® code

• Prototype car simulations: aerodynamic results and scalability/performance tests

• Conclusions

Page 3

Hi-ZEV: a collaborative industrial research project

• Granted by the Italian Ministry of Economic Development’s program «Industria 2015 – Nuove Tecnologie per il Made in Italy» (“Industry 2015 – New Technologies for Made in Italy”)

• The project aims to develop an innovative high-performance car with low environmental impact, based on an electric/hybrid powertrain

• The project started on 01/01/2011 and will last until 31/12/2013

Page 5

Hi-ZEV: the partners

Technos Reat
Fondazione Italiana Nuove Comunicazioni
Icomet Microsistemi srl
Elettromedia Advanced Devices spa
Dyesol Italia srl
Leaff Engineering srl
ISAM spa
Concept Inn srl
HPH Consulting

Team Leader and Project Coordinator

Page 8

Hi-ZEV: technical key points

• Very light vehicle (low weight/power ratio)

• High-performance hybrid powertrain for torque availability over a wide range

• Very advanced chassis and suspensions for excellent road-holding

• Accurate fluid-dynamic design → CFD

Page 9

The role of CFD inside the project

• In the early as well as in the more advanced design stages, CFD can be effectively used to optimize:
1. the external aerodynamics of the vehicle;
2. the underhood aerodynamics/thermal management;
3. the HVAC systems.

• The combination of an open-source, fully parallelized code (OpenFOAM®) with the HPC infrastructure of CASPUR/CINECA represents a very powerful and efficient answer to these needs.

[Diagram: OpenFOAM® + HPC → CFD → external aerodynamics, underhood, HVAC]

Page 10

Preliminary simulations on the Matrix cluster

• Preliminary evaluation of OpenFOAM® on the Matrix infrastructure

• Standard external aerodynamics test case (Ahmed body)

• OpenFOAM-1.7.1 + OpenMPI-1.4.2 + Scotch for decomposition

• Steady-state solver (simpleFoam) on unstructured grids (up to 6×10^6 cells)

• High-Re RANS turbulence modeling (RNG/realizable k-ε + wall functions, WF)

• Up to 256 cores (32 nodes) involved

Matrix:
• 8 cores per node (2 × quad-core AMD Opteron 23xx @ 2.1 GHz)
• 320 nodes with 16 GB RAM each
• InfiniBand DDR connection between nodes
• 20 Tflops peak performance, 177 Mflops/W sustained performance

Page 11

Preliminary simulations on the Matrix cluster: computational domain

Page 12

Ahmed body results: wake flow structures, ϕ=25°

[Figures: symmetry plane and 3D view (Q-criterion, Q = 10^4 s^-2); realizable k-ε (RKE) vs. RNG k-ε (RNG)]

Page 15

Ahmed body results: wake flow structures, ϕ=35°

[Figures: symmetry plane and 3D view (Q-criterion, Q = 10^4 s^-2); realizable k-ε (RKE) vs. RNG k-ε (RNG)]

Page 17

Ahmed body results: velocity profiles in the symmetry plane

[Figures: ϕ=25° and ϕ=35°]

Page 19

Ahmed body results: integrated rear pressure drag

Overall comparison:

Rear pressure drag coefficients, ϕ = 25°

                   Slant    Base    Total    Difference (%)*
RKE                0.147    0.088   0.235    -13.3
RNG                0.147    0.083   0.230    -15.1
Lienhart et al.    0.156    0.115   0.271    -

Rear pressure drag coefficients, ϕ = 35°

                   Slant    Base    Total    Difference (%)*
RKE                0.110    0.107   0.217    -12.5
RNG                0.115    0.101   0.216    -12.9
Lienhart et al.    0.121    0.127   0.248    -

* Difference of the total with respect to Lienhart et al.

Comments:

Results are in line with previous CFD studies on the 25°/35° configurations

The realizable k-ε model captures fairly well the relative drag reduction (~8%) in the passage from 25° to 35°
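The arithmetic behind the two tables can be cross-checked with a short Python sketch. It assumes (inferred from the numbers, not stated on the slide) that the “Difference” column is the signed relative difference of the total with respect to Lienhart et al.; it also computes the relative drag reduction from 25° to 35° for each row.

```python
# Rear pressure drag coefficients (slant, base) copied from the tables above.
DATA = {
    25: {"RKE": (0.147, 0.088), "RNG": (0.147, 0.083), "Lienhart et al.": (0.156, 0.115)},
    35: {"RKE": (0.110, 0.107), "RNG": (0.115, 0.101), "Lienhart et al.": (0.121, 0.127)},
}
REFERENCE = "Lienhart et al."


def total(angle: int, source: str) -> float:
    """Total rear pressure drag = slant contribution + base contribution."""
    slant, base = DATA[angle][source]
    return slant + base


def difference_pct(angle: int, source: str) -> float:
    """Signed difference of the total with respect to the reference data, in percent."""
    ref = total(angle, REFERENCE)
    return 100.0 * (total(angle, source) - ref) / ref


def reduction_pct(source: str) -> float:
    """Relative drag reduction when the slant angle goes from 25 to 35 degrees, in percent."""
    c25, c35 = total(25, source), total(35, source)
    return 100.0 * (c25 - c35) / c25


if __name__ == "__main__":
    for angle in (25, 35):
        for source in ("RKE", "RNG"):
            print(f"phi={angle}  {source}: total={total(angle, source):.3f}  "
                  f"difference={difference_pct(angle, source):+.1f}%")
    for source in ("RKE", "RNG", REFERENCE):
        print(f"25 -> 35 deg drag reduction, {source}: {reduction_pct(source):.1f}%")
```

With these inputs the difference column is reproduced to one decimal, and the 25° to 35° reductions come out at about 7.7% (RKE), 6.1% (RNG) and 8.5% (Lienhart et al.), which is consistent with the comment on the realizable k-ε model.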

Page 20

Ahmed body results: some considerations about scalability

Case description:

• Finest grid (~6×10^6 cells)

• PCG linear solver on the pressure equation

• 64-96-128-256 cores (8-12-16-32 nodes) progression

[Chart: speedup specific efficiency (%) vs. nodes increase (8-12, 12-16, 16-32)]

$s.e. = \dfrac{\text{speedup relative increase}}{\text{nodes relative increase}}$
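A minimal sketch of the specific-efficiency formula, assuming that “relative increase” means the ratio between consecutive values (e.g. 12/8 = 1.5 for the 8-12 step). The function name and the timings in the usage example are hypothetical, not the measured data. A value of 100% means linear scaling over that step; values above 100% mean superlinear scaling.

```python
def specific_efficiency_pct(n1: int, n2: int, t1: float, t2: float) -> float:
    """Speedup specific efficiency (%) for a step from n1 to n2 nodes.

    t1, t2: measured time per step on n1 and n2 nodes.
    The speedup relative increase S(n2)/S(n1) equals t1/t2, because the
    common baseline of the speedup cancels out.
    """
    speedup_relative_increase = t1 / t2
    nodes_relative_increase = n2 / n1
    return 100.0 * speedup_relative_increase / nodes_relative_increase


if __name__ == "__main__":
    # Hypothetical timings (s per step), for illustration only.
    timings = {8: 12.0, 12: 8.2, 16: 6.1, 32: 3.1}
    nodes = sorted(timings)
    for a, b in zip(nodes, nodes[1:]):
        print(f"{a}-{b}: {specific_efficiency_pct(a, b, timings[a], timings[b]):.1f}%")
```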

Page 21

Ahmed body results: some considerations about scalability

• Almost linear inter-node scaling (at least in the considered interval)

Page 22

Prototype car simulations

• Aims:
1. aerodynamic optimization of the Hi-ZEV prototype external design;
2. more systematic scalability tests on the CASPUR/CINECA HPC infrastructures.

• Two hybrid (prisms + tetras) grids considered:
1. 7.5×10^6 cells (symmetric);
2. 15×10^6 cells (complete geometry).

• OpenFOAM-2.1.1 + Scotch

• Three architectures selected for the performance tests

Matrix (AMD Opteron):
• 8 cores per node (2 × quad-core AMD Opteron 23xx @ 2.1 GHz)
• 320 nodes with 16 GB RAM each
• InfiniBand DDR connection between nodes
• 20 Tflops peak performance, 177 Mflops/W sustained performance

Page 23

Prototype car simulations

Jazz (Intel Xeon):
• 12 cores per node (2 × six-core Intel X5650 “Westmere” @ 2.67 GHz)†
• 16 nodes with 48 GB RAM each
• InfiniBand QDR connection between nodes
• 14.3 Tflops peak performance, 785 Mflops/W sustained performance

† Each node is also equipped with 2 NVIDIA Tesla GPU computing units, not involved in the OpenFOAM simulations.

Page 24

Prototype car simulations

Fermi (BG/Q):
• 16 cores per node (IBM PowerPC A2 @ 1.6 GHz)
• 10240 nodes (163840 cores) with 16 GB RAM each (1 GB per core)
• Network interface with 11 links → 5D torus
• 2 Pflops peak performance
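The quoted peak figures can be roughly reproduced as nodes × cores per node × clock × flops per cycle. The flops-per-cycle values below are assumptions not stated on the slides (4 double-precision flops per cycle for the Opteron 23xx, 8 for the PowerPC A2 with its 4-wide fused multiply-add unit). Jazz is left out: its 14.3 Tflops figure presumably includes the GPUs, since the CPUs alone would give only about 2 Tflops under the same assumption.

```python
# Assumed double-precision flops per clock cycle per core (not stated on the slides).
ASSUMED_FLOPS_PER_CYCLE = {
    "AMD Opteron 23xx": 4,   # 128-bit SSE add + 128-bit SSE multiply per cycle
    "IBM PowerPC A2": 8,     # 4-wide fused multiply-add per cycle
}


def peak_flops(nodes: int, cores_per_node: int, clock_hz: float, flops_per_cycle: int) -> float:
    """Theoretical peak = nodes x cores per node x clock x flops per cycle."""
    return nodes * cores_per_node * clock_hz * flops_per_cycle


matrix_peak = peak_flops(320, 8, 2.1e9, ASSUMED_FLOPS_PER_CYCLE["AMD Opteron 23xx"])
fermi_peak = peak_flops(10240, 16, 1.6e9, ASSUMED_FLOPS_PER_CYCLE["IBM PowerPC A2"])

if __name__ == "__main__":
    print(f"Matrix: {matrix_peak / 1e12:.1f} Tflops")  # slide: 20 Tflops
    print(f"Fermi:  {fermi_peak / 1e15:.2f} Pflops")   # slide: 2 Pflops
```

Under these assumptions Matrix comes out at about 21.5 Tflops and Fermi at about 2.1 Pflops, in line with the rounded values on the slides.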

Page 25

Prototype car simulations: computational domain

[Figure: computational domain around the half car, with inlet, outlet, top, side, moving floor and symmetry plane boundaries]

Page 26

Prototype car simulations: aerodynamic results (OF vs. Fluent)

OpenFOAM® settings:

• Symmetrical prism/tetra grid (exactly the same for both codes)

• simpleFoam pressure-based solver

• Realizable k-ε for turbulence + standard WF

• TVD scheme for momentum convection, upwind for k/ε

Fluent settings:

• Symmetrical prism/tetra grid (exactly the same for both codes)

• Pressure-based solver

• Realizable k-ε for turbulence + non-equilibrium WF

• Second-order upwind scheme for all convective terms

Page 27

Prototype car simulations: aerodynamic results (OF vs. Fluent)

Aerodynamic coefficients

              Cd      CL
OpenFOAM®     0.32    0.14
Fluent        0.31    0.17
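The gap between the two codes can be quantified with a small sketch. Taking Fluent as the reference is an arbitrary choice for illustration (the slide does not designate one). The drag gap is also expressed in drag counts (1 count = 0.001 in C_d), a common convention in automotive aerodynamics. With two-digit coefficients, rounding alone (±0.005) is already a sizeable fraction of C_L.

```python
COEFFICIENTS = {
    "OpenFOAM": {"Cd": 0.32, "CL": 0.14},
    "Fluent": {"Cd": 0.31, "CL": 0.17},
}


def relative_difference_pct(value: float, reference: float) -> float:
    """Signed difference of value with respect to reference, in percent."""
    return 100.0 * (value - reference) / reference


def drag_counts(delta_cd: float) -> float:
    """Express a drag-coefficient difference in drag counts (1 count = 0.001)."""
    return delta_cd / 0.001


if __name__ == "__main__":
    of, fl = COEFFICIENTS["OpenFOAM"], COEFFICIENTS["Fluent"]
    for name in ("Cd", "CL"):
        print(f"{name}: OpenFOAM vs Fluent = {relative_difference_pct(of[name], fl[name]):+.1f}%")
    print(f"Cd gap: {drag_counts(of['Cd'] - fl['Cd']):.0f} counts")
```

This gives about +3.2% on C_d (10 counts) and about -17.6% on C_L.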

Page 28

Prototype car simulations: aerodynamic results (OF vs. Fluent)

Pressure distribution around the car, y = 0 (symmetry plane)

[Figure panels: Fluent, 6000 iterations | OpenFOAM, 4500 iterations]

Pressure coefficient: $C_p = \dfrac{p - p_\infty}{\frac{1}{2}\rho U_\infty^2}$
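The pressure coefficient C_p = (p − p_∞)/(½ρU_∞²) used in these contour plots can be evaluated as follows. The free-stream values in the usage example are hypothetical. The kinematic-pressure variant is included because incompressible OpenFOAM solvers such as simpleFoam work with p/ρ, so the density cancels out.

```python
def pressure_coefficient(p: float, p_inf: float, rho: float, u_inf: float) -> float:
    """C_p = (p - p_inf) / (0.5 * rho * u_inf**2); p in Pa, rho in kg/m^3, u_inf in m/s."""
    return (p - p_inf) / (0.5 * rho * u_inf**2)


def pressure_coefficient_kinematic(p_k: float, p_k_inf: float, u_inf: float) -> float:
    """Same coefficient from kinematic pressure p/rho (m^2/s^2): the density cancels."""
    return (p_k - p_k_inf) / (0.5 * u_inf**2)


if __name__ == "__main__":
    rho, u_inf, p_inf = 1.225, 40.0, 0.0  # hypothetical free stream (gauge pressure)
    q_inf = 0.5 * rho * u_inf**2          # dynamic pressure, 980 Pa here
    # At a stagnation point p - p_inf = q_inf, so both calls give C_p = 1.
    print(pressure_coefficient(p_inf + q_inf, p_inf, rho, u_inf))
    print(pressure_coefficient_kinematic((p_inf + q_inf) / rho, p_inf / rho, u_inf))
```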

Page 29

Prototype car simulations: aerodynamic results (OF vs. Fluent)

Pressure distribution around the car, y = −0.4

[Figure panels: Fluent, 6000 iterations | OpenFOAM, 4500 iterations; contours of $C_p$]

Page 30

Prototype car simulations: aerodynamic results (OF vs. Fluent)

Pressure distribution around the car, y = −0.7

[Figure panels: Fluent, 6000 iterations | OpenFOAM, 4500 iterations; contours of $C_p$]

Page 31

Prototype car simulations: aerodynamic results (OF vs. Fluent)

Total pressure distribution around the car, y = 0 (symmetry plane)

[Figure panels: Fluent, 6000 iterations | OpenFOAM, 4500 iterations]

Total pressure coefficient: $C_{p,t} = \dfrac{p_t - p_\infty}{p_{t,\infty} - p_\infty}$
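A minimal sketch of the total pressure coefficient, assuming the definition C_{p,t} = (p_t − p_∞)/(p_{t,∞} − p_∞). For incompressible flow p_t = p + ½ρ|U|², so the denominator equals ½ρU_∞², the same as for C_p. Undisturbed flow gives C_{p,t} = 1, and lower values mark total-pressure losses (boundary layers and wake). All numbers in the usage example are hypothetical.

```python
def total_pressure(p: float, rho: float, speed: float) -> float:
    """Incompressible total pressure: static pressure + dynamic pressure."""
    return p + 0.5 * rho * speed**2


def total_pressure_coefficient(p_t: float, p_inf: float, p_t_inf: float) -> float:
    """C_p,t = (p_t - p_inf) / (p_t,inf - p_inf)."""
    return (p_t - p_inf) / (p_t_inf - p_inf)


if __name__ == "__main__":
    rho, u_inf, p_inf = 1.225, 40.0, 0.0  # hypothetical free stream (gauge pressure)
    p_t_inf = total_pressure(p_inf, rho, u_inf)
    # Undisturbed flow keeps the free-stream total pressure: C_p,t = 1.
    print(total_pressure_coefficient(p_t_inf, p_inf, p_t_inf))
    # Hypothetical wake point: reduced static pressure and low speed -> total-pressure loss.
    p_t_wake = total_pressure(p_inf - 100.0, rho, 10.0)
    print(total_pressure_coefficient(p_t_wake, p_inf, p_t_inf))
```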

Page 32

Prototype car simulations: aerodynamic results (OF vs. Fluent)

Total pressure distribution around the car, y = −0.4

[Figure panels: Fluent, 6000 iterations | OpenFOAM, 4500 iterations; contours of $C_{p,t}$]

Page 33

Prototype car simulations: aerodynamic results (OF vs. Fluent)

Total pressure distribution around the car, y = −0.7

[Figure panels: Fluent, 6000 iterations | OpenFOAM, 4500 iterations; contours of $C_{p,t}$]

Page 34

Prototype car simulations: aerodynamic results (OF vs. Fluent)

Total pressure distribution around the car, z = 0.11

[Figure panels: Fluent, 6000 iterations | OpenFOAM, 4500 iterations; contours of $C_{p,t}$]

Page 35

Prototype car simulations: inter-node scalability tests (Matrix vs. Jazz)

[Chart: Speedup, Matrix vs. Jazz, PCG; speedup vs. number of nodes (0 to 20)]

Case description:

• Symmetrical grid (~7.5×10^6 cells)

• PCG and GAMG linear solvers on the pressure equation

• 50 iterations monitored, starting from a fairly converged solution

• The computing node is selected as the fundamental unit

$\text{speedup} = \dfrac{\text{time per step}(1\ \text{node})}{\text{time per step}(N\ \text{nodes})}$
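A minimal sketch of how the node-based speedup can be computed from monitored timings. It assumes that a cumulative elapsed time is available at each iteration (how it is collected is left open), and the helper names and example timings are hypothetical, not the Matrix/Jazz measurements.

```python
from typing import Dict, Sequence


def time_per_step(cumulative_times: Sequence[float]) -> float:
    """Average time per step from cumulative elapsed times recorded at each iteration.

    For a 50-iteration window, pass 51 values (the time at the start of
    the window plus one value per iteration).
    """
    if len(cumulative_times) < 2:
        raise ValueError("need at least two timestamps")
    return (cumulative_times[-1] - cumulative_times[0]) / (len(cumulative_times) - 1)


def speedup_curve(tps: Dict[int, float]) -> Dict[int, float]:
    """Speedup relative to one node: time per step(1 node) / time per step(N nodes)."""
    if 1 not in tps:
        raise ValueError("the single-node time per step is the reference")
    return {n: tps[1] / t for n, t in sorted(tps.items())}


if __name__ == "__main__":
    # Hypothetical times per step (s), for illustration only.
    tps = {1: 40.0, 2: 20.5, 4: 10.6, 8: 5.6, 16: 3.1}
    for n, s in speedup_curve(tps).items():
        print(f"{n:2d} nodes: speedup {s:.1f}")
```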

Page 36

Prototype car simulations: inter-node scalability tests (Matrix vs. Jazz)

[Chart: Speedup, Matrix vs. Jazz, GAMG; speedup vs. number of nodes (0 to 20)]

Page 37

Prototype car simulations: inter-node scalability tests (Matrix vs. Jazz)

[Chart: Speedup, Matrix, GAMG vs. PCG; speedup vs. number of nodes (0 to 32)]

Page 38

Prototype car simulations: inter-node scalability tests (Matrix vs. Jazz)

[Chart: Speedup, Jazz, GAMG vs. PCG; speedup vs. number of nodes (0 to 20)]

Page 39

Prototype car simulations: inter-node scalability tests (Matrix vs. Jazz)

Comments:

The PCG solver clearly scales better than GAMG once the parallelization becomes extensive (approximately above 100 processes for the half-car case)

Jazz appears to scale better than Matrix, probably because of its more capable InfiniBand network (QDR vs. DDR) and of better cache “filling” as the individual processes become smaller

Page 40

Prototype car simulations: absolute and single-node performances (Matrix vs. Jazz)

Case description:

• Symmetrical grid (~7.5×10^6 cells)

• PCG and GAMG linear solvers on the pressure equation

• 50 iterations monitored, starting from a fairly converged solution

• Time per step evaluated on a per-core basis

[Chart: Time per step, Matrix, GAMG vs. PCG; time (s) vs. number of cores (8 to 256)]

Page 41

Prototype car simulations: absolute and single-node performances (Matrix vs. Jazz)

[Chart: Time per step, Jazz, GAMG vs. PCG; time (s) vs. number of cores (12 to 192)]

Page 42

Prototype car simulations: absolute and single-node performances (Matrix vs. Jazz)

[Chart: Time per step, single node, Matrix, GAMG vs. PCG; time (s) vs. number of cores (1 to 8)]

Page 43

Prototype car simulations: absolute and single-node performances (Matrix vs. Jazz)

[Chart: Time per step, single node, Jazz, GAMG vs. PCG; time (s) vs. number of cores (1 to 12)]

Page 44

Prototype car simulations: absolute and single-node performances (Matrix vs. Jazz)

Comments:

Despite the very inefficient intra-node scaling, the newer Intel architecture is (as expected) much faster than the AMD one

If the number of processes is kept within the “acceptable scaling range”, the GAMG solver is always faster than the PCG one (e.g. 40% faster on 64 Matrix cores)

Page 45

Prototype car simulations: scalability tests (Fermi, symmetrical grid)

Case description:

• Symmetrical grid (~7.5×10^6 cells)

• PCG and GAMG linear solvers on the pressure equation

• 50 iterations monitored, starting from a fairly converged solution

• 16 and 32 MPI processes per node considered

[Chart: Speedup efficiency, 16 ppn, PCG vs. GAMG; speedup efficiency (%) vs. number of nodes (2 to 256)]

$s.e.(\%) = 100 \cdot \dfrac{1}{N} \cdot \dfrac{\text{time per step}(1\ \text{node})}{\text{time per step}(N\ \text{nodes})}$
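The speedup efficiency can be sketched as follows, assuming s.e.(%) = 100 · (1/N) · time per step(1 node) / time per step(N nodes). The optional `base_nodes` parameter is an addition of this sketch (hypothetical, not from the slides) for cases where the smallest run uses more than one node; with `base_nodes=1` it reduces to the stated formula. The example timings are hypothetical.

```python
def speedup_efficiency_pct(n_nodes: int, t_base: float, t_n: float, base_nodes: int = 1) -> float:
    """Speedup efficiency (%) = 100 * (base_nodes * t_base) / (n_nodes * t_n).

    t_base: time per step on base_nodes nodes; t_n: time per step on n_nodes nodes.
    With base_nodes = 1: 100 * (1/N) * t(1 node) / t(N nodes).
    """
    return 100.0 * (base_nodes * t_base) / (n_nodes * t_n)


if __name__ == "__main__":
    # Hypothetical numbers: 128 s per step on 1 node, 2.5 s per step on 64 nodes.
    print(f"{speedup_efficiency_pct(64, 128.0, 2.5):.1f}%")  # speedup 51.2 on 64 nodes
```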

Page 46

Prototype car simulations: scalability tests (Fermi, symmetrical grid)

[Chart: Speedup efficiency, PCG, 16 ppn vs. 32 ppn; speedup efficiency (%) vs. number of nodes (2 to 64)]

Page 47

Prototype car simulations: scalability tests (Fermi, symmetrical grid)

What about absolute performance?

Page 48

Prototype car simulations: scalability tests (Fermi, symmetrical grid)

[Chart: Time per step, PCG, 16 ppn vs. 32 ppn; time (s) vs. number of nodes (2 to 64)]

Apparently, using more processes per node could be beneficial in terms of absolute performance, but when the number of nodes reaches a “practical” value (64) the benefit vanishes, and in addition…

Page 49

Prototype car simulations: I/O performance tests (Fermi, symmetrical grid)

Case description:

• Symmetrical grid (~7.5×10^6 cells)

• PCG linear solver on the pressure equation

• Output generation time and initialization time monitored

• 16 and 32 MPI processes per node considered

[Chart: Output generation time, PCG, 16 ppn vs. 32 ppn; time (s) vs. number of nodes (4 to 128)]

Page 50

Prototype car simulations: I/O performance tests (Fermi, symmetrical grid)

[Chart: Initialization time, PCG, 16 ppn vs. 32 ppn; time (s) vs. number of nodes (4 to 128)]

Page 51

Prototype car simulations: comments about Fermi runs (symmetrical grid)

Comments:

The case is of course too small to prove Fermi’s real potential, but…

…up to the minimum “practical” node number (64), the SIMPLE iteration scaling is acceptable (PCG)

…when the I/O capability of the nodes actually gets saturated, a dramatic drop in I/O efficiency occurs (and things get even worse with 32 ppn)
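Why the symmetrical case is “too small” for Fermi becomes clearer when the mesh load per MPI process is computed. The sketch assumes an even decomposition (Scotch aims at balanced partitions, but real partitions are not exactly equal); the process counts are taken from the configurations discussed in these slides.

```python
CELLS_HALF_CAR = 7.5e6  # symmetrical (half-car) grid


def cells_per_process(cells: float, processes: int) -> float:
    """Average number of cells per MPI process, assuming an even decomposition."""
    return cells / processes


# Process counts from the slides (nodes x processes per node).
CONFIGS = {
    "~100 processes (PCG/GAMG crossover, Matrix/Jazz)": 100,
    "Matrix, 32 nodes x 8 cores": 32 * 8,
    "Fermi, 64 nodes x 16 ppn": 64 * 16,
    "Fermi, 64 nodes x 32 ppn": 64 * 32,
}

if __name__ == "__main__":
    for label, procs in CONFIGS.items():
        load = cells_per_process(CELLS_HALF_CAR, procs)
        print(f"{label:50s} {procs:5d} processes  {load:9,.0f} cells each")
```

At 64 Fermi nodes each process owns only about 7,300 cells with 16 ppn (about 3,700 with 32 ppn), against about 75,000 around the ~100-process crossover observed on Matrix/Jazz.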

Page 52

Further simulations on Fermi: doubled grid

Case description:

• Doubled grid (~15×10^6 cells)

• PCG solver on the pressure equation

• Only 16 ppn considered

• Comparison made assuming the same mesh-per-node load distribution (i.e. doubling the number of nodes for the bigger grid)

[Chart: Time per step, PCG, symm. vs. doubled; time (s) vs. number of nodes, symm-double (32-64, 64-128, 128-256)]

Page 53

Further simulations on Fermi: doubled grid

[Chart: Output generation time (o.g.t.), PCG, symm. vs. doubled; time (s) vs. number of nodes, symm-double (32-64, 64-128, 128-256)]

Page 54

Further simulations on Fermi: doubled grid

[Chart: Initialization time (i.t.), PCG, symm. vs. doubled; time (s) vs. number of nodes, symm-double (32-64, 64-128, 128-256)]

Page 55

Further simulations on Fermi: doubled grid

Comments:

The SIMPLE iteration weak-scaling performance appears fairly good and should thus encourage more tests on bigger cases, but…

…the I/O issues are confirmed
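The doubled-grid comparison is a weak-scaling test: each pair of runs keeps the same number of cells per node, so ideal behaviour is an unchanged time per step. A minimal sketch, in which the function names and the example timings are hypothetical (not the measured Fermi data):

```python
CELLS_SYMM = 7.5e6    # symmetrical grid
CELLS_DOUBLE = 15e6   # doubled (complete) grid


def cells_per_node(cells: float, nodes: int) -> float:
    return cells / nodes


def weak_scaling_efficiency_pct(t_small: float, t_large: float) -> float:
    """Weak-scaling efficiency (%) for two runs with the same load per node.

    Ideal weak scaling keeps the time per step unchanged, giving 100%.
    """
    return 100.0 * t_small / t_large


if __name__ == "__main__":
    # Node pairs from the slides: the doubled grid always runs on twice the nodes.
    for n_symm in (32, 64, 128):
        n_double = 2 * n_symm
        print(f"{n_symm}-{n_double} nodes: "
              f"{cells_per_node(CELLS_SYMM, n_symm):,.0f} vs "
              f"{cells_per_node(CELLS_DOUBLE, n_double):,.0f} cells per node")
    # Hypothetical times per step (s), for illustration only.
    print(f"efficiency: {weak_scaling_efficiency_pct(1.20, 1.32):.1f}%")
```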

Page 56

Conclusions (1)

• Hi-ZEV is a successful example of how industry can take advantage of the combination of parallelized open-source CFD toolkits and highly qualified HPC infrastructures, in a collaborative project framework

• The OpenFOAM® code has been evaluated on “conventional” AMD and Intel HPC facilities for external aerodynamics applications, showing:
– good accuracy compared to well-established commercial CFD codes;
– interesting parallel performance (not yet fully exploited), at least for small/medium-size cases (~10^7 cells) and depending on the optimal choice of pressure solver (PCG scales better, GAMG is faster for small numbers of processes)

Page 57

Conclusions (2)

• The OpenFOAM® performance has also been assessed on the BG/Q supercomputer Fermi and, in spite of the (relatively) small size of the considered cases, the following remarks can be drawn:
– the solver iteration scaling performance is promising (with PCG), especially in view of coping with much bigger problems;
– although for the considered cases a more conventional architecture (e.g. Intel Xeon) seems to be a better choice, a deeper investigation should be made in order to also include performance vs. energy consumption aspects;
– unfortunately, for massively parallel applications (thousands of processes) a dramatic I/O efficiency issue arises (further evaluation needed)

Page 58

Acknowledgments

M. Testa (Nu.m.i.d.i.a. S.r.l.), for providing the half-car grid and Fluent results