Performance Analysis of Massively-Parallel Computational Fluid Dynamics

11th ICHD, Singapore,

Performance Analysis of Massively-Parallel CFD

Background

J. Hawkes

Exascale computing...

First computer capable of 10^18 FLOPS expected in 2020.

GFLOPs of the Top500 (#1, #500 and Sum)

... But its architecture will be completely different to current supercomputers.

Limited by power consumption, rather than hardware speed.

Shift towards MASSIVE CONCURRENCY.

Mostly intra-node concurrency (shared memory).

Average of the Top500 supercomputers, with the first exascale machine superimposed.

The Strong Scalability Problem

We’ve been used to weak scaling: making our problems larger to match increases in concurrency.

Massive increases in concurrency make this impractical: we need to split our problems into smaller chunks (strong scaling).

Memory capacity is growing half as fast as overall compute power, so the finite cap on problem size grows slower than core count.

The most challenging CFD lies in unsteady simulations. We can only parallelize the spatial dimensions, and we have to do this more aggressively to increase our unsteady simulation capabilities.
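
The distinction between the two scaling modes can be made concrete with a small sketch. The function names and the timings below are hypothetical illustrations, not measurements from this study; strong scaling holds the total problem size fixed, while weak scaling holds the work per core fixed.

```python
def strong_scaling(times_by_cores):
    """Strong scaling: fixed total problem size.

    Returns {cores: (speedup, efficiency)} relative to the smallest core count.
    """
    base_cores = min(times_by_cores)
    base_time = times_by_cores[base_cores]
    result = {}
    for cores, run_time in sorted(times_by_cores.items()):
        speedup = base_time / run_time
        ideal_speedup = cores / base_cores
        result[cores] = (speedup, speedup / ideal_speedup)
    return result


def weak_scaling_efficiency(times_by_cores):
    """Weak scaling: fixed work per core, so the ideal run-time is constant.

    Returns {cores: efficiency} relative to the smallest core count.
    """
    base_time = times_by_cores[min(times_by_cores)]
    return {cores: base_time / run_time
            for cores, run_time in sorted(times_by_cores.items())}


# Hypothetical wall-clock times in seconds (NOT results from the talk).
example_times = {16: 100.0, 32: 55.0, 64: 32.0}
```

Calling `strong_scaling(example_times)` on these made-up timings gives an efficiency below 1 as the core count grows, which is the kind of drop-off the strong scalability problem refers to.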

Objectives

Investigating limitations to strong scalability

Considering a range of common user settings to cover a broad range of applications:

Discretization Schemes

Turbulence Models

Grid Structure

Linear Equation-System Solvers

Experimental Setup

ReFRESCO

Maritime-optimized RANS code

Various Eddy-Viscosity models

State-of-the-art sliding, deforming and adaptive meshes

Developed at MARIN (Netherlands), IST (Portugal), USP-TPN (Brazil), TU Delft (Netherlands) and the University of Southampton (United Kingdom)

ReFRESCO Cavitation and Manoeuvring simulations [marin.nl]

Profiling

Run-time of SIMPLE split into its key parts:

Assembly of Linear Systems

Solution to those systems

MPI Data Exchange

Calculations of Gradients

Other
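
A breakdown like the one above can be gathered by accumulating wall-clock time per routine category. The following is only a minimal sketch of that idea; the `RoutineProfiler` class and the section names are hypothetical and do not represent the instrumentation actually used in ReFRESCO.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class RoutineProfiler:
    """Accumulates wall-clock time per named category (hypothetical helper)."""

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def section(self, name):
        # Time everything executed inside the `with` block and add it to `name`.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start

    def proportions(self):
        # Fraction of the total recorded time spent in each category.
        total = sum(self.totals.values())
        if total == 0.0:
            return {name: 0.0 for name in self.totals}
        return {name: t / total for name, t in self.totals.items()}
```

In use, each phase of an outer iteration would be wrapped, for example `with profiler.section("assembly"): ...` and `with profiler.section("solve"): ...`, and `profiler.proportions()` would then give the share of run-time per category.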

KVLCC2

Double-body wind-tunnel simulation

Numerical Setup

Segregated solver

2.67 million cells

k-ω shear stress transport (SST-2003)

Momentum/Turbulence Equations

Block Jacobi + GMRES

Inner loop convergence tolerance (ILCT) 1%

Relaxation: explicit 0.15, implicit 0.8-0.85 (ramped)

QUICK or Upwind (1st order)

Pressure Equation

ML + GMRES

ILCT 1%

Relaxation: explicit 0.1
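
Reading the explicit and implicit values above as under-relaxation factors, the textbook forms for a transported variable φ are sketched below. The symbols (α_exp, α_imp, the coefficients a_P and a_nb, the source b) are illustrative notation, and this is not necessarily the exact form used in ReFRESCO.

```latex
% Explicit relaxation: blend the newly solved value \phi^{*} with the previous iterate
\phi^{new} = \phi^{old} + \alpha_{\mathrm{exp}} \left( \phi^{*} - \phi^{old} \right)

% Implicit relaxation (Patankar form), applied to the discretized equation
% a_P \phi_P = \sum_{nb} a_{nb} \phi_{nb} + b before it is solved
\frac{a_P}{\alpha_{\mathrm{imp}}} \phi_P
  = \sum_{nb} a_{nb} \phi_{nb} + b
  + \frac{1 - \alpha_{\mathrm{imp}}}{\alpha_{\mathrm{imp}}} \, a_P \, \phi_P^{old}
```

Implicit relaxation strengthens the diagonal of the linear system, which generally makes it easier for the iterative solver, while explicit relaxation only damps the update between outer iterations.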

IRIDIS4

University of Southampton Supercomputer

#179 on Top500 in November 2013

12,200 cores at 2.6 GHz (16 cores per node)

[UoS, Computational Modelling Group – cmg.soton.ac.uk]

Results

Basic Scalability

Scalability of profiled routines.

Proportions of time spent in profiled routines.

Convective Discretization Scheme

Scalability of total runtime with various convective discretization schemes.

Turbulence Models

Scalability of total runtime with various turbulence models.

Structured vs. Unstructured

Scalability of total runtime comparing a structured grid to an unstructured grid. Some interpolation error.

Linear Equation-System Solvers

Effect on total run-time of Inner Loop Convergence Tolerance with various outer loop relaxation factors.

Scalability of solve routines with varying pre-conditioner. BCGS and GMRES show similar trends.

Optimized Scalability

Scalability of profiled routines.

Proportions of time spent in profiled routines.

Conclusions

Discretization scheme had little effect on overall scalability; assembly routines are not a bottleneck.

Two-equation eddy-viscosity models were considerably more expensive and less scalable than one-equation models, mostly due to non-scalable solve routines.

Mesh structure had little effect on overall scalability.

Choice of pre-conditioner had a large effect on the solve routines, but they remain a bottleneck.

MPI data exchange is a bottleneck at higher node counts.

Solve routines are a bottleneck to shared-memory parallelization.

Exchange routines are a bottleneck to distributed-memory parallelization.

Thank You

Further Work

So Why Isn’t the Solver Scalable?

J. Hawkes, Chaotic Linear Solvers for Massively Parallel CFD

Most codes use a Krylov Subspace (KSP) method such as GMRES.

CIM: Maximum use of computation and communication.

Chaotic Iterative Methods
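
The idea behind chaotic relaxation can be illustrated with a serial emulation: components of the solution are updated one at a time, in arbitrary order, using values that may be slightly out of date, with no synchronization point between sweeps. The sketch below is an assumption-laden toy (the function name, the random ordering and the bounded `max_delay` are all hypothetical choices), not the author's solver. Convergence of this kind of scheme is only expected for suitable matrices, such as the strictly diagonally dominant example in the usage note.

```python
import random


def chaotic_relaxation(A, b, sweeps=200, max_delay=3, seed=0):
    """Serially emulate chaotic (asynchronous) relaxation for A x = b.

    Each step updates one randomly chosen component using a possibly stale
    snapshot of the solution, mimicking unsynchronized parallel updates.
    """
    rng = random.Random(seed)
    n = len(b)
    x = [0.0] * n
    history = [list(x)]  # the most recent snapshots, newest last
    for _ in range(sweeps * n):  # sweeps * n single-component updates
        i = rng.randrange(n)
        # Read from a randomly chosen recent snapshot (bounded staleness).
        stale = history[rng.randrange(len(history))]
        off_diag = sum(A[i][j] * stale[j] for j in range(n) if j != i)
        x[i] = (b[i] - off_diag) / A[i][i]
        history.append(list(x))
        if len(history) > max_delay + 1:
            history.pop(0)
    return x
```

For example, with `A = [[4, -1, 0], [-1, 4, -1], [0, -1, 4]]` and `b = [2, 4, 10]`, whose exact solution is `[1, 2, 3]`, the iteration approaches that solution despite the random order and stale reads. In a real implementation the staleness would arise from threads or processes running without barriers rather than from a random number generator.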

Q & A

Nodal Speedup

Nodal speed-up up to ~80k cells per core. Higher values can be achieved with fewer processes per node.
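
To relate a cells-per-core figure to node counts, the arithmetic can be sketched as follows, assuming the 2.67 million cell grid and the 16 cores per node quoted earlier in the talk, with every core on a node in use. The function name is hypothetical.

```python
TOTAL_CELLS = 2_670_000  # grid size quoted in the numerical setup
CORES_PER_NODE = 16      # IRIDIS4 node size quoted in the talk


def cells_per_core(nodes, cores_per_node=CORES_PER_NODE, total_cells=TOTAL_CELLS):
    """Average number of grid cells per core when all cores on each node are used."""
    return total_cells / (nodes * cores_per_node)
```

Under these assumptions one node corresponds to about 167k cells per core and two nodes to about 83k, so a figure of ~80k cells per core is reached at roughly two fully populated nodes for this grid.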
