
Page 1:

A Massively-Parallel Framework for Finite-Volume Simulation of Global Atmospheric Dynamics

Willem Deconinck¹, Mats Hamrud¹, Piotr K. Smolarkiewicz¹, Joanna Szmelter², Zhao Zhang²

¹ European Centre for Medium-Range Weather Forecasts, Reading, UK
² Wolfson School, Loughborough University, UK

2014 PDEs Workshop

Page 2:

T3999 (∼5km global resolution) spectral transform model IFS, operational in ∼2024, scales well into the petascale

[Figure: forecast throughput (Forecast Days / Day, 100 to 500) vs. number of cores (0 to 250,000)]

Beyond exascale (∼2030, at T7999): gridpoint method with local communication

MPDATA: horizontally unstructured, vertically structured, lon-lat domain, 2nd order in time and space. Evolutionary introduction into IFS.

Slide 1/2 PDEs 2014 — Deconinck et al.

Page 6:

Developments and Results

Development of flexible dynamic data structure

Object-oriented C++ design with Fortran interface; MPI / halo exchanges; interpolation, mesh generation, product delivery

Orographic zonal flow, meridional velocity

[Figures: strong scaling. Left: speedup (×1000) vs. number of cores (×1000, 0 to 40) for linear, MPDATA and IFS. Right: parallel efficiency (%) vs. number of cores (10² to 10⁵) for MPDATA and IFS]

T2047 (∼10km) with 137 isentropic levels – scaling to 40000 cores

Slide 2/2 PDEs 2014 — Deconinck et al.

Page 10:

A Massively-Parallel Framework for Finite-Volume Simulation of Global Atmospheric Dynamics

Willem Deconinck¹, Mats Hamrud¹, Piotr K. Smolarkiewicz¹, Joanna Szmelter², Zhao Zhang²

¹ European Centre for Medium-Range Weather Forecasts, Reading, UK
² Wolfson School, Loughborough University, UK

[email protected]

Scalability of Current IFS Dynamical Core

5km horizontal resolution with 137 levels

[Figure: forecast throughput (Forecast Days / Day, 100 to 500) vs. number of cores (0 to 250,000)]

- We will reach exascale in ∼2030
- Spectral Transform Method does not scale to exascale because of global communications
- Semi-Lagrangian time-stepping implementation is non-conservative

Alternative Dynamical Core: Unstructured Edge-based Finite Volume MPDATA

- Non-oscillatory forward-in-time scheme, capable of accommodating a wide range of scales and conservation problems
- Unstructured prismatic meshes allow irregular spatial resolution and enhancement of polar regions
- Formulation for time-dependent non-orthogonal curvilinear coordinates on the manifold

∂(Gψ)/∂t + ∇·(G v* ψ) = G R

See Szmelter and Smolarkiewicz (2010, JCP) for further discussion.
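The scheme is edge-based: finite-volume fluxes are accumulated over the edges of the dual mesh (assumed median-dual here) and used to update the nodal values. The sketch below illustrates one first-order (donor-cell) pass of this kind, which MPDATA then corrects with anti-diffusive pseudo-velocities; the mesh layout, names and flux form are illustrative assumptions, not the authors' code.

```cpp
#include <cmath>
#include <vector>

// Minimal sketch of one first-order (donor-cell) edge-based update for
//   d(G*psi)/dt + div(G*v*psi) = 0
// on a dual mesh. Array names and the flux form are assumptions for
// illustration, not the poster's actual implementation.
struct DualMesh {
    std::vector<int>    edge_node1, edge_node2;  // the two nodes an edge connects
    std::vector<double> edge_sx, edge_sy;        // dual-face normal times area, per edge
    std::vector<double> volume;                  // dual-cell volume (including G), per node
};

void donor_cell_step(const DualMesh& m,
                     const std::vector<double>& vx,   // advective velocity at nodes
                     const std::vector<double>& vy,
                     std::vector<double>& psi,        // transported field at nodes
                     double dt)
{
    std::vector<double> flux_div(psi.size(), 0.0);

    // Loop over edges: each edge contributes an upwind flux to its two end nodes.
    for (std::size_t e = 0; e < m.edge_node1.size(); ++e) {
        const int i = m.edge_node1[e];
        const int j = m.edge_node2[e];

        // Normal velocity through the dual face associated with this edge (oriented i -> j)
        const double un = 0.5 * ((vx[i] + vx[j]) * m.edge_sx[e]
                               + (vy[i] + vy[j]) * m.edge_sy[e]);

        // Donor-cell (upwind) flux: take psi from the upstream node
        const double flux = 0.5 * (un + std::abs(un)) * psi[i]
                          + 0.5 * (un - std::abs(un)) * psi[j];

        flux_div[i] += flux;   // flux leaves node i ...
        flux_div[j] -= flux;   // ... and enters node j
    }

    // Update nodal values; halo nodes would be refreshed by an MPI exchange afterwards.
    for (std::size_t n = 0; n < psi.size(); ++n)
        psi[n] -= dt * flux_div[n] / m.volume[n];
}
```

The corrective passes of MPDATA would reuse the same edge loop with anti-diffusive pseudo-velocities built from this first-order solution.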

Massively Parallel Implementation

- Multiple levels of parallelism
[Diagram: data decomposed across MPI tasks (Task 1 to Task 4); within each task the work is shared by OpenMP threads (Thread 1 to Thread 3)]

- Optimal equal-area domain decomposition
- Local computations in every subdomain
- Small halo needs to be exchanged with surrounding subdomains for distributed-memory algorithms
- Shared-memory parallelisation avoids further subdivision of subdomains
- Structured treatment of vertical direction discounts cost of horizontal indirect addressing (see the sketch below)
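The bullets above translate into a simple loop structure: OpenMP threads over the horizontally unstructured (indirectly addressed) nodes, a contiguous inner loop over the structured vertical levels, and an MPI halo exchange before neighbour data is touched. A minimal sketch follows; all names (row_start, neighbour, the columns-in-levels layout) are assumptions rather than the actual code.

```cpp
#include <vector>

// Sketch of the hybrid loop structure described above (assumed names and
// layout, not the actual IFS/MPDATA code):
//  - horizontal adjacency stored CSR-style (indirect addressing),
//  - fields stored as contiguous columns: field[node * nb_levels + level],
//  - OpenMP over nodes, structured inner loop over levels,
//  - an MPI halo exchange (not shown) refreshing boundary columns beforehand.
void average_neighbour_columns(const std::vector<int>&    row_start,   // size nb_nodes + 1
                               const std::vector<int>&    neighbour,   // flattened neighbour lists
                               const std::vector<double>& in,          // includes halo nodes
                               std::vector<double>&       out,
                               int nb_nodes, int nb_levels)
{
    // exchange_halo(in);  // MPI_Isend/MPI_Irecv of the halo columns would go here

    #pragma omp parallel for schedule(static)
    for (int n = 0; n < nb_nodes; ++n) {
        double* column = &out[n * nb_levels];
        for (int lev = 0; lev < nb_levels; ++lev) column[lev] = 0.0;

        const int begin = row_start[n], end = row_start[n + 1];
        for (int k = begin; k < end; ++k) {
            const int nb = neighbour[k];                  // one indirect lookup per edge ...
            const double* nb_column = &in[nb * nb_levels];
            for (int lev = 0; lev < nb_levels; ++lev)     // ... amortised over the whole column
                column[lev] += nb_column[lev];
        }
        if (end > begin)
            for (int lev = 0; lev < nb_levels; ++lev)
                column[lev] /= static_cast<double>(end - begin);
    }
}
```

This is one way to read the last bullet: the expensive neighbour lookup is paid once per edge and reused across all levels of a column.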

Evolutionary Introduction into IFS

- Construction of unstructured mesh using same data points as used by IFS' Spectral Transform Method
- Integration with ECMWF's infrastructure for archiving, post-processing, visualisation

Flexible Dynamic Framework

- Complex requirements for unstructured meshes
- Handling of distributed-memory parallelisation
- Mesh-specific routines: construction of dual mesh, periodicity, reading/writing fields, interpolation
- Multiple meshes to handle multigrid implementations
- Object-oriented design using C++ (a hypothetical skeleton follows below):
  - Hierarchical nesting of topological objects
  - Meshes, Field Sets, Fields
  - Multiple halo-exchange patterns
- Fortran interface allows direct access to internal data
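As a hypothetical skeleton of such an object hierarchy (none of these class or function names are the framework's actual API), a Mesh could own a FieldSet of named Fields, with a thin extern "C" layer giving Fortran direct access to the underlying storage:

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical skeleton of the hierarchy the poster describes: a Mesh owning a
// FieldSet, a FieldSet owning named Fields, and a C layer through which Fortran
// can obtain a raw pointer to the field data. Names are assumptions.
class Field {
public:
    Field(std::string name, std::size_t nb_nodes, std::size_t nb_levels)
        : name_(std::move(name)), data_(nb_nodes * nb_levels, 0.0) {}

    double*            data()       { return data_.data(); }
    std::size_t        size() const { return data_.size(); }
    const std::string& name() const { return name_; }

private:
    std::string         name_;
    std::vector<double> data_;   // contiguous [node][level] storage
};

class FieldSet {
public:
    Field& add(const std::string& name, std::size_t nb_nodes, std::size_t nb_levels) {
        auto f = std::make_shared<Field>(name, nb_nodes, nb_levels);
        fields_[name] = f;
        return *f;
    }
    Field& field(const std::string& name) { return *fields_.at(name); }

private:
    std::map<std::string, std::shared_ptr<Field>> fields_;
};

class Mesh {
public:
    explicit Mesh(std::size_t nb_nodes) : nb_nodes_(nb_nodes) {}
    std::size_t nb_nodes() const { return nb_nodes_; }
    FieldSet&   fields()         { return fields_; }

private:
    std::size_t nb_nodes_;
    FieldSet    fields_;
};

// C interface callable from Fortran via ISO_C_BINDING: hands back the raw data
// pointer and its length so Fortran can wrap it without copying.
extern "C" void fieldset__field_data(FieldSet* fs, const char* name,
                                     double** data, int* size) {
    Field& f = fs->field(name);
    *data = f.data();
    *size = static_cast<int>(f.size());
}
```

On the Fortran side, a bind(C) interface plus c_f_pointer would wrap the returned pointer as a Fortran array, which is the usual way to give Fortran direct access to C++-owned data.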

Shallow Water Equations on the Sphere

∂(GD)/∂t + ∇·(G v* D) = 0

∂(GQx)/∂t + ∇·(G v* Qx) = G ( −(g/hx) D ∂H/∂x + f Qy − (1/(GD)) (∂hx/∂y) Qx Qy )

with the analogous momentum equation for Qy.
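To make the metric bookkeeping concrete, here is a minimal numerical sketch of evaluating the right-hand side of the Qx equation as written above, at a single point. The lon-lat metric factors hx = a cos(phi), hy = a, G = hx hy and the physical constants are assumptions added for illustration; the gradients would come from the edge-based finite-volume operators.

```cpp
#include <cmath>

// Evaluate the Q_x forcing term of the shallow-water system as written above:
//   RHS = G * ( -(g/h_x) D dH/dx + f Q_y - (1/(G D)) dh_x/dy Q_x Q_y )
// The metric factors h_x = a cos(phi), h_y = a, G = h_x h_y and the constant
// values below are assumptions for this sketch.
double rhs_Qx(double lat,            // latitude phi [rad]
              double D,              // fluid depth
              double Qx, double Qy,  // momenta
              double dHdx,           // gradient of free-surface height
              double dhxdy)          // gradient of the metric factor h_x
{
    const double a     = 6371.22e3;   // Earth radius [m] (assumed value)
    const double g     = 9.80616;     // gravity [m/s^2]
    const double Omega = 7.292e-5;    // Earth rotation rate [1/s]

    const double hx = a * std::cos(lat);
    const double hy = a;
    const double G  = hx * hy;
    const double f  = 2.0 * Omega * std::sin(lat);   // Coriolis parameter

    return G * ( -(g / hx) * D * dHdx
                 + f * Qy
                 - (1.0 / (G * D)) * dhxdy * Qx * Qy );
}
```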

Meridional wind component for flow over a 2km mountain at mid-latitudes; result obtained using a Reduced Gaussian mesh with 16km resolution.

3D Hydrostatic Equations in Isentropic Coordinates

Froude number = 2, zonal wind U = 10 m/s, Brunt–Väisälä frequency = 0.04

Isentrope height perturbation at He = λz/8

Isentropes in a vertical plane at the equator

[Colour scale: 220 to 320]

Result obtained using a Reduced Gaussian mesh with 1km horizontal resolution and 40m vertical resolution, on a small planet with radius 64km.

Parallel Scaling Results

[Figures: strong scaling. Left: speedup (×1000) vs. number of cores (×1000, 0 to 40) for linear, MPDATA and IFS. Right: parallel efficiency (%) vs. number of cores (10² to 10⁵) for MPDATA and IFS]

Scaling results obtained with a 10km Reduced Gaussian mesh and 137 levels.

Acknowledgements