TRANSCRIPT
© Crown copyright Met Office
Weather prediction and climate modelling at Exascale: Introducing the Gung-Ho project
R. Ford, M.J. Glover, D. Ham, C.M. Maynard, S. Pickles, G. Riley and N. Wood
… and the weather for the conference is
The primitive equations
• Rather complicated
• Equations of motion for density, humidity, pressure, temperature and wind, mass conservation and thermodynamics
• Partial Differential Equations – no general solution
• Approximate, discrete, numerical methods
When a problem in pure or in applied mathematics is "solved" by numerical computation, errors, that is, deviations of the numerical "solution" obtained from the true, rigorous one, are unavoidable. Such a "solution" is therefore meaningless, unless there is an estimate of the total error in the above sense.
J. von Neumann and H. H. Goldstine, Bull. Amer. Math. Soc. 53 (1947) 1021–99
Physics Parameterisations
[Figure: schematic of the parameterised physics: vegetation model, short-wave and long-wave radiation, clouds, convection, precipitation, and surface processes.]
Challenge: Demands on computer power
[Figure: required computing resources grow along three axes: resolution, complexity, and duration and/or ensemble size.]
Parallel Programming just got harder!
June 21, 2012
Moore's Law: more cores, not faster cores
Some cores are more equal than others: NUMA (AMD Interlagos)
Heterogeneous architectures: accelerators (NVidia Fermi)
Data parallel: one MPI task per core. Will this scale to 2³⁰ heterogeneous cores?
Main memory is receding from view
The Unified Model - software
Obs, Var, UM (+), IO server, ensembles and verification – more than 2 million lines of code
+ Coupled models excluding ocean and sea ice
UM used for both NWP and Climate Models
Now ~ 25 years old
Fortran90 (some F77 features remain)
Parallelism expressed via MPI
Some lower-level OpenMP (retro-fit)
IO server
MPI tasks dedicated to IO
Dramatic improvement in IO performance.
Problems with a long-lat grid
At 25km resolution, grid spacing near poles = 75m
At 10km reduces to 12m!
3rd Gen dynamical core (ENDGame) improved scaling
Weak CFL ∆t↓ as ∆x↓ (implicit scheme)
Data parallel in 2-D
Globally Uniform Next Generation Highly Optimized
GungHo! - Working Together Harmoniously
5 Year Project
“To research, design and develop a new dynamical core suitable for operational, global and regional, weather and climate simulation on massively parallel computers of the size envisaged over the coming 20 years.”
To address (inter alia):
What should replace the lat-lon grid?
How to transport material on that grid?
Is an implicit time scheme viable/desirable on such computers?
Split into two phases:
2 years “research”
3 years “development”
Bath, Exeter, Imperial, Leeds, Manchester, Reading – NERC
STFC Daresbury and Met Office
Choice of mesh
New dynamical core
Scalable to a very large number of elements
Choice of elements and mesh not fixed
Support for irregular elements in the horizontal
Structured mesh: neighbours known by construction (stencil); direct memory access
Unstructured mesh: neighbours unknown; look-up table; indirect memory access
Derivative operators
Consequences for memory access
a(i) = c*b(nb_list(i))

do k = 1, nlevel
   a(k,i) = c*b(k,nb_list(i))
end do
Indirect memory access destroys data locality: poor cache utilisation, poor performance.
The mesh is likely to be structured in the vertical and unstructured in the horizontal: a columnar mesh with the vertical index (k) innermost (contiguous in memory).
Cache versus oversubscribed concurrency
Conventional CPU: cache-based memory model. Will node-level cache coherency continue?
GPU: thread teams (warps) with fast switching. Naively, each thread owns an individual element; coalesced (vector) memory access then needs the horizontal index (i) contiguous in memory.
ILP - vectorisation
Vectorisation is not limited to GPU-type machines: CPUs have SIMD units (SSE: 2 64-bit words; AVX: 4; SIMD on Intel MIC: 8).
A complex issue: Pickles and Porter (2012) compared two data layouts for 3D arrays in NEMO (an ocean code) and found that different operations favour different orderings; it is possible to vectorise some operations with either layout.
Vector friendly: layer contiguous
Cache friendly: column contiguous
Overview
[Architecture diagram: an (MPI) program manages parallelism (MPI, threads) and drives the sequence:
read_(partitioned)_grid()
call infrastructure_init() – data and comms init, including halo-exchange init and coupling-exchange init
call model_init() – e.g. allocation, non-in-place coupling, model data set-up (field descriptions, coupling requirements with 'tag'-based access)
model timestep control: call model_run()
call model_finalise()
The model science code exposes init(), run() and finalise(), reads model input data and grid data, and sits on infrastructure (e.g. MCT utils) providing halo_exchange() and put()/get() for in-place coupling.]
There will be several models and programs
Science Model
Computational Science (CS) workpackage - proposal
Met Office Software development project
Separation of concerns
Computational science performance code
Scientific and CS, performance code
Fortran 2003 + MPI + directives (OpenMP). Do not exclude PGAS models (CAF) or single-sided comms.
Kernel API
The algorithm layer calls kernels; parallel or serial versions are implemented in the PSy layer. The PSy layer calls the compute for generic kernels through a defined interface.
Hand code versus auto-generated
This misses opportunities for data re-use between kernels. Special kernels, e.g. Helmholtz, consist of smaller kernels which share halo exchanges.
Infrastructure Model
Define an infrastructure API to be used by models, implementation neutral. Use existing infrastructure software and hide the implementations behind the API, e.g. ESMF for halo exchange, MCT/OASIS for coupling to the ocean model (NEMO).
Data Model
• Model has a local data view + halos
• Data belongs to objects – fields
• Data objects contain:
• function space information – DoFs of the field
• topological entity
• Algorithm layer cannot access raw DoF arrays
• Enables mesh/topological entity/function space to be changed without large code changes
• Unpacked as arrays before passing to a kernel (variable or fixed data size for the kernel?)
• State object contains internal GH data
Summary
NWP and climate models are complex problems and a key scientific driver for Exascale systems. GungHo is a complete redesign for the UK Met Office: mathematical formulation, algorithms, numerical analysis, software.
A personal view: what about the hardware? Is there scope for co-design (a wider project?), software and hardware working together harmoniously.