
Page 1: ENZO and Extreme Scale AMR for Hydrodynamic Cosmology

ENZO AND EXTREME SCALE AMR FOR HYDRODYNAMIC COSMOLOGY

Michael L. Norman, UC San Diego and SDSC
[email protected]

Page 2: ENZO and Extreme Scale AMR for Hydrodynamic Cosmology

WHAT IS ENZO?

- A parallel AMR application for astrophysics and cosmology simulations
- Hybrid physics: fluid + particle + gravity + radiation
- Block-structured AMR
- MPI or hybrid parallelism
- Under continuous development since 1994 by Greg Bryan and Mike Norman @ NCSA
- Shared memory -> distributed memory -> hierarchical memory
- C++/C/Fortran, >185,000 lines of code
- Community code in widespread use worldwide: hundreds of users, dozens of developers
- Version 2.0 @ http://enzo.googlecode.com

Page 3: ENZO and Extreme Scale AMR for Hydrodynamic Cosmology
Page 4: ENZO and Extreme Scale AMR for Hydrodynamic Cosmology

TWO PRIMARY APPLICATION DOMAINS

- Astrophysical fluid dynamics (e.g., supersonic turbulence)
- Hydrodynamic cosmology (e.g., large scale structure)

Page 5: ENZO and Extreme Scale AMR for Hydrodynamic Cosmology

ENZO PHYSICS

Physics                                      | Equations                        | Math type             | Algorithm(s)                                  | Communication
Dark matter                                  | Newtonian N-body                 | Numerical integration | Particle-mesh                                 | Gather-scatter
Gravity                                      | Poisson                          | Elliptic              | FFT, multigrid                                | Global
Gas dynamics                                 | Euler                            | Nonlinear hyperbolic  | Explicit finite volume                        | Nearest neighbor
Magnetic fields                              | Ideal MHD                        | Nonlinear hyperbolic  | Explicit finite volume                        | Nearest neighbor
Radiation transport                          | Flux-limited radiation diffusion | Nonlinear parabolic   | Implicit finite difference, multigrid solves  | Global
Multispecies chemistry                       | Kinetic equations                | Coupled stiff ODEs    | Explicit BE, implicit                         | None
Inertial, tracer, source and sink particles  | Newtonian N-body                 | Numerical integration | Particle-mesh                                 | Gather-scatter

Physics modules can be used in any combination in 1D, 2D and 3D, making ENZO a very powerful and versatile code.

Page 6: ENZO and Extreme Scale AMR for Hydrodynamic Cosmology

ENZO MESHING

- Berger-Colella structured AMR
- Cartesian base grid and subgrids
- Hierarchical timestepping (see the sketch below)
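The hierarchical timestepping can be outlined as follows. This is a minimal sketch, assuming a refinement factor of 2 in time; the names (Grid, grids_on_level, EvolveLevel, AdvanceHydro) are illustrative, not ENZO's actual API.

```cpp
// Minimal sketch of Berger-Colella hierarchical timestepping, assuming a
// refinement factor of 2 in time. All names are illustrative.
#include <vector>

struct Grid {
  void AdvanceHydro(double dt) { /* advance this patch's fields by dt */ }
};

std::vector<std::vector<Grid>> grids_on_level;  // [level][patch]

void EvolveLevel(int level, double dt) {
  for (Grid& g : grids_on_level[level])
    g.AdvanceHydro(dt);                 // every patch on this level takes dt

  if (level + 1 < (int)grids_on_level.size()) {
    // The finer level takes two sub-steps of dt/2; flux correction and the
    // projection back onto the parent grids are omitted in this sketch.
    EvolveLevel(level + 1, dt / 2);
    EvolveLevel(level + 1, dt / 2);
  }
}
```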

Page 7: ENZO and Extreme Scale AMR for Hydrodynamic Cosmology

AMR = a collection of grids (patches); each grid is a C++ object.

(Figure: nested grid patches at Level 0, Level 1, and Level 2)
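To make "each grid is a C++ object" concrete, here is a minimal sketch of the kind of state a patch object carries. The member names are assumptions for illustration; ENZO's real grid class holds many more fields and methods.

```cpp
// Illustrative-only sketch of a grid patch as a C++ object; member names are
// assumptions, not ENZO's real class layout.
#include <vector>

class Grid {
 public:
  int level = 0;                       // refinement level (0 = root grid)
  int start[3] = {0, 0, 0};            // integer extent of the patch
  int end[3]   = {0, 0, 0};
  double cell_width[3] = {0, 0, 0};    // physical cell size at this level
  std::vector<double> density;         // one of several baryon field arrays
  Grid* parent = nullptr;              // coarser grid this patch refines
  std::vector<Grid*> subgrids;         // finer patches nested inside this one
};
```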

Page 8: ENZO and Extreme Scale AMR for Hydrodynamic Cosmology

Unigrid = collection of Level 0 grid patches

Page 9: ENZO and Extreme Scale AMR for Hydrodynamic Cosmology

EVOLUTION OF ENZO PARALLELISM

Shared memory (PowerC) parallel (1994-1998)
- SMP and DSM architectures (SGI Origin 2000, Altix)
- Parallel DO across grids at a given refinement level, including the block-decomposed base grid
- O(10,000) grids

Distributed memory (MPI) parallel (1998-2008)
- MPP and SMP cluster architectures (e.g., IBM PowerN)
- Level 0 grid partitioned across processors
- Level >0 grids within a processor executed sequentially
- Dynamic load balancing by messaging grids to underloaded processors (greedy load balancing; see the sketch below)
- O(100,000) grids
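The greedy load balancing mentioned above can be sketched as repeatedly handing a grid to the least-loaded rank. This is a simplified illustration under assumed data structures (GridInfo, a per-grid work estimate); a real balancer would also weigh the cost of messaging a grid's data between processors.

```cpp
// Sketch of greedy load balancing: assign the heaviest grids first, each to
// the currently least-loaded MPI rank. Simplified and illustrative only.
#include <algorithm>
#include <vector>

struct GridInfo { int owner; double work; };  // work = estimated cost of the grid

void GreedyBalance(std::vector<GridInfo>& grids, int nranks) {
  std::vector<double> load(nranks, 0.0);
  std::sort(grids.begin(), grids.end(),
            [](const GridInfo& a, const GridInfo& b) { return a.work > b.work; });
  for (GridInfo& g : grids) {
    int target = static_cast<int>(
        std::min_element(load.begin(), load.end()) - load.begin());
    g.owner = target;        // in practice this triggers sending the grid's data
    load[target] += g.work;
  }
}
```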

Page 10: ENZO and Extreme Scale AMR for Hydrodynamic Cosmology
Page 11: ENZO and Extreme Scale AMR for Hydrodynamic Cosmology

(Figure: projection of refinement levels; 160,000 grid patches at 4 refinement levels)

Page 12: ENZO and Extreme Scale AMR for Hydrodynamic Cosmology

1 MPI task per processor.

Task = a Level 0 grid patch and all associated subgrids, processed sequentially across and within levels.

Page 13: ENZO and Extreme Scale AMR for Hydrodynamic Cosmology

EVOLUTION OF ENZO PARALLELISM

Hierarchical memory (MPI+OpenMP) parallel (2008-)
- SMP and multicore cluster architectures (SUN Constellation, Cray XT4/5)
- Level 0 grid partitioned across shared-memory nodes/multicore processors
- Parallel DO across grids at a given refinement level within a node
- Dynamic load balancing less critical because of larger MPI task granularity (statistical load balancing)
- O(1,000,000) grids

Page 14: ENZO and Extreme Scale AMR for Hydrodynamic Cosmology

N MPI tasks per SMP, M OpenMP threads per task.

Task = a Level 0 grid patch and all associated subgrids, processed concurrently within levels and sequentially across levels.

Each grid is handled by an OpenMP thread (see the sketch below).
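A minimal sketch of the hybrid scheme described on this and the previous slide: within one MPI task, an OpenMP parallel loop advances that task's grids at a given level concurrently. The types and the trailing barrier are illustrative stand-ins; the real code performs a proper inter-task boundary exchange.

```cpp
// Sketch of MPI+OpenMP hybrid parallelism over grids at one refinement level.
// Grid and AdvanceHydro are illustrative names.
#include <mpi.h>
#include <vector>

struct Grid { void AdvanceHydro(double dt) { /* advance fields by dt */ } };

void AdvanceLevel(std::vector<Grid>& my_grids, double dt) {
  // One OpenMP thread works on one grid at a time.
  #pragma omp parallel for schedule(dynamic)
  for (int i = 0; i < (int)my_grids.size(); ++i)
    my_grids[i].AdvanceHydro(dt);

  // Ghost-zone / boundary exchange with neighboring MPI tasks would follow;
  // a barrier stands in for it in this sketch.
  MPI_Barrier(MPI_COMM_WORLD);
}
```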

Page 15: ENZO and Extreme Scale AMR for Hydrodynamic Cosmology

ENZO ON PETASCALE PLATFORMS

ENZO ON CRAY XT5: 1% OF THE 6400^3 SIMULATION

- Non-AMR 6400^3, 80 Mpc box
- 15,625 (25^3) MPI tasks, 256^3 root grid tiles
- 6 OpenMP threads per task
- 93,750 cores (see the arithmetic check below)
- 30 TB per checkpoint/re-start/data dump
- >15 GB/sec read, >7 GB/sec write
- Benefit of threading: reduce MPI overhead & improve disk I/O
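A quick, illustrative check that the task and core counts above are self-consistent, assuming the root grid is split evenly into 25 tiles per dimension:

```cpp
// Illustrative consistency check of the 6400^3 run layout quoted above.
#include <cstdio>

int main() {
  long long root_dim  = 6400;                  // root grid is 6400^3 cells
  long long tiles_dim = 25;                    // 25 tiles per dimension
  long long tile_dim  = root_dim / tiles_dim;  // 256 cells per tile per dimension
  long long tasks     = tiles_dim * tiles_dim * tiles_dim;  // 15,625 MPI tasks
  long long cores     = tasks * 6;             // 6 OpenMP threads per task
  std::printf("tile = %lld^3, tasks = %lld, cores = %lld\n", tile_dim, tasks, cores);
  // Prints: tile = 256^3, tasks = 15625, cores = 93750 (matches the slide).
}
```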

Page 16: ENZO and Extreme Scale AMR for Hydrodynamic Cosmology

ENZO ON PETASCALE PLATFORMS

ENZO ON CRAY XT5: 10^5 SPATIAL DYNAMIC RANGE

- AMR 1024^3, 50 Mpc box, 7 levels of refinement (see the arithmetic check below)
- 4096 (16^3) MPI tasks, 64^3 root grid tiles
- 1 to 6 OpenMP threads per task (4096 to 24,576 cores)
- Benefit of threading: thread count increases with memory growth; reduces replication of grid hierarchy data
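An illustrative check of the quoted 10^5 spatial dynamic range, assuming refinement by a factor of 2 per level:

```cpp
// Illustrative check of the ~10^5 spatial dynamic range, assuming factor-2
// refinement per level.
#include <cstdio>

int main() {
  long long root_dim = 1024;                 // 1024^3 root grid
  int levels = 7;                            // 7 levels of refinement
  long long effective = root_dim << levels;  // 1024 * 2^7 = 131,072
  std::printf("effective 1D resolution = %lld (~1e5)\n", effective);
}
```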

Page 17: ENZO and Extreme Scale AMR for Hydrodynamic Cosmology

Using MPI+threads to access more RAM as the AMR calculation grows in size

Page 18: ENZO and Extreme Scale AMR for Hydrodynamic Cosmology

ENZO ON PETASCALE PLATFORMS

ENZO-RHD ON CRAY XT5: COSMIC REIONIZATION

- Including radiation transport: 10x more expensive
- LLNL Hypre multigrid solver dominates run time
- Near-ideal scaling to at least 32K MPI tasks
- Non-AMR 1024^3, 8 and 16 Mpc boxes
- 4096 (16^3) MPI tasks, 64^3 root grid tiles

Page 19: ENZO and Extreme Scale AMR for Hydrodynamic Cosmology

BLUE WATERS TARGET SIMULATION: RE-IONIZING THE UNIVERSE

- Cosmic reionization is a weak-scaling problem: large volumes at a fixed resolution to span the range of scales
- Non-AMR 4096^3 with ENZO-RHD
- Hybrid MPI and OpenMP; SMT and SIMD tuning
- 128^3 to 256^3 root grid tiles
- 4-8 OpenMP threads per task
- 4-8 TB per checkpoint/re-start/data dump (HDF5) (see the estimate below)
- In-core intermediate checkpoints (?)
- 64-bit arithmetic, 64-bit integers and pointers
- Aiming for 64-128 K cores
- 20-40 M hours (?)
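A rough, illustrative estimate of the 4-8 TB checkpoint size, assuming 64-bit values and on the order of 8-16 stored field/particle arrays (the array count is an assumption, not from the slide):

```cpp
// Rough estimate of checkpoint size for a non-AMR 4096^3 run, assuming 64-bit
// values; the 8-16 array count is an illustrative assumption.
#include <cstdio>

int main() {
  double cells = 4096.0 * 4096.0 * 4096.0;      // ~6.9e10 cells
  double tb_per_field = cells * 8.0 / 1.0e12;   // ~0.55 TB per 64-bit field
  std::printf("one array   : %.2f TB\n", tb_per_field);
  std::printf("8-16 arrays : %.1f-%.1f TB\n", 8 * tb_per_field, 16 * tb_per_field);
  // ~4.4-8.8 TB, consistent with the 4-8 TB per dump quoted above.
}
```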

Page 20: ENZO and Extreme Scale AMR for Hydrodynamic Cosmology

PETASCALE AND BEYOND

- ENZO's AMR infrastructure limits scalability to O(10^4) cores
- We are developing a new, extremely scalable AMR infrastructure called Cello: http://lca.ucsd.edu/projects/cello
- ENZO-P will be implemented on top of Cello to scale to ...

Page 21: ENZO and Extreme Scale AMR for Hydrodynamic Cosmology

CURRENT CAPABILITIES: AMR VS TREECODE

Page 22: ENZO and Extreme Scale AMR for Hydrodynamic Cosmology

CELLO EXTREME AMR FRAMEWORK: DESIGN PRINCIPLES

- Hierarchical parallelism and load balancing to improve localization
- Relax global synchronization to a minimum
- Flexible mapping between data structures and concurrency
- Object-oriented design
- Build on the best available software for fault-tolerant, dynamically scheduled concurrent objects (Charm++)

Page 23: ENZO and Extreme Scale AMR for Hydrodynamic Cosmology

CELLO EXTREME AMR FRAMEWORK: APPROACH AND SOLUTIONS

1. Hybrid replicated/distributed octree-based AMR approach, with novel modifications to improve AMR scaling in terms of both size and depth (see the sketch after this list);
2. patch-local adaptive time steps;
3. flexible hybrid parallelization strategies;
4. hierarchical load balancing approach based on actual performance measurements;
5. dynamical task scheduling and communication;
6. flexible reorganization of AMR data in memory to permit independent optimization of computation, communication, and storage;
7. variable AMR grid block sizes while keeping parallel task sizes fixed;
8. address numerical precision and range issues that arise in particularly deep AMR hierarchies;
9. detecting and handling hardware or software faults during run-time to improve software resilience and enable software self-management.
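Item 1 refers to an octree-based AMR organization. The fragment below is a generic sketch of such a tree node, not Cello's actual data structure; all names are illustrative.

```cpp
// Generic sketch of an octree-based AMR node (not Cello's actual structures;
// all names are illustrative).
#include <array>
#include <memory>

struct TreeNode {
  int level = 0;                                   // depth in the tree
  std::array<std::unique_ptr<TreeNode>, 8> child;  // all null for a leaf block

  bool is_leaf() const { return child[0] == nullptr; }

  void refine() {                                  // split one block into 8 children
    for (auto& c : child) {
      c = std::make_unique<TreeNode>();
      c->level = level + 1;
    }
  }
};
```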

Page 24: ENZO and Extreme Scale AMR for Hydrodynamic Cosmology

IMPROVING THE AMR MESH: PATCH COALESCING

Page 25: ENZO and Extreme Scale AMR for Hydrodynamic Cosmology

IMPROVING THE AMR MESH: TARGETED REFINEMENT

Page 26: ENZO and Extreme Scale AMR for Hydrodynamic Cosmology

IMPROVING THE AMR MESH: TARGETED REFINEMENT WITH BACKFILL

Page 27: ENZO and Extreme Scale AMR for Hydrodynamic Cosmology

CELLO SOFTWARE COMPONENTS

http://lca.ucsd.edu/projects/cello

Page 28: ENZO and Extreme Scale AMR for Hydrodynamic Cosmology

ROADMAP

Page 29: ENZO and Extreme Scale AMR for Hydrodynamic Cosmology

ENZO RESOURCES

- Enzo website (code, documentation): http://lca.ucsd.edu/projects/enzo
- 2010 Enzo User Workshop slides: http://lca.ucsd.edu/workshops/enzo2010
- yt website (analysis and vis.): http://yt.enzotools.org
- Jacques website (analysis and vis.): http://jacques.enzotools.org/doc/Jacques/Jacques.html

Page 30: ENZO and Extreme Scale AMR for Hydrodynamic Cosmology

BACKUP SLIDES

Page 31: ENZO and Extreme Scale AMR for Hydrodynamic Cosmology

GRID HIERARCHY DATA STRUCTURE

(Figure: a three-level grid hierarchy with root grid (0,0), subgrid (1,0) at Level 1, and subgrids (2,0) and (2,1) at Level 2)

Page 32: ENZO and extreme scale  amr  for hydrodynamic cosmology

(0)

(1,0) (1,1)

(2,0) (2,1) (2,2) (2,3) (2,4)

(3,0) (3,1) (3,2) (3,4) (3,5) (3,6) (3,7)

(4,0) (4,1) (4,3) (4,4)

Dept

h (le

vel)

Breadth (# siblings)

Scaling the AMR grid hierarchy in depth and breadth

Page 33: ENZO and Extreme Scale AMR for Hydrodynamic Cosmology

1024^3, 7-LEVEL AMR STATS (see the work-weighting sketch below)

Level | Grids   | Memory (MB) | Work = Mem*(2^level)
0     |     512 |     179,029 |   179,029
1     | 223,275 |     114,629 |   229,258
2     |  51,522 |      21,226 |    84,904
3     |  17,448 |       6,085 |    48,680
4     |   7,216 |       1,975 |    31,600
5     |   3,370 |       1,006 |    32,192
6     |   1,674 |         599 |    38,336
7     |     794 |         311 |    39,808
Total | 305,881 |     324,860 |   683,807
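The Work column can be read as memory weighted by 2^level: with factor-2 hierarchical timestepping, level L is advanced 2^L times per root-level timestep. The sketch below reproduces the table's total under that assumption.

```cpp
// Reading of the Work column: with factor-2 hierarchical timestepping, level L
// is advanced 2^L times per root-level step, so memory is weighted by 2^L.
#include <cstdio>

int main() {
  double mem_mb[8] = {179029, 114629, 21226, 6085, 1975, 1006, 599, 311};
  double total = 0;
  for (int level = 0; level < 8; ++level)
    total += mem_mb[level] * (1 << level);
  std::printf("total work ~ %.0f (table total: 683,807)\n", total);
}
```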

Page 34: ENZO and extreme scale  amr  for hydrodynamic cosmology

real grid object

virtual grid object

grid metadataphysics data

grid metadata

Current MPI Implementation

Page 35: ENZO and Extreme Scale AMR for Hydrodynamic Cosmology

SCALING AMR GRID HIERARCHY

- Flat MPI implementation is not scalable because the grid hierarchy metadata is replicated in every processor
- For very large grid counts, this dominates the memory requirement (not the physics data!); see the estimate below
- Hybrid parallel implementation helps a lot: hierarchy metadata is now replicated only in every SMP node instead of every processor
- We would prefer fewer SMP nodes (8192-4096) with bigger core counts (32-64) (= 262,144 cores)
- Communication burden is partially shifted from MPI to intranode memory accesses
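A back-of-the-envelope sketch of why replicated metadata dominates at large grid counts; the per-grid metadata size used here is an assumed, illustrative value.

```cpp
// Back-of-the-envelope estimate of replicated hierarchy metadata under flat
// MPI; the per-grid metadata size is an assumed illustrative value.
#include <cstdio>

int main() {
  long long n_grids = 305881;            // grid count from the AMR stats table
  long long meta_bytes_per_grid = 4096;  // assumed ~4 KB of metadata per grid
  double gb = n_grids * meta_bytes_per_grid / 1.0e9;
  std::printf("replicated metadata per MPI process: ~%.1f GB\n", gb);
  // Flat MPI: every process pays this, independent of how many grids it owns.
  // Hybrid MPI+OpenMP: one copy per SMP node, so the replication factor drops
  // from the number of cores to the number of nodes.
}
```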

Page 36: ENZO and Extreme Scale AMR for Hydrodynamic Cosmology

CELLO EXTREME AMR FRAMEWORK

- Targeted at fluid, particle, or hybrid (fluid + particle) simulations on millions of cores
- Generic AMR scaling issues:
  - Small AMR patches restrict available parallelism
  - Dynamic load balancing
  - Maintaining data locality for deep hierarchies
  - Re-meshing efficiency and scalability
  - Inherently global multilevel elliptic solves
  - Increased range and precision requirements for deep hierarchies