The U.S. DOE Accelerated Climate Modeling for Energy Project Robert Jacob April 22, 2015 Third Workshop on Coupling Technologies for Earth System Models


Page 1: The U.S. DOE Accelerated Climate Modeling for Energy Project · 2015-04-27 · across ACME science/tech teams. – Developer’s test suites – Continuous Integration with Jenkins

The U.S. DOE Accelerated Climate Modeling for Energy Project

Robert Jacob April 22, 2015 Third Workshop on Coupling Technologies for Earth System Models

Page 2

Argonne National Laboratory

§  $675M operating budget
§  3,200 employees
§  1,450 scientists and engineers
§  750 Ph.D.s

Page 3

ACME in a nutshell…

A new U.S. climate modeling effort led by the U.S. Department of Energy Office of Biological and Environmental Research

or…

“A collaboration among the DOE national laboratories (and a few other institutions) to develop and apply the most complete, leading-edge climate and Earth system models for the most challenging and demanding climate-change research problems and DOE mission needs while efficiently using DOE Leadership Computing Facilities.”

Page 4

Why should DOE in particular be interested?

Page 5

Leading DOE machines at our last meeting

•  IBM Blue Gene/Q System
   –  48 racks
   –  49,152 nodes
   –  786 TB of memory
   –  Peak flop rate: 10 PF

•  Cray XK7 System
   –  299,008 cores
   –  18,688 NVIDIA Kepler K20x GPUs
   –  710 TB of memory
   –  Peak flop rate: 27 PF

Page 6

Upcoming DOE machines

•  Intel/Cray Aurora (ALCF)
   –  50,000 Xeon Phi nodes (“Knights Hill”)
   –  Approx 150PF
   –  Production in 2019

•  IBM/NVIDIA Summit (OLCF)
   –  3,400 Power9 nodes
   –  Multiple NVIDIA Volta GPUs per node
   –  Approx 150PF
   –  Production in 2018

•  Intel/Cray Cori (NERSC)
   –  9,300 Xeon Phi nodes (“Knights Landing”)
   –  Approx 30PF
   –  Production in 2017

Page 7

ACME Project Goals

•  a series of prediction and simulation experiments addressing scientific questions and mission needs;

•  a well documented and tested, continuously advancing, evolving, and improving system of model codes that comprise the ACME Earth system model;

•  the ability to use effectively leading (and “bleeding”) edge computational facilities soon after their deployment at DOE national laboratories; and

•  an infrastructure to support code development, hypothesis testing, simulation execution, and analysis of results.

Page 8

Climate Science Drivers for ACME

Water cycle: How do the hydrological cycle and water resources interact with the climate system on local to global scales?

Biogeochemistry: How do biogeochemical cycles interact with global climate change?

Cryosphere: How do rapid changes in cryospheric systems interact with the climate system?

Page 9

ACME: let specific science questions drive development

Atmosphere: More accurate simulation of aerosols, clouds, wind, and precipitation

Land: More accurate simulation of terrestrial feedbacks from more complex carbon, nutrient and water cycles

Ocean: Introduction of multi-resolution dynamics to more accurately simulate ocean heat uptake and water masses

Sea ice: Recast numerics to focus resolution in polar regions, and add icebergs, sea ice strength, and snow physics

Land ice: Addition of the first realistic, dynamic coupled ice-sheet model

[Diagram labels: Driver, Questions, Hypotheses, Experiments, Requirements, Development]

Page 10

ACME Roadmap

Page 11

ACME size/start:

•  No new funding: 6-7 existing DOE lab climate projects combined into one program
•  8 U.S. national laboratories and 6 partner institutions
•  85 researchers working ¼ time or more
•  Total effort ~43 FTE
•  Started from beta tag of CESM1.3
   –  Using cpl7/MCT for coupler.
•  Started July 1, 2014. 3 years initially.

Page 12

ACME organization

ACME Council
Dave Bader, Chair
Executive Committee: W. Collins, M. Taylor, R. Jacob, P. Jones, P. Rasch, P. Thornton, D. Williams
Ex Officio: J. Edmonds, J. Hack, W. Large, E. Ng

Executive Committee
Chair: D. Bader
Chief Scientist: William Collins
Chief Computational Scientist: Mark Taylor

Project Engineer: Renata McCoy

Groups (each with its own task leaders):

•  Coupled Simulation Group: Dave Bader, Bill Collins, Mark Taylor
•  Workflow Group: Dean Williams, Katherine Evans
•  Software Eng./Coupler Group: Robert Jacob, Andrew Salinger
•  Performance/Algorithms Group: Patrick Worley, Hans Johansen
•  Land Group: Peter Thornton, William Riley
•  Atmosphere Group: Philip Rasch, Shaocheng Xie
•  Ocean/Ice Group: Philip Jones, Todd Ringler


Page 14

ACME Workflow group: Developing a comprehensive approach to enable large scale climate science

•  The end-to-end workflow integrates
   –  a model run simulation manager (AKUNA)
   –  a data publishing/sharing/archiving infrastructure (ESGF)
   –  a secure data transport (Globus)
   –  analysis/diagnostics/visualization tools (UV-CDAT)
   –  provenance capture framework (ProvEn) to improve reproducibility and tracking

Page 15

ACME Performance Group: Simulation Throughput with a target of 5 simulated years/day

•  Performance Monitoring and Analysis
•  Internode and I/O: Load balancing, communication algorithm optimization, computation/communication overlap, exploiting additional concurrency
•  On-node: Accelerators, threading, memory management, programming models
•  Next Generation Architectures: NERSC NESAP, OLCF-4 CAAR, ALCF-3 ESP
•  Work with DOE computer scientists in other projects involving performance and fast numerical libraries.
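The 5 simulated years/day target can be turned into a wall-clock budget per model step. The sketch below shows that arithmetic only; the 30-minute model time step and the 365-day calendar are assumptions for illustration and are not stated in the slides.

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365  # assumption: no-leap calendar


def steps_per_simulated_year(model_dt_seconds: float) -> float:
    """Number of model steps needed to advance one simulated year."""
    return DAYS_PER_YEAR * SECONDS_PER_DAY / model_dt_seconds


def sypd(wall_seconds_per_step: float, model_dt_seconds: float) -> float:
    """Simulated years per wall-clock day for a given cost per model step."""
    wall_per_year = steps_per_simulated_year(model_dt_seconds) * wall_seconds_per_step
    return SECONDS_PER_DAY / wall_per_year


def max_wall_seconds_per_step(target_sypd: float, model_dt_seconds: float) -> float:
    """Largest wall-clock cost per step that still meets the throughput target."""
    return SECONDS_PER_DAY / (target_sypd * steps_per_simulated_year(model_dt_seconds))


if __name__ == "__main__":
    dt = 1800.0  # hypothetical 30-minute model time step
    budget = max_wall_seconds_per_step(5.0, dt)
    print(f"wall-clock budget per step: {budget:.3f} s")
    print(f"check: {sypd(budget, dt):.2f} simulated years/day")
```

Under these assumptions every full model step, including coupling, has to finish in roughly one second of wall-clock time, which is why coupler communication cost matters for the target.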

Page 16

ACME Software Engineering/Coupler Group

•  Best Practices: Common tools, methodologies adopted across ACME science/tech teams.
   –  Developer’s test suites
   –  Continuous Integration with Jenkins
   –  Repository set up and workflow

•  I/O: Parallel I/O at ACME model scales, increased use of in-situ diagnostics.

•  Modularity and configurability: Modular interfaces for all new models, runtime configurability

•  Coupling: Coupler performance, coupler/main design, and MCT

[Figure: ACME git workflow]

Page 17

Model Coupling Toolkit

•  A set of Fortran90 datatypes and functions for building parallel coupled models.
   –  With or without a coupler (which MCT doesn’t provide).
   –  All models are assumed to be parallel with MPI.
   –  2-sided send/recv model for moving data similar to MPI
   –  Support for online parallel interpolation using offline-calculated weights.
   –  Model registry, decomposition descriptors (by numbering dofs), distributed data type, communication tables.
   –  Functions for time accumulation, spatial averaging and merging.
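Two of the items above, interpolation with offline-calculated weights and time accumulation, can be illustrated in a few lines. This is a serial Python sketch of the ideas, not MCT's Fortran API: the names `interpolate`, `Accumulator`, and the toy `WEIGHTS` are hypothetical. In a parallel setting each process would apply only the weight rows for the destination points it owns.

```python
# Offline-calculated weights stored as sparse triples: (dst_index, src_index, weight).
# Hypothetical toy weights: 4 source points averaged pairwise onto 2 destination points.
WEIGHTS = [(0, 0, 0.5), (0, 1, 0.5), (1, 2, 0.5), (1, 3, 0.5)]


def interpolate(src, weights, n_dst):
    """Apply precomputed weights: dst[i] = sum over j of w[i, j] * src[j]."""
    dst = [0.0] * n_dst
    for i, j, w in weights:
        dst[i] += w * src[j]
    return dst


class Accumulator:
    """Running time average of a field between coupling steps."""

    def __init__(self, n):
        self.total = [0.0] * n
        self.steps = 0

    def add(self, field):
        self.total = [t + f for t, f in zip(self.total, field)]
        self.steps += 1

    def average(self):
        if self.steps == 0:
            raise ValueError("nothing accumulated yet")
        return [t / self.steps for t in self.total]


if __name__ == "__main__":
    acc = Accumulator(2)
    # Two model steps of a source field, interpolated and then accumulated.
    for src_field in ([1.0, 3.0, 5.0, 7.0], [3.0, 5.0, 7.0, 9.0]):
        acc.add(interpolate(src_field, WEIGHTS, 2))
    print(acc.average())
```

Because the weights are computed offline, the online step reduces to a sparse matrix-vector product.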

Page 18

Model Coupling Toolkit

•  At last meeting, Feb, 2013, this was still news
•  Now, a little embarrassing

[Figure: MCT website as of 4/22/15]

Page 19

A lot has been happening with MCT…

[Figure: GitX display of recent MCT repository history]

Page 20

Model Coupling Toolkit, v2.9

•  Moved repository from Argonne git server to github.com
   –  https://github.com/MCSclimate/MCT

•  New Features to aid in studying Router initialization
   –  GSMap and MCTWorld print().
      •  Print contents to ascii file for later reading
   –  Router init internal timers
      •  Invoked with optional string argument to Router init.
   –  RouterTest.F90 - test program which reads in output GSMaps and MCTWorld info and builds a Router.
      •  Will build on same number of procs and same decomposition as original model.
   –  Great for creating coupling benchmarks!
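To make concrete what Router initialization has to work out, here is a simplified Python sketch. It assumes each decomposition is given as a list mapping every global dof to its owning rank, which is a simplification chosen for illustration; `build_router` is a hypothetical name, not an MCT routine.

```python
from collections import defaultdict


def build_router(src_owner, dst_owner):
    """Return {(src_rank, dst_rank): [global dof indices]} for two decompositions.

    src_owner[g] and dst_owner[g] give the rank that owns global dof g
    in the sending and receiving model respectively.
    """
    if len(src_owner) != len(dst_owner):
        raise ValueError("both decompositions must cover the same dofs")
    table = defaultdict(list)
    for g, (s, d) in enumerate(zip(src_owner, dst_owner)):
        table[(s, d)].append(g)
    return dict(table)


if __name__ == "__main__":
    src = [0, 0, 0, 1, 1, 1]  # sending model: contiguous blocks on 2 ranks
    dst = [0, 1, 0, 1, 0, 1]  # receiving model: round-robin on 2 ranks
    for (s, d), dofs in sorted(build_router(src, dst).items()):
        print(f"rank {s} -> rank {d}: dofs {dofs}")
```

A benchmark in the spirit of RouterTest.F90 would time this table construction for saved decompositions at the original process count; the sketch only shows the shape of the result.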

Page 21

Model Coupling Toolkit, v2.9

•  Support for NAG 6.0
•  Support for Mac builds
•  Bug fixes including ones found by valgrind (many thanks to NCAR’s Sean Santos for above)

•  mpi-serial 2.0
   –  a small single-node MPI library.
   –  For programs that assume MPI and users who don’t want to install a full MPI-library on their laptop/desktop. Doesn’t require mpirun.
   –  Not a stub-library: MPI_Send/Recv really copies data.
   –  In 2.0
      •  Many more MPI datatypes/functions added.
      •  Self-contained build system (autoconf)
   –  Developed by ALCF’s Raymond Loy

•  MCT 2.9 release is imminent.

Page 22

Model Coupling Toolkit: Development process

•  Use gitworkflows on github.
   –  Anyone with a free github account can create a fork (git clone) of the MCT repo.
   –  Make a branch and develop your cool new feature.
   –  Submit a “Pull Request” to have your feature included in master

•  A few developers make branches directly in MCT main repo.
   –  Change gets reviewed and tested (by me) before inclusion.
   –  Branch gets merged to master.

•  Discussion of bugs/proposed new features with github issues.

•  Documentation on github wiki.

Page 23

Model Coupling Toolkit: Future plans

•  Near term
   –  Router initialization benchmarks based on ACME science cases (0.25 degree atmosphere, 1/10th degree ocean, 10K cores)
   –  Improve scaling/timing of Router init and Rearranger/MCT_Send/Recv communication for ACME cases on LCFs. Releasing any MCT improvements.

•  Long term
   –  MCT-MOAB
      •  Talked about before. Mesh Oriented Database
      •  Tried using MOAB’s Fortran interface to build MCT datatypes/functions. Not very satisfying.
      •  New plan (notion): MCT-MOAB in C with MOAB’s C/C++ interface and Fortran interfaces defined using F2003 standard.

Page 24

Exascale is coming

[Figure: Top500 list and projected performance. (top500.org)]

#1: Tianhe-2 (33PF)
#2: Titan (DOE OLCF 17PF)
#5: Mira (DOE ALCF 8.5PF)

Page 25

Exascale impact on software (including coupling software)

•  Massive in-node parallelism (exponential growth)
   –  Programmer cannot hand-pick work granularity
   –  Deeper memory hierarchy
   –  “Communication is expensive, FLOPS are free”

•  Power as a managed system resource
   –  Turning on/off components
   –  Selecting algorithms for speed within power envelope
   –  Adjusting arithmetic precision
   –  Potentially adjusting fault protection

•  Dynamic parallelism and work decomposition
•  Fault tolerance actively managed in software at many levels
•  Architecture organization:
   –  Heterogeneous cores
   –  Specialized functional units
   –  In-situ NVRAM

Page 26

Coupling at Exascale: possible problems

•  Coupler (for Earth sub-system interfaces) is almost entirely 2D
   –  Limited amount of parallelism
   –  Also not a huge number of flops compared to full model
   –  Not a huge memory demand except for datatypes that grow with number of cores.

•  But coupler does lots of memory movement
   –  Moving data between model’s native data type and coupler data type.
   –  Moving data from one model’s processors to another’s

Page 27

Coupling at Exascale: to do

•  More parallelism through more components executing concurrently
   –  Ensembles
   –  Different models

•  Reduce memory movement
   –  One data type across all model components?
   –  Co-located decompositions.

Page 28

Co-located decomposition

[Figure: two panels of grid regions labeled 1, 2, 3. Left panel: “2 models: unrelated decompositions”. Right panel: “2 models: related decompositions”.]
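The benefit of related decompositions can be quantified by counting how many dofs have to leave their process when data passes from one model to the other. The sketch below assumes both models run on the same set of processes, so equal labels mean the same process; the 3x3 grids and their owner labels are hypothetical and are not read off the slide's figure.

```python
def off_process_fraction(owner_a, owner_b):
    """Fraction of dofs whose owning process differs between two models."""
    if len(owner_a) != len(owner_b):
        raise ValueError("both decompositions must cover the same dofs")
    moved = sum(1 for a, b in zip(owner_a, owner_b) if a != b)
    return moved / len(owner_a)


if __name__ == "__main__":
    # Hypothetical 3x3 grid flattened row by row; each value is the owning process.
    model_a = [1, 1, 1, 2, 2, 2, 3, 3, 3]
    unrelated_b = [2, 3, 1, 2, 3, 1, 3, 3, 2]  # unrelated decomposition
    related_b = list(model_a)                  # co-located decomposition

    print("unrelated:", off_process_fraction(model_a, unrelated_b))
    print("related:  ", off_process_fraction(model_a, related_b))
```

With co-located decompositions the fraction drops to zero, so the exchange becomes a local copy instead of inter-process communication.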

Page 29

More information

http://climatemodeling.science.energy.gov/projects/accelerated-climate-modeling-energy

http://www.mcs.anl.gov/mct/