convergence of computation and data workflows · pdf fileconvergence of computation and data...
TRANSCRIPT
Convergence of computation and data workflowsIS-ENES Workshop on Workflows and Metadata Generation
Lisbon, PORTUGAL
V. Balaji
NOAA/GFDL and Princeton University
28 September 2016
V. Balaji ([email protected]) Convergence 28 September 2016 1 / 35
Amy Langenhorst 1977-2016
Principal Developer of the FMS Runtime Environment FRE.
V. Balaji ([email protected]) Convergence 28 September 2016 2 / 35
Outline
1 Hardware DirectionsGPUs, MICs, ARMInexact computingEnergy cost of algorithms and data movement
2 A Graph ApproachDirected Acyclic GraphsConvergence of computation and dataFault tolerance across the workflow
3 Metadata and provenanceDevelopment and production workflowStatistical and scientific reproducibility
4 Summary
V. Balaji ([email protected]) Convergence 28 September 2016 3 / 35
Outline
1 Hardware DirectionsGPUs, MICs, ARMInexact computingEnergy cost of algorithms and data movement
2 A Graph ApproachDirected Acyclic GraphsConvergence of computation and dataFault tolerance across the workflow
3 Metadata and provenanceDevelopment and production workflowStatistical and scientific reproducibility
4 Summary
V. Balaji ([email protected]) Convergence 28 September 2016 4 / 35
Power-8 with NVLink
Figure courtesy IBM.V. Balaji ([email protected]) Convergence 28 September 2016 5 / 35
KNL Overview
Figure courtesy Intel.
V. Balaji ([email protected]) Convergence 28 September 2016 6 / 35
The inexorable triumph of commodity computing
... means ARM? From The Platform, Hemsoth (2015).V. Balaji ([email protected]) Convergence 28 September 2016 7 / 35
Irreproducible Computing, Inexact Hardware
Figure 1 from Düben et al, Phil. Trans. A, 2016. Which bits can weallow to be “inexactly” flipped? Lorenz 96 as canonical test case ofnon-linearity and chaos.
V. Balaji ([email protected]) Convergence 28 September 2016 8 / 35
Irreproducible Computing, Inexact Hardware
Figure 2 from Düben et al, Phil. Trans. A, 2016.
V. Balaji ([email protected]) Convergence 28 September 2016 9 / 35
COSMO: energy to solution
T. SchulthessENES HPC Workshop, Hamburg, March 17, 2014
Cray XE6 (Nov. 2011)
Cray XK7 (Nov. 2012)
Cray XC30 (Nov. 2012)
Cray XC30 hybrid (GPU) (Nov. 2013)
6.0
4.5
3.0
1.5
Current production code
1.75x
New HP2C funded code
1.41x
1.49x
2.51x
2.64x
6.89x
Energy to solution (kWh / ensemble member)
!6
3.93x
Figure courtesy Thomas Schulthess, CSCS.
V. Balaji ([email protected]) Convergence 28 September 2016 10 / 35
JPSY comparison across ESMs
Model Machine Resol SYPD CHSY JPSYCM4 gaea/c2 1.2⇥108 4.5 16000 8.92⇥108
CM4 gaea/c3 1.2⇥108 10 7000 3.40⇥108
Comparative measures of capability (SYPD), capacity (CHSY),and energy cost (JPSY) per “unit of science”.Can you have codes that are “slower but greener”? Algorithmsthat are “less accurate but more eco-friendly”?From Balaji et al (2016), in review at GMDD.http://goo.gl/Nj1c2N
V. Balaji ([email protected]) Convergence 28 September 2016 11 / 35
Workflows for the exascale
Billion-way concurrency still a daunting challenge for everyone: nomagic bullets anywhere to be found.Exotic hardware is on the way; this is quite likely the lastgeneration of conventional hardware. Computing is likely tobecome irreproducible.Software investment paid back in power savings (Schulthess).Energy to solution will become key metric.More threading needs to be found: to fit 1018 op/s within a 1 MWpower budget, an operation should be 1 pJ: data movement is⇠10 pJ to main memory; ⇠100 pJ on network!DARPA: commodity improvements will slow to a trickle within 10years: go back to specialized computing?DOE: double investment in exascale.
V. Balaji ([email protected]) Convergence 28 September 2016 12 / 35
A network of compute and data nodes
FRE and other elements in the GFDL modeling environment managethe complex scheduling of jobs across a distributed computingresource.V. Balaji ([email protected]) Convergence 28 September 2016 13 / 35
... a global network of compute and data nodes
Workflow task is to minimize data flow across the global network.Figure courtesy IPSL.
V. Balaji ([email protected]) Convergence 28 September 2016 14 / 35
Outline
1 Hardware DirectionsGPUs, MICs, ARMInexact computingEnergy cost of algorithms and data movement
2 A Graph ApproachDirected Acyclic GraphsConvergence of computation and dataFault tolerance across the workflow
3 Metadata and provenanceDevelopment and production workflowStatistical and scientific reproducibility
4 Summary
V. Balaji ([email protected]) Convergence 28 September 2016 15 / 35
Examples of DAG parallelism
DAG example: Cholesky Inversion
Source: Stan Tomov, ICL, University of Tennessee, Knoxville
DAG = Directed Acyclic Graph Can IFS use this technology?
ECMWF Seminar 2013
Figure courtesy George Mozdzynski, ECMWF.
V. Balaji ([email protected]) Convergence 28 September 2016 16 / 35
SWARM for DAGs
Jeffrey et al, IEEE Micro 2016.
V. Balaji ([email protected]) Convergence 28 September 2016 17 / 35
KNL Overview
Figure courtesy Intel.
V. Balaji ([email protected]) Convergence 28 September 2016 18 / 35
SWARM for DAGs: hardware implementation
Jeffrey et al, IEEE Micro 2016.
V. Balaji ([email protected]) Convergence 28 September 2016 19 / 35
NVRAM will blur distinction between memory andfilesystem
Hemsoth, 2014: http://goo.gl/3ZeOXtV. Balaji ([email protected]) Convergence 28 September 2016 20 / 35
NVRAM will blur distinction between memory andfilesystem
Hemsoth, 2014: http://goo.gl/3ZeOXt
V. Balaji ([email protected]) Convergence 28 September 2016 21 / 35
Work avoidance
Work avoidance: find minimal path to complete outputmake: traverse tree backwards; state is the filesystem state.cylc/chaco: traverse tree forwards; each task formulated as ano-op if outputs exist; fred contains state including tasks in flight.
V. Balaji ([email protected]) Convergence 28 September 2016 22 / 35
Work avoidance
Work avoidance: find minimal path to complete outputmake: traverse tree backwards; state is the filesystem state.cylc/chaco: traverse tree forwards; each task formulated as ano-op if outputs exist; fred contains state including tasks in flight.
V. Balaji ([email protected]) Convergence 28 September 2016 23 / 35
Work avoidance
Work avoidance: find minimal path to complete outputmake: traverse tree backwards; state is the filesystem state.cylc/chaco: traverse tree forwards; each task formulated as ano-op if outputs exist; fred contains state including tasks in flight.
V. Balaji ([email protected]) Convergence 28 September 2016 24 / 35
Use of cross-network message queues
MQ Cluster
MQ Apps
API
DB’s
IPS
L
IPS
L
IPSL User @ Browser | Command Line | Desktop
json
TGCC
MQ Relay
IDRIS
MQ Relay
CINES
MQ Relay
CNRM
MQ Relay
XXX
MQ Relay
msg msgmsgmsgmsg
IPSL have tested handling O(105) enqueues/dequeues per day.Google reports Rabbit service of O(106) per second! (more thanall SMS/WhatsApp/etc) https://goo.gl/GBlAAzAMQP: active messages containing instructions as well as data.
Figure courtesy Sébastien Denvil, IPSL.V. Balaji ([email protected]) Convergence 28 September 2016 25 / 35
Outline
1 Hardware DirectionsGPUs, MICs, ARMInexact computingEnergy cost of algorithms and data movement
2 A Graph ApproachDirected Acyclic GraphsConvergence of computation and dataFault tolerance across the workflow
3 Metadata and provenanceDevelopment and production workflowStatistical and scientific reproducibility
4 Summary
V. Balaji ([email protected]) Convergence 28 September 2016 26 / 35
Development and production workflowModel developers have different workflow priorities and requirements.
Production workflow benefits from coherence and similarity acrossruns.Development workflow requires extremely fine-grained access tocode, namelists, scripts. A lot of rules broken:
Favored IDE/UI is called vi!source code edits in user directoriesinput file modifications on the fly
Analysis workflow requires random access to local disk:inspiration-driven rather than industrial strengthStill benefit from regression testing harness: multiple compilers,platformsEmulators? e.g SoftFloathttp://www.jhauser.us/arithmetic/SoftFloat.htmlProvenance and metadata requirements relaxed for developmentworkflow.
V. Balaji ([email protected]) Convergence 28 September 2016 27 / 35
Statistical comparison across model versions
Live monitoring of model runs. From GFDL MDT Tracking Page...
V. Balaji ([email protected]) Convergence 28 September 2016 28 / 35
Statistical comparison across model versions
Are two runs the same or different? What difference in inputs isresponsible for the disrepancy? From GFDL MDT Tracking Page...V. Balaji ([email protected]) Convergence 28 September 2016 29 / 35
Multi-model ensembles for climate projection
Figure SPM.7 from the IPCC AR5 Report. Can be interpreted as themost general and rigorous test of scientific reproducibility.
V. Balaji ([email protected]) Convergence 28 September 2016 30 / 35
Multi-model ensembles for climate projection
Critically depends on software, metadata, and data standards: theEarth System Grid Federation (http://esgf.org): a 3 PBfederated archive.Workflows for replication, versioning, subsetting, QC, citation.
V. Balaji ([email protected]) Convergence 28 September 2016 31 / 35
Outline
1 Hardware DirectionsGPUs, MICs, ARMInexact computingEnergy cost of algorithms and data movement
2 A Graph ApproachDirected Acyclic GraphsConvergence of computation and dataFault tolerance across the workflow
3 Metadata and provenanceDevelopment and production workflowStatistical and scientific reproducibility
4 Summary
V. Balaji ([email protected]) Convergence 28 September 2016 32 / 35
Summary
Community is struggling to move from synchronous toasynchronous data flow on the coming hardware platforms.Hardware is blurring the line between cache and memory,memory and storage (deep hierarchy).Energy to solution as a benchmark across the entire workflow.Expressing entire workflow as graphs (DAGs).
Maximize parallelism across the entire graphMinimize graph traversal during fault recovery
Accommodating different needs for development and productionworkflows: relax provenance and metadata requirements duringdevelopment.Irreproducible computing: include statistical consistency testinginto workflow.
CMIP8 is going to be awesome!
V. Balaji ([email protected]) Convergence 28 September 2016 33 / 35
Summary
Community is struggling to move from synchronous toasynchronous data flow on the coming hardware platforms.Hardware is blurring the line between cache and memory,memory and storage (deep hierarchy).Energy to solution as a benchmark across the entire workflow.Expressing entire workflow as graphs (DAGs).
Maximize parallelism across the entire graphMinimize graph traversal during fault recovery
Accommodating different needs for development and productionworkflows: relax provenance and metadata requirements duringdevelopment.Irreproducible computing: include statistical consistency testinginto workflow.CMIP8 is going to be awesome!
V. Balaji ([email protected]) Convergence 28 September 2016 34 / 35