

Next Generation Computing: Needs for the Atmospheric Sciences at NCAR

May 15, 2017, National Academy of Sciences, Washington DC

Anke Kamrath, [email protected]
Interim Director, Computing and Information Systems Laboratory
Director, Operations and Services Division, CISL
National Center for Atmospheric Research (NCAR)

* Thanks to Jim Hurrell, Rich Loft, Dave Hart, J-F Lamarque, and Ben Cash for their contributions to this slide deck.


Overview

• Community Earth System Model (CESM)
• State of Computing at NCAR Today and Future Needs
  – NCAR's Data Intensive Computing Environment
  – Computing Roadmap
• The Challenges Ahead: "The Wall"


CESM2 – Community Earth System Model

• Fully coupled, community, global climate model
• ~60% of NCAR HPC usage
• A model of models: more than 1.5 million lines of code
• ~2x more expensive than CESM1 due to the addition of more science
• Stringent verification criteria
• Community governance via working groups (Atmosphere, Biogeochemistry, Chemistry Climate, Climate Variability & Change, Land Model, Ice, Ocean, Paleoclimate, Polar Climate, Societal Dimensions, Software Engineering, Whole Atmosphere)
• Used by hundreds of scientists around the world
• Single code base across desktop, departmental, and HPC systems

Figure: Study of regional refinement in CAM6 (AMIP) with the Spectral Element (SE) and MPAS dynamical cores (A. Gettelman and C. Zarzycki)


CESM Development Process (~5 years): Where to add code refactoring, optimization, parallelization, or even rearchitecting?

The development cycle, illustrated here for the Land Model Working Group (LMWG):

• Model release (CESM1/CLM4)
• Detailed model assessment (identify strengths and weaknesses)
• LMWG members develop parameterizations or add features
• Present ideas and results at LMWG meetings; publish papers
• Plans for the next (and next-next) model version discussed at LMWG meetings
• Build and test a beta version of the offline model
• Evaluate competing parameterizations
• Finalize and test within CESM
• Document; control integrations; model release (CESM2/CLM5)
• Use the model for scientific studies

Observations feed into the assessment and evaluation steps throughout the cycle.


Supercomputing Environment at NCAR

• Cheyenne: SGI ICE, 145K Xeon Broadwell cores, 4K nodes, 331 TB RAM, EDR InfiniBand; 5.34 PFLOPS peak
• Yellowstone: 1.5 PFLOPS peak
• Geyser and Caldera: data analysis and visualization (DAV) clusters
• GLADE: central disk resource, 37 PB, 90/200 GB/s, GPFS
• HPSS archive: 175-190 PB capacity, 80 PB stored, 12.5 GB/s, >20 PB/yr growth
• High-bandwidth, low-latency HPC and I/O networks: EDR/FDR InfiniBand and 40-Gb Ethernet
• Data transfer services (RDA, Climate Data Service): 40 GB/s over 40-Gb Ethernet
• External connections: remote visualization, partner sites, XSEDE sites


5-Year Target: "Data Friendly" Architecture

Figure: Target architecture linking a supercomputer (O(1M) cores, O(1 PB) DRAM, node islands with an NVRAM crossconnect, O(10 s) checkpoint), an NVRAM super-cache, data analysis and visualization resources (O(10^2) analysis nodes, viz/FPGA nodes, web servers, SSD storage), and collections/projects storage on disk and tape, with successive storage tiers sized at roughly 5x, 20x, and 100x DRAM capacity.


The Wall

• Computing wall
• Data wall
• Complexity wall
• Efficiency wall


Computing Wall

• Processor trends (source: Karl Rupp)
  – More transistors
  – More cores (∝ transistors)
  – Flat clock speeds and power
  – Slowing per-thread performance
  – Increasing flops per byte of memory bandwidth
    • Sunway processors: ~25 flops/byte
    • KNL processors: ~7 flops/byte
• Climate computing is not well matched to these trends
  – Climate applications are state-heavy with low computational intensity
    • ESMs typically run at <1 flop/byte over the entire application (e.g., the MOM6 barotropic solver: 0.11 flops/byte); see the roofline sketch below
  – Physics code is branchy, hard to vectorize, and has divides and load imbalances
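To make the flops-per-byte mismatch concrete, here is a minimal roofline-style sketch comparing the arithmetic intensities quoted above against two notional node types; the peak-flops and memory-bandwidth figures in it are illustrative assumptions, not vendor specifications.

```python
# Back-of-envelope roofline check: is a kernel compute-bound or bandwidth-bound?
# Machine "balance" = peak flops / peak memory bandwidth (flops per byte).
# The peak and bandwidth figures below are illustrative assumptions, not vendor specs.

def attainable_tflops(peak_tflops, bandwidth_tbs, intensity_flops_per_byte):
    """Roofline model: performance is capped either by compute or by memory traffic."""
    return min(peak_tflops, bandwidth_tbs * intensity_flops_per_byte)

machines = {
    # name: (peak TFLOP/s per node, memory bandwidth in TB/s per node) -- assumed values
    "high-balance node (~7 flops/byte)": (3.0, 0.43),
    "balanced node (~2 flops/byte)": (1.0, 0.5),
}

kernels = {
    # Arithmetic intensities quoted on this slide.
    "typical ESM application (<1 flop/byte)": 0.5,
    "MOM6 barotropic solver": 0.11,
}

for mach, (peak, bw) in machines.items():
    for kern, ai in kernels.items():
        perf = attainable_tflops(peak, bw, ai)
        print(f"{kern} on {mach}: {perf:.2f} TFLOP/s ({100 * perf / peak:.1f}% of peak)")
```

Even a perfectly tuned kernel at 0.11 flops/byte is capped at a few percent of peak on a high-balance node, which anticipates the efficiency numbers on the following slides.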


Difficult Road to Exascale


Efficiency Wall?

Chart: NCAR's Yellowstone floating-point efficiency, January 2013 through May 2014, plotting daily average TFLOP/s against daily average percent floating-point efficiency.

1.57% lifetime-average application floating-point efficiency
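As a quick sanity check (not a reproduction of the monitoring data behind the chart), the snippet below converts the lifetime-average efficiency into sustained throughput using the 1.5 PFLOPS Yellowstone peak quoted earlier.

```python
# Sanity check: what does 1.57% floating-point efficiency mean in absolute terms?
peak_tflops = 1500.0          # Yellowstone peak, ~1.5 PFLOPS (from the environment slide)
lifetime_efficiency = 0.0157  # lifetime-average application FP efficiency (this slide)

sustained_tflops = peak_tflops * lifetime_efficiency
print(f"Sustained throughput: ~{sustained_tflops:.0f} TFLOP/s of {peak_tflops:.0f} TFLOP/s peak")
# Roughly 24 TFLOP/s averaged over the whole workload.
```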


Algorithmic ways around the wall?

• Bigger timesteps
  – Implicit integration
  – Parallel-in-time methods
• Fewer points
  – Adaptive mesh refinement
  – Numerical schemes with higher effective resolution
• Model emulators
  – Neural network encoders
• Reduced precision (see the sketch below)
  – FPGA-based computation

"The energy liberated by not performing overly exact calculations could be put to more productive use." (Tim Palmer)
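To illustrate the reduced-precision idea independently of any particular FPGA implementation, here is a minimal sketch comparing double and single precision for a hypothetical temperature-like field; the array shape and the diagnostic are assumptions chosen only for illustration.

```python
# Toy illustration of the reduced-precision idea: halving the word size halves the
# memory footprint and traffic; the question is whether the answer stays acceptable.
# The field here is a hypothetical temperature-like array, not model output.
import numpy as np

rng = np.random.default_rng(0)
field64 = 288.0 + rng.standard_normal((720, 1440))   # hypothetical global field, float64
field32 = field64.astype(np.float32)                 # same field at half the bytes

print(f"float64 field: {field64.nbytes / 1e6:.1f} MB")
print(f"float32 field: {field32.nbytes / 1e6:.1f} MB")

# Compare a simple diagnostic (global mean) computed in each precision.
mean64 = field64.mean()
mean32 = field32.mean(dtype=np.float32)
print(f"relative difference in global mean: {abs(mean64 - float(mean32)) / abs(mean64):.2e}")
```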


Co-design/partnership between vendors, government, and ESM developers?

– Vendors are focused on the analytics and deep-learning market; simulation is secondary.
– Community-level coordination is needed.
– Purpose-built HPC for ESMs: more thread concurrency, more memory bandwidth, lower latency to memory, configurability, fast I/O, resilience, efficient reductions, etc.
– Similar needs exist in other communities: geoscience (tectonics and magma flow), energy (power station design), aerospace (CFD), biomechanics (blood flow), solar physics, and astrophysics.

Examples of co-design efforts: CoDEx (Co-Design for Exascale), CRAFT, MDGrape-3 (RIKEN).


Data Wall

• Data volumes are exploding and the underlying technologies are changing rapidly. A radical rethinking of how data are produced, stored, analyzed, visualized, shared, and understood needs to occur.
• A shift from "computing campaigns" to "data campaigns"
• Technology challenges and opportunities
  – Storage costs are outpacing compute costs
  – New and emerging capabilities in the memory-storage hierarchy
• Science challenges (see the back-of-envelope sketch below)
  – Ensembles of 50-100 members with data assimilation of billions of observations are just around the corner.
  – CMIP6 could exceed 30 PB: how do we tackle data management and model intercomparison at this scale?
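The back-of-envelope sketch below shows how quickly ensemble output reaches the petabyte scale; apart from the 50-100 member range quoted above, every number in it (years, fields, output frequency, grid size) is a hypothetical assumption, not an NCAR figure.

```python
# Hypothetical back-of-envelope: raw output volume for a large ensemble experiment.
# Every number below is an illustrative assumption, not an NCAR figure.
members       = 100            # ensemble size in the range mentioned above (50-100)
years         = 50             # simulated years per member (assumed)
fields        = 200            # number of 2-D output fields (assumed)
writes_per_yr = 365            # daily output (assumed)
grid_points   = 1440 * 720     # ~0.25-degree global grid (assumed)
bytes_per_val = 4              # single-precision output

total_bytes = members * years * fields * writes_per_yr * grid_points * bytes_per_val
print(f"~{total_bytes / 1e15:.1f} PB of raw output before any compression")
```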


NCAR Strategies for Reducing Data Friction

• Provide "big data" community data analysis and visualization (DAV)
  – CMIP Analysis Platform
  – NCAR Research Data Archive (RDA) server-side subsetting and processing (processed 20 PB in 2016, delivered 0.2 PB)
• Better data management policies
  – Create a "storage economy" with policies that drive appropriate trade-offs between saving data and recomputing it
• Lossy compression (see the sketch below)
  – Seeing ~80% reductions in size for some data types
  – With new data policies, more interest from scientists
• New and better storage technologies and hierarchy
  – Seeing 20x speedups of some workflows (disk-to-disk vs. SSD-to-SSD)
• Parallel climate analytics and automated workflow software
  – A focus on end-to-end workflow performance is vital.
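The sketch below illustrates one simple flavor of lossy compression, mantissa bit-rounding followed by a standard lossless compressor; it is only a sketch of the general idea and is not the specific method behind the 80% figure reported above.

```python
# Illustrative lossy compression: discard low-order mantissa bits, then apply a
# standard lossless compressor. A sketch of the idea, not NCAR's production method.
import zlib
import numpy as np

def round_mantissa(a32, keep_bits):
    """Zero the low-order mantissa bits of a float32 array (float32 has 23 mantissa bits)."""
    mask = np.uint32((0xFFFFFFFF << (23 - keep_bits)) & 0xFFFFFFFF)
    return (a32.view(np.uint32) & mask).view(np.float32)

rng = np.random.default_rng(1)
data = (288.0 + rng.standard_normal((720, 1440))).astype(np.float32)  # hypothetical field

raw      = data.tobytes()
lossless = zlib.compress(raw, level=6)
lossy    = zlib.compress(round_mantissa(data, keep_bits=10).tobytes(), level=6)

print(f"raw:                {len(raw) / 1e6:.1f} MB")
print(f"lossless zlib:      {len(lossless) / 1e6:.1f} MB")
print(f"bit-rounded + zlib: {len(lossy) / 1e6:.1f} MB")
```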


Complexity Wall

Diagram: more money buys more compute, and more compute can be spent on greater complexity, higher resolution, or larger ensemble size; but only faster threads and/or better code efficiency improve SYPD (simulated years per day). A minimal cost sketch follows below.
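Here is a minimal, illustrative cost model of that trade-off; the cubic resolution scaling, the linear complexity and ensemble factors, and the baseline SYPD are assumptions for illustration only, not measurements.

```python
# Minimal, illustrative cost model of the "complexity wall" trade-off.
# Assumed scalings, not measurements:
#   * refining the horizontal grid by a factor r costs ~r**3 (two horizontal
#     dimensions plus a proportionally shorter timestep),
#   * added process complexity and ensemble members scale cost linearly,
#   * SYPD of a single member improves only with per-thread speed or code efficiency.

def relative_cost(res_refinement, complexity_factor, ensemble_size):
    """Core-hours relative to a baseline run (refinement=1, complexity=1, 1 member)."""
    return res_refinement**3 * complexity_factor * ensemble_size

def sypd(base_sypd, thread_speedup, efficiency_gain, res_refinement, complexity_factor):
    """Simulated years per day for one ensemble member at fixed parallel scaling."""
    return base_sypd * thread_speedup * efficiency_gain / (res_refinement**3 * complexity_factor)

# Example: a 2x finer grid with 20% more process complexity, baseline 5 SYPD (assumed).
print(f"relative cost: {relative_cost(2, 1.2, 1):.1f}x the baseline core-hours")
print(f"SYPD with no thread or efficiency gains: {sypd(5.0, 1.0, 1.0, 2, 1.2):.2f} (baseline 5.00)")
```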


Conclusions

• Models are getting exponentially more complex. Without appropriate engineering and governance, the ability to produce codes that run well is in jeopardy.
• Current ESMs achieve <2% of peak performance.
• Low code efficiency means less science.
• Optimizing codes while science is being added is like tuning up a race car while it's driving.
• ESMs will not speed up (in SYPD) without real effort. New ideas are required to break through this barrier:
  – Exploring radically new algorithmic approaches
  – New types of parallelism
  – System co-design with vendors
• We must invest heavily in next-generation codes to keep science moving forward.
• Investment (pay, pipeline, diversity, training, etc.) in workforce, workforce, and workforce is vital.


Questions? Comments?


Some References

• Carman, Jessie, Thomas Clune, Francis Giraldo, Mark Govett, Brian Gross, Anke Kamrath, Tsengdar Lee, David McCarren, John Michalakes, Scott Sandgathe, and Tim Whitcomb, 2017: Position Paper on High Performance Computing Needs in Earth System Prediction. National Earth System Prediction Capability. https://doi.org/10.7289/V5862DH3
• Next Generation Earth System Prediction: Strategies for Subseasonal to Seasonal Forecasts, April 2016, National Academies (NRC report)
• NSF RFI CI submissions from NCAR
  – HPC community
  – CESM community


Progression of Hero-Climate Runs

• Best atmospheric resolution on Yellowstone was 16 km; on Cheyenne, <10 km
• Simultaneous 4x increase in ocean resolution, to 0.25 degree
• ~2x increase in years of integration (note: years integrated are sharply limited by computing and storage constraints)
• A further 3x may allow for:
  – 5 km maximum atmospheric resolution
  – 0.1-degree ocean?
  – 1-1.5 PB of analyzable output
• Probably two supercomputer generations away from explicit representation of convection (see the rough scaling sketch below)

Yellowstone (1.5 PFLOPS, 2013) / Project MINERVA:
  Atmosphere: IFS cy38r1; spectral truncation TL319 (64 km), TL639 (32 km), TL1279 (16 km); 91 levels, top = 1 Pa
  Ocean: NEMO v3.0/3.1; 1-degree horizontal resolution; 42 levels

Cheyenne (5.34 PFLOPS, 2017) / Project METIS:
  Atmosphere: IFS cy43r1; spectral truncation TCO199 (64 km), TCO639 (16 km), TCO1279 (9 km); 91 levels, top = 1 Pa
  Ocean: NEMO v3.4.1; horizontal resolution 1 degree (TCO199) and 0.25 degree (TCO639, TCO1279); 42 levels (TCO199) and 75 levels (TCO639, TCO1279)
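The rough scaling below sketches why explicit convection remains a couple of machine generations away. The cubic cost exponent and the choice of target resolution are illustrative assumptions; the result comes out broadly consistent with the "probably two generations" estimate above, especially once algorithmic and code-efficiency gains are factored in.

```python
# Rough scaling behind the "two supercomputer generations" estimate above.
# Assumptions (illustrative): cost grows ~ (dx_old / dx_new)**3 from the two
# horizontal dimensions plus a proportionally shorter timestep; the per-machine
# gain is taken from the Yellowstone -> Cheyenne peak ratio in the table above.
import math

def refinement_cost(dx_from_km, dx_to_km):
    return (dx_from_km / dx_to_km) ** 3

cost = refinement_cost(9.0, 3.0)      # Cheyenne-era ~9 km down to convection-permitting ~3 km
gain_per_generation = 5.34 / 1.5      # Yellowstone (1.5 PFLOPS) -> Cheyenne (5.34 PFLOPS)

generations = math.log(cost) / math.log(gain_per_generation)
print(f"~{cost:.0f}x more compute for ~3 km; roughly {generations:.1f} machine "
      f"generations at a ~{gain_per_generation:.1f}x peak gain per machine")
```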