Blue Waters and Resource Management – Now and in the Future
Dr. William Kramer, National Center for Supercomputing Applications, University of Illinois


DESCRIPTION

In this presentation from Moabcon 2013, Bill Kramer from NCSA presents: Blue Waters and Resource Management - Now and in the Future. Watch the video of this presentation: http://insidehpc.com/?p=36343

TRANSCRIPT

Page 1: Blue Waters and Resource Management - Now and in the Future

Blue Waters and Resource Management – Now and in the Future

Dr. William Kramer, National Center for Supercomputing Applications, University of Illinois

Page 2: Blue Waters and Resource Management - Now and in the Future

Science & Engineering on Blue Waters

Blue Waters will enable advances in a broad range of science and engineering disciplines. Examples include:

•  Molecular Science
•  Weather & Climate Forecasting
•  Earth Science
•  Astro*
•  Health


Page 3: Blue Waters and Resource Management - Now and in the Future

What Have I Been Doing?


Page 4: Blue Waters and Resource Management - Now and in the Future

Science areas, number of teams, and codes (the original matrix also marks each area against method categories: Structured Grids, Unstructured Grids, Dense Matrix, Sparse Matrix, N-Body, Monte Carlo, FFT, PIC, Significant I/O):

•  Climate and Weather: 3 teams. Codes: CESM, GCRM, CM1/WRF, HOMME
•  Plasmas/Magnetosphere: 2 teams. Codes: H3D(M), VPIC, OSIRIS, Magtail/UPIC
•  Stellar Atmospheres and Supernovae: 5 teams. Codes: PPM, MAESTRO, CASTRO, SEDONA, ChaNGa, MS-FLUKSS
•  Cosmology: 2 teams. Codes: Enzo, pGADGET
•  Combustion/Turbulence: 2 teams. Codes: PSDNS, DISTUF
•  General Relativity: 2 teams. Codes: Cactus, Harm3D, LazEV
•  Molecular Dynamics: 4 teams. Codes: AMBER, Gromacs, NAMD, LAMMPS
•  Quantum Chemistry: 2 teams. Codes: SIAL, GAMESS, NWChem
•  Material Science: 3 teams. Codes: NEMOS, OMEN, GW, QMCPACK
•  Earthquakes/Seismology: 2 teams. Codes: AWP-ODC, HERCULES, PLSQR, SPECFEM3D
•  Quantum Chromodynamics: 1 team. Codes: Chroma, MILC, USQCD
•  Social Networks: 1 team. Code: EPISIMDEMICS
•  Evolution: 1 team. Code: Eve
•  Engineering/System of Systems: 1 team. Codes: GRIPS, Revisit
•  Computer Science: 1 team

Page 5: Blue Waters and Resource Management - Now and in the Future

Blue Waters Computing System

[System diagram.] Key figures:

•  Aggregate memory: 1.5 PB
•  Sonexion online storage: 26 usable PB at >1 TB/sec
•  Spectra Logic near-line storage: 300 usable PB at 120+ Gb/sec
•  External servers connected at 100 GB/sec
•  10/40/100 Gb Ethernet switch and IB switch
•  100-300 Gbps WAN

Page 6: Blue Waters and Resource Management - Now and in the Future

[Diagram: I/O and network architecture.] Recoverable elements:

•  Online disk: >25 PB (/home, /project, /scratch) behind 4 Dell esLogin servers, delivering 1,200 GB/s over the Cray HSN via LNET routers
•  Near-line store: 380 PB raw tape plus 1.2 PB of usable disk, behind 50 Dell 720 Near Line servers
•  28 Dell 720 IE (import/export) servers
•  Gateways: LNET(s), rSIP GW, network GW
•  Interconnects: Cray HSN, FDR IB, QDR IB, 40 GbE, 10 GbE, FC8
•  Protocols: LNET, TCP/IP (10 GbE), SCSI (FCP), GridFTP (TCP/IP)
•  40 GbE switch and core FDR/QDR IB Extreme switches; 440 Gb/s Ethernet from the site network to the LAN/WAN
•  Other link rates shown: 300 GB/s, 100 GB/s (several links), 55 GB/s

All storage sizes are given as the amount usable. Rates are always usable/measured sustained rates.

Page 7: Blue Waters and Resource Management - Now and in the Future

Cray XE6/XK7 - 276 Cabinets

•  XE6 compute nodes: 5,688 blades, 22,640 nodes, 362,240 FP (Bulldozer) cores, 724,480 integer cores, 4 GB per FP core
•  XK7 GPU nodes: 768 blades, 3,072 (4,224) nodes, 24,576 (33,792) FP cores, 4,224 GPUs, 4 GB per FP core
•  Gemini fabric (HSN)
•  Sonexion online storage: 25+ usable PB in 36 racks, reached over the InfiniBand fabric through 582 LNET router nodes
•  Near-line storage: 300+ usable PB, via HPSS data mover nodes and NCSAnet
•  Service nodes: DSL 48 nodes, resource manager (MOM) 64 nodes, H2O login 4 nodes, import/export nodes, network GW 8 nodes, RSIP 12 nodes, BOOT 2 nodes, SDB 2 nodes, reserved 74 nodes, management node
•  Boot RAID, boot cabinet, SMW, esServers cabinets, 10/40/100 Gb Ethernet switch
•  Supporting systems: LDAP, RSA, Portal, JIRA, Globus CA, Bro, test systems, Accounts/Allocations, CVS, Wiki
•  Cyber protection IDPS; SCUBA; NPCF

Page 8: Blue Waters and Resource Management - Now and in the Future

BW Focus on Sustained Performance

•  Blue Waters and NSF are focusing on sustained performance in a way few have been before.
•  Sustained is the computer's useful, consistent performance on the broad range of applications that scientists and engineers use every day.
  •  Time to solution for a given amount of work is the important metric, not hardware ops/s.
  •  Sustained performance (and therefore the tests) includes the time to read data and write the results.
•  NSF's Track-1 call emphasized sustained performance, demonstrated on a collection of application benchmarks (application + problem set).
  •  Not just simplistic metrics (e.g. HP Linpack).
  •  Applications include both Petascale applications (which effectively use the full machine, solving scalability problems for both compute and I/O) and applications that use a large fraction of the system.
•  The Blue Waters project focus is on delivering sustained Petascale performance to computational and data-focused applications.
  •  Develop tools, techniques, and samples that exploit all parts of the system.
  •  Explore new tools, programming models, and libraries to help applications get the most from the system.
•  By the Sustained Petascale Performance metric, Blue Waters sustained >1.3 petaflops across 12 different time-to-solution application tests.
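For illustration, here is a minimal sketch of how a composite sustained-performance figure could be computed from per-application time-to-solution runs. It assumes each benchmark carries a reference operation count and that the composite is a geometric mean of the per-application rates; the actual SPP definition, reference counts, and application list belong to the Blue Waters project and may differ.

```python
# Hypothetical sketch: composite sustained performance from per-application
# time-to-solution measurements. The reference op counts, run times, and the
# use of a geometric mean are illustrative assumptions, not the official SPP rules.
import math

# (application, reference floating-point ops for the benchmark problem, measured seconds)
# All numbers below are made up for illustration.
runs = [
    ("app_a", 2.0e18, 1500.0),
    ("app_b", 6.0e17, 400.0),
    ("app_c", 1.2e18, 900.0),
]

def sustained_pflops(ref_ops: float, seconds: float) -> float:
    """Sustained rate for one benchmark in petaflop/s; 'seconds' is the full
    time to solution, so I/O time is included."""
    return ref_ops / seconds / 1.0e15

per_app = [sustained_pflops(ops, t) for _, ops, t in runs]

# Composite as a geometric mean so no single application dominates.
composite = math.exp(sum(math.log(x) for x in per_app) / len(per_app))

for (name, _, _), rate in zip(runs, per_app):
    print(f"{name}: {rate:.3f} PF/s sustained")
print(f"composite (geometric mean): {composite:.3f} PF/s")
```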

Page 9: Blue Waters and Resource Management - Now and in the Future

View from the Blue Waters Portal


As of April 2, 2013, Blue Waters has delivered over 1.3 Billion core-hours to S&E Teams

Page 10: Blue Waters and Resource Management - Now and in the Future

Usage Breakdown – Jan 1 to Mar 26, 2013

•  Torque log accounting (NCSA, Mike Showerman)


[Chart: Accumulated XE node-hours, January 1 to March 26, 2013. Usage in millions of node-hours versus XE job size in nodes (1 node = 32 cores), for power-of-2 job sizes from 1 to 32,768 nodes; the y-axis runs from 0 to 35 M node-hours. Callouts mark jobs using >65,536 cores and >262,144 cores.]
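As a rough illustration of how a breakdown like the chart above can be produced from scheduler accounting data, the sketch below bins node-hours by power-of-2 job size. The input format here is a simplified, hypothetical CSV of completed jobs (nodes, wallclock seconds), not the actual Torque accounting log format.

```python
# Hypothetical sketch: accumulate node-hours per power-of-2 job-size bucket
# from a simplified CSV of completed jobs ("nodes,wallclock_seconds" per line).
# Real Torque accounting logs use a different format and carry many more fields.
import csv
import math
from collections import defaultdict

def bucket(nodes: int) -> int:
    """Round a job size down to the nearest power of two (1, 2, 4, ... nodes)."""
    return 1 << int(math.log2(nodes))

def node_hour_histogram(path: str) -> dict[int, float]:
    hist = defaultdict(float)
    with open(path, newline="") as f:
        for nodes_s, secs_s in csv.reader(f):
            nodes, secs = int(nodes_s), float(secs_s)
            hist[bucket(nodes)] += nodes * secs / 3600.0  # node-hours
    return dict(hist)

if __name__ == "__main__":
    # "xe_jobs.csv" is an assumed export of the accounting data.
    for size, nh in sorted(node_hour_histogram("xe_jobs.csv").items()):
        print(f"{size:6d} nodes: {nh / 1e6:8.2f} M node-hours")
```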

Page 11: Blue Waters and Resource Management - Now and in the Future

OBSERVATIONS AND THOUGHTS:

FLEXIBILITY IS THE WORD OF THE NEXT DECADE

What Blue Waters is already telling us about future @Scale systems


Page 12: Blue Waters and Resource Management - Now and in the Future

Observation 1: Topology Matters

•  Much of the work for performance improvement of early applications was understanding and tuning for layout/topology, even on dedicated systems.
  •  Factors of almost 10 were seen for some applications.
•  Nvidia's Linpack results are mostly due to topology-aware work layout.
•  Done with hand tuning, special node selection, etc.
•  This needs to become commonplace to really benefit users.


Page 13: Blue Waters and Resource Management - Now and in the Future

Topology Matters

•  Even very small changes can have dramatic and unexpected consequences.
•  Example: having just 1 down Gemini out of 6,114 can slow an application by >20%.
  •  0.0156% of components unavailable can extend an application run time by >20% if those components just happen to be in the wrong place.
  •  P3DNS, 6,114 nodes.


Page 14: Blue Waters and Resource Management - Now and in the Future

Topology

•  1 poorly placed node out of 4,116 (0.02%) can slow an application by >30%.
  •  On a dedicated system!
•  It is hard to get optimal topology assignments, especially in non-dedicated use, but it should be easy to avoid really detrimental ones (see the placement-scoring sketch below).

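To make the placement effect concrete, here is a small, purely illustrative sketch that scores a node allocation on a 3D torus by its worst-case pairwise hop count, showing how a single badly placed node inflates the communication diameter of an otherwise compact allocation. The torus dimensions and allocations are invented and far simpler than Gemini placement on the real machine.

```python
# Hypothetical sketch: score a node allocation on a 3D torus by its
# worst-case pairwise hop distance. Dimensions and allocations are invented.
from itertools import combinations, product

DIMS = (24, 24, 24)  # torus size per dimension (illustrative, not Blue Waters')

def hops(a, b):
    """Minimal hop count between two torus coordinates (wrap-around links)."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, DIMS))

def diameter(alloc):
    """Worst-case hops between any pair of nodes in the allocation."""
    return max(hops(a, b) for a, b in combinations(alloc, 2))

# A compact 4x4x4 block of 64 nodes...
compact = list(product(range(4), range(4), range(4)))
# ...versus the same block with one node swapped for a far-away replacement,
# e.g. because the "right" node was down and the scheduler grabbed any free one.
stray = compact[:-1] + [(12, 12, 12)]

print("compact allocation diameter:", diameter(compact))  # small
print("one stray node, diameter:   ", diameter(stray))    # much larger
```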

Page 15: Blue Waters and Resource Management - Now and in the Future

Topology Awareness Needed for All Types of Interconnects


•  Tori
•  Trees
•  Hypercubes
•  Direct Connect & Dragonflies

Page 16: Blue Waters and Resource Management - Now and in the Future

Performance and Scalability through Flexibility

•  It is harder for applications to scale in the face of limited bandwidth.
•  BW works with science teams and technology providers to:
  •  Understand and develop better process-to-node mapping analysis to determine behavior and usage patterns.
  •  Build better instrumentation of what the network is really doing.
  •  Provide topology-aware resource and systems management that enables and rewards topology-aware applications.
•  Malleability, for applications and systems:
  •  Understanding the topology given and maximizing effectiveness.
  •  Being able to express the desired topology based on the algorithms (see the MPI sketch after this list).
  •  Middleware support.
•  Even if applications scale, consistency becomes an increasing issue for systems and applications.
  •  This will only get worse in future systems.

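One existing hook for expressing a desired topology from the algorithm's point of view is MPI's Cartesian communicator, which lets the library (and, potentially, a topology-aware runtime underneath it) reorder ranks to better match the machine. Below is a minimal mpi4py sketch for a 3D halo-exchange style decomposition; whether the reorder actually improves placement is implementation- and scheduler-dependent.

```python
# Minimal sketch: describe the application's logical 3D topology to MPI and
# let it reorder ranks. Run under mpirun/aprun, e.g. with 64 ranks.
from mpi4py import MPI

comm = MPI.COMM_WORLD
# Ask MPI to factor the job size into a 3D grid of processes.
dims = MPI.Compute_dims(comm.Get_size(), 3)
# periods=True gives torus-like wrap-around; reorder=True allows MPI to
# renumber ranks to better match the physical network (implementation-dependent).
cart = comm.Create_cart(dims, periods=[True, True, True], reorder=True)

coords = cart.Get_coords(cart.Get_rank())
# Neighbor ranks along dimension 0, as used in a halo exchange.
src, dest = cart.Shift(0, 1)
print(f"rank {cart.Get_rank()} at {coords}: x-neighbors {src} and {dest}")
```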

Page 17: Blue Waters and Resource Management - Now and in the Future

Flexible Resiliency Modes

•  Run again.
•  Defensive I/O (traditional checkpoint/restart); a minimal sketch follows this list.
  •  Expensive: extra overhead for the application and the system.
  •  Intrusive.
  •  The I/O infrastructure is shared across all jobs.
  •  New C/R approaches (node memory copy, SSD, journaling, ...).
•  Spare nodes in job requests to rebalance work after a single point of failure.
  •  Wastes resources.
  •  Runtimes do not support this well yet (but can do it).
•  Redistribute work within the remaining nodes.
  •  Charm++, some MPI implementations.
  •  Takes longer.
•  Add spare nodes from a system pool to the job.
  •  The job scheduler, resource manager, and runtime all have to be made more flexible.

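For the defensive I/O mode above, the pattern is application-level: periodically write just enough state to restart, and on startup resume from the newest checkpoint. The sketch below is a generic, single-process illustration with invented file names, interval, and state layout; production codes stage or overlap this I/O to limit the overhead and the load on the shared filesystem.

```python
# Hypothetical sketch of application-level defensive I/O (checkpoint/restart).
# File naming, interval, and the state dict are illustrative choices only.
import os
import glob
import pickle

CKPT_GLOB = "ckpt_step*.pkl"

def write_checkpoint(step: int, state: dict) -> None:
    tmp = f"ckpt_step{step:08d}.pkl.tmp"
    with open(tmp, "wb") as f:
        pickle.dump({"step": step, "state": state}, f)
    os.rename(tmp, tmp[:-4])  # publish atomically so a crash never leaves a partial checkpoint

def load_latest_checkpoint():
    files = sorted(glob.glob(CKPT_GLOB))
    if not files:
        return 0, {"field": [0.0] * 1000}  # fresh start with initial state
    with open(files[-1], "rb") as f:
        ckpt = pickle.load(f)
    return ckpt["step"] + 1, ckpt["state"]

start, state = load_latest_checkpoint()
for step in range(start, 100):
    state["field"] = [x + 1.0 for x in state["field"]]  # stand-in for real work
    if step % 10 == 0:                                   # checkpoint interval trades overhead
        write_checkpoint(step, state)                    # against rework after a failure
```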

Page 18: Blue Waters and Resource Management - Now and in the Future

Observation 2: Resiliency Flexibility Critical

•  Migrate from checkpoint to application resiliency.
  •  Traditional system-based checkpoint/restart is no longer viable.
  •  Defensive I/O per application is inefficient, but it is the current state of the art.
•  Better application resiliency requires improvements in both systems and applications.
  •  Several teams are moving to new frameworks (e.g. Charm++) to improve resiliency.
  •  MPI is trying to add better features for resiliency.


Page 19: Blue Waters and Resource Management - Now and in the Future

Resiliency Flexibility

•  Application-based resiliency:
  •  Multiple layers of software and hardware have to coordinate information and reaction.
  •  Analysis and understanding are needed before action.
  •  Correct and actionable messages need to flow up and down the stack to the applications so they can take the proper action with correct information.
•  Application situational awareness: applications need to understand their circumstances and take action.
•  Flexible resource provisioning is needed in real time (see the rebalancing sketch below).
  •  Replacing a failed node dynamically from a system pool of nodes.
  •  Interaction with other constraints, so that sub-optimization does not adversely impact overall system optimization.

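As a toy illustration of the two provisioning options above (redistribute work within the remaining nodes, or pull a spare node into the job), the sketch below rebalances a block-decomposed workload when a node is reported failed. The node names, failure signal, and data structures are invented; in practice this logic has to be coordinated across the scheduler, resource manager, and runtime.

```python
# Hypothetical sketch: rebalance block-decomposed work after a node failure,
# preferring a spare node if one is available. All names and structures are invented.
def rebalance(assignments: dict[str, list[int]], failed: str,
              spares: list[str]) -> dict[str, list[int]]:
    """assignments maps node name -> list of work-block ids."""
    orphaned = assignments.pop(failed)
    if spares:
        # Option 1: pull a spare node into the job and give it the orphaned blocks.
        assignments[spares.pop()] = orphaned
    else:
        # Option 2: spread the orphaned blocks round-robin over the survivors
        # (the job runs on, but each remaining node now carries more work).
        survivors = sorted(assignments)
        for i, block in enumerate(orphaned):
            assignments[survivors[i % len(survivors)]].append(block)
    return assignments

work = {f"nid{n:05d}": [n * 4 + k for k in range(4)] for n in range(4)}
print(rebalance(dict(work), failed="nid00002", spares=["nid09999"]))
print(rebalance({k: list(v) for k, v in work.items()}, failed="nid00002", spares=[]))
```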

Page 20: Blue Waters and Resource Management - Now and in the Future

The Chicken or the Egg

•  Applications cannot take advantage of features the system does not provide.
  •  So they do the best they can with guesses.
•  Technology providers do not provide features because they say applications do not use them.

My message: we cannot brute-force our way to future @Scale systems and applications any longer.


Page 21: Blue Waters and Resource Management - Now and in the Future

Many Other Observations – Other Presentations

•  Storage and I/O significant challenges
•  System software quality and resiliency
•  Testing for function, feature, and performance at scale
•  Information gathering for the system
•  Application methods
•  Measuring real time-to-solution performance
•  System SW performance at scale
•  Heterogeneous components
•  Application consistency
•  Efficiency
  •  Energy, TCO, utilization
•  S&E team productivity
•  ...


Page 22: Blue Waters and Resource Management - Now and in the Future

Summary

•  Blue Waters is delivering on its commitment to sustained performance to the Nation for computational and data-focused @Scale problems.
•  We appreciate the tremendous efforts and support of all our technology providers and science team partners.
•  I am pleased to see Adaptive and Cray seriously addressing topology awareness issues, to meet BW-specific needs and hopefully beyond.
•  I am pleased Cray made initial improvements to enable application resiliency, but Adaptive, Cray, MPI, and other technology providers need to do much more to solve these problems.
•  I am very encouraged that application teams are willing (and indeed want) to implement flexibility in their codes if they have options.
•  We need more commonality across technology providers and implementations.
•  BW is an excellent platform for studying these issues as well as providing an unprecedented S&E resource.
•  Stay tuned for amazing results from BW.


Page 23: Blue Waters and Resource Management - Now and in the Future

Acknowledgements

This work is part of the Blue Waters sustained-petascale computing project, which is supported by the National Science Foundation (award number OCI 07-25070) and the state of Illinois. Blue Waters is a joint effort of the University of Illinois at Urbana-Champaign, its National Center for Supercomputing Applications, Cray, and the Great Lakes Consortium for Petascale Computation.

The work described is achievable through the efforts of many others on different teams.
