Building the World’s Largest Linux Supercomputers · 2003-07-08


Building the World’s Largest Linux Supercomputers

Kim Clark, V.P. Engineering

Leading the Cluster Revolution

Lawrence Livermore: 11.2 TFLOPS, Linux Networx E2, 2,304 Intel Processors

Los Alamos: 10 TFLOPS, Linux Networx E2, 2,048 Intel Processors

Argonne: 1.68 TFLOPS, Linux Networx E2, 408 Intel Processors

The Linux Networx TeraFLOPS Club

World’s Most Powerful Linux Supercomputer

- Rpeak = 11.2 TF, Rmax = 7.634 TF (68% efficiency)
- #3 on the Top 500 Supercomputer List
- #9 among Capability Class Systems (IDC Balanced Ratings)
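The quoted 68% efficiency follows directly from the two figures on the slide; a quick arithmetic check (a sketch, not part of the original deck):

```python
# Sketch: reproducing the Linpack efficiency quoted on the slide.
# Rpeak and Rmax (in TFLOPS) are taken from the slide itself.
rpeak_tf = 11.2    # theoretical peak
rmax_tf = 7.634    # measured Linpack (Rmax) result
efficiency = rmax_tf / rpeak_tf
print(f"Linpack efficiency: {efficiency:.0%}")  # → 68%
```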

LLNL System Facts

- 1,152 Evolocity II (.8U) nodes
- 2.4 GHz Intel® Xeon™ processors
- Quadrics ELAN3 QsNet interconnect
- 4.6 TBytes memory
- 138 TBytes local storage
- 115 TBytes global storage
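The memory total implies roughly 4 GB per node; a small sketch (assuming, which the slide does not state, that memory is spread evenly across all 1,152 nodes):

```python
# Sketch: per-node memory implied by the slide's totals.
# Assumes an even spread across nodes (an assumption, not a slide fact).
total_memory_tb = 4.6
nodes = 1152
per_node_gb = total_memory_tb * 1000 / nodes
print(f"{per_node_gb:.1f} GB per node")  # → 4.0 GB per node
```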

Building The System

- Design prep
  - Rack layout
  - Cabling layout
  - Test plan
- Facility prep
  - Power measurements
  - HVAC estimations
- Built at factory first
  - Tear down: 3 days
  - Rebuild at LLNL: 3 days

Testing The System

- Test plan
- Burn in full rack
- Basic network test
- MPI stress test
- Linpack
- Pre-ship test suite
- Post-ship test suite
- Final acceptance

Cooling The System

- Node
  - Redundant fans
  - CPU temp is 24°C under heavy load
- Rack
  - Patented cooling chassis
  - 18°C in, 22°C out
- System
  - Hot air flow mix
  - CRAC: 80 tons
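The 80-ton CRAC figure lines up with the system's 280 kW power draw quoted on the next slide; a quick unit conversion (using the standard 1 ton of refrigeration = 3.517 kW, not a number from the deck):

```python
# Sketch: converting CRAC capacity in tons of refrigeration to kW,
# to compare against the system's stated 280 kW power draw.
TON_TO_KW = 3.517   # standard conversion factor (assumption, not from slide)
crac_tons = 80
cooling_kw = crac_tons * TON_TO_KW
print(f"Cooling capacity: {cooling_kw:.0f} kW")  # → 281 kW
```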

Powering The System

- 350 Watts per sq. foot
- Total: 280 kW
- Two 50-amp feeds per rack
- 2 PDUs per rack
- ICE Box management
  - Power management
  - Temperature sensing
  - Serial console access
  - Node beaconing
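The density and total figures together imply the machine-room footprint; a one-line sketch dividing the slide's two numbers:

```python
# Sketch: floor area implied by the slide's power figures.
total_watts = 280_000        # 280 kW total
watts_per_sqft = 350         # stated power density
area_sqft = total_watts / watts_per_sqft
print(f"{area_sqft:.0f} sq. ft of machine-room floor")  # → 800 sq. ft
```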

Reliability

Failures Happen!

- 1,000 components, each with a 150,000-hour individual MTBF, have an aggregate MTBF of only 150 hours if there is no redundancy!
- To combat low reliability, applications must checkpoint frequently, which degrades performance.

Higher Reliability = Higher Performance
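The slide's MTBF arithmetic can be sketched as follows, assuming independent components with constant (exponential) failure rates, so that rates add and the aggregate MTBF is the individual MTBF divided by the component count:

```python
# Sketch of the slide's reliability arithmetic. Assumes independent,
# constant-failure-rate components: failure rates add, so
# aggregate MTBF = individual MTBF / number of components.
component_mtbf_hours = 150_000
num_components = 1_000
aggregate_mtbf = component_mtbf_hours / num_components
print(f"Aggregate MTBF: {aggregate_mtbf:.0f} hours")  # → 150 hours
```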

MCR (Multi-programmatic Capability Cluster) Architecture

- Scalable Units
  - 1 FSU (First Scalable Unit)
  - 11 CNSU (Compute Node Scalable Units)
- Networks
  - MPI: Quadrics
  - Management: Ethernet 10/100
  - Debug: Serial
  - OST: GigE
  - Login: GigE

Scalable Units

- FSU
  - 60 compute nodes
  - 32 gateway nodes (QsNet -> GigE)
  - 2 login nodes
  - 2 management nodes
  - 2 MDS nodes (Kimberlite for HA)
- CNSU (Compute Node Scalable Units)
  - 96 compute nodes each

MPI Network

- Single-rail Quadrics Elan3
  - 5 µsec latency for short messages
  - Got 325 out of 340 MB/sec
- 3:1 oversubscribed fat-tree network
  - 12 first-tier, 4 second-tier switches
  - Less than 50% degradation (66% expected)
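The "66% expected" figure follows from the oversubscription ratio: with 3x more downlink than uplink bandwidth at the first tier, traffic crossing the top of the tree can use at most 1/3 of each node's link rate in the worst case, i.e. roughly 2/3 degradation. A minimal sketch of that reasoning:

```python
# Sketch: worst-case degradation implied by a 3:1 oversubscribed fat tree.
# Traffic crossing the second tier is limited to 1/oversubscription of the
# per-node link bandwidth; the remainder is the expected degradation (~2/3,
# the slide's "66% expected").
oversubscription = 3
worst_case_fraction = 1 / oversubscription      # of full link bandwidth
expected_degradation = 1 - worst_case_fraction  # ≈ 2/3
print(f"Worst-case usable bandwidth: {worst_case_fraction:.2f} of link rate")
```

The measured result on the previous bullet (less than 50% degradation) beat this worst case, presumably because real traffic patterns kept much of the communication within first-tier switches.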

Software Stack

- LinuxBIOS
- Linux
- XFS
- Lustre
- MPICH (Quadrics)
- RMS/SLURM
- Maui/DPCS
- ClusterWorX

LinuxBIOS

- LinuxBIOS on all nodes
- < 2 seconds from power-on to boot loader
- < 1 minute from boot to login complete
- Remote manageability
  - Edit CMOS parameters
  - Flash/store ROM image
- Supports multi-cast boot

Lustre

[Diagram: Lustre architecture, showing a client, a redundant MDS pair, and OST]

Linux NetworX Accomplishments With MCR

- Fastest Intel- or Linux-based system
- Shortest delivery time for a Top 5 system
- Full system pre-stage at factory
- Remote imaging: 1,152 nodes imaged in 15 minutes
- LinuxBIOS: less than 1 minute boot
- Highest CPU density: Evolocity II (.8U)

Dr. Still: “MCR is a great machine. ... Please buy more machines like it.”

Science Runs

- 768-CPU pF3D laser simulation
- 6 billion cells!
- Able to dump restart files at 464 MB/sec

High Productivity Computing Systems

Reliability

- Superior patented cooling mechanism
- Redundant bearingless fans
- No component with less than a 150K-hour MTBF (real world = 3 weeks)
- Onsite buildup

Scalability

- MCR added 192 nodes

High Productivity Computing Systems

Manageability

- HW management: ICE Box
- SW management: ClusterWorX
- LinuxBIOS
- High density

Performance

- High Performance Linpack / IDC ratings

Application for Smaller Systems

- Same components
- Same cooling technology
- Same QA/process
- Full system pre-stage at factory
- Quicker to implement
- Can run smaller systems at ambient temperatures
- Can add as needs grow

World’s Fastest Linux Supercomputers
