Building the World’s Largest Linux Supercomputers
Kim Clark, V.P. Engineering
Leading the Cluster Revolution
The Linux Networx TeraFLOPS Club
- Lawrence Livermore: 11.2 TFLOPS, Linux Networx E2, 2,304 Intel processors
- Los Alamos: 10 TFLOPS, Linux Networx E2, 2,048 Intel processors
- Argonne: 1.68 TFLOPS, Linux Networx E2, 408 Intel processors
World’s Most Powerful Linux Supercomputer
- Rpeak = 11.2 TF, Rmax = 7.634 TF (68% efficiency)
- #3 on the Top 500 Supercomputer List
- #9 among Capability Class Systems (IDC Balanced Ratings)
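As a quick derived check (the numbers come from the bullet above; the calculation itself is not on the slides), the 68% figure is simply Rmax divided by Rpeak:

```python
# Derived check of the quoted Linpack efficiency (Rmax / Rpeak)
rpeak_tf = 11.2    # theoretical peak, TFLOPS
rmax_tf = 7.634    # measured Linpack result, TFLOPS
print(f"Efficiency: {rmax_tf / rpeak_tf:.0%}")   # -> Efficiency: 68%
```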
LLNL System Facts
- 1,152 Evolocity II (0.8U) Nodes
- 2.4 GHz Intel® Xeon™ Processors
- Quadrics ELAN3 QsNet
- 4.6 TBytes Memory
- 138 TBytes Local Storage
- 115 TBytes Global Storage
Building The System
- Design Prep
  - Rack Layout
  - Cabling Layout
  - Test Plan
- Facility Prep
  - Power Measurements
  - HVAC Estimations
- Built at Factory First
  - Tear Down in 3 Days
  - Rebuild at LLNL in 3 Days
Testing The System
- Test Plan
  - Burn In Full Rack
  - Basic Network Test
  - MPI Stress Test
  - Linpack
  - Pre-Ship Test Suite
  - Post-Ship Test Suite
  - Final Acceptance
Cooling The System
- Node
  - Redundant Fans
  - CPU Temp of 24°C Under Heavy Load
- Rack
  - Patented Cooling Chassis
  - 18°C In, 22°C Out
- System
  - Hot Air Flow Mix
  - CRAC: 80 Tons
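A rough sanity check, assuming the common conversion of one refrigeration ton to about 3.517 kW (the conversion is not from the slides): 80 tons of CRAC capacity is just enough to remove the roughly 280 kW of heat implied by the power figures on the next slide.

```python
# Rough cooling sanity check (assumes 1 refrigeration ton ≈ 3.517 kW of heat removal)
TON_KW = 3.517
crac_tons = 80
heat_load_kw = 280                 # total draw from the "Powering The System" slide

print(f"CRAC capacity: {crac_tons * TON_KW:.0f} kW vs. ~{heat_load_kw} kW heat load")
# -> CRAC capacity: 281 kW vs. ~280 kW heat load
```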
Powering The System
- 350 Watts per Sq. Foot
- Total: 280 kW
- Two 50 Amp Feeds per Rack
- 2 PDUs per Rack
- ICE Box Management
  - Power Management
  - Temperature Sensing
  - Serial Console Access
  - Node Beaconing
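Two figures that follow from the totals above; the per-node draw and the floor area are back-of-envelope estimates, not stated on the slides.

```python
# Back-of-envelope figures derived from the slide's totals
total_power_w = 280_000    # 280 kW total
nodes = 1152               # node count from "LLNL System Facts"
watts_per_sqft = 350

print(f"~{total_power_w / nodes:.0f} W per node")                         # ~243 W per node
print(f"~{total_power_w / watts_per_sqft:.0f} sq. ft of machine-room floor")  # ~800 sq. ft
```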
Reliability
- Failures Happen!
  - 1,000 components with a 150,000-hour individual MTBF have an aggregate MTBF of only 150 hours if there is no redundancy (illustrated below)!
- To combat low reliability, applications must checkpoint frequently, which degrades performance!

Higher Reliability = Higher Performance
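A minimal sketch of the arithmetic behind the aggregate-MTBF claim, plus Young's approximation for the checkpoint interval such a failure rate forces; the checkpoint formula and the assumed checkpoint cost are illustrative additions, not from the slides.

```python
from math import sqrt

# Series-system MTBF: with no redundancy, any single component failure stops the system.
components = 1000
component_mtbf_h = 150_000
system_mtbf_h = component_mtbf_h / components
print(f"Aggregate MTBF: {system_mtbf_h:.0f} hours")              # 150 hours

# Young's approximation for a near-optimal checkpoint interval (illustrative):
#   t_opt ≈ sqrt(2 * checkpoint_cost * MTBF)
checkpoint_cost_h = 0.1                                          # assumed 6-minute checkpoint write
t_opt_h = sqrt(2 * checkpoint_cost_h * system_mtbf_h)
print(f"Near-optimal checkpoint interval: ~{t_opt_h:.1f} hours")  # ~5.5 hours
```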
MCR (Multi-programmatic Capability Cluster) Architecture
- Scalable Units
  - 1 FSU (First Scalable Unit)
  - 11 CNSUs (Compute Node Scalable Units)
- Networks
  - MPI: Quadrics
  - Management: Ethernet 10/100
  - Debug: Serial
  - OST: GigE
  - Login: GigE
Scalable Units
- FSU
  - 60 Compute Nodes
  - 32 Gateway Nodes (QsNet -> GigE)
  - 2 Login Nodes
  - 2 Management Nodes
  - 2 MDS Nodes (Kimberlite for HA)
- CNSU (Compute Node Scalable Unit)
  - 96 Compute Nodes Each
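Tallying the scalable units (a derived check, not on the slides): the compute partition works out to 60 + 11 × 96 = 1,116 nodes, and adding the service nodes lands within a couple of nodes of the 1,152 Evolocity II nodes quoted earlier; the small difference presumably comes down to how service nodes are counted.

```python
# Derived node tally for MCR (how the service nodes map onto the 1,152-node total
# is an assumption; the slides only give the per-unit counts)
fsu_compute, cnsu_count, cnsu_compute = 60, 11, 96
service_nodes = {"gateway": 32, "login": 2, "management": 2, "mds": 2}

compute_total = fsu_compute + cnsu_count * cnsu_compute
print(f"Compute nodes: {compute_total}")                                     # 1116
print(f"With service nodes: {compute_total + sum(service_nodes.values())}")  # 1154 (~1,152 quoted)
```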
MPI Network
- Single-Rail Quadrics Elan3
  - 5 µsec latency for short messages
  - Measured 325 out of 340 MB/sec
- 3:1 Oversubscribed Fat-Tree Network
  - 12 first-tier, 4 second-tier switches
  - Less than 50% bandwidth degradation observed (66% expected, since 3:1 oversubscription naively leaves only one third of full bisection bandwidth)
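A minimal ping-pong sketch of the kind of test that produces the latency and bandwidth numbers above. It uses mpi4py and NumPy as stand-ins; the original measurements would have run over the Quadrics MPI, so treat this as an illustration rather than the actual test suite.

```python
# Minimal MPI ping-pong (illustrative; run with: mpirun -np 2 python pingpong.py)
import time
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
reps = 1000

for size in (8, 1 << 20):               # 8 B probes latency, 1 MB probes bandwidth
    buf = np.zeros(size, dtype="b")
    comm.Barrier()
    t0 = time.perf_counter()
    for _ in range(reps):
        if rank == 0:
            comm.Send(buf, dest=1)
            comm.Recv(buf, source=1)
        else:
            comm.Recv(buf, source=0)
            comm.Send(buf, dest=0)
    one_way = (time.perf_counter() - t0) / reps / 2
    if rank == 0:
        print(f"{size:>8} B: {one_way * 1e6:6.1f} us one-way, "
              f"{size / one_way / 1e6:7.0f} MB/s")
```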
Software Stack
- LinuxBIOS
- Linux / XFS
- Lustre
- MPICH (Quadrics)
- RMS/SLURM
- Maui/DPCS
- ClusterWorX
LinuxBIOS
- LinuxBIOS on All Nodes
  - < 2 Seconds from Power-On to Boot Loader
  - < 1 Minute from Boot to Login Complete
- Remote Manageability
  - Edit CMOS Parameters
  - Flash/Store ROM Image
- Supports Multicast Boot
Lustre
[Lustre architecture diagram: clients, a failover MDS pair, and OST servers]
Linux NetworX Accomplishments With MCR
- Fastest Intel- or Linux-Based System
- Shortest Delivery Time for a Top 5 System
- Full System Pre-Stage at Factory
- Remote Imaging: 1,152 nodes imaged in 15 minutes
- LinuxBIOS: less than 1-minute boot
- Highest CPU Density: Evolocity II (0.8U)
Dr. Still: “MCR is a great machine. ... Please buy more machines like it.”
Science Runs
- 768-CPU Pf3d Laser Simulation
  - 6 Billion Cells!
  - Able to Dump Restart Files at 464 MB/sec
High Productivity Computing Systems
Reliability
- Superior Patented Cooling Mechanism
- Redundant Bearingless Fans
- No Component with Less Than a 150K-Hour MTBF (Real World = 3 Weeks)
- Onsite Buildup

Scalability
- MCR Added 192 Nodes
High Productivity Computing Systems
Manageability
- HW Management: ICE Box
- SW Management: ClusterWorX
- LinuxBIOS
- High Density

Performance
- High Performance Linpack / IDC Ratings
Application for Smaller Systems
- Same Components
- Same Cooling Technology
- Same QA/Process
- Full System Pre-Stage at Factory
- Quicker to Implement
- Can Run Smaller Systems in Ambient Temperatures
- Can Add as Needs Grow
World’s Fastest Linux Supercomputers