TRANSCRIPT
Networking Options for Beowulf Clusters
Dr. Thomas Sterling
California Institute of Technology and
NASA Jet Propulsion Laboratory
March 22, 2000
Presentation to the American Physical Society, March 22, 2000
Points of Inflection in the History of Computing
• Heroic Era (1950)
– technology: vacuum tubes, mercury delay lines, pulse transformers
– architecture: accumulator based
– model: von Neumann, sequential instruction execution
– examples: Whirlwind, EDSAC
• Mainframe (1960)
– technology: transistors, core memory, disk drives
– architecture: register bank based
– model: reentrant concurrent processes
– examples: IBM 7042, 7090, PDP-1
• Scientific Computer (1970)
– technology: earliest SSI logic gate modules
– architecture: virtual memory
– model: parallel processing
– examples: CDC 6600, Goodyear STARAN
Points of Inflection in the History of Computing (continued)
• Supercomputers (1980)
– technology: ECL, semiconductor integration, RAM
– architecture: pipelined
– model: vector
– example: Cray-1
• Massively Parallel Processing (1990)
– technology: VLSI, microprocessor
– architecture: MIMD
– model: Communicating Sequential Processes, message passing
– examples: TMC CM-5, Intel Paragon
• ? (2000)
– trans-teraflops epoch
Punctuated Equilibrium
Nonlinear dynamics drive to a point of inflection:
• Drastic reduction in vendor support for HPC
• Component technology for PCs matches workstation capability
• PC-hosted software environments achieve the sophistication and robustness of mainframe O/S
• Low-cost network hardware and software enable balanced PC clusters
• MPPs establish a low level of expectation
• Cross-platform parallel programming model
BEOWULF-CLASS SYSTEMS
• Cluster of PCs
– Intel x86
– DEC Alpha
– Mac PowerPC
• Pure M2COTS
• Unix-like O/S with source
– Linux, BSD, Solaris
• Message-passing programming model
– PVM, MPI, BSP, homebrew remedies
• Single-user environments
• Large science and engineering applications
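The message-passing model named above (PVM, MPI, BSP) has processes that share nothing and exchange data only through explicit sends and receives. A minimal sketch of that model in Python, with `multiprocessing.Pipe` standing in for a real cluster interconnect (an illustration of the model only; Beowulf nodes actually communicate over Ethernet via MPI or PVM libraries):

```python
# Message-passing sketch: two OS processes share no memory and
# communicate only via explicit send/recv, as in MPI or PVM.
# multiprocessing.Pipe stands in for the cluster fabric here.
from multiprocessing import Process, Pipe

def worker(conn):
    data = conn.recv()       # blocking receive (cf. MPI_Recv)
    conn.send(sum(data))     # explicit reply (cf. MPI_Send)
    conn.close()

if __name__ == "__main__":
    parent_end, child_end = Pipe()
    p = Process(target=worker, args=(child_end,))
    p.start()
    parent_end.send([1, 2, 3, 4])   # hand work to the worker
    print(parent_end.recv())        # prints 10
    p.join()
```

The essential point carries over to the real libraries: all coordination is visible in the program as matched send/receive pairs, which is what lets the same code run on any cluster with an MPI implementation.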
(Rmax and Rpeak in GFlop/s)
Rank | Manufacturer | Computer | Rmax | Installation Site | # Proc | Rpeak
33 | Sun | HPC 4500 Cluster | 272.1 | Sun, Burlington | 720 | 483.84
34 | Compaq | AlphaServer SC | 271.4 | Compaq Computer Corporation, Littleton | 512 | 512
44 | Self-made | CPlant Cluster | 232.6 | Sandia National Laboratories, Albuquerque | 580 | 580
143 | Sun | HPC 10000 400 MHz Cluster | 68.77 | KT Freetel, Seoul | 110 | 88
169 | Compaq | Alphleet Cluster | 61.3 | Institute of Physical and Chemical Res. (RIKEN), Wako | 140 | 140
265 | Self-made | Avalon Cluster | 48.6 | Los Alamos National Laboratory/CNLS, Los Alamos | 140 | 149.4
351 | Fujitsu-Siemens | hpcLine Cluster | 41.45 | Universitaet Paderborn - PC2, Paderborn | 192 | 86.4
384 | Sun | HPC 10000 333 MHz Cluster | 39.87 | Dutchtone | 78 | 46.8
397 | SGI | ORIGIN 2000 250 MHz - Eth-Cluster | 39.4 | The Sabre Group, Ft Worth | 128 | 64
399 | Sun | HPC 10000 400 MHz Cluster | 39.03 | Computer Manufacturer | 64 | 51.2
400 | Sun | HPC 10000 400 MHz Cluster | 39.03 | Semiconductor Company | 64 | 51.2
420 | SGI | ORIGIN 2000 300 MHz - Eth-Cluster | 37.31 | Industrial Light & Magic | 128 | 76.8
421 | SGI | ORIGIN 2000 250 MHz - Eth-Cluster | 37.31 | Government | 144 | 72
422 | SGI | ORIGIN 2000 250 MHz - Eth-Cluster | 37.31 | America On Line (AOL) | 128 | 64
423 | SGI | ORIGIN 2000 250 MHz - Eth-Cluster | 37.31 | Industrial Light & Magic | 128 | 64
424 | SGI | ORIGIN 2000 250 MHz - Eth-Cluster | 37.31 | NASA/Ames Research Center/NAS, Mountain View | 128 | 64
443 | Sun | HPC 10000 333 MHz Cluster | 35.17 | Gedas N.A. (VW) | 70 | 42
445 | SGI | ORIGIN 2000 250 MHz - Eth-Cluster | 34.47 | Government | 112 | 56
454 | Self-made | Parnass2 Cluster | 34.23 | University Bonn - Dep. of Applied Mathematics, Bonn | 128 | 57.6
Beowulf-class Systems
A New Paradigm for the Business of Computing
• Brings high-end computing to a broad range of problems
– new markets
• Order-of-magnitude price-performance advantage
• Commodity enabled
– no long development lead times
• Low vulnerability to vendor-specific decisions
– companies are ephemeral; Beowulfs are forever
• Rapid-response technology tracking
• Just-in-place, user-driven configuration
– requirement responsive
• Industry-wide, non-proprietary software environment
Have to Run Big Problems on Big Machines?
• It's work, not peak flops
• A user's throughput over the application cycle
• Big machines yield little slices
– due to time and space sharing
• But data set memory requirements matter
– wide range of data set needs, spanning three orders of magnitude
– latency-tolerant algorithms enable out-of-core computation
• What is the Beowulf breakpoint for price-performance?
Throughput Turbochargers
• Recurring costs approx. 10% of MPPs'
• Rapid response to technology advances
• Just-in-place configuration, and reconfigurable
• High reliability
• Easily maintained through low-cost replacement
• Consistent, portable programming model
– Unix, C, Fortran, message passing
• Applicable to a wide range of problems and algorithms
• Doubles machine-room throughput at a tenth the cost
• Provides super-linear speedup
Beowulf Project - A Brief History
• Started in late 1993
• NASA Goddard Space Flight Center
– NASA JPL, Caltech, academic and industrial collaborators
• Sponsored by the NASA HPCC Program
• Applications: single-user science station
– data intensive
– low cost
• General focus:
– single-user (dedicated) science and engineering applications
– out-of-core computation
– system scalability
– Ethernet drivers for Linux
Beowulf System at JPL (Hyglac)
• 16 Pentium Pro PCs, each with a 2.5-Gbyte disk, 128 Mbytes of memory, and a Fast Ethernet card.
• Connected by a 100Base-T network through a 16-way crossbar switch.
• Theoretical peak performance: 3.2 GFlop/s.
• Achieved sustained performance: 1.26 GFlop/s.
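The peak figure is consistent with 16 nodes at 200 MHz delivering one flop per cycle; the clock rate and flops-per-cycle are assumptions for this back-of-envelope check, not stated on the slide:

```python
# Back-of-envelope check of the Hyglac numbers above.
# Assumes 200 MHz Pentium Pro nodes, 1 flop/cycle (not on the slide).
nodes = 16
clock_hz = 200e6              # assumed clock rate
flops_per_cycle = 1           # assumed
peak = nodes * clock_hz * flops_per_cycle
sustained = 1.26e9            # measured, from the slide
print(peak / 1e9)                  # 3.2 (GFlop/s)
print(round(sustained / peak, 2))  # 0.39 sustained/peak efficiency
```

Sustaining roughly 39% of peak on a Fast Ethernet cluster was the point of the slide: commodity parts, respectable efficiency.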
A 10 Gflops Beowulf
California Institute of Technology
Center for Advanced Computing Research
172 Intel Pentium Pro microprocessors
1st printing: May, 1999
2nd printing: Aug. 1999
MIT Press
Electro-dynamic FDTD Code
All timing data are in CPU seconds per simulated time step, for a global grid size of 282 x 362 x 102, distributed over 16 processors.

 | T3D (shmem) | T3D (MPI) | Hyglac (MPI, good load balance) | Hyglac (MPI, poor load balance)
Interior computation | 1.8 (1.3*) | 1.8 (1.3*) | 1.1 | 1.1
Interior communication | 0.007 | 0.08 | 3.8 | 3.8
Boundary computation | 0.19 | 0.19 | 0.14 | 0.42
Boundary communication | 0.04 | 1.5 | 50.1 | 0.0
Total | 2.0 (1.5*) | 3.5 (3.0*) | 55.1 | 5.5
(* using assembler kernel)
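The table's real message is the communication share of each step. A quick recomputation from the component rows (totals agree with the table's Total row to within rounding of the displayed figures):

```python
# Per-configuration component times (CPU s/step) from the table:
# [interior comp, interior comm, boundary comp, boundary comm]
cols = {
    "T3D (shmem)":      [1.8, 0.007, 0.19, 0.04],
    "T3D (MPI)":        [1.8, 0.08,  0.19, 1.5],
    "Hyglac (good LB)": [1.1, 3.8,   0.14, 50.1],
    "Hyglac (poor LB)": [1.1, 3.8,   0.42, 0.0],
}
for name, (ic, im, bc, bm) in cols.items():
    total = ic + im + bc + bm
    comm_share = (im + bm) / total
    print(f"{name:18s} total {total:5.1f} s   comm {comm_share:.0%}")
```

In the worst Hyglac column, communication consumes about 98% of the step time, against roughly 2% on the T3D with shmem: Fast Ethernet latency and bandwidth, not the x86 processors, set the ceiling for this code.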
Network Topology Scaling
[Figure: UDP latency and TCP latency vs. network topology; vertical axis: latencies (µs), 0-350]
System Area Network Technologies
• Fast Ethernet
– LAN, 100 Mbps, 100 usec
• Gigabit Ethernet
– LAN/SAN, 1000 Mbps, 50 usec
• ATM
– WAN/LAN, 155/622 Mbps
• Myrinet
– SAN, 1250 Mbps, 20 usec
• Giganet
– SAN/VIA, 1000 Mbps, 5 usec
• Servernet II
– SAN/VIA, 1000 Mbps, 10 usec
• SCI
– SAN, 8000 Mbps, 5 usec
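A first-order way to compare these fabrics is the classic cost model t = latency + size/bandwidth, using the bandwidth and latency figures listed above (a sketch only; real protocol stacks add per-message overhead beyond these raw numbers):

```python
# One-message transfer time model: t = latency + size / bandwidth.
# Figures are the per-technology numbers from the slide
# (bandwidth in Mbit/s, zero-byte latency in microseconds).
nets = {                       # (Mbit/s, usec)
    "Fast Ethernet":    (100,  100),
    "Gigabit Ethernet": (1000, 50),
    "Myrinet":          (1250, 20),
    "Giganet":          (1000, 5),
    "ServerNet II":     (1000, 10),
    "SCI":              (8000, 5),
}

def xfer_usec(size_bytes, mbps, lat_usec):
    # bits / (Mbit/s) comes out directly in microseconds
    return lat_usec + size_bytes * 8 / mbps

for name, (bw, lat) in nets.items():
    print(f"{name:16s} 1 KB: {xfer_usec(1024, bw, lat):7.1f} usec   "
          f"1 MB: {xfer_usec(2**20, bw, lat) / 1000:7.1f} msec")
```

The model makes the trade-off concrete: for kilobyte messages the low-latency SANs (Giganet, SCI) dominate even at equal bandwidth, while for megabyte transfers only raw bandwidth matters, so SCI and Myrinet pull ahead of the Ethernet family.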
3Com CoreBuilder 9400 Switch and Gigabit Ethernet NIC
M2LM-SW16 16-Port Myrinet Switch with 8 SAN ports and 8 LAN ports
The Beowulf Delta
Looking forward:
• 6 years
• Clock rate: X 4
• Flops (per chip): X 50 (2-4 proc/chip, 4-8 way ILP/proc)
• # processors: X 32
• Networking: X 32 (32 - 64 Gbps)
• Memory: X 10 (4 Gbytes)
• Disk: X 100
• Price-performance: X 50
• System performance: 50 Tflops
Million $$ Teraflops Beowulf?
• Today: $3M per peak Tflops
• Before year 2002: $1M peak Tflops
• Performance efficiency is a serious challenge
• System integration
– does vendor support of massive parallelism have to mean massive markup?
• System administration: boring but necessary
• Maintenance without vendors; how?
– new kinds of vendors for support
• Heterogeneity will become a major aspect
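The sub-$1M-by-2002 figure follows from the X 50 price-performance projection on the Beowulf Delta slide: 50x over 6 years is roughly a doubling every year, so from $3M in 2000 the cost of a peak Tflops crosses below $1M during 2002. A back-of-envelope sketch of that compounding (my arithmetic, not the speaker's calculation):

```python
# Implied $/peak-Tflops if price-performance improves 50x in 6 years.
annual = 50 ** (1 / 6)   # ~1.92x improvement per year
cost = 3.0               # $M per peak Tflops, year 2000
for year in range(2000, 2004):
    print(year, f"${cost:.2f}M")   # 2002 comes in under $1M
    cost /= annual
```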