
Networking Options for Beowulf Clusters

Dr. Thomas Sterling

California Institute of Technology and

NASA Jet Propulsion Laboratory

March 22, 2000

Presentation to the American Physical Society


Points of Inflection in Computing

• Heroic Era (1950)
  – technology: vacuum tubes, mercury delay lines, pulse transformers
  – architecture: accumulator based
  – model: von Neumann, sequential instruction execution
  – examples: Whirlwind, EDSAC

• Mainframe (1960)
  – technology: transistors, core memory, disk drives
  – architecture: register bank based
  – model: reentrant concurrent processes
  – examples: IBM 7042, 7090, PDP-1

• Scientific Computer (1970)
  – technology: earliest SSI logic gate modules
  – architecture: virtual memory
  – model: parallel processing
  – examples: CDC 6600, Goodyear STARAN


Points of Inflection in the History of Computing

• Supercomputers (1980)
  – technology: ECL, semiconductor integration, RAM
  – architecture: pipelined
  – model: vector
  – example: Cray-1

• Massively Parallel Processing (1990)
  – technology: VLSI, microprocessor
  – architecture: MIMD
  – model: Communicating Sequential Processes, message passing
  – examples: TMC CM-5, Intel Paragon

• ? (2000)
  – trans-teraflops epoch


Punctuated Equilibrium
nonlinear dynamics drive to a point of inflection

• Drastic reduction in vendor support for HPC
• Component technology for PCs matches workstation capability
• PC-hosted software environments achieve sophistication and robustness of mainframe O/S
• Low cost network hardware and software enable balanced PC clusters
• MPPs establish low level of expectation
• Cross-platform parallel programming model


BEOWULF-CLASS SYSTEMS

• Cluster of PCs
  – Intel x86
  – DEC Alpha
  – Mac Power PC
• Pure M2COTS
• Unix-like O/S with source
  – Linux, BSD, Solaris
• Message passing programming model (see the sketch after this list)
  – PVM, MPI, BSP, homebrew remedies
• Single user environments
• Large science and engineering applications
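The message-passing bullet is the heart of the model: each PC runs its own process, and all sharing is explicit sends and receives. A minimal MPI ring in C, as a generic illustration of the style (not code from the talk; run with at least 2 processes):

```c
/* Minimal MPI ring: rank 0 launches a token, each rank increments it
 * and passes it on, and it returns to rank 0. Illustrative only.
 * Build: mpicc ring.c -o ring    Run: mpirun -np 4 ./ring */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, token;
    MPI_Status st;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this node's id  */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* number of nodes */

    if (rank == 0) {
        token = 0;
        MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        MPI_Recv(&token, 1, MPI_INT, size - 1, 0, MPI_COMM_WORLD, &st);
        printf("token after one trip around %d ranks: %d\n", size, token);
    } else {
        MPI_Recv(&token, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD, &st);
        token++;                             /* local work happens here */
        MPI_Send(&token, 1, MPI_INT, (rank + 1) % size, 0, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}
```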


Clusters in the TOP500 (Rmax and Rpeak in GFlop/s):

Rank | Manufacturer    | Computer                            | Rmax  | Installation Site                                   | # Proc | Rpeak
33   | Sun             | HPC 4500 Cluster                    | 272.1 | Sun, Burlington                                     | 720    | 483.84
34   | Compaq          | AlphaServer SC                      | 271.4 | Compaq Computer Corporation, Littleton              | 512    | 512
44   | Self-made       | CPlant Cluster                      | 232.6 | Sandia National Laboratories, Albuquerque           | 580    | 580
143  | Sun             | HPC 10000 400 MHz Cluster           | 68.77 | KT Freetel, Seoul                                   | 110    | 88
169  | Compaq          | Alphleet Cluster                    | 61.3  | Institute of Physical and Chemical Res. (RIKEN), Wako | 140  | 140
265  | Self-made       | Avalon Cluster                      | 48.6  | Los Alamos National Laboratory/CNLS, Los Alamos     | 140    | 149.4
351  | Fujitsu-Siemens | hpcLine Cluster                     | 41.45 | Universitaet Paderborn - PC2, Paderborn             | 192    | 86.4
384  | Sun             | HPC 10000 333 MHz Cluster           | 39.87 | Dutchtone                                           | 78     | 46.8
397  | SGI             | ORIGIN 2000 250 MHz - Eth-Cluster   | 39.4  | The Sabre Group, Ft Worth                           | 128    | 64
399  | Sun             | HPC 10000 400 MHz Cluster           | 39.03 | Computer Manufacturer                               | 64     | 51.2
400  | Sun             | HPC 10000 400 MHz Cluster           | 39.03 | Semiconductor Company                               | 64     | 51.2
420  | SGI             | ORIGIN 2000 300 MHz - Eth-Cluster   | 37.31 | Industrial Light & Magic                            | 128    | 76.8
421  | SGI             | ORIGIN 2000 250 MHz - Eth-Cluster   | 37.31 | Government                                          | 144    | 72
422  | SGI             | ORIGIN 2000 250 MHz - Eth-Cluster   | 37.31 | America On Line (AOL)                               | 128    | 64
423  | SGI             | ORIGIN 2000 250 MHz - Eth-Cluster   | 37.31 | Industrial Light & Magic                            | 128    | 64
424  | SGI             | ORIGIN 2000 250 MHz - Eth-Cluster   | 37.31 | NASA/Ames Research Center/NAS, Mountain View        | 128    | 64
443  | Sun             | HPC 10000 333 MHz Cluster           | 35.17 | Gedas N.A. (VW)                                     | 70     | 42
445  | SGI             | ORIGIN 2000 250 MHz - Eth-Cluster   | 34.47 | Government                                          | 112    | 56
454  | Self-made       | Parnass2 Cluster                    | 34.23 | University Bonn - Dep. of Applied Mathematics, Bonn | 128    | 57.6


Beowulf-class Systems
A New Paradigm for the Business of Computing

• Brings high end computing to broad-ranged problems
  – new markets
• Order of magnitude price-performance advantage
• Commodity enabled
  – no long development lead times
• Low vulnerability to vendor-specific decisions
  – companies are ephemeral; Beowulfs are forever
• Rapid response technology tracking
• Just-in-place user-driven configuration
  – requirement responsive
• Industry-wide, non-proprietary software environment


Have to Run Big Problems on Big Machines?

• It's work, not peak flops
• A user's throughput over the application cycle
• Big machines yield little slices
  – due to time and space sharing
• But data set memory requirements
  – wide range of data set needs, three orders of magnitude
  – latency-tolerant algorithms enable out-of-core computation (see the sketch below)
• What is the Beowulf breakpoint for price-performance?
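To make the out-of-core point concrete: a latency-tolerant code streams a data set far larger than RAM through a fixed-size buffer one block at a time, overlapping disk latency with useful work. A minimal C sketch; the file name and block size are hypothetical:

```c
/* Out-of-core processing sketch: visit a data set larger than memory
 * block by block. "data.bin" and BLOCK are hypothetical. */
#include <stdio.h>
#include <stdlib.h>

#define BLOCK (1 << 20)              /* 1M doubles = 8 MB per block */

int main(void)
{
    double *buf = malloc(BLOCK * sizeof *buf);
    FILE *f = fopen("data.bin", "rb");
    double sum = 0.0;
    size_t n;

    if (!buf || !f) return 1;
    while ((n = fread(buf, sizeof *buf, BLOCK, f)) > 0)
        for (size_t i = 0; i < n; i++)
            sum += buf[i] * buf[i];  /* any per-element work goes here */

    printf("sum of squares: %g\n", sum);
    fclose(f);
    free(buf);
    return 0;
}
```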


Throughput Turbochargers

• Recurring costs approx. 10% of MPPs
• Rapid response to technology advances
• Just-in-place configuration, and reconfigurable
• High reliability
• Easily maintained through low cost replacement
• Consistent portable programming model
  – Unix, C, Fortran, message passing
• Applicable to wide range of problems and algorithms
• Double machine room throughput at a tenth the cost
• Provides super-linear speedup (e.g., when a partitioned data set drops into per-node memory or cache)


Beowulf Project - A Brief History

• Started in late 1993
• NASA Goddard Space Flight Center
  – NASA JPL, Caltech, academic and industrial collaborators
• Sponsored by NASA HPCC Program
• Applications: single user science station
  – data intensive
  – low cost
• General focus:
  – single user (dedicated) science and engineering applications
  – out of core computation
  – system scalability
  – Ethernet drivers for Linux


Beowulf System at JPL (Hyglac)

• 16 Pentium Pro PCs, each with 2.5 Gbyte disk, 128 Mbyte memory, Fast Ethernet card.

• Connected using 100Base-T network, through a 16-way crossbar switch.

Theoretical peak performance: 3.2 GFlop/s.

Achieved sustained performance: 1.26 GFlop/s.
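For scale: 3.2 GFlop/s is 16 × 200 MFlop/s, i.e. one floating-point result per clock from each 200 MHz Pentium Pro (the clock rate is inferred from the arithmetic, not stated on the slide), so the sustained 1.26 GFlop/s is roughly 39% of peak.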


A 10 Gflops Beowulf

California Institute of Technology

Center for Advanced Computing Research

172 Intel Pentium Pro microprocessors
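If these are the same 200 MHz Pentium Pro parts as Hyglac (an assumption; the slide gives no clock rate), theoretical peak is 172 × 200 MFlop/s ≈ 34.4 GFlop/s, which reads the 10 Gflops headline as a sustained figure near 30% of peak.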


Avalon architecture and price.


1st printing: May 1999

2nd printing: Aug. 1999

MIT Press


Beowulf at Work


Beowulf Scalability


Electro-dynamic FDTD Code

All timing data is in CPU seconds per simulated time step, for a global grid size of 282 × 362 × 102, distributed on 16 processors.

                         T3D (shmem)  T3D (MPI)   Hyglac (MPI,        Hyglac (MPI,
                                                  good load balance)  poor load balance)
Interior computation     1.8 (1.3*)   1.8 (1.3*)  1.1                 1.1
Interior communication   0.007        0.08        3.8                 3.8
Boundary computation     0.19         0.19        0.14                0.42
Boundary communication   0.04         1.5         50.1                0.0
Total                    2.0 (1.5*)   3.5 (3.0*)  55.1                5.5

(* using assembler kernel)
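The interior/boundary breakdown reflects the standard domain-decomposition pattern behind codes like this: each processor updates its own sub-grid and exchanges only the ghost (halo) planes with its neighbors each step. A hedged MPI sketch of that exchange for a 1-D decomposition; the array name and local sizes are illustrative, not taken from the measured code:

```c
/* Halo exchange for a 1-D domain decomposition, as in the FDTD runs
 * above (names and sizes are illustrative). Each rank owns NLOCAL
 * grid planes plus one ghost plane on each side. */
#include <mpi.h>

#define PLANE  (362 * 102)   /* points per grid plane, per the caption */
#define NLOCAL 18            /* ~282/16 planes owned by each rank      */

static double f[(NLOCAL + 2) * PLANE];   /* owned planes + 2 ghost planes */

void exchange_halo(int rank, int size)
{
    int left  = (rank == 0)        ? MPI_PROC_NULL : rank - 1;
    int right = (rank == size - 1) ? MPI_PROC_NULL : rank + 1;
    MPI_Status st;

    /* first owned plane goes left; right neighbor's plane lands in
       the right ghost slot */
    MPI_Sendrecv(&f[1 * PLANE],            PLANE, MPI_DOUBLE, left,  0,
                 &f[(NLOCAL + 1) * PLANE], PLANE, MPI_DOUBLE, right, 0,
                 MPI_COMM_WORLD, &st);
    /* last owned plane goes right; left neighbor's plane lands in
       the left ghost slot */
    MPI_Sendrecv(&f[NLOCAL * PLANE],       PLANE, MPI_DOUBLE, right, 1,
                 &f[0],                    PLANE, MPI_DOUBLE, left,  1,
                 MPI_COMM_WORLD, &st);
}
```

Only boundary planes move over the wire, which is why the table separates interior from boundary costs: good decomposition keeps the boundary terms small relative to interior computation.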


Network Topology Scaling

[Chart: UDP vs. TCP latencies, axis labeled "Latencies (µs)", scale 0 to 350]
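Latency numbers like these typically come from a ping-pong microbenchmark: bounce a minimal packet between two nodes many times and halve the average round-trip time. A C sketch of the client side over UDP; the peer address and port are hypothetical, and the peer must run a matching echo loop:

```c
/* UDP ping-pong latency sketch (client side; peer must echo packets).
 * Peer address and port are hypothetical. */
#include <arpa/inet.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

int main(void)
{
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in peer;
    char buf[4] = "ping";
    struct timeval t0, t1;
    const int iters = 1000;

    if (s < 0) return 1;
    memset(&peer, 0, sizeof peer);
    peer.sin_family = AF_INET;
    peer.sin_port = htons(9000);                   /* hypothetical port */
    inet_pton(AF_INET, "192.168.1.2", &peer.sin_addr);

    gettimeofday(&t0, NULL);
    for (int i = 0; i < iters; i++) {
        sendto(s, buf, sizeof buf, 0, (struct sockaddr *)&peer, sizeof peer);
        recvfrom(s, buf, sizeof buf, 0, NULL, NULL);
    }
    gettimeofday(&t1, NULL);

    double us = (t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_usec - t0.tv_usec);
    printf("one-way latency: %.1f usec\n", us / (2.0 * iters));
    close(s);
    return 0;
}
```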


Routed Network - Random Pattern


System Area Network Technologies

• Fast Ethernet
  – LAN, 100 Mbps, 100 usec
• Gigabit Ethernet
  – LAN/SAN, 1000 Mbps, 50 usec
• ATM
  – WAN/LAN, 155/622 Mbps
• Myrinet
  – SAN, 1250 Mbps, 20 usec
• Giganet
  – SAN/VIA, 1000 Mbps, 5 usec
• ServerNet II
  – SAN/VIA, 1000 Mbps, 10 usec
• SCI
  – SAN, 8000 Mbps, 5 usec
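A first-order cost model is a useful way to compare these options: moving an N-byte message takes roughly the latency plus the serialization time, T(N) ≈ L + N/B. Plugging in the Fast Ethernet and Myrinet figures above for a 1 KB message:

Fast Ethernet: 100 usec + 1024 B / 12.5 MB/s ≈ 100 + 82 ≈ 182 usec
Myrinet:        20 usec + 1024 B / 156 MB/s  ≈  20 +  7 ≈  27 usec

The SAN wins by nearly 7× on short messages, and at this size latency, not bandwidth, dominates the total.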


3Com CoreBuilder 9400 Switch and Gigabit Ethernet NIC


Lucent Cajun M770 Multifunction Switch


M2LM-SW16 16-Port Myrinet Switch with 8 SAN ports and 8 LAN ports


Dolphin Modular SCI Switch for System Area Networks


Giganet High Performance Host Adapters


Giganet High Performance Cluster Switch


The Beowulf Delta
looking forward

• 6 years
• Clock rate: × 4
• Flops (per chip): × 50 (2-4 proc/chip, 4-8 way ILP/proc)
• # processors: × 32
• Networking: × 32 (32-64 Gbps)
• Memory: × 10 (4 Gbytes)
• Disk: × 100
• Price-performance: × 50
• System performance: 50 Tflops
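As a consistency check: × 50 flops per chip combined with × 32 processors is × 1600 in aggregate peak; applied to a present-day Beowulf of roughly 30 GFlop/s (that baseline is an assumption, not on the slide), it lands at about 50 Tflops, matching the last bullet.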


Million $$ Teraflops Beowulf?

• Today, $3M peak Tflops
• < year 2002, $1M peak Tflops
• Performance efficiency is a serious challenge
• System integration
  – does vendor support of massive parallelism have to mean massive markup?
• System administration: boring but necessary
• Maintenance without vendors; how?
  – new kind of vendors for support
• Heterogeneity will become a major aspect
