
SuperMike: LSU’s TeraScale, Beowulf-class Supercomputer

Presented to LACSI 2003

by Joel E. Tohline, former Interim Director

Center for Applied Information Technology and Learning
http://www.capital.lsu.edu/

October 29, 2003


AY2001/02 was a Special Year!

A decade of experience acquiring and utilizing parallel architectures in LSU’s Concurrent Computing Laboratory for Materials Simulation.

Beowulf systems maturing
– Commodity CPUs becoming good number crunchers
– Network bandwidths, robustness, and size improving
– Linux OS and message-passing software stabilizing

Numerous LSU groups building Beowulf clusters

Governor “Mike” Foster’s $23M Information Technology Initiative


Building a Beowulf-Class Supercomputer: Considerations [Fall ’01]

What processor and what chipset?
What motherboard?
How much RAM and on-board disk space?
What I/O features?
What network interconnect?
How many nodes and processors/node?
What encasement and form-factor?
What about power and A/C requirements?
Physical footprint and location?
How to assemble?
Must be installed before July 1, 2002!


Top500.org [Nov. 2001]


NCSA’s Netfinity Cluster [Nov. ’01]

Intel P III 1 GHz processors
256 KB L2 cache
2 processors/node
512 nodes
1 TeraFlops peak
Network: Myrinet 2000; 100 Mbit Ethernet
Actual aggregate speed: 594 GFlops

So … at worst, the same configuration with 1.8 GHz processors should give 1.0 TFlops, comparable in speed to SDSC’s IBM Power3 (Blue Horizon)
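A rough way to see where the 1.0 TFlops figure comes from is to scale the Netfinity’s measured aggregate speed by the clock-rate ratio. The sketch below is illustrative only; it assumes sustained throughput scales roughly linearly with clock speed when the node count and interconnect stay the same, which is the worst-case estimate the slide itself is making.

    # Back-of-the-envelope check of the clock-scaling estimate (Python, illustrative).
    netfinity_gflops = 594.0      # NCSA Netfinity's measured aggregate speed
    netfinity_clock_ghz = 1.0     # P III clock rate on the Netfinity nodes
    target_clock_ghz = 1.8        # proposed processor clock rate

    # Assume sustained speed scales roughly with clock rate, all else equal.
    estimate_gflops = netfinity_gflops * (target_clock_ghz / netfinity_clock_ghz)
    print(f"Estimated aggregate speed: {estimate_gflops:.0f} GFlops")  # ~1069 GFlops, i.e. just over 1 TFlops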


Competitive Invitation to Bid [Dec. ’01]

Requested bids on two configurations
– 512 dual-processor > 1.7 GHz P III
– 512 dual-processor > 1.7 GHz dual Xeon w/ Intel’s 860 chipset

Sought experienced vendors
– Must be approved Myricom OEM
– Must have previously installed a cluster containing at least 128 nodes


Lowest Qualified Bid


• 1.8 GHz clock
• 512 KB L2 cache
• Dual-processor
• Hyper-Threading
• 400 MHz system bus

http://www.intel.com/design/Xeon/prodbref/
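These per-processor figures feed directly into the peak numbers quoted later in the talk. A minimal sketch of the arithmetic, assuming the P4 Xeon can retire 2 double-precision floating-point operations per cycle (an assumption about the SSE2 units, not stated on the slide):

    # Per-processor peak for a 1.8 GHz P4 Xeon (Python, illustrative).
    clock_hz = 1.8e9
    flops_per_cycle = 2                   # assumed: 2 double-precision flops per cycle
    peak_gflops = clock_hz * flops_per_cycle / 1e9
    print(f"Peak per processor: {peak_gflops:.1f} GFlops")  # 3.6 GFlops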


Intel® E7500 Chipset [Feb. ’02]


Tyan “Thunder i7500” Motherboard
http://www.tyan.com/products/html/thunderi7500.html


Myrinet® Network


Building a Beowulf-Class Supercomputer: Choices

What processor and what chipset?
– Intel 1.8 GHz P4 Xeon DP w/ E7500 chipset

What motherboard?
– Tyan “Thunder i7500” motherboard

How much RAM and on-board disk space?
– 1 GB RAM and 40 GB IDE disk drive per processor

What I/O features?
– CD-ROM, floppy disk, 2 USB, video, keyboard/mouse
– Fast Ethernet

What network interconnect?
– Myricom’s Myrinet 2000, 2 Gbit bi-directional


Building a Beowulf-Class Supercomputer: Choices

How many nodes and processors/node?
– 512 nodes; 2 processors/node

What encasement and form-factor?
– Rack-mountable; 2U form-factor

What about power and A/C requirements?
– 300 kilowatts

Physical footprint?
– 1300 sq. ft.

Location?


Fred C. Frey Computing Services Center


SuperMike: Assembled at LSU


SuperMike’s Specs [Aug. ’02]

512 dual-processor nodes
– 3.6 TeraFlops peak
– 1 TeraByte RAM
– 40 TeraBytes disk space

Actual aggregate speed: 2.207 TeraFlops

So … actually 3.7 times faster than NCSA’s Netfinity! At the time of installation, the 11th fastest machine in the world. Still the 2nd fastest machine among U.S. academic institutions!
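These headline numbers follow from the per-node choices listed earlier. A quick sanity check of the arithmetic, reusing the 3.6 GFlops-per-processor peak assumed above and the 1 GB RAM / 40 GB disk per processor from the “Choices” slide:

    # Aggregate figures for 512 dual-processor nodes (Python, illustrative check).
    nodes, procs_per_node = 512, 2
    procs = nodes * procs_per_node              # 1024 processors

    peak_tflops = procs * 3.6 / 1000            # ~3.7 TFlops peak (slide rounds to 3.6)
    ram_tb = procs * 1 / 1024                   # 1 GB per processor  -> 1 TB RAM
    disk_tb = procs * 40 / 1000                 # 40 GB per processor -> ~41 TB disk

    measured_tflops = 2.207                     # measured aggregate speed from the slide
    efficiency = measured_tflops / peak_tflops  # ~60% of peak
    vs_netfinity = 2207 / 594                   # ~3.7x NCSA's measured 594 GFlops

    print(peak_tflops, ram_tb, disk_tb, round(efficiency, 2), round(vs_netfinity, 1))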


Top500.org [June 2003]


Intel Intrigued!


SuperMike: Operation + Management

OS: Linux Red Hat 7.2 [kernel: 2.4.9-31 smp]

Queueing/Scheduler: PBS/PBS (moving to PBS/Maui)

Global File System: PVFS

Nodes/Network Monitoring Tools: xpbsmon/mute + cluster scripts
– Original plan was to use “clusterware” management tools, but incompatible w/ PBS
– Ganglia useful, but only enabled, as needed, per node in order to avoid competition with simulations (some have suggested utilizing “clumon”)

Storage: Fibre Channel connection to SANs + LTO tape drives


SuperMike Usage [Aug. 2003] – Node-days: 10,102/15,872 = 64% (512 nodes × 31 days = 15,872 available node-days)

Group                          | Application   | Node-days | %    | <nodes>
Mech. Eng.                     | CFD           | 2847      | 28.2 | 45
Astrophys.                     | CFD           | 2514      | 24.9 | 61
Chemistry                      | Q. Chem.      | 2374      | 23.5 | 38
Chem + Phys                    | MD            | 1145      | 11.3 | 46
Physics                        | G. Relativity | 895       | 8.9  | 45
Biol. + Exp. Phys. + Civ. Eng. | ---           | 274       | 2.7  | 21


SuperMike Usage [Sept. 2003] – Node-days: 10,107/15,360 = 66% (512 nodes × 30 days = 15,360 available node-days)

Group                          | Application   | Node-days | %    | <nodes>
Mech. Eng.                     | CFD           | 1533      | 15.2 | 50
Astrophys.                     | CFD           | 1595      | 15.8 | 45
Chemistry                      | Q. Chem.      | 3688      | 36.5 | 65
Chem + Phys                    | MD            | 1003      | 9.9  | 28
Physics                        | G. Relativity | 1744      | 17.3 | 42
Biol. + Exp. Phys. + Civ. Eng. | ---           | 248       | 2.5  | 22


SuperMike: Usage Case Study

Astrophysics: CFD
– Hyperbolic + elliptic PDEs; home-grown finite-difference algorithm with explicit MPI
– Last year’s 12-month NRAC allocation was 480,000 service units = processor-hours
– One month on SuperMike: 2514 node-days = 120,700 processor-hours
– Typical job uses 128 processors = 1/8 of SuperMike’s capacity
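The unit conversions in this case study can be checked directly. A small illustrative calculation, assuming 2 processors per node and 24 hours per node-day:

    # Converting the astrophysics group's monthly usage into processor-hours (Python).
    node_days = 2514
    proc_hours = node_days * 2 * 24       # 2 processors/node, 24 h/day -> ~120,700
    vs_nrac = proc_hours / 480_000        # fraction of the prior year's NRAC allocation
    job_fraction = 128 / (512 * 2)        # a 128-processor job uses 1/8 of the machine

    print(proc_hours, round(vs_nrac, 2), job_fraction)  # 120672, 0.25, 0.125

So a single month on SuperMike delivered roughly a quarter of the group’s entire previous-year NRAC allocation.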


LSU Center for Applied Information Technology and Learning

Ed Seidel, Director, hired July ’03
– From Max Planck’s AEI
– Numerical Relativity
– Grid Computing

Upcoming name change: Center for Computation & Technology [CCT]