SuperMike: LSU's TeraScale, Beowulf-Class Supercomputer
Presented to LACSI 2003
by Joel E. Tohline, former Interim Director
Center for Applied Information Technology and Learning
http://www.capital.lsu.edu/
October 29, 2003
AY2001/02 was a Special Year!
A decade of experience acquiring and utilizing parallel architectures in LSU's Concurrent Computing Laboratory for Materials Simulation
Beowulf systems maturing
– Commodity CPUs becoming good number crunchers
– Network bandwidths, robustness, and size improving
– Linux OS and message-passing software stabilizing
Numerous LSU groups building Beowulf clusters
Governor “Mike” Foster’s $23M Information Technology Initiative
Building a Beowulf-Class Supercomputer: Considerations [Fall '01]
What processor and what chipset?
What motherboard?
How much RAM and on-board disk space?
What I/O features?
What network interconnect?
How many nodes and processors/node?
What encasement and form-factor?
What about power and A/C requirements?
Physical footprint and location?
How to assemble?
Must be installed before July 1, 2002!
NCSA's Netfinity Cluster [Nov. '01]
Intel P III 1 GHz processors
– 256 KB L2 cache
– 2 processors/node
– 512 nodes
– 1 TeraFlops peak
Network: Myrinet 2000 and 100 Mbit Ethernet
Actual aggregate speed: 594 GFlops
So … at worst, the same configuration with 1.8 GHz processors should give roughly 1.0 TFlops, comparable in speed to SDSC's IBM Power3 (Blue Horizon).
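The scaling estimate behind that claim, assuming floating-point throughput scales roughly linearly with clock speed on an otherwise identical configuration, is simply
\[ 594~\mathrm{GFlops} \times \frac{1.8~\mathrm{GHz}}{1.0~\mathrm{GHz}} \approx 1.07~\mathrm{TFlops}. \]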
Competitive Invitation to Bid [Dec. '01]
Requested bids on two configurations
– 512 dual-processor nodes, > 1.7 GHz P III
– 512 dual-processor nodes, > 1.7 GHz Xeon w/ Intel's 860 chipset
Sought experienced vendors
– Must be an approved Myricom OEM
– Must have previously installed a cluster containing at least 128 nodes
Intel P4 Xeon DP
• 1.8 GHz clock
• 512 KB L2 cache
• Dual-processor
• Hyper-Threading
• 400 MHz system bus
http://www.intel.com/design/Xeon/prodbref/
Tyan "Thunder i7500" Motherboard
http://www.tyan.com/products/html/thunderi7500.html
Building a Beowulf-Class Supercomputer: Choices
What processor and what chipset?
– Intel 1.8 GHz P4 Xeon DP w/ E7500 chipset
What motherboard?
– Tyan "Thunder i7500" motherboard
How much RAM and on-board disk space?
– 1 GB RAM and 40 GB IDE disk drive per processor
What I/O features?
– CD-ROM, floppy disk, 2 USB, video, keyboard/mouse
– Fast Ethernet
What network interconnect?
– Myricom's Myrinet 2000 (2 Gbit, bi-directional)
Building a Beowulf-Class Supercomputer: Choices
How many nodes and processors/node?
– 512 nodes; 2 processors/node
What encasement and form-factor?
– Rack-mountable; 2U form-factor
What about power and A/C requirements?
– 300 kilowatts (see the rough per-node figure below)
Physical footprint?
– 1300 sq. ft.
Location?
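A back-of-the-envelope reading of the power figure (treating the full 300 kW as compute-node load, an assumption rather than a stated breakdown) gives
\[ \frac{300~\mathrm{kW}}{512~\mathrm{nodes}} \approx 0.6~\mathrm{kW~per~dual\text{-}processor~node}. \]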
SuperMike’s Specs [Aug. ’02]
512 dual-processor nodes
– 3.6 TeraFlops peak
– 1 TeraByte RAM
– 40 TeraBytes disk space
Actual aggregate speed: 2.207 TeraFlops
So … actually 3.7 times faster than NCSA's Netfinity! At the time of installation, the 11th fastest machine in the world. Still the 2nd fastest machine among U.S. academic institutions!
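These figures are consistent with simple back-of-the-envelope arithmetic, assuming each Xeon retires two double-precision floating-point results per clock via SSE2:
\[ 1024~\mathrm{processors} \times 1.8~\mathrm{GHz} \times 2~\tfrac{\mathrm{flops}}{\mathrm{cycle}} \approx 3.7~\mathrm{TFlops}, \qquad \frac{2.207~\mathrm{TFlops}}{0.594~\mathrm{TFlops}} \approx 3.7, \]
in line with the quoted 3.6 TFlops peak and the 3.7x speedup over the Netfinity cluster.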
SuperMike: Operation + Management
OS: Linux Red Hat 7.2 [kernel: 2.4.9-31 smp]
Queueing/Scheduler: PBS/PBS (moving to PBS/Maui)
Global File System: PVFS
Nodes/Network Monitoring Tools: xpbsmon/mute + cluster scripts
– Original plan was to use "clusterware" management tools, but they proved incompatible w/ PBS
– Ganglia is useful, but it is only enabled per node, as needed, in order to avoid competition with simulations (some have suggested utilizing "clumon")
Storage: Fibre Channel connection to SANs + LTO tape drives
SuperMike Usage [Aug. 2003] – Node-days: 10,102/15,872 = 64%
Group Application Node-days % <nodes>
Mech. Eng. CFD 2847 28.2 45
Astrophys. CFD 2514 24.9 61
Chemistry Q. Chem. 2374 23.5 38
Chem + Phys MD 1145 11.3 46
Physics G. Relativity 895 8.9 45
Biol. + Exp. Phys. + Civ. Eng. --- 274 2.7 21
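The denominator in the utilization figure is simply the machine's capacity for the month:
\[ 512~\mathrm{nodes} \times 31~\mathrm{days} = 15{,}872~\mathrm{node\text{-}days}, \qquad \frac{10{,}102}{15{,}872} \approx 64\%. \]
The September table below uses 512 x 30 = 15,360 node-days in the same way.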
SuperMike Usage [Sept. 2003] – Node-days: 10,107/15,360 = 66%
Group Application Node-days % <nodes>
Mech. Eng. CFD 1533 15.2 50
Astrophys. CFD 1595 15.8 45
Chemistry Q. Chem. 3688 36.5 65
Chem + Phys MD 1003 9.9 28
Physics G. Relativity 1744 17.3 42
Biol. + Exp. Phys. + Civ. Eng. --- 248 2.5 22
SuperMike: Usage Case Study
Astrophysics: CFD
– Hyperbolic + elliptic PDEs; home-grown finite-difference algorithm with explicit MPI
– Last year's 12-month NRAC allocation was 480,000 service units (= processor-hours)
– One month on SuperMike: 2514 node-days = 120,700 processor-hours (2514 node-days x 2 processors/node x 24 hours)
– Typical job uses 128 processors = 1/8 of SuperMike's capacity
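To make the phrase "explicit MPI" concrete, here is a minimal sketch, not the group's actual code, of the usual pattern: each rank owns a block of the grid, exchanges ghost cells with its neighbors, and then applies an explicit finite-difference update. The 1-D heat equation, grid size, and step count are illustrative assumptions only.

/*
 * Minimal sketch of an explicit finite-difference update with MPI
 * halo exchange.  Solves the 1-D heat equation u_t = u_xx on N points,
 * block-decomposed across ranks.  Assumes the rank count divides N evenly.
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N     1024   /* global number of grid points (illustrative) */
#define STEPS 1000   /* number of explicit time steps (illustrative) */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int local = N / size;                               /* points owned by this rank */
    double *u    = calloc(local + 2, sizeof(double));   /* +2 ghost cells            */
    double *unew = calloc(local + 2, sizeof(double));

    if (rank == 0) u[1] = 1.0;          /* simple initial condition: a spike */

    int left  = (rank == 0)        ? MPI_PROC_NULL : rank - 1;
    int right = (rank == size - 1) ? MPI_PROC_NULL : rank + 1;
    double nu = 0.25;                   /* dt/dx^2, kept below 0.5 for stability */

    for (int step = 0; step < STEPS; step++) {
        /* exchange ghost cells with neighboring ranks */
        MPI_Sendrecv(&u[1],         1, MPI_DOUBLE, left,  0,
                     &u[local + 1], 1, MPI_DOUBLE, right, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&u[local],     1, MPI_DOUBLE, right, 1,
                     &u[0],         1, MPI_DOUBLE, left,  1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        /* explicit second-order update of the interior points */
        for (int i = 1; i <= local; i++)
            unew[i] = u[i] + nu * (u[i - 1] - 2.0 * u[i] + u[i + 1]);

        double *tmp = u; u = unew; unew = tmp;   /* swap time levels */
    }

    /* crude diagnostic: print the sum of u over the whole grid */
    double local_sum = 0.0, global_sum = 0.0;
    for (int i = 1; i <= local; i++) local_sum += u[i];
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("total = %f\n", global_sum);

    free(u); free(unew);
    MPI_Finalize();
    return 0;
}

The production hydrodynamics code presumably applies this same neighbor-exchange pattern to its own variables and stencils on a far larger, multi-dimensional grid.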