Clemson: Solving the HPC Data Deluge
DESCRIPTION
In this presentation from the Dell booth at SC13, Boyd Wilson from Clemson describes how the University handles Big Data for HPC. "As science drives a rapidly growing need for storage, existing environments face increasing pressure to expand capabilities while controlling costs. Many researchers, scientists and engineers find that they are outgrowing their current system, but fear their organizations may be too small to cover the cost and support needed for more storage. Join these experts for a lively discussion on how you can take control and solve the HPC data deluge." Watch the video presentation: http://insidehpc.com/2013/12/03/panel-discussion-solving-hpc-data-deluge/
TRANSCRIPT
Clemson HPC Storage, Dell Panel SC13. Boyd Wilson, Software CTO, Clemson University
Outline
• Palmetto Cluster
• Wide Area Storage Across the Innovation Platform
• Collective Cluster (Real-Time Data Aggregation and Analytics Cluster)
• Performance Numbers
• Research DMZ/Network
Palmetto Storage
Primary Research Cluster at Clemson
• 1,972 nodes
• 22,928 cores
• 998,400 CUDA cores
• 396 TF (only benchmarked newest GPU nodes)
• ~120+ TF additional not benchmarked
• Condominium model
• Home storage: SAMQFS backed by SL8500 (6PB)
• Scratch: OrangeFS
SAM QFS Home and Archive on SL8500
Palmetto Storage
Scratch
• 32 R510
• 16 R720
• 512TB OrangeFS (v2.8.8)
[Slide diagram: 200 FDR IB nodes (400 NVIDIA K20 GPUs, 396 TF) and 1,622 MX nodes (96 TF), including 96 IB nodes, connect over FDR IB and 10G MX to OrangeFS scratch; NFS Home/Archive (SAMQFS over NFS, 120TB disk, 6PB tape) sits behind 10G Ethernet, with data access from the Innovation Platform and campus.]
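For context on how compute nodes see the OrangeFS scratch space, a minimal client-side mount sketch is shown below; the server name, file system name, and mount point are placeholders rather than Palmetto's actual configuration, and OrangeFS 2.8.x still uses the pvfs2 kernel module and client daemon.
# Minimal OrangeFS (PVFS2) client mount sketch; host, fs name, and paths are placeholders.
modprobe pvfs2                                   # OrangeFS 2.8.x kernel module
pvfs2-client -p /usr/sbin/pvfs2-client-core      # start the client daemon
mount -t pvfs2 tcp://ofs-meta.example.edu:3334/orangefs /scratch
df -h /scratch                                   # confirm the scratch mount is visible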
Palmetto Scratch Next Steps
• 32 Dell R720
• 520TB Scratch
• OrangeFS
• WebDAV to OrangeFS (access sketch below)
• Hadoop over OrangeFS with MyHadoop
[Slide diagram: 200 FDR IB nodes (400 NVIDIA K20 GPUs, 396 TF) and 1,622 MX nodes (96 TF) reach scratch over FDR IPoIB and 10G IPoMX; WebDAV access runs over multiple 10G Ethernet links through the Science DMZ, with multiple 10G / 100G links beyond.]
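To illustrate the "WebDAV to OrangeFS" access path, any standard WebDAV client can move files through a WebDAV server fronting the scratch file system; the host, user, and paths below are hypothetical.
# Hypothetical WebDAV transfer against a server fronting OrangeFS scratch.
curl -u myuser -T results.tar.gz https://webdav.example.edu/scratch/myuser/    # upload
curl -u myuser -O https://webdav.example.edu/scratch/myuser/results.tar.gz     # download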
Clemson – USC 100Gb tests
• File write: 37Gb/s
• Server hardware problems and network packet loss during tests
• PerfSONAR: 49Gb/s initial
• Later retest: ~70Gb/s with tuning
• Additional file testing planned (initial testing systems had to move to production)
[Slide diagram: OrangeFS clients writing across the wide area to 12 Dell R720 OrangeFS servers.]
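The PerfSONAR figures above come from wide-area throughput tests between the two campuses. A generic sketch of that kind of memory-to-memory measurement with iperf3 is below; the host name, stream count, and duration are arbitrary placeholders, not the actual test parameters.
# Receiver (e.g., a test host at USC):
iperf3 -s
# Sender at Clemson: 60-second test with 8 parallel TCP streams.
iperf3 -c ps.usc.example.edu -P 8 -t 60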
SC13 Demo
[Slide diagram: OrangeFS clients on the SC13 show floor connected to 16 Dell R720 OrangeFS servers.]
• Clemson
• USC
• I2
• Omnibond
The "Collective" Cluster
• 12 R720
• 170TB
• D3-based vis toolkit called SocialTap
• Social media aggregation via GNIP
• Elasticsearch (query sketch below)
• Hadoop MapReduce
• OrangeFS
• WebDAV to OrangeFS
[Slide diagram: social data input and campus data access flow through the Innovation Platform into the Collective cluster, which connects to Palmetto and offers WebDAV access over multiple 10G Ethernet links through the Science DMZ.]
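As a rough illustration of the Elasticsearch layer, aggregated social media documents can be searched through its REST API; the index name and field below are hypothetical, not the actual SocialTap schema.
# Hypothetical full-text query against an index of aggregated posts (Elasticsearch REST API).
curl -XGET 'http://localhost:9200/posts/_search?pretty' -d '
{
  "query": { "match": { "text": "clemson" } },
  "size": 10
}'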
OrangeFS on Dell R720s
• 16 Dell R720 servers connected with 10Gb/s Ethernet
• 32 clients reached nearly 12GB/s read and 8GB/s write
# Write
iozone -i 0 -c -e -w -r $RS -s 4g -t $NUM_PROCESSES -+n -+m $CLIENT_LIST
# Read
iozone -i 1 -c -e -w -r $RS -s 4g -t $NUM_PROCESSES -+n -+m $CLIENT_LIST
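For reference, -i 0 and -i 1 select the write and read tests, -c and -e include close() and flush times in the timing, -r and -s set the record size and per-process file size, -t runs throughput mode with the given process count, -+n skips retests, and -+m points at a client list file for distributed runs. The client list format is one line per client process: host name, working directory on that client, and path to the iozone binary there. A sketch with placeholder hosts and paths:
# Example $CLIENT_LIST contents (placeholder hosts and paths):
client01  /mnt/orangefs/iozone  /usr/local/bin/iozone
client02  /mnt/orangefs/iozone  /usr/local/bin/iozone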
MapReduce over OrangeFS
• 8 Dell R720 servers connected with 10Gb/s Ethernet
• The remote case adds 8 additional identical servers and does all OrangeFS work remotely, with only local work done on the compute nodes (the traditional HPC model)
• *25% improvement with OrangeFS running on separate nodes from MapReduce
MapReduce over OrangeFS
• 16 Dell R720 servers connected with 10Gb/s Ethernet
• Remote clients are Dell R720s with single SAS disks for local data (vs. 12 disk arrays in the previous test); a configuration sketch follows below
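These runs use Hadoop MapReduce reading and writing OrangeFS in place of HDFS. The command below only sketches the general shape of such a job; the ofs:// URI, port, and file system implementation class are illustrative assumptions, not settings taken from the OrangeFS Hadoop integration's documentation.
# Sketch: stock wordcount example with the default file system pointed at OrangeFS.
# The ofs:// URI and fs.ofs.impl class are assumptions for illustration only.
hadoop jar hadoop-examples.jar wordcount \
  -D fs.default.name=ofs://ofs-meta.example.edu:3334/ \
  -D fs.ofs.impl=org.orangefs.usrint.OrangeFileSystem \
  /input/posts /output/wordcount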
Clemson Research Network
[Slide diagram: Clemson's research network. A Brocade MLX-32 core router carries a 100Gig tagged trunk linking the Science DMZ (Dell Z9000 and Dell S4810 switches with perfSONAR hosts), the perimeter firewall and route/ACL filters, PalmettoNet and the campus network, the Internet2 Innovation Platform (CC-NIE), Internet/I2/NLR peering, C-Light collaborator links, and top-of-rack connections to SAM-QFS over Fibre Channel.]