SoCal Infrastructure
OptIPuter Southern California Network Infrastructure
Philip Papadopoulos, OptIPuter Co-PI
University of California, San Diego
Program Director, Grids and Clusters
San Diego Supercomputer Center
September 2003
UCSD Heavy Lifters
• Greg Hidley, School of Engineering, Director of Cal-(IT)2 Technology Infrastructure
• Mason Katz, SDSC, Cluster Development Group Leader
• David Hutches, School of Engineering
• Ted O’Connell, School of Engineering
• Max Okumoto, School of Engineering
Year 1 Mod-0, UCSD
[Network diagram "op-nodes-ucsd-y1", 9/26/03, grh. A Chiaro Enstara router sits at the center, with Dell 5224 switches at the sites. Links marked "4" are 4 bonded GigE links over fiber pairs; one link is a dedicated fiber pair. Site labels: SDSC, SDSC Annex, JSOE, Preuss School, SIO, SOM, CSE, plus fiber to CRCA and fiber to 6th College. Endpoint labels: three 8-node clusters, a 4-node control, a Geowall PC, and a 9-node VizCluster. Resources are marked either Shared or OptIPuter Owned.]
Building an Experimental Apparatus
• Mod-0 OptIPuter, Ethernet (Packet) Based
– Focused as an Immediately-usable High-bandwidth Distributed Platform
– Multiple Sites on Campus (a Few Fiber Miles)
– Next-generation Highly-scalable Optical Chiaro Router at Center of Network
• Hardware Balancing Act
– Experiments Really Require Large Data Generators and Consumers
– Science Drivers Require Significant Bandwidth to Storage
– OptIPuter Predicated on Price/performance Curves of > 1GE Networks
• System Issues
– How does one Build and Manage a Reconfigurable Distributed Instrument?
Raw Hardware
• Center of UCSD Network is a Chiaro Internet Router
– Unique Optical Cross Connect Scales to 6.4 Tbit/sec Today
– We Have the 640 Gigabit “Starter” System
– Has “Unlimited” Bandwidth from our Perspective
– Programmable Network Processors
– Supports Multiple Routing Instances (Virtual Cut-through)
– “Wild West” OptIPuter-routed (Campus)
– High-performance Research in Metro (CalREN-HPR) and Wide-area
– Interface to Campus Production Network with Appropriate Protections
• Endpoints are Commodity Clusters
– Clustered Commodity-based CPUs, Linux. GigE on Every Node.
– Differentiated as Storage vs. Compute vs. Visualization
– > $800K of Donated Equipment From Sun And IBM
– 128 Node (256 Gbit/s) Intel-based Cluster from Sun (Delivered 2 Weeks ago)
– 48 Node (96 Gbit/s), 21TB (~300 Spindles) Storage Cluster from IBM (in Process)
– SIO VIZ Cluster Purchased by Project
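The cluster figures on this slide can be cross-checked with a little arithmetic. The sketch below is illustrative only: it uses the node counts, aggregate bandwidths, capacity, and spindle count quoted above, and assumes decimal units (1 TB = 1000 GB). The slide does not say how the aggregate bandwidth was derived, so the per-node value is simply the quoted aggregate divided by the node count. All names in the code are hypothetical.

```python
# Figures quoted on the slide; the dictionary layout is illustrative.
clusters = {
    "Sun compute cluster": {"nodes": 128, "aggregate_gbit_s": 256},
    "IBM storage cluster": {"nodes": 48, "aggregate_gbit_s": 96},
}


def per_node_gbit_s(nodes, aggregate_gbit_s):
    """Quoted aggregate bandwidth divided evenly across the nodes."""
    return aggregate_gbit_s / nodes


for name, c in clusters.items():
    rate = per_node_gbit_s(c["nodes"], c["aggregate_gbit_s"])
    print(f"{name}: {rate:.1f} Gbit/s per node")

# Storage cluster capacity, assuming decimal units (1 TB = 1000 GB).
storage_tb = 21
spindles = 300  # the slide says "~300", so this is approximate
storage_nodes = clusters["IBM storage cluster"]["nodes"]

gb_per_spindle = storage_tb * 1000 / spindles
gb_per_node = storage_tb * 1000 / storage_nodes
print(f"about {gb_per_spindle:.0f} GB per spindle, {gb_per_node:.1f} GB per node")
```

Both clusters work out to the same per-node figure, which suggests the two aggregates were computed the same way.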
Storewidth Investigations: General Model
[Diagram: a storage cluster with multiple network and drive pipes, in which every node runs httpd on top of pvfs and the nodes share a local cluster interconnect. The storage cluster presents a Large Virtual Disk (Multiple Network Pipes) through an aggregation switch to the Chiaro router. On the other side, aggregation switches connect a viz, compute, or other clustered endpoint whose nodes act as DAV clients on their own local cluster interconnect. Caption: Parallel Pipes, Large Bisection, Unified Name Space.]
Symmetric “Storage Service”
Baseline:
• 1.6 Gbit/s (200 MB/s): 6 clients & servers (HTTP, 1GB file)
• 1.1 Gbit/s (140 MB/s): 7 clients & servers (davFS, 1GB file)
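The throughput figures reported on this slide mix Gbit/s and MB/s, so a quick conversion helps when comparing them. This is a minimal sketch assuming decimal units (1 Gbit = 1000 Mbit, 8 bits per byte) and an even split across client/server pairs. The slide does not state either assumption, and the function names are hypothetical.

```python
def gbit_to_mbyte_per_s(gbit_s):
    """Convert Gbit/s to MB/s using decimal units (assumption)."""
    return gbit_s * 1000 / 8


def per_pair_mbyte_per_s(gbit_s, pairs):
    """Average rate per client/server pair, assuming an even split."""
    return gbit_to_mbyte_per_s(gbit_s) / pairs


# Aggregate rates and pair counts as quoted on the slide.
measurements = {
    "HTTP": {"gbit_s": 1.6, "pairs": 6},
    "davFS": {"gbit_s": 1.1, "pairs": 7},
}

for name, m in measurements.items():
    total = gbit_to_mbyte_per_s(m["gbit_s"])
    each = per_pair_mbyte_per_s(m["gbit_s"], m["pairs"])
    print(f"{name}: {total:.1f} MB/s aggregate, {each:.1f} MB/s per pair")
```

Under these assumptions 1.6 Gbit/s is exactly 200 MB/s, while 1.1 Gbit/s is 137.5 MB/s, which the slide rounds to 140 MB/s.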
Year 2 – Mod-0, UCSD
[Network diagram "op-nodes-ucsd-y1.5", 9/26/03, grh. A Chiaro Enstara router sits at the center; links carry numeric labels (1, 2, 4, 10), with "4" marking bonded GigE. Network gear labels: Dell 5224 switches, GigE switches with 10GigE uplinks, and a UCSD 6509 shared VLAN. Site labels: SDSC, SDSC Annex, JSOE, CSE, Preuss School, SIO, SOM, plus fiber to CRCA and fiber to 6th College. Endpoint labels: Sun 32-node compute clusters, Sun 32-node storage cluster, Sun 128-node compute (shared), IBM 48-node storage cluster (21TB), IBM 128-node compute cluster (shared), IBM 9-node viz cluster, Dell Geowall, 100-node cluster (shared), 8-node clusters (shared), and a 4-node control.]
Southern Cal Metro Extension Year 2
[Network diagram "op-nodes-ucsd+socal-y1.5", 9/25/03, grh. The UCSD campus network around the Chiaro Enstara router, extended over CalREN-HPR to metro sites. Links carry numeric labels (1, 2, 4, 10); one link is a dedicated fiber pair. Network gear labels: Dell 5224 switches, GigE switches with 10GigE uplinks, UCSD 6509 shared VLAN, Foundry, Cisco, T320 (twice), and Teraburst (twice).
Campus labels: SDSC, SDSC Annex, JSOE, CSE, Preuss School, SIO, SOM, 6th College, CRCA, Jenks Lab, Electron Microscope, CRCA Viz (planned), and 6th College Viz (planned).
Campus endpoint labels: shared 8-node clusters; shared 32-node cluster; shared 100-node cluster; shared 128-node cluster; OptIPuter GeoWall + display; OptIPuter 32-node storage cluster; OptIPuter 32-node compute cluster; "OptIPuter clusters: 48-node 21TB storage, 32-node compute & 4-node control"; "Shared VLAN: OptIPuter 32-node compute, 9-node viz cluster, shared 128-node (Compas)".
Metro labels: UCI Kuester Lab (shared viz cluster), USC (shared 1152- and 64-node clusters), ISI (shared 32-node cluster), and SDSU.]
Aggregates
• Year 1 (Network Build)
– Chiaro Router Purchased, Installed, Working (Feb)
– 5 Sites on Campus, Each with 4 GigE Uplinks to Chiaro
– Private Fiber, UCSD-only
– ~40 Individual Nodes, Most Shared with Other Projects
– Endpoint Resource Poor, Network Rich
• Year 2 (Endpoint Enhancements)
– Chiaro Router: Additional Line Cards, IPv6, Starting 10GigE Deployment
– 8 Sites on Campus
– 3 Metro Sites
– Multiple Virtual Routers for Connection to Campus, CENIC HPR, Others
– > 200 Nodes. Most are Donated (Sun and IBM). Most Dedicated to OptIPuter
– InfiniBand Test Network on 16 Nodes + Direct IB Switch to GigE
– Enough Resource to Support Data-intensive Activity, Slightly Network Poor
• Year 3+ (Balanced Expansion Driven by Research Requirements)
– Expand 10GigE Deployments
– Bring Network, Endpoint, and DWDM (Mod-1) Forward Together
– Aggregate at Least a Terabit (both Network and Endpoints) by Year 5
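To put the Year 5 terabit goal in perspective, the Year 1 uplink figures on this slide can be turned into a rough aggregate. This is a back-of-the-envelope sketch, not a capacity plan: it assumes each GigE uplink contributes a nominal 1 Gbit/s and each 10GigE link a nominal 10 Gbit/s, and it ignores protocol overhead. The function and constant names are hypothetical.

```python
import math

GIGE_GBIT_S = 1        # nominal rate of one GigE link (assumption)
TEN_GIGE_GBIT_S = 10   # nominal rate of one 10GigE link (assumption)
TERABIT_GBIT_S = 1000  # the "at least a terabit" Year 5 goal


def aggregate_uplink_gbit_s(sites, links_per_site, link_gbit_s=GIGE_GBIT_S):
    """Sum of nominal uplink capacity across all sites."""
    return sites * links_per_site * link_gbit_s


# Year 1: 5 campus sites, each with 4 GigE uplinks to the Chiaro router.
year1_gbit_s = aggregate_uplink_gbit_s(sites=5, links_per_site=4)

# Growth needed to reach the terabit goal, and the equivalent 10GigE link count.
growth_factor = TERABIT_GBIT_S / year1_gbit_s
ten_gige_links = math.ceil(TERABIT_GBIT_S / TEN_GIGE_GBIT_S)

print(f"Year 1 aggregate uplink: {year1_gbit_s} Gbit/s")
print(f"Growth to a terabit: {growth_factor:.0f}x")
print(f"Equivalent 10GigE links: {ten_gige_links}")
```

Under these assumptions, the Year 1 network aggregates 20 Gbit/s of uplink. A terabit is then a 50-fold increase, which is consistent with the slide's emphasis on expanding 10GigE and bringing DWDM forward.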
Managing a few hundred endpoints
• Rocks Toolkit used on over 130 Registered Clusters. Several Top500 Clusters
– Descriptions Easily Express Different System Configurations
– Support IA32 and IA64. Opteron in Progress
• OptIPuter is Extending the Base Software
– Integrate Experimental Protocols/Kernels/Middleware into stack
– Build Visualization and Storage Endpoints
– Adding Common Grid (NMI) Services through Collaboration with GEON/BIRN