1
Cluster or Network? An Emulation Facility for Research
Jay Lepreau, Chris Alfeld, David Andersen (MIT), Mac Newbold,
Rob Place, Kristin Wright
Dept. of Computer Science, University of Utah
http://www.cs.utah.edu/flux/testbed/
February 3, 2000
2
Research We Do
• Operating systems, local and distributed
• Distributed systems
  – Web caching schemes, distributed objects, ...
• Active Networks
  – Code in every packet: “route me!”
  – Configurable router
• Router operating systems
3
What?
• A configurable Internet (cluster) in a room
  – 230 nodes, 1000 links, BFS (switch)
  – Virtualizable topology, links, software
• An instrument for experimental CS research
• Universally available to any remote experimenter
• Simple to use!
4
Why?
• “We evaluated our system on five nodes.” (job talk from a university with a 300-node cluster)
• “We evaluated our Web proxy design with 10 clients on 100Mbit ethernet.”
• “Simulation results indicate ...”
• “Memory and CPU demands on the individual nodes were not measured, but we believe will be modest.”
• “The authors ignore interrupt handling overhead in their evaluation, which likely dominates all other costs.”
• “Resource control remains an open problem.”
5
Why? (2)
• “You have to know the right people to get access to the cluster.”
• “The cluster is hard to use.”
• “<Experimental network X> runs FreeBSD 2.2.x.”
• “October’s schedule for <experimental network Y> is…”
• “<Experimental network Z> is tunneled through the Internet”
6
Complementary to Other Experimental Environments
• Simulation
• Small static testbeds
• Live networks
• Maybe someday, a large-scale set of distributed small testbeds (“Access”)
7
Some Unique Characteristics
• Significant scale: initially 225 nodes, with 42 core routers of degree four interconnected by 100 Mb links.
• User-configurable control of “physical” characteristics: shaping of link latency/bandwidth/drops/errors (via invisibly interposed “shaping nodes”), router processing power, buffer space, …
• Node breakdown: 42 core, 160 edge, 26 shaping, 2 management
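The slides do not specify how a shaping node imposes latency, bandwidth limits, and loss. A minimal sketch, assuming one FIFO queue per link direction with a packet-count buffer, serialization at the configured bandwidth, a fixed one-way latency, and independent random loss; the `ShapedLink` class and all of its parameters are hypothetical, not the facility's software:

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ShapedLink:
    """Hypothetical model of one direction of a link through a shaping node."""
    bandwidth_bps: float          # emulated bandwidth, bits per second
    latency_s: float              # one-way propagation delay, seconds
    loss_rate: float = 0.0        # probability of dropping a packet at random
    queue_pkts: int = 50          # buffer space, in packets (assumed value)
    rng: random.Random = field(default_factory=lambda: random.Random(0))
    _busy_until: float = field(default=0.0, init=False)   # when the transmitter is free
    _departures: List[float] = field(default_factory=list, init=False)

    def send(self, now: float, size_bytes: int) -> Optional[float]:
        """Return when a packet arriving at `now` is delivered, or None if dropped."""
        # Packets whose serialization has finished by `now` no longer occupy the buffer.
        self._departures = [t for t in self._departures if t > now]
        if len(self._departures) >= self.queue_pkts:
            return None                                   # tail drop: buffer full
        if self.rng.random() < self.loss_rate:
            return None                                   # random loss
        start = max(now, self._busy_until)                # wait behind queued packets
        done = start + size_bytes * 8 / self.bandwidth_bps  # serialization delay
        self._busy_until = done
        self._departures.append(done)
        return done + self.latency_s                      # add propagation delay
```

For example, on a hypothetical 10 Mb/s, 20 ms link, a 1500-byte packet sent at t = 0 needs 1.2 ms to serialize and arrives at 21.2 ms; a second packet sent at the same instant queues behind it and arrives at 22.4 ms.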
8
More Unique Characteristics
• Capture of low-level node behavior such as interrupt load and memory bandwidth
• User-replaceable node OS software
• User-configurable physical link topology (VLAN via BFS; “P-LAN” via BFPP)
• Completely configurable and usable by external researchers, including node power cycling
9
Fundamental Research Leverage: Extremely Configurable
10
Obligatory Pictures
11
Prototype Pieces: edge nodes
12
Big Iron
13
A View from the Dark Side
14
And the Light Side
15
Artist’s Conception
16
Zoom in: “Delay” Node
17
Feature: Automatic mapping of desired topologies and characteristics to physical resources
• Algorithm goals:
  – Minimize likelihood of experimental artifacts (bottlenecks)
  – “Optimal” packing of multiple simultaneous experiments
  – Complete in finite time!
• Constraint-based heuristic algorithm (version 2!)
• Feature: accepts ns-compatible specification
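The slides say only that the mapper accepts an ns-compatible specification. As an illustration, a minimal sketch that reads a tiny subset of ns-2 OTcl syntax (`set n0 [$ns node]` and `$ns duplex-link` with bandwidth and delay) into a node list and a link list. `parse_ns` is a hypothetical helper; a real front end would have to interpret far more of the language:

```python
import re
from typing import List, Tuple

# "set n0 [$ns node]" declares a node.
NODE_RE = re.compile(r"^set\s+(\w+)\s+\[\$ns\s+node\]$")
# "$ns duplex-link $n0 $n1 100Mb 10ms DropTail" declares a link.
LINK_RE = re.compile(
    r"^\$ns\s+duplex-link\s+\$(\w+)\s+\$(\w+)\s+"
    r"(\d+(?:\.\d+)?)(Kb|Mb|Gb)\s+(\d+(?:\.\d+)?)ms\s+\w+$"
)
UNITS = {"Kb": 1e3, "Mb": 1e6, "Gb": 1e9}

NsLink = Tuple[str, str, float, float]  # (node_a, node_b, bandwidth_bps, delay_s)


def parse_ns(script: str) -> Tuple[List[str], List[NsLink]]:
    """Extract nodes and duplex links from a small subset of ns-2 syntax."""
    nodes: List[str] = []
    links: List[NsLink] = []
    for raw in script.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        m = NODE_RE.match(line)
        if m:
            nodes.append(m.group(1))
            continue
        m = LINK_RE.match(line)
        if m:
            a, b, bw, unit, delay = m.groups()
            if a not in nodes or b not in nodes:
                raise ValueError(f"link uses an undeclared node: {line}")
            links.append((a, b, float(bw) * UNITS[unit], float(delay) / 1000))
        # Anything else (agents, traffic sources, "$ns run", ...) is ignored here.
    return nodes, links
```

A mapper would then place the resulting virtual nodes and links, with bandwidths in bits per second and delays in seconds, onto physical switches and shaping nodes.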
18
Current Algorithm
• Simulated annealing
  – Make a random change (move a node from one switch to another), compute the score, and accept or reject the change based on the current temperature.
• Heuristic algorithm
• ~4 seconds for 30 nodes; polynomial
• Improve:
  – Hardwired node connections (will slow it down 100×)
  – Edge nodes
  – Speed: incremental score recomputation
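The annealing loop described above can be sketched as follows, assuming the score counts virtual links whose endpoints land on different switches plus a hypothetical penalty (`overload_penalty`) for each node beyond a switch's capacity. The cooling schedule, weights, and function names are illustrative assumptions, not the facility's actual parameters. The inner `delta` function shows the incremental score recomputation listed as an improvement: a move rescores only the moved node's links and the two switches involved.

```python
import math
import random
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Link = Tuple[str, str]


def score(where: Dict[str, str], links: List[Link],
          capacity: Dict[str, int], overload_penalty: float = 4.0) -> float:
    """Full objective: links cut between switches plus a penalty per node over capacity."""
    cut = sum(where[a] != where[b] for a, b in links)
    load = Counter(where.values())
    over = sum(max(0, load[s] - capacity[s]) for s in capacity)
    return cut + overload_penalty * over


def anneal(nodes: List[str], links: List[Link], capacity: Dict[str, int],
           steps: int = 20000, t0: float = 2.0, cooling: float = 0.9995,
           overload_penalty: float = 4.0, seed: int = 0) -> Dict[str, str]:
    """Map virtual nodes onto switches by simulated annealing (assumes no self-loops)."""
    rng = random.Random(seed)
    switches = list(capacity)
    neighbors: Dict[str, List[str]] = defaultdict(list)
    for a, b in links:
        neighbors[a].append(b)
        neighbors[b].append(a)

    where = {n: rng.choice(switches) for n in nodes}   # random initial mapping
    load = Counter(where.values())

    def over(s: str, count: int) -> float:
        return overload_penalty * max(0, count - capacity[s])

    def delta(n: str, dst: str) -> float:
        # Incremental rescoring: only n's links and the two affected switches change.
        src = where[n]
        d = sum((where[m] != dst) - (where[m] != src) for m in neighbors[n])
        d += over(src, load[src] - 1) - over(src, load[src])
        d += over(dst, load[dst] + 1) - over(dst, load[dst])
        return d

    temp = t0
    for _ in range(steps):
        temp *= cooling                      # geometric cooling schedule
        n = rng.choice(nodes)
        dst = rng.choice(switches)
        if dst == where[n]:
            continue
        d = delta(n, dst)
        # Metropolis rule: keep improvements, keep worse moves with prob exp(-d/T).
        if d <= 0 or rng.random() < math.exp(-d / temp):
            load[where[n]] -= 1
            load[dst] += 1
            where[n] = dst
    return where
```

Because `delta` touches only the moved node's links, each step costs time proportional to that node's degree rather than to the size of the whole topology; `score` is the equivalent full recomputation.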
19
Virtual Topology
20
Mapping into Physical Topology
21
Roatan: Remote Console for a Node
22
Early Network Configuration GUI
23
Research Applications
• Simulation validation
• Active networks
• Resource demands of services inside routers
• Denial-of-service resistance
• Interaction of adaptive applications and protocols
• All sorts of distributed system experiments
• ...
24
Research Applications (continued)
• Detailed performance monitoring and analysis
• Relationships between {node, link, topology} characteristics and
  – Application performance
  – Task scheduling and assignment
  – Communication software
  – Application algorithms
  – …
25
Study: Interconnection Techniques
• Point-to-point vs. always through a switch
  – Salmon et al. (Caltech)
• Cost vs. performance
• Of most interest on large clusters
• Locality of communication patterns
• Interference with local processing
• Ad hoc mobile networking
26
Research Issues and Other Challenges
• Calibration, validation, and scaling: how can we emulate networks of different speeds? What is the scaling behavior of emulating faster links by slowing nodes?
• Can we sufficiently capture real router internal behavior in a PC?
• Assuring validity: detecting switch bottlenecks, measuring and controlling physical characteristics without introducing artifacts.
• Algorithms and software to map requirements to resources while minimizing artifacts.
• Integrate with ns?
• Providing a reasonable user interface to all this.
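One piece of assuring validity is checking whether a mapping oversubscribes the links between switches. A minimal sketch, assuming each pair of switches is joined by a single trunk whose per-direction capacity is given in `trunk_bps` (a missing entry is treated as no trunk), and that every virtual link crossing switches needs its full bandwidth on that trunk; `oversubscribed_trunks` and its inputs are hypothetical:

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple


def oversubscribed_trunks(
    where: Dict[str, str],                    # virtual node -> switch
    links: List[Tuple[str, str, float]],      # (node_a, node_b, bandwidth_bps)
    trunk_bps: Dict[FrozenSet[str], float],   # {switch_x, switch_y} -> capacity, bps
) -> List[Tuple[Tuple[str, ...], float, float]]:
    """Return (switch pair, demanded bps, trunk bps) for every overloaded trunk."""
    demand: Dict[FrozenSet[str], float] = defaultdict(float)
    for a, b, bw in links:
        sa, sb = where[a], where[b]
        if sa != sb:                          # only links that cross switches use a trunk
            demand[frozenset((sa, sb))] += bw
    report = []
    for pair, need in demand.items():
        have = trunk_bps.get(pair, 0.0)       # no entry: treat as no trunk at all
        if need > have:
            report.append((tuple(sorted(pair)), need, have))
    return sorted(report)
```

An empty result means no trunk is asked to carry more than its capacity under these assumptions; it says nothing about bottlenecks inside a switch's backplane.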
27
Final Remarks
• Should be limping next month
• Looking for feedback on your potential use
• Looking for early users
• Collaborators/clients: UU Physics, CMU CS, MIT CS, Georgia Tech, IBM Research
• Sponsors: University of Utah, Novell, DARPA, Compaq, Nortel, <your_name_here>