Simulation of Large-Scale Communication Networks: How Large? How Fast?
Mostafa Ammar, Steve Ferenci, Richard Fujimoto, Kalyan Perumalla, George Riley, Alfred Park, Hao Wu
Georgia Institute of Technology
Outline
• Quantifying simulator performance
• Parallel network simulation software: a federated approach to parallel network simulation
• Experimental study: performance measurements ranging from one to over 1500 CPUs
• Future challenges
Large-Scale Network Simulation
• Simulation is an indispensable tool for studying the behavior of computer communication networks: network protocol evaluation; security attacks and countermeasures; interdependencies among critical infrastructures
• Most studies examine a few to a few thousand nodes: useful to understand protocol behaviors (for example), but constrained by the limitations of existing tools
• Large-scale network simulation makes it possible to verify the validity of simulation results obtained on small networks, examine issues of scale, and validate theoretical models for large networks
• Here, the focus is packet-level simulation of wired networks: discrete event simulation; many tools exist (NS2, Opnet, Qualnet, …)
Packet-Level Simulation Performance: A Quantitative Approach
• One can characterize a simulation workload by the number of packet transmissions that must be simulated. The bulk of the computation involves simulating packets moving hop by hop through the network (queueing, transmitting over a link, etc.), with typically two simulator events per "packet hop". Define a packet transmission as sending one packet over a single communication link.
• One can characterize a simulator's performance by the number of packet transmissions it can simulate in one second of wallclock time.
Quantifying Packet-Level Simulator Performance
• Execution time: T ≈ (NF * PF * HF) / PTS
  NF  = number of flows
  PF  = packets sent per flow
  HF  = average hops per flow
  PTS = simulator speed (simulated packet transmissions per second)
  (This ignores lost packets and protocol-generated packets, e.g., acks.)
• Example: 500,000 active UDP flows at 1.0 Mbps per flow, with an average of 8 hops to reach the destination. Assuming 1 KByte packets (125 packets per second per flow), the workload is 500 million packet transmissions (hops) to be simulated per second of network operation (a worked version of this calculation follows below).
• Real-time performance: the simulator can simulate one second of network operation in one second of wallclock time.
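As a sanity check of the formula above, here is a minimal Python sketch (not part of any of the simulators discussed) that reproduces the slide's example numbers; the PTS values it loops over are illustrative, not measured.

```python
# Estimate simulation workload and run time from the slide's model:
#   T ~= (NF * PF * HF) / PTS
# The flow parameters are the example from the slide; the PTS values are illustrative.

NF = 500_000          # number of flows
rate_bps = 1_000_000  # 1.0 Mbps per flow
pkt_bytes = 1_000     # 1 KByte packets
HF = 8                # average hops per flow

PF_per_sec = rate_bps / (8 * pkt_bytes)        # 125 packets per flow per second
workload_per_sec = NF * PF_per_sec * HF        # 500 million packet transmissions
print(f"packet transmissions per simulated second: {workload_per_sec:,.0f}")

for PTS in (100_000, 10_000_000, 500_000_000):  # sequential, parallel, real-time
    T = workload_per_sec / PTS                  # wallclock seconds per simulated second
    print(f"PTS={PTS:>11,}: {T:,.1f} s of wallclock per simulated second")
```

At 500 million PTS the workload runs in real time; slower simulators need proportionally more wallclock time per simulated second.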
Scalability of Packet-Level Simulators
[Figure: simulator speed, PTS (traffic that can be simulated in real time, 10^2 to 10^10), versus network size (hosts, routers, etc., 1 to 10^8), showing the regimes covered by sequential simulation, time-parallel simulation, and space-parallel simulation (parallel discrete event simulation). Our focus: space-parallel simulation.]
Outline
• Quantifying packet-level simulator performance
• Parallel network simulation software: a federated approach to parallel network simulation
• Experimental study: performance measurements ranging from one to over 1500 CPUs
• Future challenges
Approaches to Parallel Network Simulation
• Build "from scratch": substantial effort to build and validate new models; users must learn a new simulator. Examples: SSFNet, TeD, Qualnet, ROSS, Javasim, Warped, TeleSim, AdHopNet, …
• Federated simulation: integrate existing simulators via a software backplane/RTI; exploit existing software, validated models, and user base; supports heterogeneous simulations. Examples: UPS (queueing networks), PDNS, GTNetS, Genesis
[Figure: several NS2 instances joined through a backplane/RTI to form a large-scale parallel network simulator]
Parallel Simulation Software
• Parallel/Distributed NS (PDNS)
  Developed by Riley ('99); optimized by Perumalla and Park ('03)
  Based on ns-2.1b9/2.26, compiled for RedHat Linux using gcc-2.95
  Optimizations: NixVectors, message compression (the neighbor-index idea behind NixVectors is sketched below)
• Georgia Tech Network Simulator (GTNetS)
  Developed by Riley ('02)
  Network simulation environment designed for scalable, efficient, distributed execution
  Current models: Links: Ethernet, Point-to-Point; Routing: static and NixVectors; detailed IPv4 model; TCP: Tahoe, Reno, NewReno; UDP: On-Off sources; Queuing: DropTail, RED
  Under development: TCP-SACK, IEEE 802.11 wireless, BGP (using Zebra), DSR/AODV wireless routing
  More detailed layer 2 and 3 models than NS; memory efficient
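A minimal sketch of the neighbor-index ("nix vector") idea referenced above, assuming a simple adjacency-list graph; this illustrates the technique only and is not the ns-2/GTNetS implementation.

```python
from collections import deque

# Instead of storing routing tables at every node, compute the path once
# (BFS here) and record, for each hop, the index of the outgoing neighbor
# in that node's adjacency list. Data structures are illustrative only.

def nix_vector(adj, src, dst):
    """Return a list of neighbor indices leading from src to dst."""
    parent = {src: None}
    q = deque([src])
    while q:
        u = q.popleft()
        if u == dst:
            break
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                q.append(v)
    # Reconstruct the node path, then convert each hop to a neighbor index.
    path, node = [], dst
    while node is not None:
        path.append(node)
        node = parent[node]
    path.reverse()
    return [adj[a].index(b) for a, b in zip(path, path[1:])]

def forward(adj, src, nix):
    """Follow the nix vector hop by hop; each entry selects a neighbor index."""
    node = src
    for idx in nix:
        node = adj[node][idx]
    return node

adj = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1, 4], 4: [3]}
nv = nix_vector(adj, 0, 4)
assert forward(adj, 0, nv) == 4
print(nv)  # [0, 2, 1]: neighbor indices taken at nodes 0, 1, and 3
```

The appeal is memory: each packet carries a few small indices instead of every node holding per-destination routing state, which matters at million-node scale.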
Software Architecture
[Figure: federates (e.g., ns2, gtnets) each connect through an RTI interface to the RTI software/interface (e.g., HLA). The RTI is built from RTI-Kit, a set of primitives for building RTIs: FM-Lib (low-level communications), MCAST (group communication), TM-Kit (time management algorithms), and other libraries (buffer management, priority queues, etc.). A Jane client/server architecture provides remote control of the simulation via the Internet.]
A sketch of the conservative time-advance idea that a time-management layer such as TM-Kit typically provides follows below.
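This is a minimal illustration of conservative time management across federates: each federate may only advance to the global lower bound on timestamps, i.e., the minimum over all federates of local time plus lookahead. The classes and names below are invented for illustration; they are not the RTI-Kit or HLA API.

```python
from dataclasses import dataclass

@dataclass
class Federate:
    name: str
    local_time: float   # timestamp of the next unprocessed local event
    lookahead: float    # minimum delay on any link crossing to another federate

def lbts(federates):
    """Lower bound on the timestamp of any message a federate may still receive."""
    return min(f.local_time + f.lookahead for f in federates)

feds = [Federate("pdns-0", 10.0, 0.005),
        Federate("pdns-1", 10.2, 0.005),
        Federate("gtnets-0", 9.9, 0.010)]

bound = lbts(feds)
for f in feds:
    # Each federate can safely process all events with timestamp below the bound.
    print(f"{f.name}: may advance to t < {bound:.3f}")
```

Larger lookahead (longer inter-federate link delays) lets federates advance further between synchronizations, which is one reason performance is topology- and scenario-dependent.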
Outline
• Quantifying packet-level simulator performance
• Parallel network simulation software: a federated approach to parallel network simulation
• Experimental study: performance measurements ranging from one to over 1500 CPUs
• Future challenges
Performance Study
• Goal: assess the performance and scalability of parallel, federated network simulators
• Benchmark network (Dartmouth; Nicol et al.); building block: a campus network of 538 nodes (504 clients), with a LAN (4 sub-LANs, 42 hosts) and a server [figure courtesy of David Nicol]
• Multiple campus networks (CNs) connected to form a ring; up to 10,000 campus networks (~5 million nodes); a sketch of this layout follows below
• Links up to 2 Gb/s; link delays ranging from 1 ms to 200 ms; additional chord links
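A minimal Python sketch of how the benchmark scales, as referenced above: N campus networks of 538 nodes each are joined in a ring, with optional chord links. The node count is from the slide; the helper names and the chord-selection rule are invented for illustration and are not the Dartmouth benchmark generator.

```python
import random

NODES_PER_CAMPUS = 538  # per the slide

def build_ring(num_campuses, chord_fraction=0.0, seed=0):
    """Return (ring links, chord links, total node count) for the benchmark layout."""
    rng = random.Random(seed)
    # Inter-campus links: each campus connects to the next campus in the ring.
    ring_links = [(i, (i + 1) % num_campuses) for i in range(num_campuses)]
    # Extra chord links between randomly chosen, non-adjacent campuses.
    chords = set()
    while len(chords) < int(chord_fraction * num_campuses):
        a, b = rng.sample(range(num_campuses), 2)
        if abs(a - b) not in (1, num_campuses - 1):
            chords.add((min(a, b), max(a, b)))
    total_nodes = num_campuses * NODES_PER_CAMPUS
    return ring_links, sorted(chords), total_nodes

links, chords, nodes = build_ring(10_000, chord_fraction=0.05)
print(f"{nodes:,} nodes, {len(links):,} ring links, {len(chords):,} chord links")
# -> 5,380,000 nodes for 10,000 campus networks (~5 million, as on the slide)
```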
Network Topologies
• CampusNet (Dartmouth): single campus network of 538 nodes and 543 links; 10 campus networks connected in a ring
• MilNet (Dartmouth, UCR): 164 military LANs of 3 types; backbone based on maps collected by RocketFuel covering six major U.S. ISPs (3,036 routers); link bandwidths based on network maps published by each ISP; link delays based on distance
• Sizes: Dartmouth scenario 3,886 nodes; ORNL scenario 9,177 nodes; campus 538 nodes
Traffic Scenarios
• CampusNet ftp traffic (Dartmouth): each client transfers 500 KBytes (file transfer requested over TCP) from a server in the next campus network in the ring; variations include traffic to distant servers, UDP, a mix of TCP and UDP traffic, and longer data transfers (a sketch of this scenario's structure follows below)
• Web traffic: based on [Mah, Infocom '97]
• DDoS attack, detection, and filtering: SYN flood, UDP storm; background traffic from CAIDA traces and ISI RAMP
• Worm attacks: UDP worm propagation
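A minimal sketch of the CampusNet ftp scenario's structure referenced above, in plain Python rather than ns-2/PDNS syntax: each of the 504 clients in campus i requests 500 KBytes over TCP from a server in campus (i+1) mod N. The flow-record fields are invented for illustration.

```python
CLIENTS_PER_CAMPUS = 504   # per the slide
TRANSFER_BYTES = 500_000   # 500 KByte file transfer

def ftp_flows(num_campuses):
    """Enumerate the TCP flows of the baseline CampusNet ftp scenario."""
    flows = []
    for campus in range(num_campuses):
        server_campus = (campus + 1) % num_campuses  # server in the next CN in the ring
        for client in range(CLIENTS_PER_CAMPUS):
            flows.append({
                "src": (campus, client),
                "dst": ("server", server_campus),
                "bytes": TRANSFER_BYTES,
                "proto": "TCP",
            })
    return flows

flows = ftp_flows(10)
print(len(flows), "flows")  # 5,040 flows for a 10-campus ring
```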
Hardware Platforms
• Sequential: Sun / Solaris; Ultra-80 with 450 MHz UltraSPARC-II CPUs and 4 GB memory
• Parallel: Intel / RedHat Linux 7.3; 8-way Pentium-III Xeon SMPs (2 MB L2 cache, 550 MHz clock, 4 GB memory); 17 SMPs (136 CPUs) connected via Gigabit Ethernet
• Performance measurements are conservative (due to the modest hardware performance)
Sequential Performance Comparison (Single Campus Network)

                              COTS            ns-2**          GTNetS          ns-2**
                              (Sun/Solaris)   (Sun/Solaris)   (Sun/Solaris)   (Intel/Linux)
Events                        30,700,649      9,107,023       9,143,553       9,117,070
Packet Transmissions*         4,658,390       4,546,074       4,571,264       4,551,084
Events/Packet Transmission    6.59            2.00            2.00            2.00
Run Time (sec)                1,677           104             112.3           48
Packet Trans. / Sec. (PTS)    2,778           43,712          40,706          94,814

* A packet transmission involves simulating a packet transmission over a single link
** Includes NixVectors optimization
Average end-to-end delay differed by less than 3%
Sequential Performance
[Figure: packet transmissions per second (10^2 to 10^5) versus number of nodes (0 to 12,000) for NS (Intel), NS (Sun), and COTS (Sun); the leftmost points correspond to a single campus network]
• Campus network topology; the number of CNs in the ring configuration is increased
• FTP traffic
PDNS Performance on Cluster (Perumalla/Park)
[Figure: packet transmissions per second (up to ~2.5 million) versus number of processors (8 to 120)]
• Each processor simulates 10 CNs (the problem size scales with the number of processors)
• Up to 120 processors simulating 645,600 nodes
PDNS: More Runs
• Scenario 1: Campus Network Scenario
  Optimized PDNS; 658,512 nodes, 616,896 traffic flows; 5.5 million PTS on 136 processors
  Chord links and randomized traffic reduce performance (increased interprocessor communication): 2.0 to 2.6 million PTS on 128 processors, 482K nodes
• Scenario 2: Denial of Service Attack Scenario
  SynFlood attack with 25,000 attacking hosts; campus network configuration with 50% of the original traffic as "background"; 1.5 million PTS on 136 processors
• Scenario 3: MilNet Network Scenario
  166,478 nodes; 142,083 FTP flows (based on CAIDA traces); 1.4 million PTS on 64 processors
Lemieux Supercomputer
Pittsburgh Supercomputing Center: http://www.psc.edu/machines/tcs/lemieux.html
• 750 HP-Alpha ES45 servers
• 4 GBytes memory per server
• 4 CPUs per server
• 1 GHz CPUs
• 3,000 CPUs total
• 64-bit computing
• Quadrics interconnect
PDNS Performance on PSC (Perumalla)
[Figure: million packet transmissions per second (0 to 140) versus number of processors (0 to 1,536), comparing measured PDNS performance against ideal/linear scaling]
• 147K PTS on one CPU
• Campus network topology, FTP traffic (500 packets/flow, TCP)
• Problem size scaled with the number of CPUs (up to ~4 million network nodes)
• Performance up to 106 million PTS (a rough efficiency estimate follows below)
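The numbers on this slide allow a back-of-the-envelope parallel-efficiency estimate. The sketch below assumes the ~106 million PTS peak was obtained at the largest run (1,536 processors), which the slide does not state explicitly.

```python
# Rough speedup/efficiency estimate from the slide's reported numbers.
# Assumption (not stated on the slide): the 106 Million PTS peak occurred
# at the largest processor count, 1,536 CPUs.

single_cpu_pts = 147_000
peak_pts = 106_000_000
cpus = 1536

speedup = peak_pts / single_cpu_pts   # ~721x over one CPU
efficiency = speedup / cpus           # ~0.47
print(f"speedup ~{speedup:.0f}x, efficiency ~{efficiency:.0%}")
```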
GTNetS Performance on PSC (Riley)
• Run 1: campus network configuration; 512 processors; 5.5 million nodes, 5.2 million flows; 12.3 million PTS
• Run 2: near-real-time web traffic simulation; empirical HTTP traffic model [Mah, Infocom '97]; 512 processors; 1.1 million nodes, 1.0 million web browsers; 20.5 million TCP connections; 541 seconds of wallclock time to simulate 300 seconds of network operation
Performance Summary
[Figure: execution speed (normalized PTS, 10^3 to 10^9) versus network size (10^2 to 10^7 nodes) for the sequential and parallel runs reported above]
• Execution speed normalized to single-CPU PSC performance
Summary and Current Work
• Simulated packet transmissions per second (PTS) is a useful benchmarking metric
• Large-scale network simulation is feasible: over 100 million PTS can be achieved, enough to simulate networks containing millions of nodes and traffic flows
• Performance is highly network- and scenario-dependent
• Current work: more complex network configurations; irregular traffic and topologies; synchronization protocols; improving the usability of the tools
Many Challenges Remain
• Modeling issues [Floyd/Paxson]: building credible large-scale models and scenarios; verifying and validating large-scale simulations; which topology? which traffic?
• Methodologies and tools to effectively utilize the simulators: how large is large enough?
• Tools and parallel simulation issues: robust performance; making parallel simulation more transparent and "automatic"; access to HPC platforms; visualization tools
• Application studies: killer apps?
• Simulating the Internet remains a major challenge
Acknowledgements
Funding for this research was provided by NSF Grants ANI-9977544 and ANI-0136939 and DARPA Contract N66001-00-1-8934.