1
From Airplanes to Elephants (The Pac-10 meets Internet2)
Chris Thomas, Sr. Network Engineer
UCLA Office of Information Technology
[email protected]
April 25, 2006
2
Introduction (“Bring it on!”)
3
Use of Video in Coaching
• Essential to any modern sports program
  – It’s not your father’s athletic department
  – Football is the largest sports video user today
  – Other sports interested
  – Multi-purpose (officiating, recruiting, coaching)
• No in-person scouting (NCAA)
  – Schools exchange video
4
Video Exchange Process
• Each school tapes own football games
  – Broadcast-quality video equipment
  – Captured to disk as DV-25 (25 Mbit/sec uncompressed video stream)
• Typical game is an 18-gigabyte video file
• Loaded into video editing system
  – Manually index start & end of each play
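A quick sanity check on those numbers (my arithmetic, not from the deck): an 18 GB file at the DV-25 rate corresponds to roughly an hour and a half of captured footage.

```python
# Does an 18 GB game file match a 25 Mbit/sec DV-25 stream?
DV25_BPS = 25e6        # DV-25 stream rate, bits per second
GAME_BYTES = 18e9      # quoted file size per game

minutes = GAME_BYTES * 8 / DV25_BPS / 60
print(f"{minutes:.0f} minutes of captured video")   # ~96 minutes
```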
5
(Example Video)
6
Process (cont)
• Early each Sunday
  – Put all season’s games (on disks) into a box
  – Go to airport (may not be close), send box to next opponent, wait until his box arrives
  – Return to office, import and mark plays, distribute to coaches
  – Coaches expect video by Sunday evening
  – Repeat next weekend with next opponent
• Obvious opportunity (FTP)
7
The Challenge
• Exchange multiple 18 GB files between ten schools every weekend (FTP)
• Windows-based systems (and users)
• File transfer is only one step in a process
  – Time-constrained
  – Dependable (predictable)
• People interested in solution rather than networks
8
I2 File Exchange Pilot Results
• Began with Fall 2005 football season in Pac-10
• Five participants in pilot
  – UCLA, USC, Stanford, Washington, Notre Dame
• Used BBFTP (freeware)
  – http://doc.in2p3.fr/bbftp/
  – UNIX server with UNIX & Windows clients
  – Selected on basis of performance on simulated network
• Central server model (run by UCLA)
  – Alternative is peer-to-peer
9
Central Server vs. Peer-to-Peer
• BBFTP server runs on UNIX only
• LINUX sender faster than Windows sender
  – Server activity is mostly sending
• Upload a game only once, not once/week
• Talk to same server every week
• Expertise on one end of every connection
• Get full benefit of your link speed
10
Results (cont)
Two schools with 1 Gb/s local connections, three with 100 Mb/s local connections:
Uploads: 65 games
Downloads: 160 games
Total transferred: 4100 GB
Largest file transferred: 62.7 GB
11
How long is typical transfer?
• 100 Mb/s – 22-26 min per game
• Gigabit – 6-9 min per game
• 10 Gigabit – 15 seconds!
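These times are easy to check against wire speed (my arithmetic, not from the deck). The 100 Mb/s and 10 Gigabit figures sit right at the ideal; the gigabit figure (6-9 min versus an ideal ~2.4 min) is already limited by the PC rather than the network, as the PC hardware slides later explain.

```python
# Ideal (wire-speed) transfer time for one 18 GB game file.
GAME_BITS = 18e9 * 8

def minutes(link_bps):
    return GAME_BITS / link_bps / 60

for name, bps in [("100 Mb/s", 100e6), ("1 Gb/s", 1e9), ("10 Gb/s", 10e9)]:
    print(f"{name}: {minutes(bps):.1f} min")
```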
12
Results (cont)
• Initial glitch in BBFTP client (Cygwin)
  – After correction (4th week), essentially 100% success
  – Pleasant surprise (no drops, no garbles)
• All 10 conference schools are passionate about participating next season
  – “Saved me an incredible amount of time”
  – “Coaches got video 12 hours earlier”
  – “I got to spend Sunday mornings with my kids”
13
Results (cont)
• Mostly, very large file transfer over I2 works well
  – 4 TB on schedule, without incident
  – Long-term stability?
• Elimination of courier costs makes participation by other sports feasible
• Saving $10K/yr offsets a good part of I2 membership fee
14
“What works? What doesn’t?”
• My comments are meant as observations and not as criticisms
  – Project impossible without Internet2
• What is acceptable use of Internet2?
  – “Academic Use”?
  – Not fully appreciated by all implementers
15
WW? WD? (cont)
• High-performance networking expertise not always readily available to end-user
  – “Big numbers are at your end.”
  – We need to build local expertise, and make sure the right people find it
• Biggest load your campus has seen
  – “Network police” (exception monitoring)
• What is the message to our user communities?
  – You WILL debug campus network problems
16
Why doesn’t FTP go fast?
• Network speeds (slowest link governs)
• Speed of light (fast but not infinite)
• Network congestion (other traffic)
• PC hardware & software
17
WW? WD? (cont)
• Abilene is a fine backbone
  – But you don’t connect directly (in most cases) to Abilene
• Slowest regional network links are 100 times slower than Abilene and the fastest regional networks
  – 10 Gb down to OC3 (155 Mb/s)
  – Slow (OC3) links are often heavily congested
• 0.5% busy on 10G is 50% busy on 100M
18
WW? WD? (cont)
• Effective BW of Internet2 is overstated for non-aggregate flows
  – Two of ten Pac-10 schools are OC3 limited
  – Reason is almost always cost
• Internet2 was formed to make high-performance networking between universities universal
  – We’re not there yet
  – Firewalls & other campus bottlenecks
19
WW? WD?
• Firewalls
  – Hacker legacy: performance-crippled networks
  – Significant problem now – imposes limits on high-speed networking in many places
  – People installing firewalls often have no idea of performance impact
  – Some inappropriate forces (grant applications)
  – Valid protocols broken: PMTU, ECN, et al.
  – Firewall-free VLANs for applications requiring high-performance networking are a possible solution
20
“Speed of Light” Issue
21
TCP Window Size
• “Ping-pong”
  – Due to speed of light, 15 serves/second
  – Need to have a lot of balls in flight at once (500+)
  – Controlled by “window_size” connection parameter
• Windows default wsize is 1000x too small (UNIX is “only” 100x too small)
  – Hard cap on maximum pps / transfer speed, independent of link speed
  – Set “window scaling” to ON in registry (RFC 1323) to permit large windows (> 64KB)
  – Up to software how much window it uses
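The tennis metaphor translates directly into the bandwidth-delay product: window = bandwidth × round-trip time. A sketch of the arithmetic (the 15 round trips/second, i.e. ~67 ms RTT, is from the slide; the rest is my calculation):

```python
# Bandwidth-delay product: bytes that must be in flight to keep a link full.
def bdp_bytes(link_bps, rtt_sec):
    return link_bps * rtt_sec / 8

RTT = 1 / 15   # ~67 ms round trip: the slide's "15 serves/second"

# "500+ balls in flight": 1500-byte packets needed to fill a 100 Mb/s path
balls = bdp_bytes(100e6, RTT) / 1500
print(f"{balls:.0f} packets in flight")

# Window needed at 1 Gb/s, versus the 64KB ceiling without window scaling
window = bdp_bytes(1e9, RTT)
print(f"{window / 2**20:.1f} MiB window needed; 64KB is {window / 65535:.0f}x too small")
```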
22
Key Registry Settings
23
Receive Windows for 1 Gbps
(Chart: required receive window vs. distance at 1 Gbps, with curves for 1 MB through 5 MB windows; the 64KB limit is reached at 32 miles. Source: Dykstra, SC2004)
24
Maximum TCP/IP Data Rate with 64KB Window
(Chart: the ceiling a 64KB window imposes on 45 Mbps, 100 Mbps, and 622 Mbps links. Source: Dykstra, SC2004)
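The chart ceilings follow from one formula: with a fixed window, throughput can never exceed window ÷ RTT. A sketch of the arithmetic (the RTT values and the fiber signal speed are my assumptions, not from the deck):

```python
WINDOW = 64 * 1024   # bytes: the ceiling without window scaling

def capped_bps(rtt_sec):
    # One window's worth of data per round trip is the most TCP can deliver.
    return WINDOW * 8 / rtt_sec

for rtt_ms in (5, 20, 70):
    print(f"RTT {rtt_ms} ms -> at most {capped_bps(rtt_ms / 1000) / 1e6:.1f} Mb/s")

# "64KB limit is 32 miles": farthest receiver a 64KB window can serve at a
# full 1 Gb/s, assuming ~200,000 km/s signal speed in fiber.
rtt = WINDOW * 8 / 1e9              # RTT at which 1 Gb/s exactly fills the window
miles = (rtt / 2) * 200_000 / 1.609
print(f"~{miles:.0f} miles")
```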
25
Congestion Control
• Original TCP lacked ability to deal effectively with congestion (other traffic, speed mismatch)
  – Need to find receiver’s maximum speed
  – TCP Reno, congestion window
• Turns out TCP Reno’s fix is overkill
• Only some versions of UNIX have optimum congestion control today
  – Big throughput advantage with LINUX sender
• CW is invisible (Web100 kernel patch)
• Selective ACK matters
26
PC Hardware Matters for FTP!
• At 100 Mb/sec, typical recent PCs can keep up with network
• At gigabit speeds, most PCs can’t go fast enough and become the bottleneck
• Use iperf to measure
  – http://dast.nlanr.net/Projects/Iperf/
  – Warning: doesn’t include effect of disk speed
  – Warning: make sure to use large window size
  – Warning: there are “better” tools
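The large-window warning applies to any transfer tool, not just iperf: the window a connection actually gets is partly up to the application. A minimal sketch of how a sockets program asks for one (the 5 MB figure is illustrative; the OS may clamp or round the request):

```python
import socket

WANTED = 5 * 1024 * 1024   # request a ~5 MB window (illustrative value)

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Buffer sizes must be set *before* connecting for the window to be negotiated.
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, WANTED)
s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, WANTED)
granted = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print(f"receive buffer granted: {granted} bytes")
s.close()
```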
27
PC Hardware (cont.)
• Disk speed
  – Most likely Gb bottleneck. Single disks go about 55 MB/sec; a Gb network is 125 MB/sec (2.5x).
  – Older disks can be 20–30 MB/sec
  – http://www.simplisoftware.com/Public/index.php?request=HdTach
  – (Some) disk arrays are fast
  – Watch RAID-5 write speeds – can be 10% of read speed
  – TIP: a pair of new WD Raptor II drives in a RAID-0 stripe will do Gb
28
PC Hardware (cont.)
• Motherboard / Bus
  – Look for PCI Express bus & slots
  – PCI is just too slow
  – Affects both NIC and disk speed
• CPU
  – 2.5 GHz good (single-core P4)
• Memory
  – Usually not an issue (512 MB)
29
PC Hardware (cont.)
• Network Card (NIC)
  – Large onboard buffers (e.g., 96KB)
  – TCP offload
  – For future, jumbo frames (9000 bytes)
  – Highly recommend: SysKonnect SK-9S21 / SK-9E21 (about $100)
  – Even if you have a Gb port on the motherboard, adding an SK card may help (2x)
30
PC Operating System
• Anti-virus software typically has a major impact (~50%) on disk write speed
  – Dual copy
  – Temporarily disable, or configure to bypass the exchange directory
• Test performance under real conditions
  – Real networks have delay, packet duplication & reordering, and even packet loss
  – Build and use a (cheap) network simulator
31
PC Operating System (cont.)
• High-speed TCP & FTP development is ongoing
  – Done by Computer Science departments on UNIX
  – Done at physics institutes on LINUX machines (10G)
  – LINUX is actively involved in high-performance TCP
• Windows XP TCP Reno is > 5 years behind current (2.6.12+) LINUX in the critical area of congestion control
  – Can make a major difference on busy links (sender side matters)
32
FTP application
• Use of TCP window
  – Can it use a large (~5MB) window?
  – Client-side specified?
• Multiple streams
  – Conventional wisdom doesn’t seem to hold; optimum varies with link quality; 4-5 good
  – Warning: XP SP2 limits half-open connections to 10; others wait (event log ID 4226)
• Encryption
  – UID/PW = Yes!! Data = No
33
Jumbo Frames
• Ethernet requires small pieces for network transmission
  – 1500 bytes – called MTU (Maximum Transmission Unit) size, or frame
  – Hasn’t changed in 25 years (speeds have!)
  – Larger pieces, called Jumbo Frames, are more efficient (faster)
  – Internet2 Jumbo size is 9000 bytes
  – Traditionally, throughput scales with MTU (6x); 2x–2.5x is more typical
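The gap between the traditional 6x claim and the observed 2x-2.5x makes sense once byte overhead is separated from packet count (my arithmetic, standard IPv4/TCP header sizes assumed): jumbo frames barely improve the payload fraction on the wire, but they cut the number of packets each end must process about sixfold, and per-packet CPU cost is the real bottleneck.

```python
IP_TCP = 40   # IPv4 + TCP headers, no options (carried inside the MTU)
ETH = 18      # Ethernet header + FCS (added outside the MTU)

def payload_fraction(mtu):
    return (mtu - IP_TCP) / (mtu + ETH)

for mtu in (1500, 9000):
    packets = 18e9 / (mtu - IP_TCP)   # packets per 18 GB game file
    print(f"MTU {mtu}: {payload_fraction(mtu):.1%} payload, {packets / 1e6:.1f}M packets")
```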
34
WW? WD?
• Jumbo frames are still a “future objective” in most cases
  – Every switch and router in path must support jumbo frame size
  – Abilene and up-to-date regional networks support jumbo frames today
  – Campus networks mostly don’t
35
Network Simulator
• Not $$$ Smartbits™
• Linux PC
  – Any basic PC with two network interface cards
• NISTNet (Linux 2.4)– http://snad.ncsl.nist.gov/itg/nistnet/
• Netem (Linux 2.6)– http://linux-net.osdl.org/index.php/Netem
36
Simulator
37
WW? WD?
• Ubiquitous 100 isn’t here yet
  – Lack thereof is impacting deployment of new apps with high-end requirements
• File exchange testing with ~25 schools
  – 10 Mb/s connection seems standard for Athletics to campus backbone
  – With 100M/1G, 30-40 Mb/s throughput is typical
38
WW? WD?
• Recurring issue: “Transfer is slow. Is bottleneck the PC or the network?”
• I’d like to see I2 maintain a “connectivity database”
• I’d like to see every I2 member running:
  – “Public” iperf server
  – “Public” pathrate / pathload server (http://www.pathrate.org/)
• My current solution: ship known PC around to each client
39
WW? WD?
• Know your own network
  – What file transfer rates are realistic from your campus to another?
• Whom do users get referred to when they call the NOC with a performance issue?
  – Make sure your (potential) users have access to campus expertise
• “Be Galileo, not Aristotle”
40
Thank you!
Questions?