Optimizing Network Performance
Alan Whinery, U. Hawaii ITS
April 7, 2010
IP, TCP, ICMP
When you transfer a file with HTTP or FTP:
- A TCP connection is set up between sender and receiver
- The sending computer hands the file to TCP, which slices it into pieces called segments and assigns them numbers, called Sequence Numbers
- TCP hands each segment to IP, which makes datagrams
- IP hands each datagram to the Ethernet driver, which transmits frames
(continued >>> )
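The slicing step above can be sketched as follows. This is an illustrative toy, not the kernel's implementation: the 1460-byte MSS and the zero starting sequence number are assumptions (real TCP numbers bytes starting from a random initial sequence number).

```python
# Sketch of how TCP slices a file into MSS-sized segments,
# each tagged with a sequence number (here: its byte offset).

MSS = 1460  # typical MSS on a 1500-byte-MTU Ethernet path

def segmentize(data: bytes, mss: int = MSS):
    """Yield (sequence_number, payload) pairs, one per segment."""
    for offset in range(0, len(data), mss):
        yield offset, data[offset:offset + mss]

file_bytes = b"x" * 4000          # stand-in for a file being sent
segments = list(segmentize(file_bytes))
for seq, payload in segments:
    print(f"seq={seq:5d}  len={len(payload)}")
# Three segments: seq 0 (1460 bytes), seq 1460 (1460), seq 2920 (1080)
```

The receiver uses those sequence numbers to reassemble the segments in order, which is what makes the acknowledgment scheme on the later slides work.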
IP, TCP, ICMP
Ethernet carries the frame (through switches) to a router, which:
- takes the IP datagram out of the Ethernet frame
- decides where it should go next (check cache OR queue for CPU)
- if it is not forwarded*, may send an ICMP message back to the sender to tell it why
- hands it to a different Ethernet driver, etc.
(...)
* reasons routers neglect to forward: no route, expired TTL, failed IP checksum, access-list drop, input-queue flushes, selective discard
IP, TCP, ICMP
- The last router delivers the datagrams to the receiving computer by sending them in frames across the final link
- The receiving computer extracts the datagrams from the frames, and the segments from the datagrams
- It sends a TCP acknowledgment for each segment's Sequence Number back to the sender
- Good segments are handed to the application (e.g. a web browser), which writes them to a file on disk
elements on each end computer
- Disk – data rate, errors
- DMA – data rate, errors
- Ethernet (link) driver – link negotiation, speed/duplex, errors; features (interrupt coalescing, checksum offload, segmentation offload); buffer sizes, frame size, FCS check
- TCP (OS) – transport, error/congestion recovery; features (congestion avoidance, buffer sizes, SACK, ECN, timestamps); parameters – MSS, buffer/window sizes
- IPv4 (OS) – MTU, TTL, checksum
- IPv6 (OS) – MTU, Hop Limit
- Cable or transmission space
Brain teaser
A packet capture near a major UHNet ingress/egress point will observe IP datagrams with good IP checksums carrying TCP segments with bad TCP checksums, on the order of a dozen or so per hour. How can this be?
It's either an unimaginable coincidence, OR the source host has bit errors occurring between the calculation of the TCP checksum and that of the IP checksum.
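A minimal sketch of why this signature points at the source host, assuming standard RFC 1071 checksumming (the packet contents below are made up): the TCP checksum is computed first, over the segment, while the IPv4 header checksum is computed later and covers only the 20-byte header, so a bit flipped in between corrupts the former and never touches the latter.

```python
# Toy demonstration of checksum *scope* with the Internet checksum
# (RFC 1071): the TCP checksum covers the segment, the IPv4 header
# checksum covers only the header. Packet contents are made up.

def inet_checksum(data: bytes) -> int:
    """RFC 1071 Internet checksum: 16-bit one's-complement sum."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(int.from_bytes(data[i:i + 2], "big")
                for i in range(0, len(data), 2))
    while total >> 16:                       # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

payload = bytearray(b"hello, network")
tcp_sum = inet_checksum(bytes(payload))      # TCP checksum taken first

payload[3] ^= 0x01                           # bit error AFTER the TCP checksum...

ip_header = bytes(20)                        # placeholder all-zero IPv4 header
ip_sum = inet_checksum(ip_header)            # ...but BEFORE the IP checksum

print(inet_checksum(bytes(payload)) == tcp_sum)  # False: TCP checksum now bad
print(inet_checksum(ip_header) == ip_sum)        # True: IP checksum still good
```

Any router along the way would have dropped a datagram with a bad IP header checksum, so segments that arrive with this pattern must have left the sender that way.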
elements on each switch (L2/bridge)
- link negotiation/physical
- input queue
- output queue
- VLAN tagging/processing
- FCS check
- Spanning Tree (changes/port-change blocking)
elements on each router
Everything the switch has, plus:
- route table/route cache – changing, possibly temporarily invalid
- when the cache changes, “process routing” adds latency
- ARP
TCP
Like pouring water from a bucket into a two-liter soda bottle (important to take the cap off first) :^)
- If you pour too fast, some water gets lost
- When loss occurs, you pour more slowly
- TCP continues re-trying until all of the water is in the bottle
Round Trip Time
RTT, similar to the round trip time reported by “ping”, is how long it takes a packet to traverse the network from the sender to the receiver and then back to the sender.
Bandwidth * Delay Product
- BDP is one-half the RTT times the useful “bottleneck” transmission rate (BW) of the network path
- It's actually BW * the one-way delay; 0.5 * RTT is an estimate of one-way delay
- Equal to the amount of data that will be “in flight” in a “full pipe” from the sender to the receiver when the earliest possible ACK is received
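The arithmetic, worked through for an assumed example path (a 1 Gb/s bottleneck with a 100 ms RTT; both numbers are made up for illustration):

```python
# BDP arithmetic from the definition above, on an illustrative path:
# 1 Gb/s bottleneck, 100 ms round trip.

bottleneck_bps = 1_000_000_000   # 1 Gb/s bottleneck rate
rtt_s = 0.100                    # 100 ms round trip time

one_way_delay_s = rtt_s / 2                      # 0.5 * RTT estimates one-way delay
bdp_bytes = bottleneck_bps * one_way_delay_s / 8  # bits -> bytes

print(f"BDP = {bdp_bytes / 1e6:.2f} MB in flight, sender -> receiver")
# -> BDP = 6.25 MB
# A full RTT of data (2 * BDP = 12.5 MB) is unacknowledged at any moment,
# since nothing can be ACKed sooner than one round trip.
```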
How TCP works
S = sender, R = receiver
- S & R set up a “connection”
- S & R negotiate RWIN, MSS, etc.
- S starts sending segments not larger than MSS
- R starts acknowledging segments as they are received in good condition
- Acknowledgments refer to the last segment received, not every single segment
- S limits unacknowledged “data in flight” to R's advertised RWIN
How TCP works
TCP performance on a connection is limited by the following three numbers:
- Sender's socket buffer (you can set this) – must hold 2 * BDP of data to “fill the pipe”
- Congestion window (calculated during transfer) – the sender's estimate of the available bandwidth; a scratchpad number kept by the sender based on ACK/loss history
- Receiver's Receive Window (you can set this) – must likewise cover a full RTT of data (~2 * BDP) to “fill the pipe”
These can be specified with nuttcp and iperf; OS defaults can be set in each OS
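A quick sketch of why these window sizes matter: TCP can move at most one window of data per round trip, so average throughput is bounded by window/RTT. The 64 KB window and the two RTTs below are illustrative assumptions, not measurements.

```python
# Window-limited TCP throughput: at most one window per round trip.
# Same (assumed) 64 KB window on a short path vs. a long path.

def max_throughput_bps(window_bytes: int, rtt_s: float) -> float:
    """Upper bound on TCP throughput: one window of data per RTT."""
    return window_bytes * 8 / rtt_s

window = 64 * 1024  # 65536 bytes, a common un-tuned default

print(f"LAN (1 ms RTT):   {max_throughput_bps(window, 0.001) / 1e6:8.1f} Mbps")
print(f"WAN (100 ms RTT): {max_throughput_bps(window, 0.100) / 1e6:8.1f} Mbps")
# LAN: ~524 Mbps; WAN: ~5.2 Mbps -- same hosts, same window, 100x the RTT
```

This is why a transfer that looks fine across campus can crawl across an ocean: nothing is broken, the un-tuned window simply cannot keep a long pipe full.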
How TCP works
- Original TCP was unable to deal with out-of-order segments; it was forced to throw away received segments that occurred after a lost segment
- Modern TCP has:
  - SACK (selective acknowledgments)
  - Timestamps
  - Explicit Congestion Notification (ECN)
TCP Congestion Avoidance
- Early TCP performed poorly in the face of lost packets, a problem which became more serious as transfer rates increased
- Although bit rates went up, RTT remained the same
- Many TCP variants have been customized for large bandwidth-delay products: HSTCP, FAST TCP, BIC TCP, CUBIC TCP, H-TCP, Compound TCP
Modern Ethernet drivers
Current Ethernet devices offer several optimizations:
- TCP/IP checksum offloading – the NIC chipset does checksumming for TCP and IPv4
- TCP segmentation offloading – the OS sends large blocks of data to the NIC, which chops them up; implies TCP checksum offloading
- Interrupt coalescing – after receiving an Ethernet frame, the NIC waits for more before raising an interrupt to the interrupt controller
Modern Ethernet drivers
Optimizing the NIC's switch connection(s):
- Teaming – combining more than one NIC into one “link”
- Flow control (PAUSE frames) – allows the switch to pause the NIC's sending; I have not found an example of negative effects; can band-aid problem NICs by smoothing the rate and preventing queue drops (and therefore keeping TCP from seeing congestion)
- VLANs – very useful on some servers, as you can set up several interfaces on one NIC; although it is offered in some Windows drivers, I have only made it work in Linux
Modern Ethernet drivers
Optimizing the driver's use of the bus/DMA/etc., or of the Ethernet switch:
- Scatter-gather – multipart DMA transfers
- Write-combining – data transfer “coalescing”
- Message Signaled Interrupts – PCI 2.2 and PCI-E messages that expand available interrupts and relieve the need for interrupt connector pins
- Multiple receive queues (hardware steering)
Modern Ethernet drivers
- Although there are gains to be had from tweaking offloading and other optimizations, always baseline a system with defaults before changing things
- Sometimes disabling all offloading and coalescing can stabilize performance (perhaps exposing a bug)
- Segmentation offloading affects a machine's perspective when packet-capturing its own frames on its own interface
ethtool
Linux utility for interacting with Ethernet drivers:
- Support and output format vary between drivers
- Shows useful statistics
- Views or sets features (offloading, coalescing, etc.)
- Sets Ethernet driver ring buffer sizes
- Blinks LEDs for NIC identification
- Shows link condition, speed, duplex, etc.
ethtool
root@bongo:~# ethtool eth0
Settings for eth0:
        Supported ports: [ MII ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: MII
        PHYAD: 1
        Transceiver: external
        Auto-negotiation: on
        Supports Wake-on: g
        Wake-on: d
        Link detected: yes
ethtool
root@bongo:~# ethtool -i eth0
driver: forcedeth
version: 0.61
firmware-version:
bus-info: 0000:00:14.0

root@uhmanoa:/home/whinery# ethtool eth2
Settings for eth2:
        Supported ports: [ ]
        Supported link modes:
        Supports auto-negotiation: No
        Advertised link modes:  Not reported
        Advertised auto-negotiation: No
        Speed: Unknown! (10000)
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: off
        Current message level: 0x00000004 (4)
        Link detected: yes
modinfo
Extract status and documentation from Linux modules (like Ethernet drivers)
root@bongo:~# modinfo forcedeth
filename:       /lib/modules/2.6.24-26-rt/kernel/drivers/net/forcedeth.ko
license:        GPL
description:    Reverse Engineered nForce ethernet driver
author:         Manfred Spraul <[email protected]>
srcversion:     9A02DCF1CF871DD11BB129E
alias:          pci:v000010DEd00000AB3sv*sd*bc*sc*i*
(...)
depends:
vermagic:       2.6.24-26-rt SMP preempt mod_unload
parm:           max_interrupt_work: forcedeth maximum events handled per interrupt (int)
parm:           optimization_mode: In throughput mode (0), every tx & rx packet will generate an interrupt. In CPU mode (1), interrupts are controlled by a timer. (int)
parm:           poll_interval: Interval determines how frequent timer interrupt is generated by [(time_in_micro_secs * 100) / (2^10)]. Min is 0 and Max is 65535. (int)
parm:           msi: MSI interrupts are enabled by setting to 1 and disabled by setting to 0. (int)
parm:           msix: MSIX interrupts are enabled by setting to 1 and disabled by setting to 0. (int)
parm:           dma_64bit: High DMA is enabled by setting to 1 and disabled by setting to 0. (int)
NDT
- Network Diagnostic Tool, written by Rich Carlson of US Dept. of Energy Argonne Lab/Internet2
- Server written in C; primary client is a Java applet
NPAD (Network Path and Application Diagnosis)
- By Matt Mathis and John Heffner, Pittsburgh Supercomputing Center
- Allows analysis of network loss and throughput for a target rate and RTT
- Attempts to guide the user to a solution of network problems
Iperf
- Command-line throughput test server/client
- Works on Linux/Windows/Mac OS X/etc.
- Originally developed by NLANR/DAST
- Performs unicast TCP and UDP tests
- Performs multicast UDP tests
- Allows setting TCP parameters
- Original development ended in 2002
- SourceForge fork project has produced mixed results
Nuttcp
- Command-line throughput test server/client
- Runs on Linux, Windows, Mac OS X, etc.
- By Bill Fink, Rob Scott
- Does everything iperf does
- Also: third-party testing, bidirectional traceroutes, more extensive output
Nuttcp
nuttcp -T30 -i1 -vv 192.168.222.5
- 30-second TCP send from this host to the target
nuttcp -T30 -i1 -vv 192.168.2.1 192.168.2.2
- 30-second TCP send from 2.1 to 2.2
- This host is neither 2.1 nor 2.2
- Each of the slaves must be running “nuttcp -S”
Nuttcp (or iperf) and periodic reports
C:\bin\nuttcp>nuttcp.exe -i1 -T10 128.171.6.156
   22.1875 MB /   1.00 sec =  186.0967 Mbps
    7.3125 MB /   1.00 sec =   61.3394 Mbps
   14.0000 MB /   1.00 sec =  117.4402 Mbps
   12.8125 MB /   1.00 sec =  107.4796 Mbps
    7.1250 MB /   1.00 sec =   59.7715 Mbps
    6.4375 MB /   1.00 sec =   53.9991 Mbps
   10.7500 MB /   1.00 sec =   90.1771 Mbps
    4.8750 MB /   1.00 sec =   40.8945 Mbps
    9.5625 MB /   1.00 sec =   80.2164 Mbps
    1.9375 MB /   1.00 sec =   16.2529 Mbps
   97.0625 MB /  10.11 sec =   80.5500 Mbps 3 %TX 6 %RX
Seeing 10 one-second samples tells you more about a test than one 10-second average
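To make that concrete, here are the ten per-second samples from the run above fed through basic statistics: the mean of the samples lands near the 80.55 Mbps summary line (the small difference is the 10.11 s total duration), but the spread shows a very unsteady transfer that the single average hides.

```python
# Per-second samples from the nuttcp run above: the mean agrees with
# the summary line, but the standard deviation and range reveal how
# wildly the rate swung during the test.

from statistics import mean, stdev

samples_mbps = [186.0967, 61.3394, 117.4402, 107.4796, 59.7715,
                53.9991, 90.1771, 40.8945, 80.2164, 16.2529]

print(f"mean  = {mean(samples_mbps):.1f} Mbps")
print(f"stdev = {stdev(samples_mbps):.1f} Mbps")
print(f"range = {min(samples_mbps):.1f} .. {max(samples_mbps):.1f} Mbps")
```

A steady 80 Mbps transfer and one that lurches between 16 and 186 Mbps produce the same 10-second average; only the interval reports distinguish them.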
Testing notes
Neither iperf nor nuttcp uses TCP auto-tuning